[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-google-research--xtreme":3,"tool-google-research--xtreme":64},[4,17,27,35,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",143909,2,"2026-04-07T11:33:18",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,3,"2026-04-06T11:19:32",[15,26,14,13],"图像",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":10,"last_commit_at":33,"category_tags":34,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 
都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85013,"2026-04-06T11:09:19",[26,43,44,45,14,46,15,13,47],"数据工具","视频","插件","其他","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":23,"last_commit_at":54,"category_tags":55,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 
协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[14,26,13,15,46],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",75054,"2026-04-07T10:38:03",[15,26,13,46],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":23,"env_os":95,"env_gpu":95,"env_ram":95,"env_deps":96,"category_tags":107,"github_topics":79,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":108,"updated_at":109,"faqs":110,"releases":136},5221,"google-research\u002Fxtreme","xtreme","XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 typologically diverse languages and includes nine tasks.","XTREME 是一个专为评估预训练多语言模型跨语言泛化能力而设计的大规模基准测试平台。它旨在解决当前人工智能在处理非英语语种，尤其是低资源语言时表现不佳的难题，帮助开发者验证模型是否能将从一种语言学到的知识有效迁移到其他语言中。\n\n该平台涵盖了 40 种类型学差异巨大的语言，涉及 12 个语系，其中包括泰米尔语、斯瓦希里语等以往常被忽视的语言。XTREME 包含了句子分类、结构化预测、句子检索和问答等九项核心任务，要求模型在不同语法和语义层级上进行综合推理。其独特的技术亮点在于采用“零样本跨语言”评估设定，即模型仅在英语数据上训练，直接在其他语言测试集上进行评测，从而真实反映模型的泛化潜力。\n\nXTREME 
非常适合自然语言处理领域的研究人员和算法工程师使用。通过提供标准化的数据集下载脚本、基线系统代码以及详细的排行榜提交指南，它为多语言模型的对比研究和性能优化提供了权威依据，是推动全球语言平等和技术普惠的重要基础设施。","# XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization\n\n[**Tasks**](#tasks-and-languages) | [**Download**](#download-the-data) |\n[**Baselines**](#build-a-baseline-system) |\n[**Leaderboard**](#leaderboard-submission) |\n[**Website**](https:\u002F\u002Fsites.research.google\u002Fxtreme) |\n[**Paper**](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2003.11080.pdf) |\n[**Translations**](https:\u002F\u002Fconsole.cloud.google.com\u002Fstorage\u002Fbrowser\u002Fxtreme_translations)\n\nThis repository contains information about XTREME, code for downloading data, and\nimplementations of baseline systems for the benchmark.\n\n# Introduction\n\nThe Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models. It covers 40 typologically diverse languages (spanning 12 language families) and includes nine tasks that collectively require reasoning about different levels of syntax and semantics. The languages in XTREME are selected to maximize language diversity, coverage in existing tasks, and availability of training data. Among these are many under-studied languages, such as the Dravidian languages Tamil (spoken in southern India, Sri Lanka, and Singapore), Telugu and Malayalam (spoken mainly in southern India), and the Niger-Congo languages Swahili and Yoruba, spoken in Africa.\n\nFor a full description of the benchmark, see [the paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2003.11080).\n\n# Tasks and Languages\n\nThe tasks included in XTREME cover a range of standard paradigms in natural language processing, including sentence classification, structured prediction, sentence retrieval and question answering. 
The full list of tasks can be seen in the image below.\n\n![The datasets used in XTREME](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle-research_xtreme_readme_fab461ee8c3b.png)\n\nIn order for models to be successful on the XTREME benchmark, they must learn representations that generalize across many tasks and languages. Each of the tasks covers a subset of the 40 languages included in XTREME (shown here with their ISO 639-1 codes): af, ar, bg, bn, de, el, en, es, et, eu, fa, fi, fr, he, hi, hu, id, it, ja, jv, ka, kk, ko, ml, mr, ms, my, nl, pt, ru, sw, ta, te, th, tl, tr, ur, vi, yo, and zh. The languages were selected among the top 100 languages with the [most Wikipedia articles](https:\u002F\u002Fmeta.wikimedia.org\u002Fwiki\u002FList_of_Wikipedias) to maximize language diversity, task coverage, and availability of training data. They include members of the Afro-Asiatic, Austro-Asiatic, Austronesian, Dravidian, Indo-European, Japonic, Kartvelian, Kra-Dai, Niger-Congo, Sino-Tibetan, Turkic, and Uralic language families as well as of two isolates, Basque and Korean.\n\n# Download the data\n\nIn order to run experiments on XTREME, the first step is to download the dependencies. We assume you have installed [`anaconda`](https:\u002F\u002Fwww.anaconda.com\u002F) and use Python 3.7+. The additional requirements including `transformers`, `seqeval` (for sequence labelling evaluation), `tensorboardx`, `jieba`, `kytea`, and `pythainlp` (for text segmentation in Chinese, Japanese, and Thai), and `sacremoses` can be installed by running the following script:\n```\nbash install_tools.sh\n```\n\nThe next step is to download the data. To this end, first create a `download` folder with ```mkdir -p download``` in the root of this project. 
You then need to manually download `panx_dataset` (for NER) from [here](https:\u002F\u002Fwww.amazon.com\u002Fclouddrive\u002Fshare\u002Fd3KGCRCIYwhKJF0H3eWA26hjg2ZCRhjpEQtDL70FSBN) (note that it will download as `AmazonPhotos.zip`) to the `download` directory. Finally, run the following command to download the remaining datasets:\n```\nbash scripts\u002Fdownload_data.sh\n```\n\nNote that in order to prevent accidental evaluation on the test sets while running experiments,\nwe remove labels of the test data during pre-processing and change the order of the test sentences\nfor cross-lingual sentence retrieval.\n\n# Build a baseline system\n\nThe evaluation setting in XTREME is zero-shot cross-lingual transfer from English. We fine-tune models that were pre-trained on multilingual data on the labelled data of each XTREME task in English. Each fine-tuned model is then applied to the test data of the same task in other languages to obtain predictions.\n\nFor every task, we provide a single script `scripts\u002Ftrain.sh` that fine-tunes pre-trained models implemented in the [Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) repo. To fine-tune a different model, simply pass a different `MODEL` argument to the script with the corresponding model. The current supported models are `bert-base-multilingual-cased`, `xlm-mlm-100-1280` and `xlm-roberta-large`.\n\n## Universal dependencies part-of-speech tagging\n\nFor part-of-speech tagging, we use data from the Universal Dependencies v2.5. You can fine-tune a pre-trained multilingual model on the English POS tagging data with the following command:\n```\nbash scripts\u002Ftrain.sh [MODEL] udpos\n```\n\n## Wikiann named entity recognition\n\nFor named entity recognition (NER), we use data from the Wikiann (panx) dataset. 
You can fine-tune a pre-trained multilingual model on the English NER data with the following command:\n```\nbash scripts\u002Ftrain.sh [MODEL] panx\n```\n\n## PAWS-X sentence classification\n\nFor sentence classification, we use the Cross-lingual Paraphrase Adversaries from Word Scrambling (PAWS-X) dataset. You can fine-tune a pre-trained multilingual model on the English PAWS data with the following command:\n```\nbash scripts\u002Ftrain.sh [MODEL] pawsx\n```\n\n## XNLI sentence classification\n\nThe second sentence classification dataset is the Cross-lingual Natural Language Inference (XNLI) dataset. You can fine-tune a pre-trained multilingual model on the English MNLI data with the following command:\n```\nbash scripts\u002Ftrain.sh [MODEL] xnli\n```\n\n## XQuAD, MLQA, TyDiQA-GoldP question answering\n\nFor question answering, we use the data from the XQuAD, MLQA, and TyDiQA-Gold Passage datasets.\nFor XQuAD and MLQA, the model should be trained on the English SQuAD training set. For TyDiQA-Gold Passage, the model is trained on the English TyDiQA-GoldP training set. Using the following command, you can first fine-tune a pre-trained multilingual model on the corresponding English training data, and then you can obtain predictions on the test data of all tasks.\n```\nbash scripts\u002Ftrain.sh [MODEL] [xquad,mlqa,tydiqa]\n```\n\n## BUCC sentence retrieval\n\nFor cross-lingual sentence retrieval, we use the data from the Building and Using Parallel Corpora (BUCC) shared task. As the models are not trained for this task but the representations of the pre-trained models are directly used to obtain similarity judgements, you can directly apply the model to obtain predictions on the test data of the task:\n```\nbash scripts\u002Ftrain.sh [MODEL] bucc2018\n```\n\n## Tatoeba sentence retrieval\n\nThe second cross-lingual sentence retrieval dataset we use is the Tatoeba dataset. 
Similarly to BUCC, you can directly apply the model to obtain predictions on the test data of the task:\n```\nbash scripts\u002Ftrain.sh [MODEL] tatoeba\n```\n\n# Leaderboard Submission\n\n## Submissions\nTo submit your predictions to [**XTREME**](https:\u002F\u002Fsites.research.google\u002Fxtreme), please create a single folder that contains 9 sub-folders named after all the tasks, i.e., `udpos`, `panx`, `xnli`, `pawsx`, `xquad`, `mlqa`, `tydiqa`, `bucc2018`, `tatoeba`. Inside each sub-folder, create a file containing the predicted labels of the test set for all languages. Name the file using the format `test-{language}.{extension}` where `language` indicates the 2-character language code, and `extension` is `json` for QA tasks and `tsv` for other tasks. You can see an example of the folder structure in `mock_test_data\u002Fpredictions`.\n\n## Evaluation\nWe will compare your submissions with our label files using the following command:\n```\npython evaluate.py --prediction_folder [path] --label_folder [path]\n```\n\n# Translations\n\nAs part of training translate-train and translate-test baselines, we have automatically translated\nEnglish training sets to other languages and test sets to English. Translations are available for\nthe following datasets: SQuAD v1.1 (only train and dev), MLQA, PAWS-X, TyDiQA-GoldP, XNLI, and XQuAD.\n\nFor PAWS-X and XNLI, the translations are in the following format:\nColumn 1 and Column 2: original sentence pairs\nColumn 3 and Column 4: translated sentence pairs\nColumn 5: label\n\nThis makes it easy to associate the original data with their translations.\n\nFor XNLI and XQuAD, we have furthermore created pseudo test sets by automatically translating the English test set to the remaining\nlanguages in XTREME so that test data for all 40 languages is available. 
Note that\nthese translations are noisy and should not be treated as ground truth.\n\nAll translations are available [here](https:\u002F\u002Fconsole.cloud.google.com\u002Fstorage\u002Fbrowser\u002Fxtreme_translations).\n\n# Paper\n\nIf you use our benchmark or the code in this repo, please cite our paper `\\cite{hu2020xtreme}`.\n```\n@article{hu2020xtreme,\n      author    = {Junjie Hu and Sebastian Ruder and Aditya Siddhant and Graham Neubig and Orhan Firat and Melvin Johnson},\n      title     = {XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization},\n      journal   = {CoRR},\n      volume    = {abs\u002F2003.11080},\n      year      = {2020},\n      archivePrefix = {arXiv},\n      eprint    = {2003.11080}\n}\n```\nPlease consider including a note similar to the one below to make sure to cite all the individual datasets in your paper.\n\nWe experiment on the XTREME benchmark `\\cite{hu2020xtreme}`, a composite benchmark for multi-lingual learning consisting of data from the XNLI `\\cite{Conneau2018xnli}`, PAWS-X `\\cite{Yang2019paws-x}`, UD-POS `\\cite{nivre2018universal}`, Wikiann NER `\\cite{Pan2017}`, XQuAD `\\cite{artetxe2020cross}`, MLQA `\\cite{Lewis2020mlqa}`, TyDiQA-GoldP `\\cite{Clark2020tydiqa}`, BUCC 2018 `\\cite{zweigenbaum2018overview}`, Tatoeba `\\cite{Artetxe2019massively}` tasks. 
We provide their BibTex information as follows.\n```\n@inproceedings{Conneau2018xnli,\n    title = \"{XNLI}: Evaluating Cross-lingual Sentence Representations\",\n    author = \"Conneau, Alexis  and\n      Rinott, Ruty  and\n      Lample, Guillaume  and\n      Williams, Adina  and\n      Bowman, Samuel  and\n      Schwenk, Holger  and\n      Stoyanov, Veselin\",\n    booktitle = \"Proceedings of EMNLP 2018\",\n    year = \"2018\",\n    pages = \"2475--2485\",\n}\n\n@inproceedings{Yang2019paws-x,\n    title = \"{PAWS-X}: A Cross-lingual Adversarial Dataset for Paraphrase Identification\",\n    author = \"Yang, Yinfei  and\n      Zhang, Yuan  and\n      Tar, Chris  and\n      Baldridge, Jason\",\n    booktitle = \"Proceedings of EMNLP 2019\",\n    year = \"2019\",\n    pages = \"3685--3690\",\n}\n\n@article{nivre2018universal,\n  title={Universal Dependencies 2.2},\n  author={Nivre, Joakim and Abrams, Mitchell and Agi{\\'c}, {\\v{Z}}eljko and Ahrenberg, Lars and Antonsen, Lene and Aranzabe, Maria Jesus and Arutie, Gashaw and Asahara, Masayuki and Ateyah, Luma and Attia, Mohammed and others},\n  year={2018}\n}\n\n@inproceedings{Pan2017,\nauthor = {Pan, Xiaoman and Zhang, Boliang and May, Jonathan and Nothman, Joel and Knight, Kevin and Ji, Heng},\nbooktitle = {Proceedings of ACL 2017},\npages = {1946--1958},\ntitle = {{Cross-lingual name tagging and linking for 282 languages}},\nyear = {2017}\n}\n\n@inproceedings{artetxe2020cross,\nauthor = {Artetxe, Mikel and Ruder, Sebastian and Yogatama, Dani},\nbooktitle = {Proceedings of ACL 2020},\ntitle = {{On the Cross-lingual Transferability of Monolingual Representations}},\nyear = {2020}\n}\n\n@inproceedings{Lewis2020mlqa,\nauthor = {Lewis, Patrick and Oğuz, Barlas and Rinott, Ruty and Riedel, Sebastian and Schwenk, Holger},\nbooktitle = {Proceedings of ACL 2020},\ntitle = {{MLQA: Evaluating Cross-lingual Extractive Question Answering}},\nyear = {2020}\n}\n\n@inproceedings{Clark2020tydiqa,\nauthor = {Jonathan H. 
Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki},\nbooktitle = {Transactions of the Association for Computational Linguistics},\ntitle = {{TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}},\nyear = {2020}\n}\n\n@inproceedings{zweigenbaum2018overview,\n  title={Overview of the third BUCC shared task: Spotting parallel sentences in comparable corpora},\n  author={Zweigenbaum, Pierre and Sharoff, Serge and Rapp, Reinhard},\n  booktitle={Proceedings of 11th Workshop on Building and Using Comparable Corpora},\n  pages={39--42},\n  year={2018}\n}\n\n@article{Artetxe2019massively,\nauthor = {Artetxe, Mikel and Schwenk, Holger},\njournal = {Transactions of the Association for Computational Linguistics},\ntitle = {{Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond}},\nyear = {2019}\n}\n```\n","# XTREME：用于评估跨语言泛化能力的大规模多语言多任务基准\n\n[**任务**](#tasks-and-languages) | [**下载**](#download-the-data) |\n[**基线系统**](#build-a-baseline-system) |\n[**排行榜**](#leaderboard-submission) |\n[**官网**](https:\u002F\u002Fsites.research.google\u002Fxtreme) |\n[**论文**](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2003.11080.pdf) |\n[**译文**](https:\u002F\u002Fconsole.cloud.google.com\u002Fstorage\u002Fbrowser\u002Fxtreme_translations)\n\n本仓库包含关于XTREME的介绍、数据下载代码以及该基准测试中基线系统的实现。\n\n# 简介\n\n跨语言迁移多语言编码器评测（XTREME）基准是一个用于评估预训练多语言模型跨语言泛化能力的基准。它涵盖了40种类型学上多样化的语言（横跨12个语系），并包括九项任务，这些任务综合起来需要对不同层次的句法和语义进行推理。XTREME中的语言选择旨在最大化语言多样性、现有任务的覆盖范围以及训练数据的可获得性。其中包含许多研究较少的语言，例如达罗毗荼语系的泰米尔语（主要在印度南部、斯里兰卡和新加坡使用）、泰卢固语和马拉雅拉姆语（主要在印度南部使用），以及尼日尔-刚果语系的斯瓦希里语和约鲁巴语，它们广泛分布于非洲。\n\n有关该基准的完整描述，请参阅[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2003.11080)。\n\n# 
任务与语言\n\nXTREME包含的任务覆盖了自然语言处理领域的一系列标准范式，包括句子分类、结构化预测、句子检索和问答。完整的任务列表见下图。\n\n![XTREME中使用的数据集](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle-research_xtreme_readme_fab461ee8c3b.png)\n\n为了在XTREME基准上取得成功，模型必须学习能够在多种任务和语言之间泛化的表示。每项任务都涵盖XTREME中所包含的40种语言中的部分子集（以下为其ISO 639-1代码）：af, ar, bg, bn, de, el, en, es, et, eu, fa, fi, fr, he, hi, hu, id, it, ja, jv, ka, kk, ko, ml, mr, ms, my, nl, pt, ru, sw, ta, te, th, tl, tr, ur, vi, yo, 和 zh。这些语言是从拥有[最多维基百科文章](https:\u002F\u002Fmeta.wikimedia.org\u002Fwiki\u002FList_of_Wikipedias)的前100种语言中选出的，以最大限度地提高语言多样性、任务覆盖范围以及训练数据的可用性。它们包括亚非语系、南亚语系、南岛语系、达罗毗荼语系、印欧语系、日本语系、卡特维尔语系、壮侗语系、尼日尔-刚果语系、汉藏语系、突厥语系和乌拉尔语系的成员，以及两种孤立语言——巴斯克语和韩语。\n\n# 下载数据\n\n要在XTREME上运行实验，第一步是下载依赖项。我们假设您已安装[`anaconda`](https:\u002F\u002Fwww.anaconda.com\u002F)，并且使用Python 3.7及以上版本。其他所需包包括`transformers`、`seqeval`（用于序列标注评估）、`tensorboardx`、`jieba`、`kytea`和`pythainlp`（用于中文、日语和泰语的文本分词），以及`sacremoses`，可通过运行以下脚本进行安装：\n```\nbash install_tools.sh\n```\n\n下一步是下载数据。为此，首先在本项目的根目录下创建一个`download`文件夹，命令为```mkdir -p download```。然后，您需要手动从[这里](https:\u002F\u002Fwww.amazon.com\u002Fclouddrive\u002Fshare\u002Fd3KGCRCIYwhKJF0H3eWA26hjg2ZCRhjpEQtDL70FSBN)下载`panx_dataset`（用于NER），并将其保存到`download`目录中（请注意，它会以`AmazonPhotos.zip`的形式下载）。最后，运行以下命令下载其余数据集：\n```\nbash scripts\u002Fdownload_data.sh\n```\n\n需要注意的是，为了避免在实验过程中意外对测试集进行评估，我们在预处理阶段会移除测试数据的标签，并改变跨语言句子检索任务中测试句子的顺序。\n\n# 构建基线系统\n\nXTREME中的评估设置是零样本跨语言迁移，即从英语迁移到其他语言。我们会在每项XTREME任务的英语标注数据上微调那些已在多语言数据上预训练过的模型。随后，将每个微调后的模型应用于同一任务在其他语言上的测试数据，以获得预测结果。\n\n对于每一项任务，我们都提供了一个脚本`scripts\u002Ftrain.sh`，用于微调在[Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)库中实现的预训练模型。若要微调不同的模型，只需在脚本中传递相应的`MODEL`参数即可。目前支持的模型有`bert-base-multilingual-cased`、`xlm-mlm-100-1280`和`xlm-roberta-large`。\n\n## 通用依存关系词性标注\n\n对于词性标注任务，我们使用通用依存关系树库v2.5的数据。您可以通过以下命令在英语词性标注数据上微调预训练的多语言模型：\n```\nbash scripts\u002Ftrain.sh [MODEL] udpos\n```\n\n## Wikiann命名实体识别\n\n对于命名实体识别（NER）任务，我们使用Wikiann（panx）数据集。您可以通过以下命令在英语NER数据上微调预训练的多语言模型：\n```\nbash 
scripts\u002Ftrain.sh [MODEL] panx\n```\n\n## PAWS-X句子分类\n\n对于句子分类任务，我们使用跨语言释义对抗词汇打乱（PAWS-X）数据集。您可以通过以下命令在英语PAWS数据上微调预训练的多语言模型：\n```\nbash scripts\u002Ftrain.sh [MODEL] pawsx\n```\n\n## XNLI句子分类\n\n第二个句子分类数据集是跨语言自然语言推理（XNLI）数据集。您可以通过以下命令在英语MNLI数据上微调预训练的多语言模型：\n```\nbash scripts\u002Ftrain.sh [MODEL] xnli\n```\n\n## XQuAD、MLQA、TyDiQA-GoldP问答\n\n对于问答任务，我们使用XQuAD、MLQA和TyDiQA-Gold Passage数据集。对于XQuAD和MLQA，模型应在英语SQuAD训练集上进行训练。而对于TyDiQA-Gold Passage，则需在英语TyDiQA-GoldP训练集上训练模型。您可以先使用以下命令在相应的英语训练数据上微调预训练的多语言模型，然后再对所有任务的测试数据进行预测：\n```\nbash scripts\u002Ftrain.sh [MODEL] [xquad,mlqa,tydiqa]\n```\n\n## BUCC句子检索\n\n对于跨语言句子检索任务，我们使用构建与使用平行语料库（BUCC）共享任务的数据。由于这些模型并未针对此任务进行训练，而是直接利用预训练模型的表示来计算相似度判断，因此您可以直接应用模型对任务的测试数据进行预测：\n```\nbash scripts\u002Ftrain.sh [MODEL] bucc2018\n```\n\n## Tatoeba 句子检索\n\n我们使用的第二个跨语言句子检索数据集是 Tatoeba 数据集。与 BUCC 类似，您可以直接应用模型来获得该任务测试数据的预测结果：\n```\nbash scripts\u002Ftrain.sh [MODEL] tatoeba\n```\n\n# 榜单提交\n\n## 提交\n要将您的预测提交至 [**XTREME**](https:\u002F\u002Fsites.research.google\u002Fxtreme)，请创建一个包含 9 个子文件夹的主文件夹，这些子文件夹分别以所有任务的名称命名，即 `udpos`、`panx`、`xnli`、`pawsx`、`xquad`、`mlqa`、`tydiqa`、`bucc2018` 和 `tatoeba`。在每个子文件夹中，创建一个文件，其中包含针对所有语言的测试集的预测标签。文件名应采用 `test-{language}.{extension}` 的格式，其中 `language` 表示双字符语言代码，而 `extension` 对于 QA 任务为 `json`，对于其他任务则为 `tsv`。您可以在 `mock_test_data\u002Fpredictions` 中查看文件夹结构的示例。\n\n## 评估\n我们将使用以下命令将您的提交与我们的标签文件进行比较：\n```\npython evaluate.py --prediction_folder [path] --label_folder [path]\n```\n\n# 翻译\n\n作为训练 translate-train 和 translate-test 基线的一部分，我们已自动将英语训练集翻译成其他语言，并将测试集翻译成英语。翻译适用于以下数据集：SQuAD v1.1（仅训练集和验证集）、MLQA、PAWS-X、TyDiQA-GoldP、XNLI 和 XQuAD。\n\n对于 PAWS-X 和 XNLI，翻译的格式如下：\n第 1 列和第 2 列：原始句对  \n第 3 列和第 4 列：翻译后的句对  \n第 5 列：标签  \n\n这将有助于建立原始数据与其翻译之间的对应关系。\n\n对于 XNLI 和 XQuAD，我们进一步通过将英语测试集自动翻译成 XTREME 中的其余语言，创建了伪测试集，从而使得所有 40 种语言的测试数据都可用。请注意，这些翻译存在噪声，不应被视为真实标签。\n\n所有翻译均可在此处获取：[这里](https:\u002F\u002Fconsole.cloud.google.com\u002Fstorage\u002Fbrowser\u002Fxtreme_translations)。\n\n# 论文\n\n如果您使用我们的基准或本仓库中的代码，请引用我们的论文 
`\\cite{hu2020xtreme}`。\n```\n@article{hu2020xtreme,\n      author    = {Junjie Hu and Sebastian Ruder and Aditya Siddhant and Graham Neubig and Orhan Firat and Melvin Johnson},\n      title     = {XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization},\n      journal   = {CoRR},\n      volume    = {abs\u002F2003.11080},\n      year      = {2020},\n      archivePrefix = {arXiv},\n      eprint    = {2003.11080}\n}\n```\n\n请考虑在您的论文中加入类似下面的说明，以确保引用所有单独的数据集。\n\n我们在 XTREME 基准上进行了实验 `\\cite{hu2020xtreme}`，这是一个用于多语言学习的综合基准，由来自 XNLI `\\cite{Conneau2018xnli}`、PAWS-X `\\cite{Yang2019paws-x}`、UD-POS `\\cite{nivre2018universal}`、Wikiann NER `\\cite{Pan2017}`、XQuAD `\\cite{artetxe2020cross}`、MLQA `\\cite{Lewis2020mlqa}`、TyDiQA-GoldP `\\cite{Clark2020tydiqa}`、BUCC 2018 `\\cite{zweigenbaum2018overview}` 和 Tatoeba `\\cite{Artetxe2019massively}` 等任务的数据组成。我们提供它们的 BibTex 信息如下。\n```\n@inproceedings{Conneau2018xnli,\n    title = \"{XNLI}: Evaluating Cross-lingual Sentence Representations\",\n    author = \"Conneau, Alexis  and\n      Rinott, Ruty  and\n      Lample, Guillaume  and\n      Williams, Adina  and\n      Bowman, Samuel  and\n      Schwenk, Holger  and\n      Stoyanov, Veselin\",\n    booktitle = \"Proceedings of EMNLP 2018\",\n    year = \"2018\",\n    pages = \"2475--2485\",\n}\n\n@inproceedings{Yang2019paws-x,\n    title = \"{PAWS-X}: A Cross-lingual Adversarial Dataset for Paraphrase Identification\",\n    author = \"Yang, Yinfei  and\n      Zhang, Yuan  and\n      Tar, Chris  and\n      Baldridge, Jason\",\n    booktitle = \"Proceedings of EMNLP 2019\",\n    year = \"2019\",\n    pages = \"3685--3690\",\n}\n\n@article{nivre2018universal,\n  title={Universal Dependencies 2.2},\n  author={Nivre, Joakim and Abrams, Mitchell and Agi{\\'c}, {\\v{Z}}eljko and Ahrenberg, Lars and Antonsen, Lene and Aranzabe, Maria Jesus and Arutie, Gashaw and Asahara, Masayuki and Ateyah, Luma and Attia, Mohammed and others},\n  
year={2018}\n}\n\n@inproceedings{Pan2017,\nauthor = {Pan, Xiaoman and Zhang, Boliang and May, Jonathan and Nothman, Joel and Knight, Kevin and Ji, Heng},\nbooktitle = {Proceedings of ACL 2017},\npages = {1946--1958},\ntitle = {{Cross-lingual name tagging and linking for 282 languages}},\nyear = {2017}\n}\n\n@inproceedings{artetxe2020cross,\nauthor = {Artetxe, Mikel and Ruder, Sebastian and Yogatama, Dani},\nbooktitle = {Proceedings of ACL 2020},\ntitle = {{On the Cross-lingual Transferability of Monolingual Representations}},\nyear = {2020}\n}\n\n@inproceedings{Lewis2020mlqa,\nauthor = {Lewis, Patrick and Oğuz, Barlas and Rinott, Ruty and Riedel, Sebastian and Schwenk, Holger},\nbooktitle = {Proceedings of ACL 2020},\ntitle = {{MLQA: Evaluating Cross-lingual Extractive Question Answering}},\nyear = {2020}\n}\n\n@inproceedings{Clark2020tydiqa,\nauthor = {Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki},\nbooktitle = {Transactions of the Association for Computational Linguistics},\ntitle = {{TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}},\nyear = {2020}\n}\n\n@inproceedings{zweigenbaum2018overview,\n  title={Overview of the third BUCC shared task: Spotting parallel sentences in comparable corpora},\n  author={Zweigenbaum, Pierre and Sharoff, Serge and Rapp, Reinhard},\n  booktitle={Proceedings of 11th Workshop on Building and Using Comparable Corpora},\n  pages={39--42},\n  year={2018}\n}\n\n@article{Artetxe2019massively,\nauthor = {Artetxe, Mikel and Schwenk, Holger},\njournal = {Transactions of the Association for Computational Linguistics},\ntitle = {{Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond}},\nyear = {2019}\n}\n```","# XTREME 快速上手指南\n\nXTREME 是一个大规模多语言多任务基准测试，用于评估预训练多语言模型的跨语言泛化能力。它涵盖 40 种类型多样的语言和 9 项自然语言处理任务（如分类、命名实体识别、问答等）。本指南将帮助你快速搭建环境并运行基线系统。\n\n## 环境准备\n\n在开始之前，请确保你的系统满足以下要求：\n\n*   **操作系统**: Linux 或 
macOS (Windows 用户建议使用 WSL)\n*   **Python 版本**: 3.7 或更高\n*   **包管理器**: 已安装 [Anaconda](https:\u002F\u002Fwww.anaconda.com\u002F) 或 Miniconda\n*   **网络环境**: 需要访问 Google Cloud Storage 和 Amazon Cloud Drive 下载数据（国内用户可能需要配置代理或使用加速工具）\n\n## 安装步骤\n\n### 1. 创建虚拟环境并安装依赖\n首先，克隆仓库并进入目录，然后使用提供的脚本安装必要的 Python 库（包括 `transformers`, `seqeval`, `jieba` 等）。\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fxtreme.git\ncd xtreme\nbash install_tools.sh\n```\n\n> **提示**：如果 `install_tools.sh` 执行缓慢，可手动使用 pip 安装核心依赖：\n> `pip install transformers seqeval tensorboardx jieba kytea pythainlp sacremoses`\n\n### 2. 下载数据集\n数据下载分为两部分：手动下载 NER 数据和自动下载其余数据。\n\n**第一步：手动下载 WikiANN (panx) 数据**\n1.  在项目根目录创建下载文件夹：\n    ```bash\n    mkdir -p download\n    ```\n2.  前往 [Amazon Cloud Drive 链接](https:\u002F\u002Fwww.amazon.com\u002Fclouddrive\u002Fshare\u002Fd3KGCRCIYwhKJF0H3eWA26hjg2ZCRhjpEQtDL70FSBN) 下载 `panx_dataset` (文件名为 `AmazonPhotos.zip`)。\n3.  将下载的压缩包放入 `download` 目录并解压（确保解压后的文件结构符合脚本预期）。\n\n**第二步：自动下载其他数据集**\n运行以下脚本下载剩余的任务数据（如 XNLI, PAWS-X, XQuAD 等）：\n\n```bash\nbash scripts\u002Fdownload_data.sh\n```\n\n> **注意**：为防止意外在测试集上进行训练，脚本在预处理时会自动移除测试集的标签，并对跨语言句子检索任务的测试句顺序进行打乱。\n\n## 基本使用\n\nXTREME 的评估设定为**从英语出发的零样本跨语言迁移**。即：使用英语标注数据微调多语言预训练模型，然后直接在其他语言的测试集上进行预测。\n\n### 运行基线训练与评估\n项目为每个任务提供了统一的训练脚本 `scripts\u002Ftrain.sh`。你只需指定预训练模型名称和任务名称即可。\n\n**支持的预训练模型**:\n*   `bert-base-multilingual-cased`\n*   `xlm-mlm-100-1280`\n*   `xlm-roberta-large`\n\n**通用命令格式**:\n```bash\nbash scripts\u002Ftrain.sh [MODEL] [TASK]\n```\n\n### 任务示例\n\n以下是针对不同类型任务的运行示例：\n\n**1. 词性标注 (UD-POS)**\n使用 Universal Dependencies v2.5 数据进行英语微调：\n```bash\nbash scripts\u002Ftrain.sh bert-base-multilingual-cased udpos\n```\n\n**2. 命名实体识别 (WikiANN)**\n使用 WikiANN (panx) 数据进行英语微调：\n```bash\nbash scripts\u002Ftrain.sh xlm-roberta-large panx\n```\n\n**3. 句子分类 (XNLI)**\n使用 MNLI (英语) 数据进行微调：\n```bash\nbash scripts\u002Ftrain.sh bert-base-multilingual-cased xnli\n```\n\n**4. 
机器阅读理解 (XQuAD \u002F MLQA \u002F TyDiQA)**\n支持多个问答数据集，以 XQuAD 为例：\n```bash\nbash scripts\u002Ftrain.sh xlm-roberta-large xquad\n```\n\n**5. 跨语言句子检索 (BUCC \u002F Tatoeba)**\n此类任务无需微调，直接使用预训练模型的表示计算相似度：\n```bash\nbash scripts\u002Ftrain.sh bert-base-multilingual-cased bucc2018\n```\n\n### 查看结果\n脚本运行完成后，预测结果将生成在相应的输出目录中。你可以使用官方提供的评估脚本对比预测结果与标签：\n\n```bash\npython evaluate.py --prediction_folder [你的预测文件夹路径] --label_folder [标签文件夹路径]\n```\n\n提交到 Leaderboard 时，请将所有 9 个任务的预测结果整理到一个文件夹中，每个任务一个子文件夹，文件命名格式为 `test-{语言代码}.{扩展名}` (QA 任务用 `.json`，其他用 `.tsv`)。","某跨国科技公司的算法团队正在研发一款支持全球市场的多语言情感分析模型，急需验证其在低资源语种上的泛化能力。\n\n### 没有 xtreme 时\n- **评估维度单一**：团队只能依赖常见的英语或少数主流语言数据集进行测试，无法得知模型在斯瓦希里语、泰米尔语等 40 种类型学多样语言上的真实表现。\n- **零样本迁移效果未知**：缺乏统一的零样本跨语言评估标准，难以判断模型是否真正学会了跨语言推理，还是仅仅过拟合了训练数据。\n- **基线对比困难**：由于缺少涵盖句法、语义等多层级任务的标准化基准，团队无法与业界最先进模型进行公平、全面的性能对标。\n- **数据准备繁琐**：自行收集并清洗覆盖 12 个语系的多语言测试数据耗时耗力，且难以保证数据质量和任务多样性。\n\n### 使用 xtreme 后\n- **全景式能力画像**：利用 xtreme 覆盖的 40 种语言和 9 项任务，团队迅速定位到模型在尼日尔 - 刚果语系等低资源语言上的薄弱环节。\n- **量化泛化增益**：通过标准的零样本跨语言设置，清晰验证了预训练模型在未见过语言上的迁移效果，证明了架构的鲁棒性。\n- **精准对标前沿**：直接复用 xtreme 提供的基线系统和排行榜机制，快速确认当前模型在全球范围内的竞争力排名。\n- **开箱即用数据**：一键下载包含句子分类、问答、检索等多种范式的高质量数据集，将原本数周的数据准备工作缩短至几小时。\n\nxtreme 通过提供大规模、多样化的标准化基准，帮助开发者从“盲目猜测”转向“数据驱动”，显著提升了多语言模型的研发效率与可靠性。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle-research_xtreme_9903b7e3.png","google-research","Google Research","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fgoogle-research_c23b2adf.png","",null,"https:\u002F\u002Fresearch.google","https:\u002F\u002Fgithub.com\u002Fgoogle-research",[83,87],{"name":84,"color":85,"percentage":86},"Python","#3572A5",63.2,{"name":88,"color":89,"percentage":90},"Shell","#89e051",36.8,652,110,"2026-03-16T05:27:11","Apache-2.0","未说明",{"notes":97,"python":98,"dependencies":99},"建议使用 Anaconda 管理环境。运行前需手动下载 PANX 数据集（NER 
任务），其余数据集可通过脚本自动下载。该工具主要用于评估多语言模型的跨语言泛化能力，支持零样本跨语言迁移实验。","3.7+",[100,101,102,103,104,105,106],"transformers","seqeval","tensorboardx","jieba","kytea","pythainlp","sacremoses",[15,46],"2026-03-27T02:49:30.150509","2026-04-08T03:56:13.331823",[111,116,121,126,131],{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},23674,"复现 MLQA 基准测试时，中文（zh）语言的评估分数异常低怎么办？","这是因为 Hugging Face Transformers 库中的 `squad_metrics.py` 文件对中文分词处理有误。解决方法是修改该文件：删除第 514 行，并在该位置添加 `final_text = tok_text`。修改后，重新运行 `predict_qa.py` 生成预测文件，然后再运行评估脚本即可得到正确结果。","https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fxtreme\u002Fissues\u002F38",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},23675,"运行 PANX（命名实体识别）任务预测步骤时，部分语言报错或无法处理怎么办？","这通常是因为某些语言的测试数据文件中存在连续的空行，导致预处理脚本 `utils_tag.py` 生成错误的特征。解决方法是在运行预测前，使用脚本清理数据文件中的连续空行。例如，可以使用以下 bash 命令遍历并修复所有 `.tsv` 文件：\nfor file in *.tsv\ndo\n  sed -i 'N;\u002F^\\n$\u002FD;P;D;' $file\ndone","https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fxtreme\u002Fissues\u002F16",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},23676,"运行 Tatoeba 任务时遇到 `transformers.modeling_bert` 相关的 ImportError 或版本兼容性问题如何解决？","这是由于 `transformers` 库版本不匹配导致的。建议不要直接使用预训练模型的表示，而是使用在 SQuAD 上微调过的模型进行检索。请检查 `run_tatoeba.sh` 脚本，将其中的模型路径替换为微调后的模型路径。此外，确保使用的 `transformers` 版本与代码库中第三方脚本（如 `bert.py`）所依赖的 API 兼容，必要时需根据报错调整导入路径或降级\u002F升级库版本。","https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fxtreme\u002Fissues\u002F85",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},23677,"XTREME-R 基准测试（包括 Mewsli-X 数据集）何时发布？","维护团队正在努力准备发布，但由于学术会议截止日期的影响，发布时间有所推迟。官方表示正在尽快推进发布工作，以便社区能够在新包含的任务上测试和评估模型。","https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fxtreme\u002Fissues\u002F67",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},23678,"使用 mBERT 复现 XQUAD 结果时，越南语（vi）和泰语（th）等语言的分数远低于论文数据，原因是什么？","这是一个已知的复现性问题。虽然英语、西班牙语和德语的结果与论文相当，但部分语言（如 vi, th, hi, el）存在显著差距。这可能与特定的预处理步骤、分词器设置或评估脚本的细节有关。如果遇到此问题，建议仔细核对 README 中的每一步指令，并参考其他成功复现者的配置。如果问题持续，应在对应的 
Issue 中标记维护者寻求具体参数确认。","https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fxtreme\u002Fissues\u002F8",[]]