[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-allenai--natural-instructions":3,"tool-allenai--natural-instructions":64},[4,17,25,39,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":10,"last_commit_at":23,"category_tags":24,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":26,"name":27,"github_repo":28,"description_zh":29,"stars":30,"difficulty_score":10,"last_commit_at":31,"category_tags":32,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[33,34,35,36,14,37,15,13,38],"图像","数据工具","视频","插件","其他","音频",{"id":40,"name":41,"github_repo":42,"description_zh":43,"stars":44,"difficulty_score":45,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[14,33,13,15,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":45,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[15,33,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":45,"last_commit_at":62,"category_tags":63,"status":16},2181,"OpenHands","OpenHands\u002FOpenHands","OpenHands 是一个专注于 AI 
驱动开发的开源平台，旨在让智能体（Agent）像人类开发者一样理解、编写和调试代码。它解决了传统编程中重复性劳动多、环境配置复杂以及人机协作效率低等痛点，通过自动化流程显著提升开发速度。\n\n无论是希望提升编码效率的软件工程师、探索智能体技术的研究人员，还是需要快速原型验证的技术团队，都能从中受益。OpenHands 提供了灵活多样的使用方式：既可以通过命令行（CLI）或本地图形界面在个人电脑上轻松上手，体验类似 Devin 的流畅交互；也能利用其强大的 Python SDK 自定义智能体逻辑，甚至在云端大规模部署上千个智能体并行工作。\n\n其核心技术亮点在于模块化的软件智能体 SDK，这不仅构成了平台的引擎，还支持高度可组合的开发模式。此外，OpenHands 在 SWE-bench 基准测试中取得了 77.6% 的优异成绩，证明了其解决真实世界软件工程问题的能力。平台还具备完善的企业级功能，支持与 Slack、Jira 等工具集成，并提供细粒度的权限管理，适合从个人开发者到大型企业的各类用户场景。",70626,"2026-04-05T22:51:36",[15,14,13,36],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":83,"stars":100,"forks":101,"last_commit_at":102,"license":103,"difficulty_score":104,"env_os":78,"env_gpu":105,"env_ram":105,"env_deps":106,"category_tags":109,"github_topics":79,"view_count":45,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":110,"updated_at":111,"faqs":112,"releases":148},2872,"allenai\u002Fnatural-instructions","natural-instructions","Expanding natural instructions ","natural-instructions 是一个由社区共同构建的开源项目，旨在收集海量自然语言任务指令及其定义。它解决了传统人工智能模型只能针对特定任务进行训练、难以泛化到未见过的全新任务的痛点。通过让模型学习理解多样化的自然语言指令，该项目致力于培养出能够灵活应对各类新任务的通用型 AI。\n\n该项目非常适合自然语言处理（NLP）领域的研究人员和开发者使用。无论是希望探索大模型指令跟随能力的学者，还是想要基于高质量指令数据微调模型的工程师，都能从中获益。其核心亮点在于规模庞大的任务库，目前已汇聚了超过 1500 项由全球贡献者提供的任务，涵盖广泛的推理能力。每个任务都遵循严谨的结构，包含清晰的定义、输入输出示例及正负样本，为训练下一代具备更强泛化能力的 AI 模型提供了宝贵的“游乐场”。如果你正在寻找提升模型零样本或少样本学习能力的数据资源，natural-instructions 值得重点关注。","# A Repository of Language Instructions for NLP Tasks\n\n**TLDR;** this repository maintains a community effort to create a large collection of tasks and their natural language definitions\u002Finstructions. 
\nCheck the [releases](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Freleases) for the summary of the latest changes and additions to the tasks.  \nIf you have any suggestions to improve the data, let us know. We're looking for more contributions to make this data better and bigger! 🙌  \n\n### News Bulletin\n\n- *May 2022:* We released several models trained on our data. Check out the [code](https:\u002F\u002Fgithub.com\u002Fyizhongw\u002FTk-Instruct) and [checkpoints](https:\u002F\u002Fhuggingface.co\u002Fmodels?search=tk-instruct-).\n- *April 2022:* A [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.07705) on our data is out!\n- *October 15, 2021:* the goal date for our v2 dataset.\n  - The community has contributed over 1500 tasks!! 🎉\n  - We are working on cleaning up the new tasks and publishing a paper summarizing our new findings!\n  - You can still submit new tasks! The new tasks will be part of the future data releases.\n- *Sept 2021*: general [call for contributions](https:\u002F\u002Fmedium.com\u002Fai2-blog\u002Fcall-for-contributions-a-community-driven-repository-of-natural-language-instructions-9d3f24d5a9db) is out!\n- *June 2021:* we initiated this repository with 61 tasks!\n\n## Background \n### Why define tasks in natural language?\nWhile the current dominant paradigm (supervised learning with task-specific labeled examples) has been successful in building task-specific models, such models can't generalize to unseen tasks; for example, a model that is trained to answer questions cannot solve a classification task. 
\nWe hypothesize that a model able to understand and reason with natural language instructions should be able to generalize to any task that can be defined in terms of natural language.\n\n### Any empirical evidence that this might be true?\nIn our [earlier effort](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.08773), we built a smaller dataset (61 tasks) and \nobserved that language models benefit from language instructions, i.e., their generalization to unseen tasks improves when they are provided with more instructions.  \nAlso, generalization to unseen tasks improves as the model is trained on more tasks.\n\n### Why build this dataset?  \nWe believe that [our earlier work](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.08773) is just scratching the surface and there is probably much more that can be studied in this setup.\nWe hope to put together a much larger dataset that covers a wider range of reasoning abilities. \nWe believe that this expanded dataset will serve as a useful playground for the community to study and build the next generation of AI\u002FNLP models.\nSee [this blog post](https:\u002F\u002Fmedium.com\u002Fai2-blog\u002Fcall-for-contributions-a-community-driven-repository-of-natural-language-instructions-9d3f24d5a9db) for a summary of the motivation behind this work.\n\n\n## Task schema  \nEach task consists of input\u002Foutput. 
For example, think of the task of sentiment classification:  \n - **Input:** `I thought the Spiderman animation was good, but the movie disappointed me.`\n - **Output:** `Mixed` \n\nHere is another example from the same task: \n - **Input:** `The pumpkin was one of the worst that I've had in my life.` \n - **Output:**  `Negative`  \n\nAdditionally, each task contains a task *definition*: \n```\nGiven a tweet, classify it into one of 4 categories: Positive, Negative, Neutral, or Mixed.\n``` \n\nOverall, each task follows this schema:\n \n![](doc\u002Fschema-simplified.svg ) \n\nOr if you're comfortable with json files, here is what it looks like: \n```json \n{\n  \"Contributors\": [\"\"],\n  \"Source\": [\"\"],\n  \"URL\": [\"\"],\n  \"Categories\": [\"\"],\n  \"Reasoning\": [\"\"],\n  \"Definition\": [\"\"],\n  \"Input_language\": [\"\"], \n  \"Output_language\": [\"\"],\n  \"Instruction_language\": [\"\"],  \n  \"Domains\": [\"\"],    \n  \"Positive Examples\": [ { \"input\": \"\", \"output\": \"\",  \"explanation\": \"\"} ], \n  \"Negative Examples\": [ { \"input\": \"\", \"output\": \"\",  \"explanation\": \"\"} ],\n  \"Instances\": [ { \"id\": \"\", \"input\": \"\", \"output\": [\"\"]} ]\n}\n```\n\n## How to contribute \nWe would appreciate any external contributions! 🙏 You can contribute in a variety of ways. \n - If you think an important task is missing, you can contribute it via Pull-Request.  You can also get inspiration from the task suggestions in [the GitHub issues](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fissues?q=is%3Aissue+is%3Aopen+label%3Atask-suggestion) which you can sign up to work on. \n - If you have any other suggested tasks but you're not sure if they're a good fit, bring them up in the [issues](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fissues).  
\n - If you have any questions or suggestions, please use [the issues](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fissues) feature.  \n - If you're adding a new task, make sure to review the following guidelines: \n    * Each task must contain a `.json` file that contains the task content. You can look inside the [`tasks\u002F`](tasks) directory for several examples.  \n       * Make sure that your json is human readable (use proper indentation; e.g., in Python: `json.dumps(your_json_string, indent=4, ensure_ascii=False)`)   \n       * Make sure that your json file is not bigger than 50MB. \n       * Make sure your task has no more than 6.5k instances (input\u002Foutput pairs).\n       * Each instance must have a unique id, which should be the task number plus a string generated by `uuid.uuid4().hex`. E.g., `task1356-bb5ff013dc5d49d7a962e85ed1de526b`.\n       * Make sure to include task category and domains, based on [this list](doc\u002Ftask-hierarchy.md). \n       * Make sure to number your task json correctly \n          * Look at the task number in the latest pull request; the task number in your submission should be the next number. \n          * Make sure to include the source dataset name and the task type when naming your task json file. \n             * You can use this format: `taskabc_\u003Csource_dataset>_\u003Ctask_type>.json` E.g. in `task001_quoref_question_generation.json`, the source dataset is `quoref` and the task is `question generation`. \n       * Note that the source need not be a dataset; it can also be a website, e.g. leetcode. \n          * If you have created the json without any reference, use `synthetic` in place of source.\n       * You should have one pull request per dataset. 
Name your pull request as `Task Name \u003Cstart_task_number>-\u003Cend_task_number>`.\n       * If you're building your tasks based on existing datasets and their crowdsourcing templates, see these [guidelines](doc\u002Fcrowdsourcing.md). \n    * Add your task to [our list of tasks](tasks\u002FREADME.md).\n    * To make sure that your addition is formatted correctly, run the tests: `> python src\u002Ftest_all.py`\n       * To only test the formatting of a range of tasks, run `> python src\u002Ftest_all.py --task \u003Cbegin_task_number> \u003Cend_task_number>`. For example, running `> python src\u002Ftest_all.py --task 5 10` will run the tests from task005 to task010.\n\n## Benchmarking cross-task generalization\n\nAs introduced in our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.07705), this dataset can be used for systematic study of cross-task generalization, i.e., training on a subset of tasks and evaluating on the remaining unseen ones. To make the comparison among different methods easier, we created an official split [here](splits\u002F), as described in the paper. You can follow the instructions to set up your experiments.\n\nWe also released our [experiment code](https:\u002F\u002Fgithub.com\u002Fyizhongw\u002FTk-Instruct) and [checkpoints](https:\u002F\u002Fhuggingface.co\u002Fmodels?search=tk-instruct-) for reproducibility and future research.\n\n## License \nAll the data here (except the instances of each task) are released under the Apache-2.0 license. \nThe instances of each task are subject to the license under which the original dataset was released. \nThis license information is available under the \"Instance License\" field within each task file. \n\n\n## Misc.\n\nIf you want to use Natural Instructions v1, here's the code: [link](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-v1)\n\nFeel free to cite us. 
\n\n```bibtex\n@inproceedings{naturalinstructions,\n  title={Cross-task generalization via natural language crowdsourcing instructions},\n  author={Mishra, Swaroop and Khashabi, Daniel and Baral, Chitta and Hajishirzi, Hannaneh},\n  booktitle={ACL},\n  year={2022}\n}\n@inproceedings{supernaturalinstructions,\n  title={Super-NaturalInstructions:Generalization via Declarative Instructions on 1600+ Tasks},\n  author={Wang, Yizhong and Mishra, Swaroop and Alipoormolabashi, Pegah and Kordi, Yeganeh and Mirzaei, Amirreza and Arunkumar, Anjana and Ashok, Arjun and Dhanasekaran, Arut Selvan and Naik, Atharva and Stap, David and others},\n  booktitle={EMNLP},\n  year={2022}\n}\n```\n","# 用于自然语言处理任务的语言指令库\n\n**简而言之：** 本仓库致力于通过社区协作，构建一个包含大量任务及其自然语言定义\u002F指令的集合。\n请查看[发布页面](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Freleases)，了解最新任务变更和新增内容的概览。  \n如果您有任何改进建议，请随时告诉我们。我们期待更多贡献，让这份数据更加完善、更加丰富！🙌  \n\n### 最新动态\n\n- *2022年5月：* 我们发布了基于该数据集训练的多个模型。请参阅[代码](https:\u002F\u002Fgithub.com\u002Fyizhongw\u002FTk-Instruct)和[检查点](https:\u002F\u002Fhuggingface.co\u002Fmodels?search=tk-instruct-)。\n- *2022年4月：* 关于我们数据集的一篇[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.07705)已发表！\n- *2021年10月15日：* 我们v2版本数据集的目标日期。\n  - 社区已贡献超过1500个任务！！🎉\n  - 我们正在整理新增任务，并计划发表一篇总结新发现的论文！\n  - 您仍然可以提交新的任务！这些新任务将纳入未来的数据发布中。\n- *2021年9月：* 发布了面向社区的[征集公告](https:\u002F\u002Fmedium.com\u002Fai2-blog\u002Fcall-for-contributions-a-community-driven-repository-of-natural-language-instructions-9d3f24d5a9db)！\n- *2021年6月：* 我们以61个任务启动了这个仓库！\n\n## 背景  \n### 为什么用自然语言来定义任务？\n尽管当前主流范式（即基于特定任务标注数据的监督学习）在构建特定任务模型方面取得了成功，但这类模型难以泛化到未见过的任务。例如，一个经过问题解答任务监督训练的模型，无法直接解决分类任务。\n我们假设，如果模型具备理解与推理自然语言指令的能力，那么它应该能够泛化到任何可以用自然语言描述的任务。\n\n### 是否有实验证据支持这一假设？\n在我们之前的[研究](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.08773)中，我们构建了一个较小的数据集（61个任务），并观察到语言模型确实受益于语言指令——当提供更多的指令时，它们对未见任务的泛化能力显著提升。此外，随着模型在更多任务上进行训练，其对未见任务的泛化能力也会进一步增强。\n\n### 
为什么要构建这个数据集？\n我们认为，先前的工作[仅触及了冰山一角]，在这种设置下仍有大量值得深入研究的内容。我们希望汇集一个规模更大、涵盖更广泛推理能力的数据集。相信这个扩展后的数据集将成为社区研究和开发下一代人工智能\u002F自然语言处理模型的宝贵平台。\n有关这项工作的动机概述，请参阅[这篇博客文章](https:\u002F\u002Fmedium.com\u002Fai2-blog\u002Fcall-for-contributions-a-community-driven-repository-of-natural-language-instructions-9d3f24d5a9db)。\n\n## 任务模式  \n每个任务由输入和输出组成。以情感分类任务为例：  \n - **输入：** `我觉得蜘蛛侠的动画不错，但这部电影让我很失望。`\n - **输出：** `混合`  \n\n以下是同一任务的另一个示例：  \n - **输入：** `这南瓜是我吃过的最糟糕的之一。`\n - **输出：** `负面`  \n\n此外，每个任务还包含一个任务*定义*：  \n```\n给定一条推文，将其归类为以下四类之一：正面、负面、中性或混合。\n```  \n\n总体而言，每个任务遵循如下模式：\n![](doc\u002Fschema-simplified.svg ) \n\n或者，如果您熟悉JSON格式，也可以这样表示：  \n```json \n{\n  \"Contributors\": [\"\"],\n  \"Source\": [\"\"],\n  \"URL\": [\"\"],\n  \"Categories\": [\"\"],\n  \"Reasoning\": [\"\"],\n  \"Definition\": [\"\"],\n  \"Input_language\": [\"\"], \n  \"Output_language\": [\"\"],\n  \"Instruction_language\": [\"\"],  \n  \"Domains\": [\"\"],    \n  \"Positive Examples\": [ { \"input\": \"\", \"output\": \"\",  \"explanation\": \"\"} ], \n  \"Negative Examples\": [ { \"input\": \"\", \"output\": \"\",  \"explanation\": \"\"} ],\n  \"Instances\": [ { \"id\": \"\", \"input\": \"\", \"output\": [\"\"]} ],\n}\n```\n\n## 如何贡献\n我们非常欢迎任何外部贡献！🙏 您可以通过多种方式参与贡献。\n- 如果您认为缺少一项重要任务，可以通过 Pull Request 提交。您也可以参考 [GitHub 问题](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fissues?q=is%3Aissue+is%3Aopen+label%3Atask-suggestion) 中的任务建议，并选择感兴趣的任务进行开发。\n- 如果您有其他任务建议，但不确定是否合适，可以在 [问题](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fissues) 中提出讨论。\n- 如果您有任何疑问或建议，请使用 [问题](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fissues) 功能。\n- 如果您要添加新任务，请务必遵循以下指南：\n  * 每个任务必须包含一个 `.json` 文件，其中包含任务内容。您可以查看 [`tasks\u002F`](tasks) 目录中的示例。\n    * 确保您的 JSON 文件易于阅读（使用适当的缩进；例如，在 Python 中可以使用 `json.dumps(your_json_string, indent=4, ensure_ascii=False)`）。\n    * 确保 JSON 文件大小不超过 50MB。\n    * 确保每个任务的实例数量不超过 6,500 对（输入\u002F输出对）。\n    * 
每个实例必须有一个唯一的 ID，格式为任务编号加上由 `uuid.uuid4().hex` 生成的字符串。例如：`task1356-bb5ff013dc5d49d7a962e85ed1de526b`。\n    * 根据 [此列表](doc\u002Ftask-hierarchy.md)，请明确标注任务类别和领域。\n    * 请正确编号您的任务 JSON 文件：\n      * 查看最新 Pull Request 中的任务编号，您提交的任务编号应为下一个数字。\n      * 命名任务 JSON 文件时，请注明来源数据集名称和任务类型。\n        * 可以使用如下格式：`taskabc_\u003Csource_dataset>_\u003Ctask_type>.json`。例如，`task001_quoref_question_generation.json` 表示来源数据集是 `quoref`，任务类型是“问题生成”。\n    * 需要注意的是，来源不一定是数据集，也可能是网站等，例如 LeetCode。如果您是在没有参考的情况下创建的 JSON 文件，请在来源处填写“synthetic”。\n    * 每个数据集应提交一个 Pull Request。Pull Request 的命名格式为：“任务名称 \u003C起始任务编号>-\u003C结束任务编号>”。\n    * 如果您基于现有数据集及其众包模板构建任务，请参阅 [众包指南](doc\u002Fcrowdsourcing.md)。\n  * 将您的任务添加到 [我们的任务列表](tasks\u002FREADME.md) 中。\n  * 为确保您的新增内容格式正确，请运行测试：`> python src\u002Ftest_all.py`。\n    * 如果只想测试特定范围任务的格式，可以运行：`> python src\u002Ftest_all.py --task \u003C开始任务编号> \u003C结束任务编号>`。例如，运行 `> python src\u002Ftest_all.py --task 5 10` 将会测试从 task005 到 task010 的格式。\n\n## 跨任务泛化基准测试\n如我们在 [论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.07705) 中所述，该数据集可用于系统性地研究跨任务泛化能力，即在部分任务上进行训练，然后在剩余未见过的任务上进行评估。为了便于比较不同方法的效果，我们按照论文中的描述，在 [此处](splits\u002F) 创建了一个官方划分。您可以按照说明设置您的实验。\n\n我们还发布了用于复现和未来研究的 [实验代码](https:\u002F\u002Fgithub.com\u002Fyizhongw\u002FTk-Instruct) 和 [检查点](https:\u002F\u002Fhuggingface.co\u002Fmodels?search=tk-instruct-)。\n\n## 许可证\n此处的所有数据（除各任务的实例外）均采用 Apache-2.0 许可证发布。\n各任务的实例则遵循其原始数据集所使用的许可证。这些许可证信息可在每个任务文件中的“Instance License”字段中找到。\n\n## 其他\n如果您想使用 Natural Instructions v1，代码链接如下：[链接](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-v1)\n\n欢迎引用我们的工作。\n\n```bibtex\n@inproceedings{naturalinstructions,\n  title={通过自然语言众包指令实现跨任务泛化},\n  author={Mishra, Swaroop 和 Khashabi, Daniel 和 Baral, Chitta 和 Hajishirzi, Hannaneh},\n  booktitle={ACL},\n  year={2022}\n}\n@inproceedings{supernaturalinstructions,\n  title={Super-NaturalInstructions：通过 1600 多项任务的声明式指令实现泛化},\n  author={Wang, Yizhong 和 Mishra, Swaroop 和 Alipoormolabashi, Pegah 和 Kordi, Yeganeh 和 Mirzaei, Amirreza 和 
Arunkumar, Anjana 和 Ashok, Arjun 和 Dhanasekaran, Arut Selvan 和 Naik, Atharva 和 Stap, David 等},\n  booktitle={EMNLP},\n  year={2022}\n}\n```","# Natural Instructions 快速上手指南\n\nNatural Instructions 是一个由社区驱动的大型自然语言指令数据集仓库，旨在通过自然语言定义的任务来训练和评估模型跨任务泛化能力。本指南将帮助你快速获取数据并了解其结构。\n\n## 环境准备\n\n本项目主要提供数据文件（JSON 格式）和相关测试脚本，无需复杂的深度学习框架即可浏览数据，但若需运行官方测试或进行模型训练，建议准备以下环境：\n\n*   **操作系统**: Linux, macOS 或 Windows (需支持 Python)\n*   **Python 版本**: 3.6 或更高版本\n*   **前置依赖**:\n    *   `git`: 用于克隆仓库\n    *   `python`: 运行测试脚本\n    *   (可选) `torch`, `transformers`: 若需加载官方发布的 Tk-Instruct 模型进行检查点测试或微调\n\n> **国内加速建议**:\n> 由于 GitHub 访问可能不稳定，建议使用国内镜像源克隆仓库，或配置 Git 代理。\n> *   使用 Gitee 镜像（如有同步）: `git clone https:\u002F\u002Fgitee.com\u002Fmirror\u002Fnatural-instructions.git`\n> *   或者配置 Git 全局代理后克隆原仓库。\n\n## 安装步骤\n\n1.  **克隆仓库**\n    打开终端，执行以下命令获取最新代码和数据：\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion.git\n    cd natural-instructions-expansion\n    ```\n\n2.  **验证环境（可选）**\n    如果你计划贡献新任务或验证数据格式，可以运行官方提供的测试脚本。确保已安装 Python 依赖（通常只需标准库，若报错可尝试 `pip install -r requirements.txt`，如果存在该文件）：\n    ```bash\n    python src\u002Ftest_all.py\n    ```\n    *注：若只想测试特定范围的任务（例如任务 5 到 10），可运行：*\n    ```bash\n    python src\u002Ftest_all.py --task 5 10\n    ```\n\n## 基本使用\n\nNatural Instructions 的核心是位于 `tasks\u002F` 目录下的 JSON 文件。每个文件代表一个特定的 NLP 任务，包含任务定义、正负示例及大量实例。\n\n### 1. 
浏览数据结构\n你可以直接使用文本编辑器或 Python 读取任务文件。以下是一个简单的 Python 示例，展示如何加载并查看第一个任务的定义和样本：\n\n```python\nimport json\nimport os\n\n# 定位到 tasks 目录\ntasks_dir = \"tasks\"\n# 获取第一个任务文件 (假设文件名排序后第一个)\ntask_files = sorted([f for f in os.listdir(tasks_dir) if f.endswith('.json')])\nfirst_task_file = os.path.join(tasks_dir, task_files[0])\n\nwith open(first_task_file, 'r', encoding='utf-8') as f:\n    task_data = json.load(f)\n\n# 查看任务定义\nprint(\"任务定义:\", task_data[\"Definition\"][0])\nprint(\"输入语言:\", task_data[\"Input_language\"][0])\nprint(\"输出语言:\", task_data[\"Output_language\"][0])\n\n# 查看一个正样本示例\nif task_data[\"Positive Examples\"]:\n    example = task_data[\"Positive Examples\"][0]\n    print(\"\\n--- 正样本示例 ---\")\n    print(\"输入:\", example[\"input\"])\n    print(\"输出:\", example[\"output\"])\n    print(\"解释:\", example.get(\"explanation\", \"无\"))\n\n# 查看实际数据实例 (Instances)\nif task_data[\"Instances\"]:\n    instance = task_data[\"Instances\"][0]\n    print(\"\\n--- 实际数据实例 ---\")\n    print(\"ID:\", instance[\"id\"])\n    print(\"输入:\", instance[\"input\"])\n    print(\"输出:\", instance[\"output\"])\n```\n\n### 2. 数据字段说明\n每个任务 JSON 文件遵循以下核心结构：\n*   `Definition`: 任务的自然人语言描述（即 Prompt 指令）。\n*   `Positive Examples` \u002F `Negative Examples`: 用于演示正确和错误行为的示例。\n*   `Instances`: 实际用于训练或评估的输入\u002F输出对列表。\n*   `Categories` \u002F `Domains`: 任务所属的类别和领域。\n\n### 3. 
使用官方模型 (进阶)\n如果你想直接使用基于此数据训练的模型（Tk-Instruct），可以通过 Hugging Face 加载：\n\n```python\nfrom transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n\n# 加载模型和分词器 (以 tk-instruct-3b-def-pos 为例)\nmodel_name = \"yizhongw\u002Ftk-instruct-3b-def-pos\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForSeq2SeqLM.from_pretrained(model_name)\n\n# 构建输入 (指令 + 输入)\ninstruction = \"Given a tweet, classify it into one of 4 categories: Positive, Negative, Neutral, or Mixed.\"\ninput_text = \"I thought the Spiderman animation was good, but the movie disappointed me.\"\nprompt = f\"{instruction}\\n{input_text}\"\n\n# 推理\ninputs = tokenizer(prompt, return_tensors=\"pt\")\noutputs = model.generate(**inputs, max_length=50)\nresult = tokenizer.decode(outputs[0], skip_special_tokens=True)\n\nprint(result)\n```\n\n> **注意**: 数据中的实例（Instances）版权遵循原始数据集的许可证，具体信息可在每个任务文件的 `Instance License` 字段中查阅。其他元数据遵循 Apache-2.0 协议。","某初创公司的 NLP 团队正试图构建一个能同时处理情感分析、文本摘要和意图识别的通用客服机器人，但受限于标注数据稀缺。\n\n### 没有 natural-instructions 时\n- **模型泛化能力差**：为每个新任务（如从“判断情绪”切换到“提取关键词”）都必须重新收集大量特定标注数据并单独训练模型，耗时耗力。\n- **冷启动成本高昂**：面对从未见过的任务类型，模型完全无法理解需求，只能返回错误或乱码，导致新业务上线周期长达数周。\n- **指令理解僵化**：模型仅依赖固定的输入输出模式，无法通过自然语言描述（如“请把这段话改成委婉的语气”）来动态调整行为，灵活性极低。\n\n### 使用 natural-instructions 后\n- **跨任务零样本迁移**：利用 natural-instructions 中收录的 1500+ 多样化任务定义，模型仅需读取新的自然语言指令即可直接执行未见过的任务，无需额外训练。\n- **快速响应新需求**：当需要增加“检测讽刺语气”等新功能时，直接调用数据集中类似的推理任务模板，将开发周期从数周缩短至几小时。\n- **真正的指令跟随**：模型学会了根据丰富的自然语言定义进行推理，能够准确理解复杂的任务约束（如“只输出 JSON 格式”或“忽略无关背景”），大幅提升交互智能。\n\nnatural-instructions 通过提供大规模、多样化的自然语言任务定义，让 AI 
模型从“专才”进化为能通过指令即时学习新技能的“通才”。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fallenai_natural-instructions_5c37946d.png","allenai","Ai2","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fallenai_65c450d5.png","",null,"ai2-info@allenai.org","http:\u002F\u002Fwww.allenai.org","https:\u002F\u002Fgithub.com\u002Fallenai",[84,88,92,96],{"name":85,"color":86,"percentage":87},"Python","#3572A5",76.9,{"name":89,"color":90,"percentage":91},"Jinja","#a52a22",22.1,{"name":93,"color":94,"percentage":95},"Shell","#89e051",0.6,{"name":97,"color":98,"percentage":99},"Dockerfile","#384d54",0.3,1039,198,"2026-03-30T04:49:57","Apache-2.0",1,"未说明",{"notes":107,"python":105,"dependencies":108},"该仓库主要是一个自然语言指令数据集集合，而非直接可运行的模型推理代码。README 中提到的模型训练和实验代码位于外部链接（Tk-Instruct 仓库）。本地使用主要涉及通过 Python 脚本（如 src\u002Ftest_all.py）验证 JSON 格式数据，具体运行环境需求需参考关联的 Tk-Instruct 项目或自行根据数据处理规模设定。",[],[15,34],"2026-03-27T02:49:30.150509","2026-04-06T10:24:04.870587",[113,118,123,128,133,138,143],{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},13275,"项目是否有发布标准的评估协议或训练\u002F测试集划分？","是的，官方评估划分已发布。您可以在此处找到：https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Ftree\u002Fmaster\u002Fsplits。请按照该目录中的说明设置实验。此外，项目还发布了 Tk-Instruct 模型的代码（https:\u002F\u002Fgithub.com\u002Fyizhongw\u002FTk-Instruct）和检查点（可在 HuggingFace 搜索 tk-instruct-）。","https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fissues\u002F757",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},13276,"如果发现任务中存在重复的实例（输入相同但输出不同），应该如何处理？","需要将具有相同输入但不同有效输出的实例合并为一个实例（即一个输入对应多个 distinct 的有效输出列表）。维护者建议编写 Python 脚本来检查重复项，修复错误后，将提交放到分支上并发送 Pull Request (PR) 以便合并到主分支。","https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fissues\u002F39",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},13277,"某些任务（如 MMMLU 系列）的指令描述不够清晰，导致标注者无法理解预期输出，该如何改进？","需要完善任务定义（task definitions），明确解释预期输出与主题的关系。例如，不仅要说“这是一个关于商业伦理的问题”，还需说明答案应选择更符合伦理还是更不符合伦理的选项，并解释相关概念。同时，建议参考众包反馈（如 Google Sheets 
中的人类预测数据），根据标注者的理解困难进一步优化指令。","https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fissues\u002F631",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},13278,"发现某些任务的实例信息不足，导致无法根据提供的信息正确回答问题，如何解决？","对于因缺少上下文而无法回答的实例，通常的解决方案是删除这些实例中导致困惑的特定短语，或者直接移除该实例。维护者确认如果实例本身在原始数据集中就存在缺陷，且数量不多，可以直接通过 PR 移除以避免引入偏差。","https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fissues\u002F662",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},13279,"如何提交基于新数据集（如 ATOMIC）的新任务建议？","维护者建议直接提交包含新任务定义的 Pull Request (PR)。不需要预先进行过多的讨论，团队会根据 PR 的具体内容来审查和解决问题。如果您已经准备好了正负样本和实例，可以直接开启 PR。","https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fissues\u002F217",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},13280,"发现某个任务的数据分布严重倾斜（例如过度集中在某一类别），应该重采样还是下采样？","首先需检查数据倾斜的原因。如果是由于数据源中某些标签是唯一出现的谓词，可能需要查找未文档化的特殊标签定义。解决方案通常是通过 PR 补充缺失的标签定义或修正数据。如果问题实例数量较少，直接移除可能比重采样更简单且不会引起明显的分布偏差。","https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fissues\u002F652",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},13281,"收到众包工人关于任务指令的反馈（如大小写规范、负样本改进建议），应如何处理？","应根据反馈更新任务指令。例如：在负样本中说明如何修改标题使其无效；在示例中包含涵盖“谁、什么、何时、何地、为什么”的回答；确保专有名词（如 Hitler, Louisiana）使用正确的大写字母。修改完成后，还需检查人类预测结果，确保指令改进能提升标注者的理解。","https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fissues\u002F759",[149,154,159,164,169,174,179,184,189],{"id":150,"version":151,"summary_zh":152,"released_at":153},71964,"v2.8","## 变更内容\n* 由 @danyaljj 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F776 中更新了 README.md\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fcompare\u002Fv2.7...v2.8","2023-02-01T12:57:36",{"id":155,"version":156,"summary_zh":157,"released_at":158},71965,"v2.7","## 变更内容\n* @manandey 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F770 中修复了拼写错误\n* @yeganehkordi 在 
https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F774 中解决了数据泄露问题\n* @aviaefrat 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F775 中修复了任务 README 中的一个小拼写错误\n\n## 新贡献者\n* @aviaefrat 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F775 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fcompare\u002Fv2.6...v2.7","2022-10-10T15:09:22",{"id":160,"version":161,"summary_zh":162,"released_at":163},71966,"v2.6","## 变更内容\n* 根据 @Palipoor 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F761 中的人工反馈，更新了任务 200 - 600。\n* 解决了 @Palipoor 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F762 中针对任务 600 - 800 的反馈意见。\n* 处理了 @yeganehkordi 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F760 中收到的众包工作者反馈。\n* @swarooprm 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F763 中更新了 README.md 文件。\n* @yizhongw 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F764 中添加了标准评估设置。\n* @manandey 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F765 中修复了一个拼写错误。\n* @lkm2835 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F766 中修正了 rouge 中的一个参数名称错误。\n* @manandey 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F767 中修复了多个拼写错误。\n* @yizhongw 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F768 中对排行榜的设置进行了更新。\n* @yizhongw 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F769 中添加了关于如何生成参考文件的说明。\n\n## 新贡献者\n* @manandey 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F765 中完成了他们的首次贡献。\n* @lkm2835 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F766 
中完成了他们的首次贡献。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fcompare\u002Fv2.5...v2.6","2022-07-01T17:37:17",{"id":165,"version":166,"summary_zh":167,"released_at":168},71967,"v2.5","## 变更内容\n* 使用新的数据源、URL 和类别更新任务 JSON 文件。由 @yizhongw 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F742 中完成。\n* 添加 Web NLG、Personachat、Quartz 和 WiQA 数据集。由 @Palipoor 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F740 中完成。\n* 增加几行代码，用于打印平均实例数量。由 @Palipoor 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F743 中完成。\n* 从提示来源中补充缺失的任务。由 @yeganehkordi 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F744 中完成。\n* 从提示来源中补充缺失的任务。由 @yeganehkordi 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F746 中完成。\n* 修复数据源中的一个拼写错误。由 @yeganehkordi 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F749 中完成。\n* 调整任务 1296 的顺序。由 @yeganehkordi 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F747 中完成。\n* 从定义中删除多余的词语。由 @yeganehkordi 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F750 中完成。\n* 从定义中删除多余的词语。由 @yeganehkordi 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F751 中完成。\n* 更新任务 1509_evaluation_antonyms。由 @yeganehkordi 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F753 中完成。\n* 更新任务 575 和任务 1599。由 @yeganehkordi 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F756 中完成。\n* 使用测试类别中的任务创建用于众包标注的单个文件。由 @yizhongw 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F758 中完成。\n* 修复 README 中存在多余列的几行内容。由 @Palipoor 在 https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F754 中完成。\n\n## 新贡献者\n* @yizhongw 在 
https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F742, made their first contribution.\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fcompare\u002Fv2.4...v2.5","2022-04-12T04:19:44",{"id":170,"version":171,"summary_zh":172,"released_at":173},71968,"v2.4","## What's Changed\n* Crowdworker evaluation of tasks [in progress], by @danyaljj in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F276\n* Revise categories: tasks 1-500, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F695\n* Remove a misleading phrase from the instructions of tasks 702, 711, and 712, by @Palipoor in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F697\n* Add examples to task 697 and modify some instances to fix #661, by @Palipoor in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F698\n* Remove tasks 1562 and 1563 because of the ambiguity reported in #685, by @Palipoor in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F699\n* Address imbalance in instances, by @Palipoor in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F702\n* Fix the Task050 label imbalance error, by @swarooprm in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F704\n* Rename the duplicate task 119, by @Palipoor in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F708\n* Add English instances to task 265, by @Palipoor in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F707\n* Revise the remaining tasks, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F705\n* Update test_all, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F710\n* Remove duplicate and incorrect domains from backup tasks, and delete two tasks, by @Palipoor in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F709\n* Remove URLs from tasks, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F711\n* Add reasoning field: tasks 1-50, by @yeganehkordi in 
https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F712\n* Add reasoning field: tasks 51-200, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F713\n* Add reasoning: tasks 201-400, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F714\n* Add reasoning: tasks 401-700, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F715\n* Add reasoning: tasks 701-1014, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F716\n* Add reasoning: tasks 1015-1316, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F717\n* Add reasoning: tasks 1317 to the end, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F718\n* Update tests and hierarchy, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F720\n* Fix Unicode character issues, by @Palipoor in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F723\n* Remove qualitative reasoning, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F724\n* Tasks 1385-1390: ANLI, CB, HellaSwag, WSC, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F719\n* Script for updating categories, domains, and reasoning, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions\u002Fpull\u002F722\n* Change definitions to lists of strings, by @Palipoor in https:\u002F","2022-03-26T01:27:26",{"id":175,"version":176,"summary_zh":177,"released_at":178},71969,"v2.3","More improvements: fixed several bugs and refined the task category assignments.\n\n## What's Changed\n* Fix issue #652, by @RushangKaria in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F680.\n* Fix #488: marked all pronouns in all instances, by @pulkitverma25 in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F676.\n* Matres task enhancements, by @yeganehkordi in 
https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F684.\n* Update task1489_sarcasmdetection_tweet_classification.json, by @danyaljj in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F688.\n* Tasks 1400-1425: assign categories and domains and address feedback, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F651.\n* Tasks 1100-1200: assign categories and domains, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F683.\n* Tasks 1550-1600: assign categories and domains, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F687.\n* Tasks 1600-1726: assign categories and domains, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F690.\n* Final version of the task hierarchy, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F691.\n* Remove links from TriviaQA, by @yeganehkordi in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F689.\n* Updated files for tasks 901-1000, by @aarunku5 in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F693.\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fcompare\u002F2.2...v2.3","2022-01-26T03:53:49",{"id":180,"version":181,"summary_zh":182,"released_at":183},71970,"2.2","Bug fixes and improved task categorization.\n\n## What's Changed\n* @Palipoor, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F646, fixed the missing perspective in task 738.\n* @pulkitverma25, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F644, fixed human-evaluation issues and some grammar issues in tasks 1700+.\n* @kurbster, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F636, fixed issue #592 and updated task 564.\n* @Palipoor, in 
https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F638, adjusted tasks 1300 and 1400 based on feedback.\n* @pulkitverma25, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F647, adjusted tasks 738 to 800 based on feedback.\n* @Palipoor, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F641, adjusted tasks 1600 to 1657 based on feedback.\n* Tasks 551 to 575: @yeganehkordi, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F594, assigned categories and domains and addressed the related feedback.\n* @RushangKaria, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F649, fixed issue #577.\n* @XudongOliverShen, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F650, improved the HateEval tasks.\n* @Palipoor, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F653, updated tasks 1601 and 1602 to resolve issue #639.\n* @Palipoor, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F643, updated the examples of tasks 1626 to 1629 to Croatian and fixed character-related issues.\n* Tasks 1283 to 1300: @yeganehkordi, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F625, assigned categories and domains and addressed the related feedback.\n* @ghlai9665, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F593, updated the categories and domains of tasks 161 to 180.\n* @ghlai9665, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F606, updated the categories and domains of tasks 181 to 199.\n* Tasks 1300 to 1325: @yeganehkordi, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F659, assigned categories and domains.\n* Tasks 1425 to 1450: @yeganehkordi, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F656, assigned categories and domains and addressed the related feedback.\n* @pulkitverma25, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F655, fixed issue #524.\n* @Sujan242, in 
https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F654, improved the MMMLU tasks.\n* @XudongOliverShen, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F648, improved the Jigsaw tasks.\n* Tasks 1325 to 1350: @yeganehkordi, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F660, assigned categories and domains.\n* Tasks 1200 to 1225: @yeganehkordi, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F601, assigned categories and domains and addressed the related feedback.\n* @pulkitverma25, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F602, added negative examples to the MMMLU tasks and fixed issue #569.","2021-12-16T17:26:44",{"id":185,"version":186,"summary_zh":187,"released_at":188},71971,"v2.1","This release includes:\n- a batch of new tasks\n- new task types and their domain hierarchy\n- many improvements and fixes to existing tasks.\n\nWe will keep publishing releases as we refine the data and add more experiments.\nIf you are wondering whether you can still contribute to this repository, the answer is yes!\n\n## What's Changed\n* Further updates to task domains and categories; @ghlai9665 added a utility script in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F436.\n* Tasks 739–742: question-answer generation tasks on the lhoestq dataset, contributed by @hanut1909 in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F317.\n* Tasks 829–833: giga_fren, poem_sentiment, and poleval2019_mt tasks, contributed by @abhinawale12 in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F325.\n* Tasks 877–881: kde4 and schema_guided_dstc8 dataset tasks, contributed by @kashyap467 in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F334.\n* Task 934: Turk simplification task, by @cosmicishan in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F444.\n* Tasks 1499–1504: by @swarooprm in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F461.\n* @yeganehkordi, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F455, updated the zest tasks.\n* @Palipoor, in 
https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F447, addressed human feedback.\n* Tasks 600–606: by @Nikhitha0911 in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F285.\n* @atharva-naik, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F460, corrected the evaluation task files.\n* Tasks 1443–1446: synthetic data tasks, contributed by @kurbster in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F449.\n* Tasks 1394–1397: by @swarooprm in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F416.\n* Tasks 1361–1364: by @swarooprm in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F410.\n* Tasks 1437–1442: DoQA tasks, by @kuntalkumarpal in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F448.\n* Tasks 1356–1360: by @swarooprm in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F409.\n* Tasks 906–909: DialogRE answer generation tasks, by @matthew-huff in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F341.\n* @swarooprm, in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F466, made the error messages more descriptive.\n* Tasks 1365–1369: by @swarooprm in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F411.\n* Second PR: addressing items 34, 35, 44, 45, 48, 49, 52–58, 167, and 201–205 of the human feedback, by @Palipoor in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F464.\n* Task 1498: by @sharma121amit in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F456.\n* Task 955: rewriting simple English Wikipedia sentences into more complex English, by @ghlai9665 in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F354.\n* 
Task 1505: ro","2021-11-17T17:41:23",{"id":190,"version":191,"summary_zh":192,"released_at":193},71972,"v2.0","We completed our first public-facing expansion on October 15, 2021. During this period, the community contributed more than 1500 tasks!! 🎉  \nWe will keep publishing releases as we refine the data and add more experiments.  \nIf you are wondering whether you can still contribute to this repository, the answer is yes!\n\nNote: the v1 data is available here: https:\u002F\u002Finstructions.apps.allenai.org\u002F \n\n## What's Changed\n* natural instructions-v1, by @danyaljj in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F1  \n* Update the task categories in the README, by @swarooprm in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F2  \n* Duplicate tasks, by @danyaljj in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F4  \n* Change the output format in examples to a non-list form; FAQ, by @swarooprm in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F9  \n* Update README.md, by @swarooprm in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F12  \n* Task 73, by @amirrezamirzaei in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F11  \n* Shailaja's tasks, by @shailaja183 in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F14  \n* Added 4 tasks using the SPLASH dataset and 1 task using the CoNaLa dataset, by @kurbster in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F15  \n* Update README.md: clarify “meaningful contributions”, by @danyaljj in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F23  \n* babi dataset task 1: 3 NI tasks (question generation, answer generation, identifying supporting facts for QA), by @shailaja183 in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F22  \n* Update README.md: commits not linked to a GitHub profile, by @danyaljj in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F30  \n* Add tasks using examples from the CoNaLa dataset, by @kurbster in 
https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F29  \n* Remove duplicate examples, by @kurbster in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F40  \n* Fix duplicate instances, by @shailaja183 in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F42  \n* squad2.0, by @shailaja183 in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F32  \n* Fix several pending issues, by @danyaljj in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F38  \n* Task 103: turning facts into stories, by @Mihir3009 in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F34  \n* Update the files for tasks 102 and 103, by @Mihir3009 in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F49  \n* Update README.md, by @swarooprm in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F50  \n* Fix issues #13, #43, #31, and #24, by @swarooprm in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F53  \n* Remove randomization (issue #55), by @swarooprm in https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions-expansion\u002Fpull\u002F56  \n* Synthetic data, by @nrjvarshney","2021-10-18T02:35:08"]