[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-microsoft--CodeXGLUE":3,"tool-microsoft--CodeXGLUE":65},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160015,2,"2026-04-18T11:30:52",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,3,"2026-04-06T11:19:32",[15,26,14,13],"图像",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":10,"last_commit_at":33,"category_tags":34,"status":16},8553,"spec-kit","github\u002Fspec-kit","Spec Kit 是一款专为提升软件开发效率而设计的开源工具包，旨在帮助团队快速落地“规格驱动开发”（Spec-Driven Development）模式。传统开发中，需求文档往往与代码实现脱节，导致沟通成本高且结果不可控；而 Spec Kit 通过将规格说明书转化为可执行的指令，让 
AI 直接依据明确的业务场景生成高质量代码，从而减少从零开始的随意编码，确保产出结果的可预测性。\n\n该工具特别适合希望利用 AI 辅助编程的开发者、技术负责人及初创团队。无论是启动全新项目还是在现有工程中引入规范化流程，用户只需通过简单的命令行操作，即可初始化项目并集成主流的 AI 编程助手。其核心技术亮点在于“规格即代码”的理念，支持社区扩展与预设模板，允许用户根据特定技术栈定制开发流程。此外，Spec Kit 强调官方维护的安全性，提供稳定的版本管理，帮助开发者在享受 AI 红利的同时，依然牢牢掌握架构设计的主动权，真正实现从“凭感觉写代码”到“按规格建系统”的转变。",88749,"2026-04-17T09:48:14",[15,26,14,13],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":10,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85267,"2026-04-18T11:00:28",[26,51,52,53,14,54,15,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":62,"last_commit_at":63,"category_tags":64,"status":16},5784,"funNLP","fighting41love\u002FfunNLP","funNLP 是一个专为中文自然语言处理（NLP）打造的超级资源库，被誉为\"NLP 
民工的乐园”。它并非单一的软件工具，而是一个汇集了海量开源项目、数据集、预训练模型和实用代码的综合性平台。\n\n面对中文 NLP 领域资源分散、入门门槛高以及特定场景数据匮乏的痛点，funNLP 提供了“一站式”解决方案。这里不仅涵盖了分词、命名实体识别、情感分析、文本摘要等基础任务的标准工具，还独特地收录了丰富的垂直领域资源，如法律、医疗、金融行业的专用词库与数据集，甚至包含古诗词生成、歌词创作等趣味应用。其核心亮点在于极高的全面性与实用性，从基础的字典词典到前沿的 BERT、GPT-2 模型代码，再到高质量的标注数据和竞赛方案，应有尽有。\n\n无论是刚刚踏入 NLP 领域的学生、需要快速验证想法的算法工程师，还是从事人工智能研究的学者，都能在这里找到急需的“武器弹药”。对于开发者而言，它能大幅减少寻找数据和复现模型的时间；对于研究者，它提供了丰富的基准测试资源和前沿技术参考。funNLP 以开放共享的精神，极大地降低了中文自然语言处理的开发与研究成本，是中文 AI 社区不可或缺的宝藏仓库。",79857,1,"2026-04-08T20:11:31",[15,51,54],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":71,"readme_en":72,"readme_zh":73,"quickstart_zh":74,"use_case_zh":75,"hero_image_url":76,"owner_login":77,"owner_name":78,"owner_avatar_url":79,"owner_bio":80,"owner_company":81,"owner_location":81,"owner_email":82,"owner_twitter":83,"owner_website":84,"owner_url":85,"languages":86,"stars":111,"forks":112,"last_commit_at":113,"license":114,"difficulty_score":23,"env_os":115,"env_gpu":116,"env_ram":115,"env_deps":117,"category_tags":120,"github_topics":81,"view_count":10,"oss_zip_url":81,"oss_zip_packed_at":81,"status":16,"created_at":121,"updated_at":122,"faqs":123,"releases":153},9219,"microsoft\u002FCodeXGLUE","CodeXGLUE","CodeXGLUE ","CodeXGLUE 是由微软研究院推出的代码智能基准测试平台，旨在为人工智能在编程领域的应用提供统一的评估标准。随着开发者数量激增，利用 AI 辅助代码搜索、自动补全、跨语言翻译及缺陷检测等需求日益迫切。然而，该领域长期缺乏像自然语言处理中 GLUE 或计算机视觉中 ImageNet 那样全面、多样化的数据集来衡量模型性能。\n\nCodeXGLUE 正是为了解决这一痛点而生。它汇集了 14 个数据集，覆盖代码克隆检测、代码修复、文本到代码生成、代码摘要等 10 类核心任务，全面涵盖“代码对代码”、“文本对代码”、“代码对文本”及“文本对文本”等多种交互场景。通过提供标准化的数据与评估平台，它让不同算法模型的性能对比变得科学且直观，极大地推动了预训练模型（如 CodeBERT）在代码理解与生成方面的研究进展。\n\n这套工具特别适合人工智能研究人员、大模型开发者以及关注代码智能前沿技术的工程师使用。无论是希望验证新算法的有效性，还是想要复现业界领先的代码辅助功能，CodeXGLUE 都提供了坚实的数据基础。其独特的价值在于打破了以往任务分散的局面，构建了一个多元化的评测生态，帮助社区更清晰地探","CodeXGLUE 是由微软研究院推出的代码智能基准测试平台，旨在为人工智能在编程领域的应用提供统一的评估标准。随着开发者数量激增，利用 AI 辅助代码搜索、自动补全、跨语言翻译及缺陷检测等需求日益迫切。然而，该领域长期缺乏像自然语言处理中 GLUE 或计算机视觉中 ImageNet 那样全面、多样化的数据集来衡量模型性能。\n\nCodeXGLUE 正是为了解决这一痛点而生。它汇集了 14 个数据集，覆盖代码克隆检测、代码修复、文本到代码生成、代码摘要等 10 
类核心任务，全面涵盖“代码对代码”、“文本对代码”、“代码对文本”及“文本对文本”等多种交互场景。通过提供标准化的数据与评估平台，它让不同算法模型的性能对比变得科学且直观，极大地推动了预训练模型（如 CodeBERT）在代码理解与生成方面的研究进展。\n\n这套工具特别适合人工智能研究人员、大模型开发者以及关注代码智能前沿技术的工程师使用。无论是希望验证新算法的有效性，还是想要复现业界领先的代码辅助功能，CodeXGLUE 都提供了坚实的数据基础。其独特的价值在于打破了以往任务分散的局面，构建了一个多元化的评测生态，帮助社区更清晰地探索 AI 赋能软件开发的无限可能。","# Introduction\n\nAccording to [Evans Data Corporation](https:\u002F\u002Fevansdata.com\u002Fpress\u002FviewRelease.php?pressID=278), there are 23.9 million professional developers in 2019, and the population is expected to reach 28.7 million in 2024. With the growing population of developers, code intelligence, which aims to leverage AI to help software developers improve the productivity of the development process, is growing increasingly important in both communities of software engineering and artificial intelligence. \n\nWhen developers want to find code written by others with the same intent, [code search](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.09436) systems can help automatically retrieve semantically relevant code given natural language queries. When developers are confused about what to write next, [code completion](https:\u002F\u002Farxiv.org\u002Fabs\u002F1912.00742) systems can help by automatically completing the following tokens given the context of the edits being made. When developers want to implement Java code with the same function of some existing body of Python code, [code-to-code translation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.03511) systems can help translate from one programming language (Python) to another (Java). \n\nCode intelligence therefore plays a vital role in Microsoft’s mission to empower developers. As highlighted by Microsoft CEO Satya Nadella at Microsoft [Build 2020](https:\u002F\u002Fmybuild.microsoft.com\u002Fsessions\u002F23912de2-1531-4684-b85a-d57ac30af09e), the role of developers is more important than ever. 
GitHub is increasingly the default home for source code, and Visual Studio Code is the most popular code editor. Microsoft offers the most complete toolchain for developers, bringing together the best of GitHub, Visual Studio, and Microsoft Azure to help developers go from idea to code and from code to cloud. \n\nRecent years have seen a surge in applying statistical models, including neural nets, to code intelligence tasks. Very recently, pre-trained models learned from large-scale programming language data have been inspired by the great success of large pre-trained models like [BERT](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.04805) and [GPT](https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.09203) in natural language processing (NLP). These models, including [IntelliCode](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.08025.pdf) and [CodeBERT](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2002.08155.pdf), obtain further improvements on code understanding and generation problems. However, the area of code intelligence lacks a benchmark suite that covers a wide range of tasks. We have seen that a diversified benchmark dataset is significant for the growth of an area of applied AI research, like [ImageNet](http:\u002F\u002Fimage-net.org\u002F) for computer vision and [GLUE](https:\u002F\u002Fgluebenchmark.com\u002F) for NLP. \n\nTo address this, researchers from Microsoft Research Asia, Developer Division, and Bing introduce CodeXGLUE, a benchmark dataset and open challenge for code intelligence. It includes a collection of code intelligence tasks and a platform for model evaluation and comparison. CodeXGLUE stands for General Language Understanding Evaluation benchmark for CODE. 
It includes 14 datasets for 10 diversified code intelligence tasks covering the following scenarios: \n\n*\t**[code-code](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE\u002Ftree\u002Fmain\u002FCode-Code)** (clone detection, defect detection, cloze test, code completion, code repair, and code-to-code translation)\n* **[text-code](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE\u002Ftree\u002Fmain\u002FText-Code)** (natural language code search, text-to-code generation) \n* **[code-text](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE\u002Ftree\u002Fmain\u002FCode-Text\u002F)** (code summarization) \n* **[text-text](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE\u002Ftree\u002Fmain\u002FText-Text)** (documentation translation) \n\nA brief summary of CodeXGLUE is given below, including tasks, datasets, language, sizes in various states, baseline systems, providers, and short definitions of each task. Datasets highlighted in BLUE are newly introduced. \n![A brief summary of CodeXGLUE, including tasks, datasets, baseline systems, etc.](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_CodeXGLUE_readme_8251a9e4ec6b.jpg)\n\n\n\nTo make it easy for participants, we provide three baseline models to support these tasks, including a BERT-style pre-trained model (in this case, CodeBERT), which is good at understanding problems. We also include a GPT-style pre-trained model, which we call CodeGPT, to support completion and generation problems. 
Finally, we include an Encoder-Decoder framework that supports sequence-to-sequence generation problems.\n\nThree pipelines, [CodeBERT](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeBERT), [CodeGPT](https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FCodeGPT-small-java-adaptedGPT2), and Encoder-Decoder, are provided to make it easy for participants.\n![baselines](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_CodeXGLUE_readme_cc4a57308d0b.jpg)\n\n\nWith CodeXGLUE, we seek to support the development of models that can be applied to various code intelligence problems, with the goal of increasing the productivity of software developers. We encourage researchers to participate in the open challenges to continue progress in code intelligence. Moving forward, we’ll extend CodeXGLUE to more programming languages and downstream tasks while continuing to push forward pre-trained models by exploring new model structures, introducing new pre-training tasks, using different types of data, and more.\n\n# Relevant Links\n[Leaderboard](https:\u002F\u002Fmicrosoft.github.io\u002FCodeXGLUE\u002F) | [CodeXGLUE paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2102.04664.pdf) | [Access from HuggingFace datasets](https:\u002F\u002Fhuggingface.co\u002Fdatasets?search=code_x_glue) \u003Cimg alt=\"Hugging Face Datasets\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-%F0%9F%A4%97%20datasets-blue\"> \u003C\u002Fa >\n\n# Tasks and Datasets\n\nBelow, we elaborate on the definition of each task and on the newly introduced datasets that are highlighted in the table above.\n\n1.\tClone detection (BigCloneBench, POJ-104). A model is tasked with measuring the semantic similarity between code snippets. Two existing datasets are included. One is for binary classification of code pairs and the other is for retrieving semantically similar code given code as the query. \n2.\tDefect detection (Devign). 
A model is tasked with identifying whether a body of source code contains defects that may be used to attack software systems, such as resource leaks, use-after-free vulnerabilities, and DoS attacks. An existing dataset is included.\n3.\tCloze test (CT-all, CT-max\u002Fmin). A model is tasked with predicting the masked token from code, formulated as a multi-choice classification problem. The two datasets are newly created, one with candidates from the (filtered) vocabulary and the other with candidates among “max” and “min”.\n4.\tCode completion (PY150, GitHub Java Corpus). A model is tasked with predicting the following tokens given a code context. Both token-level and line-level completion are covered. The token-level task is analogous to language modeling, and we include two influential datasets here. Line-level datasets are newly created to test a model’s ability to autocomplete a line. \n5.\tCode translation (CodeTrans). A model is tasked with translating code in one programming language into code in another. A dataset between Java and C# is newly created.\n6.\tCode search (CodeSearchNet, AdvTest; CodeSearchNet, WebQueryTest). A model is given the task of measuring the semantic similarity between text and code. In the retrieval scenario, a test set is newly created in which function names and variables are replaced to test the generalization ability of a model. In the text-code classification scenario, a test set whose natural language queries come from the Bing query log is created to test on real user queries.\n7.\tCode repair (Bugs2Fix). A model is tasked with trying to automatically refine the code, which could be buggy or complex. An existing dataset is included.\n8.\tText-to-code generation (CONCODE). A model is given the task of generating code from a natural language description. An existing dataset is included.\n9.\tCode summarization (CodeSearchNet). A model is given the task of generating natural language comments for a piece of code. 
Existing datasets are included.\n10.\tDocumentation translation (Microsoft Docs). A model is given the task to translate code documentation between human languages. A dataset, focusing on low-resource multilingual translation, is newly created.\n\n# Submission Instructions\n\nOnce you have built a model that meets your expectations on evaluation with the dev set, you can submit your test results to get official evaluation on the test set. To ensure the integrity of the official test results, we do not release the correct answers for test set to the public. To submit your model for official evaluation on the test set, follow the below steps:\n\n1. Generate your prediction output for the dev set.\n2. Run the official evaluation methodologies found in the task specific git repo and verify your systems are running as expected.\n3. Generate your prediction output for the test set.\n4. Submit the following information by emailing to `codexglue@microsoft.com`.\n\nYour email should include:\n\n1. Prediction results on test set. **[Required]**\n2. Prediction results on dev set. **[Recommended]**\n3. Individual\u002FTeam Name: Name of the individual or the team to appear in the leaderboard. **[Required]**\n4. Individual\u002FTeam Institution: Name of the institution of the individual or the team to appear in the leaderboard. **[Optional]**\n5. Model code: Training code for the model. **[Recommended]**\n6. Model information: Name of the model\u002Ftechnique to appear in the leaderboard. **[Required]**\n7. Paper Information: Name, Citation, URL of the paper if model is from a published work to appear in the leaderboard. **[Optional]**\n\nTo avoid \"P-hacking\" we discourage too many submissions from the same group in a short period of time.\n\n# Training and Inference Time Cost\n\nWe calculate the training and inference time cost for each dataset with 2 P100 GPUs. 
Results are shared in the following table.\n![time-cost](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_CodeXGLUE_readme_f54016a46d8b.jpg)\n\n# LICENSE\nOur codes follow MIT License.\n\nOur datasets follow Computational Use of Data Agreement (C-UDA) License.\n\n# Reference\nIf you use this code or CodeXGLUE, please consider citing us.\n\u003Cpre>\u003Ccode>@article{DBLP:journals\u002Fcorr\u002Fabs-2102-04664,\n  author    = {Shuai Lu and\n               Daya Guo and\n               Shuo Ren and\n               Junjie Huang and\n               Alexey Svyatkovskiy and\n               Ambrosio Blanco and\n               Colin B. Clement and\n               Dawn Drain and\n               Daxin Jiang and\n               Duyu Tang and\n               Ge Li and\n               Lidong Zhou and\n               Linjun Shou and\n               Long Zhou and\n               Michele Tufano and\n               Ming Gong and\n               Ming Zhou and\n               Nan Duan and\n               Neel Sundaresan and\n               Shao Kun Deng and\n               Shengyu Fu and\n               Shujie Liu},\n  title     = {CodeXGLUE: {A} Machine Learning Benchmark Dataset for Code Understanding\n               and Generation},\n  journal   = {CoRR},\n  volume    = {abs\u002F2102.04664},\n  year      = {2021}\n}\u003C\u002Fcode>\u003C\u002Fpre>\n\nThis research was conducted by Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Daya Guo, Duyu Tang, Junjie Huang, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shuai Lu, Shujie Liu, and Shuo Ren.\n","# 
引言\n\n根据[埃文斯数据公司](https:\u002F\u002Fevansdata.com\u002Fpress\u002FviewRelease.php?pressID=278)的报告，2019年全球专业开发者人数为2390万，预计到2024年将增至2870万。随着开发者群体的不断壮大，旨在利用人工智能提升软件开发效率的代码智能技术，在软件工程和人工智能两大领域的重要性日益凸显。\n\n当开发者希望找到他人编写的具有相同意图的代码时，[代码搜索](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.09436)系统能够根据自然语言查询自动检索语义相关的代码片段。当开发者不确定下一步该写什么时，[代码补全](https:\u002F\u002Farxiv.org\u002Fabs\u002F1912.00742)系统则可以根据当前编辑上下文自动补全后续代码。而当开发者需要将一段Python代码的功能移植到Java中实现时，[代码到代码的翻译](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.03511)系统可以帮助完成从一种编程语言（Python）到另一种编程语言（Java）的转换。\n\n因此，代码智能在微软赋能开发者这一使命中发挥着至关重要的作用。正如微软首席执行官萨蒂亚·纳德拉在微软[BUILD 2020]大会上所强调的那样，开发者的作用比以往任何时候都更加重要。GitHub正逐渐成为源代码的默认托管平台，而Visual Studio Code则是最受欢迎的代码编辑器。微软为开发者提供了最完整的工具链，整合了GitHub、Visual Studio和Microsoft Azure的优势，帮助开发者将创意转化为代码，并进一步部署到云端。\n\n近年来，统计模型（包括神经网络）在代码智能任务中的应用呈现爆发式增长。尤其是最近，基于大规模编程语言数据预训练的模型，受到了BERT([arxiv.org\u002Fabs\u002F1810.04805](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.04805))和GPT([arxiv.org\u002Fabs\u002F1908.09203](https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.09203))等自然语言处理领域大型预训练模型成功启发。这些模型，如[IntelliCode](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.08025.pdf)和[CodeBERT](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2002.08155.pdf)，在代码理解和生成方面取得了显著进展。然而，目前代码智能领域仍缺乏覆盖广泛任务的基准测试套件。我们已经看到，多样化的基准数据集对于应用型人工智能研究领域的蓬勃发展至关重要，例如计算机视觉领域的[ImageNet](http:\u002F\u002Fimage-net.org\u002F)和自然语言处理领域的[GLUE](https:\u002F\u002Fgluebenchmark.com\u002F)。\n\n为此，来自微软亚洲研究院、开发者事业部和必应的研究人员共同推出了CodeXGLUE——一个面向代码智能的基准数据集与开放挑战赛。它包含一系列代码智能任务以及用于模型评估和比较的平台。CodeXGLUE意为“面向代码的通用语言理解评估基准”，涵盖了10类多样化的代码智能任务，共14个数据集，具体场景如下：\n\n* **[代码-代码](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE\u002Ftree\u002Fmain\u002FCode-Code)**（克隆检测、缺陷检测、完形填空、代码补全、代码修复以及代码到代码的翻译）\n* **[文本-代码](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE\u002Ftree\u002Fmain\u002FText-Code)**（自然语言代码搜索、文本到代码生成）\n* **[代码-文本](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE\u002Ftree\u002Fmain\u002FCode-Text\u002F)**（代码摘要生成）\n* 
**[文本-文本](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE\u002Ftree\u002Fmain\u002FText-Text)**（文档翻译）\n\n以下是对CodeXGLUE的简要概述，包括任务、数据集、使用的语言、各数据集的规模、基线系统、提供方以及每项任务的简要定义。其中以蓝色标注的数据集为新引入的内容。\n![CodeXGLUE的简要概述，包括任务、数据集、基线系统等](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_CodeXGLUE_readme_8251a9e4ec6b.jpg)\n\n\n\n为了方便参赛者，我们提供了三类基线模型来支持这些任务：一种是擅长问题理解的BERT风格预训练模型（即CodeBERT）；另一种是GPT风格的预训练模型，我们称之为CodeGPT，主要用于补全和生成任务；最后还提供了一个编码器-解码器框架，以支持序列到序列的生成任务。\n\n我们提供了三条流水线，分别基于[CodeBERT](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeBERT)、[CodeGPT](https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FCodeGPT-small-java-adaptedGPT2)以及编码器-解码器架构，以便参赛者轻松上手。\n![基线模型](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_CodeXGLUE_readme_cc4a57308d0b.jpg)\n\n\n通过CodeXGLUE，我们旨在推动能够应用于各类代码智能问题的模型开发，从而提高软件开发者的生产力。我们鼓励研究人员积极参与这一开放挑战赛，继续推进代码智能领域的研究进展。未来，我们将把CodeXGLUE扩展到更多编程语言和下游任务中，同时通过探索新的模型结构、引入新的预训练任务、使用不同类型的数据等方式，持续优化预训练模型。\n\n# 相关链接\n[排行榜](https:\u002F\u002Fmicrosoft.github.io\u002FCodeXGLUE\u002F) | [CodeXGLUE论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2102.04664.pdf) | [HuggingFace数据集入口](https:\u002F\u002Fhuggingface.co\u002Fdatasets?search=code_x_glue) \u003Cimg alt=\"Hugging Face Datasets\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-%F0%9F%A4%97%20datasets-blue\"> \u003C\u002Fa >\n\n# 任务与数据集\n\n以下我们将详细说明每个任务的任务定义，以及上表中重点介绍的新引入数据集。\n\n1. 克隆检测（BigCloneBench、POJ-104）。该任务要求模型衡量代码之间的语义相似度。包含两个现有数据集：一个用于代码间的二分类任务，另一个则是在给定查询代码的情况下检索语义相似的代码。\n2. 缺陷检测（Devign）。该任务要求模型识别源代码中是否存在可能被用于攻击软件系统的缺陷，例如资源泄漏、释放后使用漏洞和拒绝服务攻击等。包含一个现有数据集。\n3. 完形填空（CT-all、CT-max\u002Fmin）。该任务要求模型从代码中预测被遮盖的标记符，并将其表述为多选分类问题。这两个数据集均为新创建，其中一个候选标记符来自（经过过滤的）词汇表，另一个则仅限于“max”和“min”这两个选项。\n4. 代码补全（PY150、GitHub Java Corpus）。该任务要求模型在给定代码上下文的情况下预测后续标记符。涵盖标记符级别和行级别两种补全方式。标记符级别的任务类似于语言建模，我们在此收录了两个具有影响力的数据集。行级别数据集则是新近创建的，用于测试模型自动补全整行代码的能力。\n5. 代码翻译（CodeTrans）。该任务要求模型将一种编程语言的代码翻译成另一种编程语言的代码。新创建了一个Java与C#之间的数据集。\n6. 
代码搜索（CodeSearchNet, AdvTest；CodeSearchNet, WebQueryTest）。该任务要求模型衡量文本与代码之间的语义相似度。在检索场景中，新创建了一个测试集，其中测试集中函数名和变量已被替换，以检验模型的泛化能力。而在文本-代码分类场景中，则创建了一个自然语言查询来源于Bing查询日志的测试集，用于测试模型对真实用户查询的表现。\n7. 代码修复（Bugs2Fix）。该任务要求模型尝试自动优化代码，这些代码可能存在错误或过于复杂。包含一个现有数据集。\n8. 文本到代码生成（CONCODE）。该任务要求模型根据自然语言描述生成代码。包含一个现有数据集。\n9. 代码摘要（CodeSearchNet）。该任务要求模型为代码生成自然语言注释。包含现有的数据集。\n10. 文档翻译（Microsoft Docs）。该任务要求模型在不同人类语言之间翻译代码文档。新创建了一个专注于低资源多语言翻译的数据集。\n\n# 提交说明\n\n当您构建的模型在开发集上的评估结果达到预期时，即可提交您的测试结果以获得测试集上的官方评估。为确保官方测试结果的公正性，我们不会公开测试集的正确答案。要提交您的模型以进行测试集上的官方评估，请按照以下步骤操作：\n\n1. 为开发集生成预测输出。\n2. 运行各任务特定Git仓库中的官方评估方法，验证您的系统运行正常。\n3. 为测试集生成预测输出。\n4. 通过电子邮件发送至 `codexglue@microsoft.com` 提交以下信息。\n\n邮件内容应包括：\n\n1. 测试集上的预测结果。**[必填]**\n2. 开发集上的预测结果。**[建议填写]**\n3. 个人\u002F团队名称：将在排行榜上显示的个人或团队名称。**[必填]**\n4. 个人\u002F团队所属机构：将在排行榜上显示的个人或团队所属机构。**[可选]**\n5. 模型代码：用于训练该模型的代码。**[建议提供]**\n6. 模型信息：将在排行榜上显示的模型名称或技术名称。**[必填]**\n7. 论文信息：若模型源自已发表的研究成果，需提供论文名称、引用信息及URL，以便在排行榜上展示。**[可选]**\n\n为避免“P值黑客行为”，我们不鼓励同一团队在短时间内提交过多结果。\n\n# 训练与推理时间成本\n\n我们使用2块P100 GPU计算了每个数据集的训练和推理时间成本。结果见下表。\n![time-cost](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_CodeXGLUE_readme_f54016a46d8b.jpg)\n\n# 许可证\n我们的代码遵循MIT许可证。\n\n我们的数据集遵循数据计算使用协议（C-UDA）许可证。\n\n# 参考文献\n如果您使用此代码或CodeXGLUE，请考虑引用我们的论文。\n\u003Cpre>\u003Ccode>@article{DBLP:journals\u002Fcorr\u002Fabs-2102-04664,\n  author    = {Shuai Lu and\n               Daya Guo and\n               Shuo Ren and\n               Junjie Huang and\n               Alexey Svyatkovskiy and\n               Ambrosio Blanco and\n               Colin B. 
Clement and\n               Dawn Drain and\n               Daxin Jiang and\n               Duyu Tang and\n               Ge Li and\n               Lidong Zhou and\n               Linjun Shou and\n               Long Zhou and\n               Michele Tufano and\n               Ming Gong and\n               Ming Zhou and\n               Nan Duan and\n               Neel Sundaresan and\n               Shao Kun Deng and\n               Shengyu Fu and\n               Shujie Liu},\n  title     = {CodeXGLUE: {A} Machine Learning Benchmark Dataset for Code Understanding\n               and Generation},\n  journal   = {CoRR},\n  volume    = {abs\u002F2102.04664},\n  year      = {2021}\n}\u003C\u002Fcode>\u003C\u002Fpre>\n\n本研究由Alexey Svyatkovskiy、Ambrosio Blanco、Colin Clement、Dawn Drain、Daxin Jiang、Daya Guo、Duyu Tang、Junjie Huang、Lidong Zhou、Linjun Shou、Long Zhou、Michele Tufano、Ming Gong、Ming Zhou、Nan Duan、Neel Sundaresan、Shao Kun Deng、Shengyu Fu、Shuai Lu、Shujie Liu和Shuo Ren共同完成。","# CodeXGLUE 快速上手指南\n\nCodeXGLUE 是微软推出的代码智能基准数据集与挑战平台，涵盖代码克隆检测、缺陷检测、代码补全、代码翻译、代码搜索等 10 类任务。本指南帮助开发者快速搭建环境并运行基线模型。\n\n## 环境准备\n\n*   **操作系统**: Linux (推荐 Ubuntu 18.04+) 或 macOS\n*   **Python 版本**: 3.6 或更高\n*   **硬件要求**: 建议使用 NVIDIA GPU (如 P100\u002FV100) 进行训练和推理；CPU 仅适用于小规模测试。\n*   **前置依赖**:\n    *   PyTorch 1.6+\n    *   Transformers (Hugging Face)\n    *   Git\n\n> **国内加速建议**：\n> *   Python 包安装推荐使用清华源：`-i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n> *   Hugging Face 模型下载可配置镜像：`export HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com`\n\n## 安装步骤\n\n1.  **克隆项目仓库**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE.git\n    cd CodeXGLUE\n    ```\n\n2.  **安装 Python 依赖**\n    进入对应的任务目录（以 `Text-Code` 下的代码生成任务为例，其他任务类似），安装所需库：\n    ```bash\n    cd Text-Code\u002FGeneration-concode\n    pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n    ```\n    *注：若需使用预训练的 CodeBERT 或 CodeGPT 模型，请确保已安装 `transformers` 库。*\n\n3.  
**获取数据集**\n    大部分数据集可通过 Hugging Face Datasets 直接加载，或从各任务子目录下的脚本下载。\n    ```python\n    # 在 Python 环境中验证数据集加载 (示例)\n    from datasets import load_dataset\n    dataset = load_dataset(\"code_x_glue_ct_code_to_text\", \"python\")\n    ```\n\n## 基本使用\n\n以下以 **代码补全 (Code Completion)** 任务为例，展示如何使用提供的 CodeGPT 基线模型进行推理。\n\n1.  **进入任务目录**\n    ```bash\n    cd ..\u002F..\n    cd Code-Code\u002FCompletion\n    ```\n\n2.  **运行评估脚本**\n    使用预训练模型对开发集（dev set）进行预测。以下命令假设你已下载好相应数据并在配置中指定了路径：\n\n    ```bash\n    python run.py \\\n        --model_type gpt2 \\\n        --model_name_or_path microsoft\u002FCodeGPT-small-java-adaptedGPT2 \\\n        --task_name completion \\\n        --do_eval \\\n        --eval_data_file ..\u002Fdata\u002Fjava\u002Fcorpus.jsonl \\\n        --output_dir .\u002Fsaved_models\n    ```\n\n    *   `--model_name_or_path`: 指定模型名称，自动从 Hugging Face 拉取（国内用户建议先配置 `HF_ENDPOINT`）。\n    *   `--eval_data_file`: 指向具体的评估数据文件。\n    *   `--output_dir`: 保存预测结果的目录。\n\n3.  **查看结果**\n    运行结束后，预测结果将保存在 `.\u002Fsaved_models` 目录下。你可以对照官方评估脚本计算准确率或 BLEU 分数。\n\n对于其他任务（如代码搜索、缺陷检测等），请参考对应子目录（如 `Text-Code\u002FSearch`, `Code-Code\u002FDefect-detection`）中的 `run.py` 脚本及参数说明，流程基本一致：**选择模型 -> 指定数据路径 -> 执行训练\u002F评估**。","某跨国金融科技公司的高级工程师李明，正负责将核心交易系统中遗留的 Python 风控模块迁移至 Java 平台，并需同步更新相关技术文档。\n\n### 没有 CodeXGLUE 时\n- **跨语言翻译靠手动**：缺乏统一的代码翻译基准，李明只能逐行人工重写逻辑，耗时数周且极易引入细微的逻辑错误。\n- **语义搜索效率低**：在海量代码库中寻找功能相似的参考片段时，传统关键词搜索无法理解“计算风险敞口”等自然语言意图，导致大量时间浪费在无效筛选上。\n- **模型评估无标准**：团队尝试引入多个 AI 辅助工具，但因缺乏像 ImageNet 那样权威的评测数据集，无法客观对比哪个模型更适合当前的代码补全或缺陷检测任务。\n- **文档维护不同步**：代码重构后，英文技术文档难以自动准确地翻译为中文供本地团队使用，造成信息滞后和理解偏差。\n\n### 使用 CodeXGLUE 后\n- **自动化代码迁移**：利用 CodeXGLUE 中的“代码到代码翻译”任务数据集训练模型，实现了 Python 到 Java 的高精度自动转换，将迁移周期从数周缩短至几天。\n- **智能语义检索**：基于其“文本到代码搜索”能力，李明直接用自然语言描述需求即可精准定位语义相关的代码片段，大幅提升了复用效率。\n- **科学选型有依据**：依托 CodeXGLUE 提供的 10 类多样化任务和统一评测平台，团队快速筛选出在缺陷检测和代码补全上表现最优的预训练模型。\n- **文档同步零时差**：借助“文档翻译”任务能力，系统自动生成与新版代码逻辑严格对应的中文文档，确保了全球协作的信息一致性。\n\nCodeXGLUE 
通过提供标准化的基准数据集和评测平台，将原本依赖人工经验的模糊开发流程，转变为可量化、高效率的智能化工程实践。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_CodeXGLUE_23791f34.png","microsoft","Microsoft","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmicrosoft_4900709c.png","Open source projects and samples from Microsoft",null,"opensource@microsoft.com","OpenAtMicrosoft","https:\u002F\u002Fopensource.microsoft.com","https:\u002F\u002Fgithub.com\u002Fmicrosoft",[87,91,95,99,103,107],{"name":88,"color":89,"percentage":90},"C#","#178600",47.8,{"name":92,"color":93,"percentage":94},"Java","#b07219",37.2,{"name":96,"color":97,"percentage":98},"Python","#3572A5",13.1,{"name":100,"color":101,"percentage":102},"CSS","#663399",1.3,{"name":104,"color":105,"percentage":106},"HTML","#e34c26",0.6,{"name":108,"color":109,"percentage":110},"Shell","#89e051",0.1,1815,394,"2026-04-16T23:58:38","MIT","","基准测试中使用了 2 块 NVIDIA P100 GPU，具体显存和 CUDA 版本未说明",{"notes":118,"python":115,"dependencies":119},"README 主要介绍数据集、任务定义及基线模型（CodeBERT, CodeGPT 等），未提供具体的安装脚本或环境配置清单。训练和推理成本是基于 2 块 P100 GPU 计算的。数据遵循 C-UDA 许可，代码遵循 MIT 许可。",[],[15,54],"2026-03-27T02:49:30.150509","2026-04-19T03:03:49.087997",[124,129,134,139,144,149],{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},41382,"如何从头预训练 CodeGPT 和 CodeGPT-adapted 模型？它们的分词器（Tokenizer）有什么区别？","CodeGPT 和 CodeGPT-adapted 使用完全不同的分词器。CodeGPT 是在代码领域上新训练的 BPE 分词器，而 CodeGPT-adapted 复用了 OpenAI GPT-2 的分词器。如果您想从头预训练 CodeGPT，可以设置 `microsoft\u002FCodeGPT-small-xx` 使用我们训练好的 BPE 分词器，或者自己训练一个新的分词器。注意：微调时词汇表大小可能比预训练时大，因为添加了如 `concode_elem_sep` 等特殊令牌。如果遇到 NoneType 错误，可能是因为调用了不存在的特殊令牌，请打印出错时的 token ids 进行排查。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE\u002Fissues\u002F75",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},41383,"如何在 Colab 等资源受限平台上进行增量训练（Incremental Training）？","可以通过替换 `run.py` 文件并重新运行训练命令来实现增量训练，程序会自动恢复上一个检查点（checkpoint）。具体步骤是下载提供的 `run.txt` 替换原有的 `run.py`，然后使用以下命令重新运行（以 Ruby 
语言为例）：\n```\nlang=ruby\nlr=5e-5\nbatch_size=32\nbeam_size=10\nsource_length=256\ntarget_length=128\ndata_dir=..\u002Fdataset\noutput_dir=model\u002F$lang\ntrain_file=$data_dir\u002F$lang\u002Ftrain.jsonl\ndev_file=$data_dir\u002F$lang\u002Fvalid.jsonl\nepochs=10\npretrained_model=microsoft\u002Fcodebert-base\n\npython run.py --do_train --do_eval --model_type roberta --model_name_or_path $pretrained_model --train_filename $train_file --dev_filename $dev_file --output_dir $output_dir --max_source_length $source_length --max_target_length $target_length --beam_size $beam_size --train_batch_size $batch_size --eval_batch_size $batch_size --learning_rate $lr --num_train_epochs $epochs\n```\n注意：此方法会重置 logger，但优化器（optimizer）状态也会根据检查点恢复。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE\u002Fissues\u002F23",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},41384,"将 Code-to-Text 模型转换为 ONNX 格式后，加载时报错 'Type parameter (T) of Optype (Concat) bound to different types' 如何解决？输入数据的形状和类型是什么？","该错误通常是由于输入数据类型不匹配导致的。Code-to-Text 模型接受的输入是 `source_ids` 和 `source_mask`。它们的形状（shape）应为 `[batch_size, max_length]`，数据类型（type）必须为 `torch.long`（即 int64）。在导出 ONNX 时，请确保传入的 sample_input 中这两个张量的类型正确且一致，避免混合了 float 和 int64 类型导致 Concat 操作失败。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE\u002Fissues\u002F65",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},41385,"运行缺陷检测（Defect-detection）脚本时提示找不到 'microsoft\u002Fcodebert-base' 模型或程序卡住，该如何解决？","这个问题通常由 transformers 库版本不兼容或网络加载失败引起。解决方法有两种：\n1. 升级 transformers 库：运行 `pip install --upgrade transformers`。\n2. 
手动下载模型：从 HuggingFace (https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002Fcodebert-base#list-files) 下载模型文件到本地文件夹 `code\u002Fmicrosoft\u002Fcodebert-base`（注意目录结构需正确）。\n如果升级后程序无输出，请尝试使用以下代码测试能否正常加载模型：\n```\nfrom transformers import AutoTokenizer, AutoModel\ntokenizer = AutoTokenizer.from_pretrained(\"microsoft\u002Fcodebert-base\")\nmodel = AutoModel.from_pretrained(\"microsoft\u002Fcodebert-base\")\n```","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE\u002Fissues\u002F11",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},41386,"CodeGPT 在 CodeSearchNet 数据集上预训练时，数据处理和特殊令牌（如 \u003Cs>, \u003C\u002Fs>, \u003CEOL>）是如何使用的？","预训练使用了 CodeSearchNet 中的单模态（仅代码，约 1.1M）和双模态（自然语言 - 代码对，约 0.5M）数据。对于仅代码的数据，处理方式通常为 `\u003Cs> + pl_string + \u003C\u002Fs>`，其中 `pl_string` 中的换行符会被替换为 `\u003CEOL>` 令牌。对于自然语言 - 代码对，处理方式类似 `\u003Cs> + nl_string + \u003CEOL> + pl_string + \u003C\u002Fs>`。预训练脚本通常使用 `TextDataset` 类和 `run_lm.py`，并且会包含所有 1.6M 样本的代码部分。特殊令牌 `\u003Cs>`, `\u003C\u002Fs>`, `\u003CEOL>` 均被用于标记序列的开始、结束以及代码行之间的分隔。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCodeXGLUE\u002Fissues\u002F36",{"id":150,"question_zh":151,"answer_zh":152,"source_url":148},41387,"CodeGPT 预训练模型可以用于代码生成任务吗？效果如何？","原则上 CodeGPT 可以用于代码生成任务。但是，鉴于该模型的参数量相对较小，且训练数据量有限，其代码生成能力可能无法与近期的大规模模型（如 CodeX\u002FCopilot 背后的模型）相媲美。它适合用于研究或对生成质量要求不是极端苛刻的场景。",[]]