[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-src-d--awesome-machine-learning-on-source-code":3,"tool-src-d--awesome-machine-learning-on-source-code":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 
图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 
将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":79,"stars":83,"forks":84,"last_commit_at":85,"license":86,"difficulty_score":87,"env_os":78,"env_gpu":88,"env_ram":88,"env_deps":89,"category_tags":92,"github_topics":93,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":98,"updated_at":99,"faqs":100,"releases":101},3377,"src-d\u002Fawesome-machine-learning-on-source-code","awesome-machine-learning-on-source-code","Cool links & research papers related to Machine Learning applied to source code (MLonCode)","awesome-machine-learning-on-source-code 是一个精心整理的开源资源库，专注于“机器学习应用于源代码”（MLonCode）这一前沿领域。它汇集了该方向优质的研究论文、数据集、软件项目、技术会议及相关博客文章，旨在为探索代码智能的从业者提供一站式知识导航。\n\n在软件开发日益复杂的今天，如何利用人工智能理解、生成、优化甚至修复代码成为关键挑战。这份清单系统地梳理了从程序合成、代码补全、缺陷检测，到代码翻译、摘要生成及克隆检测等核心应用场景的最新成果，帮助使用者快速把握技术脉络，避免在海量文献中迷失方向。\n\n该资源特别适合人工智能研究人员、软件工程学者以及致力于开发智能编程辅助工具（如 AI 
结对程序员）的开发者使用。虽然原仓库已停止主动维护，但其沉淀的分类体系极具价值，且文中指引了活跃的替代项目（ml4code.github.io），确保用户能持续获取前沿资讯。其独特的亮点在于高度结构化的分类目录，将跨学科的复杂研究清晰拆解，让无论是学术界还是工业界的用户都能高效找到所需的技术灵感与数据支持，是进入代码智能领域的理想起点。","# Awesome Machine Learning On Source Code [![Awesome Machine Learning On Source Code](badges\u002Fawesome.svg)](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fawesome-machine-learning-on-source-code) [![CI Status](https:\u002F\u002Ftravis-ci.org\u002Fsrc-d\u002Fawesome-machine-learning-on-source-code.svg)](https:\u002F\u002Ftravis-ci.org\u002Fsrc-d\u002Fawesome-machine-learning-on-source-code)\n\n![Awesome Machine Learning On Source Code](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsrc-d_awesome-machine-learning-on-source-code_readme_4467c186ec43.png)\n\n**Notice: This repository is no longer actively maintained, and no further updates will be done, nor issues\u002FPRs will be answered or attended.**\nAn alternative actively maintained can be found at [ml4code.github.io](https:\u002F\u002Fml4code.github.io\u002Fpapers.html) [repository](https:\u002F\u002Fgithub.com\u002Fml4code\u002Fml4code.github.io).\n\nA curated list of awesome research papers, datasets and software projects devoted to machine learning _and_ source code. 
[#MLonCode](https:\u002F\u002Ftwitter.com\u002Fhashtag\u002FMLonCode)\n\n## Contents\n\n- [Digests](#digests)\n- [Conferences](#conferences)\n- [Competitions](#competitions)\n- [Papers](#papers)\n  - [Program Synthesis and Induction](#program-synthesis-and-induction)\n  - [Source Code Analysis and Language modeling](#source-code-analysis-and-language-modeling)\n  - [Neural Network Architectures and Algorithms](#neural-network-architectures-and-algorithms)\n  - [Embeddings in Software Engineering](#embeddings-in-software-engineering)\n  - [Program Translation](#program-translation)\n  - [Code Suggestion and Completion](#code-suggestion-and-completion)\n  - [Program Repair and Bug Detection](#program-repair-and-bug-detection)\n  - [APIs and Code Mining](#apis-and-code-mining)\n  - [Code Optimization](#code-optimization)\n  - [Topic Modeling](#topic-modeling)\n  - [Sentiment Analysis](#sentiment-analysis)\n  - [Code Summarization](#code-summarization)\n  - [Clone Detection](#clone-detection)\n  - [Differentiable Interpreters](#differentiable-interpreters)\n  - [Related research](#related-research)\u003Cdetails>\u003Csummary>(links require \"Related research\" spoiler to be open)\u003C\u002Fsummary>\n    - [AST Differencing](#ast-differencing)\n    - [Binary Data Modeling](#binary-data-modeling)\n    - [Soft Clustering Using T-mixture Models](#soft-clustering-using-t-mixture-models)\n    - [Natural Language Parsing and Comprehension](#natural-language-parsing-and-comprehension)\n      \u003C\u002Fdetails>\n- [Posts](#posts)\n- [Talks](#talks)\n- [Software](#software)\n  - [Machine Learning](#machine-learning)\n  - [Utilities](#utilities)\n- [Datasets](#datasets)\n- [Credits](#credits)\n- [Contributions](#contributions)\n- [License](#license)\n\n## Digests\n\n- [Learning from \"Big Code\"](http:\u002F\u002Flearnbigcode.github.io) - Techniques, challenges, tools, datasets on \"Big Code\".\n- [A Survey of Machine Learning for Big Code and 
Naturalness](https:\u002F\u002Fml4code.github.io\u002F) - Survey and literature review on Machine Learning on Source Code.\n\n## Conferences\n\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [ACM International Conference on Software Engineering, ICSE](https:\u002F\u002Fwww.icse2018.org\u002F)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [ACM International Conference on Automated Software Engineering, ASE](https:\u002F\u002F2019.aseconf.org)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE)](https:\u002F\u002Fconf.researchr.org\u002Fhome\u002Ffse-2018)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [2018 IEEE 25th International Conference on Software Analysis, Evolution, and Reengineering (SANER)](https:\u002F\u002Fwww.conference-publishing.com\u002Flist.php?Event=SANER18MAIN)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [Machine Learning for Programming](https:\u002F\u002Fml4p.org\u002F)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [Workshop on NLP for Software Engineering](https:\u002F\u002Fnl4se.github.io\u002F)\n- \u003Cimg src=\"badges\u002Forigin-industry-green.svg\" alt=\"origin-industry\" align=\"top\"> [SysML](http:\u002F\u002Fwww.sysml.cc\u002F)\n  - [Talks](https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUChutDKIa-AYyAmbT45s991g\u002F)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [Mining Software Repositories](http:\u002F\u002Fwww.msrconf.org\u002F)\n- \u003Cimg src=\"badges\u002Forigin-industry-green.svg\" alt=\"origin-industry\" align=\"top\"> 
[AIFORSE](https:\u002F\u002Faiforse.org\u002F)\n- \u003Cimg src=\"badges\u002Forigin-industry-green.svg\" alt=\"origin-industry\" align=\"top\"> [source{d} tech talks](https:\u002F\u002Fblog.sourced.tech\u002Fpost\u002Fml_talks_moscow\u002F)\n  - [Talks](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL5Ld68ole7j3iQFUSB3fR9122dHCUWXsy)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [NIPS Neural Abstract Machines and Program Induction workshop](https:\u002F\u002Fuclmr.github.io\u002Fnampi\u002F)\n  - [Talks](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLzTDea_cM27LVPSTdK9RypSyqBHZWPywt)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [CamAIML](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fevent\u002Fartificial-intelligence-and-machine-learning-in-cambridge-2017\u002F)\n  - [Learning to Code: Machine Learning for Program Induction](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=vzDuVhFMB9Q) - Alexander Gaunt.\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [MASES 2018](https:\u002F\u002Fmases18.github.io\u002F)\n\n## Competitions\n\n- [CodRep](https:\u002F\u002Fgithub.com\u002FKTH\u002FCodRep-competition) - competition on automatic program repair: given a source line, find the insertion point.\n\n## Papers\n\n#### Program Synthesis and Induction\n\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Program Synthesis and Semantic Parsing with Learned Code Idioms](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.10816v2) - Richard Shin, Miltiadis Allamanis, Marc Brockschmidt, Oleksandr Polozov, 2019.\n- \u003Cimg src=\"badges\u002F16-pages-gray.svg\" alt=\"16-pages\" align=\"top\"> [Synthetic Datasets for Neural Program Synthesis](https:\u002F\u002Fopenreview.net\u002Fforum?id=ryeOSnAqYm) - Richard Shin, Neel Kant, Kavi Gupta, Chris Bender, Brandon Trabucco, 
Rishabh Singh, Dawn Song, ICLR 2019.\n- \u003Cimg src=\"badges\u002F15-pages-gray.svg\" alt=\"15-pages\" align=\"top\"> [Execution-Guided Neural Program Synthesis](https:\u002F\u002Fopenreview.net\u002Fforum?id=H1gfOiAqYm) - Xinyun Chen, Chang Liu, Dawn Song, ICLR 2019.\n- \u003Cimg src=\"badges\u002F8-pages-gray.svg\" alt=\"8-pages\" align=\"top\"> [DeepFuzz: Automatic Generation of Syntax Valid C Programs for Fuzz Testing](https:\u002F\u002Ffaculty.ist.psu.edu\u002Fwu\u002Fpapers\u002FDeepFuzz.pdf) - Xiao Liu, Xiaoting Li, Rupesh Prajapati, Dinghao Wu, AAAI 2019.\n- \u003Cimg src=\"badges\u002F12-pages-beginner-brightgreen.svg\" alt=\"12-pages-beginner\" align=\"top\"> [NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.08979v2) - Xi Victoria Lin, Chenglong Wang, Luke Zettlemoyer, Michael D. Ernst, LREC 2018.\n- \u003Cimg src=\"badges\u002F18-pages-gray.svg\" alt=\"18-pages\" align=\"top\"> [Recent Advances in Neural Program Synthesis](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.02353v1) - Neel Kant, 2018.\n- \u003Cimg src=\"badges\u002F16-pages-gray.svg\" alt=\"16-pages\" align=\"top\"> [Neural Sketch Learning for Conditional Program Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.05698) - Vijayaraghavan Murali, Letao Qi, Swarat Chaudhuri, Chris Jermaine, ICLR 2018.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Neural Program Search: Solving Programming Tasks from Description and Examples](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.04335v1) - Illia Polosukhin, Alexander Skidanov, ICLR 2018.\n- \u003Cimg src=\"badges\u002F16-pages-gray.svg\" alt=\"16-pages\" align=\"top\"> [Neural Program Synthesis with Priority Queue Training](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.03526v1) - Daniel A. Abolafia, Mohammad Norouzi, Quoc V. 
Le, 2018.\n- \u003Cimg src=\"badges\u002F31-pages-gray.svg\" alt=\"31-pages\" align=\"top\"> [Towards Synthesizing Complex Programs from Input-Output Examples](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.01284v3) - Xinyun Chen, Chang Liu, Dawn Song, ICLR 2018.\n- \u003Cimg src=\"badges\u002F8-pages-gray.svg\" alt=\"8-pages\" align=\"top\"> [Glass-Box Program Synthesis: A Machine Learning Approach](https:\u002F\u002Farxiv.org\u002Fabs\u002F1709.08669v1) - Konstantina Christakopoulou, Adam Tauman Kalai, AAAI 2018.\n- \u003Cimg src=\"badges\u002F14-pages-beginner-brightgreen.svg\" alt=\"14-pages-beginner\" align=\"top\"> [Synthesizing Benchmarks for Predictive Modeling](https:\u002F\u002Fchriscummins.cc\u002Fpub\u002F2017-cgo.pdf) - Chris Cummins, Pavlos Petoumenos, Zheng Wang, Hugh Leather, CGO 2017.\n- \u003Cimg src=\"badges\u002F17-pages-beginner-brightgreen.svg\" alt=\"17-pages-beginner\" align=\"top\"> [Program Synthesis for Character Level Language Modeling](https:\u002F\u002Ffiles.sri.inf.ethz.ch\u002Fwebsite\u002Fpapers\u002Fcharmodel-iclr2017.pdf) - Pavol Bielik, Veselin Raychev, Martin Vechev, ICLR 2017.\n- \u003Cimg src=\"badges\u002F13-pages-beginner-brightgreen.svg\" alt=\"13-pages-beginner\" align=\"top\"> [SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.04436v1) - Xiaojun Xu, Chang Liu, Dawn Song, 2017.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Learning to Select Examples for Program Synthesis](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.03243v1) - Yewen Pu, Zachery Miranda, Armando Solar-Lezama, Leslie Pack Kaelbling, 2017.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [Neural Program Meta-Induction](https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.04157v1) - Jacob Devlin, Rudy Bunel, Rishabh Singh, Matthew Hausknecht, Pushmeet Kohli, NIPS 2017.\n- \u003Cimg 
src=\"badges\u002F14-pages-beginner-brightgreen.svg\" alt=\"14-pages-beginner\" align=\"top\"> [Learning to Infer Graphics Programs from Hand-Drawn Images](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.09627v4) - Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, Joshua B. Tenenbaum, 2017.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [Neural Attribute Machines for Program Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.09231v2) - Matthew Amodio, Swarat Chaudhuri, Thomas Reps, 2017.\n- \u003Cimg src=\"badges\u002F11-pages-beginner-brightgreen.svg\" alt=\"11-pages-beginner\" align=\"top\"> [Abstract Syntax Networks for Code Generation and Semantic Parsing](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.07535v1) - Maxim Rabinovich, Mitchell Stern, Dan Klein, ACL 2017.\n- \u003Cimg src=\"badges\u002F20-pages-gray.svg\" alt=\"20-pages\" align=\"top\"> [Making Neural Programming Architectures Generalize via Recursion](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1704.06611v1.pdf) - Jonathon Cai, Richard Shin, Dawn Song, ICLR 2017.\n- \u003Cimg src=\"badges\u002F14-pages-gray.svg\" alt=\"14-pages\" align=\"top\"> [A Syntactic Neural Model for General-Purpose Code Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.01696v1) - Pengcheng Yin, Graham Neubig, ACL 2017.\n- \u003Cimg src=\"badges\u002F12-pages-beginner-brightgreen.svg\" alt=\"12-pages-beginner\" align=\"top\"> [Program Synthesis from Natural Language Using Recurrent Neural Networks](https:\u002F\u002Fhomes.cs.washington.edu\u002F~mernst\u002Fpubs\u002Fnl-command-tr170301.pdf) - Xi Victoria Lin, Chenglong Wang, Deric Pang, Kevin Vu, Luke Zettlemoyer, Michael Ernst, 2017.\n- \u003Cimg src=\"badges\u002F18-pages-beginner-brightgreen.svg\" alt=\"18-pages-beginner\" align=\"top\"> [RobustFill: Neural Program Learning under Noisy I\u002FO](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.07469v1) - Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, 
Abdel-rahman Mohamed, Pushmeet Kohli, ICML 2017.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Lifelong Perceptual Programming By Example](https:\u002F\u002Fopenreview.net\u002Fpdf?id=HJStZKqel) - Gaunt, Alexander L., Marc Brockschmidt, Nate Kushman, and Daniel Tarlow, 2017.\n- \u003Cimg src=\"badges\u002F7-pages-gray.svg\" alt=\"7-pages\" align=\"top\"> [Neural Programming by Example](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.04990v1) - Chengxun Shu, Hongyu Zhang, AAAI 2017.\n- \u003Cimg src=\"badges\u002F21-pages-gray.svg\" alt=\"21-pages\" align=\"top\"> [DeepCoder: Learning to Write Programs](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.01989) - Balog Matej, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow, ICLR 2017.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [A Differentiable Approach to Inductive Logic Programming](https:\u002F\u002Fpdfs.semanticscholar.org\u002F9698\u002F409fc1603d28b6d51c38261f6243837c8bdd.pdf) - Yang Fan, Zhilin Yang, and William W. 
Cohen, 2017.\n- \u003Cimg src=\"badges\u002F12-pages-beginner-brightgreen.svg\" alt=\"12-pages-beginner\" align=\"top\"> [Latent Attention For If-Then Program Synthesis](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.01867v1) - Xinyun Chen, Chang Liu, Richard Shin, Dawn Song, Mingcheng Chen, NIPS 2016.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\" id=\"card2code\"> [Latent Predictor Networks for Code Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1603.06744) - Wang Ling, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Andrew Senior, Fumin Wang, Phil Blunsom, ACL 2016.\n- \u003Cimg src=\"badges\u002F6-pages-gray.svg\" alt=\"6-pages\" align=\"top\"> [Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision (Short Version)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1612.01197) - Liang Chen, Jonathan Berant, Quoc Le, Kenneth D. Forbus, and Ni Lao, NIPS 2016.\n- \u003Cimg src=\"badges\u002F5-pages-gray.svg\" alt=\"5-pages\" align=\"top\"> [Programs as Black-Box Explanations](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.07579) - Singh, Sameer, Marco Tulio Ribeiro, and Carlos Guestrin, NIPS 2016.\n- \u003Cimg src=\"badges\u002F15-pages-gray.svg\" alt=\"15-pages\" align=\"top\"> [Search-Based Generalization and Refinement of Code Templates](http:\u002F\u002Fsoft.vub.ac.be\u002FPublications\u002F2016\u002Fvub-soft-tr-16-06.pdf) - Tim Molderez, Coen De Roover, SSBSE 2016.\n- \u003Cimg src=\"badges\u002F14-pages-gray.svg\" alt=\"14-pages\" align=\"top\"> [Structured Generative Models of Natural Source Code](https:\u002F\u002Farxiv.org\u002Fabs\u002F1401.0514) - Chris J. 
Maddison, Daniel Tarlow, ICML 2014.\n\n#### Source Code Analysis and Language modeling\n\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Modeling Vocabulary for Big Code Machine Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.01873v1) - Hlib Babii, Andrea Janes, Romain Robbes, 2019.\n- \u003Cimg src=\"badges\u002F24-pages-gray.svg\" alt=\"24-pages\" align=\"top\"> [Generative Code Modeling with Graphs](https:\u002F\u002Fopenreview.net\u002Fforum?id=Bke4KsA5FX) - Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, Oleksandr Polozov, ICLR 2019.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [NL2Type: Inferring JavaScript Function Types from Natural Language Information](http:\u002F\u002Fsoftware-lab.org\u002Fpublications\u002Ficse2019_NL2Type.pdf) - Rabee Sohail Malik, Jibesh Patra, Michael Pradel, ICSE 2019.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [A Novel Neural Source Code Representation based on Abstract Syntax Tree](http:\u002F\u002Fxuwang.tech\u002Fpaper\u002Fastnn_icse2019.pdf) - Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, Xudong Liu, ICSE 2019.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Deep Learning Type Inference](http:\u002F\u002Fvhellendoorn.github.io\u002FPDF\u002Ffse2018-j2t.pdf) - Vincent J. Hellendoorn, Christian Bird, Earl T. Barr and Miltiadis Allamanis, FSE 2018. 
[Code](https:\u002F\u002Fgithub.com\u002FDeepTyper\u002FDeepTyper).\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Tree2Tree Neural Translation Model for Learning Source Code Changes](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.00314.pdf) - Saikat Chakraborty, Miltiadis Allamanis, Baishakhi Ray, 2018.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [code2seq: Generating Sequences from Structured Representations of Code](https:\u002F\u002Farxiv.org\u002Fabs\u002F1808.01400) - Uri Alon, Omer Levy, Eran Yahav, 2018.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Syntax and Sensibility: Using language models to detect and correct syntax errors](http:\u002F\u002Fsoftwareprocess.es\u002Fpubs\u002Fsantos2018SANER-syntax.pdf) - Eddie Antonio Santos, Joshua Charles Campbell, Dhvani Patel, Abram Hindle, and José Nelson Amaral, SANER 2018.\n- \u003Cimg src=\"badges\u002F25-pages-gray.svg\" alt=\"25-pages\" align=\"top\"> [code2vec: Learning Distributed Representations of Code](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.09473v2) - Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav, 2018.\n- \u003Cimg src=\"badges\u002F16-pages-gray.svg\" alt=\"16-pages\" align=\"top\"> [Learning to Represent Programs with Graphs](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.00740v1) - Miltiadis Allamanis, Marc Brockschmidt, Mahmoud Khademi, ICLR 2018.\n- \u003Cimg src=\"badges\u002F36-pages-gray.svg\" alt=\"36-pages\" align=\"top\"> [A Survey of Machine Learning for Big Code and Naturalness](https:\u002F\u002Farxiv.org\u002Fabs\u002F1709.06182v1) - Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, Charles Sutton, 2017.\n- \u003Cimg src=\"badges\u002F36-pages-gray.svg\" alt=\"36-pages\" align=\"top\"> [Are Deep Neural Networks the Best Choice for Modeling Source Code?](http:\u002F\u002Fweb.cs.ucdavis.edu\u002F~devanbu\u002FisDLgood.pdf) - Vincent J. 
Hellendoorn, Premkumar Devanbu, FSE 2017.\n- \u003Cimg src=\"badges\u002F4-pages-gray.svg\" alt=\"4-pages\" align=\"top\"> [A deep language model for software code](https:\u002F\u002Farxiv.org\u002Fabs\u002F1608.02715v1) - Hoa Khanh Dam, Truyen Tran, Trang Pham, 2016.\n- \u003Cimg src=\"badges\u002F8-pages-gray.svg\" alt=\"8-pages\" align=\"top\"> [Convolutional Neural Networks over Tree Structures for Programming Language Processing](https:\u002F\u002Farxiv.org\u002Fabs\u002F1409.5718) - Lili Mou, Ge Li, Lu Zhang, Tao Wang, Zhi Jin, AAAI-16. [Code](https:\u002F\u002Fgithub.com\u002Fcrestonbunch\u002Ftbcnn).\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Suggesting Accurate Method and Class Names](http:\u002F\u002Fhomepages.inf.ed.ac.uk\u002Fcsutton\u002Fpublications\u002Faccurate-method-and-class.pdf) - Miltiadis Allamanis, Earl T. Barr, Christian Bird, Charles Sutton, FSE 2015.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [Mining Source Code Repositories at Massive Scale using Language Modeling](http:\u002F\u002Fhomepages.inf.ed.ac.uk\u002Fcsutton\u002Fpublications\u002Fmsr2013.pdf) - Miltiadis Allamanis, Charles Sutton, MSR 2013.\n\n#### Neural Network Architectures and Algorithms\n\n- \u003Cimg src=\"badges\u002F19-pages-gray.svg\" alt=\"19-pages\" align=\"top\"> [Learning Compositional Neural Programs with Recursive Tree Search and Planning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.12941v1) - Thomas Pierrot, Guillaume Ligner, Scott Reed, Olivier Sigaud, Nicolas Perrin, Alexandre Laterre, David Kas, Karim Beguir, Nando de Freitas, 2019.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [From Programs to Interpretable Deep Models and Back](https:\u002F\u002Flink.springer.com\u002Fcontent\u002Fpdf\u002F10.1007%2F978-3-319-96145-3_2.pdf) - Eran Yahav, ICCAV 2018.\n- \u003Cimg src=\"badges\u002F13-pages-gray.svg\" alt=\"13-pages\" align=\"top\"> [Neural 
Code Comprehension: A Learnable Representation of Code Semantics](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.07336) - Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler, NIPS 2018.\n- \u003Cimg src=\"badges\u002F16-pages-gray.svg\" alt=\"16-pages\" align=\"top\"> [A General Path-Based Representation for Predicting Program Properties](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.09544) - Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav, PLDI 2018.\n- \u003Cimg src=\"badges\u002F4-pages-gray.svg\" alt=\"4-pages\" align=\"top\"> [Cross-Language Learning for Program Classification using Bilateral Tree-Based Convolutional Neural Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.06159v2) - Nghi D. Q. Bui, Lingxiao Jiang, Yijun Yu, AAAI 2018.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification](https:\u002F\u002Fbdqnghi.github.io\u002Ffiles\u002FSANER_2019_bilateral_dependency.pdf) - Nghi D. Q. 
Bui, Yijun Yu, Lingxiao Jiang, SANER 2018.\n- \u003Cimg src=\"badges\u002F17-pages-gray.svg\" alt=\"17-pages\" align=\"top\"> [Syntax-Directed Variational Autoencoder for Structured Data](https:\u002F\u002Fopenreview.net\u002Fpdf?id=SyqShMZRb) - Hanjun Dai, Yingtao Tian, Bo Dai, Steven Skiena, Le Song, ICLR 2018.\n- \u003Cimg src=\"badges\u002F19-pages-gray.svg\" alt=\"19-pages\" align=\"top\"> [Divide and Conquer with Neural Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.02401) - Nowak, Alex, and Joan Bruna, ICLR 2018.\n- \u003Cimg src=\"badges\u002F13-pages-gray.svg\" alt=\"13-pages\" align=\"top\"> [Hierarchical multiscale recurrent neural networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1609.01704) - Chung Junyoung, Sungjin Ahn, and Yoshua Bengio, ICLR 2017.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [Learning Efficient Algorithms with Hierarchical Attentive Memory](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.03218) - Andrychowicz, Marcin, and Karol Kurach, 2016.\n- \u003Cimg src=\"badges\u002F6-pages-gray.svg\" alt=\"6-pages\" align=\"top\"> [Learning Operations on a Stack with Neural Turing Machines](https:\u002F\u002Farxiv.org\u002Fabs\u002F1612.00827) - Deleu, Tristan, and Joseph Dureau, NIPS 2016.\n- \u003Cimg src=\"badges\u002F5-pages-gray.svg\" alt=\"5-pages\" align=\"top\"> [Probabilistic Neural Programs](https:\u002F\u002Farxiv.org\u002Fabs\u002F1612.00712) - Murray, Kenton W., and Jayant Krishnamurthy, NIPS 2016.\n- \u003Cimg src=\"badges\u002F13-pages-gray.svg\" alt=\"13-pages\" align=\"top\"> [Neural Programmer-Interpreters](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06279) - Reed, Scott, and Nando de Freitas, ICLR 2016.\n- \u003Cimg src=\"badges\u002F9-pages-gray.svg\" alt=\"9-pages\" align=\"top\"> [Neural GPUs Learn Algorithms](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.08228) - Kaiser, Łukasz, and Ilya Sutskever, ICLR 2016.\n- \u003Cimg src=\"badges\u002F17-pages-gray.svg\" 
alt=\"17-pages\" align=\"top\"> [Neural Random-Access Machines](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06392v3) - Karol Kurach, Marcin Andrychowicz, Ilya Sutskever, ERCIM News 2016.\n- \u003Cimg src=\"badges\u002F18-pages-gray.svg\" alt=\"18-pages\" align=\"top\"> [Neural Programmer: Inducing Latent Programs with Gradient Descent](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.04834) - Neelakantan, Arvind, Quoc V. Le, and Ilya Sutskever, ICLR 2015.\n- \u003Cimg src=\"badges\u002F25-pages-gray.svg\" alt=\"25-pages\" align=\"top\"> [Learning to Execute](https:\u002F\u002Farxiv.org\u002Fabs\u002F1410.4615v3) - Wojciech Zaremba, Ilya Sutskever, 2015.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets](https:\u002F\u002Farxiv.org\u002Fabs\u002F1503.01007) - Joulin, Armand, and Tomas Mikolov, NIPS 2015.\n- \u003Cimg src=\"badges\u002F26-pages-gray.svg\" alt=\"26-pages\" align=\"top\"> [Neural Turing Machines](https:\u002F\u002Farxiv.org\u002Fabs\u002F1410.5401) - Graves, Alex, Greg Wayne, and Ivo Danihelka, 2014.\n- \u003Cimg src=\"badges\u002F15-pages-gray.svg\" alt=\"15-pages\" align=\"top\"> [From Machine Learning to Machine Reasoning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1102.1808) - Bottou Leon, Journal of Machine Learning 2011.\n\n#### Embeddings in Software Engineering\n\n- \u003Cimg src=\"badges\u002F8-pages-gray.svg\" alt=\"8-pages\" align=\"top\"> [A Literature Study of Embeddings on Source Code](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.03061) - Zimin Chen and Martin Monperrus, 2019.\n- \u003Cimg src=\"badges\u002F3-pages-gray.svg\" alt=\"3-pages\" align=\"top\"> [AST-Based Deep Learning for Detecting Malicious PowerShell](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.09230.pdf) - Gili Rusak, Abdullah Al-Dujaili, Una-May O'Reilly, 2018.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Deep Code 
Search](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=3180167) - Xiaodong Gu, Hongyu Zhang, Sunghun Kim, ICSE 2018.\n- \u003Cimg src=\"badges\u002F4-pages-gray.svg\" alt=\"4-pages\" align=\"top\"> [Word Embeddings for the Software Engineering Domain](https:\u002F\u002Fgithub.com\u002Fvefstathiou\u002FSO_word2vec\u002Fblob\u002Fmaster\u002FMSR18-w2v.pdf) - Vasiliki Efstathiou, Christos Chatzilenas, Diomidis Spinellis, MSR 2018.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.06686) - Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, Thomas Reps, FSE 2018.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [Document Distance Estimation via Code Graph Embedding](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F320074701_Document_Distance_Estimation_via_Code_Graph_Embedding) - Zeqi Lin, Junfeng Zhao, Yanzhen Zou, Bing Xie, Internetware 2017.\n- \u003Cimg src=\"badges\u002F3-pages-gray.svg\" alt=\"3-pages\" align=\"top\"> [Combining Word2Vec with revised vector space model for better code retrieval](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F318123700_Combining_Word2Vec_with_Revised_Vector_Space_Model_for_Better_Code_Retrieval) - Thanh Van Nguyen, Anh Tuan Nguyen, Hung Dang Phan, Trong Duc Nguyen, Tien N. 
Nguyen, ICSE 2017.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [From word embeddings to document similarities for improved information retrieval in software engineering](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F296526040_From_Word_Embeddings_To_Document_Similarities_for_Improved_Information_Retrieval_in_Software_Engineering) - Xin Ye, Hui Shen, Xiao Ma, Razvan Bunescu, Chang Liu, ICSE 2016.\n- \u003Cimg src=\"badges\u002F3-pages-gray.svg\" alt=\"3-pages\" align=\"top\"> [Mapping API Elements for Code Migration with Vector Representation](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=2892661) - Trong Duc Nguyen, Anh Tuan Nguyen, Tien N. Nguyen, ICSE 2016.\n\n#### Program Translation\n\n- \u003Cimg src=\"badges\u002F18-pages-gray.svg\" alt=\"18-pages\" align=\"top\"> [Towards Neural Decompilation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.08325v1) - Omer Katz, Yuval Olshaker, Yoav Goldberg, Eran Yahav, 2019.\n- \u003Cimg src=\"badges\u002F14-pages-gray.svg\" alt=\"14-pages\" align=\"top\"> [Tree-to-tree Neural Networks for Program Translation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.03691v1) - Xinyun Chen, Chang Liu, Dawn Song, ICLR 2018.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Code Attention: Translating Code to Comments by Exploiting Domain Features](https:\u002F\u002Farxiv.org\u002Fabs\u002F1709.07642v2) - Wenhao Zheng, Hong-Yu Zhou, Ming Li, Jianxin Wu, 2017.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Automatically Generating Commit Messages from Diffs using Neural Machine Translation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1708.09492v1) - Siyuan Jiang, Ameer Armaly, Collin McMillan, ASE 2017.\n- \u003Cimg src=\"badges\u002F5-pages-gray.svg\" alt=\"5-pages\" align=\"top\"> [A Parallel Corpus of Python Functions and Documentation Strings for Automated Code Documentation and Code 
Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.02275v1) - Antonio Valerio Miceli Barone, Rico Sennrich, IJCNLP 2017.\n- \u003Cimg src=\"badges\u002F6-pages-gray.svg\" alt=\"6-pages\" align=\"top\"> [A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.04856v1) - Pablo Loyola, Edison Marrese-Taylor, Yutaka Matsuo, ACL 2017.\n\n#### Code Suggestion and Completion\n\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Aroma: Code Recommendation via Structural Code Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.01158) - Sifei Luan, Di Yang, Koushik Sen and Satish Chandra, 2019.\n- \u003Cimg src=\"badges\u002F9-pages-gray.svg\" alt=\"9-pages\" align=\"top\"> [Intelligent Code Reviews Using Deep Learning](https:\u002F\u002Fwww.kdd.org\u002Fkdd2018\u002Ffiles\u002Fdeep-learning-day\u002FDLDay18_paper_40.pdf) - Anshul Gupta, Neel Sundaresan, KDD DL Day 2018.\n- \u003Cimg src=\"badges\u002F8-pages-gray.svg\" alt=\"8-pages\" align=\"top\"> [Code Completion with Neural Attention and Pointer Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.09573v1) - Jian Li, Yue Wang, Irwin King, Michael R. 
Lyu, 2017.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Learning Python Code Suggestion with a Sparse Pointer Network](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.08307) - Avishkar Bhoopchand, Tim Rocktäschel, Earl Barr, Sebastian Riedel, 2016.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [Code Completion with Statistical Language Models](http:\u002F\u002Fwww.cs.technion.ac.il\u002F~yahave\u002Fpapers\u002Fpldi14-statistical.pdf) - Veselin Raychev, Martin Vechev, Eran Yahav, PLDI 2014.\n\n#### Program Repair and Bug Detection\n\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [SampleFix: Learning to Correct Programs by Sampling Diverse Fixes](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.10502) - Hossein Hajipour, Apratim Bhattacharya, Mario Fritz, 2019.\n- \u003Cimg src=\"badges\u002F15-pages-gray.svg\" alt=\"15-pages\" align=\"top\"> [Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection](https:\u002F\u002Fopenreview.net\u002Fforum?id=ByloIiCqYQ) - Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel, Lizhen Qu, ICLR 2019.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Neural Program Repair by Jointly Learning to Localize and Repair](https:\u002F\u002Fopenreview.net\u002Fforum?id=ByloJ20qtm) - Marko Vasic, Aditya Kanade, Petros Maniatis, David Bieber, Rishabh Singh, ICLR 2019.\n- \u003Cimg src=\"badges\u002F11-pages-beginner-brightgreen.svg\" alt=\"11-pages\" align=\"top\"> [Compiler Fuzzing through Deep Learning](https:\u002F\u002Fchriscummins.cc\u002Fpub\u002F2018-issta.pdf) - Chris Cummins, Pavlos Petoumenos, Alastair Murray, Hugh Leather, ISSTA 2018.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [Automatically assessing vulnerabilities discovered by compositional 
analysis](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=3243130) - Saahil Ognawala, Ricardo Nales Amato, Alexander Pretschner and Pooja Kulkarni, MASES 2018.\n- \u003Cimg src=\"badges\u002F6-pages-gray.svg\" alt=\"6-pages\" align=\"top\"> [An Empirical Investigation into Learning Bug-Fixing Patches in the Wild via Neural Machine Translation](http:\u002F\u002Fwww.cs.wm.edu\u002F~denys\u002Fpubs\u002FASE%2718-Learning-Bug-Fixes-NMT.pdf) - Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White, Denys Poshyvanyk, ASE 2018.\n- \u003Cimg src=\"badges\u002F23-pages-gray.svg\" alt=\"23-pages\" align=\"top\"> [DeepBugs: A Learning Approach to Name-based Bug Detection](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1805.11683.pdf) - Michael Pradel, Koushik Sen, 2018.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Learning How to Mutate Source Code from Bug-Fixes](https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.10772) - Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White, Denys Poshyvanyk, 2018.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [A deep tree-based model for software defect prediction](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.00921) - HK Dam, T Pham, SW Ng, [T Tran](https:\u002F\u002Ftruyentran.github.io), J Grundy, A Ghose, T Kim, CJ Kim, 2018.\n- \u003Cimg src=\"badges\u002F7-pages-gray.svg\" alt=\"7-pages\" align=\"top\"> [Automated Vulnerability Detection in Source Code Using Deep Representation Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1807.04320) - Rebecca L. Russell, Louis Kim, Lei H. Hamilton, Tomo Lazovich, Jacob A. Harer, Onur Ozdemir, Paul M. Ellingwood, Marc W. 
McConley, 2018.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Shaping Program Repair Space with Existing Patches and Similar Code](https:\u002F\u002Fxiongyingfei.github.io\u002Fpapers\u002FISSTA18a.pdf) - Jiajun Jiang, Yingfei Xiong, Hongyu Zhang, Qing Gao, Xiangqun Chen, 2018. ([code](https:\u002F\u002Fgithub.com\u002Fxgdsmileboy\u002FSimFix)).\n- \u003Cimg src=\"badges\u002F15-pages-gray.svg\" alt=\"15-pages\" align=\"top\"> [Learning to Repair Software Vulnerabilities with Generative Adversarial Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1805.07475) - Jacob A. Harer, Onur Ozdemir, Tomo Lazovich, Christopher P. Reale, Rebecca L. Russell, Louis Y. Kim, Peter Chin, 2018.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Dynamic Neural Program Embedding for Program Repair](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.07163v2) - Ke Wang, Rishabh Singh, Zhendong Su, ICLR 2018.\n- \u003Cimg src=\"badges\u002F8-pages-gray.svg\" alt=\"8-pages\" align=\"top\"> [Estimating defectiveness of source code: A predictive model using GitHub content](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.07764) - Ritu Kapur, Balwinder Sodhi, 2018.\n- \u003Cimg src=\"badges\u002F8-pages-gray.svg\" alt=\"8-pages\" align=\"top\"> [Automated software vulnerability detection with machine learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.04497) - Jacob A. Harer, Louis Y. Kim, Rebecca L. Russell, Onur Ozdemir, Leonard R. Kosta, Akshay Rangamani, Lei H. Hamilton, Gabriel I. Centeno, Jonathan R. Key, Paul M. Ellingwood, Marc W. McConley, Jeffrey M. Opper, Peter Chin, Tomo Lazovich, IWSPA 2018.\n- \u003Cimg src=\"badges\u002F34-pages-gray.svg\" alt=\"34-pages\" align=\"top\"> [Learning a Static Analyzer from Data](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.01752) - Pavol Bielik, Veselin Raychev, Martin Vechev, CAV 2017. 
[video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=bkieI3jLxVY).\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [To Type or Not to Type: Quantifying Detectable Bugs in JavaScript](http:\u002F\u002Fearlbarr.com\u002Fpublications\u002Ftypestudy.pdf) - Zheng Gao, Christian Bird, Earl Barr, ICSE 2017.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.04742) - Martin White, Michele Tufano, Matías Martínez, Martin Monperrus, Denys Poshyvanyk, 2017.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Semantic Code Repair using Neuro-Symbolic Transformation Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.11054v1) - Jacob Devlin, Jonathan Uesato, Rishabh Singh, Pushmeet Kohli, 2017.\n- \u003Cimg src=\"badges\u002F6-pages-gray.svg\" alt=\"6-pages\" align=\"top\"> [Automated Identification of Security Issues from Commit Messages and Bug Reports](http:\u002F\u002Fasankhaya.github.io\u002Fpdf\u002Fautomated-identification-of-security-issues-from-commit-messages-and-bug-reports.pdf) - Yaqin Zhou and Asankhaya Sharma, FSE 2017.\n- \u003Cimg src=\"badges\u002F31-pages-gray.svg\" alt=\"31-pages\" align=\"top\"> [SmartPaste: Learning to Adapt Source Code](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.07867) - Miltiadis Allamanis, Marc Brockschmidt, 2017.\n- \u003Cimg src=\"badges\u002F7-pages-gray.svg\" alt=\"7-pages\" align=\"top\"> [End-to-End Prediction of Buffer Overruns from Raw Source Code via Neural Memory Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.02458v1) - Min-je Choi, Sehun Jeong, Hakjoo Oh, Jaegul Choo, IJCAI 2017.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Tailored Mutants Fit Bugs Better](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.02516) - Miltiadis Allamanis, 
Earl T. Barr, René Just, Charles Sutton, 2016.\n\n#### APIs and Code Mining\n\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [SAR: Learning Cross-Language API Mappings with Little Knowledge](https:\u002F\u002Fbdqnghi.github.io\u002Ffiles\u002FFSE_2019.pdf) - Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang, FSE 2019.\n- \u003Cimg src=\"badges\u002F4-pages-gray.svg\" alt=\"4-pages\" align=\"top\"> [Hierarchical Learning of Cross-Language Mappings through Distributed Vector Representations for Code](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.04715) - Nghi D. Q. Bui, Lingxiao Jiang, ICSE 2018.\n- \u003Cimg src=\"badges\u002F7-pages-gray.svg\" alt=\"7-pages\" align=\"top\"> [DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.07734v1) - Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim, IJCAI 2017.\n- \u003Cimg src=\"badges\u002F9-pages-gray.svg\" alt=\"9-pages\" align=\"top\"> [Mining Change Histories for Unknown Systematic Edits](http:\u002F\u002Fsoft.vub.ac.be\u002FPublications\u002F2017\u002Fvub-soft-tr-17-04.pdf) - Tim Molderez, Reinout Stevens, Coen De Roover, MSR 2017.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Deep API Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1605.08535v3) - Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim, FSE 2016.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Exploring API Embedding for API Usages and Applications](http:\u002F\u002Fhome.eng.iastate.edu\u002F~trong\u002Fprojects\u002Fjv2cs\u002F) - Nguyen, Nguyen, Phan and Nguyen, Journal of Systems and Software 2017.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [API usage pattern recommendation for software development](http:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS0164121216301200) - Haoran Niu, Iman Keivanloo, Ying Zou, 
2017.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Parameter-Free Probabilistic API Mining across GitHub](http:\u002F\u002Fhomepages.inf.ed.ac.uk\u002Fcsutton\u002Fpublications\u002Ffse2016.pdf) - Jaroslav Fowkes, Charles Sutton, FSE 2016.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [A Subsequence Interleaving Model for Sequential Pattern Mining](http:\u002F\u002Fhomepages.inf.ed.ac.uk\u002Fcsutton\u002Fpublications\u002Fkdd2016-subsequence-interleaving.pdf) - Jaroslav Fowkes, Charles Sutton, KDD 2016.\n- \u003Cimg src=\"badges\u002F4-pages-gray.svg\" alt=\"4-pages\" align=\"top\"> [Lean GHTorrent: GitHub data on demand](https:\u002F\u002Fbvasiles.github.io\u002Fpapers\u002Flean-ghtorrent.pdf) - Georgios Gousios, Bogdan Vasilescu, Alexander Serebrenik, Andy Zaidman, MSR 2014.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Mining idioms from source code](http:\u002F\u002Fhomepages.inf.ed.ac.uk\u002Fcsutton\u002Fpublications\u002Fidioms.pdf) - Miltiadis Allamanis, Charles Sutton, FSE 2014.\n- \u003Cimg src=\"badges\u002F4-pages-gray.svg\" alt=\"4-pages\" align=\"top\"> [The GHTorent Dataset and Tool Suite](http:\u002F\u002Fwww.gousios.gr\u002Fpub\u002Fghtorrent-dataset-toolsuite.pdf) - Georgios Gousios, MSR 2013.\n\n#### Code Optimization\n\n- \u003Cimg src=\"badges\u002F27-pages-gray.svg\" alt=\"27-pages\" align=\"top\"> [The Case for Learned Index Structures](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.01208v2) - Tim Kraska, Alex Beutel, Ed H. 
Chi, Jeffrey Dean, Neoklis Polyzotis, SIGMOD 2018.\n- \u003Cimg src=\"badges\u002F14-pages-gray.svg\" alt=\"14-pages\" align=\"top\"> [End-to-end Deep Learning of Optimization Heuristics](https:\u002F\u002Fchriscummins.cc\u002Fpub\u002F2017-pact.pdf) - Chris Cummins, Pavlos Petoumenos, Zheng Wang, Hugh Leather, PACT 2017.\n- \u003Cimg src=\"badges\u002F14-pages-gray.svg\" alt=\"14-pages\" align=\"top\"> [Learning to superoptimize programs](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.01787v3) - Rudy Bunel, Alban Desmaison, M. Pawan Kumar, Philip H.S. Torr, Pushmeet Kohli, ICLR 2017.\n- \u003Cimg src=\"badges\u002F18-pages-gray.svg\" alt=\"18-pages\" align=\"top\"> [Neural Nets Can Learn Function Type Signatures From Binaries](https:\u002F\u002Fwww.usenix.org\u002Fsystem\u002Ffiles\u002Fconference\u002Fusenixsecurity17\u002Fsec17-chua.pdf) - Zheng Leong Chua, Shiqi Shen, Prateek Saxena, and Zhenkai Liang, USENIX Security Symposium 2017.\n- \u003Cimg src=\"badges\u002F25-pages-gray.svg\" alt=\"25-pages\" align=\"top\"> [Adaptive Neural Compilation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1605.07969v2) - Rudy Bunel, Alban Desmaison, Pushmeet Kohli, Philip H.S. Torr, M. Pawan Kumar, NIPS 2016.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [Learning to Superoptimize Programs - Workshop Version](https:\u002F\u002Farxiv.org\u002Fabs\u002F1612.01094) - Bunel, Rudy, Alban Desmaison, M. Pawan Kumar, Philip H. S. 
Torr, and Pushmeet Kohli, NIPS 2016.\n\n#### Topic Modeling\n\n- \u003Cimg src=\"badges\u002F9-pages-gray.svg\" alt=\"9-pages\" align=\"top\"> [A Language-Agnostic Model for Semantic Source Code Labeling](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=3243132) - Ben Gelman, Bryan Hoyle, Jessica Moore, Joshua Saxe and David Slater, MASES 2018.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Topic modeling of public repositories at scale using names in source code](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.00135) - Vadim Markovtsev, Eiso Kant, 2017.\n- \u003Cimg src=\"badges\u002F4-pages-gray.svg\" alt=\"4-pages\" align=\"top\"> [Why, When, and What: Analyzing Stack Overflow Questions by Topic, Type, and Code](http:\u002F\u002Fhomepages.inf.ed.ac.uk\u002Fcsutton\u002Fpublications\u002FmsrCh2013.pdf) - Miltiadis Allamanis, Charles Sutton, MSR 2013.\n- \u003Cimg src=\"badges\u002F30-pages-gray.svg\" alt=\"30-pages\" align=\"top\"> [Semantic clustering: Identifying topics in source code](http:\u002F\u002Fscg.unibe.ch\u002Farchive\u002Fdrafts\u002FKuhn06bSemanticClustering.pdf) - Adrian Kuhn, Stéphane Ducasse, Tudor Girba, Information & Software Technology 2007.\n\n#### Sentiment Analysis\n\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [A Benchmark Study on Sentiment Analysis for Software Engineering Research](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.06525) - Nicole Novielli, Daniela Girardi, Filippo Lanubile, MSR 2018.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Sentiment Analysis for Software Engineering: How Far Can We Go?](http:\u002F\u002Fwww.inf.usi.ch\u002Fphd\u002Flin\u002Fdownloads\u002FLin2018a.pdf) - Bin Lin, Fiorella Zampetti, Gabriele Bavota, Massimiliano Di Penta, Michele Lanza, Rocco Oliveto, ICSE 2018.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Leveraging Automated Sentiment Analysis in 
Software Engineering](http:\u002F\u002Fcs.uno.edu\u002F~zibran\u002Fresources\u002FMyPapers\u002FSentiStrengthSE_2017.pdf) - Md Rakibul Islam, Minhaz F. Zibran, MSR 2017.\n- \u003Cimg src=\"badges\u002F27-pages-gray.svg\" alt=\"27-pages\" align=\"top\"> [Sentiment Polarity Detection for Software Development](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1709.02984.pdf) - Fabio Calefato, Filippo Lanubile, Federico Maiorano, Nicole Novielli, Empirical Software Engineering 2017.\n- \u003Cimg src=\"badges\u002F6-pages-gray.svg\" alt=\"6-pages\" align=\"top\"> [SentiCR: A Customized Sentiment Analysis Tool for Code Review Interactions](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F0Byog0ILN8S1haGxpT3hvSzZxdms\u002Fview) - Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, Shahram Rahimi, ASE 2017.\n\n#### Code Summarization\n\n- \u003Cimg src=\"badges\u002F7-pages-gray.svg\" alt=\"7-pages\" align=\"top\"> [Summarizing Source Code with Transferred API Knowledge](https:\u002F\u002Fxin-xia.github.io\u002Fpublication\u002Fijcai18.pdf) - Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, Zhi Jin, IJCAI 2018.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Deep Code Comment Generation](https:\u002F\u002Fxin-xia.github.io\u002Fpublication\u002Ficpc182.pdf) - Xing Hu, Ge Li, Xin Xia, David Lo, Zhi Jin, ICPC 2018.\n- \u003Cimg src=\"badges\u002F6-pages-gray.svg\" alt=\"6-pages\" align=\"top\"> [A Neural Framework for Retrieval and Summarization of Source Code](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=3240471) - Qingying Chen, Minghui Zhou, ASE 2018.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Improving Automatic Source Code Summarization via Deep Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.07234) - Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu and Philip S. 
Yu, ASE 2018.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [A Convolutional Attention Network for Extreme Summarization of Source Code](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.03001) - Miltiadis Allamanis, Hao Peng, Charles Sutton, ICML 2016.\n- \u003Cimg src=\"badges\u002F4-pages-gray.svg\" alt=\"4-pages\" align=\"top\"> [TASSAL: Autofolding for Source Code Summarization](http:\u002F\u002Fhomepages.inf.ed.ac.uk\u002Fcsutton\u002Fpublications\u002Ficse2016-demo.pdf) - Jaroslav Fowkes, Pankajan Chanthirasegaran, Razvan Ranca, Miltiadis Allamanis, Mirella Lapata, Charles Sutton, ICSE 2016.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Summarizing Source Code using a Neural Attention Model](https:\u002F\u002Fgithub.com\u002Fsriniiyer\u002Fcodenn\u002Fblob\u002Fmaster\u002Fsummarizing_source_code.pdf) - Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Luke Zettlemoyer, ACL 2016.\n- \u003Cimg src=\"badges\u002F13-pages-gray.svg\" alt=\"13-pages\" align=\"top\"> [Automatic Generation of Pull Request Descriptions](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.06987) - Zhongxin Liu, Xin Xia, Christoph Treude, David Lo, Shanping Li, ASE 2019.\n\n#### Clone Detection\n\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection](https:\u002F\u002Fpvs.ifi.uni-heidelberg.de\u002Ffileadmin\u002Fpapers\u002F2019\u002FBuech-Andrzejak-SANER2019.pdf) - Lutz Büch and Artur Andrzejak, SANER 2019.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [Oreo: detection of clones in the twilight zone](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=3236026) - Vaibhav Saini, Farima Farmahinifarahani, Yadong Lu, Pierre Baldi, and Cristina V. 
Lopes, FSE 2018.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [A Deep Learning Approach to Program Similarity](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=3243131) - Niccolò Marastoni, Roberto Giacobazzi and Mila Dalla Preda, MASES 2018.\n- \u003Cimg src=\"badges\u002F6-pages-gray.svg\" alt=\"6-pages\" align=\"top\"> [Recurrent Neural Network for Code Clone Detection](https:\u002F\u002Fseim-conf.org\u002Fmedia\u002Fmaterials\u002F2018\u002Fproceedings\u002FSEIM-2018_Short_Papers.pdf#page=48) - Arseny Zorin and Vladimir Itsykson, SEIM 2018.\n- \u003Cimg src=\"badges\u002F8-pages-gray.svg\" alt=\"8-pages\" align=\"top\"> [The Adverse Effects of Code Duplication in Machine Learning Models of Code](https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.06469) - Miltiadis Allamanis, 2018.\n- \u003Cimg src=\"badges\u002F28-pages-gray.svg\" alt=\"28-pages\" align=\"top\"> [DéjàVu: a map of code duplicates on GitHub](http:\u002F\u002Fjanvitek.org\u002Fpubs\u002Foopsla17b.pdf) - Cristina V. 
Lopes, Petr Maj, Pedro Martins, Vaibhav Saini, Di Yang, Jakub Zitny, Hitesh Sajnani, Jan Vitek, Programming Languages OOPSLA 2017.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Some from Here, Some from There: Cross-project Code Reuse in GitHub](http:\u002F\u002Fweb.cs.ucdavis.edu\u002F~filkov\u002Fpapers\u002Fclones.pdf) - Mohammad Gharehyazie, Baishakhi Ray, Vladimir Filkov, MSR 2017.\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Deep Learning Code Fragments for Code Clone Detection](http:\u002F\u002Fwww.cs.wm.edu\u002F~denys\u002Fpubs\u002FASE%2716-DeepLearningClones.pdf) - Martin White, Michele Tufano, Christopher Vendome, and Denys Poshyvanyk, ASE 2016.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [A study of repetitiveness of code changes in software evolution](https:\u002F\u002Flib.dr.iastate.edu\u002Fcgi\u002Fviewcontent.cgi?referer=https:\u002F\u002Fscholar.google.com\u002F&httpsredir=1&article=1016&context=cs_conf) - HA Nguyen, AT Nguyen, TT Nguyen, TN Nguyen, H Rajan, ASE 2013.\n\n#### Differentiable Interpreters\n\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.11361v1) - Joseph Suarez, Justin Johnson, Fei-Fei Li, 2018.\n- \u003Cimg src=\"badges\u002F16-pages-gray.svg\" alt=\"16-pages\" align=\"top\"> [Improving the Universality and Learnability of Neural Programmer-Interpreters with Combinator Abstraction](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.02696v1) - Da Xiao, Jo-Yu Liao, Xingyuan Yuan, ICLR 2018.\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [Differentiable Programs with Neural Libraries](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.02109v2) - Alexander L. 
Gaunt, Marc Brockschmidt, Nate Kushman, Daniel Tarlow, ICML 2017.\n- \u003Cimg src=\"badges\u002F15-pages-gray.svg\" alt=\"15-pages\" align=\"top\"> [Differentiable Functional Program Interpreters](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.01988v2) - John K. Feser, Marc Brockschmidt, Alexander L. Gaunt, Daniel Tarlow, 2017.\n- \u003Cimg src=\"badges\u002F18-pages-gray.svg\" alt=\"18-pages\" align=\"top\"> [Programming with a Differentiable Forth Interpreter](https:\u002F\u002Farxiv.org\u002Fabs\u002F1605.06640) - Bošnjak, Matko, Tim Rocktäschel, Jason Naradowsky, and Sebastian Riedel, ICML 2017.\n- \u003Cimg src=\"badges\u002F15-pages-gray.svg\" alt=\"15-pages\" align=\"top\"> [Neural Functional Programming](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.01988v1) - Feser John K., Marc Brockschmidt, Alexander L. Gaunt, and Daniel Tarlow, ICLR 2017.\n- \u003Cimg src=\"badges\u002F7-pages-gray.svg\" alt=\"7-pages\" align=\"top\"> [TerpreT: A Probabilistic Programming Language for Program Induction](https:\u002F\u002Farxiv.org\u002Fabs\u002F1612.00817) - Gaunt, Alexander L., Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, Jonathan Taylor, and Daniel Tarlow, NIPS 2016.\n\n\u003Ca name=\"related-research\">\u003C\u002Fa>\n\n\u003Cdetails>\n\u003Csummary>Related research\u003C\u002Fsummary>\n\n#### AST Differencing\n\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [ClDiff: Generating Concise Linked Code Differences](https:\u002F\u002Fchenbihuan.github.io\u002Fpaper\u002Fase18-huang-cldiff.pdf) - Kaifeng Huang, Bihuan Chen, Xin Peng, Daihong Zhou, Ying Wang, Yang Liu, Wenyun Zhao, ASE 2018. 
[Code](https:\u002F\u002Fgithub.com\u002FFudanSELab\u002FCLDIFF).\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Generating Accurate and Compact Edit Scripts Using Tree Differencing](http:\u002F\u002Fwww.xifiggam.eu\u002Fwp-content\u002Fuploads\u002F2018\u002F08\u002FGeneratingAccurateandCompactEditScriptsusingTreeDifferencing.pdf) - Veit Frick, Thomas Grassauer, Fabian Beck, Martin Pinzger, ICSME 2018.\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [Fine-grained and Accurate Source Code Differencing](https:\u002F\u002Fhal.archives-ouvertes.fr\u002Fhal-01054552\u002Fdocument) - Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, Martin Monperrus, ASE 2014.\n\n#### Binary Data Modeling\n\n- [Clustering Binary Data with Bernoulli Mixture Models](https:\u002F\u002Fnsgrantham.com\u002Fdocuments\u002Fclustering-binary-data.pdf) - Neal S. Grantham.\n- [A Family of Blockwise One-Factor Distributions for Modelling High-Dimensional Binary Data](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.01343.pdf) - Matthieu Marbac and Mohammed Sedki, Computational Statistics & Data Analysis 2017.\n- [BayesBinMix: an R Package for Model Based Clustering of Multivariate Binary Data](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1609.06960.pdf) - Panagiotis Papastamoulis and Magnus Rattray, R Journal 2016.\n\n#### Soft Clustering Using T-mixture Models\n\n- [Robust mixture modelling using the t distribution](http:\u002F\u002Fciteseerx.ist.psu.edu\u002Fviewdoc\u002Fdownload?doi=10.1.1.218.7334&rep=rep1&type=pdf) - D. Peel and G. J. McLachlan, Statistics and Computing 2000.\n- [Robust mixture modeling using the skew t distribution](http:\u002F\u002Fciteseerx.ist.psu.edu\u002Fviewdoc\u002Fdownload?doi=10.1.1.1030.9865&rep=rep1&type=pdf) - Tsung I. Lin, Jack C. Lee and Wan J. 
Hsieh, Statistics and Computing 2010.\n\n#### Natural Language Parsing and Comprehension\n\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [A Fast Unified Model for Parsing and Sentence Understanding](https:\u002F\u002Farxiv.org\u002Fabs\u002F1603.06021) - Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D. Manning, Christopher Potts, ACL 2016.\n\n\u003C\u002Fdetails>\n\n## Posts\n\n- [Semantic Code Search](https:\u002F\u002Ftowardsdatascience.com\u002Fsemantic-code-search-3cd6d244a39c)\n- [Learning from Source Code](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fblog\u002Flearning-source-code\u002F)\n- [Training a Model to Summarize GitHub Issues](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-to-create-data-products-that-are-magical-using-sequence-to-sequence-models-703f86a231f8)\n- [Sequence Intent Classification Using Hierarchical Attention Networks](https:\u002F\u002Fwww.microsoft.com\u002Fdeveloperblog\u002F2018\u002F03\u002F06\u002Fsequence-intent-classification\u002F)\n- [Syntax-Directed Variational Autoencoder for Structured Data](https:\u002F\u002Fmlatgt.blog\u002F2018\u002F02\u002F08\u002Fsyntax-directed-variational-autoencoder-for-structured-data\u002F)\n- [Weighted MinHash on GPU helps to find duplicate GitHub repositories.](https:\u002F\u002Fblog.sourced.tech\u002Fpost\u002Fminhashcuda\u002F)\n- [Source Code Identifier Embeddings](https:\u002F\u002Fblog.sourced.tech\u002Fpost\u002Fid2vec\u002F)\n- [Using recurrent neural networks to predict next tokens in the java solutions](https:\u002F\u002Fcodeforces.com\u002Fblog\u002Fentry\u002F52327)\n- [The half-life of code & the ship of Theseus](https:\u002F\u002Ferikbern.com\u002F2016\u002F12\u002F05\u002Fthe-half-life-of-code.html)\n- [The eigenvector of \"Why we moved from language X to language 
Y\"](https:\u002F\u002Ferikbern.com\u002F2017\u002F03\u002F15\u002Fthe-eigenvector-of-why-we-moved-from-language-x-to-language-y.html)\n- [Analyzing GitHub, How Developers Change Programming Languages Over Time](https:\u002F\u002Fblog.sourced.tech\u002Fpost\u002Flanguage_migrations\u002F)\n- [Topic Modeling of GitHub Repositories](https:\u002F\u002Fblog.sourced.tech\u002Fpost\u002Fgithub_topic_modeling\u002F)\n- [Aroma: Using machine learning for code recommendation](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Faroma-ml-for-code-recommendation\u002F)\n\n## Talks\n\n- [Machine Learning on Source Code](http:\u002F\u002Fvmarkovtsev.github.io\u002Fpydays-2018-vienna\u002F)\n- [Similarity of GitHub Repositories by Source Code Identifiers](http:\u002F\u002Fvmarkovtsev.github.io\u002Ftechtalks-2017-moscow\u002F)\n- [Using deep RNN to model source code](http:\u002F\u002Fvmarkovtsev.github.io\u002Fre-work-2016-london\u002F)\n- [Source code abstracts classification using CNN (1)](http:\u002F\u002Fvmarkovtsev.github.io\u002Fre-work-2016-berlin\u002F)\n- [Source code abstracts classification using CNN (2)](http:\u002F\u002Fvmarkovtsev.github.io\u002Fdata-natives-2016\u002F)\n- [Source code abstracts classification using CNN (3)](http:\u002F\u002Fvmarkovtsev.github.io\u002Fslush-2016\u002F)\n- [Embedding the GitHub contribution graph](https:\u002F\u002Fegorbu.github.io\u002Ftechtalks-2017-moscow)\n- [Measuring code sentiment in a Git repository](http:\u002F\u002Fvmarkovtsev.github.io\u002Fgophercon-2018-moscow\u002F)\n\n## Software\n\n#### Machine Learning\n\n- [Differentiable Neural Computer (DNC)](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fdnc) - TensorFlow implementation of the Differentiable Neural Computer.\n- [sourced.ml](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fml) - Abstracts feature extraction from source code syntax trees and working with ML models.\n- [vecino](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fvecino) - Finds similar Git repositories.\n- 
[apollo](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fapollo) - Source code deduplication at scale, research.\n- [gemini](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fgemini) - Source code deduplication at scale, production.\n- [enry](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fenry) - Insanely fast file-based programming language detector.\n- [hercules](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fhercules) - Git repository mining framework with batteries on top of go-git.\n- [DeepCS](https:\u002F\u002Fgithub.com\u002Fguxd\u002Fdeep-code-search) - Keras and PyTorch implementations of DeepCS (Deep Code Search).\n- [Code Neuron](https:\u002F\u002Fgithub.com\u002Fvmarkovtsev\u002Fcodeneuron) - Recurrent neural network to detect code blocks in natural language text.\n- [Naturalize](https:\u002F\u002Fgithub.com\u002Fmast-group\u002Fnaturalize) - Language-agnostic framework for learning coding conventions from a codebase and then exploiting this information to suggest better identifier names and formatting changes in the code.\n- [Extreme Source Code Summarization](https:\u002F\u002Fgithub.com\u002Fmast-group\u002Fconvolutional-attention) - Convolutional attention neural network that learns to summarize source code into a short, method-name-like summary by just looking at the source code tokens.\n- [Summarizing Source Code using a Neural Attention Model](https:\u002F\u002Fgithub.com\u002Fsriniiyer\u002Fcodenn) - CODE-NN, which uses LSTM networks with attention to produce sentences that describe C# code snippets and SQL queries from StackOverflow. Implemented in Torch for C#\u002FSQL.\n- [Probabilistic API Miner](https:\u002F\u002Fgithub.com\u002Fmast-group\u002Fapi-mining) - Near-parameter-free probabilistic algorithm for mining the most interesting API patterns from a list of API call sequences.\n- [Interesting Sequence Miner](https:\u002F\u002Fgithub.com\u002Fmast-group\u002Fsequence-mining) - Novel algorithm that mines the most interesting sequences under a probabilistic model. 
It is able to efficiently infer interesting sequences directly from the database.\n- [TASSAL](https:\u002F\u002Fgithub.com\u002Fmast-group\u002Ftassal) - Tool for the automatic summarization of source code using autofolding. Autofolding automatically creates a summary of a source code file by folding non-essential code and comment blocks.\n- [Nice2Predict](http:\u002F\u002Fwww.nice2predict.org\u002F) - Efficient and scalable open-source framework for structured prediction, enabling one to build new statistical engines more quickly.\n- [Clone Digger](http:\u002F\u002Fclonedigger.sourceforge.net\u002Fdownload.html) - Clone detection for Python and Java.\n- [Sensibility](https:\u002F\u002Fgithub.com\u002Fnaturalness\u002Fsensibility) - Uses LSTMs to detect and correct syntax errors in Java source code.\n- [DeepBugs](https:\u002F\u002Fgithub.com\u002Fmichaelpradel\u002FDeepBugs) - Framework for learning bug detectors from an existing code corpus.\n- [DeepSim](https:\u002F\u002Fgithub.com\u002Fparasol-aser\u002Fdeepsim) - A deep learning-based approach to measuring code functional similarity.\n- [rnn-autocomplete](https:\u002F\u002Fgithub.com\u002FZeRoGerc\u002Frnn-autocomplete) - Neural code autocompletion with RNN (bachelor's thesis).\n- [MindsDB](https:\u002F\u002Fgithub.com\u002Fmindsdb\u002Fmindsdb) - Explainable AutoML framework for developers. 
With MindsDB you can build, train and use state-of-the-art ML models with as little as one line of code.\n\n#### Utilities\n\n- [go-git](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fgo-git) - Highly extensible Git implementation in pure Go which is friendly to data mining.\n- [bblfsh](https:\u002F\u002Fgithub.com\u002Fbblfsh) - Self-hosted server for source code parsing.\n- [engine](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fengine) - Scalable and distributed data retrieval pipeline for source code.\n- [minhashcuda](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fminhashcuda) - Weighted MinHash implementation on CUDA to efficiently find duplicates.\n- [kmcuda](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fkmcuda) - k-means on CUDA to cluster and to search for nearest neighbors in dense space.\n- [wmd-relax](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fwmd-relax) - Python package which finds nearest neighbors using Word Mover's Distance.\n- [Tregex, Tsurgeon and Semgrex](https:\u002F\u002Fnlp.stanford.edu\u002Fsoftware\u002Ftregex.shtml) - Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for \"tree regular expressions\").\n- [source{d} models](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fmodels) - Machine Learning models for MLonCode trained using the source{d} stack.\n\n#### Datasets\n\n- [Neural-Code-Search-Evaluation-Dataset](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FNeural-Code-Search-Evaluation-Dataset) - Dataset with links to 4.7M methods from 24k+ repositories with 287 StackOverflow questions and code snippet answers.\n- [CodeSearchNet](https:\u002F\u002Fgithub.com\u002Fgithub\u002FCodeSearchNet) - Collection of datasets and benchmarks for code retrieval using natural language. 
Contains 2M pairs of (`comment`, `code`).\n- [Public Git Archive](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fdatasets\u002Ftree\u002Fmaster\u002FPublicGitArchive) - 6 TB of Git repositories from GitHub.\n- [StackOverflow Question-Code Dataset](https:\u002F\u002Fgithub.com\u002FLittleYUYU\u002FStackOverflow-Question-Code-Dataset) - ~148K Python and ~120K SQL question-code pairs mined from StackOverflow.\n- [GitHub Issue Titles and Descriptions for NLP Analysis](https:\u002F\u002Fwww.kaggle.com\u002Fdavidshinn\u002Fgithub-issues\u002F) - ~8 million GitHub issue titles and descriptions from 2017.\n- [GitHub repositories - languages distribution](https:\u002F\u002Fdata.world\u002Fsource-d\u002Fgithub-repositories-languages-distribution) - Programming languages distribution in 14,000,000 repositories on GitHub (October 2016).\n- [452M commits on GitHub](https:\u002F\u002Fdata.world\u002Fvmarkovtsev\u002F452-m-commits-on-github) - ≈ 452M commits' metadata from 16M repositories on GitHub (October 2016).\n- [GitHub readme files](https:\u002F\u002Fdata.world\u002Fvmarkovtsev\u002Fgithub-readme-files) - Readme files of all GitHub repositories (16M) (October 2016).\n- [from language X to Y](https:\u002F\u002Fdata.world\u002Fvmarkovtsev\u002Ffrom-language-x-to-y) - Cache file Erik Bernhardsson collected for his awesome blog post.\n- [GitHub word2vec 120k](https:\u002F\u002Fdata.world\u002Fvmarkovtsev\u002Fgithub-word-2-vec-120-k) - Sequences of identifiers extracted from the top-starred 120,000 GitHub repositories.\n- [GitHub Source Code Names](https:\u002F\u002Fdata.world\u002Fvmarkovtsev\u002Fgithub-source-code-names) - Identifier names (not people's names) extracted from the source code of 13M GitHub repositories.\n- [GitHub duplicate repositories](https:\u002F\u002Fdata.world\u002Fvmarkovtsev\u002Fgithub-duplicate-repositories) - GitHub repositories not marked as forks but very similar to each other.\n- [GitHub lng keyword 
frequencies](https:\u002F\u002Fdata.world\u002Fvmarkovtsev\u002Fgithub-lng-keyword-frequencies) - Programming language keyword frequency extracted from 16M GitHub repositories.\n- [GitHub Java Corpus](http:\u002F\u002Fgroups.inf.ed.ac.uk\u002Fcup\u002FjavaGithub\u002F) - A set of Java projects collected from GitHub, used by its authors in a number of publications. The corpus consists of 14,785 projects and 352,312,696 LOC.\n- [150k Python Dataset](https:\u002F\u002Fwww.sri.inf.ethz.ch\u002Fpy150) - Dataset consisting of 150,000 Python ASTs.\n- [150k JavaScript Dataset](https:\u002F\u002Fwww.sri.inf.ethz.ch\u002Fjs150) - Dataset consisting of 150,000 JavaScript files and their parsed ASTs.\n- [card2code](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fcard2code) - This dataset contains the language-to-code datasets described in the paper [Latent Predictor Networks for Code Generation](#card2code).\n- [NL2Bash](https:\u002F\u002Fgithub.com\u002FTellinaTool\u002Fnl2bash) - A set of ~10,000 Bash one-liners collected from websites such as StackOverflow, paired with English descriptions written by Bash programmers, as described in the [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.08979).\n- [GitHub JavaScript Dump October 2016](https:\u002F\u002Farchive.org\u002Fdetails\u002Fjavascript-sources-oct2016.sqlite3) - Dataset consisting of 494,352 syntactically valid JavaScript files obtained from the top ~10,000 starred JavaScript repositories on GitHub, with licenses, and parsed ASTs.\n- [BigCloneBench](https:\u002F\u002Fjeffsvajlenko.weebly.com\u002Fbigcloneeval.html) - Clone detection benchmark of 8 million function clone pairs in the IJaDataset.\n\n## Credits\n\n- Many of the references and articles were taken from [mast-group](https:\u002F\u002Fmast-group.github.io\u002F).\n- Inspired by [Awesome Machine Learning](https:\u002F\u002Fgithub.com\u002Fjosephmisiti\u002Fawesome-machine-learning).\n\n## Contributions\n\nSee 
[CONTRIBUTING.md](CONTRIBUTING.md). TL;DR: create a [pull request](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fawesome-machine-learning-on-source-code\u002Fpulls) which is [signed off](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fawesome-machine-learning-on-source-code\u002Fblob\u002Fmaster\u002FCONTRIBUTING.md#certificate-of-origin).\n\n## License\n\n[![License: CC BY-SA 4.0](badges\u002FLicense-CC-BY--SA-4.0-lightgrey.svg)](https:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-sa\u002F4.0\u002F)\n","# 源代码上的优秀机器学习 [![源代码上的优秀机器学习](badges\u002Fawesome.svg)](https:\u002F\u002Fgithub.com\u002Fsrc-d\u002Fawesome-machine-learning-on-source-code) [![CI 状态](https:\u002F\u002Ftravis-ci.org\u002Fsrc-d\u002Fawesome-machine-learning-on-source-code.svg)](https:\u002F\u002Ftravis-ci.org\u002Fsrc-d\u002Fawesome-machine-learning-on-source-code)\n\n![源代码上的优秀机器学习](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsrc-d_awesome-machine-learning-on-source-code_readme_4467c186ec43.png)\n\n**注意：此仓库已不再积极维护，将不会进行任何进一步的更新，也不会回复或处理任何问题和拉取请求。**\n一个替代的、仍在积极维护的版本可以在 [ml4code.github.io](https:\u002F\u002Fml4code.github.io\u002Fpapers.html) [仓库](https:\u002F\u002Fgithub.com\u002Fml4code\u002Fml4code.github.io) 中找到。\n\n一份精心整理的优秀研究论文、数据集和软件项目的列表，专注于机器学习与源代码的研究。[#MLonCode](https:\u002F\u002Ftwitter.com\u002Fhashtag\u002FMLonCode)\n\n## 目录\n\n- [摘要](#digests)\n- [会议](#conferences)\n- [竞赛](#competitions)\n- [论文](#papers)\n  - [程序合成与归纳](#program-synthesis-and-induction)\n  - [源代码分析与语言建模](#source-code-analysis-and-language-modeling)\n  - [神经网络架构与算法](#neural-network-architectures-and-algorithms)\n  - [软件工程中的嵌入表示](#embeddings-in-software-engineering)\n  - [程序翻译](#program-translation)\n  - [代码建议与补全](#code-suggestion-and-completion)\n  - [程序修复与缺陷检测](#program-repair-and-bug-detection)\n  - [API 与代码挖掘](#apis-and-code-mining)\n  - [代码优化](#code-optimization)\n  - [主题建模](#topic-modeling)\n  - [情感分析](#sentiment-analysis)\n  - [代码摘要](#code-summarization)\n  - [克隆检测](#clone-detection)\n  - 
[可微分解释器](#differentiable-interpreters)\n  - [相关研究](#related-research)\u003Cdetails>\u003Csummary>(链接需要打开“相关研究”才能查看)\u003C\u002Fsummary>\n    - [AST 差异化](#ast-differencing)\n    - [二进制数据建模](#binary-data-modeling)\n    - [基于 T-混合模型的软聚类](#soft-clustering-using-t-mixture-models)\n    - [自然语言解析与理解](#natural-language-parsing-and-comprehension)\n      \u003C\u002Fdetails>\n- [文章](#posts)\n- [演讲](#talks)\n- [软件](#software)\n  - [机器学习](#machine-learning)\n  - [工具](#utilities)\n- [数据集](#datasets)\n- [致谢](#credits)\n- [贡献](#contributions)\n- [许可证](#license)\n\n## 摘要\n\n- [从“大代码”中学习](http:\u002F\u002Flearnbigcode.github.io) - 关于“大代码”的技术、挑战、工具和数据集。\n- [机器学习在大代码与自然性方面的综述](https:\u002F\u002Fml4code.github.io\u002F) - 关于源代码上机器学习的综述与文献回顾。\n\n## 会议\n\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [ACM 国际软件工程大会，ICSE](https:\u002F\u002Fwww.icse2018.org\u002F)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [ACM 国际自动化软件工程大会，ASE](https:\u002F\u002F2019.aseconf.org)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [ACM 欧洲联合软件工程大会及软件工程基础研讨会 (FSE)](https:\u002F\u002Fconf.researchr.org\u002Fhome\u002Ffse-2018)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [2018 IEEE 第25届软件分析、演化与重构国际会议 (SANER)](https:\u002F\u002Fwww.conference-publishing.com\u002Flist.php?Event=SANER18MAIN)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [编程中的机器学习](https:\u002F\u002Fml4p.org\u002F)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [面向软件工程的自然语言处理研讨会](https:\u002F\u002Fnl4se.github.io\u002F)\n- \u003Cimg src=\"badges\u002Forigin-industry-green.svg\" alt=\"origin-industry\" align=\"top\"> [SysML](http:\u002F\u002Fwww.sysml.cc\u002F)\n  - 
[演讲](https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUChutDKIa-AYyAmbT45s991g\u002F)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [软件仓库挖掘](http:\u002F\u002Fwww.msrconf.org\u002F)\n- \u003Cimg src=\"badges\u002Forigin-industry-green.svg\" alt=\"origin-industry\" align=\"top\"> [AIFORSE](https:\u002F\u002Faiforse.org\u002F)\n- \u003Cimg src=\"badges\u002Forigin-industry-green.svg\" alt=\"origin-industry\" align=\"top\"> [source{d} 技术讲座](https:\u002F\u002Fblog.sourced.tech\u002Fpost\u002Fml_talks_moscow\u002F)\n  - [演讲](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL5Ld68ole7j3iQFUSB3fR9122dHCUWXsy)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [NIPS 神经抽象机与程序归纳研讨会](https:\u002F\u002Fuclmr.github.io\u002Fnampi\u002F)\n  - [演讲](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLzTDea_cM27LVPSTdK9RypSyqBHZWPywt)\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [CamAIML](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fevent\u002Fartificial-intelligence-and-machine-learning-in-cambridge-2017\u002F)\n  - [学习编程：用于程序归纳的机器学习](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=vzDuVhFMB9Q) - 亚历山大·高恩特。\n- \u003Cimg src=\"badges\u002Forigin-academia-blue.svg\" alt=\"origin-academia\" align=\"top\"> [MASES 2018](https:\u002F\u002Fmases18.github.io\u002F)\n\n## 竞赛\n\n- [CodRep](https:\u002F\u002Fgithub.com\u002FKTH\u002FCodRep-competition) - 自动程序修复竞赛：给定一行源代码，找出插入点。\n\n## 论文\n\n#### 程序合成与归纳\n\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [基于学习的代码习语的程序合成与语义解析](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.10816v2) - Richard Shin、Miltiadis Allamanis、Marc Brockschmidt、Oleksandr Polozov，2019年。\n- \u003Cimg src=\"badges\u002F16-pages-gray.svg\" alt=\"16-pages\" align=\"top\"> [用于神经程序合成的合成数据集](https:\u002F\u002Fopenreview.net\u002Fforum?id=ryeOSnAqYm) - Richard Shin、Neel 
Kant、Kavi Gupta、Chris Bender、Brandon Trabucco、Rishabh Singh、Dawn Song，ICLR 2019。\n- \u003Cimg src=\"badges\u002F15-pages-gray.svg\" alt=\"15-pages\" align=\"top\"> [执行引导的神经程序合成](https:\u002F\u002Fopenreview.net\u002Fforum?id=H1gfOiAqYm) - Xinyun Chen、Chang Liu、Dawn Song，ICLR 2019。\n- \u003Cimg src=\"badges\u002F8-pages-gray.svg\" alt=\"8-pages\" align=\"top\"> [DeepFuzz：用于模糊测试的语法有效的C程序自动生成](https:\u002F\u002Ffaculty.ist.psu.edu\u002Fwu\u002Fpapers\u002FDeepFuzz.pdf) - Xiao Liu、Xiaoting Li、Rupesh Prajapati、Dinghao Wu，AAAI 2019。\n- \u003Cimg src=\"badges\u002F12-pages-beginner-brightgreen.svg\" alt=\"12-pages-beginner\" align=\"top\"> [NL2Bash：面向Linux操作系统的自然语言接口语料库及语义解析器](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.08979v2) - Xi Victoria Lin、Chenglong Wang、Luke Zettlemoyer、Michael D. Ernst，LREC 2018。\n- \u003Cimg src=\"badges\u002F18-pages-gray.svg\" alt=\"18-pages\" align=\"top\"> [神经程序合成的最新进展](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.02353v1) - Neel Kant，2018年。\n- \u003Cimg src=\"badges\u002F16-pages-gray.svg\" alt=\"16-pages\" align=\"top\"> [用于条件程序生成的神经草图学习](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.05698) - Vijayaraghavan Murali、Letao Qi、Swarat Chaudhuri、Chris Jermaine，ICLR 2018。\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [神经程序搜索：从描述和示例中解决编程任务](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.04335v1) - Illia Polosukhin、Alexander Skidanov，ICLR 2018。\n- \u003Cimg src=\"badges\u002F16-pages-gray.svg\" alt=\"16-pages\" align=\"top\"> [基于优先队列训练的神经程序合成](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.03526v1) - Daniel A. Abolafia、Mohammad Norouzi、Quoc V. 
Le，2018年。\n- \u003Cimg src=\"badges\u002F31-pages-gray.svg\" alt=\"31-pages\" align=\"top\"> [迈向从输入-输出示例中合成复杂程序](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.01284v3) - Xinyun Chen、Chang Liu、Dawn Song，ICLR 2018。\n- \u003Cimg src=\"badges\u002F8-pages-gray.svg\" alt=\"8-pages\" align=\"top\"> [玻璃盒程序合成：一种机器学习方法](https:\u002F\u002Farxiv.org\u002Fabs\u002F1709.08669v1) - Konstantina Christakopoulou、Adam Tauman Kalai，AAAI 2018。\n- \u003Cimg src=\"badges\u002F14-pages-beginner-brightgreen.svg\" alt=\"14-pages\" align=\"top\"> [为预测建模合成基准数据](https:\u002F\u002Fchriscummins.cc\u002Fpub\u002F2017-cgo.pdf) - Chris Cummins、Pavlos Petoumenos、Zheng Wang、Hugh Leather，CGO 2017。\n- \u003Cimg src=\"badges\u002F17-pages-beginner-brightgreen.svg\" alt=\"17-pages-beginner\" align=\"top\"> [用于字符级语言建模的程序合成](https:\u002F\u002Ffiles.sri.inf.ethz.ch\u002Fwebsite\u002Fpapers\u002Fcharmodel-iclr2017.pdf) - Pavol Bielik、Veselin Raychev、Martin Vechev，ICLR 2017。\n- \u003Cimg src=\"badges\u002F13-pages-beginner-brightgreen.svg\" alt=\"13-pages-beginner\" align=\"top\"> [SQLNet：无需强化学习即可从自然语言生成结构化查询](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.04436v1) - Xiaojun Xu、Chang Liu、Dawn Song，2017年。\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [学习如何为程序合成选择示例](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.03243v1) - Yewen Pu、Zachery Miranda、Armando Solar-Lezama、Leslie Pack Kaelbling，2017年。\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [神经程序元归纳](https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.04157v1) - Jacob Devlin、Rudy Bunel、Rishabh Singh、Matthew Hausknecht、Pushmeet Kohli，NIPS 2017。\n- \u003Cimg src=\"badges\u002F14-pages-beginner-brightgreen.svg\" alt=\"14-pages-beginner\" align=\"top\"> [学习从手绘图像推断图形程序](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.09627v4) - Kevin Ellis、Daniel Ritchie、Armando Solar-Lezama、Joshua B. 
Tenenbaum，2017年。\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [用于程序生成的神经属性机器](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.09231v2) - Matthew Amodio、Swarat Chaudhuri、Thomas Reps，2017年。\n- \u003Cimg src=\"badges\u002F11-pages-beginner-brightgreen.svg\" alt=\"11-pages-beginner\" align=\"top\"> [用于代码生成和语义解析的抽象语法网络](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.07535v1) - Maxim Rabinovich、Mitchell Stern、Dan Klein，ACL 2017。\n- \u003Cimg src=\"badges\u002F20-pages-gray.svg\" alt=\"20-pages\" align=\"top\"> [通过递归使神经编程架构具备泛化能力](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1704.06611v1.pdf) - Jonathon Cai、Richard Shin、Dawn Song，ICLR 2017。\n- \u003Cimg src=\"badges\u002F14-pages-gray.svg\" alt=\"14-pages\" align=\"top\"> [一种用于通用代码生成的句法神经模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.01696v1) - Pengcheng Yin、Graham Neubig，ACL 2017。\n- \u003Cimg src=\"badges\u002F12-pages-beginner-brightgreen.svg\" alt=\"12-pages-beginner\" align=\"top\"> [利用循环神经网络从自然语言进行程序合成](https:\u002F\u002Fhomes.cs.washington.edu\u002F~mernst\u002Fpubs\u002Fnl-command-tr170301.pdf) - Xi Victoria Lin、Chenglong Wang、Deric Pang、Kevin Vu、Luke Zettlemoyer、Michael Ernst，2017年。\n- \u003Cimg src=\"badges\u002F18-pages-beginner-brightgreen.svg\" alt=\"18-pages-beginner\" align=\"top\"> [RobustFill：在噪声输入输出条件下进行神经程序学习](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.07469v1) - Jacob Devlin、Jonathan Uesato、Surya Bhupatiraju、Rishabh Singh、Abdel-rahman Mohamed、Pushmeet Kohli，ICML 2017。\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [基于示例的终身感知式编程](https:\u002F\u002Fopenreview.net\u002Fpdf?id=HJStZKqel) - Gaunt、Alexander L.、Marc Brockschmidt、Nate Kushman和Daniel Tarlow，2017年。\n- \u003Cimg src=\"badges\u002F7-pages-gray.svg\" alt=\"7-pages\" align=\"top\"> [基于示例的神经程序编程](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.04990v1) - Chengxun Shu、Hongyu Zhang，AAAI 2017。\n- \u003Cimg src=\"badges\u002F21-pages-gray.svg\" alt=\"21-pages\" align=\"top\"> 
[DeepCoder：学习编写程序](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.01989) - Balog Matej、Alexander L. Gaunt、Marc Brockschmidt、Sebastian Nowozin和Daniel Tarlow，ICLR 2017。\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [一种可微分的归纳逻辑编程方法](https:\u002F\u002Fpdfs.semanticscholar.org\u002F9698\u002F409fc1603d28b6d51c38261f6243837c8bdd.pdf) - Yang Fan、Zhilin Yang和William W. Cohen，2017年。\n- \u003Cimg src=\"badges\u002F12-pages-beginner-brightgreen.svg\" alt=\"12-pages-beginner\" align=\"top\"> [用于如果-那么型程序合成的潜在注意力机制](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.01867v1) - Xinyun Chen、Chang Liu、Richard Shin、Dawn Song、Mingcheng Chen，NIPS 2016。\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\" id=\"card2code\"> [用于代码生成的潜在预测网络](https:\u002F\u002Farxiv.org\u002Fabs\u002F1603.06744) - Wang Ling、Edward Grefenstette、Karl Moritz Hermann、Tomáš Kočiský、Andrew Senior、Fumin Wang、Phil Blunsom，ACL 2016。\n- \u003Cimg src=\"badges\u002F6-pages-gray.svg\" alt=\"6-pages\" align=\"top\"> [神经符号机器：在弱监督下于Freebase上学习语义解析器（简版）](https:\u002F\u002Farxiv.org\u002Fabs\u002F1612.01197) - Liang Chen、Jonathan Berant、Quoc Le、Kenneth D. Forbus和Ni Lao，NIPS 2016。\n- \u003Cimg src=\"badges\u002F5-pages-gray.svg\" alt=\"5-pages\" align=\"top\"> [程序作为黑箱解释](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.07579) - Singh、Sameer、Marco Tulio Ribeiro和Carlos Guestrin，NIPS 2016。\n- \u003Cimg src=\"badges\u002F15-pages-gray.svg\" alt=\"15-pages\" align=\"top\"> [基于搜索的代码模板泛化与精炼](http:\u002F\u002Fsoft.vub.ac.be\u002FPublications\u002F2016\u002Fvub-soft-tr-16-06.pdf) - Tim Molderez、Coen De Roover，SSBSE 2016。\n- \u003Cimg src=\"badges\u002F14-pages-gray.svg\" alt=\"14-pages\" align=\"top\"> [自然源代码的结构化生成模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F1401.0514) - Chris J. 
Maddison、Daniel Tarlow，ICML 2014。\n\n#### 源代码分析与语言建模\n\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [面向大型代码集的机器学习词汇建模](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.01873v1) - Hlib Babii、Andrea Janes、Romain Robbes，2019年。\n- \u003Cimg src=\"badges\u002F24-pages-gray.svg\" alt=\"24-pages\" align=\"top\"> [基于图的生成式代码建模](https:\u002F\u002Fopenreview.net\u002Fforum?id=Bke4KsA5FX) - Marc Brockschmidt、Miltiadis Allamanis、Alexander L. Gaunt、Oleksandr Polozov，ICLR 2019。\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [NL2Type：从自然语言信息中推断 JavaScript 函数类型](http:\u002F\u002Fsoftware-lab.org\u002Fpublications\u002Ficse2019_NL2Type.pdf) - Rabee Sohail Malik、Jibesh Patra、Michael Pradel，ICSE 2019。\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [基于抽象语法树的新型神经网络源代码表示](http:\u002F\u002Fxuwang.tech\u002Fpaper\u002Fastnn_icse2019.pdf) - Jian Zhang、Xu Wang、Hongyu Zhang、Hailong Sun、Kaixuan Wang、Xudong Liu，ICSE 2019。\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [深度学习类型推断](http:\u002F\u002Fvhellendoorn.github.io\u002FPDF\u002Ffse2018-j2t.pdf) - Vincent J. Hellendoorn、Christian Bird、Earl T. 
Barr 和 Miltiadis Allamanis，FSE 2018。[代码](https:\u002F\u002Fgithub.com\u002FDeepTyper\u002FDeepTyper)。\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [用于学习源代码变更的 Tree2Tree 神经翻译模型](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.00314.pdf) - Saikat Chakraborty、Miltiadis Allamanis、Baishakhi Ray，2018年。\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [code2seq：从代码的结构化表示中生成序列](https:\u002F\u002Farxiv.org\u002Fabs\u002F1808.01400) - Uri Alon、Omer Levy、Eran Yahav，2018年。\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [语法与理性：利用语言模型检测并修复语法错误](http:\u002F\u002Fsoftwareprocess.es\u002Fpubs\u002Fsantos2018SANER-syntax.pdf) - Eddie Antonio Santos、Joshua Charles Campbell、Dhvani Patel、Abram Hindle 和 José Nelson Amaral，SANER 2018。\n- \u003Cimg src=\"badges\u002F25-pages-gray.svg\" alt=\"25-pages\" align=\"top\"> [code2vec：学习代码的分布式表示](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.09473v2) - Uri Alon、Meital Zilberstein、Omer Levy、Eran Yahav，2018年。\n- \u003Cimg src=\"badges\u002F16-pages-gray.svg\" alt=\"16-pages\" align=\"top\"> [使用图来表示程序的学习方法](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.00740v1) - Miltiadis Allamanis、Marc Brockschmidt、Mahmoud Khademi，ICLR 2018。\n- \u003Cimg src=\"badges\u002F36-pages-gray.svg\" alt=\"36-pages\" align=\"top\"> [面向大型代码和自然性的机器学习综述](https:\u002F\u002Farxiv.org\u002Fabs\u002F1709.06182v1) - Miltiadis Allamanis、Earl T. Barr、Premkumar Devanbu、Charles Sutton，2017年。\n- \u003Cimg src=\"badges\u002F36-pages-gray.svg\" alt=\"36-pages\" align=\"top\"> [深度神经网络是建模源代码的最佳选择吗？](http:\u002F\u002Fweb.cs.ucdavis.edu\u002F~devanbu\u002FisDLgood.pdf) - Vincent J. 
Hellendoorn、Premkumar Devanbu，FSE 2017。\n- \u003Cimg src=\"badges\u002F4-pages-gray.svg\" alt=\"4-pages\" align=\"top\"> [用于软件代码的深度语言模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F1608.02715v1) - Hoa Khanh Dam、Truyen Tran、Trang Pham，2016年。\n- \u003Cimg src=\"badges\u002F8-pages-gray.svg\" alt=\"8-pages\" align=\"top\"> [用于编程语言处理的树结构卷积神经网络](https:\u002F\u002Farxiv.org\u002Fabs\u002F1409.5718) - Lili Mou、Ge Li、Lu Zhang、Tao Wang、Zhi Jin，AAAI-16。[代码](https:\u002F\u002Fgithub.com\u002Fcrestonbunch\u002Ftbcnn)。\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [建议准确的方法名和类名](http:\u002F\u002Fhomepages.inf.ed.ac.uk\u002Fcsutton\u002Fpublications\u002Faccurate-method-and-class.pdf) - Miltiadis Allamanis、Earl T. Barr、Christian Bird、Charles Sutton，FSE 2015。\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [利用语言建模在大规模上挖掘源代码仓库](http:\u002F\u002Fhomepages.inf.ed.ac.uk\u002Fcsutton\u002Fpublications\u002Fmsr2013.pdf) - Miltiadis Allamanis、Charles Sutton，MSR 2013。\n\n#### 神经网络架构与算法\n\n- \u003Cimg src=\"badges\u002F19-pages-gray.svg\" alt=\"19-pages\" align=\"top\"> [利用递归树搜索与规划学习组合式神经程序](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.12941v1) - 托马斯·皮埃罗、纪尧姆·利涅尔、斯科特·里德、奥利维耶·西戈、尼古拉·佩兰、亚历山大·拉泰尔、大卫·卡斯、卡里姆·贝吉尔、南多·德弗雷塔斯，2019年。\n- \u003Cimg src=\"badges\u002F11-pages-gray.svg\" alt=\"11-pages\" align=\"top\"> [从程序到可解释的深度模型再返回](https:\u002F\u002Flink.springer.com\u002Fcontent\u002Fpdf\u002F10.1007%2F978-3-319-96145-3_2.pdf) - 埃兰·亚哈夫，ICCAV 2018。\n- \u003Cimg src=\"badges\u002F13-pages-gray.svg\" alt=\"13-pages\" align=\"top\"> [神经代码理解：代码语义的可学习表示](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.07336) - 塔尔·本努恩、爱丽丝·肖莎娜·雅各博维茨、托斯滕·霍夫勒，NIPS 2018。\n- \u003Cimg src=\"badges\u002F16-pages-gray.svg\" alt=\"16-pages\" align=\"top\"> [用于预测程序属性的通用路径表示法](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.09544) - 乌里·阿隆、梅塔尔·齐尔伯斯坦、奥默·列维、埃兰·亚哈夫，PLDI 2018。\n- \u003Cimg src=\"badges\u002F4-pages-gray.svg\" alt=\"4-pages\" align=\"top\"> 
[使用双边基于树的卷积神经网络进行跨语言程序分类学习](https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.06159v2) - Nghi D. Q. Bui、江凌霄、于一军，AAAI 2018。\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [用于跨语言算法分类的双边依存神经网络](https:\u002F\u002Fbdqnghi.github.io\u002Ffiles\u002FSANER_2019_bilateral_dependency.pdf) - Nghi D. Q. Bui、于一军、江凌霄，SANER 2018。\n- \u003Cimg src=\"badges\u002F17-pages-gray.svg\" alt=\"17-pages\" align=\"top\"> [面向结构化数据的语法导向变分自编码器](https:\u002F\u002Fopenreview.net\u002Fpdf?id=SyqShMZRb) - Hanjun Dai、Yingtao Tian、Bo Dai、Steven Skiena、Le Song，ICLR 2018。\n- \u003Cimg src=\"badges\u002F19-pages-gray.svg\" alt=\"19-pages\" align=\"top\"> [用神经网络进行分而治之](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.02401) - 诺瓦克、亚历克斯和琼·布鲁纳，ICLR 2018。\n- \u003Cimg src=\"badges\u002F13-pages-gray.svg\" alt=\"13-pages\" align=\"top\"> [层次化的多尺度循环神经网络](https:\u002F\u002Farxiv.org\u002Fabs\u002F1609.01704) - 钟俊英、成镇安和约书亚·本吉奥，ICLR 2017。\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [利用层次化注意力记忆学习高效算法](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.03218) - 安德里乔维奇、马尔钦和卡罗尔·库拉赫，2016年。\n- \u003Cimg src=\"badges\u002F6-pages-gray.svg\" alt=\"6-pages\" align=\"top\"> [用神经图灵机学习栈上的操作](https:\u002F\u002Farxiv.org\u002Fabs\u002F1612.00827) - 德勒、特里斯坦和约瑟夫·杜罗，NIPS 2016。\n- \u003Cimg src=\"badges\u002F5-pages-gray.svg\" alt=\"5-pages\" align=\"top\"> [概率神经程序](https:\u002F\u002Farxiv.org\u002Fabs\u002F1612.00712) - 穆雷、肯顿·W. 
和贾扬特·克里希纳穆提，NIPS 2016。\n- \u003Cimg src=\"badges\u002F13-pages-gray.svg\" alt=\"13-pages\" align=\"top\"> [神经程序员-解释器](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06279) - 里德、斯科特和南多·德弗雷塔斯，ICLR 2016。\n- \u003Cimg src=\"badges\u002F9-pages-gray.svg\" alt=\"9-pages\" align=\"top\"> [神经GPU学习算法](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.08228) - 凯泽、卢卡什和伊利亚·苏茨克维尔，ICLR 2016。\n- \u003Cimg src=\"badges\u002F17-pages-gray.svg\" alt=\"17-pages\" align=\"top\"> [神经随机存取机器](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06392v3) - 卡罗尔·库拉赫、马尔钦·安德里乔维奇、伊利亚·苏茨克维尔，ERCIM News 2016。\n- \u003Cimg src=\"badges\u002F18-pages-gray.svg\" alt=\"18-pages\" align=\"top\"> [神经程序员：通过梯度下降诱导潜在程序](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.04834) - 尼拉坎坦、阿尔温德、Quoc V. Le和伊利亚·苏茨克维尔，ICLR 2015。\n- \u003Cimg src=\"badges\u002F25-pages-gray.svg\" alt=\"25-pages\" align=\"top\"> [学习执行](https:\u002F\u002Farxiv.org\u002Fabs\u002F1410.4615v3) - 沃伊切赫·扎伦巴、伊利亚·苏茨克维尔，2015年。\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [利用栈增强型循环网络推断算法模式](https:\u002F\u002Farxiv.org\u002Fabs\u002F1503.01007) - 朱林、阿曼德和托马斯·米科洛夫，NIPS 2015。\n- \u003Cimg src=\"badges\u002F26-pages-gray.svg\" alt=\"26-pages\" align=\"top\"> [神经图灵机](https:\u002F\u002Farxiv.org\u002Fabs\u002F1410.5401) - 格雷夫斯、亚历克斯、格雷格·韦恩和伊沃·丹尼赫尔卡，2014年。\n- \u003Cimg src=\"badges\u002F15-pages-gray.svg\" alt=\"15-pages\" align=\"top\"> [从机器学习到机器推理](https:\u002F\u002Farxiv.org\u002Fabs\u002F1102.1808) - 博图·莱昂，《机器学习杂志》2011年。\n\n#### 软件工程中的嵌入技术\n\n- \u003Cimg src=\"badges\u002F8-pages-gray.svg\" alt=\"8-pages\" align=\"top\"> [源代码嵌入技术的文献综述](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.03061) - 陈子敏和马丁·蒙佩鲁斯，2019年。\n- \u003Cimg src=\"badges\u002F3-pages-gray.svg\" alt=\"3-pages\" align=\"top\"> [基于AST的深度学习用于检测恶意PowerShell脚本](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.09230.pdf) - 吉莉·鲁萨克、阿卜杜拉·阿尔-杜贾伊利、乌娜-梅·奥赖利，2018年。\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> 
[深度代码搜索](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=3180167) - Xiaodong Gu、Hongyu Zhang、Sunghun Kim，ICSE 2018。\n- \u003Cimg src=\"badges\u002F4-pages-gray.svg\" alt=\"4-pages\" align=\"top\"> [软件工程领域的词嵌入](https:\u002F\u002Fgithub.com\u002Fvefstathiou\u002FSO_word2vec\u002Fblob\u002Fmaster\u002FMSR18-w2v.pdf) - 瓦西利基·埃夫斯塔修、克里斯托斯·哈兹利纳斯、迪奥米迪斯·斯皮内利斯，MSR 2018。\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [代码向量：通过嵌入的抽象符号轨迹理解程序](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.06686) - 乔丹·亨克尔、舒文杜·K·拉希里、本·利布利特、托马斯·雷普斯，FSE 2018。\n- \u003Cimg src=\"badges\u002F10-pages-gray.svg\" alt=\"10-pages\" align=\"top\"> [通过代码图嵌入估计文档距离](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F320074701_Document_Distance_Estimation_via_Code_Graph_Embedding) - 林泽奇、赵俊峰、邹延振、谢兵，Internetware 2017。\n- \u003Cimg src=\"badges\u002F3-pages-gray.svg\" alt=\"3-pages\" align=\"top\"> [将Word2Vec与改进的向量空间模型结合以提升代码检索效果](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F318123700_Combining_Word2Vec_with_Revised_Vector_Space_Model_for_Better_Code_Retrieval) - 阮清云、阮英团、潘洪登、阮仲德、阮天宁，ICSE 2017。\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [从词嵌入到文档相似性，以改善软件工程中的信息检索](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F296526040_From_Word_Embeddings_To_Document_Similarities_for_Improved_Information_Retrieval_in_Software_Engineering) - 叶欣、沈慧、马晓、拉兹万·布内斯库、刘畅，ICSE 2016。\n- \u003Cimg src=\"badges\u002F3-pages-gray.svg\" alt=\"3-pages\" align=\"top\"> [利用向量表示映射API元素以支持代码迁移](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=2892661) - 阮仲德、阮英团、阮天宁，ICSE 2016。\n\n#### 程序翻译\n\n- \u003Cimg src=\"badges\u002F18-pages-gray.svg\" alt=\"18-pages\" align=\"top\"> [迈向神经网络反编译](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.08325v1) - 奥默·卡茨、尤瓦尔·奥尔沙克、约阿夫·戈德堡、埃兰·亚哈夫，2019年。\n- \u003Cimg src=\"badges\u002F14-pages-gray.svg\" alt=\"14-pages\" align=\"top\"> [用于程序翻译的树到树神经网络](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.03691v1) - 陈鑫云、刘畅、宋晓东，ICLR 2018。\n- 
\u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [代码注意力机制：利用领域特征将代码翻译为注释](https:\u002F\u002Farxiv.org\u002Fabs\u002F1709.07642v2) - 郑文浩、周洪宇、李明、吴建新，2017年。\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [基于神经机器翻译自动从代码差异中生成提交信息](https:\u002F\u002Farxiv.org\u002Fabs\u002F1708.09492v1) - 蒋思远、阿米尔·阿马利、科林·麦克米伦，ASE 2017。\n- \u003Cimg src=\"badges\u002F5-pages-gray.svg\" alt=\"5-pages\" align=\"top\"> [用于自动化代码文档生成与代码自动生成的 Python 函数及文档字符串平行语料库](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.02275v1) - 安东尼奥·瓦莱里奥·米切利·巴罗内、里科·森尼希，ICNLP 2017。\n- \u003Cimg src=\"badges\u002F6-pages-gray.svg\" alt=\"6-pages\" align=\"top\"> [一种从源代码变更中生成自然语言描述的神经架构](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.04856v1) - 巴勃罗·洛约拉、爱迪生·马雷塞-泰勒、松尾丰、ACL 2017。\n\n#### 代码建议与补全\n\n- \u003Cimg src=\"badges\u002F12-pages-gray.svg\" alt=\"12-pages\" align=\"top\"> [Aroma：通过结构化代码搜索进行代码推荐](https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.01158) - 卢思飞、杨迪、库希克·森和萨蒂什·钱德拉，2019年。\n- \u003Cimg src=\"badges\u002F9-pages-gray.svg\" alt=\"9-pages\" align=\"top\"> [利用深度学习实现智能代码评审](https:\u002F\u002Fwww.kdd.org\u002Fkdd2018\u002Ffiles\u002Fdeep-learning-day\u002FDLDay18_paper_40.pdf) - 安舒尔·古普塔、尼尔·桑达雷桑，KDD 深度学习日 2018。\n- \u003Cimg src=\"badges\u002F8-pages-gray.svg\" alt=\"8-pages\" align=\"top\"> [基于神经注意力机制与指针网络的代码补全](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.09573v1) - 李健、王悦、Irwin King、Michael R. 
Lyu, 2017.
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [Learning Python Code Suggestion with a Sparse Pointer Network](https://arxiv.org/abs/1611.08307) - Avishkar Bhoopchand, Tim Rocktäschel, Earl Barr, Sebastian Riedel, 2016.
- <img src="badges/10-pages-gray.svg" alt="10-pages" align="top"> [Code Completion with Statistical Language Models](http://www.cs.technion.ac.il/~yahave/papers/pldi14-statistical.pdf) - Veselin Raychev, Martin Vechev, Eran Yahav, PLDI 2014.

#### Program Repair and Bug Detection

- <img src="badges/10-pages-gray.svg" alt="10-pages" align="top"> [SampleFix: Learning to Correct Programs by Sampling Diverse Fixes](https://arxiv.org/abs/1906.10502) - Hossein Hajipour, Apratim Bhattacharya, Mario Fritz, 2019.
- <img src="badges/15-pages-gray.svg" alt="15-pages" align="top"> [Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection](https://openreview.net/forum?id=ByloIiCqYQ) - Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel, Lizhen Qu, ICLR 2019.
- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [Neural Program Repair by Jointly Learning to Localize and Repair](https://openreview.net/forum?id=ByloJ20qtm) - Marko Vasic, Aditya Kanade, Petros Maniatis, David Bieber, Rishabh Singh, ICLR 2019.
- <img src="badges/11-pages-beginner-brightgreen.svg" alt="11-pages" align="top"> [Compiler Fuzzing through Deep Learning](https://chriscummins.cc/pub/2018-issta.pdf) - Chris Cummins, Pavlos Petoumenos, Alastair Murray, Hugh Leather, ISSTA 2018.
- <img src="badges/10-pages-gray.svg" alt="10-pages" align="top"> [Automatically Assessing Vulnerabilities Discovered by Compositional Analysis](https://dl.acm.org/citation.cfm?id=3243130) - Saahil Ognawala, Ricardo Nales Amato, Alexander Pretschner, Pooja Kulkarni, MASES 2018.
- <img src="badges/6-pages-gray.svg" alt="6-pages" align="top"> [An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation](http://www.cs.wm.edu/~denys/pubs/ASE%2718-Learning-Bug-Fixes-NMT.pdf) - Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White, Denys 
Poshyvanyk, ASE 2018.
- <img src="badges/23-pages-gray.svg" alt="23-pages" align="top"> [DeepBugs: A Learning Approach to Name-based Bug Detection](https://arxiv.org/pdf/1805.11683.pdf) - Michael Pradel, Koushik Sen, 2018.
- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [Learning How to Mutate Source Code from Bug-Fixes](https://arxiv.org/abs/1812.10772) - Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White, Denys Poshyvanyk, 2018.
- <img src="badges/10-pages-gray.svg" alt="10-pages" align="top"> [A Deep Tree-Based Model for Software Defect Prediction](https://arxiv.org/abs/1802.00921) - HK Dam, T Pham, SW Ng, [T Tran](https://truyentran.github.io), J Grundy, A Ghose, T Kim, CJ Kim, 2018.
- <img src="badges/7-pages-gray.svg" alt="7-pages" align="top"> [Automated Vulnerability Detection in Source Code Using Deep Representation Learning](https://arxiv.org/abs/1807.04320) - Rebecca L. Russell, Louis Kim, Lei H. Hamilton, Tomo Lazovich, Jacob A. Harer, Onur Ozdemir, Paul M. Ellingwood, Marc W. McConley, 2018.
- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [Shaping Program Repair Space with Existing Patches and Similar Code](https://xiongyingfei.github.io/papers/ISSTA18a.pdf) - Jiajun Jiang, Yingfei Xiong, Hongyu Zhang, Qing Gao, Xiangqun Chen, 2018. ([code](https://github.com/xgdsmileboy/SimFix)).
- <img src="badges/15-pages-gray.svg" alt="15-pages" align="top"> [Learning to Repair Software Vulnerabilities with Generative Adversarial Networks](https://arxiv.org/abs/1805.07475) - Jacob A. Harer, Onur Ozdemir, Tomo Lazovich, Christopher P. Reale, Rebecca L. Russell, Louis Y. 
Kim, Peter Chin, 2018.
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [Dynamic Neural Program Embedding for Program Repair](https://arxiv.org/abs/1711.07163v2) - Ke Wang, Rishabh Singh, Zhendong Su, ICLR 2018.
- <img src="badges/8-pages-gray.svg" alt="8-pages" align="top"> [Estimating Defectiveness of Source Code: A Predictive Model Based on GitHub Content](https://arxiv.org/abs/1803.07764) - Ritu Kapur, Balwinder Sodhi, 2018.
- <img src="badges/8-pages-gray.svg" alt="8-pages" align="top"> [Automated Software Vulnerability Detection with Machine Learning](https://arxiv.org/abs/1803.04497) - Jacob A. Harer, Louis Y. Kim, Rebecca L. Russell, Onur Ozdemir, Leonard R. Kosta, Akshay Rangamani, Lei H. Hamilton, Gabriel I. Centeno, Jonathan R. Key, Paul M. Ellingwood, Marc W. McConley, Jeffrey M. Opper, Peter Chin, Tomo Lazovich, IWSPA 2018.
- <img src="badges/34-pages-gray.svg" alt="34-pages" align="top"> [Learning a Static Analyzer from Data](https://arxiv.org/abs/1611.01752) - Pavol Bielik, Veselin Raychev, Martin Vechev, CAV 2017. [Video](https://www.youtube.com/watch?v=bkieI3jLxVY).
- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [To Type or Not to Type: Quantifying Detectable Bugs in JavaScript](http://earlbarr.com/publications/typestudy.pdf) - Zheng Gao, Christian Bird, Earl Barr, ICSE 2017.
- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities](https://arxiv.org/abs/1707.04742) - Martin White, Michele Tufano, Matías Martínez, Martin Monperrus, Denys Poshyvanyk, 2017.
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [Semantic Code Repair using Neuro-Symbolic Transformation Networks](https://arxiv.org/abs/1710.11054v1) - Jacob Devlin, Jonathan Uesato, Rishabh Singh, Pushmeet Kohli, 2017.
- <img src="badges/6-pages-gray.svg" alt="6-pages" align="top"> 
[Automated Identification of Security Issues from Commit Messages and Bug Reports](http://asankhaya.github.io/pdf/automated-identification-of-security-issues-from-commit-messages-and-bug-reports.pdf) - Yaqin Zhou, Asankhaya Sharma, FSE 2017.
- <img src="badges/31-pages-gray.svg" alt="31-pages" align="top"> [SmartPaste: Learning to Adapt Source Code](https://arxiv.org/abs/1705.07867) - Miltiadis Allamanis, Marc Brockschmidt, 2017.
- <img src="badges/7-pages-gray.svg" alt="7-pages" align="top"> [End-to-end Prediction of Buffer Overruns from Raw Source Code via Neural Memory Networks](https://arxiv.org/abs/1703.02458v1) - Min-je Choi, Sehun Jeong, Hakjoo Oh, Jaegul Choo, IJCAI 2017.
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [Tailored Mutants Fit Bugs Better](https://arxiv.org/abs/1611.02516) - Miltiadis Allamanis, Earl T. Barr, René Just, Charles Sutton, 2016.

#### APIs and Code Mining

- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [SAR: Learning Cross-Language API Mappings with Little Knowledge](https://bdqnghi.github.io/files/FSE_2019.pdf) - Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang, FSE 2019.
- <img src="badges/4-pages-gray.svg" alt="4-pages" align="top"> [Hierarchical Learning of Cross-Language Mappings through Distributed Vector Representations for Code](https://arxiv.org/abs/1803.04715) - Nghi D. Q. 
Bui, Lingxiao Jiang, ICSE 2018.
- <img src="badges/7-pages-gray.svg" alt="7-pages" align="top"> [DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning](https://arxiv.org/abs/1704.07734v1) - Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim, IJCAI 2017.
- <img src="badges/9-pages-gray.svg" alt="9-pages" align="top"> [Mining Change Histories for Unknown Systematic Edits](http://soft.vub.ac.be/Publications/2017/vub-soft-tr-17-04.pdf) - Tim Molderez, Reinout Stevens, Coen De Roover, MSR 2017.
- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [Deep API Learning](https://arxiv.org/abs/1605.08535v3) - Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim, FSE 2016.
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [Exploring API Embedding for API Usages and Applications](http://home.eng.iastate.edu/~trong/projects/jv2cs/) - Nguyen, Nguyen, Phan and Nguyen, Journal of Systems and Software 2017.
- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [API Usage Pattern Recommendation for Software Development](http://www.sciencedirect.com/science/article/pii/S0164121216301200) - Haoran Niu, Iman Keivanloo, Ying Zou, 2017.
- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [Parameter-Free Probabilistic API Mining across GitHub](http://homepages.inf.ed.ac.uk/csutton/publications/fse2016.pdf) - Jaroslav Fowkes, Charles Sutton, FSE 2016.
- <img src="badges/10-pages-gray.svg" alt="10-pages" align="top"> [A Subsequence Interleaving Model for Sequential Pattern Mining](http://homepages.inf.ed.ac.uk/csutton/publications/kdd2016-subsequence-interleaving.pdf) - Jaroslav Fowkes, Charles Sutton, KDD 2016.
- <img src="badges/4-pages-gray.svg" alt="4-pages" align="top"> [Lean GHTorrent: GitHub Data on Demand](https://bvasiles.github.io/papers/lean-ghtorrent.pdf) - Georgios Gousios, Bogdan Vasilescu, Alexander Serebrenik, Andy Zaidman, MSR 2014.
- <img src="badges/12-pages-gray.svg" 
alt="12-pages" align="top"> [Mining Idioms from Source Code](http://homepages.inf.ed.ac.uk/csutton/publications/idioms.pdf) - Miltiadis Allamanis, Charles Sutton, FSE 2014.
- <img src="badges/4-pages-gray.svg" alt="4-pages" align="top"> [The GHTorrent Dataset and Tool Suite](http://www.gousios.gr/pub/ghtorrent-dataset-toolsuite.pdf) - Georgios Gousios, MSR 2013.

#### Code Optimization

- <img src="badges/27-pages-gray.svg" alt="27-pages" align="top"> [The Case for Learned Index Structures](https://arxiv.org/abs/1712.01208v2) - Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis, SIGMOD 2018.
- <img src="badges/14-pages-gray.svg" alt="14-pages" align="top"> [End-to-end Deep Learning of Optimization Heuristics](https://chriscummins.cc/pub/2017-pact.pdf) - Chris Cummins, Pavlos Petoumenos, Zheng Wang, Hugh Leather, PACT 2017.
- <img src="badges/14-pages-gray.svg" alt="14-pages" align="top"> [Learning to Superoptimize Programs](https://arxiv.org/abs/1611.01787v3) - Rudy Bunel, Alban Desmaison, M. Pawan Kumar, Philip H.S. Torr, Pushmeet Kohli, ICLR 2017.
- <img src="badges/18-pages-gray.svg" alt="18-pages" align="top"> [Neural Nets Can Learn Function Type Signatures From Binaries](https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-chua.pdf) - Zheng Leong Chua, Shiqi Shen, Prateek Saxena, Zhenkai Liang, USENIX Security Symposium 2017.
- <img src="badges/25-pages-gray.svg" alt="25-pages" align="top"> [Adaptive Neural Compilation](https://arxiv.org/abs/1605.07969v2) - Rudy Bunel, Alban Desmaison, Pushmeet Kohli, Philip H.S. Torr, M. Pawan Kumar, NIPS 2016.
- <img src="badges/10-pages-gray.svg" alt="10-pages" align="top"> [Learning to Superoptimize Programs - Workshop Version](https://arxiv.org/abs/1612.01094) - Rudy Bunel, Alban Desmaison, M. Pawan Kumar, Philip H. S. 
Torr, Pushmeet Kohli, NIPS 2016.

#### Topic Modeling

- <img src="badges/9-pages-gray.svg" alt="9-pages" align="top"> [A Language-Agnostic Model for Semantic Source Code Labeling](https://dl.acm.org/citation.cfm?id=3243132) - Ben Gelman, Bryan Hoyle, Jessica Moore, Joshua Saxe, David Slater, MASES 2018.
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [Topic Modeling of Public Repositories at Scale Using Names in Source Code](https://arxiv.org/abs/1704.00135) - Vadim Markovtsev, Eiso Kant, 2017.
- <img src="badges/4-pages-gray.svg" alt="4-pages" align="top"> [Why, When, and What: Analyzing Stack Overflow Questions by Topic, Type, and Code](http://homepages.inf.ed.ac.uk/csutton/publications/msrCh2013.pdf) - Miltiadis Allamanis, Charles Sutton, MSR 2013.
- <img src="badges/30-pages-gray.svg" alt="30-pages" align="top"> [Semantic Clustering: Identifying Topics in Source Code](http://scg.unibe.ch/archive/drafts/Kuhn06bSemanticClustering.pdf) - Adrian Kuhn, Stéphane Ducasse, Tudor Girba, Information and Software Technology 2007.

#### Sentiment Analysis

- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [A Benchmark Study on Sentiment Analysis for Software Engineering Research](https://arxiv.org/abs/1803.06525) - Nicole Novielli, Daniela Girardi, Filippo Lanubile, MSR 2018.
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [Sentiment Analysis for Software Engineering: How Far Can We Go?](http://www.inf.usi.ch/phd/lin/downloads/Lin2018a.pdf) - Bin Lin, Fiorella Zampetti, Gabriele Bavota, Massimiliano Di Penta, Michele Lanza, Rocco Oliveto, ICSE 2018.
- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [Leveraging Automated Sentiment Analysis in Software Engineering](http://cs.uno.edu/~zibran/resources/MyPapers/SentiStrengthSE_2017.pdf) - Md Rakibul Islam, Minhaz F. 
Zibran, MSR 2017.
- <img src="badges/27-pages-gray.svg" alt="27-pages" align="top"> [Sentiment Polarity Detection for Software Development](https://arxiv.org/pdf/1709.02984.pdf) - Fabio Calefato, Filippo Lanubile, Federico Maiorano, Nicole Novielli, Empirical Software Engineering 2017.
- <img src="badges/6-pages-gray.svg" alt="6-pages" align="top"> [SentiCR: A Customized Sentiment Analysis Tool for Code Review Interactions](https://drive.google.com/file/d/0Byog0ILN8S1haGxpT3hvSzZxdms/view) - Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, Shahram Rahimi, ASE 2017.

#### Code Summarization

- <img src="badges/7-pages-gray.svg" alt="7-pages" align="top"> [Summarizing Source Code with Transferred API Knowledge](https://xin-xia.github.io/publication/ijcai18.pdf) - Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, Zhi Jin, IJCAI 2018.
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [Deep Code Comment Generation](https://xin-xia.github.io/publication/icpc182.pdf) - Xing Hu, Ge Li, Xin Xia, David Lo, Zhi Jin, ICPC 2018.
- <img src="badges/6-pages-gray.svg" alt="6-pages" align="top"> [A Neural Framework for Retrieval and Summarization of Source Code](https://dl.acm.org/citation.cfm?id=3240471) - Qingying Chen, Minghui Zhou, ASE 2018.
- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [Improving Automatic Source Code Summarization via Deep Reinforcement Learning](https://arxiv.org/abs/1811.07234) - Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu, Philip S. 
Yu, ASE 2018.
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [A Convolutional Attention Network for Extreme Summarization of Source Code](https://arxiv.org/abs/1602.03001) - Miltiadis Allamanis, Hao Peng, Charles Sutton, ICML 2016.
- <img src="badges/4-pages-gray.svg" alt="4-pages" align="top"> [TASSAL: Autofolding for Source Code Summarization](http://homepages.inf.ed.ac.uk/csutton/publications/icse2016-demo.pdf) - Jaroslav Fowkes, Pankajan Chanthirasegaran, Razvan Ranca, Miltiadis Allamanis, Mirella Lapata, Charles Sutton, ICSE 2016.
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [Summarizing Source Code using a Neural Attention Model](https://github.com/sriniiyer/codenn/blob/master/summarizing_source_code.pdf) - Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Luke Zettlemoyer, ACL 2016.
- <img src="badges/13-pages-gray.svg" alt="13-pages" align="top"> [Automatic Generation of Pull Request Descriptions](https://arxiv.org/abs/1909.06987) - Zhongxin Liu, Xin Xia, Christoph Treude, David Lo, Shanping Li, ASE 2019.

#### Clone Detection

- <img src="badges/10-pages-gray.svg" alt="10-pages" align="top"> [Learning-based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection](https://pvs.ifi.uni-heidelberg.de/fileadmin/papers/2019/Buech-Andrzejak-SANER2019.pdf) - Lutz Büch, Artur Andrzejak, SANER 2019.
- <img src="badges/10-pages-gray.svg" alt="10-pages" align="top"> [Oreo: Detection of Clones in the Twilight Zone](https://dl.acm.org/citation.cfm?id=3236026) - Vaibhav Saini, Farima Farmahinifarahani, Yadong Lu, Pierre Baldi, Cristina V. Lopes, FSE 2018.
- <img src="badges/10-pages-gray.svg" alt="10-pages" align="top"> [A Deep Learning Approach to Program Similarity](https://dl.acm.org/citation.cfm?id=3243131) - Niccolò Marastoni, Roberto Giacobazzi, Mila Dalla Preda, MASES 2018.
- <img src="badges/6-pages-gray.svg" alt="6-pages" align="top"> [Recurrent Neural Networks for Code Clone Detection](https://seim-conf.org/media/materials/2018/proceedings/SEIM-2018_Short_Papers.pdf#page=48) - Arseny Zorin, Vladimir Itsykson, SEIM 2018.
- <img src="badges/8-pages-gray.svg" alt="8-pages" align="top"> 
[The Adverse Effects of Code Duplication in Machine Learning Models of Code](https://arxiv.org/abs/1812.06469) - Miltiadis Allamanis, 2018.
- <img src="badges/28-pages-gray.svg" alt="28-pages" align="top"> [DéjàVu: A Map of Code Duplicates on GitHub](http://janvitek.org/pubs/oopsla17b.pdf) - Cristina V. Lopes, Petr Maj, Pedro Martins, Vaibhav Saini, Di Yang, Jakub Zitny, Hitesh Sajnani, Jan Vitek, PACMPL (OOPSLA) 2017.
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [Some from Here, Some from There: Cross-project Code Reuse in GitHub](http://web.cs.ucdavis.edu/~filkov/papers/clones.pdf) - Mohammad Gharehyazie, Baishakhi Ray, Vladimir Filkov, MSR 2017.
- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [Deep Learning Code Fragments for Code Clone Detection](http://www.cs.wm.edu/~denys/pubs/ASE%2716-DeepLearningClones.pdf) - Martin White, Michele Tufano, Christopher Vendome, Denys Poshyvanyk, ASE 2016.
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [A Study of Repetitiveness of Code Changes in Software Evolution](https://lib.dr.iastate.edu/cgi/viewcontent.cgi?referer=https://scholar.google.com/&httpsredir=1&article=1016&context=cs_conf) - HA Nguyen, AT Nguyen, TT Nguyen, TN Nguyen, H Rajan, ASE 2013.

#### Differentiable Interpreters

- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer](https://arxiv.org/abs/1803.11361v1) - Joseph Suarez, Justin Johnson, Fei-Fei Li, 2018.
- <img src="badges/16-pages-gray.svg" alt="16-pages" align="top"> [Improving the Universality and Learnability of Neural Programmer-Interpreters with Combinator Abstraction](https://arxiv.org/abs/1802.02696v1) - Da Xiao, Jo-Yu Liao, Xingyuan Yuan, ICLR 2018.
- <img src="badges/10-pages-gray.svg" alt="10-pages" align="top"> [Differentiable Programs with Neural Libraries](https://arxiv.org/abs/1611.02109v2) - Alexander L. Gaunt, Marc Brockschmidt, Nate Kushman, Daniel Tarlow, ICML 2017.
- <img src="badges/15-pages-gray.svg" alt="15-pages" align="top"> [Differentiable Functional Program Interpreters](https://arxiv.org/abs/1611.01988v2) - John K. Feser, Marc Brockschmidt, Alexander L. Gaunt, Daniel Tarlow, 2017.
- <img src="badges/18-pages-gray.svg" 
alt="18-pages" align="top"> [Programming with a Differentiable Forth Interpreter](https://arxiv.org/abs/1605.06640) - Matko Bošnjak, Tim Rocktäschel, Jason Naradowsky, Sebastian Riedel, ICML 2017.
- <img src="badges/15-pages-gray.svg" alt="15-pages" align="top"> [Neural Functional Programming](https://arxiv.org/abs/1611.01988v1) - John K. Feser, Marc Brockschmidt, Alexander L. Gaunt, Daniel Tarlow, ICLR 2017.
- <img src="badges/7-pages-gray.svg" alt="7-pages" align="top"> [TerpreT: A Probabilistic Programming Language for Program Induction](https://arxiv.org/abs/1612.00817) - Alexander L. Gaunt, Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, Jonathan Taylor, Daniel Tarlow, NIPS 2016.

<a name="related-research"></a>

<details>
<summary>Related research</summary>

#### AST Differencing

- <img src="badges/12-pages-gray.svg" alt="12-pages" align="top"> [ClDiff: Generating Concise Linked Code Differences](https://chenbihuan.github.io/paper/ase18-huang-cldiff.pdf) - Kaifeng Huang, Bihuan Chen, Xin Peng, Daihong Zhou, Ying Wang, Yang Liu, Wenyun Zhao, ASE 2018. [Code](https://github.com/FudanSELab/CLDIFF).
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [Generating Accurate and Compact Edit Scripts Using Tree Differencing](http://www.xifiggam.eu/wp-content/uploads/2018/08/GeneratingAccurateandCompactEditScriptsusingTreeDifferencing.pdf) - Veit Frick, Thomas Grassauer, Fabian Beck, Martin Pinzger, ICSME 2018.
- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [Fine-grained and Accurate Source Code Differencing](https://hal.archives-ouvertes.fr/hal-01054552/document) - Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, Martin Monperrus, ASE 2014.

#### Binary Data Modeling

- [Clustering Binary Data with Bernoulli Mixture Models](https://nsgrantham.com/documents/clustering-binary-data.pdf) - Neal S. Grantham.
- [A Family of Blockwise One-Factor Distributions for Modelling High-Dimensional Binary Data](https://arxiv.org/pdf/1511.01343.pdf) - Matthieu Marbac, Mohammed Sedki, Computational Statistics and Data Analysis 2017.
- [BayesBinMix: an R Package for Model Based Clustering of Multivariate Binary Data](https://arxiv.org/pdf/1609.06960.pdf) - Panagiotis Papastamoulis, Magnus Rattray, The R Journal 2016.

#### Soft Clustering Using t-mixture Models

- 
[Robust Mixture Modelling using the t Distribution](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.218.7334&rep=rep1&type=pdf) - D. Peel, G. J. McLachlan, Statistics and Computing 2000.
- [Robust Mixture Modeling using the Skew t Distribution](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1030.9865&rep=rep1&type=pdf) - Tsung-I Lin, Jack C. Lee, Wan J. Hsieh, Statistics and Computing 2010.

#### Natural Language Parsing and Comprehension

- <img src="badges/11-pages-gray.svg" alt="11-pages" align="top"> [A Fast Unified Model for Parsing and Sentence Understanding](https://arxiv.org/abs/1603.06021) - Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D. Manning, Christopher Potts, ACL 2016.

</details>

## Posts

- [Semantic Code Search](https://towardsdatascience.com/semantic-code-search-3cd6d244a39c)
- [Learning from Source Code](https://www.microsoft.com/en-us/research/blog/learning-source-code/)
- [Training a Model to Summarize GitHub Issues](https://towardsdatascience.com/how-to-create-data-products-that-are-magical-using-sequence-to-sequence-models-703f86a231f8)
- [Sequence Intent Classification Using Hierarchical Attention Networks](https://www.microsoft.com/developerblog/2018/03/06/sequence-intent-classification/)
- [Syntax-Directed Variational Autoencoder for Structured Data](https://mlatgt.blog/2018/02/08/syntax-directed-variational-autoencoder-for-structured-data/)
- [Weighted MinHash on GPU Helps to Find Duplicate GitHub Repositories](https://blog.sourced.tech//post/minhashcuda/)
- [Source Code Identifier Embeddings](https://blog.sourced.tech/post/id2vec/)
- [Using Recurrent Neural Networks to Predict Next Tokens in Java Solutions](https://codeforces.com/blog/entry/52327)
- [The Half-Life of Code & the Ship of Theseus](https://erikbern.com/2016/12/05/the-half-life-of-code.html)
- [The Eigenvector of "Why We Moved from Language X to Language Y"](https://erikbern.com/2017/03/15/the-eigenvector-of-why-we-moved-from-language-x-to-language-y.html)
- [Analyzing GitHub: How Developers Change Programming Languages Over Time](https://blog.sourced.tech/post/language_migrations/)
- 
[Topic Modeling of GitHub Repositories](https://blog.sourced.tech//post/github_topic_modeling/)
- [Aroma: Using ML for Code Recommendation](https://ai.facebook.com/blog/aroma-ml-for-code-recommendation/)

## Talks

- [Machine Learning on Source Code](http://vmarkovtsev.github.io/pydays-2018-vienna/)
- [Similarity of GitHub Repositories by Source Code Identifiers](http://vmarkovtsev.github.io/techtalks-2017-moscow/)
- [Using Deep RNN to Model Source Code](http://vmarkovtsev.github.io/re-work-2016-london/)
- [Source Code Abstracts Classification Using CNN (1)](http://vmarkovtsev.github.io/re-work-2016-berlin/)
- [Source Code Abstracts Classification Using CNN (2)](http://vmarkovtsev.github.io/data-natives-2016/)
- [Source Code Abstracts Classification Using CNN (3)](http://vmarkovtsev.github.io/slush-2016/)
- [Embedding the GitHub Contribution Graph](https://egorbu.github.io/techtalks-2017-moscow)
- [Measuring Code Sentiment in a Git Repository](http://vmarkovtsev.github.io/gophercon-2018-moscow/)

## Software

#### Machine Learning

- [Differentiable Neural Computer (DNC)](https://github.com/deepmind/dnc) - TensorFlow implementation of the Differentiable Neural Computer.
- [sourced.ml](https://github.com/src-d/ml) - Extracts features from source code syntax trees and works with machine learning models on top of them.
- [vecino](https://github.com/src-d/vecino) - Finds similar Git repositories.
- [apollo](https://github.com/src-d/apollo) - Source code deduplication at scale, research.
- [gemini](https://github.com/src-d/gemini) - Source code deduplication at scale, production.
- [enry](https://github.com/src-d/enry) - Insanely fast file-based programming language detector.
- [hercules](https://github.com/src-d/hercules) - Git repository mining framework with batteries, based on go-git.
- [DeepCS](https://github.com/guxd/deep-code-search) - Keras and PyTorch implementations of DeepCS (Deep Code Search).
- [Code Neuron](https://github.com/vmarkovtsev/codeneuron) - Recurrent neural network to detect code blocks in natural language text.
- [Naturalize](https://github.com/mast-group/naturalize) - Language-agnostic framework that learns coding conventions from a codebase and exploits them to suggest better identifier names and formatting changes.
- 
[Extreme Source Code Summarization](https://github.com/mast-group/convolutional-attention) - Convolutional attention neural network that learns to summarize source code into short, method-name-like summaries just by looking at the source code tokens.
- [Summarizing Source Code using a Neural Attention Model](https://github.com/sriniiyer/codenn) - CODE-NN uses LSTM networks with attention to produce sentences that describe C# code snippets and SQL queries from StackOverflow; Torch-based models for C#/SQL.
- [Probabilistic API Miner](https://github.com/mast-group/api-mining) - Near-parameter-free probabilistic algorithm for mining the most interesting API patterns from a list of API call sequences.
- [Interesting Sequence Miner](https://github.com/mast-group/sequence-mining) - Novel algorithm based on a probabilistic model that efficiently infers interesting sequences directly from the database.
- [TASSAL](https://github.com/mast-group/tassal) - Tool for the automatic summarization of source code using autofolding; autofolding summarizes a source file by folding non-essential code and comment blocks.
- [JNice2Predict](http://www.nice2predict.org/) - Efficient and scalable open-source structured prediction framework that lets developers build new statistical engines more quickly.
- [Clone Digger](http://clonedigger.sourceforge.net/download.html) - Code clone detection tool for Python and Java.
- [Sensibility](https://github.com/naturalness/sensibility) - Detects and fixes syntax errors in Java source code using LSTMs.
- [DeepBugs](https://github.com/michaelpradel/DeepBugs) - Framework for learning bug detectors from an existing code corpus.
- [DeepSim](https://github.com/parasol-aser/deepsim) - Deep-learning-based approach to measuring functional code similarity.
- [rnn-autocomplete](https://github.com/ZeRoGerc/rnn-autocomplete) - Neural code completion with RNNs (bachelor's thesis).
- [MindsDB](https://github.com/mindsdb/mindsdb) - Explainable AutoML framework for developers; build, train, and use state-of-the-art machine learning models in as little as one line of code.

#### Utilities

- [go-git](https://github.com/src-d/go-git) - Highly extensible Git implementation in pure Go, well suited for data mining.
- [bblfsh](https://github.com/bblfsh) - Self-hosted server for source code parsing.
- [engine](https://github.com/src-d/engine) - Scalable and distributed data retrieval pipeline for source code.
- [minhashcuda](https://github.com/src-d/minhashcuda) - Weighted MinHash implementation on CUDA for efficiently finding duplicates.
- [kmcuda](https://github.com/src-d/kmcuda) - k-means on CUDA for clustering and nearest-neighbor search in dense spaces.
- 
[wmd-relax](https://github.com/src-d/wmd-relax) - Python package that finds nearest neighbors by Word Mover's Distance.
- [Tregex, Tsurgeon and Semgrex](https://nlp.stanford.edu/software/tregex.shtml) - Tregex is a utility for matching patterns in trees, based on tree relationships and regular-expression matches on nodes (the name is short for "tree regular expressions").
- [source{d} models](https://github.com/src-d/models) - MLonCode machine learning models trained with the source{d} stack.

#### Datasets

- [Neural Code Search Evaluation Dataset](https://github.com/facebookresearch/Neural-Code-Search-Evaluation-Dataset) - Search corpus of 4.7 million methods from more than 24,000 repositories, plus 287 Stack Overflow questions with code-snippet answers.
- [CodeSearchNet](https://github.com/github/CodeSearchNet) - Collection of datasets and benchmarks for code retrieval from natural language, with 2 million (`comment`, `code`) pairs.
- [Public Git Archive](https://github.com/src-d/datasets/tree/master/PublicGitArchive) - 6 TB of Git repositories from GitHub.
- [StackOverflow Question-Code Dataset](https://github.com/LittleYUYU/StackOverflow-Question-Code-Dataset) - ~148K Python and ~120K SQL question-code pairs mined from StackOverflow.
- [GitHub Issue Titles and Descriptions for NLP Analysis](https://www.kaggle.com/davidshinn/github-issues/) - ~8 million GitHub issue titles and descriptions collected in 2017.
- [GitHub repositories - languages distribution](https://data.world/source-d/github-repositories-languages-distribution) - Programming language distribution across 14,000,000 repositories on GitHub (October 2016).
- [452M commits on GitHub](https://data.world/vmarkovtsev/452-m-commits-on-github) - Metadata of ~452 million commits from 16 million GitHub repositories (October 2016).
- [GitHub readme files](https://data.world/vmarkovtsev/github-readme-files) - README files of all 16 million GitHub repositories (October 2016).
- [From language X to Y](https://data.world/vmarkovtsev/from-language-x-to-y) - Cache files Erik Bernhardsson collected for his excellent blog post.
- [GitHub word2vec 120k](https://data.world/vmarkovtsev/github-word-2-vec-120-k) - Identifier sequences extracted from the 120,000 most-starred GitHub repositories.
- [GitHub Source Code Names](https://data.world/vmarkovtsev/github-source-code-names) - Names extracted from the source code of 13 million GitHub repositories (not people's names).
- 
[GitHub duplicate repositories](https://data.world/vmarkovtsev/github-duplicate-repositories) - GitHub repositories that are not marked as forks but are very similar to each other.
- [GitHub language keyword frequencies](https://data.world/vmarkovtsev/github-lng-keyword-frequencies) - Programming language keyword frequencies extracted from 16 million GitHub repositories.
- [GitHub Java Corpus](http://groups.inf.ed.ac.uk/cup/javaGithub/) - Set of Java projects collected from GitHub and used in a number of publications; it comprises 14,785 projects and 352,312,696 lines of code.
- [150k Python Dataset](https://www.sri.inf.ethz.ch/py150) - Dataset consisting of 150,000 Python ASTs.
- [150k JavaScript Dataset](https://www.sri.inf.ethz.ch/js150) - Dataset consisting of 150,000 JavaScript files and their parsed ASTs.
- [card2code](https://github.com/deepmind/card2code) - Language-to-code datasets described in the paper [Latent Predictor Networks for Code Generation](#card2code).
- [NL2Bash](https://github.com/TellinaTool/nl2bash) - ~10,000 Bash one-liners collected from websites such as StackOverflow, paired with English descriptions written by Bash programmers, as described in the [paper](https://arxiv.org/abs/1802.08979).
- [GitHub JavaScript Dump, October 2016](https://archive.org/details/javascript-sources-oct2016.sqlite3) - 494,352 syntactically valid JavaScript files from the 10,000 most-starred JavaScript repositories on GitHub, with license information, parsed into ASTs.
- [BigCloneBench](https://jeffsvajlenko.weebly.com/bigcloneeval.html) - Clone detection benchmark of 8 million function clone pairs in IJaDataset.

## Credits

- A lot of references and articles were taken from [mast-group](https://mast-group.github.io/).
- Inspired by [Awesome Machine Learning](https://github.com/josephmisiti/awesome-machine-learning).

## Contributions

See [CONTRIBUTING.md](CONTRIBUTING.md). In short: open a [pull request](https://github.com/src-d/awesome-machine-learning-on-source-code/pulls) and sign off your commits per the [contributing guidelines](https://github.com/src-d/awesome-machine-learning-on-source-code/blob/master/CONTRIBUTING.md#certificate-of-origin).

## License

[![License: CC BY-SA 4.0](badges/License-CC-BY--SA-4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-sa/4.0/)
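Several of the tools and datasets above (minhashcuda, apollo, gemini, the GitHub duplicate repositories dataset) revolve around MinHash-based near-duplicate detection. As a closing illustration, here is a minimal pure-Python sketch of the idea — not the CUDA implementation, and all function names, parameters, and inputs are our own:

```python
import hashlib
import random

def shingles(text, k=5):
    """Break a document into overlapping character k-grams (shingles)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(shingle_set, num_hashes=64, seed=0):
    """Signature slot i = minimum of an i-th salted hash over the whole set."""
    salts = [random.Random(seed + i).getrandbits(32) for i in range(num_hashes)]
    return [
        min(int.from_bytes(hashlib.blake2b(f"{salt}:{s}".encode(),
                                           digest_size=8).digest(), "big")
            for s in shingle_set)
        for salt in salts
    ]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

a = "def add(x, y):\n    return x + y\n"
b = "def add(x, y):\n    return x + y  # sum\n"
c = "class Node:\n    pass\n"

sa, sb, sc = (minhash_signature(shingles(t)) for t in (a, b, c))
# a vs b (near-duplicates) scores far above a vs c (unrelated code)
print(estimated_jaccard(sa, sb), estimated_jaccard(sa, sc))
```

Comparing fixed-length signatures instead of full shingle sets is what makes the approach scale to millions of repositories; the GPU tools above additionally weight shingles by frequency.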
# awesome-machine-learning-on-source-code Quick Start Guide

**Important**: this repository (`src-d/awesome-machine-learning-on-source-code`) is **no longer maintained** and accepts no new updates, issues, or PRs.
The officially recommended replacement is the actively maintained **[ML4Code](https://ml4code.github.io/papers.html)** project. This guide helps you reach the core resources in the list (papers, datasets, tools) quickly; it is not about installing a runnable software package.

## Prerequisites

Because this project is a curated list with no executable code of its own, no particular runtime is required. To read and make use of the listed resources, the following setup is recommended:

*   **Operating system**: Windows / macOS / Linux.
*   **Basic tools**:
    *   `Git`: to clone the repository or related projects.
    *   A modern browser: to open paper links (arXiv, OpenReview, etc.) and dataset home pages.
*   **Development environment (optional)**: to reproduce an algorithm from the list or run the associated software, you will typically need:
    *   Python 3.6+
    *   PyTorch or TensorFlow
    *   Jupyter Notebook

## Getting the Resources

Nothing needs to be "installed"; cloning with Git gives you the full categorized index locally.

### 1. Clone the repository
In a terminal, run:

```bash
git clone https://github.com/src-d/awesome-machine-learning-on-source-code.git
cd awesome-machine-learning-on-source-code
```

> **Mirror tip**: if GitHub access is slow from your network, clone through a mirror (such as a synced Gitee mirror, if one exists) or via a proxy.
> ```bash
> # Example: cloning a Gitee mirror (if one exists)
> git clone https://gitee.com/mirror/awesome-machine-learning-on-source-code.git
> ```

### 2. Browse the online version
The categorized index can be read directly on the official GitHub page:
[https://github.com/src-d/awesome-machine-learning-on-source-code](https://github.com/src-d/awesome-machine-learning-on-source-code)

**The actively maintained replacement project is strongly recommended:**
*   **Website**: [https://ml4code.github.io](https://ml4code.github.io)
*   **Repository**: `git clone https://github.com/ml4code/ml4code.github.io.git`

## Basic Usage

The core value of this project is its **categorized index**. To find resources on machine learning over source code:

### Step 1: Pick a research area
Open the cloned `README.md` (or the online page) and use the **Contents** section to find the area you care about, for example:
*   **Code Suggestion and Completion**
*   **Program Repair and Bug Detection**
*   **Code Summarization**
*   **Embeddings in Software Engineering**

### Step 2: Get papers or datasets
Within a category, follow a paper's title link (usually to arXiv or the conference site) to read it, or use the **Datasets** section to obtain training data.

**Example: finding recent papers on code completion**
1. 
Locate the `Papers` -> `Code Suggestion and Completion` section.
2.  Skim the listed titles, e.g. *Code Completion with Neural Attention and Pointer Networks*.
3.  Follow the link to download the PDF or read the abstract on arXiv.

### Step 3: Reproduce the associated software
The **Software** section links to concrete open-source tools. To use one, go to its repository and install it there.

**Example workflow:**
1.  Find an interesting project under `Software` -> `Machine Learning` (say, a code embedding tool).
2.  Copy its GitHub link.
3.  Follow that project's own `README` to install and run it (usually via `pip install` or `docker run`).

```bash
# Suppose you picked a tool called "code2vec" (illustrative only; the flags
# below are made up — follow the real project's README)
git clone https://github.com/tech-srl/code2vec.git
cd code2vec
pip install -r requirements.txt
python code2vec.py --data data/small_c_corpus --model models/c_model
```

## Quick Reference: Research Areas

The main research areas the list covers, for quick orientation:

| Area | Typical applications |
| :--- | :--- |
| **Program Synthesis** | Generating code from natural-language descriptions |
| **Source Code Analysis** | Code complexity prediction, style transfer |
| **Code Suggestion** | IDE hints, autocompletion |
| **Program Repair** | Automatically localizing and fixing bugs |
| **Clone Detection** | Duplicate and plagiarism detection, code-reuse analysis |
| **Code Summarization** | Auto-generating comments and documentation |
| **Program Translation** | Converting Java code to Python, and similar |

> **Note**: for concrete implementations and dataset downloads, always follow the original links in the list, since dependency and environment requirements vary widely between projects.

A senior architect at a large fintech company is leading the refactoring of a legacy core trading system and urgently needs ML-based tooling to raise code quality and development efficiency.

### Without awesome-machine-learning-on-source-code
- **Literature search is a needle-in-a-haystack exercise**: the team combs academic databases by hand for papers on "code clone detection" or "automated bug fixing", and weeks later still cannot find recent, applicable algorithms.
- **Technology choices lack grounding**: unaware of existing mature datasets and benchmarks, the team builds training data from scratch, badly delaying the project start.
- **Rampant wheel reinvention**: developers do not know that open-source projects already implement similar "code completion" or "program translation" functionality, and waste effort rewriting baseline model code.
- **Lagging awareness of the state of the art**: the team misses the latest MLonCode results from top venues such as ICSE and ASE, leaving its approach stuck on outdated statistical methods instead of deep learning.

### With awesome-machine-learning-on-source-code
- **Pinpointing core resources**: with the clearly categorized paper list, the team locks onto state-of-the-art "program repair" algorithms for legacy Java code, with implementations, within half a day.
- **Reusing high-quality data directly**: with the curated datasets, the team finishes model pretraining quickly, cutting a month of data preparation down to three days.
- **Innovating on the shoulders of giants**: following the recommended Software projects, the team integrates a mature code embeddings module and focuses on business logic instead of low-level plumbing.
- **Tracking academic and industry trends**: via the conference links and digests, the team adopts recent neural architectures and markedly improves code-suggestion accuracy.

awesome-machine-learning-on-source-code 
turns scattered research treasures into an actionable engineering guide, freeing the team from tedious surveying to focus on the business problems that actually matter.
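The browsing workflow from the quick-start guide (clone the list, then jump to a category in `README.md`) can also be scripted. A small sketch that indexes an awesome-list-style README by section follows; the sample input and its URLs are placeholders of our own, not entries from the real list:

```python
import re

def index_awesome_list(markdown):
    """Map each Markdown heading to the [title](url) entries listed under it."""
    sections, current = {}, None
    for line in markdown.splitlines():
        heading = re.match(r"#{2,4}\s+(.+)", line)
        if heading:
            current = heading.group(1).strip()
            sections[current] = []
            continue
        if current:
            # Collect every inline link on the line as a (title, url) pair.
            sections[current] += re.findall(r"\[([^\]]+)\]\((https?://[^)]+)\)", line)
    return sections

# Tiny stand-in for the real README.md; URLs are placeholders.
sample = """\
#### Code Suggestion and Completion

- [Code Completion with Statistical Language Models](http://example.org/pldi14) - PLDI 2014.

#### Program Repair and Bug Detection

- [DeepBugs](https://example.org/deepbugs) - 2018.
"""

index = index_awesome_list(sample)
print(sorted(index))
print(index["Program Repair and Bug Detection"])
```

Running the same function over the cloned repository's `README.md` gives a machine-readable index of every category, which is handy for filtering papers by venue or year with ordinary string matching.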