[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-atfortes--Awesome-LLM-Reasoning":3,"tool-atfortes--Awesome-LLM-Reasoning":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":81,"owner_email":82,"owner_twitter":83,"owner_website":84,"owner_url":85,"languages":82,"stars":86,"forks":87,"last_commit_at":88,"license":89,"difficulty_score":90,"env_os":91,"env_gpu":92,"env_ram":92,"env_deps":93,"category_tags":96,"github_topics":97,"view_count":23,"oss_zip_url":82,"oss_zip_packed_at":82,"status":16,"created_at":116,"updated_at":117,"faqs":118,"releases":119},3459,"atfortes\u002FAwesome-LLM-Reasoning","Awesome-LLM-Reasoning","From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓","Awesome-LLM-Reasoning 是一个精心整理的开源资源库，旨在帮助开发者与研究人员深入探索并解锁大语言模型（LLM）及多模态模型的推理能力。从基础的“思维链”（Chain-of-Thought）提示技术，到前沿的 OpenAI o1 和 DeepSeek-R1 等复杂推理模型，该项目系统性地汇集了相关学术论文、代码实现及技术综述。\n\n当前大模型虽强大，但在逻辑推导、数学解题及符号推理等方面仍存在挑战。Awesome-LLM-Reasoning 通过分类梳理“如何让模型学会思考”的关键技术，解决了从业者难以快速追踪领域进展、缺乏系统性学习路径的痛点。其内容涵盖推理机制分析、多模态推理应用、小模型推理扩展以及防数据污染评测等多个维度，并持续更新 2024 至 2025 年的最新研究成果。\n\n该资源库特别适合 AI 研究人员、算法工程师以及对大模型底层逻辑感兴趣的技术爱好者使用。无论是希望复现经典推理算法，还是寻找提升模型逻辑表现的最新方案，都能在此找到高质量的参考依据。作为连接理论研究与工程实践的桥梁，Awesome-LLM-Reasoning 以清晰的结构和权威的选品，成为","Awesome-LLM-Reasoning 是一个精心整理的开源资源库，旨在帮助开发者与研究人员深入探索并解锁大语言模型（LLM）及多模态模型的推理能力。从基础的“思维链”（Chain-of-Thought）提示技术，到前沿的 OpenAI o1 和 DeepSeek-R1 等复杂推理模型，该项目系统性地汇集了相关学术论文、代码实现及技术综述。\n\n当前大模型虽强大，但在逻辑推导、数学解题及符号推理等方面仍存在挑战。Awesome-LLM-Reasoning 通过分类梳理“如何让模型学会思考”的关键技术，解决了从业者难以快速追踪领域进展、缺乏系统性学习路径的痛点。其内容涵盖推理机制分析、多模态推理应用、小模型推理扩展以及防数据污染评测等多个维度，并持续更新 2024 至 2025 年的最新研究成果。\n\n该资源库特别适合 AI 研究人员、算法工程师以及对大模型底层逻辑感兴趣的技术爱好者使用。无论是希望复现经典推理算法，还是寻找提升模型逻辑表现的最新方案，都能在此找到高质量的参考依据。作为连接理论研究与工程实践的桥梁，Awesome-LLM-Reasoning 以清晰的结构和权威的选品，成为理解大模型推理演进不可或缺的工具。","\u003Ca name=\"readme-top\">\u003C\u002Fa>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fatfortes\u002FAwesome-LLM-Reasoning\u002Fstargazers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fatfortes\u002FAwesome-LLM-Reasoning?style=for-the-badge\" alt=\"Stargazers\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fatfortes\u002FAwesome-LLM-Reasoning\u002Fnetwork\u002Fmembers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fatfortes\u002FAwesome-LLM-Reasoning?style=for-the-badge\" 
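As a minimal taste of the simplest technique catalogued below, here is a sketch of zero-shot Chain-of-Thought prompting in the two-stage style of "Large Language Models are Zero-Shot Reasoners" (Kojima et al., NeurIPS'22, listed under Technique → 2022). `query_model` is a hypothetical stand-in for whatever chat-completion client you use; this illustrates the idea under stated assumptions, not a fixed implementation.

```python
# Minimal sketch of zero-shot Chain-of-Thought (CoT) prompting, after
# Kojima et al. (NeurIPS'22). `query_model` is a hypothetical stand-in
# for any chat-completion client; plug in your own.

def query_model(prompt: str) -> str:
    raise NotImplementedError("replace with a call to your LLM client")

def zero_shot_cot(question: str) -> str:
    # Stage 1 (reasoning extraction): the trigger phrase elicits a
    # step-by-step reasoning chain instead of a bare answer.
    chain = query_model(f"Q: {question}\nA: Let's think step by step.")
    # Stage 2 (answer extraction): condition on the chain and ask for
    # just the final answer.
    return query_model(
        f"Q: {question}\nA: Let's think step by step. {chain}\n"
        "Therefore, the answer is"
    ).strip()
```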
alt=\"Forks\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fatfortes\u002FAwesome-LLM-Reasoning\u002Fgraphs\u002Fcontributors\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcontributors\u002Fatfortes\u002FAwesome-LLM-Reasoning?style=for-the-badge\" alt=\"Contributors\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fatfortes\u002FAwesome-LLM-Reasoning\u002Fblob\u002Fmain\u002FLICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fatfortes\u002FAwesome-LLM-Reasoning?style=for-the-badge\" alt=\"MIT License\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"assets\u002Fcot.svg\" width=\"90%\" style=\"align:center;\"\u002F>\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">Awesome LLM Reasoning\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n    \u003Cb> Curated collection of papers and resources on how to unlock the reasoning ability of LLMs and MLLMs.\u003C\u002Fb>\n\u003C\u002Fp>\n\n\u003Cdetails>\n  \u003Csummary>🗂️ Table of Contents\u003C\u002Fsummary>\n  \u003Col>\n    \u003Cli>\u003Ca href=\"#survey\">Survey\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#analysis\">Analysis\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#ltechnique\">Technique\u003C\u002Fa>\n      \u003Cul>\n        \u003Cli>\u003Ca href=\"#llm\">🔤 Reasoning in Large Language Models - \u003Cem>An Emergent Ability\u003C\u002Fem>\u003C\u002Fa>\u003C\u002Fli>\n        \u003Cli>\u003Ca href=\"#mllm\">🧠 Multimodal Reasoning in Large Language Models\u003C\u002Fa>\u003C\u002Fli>\n        \u003Cli>\u003Ca href=\"#lm\">🤏 Scaling Smaller Language Models to Reason\u003C\u002Fa>\u003C\u002Fli>\n      \u003C\u002Ful>\n    \u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#other-useful-resources\">Other Useful Resources\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#other-awesome-lists\">Other Awesome Lists\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#contributing\">Contributing\u003C\u002Fa>\u003C\u002Fli>\n  \u003C\u002Fol>\n\u003C\u002Fdetails>\n\nIf you would like to test the symbolic reasoning ability of LLMs, take a look at: \u003Cb>\u003Ca href=https:\u002F\u002Fgithub.com\u002Fatfortes\u002FLLMSymbolicReasoningBench>LLMSymbolicReasoningBench\u003C\u002Fa>\u003C\u002Fb> 😄\n\n\n\n## Survey\n\n### 2025\n\n1. **[Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey.](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.12605)** [[code](https:\u002F\u002Fgithub.com\u002Fyaotingwangofficial\u002FAwesome-MCoT)]\n\n    *Yaoting Wang, Shengqiong Wu, Yuecheng Zhang, William Wang, Ziwei Liu, Jiebo Luo, Hao Fei.* Preprint'25\n\n2. **[Recent Advances in Large Language Model Benchmarks Against Data Contamination: From Static to Dynamic Evaluation.](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.17521)** [[code](https:\u002F\u002Fgithub.com\u002FSeekingDream\u002FStatic-to-Dynamic-LLMEval)]\n\n    *Simin Chen, Yiming Chen, Zexin Li, Yifan Jiang, Zhongwei Wan, Yixin He, Dezhi Ran, Tianle Gu, Haizhou Li, Tao Xie, Baishakhi Ray.* Preprint'25\n\n### 2024\n\n1. **[Attention Heads of Large Language Models: A Survey.](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.03752)** [[code](https:\u002F\u002Fgithub.com\u002FIAAR-Shanghai\u002FAwesome-Attention-Heads)]\n\n    *Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Bo Tang, Feiyu Xiong, Zhiyu Li.* Preprint'24\n\n1. 
1. **[Internal Consistency and Self-Feedback in Large Language Models: A Survey.](https://arxiv.org/abs/2407.14507)** [[code](https://github.com/IAAR-Shanghai/ICSFSurvey)]

    *Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Feiyu Xiong, Zhiyu Li.* Preprint'24

1. **[Puzzle Solving using Reasoning of Large Language Models: A Survey.](https://arxiv.org/abs/2402.11291)** [[code](https://puzzlellms.github.io/)]

    *Panagiotis Giadikiaroglou, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou.* Preprint'24

1. **[Large Language Models for Mathematical Reasoning: Progresses and Challenges.](https://arxiv.org/abs/2402.00157)**

    *Janice Ahn, Rishu Verma, Renze Lou, Di Liu, Rui Zhang, Wenpeng Yin.* ACL'24

### 2022

1. **[Towards Reasoning in Large Language Models: A Survey.](https://arxiv.org/abs/2212.10403)** [[code](https://github.com/jeffhj/LM-reasoning)]

    *Jie Huang, Kevin Chen-Chuan Chang.* ACL'23 Findings

1. **[Reasoning with Language Model Prompting: A Survey.](https://arxiv.org/abs/2212.09597)** [[code](https://github.com/zjunlp/Prompt4ReasoningPapers)]

    *Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Huajun Chen.* ACL'23

<p align="right" style="font-size: 14px; color: #555; margin-top: 20px;">
    <a href="#readme-top" style="text-decoration: none; color: #007bff; font-weight: bold;">
        ↑ Back to Top ↑
    </a>
</p>

## Analysis

### 2025

1. **[New Trends for Modern Machine Translation with Large Reasoning Models.](https://arxiv.org/abs/2503.10351)**

    *Sinuo Liu, Chenyang Lyu, Minghao Wu, Longyue Wang, Weihua Luo, Kaifu Zhang, Zifu Shang.* Preprint'25

### 2024

1. **[Are Your LLMs Capable of Stable Reasoning?](https://arxiv.org/abs/2412.13147)** [[code](https://github.com/open-compass/GPassK)]

    *Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen.* Preprint'24

1. **[From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond.](https://arxiv.org/abs/2411.03590)**

    *Harsha Nori, Naoto Usuyama, Nicholas King, Scott Mayer McKinney, Xavier Fernandes, Sheng Zhang, Eric Horvitz.* Preprint'24

1. **[To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning.](https://arxiv.org/abs/2409.12183)**

    *Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett.* Preprint'24

1. **[Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers.](https://arxiv.org/abs/2409.04109)**

    *Chenglei Si, Diyi Yang, Tatsunori Hashimoto.* Preprint'24

1. **[A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners.](https://arxiv.org/abs/2406.11050)** [[code](https://github.com/bowen-upenn/llm_token_bias)]

    *Bowen Jiang, Yangxinyu Xie, Zhuoqun Hao, Xiaomeng Wang, Tanwi Mallick, Weijie J. Su, Camillo J. Taylor, Dan Roth.* EMNLP'24
1. **[Iteration Head: A Mechanistic Study of Chain-of-Thought.](https://arxiv.org/abs/2406.02128)**

    *Vivien Cabannes, Charles Arnal, Wassim Bouaziz, Alice Yang, Francois Charton, Julia Kempe.* NeurIPS'24

1. **[Do Large Language Models Latently Perform Multi-Hop Reasoning?](https://arxiv.org/abs/2402.16837)**

    *Sohee Yang, Elena Gribovskaya, Nora Kassner, Mor Geva, Sebastian Riedel.* ACL'24

1. **[Premise Order Matters in Reasoning with Large Language Models.](https://arxiv.org/abs/2402.08939)**

    *Xinyun Chen, Ryan A. Chi, Xuezhi Wang, Denny Zhou.* ICML'24

1. **[The Impact of Reasoning Step Length on Large Language Models.](https://arxiv.org/abs/2401.04925)**

    *Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, Mengnan Du.* ACL'24 Findings

1. **[Large Language Models Cannot Self-Correct Reasoning Yet.](https://arxiv.org/abs/2310.01798)**

    *Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou.* ICLR'24

1. **[At Which Training Stage Does Code Data Help LLM Reasoning?](https://arxiv.org/pdf/2309.16298)**

    *Yingwei Ma, Yue Liu, Yue Yu, Yuanliang Zhang, Yu Jiang, Changjian Wang, Shanshan Li.* ICLR'24

### 2023

1. **[Measuring Faithfulness in Chain-of-Thought Reasoning.](https://arxiv.org/abs/2307.13702)**

    *Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish, Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy Maxwell, Timothy Telleen-Lawton, Tristan Hume, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez.* Preprint'23

1. **[Faith and Fate: Limits of Transformers on Compositionality.](https://arxiv.org/abs/2305.18654)**

    *Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi.* NeurIPS'23

1. **[Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.](https://arxiv.org/abs/2305.04388)** [[code](https://github.com/milesaturpin/cot-unfaithfulness)]

    *Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman.* NeurIPS'23

1. **[A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity.](https://arxiv.org/abs/2302.04023)**

    *Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung.* AACL'23

1. **[Large Language Models Can Be Easily Distracted by Irrelevant Context.](https://arxiv.org/abs/2302.00093)**

    *Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed Chi, Nathanael Schärli, Denny Zhou.* ICML'23
1. **[On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning.](https://arxiv.org/abs/2212.08061)**

    *Omar Shaikh, Hongxin Zhang, William Held, Michael Bernstein, Diyi Yang.* ACL'23

1. **[Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters.](https://arxiv.org/abs/2212.10001)** [[code](https://github.com/sunlab-osu/Understanding-CoT)]

    *Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun.* ACL'23

1. **[Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them.](https://arxiv.org/abs/2210.09261)** [[code](https://github.com/suzgunmirac/BIG-Bench-Hard)]

    *Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei.* ACL'23 Findings

### 2022

1. **[Emergent Abilities of Large Language Models.](https://arxiv.org/abs/2206.07682)** [[blog](https://ai.googleblog.com/2022/11/characterizing-emergent-phenomena-in.html)]

    *Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus.* TMLR'22

1. **[Can language models learn from explanations in context?](https://arxiv.org/abs/2204.02329)**

    *Andrew K. Lampinen, Ishita Dasgupta, Stephanie C. Y. Chan, Kory Matthewson, Michael Henry Tessler, Antonia Creswell, James L. McClelland, Jane X. Wang, Felix Hill.* EMNLP'22

<p align="right" style="font-size: 14px; color: #555; margin-top: 20px;">
    <a href="#readme-top" style="text-decoration: none; color: #007bff; font-weight: bold;">
        ↑ Back to Top ↑
    </a>
</p>

<h2 id="ltechnique">Technique</h2>

<h3 id="llm">🔤 Reasoning in Large Language Models - <i>An Emergent Ability</i></h3>

### 2025

1. **[JudgeLRM: Large Reasoning Models as a Judge.](https://arxiv.org/abs/2504.00050)**

    *Nuo Chen, Zhiyuan Hu, Qingyun Zou, Jiaying Wu, Qian Wang, Bryan Hooi, Bingsheng He.* Preprint'25

1. **[Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination.](https://arxiv.org/abs/2503.04149)** [[code](https://codekaleidoscope.github.io/dycodeeval.html)]

    *Simin Chen, Pranav Pusarla, Baishakhi Ray.* ICML'25

1. **[CRANE: Reasoning with constrained LLM generation.](https://arxiv.org/abs/2502.09061)**

    *Debangshu Banerjee, Tarun Suresh, Shubham Ugare, Sasa Misailovic, Gagandeep Singh.* ICML'25

1. **[Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching.](https://arxiv.org/abs/2503.05179)** [[code](https://www.github.com/SimonAytes/SoT)]

    *Simon A. Aytes, Jinheon Baek, Sung Ju Hwang.* Preprint'25

1. **[Self-rewarding correction for mathematical reasoning.](https://arxiv.org/abs/2502.19613)**

    *Wei Xiong, Hanning Zhang, Chenlu Ye, Lichang Chen, Nan Jiang, Tong Zhang.* Preprint'25
1. **[Competitive Programming with Large Reasoning Models.](https://arxiv.org/abs/2502.06807)**

    *OpenAI: Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaiev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Lukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu, Szymon Sidor, Vineet Kosaraju, Wenda Zhou.* Preprint'25

1. **[s1: Simple test-time scaling.](https://arxiv.org/abs/2501.19393)**

    *Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto.* Preprint'25

1. **[DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.](https://arxiv.org/abs/2501.12948)** [[project](https://github.com/deepseek-ai/DeepSeek-R1)]

    *Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z.F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, et al.* Preprint'25

1. **[Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought.](https://arxiv.org/abs/2501.04682)**

    *Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn.* Preprint'25

### 2024

1. **[HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs.](https://arxiv.org/abs/2412.18925)** [[code](https://github.com/FreedomIntelligence/HuatuoGPT-o1)]

    *Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Jianye Hou, Benyou Wang.* Preprint'24

1. **[PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models.](https://dl.acm.org/doi/abs/10.1145/3643780)** [[pdf](https://dl.acm.org/doi/pdf/10.1145/3643780)]

    *Simin Chen, XiaoNing Feng, Xiaohong Han, Cong Liu, Wei Yang.* FSE'24

1. **[DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought.](https://arxiv.org/abs/2412.17498)** [[code](https://github.com/krystalan/DRT-o1)]

    *Jiaan Wang, Fandong Meng, Yunlong Liang, Jie Zhou.* Preprint'24

1. **[MALT: Improving Reasoning with Multi-Agent LLM Training.](https://arxiv.org/abs/2412.01928)**

    *Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das, Markian Rybchuk, Philip H. S. Torr, Ivan Laptev, Fabio Pizzati, Ronald Clark, Christian Schroeder de Witt.* Preprint'24

1. **[SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World.](https://arxiv.org/abs/2412.07472)**

    *Jiaqi Zhang, Chen Gao, Liyuan Zhang, Yong Li, Hongzhi Yin.* Preprint'24

1. **[Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions.](https://arxiv.org/abs/2411.14405)** [[code](https://github.com/AIDC-AI/Marco-o1)] [[model](https://huggingface.co/AIDC-AI/Marco-o1)]

    *Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang.* Preprint'24
1. **[Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning.](https://arxiv.org/abs/2410.10735)**

    *Kuofeng Gao, Huanqia Cai, Qingyao Shuai, Dihong Gong, Zhifeng Li.* Preprint'24

1. **[Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model.](https://arxiv.org/abs/2410.03136)** [[code](https://github.com/xiongsiheng/SWAP)]

    *Siheng Xiong, Ali Payani, Yuan Yang, Faramarz Fekri.* Preprint'24

1. **[Interpretable Contrastive Monte Carlo Tree Search Reasoning.](https://arxiv.org/abs/2410.01707)**

    *Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, Lijie Wen.* Preprint'24

1. **[Training Language Models to Self-Correct via Reinforcement Learning.](https://arxiv.org/abs/2409.12917)**

    *Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, Yi Su, JD Co-Reyes, Avi Singh, Kate Baumli, Shariq Iqbal, Colton Bishop, Rebecca Roelofs, Lei M. Zhang, Kay McKinney, Disha Shrivastava, Cosmin Paduraru, George Tucker, Doina Precup, Feryal Behbahani, Aleksandra Faust.* Preprint'24

1. **[OpenAI o1.](https://openai.com/index/learning-to-reason-with-llms/)**

    *OpenAI Team.* Technical Report'24

1. **[Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents.](https://arxiv.org/abs/2408.07199)**

    *Pranav Putta, Edmund Mills, Naman Garg, Sumeet Motwani, Chelsea Finn, Divyansh Garg, Rafael Rafailov.* Preprint'24

1. **[DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning.](https://arxiv.org/abs/2407.04078)** [[code](https://github.com/ChengpengLi1003/DotaMath)]

    *Chengpeng Li, Guanting Dong, Mingfeng Xue, Ru Peng, Xiang Wang, Dayiheng Liu.* Preprint'24

1. **[LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic.](https://arxiv.org/abs/2406.17663)**

    *Aditya Kalyanpur, Kailash Saravanakumar, Victor Barres, Jennifer Chu-Carroll, David Melville, David Ferrucci.* Preprint'24

1. **[Q\*: Improving Multi-step Reasoning for LLMs with Deliberative Planning.](https://arxiv.org/abs/2406.14283)**

    *Chaojie Wang, Yanchen Deng, Zhiyi Lv, Shuicheng Yan, Bo An.* Preprint'24

1. **[Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models.](https://arxiv.org/abs/2406.04271)** [[code](https://github.com/YangLing0818/buffer-of-thought-llm)]

    *Ling Yang, Zhaochen Yu, Tianjun Zhang, Shiyi Cao, Minkai Xu, Wentao Zhang, Joseph E. Gonzalez, Bin Cui.* Preprint'24

1. **[Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing.](https://arxiv.org/abs/2404.12253)**

    *Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu.* Preprint'24

1. **[Self-playing Adversarial Language Game Enhances LLM Reasoning.](https://arxiv.org/abs/2404.10642)**

    *Pengyu Cheng, Tianhao Hu, Han Xu, Zhisong Zhang, Yong Dai, Lei Han, Nan Du.* Preprint'24

1. **[Evaluating Mathematical Reasoning Beyond Accuracy.](https://arxiv.org/abs/2404.05692)**

    *Shijie Xia, Xuefeng Li, Yixin Liu, Tongshuang Wu, Pengfei Liu.* Preprint'24
1. **[Advancing LLM Reasoning Generalists with Preference Trees.](https://arxiv.org/abs/2404.02078)**

    *Lifan Yuan, Ganqu Cui, Hanbin Wang, Ning Ding, Xingyao Wang, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, Zhenghao Liu, Bowen Zhou, Hao Peng, Zhiyuan Liu, Maosong Sun.* Preprint'24

1. **[LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning.](https://arxiv.org/abs/2403.11552)** [[code](https://github.com/AssassinWS/LLM-TAMP)]

    *Shu Wang, Muzhi Han, Ziyuan Jiao, Zeyu Zhang, Ying Nian Wu, Song-Chun Zhu, Hangxin Liu.* IROS'24

1. **[Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking.](https://arxiv.org/abs/2403.09629)**

    *Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman.* Preprint'24

1. **[GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements.](https://arxiv.org/abs/2402.10963)**

    *Alex Havrilla, Sharath Raparthy, Christoforus Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Roberta Raileanu.* ICML'24

1. **[Chain-of-Thought Reasoning Without Prompting.](https://arxiv.org/abs/2402.10200)**

    *Xuezhi Wang, Denny Zhou.* Preprint'24

1. **[V-STaR: Training Verifiers for Self-Taught Reasoners.](https://arxiv.org/abs/2402.06457)**

    *Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville, Alessandro Sordoni, Rishabh Agarwal.* Preprint'24

1. **[InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning.](https://arxiv.org/abs/2402.06332)**

    *Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin.* Preprint'24

1. **[Self-Discover: Large Language Models Self-Compose Reasoning Structures.](https://arxiv.org/abs/2402.03620)**

    *Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng.* Preprint'24

1. **[DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.](https://arxiv.org/abs/2402.03300)**

    *Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y.K. Li, Y. Wu, Daya Guo.* Preprint'24

1. **[K-Level Reasoning with Large Language Models.](https://arxiv.org/abs/2402.01521)**

    *Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Yan Xia, Man Lan, Furu Wei.* Preprint'24

1. **[Efficient Tool Use with Chain-of-Abstraction Reasoning.](https://arxiv.org/abs/2401.17464)**

    *Silin Gao, Jane Dwivedi-Yu, Ping Yu, Xiaoqing Ellen Tan, Ramakanth Pasunuru, Olga Golovneva, Koustuv Sinha, Asli Celikyilmaz, Antoine Bosselut, Tianlu Wang.* Preprint'24

1. **[Teaching Language Models to Self-Improve through Interactive Demonstrations.](https://arxiv.org/abs/2310.13522)**

    *Xiao Yu, Baolin Peng, Michel Galley, Jianfeng Gao, Zhou Yu.* NAACL'24
1. **[Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic.](https://arxiv.org/abs/2309.13339)** [[code](https://github.com/xf-zhao/LoT)]

    *Xufeng Zhao, Mengdi Li, Wenhao Lu, Cornelius Weber, Jae Hee Lee, Kun Chu, Stefan Wermter.* COLING'24

1. **[Chain-of-Verification Reduces Hallucination in Large Language Models.](https://arxiv.org/abs/2309.11495)**

    *Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston.* ACL'24 Findings

1. **[Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding.](https://arxiv.org/abs/2307.15337)**

    *Xuefei Ning, Zinan Lin, Zixuan Zhou, Huazhong Yang, Yu Wang.* ICLR'24

1. **[Question Decomposition Improves the Faithfulness of Model-Generated Reasoning.](https://arxiv.org/abs/2307.11768)** [[code](https://github.com/anthropics/DecompositionFaithfulnessPaper)]

    *Ansh Radhakrishnan, Karina Nguyen, Anna Chen, Carol Chen, Carson Denison, Danny Hernandez, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Sam McCandlish, Sheer El Showk, Tamera Lanham, Tim Maxwell, Venkatesa Chandrasekaran, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez.* Preprint'23

1. **[Let's Verify Step by Step.](https://arxiv.org/abs/2305.20050)**

    *Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe.* ICLR'24

1. **[REFINER: Reasoning Feedback on Intermediate Representations.](https://arxiv.org/abs/2304.01904)** [[project](https://debjitpaul.github.io/refiner/)] [[code](https://github.com/debjitpaul/refiner)]

    *Debjit Paul, Mete Ismayilzada, Maxime Peyrard, Beatriz Borges, Antoine Bosselut, Robert West, Boi Faltings.* EACL'24

1. **[Active Prompting with Chain-of-Thought for Large Language Models.](https://arxiv.org/abs/2302.12246)** [[code](https://github.com/shizhediao/active-cot)]

    *Shizhe Diao, Pengcheng Wang, Yong Lin, Tong Zhang.* ACL'24

1. **[Language Models as Inductive Reasoners.](https://arxiv.org/abs/2212.10923)**

    *Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, Furu Wei.* EACL'24

### 2023

1. **[Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning.](https://arxiv.org/abs/2312.08901)**

    *Xijie Huang, Li Lyna Zhang, Kwang-Ting Cheng, Mao Yang.* Preprint'23

1. **[Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning.](https://arxiv.org/abs/2305.12295)** [[code](https://github.com/teacherpeterpan/Logic-LLM)]

    *Liangming Pan, Alon Albalak, Xinyi Wang, William Yang Wang.* EMNLP'23 Findings

1. **[Recursion of Thought: A Divide and Conquer Approach to Multi-Context Reasoning with Language Models.](https://arxiv.org/abs/2306.06891)** [[code](https://github.com/soochan-lee/RoT)] [[poster](https://soochanlee.com/img/rot/rot_poster.pdf)]

    *Soochan Lee, Gunhee Kim.* ACL'23 Findings
1. **[Reasoning with Language Model is Planning with World Model.](https://arxiv.org/abs/2305.14992)**

    *Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, Zhiting Hu.* EMNLP'23

1. **[Reasoning Implicit Sentiment with Chain-of-Thought Prompting.](https://arxiv.org/abs/2305.11255)** [[code](https://github.com/scofield7419/THOR-ISA)]

    *Hao Fei, Bobo Li, Qian Liu, Lidong Bing, Fei Li, Tat-Seng Chua.* ACL'23

1. **[Tree of Thoughts: Deliberate Problem Solving with Large Language Models.](https://arxiv.org/abs/2305.10601)** [[code](https://github.com/ysymyth/tree-of-thought-llm)]

    *Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan.* NeurIPS'23

1. **[SatLM: Satisfiability-Aided Language Models Using Declarative Prompting.](https://arxiv.org/abs/2305.09656)** [[code](https://github.com/xiye17/sat-lm)]

    *Xi Ye, Qiaochu Chen, Isil Dillig, Greg Durrett.* NeurIPS'23

1. **[ART: Automatic multi-step reasoning and tool-use for large language models.](https://arxiv.org/abs/2303.09014)**

    *Bhargavi Paranjape, Scott Lundberg, Sameer Singh, Hannaneh Hajishirzi, Luke Zettlemoyer, Marco Tulio Ribeiro.* Preprint'23

1. **[Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data.](https://arxiv.org/abs/2302.12822)** [[code](https://github.com/shizhediao/automate-cot)]

    *KaShun Shum, Shizhe Diao, Tong Zhang.* EMNLP'23 Findings

1. **[Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models.](https://arxiv.org/abs/2302.00618)**

    *Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen.* ICML'23

1. **[Faithful Chain-of-Thought Reasoning.](https://arxiv.org/abs/2301.13379)**

    *Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki, Chris Callison-Burch.* IJCNLP-AACL'23

1. **[Rethinking with Retrieval: Faithful Large Language Model Inference.](https://arxiv.org/abs/2301.00303)**

    *Hangfeng He, Hongming Zhang, Dan Roth.* Preprint'23

1. **[LAMBADA: Backward Chaining for Automated Reasoning in Natural Language.](https://arxiv.org/abs/2212.13894)**

    *Seyed Mehran Kazemi, Najoung Kim, Deepti Bhatia, Xin Xu, Deepak Ramachandran.* ACL'23

1. **[Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions.](https://arxiv.org/abs/2212.10509)** [[code](https://github.com/StonyBrookNLP/ircot)]

    *Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal.* ACL'23

1. **[Large Language Models are Reasoners with Self-Verification.](https://arxiv.org/abs/2212.09561)** [[code](https://github.com/WENGSYX/Self-Verification)]

    *Yixuan Weng, Minjun Zhu, Shizhu He, Kang Liu, Jun Zhao.* EMNLP'23 Findings

1. **[Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model.](https://arxiv.org/abs/2212.09146)** [[code](https://github.com/McGill-NLP/retriever-lm-reasoning)]

    *Parishad BehnamGhader, Santiago Miret, Siva Reddy.* EMNLP'23 Findings
1. **[Complementary Explanations for Effective In-Context Learning.](https://arxiv.org/abs/2211.13892)**

    *Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, Ves Stoyanov, Greg Durrett, Ramakanth Pasunuru.* ACL'23 Findings

1. **[Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks.](https://arxiv.org/abs/2211.12588)** [[code](https://github.com/wenhuchen/program-of-thoughts)]

    *Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen.* TMLR'23

1. **[Unsupervised Explanation Generation via Correct Instantiations.](https://arxiv.org/abs/2211.11160)**

    *Sijie Cheng, Zhiyong Wu, Jiangjie Chen, Zhixing Li, Yang Liu, Lingpeng Kong.* AAAI'23

1. **[PAL: Program-aided Language Models.](https://arxiv.org/abs/2211.10435)** [[project](https://reasonwithpal.com/)] [[code](https://github.com/reasoning-machines/pal)]

    *Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig.* ICML'23

1. **[Solving Math Word Problems via Cooperative Reasoning induced Language Models.](https://arxiv.org/abs/2210.16257)** [[code](https://github.com/TianHongZXY/CoRe)]

    *Xinyu Zhu, Junjie Wang, Lin Zhang, Yuxiang Zhang, Ruyi Gan, Jiaxing Zhang, Yujiu Yang.* ACL'23

1. **[Large Language Models Can Self-Improve.](https://arxiv.org/abs/2210.11610)**

    *Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han.* EMNLP'23

1. **[Mind's Eye: Grounded language model reasoning through simulation.](https://arxiv.org/abs/2210.05359)**

    *Ruibo Liu, Jason Wei, Shixiang Shane Gu, Te-Yen Wu, Soroush Vosoughi, Claire Cui, Denny Zhou, Andrew M. Dai.* ICLR'23

1. **[Automatic Chain of Thought Prompting in Large Language Models.](https://arxiv.org/abs/2210.03493)** [[code](https://github.com/amazon-research/auto-cot)]

    *Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola.* ICLR'23

1. **[Language Models are Multilingual Chain-of-Thought Reasoners.](https://arxiv.org/abs/2210.03057)**

    *Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei.* ICLR'23

1. **[Ask Me Anything: A simple strategy for prompting language models.](https://arxiv.org/abs/2210.02441)** [[code](https://github.com/hazyresearch/ama_prompting)]

    *Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré.* ICLR'23

1. **[Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning.](https://arxiv.org/abs/2209.14610)** [[project](https://promptpg.github.io/)] [[code](https://github.com/lupantech/PromptPG)]

    *Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, Ashwin Kalyan.* ICLR'23

1. **[Making Large Language Models Better Reasoners with Step-Aware Verifier.](https://arxiv.org/abs/2206.02336)**

    *Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, Weizhu Chen.* ACL'23
1. **[Least-to-most prompting enables complex reasoning in large language models.](https://arxiv.org/abs/2205.10625)**

    *Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi.* ICLR'23

1. **[Self-consistency improves chain of thought reasoning in language models.](https://arxiv.org/abs/2203.11171)** *(sketched in code at the end of this section)*

    *Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou.* ICLR'23

### 2022

1. **[Retrieval Augmentation for Commonsense Reasoning: A Unified Approach.](https://arxiv.org/abs/2210.12887)** [[code](https://github.com/wyu97/RACo)]

    *Wenhao Yu, Chenguang Zhu, Zhihan Zhang, Shuohang Wang, Zhuosheng Zhang, Yuwei Fang, Meng Jiang.* EMNLP'22

1. **[Language Models of Code are Few-Shot Commonsense Learners.](https://arxiv.org/abs/2210.07128)** [[code](https://github.com/madaan/cocogen)]

    *Aman Madaan, Shuyan Zhou, Uri Alon, Yiming Yang, Graham Neubig.* EMNLP'22

1. **[Solving Quantitative Reasoning Problems with Language Models.](https://arxiv.org/abs/2206.14858)** [[blog](https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html)]

    *Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra.* NeurIPS'22

1. **[Large Language Models Still Can't Plan.](https://arxiv.org/abs/2206.10498)** [[code](https://github.com/karthikv792/gpt-plan-benchmark)]

    *Karthik Valmeekam, Alberto Olmo, Sarath Sreedharan, Subbarao Kambhampati.* NeurIPS'22

1. **[Large Language Models are Zero-Shot Reasoners.](https://arxiv.org/abs/2205.11916)**

    *Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa.* NeurIPS'22

1. **[Iteratively Prompt Pre-trained Language Models for Chain of Thought.](https://arxiv.org/abs/2203.08383)** [[code](https://github.com/sunlab-osu/iterprompt)]

    *Boshi Wang, Xiang Deng, Huan Sun.* EMNLP'22

1. **[Chain of Thought Prompting Elicits Reasoning in Large Language Models.](https://arxiv.org/abs/2201.11903)** [[blog](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html)]

    *Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou.* NeurIPS'22
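As flagged above, here is a minimal illustration of the self-consistency idea from Wang et al. (ICLR'23): sample several reasoning chains at non-zero temperature and majority-vote over the final answers they reach. `sample_chain` and `extract_answer` are hypothetical helpers, standing in for a temperature > 0 LLM call and an answer parser; this is a sketch of the technique, not a reference implementation.

```python
# Minimal sketch of self-consistency decoding (Wang et al., ICLR'23):
# sample several CoT chains, then take the most frequent final answer,
# marginalizing out the reasoning paths. The helpers below are
# hypothetical stand-ins, not part of any particular library.
from collections import Counter

def sample_chain(question: str) -> str:
    raise NotImplementedError("sample one CoT completion at temperature > 0")

def extract_answer(chain: str) -> str:
    # For example, take whatever follows the last "answer is" marker.
    return chain.rsplit("answer is", 1)[-1].strip(" .\n")

def self_consistency(question: str, n_samples: int = 10) -> str:
    answers = [extract_answer(sample_chain(question)) for _ in range(n_samples)]
    # The most frequent final answer wins the vote.
    return Counter(answers).most_common(1)[0][0]
```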
<p align="right" style="font-size: 14px; color: #555; margin-top: 20px;">
    <a href="#readme-top" style="text-decoration: none; color: #007bff; font-weight: bold;">
        ↑ Back to Top ↑
    </a>
</p>

<h3 id="mllm">🧠 Multimodal Reasoning in Large Language Models</h3>

### 2025

1. **[Introducing Visual Perception Token into Multimodal Large Language Model.](https://arxiv.org/abs/2502.17425)** [[code](https://github.com/yu-rp/VisualPerceptionToken)] [[model](https://huggingface.co/collections/rp-yu/vpt-models-67b6afdc8679a05a2876f07a)] [[dataset](https://huggingface.co/datasets/rp-yu/VPT_Datasets)]

    *Runpeng Yu, Xinyin Ma, Xinchao Wang.* Preprint'25

1. **[LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs.](https://arxiv.org/abs/2501.06186)** [[project](https://mbzuai-oryx.github.io/LlamaV-o1/)] [[code](https://github.com/mbzuai-oryx/LlamaV-o1)] [[model](https://huggingface.co/omkarthawakar/LlamaV-o1)]

    *Omkar Thawakar, Dinura Dissanayake, Ketan More, Ritesh Thawkar, Ahmed Heakl, Noor Ahsan, Yuhao Li, Mohammed Zumri, Jean Lahoud, Rao Muhammad Anwer, Hisham Cholakkal, Ivan Laptev, Mubarak Shah, Fahad Shahbaz Khan, Salman Khan.* Preprint'25

1. **[Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks.](https://arxiv.org/abs/2503.21696)** [[project](https://embodied-reasoner.github.io/)] [[code](https://github.com/zwq2018/embodied_reasoner)] [[dataset](https://huggingface.co/datasets/zwq2018/embodied_reasoner)]

    *Wenqi Zhang, Mengna Wang, Gangao Liu, Xu Huixin, Yiwei Jiang, Yongliang Shen, Guiyang Hou, Zhe Zheng, Hang Zhang, Xin Li, Weiming Lu, Peng Li, Yueting Zhuang.* Preprint'25

### 2024

1. **[Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models.](https://arxiv.org/abs/2411.14432)** [[code](https://github.com/dongyh20/Insight-V)] [[model](https://huggingface.co/collections/THUdyh/insight-v-673f5e1dd8ab5f2d8d332035)]

    *Yuhao Dong, Zuyan Liu, Hai-Long Sun, Jingkang Yang, Winston Hu, Yongming Rao, Ziwei Liu.* Preprint'24

1. **[LLaVA-CoT: Let Vision Language Models Reason Step-by-Step.](https://arxiv.org/abs/2411.10440)** [[code](https://github.com/PKU-YuanGroup/LLaVA-CoT)] [[model](https://huggingface.co/Xkev/Llama-3.2V-11B-cot)]

    *Guowei Xu, Peng Jin, Hao Li, Yibing Song, Lichao Sun, Li Yuan.* Preprint'24

1. **[Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models.](https://arxiv.org/abs/2406.09403)** [[project](https://visualsketchpad.github.io/)] [[code](https://github.com/Yushi-Hu/VisualSketchpad)]

    *Yushi Hu, Weijia Shi, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith, Ranjay Krishna.* Preprint'24

1. **[Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs.](https://arxiv.org/abs/2403.12596)**

    *Victor Carbune, Hassan Mansoor, Fangyu Liu, Rahul Aralikatte, Gilles Baechler, Jindong Chen, Abhanshu Sharma.* NAACL'24 Findings

1. **[SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities.](https://arxiv.org/abs/2401.12168)** [[project](https://spatial-vlm.github.io/)]

    *Boyuan Chen, Zhuo Xu, Sean Kirmani, Brian Ichter, Danny Driess, Pete Florence, Dorsa Sadigh, Leonidas Guibas, Fei Xia.* CVPR'24
1. **[Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding.](https://arxiv.org/abs/2401.04398)**

    *Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister.* ICLR'24

1. **[Link-Context Learning for Multimodal LLMs.](https://arxiv.org/abs/2308.07891)** [[code](https://github.com/isekai-portal/Link-Context-Learning)]

    *Yan Tai, Weichen Fan, Zhao Zhang, Feng Zhu, Rui Zhao, Ziwei Liu.* CVPR'24

### 2023

1. **[Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models.](https://arxiv.org/abs/2312.17661)**

    *Yuqing Wang, Yun Zhao.* Preprint'23

1. **[G-LLaVA: Solving Geometric Problems with Multi-Modal Large Language Model.](https://arxiv.org/abs/2312.11370)**

    *Jiahui Gao, Renjie Pi, Jipeng Zhang, Jiacheng Ye, Wanjun Zhong, Yufei Wang, Lanqing Hong, Jianhua Han, Hang Xu, Zhenguo Li, Lingpeng Kong.* Preprint'23

1. **[Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models.](https://arxiv.org/abs/2304.09842)** [[project](https://chameleon-llm.github.io/)] [[code](https://github.com/lupantech/chameleon-llm)]

    *Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Jianfeng Gao.* NeurIPS'23

1. **[MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action.](https://arxiv.org/abs/2303.11381)** [[project](https://multimodal-react.github.io/)] [[code](https://github.com/microsoft/MM-REACT)] [[demo](https://huggingface.co/spaces/microsoft-cognitive-service/mm-react)]

    *Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang.* Preprint'23

1. **[ViperGPT: Visual Inference via Python Execution for Reasoning.](https://arxiv.org/abs/2303.08128)** [[project](https://viper.cs.columbia.edu/)] [[code](https://github.com/cvlab-columbia/viper)]

    *Dídac Surís, Sachit Menon, Carl Vondrick.* ICCV'23

1. **[Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models.](https://arxiv.org/abs/2303.04671)** [[code](https://github.com/microsoft/visual-chatgpt)]

    *Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan.* Preprint'23

1. **[Multimodal Chain-of-Thought Reasoning in Language Models.](https://arxiv.org/abs/2302.00923)** [[code](https://github.com/amazon-science/mm-cot)]

    *Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola.* Preprint'23

1. **[Visual Programming: Compositional Visual Reasoning without Training.](https://arxiv.org/abs/2211.11559)** [[project](https://prior.allenai.org/projects/visprog)] [[code](https://github.com/allenai/visprog)]

    *Tanmay Gupta, Aniruddha Kembhavi.* CVPR'23
1. **[Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language.](https://arxiv.org/abs/2204.00598)** [[project](https://socraticmodels.github.io/)] [[code](https://github.com/google-research/google-research/tree/master/socraticmodels)]

    *Andy Zeng, Maria Attarian, Brian Ichter, Krzysztof Choromanski, Adrian Wong, Stefan Welker, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, Pete Florence.* ICLR'23

<p align="right" style="font-size: 14px; color: #555; margin-top: 20px;">
    <a href="#readme-top" style="text-decoration: none; color: #007bff; font-weight: bold;">
        ↑ Back to Top ↑
    </a>
</p>

<h3 id="lm">🤏 Scaling Smaller Language Models to Reason</h3>

### 2025

1. **[Learning to Reason from Feedback at Test-Time.](https://arxiv.org/abs/2502.15771)** [[code](https://github.com/LaVi-Lab/FTTT)]

    *Yanyang Li, Michael Lyu, Liwei Wang.* Preprint'25

1. **[S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning.](https://arxiv.org/abs/2502.12853)** [[code](https://github.com/NineAbyss/S2R)]

    *Ruotian Ma, Peisong Wang, Cheng Liu, Xingyan Liu, Jiaqi Chen, Bang Zhang, Xin Zhou, Nan Du, Jia Li.* Preprint'25

1. **[rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking.](https://arxiv.org/abs/2501.04519)** [[code](https://github.com/microsoft/rStar)]

    *Xinyu Guan, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, Mao Yang.* Preprint'25

### 2024

1. **[MathScale: Scaling Instruction Tuning for Mathematical Reasoning.](https://arxiv.org/abs/2403.02884)**

    *Zhengyang Tang, Xingxing Zhang, Benyou Wang, Furu Wei.* Preprint'24

### 2023

1. **[Learning Deductive Reasoning from Synthetic Corpus based on Formal Logic.](https://arxiv.org/abs/2308.07336)** [[code](https://github.com/hitachi-nlp/FLD)]

    *Terufumi Morishita, Gaku Morio, Atsuki Yamaguchi, Yasuhiro Sogawa.* ICML'23

1. **[Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step.](https://arxiv.org/abs/2306.14050)** [[code](https://github.com/allenai/cot_distillation)]

    *Liunian Harold Li, Jack Hessel, Youngjae Yu, Xiang Ren, Kai-Wei Chang, Yejin Choi.* ACL'23

1. **[Specializing Smaller Language Models towards Multi-Step Reasoning.](https://arxiv.org/abs/2301.12726)**

    *Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot.* ICML'23

1. **[Large Language Models Are Reasoning Teachers.](https://arxiv.org/abs/2212.10071)** [[code](https://github.com/itsnamgyu/reasoning-teacher)]

    *Namgyu Ho, Laura Schmid, Se-Young Yun.* ACL'23

1. **[Teaching Small Language Models to Reason.](https://arxiv.org/abs/2212.08410)**

    *Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn.* ACL'23 Short
1. **[Distilling Multi-Step Reasoning Capabilities of Large Language Models into Smaller Models via Semantic Decompositions.](https://arxiv.org/abs/2212.00193)**

    *Kumar Shridhar, Alessandro Stolfo, Mrinmaya Sachan.* ACL'23 Findings

### 2022

1. **[Scaling Instruction-Finetuned Language Models.](https://arxiv.org/abs/2210.11416)**

    *Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei.* JMLR'22

<p align="right" style="font-size: 14px; color: #555; margin-top: 20px;">
    <a href="#readme-top" style="text-decoration: none; color: #007bff; font-weight: bold;">
        ↑ Back to Top ↑
    </a>
</p>

## Other Useful Resources

- **[LLM Reasoners](https://github.com/Ber666/llm-reasoners)**  A library for advanced large language model reasoning.
- **[Chain-of-Thought Hub](https://github.com/FranxYao/chain-of-thought-hub)**  Benchmarking LLM reasoning performance with chain-of-thought prompting.
- **[ThoughtSource](https://github.com/OpenBioLink/ThoughtSource)**  Central and open resource for data and tools related to chain-of-thought reasoning in large language models.
- **[AgentChain](https://github.com/jina-ai/agentchain)**  Chain together LLMs for reasoning & orchestrate multiple large models for accomplishing complex tasks.
- **[google/Cascades](https://github.com/google-research/cascades)**  Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference, and more.
- **[LogiTorch](https://github.com/LogiTorch/logitorch)**  PyTorch-based library for logical reasoning on natural language.
- **[salesforce/LAVIS](https://github.com/salesforce/LAVIS)**  One-stop Library for Language-Vision Intelligence.
- **[facebookresearch/RAM](https://github.com/facebookresearch/RAM)**  A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).

<p align="right" style="font-size: 14px; color: #555; margin-top: 20px;">
    <a href="#readme-top" style="text-decoration: none; color: #007bff; font-weight: bold;">
        ↑ Back to Top ↑
    </a>
</p>

## Other Awesome Lists

- **[Awesome-Controllable-Generation](https://github.com/atfortes/Awesome-Controllable-Generation)**  Collection of papers and resources on Controllable Generation using Diffusion Models.
- **[Chain-of-ThoughtsPapers](https://github.com/Timothyxxx/Chain-of-ThoughtsPapers)**  A trend that starts from "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models".
- **[LM-reasoning](https://github.com/jeffhj/LM-reasoning)**  Collection of papers and resources on Reasoning in Large Language Models.
- **[Prompt4ReasoningPapers](https://github.com/zjunlp/Prompt4ReasoningPapers)**  Repository for the paper "Reasoning with Language Model Prompting: A Survey".
**[ReasoningNLP](https:\u002F\u002Fgithub.com\u002FFreedomIntelligence\u002FReasoningNLP)**  Paper list on reasoning in NLP.\n- **[Awesome-LLM](https:\u002F\u002Fgithub.com\u002FHannibal046\u002FAwesome-LLM)**  Curated list of Large Language Models.\n- **[Awesome LLM Self-Consistency](https:\u002F\u002Fgithub.com\u002FSuperBruceJia\u002FAwesome-LLM-Self-Consistency)**  Curated list of Self-consistency in Large Language Models.\n- **[Deep-Reasoning-Papers](https:\u002F\u002Fgithub.com\u002Ffloodsung\u002FDeep-Reasoning-Papers)**  Recent Papers including Neural-Symbolic Reasoning, Logical Reasoning, and Visual Reasoning.\n\n\u003Cp align=\"right\" style=\"font-size: 14px; color: #555; margin-top: 20px;\">\n    \u003Ca href=\"#readme-top\" style=\"text-decoration: none; color: #007bff; font-weight: bold;\">\n        ↑ Back to Top ↑\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\n## Contributing\n\n- When adding a new paper or updating an existing one, consider which category the work belongs to.\n- Use the same format as existing entries to describe the work.\n- Add the abstract link of the paper (`\u002Fabs\u002F` format if it is an arXiv publication).\n\n**Don't worry if you do something wrong; it will be fixed for you!**\n\n### Contributors\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fatfortes\u002FAwesome-LLM-Reasoning\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fatfortes_Awesome-LLM-Reasoning_readme_9621d72d9c6c.png\" \u002F>\n\u003C\u002Fa>\n","\u003Ca name=\"readme-top\">\u003C\u002Fa>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fatfortes\u002FAwesome-LLM-Reasoning\u002Fstargazers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fatfortes\u002FAwesome-LLM-Reasoning?style=for-the-badge\" alt=\"星标数\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fatfortes\u002FAwesome-LLM-Reasoning\u002Fnetwork\u002Fmembers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fatfortes\u002FAwesome-LLM-Reasoning?style=for-the-badge\" alt=\"复刻数\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fatfortes\u002FAwesome-LLM-Reasoning\u002Fgraphs\u002Fcontributors\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcontributors\u002Fatfortes\u002FAwesome-LLM-Reasoning?style=for-the-badge\" alt=\"贡献者\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fatfortes\u002FAwesome-LLM-Reasoning\u002Fblob\u002Fmain\u002FLICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fatfortes\u002FAwesome-LLM-Reasoning?style=for-the-badge\" alt=\"MIT 许可证\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"assets\u002Fcot.svg\" width=\"90%\" style=\"align:center;\"\u002F>\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">Awesome LLM Reasoning\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n    \u003Cb> 一份精心整理的论文与资源合集，聚焦于如何激发大型语言模型及多模态语言模型的推理能力。\u003C\u002Fb>\n\u003C\u002Fp>\n\n\u003Cdetails>\n  \u003Csummary>🗂️ 目录\u003C\u002Fsummary>\n  \u003Col>\n    \u003Cli>\u003Ca href=\"#survey\">综述\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#analysis\">分析\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#ltechnique\">技术\u003C\u002Fa>\n      \u003Cul>\n        \u003Cli>\u003Ca href=\"#llm\">🔤 大型语言模型中的推理——一种涌现能力\u003C\u002Fa>\u003C\u002Fli>\n        \u003Cli>\u003Ca href=\"#mllm\">🧠 大型语言模型中的多模态推理\u003C\u002Fa>\u003C\u002Fli>\n        
\u003Cli>\u003Ca href=\"#lm\">🤏 将小型语言模型扩展至具备推理能力\u003C\u002Fa>\u003C\u002Fli>\n      \u003C\u002Ful>\n    \u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#other-useful-resources\">其他实用资源\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#other-awesome-lists\">其他优秀列表\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#contributing\">贡献\u003C\u002Fa>\u003C\u002Fli>\n  \u003C\u002Fol>\n\u003C\u002Fdetails>\n\n如果你想测试大型语言模型的符号推理能力，可以查看：\u003Cb>\u003Ca href=https:\u002F\u002Fgithub.com\u002Fatfortes\u002FLLMSymbolicReasoningBench>LLMSymbolicReasoningBench\u003C\u002Fa>\u003C\u002Fb> 😄\n\n\n\n## 综述\n\n### 2025年\n\n1. **[多模态思维链推理：全面综述。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.12605)** [[代码](https:\u002F\u002Fgithub.com\u002Fyaotingwangofficial\u002FAwesome-MCoT)]\n\n    *王耀庭、吴圣琼、张跃成、威廉·王、刘子威、罗杰波、费浩。* 预印本'25\n\n2. **[大型语言模型基准测试在应对数据污染方面的最新进展：从静态评估到动态评估。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.17521)** [[代码](https:\u002F\u002Fgithub.com\u002FSeekingDream\u002FStatic-to-Dynamic-LLMEval)]\n\n    *陈思敏、陈一鸣、李泽鑫、蒋义凡、万中伟、何怡欣、冉德志、顾天乐、李海舟、谢涛、雷百石。* 预印本'25\n\n### 2024年\n\n1. **[大型语言模型注意力头综述。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.03752)** [[代码](https:\u002F\u002Fgithub.com\u002FIAAR-Shanghai\u002FAwesome-Attention-Heads)]\n\n    *郑子凡、王叶昭辉、黄宇欣、宋世超、唐博、熊飞宇、李志宇。* 预印本'24\n\n1. **[大型语言模型中的内部一致性与自我反馈：综述。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.14507)** [[代码](https:\u002F\u002Fgithub.com\u002FIAAR-Shanghai\u002FICSFSurvey)]\n\n    *梁勋、宋世超、郑子凡、王涵宇、于青晨、李寻凯、李荣华、熊飞宇、李志宇。* 预印本'24\n\n1. **[利用大型语言模型推理解决谜题：综述。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.11291)** [[代码](https:\u002F\u002Fpuzzlellms.github.io\u002F)]\n\n    *帕纳约蒂斯·贾迪基亚罗格鲁、玛丽亚·林佩赖欧、乔治斯·菲兰德里亚诺斯、乔治斯·斯塔穆。* 预印本'24\n\n1. **[用于数学推理的大型语言模型：进展与挑战。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.00157)** \n\n    *珍妮丝·安、里舒·维尔马、伦泽·楼、李迪、张睿、尹文鹏。* ACL'24\n\n### 2022年\n\n1. **[迈向大型语言模型中的推理：综述。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10403)** [[代码](https:\u002F\u002Fgithub.com\u002Fjeffhj\u002FLM-reasoning)]\n\n    *黄杰、陈传昌凯文。* ACL'23 研究成果\n\n1. **[通过语言模型提示进行推理：综述。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.09597)** [[代码](https:\u002F\u002Fgithub.com\u002Fzjunlp\u002FPrompt4ReasoningPapers)]\n\n    *乔硕飞、欧义新、张宁宇、陈翔、姚云芝、邓淑敏、谭川奇、黄飞、陈华军。* ACL'23\n\n\u003Cp align=\"right\" style=\"font-size: 14px; color: #555; margin-top: 20px;\">\n    \u003Ca href=\"#readme-top\" style=\"text-decoration: none; color: #007bff; font-weight: bold;\">\n        ↑ 返回顶部 ↑\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\n## 分析\n\n### 2025年\n\n1. **[基于大型推理模型的现代机器翻译新趋势。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.10351)**\n\n    *刘思诺、吕晨阳、吴明浩、王隆悦、罗卫华、张凯富、尚子福。* 预印本'25\n\n### 2024\n\n1. **[你的大语言模型具备稳定的推理能力吗？](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.13147)** [[代码](https:\u002F\u002Fgithub.com\u002Fopen-compass\u002FGPassK)]\n\n    *刘俊楠、刘洪伟、肖林晨、王子怡、刘奎坤、高松阳、张文伟、张松阳、陈凯。* 预印本'24\n\n1. **[从Medprompt到o1：医学挑战性问题及更广泛领域的运行时策略探索。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.03590)**\n\n    *哈沙·诺里、宇山直人、尼古拉斯·金、斯科特·梅耶·麦金尼、泽维尔·费尔南德斯、张升、埃里克·霍维茨。* 预印本'24\n\n1. **[采用思维链还是不采用？思维链主要有助于数学和符号推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.12183)**\n\n    *扎因·斯普拉格、尹方聪、胡安·迭戈·罗德里格斯、蒋东伟、玛尼亚·瓦德瓦、普拉桑·辛哈尔、赵欣宇、叶曦、凯尔·马霍瓦尔德、格雷格·杜雷特。* 预印本'24\n\n1. **[大语言模型能否生成新颖的研究思路？一项由100多位自然语言处理研究人员参与的大规模人类研究。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.04109)**\n\n    *司成磊、杨迪毅、桥本达则。* 预印本'24\n\n1. 
**[窥探标记偏置：大型语言模型尚非真正的推理者。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.11050)** [[代码](https:\u002F\u002Fgithub.com\u002Fbowen-upenn\u002Fllm_token_bias)]\n\n    *江博文、谢阳心宇、郝卓群、王晓萌、马利克·坦维、苏伟杰、泰勒·卡米洛、罗斯·丹。* EMNLP'24\n\n1. **[迭代之首：对思维链的机制性研究](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.02128)**\n\n    *维维安·卡巴内斯、查尔斯·阿尔纳尔、瓦西姆·布阿齐兹、爱丽丝·杨、弗朗索瓦·夏尔通、朱莉娅·肯佩。* NeurIPS'24\n\n1. **[大型语言模型是否潜在地执行多跳推理？](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.16837)**\n\n    *杨素熙、格里博夫斯卡娅·埃琳娜、卡斯纳·诺拉、盖瓦·莫尔、里德尔·塞巴斯蒂安。* ACL'24\n\n1. **[在使用大型语言模型进行推理时，前提顺序至关重要。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.08939)**\n\n    *陈鑫云、奇·瑞安、王雪芝、周登尼。* ICML'24\n\n1. **[推理步骤长度对大型语言模型的影响。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.04925)**\n\n    *金明宇、于钦凯、舒东、赵海燕、华文悦、孟艳达、张永峰、杜梦楠。* ACL'24 Findings\n\n1. **[大型语言模型目前仍无法自我纠正推理错误。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.01798)**\n\n    *黄杰、陈鑫云、米什拉·斯瓦鲁普、郑怀秀·史蒂文、余·亚当斯·魏、宋·新莹、周·登尼。* ICLR'24\n\n1. **[在训练的哪个阶段，代码数据有助于提升大语言模型的推理能力？](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.16298)**\n\n    *马英伟、刘岳、于岳、张元亮、姜宇、王昌健、李珊珊。* ICLR'24\n\n### 2023\n\n1. **[衡量思维链推理中的忠实性。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.13702)**\n\n    *塔梅拉·兰厄姆、安娜·陈、安什·拉达克里希南、贝努瓦·施泰纳、卡森·丹尼森、丹尼·埃尔南德斯、达斯汀·李、埃辛·杜尔穆斯、埃文·休宾格、杰克逊·科尔尼恩、卡米莱·卢科修特、卡丽娜·阮、牛顿·程、尼古拉斯·约瑟夫、尼古拉斯·希弗、奥利弗·劳施、罗宾·拉尔森、萨姆·麦坎德利什、桑迪潘·昆杜、萨乌拉夫·卡达瓦特、香农·杨、托马斯·赫尼根、蒂莫西·麦克斯韦尔、蒂莫西·特利恩-劳顿、特里斯坦·休姆、扎克·哈特菲尔德-多兹、贾里德·卡普兰、扬·布劳纳、塞缪尔·R·鲍曼、伊森·佩雷斯。* 预印本'23\n\n1. **[信仰与命运：Transformer模型在组合性方面的局限性。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.18654)**\n\n    *努哈·德齐里、陆锡铭、梅拉妮·斯克拉尔、李向洛林、蒋立伟、林·武义臣、彼得·韦斯特、查德拉·巴加瓦图拉、罗南·勒·布拉斯、黄珍娜·D、桑雅尔·苏米娅、威尔克·肖恩、任向、艾莉森·埃廷格、哈查乌伊·扎伊德、崔艺珍。* NeurIPS'23\n\n1. **[语言模型并不总是说出它们所想的：思维链提示中的不忠实解释。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.04388)** [[代码](https:\u002F\u002Fgithub.com\u002Fmilesaturpin\u002Fcot-unfaithfulness)]\n\n    *迈尔斯·特平、朱利安·迈克尔、伊森·佩雷斯、塞缪尔·R·鲍曼。* NeurIPS'23\n\n1. **[对ChatGPT在推理、幻觉和交互性方面的多任务、多语言、多模态评估。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.04023)**\n\n    *邦艺珍、卡亚维贾亚·塞缪尔、李娜妍、戴文亮、苏丹、威利·布莱恩、洛维尼亚·霍利、季紫薇、于铁正、钟威利、杜·屈越、徐燕、冯·帕斯卡尔。* AACL'23\n\n1. **[大型语言模型很容易被无关上下文分散注意力。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.00093)**\n\n    *史芙蕾达、陈鑫云、米斯拉·卡尼什卡、斯凯尔斯·内森、多汉·大卫、奇·埃德、舍尔利·纳撒尼尔、周·登尼。* ICML'23\n\n1. **[再想想吧，我们还是不要一步一步地思考了！零样本推理中的偏见与毒性。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.08061)**\n\n    *奥马尔·谢赫、张宏鑫、威廉·赫尔德、伯恩斯坦·迈克尔、杨迪毅。* ACL'23\n\n1. **[理解思维链提示：一项关于关键因素的实证研究。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10001)** [[代码](https:\u002F\u002Fgithub.com\u002Fsunlab-osu\u002FUnderstanding-CoT)]\n\n    *王博思、闵世温、邓向、沈嘉明、吴友、泽特洛默·卢克、孙欢。* ACL'23\n\n1. **[BIG-Bench难题及其是否能通过思维链解决的问题。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.09261)** [[代码](https:\u002F\u002Fgithub.com\u002Fsuzgunmirac\u002FBIG-Bench-Hard)]\n\n    *苏兹贡·米拉克、斯凯尔斯·内森、舍尔利·纳撒尼尔、格尔曼·塞巴斯蒂安、泰·易、钟·弘源、乔德里·阿坎克莎、黎·国荣、奇·埃德、周·登尼、魏·杰森。* ACL'23 Findings\n\n### 2022\n\n1. **[大型语言模型的涌现能力。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.07682)** [[博客](https:\u002F\u002Fai.googleblog.com\u002F2022\u002F11\u002Fcharacterizing-emergent-phenomena-in.html)]\n\n    *魏·杰森、泰·易、博马萨尼·里希、拉法尔·科林、佐夫·巴雷特、博尔格奥德·塞巴斯蒂安、尤加塔马·丹尼、博斯马·马尔滕、周·登尼、梅茨勒·唐纳德、奇·埃德、桥本·达则、维尼亚尔斯·奥里奥尔、梁·珀西、迪恩·杰夫、费杜斯·威廉。* TMLR'22\n\n1. 
**[语言模型能否从上下文中的解释中学习？](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.02329)**\n\n    *兰皮宁·安德鲁·K、达斯古普塔·伊希塔、陈·斯蒂芬妮·C·Y、马修森·科里、特斯勒·迈克尔·亨利、克雷斯韦尔·安东尼娅、麦克莱兰德·詹姆斯·L、王·简·X、希尔·菲利克斯。* EMNLP'22\n\n\u003Cp align=\"right\" style=\"font-size: 14px; color: #555; margin-top: 20px;\">\n    \u003Ca href=\"#readme-top\" style=\"text-decoration: none; color: #007bff; font-weight: bold;\">\n        ↑ 返回顶部 ↑\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\n\u003Ch2 id=\"ltechnique\">技术\u003C\u002Fh2>\n\n\n\n\u003Ch3 id=\"llm\">🔤 大型语言模型中的推理——\u003Ci>一种涌现能力\u003C\u002Fi>\u003C\u002Fh3>\n\n### 2025\n\n1. **[JudgeLRM：大型推理模型作为裁判。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.00050)**\n\n    *陈诺、胡志远、邹清云、吴佳颖、王倩、布莱恩·胡伊、何炳生。* 预印本'25\n   \n1. **[数据污染下代码大语言模型推理能力的动态基准测试。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.04149)** [[代码](https:\u002F\u002Fcodekaleidoscope.github.io\u002Fdycodeeval.html)]\n\n    *陈思敏、普拉纳夫·普萨尔拉、贝莎基·雷。* ICML'25\n\n1. **[CRANE：受限LLM生成下的推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09061)**\n\n    *德邦舒·班纳吉、塔伦·苏雷什、舒巴姆·乌加雷、萨沙·米赛洛维奇、加甘迪普·辛格。* ICML'25\n\n1. **[思维草图：基于自适应认知启发式草图的高效LLM推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.05179)** [[代码](https:\u002F\u002Fwww.github.com\u002FSimonAytes\u002FSoT)]\n\n    *西蒙·A·艾特斯、白振宪、黄成柱。* 预印本'25\n\n1. **[数学推理的自我奖励修正。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.19613)**\n\n    *熊伟、张汉宁、叶晨露、陈立昌、蒋楠、张彤。* 预印本'25\n\n1. **[大型推理模型在竞技编程中的应用。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.06807)**\n\n    *OpenAI：艾哈迈德·埃尔-基什基、亚历山大·魏、安德烈·萨赖瓦、博里斯·米纳耶夫、丹尼尔·塞尔萨姆、大卫·多翰、弗朗西斯·宋、亨特·莱特曼、伊格纳西·克拉韦拉、雅库布·帕乔茨基、杰里·特沃雷克、洛伦茨·库恩、卢卡什·凯泽、马克·陈、马克斯·施瓦策、莫斯塔法·罗哈内贾德、纳特·麦卡利斯、o3贡献者、奥列格·穆尔克、瑞瑟姆·加格、瑞·舒、西蒙·西多尔、维尼特·科萨拉朱、周文达。* 预印本'25\n\n1. **[s1：简单的测试时缩放。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.19393)**\n\n    *尼克拉斯·门尼霍夫、杨子彤、史伟嘉、李香丽莎、李飞飞、汉娜内·哈吉希尔齐、卢克·泽特勒莫耶、珀西·梁、埃马纽埃尔·坎德斯、桥本达津纪。* 预印本'25\n\n1. **[DeepSeek-R1：通过强化学习激励LLM的推理能力。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.12948)** [[项目](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-R1)]\n\n    *郭达亚、杨德健、张浩伟、宋俊晓、张若宇、徐润欣、朱启豪、马世荣、王培义、毕晓、张孝康、于兴凯、吴宇、吴Z.F.、苟志斌、邵志宏、李卓书、高子怡等。* 预印本'25\n\n1. **[迈向LLM中的系统2推理：通过元思维链学习如何思考。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.04682)**\n\n    *黄薇儿、查理·斯奈尔、卡尼什克·甘地、阿隆·阿尔巴拉克、阿尼凯特·辛格、切斯·布拉格登、杜伊·冯、拉斐尔·拉法伊洛夫、内森·莱尔、达科塔·马汉、路易斯·卡斯特里卡托、扬-菲利普·弗兰肯、尼克·哈伯、切尔西·芬恩。* 预印本'25\n\n### 2024\n\n1. **[华佗GPT-o1，迈向LLM的医学复杂推理](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.18925)** [[代码](https:\u002F\u002Fgithub.com\u002FFreedomIntelligence\u002FHuatuoGPT-o1)]\n\n    *陈俊英、蔡振阳、季可、王锡东、刘万龙、王荣盛、侯建业、王本友。* 预印本'24\n\n\n1. **[PPM：用于基准测试代码生成模型的多样化编程问题自动生成](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3643780)** [[代码](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3643780)]\n    *陈思敏、冯晓宁、韩晓红、刘聪、杨伟* FSE'24\n\n1. **[DRT-o1：通过长思维链优化深度推理翻译。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.17498)** [[代码](https:\u002F\u002Fgithub.com\u002Fkrystalan\u002FDRT-o1)]\n\n    *王佳安、孟凡东、梁云龙、周杰。* 预印本'24\n\n1. **[MALT：通过多智能体LLM训练提升推理能力。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.01928)**\n\n    *苏米特·拉梅什·莫特瓦尼、钱德勒·史密斯、罗克蒂姆·焦提·达斯、马尔基安·雷布丘克、菲利普·H·S·托尔、伊万·拉普捷夫、法比奥·皮扎蒂、罗纳德·克拉克、克里斯蒂安·施罗德·德·维特。* 预印本'24\n\n1. **[SmartAgent：面向网络世界的具身化个性化代理的用户思维链。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.07472)**\n\n    *张佳琪、高晨、张丽媛、李勇、尹洪志。* 预印本'24\n\n1. **[Marco-o1：迈向开放性解决方案的开放推理模型。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.14405)** [[代码](https:\u002F\u002Fgithub.com\u002FAIDC-AI\u002FMarco-o1)] [[模型](https:\u002F\u002Fhuggingface.co\u002FAIDC-AI\u002FMarco-o1)]\n\n    *赵宇、尹慧峰、曾波、王浩、石天奇、吕晨阳、王龙跃、罗卫华、张凯富。* 预印本'24\n\n1. 
**[将自我修正嵌入大型语言模型，以增强数学推理能力。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.10735)**\n\n    *高阔峰、蔡焕秋、帅庆瑶、龚迪宏、李志峰。* 预印本'24\n\n1. **[面向LLM的深思熟虑推理：具有精确世界模型的结构感知规划。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.03136)** [[代码](https:\u002F\u002Fgithub.com\u002Fxiongsiheng\u002FSWAP)]\n\n    *熊思恒、阿里·帕亚尼、杨源、费拉马兹·费克里。* 预印本'24\n\n1. **[可解释的对比蒙特卡洛树搜索推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.01707)**\n\n    *高子添、牛博野、何旭正、许浩天、刘洪章、刘爱伟、胡旭明、温立杰。* 预印本'24\n\n1. **[通过强化学习训练语言模型进行自我修正。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.12917)**\n\n    *阿维拉尔·库马尔、文森特·庄、里沙布·阿加瓦尔、苏毅、JD·科-雷耶斯、阿维·辛格、凯特·鲍姆利、沙里克·伊克巴尔、科尔顿·比绍普、丽贝卡·罗洛夫斯、张蕾·M、凯·麦金尼、迪莎·施里瓦斯塔瓦、科斯敏·帕杜拉鲁、乔治·塔克、多伊娜·普雷库普、费里亚尔·贝赫巴哈尼、亚历山德拉·福斯特。* 预印本'24\n\n1. **[OpenAI o1。](https:\u002F\u002Fopenai.com\u002Findex\u002Flearning-to-reason-with-llms\u002F)**\n\n    *Open AI团队。* 技术报告'24\n\n1. **[Agent Q：面向自主AI代理的高级推理与学习。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.07199)**\n\n    *普拉纳夫·普塔、埃德蒙·米尔斯、纳曼·加尔格、苏米特·莫特瓦尼、切尔西·芬恩、迪万尚·加尔格、拉斐尔·拉法伊洛夫。* 预印本'24\n\n1. **[DotaMath：借助代码辅助和自我修正实现思维分解的数学推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.04078)** [[代码](https:\u002F\u002Fgithub.com\u002FChengpengLi1003\u002FDotaMath)]\n\n    *李承鹏、董冠廷、薛明峰、彭汝、王翔、刘大亨。* 预印本'24\n\n1. **[LLM-ARC：用自动推理批评家增强LLM。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.17663)**\n\n    *阿迪蒂亚·卡利亚努普尔、凯拉什·萨拉瓦纳库马尔、维克托·巴雷斯、珍妮弗·楚-卡罗尔、戴维·梅尔维尔、戴维·费鲁奇。* 预印本'24\n\n1. **[Q\*：通过深思熟虑规划提升LLM的多步推理能力。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.14283)**\n\n    *王超杰、邓燕辰、吕志义、颜水成、安博。* 预印本'24\n\n1. **[思维缓冲区：大型语言模型的思维增强型推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.04271)** [[代码](https:\u002F\u002Fgithub.com\u002FYangLing0818\u002Fbuffer-of-thought-llm)]\n\n    *杨凌、于兆臣、张天军、曹世义、徐敏凯、张文涛、约瑟夫·E·冈萨雷斯、崔彬。* 预印本'24\n\n1. **[迈向通过想象、搜索和批评实现LLM的自我改进。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.12253)**\n\n    *田烨、彭宝林、宋林峰、金立峰、于典、米海涛、于东。* 预印本'24\n\n1. **[自我博弈对抗语言游戏增强LLM推理能力。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.10642)**\n\n    *Pengyu Cheng, Tianhao Hu, Han Xu, Zhisong Zhang, Yong Dai, Lei Han, Nan Du.* 预印本'24\n\n1. **[超越准确率的数学推理评估。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.05692)**\n\n    *Shijie Xia, Xuefeng Li, Yixin Liu, Tongshuang Wu, Pengfei Liu.* 预印本'24\n\n1. **[利用偏好树提升LLM推理通用性。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.02078)**\n\n    *Lifan Yuan, Ganqu Cui, Hanbin Wang, Ning Ding, Xingyao Wang, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, Zhenghao Liu, Bowen Zhou, Hao Peng, Zhiyuan Liu, Maosong Sun.* 预印本'24\n\n1. **[LLM3：基于大语言模型的任务与运动规划，附运动失败推理功能。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.11552) [[代码](https:\u002F\u002Fgithub.com\u002FAssassinWS\u002FLLM-TAMP)]**\n\n    *Shu Wang, Muzhi Han, Ziyuan Jiao, Zeyu Zhang, Ying Nian Wu, Song-Chun Zhu, Hangxin Liu.* IROS'24\n\n1. **[Quiet-STaR：语言模型可自我训练，在发声前先思考。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.09629)**\n\n    *Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman.* 预印本'24\n\n1. **[GLoRe：何时、何地以及如何通过全局与局部优化提升LLM推理能力。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.10963)**\n\n    *Alex Havrilla, Sharath Raparthy, Christoforus Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Roberta Raileanu.* ICML'24\n\n1. **[无需提示的思维链式推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.10200)**\n\n    *Xuezhi Wang, Denny Zhou.* 预印本'24\n\n1. **[V-STaR：为自教型推理者训练验证器。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.06457)**\n\n    *Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville, Alessandro Sordoni, Rishabh Agarwal.* 预印本'24\n\n1. 
**[InternLM-Math：开放数学大语言模型迈向可验证推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.06332)**\n\n    *Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin.* 预印本'24\n\n1. **[Self-Discover：大语言模型自我构建推理结构。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.03620)**\n\n    *Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng.* 预印本'24\n\n1. **[DeepSeekMath：推动开放语言模型数学推理能力的极限。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.03300)**\n\n    *Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y.K. Li, Y. Wu, Daya Guo.* 预印本'24\n\n1. **[利用大语言模型进行K级推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.01521)**\n\n    *Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Yan Xia, Man Lan, Furu Wei.* 预印本'24\n\n1. **[通过抽象链式推理实现高效工具使用。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.17464)**\n\n    *Silin Gao, Jane Dwivedi-Yu, Ping Yu, Xiaoqing Ellen Tan, Ramakanth Pasunuru, Olga Golovneva, Koustuv Sinha, Asli Celikyilmaz, Antoine Bosselut, Tianlu Wang.* 预印本'24\n\n1. **[通过交互式演示教导语言模型自我改进。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.13522)**\n\n    *Xiao Yu, Baolin Peng, Michel Galley, Jianfeng Gao, Zhou Yu.* NAACL'24\n\n1. **[利用逻辑增强大语言模型零样本思维链式推理能力。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.13339) [[代码](https:\u002F\u002Fgithub.com\u002Fxf-zhao\u002FLoT)]**\n\n    *Xufeng Zhao, Mengdi Li, Wenhao Lu, Cornelius Weber, Jae Hee Lee, Kun Chu, Stefan Wermter.* COLING'24\n\n1. **[验证链减少大语言模型幻觉现象。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11495)**\n\n    *Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston.* ACL'24研究发现\n\n1. **[思维骨架：大语言模型可进行并行解码。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.15337)**\n\n    *Xuefei Ning, Zinan Lin, Zixuan Zhou, Huazhong Yang, Yu Wang.* ICLR'24\n\n1. **[问题分解提升模型生成推理的忠实性。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.11768) [[代码](https:\u002F\u002Fgithub.com\u002Fanthropics\u002FDecompositionFaithfulnessPaper)]**\n\n    *Ansh Radhakrishnan, Karina Nguyen, Anna Chen, Carol Chen, Carson Denison, Danny Hernandez, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Sam McCandlish, Sheer El Showk, Tamera Lanham, Tim Maxwell, Venkatesa Chandrasekaran, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez.* 预印本'23\n\n1. **[让我们逐步验证吧。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.20050)**\n\n    *Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe.* ICLR'24\n\n1. **[REFINER：基于中间表示的推理反馈。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.01904) [[项目](https:\u002F\u002Fdebjitpaul.github.io\u002Frefiner\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Fdebjitpaul\u002Frefiner)]**\n\n    *Debjit Paul, Mete Ismayilzada, Maxime Peyrard, Beatriz Borges, Antoine Bosselut, Robert West, Boi Faltings.* EACL'24\n\n1. **[针对大语言模型的主动思维链式提示。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.12246) [[代码](https:\u002F\u002Fgithub.com\u002Fshizhediao\u002Factive-cot)]**\n\n    *Shizhe Diao, Pengcheng Wang, Yong Lin, Tong Zhang.* ACL'24\n\n1. 
**[语言模型作为归纳推理者。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10923)**\n\n    *Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, Furu Wei.* EACL'24\n\n\n\n### 2023年\n\n1. **[提升LLM推理能力：利用强化上下文剪枝突破少样本学习极限。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.08901)**\n\n    *Xijie Huang, Li Lyna Zhang, Kwang-Ting Cheng, Mao Yang.* 预印本'23\n\n1. **[Logic-LM：用符号求解器赋能大语言模型，实现忠实的逻辑推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.12295) [[代码](https:\u002F\u002Fgithub.com\u002Fteacherpeterpan\u002FLogic-LLM)]**\n\n    *Liangming Pan, Alon Albalak, Xinyi Wang, William Yang Wang.* EMNLP'23研究发现\n\n1. **[思维递归：一种基于语言模型的多上下文推理分治方法。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.06891) [[代码](https:\u002F\u002Fgithub.com\u002Fsoochan-lee\u002FRoT)] [[海报](https:\u002F\u002Fsoochanlee.com\u002Fimg\u002Frot\u002Frot_poster.pdf)]**\n\n    *Soochan Lee, Gunhee Kim.* ACL'23研究发现\n\n1. **[利用语言模型进行推理即是在构建世界模型下的规划。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14992)**\n\n    *Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, Zhiting Hu.* EMNLP'23\n\n1. **[通过思维链式提示推理隐含情感。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.11255) [[代码](https:\u002F\u002Fgithub.com\u002Fscofield7419\u002FTHOR-ISA)]**\n\n    *Hao Fei, Bobo Li, Qian Liu, Lidong Bing, Fei Li, Tat-Seng Chua.* ACL'23\n\n1. **[思维树：基于大型语言模型的深思熟虑问题解决方法。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10601)** [[代码](https:\u002F\u002Fgithub.com\u002Fysymyth\u002Ftree-of-thought-llm)]\n\n    *姚顺宇、于典、赵杰弗里、伊扎克·沙夫兰、托马斯·L·格里菲斯、曹源、卡尔蒂克·纳拉西曼。* NeurIPS'23\n\n1. **[SatLM：利用声明式提示的可满足性辅助语言模型。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.09656)** [[代码](https:\u002F\u002Fgithub.com\u002Fxiye17\u002Fsat-lm)]\n\n    *叶曦、陈乔楚、伊希尔·迪利格、格雷格·杜雷特。* NeurIPS'23\n\n1. **[ART：大型语言模型的自动多步推理与工具使用。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.09014)**\n\n    *巴尔加维·帕兰贾佩、斯科特·伦德伯格、萨米尔·辛格、汉娜内·哈吉希尔齐、卢克·泽特勒莫耶、马尔科·图利奥·里贝罗。* Preprint'23\n\n1. **[基于标注数据的链式思考自动提示增强与选择。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.12822)** [[代码](https:\u002F\u002Fgithub.com\u002Fshizhediao\u002Fautomate-cot)]\n\n    *舒嘉舜、刁世哲、张彤。* EMNLP'23 Findings\n\n1. **[合成提示：为大型语言模型生成链式思考示例。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.00618)**\n\n    *邵志宏、龚烨云、沈业龙、黄民烈、段楠、陈伟祖。* ICML'23\n\n1. **[忠实的链式思考推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.13379)**\n\n    *吕青、施瑞娅·哈瓦尔达尔、亚当·斯坦、李章、德利普·拉奥、埃里克·王、玛丽安娜·阿皮迪亚纳基、克里斯·卡利森-伯奇。* IJCNLP-AACL'23\n\n1. **[检索式反思：忠实的大型语言模型推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.00303)**\n\n    *何航峰、张洪明、丹·罗斯。* Preprint'23\n\n1. **[LAMBADA：自然语言中的自动化推理之逆向链式推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.13894)**\n\n    *赛义德·梅赫兰·卡泽米、金娜琼、迪普蒂·巴蒂亚、徐欣、迪帕克·拉马昌德兰。* ACL'23\n\n1. **[将检索与链式思考推理交织用于知识密集型多步问题。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10509)** [[代码](https:\u002F\u002Fgithub.com\u002FStonyBrookNLP\u002Fircot)]\n\n    *哈尔什·特里维迪、尼兰詹·巴拉苏布拉马尼安、图沙尔·科特、阿希什·萨布瓦尔。* ACL'23\n\n1. **[大型语言模型是具备自我验证能力的推理者。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.09561)** [[代码](https:\u002F\u002Fgithub.com\u002FWENGSYX\u002FSelf-Verification)]\n\n    *翁一轩、朱敏俊、何仕柱、刘康、赵军。* EMNLP'23 Findings\n\n1. **[检索增强型语言模型能进行推理吗？检索器与语言模型之间的责任归属。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.09146)** [[代码](https:\u002F\u002Fgithub.com\u002FMcGill-NLP\u002Fretriever-lm-reasoning)]\n\n    *帕里沙德·贝赫南加德尔、圣地亚哥·米雷特、西瓦·雷迪。* EMNLP'23 Findings\n\n1. **[有效上下文学习的互补性解释。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.13892)**\n\n    *叶曦、斯里尼瓦桑·艾耶尔、阿斯莉·切利基尔马兹、韦斯·斯托亚诺夫、格雷格·杜雷特、拉马克坎特·帕苏努鲁。* ACL'23 Findings\n\n1. 
**[思维程序提示：为数值推理任务将计算与推理解耦。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.12588)** [[代码](https:\u002F\u002Fgithub.com\u002Fwenhuchen\u002Fprogram-of-thoughts)]\n\n    *陈文虎、马学光、王馨怡、威廉·W·科恩。* TMLR'23\n\n1. **[通过正确实例进行无监督解释生成。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.11160)**\n\n    *程思杰、吴志勇、陈江杰、李志兴、刘洋、孔令鹏。* AAAI'23\n\n1. **[PAL：程序辅助语言模型。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.10435)** [[项目](https:\u002F\u002Freasonwithpal.com\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Freasoning-machines\u002Fpal)]\n\n    *高璐瑜、阿曼·马达安、周书妍、乌里·阿隆、刘鹏飞、杨一鸣、杰米·卡兰、格雷厄姆·纽比格。* ICML'23\n\n1. **[通过合作式推理诱导的语言模型解决数学应用题。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.16257)** [[代码](https:\u002F\u002Fgithub.com\u002FTianHongZXY\u002FCoRe)]\n\n    *朱新宇、王俊杰、张林、张宇翔、甘如意、张佳星、杨宇久。* ACL'23\n\n1. **[大型语言模型可以自我改进。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.11610)**\n\n    *黄家鑫、顾世祥、侯乐、吴悦欣、王雪芝、于鸿坤、韩家伟。* EMNLP'23\n\n1. **[心灵之眼：通过模拟实现 grounded 语言模型推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.05359)**\n\n    *刘睿博、魏Jason、顾世祥、吴特彦、沃索吉·索鲁什、崔克莱尔、周登尼、戴安德鲁·M。* ICLR'23\n\n1. **[大型语言模型中的自动链式思考提示。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.03493)** [[代码](https:\u002F\u002Fgithub.com\u002Famazon-research\u002Fauto-cot)]\n\n    *张卓生、张阿斯顿、李牧、亚历克斯·斯莫拉。* ICLR'23\n\n1. **[语言模型是多语种的链式思考推理者。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.03057)**\n\n    *史芙蕾达、苏兹贡·米拉克、弗莱塔格·马库斯、王雪芝、斯里瓦茨·苏拉杰、沃索吉·索鲁什、郑炯元、泰易、鲁德尔·塞巴斯蒂安、周登尼、达斯·迪潘詹、魏Jason。* ICLR'23\n\n1. **[问我任何问题：一种简单的语言模型提示策略。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.02441)** [[代码](https:\u002F\u002Fgithub.com\u002Fhazyresearch\u002Fama_prompting)]\n\n    *阿罗拉·西姆兰、纳拉扬·阿瓦妮卡、陈梅伊·F、奥尔·劳雷尔、古哈·尼尔、巴蒂亚·库什、查米·伊内斯、萨拉·弗雷德里克、雷·克里斯托弗。* ICLR'23\n\n1. **[通过策略梯度进行动态提示学习，用于半结构化数学推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.14610)** [[项目](https:\u002F\u002Fpromptpg.github.io\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Flupantech\u002FPromptPG)]\n\n    *陆攀、邱亮、常凯威、吴颖年、朱松春、拉杰普罗希特·坦迈、克拉克·彼得、卡利安·阿什温。* ICLR'23\n\n1. **[通过步骤感知的验证器使大型语言模型成为更好的推理者。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.02336)**\n\n    *李一飞、林泽奇、张士卓、傅强、陈蓓、楼建广、陈伟祖。* ACL'23\n\n1. **[由简入繁的提示策略使大型语言模型具备复杂推理能力。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.10625)**\n\n    *周登尼、谢尔利·纳撒尼尔、侯乐、魏Jason、斯凯尔斯·内森、王雪芝、舒尔曼斯·戴尔、崔克莱尔、布斯盖特·奥利维尔、黎国强、奇·埃德。* ICLR'23\n\n1. **[自洽性提升语言模型的链式思考推理能力。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.11171)**\n\n    *王雪芝、魏Jason、舒尔曼斯·戴尔、黎国强、奇·埃德、纳朗·沙兰、乔德里·阿坎克莎、周登尼。* ICLR'23\n\n### 2022\n\n1. **[检索增强的常识推理：一种统一的方法。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.12887)** [[代码](https:\u002F\u002Fgithub.com\u002Fwyu97\u002FRACo)]\n\n    *Wenhao Yu, Chenguang Zhu, Zhihan Zhang, Shuohang Wang, Zhuosheng Zhang, Yuwei Fang, Meng Jiang.* EMNLP'22\n\n1. **[代码语言模型是少样本常识学习者。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.07128)** [[代码](https:\u002F\u002Fgithub.com\u002Fmadaan\u002Fcocogen)]\n\n    *Aman Madaan, Shuyan Zhou, Uri Alon, Yiming Yang, Graham Neubig.* EMNLP'22\n\n1. **[利用语言模型解决定量推理问题。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.14858)** [[博客](https:\u002F\u002Fai.googleblog.com\u002F2022\u002F06\u002Fminerva-solving-quantitative-reasoning.html)]\n\n    *Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra.* NeurIPS'22\n\n1. 
**[大型语言模型仍然无法进行规划。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.10498)** [[代码](https:\u002F\u002Fgithub.com\u002Fkarthikv792\u002Fgpt-plan-benchmark)]\n\n    *Karthik Valmeekam, Alberto Olmo, Sarath Sreedharan, Subbarao Kambhampati.* NeurIPS'22\n\n1. **[大型语言模型是零样本推理者。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.11916)**\n\n    *Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa.* NeurIPS'22\n\n1. **[通过迭代提示微调预训练语言模型以实现思维链。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.08383)** [[代码](https:\u002F\u002Fgithub.com\u002Fsunlab-osu\u002Fiterprompt)]\n\n    *Boshi Wang, Xiang Deng, Huan Sun.* EMNLP'22\n\n1. **[思维链提示能够激发大型语言模型的推理能力。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.11903)** [[博客](https:\u002F\u002Fai.googleblog.com\u002F2022\u002F05\u002Flanguage-models-perform-reasoning-via.html)]\n\n    *Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou.* NeurIPS'22\n\n\u003Cp align=\"right\" style=\"font-size: 14px; color: #555; margin-top: 20px;\">\n    \u003Ca href=\"#readme-top\" style=\"text-decoration: none; color: #007bff; font-weight: bold;\">\n        ↑ 返回顶部 ↑\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\n\u003Ch3 id=\"mllm\">🧠 大型语言模型中的多模态推理\u003C\u002Fh3>\n\n### 2025\n\n1. **[在多模态大型语言模型中引入视觉感知标记。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.17425)** [[代码](https:\u002F\u002Fgithub.com\u002Fyu-rp\u002FVisualPerceptionToken)] [[模型](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Frp-yu\u002Fvpt-models-67b6afdc8679a05a2876f07a)] [[数据集](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Frp-yu\u002FVPT_Datasets)]\n\n    *Runpeng Yu, Xinyin Ma, Xinchao Wang.* 预印本'25\n\n1. **[LlamaV-o1：重新思考大型语言模型中的逐步视觉推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.06186)** [[项目](https:\u002F\u002Fmbzuai-oryx.github.io\u002FLlamaV-o1\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Fmbzuai-oryx\u002FLlamaV-o1)] [[模型](https:\u002F\u002Fhuggingface.co\u002Fomkarthawakar\u002FLlamaV-o1)]\n\n    *Omkar Thawakar, Dinura Dissanayake, Ketan More, Ritesh Thawkar, Ahmed Heakl, Noor Ahsan, Yuhao Li, Mohammed Zumri, Jean Lahoud, Rao Muhammad Anwer, Hisham Cholakkal, Ivan Laptev, Mubarak Shah, Fahad Shahbaz Khan, Salman Khan.* 预印本'25\n\n1. **[Embodied-Reasoner：协同视觉搜索、推理和行动以完成具身交互任务。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.21696)** [[项目](https:\u002F\u002Fembodied-reasoner.github.io\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Fzwq2018\u002Fembodied_reasoner)] [[数据集](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fzwq2018\u002Fembodied_reasoner)]\n\n    *Wenqi Zhang, Mengna Wang, Gangao Liu, Xu Huixin, Yiwei Jiang, Yongliang Shen, Guiyang Hou, Zhe Zheng, Hang Zhang, Xin Li, Weiming Lu, Peng Li, Yueting Zhuang.* 预印本'25\n\n### 2024\n\n1. **[Insight-V：探索多模态大型语言模型中的长链式视觉推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.14432)** [[代码](https:\u002F\u002Fgithub.com\u002Fdongyh20\u002FInsight-V)] [[模型](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FTHUdyh\u002Finsight-v-673f5e1dd8ab5f2d8d332035)]\n\n    *Yuhao Dong, Zuyan Liu, Hai-Long Sun, Jingkang Yang, Winston Hu, Yongming Rao, Ziwei Liu.* 预印本'24\n\n1. **[LLaVA-CoT：让视觉语言模型逐步推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.10440)** [[代码](https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FLLaVA-CoT)] [[模型](https:\u002F\u002Fhuggingface.co\u002FXkev\u002FLlama-3.2V-11B-cot)]\n\n    *Guowei Xu, Peng Jin, Hao Li, Yibing Song, Lichao Sun, Li Yuan.* 预印本'24\n\n1. 
**[视觉速写板：将草图作为多模态语言模型的视觉思维链。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.09403)** [[项目](https:\u002F\u002Fvisualsketchpad.github.io\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002FYushi-Hu\u002FVisualSketchpad)]\n\n    *Yushi Hu, Weijia Shi, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith, Ranjay Krishna.* 预印本'24\n\n1. **[基于图表的推理：将能力从LLM转移到VLM。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.12596)**\n\n    *Victor Carbune, Hassan Mansoor, Fangyu Liu, Rahul Aralikatte, Gilles Baechler, Jindong Chen, Abhanshu Sharma.* NAACL'24 Findings\n\n1. **[SpatialVLM：赋予视觉语言模型空间推理能力。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.12168)** [[项目](https:\u002F\u002Fspatial-vlm.github.io\u002F)]\n\n    *Boyuan Chen, Zhuo Xu, Sean Kirmani, Brian Ichter, Danny Driess, Pete Florence, Dorsa Sadigh, Leonidas Guibas, Fei Xia.* CVPR'24\n\n1. **[表格链：在推理链中演化表格以实现表格理解。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.04398)**\n\n    *Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister.* ICLR'24\n\n1. **[面向多模态LLM的链接上下文学习。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.07891)** [[代码](https:\u002F\u002Fgithub.com\u002Fisekai-portal\u002FLink-Context-Learning)]\n\n    *Yan Tai, Weichen Fan, Zhao Zhang, Feng Zhu, Rui Zhao, Ziwei Liu.* CVPR'24\n\n### 2023\n\n1. **[Gemini在推理中的应用：揭示多模态大语言模型中的常识理解。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.17661)**\n\n    *王宇清、赵云.* 预印本'23\n\n1. **[G-LLaVA：利用多模态大语言模型解决几何问题。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.11370)**\n\n    *高佳辉、皮仁杰、张继鹏、叶家成、钟万军、王宇飞、洪兰青、韩建华、徐航、李振国、孔令鹏.* 预印本'23\n\n1. **[Chameleon：基于大语言模型的即插即用式组合推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.09842)** [[项目](https:\u002F\u002Fchameleon-llm.github.io\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Flupantech\u002Fchameleon-llm)]\n\n    *陆攀、彭宝林、程浩、米歇尔·加利、蔡开元、吴英年、朱松纯、高剑锋.* NeurIPS'23\n\n1. **[MM-REACT：通过提示引导ChatGPT实现多模态推理与行动。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.11381)** [[项目](https:\u002F\u002Fmultimodal-react.github.io\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FMM-REACT)] [[演示](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fmicrosoft-cognitive-service\u002Fmm-react)]\n\n    *杨正元、李林杰、王建峰、林凯文、阿扎尔纳斯布、艾哈迈德、刘子诚、刘策、曾迈克尔、王丽娟.* 预印本'23\n\n1. **[ViperGPT：通过Python执行进行视觉推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.08128)** [[项目](https:\u002F\u002Fviper.cs.columbia.edu\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Fcvlab-columbia\u002Fviper)]\n\n    *迪达克·苏里斯、萨奇特·梅农、卡尔·冯德里克.* ICCV'23\n\n1. **[Visual ChatGPT：与视觉基础模型对话、绘图和编辑。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.04671)** [[代码](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fvisual-chatgpt)]\n\n    *吴晨菲、尹圣明、齐伟珍、王小东、唐泽成、段楠.* 预印本'23\n\n1. **[语言模型中的多模态思维链推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.00923)** [[代码](https:\u002F\u002Fgithub.com\u002Famazon-science\u002Fmm-cot)]\n\n    *张卓生、张阿斯顿、李牧、赵海、卡里皮斯、斯莫拉.* 预印本'23\n\n1. **[视觉编程：无需训练的组合式视觉推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.11559)** [[项目](https:\u002F\u002Fprior.allenai.org\u002Fprojects\u002Fvisprog)] [[代码](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fvisprog)]\n\n    *坦迈·古普塔、阿尼鲁达·肯布哈维.* CVPR'23\n\n1. 
**[Socratic Models：利用语言构建零样本多模态推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.00598)** [[项目](https:\u002F\u002Fsocraticmodels.github.io\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Ftree\u002Fmaster\u002Fsocraticmodels)]\n\n    *安迪·曾、玛丽亚·阿塔里安、布莱恩·伊希特、克日什托夫·霍罗马斯基、阿德里安·王、斯特凡·韦尔克、费德里科·汤巴里、阿维克·普罗希特、迈克尔·里奥、维卡斯·辛德瓦尼、约翰尼·李、文森特·范胡克、皮特·弗洛伦斯.* ICLR'23\n\n\u003Cp align=\"right\" style=\"font-size: 14px; color: #555; margin-top: 20px;\">\n    \u003Ca href=\"#readme-top\" style=\"text-decoration: none; color: #007bff; font-weight: bold;\">\n        ↑ 返回顶部 ↑\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\n\u003Ch3 id=\"lm\">🤏 将小型语言模型扩展至推理能力\u003C\u002Fh3>\n\n### 2025\n\n1. **[在测试时通过反馈学习推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.15771)** [[代码](https:\u002F\u002Fgithub.com\u002FLaVi-Lab\u002FFTTT)]\n\n    *李燕阳、吕迈克尔、王立伟.* 预印本'25\n\n1. **[S²R：通过强化学习教导大语言模型自我验证与自我修正。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.12853)** [[代码](https:\u002F\u002Fgithub.com\u002FNineAbyss\u002FS2R)]\n\n    *马若天、王培松、刘成、刘星言、陈嘉琪、张邦、周欣、杜楠、李佳.* 预印本'25\n\n1. **[rStar-Math：小型语言模型可通过自我进化式深度思考掌握数学推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.04519)** [[代码](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FrStar)]\n\n    *关鑫宇、张莉娜、刘一飞、尚宁、孙佑然、朱毅、杨帆、杨茂.* 预印本'25\n\n### 2024\n\n1. **[MathScale：用于数学推理的指令微调扩展方法。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.02884)**\n\n    *唐正阳、张兴兴、王本友、魏福如.* 预印本'24\n\n### 2023\n\n1. **[基于形式逻辑的合成语料库学习演绎推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.07336)** [[代码](https:\u002F\u002Fgithub.com\u002Fhitachi-nlp\u002FFLD)]\n\n    *森下照文、森尾岳、山口敦纪、曾川康弘.* ICML'23\n\n1. **[符号化思维链蒸馏：小型模型也能“逐步思考”。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.14050)** [[代码](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fcot_distillation)]\n\n    *李念安·哈罗德、杰克·赫塞尔、柳英宰、任翔、蔡开元、崔艺珍.* ACL'23\n\n1. **[将小型语言模型专门化为多步推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.12726)**\n\n    *傅瑶、彭浩、欧立图、萨巴瓦尔、科特.* ICML'23\n\n1. **[大型语言模型是推理教师。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10071)** [[代码](https:\u002F\u002Fgithub.com\u002Fitsnamgyu\u002Freasoning-teacher)]\n\n    *何南圭、劳拉·施密德、尹世荣.* ACL'23\n\n1. **[教导小型语言模型进行推理。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.08410)**\n\n    *露西·夏洛特·马吉斯特、乔纳森·马林森、雅库布·阿达梅克、埃里克·马尔米、阿里克谢·塞维尔金.* ACL'23 短文\n\n1. **[通过语义分解将大型语言模型的多步推理能力蒸馏到小型模型中。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.00193)**\n\n    *库马尔·施里达尔、亚历山德罗·斯托尔福、姆林玛亚·萨昌.* ACL'23 发现\n\n### 2022\n\n1. 
**[指令微调语言模型的扩展方法。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.11416)**\n\n    *郑亨源、侯乐、朗普雷、佐夫、泰伊、费杜斯、李埃里克、王学智、德格哈尼、布拉马、韦布森、顾世祥、戴竹云、苏兹贡、陈欣韵、乔德里、纳朗、米什拉、余亚当斯、赵文轩、黄艳萍、戴安德鲁、于鸿坤、彼得罗夫、奇埃德、迪恩杰夫、德夫林、罗伯茨、周登尼、黎国强、魏贾森.* JMLR'22\n\n\u003Cp align=\"right\" style=\"font-size: 14px; color: #555; margin-top: 20px;\">\n    \u003Ca href=\"#readme-top\" style=\"text-decoration: none; color: #007bff; font-weight: bold;\">\n        ↑ 返回顶部 ↑\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n## 其他实用资源\n\n\n\n- **[LLM Reasoners](https:\u002F\u002Fgithub.com\u002FBer666\u002Fllm-reasoners)**  一个用于高级大型语言模型推理的库。\n- **[Chain-of-Thought Hub](https:\u002F\u002Fgithub.com\u002FFranxYao\u002Fchain-of-thought-hub)**  使用思维链提示来评估大型语言模型的推理性能。\n- **[ThoughtSource](https:\u002F\u002Fgithub.com\u002FOpenBioLink\u002FThoughtSource)**  大型语言模型中与思维链推理相关的数据和工具的集中且开放的资源。\n- **[AgentChain](https:\u002F\u002Fgithub.com\u002Fjina-ai\u002Fagentchain)**  将多个大型语言模型串联起来进行推理，并协调多个大型模型完成复杂任务。\n- **[google\u002FCascades](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fcascades)**  一个Python库，支持语言模型的复杂组合，例如草稿纸、思维链、工具使用、选择-推理等。\n- **[LogiTorch](https:\u002F\u002Fgithub.com\u002FLogiTorch\u002Flogitorch)**  基于PyTorch的自然语言逻辑推理库。\n- **[salesforce\u002FLAVIS](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002FLAVIS)**  集成语言与视觉智能的一站式库。\n- **[facebookresearch\u002FRAM](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FRAM)**  一个用于研究人工智能模型在推理、对齐及内存（RAM）使用方面的框架。\n\n\u003Cp align=\"right\" style=\"font-size: 14px; color: #555; margin-top: 20px;\">\n    \u003Ca href=\"#readme-top\" style=\"text-decoration: none; color: #007bff; font-weight: bold;\">\n        ↑ 返回顶部 ↑\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\n## 其他精彩列表\n\n\n\n- **[Awesome-Controllable-Generation](https:\u002F\u002Fgithub.com\u002Fatfortes\u002FAwesome-Controllable-Generation)**  关于使用扩散模型进行可控生成的论文和资源合集。\n- **[Chain-of-ThoughtsPapers](https:\u002F\u002Fgithub.com\u002FTimothyxxx\u002FChain-of-ThoughtsPapers)**  该趋势始于“思维链提示能够激发大型语言模型的推理能力”。\n- **[LM-reasoning](https:\u002F\u002Fgithub.com\u002Fjeffhj\u002FLM-reasoning)**  大型语言模型推理相关论文和资源的集合。\n- **[Prompt4ReasoningPapers](https:\u002F\u002Fgithub.com\u002Fzjunlp\u002FPrompt4ReasoningPapers)**  论文《通过语言模型提示进行推理：综述》的存储库。\n- **[ReasoningNLP](https:\u002F\u002Fgithub.com\u002FFreedomIntelligence\u002FReasoningNLP)**  自然语言处理中关于推理的论文列表。\n- **[Awesome-LLM](https:\u002F\u002Fgithub.com\u002FHannibal046\u002FAwesome-LLM)**  精选的大型语言模型列表。\n- **[Awesome LLM Self-Consistency](https:\u002F\u002Fgithub.com\u002FSuperBruceJia\u002FAwesome-LLM-Self-Consistency)**  大型语言模型中自我一致性相关资源的精选列表。\n- **[Deep-Reasoning-Papers](https:\u002F\u002Fgithub.com\u002Ffloodsung\u002FDeep-Reasoning-Papers)**  包括神经符号推理、逻辑推理和视觉推理在内的最新论文。\n\n\u003Cp align=\"right\" style=\"font-size: 14px; color: #555; margin-top: 20px;\">\n    \u003Ca href=\"#readme-top\" style=\"text-decoration: none; color: #007bff; font-weight: bold;\">\n        ↑ 返回顶部 ↑\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\n## 如何贡献\n\n- 添加新论文或更新现有论文时，请考虑该工作应归入哪个类别。\n- 描述工作时，请采用与现有条目相同的格式。\n- 添加论文的摘要链接（如果是arXiv预印本，则为`\u002Fabs\u002F`格式）。\n\n**即使不小心做错了也没关系，我们会帮你修正！**\n\n### 贡献者\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fatfortes\u002FAwesome-LLM-Reasoning\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fatfortes_Awesome-LLM-Reasoning_readme_9621d72d9c6c.png\" \u002F>\n\u003C\u002Fa>","# Awesome-LLM-Reasoning 快速上手指南\n\n**项目简介**：\n`Awesome-LLM-Reasoning` 并非一个可直接安装运行的软件库或框架，而是一个**精选的资源列表（Awesome 
List）**。它汇集了关于如何解锁大语言模型（LLM）和多模态大模型（MLLM）推理能力的最新论文、代码库、基准测试和技术综述。\n\n本指南旨在帮助开发者高效利用该列表中的资源，快速定位所需的算法实现、数据集或评估工具。\n\n## 1. 环境准备\n\n由于本项目是资源索引，无需特定的系统环境。但为了运行列表中链接的具体论文代码（如 CoT 推理、数学推理基准等），建议准备以下通用开发环境：\n\n*   **操作系统**：Linux（推荐 Ubuntu 20.04+）、macOS 或 Windows（WSL2）\n*   **Python 版本**：3.8 或更高版本（大多数现代 LLM 项目要求 3.9+）\n*   **包管理工具**：`pip` 或 `conda`\n*   **硬件要求**：\n    *   **阅读与检索**：无特殊要求，任意设备均可。\n    *   **运行相关代码**：若需复现列表中的推理实验，通常需要 NVIDIA GPU（显存建议 16GB+）并安装 CUDA 驱动。\n*   **前置依赖**：\n    *   Git（用于克隆具体项目的代码库）\n    *   科学上网环境或学术镜像（部分论文链接指向 arXiv，代码库指向 GitHub）\n\n## 2. 获取与使用步骤\n\n本项目不需要传统的“安装”过程，而是通过浏览和克隆子项目来使用。\n\n### 步骤一：克隆或浏览资源列表\n\n你可以直接在线浏览，或克隆仓库到本地以便离线查阅和搜索。\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fatfortes\u002FAwesome-LLM-Reasoning.git\ncd Awesome-LLM-Reasoning\n```\n\n> **国内加速建议**：如果克隆速度较慢，可使用国内镜像源（如 Gitee 镜像，若有）或配置 Git 代理。\n> ```bash\n> # 示例：使用代理加速克隆（请替换为你的实际代理端口）\n> git clone -c http.proxy=http:\u002F\u002F127.0.0.1:7890 https:\u002F\u002Fgithub.com\u002Fatfortes\u002FAwesome-LLM-Reasoning.git\n> ```\n\n### 步骤二：查找所需资源\n\n打开根目录下的 `README.md` 文件，根据目录结构查找你感兴趣的方向：\n\n*   **Survey（综述）**：查找特定年份（2022-2025）的推理技术综述论文。\n*   **Analysis（分析）**：查找关于 LLM 推理稳定性、幻觉、思维链（CoT）有效性的分析报告。\n*   **Technique（技术）**：\n    *   `LLM`: 纯文本大模型的推理增强技术。\n    *   `MLLM`: 多模态推理技术。\n    *   `LM`: 小模型推理能力扩展方案。\n\n### 步骤三：复用具体项目代码\n\n找到感兴趣的论文条目后，点击其附带的 `[code]` 链接（通常指向独立的 GitHub 仓库）。以下是通用的复用流程示例（流程之后另附一个直接调用模型 API 的最小示意代码）：\n\n假设你对 **\"Sketch-of-Thought\"**（一种高效的推理方法）感兴趣：\n\n1.  **访问子项目页面**：在列表中点击对应的 code 链接。\n2.  **克隆子项目**：\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002FSimonAytes\u002FSoT.git\n    cd SoT\n    ```\n3.  **安装子项目依赖**（参考该项目自己的 README）：\n    ```bash\n    pip install -r requirements.txt\n    ```\n4.  **运行示例**（脚本名与参数以该仓库的 README 为准）：\n    ```bash\n    python run_reasoning.py --model_name_or_path llama-2-7b --prompt \"Math problem here\"\n    ```\n\n
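在进入下一节的基准测试之前，你也可以先用几行代码直观体验列表中最基础的技术之一：零样本思维链（Zero-shot CoT，即在提问后附加“请一步一步思考”之类的指令，参见列表中 \"Large Language Models are Zero-Shot Reasoners\" 一文）。下面是一个最小示意代码，并非本仓库自带脚本：假设你已通过 `pip install openai` 安装官方 SDK 并在环境变量中配置了 `OPENAI_API_KEY`，其中模型名仅为占位，请替换为你实际可用的模型。\n\n```python\n# 最小示意：对比直接提问与零样本思维链（Zero-shot CoT）提示\n# 假设：已安装 openai SDK，并已配置 OPENAI_API_KEY 环境变量\nfrom openai import OpenAI\n\nclient = OpenAI()\n\nquestion = \"鸡兔同笼：共有 35 个头、94 只脚，问鸡和兔各有多少只？\"\n\ndef ask(prompt: str) -> str:\n    # model 为占位名，请替换为你可用的模型\n    resp = client.chat.completions.create(\n        model=\"gpt-4o-mini\",\n        messages=[{\"role\": \"user\", \"content\": prompt}],\n    )\n    return resp.choices[0].message.content or \"\"\n\nprint(ask(question))  # 直接提问\nprint(ask(\"请一步一步思考后再作答：\" + question))  # 零样本 CoT 提示\n```\n\n对比两次输出即可直观看到提示词对中间推理步骤的影响；更系统的评测方式见下一节。\n\n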
## 3. 基本使用示例\n\n以下演示如何利用该列表快速找到一个“数学推理”相关的开源工具并运行最简单的测试。\n\n**场景**：你想测试一个大模型在数学问题上的思维链（Chain-of-Thought）能力。\n\n1.  **检索**：在 `Awesome-LLM-Reasoning` 的 `Technique` -> `LLM` 或 `Survey` 部分，寻找关键词 \"Mathematical Reasoning\" 或 \"Chain-of-Thought\"。\n    *   *发现资源*：例如列表中的 `[Large Language Models for Mathematical Reasoning: Progresses and Challenges]` 或其关联代码库。\n    *   *或者直接查找评测集*：列表顶部提到的 **[LLMSymbolicReasoningBench](https:\u002F\u002Fgithub.com\u002Fatfortes\u002FLLMSymbolicReasoningBench)**。\n\n2.  **获取评测工具**：\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fatfortes\u002FLLMSymbolicReasoningBench.git\n    cd LLMSymbolicReasoningBench\n    ```\n\n3.  **安装依赖**：\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n4.  **执行简单测试**（假设该工具支持本地模型或 API；脚本名与参数为示意，以该仓库实际说明为准）：\n    ```bash\n    # 运行符号推理基准测试示例\n    python evaluate.py --model gpt-3.5-turbo --task symbolic_logic\n    ```\n\n**提示**：\n*   列表中的每个条目都包含论文链接（arXiv）和代码链接（GitHub）。建议先阅读论文摘要了解原理，再查看代码库的 `Usage` 章节进行部署。\n*   关注 `2025` 和 `2024` 标签下的最新成果，这些通常代表了当前的 SOTA（State-of-the-Art）水平。","某金融科技公司算法团队正致力于研发一款能自动解析复杂衍生品合同并识别潜在风险条款的智能审计系统。\n\n### 没有 Awesome-LLM-Reasoning 时\n- **技术选型盲目**：团队在海量论文中迷失，难以区分哪些推理技术（如思维链 CoT、自一致性）真正适用于法律逻辑推导，导致反复试错。\n- **模型表现不稳定**：直接调用通用大模型处理多步逻辑题时，常出现“幻觉”或中间步骤跳跃，无法准确追踪合同条款间的因果链条。\n- **缺乏评估基准**：找不到针对符号推理和复杂谜题的专业评测集，无法量化模型在逻辑严密性上的真实提升幅度。\n- **研发周期冗长**：从零复现前沿推理算法耗时数月，错过了产品上线的最佳窗口期。\n\n### 使用 Awesome-LLM-Reasoning 后\n- **精准锁定方案**：通过其整理的综述与资源，团队快速定位到适合法律场景的“多模态思维链”及最新开源模型（如 DeepSeek-R1），直接复用成熟架构。\n- **推理能力跃升**：依据列表中关于内部一致性与自我反馈的研究优化提示词，模型现在能一步步拆解合同逻辑，错误率降低 40%。\n- **科学量化效果**：利用推荐的 LLMSymbolicReasoningBench 等基准测试，团队建立了严格的逻辑能力评估体系，确保每次迭代都有据可依。\n- **加速落地进程**：站在巨人肩膀上，将原本数月的预研工作压缩至两周，迅速完成了原型验证并推向生产环境。\n\nAwesome-LLM-Reasoning 不仅是一份资源清单，更是开发者解锁大模型深层逻辑推理能力、从“盲目尝试”转向“科学构建”的关键导航图。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fatfortes_Awesome-LLM-Reasoning_2609ca5b.png","atfortes","Armando Fortes","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fatfortes_b81cb213.jpg","PhD candidate in MMLab@NTU. Prev: Tsinghua @thu-ml, Técnico Lisboa.","Nanyang Technological University","Singapore",null,"atfortes19","atfortes.github.io","https:\u002F\u002Fgithub.com\u002Fatfortes",3577,202,"2026-04-03T20:44:08","MIT",1,"","未说明",{"notes":94,"python":92,"dependencies":95},"该项目是一个 curated collection（精选合集），主要包含关于大语言模型（LLM）和多模态大模型（MLLM）推理能力的论文和资源列表，并非一个可直接运行的软件工具或代码库。因此，README 中未提供具体的操作系统、GPU、内存、Python 版本或依赖库等运行环境需求。部分列出的论文可能附带独立的代码实现链接，需参考各子项目的具体说明。",[],[15,54,26],[98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115],"language-models","reasoning","prompt","in-context-learning","chatgpt","chain-of-thought","prompt-engineering","cot","awesome","gpt","mllm","multimodal","papers","gpt-4o","openai-o1","strawberry","deepseek","deepseek-r1","2026-03-27T02:49:30.150509","2026-04-06T05:44:31.750600",[],[]]