[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Eclipsess--Awesome-Efficient-Reasoning-LLMs":3,"tool-Eclipsess--Awesome-Efficient-Reasoning-LLMs":65},[4,17,27,35,48,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",150037,2,"2026-04-10T23:33:47",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,3,"2026-04-06T11:19:32",[15,26,14,13],"图像",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":10,"last_commit_at":33,"category_tags":34,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85092,"2026-04-10T11:13:16",[26,43,44,45,14,46,15,13,47],"数据工具","视频","插件","其他","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":54,"last_commit_at":55,"category_tags":56,"status":16},5784,"funNLP","fighting41love\u002FfunNLP","funNLP 是一个专为中文自然语言处理（NLP）打造的超级资源库，被誉为\"NLP 民工的乐园”。它并非单一的软件工具，而是一个汇集了海量开源项目、数据集、预训练模型和实用代码的综合性平台。\n\n面对中文 NLP 领域资源分散、入门门槛高以及特定场景数据匮乏的痛点，funNLP 提供了“一站式”解决方案。这里不仅涵盖了分词、命名实体识别、情感分析、文本摘要等基础任务的标准工具，还独特地收录了丰富的垂直领域资源，如法律、医疗、金融行业的专用词库与数据集，甚至包含古诗词生成、歌词创作等趣味应用。其核心亮点在于极高的全面性与实用性，从基础的字典词典到前沿的 BERT、GPT-2 模型代码，再到高质量的标注数据和竞赛方案，应有尽有。\n\n无论是刚刚踏入 NLP 领域的学生、需要快速验证想法的算法工程师，还是从事人工智能研究的学者，都能在这里找到急需的“武器弹药”。对于开发者而言，它能大幅减少寻找数据和复现模型的时间；对于研究者，它提供了丰富的基准测试资源和前沿技术参考。funNLP 以开放共享的精神，极大地降低了中文自然语言处理的开发与研究成本，是中文 AI 社区不可或缺的宝藏仓库。",79857,1,"2026-04-08T20:11:31",[15,43,46],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":54,"last_commit_at":63,"category_tags":64,"status":16},6590,"gpt4all","nomic-ai\u002Fgpt4all","GPT4All 是一款让普通电脑也能轻松运行大型语言模型（LLM）的开源工具。它的核心目标是打破算力壁垒，让用户无需依赖昂贵的显卡（GPU）或云端 API，即可在普通的笔记本电脑和台式机上私密、离线地部署和使用大模型。\n\n对于担心数据隐私、希望完全掌控本地数据的企业用户、研究人员以及技术爱好者来说，GPT4All 提供了理想的解决方案。它解决了传统大模型必须联网调用或需要高端硬件才能运行的痛点，让日常设备也能成为强大的 AI 助手。无论是希望构建本地知识库的开发者，还是单纯想体验私有化 AI 聊天的普通用户，都能从中受益。\n\n技术上，GPT4All 基于高效的 `llama.cpp` 后端，支持多种主流模型架构（包括最新的 DeepSeek R1 蒸馏模型），并采用 GGUF 格式优化推理速度。它不仅提供界面友好的桌面客户端，支持 Windows、macOS 和 Linux 等多平台一键安装，还为开发者提供了便捷的 Python 库，可轻松集成到 LangChain 等生态中。通过简单的下载和配置，用户即可立即开始探索本地大模型的无限可能。",77307,"2026-04-11T06:52:37",[15,13],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":81,"owner_email":80,"owner_twitter":80,"owner_website":82,"owner_url":83,"languages":80,"stars":84,"forks":85,"last_commit_at":86,"license":80,"difficulty_score":54,"env_os":87,"env_gpu":88,"env_ram":88,"env_deps":89,"category_tags":92,"github_topics":93,"view_count":10,"oss_zip_url":80,"oss_zip_packed_at":80,"status":16,"created_at":97,"updated_at":98,"faqs":99,"releases":135},4940,"Eclipsess\u002FAwesome-Efficient-Reasoning-LLMs","Awesome-Efficient-Reasoning-LLMs","[TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models","Awesome-Efficient-Reasoning-LLMs 是一个专注于大语言模型（LLM）高效推理技术的开源知识库与综述项目。它源自发表于 TMLR 2025 的论文《Stop Overthinking》，旨在解决当前大模型在复杂推理任务中普遍存在的“过度思考”问题——即生成冗长、低效的思维链，导致计算资源浪费和响应延迟。\n\n该项目系统性地梳理了让大模型“想得更少、更准”的前沿方案，涵盖了从强化学习奖励设计、变长思维链数据微调，到推理步骤压缩、动态推理范式及提示词引导等八大核心技术方向。通过分类整理最新的学术论文与技术进展，它为社区提供了一张清晰的技术路线图。\n\nAwesome-Efficient-Reasoning-LLMs 特别适合 AI 研究人员、算法工程师及大模型开发者使用。对于希望优化模型推理成本、提升响应速度的团队，这里提供了丰富的理论依据和实践参考；对于学术研究者，它则是追踪高效推理领域最新动态、寻找研究灵感的宝贵资源。无论是想要复现最新算法，还是评估不同优化策略的效果，这个项目都能帮助你快速掌握如何让大模型在保持智能的同时，学会“适可而止”。","# Awesome-Efficient-Reasoning-LLMs\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Stop_Overthinking-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.16419)\n\u003C!-- [![Maintenance](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMaintained%3F-yes-green.svg)]() \u003C!-- Optional: Link to GitHub repo -->\n\u003C!-- [![Last Commit](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002F\u003Cyour-username>\u002F\u003Crepo-name>)]() \u003C!-- Fill in your repo link -->\n\u003C!-- [![Contributions Welcome](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FContributions-welcome-blue)]() --> \n\n\n## [TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models\n\n\u003C!-- omit in toc -->\n\n## 📢 Want to add related papers? Feel free to open a pull request!\n\n## 📢 News\n- **August 21, 2025**: Updated.\n- **July 14, 2025**: \"Stop Overthinking\" is accepted by TMLR, Transactions on Machine Learning Research.\n- **April 22, 2025**: Updated.\n- **March 20, 2025**: We release the first survey for efficient reasoning of LLMs \"[Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.16419)\".  \n  Feel free to cite, contribute, or open a pull request to add recent related papers!\n  \n\n\u003C!-- omit in toc -->\n![Pipeline](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEclipsess_Awesome-Efficient-Reasoning-LLMs_readme_246ce9301323.png)\n\nIn this paper, we present the first structured survey that systematically investigates and organizes the current progress in achieving **efficient reasoning in LLMs**.\n\n## 📊 Taxonomy\n\nBelow is a taxonomy graph summarizing the current landscape of efficient reasoning research for LLMs:\n\n![Taxonomy](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEclipsess_Awesome-Efficient-Reasoning-LLMs_readme_952a6b6d1bd7.png)\n\n---\n\n\u003C!-- omit in toc -->\n## 📚 Table of Contents\n\n- [Awesome-Efficient-Reasoning-LLMs](#awesome-efficient-reasoning-llms)\n  - [\\[TMLR 2025\\] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models](#tmlr-2025-stop-overthinking-a-survey-on-efficient-reasoning-for-large-language-models)\n  - [📢 Want to add related papers? Feel free to open a pull request!](#-want-to-add-related-papers-feel-free-to-open-a-pull-request)\n  - [📢 News](#-news)\n  - [📊 Taxonomy](#-taxonomy)\n  - [Section I:  RL with Length Reward Design](#section-i--rl-with-length-reward-design)\n  - [Section II: SFT with Variable-Length CoT Data](#section-ii-sft-with-variable-length-cot-data)\n  - [Section III: Compressing Reasoning Steps into Fewer Latent Representation](#section-iii-compressing-reasoning-steps-into-fewer-latent-representation)\n  - [Section IV: Dynamic Reasoning Paradigm during Inference](#section-iv-dynamic-reasoning-paradigm-during-inference)\n  - [Section V: Prompt-Guided Efficient Reasoning](#section-v-prompt-guided-efficient-reasoning)\n  - [Section VI: Prompts Attribute-Driven Reasoning Routing](#section-vi-prompts-attribute-driven-reasoning-routing)\n  - [Section VII: Reasoning Abilities via Efficient Training Data and Model Compression](#section-vii-reasoning-abilities-via-efficient-training-data-and-model-compression)\n  - [Section VIII: Evaluation and Benchmark](#section-viii-evaluation-and-benchmark)\n  - [Citation](#citation)\n  - [Acknowledgment](#acknowledgment)\n\n\n---\n\n\u003C!--[[Paper]](pdf LINK) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-\u003C TIME >-red)-->\n\n\"(.)\" stands for \"To Be Updated\" in the survey paper.\n\n## Section I:  RL with Length Reward Design\n\n* Demystifying Long Chain-of-Thought Reasoning in LLMs [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.03373) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.12570) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.01-red)\n* Kimi k1.5: Scaling Reinforcement Learning with LLMs [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.12599) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.01-red)\n* Training Language Models to Reason Efficiently [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.04463) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning [[Paper]](https:\u002F\u002Fwww.arxiv.org\u002Fpdf\u002F2503.04697) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.04472) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.07572) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* HAWKEYE: Efficient Reasoning with Model Collaboration [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.00424) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* THINKPRUNE: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.01296) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Think When You Need: Self-Adaptive Chain-of-Thought Learning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.03234) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Concise Reasoning via Reinforcement Learning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.05185) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.11827) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.17250) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Scalable Chain of Thoughts via Elastic Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.05315) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.07686) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.11274) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.07961) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Efficient RL Training for Reasoning Models via Length-Aware Optimization [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.12284) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Optimizing Anytime Reasoning via Budget Relative Policy Optimization [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.13438) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Learn to Reason Efficiently with Adaptive Length-based Reward Shaping [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.15612) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.16315) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.19187) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.21178) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Stable Reinforcement Learning for Efficient Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.18086) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.21765) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Thinkless: LLM Learns When to Think. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.13379) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Think Only When You Need with Large Hybrid-Reasoning Models. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.14631) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.15400) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.11896) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.10832) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* AdaptThink: Reasoning Models Can Learn When to Think. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.13417) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.08125) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* How Far Are We from Optimal Reasoning Efficiency? [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.07104) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.05256) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.10446) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Optimizing Length Compression in Large Reasoning Models. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.14755) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* AdapThink: Adaptive Thinking Preferences for Reasoning Language Model. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.18237) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.20160) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.23840) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* SmartThinker: Learning to Compress and Preserve Reasoning by Step-Level Length Control. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.04348) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.07-red)\n* Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.02178) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* Train Long, Think Short: Curriculum Learning for Efficient Reasoning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.08940) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.09726) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* SABER: Switchable and Balanced Training for Efficient LLM Reasoning.  [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10026) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* Promoting Efficient Reasoning with Verifiable Stepwise Reward. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10293) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.11582) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* Beyond Token Length: Step Pruner for Efficient and Accurate Reasoning in Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.03805) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.10-red)\n\n\n## Section II: SFT with Variable-Length CoT Data\n\n* TokenSkip: Controllable Chain-of-Thought Compression in LLMs [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.12067) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.11664) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.12-red)\n* CoT-Valve: Length-Compressible Chain-of-Thought Tuning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.09601) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Self-Training Elicits Concise Reasoning in Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.20122) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Distilling System 2 into System 1 [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.06023) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.07-red)\n* Can Language Models Learn to Skip Steps? [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.01855) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.11-red)\n* Verbosity-Aware Rationale Reduction: Sentence-Level Rationale Reduction for Efficient and Effective Reasoning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.21006) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.12-red)\n* Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.13260) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Z1: Efficient Test-time Scaling with Code [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.00810) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Ada-R1: Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.21659) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.03469) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.13975) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.22662) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.14582) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* VeriThinker: Learning to Verify Makes Reasoning Model Efficient [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.17941) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.14794) [[Model Card]](https:\u002F\u002Fhuggingface.co\u002Ftngtech\u002FDeepSeek-TNG-R1T2-Chimera) [[Free access via OpenRouter]](https:\u002F\u002Fopenrouter.ai\u002Ftngtech\u002Fdeepseek-r1t2-chimera:free) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.16838) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Not All Tokens Are What You Need In Thinking [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.17827) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.24550) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.04881) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.02678) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* OThink-R1: Intrinsic Fast\u002FSlow Thinking Mode Switching for Over-Reasoning Mitigation. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.02397) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.09853) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.10822) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Compressing Chain-of-Thought in LLMs via Step Entropy. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2508.03346) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2508.05988) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n\n  \n## Section III: Compressing Reasoning Steps into Fewer Latent Representation\n\n* Training Large Language Models to Reason in a Continuous Latent Space [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.06769) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.12-red)\n* Compressed Chain of Thought: Efficient Reasoning through Dense Representations [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.13171) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.12-red)\n* Efficient Reasoning with Hidden Thinking (MLLM) [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.19201) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.01-red)\n* SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.12134) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.03275) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Reasoning with Latent Thoughts: On the Power of Looped Transformers [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.17416) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.21074) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Efficient Reasoning with Hidden Thinking [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.19201) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.01-red)\n* Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.03275) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.10835) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* SEAL: Steerable Reasoning Calibration of Large Language Models for Free [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.07986) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.16552) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.07240) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Controlling Thinking Speed in Reasoning Models. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2507.03704) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.07-red)\n\n## Section IV: Dynamic Reasoning Paradigm during Inference\n\n* Efficiently Serving LLM Reasoning Programs with Certaindex [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.20993) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.12-red)\n* When More is Less: Understanding Chain-of-Thought Length in LLMs [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.07266) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.05179) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* Reward-Guided Speculative Decoding for Efficient LLM Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.19324) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Fast Best-of-N Decoding via Speculative Rejection [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.20290) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.10-red)\n* FastMCTS: A Simple Sampling Strategy for Data Synthesis [[Paper]](https:\u002F\u002Fwww.arxiv.org\u002Fpdf\u002F2502.11476) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Dynamic Parallel Tree Search for Efficient LLM Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.16235) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.01422) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* LightThinker: Thinking Step-by-Step Compression (training LLMs to compress thoughts into gist tokens) [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.15589) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models [[Paper]](https:\u002F\u002Fwww.arxiv.org\u002Fpdf\u002F2503.06692) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* Reasoning Without Self-Doubt: More Efficient Chain-of-Thought Through Certainty Probing [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=wpK4IMJfdX) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.07891) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.13943) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.12329) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Efficient Reasoning for LLMs through Speculative Chain-of-Thought. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.19095) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Can atomic step decomposition enhance the self-structured reasoning of multimodal large models? [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.06252) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* Think smarter not harder: Adaptive reasoning with inference aware optimization [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.17974) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.01-red)\n* Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.17017) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.10480) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.01-red)\n* Confidence Improves Self-Consistency in LLMs [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.06233) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Make every penny count: Difficulty-adaptive self-consistency for cost-efficient reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.13457) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Path-consistency: Prefix enhancement for efficient inference in llm [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.01281) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* Bridging internal probability and self-consistency for effective and efficient llm reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.00511) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Towards thinking-optimal scaling of test-time compute for llm reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.18080) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods[[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.14047) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Reasoning models can be effective without thinking [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.09858) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Retro-search: Exploring untaken paths for deeper and efficient reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.04383) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Thought manipulation: External thought can be efficient for large reasoning models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.13626) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Sleep-time compute: Beyond inference scaling at test-time [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.13171) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Unlocking the capabilities of thought: A reasoning boundary framework to quantify and optimize chain-of-thought [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.05695) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.13367) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Dynamic Early Exit in Reasoning Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.15895) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Stop Spinning Wheels: Mitigating LLM Overthinking via Mining Patterns for Early Reasoning Exit [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.17627) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.24863) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.04842) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.20325) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Fractured Chain-of-Thought Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.12992) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Value-Guided Search for Efficient Chain-of-Thought Reasoning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.17373) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.17813) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* First Finish Search: Efficient Test-Time Scaling in Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.18149) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Accelerating Large Language Model Reasoning via Speculative Search. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.02865) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* FlashThink: An Early Exit Method For Efficient Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.13949) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.13866) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.08392) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15684) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.16122) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.17155) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSCALR@COLM-2025-blue)\n* CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22017) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM\u002FMLLM Reasoning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15154) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.24863) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.15707) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* SPECS: Faster Test-Time Scaling through Speculative Drafts. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.15733) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.22716) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Accelerated Test-Time Scaling with Model-Free Speculative Sampling [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04708) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Answer Convergence as a Signal for Early Stopping in Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.02536) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Collaborative LLM Inference via Planning for Efficient Reasoning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.11578) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Wait, We Don't Need to \"Wait\"! Removing Thinking Tokens Improves Reasoning Efficiency [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.08343) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Efficient Reasoning Through Suppression of Self-Affirmation Reflections in Large Reasoning Models. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.12353) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Steering LLM Thinking with Budget Guidance. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.13752) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.15647) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Activation Steering for Chain-of-Thought Compression. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.04742) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.07-red)\n* R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.17307) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.07-red)\n* Large Reasoning Models Know How to Think Efficiently. [[Paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=pLKDeGm2t1)]  ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FESFoMoIII@ICML-2025-blue)\n* MUR: Momentum Uncertainty guided Reasoning for Large Language Models. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.14958) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.07-red)\n* Test-time Prompt Intervention. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.02511) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.05337) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* Entropy After `\u003C\u002FThink>` for reasoning model early exiting [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.26522)  ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.09-red) (.)\n* Parallel-R1: Towards Parallel Thinking via Reinforcement Learning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.07980) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.09-red) (.)\n* DTS: Enhancing Large Reasoning Models via Decoding Tree Sketching [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2511.00640) [[Code]](https:\u002F\u002Fgithub.com\u002FZichengXu\u002FDecoding-Tree-Sketching) [[Colab]](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FZichengXu\u002FDecoding-Tree-Sketching\u002Fblob\u002Fmain\u002Fnotebooks\u002Fexample_DeepSeek_R1_Distill_Qwen_1_5B.ipynb) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2026.02-red)\n\n## Section V: Prompt-Guided Efficient Reasoning\n\n* Token-Budget-Aware LLM Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.18547) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Chain of Draft: Thinking Faster by Writing Less [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.18600) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.01141) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.05618) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.10-red)\n* Brevity is the soul of sustainability: Characterizing LLM response lengths. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.08686) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.10716) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.18810) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n\n## Section VI: Prompts Attribute-Driven Reasoning Routing\n* Claude 3.7 Sonnet and Claude Code [[website]](https:\u002F\u002Fwww.anthropic.com\u002Fnews\u002Fclaude-3-7-sonnet) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fhtml-2025.02-red)\n* Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.05179) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* Learning to Route LLMs with Confidence Tokens [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.13284) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.04428) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* RouteLLM: Learning to Route LLMs with Preference Data [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.18665) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* ThinkSwitcher: When to Think Hard, When to Think Fast. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.14183) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Long or short CoT? Investigating Instance-level Switch of Large Reasoning Models. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.04182) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2507.02822) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.07-red)\n\n## Section VII: Reasoning Abilities via Efficient Training Data and Model Compression\n\n* LIMO: Less is More for Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.03387) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* s1: Simple test-time scaling [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.19393) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* S2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.12853) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.10460) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* Small Models Struggle to Learn from Strong Reasoners [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.12143) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Towards Reasoning Ability of Small Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.11569) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Mixed Distillation Helps Smaller Language Models Reason Better [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.10730) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.02-red)\n* Small language models need strong verifiers to self-correct reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.17140) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.06-red)\n* Teaching Small Language Models Reasoning through Counterfactual Distillation [[Paper]](https:\u002F\u002Faclanthology.org\u002F2024.emnlp-main.333.pdf) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.11-red)\n* Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.14698) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.11-red)\n* Probe then retrieve and reason: Distilling probing and reasoning capabilities into smaller language models [[Paper]](https:\u002F\u002Faclanthology.org\u002F2024.lrec-main.1140.pdf) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.05-red)\n* Distilling Reasoning Ability from Large Language Models with Adaptive Thinking [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.09170) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.08-red)\n* SKIntern: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.13183) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.12-red)\n* TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.04872) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* Improving mathematical reasoning capabilities of small language models via feedback-driven distillation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.14698) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.11-red)\n* Probe then retrieve and reason: Distilling probing and reasoning capabilities into smaller language models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2212.00193) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2023.05-red)\n* TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers’ Guidance [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.24198) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.02010) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n\n\n## Section VIII: Evaluation and Benchmark\n* Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.06703) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.08235) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.12521) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.09324) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.11-red)\n* The Impact of Reasoning Step Length on Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fhtml\u002F2401.04925v3) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.01-red)\n* S1-bench: A simple benchmark for evaluating system 1 thinking capability of large reasoning models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.10368) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* When reasoning meets compression: Benchmarking compressed large reasoning models on complex reasoning tasks. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.02010) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.04823) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* A Technical Study into 0.5B Reasoning Language Models. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.13404) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* Revisiting Model Interpolation for Efficient Reasoning. [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.10977) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.10-red)\n* Do LLMs Really Need 10+ Thoughts for \"Find the Time 1000 Days Later\"? Towards Structural Understanding of LLM Overthinking [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.07880) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.10-red)\n\n\n\n## Citation\nIf you find this work useful, please cite us.\n```bib\n@misc{sui2025stopoverthinkingsurveyefficient,\n      title={Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models}, \n      author={Yang Sui and Yu-Neng Chuang and Guanchu Wang and Jiamu Zhang and Tianyi Zhang and Jiayi Yuan and Hongyi Liu and Andrew Wen and Shaochen Zhong and Hanjie Chen and Xia Hu},\n      year={2025},\n      eprint={2503.16419},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.16419}, \n}\n```\n\n## Acknowledgment\n> 🧩 *Layout inspired by [zzli2022\u002FAwesome-System2-Reasoning-LLM](https:\u002F\u002Fgithub.com\u002Fzzli2022\u002FAwesome-System2-Reasoning-LLM) and the latest works are referred to as [hemingkx\u002FAwesome-Efficient-Reasoning](https:\u002F\u002Fgithub.com\u002Fhemingkx\u002FAwesome-Efficient-Reasoning). Many thanks for the great structure!*\n","# 令人惊叹的高效推理大语言模型\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-停止過度思考-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.16419)\n\u003C!-- [![维护中](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F维护中%3F-是-green.svg)]() \u003C!-- 可选：指向GitHub仓库的链接 -->\n\u003C!-- [![最近一次提交](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002F\u003Cyour-username>\u002F\u003Crepo-name>)]() \u003C!-- 填写你的仓库链接 -->\n\u003C!-- [![欢迎贡献](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F贡献-欢迎-blue)]() --> \n\n\n## 【TMLR 2025】停止过度思考：大型语言模型高效推理综述\n\n\u003C!-- omit in toc -->\n\n## 📢 想添加相关论文吗？欢迎随时发起拉取请求！\n\n## 📢 最新消息\n- **2025年8月21日**：更新。\n- **2025年7月14日**：“停止过度思考”已被《机器学习研究汇刊》（TMLR）接收。\n- **2025年4月22日**：更新。\n- **2025年3月20日**：我们发布了首篇关于大语言模型高效推理的综述文章“停止过度思考：大型语言模型高效推理综述”（https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.16419）。  \n  欢迎引用、贡献，或发起拉取请求以添加最新的相关论文！\n\n\u003C!-- omit in toc -->\n![流程图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEclipsess_Awesome-Efficient-Reasoning-LLMs_readme_246ce9301323.png)\n\n在本文中，我们首次提出了一种结构化的综述框架，系统性地调研并梳理了当前在实现**大语言模型高效推理**方面的进展。\n\n## 📊 分类体系\n\n以下是总结当前大语言模型高效推理研究现状的分类图：\n\n![分类体系](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEclipsess_Awesome-Efficient-Reasoning-LLMs_readme_952a6b6d1bd7.png)\n\n---\n\n\u003C!-- omit in toc -->\n## 📚 目录\n\n- [令人惊叹的高效推理大语言模型](#令人惊叹的高效推理大语言模型)\n  - [【TMLR 2025】停止过度思考：大型语言模型高效推理综述](#tmlr-2025-停止过度思考-大型语言模型高效推理综述)\n  - [📢 想添加相关论文吗？欢迎随时发起拉取请求！](#-想添加相关论文吗欢迎随时发起拉取请求)\n  - [📢 最新消息](#-最新消息)\n  - [📊 分类体系](#-分类体系)\n  - [第一部分：基于长度奖励设计的强化学习](#第一部分--基于长度奖励设计的强化学习)\n  - [第二部分：使用可变长度思维链数据的监督微调](#第二部分--使用可变长度思维链数据的监督微调)\n  - [第三部分：将推理步骤压缩为更少的潜在表示](#第三部分--将推理步骤压缩为更少的潜在表示)\n  - [第四部分：推理过程中的动态推理范式](#第四部分--推理过程中的动态推理范式)\n  - [第五部分：提示引导的高效推理](#第五部分--提示引导的高效推理)\n  - [第六部分：基于提示属性的推理路由](#第六部分--基于提示属性的推理路由)\n  - [第七部分：通过高效训练数据和模型压缩提升推理能力](#第七部分--通过高效训练数据和模型压缩提升推理能力)\n  - [第八部分：评估与基准测试](#第八部分--评估与基准测试)\n  - [引用](#引用)\n  - [致谢](#致谢)\n\n\n---\n\n\u003C!--[[论文]](pdf LINK) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-\u003C TIME >-red)-->\n\n“(.)”在综述论文中代表“待更新”。\n\n## 第一部分：基于长度奖励设计的强化学习\n\n* 解密大语言模型中的长链式思维推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.03373) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* O1-Pruner：针对O1类推理的长度协调微调与剪枝 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.12570) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.01-red)\n* Kimi k1.5：基于大语言模型的强化学习规模化应用 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.12599) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.01-red)\n* 训练语言模型实现高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.04463) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* L1：利用强化学习控制推理模型的思考时长 [[论文]](https:\u002F\u002Fwww.arxiv.org\u002Fpdf\u002F2503.04697) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* DAST：面向大型推理模型的难度自适应慢思考机制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.04472) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* 通过元强化微调优化测试时计算资源消耗 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.07572) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* HAWKEYE：基于模型协作的高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.00424) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* THINKPRUNE：通过强化学习剪枝大语言模型的长链式思维 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.01296) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 需要时再思考：自适应链式思维学习 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.03234) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 基于强化学习的简洁推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.05185) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 并非所有思维都同等重要：多轮次强化学习驱动的大语言模型高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.11827) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* ConciseRL：以简洁性为导向的强化学习，用于构建高效推理模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.17250) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 基于弹性推理的可扩展链式思维 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.05315) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* S-GRPO：推理模型中的强化学习引导提前退出策略 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.07686) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* SelfBudgeter：面向高效大语言模型推理的自适应令牌分配 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.11274) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 让小型语言模型成为高效推理者：干预、监督与强化学习 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.07961) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 基于长度感知优化的高效推理模型强化学习训练 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.12284) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 通过预算相对策略优化实现任意时间推理的最优化 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.13438) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 学习如何通过自适应长度奖励塑造实现高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.15612) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 激励双过程思维以提升大型语言模型的高效推理能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.16315) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* LIMOPro：面向高效且有效测试时缩放的推理优化 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.19187) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 先走稳再跑快！基于强化学习的简洁大语言模型推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.21178) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 面向高效推理的稳定强化学习 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.18086) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 不必想得更久，而要明智地思考：优化大型推理模型的思维动态 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.21765) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Thinkless：大语言模型学会何时该思考。[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.13379) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 只有在需要时才思考：大型混合推理模型的应用 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.14631) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 何时继续思考：面向高效推理的自适应思维模式切换 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.15400) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* AdaCoT：基于强化学习的帕累托最优自适应链式思维触发机制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.11896) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 学会何时思考：通过多阶段强化学习塑造R1风格模型的自适应推理能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.10832) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* AdaptThink：推理模型可以学会何时思考。[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.13417) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* Bingo：通过动态且基于重要性的强化学习提升大语言模型的高效推理能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.08125) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 我们距离最优推理效率还有多远？[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.07104) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 刚刚好够的思考：利用自适应长度惩罚的强化学习实现高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.05256) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 简单任务快速处理，复杂任务深入思考：通过动力长度惩罚实现高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.10446) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 优化大型推理模型中的长度压缩 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.14755) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* AdapThink：面向推理语言模型的自适应思考偏好 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.18237) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* AALC：通过自适应准确率-长度控制实现大型语言模型高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.20160) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 思考令牌是助力还是陷阱？迈向更高效的大型推理模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.23840) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* SmartThinker：通过逐级长度控制学习压缩与保留推理内容 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.04348) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.07-red)\n* 重新审视过度思考：对链式思维推理中的内外冗余进行惩罚 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.02178) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* 先长期训练，后短时思考：面向高效推理的课程式学习 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.08940) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* 多采样，少思考：面向简洁推理的分组过滤策略优化 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.09726) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* SABER：面向高效大语言模型推理的可切换均衡训练 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10026) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* 通过可验证的逐级奖励促进高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10293) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* 先觉察，后少思：动态边界自我意识驱动大型语言模型的极致推理效率 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.11582) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* 超越令牌长度：面向大型语言模型高效精准推理的步骤剪枝器 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.03805) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.10-red)\n\n## 第二节：变长思维链数据的监督微调\n\n* TokenSkip：LLM中的可控思维链压缩 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.12067) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* C3oT：在不牺牲有效性的情况下生成更短的思维链 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.11664) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.12-red)\n* CoT-Valve：可长度压缩的思维链微调 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.09601) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 自训练激发大型语言模型中的简洁推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.20122) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 将系统2蒸馏到系统1中 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.06023) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.07-red)\n* 语言模型能否学会跳过步骤？ [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.01855) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.11-red)\n* 多余性感知的论证简化：面向高效且有效推理的句子级论证简化。[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.21006) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.12-red)\n* 分步困惑度引导的精炼：用于大型语言模型中高效思维链推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.13260) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Z1：带有代码的高效测试时缩放 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.00810) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* Ada-R1：通过双层自适应推理优化实现混合思维链 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.21659) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 长短思维链混合监督微调，激发大型语言模型中的高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.03469) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* DRP：具有技能感知步骤分解的蒸馏式推理剪枝，用于高效的大规模推理模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.13975) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* AutoL2S：面向高效大型语言模型的自动长短推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.22662) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 剪枝能提升推理能力吗？以能力为导向重新审视长思维链压缩，以实现更好的推理效果 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.14582) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* VeriThinker：学会验证使推理模型更高效 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.17941) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 专家集合：线性时间构建具有涌现和适应性行为的Chimera LLM变体 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.14794) [[模型卡片]](https:\u002F\u002Fhuggingface.co\u002Ftngtech\u002FDeepSeek-TNG-R1T2-Chimera) [[可通过OpenRouter免费访问]](https:\u002F\u002Fopenrouter.ai\u002Ftngtech\u002Fdeepseek-r1t2-chimera:free) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* R1-Compress：通过分块压缩与搜索实现长思维链压缩 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.16838) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 并非所有标记都是思考所需的 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.17827) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* A*-Thought：通过双向压缩实现在低资源环境下的高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.24550) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* ConCISE：基于置信度的逐步高效推理中的压缩 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.04881) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* TL;DR：太长了，就重新加权吧——用于高效LLM推理压缩 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.02678) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* OThink-R1：内在的快慢思维模式切换，用于缓解过度推理。[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.02397) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 因果充分性和必要性提升思维链推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.09853) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* ReCUT：通过分步轨迹和偏好优化，在LLM中平衡推理长度与准确性。[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.10822) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 通过步骤熵压缩LLM中的思维链。[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2508.03346) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* 剪掉那些不出乎意料的部分：通过首个标记的惊讶度实现高效的代码推理。[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2508.05988) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n\n## 第三节：将推理步骤压缩为更少的潜在表示\n\n* 在连续潜在空间中训练大型语言模型进行推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.06769) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.12-red)\n* 压缩思维链：通过密集表示实现高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.13171) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.12-red)\n* 利用隐性思维进行高效推理（MLLM） [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.19201) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.01-red)\n* SoftCoT：用于LLM高效推理的软性思维链 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.12134) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 令牌混合：混合潜在和文本令牌以提升语言模型的推理能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.03275) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 基于潜在思维的推理：关于循环Transformer的强大能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.17416) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* CODI：通过自我蒸馏将思维链压缩到连续空间 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.21074) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 利用隐性思维进行高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.19201) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.01-red)\n* 令牌混合：混合潜在和文本令牌以提升语言模型的推理能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.03275) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 反向注意力：理解并增强大型语言模型中的多跳推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.10835) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* SEAL：免费实现大型语言模型的可引导推理校准 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.07986) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 默念速思：LLM推理链的动态潜在压缩 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.16552) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 超频LLM推理：监控与控制LLM中的思维路径长度 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.07240) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 控制推理模型中的思考速度。[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2507.03704) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.07-red)\n\n## 第四节：推理过程中的动态推理范式\n\n* 使用Certaindex高效服务LLM推理程序 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.20993) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.12-red)\n* 当多即是少：理解LLM中的思维链长度 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.07266) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 思维草图：基于自适应认知启发式草图的高效LLM推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.05179) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* 奖励引导的推测解码用于高效LLM推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.19324) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 通过推测拒绝实现快速的最佳N解码 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.20290) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.10-red)\n* FastMCTS：一种用于数据合成的简单采样策略 [[论文]](https:\u002F\u002Fwww.arxiv.org\u002Fpdf\u002F2502.11476) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 用于高效LLM推理的动态并行树搜索 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.16235) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 采样高效的测试时缩放：在早期解码中自我估计最佳N采样 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.01422) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* LightThinker：逐步压缩思考过程（训练LLM将思想压缩为要点标记） [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.15589) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* InftyThink：突破大型语言模型长上下文推理的长度限制 [[论文]](https:\u002F\u002Fwww.arxiv.org\u002Fpdf\u002F2503.06692) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* 无自我怀疑的推理：通过确定性探测实现更高效的思维链 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=wpK4IMJfdX) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* SpecReason：通过推测推理实现快速且准确的推理时计算 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.07891) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* AdaptiveStep：根据模型置信度自动划分推理步骤 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.13943) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 推测性思考：在推理时利用大模型指导小模型推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.12329) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 通过推测性思维链实现LLM的高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.19095) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 原子级步骤分解能否提升多模态大模型的自组织推理能力？ [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.06252) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* 更聪明地思考，而非更努力地思考：具有推理感知优化的自适应推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.17974) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.01-red)\n* 推理感知的自一致性：利用推理路径进行高效LLM采样 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.17017) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 摆脱高昂成本：用于多步推理的早期停止自一致性 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.10480) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.01-red)\n* 置信度提升LLM的自一致性 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.06233) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 让每一分钱都发挥作用：面向成本效益的难度自适应自一致性 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.13457) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 路径一致性：用于LLM高效推理的前缀增强 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.01281) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* 沟通内部概率与自一致性以实现有效且高效的LLM推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.00511) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 朝着推理时计算的最优缩放迈进 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.18080) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 深入思考，快速行动：探究无需验证器的推理时缩放方法的效率 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.14047) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 推理模型无需思考即可有效 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.09858) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 回溯搜索：探索未走过的路径以实现更深入、更高效的推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.04383) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 思想操控：外部思想对大型推理模型可能非常有效 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.13626) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 睡眠时间计算：超越推理时缩放 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.13171) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 解锁思想的能力：一个用于量化和优化思维链的推理边界框架 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.05695) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 思考终结者：基准测试、校准和缓解推理模型中的过度思考 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.13367) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 推理模型中的动态提前退出 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.15895) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 停止原地打转：通过挖掘模式实现LLM推理的早期退出以缓解过度思考 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.17627) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* AlphaOne：推理模型在测试时慢速与快速思考 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.24863) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 将价值重新带回强化学习中：通过统一LLM推理者与验证者实现更好的测试时缩放 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.04842) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 凭直觉引导：利用强化的内在信心实现高效的测试时缩放 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.20325) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 断裂的思维链推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.12992) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 价值导向的搜索用于高效思维链推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.17373) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 不要过度思考。偏好较短的思考链以提升LLM推理效果 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.17813) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 先完成搜索：大型语言模型中的高效测试时缩放 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.18149) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 通过推测搜索加速大型语言模型的推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.02865) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* FlashThink：一种用于高效推理的早期退出方法 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.13949) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 推理路径压缩：压缩生成轨迹以实现高效LLM推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.13866) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 加速思维链推理：当目标梯度重要性遇上动态跳过 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.08392) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* ThinkLess：一种无需训练的推理高效方法，用于减少推理冗余 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15684) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 计划与预算：大型语言模型推理中的有效且高效测试时缩放 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.16122) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* TrimR：基于验证者的无训练思考压缩，用于高效测试时缩放 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.17155) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSCALR@COLM-2025-blue)\n* CoThink：通过指令模型引导推理模型实现令牌高效的推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22017) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 长时间推理并非全部所需：基于确定性的自适应路由用于高效LLM\u002FMLLM推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15154) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* AlphaOne：推理模型在测试时慢速与快速思考 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.24863) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 每一次展开都很重要：高效测试时缩放的最优资源分配 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.15707) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* SPECS：通过推测草稿实现更快的测试时缩放 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.15733) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* BEST-Route：具有测试时最优计算的自适应LLM路由 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.22716) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 无模型的推测采样加速测试时缩放 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04708) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 答案收敛作为推理中提前停止的信号 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.02536) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 通过规划实现协作式LLM推理以提高效率 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.11578) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 等等，我们根本不需要“等待”！移除思考标记可提高推理效率 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.08343) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 通过抑制大型推理模型中的自我肯定反思实现高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.12353) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 用预算指导来引导LLM思考 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.13752) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 探索并利用大型推理模型自身的内在效率，以实现自我引导的效率提升 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.15647) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 用于思维链压缩的激活引导 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.04742) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.07-red)\n* R-Stitch：用于高效推理的动态轨迹拼接 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.17307) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.07-red)\n* 大型推理模型知道如何高效思考 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=pLKDeGm2t1) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FESFoMoIII@ICML-2025-blue)\n* MUR：动量不确定性引导的大型语言模型推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.14958) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.07-red)\n* 测试时提示干预 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.02511) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* 通过基于确定性的反思抑制实现大型推理语言模型的高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.05337) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.08-red)\n* 在`\u003C\u002FThink>`之后的熵值可用于推理模型的提前退出 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.26522) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.09-red) (.)\n* Parallel-R1：通过强化学习迈向并行思考 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.07980) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.09-red) (.)\n* DTS：通过解码树草图增强大型推理模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2511.00640) [[代码]](https:\u002F\u002Fgithub.com\u002FZichengXu\u002FDecoding-Tree-Sketching) [[Colab]](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FZichengXu\u002FDecoding-Tree-Sketching\u002Fblob\u002Fmain\u002Fnotebooks\u002Fexample_DeepSeek_R1_Distill_Qwen_1_5B.ipynb) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2026.02-red)\n\n## 第五节：提示引导的高效推理\n\n* 基于令牌预算的大型语言模型推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.18547) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 草稿链：通过减少书写来加快思考速度 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.18600) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* 大型语言模型如何压缩自身的思维链？基于令牌复杂度的方法 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.01141) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* 简洁思维链对大型语言模型解决问题的好处 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.05618) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.10-red)\n* 简洁是可持续性的灵魂：刻画大型语言模型的回答长度 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.08686) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* PREMISE：面向大型模型高效数学推理的可扩展且策略性的提示优化 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.10716) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* ConciseHint：在生成过程中通过持续的简洁提示提升高效推理能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.18810) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n\n## 第六节：基于属性的提示驱动推理路由\n* Claude 3.7 Sonnet 和 Claude Code [[官网]](https:\u002F\u002Fwww.anthropic.com\u002Fnews\u002Fclaude-3-7-sonnet) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fhtml-2025.02-red)\n* 思维草图：基于自适应认知启发式草图的高效大型语言模型推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.05179) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* 使用置信度令牌学习路由大型语言模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.13284) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 自信还是寻求更强？从基准测试到泛化，探索基于不确定性的设备端大型语言模型路由 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.04428) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* RouteLLM：利用偏好数据学习路由大型语言模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.18665) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* ThinkSwitcher：何时深入思考，何时快速思考 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.14183) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.05-red)\n* 长或短的思维链？探究大型推理模型的实例级切换 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.04182) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* SynapseRoute：一种双状态大型语言模型的自动路由切换框架 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2507.02822) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.07-red)\n\n## 第七节：通过高效训练数据和模型压缩提升推理能力\n\n* LIMO：少即是多的推理方法 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.03387) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* s1：简单的测试时缩放 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.19393) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* S2R：通过强化学习教导大型语言模型自我验证和自我修正 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.12853) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* Light-R1：从零开始及更进一步的长思维链课程化监督微调、DPO 和强化学习 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.10460) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* 小型模型难以从强大的推理者那里学习 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.12143) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 朝着小型语言模型的推理能力前进 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.11569) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 混合蒸馏有助于小型语言模型更好地进行推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.10730) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.02-red)\n* 小型语言模型需要强大的验证者来进行推理自我修正 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.17140) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.06-red)\n* 通过反事实蒸馏教导小型语言模型进行推理 [[论文]](https:\u002F\u002Faclanthology.org\u002F2024.emnlp-main.333.pdf) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.11-red)\n* 通过反馈驱动的蒸馏提升小型语言模型的数学推理能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.14698) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.11-red)\n* 探测后再检索并推理：将探测与推理能力蒸馏到小型语言模型中 [[论文]](https:\u002F\u002Faclanthology.org\u002F2024.lrec-main.1140.pdf) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.05-red)\n* 通过自适应思维从大型语言模型中蒸馏推理能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.09170) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.08-red)\n* SKIntern：内化符号知识，以更好地将思维链能力蒸馏到小型语言模型中 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.13183) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.12-red)\n* TinyR1-32B-Preview：通过分支合并蒸馏提升准确性 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.04872) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* 通过反馈驱动的蒸馏提升小型语言模型的数学推理能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.14698) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.11-red)\n* 探测后再检索并推理：将探测与推理能力蒸馏到小型语言模型中 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2212.00193) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2023.05-red)\n* TwT：通过多教师指导的习惯性推理蒸馏实现无令牌思考 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.24198) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.03-red)\n* 当推理遇上压缩：在复杂推理任务上对压缩后的大型推理模型进行基准测试 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.02010) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n\n## 第八部分：评估与基准测试\n* 10亿参数的LLM能否超越4050亿参数的LLM？重新思考计算最优的推理时缩放策略 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.06703) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 过度思考的危害：探究代理任务中的推理—行动困境 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.08235) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* LLM推理与规划的推理时计算：基准测试与洞察 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.12521) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.02-red)\n* 综合技巧：LLM越狱攻击的基准测试 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.09324) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.11-red)\n* 推理步骤长度对大型语言模型的影响 [[论文]](https:\u002F\u002Farxiv.org\u002Fhtml\u002F2401.04925v3) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2024.01-red)\n* S1-bench：用于评估大型推理模型系统1思维能力的简单基准测试 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.10368) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 当推理遇上压缩：在复杂推理任务上对压缩后的大型推理模型进行基准测试 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.02010) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 量化会损害推理能力吗？关于量化推理模型的实证研究 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.04823) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.04-red)\n* 关于0.5B推理语言模型的技术研究 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.13404) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.06-red)\n* 再次探讨模型插值法以实现高效推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.10977) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.10-red)\n* LLM真的需要10步以上的思考才能“算出1000天后的日期”吗？迈向对LLM过度思考的结构化理解 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.07880) ![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpdf-2025.10-red)\n\n\n\n## 引用\n如果您觉得本工作有用，请引用我们。\n```bib\n@misc{sui2025stopoverthinkingsurveyefficient,\n      title={停止过度思考：大型语言模型高效推理综述}, \n      author={杨穗、庄宇能、王冠楚、张嘉木、张天一、袁佳怡、刘洪毅、Andrew Wen、钟绍晨、陈瀚杰、胡霞},\n      year={2025},\n      eprint={2503.16419},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.16419}, \n}\n```\n\n## 致谢\n> 🧩 *版面设计灵感来源于[zzli2022\u002FAwesome-System2-Reasoning-LLM](https:\u002F\u002Fgithub.com\u002Fzzli2022\u002FAwesome-System2-Reasoning-LLM)，最新成果参考了[hemingkx\u002FAwesome-Efficient-Reasoning](https:\u002F\u002Fgithub.com\u002Fhemingkx\u002FAwesome-Efficient-Reasoning)。非常感谢其优秀的框架！*","# Awesome-Efficient-Reasoning-LLMs 快速上手指南\n\n**Awesome-Efficient-Reasoning-LLMs** 并非一个可直接安装的软件包或单一模型，而是一个由社区维护的**精选论文与资源列表**。它系统性地整理了关于“大型语言模型（LLM）高效推理”的最新研究成果，旨在帮助开发者快速定位减少推理步数、压缩思维链（CoT）及优化计算资源的相关技术。\n\n本指南将指导你如何获取该资源库，并基于其中的论文复现或应用高效推理技术。\n\n## 环境准备\n\n由于本项目主要包含论文链接、分类索引和学术综述，**无需安装特定的 Python 包或运行时环境**即可浏览内容。\n\n若你计划根据列表中的论文复现代码或训练模型，建议准备以下基础开发环境：\n\n*   **操作系统**: Linux (推荐 Ubuntu 20.04+) 或 macOS\n*   **Python**: 3.9 或更高版本\n*   **依赖管理**: `git`, `pip` 或 `conda`\n*   **硬件要求**: 取决于具体复现的论文模型（通常推理需要 NVIDIA GPU，训练则需要多卡集群）\n\n## 安装步骤\n\n### 1. 克隆仓库\n使用 `git` 将资源库下载到本地，以便离线浏览论文列表和分类结构。\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FHKUNLP\u002FAwesome-Efficient-Reasoning-LLMs.git\ncd Awesome-Efficient-Reasoning-LLMs\n```\n\n> **国内加速提示**: 如果访问 GitHub 速度较慢，可使用国内镜像源（如 Gitee 镜像，若有）或通过代理加速克隆：\n> ```bash\n> git clone https:\u002F\u002Fgitee.com\u002Fmirror\u002FAwesome-Efficient-Reasoning-LLMs.git\n> ```\n> *(注：若官方未同步 Gitee 镜像，请使用标准克隆命令并配置网络代理)*\n\n### 2. 查看资源\n克隆完成后，直接在本地打开 `README.md` 文件即可查看完整的论文分类目录（包括 RL 长度奖励设计、变长 CoT 数据微调、推理步骤压缩等八大板块）。\n\n```bash\ncat README.md\n```\n\n或者在浏览器中打开本地文件：\n```bash\n# macOS\nopen README.md\n# Linux (需安装 xdg-utils)\nxdg-open README.md\n```\n\n## 基本使用\n\n本项目的核心用法是**作为技术选型指南**，帮助开发者找到适合的高效推理方案，然后前往对应的论文页面获取代码。\n\n### 使用场景示例：寻找“缩短思维链”的方法\n\n假设你希望让模型在保持准确率的同时减少推理 token 的数量，可以按照以下步骤操作：\n\n1.  **定位章节**: 在 `README.md` 中找到 **Section II: SFT with Variable-Length CoT Data**（基于变长思维链数据的监督微调）或 **Section I: RL with Length Reward Design**（基于长度奖励设计的强化学习）。\n2.  **筛选论文**: 浏览该章节下的论文列表，例如发现论文 *TokenSkip: Controllable Chain-of-Thought Compression in LLMs*。\n3.  **获取实现**: 点击论文标题链接（arXiv PDF）阅读细节，通常在论文摘要或首页会附带官方代码仓库链接（GitHub）。\n4.  **复现代码**: 进入该论文对应的独立 GitHub 仓库进行安装和使用。\n\n#### 示例：复现某篇具体论文的流程\n以列表中提到的 *O1-Pruner* 为例（假设其有公开代码库）：\n\n```bash\n# 1. 根据 README 中的链接找到 O1-Pruner 的官方代码仓库\ngit clone https:\u002F\u002Fgithub.com\u002F\u003Cauthor>\u002FO1-Pruner.git\ncd O1-Pruner\n\n# 2. 创建虚拟环境并安装依赖 (具体依赖以该论文仓库为准)\nconda create -n o1pruner python=3.10\nconda activate o1pruner\npip install -r requirements.txt\n\n# 3. 运行推理或训练脚本 (参考该仓库的具体文档)\npython infer.py --model_name_or_path qwen-7b --method length_pruning\n```\n\n### 贡献与更新\n该项目持续更新（最新更新于 2025 年 8 月），如果你发现了新的高效推理论文，可以通过 Pull Request 贡献到该列表中：\n\n```bash\n# 修改 README.md 添加新论文条目\n# 提交更改\ngit add README.md\ngit commit -m \"Add new paper: [Paper Title]\"\ngit push origin main\n```\n\n通过这种方式，你可以始终保持在 LLM 高效推理领域的最前沿。","某金融科技公司量化团队正致力于构建高频交易决策系统，需要大模型在毫秒级延迟内完成复杂的市场舆情分析与逻辑推演。\n\n### 没有 Awesome-Efficient-Reasoning-LLMs 时\n- **推理延迟过高**：模型习惯生成冗长的思维链（CoT），即使面对简单问题也“过度思考”，导致单次响应耗时超过 2 秒，无法满足实时交易需求。\n- **算力成本激增**：无效的长文本生成占用了大量 GPU 显存与计算资源，使得大规模并发部署的成本居高不下。\n- **技术选型迷茫**：面对海量的效率优化论文，团队难以系统梳理哪些方法（如长度奖励 RL、动态推理范式）适合当前业务，研发试错周期漫长。\n- **响应稳定性差**：缺乏对推理长度的有效控制，模型偶尔陷入死循环或输出无关废话，影响下游策略执行的准确性。\n\n### 使用 Awesome-Efficient-Reasoning-LLMs 后\n- **显著降低延迟**：基于综述中\"RL 长度奖励设计”与“动态推理范式”章节的指导，团队引入相关算法，使模型学会“适可而止”，平均响应时间压缩至 300 毫秒以内。\n- **资源利用率优化**：通过应用“思维链压缩”与“变长数据微调”技术，去除了冗余推理步骤，在同等硬件条件下并发吞吐量提升 3 倍。\n- **研发路径清晰**：利用其分类体系（Taxonomy），团队快速定位到“提示词引导的高效推理”方案，仅用一周即完成了从理论验证到生产环境的落地。\n- **输出精准可控**：模型能够根据问题难度动态调整思考深度，既保证了复杂逻辑的准确性，又杜绝了简单任务上的资源浪费。\n\nAwesome-Efficient-Reasoning-LLMs 不仅是一份文献清单，更是帮助开发者打破大模型“过度思考”瓶颈、实现低成本高性能推理的实战导航图。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEclipsess_Awesome-Efficient-Reasoning-LLMs_63137d00.png","Eclipsess","Yang Sui","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FEclipsess_940beeeb.jpg","Efficient AI;\r\nPostdoc at Rice University",null,"US","https:\u002F\u002Feclipsess.github.io\u002Fyang-sui.github.io\u002F","https:\u002F\u002Fgithub.com\u002FEclipsess",755,37,"2026-04-06T12:55:29","","未说明",{"notes":90,"python":88,"dependencies":91},"该项目是一个论文综述列表（Awesome List），用于整理和展示关于大语言模型高效推理的研究论文，本身不是一个可执行的软件工具或代码库，因此没有具体的运行环境、依赖库或硬件需求。用户需查阅列表中各篇具体论文的仓库以获取相应代码的运行要求。",[],[15],[94,95,96],"efficiency","large-language-models","large-reasoning-models","2026-03-27T02:49:30.150509","2026-04-11T16:59:37.798130",[100,105,110,115,120,125,130],{"id":101,"question_zh":102,"answer_zh":103,"source_url":104},22437,"如何向该综述项目提交新的相关研究工作？","维护者通常欢迎用户提交相关论文。对于新工作的建议，维护者可能会直接在下一个版本中更新，或者要求用户通过提交 Pull Request (PR) 的方式来正式添加。建议在 Issue 中提供论文标题、链接及简要介绍，若被要求则进一步提交 PR。","https:\u002F\u002Fgithub.com\u002FEclipsess\u002FAwesome-Efficient-Reasoning-LLMs\u002Fissues\u002F24",{"id":106,"question_zh":107,"answer_zh":108,"source_url":109},22438,"MARP 方法属于基于输入提示的高效推理还是基于输出推理的高效推理？","经过社区讨论和仔细核对定义，MARP 方法应归类为“基于输入提示的高效推理”（Prompt-Guided efficient reasoning in the input-based method），而非最初认为的基于输出的方法。这是因为其核心机制更符合通过提示引导来优化推理路径的定义。","https:\u002F\u002Fgithub.com\u002FEclipsess\u002FAwesome-Efficient-Reasoning-LLMs\u002Fissues\u002F1",{"id":111,"question_zh":112,"answer_zh":113,"source_url":114},22439,"提交逻辑推理（Logical Reasoning）相关的论文会被收录吗？","这取决于论文是否直接解决“高效推理”（efficient reasoning）的问题。如果论文主要关注逻辑推理的数据增强、提示增强或评估，但未直接涉及推理效率优化，维护者可能暂时不会将其纳入当前版本，而是留待未来专门讨论逻辑推理章节时再考虑。","https:\u002F\u002Fgithub.com\u002FEclipsess\u002FAwesome-Efficient-Reasoning-LLMs\u002Fissues\u002F8",{"id":116,"question_zh":117,"answer_zh":118,"source_url":119},22440,"如果发现综述表格中的算法信息（如 RL 算法类型）有误，该如何反馈？","用户可以直接在 Issue 中指出具体错误并提供证据（如原论文的截图或引用）。维护者非常重视此类修正，确认后会迅速在后续版本中更新表格内容。例如，曾有用户指出 L1 论文使用的是 GRPO 算法而非其他，维护者随即确认并承诺修复。","https:\u002F\u002Fgithub.com\u002FEclipsess\u002FAwesome-Efficient-Reasoning-LLMs\u002Fissues\u002F4",{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},22441,"关于长上下文（Long-Context）LLM 的高效推理工作（如 OmniKV）是否相关？","是的，这类工作是相关的。特别是那些提出在不丢弃 token 的情况下加速长文本推理，并在 CoT（思维链）任务中验证了有效性的方法。维护者会检查此类论文，如果确认其在保持效率的同时提升了 CoT 任务性能，通常会安排在一个月内更新收录。","https:\u002F\u002Fgithub.com\u002FEclipsess\u002FAwesome-Efficient-Reasoning-LLMs\u002Fissues\u002F6",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},22442,"是否有针对推荐系统领域的高效推理研究被收录？","有的。探索将“先思考后行动”（think-before-action）范式应用于推荐系统，从而实现隐式多步推理并提升长尾用户\u002F物品性能的工作（如 ReaRec），被视为有趣且相关的初步研究，维护者会在下一版本中讨论并收录此类跨领域的高效推理工作。","https:\u002F\u002Fgithub.com\u002FEclipsess\u002FAwesome-Efficient-Reasoning-LLMs\u002Fissues\u002F9",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},22443,"什么样的推理优化方法最容易被该综述收录？","最直接相关的方法是那些明确旨在减少推理步骤长度、优化思维链（CoT）结构或在固定预算下控制 token 消耗的技术。例如：动态早退（Dynamic Early Exit）、通过代数约束分解复杂任务（如 Syzygy of Thoughts）、以及在测试时调节推理进度的框架，这些都因直接提升推理效率而被优先收录。","https:\u002F\u002Fgithub.com\u002FEclipsess\u002FAwesome-Efficient-Reasoning-LLMs\u002Fissues\u002F13",[]]