[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tool-BytedTsinghua-SIA--DAPO":3,"similar-BytedTsinghua-SIA--DAPO":79},{"id":4,"github_repo":5,"name":6,"description_en":7,"description_zh":8,"ai_summary_zh":8,"readme_en":9,"readme_zh":10,"quickstart_zh":11,"use_case_zh":12,"hero_image_url":13,"owner_login":14,"owner_name":14,"owner_avatar_url":15,"owner_bio":16,"owner_company":17,"owner_location":17,"owner_email":17,"owner_twitter":17,"owner_website":17,"owner_url":18,"languages":19,"stars":24,"forks":25,"last_commit_at":26,"license":17,"difficulty_score":27,"env_os":28,"env_gpu":29,"env_ram":30,"env_deps":31,"category_tags":40,"github_topics":17,"view_count":43,"oss_zip_url":17,"oss_zip_packed_at":17,"status":44,"created_at":45,"updated_at":46,"faqs":47,"releases":78},9311,"BytedTsinghua-SIA\u002FDAPO","DAPO","An Open-source RL System from ByteDance Seed and Tsinghua AIR","DAPO 是由字节跳动 Seed 团队与清华大学 AIR 联合开源的大规模大语言模型强化学习（RL）系统。它旨在解决当前大模型在复杂推理任务中训练不稳定、探索效率低以及难以复现顶尖性能的难题。通过开放算法核心、代码基础设施及配套数据集，DAPO 让研究社区能够低成本地访问和复现世界级的强化学习成果。\n\n该系统特别适合从事大模型对齐、推理能力优化及强化学习算法研究的开发者与科研人员。基于高效的 verl 框架，DAPO 提出了独特的“解耦裁剪”与“动态采样”策略优化算法。这一创新不仅显著提升了训练过程的稳定性，还通过智能平衡“探索”与“利用”，引导模型生成更长、更复杂的推理路径。\n\n在实际表现上，DAPO 展现了惊人的效率：仅用以往最先进方法一半的训练步数，便基于 Qwen2.5-32B 基座模型在极具挑战性的 AIME 2024 数学竞赛中取得了超过 50% 的得分，刷新了同类规模模型的性能纪录。无论是希望深入探究 RLHF 前沿机制的研究者，还是寻求高效训练方案的技术团队，DAPO 都提供了一个透明、强大且易于上手的专业平台。","\u003Cdiv align='center'>\n\u003Ch1>DAPO: an Open-source RL System from \u003Cbr>ByteDance Seed and Tsinghua AIR\u003C\u002Fh1>\n\n\u003C!-- TODO:  
Thread,Paper,Dataset,Weights-->\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpaper-5f16a8?style=for-the-badge&logo=arxiv&logoColor=white)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.14476)\n[![Blog](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBlog-3858bf?style=for-the-badge&logo=homepage&logoColor=white)](https:\u002F\u002FDAPO-SIA.github.io\u002F)\n[![Dataset](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDatasets-4d8cd8?style=for-the-badge&logo=huggingface&logoColor=white)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBytedTsinghua-SIA\u002FDAPO-Math-17k)\n[![Weights](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel%20Weights-63cad3?style=for-the-badge&logo=huggingface&logoColor=white)](https:\u002F\u002Fhuggingface.co\u002FBytedTsinghua-SIA\u002FDAPO-Qwen-32B)\n\u003C!-- [![Thread](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FThread-91ded6?style=for-the-badge&logo=x&logoColor=white)](https:\u002F\u002Fgithub.com\u002FBytedTsinghua-SIA\u002FDAPO) -->\n\u003C\u002Fdiv>\n\n> [!IMPORTANT]\n> **🔥 News!!!**\n> - [2025\u002F05] We update the [wandb training record](https:\u002F\u002Fwandb.ai\u002Fverl-org\u002FDAPO%20Reproduction%20on%20verl?nw=wmb4qxfht0n) of full DAPO and the [checkpoint](https:\u002F\u002Fhuggingface.co\u002FBytedTsinghua-SIA\u002FDAPO-Qwen-32B) which achieved 50%+ on AIME 2024. We also provide [instructions](#evaluation-on-aime-2024) for evaluation on AIME 2024.\n> - [2025\u002F03] We release the training record of an early version of DAPO (w\u002Fo Token-level PG Loss & Dynamic Sampling), achieving 44% on AIME 2024, in [wandb](https:\u002F\u002Fwandb.ai\u002Fverl-org\u002FDAPO%20Reproduction%20on%20verl?nw=u7n2j5sht28).\n\nWe release a fully open-sourced system for large-scale LLM RL, including algorithm, code infrastructure, and dataset. The system achieves state-of-the-art large-scale LLM RL performance. 
We propose the **D**ecoupled Clip and **D**ynamic s**A**mpling **P**olicy **O**ptimization (**DAPO**) algorithm.\nThrough open-sourcing, we provide the broader research community and society with practical access to scalable reinforcement learning, enabling all to benefit from these advancements. Our system is based on the awesome [verl](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) framework. Thanks for their great work!\n\n## Discussions Welcomed\n\n🤗 If you have any questions about our paper, issues are welcomed and we could discuss there. Thank you!\n\n## Key Results\n\n### AIME 2024 Performance\n\n🚀 **DAPO** achieves 50 points on AIME 2024 based on the Qwen2.5-32B base model, outperforming the previous SoTA DeepSeek-R1-Zero-Qwen-32B with 50% training steps.\n\n![alt text](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FBytedTsinghua-SIA_DAPO_readme_7d6bd0eac5ac.png)\n\n### Metric Supervision during Training\n\n1. **Length stability and growth**: The steady increase in response length allows for greater exploration, facilitating the model’s ability to learn more complex reasoning behaviors, ultimately contributing to training stability and performance improvement.\n\n2. **Reward score stability**: A stable increase in the reward signal indicates that the model is successfully fitting the training distribution, ensuring that the learning process remains robust and consistent without significant fluctuations.\n\n3. 
**Entropy and mean probability trend**: A controlled increase in entropy, after an initial decrease, ensures a healthy balance between exploration and exploitation, avoiding issues such as overfitting or excessive randomness, and promoting sustained model performance.\n\n![alt text](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FBytedTsinghua-SIA_DAPO_readme_6bb022784350.png)\n\n## Model Use\n\nWe provide the model weights of [DAPO-Qwen-32B](https:\u002F\u002Fhuggingface.co\u002FBytedTsinghua-SIA\u002FDAPO-Qwen-32B), which is trained based on Qwen2.5-32B using the DAPO algorithm.\n\n### Environment Setup\n\nWe recommend using conda to setup the environment:\n\n```bash\nconda create -n dapo python=3.10\nconda activate dapo\npip3 install -r requirements.txt\n```\n\n### Inference\n\nWe provide the model inference code here:\n\n```python\nimport torch\nfrom transformers import AutoTokenizer\nfrom vllm import SamplingParams, LLM\n\nexamples = [\n    {\n        \"question\": \"Solve the following math problem step by step. The last line of your response should be of the form Answer: $Answer (without quotes) where $Answer is the answer to the problem.\\n\\nFind the largest possible real part of \\\\[(75+117i)z+\\\\frac{96+144i}{z}\\\\]where $z$ is a complex number with $|z|=4$.\\n\\nRemember to put your answer on its own line after \\\"Answer:\\\".\",\n        \"answer\": \"540\"\n    },\n    {\n        \"question\": \"Solve the following math problem step by step. The last line of your response should be of the form Answer: $Answer (without quotes) where $Answer is the answer to the problem.\\n\\nEvery morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her 4 hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her 2 hours and 24 minutes, including $t$ minutes spent in the coffee shop. 
Suppose Aya walks at $s+\\\\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop.\\n\\nRemember to put your answer on its own line after \\\"Answer:\\\".\",\n        \"answer\": \"204\"\n    },\n    {\n        \"question\": \"Solve the following math problem step by step. The last line of your response should be of the form Answer: $Answer (without quotes) where $Answer is the answer to the problem.\\n\\nLet $\\\\mathcal{B}$ be the set of rectangular boxes with surface area $54$ and volume $23$. Let $r$ be the radius of the smallest sphere that can contain each of the rectangular boxes that are elements of $\\\\mathcal{B}$. The value of $r^2$ can be written as $\\\\frac{p}{q}$, where $p$ and $q$ are relatively prime positive integers. Find $p+q$.\\n\\nRemember to put your answer on its own line after \\\"Answer:\\\".\",\n        \"answer\": \"721\"\n    }\n]\n\n\ndef main():\n    model = \"BytedTsinghua-SIA\u002FDAPO-Qwen-32B\"\n\n    tokenzier = AutoTokenizer.from_pretrained(model)\n\n    llm = LLM(\n        model=model,\n        dtype=torch.bfloat16,\n        tensor_parallel_size=8,\n        gpu_memory_utilization=0.95\n    )\n\n    sampling_params = SamplingParams(\n        temperature=1.0,\n        top_p=0.7,\n        max_tokens=20480\n    )\n\n    for example in examples:\n        question = example[\"question\"]\n        answer = example[\"answer\"]\n        output = llm.generate(\n                    prompts=tokenzier.apply_chat_template(conversation=[{\"content\": question, \"role\": \"user\"}],\n                                                          add_generation_prompt=True,\n                                                          tokenize=False),\n                    sampling_params=sampling_params\n                )\n        print(f\"***QUESTION***:\\n{question}\\n***GROUND TRUTH***:\\n{answer}\\n***MODEL OUTPUT***:\\n{output[0].outputs[0].text}\\n\")\n        
print(\"-\"*100)\n\nif __name__ == \"__main__\":\n    main()\n```\n\n### Evaluation on AIME 2024\n\nTo evaluate the model on AIME 2024, we deploy DAPO-Qwen-32B with Ray Serve and vLLM.\n\nTo load the model from Hugging Face:\n\n```bash\nserve run eval.llm:build_app model=BytedTsinghua-SIA\u002FDAPO-Qwen-32B tensor-parallel-size=8\n\n# open another terminal\npython eval\u002Feval_aime24.py --temperature 1.0 --top_p 0.7 --max_tokens 20480 --model BytedTsinghua-SIA\u002FDAPO-Qwen-32B --test_file eval\u002Faime-2024.parquet\n```\n\nTo load the model from a local path:\n\n```bash\nserve run eval.llm:build_app model=aaa\u002Fbbb\u002Fccc tensor-parallel-size=8\n\n# open another terminal\npython eval\u002Feval_aime24.py --temperature 1.0 --top_p 0.7 --max_tokens 20480 --model ccc --test_file eval\u002Faime-2024.parquet\n```\n\n## Reproducibility\n\nTo benefit the broader research community, we fully open-source our RL training recipe, including algorithm details, dataset, and infrastructure.\n\n### Datasets\nWe provide training and validation datasets for DAPO training.\n\nTraining: [DAPO-Math-17k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBytedTsinghua-SIA\u002FDAPO-Math-17k), a carefully curated and processed math dataset.\nValidation: [AIME 2024](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBytedTsinghua-SIA\u002FAIME-2024).\n\n### Training\n\nWe provide [out-of-the-box](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl\u002Fblob\u002Fgm-tyx\u002Fpuffin\u002Fmain\u002Frecipe\u002Fdapo) scripts for reproducing DAPO training. The quickstart and core code are described in the [README](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl\u002Fblob\u002Fgm-tyx\u002Fpuffin\u002Fmain\u002Frecipe\u002Fdapo\u002FREADME.md). 
The scripts cover:\n\n- [Dataset Preparation](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl\u002Fblob\u002Fgm-tyx\u002Fpuffin\u002Fmain\u002Frecipe\u002Fdapo\u002Fprepare_dapo_data.sh)\n- [DAPO w\u002Fo Token-level PG Loss & Dynamic Sampling -- AIME 44](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl\u002Fblob\u002Fgm-tyx\u002Fpuffin\u002Fmain\u002Frecipe\u002Fdapo\u002Frun_dapo_early_qwen2.5_32b.sh)\n- [DAPO Full -- AIME 50](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl\u002Fblob\u002Fgm-tyx\u002Fpuffin\u002Fmain\u002Frecipe\u002Fdapo\u002Frun_dapo_qwen2.5_32b.sh)\n\nNote:\n\n- The `DAPO w\u002Fo Token-level PG Loss & Dynamic Sampling -- AIME 44` script has been verified on the current verl and achieves 44 points on AIME 2024. Its training record is available on [wandb](https:\u002F\u002Fwandb.ai\u002Fverl-org\u002FDAPO%20Reproduction%20on%20verl?nw=u7n2j5sht28).\n\n- The `DAPO Full -- AIME 50` script has also been validated on the latest verl version and scores 50 points on AIME 2024. Its training record is available on [wandb](https:\u002F\u002Fwandb.ai\u002Fverl-org\u002FDAPO%20Reproduction%20on%20verl?nw=wmb4qxfht0n).\n\n## Acknowledgement\n\nWe thank the [verl](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) team for providing the awesome open-source RL infrastructure.\n\nOur open-sourced experiments were conducted on the Volcano Engine Machine Learning Platform. 
We will provide a full reproduction guideline later on the Volcano Engine platform to help users replicate our experiments.\n\n\u003C!-- ## Citation -->\n","\u003Cdiv align='center'>\n\u003Ch1>DAPO：来自字节跳动Seed与清华AIR的开源强化学习系统\u003C\u002Fh1>\n\n\u003C!-- TODO:  Thread,Paper,Dataset,Weights-->\n[![论文](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpaper-5f16a8?style=for-the-badge&logo=arxiv&logoColor=white)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.14476)\n[![博客](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBlog-3858bf?style=for-the-badge&logo=homepage&logoColor=white)](https:\u002F\u002FDAPO-SIA.github.io\u002F)\n[![数据集](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDatasets-4d8cd8?style=for-the-badge&logo=huggingface&logoColor=white)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBytedTsinghua-SIA\u002FDAPO-Math-17k)\n[![权重](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel%20Weights-63cad3?style=for-the-badge&logo=huggingface&logoColor=white)](https:\u002F\u002Fhuggingface.co\u002FBytedTsinghua-SIA\u002FDAPO-Qwen-32B)\n\u003C!-- [![线程](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FThread-91ded6?style=for-the-badge&logo=x&logoColor=white)](https:\u002F\u002Fgithub.com\u002FBytedTsinghua-SIA\u002FDAPO) -->\n\u003C\u002Fdiv>\n\n> [!IMPORTANT]\n> **🔥 新闻!!!**\n> - [2025\u002F05] 我们更新了完整DAPO的[WandB训练记录](https:\u002F\u002Fwandb.ai\u002Fverl-org\u002FDAPO%20Reproduction%20on%20verl?nw=wmb4qxfht0n)以及在AIME 2024中取得50%以上成绩的[检查点](https:\u002F\u002Fhuggingface.co\u002FBytedTsinghua-SIA\u002FDAPO-Qwen-32B)。我们还提供了在AIME 2024上进行评估的[说明](#evaluation-on-aime-2024)。\n> - [2025\u002F03] 我们发布了DAPO早期版本（不含Token-level PG Loss和动态采样）的训练记录，在AIME 2024中取得了44%的成绩，相关记录可在[WandB](https:\u002F\u002Fwandb.ai\u002Fverl-org\u002FDAPO%20Reproduction%20on%20verl?nw=u7n2j5sht28)上查看。\n\n我们发布了一个完全开源的大规模LLM强化学习系统，包含算法、代码框架和数据集。该系统实现了当前最先进的大规模LLM强化学习性能。我们提出了**D**ecoupled Clip and **D**ynamic s**A**mpling **P**olicy 
**O**ptimization（DAPO）算法。\n通过开源，我们为更广泛的研究社区和社会提供了可扩展强化学习的实用途径，使所有人都能受益于这些技术进步。我们的系统基于优秀的[verl](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl)框架。感谢他们的杰出工作！\n\n## 欢迎讨论\n\n🤗 如果您对我们的论文有任何疑问，欢迎在Issue中提出，我们将一起探讨。谢谢！\n\n## 关键结果\n\n### AIME 2024表现\n\n🚀 **DAPO** 基于Qwen2.5-32B基础模型，在AIME 2024中取得了50分的成绩，优于此前的SoTA DeepSeek-R1-Zero-Qwen-32B，且仅使用了其50%的训练步数。\n\n![alt text](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FBytedTsinghua-SIA_DAPO_readme_7d6bd0eac5ac.png)\n\n### 训练过程中的指标监控\n\n1. **长度稳定性与增长**：响应长度的稳步增加有助于更大程度的探索，促进模型学习更复杂的推理行为，最终提升训练的稳定性和性能。\n\n2. **奖励分数稳定性**：奖励信号的平稳上升表明模型正在成功拟合训练分布，确保学习过程稳健一致，不会出现大幅波动。\n\n3. **熵与平均概率趋势**：在初始下降后，熵的适度上升保证了探索与利用之间的健康平衡，避免过拟合或过度随机性的问题，从而持续提升模型性能。\n\n![alt text](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FBytedTsinghua-SIA_DAPO_readme_6bb022784350.png)\n\n## 模型使用\n\n我们提供了基于DAPO算法、以Qwen2.5-32B为基础训练的[DAPO-Qwen-32B]模型权重，地址为[这里](https:\u002F\u002Fhuggingface.co\u002FBytedTsinghua-SIA\u002FDAPO-Qwen-32B)。\n\n### 环境搭建\n\n我们建议使用conda来搭建环境：\n\n```bash\nconda create -n dapo python=3.10\nconda activate dapo\npip3 install -r requirements.txt\n```\n\n### 推理\n\n我们在此提供模型推理代码：\n\n```python\nimport torch\nfrom transformers import AutoTokenizer\nfrom vllm import SamplingParams, LLM\n\nexamples = [\n    {\n        \"question\": \"请逐步解答以下数学问题。你的回答最后一行应为“Answer: $Answer”（不带引号），其中$Answer是问题的答案。\\n\\n求复数$z$满足$|z|=4$时，\\\\[(75+117i)z+\\\\frac{96+144i}{z}\\\\]的最大可能实部。\\n\\n请务必在“Answer:”之后单独一行写出答案。\",\n        \"answer\": \"540\"\n    },\n    {\n        \"question\": \"请逐步解答以下数学问题。你的回答最后一行应为“Answer: $Answer”（不带引号），其中$Answer是问题的答案。\\n\\n每天早上，Aya会步行9公里，随后去咖啡店休息。当她以每小时$s$公里的速度行走时，整个行程包括在咖啡店停留的$t$分钟，共需4小时。而当她以每小时$s+2$公里的速度行走时，整个行程包括在咖啡店停留的$t$分钟，只需2小时24分钟。假设Aya以每小时$s+\\\\frac{1}{2}$公里的速度行走，求她这次行程总共需要多少分钟，包括在咖啡店停留的$t$分钟。\\n\\n请务必在“Answer:”之后单独一行写出答案。\",\n        \"answer\": \"204\"\n    },\n    {\n        \"question\": \"请逐步解答以下数学问题。你的回答最后一行应为“Answer: 
$Answer”（不带引号），其中$Answer是问题的答案。\\n\\n设$\\\\mathcal{B}$为所有表面积为54、体积为23的长方体组成的集合。记$r$为能够容纳$\\\\mathcal{B}$中每个长方体的最小球体半径。$r^2$可以表示为$\\\\frac{p}{q}$的形式，其中$p$和$q$是互质的正整数。求$p+q$。\\n\\n请务必在“Answer:”之后单独一行写出答案。\",\n        \"answer\": \"721\"\n    }\n]\n\n\ndef main():\n    model = \"BytedTsinghua-SIA\u002FDAPO-Qwen-32B\"\n\n    tokenizer = AutoTokenizer.from_pretrained(model)\n\n    llm = LLM(\n        model=model,\n        dtype=torch.bfloat16,\n        tensor_parallel_size=8,\n        gpu_memory_utilization=0.95\n    )\n\n    sampling_params = SamplingParams(\n        temperature=1.0,\n        top_p=0.7,\n        max_tokens=20480\n    )\n\n    for example in examples:\n        question = example[\"question\"]\n        answer = example[\"answer\"]\n        output = llm.generate(\n                    prompts=tokenizer.apply_chat_template(conversation=[{\"content\": question, \"role\": \"user\"}],\n                                                          add_generation_prompt=True,\n                                                          tokenize=False),\n                    sampling_params=sampling_params\n                )\n        print(f\"***QUESTION***:\\n{question}\\n***GROUND TRUTH***:\\n{answer}\\n***MODEL OUTPUT***:\\n{output[0].outputs[0].text}\\n\")\n        print(\"-\"*100)\n\nif __name__ == \"__main__\":\n    main()\n```\n\n### AIME 2024 评估\n\n为了在 AIME 2024 上评估模型，我们使用 Ray Serve 和 vLLM 部署了 DAPO-Qwen-32B。\n\n从 Huggingface 加载模型：\n\n```bash\nserve run eval.llm:build_app model=BytedTsinghua-SIA\u002FDAPO-Qwen-32B tensor-parallel-size=8\n\n# 打开另一个终端\npython eval\u002Feval_aime24.py --temperature 1.0 --top_p 0.7 --max_tokens 20480 --model BytedTsinghua-SIA\u002FDAPO-Qwen-32B --test_file eval\u002Faime-2024.parquet\n```\n\n从本地路径加载模型：\n\n```bash\nserve run eval.llm:build_app model=aaa\u002Fbbb\u002Fccc tensor-parallel-size=8\n\n# 打开另一个终端\npython eval\u002Feval_aime24.py --temperature 1.0 --top_p 0.7 --max_tokens 20480 --model ccc --test_file 
eval\u002Faime-2024.parquet\n```\n\n## 可复现性\n\n为了造福更广泛的研究社区，我们完全开源了我们的强化学习训练流程，包括算法细节、数据集和基础设施。\n\n### 数据集\n我们提供了用于 DAPO 训练的训练集和验证集。\n\n训练集：[DAPO-Math-17k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBytedTsinghua-SIA\u002FDAPO-Math-17k)，一个经过精心整理和处理的数学数据集。\n验证集：[AIME 2024](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBytedTsinghua-SIA\u002FAIME-2024)。\n\n### 训练\n\n我们提供了 [开箱即用](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl\u002Fblob\u002Fgm-tyx\u002Fpuffin\u002Fmain\u002Frecipe\u002Fdapo) 的脚本，用于复现 DAPO 训练。快速入门和核心代码已在 [README](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl\u002Fblob\u002Fgm-tyx\u002Fpuffin\u002Fmain\u002Frecipe\u002Fdapo\u002FREADME.md) 中提及。这些脚本用于：\n\n- [数据集准备](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl\u002Fblob\u002Fgm-tyx\u002Fpuffin\u002Fmain\u002Frecipe\u002Fdapo\u002Fprepare_dapo_data.sh)\n- [无 token 级 PG 损失与动态采样——AIME 44](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl\u002Fblob\u002Fgm-tyx\u002Fpuffin\u002Fmain\u002Frecipe\u002Fdapo\u002Frun_dapo_early_qwen2.5_32b.sh)\n- [完整 DAPO——AIME 50](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl\u002Fblob\u002Fgm-tyx\u002Fpuffin\u002Fmain\u002Frecipe\u002Fdapo\u002Frun_dapo_qwen2.5_32b.sh)\n\n注意：\n\n- `无 token 级 PG 损失与动态采样——AIME 44` 脚本已在当前版本的 verl 上验证通过，并在 AIME 2024 中取得了 44 分的成绩，其训练记录可在 [wandb](https:\u002F\u002Fwandb.ai\u002Fverl-org\u002FDAPO%20Reproduction%20on%20verl?nw=u7n2j5sht28) 上查看。\n- `完整 DAPO——AIME 50` 脚本也已在最新版 verl 上验证通过，它在 AIME 2024 中获得了 50 分。您可以在 [wandb](https:\u002F\u002Fwandb.ai\u002Fverl-org\u002FDAPO%20Reproduction%20on%20verl?nw=wmb4qxfht0n) 上查看相应的训练记录。\n\n## 致谢\n\n我们感谢 [verl](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) 提供了出色的开源强化学习基础设施。\n\n我们的开源实验是在火山引擎机器学习平台上进行的。我们将在火山引擎平台上稍后提供完整的复现指南，以帮助用户复现我们的实验。\n\n\u003C!-- ## 引用 -->","# DAPO 快速上手指南\n\nDAPO 是由字节跳动 Seed 团队与清华大学 AIR 联合开源的大规模大语言模型强化学习（RL）系统。该系统基于 [verl](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) 框架，提出了**解耦裁剪与动态采样策略优化（DAPO）**算法，在 AIME 2024 评测中基于 
Qwen2.5-32B 基座模型取得了 50 分的 SOTA 成绩。\n\n## 环境准备\n\n*   **操作系统**: Linux (推荐 Ubuntu 20.04+)\n*   **Python 版本**: 3.10\n*   **硬件要求**: 多卡 GPU 环境（推理示例默认配置为 8 卡，可根据显存调整 `tensor_parallel_size`）\n*   **前置依赖**: 建议安装 `conda` 进行环境管理\n\n## 安装步骤\n\n推荐使用 conda 创建独立环境并安装依赖：\n\n```bash\n# 创建名为 dapo 的 conda 环境，指定 Python 版本为 3.10\nconda create -n dapo python=3.10\nconda activate dapo\n\n# 安装项目依赖\npip3 install -r requirements.txt\n```\n\n> **提示**: 如果下载依赖较慢，可添加国内镜像源加速，例如：\n> `pip3 install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n## 基本使用\n\n### 1. 模型推理\n\n以下代码展示了如何使用 `vllm` 加载 `DAPO-Qwen-32B` 模型进行数学问题推理。请确保机器拥有足够的显存（示例配置为 8 卡并行）。\n\n```python\nimport torch\nfrom transformers import AutoTokenizer\nfrom vllm import SamplingParams, LLM\n\nexamples = [\n    {\n        \"question\": \"Solve the following math problem step by step. The last line of your response should be of the form Answer: $Answer (without quotes) where $Answer is the answer to the problem.\\n\\nFind the largest possible real part of \\\\[(75+117i)z+\\\\frac{96+144i}{z}\\\\]where $z$ is a complex number with $|z|=4$.\\n\\nRemember to put your answer on its own line after \\\"Answer:\\\".\",\n        \"answer\": \"540\"\n    },\n    {\n        \"question\": \"Solve the following math problem step by step. The last line of your response should be of the form Answer: $Answer (without quotes) where $Answer is the answer to the problem.\\n\\nEvery morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her 4 hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her 2 hours and 24 minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\\\\frac{1}{2}$ kilometers per hour. 
Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop.\\n\\nRemember to put your answer on its own line after \\\"Answer:\\\".\",\n        \"answer\": \"204\"\n    },\n    {\n        \"question\": \"Solve the following math problem step by step. The last line of your response should be of the form Answer: $Answer (without quotes) where $Answer is the answer to the problem.\\n\\nLet $\\\\mathcal{B}$ be the set of rectangular boxes with surface area $54$ and volume $23$. Let $r$ be the radius of the smallest sphere that can contain each of the rectangular boxes that are elements of $\\\\mathcal{B}$. The value of $r^2$ can be written as $\\\\frac{p}{q}$, where $p$ and $q$ are relatively prime positive integers. Find $p+q$.\\n\\nRemember to put your answer on its own line after \\\"Answer:\\\".\",\n        \"answer\": \"721\"\n    }\n]\n\n\ndef main():\n    # 模型路径，支持 HuggingFace ID 或本地路径\n    model = \"BytedTsinghua-SIA\u002FDAPO-Qwen-32B\"\n\n    tokenzier = AutoTokenizer.from_pretrained(model)\n\n    llm = LLM(\n        model=model,\n        dtype=torch.bfloat16,\n        tensor_parallel_size=8,  # 根据实际显卡数量调整\n        gpu_memory_utilization=0.95\n    )\n\n    sampling_params = SamplingParams(\n        temperature=1.0,\n        top_p=0.7,\n        max_tokens=20480\n    )\n\n    for example in examples:\n        question = example[\"question\"]\n        answer = example[\"answer\"]\n        output = llm.generate(\n                    prompts=tokenzier.apply_chat_template(conversation=[{\"content\": question, \"role\": \"user\"}],\n                                                          add_generation_prompt=True,\n                                                          tokenize=False),\n                    sampling_params=sampling_params\n                )\n        print(f\"***QUESTION***:\\n{question}\\n***GROUND TRUTH***:\\n{answer}\\n***MODEL OUTPUT***:\\n{output[0].outputs[0].text}\\n\")\n        
print(\"-\"*100)\n\nif __name__ == \"__main__\":\n    main()\n```\n\n### 2. AIME 2024 评测\n\n若需复现官方在 AIME 2024 上的评测结果，需结合 `Ray Serve` 和 `vLLM` 部署服务。\n\n**从 HuggingFace 加载模型：**\n\n```bash\n# 终端 1: 启动推理服务\nserve run eval.llm:build_app model=BytedTsinghua-SIA\u002FDAPO-Qwen-32B tensor-parallel-size=8\n\n# 终端 2: 运行评测脚本\npython eval\u002Feval_aime24.py --temperature 1.0 --top_p 0.7 --max_tokens 20480 --model BytedTsinghua-SIA\u002FDAPO-Qwen-32B --test_file eval\u002Faime-2024.parquet\n```\n\n**从本地路径加载模型：**\n\n```bash\n# 终端 1: 启动推理服务 (假设模型位于 aaa\u002Fbbb\u002Fccc)\nserve run eval.llm:build_app model=aaa\u002Fbbb\u002Fccc tensor-parallel-size=8\n\n# 终端 2: 运行评测脚本\npython eval\u002Feval_aime24.py --temperature 1.0 --top_p 0.7 --max_tokens 20480 --model ccc --test_file eval\u002Faime-2024.parquet\n```\n\n### 3. 训练复现\n\nDAPO 的训练完全基于 [verl](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) 框架。官方提供了开箱即用的脚本，涵盖数据准备及不同版本的算法复现（早期版本 AIME 44 分 \u002F 完整版 AIME 50 分）。\n\n详细训练脚本及说明请参考：\n*   **数据准备**: [prepare_dapo_data.sh](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl\u002Fblob\u002Fgm-tyx\u002Fpuffin\u002Fmain\u002Frecipe\u002Fdapo\u002Fprepare_dapo_data.sh)\n*   **完整训练 (AIME 50)**: [run_dapo_qwen2.5_32b.sh](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl\u002Fblob\u002Fgm-tyx\u002Fpuffin\u002Fmain\u002Frecipe\u002Fdapo\u002Frun_dapo_qwen2.5_32b.sh)\n*   **训练数据集**: [DAPO-Math-17k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBytedTsinghua-SIA\u002FDAPO-Math-17k)","某顶尖 AI 实验室的研究团队正致力于提升大语言模型在复杂数学竞赛（如 AIME）中的推理能力，试图通过强化学习让模型学会多步推导。\n\n### 没有 DAPO 时\n- **训练效率低下**：复现顶级推理模型需要消耗巨大的计算资源和时间，往往需要完整的训练周期才能达到基准性能，试错成本极高。\n- **推理过程不稳定**：模型在训练中容易出现响应长度剧烈波动或奖励信号震荡，导致难以收敛到高质量的解题策略。\n- **探索与利用失衡**：传统算法难以动态调整采样策略，模型要么过早陷入局部最优（过拟合），要么因随机性过大而无法掌握复杂的逻辑链条。\n- **技术门槛高**：缺乏开源且经过验证的大规模 RL 系统基础设施，团队需从零搭建底层框架，分散了核心算法研究的精力。\n\n### 使用 DAPO 后\n- **大幅缩短训练周期**：借助解耦裁剪和动态采样策略，DAPO 仅用以往 50% 的训练步数就在 Qwen2.5-32B 上实现了 AIME 2024 得分突破 50 分，显著加速研发迭代。\n- 
**训练过程平稳可控**：监控指标显示响应长度稳步增长，奖励分数平滑上升，确保模型能稳定地学习更复杂的长链推理行为。\n- **智能平衡探索机制**：动态采样策略自动调节熵值，既避免了过度随机生成的无效尝试，又防止了过早固化思维，确保持续的性能提升。\n- **开箱即用的全栈系统**：直接复用字节种子与清华 AIR 开源的完整代码、数据集及预训练权重，团队可立即基于成熟框架开展前沿实验。\n\nDAPO 通过高效的动态优化算法，将大规模大模型强化学习的训练成本减半，同时确立了新的性能标杆，让顶级推理能力的获取变得触手可及。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FBytedTsinghua-SIA_DAPO_7d6bd0ea.png","BytedTsinghua-SIA","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FBytedTsinghua-SIA_4c713504.png","The Joint SIA-Lab of Tsinghua AIR & Bytedance Seed - Scalable Intelligent Advanced Technology Joint Research Center",null,"https:\u002F\u002Fgithub.com\u002FBytedTsinghua-SIA",[20],{"name":21,"color":22,"percentage":23},"Python","#3572A5",100,1787,84,"2026-04-17T18:14:30",3,"Linux","必需 NVIDIA GPU。推理示例配置为 tensor_parallel_size=8 (需 8 卡)，显存利用率设为 0.95；基于 Qwen2.5-32B 模型，建议单卡显存 24GB+ 或多卡互联。依赖 vLLM，通常要求 CUDA 12.1+。","未说明 (建议 64GB+ 以支持 32B 模型加载及数据处理)",{"notes":32,"python":33,"dependencies":34},"该工具基于 verl 框架开发，主要用于大规模 LLM 强化学习训练与推理。环境建议使用 conda 管理。推理和评估依赖 vLLM 和 Ray Serve。训练脚本需在 Volcano Engine 或兼容的分布式环境中运行。模型权重为 DAPO-Qwen-32B，基于 Qwen2.5-32B 微调。","3.10",[35,36,37,38,39],"torch","transformers","vllm","ray[serve]","verl",[41,42],"语言模型","开发框架",2,"ready","2026-03-27T02:49:30.150509","2026-04-19T06:04:18.514270",[48,53,58,63,68,73],{"id":49,"question_zh":50,"answer_zh":51,"source_url":52},41776,"为什么在 DAPO 训练脚本中 ppo_epochs 未设置（默认为 1），导致 clip_ratio 参数似乎不生效？","这是一个常见的误解。虽然 ppo_epochs 默认为 1，但这并不意味着 clip_ratio 无效。关键在于理解几个超参数的关系：\n1. `data.train_batch_size` 决定了旧策略生成的样本总数。\n2. `actor_rollout_ref.actor.ppo_mini_batch_size` 决定了每次梯度更新使用的样本子集大小。\n3. `ppo_micro_batch_size` 是为了适应显存限制而进行的微批次划分。\n4. 
`ppo_epochs` 控制样本复用次数。\n\n在 DAPO\u002FGRPO 的设定中，通常采用 off-policy 更新或单次遍历（epoch=1），但通过调整 mini_batch_size 小于 train_batch_size 来实现多次更新步数，而不是通过增加 ppo_epochs。重用样本（ppo_epochs > 1）在 Actor 策略更新中往往会导致性能显著下降，因此默认设置为 1 是符合最佳实践的。","https:\u002F\u002Fgithub.com\u002FBytedTsinghua-SIA\u002FDAPO\u002Fissues\u002F11",{"id":54,"question_zh":55,"answer_zh":56,"source_url":57},41777,"论文公式 (5) 中提到计算平均优势（mean advantage），但 verl 代码中似乎是对 reward 求和（sum），这两者是否矛盾？","这不矛盾，主要是出于工程效率的考虑。在 ORM（Outcome Reward Model）设置下，奖励标量仅在序列的 `eos_token` 位置有意义，其他 token 的奖励为 0。由于工程中会将所有序列填充到最大长度，难以高效索引 `eos_token` 的位置，因此代码采用了将整个序列的 reward 标量相加的方法。\n数学上，因为非 eos 位置的奖励为 0，所以“求和”与“在 eos 位置取值”是等价的。至于公式中的均值处理，那是针对 token-level 优势计算的表述，与底层 reward 聚合方式（sum vs mean）在不同上下文中是兼容的。","https:\u002F\u002Fgithub.com\u002FBytedTsinghua-SIA\u002FDAPO\u002Fissues\u002F3",{"id":59,"question_zh":60,"answer_zh":61,"source_url":62},41778,"为什么在评测 AIME 数据集时使用 avg@32 而不是 pass@1？","avg@32 本质上等于基于 32 次生成计算的 pass@1。使用 avg@32 的主要原因是其具有更低的方差，能够提供更稳定的监控指标。\n相比之下，直接使用单次的 pass@1 在验证过程中容易产生剧烈波动（如 DeepScaler 项目中观察到的情况）。因此，为了训练过程的稳定性，作者选择了 avg@32 作为评估标准，其期望值与 pass@1 相同但更可靠。","https:\u002F\u002Fgithub.com\u002FBytedTsinghua-SIA\u002FDAPO\u002Fissues\u002F5",{"id":64,"question_zh":65,"answer_zh":66,"source_url":67},41779,"Verl 框架中的 Policy Loss 默认是实现 Token-level 还是 Sample-level 的策略梯度损失？","Verl 里面默认的 Policy Loss 实现是 Token-level Policy Gradient Loss。\n许多之前的强化学习算法使用的是 Sample-level loss（对整个轨迹取期望），但在 Verl 的默认配置中，损失函数是在 token 级别计算的。如果在代码中看到相关的条件判断，通常 `if` 分支对应的是目前的默认 token-level 实现，而 `else` 分支则对应 sample-level。","https:\u002F\u002Fgithub.com\u002FBytedTsinghua-SIA\u002FDAPO\u002Fissues\u002F4",{"id":69,"question_zh":70,"answer_zh":71,"source_url":72},41780,"DAPO 为什么要使用 off-policy 更新，而不是像 vLLM 那样进行 on-policy 训练？","这主要取决于底层框架 Verl 的架构设计。Verl 将采样（sampling）和训练（training）分为顺序执行的步骤。每当从采样切换到训练时，如果模型规模较大（如超过 100B），在不同节点间同步检查点（checkpoint synchronization）会产生巨大的磁盘 I\u002FO 开销（I\u002FO bounded）。\n为了减少这种开销并提高速度，基于 Verl 实现的算法被迫采用 off-policy 更新策略（即使每个轨迹只使用一次）。\n相比之下，vLLM 和 
SWIFT 等主流框架在不同设备上并行进行采样和训练，因此使用 on-policy 不会引入额外的时间开销。","https:\u002F\u002Fgithub.com\u002FBytedTsinghua-SIA\u002FDAPO\u002Fissues\u002F23",{"id":74,"question_zh":75,"answer_zh":76,"source_url":77},41781,"如何自定义 Reward Function（奖励函数）？","在 Verl 框架中，可以通过配置 `reward_manager` 参数来指定或自定义奖励函数。在 `main_ppo.py` 中，逻辑如下：\n1. 读取配置 `config.reward_model.get(\"reward_manager\", \"naive\")`。\n2. 如果值为 'naive'，则使用 `NaiveRewardManager`。\n3. 如果值为 'prime'，则使用 `PrimeRewardManager`。\n4. 若要自定义，需实现一个新的 Reward Manager 类，并在配置文件中将 `reward_manager` 设置为对应的名称，同时在代码注册逻辑中添加该类的导入和实例化逻辑（类似现有的 naive 和 prime 处理方式）。","https:\u002F\u002Fgithub.com\u002FBytedTsinghua-SIA\u002FDAPO\u002Fissues\u002F18",[],[80,91,99,107,115,124],{"id":81,"name":82,"github_repo":83,"description_zh":84,"stars":85,"difficulty_score":27,"last_commit_at":86,"category_tags":87,"status":44},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,"2026-04-06T06:32:30",[88,42,89,90],"Agent","图像","数据工具",{"id":92,"name":93,"github_repo":94,"description_zh":95,"stars":96,"difficulty_score":27,"last_commit_at":97,"category_tags":98,"status":44},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 
CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[42,89,88],{"id":100,"name":101,"github_repo":102,"description_zh":103,"stars":104,"difficulty_score":43,"last_commit_at":105,"category_tags":106,"status":44},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160015,"2026-04-18T11:30:52",[42,88,41],{"id":108,"name":109,"github_repo":110,"description_zh":111,"stars":112,"difficulty_score":43,"last_commit_at":113,"category_tags":114,"status":44},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[42,89,88],{"id":116,"name":117,"github_repo":118,"description_zh":119,"stars":120,"difficulty_score":43,"last_commit_at":121,"category_tags":122,"status":44},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 
大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[123,88,89,42],"插件",{"id":125,"name":126,"github_repo":127,"description_zh":128,"stars":129,"difficulty_score":43,"last_commit_at":130,"category_tags":131,"status":44},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[123,42]]