[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tool-inclusionAI--AReaL":3,"similar-inclusionAI--AReaL":201},{"id":4,"github_repo":5,"name":6,"description_en":7,"description_zh":8,"ai_summary_zh":8,"readme_en":9,"readme_zh":10,"quickstart_zh":11,"use_case_zh":12,"hero_image_url":13,"owner_login":14,"owner_name":14,"owner_avatar_url":15,"owner_bio":16,"owner_company":17,"owner_location":17,"owner_email":17,"owner_twitter":17,"owner_website":18,"owner_url":19,"languages":20,"stars":37,"forks":38,"last_commit_at":39,"license":40,"difficulty_score":41,"env_os":42,"env_gpu":43,"env_ram":44,"env_deps":45,"category_tags":54,"github_topics":58,"view_count":67,"oss_zip_url":17,"oss_zip_packed_at":17,"status":68,"created_at":69,"updated_at":70,"faqs":71,"releases":100},5302,"inclusionAI\u002FAReaL","AReaL","Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.","AReaL 是一款由清华大学 IIIS 与蚂蚁集团联合开源的大规模异步强化学习系统，专为提升大语言模型的推理能力与智能体（Agent）表现而设计。它致力于解决传统强化学习训练中速度慢、架构复杂且难以灵活定制的问题，让开发者能够以极低的成本和门槛构建高性能 AI 智能体。\n\n该系统特别适合人工智能研究人员、算法工程师以及希望探索智能体应用的开发者使用。无论是数学解题、代码生成，还是搜索辅助与客户服务场景，AReaL 都能提供业界领先的训练速度与稳定性。其核心亮点在于“完全异步”的训练架构，不仅大幅提升了扩展性，还支持通过简单替换配置即可无缝切换不同的智能体任务或在线训练环境。此外，AReaL 已适配昇腾 NPU 设备，并配套开放了训练数据、基础设施细节及预训练模型，真正实现了从理论到复现的全流程开源。团队希望通过这一灵活、高效且亲民的工具，像调制一杯可口的奶茶一样，让每个人都能轻松打造出属于自己的强大 AI 智能体。","\u003Ch1 align=\"center\">\n\u003Cem>AReaL\u003C\u002Fem>: A Large-Scale Asynchronous Reinforcement Learning System\n\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n| \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.24298\">\u003Cb>Paper\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002F\">\u003Cb>Documentation\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Fzh\u002F\">\u003Cb>中文文档\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdeepwiki.com\u002FinclusionAI\u002FAReaL\">\u003Cb>Ask DeepWiki\u003C\u002Fb>\u003C\u002Fa> | \u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FinclusionAI\u002F\">\u003Cb>🤗 Models & Data\u003C\u002Fb>\u003C\u002Fa> |\n\u003Ca href=\".\u002Fassets\u002Fwechat_qrcode.png\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_AReaL_readme_117b5c19ef51.png\" width=\"20\" style=\"vertical-align: middle;\"> \u003Cb>WeChat (微信) Group\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"https:\u002F\u002Fgitcgr.com\u002FinclusionAI\u002FAReaL\">\n    \u003Cimg src=\"https:\u002F\u002Fgitcgr.com\u002Fbadge\u002FinclusionAI\u002FAReaL.svg\" alt=\"gitcgr\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cimg align=\"right\" alt=\"ReaL\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_AReaL_readme_79c4ed3aba8d.png\" width=\"20%\">\n\nAReaL is an open-source **fully asynchronous** reinforcement learning training system\nfor large **reasoning and agentic models**, developed by members from Tsinghua IIIS and\nthe AReaL Team at Ant Group. Built upon the open-source project\n[ReaLHF](https:\u002F\u002Fgithub.com\u002Fopenpsi-project\u002FReaLHF), we are fully committed to\nopen-source principles by providing the training details, data, and infrastructure\nrequired to reproduce our results, along with the models themselves. AReaL aims to help\neveryone build their own AI agents easily and affordably. Our team loves milk tea\nbecause it's delicious, customizable, and affordable—we hope you enjoy our project just\nas much as you'd enjoy real milk tea. 
Cheers!\n\n**AReaL Highlights**\n\n- ⚡ **Flexibility**: Seamless customization for\n  [agentic RL](https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Fen\u002Ftutorial\u002Fagentic_rl.html) and\n  [online RL training](.\u002Fexamples\u002Fopenclaw\u002F) by simply replacing the `base_url`.\n- 📈 **Scalability**: **Stable** fully asynchronous RL training with **industry-leading\n  speed**.\n- ✨ **Cutting-Edge Performance**: State-of-the-art [math](\u002Fblog\u002FAReaL_v0_2.md),\n  [coding](\u002Fblog\u002FAReaL_v0_3.md), [search](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FASearcher), and\n  [customer service](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.22607) agents.\n\n## 📰 News\n\n**\\[2026\u002F03\u002F02\\]** We provide [a complete example](.\u002Fexamples\u002Fopenclaw\u002F) to train your\nown 🦞 OpenClaw agent by simply replacing the `base_url` and `api_key` with AReaL's RL\nservice - no complicated dependencies, no code changes, works with any agentic runtime!\n\n**\\[2026\u002F02\u002F06\\]** We are delighted to introduce **AReaL-SEA**, a self-evolving data\nsynthesis engine. Combined with RL training on AReaL, the 235B MoE model surpasses GPT-5\nand achieves performance comparable to Gemini 3.0 Pro on $\\\\tau^2$-bench! Check out\nthe [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2601.22607),\n[model](https:\u002F\u002Fhuggingface.co\u002FinclusionAI\u002FAReaL-SEA-235B-A22B),\n[data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FinclusionAI\u002FAReaL-tau2-data), and\n[code](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Ftree\u002Fmain\u002Fexamples\u002Ftau2).\n\n**\\[2026\u002F01\u002F15\\]** Congrats to our friends at [CAMEL-AI](https:\u002F\u002Fwww.camel-ai.org\u002F) for\nopen-sourcing [SETA](https:\u002F\u002Fgithub.com\u002Fcamel-ai\u002Fseta), their terminal agent RL project\ntrained with AReaL! 
Check out\n[their training workflow](https:\u002F\u002Fgithub.com\u002Fcamel-ai\u002Fseta\u002Ftree\u002Fmain\u002Ftraining\u002Ftbench_areal_workflow)\nand the [announcement on X](https:\u002F\u002Fx.com\u002Fguohao_li\u002Fstatus\u002F2009678513574408636).\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>📋 Previous Releases\u003C\u002Fb>\u003C\u002Fsummary>\n\n**\\[2026\u002F01\u002F01\\]** Happy New Year! Thanks to the outstanding contribution from\n@HwVanICI, we are excited to officially announce stable support for AReaL training on\n**Ascend NPU devices**! The code is actively maintained and continuously updated in the\n[`ascend` branch](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Ftree\u002Fascend). Check out\n[our documentation](https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Fen\u002Ftutorial\u002Finstallation_npu.html)\nto get started, and feel free to report any issues!\n\n**\\[2025\u002F08\u002F30\\]** Introducing ASearcher, a state-of-the-art search agent built with\nAReaL's end-to-end asynchronous RL training. Check out the [paper](assets\u002Fpaper.pdf) and\nthe [open-source repository](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FASearcher)!\n\n**\\[2025\u002F07\u002F31\\] (AReaL-lite)** We introduce AReaL-lite, a **lightweight** version of\nAReaL designed specifically for AI researchers and rapid prototyping. AReaL-lite\nfeatures an **algorithm-first** API design that prioritizes ease of use and algorithm\ndevelopment, while natively supporting **fully asynchronous agentic RL**. 
With 80% fewer\nlines of code, AReaL-lite maintains 90% of AReaL's performance and core functionality.\nCheck out [our AReaL-lite design documentation](\u002Fareal\u002FREADME.md) and\n[the quickstart guide](https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Fen\u002Ftutorial\u002Fquickstart.html)\nto begin your journey with **AReaL-lite**!\n\n**\\[2025\u002F06\u002F03\\] (v0.3, boba²)** We release **boba²** (double-boba) for fully\nasynchronous RL training, which achieves **2.77× speedup while delivering comparable or\nsuperior training performance** compared to synchronous systems. Furthermore,\nasynchronous RL significantly simplifies multi-turn agentic RL training setup! Check out\n[our v0.3 overview blog](\u002Fblog\u002FAReaL_v0_3.md) and the\n[research paper](assets\u002Fpaper.pdf).\n\n**\\[2025\u002F03\u002F31\\] (v0.2, boba)** Introducing our milestone release—boba! Please call it\nA-ReaL-boba! This release features significantly faster training with SGLang support and\nstate-of-the-art 7B and 32B models for mathematical reasoning. Check out our\n[v0.2 technical blog](\u002Fblog\u002FAReaL_v0_2.md).\n\n**\\[2025\u002F02\u002F24\\] (v0.1)** Our initial release includes reproducible results for 1.5B and\n7B Large Reasoning Models (LRMs). 
Check out our\n[v0.1 technical blog](\u002Fblog\u002FAReaL_v0_1.md).\n\n\u003C\u002Fdetails>\n\n## 🚀 Getting Started\n\nFirst, install the package:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\ncd AReaL\npip install uv\n# Install flash-attn pre-built wheel first to avoid compiling from source\n# (pick the wheel matching your Python version; see https:\u002F\u002Fgithub.com\u002Fmjun0812\u002Fflash-attention-prebuild-wheels\u002Freleases)\nuv pip install \"https:\u002F\u002Fgithub.com\u002Fmjun0812\u002Fflash-attention-prebuild-wheels\u002Freleases\u002Fdownload\u002Fv0.7.16\u002Fflash_attn-2.8.3+cu128torch2.9-cp312-cp312-linux_x86_64.whl\"\nuv sync --extra cuda  # installs training packages + SGLang (default inference backend)\n```\n\nOur training scripts automatically download the required dataset (openai\u002Fgsm8k) and\nmodel (Qwen\u002FQwen2-1.5B-Instruct). To run on a single node:\n\n```bash\npython3 examples\u002Fmath\u002Fgsm8k_rl.py --config examples\u002Fmath\u002Fgsm8k_grpo.yaml scheduler.type=local\n```\n\nTo run on a Ray cluster with 2 nodes and 8 GPUs per node (remember to update paths in\nthe YAML file to point to your shared storage):\n\n```bash\npython3 examples\u002Fmath\u002Fgsm8k_rl.py --config examples\u002Fmath\u002Fgsm8k_grpo.yaml \\\n  cluster.n_nodes=2 cluster.n_gpus_per_node=8 \\\n  scheduler.type=ray\n```\n\nFor comprehensive setup instructions, see\n[our quickstart guide](https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Fen\u002Ftutorial\u002Fquickstart.html).\n\n## 📚 Examples\n\n### Math & Reasoning\n\n| Task                                                | Description                                                                                  | Performance                                                       |\n| --------------------------------------------------- | -------------------------------------------------------------------------------------------- | 
----------------------------------------------------------------- |\n| **[Math](examples\u002Fmath\u002F)**                          | GSM8K math reasoning with GRPO, PPO, DAPO, REINFORCE, RLOO, LitePPO, DR-GRPO, GSPO, and more | -                                                                 |\n| **[Multi-Turn Math](examples\u002Fmulti_turn_math\u002F)**    | Multi-turn math agent with reward discounting across turns                                   | [Training Curve](examples\u002Fmulti_turn_math\u002Freward_curve.png)       |\n| **[LoRA Math](examples\u002Fmath\u002Fgsm8k_grpo_lora.yaml)** | Parameter-efficient math training with LoRA (SGLang\u002FvLLM backends)                           | -                                                                 |\n| **[Countdown](examples\u002Fcountdown\u002F)**                | Countdown numbers game with custom rewards                                                   | [Training Curve](examples\u002Fcountdown\u002Fcountdown_training_curve.png) |\n\n### Agentic RL\n\n| Task                                                     | Description                                                            | Performance                                                                  |\n| -------------------------------------------------------- | ---------------------------------------------------------------------- | ---------------------------------------------------------------------------- |\n| **[General Agent](examples\u002Fagent_workflow\u002F)**            | General agentic training with any agentic frameworks                   | [Guide](docs\u002Ftutorial\u002Fagentic_rl.md)                                         |\n| **[Tau2 Customer Service](examples\u002Ftau2\u002F)**              | Customer service agent on Tau2-Bench (retail, airline, telecom)        | [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.22607)                                    |\n| **[Search Agent](examples\u002Fsearch_agent\u002F)**      
         | End-to-end search agent with Tongyi-DeepResearch workflow              | [Training Curve](examples\u002Fsearch_agent\u002Ftongyi_deepresearch\u002Freward_curve.png) |\n| **[Tool-Integrated Reasoning](examples\u002Ftir\u002F)**           | Multi-turn tool calling during reasoning (Python executor, calculator) | [Training Curve](examples\u002Ftir\u002Ffigures\u002Ftask_reward.png)                       |\n| **[OpenAI Agents Integration](examples\u002Fopenai_agents\u002F)** | Integration with OpenAI Agents SDK for agentic workflows               | -                                                                            |\n| **[CAMEL-AI Integration](examples\u002Fcamel\u002F)**              | Integration with CAMEL-AI framework for agentic RL                     | -                                                                            |\n\n### Vision-Language Models\n\n| Task                                | Description                                               | Performance                                     |\n| ----------------------------------- | --------------------------------------------------------- | ----------------------------------------------- |\n| **[VLM](examples\u002Fvlm\u002F)**            | Geometry3K and CLEVR Count 70K visual reasoning with GRPO | -                                               |\n| **[VLM on NPU](examples\u002Fvlm_npu\u002F)** | VLM training on Huawei NPU hardware                       | [Benchmark Results](examples\u002Fvlm_npu\u002FREADME.md) |\n\n### Alignment & Infrastructure\n\n| Task                                            | Description                                           | Performance                                       |\n| ----------------------------------------------- | ----------------------------------------------------- | ------------------------------------------------- |\n| **[RLHF Reward Modeling](examples\u002Falignment\u002F)** | Bradley-Terry reward modeling on Anthropic 
HH-RLHF    | [Training Curve](examples\u002Falignment\u002Frw_curve.png) |\n| **[SkyPilot Deployment](examples\u002Fskypilot\u002F)**   | Cloud deployment with SkyPilot (GCP, AWS, Kubernetes) | [Screenshots](examples\u002Fskypilot\u002FREADME.md)        |\n\n## 🔧 Support Matrix\n\n### 🧠 Algorithms\n\nAll RL algorithms support both asynchronous and synchronous training; set\n`max_head_offpolicyness=0` to run fully synchronously. See\n[Asynchronous RL Guide](docs\u002Falgorithms\u002Fasync.md).\n\n| Algorithm                | Documentation                                 | Paper                                          | Configuration                                                     |\n| ------------------------ | --------------------------------------------- | ---------------------------------------------- | ----------------------------------------------------------------- |\n| **GRPO**                 | [📖 Docs](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.03300)   | [🔗 GSM8K Example](examples\u002Fmath\u002Fgsm8k_grpo.yaml)                 |\n| **GSPO**                 | [📖 Docs](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.18071)   | [🔗 GSM8K Example](examples\u002Fmath\u002Fgsm8k_gspo.yaml)                 |\n| **PPO**                  | [📖 Docs](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.02155)   | [🔗 GSM8K Example](examples\u002Fmath\u002Fgsm8k_ppo.yaml)                  |\n| **DAPO**                 | [📖 Docs](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.14476)   | [🔗 GSM8K Example](examples\u002Fmath\u002Fgsm8k_dapo_dynamic_bs.yaml)      |\n| **LitePPO**              | [📖 Docs](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.08221)   | [🔗 GSM8K 
Example](examples\u002Fmath\u002Fgsm8k_liteppo.yaml)              |\n| **Dr.GRPO**              | [📖 Docs](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.20783)   | [🔗 GSM8K Example](examples\u002Fmath\u002Fgsm8k_drgrpo.yaml)               |\n| **REINFORCE++**          | -                                             | [📄 Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.03262)   | [🔗 GSM8K Example](examples\u002Fmath\u002Fgsm8k_reinforce.yaml)            |\n| **RLOO**                 | [📖 Docs](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.14740v1) | [🔗 GSM8K Example](examples\u002Fmath\u002Fgsm8k_rloo.yaml)                 |\n| **SAPO**                 | [📖 Docs](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.20347)   | [🔗 GSM8K Example](examples\u002Fmath\u002Fgsm8k_sapo.yaml)                 |\n| **M2PO**                 | [📖 Docs](docs\u002Falgorithms\u002Fm2po.md)            | [📄 Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.01161)   | [🔗 GSM8K Example](examples\u002Fmath\u002Fgsm8k_m2po.yaml)                 |\n| **RLHF Reward Modeling** | -                                             | -                                              | [🔗 RLHF Example](examples\u002Falignment\u002F)                            |\n| **SFT**                  | -                                             | -                                              | [🔗 GSM8K Example](examples\u002Fmath\u002Fgsm8k_sft.py)                    |\n| **Distillation**         | [📖 Docs](docs\u002Fen\u002Falgorithms\u002Fdistillation.md) | [📄 Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.02208)   | [🔗 GSM8K Example](examples\u002Fdistillation\u002Fgsm8k_grpo_distill.yaml) |\n\n### Models\n\n| Model Family               | Megatron | PyTorch FSDP | PyTorch Archon | Notes                         
                           |\n| -------------------------- | -------- | ------------ | -------------- | -------------------------------------------------------- |\n| **Qwen2\u002F3**                | ✅       | ✅           | ✅             | -                                                        |\n| **Qwen3-MoE**              | ✅       | ✅           | ✅             | -                                                        |\n| **Qwen2.5-VL**             | ❌       | ✅           | ❌             | Vision-language model                                    |\n| **Qwen3-VL**               | ❌       | ✅           | ❌             | Vision-language model                                    |\n| **Gemma 3**                | ❌       | ✅           | ❌             | Vision-language model                                    |\n| **Other Hugging Face LLM** | ❌       | ✅           | ❌             | Compatibility depends on the version of `transformers`   |\n\nCheck the [AI Coding Assistant Guide](docs\u002Freference\u002Fai_assisted_dev.md) and\n[Archon Reference](docs\u002Ftutorial\u002Farchon.md) for how to integrate new models into AReaL.\n\n### Training Backends\n\n| Backend            | DP          | Tensor Parallel | Sequence Parallel within TP | Context Parallel | Pipeline Parallel | Expert Parallel | 1D Sequence Packing | LoRA                             |\n| ------------------ | ----------- | --------------- | --------------------------- | ---------------- | ----------------- | --------------- | ------------------- | -------------------------------- |\n| **Megatron**       | ✅ (ZeRO-1) | ✅              | ✅                          | ✅               | ✅                | ✅              | ✅                  | ✅ (with vLLM inference backend) |\n| **PyTorch FSDP**   | ✅ (FSDP2)  | ✅              | ✅                          | ✅               | ❌                | ❌              | ✅                  | ✅                               |\n| **PyTorch Archon** | ✅ (FSDP2)  | ✅        
      | ✅                          | ✅               | ✅                | ✅              | ✅                  | ❌                               |\n\n### Inference Backends\n\n| Backend    | Tensor Parallel | Context Parallel | Pipeline Parallel | Data Parallel Attention | Expert Parallel |\n| ---------- | --------------- | ---------------- | ----------------- | ----------------------- | --------------- |\n| **vLLM**   | ✅              | ❓               | ✅                | ❓                      | ❓              |\n| **SGLang** | ✅              | ❌               | ❌                | ✅                      | ✅              |\n\n## 📖 Resources\n\n### Tutorial\n\n- [Installation](docs\u002Fen\u002Ftutorial\u002Finstallation.md)\n- [Quickstart](docs\u002Fen\u002Ftutorial\u002Fquickstart.md)\n- [Agentic RL](docs\u002Fen\u002Ftutorial\u002Fagentic_rl.md)\n- [Evaluation](docs\u002Fen\u002Ftutorial\u002Feval.md)\n- [Large MoE with Megatron](docs\u002Fen\u002Ftutorial\u002Fmegatron.md)\n- [Large MoE with PyTorch Archon](docs\u002Fen\u002Ftutorial\u002Farchon.md)\n\n### Code Walkthrough\n\n- [Running GRPO on GSM8K dataset](docs\u002Fen\u002Ftutorial\u002Fgsm8k_grpo.md)\n\n### Best Practices\n\n- [Improving Algorithm Performance](docs\u002Fen\u002Fbest_practices\u002Falgo_perf.md)\n- [Agent Workflow Best Practices](docs\u002Fen\u002Fbest_practices\u002Fworkflow.md)\n- [Debugging](docs\u002Fen\u002Fbest_practices\u002Fdebugging.md)\n- [Handling OOM Issues](docs\u002Fen\u002Fbest_practices\u002Fhandling_oom.md)\n- [Performance Profiling](docs\u002Fen\u002Fbest_practices\u002Fperf_profiling.md)\n\n### Customization\n\n- [Customize Dataset](docs\u002Fen\u002Fcustomization\u002Fdataset.md)\n- [Customize Agentic\u002FRLVR Rollout Workflows](docs\u002Fen\u002Fcustomization\u002Fagent.md)\n\n### Algorithms\n\n- [Asynchronous RL Explained](docs\u002Fen\u002Falgorithms\u002Fasync.md)\n- [PPO, GRPO, and Related Algorithms](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)\n- 
[M2PO](docs\u002Fen\u002Falgorithms\u002Fm2po.md)\n\n### Reference\n\n- [CLI Configurations](docs\u002Fen\u002Fcli_reference.md)\n- [LoRA RL](docs\u002Fen\u002Freference\u002Flora.md)\n- [Checkpointing](docs\u002Fen\u002Freference\u002Fcheckpointing.md)\n- [Metrics Tracking](docs\u002Fen\u002Freference\u002Fmetrics_tracking.md)\n- [Allocation Mode](docs\u002Fen\u002Freference\u002Falloc_mode.md)\n- [Rollout Workflow](docs\u002Fen\u002Freference\u002Frollout_workflow.md)\n- [Agent Workflow](docs\u002Fen\u002Freference\u002Fagent_workflow.md)\n- [AI-Assisted Development](docs\u002Fen\u002Freference\u002Fai_assisted_dev.md)\n\n## 🤝 Contributing\n\nWe warmly welcome contributions from the community! Whether you're fixing bugs, adding\nfeatures, improving documentation, or helping others, your contribution is valued.\nPlease check our **[Contributing Guide](CONTRIBUTING.md)** for detailed information.\n\n```bash\n# Fork and clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002FYOUR-USERNAME\u002FAReaL\ncd AReaL\n\n# Install uv and sync dependencies\npip install uv\n# Install flash-attn pre-built wheel to avoid compiling from source\nuv pip install \"https:\u002F\u002Fgithub.com\u002Fmjun0812\u002Fflash-attention-prebuild-wheels\u002Freleases\u002Fdownload\u002Fv0.7.16\u002Fflash_attn-2.8.3+cu128torch2.9-cp312-cp312-linux_x86_64.whl\"\n# Use `--extra cuda` on Linux with CUDA (installs training packages + SGLang)\nuv sync --extra cuda --group dev\n# For vLLM instead (note: use torch2.10 flash-attn wheel):\n# uv sync --extra cuda-vllm --group dev\n# Or without CUDA support\n# uv sync --group dev\n\n# Set up pre-commit hooks (formatting, linting, commit message checks)\npre-commit install --install-hooks\n\n# Make changes\ngit checkout -b feat\u002Fgpt-o5\ngit add .\n# `git commit` will automatically check your files and commit messages\ngit commit -m \"feat: implement gpt-o5 training loop\"\ngit push\n```\n\n## 🗺️ Future Roadmap\n\n- **[Full 
Roadmap](ROADMAP.md)**\n- **[2025 Q4 Roadmap](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fissues\u002F542)**\n\nAReaL is under active development with planned minor releases weekly and major releases\nmonthly. We warmly welcome community engagement and contributions. We are also\n**actively hiring interns and full-time employees** with open positions in both the US\nand China.\n\n## 🙏 Acknowledgments\n\nWe gratefully acknowledge that major contributors are from the AReaL Team at the\nInstitute for Interdisciplinary Information Sciences (IIIS), Tsinghua University and Ant\nGroup.\n\nWe have also received invaluable assistance from the following groups (listed\nalphabetically):\n\n- The Data Intelligence Lab at Ant Research for their data support\n\n- @HwVanICI for support on vLLM, LoRA, NPU integration, and more\n\n- The [Relaxed System Lab](https:\u002F\u002Fgithub.com\u002FRelaxed-System-Lab) at HKUST for seamless\n  collaboration on numerous system-related aspects\n\n- The [SGLang team](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang) for supporting custom weight\n  update features and their contributions during AReaL-lite development\n\n- The Super Computing Technology (SCT) team at Ant Group for their expertise in\n  large-scale cluster operations and maintenance\n\n- Special thanks to @Lyken17 for providing valuable suggestions throughout the API\n  design process\n\nWe also deeply appreciate all pioneering work from the community, particularly the\n[ReaLHF](https:\u002F\u002Fgithub.com\u002Fopenpsi-project\u002FReaLHF) project from OpenPsi Inc. 
and other\noutstanding projects, including but not limited to\n[DeepScaleR](https:\u002F\u002Fgithub.com\u002Fagentica-project\u002Fdeepscaler),\n[Open-Reasoner-Zero](https:\u002F\u002Fgithub.com\u002FOpen-Reasoner-Zero\u002FOpen-Reasoner-Zero\u002Ftree\u002Fmain),\n[OpenRLHF](https:\u002F\u002Fgithub.com\u002FOpenRLHF\u002FOpenRLHF),\n[VeRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl),\n[SGLang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang), [QwQ](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwQ),\n[Light-R1](https:\u002F\u002Fgithub.com\u002FQihoo360\u002FLight-R1), and\n[DAPO](https:\u002F\u002Fgithub.com\u002FBytedTsinghua-SIA\u002FDAPO).\n\n## 📄 Citation\n\n```bibtex\n@inproceedings{mei2025real,\n  author       = {Mei, Zhiyu and Fu, Wei and Li, Kaiwei and Wang, Guangju and Zhang, Huanchen and Wu, Yi},\n  title        = {ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation},\n  booktitle    = {Proceedings of the Eighth Conference on Machine Learning and Systems,\n                  MLSys 2025, Santa Clara, CA, USA, May 12-15, 2025},\n  publisher    = {mlsys.org},\n  year         = {2025},\n}\n```\n\n```bibtex\n@misc{fu2025areal,\n      title={AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning},\n      author={Wei Fu and Jiaxuan Gao and Xujie Shen and Chen Zhu and Zhiyu Mei and Chuyi He and Shusheng Xu and Guo Wei and Jun Mei and Jiashu Wang and Tongkai Yang and Binhang Yuan and Yi Wu},\n      year={2025},\n      eprint={2505.24298},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.24298},\n}\n```\n","\u003Ch1 align=\"center\">\n\u003Cem>AReaL\u003C\u002Fem>：一个大规模异步强化学习系统\n\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n| \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.24298\">\u003Cb>论文\u003C\u002Fb>\u003C\u002Fa> | \u003Ca 
href=\"https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002F\">\u003Cb>文档\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Fzh\u002F\">\u003Cb>中文文档\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdeepwiki.com\u002FinclusionAI\u002FAReaL\">\u003Cb>Ask DeepWiki\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FinclusionAI\u002F\">\u003Cb>🤗 模型与数据\u003C\u002Fb>\u003C\u002Fa> |\n\u003Ca href=\".\u002Fassets\u002Fwechat_qrcode.png\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_AReaL_readme_117b5c19ef51.png\" width=\"20\" style=\"vertical-align: middle;\"> \u003Cb>WeChat (微信) 群\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"https:\u002F\u002Fgitcgr.com\u002FinclusionAI\u002FAReaL\">\n    \u003Cimg src=\"https:\u002F\u002Fgitcgr.com\u002Fbadge\u002FinclusionAI\u002FAReaL.svg\" alt=\"gitcgr\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cimg align=\"right\" alt=\"ReaL\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_AReaL_readme_79c4ed3aba8d.png\" width=\"20%\">\n\nAReaL 是一个开源的**完全异步**强化学习训练系统，专为大型**推理和智能体模型**设计，由清华大学 IIIS 的成员以及蚂蚁集团 AReaL 团队共同开发。基于开源项目 [ReaLHF](https:\u002F\u002Fgithub.com\u002Fopenpsi-project\u002FReaLHF)，我们完全遵循开源原则，提供复现结果所需的训练细节、数据和基础设施，同时开放模型本身。AReaL 致力于帮助每个人以简单且经济的方式构建自己的 AI 智能体。我们的团队热爱奶茶，因为它美味、可定制且价格实惠——我们也希望您会像喜欢真正的奶茶一样喜欢我们的项目。干杯！\n\n**AReaL 亮点**\n\n- ⚡ **灵活性**：通过简单替换 `base_url`，即可无缝定制用于[智能体强化学习](https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Fen\u002Ftutorial\u002Fagentic_rl.html)和[在线强化学习训练](.\u002Fexamples\u002Fopenclaw\u002F)。\n- 📈 **可扩展性**：**稳定**的全异步强化学习训练，具备**业界领先的速度**。\n- ✨ **前沿性能**：最先进的[数学](\u002Fblog\u002FAReaL_v0_2.md)、[编程](\u002Fblog\u002FAReaL_v0_3.md)、[搜索](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FASearcher)和[客服](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.22607)智能体。\n\n## 📰 新闻\n\n**\\[2026\u002F03\u002F02\\]** 
我们提供了一个完整的示例([examples\u002Fopenclaw\u002F](.\u002Fexamples\u002Fopenclaw\u002F))，只需将 `base_url` 和 `api_key` 替换为 AReaL 的 RL 服务，即可训练属于您自己的 🦞 OpenClaw 智能体——无需复杂的依赖项，无需修改代码，适用于任何智能体运行时！\n\n**\\[2026\u002F02\u002F06\\]** 我们很高兴推出**AReaL-SEA**，一款自我进化的数据合成引擎。结合 AReaL 上的强化学习训练，这款 235B 参数的 MoE 模型超越了 GPT-5，并在 $\\\\tau^2$ 基准测试上达到了与 Gemini 3.0 Pro 相当的性能！请查看[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2601.22607)、[模型](https:\u002F\u002Fhuggingface.co\u002FinclusionAI\u002FAReaL-SEA-235B-A22B)、[数据](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FinclusionAI\u002FAReaL-tau2-data)以及[代码](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Ftree\u002Fmain\u002Fexamples\u002Ftau2)。\n\n**\\[2026\u002F01\u002F15\\]** 恭喜我们的朋友 [CAMEL-AI](https:\u002F\u002Fwww.camel-ai.org\u002F) 开源了他们的终端智能体强化学习项目 [SETA](https:\u002F\u002Fgithub.com\u002Fcamel-ai\u002Fseta)，该项目正是使用 AReaL 训练的！请查看他们的[训练流程](https:\u002F\u002Fgithub.com\u002Fcamel-ai\u002Fseta\u002Ftree\u002Fmain\u002Ftraining\u002Ftbench_areal_workflow)以及[X 平台上的公告](https:\u002F\u002Fx.com\u002Fguohao_li\u002Fstatus\u002F2009678513574408636)。\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>📋 往期发布\u003C\u002Fb>\u003C\u002Fsummary>\n\n**\\[2026\u002F01\u002F01\\]** 新年快乐！感谢 @HwVanICI 的杰出贡献，我们很高兴正式宣布对**Ascend NPU 设备**的 AReaL 训练提供稳定支持！相关代码正在积极维护并在 [`ascend` 分支](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Ftree\u002Fascend)中持续更新。请参阅[我们的文档](https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Fen\u002Ftutorial\u002Finstallation_npu.html)开始使用，如有任何问题欢迎随时反馈！\n\n**\\[2025\u002F08\u002F30\\]** 推出 ASearcher，一款基于 AReaL 全链路异步强化学习训练打造的最先进搜索智能体。请查看[论文](assets\u002Fpaper.pdf)和[开源仓库](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FASearcher)！\n\n**\\[2025\u002F07\u002F31\\] (AReaL-lite)** 我们推出了 AReaL-lite，这是一个专为 AI 研究人员和快速原型设计打造的**轻量级**版本。AReaL-lite 采用**算法优先**的 API 设计，注重易用性和算法开发，同时原生支持**全异步智能体强化学习**。尽管代码量减少了 80%，AReaL-lite 仍保持了 AReaL 90% 的性能和核心功能。请查阅[我们的 AReaL-lite 
设计文档](\u002Fareal\u002FREADME.md)和[快速入门指南](https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Fen\u002Ftutorial\u002Fquickstart.html)，开启您的 **AReaL-lite** 之旅！\n\n**\\[2025\u002F06\u002F03\\] (v0.3, boba²)** 我们发布了 **boba²**（双珍珠），用于全异步强化学习训练，相比同步系统，它实现了**2.77 倍的加速，同时提供相当或更优的训练性能**。此外，异步强化学习显著简化了多轮智能体强化学习的设置！请查看[我们的 v0.3 概览博客](\u002Fblog\u002FAReaL_v0_3.md)和[研究论文](assets\u002Fpaper.pdf)。\n\n**\\[2025\u002F03\u002F31\\] (v0.2, boba)** 我们推出了里程碑式的发布——boba！请称之为 A-ReaL-boba！此版本支持 SGLang，训练速度大幅提升，并提供了最先进的 7B 和 32B 模型，用于数学推理。请查看我们的[v0.2 技术博客](\u002Fblog\u002FAReaL_v0_2.md)。\n\n**\\[2025\u002F02\u002F24\\] (v0.1)** 我们的首次发布包含了 1.5B 和 7B 大型推理模型（LRMs）的可复现结果。请查看我们的[v0.1 技术博客](\u002Fblog\u002FAReaL_v0_1.md)。\n\n\u003C\u002Fdetails>\n\n## 🚀 快速入门\n\n首先，安装软件包：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\ncd AReaL\npip install uv\n# 首先安装 flash-attn 预编译的 wheel 包，避免从源码编译\n# （选择与您的 Python 版本匹配的 wheel；详见 https:\u002F\u002Fgithub.com\u002Fmjun0812\u002Fflash-attention-prebuild-wheels\u002Freleases）\nuv pip install \"https:\u002F\u002Fgithub.com\u002Fmjun0812\u002Fflash-attention-prebuild-wheels\u002Freleases\u002Fdownload\u002Fv0.7.16\u002Fflash_attn-2.8.3+cu128torch2.9-cp312-cp312-linux_x86_64.whl\"\nuv sync --extra cuda  # 安装训练所需包 + SGLang（默认推理后端）\n```\n\n我们的训练脚本会自动下载所需的数据集（openai\u002Fgsm8k）和模型（Qwen\u002FQwen2-1.5B-Instruct）。要在单节点上运行：\n\n```bash\npython3 examples\u002Fmath\u002Fgsm8k_rl.py --config examples\u002Fmath\u002Fgsm8k_grpo.yaml scheduler.type=local\n```\n\n要在包含 2 个节点、每节点 8 张 GPU 的 Ray 集群上运行（请记得更新 YAML 文件中的路径，指向您的共享存储）：\n\n```bash\npython3 examples\u002Fmath\u002Fgsm8k_rl.py --config examples\u002Fmath\u002Fgsm8k_grpo.yaml \\\n  cluster.n_nodes=2 cluster.n_gpus_per_node=8 \\\n  scheduler.type=ray\n```\n\n有关完整的设置说明，请参阅[我们的快速入门指南](https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Fen\u002Ftutorial\u002Fquickstart.html)。\n\n## 📚 示例\n\n### 数学与推理\n\n| 任务                                                | 描述                                     
                                             | 性能                                                       |\n| --------------------------------------------------- | -------------------------------------------------------------------------------------------- | ----------------------------------------------------------------- |\n| **[数学](examples\u002Fmath\u002F)**                          | 使用 GRPO、PPO、DAPO、REINFORCE、RLOO、LitePPO、DR-GRPO、GSPO 等方法进行 GSM8K 数学推理 | -                                                                 |\n| **[多轮数学](examples\u002Fmulti_turn_math\u002F)**    | 具有多轮奖励折扣的多轮数学智能体                                   | [训练曲线](examples\u002Fmulti_turn_math\u002Freward_curve.png)       |\n| **[LoRA 数学](examples\u002Fmath\u002Fgsm8k_grpo_lora.yaml)** | 使用 LoRA 进行参数高效的数学训练（SGLang\u002FvLLM 后端）                           | -                                                                 |\n| **[倒计时](examples\u002Fcountdown\u002F)**                | 带有自定义奖励的倒计时数字游戏                                                   | [训练曲线](examples\u002Fcountdown\u002Fcountdown_training_curve.png) |\n\n### 智能体强化学习\n\n| 任务                                                     | 描述                                                            | 性能                                                                  |\n| -------------------------------------------------------- | ---------------------------------------------------------------------- | ---------------------------------------------------------------------------- |\n| **[通用智能体](examples\u002Fagent_workflow\u002F)**            | 使用任何智能体框架进行通用智能体训练                   | [指南](docs\u002Ftutorial\u002Fagentic_rl.md)                                         |\n| **[Tau2 客服](examples\u002Ftau2\u002F)**              | Tau2-Bench 上的客服智能体（零售、航空、电信）        | [论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.22607)                                    |\n| **[搜索智能体](examples\u002Fsearch_agent\u002F)**               | 使用 
Tongyi-DeepResearch 工作流的端到端搜索智能体              | [训练曲线](examples\u002Fsearch_agent\u002Ftongyi_deepresearch\u002Freward_curve.png) |\n| **[工具集成推理](examples\u002Ftir\u002F)**           | 推理过程中多轮调用工具（Python 执行器、计算器） | [训练曲线](examples\u002Ftir\u002Ffigures\u002Ftask_reward.png)                       |\n| **[OpenAI 智能体集成](examples\u002Fopenai_agents\u002F)** | 与 OpenAI 智能体 SDK 集成以实现智能体工作流               | -                                                                            |\n| **[CAMEL-AI 集成](examples\u002Fcamel\u002F)**              | 与 CAMEL-AI 框架集成以进行智能体强化学习                     | -                                                                            |\n\n### 视觉-语言模型\n\n| 任务                                | 描述                                               | 性能                                     |\n| ----------------------------------- | --------------------------------------------------------- | ----------------------------------------------- |\n| **[VLM](examples\u002Fvlm\u002F)**            | 使用 GRPO 在 Geometry3K 和 CLEVR Count 70K 上进行视觉推理 | -                                               |\n| **[NPU 上的 VLM](examples\u002Fvlm_npu\u002F)** | 在华为 NPU 硬件上进行 VLM 训练                       | [基准测试结果](examples\u002Fvlm_npu\u002FREADME.md)        |\n\n### 对齐与基础设施\n\n| 任务                                            | 描述                                           | 性能                                       |\n| ----------------------------------------------- | ----------------------------------------------------- | ------------------------------------------------- |\n| **[RLHF 奖励建模](examples\u002Falignment\u002F)** | 在 Anthropic HH-RLHF 上进行 Bradley-Terry 奖励建模    | [训练曲线](examples\u002Falignment\u002Frw_curve.png)        |\n| **[SkyPilot 部署](examples\u002Fskypilot\u002F)**   | 使用 SkyPilot 进行云部署（GCP、AWS、Kubernetes） | [截图](examples\u002Fskypilot\u002FREADME.md)        |\n\n## 🔧 支持矩阵\n\n### 🧠 算法\n\n所有强化学习算法均支持异步与同步两种训练模式：将 `max_head_offpolicyness=0` 即可切换为同步训练。详情请参阅 
[异步强化学习指南](docs\u002Falgorithms\u002Fasync.md)。\n\n| 算法                | 文档                                 | 论文                                          | 配置                                                     |\n| ------------------------ | --------------------------------------------- | ---------------------------------------------- | ----------------------------------------------------------------- |\n| **GRPO**                 | [📖 文档](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.03300)   | [🔗 GSM8K 示例](examples\u002Fmath\u002Fgsm8k_grpo.yaml)                 |\n| **GSPO**                 | [📖 文档](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.18071)   | [🔗 GSM8K 示例](examples\u002Fmath\u002Fgsm8k_gspo.yaml)                 |\n| **PPO**                  | [📖 文档](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.02155)   | [🔗 GSM8K 示例](examples\u002Fmath\u002Fgsm8k_ppo.yaml)                  |\n| **DAPO**                 | [📖 文档](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.14476)   | [🔗 GSM8K 示例](examples\u002Fmath\u002Fgsm8k_dapo_dynamic_bs.yaml)      |\n| **LitePPO**              | [📖 文档](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.08221)   | [🔗 GSM8K 示例](examples\u002Fmath\u002Fgsm8k_liteppo.yaml)              |\n| **Dr.GRPO**              | [📖 文档](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.20783)   | [🔗 GSM8K 示例](examples\u002Fmath\u002Fgsm8k_drgrpo.yaml)               |\n| **REINFORCE++**          | -                                             | [📄 论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.03262)   | [🔗 GSM8K 示例](examples\u002Fmath\u002Fgsm8k_reinforce.yaml)            |\n| **RLOO**    
             | [📖 文档](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.14740v1) | [🔗 GSM8K 示例](examples\u002Fmath\u002Fgsm8k_rloo.yaml)                 |\n| **SAPO**                 | [📖 文档](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)  | [📄 论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.20347)   | [🔗 GSM8K 示例](examples\u002Fmath\u002Fgsm8k_sapo.yaml)                 |\n| **M2PO**                 | [📖 文档](docs\u002Falgorithms\u002Fm2po.md)            | [📄 论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.01161)   | [🔗 GSM8K 示例](examples\u002Fmath\u002Fgsm8k_m2po.yaml)                 |\n| **RLHF 奖励建模**        | -                                             | -                                              | [🔗 RLHF 示例](examples\u002Falignment\u002F)                            |\n| **SFT**                  | -                                             | -                                              | [🔗 GSM8K 示例](examples\u002Fmath\u002Fgsm8k_sft.py)                    |\n| **蒸馏**         | [📖 文档](docs\u002Fen\u002Falgorithms\u002Fdistillation.md) | [📄 论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.02208)   | [🔗 GSM8K 示例](examples\u002Fdistillation\u002Fgsm8k_grpo_distill.yaml) |\n\n### 模型\n\n| 模型家族               | Megatron | PyTorch FSDP | PyTorch Archon | 备注                                                    |\n| -------------------------- | -------- | ------------ | -------------- | -------------------------------------------------------- |\n| **Qwen2\u002F3**                | ✅       | ✅           | ✅             | -                                                        |\n| **Qwen3-MoE**              | ✅       | ✅           | ✅             | -                                                        |\n| **Qwen2.5-VL**             | ❌       | ✅           | ❌             | 视觉语言模型                                    |\n| **Qwen3-VL**               | ❌       | ✅           | ❌             | 
视觉语言模型                                    |\n| **Gemma 3**                | ❌       | ✅           | ❌             | 视觉语言模型                                    |\n| **其他 Hugging Face LLM** | ❌       | ✅           | ❌             | 兼容性取决于 `transformers` 的版本 |\n\n有关如何将新模型集成到 AReaL 中，请参阅 [AI 编码助手指南](docs\u002Freference\u002Fai_assisted_dev.md) 和 [Archon 参考文档](docs\u002Ftutorial\u002Farchon.md)。\n\n### 训练后端\n\n| 后端            | DP          | 张量并行 | TP 内的序列并行 | 上下文并行 | 流水线并行 | 专家并行 | 1D 序列打包 | LoRA                             |\n| ------------------ | ----------- | --------------- | --------------------------- | ---------------- | ----------------- | --------------- | ------------------- | -------------------------------- |\n| **Megatron**       | ✅ (ZeRO-1) | ✅              | ✅                          | ✅               | ✅                | ✅              | ✅                  | ✅ (与 vLLM 推理后端结合) |\n| **PyTorch FSDP**   | ✅ (FSDP2)  | ✅              | ✅                          | ✅               | ❌                | ❌              | ✅                  | ✅                               |\n| **PyTorch Archon** | ✅ (FSDP2)  | ✅              | ✅                          | ✅               | ✅                | ✅              | ✅                  | ❌                               |\n\n### 推理后端\n\n| 后端    | 张量并行 | 上下文并行 | 流水线并行 | 数据并行注意力 | 专家并行 |\n| ---------- | --------------- | ---------------- | ----------------- | ----------------------- | --------------- |\n| **vLLM**   | ✅              | ❓               | ✅                | ❓                      | ❓              |\n| **SGLang** | ✅              | ❌               | ❌                | ✅                      | ✅              |\n\n## 📖 资源\n\n### 教程\n\n- [安装](docs\u002Fen\u002Ftutorial\u002Finstallation.md)\n- [快速入门](docs\u002Fen\u002Ftutorial\u002Fquickstart.md)\n- [代理式强化学习](docs\u002Fen\u002Ftutorial\u002Fagentic_rl.md)\n- [评估](docs\u002Fen\u002Ftutorial\u002Feval.md)\n- [使用 Megatron 运行大型 
MoE](docs\u002Fen\u002Ftutorial\u002Fmegatron.md)\n- [使用 PyTorch Archon 运行大型 MoE](docs\u002Fen\u002Ftutorial\u002Farchon.md)\n\n### 代码讲解\n\n- [在 GSM8K 数据集上运行 GRPO](docs\u002Fen\u002Ftutorial\u002Fgsm8k_grpo.md)\n\n### 最佳实践\n\n- [提升算法性能](docs\u002Fen\u002Fbest_practices\u002Falgo_perf.md)\n- [智能体工作流最佳实践](docs\u002Fen\u002Fbest_practices\u002Fworkflow.md)\n- [调试](docs\u002Fen\u002Fbest_practices\u002Fdebugging.md)\n- [处理 OOM 问题](docs\u002Fen\u002Fbest_practices\u002Fhandling_oom.md)\n- [性能剖析](docs\u002Fen\u002Fbest_practices\u002Fperf_profiling.md)\n\n### 自定义\n\n- [自定义数据集](docs\u002Fen\u002Fcustomization\u002Fdataset.md)\n- [自定义智能体\u002FRLVR Rollout 工作流](docs\u002Fen\u002Fcustomization\u002Fagent.md)\n\n### 算法\n\n- [异步强化学习详解](docs\u002Fen\u002Falgorithms\u002Fasync.md)\n- [PPO、GRPO 及相关算法](docs\u002Fen\u002Falgorithms\u002Fgrpo_series.md)\n- [M2PO](docs\u002Fen\u002Falgorithms\u002Fm2po.md)\n\n### 参考\n\n- [CLI 配置](docs\u002Fen\u002Fcli_reference.md)\n- [LoRA RL](docs\u002Fen\u002Freference\u002Flora.md)\n- [检查点机制](docs\u002Fen\u002Freference\u002Fcheckpointing.md)\n- [指标跟踪](docs\u002Fen\u002Freference\u002Fmetrics_tracking.md)\n- [分配模式](docs\u002Fen\u002Freference\u002Falloc_mode.md)\n- [Rollout 工作流](docs\u002Fen\u002Freference\u002Frollout_workflow.md)\n- [Agent 工作流](docs\u002Fen\u002Freference\u002Fagent_workflow.md)\n- [AI 辅助开发](docs\u002Fen\u002Freference\u002Fai_assisted_dev.md)\n\n## 🤝 贡献\n\n我们热烈欢迎社区的贡献！无论您是修复 bug、添加新功能、改进文档，还是帮助他人，您的贡献都备受珍视。请查看我们的 **[贡献指南](CONTRIBUTING.md)** 以获取详细信息。\n\n```bash\n# 分支并克隆仓库\ngit clone https:\u002F\u002Fgithub.com\u002FYOUR-USERNAME\u002FAReaL\ncd AReaL\n\n# 安装 uv 并同步依赖\npip install uv\n# 安装 flash-attn 预编译轮子，避免从源码编译\nuv pip install \"https:\u002F\u002Fgithub.com\u002Fmjun0812\u002Fflash-attention-prebuild-wheels\u002Freleases\u002Fdownload\u002Fv0.7.16\u002Fflash_attn-2.8.3+cu128torch2.9-cp312-cp312-linux_x86_64.whl\"\n# 在带有 CUDA 的 Linux 上使用 `--extra cuda`（安装训练相关包 + SGLang）\nuv sync --extra cuda --group dev\n# 如果使用 vLLM（注意：需使用 torch2.10 
版本的 flash-attn 轮子）：\n# uv sync --extra cuda-vllm --group dev\n# 若环境不支持 CUDA：\n# uv sync --group dev\n\n# 设置 pre-commit 钩子（格式化、静态检查、提交信息检查）\npre-commit install --install-hooks\n\n# 进行更改\ngit checkout -b feat\u002Fgpt-o5\ngit add .\n# `git commit` 会自动检查文件和提交信息\ngit commit -m \"feat: implement gpt-o5 training loop\"\ngit push\n```\n\n## 🗺️ 未来路线图\n\n- **[完整路线图](ROADMAP.md)**\n- **[2025 年第四季度路线图](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fissues\u002F542)**\n\nAReaL 目前正处于积极开发中，计划每周发布小版本，每月发布大版本。我们热忱欢迎社区参与和贡献。同时，我们也 **正在招聘实习生和全职员工**，在美国和中国均有职位空缺。\n\n## 🙏 致谢\n\n我们衷心感谢来自清华大学交叉信息研究院（IIIS）与蚂蚁集团 AReaL 团队的各位主要贡献者。\n\n此外，我们还得到了以下机构和个人的宝贵帮助（按字母顺序排列）：\n\n- 蚂蚁研究院数据智能实验室提供的数据支持\n\n- @HwVanICI 在 vLLM、LoRA、NPU 集成等方面的支持\n\n- 香港科技大学的 [Relaxed System Lab](https:\u002F\u002Fgithub.com\u002FRelaxed-System-Lab) 团队，在众多系统相关方面的无缝协作\n\n- [SGLang 团队](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang) 对自定义权重更新功能的支持，以及在 AReaL-lite 开发期间的贡献\n\n- 蚂蚁集团超算技术团队（SCT）在大规模集群运维方面的专业经验\n\n- 特别感谢 @Lyken17 在整个 API 设计过程中提供的宝贵建议\n\n我们同样深深感激社区的所有开创性工作，尤其是 OpenPsi Inc. 
的 [ReaLHF](https:\u002F\u002Fgithub.com\u002Fopenpsi-project\u002FReaLHF) 项目，以及其他杰出项目，包括但不限于 [DeepScaleR](https:\u002F\u002Fgithub.com\u002Fagentica-project\u002Fdeepscaler)、[Open-Reasoner-Zero](https:\u002F\u002Fgithub.com\u002FOpen-Reasoner-Zero\u002FOpen-Reasoner-Zero\u002Ftree\u002Fmain)、[OpenRLHF](https:\u002F\u002Fgithub.com\u002FOpenRLHF\u002FOpenRLHF)、[VeRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl)、[SGLang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang)、[QwQ](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwQ)、[Light-R1](https:\u002F\u002Fgithub.com\u002FQihoo360\u002FLight-R1)，以及 [DAPO](https:\u002F\u002Fgithub.com\u002FBytedTsinghua-SIA\u002FDAPO) 等。\n\n## 📄 引用\n\n```bibtex\n@inproceedings{mei2025real,\n  author       = {Mei, Zhiyu and Fu, Wei and Li, Kaiwei and Wang, Guangju and Zhang, Huanchen and Wu, Yi},\n  title        = {ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation},\n  booktitle    = {Proceedings of the Eighth Conference on Machine Learning and Systems,\n                  MLSys 2025, Santa Clara, CA, USA, May 12-15, 2025},\n  publisher    = {mlsys.org},\n  year         = {2025},\n}\n```\n\n```bibtex\n@misc{fu2025areal,\n      title={AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning},\n      author={Wei Fu and Jiaxuan Gao and Xujie Shen and Chen Zhu and Zhiyu Mei and Chuyi He and Shusheng Xu and Guo Wei and Jun Mei and Jiashu Wang and Tongkai Yang and Binhang Yuan and Yi Wu},\n      year={2025},\n      eprint={2505.24298},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.24298},\n}\n```","# AReaL 快速上手指南\n\nAReaL 是一个由清华大学 IIIS 研究所和蚂蚁集团 AReaL 团队联合开源的大规模**全异步**强化学习（RL）训练系统，专为大型推理模型和智能体（Agent）设计。本指南将帮助您快速搭建环境并运行第一个训练任务。\n\n## 1. 
环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux (推荐 Ubuntu 20.04+)\n*   **Python**: 3.10 - 3.12 (示例中使用了 Python 3.12)\n*   **GPU**: NVIDIA GPU (支持 CUDA)，显存建议 16GB+；或华为 Ascend NPU（需切换至 `ascend` 分支）。\n*   **工具**: `git`, `pip` (推荐使用 `uv` 进行依赖管理以提升速度)。\n*   **网络**: 能够访问 Hugging Face 和 GitHub（若下载模型\u002F数据受阻，建议配置国内镜像或代理）。\n\n> **注意**：AReaL 默认使用 SGLang 作为推理后端，对 CUDA 版本有特定要求。\n\n## 2. 安装步骤\n\n推荐使用 `uv` 工具进行快速安装，它能有效避免源码编译带来的耗时问题。\n\n### 第一步：克隆代码库\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\ncd AReaL\n```\n\n### 第二步：安装 uv 包管理器\n```bash\npip install uv\n```\n\n### 第三步：预安装 flash-attn\n为了避免从源码编译 `flash-attn`（耗时且易出错），请直接安装预编译好的 wheel 包。\n*请根据您的实际 Python 版本和 CUDA 版本选择对应的链接，以下为 Python 3.12 + CUDA 12.8 的示例：*\n\n```bash\nuv pip install \"https:\u002F\u002Fgithub.com\u002Fmjun0812\u002Fflash-attention-prebuild-wheels\u002Freleases\u002Fdownload\u002Fv0.7.16\u002Fflash_attn-2.8.3+cu128torch2.9-cp312-cp312-linux_x86_64.whl\"\n```\n\n### 第四步：同步安装项目依赖\n安装训练所需的核心包及默认的 SGLang 推理后端：\n\n```bash\nuv sync --extra cuda\n```\n\n> **提示**：如果您使用的是华为昇腾（Ascend）NPU，请切换到 `ascend` 分支并参考官方文档中的 NPU 安装指南。\n\n## 3. 
基本使用\n\n安装完成后，您可以直接运行示例脚本。AReaL 会自动下载所需的演示数据集（如 `openai\u002Fgsm8k`）和基础模型（如 `Qwen\u002FQwen2-1.5B-Instruct`）。\n\n### 场景一：单机单卡\u002F多卡运行\n在本地单节点上启动数学推理（GSM8K）的 GRPO 训练：\n\n```bash\npython3 examples\u002Fmath\u002Fgsm8k_rl.py --config examples\u002Fmath\u002Fgsm8k_grpo.yaml scheduler.type=local\n```\n\n### 场景二：分布式集群运行 (Ray)\n若在 Ray 集群上运行（例如 2 个节点，每节点 8 张 GPU），请确保 YAML 配置中的路径指向共享存储，并使用以下命令：\n\n```bash\npython3 examples\u002Fmath\u002Fgsm8k_rl.py --config examples\u002Fmath\u002Fgsm8k_grpo.yaml \\\n  cluster.n_nodes=2 cluster.n_gpus_per_node=8 \\\n  scheduler.type=ray\n```\n\n### 自定义智能体训练\nAReaL 支持通过简单替换 `base_url` 来适配不同的智能体运行时。例如，训练自定义的 OpenClaw 智能体：\n\n```bash\n# 修改配置文件中的 base_url 和 api_key 指向您的 RL 服务即可，无需修改核心代码\npython3 examples\u002Fopenclaw\u002Ftrain.py --config your_config.yaml\n```\n\n---\n**下一步建议**：\n*   查看 [中文文档](https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Fzh\u002F) 获取详细的算法配置说明。\n*   探索 `examples\u002F` 目录下的更多案例，涵盖多轮对话、搜索智能体、视觉语言模型（VLM）等场景。","某电商巨头正在构建一个能自主处理复杂售后纠纷的智能客服 Agent，需要模型具备多步推理和工具调用能力。\n\n### 没有 AReaL 时\n- **训练效率极低**：传统的同步强化学习流程导致 GPU 大量时间在等待环境反馈，训练一个 70B 参数的模型往往需要数周甚至更久，严重拖慢迭代节奏。\n- **架构耦合度高**：若要切换不同的业务场景（如从“退换货”转为“物流查询”），开发人员必须深度修改底层代码以适配新的环境接口，维护成本巨大。\n- **资源利用率不足**：难以充分利用异构算力集群，尤其是在混合部署 NPU 和 GPU 时，常因通信瓶颈导致算力闲置，硬件投资回报率低。\n- **策略收敛困难**：在处理长链条任务时，由于更新延迟高，模型策略难以稳定收敛，经常出现训练崩溃或性能倒退的情况。\n\n### 使用 AReaL 后\n- **训练速度飞跃**：借助全异步架构，AReaL 消除了等待延迟，将同等规模模型的训练周期从数周压缩至数天，实现了行业领先的吞吐速度。\n- **灵活零代码切换**：只需简单替换 `base_url` 即可无缝接入不同的智能体运行环境，无需改动核心代码，让团队能一天内完成新业务场景的适配。\n- **异构算力满负荷**：完美支持 Ascend NPU 与 GPU 混合部署，稳定发挥集群全部算力，显著降低了单位 token 的训练成本。\n- **大规模稳定收敛**：即使在处理复杂的数学推理或多步搜索任务时，AReaL 也能保持训练过程的极度稳定，帮助模型快速达到 SOTA 水平。\n\nAReaL 通过极致的异步并行架构，让企业能以低成本、高效率轻松定制出超越通用大模型的专业级 AI 智能体。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_AReaL_fa45d15e.png","inclusionAI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FinclusionAI_70666e45.jpg","This organization contains the series of open-source projects from Ant Group with dedicated efforts to work 
towards Artificial General Intelligence (AGI).",null,"https:\u002F\u002Finclusion-ai.org","https:\u002F\u002Fgithub.com\u002FinclusionAI",[21,25,29,33],{"name":22,"color":23,"percentage":24},"Python","#3572A5",81.9,{"name":26,"color":27,"percentage":28},"Jupyter Notebook","#DA5B0B",17.9,{"name":30,"color":31,"percentage":32},"Dockerfile","#384d54",0.2,{"name":34,"color":35,"percentage":36},"Shell","#89e051",0,4999,453,"2026-04-07T19:18:14","Apache-2.0",4,"Linux","需要 NVIDIA GPU (默认后端为 SGLang)，支持 CUDA 12.8 (示例中使用了 cu128)；另提供 Ascend NPU 分支支持华为昇腾设备","未说明",{"notes":46,"python":47,"dependencies":48},"1. 推荐使用 uv 进行包管理。2. 为避免源码编译，需预先安装与 Python 版本和 CUDA 版本匹配的 flash-attn 预编译 wheel。3. 支持单机本地运行及 Ray 集群分布式运行。4. 针对华为昇腾 NPU 设备，需切换至专门的 'ascend' 分支并使用特定文档进行安装。5. 训练脚本会自动下载所需的数据集和模型。","3.12 (示例安装命令中指定了 cp312)",[49,50,51,52,53],"flash-attn>=2.8.3","torch>=2.9","SGLang","uv","Ray",[55,56,57],"语言模型","Agent","开发框架",[59,60,61,62,63,64,65,66],"llm","llm-reasoning","machine-learning-systems","mlsys","reinforcement-learning","rl","agent","llm-agent",2,"ready","2026-03-27T02:49:30.150509","2026-04-08T10:01:13.707036",[72,77,82,87,92,96],{"id":73,"question_zh":74,"answer_zh":75,"source_url":76},24037,"训练过程中奖励（Reward）始终为 -5 是什么原因？","这种情况在某些场景下是正常的，可能的原因包括：\n1. 模型输出了某些 LaTeX 代码，导致解析器进入递归解析状态；\n2. CPU 和内存资源不足，导致初始化用于计算奖励的新进程池（ProcessPool）耗时过长。\n建议检查系统资源使用情况以及模型输出内容。","https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fissues\u002F128",{"id":78,"question_zh":79,"answer_zh":80,"source_url":81},24038,"程序卡在 \"Connecting to workers...\" 阶段无法继续怎么办？","这通常是由名称解析（name_resolve）问题引起的，特别是 `fcntl` 可能引入了相关 bug。\n解决方案：\n1. 强烈建议使用 Docker 容器运行实验，这是最稳定且推荐的方式。\n2. 如果必须在本地虚拟环境中运行，可以尝试应用相关的修复补丁（如 PR #27），并运行测试 `pytest -s tests\u002Fdistributed\u002Ftest_nfs_name_resolve.py` 验证名称解析功能是否正常。\n3. 
检查是否配置了正确的环境变量和文件路径。","https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fissues\u002F25",{"id":83,"question_zh":84,"answer_zh":85,"source_url":86},24039,"如何实现具有可变回合数（varying turn number）的多轮代理（Agent）强化学习？","要实现回合数不固定的多轮代理 RL，需要在 Agent 的 `collect_trajectory` 函数中正确构建 `SequenceSample`。具体步骤是将所有回合的状态 - 动作对（或输入 - 输出对）拼接在一起：\n\n```python\n# 打包训练数据示例\npacked_input_ids = []\nprompt_mask = []\npacked_logprobs = []\nseq_no_eos_mask = []\npacked_prompts = prompt_token_ids\nseqlens = []\nfinal_rewards = []\n\nfor i in range(group_size):\n    for state, action in trajs[i]: \n        _tokens = list(state.tokens) + list(action.tokens)\n        _prompt_mask = [0.] * len(state.tokens) + [1.] * len(action.tokens)\n        # 将_tokens, _prompt_mask 等添加到对应的列表中\n        # 更新 seqlens 和其他元数据\n```\n确保每个样本的 `ids` 长度与 `seqlens` 定义的长度一致，即可支持不同样本拥有不同数量的对话回合。","https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fissues\u002F104",{"id":88,"question_zh":89,"answer_zh":90,"source_url":91},24040,"运行异步 PPO 示例脚本时出现 \"TimeoutError: Waiting generation servers timeout\" 错误如何解决？","该错误通常发生在生成服务器未能及时启动或连接超时。请检查以下几点：\n1. 确认 GPU 资源充足且未被其他进程占用；\n2. 检查 `allocation_mode` 配置是否正确匹配当前硬件拓扑（例如单节点多卡应使用类似 `sglang.d4p1m1+d2p2m1` 的模式）；\n3. 增加超时等待时间参数（如果配置支持）；\n4. 查看日志中是否有更早的错误信息（如 CUDA 初始化失败、端口冲突等）；\n5. 尝试减少并发请求数（如降低 `max_concurrent_rollouts` 或 `actor.sglang.max_running_requests`）以减轻负载。","https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fissues\u002F115",{"id":93,"question_zh":94,"answer_zh":95,"source_url":81},24041,"在本地虚拟环境安装时遇到 \"ModuleNotFoundError: No module named 'realhf'\" 或 pip 安装失败怎么办？","此类问题通常源于依赖未正确安装或环境冲突。建议操作如下：\n1. 创建全新的 conda 环境并激活；\n2. 确保已安装必要的构建工具（如 `pybind11`）；\n3. 在项目根目录执行 `pip install -e .` 进行可编辑安装；\n4. 若仍失败，优先改用官方提供的 Docker 镜像，可避免绝大多数环境配置问题；\n5. 
运行测试脚本 `pytest -s tests\u002Fdistributed\u002Ftest_nfs_name_resolve.py` 验证核心模块是否加载成功。",{"id":97,"question_zh":98,"answer_zh":99,"source_url":76},24042,"如何在 Slurm 集群上部署和运行 AReaL 分布式训练任务？","在 Slurm 集群上运行需编写合适的 SBATCH 脚本，关键步骤包括：\n1. 设置 CUDA 路径和环境变量（如 `CUDA_DIR`, `LD_LIBRARY_PATH`, `FLASHINFER_CACHE_DIR`）；\n2. 配置 NCCL 通信参数（如 `NCCL_ALGO=NVLSTree`, `NCCL_IB_HCA=mlx5` 等）；\n3. 获取分配节点列表并设置 head node IP 和端口；\n4. 在所有节点上停止旧 Ray 集群并清理临时文件：\n   ```bash\n   srun --nodes=$worker_num --ntasks=$worker_num ray stop\n   srun --nodes=$worker_num --ntasks=$worker_num rm -rf \u002Ftmp\u002Fray\n   ```\n5. 启动 Ray 头节点和工作节点；\n6. 提交训练脚本并通过 Hydra 配置指定节点数、GPU 数、数据集路径等参数。",[101,106,111,116,121,126,131,136,141,146,151,156,161,166,171,176,181,186,191,196],{"id":102,"version":103,"summary_zh":104,"released_at":105},145598,"v1.0.2","## 发布说明\n\n**在此向本次发布中加入我们的新贡献者致以最诚挚的感谢！本项目的强大之处在于开源社区的集体智慧，而正是你们的工作推动我们不断前进。**\n\n### 🚀 模型与架构更新\n+ Qwen3.5 支持：新增对 Qwen3.5 的密集模型和 MoE（专家混合）变体的支持（Archon 后端，仅支持数据并行）。\n\n+ 策略内蒸馏：引入了对策略内蒸馏的原生支持。\n\n+ 增加了对 Hugging Face 内核以及分层优化器步数的可选支持，并为 FSDP 配备了流式 H2D\u002FD2H 数据传输管道。\n\n### 🛠 基础设施与可扩展性\n+ Docker 与运行时：将 Docker 镜像拆分为专门的 sglang 和 vllm 变体，以支持不同的 PyTorch 版本并加快镜像更新速度。\n\n### 📖 文档与本地化\n+ 双语支持：推出了全面的双语（英\u002F中）文档，包括新的 translate-doc-zh 命令，并修复了 LaTeX 渲染问题。\n\n+ 新指南：新增了在线代理模式训练指南，并修订了现有的在线强化学习训练教程。\n\n## 变更内容\n* 功能（模型）：由 @aoshen524 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F929 中实现，将视觉编码器在 Ulysses SP 节点间进行分片。\n* 文档：通过添加 \u002Fen\u002F 语言前缀修复了损坏的文档链接，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F986 中完成。\n* [功能] 添加策略内蒸馏支持，由 @HwVanICI 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F964 中实现。\n* 修复：在上下文并行设置下保持 PPO 令牌统计的一致性，由 @yash27-lab 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F990 中完成。\n* 升级 GitHub Actions 以兼容 Node 24，由 @salmanmkc 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F993 中完成。\n* 将 GitHub Actions 升级至最新版本，由 @salmanmkc 在 
https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F994 中完成。\n* 杂项：增加对 Python 3.11 的支持（requires-python >=3.11,\u003C3.13），由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F991 中实现。\n* 杂项（CI）：用 pre-commit CI 替代格式检查，并添加 commit-msg 钩子，由 @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F998 中完成。\n* 功能（文档）：添加双语文档及 translate-doc-zh 命令，由 @ZiyiTsang 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F995 中实现。\n* 重构：将子模块导入扁平化，改为使用父包的重新导出，由 @NJX-njx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F996 中完成。\n* 功能（基础设施）：将 Docker 镜像拆分为 sglang 和 vllm 变体，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F985 中完成。\n* 文档：修复双语文档中损坏的 LaTeX 渲染问题，由 @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F1004 中完成。\n* 修复：在 macOS 上实现仅 CPU 支持，由 @zhanghaotong 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F1003 中完成。\n* 功能（Archon）：为 FP32 路由门 GEMM 添加 moe_router_dtype 配置，由 @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F1009 中完成。\n* 重构：将字符串字面量替换为枚举，并修复日志记录问题——…，由 @HT-Yuan 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F1008 中完成。\n* 功能（FSDP）：a","2026-03-17T14:19:29",{"id":107,"version":108,"summary_zh":109,"released_at":110},145599,"v1.0.1","## 发布说明\n\n这是一个补丁版本，修复了 Docker 镜像中的依赖问题，并丰富了 OpenClaw 示例的文档和测试。\n\n## 变更内容\n* fix(config): 修复 OpenClaw 配置中的拼写错误，并由 @fishcrap 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F959 中将 `max_tokens_per_mb` 增加。\n* docs(openclaw): 将 README 中的硬编码管理员密钥替换为占位符，由 @fishcrap 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F967 中完成。\n* feat: 完全支持 MIS\u002FTIS，以稳定部署与训练之间的不匹配，由 @ZiyiTsang 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F930 中实现。\n* refactor(api): 将验证逻辑移至配置的 `__post_init__` 方法中，由 @rchardx 在 
https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F970 中完成。\n* fix(openai-proxy): 在在线模式下，对于空轨迹返回 `None`，由 @fishcrap 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F971 中实现。\n* Ray 置放组重构及多节点推理实例的初步架构，由 @hlyli 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F966 中完成。\n* feat: 添加中文文档，由 @ZiyiTsang 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F969 中完成。\n* fix(api): 将 `SchedulingSpec.ray_placement_strategy` 的类型从 `Literal` 替换为 `str`，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F976 中完成。\n* 更新 README，由 @xssstory 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F974 中完成。\n* test(examples): 添加 OpenClaw 在线强化学习集成测试，由 @fishcrap 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F977 中完成。\n* fix: 将 `torchao` 版本固定为 0.15.0，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F981 中完成。\n* 升级至 v1.0.1，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F982 中完成。\n\n## 新贡献者\n* @hlyli 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F966 中完成了首次贡献。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fcompare\u002Fv1.0.0...v1.0.1","2026-03-04T14:18:58",{"id":112,"version":113,"summary_zh":114,"released_at":115},145600,"v1.0.0","## 🚀 重点概览\n## 发行说明\n\n### **在线强化学习训练**\n- 通过配置 `base_url` 和 `api_key` 即可无缝训练任何智能体，无需修改代码，也无需引入繁重的依赖。\n- 更多详情请参阅 [OpenClaw RL 训练示例](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Ftree\u002Fmain\u002Fexamples\u002Fopenclaw)。\n\n### **Archon 引擎**\n- 一个完全可用、原生支持 PyTorch 的 5D 并行训练引擎。\n- 包含以下特性：\n  - 自动 HF 格式转换\n  - 零气泡流水线技术\n  - torch.compile\n  - FSDP（全分片数据并行）\n  - 选择性激活支持\n\n### **AI 辅助编码**\n- 提供官方命令和技能，以简化开发流程并实现轻松定制。\n\n### **基础设施升级**\n- 从之前的 SPMD 架构过渡到更高效的单控制器架构。\n\n### **uv 安装支持**\n- 通过运行简单命令 `uv sync` 即可轻松搭建训练环境。\n\n## 变更内容\n* feat: 使用 
@rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F739 中提出的 math-verify 替代旧版数学解析模块\n* 添加 @HwVanICI 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F748 中提供的 Ascend NPU 安装说明\n* [Bug Fix] 修复代理模式下的工具兼容性、最大 token 限制及 EOS token 问题，由 @yulangz 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F736 中完成\n* refactor: 修改引擎和控制器以支持使用同一训练器的单控制器模式，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F753 中完成\n* @HwVanICI 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F746 中实现 NPU 上的 VLM 训练\n* refactor: 将设备相关工具移至平台类和 io_struct 中，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F757 中完成\n* Fix: 为 train_controller 实现 get_device_stats() 方法，由 @HwVanICI 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F762 中完成\n* refactor: 在提交方法中采用单一来源生成 task_id，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F759 中完成\n* [Bug Fix] 修正 Camel 示例中错误或缺失的智能体参数，由 @HwVanICI 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F766 中完成\n* feat: 在单控制器模式下使用 `name_resolve` 进行工作节点发现，并修复 perf_tracer，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F764 中完成\n* [Feature] 实现单控制器 XCCL 权重更新，由 @HwVanICI 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F754 中完成\n* feat: 实现 Slurm 调度器，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F767 中完成\n* 为单控制器实现 Ray 调度器，由 @HwVanICI 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F741 中完成\n* refactor: 使用回调机制实现 xccl 权重传输，并在 rollout 过程中避免忙等，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F769 中完成\n* [Testing] 更新 GCP 镜像以加速 CI 测试，由 @nuzant 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F772 中完成\n* refactor: 统一启动器、调度规范和 yaml 
co","2026-03-02T15:04:17",{"id":117,"version":118,"summary_zh":119,"released_at":120},145601,"v1.0.0.rc1","1.0.0 版本的预发布。","2026-02-06T13:59:38",{"id":122,"version":123,"summary_zh":124,"released_at":125},145602,"v0.5.3","## 亮点\n\n这是一个补丁版本，主要目的是提供最新的 Docker 镜像用于测试。\n\n我们将在下一个主要版本中加入文档完善的特性。","2026-01-31T02:06:14",{"id":127,"version":128,"summary_zh":129,"released_at":130},145603,"v0.5.2","## 亮点\r\n\r\n本次发布为补丁版本，主要目的是提供包含 PyTorch 2.9.1、vLLM 0.14.0 和 SGLang 0.5.7 支持的最新 Docker 镜像。\n\n我们将在下一个主要版本中加入文档完善的特性。","2026-01-26T11:18:03",{"id":132,"version":133,"summary_zh":134,"released_at":135},145604,"v0.5.1","## 亮点\n\n这是 v0.5.0 的补丁版本。\n\n+ 新增了一个包含 `math-verify` 和最新版 `ruff` 的 Docker 镜像。\n+ 支持使用 Megatron 引擎的 PPO 评论家模型。\n+ 重构了 FSDP\u002FMegatron 引擎的实现。\n+ 使用 `RTensor`（即原来的 `DistributedBatch`）实现了高效的 RPC 张量传输。\n+ 为 vLLM 添加了束搜索支持。\n\n## 变更内容\n* 修复：修改检查点清理标志，以修复单控制器模式下的 `update_weights_from_disk` 问题，由 @HwVanICI 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F711 中完成。\n* 修复：防止在高数据并行度下 vLLM 服务器出现端口溢出问题（修复 #652），由 @HsiaoTsan 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F653 中完成。\n* 重构：重构训练引擎的高层 API，由 @aaaandychen 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F658 中完成。\n* [修复] 修复 TIR 示例中实验无法正常退出的 bug，由 @nuzant 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F712 中完成。\n* 杂项：在拼接模式下打印更多信息，并处理空工具调用以便于调试，由 @nuzant 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F713 中完成。\n* 杂项：精简 CI 中的测试，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F714 中完成。\n* 重构：在推理引擎中强制执行 task_id 的创建、访问和操作，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F715 中完成。\n* 重构：重新设计 TrainEngine API，使其具有更清晰的抽象层，由 @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F719 中完成。\n* [测试] 为 Megatron 训练引擎添加 SFT\u002FGRPO 集成测试。由 @nuzant 在 
https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F726 中完成。\n* [功能] 为 VLM 训练添加 vLLM 支持，由 @HwVanICI 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F698 中完成。\n* 功能：在 vllm 后端支持束搜索，由 @ZiyiTsang 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F721 中完成。\n* 修复：更新多轮数学测试配置，由 @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F727 中完成。\n* 修复：修正束搜索支持检查中的逻辑错误，由 @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F728 中完成。\n* 功能：为 MegatronEngine 添加 PPO 评论家模型支持，由 @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F729 中完成。\n* 功能：在单控制器模式下实现用于元数据传输的 RTensor，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F731 中完成。\n* 修复：修复多轮代理示例，由 @dhh1995 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F733 中完成。\n* 小幅修复：修复 OpenAI 缓存测试，将其加入 CI 测试套件，并移除 Megatron 引擎中的 OOD 待办事项和修复标记，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F732 中完成。\n* [功能] 为 ascend-vLLM 的单 LoRA 功能更新 XCCL，由 @gursimar 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F679 中完成。\n* 修复：在代理示例中将评估的 group_size 设置为 1，由 @dhh1995 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F737 中完成。\n* 功能：添加 ignore_eos 和 skip_special_tokens 生成参数，由 @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F738 中完成。\n* 杂项：将数据集更新至 3.0.0 或更高版本，用于 ","2025-12-18T09:03:34",{"id":137,"version":138,"summary_zh":139,"released_at":140},145605,"v0.5.0","## 亮点\n\nAReaL 新发布的 v0.5.0 版本引入了两项核心创新：无缝智能体强化学习和单控制器架构：\n\n+ **无缝智能体强化学习**：AReaL 通过兼容 OpenAI 的 API 提供无缝的智能体训练服务。这促进了环境提供商、算法开发者和系统工程师之间的顺畅协作，在复杂的工程工作流中形成零摩擦的流水线，显著提升开发效率和系统可维护性。\n\n+ **单控制器架构**：消除了 SPMD（单程序多数据）模型固有的长尾延迟和数据不平衡问题。这种分层设计增强了推理的可扩展性，实现了细粒度的系统级控制，并在最大限度降低算法开发者代码迁移成本的同时，保持了算法灵活性。\n\n其他变更包括：\n\n+ **性能与可扩展性**：对步骤检测、任务分配逻辑和工作流批处理进行了大规模重构。改进了分布式训练，修复了 NCCL 超时、Gloo 
组屏障以及 FSDP 的词汇并行 logprobs 问题。\n\n+ **模型与硬件支持**：为 Ascend-vLLM 添加了单 LoRA 功能，并改进了视觉-语言模型（VLM）的处理。\n\n+ **修复与优化**：解决了大量与数据加载、奖励超时、交互缓存、进程清理以及工具调用解析相关的 bug。进行了大规模代码重构，以合并重复逻辑、改进类型提示并集中管理资源。项目范围内的代码格式化统一切换至 ruff。\n\n## 未来工作\n\nAReaL 目前支持基本的单控制器模式和智能体强化学习训练流水线。未来的增强功能包括：\n+ 单控制器模式下的优化数据流和分布式启动能力；\n+ 自动扩缩容、故障恢复和高可用性训练；\n+ 在以智能体为中心的场景中提升训练与推理性能。\n\n## 变更内容\n* @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F578 中更新了 Qwen3-VL 的 README 文档。\n* [修复] @fishcrap 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F580 中将配方目录添加到预提交检查中。\n* [修复] @fishcrap 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F579 中降低了奖励超时警告的阈值。\n* [修复] @fishcrap 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F581 中修复了计算 logp 温度的问题。\n* [功能] @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F583 中围绕全局批次重构了步骤检测逻辑。\n* [杂项] @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F585 中延长了等待超时时间，并强化了配置检查。\n* [功能] @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F584 中简化了步骤分配逻辑。\n* [修复] @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F587 中使用后台线程在工作流执行器中提交任务并获取结果。\n* [修复] @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F589 中在 `agenerate` 中复用了 `aiohttp.ClientSession`。","2025-12-10T15:22:07",{"id":142,"version":143,"summary_zh":144,"released_at":145},145606,"v0.4.1","## 变更内容\n* 功能：添加 `raise_timeout` 参数，支持静默等待推理结果，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F547 中实现。\n* 修复示例 `examples\u002Fvlm\u002Fclevr_count_70k_grpo.yaml` 中的批量大小问题，由 @wangruohui 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F549 中完成。\n* 杂项：使用 ruff 格式化数据集和奖励文件夹，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F551 中完成。\n* 重构：将 `rollout\u002Fprepare_batch` 中的 
`should_accept` 参数重命名为 `should_accept_fn`，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F555 中完成。\n* 杂项：删除未计划的实验性功能，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F554 中完成。\n* 功能：添加 grpo 训练器并简化 gsm8k grpo 示例，由 @dhh1995 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F552 中实现。\n* 功能：在推理引擎 API 中添加 `launch_server` 和 `teardown_server` 方法，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F550 中完成。\n* 【重构】重构引擎和示例中 `stats_tracker` 的使用方式，由 @nuzant 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F556 中完成。\n* 重构：允许在 rollout 工作流中传递字符串路径和初始化参数，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F525 中完成。\n* 功能：引入以会话为中心的跟踪 API，由 @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F539 中实现。\n* 文档：添加关于异步强化学习训练的说明，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F558 中完成。\n* 格式化：使用 ruff 格式化示例目录，由 @fishcrap 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F559 中完成。\n* 功能：支持代理服务器和客户端，用于训练与 OpenAI 兼容的智能体，由 @dhh1995 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F500 中实现。\n* 杂项：修改类型注解并对单控制器模式进行小幅修复，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F560 中完成。\n* 文档：添加“性能剖析”指南至最佳实践文档，由 @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F538 中完成。\n* 为 proxy_agent 添加 README 文件，由 @yulangz 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F561 中完成。\n* 杂项：扩展引擎性能监控工具，由 @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F562 中完成。\n* 【功能】为 vLLM 服务器添加暂停\u002F恢复生成功能，由 @fishcrap 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F563 中实现。\n* 文档：更新 AReaL 设计文档，反映当前开发状态，由 @garrett4wade 在 
https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F568 中完成。\n* 文档：更新文档以匹配当前开发状态，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F570 中完成。\n* 重构：扩展分配模式，支持分配命名和组合，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F565 中完成。\n* 功能：使 perf_tracer 与任务层级结构对齐，由 @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F569 中完成。\n* 杂项：添加关于分配模式重大变更的提示，由 @garrett4wade 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F572 中完成。\n* 【修复】修复工作流中的 atrace_session_phase，由 @fishcrap 在 https:\u002F\u002Fgithub.com\u002Fincl","2025-11-14T05:38:33",{"id":147,"version":148,"summary_zh":149,"released_at":150},145607,"v0.4.0","# AReaL v0.4.0 发行说明\n\n我们很高兴地宣布 AReaL v0.4.0 的发布，这是一个重大版本，为 MoE 模型的强化学习训练提供了稳定的基础设施支持。\n\n## 概述\n\n### MoE 训练\n\n尽管我们上个月已将 Megatron 后端作为实验性功能引入，但一些关键问题一直阻碍我们将其作为稳定版本推出。这些挑战包括：\n\n- 推理与训练之间的精度对齐\n- 权重迁移的复杂性\n- 缺乏经过验证的生产环境下的端到端 MoE 模型训练\n\nAReaL v0.4.0 全面解决了这些问题。在我们的实验中，仅使用 6 个 H200 节点，即可运行 Qwen 235B 模型的完全异步智能体强化学习训练（GRPO）。在数天的训练过程中，我们未遇到任何崩溃情况。\n\n从 FSDP 切换到 Megatron，只需在训练脚本中进行 3–5 行代码修改。有关详细指导，请参阅我们的 [大型 MoE 模型微调教程](https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Ftutorial\u002Fmegatron.html)。\n\n### 智能体框架支持\n\n除了稳定的 MoE 训练之外，我们还在扩展对 Camel-AI 和 openai-agents 等智能体框架的原生支持。此前，AReaL 中的可训练智能体被封装在一个 `RolloutWorkflow` 对象中，用户需要手动操作每次 LLM 交互的 token ID。虽然智能体框架可以抽象掉这种复杂性，但它们无法捕获 token ID 或保持 LLM 交互的执行顺序。\n\n为此，AReaL v0.4.0 引入了 `ArealOpenAI`，这是一个可直接替换的客户端，模拟 `AsyncOpenAI` API。该客户端充当智能体代理，能够：\n\n- 秘密捕获来自您智能体的 token ID\n- 维护执行顺序以确保轨迹一致性\n- 支持为单个对话分配奖励\n- 实现跨对话回合的奖励折扣\n\n尽管此功能目前仍处于实验阶段，但我们鼓励用户探索我们的 [最新智能体强化学习文档](https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Ftutorial\u002Fagentic_rl.html)，并尝试使用它。\n\n## 重点亮点\n\n**稳定的 MoE 训练**\n\n- bf16 精度的 Megatron 训练配方，各组件间精度对齐\n- 基于 NCCL 的权重更新\n\n**智能体框架集成**\n- 原生支持 [openai-agents SDK](https:\u002F\u002Fopenai.github.io\u002Fopenai-agents-python\u002F) 和 
[Camel](https:\u002F\u002Fgithub.com\u002Fcamel-ai\u002Fcamel)\n\n**开发者体验**\n- 与现代工具链集成：`ruff` 和 `uv`\n- 简化安装：`uv pip install -e .[all]` 即可安装所有依赖项\n\n**新算法**\n- 新增 GSPO 支持\n\n\n我们感谢您一直以来的支持与反馈。祝您训练顺利！\n\n---\n\n## 变更内容\n* chore: 在 AGENTS.md 中添加全面的智能体操作指南，由 @rchardx 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F440 中完成\n* fix boba_grpo bug，由 @shun001 在 https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F439 中修复\n* 修复 KeyError: \"full_loss_mask\" 错误，并在没有 Ulysses 的情况下同步 boba GRPO YAML 配置文件，由 @garrett4wade 完成","2025-11-06T11:47:44",{"id":152,"version":153,"summary_zh":154,"released_at":155},145608,"v0.3.4.post1","## v0.3.4.post1 Patch Fix\r\n\r\n- Fixed a \"full_loss_mask\" KeyError introduced in #434. The original PR was tested with Ulysses enabled but caused errors when Ulysses was disabled.\r\n- Updated configuration and scripts in `boba_grpo.py` to reproduce legacy results.\r\n\r\n## What's Changed\r\n* chore: add comprehensive agent operations guide to AGENTS.md by @rchardx in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F440\r\n* fix boba_grpo bug by @shun001 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F439\r\n* fix KeyError: \"full_loss_mask\" without ulysses and synchronize the boba GRPO yaml config by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F441\r\n\r\n## New Contributors\r\n* @shun001 made their first contribution in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F439\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fcompare\u002Fv0.3.4...v0.3.4.post1","2025-10-13T01:54:01",{"id":157,"version":158,"summary_zh":159,"released_at":160},145609,"v0.3.4","# AReaL v0.3.4 Release Note\r\n\r\n## Highlights\r\n\r\n+ Support NPU training with the vLLM inference backend\r\n+ New algorithm implementations: RLOO, REINFORCE++, PPO with critic models, RLHF reward 
modeling\r\n+ New RL examples: multi-turn math, Tongyi DeepSearch agent (with nearly zero-code change compared with the official agent implementation), and tool-integrated reasoning\r\n+ Implemented LoRA with FSDP training support\r\n+ Enhanced documentation with hyperparameter explanations\r\n\r\n## What's Changed\r\n* fix: prevent submitting duplicate data for rl evaluation by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F356\r\n* Support Gemma3 models (multimodal) by @rchardx in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F350\r\n* doc: add historical highlights by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F357\r\n* [Bug Fix] Fix a bug in Megatron weight loading from huggingface ckpts by @nuzant in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F358\r\n* chore: add gemma3 model test and some minor changes in scripts by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F343\r\n* FEAT: New chapter in Doc by @ZiyiTsang in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F321\r\n* [Emergency fix] Display error in Doc by @ZiyiTsang in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F359\r\n* chore: set `swanlab.mode` to `disabled` by default by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F363\r\n* Document a new example of examining rollout results in both Transformers and the inference engine by @rchardx in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F361\r\n* [Feature] Add support for Reward Model fine-tuning by @catnanami in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F331\r\n* chore: fix a typo in gsm8k example readme by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F364\r\n* [FIX] save config by @fishcrap in 
https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F365\r\n* Fix typo in Doc by @ZiyiTsang in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F367\r\n* Add swanlab log for StatsLogger by @MayDomine in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F362\r\n* [Feature] Support gradient checkpointing options for MegatronEngine by @nuzant in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F368\r\n* chore: add CLI options in internal repo by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F369\r\n* doc: Add docstrings for engine API by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F370\r\n* Using Dict to replace TensorDict by @rchardx in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F371\r\n* add controller and scheduler api by @dingzhiqiang in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F373\r\n* refactor: unifying normalization for rewards and advantages in PPO by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F372\r\n* feat: add trajectory format checking and support lazy platform initialization. 
by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F374\r\n* doc: refine docstrings of controller API by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F376\r\n* Change the default value and behavior of expert_tensor_parallel_size in ParallelStrategy; Clean up the code by @rchardx in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F378\r\n* doc: add documentation to explain hyper-parameters by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F381\r\n* feat: add tir local example by @mjbmjb in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F360\r\n* [emergent fix] Fix doc gen pre-commit hook by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F383\r\n* Refactor FSDP Engine and its utilities for future expert parallelism by @rchardx in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F384\r\n* Fix missing imports in experimental\u002Fapi\u002Fcli_args.py by @rchardx in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F386\r\n* [FIX] Remove training script requirement for LLM_SERVER_ONLY mode in launcher by @fishcrap in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F387\r\n* Lora Feature by @tutu0038-hk in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F304\r\n* Revert \"Lora Feature\" by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F390\r\n* feat: support NPU and vLLM by @dazhuangzhuang1024 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F351\r\n* feat: support PPO training with critic models by @dhh1995 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F392\r\n* impl distributed batch memory by @dingzhiqiang in 
https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F379\r\n* fix the missing `dataclass` wrapper in vLLMConfig and update cli reference by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F393\r\n* feat: support lora (again) by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F391\r\n* fix: add back areal folder to PYTHONPATH for launcher by @dhh1995 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F396\r\n* feat: support leave-one-out mean and unbiased std estimation for normalization by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F394\r\n* [Feature] Update `ArealOpenAI` APIs, adds an example that finetunes Tongyi-DeepResearch agent. by @nuzant in https:\u002F\u002Fgithub.com\u002Fin","2025-10-11T12:10:12",{"id":162,"version":163,"summary_zh":164,"released_at":165},145610,"v0.3.3","## Release Note\r\n\r\nWe're excited to announce AReaL v0.3.3, which stabilizes training for larger dense models with extended context lengths. 
This release includes essential improvements and new algorithms to deliver the best out-of-the-box experience for users.\r\n\r\n**Enhanced Parallelism Support**\r\n- Added hybrid parallelism with FSDP backend: tensor parallelism, Ulysses sequence parallelism, and sequence-parallel activation checkpointing\r\n  - **Zero conversion required** – use ANY Hugging Face model directly\r\n  - **Memory efficient** – support for long context lengths with reduced GPU activation memory\r\n  - Usage: `allocation_mode=sglang:d8 + fsdp:d2c2t2`\r\n\r\n**New Algorithm Features**\r\n- PPO with clip higher\r\n- Dynamic sampling with variable batch sizes\r\n- Over-length penalty mechanism\r\n- Decoupled mean\u002Fstd computation for advantage estimation\r\n\r\n**Hardware Compatibility**\r\n- We are ready to support additional hardware backends beyond NVIDIA GPUs (more announcements coming soon!)\r\n\r\n\r\n## What's Changed\r\n* FEAT: Decoupled CLIP ratio (DAPO Trick-I) by @ZiyiTsang in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F285\r\n* Add agent-related logging logic in ppo actor & Update notebook example by @samjia2000 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F290\r\n* FEAT: Dynamic_Sampling(DAPO Trick-II) by @ZiyiTsang in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F294\r\n* refactor: refactor examples structure, make fsdp and ulysses use independent device meshes by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F297\r\n* doc: update the doc of using ulysses sp by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F298\r\n* [TEST] megatron dcp save load test by @fishcrap in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F306\r\n* fix doc: update package installation method within container by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F307\r\n* 
refactor: group examples according to application by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F305\r\n* fix: add the missing group argument in data redistribution by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F311\r\n* use sourceTensor.detach().clone() rather than torch.tensor(sourceTensor) by @CormickKneey in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F308\r\n* add countdown example by @samjia2000 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F299\r\n* Support tensor parallelism for FSDP engine by @rchardx in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F309\r\n* FEAT: Overlong_Reward_Penalty (DAPO Trick-III) by @ZiyiTsang in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F295\r\n* chore: add engine IDs to differentiate different ranks in logs by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F314\r\n* In remote engine, find sglang server using experiment name and trial name by @samjia2000 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F301\r\n* [Bug Fix] Fix server_idx initialization in RemoteSGLangEngine by @nuzant in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F318\r\n* chore: Fix signatures of `rollout.initialize` in examples by @nuzant in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F319\r\n* chore: amend the `should_accept` argument in `rollout_batch` with docs by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F316\r\n* fix: The shape of attention_mask itself gets changed when removing pads by @jwhj in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F325\r\n* Fix the gradient norm clipping for FSDP engine by @rchardx in 
https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F320\r\n* chore: raise error when using slurm with apptainer and images are not specified. by @nuzant in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F329\r\n* Apply sequence parallel to LayerNorm\u002FRMSNorm layers by @rchardx in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F330\r\n* chore: add ci to close stale issues by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F332\r\n* Import missing AllocationMode by @rchardx in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F333\r\n* [Feat] Add device agnostic feature by @lowdy1 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F327\r\n* Update pre-commit hooks and rerun against all the files by @rchardx in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F334\r\n* Decouple the mean&std advantage normalization (Trick Dr. 
GRPO and LitePPO) by @ZiyiTsang in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F303\r\n* fix fsdp engine: qwen3 TP q\u002Fk norm wrapping, gradient clipping, the scale of grad norm, and sft scripts by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F335\r\n* chore: preventing CI to close stale PRs by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F337\r\n* fix: revert the order of evaluation and recover in entrypoints, fix all unit-tests by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F323\r\n* [device agnostic] fix examples with the usage of `current_platform` by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F338\r\n* refactor: move `should_accept` to the `submit` method instead of the `wait` method by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fp","2025-09-17T03:16:08",{"id":167,"version":168,"summary_zh":169,"released_at":170},145611,"v0.3.2","## Highlights\r\n\r\n* **Enhanced Documentation with Best Practices**: We've expanded our documentation to include essential best practices for debugging agents and algorithms in isolation, plus guidance on handling OOM errors using Ulysses sequence parallelism. [Explore the updated docs](https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002F) to get the most out of your workflows!\r\n\r\n* **Intuitive Allocation Mode Construction**: We're excited to introduce a new approach to building allocation modes that's both incredibly intuitive and highly expressive. 
This foundation will enable you to specify complex parallel strategies for Megatron training, with full stable Megatron support coming in an upcoming release.\r\n\r\n## What's Changed\r\n* [Doc] add best practices doc, including debugging and handling OOM by @fishcrap in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F287\r\n* doc: fix important and note section format by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F288\r\n* [Bug Fix] Fix bugs in FSDP ulysses sequence parallel and megatron engine. by @nuzant in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F292\r\n* bump to v0.3.2 by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F293\r\n\r\n## New Contributors\r\n* @fishcrap made their first contribution in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F287\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fcompare\u002Fv0.3.1...v0.3.2","2025-09-03T13:08:19",{"id":172,"version":173,"summary_zh":174,"released_at":175},145612,"v0.3.1","## Release Note\r\n\r\nAReaL has been refactored from the legacy `realhf` codebase to the new `areal` codebase. These two directories are now independent, and our future development will focus primarily on the lightweight `areal` directory.\r\n\r\nMajor changes in v0.3.1 for the `areal` directory:\r\n+ Added support for RL with Megatron 5D parallelism based on Megatron Core 0.13.1. We can now fine-tune large MoE models with AReaL. We also optimized weight loading and saving of Megatron models to the Hugging Face format, achieving 20x and 5x speedup for loading and saving respectively.\r\n+ Added support for writing agentic RL workflows with the OpenAI client. 
Writing agentic RL is as easy as writing a standard agent!\r\n+ Added support for Ulysses sequence parallelism with FSDP to reduce peak memory usage.\r\n+ Added support for dp-attention, cross-node TP, and expert parallelism with SGLang.\r\n+ Added support for automatic failover.\r\n+ Created Jupyter notebook tutorials.\r\n\r\n\r\n## What's Changed\r\n* Warning in the doc: How to run a synchronous configuration asynchronously. by @xssstory in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F222\r\n* chore: update stream notebook, issue template, and contribution guide by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F227\r\n* [Feature] [refactor] Slightly refactor inference engine IO data structures by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F230\r\n* [doc] Update gsm8k example hyperparameters by @EnderXie23 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F228\r\n* [fix] Fix the unit-tests hanging bug by terminating the rollout thread by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F231\r\n* ci: migrate to isolated runners by @futrime in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F229\r\n* [feature] support fault recovery and rollout-only evaluation by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F234\r\n* [fix] Fix a minor iteration logic when using group_adv_norm by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F225\r\n* [fix] Fix remote name error in gsm8k_grpo.yaml by @EnderXie23 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F235\r\n* test(areal): SFT integration tests by @futrime in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F233\r\n* Update requirements.txt by @garrett4wade in 
https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F237\r\n* test(grpo): add GRPO integration tests by @futrime in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F239\r\n* [fix] Fix the name mismatch error of NCCL weight update for VLM and timeout error for computing rewards. by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F236\r\n* [Fix] Fix the launcher errors of Ray\u002FSLURM by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F242\r\n* chore: add autoflake CI by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F245\r\n* [fix] Add verbose messages for CI pytest by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F247\r\n* [feat] Fix rollout completion order and allow stats logging during workflow execution. by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F246\r\n* feat: support openai-compatible rollout and add an unittest for prepare_mb_list by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F248\r\n* refactor: remove areal's dependency on realhf by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F249\r\n* [FEAT] Support Variable Shape of Multi-Modal Inputs for VLM Training by @JamesKrW in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F244\r\n* [fix] remove the dependency of realhf in areal by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F252\r\n* doc: add for writing workflows with the openai-compatible client by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F254\r\n* chore: highlight wechat in readme by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F255\r\n* [experimental] [feat] add 
megatron checkpointer and accelerate megatron weight loading by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F253\r\n* fix: fix incorrect imports from realhf that causes statistics naming error by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F262\r\n* [experimental] feat: megatron 5d parallel forward, reliable reward process executor, max length of dataset by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F263\r\n* add search agent jupyter notebook example by @samjia2000 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F264\r\n* fix: fix ci unit-test after gh runner recovers by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F268\r\n* fix: replace `LLMRequest` with `ModelRequest` in the notebook by @ZiyiTsang in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F271\r\n* fix: replace `LLMRequest` with `ModelRequest` and format the asearcher notebook by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F272\r\n* fix: fix the parsing logic of LLM_SERVER_ONLY allocation mode by @GurrenLagann97 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F265\r\n* feat: support sglang cross-node TP and dp-attention with slu","2025-08-29T07:57:19",{"id":177,"version":178,"summary_zh":179,"released_at":180},145613,"v0.3.0-lite.post2","## What's Changed\r\n* [fix] Fix the NCCL parameter synchronization bug with slurm and add more logging messages in workflow executor. 
by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F215\r\n* [fix] Fix a slurm launcher issue by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F219\r\n* [CRITICAL] Fix forward bug of LLM and VLM by @antoinegg1 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F218\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fcompare\u002Fv0.3.0-lite.post1...v0.3.0-lite.post2","2025-08-07T04:25:21",{"id":182,"version":183,"summary_zh":184,"released_at":185},145614,"v0.3.0-lite.post1","## What's Changed\r\n* [fix] Fix the multi-turn example and the abnormal loss of LLM training by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F214","2025-08-05T06:41:52",{"id":187,"version":188,"summary_zh":189,"released_at":190},145615,"v0.3.0-lite","## Introducing AReaL-lite\r\n\r\nOur new release AReaL-lite is a **light-weight** and **algorithm-first** codebase that prioritizes a better development experience for AI researchers. As a result, AReaL-lite delivers most AReaL functionality while maintaining its high performance with far fewer lines of code. This allows users to build their own **agentic** training workflows with minimal effort.\r\n\r\nWith 80% fewer lines of code, AReaL-lite maintains 90% of AReaL's high performance and core functionality. Check out [our AReaL-lite design doc](\u002Fareal\u002FREADME.md) and [the quickstart guide](https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Ftutorial\u002Fquickstart.html) to begin your journey with **AReaL-lite**!\r\n\r\n## Future Work\r\n\r\nAReaL-lite serves as the first phase in AReaL's broader refactoring initiative. It functions both as a standalone training library with intuitive interfaces and as the foundation for AReaL's future core API definitions. 
The plan is to transform AReaL's current worker-based architecture into an algorithm-first architecture similar to AReaL-lite, where AReaL will **extend** AReaL-lite's APIs and implementations to support additional backends for efficient large-scale training.\r\n\r\n","2025-08-01T15:12:08",{"id":192,"version":193,"summary_zh":194,"released_at":195},145616,"v0.3.0","## Milestone Release v0.3.0\r\n\r\n+ Support asynchronous RL training with decoupled PPO loss, rollout interruption, and staleness control.\r\n+ Support Qwen3 training.\r\n+ Refactor and simplify Ray-based launching.\r\n+ Add GitHub Pages documentation and tutorials.\r\n\r\n## What's Changed\r\n\r\n* Delete .github\u002FPULL_REQUEST_TEMPLATE directory by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F46\r\n* Support asynchronous RL training, Qwen3, and the latest SGLang by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F47\r\n* Update issue templates by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F48\r\n* [Feature & Doc & Bug Fix] Add docs, simplified ray-based scripts, and fix issues to stabilize asynchronous experiments by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F52\r\n* [Feature] Amend yaml configurations for Ray experiments by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F53\r\n* [CI] Add github page CI by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F54\r\n* [Doc] Fix doc by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F55\r\n* [CI] Fix doc CI by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F56\r\n* [CI] Fix doc CI again by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F57\r\n* [CI] Try to fix doc CI by @garrett4wade in 
https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F58\r\n* [Feature] Support behavior importance weight capping and update evaluation scripts by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F59\r\n* [Feature] Create docs and examples for multi-turn agent RL by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F60\r\n* [Feature] Add link to documentation in README by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F61\r\n* [Doc & Fix] Simplify the environment setup procedure by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F62\r\n* [Fix] Fix tutorial async_ppo script and doc structure by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F63\r\n* [Doc] Fix documentation for using Docker containers and customized agents by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F64\r\n* [Doc] Add doc for reproducing benchmark results by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F65\r\n* [Doc] add dataset customization docs by @GurrenLagann97 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F66\r\n* [Doc] Add doc for rollout workers by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F67\r\n* [Doc] Add customization guide for new algorithms. by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F68\r\n* [Doc] Mark the equivalent between zero-staleness and synchronous PPO. 
by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F69\r\n* [Feature] Update Dockerfile to v0.3.0 by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F70\r\n* [Doc] Add verl benchmark scripts by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F71\r\n* [Doc] Add doc for reproducing released models by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F73\r\n* [Doc] Update README. by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F74\r\n* add hf links in readme and blog by @GurrenLagann97 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F75\r\n* [Doc] Fix README links. by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F77\r\n* update blog to discuss staleness by @samjia2000 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F76\r\n* [Doc] Fix README typos and move multi-turn training curve to doc by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F78\r\n* Switch dataset path \u002F model path to HF location to ease community usage by @Lyken17 in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F82\r\n* Fix formatting. 
by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F90\r\n* [Feature] Add multi-turn agent example by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F72\r\n* [Bug] Suppress the SGLang patch error in Ray workers by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F93\r\n* [Bug] Fix the dependency of a virtual environment for sympy==1.12 by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F92\r\n* [Doc] Add WeChat QR code by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F99\r\n* [Doc] fix wandb mode typo by @garrett4wade in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F100\r\n\r\n## New Contributors\r\n* @xssstory made their first contribution in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F16\r\n* @Ximingwang-09 made their first contribution in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F32\r\n* @HollowMan6 made their first contribution in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F38\r\n* @GurrenLagann97 made their first contribution in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F66\r\n* @samjia2000 made their first contribution in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F76\r\n* @Lyken17 made their first contribution in https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fpull\u002F82\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fcompare\u002Fv0.2.0...v0.3.0","2025-06-10T08:05:10",{"id":197,"version":198,"summary_zh":199,"released_at":200},145617,"v0.2.0","### Our milestone release, **AReaL-boba** 🎉\r\n\r\n## Features\r\n* Quickstart by default yaml config and commandline overrides. 
Check our [updated tutorial](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fblob\u002Fmain\u002Fexamples\u002FREADME.md)!\r\n* Full SGLang support and other system optimizations for 1.5x faster RL training.\r\n* SOTA 7B math reasoning: 61.9 AIME24 & 48.3 AIME25\r\n* 200-sample 32B tuning match QwQ on AIME24\r\n\r\nWe fully open-source all code, model, and data. Check our [technical blog](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL\u002Fblob\u002Fmain\u002Fblog\u002FAReaL_v0_2.md) for more details!\r\n\r\n","2025-03-31T00:50:26",[202,213,221,229,237,246],{"id":203,"name":204,"github_repo":205,"description_zh":206,"stars":207,"difficulty_score":208,"last_commit_at":209,"category_tags":210,"status":68},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[56,57,211,212],"图像","数据工具",{"id":214,"name":215,"github_repo":216,"description_zh":217,"stars":218,"difficulty_score":208,"last_commit_at":219,"category_tags":220,"status":68},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 
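The decoupled PPO loss and behavior importance weight capping mentioned in the v0.3.0 notes (PRs #47 and #59) can be illustrated with a minimal NumPy sketch. This is an illustrative reading of the general technique, not AReaL's actual API: the function name, argument layout, and the `behav_cap` default are assumptions. The idea is that the clipped PPO ratio is taken against a recent "proximal" policy rather than the possibly stale behavior policy that generated the rollout, while a separate capped importance weight corrects for that staleness.

```python
import numpy as np

def decoupled_ppo_loss(logp_cur, logp_prox, logp_behav, adv,
                       clip_eps=0.2, behav_cap=2.0):
    """Illustrative decoupled PPO objective for asynchronous RL (sketch).

    logp_cur:   log-probs under the policy being optimized
    logp_prox:  log-probs under a recent "proximal" policy
    logp_behav: log-probs under the (possibly stale) behavior policy
    adv:        advantage estimates
    """
    # Clipped PPO ratio is computed against the proximal policy.
    ratio = np.exp(logp_cur - logp_prox)
    # Capped behavior importance weight corrects for rollout staleness.
    behav_w = np.minimum(np.exp(logp_prox - logp_behav), behav_cap)
    surrogate = np.minimum(ratio * adv,
                           np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv)
    return -np.mean(behav_w * surrogate)
```

When all three log-probabilities coincide (fully on-policy data), both the ratio and the behavior weight equal 1 and the expression reduces to the standard policy-gradient surrogate.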
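The staleness control from the v0.3.0 notes, and the documented equivalence between zero-staleness training and synchronous PPO (PR #69), can be sketched as a simple buffer filter. The function name, the sample layout, and the `max_staleness` default are hypothetical, chosen only to illustrate the mechanism.

```python
def filter_stale(rollouts, current_version, max_staleness=4):
    """Keep only rollouts generated by a sufficiently recent policy version.

    With max_staleness=0, only rollouts from the current policy survive,
    which recovers fully synchronous PPO behavior.
    """
    return [r for r in rollouts
            if current_version - r["policy_version"] <= max_staleness]

# Example: with the current policy at version 10, a rollout from version 3
# is dropped under max_staleness=4, and everything but version-10 rollouts
# is dropped under max_staleness=0.
rollouts = [{"policy_version": v} for v in (10, 8, 3)]
kept = filter_stale(rollouts, current_version=10, max_staleness=4)
sync = filter_stale(rollouts, current_version=10, max_staleness=0)
```

Bounding staleness this way trades a little off-policy bias for throughput: the trainer never blocks on the newest rollouts, but no sample is allowed to lag the learner by more than the configured number of policy updates.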