[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-apexrl--Diff4RLSurvey":3,"tool-apexrl--Diff4RLSurvey":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",153609,2,"2026-04-13T11:34:59",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":76,"owner_website":78,"owner_url":79,"languages":76,"stars":80,"forks":81,"last_commit_at":82,"license":83,"difficulty_score":84,"env_os":85,"env_gpu":85,"env_ram":85,"env_deps":86,"category_tags":89,"github_topics":76,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":91,"updated_at":92,"faqs":93,"releases":94},7109,"apexrl\u002FDiff4RLSurvey","Diff4RLSurvey","This repository contains a collection of resources and papers on Diffusion Models for RL, accompanying the paper \"Diffusion Models for Reinforcement Learning: A Survey\"","Diff4RLSurvey 是一个专注于“扩散模型在强化学习中应用”的开源资源库，旨在为研究人员和开发者提供该前沿领域的系统性指引。随着扩散模型在图像生成领域取得巨大成功，如何将其强大的序列建模能力迁移到复杂的决策任务中，成为学术界关注的热点，但相关研究分散且难以追踪。Diff4RLSurvey 通过整理配套综述论文《Diffusion Models for Reinforcement Learning: A 
Survey》，有效解决了这一信息碎片化问题。\n\n该资源库不仅收录了核心综述，还精心分类整理了大量关键论文与代码实现，涵盖离线强化学习、在线强化学习、模仿学习、轨迹生成及数据增强等多个子方向。其独特亮点在于构建了从理论规划到行为合成的完整知识图谱，帮助用户快速定位如 Diffuser、AdaptDiffuser 等经典算法的最新进展。无论是希望深入探索序列决策机制的科研人员，还是寻求将生成式 AI 应用于机器人控制、游戏策略等场景的算法工程师，都能从中获得宝贵的参考依据，高效把握技术演进脉络。","# Diffusion Models for Sequential Decision-Making: A Survey\nThis repository contains a collection of resources and papers on ***Diffusion Models*** for ***Sequential Decision-Making***.\n\n:rocket: Please check out our survey paper [Diffusion Models for Reinforcement Learning: A Survey](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.01223)\n\n![image info](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fapexrl_Diff4RLSurvey_readme_74d343a548e9.png)\n\n## Table of Contents\n- [Diffusion Models for Sequential Decision-Making: A Survey](#diffusion-models-for-sequential-decision-making-a-survey)\n  - [Table of Contents](#table-of-contents)\n  - [Papers](#papers)\n    - [Offline Reinforcement Learning](#offline-reinforcement-learning)\n    - [Online Reinforcement Learning](#online-reinforcement-learning)\n    - [Imitation Learning](#imitation-learning)\n    - [Trajectory Generation](#trajectory-generation)\n    - [Data Augmentation](#data-augmentation)\n  - [Citation](#citation)\n\n## Papers\n\n### Offline Reinforcement Learning\n\n- **Planning with Diffusion for Flexible Behavior Synthesis**, ICML 2022. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.09991)] [[code](https:\u002F\u002Fgithub.com\u002Fjannerm\u002Fdiffuser)]\n\n- **Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning**, ICLR 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2208.06193)] [[code](https:\u002F\u002Fgithub.com\u002Fzhendong-wang\u002Fdiffusion-policies-for-offline-rl)] \n\n- **Offline Reinforcement Learning via High-fidelity Generative Behavior Modeling**, ICLR 2023. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.14548)] [[code](https:\u002F\u002Fgithub.com\u002Fchendrag\u002Fsfbc)]\n\n- **Is Conditional Generative Modeling all you need for Decision-Making?**, ICLR 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.15657)] [[code](https:\u002F\u002Fgithub.com\u002Fxcvil\u002Fdecision-diffuser\u002Ftree\u002Fmain\u002Fcode)]\n\n- **AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners**, ICML 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.01877)] [[code](https:\u002F\u002Fgithub.com\u002FLiang-ZX\u002Fadaptdiffuser)]\n\n- **Metadiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL**, ICML 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.19923)]\n\n- **Hierarchical Diffusion for Offline Decision Making**, ICML 2023. [[paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=55kLa7tH9o)] [[code](https:\u002F\u002Fgithub.com\u002Fewanlee\u002FHDMI)]\n\n- **Contrastive Energy Prediction for Exact Energy-guided Diffusion Sampling in Offline Reinforcement Learning**, ICML 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.12824)] [[code](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002Fcep-energy-guided-diffusion)]\n\n- **Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.15629v2)] [[code](https:\u002F\u002Fgithub.com\u002Fezhang7423\u002Flanguage-control-diffusion)]\n\n- **IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.10573)] [[code](https:\u002F\u002Fgithub.com\u002Fphilippe-eecs\u002Fidql)]\n\n- **Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning**, NeurIPS 2023. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.18459)] [[code](https:\u002F\u002Fgithub.com\u002Ftinnerhrhe\u002FMTDiff)]\n\n- **EDGI: Equivariant Diffusion for Planning with Embodied Agents**, NeurIPS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.12410)]\n\n- **Extracting Reward Functions from Diffusion Models**, NeurIPS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.01804)]\n\n- **Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning?**, NeurIPS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.07837)]\n\n- **Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement**, NeurIPS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.07055)] [[code](https:\u002F\u002Fgithub.com\u002FKaffaljidhmah2\u002FRCGDM\u002Ftree\u002Fmain)]\n\n- **Refining Diffusion Planner for Reliable Behavior Synthesis by Automatic Detection of Infeasible Plans**, NeurIPS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.19427)] [[code](https:\u002F\u002Fgithub.com\u002Fleekwoon\u002Frgg)]\n\n- **SafeDiffuser: Safe Planning with Diffusion Probabilistic Models**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.00148)]\n\n- **Efficient Diffusion Policies for Offline Reinforcement Learning**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.20081)] [[code](https:\u002F\u002Fgithub.com\u002Fsail-sg\u002Fedp)]\n\n- **MADiff: Offline Multi-agent Learning with Diffusion Models**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17330)] [[code](https:\u002F\u002Fgithub.com\u002Fzbzhu99\u002Fmadiff)]\n\n- **Beyond Conservatism: Diffusion Policies in Offline Multi-agent Reinforcement Learning**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.01472)]\n\n- **Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching**, CoRL 2023. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.14079)] [[code](https:\u002F\u002Fgithub.com\u002Fhjsuh94\u002Fscore_po)]\n\n- **Value function estimation using conditional diffusion models for control**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.07290)]\n\n- **Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.04875)]\n\n- **Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.04726)]\n\n- **Diffusion Policies as Multi-Agent Reinforcement Learning Strategies**, ICANN 2023. [[paper](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-031-44213-1_30)]\n\n- **DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.05333)] [[code](https:\u002F\u002Fgithub.com\u002Ffelix-thu\u002FDiffCPS)]\n\n- **Score Regularized Policy Optimization through Diffusion Behavior**, ICLR 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.07297)] [[code](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002Fsrpo)]\n\n- **Adaptive Online Replanning with Diffusion Models**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.09629)]\n\n- **AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.02054)] [[code](https:\u002F\u002Fgithub.com\u002Faligndiff\u002Faligndiff.github.io)]\n\n- **SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution**, CVPR 2024. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.11598)] [[website](https:\u002F\u002Fskilldiffuser.github.io\u002F)]\n\n- **Learning a Diffusion Model Policy from Rewards via Q-score Matching**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.11752)]\n\n- **Simple Hierarchical Planning with Diffusion**, ICLR 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.02644)]\n\n- **Reasoning with Latent Diffusion in Offline Reinforcement Learning**, ICLR 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.06599)]\n\n- **Efficient Planning with Latent Diffusion**, ICLR 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.00311)]\n\n- **Contrastive Diffuser: Planning Towards High Return States via Contrastive Learning**, arXiv 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.02772)]\n\n- **DMBP: Diffusion model-based predictor for robust offline reinforcement learning against state observation perturbations**, ICLR 2024. [[paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=ZULjcYLWKe)] [[code](https:\u002F\u002Fgithub.com\u002Fzhyang2226\u002FDMBP)]\n\n- **Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning**, arXiv 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.04080)] [[code](https:\u002F\u002Fgithub.com\u002Fruoqizzz\u002Fentropy-offlineRL)]\n\n- **Diffusion World Model**, arXiv 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.03570)]\n\n- **Diffusion World Models**, OpenReview 2024. [[paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=bAXmvOLtjA)]\n\n- **Policy-Guided Diffusion**, arXiv 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.06356)] [[code](https:\u002F\u002Fgithub.com\u002FEmptyJackson\u002Fpolicy-guided-diffusion)]\n\n### Online Reinforcement Learning\n\n- **Policy Representation via Diffusion Probability Model for Reinforcement Learning**, arXiv 2023. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13122)]\n\n- **Boosting Continuous Control with Consistency Policy**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.06343)]\n\n- **Diffusion Reward: Learning Rewards via Conditional Video Diffusion**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.14134)] [[website](https:\u002F\u002Fdiffusion-reward.github.io\u002F)] [[code](https:\u002F\u002Fgithub.com\u002FTaoHuang13\u002Fdiffusion_reward)]\n\n- **ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories**, OpenReview 2024. [[paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=Ng7OYC3PT8)]\n\n### Imitation Learning\n\n- **Imitating Human Behaviour with Diffusion Models**, ICLR 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.10677)] [[code](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fimitating-human-behaviour-w-diffusion)]\n\n- **Diffusion Policy: Visuomotor Policy Learning via Action Diffusion**, RSS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.04137)] [[code](https:\u002F\u002Fgithub.com\u002Freal-stanford\u002Fdiffusion_policy)]\n\n- **Goal-Conditioned Imitation Learning using Score-based Diffusion Policies**, RSS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.02532)] [[code](https:\u002F\u002Fgithub.com\u002Fintuitive-robots\u002Fbeso)]\n\n- **To the Noise and Back: Diffusion for Shared Autonomy**, RSS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.12244)] [[code](https:\u002F\u002Fgithub.com\u002Fripl\u002Fdiffusion-for-shared-autonomy)]\n\n- **DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics**, RAL 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.02438)]\n\n- **Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition**, CoRL 2023. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.14535)] [[code](https:\u002F\u002Fgithub.com\u002Freal-stanford\u002Fscalingup)]\n\n- **XSkill: Cross Embodiment Skill Discovery**, CoRL 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.09955)]\n\n- **ChainedDiffuser: Unifying Trajectory Diffusion and Keypose Prediction for Robotic Manipulation**, CoRL 2023. [[paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=W0zgY2mBTA8)] [[code](https:\u002F\u002Fgithub.com\u002Fzhouxian\u002Fchained-diffuser)]\n\n- **PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play**, CoRL 2023. [[paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=afF8RGcBBP)]\n\n- **Generative Skill Chaining: Long-Horizon Skill Planning with Diffusion Models**, CoRL 2023. [[paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=HtJE9ly5dT)] [[code](https:\u002F\u002Fgithub.com\u002Fgenerative-skill-chaining\u002Fgsc-code)]\n\n- **Multimodal Diffusion Transformer for Learning from Play**, CoRL 2023. [[paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=nvtxqMGpn1)]\n\n- **GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields**, CoRL 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.16891)] [[code](https:\u002F\u002Fgithub.com\u002FYanjieZe\u002FGNFactor)]\n\n- **Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.01849)] [[code](https:\u002F\u002Fgithub.com\u002Flostxine\u002Fcrossway_diffusion)]\n\n- **Diffusion Co-Policy for Synergistic Human-Robot Collaborative Tasks**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.12171)] [[code](https:\u002F\u002Fgithub.com\u002Feleyng\u002Fdiffusion_copolicy)]\n\n- **Compositional Foundation Models for Hierarchical Planning**, NeurIPS 2023. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.08587)] [[code](https:\u002F\u002Fgithub.com\u002Fanuragajay\u002Fhip\u002Ftree\u002Fmain)]\n\n- **Generating Behaviorally Diverse Policies with Latent Diffusion Models**, NeurIPS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.18738)]\n\n- **NoMaD: Goal Masking Diffusion Policies for Navigation and Exploration**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.07896)] [[code](https:\u002F\u002Fgithub.com\u002Frobodhruv\u002Fvisualnav-transformer)]\n\n- **Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.10639)]\n\n- **Imitation Learning from Purified Demonstrations**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.07143)]\n\n- **Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.01097)]\n\n- **Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning**, arXiv 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.17768)]\n\n- **3D Diffusion Policy**, arXiv 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.03954)] [[website](https:\u002F\u002F3d-diffusion-policy.github.io)] [[code](https:\u002F\u002Fgithub.com\u002FYanjieZe\u002F3D-Diffusion-Policy)]\n\n- **Large-Scale Actionless Video Pre-Training via Discrete Diffusion for Efficient Policy Learning**, arXiv 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.14407)] [[website](https:\u002F\u002Fvideo-diff.github.io\u002F)]\n\n- **SculptDiff: Learning Robotic Clay Sculpting from Humans with Goal Conditioned Diffusion Policy**, arXiv 2024. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.10401)] [[website](https:\u002F\u002Fsites.google.com\u002Fandrew.cmu.edu\u002Fimitation-sculpting\u002Fhome)] [[code](https:\u002F\u002Fgithub.com\u002Falison-bartsch\u002FSculptDiff)]\n\n- **Subgoal Diffuser: Coarse-to-fine Subgoal Generation to Guide Model Predictive Control for Robot Manipulation**, ICRA 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.13085)] [[website](https:\u002F\u002Fsites.google.com\u002Fview\u002Fsubgoal-diffuser-mpc)]\n\n- **Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation**, CVPR 2024, [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.03890)] [[code](https:\u002F\u002Fgithub.com\u002Fdyson-ai\u002Fhdp)] [[website](https:\u002F\u002Fyusufma03.github.io\u002Fprojects\u002Fhdp\u002F)]\n\n- **Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation**, CVPR 2024, [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.02685)] [[code](https:\u002F\u002Fgithub.com\u002Ftomato1mule\u002Fdiffusion_edf)] [[website](https:\u002F\u002Fsites.google.com\u002Fview\u002Fdiffusion-edfs)]\n\n### Trajectory Generation\n\n- **MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model**, arXiv 2022. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2208.15001)] [[code](https:\u002F\u002Fgithub.com\u002Fmingyuan-zhang\u002FMotionDiffuse)]\n  \n- **Human Motion Diffusion Model**, ICLR 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.14916)] [[code](https:\u002F\u002Fgithub.com\u002Fguytevet\u002Fmotion-diffusion-model)]\n  \n- **Executing your Commands via Motion Diffusion in Latent Space**, CVPR 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.04048)] [[code](https:\u002F\u002Fgithub.com\u002Fchenfengye\u002Fmotion-latent-diffusion)]\n  \n- **MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis**, CVPR 2023. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.04495)] [[code](https:\u002F\u002Fgithub.com\u002FOFA-Sys\u002FMoFusion)]\n  \n- **ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model**, ICCV 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.01116)] [[code](https:\u002F\u002Fgithub.com\u002Fmingyuan-zhang\u002FReMoDiffuse)]\n  \n- **MotionDiffuser: Controllable Multi-Agent Motion Prediction using Diffusion**, CVPR 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.03083)]\n\n- **Learning Universal Policies via Text-Guided Video Generation**, NeurIPS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.00111)]\n\n- **EquiDiff: A Conditional Equivariant Diffusion Model For Trajectory Prediction**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.06564)]\n\n- **Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models**, IROS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.01557)] [[code](https:\u002F\u002Fgithub.com\u002Fjacarvalho\u002Fmpd-public)]\n\n- **EDMP: Ensemble-of-costs-guided Diffusion for Motion Planning**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11414)] [[code](https:\u002F\u002Fgithub.com\u002Fvishal-2000\u002FEDMP)]\n\n- **Sampling Constrained Trajectories Using Composable Diffusion Models**, IROS 2023. [[paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=UAylEpIMNE)]\n\n- **DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.13196)]\n\n- **Conditioned Score-Based Models for Learning Collision-Free Trajectory Generation**, NeurIPSW 2022. [[paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=4Vqu4N1jjrx)]\n\n- **Video Language Planning**, arXiv 2023. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.10625)] [[code](https:\u002F\u002Fgithub.com\u002Fvideo-language-planning\u002Fvlp_code)]\n  \n- **Learning to Act from Actionless Video through Dense Correspondences**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.08576)] [[code](https:\u002F\u002Fgithub.com\u002Fflow-diffusion\u002FAVDC)]\n  \n- **Learning Interactive Real-World Simulators**, arXiv 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.06114)]\n\n- **DNAct: Diffusion Guided Multi-Task 3D Policy Learning**, arXiv 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.04115)] [[website](https:\u002F\u002Fdnact.github.io\u002F)]\n\n- **Single Motion Diffusion**, ICLR 2024, [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.05905)] [[code](https:\u002F\u002Fgithub.com\u002FSinMDM\u002FSinMDM)] [[website](https:\u002F\u002Fsinmdm.github.io\u002FSinMDM-page\u002F)]\n\n- **READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning**, CVPR 2024, [[paper](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fhtml\u002FOba_READ_Retrieval-Enhanced_Asymmetric_Diffusion_for_Motion_Planning_CVPR_2024_paper.html)] [[code](https:\u002F\u002Fgithub.com\u002FObat2343\u002FREAD)]\n\n### Data Augmentation\n\n- **Scaling Robot Learning with Semantically Imagined Experience**, RSS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.11550)]\n\n- **GenAug: Retargeting behaviors to unseen situations via Generative Augmentation**, RSS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.06671)] [[code](https:\u002F\u002Fgithub.com\u002Fgenaug\u002Fgenaug)]\n\n- **Synthetic Experience Replay**, NeurIPS 2023. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.06614)] [[code](https:\u002F\u002Fgithub.com\u002Fconglu1997\u002FSynthER)]\n\n- **World Models via Policy-Guided Trajectory Diffusion**, arXiv 2023. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.08533)]\n\n- **Distilling Conditional Diffusion Models for Offline Reinforcement Learning through Trajectory Stitching**, arXiv 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.00807)]\n\n- **DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching**, arXiv 2024. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.02439)]\n\n- **Flow to Better: Offline Preference-based Reinforcement Learning via Preferred Trajectory Generation**, ICLR 2024. [[paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=EG68RSznLT)] [[code](https:\u002F\u002Fgithub.com\u002FZzl35\u002Fflow-to-better)]\n\n## Citation\n```\n@article{zhu2023diffusion,\n  title={Diffusion Models for Reinforcement Learning: A Survey},\n  author={Zhu, Zhengbang and Zhao, Hanye and He, Haoran and Zhong, Yichao and Zhang, Shenyu and Yu, Yong and Zhang, Weinan},\n  journal={arXiv preprint arXiv:2311.01223},\n  year={2023}\n}\n```\n","# 用于序列决策的扩散模型：综述\n本仓库包含关于***序列决策***领域中***扩散模型***的相关资源和论文集。\n\n:rocket: 请查看我们的综述论文 [用于强化学习的扩散模型：综述](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.01223)\n\n![图片信息](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fapexrl_Diff4RLSurvey_readme_74d343a548e9.png)\n\n## 目录\n- [用于序列决策的扩散模型：综述](#diffusion-models-for-sequential-decision-making-a-survey)\n  - [目录](#table-of-contents)\n  - [论文](#papers)\n    - [离线强化学习](#offline-reinforcement-learning)\n    - [在线强化学习](#online-reinforcement-learning)\n    - [模仿学习](#imitation-learning)\n    - [轨迹生成](#trajectory-generation)\n    - [数据增强](#data-augmentation)\n  - [引用](#citation)\n\n## 论文\n\n### 离线强化学习\n\n- **基于扩散的规划用于灵活的行为合成**, ICML 2022. [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.09991)] [[代码](https:\u002F\u002Fgithub.com\u002Fjannerm\u002Fdiffuser)]\n\n- **扩散策略作为离线强化学习中富有表现力的策略类**, ICLR 2023. 
[[Paper](https://arxiv.org/abs/2208.06193)] [[Code](https://github.com/zhendong-wang/diffusion-policies-for-offline-rl)]

- **Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling**, ICLR 2023. [[Paper](https://arxiv.org/abs/2209.14548)] [[Code](https://github.com/chendrag/sfbc)]

- **Is Conditional Generative Modeling All You Need for Decision-Making?**, ICLR 2023. [[Paper](https://arxiv.org/abs/2211.15657)] [[Code](https://github.com/xcvil/decision-diffuser/tree/main/code)]

- **AdaptDiffuser: Diffusion Models as Adaptive Self-Evolving Planners**, ICML 2023. [[Paper](https://arxiv.org/abs/2302.01877)] [[Code](https://github.com/Liang-ZX/adaptdiffuser)]

- **MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL**, ICML 2023. [[Paper](https://arxiv.org/abs/2305.19923)]

- **Hierarchical Diffusion for Offline Decision Making**, ICML 2023. [[Paper](https://openreview.net/forum?id=55kLa7tH9o)] [[Code](https://github.com/ewanlee/HDMI)]

- **Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning**, ICML 2023. [[Paper](https://arxiv.org/abs/2304.12824)] [[Code](https://github.com/thu-ml/cep-energy-guided-diffusion)]

- **Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks**, arXiv 2023. [[Paper](https://arxiv.org/abs/2210.15629v2)] [[Code](https://github.com/ezhang7423/language-control-diffusion)]

- **IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies**, arXiv 2023. [[Paper](https://arxiv.org/abs/2304.10573)] [[Code](https://github.com/philippe-eecs/idql)]

- **Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning**, NeurIPS 2023. [[Paper](https://arxiv.org/abs/2305.18459)] [[Code](https://github.com/tinnerhrhe/MTDiff)]

- **EDGI: Equivariant Diffusion for Planning with Embodied Agents**, NeurIPS 2023. [[Paper](https://arxiv.org/abs/2303.12410)]

- **Extracting Reward Functions from Diffusion Models**, NeurIPS 2023. [[Paper](https://arxiv.org/abs/2306.01804)]

- **Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning?**, NeurIPS 2023. [[Paper](https://arxiv.org/abs/2307.07837)]

- **Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement**, NeurIPS 2023. [[Paper](https://arxiv.org/abs/2307.07055)] [[Code](https://github.com/Kaffaljidhmah2/RCGDM/tree/main)]

- **Refining Diffusion Planner for Reliable Behavior Synthesis by Automatic Detection of Infeasible Plans**, NeurIPS 2023. [[Paper](https://arxiv.org/abs/2310.19427)] [[Code](https://github.com/leekwoon/rgg)]

- **SafeDiffuser: Safe Planning with Diffusion Probabilistic Models**, arXiv 2023. [[Paper](https://arxiv.org/abs/2306.00148)]

- **Efficient Diffusion Policies for Offline Reinforcement Learning**, arXiv 2023. [[Paper](https://arxiv.org/abs/2305.20081)] [[Code](https://github.com/sail-sg/edp)]

- **MADiff: Offline Multi-Agent Learning with Diffusion Models**, arXiv 2023. [[Paper](https://arxiv.org/abs/2305.17330)] [[Code](https://github.com/zbzhu99/madiff)]

- **Beyond Conservatism: Diffusion Policies in Offline Multi-Agent Reinforcement Learning**, arXiv 2023. [[Paper](https://arxiv.org/abs/2307.01472)]

- **Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching**, CoRL 2023. [[Paper](https://arxiv.org/abs/2306.14079)] [[Code](https://github.com/hjsuh94/score_po)]

- **Value Function Estimation Using Conditional Diffusion Models for Control**, arXiv 2023. [[Paper](https://arxiv.org/abs/2306.07290)]

- **Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning**, arXiv 2023. [[Paper](https://arxiv.org/abs/2306.04875)]

- **Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning**, arXiv 2023. [[Paper](https://arxiv.org/abs/2307.04726)]

- **Diffusion Policies as Multi-Agent Reinforcement Learning Strategies**, ICANN 2023. [[Paper](https://link.springer.com/chapter/10.1007/978-3-031-44213-1_30)]

- **DiffCPS: Diffusion Model Based Constrained Policy Search for Offline Reinforcement Learning**, arXiv 2023. [[Paper](https://arxiv.org/abs/2310.05333)] [[Code](https://github.com/felix-thu/DiffCPS)]

- **Score Regularized Policy Optimization through Diffusion Behavior**, ICLR 2024. [[Paper](https://arxiv.org/abs/2310.07297)] [[Code](https://github.com/thu-ml/srpo)]

- **Adaptive Online Replanning with Diffusion Models**, arXiv 2023. [[Paper](https://arxiv.org/abs/2310.09629)]

- **AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model**, arXiv 2023. [[Paper](https://arxiv.org/abs/2310.02054)] [[Code](https://github.com/aligndiff/aligndiff.github.io)]

- **SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution**, CVPR 2024. [[Paper](https://arxiv.org/abs/2312.11598)] [[Website](https://skilldiffuser.github.io/)]

- **Learning a Diffusion Model Policy from Rewards via Q-Score Matching**, arXiv 2023. [[Paper](https://arxiv.org/abs/2312.11752)]

- **Simple Hierarchical Planning with Diffusion**, ICLR 2024. [[Paper](https://arxiv.org/abs/2401.02644)]

- **Reasoning with Latent Diffusion in Offline Reinforcement Learning**, ICLR 2024. [[Paper](https://arxiv.org/abs/2309.06599)]

- **Efficient Planning with Latent Diffusion**, ICLR 2024. [[Paper](https://arxiv.org/abs/2310.00311)]

- **Contrastive Diffuser: Planning Towards High Return States via Contrastive Learning**, arXiv 2024. [[Paper](https://arxiv.org/abs/2402.02772)]

- **DMBP: Diffusion Model-Based Predictor for Robust Offline Reinforcement Learning Against State Observation Perturbations**, ICLR 2024. [[Paper](https://openreview.net/forum?id=ZULjcYLWKe)] [[Code](https://github.com/zhyang2226/DMBP)]

- **Entropy-Regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning**, arXiv 2024. [[Paper](https://arxiv.org/abs/2402.04080)] [[Code](https://github.com/ruoqizzz/entropy-offlineRL)]

- **Diffusion World Model**, arXiv 2024. [[Paper](https://arxiv.org/abs/2402.03570)]

- **Diffusion World Model**, OpenReview 2024. [[Paper](https://openreview.net/forum?id=bAXmvOLtjA)]

- **Policy-Guided Diffusion**, arXiv 2024. [[Paper](https://arxiv.org/abs/2404.06356)] [[Code](https://github.com/EmptyJackson/policy-guided-diffusion)]

### Online RL

- **Policy Representation via Diffusion Probability Model for Reinforcement Learning**, arXiv 2023. [[Paper](https://arxiv.org/abs/2305.13122)]

- **Boosting Continuous Control with Consistency Policy**, arXiv 2023. [[Paper](https://arxiv.org/abs/2310.06343)]

- **Diffusion Reward: Learning Rewards via Conditional Video Diffusion**, arXiv 2023. [[Paper](https://arxiv.org/abs/2312.14134)] [[Website](https://diffusion-reward.github.io/)] [[Code](https://github.com/TaoHuang13/diffusion_reward)]

- **ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories**, OpenReview 2024. [[Paper](https://openreview.net/forum?id=Ng7OYC3PT8)]

### Imitation Learning

- **Imitating Human Behaviour with Diffusion Models**, ICLR 2023. [[Paper](https://arxiv.org/abs/2301.10677)] [[Code](https://github.com/microsoft/imitating-human-behaviour-w-diffusion)]

- **Diffusion Policy: Visuomotor Policy Learning via Action Diffusion**, RSS 2023. [[Paper](https://arxiv.org/abs/2303.04137)] [[Code](https://github.com/real-stanford/diffusion_policy)]

- **Goal-Conditioned Imitation Learning Using Score-Based Diffusion Policies**, RSS 2023. [[Paper](https://arxiv.org/abs/2304.02532)] [[Code](https://github.com/intuitive-robots/beso)]

- **To the Noise and Back: Diffusion for Shared Autonomy**, RSS 2023. [[Paper](https://arxiv.org/abs/2302.12244)] [[Code](https://github.com/ripl/diffusion-for-shared-autonomy)]

- **DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics**, RAL 2023. [[Paper](https://arxiv.org/abs/2210.02438)]

- **Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition**, CoRL 2023. [[Paper](https://arxiv.org/abs/2307.14535)] [[Code](https://github.com/real-stanford/scalingup)]

- **XSkill: Cross Embodiment Skill Discovery**, CoRL 2023. [[Paper](https://arxiv.org/abs/2307.09955)]

- **ChainedDiffuser: Unifying Trajectory Diffusion and Keypose Prediction for Robotic Manipulation**, CoRL 2023. [[Paper](https://openreview.net/forum?id=W0zgY2mBTA8)]
[[Code](https://github.com/zhouxian/chained-diffuser)]

- **PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play**, CoRL 2023. [[Paper](https://openreview.net/forum?id=afF8RGcBBP)]

- **Generative Skill Chaining: Long-Horizon Skill Planning with Diffusion Models**, CoRL 2023. [[Paper](https://openreview.net/forum?id=HtJE9ly5dT)] [[Code](https://github.com/generative-skill-chaining/gsc-code)]

- **Multimodal Diffusion Transformer for Learning from Play**, CoRL 2023. [[Paper](https://openreview.net/forum?id=nvtxqMGpn1)]

- **GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields**, CoRL 2023. [[Paper](https://arxiv.org/abs/2308.16891)] [[Code](https://github.com/YanjieZe/GNFactor)]

- **Crossway Diffusion: Improving Diffusion-Based Visuomotor Policy via Self-Supervised Learning**, arXiv 2023. [[Paper](https://arxiv.org/abs/2307.01849)] [[Code](https://github.com/lostxine/crossway_diffusion)]

- **Diffusion Co-Policy for Synergistic Human-Robot Collaborative Tasks**, arXiv 2023. [[Paper](https://arxiv.org/abs/2305.12171)] [[Code](https://github.com/eleyng/diffusion_copolicy)]

- **Compositional Foundation Models for Hierarchical Planning**, NeurIPS 2023. [[Paper](https://arxiv.org/abs/2309.08587)] [[Code](https://github.com/anuragajay/hip/tree/main)]

- **Generating Behaviorally Diverse Policies with Latent Diffusion Models**, NeurIPS 2023. [[Paper](https://arxiv.org/abs/2305.18738)]

- **NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration**, arXiv 2023. [[Paper](https://arxiv.org/abs/2310.07896)] [[Code](https://github.com/robodhruv/visualnav-transformer)]

- **Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models**, arXiv 2023. [[Paper](https://arxiv.org/abs/2310.10639)]

- **Imitation Learning from Purified Demonstrations**, arXiv 2023. [[Paper](https://arxiv.org/abs/2310.07143)]

- **Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty**, arXiv 2023. [[Paper](https://arxiv.org/abs/2312.01097)]

- **Diffusion Meets DAgger: Supercharging Eye-in-Hand Imitation Learning**, arXiv 2024. [[Paper](https://arxiv.org/abs/2402.17768)]

- **3D Diffusion Policy**, arXiv 2024. [[Paper](https://arxiv.org/abs/2403.03954)] [[Website](https://3d-diffusion-policy.github.io)] [[Code](https://github.com/YanjieZe/3D-Diffusion-Policy)]

- **Large-Scale Actionless Video Pre-Training via Discrete Diffusion for Efficient Policy Learning**, arXiv 2024. [[Paper](https://arxiv.org/abs/2402.14407)] [[Website](https://video-diff.github.io/)]

- **SculptDiff: Learning Robotic Clay Sculpting from Humans with Goal Conditioned Diffusion Policy**, arXiv 2024. [[Paper](https://arxiv.org/abs/2403.10401)] [[Website](https://sites.google.com/andrew.cmu.edu/imitation-sculpting/home)] [[Code](https://github.com/alison-bartsch/SculptDiff)]

- **Subgoal Diffuser: Coarse-to-Fine Subgoal Generation to Guide Model Predictive Control for Robot Manipulation**, ICRA 2024. [[Paper](https://arxiv.org/abs/2403.13085)] [[Website](https://sites.google.com/view/subgoal-diffuser-mpc)]

- **Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation**, CVPR 2024. [[Paper](https://arxiv.org/abs/2403.03890)] [[Code](https://github.com/dyson-ai/hdp)] [[Website](https://yusufma03.github.io/projects/hdp/)]

- **Diffusion-EDFs: Bi-Equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation**, CVPR 2024. [[Paper](https://arxiv.org/abs/2309.02685)] [[Code](https://github.com/tomato1mule/diffusion_edf)] [[Website](https://sites.google.com/view/diffusion-edfs)]

### Trajectory Generation

- **MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model**, arXiv 2022. [[Paper](https://arxiv.org/abs/2208.15001)] [[Code](https://github.com/mingyuan-zhang/MotionDiffuse)]

- **Human Motion Diffusion Model**, ICLR 2023. [[Paper](https://arxiv.org/abs/2209.14916)] [[Code](https://github.com/guytevet/motion-diffusion-model)]

- **Executing Your Commands via Motion Diffusion in Latent Space**, CVPR 2023. [[Paper](https://arxiv.org/abs/2212.04048)] [[Code](https://github.com/chenfengye/motion-latent-diffusion)]

- **MoFusion: A Framework for Denoising-Diffusion-Based Motion Synthesis**, CVPR 2023. [[Paper](https://arxiv.org/abs/2212.04495)] [[Code](https://github.com/OFA-Sys/MoFusion)]

- **ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model**, ICCV 2023. [[Paper](https://arxiv.org/abs/2304.01116)] [[Code](https://github.com/mingyuan-zhang/ReMoDiffuse)]

- **MotionDiffuser: Controllable Multi-Agent Motion Prediction Using Diffusion**, CVPR 2023. [[Paper](https://arxiv.org/abs/2306.03083)]

- **Learning Universal Policies via Text-Guided Video Generation**, NeurIPS 2023. [[Paper](https://arxiv.org/abs/2310.08576)]

- **EquiDiff: A Conditional Equivariant Diffusion Model for Trajectory Prediction**, arXiv 2023. [[Paper](https://arxiv.org/abs/2308.06564)]

- **Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models**, IROS 2023. [[Paper](https://arxiv.org/abs/2308.01557)] [[Code](https://github.com/jacarvalho/mpd-public)]

- **EDMP: Ensemble-of-Costs-Guided Diffusion for Motion Planning**, arXiv 2023. [[Paper](https://arxiv.org/abs/2309.11414)] [[Code](https://github.com/vishal-2000/EDMP)]

- **Sampling Constrained Trajectories Using Composable Diffusion Models**, IROS 2023. [[Paper](https://openreview.net/forum?id=UAylEpIMNE)]

- **DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability**, arXiv 2023. [[Paper](https://arxiv.org/abs/2306.13196)]

- **Conditioned Score-Based Models for Learning Collision-Free Trajectory Generation**, NeurIPSW 2022. [[Paper](https://openreview.net/forum?id=4Vqu4N1jjrx)]

- **Video Language Planning**, arXiv 2023. [[Paper](https://arxiv.org/abs/2310.10625)] [[Code](https://github.com/video-language-planning/vlp_code)]

- **Learning to Act from Actionless Videos through Dense Correspondences**, arXiv 2023. [[Paper](https://arxiv.org/abs/2310.08576)] [[Code](https://github.com/flow-diffusion/AVDC)]

- **Learning Interactive Real-World Simulators**, arXiv 2023. [[Paper](https://arxiv.org/abs/2310.06114)]

- **DNAct: Diffusion Guided Multi-Task 3D Policy Learning**, arXiv 2024. [[Paper](https://arxiv.org/abs/2403.04115)] [[Website](https://dnact.github.io/)]

- **Single Motion Diffusion**, ICLR 2024. [[Paper](https://arxiv.org/abs/2302.05905)]
[[Code](https://github.com/SinMDM/SinMDM)] [[Website](https://sinmdm.github.io/SinMDM-page/)]

- **READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning**, CVPR 2024. [[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Oba_READ_Retrieval-Enhanced_Asymmetric_Diffusion_for_Motion_Planning_CVPR_2024_paper.html)] [[Code](https://github.com/Obat2343/READ)]

### Data Augmentation

- **Scaling Robot Learning with Semantically Imagined Experience**, RSS 2023. [[Paper](https://arxiv.org/abs/2302.11550)]

- **GenAug: Retargeting Behaviors to Unseen Situations via Generative Augmentation**, RSS 2023. [[Paper](https://arxiv.org/abs/2302.06671)] [[Code](https://github.com/genaug/genaug)]

- **Synthetic Experience Replay**, NeurIPS 2023. [[Paper](https://arxiv.org/abs/2303.06614)] [[Code](https://github.com/conglu1997/SynthER)]

- **World Models via Policy-Guided Trajectory Diffusion**, arXiv 2023. [[Paper](https://arxiv.org/abs/2312.08533)]

- **Distilling Conditional Diffusion Models for Offline Reinforcement Learning through Trajectory Stitching**, arXiv 2024. [[Paper](https://arxiv.org/abs/2402.00807)]

- **DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-Based Trajectory Stitching**, arXiv 2024. [[Paper](https://arxiv.org/abs/2402.02439)]

- **Flow to Better: Offline Preference-Based Reinforcement Learning via Preferred Trajectory Generation**, ICLR 2024. [[Paper](https://openreview.net/forum?id=EG68RSznLT)] [[Code](https://github.com/Zzl35/flow-to-better)]

## Citation
```
@article{zhu2023diffusion,
  title={Diffusion Models for Reinforcement Learning: A Survey},
  author={Zhu, Zhengbang and Zhao, Hanye and He, Haoran and Zhong, Yichao and Zhang, Shenyu and Yu, Yong and Zhang, Weinan},
  journal={arXiv preprint arXiv:2311.01223},
  year={2023}
}
```

# Diff4RLSurvey Quick-Start Guide

Diff4RLSurvey is not a runnable software library but a **survey resource collection**: a curated index of papers and code links on diffusion models for sequential decision-making (reinforcement learning, imitation learning, and related areas). This guide shows how to use the collection to find and run the open-source projects it catalogs.

## Prerequisites

Because the repository links to many independent sub-projects, the exact environment depends on the implementation you choose. Most diffusion-based RL projects, however, share the following prerequisites:

*   **Operating system**: Linux (Ubuntu 20.04+ recommended) or macOS
*   **Python version**: Python 3.8 - 3.10
*   **Deep-learning framework**: PyTorch (typically >= 1.10)
*   **Hardware**: an NVIDIA GPU with >= 8 GB VRAM is recommended; some larger models need more
*   **Package manager**: `pip` or `conda`

**Recommended mirror for users in mainland China:**
When installing Python dependencies, a mirror such as the Tsinghua index can speed up downloads:
```bash
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```

## Installation

Since Diff4RLSurvey itself is only a catalog, first browse its paper list, pick a project of interest (for example `Diffusion Policy` or `Diffuser`), then install from that project's own GitHub repository.

The following walks through **Diffusion Policy**, one of the most popular entries in the list:

1.  **Clone the target project**
    ```bash
    git clone https://github.com/real-stanford/diffusion_policy.git
    cd diffusion_policy
    ```

2.  **Create a virtual environment**
    ```bash
    conda create -n diffpolicy python=3.8
    conda activate diffpolicy
    ```

3.  **Install dependencies**
    (if the project ships a `requirements.txt`)
    ```bash
    pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
    ```
    (if the project uses a `setup.py`)
    ```bash
    pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple
    ```

> **Note**: For other projects in the list (such as `diffuser`, `beso`, `madiff`), replace the repository URL in step 1 and follow the `README.md` in that project's root directory for its exact dependency setup.

## Basic Usage

The typical workflow has three stages: **find a resource** -> **configure the task** -> **run training/evaluation**.

### 1. Find a resource
Visit the [Diff4RLSurvey GitHub page](https://github.com/<your-target-repo>) and filter the `Papers` section by need:
*   **Offline RL**: offline reinforcement learning (e.g. `Diffuser`, `EDP`)
*   **Imitation Learning**: (e.g. `Diffusion Policy`, `BESO`)
*   **Online RL**: online reinforcement learning

Click a paper's `[code]` link to jump to the implementation repository.

### 2. Run an example (Diffusion Policy)
Most projects ship pre-configured scripts for reproducing their results.

**Start training:**
```bash
python train.py --config configs/push_t_diffusion_transformer.yaml
```

**Run evaluation/visualization:**
```bash
python eval.py --config configs/push_t_diffusion_transformer.yaml --checkpoint outputs/checkpoint_latest.pt
```

### 3. Custom development
To apply a diffusion model to a new task, it is usually enough to change `dataset_dir` (the dataset path) and `task_name` in the config file, with no major changes to the core code. For example:
```yaml
# config.yaml
task_name: "your_custom_task"
dataset_dir: "/path/to/your/dataset"
diffusion:
  n_steps: 100
  horizon: 16
```

In this way, you can draw on the ecosystem curated by Diff4RLSurvey to quickly test diffusion models across different decision-making settings.

## Example Scenario

The algorithm team at an autonomous-driving startup is applying offline reinforcement learning (Offline RL) to improve vehicle planning in complex urban traffic, and urgently needs diffusion models to raise the diversity and safety of generated behaviors.

### Without Diff4RLSurvey
- **Literature search is a needle in a haystack**: researchers must manually sift through a flood of papers on arXiv and GitHub, struggling to separate diffusion models built for sequential decision-making from pure image-generation work.
- **No basis for technology selection**: facing distinct directions such as planning, imitation learning, and data augmentation, the team cannot quickly find state-of-the-art codebases to compare against.
- **High reproduction cost**: without a unified index, engineers often lose weeks reproducing classics such as "Diffuser" or "IDQL" because they cannot locate official implementations or key hyperparameter settings.
- **Lagging behind the frontier**: the team easily misses recent niche results such as "SafeDiffuser" (safe planning) or "AdaptDiffuser" (adaptive planning), leaving its approach behind the community.

### With Diff4RLSurvey
- **One-stop navigation**: the team jumps straight to a category (e.g. Offline RL, Trajectory Generation) and within minutes has the core papers and their code links.
- **Scenario-matched methods**: clear sub-categories let the team quickly locate hierarchical diffusion planning and safety-constrained generation methods suited to autonomous driving, sharply cutting survey time.
- **Efficient reproduction and iteration**: with authoritative code sources paired to papers, engineers reproduced several baselines within a week and customized them from there.
- **Staying current**: by checking the repository regularly, the team folded new paradigms such as reward-directed conditional diffusion into its system, noticeably improving decision robustness in extreme conditions.

By structuring scattered academic resources, Diff4RLSurvey compresses months of survey and reproduction work into days, bridging diffusion-model research and engineering practice.
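The custom-development step above boils down to merging a few task-level overrides (`task_name`, `dataset_dir`, nested `diffusion` options) into a base config. A minimal sketch of that merge, assuming the config has already been parsed into a plain dict (the keys mirror the example config in the guide and are illustrative, not any specific project's API):

```python
# Hedged sketch: merge task overrides into a base config dict.
# BASE_CONFIG and merge_config are illustrative names, not part of
# Diffusion Policy or any project indexed by Diff4RLSurvey.
import copy

BASE_CONFIG = {
    "task_name": "push_t",
    "dataset_dir": "/data/push_t",
    "diffusion": {"n_steps": 100, "horizon": 16},
}

def merge_config(base: dict, overrides: dict) -> dict:
    """Recursively merge override values into a deep copy of the base config."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # recurse into nested sections
        else:
            merged[key] = value  # scalar or new key: overwrite
    return merged

if __name__ == "__main__":
    custom = merge_config(
        BASE_CONFIG,
        {"task_name": "your_custom_task",
         "dataset_dir": "/path/to/your/dataset",
         "diffusion": {"horizon": 32}},
    )
    print(custom["task_name"])   # your_custom_task
    print(custom["diffusion"])   # {'n_steps': 100, 'horizon': 32}
```

The deep copy keeps the base config untouched, so the same base can be reused across several custom tasks; real projects typically do the equivalent via Hydra or OmegaConf overrides.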
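The diffusion-policy projects indexed above share one core mechanic: an action is produced by starting from Gaussian noise and iteratively denoising it. A toy sketch of that reverse-diffusion loop, with a hand-written stand-in for the learned noise predictor (everything here, `toy_denoiser` and `sample_action` included, is illustrative and not taken from any listed codebase):

```python
# Hedged sketch of reverse diffusion for action sampling. The "model" is a
# toy function that pulls the sample toward a fixed target action; a real
# diffusion policy would use a trained network conditioned on observations.
import random

def toy_denoiser(x: float, t: int, target: float = 0.5) -> float:
    """Stand-in for a learned noise predictor: reports the offset from target."""
    return x - target

def sample_action(n_steps: int = 50, seed: int = 0, target: float = 0.5) -> float:
    """Start from pure noise, then repeatedly subtract predicted noise,
    injecting a small fresh perturbation at every step except the last."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # initial sample: pure Gaussian noise
    for t in range(n_steps, 0, -1):
        predicted_noise = toy_denoiser(x, t, target)
        x = x - 0.2 * predicted_noise        # denoising update
        if t > 1:
            x += 0.01 * rng.gauss(0.0, 1.0)  # stochastic DDPM-style noise
    return x

action = sample_action()  # converges near the target action
```

With a fixed seed the loop is deterministic, and after 50 steps the sample sits close to the target; in a real policy the target is implicit in the network weights and the conditioning observation.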