[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-huawei-noah--HEBO":3,"tool-huawei-noah--HEBO":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 
## README

# Bayesian Optimization, Reinforcement Learning, & Generative Model Research

This directory contains official implementations of the Bayesian-optimization, reinforcement-learning, and generative-model works developed by Huawei Noah's Ark Lab.

- Bayesian Optimisation Research
    - [HEBO: Heteroscedastic Evolutionary Bayesian Optimisation](./HEBO)
    - [MCBO: Framework and Benchmarks for Combinatorial and Mixed-variable Bayesian Optimization](./MCBO)
    - [NAP: End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes](./NAP)
    - [RDUCB: High Dimensional Bayesian Optimisation with Random Decompositions](./RDUCB)
    - [AntBO: Antibody Design with Combinatorial Bayesian Optimisation](./AntBO)
    - [BOiLS: Bayesian Optimisation for Logic Synthesis](./BOiLS)
    - [T-LBO: BO in a latent space shaped by metric learning](./T-LBO)
    - [Bayesian Optimisation with Compositional Optimisers](./CompBO)
    - [AIRBO: Efficient Robust Bayesian Optimisation for Arbitrary Uncertain Inputs](./AIRBO)
- Reinforcement Learning Research
    - [Enhancing Reinforcement Learning Agents with Local Guides](./RLLG)
    - [Sauté RL and Simmer RL: Safe Reinforcement Learning Using Safety State Augmentation](./SIMMER)
    - [Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief](./PMDB)
    - [SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks](./SparsePO)
- Robotics Research
    - [ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning](./ROSLLM)
- Generative Model Research
    - [EM-LLM: Human-like Episodic Memory for Infinite Context LLMs](./EM-LLM)
    - [Mixture of Attentions For Speculative Decoding](https://github.com/huawei-noah/HEBO/tree/mixture-of-attentions/)

Further instructions are provided in the README files associated with each project.

# Bayesian Optimisation Research

## [HEBO](./HEBO)

<img src="https://oss.gittoolsai.com/images/huawei-noah_HEBO_readme_bfddb1028855.png" alt="drawing" width="400"/>

Bayesian optimisation library developed by Huawei Noah's Ark Decision Making and Reasoning (DMnR) lab. The <strong>winning submission</strong> to the [NeurIPS 2020 Black-Box Optimisation Challenge](https://bbochallenge.com/leaderboard).

## [MCBO](./MCBO)

<p float="center">
<img src="https://oss.gittoolsai.com/images/huawei-noah_HEBO_readme_0870f67bc720.png" width="400"/>
<img src="https://oss.gittoolsai.com/images/huawei-noah_HEBO_readme_2f4b72e7296d.png" width="400"/>
</p>

Codebase associated with [Framework and Benchmarks for Combinatorial and Mixed-variable Bayesian Optimization](https://arxiv.org/abs/2306.09803), accepted at **NeurIPS** (2023).

##### Abstract

This paper introduces a modular framework for Mixed-variable and Combinatorial Bayesian Optimization (MCBO) to address the lack of systematic benchmarking and standardized evaluation in the field. Current MCBO papers often introduce non-diverse or non-standard benchmarks to evaluate their methods, impeding the proper assessment of different MCBO primitives and their combinations. Additionally, papers introducing a solution for a single MCBO primitive often omit benchmarking against baselines that utilize the same methods for the remaining primitives. This omission is primarily due to the significant implementation overhead involved, resulting in a lack of controlled assessments and an inability to showcase the merits of a contribution effectively. To overcome these challenges, our proposed framework enables an effortless combination of Bayesian Optimization components, and provides a diverse set of synthetic and real-world benchmarking tasks. Leveraging this flexibility, we implement 47 novel MCBO algorithms and benchmark them against seven existing MCBO solvers and five standard black-box optimization algorithms on ten tasks, conducting over 4000 experiments. Our findings reveal a superior combination of MCBO primitives outperforming existing approaches and illustrate the significance of model fit and the use of a trust region. We make our MCBO library available under the MIT license at https://github.com/huawei-noah/HEBO/tree/master/MCBO.
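The MCBO codebase defines its own primitives; as a lightweight stand-in, here is a sketch of what a mixed-variable search space and optimisation loop look like using HEBO from this same repository. The `num`/`int`/`cat` parameter specs follow the `DesignSpace.parse` format described in HEBO's documentation; treat the exact keys as assumptions, and the objective is a toy.

```python
import numpy as np
import pandas as pd
from hebo.design_space.design_space import DesignSpace
from hebo.optimizers.hebo import HEBO

# A mixed space: one continuous, one integer, one categorical parameter.
space = DesignSpace().parse([
    {'name': 'lr',     'type': 'num', 'lb': 1e-4, 'ub': 1e-1},
    {'name': 'layers', 'type': 'int', 'lb': 1,    'ub': 8},
    {'name': 'act',    'type': 'cat', 'categories': ['relu', 'tanh', 'gelu']},
])

def objective(params: pd.DataFrame) -> np.ndarray:
    # Toy black-box to minimise: prefer lr near 1e-3, 5 layers, and 'gelu'.
    score = (np.log10(params['lr'].values) + 3) ** 2 \
          + (params['layers'].values - 5) ** 2 \
          + (params['act'].values != 'gelu') * 1.0
    return score.reshape(-1, 1)

opt = HEBO(space)
for _ in range(10):
    rec = opt.suggest(n_suggestions=1)   # next candidate(s), as a DataFrame
    opt.observe(rec, objective(rec))     # feed the observed values back
print('best objective value:', opt.y.min())
```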
## [NAP: End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes](./NAP/)

![regret-all](https://oss.gittoolsai.com/images/huawei-noah_HEBO_readme_707591f2317e.png)

Codebase associated with [End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes](https://arxiv.org/abs/2305.15930), accepted at **NeurIPS** (2023).

##### Abstract

Meta-Bayesian optimisation (meta-BO) aims to improve the sample efficiency of Bayesian optimisation by leveraging data from related tasks. While previous methods successfully meta-learn either a surrogate model or an acquisition function independently, joint training of both components remains an open challenge. This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data. Early on, we notice that training transformer-based neural processes from scratch with RL is challenging due to insufficient supervision, especially when rewards are sparse. We formalise this claim with a combinatorial analysis showing that the widely used notion of regret as a reward signal exhibits a logarithmic sparsity pattern in trajectory lengths. To tackle this problem, we augment the RL objective with an auxiliary task that guides part of the architecture to learn a valid probabilistic model as an inductive bias. We demonstrate that our method achieves state-of-the-art regret results against various baselines in experiments on standard hyperparameter optimisation tasks, and also outperforms others in the real-world problems of mixed-integer programming tuning, antibody design, and logic synthesis for electronic design automation.

## [RDUCB: High Dimensional Bayesian Optimisation with Random Decompositions](./RDUCB)

<p align="center">
  <img src="https://oss.gittoolsai.com/images/huawei-noah_HEBO_readme_ee3435ff62ec.png" width="400" />
</p>

Codebase associated with [Are Random Decompositions All We Need in High Dimensional Bayesian Optimisation?](https://arxiv.org/pdf/2301.12844.pdf), accepted at **ICML** (2023).

##### Abstract

Learning decompositions of expensive-to-evaluate black-box functions promises to scale Bayesian optimisation (BO) to high-dimensional problems. However, the success of these techniques depends on finding proper decompositions that accurately represent the black-box. While previous works learn those decompositions based on data, we investigate data-independent decomposition sampling rules in this paper. We find that data-driven learners of decompositions can be easily misled towards local decompositions that do not hold globally across the search space. Then, we formally show that a random tree-based decomposition sampler exhibits favourable theoretical guarantees that effectively trade off maximal information gain and functional mismatch between the actual black-box and its surrogate as provided by the decomposition. Those results motivate the development of the random decomposition upper-confidence bound algorithm (RDUCB), which is straightforward to implement - (almost) plug-and-play - and, surprisingly, yields significant empirical gains compared to the previous state-of-the-art on a comprehensive set of benchmarks. We also confirm the plug-and-play nature of our modelling component by integrating our method with HEBO, showing improved practical gains in the highest dimensional tasks from Bayesmark.
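The data-independent sampler at the heart of this idea draws a random tree over the input dimensions and treats its edges as the interaction structure of an additive surrogate. A minimal illustration of that sampling step (a toy sketch of the idea, not the RDUCB implementation):

```python
import numpy as np

def sample_tree_decomposition(n_dims: int, rng: np.random.Generator):
    """Sample a random spanning tree over dimensions 0..n_dims-1.

    Each new node attaches to a uniformly chosen earlier node, so the
    edge set always forms a tree; the edges play the role of 2-D factors
    in the additive decomposition of the black-box surrogate.
    """
    order = rng.permutation(n_dims)
    edges = []
    for i in range(1, n_dims):
        parent = order[rng.integers(0, i)]  # attach to an earlier node
        edges.append((int(parent), int(order[i])))
    return edges

rng = np.random.default_rng(0)
print(sample_tree_decomposition(6, rng))  # five edges spanning all six dimensions
```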
## [AntBO: Antibody Design with Combinatorial Bayesian Optimisation](./AntBO)

![AntBO overview](https://oss.gittoolsai.com/images/huawei-noah_HEBO_readme_a4ac484f8d9a.png)

Codebase associated with [AntBO: Towards Real-World Automated Antibody Design with Combinatorial Bayesian Optimisation](https://www.sciencedirect.com/science/article/pii/S2667237522002764), published in **Cell Reports Methods** (2023).

##### Abstract

Antibodies are canonically Y-shaped multimeric proteins capable of highly specific molecular recognition. The CDRH3 region located at the tip of variable chains of an antibody dominates antigen-binding specificity. Therefore, it is a priority to design optimal antigen-specific CDRH3 regions to develop therapeutic antibodies to combat harmful pathogens. However, the combinatorial nature of CDRH3 sequence space makes it impossible to search for an optimal binding sequence exhaustively and efficiently, especially not experimentally. Here, we present AntBO: a combinatorial Bayesian optimisation framework enabling efficient in silico design of the CDRH3 region. Ideally, antibodies should bind to their target antigen and be free from any harmful outcomes. Therefore, we introduce the CDRH3 trust region that restricts the search to sequences with feasible developability scores. To benchmark AntBO, we use the Absolut! software suite as a black-box oracle because it can score the target specificity and affinity of designed antibodies in silico in an unconstrained fashion. The results across 188 antigens demonstrate the benefit of AntBO in designing CDRH3 regions with diverse biophysical properties. In under 200 protein designs, AntBO can suggest antibody sequences that outperform the best binding sequence drawn from 6.9 million experimentally obtained CDRH3s and a commonly used genetic-algorithm baseline. Additionally, AntBO finds very high-affinity CDRH3 sequences in only 38 protein designs whilst requiring no domain knowledge. We conclude that AntBO brings automated antibody design methods closer to what is practically viable for in vitro experimentation.
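The trust-region idea, restricting the search to sequences whose developability score is feasible before spending evaluations on the binding oracle, can be sketched in a few lines. This is a toy random search with hypothetical `developability` and `binding_energy` scorers, not the AntBO algorithm:

```python
import random

AMINO_ACIDS = 'ACDEFGHIKLMNPQRSTVWY'  # the 20 canonical residues

def developability(seq: str) -> float:
    # Hypothetical stand-in score: penalise long hydrophobic runs.
    runs = ''.join('X' if c in 'AILMFVW' else ' ' for c in seq).split()
    return -max((len(r) for r in runs), default=0)

def binding_energy(seq: str) -> float:
    # Hypothetical black-box oracle (random here; Absolut! plays this role in the paper).
    return random.random()

def trust_region_search(length=11, budget=200, feasibility=-4.0):
    best_seq, best_e = None, float('inf')
    while budget > 0:
        seq = ''.join(random.choices(AMINO_ACIDS, k=length))
        if developability(seq) < feasibility:
            continue                  # infeasible developability: no oracle call spent
        e = binding_energy(seq)       # feasible: spend one of the evaluations
        budget -= 1
        if e < best_e:
            best_seq, best_e = seq, e
    return best_seq, best_e

print(trust_region_search())
```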
## [BOiLS: Bayesian Optimisation for Logic Synthesis](./BOiLS)

<p align="center">
    <img src="https://oss.gittoolsai.com/images/huawei-noah_HEBO_readme_6b9b294ad13c.png" alt="drawing" width="500"/>
</p>

Codebase associated with [BOiLS: Bayesian Optimisation for Logic Synthesis](https://arxiv.org/abs/2111.06178), accepted at the **DATE22** conference.

##### Abstract

Optimising the quality-of-results (QoR) of circuits during logic synthesis is a formidable challenge necessitating the exploration of exponentially sized search spaces. While expert-designed operations aid in uncovering effective sequences, the increase in complexity of logic circuits favours automated procedures. Inspired by the successes of machine learning, researchers adapted deep learning and reinforcement learning to logic-synthesis applications. However successful, those techniques suffer from high sample complexities that prevent widespread adoption. To enable efficient and scalable solutions, we propose BOiLS, the first algorithm adapting modern Bayesian optimisation to navigate the space of synthesis operations. BOiLS requires no human intervention and effectively trades off exploration versus exploitation through novel Gaussian process kernels and trust-region constrained acquisitions. In a set of experiments on EPFL benchmarks, we demonstrate BOiLS's superior performance compared to the state-of-the-art in terms of both sample efficiency and QoR values.

## [T-LBO: BO in a latent space shaped by metric learning](./T-LBO)

<p float="center">
  <img src="https://oss.gittoolsai.com/images/huawei-noah_HEBO_readme_479ac4d16c6d.png" width="400" />
  <img src="https://oss.gittoolsai.com/images/huawei-noah_HEBO_readme_de6f36264c0a.png" width="400" />
</p>

Codebase associated with [High-Dimensional Bayesian Optimisation with Variational Autoencoders and Deep Metric Learning](https://arxiv.org/abs/2106.03609).

##### Abstract

We introduce a method based on deep metric learning to perform Bayesian optimisation over high-dimensional, structured input spaces using variational autoencoders (VAEs). By extending ideas from supervised deep metric learning, we address a longstanding problem in high-dimensional VAE Bayesian optimisation, namely how to enforce a discriminative latent space as an inductive bias. Importantly, we achieve such an inductive bias using just 1% of the available labelled data relative to previous work, highlighting the sample efficiency of our approach. As a theoretical contribution, we present a proof of vanishing regret for our method. As an empirical contribution, we present state-of-the-art results on real-world high-dimensional black-box optimisation problems, including property-guided molecule generation. We hope the results presented in this paper can act as a guiding principle for realising effective high-dimensional Bayesian optimisation.

## [Bayesian Optimisation with Compositional Optimisers](./CompBO)

<div style="text-align:center"><img src="https://oss.gittoolsai.com/images/huawei-noah_HEBO_readme_e356cf52d9b1.png" alt="drawing" width="600"/></div>

Codebase associated with [Are we Forgetting about Compositional Optimisers in Bayesian Optimisation?](https://www.jmlr.org/papers/v22/20-1422.html), accepted at **JMLR**.

##### Abstract

Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied.

## [AIRBO](./AIRBO)

<div style="text-align:center"><img src="./AIRBO/figures/opt_performance.png" alt="drawing" width="600"/></div>

Codebase associated with [Efficient Robust Bayesian Optimization for Arbitrary Uncertain Inputs](https://arxiv.org/abs/2310.20145).

##### Abstract

Bayesian Optimization (BO) is a sample-efficient optimization algorithm widely employed across various applications. In some challenging BO tasks, input uncertainty arises due to the inevitable randomness in the optimization process, such as machining errors, execution noise, or contextual variability. This uncertainty deviates the input from the intended value before evaluation, resulting in significant performance fluctuations in the final result. In this paper, we introduce a novel robust Bayesian Optimization algorithm, AIRBO, which can effectively identify a robust optimum that performs consistently well under arbitrary input uncertainty. Our method directly models the uncertain inputs of arbitrary distributions by empowering the Gaussian Process with the Maximum Mean Discrepancy (MMD), and further accelerates the posterior inference via Nystrom approximation. A rigorous theoretical regret bound is established under MMD estimation error, and extensive experiments on synthetic functions and real problems demonstrate that our approach can handle various input uncertainties and achieve state-of-the-art performance.
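AIRBO's key ingredient is scoring distributions rather than points by plugging the Maximum Mean Discrepancy into the GP kernel. For intuition only, here is a self-contained numpy sketch of the (biased) MMD² estimator between two samples; it is not AIRBO's GP machinery:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2), computed pairwise
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    """Biased MMD^2 estimate between samples X ~ P and Y ~ Q."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2(rng.normal(size=(200, 2)), rng.normal(1.5, 1.0, size=(200, 2)))
print(f'same distribution: {same:.4f}, shifted distribution: {diff:.4f}')
```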
# Reinforcement Learning Research

## [Enhancing Reinforcement Learning Agents with Local Guides](./RLLG)

Codebase associated with the paper [Enhancing Reinforcement Learning Agents with Local Guides](https://hal.science/hal-04052358/file/Final_Reinforcement_Learning_with_Local_Guides.pdf).

##### Abstract

This paper addresses the problem of integrating local guide policies into a Reinforcement Learning agent. For this, we show how to adapt existing algorithms to this setting before introducing a novel algorithm based on a noisy policy-switching procedure. This approach builds on a proper Approximate Policy Evaluation (APE) scheme to provide a perturbation that carefully leads the local guides towards better actions. We evaluated our method on a set of classical Reinforcement Learning problems, including safety-critical systems where the agent cannot enter some areas at the risk of triggering catastrophic consequences. In all the proposed environments, our agent proved to be efficient at leveraging those policies to improve the performance of any APE-based Reinforcement Learning algorithm, especially in its first learning stages.
## [Sauté RL and Simmer RL: Safe Reinforcement Learning Using Safety State Augmentation](./SIMMER)

Codebase associated with [Sauté RL: Almost Surely Safe RL Using State Augmentation](https://arxiv.org/pdf/2202.06558.pdf) and [Enhancing Safe Exploration Using Safety State Augmentation](https://arxiv.org/pdf/2206.02675.pdf).

##### Abstract for Sauté RL: Almost Surely Safe RL Using State Augmentation (ICML 2022)

Satisfying safety constraints almost surely (or with probability one) can be critical for the deployment of Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting them into the state-space and reshaping the objective. We show that the Saute MDP satisfies the Bellman equation and moves us closer to solving Safe RL with constraints satisfied almost surely. We argue that the Saute MDP allows one to view the Safe RL problem from a different perspective, enabling new features. For instance, our approach has a plug-and-play nature, i.e., any RL algorithm can be "sauteed". Additionally, state augmentation allows for policy generalization across safety constraints. We finally show that Saute RL algorithms can outperform their state-of-the-art counterparts when constraint satisfaction is of high importance.

##### Abstract for Effects of Safety State Augmentation on Safe Exploration (NeurIPS 2022)

Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost is sparse and unknown, which unavoidably leads to constraint violations -- a phenomenon ideally to be avoided in safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is nonnegative if and only if the constraint is satisfied. The value of this state also serves as a distance toward constraint violation, while its initial value indicates the available safety budget. This idea allows us to derive policies for scheduling the safety budget during training. We call our approach Simmer (Safe policy IMproveMEnt for RL) to reflect the careful nature of these schedules. We apply this idea to two safe RL problems: RL with constraints imposed on an average cost, and RL with constraints imposed on a cost with probability one. Our experiments suggest that simmering a safe algorithm can improve safety during training for both settings. We further show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
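The state-augmentation trick is mechanical enough to show in a few lines: track the remaining safety budget as part of the observation and reshape the objective once it is exhausted. A minimal, framework-free sketch of the idea, assuming a hypothetical environment interface (this is not the SIMMER code):

```python
class SauteWrapper:
    """Wrap an episodic task so the safety budget becomes part of the state.

    Assumes a hypothetical env whose step() returns
    (observation_tuple, reward, safety_cost, done).
    """

    def __init__(self, env, budget: float):
        self.env = env
        self.budget0 = budget
        self.z = 1.0

    def reset(self):
        self.z = 1.0                          # full budget, normalised
        return (*self.env.reset(), self.z)    # augmented observation

    def step(self, action):
        obs, reward, cost, done = self.env.step(action)
        self.z -= cost / self.budget0         # spend budget on incurred cost
        if self.z < 0.0:
            reward = 0.0                      # objective reshaped once unsafe
        return (*obs, self.z), reward, done
```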
## [Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief](./PMDB)

Code associated with [Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief](https://nips.cc/Conferences/2022/Schedule?showEvent=54842), accepted at **NeurIPS** (2022).

#### Abstract

Model-based offline reinforcement learning (RL) aims to find a highly rewarding policy by leveraging a previously collected static dataset and a dynamics model. Although learned from a static dataset, the dynamics model's generalization ability can promote policy learning if properly utilized. To that end, several works propose to quantify the uncertainty of predicted dynamics and explicitly apply it to penalize reward. However, as the dynamics and the reward are intrinsically different factors in the context of an MDP, characterizing the impact of dynamics uncertainty through a reward penalty may incur an unexpected tradeoff between model utilization and risk avoidance. In this work, we instead maintain a belief distribution over dynamics and evaluate/optimize the policy through biased sampling from the belief. The sampling procedure, biased towards pessimism, is derived based on an alternating Markov game formulation of offline RL. We formally show that the biased sampling naturally induces an updated dynamics belief with a policy-dependent reweighting factor, termed *Pessimism-Modulated Dynamics Belief*. To improve the policy, we devise an iterative regularized policy optimization algorithm for the game, with a guarantee of monotonic improvement under certain conditions. To make it practical, we further devise an offline RL algorithm to approximately find the solution. Empirical results show that the proposed approach achieves state-of-the-art performance on a wide range of benchmark tasks.
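The core move, evaluating a policy under a belief over dynamics models with sampling biased toward the pessimistic members, can be caricatured with a reweighted model ensemble. A toy numpy sketch of the reweighting idea only, not the PMDB algorithm:

```python
import numpy as np

def pessimism_modulated_value(returns_per_model: np.ndarray, k: float = 2.0) -> float:
    """Reweight an ensemble's policy returns toward pessimism.

    returns_per_model: estimated policy return under each sampled
    dynamics model (one draw from the dynamics belief per entry).
    k = 0 recovers the plain belief average; larger k leans harder
    on the worst-case models.
    """
    w = np.exp(-k * returns_per_model)    # low-return models get up-weighted
    w /= w.sum()
    return float(w @ returns_per_model)

returns = np.array([1.0, 0.9, 0.2, 1.1])  # one bad model in the belief
for k in (0.0, 2.0, 10.0):
    print(k, pessimism_modulated_value(returns, k))
```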
## [SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks](./SparsePO)

Code associated with the paper "[SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks](https://arxiv.org/abs/2410.05102)".

#### Abstract

Preference Optimization (PO) has proven an effective step for aligning language models to human-desired behaviors. Current variants, following the offline Direct Preference Optimization objective, have focused on a strict setting where all tokens contribute signals of KL divergence and rewards to the loss function. However, human preference is not affected by each word in a sequence equally, but is often dependent on specific words or phrases; e.g., the existence of toxic terms leads to non-preferred responses. Based on this observation, we argue that not all tokens should be weighted equally during PO, and propose a flexible objective termed SparsePO, which aims to automatically learn to weight the KL divergence and reward corresponding to each token during PO training. We propose two different variants of weight-masks that can either be derived from the reference model itself or learned on the fly. Notably, our method induces sparsity in the learned masks, allowing the model to learn how to best weight reward and KL divergence contributions at the token level, learning an optimal level of mask sparsity. Extensive experiments on multiple domains, including sentiment control, dialogue, text summarization, and text-to-code generation, illustrate that our approach assigns meaningful weights to tokens according to the target task, generates more responses with the desired preference, and improves reasoning tasks by up to 2 percentage points compared to other token- and response-level PO methods.
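In DPO-style training, the loss compares summed per-token log-ratios of the chosen versus the rejected response; SparsePO's proposal is to weight those per-token terms with a (sparse) mask rather than weighting all tokens equally. A schematic PyTorch sketch of a mask-weighted preference loss follows; it is our own simplification with tensor shapes as assumptions, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def masked_pref_loss(lp_w, ref_w, lp_l, ref_l, mask_w, mask_l, beta=0.1):
    """Mask-weighted, DPO-style preference loss.

    lp_*   : per-token log-probs under the policy,    shape (batch, seq)
    ref_*  : per-token log-probs under the reference, shape (batch, seq)
    mask_* : per-token weights in [0, 1] (sparse in SparsePO), same shape
    """
    r_w = (mask_w * (lp_w - ref_w)).sum(-1)   # weighted log-ratio, chosen
    r_l = (mask_l * (lp_l - ref_l)).sum(-1)   # weighted log-ratio, rejected
    return -F.logsigmoid(beta * (r_w - r_l)).mean()

b, t = 4, 16
args = [torch.randn(b, t) for _ in range(4)] + [torch.rand(b, t) for _ in range(2)]
print(masked_pref_loss(*args).item())
```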
# Generative Model Research

## [EM-LLM: Human-like Episodic Memory for Infinite Context LLMs](./EM-LLM)

Code associated with our EM-LLM paper: [[arXiv]](https://arxiv.org/abs/2407.09450).

<div align="center">
  <img src="https://oss.gittoolsai.com/images/huawei-noah_HEBO_readme_bcac490c532c.png" alt="emllm_rag_fc" width="400"/>
</div>

#### Abstract

Large language models (LLMs) have shown remarkable capabilities, but still struggle with processing extensive contexts, limiting their ability to maintain coherence and accuracy over long sequences. In contrast, the human brain excels at organising and retrieving episodic experiences across vast temporal scales, spanning a lifetime. In this work, we introduce EM-LLM, a novel approach that integrates key aspects of human episodic memory and event cognition into LLMs with no fine-tuning, enabling them to handle practically infinite context lengths while maintaining computational efficiency. EM-LLM organises sequences of tokens into coherent episodic events using a combination of Bayesian surprise and graph-theoretic boundary refinement in an online fashion. When needed, these events are retrieved through a two-stage memory process, combining similarity-based and temporally contiguous retrieval for efficient and human-like access to relevant information. Experiments on the LongBench and $\infty$-Bench benchmarks demonstrate EM-LLM's superior performance, consistently outperforming the state-of-the-art retrieval model InfLLM across various baseline LLMs. In addition, EM-LLM outperforms its popular counterpart, RAG, in a wide range of tasks, while requiring similar resources. Notably, EM-LLM's performance even surpasses full-context models in most tasks, while successfully performing retrieval across 5 million tokens -- a scale computationally infeasible for such models. Finally, our analysis reveals strong correlations between EM-LLM's event segmentation and human-perceived events, suggesting a bridge between this artificial system and its biological counterpart, thereby offering a novel computational framework for exploring human memory mechanisms.
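EM-LLM's two-stage recall, first fetching events by embedding similarity and then padding the result with temporally adjacent events, is easy to picture as a sketch over per-event vectors. Toy numpy code for the retrieval shape only; segmentation by Bayesian surprise is not shown:

```python
import numpy as np

def retrieve(events: np.ndarray, query: np.ndarray, k: int = 3, neighbours: int = 1):
    """Two-stage recall over per-event embeddings (rows of `events`).

    Stage 1: top-k events by cosine similarity to the query.
    Stage 2: add temporally contiguous neighbours of each hit.
    """
    sims = events @ query / (np.linalg.norm(events, axis=1)
                             * np.linalg.norm(query) + 1e-9)
    hits = set(np.argsort(-sims)[:k].tolist())
    for i in list(hits):                      # contiguity buffer
        for j in range(i - neighbours, i + neighbours + 1):
            if 0 <= j < len(events):
                hits.add(j)
    return sorted(hits)

rng = np.random.default_rng(0)
ev = rng.normal(size=(20, 8))
print(retrieve(ev, ev[7] + 0.1 * rng.normal(size=8)))
```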
## [Mixture of Attentions For Speculative Decoding](https://github.com/huawei-noah/HEBO/tree/mixture-of-attentions/)

#### Abstract

The growth in the number of parameters of Large Language Models (LLMs) has led to a significant surge in computational requirements, making them challenging and costly to deploy. Speculative decoding (SD) leverages smaller models to efficiently propose future tokens, which are then verified by the LLM in parallel. Small models that utilise activations from the LLM currently achieve the fastest decoding speeds. However, we identify several limitations of SD models, including the lack of on-policyness during training and partial observability. To address these shortcomings, we propose a more grounded architecture for small models by introducing a Mixture of Attentions for SD. Our novel architecture can be applied in two scenarios: a conventional single-device deployment, and a novel client-server deployment where the small model is hosted on a consumer device and the LLM on a server. In a single-device scenario, we demonstrate state-of-the-art speedups, improving EAGLE-2 by 9.5% and its acceptance length by 25%. In a client-server setting, our experiments demonstrate: 1) state-of-the-art latencies with minimal calls to the server for different network conditions, and 2) in the event of a complete disconnection, our approach can maintain higher accuracy compared to other SD methods and demonstrates advantages over API calls to LLMs, which would otherwise be unable to continue the generation process.
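Speculative decoding itself is a simple loop: a small draft model proposes a block of tokens, the large model scores them (in one parallel pass in practice), and the longest agreeing prefix is accepted. A deterministic toy version with hypothetical greedy `draft_next`/`target_next` functions; the Mixture-of-Attentions contribution concerns how the small model is built, which this sketch does not attempt:

```python
def speculative_decode(prompt, draft_next, target_next, k=4, max_len=32):
    """Greedy draft-and-verify loop over integer token ids.

    draft_next / target_next: callables mapping a token sequence to the
    next token id (stand-ins for the small and large models).
    """
    out = list(prompt)
    while len(out) < max_len:
        block = [ ]
        for _ in range(k):                      # small model drafts k tokens
            block.append(draft_next(out + block))
        accepted = []
        for tok in block:                       # verification (sequential here)
            if target_next(out + accepted) == tok:
                accepted.append(tok)            # agreement: accept draft token
            else:
                accepted.append(target_next(out + accepted))
                break                           # first disagreement: fix and stop
        out += accepted
    return out

# Toy models: the draft is a cheap, mostly-correct approximation of the target.
target = lambda seq: (sum(seq) * 7 + 3) % 11
draft  = lambda seq: (sum(seq) * 7 + 3) % 11 if len(seq) % 5 else sum(seq) % 11
print(speculative_decode([1, 2], draft, target))
```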
## Quickstart

# HEBO Quickstart

## Prerequisites

- **Operating system**: HEBO supports Linux, macOS, and Windows; Linux (e.g. Ubuntu 18.04 or later) is recommended.
- **Dependencies**:
  - Python 3.7+ (3.8 or 3.9 recommended)
  - pip
  - CUDA (install the NVIDIA CUDA Toolkit if you want GPU acceleration)

> If you are in China, a domestic mirror such as `https://pypi.tuna.tsinghua.edu.cn/simple` can speed up dependency installation.

---

## Installation

1. Clone the HEBO repository:

```bash
git clone https://github.com/huawei-noah/HEBO.git
cd HEBO
```
2. Install the dependencies:

```bash
pip install -r requirements.txt
```

> If the network is slow, point pip at a mirror to speed things up:

```bash
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```

---

## Basic usage

Below is a minimal HEBO example that minimises a one-dimensional function, `f(x) = x²`, following the interface shown in the HEBO README.

### Example

```python
import numpy as np
import pandas as pd
from hebo.design_space.design_space import DesignSpace
from hebo.optimizers.hebo import HEBO

# Search space: a single continuous variable x in [-5, 5]
space = DesignSpace().parse([
    {'name': 'x', 'type': 'num', 'lb': -5, 'ub': 5},
])

# Objective to minimise; HEBO expects outputs of shape (n, 1)
def objective(params: pd.DataFrame) -> np.ndarray:
    return (params['x'].values ** 2).reshape(-1, 1)

opt = HEBO(space)
for _ in range(20):
    rec = opt.suggest(n_suggestions=1)  # next candidate(s) to evaluate
    opt.observe(rec, objective(rec))    # report the result back

print('best objective value:', opt.y.min())
```

### Notes

- `space` defines each variable's name, type, and bounds.
- `objective` is the function to minimise; it receives a `pandas.DataFrame` of suggested parameters and returns a numpy array of shape `(n, 1)`.
- `suggest()` returns the next parameter combination(s) to evaluate.
- `observe()` feeds the evaluated results back to the optimiser.
- `opt.y` holds all observed objective values; its minimum is the best result found so far.

---

That covers the HEBO quickstart; for more advanced features and usage, see the `README.md` files in the project directories.
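Continuing the example above: if evaluations can run in parallel, the same loop works with batches by passing a batch size to `suggest`, as HEBO's README describes for parallel suggestion. The batch size of 8 here is arbitrary, and evaluating the returned rows concurrently is left to the caller:

```python
# Reuses `opt` and `objective` from the example above.
for _ in range(5):
    rec = opt.suggest(n_suggestions=8)  # one DataFrame, eight candidate rows
    opt.observe(rec, objective(rec))    # observe all eight results at once
```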
## Use Case

An autonomous-driving company is building a reinforcement-learning path-planning system to optimise driving efficiency and safety on complex urban roads. The team needs extensive hyperparameter tuning of the planning algorithm to adapt it to different road conditions and traffic rules.

### Without HEBO

- Tuning leans on manual experience; it is slow and cannot cover the space of possible combinations.
- Traditional grid search and random search are too inefficient to find good configurations in reasonable time.
- There is no support for uncertainty modelling, so results are unstable and confidence is hard to assess.
- The Bayesian-optimisation plumbing must be written by hand, lengthening development cycles and raising maintenance cost.
- Support for high-dimensional, heterogeneous parameter spaces (e.g. mixed integer and continuous variables) is limited, constraining model flexibility.

### With HEBO

- An efficient Bayesian-optimisation framework significantly cuts the time and resources spent on hyperparameter tuning.
- Heterogeneous parameter spaces are supported, covering the mixed variable types involved in path planning (speed, steering angle, and so on).
- Built-in uncertainty modelling stabilises the optimisation process and yields more reliable confidence-interval estimates.
- The modular design drops into an existing reinforcement-learning stack without reimplementing the optimisation logic from scratch.
- Ample benchmarks and experiment tooling make it easy to compare optimisation strategies and iterate faster.

HEBO gives the autonomous-driving system an efficient, flexible, and stable hyperparameter-optimisation solution, markedly improving both development velocity and model performance.

## Project Information

- **Repository**: https://github.com/huawei-noah/HEBO · ★ 2,737 · 461 forks · last commit 2026-04-05
- **Owner**: [huawei-noah](https://github.com/huawei-noah) (HUAWEI Noah's Ark Lab) · "Working with and contributing to the open source community in data mining, artificial intelligence, and related fields." · http://www.noahlab.com.hk/
- **Languages**: Jupyter Notebook 80.8%, Python 18.5%, Shell 0.3%, Jinja 0.3%, CMake 0.1%, plus traces of C++, Tcl, Makefile, Dockerfile, and Batchfile
- **Environment**: Linux or macOS; NVIDIA GPU with 8 GB+ VRAM and CUDA 11.7+; 16 GB+ RAM; Python 3.8+. conda is recommended for environment management; the first run downloads roughly 5 GB of model files.
- **Key dependencies**: torch>=2.0, transformers>=4.30, accelerate, numpy, scipy, pandas, matplotlib, gpytorch, tqdm

## FAQ

**How do I fix `ValueError: The value argument must be within the support`?**
This usually appears with HEBO v0.1. Upgrade to the latest HEBO, or pass `model_name='gp'` when constructing the optimiser, e.g. `HEBO(space, model_name='gp')`. ([issue #2](https://github.com/huawei-noah/HEBO/issues/2))

**Running `mcbo test_aig_task.py` fails with `TypeError: __init__() missing 3 required positional arguments`. What now?**
The call is missing required arguments. Pass `obj_dims`, `out_constr_dims`, and `out_upper_constr_vals` explicitly, with values appropriate to your task; the maintainers have also updated the affected files to fix this. ([issue #78](https://github.com/huawei-noah/HEBO/issues/78))

**How do I fix `ModuleNotFoundError: No module named 'pymoo.operators.mixed_variable_operator'`?**
A pymoo version change removed the module. Install the development version of HEBO: `pip install git+https://github.com/huawei-noah/HEBO.git#subdirectory=HEBO`. ([issue #26](https://github.com/huawei-noah/HEBO/issues/26))

**How do I fix `ImportError: No module named 'pymoo.operators.mixed_variable_operator'`?**
pymoo removed the `mixed_variable_operator` module as of v0.6.0. Upgrade HEBO to v0.3.5 or later. ([issue #19](https://github.com/huawei-noah/HEBO/issues/19))

**The `aig_optimization` task fails with `UnboundLocalError: local variable 'group' referenced before assignment`.**
This is usually tied to the `designs_group_id` you pass. Make sure `designs_group_id` is set correctly and the paths are valid; if the problem persists, check that the path to the `abc` tool is correct and that all dependent libraries (such as `libreadline.so.7`) are installed. ([issue #70](https://github.com/huawei-noah/HEBO/issues/70))

**How do I fix `HEBO/MCBO/libs/EDA/abc: error while loading shared libraries: libreadline.so.7: cannot open shared object file`?**
The system is missing `libreadline.so.7`. Install it with `sudo apt update && sudo apt install libreadline7`; if apt cannot provide it, download and install the library manually. ([issue #70](https://github.com/huawei-noah/HEBO/issues/70))

## Releases

- **v0.3.4** (2022-11-22): Update AntBO