[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-facebookresearch--schedule_free":3,"tool-facebookresearch--schedule_free":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":87,"forks":88,"last_commit_at":89,"license":90,"difficulty_score":91,"env_os":78,"env_gpu":92,"env_ram":93,"env_deps":94,"category_tags":100,"github_topics":79,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":101,"updated_at":102,"faqs":103,"releases":133},3797,"facebookresearch\u002Fschedule_free","schedule_free","Schedule-Free Optimization in PyTorch","schedule_free 是一个专为 PyTorch 设计的开源优化器库，旨在让深度学习模型的训练过程更加高效且简单。它核心解决了传统训练中必须预先设定复杂学习率调整策略（如热身、余弦退火等）的痛点。使用 schedule_free，开发者无需再为“何时停止”或“如何衰减学习率”而烦恼，因为它能在不依赖任何预设调度表的情况下，实现比传统方法更快或至少相当的收敛速度。\n\n这款工具非常适合深度学习研究人员和工程开发者，尤其是那些希望简化超参数调优流程、加速实验迭代的人群。其独特的技术亮点在于用“插值与平均”机制替代了传统的动量更新方式，通过维护两组序列分别在梯度计算和模型评估间切换，既免除了繁琐的学习率调度，又保持了与基础优化器（如 SGD、AdamW）相同的内存占用。目前，schedule_free 已提供 SGD、AdamW 及社区贡献的 RAdam 等多种变体，并支持通过简单的 `train()` 和 `eval()` 模式切换即可无缝集成到现有训练代码中，让模型训练真正走上“少调度、高性能”的快车道。","# Schedule-Free Learning\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_schedule_free_readme_150dfe653354.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fschedulefree) 
[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_schedule_free_readme_150dfe653354.png\u002Fmonth)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fschedulefree)\n\nSchedule-Free Optimizers in PyTorch.\n\nPreprint: [The Road Less Scheduled](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.15682)\n\nAuthors: Aaron Defazio, Xingyu (Alice) Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky\n\n**TLDR** Faster training without schedules - no need to specify the stopping time\u002Fsteps in advance!\n\n``` pip install schedulefree ```\n\nWe provide several Schedule-Free optimizer implementations:\n- `SGDScheduleFree` and `SGDScheduleFreeReference`: Schedule-free variants of SGD\n- `AdamWScheduleFree` and `AdamWScheduleFreeReference`: Schedule-free variants of AdamW\n- `RAdamScheduleFree`: Schedule-free variant of RAdam, which eliminates the need for both learning rate scheduling and warmup (community-contributed implementation)\n- Experimental `ScheduleFreeWrapper` to combine with other optimizers\n\n`ScheduleFreeReference` versions have a simplified implementation but use more memory. There are also `ScheduleFreeClosure` versions which can be used with PyTorch's optimizer step closures.\n\nA [JAX implementation](https:\u002F\u002Foptax.readthedocs.io\u002Fen\u002Flatest\u002Fapi\u002Fcontrib.html#schedule-free) is available as part of Optax.\n\n## Approach\nSchedule-Free learning replaces the momentum of an underlying optimizer with a combination of interpolation and averaging. 
In the case of gradient descent, the basic Schedule-Free update is:\n\n$$\n\\begin{align*}\ny_{t} & = (1-\\beta)z_{t} + \\beta x_{t},\\\\\nz_{t+1} & =z_{t}-\\gamma\\nabla f(y_{t}),\\\\\nx_{t+1} & =\\left(1-\\frac{1}{t+1}\\right)x_{t}+\\frac{1}{t+1}z_{t+1},\n\\end{align*}\n$$\n\nHere $x$ is the sequence at which test\u002Fval loss should be evaluated, which differs from the primary iterates $z$ and the gradient evaluation locations $y$. The updates to $z$ correspond to the underlying optimizer, in this case a simple gradient step.\n\nAs the name suggests, Schedule-Free learning does not require a decreasing learning rate schedule, yet typically out-performs, or at worst matches, SOTA schedules such as cosine-decay and linear decay. Only two sequences need to be stored at a time (the third can be computed from the other two on the fly) so this method has the same memory requirements as the base optimizer (parameter buffer + momentum).\n\nWe provide both AdamW and SGD versions in this repo, as well as an experimental\nwrapper version that can be used with any base optimizer.\n\n## How to Use\nSince our optimizer uses two different points for gradient calls and test\u002Fval loss calculations, it's necessary to switch the param buffer between the two during training. This is done by calling `optimizer.train()` at the same place you call `model.train()` and `optimizer.eval()` at the same place you call `model.eval()`. 
The optimizer should also be placed in eval mode when storing checkpoints.\n\nIf your code supports PyTorch Optimizer step closures, you can use the closure forms of the optimizers, which do not require the `.train()` and `.eval()` calls.\n\n## Paper\nIf you use Schedule-Free training in your work, please cite our [preprint](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.15682) as:\n```\n@misc{defazio2024road,\n      title={The Road Less Scheduled}, \n      author={Aaron Defazio and Xingyu Yang and Harsh Mehta and Konstantin Mishchenko and Ahmed Khaled and Ashok Cutkosky},\n      year={2024},\n      eprint={2405.15682},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG}\n}\n```\n\n### Releases\n\n*New* Version 1.4 adds a RAdam implementation by [nhamanasu](https:\u002F\u002Fgithub.com\u002Fnhamanasu).\n\nVersion 1.3 changes the behavior of weight decay during learning rate warmup\nto improve stability and be more consistent with the behavior of standard AdamW in PyTorch. The previous implementation is still available as `AdamWScheduleFreePaper`.\n\n### Examples\nExamples of using the `schedulefree` package can be found in the `examples` folder. These include:\n- [Image classification (MNIST) using Convnets](.\u002Fexamples\u002Fmnist\u002F)*\n- More examples to be added\n\n*Example is modified from [PyTorch Examples Repo](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fexamples).\n\n\n## Caveats \n- If your model uses BatchNorm, additional modifications are required for test\u002Fval evaluations to work correctly. Right before eval, run something like the following:\n  \n ```python\n  model.train()\n  optimizer.eval()\n  with torch.no_grad():\n    for batch in itertools.islice(train_loader, 50):\n      model(batch)\n  model.eval()\n```\nThis will replace the `running_mean`\u002F`running_var` cache (which is updated in each forward pass when in model.train() mode) with values calculated at $x$ instead of $y$. 
Using PreciseBN will also avoid this issue.\n\n\n - Many code bases use additional features that may not be compatible without additional changes. For instance, if the parameters are cached in fp16, the cached versions will need to be updated manually to ensure the correct $x$ sequence is used for evaluation, not the $y$ sequence. Some GradScalers do this.\n - Training is more sensitive to the choice of $\\beta$ than you may expect from standard momentum. Our default of $0.9$ works on most problems but it may be necessary to increase the value to $0.95$ or $0.98$ particularly for very long training runs.\n - There is no need to use a learning rate scheduler, however the code is compatible with one.\n - Using learning rate warmup is recommended. This is supported through the `warmup_steps` parameter.\n - This method does require tuning - it won't necessarily out-perform a schedule approach without also tuning regularization and learning rate parameters.\n - For SGD, a learning rate 10x-50x larger than classical rates seems to be a good starting point.\n - For AdamW, learning rates in the range 1x-10x larger than with schedule-based approaches seem to work.\n\n # Wrapper Version\n\nWe offer a highly experimental wrapper version `ScheduleFreeWrapper` which can wrap any base optimizer. When using this version, you can disable the base optimizer's \n momentum, as it's no longer necessary when using our wrapper's momentum (although you can use both types of momentum if you want).\n\n Example usage:\n ```\n  base_optimizer = torch.optim.RMSprop(model.parameters(), lr=0.0025)\n  optimizer = ScheduleFreeWrapper(\n    base_optimizer, momentum=0.9, weight_decay_at_y=0.1)\n ```\n If you set weight decay on the base optimizer, it computes weight decay at $z$. 
We offer the option to compute weight decay at $y$, via the `weight_decay_at_y`\n parameter, which seems to give better results in our experiments.\n\nWe also include a `ScheduleFreeWrapperReference` version which uses more memory but is more numerically stable; we recommend this version for early experimentation or research work. \n\n# License\nSee the [License file](\u002FLICENSE).\n\n# Related Work\n\nSchedule-Free learning can be seen as an interpolation between primal averaging ($\\beta=1$) and Polyak-Ruppert averaging ($\\beta=0$). The advantage of this interpolation is that it allows us to get the best of both worlds. We can achieve the fast early stage convergence of Polyak-Ruppert averaging (since the $z$ sequence moves quicker than the $x$ sequence), without the $x$ sequence straying too far from the $z$ sequence, which causes instability.\n\nOur method is also related to Nesterov's accelerated method (Nesterov, 1983) in AC-SA form (Ghadimi & Lan, 2010):\n\n$$\n\\begin{align*}\ny_{t} & =(1-2\u002F(t+1))x_{t} + (2\u002F(t+1))z_{t}\\\\\nz_{t+1} & =z_{t}-\\frac{t}{2L}\\nabla f(y_{t})\\\\\nx_{t+1} & =(1-2\u002F(t+1))x_{t}+(2\u002F(t+1))z_{t+1}\n\\end{align*}\n$$\n\nOur approach has the same three sequences, but uses very different weights, and crucially, does not include an increasing learning rate over time, which is essential for accelerated rates with Nesterov's method. We also use different weight sequences for the interpolation operation versus the averaging operation.\n\nTail averaging approaches such as Stochastic Weight Averaging (Izmailov et al., 2018) and LAtest Weight Averaging (Kaddour, 2022; Sanyal et al., 2023) combine averaging with large or cyclic learning rates. They still require the use of a schedule, introduce additional hyper-parameters to tune, and require additional memory compared to our technique. It is also possible to use SWA and LAWA on top of our approach, potentially giving further gains.\n\nPortes et al. 
(2022) use cyclic learning rate schedules with increasing cycle periods to give a method that explores multiple points along the Pareto frontier of training time vs eval performance. Each point at the end of a cycle is an approximation to the model from a tuned schedule ending at that time. Our method gives the entire frontier, rather than just a few points along the path.\n\nExponential moving averages (EMA) of the iterate sequence are used in the popular Lookahead optimizer (Zhang et al., 2019). The Lookahead method can be seen as the EMA version of primal averaging, just as exponential weight averaging is the EMA version of Polyak-Ruppert averaging. Our extra interpolation step can potentially be used in combination with the lookahead optimizer also.\n","# 无调度学习\n[![下载量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_schedule_free_readme_150dfe653354.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fschedulefree) [![下载量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_schedule_free_readme_150dfe653354.png\u002Fmonth)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fschedulefree)\n\nPyTorch 中的无调度优化器。\n\n预印本：[The Road Less Scheduled](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.15682)\n\n作者：Aaron Defazio、Xingyu (Alice) Yang、Harsh Mehta、Konstantin Mishchenko、Ahmed Khaled、Ashok Cutkosky\n\n**简而言之**：无需调度即可实现更快的训练——无需提前指定停止时间或步数！\n\n``` pip install schedulefree ```\n我们提供了几种无调度优化器的实现：\n- `SGDScheduleFree` 和 `SGDScheduleFreeReference`：SGD 的无调度变体\n- `AdamWScheduleFree` 和 `AdamWScheduleFreeReference`：AdamW 的无调度变体\n- `RAdamScheduleFree`：RAdam 的无调度变体，既不需要学习率调度，也不需要预热（由社区贡献的实现）\n- 实验性的 `ScheduleFreeWrapper`，可与其他优化器结合使用\n\n`ScheduleFreeReference` 版本实现了简化版逻辑，但占用更多内存。此外，还有 `ScheduleFreeClosure` 版本，可用于 PyTorch 的优化器步进闭包。\n\n作为 Optax 的一部分，还提供了一个 [Jax 实现](https:\u002F\u002Foptax.readthedocs.io\u002Fen\u002Flatest\u002Fapi\u002Fcontrib.html#schedule-free)。\n\n## 
方法\n无调度学习用插值和平均的组合替代了基础优化器的动量。以梯度下降为例，基本的无调度更新公式如下：\n\n$$\n\\begin{align*}\ny_{t} & = (1-\\beta)z_{t} + \\beta x_{t},\\\\\nz_{t+1} & =z_{t}-\\gamma\\nabla f(y_{t}),\\\\\nx_{t+1} & =\\left(1-\\frac{1}{t+1}\\right)x_{t}+\\frac{1}{t+1}z_{t+1},\n\\end{align*}\n$$\n\n其中 $x$ 是用于评估测试\u002F验证损失的序列，它与主要迭代点 $z$ 及梯度计算位置 $y$ 不同。对 $z$ 的更新对应于基础优化器，在此例中即简单的梯度步进。\n\n顾名思义，无调度学习无需递减的学习率调度，但通常性能优于或至少与 SOTA 调度方法（如余弦退火和线性衰减）相当。每次只需存储两个序列（第三个可通过前两个实时计算），因此该方法的内存需求与基础优化器相同（参数缓冲区 + 动量）。\n\n我们在本仓库中同时提供了 AdamW 和 SGD 的版本，以及一个实验性的包装器版本，可用于任何基础优化器。\n\n## 使用方法\n由于我们的优化器在梯度计算和测试\u002F验证损失计算时使用不同的点，因此在训练过程中需要在这两者之间切换参数缓冲区。这可以通过在调用 `model.train()` 的地方调用 `optimizer.train()`，在调用 `model.eval()` 的地方调用 `optimizer.eval()` 来完成。保存检查点时，优化器也应置于评估模式。\n\n如果您的代码支持 PyTorch 优化器步进闭包，可以使用闭包形式的优化器，这样就不需要调用 `.train()` 和 `.eval()`。\n\n## 论文\n如果您在工作中使用了无调度训练，请引用我们的 [预印本](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.15682)：\n```\n@misc{defazio2024road,\n      title={The Road Less Scheduled}, \n      author={Aaron Defazio and Xingyu Yang and Harsh Mehta and Konstantin Mishchenko and Ahmed Khaled and Ashok Cutkosky},\n      year={2024},\n      eprint={2405.15682},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG}\n}\n```\n\n### 发布说明\n\n*新增* 1.4 版本添加了由 [nhamanasu](https:\u002F\u002Fgithub.com\u002Fnhamanasu) 提供的 RAdam 实现。\n1.3 版本调整了学习率预热期间权重衰减的行为，以提高稳定性，并使其与 PyTorch 中标准 AdamW 的行为更加一致。之前的实现仍可作为 `AdamWScheduleFreePaper` 使用。\n\n### 示例\n`schedulefree` 包的使用示例可在 `examples` 文件夹中找到。其中包括：\n- [使用卷积神经网络进行图像分类（MNIST）](.\u002Fexamples\u002Fmnist\u002F)*\n- 更多示例将陆续添加\n\n*示例改编自 [Pytorch Examples Repo](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fexamples)。\n\n## 注意事项\n- 如果您的模型使用 BatchNorm，为使测试\u002F验证评估正常工作，需要进行额外修改。在评估之前，可以执行类似以下操作：\n\n ```python\n  model.train()\n  optimizer.eval()\n  with torch.no_grad():\n    for batch in itertools.islice(train_loader, 50):\n      model(batch)\n  model.eval()\n```\n\n这会将 `training_mean`\u002F`training_var` 缓存（在 `model.train()` 模式下每次前向传播时都会更新）替换为基于 $x$ 而非 $y$ 计算的值。使用 PreciseBN 
也可以避免此问题。\n\n- 许多代码库使用了可能不兼容的附加功能，需要额外修改才能配合使用。例如，如果参数以 fp16 格式缓存，则需手动更新缓存版本，以确保评估时使用正确的 $x$ 序列，而非 $y$ 序列。某些 GradScaler 会自动处理这一点。\n- 训练对 $\\beta$ 值的选择比普通动量更敏感。我们的默认值 $0.9$ 在大多数问题上都适用，但对于非常长的训练过程，可能需要将其提高到 $0.95$ 或 $0.98$。\n- 虽然无需使用学习率调度器，但代码仍然兼容。\n- 建议使用学习率预热。可通过 `warmup_steps` 参数实现。\n- 该方法确实需要调参——如果不同时调整正则化和学习率参数，未必能超越基于调度的方法。\n- 对于 SGD，建议从比传统学习率大 10 到 50 倍的初始值开始。\n- 对于 AdamW，学习率在比基于调度的方法大 1 到 10 倍的范围内通常效果较好。\n\n# 包装器版本\n\n我们提供了一个高度实验性的包装器版本 `ScheduleFreeWrapper`，它可以包裹任何基础优化器。使用此版本时，您可以禁用基础优化器的动量，因为在使用我们的包装器动量时，基础动量已不再必要（当然，您也可以同时使用两种动量）。\n\n使用示例：\n```\n  base_optimizer = torch.optim.RMSprop(model.parameters(), lr=0.0025)\n  optimizer = ScheduleFreeWrapper(\n    base_optimizer, momentum=0.9, weight_decay_at_y=0.1)\n```\n\n如果您在基础优化器上设置了权重衰减，它会在 $z$ 点计算权重衰减。我们提供了在 $y$ 点计算权重衰减的选项，通过 `weight_decay_at_y` 参数实现，这在我们的实验中似乎效果更好。\n\n我们还提供了一个 `ScheduleFreeWrapperReference` 版本，该版本占用更多内存，但数值稳定性更高，建议用于早期实验或研究工作。\n\n# 许可证\n请参阅 [许可证文件](\u002FLICENSE)。\n\n# 相关工作\n\n无调度学习可以被视为原始平均法（$\\beta=1$）与 Polyak-Ruppert 平均法（$\\beta=0$）之间的插值。这种插值的优势在于它能够兼得两者的优点：我们既可以获得 Polyak-Ruppert 平均法在训练初期的快速收敛特性（因为 $z$ 序列的更新速度比 $x$ 序列更快），又不会让 $x$ 序列偏离 $z$ 序列太远，从而避免不稳定性。\n\n我们的方法还与 Nesterov 的加速方法（Nesterov, 1983）在 AC-SA 形式下（Ghadimi & Lan, 2010）相关：\n\n$$\n\\begin{align*}\ny_{t} & =(1-2\u002F(t+1))x_{t} + (2\u002F(t+1))z_{t}\\\\\nz_{t+1} & =z_{t}-\\frac{t}{2L}\\nabla f(y_{t})\\\\\nx_{t+1} & =(1-2\u002F(t+1))x_{t}+(2\u002F(t+1))z_{t+1}\n\\end{align*}\n$$\n\n我们的方法同样包含三个序列，但使用的权重截然不同，且关键在于并未采用随时间递增的学习率，而后者对于 Nesterov 方法实现加速至关重要。此外，我们在插值操作和平均操作中也采用了不同的权重序列。\n\n诸如随机权重平均（Stochastic Weight Averaging, Izmailov 等人, 2018）和最新权重平均（LAtest Weight Averaging, Kaddour, 2022；Sanyal 等人, 2023）等尾部平均方法，将平均化与较大或循环学习率相结合。然而，这些方法仍然需要使用学习率调度策略，引入额外的超参数进行调优，并且相比我们的方法需要更多的内存。值得注意的是，SWA 和 LAWA 也可以叠加在我们的方法之上，从而可能带来进一步的性能提升。\n\nPortes 等人（2022）使用周期性递增的学习率调度策略，以探索训练时间与评估性能之间帕累托前沿上的多个点。每个周期结束时的模型近似于在该时刻终止的调优学习率调度所得到的模型。相比之下，我们的方法能够生成整个帕累托前沿，而不仅仅是路径上的少数几个点。\n\n流行的 Lookahead 优化器（Zhang 等人, 2019）则使用迭代序列的指数移动平均（EMA）。从某种意义上说，Lookahead 方法可以视为原始平均法的 
EMA 版本，正如指数权重平均是 Polyak-Ruppert 平均法的 EMA 版本一样。我们额外的插值步骤同样有可能与 Lookahead 优化器结合使用。","# Schedule-Free 快速上手指南\n\nSchedule-Free 是一种无需预设学习率调度（Learning Rate Schedule）即可实现高效训练的优化器方法。它通过插值和平均机制替代传统动量，通常能匹配或超越余弦退火等主流调度策略的效果，且无需提前指定训练停止步数。\n\n## 环境准备\n\n- **系统要求**：Linux, macOS, Windows\n- **Python 版本**：建议 Python 3.8+\n- **核心依赖**：\n  - PyTorch >= 1.8\n  - 无需其他特殊前置依赖\n\n## 安装步骤\n\n推荐使用 pip 直接安装官方发布版本：\n\n```bash\npip install schedulefree\n```\n\n如需加速下载，可使用国内镜像源（如清华大学源）：\n\n```bash\npip install schedulefree -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## 基本使用\n\nSchedule-Free 优化器需要在训练模式（计算梯度）和评估模式（计算验证损失）之间切换参数缓冲区。请务必在调用 `model.train()` 时调用 `optimizer.train()`，在 `model.eval()` 时调用 `optimizer.eval()`。\n\n### 1. 导入与初始化\n\n以 `AdamWScheduleFree` 为例（也支持 `SGDScheduleFree` 等）：\n\n```python\nimport torch\nfrom schedulefree import AdamWScheduleFree\n\n# 定义模型\nmodel = MyModel()\n\n# 初始化优化器\n# 注意：通常需要使用比传统方法更大的学习率 (1x-10x)\noptimizer = AdamWScheduleFree(model.parameters(), lr=0.005, weight_decay=0.1)\n```\n\n### 2. 训练循环示例\n\n```python\nfor epoch in range(num_epochs):\n    # --- 训练阶段 ---\n    model.train()\n    optimizer.train()  # 关键：切换到训练模式\n    \n    for batch in train_loader:\n        inputs, targets = batch\n        \n        optimizer.zero_grad()\n        outputs = model(inputs)\n        loss = criterion(outputs, targets)\n        loss.backward()\n        optimizer.step()\n\n    # --- 验证阶段 ---\n    model.eval()\n    optimizer.eval()  # 关键：切换到评估模式，此时参数位于 x 序列\n    \n    with torch.no_grad():\n        for batch in val_loader:\n            inputs, targets = batch\n            outputs = model(inputs)\n            # 计算验证指标...\n    \n    # 保存检查点时也需保持 optimizer.eval() 状态\n    if should_save_checkpoint:\n        torch.save({\n            'model_state_dict': model.state_dict(),\n            'optimizer_state_dict': optimizer.state_dict(),\n        }, 'checkpoint.pth')\n```\n\n### 3. 
进阶提示\n\n- **学习率设置**：\n  - **SGD**: 尝试比经典值大 10-50 倍的学习率。\n  - **AdamW**: 尝试比经典值大 1-10 倍的学习率。\n- **Warmup**：虽然不需要复杂的调度，但建议使用 `warmup_steps` 参数进行学习率预热以提升稳定性。\n- **BatchNorm 注意事项**：如果模型包含 BatchNorm 层，在验证前可能需要运行少量训练批次以更新统计信息，确保评估基于正确的参数序列 $x$。","某初创团队正在训练一个基于 Transformer 的垂直领域大语言模型，面临算力昂贵且调参经验不足的困境。\n\n### 没有 schedule_free 时\n- **学习率调度复杂**：工程师需花费大量时间手动设计余弦退火或线性衰减策略，一旦预设的训练步数与实际不符，模型极易欠拟合或发散。\n- **冷启动不稳定**：必须精心配置漫长的预热（Warmup）阶段来防止初期梯度爆炸，增加了超参数搜索的维度。\n- **实验迭代缓慢**：每次调整总训练时长都需要重新推导整个学习率曲线，导致尝试不同收敛点的成本极高。\n- **资源浪费风险**：若中途提前停止训练，原本设计的调度方案失效，导致前期算力投入无法转化为最佳模型性能。\n\n### 使用 schedule_free 后\n- **彻底移除调度**：直接采用默认配置即可，无需指定停止时间或步数，算法自动通过插值与平均机制适应训练进程。\n- **消除预热需求**：内置的 RAdamW 等变体天然稳定，省去了繁琐的 Warmup 设置，开局即能安全高效地更新权重。\n- **灵活随时止损**：支持在任何时刻评估验证集损失并保存检查点，团队可根据实时效果动态决定何时结束训练，无需重跑。\n- **性能更优更稳**：在同等计算预算下，自动生成的隐式调度往往比人工设计的 SOTA 策略收敛更快，且最终精度更高。\n\nschedule_free 的核心价值在于将开发者从复杂的学习率工程管理中解放出来，让模型训练像“自动驾驶”一样简单且高效。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_schedule_free_807ed27c.png","facebookresearch","Meta Research","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ffacebookresearch_449342bd.png","",null,"https:\u002F\u002Fopensource.fb.com","https:\u002F\u002Fgithub.com\u002Ffacebookresearch",[83],{"name":84,"color":85,"percentage":86},"Python","#3572A5",100,2266,74,"2026-04-04T05:18:26","Apache-2.0",1,"未说明（依赖 PyTorch 环境，通常用于 GPU 加速训练）","未说明（内存需求与基础优化器相同，需存储参数量 + 动量缓冲区；部分参考实现版本内存占用更高）",{"notes":95,"python":96,"dependencies":97},"该工具是 PyTorch 的优化器库，通过 pip 安装。若模型包含 BatchNorm 层，在评估前需特殊处理以同步统计信息；若使用 fp16 混合精度训练，可能需要手动更新缓存参数以确保评估序列正确。建议配合学习率预热（warmup）使用，无需复杂的学习率衰减策略。SGD 变体通常需要比传统方法大 10-50 倍的学习率，AdamW 变体需大 1-10 倍。","未说明",[98,99],"torch","itertools",[13],"2026-03-27T02:49:30.150509","2026-04-06T07:12:59.279631",[104,109,114,119,123,128],{"id":105,"question_zh":106,"answer_zh":107,"source_url":108},17485,"安装 schedulefree 包时出现 'Getting requirements to build wheel did not run successfully' 错误怎么办？","这是一个已知的构建问题（版本 1.4.0），维护者已发布修复版本。请尝试安装最新版本（1.4.1 或更高），该版本已改用 hatch 
构建系统解决了此问题。运行命令：`pip install schedulefree --upgrade`。如果作为依赖项被锁定在旧版本，可能需要手动更新相关项目的 requirements 文件以允许新版本。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fschedule_free\u002Fissues\u002F62",{"id":110,"question_zh":111,"answer_zh":112,"source_url":113},17486,"使用 AdamWScheduleFree 时遇到 'ZeroDivisionError: float division by zero' 错误是什么原因？","这通常是因为在使用优化器自带的预热（warmup）功能时，仍然保留了外部的学习率调度器（Learning Rate Scheduler）。Schedule-Free 优化器内部已经实现了预热机制（通过 `warmup_steps` 参数），因此必须移除代码中额外的学习率调度器（如 `scheduler.step()`），而不仅仅是注释掉调用行。确保只使用优化器内部的 `warmup_steps` 参数来控制预热。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fschedule_free\u002Fissues\u002F5",{"id":115,"question_zh":116,"answer_zh":117,"source_url":118},17487,"AdamWScheduleFree 如何与 PyTorch 的自动混合精度（AMP）、Autocast 和 GradScaler 配合使用？","根据维护者的实验反馈（如在 nanogpt 代码库中），该优化器可以正确地与 `torch.cuda.amp.autocast` 和 `torch.cuda.amp.GradScaler` 协同工作。标准的训练流程无需特殊修改：在前向传播使用 `autocast`，反向传播前使用 `grad_scaler.scale(loss).backward()`，更新权重时使用 `grad_scaler.step(optimizer)`，最后调用 `grad_scaler.update()`。如果在多 GPU 训练中遇到 NaN，请检查是否错误地叠加了外部学习率调度器。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fschedule_free\u002Fissues\u002F8",{"id":120,"question_zh":121,"answer_zh":122,"source_url":113},17488,"如何为 AdamWScheduleFree 设置学习率预热（Warmup）？","不需要使用外部的 `torch.optim.lr_scheduler`。AdamWScheduleFree 优化器构造函数中直接提供了一个 `warmup_steps` 参数。在初始化优化器时，直接传入所需的预热步数即可，例如：`optimizer = AdamWScheduleFree(model.parameters(), lr=1e-3, warmup_steps=1000)`。切勿同时使用外部调度器，否则会导致除零错误或其他训练不稳定问题。",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},17489,"在调整 AdamWScheduleFree 时，学习率（LR）和权重衰减（Weight Decay）应该如何联合调优？","根据作者的经验，最佳权重衰减（weight_decay）的值确实依赖于所选的学习率（LR）。当改变学习率时，通常也需要相应地调整权重衰减以获得最佳性能。建议在网格搜索（grid sweep）时同时调整这两个参数，而不是固定其中一个。虽然该优化器遵循 PyTorch AdamW 的惯例（权重衰减受 LR 缩放影响），但在实际调参中，两者存在耦合关系，需共同寻找最优组合。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fschedule_free\u002Fissues\u002F47",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},17490,"schedulefree 
1.4.0 版本安装失败且无法通过依赖项自动升级，有什么临时解决方案？","由于 1.4.0 版本的构建配置存在问题且无法直接修改已发布的包，如果您的项目通过依赖项锁定了该版本，您需要 fork 对应的上游仓库，修改其 `requirements` 文件或 `pyproject.toml` 中的版本约束（例如改为 `schedulefree>=1.4.1`），然后安装您 fork 后的版本。维护者建议尽可能覆盖安装包版本，使用命令行强制安装新版本通常能解决大部分问题。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fschedule_free\u002Fissues\u002F63",[]]