[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-huggingface--pytorch-image-models":3,"tool-huggingface--pytorch-image-models":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":80,"owner_email":80,"owner_twitter":76,"owner_website":81,"owner_url":82,"languages":83,"stars":96,"forks":97,"last_commit_at":98,"license":99,"difficulty_score":23,"env_os":100,"env_gpu":101,"env_ram":102,"env_deps":103,"category_tags":111,"github_topics":112,"view_count":23,"oss_zip_url":80,"oss_zip_packed_at":80,"status":16,"created_at":133,"updated_at":134,"faqs":135,"releases":165},3586,"huggingface\u002Fpytorch-image-models","pytorch-image-models","The largest collection of PyTorch image encoders \u002F backbones. 
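A minimal sketch of the core workflow (the model name and dummy input below are placeholders; `timm.data.resolve_model_data_config` and `timm.data.create_transform` are the config-to-transform helpers):

```python
import torch
import timm

# Create a pretrained classifier; any name from timm.list_models() works.
model = timm.create_model("resnet50", pretrained=True)
model.eval()

# Build the eval-time transform matching the model's pretrained config.
cfg = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**cfg, is_training=False)

# Dummy batch standing in for a real transformed image.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # e.g. torch.Size([1, 1000]) for ImageNet-1k heads
```

The project README follows.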
# PyTorch Image Models
- [What's New](#whats-new)
- [Introduction](#introduction)
- [Models](#models)
- [Features](#features)
- [Results](#results)
- [Getting Started (Documentation)](#getting-started-documentation)
- [Train, Validation, Inference Scripts](#train-validation-inference-scripts)
- [Awesome PyTorch Resources](#awesome-pytorch-resources)
- [Licenses](#licenses)
- [Citing](#citing)

## What's New

## March 23, 2026
* Improve pickle checkpoint handling security. Default all loading to `weights_only=True`, add safe globals for argparse (see the sketch below).
* Improve attention mask handling for core ViT/EVA models & layers. Resolve bool masks, pass `is_causal` through for SSL tasks.
* Fix class & register token use with ViT when no pos embed is enabled.
* Add Patch Representation Refinement (PRR) as a pooling option in ViT. Thanks [Sina](https://github.com/sinahmr).
* Improve consistency of output projection / MLP dimensions for attention pooling layers.
* Hiera model F.SDPA optimization to allow Flash Attention kernel use.
* Caution added to SGDP optimizer.
* Release 1.0.26. First maintenance release since my departure from Hugging Face.
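For downstream code that loads timm checkpoints directly, the same hardening applies. A minimal sketch (the checkpoint path is a placeholder; `weights_only=True` is the `torch.load` flag the entry above refers to):

```python
import torch
import timm

model = timm.create_model("resnet50")

# Refuse to unpickle arbitrary objects; only plain tensors/containers load.
ckpt = torch.load("checkpoint.pth", map_location="cpu", weights_only=True)

# timm training checkpoints typically nest weights under 'state_dict'.
state_dict = ckpt.get("state_dict", ckpt)
model.load_state_dict(state_dict)
```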
## Feb 23, 2026
* Add token distillation training support to distillation task wrappers
* Remove some torch.jit usage in prep for official deprecation
* Caution added to AdamP optimizer
* Call reset_parameters() even if meta-device init so that buffers get init w/ hacks like init_empty_weights
* Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate NS branch for DTensor)
* Release 1.0.25

## Jan 21, 2026
* **Compat Break**: Fix oversight w/ QKV vs MLP bias in `ParallelScalingBlock` (& `DiffParallelScalingBlock`)
  * Does not impact any trained `timm` models but could impact downstream use.

## Jan 5 & 6, 2026
* Release 1.0.24
* Add new benchmark result csv files for inference timing on all models w/ RTX Pro 6000, 5090, and 4090 cards w/ PyTorch 2.9.1
* Fix moved module error in deprecated timm.models.layers import path that impacts legacy imports
* Release 1.0.23

## Dec 30, 2025
* Add better NAdaMuon trained `dpwee`, `dwee`, `dlittle` (differential) ViTs with a small boost over previous runs
  * https://huggingface.co/timm/vit_dlittle_patch16_reg1_gap_256.sbb_nadamuon_in1k (83.24% top-1)
  * https://huggingface.co/timm/vit_dwee_patch16_reg1_gap_256.sbb_nadamuon_in1k (81.80% top-1)
  * https://huggingface.co/timm/vit_dpwee_patch16_reg1_gap_256.sbb_nadamuon_in1k (81.67% top-1)
* Add a ~21M param `timm` variant of the CSATv2 model at 512x512 & 640x640
  * https://huggingface.co/timm/csatv2_21m.sw_r640_in1k (83.13% top-1)
  * https://huggingface.co/timm/csatv2_21m.sw_r512_in1k (82.58% top-1)
* Factor non-persistent param init out of `__init__` into a common method that can be externally called via `init_non_persistent_buffers()` after meta-device init. See the sketch below.
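A rough sketch of how that meta-device flow could fit together, under the assumption that `init_non_persistent_buffers()` is the method named above and that weights come from a placeholder checkpoint path:

```python
import torch
import timm

# Build the model skeleton on the meta device: no real memory is allocated.
with torch.device("meta"):
    model = timm.create_model("vit_base_patch16_224")

# Materialize real (uninitialized) storage, then re-create the
# non-persistent buffers that meta-device init skipped.
model = model.to_empty(device="cpu")
model.init_non_persistent_buffers()  # method named in the entry above

# Non-persistent buffers are absent from state dicts, so loading weights
# afterwards leaves them exactly as re-initialized above.
state = torch.load("vit_base.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state)
```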
## Dec 12, 2025
* Add CSATV2 model (thanks https://github.com/gusdlf93) -- a lightweight but high res model with DCT stem & spatial attention. https://huggingface.co/Hyunil/CSATv2
* Add AdaMuon and NAdaMuon optimizer support to existing `timm` Muon impl. Appears more competitive vs AdamW with familiar hparams for image tasks.
* End of year PR cleanup, merge aspects of several long open PRs
  * Merge differential attention (`DiffAttention`), add corresponding `DiffParallelScalingBlock` (for ViT), train some wee vits
    * https://huggingface.co/timm/vit_dwee_patch16_reg1_gap_256.sbb_in1k
    * https://huggingface.co/timm/vit_dpwee_patch16_reg1_gap_256.sbb_in1k
  * Add a few pooling modules, `LsePlus` and `SimPool`
  * Cleanup, optimize `DropBlock2d` (also add support to ByobNet based models)
* Bump unit tests to PyTorch 2.9.1 + Python 3.13 on the upper end, lower bound still PyTorch 1.13 + Python 3.10

## Dec 1, 2025
* Add lightweight task abstraction, add logits and feature distillation support to train script via new tasks.
* Remove old APEX AMP support

## Nov 4, 2025
* Fix LayerScale / LayerScale2d init bug (init values ignored), introduced in 1.0.21. Thanks https://github.com/Ilya-Fradlin
* Release 1.0.22

## Oct 31, 2025 🎃
* Update imagenet & OOD variant result csv files to include a few new models and verify correctness over several torch & timm versions
* EfficientNet-X and EfficientNet-H B5 model weights added as part of a hparam search for AdamW vs Muon (still iterating on Muon runs)

## Oct 16-20, 2025
* Add an impl of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations (see the sketch after this section)
  * extra flexibility and improved handling for conv weights and fallbacks for weight shapes not suited for orthogonalization
  * small speedup for NS iterations by reducing allocs and using fused (b)add(b)mm ops
  * by default uses AdamW (or NAdamW if `nesterov=True`) updates if Muon is not suitable for a parameter shape (or it is excluded via a param group flag)
  * like the torch impl, select from several LR scale adjustment fns via `adjust_lr_fn`
  * select from several NS coefficient presets or specify your own via `ns_coefficients`
* First 2 steps of 'meta' device model initialization supported
  * Fix several ops that were breaking creation under 'meta' device context
  * Add device & dtype factory kwarg support to all models and modules (anything inheriting from nn.Module) in `timm`
* License fields added to pretrained cfgs in code
* Release 1.0.21
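Assuming the optimizer factory accepts the Muon impl by its registered name (the hyperparameters below are illustrative, not recommended settings), wiring it into training looks roughly like:

```python
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model("resnet50")

# 'muon' selects the Muon impl above; parameter shapes unsuited to
# orthogonalization fall back to AdamW-style updates as described.
optimizer = create_optimizer_v2(model, opt="muon", lr=1e-3, weight_decay=0.05)
```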
## Sept 21, 2025
* Remap DINOv3 ViT weight tags from `lvd_1689m` -> `lvd1689m` to match (same for `sat_493m` -> `sat493m`)
* Release 1.0.20

## Sept 17, 2025
* DINOv3 (https://arxiv.org/abs/2508.10104) ConvNeXt and ViT models added. ConvNeXt models were mapped to the existing `timm` model. ViT support done via the EVA base model w/ a new `RotaryEmbeddingDinoV3` to match the DINOv3-specific RoPE impl
  * HuggingFace Hub: https://huggingface.co/collections/timm/timm-dinov3-68cb08bb0bee365973d52a4d
* MobileCLIP-2 (https://arxiv.org/abs/2508.20691) vision encoders. New MCI3/MCI4 FastViT variants added and weights mapped to existing FastViT and B, L/14 ViTs.
* MetaCLIP-2 Worldwide (https://arxiv.org/abs/2507.22062) ViT encoder weights added.
* SigLIP-2 (https://arxiv.org/abs/2502.14786) NaFlex ViT encoder weights added via the timm NaFlexViT model.
* Misc fixes and contributions

## July 23, 2025
* Add `set_input_size()` method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
* Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
* Fix small typing issue that broke Python 3.9 compat. 1.0.19 patch release.

## July 21, 2025
* ROPE support added to NaFlexViT. All models covered by the EVA base (`eva.py`) including EVA, EVA02, Meta PE ViT, `timm` SBB ViT w/ ROPE, and Naver ROPE-ViT can now be loaded in NaFlexViT when `use_naflex=True` is passed at model creation time
* More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
* PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
* Fix XY order with grid_indexing='xy', impacted non-square image use in 'xy' mode (only ROPE-ViT and PE impacted).

## July 7, 2025
* MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
  * Add stem bias (zero'd in updated weights, compat break with old weights)
  * GELU -> GELU (tanh approx). A minor change to be closer to JAX
* Add two arguments to layer-decay support, a min scale clamp and a 'no optimization' scale threshold
* Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of norms in float32
* Some typing, argument cleanup for norm, norm+act layers done with the above
* Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in `eva.py`, add RotaryEmbeddingMixed module for mixed mode, weights on HuggingFace Hub

|model                                           |img_size|top1  |top5  |param_count|
|------------------------------------------------|--------|------|------|-----------|
|vit_large_patch16_rope_mixed_ape_224.naver_in1k |224     |84.84 |97.122|304.4      |
|vit_large_patch16_rope_mixed_224.naver_in1k     |224     |84.828|97.116|304.2      |
|vit_large_patch16_rope_ape_224.naver_in1k       |224     |84.65 |97.154|304.37     |
|vit_large_patch16_rope_224.naver_in1k           |224     |84.648|97.122|304.17     |
|vit_base_patch16_rope_mixed_ape_224.naver_in1k  |224     |83.894|96.754|86.59      |
|vit_base_patch16_rope_mixed_224.naver_in1k      |224     |83.804|96.712|86.44      |
|vit_base_patch16_rope_ape_224.naver_in1k        |224     |83.782|96.61 |86.59      |
|vit_base_patch16_rope_224.naver_in1k            |224     |83.718|96.672|86.43      |
|vit_small_patch16_rope_224.naver_in1k           |224     |81.23 |95.022|21.98      |
|vit_small_patch16_rope_mixed_224.naver_in1k     |224     |81.216|95.022|21.99      |
|vit_small_patch16_rope_ape_224.naver_in1k       |224     |81.004|95.016|22.06      |
|vit_small_patch16_rope_mixed_ape_224.naver_in1k |224     |80.986|94.976|22.06      |
* Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
* Preparing version 1.0.17 release

## June 26, 2025
* MobileNetV5 backbone (w/ encoder only variant) for [Gemma 3n](https://ai.google.dev/gemma/docs/gemma-3n#parameters) image encoder
* Version 1.0.16 released

## June 23, 2025
* Add F.grid_sample based 2D and factorized pos embed resize to NaFlexViT. Faster when handling lots of different sizes (based on example by https://github.com/stas-sl).
* Further speed up patch embed resample by replacing vmap with matmul (based on snippet by https://github.com/stas-sl).
* Add 3 initial native aspect NaFlexViT checkpoints created while testing, ImageNet-1k and 3 different pos embed configs w/ the same hparams.

 | Model | Top-1 Acc | Top-5 Acc | Params (M) | Eval Seq Len |
 |:---|:---:|:---:|:---:|:---:|
 | [naflexvit_base_patch16_par_gap.e300_s576_in1k](https://hf.co/timm/naflexvit_base_patch16_par_gap.e300_s576_in1k) | 83.67 | 96.45 | 86.63 | 576 |
 | [naflexvit_base_patch16_parfac_gap.e300_s576_in1k](https://hf.co/timm/naflexvit_base_patch16_parfac_gap.e300_s576_in1k) | 83.63 | 96.41 | 86.46 | 576 |
 | [naflexvit_base_patch16_gap.e300_s576_in1k](https://hf.co/timm/naflexvit_base_patch16_gap.e300_s576_in1k) | 83.50 | 96.46 | 86.63 | 576 |
* Support gradient checkpointing for `forward_intermediates` and fix some checkpointing bugs (see the sketch after this section). Thanks https://github.com/brianhou0208
* Add 'corrected weight decay' (https://arxiv.org/abs/2506.02285) as an option to AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, SGDW optimizers
* Switch PE (perception encoder) ViT models to use native timm weights instead of remapping on the fly
* Fix cuda stream bug in prefetch loader
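A sketch of the `forward_intermediates()` interface mentioned above (the exact kwargs vary by model family and timm version; this shows the common form of returning the final features plus a list of intermediate feature maps):

```python
import torch
import timm

model = timm.create_model("vit_base_patch16_224")
x = torch.randn(1, 3, 224, 224)

# Returns (final_features, [intermediate feature maps]); an int `indices`
# value selects the last n blocks to tap, here the last two.
final, intermediates = model.forward_intermediates(x, indices=2)
for feat in intermediates:
    print(feat.shape)
```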
## June 5, 2025
* Initial NaFlexVit model code. NaFlexVit is a Vision Transformer with:
  1. Encapsulated embedding and position encoding in a single module
  2. Support for nn.Linear patch embedding on pre-patchified (dictionary) inputs
  3. Support for NaFlex variable aspect, variable resolution (SigLip-2: https://arxiv.org/abs/2502.14786)
  4. Support for FlexiViT variable patch size (https://arxiv.org/abs/2212.08013)
  5. Support for NaViT fractional/factorized position embedding (https://arxiv.org/abs/2307.06304)
* Existing vit models in `vision_transformer.py` can be loaded into the NaFlexVit model by adding the `use_naflex=True` flag to `create_model` (see the sketch after this section)
  * Some native weights coming soon
* A full NaFlex data pipeline is available that allows training / fine-tuning / evaluating with variable aspect / size images
  * To enable in `train.py` and `validate.py`, add the `--naflex-loader` arg; it must be used with a NaFlexVit
* To evaluate an existing (classic) ViT loaded in the NaFlexVit model w/ the NaFlex data pipe:
  * `python validate.py /imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256`
* The train script has some extra args worth noting
  * The `--naflex-train-seq-lens` argument specifies which sequence lengths to randomly pick from per batch during training
  * The `--naflex-max-seq-len` argument sets the target sequence length for validation
  * Adding `--model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24` will enable random patch size selection per-batch w/ interpolation
  * The `--naflex-loss-scale` arg changes the loss scaling mode per batch relative to the batch size; `timm` NaFlex loading changes the batch size for each seq len
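A minimal sketch of that flag in code (the model name is just an example; `use_naflex=True` is the creation-time flag described above):

```python
import timm

# Load a classic ViT's pretrained weights into the NaFlexVit implementation.
model = timm.create_model("vit_base_patch16_224", pretrained=True, use_naflex=True)
```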
## May 28, 2025
* Add a number of small/fast models thanks to https://github.com/brianhou0208
  * SwiftFormer - [(ICCV2023) SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://github.com/Amshaker/SwiftFormer)
  * FasterNet - [(CVPR2023) Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks](https://github.com/JierunChen/FasterNet)
  * SHViT - [(CVPR2024) SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design](https://github.com/ysj9909/SHViT)
  * StarNet - [(CVPR2024) Rewrite the Stars](https://github.com/ma-xu/Rewrite-the-Stars)
  * GhostNet-V3 - [GhostNetV3: Exploring the Training Strategies for Compact Models](https://github.com/huawei-noah/Efficient-AI-Backbones/tree/master/ghostnetv3_pytorch)
* Update EVA ViT (closest match) to support Perception Encoder models (https://arxiv.org/abs/2504.13181) from Meta, loading Hub weights, but I still need to push dedicated `timm` weights
  * Add some flexibility to the ROPE impl
* Big increase in the number of models supporting `forward_intermediates()` and some additional fixes thanks to https://github.com/brianhou0208
  * DaViT, EdgeNeXt, EfficientFormerV2, EfficientViT(MIT), EfficientViT(MSRA), FocalNet, GCViT, HGNet /V2, InceptionNeXt, Inception-V4, MambaOut, MetaFormer, NesT, Next-ViT, PiT, PVT V2, RepGhostNet, RepViT, ResNetV2, ReXNet, TinyViT, TResNet, VoV
* TNT model updated w/ new weights and `forward_intermediates()` support, thanks to https://github.com/brianhou0208
* Add `local-dir:` pretrained schema: use `local-dir:/path/to/model/folder` as the model name to source the model / pretrained cfg & weights for Hugging Face Hub format models (config.json + weights file) from a local folder. See the sketch below.
* Fixes, improvements for ONNX export
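A sketch of the `local-dir:` schema (the folder path is a placeholder for a directory holding a Hub-format `config.json` plus a weights file):

```python
import timm

# Resolve the pretrained cfg + weights from a local folder instead of the Hub.
model = timm.create_model("local-dir:/path/to/model/folder", pretrained=True)
```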
## Feb 21, 2025
* SigLIP 2 ViT image encoders added (https://huggingface.co/collections/timm/siglip-2-67b8e72ba08b09dd97aecaf9)
  * Variable resolution / aspect NaFlex versions are a WIP
* Add 'SO150M2' ViT weights trained with SBB recipes, great results, better for ImageNet than the previous attempt w/ less training.
  * `vit_so150m2_patch16_reg1_gap_448.sbb_e200_in12k_ft_in1k` - 88.1% top-1
  * `vit_so150m2_patch16_reg1_gap_384.sbb_e200_in12k_ft_in1k` - 87.9% top-1
  * `vit_so150m2_patch16_reg1_gap_256.sbb_e200_in12k_ft_in1k` - 87.3% top-1
  * `vit_so150m2_patch16_reg4_gap_256.sbb_e200_in12k`
* Updated InternViT-300M '2.5' weights
* Release 1.0.15

## Feb 1, 2025
* FYI PyTorch 2.6 & Python 3.13 are tested and working w/ current main and the released version of `timm`

## Jan 27, 2025
* Add Kron Optimizer (PSGD w/ Kronecker-factored preconditioner)
  * Code from https://github.com/evanatyourservice/kron_torch
  * See also https://sites.google.com/site/lixilinx/home/psgd

## Jan 19, 2025
* Fix loading of LeViT safetensor weights, remove conversion code which should have been deactivated
* Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not an optimal shape for ImageNet-12k/1k pretrain/ft
  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k` - 86.7% top-1
  * `vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k` - 87.4% top-1
  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k`
* Misc typing, typo, etc. cleanup
* 1.0.14 release to get the above LeViT fix out

## Jan 9, 2025
* Add support to train and validate in pure `bfloat16` or `float16`
* `wandb` project name arg added by https://github.com/caojiaolong, use args.experiment for the run name
* Fix old issue w/ checkpoint saving not working on filesystems w/o hard-link support (e.g. FUSE fs mounts)
* 1.0.13 release

## Jan 6, 2025
* Add `torch.utils.checkpoint.checkpoint()` wrapper in `timm.models` that defaults `use_reentrant=False`, unless `TIMM_REENTRANT_CKPT=1` is set in env.

## Dec 31, 2024
* `convnext_nano` 384x384 ImageNet-12k pretrain & fine-tune. https://huggingface.co/models?search=convnext_nano%20r384
* Add AIM-v2 encoders from https://github.com/apple/ml-aim, see on Hub: https://huggingface.co/models?search=timm%20aimv2
* Add PaliGemma2 encoders from https://github.com/google-research/big_vision to existing PaliGemma, see on Hub: https://huggingface.co/models?search=timm%20pali2
* Add missing L/14 DFN2B 39B CLIP ViT, `vit_large_patch14_clip_224.dfn2b_s39b`
* Fix existing `RmsNorm` layer & fn to match the standard formulation, use the PT 2.5 impl when possible. Move the old impl to the `SimpleNorm` layer; it's LN w/o centering or bias. There were only two `timm` models using it, and they have been updated.
* Allow override of `cache_dir` arg for model creation
* Pass through `trust_remote_code` for HF datasets wrapper
* `inception_next_atto` model added by creator
* Adan optimizer caution, and Lamb decoupled weight decay options
* Some feature_info metadata fixed by https://github.com/brianhou0208
* All OpenCLIP and JAX (CLIP, SigLIP, Pali, etc) model weights that used load-time remapping were given their own HF Hub instances so that they work with `hf-hub:` based loading, and thus will work with the new Transformers `TimmWrapperModel`

## Introduction

Py**T**orch **Im**age **M**odels (`timm`) is a collection of image models, layers, utilities, optimizers, schedulers, data-loaders / augmentations, and reference training / validation scripts that aim to pull together a wide variety of SOTA models with the ability to reproduce ImageNet training results.

The work of many others is present here. I've tried to make sure all source material is acknowledged via links to github, arxiv papers, etc. in the README, documentation, and code docstrings. Please let me know if I missed anything.

## Features

### Models

All model architecture families include variants with pretrained weights. Some specific model variants are without any weights; that is NOT a bug. Help training new or better weights is always appreciated.

* Aggregating Nested Transformers - https://arxiv.org/abs/2105.12723
* BEiT - https://arxiv.org/abs/2106.08254
* BEiT-V2 - https://arxiv.org/abs/2208.06366
* BEiT3 - https://arxiv.org/abs/2208.10442
* Big Transfer ResNetV2 (BiT) - https://arxiv.org/abs/1912.11370
* Bottleneck Transformers - https://arxiv.org/abs/2101.11605
* CaiT (Class-Attention in Image Transformers) - https://arxiv.org/abs/2103.17239
* CoaT (Co-Scale Conv-Attentional Image Transformers) - https://arxiv.org/abs/2104.06399
* CoAtNet (Convolution and Attention) - https://arxiv.org/abs/2106.04803
* ConvNeXt - https://arxiv.org/abs/2201.03545
* ConvNeXt-V2 - https://arxiv.org/abs/2301.00808
* ConViT (Soft Convolutional Inductive Biases Vision Transformers) - https://arxiv.org/abs/2103.10697
* CspNet (Cross-Stage Partial Networks) - https://arxiv.org/abs/1911.11929
* DeiT - https://arxiv.org/abs/2012.12877
* DeiT-III - https://arxiv.org/pdf/2204.07118.pdf
* DenseNet - https://arxiv.org/abs/1608.06993
* DLA - https://arxiv.org/abs/1707.06484
* DPN (Dual-Path Network) - https://arxiv.org/abs/1707.01629
* EdgeNeXt - https://arxiv.org/abs/2206.10589
* EfficientFormer - https://arxiv.org/abs/2206.01191
* EfficientFormer-V2 - https://arxiv.org/abs/2212.08059
* EfficientNet (MBConvNet Family)
    * EfficientNet NoisyStudent (B0-B7, L2) - https://arxiv.org/abs/1911.04252
    * EfficientNet AdvProp (B0-B8) - https://arxiv.org/abs/1911.09665
    * EfficientNet (B0-B7) - https://arxiv.org/abs/1905.11946
    * EfficientNet-EdgeTPU (S, M, L) - https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html
    * EfficientNet V2 - https://arxiv.org/abs/2104.00298
    * FBNet-C - https://arxiv.org/abs/1812.03443
    * MixNet - https://arxiv.org/abs/1907.09595
    * MNASNet B1, A1 (Squeeze-Excite), and Small - https://arxiv.org/abs/1807.11626
    * MobileNet-V2 - https://arxiv.org/abs/1801.04381
    * Single-Path NAS - https://arxiv.org/abs/1904.02877
    * TinyNet - https://arxiv.org/abs/2010.14819
* EfficientViT (MIT) - https://arxiv.org/abs/2205.14756
* EfficientViT (MSRA) - https://arxiv.org/abs/2305.07027
* EVA - https://arxiv.org/abs/2211.07636
* EVA-02 - https://arxiv.org/abs/2303.11331
* FasterNet - https://arxiv.org/abs/2303.03667
* FastViT - https://arxiv.org/abs/2303.14189
* FlexiViT - https://arxiv.org/abs/2212.08013
* FocalNet (Focal Modulation Networks) - https://arxiv.org/abs/2203.11926
* GCViT (Global Context Vision Transformer) - https://arxiv.org/abs/2206.09959
* GhostNet - https://arxiv.org/abs/1911.11907
* GhostNet-V2 - https://arxiv.org/abs/2211.12905
* GhostNet-V3 - https://arxiv.org/abs/2404.11202
* gMLP - https://arxiv.org/abs/2105.08050
* GPU-Efficient Networks - https://arxiv.org/abs/2006.14090
* Halo Nets - https://arxiv.org/abs/2103.12731
* HGNet / HGNet-V2 - TBD
* HRNet - https://arxiv.org/abs/1908.07919
* InceptionNeXt - https://arxiv.org/abs/2303.16900
* Inception-V3 - https://arxiv.org/abs/1512.00567
* Inception-ResNet-V2 and Inception-V4 - https://arxiv.org/abs/1602.07261
* Lambda Networks - https://arxiv.org/abs/2102.08602
* LeViT (Vision Transformer in ConvNet's Clothing) - https://arxiv.org/abs/2104.01136
* MambaOut - https://arxiv.org/abs/2405.07992
* MaxViT (Multi-Axis Vision Transformer) - https://arxiv.org/abs/2204.01697
* MetaFormer (PoolFormer-v2, ConvFormer, CAFormer) - https://arxiv.org/abs/2210.13452
* MLP-Mixer - https://arxiv.org/abs/2105.01601
* MobileCLIP - https://arxiv.org/abs/2311.17049
* MobileNet-V3 (MBConvNet w/ Efficient Head) - https://arxiv.org/abs/1905.02244
  * FBNet-V3 - https://arxiv.org/abs/2006.02049
  * HardCoRe-NAS - https://arxiv.org/abs/2102.11646
  * LCNet - https://arxiv.org/abs/2109.15099
* MobileNetV4 - https://arxiv.org/abs/2404.10518
* MobileOne - https://arxiv.org/abs/2206.04040
* MobileViT - https://arxiv.org/abs/2110.02178
* MobileViT-V2 - https://arxiv.org/abs/2206.02680
* MViT-V2 (Improved Multiscale Vision Transformer) - https://arxiv.org/abs/2112.01526
* NASNet-A - https://arxiv.org/abs/1707.07012
* NesT - https://arxiv.org/abs/2105.12723
* Next-ViT - https://arxiv.org/abs/2207.05501
* NFNet-F - https://arxiv.org/abs/2102.06171
* NF-RegNet / NF-ResNet - https://arxiv.org/abs/2101.08692
* PE (Perception Encoder) - https://arxiv.org/abs/2504.13181
* PNasNet - https://arxiv.org/abs/1712.00559
* PoolFormer (MetaFormer) - https://arxiv.org/abs/2111.11418
* Pooling-based Vision Transformer (PiT) - https://arxiv.org/abs/2103.16302
* PVT-V2 (Improved Pyramid Vision Transformer) - https://arxiv.org/abs/2106.13797
* RDNet (DenseNets Reloaded) - https://arxiv.org/abs/2403.19588
* RegNet - https://arxiv.org/abs/2003.13678
* RegNetZ - https://arxiv.org/abs/2103.06877
* RepVGG - https://arxiv.org/abs/2101.03697
* RepGhostNet - https://arxiv.org/abs/2211.06088
* RepViT - https://arxiv.org/abs/2307.09283
* ResMLP - https://arxiv.org/abs/2105.03404
* ResNet/ResNeXt
    * ResNet (v1b/v1.5) - https://arxiv.org/abs/1512.03385
    * ResNeXt - https://arxiv.org/abs/1611.05431
    * 'Bag of Tricks' / Gluon C, D, E, S variations - https://arxiv.org/abs/1812.01187
    * Weakly-supervised (WSL) Instagram pretrained / ImageNet tuned ResNeXt101 - https://arxiv.org/abs/1805.00932
    * Semi-supervised (SSL) / Semi-weakly Supervised (SWSL) ResNet/ResNeXts - https://arxiv.org/abs/1905.00546
    * ECA-Net (ECAResNet) - https://arxiv.org/abs/1910.03151v4
    * Squeeze-and-Excitation Networks (SEResNet) - https://arxiv.org/abs/1709.01507
    * ResNet-RS - https://arxiv.org/abs/2103.07579
* Res2Net - https://arxiv.org/abs/1904.01169
* ResNeSt - https://arxiv.org/abs/2004.08955
* ReXNet - https://arxiv.org/abs/2007.00992
* ROPE-ViT - https://arxiv.org/abs/2403.13298
* SelecSLS - https://arxiv.org/abs/1907.00837
* Selective Kernel Networks - https://arxiv.org/abs/1903.06586
* Sequencer2D - https://arxiv.org/abs/2205.01972
* SHViT - https://arxiv.org/abs/2401.16456
* SigLIP (image encoder) - https://arxiv.org/abs/2303.15343
* SigLIP 2 (image encoder) - https://arxiv.org/abs/2502.14786
* StarNet - https://arxiv.org/abs/2403.19967
* SwiftFormer - https://arxiv.org/pdf/2303.15446
* Swin S3 (AutoFormerV2) - https://arxiv.org/abs/2111.14725
* Swin Transformer - https://arxiv.org/abs/2103.14030
* Swin Transformer V2 - https://arxiv.org/abs/2111.09883
* TinyViT - https://arxiv.org/abs/2207.10666
* Transformer-iN-Transformer (TNT) - https://arxiv.org/abs/2103.00112
* TResNet - https://arxiv.org/abs/2003.13630
* Twins (Spatial Attention in Vision Transformers) - https://arxiv.org/pdf/2104.13840.pdf
* VGG - https://arxiv.org/abs/1409.1556
* Visformer - https://arxiv.org/abs/2104.12533
* Vision Transformer - https://arxiv.org/abs/2010.11929
* ViTamin - https://arxiv.org/abs/2404.02132
* VOLO (Vision Outlooker) - https://arxiv.org/abs/2106.13112
* VovNet V2 and V1 - https://arxiv.org/abs/1911.06667
* Xception - https://arxiv.org/abs/1610.02357
* Xception (Modified Aligned, Gluon) - https://arxiv.org/abs/1802.02611
* Xception (Modified Aligned, TF) - https://arxiv.org/abs/1802.02611
* XCiT (Cross-Covariance Image Transformers) - https://arxiv.org/abs/2106.09681

### Optimizers
To see the full list of optimizers w/ descriptions: `timm.optim.list_optimizers(with_description=True)` (see the sketch after this list).

Included optimizers available via the `timm.optim.create_optimizer_v2` factory method:
* `adabelief` an implementation of AdaBelief adapted from https://github.com/juntang-zhuang/Adabelief-Optimizer - https://arxiv.org/abs/2010.07468
* `adafactor` adapted from [FAIRSeq impl](https://github.com/pytorch/fairseq/blob/master/fairseq/optim/adafactor.py) - https://arxiv.org/abs/1804.04235
* `adafactorbv` adapted from [Big Vision](https://github.com/google-research/big_vision/blob/main/big_vision/optax.py) - https://arxiv.org/abs/2106.04560
* `adahessian` by [David Samuel](https://github.com/davda54/ada-hessian) - https://arxiv.org/abs/2006.00719
* `adamp` and `sgdp` by [Naver ClovAI](https://github.com/clovaai) - https://arxiv.org/abs/2006.08217
* `adamuon` and `nadamuon` as per https://github.com/Chongjie-Si/AdaMuon - https://arxiv.org/abs/2507.11005
* `adan` an implementation of Adan adapted from https://github.com/sail-sg/Adan - https://arxiv.org/abs/2208.06677
* `adopt` ADOPT adapted from https://github.com/iShohei220/adopt - https://arxiv.org/abs/2411.02853
* `kron` PSGD w/ Kronecker-factored preconditioner from https://github.com/evanatyourservice/kron_torch - https://sites.google.com/site/lixilinx/home/psgd
* `lamb` an implementation of Lamb and LambC (w/ trust-clipping) cleaned up and modified to support use with XLA - https://arxiv.org/abs/1904.00962
* `laprop` optimizer from https://github.com/Z-T-WANG/LaProp-Optimizer - https://arxiv.org/abs/2002.04839
* `lars` an implementation of LARS and LARC (w/ trust-clipping) - https://arxiv.org/abs/1708.03888
* `lion` an implementation of Lion adapted from https://github.com/google/automl/tree/master/lion - https://arxiv.org/abs/2302.06675
* `lookahead` adapted from impl by [Liam](https://github.com/alphadl/lookahead.pytorch) - https://arxiv.org/abs/1907.08610
* `madgrad` an implementation of MADGRAD adapted from https://github.com/facebookresearch/madgrad - https://arxiv.org/abs/2101.11075
* `mars` MARS optimizer from https://github.com/AGI-Arena/MARS - https://arxiv.org/abs/2411.10438
* `muon` Muon optimizer from https://github.com/KellerJordan/Muon with numerous additions and improved non-transformer behaviour
* `nadam` an implementation of Adam w/ Nesterov momentum
* `nadamw` an implementation of AdamW (Adam w/ decoupled weight-decay) w/ Nesterov momentum. A simplified impl based on https://github.com/mlcommons/algorithmic-efficiency
* `novograd` by [Masashi Kimura](https://github.com/convergence-lab/novograd) - https://arxiv.org/abs/1905.11286
* `radam` by [Liyuan Liu](https://github.com/LiyuanLucasLiu/RAdam) - https://arxiv.org/abs/1908.03265
* `rmsprop_tf` adapted from PyTorch RMSProp by myself. Reproduces much-improved TensorFlow RMSProp behaviour
* `sgdw` an implementation of SGD w/ decoupled weight-decay
* `fused<name>` optimizers by name with [NVIDIA Apex](https://github.com/NVIDIA/apex/tree/master/apex/optimizers) installed
* `bnb<name>` optimizers by name with [BitsAndBytes](https://github.com/TimDettmers/bitsandbytes) installed
* `cadamw`, `clion`, and more 'Cautious' optimizers from https://github.com/kyleliang919/C-Optim - https://arxiv.org/abs/2411.16085
* `adam`, `adamw`, `rmsprop`, `adadelta`, `adagrad`, and `sgd` pass through to `torch.optim` implementations
* `c` suffix (e.g. `adamc`, `nadamc`) to implement 'corrected weight decay' from https://arxiv.org/abs/2506.02285
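A small sketch of the factory pattern above (assuming `list_optimizers(with_description=True)` yields name/description pairs; the hyperparameters are placeholders):

```python
import timm
from timm.optim import create_optimizer_v2, list_optimizers

# Enumerate everything the factory can build, with a short description each.
for name, description in list_optimizers(with_description=True):
    print(f"{name:16s} {description}")

# Build one by name; extra kwargs are passed through to the optimizer.
model = timm.create_model("resnet50")
opt = create_optimizer_v2(model, opt="adamw", lr=5e-4, weight_decay=0.05)
```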
### Augmentations
* Random Erasing from [Zhun Zhong](https://github.com/zhunzhong07/Random-Erasing/blob/master/transforms.py) - https://arxiv.org/abs/1708.04896
* Mixup - https://arxiv.org/abs/1710.09412 (see the sketch after this list)
* CutMix - https://arxiv.org/abs/1905.04899
* AutoAugment (https://arxiv.org/abs/1805.09501) and RandAugment (https://arxiv.org/abs/1909.13719) ImageNet configurations modeled after the impl for EfficientNet training (https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/autoaugment.py)
* AugMix w/ JSD loss; JSD w/ clean + augmented mixing support works with AutoAugment and RandAugment as well - https://arxiv.org/abs/1912.02781
* SplitBatchNorm - allows splitting batch norm layers between clean and augmented (auxiliary batch norm) data
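As a sketch of how the Mixup/CutMix pieces are typically combined in training (the hyperparameter values are illustrative, not recommendations):

```python
import torch
from timm.data import Mixup

# One callable handles both Mixup and CutMix, switching randomly per batch,
# and converts integer labels into smoothed soft targets.
mixup_fn = Mixup(
    mixup_alpha=0.8, cutmix_alpha=1.0, switch_prob=0.5,
    label_smoothing=0.1, num_classes=1000,
)

images = torch.randn(8, 3, 224, 224)
targets = torch.randint(0, 1000, (8,))
images, targets = mixup_fn(images, targets)  # targets now (8, 1000) soft labels
```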
### Regularization
* DropPath aka "Stochastic Depth" - https://arxiv.org/abs/1603.09382
* DropBlock - https://arxiv.org/abs/1810.12890
* Blur Pooling - https://arxiv.org/abs/1904.11486

### Other

Several (less common) features that I often utilize in my projects are included. Many of their additions are the reason why I maintain my own set of models, instead of using others' via PIP:

* All models have a common default configuration interface and API for
    * accessing/changing the classifier - `get_classifier` and `reset_classifier`
    * doing a forward pass on just the features - `forward_features` (see [documentation](https://huggingface.co/docs/timm/feature_extraction))
    * these make it easy to write consistent network wrappers that work with any of the models
* All models support multi-scale feature map extraction (feature pyramids) via create_model (see [documentation](https://huggingface.co/docs/timm/feature_extraction) and the sketch after this list)
    * `create_model(name, features_only=True, out_indices=..., output_stride=...)`
    * `out_indices` creation arg specifies which feature maps to return; these indices are 0 based and generally correspond to the `C(i + 1)` feature level.
    * `output_stride` creation arg controls the output stride of the network by using dilated convolutions. Most networks are stride 32 by default. Not all networks support this.
    * feature map channel counts and reduction level (stride) can be queried AFTER model creation via the `.feature_info` member
* All models have a consistent pretrained weight loader that adapts the last linear layer if necessary, and converts from 3 to 1 channel input if desired
* High performance [reference training, validation, and inference scripts](https://huggingface.co/docs/timm/training_script) that work in several process/GPU modes:
    * NVIDIA DDP w/ a single GPU per process, multiple processes with APEX present (AMP mixed-precision optional)
    * PyTorch DistributedDataParallel w/ multi-gpu, single process (AMP disabled as it crashes when enabled)
    * PyTorch w/ single GPU single process (AMP optional)
* A dynamic global pool implementation that allows selecting from average pooling, max pooling, average + max, or concat([average, max]) at model creation. All global pooling is adaptive average by default and compatible with pretrained weights.
* A 'Test Time Pool' wrapper that can wrap any of the included models and usually provides improved performance doing inference with input images larger than the training size. Idea adapted from the original DPN implementation when I ported it (https://github.com/cypw/DPNs)
* Learning rate schedulers
  * Ideas adopted from
     * [AllenNLP schedulers](https://github.com/allenai/allennlp/tree/master/allennlp/training/learning_rate_schedulers)
     * [FAIRseq lr_scheduler](https://github.com/pytorch/fairseq/tree/master/fairseq/optim/lr_scheduler)
     * SGDR: Stochastic Gradient Descent with Warm Restarts (https://arxiv.org/abs/1608.03983)
  * Schedulers include `step`, `cosine` w/ restarts, `tanh` w/ restarts, `plateau`
* Space-to-Depth by [mrT23](https://github.com/mrT23/TResNet/blob/master/src/models/tresnet/layers/space_to_depth.py) (https://arxiv.org/abs/1801.04590)
* Adaptive Gradient Clipping (https://arxiv.org/abs/2102.06171, https://github.com/deepmind/deepmind-research/tree/master/nfnets)
* An extensive selection of channel and/or spatial attention modules:
    * Bottleneck Transformer - https://arxiv.org/abs/2101.11605
    * CBAM - https://arxiv.org/abs/1807.06521
    * Effective Squeeze-Excitation (ESE) - https://arxiv.org/abs/1911.06667
    * Efficient Channel Attention (ECA) - https://arxiv.org/abs/1910.03151
    * Gather-Excite (GE) - https://arxiv.org/abs/1810.12348
    * Global Context (GC) - https://arxiv.org/abs/1904.11492
    * Halo - https://arxiv.org/abs/2103.12731
    * Involution - https://arxiv.org/abs/2103.06255
    * Lambda Layer - https://arxiv.org/abs/2102.08602
    * Non-Local (NL) - https://arxiv.org/abs/1711.07971
    * Squeeze-and-Excitation (SE) - https://arxiv.org/abs/1709.01507
    * Selective Kernel (SK) - https://arxiv.org/abs/1903.06586
    * Split (SPLAT) - https://arxiv.org/abs/2004.08955
    * Shifted Window (SWIN) - https://arxiv.org/abs/2103.14030

## Results

Model validation results can be found in the [results tables](results/README.md).

## Getting Started (Documentation)

The official documentation can be found at https://huggingface.co/docs/hub/timm. Documentation contributions are welcome.

[Getting Started with PyTorch Image Models (timm): A Practitioner's Guide](https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055-2/) by [Chris Hughes](https://github.com/Chris-hughes10) is an extensive blog post covering many aspects of `timm` in detail.

[timmdocs](http://timm.fast.ai/) is an alternate set of documentation for `timm`. A big thanks to [Aman Arora](https://github.com/amaarora) for his efforts creating timmdocs.

[paperswithcode](https://paperswithcode.com/lib/timm) is a good resource for browsing the models within `timm`.
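A sketch of the feature-pyramid interface referenced in the list above (the model name and indices are examples; not every network supports `output_stride`):

```python
import torch
import timm

# Backbone that returns a pyramid of feature maps instead of logits.
model = timm.create_model(
    "resnet50", features_only=True, out_indices=(1, 2, 3, 4), output_stride=16,
)

# Channel counts and reduction (stride) per returned map, queryable post-creation.
print(model.feature_info.channels())   # e.g. [256, 512, 1024, 2048]
print(model.feature_info.reduction())  # e.g. [4, 8, 16, 16] w/ output_stride=16

x = torch.randn(1, 3, 224, 224)
for fmap in model(x):
    print(fmap.shape)
```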
## Train, Validation, Inference Scripts

The root folder of the repository contains reference train, validation, and inference scripts that work with the included models and other features of this repository. They are adaptable for other datasets and use cases with a little hacking. See the [documentation](https://huggingface.co/docs/timm/training_script).
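For example, a single-GPU training run looks roughly like `python train.py /imagenet --model resnet50 --amp -j 8 --opt adamw --lr 5e-4 --weight-decay 0.05` (flags mirror the validate.py example earlier in this README; the values are illustrative and the full set of options is in the documentation).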
## Awesome PyTorch Resources

One of the greatest assets of PyTorch is the community and their contributions. A few of my favourite resources that pair well with the models and components here are listed below.

### Object Detection, Instance and Semantic Segmentation
* Detectron2 - https://github.com/facebookresearch/detectron2
* Segmentation Models (Semantic) - https://github.com/qubvel/segmentation_models.pytorch
* EfficientDet (Obj Det, Semantic soon) - https://github.com/rwightman/efficientdet-pytorch

### Computer Vision / Image Augmentation
* Albumentations - https://github.com/albumentations-team/albumentations
* Kornia - https://github.com/kornia/kornia

### Knowledge Distillation
* RepDistiller - https://github.com/HobbitLong/RepDistiller
* torchdistill - https://github.com/yoshitomo-matsubara/torchdistill

### Metric Learning
* PyTorch Metric Learning - https://github.com/KevinMusgrave/pytorch-metric-learning

### Training / Frameworks
* fastai - https://github.com/fastai/fastai
* lightly_train - https://github.com/lightly-ai/lightly-train

### Deployment
* timmx (Export timm models to ONNX, CoreML, LiteRT, TensorRT, and more) - https://github.com/Boulaouaney/timmx

## Licenses

### Code
The code here is licensed Apache 2.0. I've taken care to make sure any third party code included or adapted has compatible (permissive) licenses such as MIT, BSD, etc. I've made an effort to avoid any GPL / LGPL conflicts. That said, it is your responsibility to ensure you comply with the licenses here and the conditions of any dependent licenses. Where applicable, I've linked the sources/references for various components in docstrings. If you think I've missed anything please create an issue.

### Pretrained Weights
So far all of the pretrained weights available here are pretrained on ImageNet, with a select few that have some additional pretraining (see the extra note below). ImageNet was released for non-commercial research purposes only (https://image-net.org/download). It's not clear what the implications of that are for the use of pretrained weights from that dataset. Any models I have trained with ImageNet are done for research purposes and one should assume that the original dataset license applies to the weights. It's best to seek legal advice if you intend to use the pretrained weights in a commercial product.

#### Pretrained on more than ImageNet
Several weights included or referenced here were pretrained with proprietary datasets that I do not have access to. These include the Facebook WSL, SSL, SWSL ResNe(Xt) and the Google Noisy Student EfficientNet models. The Facebook models have an explicit non-commercial license (CC-BY-NC 4.0, https://github.com/facebookresearch/semi-supervised-ImageNet1K-models, https://github.com/facebookresearch/WSL-Images). The Google models do not appear to have any restriction beyond the Apache 2.0 license (and ImageNet concerns). In either case, you should contact Facebook or Google with any questions.

## Citing

### BibTeX

```bibtex
@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
}
```

### Latest DOI

[![DOI](https://zenodo.org/badge/168799526.svg)](https://zenodo.org/badge/latestdoi/168799526)
支持。\n\n### 2025年11月4日\n* 修复 LayerScale \u002F LayerScale2d 初始化错误（初始值被忽略），该错误是在 1.0.21 版本中引入的。感谢 https:\u002F\u002Fgithub.com\u002FIlya-Fradlin。\n* 发布 1.0.22 版本。\n\n### 2025年10月31日 🎃\n* 更新 ImageNet 和 OOD 变体结果 CSV 文件，加入几款新模型，并在多个 PyTorch 和 `timm` 版本中验证其正确性。\n* 添加 EfficientNet-X 和 EfficientNet-H B5 模型权重，作为 AdamW 与 Muon 超参数搜索的一部分（仍在迭代 Muon 运行）。\n\n### 2025年10月16日至20日\n* 添加基于 https:\u002F\u002Fgithub.com\u002FKellerJordan\u002FMuon 的 Muon 优化器实现，并进行定制。\n  * 提供额外的灵活性，改进对卷积权重的处理，并为不适合正交化的权重形状提供回退机制。\n  * 通过减少内存分配和使用融合的 (b)add(b)mm 操作，小幅提升了 NS 迭代的速度。\n  * 默认情况下，如果参数形状不适合使用 Muon（或通过参数组标志排除），则会使用 AdamW（或若 `nesterov=True` 则使用 NAdamW）更新。\n  * 类似于 PyTorch 实现，可通过 `adjust_lr_fn` 选择多种 LR 缩放调整函数。\n  * 可以选择几种 NS 系数预设，或通过 `ns_coefficients` 自定义自己的系数。\n* 支持“meta”设备模型初始化的前两步。\n  * 修复了几种在“meta”设备上下文中创建时会出错的操作。\n  * 在 `timm` 中的所有模型和模块（任何继承自 nn.Module 的对象）中添加了设备和 dtype 工厂关键字参数的支持。\n* 在代码中的预训练配置文件中添加了许可证字段。\n* 发布 1.0.21 版本。\n\n### 2025年9月21日\n* 将 DINOv3 ViT 权重标签从 `lvd_1689m` 重新映射为 `lvd1689m`，以保持一致（同样适用于 `sat_493m` -> `sat493m`）。\n* 发布 1.0.20 版本。\n\n### 2025年9月17日\n* 添加 DINOv3（https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10104）ConvNeXt 和 ViT 模型。ConvNeXt 模型被映射到现有的 `timm` 模型。ViT 支持则通过 EVA 基础模型实现，并使用新的 `RotaryEmbeddingDinoV3` 来匹配 DINOv3 特有的 RoPE 实现。\n  * HuggingFace Hub: https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Ftimm\u002Ftimm-dinov3-68cb08bb0bee365973d52a4d。\n* MobileCLIP-2（https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.20691）视觉编码器。添加了新的 MCI3\u002FMCI4 FastViT 变体，并将权重映射到现有的 FastViT 和 B、L\u002F14 ViT 模型。\n* 添加 MetaCLIP-2 Worldwide（https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.22062）ViT 编码器权重。\n* 通过 timm NaFlexViT 模型添加 SigLIP-2（https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.14786）NaFlex ViT 编码器权重。\n* 其他修复和贡献。\n\n### 2025年7月23日\n* 在 EVA 模型中添加 `set_input_size()` 方法，OpenCLIP 3.0.0 使用该方法允许对基于 `timm` 的编码器模型进行大小调整。\n* 发布 1.0.18 版本，这是 OpenCLIP 3.0.0 中 PE-Core S 和 T 模型所需的版本。\n* 修复一个小的类型问题，该问题导致 Python 3.9 不兼容。发布 1.0.19 补丁版本。\n\n## 2025年7月21日\n* NaFlexViT新增ROPE支持。所有由EVA基础模块（`eva.py`）覆盖的模型，包括EVA、EVA02、Meta PE ViT、带有ROPE的`timm` SBB ViT以及Naver ROPE-ViT，现在都可以在创建模型时传入`use_naflex=True`参数后加载到NaFlexViT中。\n* 增加了更多Meta PE ViT编码器，包括小型\u002F微型变体、带分块处理的语言变体，以及更多空间变体。\n* 修复了NaFlexViT及EVA模型中的PatchDropout问题（在添加Naver ROPE-ViT后出现的回归问题）。\n* 修正了grid_indexing='xy'模式下的XY顺序问题，该问题影响了在‘xy’模式下使用非正方形图像的情况（仅ROPE-ViT和PE受影响）。\n\n## 2025年7月7日\n* 对MobileNet-v5主干网络进行了调整，以改善Google Gemma 3n的表现（配合更新的官方权重）：\n  * 添加了stem偏置项（在更新的权重中已置零，与旧权重不兼容）；\n  * 将激活函数从GELU改为GELU（tanh近似）。这一小幅改动旨在更接近JAX实现。\n* 为层衰减支持新增两个参数：最小缩放系数限制和“无优化”缩放阈值。\n* 新增‘Fp32’版本的LayerNorm、RMSNorm和SimpleNorm变体，可启用以强制在float32精度下进行归一化计算。\n* 针对上述内容，对归一化层及其激活组合层的类型注解和参数进行了清理。\n* 在`eva.py`中支持Naver ROPE-ViT（https:\u002F\u002Fgithub.com\u002Fnaver-ai\u002Frope-vit），并新增RotaryEmbeddingMixed模块用于混合模式，权重已发布至HuggingFace Hub。\n\n|模型                                             |图像尺寸|Top-1准确率|Top-5准确率|参数量|\n|--------------------------------------------------|--------|----------|----------|-------|\n|vit_large_patch16_rope_mixed_ape_224.naver_in1k  |224     |84.84     |97.122    |304.4  |\n|vit_large_patch16_rope_mixed_224.naver_in1k      |224     |84.828    |97.116    |304.2  |\n|vit_large_patch16_rope_ape_224.naver_in1k        |224     |84.65     |97.154    |304.37 |\n|vit_large_patch16_rope_224.naver_in1k            |224     |84.648    |97.122    |304.17 |\n|vit_base_patch16_rope_mixed_ape_224.naver_in1k   |224     |83.894    |96.754    |86.59  |\n|vit_base_patch16_rope_mixed_224.naver_in1k       |224     |83.804    |96.712    |86.44  |\n|vit_base_patch16_rope_ape_224.naver_in1k         |224     |83.782  
  |96.61     |86.59  |\n|vit_base_patch16_rope_224.naver_in1k             |224     |83.718    |96.672    |86.43  |\n|vit_small_patch16_rope_224.naver_in1k            |224     |81.23     |95.022    |21.98  |\n|vit_small_patch16_rope_mixed_224.naver_in1k      |224     |81.216    |95.022    |21.99  |\n|vit_small_patch16_rope_ape_224.naver_in1k        |224     |81.004    |95.016    |22.06  |\n|vit_small_patch16_rope_mixed_ape_224.naver_in1k  |224     |80.986    |94.976    |22.06  |\n* 对ROPE模块、辅助工具及FX追踪叶节点注册进行了部分清理。\n* 正在准备1.0.17版本发布。\n\n## 2025年6月26日\n* 提供了适用于[Gemma 3n](https:\u002F\u002Fai.google.dev\u002Fgemma\u002Fdocs\u002Fgemma-3n#parameters)图像编码器的MobileNetV5主干网络（仅含编码器变体）。\n* 发布了1.0.16版本。\n\n## 2025年6月23日\n* 在NaFlexViT中新增基于F.grid_sample的2D及因子化位置编码插值功能。当需要处理大量不同尺寸时，此方法更为高效（参考自https:\u002F\u002Fgithub.com\u002Fstas-sl的示例）。\n* 进一步加速patch嵌入重采样过程，将vmap替换为矩阵乘法（参考自https:\u002F\u002Fgithub.com\u002Fstas-sl的代码片段）。\n* 在测试过程中生成了3个初始的原生宽高比NaFlexViT检查点，分别对应ImageNet-1k数据集及3种不同的位置编码配置，且具有相同的超参数。\n\n| 模型 | Top-1准确率 | Top-5准确率 | 参数量（M） | 评估序列长度 |\n|:---|:---:|:---:|:---:|:---:|\n| [naflexvit_base_patch16_par_gap.e300_s576_in1k](https:\u002F\u002Fhf.co\u002Ftimm\u002Fnaflexvit_base_patch16_par_gap.e300_s576_in1k) | 83.67 | 96.45 | 86.63 | 576 |\n| [naflexvit_base_patch16_parfac_gap.e300_s576_in1k](https:\u002F\u002Fhf.co\u002Ftimm\u002Fnaflexvit_base_patch16_parfac_gap.e300_s576_in1k) | 83.63 | 96.41 | 86.46 | 576 |\n| [naflexvit_base_patch16_gap.e300_s576_in1k](https:\u002F\u002Fhf.co\u002Ftimm\u002Fnaflexvit_base_patch16_gap.e300_s576_in1k) | 83.50 | 96.46 | 86.63 | 576 |\n* 支持`forward_intermediates`梯度检查点，并修复了一些检查点相关的bug。感谢https:\u002F\u002Fgithub.com\u002Fbrianhou0208的贡献。\n* 将AdamW（传统）、Adopt、Kron、Adafactor（BV）、Lamb、LaProp、Lion、NadamW、RmsPropTF、SGDW等优化器的权重衰减选项更新为“修正版权重衰减”（https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.02285）。\n* 将PE（感知编码器）ViT模型切换为使用原生timm权重，而非实时重新映射。\n* 修复了预取加载器中的CUDA流bug。\n\n## 2025年6月5日\n* 初始的NaFlexVit模型代码。NaFlexVit是一种视觉Transformer，具备以下特性：\n  1. 将嵌入和位置编码封装在一个模块中；\n  2. 支持对预分割（字典形式）输入进行nn.Linear patch嵌入；\n  3. 支持NaFlex的可变宽高比和分辨率（SigLip-2：https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.14786）；\n  4. 支持FlexiViT的可变patch大小（https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.08013）；\n  5. 
支持NaViT的分数\u002F因子化位置编码（https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.06304）。\n* 现有的`vision_transformer.py`中的ViT模型可以通过在`create_model`中添加`use_naflex=True`标志加载到NaFlexVit模型中。\n  * 部分原生权重即将推出。\n* 提供了一套完整的NaFlex数据流水线，支持使用可变宽高比\u002F尺寸的图像进行训练\u002F微调\u002F评估。\n  * 在`train.py`和`validate.py`中添加`--naflex-loader`参数即可启用，但必须与NaFlexVit模型配合使用。\n* 使用NaFlex数据流水线评估已加载到NaFlexVit模型中的现有（经典）ViT模型：\n  * `python validate.py \u002Fimagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256`\n* 训练过程中还有一些值得注意的额外参数功能：\n  * `--naflex-train-seq-lens`参数指定了训练时每批次随机选择的序列长度范围；\n  * `--naflex-max-seq-len`参数设置了验证时的目标序列长度；\n  * 添加`--model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24`将启用每批次随机选择patch大小并进行插值的功能；\n  * `--naflex-loss-scale`参数会根据批次大小动态调整损失缩放模式，而`timm`的NaFlex加载方式会为每个序列长度调整批次大小。\n\n## 2025年5月28日\n* 感谢 https:\u002F\u002Fgithub.com\u002Fbrianhou0208 的贡献，新增了一批小型\u002F快速模型：\n  * SwiftFormer - [(ICCV2023) SwiftFormer：用于基于Transformer的实时移动视觉应用的高效加性注意力机制](https:\u002F\u002Fgithub.com\u002FAmshaker\u002FSwiftFormer)\n  * FasterNet - [(CVPR2023) 跑起来，别走着：追求更高FLOPS以实现更快的神经网络](https:\u002F\u002Fgithub.com\u002FJierunChen\u002FFasterNet)\n  * SHViT - [(CVPR2024) SHViT：具有内存高效宏设计的单头视觉Transformer](https:\u002F\u002Fgithub.com\u002Fysj9909\u002FSHViT)\n  * StarNet - [(CVPR2024) 重写星辰](https:\u002F\u002Fgithub.com\u002Fma-xu\u002FRewrite-the-Stars)\n  * GhostNet-V3 [GhostNetV3：探索紧凑模型的训练策略](https:\u002F\u002Fgithub.com\u002Fhuawei-noah\u002FEfficient-AI-Backbones\u002Ftree\u002Fmaster\u002Fghostnetv3_pytorch)\n* 更新EVA ViT（最接近的匹配）以支持Meta的Perception Encoder模型（https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.13181），加载Hub权重，但我仍需推送专门的`timm`权重。\n  * 为ROPE实现增加了一些灵活性。\n* 在https:\u002F\u002Fgithub.com\u002Fbrianhou0208的帮助下，支持`forward_intermediates()`的模型数量大幅增加，并进行了一些额外的修复。\n  * 包括DaViT、EdgeNeXt、EfficientFormerV2、EfficientViT(MIT)、EfficientViT(MSRA)、FocalNet、GCViT、HGNet \u002FV2、InceptionNeXt、Inception-V4、MambaOut、MetaFormer、NesT、Next-ViT、PiT、PVT V2、RepGhostNet、RepViT、ResNetV2、ReXNet、TinyViT、TResNet、VoV等。\n* TNT模型已使用新权重更新了`forward_intermediates()`，感谢https:\u002F\u002Fgithub.com\u002Fbrianhou0208。\n* 添加`local-dir:`预训练模式，可以使用`local-dir:\u002Fpath\u002Fto\u002Fmodel\u002Ffolder`作为模型名称，从本地文件夹中加载Hugging Face Hub模型的配置文件和权重文件。\n* 修复并改进了ONNX导出功能。
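\n\n结合上文的 `use_naflex=True` 标志与 `local-dir:` 预训练模式，下面给出一个最小加载示意（模型名与本地路径均为假设，仅作演示）：\n\n```python\nimport timm\n\n# 从本地文件夹加载 Hugging Face Hub 格式的配置与权重（路径为假设）\nmodel = timm.create_model('local-dir:\u002Fpath\u002Fto\u002Fmodel\u002Ffolder', pretrained=True)\n\n# 将现有的经典 ViT 以 NaFlexVit 形式加载（对应上文 2025年6月5日 条目）\nnaflex_model = timm.create_model('vit_base_patch16_224', pretrained=True, use_naflex=True)\n```\n\n## 2025年2月21日\n* 新增SigLIP 2 ViT图像编码器（https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Ftimm\u002Fsiglip-2-67b8e72ba08b09dd97aecaf9）。\n  * 可变分辨率\u002F宽高比的NaFlex版本仍在开发中。\n* 新增采用SBB配方训练的‘SO150M2’ViT权重，效果极佳，相比之前的尝试，在ImageNet上的表现更好，且训练量更少。\n  * `vit_so150m2_patch16_reg1_gap_448.sbb_e200_in12k_ft_in1k` - Top-1准确率88.1%\n  * `vit_so150m2_patch16_reg1_gap_384.sbb_e200_in12k_ft_in1k` - Top-1准确率87.9%\n  * `vit_so150m2_patch16_reg1_gap_256.sbb_e200_in12k_ft_in1k` - Top-1准确率87.3%\n  * `vit_so150m2_patch16_reg4_gap_256.sbb_e200_in12k`\n* 更新InternViT-300M ‘2.5’权重。\n* 发布1.0.15版本。\n\n## 2025年2月1日\n* 特别说明：PyTorch 2.6和Python 3.13已在当前主分支及发布的`timm`版本中测试通过并正常工作。\n\n## 2025年1月27日\n* 新增Kron优化器（带有克罗内克分解预条件的PSGD）。\n  * 代码来自https:\u002F\u002Fgithub.com\u002Fevanatyourservice\u002Fkron_torch。\n  * 更多信息请参见https:\u002F\u002Fsites.google.com\u002Fsite\u002Flixilinx\u002Fhome\u002Fpsgd。\n\n## 2025年1月19日\n* 修复LeViT safetensor权重的加载问题，移除本应已停用的转换代码。\n* 新增采用SBB配方训练的‘SO150M’ViT权重，效果尚可，但其形状并不适合ImageNet-12k\u002F1k的预训练和微调。\n  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k` - Top-1准确率86.7%\n  * `vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k` - Top-1准确率87.4%\n  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k`\n* 进行了一些类型注解、拼写错误等方面的清理。\n* 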
发布1.0.14版本以修复上述LeViT问题。\n\n## 2025年1月9日\n* 增加对纯`bfloat16`或`float16`训练和验证的支持。\n* 由https:\u002F\u002Fgithub.com\u002Fcaojiaolong添加了`wandb`项目名称参数，使用arg.experiment指定名称。\n* 修复旧问题：在不支持硬链接的文件系统上无法保存检查点（例如FUSE挂载的文件系统）。\n* 发布1.0.13版本。\n\n## 2025年1月6日\n* 在`timm.models`中添加`torch.utils.checkpoint.checkpoint()`包装器，默认设置`use_reentrant=False`，除非环境变量`TIMM_REENTRANT_CKPT=1`被设置。\n\n## 2024年12月31日\n* `convnext_nano` 384x384 ImageNet-12k预训练及微调模型。https:\u002F\u002Fhuggingface.co\u002Fmodels?search=convnext_nano%20r384\n* 新增来自https:\u002F\u002Fgithub.com\u002Fapple\u002Fml-aim的AIM-v2编码器，可在Hub上查看：https:\u002F\u002Fhuggingface.co\u002Fmodels?search=timm%20aimv2\n* 将来自https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fbig_vision的PaliGemma2编码器添加到现有的PaliGemma中，可在Hub上查看：https:\u002F\u002Fhuggingface.co\u002Fmodels?search=timm%20pali2\n* 补齐缺失的L\u002F14 DFN2B 39B CLIP ViT，即`vit_large_patch14_clip_224.dfn2b_s39b`\n* 修复现有的`RmsNorm`层及函数，使其符合标准公式，尽可能使用PT 2.5的实现。将旧实现移至`SimpleNorm`层，该层为LN，但不含中心化或偏置。此前仅有两个`timm`模型使用它，现已完成更新。\n* 允许覆盖模型创建时的`cache_dir`参数。\n* 将`trust_remote_code`传递给HF数据集包装器。\n* 创作者新增了`inception_next_atto`模型。\n* 提醒注意Adan优化器以及Lamb解耦合权重衰减选项。\n* 由https:\u002F\u002Fgithub.com\u002Fbrianhou0208修复了一些feature_info元数据。\n* 所有使用加载时重新映射的OpenCLIP和JAX（CLIP、SigLIP、Pali等）模型权重均已分配独立的HF Hub实例，以便通过`hf-hub:`方式加载，从而与新的Transformers `TimmWrapperModel`兼容。\n\n## 简介\n\nPy**T**orch **Im**age **M**odels (`timm`) 是一个包含图像模型、层、工具、优化器、调度器、数据加载器\u002F增强以及参考训练\u002F验证脚本的集合，旨在汇集各种最先进的模型，并具备重现ImageNet训练结果的能力。\n此处包含了众多他人的工作成果。我已尽力确保所有来源材料均在README、文档和代码注释中通过GitHub、arXiv论文等链接予以注明。如有遗漏，请随时告知我。\n\n## 功能\n\n### 模型\n\n所有模型架构系列均包含带有预训练权重的变体。也有一些特定的模型变体未附带任何权重，这并非错误。我们始终欢迎帮助训练新的或更好的权重。\n\n* 聚合嵌套Transformer - https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.12723\n* BEiT - https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.08254\n* BEiT-V2 - https:\u002F\u002Farxiv.org\u002Fabs\u002F2208.06366\n* BEiT3 - https:\u002F\u002Farxiv.org\u002Fabs\u002F2208.10442\n* 大迁移ResNetV2 (BiT) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1912.11370\n* 瓶颈Transformer - https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.11605\n* CaiT（图像Transformer中的类别注意力） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.17239\n* CoaT（协同尺度卷积-注意力图像Transformer） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.06399\n* CoAtNet（卷积与注意力） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.04803\n* ConvNeXt - https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.03545\n* ConvNeXt-V2 - https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.00808\n* ConViT（具有软卷积归纳偏置的视觉Transformer） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.10697\n* CspNet（跨阶段部分网络） - https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.11929\n* DeiT - https:\u002F\u002Farxiv.org\u002Fabs\u002F2012.12877\n* DeiT-III - https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.07118.pdf\n* DenseNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F1608.06993\n* DLA - https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06484\n* DPN（双路径网络） - https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.01629\n* EdgeNeXt - https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.10589\n* EfficientFormer - https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.01191\n* EfficientFormer-V2 - https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.08059\n* EfficientNet（MBConvNet家族）\n    * EfficientNet NoisyStudent（B0-B7，L2） - https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.04252\n    * EfficientNet AdvProp（B0-B8） - https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.09665\n    * EfficientNet（B0-B7） - https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.11946\n    * EfficientNet-EdgeTPU（S、M、L） - 
https:\u002F\u002Fai.googleblog.com\u002F2019\u002F08\u002Fefficientnet-edgetpu-creating.html\n    * EfficientNet V2 - https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.00298\n    * FBNet-C - https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.03443\n    * MixNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.09595\n    * MNASNet B1、A1（挤压-激励）以及Small - https:\u002F\u002Farxiv.org\u002Fabs\u002F1807.11626\n    * MobileNet-V2 - https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.04381\n    * 单路径NAS - https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.02877\n    * TinyNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.14819\n* EfficientViT（MIT） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.14756\n* EfficientViT（MSRA） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.07027\n* EVA - https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.07636\n* EVA-02 - https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.11331\n* FasterNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.03667\n* FastViT - https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.14189\n* FlexiViT - https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.08013\n* FocalNet（焦点调制网络） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.11926\n* GCViT（全局上下文视觉Transformer） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.09959\n* GhostNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.11907\n* GhostNet-V2 - https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.12905\n* GhostNet-V3 - https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.11202\n* gMLP - https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.08050\n* GPU高效网络 - https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.14090\n* Halo Nets - https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.12731\n* HGNet \u002F HGNet-V2 - 待定\n* HRNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.07919\n* InceptionNeXt - https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.16900\n* Inception-V3 - https:\u002F\u002Farxiv.org\u002Fabs\u002F1512.00567\n* Inception-ResNet-V2和Inception-V4 - https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.07261\n* Lambda Networks - https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.08602\n* LeViT（以卷积网络形式出现的视觉Transformer） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.01136\n* MambaOut - https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.07992\n* MaxViT（多轴视觉Transformer） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.01697\n* MetaFormer（PoolFormer-v2、ConvFormer、CAFormer） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.13452\n* MLP-Mixer - https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.01601\n* MobileCLIP - https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.17049\n* MobileNet-V3（带高效头部的MBConvNet） - https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.02244\n  * FBNet-V3 - https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.02049\n  * HardCoRe-NAS - https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.11646\n  * LCNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.15099\n* MobileNetV4 - https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.10518\n* MobileOne - https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.04040\n* MobileViT - https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.02178\n* MobileViT-V2 - https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.02680\n* MViT-V2（改进型多尺度视觉Transformer） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.01526\n* NASNet-A - https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.07012\n* NesT - https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.12723\n* Next-ViT - https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.05501\n* NFNet-F - https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.06171\n* NF-RegNet \u002F NF-ResNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.08692\n* 
PE（感知编码器） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.13181\n* PNasNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.00559\n* PoolFormer（MetaFormer） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2111.11418\n* 基于池化的视觉Transformer（PiT） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.16302\n* PVT-V2（改进型金字塔视觉Transformer） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.13797\n* RDNet（重装版DenseNet） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.19588\n* RegNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F2003.13678\n* RegNetZ - https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.06877\n* RepVGG - https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.03697\n* RepGhostNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.06088\n* RepViT - https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.09283\n* ResMLP - https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.03404\n* ResNet\u002FResNeXt\n    * ResNet（v1b\u002Fv1.5） - https:\u002F\u002Farxiv.org\u002Fabs\u002F1512.03385\n    * ResNeXt - https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.05431\n    * “技巧包”\u002FGluon C、D、E、S变体 - https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.01187\n    * 弱监督（WSL）Instagram预训练\u002FImageNet调优的ResNeXt101 - https:\u002F\u002Farxiv.org\u002Fabs\u002F1805.00932\n    * 半监督（SSL）\u002F半弱监督（SWSL）ResNet\u002FResNeXts - https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.00546\n    * ECA-Net（ECAResNet） - https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.03151v4\n    * 挤压-激励网络（SEResNet） - https:\u002F\u002Farxiv.org\u002Fabs\u002F1709.01507\n    * ResNet-RS - https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.07579\n* Res2Net - https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.01169\n* ResNeSt - https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.08955\n* ReXNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.00992\n* ROPE-ViT - https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.13298\n* SelecSLS - https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.00837\n* 选择性核网络 - https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.06586\n* Sequencer2D - https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.01972\n* SHViT - https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.16456\n* SigLIP（图像编码器） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.15343\n* SigLIP 2（图像编码器） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.14786\n* StarNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.19967\n* SwiftFormer - https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.15446\n* Swin S3（AutoFormerV2） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2111.14725\n* Swin Transformer - https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.14030\n* Swin Transformer V2 - https:\u002F\u002Farxiv.org\u002Fabs\u002F2111.09883\n* TinyViT - https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.10666\n* Transformer嵌套Transformer（TNT） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.00112\n* TResNet - https:\u002F\u002Farxiv.org\u002Fabs\u002F2003.13630\n* Twins（视觉Transformer中的空间注意力） - https:\u002F\u002Farxiv.org\u002Fpdf\u002F2104.13840.pdf\n* VGG - https:\u002F\u002Farxiv.org\u002Fabs\u002F1409.1556\n* Visformer - https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.12533\n* 视觉Transformer - https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.11929\n* ViTamin - https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.02132\n* VOLO（视觉展望者） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.13112\n* VovNet V2和V1 - https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.06667\n* Xception - https:\u002F\u002Farxiv.org\u002Fabs\u002F1610.02357\n* Xception（改良对齐版，Gluon） - https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.02611\n* Xception（改良对齐版，TF） - 
https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.02611\n* XCiT（交叉协方差图像Transformer） - https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.09681\n\n### Optimizers\nTo see full list of optimizers w\u002F descriptions: `timm.optim.list_optimizers(with_description=True)`\n\nIncluded optimizers available via `timm.optim.create_optimizer_v2` factory method:\n* `adabelief` an implementation of AdaBelief adapted from https:\u002F\u002Fgithub.com\u002Fjuntang-zhuang\u002FAdabelief-Optimizer - https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.07468\n* `adafactor` adapted from [FAIRSeq impl](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\u002Fblob\u002Fmaster\u002Ffairseq\u002Foptim\u002Fadafactor.py) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1804.04235\n* `adafactorbv` adapted from [Big Vision](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fbig_vision\u002Fblob\u002Fmain\u002Fbig_vision\u002Foptax.py) - https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.04560\n* `adahessian` by [David Samuel](https:\u002F\u002Fgithub.com\u002Fdavda54\u002Fada-hessian) - https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.00719\n* `adamp` and `sgdp` by [Naver ClovAI](https:\u002F\u002Fgithub.com\u002Fclovaai) - https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.08217\n* `adamuon` and `nadamuon` as per https:\u002F\u002Fgithub.com\u002FChongjie-Si\u002FAdaMuon - https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.11005\n* `adan` an implementation of Adan adapted from https:\u002F\u002Fgithub.com\u002Fsail-sg\u002FAdan - https:\u002F\u002Farxiv.org\u002Fabs\u002F2208.06677\n* `adopt` ADOPT adapted from https:\u002F\u002Fgithub.com\u002FiShohei220\u002Fadopt - https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.02853\n* `kron` PSGD w\u002F Kronecker-factored preconditioner from https:\u002F\u002Fgithub.com\u002Fevanatyourservice\u002Fkron_torch - https:\u002F\u002Fsites.google.com\u002Fsite\u002Flixilinx\u002Fhome\u002Fpsgd\n* `lamb` an implementation of Lamb and LambC (w\u002F trust-clipping) cleaned up and modified to support use with XLA - https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.00962\n* `laprop` optimizer from https:\u002F\u002Fgithub.com\u002FZ-T-WANG\u002FLaProp-Optimizer - https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.04839\n* `lars` an implementation of LARS and LARC (w\u002F trust-clipping) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1708.03888\n* `lion` an implementation of Lion adapted from https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fautoml\u002Ftree\u002Fmaster\u002Flion - https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.06675\n* `lookahead` adapted from impl by [Liam](https:\u002F\u002Fgithub.com\u002Falphadl\u002Flookahead.pytorch) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.08610\n* `madgrad` an implementation of MADGRAD adapted from https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fmadgrad - https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.11075\n* `mars` MARS optimizer from https:\u002F\u002Fgithub.com\u002FAGI-Arena\u002FMARS - https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.10438\n* `muon` MUON optimizer from https:\u002F\u002Fgithub.com\u002FKellerJordan\u002FMuon with numerous additions and improved non-transformer behaviour\n* `nadam` an implementation of Adam w\u002F Nesterov momentum\n* `nadamw` an implementation of AdamW (Adam w\u002F decoupled weight-decay) w\u002F Nesterov momentum. 
A simplified impl based on https:\u002F\u002Fgithub.com\u002Fmlcommons\u002Falgorithmic-efficiency\n* `novograd` by [Masashi Kimura](https:\u002F\u002Fgithub.com\u002Fconvergence-lab\u002Fnovograd) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.11286\n* `radam` by [Liyuan Liu](https:\u002F\u002Fgithub.com\u002FLiyuanLucasLiu\u002FRAdam) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.03265\n* `rmsprop_tf` adapted from PyTorch RMSProp by myself. Reproduces much improved Tensorflow RMSProp behaviour\n* `sgdw` an implementation of SGD w\u002F decoupled weight-decay\n* `fused\u003Cname>` optimizers by name with [NVIDIA Apex](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex\u002Ftree\u002Fmaster\u002Fapex\u002Foptimizers) installed\n* `bnb\u003Cname>` optimizers by name with [BitsAndBytes](https:\u002F\u002Fgithub.com\u002FTimDettmers\u002Fbitsandbytes) installed\n* `cadamw`, `clion`, and more 'Cautious' optimizers from https:\u002F\u002Fgithub.com\u002Fkyleliang919\u002FC-Optim - https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.16085\n* `adam`, `adamw`, `rmsprop`, `adadelta`, `adagrad`, and `sgd` pass through to `torch.optim` implementations\n* `c` suffix (eg `adamc`, `nadamc`) to implement 'corrected weight decay' in https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.02285\n  \n### Augmentations\n* Random Erasing from [Zhun Zhong](https:\u002F\u002Fgithub.com\u002Fzhunzhong07\u002FRandom-Erasing\u002Fblob\u002Fmaster\u002Ftransforms.py) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1708.04896\n* Mixup - https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.09412\n* CutMix - https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.04899\n* AutoAugment (https:\u002F\u002Farxiv.org\u002Fabs\u002F1805.09501) and RandAugment (https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.13719) ImageNet configurations modeled after impl for EfficientNet training (https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftpu\u002Fblob\u002Fmaster\u002Fmodels\u002Fofficial\u002Fefficientnet\u002Fautoaugment.py)\n* AugMix w\u002F JSD loss, JSD w\u002F clean + augmented mixing support works with AutoAugment and RandAugment as well - https:\u002F\u002Farxiv.org\u002Fabs\u002F1912.02781\n* SplitBatchNorm - allows splitting batch norm layers between clean and augmented (auxiliary batch norm) data\n\n### Regularization\n* DropPath aka \"Stochastic Depth\" - https:\u002F\u002Farxiv.org\u002Fabs\u002F1603.09382\n* DropBlock - https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.12890\n* Blur Pooling - https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.11486
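\n\nAs a small illustration of the optimizer factory above, the sketch below creates an optimizer by name (the hyperparameter values are illustrative, not recommendations):\n\n```python\nimport timm\nfrom timm.optim import create_optimizer_v2, list_optimizers\n\n# List available optimizer names with short descriptions\nprint(list_optimizers(with_description=True)[:5])\n\nmodel = timm.create_model('resnet50')\n\n# 'adamw' passes through to torch.optim; names like 'muon' or the 'c' suffix\n# variants (eg 'adamc') select the timm implementations described above\noptimizer = create_optimizer_v2(model, opt='adamw', lr=1e-3, weight_decay=0.05)\n```\n\n### Other\n\nSeveral (less common) features that I often utilize in my projects are included. 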
Many of their additions are the reason why I maintain my own set of models, instead of using others' via PIP:\n\n* All models have a common default configuration interface and API for\n    * accessing\u002Fchanging the classifier - `get_classifier` and `reset_classifier`\n    * doing a forward pass on just the features - `forward_features` (see [documentation](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftimm\u002Ffeature_extraction))\n    * these make it easy to write consistent network wrappers that work with any of the models\n* All models support multi-scale feature map extraction (feature pyramids) via create_model (see [documentation](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftimm\u002Ffeature_extraction))\n    * `create_model(name, features_only=True, out_indices=..., output_stride=...)`\n    * `out_indices` creation arg specifies which feature maps to return, these indices are 0 based and generally correspond to the `C(i + 1)` feature level.\n    * `output_stride` creation arg controls output stride of the network by using dilated convolutions. Most networks are stride 32 by default. Not all networks support this.\n    * feature map channel counts, reduction level (stride) can be queried AFTER model creation via the `.feature_info` member\n* All models have a consistent pretrained weight loader that adapts last linear if necessary, and from 3 to 1 channel input if desired\n* High performance [reference training, validation, and inference scripts](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftimm\u002Ftraining_script) that work in several process\u002FGPU modes:\n    * NVIDIA DDP w\u002F a single GPU per process, multiple processes with APEX present (AMP mixed-precision optional)\n    * PyTorch DistributedDataParallel w\u002F multi-gpu, single process (AMP disabled as it crashes when enabled)\n    * PyTorch w\u002F single GPU single process (AMP optional)\n* A dynamic global pool implementation that allows selecting from average pooling, max pooling, average + max, or concat([average, max]) at model creation. All global pooling is adaptive average by default and compatible with pretrained weights.\n* A 'Test Time Pool' wrapper that can wrap any of the included models and usually provides improved performance doing inference with input images larger than the training size. 
Idea adapted from original DPN implementation when I ported (https:\u002F\u002Fgithub.com\u002Fcypw\u002FDPNs)\n* Learning rate schedulers\n  * Ideas adopted from\n     * [AllenNLP schedulers](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fallennlp\u002Ftree\u002Fmaster\u002Fallennlp\u002Ftraining\u002Flearning_rate_schedulers)\n     * [FAIRseq lr_scheduler](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\u002Ftree\u002Fmaster\u002Ffairseq\u002Foptim\u002Flr_scheduler)\n     * SGDR: Stochastic Gradient Descent with Warm Restarts (https:\u002F\u002Farxiv.org\u002Fabs\u002F1608.03983)\n  * Schedulers include `step`, `cosine` w\u002F restarts, `tanh` w\u002F restarts, `plateau`\n* Space-to-Depth by [mrT23](https:\u002F\u002Fgithub.com\u002FmrT23\u002FTResNet\u002Fblob\u002Fmaster\u002Fsrc\u002Fmodels\u002Ftresnet\u002Flayers\u002Fspace_to_depth.py) (https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.04590)\n* Adaptive Gradient Clipping (https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.06171, https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fdeepmind-research\u002Ftree\u002Fmaster\u002Fnfnets)\n* An extensive selection of channel and\u002For spatial attention modules:\n    * Bottleneck Transformer - https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.11605\n    * CBAM - https:\u002F\u002Farxiv.org\u002Fabs\u002F1807.06521\n    * Effective Squeeze-Excitation (ESE) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.06667\n    * Efficient Channel Attention (ECA) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.03151\n    * Gather-Excite (GE) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.12348\n    * Global Context (GC) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.11492\n    * Halo - https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.12731\n    * Involution - https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.06255\n    * Lambda Layer - https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.08602\n    * Non-Local (NL) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.07971\n    * Squeeze-and-Excitation (SE) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1709.01507\n    * Selective Kernel (SK) - https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.06586\n    * Split (SPLAT) - https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.08955\n    * Shifted Window (SWIN) - https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.14030
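\n\nA minimal sketch of the feature-extraction interface described above (model name and input size are arbitrary examples):\n\n```python\nimport torch\nimport timm\n\n# Multi-scale feature maps (feature pyramid) via features_only\nmodel = timm.create_model('resnet50', pretrained=True, features_only=True, out_indices=(1, 2, 3, 4))\nprint(model.feature_info.channels())   # channel count per returned feature map\nprint(model.feature_info.reduction())  # stride vs. input per returned feature map\nfeats = model(torch.randn(1, 3, 224, 224))  # list of feature map tensors\n\n# Unpooled features plus head outputs, without a second full forward pass\nclf = timm.create_model('resnet50', pretrained=True).eval()\nwith torch.no_grad():\n    unpooled = clf.forward_features(torch.randn(1, 3, 224, 224))\n    logits = clf.forward_head(unpooled)                       # classification logits\n    embeddings = clf.forward_head(unpooled, pre_logits=True)  # pooled pre-logit features\n```\n\n## Results\n\nModel validation results can be found in the [results tables](results\u002FREADME.md)\n\n## Getting Started (Documentation)\n\nThe official documentation can be found at https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Ftimm. Documentation contributions are welcome.\n\n[Getting Started with PyTorch Image Models (timm): A Practitioner’s Guide](https:\u002F\u002Ftowardsdatascience.com\u002Fgetting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055-2\u002F) by [Chris Hughes](https:\u002F\u002Fgithub.com\u002FChris-hughes10) is an extensive blog post covering many aspects of `timm` in detail.\n\n[timmdocs](http:\u002F\u002Ftimm.fast.ai\u002F) is an alternate set of documentation for `timm`. A big thanks to [Aman Arora](https:\u002F\u002Fgithub.com\u002Famaarora) for his efforts creating timmdocs.\n\n[paperswithcode](https:\u002F\u002Fpaperswithcode.com\u002Flib\u002Ftimm) is a good resource for browsing the models within `timm`.\n\n## Train, Validation, Inference Scripts\n\nThe root folder of the repository contains reference train, validation, and inference scripts that work with the included models and other features of this repository. 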
They are adaptable for other datasets and use cases with a little hacking. See [documentation](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftimm\u002Ftraining_script).\n\n## Awesome PyTorch Resources\n\nOne of the greatest assets of PyTorch is the community and their contributions. A few of my favourite resources that pair well with the models and components here are listed below.\n\n### Object Detection, Instance and Semantic Segmentation\n* Detectron2 - https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\n* Segmentation Models (Semantic) - https:\u002F\u002Fgithub.com\u002Fqubvel\u002Fsegmentation_models.pytorch\n* EfficientDet (Obj Det, Semantic soon) - https:\u002F\u002Fgithub.com\u002Frwightman\u002Fefficientdet-pytorch\n\n### Computer Vision \u002F Image Augmentation\n* Albumentations - https:\u002F\u002Fgithub.com\u002Falbumentations-team\u002Falbumentations\n* Kornia - https:\u002F\u002Fgithub.com\u002Fkornia\u002Fkornia\n\n### Knowledge Distillation\n* RepDistiller - https:\u002F\u002Fgithub.com\u002FHobbitLong\u002FRepDistiller\n* torchdistill - https:\u002F\u002Fgithub.com\u002Fyoshitomo-matsubara\u002Ftorchdistill\n\n### Metric Learning\n* PyTorch Metric Learning - https:\u002F\u002Fgithub.com\u002FKevinMusgrave\u002Fpytorch-metric-learning\n\n### Training \u002F Frameworks\n* fastai - https:\u002F\u002Fgithub.com\u002Ffastai\u002Ffastai\n* lightly_train - https:\u002F\u002Fgithub.com\u002Flightly-ai\u002Flightly-train\n\n### Deployment\n* timmx (Export timm models to ONNX, CoreML, LiteRT, TensorRT, and more) - https:\u002F\u002Fgithub.com\u002FBoulaouaney\u002Ftimmx\n\n## Licenses\n\n### Code\nThe code here is licensed Apache 2.0. I've taken care to make sure any third party code included or adapted has compatible (permissive) licenses such as MIT, BSD, etc. I've made an effort to avoid any GPL \u002F LGPL conflicts. That said, it is your responsibility to ensure you comply with licenses here and conditions of any dependent licenses. Where applicable, I've linked the sources\u002Freferences for various components in docstrings. 
If you think I've missed anything please create an issue.\n\n### 预训练权重\n到目前为止，此处提供的所有预训练权重均是在 ImageNet 数据集上进行预训练的，其中仅有少数几个还进行了额外的预训练（详见下方的附加说明）。ImageNet 数据集仅面向非商业研究目的发布（https:\u002F\u002Fimage-net.org\u002Fdownload）。关于使用该数据集的预训练权重可能带来的法律影响尚不明确。我使用 ImageNet 训练的所有模型均出于研究目的，因此应假定原始数据集的许可协议同样适用于这些权重。如果您打算在商业产品中使用这些预训练权重，最好咨询法律意见。\n\n#### 不仅在 ImageNet 上预训练\n此处包含或引用的若干权重是在我无法访问的专有数据集上进行预训练的。其中包括 Facebook 的 WSL、SSL、SWSL ResNe(Xt) 以及 Google 的 Noisy Student EfficientNet 模型。Facebook 的模型具有明确的非商业许可（CC-BY-NC 4.0，https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsemi-supervised-ImageNet1K-models，https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FWSL-Images）。而 Google 的模型似乎除了 Apache 2.0 许可证之外并无其他限制（当然也需考虑 ImageNet 相关问题）。无论哪种情况，如有任何疑问，您都应直接联系 Facebook 或 Google。\n\n## 引用\n\n### BibTeX\n\n```bibtex\n@misc{rw2019timm,\n  author = {Ross Wightman},\n  title = {PyTorch Image Models},\n  year = {2019},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  doi = {10.5281\u002Fzenodo.4414861},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models}}\n}\n```\n\n### 最新 DOI\n\n[![DOI](https:\u002F\u002Fzenodo.org\u002Fbadge\u002F168799526.svg)](https:\u002F\u002Fzenodo.org\u002Fbadge\u002Flatestdoi\u002F168799526)","# PyTorch Image Models (timm) 快速上手指南\n\n`pytorch-image-models` (简称 `timm`) 是一个强大的 PyTorch 图像模型库，集成了数百种预训练的计算机视觉模型（如 ResNet, ViT, ConvNeXt 等），并提供统一的训练、验证和推理脚本。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux, macOS 或 Windows\n*   **Python**: 推荐 Python 3.10+ (当前单元测试覆盖 3.10 - 3.13)\n*   **PyTorch**: 推荐 PyTorch 2.x (最低支持 1.13)\n*   **GPU**: 可选，但强烈推荐使用 NVIDIA GPU 以加速训练和推理\n\n**前置依赖检查：**\n请确保已正确安装 PyTorch 和 torchvision。您可以访问 [PyTorch 官网](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F) 获取适合您环境的安装命令。\n\n## 安装步骤\n\n### 方式一：使用 pip 安装（推荐）\n\n直接通过 PyPI 安装最新稳定版：\n\n```bash\npip install timm\n```\n\n**国内加速方案：**\n如果您在中国大陆，建议使用清华源或阿里源以加快下载速度：\n\n```bash\npip install timm -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n# 或者\npip install timm -i https:\u002F\u002Fmirrors.aliyun.com\u002Fpypi\u002Fsimple\u002F\n```\n\n### 方式二：从源码安装（获取最新特性）\n\n如果您需要使用 GitHub 上的最新代码（包含最新的模型和优化）：\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models.git\n```\n\n**国内加速方案（源码）：**\n使用 Gitee 镜像或代理加速（如果可用），或者直接克隆后安装：\n\n```bash\ngit clone https:\u002F\u002Fgitee.com\u002Fmirrors\u002Fpytorch-image-models.git\ncd pytorch-image-models\npip install -e . -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n*(注：Gitee 镜像地址需确认是否同步最新，若不可用请直接使用官方 git+https 配合网络代理)*\n\n## 基本使用\n\n### 1. 加载预训练模型\n\n这是最常用的功能。`timm` 允许您通过一行代码加载数千种架构及其预训练权重。\n\n```python\nimport torch\nimport timm\n\n# 创建一个模型实例\n# model_name: 模型名称，例如 'resnet50', 'vit_base_patch16_224', 'convnext_base'\n# pretrained=True: 自动下载并加载 ImageNet 预训练权重\n# features_only=False: 设置为 True 可仅提取中间特征层\nmodel = timm.create_model('resnet50', pretrained=True)\n\n# 将模型设置为评估模式\nmodel.eval()\n\n# 准备输入数据 (Batch Size=1, Channels=3, Height=224, Width=224)\ninput_tensor = torch.randn(1, 3, 224, 224)\n\n# 前向传播\nwith torch.no_grad():\n    output = model(input_tensor)\n\nprint(output.shape)  # 输出形状通常为 [batch_size, num_classes]\n```\n\n### 2. 查看可用模型\n\n您可以列出所有支持的模型或搜索特定架构：\n\n```python\nimport timm\n\n# 列出所有包含 'resnet' 的模型名称\nmodels = timm.list_models('*resnet*')\nprint(models[:5])  # 打印前 5 个结果\n\n# 获取特定模型的预训练配置\nconfig = timm.get_pretrained_cfg('resnet50')\nprint(config)\n```
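\n\n在实际推理前，建议使用模型自带的数据配置生成与其预训练一致的预处理流程（补充示意，图片路径为假设）：\n\n```python\nimport timm\nfrom PIL import Image\nfrom timm.data import resolve_model_data_config, create_transform\n\nmodel = timm.create_model('resnet50', pretrained=True).eval()\n\n# 根据模型的预训练配置生成推理用 transform（输入尺寸、均值、方差等自动匹配）\ndata_config = resolve_model_data_config(model)\ntransform = create_transform(**data_config, is_training=False)\n\nimg = Image.open('example.jpg').convert('RGB')  # 图片路径为假设\ninput_tensor = transform(img).unsqueeze(0)  # 形状 [1, 3, H, W]\n```\n\n### 3. 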
使用训练\u002F验证脚本\n\n`timm` 提供了功能完整的命令行脚本，用于训练、验证和批量推理。`train.py` 和 `validate.py` 位于仓库根目录，不会随 pip 包一同分发，建议克隆仓库后在根目录下运行（具体细节请参考官方文档）。\n\n一个简单的验证示例（假设已准备好 ImageNet 格式的数据集）：\n\n```bash\n# 使用预训练的 ResNet50 在验证集上进行评估\npython validate.py \u002Fpath\u002Fto\u002Fimagenet\u002Fval --model resnet50 --amp -j 8\n```\n\n*   `--amp`: 启用自动混合精度加速推理。\n*   `-j 8`: 使用 8 个数据加载线程。\n\n### 4. 提取特征（作为骨干网络）\n\n如果您需要将模型作为其他任务（如检测、分割）的骨干网络：\n\n```python\nimport torch\nimport timm\n\n# 创建模型并只返回特征层\nmodel = timm.create_model('resnet50', pretrained=True, features_only=True)\n\n# 此时 model 是一个特征提取包装器（FeatureListNet），forward 返回一个列表，包含各层特征\nfeatures = model(torch.randn(1, 3, 224, 224))\n\n# features 是一个 list，包含不同分辨率的特征图\nfor i, f in enumerate(features):\n    print(f\"Layer {i} shape: {f.shape}\")\n```","某计算机视觉团队正在为电商平台的商品自动分类系统开发高精度图像识别模型，急需在有限算力下快速验证多种前沿架构。\n\n### 没有 pytorch-image-models 时\n- **模型复现成本极高**：工程师需手动从论文代码库移植 ResNeXT、EfficientNet 或 Vision Transformer 等模型，常因细节差异导致无法收敛或精度不达标。\n- **训练流程重复造轮子**：缺乏统一的训练、验证和推理脚本，每次尝试新架构（如 Swin Transformer 或 ConvNeXt）都需重写数据加载与优化器配置逻辑。\n- **预训练权重难对齐**：不同来源的预训练权重格式混乱，加载时需编写复杂的适配代码，且难以确保预处理步骤与官方训练一致。\n- **实验迭代缓慢**：由于环境配置和代码调试耗时过长，一周内仅能完成 1-2 种模型架构的基准测试，严重拖慢项目进度。\n\n### 使用 pytorch-image-models 后\n- **一键调用百种架构**：通过 `timm.create_model()` 即可实例化包括 MobileNetV4、MaxViT 在内的数百种编码器，直接复用经过严格验证的官方实现。\n- **全流程脚本开箱即用**：直接利用库内置的 train\u002Feval\u002Finference 脚本，配合新增的 NAdaMuon 优化器支持，迅速搭建起标准化的训练流水线。\n- **权重加载安全便捷**：借助更新的 `weights_only=True` 安全机制和统一的 Hugging Face 权重源，无缝加载 CSATv2 等最新高分模型的预训练参数。\n- **研发效率显著提升**：团队能在两天内完成十几种主流及前沿模型的对比实验，快速锁定最适合商品细粒度分类的架构方案。\n\npytorch-image-models 通过将碎片化的模型实现标准化，让研发团队从繁琐的代码工程中解放出来，专注于核心算法的优化与业务落地。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_pytorch-image-models_76de1774.png","huggingface","Hugging Face","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fhuggingface_90da21a4.png","The AI community building the future.",null,"https:\u002F\u002Fhuggingface.co\u002F","https:\u002F\u002Fgithub.com\u002Fhuggingface",[84,88,92],{"name":85,"color":86,"percentage":87},"Python","#3572A5",87.4,{"name":89,"color":90,"percentage":91},"MDX","#fcb32c",12.6,{"name":93,"color":94,"percentage":95},"Shell","#89e051",0,36596,5141,"2026-04-05T02:19:44","Apache-2.0","Linux, macOS, Windows","非必需（支持 CPU 运行），但训练和推理建议使用 NVIDIA GPU。README 提及测试环境包含 RTX 4090, 5090, Pro 6000，支持 Flash Attention (F.SDPA) 优化。具体显存和 CUDA 版本未说明，取决于所选模型大小及 PyTorch 版本。","未说明（取决于模型规模，大型 ViT 模型建议 32GB+）",{"notes":104,"python":105,"dependencies":106},"该库主要提供图像模型架构、预训练权重及训练\u002F推理脚本。支持从 PyTorch 1.13 到 2.9.1 的广泛版本。已移除对 APEX AMP 的支持。支持 'meta' 设备初始化以节省内存。部分新功能（如 NaFlexViT、ROPE 支持）需特定参数启用。建议使用与所安装 PyTorch 版本匹配的 CUDA 环境。","3.10 - 3.13 (单元测试覆盖范围：下限 PyTorch 1.13 + Python 3.10，上限 PyTorch 2.9.1 + Python 3.13)",[107,108,109,110],"torch>=1.13","torchvision","pyyaml","huggingface_hub",[13,26,14],[113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132],"pytorch","resnet","pretrained-models","pretrained-weights","distributed-training","mobile-deep-learning","mobilenet-v2","mobilenetv3","efficientnet","augmix","randaugment","mixnet","vision-transformer-models","nfnets","normalization-free-training","maxvit","convnext","image-classification","imagenet","optimizer","2026-03-27T02:49:30.150509","2026-04-06T11:31:13.135112",[136,141,146,151,156,161],{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},16417,"如何高效地同时获取模型的特征嵌入（embeddings）和预测结果（predictions），避免重复计算？","现在可以通过分步调用来实现，无需两次完整的前向传播。首先使用 `forward_features` 获取特征，然后使用 `forward_head` 将特征转换为预测或嵌入：\n1. 
获取未池化的特征：`unpooled_features = model.forward_features(inputs)`\n2. 获取预测 logits：`logits = model.forward_head(unpooled_features, pre_logits=False)`\n3. 获取嵌入向量：`embeddings = model.forward_head(unpooled_features, pre_logits=True)`\n这种方法避免了冗余计算，并允许从特征存储中直接生成预测。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fissues\u002F1141",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},16414,"如何复现 EfficientNet-B0 到 B7 的官方训练结果？有哪些推荐的超参数设置？","可以使用以下命令复现 EfficientNet-B0 的结果（基于 jiefengpeng 的设置）：\n`.\u002Fdistributed_train.sh 8 ..\u002FImageNet\u002F --model efficientnet_b0 -b 256 --sched step --epochs 500 --decay-epochs 3 --decay-rate 0.963 --opt rmsproptf --opt-eps .001 -j 8 --warmup-epochs 5 --weight-decay 1e-5 --drop 0.2 --color-jitter .06 --model-ema --lr .128`\n对于更大的模型（B3+），维护者更新了权重初始化方法以匹配 TensorFlow TPU 实现（特别是深度卷积部分），这有助于提升性能。如果结果仍有差距，请尝试使用更新后的代码从头训练。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fissues\u002F45",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},16415,"在哪里可以找到 DINOv3 模型的预训练权重？","DINOv3 (ViT 和 ConvNeXt) 的权重已合并到主分支并可用。您可以在 Hugging Face Hub 上的以下集合中找到它们：\nhttps:\u002F\u002Fhuggingface.co\u002Fcollections\u002Ftimm\u002Ftimm-dinov3-68cb08bb0bee365973d52a4d\n维护者计划在下周初发布新版本以包含这些权重。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fissues\u002F2567",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},16416,"加载模型时遇到 \"CUDA out of memory\" 错误，特别是在恢复训练（resume）时，如何解决？","该问题通常与加载检查点时的内存峰值有关。代码中 `helper.py` 已经通过 `map_location='cpu'` 在 CPU 上加载模型来解决此问题：\n`checkpoint = torch.load(checkpoint_path, map_location='cpu')`\n如果您仍遇到此问题，请确保使用的是最新版本的代码，因为该问题已在多 GPU 恢复训练的场景中被修复。如果问题依旧，建议新建 Issue 并提供复现步骤。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fissues\u002F72",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},16418,"训练 ViT 模型时，如果使用 Adam 而不是 AdamW 优化器，需要注意什么？","如果您使用 Adam 代替 AdamW，必须调整权重衰减（weight_decay）和学习率。两者的主要区别在于权重衰减应用于梯度计算的方式不同。AdamW 将权重衰减与梯度更新解耦，而 Adam 将其耦合。因此，直接切换优化器而不调整超参数会导致性能下降。建议参考官方配置或使用默认的 AdamW 设置以获得最佳结果。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fissues\u002F252",{"id":162,"question_zh":163,"answer_zh":164,"source_url":150},16419,"为什么加载原始 DINOv3 权重时，ROPE 周期缓冲区的数据类型与 timm 实现不一致？这会影响精度吗？","原始权重中，除了 ROPE 周期缓冲区被强制转换为 bfloat16 外，其他参数均为 float32。timm 的实现不保留持久性缓冲区，而是在初始化时以 float32 生成它们。这导致小模型输出有约 3-4e-4 的差异，大模型差异降至 1e-7 级别。测试表明，保持 float32 进行微调效果略好，因此 timm 默认保持 float32 不变。这种差异通常可以忽略不计。",[166,171,176,181,186,191,196,201,206,211,216,221,226,231,236,241,246,251,256,261],{"id":167,"version":168,"summary_zh":169,"released_at":170},98752,"v1.0.26","## 2026年3月23日\n* 提升 pickle 检查点处理的安全性。将所有加载默认设置为 `weights_only=True`，并为 ArgParse 添加安全全局选项。\n* 改进核心 ViT\u002FEVA 模型及层的注意力掩码处理。解析布尔掩码，并在 SSL 任务中传递 `is_causal` 参数。\n* 修复在启用 ViT 且未使用位置嵌入时的类标记和注册标记用法问题。\n* 在 ViT 中新增补丁表示精炼（PRR）作为池化选项。感谢 Sina（https:\u002F\u002Fgithub.com\u002Fsinahmr）。\n* 提高注意力池化层输出投影与 MLP 维度的一致性。\n* 对 Hiera 模型进行 F.SDPA 优化，以支持 Flash Attention 内核的使用。\n* 向 SGDP 优化器添加了谨慎参数。\n* 发布 1.0.26 版本。这是我离开 Hugging Face 以来的首个维护版本。\n\n## 变更内容\n* 修复：@haosenwang1018 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2672 中将 5 处裸 except 子句替换为 except Exception。\n* @Boulaouaney 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2673 中将 timmx 模型导出工具添加到 README。\n* @Yuan-Jinghui 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2675 中为 SGDP 优化器增加了谨慎参数。\n* @sinahmr 
在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2676 中修复了在禁用 pos_embed 时 CLS 和 Reg 标记的使用问题。\n* @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2679 中将加载函数的默认参数设置为 `weights_only=True`。\n* @Raiden129 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2680 中修复了 Hiera 全局注意力机制，使其使用 4D 张量以高效调度 SDPA。\n* @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2684 中改进了 2D 和潜在空间注意力池化的维度处理，并修复了 #2682 问题。\n* @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2686 中改进了 vision_transformer、eva 及相关模块的注意力掩码处理。\n* @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2685 中实现了 PRR 作为池化模块，可替代 #2678 的方案。\n\n## 新贡献者\n* @haosenwang1018 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2672 中完成了首次贡献。\n* @Raiden129 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2680 中完成了首次贡献。\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fcompare\u002Fv1.0.25...v1.0.26","2026-03-23T18:13:40",{"id":172,"version":173,"summary_zh":174,"released_at":175},98753,"v1.0.25","## 2026年2月23日\n* 在蒸馏任务包装器中添加令牌蒸馏训练支持\n* 为准备官方弃用而移除部分 torch.jit 的使用\n* 为 AdamP 优化器添加了谨慎（cautious）选项\n* 即使是元设备初始化，也调用 reset_parameters()，以便使用 init_empty_weights 等技巧正确初始化缓冲区\n* 调整 Muon 优化器以兼容 DTensor\u002FFSDP2（将 clamp_ 替换为 clamp_(min=)，为 DTensor 使用备用 NS 分支）\n* 发布 1.0.25 版本\n\n## 2026年1月21日\n* **兼容性破坏**：修复 `ParallelScalingBlock`（及 `DiffParallelScalingBlock`）中 QKV 与 MLP 偏置的疏忽问题\n  * 不会影响任何已训练的 `timm` 模型，但可能影响下游应用。\n\n## 变更内容\n* @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2647 中完成了令牌蒸馏任务及蒸馏任务的重构\n* @hassonofer 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2649 中修复了 PiT forward_head 中使用错误令牌导致的蒸馏头部 dropout 问题\n* 修复 #2653，由于没有模型权重受到影响，因此仅为一次干净的修复，由 @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2654 中完成\n* @Yuan-Jinghui 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2657 中为 AdamP 优化器添加了谨慎（cautious）变体\n* @Yuan-Jinghui 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2658 中增强了谨慎优化器的数值稳定性\n* @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2664 中针对 torch.jit 弃用和元设备初始化进行了一些杂项修复\n* fix(optim): 将 Lion 优化器中的裸 except 替换为 Exception，由 @llukito 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2666 中完成\n* 将 clamp_min_ 改为 clamp_(min=)，因为前者在 DTensor \u002F FSDP2 下无法正常工作，由 @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2668 中完成\n* @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2669 中为 Muon 添加了兼容 DTensor 的 NS 实现\n\n## 新贡献者\n* @Yuan-Jinghui 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2657 中做出了首次贡献\n* @llukito 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2666 中做出了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fcompare\u002Fv1.0.24...v1.0.25","2026-02-23T17:22:16",{"id":177,"version":178,"summary_zh":179,"released_at":180},98754,"v1.0.24","## 2026年1月5日和6日\n* 补丁发布 
1.0.24（修复 1.0.23 中的问题）\n* 添加新的基准测试结果 CSV 文件，用于在配备 RTX Pro 6000、5090 和 4090 显卡及 PyTorch 2.9.1 的所有模型上的推理时序测试\n* 修复已弃用的 timm.models.layers 导入路径中模块移动导致的错误，该错误影响旧版导入\n* 发布 1.0.23\n\n## 2025年12月30日\n* 添加经过 NAdaMuon 训练的更优 `dpwee`、`dwee`、`dlittle`（微分）ViT 模型，在先前运行的基础上有小幅提升\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_dlittle_patch16_reg1_gap_256.sbb_nadamuon_in1k（top-1 准确率 83.24%）\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_dwee_patch16_reg1_gap_256.sbb_nadamuon_in1k（top-1 准确率 81.80%）\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_dpwee_patch16_reg1_gap_256.sbb_nadamuon_in1k（top-1 准确率 81.67%）\n* 在 512x512 和 640x640 分辨率下添加一个约 2100 万参数的 CSATv2 模型 `timm` 变体\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fcsatv2_21m.sw_r640_in1k（top-1 准确率 83.13%）\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fcsatv2_21m.sw_r512_in1k（top-1 准确率 82.58%）\n* 将非持久化参数初始化从 `__init__` 方法中提取出来，放入一个公共方法中，可在元设备初始化后通过 `init_non_persistent_buffers()` 外部调用。\n\n## 2025年12月12日\n* 添加 CSATV2 模型（感谢 https:\u002F\u002Fgithub.com\u002Fgusdlf93）——一种轻量级但高分辨率的模型，采用 DCT 主干和空间注意力机制。https:\u002F\u002Fhuggingface.co\u002FHyunil\u002FCSATv2\n* 在现有的 `timm` Muon 实现中加入 AdaMuon 和 NAdaMuon 优化器支持。在图像任务中，使用熟悉的超参数时，其表现似乎比 AdamW 更具竞争力。\n* 年终 PR 清理，合并多个长期未合入的 PR：\n  * 合并微分注意力 (`DiffAttention`)，添加相应的 `DiffParallelScalingBlock`（用于 ViT），并训练一些小型 ViT 模型\n    * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_dwee_patch16_reg1_gap_256.sbb_in1k\n    * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_dpwee_patch16_reg1_gap_256.sbb_in1k\n  * 添加几个池化模块，包括 `LsePlus` 和 `SimPool`\n  * 清理并优化 `DropBlock2d`（同时为基于 ByobNet 的模型添加支持）\n* 将单元测试版本升级至 PyTorch 2.9.1 和 Python 3.13 的较高版本，较低版本仍为 PyTorch 1.13 和 Python 3.10。\n\n## 2025年12月1日\n* 添加轻量级任务抽象，并通过新任务向训练脚本中添加 logits 和特征蒸馏支持。\n* 移除旧的 APEX AMP 支持。\n\n## 变更内容\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2606 中，由 @t0278611 添加了验证间隔参数。\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2617 中，由 @rwightman 添加了坐标注意力及其变体。\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2598 中，由 @rwightman 进行了蒸馏相关的修复。\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2620 中，由 @rwightman 对 DropBlock2d 进行了简化和修复。\n* 其他池化操作……由 @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2621 中完成。\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2314 中，由 @rwightman 尝试微分注意力。\n* 微分 + 平行注意力由 @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fp","2026-01-07T00:28:47",{"id":182,"version":183,"summary_zh":184,"released_at":185},98755,"v1.0.23","## 2025年12月30日\n* 添加经过更好 NAdaMuon 训练的 `dpwee`、`dwee`、`dlittle`（微分）ViT 模型，在先前运行的基础上有小幅提升。\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_dlittle_patch16_reg1_gap_256.sbb_nadamuon_in1k（top-1 准确率 83.24%）\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_dwee_patch16_reg1_gap_256.sbb_nadamuon_in1k（top-1 准确率 81.80%）\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_dpwee_patch16_reg1_gap_256.sbb_nadamuon_in1k（top-1 准确率 81.67%）\n* 添加一个参数量约为 2100 万的 CSATv2 模型 `timm` 变体，分别在 512×512 和 640×640 分辨率下：\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fcsatv2_21m.sw_r640_in1k（top-1 准确率 83.13%）\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fcsatv2_21m.sw_r512_in1k（top-1 准确率 82.58%）\n* 将非持久化参数初始化从 `__init__` 方法中提取出来，放入一个公共方法中，以便在元设备初始化后通过 `init_non_persistent_buffers()` 外部调用。\n\n## 
2025年12月12日\n* 添加 CSATV2 模型（感谢 https:\u002F\u002Fgithub.com\u002Fgusdlf93）——一种轻量级但高分辨率的模型，采用 DCT 骨干和空间注意力机制。https:\u002F\u002Fhuggingface.co\u002FHyunil\u002FCSATv2\n* 在现有的 `timm` Muon 实现中添加 AdaMuon 和 NAdaMuon 优化器支持。对于图像任务，在熟悉的超参数设置下，其表现似乎比 AdamW 更具竞争力。\n* 年终 PR 清理，合并多个长期未合入的 PR：\n  * 合并微分注意力 (`DiffAttention`)，添加相应的 `DiffParallelScalingBlock`（用于 ViT），并训练一些小型 ViT 模型。\n    * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_dwee_patch16_reg1_gap_256.sbb_in1k\n    * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_dpwee_patch16_reg1_gap_256.sbb_in1k\n  * 添加几个池化模块，包括 `LsePlus` 和 `SimPool`。\n  * 清理并优化 `DropBlock2d`（同时为 ByobNet 基础模型添加支持）。\n* 将单元测试版本升级至 PyTorch 2.9.1 和 Python 3.13 的较高版本；较低版本仍保持 PyTorch 1.13 和 Python 3.10。\n\n## 2025年12月1日\n* 添加轻量级任务抽象，并通过新任务向训练脚本中加入 logits 和特征蒸馏支持。\n* 移除旧的 APEX AMP 支持。\n\n## 变更内容\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2606 中，由 @t0278611 添加了验证间隔参数。\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2617 中，由 @rwightman 添加了坐标注意力及其变体。\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2598 中，由 @rwightman 进行了蒸馏相关的修复。\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2620 中，由 @rwightman 对 DropBlock2d 进行了简化和若干修复。\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2621 中，由 @rwightman 添加了其他池化操作。\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2314 中，由 @rwightman 尝试使用微分注意力。\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2625 中，由 @rwightman 引入了微分注意力与并行注意力的结合。\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2626 中，由 @rwightman 实现了 AdaMuon，并结合近期阅读的一些想法。\n* 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2627 中，由 @rwightman 贡献了 CSATv2 模型。\n* 在 hfdocs 中添加超参数章节。","2026-01-05T21:42:22",{"id":187,"version":188,"summary_zh":189,"released_at":190},98756,"v1.0.22","1.0.21 版本中优先级 LayerScale 初始化回归问题的补丁发布\n\n## 变更内容\n* @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2602 中为 efficientnet_x 和 efficientnet_h 模型添加了一些权重。\n* @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2603 中更新了结果 CSV 文件。\n* @Ilya-Fradlin 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2605 中修复了 LayerScale 忽略 init_values 的问题。\n\n## 新贡献者\n* @Ilya-Fradlin 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2605 中完成了首次贡献。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fcompare\u002Fv1.0.21...v1.0.22","2025-11-05T04:08:03",{"id":192,"version":193,"summary_zh":194,"released_at":195},98757,"v1.0.21","## 2025年10月16日至20日\n* 添加基于 https:\u002F\u002Fgithub.com\u002FKellerJordan\u002FMuon 的 Muon 优化器实现，并进行定制化改进：\n  * 提供额外的灵活性，改进对卷积权重的处理，以及针对不适合正交化的权重形状的回退机制。\n  * 通过减少内存分配并使用融合的 (b)add(b)mm 操作，小幅提升 NS 迭代的速度。\n  * 默认情况下，若参数形状不适合 Muon（或通过参数组标志排除），则使用 AdamW 更新（若 `nesterov=True` 则使用 NAdamW）。\n  * 类似于 PyTorch 实现，可通过 `adjust_lr_fn` 选择多种学习率缩放调整函数。\n  * 可从多个 NS 系数预设中选择，或通过 `ns_coefficients` 自定义系数。\n* 支持“meta”设备模型初始化的前两步：\n  * 修复了在“meta”设备上下文中会导致创建失败的若干操作。\n  * 在 `timm` 中的所有模型和模块（任何继承自 `nn.Module` 的类）中添加了 device 和 dtype 工厂关键字参数支持。\n* 在代码中的预训练配置文件中添加了许可证字段。\n* 发布版本 1.0.21\n\n## 变更内容\n* @rwightman 在 
https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2589 中添加了 `calculate_drop_path_rates` 辅助函数。\n* @Wauplin 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2592 中审查了 `huggingface_hub` 集成。\n* @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2591 中为模块和模型添加了 device\u002Fdtype 的 factory_kwargs。\n* @alexanderdann 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2585 中实现了 timm 全局范围内一致的许可证处理。\n* @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2596 中添加了 Muon 优化器的实现，并修复了 #2580。\n* @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2599 中将 Muon 的“simple”标志重命名为“fallback”。\n\n## 新贡献者\n* @alexanderdann 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2585 中完成了首次贡献。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fcompare\u002Fv1.0.20...v1.0.21","2025-10-24T22:39:31",{"id":197,"version":198,"summary_zh":199,"released_at":200},98758,"v1.0.20","## 2025年9月21日\n* 将DINOv3 ViT权重标签从`lvd_1689m`重映射为`lvd1689m`，以保持一致（`sat_493m` -> `sat493m`亦同）\n* 发布1.0.20版本\n\n## 2025年9月17日\n* 新增DINOv3（https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10104）ConvNeXt和ViT模型。ConvNeXt模型已映射至现有的`timm`模型。ViT支持则通过EVA基础模型实现，并引入新的`RotaryEmbeddingDinoV3`层，以匹配DINOv3特有的RoPE实现。\n  * HuggingFace Hub链接：https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Ftimm\u002Ftimm-dinov3-68cb08bb0bee365973d52a4d\n* MobileCLIP-2（https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.20691）视觉编码器。新增MCI3\u002FMCI4 FastViT变体，并将权重映射至现有的FastViT以及B、L\u002F14 ViT模型。\n* MetaCLIP-2 Worldwide（https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.22062）ViT编码器权重已添加。\n* SigLIP-2（https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.14786）NaFlex ViT编码器权重已通过timm的NaFlexViT模型添加。\n* 其他修复与贡献\n\n## 变更内容\n* @hassonofer在https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2559中实现了在hieradet_sam2中传递init_values的功能。\n* @rwightman在https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2560中添加了mobileclip2编码器权重。\n* @rwightman在https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2561中增加了对Gemma 3n MobileNetV5编码器权重加载的支持。\n* 修复#2562问题，@rwightman在https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2564中添加了siglip2 naflex vit编码器权重。\n* 修复：在保存结果前，若results_dir不存在则创建该目录，由@zhima771完成，见https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2576。\n* 功能性改进（validate）：添加精度、召回率和F1分数指标，由@ha405完成，见https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2568。\n* 允许用户在ImageDataset中请求除图像和标签之外的其他特征，由@grodino完成，见https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2571。\n* @rwightman在https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2578中添加了MobileCLIP2图像编码器。\n* @rwightman在https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2579中增加了DINOv3支持。\n\n## 新贡献者\n* @hassonofer在https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2559中完成了首次贡献。\n* @zhima771在https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2576中完成了首次贡献。\n* 
@ha405在https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2568中完成了首次贡献。\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fcompare\u002Fv1.0.19...v1.0.20","2025-09-21T17:28:50",{"id":202,"question_zh":203,"answer_zh":204,"source_url":205},98759,"v1.0.19","修复 1.0.18 中 Python 3.9 兼容性中断问题的补丁版本\n\n## 2025年7月23日\n* 为 EVA 模型添加 `set_input_size()` 方法，OpenCLIP 3.0.0 使用该方法允许对基于 timm 的编码器模型进行尺寸调整。\n* 发布 1.0.18 版本，这是 OpenCLIP 3.0.0 中 PE-Core S & T 模型所必需的。\n* 修复了一个导致 Python 3.9 兼容性中断的小型类型问题。为此发布了 1.0.19 补丁版本。\n\n## 2025年7月21日\n* NaFlexViT 增加了 ROPE 支持。所有由 EVA 基类 (`eva.py`) 覆盖的模型，包括 EVA、EVA02、Meta PE ViT、带有 ROPE 的 `timm` SBB ViT 以及 Naver ROPE-ViT，现在都可以在创建模型时通过传递 `use_naflex=True` 参数加载到 NaFlexViT 中。\n* 新增了更多 Meta PE ViT 编码器，包括小型和微型变体、支持分块处理的语言变体，以及更多的空间变体。\n* 修复了 NaFlexViT 和 EVA 模型中的 PatchDropout 问题（在添加 Naver ROPE-ViT 后出现的回归）。\n* 修复了 `grid_indexing='xy'` 下的 XY 顺序问题，该问题影响了在 `xy` 模式下使用非正方形图像的情况（仅 ROPE-ViT 和 PE 受影响）。\n\n## 变更内容\n* 为 NaFlexVit（轴向和混合）添加 ROPE 支持，并支持大多数（甚至全部？）基于 EVA 的 ViT 模型及权重，相关工作由 @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2552 中完成。\n* 由 @rwightman 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2554 中实现了对 EVA 模型中 `set_input_size()` 方法的支持。\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fcompare\u002Fv1.0.17...v1.0.18","2025-07-24T03:06:41",{"id":207,"version":208,"summary_zh":209,"released_at":210},98760,"v1.0.18","## 2025年7月23日\n* 为EVA模型添加`set_input_size()`方法，供OpenCLIP 3.0.0使用，以支持基于timm的编码器模型进行尺寸调整。\n* 发布1.0.18版本，这是OpenCLIP 3.0.0中PE-Core S & T模型所必需的。\n\n## 2025年7月21日\n* NaFlexViT新增ROPE支持。所有由EVA基础模块（`eva.py`）覆盖的模型，包括EVA、EVA02、Meta PE ViT、带有ROPE的`timm` SBB ViT以及Naver ROPE-ViT，在创建模型时传入`use_naflex=True`即可在NaFlexViT中加载。\n* 新增更多Meta PE ViT编码器，包括小型和极小型变体、带分块处理的语言变体，以及更多空间变体。\n* 修复了NaFlexViT及EVA模型中的PatchDropout问题（该问题是在添加Naver ROPE-ViT后出现的回归）。\n* 修正了`grid_indexing='xy'`下的XY顺序问题，该问题影响了在`xy`模式下使用非正方形图像的情况（仅ROPE-ViT和PE受影响）。\n\n## 变更内容\n* 在NaFlexVit（轴向和混合型）中添加ROPE支持，并通过@rwightman在https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2552中支持大多数（甚至全部？）基于EVA的ViT模型及权重。\n* 通过@rwightman在https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2554中为EVA模型支持`set_input_size()`方法。\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fcompare\u002Fv1.0.17...v1.0.18","2025-07-23T20:03:48",{"id":212,"version":213,"summary_zh":214,"released_at":215},98761,"v1.0.17","## 2025年7月7日\n* 对MobileNet-v5主干网络进行微调，以改善Google Gemma 3n的表现（与更新的官方权重配套使用）\n  * 添加stem偏置（在更新的权重中已置零，与旧权重不兼容）\n  * 将GELU替换为GELU（tanh近似）。这一小幅改动旨在更接近JAX实现\n* 为层衰减支持添加两个参数：最小缩放系数限制和“无优化”缩放阈值\n* 添加‘Fp32’版本的LayerNorm、RMSNorm和SimpleNorm变体，可启用以强制在float32精度下进行归一化计算\n* 针对归一化层及归一化+激活层进行了类型注解和参数清理，与上述更改同步完成\n* 在`eva.py`中支持Naver ROPE-ViT（https:\u002F\u002Fgithub.com\u002Fnaver-ai\u002Frope-vit），并添加RotaryEmbeddingMixed模块以支持混合模式，相关权重已上传至HuggingFace Hub\n\n|模型                                             |图像尺寸|top1  |top5  |参数量|\n|--------------------------------------------------|--------|------|------|-----------|\n|vit_large_patch16_rope_mixed_ape_224.naver_in1k  |224     |84.84 |97.122|304.4      |\n|vit_large_patch16_rope_mixed_224.naver_in1k      |224     |84.828|97.116|304.2      |\n|vit_large_patch16_rope_ape_224.naver_in1k        |224     |84.65 |97.154|304.37     |\n|vit_large_patch16_rope_224.naver_in1k            |224     |84.648|97.122|304.17     
","2025-07-24T03:06:41",{"id":207,"version":208,"summary_zh":209,"released_at":210},98760,"v1.0.18","## July 23, 2025\n* Add `set_input_size()` method to EVA models, used by OpenCLIP 3.0.0 to allow resizing of timm-based encoder models\n* Release 1.0.18, needed for the PE-Core S & T models in OpenCLIP 3.0.0\n\n## July 21, 2025\n* ROPE support added to NaFlexViT. All models covered by the EVA base (`eva.py`), including EVA, EVA02, Meta PE ViT, `timm` SBB ViTs w\u002F ROPE, and Naver ROPE-ViT, can be loaded into NaFlexViT by passing `use_naflex=True` at model creation.\n* More Meta PE ViT encoders added, including the small and tiny variants, the language variants w\u002F tiling support, and more spatial variants.\n* Fix PatchDropout in NaFlexViT and EVA models (a regression introduced when Naver ROPE-ViT was added).\n* Fix XY order w\u002F `grid_indexing='xy'`, which impacted non-square image use in `xy` mode (ROPE-ViT and PE only).\n\n## What's Changed\n* Add ROPE support to NaFlexVit (axial and mixed) w\u002F support for most (all?) EVA-based ViT models and weights by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2552\n* Support for `set_input_size()` in EVA models by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2554\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fcompare\u002Fv1.0.17...v1.0.18","2025-07-23T20:03:48",{"id":212,"version":213,"summary_zh":214,"released_at":215},98761,"v1.0.17","## July 7, 2025\n* MobileNet-v5 backbone tweaks to improve Google Gemma 3n behaviour (to pair w\u002F the updated official weights)\n  * Add stem bias (zeroed in the updated weights, not compatible w\u002F the older weights)\n  * Replace GELU w\u002F GELU (tanh approximation), a small change to more closely match the JAX impl\n* Add two args to layer-decay support: a min scale clamp and a 'no optimization' scale threshold\n* Add 'Fp32' variants of the LayerNorm, RMSNorm and SimpleNorm layers that can be enabled to force norm computation in float32\n* Type annotation and arg cleanup for norm and norm+act layers, done alongside the above changes\n* Support Naver ROPE-ViT (https:\u002F\u002Fgithub.com\u002Fnaver-ai\u002Frope-vit) in `eva.py`, adding a RotaryEmbeddingMixed module for the mixed mode; weights are up on the HuggingFace Hub\n\n|model                                             |img_size|top1  |top5  |param_count|\n|--------------------------------------------------|--------|------|------|-----------|\n|vit_large_patch16_rope_mixed_ape_224.naver_in1k  |224     |84.84 |97.122|304.4      |\n|vit_large_patch16_rope_mixed_224.naver_in1k      |224     |84.828|97.116|304.2      |\n|vit_large_patch16_rope_ape_224.naver_in1k        |224     |84.65 |97.154|304.37     |\n|vit_large_patch16_rope_224.naver_in1k            |224     |84.648|97.122|304.17     |\n|vit_base_patch16_rope_mixed_ape_224.naver_in1k   |224     |83.894|96.754|86.59      |\n|vit_base_patch16_rope_mixed_224.naver_in1k       |224     |83.804|96.712|86.44      |\n|vit_base_patch16_rope_ape_224.naver_in1k         |224     |83.782|96.61 |86.59      |\n|vit_base_patch16_rope_224.naver_in1k             |224     |83.718|96.672|86.43      |\n|vit_small_patch16_rope_224.naver_in1k            |224     |81.23 |95.022|21.98      |\n|vit_small_patch16_rope_mixed_224.naver_in1k      |224     |81.216|95.022|21.99      |\n|vit_small_patch16_rope_ape_224.naver_in1k        |224     |81.004|95.016|22.06      |\n|vit_small_patch16_rope_mixed_ape_224.naver_in1k  |224     |80.986|94.976|22.06      |\n* Some cleanup of the ROPE modules, helpers, and leaf-node registration for FX tracing\n* Prepping a 1.0.17 release\n\n## What's Changed\n* Add support for Naver rope-vit to EVA ViT by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2529\n* Update no_grad usage to inference_mode where possible by @GuillaumeErhard in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2534\n* Add min layer-decay scale clamp and a threshold to exclude groups from optimization by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2537\n* Add stem_bias option to MNV5 and fix norm layers to accept string inputs by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2538\n* Add flags to enable float32 norm computation by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2536\n* Fix: mnv5 conv_stem bias and GELU w\u002F approximate=tanh by @RyanMullins in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2533\n* Fix type-cast errors","2025-07-10T16:04:42",{"id":217,"version":218,"summary_zh":219,"released_at":220},98762,"v1.0.16","## June 26, 2025\r\n* MobileNetV5 backbone (w\u002F encoder only variant) for [Gemma 3n](https:\u002F\u002Fai.google.dev\u002Fgemma\u002Fdocs\u002Fgemma-3n#parameters) image encoder\r\n* Version 1.0.16 released\r\n\r\n## June 23, 2025\r\n* Add F.grid_sample based 2D and factorized pos embed resize to NaFlexViT. Faster when lots of different sizes (based on example by https:\u002F\u002Fgithub.com\u002Fstas-sl).\r\n* Further speed up patch embed resample by replacing vmap with matmul (based on snippet by https:\u002F\u002Fgithub.com\u002Fstas-sl).\r\n* Add 3 initial native aspect NaFlexViT checkpoints created while testing, ImageNet-1k and 3 different pos embed configs w\u002F same hparams (see the loading sketch at the end of this section).\r\n\r\n | Model | Top-1 Acc | Top-5 Acc | Params (M) | Eval Seq Len |\r\n |:---|:---:|:---:|:---:|:---:|\r\n | [naflexvit_base_patch16_par_gap.e300_s576_in1k](https:\u002F\u002Fhf.co\u002Ftimm\u002Fnaflexvit_base_patch16_par_gap.e300_s576_in1k) | 83.67 | 96.45 | 86.63 | 576 |\r\n | [naflexvit_base_patch16_parfac_gap.e300_s576_in1k](https:\u002F\u002Fhf.co\u002Ftimm\u002Fnaflexvit_base_patch16_parfac_gap.e300_s576_in1k) | 83.63 | 96.41 | 86.46 | 576 |\r\n | [naflexvit_base_patch16_gap.e300_s576_in1k](https:\u002F\u002Fhf.co\u002Ftimm\u002Fnaflexvit_base_patch16_gap.e300_s576_in1k) | 83.50 | 96.46 | 86.63 | 576 |\r\n* Support gradient checkpointing for `forward_intermediates` and fix some checkpointing bugs. Thanks https:\u002F\u002Fgithub.com\u002Fbrianhou0208\r\n* Add 'corrected weight decay' (https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.02285) as option to AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, SGDW optimizers\r\n* Switch PE (perception encoder) ViT models to use native timm weights instead of remapping on the fly\r\n* Fix cuda stream bug in prefetch loader
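\r\n\r\nA minimal sketch for trying one of the native NaFlexViT checkpoints above; it assumes a plain image tensor is accepted (the NaFlex dict\u002Fpipeline input is optional):\r\n\r\n```python\r\nimport timm\r\nimport torch\r\n\r\n# One of the three native-aspect checkpoints listed in the table above.\r\nmodel = timm.create_model(\r\n    'naflexvit_base_patch16_par_gap.e300_s576_in1k',\r\n    pretrained=True,\r\n).eval()\r\n\r\nwith torch.inference_mode():\r\n    logits = model(torch.randn(1, 3, 224, 224))  # standard tensor input\r\nprint(logits.shape)  # expect torch.Size([1, 1000])\r\n```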
\r\n  \r\n## June 5, 2025\r\n* Initial NaFlexVit model code. NaFlexVit is a Vision Transformer with:\r\n  1. Encapsulated embedding and position encoding in a single module\r\n  2. Support for nn.Linear patch embedding on pre-patchified (dictionary) inputs\r\n  3. Support for NaFlex variable aspect, variable resolution (SigLip-2: https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.14786)\r\n  4. Support for FlexiViT variable patch size (https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.08013)\r\n  5. Support for NaViT fractional\u002Ffactorized position embedding (https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.06304)\r\n* Existing vit models in `vision_transformer.py` can be loaded into the NaFlexVit model by adding the `use_naflex=True` flag to `create_model`\r\n  * Some native weights coming soon\r\n* A full NaFlex data pipeline is available that allows training \u002F fine-tuning \u002F evaluating with variable aspect \u002F size images\r\n  * To enable in `train.py` and `validate.py`, add the `--naflex-loader` arg; must be used with a NaFlexVit\r\n* To evaluate an existing (classic) ViT loaded in the NaFlexVit model w\u002F the NaFlex data pipe:\r\n  * `python validate.py \u002Fimagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256` \r\n* Training has some extra args worth noting\r\n  * The `--naflex-train-seq-lens` argument specifies which sequence lengths to randomly pick from per batch during training\r\n  * The `--naflex-max-seq-len` argument sets the target sequence length for validation\r\n  * Adding `--model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24` will enable random patch size selection per-batch w\u002F interpolation\r\n  * The `--naflex-loss-scale` arg changes the loss scaling mode per batch relative to the batch size, as `timm` NaFlex loading changes the batch size for each seq len\r\n\r\n## May 28, 2025\r\n* Add a number of small\u002Ffast models thanks to https:\u002F\u002Fgithub.com\u002Fbrianhou0208\r\n  * SwiftFormer - [(ICCV2023) SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https:\u002F\u002Fgithub.com\u002FAmshaker\u002FSwiftFormer) \r\n  * FasterNet - [(CVPR2023) Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks](https:\u002F\u002Fgithub.com\u002FJierunChen\u002FFasterNet)\r\n  * SHViT - [(CVPR2024) SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design](https:\u002F\u002Fgithub.com\u002Fysj9909\u002FSHViT)\r\n  * StarNet - [(CVPR2024) Rewrite the Stars](https:\u002F\u002Fgithub.com\u002Fma-xu\u002FRewrite-the-Stars)\r\n  * GhostNet-V3 - [GhostNetV3: Exploring the Training Strategies for Compact Models](https:\u002F\u002Fgithub.com\u002Fhuawei-noah\u002FEfficient-AI-Backbones\u002Ftree\u002Fmaster\u002Fghostnetv3_pytorch)\r\n* Update EVA ViT (closest match) to support Perception Encoder models (https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.13181) from Meta, loading Hub weights but I still need to push dedicated `timm` weights\r\n  * Add some flexibility to ROPE impl\r\n* Big increase in number of models 
supporting `forward_intermediates()` and some additional fixes thanks to https:\u002F\u002Fgithub.com\u002Fbrianhou0208\r\n  * DaViT, EdgeNeXt, EfficientFormerV2, EfficientViT(MIT), EfficientViT(MSRA), FocalNet, GCViT, HGNet \u002FV2, InceptionNeXt, Inception-V4, MambaOut, MetaFormer, NesT, Next-ViT, PiT, PVT V2, RepGhostNet, RepViT, ResNetV2, ReXNet, TinyViT, TResNet, VoV\r\n* TNT model updated w\u002F new weights `forward_intermediates()` thanks to https:\u002F\u002Fgithub.com\u002Fbrianhou0208\r\n* Add `local-dir:` pretrained schema, can use `local-dir:\u002Fpath\u002Fto\u002Fmodel\u002Ffolder` for mo","2025-06-26T18:44:53",{"id":222,"version":223,"summary_zh":224,"released_at":225},98763,"v1.0.15","## Feb 21, 2025\r\n* SigLIP 2 ViT image encoders added (https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Ftimm\u002Fsiglip-2-67b8e72ba08b09dd97aecaf9)\r\n  * Variable resolution \u002F aspect NaFlex versions are a WIP\r\n* Add 'SO150M2' ViT weights trained with SBB recipes, great results, better for ImageNet than previous attempt w\u002F less training.\r\n  * `vit_so150m2_patch16_reg1_gap_448.sbb_e200_in12k_ft_in1k` - 88.1% top-1\r\n  * `vit_so150m2_patch16_reg1_gap_384.sbb_e200_in12k_ft_in1k` - 87.9% top-1\r\n  * `vit_so150m2_patch16_reg1_gap_256.sbb_e200_in12k_ft_in1k` - 87.3% top-1\r\n  * `vit_so150m2_patch16_reg4_gap_256.sbb_e200_in12k`\r\n* Updated InternViT-300M '2.5' weights\r\n* Release 1.0.15\r\n\r\n## Feb 1, 2025\r\n* FYI PyTorch 2.6 & Python 3.13 are tested and working w\u002F current main and released version of `timm`\r\n\r\n## Jan 27, 2025\r\n* Add Kron Optimizer (PSGD w\u002F Kronecker-factored preconditioner) \r\n  * Code from https:\u002F\u002Fgithub.com\u002Fevanatyourservice\u002Fkron_torch\r\n  * See also https:\u002F\u002Fsites.google.com\u002Fsite\u002Flixilinx\u002Fhome\u002Fpsgd\r\n\r\n## What's Changed\r\n* Fix metavar for `--input-size` by @JosuaRieder in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2417\r\n* Add arguments to the respective argument groups by @JosuaRieder in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2416\r\n* Add missing training flag to convert_sync_batchnorm by @collinmccarthy in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2423\r\n* Fix num_classes update in reset_classifier and RDNet forward head call by @brianhou0208 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2421\r\n* timm: add __all__ to __init__ by @adamjstewart in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2399\r\n* Fiddling with Kron (PSGD) optimizer by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2427\r\n* Try to force numpy\u003C2.0 for torch 1.13 tests, update newest tested torch to 2.5.1 by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2429\r\n* Kron flatten improvements + stochastic weight decay by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2431\r\n* PSGD: unify RNG by @ClashLuke in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2433\r\n* Add vit so150m2 weights by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2439\r\n* adapt_input_conv: add type hints by @adamjstewart in 
https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2441\r\n* SigLIP 2 by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2440\r\n* timm.models: explicitly export attributes by @adamjstewart in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2442\r\n\r\n## New Contributors\r\n* @collinmccarthy made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2423\r\n* @ClashLuke made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2433\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fcompare\u002Fv1.0.14...v1.0.15","2025-02-23T05:07:06",{"id":227,"version":228,"summary_zh":229,"released_at":230},98764,"v1.0.14","## Jan 19, 2025\r\n* Fix loading of LeViT safetensor weights, remove conversion code which should have been deactivated\r\n* Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not optimal shape for ImageNet-12k\u002F1k pretrain\u002Fft\r\n  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k` - 86.7% top-1\r\n  * `vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k` - 87.4% top-1\r\n  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k`\r\n* Misc typing, typo, etc. cleanup\r\n* 1.0.14 release to get above LeViT fix out\r\n\r\n## What's Changed\r\n* Fix nn.Module type hints by @adamjstewart in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2400\r\n* Add missing paper title by @JosuaRieder in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2405\r\n* fix 'timm recipe scripts' link by @JosuaRieder in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2404\r\n* fix typo in EfficientNet docs by @JosuaRieder in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2403\r\n* disable abbreviating csv inference output with ellipses by @JosuaRieder in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2402\r\n* fix incorrect LaTeX formulas by @JosuaRieder in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2406\r\n* VGG ConvMlp: fix layer defaults\u002Ftypes by @adamjstewart in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2409\r\n* Implement --no-console-results in inference.py by @JosuaRieder in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2408\r\n* LeViT safetensors load is broken by conversion code that wasn't deactivated by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2412\r\n* A few more weights by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2413\r\n* Fix typos by @JosuaRieder in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2415\r\n\r\n## New Contributors\r\n* @adamjstewart made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2400\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fcompare\u002Fv1.0.13...v1.0.14","2025-01-19T23:05:30",{"id":232,"version":233,"summary_zh":234,"released_at":235},98765,"v1.0.13","## Jan 9, 2025\r\n* Add support to train and validate in pure `bfloat16` or `float16`\r\n* `wandb` project name arg added by https:\u002F\u002Fgithub.com\u002Fcaojiaolong, use arg.experiment for name\r\n* Fix old issue w\u002F checkpoint saving not working on filesystem w\u002Fo hard-link support (e.g. FUSE fs mounts)\r\n* 1.0.13 release\r\n\r\n## Jan 6, 2025\r\n* Add `torch.utils.checkpoint.checkpoint()` wrapper in `timm.models` that defaults `use_reentrant=False`, unless `TIMM_REENTRANT_CKPT=1` is set in env.\r\n\r\n## Dec 31, 2024\r\n* `convnext_nano` 384x384 ImageNet-12k pretrain & fine-tune. https:\u002F\u002Fhuggingface.co\u002Fmodels?search=convnext_nano%20r384\r\n* Add AIM-v2 encoders from https:\u002F\u002Fgithub.com\u002Fapple\u002Fml-aim, see on Hub: https:\u002F\u002Fhuggingface.co\u002Fmodels?search=timm%20aimv2\r\n* Add PaliGemma2 encoders from https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fbig_vision to existing PaliGemma, see on Hub: https:\u002F\u002Fhuggingface.co\u002Fmodels?search=timm%20pali2\r\n* Add missing L\u002F14 DFN2B 39B CLIP ViT, `vit_large_patch14_clip_224.dfn2b_s39b`\r\n* Fix existing `RmsNorm` layer & fn to match standard formulation, use PT 2.5 impl when possible. Move old impl to `SimpleNorm` layer, it's LN w\u002Fo centering or bias. There were only two `timm` models using it, and they have been updated.\r\n* Allow override of `cache_dir` arg for model creation (see the sketch below)\r\n* Pass through `trust_remote_code` for HF datasets wrapper\r\n* `inception_next_atto` model added by creator\r\n* Adan optimizer caution, and Lamb decoupled weight decay options\r\n* Some feature_info metadata fixed by https:\u002F\u002Fgithub.com\u002Fbrianhou0208\r\n* All OpenCLIP and JAX (CLIP, SigLIP, Pali, etc) model weights that used load time remapping were given their own HF Hub instances so that they work with `hf-hub:` based loading, and thus will work with new Transformers `TimmWrapperModel`
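\r\n\r\nA small sketch of the `cache_dir` override; the cache path here is illustrative, and the model name is the DFN2B CLIP ViT mentioned above:\r\n\r\n```python\r\nimport timm\r\n\r\n# Download and cache pretrained weights under an explicit directory instead\r\n# of the default HF cache location; the path is just an example.\r\nmodel = timm.create_model(\r\n    'vit_large_patch14_clip_224.dfn2b_s39b',\r\n    pretrained=True,\r\n    cache_dir='\u002Ftmp\u002Ftimm-cache',\r\n)\r\n```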
\r\n\r\n## What's Changed\r\n* Punch cache_dir through model factory \u002F builder \u002F pretrain helpers by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2356\r\n* Yuweihao inception next atto merge by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2360\r\n* Dataset trust remote tweaks by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2361\r\n* Add --dataset-trust-remote-code to the train.py and validate.py scripts by @grodino in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2328\r\n* Fix feature_info.reduction by @brianhou0208 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2369\r\n* Add caution to Adan. Add decouple decay option to LAMB. by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2357\r\n* Switching to timm specific weight instances for open_clip image encoders by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2376\r\n* Fix broken image link in `Quickstart` doc by @ariG23498 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2381\r\n* Supporting aimv2 encoders by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2379\r\n* fix: minor typos in markdowns by @ruidazeng in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2382\r\n* Add 384x384 in12k pretrain and finetune for convnext_nano by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2384\r\n* Fixed unfused attn2d scale by @laclouis5 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2387\r\n* Fix MQA V2 by @laclouis5 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2388\r\n* Wrap torch checkpoint() fn to default use_reentrant flag to False and allow env var override by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2394\r\n* Add half-precision (bfloat16, float16) support to train & validate scripts by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2397\r\n* Merging wandb project name changes w\u002F addition by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2398\r\n\r\n## New Contributors\r\n* @brianhou0208 made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2369\r\n* @ariG23498 made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2381\r\n* @ruidazeng made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2382\r\n* @laclouis5 made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2387\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fcompare\u002Fv1.0.12...v1.0.13","2025-01-09T18:49:44",{"id":237,"version":238,"summary_zh":239,"released_at":240},98766,"v1.0.12","## Nov 28, 2024\r\n* More optimizers\r\n  * Add MARS optimizer (https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.10438, https:\u002F\u002Fgithub.com\u002FAGI-Arena\u002FMARS)\r\n  * Add LaProp optimizer (https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.04839, https:\u002F\u002Fgithub.com\u002FZ-T-WANG\u002FLaProp-Optimizer)\r\n  * Add masking from 'Cautious Optimizers' (https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.16085, https:\u002F\u002Fgithub.com\u002Fkyleliang919\u002FC-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW\r\n  * Cleanup some docstrings and type annotations re optimizers and factory\r\n* Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned in1k @ 384x384\r\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmobilenetv4_conv_medium.e250_r384_in12k_ft_in1k\r\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmobilenetv4_conv_medium.e250_r384_in12k\r\n  * 
https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmobilenetv4_conv_medium.e180_ad_r384_in12k\r\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmobilenetv4_conv_medium.e180_r384_in12k\r\n* Add small cs3darknet, quite good for the speed\r\n  * https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fcs3darknet_focus_s.ra4_e3600_r256_in1k\r\n\r\n## Nov 12, 2024\r\n* Optimizer factory refactor\r\n  * New factory works by registering optimizers using an OptimInfo dataclass w\u002F some key traits\r\n  * Add `list_optimizers`, `get_optimizer_class`, `get_optimizer_info` to reworked `create_optimizer_v2` fn to explore optimizers, get info or class (see the sketch below)\r\n  * Deprecate `optim.optim_factory`, move fns to `optim\u002F_optim_factory.py` and `optim\u002F_param_groups.py` and encourage import via `timm.optim`\r\n* Add Adopt (https:\u002F\u002Fgithub.com\u002FiShohei220\u002Fadopt) optimizer\r\n* Add 'Big Vision' variant of Adafactor (https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fbig_vision\u002Fblob\u002Fmain\u002Fbig_vision\u002Foptax.py) optimizer\r\n* Fix original Adafactor to pick better factorization dims for convolutions\r\n* Tweak LAMB optimizer, using improvements in torch.where functionality since the original impl; refactor clipping a bit\r\n* Dynamic img size support in vit, deit, eva improved to support resize from non-square patch grids, thanks https:\u002F\u002Fgithub.com\u002Fwojtke
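\r\n\r\nA short sketch of the reworked optimizer factory; optimizer names come from the registry, and 'adopt' is assumed to be the registered name for the new Adopt optimizer:\r\n\r\n```python\r\nimport timm\r\nfrom timm.optim import create_optimizer_v2, list_optimizers\r\n\r\n# Browse the registry introduced by the factory refactor.\r\nprint(list_optimizers())\r\n\r\n# Build an optimizer by registered name via the v2 factory.\r\nmodel = timm.create_model('resnet18')\r\nopt = create_optimizer_v2(model, opt='adopt', lr=1e-3, weight_decay=0.05)\r\n```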
\r\n\r\n## Oct 31, 2024\r\nAdd a set of new very well trained ResNet & ResNet-V2 18\u002F34 (basic block) weights. See https:\u002F\u002Fhuggingface.co\u002Fblog\u002Frwightman\u002Fresnet-trick-or-treat\r\n\r\n## Oct 19, 2024\r\n* Cleanup torch amp usage to avoid cuda specific calls, merge support for Ascend (NPU) devices from [MengqingCao](https:\u002F\u002Fgithub.com\u002FMengqingCao) that should work now in PyTorch 2.5 w\u002F new device extension autoloading feature. Tested Intel Arc (XPU) in PyTorch 2.5 too and it (mostly) worked.\r\n\r\n## What's Changed\r\n* mambaout.py: fixed bug by @NightMachinery in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2305\r\n* Cleanup some amp related behaviour to better support different (non-cuda) devices by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2308\r\n* Add NPU backend support for val and inference by @MengqingCao in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2109\r\n* Update some clip pretrained weights to point to new hub locations by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2311\r\n* ResNet vs MNV4 v1\u002Fv2 18 & 34 weights by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2316\r\n* Replace deprecated positional argument with --data-dir by @JosuaRieder in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2322\r\n* Fix typo in train.py: bathes > batches by @JosuaRieder in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2321\r\n* Fix positional embedding resampling for non-square inputs in ViT by @wojtke in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2317\r\n* Add trust_remote_code argument to ReaderHfds by @grodino in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2326\r\n* Extend train epoch schedule by warmup_epochs if warmup_prefix enabled by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2325\r\n* Extend existing unit tests using Cover-Agent by @mrT23 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2331\r\n* An impl of adafactor as per big vision (scaling vit) changes by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2320\r\n* Add py.typed file as recommended by PEP 561 by @antoinebrl in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2252\r\n* Add CODE_OF_CONDUCT.md and CITATION.cff files by @AlinaImtiaz018 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2333\r\n* Add some 384x384 small model weights by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2334\r\n* In dist training, update loss running avg every step, sync on log by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2340\r\n* Improve WandB logging by @sinahmr in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2341\r\n* A few weights to merge Friday by @rwightman in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Fpull\u002F2343\r\n* Update timm torchvision resnet weight urls to the updated urls in torchvision by @JohannesTheo in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002F","2024-12-03T19:05:39",{"id":242,"version":243,"summary_zh":244,"released_at":245},98767,"v1.0.11","Quick turnaround from 1.0.10 to fix an error impacting 3rd party packages that still import through a deprecated path that isn't tested.\r\n\r\n## Oct 16, 2024\r\n* Fix error on importing from deprecated path `timm.models.registry`, increased priority of existing 
deprecation warnings to be visible\r\n* Port weights of InternViT-300M (https:\u002F\u002Fhuggingface.co\u002FOpenGVLab\u002FInternViT-300M-448px) to `timm` as `vit_intern300m_patch14_448`\r\n\r\n### Oct 14, 2024\r\n* Pre-activation (ResNetV2) version of 18\u002F18d\u002F34\u002F34d ResNet model defs added by request (weights pending)\r\n* Release 1.0.10\r\n\r\n### Oct 11, 2024\r\n* MambaOut (https:\u002F\u002Fgithub.com\u002Fyuweihao\u002FMambaOut) model & weights added. A cheeky take on SSM vision models w\u002Fo the SSM (essentially ConvNeXt w\u002F gating). A mix of original weights + custom variations & weights.\r\n\r\n|model                                                                                                                |img_size|top1  |top5  |param_count|\r\n|---------------------------------------------------------------------------------------------------------------------|--------|------|------|-----------|\r\n|[mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k)|384     |87.506|98.428|101.66     |\r\n|[mambaout_base_plus_rw.sw_e150_in12k_ft_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_plus_rw.sw_e150_in12k_ft_in1k)|288     |86.912|98.236|101.66     |\r\n|[mambaout_base_plus_rw.sw_e150_in12k_ft_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_plus_rw.sw_e150_in12k_ft_in1k)|224     |86.632|98.156|101.66     |\r\n|[mambaout_base_tall_rw.sw_e500_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_tall_rw.sw_e500_in1k)                  |288     |84.974|97.332|86.48      |\r\n|[mambaout_base_wide_rw.sw_e500_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_wide_rw.sw_e500_in1k)                  |288     |84.962|97.208|94.45      |\r\n|[mambaout_base_short_rw.sw_e500_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_short_rw.sw_e500_in1k)                |288     |84.832|97.27 |88.83      |\r\n|[mambaout_base.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base.in1k)                                                  |288     |84.72 |96.93 |84.81      |\r\n|[mambaout_small_rw.sw_e450_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_small_rw.sw_e450_in1k)                          |288     |84.598|97.098|48.5       |\r\n|[mambaout_small.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_small.in1k)                                                |288     |84.5  |96.974|48.49      |\r\n|[mambaout_base_wide_rw.sw_e500_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_wide_rw.sw_e500_in1k)                  |224     |84.454|96.864|94.45      |\r\n|[mambaout_base_tall_rw.sw_e500_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_tall_rw.sw_e500_in1k)                  |224     |84.434|96.958|86.48      |\r\n|[mambaout_base_short_rw.sw_e500_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_short_rw.sw_e500_in1k)                |224     |84.362|96.952|88.83      |\r\n|[mambaout_base.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base.in1k)                                                  |224     |84.168|96.68 |84.81      |\r\n|[mambaout_small.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_small.in1k)                                                |224     |84.086|96.63 |48.49      
|\r\n|[mambaout_small_rw.sw_e450_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_small_rw.sw_e450_in1k)                          |224     |84.024|96.752|48.5       |\r\n|[mambaout_tiny.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_tiny.in1k)                                                  |288     |83.448|96.538|26.55      |\r\n|[mambaout_tiny.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_tiny.in1k)                                                  |224     |82.736|96.1  |26.55      |\r\n|[mambaout_kobe.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_kobe.in1k)                                                  |288     |81.054|95.718|9.14       |\r\n|[mambaout_kobe.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_kobe.in1k)                                                  |224     |79.986|94.986|9.14       |\r\n|[mambaout_femto.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_femto.in1k)                                                |288     |79.848|95.14 |7.3        |\r\n|[mambaout_femto.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_femto.in1k)                                                |224     |78.87 |94.408|7.3        |\r\n\r\n* SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, added 378x378 option for existing SigLIP 384x384 models\r\n  *  [vit_so400m_patch14_siglip_378.webli_ft_in1k](https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_so400m_patch14_siglip_378.webli_ft_in1k) - 89.42 top-1\r\n  *  [vit_so400m_patch14_siglip_gap_378.webli_ft_in1k](https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_so400m_patch14_siglip_gap_378.webli_ft_in1k) - 89.03\r\n* SigLIP SO400M ViT encoder from recent multi-lingual (i18n) variant, patch16 @ 256x256 (https:\u002F\u002Fhuggingface.co\u002Ftimm\u002FViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.\r\n* Add two ConvNeXt 'Zepto'","2024-10-16T21:19:16",{"id":247,"version":248,"summary_zh":249,"released_at":250},98768,"v1.0.10","\r\n### Oct 14, 2024\r\n* Pre-activation (ResNetV2) version of 18\u002F18d\u002F34\u002F34d ResNet model defs added by request (weights pending)\r\n* Release 1.0.10\r\n\r\n### Oct 11, 2024\r\n* MambaOut (https:\u002F\u002Fgithub.com\u002Fyuweihao\u002FMambaOut) model & weights added. A cheeky take on SSM vision models w\u002Fo the SSM (essentially ConvNeXt w\u002F gating). 
A mix of original weights + custom variations & weights.\r\n\r\n|model                                                                                                                |img_size|top1  |top5  |param_count|\r\n|---------------------------------------------------------------------------------------------------------------------|--------|------|------|-----------|\r\n|[mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k)|384     |87.506|98.428|101.66     |\r\n|[mambaout_base_plus_rw.sw_e150_in12k_ft_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_plus_rw.sw_e150_in12k_ft_in1k)|288     |86.912|98.236|101.66     |\r\n|[mambaout_base_plus_rw.sw_e150_in12k_ft_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_plus_rw.sw_e150_in12k_ft_in1k)|224     |86.632|98.156|101.66     |\r\n|[mambaout_base_tall_rw.sw_e500_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_tall_rw.sw_e500_in1k)                  |288     |84.974|97.332|86.48      |\r\n|[mambaout_base_wide_rw.sw_e500_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_wide_rw.sw_e500_in1k)                  |288     |84.962|97.208|94.45      |\r\n|[mambaout_base_short_rw.sw_e500_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_short_rw.sw_e500_in1k)                |288     |84.832|97.27 |88.83      |\r\n|[mambaout_base.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base.in1k)                                                  |288     |84.72 |96.93 |84.81      |\r\n|[mambaout_small_rw.sw_e450_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_small_rw.sw_e450_in1k)                          |288     |84.598|97.098|48.5       |\r\n|[mambaout_small.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_small.in1k)                                                |288     |84.5  |96.974|48.49      |\r\n|[mambaout_base_wide_rw.sw_e500_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_wide_rw.sw_e500_in1k)                  |224     |84.454|96.864|94.45      |\r\n|[mambaout_base_tall_rw.sw_e500_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_tall_rw.sw_e500_in1k)                  |224     |84.434|96.958|86.48      |\r\n|[mambaout_base_short_rw.sw_e500_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base_short_rw.sw_e500_in1k)                |224     |84.362|96.952|88.83      |\r\n|[mambaout_base.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_base.in1k)                                                  |224     |84.168|96.68 |84.81      |\r\n|[mambaout_small.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_small.in1k)                                                |224     |84.086|96.63 |48.49      |\r\n|[mambaout_small_rw.sw_e450_in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_small_rw.sw_e450_in1k)                          |224     |84.024|96.752|48.5       |\r\n|[mambaout_tiny.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_tiny.in1k)                                                  |288     |83.448|96.538|26.55      |\r\n|[mambaout_tiny.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_tiny.in1k)                                                  |224     |82.736|96.1  |26.55      |\r\n|[mambaout_kobe.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_kobe.in1k)                                                  
|288     |81.054|95.718|9.14       |\r\n|[mambaout_kobe.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_kobe.in1k)                                                  |224     |79.986|94.986|9.14       |\r\n|[mambaout_femto.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_femto.in1k)                                                |288     |79.848|95.14 |7.3        |\r\n|[mambaout_femto.in1k](http:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fmambaout_femto.in1k)                                                |224     |78.87 |94.408|7.3        |\r\n\r\n* SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, added 378x378 option for existing SigLIP 384x384 models\r\n  *  [vit_so400m_patch14_siglip_378.webli_ft_in1k](https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_so400m_patch14_siglip_378.webli_ft_in1k) - 89.42 top-1\r\n  *  [vit_so400m_patch14_siglip_gap_378.webli_ft_in1k](https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_so400m_patch14_siglip_gap_378.webli_ft_in1k) - 89.03\r\n* SigLIP SO400M ViT encoder from recent multi-lingual (i18n) variant, patch16 @ 256x256 (https:\u002F\u002Fhuggingface.co\u002Ftimm\u002FViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.\r\n* Add two ConvNeXt 'Zepto' models & weights (one w\u002F overlapped stem and one w\u002F patch stem). Uses RMSNorm, smaller than previous 'Atto', 2.2M params.\r\n  * [convnext_zepto_rms_ols.ra4_e3600_r224_in1k](https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fconvnext_zepto_rms_ols.ra4_e3600_r224_in1k) - 73.20 top-1 @ 224\r\n  * [convnext_zepto_rms.ra4_e3600_r224_in1k](https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fconvnext_zepto_rms.ra4_e3600_r224_in1k) - 72.81 @ 224\r\n\r\n### Sept 2024\r\n* Add a suite o","2024-10-15T04:44:58",{"id":252,"version":253,"summary_zh":254,"released_at":255},98769,"v1.0.9","### Aug 21, 2024\r\n* Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models (see the inference sketch after the table)\r\n\r\n| model | top1 | top5 | param_count | img_size |\r\n| -------------------------------------------------- | ------ | ------ | ----------- | -------- |\r\n| [vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k](https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k) | 87.438 | 98.256 | 64.11 | 384 |\r\n| [vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k](https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k) | 86.608 | 97.934 | 64.11 | 256 |\r\n| [vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k](https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k) | 86.594 | 98.02 | 60.4 | 384 |\r\n| [vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k](https:\u002F\u002Fhuggingface.co\u002Ftimm\u002Fvit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k) | 85.734 | 97.61 | 60.4 | 256 |\r\n
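\r\nA standard inference sketch for one of the SBB ViT weights above, building eval preprocessing from the checkpoint's own data config (both helpers are part of `timm.data`; the image path is a placeholder):\r\n\r\n```python\r\nimport timm\r\nimport torch\r\nfrom timm.data import resolve_model_data_config, create_transform\r\nfrom PIL import Image\r\n\r\nmodel = timm.create_model(\r\n    'vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k',\r\n    pretrained=True,\r\n).eval()\r\n\r\n# Derive eval preprocessing from the checkpoint's pretrained data config.\r\ncfg = resolve_model_data_config(model)\r\ntransform = create_transform(**cfg, is_training=False)\r\n\r\nimg = Image.open('example.jpg').convert('RGB')  # placeholder image path\r\nwith torch.inference_mode():\r\n    probs = model(transform(img).unsqueeze(0)).softmax(dim=-1)\r\nprint(probs.topk(5))\r\n```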
\r\n* MobileNet-V1 1.25, EfficientNet-B1, & ResNet50-D weights w\u002F MNV4 baseline challenge recipe\r\n\r\n| model                                                                                                                    | top1   | top5   | param_count | img_size |\r\n|--------------------------------------------------------------------------------------------------------------------------|--------|--------|-------------|----------|\r\n| [resnet50d.ra4_e3600_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fresnet50d.ra4_e3600_r224_in1k)                                         | 81.838 | 95.922 | 25.58       | 288      |\r\n| [efficientnet_b1.ra4_e3600_r240_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fefficientnet_b1.ra4_e3600_r240_in1k)                             | 81.440 | 95.700 | 7.79        | 288      |\r\n| [resnet50d.ra4_e3600_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fresnet50d.ra4_e3600_r224_in1k)                                         | 80.952 | 95.384 | 25.58       | 224      |\r\n| [efficientnet_b1.ra4_e3600_r240_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fefficientnet_b1.ra4_e3600_r240_in1k)                             | 80.406 | 95.152 | 7.79        | 240      |\r\n| [mobilenetv1_125.ra4_e3600_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv1_125.ra4_e3600_r224_in1k)                             | 77.600 | 93.804 | 6.27        | 256      |\r\n| [mobilenetv1_125.ra4_e3600_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv1_125.ra4_e3600_r224_in1k)                             | 76.924 | 93.234 | 6.27        | 224      |\r\n\r\n* Add SAM2 (HieraDet) backbone arch & weight loading support\r\n\r\n* Add Hiera Small weights trained w\u002F abswin pos embed on in12k & fine-tuned on 1k\r\n\r\n|model                            |top1  |top5  |param_count|\r\n|---------------------------------|------|------|-----------|\r\n|hiera_small_abswin_256.sbb2_e200_in12k_ft_in1k    |84.912|97.260|35.01      |\r\n|hiera_small_abswin_256.sbb2_pd_e200_in12k_ft_in1k |84.560|97.106|35.01      |\r\n\r\n### Aug 8, 2024\r\n* Add RDNet ('DenseNets Reloaded', https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.19588), thanks [Donghyun Kim](https:\u002F\u002Fgithub.com\u002Fdhkim0225)\r\n  ","2024-08-23T23:42:07",{"id":257,"version":258,"summary_zh":259,"released_at":260},98770,"v1.0.8","### July 28, 2024\r\n* Add `mobilenet_edgetpu_v2_m` weights w\u002F `ra4` mnv4-small based recipe. 
80.1% top-1 @ 224 and 80.7 @ 256.\r\n* Release 1.0.8\r\n\r\n### July 26, 2024\r\n* More MobileNet-v4 weights, ImageNet-12k pretrain w\u002F fine-tunes, and anti-aliased ConvLarge models\r\n\r\n| model                                                                                            |top1  |top1_err|top5  |top5_err|param_count|img_size|\r\n|--------------------------------------------------------------------------------------------------|------|--------|------|--------|-----------|--------|\r\n| [mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k)|84.99 |15.01   |97.294|2.706   |32.59      |544     |\r\n| [mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k)|84.772|15.228  |97.344|2.656   |32.59      |480     |\r\n| [mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k)|84.64 |15.36   |97.114|2.886   |32.59      |448     |\r\n| [mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k)|84.314|15.686  |97.102|2.898   |32.59      |384     |\r\n| [mobilenetv4_conv_aa_large.e600_r384_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_aa_large.e600_r384_in1k)     |83.824|16.176  |96.734|3.266   |32.59      |480     |\r\n| [mobilenetv4_conv_aa_large.e600_r384_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_aa_large.e600_r384_in1k)             |83.244|16.756  |96.392|3.608   |32.59      |384     |\r\n| [mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k)|82.99 |17.01   |96.67 |3.33    |11.07      |320     |\r\n| [mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k)|82.364|17.636  |96.256|3.744   |11.07      |256     |\r\n\r\n* Impressive MobileNet-V1 and EfficientNet-B0 baseline challenges (https:\u002F\u002Fhuggingface.co\u002Fblog\u002Frwightman\u002Fmobilenet-baselines)\r\n  \r\n| model                                                                                            |top1  |top1_err|top5  |top5_err|param_count|img_size|\r\n|--------------------------------------------------------------------------------------------------|------|--------|------|--------|-----------|--------|\r\n| [efficientnet_b0.ra4_e3600_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fefficientnet_b0.ra4_e3600_r224_in1k)                       |79.364|20.636  |94.754|5.246   |5.29       |256     |\r\n| [efficientnet_b0.ra4_e3600_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fefficientnet_b0.ra4_e3600_r224_in1k)                       |78.584|21.416  |94.338|5.662   |5.29       |224     |    \r\n| [mobilenetv1_100h.ra4_e3600_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv1_100h.ra4_e3600_r224_in1k)                     |76.596|23.404  |93.272|6.728   |5.28       |256     |\r\n| [mobilenetv1_100.ra4_e3600_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv1_100.ra4_e3600_r224_in1k)                       |76.094|23.906  |93.004|6.996   |4.23       |256     |\r\n| [mobilenetv1_100h.ra4_e3600_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv1_100h.ra4_e3600_r224_in1k)                     |75.662|24.338  |92.504|7.496   |5.28       |224     |\r\n| 
[mobilenetv1_100.ra4_e3600_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv1_100.ra4_e3600_r224_in1k)                       |75.382|24.618  |92.312|7.688   |4.23       |224     |\r\n\r\n* Prototype of `set_input_size()` added to vit and swin v1\u002Fv2 models to allow changing image size, patch size, window size after model creation.\r\n* Improved support in swin for different size handling; in addition to `set_input_size`, `always_partition` and `strict_img_size` args have been added to `__init__` to allow more flexible input size constraints\r\n* Fix out of order indices info for intermediate 'Getter' feature wrapper, check out of range indices for same.\r\n* Add several `tiny` < .5M param models for testing that are actually trained on ImageNet-1k\r\n\r\n|model                       |top1  |top1_err|top5  |top5_err|param_count|img_size|crop_pct|\r\n|----------------------------|------|--------|------|--------|-----------|--------|--------|\r\n|test_efficientnet.r160_in1k |47.156|52.844  |71.726|28.274  |0.36       |192     |1.0     |\r\n|test_byobnet.r160_in1k      |46.698|53.302  |71.674|28.326  |0.46       |192     |1.0     |\r\n|test_efficientnet.r160_in1k |46.426|53.574  |70.928|29.072  |0.36       |160     |0.875   |\r\n|test_byobnet.r160_in1k      |45.378|54.622  |70.572|29.428  |0.46       |160     |0.875   |\r\n|test_vit.r160_in1k|42.0  |58.0    |68.664|31.336  |0.37       |192     |1.0     |\r\n|test_vit.r160_in1k|40.822|59.178  |67.212|32.788  |0.37       |160     |0.875   |\r\n\r\n* Fix vit reg token init, thanks [Promisery](https:\u002F\u002Fgithub.com\u002FPromisery)\r\n* Other misc fixes\r\n\r\n### June 24, 2024\r\n* 3 more MobileNetV4 hybrid weights with different MQA weight init scheme\r\n\r\n| model                            ","2024-07-29T05:18:26",{"id":262,"version":263,"summary_zh":264,"released_at":265},98771,"v1.0.7","### June 12, 2024\r\n* MobileNetV4 models and initial set of `timm` trained weights added:\r\n\r\n| model                                                                                            |top1  |top1_err|top5  |top5_err|param_count|img_size|\r\n|--------------------------------------------------------------------------------------------------|------|--------|------|--------|-----------|--------|\r\n| [mobilenetv4_hybrid_large.e600_r384_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_hybrid_large.e600_r384_in1k) |84.266|15.734  |96.936 |3.064  |37.76      |448     |\r\n| [mobilenetv4_hybrid_large.e600_r384_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_hybrid_large.e600_r384_in1k) |83.800|16.200  |96.770 |3.230  |37.76      |384     |\r\n| [mobilenetv4_conv_large.e600_r384_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_large.e600_r384_in1k) |83.392|16.608  |96.622 |3.378  |32.59      |448     |\r\n| [mobilenetv4_conv_large.e600_r384_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_large.e600_r384_in1k) |82.952|17.048  |96.266 |3.734  |32.59      |384     |\r\n| [mobilenetv4_conv_large.e500_r256_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_large.e500_r256_in1k) |82.674|17.326  |96.31 |3.69    |32.59      |320     |\r\n| [mobilenetv4_conv_large.e500_r256_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_large.e500_r256_in1k)                   |81.862|18.138  |95.69 |4.31    |32.59      |256     |\r\n| [mobilenetv4_hybrid_medium.e500_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_hybrid_medium.e500_r224_in1k)             |81.276|18.724  
|95.742|4.258   |11.07      |256     |\r\n| [mobilenetv4_conv_medium.e500_r256_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_medium.e500_r256_in1k)                 |80.858|19.142  |95.768|4.232   |9.72       |320     |\r\n| [mobilenetv4_hybrid_medium.e500_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_hybrid_medium.e500_r224_in1k)             |80.442|19.558  |95.38 |4.62    |11.07      |224     |\r\n| [mobilenetv4_conv_blur_medium.e500_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_blur_medium.e500_r224_in1k)       |80.142|19.858  |95.298|4.702   |9.72       |256     |\r\n| [mobilenetv4_conv_medium.e500_r256_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_medium.e500_r256_in1k)                 |79.928|20.072  |95.184|4.816   |9.72       |256     |\r\n| [mobilenetv4_conv_medium.e500_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_medium.e500_r224_in1k)                 |79.808|20.192  |95.186|4.814   |9.72       |256     |\r\n| [mobilenetv4_conv_blur_medium.e500_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_blur_medium.e500_r224_in1k)       |79.438|20.562  |94.932|5.068   |9.72       |224     |\r\n| [mobilenetv4_conv_medium.e500_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_medium.e500_r224_in1k)                 |79.094|20.906  |94.77 |5.23    |9.72       |224     |\r\n| [mobilenetv4_conv_small.e2400_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_small.e2400_r224_in1k)                 |74.616|25.384  |92.072|7.928   |3.77       |256     |\r\n| [mobilenetv4_conv_small.e1200_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_small.e1200_r224_in1k)                 |74.292|25.708  |92.116|7.884   |3.77       |256     |\r\n| [mobilenetv4_conv_small.e2400_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_small.e2400_r224_in1k)                 |73.756|26.244  |91.422|8.578   |3.77       |224     |\r\n| [mobilenetv4_conv_small.e1200_r224_in1k](http:\u002F\u002Fhf.co\u002Ftimm\u002Fmobilenetv4_conv_small.e1200_r224_in1k)                 |73.454|26.546  |91.34 |8.66    |3.77       |224     |\r\n\r\n* Apple MobileCLIP (https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.17049, FastViT and ViT-B) image tower model support & weights added (part of OpenCLIP support).\r\n* ViTamin (https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.02132) CLIP image tower model & weights added (part of OpenCLIP support).\r\n* OpenAI CLIP Modified ResNet image tower modelling & weight support (via ByobNet). Refactor AttentionPool2d.\r\n* Refactoring & improvements, especially related to classifier_reset and num_features vs head_hidden_size for forward_features() vs pre_logits\r\n","2024-06-19T06:52:36"]