[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-MzeroMiko--VMamba":3,"tool-MzeroMiko--VMamba":62},[4,18,26,35,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,2,"2026-04-10T11:39:34",[14,15,13],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":32,"last_commit_at":41,"category_tags":42,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[43,13,15,14],"插件",{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 
开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备，它都能为你打下坚实的基础。",90106,"2026-04-06T11:19:32",[52,15,13,14],"语言模型",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,61],"视频",{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":68,"readme_en":69,"readme_zh":70,"quickstart_zh":71,"use_case_zh":72,"hero_image_url":73,"owner_login":74,"owner_name":75,"owner_avatar_url":76,"owner_bio":77,"owner_company":78,"owner_location":79,"owner_email":80,"owner_twitter":80,"owner_website":80,"owner_url":81,"languages":82,"stars":107,"forks":108,"last_commit_at":109,"license":110,"difficulty_score":10,"env_os":111,"env_gpu":112,"env_ram":113,"env_deps":114,"category_tags":128,"github_topics":80,"view_count":32,"oss_zip_url":80,"oss_zip_packed_at":80,"status":17,"created_at":129,"updated_at":130,"faqs":131,"releases":160},8833,"MzeroMiko\u002FVMamba","VMamba","VMamba: Visual State Space Models，code is based on mamba","VMamba 是一款专为计算机视觉打造的高效骨干网络，它将自然语言处理领域著名的 Mamba 状态空间模型成功迁移至图像任务中。针对传统卷积神经网络感受野受限、而 Transformer 架构计算复杂度随图像分辨率平方级增长导致效率低下的痛点，VMamba 实现了线性时间复杂度，在保持高性能的同时大幅降低了计算资源消耗。\n\n该工具的核心亮点在于其独创的“二维选择性扫描”（SS2D）模块。通过沿四个方向遍历图像数据，SS2D 巧妙地将一维序列建模能力适配到非顺序的二维图像结构中，使模型能够像 Transformer 一样拥有全局有效感受野，精准捕捉多视角的上下文信息。实验表明，VMamba 在多种视觉感知任务中表现优异，且在输入分辨率扩展时展现出极佳的效率优势。\n\nVMamba 非常适合从事深度学习算法研究的研究人员、需要部署高效视觉模型的开发者，以及关注前沿架构探索的技术爱好者使用。项目代码基于 PyTorch 构建，提供了清晰的实现细节和预训练权重，甚至支持极简的单文件快速体验，便于用户快速上手验证或将其集成到自己的视觉系统中。作为 NeurIPS 2024 的亮点接收论文成果，VMamba 为构建下一代高效视觉基础模型提供了强有力的新选择。","VMamba 是一款专为计算机视觉打造的高效骨干网络，它将自然语言处理领域著名的 Mamba 状态空间模型成功迁移至图像任务中。针对传统卷积神经网络感受野受限、而 Transformer 架构计算复杂度随图像分辨率平方级增长导致效率低下的痛点，VMamba 实现了线性时间复杂度，在保持高性能的同时大幅降低了计算资源消耗。\n\n该工具的核心亮点在于其独创的“二维选择性扫描”（SS2D）模块。通过沿四个方向遍历图像数据，SS2D 巧妙地将一维序列建模能力适配到非顺序的二维图像结构中，使模型能够像 Transformer 一样拥有全局有效感受野，精准捕捉多视角的上下文信息。实验表明，VMamba 在多种视觉感知任务中表现优异，且在输入分辨率扩展时展现出极佳的效率优势。\n\nVMamba 非常适合从事深度学习算法研究的研究人员、需要部署高效视觉模型的开发者，以及关注前沿架构探索的技术爱好者使用。项目代码基于 PyTorch 构建，提供了清晰的实现细节和预训练权重，甚至支持极简的单文件快速体验，便于用户快速上手验证或将其集成到自己的视觉系统中。作为 NeurIPS 2024 的亮点接收论文成果，VMamba 为构建下一代高效视觉基础模型提供了强有力的新选择。","\n\u003Cdiv align=\"center\">\n\u003Ch1>VMamba \u003C\u002Fh1>\n\u003Ch3>VMamba: Visual State Space Model\u003C\u002Fh3>\n\n[Yue Liu](https:\u002F\u002Fgithub.com\u002FMzeroMiko)\u003Csup>1\u003C\u002Fsup>,[Yunjie Tian](https:\u002F\u002Fsunsmarterjie.github.io\u002F)\u003Csup>1\u003C\u002Fsup>,[Yuzhong Zhao](https:\u002F\u002Fscholar.google.com.hk\u002Fcitations?user=tStQNm4AAAAJ&hl=zh-CN&oi=ao)\u003Csup>1\u003C\u002Fsup>, [Hongtian Yu](https:\u002F\u002Fgithub.com\u002Fyuhongtian17)\u003Csup>1\u003C\u002Fsup>, [Lingxi Xie](https:\u002F\u002Fscholar.google.com.hk\u002Fcitations?user=EEMm7hwAAAAJ&hl=zh-CN&oi=ao)\u003Csup>2\u003C\u002Fsup>, [Yaowei Wang](https:\u002F\u002Fscholar.google.com.hk\u002Fcitations?user=o_DllmIAAAAJ&hl=zh-CN&oi=ao)\u003Csup>3\u003C\u002Fsup>, [Qixiang 
Ye](https:\u002F\u002Fscholar.google.com.hk\u002Fcitations?user=tjEfgsEAAAAJ&hl=zh-CN&oi=ao)\u003Csup>1\u003C\u002Fsup>, [Yunfan Liu](https:\u002F\u002Fscholar.google.com.hk\u002Fcitations?user=YPL33G0AAAAJ&hl=zh-CN&oi=ao)\u003Csup>1\u003C\u002Fsup>\n\n\u003Csup>1\u003C\u002Fsup>  University of Chinese Academy of Sciences, \u003Csup>2\u003C\u002Fsup>  HUAWEI Inc.,  \u003Csup>3\u003C\u002Fsup> PengCheng Lab.\n\nPaper: ([arXiv 2401.10166](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.10166))\n\n\u003C\u002Fdiv>\n\n\n## 🔥 use VMamba with only ***one file*** and in the ***fewest steps***!\n```bash\nconda create -n vmamba python=3.10\npip install torch==2.2 torchvision torchaudio triton pytest chardet yacs termcolor fvcore seaborn packaging ninja einops numpy==1.24.4 timm==0.4.12\npip install https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba\u002Freleases\u002Fdownload\u002Fv2.2.4\u002Fmamba_ssm-2.2.4+cu12torch2.2cxx11abiTRUE-cp310-cp310-linux_x86_64.whl\npython vmamba.py\n```\n\n* [**updates**](#white_check_mark-updates)\n* [**abstract**](#abstract)\n* [**overview**](#overview--derivations)\n* [**main results**](#main-results)\n* [**getting started**](#getting-started)\n* [**star history**](#star-history)\n* [**citation**](#citation)\n* [**acknowledgment**](#acknowledgment)\n\n\n## :white_check_mark: Updates\n* **`Sep. 25th, 2024`**: Update: **VMamba was accepted by NeurIPS 2024 (spotlight)!**\n* **`June 14th, 2024`**: Update: we cleaned the code to make it easier to read, and added support for `mamba2`.\n* **`May 26th, 2024`**: Update: we released the updated weights of VMambav2, together with the new arXiv paper.\n* **`May 7th, 2024`**: Update: **Important!** using `torch.backends.cudnn.enabled=True` in downstream tasks may be quite slow. If you find VMamba quite slow on your machine, disable it in vmamba.py; otherwise, ignore this.\n* **...**\n\n***for details see [detailed_updates.md](assets\u002Fdetailed_updates.md)***\n\n## Abstract\n\nDesigning computationally efficient network architectures persists as an ongoing necessity in computer vision. In this paper, we transplant Mamba, a state-space language model, into VMamba, a vision backbone that works in linear time complexity. At the core of VMamba lies a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module. By traversing along four scanning routes, SS2D helps bridge the gap between the ordered nature of 1D selective scan and the non-sequential structure of 2D vision data, which facilitates the gathering of contextual information from various sources and perspectives. Based on the VSS blocks, we develop a family of VMamba architectures and accelerate them through a succession of architectural and implementation enhancements. 
Extensive experiments showcase VMamba’s promising performance across diverse visual perception tasks, highlighting its advantages in input scaling efficiency compared to existing benchmark models.\n\n## Overview\n\n* [**VMamba**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.10166) serves as a general-purpose backbone for computer vision.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMzeroMiko_VMamba_readme_5ad0ef57caad.png\" alt=\"architecture\" width=\"80%\">\n\u003C\u002Fp>\n\n* **2D-Selective-Scan of VMamba**\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMzeroMiko_VMamba_readme_3bd1a713ccef.png\" alt=\"arch\" width=\"80%\">\n\u003C\u002Fp>\n\n* **VMamba has global effective receptive field**\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMzeroMiko_VMamba_readme_830793141adc.png\" alt=\"erf\" width=\"80%\">\n\u003C\u002Fp>\n\n* **VMamba resembles Transformer-Based Methods in Activation Map**\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMzeroMiko_VMamba_readme_7ea80877e40d.png\" alt=\"attn\" width=\"80%\">\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMzeroMiko_VMamba_readme_ed52ea80af1a.png\" alt=\"activation\" width=\"80%\">\n\u003C\u002Fp>\n\n## Main Results\n\u003C!-- copied from assets\u002Fperformance.md  -->\n\n\u003C!-- :book: -->\n\u003C!-- ***The checkpoints of some of the models listed below will be released in weeks!*** -->\n\n:book:\n***For details see [performance.md](.\u002Fassets\u002Fperformance.md).***\n\n### **Classification on ImageNet-1K**\n| name | pretrain | resolution |acc@1 | #params | FLOPs | TP. | Train TP. 
| configs\u002Flogs\u002Fckpts |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n| Swin-T | ImageNet-1K | 224x224 | 81.2 | 28M | 4.5G | 1244 |987 | -- |\n| Swin-S | ImageNet-1K | 224x224 | 83.2 | 50M | 8.7G | 718 |642 | -- |\n| Swin-B | ImageNet-1K | 224x224 | 83.5 | 88M | 15.4G | 458 |496 | -- |\n| VMamba-S[`s2l15`] | ImageNet-1K | 224x224 | 83.6 | 50M | 8.7G | 877 | 314| [config](classification\u002Fconfigs\u002Fvssm\u002Fvmambav2_small_224.yaml)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_small_0229.txt)\u002F[ckpt](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_small_0229_ckpt_epoch_222.pth) |\n| VMamba-B[`s2l15`] | ImageNet-1K | 224x224 | 83.9 | 89M | 15.4G | 646 | 247 | [config](classification\u002Fconfigs\u002Fvssm\u002Fvmambav2_base_224.yaml)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_base_0229.txt)\u002F[ckpt](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_base_0229_ckpt_epoch_237.pth) |\n| VMamba-T[`s1l8`] | ImageNet-1K | 224x224 | 82.6 | 30M | 4.9G | 1686| 571| [config](classification\u002Fconfigs\u002Fvssm\u002Fvmambav2v_tiny_224.yaml)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm1_tiny_0230s.txt)\u002F[ckpt](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm1_tiny_0230s_ckpt_epoch_264.pth) |\n\n\n* *Models in this subsection are trained from scratch with random or manual initialization. The hyper-parameters are inherited from Swin, except for `drop_path_rate` and `EMA`. All models are trained with EMA except for the `Vanilla-VMamba-T`.*\n* *`TP.(Throughput)` and `Train TP. (Train Throughput)` are assessed on an A100 GPU paired with an AMD EPYC 7542 CPU, with batch size 128. 
`Train TP.` is tested with mix-resolution, excluding the time consumption of optimizers.*\n* *`FLOPs` and `parameters` are now gathered with `head` (In previous versions, they were counted without head, so the numbers rise a little bit).*\n* *we calculate `FLOPs` with the algorithm @albertgu [provides](https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba\u002Fissues\u002F110), which will be bigger than the previous calculation (which is based on the `selective_scan_ref` function, and ignores the hardware-aware algorithm).*\n\n\n### **Object Detection on COCO**\n  \n| Backbone | #params | FLOPs | Detector | bboxAP | bboxAP50 | bboxAP75 | segmAP | segmAP50 | segmAP75 | configs\u002Flogs\u002Fckpts |\n| :---: | :---: | :---: | :---: | :---: | :---: |:---: |:---: |:---: |:---: |:---: |\n| Swin-T | 48M | 267G | MaskRCNN@1x | 42.7 |65.2 |46.8 |39.3 |62.2 |42.2 |-- |\n| Swin-S | 69M | 354G | MaskRCNN@1x | 44.8 |66.6 |48.9 |40.9 |63.4 |44.2 |-- |\n| Swin-B | 107M | 496G | MaskRCNN@1x | 46.9|--|--| 42.3|--|--|-- |\n| VMamba-S[`s2l15`] | 70M | 384G | MaskRCNN@1x | 48.7 |70.0 |53.4 |43.7 |67.3 |47.0 | [config](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_small.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_small.log)\u002F[ckpt](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_small_epoch_11.pth) |\n| VMamba-B[`s2l15`] | 108M | 485G | MaskRCNN@1x | 49.2 |71.4 |54.0 |44.1 |68.3 |47.7 | [config](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_base.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_base.log)\u002F[ckpt](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_base_epoch_11.pth) |\n| VMamba-B[`s2l15`] | 108M | 485G | MaskRCNN@1x[`bs8`] | 49.2 |70.9 |53.9 |43.9 |67.7 |47.6 | [config](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_base.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_base_bs8.log)\u002F[ckpt](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_base_epoch_12_bs8.pth) |\n| VMamba-T[`s1l8`] | 50M | 271G | MaskRCNN@1x | 47.3 |69.3 |52.0 |42.7 |66.4 |45.9 | [config](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_tiny.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny_s.log)\u002F[ckpt](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny_s_epoch_12.pth) |\n| :---: | :---: | :---: | :---: | :---: | :---: |:---: |:---: |:---: |:---: |:---: |:---: |:---: |\n| Swin-T | 48M | 267G | MaskRCNN@3x | 46.0 |68.1 |50.3 |41.6 |65.1 |44.9 |-- |\n| Swin-S | 69M | 354G | MaskRCNN@3x | 48.2 |69.8 |52.8 |43.2 |67.0 |46.1  |-- |\n| VMamba-S[`s2l15`] | 70M | 384G | MaskRCNN@3x | 49.9 |70.9 |54.7 |44.20 |68.2 |47.7 | 
[config](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_small_ms_3x.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_small_ms_3x.log)\u002F[ckpt](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_small_ms_3x_epoch_32.pth) |\n| VMamba-T[`s1l8`] | 50M | 271G | MaskRCNN@3x | 48.8 |70.4 |53.50 |43.7 |67.4 |47.0 | [config](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_tiny_ms_3x.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny_ms_3x_s.log)\u002F[ckpt](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny_ms_3x_s_epoch_31.pth) |\n\n\n* *Models in this subsection are initialized from the models trained in `classification`.*\n* *we now calculate FLOPs with the algorithm @albertgu [provides](https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba\u002Fissues\u002F110), which will be bigger than the previous calculation (which is based on the `selective_scan_ref` function, and ignores the hardware-aware algorithm).*\n\n### **Semantic Segmentation on ADE20K**\n\n| Backbone | Input|  #params | FLOPs | Segmentor | mIoU(SS) | mIoU(MS) | configs\u002Flogs\u002Flogs(ms)\u002Fckpts |\n| :---: | :---: | :---: | :---: | :---: | :---: |:---: |:---: |\n| Swin-T | 512x512 | 60M | 945G | UperNet@160k | 44.4| 45.8| -- |\n| Swin-S | 512x512 | 81M | 1039G | UperNet@160k | 47.6| 49.5| -- |\n| Swin-B | 512x512 | 121M | 1188G | UperNet@160k | 48.1| 49.7|-- |\n| VMamba-S[`s2l15`] | 512x512 | 82M | 1028G | UperNet@160k | 50.6| 51.2|[config](segmentation\u002Fconfigs\u002Fvssm1\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_small.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_small.log)\u002F[log(ms)](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_small_tta.log)\u002F[ckpt](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_small_iter_144000.pth) |\n| VMamba-B[`s2l15`] | 512x512 | 122M | 1170G | UperNet@160k | 51.0| 51.6|[config](segmentation\u002Fconfigs\u002Fvssm1\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_base.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_base.log)\u002F[log(ms)](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_base_tta.log)\u002F[ckpt](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_base_iter_160000.pth) |\n| VMamba-T[`s1l8`] | 512x512 | 62M | 949G | UperNet@160k | 47.9| 48.8| [config](segmentation\u002Fconfigs\u002Fvssm1\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny_s.log)\u002F[log(ms)](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny_s_tta.log)\u002F[ckpt](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny_s_iter_160000.pth) |\n\n\n* *Models in this subsection are initialized from the models trained in `classification`.*\n* *we now calculate FLOPs with the algorithm @albertgu [provides](https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba\u002Fissues\u002F110), which will be bigger than the previous calculation (which is based on the `selective_scan_ref` function, and ignores the hardware-aware algorithm).*\n\n## Getting Started\n\n### Installation\n\n**Step 1: Clone the VMamba repository:**\n\nTo get started, first clone the VMamba repository and navigate to the project directory:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba.git\ncd VMamba\n```\n\n**Step 2: Environment Setup:**\n\nVMamba recommends setting up a conda environment and installing dependencies via pip. Use the following commands to set up your environment.\nWe also recommend PyTorch >= 2.0 and CUDA >= 11.8, although lower versions of PyTorch and CUDA are also supported.\n\n***Create and activate a new conda environment***\n\n```bash\nconda create -n vmamba\nconda activate vmamba\n```\n\n***Install Dependencies***\n\n```bash\npip install -r requirements.txt\ncd kernels\u002Fselective_scan && pip install .\n```\n\u003C!-- cd kernels\u002Fcross_scan && pip install . -->
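\n\n***Sanity check (optional)***\n\n*A minimal sketch to confirm the toolchain before moving on — it assumes the build above succeeded; `selective_scan_cuda_core` is one of the extension modules the test suite below refers to, and a failed import here usually points to a PyTorch\u002FCUDA version mismatch:*\n\n```bash\n# check that the installed PyTorch and the system CUDA toolkit agree\npython -c \"import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())\"\nnvcc -V\n# check that the compiled selective-scan extension can be imported\npython -c \"import selective_scan_cuda_core\"\n```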
\n\n***Check Selective Scan (optional)***\n\n* If you want to check the modules compared with `mamba_ssm`, install [`mamba_ssm`](https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba) first!\n\n* If you want to check if the implementation of `selective scan` of ours is the same as `mamba_ssm`, `selective_scan\u002Ftest_selective_scan.py` is here for you. Change to `MODE = \"mamba_ssm_sscore\"` in `selective_scan\u002Ftest_selective_scan.py`, and run `pytest selective_scan\u002Ftest_selective_scan.py`.\n\n* If you want to check if the implementation of `selective scan` of ours is the same as the reference code (`selective_scan_ref`), change to `MODE = \"sscore\"` in `selective_scan\u002Ftest_selective_scan.py`, and run `pytest selective_scan\u002Ftest_selective_scan.py`.\n\n* `MODE = \"mamba_ssm\"` stands for checking whether the results of `mamba_ssm` are close to `selective_scan_ref`, and `\"sstest\"` is preserved for development. \n\n* If you find `mamba_ssm` (`selective_scan_cuda`) or `selective_scan` (`selective_scan_cuda_core`) is not close enough to `selective_scan_ref`, and the test fails, do not worry. Check if `mamba_ssm` and `selective_scan` are close enough [instead](https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba\u002Fpull\u002F161).\n\n* ***If you are interested in selective scan, you can check [mamba](https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba), [mamba-mini](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002Fmamba-mini), [mamba.py](https:\u002F\u002Fgithub.com\u002FalxndrTL\u002Fmamba.py), and [mamba-minimal](https:\u002F\u002Fgithub.com\u002Fjohnma2006\u002Fmamba-minimal) for more information.***
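\n\nFor example, to run the comparison against `mamba_ssm` — a sketch; the `sed` edit assumes `MODE` is assigned once at the top level of the test file, so edit it manually if the pattern does not match:\n\n```bash\n# point the test suite at the mamba_ssm comparison, then run it\nsed -i 's\u002F^MODE = .*\u002FMODE = \"mamba_ssm_sscore\"\u002F' selective_scan\u002Ftest_selective_scan.py\npytest selective_scan\u002Ftest_selective_scan.py\n```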
\n\n***Dependencies for `Detection` and `Segmentation` (optional)***\n\n```bash\npip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex\npip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0\n```\n\n### Model Training and Inference\n\n**Classification**\n\nTo train VMamba models for classification on ImageNet, use the following commands for different configurations:\n\n```bash\npython -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr=\"127.0.0.1\" --master_port=29501 main.py --cfg \u003C\u002Fpath\u002Fto\u002Fconfig> --batch-size 128 --data-path \u003C\u002Fpath\u002Fof\u002Fdataset> --output \u002Ftmp\n```\n\nIf you only want to test the performance (together with params and flops):\n\n```bash\npython -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr=\"127.0.0.1\" --master_port=29501 main.py --cfg \u003C\u002Fpath\u002Fto\u002Fconfig> --batch-size 128 --data-path \u003C\u002Fpath\u002Fof\u002Fdataset> --output \u002Ftmp --pretrained \u003C\u002Fpath\u002Fof\u002Fcheckpoint>\n```\n\n***please refer to [modelcard](.\u002Fmodelcard.sh) for more details.***
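\n\nAs a concrete example, evaluating the released VMamba-T[`s1l8`] checkpoint from the classification table above on a single GPU might look like this (a sketch — it assumes ImageNet-1K is at `.\u002Fimagenet` and that the command runs from the directory containing `main.py`, with the config path adjusted to your checkout):\n\n```bash\n# fetch the released checkpoint, then evaluate it (also reports params and flops)\nwget https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm1_tiny_0230s_ckpt_epoch_264.pth\npython -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr=\"127.0.0.1\" --master_port=29501 main.py --cfg classification\u002Fconfigs\u002Fvssm\u002Fvmambav2v_tiny_224.yaml --batch-size 128 --data-path .\u002Fimagenet --output \u002Ftmp --pretrained vssm1_tiny_0230s_ckpt_epoch_264.pth\n```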
\n\n**Detection and Segmentation**\n\nTo evaluate with `mmdetection` or `mmsegmentation`:\n```bash\nbash .\u002Ftools\u002Fdist_test.sh \u003C\u002Fpath\u002Fto\u002Fconfig> \u003C\u002Fpath\u002Fto\u002Fcheckpoint> 1\n```\n*use `--tta` to get the `mIoU(ms)` in segmentation*\n\nTo train with `mmdetection` or `mmsegmentation`:\n```bash\nbash .\u002Ftools\u002Fdist_train.sh \u003C\u002Fpath\u002Fto\u002Fconfig> 8\n```\n\nFor more information about detection and segmentation tasks, please refer to the manual of [`mmdetection`](https:\u002F\u002Fmmdetection.readthedocs.io\u002Fen\u002Flatest\u002Fuser_guides\u002Ftrain.html) and [`mmsegmentation`](https:\u002F\u002Fmmsegmentation.readthedocs.io\u002Fen\u002Flatest\u002Fuser_guides\u002F4_train_test.html). Remember to use the appropriate backbone configurations in the `configs` directory.\n\n### Analysis Tools\n\nVMamba includes tools for visualizing mamba \"attention\" and the effective receptive field, and for analysing throughput and train-throughput. Use the following commands to perform analysis:\n\n```bash\n# Visualize Mamba \"Attention\"\nCUDA_VISIBLE_DEVICES=0 python analyze\u002Fattnmap.py\n\n# Analyze the effective receptive field\nCUDA_VISIBLE_DEVICES=0 python analyze\u002Ferf.py\n\n# Analyze the throughput and train throughput\nCUDA_VISIBLE_DEVICES=0 python analyze\u002Ftp.py\n\n```\n\n***We also included other analysing tools that we may use in this project. Thanks to all who have contributed to these tools.***\n\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMzeroMiko_VMamba_readme_87862bd911c4.png)](https:\u002F\u002Fstar-history.com\u002F#MzeroMiko\u002FVMamba&Date)\n\n## Citation\n\n```\n@article{liu2024vmamba,\n  title={VMamba: Visual State Space Model},\n  author={Liu, Yue and Tian, Yunjie and Zhao, Yuzhong and Yu, Hongtian and Xie, Lingxi and Wang, Yaowei and Ye, Qixiang and Liu, Yunfan},\n  journal={arXiv preprint arXiv:2401.10166},\n  year={2024}\n}\n```\n\n## Acknowledgment\n\nThis project is based on Mamba ([paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.00752), [code](https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba)), Swin-Transformer ([paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.14030.pdf), [code](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FSwin-Transformer)), ConvNeXt ([paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.03545), [code](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FConvNeXt)), [OpenMMLab](https:\u002F\u002Fgithub.com\u002Fopen-mmlab),\nand the `analyze\u002Fget_erf.py` is adopted from [replknet](https:\u002F\u002Fgithub.com\u002FDingXiaoH\u002FRepLKNet-pytorch\u002Ftree\u002Fmain\u002Ferf), thanks for their excellent work.\n\n* **We released [Fast-iTPN](https:\u002F\u002Fgithub.com\u002Fsunsmarterjie\u002FiTPN\u002Ftree\u002Fmain\u002Ffast_itpn) recently, which reports the best performance on ImageNet-1K at Tiny\u002FSmall\u002FBase level models as far as we know. (Tiny-24M-86.5%, Small-40M-87.8%, Base-85M-88.75%)**\n","\u003Cdiv align=\"center\">\n\u003Ch1>VMamba \u003C\u002Fh1>\n\u003Ch3>VMamba：视觉状态空间模型\u003C\u002Fh3>\n\n[Yue Liu](https:\u002F\u002Fgithub.com\u002FMzeroMiko)\u003Csup>1\u003C\u002Fsup>,[Yunjie Tian](https:\u002F\u002Fsunsmarterjie.github.io\u002F)\u003Csup>1\u003C\u002Fsup>,[Yuzhong Zhao](https:\u002F\u002Fscholar.google.com.hk\u002Fcitations?user=tStQNm4AAAAJ&hl=zh-CN&oi=ao)\u003Csup>1\u003C\u002Fsup>, [Hongtian Yu](https:\u002F\u002Fgithub.com\u002Fyuhongtian17)\u003Csup>1\u003C\u002Fsup>, [Lingxi Xie](https:\u002F\u002Fscholar.google.com.hk\u002Fcitations?user=EEMm7hwAAAAJ&hl=zh-CN&oi=ao)\u003Csup>2\u003C\u002Fsup>, [Yaowei Wang](https:\u002F\u002Fscholar.google.com.hk\u002Fcitations?user=o_DllmIAAAAJ&hl=zh-CN&oi=ao)\u003Csup>3\u003C\u002Fsup>, [Qixiang Ye](https:\u002F\u002Fscholar.google.com.hk\u002Fcitations?user=tjEfgsEAAAAJ&hl=zh-CN&oi=ao)\u003Csup>1\u003C\u002Fsup>, [Yunfan Liu](https:\u002F\u002Fscholar.google.com.hk\u002Fcitations?user=YPL33G0AAAAJ&hl=zh-CN&oi=ao)\u003Csup>1\u003C\u002Fsup>\n\n\u003Csup>1\u003C\u002Fsup>  中国科学院大学, \u003Csup>2\u003C\u002Fsup>  华为公司,  \u003Csup>3\u003C\u002Fsup>  鹏城实验室。\n\n论文：([arXiv 2401.10166](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.10166))\n\n\u003C\u002Fdiv>\n\n\n## 🔥 仅需***一个文件***，以***最少步骤***使用VMamba！\n```bash\nconda create -n vmamba python=3.10\npip install torch==2.2 torchvision torchaudio triton pytest chardet yacs termcolor fvcore seaborn packaging ninja einops numpy==1.24.4 timm==0.4.12\npip install https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba\u002Freleases\u002Fdownload\u002Fv2.2.4\u002Fmamba_ssm-2.2.4+cu12torch2.2cxx11abiTRUE-cp310-cp310-linux_x86_64.whl\npython vmamba.py\n```\n\n* [**更新**](#white_check_mark-updates)\n* [**摘要**](#abstract)\n* [**概述**](#overview--derivations)\n* [**主要结果**](#main-results)\n* [**快速入门**](#getting-started)\n* [**星标历史**](#star-history)\n* [**引用**](#citation)\n* [**致谢**](#acknowledgment)\n\n\n## 
:white_check_mark: 更新\n* **`2024年9月25日`**：更新：**VMamba已被NeurIPS 2024接收（亮点论文）！**\n* **`2024年6月14日`**：更新：我们优化了代码，使其更易读；增加了对`mamba2`的支持。\n* **`2024年5月26日`**：更新：我们发布了VMambav2的更新权重，并附带新的arXiv论文。\n* **`2024年5月7日`**：更新：**重要提示！** 在下游任务中使用`torch.backends.cudnn.enabled=True`可能会导致运行速度较慢。如果您发现VMamba在您的机器上运行较慢，请在vmamba.py中将其禁用，否则无需理会。\n* **...**\n\n***详情请参阅[detailed_updates.md](assets\u002Fdetailed_updates.md)***\n\n## 摘要\n\n设计计算效率高的网络架构仍然是计算机视觉领域持续的需求。本文将状态空间语言模型Mamba移植到VMamba中，构建了一种线性时间复杂度的视觉骨干网络。VMamba的核心是由带有二维选择性扫描（SS2D）模块的视觉状态空间（VSS）块堆叠而成。通过沿四条扫描路径遍历，SS2D有助于弥合一维选择性扫描的有序特性与二维视觉数据非序列结构之间的差距，从而促进从不同来源和视角收集上下文信息。基于VSS块，我们开发了一系列VMamba架构，并通过一系列架构和实现上的改进进一步加速其性能。大量实验表明，VMamba在多种视觉感知任务中表现出色，尤其在输入尺度扩展效率方面优于现有基准模型。\n\n## 概述\n\n* [**VMamba**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.10166) 是一种通用的计算机视觉骨干网络。\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMzeroMiko_VMamba_readme_5ad0ef57caad.png\" alt=\"architecture\" width=\"80%\">\n\u003C\u002Fp>\n\n* **VMamba的二维选择性扫描**\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMzeroMiko_VMamba_readme_3bd1a713ccef.png\" alt=\"arch\" width=\"80%\">\n\u003C\u002Fp>\n\n* **VMamba具有全局有效感受野**\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMzeroMiko_VMamba_readme_830793141adc.png\" alt=\"erf\" width=\"80%\">\n\u003C\u002Fp>\n\n* **VMamba在激活图上类似于基于Transformer的方法**\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMzeroMiko_VMamba_readme_7ea80877e40d.png\" alt=\"attn\" width=\"80%\">\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMzeroMiko_VMamba_readme_ed52ea80af1a.png\" alt=\"activation\" width=\"80%\">\n\u003C\u002Fp>\n\n## 主要结果\n\u003C!-- copied from assets\u002Fperformance.md  -->\n\n\u003C!-- :book: -->\n\u003C!-- ***The checkpoints of some of the models listed below will be released in weeks!*** -->\n\n:book:\n***详情请参阅[performance.md](.\u002Fassets\u002Fperformance.md)。***\n\n### **ImageNet-1K上的分类**\n| 名称 | 预训练 | 分辨率 | acc@1 | 参数量 | FLOPs | TP. | 训练TP. 
| 配置\u002F日志\u002F检查点 |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n| Swin-T | ImageNet-1K | 224x224 | 81.2 | 28M | 4.5G | 1244 |987 | -- |\n| Swin-S | ImageNet-1K | 224x224 | 83.2 | 50M | 8.7G | 718 |642 | -- |\n| Swin-B | ImageNet-1K | 224x224 | 83.5 | 88M | 15.4G | 458 |496 | -- |\n| VMamba-S[`s2l15`] | ImageNet-1K | 224x224 | 83.6 | 50M | 8.7G | 877 | 314| [配置](classification\u002Fconfigs\u002Fvssm\u002Fvmambav2_small_224.yaml)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_small_0229.txt)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_small_0229_ckpt_epoch_222.pth) |\n| VMamba-B[`s2l15`] | ImageNet-1K | 224x224 | 83.9 | 89M | 15.4G | 646 | 247 | [配置](classification\u002Fconfigs\u002Fvssm\u002Fvmambav2_base_224.yaml)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_base_0229.txt)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_base_0229_ckpt_epoch_237.pth) |\n| VMamba-T[`s1l8`] | ImageNet-1K | 224x224 | 82.6 | 30M | 4.9G | 1686| 571| [配置](classification\u002Fconfigs\u002Fvssm\u002Fvmambav2v_tiny_224.yaml)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm1_tiny_0230s.txt)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm1_tiny_0230s_ckpt_epoch_264.pth) |\n\n\n* *本小节中的模型均采用随机或手动初始化从头开始训练。超参数沿用了Swin的设置，但`drop_path_rate`和`EMA`除外。除`Vanilla-VMamba-T`外，所有模型均使用EMA进行训练。*\n* *`TP.(吞吐量)`和`Train TP. (训练吞吐量)`是在A100 GPU搭配AMD EPYC 7542 CPU的环境下，以批量大小128进行评估的。`训练吞吐量`测试时采用了混合分辨率，未计入优化器的时间消耗。*\n* *`FLOPs`和`参数量`现在包含了头部部分（在之前的版本中，这些数值并未计算头部，因此会略高一些）。*\n* *我们使用@albertgu提供的算法来计算`FLOPs`，该算法得出的结果会比之前基于`selective_scan_ref`函数且未考虑硬件优化的计算方法更大。*\n\n### COCO 数据集上的目标检测\n  \n| 主干网络 | 参数量 | FLOPs | 检测器 | bboxAP | bboxAP50 | bboxAP75 | segmAP | segmAP50 | segmAP75 | 配置\u002F日志\u002F检查点 |\n| :---: | :---: | :---: | :---: | :---: | :---: |:---: |:---: |:---: |:---: |:---: |\n| Swin-T | 48M | 267G | MaskRCNN@1x | 42.7 |65.2 |46.8 |39.3 |62.2 |42.2 |-- |\n| Swin-S | 69M | 354G | MaskRCNN@1x | 44.8 |66.6 |48.9 |40.9 |63.4 |44.2 |-- |-- |\n| Swin-B | 107M | 496G | MaskRCNN@1x | 46.9|--|--| 42.3|--|--|-- |-- |\n| VMamba-S[`s2l15`] | 70M | 384G | MaskRCNN@1x | 48.7 |70.0 |53.4 |43.7 |67.3 |47.0 | [配置](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_small.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_small.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_small_epoch_11.pth) |\n| VMamba-B[`s2l15`] | 108M | 485G | MaskRCNN@1x | 49.2 |71.4 |54.0 |44.1 |68.3 |47.7 | [配置](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_base.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_base.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_base_epoch_11.pth) |\n| VMamba-B[`s2l15`] | 108M | 485G | MaskRCNN@1x[`bs8`] | 49.2 |70.9 |53.9 |43.9 |67.7 |47.6 | 
[配置](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_base.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_base_bs8.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_base_epoch_12_bs8.pth) |\n| VMamba-T[`s1l8`] | 50M | 271G | MaskRCNN@1x | 47.3 |69.3 |52.0 |42.7 |66.4 |45.9 | [配置](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_tiny.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny_s.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny_s_epoch_12.pth) |\n| :---: | :---: | :---: | :---: | :---: | :---: |:---: |:---: |:---: |:---: |:---: |:---: |:---: |\n| Swin-T | 48M | 267G | MaskRCNN@3x | 46.0 |68.1 |50.3 |41.6 |65.1 |44.9 |-- |\n| Swin-S | 69M | 354G | MaskRCNN@3x | 48.2 |69.8 |52.8 |43.2 |67.0 |46.1  |-- |\n| VMamba-S[`s2l15`] | 70M | 384G | MaskRCNN@3x | 49.9 |70.9 |54.7 |44.20 |68.2 |47.7 | [配置](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_small_ms_3x.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_small_ms_3x.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_small_ms_3x_epoch_32.pth) |\n| VMamba-T[`s1l8`] | 50M | 271G | MaskRCNN@3x | 48.8 |70.4 |53.50 |43.7 |67.4 |47.0 | [配置](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_tiny_ms_3x.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny_ms_3x_s.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny_ms_3x_s_epoch_31.pth) |\n\n\n* *本小节中的模型均从分类任务中训练好的模型初始化。*\n* *我们现在使用 @albertgu 提供的算法计算 FLOPs（见 GitHub issue #110），这比之前的计算结果要大（之前的计算基于 `selective_scan_ref` 函数，未考虑硬件相关的优化算法）。*\n\n### ADE20K 数据集上的语义分割\n\n| 主干网络 | 输入尺寸 | 参数量 | FLOPs | 分割器 | mIoU(SS) | mIoU(MS) | 配置\u002F日志\u002F多尺度日志\u002F检查点 |\n| :---: | :---: | :---: | :---: | :---: | :---: |:---: |:---: |\n| Swin-T | 512x512 | 60M | 945G | UperNet@160k | 44.4| 45.8| -- |\n| Swin-S | 512x512 | 81M | 1039G | UperNet@160k | 47.6| 49.5| -- |\n| Swin-B | 512x512 | 121M | 1188G | UperNet@160k | 48.1| 49.7|-- |\n| VMamba-S[`s2l15`] | 512x512 | 82M | 1028G | UperNet@160k | 50.6| 51.2|[配置](segmentation\u002Fconfigs\u002Fvssm1\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_small.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_small.log)\u002F[多尺度日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_small_tta.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_small_iter_144000.pth) |\n| VMamba-B[`s2l15`] | 512x512 | 122M | 1170G | UperNet@160k | 51.0| 
51.6|[配置](segmentation\u002Fconfigs\u002Fvssm1\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_base.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_base.log)\u002F[多尺度日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_base_tta.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_base_iter_160000.pth) |\n| VMamba-T[`s1l8`] | 512x512 | 62M | 949G | UperNet@160k | 47.9| 48.8| [配置](segmentation\u002Fconfigs\u002Fvssm1\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny_s.log)\u002F[多尺度日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny_s_tta.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny_s_iter_160000.pth) |\n\n\n* *本小节中的模型均从分类任务中训练好的模型初始化。*\n* *我们现在使用 @albertgu 提供的算法计算 FLOPs（见 GitHub issue #110），这比之前的计算结果要大（之前的计算基于 `selective_scan_ref` 函数，未考虑硬件相关的优化算法）。*\n\n## 入门\n\n### 安装\n\n**步骤 1：克隆 VMamba 仓库：**\n\n首先，克隆 VMamba 仓库并进入项目目录：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba.git\ncd VMamba\n```\n\n**步骤 2：环境设置：**\n\nVMamba 建议使用 conda 创建虚拟环境，并通过 pip 安装依赖。请使用以下命令设置环境：\n此外，我们推荐使用 PyTorch ≥ 2.0 和 CUDA ≥ 11.8，但较低版本的 PyTorch 和 CUDA 也受支持。\n\n***创建并激活一个新的 conda 环境***\n\n```bash\nconda create -n vmamba\nconda activate vmamba\n```\n\n***安装依赖***\n\n```bash\npip install -r requirements.txt\ncd kernels\u002Fselective_scan && pip install .\n```\n\u003C!-- cd kernels\u002Fcross_scan && pip install . 
-->\n\n***检查选择性扫描（可选）***\n\n* 如果您想将模块与 `mamba_ssm` 进行对比测试，请先安装 [`mamba_ssm`](https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba)！\n\n* 若要检查我们的 `selective scan` 实现是否与 `mamba_ssm` 一致，可将 `selective_scan\u002Ftest_selective_scan.py` 中的 `MODE` 设置为 `\"mamba_ssm_sscore\"`，然后运行 `pytest selective_scan\u002Ftest_selective_scan.py`。\n\n* 若想验证我们的 `selective scan` 实现是否与参考代码（`selective_scan_ref`）一致，可将 `selective_scan\u002Ftest_selective_scan.py` 中的 `MODE` 设置为 `\"sscore\"`，再运行 `pytest selective_scan\u002Ftest_selective_scan.py`。\n\n* `MODE = \"mamba_ssm\"` 表示检查 `mamba_ssm` 的结果是否接近 `selective_scan_ref`，而 `\"sstest\"` 则保留用于开发。\n\n* 如果发现 `mamba_ssm`（`selective_scan_cuda`）或 `selective_scan`（`selective_scan_cuda_core`）与 `selective_scan_ref` 差距较大，测试失败也不要担心。此时请改为检查 `mamba_ssm` 与 `selective_scan` 两者的结果是否足够接近（[参见此处](https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba\u002Fpull\u002F161)）。\n\n* ***如果您对选择性扫描感兴趣，可以进一步了解 [mamba](https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba)、[mamba-mini](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002Fmamba-mini)、[mamba.py](https:\u002F\u002Fgithub.com\u002FalxndrTL\u002Fmamba.py) 和 [mamba-minimal](https:\u002F\u002Fgithub.com\u002Fjohnma2006\u002Fmamba-minimal) 等项目。***\n\n***检测与分割任务的依赖（可选）***\n\n```bash\npip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex\npip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0\n```\n\n### 模型训练与推理\n\n**分类**\n\n要在 ImageNet 数据集上训练 VMamba 分类模型，可根据不同配置使用以下命令：\n\n```bash\npython -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr=\"127.0.0.1\" --master_port=29501 main.py --cfg \u003C\u002Fpath\u002Fto\u002Fconfig> --batch-size 128 --data-path \u003C\u002Fpath\u002Fof\u002Fdataset> --output \u002Ftmp\n```\n\n若仅需测试性能（包括参数量和 FLOPs）：\n\n```bash\npython -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr=\"127.0.0.1\" --master_port=29501 main.py --cfg \u003C\u002Fpath\u002Fto\u002Fconfig> --batch-size 128 --data-path \u003C\u002Fpath\u002Fof\u002Fdataset> --output \u002Ftmp --pretrained \u003C\u002Fpath\u002Fof\u002Fcheckpoint>\n```\n\n***更多详情请参阅 [modelcard](.\u002Fmodelcard.sh)。***\n\n**检测与分割**\n\n使用 `mmdetection` 或 `mmsegmentation` 进行评估：\n\n```bash\nbash .\u002Ftools\u002Fdist_test.sh \u003C\u002Fpath\u002Fto\u002Fconfig> \u003C\u002Fpath\u002Fto\u002Fcheckpoint> 1\n```\n* 使用 `--tta` 可获得分割任务中的 `mIoU(ms)`。\n\n使用 `mmdetection` 或 `mmsegmentation` 进行训练：\n\n```bash\nbash .\u002Ftools\u002Fdist_train.sh \u003C\u002Fpath\u002Fto\u002Fconfig> 8\n```\n\n有关检测和分割任务的更多信息，请参阅 [`mmdetection`](https:\u002F\u002Fmmdetection.readthedocs.io\u002Fen\u002Flatest\u002Fuser_guides\u002Ftrain.html) 和 [`mmsegmentation`](https:\u002F\u002Fmmsegmentation.readthedocs.io\u002Fen\u002Flatest\u002Fuser_guides\u002F4_train_test.html) 的官方文档。请务必在 `configs` 目录中使用合适的骨干网络配置。\n\n### 分析工具\n\nVMamba 提供了用于可视化 Mamba “注意力”和有效感受野、分析吞吐量及训练吞吐量的工具。请使用以下命令进行分析：\n\n```bash\n# 可视化 Mamba “注意力”\nCUDA_VISIBLE_DEVICES=0 python analyze\u002Fattnmap.py\n\n# 分析有效感受野\nCUDA_VISIBLE_DEVICES=0 python analyze\u002Ferf.py\n\n# 分析吞吐量和训练吞吐量\nCUDA_VISIBLE_DEVICES=0 python analyze\u002Ftp.py\n\n```\n\n***我们还包含了一些在此项目中可能用到的其他分析工具，感谢所有为这些工具做出贡献的人。***\n\n## 星标历史\n\n[![星标历史图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMzeroMiko_VMamba_readme_87862bd911c4.png)](https:\u002F\u002Fstar-history.com\u002F#MzeroMiko\u002FVMamba&Date)\n\n## 引用\n\n```\n@article{liu2024vmamba,\n  title={VMamba: Visual State Space Model},\n  author={Liu, Yue and Tian, Yunjie and Zhao, Yuzhong and Yu, Hongtian and Xie, Lingxi and Wang, Yaowei and Ye, Qixiang and Liu, Yunfan},\n  journal={arXiv preprint 
arXiv:2401.10166},\n  year={2024}\n}\n```\n\n## 致谢\n\n本项目基于 Mamba（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.00752)、[代码](https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba)）、Swin-Transformer（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.14030.pdf)、[代码](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FSwin-Transformer)）、ConvNeXt（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.03545)、[代码](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FConvNeXt)）、[OpenMMLab](https:\u002F\u002Fgithub.com\u002Fopen-mmlab)，\n以及从 [replknet](https:\u002F\u002Fgithub.com\u002FDingXiaoH\u002FRepLKNet-pytorch\u002Ftree\u002Fmain\u002Ferf) 借鉴的 `analyze\u002Fget_erf.py`，感谢他们的杰出工作。\n\n* **我们最近发布了 [Fast-iTPN](https:\u002F\u002Fgithub.com\u002Fsunsmarterjie\u002FiTPN\u002Ftree\u002Fmain\u002Ffast_itpn)，据我们所知，该模型在 Tiny\u002FSmall\u002FBase 级别上取得了 ImageNet-1K 数据集的最佳性能。（Tiny-24M-86.5%，Small-40M-87.8%，Base-85M-88.75%）**","# VMamba 快速上手指南\n\nVMamba 是一种基于视觉状态空间模型（Visual State Space Model）的通用视觉骨干网络，具有线性时间复杂度，在图像分类、目标检测和语义分割等任务中表现优异。\n\n## 环境准备\n\n*   **操作系统**: Linux (x86_64)\n*   **Python**: 3.10\n*   **GPU**: 支持 CUDA 的 NVIDIA 显卡 (示例命令基于 CUDA 12)\n*   **编译器**: 需安装 `ninja` 以加速编译\n\n## 安装步骤\n\n请按照以下顺序执行命令来配置环境并安装依赖。\n\n### 1. 创建 Conda 环境\n```bash\nconda create -n vmamba python=3.10\nconda activate vmamba\n```\n\n### 2. 安装基础依赖\n推荐使用国内镜像源（如清华源）加速下载：\n```bash\npip install torch==2.2 torchvision torchaudio triton pytest chardet yacs termcolor fvcore seaborn packaging ninja einops numpy==1.24.4 timm==0.4.12 -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 3. 安装 Mamba SSM 核心组件\n安装预编译的 `mamba_ssm` wheel 包（适用于 CUDA 12 + Torch 2.2 + Python 3.10）：\n```bash\npip install https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba\u002Freleases\u002Fdownload\u002Fv2.2.4\u002Fmamba_ssm-2.2.4+cu12torch2.2cxx11abiTRUE-cp310-cp310-linux_x86_64.whl\n```\n> **注意**：如果您的 CUDA 版本或 Python 版本不同，请访问 [Mamba Releases](https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba\u002Freleases) 查找对应的 `.whl` 文件进行安装。\n\n### 4. 
获取项目代码\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba.git\ncd VMamba\n```\n\n## 基本使用\n\n项目提供了一个极简的单文件测试脚本，用于验证安装是否成功并运行模型。\n\n### 运行测试脚本\n在项目根目录下直接运行：\n```bash\npython vmamba.py\n```\n\n### 性能优化提示\n如果在下游任务中发现推理速度较慢，请检查 `vmamba.py` 中的设置。根据官方更新日志，在某些机器上禁用 cuDNN 可能提升速度：\n确保代码中未强制开启 `torch.backends.cudnn.enabled=True`，或者根据实际测试结果手动调整该选项。\n\n---\n*更多详细配置（如分类、检测、分割任务的训练与评估）请参考项目目录下的 `classification`, `detection`, `segmentation` 文件夹及对应的配置文件。*","某自动驾驶初创公司的算法团队正在开发车载实时路况感知系统，需要在算力受限的边缘设备上处理高分辨率摄像头数据，以精准识别远处的交通标志和行人。\n\n### 没有 VMamba 时\n- **全局信息捕捉困难**：传统的卷积神经网络（CNN）受限于局部感受野，难以有效关联图像中相距较远的上下文信息，导致在复杂背景下漏检小目标。\n- **计算资源消耗过大**：若改用 Vision Transformer 来获取全局视野，其二次方级的计算复杂度会让边缘设备推理延迟飙升，无法满足实时性要求。\n- **长序列建模效率低**：面对高分辨率输入，现有模型往往需要大幅下采样牺牲细节，或陷入显存溢出困境，难以平衡精度与速度。\n- **部署调优成本高**：为了在有限算力上跑通模型，工程师需花费大量时间进行剪枝、量化等复杂的后处理优化，且效果往往不尽如人意。\n\n### 使用 VMamba 后\n- **线性复杂度实现全局感知**：VMamba 凭借独特的二维选择性扫描机制（SS2D），以线性时间复杂度构建了全局有效感受野，轻松捕捉远距离依赖关系。\n- **边缘端推理流畅高效**：在保持高精度的同时，VMamba 显著降低了计算负载，使高分辨率图像在嵌入式芯片上的处理帧率提升了数倍，满足实时控制需求。\n- **原生支持高分辨率输入**：得益于高效的架构设计，团队可直接输入原始高清画面而无需过度压缩，显著提升了对远处微小交通标志的识别准确率。\n- **落地流程大幅简化**：VMamba 代码库简洁且易于集成，团队减少了繁琐的模型压缩步骤，将原本数周的适配周期缩短至几天，快速完成了原型验证。\n\nVMamba 成功打破了视觉模型中“全局感知”与“计算效率”不可兼得的僵局，让高性能视觉 backbone 在资源受限的边缘场景中真正落地成为可能。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMzeroMiko_VMamba_83079314.png","MzeroMiko","Liu Yue","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FMzeroMiko_320c48be.jpg","PhD student in UCAS","UCAS","Beijing, China",null,"https:\u002F\u002Fgithub.com\u002FMzeroMiko",[83,87,91,95,99,103],{"name":84,"color":85,"percentage":86},"Python","#3572A5",95.4,{"name":88,"color":89,"percentage":90},"Cuda","#3A4E3A",2.6,{"name":92,"color":93,"percentage":94},"C++","#f34b7d",1.1,{"name":96,"color":97,"percentage":98},"Shell","#89e051",0.6,{"name":100,"color":101,"percentage":102},"C","#555555",0.2,{"name":104,"color":105,"percentage":106},"Jupyter Notebook","#DA5B0B",0.1,3124,233,"2026-04-17T03:46:40","MIT","Linux","需要 NVIDIA GPU，预编译包指定 CUDA 12 (cu12)，测试环境使用 A100","未说明",{"notes":115,"python":116,"dependencies":117},"1. 官方提供的快速安装命令仅包含 Linux x86_64 的预编译 wheel 包 (mamba_ssm)，Windows 和 macOS 用户可能需要从源码编译 mamba-ssm。2. 在下游任务中若发现运行缓慢，建议在代码中禁用 `torch.backends.cudnn.enabled`。3. 
该工具基于视觉状态空间模型 (VSS)，核心依赖特定的 `mamba-ssm` 库版本。","3.10",[118,119,120,121,122,123,124,125,126,127],"torch==2.2","torchvision","torchaudio","triton","mamba-ssm==2.2.4","timm==0.4.12","numpy==1.24.4","einops","ninja","fvcore",[15],"2026-03-27T02:49:30.150509","2026-04-18T09:19:35.727265",[132,137,142,147,152,156],{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},39616,"安装后出现 'ModuleNotFoundError: No module named selective_scan_cuda' 或无法导入相关模块的警告，如何解决？","这通常是因为 CUDA 版本不匹配或环境配置问题。建议严格按照作者提供的 requirements.txt 升级 CUDA 版本并重新安装环境。例如，如果使用的是 PyTorch 2.0.0+cu118，请确保系统安装的 cudatoolkit 也是 11.8 版本。推荐使用 conda install 安装 PyTorch，因为它会自动下载匹配的 cudatoolkit；若使用 pip 安装，需手动调整 cudatoolkit 版本以保持一致。可以通过运行 `nvcc -V` 检查当前 CUDA 版本。","https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Fissues\u002F55",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},39617,"遇到 'Cannot import selective_scan_cuda_oflex' 警告导致训练速度极慢（甚至显示需要几百天），该怎么办？","该问题通常由 CUDA 版本过低引起（如 CUDA 11.1 不支持 selective scan 优化）。解决方案是升级 CUDA 版本，并严格依据项目的 requirements.txt 重新配置环境。有用户反馈在 mmcv v1.5.2 + CUDA 11.1 + PyTorch 1.9 环境下无法工作，升级后可解决。","https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Fissues\u002F291",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},39618,"编译 selective_scan 时出现 'ninja -v returned non-zero exit status 1' 或 '.o 文件不存在' 的错误，如何修复？","部分环境中 `ninja -v` 命令不被支持，可尝试修改构建脚本，将 `command = ['ninja', '-v']` 改为 `command = ['ninja', '--version']`。此外，确保 PyTorch 版本与 CUDA 版本完全匹配（例如 torch 2.0.0+cu118 对应 CUDA 11.8）。建议使用 conda 安装 PyTorch 以自动处理依赖，若使用 pip 安装，需手动确认 `nvcc -V` 显示的版本与 PyTorch 要求的版本一致，并正确设置 CUDA_HOME 环境变量指向包含 bin 文件的 CUDA 目录。","https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Fissues\u002F64",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},39619,"VMamba 在分割任务中的表现为何远超 Swin-T，即使训练迭代次数相同？","VMamba 在早期迭代中表现优异可能与其模型特性有关，但也存在过拟合下游任务的风险。有用户发现增加 drop_path 或 decay 能改善结果。此外，VMamba 的训练速度通常比 Swin-T 慢（例如 VMamba-T 耗时约 11.5 小时，而 Swin-T 约 6 小时），更长的训练时间可能带来了性能提升。研究者需在速度与性能之间进行权衡。","https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Fissues\u002F328",{"id":153,"question_zh":154,"answer_zh":155,"source_url":136},39620,"是否必须同时使用 selective_scan_cuda_core、selective_scan_cuda 和 selective_scan_cuda_oflex 这三个模块？","虽然缺少这些模块代码仍可运行，但会严重影响速度（出现 'This affects speed' 警告）。为了获得最佳性能，应确保这三个模块都能成功导入。如果只能修复其中一个（如 oflex），通常是因为 CUDA 版本或编译环境问题导致其他模块构建失败，建议检查并统一 CUDA 与 PyTorch 版本后重新编译安装。",{"id":157,"question_zh":158,"answer_zh":159,"source_url":146},39621,"使用 pip 安装 PyTorch 后，运行 pip install . 报错找不到 CUDA_HOME 或 nvcc 命令，如何解决？","这是因为 pip 安装的 PyTorch 不会自动配置系统的 cudatoolkit。解决方法是：1. 使用 `nvcc -V` 检查系统是否安装了正确的 CUDA 版本；2. 若未找到命令，需手动安装对应版本的 cudatoolkit 并将其 bin 目录（如 \u002Fusr\u002Flocal\u002Fcuda\u002Fbin）添加到环境变量 PATH 中；3. 
设置 CUDA_HOME 环境变量指向 CUDA 安装根目录。推荐直接使用 `conda install pytorch torchvision torchaudio cudatoolkit=xx.x -c pytorch` 来避免此类路径配置问题。",[161,166,171,176,181,186,191],{"id":162,"version":163,"summary_zh":164,"released_at":165},315569,"#v0seg","### **ADE20K 数据集上的语义分割**\n\n| 主干网络 | 输入分辨率 | 参数量 | FLOPs | 分割模型 | mIoU(SS) | mIoU(MS) | 配置文件\u002F日志\u002F多尺度测试日志\u002F检查点 |\n| :---: | :---: | :---: | :---: | :---: | :---: |:---: |:---: |\n| Vanilla-VMamba-T| 512×512 | 55M | ~~939G~~ 964G | UperNet@16万次迭代 | 47.3| 48.3| [配置](segmentation\u002Fconfigs\u002Fvssm\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240223\u002Fvssmtiny_upernet_4xb4-160k_ade20k-512x512.log)\u002F[多尺度测试日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240223\u002Fvssmtiny_upernet_4xb4-160k_ade20k-512x512_iter_160000_tta.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240223\u002Fvssmtiny_upernet_4xb4-160k_ade20k-512x512_iter_160000.pth) |\n| Vanilla-VMamba-S| 512×512 | 76M | ~~1037G~~ 1081G | UperNet@16万次迭代 | 49.5| 50.5|[配置](segmentation\u002Fconfigs\u002Fvssm\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_small.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240223\u002Fvssmsmall_upernet_4xb4-160k_ade20k-512x512.log)\u002F[多尺度测试日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240223\u002Fvssmsmall_upernet_4xb4-160k_ade20k-512x512_iter_160000_tta.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240223\u002Fvssmsmall_upernet_4xb4-160k_ade20k-512x512_iter_160000.pth) |\n| Vanilla-VMamba-B| 512×512 | 110M | ~~1167G~~ 1226G | UperNet@16万次迭代 | 50.0| 51.3|[配置](segmentation\u002Fconfigs\u002Fvssm\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_base.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240223\u002Fvssmbase_upernet_4xb4-160k_ade20k-512x512.log)\u002F[多尺度测试日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240223\u002Fvssmbase_upernet_4xb4-160k_ade20k-512x512_iter_128000_tta.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240223\u002Fvssmbase_upernet_4xb4-160k_ade20k-512x512_iter_128000.pth) |\r\n\r\n","2024-02-22T16:51:06",{"id":167,"version":168,"summary_zh":169,"released_at":170},315570,"#v0det","### **COCO 数据集上的目标检测**\n\n| 主干网络 | 参数量 | FLOPs | 检测器 | bboxAP | bboxAP50 | bboxAP75 | segmAP | segmAP50 | segmAP75 | 配置\u002F日志\u002F检查点 |\n| :---: | :---: | :---: | :---: | :---: | :---: |:---: |:---: |:---: |:---: |:---: |\n| Vanilla-VMamba-T | 42M | ~~262G~~ 286G | MaskRCNN@1x | 46.5 |68.5 |50.7 |42.1 |65.5 |45.3  | [配置](detection\u002Fconfigs\u002Fvssm\u002Fmask_rcnn_vssm_fpn_coco_tiny.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmtiny_mask_rcnn_swin_fpn_coco.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmtiny_mask_rcnn_swin_fpn_coco_epoch_12.pth) |\n| Vanilla-VMamba-S | 64M | ~~357G~~ 400G | MaskRCNN@1x | 48.2 |69.7 |52.5 |43.0 |66.6 |46.4 | 
[配置](detection\u002Fconfigs\u002Fvssm\u002Fmask_rcnn_vssm_fpn_coco_small.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmsmall_mask_rcnn_swin_fpn_coco.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmsmall_mask_rcnn_swin_fpn_coco_epoch_12.pth) |\n| Vanilla-VMamba-B | 96M | ~~482G~~ 540G | MaskRCNN@1x | 48.6 |70.0 |53.1 |43.3 |67.1 |46.7  | [配置](detection\u002Fconfigs\u002Fvssm\u002Fmask_rcnn_vssm_fpn_coco_base.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmbase_mask_rcnn_swin_fpn_coco.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmbase_mask_rcnn_swin_fpn_coco_epoch_12.pth) |\n| :---: | :---: | :---: | :---: | :---: | :---: |:---: |:---: |:---: |:---: |:---: |:---: |:---: |\n| Vanilla-VMamba-T | 42M | ~~262G~~ 286G | MaskRCNN@3x | 48.5 |70.0 |52.7 |43.2 |66.9 |46.4 | [配置](detection\u002Fconfigs\u002Fvssm\u002Fmask_rcnn_vssm_fpn_coco_tiny_ms_3x.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmtiny_mask_rcnn_swin_fpn_coco_ms_3x.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmtiny_mask_rcnn_swin_fpn_coco_ms_3x_epoch_34.pth) |\n| Vanilla-VMamba-S | 64M | ~~357G~~ 400G | MaskRCNN@3x | 49.7 |70.4 |54.2 |44.0 |67.6 |47.3 | [配置](detection\u002Fconfigs\u002Fvssm\u002Fmask_rcnn_vssm_fpn_coco_small_ms_3x.py)\u002F[日志](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmsmall_mask_rcnn_swin_fpn_coco_ms_3x.log)\u002F[检查点](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmsmall_mask_rcnn_swin_fpn_coco_ms_3x_epoch_34.pth) |\n\n","2024-02-22T16:09:11",{"id":172,"version":173,"summary_zh":174,"released_at":175},315571,"#v0cls","`VMamba`（即 `vssm version 0`）的检查点\n\n这些检查点对应于 #20240119 日期之前进行的实验。\n\n| 名称       | 预训练数据集   | 分辨率    | 精度@1 | 参数量 | FLOPs    | 最佳 epoch | 是否使用 EMA | 配置文件                     |\n|------------|----------------|-----------|---------|--------|----------|------------|-------------|------------------------------|\n| VMamba-T   | ImageNet-1K    | 224×224  | 82.2    | 22M    | ~~4.5G~~ 5.6G | 292        | 未启用      | [config](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Fblob\u002F%2320240218\u002Fclassification\u002Fconfigs\u002Fvssm\u002Fvssm_tiny_224.yaml) |\n| VMamba-S   | ImageNet-1K    | 224×224  | 83.5    | 44M    | ~~9.1G~~ 11.2G | 238        | 是          | [config](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Fblob\u002F%2320240218\u002Fclassification\u002Fconfigs\u002Fvssm\u002Fvssm_small_224.yaml) |\n| VMamba-B   | ImageNet-1K    | 224×224  | 83.2    | 75M    | ~~15.2G~~ 18.0G | 260        | 未启用      | [config](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Fblob\u002F%2320240218\u002Fclassification\u002Fconfigs\u002Fvssm\u002Fvssm_base_224.yaml) |\n| VMamba-B*  | ImageNet-1K    | 224×224  | 83.7    | 75M    | ~~15.2G~~ 18.0G | 241        | 是          | [config](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Fblob\u002F%2320240218\u002Fclassification\u002Fconfigs\u002Fvssm\u002Fvssm_base_224.yaml) |\n\n*大多数主干模型在训练时未使用 EMA，因为据 Swin-Transformer 论文报道，EMA 
","2024-02-22T16:51:06",{"id":167,"version":168,"summary_zh":169,"released_at":170},315570,"#v0det","### **Object detection on COCO**\n\n| Backbone | #params | FLOPs | Detector | bboxAP | bboxAP50 | bboxAP75 | segmAP | segmAP50 | segmAP75 | config\u002Flog\u002Fcheckpoint |\n| :---: | :---: | :---: | :---: | :---: | :---: |:---: |:---: |:---: |:---: |:---: |\n| Vanilla-VMamba-T | 42M | ~~262G~~ 286G | MaskRCNN@1x | 46.5 |68.5 |50.7 |42.1 |65.5 |45.3 | [config](detection\u002Fconfigs\u002Fvssm\u002Fmask_rcnn_vssm_fpn_coco_tiny.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmtiny_mask_rcnn_swin_fpn_coco.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmtiny_mask_rcnn_swin_fpn_coco_epoch_12.pth) |\n| Vanilla-VMamba-S | 64M | ~~357G~~ 400G | MaskRCNN@1x | 48.2 |69.7 |52.5 |43.0 |66.6 |46.4 | [config](detection\u002Fconfigs\u002Fvssm\u002Fmask_rcnn_vssm_fpn_coco_small.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmsmall_mask_rcnn_swin_fpn_coco.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmsmall_mask_rcnn_swin_fpn_coco_epoch_12.pth) |\n| Vanilla-VMamba-B | 96M | ~~482G~~ 540G | MaskRCNN@1x | 48.6 |70.0 |53.1 |43.3 |67.1 |46.7 | [config](detection\u002Fconfigs\u002Fvssm\u002Fmask_rcnn_vssm_fpn_coco_base.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmbase_mask_rcnn_swin_fpn_coco.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmbase_mask_rcnn_swin_fpn_coco_epoch_12.pth) |\n| Vanilla-VMamba-T | 42M | ~~262G~~ 286G | MaskRCNN@3x | 48.5 |70.0 |52.7 |43.2 |66.9 |46.4 | [config](detection\u002Fconfigs\u002Fvssm\u002Fmask_rcnn_vssm_fpn_coco_tiny_ms_3x.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmtiny_mask_rcnn_swin_fpn_coco_ms_3x.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmtiny_mask_rcnn_swin_fpn_coco_ms_3x_epoch_34.pth) |\n| Vanilla-VMamba-S | 64M | ~~357G~~ 400G | MaskRCNN@3x | 49.7 |70.4 |54.2 |44.0 |67.6 |47.3 | [config](detection\u002Fconfigs\u002Fvssm\u002Fmask_rcnn_vssm_fpn_coco_small_ms_3x.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmsmall_mask_rcnn_swin_fpn_coco_ms_3x.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%2320240222\u002Fvssmsmall_mask_rcnn_swin_fpn_coco_ms_3x_epoch_34.pth) |
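\n\nLikewise, a minimal loading sketch for the detection checkpoints, assuming mmdetection is installed (file paths are illustrative):\n\n```python\nfrom mmdet.apis import inference_detector, init_detector\n\n# The config ships with this repo; the checkpoint is a release asset linked above.\nconfig = 'detection\u002Fconfigs\u002Fvssm\u002Fmask_rcnn_vssm_fpn_coco_tiny.py'\nckpt = 'vssmtiny_mask_rcnn_swin_fpn_coco_epoch_12.pth'\n\nmodel = init_detector(config, ckpt, device='cuda:0')\nresult = inference_detector(model, 'demo.jpg')  # boxes plus instance masks\n```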
","2024-02-22T16:09:11",{"id":172,"version":173,"summary_zh":174,"released_at":175},315571,"#v0cls","Checkpoints of `VMamba` (i.e., `vssm version 0`)\n\nThese checkpoints correspond to experiments run before #20240119.\n\n| name | pretrain dataset | resolution | acc@1 | #params | FLOPs | best epoch | EMA | config |\n|------|------|------|------|------|------|------|------|------|\n| VMamba-T | ImageNet-1K | 224×224 | 82.2 | 22M | ~~4.5G~~ 5.6G | 292 | no | [config](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Fblob\u002F%2320240218\u002Fclassification\u002Fconfigs\u002Fvssm\u002Fvssm_tiny_224.yaml) |\n| VMamba-S | ImageNet-1K | 224×224 | 83.5 | 44M | ~~9.1G~~ 11.2G | 238 | yes | [config](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Fblob\u002F%2320240218\u002Fclassification\u002Fconfigs\u002Fvssm\u002Fvssm_small_224.yaml) |\n| VMamba-B | ImageNet-1K | 224×224 | 83.2 | 75M | ~~15.2G~~ 18.0G | 260 | no | [config](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Fblob\u002F%2320240218\u002Fclassification\u002Fconfigs\u002Fvssm\u002Fvssm_base_224.yaml) |\n| VMamba-B* | ImageNet-1K | 224×224 | 83.7 | 75M | ~~15.2G~~ 18.0G | 241 | yes | [config](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Fblob\u002F%2320240218\u002Fclassification\u002Fconfigs\u002Fvssm\u002Fvssm_base_224.yaml) |\n\n*Most backbone models were trained without EMA, since Swin-Transformer reports that EMA does not improve performance. We use EMA here only because our models are still under development and hyperparameter tuning is not finished.*\n\n*The checkpoint used for the object detection and segmentation tasks is `VMamba-B` with drop path rate 0.5 and no EMA. `VMamba-B*` denotes `VMamba-B` with drop path rate 0.6 and EMA enabled; without EMA it reaches 83.3 at epoch 262, and with EMA it reaches 83.7 at epoch 241.*
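\n\nBecause some runs keep both raw and EMA-averaged weights, it can help to inspect what a downloaded `.pth` file actually contains before loading it; a minimal sketch (the file name and the key names `model` and `model_ema` are assumptions for illustration, not a documented format):\n\n```python\nimport torch\n\n# Load on CPU so no GPU is needed just to look inside the file.\nckpt = torch.load('vssm_base_224_ckpt.pth', map_location='cpu')\nprint(list(ckpt.keys()))  # typically training metadata plus one or more weight dicts\n\n# Hypothetical fallback chain: prefer EMA weights when stored, else raw weights.\nstate = ckpt.get('model_ema', ckpt.get('model', ckpt))\nprint(sum(p.numel() for p in state.values()), 'parameters')\n```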
","2024-02-18T03:28:34",{"id":177,"version":178,"summary_zh":179,"released_at":180},315572,"#v2seg","### **Semantic segmentation on ADE20K**\n\n| Backbone | Input resolution | #params | FLOPs | Segmentor | mIoU(SS) | mIoU(MS) | config\u002Flog\u002Fmulti-scale test log\u002Fcheckpoint |\n| :---: | :---: | :---: | :---: | :---: | :---: |:---: |:---: |\n| VMamba-T[`s2l5`] | 512×512 | 62M | 948G | UperNet@160k | 48.3| 48.6| [config](segmentation\u002Fconfigs\u002Fvssm1\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny.log)\u002F[multi-scale test log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny_tta.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny_iter_160000.pth) |\n| VMamba-S[`s2l15`] | 512×512 | 82M | 1028G | UperNet@160k | 50.6| 51.2|[config](segmentation\u002Fconfigs\u002Fvssm1\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_small.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_small.log)\u002F[multi-scale test log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_small_tta.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_small_iter_144000.pth) |\n| VMamba-B[`s2l15`] | 512×512 | 122M | 1170G | UperNet@160k | 51.0| 51.6|[config](segmentation\u002Fconfigs\u002Fvssm1\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_base.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_base.log)\u002F[multi-scale test log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_base_tta.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_base_iter_160000.pth) |\n| VMamba-T[`s1l8`] | 512×512 | 62M | 949G | UperNet@160k | 47.9| 48.8| [config](segmentation\u002Fconfigs\u002Fvssm1\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny_s.log)\u002F[multi-scale test log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny_s_tta.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2seg\u002Fupernet_vssm_4xb4-160k_ade20k-512x512_tiny_s_iter_160000.pth) |\n","2024-03-20T03:13:49",{"id":182,"version":183,"summary_zh":184,"released_at":185},315573,"#v2cls","### **Classification on ImageNet-1K**\n| name | pretrain | resolution | acc@1 | #params | FLOPs | TP. | Train TP. | config\u002Flog\u002Fcheckpoint |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n| VMamba-T[`s2l5`] | ImageNet-1K | 224×224 | 82.5 | 31M | 4.9G | 1340 | 464 | [config](classification\u002Fconfigs\u002Fvssm\u002Fvmambav2_tiny_224.yaml)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_tiny_0230.txt)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_tiny_0230_ckpt_epoch_262.pth) |\n| VMamba-S[`s2l15`] | ImageNet-1K | 224×224 | 83.6 | 50M | 8.7G | 877 | 314 | [config](classification\u002Fconfigs\u002Fvssm\u002Fvmambav2_small_224.yaml)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_small_0229.txt)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_small_0229_ckpt_epoch_222.pth) |\n| VMamba-B[`s2l15`] | ImageNet-1K | 224×224 | 83.9 | 89M | 15.4G | 646 | 247 | [config](classification\u002Fconfigs\u002Fvssm\u002Fvmambav2_base_224.yaml)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_base_0229.txt)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm_base_0229_ckpt_epoch_237.pth) |\n| VMamba-T[`s1l8`] | ImageNet-1K | 224×224 | 82.6 | 30M | 4.9G | 1686 | 571 | [config](classification\u002Fconfigs\u002Fvssm\u002Fvmambav2v_tiny_224.yaml)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm1_tiny_0230s.txt)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm1_tiny_0230s_ckpt_epoch_264.pth) |\n| VMamba-S[`s1l20`] | ImageNet-1K | 224×224 | 83.3 | 49M | 8.6G | 1106 | 390 | [config](classification\u002Fconfigs\u002Fvssm\u002Fvmambav2v_small_224.yaml)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm1_small_0229s.txt)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm1_small_0229s_ckpt_epoch_240.pth) |\n| VMamba-B[`s1l20`] | ImageNet-1K | 224×224 | 83.8 | 87M | 15.2G | 827 | 313 | [config](classification\u002Fconfigs\u002Fvssm\u002Fvmambav2v_base_224.yaml)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm1_base_0229s.txt)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2cls\u002Fvssm1_base_0229s_ckpt_epoch_225.pth) |\n\n* *Models in this subsection are trained from scratch with random or manual initialization. Hyperparameters are inherited from Swin, except for `drop_path_rate` and `EMA`; all models except `Vanilla-VMamba-T` use EMA.*\n* *`TP.` (throughput) and `Train TP.` (train throughput) are measured on an A100 GPU paired with an AMD EPYC 7542 CPU, with batch size 128. `Train TP.` is tested with mixed precision and excludes the time consumed by the optimizer.*\n* *`FLOPs` and `#params` now include the `head`; earlier versions did not count the `head`, so the numbers here are slightly higher.*\n* *We compute `FLOPs` with the algorithm provided by @albertgu, which yields larger values than the previous calculation.*
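\n\nTo reproduce throughput numbers of this kind on other hardware, here is a minimal timing sketch (the stand-in module, warm-up count, and iteration count are illustrative; this is not the exact benchmarking script used above):\n\n```python\nimport time\n\nimport torch\n\nmodel = torch.nn.Conv2d(3, 96, kernel_size=4, stride=4).cuda().eval()  # stand-in for a backbone\nx = torch.randn(128, 3, 224, 224, device='cuda')  # batch size 128, as in the table\n\nwith torch.no_grad():\n    for _ in range(10):  # warm-up so one-time initialization is excluded\n        model(x)\n    torch.cuda.synchronize()  # drain queued kernels before starting the clock\n    t0 = time.time()\n    for _ in range(50):\n        model(x)\n    torch.cuda.synchronize()\n    elapsed = time.time() - t0\nprint(f'{128 * 50 \u002F elapsed:.0f} images\u002Fs')\n```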
","2024-03-16T08:47:17",{"id":187,"version":188,"summary_zh":189,"released_at":190},315574,"#v2det","### **Object detection on COCO**\n\n| Backbone | #params | FLOPs | Detector | bboxAP | bboxAP50 | bboxAP75 | segmAP | segmAP50 | segmAP75 | config\u002Flog\u002Fcheckpoint |\n| :---: | :---: | :---: | :---: | :---: | :---: |:---: |:---: |:---: |:---: |:---: |\n| VMamba-T[`s2l5`] | 50M | 270G | MaskRCNN@1x | 47.4 |69.5 |52.0 |42.7 |66.3 |46.0 | [config](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_tiny1.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny_epoch_12.pth) |\n| VMamba-S[`s2l15`] | 70M | 384G | MaskRCNN@1x | 48.7 |70.0 |53.4 |43.7 |67.3 |47.0 | [config](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_small.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_small.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_small_epoch_11.pth) |\n| VMamba-B[`s2l15`] | 108M | 485G | MaskRCNN@1x | 49.2 |71.4 |54.0 |44.1 |68.3 |47.7 | [config](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_base.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_base.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_base_epoch_11.pth) |\n| VMamba-B[`s2l15`] | 108M | 485G | MaskRCNN@1x[`bs8`] | 49.2 |70.9 |53.9 |43.9 |67.7 |47.6 | [config](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_base.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_base_bs8.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_base_epoch_12_bs8.pth) |\n| VMamba-T[`s1l8`] | 50M | 271G | MaskRCNN@1x | 47.3 |69.3 |52.0 |42.7 |66.4 |45.9 | [config](..\u002Fdetection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_tiny.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny_s.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny_s_epoch_12.pth) |\n| VMamba-T[`s2l5`] | 50M | 270G | MaskRCNN@3x | 48.9 |70.6 |53.6 |43.7 |67.7 |46.8 | [config](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_tiny1_ms_3x.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny_ms_3x.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_tiny_ms_3x_epoch_36.pth) |\n| VMamba-S[`s2l15`] | 70M | 384G | MaskRCNN@3x | 49.9 |70.9 |54.7 |44.2 |68.2 |47.7 | [config](detection\u002Fconfigs\u002Fvssm1\u002Fmask_rcnn_vssm_fpn_coco_small_ms_3x.py)\u002F[log](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_small_ms_3x.log)\u002F[checkpoint](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Freleases\u002Fdownload\u002F%23v2det\u002Fmask_rcnn_vssm_fpn_coco_small_ms_3x_epoc","2024-03-20T03:06:39",{"id":192,"version":193,"summary_zh":194,"released_at":195},315575,"#20240220","| name | pretrain | resolution | top-1 acc | #params | FLOPs | best epoch | EMA | config |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n| VMamba-T | ImageNet-1K | 224×224 | 82.5 | 32M | 5G | 258 | true | [config](https:\u002F\u002Fgithub.com\u002FMzeroMiko\u002FVMamba\u002Fblob\u002Fmain\u002Fclassification\u002Fconfigs\u002Fvssm1\u002Fvssm_tiny_224_0220.yaml) |\n\n*We use EMA because our model is still under development and hyperparameter tuning is not finished.*\n\n*This is a pre-release.*","2024-02-22T02:13:26"]