[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-chongzhou96--EdgeSAM":3,"tool-chongzhou96--EdgeSAM":61},[4,18,26,36,44,52],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",141543,2,"2026-04-06T11:32:54",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 
API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":53,"name":54,"github_repo":55,"description_zh":56,"stars":57,"difficulty_score":10,"last_commit_at":58,"category_tags":59,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,60],"视频",{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":75,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":32,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":110,"github_topics":111,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":115,"updated_at":116,"faqs":117,"releases":158},4398,"chongzhou96\u002FEdgeSAM","EdgeSAM","Official PyTorch implementation of \"EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM\"","EdgeSAM 是专为手机、平板等边缘设备打造的加速版图像分割模型，源自著名的 Segment Anything Model (SAM)。它主要解决了原版 SAM 模型体积大、计算慢，难以在本地设备上实时运行的痛点。通过独特的“提示循环蒸馏”技术，EdgeSAM 将原本基于 ViT 的复杂架构转化为更轻量的纯 CNN 架构，并在训练过程中让提示编码器与掩码解码器共同参与，从而精准捕捉用户输入与生成结果间的动态关系。\n\n这一创新使得 EdgeSAM 在几乎不牺牲精度的前提下，速度比原版 SAM 提升了 40 倍，比同类轻量模型 MobileSAM 快 14 倍，更是首个能在 
iPhone 14 上实现超过 30 FPS 实时运行的 SAM 变体。无论是希望在移动端集成高精度分割功能的开发者、追求高效实验的研究人员，还是希望通过 iOS 应用（如 CutCha）体验一键抠图的普通用户，都能从中受益。目前，EdgeSAM 已开源训练代码并支持 ONNX 导出，轻松融入各类标注工具与工作流，让强大的 AI 分割能力真正触手可及。","# EdgeSAM\n**Prompt-In-the-Loop Distillation for On-Device Deployment of SAM**\n\n\n[Chong Zhou\u003Csup>1\u003C\u002Fsup>](https:\u002F\u002Fchongzhou96.github.io\u002F),\n[Xiangtai Li\u003Csup>1\u003C\u002Fsup>](https:\u002F\u002Flxtgh.github.io\u002F),\n[Chen Change Loy\u003Csup>1*\u003C\u002Fsup>](https:\u002F\u002Fwww.mmlab-ntu.com\u002Fperson\u002Fccloy\u002F),\n[Bo Dai\u003Csup>2\u003C\u002Fsup>](https:\u002F\u002Fdaibo.info\u002F)\n\n(*corresponding author)\n\n[\u003Csup>1\u003C\u002Fsup>S-Lab, Nanyang Technological University](https:\u002F\u002Fwww.mmlab-ntu.com\u002F),\n[\u003Csup>2\u003C\u002Fsup>Shanghai Artificial Intelligence Laboratory](https:\u002F\u002Fwww.shlab.org.cn\u002F)\n\n[[`Paper`](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.06660)]\n[[`Project Page`](https:\u002F\u002Fwww.mmlab-ntu.com\u002Fproject\u002Fedgesam\u002F)]\n[[`Hugging Face Demo`](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM)]\n[[`iOS App`](https:\u002F\u002Fapps.apple.com\u002Fus\u002Fapp\u002Fcutcha-photo\u002Fid6478521132)]\n\nhttps:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fassets\u002F15973859\u002Ffe1cd104-88dc-4690-a5ea-ff48ae013db3\n\n**Watch the full live demo video: [[YouTube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=YYsEQ2vleiE)] [[Bilibili](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1294y1P7TC\u002F)]**\n\n## Updates\n\n* **2024\u002F07\u002F23**: We released our training and evaluation code; check out [README_TRAIN.md](README_TRAIN.md).\n* **2024\u002F06\u002F05**: Check out our iOS App [CutCha](https:\u002F\u002Fapps.apple.com\u002Fus\u002Fapp\u002Fcutcha-photo\u002Fid6478521132) powered by EdgeSAM.\n* **2024\u002F01\u002F01**: EdgeSAM is integrated into 
[X-AnyLabeling](https:\u002F\u002Fgithub.com\u002FCVHub520\u002FX-AnyLabeling).\n* **2023\u002F12\u002F19**: EdgeSAM is now supported in [ISAT](https:\u002F\u002Fgithub.com\u002FyatengLG\u002FISAT_with_segment_anything), a segmentation labeling tool.\n* **2023\u002F12\u002F16**: EdgeSAM is now supported in [Grounded-Segment-Anything](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounded-Segment-Anything). Check out the [grounded-edge-sam demo](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounded-Segment-Anything\u002Fblob\u002Fmain\u002FEfficientSAM\u002Fgrounded_edge_sam.py). Thanks to the IDEA Research team!\n* **2023\u002F12\u002F14**: [autodistill-grounded-edgesam](https:\u002F\u002Fgithub.com\u002Fautodistill\u002Fautodistill-grounded-edgesam) combines Grounding DINO and EdgeSAM to create Grounded EdgeSAM [[blog](https:\u002F\u002Fblog.roboflow.com\u002Fhow-to-use-grounded-edgesam\u002F)]. Thanks to the Roboflow team!\n* **2023\u002F12\u002F13**: Add ONNX export and speed up the web demo with ONNX as the backend.\n\n## Overview\n\n**EdgeSAM** is an accelerated variant of the Segment Anything Model (SAM), optimized for efficient execution on edge devices with minimal compromise in performance.\nIt achieves a **40-fold speed increase** compared to the original SAM, and outperforms MobileSAM, being **14 times as fast** when deployed on edge devices while enhancing the mIoUs on COCO and LVIS by 2.3 and 3.2 respectively.\nEdgeSAM is also the first SAM variant that can run at **over 30 FPS** on an iPhone 14.\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"900\" alt=\"compare\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fchongzhou96_EdgeSAM_readme_6dac34973031.png\">\n\u003C\u002Fp>\n\n*In this figure, we show the encoder throughput of EdgeSAM compared with SAM and MobileSAM as well as the mIoU performance on the SA-1K dataset (sampled from SA-1B) with box and point prompts.*\n\n\u003Cdetails>\n\n\u003Csummary> 
\u003Cstrong>Approach\u003C\u002Fstrong> \u003C\u002Fsummary>\n\nOur approach involves distilling the original ViT-based SAM image encoder into a purely CNN-based architecture, better suited for edge devices. We carefully benchmark various distillation strategies and demonstrate that task-agnostic encoder distillation fails to capture the full knowledge embodied in SAM. To overcome this bottleneck, we include both the prompt encoder and mask decoder in the distillation process, with box and point prompts in the loop, so that the distilled model can accurately capture the intricate dynamics between user input and mask generation.\n\n  \u003Cp align=\"center\">\n    \u003Cimg width=\"612\" alt=\"arch\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fchongzhou96_EdgeSAM_readme_d3fb04b3b86c.png\">\n  \u003C\u002Fp>\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\n\u003Csummary> \u003Cstrong>Performance\u003C\u002Fstrong> \u003C\u002Fsummary>\n\n| Method      | Train Set | COCO AP | COCO AP\u003Csub>s\u003C\u002Fsub> | COCO AP\u003Csub>m\u003C\u002Fsub> | COCO AP\u003Csub>l\u003C\u002Fsub> | GFLOPs | MParam. 
| FPS iPhone 14 | FPS 2080 Ti | FPS 3090 |\n|-------------|-----------|---------|---------------------|---------------------|---------------------|--------|---------|---------------|-------------|----------|\n| SAM         | SA-1B     | 46.1    | 33.6                | 51.9                | 57.7                | 2734.8 | 641.1   | -             | 4.3         | -        |\n| FastSAM     | 2% SA-1B  | 37.9    | 23.9                | 43.4                | 50.0                | 887.6  | 68.2    | -             | -           | 25.0*    |\n| MobileSAM   | 1% SA-1B  | 39.4    | 26.9                | 44.4                | 52.2                | 38.2   | 9.8     | 4.9           | 103.5       | 100.0*   |\n| EdgeSAM     | 1% SA-1B  | 42.2    | 29.6                | 47.6                | 53.9                | 22.1   | 9.6     | 38.7          | 164.3       | -        |\n| EdgeSAM-3x  | 3% SA-1B  | 42.7    | 30.0                | 48.6                | 54.5                | 22.1   | 9.6     | 38.7          | 164.3       | -        |\n| EdgeSAM-10x | 10% SA-1B | 43.0    | 30.3                | 48.9                | 55.1                | 22.1   | 9.6     | 38.7          | 164.3       | -        |\n\n*In this table, we report the mask mAP on the COCO dataset. ViTDet-H is used as the detector, whose box mAP is 58.7, to provide box prompts. For speed benchmarking, we infer both the encoder and decoder (with a single prompt). FLOPs are calculated based on the 1024x1024 input resolution. Numbers denoted by * are copied from MobileSAM. 3x and 10x represent training with more data. 
Here, we do not apply an additional mask refinement iteration per the setting of the original SAM paper.*\n\n\u003C\u002Fdetails>\n\n## Table of Contents\n\n- [Installation](#installation)\n- [Usage](#usage)\n- [Train and Eval](#train)\n- [Web Demo](#demo)\n- [CoreML \u002F ONNX Export](#export)\n- [Checkpoints](#checkpoints)\n- [iOS App](#ios)\n- [Acknowledgements](#acknowledgement)\n- [Citation](#cite)\n- [License](#license)\n\n## Installation \u003Ca name=\"installation\">\u003C\u002Fa>\n\nThe code requires `python>=3.8` and we use `torch==2.0.0` and `torchvision==0.15.1`. Please refer to the\n[official PyTorch installation instructions](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F).\n\n1. Clone the repository locally:\n\n```\ngit clone https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM.git && cd EdgeSAM\n```\n\n2. Install additional dependencies:\n\n```\npip install -r requirements.txt\n```\n\n3. Install EdgeSAM:\n\n```\npip install -e .\n```\n\n## Usage \u003Ca name=\"usage\">\u003C\u002Fa>\n\n1. Download checkpoints (please refer to [Checkpoints](#checkpoints) for more details about the PyTorch and CoreML checkpoints):\n\n```\nmkdir weights\nwget -P weights\u002F https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam.pth\nwget -P weights\u002F https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x.pth\n```\n\n2. You can easily incorporate EdgeSAM into your Python code with the following lines:\n\n```\nfrom edge_sam import SamPredictor, sam_model_registry\nsam = sam_model_registry[\"edge_sam\"](checkpoint=\"\u003Cpath\u002Fto\u002Fcheckpoint>\")\npredictor = SamPredictor(sam)\npredictor.set_image(\u003Cyour_image>)\nmasks, _, _ = predictor.predict(\u003Cinput_prompts>)\n```\n\nSince EdgeSAM follows the same encoder-decoder architecture as SAM, their usage is very similar. 
One minor difference is that EdgeSAM can output 1, 3, or 4 mask candidates for each prompt, while SAM yields either 1 or 3 masks. For more details, please refer to the [example Jupyter Notebook](https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fblob\u002Fmaster\u002Fnotebooks\u002Fpredictor_example.ipynb).\n\n## Train and Eval \u003Ca name=\"train\">\u003C\u002Fa>\nPlease refer to [README_TRAIN.md](README_TRAIN.md) for more details.\n\n## Web Demo \u003Ca name=\"demo\">\u003C\u002Fa>\nAfter installing EdgeSAM and downloading the checkpoints, you can start an interactive web demo with the following command:\n\n```\npython web_demo\u002Fgradio_app.py\n```\n\nBy default, the demo is hosted on `http:\u002F\u002F0.0.0.0:8080\u002F` and expects `edge_sam_3x.pth` to be stored in the `weights\u002F` folder. You can change the defaults with:\n\n```\npython web_demo\u002Fgradio_app.py --checkpoint [CHECKPOINT] --server-name [SERVER_NAME] --port [PORT]\n```\n\nSince EdgeSAM can run smoothly on a mobile phone, it's fine if you don't have a GPU.\n\nWe've deployed the same web demo in the Hugging Face Space [[link](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM)]. \u003Cdel> However, since it uses the CPU as the backend and is shared by all users, the experience might not be as good as a local deployment. \u003C\u002Fdel>  We really appreciate the Hugging Face team for supporting us with the GPU!\n\n**Speed up the web demo with the ONNX backend**\n\n1. Install the onnxruntime with `pip install onnxruntime` if your machine doesn't have a GPU or `pip install onnxruntime-gpu` if it does (but don't install both of them). Our implementation is tested under version `1.16.3`.\n\n2. 
Download the ONNX models to the `weights\u002F` folder:\n\n```\nwget -P weights\u002F https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x_encoder.onnx\nwget -P weights\u002F https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x_decoder.onnx\n```\n\n3. Start the demo:\n\n```\npython web_demo\u002Fgradio_app.py --enable-onnx\n```\n\n4. Navigate to http:\u002F\u002F0.0.0.0:8080 in your browser.\n\n## CoreML \u002F ONNX Export \u003Ca name=\"export\">\u003C\u002Fa>\n\n**CoreML**\n\nWe provide a script that can export a trained EdgeSAM PyTorch model to two CoreML model packages, one for the encoder and another for the decoder. You can also download the exported CoreML models at [Checkpoints](#checkpoints).\n\nFor encoder:\n\n```\npython scripts\u002Fexport_coreml_model.py [CHECKPOINT]\n```\n\nFor decoder:\n\n```\npython scripts\u002Fexport_coreml_model.py [CHECKPOINT] --decoder --use-stability-score\n```\n\nSince EdgeSAM doesn't perform knowledge distillation on the IoU token of the original SAM, its IoU predictions might not be reliable. Therefore, we use the stability score for mask selection instead. You can stick to the IoU predictions by removing `--use-stability-score`.\n\nThe following shows the performance reports of the EdgeSAM CoreML models measured by Xcode on an iPhone 14 (left: encoder, right: decoder):\n\n\u003Cp align=\"center\">\n\n  ![xcode](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fchongzhou96_EdgeSAM_readme_da61221d1ca5.png)\n\n\u003C\u002Fp>\n\n\u003Cdetails>\n  \u003Csummary> \u003Cstrong> Known issues and model descriptions \u003C\u002Fstrong> \u003C\u002Fsummary>\n\n  As of `coremltools==7.1`, you may encounter the assertion error during the export, e.g., `assert len(inputs) \u003C= 3 or inputs[3] is None`. 
One workaround is to comment out this assertion following the traceback path, e.g., `\u002Fopt\u002Fanaconda3\u002Fenvs\u002FEdgeSAM\u002Flib\u002Fpython3.8\u002Fsite-packages\u002Fcoremltools\u002Fconverters\u002Fmil\u002Ffrontend\u002Ftorch\u002Fops.py line 1573`.\n\n  Since CoreML doesn't support interpolation with dynamic target sizes, the converted CoreML models do not contain the pre-processing, i.e., resize-norm-pad, and the post-processing, i.e., resize back to the original size.\n\n  The encoder takes a `1x3x1024x1024` image as the input and outputs a `1x256x64x64` image embedding. The decoder then takes the image embedding together with point coordinates and point labels as the input. The point coordinates follow the `(height, width)` format with the top-left corner as the `(0, 0)`. The choices of point labels are `0: negative point`, `1: positive point`, `2: top-left corner of box`, and `3: bottom-right corner of box`.\n\n\u003C\u002Fdetails>\n\n**ONNX**\n\nSimilar to the CoreML export, you can use the following commands to export the encoder and the decoder to ONNX models respectively:\n\nFor encoder:\n\n```\npython scripts\u002Fexport_onnx_model.py [CHECKPOINT]\n```\n\nFor decoder:\n\n```\npython scripts\u002Fexport_onnx_model.py [CHECKPOINT] --decoder --use-stability-score\n```\n\n## Checkpoints \u003Ca name=\"checkpoints\">\u003C\u002Fa>\n\nPlease download the checkpoints of EdgeSAM from its Hugging Face Space (all the EdgeSAM variants only differ in the number of training images):\n\n| Model               | COCO mAP | PyTorch | CoreML         | ONNX           |\n| ------------------- | -------- | ------- | -------------- | -------------- |\n| SAM                 | 46.1     | -       | -              | -              |\n| EdgeSAM             | 42.1     | [Download](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam.pth) | 
[[Encoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_encoder.mlpackage.zip)] [[Decoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_decoder.mlpackage.zip)] | [[Encoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_encoder.onnx)] [[Decoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_decoder.onnx)] |\n| EdgeSAM-3x          | 42.7     | [Download](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x.pth) | [[Encoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x_encoder.mlpackage.zip)] [[Decoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x_decoder.mlpackage.zip)] | [[Encoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x_encoder.onnx)] [[Decoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x_decoder.onnx)] |\n| EdgeSAM-10x         | 43       | TBA     | TBA            | TBA |\n\nNote: You need to unzip the CoreML model packages before usage.\n\n## iOS App \u003Ca name=\"ios\">\u003C\u002Fa>\nWe are planning to release the iOS app that we used in the live demo to the App Store. Please stay tuned!\n\n## Acknowledgements \u003Ca name=\"acknowledgement\">\u003C\u002Fa>\nThis study is supported under the RIE2020 Industry Alignment Fund Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s). 
We are grateful to [Han Soong Chong](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fhansoong-choong-0493a5155\u002F) for his effort in the demonstration application.\n\nWe appreciate the following projects, which enable EdgeSAM: [SAM](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsegment-anything), [MobileSAM](https:\u002F\u002Fgithub.com\u002FChaoningZhang\u002FMobileSAM), [FastSAM](https:\u002F\u002Fgithub.com\u002FCASIA-IVA-Lab\u002FFastSAM), [TinyViT](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCream), and [RepViT](https:\u002F\u002Fgithub.com\u002FTHU-MIG\u002FRepViT).\n\n## Citation \u003Ca name=\"cite\">\u003C\u002Fa>\n```bibtex\n@article{zhou2023edgesam,\n  title={EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM},\n  author={Zhou, Chong and Li, Xiangtai and Loy, Chen Change and Dai, Bo},\n  journal={arXiv preprint arXiv:2312.06660},\n  year={2023}\n}\n```\n\n## License \u003Ca name=\"license\">\u003C\u002Fa>\n\nThis project is licensed under \u003Ca rel=\"license\" href=\"https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fblob\u002Fmaster\u002FLICENSE\">NTU S-Lab License 1.0\u003C\u002Fa>. 
Redistribution and use should follow this license.\n","# EdgeSAM\n**基于提示循环的蒸馏技术，用于在端侧设备上部署 SAM**\n\n\n[周冲\u003Csup>1\u003C\u002Fsup>](https:\u002F\u002Fchongzhou96.github.io\u002F),\n[李翔泰\u003Csup>1\u003C\u002Fsup>](https:\u002F\u002Flxtgh.github.io\u002F),\n[陈昌毅·洛伊\u003Csup>1*\u003C\u002Fsup>](https:\u002F\u002Fwww.mmlab-ntu.com\u002Fperson\u002Fccloy\u002F),\n[戴博\u003Csup>2\u003C\u002Fsup>](https:\u002F\u002Fdaibo.info\u002F)\n\n(*通讯作者)\n\n[\u003Csup>1\u003C\u002Fsup>S-Lab，南洋理工大学](https:\u002F\u002Fwww.mmlab-ntu.com\u002F)，\n[\u003Csup>2\u003C\u002Fsup>上海人工智能实验室](https:\u002F\u002Fwww.shlab.org.cn\u002F)\n\n[[`论文`](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.06660)]\n[[`项目页面`](https:\u002F\u002Fwww.mmlab-ntu.com\u002Fproject\u002Fedgesam\u002F)]\n[[`Hugging Face 演示`](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM)]\n[[`iOS 应用`](https:\u002F\u002Fapps.apple.com\u002Fus\u002Fapp\u002Fcutcha-photo\u002Fid6478521132)]\n\nhttps:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fassets\u002F15973859\u002Ffe1cd104-88dc-4690-a5ea-ff48ae013db3\n\n**观看完整直播演示视频：[[YouTube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=YYsEQ2vleiE)] [[Bilibili](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1294y1P7TC\u002F)]**\n\n## 更新\n\n* **2024年7月23日**：我们发布了训练和评估代码，请查看 [README_TRAIN.md](README_TRAIN.md)。\n* **2024年6月5日**：请体验由 EdgeSAM 提供支持的 iOS 应用 [CutCha](https:\u002F\u002Fapps.apple.com\u002Fus\u002Fapp\u002Fcutcha-photo\u002Fid6478521132)。\n* **2024年1月1日**：EdgeSAM 已集成到 [X-AnyLabeling](https:\u002F\u002Fgithub.com\u002FCVHub520\u002FX-AnyLabeling) 中。\n* **2023年12月19日**：EdgeSAM 现已在分割标注工具 [ISAT](https:\u002F\u002Fgithub.com\u002FyatengLG\u002FISAT_with_segment_anything) 中得到支持。\n* **2023年12月16日**：EdgeSAM 现已在 [Grounded-Segment-Anything](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounded-Segment-Anything) 中得到支持。请查看 [grounded-edge-sam 
演示](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounded-Segment-Anything\u002Fblob\u002Fmain\u002FEfficientSAM\u002Fgrounded_edge_sam.py)。感谢 IDEA Research 团队！\n* **2023年12月14日**：[autodistill-grounded-edgesam](https:\u002F\u002Fgithub.com\u002Fautodistill\u002Fautodistill-grounded-edgesam) 将 Grounding DINO 和 EdgeSAM 结合，打造了 Grounded EdgeSAM [[博客](https:\u002F\u002Fblog.roboflow.com\u002Fhow-to-use-grounded-edgesam\u002F)]。感谢 Roboflow 团队！\n* **2023年12月13日**：添加了 ONNX 导出，并以 ONNX 为后端加速了网页演示。\n\n## 概述\n\n**EdgeSAM** 是 Segment Anything Model (SAM) 的加速版本，专为在边缘设备上高效运行而优化，同时几乎不损失性能。\n与原始 SAM 相比，它实现了 **40 倍的速度提升**，并且优于 MobileSAM，在边缘设备上部署时速度是后者的 **14 倍**，同时将 COCO 和 LVIS 数据集上的 mIoU 分别提高了 2.3 和 3.2。\nEdgeSAM 也是首个能够在 iPhone 14 上以 **超过 30 FPS** 运行的 SAM 变体。\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"900\" alt=\"compare\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fchongzhou96_EdgeSAM_readme_6dac34973031.png\">\n\u003C\u002Fp>\n\n*在这张图中，我们展示了 EdgeSAM 与 SAM 和 MobileSAM 的编码器吞吐量对比，以及在 SA-1K 数据集（从 SA-1B 中采样）上使用框和点提示时的 mIoU 性能。*\n\n\u003Cdetails>\n\n\u003Csummary> \u003Cstrong>方法\u003C\u002Fstrong> \u003C\u002Fsummary>\n\n我们的方法是将基于 ViT 的原始 SAM 图像编码器蒸馏成纯 CNN 架构，更适合边缘设备。我们仔细评估了多种蒸馏策略，并证明任务无关的编码器蒸馏无法捕捉 SAM 中蕴含的全部知识。为了克服这一瓶颈，我们将提示编码器和掩码解码器都纳入蒸馏过程，并在循环中加入框和点提示，使蒸馏后的模型能够准确捕捉用户输入与掩码生成之间的复杂动态。\n\n  \u003Cp align=\"center\">\n    \u003Cimg width=\"612\" alt=\"arch\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fchongzhou96_EdgeSAM_readme_d3fb04b3b86c.png\">\n  \u003C\u002Fp>\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\n\u003Csummary> \u003Cstrong>性能\u003C\u002Fstrong> \u003C\u002Fsummary>\n\n| 方法      | 训练集 | COCO AP | COCO AP\u003Csub>s\u003C\u002Fsub> | COCO AP\u003Csub>m\u003C\u002Fsub> | COCO AP\u003Csub>l\u003C\u002Fsub> | GFLops | MParam. 
| FPS iPhone 14 | FPS 2080 Ti | FPS 3090 |\n|-------------|-----------|---------|---------------------|---------------------|---------------------|--------|---------|---------------|-------------|----------|\n| SAM         | SA-1B     | 46.1    | 33.6                | 51.9                | 57.7                | 2734.8 | 641.1   | -             | 4.3         | -        |\n| FastSAM     | 2% SA-1B  | 37.9    | 23.9                | 43.4                | 50.0                | 887.6  | 68.2    | -             | -           | 25.0*    |\n| MobileSAM   | 1% SA-1B  | 39.4    | 26.9                | 44.4                | 52.2                | 38.2   | 9.8     | 4.9           | 103.5       | 100.0*   |\n| EdgeSAM     | 1% SA-1B  | 42.2    | 29.6                | 47.6                | 53.9                | 22.1   | 9.6     | 38.7          | 164.3       | -        |\n| EdgeSAM-3x  | 3% SA-1B  | 42.7    | 30.0                | 48.6                | 54.5                | 22.1   | 9.6     | 38.7          | 164.3       | -        |\n| EdgeSAM-10x | 10% SA-1B | 43.0    | 30.3                | 48.9                | 55.1                | 22.1   | 9.6     | 38.7          | 164.3       | -        |\n\n*在此表格中，我们报告了 COCO 数据集上的掩码 mAP。检测器采用 ViTDet-H，其框 mAP 为 58.7，用于提供框提示。在速度基准测试中，我们对编码器和解码器进行了推理（使用单个提示）。FLOPs 是基于 1024x1024 输入分辨率计算的。标有 * 的数字来自 MobileSAM。3x 和 10x 表示使用更多数据进行训练。此处我们未按照原始 SAM 论文中的设置应用额外的掩码细化迭代。*\n\n\u003C\u002Fdetails>\n\n## 目录\n\n- [安装](#installation)\n- [使用](#usage)\n- [训练与评估](#train)\n- [Web 演示](#demo)\n- [CoreML \u002F ONNX 导出](#export)\n- [检查点](#checkpoints)\n- [iOS 应用](#ios)\n- [致谢](#acknowledgement)\n- [引用](#cite)\n- [许可证](#license)\n\n## 安装 \u003Ca name=\"installation\">\u003C\u002Fa>\n\n代码需要 `python>=3.8`，我们使用 `torch==2.0.0` 和 `torchvision==0.15.1`。请参考\n[官方 PyTorch 安装说明](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F)。\n\n1. 在本地克隆仓库：\n\n```\ngit clone https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM.git && cd EdgeSAM\n```\n\n2. 
安装其他依赖项：\n\n```\npip install -r requirements.txt\n```\n\n3. 安装 EdgeSAM：\n\n```\npip install -e .\n```\n\n## 使用方法 \u003Ca name=\"usage\">\u003C\u002Fa>\n\n1. 下载检查点文件（有关 PyTorch 和 CoreML 检查点的更多详细信息，请参阅 [检查点](#checkpoints)）：\n\n```\nmkdir weights\nwget -P weights\u002F https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam.pth\nwget -P weights\u002F https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x.pth\n```\n\n2. 您可以通过以下几行代码轻松地将 EdgeSAM 集成到您的 Python 代码中：\n\n```\nfrom edge_sam import SamPredictor, sam_model_registry\nsam = sam_model_registry[\"edge_sam\"](checkpoint=\"\u003Cpath\u002Fto\u002Fcheckpoint>\")\npredictor = SamPredictor(sam)\npredictor.set_image(\u003Cyour_image>)\nmasks, _, _ = predictor.predict(\u003Cinput_prompts>)\n```\n\n由于 EdgeSAM 采用与 SAM 相同的编码器-解码器架构，因此两者的使用方式非常相似。一个细微的区别是，EdgeSAM 可为每个提示输出 1、3 或 4 个掩码候选，而 SAM 则只能输出 1 或 3 个掩码。有关更多详细信息，请参阅 [示例 Jupyter Notebook](https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fblob\u002Fmaster\u002Fnotebooks\u002Fpredictor_example.ipynb)。\n\n## 训练与评估 \u003Ca name=\"train\">\u003C\u002Fa>\n有关详细信息，请参阅 [README_TRAIN.md](README_TRAIN.md)。\n\n## Web 演示 \u003Ca name=\"demo\">\u003C\u002Fa>\n在安装 EdgeSAM 并下载检查点文件后，您可以使用以下命令启动交互式 Web 演示：\n\n```\npython web_demo\u002Fgradio_app.py\n```\n\n默认情况下，演示将在 `http:\u002F\u002F0.0.0.0:8080\u002F` 上运行，并期望 `edge_sam_3x.pth` 存储在 `weights\u002F` 文件夹中。您可以通过以下方式更改默认行为：\n\n```\npython web_demo\u002Fgradio_app.py --checkpoint [CHECKPOINT] --server-name [SERVER_NAME] --port [PORT]\n```\n\n由于 EdgeSAM 可以在手机上流畅运行，因此即使没有 GPU 也无妨。\n我们已在 Hugging Face Space 中部署了相同的 Web 演示 [[链接](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM)]。\u003Cdel> 然而，由于它使用 CPU 作为后端且所有用户共享资源，体验可能不如本地部署。 \u003C\u002Fdel> 非常感谢 Hugging Face 团队为我们提供 GPU 支持！\n\n**使用 ONNX 后端加速 Web 演示**\n\n1. 
如果您的机器没有 GPU，请使用 `pip install onnxruntime` 安装 onnxruntime；如果有 GPU，则使用 `pip install onnxruntime-gpu`（但不要同时安装两者）。我们的实现已在版本 `1.16.3` 下测试通过。\n\n2. 将 ONNX 模型下载到 `weights\u002F` 文件夹：\n\n```\nwget -P weights\u002F https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x_encoder.onnx\nwget -P weights\u002F https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x_decoder.onnx\n```\n\n3. 启动演示：\n\n```\npython web_demo\u002Fgradio_app.py --enable-onnx\n```\n\n4. 在浏览器中访问 http:\u002F\u002F0.0.0.0:8080。\n\n## CoreML \u002F ONNX 导出 \u003Ca name=\"export\">\u003C\u002Fa>\n\n**CoreML**\n\n我们提供了一个脚本，可以将训练好的 EdgeSAM PyTorch 模型导出为两个 CoreML 模型包，分别用于编码器和解码器。您也可以在 [检查点](#checkpoints) 下下载这些导出的 CoreML 模型。\n\n对于编码器：\n\n```\npython scripts\u002Fexport_coreml_model.py [CHECKPOINT]\n```\n\n对于解码器：\n\n```\npython scripts\u002Fexport_coreml_model.py [CHECKPOINT] --decoder --use-stability-score\n```\n\n由于 EdgeSAM 没有对原始 SAM 的 IoU 标记进行知识蒸馏，其 IoU 预测可能不太可靠。因此，我们改用稳定性分数来选择掩码。如果您希望继续使用 IoU 预测，只需移除 `--use-stability-score` 即可。\n\n以下是使用 Xcode 在 iPhone 14 上测量的 EdgeSAM CoreML 模型性能报告（左：编码器，右：解码器）：\n\n\u003Cp align=\"center\">\n\n  ![xcode](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fchongzhou96_EdgeSAM_readme_da61221d1ca5.png)\n\n\u003C\u002Fp>\n\n\u003Cdetails>\n  \u003Csummary> \u003Cstrong> 已知问题及模型说明 \u003C\u002Fstrong> \u003C\u002Fsummary>\n\n  截至 `coremltools==7.1` 版本，在导出过程中可能会遇到断言错误，例如 `assert len(inputs) \u003C= 3 or inputs[3] is None`。一种解决方法是按照堆栈跟踪路径注释掉该断言，例如 `\u002Fopt\u002Fanaconda3\u002Fenvs\u002FEdgeSAM\u002Flib\u002Fpython3.8\u002Fsite-packages\u002Fcoremltools\u002Fconverters\u002Fmil\u002Ffrontend\u002Ftorch\u002Fops.py 第 1573 行`。\n\n  由于 CoreML 不支持动态目标尺寸的插值，因此转换后的 CoreML 模型不包含预处理步骤（如缩放、归一化和填充），也不包含后处理步骤（如恢复到原始尺寸）。\n\n  编码器的输入为 `1x3x1024x1024` 的图像，输出为 `1x256x64x64` 的图像嵌入。解码器则以图像嵌入以及点坐标和点标签作为输入。点坐标采用 `(height, width)` 格式，左上角为 `(0, 
0)`. The point labels can be selected from `0: negative point`, `1: positive point`, `2: top-left corner of a box`, and `3: bottom-right corner of a box`.\n\n\u003C\u002Fdetails>\n\n**ONNX**\n\nSimilar to the CoreML export, you can use the following commands to export the encoder and the decoder to ONNX models separately:\n\nFor the encoder:\n\n```\npython scripts\u002Fexport_onnx_model.py [CHECKPOINT]\n```\n\nFor the decoder:\n\n```\npython scripts\u002Fexport_onnx_model.py [CHECKPOINT] --decoder --use-stability-score\n```\n\n## Checkpoints \u003Ca name=\"checkpoints\">\u003C\u002Fa>\n\nPlease download the checkpoints from the EdgeSAM Hugging Face Space (the EdgeSAM variants differ only in the number of training images):\n\n| Model               | COCO mAP | PyTorch | CoreML         | ONNX           |\n| ------------------- | -------- | ------- | -------------- | -------------- |\n| SAM                 | 46.1     | -       | -              | -              |\n| EdgeSAM             | 42.1     | [Download](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam.pth) | [[Encoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_encoder.mlpackage.zip)] [[Decoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_decoder.mlpackage.zip)] | [[Encoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_encoder.onnx)] [[Decoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_decoder.onnx)] |\n| EdgeSAM-3x          | 42.7     | [Download](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x.pth) | [[Encoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x_encoder.mlpackage.zip)] [[Decoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x_decoder.mlpackage.zip)] | 
[[Encoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x_encoder.onnx)] [[Decoder](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x_decoder.onnx)] |\n| EdgeSAM-10x         | 43       | TBA     | TBA            | TBA            |\n\nNote: You need to unzip the CoreML model packages before using them.\n\n## iOS App \u003Ca name=\"ios\">\u003C\u002Fa>\nWe are planning to release the iOS app used in our live demo on the App Store. Please stay tuned!\n\n## Acknowledgements \u003Ca name=\"acknowledgement\">\u003C\u002Fa>\nThis study is supported under the RIE2020 Industry Alignment Fund Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from the industry partner(s). We are grateful to [Han Soong Chong](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fhansoong-choong-0493a5155\u002F) for his effort on the demonstration application.\n\nWe thank the following projects that helped make EdgeSAM possible: [SAM](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsegment-anything), [MobileSAM](https:\u002F\u002Fgithub.com\u002FChaoningZhang\u002FMobileSAM), [FastSAM](https:\u002F\u002Fgithub.com\u002FCASIA-IVA-Lab\u002FFastSAM), [TinyViT](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FCream), and [RepViT](https:\u002F\u002Fgithub.com\u002FTHU-MIG\u002FRepViT).\n\n## Citation \u003Ca name=\"cite\">\u003C\u002Fa>\n```bibtex\n@article{zhou2023edgesam,\n  title={EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM},\n  author={Zhou, Chong and Li, Xiangtai and Loy, Chen Change and Dai, Bo},\n  journal={arXiv preprint arXiv:2312.06660},\n  year={2023}\n}\n```\n\n## License \u003Ca name=\"license\">\u003C\u002Fa>\n\nThis project is licensed under \u003Ca rel=\"license\" href=\"https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fblob\u002Fmaster\u002FLICENSE\">NTU S-Lab License 1.0\u003C\u002Fa>. Redistribution and use should follow this license.","# EdgeSAM Quick Start Guide\n\nEdgeSAM is an accelerated variant of the Segment Anything Model (SAM), optimized for edge devices such as phones and embedded hardware. It is reported to run **40 times** faster than the original SAM and to exceed **30 FPS** of real-time segmentation on an iPhone 14, while retaining competitive segmentation accuracy.\n\n## Prerequisites\n\n*   **Operating system**: Linux, macOS, Windows\n*   **Python version**: >= 3.8\n*   **Core dependencies**:\n    *   `torch == 2.0.0`\n    *   `torchvision == 0.15.1`\n*   **Hardware**: CPU inference is supported (suitable for mobile deployment); a GPU, if available, speeds up training and the Web 
Demo.\n\n> **Note**: Follow the [official PyTorch installation guide](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F) first to install the matching versions of torch and torchvision. Users in mainland China can speed up installation with the Tsinghua or Alibaba PyPI mirror:\n> ```bash\n> pip install torch==2.0.0 torchvision==0.15.1 -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## Installation\n\n1.  **Clone the repository**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM.git && cd EdgeSAM\n    ```\n\n2.  **Install the dependencies**\n    ```bash\n    pip install -r requirements.txt\n    ```\n    *(Users in mainland China may want to add `-i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`)*\n\n3.  **Install EdgeSAM**\n    ```bash\n    pip install -e .\n    ```\n\n4.  **Download the pretrained weights**\n    Create the weights directory and download the model files (a base version and a 3x version are provided):\n    ```bash\n    mkdir weights\n    wget -P weights\u002F https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam.pth\n    wget -P weights\u002F https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fchongzhou\u002FEdgeSAM\u002Fresolve\u002Fmain\u002Fweights\u002Fedge_sam_3x.pth\n    ```\n    > **Tip**: If `wget` is slow, open the links above in a browser, download the files, and place them in the `weights\u002F` directory manually. `edge_sam_3x.pth` is recommended for better accuracy.\n\n## Basic Usage\n\nEdgeSAM is used in much the same way as the original SAM. A minimal Python example:\n\n```python\nfrom edge_sam import SamPredictor, sam_model_registry\n\n# 1. Load the model (replace with the path of the weights you downloaded)\nsam = sam_model_registry[\"edge_sam\"](checkpoint=\"weights\u002Fedge_sam_3x.pth\")\n\n# 2. Initialize the predictor\npredictor = SamPredictor(sam)\n\n# 3. Set the input image (a numpy array is supported)\npredictor.set_image(\u003Cyour_image>)\n\n# 4. 
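Run prediction (inputs can be points, boxes, and other prompts)\n# Example: generate masks from the input prompts\nmasks, scores, logits = predictor.predict(\u003Cinput_prompts>)\n```\n\n**Key points:**\n*   **Number of outputs**: Unlike the original SAM, EdgeSAM can output **1, 3, or 4** mask candidates for each prompt.\n*   **Compatible architecture**: It uses the same encoder-decoder architecture, which makes it easy to integrate into existing SAM workflows.\n\nFor more detailed usage (such as interactive annotation and batch processing), see `notebooks\u002Fpredictor_example.ipynb` in the repository.","A mobile development team at an e-commerce platform is building a feature that lets users photograph a product with their phone, cut out the subject quickly, and swap the background to improve product presentation.\n\n### Without EdgeSAM\n- **Severe response latency**: The original SAM is too compute-heavy; processing one image on a phone takes several seconds, so users grow impatient and tend to drop off.\n- **Costly cloud dependence**: To make up for limited on-device compute, images have to be uploaded to cloud servers for processing, which drives up bandwidth costs and makes the feature sensitive to network fluctuations.\n- **Inaccurate small-object segmentation**: Existing lightweight alternatives (such as early MobileSAM) are not precise enough on the edges of small products against complex backgrounds, so cutouts look rough.\n- **Heat and battery drain**: Intensive inference heats the phone quickly and consumes a lot of battery, hurting user experience and app retention.\n\n### With EdgeSAM\n- **Smooth real-time interaction**: EdgeSAM runs at over 30 FPS on devices such as the iPhone 14, so masks update with almost no perceptible delay as the user moves the prompt box.\n- **Cheaper on-device deployment**: Thanks to its lightweight architecture, all computation happens locally, removing the bandwidth cost and network latency of cloud inference.\n- **More accurate detail**: With prompt-in-the-loop distillation, EdgeSAM stays fast while improving segmentation accuracy on the COCO and LVIS datasets, and fine, hair-level edges come out clearly.\n- **Low power, longer battery life**: The CNN architecture optimized for edge devices greatly reduces the compute load, so the phone does not overheat or lose charge rapidly even in long sessions.\n\nEdgeSAM brings segmentation capability that previously required a server into the user's pocket, making high-quality real-time image editing routine on mobile.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fchongzhou96_EdgeSAM_6dac3497.png","chongzhou96","Chong Zhou","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fchongzhou96_7819354e.jpg",null,"Meta","Seattle","chongzhou7","http:\u002F\u002Fchongzhou96.github.io","https:\u002F\u002Fgithub.com\u002Fchongzhou96",[82,86,90],{"name":83,"color":84,"percentage":85},"Jupyter Notebook","#DA5B0B",96.4,{"name":87,"color":88,"percentage":89},"Python","#3572A5",3.5,{"name":91,"color":92,"percentage":93},"Shell","#89e051",0.1,1131,58,"2026-04-04T02:14:09","NOASSERTION","Linux, macOS, Windows","Not required. Runs without a GPU (CPU backend); an NVIDIA GPU, if available, provides acceleration (test environments mentioned in the documentation include 2080 Ti and 3090). iOS deployment requires a CoreML-compatible device (such as iPhone 14 or later).","Not specified",{"notes":102,"python":103,"dependencies":104},"1. The project is optimized for edge devices and runs smoothly on mobile or web without a GPU (30+ FPS on iPhone 14). 2. When exporting CoreML models, if you hit the assertion error under coremltools==7.1, manually comment out the relevant assertion in the coremltools source. 3. Exported CoreML models do not include the dynamic-size pre-processing and post-processing steps. 4. 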
When accelerating the Web Demo with ONNX, install either onnxruntime or onnxruntime-gpu depending on whether a GPU is available; do not install both.","3.8+",[105,106,107,108,109],"torch==2.0.0","torchvision==0.15.1","onnxruntime (optional, version 1.16.3)","onnxruntime-gpu (optional, version 1.16.3)","coremltools (optional, for exporting CoreML models)",[15,14],[112,113,114],"on-device-ai","segment-anything","coreml","2026-03-27T02:49:30.150509","2026-04-06T21:10:35.630796",[118,123,128,133,138,143,148,153],{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},19996,"What should I do if clicking the image does nothing in the Gradio demo?","This is usually caused by an incompatible Gradio version. Try uninstalling the current version and installing the pinned one:\npip uninstall gradio\npip install gradio==3.50.2","https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fissues\u002F49",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},19997,"How can I automatically segment every object in an image without manually setting points or boxes?","You can use `EdgeSAMAutomaticMaskGenerator`. After installing `sssegmentation`, use the following code:\nimport cv2\nimport torch\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom ssseg.modules.models.segmentors.sam.visualization import showanns\nfrom ssseg.modules.models.segmentors.edgesam import EdgeSAMAutomaticMaskGenerator\n\nimage = cv2.imread('images\u002Fdog.jpg')\nimage = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\nmask_generator = EdgeSAMAutomaticMaskGenerator(use_default_edgesam=True, device='cuda')\nmasks = mask_generator.generate(image)\nplt.figure(figsize=(20, 20))\nplt.imshow(image)\nshowanns(masks)\nplt.axis('off')\nplt.savefig('mask.png')\n\nNote: Because EdgeSAM follows the encoder-decoder architecture, everything mode has to run the decoder 1024 times, which is inefficient. If GPU memory is not the bottleneck and speed matters, the original author suggests using the original SAM model directly.","https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fissues\u002F5",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},19998,"After training by following README_TRAIN.md, loading the model gives a missing_keys warning and the file size does not match. What should I do?","The checkpoint file produced by training wraps the weights in an extra dictionary, so you need to extract its \"model\" field before the weights load correctly. Process the weights file with the following script:\nimport torch\nwith open(\".\u002Foutput\u002Frep_vit_m1_fuse_enc_dec_4m_ft_bp_iter2b_sa_distill\u002Fdefault\u002Fckpt_epoch_4.pth\", \"rb\") as f:\n    state_dict = torch.load(f)\ntorch.save(state_dict[\"model\"], 
\"weights\u002Ftrained_model.pth\")\nThe processed file can then be loaded and used normally.","https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fissues\u002F40",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},19999,"What should I do when importing fails with KeyError: 'edge_sam' because the model type cannot be found?","This is caused by a wrong import path. Do not use `from segment_anything import ...`; import directly from the `edge_sam` package instead:\nfrom edge_sam import SamPredictor, sam_model_registry\n\nYou can then load the model with `sam_model_registry[\"edge_sam\"]` as usual.","https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fissues\u002F28",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},20000,"Is there fine-tuning code or experience for specific tasks?","The author has not yet tried fine-tuning EdgeSAM on other datasets, but plans to do so on SA-1B in the future. You can use the official training code as a starting point; see README_TRAIN.md in the project for details.","https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fissues\u002F20",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},20001,"Are edge deployment frameworks such as TensorRT, RKNN, or Atlas supported?","The author has said they are not familiar with the specific frameworks mentioned (TensorRT, RKNN, Atlas), so there are currently no reference materials or official support for them.","https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fissues\u002F8",{"id":149,"question_zh":150,"answer_zh":151,"source_url":152},20002,"When will the training code be open-sourced?","The training and evaluation code has already been released. See Issue #15 or the related documentation in the project repository to get the code.","https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fissues\u002F27",{"id":154,"question_zh":155,"answer_zh":156,"source_url":157},20003,"Which third-party tools have integrated EdgeSAM?","The segmentation annotation tool ISAT (ISAT_with_segment_anything) already supports EdgeSAM. Usage instructions can be found in that tool's repository.","https:\u002F\u002Fgithub.com\u002Fchongzhou96\u002FEdgeSAM\u002Fissues\u002F9",[]]