[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-huggingface--candle":3,"tool-huggingface--candle":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 
多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":75,"owner_website":80,"owner_url":81,"languages":82,"stars":120,"forks":121,"last_commit_at":122,"license":123,"difficulty_score":124,"env_os":125,"env_gpu":126,"env_ram":127,"env_deps":128,"category_tags":137,"github_topics":79,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":138,"updated_at":139,"faqs":140,"releases":181},4081,"huggingface\u002Fcandle","candle","Minimalist ML framework for Rust","Candle 是一款专为 Rust 语言打造的轻量级机器学习框架，由 Hugging Face 团队开发。它致力于在保持代码极简的同时，提供卓越的计算性能，并原生支持 CPU 与 GPU 加速。\n\n对于希望在 Rust 生态中构建高效 AI 应用的开发者而言，Candle 解决了传统框架往往体积庞大、依赖复杂或难以与 Rust 类型安全特性深度融合的痛点。它无需庞大的运行时环境，即可轻松实现从基础的矩阵运算到复杂大模型推理的各种任务。\n\nCandle 特别适合熟悉 Rust 的工程师、追求高性能部署的研究人员，以及希望将 AI 模型嵌入本地应用或 Web 端（通过 WebAssembly）的技术团队。其独特的技术亮点在于“极简主义”设计：去除了冗余抽象，让开发者能直接掌控底层计算细节。目前，Candle 已成功支持 LLaMA、Whisper、Yolo、Stable Diffusion 等众多主流模型的运行，甚至能在浏览器中流畅跑通复杂的图像分割与语音识别任务。如果你看重执行效率、二进制文件大小以及 Rust 带来的内存安全性，Candle 是一个值得尝试的现代化选择。","# candle\n[![discord server](https:\u002F\u002Fdcbadge.limes.pink\u002Fapi\u002Fserver\u002Fhugging-face-879548962464493619)](https:\u002F\u002Fdiscord.gg\u002Fhugging-face-879548962464493619)\n[![Latest version](https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Fcandle-core.svg)](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fcandle-core)\n[![Documentation](https:\u002F\u002Fdocs.rs\u002Fcandle-core\u002Fbadge.svg)](https:\u002F\u002Fdocs.rs\u002Fcandle-core)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fbase-org\u002Fnode?color=blue)](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fblob\u002Fmain\u002FLICENSE-MIT)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-blue?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fblob\u002Fmain\u002FLICENSE-APACHE)\n\nCandle is a minimalist ML framework for Rust with a focus on performance (including GPU support) \nand ease of use. 
Try our online demos: \n[whisper](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-whisper),\n[LLaMA2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2),\n[T5](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-T5-Generation-Wasm),\n[yolo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-yolo),\n[Segment\nAnything](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002Fcandle-segment-anything-wasm).\n\n## Get started\n\nMake sure that you have [`candle-core`](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Ftree\u002Fmain\u002Fcandle-core) correctly installed as described in [**Installation**](https:\u002F\u002Fhuggingface.github.io\u002Fcandle\u002Fguide\u002Finstallation.html).\n\nLet's see how to run a simple matrix multiplication.\nWrite the following to your `myapp\u002Fsrc\u002Fmain.rs` file:\n```rust\nuse candle_core::{Device, Tensor};\n\nfn main() -> Result\u003C(), Box\u003Cdyn std::error::Error>> {\n    let device = Device::Cpu;\n\n    let a = Tensor::randn(0f32, 1., (2, 3), &device)?;\n    let b = Tensor::randn(0f32, 1., (3, 4), &device)?;\n\n    let c = a.matmul(&b)?;\n    println!(\"{c}\");\n    Ok(())\n}\n```\n\n`cargo run` should display a tensor of shape `Tensor[[2, 4], f32]`.\n\n\nHaving installed `candle` with Cuda support, simply define the `device` to be on GPU:\n\n```diff\n- let device = Device::Cpu;\n+ let device = Device::new_cuda(0)?;\n```\n\nFor more advanced examples, please have a look at the following section.\n\n## Check out our examples\n\nThese online demos run entirely in your browser:\n- [yolo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-yolo): pose estimation and\n  object recognition.\n- [whisper](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-whisper): speech recognition.\n- [LLaMA2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2): text generation.\n- [T5](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-T5-Generation-Wasm): text generation.\n- [Phi-1.5, and Phi-2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-Phi-1.5-Wasm): text generation.\n- [Segment Anything Model](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002Fcandle-segment-anything-wasm): Image segmentation.\n- [BLIP](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-BLIP-Image-Captioning): image captioning.\n\nWe also provide some command line based examples using state of the art models:\n\n- [LLaMA v1, v2, and v3](.\u002Fcandle-examples\u002Fexamples\u002Fllama\u002F): general LLM, includes\n  the SOLAR-10.7B variant.\n- [Falcon](.\u002Fcandle-examples\u002Fexamples\u002Ffalcon\u002F): general LLM.\n- [Codegeex4](.\u002Fcandle-examples\u002Fexamples\u002Fcodegeex4-9b\u002F): Code completion, code interpreter, web search, function calling, repository-level\n- [GLM4](.\u002Fcandle-examples\u002Fexamples\u002Fglm4\u002F): Open Multilingual Multimodal Chat LMs by THUDM\n- [Gemma v1 and v2](.\u002Fcandle-examples\u002Fexamples\u002Fgemma\u002F): 2b and 7b+\u002F9b general LLMs from Google Deepmind.\n- [RecurrentGemma](.\u002Fcandle-examples\u002Fexamples\u002Frecurrent-gemma\u002F): 2b and 7b\n  Griffin based models from Google that mix attention with a RNN like state.\n- [Phi-1, Phi-1.5, Phi-2, and Phi-3](.\u002Fcandle-examples\u002Fexamples\u002Fphi\u002F): 1.3b,\n  2.7b, and 3.8b general LLMs with performance on par with 7b models.\n- 
[StableLM-3B-4E1T](.\u002Fcandle-examples\u002Fexamples\u002Fstable-lm\u002F): a 3b general LLM\n  pre-trained on 1T tokens of English and code datasets. Also supports\n  StableLM-2, a 1.6b LLM trained on 2T tokens, as well as the code variants.\n- [Mamba](.\u002Fcandle-examples\u002Fexamples\u002Fmamba\u002F): an inference-only\n  implementation of the Mamba state space model.\n- [Mistral7b-v0.1](.\u002Fcandle-examples\u002Fexamples\u002Fmistral\u002F): a 7b general LLM with\n  better performance than all publicly available 13b models as of 2023-09-28.\n- [Mixtral8x7b-v0.1](.\u002Fcandle-examples\u002Fexamples\u002Fmixtral\u002F): a sparse mixture of\n  experts 8x7b general LLM with better performance than a Llama 2 70B model with\n  much faster inference.\n- [StarCoder](.\u002Fcandle-examples\u002Fexamples\u002Fbigcode\u002F) and\n  [StarCoder2](.\u002Fcandle-examples\u002Fexamples\u002Fstarcoder2\u002F): LLMs specialized for code generation.\n- [Qwen1.5](.\u002Fcandle-examples\u002Fexamples\u002Fqwen\u002F): Bilingual (English\u002FChinese) LLMs.\n- [RWKV v5 and v6](.\u002Fcandle-examples\u002Fexamples\u002Frwkv\u002F): An RNN with transformer level LLM\n  performance.\n- [Replit-code-v1.5](.\u002Fcandle-examples\u002Fexamples\u002Freplit-code\u002F): a 3.3b LLM specialized for code completion.\n- [Yi-6B \u002F Yi-34B](.\u002Fcandle-examples\u002Fexamples\u002Fyi\u002F): two bilingual\n  (English\u002FChinese) general LLMs with 6b and 34b parameters.\n- [Quantized LLaMA](.\u002Fcandle-examples\u002Fexamples\u002Fquantized\u002F): quantized version of\n  the LLaMA model using the same quantization techniques as\n  [llama.cpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp).\n- [Quantized Qwen3 MoE](.\u002Fcandle-examples\u002Fexamples\u002Fquantized-qwen3-moe\u002F): supports GGUF-quantized Qwen3 MoE models.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_candle_readme_b829a04f678b.gif\" width=\"600\">\n  \n- [Stable Diffusion](.\u002Fcandle-examples\u002Fexamples\u002Fstable-diffusion\u002F): text to\n  image generative model, support for the 1.5, 2.1, SDXL 1.0 and Turbo versions.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_candle_readme_e0adeab9e540.jpg\" width=\"200\">\n\n- [Wuerstchen](.\u002Fcandle-examples\u002Fexamples\u002Fwuerstchen\u002F): another text to\n  image generative model.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_candle_readme_0d9689cffc23.jpg\" width=\"200\">\n\n- [yolo-v3](.\u002Fcandle-examples\u002Fexamples\u002Fyolo-v3\u002F) and\n  [yolo-v8](.\u002Fcandle-examples\u002Fexamples\u002Fyolo-v8\u002F): object detection and pose\n  estimation models.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_candle_readme_f5435313a8cd.jpg\" width=\"200\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_candle_readme_f57e6101131d.jpg\" width=\"200\">\n- [segment-anything](.\u002Fcandle-examples\u002Fexamples\u002Fsegment-anything\u002F): image\n  segmentation model with prompt.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_candle_readme_178ff134614b.jpg\" width=\"200\">\n\n- [SegFormer](.\u002Fcandle-examples\u002Fexamples\u002Fsegformer\u002F): transformer based semantic segmentation model.\n- [Whisper](.\u002Fcandle-examples\u002Fexamples\u002Fwhisper\u002F): speech recognition model.\n- 
[EnCodec](.\u002Fcandle-examples\u002Fexamples\u002Fencodec\u002F): high-quality audio compression\n  model using residual vector quantization.\n- [MetaVoice](.\u002Fcandle-examples\u002Fexamples\u002Fmetavoice\u002F): foundational model for\n  text-to-speech.\n- [Parler-TTS](.\u002Fcandle-examples\u002Fexamples\u002Fparler-tts\u002F): large text-to-speech\n  model.\n- [T5](.\u002Fcandle-examples\u002Fexamples\u002Ft5), [Bert](.\u002Fcandle-examples\u002Fexamples\u002Fbert\u002F),\n  [JinaBert](.\u002Fcandle-examples\u002Fexamples\u002Fjina-bert\u002F): useful for sentence embeddings.\n- [DINOv2](.\u002Fcandle-examples\u002Fexamples\u002Fdinov2\u002F): computer vision model trained\n  using self-supervision (can be used for imagenet classification, depth\n  evaluation, segmentation).\n- [VGG](.\u002Fcandle-examples\u002Fexamples\u002Fvgg\u002F),\n  [RepVGG](.\u002Fcandle-examples\u002Fexamples\u002Frepvgg): computer vision models.\n- [BLIP](.\u002Fcandle-examples\u002Fexamples\u002Fblip\u002F): image to text model, can be used to\n  generate captions for an image.\n- [CLIP](.\u002Fcandle-examples\u002Fexamples\u002Fclip\u002F): multi-modal vision and language\n  model.\n- [TrOCR](.\u002Fcandle-examples\u002Fexamples\u002Ftrocr\u002F): a transformer OCR model, with\n  dedicated submodels for handwritten and printed text recognition.\n- [Marian-MT](.\u002Fcandle-examples\u002Fexamples\u002Fmarian-mt\u002F): neural machine translation\n  model, generates the translated text from the input text.\n- [Moondream](.\u002Fcandle-examples\u002Fexamples\u002Fmoondream\u002F): tiny computer-vision model \n  that can answer real-world questions about images.\n\nRun them using commands like:\n```\ncargo run --example quantized --release\n```\n\nIn order to use **CUDA** add `--features cuda` to the example command line. If\nyou have cuDNN installed, use `--features cudnn` for even more speedups.\n\nThere are also some wasm examples for whisper and\n[llama2.c](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fllama2.c). You can either build them with\n`trunk` or try them online:\n[whisper](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-whisper),\n[llama2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2),\n[T5](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-T5-Generation-Wasm),\n[Phi-1.5, and Phi-2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-Phi-1.5-Wasm),\n[Segment Anything Model](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002Fcandle-segment-anything-wasm).\n\nFor LLaMA2, run the following command to retrieve the weight files and start a\ntest server:\n```bash\ncd candle-wasm-examples\u002Fllama2-c\nwget https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2\u002Fresolve\u002Fmain\u002Fmodel.bin\nwget https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2\u002Fresolve\u002Fmain\u002Ftokenizer.json\ntrunk serve --release --port 8081\n```\nAnd then head over to\n[http:\u002F\u002Flocalhost:8081\u002F](http:\u002F\u002Flocalhost:8081\u002F).\n\n\u003C!--- ANCHOR: useful_libraries --->\n\n## Useful External Resources\n- [`candle-tutorial`](https:\u002F\u002Fgithub.com\u002FToluClassics\u002Fcandle-tutorial): A\n  very detailed tutorial showing how to convert a PyTorch model to Candle.\n- [`candle-lora`](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fcandle-lora): Efficient and\n  ergonomic LoRA implementation for Candle. 
`candle-lora` has      \n  out-of-the-box LoRA support for many models from Candle, which can be found\n  [here](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fcandle-lora\u002Ftree\u002Fmaster\u002Fcandle-lora-transformers\u002Fexamples).\n- [`candle-video`](https:\u002F\u002Fgithub.com\u002FFerrisMind\u002Fcandle-video): Rust library for text-to-video generation (LTX-Video and related models) built on Candle, focused on fast, Python-free inference.\n- [`optimisers`](https:\u002F\u002Fgithub.com\u002FKGrewal1\u002Foptimisers): A collection of optimisers\n  including SGD with momentum, AdaGrad, AdaDelta, AdaMax, NAdam, RAdam, and RMSprop.\n- [`candle-vllm`](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fcandle-vllm): Efficient platform for inference and\n  serving local LLMs including an OpenAI compatible API server.\n- [`candle-ext`](https:\u002F\u002Fgithub.com\u002Fmokeyish\u002Fcandle-ext): An extension library to Candle that provides PyTorch functions not currently available in Candle.\n- [`candle-coursera-ml`](https:\u002F\u002Fgithub.com\u002Fvishpat\u002Fcandle-coursera-ml): Implementation of ML algorithms from Coursera's [Machine Learning Specialization](https:\u002F\u002Fwww.coursera.org\u002Fspecializations\u002Fmachine-learning-introduction) course.\n- [`kalosm`](https:\u002F\u002Fgithub.com\u002Ffloneum\u002Ffloneum\u002Ftree\u002Fmaster\u002Finterfaces\u002Fkalosm): A multi-modal meta-framework in Rust for interfacing with local pre-trained models with support for controlled generation, custom samplers, in-memory vector databases, audio transcription, and more.\n- [`candle-sampling`](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fcandle-sampling): Sampling techniques for Candle.\n- [`gpt-from-scratch-rs`](https:\u002F\u002Fgithub.com\u002Fjeroenvlek\u002Fgpt-from-scratch-rs): A port of Andrej Karpathy's _Let's build GPT_ tutorial on YouTube showcasing the Candle API on a toy problem.\n- [`candle-einops`](https:\u002F\u002Fgithub.com\u002Ftomsanbear\u002Fcandle-einops): A pure rust implementation of the python [einops](https:\u002F\u002Fgithub.com\u002Farogozhnikov\u002Feinops) library.\n- [`atoma-infer`](https:\u002F\u002Fgithub.com\u002Fatoma-network\u002Fatoma-infer): A Rust library for fast inference at scale, leveraging FlashAttention2 for efficient attention computation, PagedAttention for efficient KV-cache memory management, and multi-GPU support. 
It is OpenAI api compatible.\n- [`llms-from-scratch-rs`](https:\u002F\u002Fgithub.com\u002Fnerdai\u002Fllms-from-scratch-rs): A comprehensive Rust translation of the code from Sebastian Raschka's Build an LLM from Scratch book.\n- [`vllm.rs`](https:\u002F\u002Fgithub.com\u002Fguoqingbao\u002Fvllm.rs): A minimalist vLLM implementation in Rust based on Candle.\n\nIf you have an addition to this list, please submit a pull request.\n\n\u003C!--- ANCHOR_END: useful_libraries --->\n\n\u003C!--- ANCHOR: features --->\n\n## Features\n\n- Simple syntax, looks and feels like PyTorch.\n    - Model training.\n    - Embed user-defined ops\u002Fkernels, such as [flash-attention v2](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fblob\u002F89ba005962495f2bfbda286e185e9c3c7f5300a3\u002Fcandle-flash-attn\u002Fsrc\u002Flib.rs#L152).\n- Backends.\n    - Optimized CPU backend with optional MKL support for x86 and Accelerate for macs.\n    - CUDA backend for efficiently running on GPUs, multiple GPU distribution via NCCL.\n    - WASM support, run your models in a browser.\n- Included models.\n    - Language Models.\n        - LLaMA v1, v2, and v3 with variants such as SOLAR-10.7B.\n        - Falcon.\n        - StarCoder, StarCoder2.\n        - Phi 1, 1.5, 2, and 3.\n        - Mamba, Minimal Mamba\n        - Gemma v1 2b and 7b+, v2 2b and 9b.\n        - Mistral 7b v0.1.\n        - Mixtral 8x7b v0.1.\n        - StableLM-3B-4E1T, StableLM-2-1.6B, Stable-Code-3B.\n        - Replit-code-v1.5-3B.\n        - Bert.\n        - Yi-6B and Yi-34B.\n        - Qwen1.5, Qwen1.5 MoE, Qwen3 MoE.\n        - RWKV v5 and v6.\n    - Quantized LLMs.\n        - Llama 7b, 13b, 70b, as well as the chat and code variants.\n        - Mistral 7b, and 7b instruct.\n        - Mixtral 8x7b.\n        - Zephyr 7b a and b (Mistral-7b based).\n        - OpenChat 3.5 (Mistral-7b based).\n        - Qwen3 MoE (16B-A3B, 32B-A3B)\n    - Text to text.\n        - T5 and its variants: FlanT5, UL2, MADLAD400 (translation), CoEdit (Grammar correction).\n        - Marian MT (Machine Translation).\n    - Text to image.\n        - Stable Diffusion v1.5, v2.1, XL v1.0.\n        - Wurstchen v2.\n    - Image to text.\n        - BLIP.\n        - TrOCR.\n    - Audio.\n        - Whisper, multi-lingual speech-to-text.\n        - EnCodec, audio compression model.\n        - MetaVoice-1B, text-to-speech model.\n        - Parler-TTS, text-to-speech model.\n    - Computer Vision Models.\n        - DINOv2, ConvMixer, EfficientNet, ResNet, ViT, VGG, RepVGG, ConvNeXT,\n          ConvNeXTv2, MobileOne, EfficientVit (MSRA), MobileNetv4, Hiera, FastViT.\n        - yolo-v3, yolo-v8.\n        - Segment-Anything Model (SAM).\n        - SegFormer.\n- File formats: load models from safetensors, npz, ggml, or PyTorch files.\n- Serverless (on CPU), small and fast deployments.\n- Quantization support using the llama.cpp quantized types.\n\n\u003C!--- ANCHOR_END: features --->\n\n## How to use\n\n\u003C!--- ANCHOR: cheatsheet --->\nCheatsheet:\n\n|            | Using PyTorch                            | Using Candle                                                     |\n|------------|------------------------------------------|------------------------------------------------------------------|\n| Creation   | `torch.Tensor([[1, 2], [3, 4]])`         | `Tensor::new(&[[1f32, 2.], [3., 4.]], &Device::Cpu)?`           |\n| Creation   | `torch.zeros((2, 2))`                    | `Tensor::zeros((2, 2), DType::F32, &Device::Cpu)?`               |\n| Indexing   | `tensor[:, 
:4]`                          | `tensor.i((.., ..4))?`                                           |\n| Operations | `tensor.view((2, 2))`                    | `tensor.reshape((2, 2))?`                                        |\n| Operations | `a.matmul(b)`                            | `a.matmul(&b)?`                                                  |\n| Arithmetic | `a + b`                                  | `&a + &b`                                                        |\n| Device     | `tensor.to(device=\"cuda\")`               | `tensor.to_device(&Device::new_cuda(0)?)?`                            |\n| Dtype      | `tensor.to(dtype=torch.float16)`         | `tensor.to_dtype(&DType::F16)?`                                  |\n| Saving     | `torch.save({\"A\": A}, \"model.bin\")`      | `candle::safetensors::save(&HashMap::from([(\"A\", A)]), \"model.safetensors\")?` |\n| Loading    | `weights = torch.load(\"model.bin\")`      | `candle::safetensors::load(\"model.safetensors\", &device)`        |\n\n\u003C!--- ANCHOR_END: cheatsheet --->\n\n\n## Structure\n\n- [candle-core](.\u002Fcandle-core): Core ops, devices, and `Tensor` struct definition\n- [candle-nn](.\u002Fcandle-nn\u002F): Tools to build real models\n- [candle-examples](.\u002Fcandle-examples\u002F): Examples of using the library in realistic settings\n- [candle-kernels](.\u002Fcandle-kernels\u002F): CUDA custom kernels\n- [candle-datasets](.\u002Fcandle-datasets\u002F): Datasets and data loaders.\n- [candle-transformers](.\u002Fcandle-transformers): transformers-related utilities.\n- [candle-flash-attn](.\u002Fcandle-flash-attn): Flash attention v2 layer.\n- [candle-onnx](.\u002Fcandle-onnx\u002F): ONNX model evaluation.\n\n## FAQ\n\n### Why should I use Candle?\n\n\u003C!--- ANCHOR: goals --->\n\nCandle's core goal is to *make serverless inference possible*. Full machine learning frameworks like PyTorch\nare very large, which makes creating instances on a cluster slow. Candle allows deployment of lightweight\nbinaries.\n\nSecondly, Candle lets you *remove Python* from production workloads. Python overhead can seriously hurt performance,\nand the [GIL](https:\u002F\u002Fwww.backblaze.com\u002Fblog\u002Fthe-python-gil-past-present-and-future\u002F) is a notorious source of headaches.\n\nFinally, Rust is cool! A lot of the HF ecosystem already has Rust crates, like [safetensors](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fsafetensors) and [tokenizers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftokenizers).\n\n\u003C!--- ANCHOR_END: goals --->\n\n### Other ML frameworks\n\n- [dfdx](https:\u002F\u002Fgithub.com\u002Fcoreylowman\u002Fdfdx) is a formidable crate, with shapes being included\n  in types. This prevents a lot of headaches by getting the compiler to complain about shape mismatches right off the bat.\n  However, we found that some features still require nightly, and writing code can be a bit daunting for non rust experts.\n\n  We're leveraging and contributing to other core crates for the runtime so hopefully both crates can benefit from each\n  other.\n\n- [burn](https:\u002F\u002Fgithub.com\u002Fburn-rs\u002Fburn) is a general crate that can leverage multiple backends so you can choose the best\n  engine for your workload.\n\n- [tch-rs](https:\u002F\u002Fgithub.com\u002FLaurentMazare\u002Ftch-rs.git) Bindings to the torch library in Rust. Extremely versatile, but they \n  bring in the entire torch library into the runtime. 
The main contributor of `tch-rs` is also involved in the development\n  of `candle`.\n\n### Common Errors\n\n#### Missing symbols when compiling with the mkl feature.\n\nIf you get some missing symbols when compiling binaries\u002Ftests using the mkl\nor accelerate features, e.g. for mkl you get:\n```\n  = note: \u002Fusr\u002Fbin\u002Fld: (....o): in function `blas::sgemm':\n          ...\u002Fblas-0.22.0\u002Fsrc\u002Flib.rs:1944: undefined reference to `sgemm_' collect2: error: ld returned 1 exit status\n\n  = note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified\n  = note: use the `-l` flag to specify native libraries to link\n  = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo\n```\nor for accelerate:\n```\nUndefined symbols for architecture arm64:\n            \"_dgemm_\", referenced from:\n                candle_core::accelerate::dgemm::h1b71a038552bcabe in libcandle_core...\n            \"_sgemm_\", referenced from:\n                candle_core::accelerate::sgemm::h2cf21c592cba3c47 in libcandle_core...\n          ld: symbol(s) not found for architecture arm64\n```\n\nThis is likely due to a missing linker flag that was needed to enable the mkl library. You\ncan try adding the following for mkl at the top of your binary:\n```rust\nextern crate intel_mkl_src;\n```\nor for accelerate:\n```rust\nextern crate accelerate_src;\n```\n\n#### Cannot run the LLaMA examples: access to source requires login credentials\n\n```\nError: request error: https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-7b-hf\u002Fresolve\u002Fmain\u002Ftokenizer.json: status code 401\n```\n\nThis is likely because you don't have access to the LLaMA-v2 model. To fix\nthis, you have to register on the huggingface-hub, accept the [LLaMA-v2 model\nconditions](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-7b-hf), and set up your\nauthentication token. See issue\n[#350](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fissues\u002F350) for more details.\n\n#### Docker build\n\nWhen building CUDA kernels inside a Dockerfile, nvidia-smi cannot be used to auto-detect compute capability.\n\nYou must explicitly set CUDA_COMPUTE_CAP, for example:\n\n```\nFROM nvidia\u002Fcuda:12.9.0-devel-ubuntu22.04\n\n# Install git and curl\nRUN set -eux; \\\n  apt-get update; \\\n  apt-get install -y curl git ca-certificates;\n\n# Install Rust\nRUN curl --proto '=https' --tlsv1.2 -sSf https:\u002F\u002Fsh.rustup.rs | sh -s -- -y\n\n# Clone candle repo\nRUN git clone https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle.git\n\n# Set compute capability for the build\nARG CUDA_COMPUTE_CAP=90\nENV CUDA_COMPUTE_CAP=${CUDA_COMPUTE_CAP}\n\n# Build with explicit compute cap\nWORKDIR \u002Fapp\nCOPY . .\nRUN cargo build --release --features cuda\n```\n
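\nWith the Dockerfile above, the compute capability can then be overridden at image build time through the build argument. A minimal sketch (the `candle-cuda` image tag is arbitrary, and 86 is only an example value for consumer Ampere GPUs):\n\n```bash\n# Hypothetical invocation: forwards the value into the CUDA_COMPUTE_CAP build ARG\ndocker build --build-arg CUDA_COMPUTE_CAP=86 -t candle-cuda .\n```\n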
\n#### Compiling with flash-attention fails\n\n```\n\u002Fusr\u002Finclude\u002Fc++\u002F11\u002Fbits\u002Fstd_function.h:530:146: error: parameter packs not expanded with ‘...’:\n```\n\nThis is a bug in gcc-11 triggered by the Cuda compiler. To fix this, install a different, supported gcc version - for example gcc-10, and specify the path to the compiler in the NVCC_CCBIN environment variable.\n```\nenv NVCC_CCBIN=\u002Fusr\u002Flib\u002Fgcc\u002Fx86_64-linux-gnu\u002F10 cargo ...\n```\n\n#### Linking error on windows when running rustdoc or mdbook tests\n\n```\nCouldn't compile the test.\n---- .\\candle-book\\src\\inference\\hub.md - Using_the_hub::Using_in_a_real_model_ (line 50) stdout ----\nerror: linking with `link.exe` failed: exit code: 1181\n\u002F\u002Fvery long chain of linking\n = note: LINK : fatal error LNK1181: cannot open input file 'windows.0.48.5.lib'\n```\n\nMake sure you link all native libraries that might be located outside a project target, e.g., to run mdbook tests, you should run:\n\n```\nmdbook test candle-book -L .\\target\\debug\\deps\\ `\n-L native=$env:USERPROFILE\\.cargo\\registry\\src\\index.crates.io-6f17d22bba15001f\\windows_x86_64_msvc-0.42.2\\lib `\n-L native=$env:USERPROFILE\\.cargo\\registry\\src\\index.crates.io-6f17d22bba15001f\\windows_x86_64_msvc-0.48.5\\lib\n```\n\n#### Extremely slow model load time with WSL\n\nThis may be caused by the models being loaded from `\u002Fmnt\u002Fc`, more details on\n[stackoverflow](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F68972448\u002Fwhy-is-wsl-extremely-slow-when-compared-with-native-windows-npm-yarn-processing).\n\n#### Tracking down errors\n\nYou can set `RUST_BACKTRACE=1` to be provided with backtraces when a candle\nerror is generated.\n
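\nAs a minimal sketch, using the quantized example mentioned earlier in this README:\n\n```bash\n# Enable backtraces for a single run to locate where the error originates\nRUST_BACKTRACE=1 cargo run --example quantized --release\n```\n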
\n#### CudaRC error\n\nIf you encounter an error like `called `Result::unwrap()` on an `Err` value: LoadLibraryExW { source: Os { code: 126, kind: Uncategorized, message: \"The specified module could not be found.\" } }` on Windows, copy and rename the following three files, and make sure they are on your PATH. The exact paths depend on your cuda version.\n`c:\\Windows\\System32\\nvcuda.dll` -> `cuda.dll`\n`c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.4\\bin\\cublas64_12.dll` -> `cublas.dll`\n`c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.4\\bin\\curand64_10.dll` -> `curand.dll`\n","# candle\n[![discord 服务器](https:\u002F\u002Fdcbadge.limes.pink\u002Fapi\u002Fserver\u002Fhugging-face-879548962464493619)](https:\u002F\u002Fdiscord.gg\u002Fhugging-face-879548962464493619)\n[![最新版本](https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Fcandle-core.svg)](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fcandle-core)\n[![文档](https:\u002F\u002Fdocs.rs\u002Fcandle-core\u002Fbadge.svg)](https:\u002F\u002Fdocs.rs\u002Fcandle-core)\n[![许可证](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fbase-org\u002Fnode?color=blue)](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fblob\u002Fmain\u002FLICENSE-MIT)\n[![许可证](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-blue?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fblob\u002Fmain\u002FLICENSE-APACHE)\n\nCandle 是一个面向 Rust 的极简机器学习框架，专注于性能（包括 GPU 支持）和易用性。请尝试我们的在线演示： \n[whisper](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-whisper),\n[LLaMA2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2),\n[T5](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-T5-Generation-Wasm),\n[yolo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-yolo),\n[Segment Anything](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002Fcandle-segment-anything-wasm)。\n\n## 开始使用\n\n请确保已按照 [**安装**](https:\u002F\u002Fhuggingface.github.io\u002Fcandle\u002Fguide\u002Finstallation.html) 中的说明正确安装了 [`candle-core`](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Ftree\u002Fmain\u002Fcandle-core)。\n\n让我们看看如何运行一个简单的矩阵乘法。将以下内容写入你的 `myapp\u002Fsrc\u002Fmain.rs` 文件：\n```rust\nuse candle_core::{Device, Tensor};\n\nfn main() -> Result\u003C(), Box\u003Cdyn std::error::Error>> {\n    let device = Device::Cpu;\n\n    let a = Tensor::randn(0f32, 1., (2, 3), &device)?;\n    let b = Tensor::randn(0f32, 1., (3, 4), &device)?;\n\n    let c = a.matmul(&b)?;\n    println!(\"{c}\");\n    Ok(())\n}\n```\n\n运行 `cargo run` 应该会显示一个形状为 `Tensor[[2, 4], f32]` 的张量。\n\n\n如果你已经安装了支持 CUDA 的 `candle`，只需将 `device` 定义为 GPU 即可：\n\n```diff\n- let device = Device::Cpu;\n+ let device = Device::new_cuda(0)?;\n```\n\n有关更高级的示例，请参阅下一节。\n\n## 查看我们的示例\n\n这些在线演示完全在您的浏览器中运行：\n- [yolo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-yolo)：姿态估计和目标识别。\n- [whisper](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-whisper)：语音识别。\n- [LLaMA2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2)：文本生成。\n- [T5](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-T5-Generation-Wasm)：文本生成。\n- [Phi-1.5 和 Phi-2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-Phi-1.5-Wasm)：文本生成。\n- [Segment Anything Model](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002Fcandle-segment-anything-wasm)：图像分割。\n- [BLIP](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-BLIP-Image-Captioning)：图像描述。\n\n我们还提供了一些基于命令行的示例，使用最先进的模型：\n\n- [LLaMA v1、v2 和 v3](.\u002Fcandle-examples\u002Fexamples\u002Fllama\u002F)：通用 LLM，包括 SOLAR-10.7B 变体。\n- [Falcon](.\u002Fcandle-examples\u002Fexamples\u002Ffalcon\u002F)：通用 LLM。\n- 
[Codegeex4](.\u002Fcandle-examples\u002Fexamples\u002Fcodegeex4-9b\u002F)：代码补全、代码解释器、网页搜索、函数调用、仓库级任务。\n- [GLM4](.\u002Fcandle-examples\u002Fexamples\u002Fglm4\u002F)：由 THUDM 发布的开放多语言多模态聊天 LLM。\n- [Gemma v1 和 v2](.\u002Fcandle-examples\u002Fexamples\u002Fgemma\u002F)：来自 Google Deepmind 的 2b 和 7b+\u002F9b 通用 LLM。\n- [RecurrentGemma](.\u002Fcandle-examples\u002Fexamples\u002Frecurrent-gemma\u002F)：Google 推出的基于 Griffin 架构的 2b 和 7b 模型，将注意力机制与 RNN 式状态相结合。\n- [Phi-1、Phi-1.5、Phi-2 和 Phi-3](.\u002Fcandle-examples\u002Fexamples\u002Fphi\u002F)：1.3b、2.7b 和 3.8b 通用 LLM，性能可与 7b 模型媲美。\n- [StableLM-3B-4E1T](.\u002Fcandle-examples\u002Fexamples\u002Fstable-lm\u002F)：一个 3b 通用 LLM，在 1T 个英语和代码数据集上预训练而成。同时也支持 1.6b 的 StableLM-2，以及其代码变体。\n- [Mamba](.\u002Fcandle-examples\u002Fexamples\u002Fmamba\u002F)：仅用于推理的 Mamba 状态空间模型实现。\n- [Mistral7b-v0.1](.\u002Fcandle-examples\u002Fexamples\u002Fmistral\u002F)：一个 7b 通用 LLM，截至 2023 年 9 月 28 日，其性能优于所有公开可用的 13b 模型。\n- [Mixtral8x7b-v0.1](.\u002Fcandle-examples\u002Fexamples\u002Fmixtral\u002F)：一种稀疏专家混合的 8x7b 通用 LLM，性能优于 Llama 2 的 70B 模型，且推理速度更快。\n- [StarCoder](.\u002Fcandle-examples\u002Fexamples\u002Fbigcode\u002F) 和 [StarCoder2](.\u002Fcandle-examples\u002Fexamples\u002Fstarcoder2\u002F)：专门用于代码生成的 LLM。\n- [Qwen1.5](.\u002Fcandle-examples\u002Fexamples\u002Fqwen\u002F)：双语（英语\u002F中文）LLM。\n- [RWKV v5 和 v6](.\u002Fcandle-examples\u002Fexamples\u002Frwkv\u002F)：一种具有 Transformer 级别 LLM 性能的 RNN。\n- [Replit-code-v1.5](.\u002Fcandle-examples\u002Fexamples\u002Freplit-code\u002F)：一个 3.3b 专门用于代码补全的 LLM。\n- [Yi-6B \u002F Yi-34B](.\u002Fcandle-examples\u002Fexamples\u002Fyi\u002F)：两款双语（英语\u002F中文）通用 LLM，分别拥有 6b 和 34b 参数。\n- [量化 LLaMA](.\u002Fcandle-examples\u002Fexamples\u002Fquantized\u002F)：使用与 [llama.cpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp) 相同的量化技术对 LLaMA 模型进行量化。\n- [量化 Qwen3 MoE](.\u002Fcandle-examples\u002Fexamples\u002Fquantized-qwen3-moe\u002F)：支持 gguf 格式的 Qwen3 MoE 模型量化版本。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_candle_readme_b829a04f678b.gif\" width=\"600\">\n  \n- [Stable Diffusion](.\u002Fcandle-examples\u002Fexamples\u002Fstable-diffusion\u002F)：文本到图像生成模型，支持 1.5、2.1、SDXL 1.0 和 Turbo 版本。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_candle_readme_e0adeab9e540.jpg\" width=\"200\">\n\n- [Wuerstchen](.\u002Fcandle-examples\u002Fexamples\u002Fwuerstchen\u002F)：另一种文本到图像生成模型。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_candle_readme_0d9689cffc23.jpg\" width=\"200\">\n\n- [yolo-v3](.\u002Fcandle-examples\u002Fexamples\u002Fyolo-v3\u002F) 和 [yolo-v8](.\u002Fcandle-examples\u002Fexamples\u002Fyolo-v8\u002F)：目标检测和姿态估计模型。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_candle_readme_f5435313a8cd.jpg\" width=\"200\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_candle_readme_f57e6101131d.jpg\" width=\"200\">\n- [segment-anything](.\u002Fcandle-examples\u002Fexamples\u002Fsegment-anything\u002F)：带有提示的图像分割模型。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_candle_readme_178ff134614b.jpg\" width=\"200\">\n\n- [SegFormer](.\u002Fcandle-examples\u002Fexamples\u002Fsegformer\u002F): 基于Transformer的语义分割模型。\n- [Whisper](.\u002Fcandle-examples\u002Fexamples\u002Fwhisper\u002F): 语音识别模型。\n- [EnCodec](.\u002Fcandle-examples\u002Fexamples\u002Fencodec\u002F): 使用残差向量量化技术的高质量音频压缩模型。\n- 
[MetaVoice](.\u002Fcandle-examples\u002Fexamples\u002Fmetavoice\u002F): 文本到语音的基础模型。\n- [Parler-TTS](.\u002Fcandle-examples\u002Fexamples\u002Fparler-tts\u002F): 大型文本到语音模型。\n- [T5](.\u002Fcandle-examples\u002Fexamples\u002Ft5)、[Bert](.\u002Fcandle-examples\u002Fexamples\u002Fbert\u002F)、[JinaBert](.\u002Fcandle-examples\u002Fexamples\u002Fjina-bert\u002F)：适用于句子嵌入。\n- [DINOv2](.\u002Fcandle-examples\u002Fexamples\u002Fdinov2\u002F): 自监督训练的计算机视觉模型（可用于ImageNet分类、深度估计、分割）。\n- [VGG](.\u002Fcandle-examples\u002Fexamples\u002Fvgg\u002F)、[RepVGG](.\u002Fcandle-examples\u002Fexamples\u002Frepvgg\u002F)：计算机视觉模型。\n- [BLIP](.\u002Fcandle-examples\u002Fexamples\u002Fblip\u002F): 图像到文本模型，可用于为图像生成标题。\n- [CLIP](.\u002Fcandle-examples\u002Fexamples\u002Fclip\u002F): 多模态视觉与语言模型。\n- [TrOCR](.\u002Fcandle-examples\u002Fexamples\u002Ftrocr\u002F)：Transformer OCR模型，具有专门用于手写和印刷体识别的子模型。\n- [Marian-MT](.\u002Fcandle-examples\u002Fexamples\u002Fmarian-mt\u002F)：神经机器翻译模型，根据输入文本生成翻译后的文本。\n- [Moondream](.\u002Fcandle-examples\u002Fexamples\u002Fmoondream\u002F)：小型计算机视觉模型，能够回答关于图像的实际问题。\n\n可以通过类似以下命令运行这些模型：\n```\ncargo run --example quantized --release\n```\n\n若要使用**CUDA**，请在示例命令行中添加 `--features cuda`。如果已安装cuDNN，则可使用 `--features cudnn` 以获得更高的加速效果。\n\n此外，还有针对Whisper和[llama2.c](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fllama2.c)的一些Wasm示例。你可以使用`trunk`构建它们，或在线试用：\n[Whisper](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-whisper)、\n[llama2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2)、\n[T5](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-T5-Generation-Wasm)、\n[Phi-1.5和Phi-2](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002FCandle-Phi-1.5-Wasm)、\n[Segment Anything Model](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fradames\u002Fcandle-segment-anything-wasm)。\n\n对于LLaMA2，运行以下命令以获取权重文件并启动测试服务器：\n```bash\ncd candle-wasm-examples\u002Fllama2-c\nwget https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2\u002Fresolve\u002Fmain\u002Fmodel.bin\nwget https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Flmz\u002Fcandle-llama2\u002Fresolve\u002Fmain\u002Ftokenizer.json\ntrunk serve --release --port 8081\n```\n然后访问\n[http:\u002F\u002Flocalhost:8081\u002F](http:\u002F\u002Flocalhost:8081\u002F)。\n\n\u003C!--- ANCHOR: useful_libraries --->\n\n\n\n## 有用的外部资源\n- [`candle-tutorial`](https:\u002F\u002Fgithub.com\u002FToluClassics\u002Fcandle-tutorial)：一个非常详细的教程，展示如何将PyTorch模型转换为Candle。\n- [`candle-lora`](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fcandle-lora)：高效且易用的Candle LoRA实现。`candle-lora`为Candle中的许多模型提供了开箱即用的LoRA支持，相关示例可在[这里](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fcandle-lora\u002Ftree\u002Fmaster\u002Fcandle-lora-transformers\u002Fexamples)找到。\n- [`candle-video`](https:\u002F\u002Fgithub.com\u002FFerrisMind\u002Fcandle-video)：基于Candle构建的Rust库，用于文本到视频生成（LTX-Video及相关模型），专注于快速、无需Python的推理。\n- [`optimisers`](https:\u002F\u002Fgithub.com\u002FKGrewal1\u002Foptimisers)：一系列优化器，包括带有动量的SGD、AdaGrad、AdaDelta、AdaMax、NAdam、RAdam和RMSprop。\n- [`candle-vllm`](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fcandle-vllm)：高效的本地大语言模型推理和服务平台，包含兼容OpenAI API的服务器。\n- [`candle-ext`](https:\u002F\u002Fgithub.com\u002Fmokeyish\u002Fcandle-ext)：Candle的扩展库，提供目前Candle中尚未实现的PyTorch功能。\n- [`candle-coursera-ml`](https:\u002F\u002Fgithub.com\u002Fvishpat\u002Fcandle-coursera-ml)：Coursera《机器学习专项课程》中ML算法的实现。\n- 
[`kalosm`](https:\u002F\u002Fgithub.com\u002Ffloneum\u002Ffloneum\u002Ftree\u002Fmaster\u002Finterfaces\u002Fkalosm)：Rust中的多模态元框架，用于对接本地预训练模型，支持可控生成、自定义采样器、内存中向量数据库、音频转录等功能。\n- [`candle-sampling`](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fcandle-sampling)：Candle的采样技术。\n- [`gpt-from-scratch-rs`](https:\u002F\u002Fgithub.com\u002Fjeroenvlek\u002Fgpt-from-scratch-rs)：Andrej Karpathy在YouTube上发布的“让我们构建GPT”教程的Rust移植版，展示了Candle API在玩具问题上的应用。\n- [`candle-einops`](https:\u002F\u002Fgithub.com\u002Ftomsanbear\u002Fcandle-einops)：纯Rust实现的Python [einops](https:\u002F\u002Fgithub.com\u002Farogozhnikov\u002Feinops)库。\n- [`atoma-infer`](https:\u002F\u002Fgithub.com\u002Fatoma-network\u002Fatoma-infer)：Rust库，用于大规模快速推理，利用FlashAttention2进行高效的注意力计算、PagedAttention进行高效的KV缓存管理，并支持多GPU。它兼容OpenAI API。\n- [`llms-from-scratch-rs`](https:\u002F\u002Fgithub.com\u002Fnerdai\u002Fllms-from-scratch-rs)：Sebastian Raschka的《从零开始构建LLM》一书代码的全面Rust翻译。\n- [`vllm.rs`](https:\u002F\u002Fgithub.com\u002Fguoqingbao\u002Fvllm.rs)：基于Candle的极简Rust vLLM实现。\n\n如果您有其他补充，请提交拉取请求。\n\n\u003C!--- ANCHOR_END: useful_libraries --->\n\n\u003C!--- ANCHOR: features --->\n\n## 特性\n\n- 简单的语法，使用起来感觉就像 PyTorch 一样。\n    - 模型训练。\n    - 可以嵌入用户自定义的操作\u002F内核，例如 [flash-attention v2](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fblob\u002F89ba005962495f2bfbda286e185e9c3c7f5300a3\u002Fcandle-flash-attn\u002Fsrc\u002Flib.rs#L152)。\n- 后端支持。\n    - 针对 x86 的优化 CPU 后端，可选 MKL 支持；针对 Mac 的 Accelerate 后端。\n    - CUDA 后端，可在 GPU 上高效运行，并通过 NCCL 实现多 GPU 分布式计算。\n    - WASM 支持，可以在浏览器中运行模型。\n- 内置模型。\n    - 语言模型。\n        - LLaMA v1、v2 和 v3，以及 SOLAR-10.7B 等变体。\n        - Falcon。\n        - StarCoder、StarCoder2。\n        - Phi 1、1.5、2 和 3。\n        - Mamba、Minimal Mamba。\n        - Gemma v1 2b 和 7b+，v2 2b 和 9b。\n        - Mistral 7b v0.1。\n        - Mixtral 8x7b v0.1。\n        - StableLM-3B-4E1T、StableLM-2-1.6B、Stable-Code-3B。\n        - Replit-code-v1.5-3B。\n        - Bert。\n        - Yi-6B 和 Yi-34B。\n        - Qwen1.5、Qwen1.5 MoE、Qwen3 MoE。\n        - RWKV v5 和 v6。\n    - 量化语言模型。\n        - Llama 7b、13b、70b，以及聊天和代码版本。\n        - Mistral 7b 和 7b 指令版。\n        - Mixtral 8x7b。\n        - Zephyr 7b a 和 b（基于 Mistral-7b）。\n        - OpenChat 3.5（基于 Mistral-7b）。\n        - Qwen3 MoE（16B-A3B、32B-A3B）。\n    - 文本到文本。\n        - T5 及其变体：FlanT5、UL2、MADLAD400（翻译）、CoEdit（语法修正）。\n        - Marian MT（机器翻译）。\n    - 文本到图像。\n        - Stable Diffusion v1.5、v2.1、XL v1.0。\n        - Wurstchen v2。\n    - 图像到文本。\n        - BLIP。\n        - TrOCR。\n    - 音频。\n        - Whisper，多语言语音转文本。\n        - EnCodec，音频压缩模型。\n        - MetaVoice-1B，文本到语音模型。\n        - Parler-TTS，文本到语音模型。\n    - 计算机视觉模型。\n        - DINOv2、ConvMixer、EfficientNet、ResNet、ViT、VGG、RepVGG、ConvNeXT、\n          ConvNeXTv2、MobileOne、EfficientVit（MSRA）、MobileNetv4、Hiera、FastViT。\n        - YOLO-v3、YOLO-v8。\n        - Segment-Anything Model (SAM)。\n        - SegFormer。\n- 文件格式：支持从 safetensors、npz、ggml 或 PyTorch 文件加载模型。\n- 无服务器（在 CPU 上），小型且快速的部署。\n- 支持使用 llama.cpp 的量化类型进行量化。\n\n\u003C!--- ANCHOR_END: features --->\n\n## 使用方法\n\n\u003C!--- ANCHOR: cheatsheet --->\n速查表：\n\n|            | 使用 PyTorch                            | 使用 Candle                                                     |\n|------------|------------------------------------------|------------------------------------------------------------------|\n| 创建       | `torch.Tensor([[1, 2], [3, 4]])`         | `Tensor::new(&[[1f32, 2.], [3., 4.]], &Device::Cpu)?`           |\n| 创建       | `torch.zeros((2, 2))`                    | `Tensor::zeros((2, 2), 
DType::F32, &Device::Cpu)?`               |\n| 索引       | `tensor[:, :4]`                          | `tensor.i((.., ..4))?`                                           |\n| 操作       | `tensor.view((2, 2))`                    | `tensor.reshape((2, 2))?`                                        |\n| 操作       | `a.matmul(b)`                            | `a.matmul(&b)?`                                                  |\n| 算术       | `a + b`                                  | `&a + &b`                                                        |\n| 设备       | `tensor.to(device=\"cuda\")`               | `tensor.to_device(&Device::new_cuda(0)?)?`                            |\n| 数据类型   | `tensor.to(dtype=torch.float16)`         | `tensor.to_dtype(&DType::F16)?`                                  |\n| 保存       | `torch.save({\"A\": A}, \"model.bin\")`      | `candle::safetensors::save(&HashMap::from([(\"A\", A)]), \"model.safetensors\")?` |\n| 加载       | `weights = torch.load(\"model.bin\")`      | `candle::safetensors::load(\"model.safetensors\", &device)`        |\n\n\u003C!--- ANCHOR_END: cheatsheet --->\n\n\n## 结构\n\n- [candle-core](.\u002Fcandle-core)：核心操作、设备以及 `Tensor` 结构的定义。\n- [candle-nn](.\u002Fcandle-nn\u002F)：用于构建实际模型的工具。\n- [candle-examples](.\u002Fcandle-examples\u002F)：库在实际场景中的使用示例。\n- [candle-kernels](.\u002Fcandle-kernels\u002F)：CUDA 自定义内核。\n- [candle-datasets](.\u002Fcandle-datasets\u002F)：数据集和数据加载器。\n- [candle-transformers](.\u002Fcandle-transformers)：与 Transformer 相关的工具。\n- [candle-flash-attn](.\u002Fcandle-flash-attn)：Flash Attention v2 层。\n- [candle-onnx](.\u002Fcandle-onnx\u002F)：ONNX 模型评估。\n\n## 常见问题解答\n\n### 为什么应该使用 Candle？\n\n\u003C!--- ANCHOR: goals --->\n\nCandle 的核心目标是 *实现无服务器推理*。像 PyTorch 这样的完整机器学习框架体积庞大，导致在集群上创建实例的速度较慢。而 Candle 则允许部署轻量级的二进制文件。\n\n其次，Candle 能够让你 *将 Python 从生产工作负载中移除*。Python 的开销会严重降低性能，而 [GIL](https:\u002F\u002Fwww.backblaze.com\u002Fblog\u002Fthe-python-gil-past-present-and-future\u002F) 更是出了名的麻烦来源。\n\n最后，Rust 很酷！HF 生态系统中已经有很多 Rust crate，比如 [safetensors](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fsafetensors) 和 [tokenizers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftokenizers)。\n\n\u003C!--- ANCHOR_END: goals --->\n\n### 其他 ML 框架\n\n- [dfdx](https:\u002F\u002Fgithub.com\u002Fcoreylowman\u002Fdfdx) 是一个功能强大的 crate，它将形状信息纳入类型系统中。这样可以避免很多因形状不匹配而导致的问题，编译器会在第一时间报错。不过，我们发现某些功能仍然需要 nightly 版本，而且对于非 Rust 专家来说，编写代码可能会有些困难。\n\n我们正在利用并为其他核心 crate 提供支持，希望这两个 crate 能够互相受益。\n\n- [burn](https:\u002F\u002Fgithub.com\u002Fburn-rs\u002Fburn) 是一个通用的 crate，可以利用多种后端，从而让你根据工作负载选择最合适的引擎。\n\n- [tch-rs](https:\u002F\u002Fgithub.com\u002FLaurentMazare\u002Ftch-rs.git) 是 Rust 中对 torch 库的绑定。它非常灵活，但会将整个 torch 库引入运行时环境。`tch-rs` 的主要贡献者也参与了 `candle` 的开发。\n\n### 常见错误\n\n#### 使用 mkl 功能编译时缺少符号。\n\n如果你在使用 mkl 或 accelerate 功能编译二进制文件或测试时遇到一些缺失的符号，例如对于 mkl 你会看到：\n```\n  = note: \u002Fusr\u002Fbin\u002Fld: (....o): in function `blas::sgemm':\n          ...\u002Fblas-0.22.0\u002Fsrc\u002Flib.rs:1944: undefined reference to `sgemm_' collect2: error: ld returned 1 exit status\n\n  = note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified\n  = note: use the `-l` flag to specify native libraries to link\n  = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo\n```\n或者对于 accelerate：\n```\nUndefined symbols for architecture arm64:\n            \"_dgemm_\", referenced from:\n                candle_core::accelerate::dgemm::h1b71a038552bcabe in libcandle_core...\n            
\"_sgemm_\", referenced from:\n                candle_core::accelerate::sgemm::h2cf21c592cba3c47 in libcandle_core...\n          ld: symbol(s) not found for architecture arm64\n```\n\n这很可能是由于缺少启用 mkl 库所需的链接器标志。你可以尝试在你的二进制文件顶部添加以下内容以解决 mkl 的问题：\n```rust\nextern crate intel_mkl_src;\n```\n或者对于 accelerate：\n```rust\nextern crate accelerate_src;\n```\n\n#### 无法运行 LLaMA 示例：访问源需要登录凭证\n\n```\nError: request error: https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-7b-hf\u002Fresolve\u002Fmain\u002Ftokenizer.json: status code 401\n```\n\n这很可能是因为你没有 LLaMA-v2 模型的权限。要解决这个问题，你需要在 huggingface-hub 上注册，接受 [LLaMA-v2 模型条款](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-7b-hf)，并设置你的认证令牌。更多详情请参阅 issue [#350](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fissues\u002F350)。\n\n#### Docker 构建\n\n在 Dockerfile 中构建 CUDA 内核时，nvidia-smi 无法用于自动检测计算能力。\n\n你必须显式设置 CUDA_COMPUTE_CAP，例如：\n\n```\nFROM nvidia\u002Fcuda:12.9.0-devel-ubuntu22.04\n\n# 安装 git 和 curl\nRUN set -eux; \\\n  apt-get update; \\\n  apt-get install -y curl git ca-certificates;\n\n# 安装 Rust\nRUN curl --proto '=https' --tlsv1.2 -sSf https:\u002F\u002Fsh.rustup.rs | sh -s -- -y\n\n# 克隆 candle 仓库\nRUN git clone https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle.git\n\n# 设置构建的计算能力\nARG CUDA_COMPUTE_CAP=90\nENV CUDA_COMPUTE_CAP=${CUDA_COMPUTE_CAP}\n\n# 使用显式计算能力进行构建\nWORKDIR \u002Fapp\nCOPY . .\nRUN cargo build --release features cuda\n```\n\n#### 使用 flash-attention 编译失败\n\n```\n\u002Fusr\u002Finclude\u002Fc++\u002F11\u002Fbits\u002Fstd_function.h:530:146: error: parameter packs not expanded with ‘...’:\n```\n\n这是由 Cuda 编译器触发的 gcc-11 中的一个 bug。要修复这个问题，可以安装一个受支持的其他版本的 gcc，例如 gcc-10，并将编译器路径指定到 NVCC_CCBIN 环境变量中。\n```\nenv NVCC_CCBIN=\u002Fusr\u002Flib\u002Fgcc\u002Fx86_64-linux-gnu\u002F10 cargo ...\n```\n\n#### 在 Windows 上运行 rustdoc 或 mdbook 测试时出现链接错误\n\n```\nCouldn't compile the test.\n---- .\\candle-book\\src\\inference\\hub.md - Using_the_hub::Using_in_a_real_model_ (line 50) stdout ----\nerror: linking with `link.exe` failed: exit code: 1181\n\u002F\u002Fvery long chain of linking\n = note: LINK : fatal error LNK1181: cannot open input file 'windows.0.48.5.lib'\n```\n\n确保链接所有可能位于项目目标之外的原生库。例如，要运行 mdbook 测试，你应该执行以下命令：\n```\nmdbook test candle-book -L .\\target\\debug\\deps\\ `\n-L native=$env:USERPROFILE\\.cargo\\registry\\src\\index.crates.io-6f17d22bba15001f\\windows_x86_64_msvc-0.42.2\\lib `\n-L native=$env:USERPROFILE\\.cargo\\registry\\src\\index.crates.io-6f17d22bba15001f\\windows_x86_64_msvc-0.48.5\\lib\n```\n\n#### WSL 下模型加载时间极慢\n\n这可能是由于模型从 `\u002Fmnt\u002Fc` 加载导致的，更多详细信息请参阅 [stackoverflow](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F68972448\u002Fwhy-is-wsl-extremely-slow-when-compared-with-native-windows-npm-yarn-processing)。\n\n#### 跟踪错误\n\n你可以设置 `RUST_BACKTRACE=1`，以便在 candle 报错时提供完整的调用栈信息。\n\n#### CudaRC 错误\n\n如果你在 Windows 上遇到类似 `called `Result::unwrap()` on an `Err` value: LoadLibraryExW { source: Os { code: 126, kind: Uncategorized, message: \"The specified module could not be found.\" } }` 的错误，可以通过复制并重命名以下 3 个文件来解决（确保它们在系统路径中）。具体路径取决于你的 CUDA 版本。\n`c:\\Windows\\System32\\nvcuda.dll` -> `cuda.dll`\n`c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.4\\bin\\cublas64_12.dll` -> `cublas.dll`\n`c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.4\\bin\\curand64_10.dll` -> `curand.dll`","# Candle 快速上手指南\n\nCandle 是一个由 Hugging Face 推出的极简 Rust 机器学习框架，专注于高性能（支持 GPU）和易用性。它允许开发者在 Rust 环境中高效运行大语言模型（LLM）、计算机视觉模型及语音识别模型。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux, 
\n## 进阶示例运行\n\nCandle 官方提供了大量预置示例（如 LLaMA, Whisper, YOLO, Stable Diffusion 等）。你可以直接克隆仓库并运行这些示例。\n\n1.  **获取源码**:\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle.git\n    cd candle\n    ```\n\n
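    （可选）运行前可以先列出内置示例清单，确认要运行的示例名称（以下命令仅为示意）：\n    ```bash\n    # candle-examples\u002Fexamples\u002F 下的每个子目录对应一个可用的 --example 名称\n    ls candle-examples\u002Fexamples\u002F\n    ```\n\n2.  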
**运行示例**:\n    例如，运行量化版的 LLaMA 示例：\n    \n    *CPU 运行*:\n    ```bash\n    cargo run --example quantized --release\n    ```\n\n    *GPU 加速运行*:\n    ```bash\n    cargo run --example quantized --release --features cuda\n    ```\n    \n    *使用 cuDNN 进一步加速*:\n    ```bash\n    cargo run --example quantized --release --features cudnn\n    ```\n\n更多模型示例（如 Mistral, Gemma, Phi-3 等）位于 `candle-examples\u002Fexamples\u002F` 目录下，可根据具体模型目录查看对应的权重下载和运行说明。","一家专注于边缘计算的初创团队，正试图将大型语言模型（LLM）部署到资源受限的 IoT 网关设备上，以提供本地化的智能客服功能。\n\n### 没有 candle 时\n- **内存占用过高**：依赖 Python 解释器及庞大的深度学习框架（如 PyTorch），导致在仅有 2GB 内存的设备上无法加载模型，频繁发生 OOM 崩溃。\n- **启动延迟严重**：冷启动时需要初始化复杂的运行时环境，用户发出请求后需等待数秒才能收到响应，体验极差。\n- **部署流程繁琐**：需要在目标设备上交叉编译安装 Python 依赖和 CUDA\u002FcuDNN 库，环境配置极易出错，维护成本高昂。\n- **类型安全隐患**：动态语言特性使得内存管理和数据类型错误只能在运行时发现，增加了系统在生产环境中的不稳定性。\n\n### 使用 candle 后\n- **极致轻量运行**：利用 Rust 的零成本抽象和无垃圾回收机制，candle 将模型推理的内存占用降低了 60%，成功在低配设备上流畅运行 LLaMA 等模型。\n- **毫秒级响应**：去除了沉重的解释器开销，结合 candle 对 GPU\u002FCPU 的高效调度，首字生成时间从秒级缩短至毫秒级，实现实时交互。\n- **单二进制交付**：借助 Cargo 构建能力，整个应用被编译为独立的静态二进制文件，无需任何外部依赖即可直接部署到各类 Linux 边缘节点。\n- **编译期安全保障**：Rust 的强类型系统在编译阶段即可拦截维度不匹配或内存访问错误，显著提升了长期运行的可靠性。\n\ncandle 通过极简的 Rust 原生架构，打破了高性能 AI 模型在边缘设备部署的资源与效率瓶颈。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_candle_6d632687.png","huggingface","Hugging Face","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fhuggingface_90da21a4.png","The AI community building the future.",null,"https:\u002F\u002Fhuggingface.co\u002F","https:\u002F\u002Fgithub.com\u002Fhuggingface",[83,87,91,94,98,102,106,110,114,117],{"name":84,"color":85,"percentage":86},"Rust","#dea584",77.3,{"name":88,"color":89,"percentage":90},"Metal","#8f14e9",6.8,{"name":92,"color":93,"percentage":90},"C++","#f34b7d",{"name":95,"color":96,"percentage":97},"Cuda","#3A4E3A",5,{"name":99,"color":100,"percentage":101},"HTML","#e34c26",1.8,{"name":103,"color":104,"percentage":105},"Python","#3572A5",1.7,{"name":107,"color":108,"percentage":109},"JavaScript","#f1e05a",0.5,{"name":111,"color":112,"percentage":113},"Shell","#89e051",0,{"name":115,"color":116,"percentage":113},"C","#555555",{"name":118,"color":119,"percentage":113},"Makefile","#427819",19901,1508,"2026-04-05T20:25:36","Apache-2.0",4,"Linux, macOS, Windows","非必需。支持 NVIDIA GPU (需安装 CUDA，可选 cuDNN 加速)；支持 WebAssembly (浏览器运行)。具体显存取决于运行的模型大小。","未说明 (取决于运行的模型大小)",{"notes":129,"python":130,"dependencies":131},"这是一个基于 Rust 的机器学习框架，无需 Python 环境。若需启用 GPU 加速，需在编译时添加 '--features cuda' 标志（若使用 cuDNN 则添加 '--features cudnn'）。部分示例支持在浏览器中通过 WebAssembly 运行。运行大型模型（如 LLaMA、Stable Diffusion）时，内存和显存需求随模型参数量增加而显著增长。","不需要 (基于 Rust)",[132,133,134,135,136],"Rust 工具链","candle-core","CUDA Toolkit (可选，用于 GPU 支持)","cuDNN (可选，用于加速)","trunk (可选，用于构建 Wasm 示例)",[26,14,55,13],"2026-03-27T02:49:30.150509","2026-04-06T09:06:53.939990",[141,146,151,156,161,166,171,176],{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},18595,"如何在 Linux 上运行带有 Flash Attention 的示例时解决编译错误？","如果在更新系统文件后无法在 Linux 上构建带有 flash-attn 的项目（例如使用 RTX 4000 ADA），请确保设置了正确的 CUDA 计算能力标志。尝试使用以下命令运行：\nCUDA_COMPUTE_CAP=80 cargo run --example stable-diffusion-3 --features cuda,cudnn,flash-attn -- --prompt \"something cool\" --which 3.5-large --use-flash-attn\n如果仍然报错，可能需要检查 CUDA 版本是否与 cutlass 兼容，或重新安装 CUDA 工具包。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fissues\u002F2816",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},18596,"Candle 是否支持量化模型（如 Q2-Q8）？如何运行量化后的模型？","是的，Candle 支持量化模型。你可以使用 tensor-tools 将 safetensors 模型转换为 GGUF 格式并指定量化等级。例如，将模型量化为 q8_0 格式的命令如下：\ncargo run --example 
tensor-tools --release -- quantize --quantization q8_0 model.safetensors --out-file model-q80.gguf\n之后即可加载该 .gguf 文件运行模型。部分配置信息会在加载时根据权重形状自动推断。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fissues\u002F359",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},18597,"为什么运行量化示例时出现 'libcuda.so: cannot open shared object file' 错误？","该错误通常表示系统未正确安装 NVIDIA 驱动或 CUDA 运行时库。请确保已安装与你的 GPU 匹配的 NVIDIA 驱动，并且 libcuda.so 存在于系统库路径中（如 \u002Fusr\u002Flib\u002Fx86_64-linux-gnu\u002F）。你可以通过运行 ldconfig -p | grep libcuda 检查是否存在。若缺失，请重新安装 nvidia-driver 和 cuda-toolkit 包。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fissues\u002F2175",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},18598,"如何在 Apple Silicon (M1\u002FM2\u002FM3) 上运行 Candle？是否支持 MPS 后端？","目前 Candle 对 Apple Silicon 的支持正在发展中，已有社区项目基于 Candle 封装并在 M 系列芯片上进行了基准测试，表现良好。虽然原生 MPS 后端尚未完全成熟，但你可以参考相关封装仓库获取在 Mac 上运行的优化方案。建议关注官方后续更新以获取原生 MPS 支持。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fissues\u002F313",{"id":162,"question_zh":163,"answer_zh":164,"source_url":165},18599,"使用 tensor-tools 量化 Llama 模型后加载失败，提示缺少 'llama.attention.head_count' 元数据，如何解决？","这是因为量化过程中丢失了原始配置文件中的元数据。当前解决方案是在加载 .gguf 权重时，程序会根据权重形状自动推断配置。此方法仅适用于与 Karpathy 仓库中使用的配置变体一致的模型。如遇此问题，请确保模型结构符合预期，或手动补充缺失的配置参数。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fissues\u002F1182",{"id":167,"question_zh":168,"answer_zh":169,"source_url":170},18600,"Candle 是否支持强化学习（Reinforcement Learning）？是否有相关示例？","是的，Candle 现已提供强化学习的支持，并包含了相关的示例代码。你可以查看官方仓库中的 examples 目录，寻找 RL 相关的示例项目进行学习和实践。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fissues\u002F1065",{"id":172,"question_zh":173,"answer_zh":174,"source_url":175},18601,"是否有量化版 Whisper 模型的示例？如何自行量化 Whisper 模型？","目前官方尚未提供专门的量化 Whisper 示例，但你可以使用 tensor-tools 自行量化。命令如下：\ncd tensor-tools\ncargo run --release -- quantize --quantization q8_0 ..\u002F..\u002Fmodel.safetensors --out-file model-q80.gguf\n该命令不依赖 whisper.cpp，可直接将 safetensors 转为 gguf 格式用于推理。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fissues\u002F574",{"id":177,"question_zh":178,"answer_zh":179,"source_url":180},18602,"使用 --features cuda 编译示例时遇到解析器警告或编译失败，如何处理？","如果出现关于 resolver 版本的警告（如 edition 2021 默认使用 resolver = \"2\"，而虚拟工作区默认为 \"1\"），请在 workspace 根目录的 Cargo.toml 中显式指定：\n[workspace]\nresolver = \"2\"\n此外，确保你的 GPU 计算能力不低于 7.5，否则可能因架构过旧导致 CUDA 支持被弃用。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle\u002Fissues\u002F353",[]]