[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-EricLBuehler--mistral.rs":3,"tool-EricLBuehler--mistral.rs":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":79,"owner_url":80,"languages":81,"stars":120,"forks":121,"last_commit_at":122,"license":123,"difficulty_score":23,"env_os":124,"env_gpu":125,"env_ram":126,"env_deps":127,"category_tags":135,"github_topics":136,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":140,"updated_at":141,"faqs":142,"releases":172},3872,"EricLBuehler\u002Fmistral.rs","mistral.rs","Fast, flexible LLM inference","mistral.rs 是一款基于 Rust 构建的高性能大语言模型推理引擎，旨在为用户提供快速、灵活且零配置的本地 AI 体验。它解决了传统推理工具配置繁琐、多模态支持分散以及硬件适配复杂等痛点，让用户只需一条命令即可运行来自 Hugging Face 的各类模型。\n\n无论是希望快速部署本地聊天机器人的开发者、需要高效测试不同量化策略的研究人员，还是想要在不编写代码的情况下体验最新多模态模型的普通用户，mistral.rs 都能满足需求。其核心亮点包括真正的“全模态”支持，能够在一个引擎中处理文本、图像、视频和音频输入；内置智能硬件调优功能，可自动基准测试并选择最适合当前设备的量化方案；同时提供原生 Web 界面和 Python\u002FRust SDK，方便集成与二次开发。此外，它还支持连续的批处理技术和多种量化格式，确保在消费级显卡上也能获得流畅的推理速度。通过简单的命令行操作，用户即可轻松开启从交互式对话到复杂代理任务的各种应用场景。","\u003Ca name=\"top\">\u003C\u002Fa>\n\u003C!--\n\u003Ch1 align=\"center\">\n  mistral.rs\n\u003C\u002Fh1>\n-->\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEricLBuehler_mistral.rs_readme_0ecec90712c6.png\" alt=\"mistral.rs\" width=\"100%\" style=\"max-width: 800px;\">\n\u003C\u002Fdiv>\n\n\u003Ch3 align=\"center\">\nFast, flexible LLM inference.\n\u003C\u002Fh3>\n\n\u003Cp align=\"center\">\n  | \u003Ca href=\"https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002F\">\u003Cb>Documentation\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fcrates.io\u002Fcrates\u002Fmistralrs\">\u003Cb>Rust SDK\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FPYTHON_SDK.html\">\u003Cb>Python SDK\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FSZrecqK8qw\">\u003Cb>Discord\u003C\u002Fb>\u003C\u002Fa> |\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fstargazers\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FEricLBuehler\u002Fmistral.rs?style=social&label=Star\" alt=\"GitHub stars\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n## Latest\n\n- **Gemma 4**: Full multimodal: text, image, video, and audio input. [Guide](docs\u002FGEMMA4.md) | [Video setup](docs\u002FVIDEO.md)\n- **MXFP4 ISQ quantization**: MXFP4 with optimized decode kernels for faster, smaller models. [Quantization docs](docs\u002FQUANTS.md)\n- **Qwen 3.5 model family**: Support for the Qwen 3.5 series including vision. [Guide](docs\u002FQWEN3_5.md)\n\n## Why mistral.rs?\n\n- **Any Hugging Face model, zero config**: Just `mistralrs run -m user\u002Fmodel`.\n- **True multimodality**: Text, vision, video, and audio, speech generation, image generation, and embeddings in one engine.\n- **Full quantization control**: Choose the precise quantization you want to use, or make your own UQFF with `mistralrs quantize`.\n- **Built-in web UI**: `mistralrs serve --ui` gives you a web interface instantly.\n- **Hardware-aware**: `mistralrs tune` benchmarks your system and picks optimal quantization + device mapping.\n- **Flexible SDKs**: Python package and Rust crate to build your projects.\n- **Agentic features** — tool calling, web search, and MCP client built in\n\n## Quick Start\n\n### Install\n\n**Linux\u002FmacOS:**\n```bash\ncurl --proto '=https' --tlsv1.2 -sSf https:\u002F\u002Fraw.githubusercontent.com\u002FEricLBuehler\u002Fmistral.rs\u002Fmaster\u002Finstall.sh | sh\n```\n\n**Windows (PowerShell):**\n```powershell\nirm https:\u002F\u002Fraw.githubusercontent.com\u002FEricLBuehler\u002Fmistral.rs\u002Fmaster\u002Finstall.ps1 | iex\n```\n\n[Manual installation & other platforms](docs\u002FINSTALLATION.md)\n\n### Run Your First Model\n\n```bash\n# Interactive chat\nmistralrs run -m Qwen\u002FQwen3-4B\n\n# One-shot prompt (no interactive session)\nmistralrs run -m Qwen\u002FQwen3-4B -i \"What is the capital of France?\"\n\n# One-shot with an image\nmistralrs run -m google\u002Fgemma-4-E4B-it --image photo.jpg -i \"Describe this image\"\n\n# Or start a server with web UI\nmistralrs serve --ui -m google\u002Fgemma-4-E4B-it\n```\n\nThen visit `http:\u002F\u002Flocalhost:1234\u002Fui` for the web chat interface.\n\n### The `mistralrs` CLI\n\nThe CLI is designed to be **zero-config**: just point it at a model and go.\n\n- **Auto-detection**: Automatically detects model architecture, quantization format, and chat template\n- **All-in-one**: Single binary for chat, server, benchmarks, and web UI (`run`, `serve`, `bench`)\n- **Hardware tuning**: Run `mistralrs tune` to automatically benchmark and configure optimal settings for your hardware\n- **Format-agnostic**: Works with Hugging Face models, GGUF files, and [UQFF quantizations](docs\u002FUQFF.md) seamlessly\n\n```bash\n# Auto-tune for your hardware and emit a config file\nmistralrs tune -m Qwen\u002FQwen3-4B --emit-config config.toml\n\n# Run using the generated config\nmistralrs from-config -f config.toml\n\n# Diagnose system issues (CUDA, Metal, HuggingFace connectivity)\nmistralrs doctor\n```\n\n[Full CLI documentation](docs\u002FCLI.md)\n\n\u003Cdetails open>\n  \u003Csummary>\u003Cb>Web Chat Demo\u003C\u002Fb>\u003C\u002Fsummary>\n  \u003Cbr>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEricLBuehler_mistral.rs_readme_9cf2e024ee71.gif\" alt=\"Web Chat UI Demo\" \u002F>\n\u003C\u002Fdetails>\n\n## What Makes It Fast\n\n**Performance**\n- Continuous batching support by default on all devices.\n- CUDA with [FlashAttention](docs\u002FFLASH_ATTENTION.md) V2\u002FV3, Metal, [multi-GPU tensor parallelism](docs\u002FDISTRIBUTED\u002FDISTRIBUTED.md)\n- [PagedAttention](docs\u002FPAGED_ATTENTION.md) for high throughput continuous batching on CUDA or Apple Silicon, prefix caching (including multimodal)\n\n**Quantization** ([full docs](docs\u002FQUANTS.md))\n- [In-situ quantization (ISQ)](docs\u002FISQ.md) of any Hugging Face model\n- GGUF (2-8 bit), GPTQ, AWQ, HQQ, FP8, BNB support\n- ⭐ [Per-layer topology](docs\u002FTOPOLOGY.md): Fine-tune quantization per layer for optimal quality\u002Fspeed\n- ⭐ Auto-select fastest quant method for your hardware\n\n**Flexibility**\n- [LoRA & X-LoRA](docs\u002FADAPTER_MODELS.md) with weight merging\n- [AnyMoE](docs\u002FANYMOE.md): Create mixture-of-experts on any base model\n- [Multiple models](docs\u002Fmulti_model\u002FREADME.md): Load\u002Funload at runtime\n\n**Agentic Features**\n- Integrated [tool calling](docs\u002FTOOL_CALLING.md) with Python\u002FRust callbacks\n- ⭐ [Web search integration](docs\u002FWEB_SEARCH.md)\n- ⭐ [MCP client](docs\u002FMCP\u002Fclient.md): Connect to external tools automatically\n\n[Full feature documentation](docs\u002FREADME.md)\n\n## Supported Models\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Text Models\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Granite 4.0\n- SmolLM 3\n- DeepSeek V3\n- GPT-OSS\n- DeepSeek V2\n- Qwen 3 Next\n- Qwen 3 MoE\n- Phi 3.5 MoE\n- Qwen 3\n- GLM 4\n- GLM-4.7-Flash\n- GLM-4.7 (MoE)\n- Gemma 2\n- Qwen 2\n- Starcoder 2\n- Phi 3\n- Mixtral\n- Phi 2\n- Gemma\n- Llama\n- Mistral\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Multimodal Models\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Qwen 3.5\n- Qwen 3.5 MoE\n- Qwen 3-VL\n- Qwen 3-VL MoE\n- Gemma 3n\n- Llama 4\n- Gemma 3\n- Mistral 3\n- Phi 4 multimodal\n- Qwen 2.5-VL\n- MiniCPM-O\n- Llama 3.2 Vision\n- Qwen 2-VL\n- Idefics 3\n- Idefics 2\n- LLaVA Next\n- LLaVA\n- Phi 3V\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Speech Models\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Voxtral (ASR\u002Fspeech-to-text)\n- Dia\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Image Generation Models\u003C\u002Fb>\u003C\u002Fsummary>\n\n- FLUX\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Embedding Models\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Embedding Gemma\n- Qwen 3 Embedding\n\u003C\u002Fdetails>\n\n[Request a new model](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fissues\u002F156) | [Full compatibility tables](docs\u002FSUPPORTED_MODELS.md)\n\n## Python SDK\n\n```bash\npip install mistralrs  # or mistralrs-cuda, mistralrs-metal, mistralrs-mkl, mistralrs-accelerate\n```\n\n```python\nfrom mistralrs import Runner, Which, ChatCompletionRequest\n\nrunner = Runner(\n    which=Which.Plain(model_id=\"Qwen\u002FQwen3-4B\"),\n    in_situ_quant=\"4\",\n)\n\nres = runner.send_chat_completion_request(\n    ChatCompletionRequest(\n        model=\"default\",\n        messages=[{\"role\": \"user\", \"content\": \"Hello!\"}],\n        max_tokens=256,\n    )\n)\nprint(res.choices[0].message.content)\n```\n\n[Python SDK](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FPYTHON_SDK.html) | [Installation](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FPYTHON_INSTALLATION.html) | [Examples](examples\u002Fpython) | [Cookbook](examples\u002Fpython\u002Fcookbook.ipynb)\n\n## Rust SDK\n\n```bash\ncargo add mistralrs\n```\n\n```rust\nuse anyhow::Result;\nuse mistralrs::{IsqType, TextMessageRole, TextMessages, MultimodalModelBuilder};\n\n#[tokio::main]\nasync fn main() -> Result\u003C()> {\n    let model = MultimodalModelBuilder::new(\"google\u002Fgemma-4-E4B-it\")\n        .with_isq(IsqType::Q4K)\n        .with_logging()\n        .build()\n        .await?;\n\n    let messages = TextMessages::new().add_message(\n        TextMessageRole::User,\n        \"Hello!\",\n    );\n\n    let response = model.send_chat_request(messages).await?;\n\n    println!(\"{:?}\", response.choices[0].message.content);\n\n    Ok(())\n}\n```\n\n[API Docs](https:\u002F\u002Fdocs.rs\u002Fmistralrs) | [Crate](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fmistralrs) | [Examples](mistralrs\u002Fexamples)\n\n## Docker\n\nFor quick containerized deployment:\n\n```bash\ndocker pull ghcr.io\u002Fericlbuehler\u002Fmistral.rs:latest\ndocker run --gpus all -p 1234:1234 ghcr.io\u002Fericlbuehler\u002Fmistral.rs:latest \\\n  serve -m Qwen\u002FQwen3-4B\n```\n\n[Docker images](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpkgs\u002Fcontainer\u002Fmistral.rs)\n\n> For production use, we recommend installing the CLI directly for maximum flexibility.\n\n## Documentation\n\nFor complete documentation, see the **[Documentation](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002F)**.\n\n**Quick Links:**\n- [CLI Reference](docs\u002FCLI.md) - All commands and options\n- [HTTP API](docs\u002FHTTP.md) - OpenAI-compatible endpoints\n- [Quantization](docs\u002FQUANTS.md) - ISQ, GGUF, GPTQ, and more\n- [Device Mapping](docs\u002FDEVICE_MAPPING.md) - Multi-GPU and CPU offloading\n- [MCP Integration](docs\u002FMCP\u002FREADME.md) - MCP integration documentation\n- [Troubleshooting](docs\u002FTROUBLESHOOTING.md) - Common issues and solutions\n- [Configuration](docs\u002FCONFIGURATION.md) - Environment variables for configuration\n\n## Contributing\n\nContributions welcome! Please [open an issue](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fissues) to discuss new features or report bugs. If you want to add a new model, please contact us via an issue and we can coordinate.\n\n## Credits\n\nThis project would not be possible without the excellent work at [Candle](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle). Thank you to all [contributors](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fgraphs\u002Fcontributors)!\n\nmistral.rs is not affiliated with Mistral AI.\n\n\u003Cp align=\"right\">\n  \u003Ca href=\"#top\">Back to Top\u003C\u002Fa>\n\u003C\u002Fp>\n","\u003Ca name=\"top\">\u003C\u002Fa>\n\u003C!--\n\u003Ch1 align=\"center\">\n  mistral.rs\n\u003C\u002Fh1>\n-->\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEricLBuehler_mistral.rs_readme_0ecec90712c6.png\" alt=\"mistral.rs\" width=\"100%\" style=\"max-width: 800px;\">\n\u003C\u002Fdiv>\n\n\u003Ch3 align=\"center\">\n快速、灵活的大型语言模型推理。\n\u003C\u002Fh3>\n\n\u003Cp align=\"center\">\n  | \u003Ca href=\"https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002F\">\u003Cb>文档\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fcrates.io\u002Fcrates\u002Fmistralrs\">\u003Cb>Rust SDK\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FPYTHON_SDK.html\">\u003Cb>Python SDK\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FSZrecqK8qw\">\u003Cb>Discord\u003C\u002Fb>\u003C\u002Fa> |\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fstargazers\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FEricLBuehler\u002Fmistral.rs?style=social&label=Star\" alt=\"GitHub 星标\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n## 最新动态\n\n- **Gemma 4**: 全模态支持：文本、图像、视频和音频输入。[指南](docs\u002FGEMMA4.md) | [视频设置](docs\u002FVIDEO.md)\n- **MXFP4 ISQ 量化**: 使用优化解码内核的 MXFP4，使模型更快、更小。[量化文档](docs\u002FQUANTS.md)\n- **通义千问 3.5 系列模型**: 支持包括视觉在内的通义千问 3.5 系列模型。[指南](docs\u002FQWEN3_5.md)\n\n## 为什么选择 mistral.rs？\n\n- **无需配置即可运行任何 Hugging Face 模型**: 只需 `mistralrs run -m user\u002Fmodel`。\n- **真正的多模态支持**: 文本、视觉、视频和音频、语音生成、图像生成以及嵌入功能集成在一个引擎中。\n- **完全可控的量化**: 选择您想要使用的精确量化方式，或者使用 `mistralrs quantize` 自定义 UQFF。\n- **内置 Web UI**: 运行 `mistralrs serve --ui` 即可立即获得 Web 界面。\n- **硬件感知**: `mistralrs tune` 会基准测试您的系统，并选择最佳的量化方案和设备映射。\n- **灵活的 SDK**: 提供 Python 包和 Rust crate，方便您构建自己的项目。\n- **智能体特性** — 内置工具调用、网页搜索和 MCP 客户端。\n\n## 快速入门\n\n### 安装\n\n**Linux\u002FmacOS:**\n```bash\ncurl --proto '=https' --tlsv1.2 -sSf https:\u002F\u002Fraw.githubusercontent.com\u002FEricLBuehler\u002Fmistral.rs\u002Fmaster\u002Finstall.sh | sh\n```\n\n**Windows (PowerShell):**\n```powershell\nirm https:\u002F\u002Fraw.githubusercontent.com\u002FEricLBuehler\u002Fmistral.rs\u002Fmaster\u002Finstall.ps1 | iex\n```\n\n[手动安装及其他平台](docs\u002FINSTALLATION.md)\n\n### 运行您的第一个模型\n\n```bash\n# 交互式聊天\nmistralrs run -m Qwen\u002FQwen3-4B\n\n# 一次性提示（无交互式会话）\nmistralrs run -m Qwen\u002FQwen3-4B -i \"法国的首都是哪里？\"\n\n# 带图片的一次性请求\nmistralrs run -m google\u002Fgemma-4-E4B-it --image photo.jpg -i \"请描述这张图片\"\n\n# 或者启动带有 Web UI 的服务器\nmistralrs serve --ui -m google\u002Fgemma-4-E4B-it\n```\n\n然后访问 `http:\u002F\u002Flocalhost:1234\u002Fui` 即可进入 Web 聊天界面。\n\n### `mistralrs` CLI\n\nCLI 设计为 **零配置**: 只需指向一个模型即可开始使用。\n\n- **自动检测**: 自动检测模型架构、量化格式和聊天模板。\n- **一体化**: 单个二进制文件即可完成聊天、服务、基准测试和 Web UI 功能（`run`、`serve`、`bench`）。\n- **硬件调优**: 运行 `mistralrs tune` 自动基准测试并配置适合您硬件的最佳设置。\n- **格式无关**: 无缝支持 Hugging Face 模型、GGUF 文件以及 [UQFF 量化](docs\u002FUQFF.md)。\n\n```bash\n# 自动为您的硬件调优并生成配置文件\nmistralrs tune -m Qwen\u002FQwen3-4B --emit-config config.toml\n\n# 使用生成的配置文件运行\nmistralrs from-config -f config.toml\n\n# 诊断系统问题（CUDA、Metal、HuggingFace 连接）\nmistralrs doctor\n```\n\n[完整 CLI 文档](docs\u002FCLI.md)\n\n\u003Cdetails open>\n  \u003Csummary>\u003Cb>Web 聊天演示\u003C\u002Fb>\u003C\u002Fsummary>\n  \u003Cbr>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEricLBuehler_mistral.rs_readme_9cf2e024ee71.gif\" alt=\"Web 聊天 UI 演示\" \u002F>\n\u003C\u002Fdetails>\n\n## 为何如此高效\n\n**性能**\n- 默认在所有设备上支持连续批处理。\n- CUDA 配合 [FlashAttention](docs\u002FFLASH_ATTENTION.md) V2\u002FV3、Metal 以及 [多 GPU 张量并行](docs\u002FDISTRIBUTED\u002FDISTRIBUTED.md)。\n- [PagedAttention](docs\u002FPAGED_ATTENTION.md) 用于 CUDA 或 Apple Silicon 上的高吞吐量连续批处理，支持前缀缓存（包括多模态）。\n\n**量化**（[完整文档](docs\u002FQUANTS.md)）\n- 对任何 Hugging Face 模型进行 [原位量化 (ISQ)](docs\u002FISQ.md)。\n- 支持 GGUF（2-8 位）、GPTQ、AWQ、HQQ、FP8、BNB。\n- ⭐ [逐层拓扑](docs\u002FTOPOLOGY.md): 可针对每一层微调量化以达到最佳质量与速度。\n- ⭐ 自动选择最适合您硬件的量化方法。\n\n**灵活性**\n- [LoRA 和 X-LoRA](docs\u002FADAPTER_MODELS.md) 支持权重合并。\n- [AnyMoE](docs\u002FANYMOE.md): 可在任何基础模型上创建专家混合模型。\n- [多模型](docs\u002Fmulti_model\u002FREADME.md): 支持运行时加载\u002F卸载。\n\n**智能体特性**\n- 集成 [工具调用](docs\u002FTOOL_CALLING.md) 并支持 Python\u002FRust 回调函数。\n- ⭐ [网页搜索集成](docs\u002FWEB_SEARCH.md)。\n- ⭐ [MCP 客户端](docs\u002FMCP\u002Fclient.md): 可自动连接外部工具。\n\n[完整功能文档](docs\u002FREADME.md)\n\n## 支持的模型\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>文本模型\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Granite 4.0\n- SmolLM 3\n- DeepSeek V3\n- GPT-OSS\n- DeepSeek V2\n- 通义千问 3 Next\n- 通义千问 3 MoE\n- Phi 3.5 MoE\n- 通义千问 3\n- GLM 4\n- GLM-4.7-Flash\n- GLM-4.7（MoE）\n- Gemma 2\n- 通义千问 2\n- Starcoder 2\n- Phi 3\n- Mixtral\n- Phi 2\n- Gemma\n- Llama\n- Mistral\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>多模态模型\u003C\u002Fb>\u003C\u002Fsummary>\n\n- 通义千问 3.5\n- 通义千问 3.5 MoE\n- 通义千问 3-VL\n- 通义千问 3-VL MoE\n- Gemma 3n\n- Llama 4\n- Gemma 3\n- Mistral 3\n- Phi 4 多模态\n- 通义千问 2.5-VL\n- MiniCPM-O\n- Llama 3.2 Vision\n- 通义千问 2-VL\n- Idefics 3\n- Idefics 2\n- LLaVA Next\n- LLaVA\n- Phi 3V\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>语音模型\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Voxtral（ASR\u002F语音转文字）\n- Dia\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>图像生成模型\u003C\u002Fb>\u003C\u002Fsummary>\n\n- FLUX\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>嵌入模型\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Embedding Gemma\n- 通义千问 3 嵌入\n\u003C\u002Fdetails>\n\n[请求新增模型](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fissues\u002F156) | [完整兼容性表格](docs\u002FSUPPORTED_MODELS.md)\n\n## Python SDK\n\n```bash\npip install mistralrs  # 或 mistralrs-cuda、mistralrs-metal、mistralrs-mkl、mistralrs-accelerate\n```\n\n```python\nfrom mistralrs import Runner, Which, ChatCompletionRequest\n\nrunner = Runner(\n    which=Which.Plain(model_id=\"Qwen\u002FQwen3-4B\"),\n    in_situ_quant=\"4\",\n)\n\nres = runner.send_chat_completion_request(\n    ChatCompletionRequest(\n        model=\"default\",\n        messages=[{\"role\": \"user\", \"content\": \"你好！\"}],\n        max_tokens=256,\n    )\n)\nprint(res.choices[0].message.content)\n```\n\n[Python SDK](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FPYTHON_SDK.html) | [安装指南](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002FPYTHON_INSTALLATION.html) | [示例](examples\u002Fpython) | [烹饪书](examples\u002Fpython\u002Fcookbook.ipynb)\n\n## Rust SDK\n\n```bash\ncargo add mistralrs\n```\n\n```rust\nuse anyhow::Result;\nuse mistralrs::{IsqType, TextMessageRole, TextMessages, MultimodalModelBuilder};\n\n#[tokio::main]\nasync fn main() -> Result\u003C()> {\n    let model = MultimodalModelBuilder::new(\"google\u002Fgemma-4-E4B-it\")\n        .with_isq(IsqType::Q4K)\n        .with_logging()\n        .build()\n        .await?;\n\n    let messages = TextMessages::new().add_message(\n        TextMessageRole::User,\n        \"Hello!\",\n    );\n\n    let response = model.send_chat_request(messages).await?;\n\n    println!(\"{:?}\", response.choices[0].message.content);\n\n    Ok(())\n}\n```\n\n[API 文档](https:\u002F\u002Fdocs.rs\u002Fmistralrs) | [Crate](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fmistralrs) | [示例](mistralrs\u002Fexamples)\n\n## Docker\n\n用于快速容器化部署：\n\n```bash\ndocker pull ghcr.io\u002Fericlbuehler\u002Fmistral.rs:latest\ndocker run --gpus all -p 1234:1234 ghcr.io\u002Fericlbuehler\u002Fmistral.rs:latest \\\n  serve -m Qwen\u002FQwen3-4B\n```\n\n[Docker 镜像](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpkgs\u002Fcontainer\u002Fmistral.rs)\n\n> 对于生产环境，我们建议直接安装 CLI，以获得最大的灵活性。\n\n## 文档\n\n有关完整文档，请参阅 **[文档](https:\u002F\u002Fericlbuehler.github.io\u002Fmistral.rs\u002F)**。\n\n**快速链接：**\n- [CLI 参考](docs\u002FCLI.md) - 所有命令和选项\n- [HTTP API](docs\u002FHTTP.md) - 兼容 OpenAI 的端点\n- [量化](docs\u002FQUANTS.md) - ISQ、GGUF、GPTQ 等\n- [设备映射](docs\u002FDEVICE_MAPPING.md) - 多 GPU 和 CPU 卸载\n- [MCP 集成](docs\u002FMCP\u002FREADME.md) - MCP 集成文档\n- [故障排除](docs\u002FTROUBLESHOOTING.md) - 常见问题及解决方案\n- [配置](docs\u002FCONFIGURATION.md) - 用于配置的环境变量\n\n## 贡献\n\n欢迎贡献！请 [提交议题](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fissues) 讨论新功能或报告 bug。如果您想添加新模型，请通过议题与我们联系，我们将进行协调。\n\n## 致谢\n\n没有 [Candle](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle) 的出色工作，本项目将无法实现。感谢所有 [贡献者](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fgraphs\u002Fcontributors)！\n\nmistral.rs 与 Mistral AI 无关联。\n\n\u003Cp align=\"right\">\n  \u003Ca href=\"#top\">返回顶部\u003C\u002Fa>\n\u003C\u002Fp>","# mistral.rs 快速上手指南\n\nmistral.rs 是一个高性能、灵活的 LLM 推理引擎，支持文本、图像、视频和音频多模态输入。它无需复杂配置即可运行任意 Hugging Face 模型，并提供 Rust 和 Python SDK。\n\n## 环境准备\n\n**系统要求：**\n- **操作系统**：Linux, macOS, Windows (PowerShell)\n- **硬件加速**（可选但推荐）：\n  - NVIDIA GPU (CUDA)\n  - Apple Silicon (Metal)\n  - CPU (MKL\u002FAccelerate)\n- **依赖**：无需额外安装深度学习框架，mistral.rs 为独立二进制文件或通过包管理器安装。\n\n> **注意**：国内用户若访问 Hugging Face 模型受阻，请确保已配置镜像源（如 `HF_ENDPOINT` 环境变量）或使用本地模型路径。\n\n## 安装步骤\n\n### 方式一：命令行一键安装（推荐）\n\n**Linux \u002F macOS:**\n```bash\ncurl --proto '=https' --tlsv1.2 -sSf https:\u002F\u002Fraw.githubusercontent.com\u002FEricLBuehler\u002Fmistral.rs\u002Fmaster\u002Finstall.sh | sh\n```\n\n**Windows (PowerShell):**\n```powershell\nirm https:\u002F\u002Fraw.githubusercontent.com\u002FEricLBuehler\u002Fmistral.rs\u002Fmaster\u002Finstall.ps1 | iex\n```\n\n### 方式二：Python SDK 安装\n\n```bash\npip install mistralrs\n# 如需特定后端加速，可选择：\n# pip install mistralrs-cuda\n# pip install mistralrs-metal\n# pip install mistralrs-mkl\n```\n\n### 方式三：Rust SDK 安装\n\n在 `Cargo.toml` 中添加依赖：\n```toml\n[dependencies]\nmistralrs = \"latest\"\n```\n\n或在终端执行：\n```bash\ncargo add mistralrs\n```\n\n### 方式四：Docker 部署\n\n```bash\ndocker pull ghcr.io\u002Fericlbuehler\u002Fmistral.rs:latest\n```\n\n## 基本使用\n\n### 1. 交互式对话 (CLI)\n\n直接运行模型进行聊天（自动下载模型）：\n```bash\nmistralrs run -m Qwen\u002FQwen3-4B\n```\n\n### 2. 单次提示推理\n\n不进入交互模式，直接输出结果：\n```bash\nmistralrs run -m Qwen\u002FQwen3-4B -i \"法国首都是哪里？\"\n```\n\n### 3. 多模态推理（图文）\n\n传入图片进行分析：\n```bash\nmistralrs run -m google\u002Fgemma-4-E4B-it --image photo.jpg -i \"描述这张图片\"\n```\n\n### 4. 启动 Web UI 服务\n\n启动本地服务器并打开网页聊天界面：\n```bash\nmistralrs serve --ui -m google\u002Fgemma-4-E4B-it\n```\n启动后访问：`http:\u002F\u002Flocalhost:1234\u002Fui`\n\n### 5. 硬件自动调优\n\n自动测试当前硬件性能并生成最优配置文件：\n```bash\nmistralrs tune -m Qwen\u002FQwen3-4B --emit-config config.toml\n```\n使用生成的配置运行：\n```bash\nmistralrs from-config -f config.toml\n```\n\n### 6. Python SDK 示例\n\n```python\nfrom mistralrs import Runner, Which, ChatCompletionRequest\n\nrunner = Runner(\n    which=Which.Plain(model_id=\"Qwen\u002FQwen3-4B\"),\n    in_situ_quant=\"4\",\n)\n\nres = runner.send_chat_completion_request(\n    ChatCompletionRequest(\n        model=\"default\",\n        messages=[{\"role\": \"user\", \"content\": \"你好！\"}],\n        max_tokens=256,\n    )\n)\nprint(res.choices[0].message.content)\n```\n\n### 7. Rust SDK 示例\n\n```rust\nuse anyhow::Result;\nuse mistralrs::{IsqType, TextMessageRole, TextMessages, MultimodalModelBuilder};\n\n#[tokio::main]\nasync fn main() -> Result\u003C()> {\n    let model = MultimodalModelBuilder::new(\"google\u002Fgemma-4-E4B-it\")\n        .with_isq(IsqType::Q4K)\n        .with_logging()\n        .build()\n        .await?;\n\n    let messages = TextMessages::new().add_message(\n        TextMessageRole::User,\n        \"Hello!\",\n    );\n\n    let response = model.send_chat_request(messages).await?;\n\n    println!(\"{:?}\", response.choices[0].message.content);\n\n    Ok(())\n}\n```","一家初创公司的算法工程师需要在本地消费级显卡上快速部署支持图文多模态输入的 Qwen3.5 模型，以构建内部文档智能审核原型。\n\n### 没有 mistral.rs 时\n- **环境配置繁琐**：需手动匹配 PyTorch、CUDA 版本与模型架构，常因依赖冲突耗费数天调试。\n- **多模态支持割裂**：处理图像需额外集成视觉编码器，文本与图片输入无法在单一引擎中无缝切换。\n- **显存优化困难**：缺乏灵活的量化控制，大模型在单卡上极易显存溢出（OOM），被迫降级使用小参数模型。\n- **部署流程冗长**：从模型加载到提供 Web 服务需编写大量胶水代码，无法即时向产品经理演示效果。\n\n### 使用 mistral.rs 后\n- **零配置启动**：仅需一条 `mistralrs run -m Qwen\u002FQwen3.5` 命令，自动识别架构与聊天模板，即刻开始交互。\n- **原生多模态融合**：直接通过 CLI 传入 `--image` 参数即可实现图文混合推理，无需额外开发集成逻辑。\n- **硬件感知调优**：运行 `mistralrs tune` 自动测试并生成最优量化配置（如 MXFP4），在有限显存下跑通更大模型。\n- **内置服务界面**：执行 `mistralrs serve --ui` 瞬间拉起 Web 聊天界面，团队可立即进行功能验证与演示。\n\nmistral.rs 通过极致的工程化封装，将原本复杂的本地大模型部署转化为“一键式”体验，让开发者能专注于业务逻辑而非基础设施调试。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEricLBuehler_mistral.rs_9cf2e024.gif","EricLBuehler","Eric Buehler","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FEricLBuehler_f582cced.jpg","@huggingface ",null,"https:\u002F\u002Fgithub.com\u002FEricLBuehler",[82,86,90,94,98,102,106,110,114,117],{"name":83,"color":84,"percentage":85},"Rust","#dea584",83.1,{"name":87,"color":88,"percentage":89},"Cuda","#3A4E3A",9.9,{"name":91,"color":92,"percentage":93},"Metal","#8f14e9",4.4,{"name":95,"color":96,"percentage":97},"JavaScript","#f1e05a",1,{"name":99,"color":100,"percentage":101},"Python","#3572A5",0.7,{"name":103,"color":104,"percentage":105},"Jinja","#a52a22",0.3,{"name":107,"color":108,"percentage":109},"CSS","#663399",0.2,{"name":111,"color":112,"percentage":113},"Shell","#89e051",0.1,{"name":115,"color":116,"percentage":113},"HTML","#e34c26",{"name":118,"color":119,"percentage":113},"PowerShell","#012456",6861,556,"2026-04-05T14:10:12","MIT","Linux, macOS, Windows","非绝对必需（支持 CPU），但推荐 NVIDIA GPU (CUDA 支持 FlashAttention V2\u002FV3, PagedAttention) 或 Apple Silicon (Metal)。支持多 GPU 张量并行。具体显存需求取决于模型大小及量化设置（工具提供自动调优功能 `mistralrs tune` 以匹配硬件）。","未说明（取决于模型大小，支持 CPU 卸载）",{"notes":128,"python":129,"dependencies":130},"该工具基于 Rust 开发，提供预编译的二进制文件，无需手动配置复杂的 Python 深度学习环境。支持多种量化格式（GGUF, GPTQ, AWQ, FP8, ISQ 等）以降低显存需求。内置 `mistralrs tune` 命令可自动基准测试系统并生成最优配置。支持 Docker 部署。模型可直接从 Hugging Face 加载或使用本地 GGUF\u002FUQFF 文件。","未说明（可通过 pip 安装 Python SDK）",[131,132,133,134],"Candle (底层深度学习框架)","mistralrs (Python\u002FRust SDK)","CUDA Toolkit (NVIDIA 用户)","Metal (macOS 用户)",[26,13],[137,138,139],"llm","rust","uqff","2026-03-27T02:49:30.150509","2026-04-06T05:27:12.627442",[143,148,153,158,163,167],{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},17780,"如何直接从 GGUF 文件运行模型而不需要额外的 tokenizer 文件？","早期版本可能需要手动提供 tokenizer.json，但维护者已合并相关更新（如 #416 和 #397），现在支持直接从 GGUF 文件加载聊天模板和分词器（包括 GPT2\u002FBPE 类型）。您可以使用以下命令直接运行，无需额外文件：\n\ncargo run --release --features cuda -- --token-source none -i gguf -m . -f \u003C您的模型文件>.gguf\n\n确保您使用的是包含这些修复的最新版本。","https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fissues\u002F326",{"id":149,"question_zh":150,"answer_zh":151,"source_url":152},17781,"mistral.rs 的推理速度比 llama.cpp 慢吗？如何优化提示词处理速度？","在早期的量化 Mistral 模型中，提示词处理速度确实较慢。但经过优化（如禁用注意力掩码、融合操作等），目前性能已非常接近 llama.cpp。测试数据显示，llama.cpp 耗时约 320ms，而 mistral.rs 约为 340ms，差异很小。维护者持续在进行算子融合（如 affine division）等进一步优化以提升速度。","https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fissues\u002F153",{"id":154,"question_zh":155,"answer_zh":156,"source_url":157},17782,"流式推理输出不够流畅或速度不如 Ollama\u002FMLX 怎么办？","针对流式推理不流畅的问题，维护者已通过合并 #887 等 PR 显著提升了解码速度（例如 Llama 3.1 8B q4k 模型吞吐量提升了 26%，从 30 tokens\u002Fs 提升到 38 tokens\u002Fs）。现在的速度已经等于或快于 llama.cpp 和 MLX。请确保您更新到了最新版本以获得最佳体验。","https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fissues\u002F630",{"id":159,"question_zh":160,"answer_zh":161,"source_url":162},17783,"如何在多 GPU 环境下运行超过单显存容量的大上下文模型？","对于超出单个 GPU 显存的大模型（如长上下文模型），社区曾请求跨 GPU 设备映射功能。虽然具体实现细节随版本迭代可能有所变化，但通常可以通过配置张量并行或利用后端支持的分布式加载来解决。如果遇到 CUDA Out of Memory 错误，请检查是否启用了正确的多卡支持特性，或参考示例代码（如 examples\u002Fhttp.md）进行 API 服务器部署测试。","https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fissues\u002F395",{"id":164,"question_zh":165,"answer_zh":166,"source_url":157},17784,"在 Apple Silicon (M3 Max) 上运行模型时遇到 'MissingHeader(\"etag\")' 错误怎么办？","在 macOS (Metal) 上从 Hugging Face 加载模型时，如果出现 'Could not get file \"tokenizer.json\" from API: MissingHeader(\"etag\")' 错误，这通常是由于网络请求头缺失导致的 API 获取失败。尝试检查网络连接，或者手动下载 tokenizer.json 和模型文件到本地，然后使用本地路径加载模型（例如使用 -m 指向本地目录），以避免直接从 API 拉取时的潜在问题。",{"id":168,"question_zh":169,"answer_zh":170,"source_url":171},17785,"如何在 Amazon SageMaker 或特定 Linux 环境下通过 pip 安装 mistralrs？","在 Amazon Linux 或类似环境中通过 pip 安装 mistralrs 时，必须确保系统已正确安装 Rust (cargo) 并且 cargo 的二进制目录已添加到 PATH 环境变量中。如果安装失败，请验证 rustc 和 cargo 是否可在终端直接运行。此外，可能需要安装系统级的构建依赖（如 gcc, make 等）以编译底层 Rust 扩展。","https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fissues\u002F288",[173,178,183,188,193,198,203,208,213,218,223,228,233,238,243,248,253,258,263,268],{"id":174,"version":175,"summary_zh":176,"released_at":177},108073,"v0.8.0","## 变更内容\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1854 中对文档和 README 进行了调整\n* @lizzzcai 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1861 中将 Metal 标准从 3.0 升级到 3.1\n* @setoelkahfi 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1857 中修复了 Stable Diffusion 的 README\n* @guoqingbao 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1856 中使用 cudaforge 构建内核\n* @dependabot[bot] 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1865 中将 bytes 从 1.11.0 升级到 1.11.1\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1867 中修复了融合 GLU 的 Metal 和 CUDA 实现的精度问题\n* @dependabot[bot] 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1868 中将 time 从 0.3.45 升级到 0.3.47\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1869 中修复了 ViT + flash attention 的情况\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1870 中实现了并行 + I\u002FO 流水线 ISQ\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1871 中修复了 GPT-OSS 滑动窗口在前缀缓存情况下的问题\n* @synek317 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1873 中将 GGUF 文件的分隔符改为 “;”\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1872 中实现了 GPT-OSS 分页注意力，并支持 sink，同时跨 CUDA、Metal 和 CPU 实现 MoE 预填充内核\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1875 中修复了流式 SSE 在错误事件时卡死的问题\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1864 中增加了对 Qwen 3 Next 的支持\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1877 中修复了 completions 忽略 logprobs 的问题\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1878 中针对 Qwen 3 VL 系列进行了修复\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1883 中新增了一种量化方法：F8Q8\n* @glaziermag 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1885 中修复了 Docker：为 CUDA 构建环境安装 Git，以便获取 flash-attn-v3 CUTLASS\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1880 中升级到 0.7.1-alpha.1\n* @glaziermag 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1887 中修复了 core：将流式数据块的创建时间戳统一使用 Unix 秒\n* @setoelkahfi 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1891 中新增了 tvOS 的 Metal 支持\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1890 中重写了分页注意力，实现了块级前缀缓存和 KV 聚合内核\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1892 中修复了 phi3 GGUF 的连续性错误\n* @glaziermag 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1895 中修复了 core：处理校准路径中缺失 BOS 令牌的情况\n* @setoelkahfi 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1893 中新增了可选的 save_file 参数，用于 URL 图像生成的响应格式\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1894 中修复了 metal：从内存加载 metallib","2026-04-02T18:20:13",{"id":179,"version":180,"summary_zh":181,"released_at":182},108074,"v0.7.0","## 亮点\n- **全新 CLI：** `mistralrs-cli`\n- **前缀缓存：** 我们为分页注意力实现了前缀缓存功能（#1750）。通过重用共享提示前缀的 KV 缓存，显著加速多轮对话和 RAG 工作流。\n- **模型大幅扩展：** 支持 Embedding Gemma、Qwen 3 Embedding、Gemma 3n、GLM-4、Granite Hybrid MoE、GLM-4 MoE、GLM-4 MoE Lite。\n- **动态模型加载：** 服务器现支持在运行时加载和卸载模型（#1828）。\n- **性能提升：** 添加对 CUDA 13.0\u002F13.1 的支持（#1767），并引入高度优化的融合内核（GEMV、GLU）以及分块 FP8 内核，从而在 NVIDIA GPU 上实现显著提速。\n- **`candle` 0.9.2：** 我们已迁移到 `candle` 0.9.2 的官方 crates.io 发布版本，稳定了后端依赖！\n\n## 新增模型与架构\n- **嵌入模型：** Qwen 3 Embedding、Embedding Gemma\n- **文本模型：** GLM-4、GLM-4.7 Flash、Granite Hybrid、GPT-OSS\n- **视觉模型：** Gemma 3n、Qwen 3 VL 及 Qwen 3 VL MoE\n\n## 变更内容\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1460 中改进自动工具调用功能。\n* 杂项：@polarathene 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1458 中使 `Dockerfile.cuda-all` 的线程数可配置。\n* 杂项：@polarathene 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1459 中合并了 `apt-get install` 的 `RUN` 指令。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1463 中添加了 metal::isnan 的回退定义。\n* 杂项：@polarathene 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1465 中移除了运行时 rayon 线程的环境变量。\n* @guoqingbao 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1474 中移除了对 api_dir_list 的重复调用。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1478 中修复了 `mistralrs_mcp` 中短暂存在的 pyo3 依赖问题。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1480 中修复了非 macOS 系统下的 objc 依赖问题。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1481 中修复了 phi4 mini + nccl 缓存问题。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1482 中修复了 phi3.5 moe（#1447）的问题。\n* @guoqingbao 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1437 中增加了对 GLM4 模型的支持。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1484 中重构了分布式后端。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1485 中限制了 metal 分页注意力的 KV 分配上限。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1486 中进一步优化了 metal 分页注意力的上限设置。\n* 服务器核心：@matthewhaynesonline 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1423 中整合并统一了路由处理器和 API 接口。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1488 中支持 qwen3 gguf 格式。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1493 中将 bos\u002Feos token ID 设置为可选。","2026-01-28T06:10:26",{"id":184,"version":185,"summary_zh":186,"released_at":187},108075,"v0.6.0","- Dockerfile（CUDA、CPU）：https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpkgs\u002Fcontainer\u002Fmistral.rs\n- PyPI 包（[无功能](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmistralrs)、[CUDA](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmistralrs-cuda)、[MKL](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmistralrs-mkl)、[Metal](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmistralrs-metal)、[Accelerate](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmistralrs-accelerate)）\n\n## 🔥 v0.6.0 亮点\n\n🚀 重大特性\n- 支持 Llama 4 模型以及 Qwen 3 \u002F MoE \u002F VL 系列模型，包括 DeepSeek 和 DeepCoder 集成\n- 多模态前缀缓存、分页注意力调度器优化，以及更快速的 Metal\u002FCUDA 后端\n- 带聊天历史、文件上传、语音生成功能的 Web 聊天应用，并对工具调用\u002F搜索进行了全面升级\n- 快速采样器和 CPU FlashAttention，性能与准确性均有所提升\n- Metal 和 CUDA：在量化（AFQ、ISQ）、UQFF 处理及内存优化方面取得重大进展\n- MCP（模型上下文协议）：新增服务器端点、文档，并集成客户端\n- 视觉与音频扩展：支持 SIGLIP、Dia 1.6b TTS、conformer 主干网络（Phi-4MM）、自动加载器以及视觉工具前缀\n\n🧠 推理优化\n- CPU 上超快的 AFQ 量化，Metal 上优化后的 Qwen 3 MoE，以及分页注意力相关修复\n- 统一的 FlashAttention 后端与 ISQ 的自动方法选择\n- Metal 预编译支持及减少 autorelease 泄漏问题\n\n🧰 开发改进\n- 引擎架构、KV 缓存、注意力后端及设备映射逻辑重构\n- 集中化依赖管理与更清晰的内部抽象\n- LoRA 支持流程简化且速度更快\n\n🎉 其他\n- 重新设计的 README、AGENTS.md 以及新的基准测试脚本\n- 交互模式现可显示吞吐量，支持 Gumbel 采样，并提供更好的运行时采样控制\n- 扩展量化与 GGUF 支持：AWQ、Qwen3 GGUF 以及预量化 MLX 兼容性\n\n⸻\n\n\n## 变更内容\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1234 中修复 Metal 融合注意力头维度的处理问题\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1235 中为视觉模型 Rust API 添加分页注意力支持\n* [破坏性变更] 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1237 中支持设置 HF 缓存路径\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1239 中为 DeepSeek 模型添加工具调用支持\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1244 中重构服务器图像处理模块\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1247 中优化 CUDA RoPE 内核\n* 由 @edwko 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1246 中修正拼写错误（将 add_speial_tokens 改为 add_special_tokens）\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1250 中修复 UQFF 与分布式层相关问题\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1243 中集成自动代理式搜索功能（`web_search_options`）\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.","2025-06-10T23:28:29",{"id":189,"version":190,"summary_zh":191,"released_at":192},108076,"v0.5.0","## 亮点\n\n博客文章：https:\u002F\u002Fhuggingface.co\u002Fblog\u002FEricB\u002Fmistralrs-v0-5-0\n\n感谢所有为本次发布做出贡献的开发者！本次发布不仅包含了以下亮点，还进行了无数的改进、修复和优化。\n\n- 支持**更多模型**：\n  - Gemma 3\n  - Qwen 2.5 VL\n  - Mistral Small 3.1\n  - Phi 4 多模态（仅图像）\n- 对以下模型的**原生工具调用支持**：\n  - Llama 3.1\u002F3.2\u002F3.3\n  - Mistral Small 3\n  - Mistral Nemo\n  - Hermes 2 Pro\n  - Hermes 3\n- **张量并行**支持（NCCL）！\n- **FlashAttention V3** 支持，并集成到 PagedAttention 中。\n- 在 Metal 上，ISQ 时间缩短了**30倍**！\n- 重构了**前缀缓存系统**。\n\n## 变更内容\n* 允许在 CurrentThread 运行时中使用库，由 @sgrebnov 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1082 中实现。\n* 提高 uqff 自动设备映射的准确性，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1084 中实现。\n* DeepSeekV3 的 sigmoid 支持，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1092 中实现。\n* GPU 加速采样（解码性能提升 5%），由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1094 中实现。\n* 修复 qwen2vl 中缺失的 perceiver_config，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1096 中实现。\n* 为 Deepseek 2\u002F3 增加更多 topk 方法，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1097 中实现。\n* 更准确地计算 Deepseek 2\u002F3 的层大小，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1098 中实现。\n* 改善流式输出用户体验，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1102 中实现。\n* 加快 fp8 分块反量化速度，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1100 中实现。\n* Deepseek 2\u002F3 的分页注意力实现，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1103 中实现。\n* 加快 bincount 操作速度，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1104 中实现。\n* PagedAttention 支持提示词分块，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1105 中实现。\n* 重构服务器 SSE，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1107 中实现。\n* PagedAttention 结合 FlashAttention（以及 FlashAttention V3），由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1109 中实现。\n* 考虑 KEEP_ALIVE_INTERVAL 参数，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1111 中实现。\n* 重构 FlashAttention 的启用逻辑，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1110 中实现。\n* 修复 imatrix isq quantize_onto 问题，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1112 中实现。\n* 张量并行与流水线并行，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1113 中实现。\n* 将 openssl 从 0.10.69 升级到 0.10.70，由 @dependabot 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1121 中实现。\n* 允许聊天流式输出使用工具，由 @Jeadie 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F1088 中实现。\n* 为 imatrix 引入新文件格式：`.cimatrix`，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricL 中实现。","2025-03-24T04:16:15",{"id":194,"version":195,"summary_zh":196,"released_at":197},108077,"v0.4.0","## 新特性\n- 🔥 新模型！\n  - DeepSeek V2\n  - DeepSeek V3 和 R1\n  - MiniCpm-O 2.6\n- 🧮 Imatrix 量化\n- ⚙️ 自动设备映射\n- BNB 量化\n- 支持分块 FP8 反量化及 Metal 上的 FP8\n- 集成 llguidance 库 (@mmoskal)\n- Metal 分页注意力机制\n- 来自贡献者的大量修复与改进！\n\n## 破坏性变更\n- Rust 设备映射 API 已更改。\n\n## 最低 Rust 版本要求\n此版本的最低 Rust 版本要求为 **1.83.0**。\n\n## 变更内容\n* 使用 CUDA_COMPUTE_CAP，若无法找到 nvidia-smi，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F944 中实现。\n* 修复文档：修复损坏的链接，由 @sammcj 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F945 中完成。\n* 更好的扩散交互模式，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F948 中实现。\n* 为 ISQ 实现 Imatrix，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F949 中完成。\n* 支持视觉模型的 Imatrix 量化，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F950 中完成。\n* 使用 Imatrix 进行困惑度计算，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F952 中实现。\n* 将最低 Rustc 版本设置为 1.82，由 @mmoskal 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F957 中完成。\n* 修复 append_sliding_window 函数，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F958 中完成。\n* 修复 completion API 中 best_of 的行为，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F959 中完成。\n* 确保支持 CUDA CC 5.3，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F960 中完成。\n* 提升 Windows 平台上的测试速度，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F961 中完成。\n* 使用 llguidance 库处理约束条件（包括 JSON 模式），由 @mmoskal 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F899 中完成。\n* 修复 Metal FP8 量化问题，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F962 中完成。\n* 修正 example gguf_locally 示例以符合聊天模板要求，由 @msk 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F966 中完成。\n* Bitsandbytes 量化：加载与内核实现，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F967 中完成。\n* 更新 core 模块的 tokenizers 依赖至 0.21，由 @vkomenda 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F975 中完成。\n* 移除 README 中过时的二进制文件提及，由 @BafS 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F973 中完成。\n* 改善错误处理，由 @cdoko 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F974 中完成。\n* 在 prefix_cacher.rs 中为 evict_all_to_cpu 添加 None 检查，以防止 panic，由 @cdoko 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F979 中完成。\n* 在 Metal 位运算中包含起始偏移量，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F978 中完成。\n* 对 TcpListener 绑定错误进行快速失败处理，由 @cdoko 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F982 中完成。\n* 长序列注意力优化中的原地 softmax，由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F984 中完成。\n* 修复 CUDA cublaslt","2025-01-22T19:39:46",{"id":199,"version":200,"summary_zh":201,"released_at":202},108078,"v0.3.4","## 新特性\n- 支持 Qwen2-VL\n- 支持 Idefics 3\u002FSmolVLM\n- 🔥 提升 6 倍提示性能（所有基准测试速度均快于或与 MLX、llama.cpp 相当）！\n- 🗂️ 更高效的非分页注意力 KV 缓存实现！\n- 公开的分词 API\n\n## Python 轮子包\n轮子包现已支持 Windows、Linux 和 Mac 系统，架构包括 x86_64 和 aarch64。\n\n## 最低 Rust 版本要求\n1.79.0\n\n## 变更内容\n* @Reckon-11 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F895 中更新了 Dockerfile\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F894 中添加了 Qwen2-VL 模型\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F902 中为 mistralrs-bench 添加了 ISQ 功能\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F904 中使用了 tokenizers v0.20\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F905 中修复了金属 SDPA 的 v 步幅问题\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F906 中改进了图像路径的解析\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F907 中添加了一些用于 HQQ 反量化处理的 Metal 内核\n* @Jeadie 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F824 中处理了带有 'tool_calls' 的助手消息\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F908 中实现了金属平台上的注意力融合 softmax\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F909 中优化了金属平台上的 qmatmul 矩阵乘法（性能提升 5.4 倍）\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F911 中增加了对 mistralrs bench 的 --dtype 参数支持\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F914 中使用了 mtl 资源共享以避免重复拷贝\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F916 中实现了预分配的 KV 缓存\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F917 中修复了 KV 缓存动态增长的问题\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F920 中调整了 CUDA 架构下始终使用 fp8 和 bf16 的编译设置\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F923 中扩展了 CUDA 上的注意力掩码\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F925 中提升了 CUDA 平台下的提示生成速度\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F926 中增加了对分页注意力 alibi 的支持\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F927 中将默认设置调整为 SDPA，以提升 VLlama PP T\u002Fs 的速度\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F928 中为 VLlama 视觉模型添加了 ISQ 支持\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F930 中增加了对金属平台上 fp8 的支持\n* @dependabot 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F932 中将 rustls 从 0.23.15 升级至 0.23.18\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F931 中实现了 ISQ 模型困惑度的计算\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F933 中集成了适用于长序列的快速 MLX SDPA 内核\n* 始终进行类型转换 ima","2024-11-28T19:27:09",{"id":204,"version":205,"summary_zh":206,"released_at":207},108079,"v0.3.2","## 主要变更\n- 通用改进与修复\n- ISQ FP8\n- GPTQ Marlin\n- Metal 性能提升 26%\n- 提供了 Python 包的 wheel 文件。详情请见下文及各个 PyPI 包。\n\n## 变更内容\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F804 中更新文档和依赖项\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F805 中支持 Qwen 2.5\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F806 中通过澄清和注释改进文档\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F811 中优化了注意力掩码的逆运算\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F812 中修复了 `repeat_interleave` 问题\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F814 中将交叉注意力掩码中的负无穷值改为使用 f32 类型\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F813 中提升了 UQFF 的内存效率\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F816 中更新了 Metal 和 CUDA Candle 实现以及 ISQ\n* @eltociear 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F822 中负责维护工作，更新了 pagedattention.cu 文件\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F820 中指出，若使用 f16 精度，则应以 f32 精度加载视觉模型\n* @polarathene 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F823 中升级了 CI 流程\n* @bhargavshirin 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F833 中因 README 文档过长而添加了返回顶部按钮\n* @nikolaydubina 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F835 中修正了模型架构枚举中的拼写错误\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F841 中公开了 Rust API 的配置，并调整了模式类型\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F832 中新增了 ISQ FP8 支持\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F846 中修复了 Metal F8 构建错误\n* @dependabot 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F854 中将 pyo3 从 0.22.3 升级至 0.22.4\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F849 中生成了独立的 UQFF 模型\n* @kaleaditya779 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F848 中更新了 README.MD 文件\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F856 中为 4 位和 8 位模型增加了 GPTQ Marlin 支持\n* @DaveTJones 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F858 中为 clap 添加了 wrap_help 功能\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F857 中修复了 UQFF 的金属生成问题\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F860 中增加了 GGUF 格式的 Qwen 2 模型支持\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F861 中避免了在 ISQ 过程中重复编码 Metal 命令缓冲区\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F859 中修复了 isnanf 问题\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F862 中修复了一些 Metal 警告信息\n* 支持交互式模式标记","2024-10-28T15:44:53",{"id":209,"version":210,"summary_zh":211,"released_at":212},108080,"v0.3.1","## 亮点\n- UQFF\n- FLUX 模型\n- Llama 3.2 视觉模型\n\n## MSRV\n\n此版本的 MSRV 为 1.79.0。\n\n## 变更内容\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F742 中启用自动确定常规加载器类型\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F745 中添加 `ForwardInputsResult` API\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F747 中实现量化专家混合（MoQE）\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F748 中将 quinn-proto 从 0.11.6 升级至 0.11.8\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F752 中修复 Metal\u002FAccelerate 的 f64-f32 类型不匹配问题\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F753 中改进配置错误的分页注意力输入元数据时的错误提示\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F755 中更新依赖项，支持 CUDA 12.6\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F759 中修复未使用分页注意力时的 bug\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F762 中修复 tokio 运行时中的 `MistralRs` Drop 实现\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F767 中使用更友好的 Candle 错误 API\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F766 中支持设置随机种子\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F771 中修复带有随机种子的 Metal 构建错误\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F776 中修复并增加对无 KV 缓存情况的检查\n* UQFF：独特而强大的量化文件格式。由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F770 中推出\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F780 中添加 `Scheduler::running_len`\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F787 中去重 RoPE 缓存\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F785 中简化 Rust 端 API，使其更易用\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F788 中添加 SomeMoE 的示例\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F790 中提供采样相关的 Rust API\n* 我们的首个扩散模型：FLUX。由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F758 中推出\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F792 中修复 Metal 和 NSUInteger 相关的构建错误\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F801 中支持 Llama 3.2 GGUF 模型中的权重共享\n* 由 @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F796 中实现 Llama 3.2 视觉模型\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fcompare\u002Fv0.3.0...v0.3.1","2024-09-29T15:39:44",{"id":214,"version":215,"summary_zh":216,"released_at":217},108081,"v0.3.0","## 亮点\n- 新模型拓扑功能：ISQ 和设备映射\n- 🔥 批处理时更快的 FlashAttention 支持\n- 移除了 `plotly` 及其相关 JS 依赖\n- φ³ 支持 Phi 3.5、Phi 3.5 视觉版和 Phi 3.5 MoE\n- 改进了 Rust API 的易用性\n- 支持多个（分片）GGUF 文件\n\n## MSRV\n该版本的 Rust MSRV 已更新至 1.79.0。\n\n## 变更内容\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F690 中修复了在启用 RUST_BACKTRACE=1 时自动选择数据类型的问题。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F692 中新增了对多个 GGUF 文件的支持。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F693 中重构了普通模型和视觉模型的加载器。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F695 中修复了 GGUF 文件中 `split.count` 重复处理的问题。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F694 中提供了批处理示例。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F697 中进行了一些修复。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F698 中改进了视觉模型的 Rust 示例。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F701 中添加了 ISQ 拓扑支持。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F702 中新增了自定义 logits 处理器 API。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F704 中增加了对 Gemma 2 PagedAttention 的支持。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F703 中优化了 Gemma\u002FGemma2 中的 RmsNorm 性能，使其更快。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F706 中修复了 Metal ISQ 中的 bug。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F691 中增加了对 GGUF BF16 张量的支持。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F707 中进一步提升了 FlashAttention 的支持：实现了真正的批处理、滑动窗口和软上限功能。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F708 中移除了模型中部分 `pub` 的使用。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F710 中增加了对 Phi 3.5 V 型号的支持。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F709 中实现了 Phi 3.5 MoE 模型。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F717 中引入了设备映射拓扑。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F637 中实现了 DRY 惩罚机制。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F700 中移除了 plotly，仅输出 CSV 格式的损失文件。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F724 中使用 once_cell 来降低 MSRV 要求。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F729 中修复了 Windows 构建问题。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F731 中进行了更多针对 phi3.5moe 的修复尝试。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F733 中添加了 Phi 3.5 MoE 的示例。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehl\u002Fmistral.rs 中添加了 Phi 3.5 的聊天模板。","2024-09-02T17:27:40",{"id":219,"version":220,"summary_zh":221,"released_at":222},108082,"v0.2.5","## 变更内容\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F664 中重构了 ISQ 量化解析。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F665 中重构了服务器示例，使其使用 OpenAI Python 客户端。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F623 中实现了提示词分块功能。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F668 中清理了 Python 示例和服务器示例。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F467 中实现了 GPTQ 量化。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F672 中更新了依赖项。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F676 中重新设计了自动数据类型选择功能。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F681 中修复了后端 Candle 分支中的 Metal、闪存注意力以及 Llama 线性层相关问题。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F682 中在测试中使用了转换后的 tokenizer.json 文件。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F683 中重构了 ISQ 和 mistralrs-quant 模块。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F686 中修复了 ISQ 的 Metal 构建问题。\n* @ac3xx 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F685 中添加了自动数据类型选择功能中缺失的错误处理情况。\n* @wseaton 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F687 中修复了工具类型响应中的空值问题。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F677 中实现了 HQQ 量化。\n* @EricLBuehler 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F688 中将版本号升级至 0.2.5。\n\n## 新贡献者\n* @ac3xx 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F685 中做出了首次贡献。\n* @wseaton 在 https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F687 中做出了首次贡献。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fcompare\u002Fv0.2.4...v0.2.5\n## 安装 mistralrs-server 0.2.5\n\n### 通过 Shell 脚本安装预编译二进制文件\n\n```sh\ncurl --proto '=https' --tlsv1.2 -LsSf https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.5\u002Fmistralrs-server-installer.sh | sh\n```\n\n## 下载 mistralrs-server 0.2.5\n\n| 文件 | 平台 | 校验和 |\n|--------|----------|----------|\n| [mistralrs-server-aarch64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.5\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz) | Apple Silicon macOS | [校验和](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.5\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz.sha256) |\n| [mistralrs-server-x86_64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.5\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz) | Intel macOS | [校验和](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.5\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz.sha256) |\n| [mistralrs-server-x86_64-unknown-linux-gnu.tar.xz]","2024-08-16T01:10:23",{"id":224,"version":225,"summary_zh":226,"released_at":227},108083,"v0.2.4","\r\n## What's Changed\r\n* fix build on metal by returning Device by @rgbkrk in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F642\r\n* Add invite to Matrix chatroom by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F644\r\n* Make sure we don't have dead links by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F647\r\n* Fix more links by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F648\r\n* Throughput for interactive mode by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F655\r\n* Implement tool calling by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F649\r\n* Fix device map check for paged attn by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F656\r\n* Fix for mistral nemo in gguf by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F657\r\n* Fix check of cache config when device mapping + PA by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F658\r\n* Biollama in tool calling example by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F659\r\n* Biollama in tool calling example by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F660\r\n* Examples for simple tool calling by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F661\r\n* Bump version to 0.2.4 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F662\r\n\r\n## New Contributors\r\n* @rgbkrk made their first contribution in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F642\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fcompare\u002Fv0.2.3...v0.2.4\r\n\r\n## MSRV\r\n\r\nMSRV is 1.75\r\n\r\n## Install mistralrs-server 0.2.4\r\n\r\n### Install prebuilt binaries via shell script\r\n\r\n```sh\r\ncurl --proto '=https' --tlsv1.2 -LsSf https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.4\u002Fmistralrs-server-installer.sh | sh\r\n```\r\n\r\n## Download mistralrs-server 0.2.4\r\n\r\n|  File  | Platform | Checksum |\r\n|--------|----------|----------|\r\n| [mistralrs-server-aarch64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.4\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz) | Apple Silicon macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.4\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.4\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz) | Intel macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.4\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-unknown-linux-gnu.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.4\u002Fmistralrs-server-x86_64-unknown-linux-gnu.tar.xz) | x64 Linux | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.4\u002Fmistralrs-server-x86_64-unknown-linux-gnu.tar.xz.sha256) |\r\n\r\n\r\n","2024-08-01T12:22:49",{"id":229,"version":230,"summary_zh":231,"released_at":232},108084,"v0.2.3","## What's Changed\r\n* Implement min-p sampling by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F625\r\n* Tweak handling when PA cannot allocate by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F632\r\n* Update deps by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F633\r\n* Improve penalty context window calculation by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F636\r\n* Allow setting PagedAttention KV cache allocation from context size by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F640\r\n* Bump version to 0.2.3 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F638\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fcompare\u002Fv0.2.2...v0.2.3\r\n\r\n## Install mistralrs-server 0.2.3\r\n\r\n### Install prebuilt binaries via shell script\r\n\r\n```sh\r\ncurl --proto '=https' --tlsv1.2 -LsSf https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.3\u002Fmistralrs-server-installer.sh | sh\r\n```\r\n\r\n## Download mistralrs-server 0.2.3\r\n\r\n|  File  | Platform | Checksum |\r\n|--------|----------|----------|\r\n| [mistralrs-server-aarch64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.3\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz) | Apple Silicon macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.3\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.3\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz) | Intel macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.3\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-unknown-linux-gnu.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.3\u002Fmistralrs-server-x86_64-unknown-linux-gnu.tar.xz) | x64 Linux | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.3\u002Fmistralrs-server-x86_64-unknown-linux-gnu.tar.xz.sha256) |\r\n\r\n\r\n","2024-07-28T18:52:03",{"id":234,"version":235,"summary_zh":236,"released_at":237},108085,"v0.2.2","## What's Changed\r\n* Fix ctrlc handling for scheduler v2 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F614\r\n* Make `sliding_window` optional for mixtral by @csicar in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F616\r\n* Support Llama 3.1 scaled rope by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F618\r\n\r\n## New Contributors\r\n* @csicar made their first contribution in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F616\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fcompare\u002Fv0.2.1...v0.2.2\r\n\r\n## MSRV\r\n\r\nMSRV is `1.75`.\r\n\r\n## Install mistralrs-server 0.2.2\r\n\r\n### Install prebuilt binaries via shell script\r\n\r\n```sh\r\ncurl --proto '=https' --tlsv1.2 -LsSf https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.2\u002Fmistralrs-server-installer.sh | sh\r\n```\r\n\r\n## Download mistralrs-server 0.2.2\r\n\r\n|  File  | Platform | Checksum |\r\n|--------|----------|----------|\r\n| [mistralrs-server-aarch64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.2\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz) | Apple Silicon macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.2\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.2\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz) | Intel macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.2\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-unknown-linux-gnu.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.2\u002Fmistralrs-server-x86_64-unknown-linux-gnu.tar.xz) | x64 Linux | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.2\u002Fmistralrs-server-x86_64-unknown-linux-gnu.tar.xz.sha256) |\r\n\r\n\r\n","2024-07-24T17:22:02",{"id":239,"version":240,"summary_zh":241,"released_at":242},108086,"v0.2.1","## What's Changed\r\n* Fix path normalize for mistralrs-paged-attn by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F592\r\n* ISQ python example by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F593\r\n* Add support for mistral nemo by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F595\r\n* Fix dtype with QLinear by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F600\r\n* Update paged-attn build.rs with NVCC flags by @joshpopelka20 in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F604\r\n* Bump openssl from 0.10.64 to 0.10.66 by @dependabot in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F605\r\n* Update GitHub issue templates by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F607\r\n* Add server throughput logging by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F608\r\n* Make the plotly feature optional by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F597\r\n* Use OnceLock for Python bindings device by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F602\r\n* Topk for X-LoRA scalings by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F609\r\n* Fix server cross-origin errors by @openmynet in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F610\r\n* Refactor sampler by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F611\r\n* Bump version to 0.2.1 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F613\r\n\r\n## New Contributors\r\n* @dependabot made their first contribution in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F605\r\n* @openmynet made their first contribution in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F610\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fcompare\u002Fv0.2.0...v0.2.1\r\n\r\n## Install mistralrs-server 0.2.1\r\n\r\n### Install prebuilt binaries via shell script\r\n\r\n```sh\r\ncurl --proto '=https' --tlsv1.2 -LsSf https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.1\u002Fmistralrs-server-installer.sh | sh\r\n```\r\n\r\n## Download mistralrs-server 0.2.1\r\n\r\n|  File  | Platform | Checksum |\r\n|--------|----------|----------|\r\n| [mistralrs-server-aarch64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.1\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz) | Apple Silicon macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.1\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.1\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz) | Intel macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.1\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-unknown-linux-gnu.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.1\u002Fmistralrs-server-x86_64-unknown-linux-gnu.tar.xz) | x64 Linux | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.1\u002Fmistralrs-server-x86_64-unknown-linux-gnu.tar.xz.sha256) |\r\n\r\n\r\n","2024-07-23T16:31:03",{"id":244,"version":245,"summary_zh":246,"released_at":247},108087,"v0.2.0","## New features\r\n- Support .bin, .pt, .pth extensions\r\n- Add Starcoder 2 GGUF\r\n- 🔥 PagedAttention - beating llama.cpp running GGUF plus all the throughput benefits 😉 \r\n- Optimized performance and memory usage\r\n\r\n## Rust MSRV\r\n\r\nMSRV of `mistral.rs` v0.2.0 is 1.75.\r\n\r\n## What's Changed\r\n* Fix SWA order (flip it) for Gemma 2 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F554\r\n* Support .bin, .pt, .pth extensions by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F557\r\n* Update readme by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F558\r\n* Fix Starcoder 2 ISQ by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F559\r\n* Update deps by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F560\r\n* Add the starcoder2 GGUF arch by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F561\r\n* Readme update for starcoder2 gguf by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F562\r\n* Fix PyPI release trigger by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F566\r\n* Optimize multi-batch and inference performance with PagedAttention by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F552\r\n* [Breaking] Version 0.2.0 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F527\r\n* Paged attention support for vision models by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F567\r\n* Automatically use paged attn on cuda, get memory size by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F568\r\n* Add docs link for vision loader by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F570\r\n* Add matching for valid model weight names by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F571\r\n* Remove ensure about no paged attn for vision models by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F573\r\n* Add percentage utilization support to paged attn by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F574\r\n* Include block engine in paged attn metadata by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F576\r\n* Update deps and sync Candle by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F578\r\n* Optimize CLIP model by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F579\r\n* Use softmax_last_dim in CLIP by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F580\r\n* Fix method of calculating paged attn with util percent by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F581\r\n* Handle windows in paged attn build by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F577\r\n* Warn instead of error when paged attn not supported by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F583\r\n* Warn instead of error when paged attn for adapters not supported by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F584\r\n* Add support for lm_head to adapter models by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F586\r\n* Add default plotly feature by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F587\r\n* Improve memory handling of PagedAttention with GGUF by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F590\r\n* Fix Windows build on cuda w\u002F PagedAttention by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F589\r\n* Update cuda kernels build.rs on windows by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F591\r\n* Bump version to 0.2.0 and update docs by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F582\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fcompare\u002Fv0.1.26...v0.2.0\r\n\r\n## Install mistralrs-server 0.2.0\r\n\r\n### Install prebuilt binaries via shell script\r\n\r\n```sh\r\ncurl --proto '=https' --tlsv1.2 -LsSf https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.0\u002Fmistralrs-server-installer.sh | sh\r\n```\r\n\r\n## Download mistralrs-server 0.2.0\r\n\r\n|  File  | Platform | Checksum |\r\n|--------|----------|----------|\r\n| [mistralrs-server-aarch64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.0\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz) | Apple Silicon macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.0\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.0\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz) | Intel macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.0\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-unknown-linux-gnu.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.2.0\u002Fmistralrs-server-x86_64-unknown-linux-gnu.tar.xz) | x64 Linux | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases","2024-07-19T12:51:35",{"id":249,"version":250,"summary_zh":251,"released_at":252},108088,"v0.1.26","## What's Changed\r\n* Reference cargo dist in readme by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F536\r\n* Fix xlora table in readme by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F537\r\n* Warning about device mapping for anymoe by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F538\r\n* Add updater and installer programs to cargo dist functionality by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F540\r\n* Device mapping for AnyMoE by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F539\r\n* Add tracking of memory usage by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F392\r\n* Fix memory avail for cuda by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F543\r\n* Add LLaVA Support by @chenwanqq in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F484\r\n* Device mapping fix for normal llama by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F549\r\n* Add benchmark for a6000 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F550\r\n* Bump version to 0.1.26 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F551\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fcompare\u002Fv0.1.25...v0.1.26\r\n\r\n## Install mistralrs-server 0.1.26\r\n\r\n### Install prebuilt binaries via shell script\r\n\r\n```sh\r\ncurl --proto '=https' --tlsv1.2 -LsSf https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.1.26\u002Fmistralrs-server-installer.sh | sh\r\n```\r\n\r\n## Download mistralrs-server 0.1.26\r\n\r\n|  File  | Platform | Checksum |\r\n|--------|----------|----------|\r\n| [mistralrs-server-aarch64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.1.26\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz) | Apple Silicon macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.1.26\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.1.26\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz) | Intel macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.1.26\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-unknown-linux-gnu.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.1.26\u002Fmistralrs-server-x86_64-unknown-linux-gnu.tar.xz) | x64 Linux | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.1.26\u002Fmistralrs-server-x86_64-unknown-linux-gnu.tar.xz.sha256) |\r\n\r\n\r\n","2024-07-05T11:35:03",{"id":254,"version":255,"summary_zh":256,"released_at":257},108089,"v0.1.25","## Summary\r\n- Added [AnyMoE](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fblob\u002Fmaster\u002Fdocs\u002FANYMOE.md)\r\n- Added the [Starcoder 2](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F522) model\r\n- [Vision interactive mode](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F468)\r\n- [X-LoRA and LoRA support for Gemma 2](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F499)\r\n- [Optimized local loading speed](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F523)\r\n- [Release using `cargo dist`](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F480)\r\n\r\n## MSRV\r\n`cargo msrv`\r\n\r\nMSRV of mistral.rs is `1.75.0`.\r\n\r\n## What's Changed\r\n* Add a vision interactive mode by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F468\r\n* Add support for X-LoRA\u002FLoRA to Gemma 2 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F499\r\n* Add adapter activation support for Gemma 2 LoRA by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F500\r\n* Decrease MSRV to 1.75.0 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F501\r\n* Fix docs with preprocessor config by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F502\r\n* Allow for optional `model` field in OpenAI requests by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F504\r\n* Support streaming for completion by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F508\r\n* Select the best device in examples by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F507\r\n* Handle Metal error from bf16 autodtype selection by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F509\r\n* Use chat messages for the rust examples and show T\u002Fs by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F510\r\n* Avoid f64 to f32 cast in phirope for dtype on metal by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F512\r\n* Handle errors in rust examples by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F513\r\n* AnyMoE: Build an MoE model from anything, quickly by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F476\r\n* chore: update paths.rs by @eltociear in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F516\r\n* Add AnyMoE support for vision models by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F515\r\n* Support images in AnyMoE dataset by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F517\r\n* Handle metal case for nonzero and bitwise ops by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F518\r\n* Support saving and loading AnyMoE gating layer by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F519\r\n* Implement the Starcoder 2 model architecture by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F522\r\n* Docs for Starcoder 2 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F524\r\n* Starcoder2 docs by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F525\r\n* Export loader for starcoder2 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F526\r\n* Support the new phi3 models by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F530\r\n* Source the AnyMoE dataset from a structured JSON file by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F531\r\n* Create an AnyMoE loss graph by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F532\r\n* Add AnyMoE demo video by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F533\r\n* Add judgment for whether model_id is a local path to speed up local model loading by @chenwanqq in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F523\r\n* Update docs by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F534\r\n* release using cargo-dist by @kranurag7 in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F480\r\n* Bump version to 0.1.25 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F535\r\n\r\n## New Contributors\r\n* @kranurag7 made their first contribution in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F480\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fcompare\u002Fv0.1.24...v0.1.25\r\n\r\n\r\n## Download mistralrs-server 0.1.25\r\n\r\n|  File  | Platform | Checksum |\r\n|--------|----------|----------|\r\n| [mistralrs-server-aarch64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.1.25\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz) | Apple Silicon macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.1.25\u002Fmistralrs-server-aarch64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-apple-darwin.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.1.25\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz) | Intel macOS | [checksum](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Freleases\u002Fdownload\u002Fv0.1.25\u002Fmistralrs-server-x86_64-apple-darwin.tar.xz.sha256) |\r\n| [mistralrs-server-x86_64-unknown-linux-gnu.tar.xz](https:\u002F\u002Fgithub.com\u002FEricLBu","2024-07-03T12:41:06",{"id":259,"version":260,"summary_zh":261,"released_at":262},108090,"v0.1.24","**Patch release, please update**\r\n\r\n## What's Changed\r\n* Bump version to 0.1.24 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F497\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fcompare\u002Fv0.1.23...v0.1.24","2024-06-30T06:49:27",{"id":264,"version":265,"summary_zh":266,"released_at":267},108091,"v0.1.23","## What's Changed\r\n* Improve and update docs by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F477\r\n* Progress bar and logging when loading repeating layers by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F479\r\n* Update deps by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F483\r\n* Optimize decoding by removing redundant qkv transpose by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F487\r\n* Fixes and tweak docs, logging for local loading by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F489\r\n* Add the Gemma 2 model by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F490\r\n* Update demo video by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F491\r\n* Utilize new quantize_onto qtensor api by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F492\r\n* Update deps by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F493\r\n* Bump version to 0.1.23 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F495\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fcompare\u002Fv0.1.22...v0.1.23","2024-06-29T23:56:14",{"id":269,"version":270,"summary_zh":271,"released_at":272},108092,"v0.1.22","## What's Changed\r\n* Remove erroneously flaky CI test by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F466\r\n* NVCC flags support for mistralrs_core build by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F469\r\n* Prevent divide by zero in cuda kernel by @joshpopelka20 in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F471\r\n* Better cuda build.rs linking of stdc++ by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F472\r\n* Remove some unnecessary `&mut`s by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F473\r\n* Fix arg order for pdoc by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F474\r\n* Bump version to 0.1.22 by @EricLBuehler in https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fpull\u002F475\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs\u002Fcompare\u002Fv0.1.21...v0.1.22","2024-06-24T19:21:53"]