[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-mosecorg--mosec":3,"tool-mosecorg--mosec":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":67,"owner_avatar_url":76,"owner_bio":77,"owner_company":78,"owner_location":78,"owner_email":79,"owner_twitter":75,"owner_website":80,"owner_url":81,"languages":82,"stars":99,"forks":100,"last_commit_at":101,"license":102,"difficulty_score":23,"env_os":103,"env_gpu":104,"env_ram":104,"env_deps":105,"category_tags":112,"github_topics":113,"view_count":132,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":133,"updated_at":134,"faqs":135,"releases":164},230,"mosecorg\u002Fmosec","mosec","A high-performance ML model serving framework, offers dynamic batching and CPU\u002FGPU pipelines to fully exploit your compute machine","mosec 是一个高性能的机器学习模型在线服务框架，帮助开发者快速将训练好的模型部署为高效、稳定的后端 API。它解决了模型从离线测试到线上服务过程中常见的性能瓶颈问题，比如低吞吐、高延迟和资源利用率不足。mosec 特别适合熟悉 Python 的 AI 开发者或算法工程师使用，尤其适用于需要在云环境中部署 CPU\u002FGPU 混合任务、追求高并发与低延迟的场景。其核心亮点包括基于 Rust 构建的高性能调度层、支持动态批处理（dynamic batching）以提升推理效率，以及通过多阶段流水线（pipelined stages）灵活编排预处理、推理和后处理任务。此外，mosec 原生支持模型预热、优雅关闭和 Prometheus 监控，便于集成到 Kubernetes 等容器化运维体系中，让开发者更专注于模型本身和业务逻辑。","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmosecorg_mosec_readme_ea129b3252dc.png\" width=90% alt=\"MOSEC\" \u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca 
href=\"https:\u002F\u002Fdiscord.gg\u002FJq5vxuH69W\">\n    \u003Cimg alt=\"discord invitation link\" src=\"https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F916177932236521533?style=flat&logo=discord&color=blue&cacheSeconds=60\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fmosec\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fmosec?style=flat&logo=python&color=blue&cacheSeconds=60\" alt=\"PyPI version\" height=\"20\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fanaconda.org\u002Fconda-forge\u002Fmosec\">\n    \u003Cimg src=\"https:\u002F\u002Fanaconda.org\u002Fconda-forge\u002Fmosec\u002Fbadges\u002Fversion.svg\" alt=\"conda-forge\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fmosec\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fmosec\" alt=\"Python Version\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpepy.tech\u002Fproject\u002Fmosec\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmosecorg_mosec_readme_e553369f4034.png\" alt=\"PyPi monthly Downloads\" height=\"20\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ci>Model Serving made Efficient in the Cloud.\u003C\u002Fi>\n\u003C\u002Fp>\n\n## Introduction\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmosecorg_mosec_readme_232204322362.png\" width=70% alt=\"MOSEC\" \u002F>\n\u003C\u002Fp>\n\nMosec is a high-performance and flexible model serving framework for building ML model-enabled backend and microservices. 
It bridges the gap between any machine learning models you just trained and the efficient online service API.\n\n- **Highly performant**: web layer and task coordination built with Rust 🦀, which offers blazing speed in addition to efficient CPU utilization powered by async I\u002FO\n- **Ease of use**: user interface purely in Python 🐍, by which users can serve their models in an ML framework-agnostic manner using the same code as they do for offline testing\n- **Dynamic batching**: aggregate requests from different users for batched inference and distribute results back\n- **Pipelined stages**: spawn multiple processes for pipelined stages to handle CPU\u002FGPU\u002FIO mixed workloads\n- **Cloud friendly**: designed to run in the cloud, with the model warmup, graceful shutdown, and Prometheus monitoring metrics, easily managed by Kubernetes or any container orchestration systems\n- **Do one thing well**: focus on the online serving part, users can pay attention to the model optimization and business logic\n\n## Installation\n\nMosec requires Python 3.7 or above. Install the latest [PyPI package](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmosec\u002F) for Linux or macOS with:\n\n```shell\npip install -U mosec\n# or install with conda\nconda install conda-forge::mosec\n# or install with pixi\npixi add mosec\n```\n\nTo build from the source code, install [Rust](https:\u002F\u002Fwww.rust-lang.org\u002F) and run the following command:\n\n```shell\nmake package\n```\n\nYou will get a mosec wheel file in the `dist` folder.\n\n## Usage\n\nWe demonstrate how Mosec can help you easily host a pre-trained stable diffusion model as a service. 
You need to install [diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers) and [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) as prerequisites:\n\n```shell\npip install --upgrade diffusers[torch] transformers\n```\n\n### Write the server\n\n\u003Cdetails>\n\u003Csummary>Click me for server codes with explanations.\u003C\u002Fsummary>\n\nFirstly, we import the libraries and set up a basic logger to better observe what happens.\n\n```python\nfrom io import BytesIO\nfrom typing import List\n\nimport torch  # type: ignore\nfrom diffusers import StableDiffusionPipeline  # type: ignore\n\nfrom mosec import Server, Worker, get_logger\nfrom mosec.mixin import MsgpackMixin\n\nlogger = get_logger()\n```\n\nThen, we **build an API** for clients to query a text prompt and obtain an image based on the [stable-diffusion-v1-5 model](https:\u002F\u002Fhuggingface.co\u002Fstable-diffusion-v1-5\u002Fstable-diffusion-v1-5) in just 3 steps.\n\n1) Define your service as a class which inherits `mosec.Worker`. Here we also inherit `MsgpackMixin` to employ the [msgpack](https:\u002F\u002Fmsgpack.org\u002Findex.html) serialization format\u003Csup>(a)\u003C\u002Fsup>\u003C\u002Fa>.\n\n2) Inside the `__init__` method, initialize your model and put it onto the corresponding device. Optionally you can assign `self.example` with some data to warm up\u003Csup>(b)\u003C\u002Fsup>\u003C\u002Fa> the model. Note that the data should be compatible with your handler's input format, which we detail next.\n\n3) Override the `forward` method to write your service handler\u003Csup>(c)\u003C\u002Fsup>\u003C\u002Fa>, with the signature `forward(self, data: Any | List[Any]) -> Any | List[Any]`. 
Receiving\u002Freturning a single item or a list depends on whether [dynamic batching](#configuration)\u003Csup>(d)\u003C\u002Fsup>\u003C\u002Fa> is configured.\n\n```python\nclass StableDiffusion(MsgpackMixin, Worker):\n    def __init__(self):\n        self.pipe = StableDiffusionPipeline.from_pretrained(\n            \"sd-legacy\u002Fstable-diffusion-v1-5\", torch_dtype=torch.float16\n        )\n        self.pipe.enable_model_cpu_offload()\n        self.example = [\"useless example prompt\"] * 4  # warmup (batch_size=4)\n\n    def forward(self, data: List[str]) -> List[memoryview]:\n        logger.debug(\"generate images for %s\", data)\n        res = self.pipe(data)\n        logger.debug(\"NSFW: %s\", res[1])\n        images = []\n        for img in res[0]:\n            dummy_file = BytesIO()\n            img.save(dummy_file, format=\"JPEG\")\n            images.append(dummy_file.getbuffer())\n        return images\n```\n\n> [!NOTE]\n>\n> (a) In this example we return an image in binary format, which JSON does not support (unless it is base64-encoded, which makes the payload larger). Hence, msgpack suits our needs better. If we do not inherit `MsgpackMixin`, JSON will be used by default. In other words, the protocol of the service request\u002Fresponse can be msgpack, JSON, or any other format (check our [mixins](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Freference\u002Finterface.html#module-mosec.mixin)).\n>\n> (b) Warm-up usually helps to allocate GPU memory in advance. If a warm-up example is specified, the service will only be ready after the example has been forwarded through the handler. If no example is given, the first request's latency is expected to be longer. The `example` should be set as a single item or a list depending on what `forward` expects to receive. 
Moreover, if you want to warm up with multiple different examples, you may set `multi_examples` (demo [here](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fjax.html)).\n>\n> (c) This example shows a single-stage service, where the `StableDiffusion` worker directly takes in the client's prompt request and responds with the image. Thus `forward` can be considered a complete service handler. However, we can also design a multi-stage service with workers doing different jobs (e.g., downloading images, model inference, post-processing) in a pipeline. In this case, the whole pipeline is considered the service handler, with the first worker taking in the request and the last worker sending out the response. Data flows between workers through inter-process communication.\n>\n> (d) Since dynamic batching is enabled in this example, the `forward` method will receive a _list_ of strings, e.g., `['a cute cat playing with a red ball', 'a man sitting in front of a computer', ...]`, aggregated from different clients for _batch inference_, improving the system throughput.\n\nFinally, we append the worker to the server to construct a *single-stage* workflow, and specify the number of processes to run in parallel (`num=1`) and the maximum batch size (`max_batch_size=4`). The maximum batch size is the largest number of requests dynamic batching will accumulate before the timeout; the timeout is set with `max_wait_time=10` in milliseconds, the longest time Mosec waits before sending the batch to the Worker. Multiple stages can be [pipelined](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FPipeline_(computing)) to further boost the throughput (see [this example](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fpytorch.html#computer-vision)).\n\n```python\nif __name__ == \"__main__\":\n    server = Server()\n    # 1) `num` specifies the number of processes that will be spawned to run in parallel.\n    # 2) By configuring 
the `max_batch_size` with the value > 1, the input data in your\n    # `forward` function will be a list (batch); otherwise, it's a single item.\n    server.append_worker(StableDiffusion, num=1, max_batch_size=4, max_wait_time=10)\n    server.run()\n```\n\u003C\u002Fdetails>\n\n### Run the server\n\n\u003Cdetails>\n\u003Csummary>Click me to see how to run and query the server.\u003C\u002Fsummary>\n\nThe above snippets are merged in our example file. You may directly run at the project root level. We first have a look at the _command line arguments_ (explanations [here](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Freference\u002Farguments.html)):\n\n```shell\npython examples\u002Fstable_diffusion\u002Fserver.py --help\n```\n\nThen let's start the server with debug logs:\n\n```shell\npython examples\u002Fstable_diffusion\u002Fserver.py --log-level debug --timeout 30000\n```\n\nOpen `http:\u002F\u002F127.0.0.1:8000\u002Fopenapi\u002Fswagger\u002F` in your browser to get the OpenAPI doc.\n\nAnd in another terminal, test it:\n\n```shell\npython examples\u002Fstable_diffusion\u002Fclient.py --prompt \"a cute cat playing with a red ball\" --output cat.jpg --port 8000\n```\n\nYou will get an image named \"cat.jpg\" in the current directory.\n\nYou can check the metrics:\n\n```shell\ncurl http:\u002F\u002F127.0.0.1:8000\u002Fmetrics\n```\n\nThat's it! You have just hosted your **_stable-diffusion model_** as a service! 😉\n\u003C\u002Fdetails>\n\n## Examples\n\nMore ready-to-use examples can be found in the [Example](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Findex.html) section. 
It includes:\n\n- [Pipeline](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fecho.html): a simple echo demo even without any ML model.\n- [Request validation](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fvalidate.html): validate the request with type annotation and generate OpenAPI documentation.\n- [Multiple route](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fmulti_route.html): serve multiple models in one service\n- [Embedding service](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fembedding.html): OpenAI compatible embedding service\n- [Reranking service](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Frerank.html): rerank a list of passages based on a query\n- [Shared memory IPC](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fipc.html): inter-process communication with shared memory.\n- [Customized GPU allocation](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fenv.html): deploy multiple replicas, each using different GPUs.\n- [Customized metrics](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fmetric.html): record your own metrics for monitoring.\n- [Jax jitted inference](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fjax.html): just-in-time compilation speeds up the inference.\n- [Compression](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fcompression.html): enable request\u002Fresponse compression.\n- PyTorch deep learning models:\n  - [sentiment analysis](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fpytorch.html#natural-language-processing): infer the sentiment of a sentence.\n  - [image recognition](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fpytorch.html#computer-vision): categorize a given image.\n  - [stable diffusion](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fstable_diffusion.html): 
generate images based on texts, with msgpack serialization.\n\n## Configuration\n\n- Dynamic batching\n  - `max_batch_size` and `max_wait_time` (in milliseconds) are configured when you call `append_worker`.\n  - Make sure inference with the `max_batch_size` value won't cause an out-of-memory error on the GPU.\n  - Normally, `max_wait_time` should be less than the batch inference time.\n  - If enabled, it will collect a batch either when the number of accumulated requests reaches `max_batch_size` or when `max_wait_time` has elapsed. The service benefits from this feature when traffic is high.\n- Check the [arguments doc](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Freference\u002Farguments.html) for other configurations.\n\n## Deployment\n\n- If you're looking for a GPU base image with `mosec` installed, you can check the official image [`mosecorg\u002Fmosec`](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fmosecorg\u002Fmosec). For more complex use cases, check out [envd](https:\u002F\u002Fgithub.com\u002Ftensorchord\u002Fenvd).\n- This service doesn't need Gunicorn or NGINX, but you can certainly use an ingress controller when necessary.\n- This service should be the PID 1 process in the container since it controls multiple processes. If you need to run multiple processes in one container, you will need a supervisor. 
You may choose [Supervisor](https:\u002F\u002Fgithub.com\u002FSupervisor\u002Fsupervisor) or [Horust](https:\u002F\u002Fgithub.com\u002FFedericoPonzi\u002FHorust).\n- Remember to collect the **metrics**.\n  - `mosec_service_batch_size_bucket` shows the batch size distribution.\n  - `mosec_service_batch_duration_second_bucket` shows the duration of dynamic batching for each connection in each stage (starts from receiving the first task).\n  - `mosec_service_process_duration_second_bucket` shows the duration of processing for each connection in each stage (including the IPC time but excluding the `mosec_service_batch_duration_second_bucket`).\n  - `mosec_service_remaining_task` shows the number of currently processing tasks.\n  - `mosec_service_throughput` shows the service throughput.\n- Stop the service with `SIGINT` (`CTRL+C`) or `SIGTERM` (`kill {PID}`) since it has the graceful shutdown logic.\n\n## Performance tuning\n\n- Find out the best `max_batch_size` and `max_wait_time` for your inference service. The metrics will show the histograms of the real batch size and batch duration. Those are the key information to adjust these two parameters.\n- Try to split the whole inference process into separate CPU and GPU stages (ref [DistilBERT](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fpytorch.html#natural-language-processing)). Different stages will be run in a [data pipeline](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FPipeline_(software)), which will keep the GPU busy.\n- You can also adjust the number of workers in each stage. For example, if your pipeline consists of a CPU stage for preprocessing and a GPU stage for model inference, increasing the number of CPU-stage workers can help to produce more data to be batched for model inference at the GPU stage; increasing the GPU-stage workers can fully utilize the GPU memory and computation power. 
Both ways may contribute to higher GPU utilization, which consequently results in higher service throughput.\n- For multi-stage services, note that the data passing through different stages will be serialized\u002Fdeserialized by the `serialize_ipc\u002Fdeserialize_ipc` methods, so extremely large data might make the whole pipeline slow. The serialized data is passed to the next stage through Rust by default; you can enable shared memory to potentially reduce the latency (ref [RedisShmIPCMixin](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fipc.html#redis-shm-ipc-py)).\n- You should choose appropriate `serialize\u002Fdeserialize` methods, which are used to decode the user request and encode the response. By default, both use JSON. However, images and embeddings are not well supported by JSON. You can choose msgpack, which is faster and binary-compatible (ref [Stable Diffusion](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fstable_diffusion.html)).\n- Configure the threads for OpenBLAS or MKL. These libraries might not be able to choose the most suitable CPUs for the current Python process. You can configure this for each worker by using the [env](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Freference\u002Finterface.html#mosec.server.Server.append_worker) (ref [custom GPU allocation](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fenv.html)).\n- Enable HTTP\u002F2 on the client side. 
`mosec` automatically adapts to the user's protocol (e.g., HTTP\u002F2) since v0.8.8.\n\n## Adopters\n\nHere are some of the companies and individual users that are using Mosec:\n\n- [Modelz](https:\u002F\u002Fgithub.com\u002Ftensorchord\u002Fopenmodelz): Serverless platform for ML inference.\n- [MOSS](https:\u002F\u002Fgithub.com\u002FOpenLMLab\u002FMOSS\u002Fblob\u002Fmain\u002FREADME_en.md): An open-source conversational language model like ChatGPT.\n- [TencentCloud](https:\u002F\u002Fwww.tencentcloud.com\u002Fdocument\u002Fproduct\u002F1141\u002F45261): Tencent Cloud Machine Learning Platform, using Mosec as the [core inference server framework](https:\u002F\u002Fcloud.tencent.com\u002Fdocument\u002Fproduct\u002F851\u002F74148).\n- [TensorChord](https:\u002F\u002Fgithub.com\u002Ftensorchord): Cloud native AI infrastructure company.\n- [OAT](https:\u002F\u002Fgithub.com\u002Fsail-sg\u002Foat): Serving reward models for online LLM alignment.\n\n## Citation\n\nIf you find this software useful for your research, please consider citing:\n\n```\n@software{yang2021mosec,\n  title = {{MOSEC: Model Serving made Efficient in the Cloud}},\n  author = {Yang, Keming and Liu, Zichen and Cheng, Philip},\n  howpublished = {https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec},\n  year = {2021}\n}\n```\n\n## Contributing\n\nWe welcome any kind of contribution. Please give us feedback by [raising issues](https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fissues\u002Fnew\u002Fchoose) or discussing on [Discord](https:\u002F\u002Fdiscord.gg\u002FJq5vxuH69W). You can also directly [contribute](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fdevelopment\u002Fcontributing.html) your code with a pull request!\n\nTo start developing, you can use [envd](https:\u002F\u002Fgithub.com\u002Ftensorchord\u002Fenvd) to create an isolated and clean Python & Rust environment. 
Check the [envd-docs](https:\u002F\u002Fenvd.tensorchord.ai\u002F) or [build.envd](https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fblob\u002Fmain\u002Fbuild.envd) for more information.\n","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmosecorg_mosec_readme_ea129b3252dc.png\" width=90% alt=\"MOSEC\" \u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FJq5vxuH69W\">\n    \u003Cimg alt=\"discord invitation link\" src=\"https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F916177932236521533?style=flat&logo=discord&color=blue&cacheSeconds=60\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fmosec\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fmosec?style=flat&logo=python&color=blue&cacheSeconds=60\" alt=\"PyPI version\" height=\"20\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fanaconda.org\u002Fconda-forge\u002Fmosec\">\n    \u003Cimg src=\"https:\u002F\u002Fanaconda.org\u002Fconda-forge\u002Fmosec\u002Fbadges\u002Fversion.svg\" alt=\"conda-forge\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fmosec\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fmosec\" alt=\"Python Version\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpepy.tech\u002Fproject\u002Fmosec\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmosecorg_mosec_readme_e553369f4034.png\" alt=\"PyPi monthly Downloads\" height=\"20\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ci>让云上的模型服务更高效。\u003C\u002Fi>\n\u003C\u002Fp>\n\n## 简介\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmosecorg_mosec_readme_232204322362.png\" width=70% alt=\"MOSEC\" \u002F>\n\u003C\u002Fp>\n\nMosec 是一个高性能且灵活的模型服务（model 
serving）框架，用于构建支持机器学习模型的后端服务和微服务。它弥合了你刚训练好的任意机器学习模型与高效在线服务 API 之间的鸿沟。\n\n- **高性能**：Web 层和任务协调使用 Rust 🦀 构建，在异步 I\u002FO（async I\u002FO）驱动下，不仅提供极快的速度，还能高效利用 CPU 资源\n- **易于使用**：用户接口完全基于 Python 🐍，用户可以以与离线测试相同的代码，以不依赖具体机器学习框架（ML framework-agnostic）的方式部署模型\n- **动态批处理（Dynamic batching）**：聚合来自不同用户的请求进行批量推理，并将结果分发回各自用户\n- **流水线阶段（Pipelined stages）**：为流水线的不同阶段启动多个进程，以处理混合了 CPU\u002FGPU\u002FI\u002FO 的工作负载\n- **云原生友好（Cloud friendly）**：专为云环境设计，内置模型预热（warmup）、优雅关闭（graceful shutdown）以及 Prometheus 监控指标，可轻松通过 Kubernetes 或其他容器编排系统管理\n- **专注做好一件事**：专注于在线服务部分，让用户能更专注于模型优化和业务逻辑\n\n## 安装\n\nMosec 要求 Python 3.7 或更高版本。可通过以下命令在 Linux 或 macOS 上安装最新的 [PyPI 包](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmosec\u002F)：\n\n```shell\npip install -U mosec\n# 或使用 conda 安装\nconda install conda-forge::mosec\n# 或使用 pixi 安装\npixi add mosec\n```\n\n若需从源码构建，请先安装 [Rust](https:\u002F\u002Fwww.rust-lang.org\u002F)，然后运行以下命令：\n\n```shell\nmake package\n```\n\n你将在 `dist` 文件夹中获得一个 mosec 的 wheel 文件。\n\n## 使用示例\n\n我们演示如何使用 Mosec 轻松将一个预训练的 Stable Diffusion 模型部署为服务。你需要先安装前置依赖 [diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers) 和 [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)：\n\n```shell\npip install --upgrade diffusers[torch] transformers\n```\n\n### 编写服务端\n\n\u003Cdetails>\n\u003Csummary>点击此处查看带解释的服务端代码。\u003C\u002Fsummary>\n\n首先，我们导入所需的库，并设置一个基础的日志记录器（logger），以便更好地观察程序运行情况。\n\n```python\nfrom io import BytesIO\nfrom typing import List\n\nimport torch  # type: ignore\nfrom diffusers import StableDiffusionPipeline  # type: ignore\n\nfrom mosec import Server, Worker, get_logger\nfrom mosec.mixin import MsgpackMixin\n\nlogger = get_logger()\n```\n\n接下来，我们仅需 **3 个步骤** 即可构建一个 API，供客户端提交文本提示（prompt），并基于 [stable-diffusion-v1-5 模型](https:\u002F\u002Fhuggingface.co\u002Fstable-diffusion-v1-5\u002Fstable-diffusion-v1-5) 获取生成的图像。\n\n1) 将你的服务定义为一个继承自 `mosec.Worker` 的类。这里我们还继承了 `MsgpackMixin`，以使用 [msgpack](https:\u002F\u002Fmsgpack.org\u002Findex.html) 
序列化格式\u003Csup>(a)\u003C\u002Fsup>\u003C\u002Fa>。\n\n2) 在 `__init__` 方法中，初始化模型并将其加载到对应的设备上。你还可以选择性地为 `self.example` 赋值一些数据，用于模型预热（warmup）\u003Csup>(b)\u003C\u002Fsup>\u003C\u002Fa>。注意，这些数据必须与后续 `forward` 方法所期望的输入格式兼容。\n\n3) 重写 `forward` 方法来编写你的服务处理逻辑\u003Csup>(c)\u003C\u002Fsup>\u003C\u002Fa>，其函数签名为 `forward(self, data: Any | List[Any]) -> Any | List[Any]`。是否接收\u002F返回单个元素或列表，取决于是否启用了[动态批处理（dynamic batching）](#configuration)\u003Csup>(d)\u003C\u002Fsup>\u003C\u002Fa>。\n\n```python\nclass StableDiffusion(MsgpackMixin, Worker):\n    def __init__(self):\n        self.pipe = StableDiffusionPipeline.from_pretrained(\n            \"sd-legacy\u002Fstable-diffusion-v1-5\", torch_dtype=torch.float16\n        )\n        self.pipe.enable_model_cpu_offload()\n        self.example = [\"useless example prompt\"] * 4  # 预热 (batch_size=4)\n\n    def forward(self, data: List[str]) -> List[memoryview]:\n        logger.debug(\"generate images for %s\", data)\n        res = self.pipe(data)\n        logger.debug(\"NSFW: %s\", res[1])\n        images = []\n        for img in res[0]:\n            dummy_file = BytesIO()\n            img.save(dummy_file, format=\"JPEG\")\n            images.append(dummy_file.getbuffer())\n        return images\n```\n\n> [!NOTE]\n>\n> (a) 本例中我们以二进制格式返回图像，而 JSON 不支持二进制数据（除非使用 base64 编码，但这会增大负载体积）。因此 msgpack 更适合我们的需求。如果不继承 `MsgpackMixin`，则默认使用 JSON。换句话说，服务请求\u002F响应的协议可以是 msgpack、JSON 或其他任意格式（详见我们的 [mixins](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Freference\u002Finterface.html#module-mosec.mixin)）。\n>\n> (b) 预热通常有助于提前分配 GPU 内存。如果指定了预热示例，服务将在该示例通过 `forward` 处理完成后才进入就绪状态。若未提供示例，则首个请求的延迟会较长。`example` 应设置为单个元素或元组，具体取决于 `forward` 方法期望接收的数据形式。此外，如果你希望使用多个不同的示例进行预热，可以设置 `multi_examples`（示例见[此处](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fjax.html)）。\n>\n> (c) 本例展示的是单阶段服务：`StableDiffusion` 工作器直接接收客户端的提示请求并返回图像，因此 `forward` 
可视为完整的服务处理逻辑。但你也可以设计多阶段服务，让不同工作器在流水线（pipeline）中分别执行不同任务（例如下载图像、模型推理、后处理等）。此时整个流水线被视为服务处理逻辑，由第一个工作器接收请求，最后一个工作器返回响应，工作器间通过进程间通信传递数据。\n>\n> (d) 由于本例启用了动态批处理（dynamic batching），`forward` 方法将接收到一个字符串 _列表_，例如 `['a cute cat playing with a red ball', 'a man sitting in front of a computer', ...]`，这些请求来自不同客户端，被聚合后进行 _批处理推理（batch inference）_，从而提升系统吞吐量。\n\n最后，我们将工作器添加到服务器中，构建一个 *单阶段* 工作流（多个阶段可通过[流水线（pipeline）](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FPipeline_(computing)) 进一步提升吞吐量，参见[此示例](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fpytorch.html#computer-vision)），并指定并行运行的进程数（`num=1`）以及最大批处理大小（`max_batch_size=4`，即动态批处理在超时前最多累积的请求数；超时时间由 `max_wait_time=10` 毫秒定义，表示 Mosec 等待发送批次到工作器的最长时间）。\n\n```python\nif __name__ == \"__main__\":\n    server = Server()\n    # 1) `num` 指定将启动多少个进程并行运行。\n    # 2) 当 `max_batch_size` 设置为大于 1 的值时，`forward` 函数中的输入数据将是一个列表（批处理）；\n    #    否则，输入为单个元素。\n    server.append_worker(StableDiffusion, num=1, max_batch_size=4, max_wait_time=10)\n    server.run()\n```\n\u003C\u002Fdetails>\n\n### 运行服务端\n\n\u003Cdetails>\n\u003Csummary>点击此处查看如何运行和查询服务端。\u003C\u002Fsummary>\n\n上述代码片段已合并到我们的示例文件中。你可以在项目根目录下直接运行。首先查看 _命令行参数_（详细说明见[此处](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Freference\u002Farguments.html)）：\n\n```shell\npython examples\u002Fstable_diffusion\u002Fserver.py --help\n```\n\n然后以调试日志模式启动服务端：\n\n```shell\npython examples\u002Fstable_diffusion\u002Fserver.py --log-level debug --timeout 30000\n```\n\n在浏览器中打开 `http:\u002F\u002F127.0.0.1:8000\u002Fopenapi\u002Fswagger\u002F` 即可查看 OpenAPI 文档。\n\n在另一个终端中测试服务：\n\n```shell\npython examples\u002Fstable_diffusion\u002Fclient.py --prompt \"a cute cat playing with a red ball\" --output cat.jpg --port 8000\n```\n\n你将在当前目录下获得一张名为 \"cat.jpg\" 的图像。\n\n你还可以查看指标数据：\n\n```shell\ncurl http:\u002F\u002F127.0.0.1:8000\u002Fmetrics\n```\n\n搞定！你刚刚成功将 **_stable-diffusion 模型_** 部署为一项服务了！😉\n\u003C\u002Fdetails>\n\n## 示例\n\n更多开箱即用的示例可以在 
[示例](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Findex.html) 章节中找到，包括：\n\n- [Pipeline（流水线）](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fecho.html)：一个简单的 echo 演示，甚至不包含任何机器学习模型。\n- [请求验证（Request validation）](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fvalidate.html)：使用类型注解验证请求并生成 OpenAPI 文档。\n- [多路由（Multiple route）](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fmulti_route.html)：在一个服务中部署多个模型。\n- [Embedding 服务](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fembedding.html)：兼容 OpenAI 的 embedding 服务。\n- [重排序服务（Reranking service）](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Frerank.html)：根据查询对一组段落进行重排序。\n- [共享内存 IPC（Shared memory IPC）](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fipc.html)：使用共享内存进行进程间通信。\n- [自定义 GPU 分配（Customized GPU allocation）](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fenv.html)：部署多个副本，每个副本使用不同的 GPU。\n- [自定义指标（Customized metrics）](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fmetric.html)：记录您自己的监控指标。\n- [Jax 即时编译推理（Jax jitted inference）](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fjax.html)：通过即时编译（just-in-time compilation）加速推理。\n- [压缩（Compression）](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fcompression.html)：启用请求\u002F响应压缩。\n- PyTorch 深度学习模型：\n  - [情感分析（sentiment analysis）](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fpytorch.html#natural-language-processing)：推断句子的情感倾向。\n  - [图像识别（image recognition）](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fpytorch.html#computer-vision)：对给定图像进行分类。\n  - [Stable Diffusion](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fstable_diffusion.html)：基于文本生成图像，并使用 msgpack 序列化。\n\n## 配置\n\n- 动态批处理（Dynamic batching）\n  - `max_batch_size` 和 `max_wait_time（毫秒）` 在调用 `append_worker` 时配置。\n  - 确保使用 `max_batch_size` 进行推理不会导致 GPU 
内存溢出（out-of-memory）。\n  - 通常，`max_wait_time` 应小于批处理推理所需时间。\n  - 启用后，当累积请求数达到 `max_batch_size` 或等待时间达到 `max_wait_time` 时，将触发一次批处理。在高流量场景下，该功能可显著提升服务性能。\n- 其他配置项请参考 [参数文档](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Freference\u002Farguments.html)。\n\n## 部署\n\n- 如果您需要一个已预装 `mosec` 的 GPU 基础镜像，可以使用官方镜像 [`mosecorg\u002Fmosec`](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fmosecorg\u002Fmosec)。对于更复杂的使用场景，请参考 [envd](https:\u002F\u002Fgithub.com\u002Ftensorchord\u002Fenvd)。\n- 本服务无需 Gunicorn 或 NGINX，但在必要时仍可配合 Ingress 控制器使用。\n- 本服务应作为容器中的 PID 1 进程运行，因为它会管理多个子进程。如果您需要在单个容器中运行多个进程，则需使用进程管理器，例如 [Supervisor](https:\u002F\u002Fgithub.com\u002FSupervisor\u002Fsupervisor) 或 [Horust](https:\u002F\u002Fgithub.com\u002FFedericoPonzi\u002FHorust)。\n- 请务必收集 **指标（metrics）**：\n  - `mosec_service_batch_size_bucket`：显示批处理大小的分布。\n  - `mosec_service_batch_duration_second_bucket`：显示每个连接在各阶段的动态批处理耗时（从接收到第一个任务开始计算）。\n  - `mosec_service_process_duration_second_bucket`：显示每个连接在各阶段的处理耗时（包含 IPC 时间，但不包含 `mosec_service_batch_duration_second_bucket` 所涵盖的时间）。\n  - `mosec_service_remaining_task`：显示当前正在处理的任务数量。\n  - `mosec_service_throughput`：显示服务吞吐量。\n- 请通过 `SIGINT`（`CTRL+C`）或 `SIGTERM`（`kill {PID}`）信号停止服务，因为服务内置了优雅关闭（graceful shutdown）逻辑。\n\n## 性能调优\n\n- 为您的推理服务找出最佳的 `max_batch_size` 和 `max_wait_time`。指标会显示实际批处理大小和批处理耗时的直方图，这些是调整这两个参数的关键依据。\n- 尝试将整个推理过程拆分为独立的 CPU 和 GPU 阶段（参考 [DistilBERT](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fpytorch.html#natural-language-processing)）。不同阶段将以[数据流水线（data pipeline）](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FPipeline_(software)) 方式运行，从而保持 GPU 持续忙碌。\n- 您还可以调整每个阶段的工作进程（worker）数量。例如，如果您的流水线包含一个用于预处理的 CPU 阶段和一个用于模型推理的 GPU 阶段，增加 CPU 阶段的 worker 数量有助于为 GPU 阶段生成更多可批处理的数据；增加 GPU 阶段的 worker 数量则能更充分地利用 GPU 内存和计算能力。这两种方式都可能提高 GPU 利用率，从而提升服务吞吐量。\n- 对于多阶段服务，请注意：数据在不同阶段之间传递时会通过 `serialize_ipc\u002Fdeserialize_ipc` 方法进行序列化\u002F反序列化，因此过大的数据可能导致整个流水线变慢。默认情况下，序列化后的数据通过 Rust 传递到下一阶段，您可以启用共享内存以潜在降低延迟（参考 
[RedisShmIPCMixin](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fipc.html#redis-shm-ipc-py)）。\n- 您应选择合适的 `serialize\u002Fdeserialize` 方法，用于解析用户请求和编码响应。默认两者均使用 JSON，但 JSON 对图像和 embedding 等数据支持不佳。您可以选择更快且二进制兼容的 msgpack（参考 [Stable Diffusion](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fstable_diffusion.html)）。\n- 配置 OpenBLAS 或 MKL 的线程数。它们可能无法自动选择当前 Python 进程所使用的最合适 CPU 核心。您可以通过 [env](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Freference\u002Finterface.html#mosec.server.Server.append_worker) 为每个 worker 单独配置（参考 [自定义 GPU 分配](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Fenv.html)）。\n- 从客户端启用 HTTP\u002F2。自 v0.8.8 起，`mosec` 会自动适配用户使用的协议（如 HTTP\u002F2）。\n\n## 用户\n\n以下是一些正在使用 Mosec 的公司和个人用户：\n\n- [Modelz](https:\u002F\u002Fgithub.com\u002Ftensorchord\u002Fopenmodelz)：面向机器学习推理的 Serverless 平台。\n- [MOSS](https:\u002F\u002Fgithub.com\u002FOpenLMLab\u002FMOSS\u002Fblob\u002Fmain\u002FREADME_en.md)：一个开源的类 ChatGPT 对话语言模型。\n- [腾讯云（TencentCloud）](https:\u002F\u002Fwww.tencentcloud.com\u002Fdocument\u002Fproduct\u002F1141\u002F45261)：腾讯云机器学习平台，使用 Mosec 作为[核心推理服务器框架](https:\u002F\u002Fcloud.tencent.com\u002Fdocument\u002Fproduct\u002F851\u002F74148)。\n- [TensorChord](https:\u002F\u002Fgithub.com\u002Ftensorchord)：云原生 AI 基础设施公司。\n- [OAT](https:\u002F\u002Fgithub.com\u002Fsail-sg\u002Foat)：用于在线大语言模型（LLM）对齐的奖励模型服务。\n\n## 引用（Citation）\n\n如果您在研究中发现本软件有用，请考虑引用以下内容：\n\n```\n@software{yang2021mosec,\n  title = {{MOSEC: Model Serving made Efficient in the Cloud}},\n  author = {Yang, Keming and Liu, Zichen and Cheng, Philip},\n  howpublished = {https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec},\n  year = {2021}\n}\n```\n\n## 贡献（Contributing）\n\n我们欢迎任何形式的贡献。您可以通过 [提交 Issue](https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fissues\u002Fnew\u002Fchoose) 或在 [Discord](https:\u002F\u002Fdiscord.gg\u002FJq5vxuH69W) 
上与我们讨论来提供反馈。您也可以直接[贡献代码](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fdevelopment\u002Fcontributing.html)并提交 Pull Request！\n\n开始开发时，您可以使用 [envd](https:\u002F\u002Fgithub.com\u002Ftensorchord\u002Fenvd) 创建一个隔离且干净的 Python 和 Rust 环境。更多信息请参阅 [envd 文档](https:\u002F\u002Fenvd.tensorchord.ai\u002F) 或 [build.envd](https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fblob\u002Fmain\u002Fbuild.envd) 文件。","# mosec 快速上手指南\n\n## 环境准备\n\n- **操作系统**：Linux 或 macOS\n- **Python 版本**：3.7 或更高\n- **可选依赖**（如运行 Stable Diffusion 示例）：\n  ```shell\n  pip install --upgrade diffusers[torch] transformers\n  ```\n\n> 💡 提示：国内用户建议配置 PyPI 镜像源（如清华源）以加速安装：\n> ```shell\n> pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple -U mosec\n> ```\n\n## 安装步骤\n\n### 通过 PyPI 安装（推荐）\n```shell\npip install -U mosec\n```\n\n### 通过 conda 安装\n```shell\nconda install conda-forge::mosec\n```\n\n### 通过 pixi 安装\n```shell\npixi add mosec\n```\n\n> 如需从源码构建，请先安装 [Rust](https:\u002F\u002Fwww.rust-lang.org\u002F)，然后执行：\n> ```shell\n> make package\n> ```\n> 构建好的 wheel 文件将位于 `dist\u002F` 目录下。\n\n## 基本使用\n\n以下是一个使用 mosec 部署 Stable Diffusion 模型的最简示例：\n\n### 1. 
编写服务代码（`server.py`）\n\n```python\nfrom io import BytesIO\nfrom typing import List\n\nimport torch\nfrom diffusers import StableDiffusionPipeline\n\nfrom mosec import Server, Worker, get_logger\nfrom mosec.mixin import MsgpackMixin\n\nlogger = get_logger()\n\nclass StableDiffusion(MsgpackMixin, Worker):\n    def __init__(self):\n        self.pipe = StableDiffusionPipeline.from_pretrained(\n            \"sd-legacy\u002Fstable-diffusion-v1-5\", torch_dtype=torch.float16\n        )\n        self.pipe.enable_model_cpu_offload()\n        self.example = [\"useless example prompt\"] * 4  # 模型预热\n\n    def forward(self, data: List[str]) -> List[memoryview]:\n        logger.debug(\"generate images for %s\", data)\n        res = self.pipe(data)\n        images = []\n        for img in res[0]:\n            dummy_file = BytesIO()\n            img.save(dummy_file, format=\"JPEG\")\n            images.append(dummy_file.getbuffer())\n        return images\n\nif __name__ == \"__main__\":\n    server = Server()\n    server.append_worker(StableDiffusion, num=1, max_batch_size=4, max_wait_time=10)\n    server.run()\n```\n\n### 2. 启动服务\n\n```shell\npython server.py --log-level debug --timeout 30000\n```\n\n服务启动后，可通过以下方式访问：\n\n- **API 文档**：访问 `http:\u002F\u002F127.0.0.1:8000\u002Fopenapi\u002Fswagger\u002F`\n- **指标监控**：`curl http:\u002F\u002F127.0.0.1:8000\u002Fmetrics`\n\n### 3. 
调用示例（客户端）\n\n客户端脚本 `client.py` 位于 mosec 仓库的 `examples\u002Fstable_diffusion\u002F` 目录下，请在仓库根目录运行：\n\n```shell\npython examples\u002Fstable_diffusion\u002Fclient.py --prompt \"a cute cat playing with a red ball\" --output cat.jpg --port 8000\n```\n\n执行后将在当前目录生成 `cat.jpg` 图像文件。\n\n> 更多完整示例请参考官方文档：[mosec Examples](https:\u002F\u002Fmosecorg.github.io\u002Fmosec\u002Fexamples\u002Findex.html)","某 AI 创业公司正在为电商平台开发一个“AI 商品图生成”服务，用户输入商品描述即可实时生成高质量营销图片，后端基于 Stable Diffusion 模型提供支持。\n\n### 没有 mosec 时\n- 直接用 Flask 封装模型，每个请求独占 GPU 资源，高并发下 GPU 利用率低、响应延迟飙升至 10 秒以上  \n- 手动实现批处理逻辑复杂且易出错，难以动态合并不同用户的文本生成请求  \n- CPU 预处理（如文本编码）和 GPU 推理耦合在同一进程，无法并行，造成资源闲置  \n- 缺乏生产级能力：无优雅停机、无指标监控，Kubernetes 部署后难以运维和扩缩容  \n\n### 使用 mosec 后\n- 利用 mosec 的动态批处理能力，自动聚合多个用户请求，在 GPU 上批量推理，吞吐量提升 4 倍，平均延迟降至 2 秒内  \n- 通过 mosec 的流水线机制，将文本编码（CPU 密集）与图像生成（GPU 密集）拆分为独立阶段，并行执行，硬件利用率接近满载  \n- 内置 Prometheus 指标暴露、模型预热和优雅关闭功能，轻松集成到现有 K8s 集群，运维成本大幅降低  \n- 仅需编写标准 Python 类继承 `Worker`，无需修改原有模型代码，快速上线稳定服务  \n\nmosec 让团队以极低改造成本，将实验阶段的生成模型高效转化为高并发、可运维的生产级 API 服务。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmosecorg_mosec_f943eab5.png","mosecorg","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmosecorg_463943ce.png","machine learning model serving infra",null,"mosecorg@gmail.com","https:\u002F\u002Fmosecorg.github.io","https:\u002F\u002Fgithub.com\u002Fmosecorg",[83,87,91,95],{"name":84,"color":85,"percentage":86},"Python","#3572A5",67.7,{"name":88,"color":89,"percentage":90},"Rust","#dea584",30.1,{"name":92,"color":93,"percentage":94},"Dockerfile","#384d54",1.2,{"name":96,"color":97,"percentage":98},"Makefile","#427819",1,894,72,"2026-04-02T20:34:56","Apache-2.0","Linux, macOS","未说明",{"notes":106,"python":107,"dependencies":108},"支持从 PyPI、conda 或源码安装；若从源码构建需先安装 Rust；示例中使用了 msgpack 
序列化以支持二进制数据传输；动态批处理可提升吞吐量；支持多阶段流水线和自定义指标监控。","3.7+",[109,110,111],"torch","diffusers","transformers",[14,13,55,26],[114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131],"model-serving","deep-learning","machine-learning","nerual-network","mlops","machine-learning-platform","hacktoberfest","gpu","python","pytorch","tensorflow","llm","jax","llm-serving","rust","cv","mxnet","tts",4,"2026-03-27T02:49:30.150509","2026-04-06T06:45:54.175100",[136,141,146,151,156,160],{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},669,"如何解决请求体过大（'request body is too large'）的问题？","Mosec 默认限制请求体大小为 10MB。若需处理更大的请求（如大型 NumPy 数组），目前无法通过命令行参数直接调整，但可以修改源码并重新编译。维护者建议：如果确实需要更大请求体，可自行修改 Rust 代码中的 max-body-size 设置，并承担性能影响的风险。未来可能会支持类似 --max_req_size 的启动参数。","https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fissues\u002F521",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},670,"在 macOS M1 上运行 Mosec 时提示 'site-packages\u002Fmosec\u002Fbin\u002Fmosec' 找不到，如何解决？","该问题通常与 Rust 版本兼容性有关。建议使用 Rust 1.79.0 进行构建，因为更高版本（如 1.80+）可能因依赖库（如 time-0.3.34）编译失败。从 mosec 0.9.1 起已支持更多平台，可尝试直接通过 pip install mosec 安装 ARM 兼容版本。","https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fissues\u002F507",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},671,"如何对输入数据进行类型校验以避免 '500 inference internal error'？","Mosec 支持通过自定义装饰器或 mixin 实现类型校验。维护者推荐使用 ValidationError 异常配合 pydantic 或 msgspec 进行验证。例如，可定义 TypedDict 并在 forward 方法上添加校验逻辑，当输入格式不符时主动抛出 ValidationError，从而向用户返回明确错误信息而非内部错误。","https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fissues\u002F351",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},672,"为什么高并发请求下 Mosec 服务的吞吐量低、批次不满？","低并发性能可能与客户端请求发送方式、网络延迟或服务端批处理配置有关。虽然 Issue 中未给出完整解决方案，但建议检查是否使用了合适的异步客户端（如 httpx）、确保请求密集发送，并确认 max_batch_size 和 timeout 参数设置合理。官方后续可能提供基准测试工具辅助调优。","https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fissues\u002F564",{"id":157,"question_zh":158,"answer_zh":159,"source_url":150},673,"能否在 Mosec 中集成 OpenAPI 
或自动请求格式文档？","Mosec 已内置 OpenAPI 支持：自 0.7.2 起提供 OpenAPI 文档，自 0.8.0 起提供 Swagger UI，服务启动后在浏览器中打开 `http:\u002F\u002F127.0.0.1:8000\u002Fopenapi\u002Fswagger\u002F` 即可查看。基于类型注解的请求校验自 0.6.7 起支持，可结合 pydantic 或 msgspec 定义输入输出 schema，在校验请求的同时生成 OpenAPI 文档（参见官方示例中的“请求验证”）。",{"id":161,"question_zh":162,"answer_zh":163,"source_url":140},674,"处理图像等大对象时，是否必须使用 JPEG 字节流？能否直接传 NumPy 数组？","Mosec 不强制要求使用 JPEG 字节流，但直接传输大型 NumPy 数组容易触发 10MB 请求体限制。维护者建议尽量使用紧凑格式（如 JPEG）减少体积。若必须传数组，需自行扩展 max-body-size 限制，或考虑将预处理结果序列化为更高效格式（如 msgpack + buffer 协议），但仍受限于默认大小上限。",[165,170,175,180,185,190,195,200,205,210,215,220,225,230,235,240,245,250,255,260],{"id":166,"version":167,"summary_zh":168,"released_at":169},100246,"0.9.6","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Changes 🛠\r\n* fix: make rust 1.88 clippy happy by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F657\r\n### More Documentation 📚\r\n* docs: make the halt error msg precise by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F655\r\n### Minor changes 🧹\r\n* chore: replace pre-commit with prek by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F661\r\n* chore: drop py3.9, add py3.14, py3.14t by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F670\r\n* chore: run macOS amd test on the new runner by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F675\r\n* chore: release 0.9.6 by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F676\r\n* chore: bump cuda and miniconda in dockerfile by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F677\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.9.5...0.9.6","2025-11-25T06:52:31",{"id":171,"version":172,"summary_zh":173,"released_at":174},100247,"0.9.5","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## 
What's Changed\r\n### Changes 🛠\r\n* fix: align registered Runtime & appended Worker timeout by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F653\r\n### Exciting New Features 🎉\r\n* feat: adopt uv for venv and package management  by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F635\r\n### More Documentation 📚\r\n* docs: add pixi by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F643\r\n### Minor changes 🧹\r\n* chore: bump dep version by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F634\r\n* chore: push to ghcr, fix typos by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F636\r\n* chore: fix the modelz link in readme by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F641\r\n* chore: bump zstd to avoid the yanked version by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F642\r\n* chore: group the dependabot prs by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F645\r\n* chore: allow zlib license by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F649\r\n* chore: add code owner by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F650\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.9.3...0.9.5","2025-06-10T14:35:18",{"id":176,"version":177,"summary_zh":178,"released_at":179},100248,"0.9.3","> [!note]\r\n> Set as pre-release because I forgot to update the version in the `Cargo.toml`. 
:cry: \r\n> Also, this only contains a CI fix to the previous release, should check the previous changelog.\r\n\r\n\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Changes 🛠\r\n* fix: maturin release action with latest stable toolchain by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F631\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.9.2...0.9.3","2025-02-23T06:44:34",{"id":181,"version":182,"summary_zh":183,"released_at":184},100249,"0.9.2","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Changes 🛠\r\n* fix: dockerfile mosec bin path & cargo-deny trigger by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F621\r\n* Fix bibtex by @lkevinzc in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F625\r\n### Exciting New Features 🎉\r\n* feat: find out the installed mosec path by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F624\r\n* feat: change to rust edition 2024 by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F630\r\n### Minor changes 🧹\r\n* chore: fix github pages by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F618\r\n* chore: add cargo deny to lint cargo dependencies by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F619\r\n* chore: fix cargo deny by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F620\r\n* chore: fix discord badge by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F622\r\n* chore: add ubuntu arm to ci test by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F623\r\n\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.9.1...0.9.2","2025-02-22T03:31:14",{"id":186,"version":187,"summary_zh":188,"released_at":189},100250,"0.9.1","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at 0.9.1 -->\r\n\r\n## What's Changed\r\n### Exciting New Features 🎉\r\n* feat: deprecate py3.8 by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F602\r\n* feat: switch to OIDC PyPI release by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F613\r\n* feat: bump axum to 0.8, enable 3.13 test by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F615\r\n* feat: adopt maturin by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F616\r\n### More Documentation 📚\r\n* docs: add sitemap and robots by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F603\r\n* docs: add openapi keyword to readme by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F611\r\n### Minor changes 🧹\r\n* chore: add MSRV by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F612\r\n* chore: fix release ci yml by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F617\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.9.0...0.9.1","2025-01-09T12:33:12",{"id":191,"version":192,"summary_zh":193,"released_at":194},100251,"0.9.0","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Exciting New Features 🎉\r\n* feat: support gzip & zstd compression by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F599\r\n\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.8.9...0.9.0","2024-11-18T12:33:43",{"id":196,"version":197,"summary_zh":198,"released_at":199},100252,"0.8.9","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Exciting New Features 🎉\r\n* feat: decrease the guard check interval, bump version by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F596\r\n### More Documentation 📚\r\n* docs: update embedding.md by @eltociear in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F574\r\n### Minor changes 🧹\r\n* chore: upgrade lychee action and accept 403 forbidden by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F580\r\n* chore: fix the dynamic extra dependencies by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F591\r\n### Others 🔔\r\n* [Docs] Add source_edit_link to point to the correct path by @sravan1946 in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F578\r\n* Update README.md by @lkevinzc in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F593\r\n\r\n## New Contributors\r\n* @eltociear made their first contribution in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F574\r\n* @sravan1946 made their first contribution in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F578\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.8.8...0.8.9","2024-11-13T14:59:15",{"id":201,"version":202,"summary_zh":203,"released_at":204},100253,"0.8.8","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Changes 🛠\r\n* fix: replace pkg_resources with importlib by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F563\r\n### Exciting New Features 
🎉\r\n* feat: add http\u002F2 support by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F568\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.8.7...0.8.8","2024-09-22T02:10:17",{"id":206,"version":207,"summary_zh":208,"released_at":209},100254,"0.8.7","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Refactoring 🧬\r\n* refactor: HTTPStautsCode -> HTTPStatusCode by @monologg in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F552\r\n### Minor changes 🧹\r\n* chore: freeze numpy \u003C 2 by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F543\r\n* chore: bump version for derive_more by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F561\r\n* chore: replace once_cell with std OnceLock by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F562\r\n\r\n## New Contributors\r\n* @monologg made their first contribution in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F552\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.8.6...0.8.7","2024-09-04T07:23:34",{"id":211,"version":212,"summary_zh":213,"released_at":214},100255,"0.8.6","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Changes 🛠\r\n* Fix rerank server error by @miaojinc in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F537\r\n* fix: annotations lazy evaluation by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F538\r\n### Minor changes 🧹\r\n* chore: rm ci cache by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F539\r\n\r\n## New Contributors\r\n* @miaojinc made their first contribution in 
https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F537\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.8.5...0.8.6","2024-06-16T09:32:07",{"id":216,"version":217,"summary_zh":218,"released_at":219},100256,"0.8.5","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### More Documentation 📚\r\n* docs: update the macOS ARM64 in README by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F509\r\n### Minor changes 🧹\r\n* chore: fix typos and broken links by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F522\r\n* chore: bump version to avoid CI failure by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F529\r\n### Others 🔔\r\n* test: add arm in ci by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F510\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.8.4...0.8.5","2024-05-21T02:43:49",{"id":221,"version":222,"summary_zh":223,"released_at":224},100257,"0.8.4","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n:tada: We support macOS ARM release now! 
:tada: \r\n\r\n## What's Changed\r\n### More Documentation 📚\r\n* docs: add conda installation command by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F492\r\n* docs: cross-encoder rerank model by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F504\r\n### Minor changes 🧹\r\n* chore: fix release docker image by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F490\r\n* chore: adopt ruff by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F503\r\n* chore: support macOS ARM release by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F508\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.8.3...0.8.4","2024-02-27T07:58:38",{"id":226,"version":227,"summary_zh":228,"released_at":229},100258,"0.8.3","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Changes 🛠\r\n* fix: upgrade to hyper 1 by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F478\r\n### Minor changes 🧹\r\n* chore: upload artifacts to os specific dir and merge downloads by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F489\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.8.2...0.8.3","2024-01-01T12:25:02",{"id":231,"version":232,"summary_zh":233,"released_at":234},100259,"0.8.2","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Exciting New Features 🎉\r\n* feat: use utc time for log by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F471\r\n### More Documentation 📚\r\n* docs: organize the examples folder by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F456\r\n* 
docs: add emb example by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F457\r\n### Minor changes 🧹\r\n* chore: fix the docker image tags by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F454\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.8.1...0.8.2","2023-12-05T03:19:36",{"id":236,"version":237,"summary_zh":238,"released_at":239},100260,"0.8.1","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Exciting New Features 🎉\r\n* feat: add endpoint label to the code metrics by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F437\r\n* feat: add py 3.12 test and release by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F447\r\n### More Documentation 📚\r\n* docs: add build from source code guide by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F431\r\n### Minor changes 🧹\r\n* chore: fix CI docker release tag by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F428\r\n* chore: bump utoipa version, update dep license file by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F453\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.8.0...0.8.1","2023-10-14T05:04:35",{"id":241,"version":242,"summary_zh":243,"released_at":244},100261,"0.8.0","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Exciting New Features 🎉\r\n* feat: support swagger-ui for the openapi by @n063h in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F407\r\n* feat: support multi-route with shared workers by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F423\r\n### More 
Documentation 📚\r\n* docs: update readme about pid 1 and dynamic batching by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F400\r\n* docs: fix readthedocs build config by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F401\r\n* docs: add performance tuning guide by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F409\r\n* docs: add concepts, faq, migration guide by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F411\r\n### Minor changes 🧹\r\n* chore: add readthedoc support by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F396\r\n* chore: add gpu docker image in release by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F398\r\n* chore: align the naming, fix the test lint by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F399\r\n* chore: add the codeql lint config by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F406\r\n* chore: fix the sphinx ci by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F412\r\n* chore: use collapsible readme by @lkevinzc in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F413\r\n* chore: add conda-forge badge to the readme by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F425\r\n* chore: increase the timeout for stable diffusion example by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F427\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.7.2...0.8.0","2023-08-05T01:20:52",{"id":246,"version":247,"summary_zh":248,"released_at":249},100262,"0.7.2","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Exciting New Features 🎉\r\n* feat: 
provide openapi doc by @n063h in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F370\r\n* feat: support server sent event by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F333\r\n* feat: support `--log-level` CLI argument by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F394\r\n### Refactoring 🧬\r\n* refactor: sse abstraction in worker and coordinator by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F391\r\n### Minor changes 🧹\r\n* chore: add sleep to service test to catch the check bug by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F380\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.7.1...0.7.2","2023-06-21T03:26:04",{"id":251,"version":252,"summary_zh":253,"released_at":254},100263,"0.7.1","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Changes 🛠\r\n* fix: check child process exit code by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F379\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.7.0...0.7.1","2023-06-06T09:43:44",{"id":256,"version":257,"summary_zh":258,"released_at":259},100264,"0.7.0","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Changes 🛠\r\n* fix: use try to import msgspec by @lkevinzc in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F368\r\n* fix: plasma mixin should use super serde with shm storage by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F369\r\n* fix: args should only be checked when server is used by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F371\r\n### Exciting New Features 🎉\r\n* feat: 
add content type to the inference response header by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F360\r\n* feat: support redis shm mixin by @n063h in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F367\r\n* feat: support dynamic inference route by @cutecutecat in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F374\r\n### More Documentation 📚\r\n* docs: mark ipc wrapper as deprecated by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F378\r\n### Minor changes 🧹\r\n* chore: better logo for night mode by @lkevinzc in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F365\r\n\r\n## New Contributors\r\n* @cutecutecat made their first contribution in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F374\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.6.7...0.7.0","2023-06-06T04:06:38",{"id":261,"version":262,"summary_zh":263,"released_at":264},100265,"0.6.7","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at main -->\r\n\r\n## What's Changed\r\n### Exciting New Features 🎉\r\n* feat: change plasma plugin to mixin by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F350\r\n* feat: support request validation according to annotation by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F352\r\n### More Documentation 📚\r\n* docs: fix plasma related doc, add deprecate warning by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F354\r\n* docs: combine the metrics related page by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F358\r\n### Minor changes 🧹\r\n* chore: only enable nightly test for Linux by @kemingy in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F346\r\n* chore: Add tencent cloud by 
@gaocegege in https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fpull\u002F356\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosecorg\u002Fmosec\u002Fcompare\u002F0.6.6...0.6.7","2023-05-19T10:58:58"]