[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-vllm-project--guidellm":3,"tool-vllm-project--guidellm":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",158594,2,"2026-04-16T23:34:05",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":76,"owner_url":77,"languages":78,"stars":102,"forks":103,"last_commit_at":104,"license":105,"difficulty_score":32,"env_os":106,"env_gpu":107,"env_ram":108,"env_deps":109,"category_tags":115,"github_topics":76,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":117,"updated_at":118,"faqs":119,"releases":149},8238,"vllm-project\u002Fguidellm","guidellm","Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs","GuideLLM 是一个专为大语言模型（LLM）部署打造的评估与优化平台，旨在帮助团队在接近真实生产环境的负载下，精准衡量模型性能。它解决了传统基准测试工具往往只关注接口连通性，而忽视首字延迟（TTFT）、令牌间延迟（ITL）等关键指标分布的问题，让开发者能够依据服务等级目标（SLO）进行数据驱动的调优。\n\n无论是负责模型落地的工程师，还是研究系统扩展性的研究人员，都能利用 GuideLLM 模拟同步、并发及速率限制等多种真实流量模式。它支持使用真实或合成的多模态数据集，通过端到端的交互仿真，生成包含详细资源需求和操作极限的标准报告，从而辅助容量规划与回归测试。\n\n该工具的独特亮点在于其深度适配 LLM 特性，不仅提供细粒度的令牌级统计信息，还具备高吞吐量的压测能力，支持多进程、异步执行及灵活的 CLI\u002FAPI 调用。相比通用脚本，GuideLLM 直接兼容 
OpenAI 接口与 vLLM 原生服务，无需定制格式即可融入现有的 Python 开发工作流，是构建高效、可靠大模型应用的得力助手。","\u003Cp align=\"center\">\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fvllm-project\u002Fguidellm\u002Fmain\u002Fdocs\u002Fassets\u002Fguidellm-logo-light.png\">\n    \u003Cimg alt=\"GuideLLM Logo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvllm-project_guidellm_readme_f20eb31f5eba.png\" width=55%>\n  \u003C\u002Fpicture>\n\u003C\u002Fp>\n\n\u003Ch3 align=\"center\">\nSLO-aware Benchmarking and Evaluation Platform for Optimizing Real-World LLM Inference\n\u003C\u002Fh3>\n\n[![GitHub Release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Fvllm-project\u002Fguidellm.svg?label=Version)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Freleases) [![Documentation](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocumentation-8A2BE2?logo=read-the-docs&logoColor=%23ffffff&color=%231BC070)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Ftree\u002Fmain\u002Fdocs) [![License](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fvllm-project\u002Fguidellm.svg)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002FLICENSE) [![PyPI Release](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fguidellm.svg?label=PyPI%20Release)](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002Fguidellm) [![Python Versions](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10--3.13-orange)](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002Fguidellm) [![Nightly Build](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fvllm-project\u002Fguidellm\u002Fnightly.yml?branch=main&label=Nightly%20Build)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Factions\u002Fworkflows\u002Fnightly.yml)\n\n## Overview\n\n\u003Cp>\n  \u003Cpicture>\n    \u003Csource 
media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fvllm-project\u002Fguidellm\u002Fmain\u002Fdocs\u002Fassets\u002Fguidellm-user-flows-dark.png\">\n    \u003Cimg alt=\"GuideLLM User Flows\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvllm-project_guidellm_readme_f13c42ee2b96.png\">\n  \u003C\u002Fpicture>\n\u003C\u002Fp>\n\n**GuideLLM** is a platform for evaluating how language models perform under real workloads and configurations. It simulates end-to-end interactions with OpenAI-compatible and vLLM-native servers, generates workload patterns that reflect production usage, and produces detailed reports that help teams understand system behavior, resource needs, and operational limits. GuideLLM supports real and synthetic datasets, multimodal inputs, and flexible execution profiles, giving engineering and ML teams a consistent framework for assessing model behavior, tuning deployments, and planning capacity as their systems evolve.\n\n### Why GuideLLM?\n\nGuideLLM gives teams a clear picture of performance, efficiency, and reliability when deploying LLMs in production-like environments.\n\n- **Captures complete latency and token-level statistics for SLO-driven evaluation**, including full distributions for TTFT, ITL, and end-to-end behavior.\n- **Generates realistic, configurable traffic patterns** across synchronous, concurrent, and rate-based modes, including reproducible sweeps to identify safe operating ranges.\n- **Supports both real and synthetic multimodal datasets**, enabling controlled experiments and production-style evaluations in one framework.\n- **Produces standardized, exportable reports for dashboards, analysis, and regression tracking**, ensuring consistency across teams and workflows.\n- **Delivers high-throughput, extensible benchmarking** with multiprocessing, threading, async execution, and a flexible CLI\u002FAPI for customization or quickstarts.\n\n### Comparisons\n\nMany tools 
benchmark endpoints, not models, and miss the details that matter for LLMs. GuideLLM focuses exclusively on LLM-specific workloads, measuring TTFT, ITL, output distributions, and dataset-driven variation. It fits into everyday engineering tasks by using standard Python interfaces and HuggingFace datasets instead of custom formats or research-only pipelines. It is also built for performance, supporting high-rate load generation and accurate scheduling far beyond simple scripts or example benchmarks. The table below highlights how this approach compares to other options.\n\n| Tool                                                                         | CLI | API | High Perf | Full Metrics | Data Modalities                | Data Sources                          | Profiles                                                      | Backends                        | Endpoints                                                                 | Output Types             |\n| ---------------------------------------------------------------------------- | --- | --- | --------- | ------------ | ------------------------------ | ------------------------------------- | ------------------------------------------------------------- | ------------------------------- | ------------------------------------------------------------------------- | ------------------------ |\n| GuideLLM                                                                     | ✅  | ✅  | ✅        | ✅           | Text, Image, Audio, Video      | HuggingFace, Files, Synthetic, Custom | Synchronous, Concurrent, Throughput, Constant, Poisson, Sweep | OpenAI-compatible               | \u002Fcompletions, \u002Fchat\u002Fcompletions, \u002Faudio\u002Ftranslation, \u002Faudio\u002Ftranscription | console, json, csv, html |\n| [inference-perf](https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Finference-perf)          | ✅  | ❌  | ✅        | ❌           | Text                           | Synthetic, Specific Datasets        
  | Concurrent, Constant, Poisson, Sweep                          | OpenAI-compatible               | \u002Fcompletions, \u002Fchat\u002Fcompletions                                           | json, png                |\n| [genai-bench](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fgenai-bench)                    | ✅  | ❌  | ❌        | ❌           | Text, Image, Embedding, ReRank | Synthetic, File                       | Concurrent                                                    | OpenAI-compatible, Hosted Cloud | \u002Fchat\u002Fcompletions, \u002Fembeddings                                            | console, xlsx, png       |\n| [llm-perf](https:\u002F\u002Fgithub.com\u002Fray-project\u002Fllmperf)                           | ❌  | ❌  | ✅        | ❌           | Text                           | Synthetic                             | Concurrent                                                    | OpenAI-compatible, Hosted Cloud | \u002Fchat\u002Fcompletions                                                         | json                     |\n| [ollama-benchmark](https:\u002F\u002Fgithub.com\u002Faidatatools\u002Follama-benchmark)          | ✅  | ❌  | ❌        | ❌           | Text                           | Synthetic                             | Synchronous                                                   | Ollama                          | \u002Fcompletions                                                              | console, json            |\n| [vllm\u002Fbenchmarks](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\u002Ftree\u002Fmain\u002Fbenchmarks) | ✅  | ❌  | ❌        | ❌           | Text                           | Synthetic, Specific Datasets          | Synchronous, Throughput, Constant, Sweep                      | OpenAI-compatible, vLLM API     | \u002Fcompletions, \u002Fchat\u002Fcompletions                                           | console, png             |\n\n## What's New\n\nThis section summarizes the newest capabilities 
available to users and outlines the current areas of development. It helps readers understand how the platform is evolving and what to expect next.\n\n**Recent Additions**\n\n- Refactored architecture enabling high-rate load generation at scale and a more extensible interface for additional backends, data pipelines, load generation schedules, benchmarking constraints, and output formats.\n- Added multimodal benchmarking support for image, video, and audio workloads across chat completions, transcription, and translation APIs.\n- Broader metrics collection, including richer statistics for visual, audio, and text inputs such as image sizes, audio lengths, video frame counts, and word-level data.\n\n**Active Development**\n\n- Generation of synthetic multimodal datasets for controlled experimentation across images, audio, and video.\n- Extended prefixing options for testing system-prompt and user-prompt variations.\n- Multi-turn conversation capabilities for benchmarking chat agents and dialogue systems.\n- Views and outputs specific to speculative decoding.\n\n## Quick Start\n\nThe Quick Start shows how to install GuideLLM, launch a server, and run your first benchmark in a few minutes.\n\n### Install GuideLLM\n\nBefore installing, ensure you have the following prerequisites:\n\n- OS: Linux or macOS\n- Python: 3.10 - 3.13\n\nInstall the latest GuideLLM release from PyPI using `pip`:\n\n```bash\npip install \"guidellm[recommended]\"\n```\n\nOr install from source:\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm.git\n```\n\nOr run the latest container from [ghcr.io\u002Fvllm-project\u002Fguidellm](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpkgs\u002Fcontainer\u002Fguidellm):\n\n```bash\npodman run \\\n  --rm -it \\\n  -v \".\u002Fresults:\u002Fresults:rw\" \\\n  -e GUIDELLM_TARGET=http:\u002F\u002Flocalhost:8000 \\\n  -e GUIDELLM_PROFILE=sweep \\\n  -e GUIDELLM_MAX_SECONDS=30 \\\n  -e 
GUIDELLM_DATA=\"prompt_tokens=256,output_tokens=128\" \\\n  ghcr.io\u002Fvllm-project\u002Fguidellm:latest\n```\n\n### Launch an Inference Server\n\nStart any OpenAI-compatible endpoint. For vLLM:\n\n```bash\nvllm serve \"neuralmagic\u002FMeta-Llama-3.1-8B-Instruct-quantized.w4a16\"\n```\n\nVerify the server is running at `http:\u002F\u002Flocalhost:8000`.\n\n### Run Your First Benchmark\n\nRun a sweep that identifies the maximum performance and maximum rates for the model:\n\n```bash\nguidellm benchmark \\\n  --target \"http:\u002F\u002Flocalhost:8000\" \\\n  --profile sweep \\\n  --max-seconds 30 \\\n  --data \"prompt_tokens=256,output_tokens=128\"\n```\n\nYou will see progress updates and per-benchmark summaries during the run, as shown below:\n\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fvllm-project\u002Fguidellm\u002Fmain\u002Fdocs\u002Fassets\u002Fsample-benchmarks.gif\"\u002F>\n\n### Inspect Outputs\n\nAfter the benchmark completes, GuideLLM saves all results into the output directory you specified (default: the current directory). You'll see a summary printed in the console along with a set of file locations (`.json`, `.csv`, `.html`) that contain the full results of the run.\n\nThe following section, **Output Files and Reports**, explains what each file contains and how to use them for analysis, visualization, or automation.\n\n## Output Files and Reports\n\nAfter running the Quick Start benchmark, GuideLLM writes several output files to the directory you specified. Each one focuses on a different layer of analysis, ranging from a quick on-screen summary to fully structured data for dashboards and regression pipelines.\n\n**Console Output**\n\nThe console provides a lightweight summary with high-level statistics for each benchmark in the run. It's useful for quick checks to confirm that the server responded correctly, the load sweep completed, and the system behaved as expected. 
Additionally, the output tables can be copied and pasted into spreadsheet software using `|` as the delimiter. The sections will look similar to the following:\n\n\u003Cimg alt=\"Sample GuideLLM benchmark output\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvllm-project_guidellm_readme_067c41c27f28.png\" \u002F>\n\n**benchmarks.json**\n\nThis file is the authoritative record of the entire benchmark session. It includes configuration, metadata, per-benchmark statistics, and sample request entries with individual request timings. Use it for debugging, deeper analysis, or loading into Python with `GenerativeBenchmarksReport`.\n\nAlternatively, pass the `--outputs yaml` argument to generate a YAML version of this file, which has the same content as `benchmarks.json` and is easier for humans to read.\n\n**benchmarks.csv**\n\nThis file provides a compact tabular view of each benchmark with the fields most commonly used for reporting: throughput, latency percentiles, token counts, and rate information. It opens cleanly in spreadsheets and BI tools and is well suited for comparisons across runs.\n\n**benchmarks.html**\n\nThe HTML report provides a visual summary of results, including charts of latency distributions, throughput behavior, and generation patterns. It's ideal for quick exploration or sharing with teammates without requiring them to parse JSON.\n\n## Common Use Cases and Configurations\n\nGuideLLM supports a wide range of LLM benchmarking workflows. The examples below show how to run typical scenarios and highlight the parameters that matter most. For a complete list of arguments, details, and options, run `guidellm benchmark run --help`.\n\n### Load Patterns\n\nSimulating different applications requires different traffic shapes. 
This example demonstrates rate-based load testing using a constant profile at 10 requests per second, running for 20 seconds with synthetic data of 128 prompt tokens and 256 output tokens.\n\n```bash\nguidellm benchmark \\\n  --target http:\u002F\u002Flocalhost:8000 \\\n  --profile constant \\\n  --rate 10 \\\n  --max-seconds 20 \\\n  --data \"prompt_tokens=128,output_tokens=256\"\n```\n\n**Key parameters:**\n\n- `--profile`: Defines the traffic pattern - options include `synchronous` (sequential requests), `concurrent` (parallel users), `throughput` (maximum capacity), `constant` (fixed requests\u002Fsec), `poisson` (randomized requests\u002Fsec), or `sweep` (automatic rate exploration)\n- `--rate`: The numeric rate value whose meaning depends on profile - for `sweep` it's the number of benchmarks, for `concurrent` it's simultaneous requests, for `constant`\u002F`poisson` it's requests per second\n- `--max-seconds`: Maximum duration in seconds for each benchmark run (can also use `--max-requests` to limit by request count instead)\n\n### Dataset Sources\n\nGuideLLM supports HuggingFace datasets, local files, and synthetic data. This example loads the CNN DailyMail dataset (configuration `3.0.0`) from HuggingFace and maps the `article` column to prompts.\n\n```bash\nguidellm benchmark run \\\n  --target http:\u002F\u002Flocalhost:8000 \\\n  --data \"abisee\u002Fcnn_dailymail\" \\\n  --data-args '{\"name\": \"3.0.0\"}' \\\n  --data-column-mapper '{\"text_column\":\"article\"}'\n```\n\n**Key parameters:**\n\n- `--data`: Data source specification - accepts HuggingFace dataset IDs (prefix with `hf:`), local file paths (`.json`, `.csv`, `.jsonl`, `.txt`), or synthetic data configs (JSON object or `key=value` pairs like `prompt_tokens=256,output_tokens=128`)\n- `--data-args`: Arguments for loading the dataset. 
See [`datasets.load_dataset`](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdatasets\u002Fv4.5.0\u002Fen\u002Fpackage_reference\u002Floading_methods#datasets.load_dataset) for valid options.\n- `--data-column-mapper`: JSON object of arguments for dataset creation - commonly used to specify column mappings like `text_column`, `output_tokens_count_column`, or HuggingFace dataset parameters\n- `--data-samples`: Number of samples to use from the dataset - use `-1` (default) for all samples with dynamic generation, or specify a positive integer to limit sample count\n- `--processor`: Tokenizer or processor name used for generating synthetic data - if not provided and required for the dataset, automatically loads from the model; accepts HuggingFace model IDs or local paths\n\n### Request Types and API Targets\n\nYou can benchmark chat completions, text completions, or other supported request types. This example configures the benchmark to test chat completions API using a custom dataset file, with GuideLLM automatically formatting requests to match the chat completions schema.\n\n```bash\nguidellm benchmark \\\n  --target http:\u002F\u002Flocalhost:8000 \\\n  --request-type chat_completions \\\n  --data path\u002Fto\u002Fdata.json\n```\n\n**Key parameters:**\n\n- `--request-type`: Specifies the API endpoint format - options include `chat_completions` (chat API format), `completions` (text completion format), `audio_transcription` (audio transcription), and `audio_translation` (audio translation).\n\n### Using Scenarios\n\nBuilt-in scenarios bundle schedules, dataset settings, and request formatting to standardize common testing patterns. 
This example uses the pre-configured chat scenario which includes appropriate defaults for chat model evaluation, with any additional CLI arguments overriding the scenario's settings.\n\n```bash\nguidellm benchmark --scenario chat --target http:\u002F\u002Flocalhost:8000\n```\n\n**Key parameters:**\n\n- `--scenario`: Built-in scenario name or path to a custom scenario configuration file - built-in options include pre-configured testing patterns for common use cases; CLI options passed alongside this will override the scenario's default settings\n\n### Benchmark Controls\n\nWarm-up, cooldown, and maximum limits help ensure stable, repeatable measurements. This example runs a concurrent benchmark with 16 parallel requests, using 10% warmup and cooldown periods to exclude initialization and shutdown effects, while limiting the test to stop if more than 5 errors occur.\n\n```bash\nguidellm benchmark \\\n  --target http:\u002F\u002Flocalhost:8000 \\\n  --profile concurrent \\\n  --rate 16 \\\n  --warmup 0.1 \\\n  --cooldown 0.1 \\\n  --max-errors 5 \\\n  --data \"prompt_tokens=256,output_tokens=128\" \\\n  --detect-saturation\n```\n\n**Key parameters:**\n\n- `--warmup`: Warm-up specification - values between 0 and 1 represent a percentage of total requests\u002Ftime, values ≥1 represent absolute request or time units.\n- `--cooldown`: Cool-down specification - same format as warmup, excludes final portion of benchmark from analysis to avoid shutdown effects\n- `--max-seconds`: Maximum duration in seconds for each benchmark before automatic termination\n- `--max-requests`: Maximum number of requests per benchmark before automatic termination\n- `--max-errors`: Maximum number of individual errors before stopping the benchmark entirely\n- `--data`: Data to use for benchmarking - synthetic data with 256 input and 128 output tokens\n- `--detect-saturation`: Enable over-saturation detection to automatically stop benchmarks when the model becomes over-saturated (see also 
`--over-saturation` for more advanced control)\n\n## Development and Contribution\n\nDevelopers interested in extending GuideLLM can use the project's established development workflow. Local setup, environment activation, and testing instructions are outlined in [DEVELOPING.md](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002FDEVELOPING.md). This guide explains how to run the benchmark suite, validate changes, and work with the CLI or API during development. Contribution standards are documented in [CONTRIBUTING.md](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002FCONTRIBUTING.md), including coding conventions, commit structure, and review guidelines. These standards help maintain stability as the platform evolves. The [CODE_OF_CONDUCT.md](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002FCODE_OF_CONDUCT.md) outlines expectations for respectful and constructive participation across all project spaces. For contributors who want deeper reference material, the documentation covers installation, backends, datasets, metrics, output types, and architecture. Reviewing these topics is useful when adding new backends, request types, or data integrations. Release notes and changelogs are linked from the GitHub Releases page and provide historical context for ongoing work.\n\n## Documentation\n\nThe complete documentation provides the details that do not fit in this README. It includes installation steps, backend configuration, dataset handling, metrics definitions, output formats, tutorials, and an architecture overview. 
These references help you explore the platform more deeply or integrate it into existing workflows.\n\nNotable docs are given below:\n\n- [**Installation Guide**](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002Fdocs\u002Fgetting-started\u002Finstall.md) - This guide provides step-by-step instructions for installing GuideLLM, including prerequisites and setup tips.\n- [**Backends Guide**](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002Fdocs\u002Fguides\u002Fbackends.md) - A comprehensive overview of supported backends and how to set them up for use with GuideLLM.\n- [**Data\u002FDatasets Guide**](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002Fdocs\u002Fguides\u002Fdatasets.md) - Information on supported datasets, including how to use them for benchmarking.\n- [**Metrics Guide**](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002Fdocs\u002Fguides\u002Fmetrics.md) - Detailed explanations of the metrics used in GuideLLM, including definitions and how to interpret them.\n- [**Outputs Guide**](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002Fdocs\u002Fguides\u002Foutputs.md) - Information on the different output formats supported by GuideLLM and how to use them.\n- [**Architecture Overview**](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002Fdocs\u002Fguides\u002Farchitecture.md) - A detailed look at GuideLLM's design, components, and how they interact.\n\n## License\n\nGuideLLM is licensed under the [Apache License 2.0](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002FLICENSE).\n\n## Cite\n\nIf you find GuideLLM helpful in your research or projects, please consider citing it:\n\n```bibtex\n@misc{guidellm2024,\n  title={GuideLLM: Scalable Inference and Optimization for Large Language Models},\n  author={Neural 
Magic, Inc.},\n  year={2024},\n  howpublished={\\url{https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm}},\n}\n```\n","\u003Cp align=\"center\">\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fvllm-project\u002Fguidellm\u002Fmain\u002Fdocs\u002Fassets\u002Fguidellm-logo-light.png\">\n    \u003Cimg alt=\"GuideLLM Logo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvllm-project_guidellm_readme_f20eb31f5eba.png\" width=55%>\n  \u003C\u002Fpicture>\n\u003C\u002Fp>\n\n\u003Ch3 align=\"center\">\n面向真实世界大模型推理优化的SLO感知基准测试与评估平台\n\u003C\u002Fh3>\n\n[![GitHub Release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Fvllm-project\u002Fguidellm.svg?label=版本)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Freleases) [![文档](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F文档-8A2BE2?logo=read-the-docs&logoColor=%23ffffff&color=%231BC070)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Ftree\u002Fmain\u002Fdocs) [![许可证](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fvllm-project\u002Fguidellm.svg)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002FLICENSE) [![PyPI发布](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fguidellm.svg?label=PyPI%20Release)](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002Fguidellm) [![Python版本](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10--3.13-orange)](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002Fguidellm) [![夜间构建](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fvllm-project\u002Fguidellm\u002Fnightly.yml?branch=main&label=夜间%20Build)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Factions\u002Fworkflows\u002Fnightly.yml)\n\n## 概述\n\n\u003Cp>\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: dark)\" 
srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fvllm-project\u002Fguidellm\u002Fmain\u002Fdocs\u002Fassets\u002Fguidellm-user-flows-dark.png\">\n    \u003Cimg alt=\"GuideLLM 用户流程\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvllm-project_guidellm_readme_f13c42ee2b96.png\">\n  \u003C\u002Fpicture>\n\u003C\u002Fp>\n\n**GuideLLM** 是一个用于评估语言模型在实际工作负载和配置下表现的平台。它能够模拟与 OpenAI 兼容及 vLLM 原生服务器的端到端交互，生成反映生产环境使用情况的工作负载模式，并输出详细的报告，帮助团队理解系统行为、资源需求以及运行上限。GuideLLM 支持真实和合成数据集、多模态输入以及灵活的执行配置文件，为工程和机器学习团队提供了一套一致的框架，以评估模型行为、优化部署并规划容量，从而适应系统的持续演进。\n\n### 为什么选择 GuideLLM？\n\nGuideLLM 能够在类生产环境中清晰地展示 LLM 部署时的性能、效率和可靠性。\n\n- **捕捉完整的延迟和 token 级别的统计信息，用于 SLO 驱动的评估**，包括 TTFT、ITL 和端到端行为的完整分布。\n- **生成逼真且可配置的流量模式**，涵盖同步、并发和基于速率的模式，支持可重复的扫描测试以识别安全的运行范围。\n- **同时支持真实和合成的多模态数据集**，在一个框架内实现受控实验和生产风格的评估。\n- **生成标准化且可导出的报告，用于仪表盘、分析和回归跟踪**，确保跨团队和工作流的一致性。\n- **提供高吞吐量、可扩展的基准测试能力**，支持多进程、多线程、异步执行，并配备灵活的 CLI\u002FAPI，便于定制或快速上手。\n\n### 比较\n\n许多工具仅对端点进行基准测试，而非模型本身，因此会忽略对大语言模型至关重要的细节。GuideLLM 专注于 LLM 特有的工作负载，能够测量 TTFT、ITL、输出分布以及由数据集驱动的差异性。它采用标准的 Python 接口和 HuggingFace 数据集，而非自定义格式或仅用于研究的流水线，因而能无缝融入日常工程任务。此外，GuideLLM 在性能方面也经过精心设计，支持高频率的负载生成和精确的调度，远超简单的脚本或示例基准测试。下表突显了这种方法与其他选项相比的优势。\n\n| 工具                                                                         | CLI | API | 高性能 | 全面指标 | 数据模态                | 数据来源                          | 配置文件                                                      | 后端                        | 端点                                                                 | 输出类型             |\n| ---------------------------------------------------------------------------- | --- | --- | -------- | ------------ | ------------------------------ | ------------------------------------- | ------------------------------------------------------------- | ------------------------------- | ------------------------------------------------------------------------- | ------------------------ |\n| GuideLLM                                                                     | ✅  | ✅  | 
✅        | ✅           | 文本、图像、音频、视频      | HuggingFace、文件、合成数据、自定义数据 | 同步、并发、吞吐量、恒定、泊松、扫描                          | OpenAI 兼容                   | \u002Fcompletions, \u002Fchat\u002Fcompletions, \u002Faudio\u002Ftranslation, \u002Faudio\u002Ftranscription | 控制台、JSON、CSV、HTML |\n| [inference-perf](https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Finference-perf)          | ✅  | ❌  | ✅        | ❌           | 文本                           | 合成数据、特定数据集          | 并发、恒定、泊松、扫描                          | OpenAI 兼容                   | \u002Fcompletions, \u002Fchat\u002Fcompletions                                           | JSON、PNG                |\n| [genai-bench](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fgenai-bench)                    | ✅  | ❌  | ❌        | ❌           | 文本、图像、嵌入、重排序     | 合成数据、文件                       | 并发                                                    | OpenAI 兼容、托管云服务       | \u002Fchat\u002Fcompletions, \u002Fembeddings                                            | 控制台、XLSX、PNG       |\n| [llm-perf](https:\u002F\u002Fgithub.com\u002Fray-project\u002Fllmperf)                           | ❌  | ❌  | ✅        | ❌           | 文本                           | 合成数据                             | 并发                                                    | OpenAI 兼容、托管云服务       | \u002Fchat\u002Fcompletions                                                         | JSON                     |\n| [ollama-benchmark](https:\u002F\u002Fgithub.com\u002Faidatatools\u002Follama-benchmark)          | ✅  | ❌  | ❌        | ❌           | 文本                           | 合成数据                             | 同步                                                   | Ollama                          | \u002Fcompletions                                                              | 控制台、JSON            |\n| [vllm\u002Fbenchmarks](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\u002Ftree\u002Fmain\u002Fbenchmarks) | ✅  | ❌  | ❌        | ❌           | 文本               
            | 合成数据、特定数据集          | 同步、吞吐量、恒定、扫描                      | OpenAI 兼容、vLLM API         | \u002Fcompletions, \u002Fchat-completions                                           | 控制台、PNG             |\n\n## 最新功能\n\n本节总结了用户可用的最新功能，并概述了当前的开发方向。这有助于读者了解平台的发展历程以及未来的预期。\n\n**近期新增功能**\n\n- 重构的新架构支持大规模高频率负载生成，并提供了更具扩展性的接口，以适应更多后端、数据管道、负载生成计划、基准测试约束及输出格式。\n- 新增多模态基准测试支持，涵盖聊天补全、转录和翻译 API 中的图像、视频和音频工作负载。\n- 扩展了指标收集范围，包括针对视觉、音频和文本输入的更丰富统计信息，例如图像尺寸、音频时长、视频帧数以及词级数据。\n\n**正在进行的开发**\n\n- 生成用于图像、音频和视频领域受控实验的多模态合成数据集。\n- 扩展前缀选项，以测试系统提示和用户提示的变化。\n- 实现多轮对话能力，用于对聊天代理和对话系统进行基准测试。\n- 提供推测解码相关的特定视图和输出。\n\n## 快速入门\n\n快速入门将展示如何在几分钟内安装 GuideLLM、启动服务器并运行首次基准测试。\n\n### 安装 GuideLLM\n\n在安装之前，请确保满足以下先决条件：\n\n- 操作系统：Linux 或 macOS\n- Python：3.10 - 3.13\n\n使用 `pip` 从 PyPI 安装最新版 GuideLLM：\n\n```bash\npip install guidellm[recommended]\n```\n\n或者从源代码安装：\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm.git\n```\n\n也可以直接运行来自 [ghcr.io\u002Fvllm-project\u002Fguidellm](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpkgs\u002Fcontainer\u002Fguidellm) 的最新容器：\n\n```bash\npodman run \\\n  --rm -it \\\n  -v \".\u002Fresults:\u002Fresults:rw\" \\\n  -e GUIDELLM_TARGET=http:\u002F\u002Flocalhost:8000 \\\n  -e GUIDELLM_PROFILE=sweep \\\n  -e GUIDELLM_MAX_SECONDS=30 \\\n  -e GUIDELLM_DATA=\"prompt_tokens=256,output_tokens=128\" \\\n  ghcr.io\u002Fvllm-project\u002Fguidellm:latest\n```\n\n### 启动推理服务器\n\n启动任意一个 OpenAI 兼容的端点。以 vLLM 为例：\n\n```bash\nvllm serve \"neuralmagic\u002FMeta-Llama-3.1-8B-Instruct-quantized.w4a16\"\n```\n\n请确认服务器已在 `http:\u002F\u002Flocalhost:8000` 正常运行。\n\n### 运行首次基准测试\n\n运行一次扫描测试，以确定模型的最大性能和最高速率：\n\n```bash\nguidellm benchmark \\\n  --target \"http:\u002F\u002Flocalhost:8000\" \\\n  --profile sweep \\\n  --max-seconds 30 \\\n  --data \"prompt_tokens=256,output_tokens=128\"\n```\n\n运行过程中，您将看到进度更新和每次基准测试的摘要，如下所示：\n\n\u003Cimg src= 
\"https:\u002F\u002Fraw.githubusercontent.com\u002Fvllm-project\u002Fguidellm\u002Fmain\u002Fdocs\u002Fassets\u002Fsample-benchmarks.gif\"\u002F>\n\n### 检查输出结果\n\n基准测试完成后，GuideLLM 会将所有结果保存到您指定的输出目录中（默认为当前目录）。控制台会打印汇总信息，并附带一组文件路径（`.json`、`.csv`、`.html`），其中包含了完整的运行结果。\n\n下一节——“输出文件与报告”——将详细说明每个文件的内容以及如何利用它们进行分析、可视化或自动化操作。\n\n## 输出文件与报告\n\n运行快速入门基准测试后，GuideLLM 会将多个输出文件写入您指定的目录中。每个文件侧重于不同层次的分析，从屏幕上的快速摘要，到可用于仪表板和回归管道的完全结构化数据。\n\n**控制台输出**\n\n控制台提供了一个轻量级的摘要，包含本次运行中每个基准测试的高级统计信息。它非常适合用于快速检查，以确认服务器响应正确、负载扫描已完成，并且系统行为符合预期。此外，输出表格可以使用 `|` 作为分隔符复制并粘贴到电子表格软件中。各部分的格式大致如下所示：\n\n\u003Cimg alt=\"Sample GuideLLM benchmark output\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvllm-project_guidellm_readme_067c41c27f28.png\" \u002F>\n\n**benchmarks.json**\n\n此文件是整个基准测试会话的权威记录。它包含配置、元数据、每个基准测试的统计信息，以及带有单个请求计时的示例请求条目。您可以将其用于调试、深入分析，或使用 `GenerativeBenchmarksReport` 加载到 Python 中。\n\n另外，您也可以通过使用 `--outputs yaml` 参数生成一个内容与 `benchmarks.json` 相同、但更易于人类阅读的 YAML 版本。\n\n**benchmarks.csv**\n\n此文件以紧凑的表格形式展示了每个基准测试的数据，字段包括最常用于报告的吞吐量、延迟百分位数、标记数量和速率信息。它可以在电子表格和 BI 工具中直接打开，非常适合用于跨次运行的比较。\n\n**benchmarks.html**\n\nHTML 报告提供了结果的可视化摘要，包括延迟分布图、吞吐量行为和生成模式等图表。它非常适合快速浏览，或与团队成员共享，而无需他们解析 JSON 文件。\n\n## 常见用例与配置\n\nGuideLLM 支持广泛的 LLM 基准测试工作流。以下示例展示了如何运行典型场景，并重点介绍了最重要的参数。如需完整的参数列表、详细信息和选项，请运行 `guidellm benchmark run --help`。\n\n### 负载模式\n\n模拟不同的应用需要不同的流量形状。此示例演示了基于速率的负载测试，采用每秒 10 个请求的恒定速率配置，持续 20 秒，使用包含 128 个提示标记和 256 个输出标记的合成数据。\n\n```bash\nguidellm benchmark \\\n  --target http:\u002F\u002Flocalhost:8000 \\\n  --profile constant \\\n  --rate 10 \\\n  --max-seconds 20 \\\n  --data \"prompt_tokens=128,output_tokens=256\"\n```\n\n**关键参数：**\n\n- `--profile`：定义流量模式——可选值包括 `synchronous`（顺序请求）、`concurrent`（并行用户）、`throughput`（最大容量）、`constant`（固定请求\u002F秒）、`poisson`（随机请求\u002F秒）或 `sweep`（自动速率探索）。\n- `--rate`：数值速率，其含义取决于所选的流量模式——对于 `sweep` 是基准测试的数量，对于 `concurrent` 是同时发起的请求数量，而对于 `constant` 或 `poisson` 则是每秒的请求数。\n- `--max-seconds`：每个基准测试运行的最大持续时间（单位为秒）。您也可以使用 `--max-requests` 来按请求数限制运行。\n\n### 
数据集来源\n\nGuideLLM 支持 HuggingFace 数据集、本地文件以及合成数据。此示例从 HuggingFace 加载 CNN DailyMail 数据集，并将文章列映射为提示，同时使用摘要标记数量列来确定输出长度。\n\n```bash\nguidellm benchmark run \\\n  --target http:\u002F\u002Flocalhost:8000 \\\n  --data \"abisee\u002Fcnn_dailymail\" \\\n  --data-args '{\"name\": \"3.0.0\"}' \\\n  --data-column-mapper '{\"text_column\":\"article\"}'\n```\n\n**关键参数：**\n\n- `--data`：数据源规范——接受 HuggingFace 数据集 ID（需加前缀 `hf:`）、本地文件路径（`.json`、`.csv`、`.jsonl`、`.txt`），或合成数据配置（JSON 对象或 `key=value` 键值对，例如 `prompt_tokens=256,output_tokens=128`）。\n- `--data-args`：加载数据集时使用的参数。有关有效选项，请参阅 [`datasets.load_dataset`](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdatasets\u002Fv4.5.0\u002Fen\u002Fpackage_reference\u002Floading_methods#datasets.load_dataset)。\n- `--data-column-mapper`：用于数据集创建的参数 JSON 对象——通常用于指定列映射，例如 `text_column`、`output_tokens_count_column`，或 HuggingFace 数据集的相关参数。\n- `--data-samples`：从数据集中使用的样本数量——使用 `-1`（默认值）表示动态生成所有样本，或指定一个正整数以限制样本数量。\n- `--processor`：用于生成合成数据的分词器或处理器名称——如果未提供且数据集需要，则会自动从模型中加载；接受 HuggingFace 模型 ID 或本地路径。\n\n### 请求类型与 API 目标\n\n您可以对聊天补全、文本补全或其他支持的请求类型进行基准测试。此示例配置基准测试以使用自定义数据集文件测试聊天补全 API，GuideLLM 会自动将请求格式化为匹配聊天补全的架构。\n\n```bash\nguidellm benchmark \\\n  --target http:\u002F\u002Flocalhost:8000 \\\n  --request-type chat_completions \\\n  --data path\u002Fto\u002Fdata.json\n```\n\n**关键参数：**\n\n- `--request-type`：指定 API 端点的格式——可选值包括 `chat_completions`（聊天 API 格式）、 `completions`（文本补全格式）、`audio_transcription`（音频转录）和 `audio_translation`（音频翻译）。\n\n### 使用场景\n\n内置场景将调度、数据集设置和请求格式化打包在一起，以标准化常见的测试模式。此示例使用预配置的聊天场景，其中包含了适用于聊天模型评估的适当默认值，任何额外的 CLI 参数都会覆盖场景的设置。\n\n```bash\nguidellm benchmark --scenario chat --target http:\u002F\u002Flocalhost:8000\n```\n\n**关键参数：**\n\n- `--scenario`：内置场景名称或自定义场景配置文件的路径——内置选项包括针对常见用例预配置的测试模式；与此参数一起传递的 CLI 选项将覆盖场景的默认设置。\n\n### 基准控制\n\n预热、冷却和最大限制有助于确保测量的稳定性和可重复性。此示例运行一个并发基准测试，包含16个并行请求，使用10%的预热和冷却期以排除初始化和关闭的影响，同时限制测试在出现超过5次错误时停止。\n\n```bash\nguidellm benchmark \\\n  --target http:\u002F\u002Flocalhost:8000 \\\n  --profile concurrent \\\n  
--rate 16 \\\n  --warmup 0.1 \\\n  --cooldown 0.1 \\\n  --max-errors 5 \\\n  --data \"prompt_tokens=256,output_tokens=128\" \\\n  --detect-saturation\n```\n\n**关键参数：**\n\n- `--warmup`：预热设置——值介于0到1之间表示总请求数或总时间的百分比，值≥1则表示绝对的请求数或时间单位。\n- `--cooldown`：冷却设置——格式与预热相同，用于将基准测试的最后一部分排除在分析之外，以避免关闭效应。\n- `--max-seconds`：每个基准测试的最大持续时间（秒），超过该时间将自动终止。\n- `--max-requests`：每个基准测试的最大请求数，超过该数量将自动终止。\n- `--max-errors`：在基准测试完全停止之前允许出现的最大错误次数。\n- `--data`：用于基准测试的数据——合成数据，输入256个标记，输出128个标记。\n- `--detect-saturation`：启用过饱和检测功能，当模型发生过饱和时自动停止基准测试（有关更高级的控制，请参阅`--over-saturation`）。\n\n## 开发与贡献\n\n对扩展GuideLLM感兴趣的开发者可以使用该项目已建立的开发流程。本地设置、环境激活及测试说明均在[DEVELOPING.md](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002FDEVELOPING.md)中列出。本指南解释了如何运行基准测试套件、验证更改，以及在开发过程中使用CLI或API的方法。贡献标准记录在[CONTRIBUTING.md](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002FCONTRIBUTING.md)中，包括编码规范、提交结构和评审指南。这些标准有助于在平台演进过程中保持稳定性。[CODE_OF_CONDUCT.md](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002FCODE_OF_CONDUCT.md)概述了在所有项目空间中进行尊重和建设性参与的期望。对于希望获取更深入参考资料的贡献者，文档涵盖了安装、后端、数据集、指标、输出类型和架构等内容。在添加新的后端、请求类型或数据集成时，查阅这些主题会很有帮助。发布说明和变更日志链接自GitHub Releases页面，为当前工作提供了历史背景。\n\n## 文档\n\n完整文档提供了本README中未涵盖的详细信息。其中包括安装步骤、后端配置、数据集处理、指标定义、输出格式、教程以及架构概览。这些参考资料有助于您更深入地探索该平台或将之集成到现有工作流中。\n\n以下是一些值得关注的文档：\n\n- [**安装指南**](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002Fdocs\u002Fgetting-started\u002Finstall.md)——本指南提供了安装GuideLLM的分步说明，包括先决条件和设置技巧。\n- [**后端指南**](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002Fdocs\u002Fguides\u002Fbackends.md)——全面概述了支持的后端及其在GuideLLM中的设置方法。\n- [**数据\u002F数据集指南**](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002Fdocs\u002Fguides\u002Fdatasets.md)——关于支持的数据集的信息，包括如何将其用于基准测试。\n- 
[**指标指南**](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002Fdocs\u002Fguides\u002Fmetrics.md)——对GuideLLM中使用的指标的详细解释，包括定义及如何解读。\n- [**输出指南**](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002Fdocs\u002Fguides\u002Foutputs.md)——关于GuideLLM支持的不同输出格式及其使用方法的信息。\n- [**架构概览**](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002Fdocs\u002Fguides\u002Farchitecture.md)——详细介绍GuideLLM的设计、组件及其相互作用。\n\n## 许可证\n\nGuideLLM采用[Apache许可证2.0版](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fblob\u002Fmain\u002FLICENSE)授权。\n\n## 引用\n\n如果您在研究或项目中发现GuideLLM有所帮助，请考虑引用它：\n\n```bibtex\n@misc{guidellm2024,\n  title={GuideLLM：大型语言模型的可扩展推理与优化},\n  author={Neural Magic, Inc.},\n  year={2024},\n  howpublished={\\url{https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm}},\n}\n```","# GuideLLM 快速上手指南\n\nGuideLLM 是一个面向生产环境的 LLM 推理基准测试与评估平台，专注于 SLO（服务等级目标）驱动的性能分析。它能模拟真实负载模式，生成详细的延迟分布（TTFT、ITL）和吞吐量报告，帮助团队优化模型部署。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Linux 或 macOS\n*   **Python 版本**：3.10 - 3.13\n*   **推理后端**：需要运行一个 OpenAI 兼容的推理服务（例如 vLLM、TGI 等）。\n\n## 安装步骤\n\n您可以选择通过 PyPI 安装推荐版本，或直接从源码安装。\n\n### 方式一：通过 PyPI 安装（推荐）\n\n```bash\npip install guidellm[recommended]\n```\n\n> **提示**：国内用户若下载缓慢，可添加清华或阿里镜像源：\n> `pip install guidellm[recommended] -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n### 方式二：从源码安装\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm.git\n```\n\n### 方式三：使用容器运行\n\n如果您偏好使用容器，可以直接运行最新的 Docker\u002FPodman 镜像：\n\n```bash\npodman run \\\n  --rm -it \\\n  -v \".\u002Fresults:\u002Fresults:rw\" \\\n  -e GUIDELLM_TARGET=http:\u002F\u002Flocalhost:8000 \\\n  -e GUIDELLM_PROFILE=sweep \\\n  -e GUIDELLM_MAX_SECONDS=30 \\\n  -e GUIDELLM_DATA=\"prompt_tokens=256,output_tokens=128\" \\\n  ghcr.io\u002Fvllm-project\u002Fguidellm:latest\n```\n\n## 基本使用\n\n以下是启动服务并运行第一个基准测试的最小化流程。\n\n### 1. 
启动推理服务器\n\n首先启动一个 OpenAI 兼容的推理服务。以下以 **vLLM** 为例：\n\n```bash\nvllm serve \"neuralmagic\u002FMeta-Llama-3.1-8B-Instruct-quantized.w4a16\"\n```\n\n*确保服务正常运行在 `http:\u002F\u002Flocalhost:8000`。*\n\n### 2. 运行基准测试\n\n使用 `guidellm benchmark` 命令执行压力测试。以下示例将执行一次 **Sweep（扫描）** 模式的测试，自动探测模型的最大性能边界：\n\n```bash\nguidellm benchmark \\\n  --target \"http:\u002F\u002Flocalhost:8000\" \\\n  --profile sweep \\\n  --max-seconds 30 \\\n  --data \"prompt_tokens=256,output_tokens=128\"\n```\n\n**参数说明：**\n*   `--target`: 推理服务的地址。\n*   `--profile sweep`: 执行负载扫描，寻找最佳吞吐量和最大并发率。\n*   `--max-seconds`: 每个测试场景的最大持续时间。\n*   `--data`: 定义合成数据的输入\u002F输出 Token 数量。\n\n### 3. 查看结果\n\n测试完成后，GuideLLM 会在当前目录生成以下文件：\n\n*   **控制台输出**：实时显示进度和关键指标摘要。\n*   `benchmarks.json`：包含完整的配置、元数据和详细请求日志，适合程序化分析。\n*   `benchmarks.csv`：表格格式，包含吞吐量、延迟百分位数等核心指标，适合导入 Excel 分析。\n*   `benchmarks.html`：可视化报告，包含延迟分布图和吞吐量趋势图。","某电商平台的 AI 客服团队正在为即将到来的“双 11”大促进行大模型服务扩容，急需验证新部署的 vLLM 集群能否在高压下满足严格的响应延迟要求。\n\n### 没有 guidellm 时\n- **盲测风险高**：仅靠简单的脚本发送随机请求，无法模拟真实用户长短不一的对话节奏，导致上线后突发长文本请求时系统频繁卡顿。\n- **指标缺失**：只能监控平均延迟，缺乏首字延迟（TTFT）和 token 生成间隔（ITL）的详细分布数据，难以定位是网络问题还是模型推理瓶颈。\n- **SLO 验证困难**：无法精确回答“在 99% 的情况下响应是否小于 500ms”，只能凭经验估算资源配额，造成服务器资源要么浪费要么不足。\n- **复现成本高**：遇到性能抖动时，缺乏可重放的标准化流量模式，开发团队花费数天时间试图手动复现故障场景。\n\n### 使用 guidellm 后\n- **真实流量仿真**：利用 guidellm 生成符合生产环境特征的混合负载（同步、并发及速率限制模式），提前暴露了长上下文场景下的显存溢出隐患。\n- **全链路深度洞察**：自动输出 TTFT 和 ITL 的完整分布统计报告，精准识别出特定长度区间的推理延迟异常，指导团队针对性优化参数。\n- **数据驱动决策**：基于 SLO 驱动的评估结果，团队确切知道了集群在满足 99% 请求低于 400ms 时的最大并发阈值，实现了资源的精准规划。\n- **高效回归测试**：通过保存标准化的执行配置，每次代码更新后一键重放相同压力测试，快速确认性能改进或防止退化。\n\nguidellm 
将模糊的性能猜测转化为可视化的数据决策，帮助团队在零故障的前提下完成了大促期间的容量规划与系统调优。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvllm-project_guidellm_dab030b5.png","vllm-project","vLLM","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fvllm-project_88aef4ba.png","",null,"https:\u002F\u002Fgithub.com\u002Fvllm-project",[79,83,87,91,95,99],{"name":80,"color":81,"percentage":82},"Python","#3572A5",89.8,{"name":84,"color":85,"percentage":86},"TypeScript","#3178c6",9.6,{"name":88,"color":89,"percentage":90},"JavaScript","#f1e05a",0.3,{"name":92,"color":93,"percentage":94},"Dockerfile","#384d54",0.2,{"name":96,"color":97,"percentage":98},"CSS","#663399",0,{"name":100,"color":101,"percentage":98},"Shell","#89e051",1011,144,"2026-04-16T09:40:15","Apache-2.0","Linux, macOS","未说明 (工具本身为基准测试客户端，GPU 需求取决于所连接的后端推理服务器，如 vLLM)","未说明",{"notes":110,"python":111,"dependencies":112},"GuideLLM 是一个用于评估 LLM 推理性能的基准测试平台，它通过调用 OpenAI 兼容接口或 vLLM 原生 API 运行。因此，运行该工具本身不需要高性能 GPU，但被测试的推理后端（如 vLLM 服务器）需要相应的 GPU 资源。支持文本、图像、音频和视频等多模态输入。可通过 pip 安装或使用 Docker\u002FPodman 容器运行。","3.10 - 3.13",[113,114],"vllm (可选，用于启动本地测试服务器)","HuggingFace Datasets (用于数据加载)",[35,14,116],"其他","2026-03-27T02:49:30.150509","2026-04-17T08:25:13.881839",[120,125,130,135,140,145],{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},36888,"如何在运行 GuideLLM 基准测试时传递额外的采样参数（如 topk 或 ignore_eos）？","目前可以通过请求格式化器的 kwargs 路径来控制请求负载。虽然 `--backend-args` 仅传递给后端的构造函数而非每个请求，但使用 `ignore_eos` 参数可以确保始终生成所需数量的 token。对于更具体的场景，可能需要扩展 CLI 以支持直接传递这些参数到每个请求中。","https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fissues\u002F69",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},36889,"高并发基准测试（rate >= 24）在关闭时出现未处理的 ConnectionRefusedError 错误怎么办？","该问题通常与 Python 版本有关。建议将 Python 从 3.10 升级到 3.13，因为旧版本存在多处理相关问题。升级后，默认参数即可支持高达 --rate 100 的测试。此外，该问题在 0.4.0 
版本及新的容器镜像修复中应已解决，如果仍遇到此问题，请检查是否使用了最新版本。","https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fissues\u002F329",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},36890,"使用 GuideLLM 通过 LiteLLM 代理测试 vLLM 时遇到 500 Internal Server Error 如何解决？","这是因为 GuideLLM 同时发送了 `max_tokens` 和 `max_completion_tokens` 参数，而某些服务器无法正确处理。解决方法是在 LiteLLM 配置中添加 `litellm.drop_params=True`，这样 LiteLLM 会自动丢弃不支持的参数。参考命令：在启动 LiteLLM 时确保配置文件中包含该设置，或查看 LiteLLM 文档关于 completion input 的部分。","https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fissues\u002F308",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},36891,"GuideLLM 默认使用了错误的 Tokenizer（例如对 Mistral 模型使用了 Llama Tokenizer）怎么办？","在设置默认的 GuideLLM 运行时，如果没有明确指定 Tokenizer，系统可能会使用默认的 Llama Tokenizer。为避免此问题，请在运行命令中显式指定与模型匹配的 Processor 或 Tokenizer，例如使用 `--processor \"mistralai\u002FMistral-7B-Instruct-v0.3\"` 来确保使用正确的分词器。","https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fissues\u002F37",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},36892,"如何为 GuideLLM 基准测试提供不同格式的数据源（DS），例如本地 JSON 文件或 Hugging Face 数据集？","可以使用 `--data` 参数指定数据源。对于 Hugging Face 数据集，使用 `--data ${HF_DS_ID}` 并配合 `--data-args '{\"prompt_column\": \"prompt\"}'` 指定列名。对于本地 JSON 文件，确保文件格式正确且路径无误；如果遇到 `PosixPath` 迭代错误，请检查 GuideLLM 版本是否支持该功能，或参考 PR #137 获取最新的数据格式示例和修复。","https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fissues\u002F133",{"id":146,"question_zh":147,"answer_zh":148,"source_url":134},36893,"为什么在使用 OpenAI HTTP 后端时会收到关于 max_completion_tokens 和 max_tokens 的错误？","GuideLLM 为了兼容性会同时发送 `max_tokens` 和 `max_completion_tokens` 参数。大多数模型服务器足够智能，可以忽略其中一个。但如果后端（如 LiteLLM 代理）报错，可以通过配置后端忽略多余参数来解决，例如在 LiteLLM 中设置 `drop_params=True`。",[150,155,160,165,170,175,180,185,190,195,200,205],{"id":151,"version":152,"summary_zh":153,"released_at":154},297275,"v0.6.0","## 概述\n\nGuideLLM v0.6.0 是一个功能版本，新增了多轮对话支持、Responses API、地理空间模型支持以及进程内 vLLM Python 后端，并修复了一些 bug。\n\n要开始使用，请通过以下命令安装：\n\n```bash\npip install 
guidellm[recommended]==0.6.0\n```\n\n或者从源代码安装：\n\n```bash\npip install 'guidellm[recommended] @ git+https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm.git'@v0.6.0\n```\n\n## 兼容性说明\n\n- **Python**: 3.10–3.13\n- **操作系统**: Linux、MacOS\n\n## 新增内容：\n\n* 添加了基础的 Responses API 支持：工具调用支持将在后续版本中加入。\n* 为数据集和合成数据都增加了多轮对话支持。\n* 增加了 vLLM Python（进程内）后端。\n* 增加了 TerraTorch 地理空间模型的支持。\n\n## 修复内容：\n\n* 允许在 HTTP 后端中禁用 vLLM 特有的请求体选项。\n* 修复了 `--sample-requests` 参数，使其能够限制输出中的采样数量。\n* 修复了 HTML 报告中的 HTML 引用问题。\n* 修复了 OpenShift 容器镜像中 HOME 目录的权限问题。\n\n## 更改日志\n\n### 功能特性\n\n* 由 @ushaket 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F607 中实现的即时 TTFT 过饱和功能。\n* 由 @jaredoconnell 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F596 中实现的 vLLM Python 后端。\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F590 中实现的多轮基准测试功能。\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F649 中添加的轮次和对话跟踪器。\n* 由 @jaredoconnell 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F655 中实现的 Basic Responses API 支持。\n* 由 @mgazz 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F610 中实现的对通过 vLLM \u002Fpooling 端点提供服务的 TerraTorch 地理空间模型的支持。\n\n### 内部重构与清理\n\n* 由 @jaredoconnell 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F640 中修复并改进的模拟服务器。\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F644 中从子模块中导入工具函数。\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F478 中将请求格式化工作移至后端。\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F617 中从控制台的吞吐量指标中移除中位数。\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F638 中修复主页面上的 HTML 并将其默认禁用。\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F635 中移除了 
vLLM 的 extras 组。\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F654 中改进了环境变量警告并进行了验证清理。\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F651 中将 mp 上下文传递给策略。\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F647 中清理了 __init__ 文件。\n\n### 修复内容\n\n* 修复 (cli): 由 @aiwantaozi 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F561 中实现的对 --output-path 和 --output-dir 的验证。\n* 由 @natoscott 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F591 中修复的 guidellm 基准测试 --sample-requests 命令行选项。\n* 由 @sjmonson 在 ht 中移除了各种已弃用的设置，并取消了默认的 OpenAI 请求超时时间。","2026-04-01T21:44:09",{"id":156,"version":157,"summary_zh":158,"released_at":159},297276,"v0.5.4","## 概述\n\nGuideLLM v0.5.4 是一个热修复补丁，建议所有 GuideLLM 用户安装。\n\n开始使用前，请通过以下命令进行安装：\n\n```bash\npip install guidellm[recommended]==0.5.4\n```\n\n或者从源代码安装：\n\n```bash\npip install 'guidellm[recommended] @ git+https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm.git'@v0.5.4\n```\n\n## 修复内容\n\n* `--sample-requests` 现在应能正常工作\n* 更新锁定文件中的 `transformers` 包版本，以支持 Mistral 模型的分词器\n* 将 HTML 报告的模板来源从 `blog.vllm.ai` 更改为 `raw.githubusercontent.com`\n    * **严重问题**：此问题会导致启用 HTML 报告时程序崩溃\n\n## 兼容性说明\n\n- **Python**: 3.10–3.13\n- **操作系统**: Linux、MacOS\n\n## 更改日志\n\n### 错误修复\n\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F591 中修复了 guidellm 基准测试的 `--sample-requests` 命令行选项\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F628 中更新了锁定文件中的 `transformers` 版本\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F629 中将 HTML 模板的来源位置迁移到 GitHub 的 raw 域名\n\n**完整更改日志**: 
https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fcompare\u002Fv0.5.3...v0.5.4","2026-03-12T18:44:48",{"id":161,"version":162,"summary_zh":163,"released_at":164},297277,"v0.5.3","## 概述\n\nGuideLLM v0.5.3 是一个非常小的补丁版本，主要功能是支持 Mistral 3 模型的分词器。\n\n要开始使用，请通过以下命令安装：\n\n```bash\npip install guidellm[recommended]==0.5.3\n```\n\n或者从源代码安装：\n\n```bash\npip install 'guidellm[recommended] @ git+https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm.git'@v0.5.3\n```\n\n## 变更内容\n\n* 定速率类型的基准测试现在支持 `--rampup` 功能，该功能会线性地逐步提高输入速率。\n* 添加了 `mistral-common` 作为可选依赖项，以支持加载基于 Mistral 3 的分词器。\n    * 注意：加载 Mistral 分词器还需要 `transformers>=5.0.0`。\n\n## 兼容性说明\n\n- **Python**: 3.10–3.13\n- **操作系统**: Linux、MacOS\n\n## 更改日志\n\n### 错误修复\n\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F540 中修正了对 transformers 内部模块的导入问题。\n\n### 功能改进\n\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F541 中添加了 Mistral 分词器作为可选依赖项。\n* 由 @jaredoconnell 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F549 中为定速率类型添加了预热功能。\n\n### 文档更新\n\n* 由 @rgerganov 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F536 中添加了如何与 llama.cpp 配合使用的文档说明。\n\n**完整更改日志**: https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fcompare\u002Fv0.5.2...v0.5.3","2026-01-23T18:44:10",{"id":166,"version":167,"summary_zh":168,"released_at":169},297278,"v0.5.2","## 概述\n\nGuideLLM v0.5.2 继续修复错误，并重新引入在 v0.4.0 中移除的功能。\n\n要开始使用，请通过以下命令安装：\n\n```bash\npip install guidellm[recommended]==0.5.2\n```\n\n或者从源代码安装：\n\n```bash\npip install 'guidellm[recommended] @ git+https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm.git'@v0.5.2\n```\n\n## 变更内容\n\n* 重新支持传递 API 密钥。可以通过 `--backend-kwargs '{\"api_key\": \"KEY\"}'` 参数设置 API 密钥。\n* 控制台输出现在使用“总请求数”类别，而不是仅显示“成功请求数”。更多信息请参阅 #529。\n\n## 修复内容\n\n* 修复了在基准测试启动时可能发生的死锁问题，该问题可能导致首次请求发送显著延迟。\n* 修复了图像和视频 URL 的格式化问题。旧版本虽然适用于 vLLM，但不符合 OpenAI API 
的规范。\n\n## 兼容性说明\n\n- **Python**: 3.10–3.13\n- **操作系统**: Linux、MacOS\n\n## 更改日志\n\n### 错误修复\n\n* 由 Vinno97 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F525 中实现，使 `image_url` 和 `video_url` 以字典形式发送。\n* 由 sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F528 中修复策略初始化时的死锁问题。\n\n### 功能改进\n\n* 由 sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F530 中将总请求数用于吞吐量计算。\n* 由 jaredoconnell 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F534 中添加了记录后端错误的选项。\n* 由 jaredoconnell 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F535 中增加了对 OpenAI API 密钥的支持。\n\n### 文档更新\n\n* 由 sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F531 中修复了一些过时的文档示例。\n* 文档：由 maryamtahhan 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F532 中更新了 vLLM 模拟器的链接。\n\n## 新贡献者\n* @Vinno97 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F525 中完成了他们的首次贡献。\n\n**完整更改日志**: https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fcompare\u002Fv0.5.1...v0.5.2","2026-01-16T20:29:57",{"id":171,"version":172,"summary_zh":173,"released_at":174},297279,"v0.5.1","## 概述\n\nGuideLLM v0.5.1 修复了 v0.4.0 中引入的多个问题。由于未完成请求计数存在一些 bug，我们建议所有依赖 `--max-requests` 以外约束条件的用户立即升级。\n\n开始使用前，请通过以下命令安装：\n\n```bash\npip install guidellm[recommended]==0.5.1\n```\n\n或者从源代码安装：\n\n```bash\npip install 'guidellm[recommended] @ git+https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm.git'@v0.5.1\n```\n\n## 修复内容\n\n* 未完成的请求现在会记录在取消之前接收到的 token 数量。这恢复了 v0.4.0 之前的正常行为。\n* 音频基准测试现适用于 vLLM `>v0.7.3` 版本。\n\n## 兼容性说明\n\n- **Python**: 3.10–3.13\n- **操作系统**: Linux、MacOS\n\n## 更改日志\n\n### Bug 修复\n\n* 修复当多个百分位数相同时的 HTML 渲染问题，由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F515 中完成。\n* 修复转录\u002F翻译端点的使用问题，由 @sjmonson 在 
https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F524 中完成。\n* 记录未完成请求的 output_tokens，由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F519 中完成。\n\n### 文档\n\n* 修正输出文档中的参数语法，由 @maryamtahhan 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F508 中完成。\n\n### CI、工作流与打包\n\n* 内存单元测试用例，由 @tukwila 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F506 中完成。\n* 更新 db_file 单元测试用例，由 @tukwila 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F507 中完成。\n* 由于端到端测试不稳定，在 CI 中重新运行端到端测试，由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F503 中完成。\n* 更新 settings.py 文件，由 @DaltheCow 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F512 中完成。\n* 容器构建缓存优化，由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F509 中完成。\n* 为 src\u002Fguidellm\u002Fdata\u002Fdeserializers\u002Fhuggingface.py 编写单元测试，由 @tukwila 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F500 中完成。\n* 模拟音频转录\u002F翻译函数，由 @tukwila 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F521 中完成。\n\n## 新贡献者\n* @maryamtahhan 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F508 中完成了首次贡献。\n\n**完整更改日志**: https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fcompare\u002Fv0.5.0...v0.5.1","2026-01-14T18:20:54",{"id":176,"version":177,"summary_zh":178,"released_at":179},297280,"v0.5.0","## 概述\n\nGuideLLM v0.5.0 是一个小版本更新，新增了在服务器过载时的请求限流功能。该版本重新引入了数据集预处理功能，并修复了 v0.4.0 中出现的各种问题。\n\n开始使用前，请通过以下命令安装：\n\n```bash\npip install guidellm[recommended]==0.5.0\n```\n\n或者从源代码安装：\n\n```bash\npip install 'guidellm[recommended] @ git+https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm.git'@v0.5.0\n```\n\n## 重大变更\n\n- **吞吐量模式**：此前的吞吐量模式假设使用固定速率。\n  - *迁移说明*：现在，如果不在 sweep 模式下使用吞吐量模式，则需要手动指定 `--rate` 参数。\n\n## 
新增内容\n\n- **全面的预处理指南**：对文档进行了重大更新，涵盖了预处理配置、短提示处理策略、高级列映射以及可复现性控制等内容。\n- **过载检测**：当 LLM 服务器负载过高时，会自动停止基准测试，并提供了可调优的过载检测机制及详细文档。\n\n## 变更内容\n\n- **数据集预处理命令**：重新启用 `guidellm preprocess dataset` 命令，支持自定义提示\u002F输出标记大小、列映射、批量预处理、提示策略以及 Hugging Face 上传等功能。\n- **基准测试 CLI**：新增了 `--detect-saturation` 和 `--over-saturation` 参数，用于更稳健地指定过载约束。\n\n## 修复内容\n\n- **各类文档修复**：优化了文档内容，并移除了过时的引用。\n- **数据集处理器**：修复了一个边缘情况，即有限长度的数据集在耗尽后会卡死的问题。\n- **连接数限制**：取消了每个工作进程的最大连接数限制，此前该限制被固定为 100。\n\n## 兼容性说明\n\n- **Python**：3.10–3.13\n- **操作系统**：Linux、macOS\n\n## 更改日志\n\n### 错误修复\n\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F468 中修复 DataLoader 中未捕获的 StopIteration 异常。\n* 由 @tukwila 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F480 中修复使用字典输入时 encode_audio 函数失败的问题。\n* 由 @tukwila 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F489 中为音频和视觉编码函数添加单元测试。\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F488 中允许每个工作进程拥有无限数量的连接。\n\n### 新特性\n\n* 由 @AlonKellner-RedHat 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F438 中添加过载约束。\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F497 中向基准测试报告中添加更多元数据。\n* 由 @toslali-ibm 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F455 中将 vllm ID 添加到响应中。\n* 由 @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F467 中为吞吐量模式指示最大并发数，并禁止在没有 --rate 参数的情况下单独运行。\n* 由 @jaredoconnell 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpull\u002F472 中重新启用并改进数据集预处理功能。\n\n### CI、工作流与打包\n\n* 由 @yankay 在 https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm\u002Fpul 中通过将构建类型 ARG 移至 FROM 之后来修复容器版本不匹配的问题。","2025-12-16T14:50:44",{"id":181,"version":182,"summary_zh":183,"released_at":184},297281,"v0.4.0","## **概述**\n\n本次发布标志着一个重要里程碑，对 GuideLLM 
代码库进行了**全面架构重构**，以提升可扩展性、性能和可维护性。主要亮点包括**多模态基准测试支持**（视觉与音频）、用于测试的新**模拟服务器**，以及对输出生成和统计收集的全面更新。此外，最低支持的 Python 版本已提升至 3.10，以便利用现代语言特性。\n\n开始使用时，请通过以下命令安装：\n\n```bash\npip install guidellm==0.4.0\n```\n\n或从源代码安装：\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm.git@v0.4.0\n```\n\n## **新增功能**\n\n- **多模态支持**：新增对视觉和音频工作负载的全面支持，包括音频转录和翻译的基准测试。\n- **全面重构**：对核心包（`backends`、`scheduler`、`benchmark`、`data`）进行彻底重组，以支持高频率负载生成并简化扩展。\n- **模拟服务器**：引入内置模拟服务器包，便于在无需实时 LLM 后端的情况下进行测试和开发。\n- **端到端测试**：新增端到端（E2E）测试流程，并配备专用的 vLLM 模拟器。\n\n## **变更内容**\n\n- **Python 要求**：最低支持的 Python 版本提升至**3.10**（此前为 3.9）。\n- **CLI 参数**：\n    - 将 `--rate-type` 重命名为 `--profile`，以提高清晰度。\n    - `--output-path` 已拆分为 `--output-dir` 和 `--outputs`。例如：`--output-dir \u002Fresults --outputs benchmark.json,benchmark.csv`。`--output-path` 在本版本中仍可使用，但未来将被弃用。\n- **容器**：更新 Docker 容器，加入 `ffmpeg` 及其他工具，以支持多模态功能。\n- **数据管道**：重新设计数据管道，以支持复杂的多模态数据集，并改进 Hugging Face 加载过程中的错误传播机制。\n\n## **修复内容**\n\n- **合成数据**：修复了在同一会话中多次基准测试时，合成文本数据集随机性丢失的问题。\n- **CSV 生成**：解决了基准测试过程中 CSV 输出生成失败的问题。\n- **Asyncio 稳定性**：修复了测试和调度器中与 `asyncio` 及时区相关的问题。\n- **类型安全**：对整个代码库进行了大量类型修复和优化，特别是在 `scheduler` 和 `utils` 包中。\n\n## **兼容性说明**\n\n- **Python**：3.10 – 3.13\n- **操作系统**：Linux 和 macOS\n- **依赖项**：\n    - 新增 `torchcodec`\n    - 移除 `librosa`、`pydub`、`soundfile`\n    - 开发工作流现使用 `pdm` 和 `tox-pdm`\n\n## **新贡献者**\n\n- @shijinye 在 PR #327 中完成了首次贡献。\n- @git-jxj 在 PR #435 中完成了首次贡献。\n- @AlonKellner-RedHat 在 PR #440 中完成了首次贡献。\n\n## **变更日志**\n\n### 重构与核心架构\n\n- PR #351：GuideLLM 全面重构\n- PR #354：调度器包更新与重写，以及","2025-11-21T22:29:53",{"id":186,"version":187,"summary_zh":188,"released_at":189},297282,"v0.3.1","# GuideLLM v0.3.1\n\n## 概述\n本次为小版本发布，重点在于容器构建与标签管理的稳定性、用户界面的优化及术语的一致性、OpenAI 后端的鲁棒性和可配置性的提升、更清晰的 JSON 输出，以及新增文档（包括 llama.cpp 的使用说明和 vLLM 模拟器的演练）。工作流现在会生成带有版本号的产物，并自动维护 latest 和 stable 标签。\n\n开始使用时，可通过以下命令安装：\n```bash\npip install guidellm[recommended]==0.3.1\n```\n或从源码安装：\n```bash\npip install 'guidellm[recommended] @ 
git+https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm.git'@v0.3.1\n```\n\n\n## 新增内容\n- **推荐扩展组**：通过 `guidellm[recommended]` 安装 OpenAI 分词器依赖（tiktoken、blobfile）\n- **llama.cpp 使用指南**：新增文档，涵盖 llama-server、模型别名及元数据处理\n- **vLLM 模拟器示例**：逐步演示“首次基准测试”，并附带示例输出图片\n- **容器维护工作流**：定时清理旧的 PR 镜像；自动重新打 latest 和 stable 标签\n\n## 变更内容\n- **UI 优化**：更清晰的标签（例如“每次请求耗时”、“测量 RPS（均值）”）和滑块文本\n- **版本化报告**：PROD\u002FSTAGING 报告 URL 固定到特定版本的 UI 构建\n- **容器构建系统**：新增顶层 Containerfile，采用 Fedora Python minimal + PDM；构建类型通过 GUIDELLM_BUILD_TYPE 指定\n- **指标 JSON 输出**：使用 UTF-8 编码，并以美观缩进的格式输出 JSON\n- **端点最大 token 键**：输出 token 上限现由每个端点单独控制，通过 `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` 配置\n\n## 修复内容\n- **流式传输鲁棒性**：安全处理聊天流中缺失的 delta.content\n- **端点 token 键**：支持按端点配置最大输出键（max_tokens vs max_completion_tokens）\n- **CI 稳定性**：修复了 RC 标签、GH Pages 发布路径以及工作流中的拼写错误；禁用镜像清理的试运行模式\n\n## 兼容性说明\n- **Python**：3.9–3.13\n- **操作系统**：Linux 和 MacOS\n- **依赖项**：可通过 `guidellm[recommended]` 安装可选扩展；目前包含 OpenAI 分词器的相关包，未来可能会进一步扩展\n- **破坏性变更**：此前所有端点都同时使用 `max_tokens` 和 `max_completion_tokens` 来限制输出，这在某些服务器上引发了问题。\n  - 现在该键由每个端点单独控制，默认情况下，“completions” 继续使用 `max_tokens`，而 “chat\u002Fcompletions” 则使用 `max_completion_tokens`\n\n## 新贡献者\n- **@rgerganov**：在 PR #318 中完成了首次贡献\n- **@git-jxj**：在 PR #316 中完成了首次贡献\n- **@psydok**：在 PR #372 中完成了首次贡献\n\n## 更改日志\n- 用户界面与展示\n  - #386：将 TPOT 统一更新为 ITL，覆盖所有标签和代码\n  - #298：更新 RPS 滑块标签\n  - #301：修正 GH Pages 的 UI 发布路径（src\u002Fui\u002Fout）\n  - #317：修正类型提示，以消除 Pydantic 序列化警告\n- 后端\n  - #399：使 `max_tokens`\u002F`max_completion_tokens` 键可按端点配置\n  - #316：处理流式传输中缺失的内容\n- 容器与","2025-10-10T14:35:15",{"id":191,"version":192,"summary_zh":193,"released_at":194},297283,"v0.3.0","# **GuideLLM v0.3.0**\n\n## **概述**\n\n这是一次重大（非语义版本号意义上的）发布，引入了 GuideLLM 的 Web UI、容器化基准测试、数据集预处理以及显著的工作流改进。此次发布将项目从 Neural Magic 组织迁移到 vLLM 项目生态中，同时扩展了基准测试能力并提升了开发者体验。\n\n要开始使用，请通过以下命令安装：\n\n```bash\npip install guidellm==0.3.0\n```\n\n或者从源代码安装：\n\n```bash\npip install 
git+https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fguidellm.git@v0.3.0\n```\n\n## **新增功能**\n\n- **GuideLLM Web UI**：完整的前端界面，提供交互式图表和基准测试结果的可视化展示。\n- **数据集预处理**：新增 `preprocess` 命令，可根据 token 分布对数据集进行筛选，并将其保存到本地文件或 Hugging Face Hub。\n- **容器化基准测试**：支持 Docker，可通过可配置的环境变量实现简化部署。\n- **基准测试场景**：支持基于文件的基准测试配置，并采用 Pydantic 进行验证。\n- **HTML 报告生成**：生成包含嵌入式可视化数据的静态 HTML 报告。\n\n## **变更内容**\n\n- **项目迁移**：从 neuralmagic 迁移至 vllm-project GitHub 组织，更新了相关链接和品牌标识。\n- **调度优化**：统一了 RPS 和并发调度路径，以更好地支持多轮对话。\n- **OpenAI 后端增强**：新增对自定义 headers、SSL 验证控制、查询参数以及请求体修改的支持。\n- **开发工作流**：简化了 CI\u002FCD 流程，统一了测试执行方式，改进了 pre-commit 检查，并优化了构建产物管理。\n- **合成数据生成器**：新增前缀缓存控制和唯一 prompt 生成功能。\n\n## **修复内容**\n\n- **指标计算**：修复了 token 计算中的重复计数问题及并发变化事件的问题。\n- **事件循环错误**：解决了 HTTP 客户端连接池中的“事件循环已关闭”错误。\n- **Token 计数**：修复了合成数据生成器中的最大 token 数限制以及首次解码时的 token 计数问题。\n- **显示问题**：修正了指标单位的显示问题，并增强了 Web UI 在 Firefox 浏览器上的兼容性。\n\n## **兼容性说明**\n\n- Python：3.9–3.13\n- 操作系统：Linux 和 macOS\n- 依赖项：升级至最新版 Pydantic，锁定 Click 版本以支持 Python 3.9。\n- **破坏性变更**：移除了若干 UI 工作流组件及 husky pre-commit 钩子。\n- **破坏性变更**：更新了项目 URL，由 neuralmagic 组织切换至 vllm-project 组织。\n\n## **新贡献者**\n\n- @chewong 在 #168 中做出了首次贡献。\n- @dagrayvid 在 #173 中做出了首次贡献。\n- @TomerG711 在 #162 中做出了首次贡献。\n- @wangchen615 在 #123 中做出了首次贡献。\n- @kyolebu 在 #207 中做出了首次贡献。\n- @rymc 在 #223 中做出了首次贡献。\n- @jaredoconnell 在 #185 中做出了首次贡献。\n- @natoscott 在 #231 中做出了首次贡献。\n- @kdelee 做出了他们的","2025-09-16T15:20:34",{"id":196,"version":197,"summary_zh":198,"released_at":199},297284,"v0.2.1","## 摘要\n* 修复了导致基准测试崩溃的错误：这些错误是由于不当调用 `datasets` 库的 `load_dataset` 函数而引起的，现已支持 HF 数据集和本地数据文件。\n* 根据最新的发布标准重构了 CI\u002FCD 系统。\n\n## 变更内容\n* 将主分支版本更新至 0.3.0，以便 @markurtz 在 https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F127 中开始下一次发布的工作。\n* @markurtz 在 https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F128 中修复了 README.md 中 Python 版本的显示问题。\n* @hhy3 在 https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F129 中修复了日志记录问题。\n* @markurtz 在 
https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F131 中修复了真实数据和 `chat_completions` 路径中的数据及请求相关问题。\n* @sjmonson 在 https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F132 中修复了夜间单元测试中的参数错误。\n* @markurtz 在 https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F137 中添加了关于数据\u002F数据集以及如何在 GuideLLM 中配置它们的文档。\n* @markurtz 在 https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F135 中根据最新的上游标准化要求重构了 CI\u002FCD 系统。\n\n## 新贡献者\n* @hhy3 在 https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F129 中完成了首次贡献。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fcompare\u002Fv0.2.0...v0.2.1","2025-04-29T18:02:54",{"id":201,"version":202,"summary_zh":203,"released_at":204},297285,"v0.2.0","## Summary\r\n\r\n- **Minimal Execution Overheads**\r\n    - Refactor enabling async multi-process\u002Fthreaded design with just 0.16% overhead in synchronous and 99.9% accuracy for constant requests\r\n- **Robust Accuracy + Monitoring**\r\n    - Built-in timings and diagnostics added to validate performance and catch regressions\r\n- **Flexible Benchmarking Profiles**\r\n    - Prebuilt support for synchronous, concurrent (added), throughput, constant rate, poisson rate, and sweep modes\r\n- **Unified Input\u002FOutput Formats**\r\n    - JSON, YAML, CSV, and console output now standardized\r\n- **Multi-Use Data Loaders**\r\n    - Native support for HuggingFace datasets, file-based data, and synthetic samples with fixes for previous flows and expanded support\r\n- **Pluggable Backends via OpenAI-Compatible APIs**\r\n    - Redesigned to work out of the box with OpenAI style HTTP servers, easily expandable to other interfaces and servers. 
Fixed issues related to improper token lengths and more\r\n\r\n## What's Changed\r\n* Add summary metrics to saved json file by @anmarques in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F46\r\n* ADD TGI docs by @philschmid in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F43\r\n* Add missing vllm docs link by @eldarkurtic in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F50\r\n* Change default \"role\" from \"system\" to \"user\" by @philschmid in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F53\r\n* FIX TGI example by @philschmid in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F51\r\n* Revert Summary Metrics and Expand Test Coverage to Stabilize Nightly\u002FMain CI by @markurtz in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F58\r\n* [Dataset]: Iterate through benchmark dataset once by @parfeniukink in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F48\r\n* Replace busy wait in async loop with a Semaphore by @sjmonson in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F80\r\n* Add backend_kwargs to generate_benchmark_report by @jackcook in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F78\r\n* Drop request count check from throughput sweep profile by @sjmonson in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F89\r\n* Rework Backend to Native HTTP Requests and Enhance API Compatibility & Performance by @markurtz in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F91\r\n* Multi Process Scheduler Implementation, Benchmarker, and Report Generation Refactor by @markurtz in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F96\r\n* Update the README by @sjmonson in 
https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F112\r\n* Fix units for Req Latency in output to seconds by @smalleni in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F113\r\n* Fix\u002Fnon integer rates by @thameem-abbas in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F116\r\n* Output support expansion, code hygiene, and tests by @markurtz in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F117\r\n* Bump min python to 3.9 by @sjmonson in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F121\r\n* v0.2.0 Version Update and Docs Expansions by @markurtz in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F118\r\n* Fix issue if async task count does not evenly divide accross process pool by @sjmonson in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F120\r\n* Readme grammar updates and cleanup by @markurtz in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F124\r\n* Update CICD flows to enable automated releases and match the feature set laid out in #56 by @markurtz in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F125\r\n* CI\u002FCD Build Fixes for Release by @markurtz in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F126\r\n\r\n## New Contributors\r\n* @anmarques made their first contribution in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F46\r\n* @philschmid made their first contribution in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F43\r\n* @eldarkurtic made their first contribution in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F50\r\n* @sjmonson made their first contribution in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F80\r\n* @jackcook made their first contribution in 
https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F78\r\n* @smalleni made their first contribution in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F113\r\n* @thameem-abbas made their first contribution in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F116\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fcompare\u002Fv0.1.0...v0.2.0","2025-04-18T21:07:51",{"id":206,"version":207,"summary_zh":208,"released_at":209},297286,"v0.1.0","## What's Changed\r\n\r\nInitial release of GuideLLM with version 0.1.0! This core release adds the basic structure, infrastructure, and code for benchmarking LLM deployments across several different use cases utilizing a CLI interface and terminal output. Further improvements are coming soon!\r\n\r\n* Support added for general OpenAI backends and any text-input-based model served through those\r\n* Support added for emulated, transformers, and file-based datasets\r\n* Support added for general file storage of the full benchmark\u002Fevaluation that was run\r\n* Full support for different benchmark types including sweeps, synchronous, throughput, constant, and Poisson enabled through new scheduler and executor interfaces built on top of Python's asyncio\r\n\r\n## New Contributors\r\n* @DaltheCow made their first contribution in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F4\r\n* @markurtz made their first contribution in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F3\r\n* @rgreenberg1 made their first contribution in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F21\r\n* @jennyyangyi-magic made their first contribution in https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fpull\u002F35\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fneuralmagic\u002Fguidellm\u002Fcommits\u002Fv0.1.0","2024-09-04T05:39:37"]