[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-tensorzero--tensorzero":3,"tool-tensorzero--tensorzero":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":67,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":67,"owner_website":81,"owner_url":82,"languages":83,"stars":121,"forks":122,"last_commit_at":123,"license":124,"difficulty_score":23,"env_os":125,"env_gpu":126,"env_ram":127,"env_deps":128,"category_tags":136,"github_topics":137,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":158,"updated_at":159,"faqs":160,"releases":195},3825,"tensorzero\u002Ftensorzero","tensorzero","TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.","TensorZero 是一款开源的 LLMOps（大语言模型运维）平台，旨在为开发者提供一站式的大模型应用管理与优化方案。它巧妙地将网关、可观测性、评估、优化和实验五大核心功能整合在一起，帮助用户轻松应对大模型落地过程中的复杂挑战。\n\n在实际开发中，团队常面临接入多家模型供应商接口繁琐、难以追踪模型表现、缺乏系统化评估手段以及优化流程割裂等痛点。TensorZero 通过统一的 API 网关屏蔽了不同厂商的差异，支持以低于 1 毫秒的极低延迟处理高并发请求；同时，它能将推理数据与用户反馈自动存入数据库，让效果监控和后续优化有据可依。此外，平台内置的 A\u002FB 测试、智能路由及自动重试机制，让模型迭代更加稳健高效。\n\n这款工具特别适合需要构建生产级大模型应用的开发者、算法工程师及技术团队。无论是初创公司还是大型企业，都能利用它快速搭建从原型到生产的完整链路。其独特的技术亮点在于基于 Rust 语言打造的高性能架构，确保了卓越的吞吐量与稳定性，并且完美兼容 OpenAI SDK 和 OpenTelemetry 等主流生态。更值得一提的是，其新推出的\"Autopilot\"功能宛如一位自动化 A","TensorZero 是一款开源的 
LLMOps（大语言模型运维）平台，旨在为开发者提供一站式的大模型应用管理与优化方案。它巧妙地将网关、可观测性、评估、优化和实验五大核心功能整合在一起，帮助用户轻松应对大模型落地过程中的复杂挑战。\n\n在实际开发中，团队常面临接入多家模型供应商接口繁琐、难以追踪模型表现、缺乏系统化评估手段以及优化流程割裂等痛点。TensorZero 通过统一的 API 网关屏蔽了不同厂商的差异，支持以低于 1 毫秒的极低延迟处理高并发请求；同时，它能将推理数据与用户反馈自动存入数据库，让效果监控和后续优化有据可依。此外，平台内置的 A\u002FB 测试、智能路由及自动重试机制，让模型迭代更加稳健高效。\n\n这款工具特别适合需要构建生产级大模型应用的开发者、算法工程师及技术团队。无论是初创公司还是大型企业，都能利用它快速搭建从原型到生产的完整链路。其独特的技术亮点在于基于 Rust 语言打造的高性能架构，确保了卓越的吞吐量与稳定性，并且完美兼容 OpenAI SDK 和 OpenTelemetry 等主流生态。更值得一提的是，其新推出的\"Autopilot\"功能宛如一位自动化 AI 工程师，能主动分析数据并自动执行提示词优化与模型调优，显著降低人工运维成本。","\u003Cp>\u003Cpicture>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftensorzero_tensorzero_readme_8bf54da83908.png\" alt=\"TensorZero Logo\" width=\"128\" height=\"128\">\u003C\u002Fpicture>\u003C\u002Fp>\n\n# TensorZero\n\n\u003Cp>\u003Cpicture>\u003Cimg src=\"https:\u002F\u002Fwww.tensorzero.com\u002Fgithub-trending-badge.svg\" alt=\"GitHub Trending - #1 Repository Of The Day\">\u003C\u002Fpicture>\u003C\u002Fp>\n\n**TensorZero is an open-source LLMOps platform that unifies:**\n\n- **Gateway:** access every LLM provider through a unified API, built for performance (\u003C1ms p99 latency)\n- **Observability:** store inferences and feedback in your database, available programmatically or in the UI\n- **Evaluation:** benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, etc.\n- **Optimization:** collect metrics and human feedback to optimize prompts, models, and inference strategies\n- **Experimentation:** ship with confidence with built-in A\u002FB testing, routing, fallbacks, retries, etc.\n\nYou can take what you need, adopt incrementally, and complement with other tools.\nIt plays nicely with the **OpenAI SDK**, **OpenTelemetry**, and **every major LLM provider**.\n\nTensorZero is used by companies ranging from frontier AI startups to the Fortune 10 and fuels ~1% of global LLM API spend today.\n\n\u003Cbr>\n\n\u003Cp align=\"center\">\n  \u003Cb>\u003Ca 
href=\"https:\u002F\u002Fwww.tensorzero.com\u002F\" target=\"_blank\">Website\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\" target=\"_blank\">Docs\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.x.com\u002Ftensorzero\" target=\"_blank\">Twitter\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fslack\" target=\"_blank\">Slack\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fdiscord\" target=\"_blank\">Discord\u003C\u002Fa>\u003C\u002Fb>\n  \u003Cbr>\n  \u003Cbr>\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fquickstart\" target=\"_blank\">Quick Start (5min)\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fdeployment\u002Ftensorzero-gateway\" target=\"_blank\">Deployment Guide\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fapi-reference\" target=\"_blank\">API Reference\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fconfiguration-reference\" target=\"_blank\">Configuration Reference\u003C\u002Fa>\u003C\u002Fb>\n\u003C\u002Fp>\n\n## Demo\n\n\u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F04a8466e-27d8-4189-b305-e7cecb6881ee\">\u003C\u002Fvideo>\n\n## Features\n\n> [!NOTE]\n>\n> ### 🆕 TensorZero Autopilot\n>\n> TensorZero Autopilot is an **automated AI engineer** powered by TensorZero that analyzes LLM observability data, sets up evals, optimizes prompts and models, and runs A\u002FB tests.\n>\n> It **dramatically improves the performance of LLM agents** across diverse tasks:\n>\n> \u003Cimg width=\"600\" alt=\"Bar chart showing baseline vs. 
optimized scores across diverse LLM tasks\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftensorzero_tensorzero_readme_d7911f17a8ac.png\" \u002F>\n> \u003Cbr>\n>\n> **[Learn more →](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Fautomated-ai-engineer\u002F)**&emsp;&emsp;**[Schedule a demo →](https:\u002F\u002Fwww.tensorzero.com\u002Fschedule-demo)**\n\n### 🌐 LLM Gateway\n\n> **Integrate with TensorZero once and access every major LLM provider.**\n\n- [x] **[Call any LLM](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fcall-any-llm)** (API or self-hosted) through a single unified API\n- [x] Infer with **[tool use](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Ftool-use)**, **[structured outputs (JSON)](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fgenerate-structured-outputs)**, **[batch](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fbatch-inference)**, **[embeddings](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fgenerate-embeddings)**, **[multimodal (images, files)](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fcall-llms-with-image-and-file-inputs)**, **[caching](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Finference-caching)**, etc.\n- [x] **[Create prompt templates and schemas](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fcreate-a-prompt-template)** to enforce a structured interface between your application and the LLMs\n- [x] Satisfy extreme throughput and latency needs, thanks to 🦀 Rust: **[\u003C1ms p99 latency overhead at 10k+ QPS](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fbenchmarks)**\n- [x] **[Ensure high availability](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fretries-fallbacks)** with routing, retries, fallbacks, load balancing, granular timeouts, etc.\n- [x] **[Track usage and 
cost](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foperations\u002Ftrack-usage-and-cost)** and **[enforce custom rate limits](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foperations\u002Fenforce-custom-rate-limits)** with granular scopes (e.g. tags)\n- [x] **[Set up auth for TensorZero](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foperations\u002Fset-up-auth-for-tensorzero)** to allow clients to access models without sharing provider API keys\n\n#### Supported Model Providers\n\n**[Anthropic](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fanthropic)**,\n**[AWS Bedrock](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Faws-bedrock)**,\n**[AWS SageMaker](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Faws-sagemaker)**,\n**[Azure](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fazure)**,\n**[DeepSeek](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fdeepseek)**,\n**[Fireworks](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Ffireworks)**,\n**[GCP Vertex AI Anthropic](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fgcp-vertex-ai-anthropic)**,\n**[GCP Vertex AI Gemini](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fgcp-vertex-ai-gemini)**,\n**[Google AI Studio (Gemini 
API)](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fgoogle-ai-studio-gemini)**,\n**[Groq](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fgroq)**,\n**[Hyperbolic](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fhyperbolic)**,\n**[Mistral](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fmistral)**,\n**[OpenAI](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fopenai)**,\n**[OpenRouter](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fopenrouter)**,\n**[SGLang](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fsglang)**,\n**[TGI](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Ftgi)**,\n**[Together AI](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Ftogether)**,\n**[vLLM](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fvllm)**, and\n**[xAI (Grok)](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fxai)**.\n\nNeed something else? TensorZero also supports **[any OpenAI-compatible API (e.g. Ollama)](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fopenai-compatible)**.\n\n#### Usage Example\n\nYou can use TensorZero with any OpenAI SDK (Python, Node, Go, etc.) or OpenAI-compatible client.\n\n1. **[Deploy the TensorZero Gateway](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fdeployment\u002Ftensorzero-gateway)** (one Docker container).\n2. Update the `base_url` and `model` in your OpenAI-compatible client.\n3. 
Run inference:\n\n```python\nfrom openai import OpenAI\n\n# Point the client to the TensorZero Gateway\nclient = OpenAI(base_url=\"http:\u002F\u002Flocalhost:3000\u002Fopenai\u002Fv1\", api_key=\"not-used\")\n\nresponse = client.chat.completions.create(\n    # Call any model provider (or TensorZero function)\n    model=\"tensorzero::model_name::anthropic::claude-sonnet-4-6\",\n    messages=[\n        {\n            \"role\": \"user\",\n            \"content\": \"Share a fun fact about TensorZero.\",\n        }\n    ],\n)\n```\n\nSee **[Quick Start](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fquickstart)** for more information.\n\n### 🔍 LLM Observability\n\n> **Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time &mdash; all using the open-source TensorZero UI.**\n\n- [x] Store inferences and **[feedback (metrics, human edits, etc.)](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fmetrics-feedback)** in your own database\n- [x] Dive into individual inferences or high-level aggregate patterns using the TensorZero UI or programmatically\n- [x] **[Build datasets](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fapi-reference\u002Fdatasets-datapoints)** for optimization, evaluation, and other workflows\n- [x] Replay historical inferences with new prompts, models, inference strategies, etc.\n- [x] **[Export OpenTelemetry traces (OTLP)](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foperations\u002Fexport-opentelemetry-traces)** and **[export Prometheus metrics](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foperations\u002Fexport-prometheus-metrics)** to your favorite application observability tools\n- [ ] Soon: AI-assisted debugging and root cause analysis; AI-assisted data labeling\n\n### 📈 LLM Optimization\n\n> **Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies &mdash; using the UI or 
programmatically.**\n\n- [x] Optimize your models with **[supervised fine-tuning](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foptimization\u002Fsupervised-fine-tuning-sft)**, RLHF, and other techniques\n- [x] Optimize your prompts with automated prompt engineering algorithms like **[GEPA](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foptimization\u002Fgepa)**\n- [x] Optimize your **[inference strategy](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Finference-time-optimizations)** with **[dynamic in-context learning](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foptimization\u002Fdynamic-in-context-learning-dicl)**, best\u002Fmixture-of-N sampling, etc.\n- [x] Enable a feedback loop for your LLMs: a data & learning flywheel turning production data into smarter, faster, and cheaper models\n- [ ] Soon: synthetic data generation\n\n### 📊 LLM Evaluation\n\n> **Compare prompts, models, and inference strategies using evaluations powered by heuristics and LLM judges.**\n\n- [x] **[Evaluate individual inferences](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fevaluations\u002Finference-evaluations\u002Ftutorial)** with _inference evaluations_ powered by heuristics or LLM judges (&approx; unit tests for LLMs)\n- [x] **[Evaluate end-to-end workflows](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fevaluations\u002Fworkflow-evaluations\u002Ftutorial)** with _workflow evaluations_ with complete flexibility (&approx; integration tests for LLMs)\n- [x] Optimize LLM judges just like any other TensorZero function to align them to human preferences\n- [ ] Soon: more built-in evaluators; headless evaluations\n\n\u003Ctable>\n  \u003Ctr>\u003C\u002Ftr> \u003C!-- flip highlight order -->\n  \u003Ctr>\n    \u003Ctd width=\"50%\" align=\"center\" valign=\"middle\">\u003Cb>Evaluation » UI\u003C\u002Fb>\u003C\u002Ftd>\n    \u003Ctd width=\"50%\" align=\"center\" valign=\"middle\">\u003Cb>Evaluation » 
CLI\u003C\u002Fb>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"50%\" align=\"center\" valign=\"middle\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftensorzero_tensorzero_readme_3bf20c3cfe4a.png\">\u003C\u002Ftd>\n    \u003Ctd width=\"50%\" align=\"left\" valign=\"middle\">\n\u003Cpre>\u003Ccode class=\"language-bash\">docker compose run --rm evaluations \\\n  --evaluation-name extract_data \\\n  --dataset-name hard_test_cases \\\n  --variant-name gpt_4o \\\n  --concurrency 5\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cpre>\u003Ccode class=\"language-bash\">Run ID: 01961de9-c8a4-7c60-ab8d-15491a9708e4\nNumber of datapoints: 100\n██████████████████████████████████████ 100\u002F100\nexact_match: 0.83 ± 0.03 (n=100)\nsemantic_match: 0.98 ± 0.01 (n=100)\nitem_count: 7.15 ± 0.39 (n=100)\u003C\u002Fcode>\u003C\u002Fpre>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n### 🧪 LLM Experimentation\n\n> **Ship with confidence with built-in A\u002FB testing, routing, fallbacks, retries, etc.**\n\n- [x] **[Run adaptive A\u002FB tests](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fexperimentation\u002Frun-adaptive-ab-tests)** to ship with confidence and identify the best prompts and models for your use cases.\n- [x] Enforce principled experiments in complex workflows, including support for multi-turn LLM systems, sequential testing, and more.\n\n### & more!\n\n> **Build with an open-source stack well-suited for prototypes but designed from the ground up to support the most complex LLM applications and deployments.**\n\n- [x] Build simple applications or massive deployments with GitOps-friendly orchestration\n- [x] **[Extend TensorZero](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foperations\u002Fextend-tensorzero)** with built-in escape hatches, programmatic-first usage, direct database access, and more\n- [x] Integrate with third-party tools: specialized observability and evaluations, model providers, agent 
orchestration frameworks, etc.\n- [x] Iterate quickly by experimenting with prompts interactively using the Playground UI\n\n## Frequently Asked Questions\n\n**How is TensorZero different from other LLM frameworks?**\n\n1. TensorZero enables you to optimize complex LLM applications based on production metrics and human feedback.\n2. TensorZero supports the needs of industrial-grade LLM applications: low latency, high throughput, type safety, self-hosted, GitOps, customizability, etc.\n3. TensorZero unifies the entire LLMOps stack, creating compounding benefits. For example, LLM evaluations can be used for fine-tuning models alongside AI judges.\n\n**Can I use TensorZero with \\_\\_\\_?**\n\nYes.\nEvery major programming language is supported.\nIt plays nicely with the **OpenAI SDK**, **OpenTelemetry**, and **every major LLM provider**.\n\n**Is TensorZero production-ready?**\n\nYes.\nTensorZero is used by companies ranging from frontier AI startups to the Fortune 10 and powers ~1% of the global LLM API spend today.\n\nHere's a case study: **[Automating Code Changelogs at a Large Bank with LLMs](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Fcase-study-automating-code-changelogs-at-a-large-bank-with-llms)**\n\n**How much does TensorZero cost?**\n\nTensorZero (LLMOps platform) is 100% self-hosted and open-source.\n\nTensorZero Autopilot (automated AI engineer) is a complementary paid product powered by TensorZero.\n\n**Who is building TensorZero?**\n\nOur technical team includes a former Rust compiler maintainer, machine learning researchers (Stanford, CMU, Oxford, Columbia) with thousands of citations, and the chief product officer of a decacorn startup. We're backed by the same investors as leading open-source projects (e.g. ClickHouse, CockroachDB) and AI labs (e.g. OpenAI, Anthropic). 
See our **[$7.3M seed round announcement](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Ftensorzero-raises-7-3m-seed-round-to-build-an-open-source-stack-for-industrial-grade-llm-applications\u002F)** and **[coverage from VentureBeat](https:\u002F\u002Fventurebeat.com\u002Fai\u002Ftensorzero-nabs-7-3m-seed-to-solve-the-messy-world-of-enterprise-llm-development\u002F)**. We're **[hiring in NYC](https:\u002F\u002Fwww.tensorzero.com\u002Fjobs)**.\n\n**How do I get started?**\n\nYou can adopt TensorZero incrementally. Our **[Quick Start](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fquickstart)** goes from a vanilla OpenAI wrapper to a production-ready LLM application with observability and fine-tuning in just 5 minutes.\n\n## Get Started\n\n**Start building today.**\nThe **[Quick Start](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fquickstart)** shows it's easy to set up an LLM application with TensorZero.\n\n**Questions?**\nAsk us on **[Slack](https:\u002F\u002Fwww.tensorzero.com\u002Fslack)** or **[Discord](https:\u002F\u002Fwww.tensorzero.com\u002Fdiscord)**.\n\n**Using TensorZero at work?**\nEmail us at **[hello@tensorzero.com](mailto:hello@tensorzero.com)** to set up a Slack or Teams channel with your team (free).\n\n## Examples\n\nWe are working on a series of **complete runnable examples** illustrating TensorZero's data & learning flywheel.\n\n> **[Optimizing Data Extraction (NER) with TensorZero](https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Ftree\u002Fmain\u002Fexamples\u002Fdata-extraction-ner)**\n>\n> This example shows how to use TensorZero to optimize a data extraction pipeline.\n> We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL).\n> In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task &mdash; at a fraction of the cost and latency &mdash; using a small amount of training data.\n\n> **[Agentic RAG — Multi-Hop Question Answering with 
LLMs](https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Ftree\u002Fmain\u002Fexamples\u002Frag-retrieval-augmented-generation\u002Fsimple-agentic-rag\u002F)**\n>\n> This example shows how to build a multi-hop retrieval agent using TensorZero.\n> The agent iteratively searches Wikipedia to gather information, and decides when it has enough context to answer a complex question.\n\n> **[Writing Haikus to Satisfy a Judge with Hidden Preferences](https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Ftree\u002Fmain\u002Fexamples\u002Fhaiku-hidden-preferences)**\n>\n> This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste.\n> You'll see TensorZero's \"data flywheel in a box\" in action: better variants lead to better data, and better data leads to better variants.\n> You'll see progress by fine-tuning the LLM multiple times.\n\n> **[Image Data Extraction — Multimodal (Vision) Fine-tuning](https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Ftree\u002Fmain\u002Fexamples\u002Fmultimodal-vision-finetuning)**\n>\n> This example shows how to fine-tune multimodal models (VLMs) like GPT-4o to improve their performance on vision-language tasks.\n> Specifically, we'll build a system that categorizes document images (screenshots of computer science research papers).\n\n> **[Improving LLM Chess Ability with Best-of-N Sampling](https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Ftree\u002Fmain\u002Fexamples\u002Fchess-puzzles\u002F)**\n>\n> This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.\n\n## Blog Posts\n\nWe write about LLM engineering on the **[TensorZero Blog](https:\u002F\u002Fwww.tensorzero.com\u002Fblog)**.\nHere are some of our favorite posts:\n\n- **[Bandits in your LLM Gateway: Improve LLM Applications Faster with Adaptive Experimentation (A\u002FB 
Testing)](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Fbandits-in-your-llm-gateway\u002F)**\n- **[Is OpenAI's Reinforcement Fine-Tuning (RFT) Worth It?](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Fis-openai-reinforcement-fine-tuning-rft-worth-it\u002F)**\n- **[Distillation with Programmatic Data Curation: Smarter LLMs, 5-30x Cheaper Inference](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Fdistillation-programmatic-data-curation-smarter-llms-5-30x-cheaper-inference\u002F)**\n- **[From NER to Agents: Does Automated Prompt Engineering Scale to Complex Tasks?](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Ffrom-ner-to-agents-does-automated-prompt-engineering-scale-to-complex-tasks\u002F)**\n","\u003Cp>\u003Cpicture>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftensorzero_tensorzero_readme_8bf54da83908.png\" alt=\"TensorZero Logo\" width=\"128\" height=\"128\">\u003C\u002Fpicture>\u003C\u002Fp>\n\n# TensorZero\n\n\u003Cp>\u003Cpicture>\u003Cimg src=\"https:\u002F\u002Fwww.tensorzero.com\u002Fgithub-trending-badge.svg\" alt=\"GitHub Trending - #1 Repository Of The Day\">\u003C\u002Fpicture>\u003C\u002Fp>\n\n**TensorZero 是一个开源的 LLMOps 平台，它统一了以下功能：**\n\n- **网关：** 通过统一的 API 访问所有 LLM 提供商，专为高性能设计（p99 延迟 \u003C1ms）\n- **可观测性：** 将推理结果和反馈存储到您的数据库中，可通过编程接口或 UI 查看\n- **评估：** 使用启发式方法、LLM 评判员等对单个推理或端到端工作流进行基准测试\n- **优化：** 收集指标和人工反馈，以优化提示词、模型和推理策略\n- **实验：** 内置 A\u002FB 测试、路由、回退机制、重试等功能，让您更自信地部署\n\n您可以根据需求选择所需功能，逐步采用，并与其他工具结合使用。\n它与 **OpenAI SDK**、**OpenTelemetry** 以及 **各大主流 LLM 提供商** 都能良好兼容。\n\nTensorZero 目前已被从前沿 AI 创业公司到财富 10 强的企业广泛使用，支撑着全球约 1% 的 LLM API 开支。\n\n\u003Cbr>\n\n\u003Cp align=\"center\">\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002F\" target=\"_blank\">官网\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\" target=\"_blank\">文档\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.x.com\u002Ftensorzero\" 
target=\"_blank\">Twitter\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fslack\" target=\"_blank\">Slack\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fdiscord\" target=\"_blank\">Discord\u003C\u002Fa>\u003C\u002Fb>\n  \u003Cbr>\n  \u003Cbr>\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fquickstart\" target=\"_blank\">快速入门（5 分钟）\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fdeployment\u002Ftensorzero-gateway\" target=\"_blank\">部署指南\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fapi-reference\" target=\"_blank\">API 参考\u003C\u002Fa>\u003C\u002Fb>\n  ·\n  \u003Cb>\u003Ca href=\"https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fconfiguration-reference\" target=\"_blank\">配置参考\u003C\u002Fa>\u003C\u002Fb>\n\u003C\u002Fp>\n\n## 演示\n\n\u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F04a8466e-27d8-4189-b305-e7cecb6881ee\">\u003C\u002Fvideo>\n\n## 功能\n\n> [!NOTE]\n>\n> ### 🆕 TensorZero 自动驾驶\n>\n> TensorZero 自动驾驶是一个由 TensorZero 提供支持的 **自动化 AI 工程师**，它可以分析 LLM 的可观测性数据，设置评估任务，优化提示词和模型，并运行 A\u002FB 测试。\n>\n> 它能够 **显著提升 LLM 代理在各种任务中的性能**：\n>\n> \u003Cimg width=\"600\" alt=\"柱状图显示不同 LLM 任务中基线与优化后的得分对比\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftensorzero_tensorzero_readme_d7911f17a8ac.png\" \u002F>\n> \u003Cbr>\n>\n> **[了解更多 →](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Fautomated-ai-engineer\u002F)**&emsp;&emsp;**[预约演示 →](https:\u002F\u002Fwww.tensorzero.com\u002Fschedule-demo)**\n\n### 🌐 LLM 网关\n\n> **只需集成一次 TensorZero，即可访问所有主流 LLM 提供商。**\n\n- [x] **[调用任意 LLM](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fcall-any-llm)**（无论是 API 还是自托管模型）都可通过单一的统一 API 实现\n- [x] 支持 
**[工具使用](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Ftool-use)**、**[结构化输出（JSON）](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fgenerate-structured-outputs)**、**[批量推理](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fbatch-inference)**、**[嵌入生成](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fgenerate-embeddings)**、**[多模态输入（图像、文件）](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fcall-llms-with-image-and-file-inputs)**、**[缓存机制](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Finference-caching)** 等功能\n- [x] **[创建提示模板和模式](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fcreate-a-prompt-template)**，以确保您的应用程序与 LLM 之间的接口标准化\n- [x] 凭借 🦀 Rust 语言的强大性能，满足极高的吞吐量和低延迟需求：**[在 10k+ QPS 下，p99 延迟开销 \u003C1ms](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fbenchmarks)**\n- [x] **[确保高可用性](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fretries-fallbacks)**，通过路由、重试、回退、负载均衡、细粒度超时设置等功能实现\n- [x] **[跟踪使用情况和成本](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foperations\u002Ftrack-usage-and-cost)**，并 **[实施自定义速率限制](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foperations\u002Fenforce-custom-rate-limits)**，支持按标签等细粒度范围进行控制\n- [x] **[为 TensorZero 设置身份验证](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foperations\u002Fset-up-auth-for-tensorzero)**，允许客户端在不共享提供商 API 密钥的情况下访问模型\n\n#### 支持的模型提供商\n\n**[Anthropic](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fanthropic)**，\n**[AWS Bedrock](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Faws-bedrock)**，\n**[AWS 
SageMaker](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Faws-sagemaker)**，\n**[Azure](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fazure)**，\n**[DeepSeek](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fdeepseek)**，\n**[Fireworks](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Ffireworks)**，\n**[GCP Vertex AI Anthropic](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fgcp-vertex-ai-anthropic)**，\n**[GCP Vertex AI Gemini](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fgcp-vertex-ai-gemini)**，\n**[Google AI Studio (Gemini API)](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fgoogle-ai-studio-gemini)**，\n**[Groq](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fgroq)**，\n**[Hyperbolic](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fhyperbolic)**，\n**[Mistral](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fmistral)**，\n**[OpenAI](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fopenai)**，\n**[OpenRouter](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fopenrouter)**，\n**[SGLang](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fsglang)**，\n**[TGI](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Ftgi)**，\n**[Together AI](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Ftogether)**，\n**[vLLM](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fvllm)**，以及\n**[xAI 
(Grok)](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fxai)**。\n\n如果您需要其他服务？TensorZero 同时也支持 **[任何 OpenAI 兼容的 API（例如 Ollama）](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fproviders\u002Fopenai-compatible)**。\n\n#### 使用示例\n\n您可以将 TensorZero 与任何 OpenAI SDK（Python、Node.js、Go 等）或其他 OpenAI 兼容的客户端一起使用。\n\n1. **[部署 TensorZero 网关](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fdeployment\u002Ftensorzero-gateway)**（只需一个 Docker 容器）。\n2. 在您的 OpenAI 兼容客户端中更新 `base_url` 和 `model`。\n3. 执行推理：\n\n```python\nfrom openai import OpenAI\n\n# 将客户端指向 TensorZero 网关\nclient = OpenAI(base_url=\"http:\u002F\u002Flocalhost:3000\u002Fopenai\u002Fv1\", api_key=\"not-used\")\n\nresponse = client.chat.completions.create(\n    # 调用任何模型提供商（或 TensorZero 函数）\n    model=\"tensorzero::model_name::anthropic::claude-sonnet-4-6\",\n    messages=[\n        {\n            \"role\": \"user\",\n            \"content\": \"分享一个关于 TensorZero 的有趣事实。\",\n        }\n    ],\n)\n```\n\n更多信息请参阅 **[快速入门](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fquickstart)**。\n\n### 🔍 LLM 可观测性\n\n> **可以放大以调试单个 API 调用，也可以缩小以监控跨模型和提示随时间变化的指标——所有这些都可以通过开源的 TensorZero UI 实现。**\n\n- [x] 将推理结果及**[反馈（指标、人工编辑等）](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Fmetrics-feedback)** 存储到您自己的数据库中\n- [x] 使用 TensorZero UI 或编程方式深入分析单个推理结果或高层次的聚合模式\n- [x] **[构建数据集](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fapi-reference\u002Fdatasets-datapoints)** 用于优化、评估及其他工作流\n- [x] 使用新的提示、模型、推理策略等重放历史推理记录\n- [x] **[导出 OpenTelemetry 跟踪数据 (OTLP)](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foperations\u002Fexport-opentelemetry-traces)** 和 **[导出 Prometheus 指标](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foperations\u002Fexport-prometheus-metrics)** 到您喜爱的应用程序可观测性工具中\n- [ ] 即将推出：AI 辅助调试与根因分析；AI 辅助数据标注\n\n### 📈 LLM 优化\n\n> **将生产环境中的指标和人工反馈发送出去，以便轻松优化您的提示、模型和推理策略——无论是通过 UI 还是编程方式。**\n\n- [x] 使用 
**[监督微调](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foptimization\u002Fsupervised-fine-tuning-sft)**、RLHF 等技术优化您的模型\n- [x] 使用自动化提示工程算法（如 **[GEPA](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foptimization\u002Fgepa)**）优化您的提示\n- [x] 使用 **[动态上下文学习](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foptimization\u002Fdynamic-in-context-learning-dicl)**、最佳\u002F混合 N 抽样等方法优化您的 **[推理策略](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fgateway\u002Fguides\u002Finference-time-optimizations)**\n- [x] 为您的 LLM 启用反馈循环：让生产数据驱动的数据与学习飞轮不断迭代，从而打造更智能、更快、更低成本的模型\n- [ ] 即将推出：合成数据生成\n\n### 📊 LLM 评估\n\n> **使用启发式方法和 LLM 评委提供的评估功能，比较提示、模型和推理策略。**\n\n- [x] 使用启发式方法或 LLM 评委支持的 _推理评估_ 对 **[单个推理进行评估](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fevaluations\u002Finference-evaluations\u002Ftutorial)**（类似于 LLM 的单元测试）\n- [x] 使用完全灵活的 _工作流评估_ 对 **[端到端工作流进行评估](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fevaluations\u002Fworkflow-evaluations\u002Ftutorial)**（类似于 LLM 的集成测试）\n- [x] 像优化其他 TensorZero 函数一样优化 LLM 评委，使其与人类偏好保持一致\n- [ ] 即将推出：更多内置评估器；无头评估\n\n\u003Ctable>\n  \u003Ctr>\u003C\u002Ftr> \u003C!-- 翻转高亮顺序 -->\n  \u003Ctr>\n    \u003Ctd width=\"50%\" align=\"center\" valign=\"middle\">\u003Cb>评估 » UI\u003C\u002Fb>\u003C\u002Ftd>\n    \u003Ctd width=\"50%\" align=\"center\" valign=\"middle\">\u003Cb>评估 » CLI\u003C\u002Fb>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"50%\" align=\"center\" valign=\"middle\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftensorzero_tensorzero_readme_3bf20c3cfe4a.png\">\u003C\u002Ftd>\n    \u003Ctd width=\"50%\" align=\"left\" valign=\"middle\">\n\u003Cpre>\u003Ccode class=\"language-bash\">docker compose run --rm evaluations \\\n  --evaluation-name extract_data \\\n  --dataset-name hard_test_cases \\\n  --variant-name gpt_4o \\\n  --concurrency 5\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cpre>\u003Ccode class=\"language-bash\">运行 ID: 
01961de9-c8a4-7c60-ab8d-15491a9708e4\n数据点数量：100\n██████████████████████████████████████ 100\u002F100\n精确匹配：0.83 ± 0.03（n=100）\n语义匹配：0.98 ± 0.01（n=100）\n项目数量：7.15 ± 0.39（n=100）\u003C\u002Fcode>\u003C\u002Fpre>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n### 🧪 LLM 实验\n\n> **借助内置的 A\u002FB 测试、路由、回退机制、重试等功能，自信地部署应用。**\n\n- [x] **[运行自适应 A\u002FB 测试](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fexperimentation\u002Frun-adaptive-ab-tests)**，以确保部署成功，并找到最适合您用例的提示和模型。\n- [x] 在复杂的工作流中强制执行原则性的实验，包括对多轮 LLM 系统、顺序测试等的支持。\n\n### 更多！\n\n> **使用一套专为原型设计而优化、但从一开始就旨在支持最复杂的 LLM 应用和部署的开源技术栈进行开发。**\n\n- [x] 无论构建简单应用还是大规模部署，均可采用适合 GitOps 的编排方式\n- [x] **[扩展 TensorZero](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foperations\u002Fextend-tensorzero)**，提供内置逃生通道、以编程优先的方式使用、直接访问数据库等功能\n- [x] 可与其他第三方工具集成：专业的可观测性和评估工具、模型提供商、代理编排框架等\n- [x] 通过 Playground UI 交互式地试验提示，快速迭代\n\n## 常见问题解答\n\n**TensorZero 与其他 LLM 框架有何不同？**\n\n1. TensorZero 让您能够基于生产指标和人类反馈来优化复杂的 LLM 应用。\n2. TensorZero 支持工业级 LLM 应用的需求：低延迟、高吞吐量、类型安全、自托管、GitOps、可定制性等。\n3. 
TensorZero 统一了整个 LLMOps 堆栈，从而产生叠加效应。例如，LLM 评估可以与 LLM 评委一起用于模型的微调。\n\n**我能否将 TensorZero 与 \\_\\_\\_ 一起使用？**\n\n可以。\n支持所有主流编程语言。\n它能很好地与 **OpenAI SDK**、**OpenTelemetry** 以及 **所有主要的 LLM 提供商** 配合使用。\n\n**TensorZero 是否已准备好投入生产？**\n\n是的。\nTensorZero 目前已被从前沿 AI 创业公司到财富 10 强的企业所采用，并支撑着当今全球 LLM API 支出的约 1%。\n\n这里有一个案例研究：**[利用 LLM 自动化大型银行的代码变更日志](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Fcase-study-automating-code-changelogs-at-a-large-bank-with-llms)**\n\n**TensorZero 的费用是多少？**\n\nTensorZero（LLMOps 平台）是 100% 自托管且开源的。\n\nTensorZero 自动驾驶（自动化 AI 工程师）则是由 TensorZero 提供支持的补充性付费产品。\n\n**谁在构建 TensorZero？**\n\n我们的技术团队包括一位前 Rust 编译器维护者、拥有数千次引用的机器学习研究人员（来自斯坦福、卡内基梅隆、牛津、哥伦比亚大学）以及一家十亿美元估值初创公司的首席产品官。我们得到了与领先开源项目（如 ClickHouse、CockroachDB）和 AI 实验室（如 OpenAI、Anthropic）相同的投资者的支持。请参阅我们的 **[730 万美元种子轮融资公告](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Ftensorzero-raises-7-3m-seed-round-to-build-an-open-source-stack-for-industrial-grade-llm-applications\u002F)** 和 **[VentureBeat 的报道](https:\u002F\u002Fventurebeat.com\u002Fai\u002Ftensorzero-nabs-7-3m-seed-to-solve-the-messy-world-of-enterprise-llm-development\u002F)**。我们正在 **[纽约招聘](https:\u002F\u002Fwww.tensorzero.com\u002Fjobs)**。\n\n**我该如何开始使用？**\n\n您可以逐步采用 TensorZero。我们的 **[快速入门](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fquickstart)** 只需 5 分钟，就能将一个普通的 OpenAI 封装转换为具备可观测性和微调功能的生产就绪型 LLM 应用程序。\n\n## 开始使用\n\n**立即开始构建。**\n**[快速入门](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fquickstart)** 展示了使用 TensorZero 设置 LLM 应用是多么简单。\n\n**有问题吗？**\n\n欢迎在 **[Slack](https:\u002F\u002Fwww.tensorzero.com\u002Fslack)** 或 **[Discord](https:\u002F\u002Fwww.tensorzero.com\u002Fdiscord)** 上向我们提问。\n\n**在工作中使用 TensorZero 吗？**\n\n请发送邮件至 **[hello@tensorzero.com](mailto:hello@tensorzero.com)**，我们将为您和您的团队设置一个免费的 Slack 或 Teams 频道。\n\n## 示例\n\n我们正在开发一系列 **完整的可运行示例**，以展示 TensorZero 的数据与学习飞轮机制。\n\n> **[使用 TensorZero 
优化数据提取（NER）](https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Ftree\u002Fmain\u002Fexamples\u002Fdata-extraction-ner)**\n>\n> 此示例展示了如何使用 TensorZero 优化数据提取流水线。\n> 我们演示了诸如微调和动态上下文学习（DICL）等技术。\n> 最终，经过优化的 GPT-4o Mini 模型在该任务上的表现超越了 GPT-4o——而且成本和延迟仅为后者的几分之一——仅需少量训练数据。\n\n> **[代理式 RAG — 使用 LLM 进行多跳问答](https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Ftree\u002Fmain\u002Fexamples\u002Frag-retrieval-augmented-generation\u002Fsimple-agentic-rag\u002F)**\n>\n> 本示例展示了如何使用 TensorZero 构建一个多跳检索代理。\n> 该代理会迭代地搜索维基百科以收集信息，并决定何时已掌握足够的上下文来回答复杂问题。\n\n> **[根据隐藏偏好撰写俳句以取悦评审](https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Ftree\u002Fmain\u002Fexamples\u002Fhaiku-hidden-preferences)**\n>\n> 本示例对 GPT-4o Mini 进行微调，以生成符合特定品味的俳句。\n> 您将看到 TensorZero 的“盒装数据飞轮”发挥作用：更好的变体带来更好的数据，而更好的数据又带来更优的变体。\n> 您会通过多次微调 LLM 看到进展。\n\n> **[图像数据提取 — 多模态（视觉）微调](https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Ftree\u002Fmain\u002Fexamples\u002Fmultimodal-vision-finetuning)**\n>\n> 本示例展示了如何对多模态模型（VLMs），如 GPT-4o，进行微调，以提升其在视觉-语言任务中的表现。\n> 具体而言，我们将构建一个系统来分类文档图像（计算机科学论文的截图）。\n\n> **[通过 Best-of-N 抽样提升 LLM 的国际象棋能力](https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Ftree\u002Fmain\u002Fexamples\u002Fchess-puzzles\u002F)**\n>\n> 本示例展示了 Best-of-N 抽样如何通过从多个生成选项中选择最有希望的走法，显著增强 LLM 的国际象棋水平。\n\n## 博客文章\n\n我们在 **[TensorZero 博客](https:\u002F\u002Fwww.tensorzero.com\u002Fblog)** 上撰写关于 LLM 工程的文章。\n以下是我们的一些精选文章：\n\n- **[LLM 网关中的赌徒算法：通过自适应实验（A\u002FB 测试）更快地改进 LLM 应用](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Fbandits-in-your-llm-gateway\u002F)**\n- **[OpenAI 的强化微调（RFT）值得吗？](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Fis-openai-reinforcement-fine-tuning-rft-worth-it\u002F)**\n- **[通过程序化数据整理进行蒸馏：更智能的 LLM，推理成本降低 5–30 倍](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Fdistillation-programmatic-data-curation-smarter-llms-5-30x-cheaper-inference\u002F)**\n- **[从 NER 
到代理：自动化提示工程能否扩展到复杂任务？](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Ffrom-ner-to-agents-does-automated-prompt-engineering-scale-to-complex-tasks\u002F)**","# TensorZero 快速上手指南\n\nTensorZero 是一个开源的 LLMOps 平台，旨在统一 LLM 网关、可观测性、评估、优化和实验功能。它兼容 OpenAI SDK，支持所有主流 LLM 提供商，并提供高性能（在 10k+ QPS 下，p99 延迟开销 \u003C1ms）的生产级解决方案。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux, macOS 或 Windows (需安装 WSL2)。\n*   **容器运行时**: 必须安装 **Docker**（如需使用 `docker-compose.yml`，还需安装 **Docker Compose**）。这是运行 TensorZero Gateway 的最简便方式。\n*   **编程语言环境**: 推荐使用 Python 3.8+ 或 Node.js，以便使用 OpenAI 兼容的 SDK 进行调用。\n*   **网络环境**: 确保能够访问所需的 LLM 提供商 API（如 OpenAI, Anthropic 等）。国内用户若访问受限，建议配置相应的网络代理或在 Docker 环境中设置 `HTTP_PROXY`\u002F`HTTPS_PROXY`。\n\n## 安装步骤\n\nTensorZero 的核心组件是 **Gateway**，可以通过单个 Docker 容器快速部署。\n\n1.  **拉取并运行 TensorZero Gateway**\n\n    使用 Docker 启动网关。您可以创建一个 `docker-compose.yml` 文件并通过 Docker Compose 启动，或直接使用命令行运行：\n\n    ```bash\n    docker run --rm -p 3000:3000 tensorzero\u002Fgateway\n    ```\n\n    *注：生产环境部署建议参考官方部署指南配置持久化存储和密钥管理。*\n\n2.  **验证服务状态**\n\n    服务启动后，默认监听在 `http:\u002F\u002Flocalhost:3000`。\n\n## 基本使用\n\nTensorZero 设计为与 **OpenAI SDK** 完全兼容。您无需更改现有代码逻辑，只需调整 `base_url` 和 `model` 参数即可接入。\n\n### 1. 安装依赖\n\n如果您使用 Python，请确保安装了 OpenAI 官方 SDK：\n\n```bash\npip install openai\n```\n\n### 2. 
代码示例\n\n以下是最简单的调用示例。我们将客户端指向本地运行的 TensorZero Gateway，并通过统一的 API 调用任意模型（此处以 Anthropic 的 Claude 为例）。\n\n```python\nfrom openai import OpenAI\n\n# 初始化客户端，指向 TensorZero Gateway\n# base_url: 本地网关地址\n# api_key: 网关处理认证，此处可填任意非空字符串（如 \"not-used\"）\nclient = OpenAI(base_url=\"http:\u002F\u002Flocalhost:3000\u002Fopenai\u002Fv1\", api_key=\"not-used\")\n\nresponse = client.chat.completions.create(\n    # 模型命名格式：tensorzero::model_name::provider::model_id\n    # 这将通过网关路由到 Anthropic 的 claude-sonnet-4-6 模型\n    model=\"tensorzero::model_name::anthropic::claude-sonnet-4-6\",\n    messages=[\n        {\n            \"role\": \"user\",\n            \"content\": \"Share a fun fact about TensorZero.\",\n        }\n    ],\n)\n\nprint(response.choices[0].message.content)\n```\n\n### 3. 核心功能说明\n\n*   **统一接入**: 修改 `model` 参数中的 provider 部分（如 `openai`, `anthropic`, `aws-bedrock` 等），即可切换底层模型，无需更改应用代码。\n*   **可观测性**: 所有的推理请求和反馈数据会自动存储在您配置的数据库中，可通过 TensorZero UI 查看。\n*   **高级特性**: 支持工具调用 (Tool Use)、结构化输出 (JSON)、批量推理、多模态输入等功能，用法与原生的 OpenAI SDK 一致。\n\n接下来，您可以查阅 [官方文档](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs) 配置具体的模型提供商密钥、设置评估流程或开启 A\u002FB 测试功能。","某电商初创团队正在开发一款智能客服助手，需要同时调用多家大模型厂商的 API 来处理用户咨询，并持续优化回答质量。\n\n### 没有 tensorzero 时\n- **集成繁琐**：每接入一家新的大模型供应商（如从 OpenAI 切换到 Anthropic），开发人员都需要重写适配代码，维护多套不同的 SDK。\n- **黑盒运行**：无法统一查看不同模型的推理日志和用户反馈，当回答出错时，难以定位是提示词问题还是模型本身的问题。\n- **优化盲目**：缺乏系统的评估机制，调整提示词或更换模型后，只能凭感觉判断效果，无法通过数据量化对比。\n- **上线风险高**：想要尝试新模型策略时，不敢直接全量发布，因为缺少内置的 A\u002FB 测试和自动降级机制，担心影响用户体验。\n\n### 使用 tensorzero 后\n- **统一接入**：通过 tensorzero 的统一网关接口，团队只需一次集成即可灵活切换或并行调用任意主流大模型，无需修改业务代码。\n- **全景可观测**：所有推理请求和用户反馈自动存入数据库并在 UI 可视化，开发人员能迅速追溯坏案，精准分析失败原因。\n- **数据驱动迭代**：利用内置的评估功能，团队可以基于启发式规则或 LLM 裁判自动打分，清晰量化每次提示词优化带来的性能提升。\n- **安全实验**：借助原生支持的 A\u002FB 测试和路由策略，团队放心地将流量按比例分配给新模型，一旦指标下滑自动回滚，确保服务稳定。\n\ntensorzero 将分散的 LLMOps 
环节整合为闭环工作流，让团队能以最低成本实现大模型应用的快速迭代与高质量交付。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftensorzero_tensorzero_3bf20c3c.png","TensorZero","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ftensorzero_fa5cacdf.png","",null,"hello@tensorzero.com","https:\u002F\u002Fwww.tensorzero.com\u002F","https:\u002F\u002Fgithub.com\u002Ftensorzero",[84,88,92,96,100,104,107,111,115,118],{"name":85,"color":86,"percentage":87},"Rust","#dea584",78.4,{"name":89,"color":90,"percentage":91},"TypeScript","#3178c6",15.2,{"name":93,"color":94,"percentage":95},"Python","#3572A5",4.5,{"name":97,"color":98,"percentage":99},"Shell","#89e051",0.6,{"name":101,"color":102,"percentage":103},"Go","#00ADD8",0.4,{"name":105,"color":106,"percentage":103},"Jupyter Notebook","#DA5B0B",{"name":108,"color":109,"percentage":110},"PLpgSQL","#336790",0.3,{"name":112,"color":113,"percentage":114},"Dockerfile","#384d54",0.1,{"name":116,"color":117,"percentage":114},"CSS","#663399",{"name":119,"color":120,"percentage":114},"Lua","#000080",11178,804,"2026-04-05T06:16:36","Apache-2.0","Linux, macOS, Windows","非必需（作为网关代理外部 LLM 时不需要 GPU；若自托管模型需视具体模型而定）","未说明（取决于并发量和是否自托管模型）",{"notes":129,"python":130,"dependencies":131},"TensorZero 主要是一个用 Rust 编写的高性能 LLMOps 网关，推荐通过 Docker 容器部署（单个容器）。它本身不运行大模型，而是统一接入各类 LLM 提供商（如 OpenAI, Anthropic, vLLM, Ollama 等），因此本地通常无需配置 GPU 或特定 Python 环境。若需自托管模型，需配合支持的推理后端（如 vLLM, TGI, SGLang）单独配置硬件资源。支持导出 OpenTelemetry 追踪和 Prometheus 指标。","未说明（主要通过 Docker 部署或使用 OpenAI 兼容 SDK 调用）",[132,133,134,135],"Docker","OpenAI SDK (Python\u002FNode\u002FGo 等)","OpenTelemetry (可选)","Prometheus 
(可选)",[15,26,13,14],[138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157],"ai","artificial-intelligence","deep-learning","gpt","llm","llmops","llms","machine-learning","rust","ml","mlops","anthropic","llama","openai","generative-ai","ai-engineering","python","ml-engineering","large-language-models","genai","2026-03-27T02:49:30.150509","2026-04-06T05:19:38.826115",[161,166,171,176,181,185,190],{"id":162,"question_zh":163,"answer_zh":164,"source_url":165},17509,"如何在本地运行端到端（E2E）测试时处理缺失的 API 密钥或凭证？","在测试模式（如运行 `cargo run-e2e` 或使用相关功能标志）下，TensorZero 允许跳过某些凭证配置。如果缺少特定模型提供商所需的凭证，系统会发出警告提示：\"You are missing the credentials required for `model.model_name.provider.provider_name`, so the associated tests will likely fail.\"，但不会阻止程序运行。这使得开发者无需配置所有生产环境的密钥即可进行本地开发和测试。","https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Fissues\u002F575",{"id":167,"question_zh":168,"answer_zh":169,"source_url":170},17510,"在使用 Docker Compose 或直接运行二进制文件时，如何正确配置环境变量和 .env 文件？","配置方式取决于运行模式：\n1. 如果使用 Docker Compose 运行网关，必须确保在配置中显式传递 `.env` 文件。\n2. 
如果直接运行二进制文件，则需要将凭证作为环境变量直接在 Shell 中设置。\n如果在网关日志中发现未检测到 API 密钥，请检查上述配置是否正确，特别是确认 `.env` 文件是否位于预期路径（如 `example\u002Fproduction-deployment\u002F`）并已被正确加载。","https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Fissues\u002F655",{"id":172,"question_zh":173,"answer_zh":174,"source_url":175},17511,"集成 Ollama 模型提供商时，应该使用 OpenAI 兼容接口还是原生接口？","虽然可以通过 OpenAI 兼容模型提供商使用 TensorZero + Ollama，但在添加专用的 Ollama 模型提供商时，官方建议集成 Ollama 的原生接口（native interface），而不是 OpenAI 兼容接口。原因是原生接口支持更多的功能特性。开发时应参考 `gateway\u002Fsrc\u002Finference\u002Fproviders` 目录下其他提供商（如 `xai.rs` 或 `hyperbolic.rs`）的实现方式。","https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Fissues\u002F662",{"id":177,"question_zh":178,"answer_zh":179,"source_url":180},17512,"在实现反馈 API 的 ID 验证时，如何处理推断写入前的等待逻辑？","早期设计中要求开发者手动“休眠”直到推断写入完成，但这被认为是不好的接口设计。现在的解决方案是使用内部提供的 `throttled_get_target_identifier` 函数。在编写验证 Episode ID 或其他反馈类型的代码时，应直接利用该函数来处理同步逻辑，而无需手动实现等待或轮询机制。可以参考 `gateway\u002Fsrc\u002Fendpoints\u002Ffeedback.rs` 中的实现示例。","https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Fissues\u002F350",{"id":182,"question_zh":183,"answer_zh":184,"source_url":170},17513,"如何为 TensorZero 添加新的模型提供商（Provider）？","添加新提供商通常需要执行以下步骤：\n1. 在 `gateway\u002Fsrc\u002Finference\u002Fproviders` 目录下创建新的提供商文件。\n2. 在 `gateway\u002Fsrc\u002Fmodel.rs` 和 `gateway\u002Fsrc\u002Ferror.rs` 中添加必要的条目。\n3. 将该提供商添加到 E2E 测试套件中（位于 `gateway\u002Ftests\u002Fe2e\u002Fproviders`）。\n4. 在 `examples\u002Fguides\u002Fproviders` 中添加最小化示例。\n5. 
全局搜索代码库中现有提供商（如 `mistral`）的引用位置，并在相应位置添加新提供商的类似引用，以确保代码库的一致性。",{"id":186,"question_zh":187,"answer_zh":188,"source_url":189},17514,"在配置工具（tools）时，如何解决工具名称必须全局唯一的问题？","默认情况下，系统使用 `tools.tool_name` 作为传递给模型的名称，这要求该名称在全局范围内唯一。如果需要为不同的工具使用相同的名称，可以在配置表中添加一个可选的 `name` 字段。如果提供了该字段，它将覆盖默认的行为，允许使用自定义名称传递给模型；如果未提供，则保持原有行为（即使用 `tool_name`）。","https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Fissues\u002F230",{"id":191,"question_zh":192,"answer_zh":193,"source_url":194},17515,"如何更新 DPO（直接偏好优化）示例以使用最新的模型版本？","在更新 DPO 示例时，应将基线模型和微调模型都升级到最新版本（例如从 GPT-4o 升级到 GPT-4.1 和 GPT-4.1 Mini）。具体操作包括：\n1. 修改配置文件（如 TOML 文件）以指定新模型。\n2. 运行 NER 任务并重新生成评估图表，确保对比结果包含新模型。\n3. 更新 Notebook 示例，展示如何在 UI 中运行 SFT 作业以及在 recipes 目录中运行 DPO 作业。\n4. 如果有 DICL（上下文学习）演示，也建议使用新模型进行测试并更新结果图表。","https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero\u002Fissues\u002F810",[196,201,206,211,216,221,226,231,236,241,246,251,256,261,266,271,276,281,286,291],{"id":197,"version":198,"summary_zh":199,"released_at":200},107737,"2026.4.0","**新功能**\n\n- 在网关中添加一个 MCP 服务器，在 `\u002Fmcp` 路径下暴露其 API。\n- 通过 API 和 UI 报告提供商提示缓存统计信息。\n- 通过 CLI 工具、API 和 UI 报告推理评估的使用统计信息（例如令牌数、延迟、成本）。\n- 添加 Prometheus 指标 `tensorzero_input_tokens_total` 和 `tensorzero_output_tokens_total`。\n- 添加配置字段 `content_type_overrides`，用于处理长尾提供商的文件输入。\n\n_以及多项内部优化和 UI 改进_\n","2026-04-02T17:00:15",{"id":202,"version":203,"summary_zh":204,"released_at":205},107738,"2026.3.4","> [!WARNING]\n> **计划中的弃用**\n>\n> - 今后，推理评估的配置应嵌套在相关函数之下 **[[文档]](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fevaluations\u002Finference-evaluations\u002Ftutorial)**。您可以通过提供函数名称和评估器列表来运行评估。旧格式将在未来的版本中移除。\n>   ```\n>   [functions.write_haiku.evaluators.exact_match]\n>   type = \"exact_match\"\n>   ```\n> - GEPA 的旧实现（使用 `GEPAConfig` 的 `launch_optimization`）将在未来的版本中被移除。请改用 `t0.optimization.gepa.launch`。**[[文档]](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Foptimization\u002Fgepa)**\n>\n> **Bug 修复**\n>\n> - 修复了一个 UI Bug：在某些路由中，自定义网关的 `base_path` 未被正确处理。（感谢 
@wangfenjin！）\n>\n> **新功能**\n>\n> - 开始将嵌入请求纳入 Prometheus 指标 `tensorzero_requests_total` 和 `tensorzero_inferences_total` 中。\n> - 添加了配置字段 `observability.batch_writes.write_queue_capacity`，以在网关中为可观测性数据启用反压机制。\n>\n> _以及多项内部和 UI 改进（感谢 @majiayu000）！_\n>\n> ---\n>\n> [!IMPORTANT]\n>\n> ### 🆕 TensorZero 自动驾驶\n>\n> TensorZero 自动驾驶是由 TensorZero 提供支持的**自动化 AI 工程师**，它能够分析 LLM 可观测性数据、设置评估、优化提示和模型，并运行 A\u002FB 测试。\n>\n> 它可以**显著提升 LLM 代理在各种任务中的性能**：\n>\n> \u003Cimg width=\"600\" alt=\"柱状图显示不同 LLM 任务的基线与优化后得分\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Faa474fe3-b55a-48aa-9f0d-e7c2f8e32ccd\" \u002F>\n> \u003Cbr>\n>\n> **[了解更多 →](https:\u002F\u002Fwww.tensorzero.com\u002Fblog\u002Fautomated-ai-engineer\u002F)**&emsp;&emsp;**[预约演示 →](https:\u002F\u002Fwww.tensorzero.com\u002Fschedule-demo)**","2026-03-26T14:27:42",{"id":207,"version":208,"summary_zh":209,"released_at":210},107739,"2026.3.3","**错误修复**\n\n- 修复了两个影响批量推理的边缘情况。\n- 修复了一个 UI 错误，该错误会影响包含 Base64 编码文件的输入中的“尝试使用…”功能。\n- 移除了针对 JSON 函数和 Anthropic 的助手消息预填充（Anthropic 已弃用此功能）。\n\n**新功能**\n\n- 基于持久工作流实现了 GEPA（自动化提示工程）。\n- 允许用户在 `all_of` 工具评估器中指定重复的工具调用，以实现并行工具调用的评估。\n- 允许用户在 UI 中为 API 密钥指定过期日期。（感谢 @eibrahim95）\n- 允许用户在配置中指定 `object_storage.endpoint = \"env::MY_ENV_VAR\"`，而不仅限于静态值。（感谢 @Meredith2328）\n\n_以及多项底层和 UI 改进（感谢 @majiayu000）！_","2026-03-18T16:23:28",{"id":212,"version":213,"summary_zh":214,"released_at":215},107740,"2026.3.2","**Bug 修复**\n\n- 修复了一个 UI 问题，该问题会导致在依赖历史配置时，某些页面无法正常渲染。\n\n**新功能**\n\n- 新增了 PostgreSQL 作为 ClickHouse 的替代可观测性后端。PostgreSQL 是最简单的入门方式；如果您处理的每秒请求数超过 100，则建议使用 ClickHouse。\n- 为嵌入模型添加了 `openrouter::xxx` 简写语法。\n- 在启用身份验证的情况下，新增了对浏览器中会话级 API 密钥的支持（取代全局环境变量）。\n\n_以及多项底层和 UI 改进！_\n","2026-03-13T16:09:19",{"id":217,"version":218,"summary_zh":219,"released_at":220},107741,"2026.3.1","> [!WARNING]\n> **已完成的弃用**\n>\n> - 移除了 `extra_body` 和 `extra_headers` 中已弃用的 `model_provider_name` 过滤器。请改用 `model_name` 和 `provider_name`。\n> - 移除了旧版实验性 `list_inferences` 
端点和方法。请使用新的端点代替。**[[文档]](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fobservability\u002Fquery-historical-inferences)**\n> - 从 TensorZero Python SDK 中移除了多个长期已弃用的类型和方法。\n\n> [!WARNING]\n> **计划中的弃用**\n>\n> - TensorZero Python SDK 中的嵌入式网关将在未来的版本（2026.6 及以上）中被移除。`patch_openai_client` 和 `build_embedded` 已弃用。请改用独立的 TensorZero 网关部署（使用方式：OpenAI SDK 使用 `base_url`；TensorZero SDK 使用 `build_http`）。\n> - 变体配置字段 `weight` 将在未来的版本（2026.6 及以上）中被移除。请使用新的实验配置语义。**[[文档]](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fexperimentation\u002Frun-static-ab-tests)**\n\n**Bug 修复**\n\n- 修复了一个仅影响 Redis 的、与基于 Valkey 的缓存兼容性相关的 bug。\n\n**新功能**\n\n- 在 `launch_optimization_workflow` 中，新增了支持通过 `dataset_name`（而非推理查询）来启动优化工作流的功能。\n\n_以及多项底层和 UI 方面的改进！_\n","2026-03-05T22:13:25",{"id":222,"version":223,"summary_zh":224,"released_at":225},107742,"2026.3.0","> [!WARNING]\n> **已完成的弃用**\n>\n> - 已移除废弃的 Prometheus 指标 `tensorzero_inference_latency_overhead_seconds_histogram`。请改用 `tensorzero_inference_latency_overhead_seconds`。\n\n> [!WARNING]\n> **计划中的弃用**\n>\n> - 用于实验的配置（例如 `static_weights`、`track_and_stop`）已简化。旧的写法将在未来的版本中被移除。更多信息请参阅 **[运行自适应 A\u002FB 测试](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fexperimentation\u002Frun-adaptive-ab-tests)** 和 **[运行静态 A\u002FB 测试](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fexperimentation\u002Frun-static-ab-tests)**。\n> - 评估器配置字段 `cutoff` 将在未来的版本中被移除。取而代之的是，应在 CLI 中提供 `--cutoffs evaluator=value,...`。\n> - 网关路由 `\u002Fvariant_sampling_probabilities` 将在未来的版本中被移除。\n> - 配置字段 `postgres.enabled` 将在未来的版本中被移除。取而代之的是，网关将检查环境变量 `TENSORZERO_POSTGRES_URL` 是否已设置。\n\n**新功能**\n\n- 新增 `regex` 和 `tool_use` 评估器。**[[文档]](https:\u002F\u002Fwww.tensorzero.com\u002Fdocs\u002Fevaluations\u002Finference-evaluations\u002Fconfiguration-reference#evaluator-types)**\n- 在 TensorZero Python SDK 中新增 `experimental_launch_optimization_workflow`。\n\n_以及多项内部和 UI 方面的改进！_","2026-03-04T20:40:09",{"id":227,"version":228,"summary_zh":229,"released_at":230},107743,"2026.2.2","> 
[!CAUTION]\n> **破坏性变更**\n>\n> - `--config-file` 的 glob 匹配行为已更改：单层通配符（`*`）不再匹配跨目录边界的文件。若需匹配跨目录边界的文件，请使用递归通配符（`**`）。此举使行为与标准 glob 语义保持一致。例如：\n>   - `--config-file *.toml` 只会匹配 `tensorzero.toml`，而不会匹配 `subdir\u002Ftensorzero.toml`。\n>   - `--config-file **\u002F*.toml` 则会同时匹配 `tensorzero.toml` 和 `subdir\u002Ftensorzero.toml`。\n\n> [!WARNING]\n> **已废弃功能**\n>\n> - 移除了用于数据集管理的旧版已废弃端点。现有新端点已完全覆盖相关功能。\n\n**新增功能**\n\n- 增加成本跟踪和基于成本的速率限制。\n- 引入命名空间功能：支持为同一 TensorZero 函数设置多个细粒度的实验（A\u002FB 测试）。\n- 改进对 Anthropic（包括自适应思维）、Fireworks AI、SGLang 和 Together AI 的推理支持。\n- 允许用户将工具自动批准列入白名单，供 TensorZero Autopilot 使用。\n- 在启用 `include_raw_response` 时，报告提供商错误。\n- 在流式推理中新增 `include_aggregated_response` 参数。启用后，最后一个 chunk 将包含一个整合了先前所有 chunk 的聚合输出 `aggregated_response`。\n- 允许用户通过 UI 终止正在进行的评估任务。\n- 支持通过环境变量 `TENSORZERO_GATEWAY_BIND_ADDRESS` 自定义网关绑定地址。\n\n_以及多项底层和 UI 方面的改进（感谢 @Nfemz 和 @greg80303）！_\n","2026-02-26T18:47:41",{"id":232,"version":233,"summary_zh":234,"released_at":235},107744,"2026.2.1","> [!CAUTION]\n> **破坏性变更**\n>\n> - `cache_options.enabled` 的默认值由 `write_only` 更改为 `off`。\n\n**新功能**\n\n- 支持来自 Groq、Mistral 和 vLLM 的推理模型。\n- 支持与 Gemini 及 OpenAI 兼容模型的多轮推理。\n- 支持来自 Together AI 的嵌入模型。\n- 为流式推理添加可配置的 `total_ms` 超时设置。\n- 在 TensorZero 自动驾驶 UI 中展示 top-k 评估结果的图表。\n- 在整个 UI 中添加“向自动驾驶提问”按钮。\n- 允许 TensorZero 自动驾驶编辑您的本地配置文件。\n- 在 OpenAI 兼容端点中返回 `thought` 和 `unknown` 内容块（`tensorzero_extra_content`）。\n\n_以及多项底层和 UI 改进！_","2026-02-16T21:54:07",{"id":237,"version":238,"summary_zh":239,"released_at":240},107745,"2026.2.0","> [!WARNING]\n> **计划中的弃用**\n>\n> - Anthropic 的结构化输出功能已正式发布，不再处于测试阶段，因此 TensorZero 配置字段 `beta_structured_outputs` 现已被忽略并弃用。该字段将在未来的版本中被移除。\n\n**Bug 修复**\n\n- 修复了 `aws_bedrock` 提供商中影响长期持有者 API 密钥的回归问题。\n- 修复了推理详情 UI 页面中工具调用及结果显示时出现的水平溢出问题。\n\n**新功能**\n\n- 为 TensorZero 自动驾驶模式新增 YOLO 模式。\n- 为 TensorZero 自动驾驶会话添加中断功能。\n- 在 UI 中的 TensorZero 自动驾驶会话表格中添加摘要信息。\n\n_以及多项底层和 UI 方面的改进（感谢 
@pratikbuilds）！_","2026-02-05T19:42:51",{"id":242,"version":243,"summary_zh":244,"released_at":245},107746,"2026.1.8","**Bug 修复**\r\n\r\n- 修复 TensorZero 自动驾驶 UI 中的一个竞态条件，该条件可能导致聊天输入框被禁用。\r\n- 增加由 TensorZero 自动驾驶触发的慢速工具调用（例如评估）的超时时间。\r\n\r\n_以及多项底层和 UI 改进！_","2026-01-30T22:37:46",{"id":247,"version":248,"summary_zh":249,"released_at":250},107747,"2026.1.7","**New Features**\r\n\r\n- [Preview] TensorZero Autopilot &mdash; an automated AI engineer that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A\u002FB tests. **[Learn more →](https:\u002F\u002Fwww.tensorzero.com\u002F)** **[Join the waitlist →](https:\u002F\u002Ftensorzero.com\u002Fautopilot-waitlist)**\r\n- Support multi-turn reasoning for xAI (`reasoning_content` only).\r\n\r\n_& multiple under-the-hood and UI improvements!_\r\n","2026-01-30T19:29:46",{"id":252,"version":253,"summary_zh":254,"released_at":255},107748,"2026.1.6","> [!CAUTION]\r\n> **Breaking Changes**\r\n>\r\n> - Moving forward, TensorZero will use the OpenAI API's error format (`{\"error\": {\"message\": \"Bad!\"}}`) instead of TensorZero's error format (`{\"error\": \"Bad!\"}`) in the OpenAI-compatible endpoints.\r\n\r\n> [!WARNING]\r\n> **Planned Deprecations**\r\n>\r\n> - When using `unstable_error_json` with the OpenAI-compatible inference endpoint, use `tensorzero_error_json` instead of `error_json`. For now, TensorZero will emit both fields with identical data. The TensorZero inference endpoint is not affected.\r\n\r\n**New Features**\r\n\r\n- Add native support for provider tools (e.g. web search) to the Anthropic and GCP Vertex AI Anthropic model providers. 
Previously, clients had to use `extra_body` to handle these tools.\r\n- Improve handling of reasoning content blocks when streaming with the OpenAI Responses API.\r\n- Handle inferences with missing `usage` fields gracefully in the OpenAI model provider.\r\n- Improve error handling across the UI.\r\n\r\n_& multiple under-the-hood and UI improvements!_\r\n","2026-01-30T17:42:01",{"id":257,"version":258,"summary_zh":259,"released_at":260},107749,"2026.1.5","> [!CAUTION]\r\n> **Breaking Changes**\r\n>\r\n> - TensorZero will normalize the reported `usage` from different model providers. Moving forward, `input_tokens` and `output_tokens` include all token variations (provider prompt caching, reasoning, etc.), just like OpenAI. Tokens cached by TensorZero remain excluded. You can still access the raw usage reported by providers with `include_raw_usage`.\r\n\r\n> [!WARNING]\r\n> **Planned Deprecations**\r\n>\r\n> - Migrate `include_original_response` to `include_raw_response`. For advanced variant types, the former only returned the last model inference, whereas the latter returns every model inference with associated metadata.\r\n> - Migrate `allow_auto_detect_region = true` to `region = \"sdk\"` when configuring AWS model providers. The behavior is identical.\r\n> - Provide the proper API base rather than the full endpoint when configuring custom Anthropic providers. 
Example:\r\n>   - Before: `api_base = \"https:\u002F\u002FYOUR-RESOURCE-NAME.services.ai.azure.com\u002Fanthropic\u002Fv1\u002Fmessages\"`\r\n>   - Now: `api_base = \"https:\u002F\u002FYOUR-RESOURCE-NAME.services.ai.azure.com\u002Fanthropic\u002Fv1\u002F\"`\r\n\r\n**Bug Fixes**\r\n\r\n- Fix a regression that triggered incorrect warnings about usage reporting for streaming inferences with Anthropic models.\r\n- Fix a bug in the TensorZero Python SDK that discarded some request fields in certain multi-turn inferences with tools.\r\n\r\n**New Features**\r\n\r\n- Improve error handling across many areas: TensorZero UI, JSON deserialization, AWS providers, streaming inferences, timeouts, etc.\r\n- Support Valkey (Redis) for improving performance of rate limiting checks (recommended at 100+ QPS).\r\n- Support `reasoning_effort` for Gemini 3 models (mapped to `thinkingLevel`).\r\n- Improve handling of Anthropic reasoning models in TensorZero JSON functions. Moving forward, `json_mode = \"strict\"` will use the beta structured outputs feature; `json_mode = \"on\"` still uses the legacy assistant message prefill.\r\n- Improve handling of reasoning content in the OpenRouter and xAI model providers.\r\n- Add `extra_headers` support for embedding models. 
(thanks @jonaylor89!)\r\n- Support dynamic credentials for AWS Bedrock and AWS SageMaker model providers.\r\n\r\n_& multiple under-the-hood and UI improvements (thanks @ndoherty-xyz)!_\r\n","2026-01-24T17:25:58",{"id":262,"version":263,"summary_zh":264,"released_at":265},107750,"2026.1.2","**New Features**\r\n\r\n- Support appending to arrays with `extra_body` using the `\u002Fmy_array\u002F-` notation.\r\n- Handle cross-model thought signatures in GCP Vertex AI Gemini and Google AI Studio.\r\n\r\n_& multiple under-the-hood and UI improvements (thanks @ecalifornica!)_\r\n","2026-01-15T19:33:47",{"id":267,"version":268,"summary_zh":269,"released_at":270},107751,"2026.1.1","> [!WARNING]\r\n> **Planned Deprecations**\r\n>\r\n> - In a future release, the parameter `model` will be required when initializing `DICLOptimizationConfig`. The parameter remains optional (defaults to `openai::gpt-5-mini`) in the meantime.\r\n\r\n**Bug Fixes**\r\n\r\n- Stop buffering `raw_usage` when streaming with the OpenAI-compatible inference endpoint; instead, emit `raw_usage` as soon as possible, just like in the native endpoint.\r\n- Stop reporting zero usage in every chunk when streaming a cached inference; instead, report zero usage only in the final chunk, as expected.\r\n\r\n**New Features**\r\n\r\n- Support `stream_options.include_usage` for every model under the Azure provider.\r\n\r\n_& multiple under-the-hood and UI improvements!_\r\n","2026-01-14T18:46:55",{"id":272,"version":273,"summary_zh":274,"released_at":275},107752,"2026.1.0","> [!CAUTION]\r\n> **Breaking Changes**\r\n>\r\n> - The Prometheus metric `tensorzero_inference_latency_overhead_seconds` will report a histogram instead of a summary. 
You can customize the buckets using `gateway.metrics.tensorzero_inference_latency_overhead_seconds_buckets` in the configuration (default: 1ms, 10ms, 100ms).\r\n\r\n> [!WARNING]\r\n> **Planned Deprecations**\r\n>\r\n> - Deprecate the `TENSORZERO_CLICKHOUSE_URL` environment variable from the UI. Moving forward, the UI will query data through the gateway and will no longer communicate directly with ClickHouse.\r\n> - Rename the Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` to `tensorzero_inference_latency_overhead_seconds`. Both metrics will be emitted for now.\r\n> - Rename the configuration field `tensorzero_inference_latency_overhead_seconds_histogram_buckets` to `tensorzero_inference_latency_overhead_seconds_buckets`. Both fields are available for now.\r\n\r\n**New Features**\r\n\r\n- Add optional `include_raw_usage` parameter to inference requests. If enabled, the gateway returns the raw usage objects from model provider responses in addition to the normalized `usage` response field.\r\n- Add optional `--bind-address` CLI flag to the gateway.\r\n- Add optional `description` field to metrics in the configuration.\r\n- Add option to fine-tune Fireworks models without automatic deployment.\r\n\r\n_& multiple under-the-hood and UI improvements (thanks @ecalifornica @achaljhawar @rguilmont)!_\r\n","2026-01-10T18:20:29",{"id":277,"version":278,"summary_zh":279,"released_at":280},107753,"2025.12.6","> [!CAUTION]\r\n> **Breaking Changes**\r\n>\r\n> - Migrated the following optimization fields from the TensorZero Python SDK to the configuration:\r\n>   - **`DICLOptimizationConfig`:** removed `credential_location`.\r\n>   - **`FireworksSFTConfig`:** moved `account_id` to `[provider_types.fireworks.sft]`; removed `api_base` and `credential_location`.\r\n>   - **`GCPVertexGeminiSFTConfig`:** moved `bucket_name`, `bucket_path_prefix`, `kms_key_name`, `project_id`, `region`, and `service_account` to `[provider_types.gcp_vertex_gemini.sft]`.\r\n>   - 
**`OpenAIRFTConfig`:** removed `api_base` and `credential_location`.\r\n>   - **`OpenAISFTConfig`:** removed `api_base` and `credential_location`.\r\n>   - **`TogetherSFTConfig`:** moved `hf_api_token`, `wandb_api_key`, `wandb_base_url`, and `wandb_project_name` to `[provider_types.together.sft]`; removed `api_base` and `credential_location`.\r\n\r\n**New Features**\r\n\r\n- Support gateway relay. With gateway relay, an LLM inference request can be routed through multiple independent TensorZero Gateway deployments before reaching a model provider. This enables you to enforce organization-wide controls (e.g. auth, rate limits, credentials) without restricting how teams build their LLM features.\r\n- Add \"Try with model\" button to the datapoint page in the UI.\r\n- Add `tensorzero_inference_latency_overhead_seconds_histogram` Prometheus metric for meta-observability.\r\n- Add `concurrency` parameter to `experimental_render_samples` (defaults to 100).\r\n- Add `otlp_traces_extra_attributes` and `otlp_traces_extra_resources` to the TensorZero Python SDK. (thanks @jinnovation!)\r\n\r\n_& multiple under-the-hood and UI improvements (thanks @ecalifornica)_\r\n","2025-12-26T19:04:00",{"id":282,"version":283,"summary_zh":284,"released_at":285},107754,"2025.12.5","> [!WARNING]\r\n> **Planned Deprecations**\r\n>\r\n> - The variant type `experimental_chain_of_thought` will be deprecated in `2026.2+`. As reasoning models are becoming prevalent, please use their native reasoning capabilities.\r\n> - The `timeout_s` configuration field for best\u002Fmixture-of-N variants will be deprecated in `2026.2+`. Please use the `[timeouts]` block in the configuration for their candidates instead.\r\n\r\n**New Features**\r\n\r\n- Expand the dataset builder in the UI to support complex queries (e.g. 
filter by tags, feedback).\r\n- Export `tensorzero_inference_latency_overhead_seconds` Prometheus metric for meta-observability.\r\n- Allow users to disable TensorZero API keys using `--disable-api-key` in the CLI. (thanks @jinnovation!)\r\n\r\n_& multiple under-the-hood and UI improvements (thanks @ecalifornica)!_\r\n","2025-12-23T22:02:56",{"id":287,"version":288,"summary_zh":289,"released_at":290},107755,"2025.12.3","**Bug Fixes**\r\n\r\n- Fix a bug where negative tag filters (e.g. `user_id != 1`) matched inferences and datapoints without that tag.\r\n- Fix a bug where metric filters covering default values (e.g. `exact_match = false`) matched inferences without that metric.\r\n- Fix a regression affecting the logger in the UI.\r\n\r\n**New Features**\r\n\r\n- Improve the performance of the inference and datapoint list pages in the UI.\r\n- Support filtering inferences by whether they have a demonstration.\r\n\r\n_& multiple under-the-hood and UI improvements (thanks @jinnovation @ecalifornica @simeonlee)!_","2025-12-17T17:58:25",{"id":292,"version":293,"summary_zh":294,"released_at":295},107756,"2025.12.2","**Bug Fixes**\r\n\r\n- Fix a performance regression affecting the inference table in the UI.\r\n\r\n**New Features**\r\n\r\n- Allow users to customize the log level in the UI (`TENSORZERO_UI_LOG_LEVEL`).\r\n\r\n_& multiple under-the-hood and UI improvements_","2025-12-12T17:26:10"]