[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-JudgmentLabs--judgeval":3,"tool-JudgmentLabs--judgeval":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":79,"owner_url":80,"languages":81,"stars":86,"forks":87,"last_commit_at":88,"license":89,"difficulty_score":23,"env_os":90,"env_gpu":90,"env_ram":90,"env_deps":91,"category_tags":100,"github_topics":101,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":115,"updated_at":116,"faqs":117,"releases":148},2407,"JudgmentLabs\u002Fjudgeval","judgeval","The open source post-building layer for agents. 
Our environment data and evals power agent post-training (RL, SFT) and monitoring.","Judgeval 是一款专为 AI 智能体（Agent）打造的开源监控与评估工具，旨在帮助开发者在构建完成后持续优化和守护应用表现。它主要解决了大模型应用在真实场景中难以量化效果、故障排查困难以及缺乏实时预警的痛点，让团队能够基于生产数据及时发现并修复智能体的行为偏差。\n\n这款工具非常适合从事 LLM 应用开发的工程师、算法研究人员以及需要保障生产环境稳定性的技术团队使用。Judgeval 的核心亮点在于其基于 OpenTelemetry 的追踪能力，只需简单装饰器即可自动记录输入输出与 Token 消耗，无缝融入现有可观测性体系。它支持灵活的评估机制，既提供预置的准确性、相关性等评分标准，也允许用户编写自定义 Python 逻辑作为评判器，甚至能在安全的微虚拟机中运行复杂的代码检查流程。此外，Judgeval 具备独特的在线异步监控功能，可在不影响系统延迟的前提下对实时流量进行打分，并支持配置 Slack 警报。配合数据集管理与提示词版本控制功能，Judgeval 为智能体的全生命周期迭代提供了坚实的数据基础。","\u003Cdiv align=\"center\">\n\n\u003Ca href=\"https:\u002F\u002Fjudgmentlabs.ai\u002F\">\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"assets\u002Flogo_darkmode.svg\">\n    \u003Cimg src=\"assets\u002Flogo_lightmode.svg\" alt=\"Judgment Logo\" width=\"400\" \u002F>\n  \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n\u003Cbr>\n\n## Agent Behavior Monitoring\n\nTrack and judge agent behavior in online and offline setups. 
Set up Sentry-style alerts and analyze agent behaviors at scale.\n\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fjudgeval)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fjudgeval\u002F)\n[![Docs](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocumentation-blue)](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation)\n[![Judgment Cloud](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJudgment%20Cloud-brightgreen)](https:\u002F\u002Fapp.judgmentlabs.ai\u002Fregister)\n[![Self-Host](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSelf--Host-orange)](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation\u002Fself-hosting\u002Fget-started)\n\n[![X](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-X\u002FTwitter-000?logo=x&logoColor=white)](https:\u002F\u002Fx.com\u002FJudgmentLabs)\n[![LinkedIn](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJudgmentLabs_judgeval_readme_006e5e7b2fbb.png)](https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Fjudgmentlabs)\n\n\u003C\u002Fdiv>\n\n## Overview\n\nJudgeval is an open-source Python SDK for agent behavior monitoring. It provides tracing, evaluation, and online monitoring for LLM-powered applications, enabling you to catch failures in real time and improve agents from production data.\n\nTo get started, try one of the [cookbooks](#cookbooks) below or dive into the [docs](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation).\n\n## Why Judgeval\n\n**OpenTelemetry-based tracing** -- Instrument any function with `@Tracer.observe()`. Automatically captures inputs, outputs, and LLM token usage. Built on OpenTelemetry for full compatibility with existing observability stacks.\n\n**Hosted and custom evaluation** -- Run evaluations against Judgment's hosted scorers (faithfulness, answer relevancy, instruction adherence, etc.) 
or define your own `Judge` classes with binary, numeric, or categorical response types.\n\n**Online monitoring** -- Score live production traffic asynchronously with `Tracer.async_evaluate()`. Runs server-side with no latency impact. Configure Slack alerts for failures.\n\n**Custom scorer hosting** -- Upload arbitrary Python scorers to run in secure Firecracker microVMs. Any logic you can express in Python -- LLM-as-a-judge, code checks, multi-step pipelines -- can run as a hosted scorer.\n\n**Dataset management and prompt versioning** -- Store golden evaluation sets, version prompt templates with `{{variable}}` syntax, and tag versions for production\u002Fstaging workflows.\n\n**Broad integrations** -- Auto-instrumentation for OpenAI, Anthropic, Google GenAI, and Together AI. Framework support for LangGraph, OpenLit, and Claude Agent SDK.\n\n## Quickstart\n\nInstall the SDK:\n\n```bash\npip install judgeval\n```\n\nSet your credentials ([create a free account](https:\u002F\u002Fapp.judgmentlabs.ai\u002Fregister) if you don't have keys):\n\n```bash\nexport JUDGMENT_API_KEY=...\nexport JUDGMENT_ORG_ID=...\n```\n\n### Tracing\n\nAdd observability to your agent with two lines of setup:\n\n```python\nfrom judgeval import Tracer, wrap\nfrom openai import OpenAI\n\nTracer.init(project_name=\"my-project\")\nclient = wrap(OpenAI())\n\n@Tracer.observe(span_type=\"tool\")\ndef search(query: str) -> str:\n    results = vector_db.search(query)\n    return results\n\n@Tracer.observe(span_type=\"agent\")\ndef run_agent(question: str) -> str:\n    context = search(question)\n    response = client.chat.completions.create(\n        model=\"gpt-4o-mini\",\n        messages=[{\"role\": \"user\", \"content\": f\"{context}\\n\\n{question}\"}],\n    )\n    return response.choices[0].message.content\n\nrun_agent(\"What is the capital of the United States?\")\n```\n\nAll traces are delivered to your [Judgment dashboard](https:\u002F\u002Fapp.judgmentlabs.ai\u002F):\n\n![Judgment Platform 
Trajectory View](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJudgmentLabs_judgeval_readme_782e534609b0.png)\n\n### Online Monitoring\n\nScore live traffic asynchronously inside any traced function. Evaluations run server-side after the span completes:\n\n```python\n@Tracer.observe(span_type=\"agent\")\ndef run_agent(question: str) -> str:\n    response = client.chat.completions.create(\n        model=\"gpt-4o-mini\",\n        messages=[{\"role\": \"user\", \"content\": question}],\n    )\n    answer = response.choices[0].message.content\n\n    Tracer.async_evaluate(\n        \"answer_relevancy\",\n        {\"input\": question, \"actual_output\": answer},\n    )\n\n    return answer\n```\n\n![Custom Scorer Online ABM](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJudgmentLabs_judgeval_readme_42bceb7a9bcc.png)\n\n### Offline Evaluation\n\nUse the `Judgeval` client to run batch evaluations against hosted scorers:\n\n```python\nfrom judgeval import Judgeval\nfrom judgeval.data import Example\n\nclient = Judgeval(project_name=\"my-project\")\nevaluation = client.evaluation.create()\n\nresults = evaluation.run(\n    examples=[\n        Example.create(\n            input=\"What is 2+2?\",\n            actual_output=\"4\",\n            expected_output=\"4\",\n        ),\n    ],\n    scorers=[\"faithfulness\", \"answer_relevancy\"],\n    eval_run_name=\"nightly-eval\",\n)\n```\n\nResults are returned as `ScoringResult` objects and displayed in the dashboard.\n\n## Custom Judges\n\nDefine your own evaluation logic by subclassing `Judge` with a response type:\n\n```python\nfrom judgeval.judges import Judge\nfrom judgeval.hosted.responses import BinaryResponse\nfrom judgeval.data import Example\n\nclass CorrectnessJudge(Judge[BinaryResponse]):\n    async def score(self, data: Example) -> BinaryResponse:\n        correct = data[\"expected_output\"].lower() in data[\"actual_output\"].lower()\n        return BinaryResponse(\n            value=correct,\n          
  reason=\"Contains expected answer\" if correct else \"Missing expected answer\",\n        )\n```\n\nThree response types are available:\n\n| Type | Value | Use case |\n|:-----|:------|:---------|\n| `BinaryResponse` | `bool` | Pass\u002Ffail checks |\n| `NumericResponse` | `float` | Continuous scores (0.0 -- 1.0) |\n| `CategoricalResponse` | `str` | Classification into defined categories |\n\n### Scaffold and upload via CLI\n\n```bash\njudgeval scorer init -t binary -n CorrectnessJudge\njudgeval scorer upload correctness_judge.py -p my-project\n```\n\nOnce uploaded, your judge runs in a secure Firecracker microVM and can be used with `Tracer.async_evaluate()` for online monitoring.\n\n## Datasets\n\nManage golden evaluation sets through the platform:\n\n```python\nfrom judgeval import Judgeval\nfrom judgeval.data import Example\n\nclient = Judgeval(project_name=\"my-project\")\n\ndataset = client.datasets.create(\n    name=\"golden-set\",\n    examples=[\n        Example.create(input=\"What is 2+2?\", expected_output=\"4\"),\n        Example.create(input=\"Capital of France?\", expected_output=\"Paris\"),\n    ],\n)\n\ndataset = client.datasets.get(name=\"golden-set\")\n```\n\nDatasets support import from JSON\u002FYAML, batch appending, and export.\n\n## Prompt Versioning\n\nVersion and tag prompt templates with `{{variable}}` placeholders:\n\n```python\nclient = Judgeval(project_name=\"my-project\")\n\nprompt = client.prompts.create(\n    name=\"system-prompt\",\n    prompt=\"You are a helpful assistant for {{product}}. 
Answer in {{language}}.\",\n    tags=[\"production\"],\n)\n\nprompt = client.prompts.get(name=\"system-prompt\", tag=\"production\")\ncompiled = prompt.compile(product=\"Acme Search\", language=\"English\")\n```\n\n## Integrations\n\n### LLM Providers\n\nWrap any supported client with `wrap()` for automatic span creation and token\u002Fcost tracking:\n\n```python\nfrom judgeval import wrap\n\nclient = wrap(OpenAI())          # OpenAI\nclient = wrap(Anthropic())       # Anthropic\nclient = wrap(genai.Client())    # Google GenAI\nclient = wrap(Together())        # Together AI\n```\n\n### Frameworks\n\n| Framework | Setup |\n|:----------|:------|\n| LangGraph | `from judgeval.integrations import Langgraph; Langgraph.initialize()` |\n| OpenLit | `from judgeval.integrations import Openlit; Openlit.initialize()` |\n| Claude Agent SDK | `from judgeval.integrations import setup_claude_agent_sdk; setup_claude_agent_sdk()` |\n\n## Cookbooks\n\n| Topic | Notebook | Description |\n|:------|:---------|:------------|\n| Online ABM | [Research Agent](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FJudgmentLabs\u002Fjudgment-cookbook\u002Fblob\u002Fmain\u002Fmonitoring\u002FResearch_Agent_Online_Monitoring.ipynb) | Monitor agent behavior in production |\n| Custom Scorers | [HumanEval](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FJudgmentLabs\u002Fjudgment-cookbook\u002Fblob\u002Fmain\u002Fcustom_scorers\u002FHumanEval_Custom_Scorer.ipynb) | Build custom evaluators for your agents |\n\nBrowse the full [cookbook repository](https:\u002F\u002Fgithub.com\u002FJudgmentLabs\u002Fjudgment-cookbook) or watch [video tutorials](https:\u002F\u002Fwww.youtube.com\u002F@Alexshander-JL).\n\n## Links\n\n- [Documentation](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation)\n- [Judgment Platform](https:\u002F\u002Fapp.judgmentlabs.ai\u002F)\n- [Self-Hosting Guide](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation\u002Fself-hosting\u002Fget-started)\n- 
[Custom Scorers Guide](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation\u002Fevaluation\u002Fcustom-scorers)\n- [Online Evaluation Guide](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation\u002Fperformance\u002Fonline-evals)\n- [Cookbook Repository](https:\u002F\u002Fgithub.com\u002FJudgmentLabs\u002Fjudgment-cookbook)\n- [Video Tutorials](https:\u002F\u002Fwww.youtube.com\u002F@Alexshander-JL)\n\n---\n\nJudgeval is created and maintained by [Judgment Labs](https:\u002F\u002Fjudgmentlabs.ai\u002F).\n","\u003Cdiv align=\"center\">\n\n\u003Ca href=\"https:\u002F\u002Fjudgmentlabs.ai\u002F\">\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"assets\u002Flogo_darkmode.svg\">\n    \u003Cimg src=\"assets\u002Flogo_lightmode.svg\" alt=\"Judgment Logo\" width=\"400\" \u002F>\n  \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n\u003Cbr>\n\n## 代理行为监控\n\n在在线和离线环境中跟踪并评估代理行为。设置类似 Sentry 的告警，并大规模分析代理行为。\n\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fjudgeval)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fjudgeval\u002F)\n[![文档](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocumentation-blue)](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation)\n[![Judgment Cloud](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJudgment%20Cloud-brightgreen)](https:\u002F\u002Fapp.judgmentlabs.ai\u002Fregister)\n[![自托管](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSelf--Host-orange)](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation\u002Fself-hosting\u002Fget-started)\n\n[![X](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-X\u002FTwitter-000?logo=x&logoColor=white)](https:\u002F\u002Fx.com\u002FJudgmentLabs)\n[![LinkedIn](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJudgmentLabs_judgeval_readme_006e5e7b2fbb.png)](https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Fjudgmentlabs)\n\n\u003C\u002Fdiv>\n\n## 概述\n\nJudgeval 是一个用于代理行为监控的开源 Python SDK。它为由 LLM 
驱动的应用程序提供追踪、评估和在线监控功能，使您能够实时捕捉故障，并基于生产数据改进代理。\n\n要开始使用，请尝试下面的 [教程](#cookbooks) 之一，或深入阅读 [文档](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation)。\n\n## 为什么选择 Judgeval\n\n**基于 OpenTelemetry 的追踪** -- 使用 `@Tracer.observe()` 装饰任何函数。自动捕获输入、输出以及 LLM 的 token 使用情况。基于 OpenTelemetry 构建，与现有可观测性堆栈完全兼容。\n\n**托管与自定义评估** -- 可以针对 Judgment 托管的评分器（如忠实度、答案相关性、指令遵循等）运行评估，也可以定义自己的 `Judge` 类，支持二元、数值或分类类型的响应。\n\n**在线监控** -- 使用 `Tracer.async_evaluate()` 异步对实时生产流量进行评分。在服务器端运行，不会影响延迟。可配置 Slack 告警以应对失败情况。\n\n**自定义评分器托管** -- 将任意 Python 评分器上传至安全的 Firecracker 微虚拟机中运行。任何可以用 Python 表达的逻辑——例如将 LLM 用作评判者、代码检查、多步骤流水线等——都可以作为托管评分器运行。\n\n**数据集管理和提示版本控制** -- 存储黄金评估集，使用 `{{variable}}` 语法为提示模板添加版本号，并为生产\u002F预发布工作流打上标签。\n\n**广泛的集成支持** -- 自动化对 OpenAI、Anthropic、Google GenAI 和 Together AI 的插桩。同时支持 LangGraph、OpenLit 和 Claude Agent SDK 等框架。\n\n## 快速入门\n\n安装 SDK：\n\n```bash\npip install judgeval\n```\n\n设置您的凭据（如果您没有密钥，请[创建一个免费账户](https:\u002F\u002Fapp.judgmentlabs.ai\u002Fregister)）：\n\n```bash\nexport JUDGMENT_API_KEY=...\nexport JUDGMENT_ORG_ID=...\n```\n\n### 追踪\n\n只需两行代码即可为您的代理添加可观测性：\n\n```python\nfrom judgeval import Tracer, wrap\nfrom openai import OpenAI\n\nTracer.init(project_name=\"my-project\")\nclient = wrap(OpenAI())\n\n@Tracer.observe(span_type=\"tool\")\ndef search(query: str) -> str:\n    results = vector_db.search(query)\n    return results\n\n@Tracer.observe(span_type=\"agent\")\ndef run_agent(question: str) -> str:\n    context = search(question)\n    response = client.chat.completions.create(\n        model=\"gpt-4o-mini\",\n        messages=[{\"role\": \"user\", \"content\": f\"{context}\\n\\n{question}\"}],\n    )\n    return response.choices[0].message.content\n\nrun_agent(\"美国的首都是哪里？\")\n```\n\n所有追踪数据都会被发送到您的 [Judgment 控制台](https:\u002F\u002Fapp.judgmentlabs.ai\u002F)：\n\n![Judgment 平台轨迹视图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJudgmentLabs_judgeval_readme_782e534609b0.png)\n\n### 在线监控\n\n可以在任何已追踪的函数内部异步对实时流量进行评分。评估将在 span 
完成后在服务器端运行：\n\n```python\n@Tracer.observe(span_type=\"agent\")\ndef run_agent(question: str) -> str:\n    response = client.chat.completions.create(\n        model=\"gpt-4o-mini\",\n        messages=[{\"role\": \"user\", \"content\": question}],\n    )\n    answer = response.choices[0].message.content\n\n    Tracer.async_evaluate(\n        \"answer_relevancy\",\n        {\"input\": question, \"actual_output\": answer},\n    )\n\n    return answer\n```\n\n![自定义评分器在线 ABM](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJudgmentLabs_judgeval_readme_42bceb7a9bcc.png)\n\n### 离线评估\n\n使用 `Judgeval` 客户端对托管评分器执行批量评估：\n\n```python\nfrom judgeval import Judgeval\nfrom judgeval.data import Example\n\nclient = Judgeval(project_name=\"my-project\")\nevaluation = client.evaluation.create()\n\nresults = evaluation.run(\n    examples=[\n        Example.create(\n            input=\"2+2 等于多少？\",\n            actual_output=\"4\",\n            expected_output=\"4\",\n        ),\n    ],\n    scorers=[\"faithfulness\", \"answer_relevancy\"],\n    eval_run_name=\"nightly-eval\",\n)\n```\n\n结果将以 `ScoringResult` 对象的形式返回，并显示在控制台中。\n\n## 自定义评判者\n\n通过继承 `Judge` 类并指定响应类型来定义您自己的评估逻辑：\n\n```python\nfrom judgeval.judges import Judge\nfrom judgeval.hosted.responses import BinaryResponse\nfrom judgeval.data import Example\n\nclass CorrectnessJudge(Judge[BinaryResponse]):\n    async def score(self, data: Example) -> BinaryResponse:\n        correct = data[\"expected_output\"].lower() in data[\"actual_output\"].lower()\n        return BinaryResponse(\n            value=correct,\n            reason=\"包含预期答案\" if correct else \"缺少预期答案\",\n        )\n```\n\n目前有三种响应类型可供选择：\n\n| 类型 | 值 | 使用场景 |\n|:-----|:------|:---------|\n| `BinaryResponse` | `bool` | 通过\u002F不通过检查 |\n| `NumericResponse` | `float` | 连续分数（0.0 — 1.0）|\n| `CategoricalResponse` | `str` | 分类到预定义类别 |\n\n### 使用 CLI 搭建并上传\n\n```bash\njudgeval scorer init -t binary -n CorrectnessJudge\njudgeval scorer upload correctness_judge.py -p 
my-project\n```\n\n上传完成后，您的评判者将在安全的 Firecracker 微虚拟机中运行，并可通过 `Tracer.async_evaluate()` 用于在线监控。\n\n## 数据集\n\n通过平台管理黄金评估集：\n\n```python\nfrom judgeval import Judgeval\nfrom judgeval.data import Example\n\nclient = Judgeval(project_name=\"my-project\")\n\ndataset = client.datasets.create(\n    name=\"golden-set\",\n    examples=[\n        Example.create(input=\"2+2 等于多少？\", expected_output=\"4\"),\n        Example.create(input=\"法国的首都是哪里？\", expected_output=\"巴黎\"),\n    ],\n)\n\ndataset = client.datasets.get(name=\"golden-set\")\n```\n\n数据集支持从 JSON\u002FYAML 导入、批量追加和导出。\n\n## 提示模板版本控制\n\n使用 `{{variable}}` 占位符对提示模板进行版本管理和标记：\n\n```python\nclient = Judgeval(project_name=\"my-project\")\n\nprompt = client.prompts.create(\n    name=\"system-prompt\",\n    prompt=\"你是一位针对{{product}}的助手。请用{{language}}作答。\",\n    tags=[\"production\"],\n)\n\nprompt = client.prompts.get(name=\"system-prompt\", tag=\"production\")\ncompiled = prompt.compile(product=\"Acme Search\", language=\"English\")\n```\n\n## 集成\n\n### 大模型提供商\n\n使用 `wrap()` 包装任何受支持的客户端，即可自动创建追踪跨度并跟踪 token 数量与成本：\n\n```python\nfrom judgeval import wrap\n\nclient = wrap(OpenAI())          # OpenAI\nclient = wrap(Anthropic())       # Anthropic\nclient = wrap(genai.Client())    # Google GenAI\nclient = wrap(Together())        # Together AI\n```\n\n### 框架\n\n| 框架         | 设置                     |\n|:-------------|:-------------------------|\n| LangGraph    | `from judgeval.integrations import Langgraph; Langgraph.initialize()` |\n| OpenLit      | `from judgeval.integrations import Openlit; Openlit.initialize()` |\n| Claude Agent SDK | `from judgeval.integrations import setup_claude_agent_sdk; setup_claude_agent_sdk()` |\n\n## 说明书\n\n| 主题           | 笔记本                 | 描述                                   |\n|:---------------|:-----------------------|:---------------------------------------|\n| 在线 ABM（代理行为监控） | [Research 
Agent](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FJudgmentLabs\u002Fjudgment-cookbook\u002Fblob\u002Fmain\u002Fmonitoring\u002FResearch_Agent_Online_Monitoring.ipynb) | 监控生产环境中的代理行为 |\n| 自定义评分器   | [HumanEval](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FJudgmentLabs\u002Fjudgment-cookbook\u002Fblob\u002Fmain\u002Fcustom_scorers\u002FHumanEval_Custom_Scorer.ipynb) | 为你的代理构建自定义评估工具 |\n\n浏览完整的[说明书仓库](https:\u002F\u002Fgithub.com\u002FJudgmentLabs\u002Fjudgment-cookbook)，或观看[视频教程](https:\u002F\u002Fwww.youtube.com\u002F@Alexshander-JL)。\n\n## 链接\n\n- [文档](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation)\n- [Judgment 平台](https:\u002F\u002Fapp.judgmentlabs.ai\u002F)\n- [自托管指南](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation\u002Fself-hosting\u002Fget-started)\n- [自定义评分器指南](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation\u002Fevaluation\u002Fcustom-scorers)\n- [在线评估指南](https:\u002F\u002Fdocs.judgmentlabs.ai\u002Fdocumentation\u002Fperformance\u002Fonline-evals)\n- [说明书仓库](https:\u002F\u002Fgithub.com\u002FJudgmentLabs\u002Fjudgment-cookbook)\n- [视频教程](https:\u002F\u002Fwww.youtube.com\u002F@Alexshander-JL)\n\n---\n\nJudgeval 由 [Judgment Labs](https:\u002F\u002Fjudgmentlabs.ai\u002F) 创建并维护。","# Judgeval 快速上手指南\n\nJudgeval 是一个开源的 Python SDK，专为监控和评估 LLM 驱动的智能体（Agent）行为而设计。它支持基于 OpenTelemetry 的链路追踪、在线\u002F离线评估以及自定义评分逻辑，帮助开发者实时发现故障并优化生产环境中的 Agent 表现。\n\n## 环境准备\n\n*   **系统要求**：Python 3.8 及以上版本。\n*   **前置依赖**：\n    *   已注册的 Judgment Labs 账户（用于获取 API 密钥）。\n    *   常用的 LLM 客户端库（如 `openai`, `anthropic` 等，根据实际需求安装）。\n*   **网络提示**：由于 Judgment Labs 服务托管在海外，国内开发者使用时请确保网络环境能够访问 `app.judgmentlabs.ai` 及相关 API 端点。目前官方暂未提供中国镜像源。\n\n## 安装步骤\n\n1.  使用 pip 安装 Judgeval SDK：\n\n```bash\npip install judgeval\n```\n\n2.  
配置环境变量。请登录 [Judgment Cloud](https:\u002F\u002Fapp.judgmentlabs.ai\u002Fregister) 注册免费账户并获取 `API Key` 和 `Organization ID`，然后在终端执行：\n\n```bash\nexport JUDGMENT_API_KEY=your_api_key_here\nexport JUDGMENT_ORG_ID=your_org_id_here\n```\n\n## 基本使用\n\n### 1. 链路追踪 (Tracing)\n\n只需两行代码即可为现有的 LLM 调用添加可观测性，自动捕获输入、输出及 Token 消耗情况。\n\n```python\nfrom judgeval import Tracer, wrap\nfrom openai import OpenAI\n\n# 初始化项目\nTracer.init(project_name=\"my-project\")\n\n# 包装 LLM 客户端以自动追踪\nclient = wrap(OpenAI())\n\n@Tracer.observe(span_type=\"tool\")\ndef search(query: str) -> str:\n    # 模拟向量数据库搜索\n    return \"Context information related to \" + query\n\n@Tracer.observe(span_type=\"agent\")\ndef run_agent(question: str) -> str:\n    context = search(question)\n    response = client.chat.completions.create(\n        model=\"gpt-4o-mini\",\n        messages=[{\"role\": \"user\", \"content\": f\"{context}\\n\\n{question}\"}],\n    )\n    return response.choices[0].message.content\n\n# 运行智能体\nrun_agent(\"What is the capital of the United States?\")\n```\n*执行后，所有追踪数据将自动同步至您的 Judgment Dashboard 进行可视化分析。*\n\n### 2. 在线监控 (Online Monitoring)\n\n在 traced 函数中异步运行评估，对生产环境的实时流量进行打分，且不会增加请求延迟。\n\n```python\n@Tracer.observe(span_type=\"agent\")\ndef run_agent(question: str) -> str:\n    response = client.chat.completions.create(\n        model=\"gpt-4o-mini\",\n        messages=[{\"role\": \"user\", \"content\": question}],\n    )\n    answer = response.choices[0].message.content\n\n    # 异步评估答案相关性\n    Tracer.async_evaluate(\n        \"answer_relevancy\",\n        {\"input\": question, \"actual_output\": answer},\n    )\n\n    return answer\n```\n\n### 3. 
离线评估 (Offline Evaluation)\n\n使用 `Judgeval` 客户端针对预设数据集批量运行评估（例如使用内置的“忠实度”或“答案相关性”评分器）。\n\n```python\nfrom judgeval import Judgeval\nfrom judgeval.data import Example\n\nclient = Judgeval(project_name=\"my-project\")\nevaluation = client.evaluation.create()\n\nresults = evaluation.run(\n    examples=[\n        Example.create(\n            input=\"What is 2+2?\",\n            actual_output=\"4\",\n            expected_output=\"4\",\n        ),\n    ],\n    scorers=[\"faithfulness\", \"answer_relevancy\"],\n    eval_run_name=\"nightly-eval\",\n)\n```\n\n### 4. 自定义评分器 (Custom Judges)\n\n您可以定义自己的评估逻辑。以下是一个简单的二元（通过\u002F失败）评分器示例：\n\n```python\nfrom judgeval.judges import Judge\nfrom judgeval.hosted.responses import BinaryResponse\nfrom judgeval.data import Example\n\nclass CorrectnessJudge(Judge[BinaryResponse]):\n    async def score(self, data: Example) -> BinaryResponse:\n        correct = data[\"expected_output\"].lower() in data[\"actual_output\"].lower()\n        return BinaryResponse(\n            value=correct,\n            reason=\"Contains expected answer\" if correct else \"Missing expected answer\",\n        )\n```\n\n定义完成后，可通过 CLI 工具上传至云端安全沙箱运行：\n```bash\njudgeval scorer init -t binary -n CorrectnessJudge\njudgeval scorer upload correctness_judge.py -p my-project\n```","某电商公司正在开发一个基于大模型的智能客服 Agent，用于自动处理用户的退货咨询与订单查询。\n\n### 没有 judgeval 时\n- **黑盒运行难排查**：当用户反馈回答错误时，开发团队无法复现完整的调用链路，难以定位是检索工具出错还是模型生成偏差。\n- **评估全靠人工**：每天数千条对话只能靠抽检，无法量化“回答准确性”或“指令遵循度”，导致坏案例流入生产环境。\n- **故障响应滞后**：出现严重幻觉（如编造退款政策）时，往往要等到用户投诉爆发才能察觉，缺乏实时预警机制。\n- **迭代缺乏依据**：优化 Prompt 或微调模型时，缺少统一的“金标准”测试集来验证新版本是否真的比旧版本更好。\n\n### 使用 judgeval 后\n- **全链路可观测**：通过 `@Tracer.observe` 自动记录每次交互的输入、输出及 Token 消耗，在 Dashboard 中一键还原故障现场，秒级定位问题根源。\n- **自动化批量评测**：利用内置的“忠实度”和“相关性”评分器，对历史和新产生的对话进行异步打分，将评估覆盖率从 5% 提升至 100%。\n- **实时异常告警**：配置 Slack 警报规则，一旦检测到涉及敏感政策的幻觉回答，系统立即通知工程师介入，将风险控制在分钟级。\n- **数据驱动迭代**：建立版本化的黄金测试集，每次代码提交自动运行回归测试，用客观分数证明新策略的有效性，告别盲目调优。\n\njudgeval 将原本不可控的 Agent 
黑盒变成了可量化、可监控、可持续进化的透明系统，让团队能放心地将智能客服大规模推向生产环境。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJudgmentLabs_judgeval_782e5346.png","JudgmentLabs","Judgment Labs","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FJudgmentLabs_b29eea65.png","",null,"https:\u002F\u002Fgithub.com\u002FJudgmentLabs",[82],{"name":83,"color":84,"percentage":85},"Python","#3572A5",100,1018,88,"2026-04-02T03:11:37","Apache-2.0","未说明",{"notes":92,"python":90,"dependencies":93},"该工具主要作为 Python SDK 使用，依赖云端服务（Judgment Cloud）或自托管后端进行评估和监控。自定义评分器在安全的 Firecracker microVM 中运行。需配置 JUDGMENT_API_KEY 和 JUDGMENT_ORG_ID 环境变量。支持通过 pip 安装。",[94,95,96,97,98,99],"openai","anthropic","google-generativeai","together","langgraph","opentelemetry",[13,54,26,15],[102,98,103,104,105,106,107,94,108,109,110,111,112,113,114],"langchain","llama-index","llm","llm-evaluation","llm-observability","open-source","prompt-engineering","agent","agentic-ai","agents","grpo","reinforcement-learning","rl","2026-03-27T02:49:30.150509","2026-04-06T11:32:01.867696",[118,123,128,133,138,143],{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},11085,"运行评估时遇到 'Error 422: Project limit exceeded' 错误怎么办？","这通常是因为在调用 `run_evaluation()` 时未传递 `project_name` 参数，导致系统默认使用名为 `default_project` 的项目。如果您之前已经创建过同名项目，就会被视为尝试创建第二个项目从而超出限制。\n\n解决方案：\n1. 在调用函数时显式传入唯一的 `project_name` 参数。\n2. 注意：免费层的项目数量限制已提升至 5 个。\n3. 
Make sure the documentation clearly states that omitting the project name causes the default name to be used.

Source: https://github.com/JudgmentLabs/judgeval/issues/443

## FAQ

**Q: The self-hosting documentation link returns a 404?**

The self-hosting link in the README contains a typo: it uses an underscore (`get_started`) where the correct URL uses a hyphen (`get-started`). The working address is:

https://docs.judgmentlabs.ai/documentation/self-hosting/get-started

Replace the old link with this one to reach the documentation.

Source: https://github.com/JudgmentLabs/judgeval/issues/456

**Q: `NameError: AnswerRelevancyScorer is not defined` when tracing code with online evaluation?**

The scorer class was never imported. Explicitly import the classes you need from the `judgeval.scorers` module before using them:

```python
from judgeval.scorers import AnswerRelevancyScorer, ToolOrderScorer
```

With the import in place, the following works as expected:

```python
scorers=[AnswerRelevancyScorer(threshold=0.5)]
```

Source: https://github.com/JudgmentLabs/judgeval/issues/422

**Q: How do I resolve Python version or dependency problems when first setting up Judgeval?**

Common setup issues and their fixes:

1. **Python version**: Judgeval requires Python 3.8 or later. Python 3.7 causes import errors; upgrade your interpreter.
2. **Virtual environment**: Create one with `venv`. If activation appears to fail on macOS, verify manually that the intended Python interpreter is actually being used.
3. **Package name**: `tavily` and `tavily-python` are different packages; installing the wrong one (`pip install tavily`) breaks imports. Uninstall it and install the correct package:
   ```bash
   pip uninstall tavily
   pip install tavily-python
   ```
4. **Import paths**: Keep the component import paths straight — for example, `from judgeval.common.tracer import Tracer` and `from judgeval import JudgmentClient` serve different purposes.

Source: https://github.com/JudgmentLabs/judgeval/issues/363

**Q: Printing a `ScoringResult` shows a field name that differs from the one the code actually exposes (`scorer_data` vs `scorers_data`)?**

This is a confirmed display bug. The object's string representation incorrectly shows `scorer_data`, but programmatic access must use `scorers_data` (plural).

- **Symptom**: `print(results[0])` shows `scorer_data=[...]`
- **Correct usage**: code must access `results[0].scorers_data`; using the singular name raises `AttributeError`.
- **Status**: the maintainers have confirmed the issue and will fix the inconsistency in an upcoming release so that the printed output matches the real field name.

Source: https://github.com/JudgmentLabs/judgeval/issues/429

**Q: The `judgeval.evals.Evaluation` class mentioned in some documentation cannot be found?**

The `judgeval.evals.Evaluation` class/module has been removed in recent versions. You will hit this if you follow old examples or unofficial documentation.

Solution:
1. Do not use the deprecated `Evaluation` base class.
2. Follow the latest official documentation or Cookbooks and implement evaluation logic with custom functions or direct client method calls.
3. Make sure your code samples come from up-to-date official resources to avoid obsolete APIs.

Source: https://github.com/JudgmentLabs/judgeval/issues/357

## Releases

### v1.0.2 (2026-03-27)
PyPI: https://pypi.org/project/judgeval/1.0.2/

* Fix: don't attach pending trace evals when emitting partial spans, by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/729

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v1.0.1...v1.0.2

### v1.0.1 (2026-03-27)
PyPI: https://pypi.org/project/judgeval/1.0.1/

* Fix: add packaging dependencies, by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/728

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v1.0.0...v1.0.1

### v1.0.0 (2026-03-26)
PyPI: https://pypi.org/project/judgeval/1.0.0/

* Ahh/fix CLI, by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/722
* Revert multifile CLI, by @justinsheu in https://github.com/JudgmentLabs/judgeval/pull/723
* Fix: serialize optional pydantic dependencies, by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/724
* Chore: add public examples directory, by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/725
* Chore: restore tracer.wrap alias, by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/726
* Judgeval v1 docstrings (JUD-4491), by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/727

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.32.1...v1.0.0

### v0.32.1 (2026-03-24)
PyPI: https://pypi.org/project/judgeval/0.32.1/

* JUD-4184: judgment client user ID, by @alanzhang25 in https://github.com/JudgmentLabs/judgeval/pull/718
* Fix e2e tests, by @justinsheu in https://github.com/JudgmentLabs/judgeval/pull/721
* Manual release flow, by @justinsheu in https://github.com/JudgmentLabs/judgeval/pull/720

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.32.0...v0.32.1

### v0.32.0 (2026-03-18)
PyPI: https://pypi.org/project/judgeval/0.32.0/

* Justin/code eval categories, by @justinsheu in https://github.com/JudgmentLabs/judgeval/pull/715

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.31.0...v0.32.0

### v0.31.0 (2026-03-16)
PyPI: https://pypi.org/project/judgeval/0.31.0/

* chore: dangerously allow non-root override (JUD-4197), by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/710
* JUD-3738 multifile judges, by @justinsheu in https://github.com/JudgmentLabs/judgeval/pull/712

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.30.0...v0.31.0

### v0.30.0 (2026-03-10)
PyPI: https://pypi.org/project/judgeval/0.30.0/

* Add `images.generate()` support, by @justinsheu in https://github.com/JudgmentLabs/judgeval/pull/711
* Improve autodocs workflow, by @adivate2021 in https://github.com/JudgmentLabs/judgeval/pull/709

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.29.0...v0.30.0

### v0.29.0 (2026-03-04)
PyPI: https://pypi.org/project/judgeval/0.29.0/

* Update local scorers to use the Judge type, by @justinsheu in https://github.com/JudgmentLabs/judgeval/pull/708
* Remove trace prompt scorer (JUD-4116), by @adivate2021 in https://github.com/JudgmentLabs/judgeval/pull/706

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.28.1...v0.29.0

### v0.28.1 (2026-02-28)
PyPI: https://pypi.org/project/judgeval/0.28.1/

* Judgeval Claude tool spans (JUD-3934), by @Mandolaro in https://github.com/JudgmentLabs/judgeval/pull/704
* Update Judgeval e2e test endpoint signatures to better match HTTP standards, by @SamGearou in https://github.com/JudgmentLabs/judgeval/pull/703
* JUD-3799: Judge unification, by @alanzhang25 in https://github.com/JudgmentLabs/judgeval/pull/700

New contributors: @SamGearou made their first contribution in https://github.com/JudgmentLabs/judgeval/pull/703

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.28.0...v0.28.1

### v0.28.0 (2026-02-19)
PyPI: https://pypi.org/project/judgeval/0.28.0/

* chore: deprecate legacy features, by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/695
* Local scoring, by @justinsheu in https://github.com/JudgmentLabs/judgeval/pull/699

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.27.1...v0.28.0

### v0.27.1 (2026-02-10)
PyPI: https://pypi.org/project/judgeval/0.27.1/

* Enabling Partial Emit Spans, by @Ishan-Sinha123 in https://github.com/JudgmentLabs/judgeval/pull/694
* chore: update pre-commit hooks, by @github-actions[bot] in https://github.com/JudgmentLabs/judgeval/pull/626
* Ahh/custom scorer types, by @alanzhang25 in https://github.com/JudgmentLabs/judgeval/pull/689

New contributors: @Ishan-Sinha123 made their first contribution in https://github.com/JudgmentLabs/judgeval/pull/694

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.27.0...v0.27.1

### v0.27.0 (2026-02-06)
PyPI: https://pypi.org/project/judgeval/0.27.0/

* Add graceful handling of no project id (JUD-3682), by @Mandolaro in https://github.com/JudgmentLabs/judgeval/pull/686
* Justin/stagin pypi, by @justinsheu in https://github.com/JudgmentLabs/judgeval/pull/690

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.26.2...v0.27.0

### v0.26.2 (2026-02-05)
PyPI: https://pypi.org/project/judgeval/0.26.2/

* Allow choices for custom scorers, by @adivate2021 in https://github.com/JudgmentLabs/judgeval/pull/688

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.26.1...v0.26.2

### v0.26.1 (2026-02-04)
PyPI: https://pypi.org/project/judgeval/0.26.1/

* Aaryan/custom scorer fixes (JUD-3553), by @adivate2021 in https://github.com/JudgmentLabs/judgeval/pull/681
* Add workflow to trigger docs update, by @adivate2021 in https://github.com/JudgmentLabs/judgeval/pull/683
* Add use_default_span_processor param, by @justinsheu in https://github.com/JudgmentLabs/judgeval/pull/684
* Remove bucketing, by @adivate2021 in https://github.com/JudgmentLabs/judgeval/pull/682

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.26.0...v0.26.1

### v0.26.0 (2026-01-29)
PyPI: https://pypi.org/project/judgeval/0.26.0/

* Move Scorer to Project-level (JUD-3524), by @Mandolaro in https://github.com/JudgmentLabs/judgeval/pull/669
* Fix tests, by @adivate2021 in https://github.com/JudgmentLabs/judgeval/pull/675
* Fix tracer behavior, by @Mandolaro in https://github.com/JudgmentLabs/judgeval/pull/679
* Another e2e fix, by @adivate2021 in https://github.com/JudgmentLabs/judgeval/pull/680
* Abhishek/jud 3544 add support for wrapping context manager, by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/678

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.25.1...v0.26.0

### v0.25.1 (2026-01-28)
PyPI: https://pypi.org/project/judgeval/0.25.1/

* Fix Python 3.10 (JUD-3547), by @adivate2021 in https://github.com/JudgmentLabs/judgeval/pull/674
* feat: override project from tracer (JUD-3539), by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/676

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.25.0...v0.25.1

### v0.25.0 (2026-01-27)
PyPI: https://pypi.org/project/judgeval/0.25.0/

* Custom scorer name fix, by @adivate2021 in https://github.com/JudgmentLabs/judgeval/pull/672
* Ahh/observe support for generators, by @alanzhang25 in https://github.com/JudgmentLabs/judgeval/pull/673

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.24.3...v0.25.0

### v0.24.3 (2026-01-26)
PyPI: https://pypi.org/project/judgeval/0.24.3/

* Fix random (JUD-3484), by @Mandolaro in https://github.com/JudgmentLabs/judgeval/pull/667
* Feat: tags, by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/666
* Changes to scorers and api generation (JUD-3332), by @adivate2021 in https://github.com/JudgmentLabs/judgeval/pull/661
* revert: version.py, by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/668

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.24.2...v0.24.3

### v0.24.2 (2026-01-23)
PyPI: https://pypi.org/project/judgeval/0.24.2/

* Sessions, by @justinsheu in https://github.com/JudgmentLabs/judgeval/pull/663

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.24.1...v0.24.2

### v0.24.1 (2026-01-22)
PyPI: https://pypi.org/project/judgeval/0.24.1/

* Fix prompt scorer errors, by @adivate2021 in https://github.com/JudgmentLabs/judgeval/pull/659
* Feat: record input output flags to control observe behavior, by @abhishekg999 in https://github.com/JudgmentLabs/judgeval/pull/664

**Full Changelog**: https://github.com/JudgmentLabs/judgeval/compare/v0.24.0...v0.24.1
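The version and package checks from the setup FAQ can be folded into a small preflight helper that reports all problems at once instead of failing later with a cryptic import error. This is an illustrative sketch, not part of judgeval: `preflight` is a hypothetical helper name, while the `(3, 8)` floor and the `judgeval` package name come from the FAQ entry.

```python
import importlib.util
import sys

def preflight(min_version=(3, 8), package="judgeval"):
    """Collect setup problems up front; returns an empty list when the environment is OK."""
    problems = []
    if sys.version_info < min_version:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {min_version[0]}.{min_version[1]}+ required, found {found}")
    # find_spec checks whether the package is importable without actually importing it.
    if importlib.util.find_spec(package) is None:
        problems.append(f"package '{package}' is not installed")
    return problems

for problem in preflight():
    print("setup problem:", problem)
```

Running this inside the virtual environment you intend to use also doubles as the FAQ's "verify which interpreter is active" check, since `sys.version_info` reflects the interpreter that actually executes the script.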
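The `scorer_data` / `scorers_data` mismatch from the FAQ is purely a cosmetic repr bug, and the failure mode is easy to reproduce in isolation. The toy class below is a hypothetical stand-in, not judgeval's real `ScoringResult`; it only demonstrates why the attribute name in a hand-written `__repr__` cannot be trusted over the attribute that actually exists.

```python
from dataclasses import dataclass, field

@dataclass(repr=False)
class ScoringResult:
    # The real attribute is plural, matching what the API expects you to access.
    scorers_data: list = field(default_factory=list)

    def __repr__(self):
        # Simulates the bug from issue #429: the repr drifted to the singular name.
        return f"ScoringResult(scorer_data={self.scorers_data!r})"

result = ScoringResult(scorers_data=["AnswerRelevancy: 0.92"])
print(result)               # misleading: the repr claims a field called scorer_data
print(result.scorers_data)  # the only attribute that actually exists
```

Accessing `result.scorer_data` here raises `AttributeError`, exactly as in the issue: go by the documented attribute name, not the printed form.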