[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-huggingface--evaluate":3,"tool-huggingface--evaluate":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":75,"owner_website":80,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":95,"env_os":96,"env_gpu":96,"env_ram":96,"env_deps":97,"category_tags":105,"github_topics":106,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":109,"updated_at":110,"faqs":111,"releases":140},2680,"huggingface\u002Fevaluate","evaluate","🤗 Evaluate: A library for easily evaluating machine learning models and datasets.","evaluate 是 Hugging Face 推出的一款开源库，旨在让机器学习模型和数据集的评估过程变得更加简单、标准化。在 AI 开发中，不同框架间的指标计算往往繁琐且难以统一，evaluate 通过提供数十种涵盖自然语言处理到计算机视觉的常用指标（如准确率、F1 分数等），有效解决了这一痛点。无论是使用 PyTorch、TensorFlow 还是 JAX，开发者只需一行代码即可加载所需指标并快速获得评估结果。\n\n这款工具非常适合 AI 研究人员、数据科学家以及机器学习工程师使用，尤其是那些需要频繁对比模型性能或复现论文结果的团队。其独特亮点在于强大的社区生态与规范化设计：所有指标均托管在 Hugging Face Hub 上，支持用户轻松创建并分享自定义评估模块；同时，它内置了类型检查机制以防输入错误，并为每个指标配备了详细的“指标卡片”，清晰说明数值含义、适用范围及局限性，极大地提升了实验的可复现性与协作效率。对于希望专注于模型优化而非重复编写评估代码的用户而言，evaluate 是一个实用且友好的得力助手。","\u003Cp align=\"center\">\r\n    \u003Cbr>\r\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_evaluate_readme_e39d94f8c2d5.png\" width=\"400\"\u002F>\r\n    
\u003Cbr>\r\n\u003C\u002Fp>\r\n\r\n\u003Cp align=\"center\">\r\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Factions\u002Fworkflows\u002Fci.yml?query=branch%3Amain\">\r\n        \u003Cimg alt=\"Build\" src=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg?branch=main\">\r\n    \u003C\u002Fa>\r\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fblob\u002Fmaster\u002FLICENSE\">\r\n        \u003Cimg alt=\"GitHub\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fhuggingface\u002Fevaluate.svg?color=blue\">\r\n    \u003C\u002Fa>\r\n    \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fevaluate\u002Findex\">\r\n        \u003Cimg alt=\"Documentation\" src=\"https:\u002F\u002Fimg.shields.io\u002Fwebsite\u002Fhttp\u002Fhuggingface.co\u002Fdocs\u002Fevaluate\u002Findex.svg?down_color=red&down_message=offline&up_message=online\">\r\n    \u003C\u002Fa>\r\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Freleases\">\r\n        \u003Cimg alt=\"GitHub release\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Fhuggingface\u002Fevaluate.svg\">\r\n    \u003C\u002Fa>\r\n    \u003Ca href=\"CODE_OF_CONDUCT.md\">\r\n        \u003Cimg alt=\"Contributor Covenant\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FContributor%20Covenant-2.0-4baaaa.svg\">\r\n    \u003C\u002Fa>\r\n\u003C\u002Fp>\r\n\r\n\r\n\r\n> **Tip:** For more recent evaluation approaches, for example for evaluating LLMs, we recommend our newer and more actively maintained library [LightEval](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Flighteval).\r\n\r\n\r\n\r\n🤗 Evaluate is a library that makes evaluating and comparing models and reporting their performance easier and more standardized. 
\r\n\r\nIt currently contains:\r\n\r\n- **implementations of dozens of popular metrics**: the existing metrics cover a variety of tasks spanning from NLP to Computer Vision, and include dataset-specific metrics for datasets. With a simple command like `accuracy = load(\"accuracy\")`, get any of these metrics ready to use for evaluating a ML model in any framework (Numpy\u002FPandas\u002FPyTorch\u002FTensorFlow\u002FJAX).\r\n- **comparisons and measurements**: comparisons are used to measure the difference between models and measurements are tools to evaluate datasets.\r\n- **an easy way of adding new evaluation modules to the 🤗 Hub**: you can create new evaluation modules and push them to a dedicated Space in the 🤗 Hub with `evaluate-cli create [metric name]`, which allows you to see easily compare different metrics and their outputs for the same sets of references and predictions.\r\n\r\n[🎓 **Documentation**](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fevaluate\u002F)\r\n\r\n🔎 **Find a [metric](https:\u002F\u002Fhuggingface.co\u002Fevaluate-metric), [comparison](https:\u002F\u002Fhuggingface.co\u002Fevaluate-comparison), [measurement](https:\u002F\u002Fhuggingface.co\u002Fevaluate-measurement) on the Hub**\r\n\r\n[🌟 **Add a new evaluation module**](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fevaluate\u002F)\r\n\r\n🤗 Evaluate also has lots of useful features like:\r\n\r\n- **Type checking**: the input types are checked to make sure that you are using the right input formats for each metric\r\n- **Metric cards**: each metrics comes with a card that describes the values, limitations and their ranges, as well as providing examples of their usage and usefulness.\r\n- **Community metrics:** Metrics live on the Hugging Face Hub and you can easily add your own metrics for your project or to collaborate with others.\r\n\r\n\r\n# Installation\r\n\r\n## With pip\r\n\r\n🤗 Evaluate can be installed from PyPi and has to be installed in a virtual environment (venv or 
conda for instance)\r\n\r\n```bash\r\npip install evaluate\r\n```\r\n\r\n# Usage\r\n\r\n🤗 Evaluate's main methods are:\r\n\r\n- `evaluate.list_evaluation_modules()` to list the available metrics, comparisons and measurements\r\n- `evaluate.load(module_name, **kwargs)` to instantiate an evaluation module\r\n- `results = module.compute(*kwargs)` to compute the result of an evaluation module\r\n\r\n# Adding a new evaluation module\r\n\r\nFirst install the necessary dependencies to create a new metric with the following command:\r\n```bash\r\npip install evaluate[template]\r\n```\r\nThen you can get started with the following command which will create a new folder for your metric and display the necessary steps:\r\n```bash\r\nevaluate-cli create \"Awesome Metric\"\r\n```\r\nSee this [step-by-step guide](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fevaluate\u002Fcreating_and_sharing) in the documentation for detailed instructions.\r\n\r\n## Credits\r\n\r\nThanks to [@marella](https:\u002F\u002Fgithub.com\u002Fmarella) for letting us use the `evaluate` namespace on PyPi previously used by his [library](https:\u002F\u002Fgithub.com\u002Fmarella\u002Fevaluate).\r\n","\u003Cp align=\"center\">\r\n    \u003Cbr>\r\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_evaluate_readme_e39d94f8c2d5.png\" width=\"400\"\u002F>\r\n    \u003Cbr>\r\n\u003C\u002Fp>\r\n\r\n\u003Cp align=\"center\">\r\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Factions\u002Fworkflows\u002Fci.yml?query=branch%3Amain\">\r\n        \u003Cimg alt=\"构建\" src=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg?branch=main\">\r\n    \u003C\u002Fa>\r\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fblob\u002Fmaster\u002FLICENSE\">\r\n        \u003Cimg alt=\"GitHub\" 
src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fhuggingface\u002Fevaluate.svg?color=blue\">\r\n    \u003C\u002Fa>\r\n    \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fevaluate\u002Findex\">\r\n        \u003Cimg alt=\"文档\" src=\"https:\u002F\u002Fimg.shields.io\u002Fwebsite\u002Fhttp\u002Fhuggingface.co\u002Fdocs\u002Fevaluate\u002Findex.svg?down_color=red&down_message=离线&up_message=在线\">\r\n    \u003C\u002Fa>\r\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Freleases\">\r\n        \u003Cimg alt=\"GitHub 发布\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Fhuggingface\u002Fevaluate.svg\">\r\n    \u003C\u002Fa>\r\n    \u003Ca href=\"CODE_OF_CONDUCT.md\">\r\n        \u003Cimg alt=\"贡献者公约\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FContributor%20Covenant-2.0-4baaaa.svg\">\r\n    \u003C\u002Fa>\r\n\u003C\u002Fp>\r\n\r\n\r\n\r\n> **提示:** 对于更先进的评估方法，例如针对大型语言模型的评估，我们推荐使用更新且维护更为活跃的库 [LightEval](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Flighteval)。\r\n\r\n\r\n\r\n🤗 Evaluate 是一个使模型评估与比较以及性能报告更加简便和标准化的库。\r\n\r\n目前它包含以下内容：\r\n\r\n- **数十种流行指标的实现**：这些指标涵盖了从自然语言处理到计算机视觉等多种任务，并为特定数据集提供了专用指标。只需简单地执行 `accuracy = load(\"accuracy\")`，即可获得任意一种指标，直接用于在任何框架（Numpy\u002FPandas\u002FPyTorch\u002FTensorFlow\u002FJAX）中评估机器学习模型。\r\n- **比较与度量工具**：比较用于衡量不同模型之间的差异，而度量工具则用于评估数据集。\r\n- **一种简便的方式将新的评估模块添加到 🤗 Hub**：您可以通过 `evaluate-cli create [metric name]` 创建新的评估模块并将其推送到 🤗 Hub 中的专用 Space，从而轻松比较同一组参考数据和预测结果下不同指标及其输出。\r\n\r\n[🎓 **文档**](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fevaluate\u002F)\r\n\r\n🔎 **在 Hub 上查找 [指标](https:\u002F\u002Fhuggingface.co\u002Fevaluate-metric)、[比较](https:\u002F\u002Fhuggingface.co\u002Fevaluate-comparison) 或 [度量](https:\u002F\u002Fhuggingface.co\u002Fevaluate-measurement)**\r\n\r\n[🌟 **添加一个新的评估模块**](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fevaluate\u002F)\r\n\r\n🤗 Evaluate 还具备许多实用功能，例如：\r\n\r\n- **类型检查**：会检查输入类型，以确保您为每种指标使用了正确的输入格式。\r\n- 
**指标卡片**：每个指标都附带一张卡片，详细说明其取值范围、局限性等信息，并提供使用示例及适用场景。\r\n- **社区指标**：指标托管在 Hugging Face Hub 上，您可以轻松添加自己的指标，用于项目或与其他开发者协作。\r\n\r\n\r\n# 安装\r\n\r\n## 使用 pip\r\n\r\n🤗 Evaluate 可以从 PyPI 安装，且必须在虚拟环境（如 venv 或 conda）中进行安装。\r\n\r\n```bash\r\npip install evaluate\r\n```\r\n\r\n# 使用\r\n\r\n🤗 Evaluate 的主要方法包括：\r\n\r\n- `evaluate.list_evaluation_modules()` 用于列出可用的指标、比较和度量工具。\r\n- `evaluate.load(module_name, **kwargs)` 用于实例化一个评估模块。\r\n- `results = module.compute(*kwargs)` 用于计算评估模块的结果。\r\n\r\n# 添加一个新的评估模块\r\n\r\n首先，使用以下命令安装创建新指标所需的依赖项：\r\n```bash\r\npip install evaluate[template]\r\n```\r\n然后，您可以使用以下命令开始操作，该命令将为您的指标创建一个新文件夹，并显示必要的步骤：\r\n```bash\r\nevaluate-cli create \"Awesome Metric\"\r\n```\r\n有关详细说明，请参阅文档中的[分步指南](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fevaluate\u002Fcreating_and_sharing)。\r\n\r\n## 致谢\r\n\r\n感谢 [@marella](https:\u002F\u002Fgithub.com\u002Fmarella)，允许我们在 PyPI 上使用 `evaluate` 命名空间，该命名空间此前曾被他的[库](https:\u002F\u002Fgithub.com\u002Fmarella\u002Fevaluate)所使用。","# 🤗 Evaluate 快速上手指南\n\n🤗 Evaluate 是一个用于简化和标准化机器学习模型评估的库。它提供了数十种常用指标（涵盖 NLP、计算机视觉等任务）的实现，支持在 NumPy、PyTorch、TensorFlow 等任何框架中使用。\n\n> **提示**：如果您需要评估大语言模型（LLM）或寻找更前沿的评估方法，推荐使用 Hugging Face 最新维护的库 [LightEval](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Flighteval)。\n\n## 环境准备\n\n- **操作系统**：Linux, macOS, Windows\n- **Python 版本**：建议 Python 3.8+\n- **前置依赖**：无特殊系统级依赖，但建议在虚拟环境中安装以避免包冲突。\n- **推荐环境**：使用 `venv` 或 `conda` 创建独立的虚拟环境。\n\n## 安装步骤\n\n### 1. 创建并激活虚拟环境（可选但推荐）\n\n```bash\npython -m venv eval_env\nsource eval_env\u002Fbin\u002Factivate  # Linux\u002FmacOS\n# 或\neval_env\\Scripts\\activate     # Windows\n```\n\n### 2. 安装 Evaluate\n\n通过 PyPI 安装基础版本：\n\n```bash\npip install evaluate\n```\n\n> **国内加速建议**：如果遇到下载速度慢的问题，可以使用清华或阿里镜像源：\n> ```bash\n> pip install evaluate -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n### 3. 安装开发模板（仅当需要创建新指标时）\n\n如果您打算贡献或创建自定义评估模块，需安装模板依赖：\n\n```bash\npip install \"evaluate[template]\"\n```\n\n## 基本使用\n\nEvaluate 的核心工作流程分为三步：列出可用模块 -> 加载模块 -> 计算结果。\n\n### 1. 
查看可用指标\n\n列出 Hub 上所有可用的指标、对比工具和测量工具：\n\n```python\nimport evaluate\n\n# 查看所有可用的评估模块\nmodules = evaluate.list_evaluation_modules()\nprint(modules)\n```\n\n### 2. 加载并计算指标\n\n以最常用的 `accuracy`（准确率）为例：\n\n```python\nimport evaluate\n\n# 加载指标\naccuracy = evaluate.load(\"accuracy\")\n\n# 准备预测值和真实标签\npredictions = [0, 1, 1, 0]\nreferences = [0, 1, 0, 0]\n\n# 计算结果\nresults = accuracy.compute(predictions=predictions, references=references)\n\nprint(results)\n# 输出: {'accuracy': 0.75}\n```\n\n该库会自动进行类型检查，确保输入格式符合指标要求，并支持直接传入列表、NumPy 数组或 Pandas Series。\n\n### 3. 创建自定义指标（进阶）\n\n若需添加新的评估模块到 Hugging Face Hub：\n\n```bash\n# 初始化新项目文件夹\nevaluate-cli create \"Awesome Metric\"\n```\n\n执行后将生成标准目录结构，具体开发步骤请参考官方文档中的 [创建与分享指南](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fevaluate\u002Fcreating_and_sharing)。","某 NLP 团队正在迭代一个多语言情感分析模型，需要快速验证不同版本在测试集上的表现并产出标准化报告。\n\n### 没有 evaluate 时\n- **重复造轮子**：每次切换任务（如从准确率切换到 F1 或 BLEU），开发人员都要手动查找公式并用 NumPy 重写计算逻辑，极易引入代码错误。\n- **格式对齐困难**：不同框架（PyTorch vs TensorFlow）输出的张量格式不一，每次评估前都要编写大量胶水代码进行类型转换和维度调整。\n- **指标理解成本高**：团队成员对某些复杂指标（如困惑度 Perplexity）的具体取值范围和适用边界缺乏统一认知，导致对模型好坏的误判。\n- **协作壁垒高**：新成员加入或跨组复现结果时，因缺乏统一的度量标准，往往需要花费数天时间对齐评估脚本和依赖环境。\n\n### 使用 evaluate 后\n- **一行代码加载**：只需调用 `load(\"f1\")` 或 `load(\"bleu\")` 即可瞬间获取经过社区验证的几十种主流指标实现，彻底告别手动实现算法。\n- **框架无关性**：evaluate 自动处理输入数据的类型检查与格式兼容，无论是 Pandas DataFrame 还是 PyTorch Tensor，都能直接传入 `compute` 方法得出结果。\n- **内置指标卡片**：每个指标都附带详细的说明卡片，清晰列出数值含义、局限性及典型用例，让团队成员能准确解读评估数据。\n- **标准化协作**：通过 Hugging Face Hub 共享评估模块，确保全团队使用完全一致的度量逻辑，大幅降低了沟通成本并提升了实验复现效率。\n\nevaluate 将繁琐且易错的评估流程转化为标准化的原子操作，让算法工程师能专注于模型优化而非度量衡的统一。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhuggingface_evaluate_e39d94f8.png","huggingface","Hugging Face","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fhuggingface_90da21a4.png","The AI community building the 
future.",null,"https:\u002F\u002Fhuggingface.co\u002F","https:\u002F\u002Fgithub.com\u002Fhuggingface",[83,87],{"name":84,"color":85,"percentage":86},"Python","#3572A5",99.9,{"name":88,"color":89,"percentage":90},"Makefile","#427819",0.1,2437,311,"2026-04-03T06:51:50","Apache-2.0",1,"未说明",{"notes":98,"python":96,"dependencies":99},"该库是一个评估指标集合，本身不强制要求 GPU 或特定内存大小，具体资源需求取决于所加载的具体指标及被评估模型的大小。支持多种机器学习框架（Numpy\u002FPandas\u002FPyTorch\u002FTensorFlow\u002FJAX）。必须安装在虚拟环境（如 venv 或 conda）中。对于大语言模型（LLM）的评估，官方推荐使用更新的 LightEval 库。",[100,101,102,103,104],"numpy","pandas","pytorch","tensorflow","jax",[54,13],[107,108],"evaluation","machine-learning","2026-03-27T02:49:30.150509","2026-04-06T05:15:59.466044",[112,117,122,127,132,136],{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},12415,"如何在离线环境下使用 evaluate 库加载指标？","如果无法联网，可以尝试以下两种方法：\n1. 直接导入底层依赖库（如 sklearn）自行计算，例如：\n```python\nfrom sklearn.metrics import accuracy_score\ndef compute_metrics(eval_pred):\n    predictions, labels = eval_pred\n    predictions = np.argmax(predictions, axis=1)\n    return {\"accuracy\": float(accuracy_score(labels, predictions))}\n```\n2. 对于某些指标（如 bleu），如果 `evaluate.load` 即使有缓存也尝试下载，可能是因为代码中存在 `# From: \u003Curl>` 注释。尝试移除该注释或手动调整 compute 函数直接使用本地逻辑，避免调用 `evaluate.load`。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fissues\u002F315",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},12416,"为什么升级 datasets 库后 evaluate 出现报错或兼容性问题？","较新版本的 `datasets`（如 2.16.0+）可能破坏了 `evaluate` 的某些功能（如锁文件权限、多进程加载等）。解决方案包括：\n1. 降级 `datasets` 到稳定版本，例如：`pip install datasets==2.15.0` 或 `datasets==2.13.1`。\n2. 
升级 `evaluate` 到最新版本以适配新版的 `datasets`：`pip install -U evaluate`。\n如果问题仍存在且涉及多 GPU\u002F多节点训练，建议暂时使用单节点设置或关注 `accelerate` 仓库的相关讨论。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fissues\u002F542",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},12417,"遇到 'meteor' 指标在最新 datasets 版本中无法工作怎么办？","这是因为 `datasets` 库放弃了对 Python 3.7 的支持，导致旧版代码不兼容。解决方法是更新 `evaluate` 库到最新版本，它已修复此兼容性问题：\n```bash\npip install -U evaluate\n```\n如果更新后仍有问题，可以尝试将 `datasets` 降级到 2.13.1 版本。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fissues\u002F480",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},12418,"evaluate 库在处理输入数据类型时会自动转换吗？这会导致什么问题？","是的，`evaluate` 内部会使用 `pyarrow.array` 尝试自动转换数据类型。例如，如果期望输入是字符串但传入了整数列表 `[1]`，它会被隐式转换为 `[\"1\"]`；如果传入嵌套列表 `[[\"a\", \"b\"]]`，可能会被转换为 `[\"['a', 'b']\"]`。这可能导致像 `rouge` 这样的指标计算错误。目前建议在传入数据前严格检查类型，确保字符串字段确实是字符串，避免依赖自动转换。","https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fissues\u002F15",{"id":133,"question_zh":134,"answer_zh":135,"source_url":131},12419,"不同指标（如 accuracy, bleu, bertscore）需要的输入数据格式是什么？","不同指标对输入结构有不同要求，常见格式如下：\n- **accuracy**: `predictions` 和 `references` 通常为整数序列 (`Sequence(Value(\"int32\"))`)。\n- **bertscore \u002F bleurt \u002F cer**: `predictions` 为字符串，`references` 为字符串或字符串序列。\n- **bleu \u002F chrf**: `predictions` 为分词后的字符串序列，`references` 为参考译文列表（列表的列表）。\n- **comet**: 需要 `sources` (源文本), `predictions` (预测), `references` (参考)，均为字符串序列。\n使用前请查阅具体指标的文档或源码中的 `features` 定义，确保数据结构匹配，否则可能触发隐式类型转换导致错误。",{"id":137,"question_zh":138,"answer_zh":139,"source_url":121},12420,"在多 GPU 或多节点分布式训练中使用 evaluate.load 时报错找不到锁文件怎么办？","这通常是由于 `datasets>=2.16.0` 版本中锁文件创建时的权限掩码（umask）未被正确尊重导致的。表现为生成的 `.lock` 文件权限不足（如 `-rw-r--r--` 而非 `-rw-rw-rw-`），导致其他进程无法访问。\n临时解决方案：\n1. 降级 `datasets` 到 2.15.0 版本。\n2. 或者在运行脚本前手动设置正确的 umask（如 `umask 000`），但这可能无法完全解决库内部逻辑问题。\n3. 
如果可能，暂时回退到单节点设置以避免此并发问题。",[141,146,151,156,161,166,171,176,181,186,191,196,201,206],{"id":142,"version":143,"summary_zh":144,"released_at":145},62770,"v0.4.6","## 变更内容\n* 移除已弃用的 `HfFolder`，由 @Wauplin 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F701 中完成  \n  * 此更改增加了对 `huggingface_hub>=1.0` 的支持\n* 更新 `index.mdx`，由 @meg-huggingface 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F694 中完成\n* 修复 parity 测试的 CI，由 @lhoestq 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F696 中完成\n* 在文档中添加排行榜，由 @burtenshaw 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F697 中完成\n* 在 CI 中固定 hfh 版本以更新仓库，由 @lhoestq 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F702 中完成\n\n## 新贡献者\n* @burtenshaw 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F697 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fcompare\u002Fv0.4.5...v0.4.6","2025-09-18T13:07:20",{"id":147,"version":148,"summary_zh":149,"released_at":150},62771,"v0.4.5","## 变更内容\n* @lhoestq 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F689 中增加了对数据集 4 的支持\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fcompare\u002Fv0.4.4...v0.4.5","2025-07-10T13:26:45",{"id":152,"version":153,"summary_zh":154,"released_at":155},62772,"v0.4.4","## Bug 修复\n* 支持 jiwer 4.0，由 @lhoestq 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F685 中完成\n* 修复无 bos_token_id 的分词器的困惑度分数问题，由 @kylehowells 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F682 中完成\n* 修复 precision\u002Frecall\u002Ff1 的 size 属性错误，由 @Maxwell-Jia 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F656 中完成\n\n## 其他变更\n* 添加必需的 hf_token 秘钥以构建主文档，由 @albertvillanova 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F635 中完成\n* 将 
numpy 锁定为 \u003C2 版本，以满足 tensorflow 的要求并修复文档构建问题，由 @albertvillanova 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F631 中完成\n* 支持 nltk>=3.9，以修复漏洞，由 @albertvillanova 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F629 中完成\n* 在文档和 README 中添加提示，指向 lighteval，由 @MoritzLaurer 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F618 中完成\n\n## 新贡献者\n* @MoritzLaurer 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F618 中完成了首次贡献\n* @Maxwell-Jia 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F656 中完成了首次贡献\n* @kylehowells 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F682 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fcompare\u002Fv0.4.3...v0.4.4","2025-06-20T17:49:29",{"id":157,"version":158,"summary_zh":159,"released_at":160},62773,"v0.4.3","此版本通过移除对已弃用代码的调用，新增了对 `datasets>=3.0` 的支持。\n\n## 变更内容\n* 由 @albertvillanova 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F623 中修复 CI，临时锁定 nltk\u003C3.9\n* 由 @albertvillanova 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F621 中将已弃用的 `use_auth_token` 替换为 `token`\n* 由 @lhoestq 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F624 中移除 `ignore_url_params`\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fcompare\u002Fv0.4.2...v0.4.3","2024-09-11T10:17:30",{"id":162,"version":163,"summary_zh":164,"released_at":165},62774,"v0.4.2","## 变更内容\n* @krishnap25 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F416 中更新了 mauve 的文档和引用\n* @daskol 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F507 中移除了未使用的依赖\n* @osanseviero 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F528 中添加了混淆矩阵\n* @qubvel 在 
https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F571 中将 Python 更新至 3.8\n* @lhoestq 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F578 中修复了 FileFreeLock 问题\n* @alexrs 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F575 中修复了 load 函数中的示例文档\n* @qubvel 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F569 中加速了 mean_iou 指标的计算\n\n## 新贡献者\n* @rtrompier 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F510 中做出了首次贡献\n* @daskol 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F507 中做出了首次贡献\n* @qubvel 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F571 中做出了首次贡献\n* @alexrs 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F575 中做出了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fcompare\u002Fv0.4.1...v0.4.2","2024-04-30T09:45:36",{"id":167,"version":168,"summary_zh":169,"released_at":170},62775,"v0.4.1","## 变更内容\n* @stevhliu 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F374 中为文档字符串添加代码示例\n* [小修复] @cakiki 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F403 中修正了一个拼写错误\n* [文档] @hazrulakmal 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F386 中修复了 bertscore 说明文件中的一个拼写错误\n* @kdutia 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F411 中为困惑度度量的文档字符串添加了 `max_length` 关键字参数\n* @tupini07 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F417 中修复了 a_quick_tour.mdx 中的一个小拼写错误\n* @jorahn 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F418 中修复了 docs\u002Fbase_evaluator.mdx 文件\n* @BramVanroy 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F427 中更新了 Gradio 的描述，以明确文本输入的要求\n* @hazrulakmal 在 
https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F424 中修复了 `add` 方法\n* @tupini07 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F419 中修复了 docs\u002Fa_quick_tour.mdx 中的失效链接\n* @Plutone11011 解决了 #379 问题，添加了音频分类评估器及相应文档，见 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F405\n* @Plutone11011 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F425 中修复了 `combine` 函数中关键字参数未被传递的问题\n* @TKaanKoc 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F407 中添加了 r^2 指标\n* @BramVanroy 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F426 中将 Spaces 的 Gradio 版本更新至 3.19.1\n* @lvwerra 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F447 中用 datasets 替代了 evaluate 的 DownloadConfig\n* @mariosasko 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F463 中渲染了 Text2TextGenerationEvaluators 的文档字符串示例\n* @Wauplin 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F467 中配置了在 ci-* 分支上触发 CI 流程\n* @ricardorei 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F443 中更新了 comet 指标\n* @mariosasko 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F490 中修复了 Meteor 指标中 `datasets` 的导入问题\n* @bzz 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F498 中修正了 scikit-learn 包名的建议\n* 发布：0.4.1，由 @lhoestq 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F505 中完成\n\n## 新贡献者\n* @cakiki 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F403 中做出了首次贡献\n* @hazrulakmal 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F386 中做出了首次贡献\n* @kdutia 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F411 中做出了首次贡献\n* @tupini07 在 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F417 中做出了首次贡献\n* @jorahn 
made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F418\n* @Plutone11011 made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F405\n* @TKaanKoc made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F407\n* @mariosasko made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F463\n* @Wauplin made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F467\n* @ricardorei made their first…","2023-10-13T15:57:18",{"id":172,"version":173,"summary_zh":174,"released_at":175},62776,"v0.4.0","## What's Changed\n* Add Trainer integration docs by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F325\n* Stop using model-defined truncation in perplexity calculation by @mathemakitten in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F333\n* Update docs to stop calling the `eval` method on Evaluator instances by @fxmarty in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F341\n* Fix caching issue by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F336\n* Fix #327: set the default number of lines in the Gradio web UI to 1 and remove empty lines by @Raibows in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F335\n* Update PR docs action by @mishig25 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F344\n* Fix `scikit-learn` installation in Spaces by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F345\n* Add MASE, sMAPE and MAPE metrics by @kashif in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F330\n* Fix `scikit-learn` dependency in MAPE, MASE and sMAPE by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F346\n* Update link text by @stevhliu in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F360\n* Fix the value range of MAE by @clefourrier in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F359\n* Revert “Update PR docs action” by @mishig25 in 
https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F363\n* Implement evaluation suite by @mathemakitten in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F337\n* Implement Matthews correlation coefficient by @sanderland in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F362\n* Fix TensorFlow version issue by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F372\n* Add text generation evaluator by @NimaBoscarino in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F350\n* Fix typo in ROUGE types by @davebulaval in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F364\n* Add guide to using `Evaluate` with `scikit-learn` by @awinml in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F368\n* Add metric visualization by @sashavor in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F342\n* Add NIST metric by @BramVanroy in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F250\n* Add GitHub Actions CI by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F375\n* Add guide to using `Evaluate` with Keras and TensorFlow by @arjunpatel7 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F370\n* Fix version issue by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F380\n* Add MT metric CharacTER by @BramVanroy in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F286\n* Add CharCut, another character-based MT evaluation metric, by @BramVanroy in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F290\n* Add ASR model evaluator and update related docs by @bayartsogt-ya in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F378\n* Add EvaluationSuite documentation by @mathemakitten in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F340\n* Update Mauve documentation by @krishnap25 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F377\n* Fix CI badge by @lvwerra in 
https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F385\n\n## New Contributors\n* @Raibows made their first contribution in https:\u002F\u002Fgit","2022-12-13T13:35:51",{"id":177,"version":178,"summary_zh":179,"released_at":180},62777,"v0.3.0","## What's Changed\n* Add multilabel F1 score evaluation usage by @fcakyon in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F221\n* Force `get_supported_tasks()` to return a list instead of dict keys by @mathemakitten in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F227\n* Unpin `rouge_score` by @albertvillanova in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F220\n* Remove import statements from measurement cards by @meg-huggingface in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F231\n* Make ROUGE support multiple references by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F229\n* Fix forced string casting by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F230\n* Fix example in perplexity measurement docs by @mathemakitten in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F238\n* Add Wilcoxon signed-rank test by @douwekiela in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F237\n* Add support for two input columns to TextClassificationEvaluator by @fxmarty in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F205\n* Fix bug in TEMPLATE_REQUIRE: add missing comma by @BramVanroy in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F248\n* Suggest minor improvements to the quickstart docs by @stevhliu in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F236\n* Clarify ChrF error message when no references are given by @BramVanroy in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F247\n* Only track unique missing dependencies by @BramVanroy in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F246\n* Update evaluate in Spaces by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F228\n* Add `commit_hash` to arguments by @lvwerra in 
https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F253\n* Change perplexity base to e by @mathemakitten in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F242\n* Rebase onto the previous PR by @mathemakitten in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F254\n* Fix new docstring after switching perplexity to base e by @mathemakitten in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F255\n* Add tokenizer option to ROUGE by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F258\n* Add `list_duplicates=True` to the example by @meg-huggingface in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F263\n* Minor change to the feature description by @meg-huggingface in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F267\n* Map example output to return values by @meg-huggingface in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F268\n* Rename `duplicates_list` to `duplicates_dict` (since it is actually a dict) by @meg-huggingface in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F265\n* Rename `duplicates_list` to `duplicates_dict` in the example by @meg-huggingface in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F264\n* Add `slow` flag to two-column consistency test by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F273\n* Remove `handle_impossible_answer` from the question-answering evaluator's default `PIPELINE_KWARGS` by @fxmarty in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F272\n* Toxicity detection by @sashavo","2022-10-13T13:04:14",{"id":182,"version":183,"summary_zh":184,"released_at":185},62778,"v0.2.2","## What's Changed\n* Update CLI docs by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F218\n* Add fingerprint to each EvaluationModule by @mathemakitten in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F206\n* Fix loading error by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F222\n\n\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fcompare\u002Fv0.2.1...v0.2.2","2022-07-29T14:58:33",{"id":187,"version":188,"summary_zh":189,"released_at":190},62779,"v0.2.1","## What's Changed\n* Add measurements to quality and style checks by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F203\n* Add comparisons and measurements to code quality tests by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F204\n* Remove mentions of datasets from the docs by @albertvillanova in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F207\n* Add label distribution measurement by @sashavor in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F202\n* Fix Spaces tags by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F217\n* Set datasets version to >=2.0.0 by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F216\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fcompare\u002Fv0.2.0...v0.2.1","2022-07-28T13:13:50",{"id":192,"version":193,"summary_zh":194,"released_at":195},62780,"v0.2.0","## What's New\r\n\r\n### `evaluator`\r\nThe `evaluator` has been extended to three new tasks:\r\n- `\"image-classification\"`\r\n- `\"token-classification\"`\r\n- `\"question-answering\"`\r\n\r\n### `combine`\r\nWith `combine`, one can bundle several metrics into a single object that can be evaluated in one call and also used in combination with the `evaluator`.\r\n\r\n## What's Changed\r\n* Fix typo in WER docs by @pn11 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F147\r\n* Fix rouge outputs by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F158\r\n* add tutorial for custom pipeline by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F154\r\n* refactor `evaluator` tests by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F155\r\n* rename `input_texts` 
to `predictions` in perplexity by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F157\r\n* Add link to GitHub author by @lewtun in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F166\r\n* Add `combine` to compose multiple evaluations by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F150\r\n* test string casting only on first element by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F159\r\n* remove unused fixtures from unittests by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F170\r\n* Add a test to check that Evaluator evaluations match transformers examples by @fxmarty in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F163\r\n* Add smaller model for `TextClassificationEvaluator` test by @fxmarty in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F172\r\n* Add tags to spaces by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F162\r\n* Rename evaluation modules by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F160\r\n* Update push_evaluations_to_hub.py by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F174\r\n* update evaluate dependency for spaces by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F175\r\n* Add `ImageClassificationEvaluator` by @fxmarty in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F173\r\n* attempting to let meteor handle multiple references per prediction by @sashavor in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F164\r\n* fixed duplicate calculation of spearmanr function in metrics wrapper. 
by @benlipkin in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F176\r\n* forbid hyphens in template for module names by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F177\r\n* switch from Github to Hub module factory for canonical modules by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F180\r\n* Fix bertscore idf by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F183\r\n* refactor evaluator base and task classes by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F185\r\n* Avoid importing tensorflow when importing evaluate by @NouamaneTazi in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F135\r\n* Add QuestionAnsweringEvaluator by @fxmarty in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F179\r\n* Evaluator perf by @ola13 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F178\r\n* Fix QuestionAnsweringEvaluator for squad v2, fix examples by @fxmarty in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F190\r\n* Rename perf metric evaluator by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F191\r\n* Fix typos in QA Evaluator by @lewtun in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F192\r\n* Evaluator device placement by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F193\r\n* Change test command in installation.mdx to use exact_match by @mathemakitten in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F194\r\n* Add `TokenClassificationEvaluator` by @fxmarty in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F167\r\n* Pin rouge_score by @albertvillanova in 
https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F197\r\n* add poseval by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F195\r\n* Combine docs by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F201\r\n* Evaluator column loading by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F200\r\n* Evaluator documentation by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F199\r\n\r\n## New Contributors\r\n* @pn11 made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F147\r\n* @fxmarty made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F163\r\n* @benlipkin made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F176\r\n* @NouamaneTazi made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F135\r\n* @mathemakitten made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F194\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fcompare\u002Fv0.1.2...v0.2.0","2022-07-25T14:34:09",{"id":197,"version":198,"summary_zh":199,"released_at":200},62781,"v0.1.2","## What's Changed\r\n* Fix trec sacrebleu by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F130\r\n* Add distilled version Cometihno by @BramVanroy in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F131\r\n* fix: add yaml extension to github action for release by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F133\r\n* fix docs badge by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F134\r\n* fix cookiecutter path 
to repository by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F139\r\n* docs: make metric cards more prominent by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F132\r\n* Update README.md by @sashavor in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F145\r\n* Fix datasets download imports by @albertvillanova in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F143\r\n\r\n## New Contributors\r\n* @BramVanroy made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F131\r\n* @albertvillanova made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F143\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fcompare\u002Fv0.1.1...v0.1.2","2022-06-16T10:01:40",{"id":202,"version":203,"summary_zh":204,"released_at":205},62782,"v0.1.1","## What's Changed\r\n* Fix broken links by @mishig25 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F92\r\n* Fix readme by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F98\r\n* Fixing broken evaluate-measurement hub link by @panwarnaveen9 in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F102\r\n* fix typo in autodoc by @manueldeprada in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F101\r\n* fix typo by @manueldeprada in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F100\r\n* FIX `pip install evaluate[evaluator]` by @philschmid in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F103\r\n* fix description field in metric template readme by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F122\r\n* Add automatic pypi release for evaluate by @osanseviero in 
https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F121\r\n* Fix typos in Evaluator docstrings by @lewtun in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F124\r\n* Fix spaces description in metadata by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F123\r\n* fix revision string if it is a python version by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F129\r\n* Use accuracy as default metric for text classification Evaluator by @lewtun in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F128\r\n* bump `evaluate` dependency in spaces by @lvwerra in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F88\r\n\r\n## New Contributors\r\n* @panwarnaveen9 made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F102\r\n* @manueldeprada made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F101\r\n* @philschmid made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F103\r\n* @osanseviero made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F121\r\n* @lewtun made their first contribution in https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fpull\u002F124\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate\u002Fcompare\u002Fv0.1.0...v0.1.1","2022-06-08T12:38:39",{"id":207,"version":208,"summary_zh":209,"released_at":210},62783,"v0.1.0","# Release notes\r\n\r\nThese are the release notes of the initial release of the Evaluate library.\r\n\r\n## Goals\r\n\r\nGoals of the Evaluate library:\r\n\r\n- reproducibility: reporting and reproducing results is easy\r\n- ease-of-use: access to a wide range of evaluation tools with a unified interface\r\n- 
diversity: provide a wide range of evaluation tools with metrics, comparisons, and measurements\r\n- multimodal: models and datasets of many modalities can be evaluated\r\n- community-driven: anybody can add custom evaluations by hosting them on the Hugging Face Hub\r\n\r\n## Release overview:\r\n\r\n- `evaluate.load()`: The `load()` function is the main entry point into evaluate and allows loading evaluation modules from a local folder, the evaluate repository, or the Hugging Face Hub. It downloads, caches, and loads the evaluation modules and returns an `evaluate.EvaluationModule`.\r\n- `evaluate.save()`: With `save()`, a user can save evaluation results in a JSON file. In addition to the results from `evaluate.EvaluationModule`, it can save additional parameters and automatically saves the timestamp, git commit hash, library version, and Python path. One can either provide a directory for the results, in which case file names are created automatically, or an explicit file name for the result.\r\n- `evaluate.push_to_hub()`: The `push_to_hub` function pushes the results of a model evaluation to the model card on the Hugging Face Hub. The model, dataset, and metric are specified so that they can be linked on the Hub.\r\n- `evaluate.EvaluationModule`: The `EvaluationModule` class is the base class for all evaluation modules. There are three module types: metrics (to evaluate models), comparisons (to compare models), and measurements (to analyze datasets). Inputs can either be added with `add` (single input) or `add_batch` (batch of inputs) followed by a final `compute` call to compute the scores, or all inputs can be passed to `compute` directly. 
Under the hood, Apache Arrow stores and loads the input data to compute the scores.\r\n- `evaluate.EvaluationModuleInfo`: The `EvaluationModuleInfo` class is used to store attributes:\r\n    - `description`: A short description of the evaluation module.\r\n    - `citation`: A BibTeX string for citation when available.\r\n    - `features`: A `Features` object defining the input format. The inputs provided to `add`, `add_batch`, and `compute` are checked against these types, and an error is raised in case of a mismatch.\r\n    - `inputs_description`: This is equivalent to the module's docstring.\r\n    - `homepage`: The homepage of the module.\r\n    - `license`: The license of the module.\r\n    - `codebase_urls`: Link to the code behind the module.\r\n    - `reference_urls`: Additional reference URLs.\r\n- `evaluate.evaluator`: The `evaluator` provides automated evaluation and only requires a model, a dataset, and a metric, in contrast to the metrics in `EvaluationModule`, which require model predictions. It has three main components: a model wrapped in a pipeline, a dataset, and a metric, and it returns the computed evaluation scores. Besides the three main components, it may also require two mappings to align the columns in the dataset and the pipeline labels with the dataset's labels. This is an experimental feature -- currently, only text classification is supported.\r\n- `evaluate-cli`: The community can add custom metrics by adding the necessary module script to a Space on the Hugging Face Hub. `evaluate-cli` is a tool that simplifies this process by creating the Space, populating a template, and pushing it to the Hub. It also provides instructions to customize the template and integrate custom logic.\r\n\r\n## Main contributors:\r\n\r\n@lvwerra, @sashavor, @NimaBoscarino, @ola13, @osanseviero, @lhoestq, @lewtun, @douwekiela","2022-05-31T13:57:20"]