[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Unbabel--COMET":3,"tool-Unbabel--COMET":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":75,"owner_avatar_url":76,"owner_bio":77,"owner_company":78,"owner_location":78,"owner_email":78,"owner_twitter":78,"owner_website":78,"owner_url":79,"languages":80,"stars":85,"forks":86,"last_commit_at":87,"license":88,"difficulty_score":23,"env_os":89,"env_gpu":90,"env_ram":91,"env_deps":92,"category_tags":101,"github_topics":102,"view_count":23,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":109,"updated_at":110,"faqs":111,"releases":137},3290,"Unbabel\u002FCOMET","COMET"," A Neural Framework for MT Evaluation","COMET 是一款基于神经网络的机器翻译（MT）自动评估框架，旨在为翻译质量提供接近人类判断水平的精准评分。在传统评估中，依赖人工打分既昂贵又耗时，而简单的字符串匹配指标往往无法捕捉语义差异。COMET 通过深度学习模型，有效解决了这一痛点，能够灵活支持“有参考译文”和“无参考译文”两种评估模式，甚至能利用上下文信息进行文档级评估，显著提升了在对话翻译等复杂场景下的准确性。\n\n该工具特别适合自然语言处理研究人员、机器翻译开发者以及需要大规模监控翻译质量的企业团队使用。除了输出直观的质量分数外，COMET 还具备独特的技术亮点：其最新的 XCOMET 模型不仅能识别翻译中的轻微、重大或严重错误，还能生成自由的文本解释，帮助用户理解扣分原因；而 DocCOMET 扩展则让模型能够结合上下文语境，更准确地评估篇章连贯性。用户既可以通过命令行快速批量评测，也能将其集成到 Python 代码中构建自动化工作流。作为一款开源且持续更新的项目，COMET 已成为当前机器翻译领域衡量与优化模型性能的重要标准工具。","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FUnbabel_COMET_readme_99bcf6935afb.png\">\n  \u003Cbr \u002F>\n  \u003Cbr \u002F>\n  \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fblob\u002Fmaster\u002FLICENSE\">\u003Cimg alt=\"License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FUnbabel\u002FCOMET\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fstargazers\">\u003Cimg alt=\"GitHub stars\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FUnbabel\u002FCOMET\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"\">\u003Cimg alt=\"PyPI\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Funbabel-comet\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fpsf\u002Fblack\">\u003Cimg alt=\"Code Style\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcode%20style-black-black\" \u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n**NEWS:** \n1) We added a new method to extract free-text explanations from XCOMET outputs! [Check this section](https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET?tab=readme-ov-file#explaining-translation-errors)\n2) We now support [DocCOMET](https:\u002F\u002Fstatmt.org\u002Fwmt22\u002Fpdf\u002F2022.wmt-1.6.pdf), a document-level extension of COMET which can utilize contextual information. Using context improves accuracy on discourse phenomena tasks as well as referenceless evaluation of [chat translation quality](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.08314).\n3) We released our new eXplainable COMET models ([XCOMET-XL](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FXCOMET-XL) and [-XXL](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FXCOMET-XXL)) which along with quality scores detects which errors in the translation are minor, major or critical according to MQM typology\n\nPlease check all available models [here](https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fblob\u002Fmaster\u002FMODELS.md)\n \n# Quick Installation\n\nCOMET requires python 3.8 or above. 
Simple installation from PyPI\n\n```bash\npip install --upgrade pip  # ensures that pip is current \npip install unbabel-comet\n```\n\n**Note:** To use some COMET models such as `Unbabel\u002Fwmt22-cometkiwi-da` you must acknowledge its license on Hugging Face Hub and [log in to the Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fquick-start#:~:text=Once%20you%20have%20your%20User%20Access%20Token%2C%20run%20the%20following%20command%20in%20your%20terminal%3A).\n\n\nTo develop locally, run the following commands:\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\ncd COMET\npip install poetry\npoetry install\n```\n\nFor development, you can run the CLI tools directly, e.g.,\n\n```bash\nPYTHONPATH=. .\u002Fcomet\u002Fcli\u002Fscore.py\n```\n\n# Table of Contents\n\n1. [Scoring MT outputs](#scoring-mt-outputs)\n    1. [CLI Usage](#cli-usage)\n        1. [Basic scoring command](#basic-scoring-command)\n        2. [Reference-free evaluation](#reference-free-evaluation)\n        3. [Comparing multiple systems](#comparing-multiple-systems)\n        4. [Minimum Bayes Risk Decoding](#minimum-bayes-risk-decoding)\n2. [COMET Models](#comet-models)\n    1. [Interpreting Scores](#interpreting-scores)\n    2. [Languages Covered](#languages-covered)\n    3. [COMET for African Languages](#comet-for-african-languages)\n    4. [Scoring within Python](#scoring-within-python)\n    5. [Explaining Translation Errors](#explaining-translation-errors)\n3. [Train your own Metric](#train-your-own-metric)\n4. [Unittest](#unittest)\n5. 
[Publications](#publications)\n\n\n# Scoring MT outputs:\n\n## CLI Usage:\n\nTest examples:\n\n```bash\necho -e \"10 到 15 分钟可以送到吗\\nPode ser entregue dentro de 10 a 15 minutos?\" >> src.txt\necho -e \"Can I receive my food in 10 to 15 minutes?\\nCan it be delivered in 10 to 15 minutes?\" >> hyp1.txt\necho -e \"Can it be delivered within 10 to 15 minutes?\\nCan you send it for 10 to 15 minutes?\" >> hyp2.txt\necho -e \"Can it be delivered between 10 to 15 minutes?\\nCan it be delivered between 10 to 15 minutes?\" >> ref.txt\n```\n\n### Basic scoring command:\n```bash\ncomet-score -s src.txt -t hyp1.txt -r ref.txt\n```\n> you can set the number of gpus using `--gpus` (0 to test on CPU).\n\nFor better error analysis, you can use XCOMET models such as [`Unbabel\u002FXCOMET-XL`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FXCOMET-XL), you can export the identified errors using the `--to_json` flag:\n\n```bash\ncomet-score -s src.txt -t hyp1.txt -r ref.txt --model Unbabel\u002FXCOMET-XL --to_json output.json\n```\n\nScoring multiple systems:\n```bash\ncomet-score -s src.txt -t hyp1.txt hyp2.txt -r ref.txt\n```\n\nWMT test sets via [SacreBLEU](https:\u002F\u002Fgithub.com\u002Fmjpost\u002Fsacrebleu):\n\n```bash\ncomet-score -d wmt22:en-de -t PATH\u002FTO\u002FTRANSLATIONS\n```\n\nScoring with context:\n```bash\necho -e \"Pies made from apples like these. \u003C\u002Fs> Oh, they do look delicious.\\nOh, they do look delicious.\" >> src.txt\necho -e \"Des tartes faites avec des pommes comme celles-ci. \u003C\u002Fs> Elles ont l’air delicieux.\\nElles ont l’air delicieux\" >> hyp1.txt\necho -e \"Des tartes faites avec des pommes comme celles-ci. \u003C\u002Fs> Ils ont l’air delicieux.\\nIls ont l’air delicieux.\" >> hyp2.txt\n```\n\nwhere `\u003C\u002Fs>` is the separator token of the specific tokenizer (here: `xlm-roberta-large`) that the underlying model uses. 
\n\n```bash\ncomet-score -s src.txt -t hyp1.txt hyp2.txt --model Unbabel\u002Fwmt20-comet-qe-da --enable-context\n```\n\nIf you are only interested in a system-level score use the following command:\n\n```bash\ncomet-score -s src.txt -t hyp1.txt -r ref.txt --quiet --only_system\n```\n\n### Reference-free evaluation:\n\n```bash\ncomet-score -s src.txt -t hyp1.txt --model Unbabel\u002Fwmt22-cometkiwi-da\n```\n\n**Note:** To use the `Unbabel\u002Fwmt23-cometkiwi-da-xl` you first have to acknowledge its license on [Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FUnbabel\u002Fwmt23-cometkiwi-da-xl).\n\n### Comparing multiple systems:\n\nWhen comparing multiple MT systems we encourage you to run the `comet-compare` command to get **statistical significance** with Paired T-Test and bootstrap resampling [(Koehn, et al 2004)](https:\u002F\u002Faclanthology.org\u002FW04-3250\u002F).\n\n```bash\ncomet-compare -s src.de -t hyp1.en hyp2.en hyp3.en -r ref.en\n```\n\n### Minimum Bayes Risk Decoding:\n\nThe MBR command allows you to rank translations and select the best one according to COMET metrics. 
For more details you can read our paper on [Quality-Aware Decoding for Neural Machine Translation](https:\u002F\u002Faclanthology.org\u002F2022.naacl-main.100.pdf).\n\n\n```bash\ncomet-mbr -s [SOURCE].txt -t [MT_SAMPLES].txt --num_sample [X] -o [OUTPUT_FILE].txt\n```\n\nIf you are working with a very large candidate list, you can use the `--rerank_top_k` flag to prune to the top-K most promising candidates according to a reference-free metric.\n\nExample for a candidate list of 1000 samples:\n\n```bash\ncomet-mbr -s [SOURCE].txt -t [MT_SAMPLES].txt -o [OUTPUT_FILE].txt --num_sample 1000 --rerank_top_k 100 --gpus 4 --qe_model Unbabel\u002Fwmt23-cometkiwi-da-xl\n```\n\nYour source and samples files should be [formatted in this way](https:\u002F\u002Funbabel.github.io\u002FCOMET\u002Fhtml\u002Frunning.html#:~:text=Example%20with%202%20source%20and%203%20samples%3A).\n\n# COMET Models\n\nWithin COMET, there are several evaluation models available. You can refer to the [MODELS](MODELS.md) page for a comprehensive list of all available models. Here is a concise list of the main reference-based and reference-free models:\n\n- **Default Model:** [`Unbabel\u002Fwmt22-comet-da`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt22-comet-da) - This model employs a reference-based regression approach and is built upon the XLM-R architecture. It has been trained on direct assessments from WMT17 to WMT20 and provides scores ranging from 0 to 1, where 1 signifies a perfect translation.\n- **Reference-free Model:** [`Unbabel\u002Fwmt22-cometkiwi-da`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt22-cometkiwi-da) - This reference-free model employs a regression approach and is built on top of InfoXLM. It has been trained using direct assessments from WMT17 to WMT20, as well as direct assessments from the MLQE-PE corpus. Similar to other models, it generates scores ranging from 0 to 1. 
For those interested, we also offer larger versions of this model: [`Unbabel\u002Fwmt23-cometkiwi-da-xl`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt23-cometkiwi-da-xl) with 3.5 billion parameters and [`Unbabel\u002Fwmt23-cometkiwi-da-xxl`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt23-cometkiwi-da-xxl) with 10.7 billion parameters.\n- **eXplainable COMET (XCOMET):** [`Unbabel\u002FXCOMET-XXL`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FXCOMET-XXL) - Our latest model is trained to identify error spans and assign a final quality score, resulting in an explainable neural metric. We offer this version in XXL with 10.7 billion parameters, as well as the XL variant with 3.5 billion parameters ([`Unbabel\u002FXCOMET-XL`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FXCOMET-XL)). These models have demonstrated the highest correlation with MQM and are our best performing evaluation models.\n\nPlease be aware that different models may be subject to varying licenses. To learn more, kindly refer to the [LICENSES.models](LICENSE.models.md) and model licenses sections.\n\nIf you intend to compare your results with papers published before 2022, it's likely that they used older evaluation models. 
In such cases, please refer to [`Unbabel\u002Fwmt20-comet-da`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt20-comet-da) and [`Unbabel\u002Fwmt20-comet-qe-da`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt20-comet-qe-da), which were the primary checkpoints used in previous versions (\u003C2.0) of COMET.\n\nAlso, the [UniTE Metric](https:\u002F\u002Faclanthology.org\u002F2022.acl-long.558\u002F), developed by the NLP2CT Lab at the University of Macau and Alibaba Group, can be used directly through COMET; check [here for more details](https:\u002F\u002Fhuggingface.co\u002Funite-mup).\n\n## Interpreting Scores:\n\n**New:** An excellent reference for learning how to interpret machine translation metrics is the analysis paper by Kocmi et al. (2024), available [at this link](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.06760.pdf).\n\nWhen using COMET to evaluate machine translation, it's important to understand how to interpret the scores it produces.\n\nIn general, COMET models are trained to predict quality scores for translations. These scores are typically normalized using a [z-score transformation](https:\u002F\u002Fsimplypsychology.org\u002Fz-score.html) to account for individual differences among annotators. While the raw score itself does not have a direct interpretation, it is useful for ranking translations and systems according to their quality.\n\nHowever, since 2022 we have introduced a new training approach that scales the scores between 0 and 1. This makes it easier to interpret the scores: a score close to 1 indicates a high-quality translation, while a score close to 0 indicates a translation that is no better than random chance. 
Also, with the introduction of XCOMET models we can now analyse which text spans are part of minor, major or critical errors according to the MQM typology.\n\nIt's worth noting that when using COMET to compare the performance of two different translation systems, it's important to run the `comet-compare` command to obtain statistical significance measures. This command compares the output of two systems using a statistical hypothesis test, providing an estimate of the probability that the observed difference in scores between the systems is due to chance. This is an important step to ensure that any differences in scores between systems are statistically significant.\n\nOverall, the added interpretability of scores in the latest COMET models, combined with the ability to assess statistical significance between systems using `comet-compare`, makes COMET a valuable tool for evaluating machine translation.\n\n## Languages Covered:\n\nAll the above-mentioned models are built on top of XLM-R (variants), which cover the following languages:\n\nAfrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, 
Welsh, Western Frisian, Xhosa, Yiddish.\n\n**Thus, results for language pairs containing uncovered languages are unreliable!**\n\n### COMET for African Languages:\n\nIf you are interested in COMET metrics for African languages, please visit [afriCOMET](https:\u002F\u002Fgithub.com\u002Fmasakhane-io\u002Fafricomet). \n\n## Scoring within Python:\n\n```python\nfrom comet import download_model, load_from_checkpoint\n\n# Choose your model from Hugging Face Hub\nmodel_path = download_model(\"Unbabel\u002FXCOMET-XL\")\n# or for example:\n# model_path = download_model(\"Unbabel\u002Fwmt22-comet-da\")\n\n# Load the model checkpoint:\nmodel = load_from_checkpoint(model_path)\n\n# Data must be in the following format:\ndata = [\n    {\n        \"src\": \"10 到 15 分钟可以送到吗\",\n        \"mt\": \"Can I receive my food in 10 to 15 minutes?\",\n        \"ref\": \"Can it be delivered between 10 to 15 minutes?\"\n    },\n    {\n        \"src\": \"Pode ser entregue dentro de 10 a 15 minutos?\",\n        \"mt\": \"Can you send it for 10 to 15 minutes?\",\n        \"ref\": \"Can it be delivered between 10 to 15 minutes?\"\n    }\n]\n# Call predict method:\nmodel_output = model.predict(data, batch_size=8, gpus=1)\n```\n\nAs output, we get the following information:\n```python\n# Sentence-level scores (list)\n>>> model_output.scores\n[0.9822099208831787, 0.9599897861480713]\n\n# System-level score (float)\n>>> model_output.system_score\n0.971099853515625\n\n# Detected error spans (list of list of dicts)\n>>> model_output.metadata.error_spans\n[\n  [{'confidence': 0.4160953164100647,\n   'end': 21,\n   'severity': 'minor',\n   'start': 13,\n   'text': 'my food'}],\n  [{'confidence': 0.40004390478134155,\n   'end': 19,\n   'severity': 'minor',\n   'start': 3,\n   'text': 'you send it for'}]\n]\n```\n\nHowever, note that not all COMET models return metadata with detected error spans.\n\n\n## Explaining translation errors:\n\nCheck [this 
notebook](https:\u002F\u002Fgist.github.com\u002Fmtreviso\u002Fb618b499bc6de0414a3e11157e91cf02) for a minimal example on how you can combine xCOMET with [xTower](https:\u002F\u002Fhuggingface.co\u002Fsardinelab\u002FxTower13B) to generate a natural language explanation for each error span. \n\nFor the Portuguese-English example above, we would call xTower with the following **prompt**:\n> You are provided with a Source, Translation, Translation quality analysis, and Translation quality score (weak, moderate, good, excellent, best). The Translation quality analysis contains a translation with marked error spans with different levels of severity (minor or major). Given this information, generate an explanation for each error and a fully correct translation. \u003Cbr>\u003Cbr>\n> Portuguese source: Pode ser entregue dentro de 10 a 15 minutos? \u003Cbr>\n> English translation: Can you send it for 10 to 15 minutes? \u003Cbr>\n> Translation quality analysis: Can `\u003Cerror1 severity='minor'>`you send it for`\u003C\u002Ferror1>` 10 to 15 minutes? \u003Cbr>\n> Translation quality score: excellent\n\n\nAnd get this as output:\n> Explanation for error1: The phrase \"Can you send it for 10 to 15 minutes?\" is a mistranslation of the original Portuguese sentence. The correct interpretation should focus on the delivery time rather than the duration of sending. The original sentence is asking about the delivery time, not the duration of sending. 
\u003Cbr>\n> Translation correction: Can it be delivered within 10 to 15 minutes?\n\n\nFor more information, check the [xTower documentation](https:\u002F\u002Fhuggingface.co\u002Fsardinelab\u002FxTower13B).\n\n\n# Train your own Metric: \n\nInstead of using pretrained models, you can train your own model with the following command:\n```bash\ncomet-train --cfg configs\u002Fmodels\u002F{your_model_config}.yaml\n```\n\nYou can then use your own metric to score:\n\n```bash\ncomet-score -s src.de -t hyp1.en -r ref.en --model PATH\u002FTO\u002FCHECKPOINT\n```\n\nYou can also upload your model to the [Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Findex). Use [`Unbabel\u002Fwmt22-comet-da`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt22-comet-da) as an example. Then you can use your model directly from the hub.\n\n# Unittest:\nIn order to run the toolkit tests you must run the following commands:\n\n```bash\npoetry run coverage run --source=comet -m unittest discover\npoetry run coverage report -m # Expected coverage 76%\n```\n\n**Note:** Testing on CPU takes a long time.\n\n# Publications\n\nIf you use COMET please cite our work **and don't forget to say which model you used!**\n\n- [xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10482.pdf)\n\n- [Scaling up CometKiwi: Unbabel-IST 2023 Submission for the Quality Estimation Shared Task](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.11925.pdf)\n\n- [CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task](https:\u002F\u002Faclanthology.org\u002F2022.wmt-1.60\u002F)\n\n- [COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task](https:\u002F\u002Faclanthology.org\u002F2022.wmt-1.52\u002F)\n\n- [Searching for Cometinho: The Little Metric That Could](https:\u002F\u002Faclanthology.org\u002F2022.eamt-1.9\u002F)\n\n- [Are References Really Needed? 
Unbabel-IST 2021 Submission for the Metrics Shared Task](https:\u002F\u002Faclanthology.org\u002F2021.wmt-1.111\u002F)\n\n- [Uncertainty-Aware Machine Translation Evaluation](https:\u002F\u002Faclanthology.org\u002F2021.findings-emnlp.330\u002F) \n\n- [COMET - Deploying a New State-of-the-art MT Evaluation Metric in Production](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.amta-user.4)\n\n- [Unbabel's Participation in the WMT20 Metrics Shared Task](https:\u002F\u002Faclanthology.org\u002F2020.wmt-1.101\u002F)\n\n- [COMET: A Neural Framework for MT Evaluation](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.emnlp-main.213)\n","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FUnbabel_COMET_readme_99bcf6935afb.png\">\n  \u003Cbr \u002F>\n  \u003Cbr \u002F>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fblob\u002Fmaster\u002FLICENSE\">\u003Cimg alt=\"License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FUnbabel\u002FCOMET\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fstargazers\">\u003Cimg alt=\"GitHub stars\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FUnbabel\u002FCOMET\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"\">\u003Cimg alt=\"PyPI\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Funbabel-comet\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fpsf\u002Fblack\">\u003Cimg alt=\"Code Style\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcode%20style-black-black\" \u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n**新闻：**\n1) 我们新增了一种从 XCOMET 输出中提取自由文本解释的方法！[请查看此部分](https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET?tab=readme-ov-file#explaining-translation-errors)\n2) 我们现在支持 [DocCOMET](https:\u002F\u002Fstatmt.org\u002Fwmt22\u002Fpdf\u002F2022.wmt-1.6.pdf)，这是 COMET 的文档级扩展，能够利用上下文信息。使用上下文可以提升话语现象任务的准确性，以及对 
[聊天翻译质量](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.08314) 的无参考评估。\n3) 我们发布了全新的可解释 COMET 模型（[XCOMET-XL](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FXCOMET-XL) 和 [-XXL](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FXCOMET-XXL))，这些模型在提供质量评分的同时，还能根据 MQM 分类体系检测译文中哪些错误属于轻微、严重或关键。\n\n请在此处查看所有可用模型：[这里](https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fblob\u002Fmaster\u002FMODELS.md)\n\n# 快速安装\n\nCOMET 需要 Python 3.8 或更高版本。可通过 PyPI 简单安装：\n\n```bash\npip install --upgrade pip  # 确保 pip 是最新版本\npip install unbabel-comet\n```\n\n**注意：** 若要使用某些 COMET 模型，例如 `Unbabel\u002Fwmt22-cometkiwi-da`，您必须在 Hugging Face Hub 上确认其许可证，并[登录 Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fquick-start#:~:text=Once%20you%20have%20your%20User%20Access%20Token%2C%20run%20the%20following%20command%20in%20your%20terminal%3A).\n\n若要在本地开发，请执行以下命令：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\ncd COMET\npip install poetry\npoetry install\n```\n\n在开发过程中，您可以直接运行 CLI 工具，例如：\n```bash\nPYTHONPATH=. .\u002Fcomet\u002Fcli\u002Fscore.py\n```\n\n# 目录\n\n1. [对机器翻译输出进行评分](#scoring-mt-outputs)\n    1. [CLI 使用方法](#cli-usage)\n        1. [基本评分命令](#basic-scoring-command)\n        2. [无参考评估](#reference-free-evaluation)\n        3. [比较多个系统](#comparing-multiple-systems)\n        4. [最小贝叶斯风险解码](#minimum-bayes-risk-decoding)\n2. [COMET 模型](#comet-models)\n    1. [解读评分](#interpreting-scores)\n    2. [覆盖的语言](#languages-covered)\n    3. [面向非洲语言的 COMET](#comet-for-african-languages)\n    4. [在 Python 中评分](#scoring-within-python)\n    5. [解释翻译错误](#explaining-translation-errors)\n3. [训练您自己的指标](#train-your-own-metric)\n4. [单元测试](#unittest)\n5. 
[出版物](#publications)\n\n\n# 对机器翻译输出进行评分：\n\n## CLI 使用方法：\n\n测试示例：\n\n```bash\necho -e \"10 到 15 分钟可以送到吗\\nPode ser entregue dentro de 10 a 15 minutos?\" >> src.txt\necho -e \"Can I receive my food in 10 to 15 minutes?\\nCan it be delivered in 10 to 15 minutes?\" >> hyp1.txt\necho -e \"Can it be delivered within 10 to 15 minutes?\\nCan you send it for 10 to 15 minutes?\" >> hyp2.txt\necho -e \"Can it be delivered between 10 to 15 minutes?\\nCan it be delivered between 10 to 15 minutes?\" >> ref.txt\n```\n\n### 基本评分命令：\n```bash\ncomet-score -s src.txt -t hyp1.txt -r ref.txt\n```\n> 您可以使用 `--gpus` 参数设置 GPU 数量（设为 0 可在 CPU 上测试）。\n\n为了更好地分析错误，您可以使用 XCOMET 模型，例如 [`Unbabel\u002FXCOMET-XL`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FXCOMET-XL)，并使用 `--to_json` 标志导出识别出的错误：\n\n```bash\ncomet-score -s src.txt -t hyp1.txt -r ref.txt --model Unbabel\u002FXCOMET-XL --to_json output.json\n```\n\n对多个系统进行评分：\n```bash\ncomet-score -s src.txt -t hyp1.txt hyp2.txt -r ref.txt\n```\n\n通过 [SacreBLEU](https:\u002F\u002Fgithub.com\u002Fmjpost\u002Fsacrebleu) 提供的 WMT 测试集：\n```bash\ncomet-score -d wmt22:en-de -t PATH\u002FTO\u002FTRANSLATIONS\n```\n\n带上下文的评分：\n```bash\necho -e \"用这样的苹果做的馅饼。\u003C\u002Fs> 哦，它们看起来真美味。\\n哦，它们看起来真美味。\" >> src.txt\necho -e \"用这样的苹果做的馅饼。\u003C\u002Fs> 它们看起来很美味。\\n它们看起来很美味。\" >> hyp1.txt\necho -e \"用这样的苹果做的馅饼。\u003C\u002Fs> 他们看起来很美味。\\n他们看起来很美味。\" >> hyp2.txt\n```\n\n其中 `\u003C\u002Fs>` 是底层模型所使用的特定分词器（此处为 `xlm-roberta-large`）的分隔符标记。\n\n```bash\ncomet-score -s src.txt -t hyp1.txt hyp2.txt --model Unbabel\u002Fwmt20-comet-qe-da --enable-context\n```\n\n如果您只关心系统级别的评分，可以使用以下命令：\n\n```bash\ncomet-score -s src.txt -t hyp1.txt -r ref.txt --quiet --only_system\n```\n\n### 无参考评估：\n\n```bash\ncomet-score -s src.txt -t hyp1.txt --model Unbabel\u002Fwmt22-cometkiwi-da\n```\n\n**注意：** 若要使用 `Unbabel\u002Fwmt23-cometkiwi-da-xl`，您需要先在 [Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FUnbabel\u002Fwmt23-cometkiwi-da-xl) 上确认其许可证。\n\n### 
比较多个系统：\n\n在比较多个机器翻译系统时，我们建议您运行 `comet-compare` 命令，以通过配对 t 检验和自助重采样获得 **统计显著性** [(Koehn, et al 2004)](https:\u002F\u002Faclanthology.org\u002FW04-3250\u002F)。\n\n```bash\ncomet-compare -s src.de -t hyp1.en hyp2.en hyp3.en -r ref.en\n```\n\n### 最小贝叶斯风险解码：\n\nMBR 命令允许您根据 COMET 指标对译文进行排序，并选择最佳译文。更多详情请参阅我们的论文《面向神经机器翻译的质量感知解码》[链接](https:\u002F\u002Faclanthology.org\u002F2022.naacl-main.100.pdf)。\n\n```bash\ncomet-mbr -s [SOURCE].txt -t [MT_SAMPLES].txt --num_sample [X] -o [OUTPUT_FILE].txt\n```\n\n如果候选列表非常庞大，您可以使用 `--rerank_top_k` 标志，根据无参考指标筛选出最有可能的前 K 个候选。\n\n例如，针对 1000 个样本的候选列表：\n\n```bash\ncomet-mbr -s [SOURCE].txt -t [MT_SAMPLES].txt -o [OUTPUT_FILE].txt --num_sample 1000 --rerank_top_k 100 --gpus 4 --qe_model Unbabel\u002Fwmt23-cometkiwi-da-xl\n```\n\n您的源文件和样本文件应按照如下方式格式化：[链接](https:\u002F\u002Funbabel.github.io\u002FCOMET\u002Fhtml\u002Frunning.html#:~:text=Example%20with%202%20source%20and%203%20samples%3A).\n\n# COMET 模型\n\n在 COMET 中，有多种评估模型可供使用。您可以参考 [MODELS](MODELS.md) 页面，以获取所有可用模型的完整列表。以下是主要的基于参考和无参考模型的简要列表：\n\n- **默认模型：** [`Unbabel\u002Fwmt22-comet-da`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt22-comet-da) - 该模型采用基于参考的回归方法，基于 XLM-R 架构构建。它使用 WMT17 至 WMT20 的直接评分进行训练，输出分数范围为 0 到 1，其中 1 表示完美翻译。\n- **无参考模型：** [`Unbabel\u002Fwmt22-cometkiwi-da`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt22-cometkiwi-da) - 这种无参考模型同样采用回归方法，基于 InfoXLM 构建。它使用 WMT17 至 WMT20 的直接评分以及 MLQE-PE 语料库中的直接评分进行训练。与其他模型类似，其输出分数范围也是 0 到 1。对于有兴趣的用户，我们还提供了该模型的更大版本：参数量为 35 亿的 [`Unbabel\u002Fwmt23-cometkiwi-da-xl`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt23-cometkiwi-da-xl)，以及参数量为 107 亿的 [`Unbabel\u002Fwmt23-cometkiwi-da-xxl`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt23-cometkiwi-da-xxl)。\n- **可解释的 COMET（XCOMET）：** [`Unbabel\u002FXCOMET-XXL`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FXCOMET-XXL) - 我们的最新模型经过训练，能够识别错误片段并给出最终质量得分，从而生成一种可解释的神经网络指标。我们提供 XXL 版本（参数量为 107 亿）以及 XL 版本（参数量为 35 
亿）（[`Unbabel\u002FXCOMET-XL`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FXCOMET-XL)）。这些模型与 MQM 的相关性最高，是我们表现最佳的评估模型。\n\n请注意，不同模型可能适用不同的许可证。如需了解更多信息，请参阅 [LICENSES.models](LICENSE.models.md) 和各模型的许可证部分。\n\n如果您希望将结果与 2022 年之前发表的论文进行比较，很可能他们使用的是较旧的评估模型。在这种情况下，请参考 [`Unbabel\u002Fwmt20-comet-da`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt20-comet-da) 和 [`Unbabel\u002Fwmt20-comet-qe-da`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt20-comet-qe-da)，它们是 COMET 早期版本（\u003C2.0）中使用的主检查点。\n\n此外，由澳门大学 NLP2CT 实验室和阿里巴巴集团开发的 [UniTE Metric](https:\u002F\u002Faclanthology.org\u002F2022.acl-long.558\u002F) 可以通过 COMET 直接使用，[详情请见此处](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Funite-mup)。\n\n## 分数解读：\n\n**新增：** 学习如何解读机器翻译指标的绝佳参考资料是 Kocmi 等人于 2024 年发表的分析论文，[请点击此链接查看](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.06760.pdf)。\n\n在使用 COMET 评估机器翻译时，理解其输出分数的含义非常重要。\n\n一般来说，COMET 模型被训练用来预测翻译的质量得分。这些得分通常会通过 [z 分数转换](https:\u002F\u002Fsimplypsychology.org\u002Fz-score.html)进行归一化处理，以考虑标注者之间的个体差异。虽然原始得分本身没有直接的解释意义，但它可以用于根据质量对翻译和系统进行排名。\n\n然而，自 2022 年起，我们引入了一种新的训练方法，将分数缩放到 0 到 1 之间。这使得分数更容易解读：接近 1 的分数表示高质量的翻译，而接近 0 的分数则表示翻译质量与随机猜测无异。此外，随着 XCOMET 模型的推出，我们现在可以根据 MQM 类型学分析哪些文本片段属于轻微、严重或关键错误。\n\n值得注意的是，在使用 COMET 比较两个不同翻译系统的表现时，务必运行 `comet-compare` 命令来获得统计显著性度量。该命令通过统计假设检验比较两个系统的输出，从而估算观察到的系统间分数差异是由偶然因素导致的概率。这是确保系统间分数差异具有统计显著性的关键步骤。\n\n总体而言，最新 COMET 模型在分数可解释性方面的提升，结合使用 `comet-compare` 对系统间统计显著性进行评估的能力，使 COMET 成为评估机器翻译的宝贵工具。\n\n## 支持的语言：\n\n上述所有模型均基于 
XLM-R（变体）构建，支持以下语言：\n\n南非语、阿尔巴尼亚语、阿姆哈拉语、阿拉伯语、亚美尼亚语、阿萨姆语、阿塞拜疆语、巴斯克语、白俄罗斯语、孟加拉语、罗马化孟加拉语、波斯尼亚语、布列塔尼语、保加利亚语、缅甸语、加泰罗尼亚语、简体中文、繁体中文、克罗地亚语、捷克语、丹麦语、荷兰语、英语、世界语、爱沙尼亚语、菲律宾语、芬兰语、法语、加利西亚语、格鲁吉亚语、德语、希腊语、古吉拉特语、豪萨语、希伯来语、印地语、罗马化印地语、匈牙利语、冰岛语、印尼语、爱尔兰语、意大利语、日语、爪哇语、卡纳达语、哈萨克语、高棉语、韩语、库尔德语（库尔曼吉）、吉尔吉斯语、老挝语、拉丁语、拉脱维亚语、立陶宛语、马其顿语、马达加斯加语、马来语、马拉雅拉姆语、马拉地语、蒙古语、尼泊尔语、挪威语、奥里亚语、奥罗莫语、普什图语、波斯语、波兰语、葡萄牙语、旁遮普语、罗马尼亚语、俄语、梵语、苏格兰盖尔语、塞尔维亚语、信德语、僧伽罗语、斯洛伐克语、斯洛文尼亚语、索马里语、西班牙语、巽他语、斯瓦希里语、瑞典语、泰米尔语、罗马化泰米尔语、泰卢固语、罗马化泰卢固语、泰语、土耳其语、乌克兰语、乌尔都语、罗马化乌尔都语、维吾尔语、乌兹别克语、越南语、威尔士语、西弗里斯兰语、科萨语、意第绪语。\n\n**因此，包含未覆盖语言的语言对的结果不可靠！**\n\n### 非洲语言的 COMET：\n\n如果您对非洲语言的 COMET 指标感兴趣，请访问 [afriCOMET](https:\u002F\u002Fgithub.com\u002Fmasakhane-io\u002Fafricomet)。\n\n## 在 Python 中进行评分：\n\n```python\nfrom comet import download_model, load_from_checkpoint\n\n# 从 Hugging Face Hub 选择您的模型\nmodel_path = download_model(\"Unbabel\u002FXCOMET-XL\")\n# 或例如：\n# model_path = download_model(\"Unbabel\u002Fwmt22-comet-da\")\n\n# 加载模型检查点：\nmodel = load_from_checkpoint(model_path)\n\n# 数据必须采用以下格式：\ndata = [\n    {\n        \"src\": \"10 到 15 分钟可以送到吗\",\n        \"mt\": \"Can I receive my food in 10 to 15 minutes?\",\n        \"ref\": \"Can it be delivered between 10 to 15 minutes?\"\n    },\n    {\n        \"src\": \"Pode ser entregue dentro de 10 a 15 minutos?\",\n        \"mt\": \"Can you send it for 10 to 15 minutes?\",\n        \"ref\": \"Can it be delivered between 10 to 15 minutes?\"\n    }\n]\n# 调用 predict 方法：\nmodel_output = model.predict(data, batch_size=8, gpus=1)\n```\n\n输出结果如下：\n```python\n# 句子级分数（列表）\n>>> model_output.scores\n[0.9822099208831787, 0.9599897861480713]\n\n# 系统级评分（浮点数）\n>>> model_output.system_score\n0.971099853515625\n\n# 检测到的错误片段（列表，嵌套字典列表）\n>>> model_output.metadata.error_spans\n[\n  [{'confidence': 0.4160953164100647,\n   'end': 21,\n   'severity': 'minor',\n   'start': 13,\n   'text': 'my food'}],\n  [{'confidence': 0.40004390478134155,\n   'end': 19,\n   'severity': 'minor',\n   'start': 3,\n   'text': 'you send it 
for'}]\n]\n```\n\n不过，请注意，并非所有 COMET 模型都会返回包含检测到的错误片段的元数据。\n\n\n## 解释翻译错误：\n\n请查看[这个笔记本](https:\u002F\u002Fgist.github.com\u002Fmtreviso\u002Fb618b499bc6de0414a3e11157e91cf02)，其中提供了一个最小示例，说明如何将 xCOMET 与 [xTower](https:\u002F\u002Fhuggingface.co\u002Fsardinelab\u002FxTower13B) 结合使用，为每个错误片段生成自然语言解释。\n\n对于上述葡萄牙语-英语示例，我们将使用以下**提示**调用 xTower：\n> 您将获得源文本、译文、译文质量分析以及译文质量评分（弱、中、好、优、最佳）。译文质量分析包含带有标记错误片段的译文，这些片段具有不同程度的严重性（轻微或严重）。根据这些信息，为每个错误生成解释，并给出完全正确的译文。\u003Cbr>\u003Cbr>\n> 葡萄牙语原文：Pode ser entregue dentro de 10 a 15 minutos?\u003Cbr>\n> 英语译文：Can you send it for 10 to 15 minutes?\u003Cbr>\n> 译文质量分析：Can `\u003Cerror1 severity='minor'>`you send it for`\u003C\u002Ferror1>` 10 to 15 minutes?\u003Cbr>\n> 译文质量评分：优\n\n\n输出结果如下：\n> 错误1解释：短语“Can you send it for 10 to 15 minutes?”是对原葡萄牙语句子的误译。原句询问的是能否在 10 到 15 分钟内送达，而非发送需要持续 10 到 15 分钟。\u003Cbr>\n> 译文修正：Can it be delivered within 10 to 15 minutes?\n\n\n更多信息请参阅 [xTower 文档](https:\u002F\u002Fhuggingface.co\u002Fsardinelab\u002FxTower13B)。\n\n\n# 训练您自己的指标：\n\n除了使用预训练模型外，您还可以通过以下命令训练自己的模型：\n```bash\ncomet-train --cfg configs\u002Fmodels\u002F{your_model_config}.yaml\n```\n\n之后，您可以使用自定义指标进行评分：\n```bash\ncomet-score -s src.de -t hyp1.en -r ref.en --model PATH\u002FTO\u002FCHECKPOINT\n```\n\n此外，您还可以仿照 [`Unbabel\u002Fwmt22-comet-da`](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt22-comet-da) 的做法，将模型上传至 [Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Findex)，随后即可直接从 Hub 上使用您的模型。\n\n# 单元测试：\n要运行工具包的测试，您需要执行以下命令：\n\n```bash\npoetry run coverage run --source=comet -m unittest discover\npoetry run coverage report -m # 预期覆盖率 76%\n```\n\n**注意：** 在 CPU 上运行测试耗时较长。\n\n# 出版物\n\n如果您使用 COMET，请务必引用我们的工作，并且不要忘记说明您所使用的模型！\n\n- [xCOMET：通过细粒度错误检测实现透明的机器翻译评估](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10482.pdf)\n\n- [扩展 CometKiwi：Unbabel-IST 2023 年质量评估共享任务提交](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.11925.pdf)\n\n- [CometKiwi：IST-Unbabel 2022 
年质量评估共享任务提交](https:\u002F\u002Faclanthology.org\u002F2022.wmt-1.60\u002F)\n\n- [COMET-22：Unbabel-IST 2022 年指标共享任务提交](https:\u002F\u002Faclanthology.org\u002F2022.wmt-1.52\u002F)\n\n- [寻找小彗星：那个不起眼却大有可为的指标](https:\u002F\u002Faclanthology.org\u002F2022.eamt-1.9\u002F)\n\n- [真的需要参考译文吗？Unbabel-IST 2021 年指标共享任务提交](https:\u002F\u002Faclanthology.org\u002F2021.wmt-1.111\u002F)\n\n- [基于不确定性的机器翻译评估](https:\u002F\u002Faclanthology.org\u002F2021.findings-emnlp.330\u002F)\n\n- [COMET：在生产环境中部署一种新的最先进 MT 评估指标](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.amta-user.4)\n\n- [Unbabel 参与 WMT20 指标共享任务](https:\u002F\u002Faclanthology.org\u002F2020.wmt-1.101\u002F)\n\n- [COMET：用于 MT 评估的神经网络框架](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.emnlp-main.213)","# COMET 快速上手指南\n\nCOMET 是一个基于神经网络的机器翻译（MT）评估工具，支持有参考（Reference-based）和无参考（Reference-free）的质量评分，并能通过 XCOMET 模型定位翻译错误。\n\n## 环境准备\n\n*   **操作系统**：Linux, macOS, Windows (WSL 推荐)\n*   **Python 版本**：3.8 或更高\n*   **硬件要求**：\n    *   CPU：可用于基础推理，但速度较慢。\n    *   GPU：推荐使用 NVIDIA GPU 以加速大规模评估（需安装对应的 CUDA 版本）。\n*   **前置依赖**：\n    *   确保 `pip` 为最新版本。\n    *   若使用部分特定模型（如 `Unbabel\u002Fwmt22-cometkiwi-da`），需先在 Hugging Face Hub 接受协议并登录。\n\n## 安装步骤\n\n### 1. 基础安装\n通过 PyPI 直接安装（国内用户可添加清华或阿里镜像源加速）：\n\n```bash\npip install --upgrade pip\npip install unbabel-comet -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 2. Hugging Face 登录（可选但推荐）\n若需使用受保护的模型（如 CometKiwi 系列），请先登录：\n\n```bash\nhuggingface-cli login\n```\n*按提示输入您的 Hugging Face Access Token。*\n\n### 3. 开发模式安装（可选）\n如需修改源码或运行 CLI 开发工具：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\ncd COMET\npip install poetry\npoetry install\n```\n\n## 基本使用\n\n### 1. 
准备数据文件\nCOMET 需要源语言文件 (`src`)、假设译文文件 (`hyp`) 和参考译文文件 (`ref`)。每行对应一个句子。\n\n创建测试文件示例：\n```bash\necho -e \"10 到 15 分钟可以送到吗\\nPode ser entregue dentro de 10 a 15 minutos?\" >> src.txt\necho -e \"Can I receive my food in 10 to 15 minutes?\\nCan it be delivered in 10 to 15 minutes?\" >> hyp1.txt\necho -e \"Can it be delivered within 10 to 15 minutes?\\nCan you send it for 10 to 15 minutes?\" >> ref.txt\n```\n\n### 2. 执行评分命令\n\n#### 场景 A：有参考评分 (默认)\n使用默认的 WMT22 模型进行评分：\n\n```bash\ncomet-score -s src.txt -t hyp1.txt -r ref.txt\n```\n*注：可通过 `--gpus 0` 强制使用 CPU，或设置具体 GPU 数量。*\n\n#### 场景 B：无参考评分 (Quality Estimation)\n当没有参考译文时，使用 CometKiwi 模型：\n\n```bash\ncomet-score -s src.txt -t hyp1.txt --model Unbabel\u002Fwmt22-cometkiwi-da\n```\n\n#### 场景 C：错误分析与解释 (XCOMET)\n使用 XCOMET 模型识别错误类型（轻微、严重、致命）并输出 JSON 报告：\n\n```bash\ncomet-score -s src.txt -t hyp1.txt -r ref.txt --model Unbabel\u002FXCOMET-XL --to_json output.json\n```\n\n### 3. 多系统对比与显著性检验\n比较多个翻译系统的输出，并进行统计显著性测试（Paired T-Test）：\n\n```bash\ncomet-compare -s src.txt -t hyp1.txt hyp2.txt hyp3.txt -r ref.txt\n```\n\n### 4. 
在 Python 代码中使用\n除了命令行，您也可以直接在 Python 脚本中调用：\n\n```python\nfrom comet import download_model, load_from_checkpoint\n\n# 下载并加载模型\nmodel_path = download_model(\"Unbabel\u002Fwmt22-comet-da\")\nmodel = load_from_checkpoint(model_path)\n\n# 准备数据\ndata = [\n    {\n        \"src\": \"10 到 15 分钟可以送到吗\",\n        \"mt\": \"Can I receive my food in 10 to 15 minutes?\",\n        \"ref\": \"Can it be delivered within 10 to 15 minutes?\"\n    }\n]\n\n# 获取评分\nmodel_output = model.predict(data, batch_size=8, gpus=1)\nprint(model_output.scores)\n```\n\n## 分数解读\n*   **0 ~ 1 范围**：新版模型（2022 年后训练）输出归一化分数。**1** 代表完美翻译，**0** 代表随机水平。\n*   **统计显著性**：若需发表论文或严谨对比系统差异，请务必使用 `comet-compare` 命令获取 P 值，而非仅比较平均分。","某跨境电商团队正在构建多语言客服系统，需每日评估数千条由机器翻译生成的葡语、西班牙语回复质量，以确保海外用户沟通顺畅。\n\n### 没有 COMET 时\n- **依赖人工抽检效率低下**：质检员只能随机抽取少量译文进行人工打分，无法覆盖海量数据，导致大量低质翻译流入生产环境。\n- **缺乏细粒度错误定位**：发现译文不通顺时，难以快速判断是术语错误、语法问题还是逻辑缺失，返工修改耗时耗力。\n- **无参考译文场景束手无策**：针对实时聊天产生的新句式，往往没有标准参考答案，传统 BLEU 等指标完全失效，质量评估陷入盲区。\n- **主观评分标准不一**：不同质检员对“好翻译”的理解存在偏差，导致评分波动大，难以量化模型迭代带来的真实提升。\n\n### 使用 COMET 后\n- **全量自动化评分**：利用 COMET 的神经框架对每日生成的数千条译文进行批量打分，迅速筛选出低分样本优先处理，覆盖率提升至 100%。\n- **智能错误归因分析**：通过 XCOMET 模型自动识别并标记译文中的“严重”、“主要”或“轻微”错误（如漏译、错译），直接指导工程师精准修复。\n- **支持无参考评估**：启用 COMETKiwi 等无参考模型，即使在没有标准答案的实时对话场景中，也能依据上下文准确评估翻译流畅度与准确性。\n- **统一客观量化标准**：基于 MQM 类型学提供一致的分数体系，消除了人为主观差异，清晰量化了每次模型微调后的质量增益。\n\nCOMET 将模糊的翻译质量感知转化为可执行的量化数据与错误洞察，大幅降低了多语言业务的风险与运维成本。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FUnbabel_COMET_99bcf693.png","Unbabel","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FUnbabel_b3a5b9fa.png","",null,"https:\u002F\u002Fgithub.com\u002FUnbabel",[81],{"name":82,"color":83,"percentage":84},"Python","#3572A5",100,731,106,"2026-03-24T09:08:11","Apache-2.0","未说明","非必需（可通过 --gpus 0 在 CPU 上运行），但推荐使用 GPU 加速。具体型号、显存大小及 CUDA 版本未在文档中明确说明，但大型模型（如 XXL 版本，107 亿参数）通常需要高显存。","未说明（大型模型如 XXL 版本建议配备充足内存）",{"notes":93,"python":94,"dependencies":95},"部分模型（如 wmt22-cometkiwi-da, XCOMET 系列）需要在 Hugging Face Hub 上接受许可协议并登录才能使用。支持无参考评估和基于上下文的评估。安装建议使用 
pip 或 poetry。不同模型参数量差异巨大（从基础版到 107 亿参数的 XXL 版），硬件需求随模型大小显著增加。","3.8+",[96,97,98,99,100],"unbabel-comet","torch","transformers","huggingface_hub","poetry (开发环境)",[13,54,26],[103,104,105,106,107,108],"machine-translation","evaluation-metrics","natural-language-processing","machine-learning","artificial-intelligence","nlp","2026-03-27T02:49:30.150509","2026-04-06T06:43:38.963423",[112,117,122,127,132],{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},15114,"为什么在 Python 3.8 或更高版本中使用 COMET 时会遇到 'Can't pickle local object' 或导入错误？","这通常是因为安装了错误的 PyPI 包。请确保卸载名为 `comet` 或 `comet_ml` 的包，并安装正确的包 `unbabel-comet`。\n解决方法：\n1. 卸载错误包：`pip uninstall comet`\n2. 安装正确包：`pip install unbabel-comet`\n注意：`comet`、`comet_ml` 和 `unbabel-comet` 是不同的包，混淆它们会导致此类错误。","https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fissues\u002F27",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},15115,"如何在 Windows 系统上成功安装 COMET？","在 Windows 上安装可能会遇到依赖问题（如 test-tube）。建议按以下步骤操作：\n1. 创建 Conda 环境：`conda create --name comet_windows_3_7 python=3.7`\n2. 激活环境：`conda activate comet_windows_3_7`\n3. 安装特定版本的 Torch：`pip install torch==1.4.0 -f https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Ftorch_stable.html`\n4. 安装 COMET：`pip install unbabel-comet==0.1.0 torch==1.4.0`\n如果从源码构建，可能需要在 poetry 中添加特定版本的 torch wheel。","https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fissues\u002F17",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},15116,"使用 XLMRoberta-large 模型训练时显存不足（OOM）怎么办？","即使在 24GB 显存的显卡上，使用 XLMRoberta-large 或 XL 模型进行全量训练仍然非常困难，仅调整精度（precision）可能不够。\n建议解决方案：\n1. 尝试使用 LoRA (Low-Rank Adaptation) 技术，这可能是在有限显存下训练大模型的有效方案。\n2. 
如果必须全量训练，可能需要进一步减小 batch size（但这会显著降低速度）或升级硬件。","https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fissues\u002F208",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},15117,"COMETKiwi 模型的架构中，为什么选择第一个 token 而不是平均池化来计算句子嵌入？","在 WMT22\u002FWMT23 的 COMETKiwi 模型中，确实采用了取第一个 token 作为句子嵌入的方式，并通过逐层和前馈层计算得分。虽然具体的选择原因在讨论中被截断，但这种设计通常是为了利用预训练模型中 [CLS] 标记（或序列起始标记）所聚合的全局语义信息，这在许多下游分类和评分任务中表现优于简单的平均池化。具体配置可在模型的 hparams.yaml 文件中查看。","https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fissues\u002F216",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},15118,"如何使用自己的数据训练自定义的 COMET 评估指标？","您可以使用 `comet-train` 命令训练自己的排名模型。\n1. 数据格式：准备包含 \"src\" (源文), \"mt\" (机器翻译), \"ref\" (参考译文), \"score\" (得分) 字段的数据文件。\n2. 模型选择：可以在配置中指定编码器，例如 `bert-base-multilingual-cased`。\n3. 运行训练：使用 `comet-train` 配合相应的配置文件（如 trainer.yaml）。\n注意：如果在初始化模型时看到关于未使用权重的警告（例如从 BertForPreTraining 初始化 BertForSequenceClassification），这是预期行为，只要模型架构兼容即可继续训练。","https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fissues\u002F61",[138,143,148,153,158,163,168,173,178,183,188,193,198,203,208,213,218,223,228],{"id":139,"version":140,"summary_zh":141,"released_at":142},84747,"v2.2.7","修复了一些小问题，以关闭 [#248](https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fpull\u002F248)","2025-09-01T17:28:34",{"id":144,"version":145,"summary_zh":146,"released_at":147},84748,"v2.2.6","此版本修复了 #244 中报告的问题。\n\n与先前版本（2.2.5）一样，在使用 `comet-score` 命令时，它也会默认启用 `mode.half()`。这将使所有模型运行更快并降低内存占用，但同时也会导致评分结果出现极小的差异。\n\n例如：\n版本 2.2.4\nhyp1.txt 第 0 段得分：0.841**3**\nhyp1.txt 第 1 段得分：0.960**4**\nhyp1.txt 总得分：0.90**09**\n\n版本 2.2.6\nhyp1.txt 第 0 段得分：0.841**6**\nhyp1.txt 第 1 段得分：0.960**7**\nhyp1.txt 总得分：0.90**11**","2025-04-07T10:37:45",{"id":149,"version":150,"summary_zh":151,"released_at":152},84749,"v2.2.5","与 #195、#243 相关的少量错误修复。\n\n`cli-score` 现在默认使用 `model.half`，从而在时间和内存效率上都有所提升。\n\n**警告**：\n此版本中，部分模型的得分与先前版本相比出现了不一致。请改用 2.2.6 
版本。","2025-03-26T17:12:05",{"id":154,"version":155,"summary_zh":156,"released_at":157},84750,"v2.2.4","修复了一些小问题，以关闭 #236、#235 和 #231。","2024-12-05T13:11:37",{"id":159,"version":160,"summary_zh":161,"released_at":162},84751,"v2.2.3","自v2.2.2以来的 minor bug 修复和功能改进","2024-11-27T18:09:28",{"id":164,"version":165,"summary_zh":166,"released_at":167},84752,"v2.2.1","修复了问题 #177、#178、#185、#183 和 #191 的小改动\n","2024-01-08T14:19:19",{"id":169,"version":170,"summary_zh":171,"released_at":172},84753,"v2.2.0","[xCOMET 模型](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10482.pdf)正式发布！\r\n\r\n为了便于与内部代码集成，我们新增了一个类 [XCOMETMetric](https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fblob\u002Fv2.2.0\u002Fcomet\u002Fmodels\u002Fmultitask\u002Fxcomet_metric.py#L33C7-L33C19)。本次发布仅支持此类模型的推理，训练功能目前仍未完全实现。\r\n\r\n这些模型可通过 Hugging Face Hub 获取：\n- [XCOMET-XL](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FXCOMET-XL)\n- [XCOMET-XXL](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002FXCOMET-XXL)","2023-10-23T20:50:00",{"id":174,"version":175,"summary_zh":176,"released_at":177},84754,"v2.1.1","修复了 MBR 默认模型中的一个小错误。\n将默认的 QE 模型回滚至 CometKiwi-22（因其轻量级）。\n小幅更新依赖项。","2023-10-13T21:12:49",{"id":179,"version":180,"summary_zh":181,"released_at":182},84755,"v2.1.0","发布了 CometKiwi [-XL](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt23-cometkiwi-da-xl) 和 [-XXL](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt23-cometkiwi-da-xxl)\n\n升级 torchmetrics（“^0.10.2”）和 PyTorch Lightning（“^2.0.0”）(#159)\n\n更新了 MODELS.md 和 LICENSE.models.md 文档。","2023-09-21T20:35:55",{"id":184,"version":185,"summary_zh":186,"released_at":187},84756,"v2.0.2","修复了一个小 bug，更新了 Hugging Face Hub，并发布了 [CometKiwi 模型](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt22-cometkiwi-da)：\n- 升级 Hugging Face Hub 版本至 ^0.16.0\n- 增加了检查点下载的灵活性 (#156)\n- [CometKiwi](https:\u002F\u002Fhuggingface.co\u002FUnbabel\u002Fwmt22-cometkiwi-da) 
模型终于发布。该模型为开源，采用非商业许可。","2023-08-16T11:22:32",{"id":189,"version":190,"summary_zh":191,"released_at":192},84757,"v2.0.1","Minor updates to release v2.0.0\r\n\r\n- Update torch dependency (#119, #126)\r\n- Minor Improvements\u002FBug fixes: (#122, #123, #124)\r\n- Specify device during inference (#120)","2023-04-05T10:41:49",{"id":194,"version":195,"summary_zh":196,"released_at":197},84758,"v2.0.0","- **New model architecture (UnifiedMetric)** inspired by [UniTE](https:\u002F\u002Faclanthology.org\u002F2022.acl-long.558\u002F). \r\n       - This model uses cross-encoding (similar to [BLEURT](https:\u002F\u002Faclanthology.org\u002F2020.acl-main.704\u002F)), works **with and without references** and can be trained in a multitask setting. This model is also implemented in a **very flexible** way where we can decide to train using just source and MT, reference and MT or source, MT and reference. \r\n \r\n- New encoder models [RemBERT](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.12821) and [XLM-RoBERTa-XL](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2105.00572.pdf)\r\n\r\n- New training features: \r\n       - **System-level accuracy** [(Kocmi et al, 2021)](https:\u002F\u002Faclanthology.org\u002F2021.wmt-1.57.pdf) reported during validation (only if validation files has a `system` column).\r\n       - **Support for multiple training files** (each file will be loaded at the end of the corresponding epoch): This is helpful to **train with large datasets** and to **train following a curriculum**.\r\n       - **Support for multiple validation files**: Before we were using 1 single validation file with all language pairs concatenated which has an impact in correlations. With this change we now can have 1 validation file for each language and correlations will be averaged over all validation sets. 
This also allows for the use of validation files where the ground truth scores are in different scales.\r\n       - **Support to HuggingFace Hub**: Models can now be easily added to HuggingFace Hub and used directly using the CLI\r\n       \r\n- With this release we also add **New models from WMT 22**: \r\n       1) We won the WMT 22 QE shared task: Using UnifiedMetric it should be easy to replicate our final system, nonetheless we are planning to release the system that was used: `wmt22-cometkiwi-da` which performs strongly both on data from the QE task (MLQE-PE corpus) and on data from metrics task (MQM annotations).\r\n       2) We were 2nd in the Metrics task (1st place was MetricXL a 6B parameter metric trained on top of mT5-XXL). Our new model  `wmt22-comet-da` was part of the ensemble used to secure our result.\r\n\r\nIf you are interested in our work from this year please read the following paper:\r\n- [CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task](https:\u002F\u002Fwww.statmt.org\u002Fwmt22\u002Fpdf\u002F2022.wmt-1.60.pdf)\r\n- [COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task](https:\u002F\u002Fwww.statmt.org\u002Fwmt22\u002Fpdf\u002F2022.wmt-1.52.pdf)\r\n\r\nAnd the corresponding findings papers:\r\n- [Findings of the WMT 2022 Shared Task on Quality Estimation](https:\u002F\u002Fwww.statmt.org\u002Fwmt22\u002Fpdf\u002F2022.wmt-1.3.pdf)\r\n- [Results of WMT22 Metrics Shared Task: Stop Using BLEU – Neural Metrics Are Better and More Robust](https:\u002F\u002Fwww.statmt.org\u002Fwmt22\u002Fpdf\u002F2022.wmt-1.2.pdf)\r\n\r\nSpecial thanks to all the involved people: @mtreviso @nunonmg @glushkovato @chryssa-zrv @jsouza @DuarteMRAlves @Catarinafarinha @cmaroti","2023-03-13T16:54:18",{"id":199,"version":200,"summary_zh":201,"released_at":202},84759,"v1.1.3","Same as v1.1.2 but we bumped some requirements in order to be easier to use COMET on Windows and Apple M1.   
","2023-01-13T14:44:14",{"id":204,"version":205,"summary_zh":206,"released_at":207},84760,"v1.1.2","Just minor requirement updates to avoid installation errors described in #82\r\n","2022-06-06T18:59:58",{"id":209,"version":210,"summary_zh":211,"released_at":212},84761,"v1.1.1","1) comet-compare to support multiple system comparisons.\r\n2) Bugfix: Broken link for wmt21-comet-qe-da (#78)\r\n3) Bugfix: protobuf dependency (#82)\r\n4) New models from Cometinho [EAMT 22 paper](https:\u002F\u002Faclanthology.org\u002F2022.eamt-1.9\u002F) (eamt22-cometinho-da & eamt22-comet-prune-da)\r\n\r\n## Breaking Changed\r\n`comet-compare` does not support `-x`and `-y` flags. Now it receives a single flag `-t` with multiple arguments for multiples systems.\r\n\r\nBefore:\r\n```\r\ncomet-compare -s src.de -x hyp1.en -y hyp2.en -r ref.en\r\n```\r\n\r\nAfter:\r\n\r\n```\r\ncomet-compare -s src.de -t hyp1.en hyp2.en -r ref.en\r\n```\r\n\r\n## Contributors\r\n* @erip (#69, #70)\r\n* @SamuelLarkin (#74)\r\n* @Joao-Maria-Janeiro (#75, #77)\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fcompare\u002Fv1.0.1...v1.1.0","2022-06-01T22:47:27",{"id":214,"version":215,"summary_zh":216,"released_at":217},84762,"v1.1.0","1) Updated [documentation](https:\u002F\u002Funbabel.github.io\u002FCOMET\u002Fhtml\u002Findex.html)\r\n2) Updated Pytorch Lightning version to avoid security vulnerabilities (Untrusted Data & Code Injection)\r\n3) Inspired by [Amrhein et al, 2022](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.05148) we added the `comet-mbr` command for fast Minimum Bayes Risk Decoding.\r\n4) New encoder models\r\n\r\n## What's Changed\r\n* Fix minor typo in exception message by @alvations in https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fpull\u002F57\r\n* Adds --quiet flag by @Remorax in https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fpull\u002F58\r\n* Bug fix of num_workers. 
by @devrimcavusoglu in https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fpull\u002F63\r\n* fix encoding issues for Windows users by @erip in https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fpull\u002F68\r\n\r\n## New Contributors\r\n* @alvations made their first contribution in https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fpull\u002F57\r\n* @Remorax made their first contribution in https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fpull\u002F58\r\n* @devrimcavusoglu made their first contribution in https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fpull\u002F63\r\n* @erip made their first contribution in https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fpull\u002F68\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FUnbabel\u002FCOMET\u002Fcompare\u002Fv1.0.1...v1.1.0","2022-04-02T18:52:45",{"id":219,"version":220,"summary_zh":221,"released_at":222},84763,"v1.0.1","Scipy missing from dependencies list.","2021-11-19T15:51:52",{"id":224,"version":225,"summary_zh":226,"released_at":227},84764,"v1.0.0","What's new?\r\n\r\n1) comet-compare command for statistical comparison between two models\r\n2) comet-score with multiple hypothesis\u002Fsystems\r\n3) Embeddings caching for faster inference (thanks to @jsouza).\r\n4) Length Batching for faster inference (thanks to @CoderPat)\r\n5) Integration with SacreBLEU for dataset downloading (thanks to @mjpost)\r\n6) Monte-carlo Dropout for uncertainty estimation (thanks to @glushkovato and @chryssa-zrv)\r\n7) Some code refactoring\r\n\r\nHopefully, this version is also easier to install than the previous one that relied on fairseq.\r\n\r\n","2021-11-19T14:59:59",{"id":229,"version":230,"summary_zh":231,"released_at":232},84765,"0.1.0","- We now use Poetry to solve dependency issues.\r\n- Removed LASER encoder and FastBPE dependencies (Windows users can now run COMET)\r\n- Removed references requirements for QE models","2021-03-11T17:55:36"]