[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-primeqa--primeqa":3,"tool-primeqa--primeqa":62},[4,18,26,36,46,54],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",158594,2,"2026-04-16T23:34:05",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":42,"last_commit_at":43,"category_tags":44,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,45],"插件",{"id":47,"name":48,"github_repo":49,"description_zh":50,"stars":51,"difficulty_score":32,"last_commit_at":52,"category_tags":53,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 
都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":55,"name":56,"github_repo":57,"description_zh":58,"stars":59,"difficulty_score":32,"last_commit_at":60,"category_tags":61,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[45,13,15,14],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":65,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":76,"owner_url":77,"languages":78,"stars":103,"forks":104,"last_commit_at":105,"license":106,"difficulty_score":10,"env_os":107,"env_gpu":108,"env_ram":107,"env_deps":109,"category_tags":119,"github_topics":121,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":139,"updated_at":140,"faqs":141,"releases":175},8376,"primeqa\u002Fprimeqa","primeqa","The prime repository for state-of-the-art Multilingual Question Answering research and development.","PrimeQA 是一个专注于多语言问答（QA）研究与开发的开源平台，旨在帮助用户轻松训练和部署最先进的问答模型。它主要解决了跨语言场景下信息获取的难题，让用户能够基于自定义数据复现顶级 NLP 会议中的前沿实验，或直接调用预训练模型进行高效推理。\n\n这款工具特别适合自然语言处理领域的研究人员和开发者使用。无论是希望验证最新算法的学者，还是需要将多语言问答能力集成到应用中的工程师，都能从中受益。PrimeQA 基于 Hugging Face Transformers 构建，提供了从数据加载到模型评估的一站式流程。\n\n其核心技术亮点在于支持端到端的多语言问答全链路功能：既包含传统（如 BM25）与神经（如 
ColBERT、DPR）信息检索模块，能精准定位相关文档；也涵盖多语言机器阅读理解，可从文中提取或生成答案；甚至支持多语言问题生成及检索增强生成（RAG），能结合 GPT 等大模型生成高质量回复。凭借在 XOR-TyDi、TyDiQA 等多个权威榜单上的领先表现，PrimeQA 已成为探索多语言智能问答技术的可靠基石。","\u003C!---\nCopyright 2022 IBM Corp.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n-->\n\n\u003Ch3 align=\"center\">\n    \u003Cimg width=\"350\" alt=\"primeqa\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_4fac6610ddf9.png\">\n    \u003Cp>The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development.\u003C\u002Fp>\n\u003C\u002Fh3>\n\n![Build Status](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Factions\u002Fworkflows\u002Fprimeqa-ci.yml\u002Fbadge.svg)\n[![LICENSE|Apache2.0](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fsaltstack\u002Fsalt?color=blue)](https:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0.txt)\n[![sphinx-doc-build](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Factions\u002Fworkflows\u002Fsphinx-doc-build.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Factions\u002Fworkflows\u002Fsphinx-doc-build.yml)   \n\nPrimeQA is a public open source repository that enables researchers and developers to train state-of-the-art models for question answering (QA). 
By using PrimeQA, a researcher can replicate the experiments outlined in a paper published in the latest NLP conference while also enjoying the capability to download pre-trained models (from an online repository) and run them on their own custom data. PrimeQA is built on top of the [Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) toolkit and uses [datasets](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fviewer\u002F) and [models](https:\u002F\u002Fhuggingface.co\u002FPrimeQA) that are directly downloadable.\n\n\nThe models within PrimeQA support end-to-end question answering. PrimeQA answers questions via \n- [Information Retrieval](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fprimeqa\u002Fir): Retrieve documents and passages using both traditional (e.g. BM25) and neural (e.g. ColBERT) models.\n- [Multilingual Machine Reading Comprehension](https:\u002F\u002Fhuggingface.co\u002Fibm\u002Ftydiqa-primary-task-xlm-roberta-large): Extract and\u002For generate answers given the source document or passage.\n- [Multilingual Question Generation](https:\u002F\u002Fhuggingface.co\u002FPrimeQA\u002Fmt5-base-tydi-question-generator): Supports generation of questions for effective domain adaptation over [tables](https:\u002F\u002Fhuggingface.co\u002FPrimeQA\u002Ft5-base-table-question-generator) and [multilingual text](https:\u002F\u002Fhuggingface.co\u002FPrimeQA\u002Fmt5-base-tydi-question-generator).\n- [Retrieval Augmented Generation](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Fblob\u002Fmain\u002Fnotebooks\u002Fretriever-reader-pipelines\u002Fprompt_reader_with_GPT.ipynb): Generate answers using the GPT-3\u002FChatGPT pretrained models, conditioned on retrieved passages. 
\n\nSome examples of supported models (applicable to benchmark datasets) are:\n- [Traditional IR with BM25](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fprimeqa\u002Fir\u002F) Pyserini\n- [Neural IR with ColBERT, DPR](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fprimeqa\u002Fir) (collaboration with [Stanford NLP](https:\u002F\u002Fnlp.stanford.edu\u002F) IR led by [Chris Potts](https:\u002F\u002Fweb.stanford.edu\u002F~cgpotts\u002F) & [Matei Zaharia](https:\u002F\u002Fcs.stanford.edu\u002F~matei\u002F)).\nReplicating the experiments that [Dr. Decr](https:\u002F\u002Fhuggingface.co\u002Fibm\u002FDrDecr_XOR-TyDi_whitebox) (Li et al., 2022) performed to reach the top of the XOR-TyDi leaderboard.\n- [Machine Reading Comprehension with XLM-R](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fprimeqa\u002Fmrc): to replicate the experiments that reached the top of the TyDi leaderboard, with performance similar to that of the IBM GAAMA system. Coming soon: code to replicate GAAMA's performance on Natural Questions. 
\n\n## 🏅 Top of the Leaderboard\n\nPrimeQA is at the top of several leaderboards: XOR-TyDi, TyDiQA-main, OTT-QA and HybridQA.\n\n### [XOR-TyDi](https:\u002F\u002Fnlp.cs.washington.edu\u002Fxorqa\u002F)\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_4170fc578687.png\" width=\"50%\">\n\n### [TyDiQA-main](https:\u002F\u002Fai.google.com\u002Fresearch\u002Ftydiqa)\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_94d5d43ea6af.png\" width=\"50%\">\n\n### [OTT-QA](https:\u002F\u002Fcodalab.lisn.upsaclay.fr\u002Fcompetitions\u002F7967)\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_dec62912506a.png\" width=\"50%\">\n\n### [HybridQA](https:\u002F\u002Fcodalab.lisn.upsaclay.fr\u002Fcompetitions\u002F7979)\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_f9ae46b9d3f3.png\" width=\"50%\">\n\n## ✔️ Getting Started\n\n### Installation\n[Installation doc](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa\u002Finstallation.html)       \n\n```shell\n# cd to project root\n\n# If you want to run on GPU make sure to install torch appropriately\n\n# E.g. for torch 1.11 + CUDA 11.3:\npip install 'torch~=1.11.0' --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu113\n\n# Install as editable (-e) or non-editable using pip, with extras (e.g. tests) as desired\n# Example installation commands:\n\n# Minimal install (non-editable)\npip install .\n\n# GPU support\npip install .[gpu]\n\n# Full install (editable)\npip install -e .[all]\n```\n\nPlease note that dependencies (specified in [setup.py](.\u002Fsetup.py)) are pinned to provide a stable experience.\nWhen installing from source these can be modified, however this is not officially supported.\n\n**Note:** in many environments, conda-forge based faiss libraries perform substantially better than the default ones installed with pip. 
To install faiss libraries from conda-forge, use the following steps:\n\n- Create and activate a conda environment\n- Install faiss libraries, using the command\n\n```conda install -c conda-forge faiss=1.7.0 faiss-gpu=1.7.0```\n\n- In `setup.py`, remove the faiss-related lines:\n\n```commandline\n\"faiss-cpu~=1.7.2\": [\"install\", \"gpu\"],\n\"faiss-gpu~=1.7.2\": [\"gpu\"],\n```\n\n- Continue with the `pip install` commands as described above.\n\n\n### JAVA requirements\nJava 11 is required for BM25 retrieval. Install Java as follows:\n\n```shell\nconda install -c conda-forge openjdk=11\n```\n## :speech_balloon: Blog Posts\nThere are several blog posts by members of the open source community on how they've been using PrimeQA for their needs. Read some of them:\n1. [PrimeQA and GPT 3](https:\u002F\u002Fwww.marktechpost.com\u002F2023\u002F03\u002F03\u002Fwith-just-20-lines-of-python-code-you-can-do-retrieval-augmented-gpt-based-qa-using-this-open-source-repository-called-primeqa\u002F)\n2. [Enterprise search with PrimeQA](https:\u002F\u002Fheidloff.net\u002Farticle\u002Fintroduction-neural-information-retrieval\u002F)\n3. 
[A search engine for Trivia geeks](https:\u002F\u002Fwww.deleeuw.me.uk\u002Fposts\u002FUsing-PrimeQA-For-NLP-Question-Answering\u002F)\n\n\n## 🧪 Unit Tests\n[Testing doc](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa\u002Ftesting.html)       \n\nTo run the unit tests you first need to [install PrimeQA](#Installation).\nMake sure to install with the `[tests]` or `[all]` extras from pip.\n\nFrom there you can run the tests via pytest, for example:\n```shell\npytest --cov PrimeQA --cov-config .coveragerc tests\u002F\n```\n\nFor more information, see:\n- Our [tox.ini](.\u002Ftox.ini)\n- The [pytest](https:\u002F\u002Fdocs.pytest.org) and [tox](https:\u002F\u002Ftox.wiki\u002Fen\u002Flatest\u002F) documentation    \n\n## 🔭 Learn more\n\n| Section | Description |\n|-|-|\n| 📒 [Documentation](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa) | Full API documentation and tutorials |\n| 🏁 [Quick tour: Entry Points for PrimeQA](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fprimeqa) | Different entry points for PrimeQA: Information Retrieval, Reading Comprehension, TableQA and Question Generation |\n| 📓 [Tutorials: Jupyter Notebooks](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fnotebooks) | Notebooks to get started on QA tasks |\n| 📓 [GPT-3\u002FChatGPT Reader Notebooks](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fnotebooks\u002Fmrc\u002FLLM_reader_predict_mode.ipynb) | Notebooks to get started with the GPT-3\u002FChatGPT reader components|\n| 💻 [Examples: Applying PrimeQA on various QA tasks](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fexamples) | Example scripts for fine-tuning PrimeQA models on a range of QA tasks |\n| 🤗 [Model sharing and uploading](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_sharing) | Upload and share your fine-tuned models with the community |\n| ✅ [Pull 
Request](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa\u002Fpull_request_template.html) | PrimeQA Pull Request |\n| 📄 [Generate Documentation](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa\u002FREADME.html) | How Documentation works |        \n| 🛠 [Orchestrator Service REST Microservice](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa\u002Forchestrator.html) | Proof-of-concept code for PrimeQA Orchestrator microservice |        \n| 📖 [Tooling UI](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa\u002Ftooling_ui.html) | Demo UI |        \n\n## ❤️ PrimeQA collaborators include       \n\n| | | | |\n|:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|\n|\u003Cimg width=\"75\" alt=\"stanford\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_bb9e84c8ae80.png\">| Stanford NLP |\u003Cimg width=\"75\" alt=\"i\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_92a5b6abfb0f.png\">| University of Illinois |\n|\u003Cimg width=\"75\" alt=\"stuttgart\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_e95c05b41b34.png\">| University of Stuttgart | \u003Cimg width=\"75\" alt=\"notredame\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_9f3469619b90.png\">| University of Notre Dame |\n|\u003Cimg width=\"75\" alt=\"ohio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_d509f7beaed6.png\">| Ohio State University |\u003Cimg width=\"75\" alt=\"carnegie\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_c3cc7401823c.png\">| Carnegie Mellon University |\n|\u003Cimg width=\"75\" alt=\"massachusetts\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_b7adfa7afbd4.png\">| University of Massachusetts |\u003Cimg width=\"75\" height=\"75\" alt=\"ibm\" 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_9ab88b980eb6.png\">| IBM Research |\n| | | | |\n\n\n\u003Cbr>\n\u003Cbr>\n\u003Cbr>\n\u003Cbr>\n\u003Cdiv align=\"center\">\n    \u003Cimg width=\"30\" alt=\"primeqa\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_a58c4aee383d.png\">\n\u003C\u002Fdiv>\n","\u003C!---\n版权所有 © 2022 IBM公司。\n\n根据 Apache License, Version 2.0（“许可证”）授权；\n除非符合许可证的规定，否则不得使用此文件。\n您可以在以下网址获取许可证副本：\n\n    http:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0\n\n除非适用法律要求或书面同意，否则根据“AS IS”基础分发的软件，\n不提供任何形式的保证或条件，无论是明示的还是默示的。\n有关权限和限制的具体语言，请参阅许可证。\n-->\n\n\u003Ch3 align=\"center\">\n    \u003Cimg width=\"350\" alt=\"primeqa\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_4fac6610ddf9.png\">\n    \u003Cp>最先进的多语言问答研究与开发的首选资源库。\u003C\u002Fp>\n\u003C\u002Fh3>\n\n![构建状态](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Factions\u002Fworkflows\u002Fprimeqa-ci.yml\u002Fbadge.svg)\n[![许可证|Apache2.0](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fsaltstack\u002Fsalt?color=blue)](https:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0.txt)\n[![sphinx-doc-build](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Factions\u002Fworkflows\u002Fsphinx-doc-build.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Factions\u002Fworkflows\u002Fsphinx-doc-build.yml)   \n\nPrimeQA是一个公开的开源代码库，使研究人员和开发者能够训练用于问答（QA）的最先进模型。通过使用PrimeQA，研究人员可以复现最新自然语言处理会议论文中描述的实验，同时还可以从在线存储库下载预训练模型，并在自己的自定义数据上运行这些模型。PrimeQA基于[Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)工具包构建，并使用可直接下载的[数据集](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fviewer\u002F)和[模型](https:\u002F\u002Fhuggingface.co\u002FPrimeQA)。\n\nPrimeQA中的模型支持端到端的问答任务。PrimeQA通过以下方式回答问题：\n- 
[信息检索](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fprimeqa\u002Fir)：使用传统方法（如BM25）和神经网络模型（如ColBERT）检索文档和段落。\n- [多语言机器阅读理解](https:\u002F\u002Fhuggingface.co\u002Fibm\u002Ftydiqa-primary-task-xlm-roberta-large)：根据源文档或段落提取或生成答案。\n- [多语言问题生成](https:\u002F\u002Fhuggingface.co\u002FPrimeQA\u002Fmt5-base-tydi-question-generator)：支持为有效的领域适应生成问题，适用于[表格](https:\u002F\u002Fhuggingface.co\u002FPrimeQA\u002Ft5-base-table-question-generator)和[多语言文本](https:\u002F\u002Fhuggingface.co\u002FPrimeQA\u002Fmt5-base-tydi-question-generator)。\n- [检索增强生成](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Fblob\u002Fmain\u002Fnotebooks\u002Fretriever-reader-pipelines\u002Fprompt_reader_with_GPT.ipynb)：利用GPT-3\u002FChatGPT预训练模型，在检索到的段落基础上生成答案。\n\n一些支持的模型示例（适用于基准数据集）包括：\n- [传统的BM25信息检索](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fprimeqa\u002Fir\u002F) Pyserini\n- [基于ColBERT、DPR的神经网络信息检索](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fprimeqa\u002Fir)（与[斯坦福NLP](https:\u002F\u002Fnlp.stanford.edu\u002F)合作，由[Chris Potts](https:\u002F\u002Fweb.stanford.edu\u002F~cgpotts\u002F)和[Matei Zaharia](https:\u002F\u002Fcs.stanford.edu\u002F~matei\u002F)领导的信息检索团队）。\n复现了[Dr. 
Decr](https:\u002F\u002Fhuggingface.co\u002Fibm\u002FDrDecr_XOR-TyDi_whitebox)（Li等人，2022年）为登上XOR TyDI排行榜榜首所进行的实验。\n- [基于XLM-R的机器阅读理解](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fprimeqa\u002Fmrc)：复现登上TyDI排行榜榜首的实验，性能与IBM GAAMA系统相似。不久将提供复现GAAMA在Natural Questions数据集上表现的代码。\n\n## 🏅 排行榜榜首\n\nPrimeQA位于多个排行榜的榜首：XOR-TyDi、TyDiQA-main、OTT-QA和HybridQA。\n\n### [XOR-TyDi](https:\u002F\u002Fnlp.cs.washington.edu\u002Fxorqa\u002F)\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_4170fc578687.png\" width=\"50%\">\n\n### [TyDiQA-main](https:\u002F\u002Fai.google.com\u002Fresearch\u002Ftydiqa)\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_94d5d43ea6af.png\" width=\"50%\">\n\n### [OTT-QA](https:\u002F\u002Fcodalab.lisn.upsaclay.fr\u002Fcompetitions\u002F7967)\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_dec62912506a.png\" width=\"50%\">\n\n### [HybridQA](https:\u002F\u002Fcodalab.lisn.upsaclay.fr\u002Fcompetitions\u002F7979)\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_f9ae46b9d3f3.png\" width=\"50%\">\n\n## ✔️ 开始使用\n\n### 安装\n[安装文档](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa\u002Finstallation.html)       \n\n```shell\n# 进入项目根目录\n\n# 如果要在GPU上运行，请确保正确安装PyTorch\n\n# 例如，对于PyTorch 1.11 + CUDA 11.3：\npip install 'torch~=1.11.0' --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu113\n\n# 使用pip以可编辑（-e）或不可编辑的方式安装，并根据需要添加额外组件（如测试）\n# 示例安装命令：\n\n# 最小化安装（不可编辑）\npip install .\n\n# GPU支持\npip install .[gpu]\n\n# 完整安装（可编辑）\npip install -e .[all]\n```\n\n请注意，依赖项（在[setup.py](.\u002Fsetup.py)中指定）已被固定版本锁定，以提供稳定的使用体验。从源代码安装时可以修改这些依赖项，但官方并不支持这种做法。\n\n**注意：** 在许多环境中，基于conda-forge的faiss库比使用pip安装的默认库性能要好得多。要从conda-forge安装faiss库，请按照以下步骤操作：\n\n- 创建并激活一个conda环境\n- 使用以下命令安装faiss库：\n\n```conda install -c conda-forge faiss=1.7.0 faiss-gpu=1.7.0```\n\n- 
在`setup.py`中移除与faiss相关的行：\n\n```commandline\n\"faiss-cpu~=1.7.2\": [\"install\", \"gpu\"],\n\"faiss-gpu~=1.7.2\": [\"gpu\"],\n```\n\n- 继续按照上述说明执行`pip install`命令。\n\n\n### JAVA要求\nBM25检索需要Java 11。请按以下步骤安装Java：\n\n```shell\nconda install -c conda-forge openjdk=11\n```\n## :speech_balloon: 博客文章\n开源社区成员撰写了多篇博客文章，介绍了他们如何使用PrimeQA来满足自身需求。请阅读其中几篇：\n1. [PrimeQA与GPT 3](https:\u002F\u002Fwww.marktechpost.com\u002F2023\u002F03\u002F03\u002Fwith-just-20-lines-of-python-code-you-can-do-retrieval-augmented-gpt-based-qa-using-this-open-source-repository-called-primeqa\u002F)\n2. [使用PrimeQA进行企业搜索](https:\u002F\u002Fheidloff.net\u002Farticle\u002Fintroduction-neural-information-retrieval\u002F)\n3. [面向Trivia迷的搜索引擎](https:\u002F\u002Fwww.deleeuw.me.uk\u002Fposts\u002FUsing-PrimeQA-For-NLP-Question-Answering\u002F)\n\n\n## 🧪 单元测试\n[测试文档](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa\u002Ftesting.html)       \n\n要运行单元测试，您首先需要[安装PrimeQA](#Installation)。请务必使用pip安装时包含[tests]或[all]附加组件。\n\n之后，您可以使用pytest运行测试，例如：\n```shell\npytest --cov PrimeQA --cov-config .coveragerc tests\u002F\n```\n\n更多信息，请参阅：\n- 我们的[tox.ini](.\u002Ftox.ini)\n- [pytest](https:\u002F\u002Fdocs.pytest.org)和[tox](https:\u002F\u002Ftox.wiki\u002Fen\u002Flatest\u002F)的文档\n\n## 🔭 了解更多\n\n| 片段 | 描述 |\n|-|-|\n| 📒 [文档](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa) | 完整的 API 文档和教程 |\n| 🏁 [快速入门：PrimeQA 的入口点](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fprimeqa) | PrimeQA 的不同入口点：信息检索、阅读理解、表格问答和问题生成 |\n| 📓 [教程：Jupyter 笔记本](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fnotebooks) | 用于开始 QA 任务的笔记本 |\n| 📓 [GPT-3\u002FChatGPT 阅读器笔记本](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fnotebooks\u002Fmrc\u002FLLM_reader_predict_mode.ipynb) | 用于开始使用 GPT-3\u002FChatGPT 阅读器组件的笔记本 |\n| 💻 [示例：在各种 QA 任务上应用 PrimeQA](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Ftree\u002Fmain\u002Fexamples) | 用于在一系列 QA 任务上微调 PrimeQA 模型的示例脚本 
|\n| 🤗 [模型分享与上传](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_sharing) | 将您微调后的模型上传并与社区共享 |\n| ✅ [拉取请求](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa\u002Fpull_request_template.html) | PrimeQA 拉取请求 |\n| 📄 [生成文档](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa\u002FREADME.html) | 文档的工作原理 |\n| 🛠 [编排服务 REST 微服务](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa\u002Forchestrator.html) | PrimeQA 编排微服务的概念验证代码 |\n| 📖 [工具 UI](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa\u002Ftooling_ui.html) | 演示 UI |\n\n## ❤️ PrimeQA 合作伙伴包括\n\n| | | | |\n|:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|\n|\u003Cimg width=\"75\" alt=\"斯坦福\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_bb9e84c8ae80.png\">| 斯坦福 NLP |\u003Cimg width=\"75\" alt=\"i\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_92a5b6abfb0f.png\">| 伊利诺伊大学 |\n|\u003Cimg width=\"75\" alt=\"斯图加特\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_e95c05b41b34.png\">| 斯图加特大学 | \u003Cimg width=\"75\" alt=\"圣母大学\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_9f3469619b90.png\">| 圣母大学 |\n|\u003Cimg width=\"75\" alt=\"俄亥俄州立大学\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_d509f7beaed6.png\">| 俄亥俄州立大学 |\u003Cimg width=\"75\" alt=\"卡内基梅隆大学\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_c3cc7401823c.png\">| 卡内基梅隆大学 |\n|\u003Cimg width=\"75\" alt=\"马萨诸塞大学\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_b7adfa7afbd4.png\">| 马萨诸塞大学 |\u003Cimg width=\"75\" height=\"75\" alt=\"IBM\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_9ab88b980eb6.png\">| IBM 研究 |\n| | | | |\n\n\n\u003Cbr>\n\u003Cbr>\n\u003Cbr>\n\u003Cbr>\n\u003Cdiv align=\"center\">\n    
\u003Cimg width=\"30\" alt=\"primeqa\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_readme_a58c4aee383d.png\">\n\u003C\u002Fdiv>","# PrimeQA 快速上手指南\n\nPrimeQA 是一个开源的多语言问答（QA）研究与开发平台，基于 Hugging Face Transformers 构建。它支持端到端的问答流程，包括信息检索（IR）、机器阅读理解（MRC）、多语言问题生成以及检索增强生成（RAG）。\n\n## 1. 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux 或 macOS (Windows 需通过 WSL2 运行)\n*   **Python**: 推荐 Python 3.8+\n*   **Java**: 若需使用 BM25 传统检索功能，必须安装 **Java 11**。\n*   **GPU (可选)**: 若需加速训练或推理，请确保已安装适配的 CUDA 驱动。\n\n### 前置依赖安装\n\n**安装 Java 11 (BM25 检索必需):**\n```shell\nconda install -c conda-forge openjdk=11\n```\n\n**优化 Faiss 性能 (强烈推荐):**\n为了获得比默认 pip 包更好的性能，建议使用 conda-forge 安装 Faiss：\n```shell\n# 创建并激活 conda 环境\nconda create -n primeqa python=3.8\nconda activate primeqa\n\n# 安装 faiss 库\nconda install -c conda-forge faiss=1.7.0 faiss-gpu=1.7.0\n```\n*注意：如果您通过 conda 安装了 faiss，请在后续执行 `pip install` 前，编辑项目根目录下的 `setup.py` 文件，移除其中关于 `faiss-cpu` 和 `faiss-gpu` 的依赖行，以避免冲突。*\n\n## 2. 安装步骤\n\n克隆仓库并进入项目根目录：\n```shell\ngit clone https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa.git\ncd primeqa\n```\n\n根据您的硬件环境选择以下一种安装方式：\n\n**方案 A：最小化安装 (仅 CPU)**\n```shell\npip install .\n```\n\n**方案 B：GPU 支持安装**\n首先确保安装了正确的 PyTorch 版本（例如 torch 1.11 + CUDA 11.3）：\n```shell\npip install 'torch~=1.11.0' --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu113\n```\n然后安装 PrimeQA GPU 版本：\n```shell\npip install .[gpu]\n```\n\n**方案 C：完整开发环境安装 (可编辑模式)**\n包含测试工具和所有额外依赖：\n```shell\npip install -e .[all]\n```\n\n> **国内加速提示**：如果 pip 下载速度较慢，建议添加国内镜像源，例如：\n> `pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple .[gpu]`\n\n## 3. 基本使用\n\nPrimeQA 提供了多种入口点，包括信息检索、阅读理解和表格问答等。以下是最基础的模型加载与使用示例（基于 Hugging Face Transformers 接口）。\n\n### 示例：加载预训练模型进行推理\n\n您可以直接从 Hugging Face Hub 下载 PrimeQA 托管的预训练模型。以下示例展示如何加载一个多语言阅读理解模型：\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForQuestionAnswering\nimport torch\n\n# 1. 
指定模型名称 (例如：TyDiQA 主任务模型)\nmodel_name = \"ibm\u002Ftydiqa-primary-task-xlm-roberta-large\"\n\n# 2. 加载分词器和模型\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForQuestionAnswering.from_pretrained(model_name)\n\n# 3. 准备输入数据\ncontext = \"PrimeQA is an open source repository for question answering research.\"\nquestion = \"What is PrimeQA?\"\n\ninputs = tokenizer(question, context, return_tensors=\"pt\")\n\n# 4. 执行推理\nwith torch.no_grad():\n    outputs = model(**inputs)\n\n# 5. 解析答案\nanswer_start_index = torch.argmax(outputs.start_logits)\nanswer_end_index = torch.argmax(outputs.end_logits) + 1\n\npredict_answer_tokens = inputs.input_ids[0, answer_start_index:answer_end_index]\nanswer = tokenizer.decode(predict_answer_tokens)\n\nprint(f\"问题：{question}\")\nprint(f\"答案：{answer}\")\n```\n\n### 进阶使用提示\n\n*   **信息检索 (IR)**: 查看 `primeqa\u002Fir` 目录，支持 BM25 (Pyserini) 和神经检索 (ColBERT, DPR)。\n*   **检索增强生成 (RAG)**: 参考 `notebooks\u002Fretriever-reader-pipelines\u002F` 中的 Notebook，了解如何结合 GPT-3\u002FChatGPT 与检索到的段落生成答案。\n*   **微调训练**: 查看 `examples` 目录下的脚本，学习如何在自定义数据集上微调 PrimeQA 模型。\n\n更多详细教程和 API 文档请访问 [PrimeQA 官方文档](https:\u002F\u002Fprimeqa.github.io\u002Fprimeqa)。","某跨国电商企业的本地化团队需要构建一个支持中、英、西等多语言的智能客服系统，以便从海量多语种产品文档中自动提取答案响应用户咨询。\n\n### 没有 primeqa 时\n- **多语言模型开发门槛高**：团队需分别寻找不同语言的预训练模型并手动对齐数据格式，缺乏统一框架导致重复造轮子。\n- **检索与阅读理解割裂**：传统关键词检索（如 BM25）无法理解语义，而神经检索模型又难以直接对接阅读理解模块，端到端流程搭建极其复杂。\n- **复现前沿算法困难**：想要应用最新的跨语言问答（XOR-TyDi）SOTA 成果，却因缺少官方代码和权重文件，耗费数周仍无法复现论文效果。\n- **领域适配成本高昂**：面对特有的商品表格数据和混合文本，缺乏有效的多语言问题生成工具来扩充训练数据，导致模型在垂直领域表现不佳。\n\n### 使用 primeqa 后\n- **一站式多语言开发**：直接调用 primeqa 内置的 XLM-R 等多语言 MRC 模型，统一接口即可处理十余种语言，大幅降低开发复杂度。\n- **端到端流水线集成**：利用其集成的 ColBERT 神经检索与阅读理解组件，轻松构建“检索 - 阅读”闭环，显著提升答案准确率。\n- **快速复现顶尖性能**：一键下载已在 XOR-TyDi 等榜单登顶的预训练模型（如 DrDecr），立即在自有数据上验证并部署业界最强效果。\n- **高效领域数据增强**：使用内置的多语言问题生成模型，针对商品表格自动生成高质量问答对，快速完成模型在垂直领域的微调适配。\n\nprimeqa 
通过提供标准化的多语言问答全栈能力，让企业能以最低成本快速落地具备世界领先水平的智能问答系统。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprimeqa_primeqa_4fac6610.png","PrimeQA","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fprimeqa_bc4401d2.jpg","",null,"https:\u002F\u002Fgithub.com\u002Fprimeqa",[79,83,87,91,95,99],{"name":80,"color":81,"percentage":82},"Python","#3572A5",60.1,{"name":84,"color":85,"percentage":86},"Jupyter Notebook","#DA5B0B",38.1,{"name":88,"color":89,"percentage":90},"C++","#f34b7d",1.1,{"name":92,"color":93,"percentage":94},"Shell","#89e051",0.4,{"name":96,"color":97,"percentage":98},"Cuda","#3A4E3A",0.2,{"name":100,"color":101,"percentage":102},"Dockerfile","#384d54",0,739,57,"2026-03-10T20:58:03","Apache-2.0","未说明","非必需（支持 CPU），若需 GPU 加速需安装对应版本的 torch。示例中提及支持 CUDA 11.3 (torch 1.11)。建议通过 conda 安装 faiss-gpu 以获得更好性能。",{"notes":110,"python":107,"dependencies":111},"1. BM25 检索功能强制要求安装 Java 11 (可通过 conda 安装 openjdk=11)。\n2. 强烈建议使用 conda-forge 安装 faiss 库 (faiss=1.7.0, faiss-gpu=1.7.0) 以获得比 pip 默认版本更好的性能，此时需手动修改 setup.py 移除相关的 faiss 依赖行。\n3. 
该工具基于 Hugging Face Transformers 构建，支持多语言问答、信息检索及检索增强生成 (RAG) 等功能。",[112,113,114,115,116,117,118],"torch~=1.11.0","transformers","datasets","faiss-cpu~=1.7.2","faiss-gpu~=1.7.2","Pyserini","openjdk=11",[13,35,14,15,120],"其他",[122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138],"ibm","ibm-research-ai","machine-learning","natural-language-processing","nlp","python","pytorch","question-answering","ai","dpr","language-model","neural-search","semantic-search","transfer-learning","bert","neural-information-retrieval","squad","2026-03-27T02:49:30.150509","2026-04-17T10:19:26.534925",[142,147,152,157,162,167,171],{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},37494,"如何为 ELI5 数据集实现长形式问答（LFQA）并使用 FiD 模型？","PrimeQA 已通过 PR #310 添加了对长形式问答的支持，具体包括使用 run_mrc.py 运行 FiD 模型（一种用于 QA 的生成式 seq2seq 模型）。实现组件包括：预处理器（Preprocessor）、FiD 模型、数据整理器（Data collator）、后处理器（Postprocessor）以及 Rouge 评估指标。目前该功能已合并并可用。","https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Fissues\u002F300",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},37495,"在 PrimeQA 中集成 ReasonBERT 模型时，如何处理长文档的分块和结果聚合？","ReasonBERT 的阅读器实现主要涉及标准的分词和分块（chunking）。对于长段落（如 TriviaQA 和 SearchQA，通常超过 700 token），需要将上下文分块，并在之后聚合不同块的得分。原始训练示例数为 128\u002F1024，但分块会导致实际训练样本略多。具体的结果聚合逻辑可参考 ReasonBERT 官方仓库中的 data_loaders.py 文件。如果性能与论文报告有差异，通常是因为阅读器实现或答案聚合方式的细微差别。","https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Fissues\u002F304",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},37496,"如何在 PrimeQA 中构建和使用置信度校准（Confidence Calibration）模块？","置信度校准模块已集成到 PrimeQA NAACL 版本中。实施步骤包括：\n1. 创建训练\u002F评估管道（run_confidence.py），输入为 MRC 模型、数据集（如 TyDiQA）和预测文件，输出为置信度模型。\n2. 
增强 MRC 管道以使用置信度评分器：\n   - 添加置信度任务头（extractive mrc head + confidence output）。\n   - 扩展 MRC 输出以包含置信度特征。\n   - 在解码阶段调用置信度评分器。\n代码中需修改 ExtractiveQAModelOutput 类，增加相应的 logits 输出。实验显示，加入 Dropout 特征和 ColBert 特征可以有效降低预期校准误差（ECE）。","https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Fissues\u002F91",{"id":158,"question_zh":159,"answer_zh":160,"source_url":161},37497,"在运行基于 DPR 的检索引擎时，Faiss 版本和批次大小对查询速度有何影响？","在单张 A100 GPU 上运行 XOR-TyDi 开发集测试时，不同配置下的查询速度（queries\u002Fs）如下：\n- Faiss 1.7.0 (conda), batch size 1: 10.6 q\u002Fs\n- Faiss 1.7.0 (conda), batch size 16: 14.4 q\u002Fs\n- Faiss 1.7.2 (pip), batch size 1: 4.90 q\u002Fs\n- Faiss 1.7.2 (pip), batch size 16: 5.98 q\u002Fs\n建议使用 conda 安装的 Faiss 1.7.0 并设置较大的批次大小（如 16）以获得最佳性能。此外，bert-base-multilingual 模型在某些机器上可能较慢，建议更换机器或重新检查环境。","https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Fissues\u002F169",{"id":163,"question_zh":164,"answer_zh":165,"source_url":166},37498,"在处理 KILT-ELI5 数据集时，如何确定维基百科段落与 ELI5 答案之间的相似度阈值？","通过对 5000 个训练 - 开发样本的分析，发现当相似度阈值设为 0.25 时效果较为合适。数据显示，仅使用 ELI5 数据的相似度超过 0.1 的有 1543 对，而结合维基百科数据后增加到 1948 对。具体的相似度分布表明，>0.3 的高相似度样本中，包含维基百科信息的组合（如 eli5-wiki-wiki）数量显著多于纯 ELI5 组合，因此 0.25 是一个平衡召回率和精度的适当阈值。","https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Fissues\u002F135",{"id":168,"question_zh":169,"answer_zh":170,"source_url":161},37499,"ColBERT 和 DPR 模型在跨语言 XOR-TyDi 检索任务上的表现有何差异？","在 XOR-TyDi 检索任务（无答案选择）的开发集上，不同模型的 R@5kt 和 R@2kt 指标对比如下：\n- ColBERT (XLMR, 英文训练): R@5kt 75.1, R@2kt 70.1\n- ColBERT (XLMR, 跨语言训练): R@5kt 53.4, R@2kt 46.0\n- DPR (论文基准，英文测试): R@5kt 69.6, R@2kt 62.2\n- DPR (英文人类版本训练): R@5kt 70.2, R@2kt 64.3\n- DPR (bert-base-multilingual, 跨语言训练): R@5kt 46.3, R@2kt 37.9\n结果表明，使用英文数据训练的 ColBERT 和 DPR 模型在英文测试集上表现最好，而跨语言训练或使用多语言 BERT 的 DPR 模型性能下降明显。",{"id":172,"question_zh":173,"answer_zh":174,"source_url":166},37500,"为什么从 Transformers 4.3.3 升级到 4.17.0 后，之前的训练选择策略不再有效？","在将 Transformers 库从 4.3.3 升级到 4.17.0 后，原有的训练选择策略（用于 KILT-ELI5 
数据集的实验）未能显示出相对于基线的改进。这可能是因为新版本库内部机制的变化导致旧策略失效。目前该项目处于非活跃状态，建议用户在使用新版本 Transformers 时重新评估或调整训练策略，或者直接采用最新的基线方法。",[176,181,186,191,196,201,206,211,216,221],{"id":177,"version":178,"summary_zh":179,"released_at":180},298024,"v0.15.2-alpha","教程笔记本\n可搜索语料库类\npredict 方法已弃用，由重新排序器组件中的 rerank 方法取代\nDPR 重新排序器\n嵌入组件\n\n","2023-06-28T11:53:46",{"id":182,"version":183,"summary_zh":184,"released_at":185},298025,"v0.14.5-alpha","文档集合工具\r\n错误修复重排序组件\n\n[https:\u002F\u002Fpypi.org\u002Fproject\u002Fprimeqa\u002F0.14.5\u002F](https:\u002F\u002Fpypi.org\u002Fproject\u002Fprimeqa\u002F0.14.5\u002F)\n提交：31f78de","2023-06-13T18:16:19",{"id":187,"version":188,"summary_zh":189,"released_at":190},298026,"v0.14.2-alpha","https:\u002F\u002Fpypi.org\u002Fproject\u002Fprimeqa\u002F0.14.2\u002F\n生成式阅读器\n重排序组件\nUDAPDR","2023-04-19T15:18:05",{"id":192,"version":193,"summary_zh":194,"released_at":195},298027,"v0.11.8-alpha","[ec1ea11](https:\u002F\u002Fgithub.com\u002Fprimeqa\u002Fprimeqa\u002Fcommit\u002Fec1ea1103d566f6a4c70a10d4c9754dc1ad64bb9)","2023-03-07T19:22:48",{"id":197,"version":198,"summary_zh":199,"released_at":200},298028,"v0.9.9-alpha","抽取式阅读理解组件中更新了默认的阅读模型  \n“[PrimeQA](https:\u002F\u002Fhuggingface.co\u002FPrimeQA)\u002F[nq_tydi_sq1-reader-xlmr_large-20221110](https:\u002F\u002Fhuggingface.co\u002FPrimeQA\u002Fnq_tydi_sq1-reader-xlmr_large-20221110)”","2022-11-15T15:39:21",{"id":202,"version":203,"summary_zh":204,"released_at":205},298029,"v0.9.8-alpha","改进服务组件的初始化\n文档更新\n- 如何设置服务存储并快速集成索引、检查点\n- 通过 Conda 安装 FAISS 库\n","2022-11-15T01:53:10",{"id":207,"version":208,"summary_zh":209,"released_at":210},298030,"v0.9.7-alpha","1. 添加 MRQA 预处理工具  \n2. 简化 run_ir 的参数  \n3. 支持标注 EM 数据  \n4. 更新 README，说明如何通过 Conda 安装 Faiss  \n4. 更新 services\u002FREADME，介绍如何使用反馈数据进行微调","2022-11-10T13:42:41",{"id":212,"version":213,"summary_zh":214,"released_at":215},298031,"v0.9.3-alpha","本次发布新增以下支持：\n\n1. 密集段落检索（DPR）\n2. 流水线与组件接口\n3. gRPC 和 REST 服务层\n4. 将 PrimeQA 容器化（CPU 和 GPU 版本）\n5. 
使用自定义数据和反馈数据进行阅读理解型问答（MRC）训练\n6. 自动化文档构建\n7. 更新了 PrimeQA 的 logo","2022-10-27T16:26:18",{"id":217,"version":218,"summary_zh":219,"released_at":220},298032,"v0.9.3-2022-10-24","这会标记用于构建 PrimeQA 容器并将其发布到 Docker Hub 的版本：https:\u002F\u002Fhub.docker.com\u002Fr\u002Fprimeqa\u002Fservices\u002Ftags。","2022-10-24T19:14:07",{"id":222,"version":223,"summary_zh":224,"released_at":225},298033,"v0.8.0-alpha","本版本包含以下内容：\n- 通过 primeqa\u002Fir\u002Frun_ir.py 脚本实现的 IR 支持稠密检索和稀疏检索\n- 通过 primeqa\u002Fmrc\u002Frun_mrc.py 实现的 MRC 支持 TyDI、SQUAD 和 MLQA 数据集\n- TyDIQA 对布尔型问题的支持\n- 基于表格和文本的问题生成\n- 使用 GitHub Actions 实现持续集成","2022-08-09T14:32:10"]