[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-cdqa-suite--cdQA":3,"tool-cdqa-suite--cdQA":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":67,"owner_avatar_url":76,"owner_bio":77,"owner_company":78,"owner_location":78,"owner_email":78,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":86,"forks":87,"last_commit_at":88,"license":89,"difficulty_score":10,"env_os":90,"env_gpu":91,"env_ram":92,"env_deps":93,"category_tags":99,"github_topics":100,"view_count":10,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":110,"updated_at":111,"faqs":112,"releases":142},654,"cdqa-suite\u002FcdQA","cdQA","⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.","cdQA 是一款专为封闭领域设计的端到端问答系统，底层依托于强大的 HuggingFace transformers 库。它旨在帮助技术团队轻松构建基于私有文档的智能问答机器人，解决了传统 NLP 项目中数据预处理复杂、模型集成门槛高的问题。\n\n对于开发者与研究人员而言，cdQA 提供了从数据准备到模型部署的一站式体验。用户只需准备包含标题和段落的结构化数据，或利用内置转换器直接处理 PDF、Markdown 等常见文档格式，即可快速启动训练流程。系统内置了预训练模型下载、训练、预测及评估模块，大幅降低了搭建垂直领域问答应用的技术成本。\n\n值得注意的是，cdQA 目前已进入非维护状态，仅保留用于教育目的。若寻求生产环境中的稳定替代方案，建议参考其推荐的 Haystack 框架。但在理解问答系统原理及快速原型验证方面，cdQA 依然是一个值得探索的优秀开源项目。","# cdQA: Closed Domain Question Answering\n\n[![Build 
Status](https:\u002F\u002Ftravis-ci.com\u002Fcdqa-suite\u002FcdQA.svg?branch=master)](https:\u002F\u002Ftravis-ci.com\u002Fcdqa-suite\u002FcdQA)\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fcdqa-suite\u002FcdQA\u002Fbranch\u002Fmaster\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fcdqa-suite\u002FcdQA)\n[![PyPI Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fcdqa.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fcdqa\u002F)\n[![PyPI Downloads](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Fcdqa.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fcdqa\u002F)\n[![Binder](https:\u002F\u002Fmybinder.org\u002Fbadge_logo.svg)](https:\u002F\u002Fmybinder.org\u002Fv2\u002Fgh\u002Fcdqa-suite\u002FcdQA\u002Fmaster?filepath=examples%2Ftutorial-first-steps-cdqa.ipynb)\n[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fcdqa-suite\u002FcdQA\u002Fblob\u002Fmaster\u002Fexamples\u002Ftutorial-first-steps-cdqa.ipynb)\n[![Contributor Covenant](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FContributor%20Covenant-v1.4%20adopted-ff69b4.svg)](.github\u002FCODE_OF_CONDUCT.md)\n[![PRs Welcome](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRs-welcome-brightgreen.svg)](http:\u002F\u002Fmakeapullrequest.com)\n![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fcdqa-suite\u002FcdQA.svg)\n\nAn End-To-End Closed Domain Question Answering System. Built on top of the HuggingFace [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) library.\n\n**⛔ [NOT MAINTAINED] This repository is no longer maintained, but is being kept around for educational purposes. 
If you want a maintained alternative to cdQA check out: https:\u002F\u002Fgithub.com\u002Fdeepset-ai\u002Fhaystack**\n\n## cdQA in details\n\nIf you are interested in understanding how the system works and its implementation, we wrote an [article on Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-to-create-your-own-question-answering-system-easily-with-python-2ef8abc8eb5) with a high-level explanation.\n\nWe also made a presentation during the \\#9 NLP Breakfast organised by [Feedly](https:\u002F\u002Ffeedly.com). You can check it out [here](https:\u002F\u002Fblog.feedly.com\u002Fnlp-breakfast-9-closed-domain-question-answering\u002F).\n\n## Table of Contents \u003C!-- omit in toc -->\n\n- [Installation](#installation)\n  - [With pip](#with-pip)\n  - [From source](#from-source)\n  - [Hardware Requirements](#hardware-requirements)\n- [Getting started](#getting-started)\n  - [Preparing your data](#preparing-your-data)\n    - [Manual](#manual)\n    - [With converters](#with-converters)\n  - [Downloading pre-trained models and data](#downloading-pre-trained-models-and-data)\n  - [Training models](#training-models)\n  - [Making predictions](#making-predictions)\n  - [Evaluating models](#evaluating-models)\n- [Notebook Examples](#notebook-examples)\n- [Deployment](#deployment)\n  - [Manual](#manual-1)\n- [Contributing](#contributing)\n- [References](#references)\n- [LICENSE](#license)\n\n## Installation\n\n### With pip\n\n```shell\npip install cdqa\n```\n\n### From source\n\n```shell\ngit clone https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA.git\ncd cdQA\npip install -e .\n```\n\n### Hardware Requirements\n\nExperiments have been done with:\n\n- **CPU** 👉 AWS EC2 `t2.medium` Deep Learning AMI (Ubuntu) Version 22.0\n- **GPU** 👉 AWS EC2 `p3.2xlarge` Deep Learning AMI (Ubuntu) Version 22.0 + a single Tesla V100 16GB.\n\n## Getting started\n\n### Preparing your data\n\n#### Manual\n\nTo use `cdQA` you need to create a pandas dataframe with the following columns:\n\n| title             | 
paragraphs                                             |\n| ----------------- | ------------------------------------------------------ |\n| The Article Title | [Paragraph 1 of Article, ... , Paragraph N of Article] |\n\n#### With converters\n\nThe objective of `cdqa` converters is to make it easy to create this dataframe from your raw documents database. For instance the `pdf_converter` can create a `cdqa` dataframe from a directory containing `.pdf` files:\n\n```python\nfrom cdqa.utils.converters import pdf_converter\n\ndf = pdf_converter(directory_path='path_to_pdf_folder')\n```\n\nYou will need to install [Java OpenJDK](https:\u002F\u002Fopenjdk.java.net\u002Finstall\u002F) to use this converter. We currently have converters for:\n\n- pdf\n- markdown\n\nWe plan to improve and add more converters in the future. Stay tuned!\n\n### Downloading pre-trained models and data\n\nYou can download the models and data manually from the GitHub [releases](https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA\u002Freleases) or use our download functions:\n\n```python\nfrom cdqa.utils.download import download_squad, download_model, download_bnpp_data\n\ndirectory = 'path-to-directory'\n\n# Downloading data\ndownload_squad(dir=directory)\ndownload_bnpp_data(dir=directory)\n\n# Downloading pre-trained BERT fine-tuned on SQuAD 1.1\ndownload_model('bert-squad_1.1', dir=directory)\n\n# Downloading pre-trained DistilBERT fine-tuned on SQuAD 1.1\ndownload_model('distilbert-squad_1.1', dir=directory)\n```\n\n### Training models\n\nFit the pipeline on your corpus using the pre-trained reader:\n\n```python\nimport pandas as pd\nfrom ast import literal_eval\nfrom cdqa.pipeline import QAPipeline\n\ndf = pd.read_csv('your-custom-corpus-here.csv', converters={'paragraphs': literal_eval})\n\ncdqa_pipeline = QAPipeline(reader='bert_qa.joblib') # use 'distilbert_qa.joblib' for DistilBERT instead of BERT\ncdqa_pipeline.fit_retriever(df=df)\n```\n\nIf you want to fine-tune the reader on your 
custom SQuAD-like annotated dataset:\n\n```python\ncdqa_pipeline = QAPipeline(reader='bert_qa.joblib') # use 'distilbert_qa.joblib' for DistilBERT instead of BERT\ncdqa_pipeline.fit_reader('path-to-custom-squad-like-dataset.json')\n```\n\nSave the reader model after fine-tuning:\n```python\ncdqa_pipeline.dump_reader('path-to-save-bert-reader.joblib')\n```\n### Making predictions\n\nTo get the best prediction given an input query:\n\n```python\ncdqa_pipeline.predict(query='your question')\n```\n\nTo get the N best predictions:\n```python\ncdqa_pipeline.predict(query='your question', n_predictions=N)\n```\n\nThere is also the possibility to change the weight of the retriever score\nversus the reader score in the computation of final ranking score (the default is 0.35, which is shown to be the best weight on the development set of SQuAD 1.1-open)\n\n```python\ncdqa_pipeline.predict(query='your question', retriever_score_weight=0.35)\n```\n\n### Evaluating models\n\nIn order to evaluate models on your custom dataset you will need to annotate it. The annotation process can be done in 3 steps:\n\n1. Convert your pandas DataFrame into a json file with SQuAD format:\n\n    ```python\n    from cdqa.utils.converters import df2squad\n\n    json_data = df2squad(df=df, squad_version='v1.1', output_dir='.', filename='dataset-name')\n    ```\n\n2. Use an annotator to add ground truth question-answer pairs:\n\n    Please refer to our [`cdQA-annotator`](https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA-annotator), a web-based annotator for closed-domain question answering datasets with SQuAD format.\n\n3. Evaluate the pipeline object:\n\n    ```python\n    from cdqa.utils.evaluation import evaluate_pipeline\n\n    evaluate_pipeline(cdqa_pipeline, 'path-to-annotated-dataset.json')\n\n    ```\n\n4. 
Evaluate the reader:\n\n    ```python\n    from cdqa.utils.evaluation import evaluate_reader\n\n    evaluate_reader(cdqa_pipeline, 'path-to-annotated-dataset.json')\n    ```\n\n## Notebook Examples\n\nWe prepared some notebook examples under the [examples](examples) directory.\n\nYou can also play directly with these notebook examples using [Binder](https:\u002F\u002Fgke.mybinder.org\u002F) or [Google Colaboratory](https:\u002F\u002Fcolab.research.google.com\u002Fnotebooks\u002Fwelcome.ipynb):\n\n| Notebook                         | Hardware     | Platform                                                                                                                                                                                                                                                                                                                                      |\n| -------------------------------- | ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [1] First steps with cdQA        | CPU or GPU | [![Binder](https:\u002F\u002Fmybinder.org\u002Fbadge_logo.svg)](https:\u002F\u002Fmybinder.org\u002Fv2\u002Fgh\u002Fcdqa-suite\u002FcdQA\u002Fmaster?filepath=examples%2Ftutorial-first-steps-cdqa.ipynb) [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fcdqa-suite\u002FcdQA\u002Fblob\u002Fmaster\u002Fexamples\u002Ftutorial-first-steps-cdqa.ipynb)   |\n| [2] Using the PDF converter      | CPU or GPU | 
[![Binder](https:\u002F\u002Fmybinder.org\u002Fbadge_logo.svg)](https:\u002F\u002Fmybinder.org\u002Fv2\u002Fgh\u002Fcdqa-suite\u002FcdQA\u002Fmaster?filepath=examples%2Ftutorial-use-pdf-converter.ipynb) [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fcdqa-suite\u002FcdQA\u002Fblob\u002Fmaster\u002Fexamples\u002Ftutorial-use-pdf-converter.ipynb) |\n| [3] Training the reader on SQuAD | GPU        | [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fcdqa-suite\u002FcdQA\u002Fblob\u002Fmaster\u002Fexamples\u002Ftutorial-train-reader-squad.ipynb)                                                                                                                                                         |\n\nBinder and Google Colaboratory provide temporary environments and may be slow to start but we recommend them if you want to get started with `cdQA` easily.\n\n## Deployment\n\n### Manual\n\nYou can deploy a `cdQA` REST API by executing:\n\n```shell\nexport dataset_path=path-to-dataset.csv\nexport reader_path=path-to-reader-model\n\nFLASK_APP=api.py flask run -h 0.0.0.0\n```\n\nYou can now make requests to test your API (here using [HTTPie](https:\u002F\u002Fhttpie.org\u002F)):\n\n```shell\nhttp localhost:5000\u002Fapi query=='your question here'\n```\n\nIf you wish to serve a user interface on top of your `cdQA` system, follow the instructions of [cdQA-ui](https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA-ui), a web interface developed for `cdQA`.\n\n## Contributing\n\nRead our [Contributing Guidelines](.github\u002FCONTRIBUTING.md).\n\n## References\n\n| Type                 | Title                                                                                                                                        | Author                                                 
                                | Year |\n| -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- | ---- |\n| :video_camera: Video | [Stanford CS224N: NLP with Deep Learning Lecture 10 – Question Answering](https:\u002F\u002Fyoutube.com\u002Fwatch?v=yIdF-17HwSk)                           | Christopher Manning                                                                    | 2019 |\n| :newspaper: Paper    | [Reading Wikipedia to Answer Open-Domain Questions](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.00051)                                                        | Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes                                   | 2017 |\n| :newspaper: Paper    | [Neural Reading Comprehension and Beyond](https:\u002F\u002Fcs.stanford.edu\u002Fpeople\u002Fdanqi\u002Fpapers\u002Fthesis.pdf)                                            | Danqi Chen                                                                             | 2018 |\n| :newspaper: Paper    | [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.04805)                         | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova                           | 2018 |\n| :newspaper: Paper    | [Contextual Word Representations: A Contextual Introduction](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.06006)                                               | Noah A. 
Smith                                                                          | 2019 |\n| :newspaper: Paper    | [End-to-End Open-Domain Question Answering with BERTserini](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.01718)                                                | Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin | 2019 |\n| :newspaper: Paper    | [Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.06652)                                 | Wei Yang, Yuqing Xie, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin                        | 2019 |\n| :newspaper: Paper    | [Passage Re-ranking with BERT](https:\u002F\u002Farxiv.org\u002Fabs\u002F1901.04085)                                                                             | Rodrigo Nogueira, Kyunghyun Cho                                                        | 2019 |\n| :newspaper: Paper    | [MRQA: Machine Reading for Question Answering](https:\u002F\u002Fmrqa.github.io\u002F)                                                                      | Jonathan Berant, Percy Liang, Luke Zettlemoyer                                         | 2019 |\n| :newspaper: Paper    | [Unsupervised Question Answering by Cloze Translation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.04980)                                                     | Patrick Lewis, Ludovic Denoyer, Sebastian Riedel                                       | 2019 |\n| :computer: Framework | [Scikit-learn: Machine Learning in Python](http:\u002F\u002Fjmlr.csail.mit.edu\u002Fpapers\u002Fv12\u002Fpedregosa11a.html)                                           | Pedregosa et al.                                                                       
| 2011 |\n| :computer: Framework | [PyTorch](https:\u002F\u002Farxiv.org\u002Fabs\u002F1912.01703)                                                                                                  | Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan                               | 2019 |\n| :computer: Framework | [Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) | Hugging Face                                                                           | 2018 |\n\n## LICENSE\n\n[Apache-2.0](LICENSE)\n","# cdQA：封闭领域问答系统\n\n[![Build Status](https:\u002F\u002Ftravis-ci.com\u002Fcdqa-suite\u002FcdQA.svg?branch=master)](https:\u002F\u002Ftravis-ci.com\u002Fcdqa-suite\u002FcdQA)\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fcdqa-suite\u002FcdQA\u002Fbranch\u002Fmaster\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fcdqa-suite\u002FcdQA)\n[![PyPI Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fcdqa.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fcdqa\u002F)\n[![PyPI Downloads](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Fcdqa.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fcdqa\u002F)\n[![Binder](https:\u002F\u002Fmybinder.org\u002Fbadge_logo.svg)](https:\u002F\u002Fmybinder.org\u002Fv2\u002Fgh\u002Fcdqa-suite\u002FcdQA\u002Fmaster?filepath=examples%2Ftutorial-first-steps-cdqa.ipynb)\n[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fcdqa-suite\u002FcdQA\u002Fblob\u002Fmaster\u002Fexamples\u002Ftutorial-first-steps-cdqa.ipynb)\n[![Contributor Covenant](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FContributor%20Covenant-v1.4%20adopted-ff69b4.svg)](.github\u002FCODE_OF_CONDUCT.md)\n[![PRs 
Welcome](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRs-welcome-brightgreen.svg)](http:\u002F\u002Fmakeapullrequest.com)\n![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fcdqa-suite\u002FcdQA.svg)\n\n一个端到端的封闭领域问答（Closed Domain Question Answering）系统。基于 HuggingFace [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) 库构建。\n\n**⛔ [不再维护]** 此仓库已不再维护，但保留用于教育目的。如果您需要一个仍在维护的 cdQA 替代方案，请查看：https:\u002F\u002Fgithub.com\u002Fdeepset-ai\u002Fhaystack\n\n## cdQA 详解\n\n如果您对了解系统工作原理及其实现感兴趣，我们撰写了一篇 [Medium 文章](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-to-create-your-own-question-answering-system-easily-with-python-2ef8abc8eb5)，其中对系统原理进行了高层次的讲解。\n\n我们还参加了由 [Feedly](https:\u002F\u002Ffeedly.com) 组织的第 9 届 NLP Breakfast 会议并进行了演示。您可以在[此处](https:\u002F\u002Fblog.feedly.com\u002Fnlp-breakfast-9-closed-domain-question-answering\u002F)查看。\n\n## 目录 \u003C!-- omit in toc -->\n\n- [安装](#安装)\n  - [使用 pip](#使用-pip)\n  - [从源码](#从源码)\n  - [硬件要求](#硬件要求)\n- [入门指南](#入门指南)\n  - [准备数据](#准备数据)\n    - [手动](#手动)\n    - [使用转换器](#使用转换器)\n  - [下载预训练模型和数据](#下载预训练模型和数据)\n  - [训练模型](#训练模型)\n  - [进行预测](#进行预测)\n  - [评估模型](#评估模型)\n- [Notebook 示例](#notebook-示例)\n- [部署](#部署)\n  - [手动](#手动-1)\n- [贡献](#贡献)\n- [参考文献](#参考文献)\n- [许可证](#许可证)\n\n## 安装\n\n### 使用 pip\n\n```shell\npip install cdqa\n```\n\n### 从源码\n\n```shell\ngit clone https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA.git\ncd cdQA\npip install -e .\n```\n\n### 硬件要求\n\n实验是在以下环境下进行的：\n\n- **CPU** 👉 AWS EC2 `t2.medium` Deep Learning AMI (Ubuntu) Version 22.0\n- **GPU** 👉 AWS EC2 `p3.2xlarge` Deep Learning AMI (Ubuntu) Version 22.0 + 单张 Tesla V100 16GB。\n\n## 入门指南\n\n### 准备数据\n\n#### 手动\n\n要使用 `cdQA`，您需要创建一个包含以下列的 pandas DataFrame（pandas 数据框）：\n\n| title             | paragraphs                                             |\n| ----------------- | 
------------------------------------------------------ |\n| The Article Title | [Paragraph 1 of Article, ... , Paragraph N of Article] |\n\n#### 使用转换器\n\n`cdqa` 转换器的目标是让您能够轻松地从原始文档数据库创建此 DataFrame。例如，`pdf_converter` 可以从包含 `.pdf` 文件的目录创建 `cdqa` DataFrame：\n\n```python\nfrom cdqa.utils.converters import pdf_converter\n\ndf = pdf_converter(directory_path='path_to_pdf_folder')\n```\n\n您需要安装 [Java OpenJDK](https:\u002F\u002Fopenjdk.java.net\u002Finstall\u002F) 才能使用此转换器。我们目前拥有以下转换器：\n\n- pdf\n- markdown\n\n我们计划在未来改进并添加更多转换器。敬请期待！\n\n### 下载预训练模型和数据\n\n您可以从 GitHub [发布页面](https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA\u002Freleases) 手动下载模型和数据，或者使用我们的下载函数：\n\n```python\nfrom cdqa.utils.download import download_squad, download_model, download_bnpp_data\n\ndirectory = 'path-to-directory'\n\n# Downloading data\ndownload_squad(dir=directory)\ndownload_bnpp_data(dir=directory)\n\n# Downloading pre-trained BERT fine-tuned on SQuAD 1.1\ndownload_model('bert-squad_1.1', dir=directory)\n\n# Downloading pre-trained DistilBERT fine-tuned on SQuAD 1.1\ndownload_model('distilbert-squad_1.1', dir=directory)\n```\n\n### 训练模型\n\n使用预训练的阅读器在您的语料库上拟合管道：\n\n```python\nimport pandas as pd\nfrom ast import literal_eval\nfrom cdqa.pipeline import QAPipeline\n\ndf = pd.read_csv('your-custom-corpus-here.csv', converters={'paragraphs': literal_eval})\n\ncdqa_pipeline = QAPipeline(reader='bert_qa.joblib') # use 'distilbert_qa.joblib' for DistilBERT instead of BERT\ncdqa_pipeline.fit_retriever(df=df)\n```\n\n如果您想在自定义的 SQuAD 风格标注数据集上微调阅读器：\n\n```python\ncdqa_pipeline = QAPipeline(reader='bert_qa.joblib') # use 'distilbert_qa.joblib' for DistilBERT instead of BERT\ncdqa_pipeline.fit_reader('path-to-custom-squad-like-dataset.json')\n```\n\n微调后保存阅读器模型：\n```python\ncdqa_pipeline.dump_reader('path-to-save-bert-reader.joblib')\n```\n### 进行预测\n\n获取给定输入查询的最佳预测：\n\n```python\ncdqa_pipeline.predict(query='your question')\n```\n\n获取 N 个最佳预测：\n```python\ncdqa_pipeline.predict(query='your question', 
n_predictions=N)\n```\n\n还可以更改检索器分数与阅读器分数的权重，以计算最终排名分数（默认值为 0.35，这在 SQuAD 1.1-open 的开发集上被证明是最佳权重）\n\n```python\ncdqa_pipeline.predict(query='your question', retriever_score_weight=0.35)\n```\n\n### 评估模型\n\n要在自定义数据集上评估模型，您需要对其进行标注。标注过程可分为 3 个步骤：\n\n1. 将您的 pandas DataFrame 转换为具有 SQuAD 格式的 json 文件：\n\n    ```python\n    from cdqa.utils.converters import df2squad\n\n    json_data = df2squad(df=df, squad_version='v1.1', output_dir='.', filename='dataset-name')\n    ```\n\n2. 使用标注工具添加 ground truth（真实值）问题 - 答案对：\n\n    请参阅我们的 [`cdQA-annotator`](https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA-annotator)，这是一个用于带有 SQuAD 格式的封闭域问答数据集的基于 Web 的 annotator（标注工具）。\n\n3. 评估 pipeline（流水线）对象：\n\n    ```python\n    from cdqa.utils.evaluation import evaluate_pipeline\n\n    evaluate_pipeline(cdqa_pipeline, 'path-to-annotated-dataset.json')\n\n    ```\n\n4. 评估 reader（阅读模型）：\n\n    ```python\n    from cdqa.utils.evaluation import evaluate_reader\n\n    evaluate_reader(cdqa_pipeline, 'path-to-annotated-dataset.json')\n    ```\n\n## Notebook 示例\n\n我们在 [examples](examples) 目录下准备了一些 Notebook 示例。\n\n您也可以使用 [Binder](https:\u002F\u002Fgke.mybinder.org\u002F) 或 [Google Colaboratory](https:\u002F\u002Fcolab.research.google.com\u002Fnotebooks\u002Fwelcome.ipynb) 直接运行这些 Notebook 示例：\n\n| Notebook                         | Hardware     | Platform                                                                                                                                                                                                                                                                                                                                      |\n| -------------------------------- | ------------ | 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| [1] cdQA 入门步骤                | CPU 或 GPU   | [![Binder](https:\u002F\u002Fmybinder.org\u002Fbadge_logo.svg)](https:\u002F\u002Fmybinder.org\u002Fv2\u002Fgh\u002Fcdqa-suite\u002FcdQA\u002Fmaster?filepath=examples%2Ftutorial-first-steps-cdqa.ipynb) [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fcdqa-suite\u002FcdQA\u002Fblob\u002Fmaster\u002Fexamples\u002Ftutorial-first-steps-cdqa.ipynb)   |\n| [2] 使用 PDF 转换器              | CPU 或 GPU   | [![Binder](https:\u002F\u002Fmybinder.org\u002Fbadge_logo.svg)](https:\u002F\u002Fmybinder.org\u002Fv2\u002Fgh\u002Fcdqa-suite\u002FcdQA\u002Fmaster?filepath=examples%2Ftutorial-use-pdf-converter.ipynb) [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fcdqa-suite\u002FcdQA\u002Fblob\u002Fmaster\u002Fexamples\u002Ftutorial-use-pdf-converter.ipynb) |\n| [3] 在 SQuAD 上训练阅读模型      | GPU          | [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fcdqa-suite\u002FcdQA\u002Fblob\u002Fmaster\u002Fexamples\u002Ftutorial-train-reader-squad.ipynb)                                                                                                                                                         |\n\nBinder 和 Google Colaboratory 提供临时环境，启动可能较慢，但如果您想轻松开始使用 `cdQA`，我们推荐它们。\n\n## 部署\n\n### 手动\n\n您可以通过执行以下命令来部署 `cdQA` REST API：\n\n```shell\nexport dataset_path=path-to-dataset.csv\nexport 
reader_path=path-to-reader-model\n\nFLASK_APP=api.py flask run -h 0.0.0.0\n```\n\nYou can now send requests to test your API (here using [HTTPie](https:\u002F\u002Fhttpie.org\u002F)):\n\n```shell\nhttp localhost:5000\u002Fapi query=='your question here'\n```\n\nIf you want to serve a user interface on top of the `cdQA` system, follow the instructions for [cdQA-ui](https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA-ui), a web interface developed for `cdQA`.\n\n## Contributing\n\nPlease read our [contributing guidelines](.github\u002FCONTRIBUTING.md).\n\n## References\n\n| Type | Title | Author | Year |\n| ---- | ----- | ------ | ---- |\n| :video_camera: Video | [Stanford CS224N: NLP with Deep Learning, Lecture 10 – Question Answering](https:\u002F\u002Fyoutube.com\u002Fwatch?v=yIdF-17HwSk) | Christopher Manning | 2019 |\n| :newspaper: Paper | [Reading Wikipedia to Answer Open-Domain Questions](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.00051) | Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes | 2017 |\n| :newspaper: Paper | [Neural Reading Comprehension and Beyond](https:\u002F\u002Fcs.stanford.edu\u002Fpeople\u002Fdanqi\u002Fpapers\u002Fthesis.pdf) | Danqi Chen | 2018 |\n| :newspaper: Paper | [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.04805) | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | 2018 |\n| :newspaper: Paper | [Contextual Word Representations: A Contextual Introduction](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.06006) | Noah A. Smith | 2019 |\n| :newspaper: Paper | [End-to-End Open-Domain Question Answering with BERTserini](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.01718) | Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin | 2019 |\n| :newspaper: Paper | [Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.06652) | Wei Yang, Yuqing Xie, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin | 2019 |\n| :newspaper: Paper | [Passage Re-ranking with BERT](https:\u002F\u002Farxiv.org\u002Fabs\u002F1901.04085) | Rodrigo Nogueira, Kyunghyun Cho | 2019 |\n| :newspaper: Paper | [MRQA: Machine Reading for Question Answering](https:\u002F\u002Fmrqa.github.io\u002F) | Jonathan Berant, Percy Liang, Luke Zettlemoyer | 2019 |\n| :newspaper: Paper | [Unsupervised Question Answering by Cloze Translation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.04980) | Patrick Lewis, Ludovic Denoyer, Sebastian Riedel | 2019 |\n| :computer: Framework | [Scikit-learn: Machine Learning in Python](http:\u002F\u002Fjmlr.csail.mit.edu\u002Fpapers\u002Fv12\u002Fpedregosa11a.html) | Pedregosa et al. | 2011 |\n| :computer: Framework | [PyTorch](https:\u002F\u002Fpytorch.org\u002F) | Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan | 2016 |\n| :computer: Framework | [Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) | Hugging Face | 2018 |\n\n## License\n\n[Apache-2.0](LICENSE)","# cdQA Quick Start Guide\n\n> ⚠️ **Important**: This project is **no longer maintained** and is kept for educational purposes only. For a production-grade alternative, see [Haystack](https:\u002F\u002Fgithub.com\u002Fdeepset-ai\u002Fhaystack).\n\n## 1. 
Environment Setup\n\n### System Requirements\n*   **OS**: Linux (Ubuntu 22.04+), macOS, Windows (WSL)\n*   **Hardware**:\n    *   **CPU**: supports standard inference (e.g. an AWS EC2 t2.medium)\n    *   **GPU (recommended)**: a single Tesla V100 16GB or better, to speed up training and inference\n*   **Dependencies**:\n    *   A Python environment\n    *   Java OpenJDK (only needed when using the PDF converter)\n\n## 2. Installation\n\n### Option 1: install via pip (recommended)\n```shell\npip install cdqa\n```\n\n### Option 2: install from source\n```shell\ngit clone https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA.git\ncd cdQA\npip install -e .\n```\n\n## 3. Basic Usage\n\n### Step 1: prepare the data\nCreate a Pandas DataFrame with `title` and `paragraphs` columns.\n```python\nimport pandas as pd\nfrom ast import literal_eval\n\n# Example: read a CSV and parse the paragraphs column into lists\ndf = pd.read_csv('your-custom-corpus-here.csv', converters={'paragraphs': literal_eval})\n```\n*Note: the built-in converters can also turn PDF or Markdown files into this format.*\n\n### Step 2: download a pretrained model\nDownload the model files manually or run the following script:\n```python\nfrom cdqa.utils.download import download_model\n\ndirectory = 'path-to-directory'\n# Download a BERT model fine-tuned on SQuAD 1.1\ndownload_model('bert-squad_1.1', dir=directory)\n```\n\n### Step 3: initialize the pipeline and predict\nLoad the model and run question-answering predictions.\n```python\nfrom cdqa.pipeline import QAPipeline\n\n# Initialize the pipeline (use distilbert_qa.joblib for faster inference)\ncdqa_pipeline = QAPipeline(reader='bert_qa.joblib')\n\n# Fit the retriever on the corpus (required before predicting; cost scales with corpus size)\ncdqa_pipeline.fit_retriever(df=df)\n\n# Run a prediction\nresult = cdqa_pipeline.predict(query='your question')\nprint(result)\n```\n\n### Advanced: fine-tune the reader\nIf you have custom annotated data (SQuAD format), the Reader can be fine-tuned further:\n```python\n# Fine-tune on a custom dataset\ncdqa_pipeline.fit_reader('path-to-custom-squad-like-dataset.json')\n\n# Save the fine-tuned model\ncdqa_pipeline.dump_reader('path-to-save-bert-reader.joblib')\n```\n\n## 4. Other Resources\n*   **Try it online**: run the example notebook directly via [Binder](https:\u002F\u002Fmybinder.org\u002Fv2\u002Fgh\u002Fcdqa-suite\u002FcdQA\u002Fmaster?filepath=examples%2Ftutorial-first-steps-cdqa.ipynb) or [Google Colab](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fcdqa-suite\u002FcdQA\u002Fblob\u002Fmaster\u002Fexamples\u002Ftutorial-first-steps-cdqa.ipynb).\n*   **API deployment**: a REST API can be deployed with Flask; see the Deployment section of the project documentation.","The IT department of a large manufacturing company plans to build an internal technical Q&A system to help field engineers quickly resolve equipment error messages.\n\n### Without cdQA\n- Engineers have to open dozens of PDF manuals and search page by page; finding a specific parameter is slow and easy to miss.\n- A conventional search engine cannot interpret semantic questions such as 'what should I do when the server fails to boot'; it only matches literal terms, so results are irrelevant.\n- Documents are updated frequently, and maintaining a separate retrieval database takes substantial effort for data cleaning, deduplication, and format normalization.\n- Wiring the knowledge base into Slack or DingTalk bots requires lengthy, tightly coupled custom development, because no ready-made API exists.\n\n### With cdQA\n- The pdf_converter parses a directory of documents in one step, automatically building a Pandas DataFrame of titles and paragraphs and skipping tedious preprocessing.\n- The bundled pretrained models understand user intent and extract answers directly from paragraphs rather than just returning links, so responses arrive faster.\n- A complete training and evaluation workflow lets the team fine-tune the model on in-house terminology, improving accuracy in the vertical domain and monitoring quality.\n- Docker deployment and API access mean the Q&A capability can be embedded into the existing collaboration platform within hours.\n\nWith its end-to-end question-answering architecture, cdQA turns static documents into a dynamic, intelligent service, sharply cutting the time cost and technical barrier of knowledge retrieval.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcdqa-suite_cdQA_bc2d9ea1.png","cdqa-suite","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fcdqa-suite_698da172.jpg","⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System. 
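The quick-start guide above notes that the built-in converters can turn PDF or Markdown files into the `title`\u002F`paragraphs` DataFrame the retriever expects. A minimal sketch of that target shape follows; the manual text and file names are hypothetical, and the commented `pdf_converter` call is quoted from the upstream README (it assumes cdQA and Java OpenJDK are installed):

```python
import pandas as pd

# Target shape expected by QAPipeline.fit_retriever: one row per document,
# with 'paragraphs' holding a list of paragraph strings.
df = pd.DataFrame({
    "title": ["device-manual"],
    "paragraphs": [[
        "Error E42 indicates a fan failure.",
        "Reboot the unit after replacing the fan.",
    ]],
})
print(df.loc[0, "paragraphs"][1])  # → Reboot the unit after replacing the fan.

# The built-in converter produces the same shape from a folder of PDFs:
# from cdqa.utils.converters import pdf_converter
# df = pdf_converter(directory_path="path-to-pdf-folder")
```

This is also why the quick start parses the `paragraphs` CSV column with `literal_eval`: the retriever needs real Python lists, not their string representation.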
",null,"https:\u002F\u002Fcdqa-suite.github.io\u002FcdQA-website\u002F","https:\u002F\u002Fgithub.com\u002Fcdqa-suite",[82],{"name":83,"color":84,"percentage":85},"Python","#3572A5",100,617,191,"2026-02-16T20:44:18","Apache-2.0","Linux","Requires an NVIDIA GPU with 16 GB VRAM (Tesla V100)","Not specified",{"notes":94,"python":92,"dependencies":95},"The project is no longer maintained (educational use only); the PDF conversion feature requires Java OpenJDK; the official test environment is based on an Ubuntu Deep Learning AMI; the examples can be run online via Binder or Google Colab.",[96,97,98],"transformers","pandas","flask",[26,54,13],[101,102,103,104,105,106,107,108,109,96],"reading-comprehension","question-answering","deep-learning","natural-language-processing","information-retrieval","bert","artificial-intelligence","nlp","pytorch","2026-03-27T02:49:30.150509","2026-04-06T09:44:36.467001",[113,118,123,128,132,137],{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},2697,"Which parameter name should be used to load the model when initializing QAPipeline?","Per the updated documentation, use the `reader` parameter instead of `model`.\n\nCorrect example:\n```python\ncdqa_pipeline = QAPipeline(reader='bert_qa_vCPU-sklearn.joblib')\n```\n\nThe previously used `model='bert_qa_vCPU-sklearn.joblib'` no longer works.","https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA\u002Fissues\u002F241",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},2698,"How do I train the Reader component on its own?","Reader training is no longer controlled by a boolean argument to `fit()`; a dedicated `fit_reader()` method is provided instead.\n\nTo train both the Retriever and the Reader, proceed as follows:\n```python\ncdqa_pipeline.fit(X=df)\ncdqa_pipeline.fit_reader(X='sec.json')\n```","https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA\u002Fissues\u002F115",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},2699,"Does cdQA support languages other than English?","Currently the framework only supports English.\n\nFor other languages, you need to use a BERT model for that language as the Reader, and it only works after training on a question-answering (QA) dataset in that language.","https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA\u002Fissues\u002F215",{"id":129,"question_zh":130,"answer_zh":131,"source_url":127},2700,"How can the system tell that it cannot answer a question?","You can use the logits returned alongside each answer as a confidence signal.\n\nConcretely: use the logits feature added in PR #203 and, following the suggestion in Issue #195, tune a threshold; when the score falls below that threshold, treat the question as having no reliable answer.",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},2701,"How do I add unique IDs to a QA dataset (JSON format)?","To evaluate the model, each question in the JSON file needs a unique `id` field added manually. The following Python script handles it:\n\n```python\nimport json\nimport uuid\n\nwith open(\"\u002Fpath_to_json\") as json_file:\n    data = json.load(json_file)\n\n# Walk articles -> paragraphs -> questions, assigning a fresh UUID to each question\nfor article in data[\"data\"]:\n    for paragraph in article['paragraphs']:\n        for question in paragraph['qas']:\n            question['id'] = str(uuid.uuid4())\n\nwith open('path_to_output_json', 'w') as outfile:\n    json.dump(data, outfile)\n```","https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA\u002Fissues\u002F104",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},2702,"How do I fix 404 errors when requesting the deployed cdQA REST API?","This error is usually caused by a malformed or misspelled URL.\n\nThe error log shows `%20` (an encoded space) in the request path, e.g. `\u002Fapi%20query`. This means the query parameter was passed incorrectly, so the space ended up inside the path.\n\nTo fix it:\n1. Check the URL spelling shown in the server console log.\n2. Make sure the query parameter is passed correctly in your HTTPie or curl command, so that no unescaped space lands in the path.\n3. Confirm the request format against the official deployment documentation.","https:\u002F\u002Fgithub.com\u002Fcdqa-suite\u002FcdQA\u002Fissues\u002F260",[143,148,153,158,163],{"id":144,"version":145,"summary_zh":146,"released_at":147},111873,"bert_qa","Release with a version of the BERT model trained on SQuAD 1.1, following the latest updates to the cdQA modules.\r\n\r\nThis model is CPU\u002FGPU agnostic. 
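The 404 troubleshooting entry above traces the failure to a space that leaked into the URL path (`\u002Fapi%20query`). Building the query string programmatically avoids this class of bug entirely; a stdlib-only sketch, where the host and port follow the Flask example and the question text is a placeholder:

```python
from urllib.parse import urlencode

# Encode the question as a query-string parameter so the space never
# reaches the URL path (a path-borne "%20" is what triggers the 404).
base = "http://localhost:5000/api"
url = f"{base}?{urlencode({'query': 'your question here'})}"
print(url)  # → http://localhost:5000/api?query=your+question+here
```

`urlencode` percent-encodes (or plus-encodes) the value, which is exactly what HTTPie's `query=='...'` syntax does under the hood.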
If the model is loaded on a machine with CUDA support, it will automatically be sent to the GPU for computations.\r\n\r\nThis version of the model achieves 81.3% EM and 88.7% F1-score on the SQuAD 1.1 dev set.","2019-10-25T15:46:48",{"id":149,"version":150,"summary_zh":151,"released_at":152},111874,"distilbert_qa","Release with a version of the DistilBERT model trained on SQuAD 1.1 using Knowledge Distillation, with `bert-large-uncased-whole-word-masking-finetuned-squad` as the teacher.\r\n\r\nThis version of DistilBERT achieves 80.1% EM and 87.5% F1-score (vs. 81.2% EM and 88.6% F1-score for our version of BERT), while being much faster and lighter.\r\n\r\nAvailable only with the sklearn wrapper.","2019-10-25T09:21:54",{"id":154,"version":155,"summary_zh":156,"released_at":157},111875,"bert_qa_vGPU","Release with a version of the BERT model trained on SQuAD 1.1, runnable on GPU.\r\n\r\nAvailable only with the sklearn wrapper (bert_qa_vGPU-sklearn.joblib).\r\n","2019-06-02T20:30:09",{"id":159,"version":160,"summary_zh":161,"released_at":162},111876,"bert_qa_vCPU","Release with a version of the BERT model trained on SQuAD 1.1, runnable on CPU.\r\n\r\nAvailable in two formats:\r\n\r\nPyTorch model: bert_qa_vCPU.bin and config.json\r\n\r\nSklearn wrapper: bert_qa_vCPU-sklearn.joblib","2019-05-31T12:02:01",{"id":164,"version":165,"summary_zh":166,"released_at":167},111877,"bnpp_newsroom_v1.1","A dataset of 3675 public BNP Paribas articles, with content and metadata, available on the \"Newsroom\" page of the [official BNP Paribas website](https:\u002F\u002Fgroup.bnpparibas\u002Fen\u002Fall-news).\r\n\r\n- `bnpp_newsroom-v1.1.csv`: contains all article paragraphs, unfiltered.","2019-05-14T18:33:10"]
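The thresholding advice in the Q&A above (use the logits exposed by PR #203 and tune a cutoff per Issue #195) can be sketched as a small wrapper. The function and field names here are illustrative, not the library's API; `score` stands in for whatever logit value your installed cdQA version returns alongside a prediction:

```python
def answer_or_abstain(answer, score, threshold=1.0):
    """Return the extracted answer only when its reader score clears the
    tuned threshold; otherwise signal that no reliable answer was found.

    `threshold` must be tuned on held-out questions, including some the
    corpus genuinely cannot answer, before being trusted in production.
    """
    return answer if score >= threshold else None

print(answer_or_abstain("replace the fan", 3.2))  # → replace the fan
print(answer_or_abstain("replace the fan", 0.4))  # → None
```

Any downstream bot (the Slack/DingTalk integrations mentioned in the use case) can then fall back to "I don't know" instead of surfacing a low-confidence span.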