[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-neuml--paperai":3,"tool-neuml--paperai":62},[4,18,28,37,45,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":24,"last_commit_at":25,"category_tags":26,"status":17},9989,"n8n","n8n-io\u002Fn8n","n8n 是一款面向技术团队的公平代码（fair-code）工作流自动化平台，旨在让用户在享受低代码快速构建便利的同时，保留编写自定义代码的灵活性。它主要解决了传统自动化工具要么过于封闭难以扩展、要么完全依赖手写代码效率低下的痛点，帮助用户轻松连接 400 多种应用与服务，实现复杂业务流程的自动化。\n\nn8n 特别适合开发者、工程师以及具备一定技术背景的业务人员使用。其核心亮点在于“按需编码”：既可以通过直观的可视化界面拖拽节点搭建流程，也能随时插入 JavaScript 或 Python 代码、调用 npm 包来处理复杂逻辑。此外，n8n 原生集成了基于 LangChain 的 AI 能力，支持用户利用自有数据和模型构建智能体工作流。在部署方面，n8n 提供极高的自由度，支持完全自托管以保障数据隐私和控制权，也提供云端服务选项。凭借活跃的社区生态和数百个现成模板，n8n 让构建强大且可控的自动化系统变得简单高效。",184740,2,"2026-04-19T23:22:26",[16,14,13,15,27],"插件",{"id":29,"name":30,"github_repo":31,"description_zh":32,"stars":33,"difficulty_score":10,"last_commit_at":34,"category_tags":35,"status":17},10095,"AutoGPT","Significant-Gravitas\u002FAutoGPT","AutoGPT 是一个旨在让每个人都能轻松使用和构建 AI 的强大平台，核心功能是帮助用户创建、部署和管理能够自动执行复杂任务的连续型 AI 智能体。它解决了传统 AI 应用中需要频繁人工干预、难以自动化长流程工作的痛点，让用户只需设定目标，AI 即可自主规划步骤、调用工具并持续运行直至完成任务。\n\n无论是开发者、研究人员，还是希望提升工作效率的普通用户，都能从 AutoGPT 
中受益。开发者可利用其低代码界面快速定制专属智能体；研究人员能基于开源架构探索多智能体协作机制；而非技术背景用户也可直接选用预置的智能体模板，立即投入实际工作场景。\n\nAutoGPT 的技术亮点在于其模块化“积木式”工作流设计——用户通过连接功能块即可构建复杂逻辑，每个块负责单一动作，灵活且易于调试。同时，平台支持本地自托管与云端部署两种模式，兼顾数据隐私与使用便捷性。配合完善的文档和一键安装脚本，即使是初次接触的用户也能在几分钟内启动自己的第一个 AI 智能体。AutoGPT 正致力于降低 AI 应用门槛，让人人都能成为 AI 的创造者与受益者。",183572,"2026-04-20T04:47:55",[13,36,27,14,15],"语言模型",{"id":38,"name":39,"github_repo":40,"description_zh":41,"stars":42,"difficulty_score":10,"last_commit_at":43,"category_tags":44,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":46,"name":47,"github_repo":48,"description_zh":49,"stars":50,"difficulty_score":24,"last_commit_at":51,"category_tags":52,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",161147,"2026-04-19T23:31:47",[14,13,36],{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":59,"last_commit_at":60,"category_tags":61,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 
AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,27],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":24,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":108,"github_topics":110,"view_count":24,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":120,"updated_at":121,"faqs":122,"releases":151},10091,"neuml\u002Fpaperai","paperai","📄 🤖 AI for medical and scientific papers","paperai 是一款专为医学和科学论文打造的 AI 应用，旨在利用人工智能技术大幅提升科研效率。面对海量的学术文献，研究人员往往需要耗费大量时间阅读和整理信息，而 paperai 通过自动化流程解决了这一痛点。它能够遍历本地或远程的论文仓库，针对用户提出的具体问题，批量生成有据可查的回答和深度研究报告。\n\n这款工具特别适合医学研究者、科学家以及需要处理大量文献的数据分析师使用。其核心技术亮点在于结合了大型语言模型（LLM）与检索增强生成（RAG）管道。这意味着它不仅能理解复杂的学术问题，还能精准地从原始论文中检索相关片段作为依据，确保生成的答案准确可靠，有效避免了大模型常见的“幻觉”问题。此外，paperai 支持高度定制化的配置文件，允许用户一次性发起数百个查询任务，实现高性能的批量推理。最终成果可灵活输出为 Markdown、CSV 格式，甚至能直接在 PDF 原文上进行标注。无论是构建个人知识库还是进行大规模文献综述，paperai 都能成为得力的科研助手。","\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fneuml_paperai_readme_edda01966bb4.png\"\u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Cb>AI for medical and scientific 
papers\u003C\u002Fb>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Freleases\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Fneuml\u002Fpaperai.svg?style=flat&color=success\" alt=\"Version\"\u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Freleases\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease-date\u002Fneuml\u002Fpaperai.svg?style=flat&color=blue\" alt=\"GitHub Release Date\"\u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fissues\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fneuml\u002Fpaperai.svg?style=flat&color=success\" alt=\"GitHub issues\"\u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fneuml\u002Fpaperai.svg?style=flat&color=blue\" alt=\"GitHub last commit\"\u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Factions?query=workflow%3Abuild\">\n        \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fworkflows\u002Fbuild\u002Fbadge.svg\" alt=\"Build Status\"\u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fcoveralls.io\u002Fgithub\u002Fneuml\u002Fpaperai?branch=master\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002FcoverallsCoverage\u002Fgithub\u002Fneuml\u002Fpaperai\" alt=\"Coverage Status\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n-------------------------------------------------------------------------------------------------------------------------------------------------------\n\n`paperai` is an AI application for medical and scientific 
papers.\n\n![demo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fneuml_paperai_readme_608e792ef278.png)\n\n⚡ Supercharge research tasks with AI-driven report generation. A `paperai` application goes through repositories of articles and generates bulk answers to questions backed by Large Language Model (LLM) prompts and Retrieval Augmented Generation (RAG) pipelines.\n\nA `paperai` configuration file enables bulk LLM inference operations in a performant manner. Think of it like kicking off hundreds of ChatGPT prompts over your data.\n\n![architecture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fneuml_paperai_readme_3dfa0bd27a78.png)\n![architecture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fneuml_paperai_readme_a3e4f82adb04.png)\n\n`paperai` can generate reports in Markdown and CSV, and can annotate answers directly on PDFs (when available).\n\n## Installation\n\nThe easiest way to install is via pip and PyPI\n\n```\npip install paperai\n```\n\nPython 3.10+ is supported. Using a Python [virtual environment](https:\u002F\u002Fdocs.python.org\u002F3\u002Flibrary\u002Fvenv.html) is recommended.\n\n`paperai` can also be installed directly from GitHub to access the latest, unreleased features.\n\n```\npip install git+https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\n```\n\nSee [this link](https:\u002F\u002Fneuml.github.io\u002Ftxtai\u002Finstall\u002F#environment-specific-prerequisites) to help resolve environment-specific install issues.\n\n### Docker\n\nRun the steps below to build a Docker image with `paperai` and all dependencies.\n\n```\nwget https:\u002F\u002Fraw.githubusercontent.com\u002Fneuml\u002Fpaperai\u002Fmaster\u002Fdocker\u002FDockerfile\ndocker build -t paperai .\ndocker run --name paperai --rm -it paperai\n```\n\npaperetl can be added to build a single image that both indexes and queries content. 
Follow the instructions to build a [paperetl docker image](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperetl#docker) and then run the following.\n\n```\ndocker build -t paperai --build-arg BASE_IMAGE=paperetl --build-arg START=\u002Fscripts\u002Fstart.sh .\ndocker run --name paperai --rm -it paperai\n```\n\n## Examples\n\nThe following notebooks and applications demonstrate the capabilities provided by `paperai`.\n\n### Notebooks\n\n| Notebook  | Description  |       |\n|:----------|:-------------|------:|\n| [Introducing paperai](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fblob\u002Fmaster\u002Fexamples\u002F01_Introducing_paperai.ipynb) | Overview of the functionality provided by paperai | [![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fneuml\u002Fpaperai\u002Fblob\u002Fmaster\u002Fexamples\u002F01_Introducing_paperai.ipynb) |\n| [Medical Research Project](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fblob\u002Fmaster\u002Fexamples\u002F02_Medical_Research_Project.ipynb) | Research young onset colon cancer | [![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fneuml\u002Fpaperai\u002Fblob\u002Fmaster\u002Fexamples\u002F02_Medical_Research_Project.ipynb) |\n\n### Applications\n\n| Application  | Description  |\n|:----------|:-------------|\n| [Search](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fblob\u002Fmaster\u002Fexamples\u002Fsearch.py) | Search a `paperai` index. Set query parameters, execute searches and display results. |\n\n## Building a model\n\n`paperai` indexes databases previously built with [paperetl](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperetl). The following shows how to create a new `paperai` index.\n\n1. 
(Optional) Create an index.yml file\n\n    `paperai` uses the default txtai embeddings configuration when not specified. Alternatively, an index.yml file can be specified that takes all the same options as a txtai embeddings instance. See the [txtai documentation](https:\u002F\u002Fneuml.github.io\u002Ftxtai\u002Fembeddings\u002Fconfiguration) for more on the possible options. A simple example is shown below.\n\n    ```\n    path: sentence-transformers\u002Fall-MiniLM-L6-v2\n    content: True\n    ```\n\n2. Build embeddings index\n\n    ```\n    python -m paperai.index \u003Cpath to input data> \u003Coptional index configuration>\n    ```\n\nThe paperai.index process requires an input data path and optionally takes index configuration. This configuration can either be a vector model path or an index.yml configuration file.\n\n## Running queries\n\nThe fastest way to run queries is to start a `paperai` shell\n\n```\npaperai \u003Cpath to model directory>\n```\n\nA prompt will come up. Queries can be typed directly into the console.\n\n## Report schema\n\nThe following steps through an example `paperai` report configuration file and describes each section.\n\n```yaml\nname: ColonCancer\noptions:\n    llm: Intelligent-Internet\u002FII-Medical-8B-1706-GGUF\u002FII-Medical-8B-1706.Q4_K_M.gguf\n    system: You are a medical literature document parser. 
You extract fields from data.\n    template: |\n        Quickly extract the following field using the provided rules and context.\n\n        Rules:\n          - Keep it simple, don't overthink it\n          - ONLY extract the data\n          - NEVER explain why the field is extracted\n          - NEVER restate the field name only give the field value\n          - Say no data if the field can't be found within the context\n\n        Field:\n        {question}\n\n        Context:\n        {context}\n\n    context: 5\n    params:\n        maxlength: 4096\n        stripthink: True\n\nResearch:\n    query: colon cancer young adults\n    columns:\n        - name: Date\n        - name: Study\n        - name: Study Link\n        - name: Journal\n        - {name: Sample Size, query: number of patients, question: Sample Size}\n        - {name: Objective, query: objective, question: Study Objective}\n        - {name: Causes, query: possible causes, question: List of possible causes}\n        - {name: Detection, query: diagnosis, question: List of ways to diagnose}\n```\n\n### Configuration\n\nThe following shows the top level configuration options.\n\n| Field  | Description  |\n|:------------ |:-------------|\n| name | Report name |\n| options | RAG pipeline options - set the LLM, prompt templates, max length and more|\n| report | Each unique top level parameter sets the report name. In the example above, it's called `Research` |\n| query | Vector query that identifies the top n documents |\n| columns | List of columns |\n\n### Standard columns\n\nStandard columns use the article data store metadata to simply copy fields into a report. 
Set the column `name` to one of the values below.\n\n| Field  | Description  |\n|:------------ |:-------------|\n| Id | Article unique identifier |\n| Date | Article publication date |\n| Study | Title of the article |\n| Study Link | HTTP link to the study | \n| Journal | Publication name | \n| Source | Data source name | \n| Entry | Article entry date |\n| Matches | Sections that caused this article to match the report query | \n\n### Generated columns\n\nThe most novel feature of `paperai` is being able to generate dynamic columns driven by a RAG pipeline. Each field takes the following parameters.\n\n| Parameter  | Description  |\n|:------------ |:-------------|\n| name | Column name |\n| query | search\u002Fsimilarity query |\n| question | llm question parameter |\n\nFor each matching article, the `query` sorts each section by relevance to that query. This can be a vector query, keyword query or hybrid query. This is controlled by the embeddings index configuration. The `question` is plugged into the RAG pipeline template along with the top n matching context elements from the query. The generated column is stored as `name` in the report output.\n\n## Building a report file\n\nReports can generate output in multiple formats. An example report call:\n\n```\npython -m paperai.report crc.yml 10 csv \u003Cpath to model directory>\n```\n\nIn the example above, a file named Research.csv will be created with the top 10 most relevant articles.\n\nThe following report formats are supported:\n\n- Markdown (Default) - Renders a Markdown report. Columns and answers are extracted from articles with the results stored in a Markdown file.\n- CSV - Renders a CSV report. Columns and answers are extracted from articles with the results stored in a CSV file.\n- Annotation - Columns and answers are extracted from articles with the results annotated over the original PDF files. 
Requires passing in a path with the original PDF files.\n\nSee the [examples](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Ftree\u002Fmaster\u002Fexamples) directory for report examples. Additional historical report configuration files can be found [here](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fcord19q\u002Ftree\u002Fmaster\u002Ftasks).\n\n## Tech Overview\n\n`paperai` is a combination of a [txtai](https:\u002F\u002Fgithub.com\u002Fneuml\u002Ftxtai) embeddings index, a SQLite database with the articles and an LLM. These components are joined together in a [txtai RAG pipeline](https:\u002F\u002Fneuml.github.io\u002Ftxtai\u002Fpipeline\u002Ftext\u002Frag\u002F).\n\nEach article is parsed into sections and stored in a data store along with the article metadata. Embeddings are built over the full corpus. The LLM analyzes context-limited requests and generates outputs.\n\nMultiple entry points exist to interact with the model.\n\n- paperai.report - Builds a report for a series of queries. For each query, the top scoring articles are shown along with matches from those articles. 
There is also a highlights section showing the most relevant results.\n- paperai.query - Runs a single query from the terminal\n- paperai.shell - Allows running multiple queries from the terminal\n\n## Recognition\n\n`paperai` and\u002For NeuML have been recognized in the following articles.\n\n- [Machine-Learning Experts Delve Into 47,000 Papers on Coronavirus Family](https:\u002F\u002Fwww.wsj.com\u002Farticles\u002Fmachine-learning-experts-delve-into-47-000-papers-on-coronavirus-family-11586338201)\n- [Data scientists assist medical researchers in the fight against COVID-19](https:\u002F\u002Fcloud.google.com\u002Fblog\u002Fproducts\u002Fai-machine-learning\u002Fhow-kaggle-data-scientists-help-with-coronavirus)\n- [CORD-19 Kaggle Challenge Awards](https:\u002F\u002Fwww.kaggle.com\u002Fallen-institute-for-ai\u002FCORD-19-research-challenge\u002Fdiscussion\u002F161447)\n","\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fneuml_paperai_readme_edda01966bb4.png\"\u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Cb>面向医学和科学论文的AI\u003C\u002Fb>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Freleases\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Fneuml\u002Fpaperai.svg?style=flat&color=success\" alt=\"版本\"\u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Freleases\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease-date\u002Fneuml\u002Fpaperai.svg?style=flat&color=blue\" alt=\"GitHub发布日期\"\u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fissues\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fneuml\u002Fpaperai.svg?style=flat&color=success\" alt=\"GitHub问题\"\u002F>\n    \u003C\u002Fa>\n    \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fneuml\u002Fpaperai.svg?style=flat&color=blue\" alt=\"GitHub最新提交\"\u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Factions?query=workflow%3Abuild\">\n        \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fworkflows\u002Fbuild\u002Fbadge.svg\" alt=\"构建状态\"\u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fcoveralls.io\u002Fgithub\u002Fneuml\u002Fpaperai?branch=master\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002FcoverallsCoverage\u002Fgithub\u002Fneuml\u002Fpaperai\" alt=\"覆盖率状态\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n-------------------------------------------------------------------------------------------------------------------------------------------------------\n\n`paperai` 是一款用于医学和科学论文的AI应用。\n\n![demo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fneuml_paperai_readme_608e792ef278.png)\n\n⚡ 通过AI驱动的报告生成，为研究任务注入强大动力。`paperai` 应用会遍历文章库，并基于大型语言模型（LLM）提示和检索增强生成（RAG）流程，批量生成对问题的回答。\n\n借助 `paperai` 的配置文件，可以高效地执行大规模的 LLM 推理操作。这就好比在您的数据上同时启动数百个 ChatGPT 提示。\n\n![architecture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fneuml_paperai_readme_3dfa0bd27a78.png)\n![architecture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fneuml_paperai_readme_a3e4f82adb04.png)\n\n`paperai` 可以生成 Markdown、CSV 格式的报告，并在 PDF 文件上直接标注答案（如适用）。\n\n## 安装\n\n最简单的安装方式是通过 pip 和 PyPI：\n\n```\npip install paperai\n```\n\n支持 Python 3.10 及以上版本。建议使用 Python [虚拟环境](https:\u002F\u002Fdocs.python.org\u002F3\u002Flibrary\u002Fvenv.html)。\n\n您也可以直接从 GitHub 安装 `paperai` 来获取最新的未发布功能：\n\n```\npip install git+https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\n```\n\n请参阅[此链接](https:\u002F\u002Fneuml.github.io\u002Ftxtai\u002Finstall\u002F#environment-specific-prerequisites)，以帮助解决特定环境下的安装问题。\n\n### Docker\n\n按照以下步骤构建包含 
`paperai` 及所有依赖项的 Docker 镜像：\n\n```\nwget https:\u002F\u002Fraw.githubusercontent.com\u002Fneuml\u002Fpaperai\u002Fmaster\u002Fdocker\u002FDockerfile\ndocker build -t paperai .\ndocker run --name paperai --rm -it paperai\n```\n\n还可以加入 `paperetl`，以便在一个镜像中完成内容的索引和查询。请遵循说明构建 [paperetl Docker 镜像](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperetl#docker)，然后运行以下命令：\n\n```\ndocker build -t paperai --build-arg BASE_IMAGE=paperetl --build-arg START=\u002Fscripts\u002Fstart.sh .\ndocker run --name paperai --rm -it paperai\n```\n\n## 示例\n\n以下笔记本和应用程序展示了 `paperai` 提供的功能。\n\n### 笔记本\n\n| 笔记本  | 描述  |       |\n|:----------|:-------------|------:|\n| [介绍 paperai](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fblob\u002Fmaster\u002Fexamples\u002F01_Introducing_paperai.ipynb) | paperai 功能概览 | [![在 Colab 中打开](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fneuml\u002Fpaperai\u002Fblob\u002Fmaster\u002Fexamples\u002F01_Introducing_paperai.ipynb) |\n| [医学研究项目](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fblob\u002Fmaster\u002Fexamples\u002F02_Medical_Research_Project.ipynb) | 研究年轻发病结肠癌 | [![在 Colab 中打开](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fneuml\u002Fpaperai\u002Fblob\u002Fmaster\u002Fexamples\u002F02_Medical_Research_Project.ipynb) |\n\n### 应用程序\n\n| 应用程序  | 描述  |\n|:----------|:-------------|\n| [搜索](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fblob\u002Fmaster\u002Fexamples\u002Fsearch.py) | 搜索 `paperai` 索引。设置查询参数、执行搜索并显示结果。 |\n\n## 构建模型\n\n`paperai` 会索引先前使用 [paperetl](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperetl) 构建的数据库。以下是创建新 `paperai` 索引的方法。\n\n1. 
（可选）创建 index.yml 文件\n\n    如果未指定，`paperai` 将使用默认的 txtai 嵌入配置。或者，您可以指定一个 index.yml 文件，其中包含与 txtai 嵌入实例相同的选项。有关可能选项的更多信息，请参阅 [txtai 文档](https:\u002F\u002Fneuml.github.io\u002Ftxtai\u002Fembeddings\u002Fconfiguration)。下面是一个简单示例：\n\n    ```\n    path: sentence-transformers\u002Fall-MiniLM-L6-v2\n    content: True\n    ```\n\n2. 构建嵌入索引\n\n    ```\n    python -m paperai.index \u003C输入数据路径> \u003C可选索引配置>\n    ```\n\n    paperai.index 过程需要输入数据路径，并可选择性地接受索引配置。该配置可以是向量模型路径，也可以是 index.yml 配置文件。\n\n## 执行查询\n\n执行查询的最快方式是启动一个 `paperai` shell：\n\n```\npaperai \u003C模型目录路径>\n```\n\n随后会出现提示符。您可以直接在控制台中输入查询。\n\n## 报表架构\n\n以下步骤将逐步介绍一个 `paperai` 报表配置文件，并对每个部分进行说明。\n\n```yaml\nname: ColonCancer\noptions:\n    llm: Intelligent-Internet\u002FII-Medical-8B-1706-GGUF\u002FII-Medical-8B-1706.Q4_K_M.gguf\n    system: 你是一名医学文献解析器。你负责从数据中提取字段。\n    template: |\n        请根据提供的规则和上下文，快速提取以下字段。\n\n        规则：\n          - 简单明了，不要过度思考\n          - 只提取数据\n          - 绝不解释为何提取该字段\n          - 绝不重复字段名称，只给出字段值\n          - 如果在上下文中找不到该字段，则回答“无数据”\n\n        字段：\n        {question}\n\n        上下文：\n        {context}\n\n    context: 5\n    params:\n        maxlength: 4096\n        stripthink: True\n\nResearch:\n    query: colon cancer young adults\n    columns:\n        - name: Date\n        - name: Study\n        - name: Study Link\n        - name: Journal\n        - {name: Sample Size, query: number of patients, question: Sample Size}\n        - {name: Objective, query: objective, question: Study Objective}\n        - {name: Causes, query: possible causes, question: List of possible causes}\n        - {name: Detection, query: diagnosis, question: List of ways to diagnose}\n```\n\n### 配置\n\n以下是顶级配置选项的说明。\n\n| 字段  | 描述  |\n|:------------ |:-------------|\n| name | 报表名称 |\n| options | RAG 流水线选项 - 设置 LLM、提示模板、最大长度等 |\n| report | 每个唯一的顶级参数都会设置报表名称。在上面的例子中，它被称为 `Research` |\n| query | 用于识别前 n 篇文档的向量查询 |\n| columns | 列的列表 |\n\n### 标准列\n\n标准列使用文章数据存储元数据，直接将字段复制到报表中。将列的 `name` 设置为以下值之一。\n\n| 字段  | 描述  
|\n|:------------ |:-------------|\n| Id | 文章唯一标识符 |\n| Date | 文章发表日期 |\n| Study | 文章标题 |\n| Study Link | 文章的 HTTP 链接 |\n| Journal | 出版物名称 |\n| Source | 数据源名称 |\n| Entry | 文章录入日期 |\n| Matches | 导致该文章匹配报表查询的章节 |\n\n### 生成列\n\n`paperai` 最具创新性的功能是能够通过 RAG 流水线生成动态列。每个字段需要以下参数。\n\n| 参数  | 描述  |\n|:------------ |:-------------|\n| name | 列名 |\n| query | 搜索\u002F相似度查询 |\n| question | LLM 的问题参数 |\n\n对于每篇匹配的文章，`query` 会根据与该查询的相关性对各章节进行排序。这可以是向量查询、关键词查询或混合查询。具体由嵌入索引配置控制。`question` 会与查询中排名靠前的 n 个上下文元素一起插入到 RAG 流水线模板中。生成的列将以 `name` 的形式存储在报表输出中。\n\n## 构建报表文件\n\n报表可以生成多种格式的输出。以下是一个报表调用示例：\n\n```\npython -m paperai.report crc.yml 10 csv \u003C模型目录路径>\n```\n\n在上面的例子中，将创建一个名为 Research.csv 的文件，其中包含最相关的前 10 篇文章。\n\n支持的报表格式如下：\n\n- Markdown（默认） - 渲染 Markdown 报表。从文章中提取列和答案，并将结果存储在 Markdown 文件中。\n- CSV - 渲染 CSV 报表。从文章中提取列和答案，并将结果存储在 CSV 文件中。\n- 注释 - 从文章中提取列和答案，并将结果注释在原始 PDF 文件上。需要传入包含原始 PDF 文件的路径。\n\n有关报表示例，请参阅 [examples](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Ftree\u002Fmaster\u002Fexamples) 目录。更多历史报表配置文件可在 [这里](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fcord19q\u002Ftree\u002Fmaster\u002Ftasks) 找到。\n\n## 技术概述\n\n`paperai` 是由 [txtai](https:\u002F\u002Fgithub.com\u002Fneuml\u002Ftxtai) 嵌入索引、包含文章的 SQLite 数据库以及 LLM 组成的组合。这些组件通过 [txtai RAG 流水线](https:\u002F\u002Fneuml.github.io\u002Ftxtai\u002Fpipeline\u002Ftext\u002Frag\u002F) 结合在一起。\n\n每篇文章会被解析成多个章节，并连同文章元数据一起存储在数据存储中。整个语料库会构建嵌入。LLM 会分析受限于上下文的请求并生成输出。\n\n有多个入口点可以与模型交互。\n\n- paperai.report - 为一系列查询构建报表。对于每个查询，都会显示得分最高的文章及其匹配内容。还有一个亮点部分，展示最相关的结果。\n- paperai.query - 从终端运行单个查询\n- paperai.shell - 允许从终端运行多个查询\n\n## 认可\n\n`paperai` 和\u002F或 NeuML 已在以下文章中被提及。\n\n- [机器学习专家深入研究冠状病毒家族的 47,000 篇论文](https:\u002F\u002Fwww.wsj.com\u002Farticles\u002Fmachine-learning-experts-delve-into-47-000-papers-on-coronavirus-family-11586338201)\n- [数据科学家协助医学研究人员抗击 COVID-19](https:\u002F\u002Fcloud.google.com\u002Fblog\u002Fproducts\u002Fai-machine-learning\u002Fhow-kaggle-data-scientists-help-with-coronavirus)\n- [CORD-19 Kaggle 
挑战赛奖项](https:\u002F\u002Fwww.kaggle.com\u002Fallen-institute-for-ai\u002FCORD-19-research-challenge\u002Fdiscussion\u002F161447)","# paperai 快速上手指南\n\n**paperai** 是一款专为医学和科学论文设计的 AI 应用。它利用大语言模型（LLM）和检索增强生成（RAG）技术，批量处理文献库，自动生成基于证据的研究报告、CSV 表格或直接对 PDF 进行标注。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**：Linux, macOS 或 Windows\n*   **Python 版本**：Python 3.10 或更高版本\n*   **推荐环境**：建议使用 Python 虚拟环境以避免依赖冲突\n    ```bash\n    python -m venv venv\n    # Linux\u002FmacOS\n    source venv\u002Fbin\u002Factivate\n    # Windows\n    venv\\Scripts\\activate\n    ```\n*   **前置数据**：`paperai` 需要基于由 [paperetl](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperetl) 构建的数据库索引运行。在使用前，您需要先准备好文献数据并完成索引构建（详见“基本使用”中的建库步骤）。\n\n> **提示**：国内开发者在安装大型依赖时，如遇网络问题，可配置 pip 使用国内镜像源（如清华源）：\n> `pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple ...`\n\n## 安装步骤\n\n您可以通过 PyPI 直接安装稳定版，或从 GitHub 安装最新开发版。\n\n### 方式一：通过 pip 安装（推荐）\n\n```bash\npip install paperai\n```\n\n### 方式二：从 GitHub 安装（获取最新功能）\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\n```\n\n### 方式三：使用 Docker\n\n如果您希望隔离环境或使用预构建镜像：\n\n```bash\nwget https:\u002F\u002Fraw.githubusercontent.com\u002Fneuml\u002Fpaperai\u002Fmaster\u002Fdocker\u002FDockerfile\ndocker build -t paperai .\ndocker run --name paperai --rm -it paperai\n```\n\n## 基本使用\n\n`paperai` 的核心工作流程分为三步：**配置索引** -> **构建向量库** -> **生成报告**。\n\n### 1. 准备索引配置 (可选)\n\n创建一个名为 `index.yml` 的文件，用于指定嵌入模型。如果不创建此文件，系统将使用默认配置。\n\n```yaml\npath: sentence-transformers\u002Fall-MiniLM-L6-v2\ncontent: True\n```\n\n### 2. 构建 embeddings 索引\n\n使用 `paperai.index` 命令处理您的输入数据目录（需包含由 paperetl 处理过的数据），生成向量索引。\n\n```bash\npython -m paperai.index \u003Cpath to input data> \u003Coptional index configuration>\n```\n\n*   `\u003Cpath to input data>`: 输入数据路径。\n*   `\u003Coptional index configuration>`: 可选，指向 `index.yml` 文件或向量模型路径。\n\n### 3. 
定义报告 schema\n\n创建一个 YAML 配置文件（例如 `report_config.yml`），定义您想要提取的信息。以下是一个提取结肠癌研究信息的示例：\n\n```yaml\nname: ColonCancer\noptions:\n    llm: Intelligent-Internet\u002FII-Medical-8B-1706-GGUF\u002FII-Medical-8B-1706.Q4_K_M.gguf\n    system: You are a medical literature document parser. You extract fields from data.\n    template: |\n        Quickly extract the following field using the provided rules and context.\n        Rules:\n          - Keep it simple, don't overthink it\n          - ONLY extract the data\n          - NEVER explain why the field is extracted\n          - NEVER restate the field name only give the field value\n          - Say no data if the field can't be found within the context\n        Field:\n        {question}\n        Context:\n        {context}\n    context: 5\n    params:\n        maxlength: 4096\n        stripthink: True\n\nResearch:\n    query: colon cancer young adults\n    columns:\n        - name: Date\n        - name: Study\n        - name: Study Link\n        - name: Journal\n        - {name: Sample Size, query: number of patients, question: Sample Size}\n        - {name: Objective, query: objective, question: Study Objective}\n        - {name: Causes, query: possible causes, question: List of possible causes}\n        - {name: Detection, query: diagnosis, question: List of ways to diagnose}\n```\n\n### 4. 生成报告\n\n运行报告生成命令，输出格式支持 Markdown (默认), CSV 或 PDF 标注。\n\n**生成 CSV 报告示例：**\n以下命令将查找最相关的 10 篇文章，并生成 `Research.csv` 文件。\n\n```bash\npython -m paperai.report report_config.yml 10 csv \u003Cpath to model directory>\n```\n\n*   `report_config.yml`: 上一步创建的配置文件。\n*   `10`: 限制处理最相关的前 10 篇文章。\n*   `csv`: 输出格式。\n*   `\u003Cpath to model directory>`: 第 2 步生成的模型索引目录路径。\n\n### 5. 
交互式查询 (可选)\n\n您也可以启动交互式 Shell 进行即时查询：\n\n```bash\npaperai \u003Cpath to model directory>\n```\n启动后，直接在控制台输入问题即可获取基于文献的回答。","某生物医药公司的研发分析师正面临紧急任务，需要在 48 小时内从上千篇最新的肿瘤免疫疗法论文中，梳理出特定靶点的临床实验数据并撰写综述报告。\n\n### 没有 paperai 时\n- 分析师需人工逐篇下载 PDF，肉眼筛选与目标靶点相关的段落，耗时极长且容易遗漏关键文献。\n- 阅读过程中需手动摘录数据到 Excel，不仅效率低下，还常因疲劳导致数据转录错误或格式不统一。\n- 面对海量专业术语和复杂句式，非本细分领域的专家难以快速抓住每篇文章的核心结论，理解成本极高。\n- 最终报告生成缓慢，无法在紧迫的决策窗口期内提供足够的数据支撑，影响新药立项进度。\n\n### 使用 paperai 后\n- paperai 自动遍历本地或云端的海量论文库，利用 RAG 技术精准定位所有提及该靶点的章节，瞬间完成初筛。\n- 通过预设的批量 LLM 提示词，paperai 直接提取结构化数据并生成 CSV 报表，同时支持在原始 PDF 上自动标注答案来源，确保数据可追溯。\n- 借助大模型的专业理解能力，paperai 能准确概括复杂的医学结论，将晦涩的学术语言转化为清晰的要点摘要。\n- 系统一键生成包含引用来源的 Markdown 综述草稿，将原本数周的工作压缩至几小时，让团队能迅速基于最新证据做出决策。\n\npaperai 通过将检索、阅读、提取和写作全流程自动化，把研究人员从繁琐的文献搬运工作中解放出来，使其专注于高价值的科学洞察。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fneuml_paperai_608e792e.png","neuml","NeuML","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fneuml_4e34d655.png","NeuML is the company behind txtai, one of the most popular open-source AI frameworks in the world. 
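",null,"neumll","https:\u002F\u002Fneuml.com","https:\u002F\u002Fgithub.com\u002Fneuml",[82,86,90],{"name":83,"color":84,"percentage":85},"Python","#3572A5",98,{"name":87,"color":88,"percentage":89},"Dockerfile","#384d54",1.1,{"name":91,"color":92,"percentage":93},"Makefile","#427819",0.9,1753,143,"2026-04-18T08:55:12","Apache-2.0","Linux, macOS, Windows","Not specified (depends on the chosen LLM; not required when running a local quantized model such as GGUF, while full-precision models need a high-performance GPU)","Not specified (adjust to the volume of papers processed and the size of the loaded LLM; 16GB+ is generally recommended)",{"notes":102,"python":103,"dependencies":104},"The tool relies on txtai for embeddings and the RAG pipeline. Installing into a Python virtual environment or using Docker is recommended. The LLM can be set in the configuration file to a local GGUF model (such as II-Medical-8B in the example) or a remote API, so hardware requirements are highly flexible. With a local LLM, make sure there is enough RAM or VRAM to load the model; if only lightweight embedding models are used, a regular CPU is sufficient. Data must be processed with paperetl before building the index.","3.10+",[65,105,106,107],"txtai","paperetl","sentence-transformers (optional)",[13,109,15,14,36],"其他",[111,112,113,114,115,116,117,105,118,119],"python","machine-learning","nlp","medical","search","scientific-papers","document-search","ai","artificial-intelligence","2026-03-27T02:49:30.150509","2026-04-20T19:23:41.194607",[123,128,133,138,143,147],{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},45309,"How do I fix 'UnicodeDecodeError: gbk' or other encoding errors when installing paperai on Windows?","On Windows, GBK encoding errors can occur when processing certain archives or dependencies (such as mdv). Recommended solutions:\n1. Install inside an Ubuntu environment under WSL (Windows Subsystem for Linux); this is the most reliable approach.\n2. Try setting the environment variable with `set PYTHONUTF8=1` (this does not always work).\n3. See the project's GitHub Actions script for build details: https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fblob\u002Fmaster\u002F.github\u002Fworkflows\u002Fbuild.yml\n4. Alternatively, install with Docker to avoid local environment configuration problems: https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fblob\u002Fmaster\u002Fdocker\u002FDockerfile","https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fissues\u002F20",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},45310,"A query returns no results (only the Highlights and Articles headings). How do I fix this?","If a query shows only the headings with no content, either the query itself matched nothing or the model was not loaded correctly. Suggestions:\n1. Check that the query contains meaningful keywords.\n2. Confirm that the model was built successfully and the vectors file was loaded.\n3. 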
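See the official Kaggle notebook examples for different configurations and query styles: https:\u002F\u002Fwww.kaggle.com\u002Fdavidmezzetti\u002Fnotebooks\n4. Make sure the vector model matches the literature data.","https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fissues\u002F19",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},45311,"How do I download and configure the pretrained vectors file (cord19-300d.magnitude)?","To configure the pretrained vectors:\n1. Download the vectors file from Kaggle: https:\u002F\u002Fwww.kaggle.com\u002Fdavidmezzetti\u002Fcord19-fasttext-vectors#cord19-300d.magnitude\n2. Create the directory `~\u002F.cord19\u002Fvectors\u002F`.\n3. Unzip the downloaded ZIP file to extract `cord19-300d.magnitude`.\n4. Move the file directly into `~\u002F.cord19\u002Fvectors\u002F` (do not keep any extra subfolder).\n5. Run `python -m paperai.index` to build the index.\nNote: if you get a 'Too many requests' error, try again later or download from a different network.","https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fissues\u002F6",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},45312,"Does paperai support integration with a Zotero library?","The project does not currently include built-in Zotero integration, but the community has provided an implementation. The Zotero database can be accessed through the SQLite API without installing additional AGPL software. A user has developed an integration plugin, hosted at: https:\u002F\u002Fgithub.com\u002FSoenkevL\u002Fpaperetl_zotero_integration.git. The plugin mainly adds a Zotero_extractor file and adapts the PDF reading and parsing code. For related discussion, see the paperetl repository issue: https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperetl\u002Fissues\u002F49","https:\u002F\u002Fgithub.com\u002Fneuml\u002Fpaperai\u002Fissues\u002F68",{"id":144,"question_zh":145,"answer_zh":146,"source_url":132},45313,"Are the cord19 vectors only suitable for COVID-19 literature, or can they be used with other biomedical literature?","Although the cord19 vectors were optimized for COVID-19 literature (the CORD-19 dataset), they can still work well with other biomedical literature, because the embeddings capture general biomedical terminology and semantic relationships. If needed, vectors trained on another domain can in principle be substituted, but this requires rebuilding the index and ensuring format compatibility. Test with the default model first.",{"id":148,"question_zh":149,"answer_zh":150,"source_url":132},45314,"How do I get report output in the same format as the demo page?","For report output like the demos, see the official Kaggle notebook examples, which demonstrate a range of configurations and output formats: https:\u002F\u002Fwww.kaggle.com\u002Fdavidmezzetti\u002Fnotebooks. Different configuration parameters affect the level of detail and the format of the output. Also make sure you are running the latest version of the code, since some stop word handling from older versions has been removed or optimized.",[152,157,162,167,172,177,182,187,192,197,202,207,212,217,222,227,232,237,242],{"id":153,"version":154,"summary_zh":155,"released_at":156},360181,"v2.5.0","This release adds the following enhancements and bug fixes:\n\n- 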
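Support passing LLM pipeline options to reports (#82)\n- Add sorting by citation count (#83)\n- Update README to better showcase paperai's features (#84)\n- Add medical research project example notebook (#85)","2025-07-01T17:43:16",{"id":158,"version":159,"summary_zh":160,"released_at":161},360182,"v2.4.0","This release adds the following enhancements and bug fixes:\n\n- Require Python 3.10 or later (#80)\n- Refactor word vectors code (#81)","2025-06-23T14:21:03",{"id":163,"version":164,"summary_zh":165,"released_at":166},360183,"v2.3.0","This release adds the following enhancements and bug fixes:\n\n- Consider switching from lxml clean_html to a safer (and possibly faster) alternative (#69)\n- Error with models.py file in paperai pip package (#77)\n- Update codebase for txtai 8.x compatibility (#78)\n- Require Python 3.9 or later (#79)","2024-12-28T19:43:40",{"id":168,"version":169,"summary_zh":170,"released_at":171},360184,"v2.2.1","This release adds the following enhancements and bug fixes:\n\n- Update setup.py to show only standard images on PyPI (#72)","2023-09-18T21:17:10",{"id":173,"version":174,"summary_zh":175,"released_at":176},360185,"v2.2.0","This release adds the following enhancements and bug fixes:\n\n- paperai for beginners (#60)\n- Add example notebooks (#63)\n- Change default index configuration (#64)\n- Shell doesn't accept command line arguments when run as a console script (#65)\n- Align report section queries with embeddings index queries (#66)\n- Fix passing an empty queue to the Extractor pipeline (#67)\n- Update minimum Python version to 3.8 (#71)\n- Upgrade to txtai 6.0 (#70)\n","2023-09-18T20:35:20",{"id":178,"version":179,"summary_zh":180,"released_at":181},360186,"v2.1.0","This release adds the following enhancements and bug fixes:\n\n- Add ability to read index configuration from a dictionary (#57)\n- Update tests to ensure reproducibility across environments (#58)\n- Add support for the latest txtai index options (#61)\n- Migrate from mdv to rich (#62)","2023-01-20T14:35:34",{"id":183,"version":184,"summary_zh":185,"released_at":186},360187,"v2.0.0","This release adds the following enhancements and bug fixes:\n\n- Allow setting report options in task YML files (#42)\n- Allow running reports against the entire database (#43)\n- Batch extractor queries (#44)\n- Remove study design column (#46)\n- Add option to specify context for extraction columns (#47)\n- Add report citation column (#48)\n- Add report column format parameter (#49)\n- Add pre-commit checks (#50)\n- Add check to report section queries to ensure text contains tokens (#51)\n- Remove default cord19 path in home directory (#52)\n- Require Python 3.7 or later (#54)\n- Update txtai to 4.3.1 (#56)","2022-03-12T01:09:33",{"id":188,"version":189,"summary_zh":190,"released_at":191},360188,"v1.10.0","Sync with txtai 3.3 (#41)","2021-09-10T18:26:28",{"id":193,"version":194,"summary_zh":195,"released_at":196},360189,"v1.9.0","Update to txtai 3.2 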
(#40)","2021-08-18T01:05:27",{"id":198,"version":199,"summary_zh":200,"released_at":201},360190,"v1.8.0","This release adds the following enhancements and bug fixes:\n\n- Add ability to read index YAML files (#18)\n- Switch from mdv to mdv3 to support Python 3.9 (#21)\n- Add enhanced API for paperai (#30)\n- Add configurable query threshold (#31)\n- Support query negation (#32)\n- Add search application (#33)\n\n","2021-04-23T12:27:44",{"id":203,"version":204,"summary_zh":205,"released_at":206},360191,"v1.7.0","This release adds the following enhancements and bug fixes:\r\n\r\n- Add pre-trained models to GitHub (#19, #27)\r\n- Add Dockerfile (#29)","2021-02-24T20:02:09",{"id":208,"version":209,"summary_zh":210,"released_at":211},360192,"v1.6.0","Sync with txtai 2.0 (#26)","2021-01-13T14:57:36",{"id":213,"version":214,"summary_zh":215,"released_at":216},360193,"v1.5.0","This release adds the following enhancements and bug fixes:\r\n\r\n- Add annotation report (#17)","2020-12-11T02:13:15",{"id":218,"version":219,"summary_zh":220,"released_at":221},360194,"v1.4.0","This release adds the following enhancements and bug fixes:\r\n\r\n- Allow specifying vector output file (#10, #11, #13)\r\n- Build test suite (#12)\r\n- Add additional column parameters (#14)\r\n- Allow indexing partial datasources (#15)\r\n- Add GitHub Actions build script (#16)","2020-11-06T14:43:14",{"id":223,"version":224,"summary_zh":225,"released_at":226},360195,"v1.3.0","This release addresses the following:\r\n\r\n- Remove NLTK dependency (#9)","2020-08-18T20:31:52",{"id":228,"version":229,"summary_zh":230,"released_at":231},360196,"v1.2.1","Minor README update to note the package can be installed from PyPI","2020-08-12T16:41:53",{"id":233,"version":234,"summary_zh":235,"released_at":236},360197,"v1.2.0","Release addresses the following:\r\n\r\n- Allow customizing the QA model used for QA extraction (#5)\r\n- Migrated embeddings index logic to the txtai project (#7)","2020-08-11T21:42:34",{"id":238,"version":239,"summary_zh":240,"released_at":241},360198,"v1.1.0","Release addresses the following:\r\n\r\n- Add wildcard report queries (#1) - 
Add ability to run reports against the entire database. This is only practical for smaller datasets.\r\n- Fix Windows install issues (#2)\r\n- Embeddings index memory improvements (#3) - Various improvements to limit memory usage when building an embeddings index\r\n- Support must clauses for custom query columns (#4) - Add the same logic already present in general queries to require a term to be present when deriving report query columns","2020-08-05T01:15:49",{"id":243,"version":244,"summary_zh":245,"released_at":246},360199,"v1.0.0","Initial release of paperai, migrating AI\u002FML\u002FSearch logic from the existing [cord19q](https:\u002F\u002Fgithub.com\u002Fneuml\u002Fcord19q) project.","2020-07-21T19:42:47"]