[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-VectifyAI--PageIndex":3,"tool-VectifyAI--PageIndex":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":89,"forks":90,"last_commit_at":91,"license":92,"difficulty_score":23,"env_os":93,"env_gpu":93,"env_ram":93,"env_deps":94,"category_tags":99,"github_topics":100,"view_count":113,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":114,"updated_at":115,"faqs":116,"releases":150},565,"VectifyAI\u002FPageIndex","PageIndex","📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG","PageIndex 是一款创新的开源 RAG（检索增强生成）框架，专为处理长篇幅专业文档而设计。它摒弃了传统的向量数据库和文本切片方案，转而采用“无向量、基于推理”的检索模式。\n\n传统 RAG 往往依赖语义相似度进行搜索，但这并不等同于真正的相关性，尤其在需要多步推理和专业知识的场景下容易失效。PageIndex 受 AlphaGo 启发，通过构建文档的层级树状索引（类似目录结构），引导大语言模型像人类专家一样在文档中进行推理和导航，从而精准定位关键信息。\n\n这一技术路径带来了显著优势：无需维护向量数据库，避免了切片导致的上下文断裂，实现了更接近人类思维方式的检索体验。PageIndex 非常适合希望提升文档问答准确性的开发者、AI 研究人员，以及需要处理复杂长文档的专业团队。通过 MCP 或 API 接口，它能轻松集成到现有应用中，让机器理解能力更上一层楼。","\u003Cdiv align=\"center\">\n  \n\u003Ca href=\"https:\u002F\u002Fvectify.ai\u002Fpageindex\" target=\"_blank\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FVectifyAI_PageIndex_readme_945eac4b7892.png\" alt=\"PageIndex Banner\" 
\u002F>\n\u003C\u002Fa>\n\n\u003Cbr\u002F>\n\u003Cbr\u002F>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F14736\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FVectifyAI_PageIndex_readme_4a68feb902da.png\" alt=\"VectifyAI%2FPageIndex | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n# PageIndex: Vectorless, Reasoning-based RAG\n\n\u003Cp align=\"center\">\u003Cb>Reasoning-based RAG&nbsp; ◦ &nbsp;No Vector DB&nbsp; ◦ &nbsp;No Chunking&nbsp; ◦ &nbsp;Human-like Retrieval\u003C\u002Fb>\u003C\u002Fp>\n\n\u003Ch4 align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fvectify.ai\">🏠 Homepage\u003C\u002Fa>&nbsp; • &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fchat.pageindex.ai\">🖥️ Chat Platform\u003C\u002Fa>&nbsp; • &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fpageindex.ai\u002Fmcp\">🔌 MCP\u003C\u002Fa>&nbsp; • &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fdocs.pageindex.ai\">📚 Docs\u003C\u002Fa>&nbsp; • &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fdiscord.com\u002Finvite\u002FVuXuf29EUj\">💬 Discord\u003C\u002Fa>&nbsp; • &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fii2abc2jejf.typeform.com\u002Fto\u002FtK3AXl8T\">✉️ Contact\u003C\u002Fa>&nbsp;\n\u003C\u002Fh4>\n  \n\u003C\u002Fdiv>\n\n\n\u003Cdetails open>\n\u003Csummary>\u003Ch2>📢 Updates\u003C\u002Fh2>\u003C\u002Fsummary>\n\n- 🔥 [**Agentic Vectorless RAG**](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fexamples\u002Fagentic_vectorless_rag_demo.py): A simple *agentic, vectorless RAG* [example](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fexamples\u002Fagentic_vectorless_rag_demo.py) with self-hosted PageIndex, using OpenAI Agents SDK.\n- [PageIndex Chat](https:\u002F\u002Fchat.pageindex.ai): A Human-like document analysis agent [platform](https:\u002F\u002Fchat.pageindex.ai) for professional 
long documents. Also available via [MCP](https:\u002F\u002Fpageindex.ai\u002Fmcp) or [API](https:\u002F\u002Fdocs.pageindex.ai\u002Fquickstart).\n- [PageIndex Framework](https:\u002F\u002Fpageindex.ai\u002Fblog\u002Fpageindex-intro): The PageIndex framework — an *agentic, in-context tree index* that enables LLMs to perform *reasoning-based, human-like retrieval* over long documents.\n\n \u003C!-- **🧪 Cookbooks:**\n- [Vectorless RAG](https:\u002F\u002Fdocs.pageindex.ai\u002Fcookbook\u002Fvectorless-rag-pageindex): A minimal, hands-on example of reasoning-based RAG using PageIndex. No vectors, no chunking, and human-like retrieval.\n- [Vision-based Vectorless RAG](https:\u002F\u002Fdocs.pageindex.ai\u002Fcookbook\u002Fvision-rag-pageindex): OCR-free, vision-only RAG with PageIndex's reasoning-native retrieval workflow that works directly over PDF page images. -->\n\n\u003C\u002Fdetails>\n\n---\n\n# 📑 Introduction to PageIndex\n\nAre you frustrated with vector database retrieval accuracy for long professional documents? Traditional vector-based RAG relies on semantic *similarity* rather than true *relevance*. But **similarity ≠ relevance** — what we truly need in retrieval is **relevance**, and that requires **reasoning**. When working with professional documents that demand domain expertise and multi-step reasoning, similarity search often falls short.\n\nInspired by AlphaGo, we propose **[PageIndex](https:\u002F\u002Fvectify.ai\u002Fpageindex)** — a **vectorless**, **reasoning-based RAG** system that builds a **hierarchical tree index** from long documents and uses LLMs to **reason** *over that index* for **agentic, context-aware retrieval**.\nIt simulates how *human experts* navigate and extract knowledge from complex documents through *tree search*, enabling LLMs to *think* and *reason* their way to the most relevant document sections. PageIndex performs retrieval in two steps:\n\n1. Generate a “Table-of-Contents” **tree structure index** of documents\n2. 
Perform reasoning-based retrieval through **tree search**\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fpageindex.ai\u002Fblog\u002Fpageindex-intro\" target=\"_blank\" title=\"The PageIndex Framework\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FVectifyAI_PageIndex_readme_72d998f296e2.png\" width=\"70%\">\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n### 🎯 Core Features\n\nCompared to traditional vector-based RAG, **PageIndex** features:\n- **No Vector DB**: Uses document structure and LLM reasoning for retrieval, instead of vector similarity search.\n- **No Chunking**: Documents are organized into natural sections, not artificial chunks.\n- **Human-like Retrieval**: Simulates how human experts navigate and extract knowledge from complex documents.\n- **Better Explainability and Traceability**: Retrieval is based on reasoning — traceable and interpretable, with page and section references. No more opaque, approximate vector search (“vibe retrieval”).\n\nPageIndex powers a reasoning-based RAG system that achieved **state-of-the-art** [98.7% accuracy](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FMafin2.5-FinanceBench) on FinanceBench, demonstrating superior performance over vector-based RAG solutions in professional document analysis (see our [blog post](https:\u002F\u002Fvectify.ai\u002Fblog\u002FMafin2.5) for details).\n\n### 📍 Explore PageIndex\n\nTo learn more, please see a detailed introduction of the [PageIndex framework](https:\u002F\u002Fpageindex.ai\u002Fblog\u002Fpageindex-intro). 
Check out this GitHub repo for open-source code, and the [cookbooks](https:\u002F\u002Fdocs.pageindex.ai\u002Fcookbook), [tutorials](https:\u002F\u002Fdocs.pageindex.ai\u002Ftutorials), and [blog](https:\u002F\u002Fpageindex.ai\u002Fblog) for additional usage guides and examples.\n\nThe PageIndex service is available as a ChatGPT-style [chat platform](https:\u002F\u002Fchat.pageindex.ai), or can be integrated via [MCP](https:\u002F\u002Fpageindex.ai\u002Fmcp) or [API](https:\u002F\u002Fdocs.pageindex.ai\u002Fquickstart).\n\n### 🛠️ Deployment Options\n- Self-host — run locally with this open-source repo.\n- Cloud Service — try instantly with our [Chat Platform](https:\u002F\u002Fchat.pageindex.ai\u002F), or integrate with [MCP](https:\u002F\u002Fpageindex.ai\u002Fmcp) or [API](https:\u002F\u002Fdocs.pageindex.ai\u002Fquickstart).\n- _Enterprise_ — private or on-prem deployment. [Contact us](https:\u002F\u002Fii2abc2jejf.typeform.com\u002Fto\u002FtK3AXl8T) or [book a demo](https:\u002F\u002Fcalendly.com\u002Fpageindex\u002Fmeet) for more details.\n\n### 🧪 Quick Hands-on\n\n- 🔥 [**Agentic Vectorless RAG**](examples\u002Fagentic_vectorless_rag_demo.py) (**latest**) — a simple but complete **agentic vectorless RAG** [example](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fexamples\u002Fagentic_vectorless_rag_demo.py) with *self-hosted* PageIndex, using OpenAI Agents SDK.\n- Try the [Vectorless RAG](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fcookbook\u002Fpageindex_RAG_simple.ipynb) notebook — a *minimal*, hands-on example of reasoning-based RAG using PageIndex.\n- Check out [Vision-based Vectorless RAG](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fcookbook\u002Fvision_RAG_pageindex.ipynb) — no OCR; a minimal, vision-based & reasoning-native RAG pipeline that works directly over page images.\n  \n\u003Cdiv align=\"center\">\n  \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fexamples\u002Fagentic_vectorless_rag_demo.py\" target=\"_blank\" rel=\"noopener\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FView_on_GitHub-Agentic_Vectorless_RAG-blue?style=for-the-badge&logo=github\" alt=\"View on GitHub: Agentic Vectorless RAG\" \u002F>\n  \u003C\u002Fa>\n  \u003Cbr\u002F>\n  \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fcookbook\u002Fpageindex_RAG_simple.ipynb\" target=\"_blank\" rel=\"noopener\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOpen_In_Colab-Vectorless_RAG-orange?style=for-the-badge&logo=googlecolab\" alt=\"Open in Colab: Vectorless RAG\" \u002F>\n  \u003C\u002Fa>\n  &nbsp;&nbsp;\n  \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fcookbook\u002Fvision_RAG_pageindex.ipynb\" target=\"_blank\" rel=\"noopener\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOpen_In_Colab-Vision_RAG-orange?style=for-the-badge&logo=googlecolab\" alt=\"Open in Colab: Vision RAG\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n---\n\n# 🌲 PageIndex Tree Structure\n\nPageIndex can transform lengthy PDF documents into a semantic **tree structure**, similar to a _\"table of contents\"_ but optimized for use with Large Language Models (LLMs). It's ideal for: financial reports, regulatory filings, academic textbooks, legal or technical manuals, and any document that exceeds LLM context limits.\n\nBelow is an example PageIndex tree structure. 
Also see more example [documents](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Ftree\u002Fmain\u002Fexamples\u002Fdocuments) and generated [tree structures](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Ftree\u002Fmain\u002Fexamples\u002Fdocuments\u002Fresults).\n\n```jsonc\n...\n{\n  \"title\": \"Financial Stability\",\n  \"node_id\": \"0006\",\n  \"start_index\": 21,\n  \"end_index\": 22,\n  \"summary\": \"The Federal Reserve ...\",\n  \"nodes\": [\n    {\n      \"title\": \"Monitoring Financial Vulnerabilities\",\n      \"node_id\": \"0007\",\n      \"start_index\": 22,\n      \"end_index\": 28,\n      \"summary\": \"The Federal Reserve's monitoring ...\"\n    },\n    {\n      \"title\": \"Domestic and International Cooperation and Coordination\",\n      \"node_id\": \"0008\",\n      \"start_index\": 28,\n      \"end_index\": 31,\n      \"summary\": \"In 2023, the Federal Reserve collaborated ...\"\n    }\n  ]\n}\n...\n```\n\nYou can generate the PageIndex tree structure with this open-source repo, or use our [API](https:\u002F\u002Fdocs.pageindex.ai\u002Fquickstart).\n\n---\n\n# ⚙️ Package Usage\n\nYou can follow these steps to generate a PageIndex tree from a PDF document.\n\n### 1. Install dependencies\n\n```bash\npip3 install --upgrade -r requirements.txt\n```\n\n### 2. Set your LLM API key\n\nCreate a `.env` file in the root directory with your LLM API key, with multi-LLM support via [LiteLLM](https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders):\n\n```bash\nOPENAI_API_KEY=your_openai_key_here\n```\n\n### 3. 
Generate PageIndex structure for your PDF\n\n```bash\npython3 run_pageindex.py --pdf_path \u002Fpath\u002Fto\u002Fyour\u002Fdocument.pdf\n```\n\n\u003Cdetails>\n\u003Csummary>Optional parameters\u003C\u002Fsummary>\n\u003Cbr>\nYou can customize the processing with additional optional arguments:\n\n```\n--model                 LLM model to use (default: gpt-4o-2024-11-20)\n--toc-check-pages       Pages to check for table of contents (default: 20)\n--max-pages-per-node    Max pages per node (default: 10)\n--max-tokens-per-node   Max tokens per node (default: 20000)\n--if-add-node-id        Add node ID (yes\u002Fno, default: yes)\n--if-add-node-summary   Add node summary (yes\u002Fno, default: yes)\n--if-add-doc-description Add doc description (yes\u002Fno, default: yes)\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Markdown support\u003C\u002Fsummary>\n\u003Cbr>\nWe also provide markdown support for PageIndex. You can use the `--md_path` flag to generate a tree structure for a markdown file.\n\n```bash\npython3 run_pageindex.py --md_path \u002Fpath\u002Fto\u002Fyour\u002Fdocument.md\n```\n\n> Note: in this mode, we use \"#\" to determine node headings and their levels. For example, \"##\" is level 2, \"###\" is level 3, etc. Make sure your markdown file is formatted correctly. If your Markdown file was converted from a PDF or HTML, we don't recommend using this mode, since most existing conversion tools cannot preserve the original hierarchy. 
Instead, use our [PageIndex OCR](https:\u002F\u002Fpageindex.ai\u002Fblog\u002Focr), which is designed to preserve the original hierarchy, to convert the PDF to a markdown file and then use this mode.\n\u003C\u002Fdetails>\n\n### Agentic Vectorless RAG Example\n\nFor a simple, end-to-end _**agentic vectorless RAG**_ example using PageIndex (with OpenAI Agents SDK), see [`examples\u002Fagentic_vectorless_rag_demo.py`](examples\u002Fagentic_vectorless_rag_demo.py).\n\n```bash\n# Install optional dependency\npip3 install openai-agents\n\n# Run the demo\npython3 examples\u002Fagentic_vectorless_rag_demo.py\n```\n\n\u003C!--\n# ☁️ Improved Tree Generation with PageIndex OCR\n\nThis repo is designed for generating PageIndex tree structure for simple PDFs, but many real-world use cases involve complex PDFs that are hard to parse by classic Python tools. However, extracting high-quality text from PDF documents remains a non-trivial challenge. Most OCR tools only extract page-level content, losing the broader document context and hierarchy.\n\nTo address this, we introduced PageIndex OCR — the first long-context OCR model designed to preserve the global structure of documents. 
PageIndex OCR significantly outperforms other leading OCR tools, such as those from Mistral and Contextual AI, in recognizing true hierarchy and semantic relationships across document pages.\n\n- Experience next-level OCR quality with PageIndex OCR at our [Dashboard](https:\u002F\u002Fdash.pageindex.ai\u002F).\n- Integrate PageIndex OCR seamlessly into your stack via our [API](https:\u002F\u002Fdocs.pageindex.ai\u002Fquickstart).\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FVectifyAI_PageIndex_readme_8a607f3eb1e1.png\" width=\"80%\">\n\u003C\u002Fp>\n-->\n\n---\n\n# 📈 Case Study: PageIndex Leads Finance QA Benchmark\n\n[Mafin 2.5](https:\u002F\u002Fvectify.ai\u002Fmafin) is a reasoning-based RAG system for financial document analysis, powered by **PageIndex**. It achieved a state-of-the-art [**98.7% accuracy**](https:\u002F\u002Fvectify.ai\u002Fblog\u002FMafin2.5) on the [FinanceBench](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.11944) benchmark, significantly outperforming traditional vector-based RAG systems.\n\nPageIndex's hierarchical indexing and reasoning-driven retrieval enable precise navigation and extraction of relevant context from complex financial reports, such as SEC filings and earnings disclosures.\n\nExplore the full [benchmark results](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FMafin2.5-FinanceBench) and our [blog post](https:\u002F\u002Fvectify.ai\u002Fblog\u002FMafin2.5) for detailed comparisons and performance metrics.\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FMafin2.5-FinanceBench\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FVectifyAI_PageIndex_readme_3d68dcdb1f19.png\" width=\"70%\">\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n---\n\n# 🧭 Resources\n\n* 🧪 [Cookbooks](https:\u002F\u002Fdocs.pageindex.ai\u002Fcookbook\u002Fvectorless-rag-pageindex): hands-on, runnable examples and advanced use 
cases.\n* 📖 [Tutorials](https:\u002F\u002Fdocs.pageindex.ai\u002Fdoc-search): practical guides and strategies, including *Document Search* and *Tree Search*.\n* 📝 [Blog](https:\u002F\u002Fpageindex.ai\u002Fblog): technical articles, research insights, and product updates.\n* 🔌 [MCP setup](https:\u002F\u002Fpageindex.ai\u002Fmcp#quick-setup) & [API docs](https:\u002F\u002Fdocs.pageindex.ai\u002Fquickstart): integration details and configuration options.\n\n---\n\n# ⭐ Support Us\nPlease cite this work as:\n```\nMingtian Zhang, Yu Tang and PageIndex Team,\n\"PageIndex: Next-Generation Vectorless, Reasoning-based RAG\",\nPageIndex Blog, Sep 2025.\n```\n\n\u003Cdetails>\n\u003Csummary>Or use the BibTeX citation.\u003C\u002Fsummary>\n\n```bibtex\n@article{zhang2025pageindex,\n  author = {Mingtian Zhang and Yu Tang and PageIndex Team},\n  title = {PageIndex: Next-Generation Vectorless, Reasoning-based RAG},\n  journal = {PageIndex Blog},\n  year = {2025},\n  month = {September},\n  note = {https:\u002F\u002Fpageindex.ai\u002Fblog\u002Fpageindex-intro},\n}\n```\n\u003C\u002Fdetails>\n\nLeave us a star 🌟 if you like our project. Thank you!  
\n\n\u003Cp>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FVectifyAI_PageIndex_readme_ec04d3e57d2b.png\" width=\"80%\">\n\u003C\u002Fp>\n\n### Connect with Us\n\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter-000000?style=for-the-badge&logo=x&logoColor=white)](https:\u002F\u002Fx.com\u002FPageIndexAI)&nbsp;\n[![LinkedIn](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Fvectify-ai\u002F)&nbsp;\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.com\u002Finvite\u002FVuXuf29EUj)&nbsp;\n[![Contact Us](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FContact_Us-3B82F6?style=for-the-badge&logo=envelope&logoColor=white)](https:\u002F\u002Fii2abc2jejf.typeform.com\u002Fto\u002FtK3AXl8T)\n\n---\n\n© 2026 [Vectify AI](https:\u002F\u002Fvectify.ai)\n","\u003Cdiv align=\"center\">\n  \n\u003Ca href=\"https:\u002F\u002Fvectify.ai\u002Fpageindex\" target=\"_blank\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FVectifyAI_PageIndex_readme_945eac4b7892.png\" alt=\"PageIndex 横幅\" \u002F>\n\u003C\u002Fa>\n\n\u003Cbr\u002F>\n\u003Cbr\u002F>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F14736\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FVectifyAI_PageIndex_readme_4a68feb902da.png\" alt=\"VectifyAI%2FPageIndex | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n# PageIndex：无向量、基于推理的 RAG (检索增强生成)\n\n\u003Cp align=\"center\">\u003Cb>基于推理的 RAG (检索增强生成) ◦ 无需向量数据库 (Vector DB) ◦ 无需分块 (Chunking) ◦ 类人检索\u003C\u002Fb>\u003C\u002Fp>\n\n\u003Ch4 align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fvectify.ai\">🏠 主页\u003C\u002Fa>&nbsp; 
• &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fchat.pageindex.ai\">🖥️ 聊天平台\u003C\u002Fa>&nbsp; • &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fpageindex.ai\u002Fmcp\">🔌 MCP (模型上下文协议)\u003C\u002Fa>&nbsp; • &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fdocs.pageindex.ai\">📚 文档\u003C\u002Fa>&nbsp; • &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fdiscord.com\u002Finvite\u002FVuXuf29EUj\">💬 Discord\u003C\u002Fa>&nbsp; • &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fii2abc2jejf.typeform.com\u002Fto\u002FtK3AXl8T\">✉️ 联系\u003C\u002Fa>&nbsp;\n\u003C\u002Fh4>\n  \n\u003C\u002Fdiv>\n\n\n\u003Cdetails open>\n\u003Csummary>\u003Ch2>📢 更新\u003C\u002Fh2>\u003C\u002Fsummary>\n\n- 🔥 [**代理式 (Agentic) 无向量 RAG**](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fexamples\u002Fagentic_vectorless_rag_demo.py): 一个使用 OpenAI Agents SDK 和自托管 PageIndex 的简单 *代理式、无向量 RAG* [示例](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fexamples\u002Fagentic_vectorless_rag_demo.py)。\n- [PageIndex 聊天](https:\u002F\u002Fchat.pageindex.ai): 针对专业长文档的类人文档分析智能体 [平台](https:\u002F\u002Fchat.pageindex.ai)。也可通过 [MCP (模型上下文协议)](https:\u002F\u002Fpageindex.ai\u002Fmcp) 或 [API (应用程序接口)](https:\u002F\u002Fdocs.pageindex.ai\u002Fquickstart) 访问。\n- [PageIndex 框架](https:\u002F\u002Fpageindex.ai\u002Fblog\u002Fpageindex-intro): PageIndex 框架 —— 一种 *代理式、上下文内 (in-context) 树索引*，使 LLMs (大语言模型) 能够对长文档进行 *基于推理的、类人检索*。\n\n \u003C!-- **🧪 示例代码库：**\n- [无向量 RAG](https:\u002F\u002Fdocs.pageindex.ai\u002Fcookbook\u002Fvectorless-rag-pageindex): 使用 PageIndex 进行基于推理的 RAG 的最小化动手示例。无需向量，无需分块，实现类人检索。\n- [基于视觉的无向量 RAG](https:\u002F\u002Fdocs.pageindex.ai\u002Fcookbook\u002Fvision-rag-pageindex): 使用 PageIndex 的原生推理检索工作流，直接处理 PDF 页面图像，无需 OCR。 -->\n\n\u003C\u002Fdetails>\n\n---\n\n# 📑 PageIndex 简介\n\n您是否对专业长文档的向量数据库检索准确率感到沮丧？传统的基于向量的 RAG 依赖于语义 *相似度* 而非真正的 *相关性*。但 **相似度 ≠ 相关性** —— 我们在检索中真正需要的是 **相关性**，而这需要 **推理**。在处理需要领域专业知识和多步推理的专业文档时，相似度搜索往往力不从心。\n\n受 AlphaGo 启发，我们提出了 
**[PageIndex](https:\u002F\u002Fvectify.ai\u002Fpageindex)** —— 一个 **无向量**、**基于推理的 RAG (检索增强生成)** 系统，它从长文档构建 **分层树索引**，并利用 LLMs (大语言模型) 在该索引上进行 **推理**，以实现 **代理式、感知上下文的检索**。它模拟了 *人类专家* 如何通过 *树搜索* 导航并从复杂文档中提取知识，使 LLMs 能够 *思考* 并 *推理* 出最相关的文档部分。PageIndex 执行检索的两个步骤如下：\n\n1. 生成文档的“目录”**树结构索引**\n2. 通过 **树搜索** 执行基于推理的检索\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fpageindex.ai\u002Fblog\u002Fpageindex-intro\" target=\"_blank\" title=\"PageIndex 框架\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FVectifyAI_PageIndex_readme_72d998f296e2.png\" width=\"70%\">\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n### 🎯 核心功能\n\n与传统的基于向量的 RAG 相比，**PageIndex** 具备以下特点：\n- **无需向量数据库 (Vector DB)**：利用文档结构和 LLM 推理进行检索，而非向量相似度搜索。\n- **无需分块 (Chunking)**：文档被组织成自然章节，而非人工分块。\n- **类人检索**：模拟人类专家如何导航并从复杂文档中提取知识。\n- **更好的可解释性和可追溯性**：检索基于推理——可追溯且可解释，包含页码和章节引用。不再有不透明、近似的向量搜索（“氛围”检索 vibe retrieval）。\n\nPageIndex 驱动了一个基于推理的 RAG 系统，该系统在 FinanceBench 上达到了 **最先进的 (State-of-the-art)** [98.7% 准确率](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FMafin2.5-FinanceBench)，证明了其在专业文档分析方面优于基于向量的 RAG 解决方案的性能（详情见我们的 [博客文章](https:\u002F\u002Fvectify.ai\u002Fblog\u002FMafin2.5)）。\n\n### 📍 探索 PageIndex\n\n欲了解更多，请参阅 [PageIndex 框架](https:\u002F\u002Fpageindex.ai\u002Fblog\u002Fpageindex-intro) 的详细介绍。查看本 GitHub 仓库以获取开源代码，并参考 [示例代码库 (cookbooks)](https:\u002F\u002Fdocs.pageindex.ai\u002Fcookbook)、[教程](https:\u002F\u002Fdocs.pageindex.ai\u002Ftutorials) 和 [博客](https:\u002F\u002Fpageindex.ai\u002Fblog) 以获取更多使用指南和示例。\n\nPageIndex 服务可作为 ChatGPT 风格的 [聊天平台](https:\u002F\u002Fchat.pageindex.ai) 使用，或通过 [MCP (模型上下文协议)](https:\u002F\u002Fpageindex.ai\u002Fmcp) 或 [API (应用程序接口)](https:\u002F\u002Fdocs.pageindex.ai\u002Fquickstart) 集成。\n\n### 🛠️ 部署选项\n- **自托管** —— 使用此开源仓库在本地运行。\n- **云服务** —— 立即试用我们的 [聊天平台](https:\u002F\u002Fchat.pageindex.ai\u002F)，或与 [MCP (模型上下文协议)](https:\u002F\u002Fpageindex.ai\u002Fmcp) 或 [API (应用程序接口)](https:\u002F\u002Fdocs.pageindex.ai\u002Fquickstart) 集成。\n- _企业版_ 
—— 私有或本地 (on-prem) 部署。[联系我们](https:\u002F\u002Fii2abc2jejf.typeform.com\u002Fto\u002FtK3AXl8T) 或 [预约演示](https:\u002F\u002Fcalendly.com\u002Fpageindex\u002Fmeet) 以了解更多信息。\n\n### 🧪 快速上手\n\n- 🔥 [**智能体式无向量 RAG（检索增强生成）**](examples\u002Fagentic_vectorless_rag_demo.py) (**最新版**) — 一个简单但完整的 **智能体式无向量 RAG** [示例](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fexamples\u002Fagentic_vectorless_rag_demo.py)，使用 *自托管* PageIndex 和 OpenAI Agents SDK。\n- 尝试 [无向量 RAG](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fcookbook\u002Fpageindex_RAG_simple.ipynb) 笔记本 — 一个使用 PageIndex 进行基于推理的 RAG 的 *极简* 动手示例。\n- 查看 [基于视觉的无向量 RAG](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fcookbook\u002Fvision_RAG_pageindex.ipynb) — 无需 OCR（光学字符识别）；这是一个直接在页面图像上运行的、基于视觉且原生支持推理的 RAG 流水线，属于极简版本。\n  \n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fexamples\u002Fagentic_vectorless_rag_demo.py\" target=\"_blank\" rel=\"noopener\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FView_on_GitHub-Agentic_Vectorless_RAG-blue?style=for-the-badge&logo=github\" alt=\"在 GitHub 上查看：智能体式无向量 RAG\" \u002F>\n  \u003C\u002Fa>\n  \u003Cbr\u002F>\n  \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fcookbook\u002Fpageindex_RAG_simple.ipynb\" target=\"_blank\" rel=\"noopener\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOpen_In_Colab-Vectorless_RAG-orange?style=for-the-badge&logo=googlecolab\" alt=\"在 Colab 中打开：无向量 RAG\" \u002F>\n  \u003C\u002Fa>\n  &nbsp;&nbsp;\n  \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FVectifyAI\u002FPageIndex\u002Fblob\u002Fmain\u002Fcookbook\u002Fvision_RAG_pageindex.ipynb\" target=\"_blank\" rel=\"noopener\">\n    \u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOpen_In_Colab-Vision_RAG-orange?style=for-the-badge&logo=googlecolab\" alt=\"在 Colab 中打开：视觉 RAG\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n---\n\n# 🌲 PageIndex 树状结构\n\nPageIndex 可以将长篇 PDF 文档转换为语义 **树状结构**，类似于 _\"目录\"_，但针对与大语言模型 (LLMs) 配合使用进行了优化。它非常适合以下场景：财务报告、监管文件、学术教科书、法律或技术手册，以及任何超出 LLM 上下文限制的文档。\n\n下面是一个 PageIndex 树状结构的示例。还可以查看更多示例 [文档](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Ftree\u002Fmain\u002Fexamples\u002Fdocuments) 和生成的 [树状结构](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Ftree\u002Fmain\u002Fexamples\u002Fdocuments\u002Fresults)。\n\n```jsonc\n...\n{\n  \"title\": \"Financial Stability\",\n  \"node_id\": \"0006\",\n  \"start_index\": 21,\n  \"end_index\": 22,\n  \"summary\": \"The Federal Reserve ...\",\n  \"nodes\": [\n    {\n      \"title\": \"Monitoring Financial Vulnerabilities\",\n      \"node_id\": \"0007\",\n      \"start_index\": 22,\n      \"end_index\": 28,\n      \"summary\": \"The Federal Reserve's monitoring ...\"\n    },\n    {\n      \"title\": \"Domestic and International Cooperation and Coordination\",\n      \"node_id\": \"0008\",\n      \"start_index\": 28,\n      \"end_index\": 31,\n      \"summary\": \"In 2023, the Federal Reserve collaborated ...\"\n    }\n  ]\n}\n...\n```\n\n您可以使用此开源仓库生成 PageIndex 树状结构，或者使用我们的 [API（应用程序编程接口）](https:\u002F\u002Fdocs.pageindex.ai\u002Fquickstart)。\n\n---\n\n# ⚙️ 包使用\n\n您可以按照以下步骤从 PDF 文档生成 PageIndex 树状结构。\n\n### 1. 安装依赖项\n\n```bash\npip3 install --upgrade -r requirements.txt\n```\n\n### 2. 设置您的 LLM API 密钥\n\n在根目录创建一个 `.env` 文件，包含您的 LLM API 密钥，并通过 [LiteLLM](https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders) 支持多 LLM：\n\n```bash\nOPENAI_API_KEY=your_openai_key_here\n```\n\n### 3. 
Generate the PageIndex structure for your PDF\n\n```bash\npython3 run_pageindex.py --pdf_path \u002Fpath\u002Fto\u002Fyour\u002Fdocument.pdf\n```\n\n\u003Cdetails>\n\u003Csummary>Optional parameters\u003C\u002Fsummary>\n\u003Cbr>\nYou can customize processing with additional optional arguments:\n\n```\n--model                 LLM model to use (default: gpt-4o-2024-11-20)\n--toc-check-pages       Pages to check for table of contents (default: 20)\n--max-pages-per-node    Max pages per node (default: 10)\n--max-tokens-per-node   Max tokens per node (default: 20000)\n--if-add-node-id        Add node ID (yes\u002Fno, default: yes)\n--if-add-node-summary   Add node summary (yes\u002Fno, default: yes)\n--if-add-doc-description Add doc description (yes\u002Fno, default: yes)\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Markdown support\u003C\u002Fsummary>\n\u003Cbr>\nWe also provide Markdown support for PageIndex. You can use the `--md_path` flag to generate a tree structure for a Markdown file.\n\n```bash\npython3 run_pageindex.py --md_path \u002Fpath\u002Fto\u002Fyour\u002Fdocument.md\n```\n\n> Note: in this mode, we use \"#\" to determine node headings and their levels. For example, \"##\" is level 2, \"###\" is level 3, and so on. Make sure your Markdown file is formatted correctly. If your Markdown file was converted from a PDF or HTML, we do not recommend this mode, because most existing conversion tools cannot preserve the original hierarchy. Instead, use our [PageIndex OCR](https:\u002F\u002Fpageindex.ai\u002Fblog\u002Focr), which is designed to preserve the original hierarchy, to convert the PDF to a Markdown file and then use this mode.\n\u003C\u002Fdetails>\n\n### Agentic Vectorless RAG Example\n\nFor a simple, end-to-end _**agentic vectorless RAG**_ example using PageIndex (with the OpenAI Agents SDK), see [`examples\u002Fagentic_vectorless_rag_demo.py`](examples\u002Fagentic_vectorless_rag_demo.py).\n\n```bash\n# Install optional dependency\npip3 install openai-agents\n\n# Run the demo\npython3 examples\u002Fagentic_vectorless_rag_demo.py\n```\n\n\u003C!--\n# ☁️ Improved Tree Generation with PageIndex OCR\n\nThis repo is designed for generating PageIndex tree structures for simple PDFs, but many real-world use cases involve complex PDFs that are hard to parse with classic Python tools. Extracting high-quality text from PDF documents remains a non-trivial challenge. Most OCR tools only extract page-level content, losing the broader document context and hierarchy.\n\nTo address this, we introduced PageIndex OCR, the first long-context OCR model designed to preserve the global structure of documents. PageIndex OCR significantly outperforms other leading OCR tools, such as those from Mistral and Contextual AI, in recognizing true hierarchy and semantic relationships across document pages.\n\n- Visit our 
[Dashboard](https:\u002F\u002Fdash.pageindex.ai\u002F) to experience next-generation OCR quality.\n- Integrate PageIndex OCR seamlessly into your stack via our [API](https:\u002F\u002Fdocs.pageindex.ai\u002Fquickstart).\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FVectifyAI_PageIndex_readme_8a607f3eb1e1.png\" width=\"80%\">\n\u003C\u002Fp>\n-->\n\n# 📈 Case Study: PageIndex Leads the Finance QA Benchmark\n\n[Mafin 2.5](https:\u002F\u002Fvectify.ai\u002Fmafin) is a reasoning-based RAG (retrieval-augmented generation) system for financial document analysis, powered by **PageIndex**. It achieved a state-of-the-art [**98.7% accuracy**](https:\u002F\u002Fvectify.ai\u002Fblog\u002FMafin2.5) on the [FinanceBench](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.11944) benchmark, significantly outperforming traditional vector-based RAG systems.\n\nPageIndex's hierarchical indexing and reasoning-driven retrieval enable precise navigation and extraction of relevant context from complex financial reports, such as SEC filings and earnings disclosures.\n\nExplore the full [benchmark results](https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FMafin2.5-FinanceBench) and our [blog post](https:\u002F\u002Fvectify.ai\u002Fblog\u002FMafin2.5) for detailed comparisons and performance metrics.\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FMafin2.5-FinanceBench\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FVectifyAI_PageIndex_readme_3d68dcdb1f19.png\" width=\"70%\">\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n---\n\n# 🧭 Resources\n\n* 🧪 [Cookbook](https:\u002F\u002Fdocs.pageindex.ai\u002Fcookbook\u002Fvectorless-rag-pageindex): hands-on, runnable examples and advanced use cases.\n* 📖 [Tutorials](https:\u002F\u002Fdocs.pageindex.ai\u002Fdoc-search): practical guides and strategies, including *Document Search* and *Tree Search*.\n* 📝 [Blog](https:\u002F\u002Fpageindex.ai\u002Fblog): technical articles, research insights, and product updates.\n* 🔌 [MCP setup](https:\u002F\u002Fpageindex.ai\u002Fmcp#quick-setup) and [API docs](https:\u002F\u002Fdocs.pageindex.ai\u002Fquickstart): integration details and configuration options.\n\n---\n\n# ⭐ Support Us\nPlease cite this work as:\n```\nMingtian Zhang, Yu Tang and PageIndex Team,\n\"PageIndex: Next-Generation Vectorless, Reasoning-based RAG\",\nPageIndex Blog, Sep 2025.\n```\n\n\u003Cdetails>\n\u003Csummary>Or use the BibTeX citation.\u003C\u002Fsummary>\n\n```bibtex\n@article{zhang2025pageindex,\n  author = {Mingtian Zhang and Yu Tang and PageIndex Team},\n  title = 
{PageIndex: Next-Generation Vectorless, Reasoning-based RAG},\n  journal = {PageIndex Blog},\n  year = {2025},\n  month = {September},\n  note = {https:\u002F\u002Fpageindex.ai\u002Fblog\u002Fpageindex-intro},\n}\n```\n\u003C\u002Fdetails>\n\nIf you like our project, please give us a star 🌟. Thank you!  \n\n\u003Cp>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FVectifyAI_PageIndex_readme_ec04d3e57d2b.png\" width=\"80%\">\n\u003C\u002Fp>\n\n### Connect with Us\n\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter-000000?style=for-the-badge&logo=x&logoColor=white)](https:\u002F\u002Fx.com\u002FPageIndexAI)&nbsp;\n[![LinkedIn](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002Fvectify-ai\u002F)&nbsp;\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.com\u002Finvite\u002FVuXuf29EUj)&nbsp;\n[![Contact Us](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FContact_Us-3B82F6?style=for-the-badge&logo=envelope&logoColor=white)](https:\u002F\u002Fii2abc2jejf.typeform.com\u002Fto\u002FtK3AXl8T)\n\n---\n\n© 2026 [Vectify AI](https:\u002F\u002Fvectify.ai)","# PageIndex Quick Start Guide\n\n## Introduction\nPageIndex is a reasoning-based, vectorless RAG (retrieval-augmented generation) system. It builds a hierarchical tree index and uses LLM reasoning to mimic how a human expert navigates a complex document, giving efficient, explainable retrieval without a vector database or manual chunking.\n\n## Prerequisites\n- **Operating system**: Linux \u002F macOS \u002F Windows\n- **Python version**: Python 3.x\n- **Required tools**: Git, Pip\n- **API key**: an OpenAI key, or a key for any other LLM provider compatible with [LiteLLM](https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders).\n\n## Installation\n\n### 1. Clone the repository\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex.git\ncd PageIndex\n```\n\n### 2. Install dependencies\n```bash\npip3 install --upgrade -r requirements.txt\n```\n*(Note: users in mainland China who run into network issues may want to configure a pip mirror to speed up downloads.)*\n\n### 3. 
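Configure environment variables\nCreate a `.env` file in the root directory and add your LLM API key:\n```bash\nOPENAI_API_KEY=your_openai_key_here\n```\n> Tip: if you use a non-OpenAI model, set the corresponding environment variables as described in the LiteLLM documentation.\n\n## Basic Usage\n\n### 1. Generate a document tree index (PDF)\nConvert a long PDF document into a semantic tree structure:\n```bash\npython3 run_pageindex.py --pdf_path \u002Fpath\u002Fto\u002Fyour\u002Fdocument.pdf\n```\n\n**Common optional parameters:**\n```bash\n--model                 # LLM model to use (default: gpt-4o-2024-11-20)\n--max-pages-per-node    # Maximum pages per node (default: 10)\n--if-add-node-summary   # Whether to add node summaries (default: yes)\n```\n\n### 2. Generate a document tree index (Markdown)\nMarkdown files can be parsed directly (make sure heading levels such as `##` and `###` are formatted correctly):\n```bash\npython3 run_pageindex.py --md_path \u002Fpath\u002Fto\u002Fyour\u002Fdocument.md\n```\n\n### 3. Run the agentic vectorless RAG example\nA complete agent-driven retrieval demo built with the OpenAI Agents SDK:\n```bash\n# Install optional dependency\npip3 install openai-agents\n\n# Run the demo\npython3 examples\u002Fagentic_vectorless_rag_demo.py\n```\n\n## More Resources\n- 📚 **Official docs**: [docs.pageindex.ai](https:\u002F\u002Fdocs.pageindex.ai)\n- 🖥️ **Try it online**: [Chat Platform](https:\u002F\u002Fchat.pageindex.ai)\n- 💬 **Community**: [Discord](https:\u002F\u002Fdiscord.com\u002Finvite\u002FVuXuf29EUj)","A compliance analyst at a law firm is working through a 200-page cross-border merger agreement and needs to quickly locate the logical relationship between its “non-compete” and “breach compensation” clauses.\n\n### Without PageIndex\n- Traditional vector retrieval relies only on semantic similarity and often returns clauses that share vocabulary but are logically unrelated, creating a risk of legal misjudgment.\n- The document is forcibly split into fixed-length chunks, so key context that spans pages is broken and the overall intent is hard to follow.\n- Retrieval results lack explainability, so the analyst must verify them page by page, which is slow and makes it easy to miss subtle cross-references.\n- The system depends on building and maintaining a complex vector database, which adds technical overhead and operating cost.\n\n### With PageIndex\n- A table-of-contents-style tree index is built automatically and preserves the full section hierarchy, avoiding the information loss caused by chunking.\n- LLM reasoning drives the search over the tree, identifying causal, conditional, and associative relationships between clauses.\n- The output follows an analysis path that matches how a human expert thinks and presents the evidence for each conclusion directly, reducing manual review time.\n- No vector database needs to be deployed, which simplifies the architecture while improving accuracy on long documents.\n\nBy simulating the reasoning-driven navigation of a human expert, PageIndex makes analyzing complex professional documents as direct as reading a table of contents.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FVectifyAI_PageIndex_a8b10184.png","VectifyAI","Vectify AI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FVectifyAI_1465ff67.png","Building next-gen Vectorless, Reasoning-based 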
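RAG",null,"mingtian@vectify.ai","PageIndexAI","https:\u002F\u002Fvectify.ai","https:\u002F\u002Fgithub.com\u002FVectifyAI",[85],{"name":86,"color":87,"percentage":88},"Python","#3572A5",100,24083,2006,"2026-04-05T10:48:35","MIT","Not specified",{"notes":95,"python":93,"dependencies":96},"Requires an LLM API key (such as OpenAI); models can be switched via LiteLLM. The core function converts PDF\u002FMarkdown files into a tree-structured index. The example code additionally requires the openai-agents library. Inference relies mainly on external APIs, so basic functionality runs locally without a dedicated GPU.",[97,98],"LiteLLM","openai-agents",[54,51,15,26,13,14],[101,102,103,104,105,106,107,108,109,110,111,112],"agentic-ai","agents","ai","ai-agents","context-engineering","information-retrieval","llm","rag","reasoning","retrieval","retrieval-augmented-generation","vector-database",77,"2026-03-27T02:49:30.150509","2026-04-06T05:16:26.402987",[117,122,126,131,136,140,145],{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},2300,"How do I configure and use a custom OpenAI-compatible API?","The project currently supports multiple LLM providers through LiteLLM, so you can configure it directly with LiteLLM. If you need to adapt to a different API and base URL, see the small code changes in PR #158.","https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fissues\u002F166",{"id":123,"question_zh":124,"answer_zh":125,"source_url":121},2301,"Why does switching models cause crashes or slow processing?","Different models behave differently and may return data in inconsistent formats, which can cause crashes or long processing times. Try a more stable model, or make sure the code handles the return format of the specific model.",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},2302,"How do I run retrieval or queries over the data?","For retrieval, see PR #125, which demonstrates retrieval with an agent such as one built on the OpenAI Agents SDK. The project also plans to adopt LiteLLM to support more LLM providers.","https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fissues\u002F25",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},2303,"What is the quota limit for the trial?","The trial allows up to 10 pages. If you have already used part of the quota (for example, 3 pages), any upload that exceeds the remaining pages cannot be processed. You can check your quota usage in the Dashboard. For more trial usage, fill out the form to request free credits.","https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fissues\u002F8",{"id":137,"question_zh":138,"answer_zh":139,"source_url":135},2304,"Are there limits on the size or language of uploaded files?","There is no strict maximum file size; a file is supported as long as the account has enough page quota. There is no hard language restriction, but the system is optimized mainly for English documents and non-English documents have not been thoroughly tested. You are welcome to try them and share feedback.",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},2305,"Handling API 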
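rate limit errors: how do I resolve them?","This is usually not a PageIndex issue but a rate limit imposed by the underlying model (such as GLM). Try testing with a different model. If the limit comes from the model provider, contact them to have it adjusted or wait for the cooldown period.","https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fissues\u002F178",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},2306,"What name should the environment variable use?","The standard name is usually OPENAI_API_KEY, but pageindex\u002Futils.py also contains a CHATGPT_API_KEY setting. Confirm the correct variable name against the version of the code you are using.","https:\u002F\u002Fgithub.com\u002FVectifyAI\u002FPageIndex\u002Fissues\u002F152",[]]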