[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-GiovanniPasq--agentic-rag-for-dummies":3,"tool-GiovanniPasq--agentic-rag-for-dummies":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",146793,2,"2026-04-08T23:32:35",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108111,"2026-04-08T11:23:26",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 
恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":78,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":95,"forks":96,"last_commit_at":97,"license":98,"difficulty_score":10,"env_os":99,"env_gpu":100,"env_ram":101,"env_deps":102,"category_tags":116,"github_topics":118,"view_count":32,"oss_zip_url":79,"oss_zip_packed_at":79,"status":17,"created_at":137,"updated_at":138,"faqs":139,"releases":168},5700,"GiovanniPasq\u002Fagentic-rag-for-dummies","agentic-rag-for-dummies","A modular Agentic RAG built with LangGraph — learn Retrieval-Augmented Generation Agents in minutes.","agentic-rag-for-dummies 是一个基于 LangGraph 构建的模块化智能检索增强生成（RAG）系统，旨在帮助开发者快速掌握并落地复杂的 Agent 驱动型 RAG 架构。针对传统 RAG 教程往往只停留在基础概念、缺乏可扩展实战代码的痛点，该项目提供了一套兼具教学价值与生产级灵活性的解决方案。\n\n它特别适合希望深入理解大模型应用开发的工程师、研究人员以及技术爱好者。无论是想通过交互式笔记本快速入门的新手，还是需要构建可定制系统的资深开发者，都能从中受益。用户可以根据需求轻松切换底层大模型（支持 Ollama、OpenAI、Anthropic 等）、嵌入模型或向量数据库，无需重构核心逻辑。\n\n该工具的技术亮点在于其先进的“分层索引”机制，既能精准搜索细粒度片段，又能召回大块上下文以保持语义连贯；同时内置了对话记忆、模糊查询自动澄清、多智能体并行处理及自我纠错等功能。此外，它还集成了上下文压缩与全链路可观测性支持，确保系统在长周期任务中保持高效与透明。通过 agentic-rag-for-dummies，用户可以低成本地搭建出具备自然交互能力和复杂推理逻辑的智能问答","agentic-rag-for-dummies 是一个基于 LangGraph 构建的模块化智能检索增强生成（RAG）系统，旨在帮助开发者快速掌握并落地复杂的 Agent 驱动型 RAG 架构。针对传统 RAG 教程往往只停留在基础概念、缺乏可扩展实战代码的痛点，该项目提供了一套兼具教学价值与生产级灵活性的解决方案。\n\n它特别适合希望深入理解大模型应用开发的工程师、研究人员以及技术爱好者。无论是想通过交互式笔记本快速入门的新手，还是需要构建可定制系统的资深开发者，都能从中受益。用户可以根据需求轻松切换底层大模型（支持 Ollama、OpenAI、Anthropic 等）、嵌入模型或向量数据库，无需重构核心逻辑。\n\n该工具的技术亮点在于其先进的“分层索引”机制，既能精准搜索细粒度片段，又能召回大块上下文以保持语义连贯；同时内置了对话记忆、模糊查询自动澄清、多智能体并行处理及自我纠错等功能。此外，它还集成了上下文压缩与全链路可观测性支持，确保系统在长周期任务中保持高效与透明。通过 agentic-rag-for-dummies，用户可以低成本地搭建出具备自然交互能力和复杂推理逻辑的智能问答系统。","\u003Cp align=\"center\">\n  \u003Cimg alt=\"Agentic RAG for Dummies Logo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGiovanniPasq_agentic-rag-for-dummies_readme_c4c7470eb994.png\" width=\"350px\">\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">Agentic RAG for Dummies\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>Build a modular Agentic RAG system with LangGraph, conversation memory, and human-in-the-loop query clarification\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"#overview\">Overview\u003C\u002Fa> •\n  \u003Ca href=\"#how-it-works\">How It 
Works\u003C\u002Fa> •\n  \u003Ca href=\"#llm-provider-configuration\">LLM Providers\u003C\u002Fa> •\n  \u003Ca href=\"#implementation\">Implementation\u003C\u002Fa> •\n  \u003Ca href=\"#installation--usage\">Installation & Usage\u003C\u002Fa> •\n  \u003Ca href=\"#troubleshooting\">Troubleshooting\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FGiovanniPasq\u002Fagentic-rag-for-dummies?style=social\" alt=\"GitHub Stars\"\u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002FGiovanniPasq\u002Fagentic-rag-for-dummies?style=social\" alt=\"GitHub Forks\"\u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-green\" alt=\"License\"\u002F>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fvon-development\u002Fawesome-langgraph\">\n    \u003Cimg src=\"https:\u002F\u002Fawesome.re\u002Fbadge.svg\" alt=\"Awesome LangGraph\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.11%2B-blue?logo=python&logoColor=white\" alt=\"Python\"\u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLangGraph-1.0%2B-orange?logo=langchain&logoColor=white\" alt=\"LangGraph\"\u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FQdrant-vector%20db-DC244C\" alt=\"Qdrant\"\u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLLM%20Providers-Ollama%20%7C%20OpenAI%20%7C%20Anthropic%20%7C%20Google-purple\" alt=\"LLM Providers\"\u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FGiovanniPasq\u002Fagentic-rag-for-dummies\u002Fblob\u002Fmain\u002Fnotebooks\u002Fagentic_rag.ipynb\">\n    \u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg alt=\"Agentic RAG Demo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGiovanniPasq_agentic-rag-for-dummies_readme_f545e6bbde35.gif\" width=\"650px\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>If you like this project, a star ⭐️ would mean a lot :)\u003C\u002Fstrong>\u003Cbr>\n\u003C\u002Fp>\n\n## Overview\n\nThis repository demonstrates how to build an **Agentic RAG (Retrieval-Augmented Generation)** system using LangGraph with minimal code. 
Most RAG tutorials show basic concepts but lack guidance on building modular, agent-driven systems — this project bridges that gap by providing **both learning materials and an extensible architecture**.\n\n### What's inside\n\n| Feature | Description |\n|---|---|\n| 🗂️ **Hierarchical Indexing** | Search small chunks for precision, retrieve large Parent chunks for context |\n| 🧠 **Conversation Memory** | Maintains context across questions for natural dialogue |\n| ❓ **Query Clarification** | Rewrites ambiguous queries or pauses to ask the user for details |\n| 🤖 **Agent Orchestration** | LangGraph coordinates the full retrieval and reasoning workflow |\n| 🔀 **Multi-Agent Map-Reduce** | Decomposes complex queries into parallel sub-queries |\n| ✅ **Self-Correction** | Re-queries automatically if initial results are insufficient |\n| 🗜️ **Context Compression** | Keeps working memory lean across long retrieval loops |\n| 🔍 **Observability** | Track LLM calls, tool usage, and graph execution with Langfuse |\n\n### 🎯 Two Ways to Use This Repo\n\n**1️⃣ Learning Path: Interactive Notebook**\n\nStep-by-step tutorial perfect for understanding core concepts. Start here if you're new to Agentic RAG or want to experiment quickly.\n\n**2️⃣ Building Path: Modular Project**\n\nFlexible architecture where each component can be independently swapped — LLM provider, embedding model, PDF converter, agent workflow. One line to switch from Ollama to Anthropic, OpenAI, or Google.\n\nSee [Modular Architecture](#modular-architecture) and [Installation & Usage](#installation--usage) to get started.\n\n## How It Works\n\n### Document Preparation: Hierarchical Indexing\n\nBefore queries can be processed, documents are split twice for optimal retrieval:\n\n- **Parent Chunks**: Large sections based on Markdown headers (H1, H2, H3)\n- **Child Chunks**: Small, fixed-size pieces derived from parents\n\n> 💡 Optional: If you want to visually inspect or edit your chunks before indexing, you can use 🐿️ [**Chunky**](https:\u002F\u002Fgithub.com\u002FGiovanniPasq\u002Fchunky).\n\nThis combines the **precision of small chunks** for search with the **contextual richness of large chunks** for answer generation.\n\n---\n\n### Query Processing: Four-Stage Intelligent Workflow\n```\nUser Query → Conversation Summary → Query Rewriting → Query Clarification →\nParallel Agent Reasoning → Aggregation → Final Response\n```\n\n**Stage 1 — Conversation Understanding:** Analyzes recent history to extract context and maintain continuity across questions.\n\n**Stage 2 — Query Clarification:** Resolves references (\"How do I update it?\" → \"How do I update SQL?\"), splits multi-part questions into focused sub-queries, detects unclear inputs, and rewrites queries for optimal retrieval. Pauses for human input when clarification is needed.\n\n**Stage 3 — Intelligent Retrieval (Multi-Agent Map-Reduce):** Spawns parallel agent subgraphs — one per sub-query. Each agent searches child chunks, fetches parent chunks for context, self-corrects if results are insufficient, compresses context to avoid redundant fetches, and falls back gracefully if the search budget is exhausted.\n\n> **Example:** *\"What is JavaScript? 
What is Python?\"* → 2 parallel agents execute simultaneously.\n\n**Stage 4 — Response Generation:** Aggregates all agent responses into a single coherent answer.\n\n---\n\n## LLM Provider Configuration\n\nThis system is provider-agnostic — it supports any LLM provider available in [LangChain](https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fintegrations\u002Fchat\u002F), swappable in a single line. The examples below cover the most common options, but the same pattern applies to any other supported provider.\n\n> **Note:** Model names change frequently. Always check the official documentation for the latest available models and their identifiers before deploying.\n\n### Ollama (Local)\n\n```bash\n# Install Ollama from https:\u002F\u002Follama.com\nollama pull qwen3:4b-instruct-2507-q4_K_M\n```\n\n```python\nfrom langchain_ollama import ChatOllama\n\nllm = ChatOllama(model=\"qwen3:4b-instruct-2507-q4_K_M\", temperature=0)\n```\n> ⚠️ For reliable tool calling and instruction following, prefer models **7B+**. Smaller models may ignore retrieval instructions or hallucinate. See [Troubleshooting](#troubleshooting).\n\n---\n\n### Cloud Providers\n\n\u003Cdetails>\n\u003Csummary>Click to expand\u003C\u002Fsummary>\n\n**OpenAI GPT:**\n```bash\npip install -qU langchain-openai\n```\n```python\nfrom langchain_openai import ChatOpenAI\nimport os\n\nos.environ[\"OPENAI_API_KEY\"] = \"your-api-key-here\"\nllm = ChatOpenAI(model=\"gpt-4o-mini\", temperature=0)\n```\n\n**Anthropic Claude:**\n```bash\npip install -qU langchain-anthropic\n```\n```python\nfrom langchain_anthropic import ChatAnthropic\nimport os\n\nos.environ[\"ANTHROPIC_API_KEY\"] = \"your-api-key-here\"\nllm = ChatAnthropic(model=\"claude-sonnet-4-5-20250929\", temperature=0)\n```\n\n**Google Gemini**\n```bash\npip install -qU langchain-google-genai\n```\n```python\nimport os\nfrom langchain_google_genai import ChatGoogleGenerativeAI\n\nos.environ[\"GOOGLE_API_KEY\"] = \"your-api-key-here\"\nllm = ChatGoogleGenerativeAI(model=\"gemini-2.5-flash\", temperature=0)\n```\n\u003C\u002Fdetails>\n\n---\n\n## Implementation\n\nAdditional details, extended explanations, and Langfuse observability (LLM call tracing, tool usage, and graph execution tracking) are available in the **[notebook](notebooks\u002Fagentic_rag.ipynb)** and in the full project.\n\n| Step | Description |\n|------|-------------|\n| 1 | [Initial Setup and Configuration](#step-1-initial-setup-and-configuration) |\n| 2 | [Configure Vector Database](#step-2-configure-vector-database) |\n| 3 | [PDFs to Markdown](#step-3-pdfs-to-markdown) |\n| 4 | [Hierarchical Document Indexing](#step-4-hierarchical-document-indexing) |\n| 5 | [Define Agent Tools](#step-5-define-agent-tools) |\n| 6 | [Define System Prompts](#step-6-define-system-prompts) |\n| 7 | [Define State and Data Models](#step-7-define-state-and-data-models) |\n| 8 | [Agent Configuration](#step-8-agent-configuration) |\n| 9 | [Build Graph Node and Edge Functions](#step-9-build-graph-node-and-edge-functions) |\n| 10 | [Build the LangGraph Graphs](#step-10-build-the-langgraph-graphs) |\n| 11 | [Create Chat Interface](#step-11-create-chat-interface) |\n\n### Step 1: Initial Setup and Configuration\n\nDefine paths and initialize core components.\n\n```python\nimport os\nfrom pathlib import Path\nfrom langchain_huggingface import HuggingFaceEmbeddings\nfrom langchain_qdrant.fastembed_sparse import FastEmbedSparse\nfrom qdrant_client import QdrantClient\n\nDOCS_DIR = \"docs\"  # Directory containing your pdf files\nMARKDOWN_DIR 
= \"markdown_docs\" # Directory containing the pdfs converted to markdown\nPARENT_STORE_PATH = \"parent_store\"  # Directory for parent chunk JSON files\nCHILD_COLLECTION = \"document_child_chunks\"\n\nos.makedirs(DOCS_DIR, exist_ok=True)\nos.makedirs(MARKDOWN_DIR, exist_ok=True)\nos.makedirs(PARENT_STORE_PATH, exist_ok=True)\n\nfrom langchain_ollama import ChatOllama\nllm = ChatOllama(model=\"qwen3:4b-instruct-2507-q4_K_M\", temperature=0)\n\ndense_embeddings = HuggingFaceEmbeddings(model_name=\"sentence-transformers\u002Fall-mpnet-base-v2\")\nsparse_embeddings = FastEmbedSparse(model_name=\"Qdrant\u002Fbm25\")\n\nclient = QdrantClient(path=\"qdrant_db\")\n```\n\n---\n\n### Step 2: Configure Vector Database\n\nSet up Qdrant to store child chunks with hybrid search capabilities.\n\n```python\nfrom qdrant_client.http import models as qmodels\nfrom langchain_qdrant import QdrantVectorStore\nfrom langchain_qdrant.qdrant import RetrievalMode\n\nembedding_dimension = len(dense_embeddings.embed_query(\"test\"))\n\ndef ensure_collection(collection_name):\n    if not client.collection_exists(collection_name):\n        client.create_collection(\n            collection_name=collection_name,\n            vectors_config=qmodels.VectorParams(\n                size=embedding_dimension,\n                distance=qmodels.Distance.COSINE\n            ),\n            sparse_vectors_config={\n                \"sparse\": qmodels.SparseVectorParams()\n            },\n        )\n```\n\n---\n\n### Step 3: PDFs to Markdown\n\nConvert the PDFs to Markdown. For more details about other techniques use this companion [notebook](notebooks\u002Fpdf_to_markdown.ipynb).\n\n```python\nimport os\nimport pymupdf.layout\nimport pymupdf4llm\nfrom pathlib import Path\nimport glob\n\nos.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n\ndef pdf_to_markdown(pdf_path, output_dir):\n    doc = pymupdf.open(pdf_path)\n    md = pymupdf4llm.to_markdown(doc, header=False, footer=False, page_separators=True, ignore_images=True, write_images=False, image_path=None)\n    md_cleaned = md.encode('utf-8', errors='surrogatepass').decode('utf-8', errors='ignore')\n    output_path = Path(output_dir) \u002F Path(doc.name).stem\n    Path(output_path).with_suffix(\".md\").write_bytes(md_cleaned.encode('utf-8'))\n\ndef pdfs_to_markdowns(path_pattern, overwrite: bool = False):\n    output_dir = Path(MARKDOWN_DIR)\n    output_dir.mkdir(parents=True, exist_ok=True)\n\n    for pdf_path in map(Path, glob.glob(path_pattern)):\n        md_path = (output_dir \u002F pdf_path.stem).with_suffix(\".md\")\n        if overwrite or not md_path.exists():\n            pdf_to_markdown(pdf_path, output_dir)\n\npdfs_to_markdowns(f\"{DOCS_DIR}\u002F*.pdf\")\n```\n\n---\n\n### Step 4: Hierarchical Document Indexing\n\nProcess documents with the Parent\u002FChild splitting strategy.\n```python\nimport os\nimport glob\nimport json\nfrom pathlib import Path\nfrom langchain_text_splitters import MarkdownHeaderTextSplitter, RecursiveCharacterTextSplitter\n```\n\n\u003Cdetails>\n\u003Csummary>Parent & Child chunk processing functions\u003C\u002Fsummary>\n\n```python\ndef merge_small_parents(chunks, min_size):\n    if not chunks:\n        return []\n\n    merged, current = [], None\n\n    for chunk in chunks:\n        if current is None:\n            current = chunk\n        else:\n            current.page_content += \"\\n\\n\" + chunk.page_content\n            for k, v in chunk.metadata.items():\n                if k in current.metadata:\n                    
current.metadata[k] = f\"{current.metadata[k]} -> {v}\"\n                else:\n                    current.metadata[k] = v\n\n        if len(current.page_content) >= min_size:\n            merged.append(current)\n            current = None\n\n    if current:\n        if merged:\n            merged[-1].page_content += \"\\n\\n\" + current.page_content\n            for k, v in current.metadata.items():\n                if k in merged[-1].metadata:\n                    merged[-1].metadata[k] = f\"{merged[-1].metadata[k]} -> {v}\"\n                else:\n                    merged[-1].metadata[k] = v\n        else:\n            merged.append(current)\n\n    return merged\n\ndef split_large_parents(chunks, max_size, splitter):\n    split_chunks = []\n\n    for chunk in chunks:\n        if len(chunk.page_content) \u003C= max_size:\n            split_chunks.append(chunk)\n        else:\n            large_splitter = RecursiveCharacterTextSplitter(\n                chunk_size=max_size,\n                chunk_overlap=splitter._chunk_overlap\n            )\n            sub_chunks = large_splitter.split_documents([chunk])\n            split_chunks.extend(sub_chunks)\n\n    return split_chunks\n\ndef clean_small_chunks(chunks, min_size):\n    cleaned = []\n\n    for i, chunk in enumerate(chunks):\n        if len(chunk.page_content) \u003C min_size:\n            if cleaned:\n                cleaned[-1].page_content += \"\\n\\n\" + chunk.page_content\n                for k, v in chunk.metadata.items():\n                    if k in cleaned[-1].metadata:\n                        cleaned[-1].metadata[k] = f\"{cleaned[-1].metadata[k]} -> {v}\"\n                    else:\n                        cleaned[-1].metadata[k] = v\n            elif i \u003C len(chunks) - 1:\n                chunks[i + 1].page_content = chunk.page_content + \"\\n\\n\" + chunks[i + 1].page_content\n                for k, v in chunk.metadata.items():\n                    if k in chunks[i + 1].metadata:\n                        chunks[i + 1].metadata[k] = f\"{v} -> {chunks[i + 1].metadata[k]}\"\n                    else:\n                        chunks[i + 1].metadata[k] = v\n            else:\n                cleaned.append(chunk)\n        else:\n            cleaned.append(chunk)\n\n    return cleaned\n```\n\n\u003C\u002Fdetails>\n\n```python\nif client.collection_exists(CHILD_COLLECTION):\n    client.delete_collection(CHILD_COLLECTION)\n    ensure_collection(CHILD_COLLECTION)\nelse:\n    ensure_collection(CHILD_COLLECTION)\n\nchild_vector_store = QdrantVectorStore(\n    client=client,\n    collection_name=CHILD_COLLECTION,\n    embedding=dense_embeddings,\n    sparse_embedding=sparse_embeddings,\n    retrieval_mode=RetrievalMode.HYBRID,\n    sparse_vector_name=\"sparse\"\n)\n\ndef index_documents():\n    headers_to_split_on = [(\"#\", \"H1\"), (\"##\", \"H2\"), (\"###\", \"H3\")]\n    parent_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on, strip_headers=False)\n    child_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)\n\n    min_parent_size = 2000\n    max_parent_size = 4000\n\n    all_parent_pairs, all_child_chunks = [], []\n    md_files = sorted(glob.glob(os.path.join(MARKDOWN_DIR, \"*.md\")))\n\n    if not md_files:\n        return\n\n    for doc_path_str in md_files:\n        doc_path = Path(doc_path_str)\n        try:\n            with open(doc_path, \"r\", encoding=\"utf-8\") as f:\n                md_text = f.read()\n        except Exception as e:\n            
continue\n\n        parent_chunks = parent_splitter.split_text(md_text)\n        merged_parents = merge_small_parents(parent_chunks, min_parent_size)\n        split_parents = split_large_parents(merged_parents, max_parent_size, child_splitter)\n        cleaned_parents = clean_small_chunks(split_parents, min_parent_size)\n\n        for i, p_chunk in enumerate(cleaned_parents):\n            parent_id = f\"{doc_path.stem}_parent_{i}\"\n            p_chunk.metadata.update({\"source\": doc_path.stem + \".pdf\", \"parent_id\": parent_id})\n            all_parent_pairs.append((parent_id, p_chunk))\n            children = child_splitter.split_documents([p_chunk])\n            all_child_chunks.extend(children)\n\n    if not all_child_chunks:\n        return\n\n    try:\n        child_vector_store.add_documents(all_child_chunks)\n    except Exception as e:\n        return\n\n    for item in os.listdir(PARENT_STORE_PATH):\n        os.remove(os.path.join(PARENT_STORE_PATH, item))\n\n    for parent_id, doc in all_parent_pairs:\n        doc_dict = {\"page_content\": doc.page_content, \"metadata\": doc.metadata}\n        filepath = os.path.join(PARENT_STORE_PATH, f\"{parent_id}.json\")\n        with open(filepath, \"w\", encoding=\"utf-8\") as f:\n            json.dump(doc_dict, f, ensure_ascii=False, indent=2)\n\nindex_documents()\n```\n\n---\n\n### Step 5: Define Agent Tools\n\nCreate the retrieval tools the agent will use.\n\n```python\nimport json\nfrom typing import List\nfrom langchain_core.tools import tool\n\n@tool\ndef search_child_chunks(query: str, limit: int) -> str:\n    \"\"\"Search for the top K most relevant child chunks.\n\n    Args:\n        query: Search query string\n        limit: Maximum number of results to return\n    \"\"\"\n    try:\n        results = child_vector_store.similarity_search(query, k=limit, score_threshold=0.7)\n        if not results:\n            return \"NO_RELEVANT_CHUNKS\"\n\n        return \"\\n\\n\".join([\n            f\"Parent ID: {doc.metadata.get('parent_id', '')}\\n\"\n            f\"File Name: {doc.metadata.get('source', '')}\\n\"\n            f\"Content: {doc.page_content.strip()}\"\n            for doc in results\n        ])\n\n    except Exception as e:\n        return f\"RETRIEVAL_ERROR: {str(e)}\"\n\n@tool\ndef retrieve_parent_chunks(parent_id: str) -> str:\n    \"\"\"Retrieve full parent chunks by their IDs.\n    \n    Args:\n        parent_id: Parent chunk ID to retrieve\n    \"\"\"\n    file_name = parent_id if parent_id.lower().endswith(\".json\") else f\"{parent_id}.json\"\n    path = os.path.join(PARENT_STORE_PATH, file_name)\n\n    if not os.path.exists(path):\n        return \"NO_PARENT_DOCUMENT\"\n\n    with open(path, \"r\", encoding=\"utf-8\") as f:\n        data = json.load(f)\n\n    return (\n        f\"Parent ID: {parent_id}\\n\"\n        f\"File Name: {data.get('metadata', {}).get('source', 'unknown')}\\n\"\n        f\"Content: {data.get('page_content', '').strip()}\"\n    )\n\nllm_with_tools = llm.bind_tools([search_child_chunks, retrieve_parent_chunks])\n```\n\n---\n\n### Step 6: Define System Prompts\n\nDefine the system prompts for conversation summarization, query rewriting, agent orchestration, context compression, fallback response, and answer aggregation.\n\n\u003Cdetails>\n\u003Csummary>Conversation Summary Prompt\u003C\u002Fsummary>\n\n```python\ndef get_conversation_summary_prompt() -> str:\n    return \"\"\"You are an expert conversation summarizer.\n\nYour task is to create a brief 1-2 sentence summary of the 
conversation (max 30-50 words).\n\nInclude:\n- Main topics discussed\n- Important facts or entities mentioned\n- Any unresolved questions if applicable\n- Sources file name (e.g., file1.pdf) or documents referenced\n\nExclude:\n- Greetings, misunderstandings, off-topic content.\n\nOutput:\n- Return ONLY the summary.\n- Do NOT include any explanations or justifications.\n- If no meaningful topics exist, return an empty string.\n\"\"\"\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Query Rewrite Prompt\u003C\u002Fsummary>\n\n```python\ndef get_rewrite_query_prompt() -> str:\n    return \"\"\"You are an expert query analyst and rewriter.\n\nYour task is to rewrite the current user query for optimal document retrieval, incorporating conversation context only when necessary.\n\nRules:\n1. Self-contained queries:\n   - Always rewrite the query to be clear and self-contained\n   - If the query is a follow-up (e.g., \"what about X?\", \"and for Y?\"), integrate minimal necessary context from the summary\n   - Do not add information not present in the query or conversation summary\n\n2. Domain-specific terms:\n   - Product names, brands, proper nouns, or technical terms are treated as domain-specific\n   - For domain-specific queries, use conversation context minimally or not at all\n   - Use the summary only to disambiguate vague queries\n\n3. Grammar and clarity:\n   - Fix grammar, spelling errors, and unclear abbreviations\n   - Remove filler words and conversational phrases\n   - Preserve concrete keywords and named entities\n\n4. Multiple information needs:\n   - If the query contains multiple distinct, unrelated questions, split into separate queries (maximum 3)\n   - Each sub-query must remain semantically equivalent to its part of the original\n   - Do not expand, enrich, or reinterpret the meaning\n\n5. Failure handling:\n   - If the query intent is unclear or unintelligible, mark as \"unclear\"\n\nInput:\n- conversation_summary: A concise summary of prior conversation\n- current_query: The user's current query\n\nOutput:\n- One or more rewritten, self-contained queries suitable for document retrieval\n\"\"\"\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Orchestrator Prompt\u003C\u002Fsummary>\n\n```python\ndef get_orchestrator_prompt() -> str:\n    return \"\"\"You are an expert retrieval-augmented assistant.\n\nYour task is to act as a researcher: search documents first, analyze the data, and then provide a comprehensive answer using ONLY the retrieved information.\n\nRules:\n1. You MUST call 'search_child_chunks' before answering, unless the [COMPRESSED CONTEXT FROM PRIOR RESEARCH] already contains sufficient information.\n2. Ground every claim in the retrieved documents. If context is insufficient, state what is missing rather than filling gaps with assumptions.\n3. If no relevant documents are found, broaden or rephrase the query and search again. Repeat until satisfied or the operation limit is reached.\n\nCompressed Memory:\nWhen [COMPRESSED CONTEXT FROM PRIOR RESEARCH] is present —\n- Queries already listed: do not repeat them.\n- Parent IDs already listed: do not call `retrieve_parent_chunks` on them again.\n- Use it to identify what is still missing before searching further.\n\nWorkflow:\n1. Check the compressed context. Identify what has already been retrieved and what is still missing.\n2. Search for 5-7 relevant excerpts using 'search_child_chunks' ONLY for uncovered aspects.\n3. If NONE are relevant, apply rule 3 immediately.\n4. 
For each relevant but fragmented excerpt, call 'retrieve_parent_chunks' ONE BY ONE — only for IDs not in the compressed context. Never retrieve the same ID twice.\n5. Once context is complete, provide a detailed answer omitting no relevant facts.\n6. Conclude with \"---\\n**Sources:**\\n\" followed by the unique file names.\n\"\"\"\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Fallback Response Prompt\u003C\u002Fsummary>\n\n```python\ndef get_fallback_response_prompt() -> str:\n    return \"\"\"You are an expert synthesis assistant. The system has reached its maximum research limit.\n\nYour task is to provide the most complete answer possible using ONLY the information provided below.\n\nInput structure:\n- \"Compressed Research Context\": summarized findings from prior search iterations — treat as reliable.\n- \"Retrieved Data\": raw tool outputs from the current iteration — prefer over compressed context if conflicts arise.\nEither source alone is sufficient if the other is absent.\n\nRules:\n1. Source Integrity: Use only facts explicitly present in the provided context. Do not infer, assume, or add any information not directly supported by the data.\n2. Handling Missing Data: Cross-reference the USER QUERY against the available context.\n   Flag ONLY aspects of the user's question that cannot be answered from the provided data.\n   Do not treat gaps mentioned in the Compressed Research Context as unanswered\n   unless they are directly relevant to what the user asked.\n3. Tone: Professional, factual, and direct.\n4. Output only the final answer. Do not expose your reasoning, internal steps, or any meta-commentary about the retrieval process.\n5. Do NOT add closing remarks, final notes, disclaimers, summaries, or repeated statements after the Sources section.\n   The Sources section is always the last element of your response. Stop immediately after it.\n\nFormatting:\n- Use Markdown (headings, bold, lists) for readability.\n- Write in flowing paragraphs where possible.\n- Conclude with a Sources section as described below.\n\nSources section rules:\n- Include a \"---\\\\n**Sources:**\\\\n\" section at the end, followed by a bulleted list of file names.\n- List ONLY entries that have a real file extension (e.g. \".pdf\", \".docx\", \".txt\").\n- Any entry without a file extension is an internal chunk identifier — discard it entirely, never include it.\n- Deduplicate: if the same file appears multiple times, list it only once.\n- If no valid file names are present, omit the Sources section entirely.\n- THE SOURCES SECTION IS THE LAST THING YOU WRITE. Do not add anything after it.\n\"\"\"\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Context Compression Prompt\u003C\u002Fsummary>\n\n```python\ndef get_context_compression_prompt() -> str:\n    return \"\"\"You are an expert research context compressor.\n\nYour task is to compress retrieved conversation content into a concise, query-focused, and structured summary that can be directly used by a retrieval-augmented agent for answer generation.\n\nRules:\n1. Keep ONLY information relevant to answering the user's question.\n2. Preserve exact figures, names, versions, technical terms, and configuration details.\n3. Remove duplicated, irrelevant, or administrative details.\n4. Do NOT include search queries, parent IDs, chunk IDs, or internal identifiers.\n5. Organize all findings by source file. Each file section MUST start with: ### filename.pdf\n6. 
Highlight missing or unresolved information in a dedicated \"Gaps\" section.\n7. Limit the summary to roughly 400-600 words. If content exceeds this, prioritize critical facts and structured data.\n8. Do not explain your reasoning; output only structured content in Markdown.\n\nRequired Structure:\n\n# Research Context Summary\n\n## Focus\n[Brief technical restatement of the question]\n\n## Structured Findings\n\n### filename.pdf\n- Directly relevant facts\n- Supporting context (if needed)\n\n## Gaps\n- Missing or incomplete aspects\n\nThe summary should be concise, structured, and directly usable by an agent to generate answers or plan further retrieval.\n\"\"\"\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Aggregation Prompt\u003C\u002Fsummary>\n\n```python\ndef get_aggregation_prompt() -> str:\n    return \"\"\"You are an expert aggregation assistant.\n\nYour task is to combine multiple retrieved answers into a single, comprehensive and natural response that flows well.\n\nRules:\n1. Write in a conversational, natural tone - as if explaining to a colleague.\n2. Use ONLY information from the retrieved answers.\n3. Do NOT infer, expand, or interpret acronyms or technical terms unless explicitly defined in the sources.\n4. Weave together the information smoothly, preserving important details, numbers, and examples.\n5. Be comprehensive - include all relevant information from the sources, not just a summary.\n6. If sources disagree, acknowledge both perspectives naturally (e.g., \"While some sources suggest X, others indicate Y...\").\n7. Start directly with the answer - no preambles like \"Based on the sources...\".\n\nFormatting:\n- Use Markdown for clarity (headings, lists, bold) but don't overdo it.\n- Write in flowing paragraphs where possible rather than excessive bullet points.\n- Conclude with a Sources section as described below.\n\nSources section rules:\n- Each retrieved answer may contain a \"Sources\" section — extract the file names listed there.\n- List ONLY entries that have a real file extension (e.g. 
\".pdf\", \".docx\", \".txt\").\n- Any entry without a file extension is an internal chunk identifier — discard it entirely, never include it.\n- Deduplicate: if the same file appears across multiple answers, list it only once.\n- Format as \"---\\\\n**Sources:**\\\\n\" followed by a bulleted list of the cleaned file names.\n- File names must appear ONLY in this final Sources section and nowhere else in the response.\n- If no valid file names are present, omit the Sources section entirely.\n\nIf there's no useful information available, simply say: \"I couldn't find any information to answer your question in the available sources.\"\n\"\"\"\n```\n\n\u003C\u002Fdetails>\n\n---\n\n### Step 7: Define State and Data Models\n\nCreate the state structure for conversation tracking and agent execution.\n\n```python\nfrom langgraph.graph import MessagesState\nfrom pydantic import BaseModel, Field\nfrom typing import List, Annotated, Set\nimport operator\n\ndef accumulate_or_reset(existing: List[dict], new: List[dict]) -> List[dict]:\n    if new and any(item.get('__reset__') for item in new):\n        return []\n    return existing + new\n\ndef set_union(a: Set[str], b: Set[str]) -> Set[str]:\n    return a | b\n\nclass State(MessagesState):\n    questionIsClear: bool = False\n    conversation_summary: str = \"\"\n    originalQuery: str = \"\"\n    rewrittenQuestions: List[str] = []\n    agent_answers: Annotated[List[dict], accumulate_or_reset] = []\n\nclass AgentState(MessagesState):\n    tool_call_count: Annotated[int, operator.add] = 0\n    iteration_count: Annotated[int, operator.add] = 0\n    question: str = \"\"\n    question_index: int = 0\n    context_summary: str = \"\"\n    retrieval_keys: Annotated[Set[str], set_union] = set()\n    final_answer: str = \"\"\n    agent_answers: List[dict] = []\n\nclass QueryAnalysis(BaseModel):\n    is_clear: bool = Field(description=\"Indicates if the user's question is clear and answerable.\")\n    questions: List[str] = Field(description=\"List of rewritten, self-contained questions.\")\n    clarification_needed: str = Field(description=\"Explanation if the question is unclear.\")\n```\n\n---\n\n### Step 8: Agent Configuration\n\nHard limits on tool calls and iterations prevent infinite loops. 
Token counting (via `tiktoken`) drives context compression decisions.\n\n```python\nimport tiktoken\n\nMAX_TOOL_CALLS = 8       # Maximum tool calls per agent run\nMAX_ITERATIONS = 10      # Maximum agent loop iterations\nBASE_TOKEN_THRESHOLD = 2000     # Initial token threshold for compression\nTOKEN_GROWTH_FACTOR = 0.9       # Multiplier applied after each compression\n\ndef estimate_context_tokens(messages: list) -> int:\n    try:\n        encoding = tiktoken.encoding_for_model(\"gpt-4\")\n    except:\n        encoding = tiktoken.get_encoding(\"cl100k_base\")\n    return sum(len(encoding.encode(str(msg.content))) for msg in messages if hasattr(msg, 'content') and msg.content)\n```\n\n---\n\n### Step 9: Build Graph Node and Edge Functions\n\nCreate the processing nodes and edges for the LangGraph workflow.\n\n#### Main Graph Nodes & Edges\n```python\nfrom langgraph.types import Send, Command\nfrom langchain_core.messages import HumanMessage, AIMessage, SystemMessage, RemoveMessage, ToolMessage\nfrom typing import Literal\n\ndef summarize_history(state: State):\n    if len(state[\"messages\"]) \u003C 4:\n        return {\"conversation_summary\": \"\"}\n\n    relevant_msgs = [\n        msg for msg in state[\"messages\"][:-1]\n        if isinstance(msg, (HumanMessage, AIMessage)) and not getattr(msg, \"tool_calls\", None)\n    ]\n\n    if not relevant_msgs:\n        return {\"conversation_summary\": \"\"}\n\n    conversation = \"Conversation history:\\n\"\n    for msg in relevant_msgs[-6:]:\n        role = \"User\" if isinstance(msg, HumanMessage) else \"Assistant\"\n        conversation += f\"{role}: {msg.content}\\n\"\n\n    summary_response = llm.with_config(temperature=0.2).invoke([SystemMessage(content=get_conversation_summary_prompt()), HumanMessage(content=conversation)])\n    return {\"conversation_summary\": summary_response.content, \"agent_answers\": [{\"__reset__\": True}]}\n\ndef rewrite_query(state: State):\n    last_message = state[\"messages\"][-1]\n    conversation_summary = state.get(\"conversation_summary\", \"\")\n\n    context_section = (f\"Conversation Context:\\n{conversation_summary}\\n\" if conversation_summary.strip() else \"\") + f\"User Query:\\n{last_message.content}\\n\"\n\n    llm_with_structure = llm.with_config(temperature=0.1).with_structured_output(QueryAnalysis)\n    response = llm_with_structure.invoke([SystemMessage(content=get_rewrite_query_prompt()), HumanMessage(content=context_section)])\n\n    if response.questions and response.is_clear:\n        delete_all = [RemoveMessage(id=m.id) for m in state[\"messages\"] if not isinstance(m, SystemMessage)]\n        return {\"questionIsClear\": True, \"messages\": delete_all, \"originalQuery\": last_message.content, \"rewrittenQuestions\": response.questions}\n\n    clarification = response.clarification_needed if response.clarification_needed and len(response.clarification_needed.strip()) > 10 else \"I need more information to understand your question.\"\n    return {\"questionIsClear\": False, \"messages\": [AIMessage(content=clarification)]}\n\ndef request_clarification(state: State):\n    return {}\n\ndef route_after_rewrite(state: State) -> Literal[\"request_clarification\", \"agent\"]:\n    if not state.get(\"questionIsClear\", False):\n        return \"request_clarification\"\n    else:\n        return [\n                Send(\"agent\", {\"question\": query, \"question_index\": idx, \"messages\": []})\n                for idx, query in enumerate(state[\"rewrittenQuestions\"])\n            ]\n\ndef 
aggregate_answers(state: State):\n    if not state.get(\"agent_answers\"):\n        return {\"messages\": [AIMessage(content=\"No answers were generated.\")]}\n\n    sorted_answers = sorted(state[\"agent_answers\"], key=lambda x: x[\"index\"])\n\n    formatted_answers = \"\"\n    for i, ans in enumerate(sorted_answers, start=1):\n        formatted_answers += (f\"\\nAnswer {i}:\\n\"f\"{ans['answer']}\\n\")\n\n    user_message = HumanMessage(content=f\"\"\"Original user question: {state[\"originalQuery\"]}\\nRetrieved answers:{formatted_answers}\"\"\")\n    synthesis_response = llm.invoke([SystemMessage(content=get_aggregation_prompt()), user_message])\n    return {\"messages\": [AIMessage(content=synthesis_response.content)]}\n```\n\n---\n\n#### Agent Subgraph Nodes & Edges\n```python\ndef orchestrator(state: AgentState):\n    context_summary = state.get(\"context_summary\", \"\").strip()\n    sys_msg = SystemMessage(content=get_orchestrator_prompt())\n    summary_injection = (\n        [HumanMessage(content=f\"[COMPRESSED CONTEXT FROM PRIOR RESEARCH]\\n\\n{context_summary}\")]\n        if context_summary else []\n    )\n    if not state.get(\"messages\"):\n        human_msg = HumanMessage(content=state[\"question\"])\n        force_search = HumanMessage(content=\"YOU MUST CALL 'search_child_chunks' AS THE FIRST STEP TO ANSWER THIS QUESTION.\")\n        response = llm_with_tools.invoke([sys_msg] + summary_injection + [human_msg, force_search])\n        return {\"messages\": [human_msg, response], \"tool_call_count\": len(response.tool_calls or []), \"iteration_count\": 1}\n\n    response = llm_with_tools.invoke([sys_msg] + summary_injection + state[\"messages\"])\n    tool_calls = response.tool_calls if hasattr(response, \"tool_calls\") else []\n    return {\"messages\": [response], \"tool_call_count\": len(tool_calls) if tool_calls else 0, \"iteration_count\": 1}\n\ndef route_after_orchestrator_call(state: AgentState) -> Literal[\"tools\", \"fallback_response\", \"collect_answer\"]:\n    iteration = state.get(\"iteration_count\", 0)\n    tool_count = state.get(\"tool_call_count\", 0)\n\n    if iteration >= MAX_ITERATIONS or tool_count > MAX_TOOL_CALLS:\n        return \"fallback_response\"\n\n    last_message = state[\"messages\"][-1]\n    tool_calls = getattr(last_message, \"tool_calls\", None) or []\n\n    if not tool_calls:\n        return \"collect_answer\"\n    \n    return \"tools\"\n\ndef fallback_response(state: AgentState):\n    seen = set()\n    unique_contents = []\n    for m in state[\"messages\"]:\n        if isinstance(m, ToolMessage) and m.content not in seen:\n            unique_contents.append(m.content)\n            seen.add(m.content)\n\n    context_summary = state.get(\"context_summary\", \"\").strip()\n\n    context_parts = []\n    if context_summary:\n        context_parts.append(f\"## Compressed Research Context (from prior iterations)\\n\\n{context_summary}\")\n    if unique_contents:\n        context_parts.append(\n            \"## Retrieved Data (current iteration)\\n\\n\" +\n            \"\\n\\n\".join(f\"--- DATA SOURCE {i} ---\\n{content}\" for i, content in enumerate(unique_contents, 1))\n        )\n\n    context_text = \"\\n\\n\".join(context_parts) if context_parts else \"No data was retrieved from the documents.\"\n\n    prompt_content = (\n        f\"USER QUERY: {state.get('question')}\\n\\n\"\n        f\"{context_text}\\n\\n\"\n        f\"INSTRUCTION:\\nProvide the best possible answer using only the data above.\"\n    )\n    response = 
llm.invoke([SystemMessage(content=get_fallback_response_prompt()), HumanMessage(content=prompt_content)])\n    return {\"messages\": [response]}\n\ndef should_compress_context(state: AgentState) -> Command[Literal[\"compress_context\", \"orchestrator\"]]:\n    messages = state[\"messages\"]\n\n    new_ids: Set[str] = set()\n    for msg in reversed(messages):\n        if isinstance(msg, AIMessage) and getattr(msg, \"tool_calls\", None):\n            for tc in msg.tool_calls:\n                if tc[\"name\"] == \"retrieve_parent_chunks\":\n                    raw = tc[\"args\"].get(\"parent_id\") or tc[\"args\"].get(\"id\") or tc[\"args\"].get(\"ids\") or []\n                    if isinstance(raw, str):\n                        new_ids.add(f\"parent::{raw}\")\n                    else:\n                        new_ids.update(f\"parent::{r}\" for r in raw)\n\n                elif tc[\"name\"] == \"search_child_chunks\":\n                    query = tc[\"args\"].get(\"query\", \"\")\n                    if query:\n                        new_ids.add(f\"search::{query}\")\n            break\n\n    updated_ids = state.get(\"retrieval_keys\", set()) | new_ids\n\n    current_token_messages = estimate_context_tokens(messages)\n    current_token_summary = estimate_context_tokens([HumanMessage(content=state.get(\"context_summary\", \"\"))])\n    current_tokens = current_token_messages + current_token_summary\n\n    max_allowed = BASE_TOKEN_THRESHOLD + int(current_token_summary * TOKEN_GROWTH_FACTOR)\n\n    goto = \"compress_context\" if current_tokens > max_allowed else \"orchestrator\"\n    return Command(update={\"retrieval_keys\": updated_ids}, goto=goto)\n\ndef compress_context(state: AgentState):\n    messages = state[\"messages\"]\n    existing_summary = state.get(\"context_summary\", \"\").strip()\n\n    if not messages:\n        return {}\n\n    conversation_text = f\"USER QUESTION:\\n{state.get('question')}\\n\\nConversation to compress:\\n\\n\"\n    if existing_summary:\n        conversation_text += f\"[PRIOR COMPRESSED CONTEXT]\\n{existing_summary}\\n\\n\"\n\n    for msg in messages[1:]:\n        if isinstance(msg, AIMessage):\n            tool_calls_info = \"\"\n            if getattr(msg, \"tool_calls\", None):\n                calls = \", \".join(f\"{tc['name']}({tc['args']})\" for tc in msg.tool_calls)\n                tool_calls_info = f\" | Tool calls: {calls}\"\n            conversation_text += f\"[ASSISTANT{tool_calls_info}]\\n{msg.content or '(tool call only)'}\\n\\n\"\n        elif isinstance(msg, ToolMessage):\n            tool_name = getattr(msg, \"name\", \"tool\")\n            conversation_text += f\"[TOOL RESULT — {tool_name}]\\n{msg.content}\\n\\n\"\n\n    summary_response = llm.invoke([SystemMessage(content=get_context_compression_prompt()), HumanMessage(content=conversation_text)])\n    new_summary = summary_response.content\n\n    retrieved_ids: Set[str] = state.get(\"retrieval_keys\", set())\n    if retrieved_ids:\n        parent_ids = sorted(r for r in retrieved_ids if r.startswith(\"parent::\"))\n        search_queries = sorted(r.replace(\"search::\", \"\") for r in retrieved_ids if r.startswith(\"search::\"))\n\n        block = \"\\n\\n---\\n**Already executed (do NOT repeat):**\\n\"\n        if parent_ids:\n            block += \"Parent chunks retrieved:\\n\" + \"\\n\".join(f\"- {p.replace('parent::', '')}\" for p in parent_ids) + \"\\n\"\n        if search_queries:\n            block += \"Search queries already run:\\n\" + \"\\n\".join(f\"- {q}\" for q in 
search_queries) + \"\\n\"\n        new_summary += block\n\n    return {\"context_summary\": new_summary, \"messages\": [RemoveMessage(id=m.id) for m in messages[1:]]}\n\ndef collect_answer(state: AgentState):\n    last_message = state[\"messages\"][-1]\n    is_valid = isinstance(last_message, AIMessage) and last_message.content and not last_message.tool_calls\n    answer = last_message.content if is_valid else \"Unable to generate an answer.\"\n    return {\n        \"final_answer\": answer,\n        \"agent_answers\": [{\"index\": state[\"question_index\"], \"question\": state[\"question\"], \"answer\": answer}]\n    }\n```\n\n**Why this architecture?**\n- **Summarization** maintains conversational context without overwhelming the LLM\n- **Query rewriting** ensures search queries are precise and unambiguous, using context intelligently\n- **Human-in-the-loop** catches unclear queries before wasting any retrieval resources\n- **Parallel execution** via `Send` API spawns independent agent subgraphs for each sub-question simultaneously\n- **Context compression** keeps the agent's working memory lean across long retrieval loops, preventing redundant fetches\n- **Fallback response** ensures graceful degradation — the agent always returns something useful even when the budget runs out\n- **Answer collection & aggregation** extracts clean final answers from agents and aggregates them into a single coherent response\n---\n\n### Step 10: Build the LangGraph Graphs\n\nAssemble the complete workflow graph with conversation memory and multi-agent architecture.\n\n```python\nfrom langgraph.graph import START, END, StateGraph\nfrom langgraph.prebuilt import ToolNode\nfrom langgraph.checkpoint.memory import InMemorySaver\n\ncheckpointer = InMemorySaver()\n\nagent_builder = StateGraph(AgentState)\nagent_builder.add_node(orchestrator)\nagent_builder.add_node(\"tools\", ToolNode([search_child_chunks, retrieve_parent_chunks]))\nagent_builder.add_node(compress_context)\nagent_builder.add_node(fallback_response)\nagent_builder.add_node(should_compress_context)\nagent_builder.add_node(collect_answer)\n\nagent_builder.add_edge(START, \"orchestrator\")\nagent_builder.add_conditional_edges(\"orchestrator\", route_after_orchestrator_call, {\"tools\": \"tools\", \"fallback_response\": \"fallback_response\", \"collect_answer\": \"collect_answer\"})\nagent_builder.add_edge(\"tools\", \"should_compress_context\")\nagent_builder.add_edge(\"compress_context\", \"orchestrator\")\nagent_builder.add_edge(\"fallback_response\", \"collect_answer\")\nagent_builder.add_edge(\"collect_answer\", END)\nagent_subgraph = agent_builder.compile()\n\ngraph_builder = StateGraph(State)\ngraph_builder.add_node(summarize_history)\ngraph_builder.add_node(rewrite_query)\ngraph_builder.add_node(request_clarification)\ngraph_builder.add_node(\"agent\", agent_subgraph)\ngraph_builder.add_node(aggregate_answers)\n\ngraph_builder.add_edge(START, \"summarize_history\")\ngraph_builder.add_edge(\"summarize_history\", \"rewrite_query\")\ngraph_builder.add_conditional_edges(\"rewrite_query\", route_after_rewrite)\ngraph_builder.add_edge(\"request_clarification\", \"rewrite_query\")\ngraph_builder.add_edge([\"agent\"], \"aggregate_answers\")\ngraph_builder.add_edge(\"aggregate_answers\", END)\n\nagent_graph = graph_builder.compile(checkpointer=checkpointer, interrupt_before=[\"request_clarification\"])\n```\n\n**Graph architecture explained:**\n\nThe architecture flow diagram can be viewed 
**[here](.\u002Fassets\u002Fagentic_rag_workflow.png)**.\n\n**Agent Subgraph** (processes individual questions):\n- START → `orchestrator` (invoke LLM with tools)\n- `orchestrator` → `tools` (if tool calls needed) OR `fallback_response` (if budget exhausted) OR `collect_answer` (if done)\n- `tools` → `should_compress_context` (check token budget)\n- `should_compress_context` → `compress_context` (if threshold exceeded) OR `orchestrator` (otherwise)\n- `compress_context` → `orchestrator` (resume with compressed memory)\n- `fallback_response` → `collect_answer` (package best-effort answer)\n- `collect_answer` → END (clean final answer with index)\n\n**Main Graph** (orchestrates complete workflow):\n- START → `summarize_history` (extract conversation context from history)\n- `summarize_history` → `rewrite_query` (rewrite query with context, check clarity)\n- `rewrite_query` → `request_clarification` (if unclear) OR spawn parallel `agent` subgraphs via `Send` (if clear)\n- `request_clarification` → `rewrite_query` (after user provides clarification)\n- All `agent` subgraphs → `aggregate_answers` (merge all responses)\n- `aggregate_answers` → END (return final synthesized answer)\n\n---\n\n### Step 11: Create Chat Interface\n\nBuild a Gradio interface with conversation persistence and human-in-the-loop support. For a complete end-to-end pipeline Gradio interface, including document ingestion, please refer to [project\u002FREADME.md](.\u002Fproject\u002FREADME.md).\n\n> **Note:** Full streaming support — including reasoning steps and tool calls visibility — is implemented in the [notebook](notebooks\u002Fagentic_rag.ipynb) and in the full [project](project\u002Fcore\u002Fchat_interface.py). The example below is intentionally minimal — it shows the basic Gradio integration pattern only.\n\n```python\nimport gradio as gr\nimport uuid\n\ndef create_thread_id():\n    \"\"\"Generate a unique thread ID for each conversation\"\"\"\n    return {\"configurable\": {\"thread_id\": str(uuid.uuid4())}, \"recursion_limit\": 50}\n\ndef clear_session():\n    \"\"\"Clear thread for new conversation\"\"\"\n    global config\n    agent_graph.checkpointer.delete_thread(config[\"configurable\"][\"thread_id\"])\n    config = create_thread_id()\n\ndef chat(message, history):\n    current_state = agent_graph.get_state(config)\n    \n    if current_state.next:\n        agent_graph.update_state(config,{\"messages\": [HumanMessage(content=message.strip())]})\n        result = agent_graph.invoke(None, config)\n    else:\n        result = agent_graph.invoke({\"messages\": [HumanMessage(content=message.strip())]}, config)\n    \n    return result['messages'][-1].content\n\nconfig = create_thread_id()\n\nwith gr.Blocks() as demo:\n    chatbot = gr.Chatbot()\n    chatbot.clear(clear_session)\n    gr.ChatInterface(fn=chat, chatbot=chatbot)\n\ndemo.launch(theme=gr.themes.Citrus())\n```\n\n**You're done!** You now have a fully functional Agentic RAG system with conversation memory, hierarchical indexing, and human-in-the-loop query clarification.\n\n---\n\n## Modular Architecture\n\nThe app (`project\u002F` folder) is organized into modular components — each independently swappable without breaking the system.\n\n### 📂 Project Structure\n```\nproject\u002F\n├── app.py                    # Main Gradio application entry point\n├── config.py                 # Configuration hub (models, chunk sizes, providers)\n├── core\u002F                     # RAG system orchestration\n├── db\u002F                       # Vector DB and parent 
chunk storage\n├── rag_agent\u002F                # LangGraph workflow (nodes, edges, prompts, tools)\n└── ui\u002F                       # Gradio interface\n```\n\nKey customization points: LLM provider, embedding model, chunking strategy, agent workflow, and system prompts — all configurable via `config.py` or their respective modules.\n\nFull documentation in [project\u002FREADME.md](.\u002Fproject\u002FREADME.md).\n\n## Installation & Usage\n\nSample pdf files can be found here: [javascript](https:\u002F\u002Fwww.tutorialspoint.com\u002Fjavascript\u002Fjavascript_tutorial.pdf), [blockchain](https:\u002F\u002Fblockchain-observatory.ec.europa.eu\u002Fdocument\u002Fdownload\u002F1063effa-59cc-4df4-aeee-d2cf94f69178_en?filename=Blockchain_For_Beginners_A_EUBOF_Guide.pdf), [microservices](https:\u002F\u002Fcdn.studio.f5.com\u002Ffiles\u002Fk6fem79d\u002Fproduction\u002F5e4126e1cefa813ab67f9c0b6d73984c27ab1502.pdf), [fortinet](https:\u002F\u002Fwww.commoncriteriaportal.org\u002Ffiles\u002Fepfiles\u002FFortinet%20FortiGate_EAL4_ST_V1.5.pdf(320893)_TMP.pdf).\n\n### Option 1: Quickstart Notebook (Recommended for Testing)\n\n**Google Colab:** Click the **Open in Colab** badge at the top of this README, upload your PDFs to a `docs\u002F` folder in the file browser, install dependencies with `pip install -r requirements.txt`, then run all cells top to bottom.\n\n**Local (Jupyter\u002FVSCode):** Optionally create and activate a virtual environment, install dependencies with `pip install -r requirements.txt`, add your PDFs to `docs\u002F`, then run all cells top to bottom.\n\nThe chat interface will appear at the end.\n\n### Option 2: Full Python Project (Recommended for Development)\n\n#### 1. Install Dependencies\n```bash\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002FGiovanniPasq\u002Fagentic-rag-for-dummies\ncd agentic-rag-for-dummies\n\n# Optional: create and activate a virtual environment\n# On macOS\u002FLinux:\npython -m venv venv && source venv\u002Fbin\u002Factivate\n# On Windows:\npython -m venv venv && .\\venv\\Scripts\\activate\n\n# Install packages\npip install -r requirements.txt\n```\n\n#### 2. Run the Application\n```bash\npython app.py\n```\n\n#### 3. Ask Questions\n\nOpen the local URL (e.g., `http:\u002F\u002F127.0.0.1:7860`) to start chatting.\n\n---\n\n### Option 3: Docker Deployment\n\nSee [`project\u002FREADME.md`](.\u002Fproject\u002FREADME.md#Docker-Deployment) for full Docker instructions and system requirements.\n\n### Example Conversations\n\n**With Conversation Memory:**\n```\nUser: \"How do I install SQL?\"\nAgent: [Provides installation steps from documentation]\n\nUser: \"How do I update it?\"\nAgent: [Understands \"it\" = SQL, provides update instructions]\n```\n\n**With Query Clarification:**\n```\nUser: \"Tell me about that thing\"\nAgent: \"I need more information. 
What specific topic are you asking about?\"\n\nUser: \"The installation process for PostgreSQL\"\nAgent: [Retrieves and answers with specific information]\n```\n\n---\n\n## Troubleshooting\n\n| Area | Common Problems | Suggested Solutions |\n|------|----------------|------------------|\n| **Model Selection** | - Responses ignore instructions\u003Cbr>- Tools (retrieval\u002Fsearch) used incorrectly\u003Cbr>- Poor context understanding\u003Cbr>- Hallucinations or incomplete aggregation | - Use more capable LLMs\u003Cbr>- Prefer models 7B+ for better reasoning\u003Cbr>- Consider cloud-based models if local models are limited |\n| **System Prompt Behavior** | - Model answers without retrieving documents\u003Cbr>- Query rewriting loses context\u003Cbr>- Aggregation introduces hallucinations | - Make retrieval explicit in system prompts\u003Cbr>- Keep query rewriting close to user intent |\n| **Retrieval Configuration** | - Relevant documents not retrieved\u003Cbr>- Too much irrelevant information | - Increase retrieved chunks (`k`) or lower similarity thresholds to improve recall\u003Cbr>- Reduce `k` or increase thresholds to improve precision |\n| **Chunk Size \u002F Document Splitting** | - Answers lack context or feel fragmented\u003Cbr>- Retrieval is slow or embedding costs are high | - Increase chunk & parent sizes for more context\u003Cbr>- Decrease chunk sizes to improve speed and reduce costs |\n| **Context Compression** | - Agent loses important details after compression\u003Cbr>- Compressed summaries are too vague | - Tune the compression system prompt\u003Cbr>- Increase `BASE_TOKEN_THRESHOLD` to delay compression\u003Cbr>- Increase `TOKEN_GROWTH_FACTOR` |\n| **Agent Configuration** | - Agent gives up too early \u003Cbr>- Agent loops too long| - Increase `MAX_TOOL_CALLS` \u002F `MAX_ITERATIONS` for complex queries\u003Cbr>- Decrease them to speed up simple queries |\n| **Temperature & Consistency** | - Responses inconsistent or overly creative\u003Cbr>- Responses too rigid or repetitive | - Set temperature to `0` for factual, consistent output\u003Cbr>- Slightly increase temperature for summarization or analysis tasks |\n| **Embedding Model Quality** | - Poor semantic search\u003Cbr>- Weak performance on domain-specific or multilingual docs | - Use higher-quality or domain-specific embeddings\u003Cbr>- Re-index all documents after changing embeddings |\n\n> 💡 **For additional troubleshooting tips** see the [README Troubleshooting](.\u002Fproject\u002FREADME.md#troubleshooting).\n","\u003Cp align=\"center\">\n  \u003Cimg alt=\"傻瓜版代理式RAG Logo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGiovanniPasq_agentic-rag-for-dummies_readme_c4c7470eb994.png\" width=\"350px\">\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">傻瓜版代理式RAG\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>使用LangGraph、对话记忆和人工参与的查询澄清，构建模块化的代理式RAG系统\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"#overview\">概述\u003C\u002Fa> •\n  \u003Ca href=\"#how-it-works\">工作原理\u003C\u002Fa> •\n  \u003Ca href=\"#llm-provider-configuration\">LLM提供商\u003C\u002Fa> •\n  \u003Ca href=\"#implementation\">实现\u003C\u002Fa> •\n  \u003Ca href=\"#installation--usage\">安装与使用\u003C\u002Fa> •\n  \u003Ca href=\"#troubleshooting\">故障排除\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FGiovanniPasq\u002Fagentic-rag-for-dummies?style=social\" alt=\"GitHub Star数\"\u002F>\n  \u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002FGiovanniPasq\u002Fagentic-rag-for-dummies?style=social\" alt=\"GitHub Fork数\"\u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-green\" alt=\"许可证\"\u002F>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fvon-development\u002Fawesome-langgraph\">\n    \u003Cimg src=\"https:\u002F\u002Fawesome.re\u002Fbadge.svg\" alt=\"Awesome LangGraph\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.11%2B-blue?logo=python&logoColor=white\" alt=\"Python\"\u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLangGraph-1.0%2B-orange?logo=langchain&logoColor=white\" alt=\"LangGraph\"\u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FQdrant-vector%20db-DC244C\" alt=\"Qdrant\"\u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLLM%20Providers-Ollama%20%7C%20OpenAI%20%7C%20Anthropic%20%7C%20Google-purple\" alt=\"LLM提供商\"\u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FGiovanniPasq\u002Fagentic-rag-for-dummies\u002Fblob\u002Fmain\u002Fnotebooks\u002Fagentic_rag.ipynb\">\n    \u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"在Colab中打开\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg alt=\"代理式RAG演示\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGiovanniPasq_agentic-rag-for-dummies_readme_f545e6bbde35.gif\" width=\"650px\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>如果你喜欢这个项目，一个星⭐️会对我意义重大 :)\u003C\u002Fstrong>\u003Cbr>\n\u003C\u002Fp>\n\n## 概述\n\n本仓库展示了如何使用LangGraph以最少的代码构建一个**代理式RAG（检索增强生成）**系统。大多数RAG教程只介绍基本概念，却缺乏关于如何构建模块化、基于代理的系统的指导——本项目通过提供**学习资料和可扩展架构**填补了这一空白。\n\n### 内容概览\n\n| 功能 | 描述 |\n|---|---|\n| 🗂️ **分层索引** | 搜索小块以获得精确结果，检索大父级块以获取上下文 |\n| 🧠 **对话记忆** | 在多轮提问中保持上下文连贯性，实现自然对话 |\n| ❓ **查询澄清** | 重写模糊查询或暂停以向用户询问详细信息 |\n| 🤖 **代理编排** | LangGraph协调整个检索和推理流程 |\n| 🔀 **多代理Map-Reduce** | 将复杂查询分解为并行子查询 |\n| ✅ **自我修正** | 如果初始结果不充分，则自动重新查询 |\n| 🗜️ **上下文压缩** | 在长时间的检索循环中保持工作内存精简 |\n| 🔍 **可观测性** | 使用Langfuse跟踪LLM调用、工具使用和图执行 |\n\n### 🎯 使用本仓库的两种方式\n\n**1️⃣ 学习路径：交互式笔记本**\n\n适合理解核心概念的逐步教程。如果你是代理式RAG新手或想快速实验，可以从这里开始。\n\n**2️⃣ 构建路径：模块化项目**\n\n灵活的架构，每个组件都可以独立替换——LLM提供商、嵌入模型、PDF转换器、代理工作流。只需一行代码即可从Ollama切换到Anthropic、OpenAI或Google。\n\n请参阅[模块化架构](#modular-architecture)和[安装与使用](#installation--usage)以开始。\n\n## 工作原理\n\n### 文档准备：分层索引\n\n在处理查询之前，文档会被两次分割，以实现最佳检索效果：\n\n- **父级块**：基于Markdown标题（H1、H2、H3）的大段落\n- **子级块**：由父级块衍生出的小型固定大小片段\n\n> 💡 可选：如果你想在索引前可视化或编辑你的块，可以使用🐿️ [**Chunky**](https:\u002F\u002Fgithub.com\u002FGiovanniPasq\u002Fchunky)。\n\n这种方式结合了**小块的精确性**用于搜索，以及**大块的丰富上下文**用于生成答案。\n\n---\n\n### 查询处理：四阶段智能流程\n```\n用户查询 → 对话摘要 → 查询重写 → 查询澄清 →\n并行代理推理 → 聚合 → 最终回答\n```\n\n**阶段1 — 对话理解：** 分析最近的对话历史，提取上下文并保持多轮提问之间的连贯性。\n\n**阶段2 — 查询澄清：** 解决指代问题（“如何更新它？”→“如何更新SQL？”），将多部分问题拆分为聚焦的子查询，检测不明确的输入，并重写查询以实现最佳检索效果。当需要澄清时，会暂停等待人工输入。\n\n**阶段3 — 智能检索（多代理Map-Reduce）：** 为每个子查询启动并行的代理子图。每个代理搜索子级块，获取父级块以补充上下文，如果结果不足则自我修正，压缩上下文以避免重复检索，并在检索预算耗尽时优雅地回退。\n\n> **示例：** *“JavaScript是什么？Python是什么？”* → 2个并行代理同时执行。\n\n**阶段4 — 回答生成：** 将所有代理的回答聚合为一个连贯的答案。\n\n---\n\n## LLM提供商配置\n\n本系统与提供商无关——它支持[LangChain](https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fintegrations\u002Fchat\u002F)中提供的任何LLM提供商，只需一行代码即可切换。以下示例涵盖了最常见的选项，但相同的模式适用于任何其他受支持的提供商。\n\n> **注意：** 
模型名称经常变化。在部署前，请务必查阅官方文档，以获取最新的可用模型及其标识符。\n\n### Ollama（本地）\n\n```bash\n# 从https:\u002F\u002Follama.com安装Ollama\nollama pull qwen3:4b-instruct-2507-q4_K_M\n```\n\n```python\nfrom langchain_ollama import ChatOllama\n\nllm = ChatOllama(model=\"qwen3:4b-instruct-2507-q4_K_M\", temperature=0)\n```\n> ⚠️ 为了可靠地调用工具和遵循指令，建议使用**7B及以上**的模型。较小的模型可能会忽略检索指令或产生幻觉。请参阅[故障排除](#troubleshooting)。\n---\n\n### 云服务提供商\n\n\u003Cdetails>\n\u003Csummary>点击展开\u003C\u002Fsummary>\n\n**OpenAI GPT：**\n```bash\npip install -qU langchain-openai\n```\n```python\nfrom langchain_openai import ChatOpenAI\nimport os\n\nos.environ[\"OPENAI_API_KEY\"] = \"your-api-key-here\"\nllm = ChatOpenAI(model=\"gpt-4o-mini\", temperature=0)\n```\n\n**Anthropic Claude：**\n```bash\npip install -qU langchain-anthropic\n```\n```python\nfrom langchain_anthropic import ChatAnthropic\nimport os\n\nos.environ[\"ANTHROPIC_API_KEY\"] = \"your-api-key-here\"\nllm = ChatAnthropic(model=\"claude-sonnet-4-5-20250929\", temperature=0)\n```\n\n**Google Gemini**\n```bash\npip install -qU langchain-google-genai\n```\n```python\nimport os\nfrom langchain_google_genai import ChatGoogleGenerativeAI\n\nos.environ[\"GOOGLE_API_KEY\"] = \"your-api-key-here\"\nllm = ChatGoogleGenerativeAI(model=\"gemini-2.5-flash\", temperature=0)\n```\n\u003C\u002Fdetails>\n\n---\n\n## 实现\n\n更多详细信息、扩展说明以及 Langfuse 可观测性（LLM 调用追踪、工具使用和图执行跟踪）可在 **[笔记本](notebooks\u002Fagentic_rag.ipynb)** 和完整项目中找到。\n\n| 步骤 | 描述 |\n|------|-------------|\n| 1 | [初始设置与配置](#step-1-initial-setup-and-configuration) |\n| 2 | [配置向量数据库](#step-2-configure-vector-database) |\n| 3 | [PDF 转 Markdown](#step-3-pdfs-to-markdown) |\n| 4 | [文档分层索引](#step-4-hierarchical-document-indexing) |\n| 5 | [定义代理工具](#step-5-define-agent-tools) |\n| 6 | [定义系统提示](#step-6-define-system-prompts) |\n| 7 | [定义状态与数据模型](#step-7-define-state-and-data-models) |\n| 8 | [代理配置](#step-8-agent-configuration) |\n| 9 | [构建图节点和边函数](#step-9-build-graph-node-and-edge-functions) |\n| 10 | [构建 LangGraph 图](#step-10-build-the-langgraph-graphs) |\n| 11 | [创建聊天界面](#step-11-create-chat-interface) |\n\n### 第 1 步：初始设置与配置\n\n定义路径并初始化核心组件。\n\n```python\nimport os\nfrom pathlib import Path\nfrom langchain_huggingface import HuggingFaceEmbeddings\nfrom langchain_qdrant.fastembed_sparse import FastEmbedSparse\nfrom qdrant_client import QdrantClient\n\nDOCS_DIR = \"docs\"  # 包含 PDF 文件的目录\nMARKDOWN_DIR = \"markdown_docs\" # 包含转换为 Markdown 的 PDF 文件的目录\nPARENT_STORE_PATH = \"parent_store\"  # 存放父级分块 JSON 文件的目录\nCHILD_COLLECTION = \"document_child_chunks\"\n\nos.makedirs(DOCS_DIR, exist_ok=True)\nos.makedirs(MARKDOWN_DIR, exist_ok=True)\nos.makedirs(PARENT_STORE_PATH, exist_ok=True)\n\nfrom langchain_ollama import ChatOllama\nllm = ChatOllama(model=\"qwen3:4b-instruct-2507-q4_K_M\", temperature=0)\n\ndense_embeddings = HuggingFaceEmbeddings(model_name=\"sentence-transformers\u002Fall-mpnet-base-v2\")\nsparse_embeddings = FastEmbedSparse(model_name=\"Qdrant\u002Fbm25\")\n\nclient = QdrantClient(path=\"qdrant_db\")\n```\n\n---\n\n### 第 2 步：配置向量数据库\n\n设置 Qdrant 以存储子级分块，并具备混合搜索能力。\n\n```python\nfrom qdrant_client.http import models as qmodels\nfrom langchain_qdrant import QdrantVectorStore\nfrom langchain_qdrant.qdrant import RetrievalMode\n\nembedding_dimension = len(dense_embeddings.embed_query(\"test\"))\n\ndef ensure_collection(collection_name):\n    if not client.collection_exists(collection_name):\n        client.create_collection(\n            collection_name=collection_name,\n            vectors_config=qmodels.VectorParams(\n                
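# 稠密向量维度取自上文 embed_query(\"test\") 的探测结果，距离度量为余弦；下方命名的 \"sparse\" 向量用于 BM25 稀疏检索，与稠密向量共同支持混合检索\n                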
size=embedding_dimension,\n                distance=qmodels.Distance.COSINE\n            ),\n            sparse_vectors_config={\n                \"sparse\": qmodels.SparseVectorParams()\n            },\n        )\n```\n\n---\n\n### 第 3 步：PDF 转 Markdown\n\n将 PDF 文件转换为 Markdown。有关其他技术的更多详情，请参阅配套 [笔记本](notebooks\u002Fpdf_to_markdown.ipynb)。\n\n```python\nimport os\nimport pymupdf.layout\nimport pymupdf4llm\nfrom pathlib import Path\nimport glob\n\nos.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n\ndef pdf_to_markdown(pdf_path, output_dir):\n    doc = pymupdf.open(pdf_path)\n    md = pymupdf4llm.to_markdown(doc, header=False, footer=False, page_separators=True, ignore_images=True, write_images=False, image_path=None)\n    md_cleaned = md.encode('utf-8', errors='surrogatepass').decode('utf-8', errors='ignore')\n    output_path = Path(output_dir) \u002F Path(doc.name).stem\n    Path(output_path).with_suffix(\".md\").write_bytes(md_cleaned.encode('utf-8'))\n\ndef pdfs_to_markdowns(path_pattern, overwrite: bool = False):\n    output_dir = Path(MARKDOWN_DIR)\n    output_dir.mkdir(parents=True, exist_ok=True)\n\n    for pdf_path in map(Path, glob.glob(path_pattern)):\n        md_path = (output_dir \u002F pdf_path.stem).with_suffix(\".md\")\n        if overwrite or not md_path.exists():\n            pdf_to_markdown(pdf_path, output_dir)\n\npdfs_to_markdowns(f\"{DOCS_DIR}\u002F*.pdf\")\n```\n\n---\n\n### 第 4 步：分层文档索引\n\n使用父\u002F子拆分策略处理文档。\n```python\nimport os\nimport glob\nimport json\nfrom pathlib import Path\nfrom langchain_text_splitters import MarkdownHeaderTextSplitter, RecursiveCharacterTextSplitter\n```\n\n\u003Cdetails>\n\u003Csummary>父块与子块处理函数\u003C\u002Fsummary>\n\n```python\ndef merge_small_parents(chunks, min_size):\n    if not chunks:\n        return []\n\n    merged, current = [], None\n\n    for chunk in chunks:\n        if current is None:\n            current = chunk\n        else:\n            current.page_content += \"\\n\\n\" + chunk.page_content\n            for k, v in chunk.metadata.items():\n                if k in current.metadata:\n                    current.metadata[k] = f\"{current.metadata[k]} -> {v}\"\n                else:\n                    current.metadata[k] = v\n\n        if len(current.page_content) >= min_size:\n            merged.append(current)\n            current = None\n\n    if current:\n        if merged:\n            merged[-1].page_content += \"\\n\\n\" + current.page_content\n            for k, v in current.metadata.items():\n                if k in merged[-1].metadata:\n                    merged[-1].metadata[k] = f\"{merged[-1].metadata[k]} -> {v}\"\n                else:\n                    merged[-1].metadata[k] = v\n        else:\n            merged.append(current)\n\n    return merged\n\ndef split_large_parents(chunks, max_size, splitter):\n    split_chunks = []\n\n    for chunk in chunks:\n        if len(chunk.page_content) \u003C= max_size:\n            split_chunks.append(chunk)\n        else:\n            large_splitter = RecursiveCharacterTextSplitter(\n                chunk_size=max_size,\n                chunk_overlap=splitter._chunk_overlap\n            )\n            sub_chunks = large_splitter.split_documents([chunk])\n            split_chunks.extend(sub_chunks)\n\n    return split_chunks\n\ndef clean_small_chunks(chunks, min_size):\n    cleaned = []\n\n    for i, chunk in enumerate(chunks):\n        if len(chunk.page_content) \u003C min_size:\n            if cleaned:\n                cleaned[-1].page_content += \"\\n\\n\" + 
chunk.page_content\n                for k, v in chunk.metadata.items():\n                    if k in cleaned[-1].metadata:\n                        cleaned[-1].metadata[k] = f\"{cleaned[-1].metadata[k]} -> {v}\"\n                    else:\n                        cleaned[-1].metadata[k] = v\n            elif i \u003C len(chunks) - 1:\n                chunks[i + 1].page_content = chunk.page_content + \"\\n\\n\" + chunks[i + 1].page_content\n                for k, v in chunk.metadata.items():\n                    if k in chunks[i + 1].metadata:\n                        chunks[i + 1].metadata[k] = f\"{v} -> {chunks[i + 1].metadata[k]}\"\n                    else:\n                        chunks[i + 1].metadata[k] = v\n            else:\n                cleaned.append(chunk)\n        else:\n            cleaned.append(chunk)\n\n    return cleaned\n```\n\n\u003C\u002Fdetails>\n\n```python\nif client.collection_exists(CHILD_COLLECTION):\n    client.delete_collection(CHILD_COLLECTION)\n    ensure_collection(CHILD_COLLECTION)\nelse:\n    ensure_collection(CHILD_COLLECTION)\n\nchild_vector_store = QdrantVectorStore(\n    client=client,\n    collection_name=CHILD_COLLECTION,\n    embedding=dense_embeddings,\n    sparse_embedding=sparse_embeddings,\n    retrieval_mode=RetrievalMode.HYBRID,\n    sparse_vector_name=\"sparse\"\n)\n\ndef index_documents():\n    headers_to_split_on = [(\"#\", \"H1\"), (\"##\", \"H2\"), (\"###\", \"H3\")]\n    parent_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on, strip_headers=False)\n    child_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)\n\n    min_parent_size = 2000\n    max_parent_size = 4000\n\n    all_parent_pairs, all_child_chunks = [], []\n    md_files = sorted(glob.glob(os.path.join(MARKDOWN_DIR, \"*.md\")))\n\n    if not md_files:\n        return\n\n    for doc_path_str in md_files:\n        doc_path = Path(doc_path_str)\n        try:\n            with open(doc_path, \"r\", encoding=\"utf-8\") as f:\n                md_text = f.read()\n        except Exception as e:\n            continue\n\n        parent_chunks = parent_splitter.split_text(md_text)\n        merged_parents = merge_small_parents(parent_chunks, min_parent_size)\n        split_parents = split_large_parents(merged_parents, max_parent_size, child_splitter)\n        cleaned_parents = clean_small_chunks(split_parents, min_parent_size)\n\n        for i, p_chunk in enumerate(cleaned_parents):\n            parent_id = f\"{doc_path.stem}_parent_{i}\"\n            p_chunk.metadata.update({\"source\": doc_path.stem + \".pdf\", \"parent_id\": parent_id})\n            all_parent_pairs.append((parent_id, p_chunk))\n            children = child_splitter.split_documents([p_chunk])\n            all_child_chunks.extend(children)\n\n    if not all_child_chunks:\n        return\n\n    try:\n        child_vector_store.add_documents(all_child_chunks)\n    except Exception as e:\n        return\n\n    for item in os.listdir(PARENT_STORE_PATH):\n        os.remove(os.path.join(PARENT_STORE_PATH, item))\n\n    for parent_id, doc in all_parent_pairs:\n        doc_dict = {\"page_content\": doc.page_content, \"metadata\": doc.metadata}\n        filepath = os.path.join(PARENT_STORE_PATH, f\"{parent_id}.json\")\n        with open(filepath, \"w\", encoding=\"utf-8\") as f:\n            json.dump(doc_dict, f, ensure_ascii=False, indent=2)\n\nindex_documents()\n```\n\n---\n\n### 第5步：定义智能体工具\n\n创建智能体将使用的检索工具。\n\n```python\nimport json\nfrom typing import List\nfrom 
langchain_core.tools import tool\n\n@tool\ndef search_child_chunks(query: str, limit: int) -> str:\n    \"\"\"搜索前K个最相关的子文档块。\n\n    Args:\n        query: 搜索查询字符串\n        limit: 返回结果的最大数量\n    \"\"\"\n    try:\n        results = child_vector_store.similarity_search(query, k=limit, score_threshold=0.7)\n        if not results:\n            return \"NO_RELEVANT_CHUNKS\"\n\n        return \"\\n\\n\".join([\n            f\"父文档ID: {doc.metadata.get('parent_id', '')}\\n\"\n            f\"文件名: {doc.metadata.get('source', '')}\\n\"\n            f\"内容: {doc.page_content.strip()}\"\n            for doc in results\n        ])\n\n    except Exception as e:\n        return f\"RETRIEVAL_ERROR: {str(e)}\"\n\n@tool\ndef retrieve_parent_chunks(parent_id: str) -> str:\n    \"\"\"根据父文档ID检索完整的父文档块。\n    \n    Args:\n        parent_id: 要检索的父文档ID\n    \"\"\"\n    file_name = parent_id if parent_id.lower().endswith(\".json\") else f\"{parent_id}.json\"\n    path = os.path.join(PARENT_STORE_PATH, file_name)\n\n    if not os.path.exists(path):\n        return \"NO_PARENT_DOCUMENT\"\n\n    with open(path, \"r\", encoding=\"utf-8\") as f:\n        data = json.load(f)\n\n    return (\n        f\"父文档ID: {parent_id}\\n\"\n        f\"文件名: {data.get('metadata', {}).get('source', 'unknown')}\\n\"\n        f\"内容: {data.get('page_content', '').strip()}\"\n    )\n\nllm_with_tools = llm.bind_tools([search_child_chunks, retrieve_parent_chunks])\n```\n\n---\n\n### 第6步：定义系统提示词\n\n定义用于对话摘要、查询改写、智能体编排、上下文压缩、回退响应和答案聚合的系统提示词。\n\n\u003Cdetails>\n\u003Csummary>对话摘要提示词\u003C\u002Fsummary>\n\n```python\ndef get_conversation_summary_prompt() -> str:\n    return \"\"\"你是一位专业的对话摘要生成专家。\n\n你的任务是为本次对话生成一段简短的1到2句话摘要（不超过30至50字）。\n\n摘要应包括：\n- 讨论的主要话题\n- 提到的重要事实或实体\n- 如有未解决的问题，请一并说明\n- 引用的文件名（如file1.pdf）或其他参考文档\n\n摘要中不应包含：\n- 问候语、误解或与主题无关的内容。\n\n输出要求：\n- 只返回摘要内容，无需任何解释或理由。\n- 如果没有有意义的话题，则返回空字符串。\n\"\"\"\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>查询改写提示词\u003C\u002Fsummary>\n\n```python\ndef get_rewrite_query_prompt() -> str:\n    return \"\"\"你是一位专业的查询分析与改写专家。\n\n你的任务是改写当前用户查询，以实现最佳的文档检索效果，仅在必要时结合对话上下文。\n\n规则：\n1. 自成一体的查询：\n   - 始终将查询改写为清晰且自洽的形式\n   - 若查询为后续追问（如“那X呢？”、“关于Y呢？”），则从摘要中融入最少必要的上下文信息\n   - 不得添加查询或对话摘要中未提及的信息\n\n2. 领域特定术语：\n   - 产品名称、品牌、专有名词或技术术语被视为领域特定术语\n   - 对于这类查询，尽量不使用或仅少量使用对话上下文\n   - 仅在需要澄清模糊查询时才使用摘要\n\n3. 语法与清晰度：\n   - 修正语法错误、拼写错误以及含糊不清的缩写\n   - 删除填充词和口语化表达\n   - 保留具体的关键字和命名实体\n\n4. 多重信息需求：\n   - 若查询包含多个独立且不相关的问题，则将其拆分为最多三个独立查询\n   - 每个子查询必须与其原始问题的部分语义保持一致\n   - 不得扩展、丰富或重新解释查询的含义\n\n5. 失败处理：\n   - 若查询意图不明确或难以理解，则标记为“不明”\n\n输入：\n- conversation_summary：先前对话的简要摘要\n- current_query：用户的当前查询\n\n输出：\n- 一个或多个改写后的、自成一体的查询，可用于文档检索\n\"\"\"\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>编排器提示词\u003C\u002Fsummary>\n\n```python\ndef get_orchestrator_prompt() -> str:\n    return \"\"\"你是一位专业的检索增强型助手。\n\n你的任务是扮演一名研究人员：先检索文档，分析数据，然后仅基于检索到的信息提供全面的回答。\n\n规则：\n1. 在回答之前，你必须调用‘search_child_chunks’，除非[先前研究的压缩上下文]已经包含足够的信息。\n2. 所有的论断都必须以检索到的文档为依据。若上下文不足，应说明缺少哪些信息，而不是凭空推测。\n3. 如果未找到相关文档，需扩大或重新表述查询后再次检索。重复此过程，直到满意或达到操作上限为止。\n\n压缩记忆：\n当存在[先前研究的压缩上下文]时——\n- 已列出的查询无需重复；\n- 已列出的父文档ID无需再次调用`retrieve_parent_chunks`；\n- 使用它来确定还需检索哪些内容，然后再继续搜索。\n\n工作流程：\n1. 检查压缩上下文，确认已检索过的内容和仍需补充的内容。\n2. 仅针对尚未覆盖的部分，调用‘search_child_chunks’检索5至7条相关片段。\n3. 若均不相关，则立即执行第3条规则。\n4. 对每一条相关但零散的片段，逐一调用‘retrieve_parent_chunks’——仅对不在压缩上下文中出现的ID进行检索。绝不可重复检索同一ID。\n5. 当上下文完整后，提供详尽的答案，不得遗漏任何相关事实。\n6. 
最后以“---\\n**来源：**”开头，列出所有引用的独特文件名。\n\"\"\"\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>回退响应提示词\u003C\u002Fsummary>\n\n```python\ndef get_fallback_response_prompt() -> str:\n    return \"\"\"你是一位专业的综合解答专家。系统已达到最大检索次数限制。\n\n你的任务是仅利用下方提供的信息，尽可能给出最完整的答案。\n\n输入结构：\n- “压缩的研究上下文”：先前多次检索得到的总结性发现——视为可靠信息。\n- “检索到的数据”：当前迭代中的原始工具输出——若与压缩上下文存在冲突，优先采用检索到的数据。\n两者中任一方存在即可，另一方不存在也不影响使用。\n\n规则：\n1. 来源完整性：仅使用提供上下文中明确存在的事实。不得推断、假设或添加任何未被数据直接支持的信息。\n2. 处理缺失数据：将用户查询与可用上下文进行交叉核对。\n   仅标记用户问题中无法从所提供数据中回答的部分。\n   不得将压缩研究上下文中提到的空白视为未回答的内容，\n   除非它们与用户所问内容直接相关。\n3. 语气：专业、客观且直接。\n4. 仅输出最终答案。不得透露推理过程、内部步骤或关于检索过程的任何元评论。\n5. 在“来源”部分之后，不得添加结束语、最后说明、免责声明、摘要或重复陈述。\n   “来源”部分始终是你回复的最后一项。在其后立即停止。\n\n格式：\n- 使用 Markdown（标题、加粗、列表）以提高可读性。\n- 尽可能采用流畅的段落形式。\n- 最后按照如下描述包含“来源”部分。\n\n“来源”部分规则：\n- 在结尾处包含一个 \"---\\\\n**Sources:**\\\\n\" 部分，后接带有项目符号的文件名列表。\n- 仅列出具有实际文件扩展名的条目（如 \".pdf\"、\".docx\"、\".txt\"）。\n- 任何没有文件扩展名的条目均为内部块标识符——完全舍弃，绝不可列入。\n- 去重：若同一文件出现多次，仅列出一次。\n- 若无有效文件名，则完全省略“来源”部分。\n- “来源”部分是你撰写的最后一部分内容。其后不得再添加任何内容。\n\"\"\"\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>上下文压缩提示\u003C\u002Fsummary>\n\n```python\ndef get_context_compression_prompt() -> str:\n    return \"\"\"你是一位专家级的研究上下文压缩员。\n\n你的任务是将检索到的对话内容压缩成简洁、以查询为中心且结构化的摘要，以便由增强检索的代理直接用于生成答案。\n\n规则：\n1. 仅保留与回答用户问题相关的信息。\n2. 保留精确的数字、名称、版本、技术术语和配置细节。\n3. 移除重复、无关或管理性的内容。\n4. 不得包含搜索查询、父级 ID、块 ID 或内部标识符。\n5. 按照来源文件组织所有发现。每个文件部分必须以：### filename.pdf 开头。\n6. 在专门的“空白”部分突出显示缺失或未解决的信息。\n7. 将摘要限制在大约 400–600 字。如果内容超出此范围，优先保留关键事实和结构化数据。\n8. 不得解释你的推理过程；仅以 Markdown 格式输出结构化内容。\n\n所需结构：\n\n\n\n# 研究上下文摘要\n\n## 重点\n[简要的技术性重述问题]\n\n## 结构化发现\n\n### filename.pdf\n- 直接相关的事实\n- 支持性背景信息（如有需要）\n\n## 空白\n- 缺失或不完整的内容\n\n该摘要应简洁、结构化，并可直接供代理用于生成答案或规划进一步的检索。\n\"\"\"\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>聚合提示\u003C\u002Fsummary>\n\n```python\ndef get_aggregation_prompt() -> str:\n    return \"\"\"你是一位专家级的聚合助手。\n\n你的任务是将多个检索到的解答合并成一个全面且自然流畅的回答。\n\n规则：\n1. 以对话式、自然的语气撰写——仿佛在向同事解释一般。\n2. 仅使用检索到的答案中的信息。\n3. 不得推断、扩展或解释缩写词或技术术语，除非来源中已明确给出定义。\n4. 将信息流畅地编织在一起，同时保留重要细节、数字和示例。\n5. 内容需全面——包含所有来自来源的相关信息，而不仅仅是摘要。\n6. 如果来源之间存在分歧，应自然地同时提及两种观点（例如：“虽然有些来源认为 X，但另一些来源则表明 Y……”）。\n7. 
直接开始回答，无需诸如“根据来源……”之类的前言。\n\n格式：\n- 使用 Markdown 提高清晰度（标题、列表、加粗），但不要过度使用。\n- 尽可能采用流畅的段落形式，而非过多的项目符号。\n- 最后按照下文所述包含“来源”部分。\n\n“来源”部分规则：\n- 每个检索到的解答可能包含“来源”部分——从中提取列出的文件名。\n- 仅列出具有实际文件扩展名的条目（如 \".pdf\"、\".docx\"、\".txt\"）。\n- 任何没有文件扩展名的条目均为内部块标识符——完全舍弃，绝不可列入。\n- 去重：若同一文件出现在多个解答中，仅列出一次。\n- 格式为 \"---\\\\n**Sources:**\\\\n\" 后接清理后的文件名项目符号列表。\n- 文件名必须仅出现在此最终的“来源”部分中，不得出现在回答的其他任何地方。\n- 若无有效文件名，则完全省略“来源”部分。\n\n若无可用信息，只需简单回答：“在现有来源中，我未能找到回答您问题的相关信息。”\n\"\"\"\n```\n\n\u003C\u002Fdetails>\n\n---\n\n### 第7步：定义状态与数据模型\n\n创建用于跟踪对话和代理执行的状态结构。\n\n```python\nfrom langgraph.graph import MessagesState\nfrom pydantic import BaseModel, Field\nfrom typing import List, Annotated, Set\nimport operator\n\ndef accumulate_or_reset(existing: List[dict], new: List[dict]) -> List[dict]:\n    if new and any(item.get('__reset__') for item in new):\n        return []\n    return existing + new\n\ndef set_union(a: Set[str], b: Set[str]) -> Set[str]:\n    return a | b\n\nclass State(MessagesState):\n    questionIsClear: bool = False\n    conversation_summary: str = \"\"\n    originalQuery: str = \"\"\n    rewrittenQuestions: List[str] = []\n    agent_answers: Annotated[List[dict], accumulate_or_reset] = []\n\nclass AgentState(MessagesState):\n    tool_call_count: Annotated[int, operator.add] = 0\n    iteration_count: Annotated[int, operator.add] = 0\n    question: str = \"\"\n    question_index: int = 0\n    context_summary: str = \"\"\n    retrieval_keys: Annotated[Set[str], set_union] = set()\n    final_answer: str = \"\"\n    agent_answers: List[dict] = []\n\nclass QueryAnalysis(BaseModel):\n    is_clear: bool = Field(description=\"指示用户的提问是否清晰且可回答。\")\n    questions: List[str] = Field(description=\"改写后的自洽问题列表。\")\n    clarification_needed: str = Field(description=\"如果问题不清晰，则说明原因。\")\n```\n\n---\n\n### 第8步：代理配置\n\n对工具调用和迭代次数设置硬性限制，以防止无限循环。通过 `tiktoken` 进行令牌计数，从而驱动上下文压缩决策。\n\n```python\nimport tiktoken\n\nMAX_TOOL_CALLS = 8       # 每次代理运行的最大工具调用次数\nMAX_ITERATIONS = 10      # 代理循环的最大迭代次数\nBASE_TOKEN_THRESHOLD = 2000     # 压缩的初始令牌阈值\nTOKEN_GROWTH_FACTOR = 0.9       # 每次压缩后应用的倍增因子\n\ndef estimate_context_tokens(messages: list) -> int:\n    try:\n        encoding = tiktoken.encoding_for_model(\"gpt-4\")\n    except:\n        encoding = tiktoken.get_encoding(\"cl100k_base\")\n    return sum(len(encoding.encode(str(msg.content))) for msg in messages if hasattr(msg, 'content') and msg.content)\n```\n\n---\n\n### 第9步：构建图节点与边函数\n\n为 LangGraph 工作流创建处理节点和边。\n\n#### 主要图节点与边\n```python\nfrom langgraph.types import Send, Command\nfrom langchain_core.messages import HumanMessage, AIMessage, SystemMessage, RemoveMessage, ToolMessage\nfrom typing import Literal\n\ndef summarize_history(state: State):\n    if len(state[\"messages\"]) \u003C 4:\n        return {\"conversation_summary\": \"\"}\n\n    relevant_msgs = [\n        msg for msg in state[\"messages\"][:-1]\n        if isinstance(msg, (HumanMessage, AIMessage)) and not getattr(msg, \"tool_calls\", None)\n    ]\n\n    if not relevant_msgs:\n        return {\"conversation_summary\": \"\"}\n\n    conversation = \"对话历史:\\n\"\n    for msg in relevant_msgs[-6:]:\n        role = \"用户\" if isinstance(msg, HumanMessage) else \"助手\"\n        conversation += f\"{role}: {msg.content}\\n\"\n\n    summary_response = llm.with_config(temperature=0.2).invoke([SystemMessage(content=get_conversation_summary_prompt()), HumanMessage(content=conversation)])\n    return {\"conversation_summary\": summary_response.content, \"agent_answers\": [{\"__reset__\": True}]}\n\ndef rewrite_query(state: State):\n    
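# 借助结构化输出 QueryAnalysis，结合对话摘要改写当前查询：问题清晰时清空历史消息并返回改写后的子问题列表，否则返回澄清请求\n    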
last_message = state[\"messages\"][-1]\n    conversation_summary = state.get(\"conversation_summary\", \"\")\n\n    context_section = (f\"对话上下文:\\n{conversation_summary}\\n\" if conversation_summary.strip() else \"\") + f\"用户问题:\\n{last_message.content}\\n\"\n\n    llm_with_structure = llm.with_config(temperature=0.1).with_structured_output(QueryAnalysis)\n    response = llm_with_structure.invoke([SystemMessage(content=get_rewrite_query_prompt()), HumanMessage(content=context_section)])\n\n    if response.questions and response.is_clear:\n        delete_all = [RemoveMessage(id=m.id) for m in state[\"messages\"] if not isinstance(m, SystemMessage)]\n        return {\"questionIsClear\": True, \"messages\": delete_all, \"originalQuery\": last_message.content, \"rewrittenQuestions\": response.questions}\n\n    clarification = response.clarification_needed if response.clarification_needed and len(response.clarification_needed.strip()) > 10 else \"我需要更多信息来理解您的问题。\"\n    return {\"questionIsClear\": False, \"messages\": [AIMessage(content=clarification)]}\n\ndef request_clarification(state: State):\n    return {}\n\ndef route_after_rewrite(state: State) -> Literal[\"request_clarification\", \"agent\"]:\n    if not state.get(\"questionIsClear\", False):\n        return \"request_clarification\"\n    else:\n        return [\n                Send(\"agent\", {\"question\": query, \"question_index\": idx, \"messages\": []})\n                for idx, query in enumerate(state[\"rewrittenQuestions\"])\n            ]\n\ndef aggregate_answers(state: State):\n    if not state.get(\"agent_answers\"):\n        return {\"messages\": [AIMessage(content=\"没有生成任何答案。\")]}\n\n    sorted_answers = sorted(state[\"agent_answers\"], key=lambda x: x[\"index\"])\n\n    formatted_answers = \"\"\n    for i, ans in enumerate(sorted_answers, start=1):\n        formatted_answers += (f\"\\n答案 {i}:\\n\"f\"{ans['answer']}\\n\")\n\n    user_message = HumanMessage(content=f\"\"\"原始用户问题: {state[\"originalQuery\"]}\\n获取到的答案:{formatted_answers}\"\"\")\n    synthesis_response = llm.invoke([SystemMessage(content=get_aggregation_prompt()), user_message])\n    return {\"messages\": [AIMessage(content=synthesis_response.content)]}\n```\n\n---\n\n#### 代理子图节点与边\n```python\ndef orchestrator(state: AgentState):\n    context_summary = state.get(\"context_summary\", \"\").strip()\n    sys_msg = SystemMessage(content=get_orchestrator_prompt())\n    summary_injection = (\n        [HumanMessage(content=f\"[来自先前研究的压缩上下文]\\n\\n{context_summary}\")]\n        if context_summary else []\n    )\n    if not state.get(\"messages\"):\n        human_msg = HumanMessage(content=state[\"question\"])\n        force_search = HumanMessage(content=\"您必须首先调用 'search_child_chunks' 来回答这个问题。\")\n        response = llm_with_tools.invoke([sys_msg] + summary_injection + [human_msg, force_search])\n        return {\"messages\": [human_msg, response], \"tool_call_count\": len(response.tool_calls or []), \"iteration_count\": 1}\n\n    response = llm_with_tools.invoke([sys_msg] + summary_injection + state[\"messages\"])\n    tool_calls = response.tool_calls if hasattr(response, \"tool_calls\") else []\n    return {\"messages\": [response], \"tool_call_count\": len(tool_calls) if tool_calls else 0, \"iteration_count\": 1}\n\ndef route_after_orchestrator_call(state: AgentState) -> Literal[\"tool\", \"fallback_response\", \"collect_answer\"]:\n    iteration = state.get(\"iteration_count\", 0)\n    tool_count = state.get(\"tool_call_count\", 0)\n\n    if iteration >= 
MAX_ITERATIONS or tool_count > MAX_TOOL_CALLS:\n        return \"fallback_response\"\n\n    last_message = state[\"messages\"][-1]\n    tool_calls = getattr(last_message, \"tool_calls\", None) or []\n\n    if not tool_calls:\n        return \"collect_answer\"\n    \n    return \"tools\"\n\ndef fallback_response(state: AgentState):\n    seen = set()\n    unique_contents = []\n    for m in state[\"messages\"]:\n        if isinstance(m, ToolMessage) and m.content not in seen:\n            unique_contents.append(m.content)\n            seen.add(m.content)\n\n    context_summary = state.get(\"context_summary\", \"\").strip()\n\n    context_parts = []\n    if context_summary:\n        context_parts.append(f\"## 压缩的研究上下文（来自先前迭代）\\n\\n{context_summary}\")\n    if unique_contents:\n        context_parts.append(\n            \"## 获取的数据（当前迭代）\\n\\n\" +\n            \"\\n\\n\".join(f\"--- 数据来源 {i} ---\\n{content}\" for i, content in enumerate(unique_contents, 1))\n        )\n\n    context_text = \"\\n\\n\".join(context_parts) if context_parts else \"没有从文档中获取到任何数据。\"\n\n    prompt_content = (\n        f\"用户问题: {state.get('question')}\\n\\n\"\n        f\"{context_text}\\n\\n\"\n        f\"指令:\\n仅使用上述数据提供最佳答案。\"\n    )\n    response = llm.invoke([SystemMessage(content=get_fallback_response_prompt()), HumanMessage(content=prompt_content)])\n    return {\"messages\": [response]}\n\ndef should_compress_context(state: AgentState) -> Command[Literal[\"compress_context\", \"orchestrator\"]]:\n    messages = state[\"messages\"]\n\n    new_ids: Set[str] = set()\n    for msg in reversed(messages):\n        if isinstance(msg, AIMessage) and getattr(msg, \"tool_calls\", None):\n            for tc in msg.tool_calls:\n                if tc[\"name\"] == \"retrieve_parent_chunks\":\n                    raw = tc[\"args\"].get(\"parent_id\") or tc[\"args\"].get(\"id\") or tc[\"args\"].get(\"ids\") or []\n                    if isinstance(raw, str):\n                        new_ids.add(f\"parent::{raw}\")\n                    else:\n                        new_ids.update(f\"parent::{r}\" for r in raw)\n\n                elif tc[\"name\"] == \"search_child_chunks\":\n                    query = tc[\"args\"].get(\"query\", \"\")\n                    if query:\n                        new_ids.add(f\"search::{query}\")\n            break\n\n    updated_ids = state.get(\"retrieval_keys\", set()) | new_ids\n\n    current_token_messages = estimate_context_tokens(messages)\n    current_token_summary = estimate_context_tokens([HumanMessage(content=state.get(\"context_summary\", \"\"))])\n    current_tokens = current_token_messages + current_token_summary\n\n    max_allowed = BASE_TOKEN_THRESHOLD + int(current_token_summary * TOKEN_GROWTH_FACTOR)\n\n    goto = \"compress_context\" if current_tokens > max_allowed else \"orchestrator\"\n    return Command(update={\"retrieval_keys\": updated_ids}, goto=goto)\n\ndef compress_context(state: AgentState):\n    messages = state[\"messages\"]\n    existing_summary = state.get(\"context_summary\", \"\").strip()\n\n    if not messages:\n        return {}\n\n    conversation_text = f\"USER QUESTION:\\n{state.get('question')}\\n\\nConversation to compress:\\n\\n\"\n    if existing_summary:\n        conversation_text += f\"[PRIOR COMPRESSED CONTEXT]\\n{existing_summary}\\n\\n\"\n\n    for msg in messages[1:]:\n        if isinstance(msg, AIMessage):\n            tool_calls_info = \"\"\n            if getattr(msg, \"tool_calls\", None):\n                calls = \", \".join(f\"{tc['name']}({tc['args']})\" 
for tc in msg.tool_calls)\n                tool_calls_info = f\" | Tool calls: {calls}\"\n            conversation_text += f\"[ASSISTANT{tool_calls_info}]\\n{msg.content or '(tool call only)'}\\n\\n\"\n        elif isinstance(msg, ToolMessage):\n            tool_name = getattr(msg, \"name\", \"tool\")\n            conversation_text += f\"[TOOL RESULT — {tool_name}]\\n{msg.content}\\n\\n\"\n\n    summary_response = llm.invoke([SystemMessage(content=get_context_compression_prompt()), HumanMessage(content=conversation_text)])\n    new_summary = summary_response.content\n\n    retrieved_ids: Set[str] = state.get(\"retrieval_keys\", set())\n    if retrieved_ids:\n        parent_ids = sorted(r for r in retrieved_ids if r.startswith(\"parent::\"))\n        search_queries = sorted(r.replace(\"search::\", \"\") for r in retrieved_ids if r.startswith(\"search::\"))\n\n        block = \"\\n\\n---\\n**Already executed (do NOT repeat):**\\n\"\n        if parent_ids:\n            block += \"Parent chunks retrieved:\\n\" + \"\\n\".join(f\"- {p.replace('parent::', '')}\" for p in parent_ids) + \"\\n\"\n        if search_queries:\n            block += \"Search queries already run:\\n\" + \"\\n\".join(f\"- {q}\" for q in search_queries) + \"\\n\"\n        new_summary += block\n\n    return {\"context_summary\": new_summary, \"messages\": [RemoveMessage(id=m.id) for m in messages[1:]]}\n\ndef collect_answer(state: AgentState):\n    last_message = state[\"messages\"][-1]\n    is_valid = isinstance(last_message, AIMessage) and last_message.content and not last_message.tool_calls\n    answer = last_message.content if is_valid else \"Unable to generate an answer.\"\n    return {\n        \"final_answer\": answer,\n        \"agent_answers\": [{\"index\": state[\"question_index\"], \"question\": state[\"question\"], \"answer\": answer}]\n    }\n```\n\n**为什么采用这种架构？**\n- **摘要生成**能够在不使大模型过载的情况下保持对话上下文的连贯性。\n- **查询重写**利用上下文智能地确保搜索查询精确且无歧义。\n- **人工介入**可以在浪费检索资源之前捕捉到不明确的查询。\n- **并行执行**通过 `Send` API 同时为每个子问题启动独立的代理子图，实现高效处理。\n- **上下文压缩**在长时间的检索循环中保持代理的工作内存简洁，避免重复获取数据。\n- **回退响应**确保系统能够优雅降级——即使预算耗尽，代理仍会返回有用的信息。\n- **答案收集与聚合**从各个代理中提取清晰的最终答案，并将其整合成一个连贯的整体响应。\n---\n\n### 第 10 步：构建 LangGraph 图\n\n使用对话记忆和多智能体架构组装完整的流程图。\n\n```python\nfrom langgraph.graph import START, END, StateGraph\nfrom langgraph.prebuilt import ToolNode\nfrom langgraph.checkpoint.memory import InMemorySaver\n\ncheckpointer = InMemorySaver()\n\nagent_builder = StateGraph(AgentState)\nagent_builder.add_node(orchestrator)\nagent_builder.add_node(\"tools\", ToolNode([search_child_chunks, retrieve_parent_chunks]))\nagent_builder.add_node(compress_context)\nagent_builder.add_node(fallback_response)\nagent_builder.add_node(should_compress_context)\nagent_builder.add_node(collect_answer)\n\nagent_builder.add_edge(START, \"orchestrator\")\nagent_builder.add_conditional_edges(\"orchestrator\", route_after_orchestrator_call, {\"tools\": \"tools\", \"fallback_response\": \"fallback_response\", \"collect_answer\": \"collect_answer\"})\nagent_builder.add_edge(\"tools\", \"should_compress_context\")\nagent_builder.add_edge(\"compress_context\", \"orchestrator\")\nagent_builder.add_edge(\"fallback_response\", \"collect_answer\")\nagent_builder.add_edge(\"collect_answer\", END)\nagent_subgraph = agent_builder.compile()\n\ngraph_builder = StateGraph(State)\ngraph_builder.add_node(summarize_history)\ngraph_builder.add_node(rewrite_query)\ngraph_builder.add_node(request_clarification)\ngraph_builder.add_node(\"agent\", 
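# 将编译好的代理子图注册为主图的 \"agent\" 节点，由 route_after_rewrite 通过 Send 并行派发子问题到该节点\n    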
agent_subgraph)\ngraph_builder.add_node(aggregate_answers)\n\ngraph_builder.add_edge(START, \"summarize_history\")\ngraph_builder.add_edge(\"summarize_history\", \"rewrite_query\")\ngraph_builder.add_conditional_edges(\"rewrite_query\", route_after_rewrite)\ngraph_builder.add_edge(\"request_clarification\", \"rewrite_query\")\ngraph_builder.add_edge([\"agent\"], \"aggregate_answers\")\ngraph_builder.add_edge(\"aggregate_answers\", END)\n\nagent_graph = graph_builder.compile(checkpointer=checkpointer, interrupt_before=[\"request_clarification\"])\n```\n\n**图架构说明：**\n\n架构流程图可在此查看 **[here](.\u002Fassets\u002Fagentic_rag_workflow.png)**。\n\n**代理子图**（处理单个问题）：\n- START → `orchestrator`（调用带有工具的 LLM）\n- `orchestrator` → `tools`（如果需要调用工具）或 `fallback_response`（如果预算耗尽）或 `collect_answer`（如果已完成）\n- `tools` → `should_compress_context`（检查 token 预算）\n- `should_compress_context` → `compress_context`（如果超过阈值）或 `orchestrator`（否则）\n- `compress_context` → `orchestrator`（恢复使用压缩后的记忆）\n- `fallback_response` → `collect_answer`（打包尽力而为的答案）\n- `collect_answer` → END（清理最终答案并索引）\n\n**主图**（协调完整流程）：\n- START → `summarize_history`（从历史中提取对话上下文）\n- `summarize_history` → `rewrite_query`（结合上下文重写查询，检查清晰度）\n- `rewrite_query` → `request_clarification`（如果不清晰）或通过 `Send` 启动并行的 `agent` 子图（如果清晰）\n- `request_clarification` → `rewrite_query`（用户提供澄清后）\n- 所有 `agent` 子图 → `aggregate_answers`（合并所有响应）\n- `aggregate_answers` → END（返回最终合成的答案）\n\n---\n\n### 第 11 步：创建聊天界面\n\n构建具有对话持久性和人工介入支持的 Gradio 界面。有关包含文档摄取在内的完整端到端管道 Gradio 界面，请参阅 [project\u002FREADME.md](.\u002Fproject\u002FREADME.md)。\n\n> **注意：** 完整的流式传输支持——包括推理步骤和工具调用的可见性——已在 [notebook](notebooks\u002Fagentic_rag.ipynb) 和完整 [项目](project\u002Fcore\u002Fchat_interface.py) 中实现。下面的示例故意保持极简——仅展示基本的 Gradio 集成模式。\n\n```python\nimport gradio as gr\nimport uuid\n\ndef create_thread_id():\n    \"\"\"为每次对话生成唯一的线程 ID\"\"\"\n    return {\"configurable\": {\"thread_id\": str(uuid.uuid4())}, \"recursion_limit\": 50}\n\ndef clear_session():\n    \"\"\"清除线程以开始新对话\"\"\"\n    global config\n    agent_graph.checkpointer.delete_thread(config[\"configurable\"][\"thread_id\"])\n    config = create_thread_id()\n\ndef chat(message, history):\n    current_state = agent_graph.get_state(config)\n    \n    if current_state.next:\n        agent_graph.update_state(config,{\"messages\": [HumanMessage(content=message.strip())]})\n        result = agent_graph.invoke(None, config)\n    else:\n        result = agent_graph.invoke({\"messages\": [HumanMessage(content=message.strip())]}, config)\n    \n    return result['messages'][-1].content\n\nconfig = create_thread_id()\n\nwith gr.Blocks() as demo:\n    chatbot = gr.Chatbot()\n    chatbot.clear(clear_session)\n    gr.ChatInterface(fn=chat, chatbot=chatbot)\n\ndemo.launch(theme=gr.themes.Citrus())\n```\n\n**大功告成！** 您现在拥有一个功能齐全的 Agentic RAG 系统，具备对话记忆、层次化索引以及人工介入的查询澄清功能。\n\n---\n\n## 模块化架构\n\n该应用（`project\u002F` 文件夹）被组织成模块化组件——每个组件都可以独立替换而不破坏系统。\n\n### 📂 项目结构\n```\nproject\u002F\n├── app.py                    # 主 Gradio 应用入口点\n├── config.py                 # 配置中心（模型、分块大小、提供商）\n├── core\u002F                     # RAG 系统编排\n├── db\u002F                       # 向量数据库和父级分块存储\n├── rag_agent\u002F                # LangGraph 工作流（节点、边、提示词、工具）\n└── ui\u002F                       # Gradio 界面\n```\n\n关键可定制点：LLM 提供商、嵌入模型、分块策略、代理工作流以及系统提示词——均可通过 `config.py` 或其各自模块进行配置。\n\n完整文档请参阅 [project\u002FREADME.md](.\u002Fproject\u002FREADME.md)。\n\n## 安装与使用\n\n示例 PDF 
文件可在以下位置找到：[javascript](https:\u002F\u002Fwww.tutorialspoint.com\u002Fjavascript\u002Fjavascript_tutorial.pdf)、[区块链](https:\u002F\u002Fblockchain-observatory.ec.europa.eu\u002Fdocument\u002Fdownload\u002F1063effa-59cc-4df4-aeee-d2cf94f69178_en?filename=Blockchain_For_Beginners_A_EUBOF_Guide.pdf)、[微服务](https:\u002F\u002Fcdn.studio.f5.com\u002Ffiles\u002Fk6fem79d\u002Fproduction\u002F5e4126e1cefa813ab67f9c0b6d73984c27ab1502.pdf)、[Fortinet](https:\u002F\u002Fwww.commoncriteriaportal.org\u002Ffiles\u002Fepfiles\u002FFortinet%20FortiGate_EAL4_ST_V1.5.pdf(320893)_TMP.pdf)。\n\n### 选项 1：快速入门笔记本（推荐用于测试）\n\n**Google Colab：** 点击本 README 顶部的“在 Colab 中打开”徽章，在文件浏览器中将您的 PDF 上传到 `docs\u002F` 文件夹，使用 `pip install -r requirements.txt` 安装依赖项，然后从上到下运行所有单元格。\n\n**本地（Jupyter\u002FVSCode）：** 您可以选择创建并激活虚拟环境，使用 `pip install -r requirements.txt` 安装依赖项，将您的 PDF 添加到 `docs\u002F`，然后从上到下运行所有单元格。\n\n聊天界面将在最后出现。\n\n### 选项 2：完整 Python 项目（推荐用于开发）\n\n#### 1. 安装依赖项\n```bash\n\n# 克隆仓库\ngit clone https:\u002F\u002Fgithub.com\u002FGiovanniPasq\u002Fagentic-rag-for-dummies\ncd agentic-rag-for-dummies\n\n# 可选：创建并激活虚拟环境\n# 在 macOS\u002FLinux 上：\npython -m venv venv && source venv\u002Fbin\u002Factivate\n# 在 Windows 上：\npython -m venv venv && .\\venv\\Scripts\\activate\n\n# 安装依赖包\npip install -r requirements.txt\n```\n\n#### 2. 运行应用\n```bash\npython app.py\n```\n\n#### 3. 提问\n\n打开本地 URL（例如 `http:\u002F\u002F127.0.0.1:7860`）即可开始聊天。\n\n---\n\n### 选项 3：Docker 部署\n\n完整的 Docker 指令和系统要求，请参阅 [`project\u002FREADME.md`](.\u002Fproject\u002FREADME.md#Docker-Deployment)。\n\n### 示例对话\n\n**带会话记忆：**\n```\n用户：「如何安装 SQL？」\n代理：【提供文档中的安装步骤】\n\n用户：「如何更新它？」\n代理：【理解“它”指 SQL，提供更新说明】\n```\n\n**带查询澄清：**\n```\n用户：「告诉我关于那个东西的事。」\n代理：「我需要更多信息。您具体想了解哪个主题呢？」\n\n用户：「PostgreSQL 的安装过程。」\n代理：【检索并回答具体信息】\n```\n\n---\n\n## 故障排除\n\n| 方面 | 常见问题 | 建议解决方案 |\n|------|----------------|------------------|\n| **模型选择** | - 回答忽略指令\u003Cbr>- 工具（检索\u002F搜索）使用不当\u003Cbr>- 上下文理解能力差\u003Cbr>- 幻觉或聚合不完整 | - 使用更强大的大语言模型\u003Cbr>- 推荐 7B 以上的模型以获得更好的推理能力\u003Cbr>- 如果本地模型性能有限，可考虑使用云端模型 |\n| **系统提示行为** | - 模型在未检索文档的情况下直接作答\u003Cbr>- 查询重写导致上下文丢失\u003Cbr>- 聚合引入幻觉 | - 在系统提示中明确检索要求\u003Cbr>- 确保查询重写贴近用户意图 |\n| **检索配置** | - 未能检索到相关文档\u003Cbr>- 检索结果包含过多无关信息 | - 增加检索的片段数量 (`k`) 或降低相似度阈值以提高召回率\u003Cbr>- 减少 `k` 或提高阈值以提升精确度 |\n| **分块大小\u002F文档拆分** | - 回答缺乏上下文或显得支离破碎\u003Cbr>- 检索速度慢或嵌入成本高 | - 增大分块及父级分块的大小以提供更多上下文\u003Cbr>- 缩小分块大小以提高速度并降低成本 |\n| **上下文压缩** | - 代理在压缩后丢失重要细节\u003Cbr>- 压缩后的摘要过于笼统 | - 调整压缩系统的提示词\u003Cbr>- 提高 `BASE_TOKEN_THRESHOLD` 以延迟压缩\u003Cbr>- 增加 `TOKEN_GROWTH_FACTOR` |\n| **代理配置** | - 代理过早放弃\u003Cbr>- 代理陷入无限循环 | - 对于复杂查询，增加 `MAX_TOOL_CALLS` \u002F `MAX_ITERATIONS`；对于简单查询，则减少这些参数以加快速度 |\n| **温度与一致性** | - 回答不一致或过于富有创造性\u003Cbr>- 回答过于刻板或重复 | - 将温度设置为 `0` 以获得事实性、一致性的输出\u003Cbr>- 在总结或分析任务中可适当提高温度 |\n| **嵌入模型质量** | - 语义搜索效果不佳\u003Cbr>- 在领域特定或多语言文档上表现较差 | - 使用更高质量或领域专用的嵌入模型\u003Cbr>- 更换嵌入模型后需重新索引所有文档 |\n\n> 💡 **更多故障排除技巧** 请参阅 [README 故障排除](.\u002Fproject\u002FREADME.md#troubleshooting)。","# Agentic RAG for Dummies 快速上手指南\n\n本指南帮助中国开发者快速构建基于 LangGraph 的模块化 **Agentic RAG（检索增强生成）** 系统。该系统具备分层索引、对话记忆、查询澄清及多智能体并行推理等高级功能。\n\n## 环境准备\n\n### 系统要求\n- **Python**: 3.11 或更高版本\n- **操作系统**: Linux, macOS, Windows (推荐 WSL2)\n- **向量数据库**: 本地运行需支持文件读写权限\n\n### 前置依赖\n本项目核心依赖包括 `LangGraph`, `Qdrant`, `LangChain` 及相关 LLM 提供商库。\n- **本地模型推荐**: Ollama (需安装 [Ollama](https:\u002F\u002Follama.com))\n- **云端模型可选**: OpenAI, Anthropic, Google Gemini\n\n> 💡 **国内加速建议**:\n> - 安装 Python 包时，推荐使用清华或阿里镜像源：\n>   ```bash\n>   pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple \u003Cpackage_name>\n>   ```\n> - 使用 
Ollama 时，若拉取模型缓慢，可配置国内镜像代理或使用已下载的模型文件。\n\n---\n\n## 安装步骤\n\n### 1. 克隆项目\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FGiovanniPasq\u002Fagentic-rag-for-dummies.git\ncd agentic-rag-for-dummies\n```\n\n### 2. 创建虚拟环境并安装依赖\n```bash\npython -m venv venv\nsource venv\u002Fbin\u002Factivate  # Windows 用户请使用: venv\\Scripts\\activate\n\n# 推荐使用国内镜像源加速安装\npip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple -r requirements.txt\n```\n*(注：若项目中无 `requirements.txt`，请根据 README 中的 Implementation 部分手动安装核心库，如下所示)*\n\n```bash\npip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple langgraph langchain-qdrant langchain-ollama langchain-huggingface pymupdf4llm qdrant-client fastembed\n```\n\n### 3. 准备本地大模型 (以 Ollama 为例)\n确保已安装 Ollama，并拉取推荐模型（建议使用 7B+ 参数模型以保证工具调用准确性）：\n```bash\nollama pull qwen3:4b-instruct-2507-q4_K_M\n# 或者更强大的模型\nollama pull llama3.1:8b\n```\n\n---\n\n## 基本使用\n\n以下是最小化的代码示例，展示如何初始化系统并进行一次问答。\n\n### 1. 初始化配置与向量库\n在 Python 脚本或 Jupyter Notebook 中运行：\n\n```python\nimport os\nfrom pathlib import Path\nfrom langchain_huggingface import HuggingFaceEmbeddings\nfrom langchain_qdrant.fastembed_sparse import FastEmbedSparse\nfrom qdrant_client import QdrantClient\nfrom langchain_ollama import ChatOllama\n\n# 1. 设置路径\nDOCS_DIR = \"docs\"  # 存放 PDF 文件的目录\nMARKDOWN_DIR = \"markdown_docs\"\nPARENT_STORE_PATH = \"parent_store\"\nCHILD_COLLECTION = \"document_child_chunks\"\n\nos.makedirs(DOCS_DIR, exist_ok=True)\nos.makedirs(MARKDOWN_DIR, exist_ok=True)\nos.makedirs(PARENT_STORE_PATH, exist_ok=True)\n\n# 2. 初始化 LLM (本地 Ollama)\nllm = ChatOllama(model=\"qwen3:4b-instruct-2507-q4_K_M\", temperature=0)\n\n# 3. 初始化 Embedding 模型\ndense_embeddings = HuggingFaceEmbeddings(model_name=\"sentence-transformers\u002Fall-mpnet-base-v2\")\nsparse_embeddings = FastEmbedSparse(model_name=\"Qdrant\u002Fbm25\")\n\n# 4. 连接 Qdrant 向量数据库\nclient = QdrantClient(path=\"qdrant_db\")\n```\n\n### 2. 数据处理 (PDF 转 Markdown 并索引)\n将你的 PDF 文档放入 `docs` 文件夹，然后运行以下代码进行分层索引：\n\n```python\nimport pymupdf\nimport pymupdf4llm\nimport glob\nfrom pathlib import Path\n\n# 简单的 PDF 转 Markdown 函数\ndef pdf_to_markdown(pdf_path, output_dir):\n    doc = pymupdf.open(pdf_path)\n    md = pymupdf4llm.to_markdown(doc, header=False, footer=False, page_separators=True, ignore_images=True)\n    output_path = Path(output_dir) \u002F Path(doc.name).stem\n    Path(output_path).with_suffix(\".md\").write_bytes(md.encode('utf-8'))\n\n# 批量转换\nfor pdf_path in map(Path, glob.glob(f\"{DOCS_DIR}\u002F*.pdf\")):\n    pdf_to_markdown(pdf_path, MARKDOWN_DIR)\n\n# 后续需执行分层切分与入库逻辑 (参考项目 notebooks\u002Fagentic_rag.ipynb 中的 Step 4)\n# 此处省略具体的 Parent\u002FChild 切分代码以保持简洁，完整逻辑请参阅官方 Notebook\n```\n\n### 3. 
构建 Agent 并提问\n完成索引后，构建 LangGraph 工作流并进行对话：\n\n```python\n# 伪代码示例：构建图并运行\n# from your_project_module import build_agentic_graph \n\n# graph = build_agentic_graph(llm, client, dense_embeddings, sparse_embeddings)\n\n# 模拟用户输入\nquery = \"如何使用 Python 进行数据清洗？\"\nresponse = graph.invoke({\"messages\": [(\"user\", query)]})\n\nprint(response[\"messages\"][-1].content)\n```\n\n> 🚀 **进阶体验**:\n> 想要完整体验包含“查询澄清”、“多智能体并行”和“自我修正”的全流程，强烈建议直接运行项目提供的 Colab 笔记本或本地 Notebook：\n> ```bash\n> jupyter notebook notebooks\u002Fagentic_rag.ipynb\n> ```\n\n通过以上步骤，你即可拥有一个具备上下文记忆和智能检索能力的 Agentic RAG 系统原型。","某科技公司的技术文档团队正试图构建一个内部智能问答系统，帮助工程师快速从海量且结构复杂的 API 文档和架构手册中检索准确信息。\n\n### 没有 agentic-rag-for-dummies 时\n- **回答碎片化严重**：传统 RAG 只能检索到零散的代码片段，缺乏上下文关联，导致工程师无法理解整体架构逻辑。\n- **模糊查询直接失败**：当用户提问含糊不清（如“怎么修复那个报错”）时，系统强行生成错误答案，而非主动澄清需求。\n- **多步推理能力缺失**：面对涉及多个模块的复杂问题，系统无法将其拆解为子任务并行处理，只能提供片面信息。\n- **对话记忆断层**：系统无法记住上一轮的沟通内容，用户每次追问都必须重复背景信息，体验极不流畅。\n- **开发门槛高**：想要实现上述高级功能，团队需从头编写大量 LangGraph 编排代码，耗时数周且难以维护。\n\n### 使用 agentic-rag-for-dummies 后\n- **层级索引还原全貌**：利用其父子分块机制，既能精准定位细节，又能自动召回对应的父级章节，提供完整的上下文解释。\n- **智能澄清消除歧义**：内置的查询重写与人机协作机制，会在遇到模糊问题时主动暂停并引导用户补充细节，确保答案精准。\n- **多智能体并行解题**：通过 Map-Reduce 工作流，自动将复杂问题拆解为多个子查询并行检索，最后汇总成逻辑严密的综合方案。\n- **原生支持长程记忆**：自带的对话记忆模块让系统能自然承接多轮对话，工程师可像与同事交流一样连续追问。\n- **模块化极速落地**：凭借开箱即用的模块化架构，团队仅需配置少量参数即可切换大模型供应商，一天内完成原型部署。\n\nagentic-rag-for-dummies 将原本需要数周研发的高级代理检索能力，转化为可灵活组装的标准化模块，让企业能以最低成本构建具备“思考”能力的专业知识库。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGiovanniPasq_agentic-rag-for-dummies_c4c7470e.png","GiovanniPasq","Giovanni Pasqualino","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FGiovanniPasq_3b09d9d6.jpg","AI Engineer | PhD in Computer Science","BV TECH","Italy",null,"giovannipasq.github.io\u002FGiovanniPasq","https:\u002F\u002Fgithub.com\u002FGiovanniPasq",[83,87,91],{"name":84,"color":85,"percentage":86},"Jupyter Notebook","#DA5B0B",66.9,{"name":88,"color":89,"percentage":90},"Python","#3572A5",32.7,{"name":92,"color":93,"percentage":94},"Dockerfile","#384d54",0.5,3005,411,"2026-04-08T11:21:37","MIT","未说明","非必需。若使用本地 Ollama 运行大模型，建议根据模型大小配置相应 GPU；若使用云端 API (OpenAI, Anthropic, Google) 则无需本地 GPU。","未说明（取决于所选 LLM 模型大小及文档处理量）",{"notes":103,"python":104,"dependencies":105},"该项目支持多种 LLM 提供商（Ollama 本地部署或 OpenAI\u002FAnthropic\u002FGoogle 云端 API）。若使用 Ollama 本地运行，建议使用 7B 以上参数的模型以确保工具调用和指令遵循的可靠性。向量数据库使用 Qdrant（支持本地路径存储）。包含将 PDF 转换为 Markdown 的功能。可通过 Google Colab 在线运行演示。","3.11+",[106,107,108,109,110,111,112,113,114,115],"langgraph","langchain-ollama","langchain-openai","langchain-anthropic","langchain-google-genai","langchain-huggingface","qdrant-client","langchain-qdrant","pymupdf","pymupdf4llm",[14,16,13,117,35],"其他",[119,120,121,122,123,124,125,126,106,127,128,129,130,131,132,133,134,135,136],"agentic-ai","agentic-rag","agents","rag","agent","bm25","rag-chatbot","rag-pipeline","llm","retrieval-augmented-generation","retrieval-augmented-generation-rag","gradio","langchain","qdrant","ollama","ai-agents","generative-ai","rag-agents","2026-03-27T02:49:30.150509","2026-04-09T09:33:19.884069",[140,145,150,155,160,164],{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},25880,"使用 Ollama (如 Llama 3.2) 时，上传文档并提问后 AI 没有返回消息怎么办？","这是因为某些小模型（如 Llama 3.2）无法像 Qwen3 那样正确解析嵌入在系统提示词中的问答内容。维护者已修复了负责聚合响应的节点代码。建议更新到最新代码，或者尝试使用性能更好的模型，例如 `qwen3:4b-instruct-2507-q4_K_M` 替代 `llama3.2`。","https:\u002F\u002Fgithub.com\u002FGiovanniPasq\u002Fagentic-rag-for-dummies\u002Fissues\u002F1",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},25881,"如何在远程主机上配置 Ollama 的 
base_url？","虽然默认配置为了兼容性未包含此选项，但你可以通过修改代码来支持远程 Ollama 主机。具体步骤：\n1. 在 `config.py` 中添加变量：`LLM_BASEURL = \"http:\u002F\u002Fmyollamahost:11434\"`\n2. 在 `core\u002Frag_system.py` 初始化 LLM 时传入该参数：`llm = ChatOllama(model=config.LLM_MODEL, temperature=config.LLM_TEMPERATURE, base_url=config.LLM_BASEURL)`。\n未来版本的文档将包含此类自定义配置的详细说明。","https:\u002F\u002Fgithub.com\u002FGiovanniPasq\u002Fagentic-rag-for-dummies\u002Fissues\u002F3",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},25882,"项目文档中提到的代码应该放在哪个文件中？结构不清晰怎么办？","建议先跟随 Notebook 教程理解核心概念，再进行模块化实施。为了解决文件位置不明确的问题，维护者已在项目文件夹内添加了专门的 README，其中明确了各个文件的位置和用途。此外，项目中空的 `__init__.py` 文件是 Python 包结构的标准标记，用于确保模块正常导入，通常无需额外配置。","https:\u002F\u002Fgithub.com\u002FGiovanniPasq\u002Fagentic-rag-for-dummies\u002Fissues\u002F2",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},25883,"运行程序时遇到报错或没有输出，应该如何提供信息以便获得帮助？","为了有效诊断问题，请务必提供以下详细信息：\n1. 具体哪里出了问题（安装、特定功能还是整个笔记本？）；\n2. 完整的错误消息或 traceback；\n3. 出现问题前执行的具体步骤；\n4. 环境信息（Python 版本、操作系统、依赖安装方式如 pip 或 conda）；\n5. 预期行为与实际行为的对比。缺乏这些信息将无法定位问题。","https:\u002F\u002Fgithub.com\u002FGiovanniPasq\u002Fagentic-rag-for-dummies\u002Fissues\u002F4",{"id":161,"question_zh":162,"answer_zh":163,"source_url":154},25884,"文档版本与代码版本（如 v1.8）不一致，缺少配置说明怎么办？","维护者已根据反馈更新了文档，以匹配 v1.8 及更高版本的代码变更。更新内容包括补充了 `config.py` 中“文本分割配置”部分的说明，并增加了关于自定义配置的章节。请查看最新的提交记录或刷新文档页面以获取最新指南。",{"id":165,"question_zh":166,"answer_zh":167,"source_url":144},25885,"为什么推荐使用 Qwen3 而不是 Llama 3.2 作为本地模型？","根据测试，`qwen3:4b-instruct-2507-q4_K_M` 在性能上优于 `Llama 3.2 (3B)`。特别是在处理系统提示词时，Qwen3 能够更好地理解嵌入在其中的上下文（如聚合的问答对），而 Llama 3.2 可能会将其误认为是指令从而导致无响应。如果遇到问题，切换模型是一个有效的解决方案。",[169,174,179,184,189,194,199,204],{"id":170,"version":171,"summary_zh":172,"released_at":173},163209,"v2.1","新功能：\n1. 使用 Langfuse 实现可观测性\n   • 集成 Langfuse 跟踪功能，可追踪整个流水线中的 LLM 调用、工具使用以及图执行过程。\n   • 新增专用的 `observability.ipynb` 笔记本，提供逐步指南，帮助用户设置并使用 Langfuse 与系统集成。\n\n2. 流式响应与智能体透明度\n   • 响应现在以逐 token 的方式直接在 Gradio 界面中流式输出。\n   • 智能体的推理过程现可实时查看：工具调用（如 `search_child_chunks`、`retrieve_parent_chunks`）及其结果会在执行过程中以可折叠消息的形式展示。\n   • 系统节点（如查询改写、历史摘要）会逐步呈现其输出，包括在查询不明确时发出的澄清请求。\n\n错误修复：\n1. Docker 卷权限修复\n   • 修复了在使用 Docker 挂载卷时点击“清除全部”操作引发的 `PermissionError`。\n   • 现在通过移除目录内容而非删除根文件夹来清空目录，从而避免挂载点的权限问题。","2026-04-01T06:51:21",{"id":175,"version":176,"summary_zh":177,"released_at":178},163210,"v2.0","新特性：\n\n1. 上下文压缩\n    • 当上下文超过可配置的令牌阈值时，会压缩智能体的工作内存。\n    • 防止在长时间的检索循环中出现重复的工具调用。可通过 BASE_TOKEN_THRESHOLD 和 TOKEN_GROWTH_FACTOR 进行调整。\n\n2. 智能体限制与回退响应\n    • 引入了对工具调用次数（MAX_TOOL_CALLS）和推理循环迭代次数（MAX_ITERATIONS）的硬性上限，以确保执行过程的有界性。\n    • 当任一上限被达到时，智能体会回退到专门的响应节点，利用迄今为止已检索到的所有上下文生成尽可能好的答案，而不是静默失败。\n\n改进：\n\n1. 笔记本文档增强\n    • 为每个代码块添加了更丰富、更清晰的注释，并附上了官方文档的引用。\n    • 提高了代码的可读性，帮助用户更好地理解每个流水线组件。","2026-02-24T07:46:10",{"id":180,"version":181,"summary_zh":182,"released_at":183},163211,"v1.9","改进内容：\n\n1. 提升笔记本可读性  \n    • 更新笔记本，采用更清晰的格式和结构。  \n    • 改善用户体验，使代码和输出更易于理解。  \n\n2. 增强模块化架构文档  \n    • 扩充模块化架构的相关文档。  \n    • 更清楚地说明各组件之间的交互方式及扩展方法。  \n\n3. 更新系统提示词和工具输出  \n    • 优化提示词和输出内容，以提高准确性和一致性。  \n    • 增强流水线响应和结果的可靠性。  \n\n4. 更新依赖项  \n    • 将所有依赖项升级至最新版本。  \n    • 确保兼容性、安全性，并提升性能。  \n\n5. 更新 README 文件及链接  \n    • 修复并更新笔记本和图表的链接。  \n    • 简化资源的导航与访问流程。","2026-01-21T19:58:08",{"id":185,"version":186,"summary_zh":187,"released_at":188},163212,"v1.8","新功能：\n\n1. 教程笔记本中的多智能体 Map-Reduce\n    • 完全集成到笔记本中，便于用户动手实验。\n    • 使用户在处理复杂的 RAG 查询时，能够直接运行并检查多智能体流水线。\n\n2. 多种 PDF 转 Markdown 工具\n    • 在笔记本中新增了多种 PDF 到 Markdown 的转换工具选项。\n    • 提高了在处理不同 PDF 格式和提取质量时的灵活性和鲁棒性。\n\n改进：\n\n1. 
增强笔记本文档\n   • 扩展并澄清了每个代码块的注释，包括对官方文档的引用。\n   • 提升了可读性，帮助用户更好地理解每个流水线组件。\n\n2. 新增故障排除章节\n    • 引入了专门的故障排除章节，用于解决常见问题。\n","2025-12-18T15:59:15",{"id":190,"version":191,"summary_zh":192,"released_at":193},163213,"v1.7","新特性：\n1. 用于 RAG 查询的多智能体 Map-Reduce\n   • 将复杂查询分解为并行子查询，以生成更全面、更准确的答案。\n   • 增强了检索和生成步骤的并行化，从而提高响应效率。\n\n改进：\n1. 代码库重构\n    • 进行了结构化清理，以提升代码的可读性和可维护性。\n    • 简化了各组件，便于后续扩展和未来开发。\n\n2. 更新的教程笔记本\n   • 优化了说明和示例，使学习流程更加顺畅。\n   • 增加了更清晰的指导，帮助用户更好地理解并高效运行整个流水线。","2025-12-08T16:59:21",{"id":195,"version":196,"summary_zh":197,"released_at":198},163214,"v1.6","新功能：\n1. 用于 RAG 流水线的端到端 Gradio 界面\n   • 集成了功能完善的 Gradio 界面，以简化与 RAG 流水线的交互。\n   • 新增通过文件上传导入 PDF 的功能，用于填充知识库。\n   • 实现了文档删除功能，允许用户从系统中移除不需要的内容。\n\n2. 模块化项目结构\n   • 重构代码库，采用模块化架构。\n   • 提升了代码的可维护性、可读性和可扩展性，便于扩展和集成新组件。","2025-11-15T13:54:36",{"id":200,"version":201,"summary_zh":202,"released_at":203},163215,"v1.5","新功能：\n1. 基于摘要的对话记忆\n    • 实现了一种记忆机制，通过总结过往交互来保持整个对话的上下文连贯性。\n\n2. 人机协作的查询澄清\n    • 引入了一个交互式澄清步骤，在查询存在歧义或不完整时提示用户。\n    • 允许系统在继续执行检索和生成之前收集更多上下文信息。","2025-10-31T16:07:54",{"id":205,"version":206,"summary_zh":207,"released_at":208},163216,"v1.0","初始发布：\n\n1. 智能体式 RAG 流水线\n    • 首次公开发布基于 LangGraph 构建的极简智能体式 RAG 系统。\n    • 采用父子块切分策略实现层次化索引，以支持精准且富含上下文的检索。\n    • 集成 Qdrant 的混合检索功能，同时使用稠密和稀疏向量表示。","2025-10-20T21:24:09"]