[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Danielskry--Awesome-RAG":3,"tool-Danielskry--Awesome-RAG":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":79,"owner_twitter":79,"owner_website":79,"owner_url":81,"languages":79,"stars":82,"forks":83,"last_commit_at":84,"license":85,"difficulty_score":86,"env_os":87,"env_gpu":88,"env_ram":88,"env_deps":89,"category_tags":92,"github_topics":93,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":99,"updated_at":100,"faqs":101,"releases":102},3894,"Danielskry\u002FAwesome-RAG","Awesome-RAG","😎 Awesome list of Retrieval-Augmented Generation (RAG) applications in Generative AI.","Awesome-RAG 是一份精心整理的生成式 AI 资源清单，专注于检索增强生成（RAG）技术。它就像一张详细的“生态地图”，汇集了构建 RAG 系统所需的工具、框架、架构模式、实战技巧及学习材料。\n\n传统大语言模型往往受限于训练数据的截止时间，且容易产生“幻觉”或无法回答私有领域问题。Awesome-RAG 旨在解决这些痛点，帮助开发者通过动态检索外部知识库，让模型能够利用最新、特定领域或专有的信息进行回答。这不仅显著提升了回答的准确性，还降低了微调成本，并增强了数据来源的透明度与安全性。\n\n这份资源特别适合 AI 开发者、研究人员以及希望将大模型落地到具体业务场景的技术团队。无论你是想从零开始搭建原型，还是寻求生产环境下的优化策略，都能在这里找到权威指南。其独特亮点在于内容覆盖极广，从基础的 Python 教程（如 LangChain、LlamaIndex 实战）到高级的架构设计、评估指标乃至生产级最佳实践，均提供了直接的链接与说明，是探索和建设 RAG 应用的一站式入门与进阶宝典。","# 😎 Awesome Retrieval Augmented Generation (RAG) \n[![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge-flat.svg)](https:\u002F\u002Fawesome.re) [![Ask 
DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002FDanielskry\u002FAwesome-RAG)\n\nA curated resource map of tools, frameworks, techniques, and learning materials for building Retrieval-Augmented Generation (RAG) systems. This repository catalogs the RAG ecosystem and provides links to authoritative sources, tutorials, and implementations to help you explore and build RAG applications.\n\n## Overview\n\n**Retrieval-Augmented Generation (RAG)** is a sophisticated technique in Generative AI that enhances Large Language Models (LLMs) by dynamically retrieving and incorporating relevant context from external knowledge sources during the generation process. Unlike traditional LLMs that rely solely on pre-trained knowledge, RAG systems enable models to access up-to-date, domain-specific, or proprietary information, significantly improving accuracy, reducing hallucinations, and enabling real-time knowledge integration.\n\n### Key Benefits\n\n- **Reduced Hallucinations**: Grounds responses in retrieved factual information\n- **Domain Adaptation**: Enables LLMs to work with specialized knowledge without fine-tuning\n- **Real-time Updates**: Incorporates latest information without model retraining\n- **Cost Efficiency**: More economical than fine-tuning for domain-specific tasks\n- **Transparency**: Provides source attribution for generated content\n- **Privacy & Security**: Keeps sensitive data in private knowledge bases\n\n## Content\n\n- [ℹ️ General Information on RAG](#ℹ%EF%B8%8F-general-information-on-rag)\n- [🏗️ Architecture Patterns](#%EF%B8%8F-architecture-patterns)\n- [🎯 Advanced Approaches](#-advanced-approaches)\n- [🧰 Frameworks that Facilitate RAG](#-frameworks-that-facilitate-rag)\n- [🐍 Python Ecosystem for RAG](#-python-ecosystem-for-rag)\n- [🛠️ Techniques](#-techniques)\n- [📊 Metrics & Evaluation](#-metrics--evaluation)\n- [💾 Databases](#-databases)\n- [🔌 Platform-Specific RAG 
Implementations](#-platform-specific-rag-implementations)\n- [🚀 Production Considerations](#-production-considerations)\n- [💡 Best Practices](#-best-practices)\n\n## ℹ️ General Information on RAG\n\nRAG addresses a fundamental limitation of LLMs: their static knowledge cutoff and inability to access external information. Traditional RAG implementations employ a retrieval pipeline that enriches LLM prompts with contextually relevant documents from a knowledge base. For example, when querying about renovation materials for a specific house, the LLM may have general renovation knowledge but lacks details about that particular property. A RAG system can retrieve relevant documents (e.g., blueprints, material specifications, local building codes) to provide accurate, context-aware responses.\n\n### Implementation Resources\n\n#### Python Tutorials & Examples\n\n- Complete basic [RAG implementation in Python](https:\u002F\u002Fgithub.com\u002FDanielskry\u002FLangChain-Chroma-RAG-demo-2024): Full-stack RAG example with LangChain and Chroma\n- [LangChain RAG Tutorial](https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fuse_cases\u002Fquestion_answering\u002F): Comprehensive guide to building RAG applications\n- [LlamaIndex RAG Tutorial](https:\u002F\u002Fdocs.llamaindex.ai\u002Fen\u002Fstable\u002Fgetting_started\u002Fstarter_example\u002F): Getting started with LlamaIndex for RAG\n- [Haystack RAG Pipeline](https:\u002F\u002Fdocs.haystack.deepset.ai\u002Fdocs\u002Fretrieval-augmented-generation): Building RAG pipelines with Haystack\n\n#### Production & Best Practices\n\n- [Production RAG patterns and best practices](https:\u002F\u002Fdocs.llamaindex.ai\u002Fen\u002Fstable\u002Foptimizing\u002Fproduction_rag\u002F): Production-ready RAG optimization strategies\n- [LangChain Production Guide](https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fproduction\u002F): Deploying LangChain applications to production\n- [Python Async Best 
Practices](https:\u002F\u002Fdocs.python.org\u002F3\u002Flibrary\u002Fasyncio-dev.html): Writing efficient async Python code for AI applications\n\n## 🏗️ Architecture Patterns\n\nRAG systems can be architected using various patterns depending on requirements:\n\n- **Naive RAG**: Basic retrieve-then-generate pipeline without optimization\n- **Advanced RAG**: Incorporates query rewriting, re-ranking, and context compression\n- **Modular RAG**: Composable components for retrieval, ranking, and generation\n- **Agentic RAG**: LLM-driven agents that make retrieval decisions dynamically\n- **Self-RAG**: Models that self-reflect on retrieval quality and adjust strategies\n- **Graph RAG**: Leverages knowledge graphs for structured information retrieval\n\n## 🎯 Advanced Approaches\n\nRAG implementations vary in complexity, from simple document retrieval to advanced techniques integrating iterative feedback loops, multi-agent systems, and domain-specific enhancements. Modern approaches include:\n\n- [Vision-RAG](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=npkp4mSweEg): Embeds entire pages as images, allowing vision models to handle reasoning directly, without the text parsing that text-based RAG requires.\n- [Cache-Augmented Generation (CAG)](https:\u002F\u002Fmedium.com\u002F@ronantech\u002Fcache-augmented-generation-cag-in-llms-a-step-by-step-tutorial-6ac35d415eec): Preloads relevant documents into a model’s context and stores the inference state (Key-Value (KV) cache).\n- [Agentic RAG](https:\u002F\u002Flangchain-ai.github.io\u002Flanggraph\u002Ftutorials\u002Frag\u002Flanggraph_agentic_rag\u002F): Also known as retrieval agents, which can make decisions about the retrieval process.\n- [A-RAG](https:\u002F\u002Fgithub.com\u002FAyanami0730\u002Farag): Agentic RAG with hierarchical retrieval interfaces (keyword, semantic, chunk-level), enabling LLM agents to autonomously search and retrieve at multiple granularities. 
([Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.03442))\n- [Corrective RAG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.15884.pdf) (CRAG): Methods to correct or refine the retrieved information before integration into LLM responses.\n- [Retrieval-Augmented Fine-Tuning](https:\u002F\u002Ftechcommunity.microsoft.com\u002Ft5\u002Fai-ai-platform-blog\u002Fraft-a-new-way-to-teach-llms-to-be-better-at-rag\u002Fba-p\u002F4084674) (RAFT): Techniques to fine-tune LLMs specifically for enhanced retrieval and generation tasks.\n- [Self Reflective RAG](https:\u002F\u002Fselfrag.github.io\u002F): Models that dynamically adjust retrieval strategies based on model performance feedback.\n- [RAG Fusion](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.03367): Techniques combining multiple retrieval methods for improved context integration.\n- [Temporal Augmented Retrieval](https:\u002F\u002Fadam-rida.medium.com\u002Ftemporal-augmented-retrieval-tar-dynamic-rag-ad737506dfcc) (TAR): Considering time-sensitive data in retrieval processes.\n- [Plan-then-RAG](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.12430) (PlanRAG): Strategies involving planning stages before executing RAG for complex tasks.\n- [GraphRAG](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fgraphrag): A structured approach using knowledge graphs for enhanced context integration and reasoning.\n- [Code-Graph-RAG](https:\u002F\u002Fgithub.com\u002Fvitali87\u002Fcode-graph-rag): A knowledge graph RAG system for multi-language codebase analysis.\n- [FLARE](https:\u002F\u002Fmedium.com\u002Fetoai\u002Fbetter-rag-with-active-retrieval-augmented-generation-flare-3b66646e2a9f) - An approach that incorporates active retrieval-augmented generation to improve response quality.\n- [GNN-RAG](https:\u002F\u002Fgithub.com\u002Fcmavro\u002FGNN-RAG): Graph neural retrieval for large language modeling reasoning.\n- [Multimodal 
RAG](https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fan-easy-introduction-to-multimodal-retrieval-augmented-generation\u002F): Extends RAG to handle multiple modalities such as text, images, and audio.\n- [VideoRAG](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.05874): Extends RAG to videos using Large Video Language Models (LVLMs) to retrieve and integrate visual and textual content for multimodal generation.\n- [REFRAG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2509.01092): Optimizes RAG decoding by compressing retrieved context into embeddings before generation, reducing latency while maintaining output quality.\n- [InstructRAG](https:\u002F\u002Fgithub.com\u002Fweizhepei\u002FInstructRAG): Enhances RAG systems through instruction-based fine-tuning using self-synthesized rationales to improve retrieval and generation quality.\n\n## 🧰 Frameworks that Facilitate RAG\n\n- [Haystack](https:\u002F\u002Fgithub.com\u002Fdeepset-ai\u002Fhaystack): LLM orchestration framework to build customizable, production-ready LLM applications.\n- [LangChain](https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fmodules\u002Fdata_connection\u002F): An all-purpose framework for working with LLMs.\n- [Semantic Kernel](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fsemantic-kernel): An SDK from Microsoft for developing Generative AI applications.\n- [LlamaIndex](https:\u002F\u002Fdocs.llamaindex.ai\u002Fen\u002Fstable\u002Foptimizing\u002Fproduction_rag\u002F): Framework for connecting custom data sources to LLMs.\n- [Dify](https:\u002F\u002Fgithub.com\u002Flanggenius\u002Fdify): An open-source LLM app development platform.\n- [Cognita](https:\u002F\u002Fgithub.com\u002Ftruefoundry\u002Fcognita): Open-source RAG framework for building modular and production-ready applications.\n- [Verba](https:\u002F\u002Fgithub.com\u002Fweaviate\u002FVerba): Open-source application for RAG out of the box.\n- [Mastra](https:\u002F\u002Fgithub.com\u002Fmastra-ai\u002Fmastra): TypeScript 
framework for building AI applications.\n- [Letta](https:\u002F\u002Fgithub.com\u002Fletta-ai\u002Fletta): Open-source framework for building stateful LLM applications.\n- [Flowise](https:\u002F\u002Fgithub.com\u002FFlowiseAI\u002FFlowise): Drag & drop UI to build customized LLM flows.\n- [Kreuzberg](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg): Polyglot document intelligence library (Rust core with Python, TypeScript, Go bindings) that extracts text, tables, and metadata from 62+ document formats for RAG ingestion pipelines.\n- [Swiftide](https:\u002F\u002Fgithub.com\u002Fbosun-ai\u002Fswiftide): Rust framework for building modular, streaming LLM applications.\n- [CocoIndex](https:\u002F\u002Fgithub.com\u002Fcocoindex-io\u002Fcocoindex): ETL framework to index data for AI, such as RAG, with real-time incremental updates.\n- [Pathway](https:\u002F\u002Fgithub.com\u002Fpathwaycom\u002Fpathway\u002F): Performant open-source Python ETL framework with Rust runtime, supporting 300+ data sources.\n- [Pathway AI Pipelines](https:\u002F\u002Fgithub.com\u002Fpathwaycom\u002Fllm-app\u002F): A production-ready RAG framework supporting real-time indexing, retrieval, and change tracking across diverse data sources.\n- [LiteLLM](https:\u002F\u002Fdocs.litellm.ai\u002F): Unified interface for multiple LLM providers (OpenAI, Anthropic, Hugging Face, Replicate) with logging, monitoring, and cost tracking.\n- [Agentset](https:\u002F\u002Fgithub.com\u002Fagentset-ai\u002Fagentset): Open-source production-ready RAG platform with built-in agentic reasoning, hybrid search, and multimodal support.\n\n## 🐍 Python Ecosystem for RAG\n\nPython is the most mature ecosystem for RAG today, with extensive support for\nLLMs, embeddings, vector databases, evaluation, and production tooling.\n\nSee the full guide: [Python Ecosystem for RAG](docs\u002Fpython-ecosystem.md)\n\n## 🛠️ Techniques\n\n### Data cleaning\n\n- [Data cleaning 
techniques](https:\u002F\u002Fmedium.com\u002Fintel-tech\u002Ffour-data-cleaning-techniques-to-improve-large-language-model-llm-performance-77bee9003625): Pre-processing steps to refine input data and improve model performance.\n\n### Prompting\n\n- **Strategies**\n  - [Tagging and Labeling](https:\u002F\u002Fpython.langchain.com\u002Fv0.1\u002Fdocs\u002Fuse_cases\u002Ftagging\u002F): Adding semantic tags or labels to retrieved data to enhance relevance.\n  - [Chain of Thought (CoT)](https:\u002F\u002Fwww.promptingguide.ai\u002Ftechniques\u002Fcot): Encouraging the model to think through problems step by step before providing an answer.\n  - [Chain of Verification (CoVe)](https:\u002F\u002Fsourajit16-02-93.medium.com\u002Fchain-of-verification-cove-understanding-implementation-e7338c7f4cb5): Prompting the model to verify each step of its reasoning for accuracy.\n  - [Self-Consistency](https:\u002F\u002Fwww.promptingguide.ai\u002Ftechniques\u002Fconsistency): Generating multiple reasoning paths and selecting the most consistent answer.\n  - [Zero-Shot Prompting](https:\u002F\u002Fwww.promptingguide.ai\u002Ftechniques\u002Fzeroshot): Designing prompts that guide the model without any examples.\n  - [Few-Shot Prompting](https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fhow_to\u002Ffew_shot_examples\u002F): Providing a few examples in the prompt to demonstrate the desired response format.\n  - [Reason & Act (ReAct) prompting](https:\u002F\u002Fwww.promptingguide.ai\u002Ftechniques\u002Freact): Combines reasoning (e.g. CoT) with acting (e.g. 
tool calling).\n- **Caching**\n  - [Prompt Caching](https:\u002F\u002Fmedium.com\u002F@1kg\u002Fprompt-cache-what-is-prompt-caching-a-comprehensive-guide-e6cbae48e6a3): Optimizes LLMs by storing and reusing precomputed attention states.\n- **Structuring**\n  -  [Token-Oriented Object Notation](https:\u002F\u002Fgithub.com\u002Ftoon-format\u002Ftoon): A compact, deterministic JSON format for LLM prompts.\n\n### Chunking\n\nChunking strategy is one of the most critical decisions in RAG system design, directly impacting retrieval precision and context quality. The optimal approach depends on document types, domain characteristics, and query patterns.\n\n- **[Fixed-Size Chunking](https:\u002F\u002Fmedium.com\u002F@anuragmishra_27746\u002Ffive-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d)**\n  - **Use Case**: Simple, uniform documents where structure is less important\n  - **Characteristics**: Divides text into consistent-sized segments (typically 256-512 tokens) with configurable overlap (10-20%)\n  - **Pros**: Simple to implement, predictable chunk sizes, efficient processing\n  - **Cons**: May split sentences\u002Fparagraphs, loses document structure, can fragment semantic units\n  - **Implementation**: [CharacterTextSplitter](https:\u002F\u002Fpython.langchain.com\u002Fv0.1\u002Fdocs\u002Fmodules\u002Fdata_connection\u002Fdocument_transformers\u002Fcharacter_text_splitter\u002F) (LangChain), [SentenceSplitter](https:\u002F\u002Fdocs.llamaindex.ai\u002Fen\u002Fstable\u002Fapi_reference\u002Fnode_parsers\u002Fsentence_splitter\u002F) (LlamaIndex)\n\n- **[Recursive Chunking](https:\u002F\u002Fmedium.com\u002F@AbhiramiVS\u002Fchunking-methods-all-to-know-about-it-65c10aa7b24e)**\n  - **Use Case**: Documents with hierarchical structure (markdown, HTML, code)\n  - **Characteristics**: Recursively splits by separators (paragraphs → sentences → words) until desired chunk size\n  - **Pros**: Preserves natural boundaries, respects document 
hierarchy, better semantic coherence\n  - **Cons**: More complex, variable chunk sizes, requires careful separator configuration\n  - **Implementation**: [RecursiveCharacterTextSplitter](https:\u002F\u002Fpython.langchain.com\u002Fv0.1\u002Fdocs\u002Fmodules\u002Fdata_connection\u002Fdocument_transformers\u002Frecursive_text_splitter\u002F) (LangChain)\n\n- **[Document-Based Chunking](https:\u002F\u002Fmedium.com\u002F@david.richards.tech\u002Fdocument-chunking-for-rag-ai-applications-04363d48fbf7)**\n  - **Use Case**: Structured documents with clear sections (markdown headers, PDF sections, database records)\n  - **Characteristics**: Segments based on document metadata, formatting cues, or structural elements\n  - **Pros**: Maintains document structure, preserves context, enables metadata-rich retrieval\n  - **Cons**: Requires structured input, may create very large or very small chunks\n  - **Implementation**: [MarkdownHeaderTextSplitter](https:\u002F\u002Fpython.langchain.com\u002Fv0.1\u002Fdocs\u002Fmodules\u002Fdata_connection\u002Fdocument_transformers\u002Fmarkdown_header_metadata\u002F) (LangChain)\n  - **Multimodal**: Handle images and text with models like [OpenCLIP](https:\u002F\u002Fgithub.com\u002Fmlfoundations\u002Fopen_clip)\n\n- **[Semantic Chunking](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8OJC21T2SL4&t=1933s)**\n  - **Use Case**: Documents where semantic coherence is critical (narratives, technical documentation)\n  - **Characteristics**: Uses embedding similarity to identify natural semantic boundaries\n  - **Pros**: Preserves semantic units, adapts to content, improves retrieval relevance\n  - **Cons**: Computationally expensive, requires embedding model, less predictable chunk sizes\n  - **Best For**: High-quality retrieval where context preservation is paramount\n\n- **[Agentic Chunking](https:\u002F\u002Fyoutu.be\u002F8OJC21T2SL4?si=8VnYaGUaBmtZhCsg&t=2882)**\n  - **Use Case**: Complex documents requiring intelligent segmentation 
decisions\n  - **Characteristics**: Uses LLMs to analyze content and determine optimal chunk boundaries\n  - **Pros**: Highly adaptive, understands context, can apply domain knowledge\n  - **Cons**: High cost, slower processing, requires LLM API access\n  - **Best For**: Specialized domains where standard chunking fails\n\n**Chunking Best Practices:**\n- **Overlap Strategy**: Use 10-20% overlap to maintain context across boundaries\n- **Size Optimization**: Balance chunk size (larger = more context, smaller = better precision)\n- **Metadata Preservation**: Retain document structure, headers, and formatting in chunk metadata\n- **Multi-Granularity**: Consider hierarchical approaches (small chunks for retrieval, larger for context)\n\n### Embeddings\n\nEmbeddings are the foundation of semantic search in RAG systems. The choice of embedding model significantly impacts retrieval quality.\n\n- **Model Selection**\n  - **[MTEB Leaderboard](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fmteb\u002Fleaderboard)**: Comprehensive benchmark for evaluating embedding models across multiple tasks and languages. Consider models that perform well on tasks relevant to your use case (retrieval, clustering, classification).\n  - **Model Characteristics**: Evaluate models based on:\n    - **Dimensions**: Higher dimensions (768-1024) generally offer better quality but increase storage and compute costs\n    - **Context Length**: Ensure models support your document chunk sizes\n    - **Multilingual Support**: Required for international applications\n    - **Domain Specialization**: General-purpose vs. 
domain-specific (e.g., scientific, legal, medical)\n  \n- **Custom Embeddings**\n  - **Fine-tuning**: Adapt pre-trained models to your domain using contrastive learning, triplet loss, or supervised fine-tuning\n  - **Training from Scratch**: For highly specialized domains with sufficient labeled data\n  - **Multi-Modal Embeddings**: For applications requiring text, image, or audio understanding (e.g., CLIP, ImageBind)\n  - **Ensemble Methods**: Combine multiple embedding models for improved robustness\n\n### Retrieval\n\n- **Search Methods**\n  - [Vector Store Flat Index](https:\u002F\u002Fweaviate.io\u002Fdevelopers\u002Facademy\u002Fpy\u002Fvector_index\u002Fflat)\n    - Simple and efficient form of retrieval.\n    - Content is vectorized and stored as flat content vectors.\n  - [Hierarchical Index Retrieval](https:\u002F\u002Fpixion.co\u002Fblog\u002Frag-strategies-hierarchical-index-retrieval)\n    - Hierarchically narrows data into different levels.\n    - Executes retrievals in hierarchical order.\n  - [Hypothetical Questions](https:\u002F\u002Fpixion.co\u002Fblog\u002Frag-strategies-hypothetical-questions-hyde)\n    - Used to increase similarity between database chunks and queries (same idea as HyDE).\n    - An LLM is used to generate specific questions for each text chunk.\n    - Converts these questions into vector embeddings.\n    - During search, matches queries against this index of question vectors.\n  - [Hypothetical Document Embeddings (HyDE)](https:\u002F\u002Fpixion.co\u002Fblog\u002Frag-strategies-hypothetical-questions-hyde)\n    - Used to increase similarity between database chunks and queries (same idea as Hypothetical Questions).\n    - An LLM is used to generate a hypothetical response based on the query.\n    - Converts this response into a vector embedding.\n    - Compares the query vector with the hypothetical response vector.\n  - [Small to Big 
Retrieval](https:\u002F\u002Fgithub.com\u002FGoogleCloudPlatform\u002Fgenerative-ai\u002Fblob\u002Fmain\u002Fgemini\u002Fuse-cases\u002Fretrieval-augmented-generation\u002Fsmall_to_big_rag\u002Fsmall_to_big_rag.ipynb)\n    - Improves retrieval by using smaller chunks for search and larger chunks for context.\n    - Smaller child chunks refer back to their bigger parent chunks.\n  - [Contextual Retrieval](https:\u002F\u002Fwww.anthropic.com\u002Fengineering\u002Fcontextual-retrieval)\n    - Enhances RAG retrieval accuracy by preserving document context that is typically lost during chunking.\n    - Each text chunk is enriched with a short, model-generated summary before embedding and indexing, resulting in Contextual Embeddings and Contextual BM25.\n    - This combined approach improves both semantic and lexical matching, reducing retrieval failure rates when paired with reranking.\n  - [Adaptive Retrieval](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.14403)\n    - Dynamically decides when and how much to retrieve during generation.\n  - [Query Reformulation and Expansion](https:\u002F\u002Fhaystack.deepset.ai\u002Fcookbook\u002Fquery-expansion)\n    - Automatically rewrites or expands the query before retrieval to boost recall.\n    - Useful for long or ambiguous user queries.\n- **[Re-ranking](https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fenhancing-rag-pipelines-with-re-ranking\u002F)**: Enhances search results in RAG pipelines by reordering initially retrieved documents, prioritizing those most semantically relevant to the query.\n\n### Response Quality & Safety\n\nEnsuring high-quality, safe, and reliable responses is critical for production RAG systems.\n\n- **Hallucination Mitigation**\n  - **[Detection Techniques](https:\u002F\u002Fmachinelearningmastery.com\u002Frag-hallucination-detection-techniques\u002F)**: Implement methods to identify when models generate unsupported information\n  - **Grounding Verification**: Cross-reference generated claims with 
retrieved context\n  - **Confidence Scoring**: Assign confidence scores to generated responses based on source quality\n  - **Source Attribution**: Require citations for all factual claims\n  - **Retrieval Quality**: Improve retrieval precision to reduce hallucination risk\n\n- **Guardrails & Safety**\n  - **[Implementation Guide](https:\u002F\u002Fdeveloper.ibm.com\u002Ftutorials\u002Fawb-how-to-implement-llm-guardrails-for-rag-applications\u002F)**: Comprehensive approach to implementing safety mechanisms\n  - **Content Moderation**: Filter harmful, biased, or inappropriate content at input and output stages\n  - **Bias Mitigation**: Detect and mitigate biases in retrieved content and generated responses\n  - **Fact-Checking**: Verify claims against authoritative sources or knowledge bases\n  - **Toxicity Detection**: Use classifiers to identify and filter toxic content\n\n- **Prompt Injection Prevention**\n  - **[Security Guide](https:\u002F\u002Fhiddenlayer.com\u002Finnovation-hub\u002Fprompt-injection-attacks-on-llms\u002F)**: Understanding and preventing prompt injection attacks\n  - **Input Validation**: Rigorously validate and sanitize all external inputs using whitelisting, length limits, and pattern matching\n  - **Content Separation**: Use clear delimiters, templating systems, and role-based prompts to separate instructions from user data\n  - **Output Monitoring**: Continuously monitor responses for anomalies, unexpected behaviors, or security violations\n  - **Rate Limiting**: Implement rate limits and abuse detection to prevent systematic attacks\n  - **Sandboxing**: Isolate LLM execution environments to limit potential damage from successful injections\n\n## 📊 Metrics & Evaluation\n\n### Similarity Metrics for Embeddings\n\nThese metrics are used to measure the similarity between embeddings, which is crucial for evaluating how effectively RAG systems retrieve and integrate external documents or data sources. 
By selecting appropriate similarity metrics, you can optimize the performance and accuracy of your RAG system. Alternatively, you may develop custom metrics tailored to your specific domain or niche to capture domain-specific nuances and improve relevance.\n\n- **[Cosine Similarity](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FCosine_similarity)**\n\n  - Measures the cosine of the angle between two vectors in a multi-dimensional space.\n  - Highly effective for comparing text embeddings where the direction of the vectors represents semantic information.\n  - Commonly used in RAG systems to measure semantic similarity between query embeddings and document embeddings.\n\n- **[Dot Product](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FDot_product)**\n\n  - Calculates the sum of the products of corresponding entries of two sequences of numbers.\n  - Equivalent to cosine similarity when vectors are normalized.\n  - Simple and efficient, often used with hardware acceleration for large-scale computations.\n\n- **[Euclidean Distance](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FEuclidean_distance)**\n\n  - Computes the straight-line distance between two points in Euclidean space.\n  - Can be used with embeddings but may lose effectiveness in high-dimensional spaces due to the \"[curse of dimensionality](https:\u002F\u002Fstats.stackexchange.com\u002Fquestions\u002F99171\u002Fwhy-is-euclidean-distance-not-a-good-metric-in-high-dimensions).\"\n  - Often used in clustering algorithms like K-means after dimensionality reduction.\n\n- **[Jaccard Similarity](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FJaccard_index)**\n  - Measures the similarity between two finite sets as the size of the intersection divided by the size of the union of the sets.\n  - Useful when comparing sets of tokens, such as in bag-of-words models or n-gram comparisons.\n  - Less applicable to continuous embeddings produced by LLMs.\n\n> **Note:** Cosine Similarity and Dot Product are 
generally seen as the most effective metrics for measuring similarity between high-dimensional embeddings.\n\n### Response Evaluation Metrics\n\nResponse evaluation in RAG solutions involves assessing the quality of language model outputs using diverse metrics. Here are structured approaches to evaluating these responses:\n\n- **Automated Benchmarking**\n\n  - **[BLEU](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FBLEU):** Evaluates the overlap of n-grams between machine-generated and reference outputs, providing insight into precision.\n  - **[ROUGE](\u003Chttps:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FROUGE_(metric)>):** Measures recall by comparing n-grams, skip-bigrams, or longest common subsequence with reference outputs.\n  - **[METEOR](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FMETEOR):** Focuses on exact matches, stemming, synonyms, and alignment for machine translation.\n\n- **Human Evaluation**\n  Involves human judges assessing responses for:\n  - **Relevance:** Alignment with user queries.\n  - **Fluency:** Grammatical and stylistic quality.\n  - **Factual Accuracy:** Verifying claims against authoritative sources.\n  - **Coherence:** Logical consistency within responses.\n  \n  Approaches include:\n  - **[Annotation queues](https:\u002F\u002Fdocs.langchain.com\u002Flangsmith\u002Fannotation-queues):** provides a streamlined, directed view for human annotators to attach feedback to specific runs.\n\n- **Model Evaluation**\n  Leverages pre-trained evaluators to benchmark outputs against diverse criteria:\n\n  - **[TuringBench](https:\u002F\u002Fturingbench.ist.psu.edu\u002F):** Offers comprehensive evaluations across language benchmarks.\n  - **[Hugging Face Evaluate](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fevaluate\u002Fen\u002Findex):** Calculates alignment with human preferences.\n\n- **Key Dimensions for Evaluation**\n  - **Groundedness:** Assesses if responses are based entirely on provided context. 
Low groundedness may indicate reliance on hallucinated or irrelevant information.\n  - **Completeness:** Measures whether the response answers all aspects of a query.\n  - **Approaches:** AI-assisted retrieval scoring and prompt-based intent verification.\n  - **Utilization:** Evaluates the extent to which retrieved data contributes to the response.\n  - **Analysis:** Use LLMs to check the inclusion of retrieved chunks in responses.\n\n#### Tools\n\nThese tools can assist in evaluating the performance of your RAG system, from tracking user feedback to logging query interactions and comparing multiple evaluation metrics over time.\n\n- **[LangFuse](https:\u002F\u002Fgithub.com\u002Flangfuse\u002Flangfuse)**: Open-source tool for tracking LLM metrics, observability, and prompt management.\n- **[Opik](https:\u002F\u002Fgithub.com\u002Fcomet-ml\u002Fopik)**: Open-source platform for LLM observability, evaluations, and prompt optimization.\n- **[Ragas](https:\u002F\u002Fdocs.ragas.io\u002Fen\u002Fstable\u002F)**: Framework that helps evaluate RAG pipelines.\n- **[LangSmith](https:\u002F\u002Fdocs.smith.langchain.com\u002F)**: A platform for building production-grade LLM applications that lets you closely monitor and evaluate your application.\n- **[Hugging Face Evaluate](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate)**: Tool for computing metrics like BLEU and ROUGE to assess text quality.\n- **[Weights & Biases](https:\u002F\u002Fwandb.ai\u002Fwandb-japan\u002Frag-hands-on\u002Freports\u002FStep-for-developing-and-evaluating-RAG-application-with-W-B--Vmlldzo1NzU4OTAx)**: Tracks experiments, logs metrics, and visualizes performance.\n\n## 💾 Databases\n\nVector databases are critical components of RAG systems, providing efficient storage and similarity search capabilities for embeddings. The selection of an appropriate database depends on factors such as scale, latency requirements, deployment model (cloud vs. 
on-premises), and feature needs (hybrid search, filtering, etc.). The list below features database systems suitable for RAG applications:\n\n### Benchmarks\n\n- [Picking a vector database](https:\u002F\u002Fbenchmark.vectorview.ai\u002Fvectordbs.html)\n\n### Distributed Data Processing and Serving Engines:\n\n- [Apache Cassandra](https:\u002F\u002Fcassandra.apache.org\u002Fdoc\u002Flatest\u002Fcassandra\u002Fvector-search\u002Fconcepts.html): Distributed NoSQL database management system.\n- [MongoDB Atlas](https:\u002F\u002Fwww.mongodb.com\u002Fproducts\u002Fplatform\u002Fatlas-vector-search): Globally distributed, multi-model database service with integrated vector search.\n- [Vespa](https:\u002F\u002Fvespa.ai\u002F): Open-source big data processing and serving engine designed for real-time applications.\n\n### Search Engines with Vector Capabilities:\n\n- [Elasticsearch](https:\u002F\u002Fwww.elastic.co\u002Felasticsearch): Provides vector search capabilities along with traditional search functionalities.\n- [OpenSearch](https:\u002F\u002Fgithub.com\u002Fopensearch-project\u002FOpenSearch): Distributed search and analytics engine, forked from Elasticsearch.\n\n### Vector Databases:\n\n- [Chroma DB](https:\u002F\u002Fgithub.com\u002Fchroma-core\u002Fchroma): An AI-native open-source embedding database.\n- [Milvus](https:\u002F\u002Fgithub.com\u002Fmilvus-io\u002Fmilvus): An open-source vector database for AI-powered applications.\n- [Pinecone](https:\u002F\u002Fwww.pinecone.io\u002F): A serverless vector database, optimized for machine learning workflows.\n- [Oracle AI Vector Search](https:\u002F\u002Fwww.oracle.com\u002Fdatabase\u002Fai-vector-search\u002F#retrieval-augmented-generation): Integrates vector search capabilities within Oracle Database for semantic querying based on vector embeddings.\n\n### Relational Database Extensions:\n\n- [Pgvector](https:\u002F\u002Fgithub.com\u002Fpgvector\u002Fpgvector): An open-source extension for vector similarity search 
in PostgreSQL.\n\n### Other Database Systems:\n\n- [Azure Cosmos DB](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fcosmos-db\u002Fvector-database): Globally distributed, multi-model database service with integrated vector search.\n- [Couchbase](https:\u002F\u002Fwww.couchbase.com\u002Fproducts\u002Fvector-search\u002F): A distributed NoSQL cloud database.\n- [Lantern](https:\u002F\u002Flantern.dev\u002F): A privacy-aware personal search engine.\n- [LlamaIndex](https:\u002F\u002Fdocs.llamaindex.ai\u002Fen\u002Fstable\u002Fmodule_guides\u002Fstoring\u002Fvector_stores\u002F): Employs a straightforward in-memory vector store for rapid experimentation.\n- [Neo4j](https:\u002F\u002Fneo4j.com\u002Fdocs\u002Fcypher-manual\u002Fcurrent\u002Findexes\u002Fsemantic-indexes\u002Fvector-indexes\u002F): Graph database management system.\n- [Qdrant](https:\u002F\u002Fgithub.com\u002Fqdrant\u002Fqdrant): An open-source vector database designed for similarity search.\n- [Redis Stack](https:\u002F\u002Fredis.io\u002Fdocs\u002Flatest\u002Fdevelop\u002Finteract\u002Fsearch-and-query\u002F): An in-memory data structure store used as a database, cache, and message broker.\n- [SurrealDB](https:\u002F\u002Fgithub.com\u002Fsurrealdb\u002Fsurrealdb): A scalable multi-model database optimized for time-series data.\n- [Weaviate](https:\u002F\u002Fgithub.com\u002Fweaviate\u002Fweaviate): An open-source, cloud-native vector search engine.\n\n### Vector Search Libraries and Tools:\n\n- [FAISS](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffaiss): A library for efficient similarity search and clustering of dense vectors, designed to handle large-scale datasets and optimized for fast retrieval of nearest neighbors.\n\n## 🚀 Production Considerations\n\nBuilding production-grade RAG systems requires addressing several critical aspects beyond the core retrieval and generation pipeline:\n\n### Scalability & Performance\n\n- **Indexing Throughput**: Design pipelines to handle 
high-volume document ingestion with incremental updates\n- **Query Latency**: Optimize retrieval speed through efficient indexing (HNSW, IVF), caching strategies, and parallel processing\n- **Concurrent Requests**: Implement connection pooling, request queuing, and load balancing for high-traffic scenarios\n- **Resource Management**: Monitor GPU\u002FCPU utilization, memory consumption, and database connection pools\n\n### Reliability & Monitoring\n\n- **Observability**: Implement comprehensive logging, tracing, and metrics collection (latency, throughput, error rates)\n- **Health Checks**: Monitor embedding service availability, vector database connectivity, and LLM API status\n- **Error Handling**: Implement retry logic, circuit breakers, and graceful degradation strategies\n- **A\u002FB Testing**: Compare different retrieval strategies, chunking methods, and prompt templates\n\n### Data Management\n\n- **Incremental Updates**: Support real-time or near-real-time document indexing without full re-indexing\n- **Version Control**: Track document versions, embedding model versions, and prompt templates\n- **Data Quality**: Implement validation pipelines to detect corrupted embeddings, missing metadata, or stale content\n- **Backup & Recovery**: Regular backups of vector indexes and metadata stores\n\n### Security & Compliance\n\n- **Access Control**: Implement authentication, authorization, and audit logging\n- **Data Privacy**: Encrypt data at rest and in transit, support data residency requirements\n- **Content Filtering**: Apply content moderation, PII detection, and compliance checks\n- **Rate Limiting**: Protect against abuse and ensure fair resource allocation\n\n### Cost Optimization\n\n- **Embedding Caching**: Cache frequently accessed embeddings to reduce API costs\n- **Selective Retrieval**: Use query routing to avoid unnecessary retrieval operations\n- **Model Selection**: Balance cost and performance when choosing embedding and LLM models\n- **Resource 
Right-sizing**: Optimize infrastructure based on actual usage patterns\n\n## 🔌 Platform-Specific RAG Implementations\n\nFor detailed implementation guides for specific platforms, see the documentation:\n\n- [Supabase Integration Guide](docs\u002Fsupabase-integration.md): Building RAG systems with Supabase, pgvector, and Edge Functions\n\n## 💡 Best Practices\n\n### Chunking Strategy\n\n- **Domain-Aware Chunking**: Use semantic or document-structure-based chunking over fixed-size for better context preservation\n- **Overlap Management**: Include strategic overlap (10-20%) to maintain context across boundaries\n- **Metadata Preservation**: Retain document structure, headers, and formatting cues in chunk metadata\n- **Multi-Granularity**: Consider hierarchical chunking (small chunks for retrieval, larger chunks for context)\n\n### Embedding Selection\n\n- **Model Evaluation**: Use MTEB leaderboard and domain-specific benchmarks to select appropriate models\n- **Dimension Optimization**: Balance embedding dimensions (higher = better quality, lower = faster retrieval)\n- **Domain Fine-tuning**: Fine-tune embeddings on domain-specific data when possible\n- **Consistency**: Ensure the same embedding model is used for indexing and querying\n\n### Retrieval Optimization\n\n- **Hybrid Search**: Combine semantic (vector) and lexical (BM25\u002Fkeyword) search for improved recall\n- **Re-ranking**: Apply cross-encoders or learning-to-rank models to improve precision\n- **Query Understanding**: Implement query classification, intent detection, and query expansion\n- **Result Diversification**: Avoid redundant results by implementing diversity constraints\n\n### Prompt Engineering\n\n- **Clear Instructions**: Provide explicit instructions on how to use retrieved context\n- **Source Attribution**: Request citations and require grounding in provided context\n- **Few-Shot Examples**: Include examples demonstrating desired response format and quality\n- **Context Compression**: Use 
techniques like summarization or extraction when context exceeds limits\n\n### Evaluation Framework\n\n- **Multi-Dimensional Metrics**: Evaluate relevance, accuracy, completeness, and groundedness\n- **Human-in-the-Loop**: Incorporate human feedback for continuous improvement\n- **Synthetic Evaluation**: Generate test queries and expected outputs for automated testing\n- **Production Monitoring**: Track user satisfaction, query patterns, and failure modes\n\n### Iterative Improvement\n\n- **Feedback Loops**: Collect user feedback, query logs, and performance metrics\n- **Experimentation**: Systematically test improvements (chunking, retrieval, prompts) with controlled experiments\n- **Model Updates**: Plan for embedding model upgrades and migration strategies\n- **Documentation**: Maintain clear documentation of architecture, decisions, and operational procedures\n\n---\n\n## Contributing\n\nThis is a community-driven resource and continues to evolve. Contributions are welcome! If you'd like to add resources, fix errors, or improve organization:\n\n1. Fork the repository\n2. Create a branch for your changes\n3. 
Submit a pull request with a clear description\n\nFor new entries, ensure links are working, descriptions are accurate and concise, and content fits the appropriate section.\n\n## License\n\nThis project is licensed under the [CC0 1.0 Universal](LICENSE).\n","# 😎 令人惊叹的检索增强生成（RAG）\n[![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge-flat.svg)](https:\u002F\u002Fawesome.re) [![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002FDanielskry\u002FAwesome-RAG)\n\n这是一份精心整理的资源地图，涵盖了构建检索增强生成（RAG）系统所需的工具、框架、技术和学习资料。本仓库系统性地梳理了RAG生态，并提供了权威来源、教程和实现的链接，帮助您探索和开发RAG应用。\n\n## 概述\n\n**检索增强生成（RAG）**是生成式AI领域中的一项先进方法，它通过在生成过程中动态检索并整合来自外部知识源的相关上下文信息，从而增强大型语言模型（LLM）的能力。与仅依赖预训练知识的传统LLM不同，RAG系统能够让模型访问最新、特定领域或专有的信息，显著提升准确性、减少幻觉现象，并实现知识的实时更新。\n\n### 主要优势\n\n- **减少幻觉**：使回答基于检索到的事实信息\n- **领域适应性**：无需微调即可让LLM处理专业领域知识\n- **实时更新**：无需重新训练模型即可引入最新信息\n- **成本效益**：相比针对特定任务进行微调更具经济性\n- **透明度**：为生成内容提供来源标注\n- **隐私与安全**：将敏感数据保留在私有知识库中\n\n## 内容\n\n- [ℹ️ RAG概述](#ℹ%EF%B8%8F-general-information-on-rag)\n- [🏗️ 架构模式](#%EF%B8%8F-architecture-patterns)\n- [🎯 高级方法](#-advanced-approaches)\n- [🧰 支持RAG的框架](#-frameworks-that-facilitate-rag)\n- [🐍 Python生态中的RAG工具](#-python-ecosystem-for-rag)\n- [🛠️ 技术](#-techniques)\n- [📊 指标与评估](#-metrics--evaluation)\n- [💾 数据库](#-databases)\n- [🔌 平台相关的RAG实现](#-platform-specific-rag-implementations)\n- [🚀 生产环境考量](#-production-considerations)\n- [💡 最佳实践](#-best-practices)\n\n## ℹ️ RAG概述\n\nRAG解决了LLM的一个根本性局限：其静态的知识截止点以及无法访问外部信息的能力。传统的RAG实现通常采用一个检索流水线，在LLM的提示中加入来自知识库的上下文相关文档。例如，当询问关于某栋特定房屋的装修材料时，LLM可能具备一般的装修知识，但缺乏该房产的具体细节。而RAG系统则能够检索到相关文档（如建筑蓝图、材料规格、当地建筑规范等），从而提供准确且上下文感知的回答。\n\n### 实现资源\n\n#### Python教程与示例\n\n- 完整的基础[Python RAG实现](https:\u002F\u002Fgithub.com\u002FDanielskry\u002FLangChain-Chroma-RAG-demo-2024)：使用LangChain和Chroma的全栈RAG示例\n- [LangChain RAG教程](https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fuse_cases\u002Fquestion_answering\u002F)：构建RAG应用的全面指南\n- [LlamaIndex 
RAG教程](https:\u002F\u002Fdocs.llamaindex.ai\u002Fen\u002Fstable\u002Fgetting_started\u002Fstarter_example\u002F)：使用LlamaIndex入门RAG\n- [Haystack RAG流水线](https:\u002F\u002Fdocs.haystack.deepset.ai\u002Fdocs\u002Fretrieval-augmented-generation)：利用Haystack构建RAG流水线\n\n#### 生产与最佳实践\n\n- [生产级RAG模式与最佳实践](https:\u002F\u002Fdocs.llamaindex.ai\u002Fen\u002Fstable\u002Foptimizing\u002Fproduction_rag\u002F)：面向生产的RAG优化策略\n- [LangChain生产指南](https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fproduction\u002F)：将LangChain应用部署至生产环境\n- [Python异步编程最佳实践](https:\u002F\u002Fdocs.python.org\u002F3\u002Flibrary\u002Fasyncio-dev.html)：编写高效的异步Python代码以支持AI应用\n\n## 🏗️ 架构模式\n\n根据具体需求，RAG系统可以采用多种架构模式：\n\n- **朴素RAG**：基础的“先检索后生成”流水线，未经过优化\n- **高级RAG**：结合查询重写、重排序和上下文压缩等技术\n- **模块化RAG**：由可组合的检索、排序和生成组件构成\n- **代理式RAG**：由LLM驱动的智能体动态做出检索决策\n- **自适应RAG**：模型能够自我反思检索质量并调整策略\n- **图谱RAG**：利用知识图谱进行结构化信息检索\n\n## 🎯 高级方法\n\nRAG 的实现方式复杂多样，从简单的文档检索到集成迭代反馈循环、多智能体系统和领域特定增强的高级技术不等。现代方法包括：\n\n- [Vision-RAG](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=npkp4mSweEg)：将整页内容嵌入为图像，使视觉模型可以直接进行推理，而无需解析文本-RAG。\n- [缓存增强生成 (CAG)](https:\u002F\u002Fmedium.com\u002F@ronantech\u002Fcache-augmented-generation-cag-in-llms-a-step-by-step-tutorial-6ac35d415eec)：将相关文档预加载到模型的上下文中，并存储推理状态（键值对 (KV) 缓存）。\n- [代理式 RAG](https:\u002F\u002Flangchain-ai.github.io\u002Flanggraph\u002Ftutorials\u002Frag\u002Flanggraph_agentic_rag\u002F)：也称为检索代理，能够对检索过程做出决策。\n- [A-RAG](https:\u002F\u002Fgithub.com\u002FAyanami0730\u002Farag)：具有层次化检索接口（关键词、语义、块级）的代理式 RAG，使 LLM 代理能够在多个粒度上自主搜索和检索。（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.03442)）\n- [纠正式 RAG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.15884.pdf)（CRAG）：在将检索到的信息整合到 LLM 回答之前对其进行纠正或优化的方法。\n- [检索增强微调](https:\u002F\u002Ftechcommunity.microsoft.com\u002Ft5\u002Fai-ai-platform-blog\u002Fraft-a-new-way-to-teach-llms-to-be-better-at-rag\u002Fba-p\u002F4084674)（RAFT）：专门针对增强检索和生成任务对 LLM 进行微调的技术。\n- [自我反思式 RAG](https:\u002F\u002Fselfrag.github.io\u002F)：根据模型性能反馈动态调整检索策略的模型。\n- [RAG 
融合](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.03367)：结合多种检索方法以改善上下文整合的技术。\n- [时间增强型检索](https:\u002F\u002Fadam-rida.medium.com\u002Ftemporal-augmented-retrieval-tar-dynamic-rag-ad737506dfcc)（TAR）：在检索过程中考虑时间敏感数据。\n- [先规划后 RAG](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.12430)（PlanRAG）：针对复杂任务，在执行 RAG 之前加入规划阶段的战略。\n- [GraphRAG](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fgraphrag)：一种使用知识图谱来增强上下文整合和推理的结构化方法。\n- [代码图谱-RAG](https:\u002F\u002Fgithub.com\u002Fvitali87\u002Fcode-graph-rag)：用于多语言代码库分析的知识图谱 RAG 系统。\n- [FLARE](https:\u002F\u002Fmedium.com\u002Fetoai\u002Fbetter-rag-with-active-retrieval-augmented-generation-flare-3b66646e2a9f)：一种结合主动检索与增强生成的方法，以提高响应质量。\n- [GNN-RAG](https:\u002F\u002Fgithub.com\u002Fcmavro\u002FGNN-RAG)：用于大型语言模型推理的图神经网络检索。\n- [多模态 RAG](https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fan-easy-introduction-to-multimodal-retrieval-augmented-generation\u002F)：将 RAG 扩展到处理文本、图像和音频等多种模态。\n- [VideoRAG](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.05874)：利用大型视频语言模型 (LVLMs) 将 RAG 扩展到视频领域，以检索和整合视觉及文本内容，实现多模态生成。\n- [REFRAG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2509.01092)：通过在生成前将检索到的上下文压缩为嵌入向量来优化 RAG 解码，从而在保持输出质量的同时降低延迟。\n- [InstructRAG](https:\u002F\u002Fgithub.com\u002Fweizhepei\u002FInstructRAG)：通过基于指令的微调以及自合成的理由来提升检索和生成质量的 RAG 系统。\n\n## 🧰 促进 RAG 的框架\n\n- [Haystack](https:\u002F\u002Fgithub.com\u002Fdeepset-ai\u002Fhaystack)：用于构建可定制且生产就绪的 LLM 应用程序的 LLM 协调框架。\n- [LangChain](https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fmodules\u002Fdata_connection\u002F)：一个用于处理 LLM 的通用框架。\n- [Semantic Kernel](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fsemantic-kernel)：微软提供的用于开发生成式 AI 应用程序的 SDK。\n- [LlamaIndex](https:\u002F\u002Fdocs.llamaindex.ai\u002Fen\u002Fstable\u002Foptimizing\u002Fproduction_rag\u002F)：用于将自定义数据源连接到 LLM 的框架。\n- [Dify](https:\u002F\u002Fgithub.com\u002Flanggenius\u002Fdify)：一个开源的 LLM 应用开发平台。\n- [Cognita](https:\u002F\u002Fgithub.com\u002Ftruefoundry\u002Fcognita)：一个开源的 RAG 框架，用于构建模块化且生产就绪的应用程序。\n- 
[Verba](https:\u002F\u002Fgithub.com\u002Fweaviate\u002FVerba)：一个开源自带 RAG 功能的应用程序。\n- [Mastra](https:\u002F\u002Fgithub.com\u002Fmastra-ai\u002Fmastra)：用于构建 AI 应用程序的 TypeScript 框架。\n- [Letta](https:\u002F\u002Fgithub.com\u002Fletta-ai\u002Fletta)：一个开源框架，用于构建有状态的 LLM 应用程序。\n- [Flowise](https:\u002F\u002Fgithub.com\u002FFlowiseAI\u002FFlowise)：一个拖放式 UI，用于构建自定义的 LLM 流程。\n- [Kreuzberg](https:\u002F\u002Fgithub.com\u002Fkreuzberg-dev\u002Fkreuzberg)：一个多语言文档智能库（Rust 核心，附带 Python、TypeScript 和 Go 绑定），可以从 62 种以上的文档格式中提取文本、表格和元数据，用于 RAG 数据摄取管道。\n- [Swiftide](https:\u002F\u002Fgithub.com\u002Fbosun-ai\u002Fswiftide)：一个用于构建模块化、流式 LLM 应用程序的 Rust 框架。\n- [CocoIndex](https:\u002F\u002Fgithub.com\u002Fcocoindex-io\u002Fcocoindex)：一个用于为 AI 索引数据的 ETL 框架，例如 RAG；支持实时增量更新。\n- [Pathway](https:\u002F\u002Fgithub.com\u002Fpathwaycom\u002Fpathway\u002F)：一个高性能的开源 Python ETL 框架，采用 Rust 运行时，支持 300 多种数据源。\n- [Pathway AI Pipelines](https:\u002F\u002Fgithub.com\u002Fpathwaycom\u002Fllm-app\u002F)：一个生产就绪的 RAG 框架，支持跨不同数据源的实时索引、检索和变更跟踪。\n- [LiteLLM](https:\u002F\u002Fdocs.litellm.ai\u002F)：一个统一的接口，用于连接多个 LLM 提供商（OpenAI、Anthropic、Hugging Face、Replicate），并提供日志记录、监控和成本跟踪功能。\n- [Agentset](https:\u002F\u002Fgithub.com\u002Fagentset-ai\u002Fagentset)：一个开源的生产就绪 RAG 平台，内置代理式推理、混合搜索和多模态支持。\n\n## 🐍 RAG 的 Python 生态系统\n\nPython 是目前 RAG 最成熟的生态系统，广泛支持 LLM、嵌入、向量数据库、评估以及生产工具。\n\n请参阅完整指南：[RAG 的 Python 生态系统](docs\u002Fpython-ecosystem.md)\n\n## 🛠️ 技术\n\n### 数据清洗\n\n- [数据清洗技术](https:\u002F\u002Fmedium.com\u002Fintel-tech\u002Ffour-data-cleaning-techniques-to-improve-large-language-model-llm-performance-77bee9003625)：用于优化输入数据并提升模型性能的预处理步骤。\n\n### 提示工程\n\n- **策略**\n  - [标记与标签](https:\u002F\u002Fpython.langchain.com\u002Fv0.1\u002Fdocs\u002Fuse_cases\u002Ftagging\u002F)：为检索到的数据添加语义标签，以提升相关性。\n  - [思维链（CoT）](https:\u002F\u002Fwww.promptingguide.ai\u002Ftechniques\u002Fcot)：鼓励模型在给出答案前，逐步思考问题。\n  - 
[验证链（CoVe）](https:\u002F\u002Fsourajit16-02-93.medium.com\u002Fchain-of-verification-cove-understanding-implementation-e7338c7f4cb5)：提示模型逐项验证其推理过程的准确性。\n  - [自我一致性](https:\u002F\u002Fwww.promptingguide.ai\u002Ftechniques\u002Fconsistency)：生成多条推理路径，并选择最一致的答案。\n  - [零样本提示](https:\u002F\u002Fwww.promptingguide.ai\u002Ftechniques\u002Fzeroshot)：设计无需任何示例即可引导模型的提示。\n  - [少样本提示](https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fhow_to\u002Ffew_shot_examples\u002F)：在提示中提供少量示例，以展示期望的响应格式。\n  - [推理与行动（ReAct）提示](https:\u002F\u002Fwww.promptingguide.ai\u002Ftechniques\u002Freact)：将推理（如思维链）与行动（如工具调用）相结合。\n- **缓存**\n  - [提示缓存](https:\u002F\u002Fmedium.com\u002F@1kg\u002Fprompt-cache-what-is-prompt-caching-a-comprehensive-guide-e6cbae48e6a3)：通过存储和重用预计算的注意力状态来优化大语言模型。\n- **结构化**\n  - [基于标记的对象表示法](https:\u002F\u002Fgithub.com\u002Ftoon-format\u002Ftoon)：一种紧凑且确定性的 JSON 格式，用于大语言模型的提示。\n\n### 分块\n\n分块策略是 RAG 系统设计中最关键的决策之一，它直接影响检索精度和上下文质量。最佳方法取决于文档类型、领域特征和查询模式。\n\n- **[固定大小分块](https:\u002F\u002Fmedium.com\u002F@anuragmishra_27746\u002Ffive-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d)**\n  - **适用场景**：结构不重要的简单、均匀文档\n  - **特点**：将文本划分为大小一致的段落（通常 256–512 个 token），可配置 10%–20% 的重叠\n  - **优点**：实现简单，分块大小可预测，处理效率高\n  - **缺点**：可能分割句子或段落，丢失文档结构，容易打散语义单元\n  - **实现**：[CharacterTextSplitter](https:\u002F\u002Fpython.langchain.com\u002Fv0.1\u002Fdocs\u002Fmodules\u002Fdata_connection\u002Fdocument_transformers\u002Fcharacter_text_splitter\u002F)（LangChain）、[SentenceSplitter](https:\u002F\u002Fdocs.llamaindex.ai\u002Fen\u002Fstable\u002Fapi_reference\u002Fnode_parsers\u002Fsentence_splitter\u002F)（LlamaIndex）\n\n- **[递归分块](https:\u002F\u002Fmedium.com\u002F@AbhiramiVS\u002Fchunking-methods-all-to-know-about-it-65c10aa7b24e)**\n  - **适用场景**：具有层次结构的文档（Markdown、HTML、代码）\n  - **特点**：递归地按分隔符（段落→句子→词）拆分，直到达到所需的分块大小\n  - **优点**：保留自然边界，尊重文档层次结构，语义连贯性更好\n  - **缺点**：较为复杂，分块大小不一，需仔细配置分隔符\n  - 
**实现**：[RecursiveCharacterTextSplitter](https:\u002F\u002Fpython.langchain.com\u002Fv0.1\u002Fdocs\u002Fmodules\u002Fdata_connection\u002Fdocument_transformers\u002Frecursive_text_splitter\u002F)（LangChain）\n\n- **[基于文档的分块](https:\u002F\u002Fmedium.com\u002F@david.richards.tech\u002Fdocument-chunking-for-rag-ai-applications-04363d48fbf7)**\n  - **适用场景**：具有清晰章节的结构化文档（Markdown 标题、PDF 节、数据库记录）\n  - **特点**：根据文档元数据、格式提示或结构元素进行分割\n  - **优点**：保持文档结构，保留上下文，支持富含元数据的检索\n  - **缺点**：需要结构化输入，可能导致分块过大或过小\n  - **实现**：[MarkdownHeaderTextSplitter](https:\u002F\u002Fpython.langchain.com\u002Fv0.1\u002Fdocs\u002Fmodules\u002Fdata_connection\u002Fdocument_transformers\u002Fmarkdown_header_metadata\u002F)（LangChain）\n  - **多模态**：使用如 [OpenCLIP](https:\u002F\u002Fgithub.com\u002Fmlfoundations\u002Fopen_clip) 等模型处理图像和文本。\n\n- **[语义分块](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8OJC21T2SL4&t=1933s)**\n  - **适用场景**：语义连贯性至关重要的文档（叙事性文本、技术文档）\n  - **特点**：利用嵌入相似性识别自然的语义边界\n  - **优点**：保留语义单元，适应内容变化，提高检索相关性\n  - **缺点**：计算成本高，需使用嵌入模型，分块大小较难预测\n  - **最适合**：对上下文保存要求极高的高质量检索场景。\n\n- **[代理式分块](https:\u002F\u002Fyoutu.be\u002F8OJC21T2SL4?si=8VnYaGUaBmtZhCsg&t=2882)**\n  - **适用场景**：需要智能分割决策的复杂文档\n  - **特点**：使用大语言模型分析内容，确定最优的分块边界\n  - **优点**：高度自适应，理解上下文，能应用领域知识\n  - **缺点**：成本高，处理速度慢，需访问大语言模型 API\n  - **最适合**：标准分块方法失效的专业领域。\n\n**分块最佳实践：**\n- **重叠策略**：使用 10%–20% 的重叠，以保持跨边界的内容连贯性。\n- **大小优化**：平衡分块大小（越大上下文越丰富，越小精度越高）。\n- **元数据保留**：在分块元数据中保留文档结构、标题和格式信息。\n- **多粒度**：考虑层次化方法（小分块用于检索，大分块用于提供上下文）。\n\n### 嵌入\n\n嵌入是 RAG 系统中语义搜索的基础。嵌入模型的选择显著影响检索质量。\n\n- **模型选择**\n  - [MTEB 排行榜](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fmteb\u002Fleaderboard)：全面评估嵌入模型在多个任务和语言上的表现。应选择在与您的应用场景相关的任务上表现优异的模型（如检索、聚类、分类）。\n  - **模型特性**：从以下方面评估模型：\n    - **维度**：较高维度（768–1024）通常提供更好的质量，但会增加存储和计算成本。\n    - **上下文长度**：确保模型支持您的文档分块大小。\n    - **多语言支持**：国际应用需要多语言支持。\n    - **领域专长**：通用模型与特定领域模型（如科学、法律、医学）的区别。\n  \n- **自定义嵌入**\n  - **微调**：使用对比学习、三元组损失或监督微调，将预训练模型适配到您的领域。\n  - **从头训练**：适用于有足够标注数据的高度专业化领域。\n  - 
**多模态嵌入**：用于需要理解文本、图像或音频的应用场景（如 CLIP、ImageBind）。\n  - **集成方法**：结合多种嵌入模型以提高鲁棒性。\n\n### 檢索\n\n- **搜尋方法**\n  - [向量存儲平面索引](https:\u002F\u002Fweaviate.io\u002Fdevelopers\u002Facademy\u002Fpy\u002Fvector_index\u002Fflat)\n    - 簡單且高效的檢索方式。\n    - 將內容向量化後以平面向量形式存儲。\n  - [層次索引檢索](https:\u002F\u002Fpixion.co\u002Fblog\u002Frag-strategies-hierarchical-index-retrieval)\n    - 按照層次結構逐步縮小數據範圍。\n    - 按照層次順序執行檢索。\n  - [假設性問題](https:\u002F\u002Fpixion.co\u002Fblog\u002Frag-strategies-hypothetical-questions-hyde)\n    - 用於提高資料庫分塊與查詢之間的相似度（與 HyDE 相同）。\n    - 使用大型語言模型為每個文本分塊生成具體問題。\n    - 將這些問題轉換為向量嵌入。\n    - 搜尋時，將查詢與該問題向量索引進行匹配。\n  - [假設性文檔嵌入 (HyDE)](https:\u002F\u002Fpixion.co\u002Fblog\u002Frag-strategies-hypothetical-questions-hyde)\n    - 用於提高資料庫分塊與查詢之間的相似度（與假設性問題相同）。\n    - 使用大型語言模型根據查詢生成一個假設的回答。\n    - 將此回答轉換為向量嵌入。\n    - 將查詢向量與假設回應向量進行比較。\n  - [由小到大檢索](https:\u002F\u002Fgithub.com\u002FGoogleCloudPlatform\u002Fgenerative-ai\u002Fblob\u002Fmain\u002Fgemini\u002Fuse-cases\u002Fretrieval-augmented-generation\u002Fsmall_to_big_rag\u002Fsmall_to_big_rag.ipynb)\n    - 通過使用較小的分塊進行搜尋，而使用較大的分塊提供上下文來改進檢索效果。\n    - 較小的子分塊引用較大的父分塊。\n  - [情境檢索](https:\u002F\u002Fwww.anthropic.com\u002Fengineering\u002Fcontextual-retrieval)\n    - 通過保留通常在分塊過程中丟失的文檔上下文，提升 RAG 檢索的準確性。\n    - 在嵌入和索引之前，每個文本分塊都會被添加一段由模型生成的簡短摘要，從而產生情境嵌入和情境 BM25。\n    - 這種結合方法同時改善了語義和詞法匹配，並在與重新排序結合時降低了檢索失敗率。\n  - [自適應檢索](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.14403)\n    - 在生成過程中動態決定何時以及檢索多少內容。\n  - [查詢重構與擴展](https:\u002F\u002Fhaystack.deepset.ai\u002Fcookbook\u002Fquery-expansion)\n    - 在檢索前自動重寫或擴展查詢，以提高召回率。\n    - 對於冗長或模糊的用戶查詢特別有用。\n- **[重新排序](https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fenhancing-rag-pipelines-with-re-ranking\u002F)**：通過對初始檢索到的文檔進行重新排序，優先選擇與查詢語義最相關的文檔，從而提升 RAG 流程中的搜尋結果。\n\n### 回應品質與安全性\n\n確保高品質、安全且可靠的回應對於生產環境下的 RAG 系統至關重要。\n\n- **幻覺現象緩解**\n  - **[檢測技術](https:\u002F\u002Fmachinelearningmastery.com\u002Frag-hallucination-detection-techniques\u002F)**：實施方法以識別模型何時生成無據可依的信息。\n  - 
**依據事實驗證**：將生成的主張與檢索到的上下文進行交叉核對。\n  - **置信度評分**：根據來源質量為生成的回應分配置信度分數。\n  - **來源歸屬**：要求所有事實性主張都附有引用來源。\n  - **檢索品質**：提高檢索精確度以降低幻覺風險。\n\n- **安全防護機制**\n  - **[實施指南](https:\u002F\u002Fdeveloper.ibm.com\u002Ftutorials\u002Fawb-how-to-implement-llm-guardrails-for-rag-applications\u002F)**：全面的安全機制實施方法。\n  - **內容審查**：在輸入和輸出階段過濾有害、偏見或不適當的內容。\n  - **偏見緩解**：檢測並緩解檢索內容及生成回應中的偏見。\n  - **事實核查**：將主張與權威來源或知識庫進行核對。\n  - **毒性檢測**：使用分類器識別並過濾有毒內容。\n\n- **提示注入預防**\n  - **[安全指南](https:\u002F\u002Fhiddenlayer.com\u002Finnovation-hub\u002Fprompt-injection-attacks-on-llms\u002F)**：理解並預防提示注入攻擊。\n  - **輸入驗證**：嚴格驗證並淨化所有外部輸入，採用白名單、長度限制和模式匹配等方法。\n  - **內容分離**：使用明確的分隔符、模板系統和基於角色的提示，將指令與用戶數據分開。\n  - **輸出監控**：持續監控回應是否存在異常、意外行為或安全漏洞。\n  - **速率限制**：實施速率限制和濫用檢測，以防止系統性攻擊。\n  - **沙盒隔離**：將 LLM 的執行環境隔離，以限制成功注入可能造成的損害。\n\n## 📊 指標與評估\n\n### 嵌入相似度指標\n\n這些指標用於衡量嵌入之間的相似度，對於評估 RAG 系統如何有效地檢索和整合外部文檔或資料來源至關重要。通過選擇合適的相似度指標，您可以優化 RAG 系統的性能和準確性。此外，您也可以根據特定領域的需求開發自定義指標，以捕捉領域特有的細節並提高相關性。\n\n- **[餘弦相似度](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FCosine_similarity)**\n\n  - 衡量多維空間中兩個向量夾角的餘弦值。\n  - 對於比較文本嵌入非常有效，因為向量的方向代表語義信息。\n  - 在 RAG 系統中常用於衡量查詢嵌入與文檔嵌入之間的語義相似度。\n\n- **[點積](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FDot_product)**\n\n  - 計算兩個數列對應項乘積之和。\n  - 在向量歸一化的情況下，等同於餘弦相似度。\n  - 簡單高效，常與硬體加速結合，適用於大規模運算。\n\n- **[歐幾里得距離](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FEuclidean_distance)**\n\n  - 計算歐幾里得空間中兩點之間的直線距離。\n  - 可用於嵌入比較，但在高維空間中可能會因「維度災難」而失去效用。\n  - 常用於降維後的 K-means 等聚類算法中。\n\n- **[傑卡德相似度](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FJaccard_index)**\n\n  - 衡量兩個有限集合之間的相似度，即交集大小除以並集大小。\n  - 對於比較詞袋模型或 n 元組比較中的詞彙集合非常有用。\n  - 對於大型語言模型生成的連續型嵌入則不太適用。\n\n> **注意**：一般認為餘弦相似度和點積是衡量高維嵌入之間相似度最有效的指標。\n\n### 响应评估指标\n\nRAG 解决方案中的响应评估涉及使用多种指标来衡量语言模型输出的质量。以下是评估这些响应的结构化方法：\n\n- **自动化基准测试**\n\n  - **[BLEU](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FBLEU)：** 评估机器生成文本与参考文本之间的 n-gram 重叠程度，从而反映其精确度。\n  - **[ROUGE](\u003Chttps:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FROUGE_(metric)>):** 通过比较 
n-gram、跳过二元组或最长公共子序列与参考文本，来衡量召回率。\n  - **[METEOR](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FMETEOR)：** 专注于精确匹配、词干提取、同义词和对齐，主要用于机器翻译任务。\n\n- **人工评估**\n  由人工评价者根据以下方面对响应进行评估：\n  - **相关性：** 与用户查询的一致性。\n  - **流畅性：** 语法和风格质量。\n  - **事实准确性：** 根据权威来源验证陈述的真实性。\n  - **连贯性：** 响应内部的逻辑一致性。\n  \n  具体方法包括：\n  - **[标注队列](https:\u002F\u002Fdocs.langchain.com\u002Flangsmith\u002Fannotation-queues)：** 为人工标注者提供一个简洁、定向的界面，以便将反馈附加到特定的运行记录上。\n\n- **模型评估**\n  利用预训练的评估工具，从多维度对比和衡量输出结果：\n\n  - **[TuringBench](https:\u002F\u002Fturingbench.ist.psu.edu\u002F)：** 提供跨语言基准的全面评估。\n  - **[Hugging Face Evaluate](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fevaluate\u002Fen\u002Findex)：** 计算输出与人类偏好的一致程度。\n\n- **评估的关键维度**\n  - ** groundedness（ groundedness）：** 评估响应是否完全基于提供的上下文。低 groundedness 可能表明系统依赖于幻觉或无关信息。\n  - **完整性：** 衡量响应是否回答了查询的所有方面。\n  - **检索评分：** 使用 AI 辅助的检索评分以及基于提示的意图验证。\n  - **利用率：** 评估检索到的数据在多大程度上对响应有贡献。\n  - **分析：** 利用 LLM 检查检索到的片段是否被纳入响应中。\n\n#### 工具\n\n这些工具可以帮助您评估 RAG 系统的性能，从跟踪用户反馈到记录查询交互，并随时间比较多种评估指标。\n\n- **[LangFuse](https:\u002F\u002Fgithub.com\u002Flangfuse\u002Flangfuse)：** 开源工具，用于跟踪 LLM 指标、可观测性和提示管理。\n- **[Opik](https:\u002F\u002Fgithub.com\u002Fcomet-ml\u002Fopik)：** 开源平台，用于 LLM 的可观测性、评估和提示优化。\n- **[Ragas](https:\u002F\u002Fdocs.ragas.io\u002Fen\u002Fstable\u002F)：** 一个帮助评估 RAG 流程的框架。\n- **[LangSmith](https:\u002F\u002Fdocs.smith.langchain.com\u002F)：** 一个用于构建生产级 LLM 应用程序的平台，允许您密切监控和评估您的应用。\n- **[Hugging Face Evaluate](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate)：** 用于计算 BLEU 和 ROUGE 等指标以评估文本质量的工具。\n- **[Weights & Biases](https:\u002F\u002Fwandb.ai\u002Fwandb-japan\u002Frag-hands-on\u002Freports\u002FStep-for-developing-and-evaluating-RAG-application-with-W-B--Vmlldzo1NzU4OTAx)：** 跟踪实验、记录指标并可视化性能。\n\n## 💾 数据库\n\n向量数据库是 RAG 系统的关键组件，它们为嵌入提供高效的存储和相似性搜索能力。选择合适的数据库取决于规模、延迟要求、部署模式（云或本地）以及功能需求（混合搜索、过滤等）。以下列表列出了一些适用于 RAG 应用的数据库系统：\n\n### 基准测试\n\n- [选择向量数据库](https:\u002F\u002Fbenchmark.vectorview.ai\u002Fvectordbs.html)\n\n### 分布式数据处理与服务引擎：\n\n- [Apache 
Cassandra](https:\u002F\u002Fcassandra.apache.org\u002Fdoc\u002Flatest\u002Fcassandra\u002Fvector-search\u002Fconcepts.html)：分布式 NoSQL 数据库管理系统。\n- [MongoDB Atlas](https:\u002F\u002Fwww.mongodb.com\u002Fproducts\u002Fplatform\u002Fatlas-vector-search)：全球分布式的多模型数据库服务，集成向量搜索功能。\n- [Vespa](https:\u002F\u002Fvespa.ai\u002F)：开源的大数据处理与服务引擎，专为实时应用设计。\n\n### 具备向量功能的搜索引擎：\n\n- [Elasticsearch](https:\u002F\u002Fwww.elastic.co\u002Felasticsearch)：提供传统搜索功能的同时，也具备向量搜索能力。\n- [OpenSearch](https:\u002F\u002Fgithub.com\u002Fopensearch-project\u002FOpenSearch)：从 Elasticsearch 分叉而来的分布式搜索与分析引擎。\n\n### 向量数据库：\n\n- [Chroma DB](https:\u002F\u002Fgithub.com\u002Fchroma-core\u002Fchroma)：一款面向 AI 的开源嵌入数据库。\n- [Milvus](https:\u002F\u002Fgithub.com\u002Fmilvus-io\u002Fmilvus)：面向 AI 驱动应用的开源向量数据库。\n- [Pinecone](https:\u002F\u002Fwww.pinecone.io\u002F)：无服务器架构的向量数据库，专为机器学习工作流优化。\n- [Oracle AI 向量搜索](https:\u002F\u002Fwww.oracle.com\u002Fdatabase\u002Fai-vector-search\u002F#retrieval-augmented-generation)：将向量搜索功能集成到 Oracle 数据库中，支持基于向量嵌入的语义查询。\n\n### 关系型数据库扩展：\n\n- [Pgvector](https:\u002F\u002Fgithub.com\u002Fpgvector\u002Fpgvector)：PostgreSQL 中用于向量相似性搜索的开源扩展。\n\n### 其他数据库系统：\n\n- [Azure Cosmos DB](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fcosmos-db\u002Fvector-database)：全球分布式的多模型数据库服务，集成向量搜索功能。\n- [Couchbase](https:\u002F\u002Fwww.couchbase.com\u002Fproducts\u002Fvector-search\u002F)：分布式 NoSQL 云数据库。\n- [Lantern](https:\u002F\u002Flantern.dev\u002F)：注重隐私的个人搜索引擎。\n- [LlamaIndex](https:\u002F\u002Fdocs.llamaindex.ai\u002Fen\u002Fstable\u002Fmodule_guides\u002Fstoring\u002Fvector_stores\u002F)：采用简单的内存向量存储，便于快速实验。\n- [Neo4j](https:\u002F\u002Fneo4j.com\u002Fdocs\u002Fcypher-manual\u002Fcurrent\u002Findexes\u002Fsemantic-indexes\u002Fvector-indexes\u002F)：图数据库管理系统。\n- [Qdrant](https:\u002F\u002Fgithub.com\u002Fneo4j\u002Fneo4j)：一款开源向量数据库，专为相似性搜索设计。\n- [Redis 
Stack](https://redis.io/docs/latest/develop/interact/search-and-query/): In-memory data-structure store used as a database, cache, and message broker.
- [SurrealDB](https://github.com/surrealdb/surrealdb): A scalable multi-model database optimized for time-series data.
- [Weaviate](https://github.com/weaviate/weaviate): An open-source, cloud-native vector search engine.

### Vector search libraries and tools:

- [FAISS](https://github.com/facebookresearch/faiss): A library for efficient similarity search and clustering of dense vectors, designed for large-scale datasets and optimized for fast nearest-neighbor retrieval.

## 🚀 Production Considerations

Building a production-grade RAG system takes more than the core retrieval and generation pipeline; you also need to address several key concerns:

### Scalability and performance

- **Indexing throughput**: Design pipelines that handle high-volume document ingestion and support incremental updates
- **Query latency**: Optimize retrieval speed through efficient indexes (HNSW, IVF), caching strategies, and parallel processing
- **Concurrent requests**: Implement connection pooling, request queuing, and load balancing for high-traffic scenarios
- **Resource management**: Monitor GPU/CPU utilization, memory consumption, and database connection pools

### Reliability and monitoring

- **Observability**: Implement comprehensive logging, tracing, and metrics collection (latency, throughput, error rates)
- **Health checks**: Monitor embedding-service availability, vector-database connectivity, and LLM API status
- **Error handling**: Implement retry logic, circuit breakers, and graceful degradation strategies
- **A/B testing**: Compare different retrieval strategies, chunking methods, and prompt templates

### Data management

- **Incremental updates**: Support real-time or near-real-time document indexing without full re-indexing
- **Version control**: Track document versions, embedding-model versions, and prompt templates
- **Data quality**: Implement validation pipelines to detect corrupted embeddings, missing metadata, or stale content
- **Backup and recovery**: Regularly back up vector indexes and metadata stores

### Security and compliance

- **Access control**: Implement authentication, authorization, and audit logging
- **Data privacy**: Encrypt data at rest and in transit; support data-residency requirements
- **Content filtering**: Apply content moderation, PII detection, and compliance checks
- **Rate limiting**: Prevent abuse and ensure fair resource allocation

### Cost optimization

- **Embedding caching**: Cache frequently accessed embeddings to reduce API costs
- **Selective retrieval**: Use query routing to avoid unnecessary retrieval operations
- **Model selection**: Balance cost against performance when choosing embedding models and LLMs
- **Right-sizing resources**: Tune infrastructure to actual usage patterns

## 🔌 Platform-Specific RAG Implementations

For detailed platform-specific implementation guides, see the docs:

- [Supabase integration guide](docs/supabase-integration.md): Build RAG systems with Supabase, pgvector, and Edge Functions

## 💡 Best Practices

### Chunking strategies

- **Domain-aware chunking**: Prefer semantic or document-structure-based chunking over fixed-size chunks to better preserve context
- **Overlap management**: Use strategic overlap of 10-20% to maintain context across chunk boundaries
- **Metadata preservation**: Keep document structure, headings, and formatting hints in chunk metadata
- **Multi-granularity chunking**: Consider hierarchical chunking (small chunks for retrieval, larger chunks for context)

### Embedding model selection

- **Model evaluation**: Use the MTEB leaderboard and domain-specific benchmarks to choose a suitable model
- **Dimension trade-offs**: Balance embedding dimensionality (higher dimensions can mean better quality; lower dimensions retrieve faster)
- **Domain fine-tuning**: Fine-tune embeddings on domain-specific data where possible
- **Consistency**: Use the same embedding model at indexing time and query time

### Retrieval optimization

- **Hybrid search**: Combine semantic (vector) and lexical (BM25/keyword) search to improve recall
- **Re-ranking**: Apply cross-encoders or learning-to-rank models to improve precision
- **Query understanding**: Implement query classification, intent recognition, and query expansion
- **Result diversification**: Apply diversity constraints to avoid redundant results

### Prompt engineering

- 
**Clear instructions**: Explicitly instruct the model how to use the retrieved context
- **Source attribution**: Require citations and ensure answers are grounded in the provided context
- **Few-shot examples**: Provide examples demonstrating the desired response format and quality
- **Context compression**: Use summarization or extraction techniques when the context exceeds limits

### Evaluation framework

- **Multi-dimensional metrics**: Evaluate relevance, accuracy, completeness, and groundedness of answers
- **Human-in-the-loop**: Incorporate human feedback for continuous improvement
- **Synthetic evaluation**: Generate test queries and expected outputs for automated testing
- **Production monitoring**: Track user satisfaction, query patterns, and failure modes

### Iterative improvement

- **Feedback loops**: Collect user feedback, query logs, and performance metrics
- **Experimentation**: Systematically test improvements (chunking, retrieval, prompts) through controlled experiments
- **Model updates**: Plan for embedding-model upgrades and migration strategies
- **Documentation**: Maintain clear documentation of architecture, decisions, and operational procedures

---

## Contributing

This is a community-driven resource and will keep evolving. Contributions are welcome! To add a resource, fix an error, or improve the organization:

1. Fork the repository
2. Create a branch for your changes
3. Submit a pull request with a clear description

For new entries, please make sure links are valid, descriptions are accurate and concise, and the content fits the relevant section.

## License

This project is released under the [CC0 1.0 Universal license](LICENSE).

---

# Awesome-RAG Quick Start Guide

Awesome-RAG is not a single standalone software package but a curated map of the retrieval-augmented generation (RAG) ecosystem: the tools, frameworks, design patterns, and best practices needed to build RAG systems. This guide walks you through using the core Python components the repository recommends to stand up a basic RAG application.

## Prerequisites

Before you start, make sure your development environment meets the following requirements:

*   **Operating system**: Linux, macOS, or Windows (WSL2 recommended)
*   **Python version**: Python 3.9 or later
*   **Package manager**: pip or conda
*   **Dependencies**:
    *   A basic Python development environment
    *   Network access (to download models and reach external APIs)
    *   (Optional) GPU support, if you plan to run large embedding models or LLMs locally

**Mirror tip for users in China**:
Consider installing Python packages through a domestic mirror, such as the Aliyun or Tsinghua University PyPI mirrors, to speed up downloads.

## Installation

Awesome-RAG covers several frameworks (LangChain, LlamaIndex, Haystack, and others); this guide uses the most mature and best-documented combination, **LangChain + ChromaDB**, which is the classic starter stack for RAG.

1.  **Create a virtual environment** (recommended):
    ```bash
    python -m venv rag-env
    source rag-env/bin/activate  # On Windows: rag-env\Scripts\activate
    ```

2.  **Install the core dependencies**:
    Install LangChain, the Chroma vector store, and embedding support (shown here via the Tsinghua mirror).

    ```bash
    pip install langchain langchain-community langchain-chroma langchain-openai tiktoken -i https://pypi.tuna.tsinghua.edu.cn/simple
    ```

    *Note: If you plan to use local open-source models (e.g., via Ollama or Hugging Face), also install `langchain-huggingface` or configure the corresponding local service.*

## Basic Usage

The following is a minimal RAG pipeline: load text data -> store it in a vector database -> retrieve relevant context -> generate an answer.

### 1. Prepare data and initialize
Create a file named `quick_start.py` with the following code. The example uses LangChain's built-in document loader and Chroma as an in-memory vector store.

```python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
import os

# [Important] Set your API key (OpenAI shown here; any compatible endpoint works)
# Users in China can point this at a relay service or a local model endpoint
os.environ["OPENAI_API_KEY"] = "your-api-key-here"

# 1. Load data (assumes a data.txt file in the current directory)
# If the file does not exist, create a small test file first
with open("data.txt", "w", encoding="utf-8") as f:
    f.write("Retrieval-augmented generation (RAG) is a technique that strengthens large language models by retrieving relevant information from an external knowledge base. ")
    f.write("It can reduce hallucinations, provide up-to-date information, and give models access to private data.")

loader = TextLoader("data.txt", encoding="utf-8")
documents = loader.load()

# 2. Split the text into chunks
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# 3. Initialize the embedding model and vector store
# Note: the first run downloads the embedding model or calls the API
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(documents=texts, embedding=embeddings)

# 4. Set up the retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

# 5. Build the generation chain
def format_docs(docs):
    # Join the retrieved documents into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

template = """Answer the user's question concisely and professionally, based only on the context below. If the answer cannot be derived from it, say "The question cannot be answered from the available information" instead of making one up.
Context: {context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI(model="gpt-3.5-turbo")  # or the model of your choice

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

# 6. Run a query
query = "What are the main benefits of RAG?"
response = rag_chain.invoke(query)

print(f"Question: {query}")
print(f"Answer: {response}")
```

### 2. Run the example

Execute the script from a terminal:

```bash
python quick_start.py
```

### 3. Next steps
Once the basic example works, dig into the other sections of the Awesome-RAG repository:
*   **Architecture patterns**: Upgrade from Naive RAG to Advanced RAG (add re-ranking and query rewriting).
*   **Advanced approaches**: Explore Agentic RAG or GraphRAG for more complex reasoning tasks.
*   **Framework swaps**: Rebuild the pipeline with LlamaIndex or Haystack and compare the frameworks' trade-offs.
*   **Production deployment**: See the repository's "Production Considerations" section to learn how to optimize latency, monitor cost, and evaluate quality.

---

The R&D team at a fintech company urgently needed an intelligent support system that could answer questions about the latest regulatory changes in real time, to keep up with frequently updated compliance requirements.

### Without Awesome-RAG

- **Technology-selection paralysis**: Faced with a crowded field of vector databases and frameworks, the team spent weeks on trial and error, unable to settle on an architecture that met finance-grade privacy requirements.
- **Frequent hallucinations**: Fine-tuning a large model directly was expensive and lagged behind regulatory updates, so the assistant regularly fabricated nonexistent clauses, creating compliance risk.
- **No evaluation standard**: The team could not quantify retrieval accuracy, and after launch had no way to tell whether answers were grounded in real documents or model guesswork.
- **Low development velocity**: Without authoritative hands-on tutorials and best-practice guidance, junior engineers kept stumbling over data cleaning and prompt engineering.

### With Awesome-RAG

- **Fast architecture decisions**: Using the curated resource map, the team quickly settled on the mature "LangChain + Chroma" stack, cutting prototype development from weeks to three days.
- **Traceable, grounded answers**: Drawing on the advanced retrieval techniques in the list, the system pulls in the latest regulatory documents so that every answer can be traced to a source, sharply reducing hallucinations.
- **Principled evaluation**: With the recommended evaluation metrics, the team built an automated test pipeline that continuously monitors retrieval relevance and drives ongoing optimization.
- **Pitfalls avoided**: Production best practices and production-grade patterns helped the team sidestep common traps around async processing and data security, improving system stability.

Awesome-RAG is more than a list of resources: it is an accelerator that takes a team from proof of concept to production, making trustworthy, real-time domain-expert systems achievable by following a map.
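The ROUGE-L metric discussed in the evaluation section scores a candidate against a reference by the length of their longest common subsequence. The sketch below is a minimal illustration with plain whitespace tokenization and no stemming, not the reference implementation (for real work, use a library such as Hugging Face Evaluate):

```python
def rouge_l_f1(candidate: str, reference: str) -> float:
    """Illustrative ROUGE-L: F1 over the longest common subsequence of tokens."""
    cand, ref = candidate.split(), reference.split()
    # Dynamic-programming table: dp[i][j] = LCS length of cand[:i] and ref[:j]
    dp = [[0] * (len(ref) + 1) for _ in range(len(cand) + 1)]
    for i, c in enumerate(cand, 1):
        for j, r in enumerate(ref, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if c == r else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

ROUGE variants differ mainly in what they match (n-grams, skip-bigrams, or the LCS, as here) before computing precision and recall.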
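The 10-20% overlap recommended under chunking strategies can be sketched with a character-level splitter. The `chunk_size` and `overlap` values here are illustrative defaults (75 is 15% of 500); real pipelines would more likely split on tokens, sentences, or document structure:

```python
def chunk_with_overlap(text: str, chunk_size: int = 500, overlap: int = 75) -> list[str]:
    """Split text into fixed-size chunks whose edges overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # advance by less than chunk_size to create overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlapping tail of each chunk reappears at the head of the next one, so a sentence that straddles a boundary is still retrievable in full from at least one chunk.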
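The hybrid search recommended under retrieval optimization needs a way to merge the vector and keyword result lists. One common recipe is reciprocal rank fusion (RRF); the sketch below assumes each retriever returns an ordered list of document IDs, and `k=60` is the conventional smoothing constant:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each doc scores sum(1 / (k + rank)) over the lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, 1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # e.g. from semantic (vector) search
keyword_hits = ["doc1", "doc9", "doc3"]  # e.g. from BM25/keyword search
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Because RRF works on ranks rather than raw scores, it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales.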
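The embedding-caching idea from the cost-optimization checklist can be as simple as keying vectors by a content hash, so repeated texts never trigger a second embedding call. `embed_fn` below stands in for whatever embedding client you use; this in-memory dict is a sketch of the idea, not a production cache:

```python
import hashlib

class EmbeddingCache:
    """Cache embeddings by content hash so repeated texts skip the embedding call."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # callable: text -> embedding vector
        self.store = {}           # sha256(text) -> embedding
        self.hits = 0

    def embed(self, text: str):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self.store:
            self.hits += 1        # served from cache; no API call made
        else:
            self.store[key] = self.embed_fn(text)
        return self.store[key]
```

In production you would typically back this with Redis or the vector store itself, and include the embedding-model name in the key so a model upgrade invalidates stale vectors.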