[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Marker-Inc-Korea--AutoRAG":3,"tool-Marker-Inc-Korea--AutoRAG":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160784,2,"2026-04-19T11:32:54",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":76,"owner_website":78,"owner_url":79,"languages":80,"stars":89,"forks":90,"last_commit_at":91,"license":92,"difficulty_score":10,"env_os":93,"env_gpu":94,"env_ram":93,"env_deps":95,"category_tags":104,"github_topics":106,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":125,"updated_at":126,"faqs":127,"releases":156},9633,"Marker-Inc-Korea\u002FAutoRAG","AutoRAG","AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation","AutoRAG 是一款专为检索增强生成（RAG）技术打造的开源自动化工具，旨在帮助用户快速找到最适合其特定数据和业务场景的 RAG 流程。在当前的 AI 应用中，面对琳琅满目的 RAG 模块与组合方案，开发者往往难以判断哪种配置在自己的数据上表现最佳，而手动构建并逐一评估所有可能性不仅耗时费力，门槛也极高。\n\nAutoRAG 通过引入类似 AutoML 的自动化机制，完美解决了这一痛点。它允许用户利用自有评估数据，自动测试多种 RAG 模块组合，智能筛选出最优 pipeline，从而大幅降低试错成本。该工具特别适合 AI 开发者、研究人员以及希望优化大模型应用效果的技术团队使用。\n\n其核心技术亮点在于提供了一套完整的闭环工作流：从数据解析、分块到问答对生成，再到基于 
YAML 配置的自动化评估与可视化仪表盘监控，甚至支持一键部署最优方案。此外，AutoRAG 还兼容 Hugging Face Space 和 Google Colab，提供了丰富的教程与预置空间，让用户能够轻松上手，无需从零开始搭建复杂的评估体系，真正实现高效、精准的 RAG 系统优化。","# AutoRAG\n\nRAG AutoML tool for automatically finding an optimal RAG pipeline for your data.\n\n![Thumbnail](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_192cd8d2e363.png)\n\n![PyPI - Downloads](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002FAutoRAG)\n[![LinkedIn](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLinkedIn-Connect-blue?style=flat-square&logo=linkedin)](https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002F104375108\u002Fadmin\u002Fdashboard\u002F)\n![X (formerly Twitter) Follow](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002FAutoRAG_HQ)\n[![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Follow-orange?style=flat-square&logo=huggingface)](https:\u002F\u002Fhuggingface.co\u002FAutoRAG)\n\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F7832\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_0b0cf02be0f5.png\" alt=\"Marker-Inc-Korea%2FAutoRAG | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\nThere are many RAG pipelines and modules out there,\nbut you don’t know what pipeline is great for “your own data” and \"your own use-case.\"\nMaking and evaluating all RAG modules is very time-consuming and hard to do.\nBut without it, you will never know which RAG pipeline is the best for your own use-case.\n\nAutoRAG is a tool for finding the optimal RAG pipeline for “your data.”\nYou can evaluate various RAG modules automatically with your own evaluation data\nand find the best RAG pipeline for your own use-case.\n\nAutoRAG supports a simple way to evaluate many RAG module combinations.\nTry now and find the best RAG pipeline for your own use-case.\n\nExplore our 📖 
[Document](https:\u002F\u002Fmarker-inc-korea.github.io\u002FAutoRAG\u002F)!!\n\n---\n\n## YouTube Tutorial\n\nhttps:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fassets\u002F96727832\u002Fc0d23896-40c0-479f-a17b-aa2ec3183a26\n\n_Muted by default, enable sound for voice-over_\n\nYou can see on [YouTube](https:\u002F\u002Fyoutu.be\u002F2ojK8xjyXAU?feature=shared)\n\n## Use AutoRAG in HuggingFace Space 🚀\n\n- [💬 Naive RAG Chatbot](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAutoRAG\u002FNaive-RAG-chatbot)\n- [✏️ AutoRAG Data Creation](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAutoRAG\u002FAutoRAG-data-creation)\n- [🚀 AutoRAG RAG Pipeline Optimization](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAutoRAG\u002FRAG-Pipeline-Optimization)\n\n## Colab Tutorial\n\n- [Step 1: Basic of AutoRAG | Optimizing your RAG pipeline](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F19OEQXO_pHN6gnn2WdfPd4hjnS-4GurVd?usp=sharing)\n- [Step 2: Data Creation | Create your own Data for RAG Optimization](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1BOdzMndYgMY_iqhwKcCCS7ezHbZ4Oz5X?usp=sharing)\n- [Step 3: Use Custom LLM & Embedding Model | Use Custom Model](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F12VpWcSTSOsLSyW0BKb-kPoEzK22ACxvS?usp=sharing)\n\n# Index\n\n- [Quick Install](#quick-install)\n- [Data Creation](#data-creation)\n    - [Parsing](#1-parsing)\n    - [Chunking](#2-chunking)\n    - [QA Creation](#3-qa-creation)\n- [RAG Optimization](#rag-optimization)\n    - [How AutoRAG optimizes RAG pipeline?](#how-autorag-optimizes-rag-pipeline)\n    - [Metrics](#metrics)\n    - [Quick Start](#quick-start-1)\n        - [Set YAML File](#1-set-yaml-file)\n        - [Run AutoRAG](#2-run-autorag)\n        - [Run Dashboard](#3-run-dashboard)\n        - [Deploy your optimal RAG pipeline](#4-deploy-your-optimal-rag-pipeline)\n- [FaQ](#-faq)\n\n# Quick Install\n\nWe recommend using Python version 3.10 or higher for 
AutoRAG.\n\n```bash\npip install AutoRAG\n```\n\nIf you want to use the local models, you need to install gpu version.\n\n```bash\npip install \"AutoRAG[gpu]\"\n```\n\nOr for parsing, you can use the parsing version.\n\n```bash\npip install \"AutoRAG[gpu,parse]\"\n```\n\n# Data Creation\n\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAutoRAG\u002FAutoRAG-data-creation\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_0ac86e2beae4.png\" alt=\"Hugging Face Sticker\" style=\"width:200px;height:auto;\">\n\u003C\u002Fa>\n\n![Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_e071efdb0c80.png)\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_d6bcbcdbd8d8.png)\n\nRAG Optimization requires two types of data: QA dataset and Corpus dataset.\n\n1. **QA** dataset file (qa.parquet)\n2. **Corpus** dataset file (corpus.parquet)\n\n**QA** dataset is important for accurate and reliable evaluation and optimization.\n\n**Corpus** dataset is critical to the performance of RAGs.\nThis is because RAG uses the corpus to retrieve documents and generate answers using it.\n\n### 📌 Supporting Data Creation Modules\n\n![Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_49ef22523536.png)\n\n- [Supporting Parsing Modules List](https:\u002F\u002Fedai.notion.site\u002FSupporting-Parsing-Modules-e0b7579c7c0e4fb2963e408eeccddd75?pvs=4)\n- [Supporting Chunking Modules List](https:\u002F\u002Fedai.notion.site\u002FSupporting-Chunk-Modules-8db803dba2ec4cd0a8789659106e86a3?pvs=4)\n\n## Quick Start\n\n### 1. 
Parsing\n\n#### Set YAML File\n\n```yaml\nmodules:\n  - module_type: langchain_parse\n    parse_method: pdfminer\n```\n\nYou can also use multiple Parse modules at once.\nHowever, in this case, you'll need to return a new process for each parsed result.\n\n#### Start Parsing\n\nYou can parse your raw documents with just a few lines of code.\n\n```python\nfrom autorag.parser import Parser\n\nparser = Parser(data_path_glob=\"your\u002Fdata\u002Fpath\u002F*\")\nparser.start_parsing(\"your\u002Fpath\u002Fto\u002Fparse_config.yaml\")\n```\n\n### 2. Chunking\n\n#### Set YAML File\n\n```yaml\nmodules:\n  - module_type: llama_index_chunk\n    chunk_method: Token\n    chunk_size: 1024\n    chunk_overlap: 24\n    add_file_name: en\n```\n\nYou can also use multiple Chunk modules at once.\nIn this case, you need to use one corpus to create QA and then map the rest of the corpus to QA Data.\nIf the chunk method is different, the retrieval_gt will be different, so we need to remap it to the QA dataset.\n\n#### Start Chunking\n\nYou can chunk your parsed results with just a few lines of code.\n\n```python\nfrom autorag.chunker import Chunker\n\nchunker = Chunker.from_parquet(parsed_data_path=\"your\u002Fparsed\u002Fdata\u002Fpath\")\nchunker.start_chunking(\"your\u002Fpath\u002Fto\u002Fchunk_config.yaml\")\n```\n\n### 3. 
QA Creation\n\nYou can create QA dataset with just a few lines of code.\n\n```python\nimport pandas as pd\nfrom llama_index.llms.openai import OpenAI\n\nfrom autorag.data.qa.filter.dontknow import dontknow_filter_rule_based\nfrom autorag.data.qa.generation_gt.llama_index_gen_gt import (\n\tmake_basic_gen_gt,\n\tmake_concise_gen_gt,\n)\nfrom autorag.data.qa.schema import Raw, Corpus\nfrom autorag.data.qa.query.llama_gen_query import factoid_query_gen\nfrom autorag.data.qa.sample import random_single_hop\n\nllm = OpenAI()\nraw_df = pd.read_parquet(\"your\u002Fpath\u002Fto\u002Fparsed.parquet\")\nraw_instance = Raw(raw_df)\n\ncorpus_df = pd.read_parquet(\"your\u002Fpath\u002Fto\u002Fcorpus.parquet\")\ncorpus_instance = Corpus(corpus_df, raw_instance)\n\ninitial_qa = (\n\tcorpus_instance.sample(random_single_hop, n=3)\n\t.map(\n\t\tlambda df: df.reset_index(drop=True),\n\t)\n\t.make_retrieval_gt_contents()\n\t.batch_apply(\n\t\tfactoid_query_gen,  # query generation\n\t\tllm=llm,\n\t)\n\t.batch_apply(\n\t\tmake_basic_gen_gt,  # answer generation (basic)\n\t\tllm=llm,\n\t)\n\t.batch_apply(\n\t\tmake_concise_gen_gt,  # answer generation (concise)\n\t\tllm=llm,\n\t)\n\t.filter(\n\t\tdontknow_filter_rule_based,  # filter don't know\n\t\tlang=\"en\",\n\t)\n)\n\ninitial_qa.to_parquet('.\u002Fqa.parquet', '.\u002Fcorpus.parquet')\n```\n\n# RAG Optimization\n\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAutoRAG\u002FRAG-Pipeline-Optimization\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_0ac86e2beae4.png\" alt=\"Hugging Face Sticker\" style=\"width:200px;height:auto;\">\n\u003C\u002Fa>\n\n![Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_07bed2d76e90.png)\n\n![rag](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_548d236ede08.png)\n\n## How AutoRAG optimizes RAG pipeline?\n\nHere is the AutoRAG RAG Structure that only 
show Nodes.\n\n![Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_93096663760e.png)\n\nHere is the image showing all the nodes and modules.\n\n![Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_7bccd6bc90a5.png)\n\n![rag_opt_gif](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_5825f3c68063.png)\n\n### 📌 Supporting RAG Optimization Nodes & modules\n\n- [Supporting RAG Modules list](https:\u002F\u002Fedai.notion.site\u002FSupporting-Nodes-modules-0ebc7810649f4e41aead472a92976be4?pvs=4)\n\n## Metrics\n\nThe metrics used by each node in AutoRAG are shown below.\n\n![Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_1af1737d419d.png)\n\n![Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_3664e8599f48.png)\n\n- [Supporting metrics list](https:\u002F\u002Fedai.notion.site\u002FSupporting-metrics-867d71caefd7401c9264dd91ba406043?pvs=4)\n\nHere is the detailed information about the metrics that AutoRAG supports.\n\n- [Retrieval Metrics](https:\u002F\u002Fedai.notion.site\u002FRetrieval-Metrics-dde3d9fa1d9547cdb8b31b94060d21e7?pvs=4)\n- [Retrieval Token Metrics](https:\u002F\u002Fedai.notion.site\u002FRetrieval-Token-Metrics-c3e2d83358e04510a34b80429ebb543f?pvs=4)\n- [Generation Metrics](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7d4a3069-9186-4854-885d-ca0f7bcc17e8)\n\n## Quick Start\n\n### 1. 
Set YAML File\n\nFirst, you need to set the config YAML file for your RAG optimization.\n\nWe highly recommend using pre-made config YAML files for starter.\n\n- [Get Sample YAML](sample_config\u002Frag)\n    - [Sample YAML Guide](https:\u002F\u002Fmarker-inc-korea.github.io\u002FAutoRAG\u002Foptimization\u002Fsample_config.html)\n- [Make Custom YAML Guide](https:\u002F\u002Fmarker-inc-korea.github.io\u002FAutoRAG\u002Foptimization\u002Fcustom_config.html)\n\nHere is an example of the config YAML file to use three retrieval nodes, `prompt_maker`, and `generator` nodes.\n\n```yaml\nnode_lines:\n  - node_line_name: retrieve_node_line\n    nodes:\n      - node_type: lexical_retrieval\n        strategy:\n          metrics: [ retrieval_f1, retrieval_recall, retrieval_ndcg, retrieval_mrr ]\n        top_k: 3\n        modules:\n          - module_type: bm25\n      - node_type: semantic_retrieval\n        strategy:\n          metrics: [ retrieval_f1, retrieval_recall, retrieval_ndcg, retrieval_mrr ]\n        top_k: 3\n        modules:\n          - module_type: vectordb\n            vectordb: default\n      - node_type: hybrid_retrieval\n        strategy:\n          metrics: [ retrieval_f1, retrieval_recall, retrieval_ndcg, retrieval_mrr ]\n        top_k: 3\n        modules:\n          - module_type: hybrid_rrf\n            weight_range: (4,80)\n  - node_line_name: post_retrieve_node_line\n    nodes:\n      - node_type: prompt_maker  # Set Prompt Maker Node\n        strategy:\n          metrics: # Set Generation Metrics\n            - metric_name: meteor\n            - metric_name: rouge\n            - metric_name: sem_score\n              embedding_model: openai\n        modules:\n          - module_type: fstring\n            prompt: \"Read the passages and answer the given question. 
\\n Question: {query} \\n Passage: {retrieved_contents} \\n Answer : \"\n      - node_type: generator  # Set Generator Node\n        strategy:\n          metrics: # Set Generation Metrics\n            - metric_name: meteor\n            - metric_name: rouge\n            - metric_name: sem_score\n              embedding_model: openai\n        modules:\n          - module_type: openai_llm\n            llm: gpt-4o-mini\n            batch: 16\n```\n\n### 2. Run AutoRAG\n\nYou can evaluate your RAG pipeline with just a few lines of code.\n\n```python\nfrom autorag.evaluator import Evaluator\n\nevaluator = Evaluator(qa_data_path='your\u002Fpath\u002Fto\u002Fqa.parquet', corpus_data_path='your\u002Fpath\u002Fto\u002Fcorpus.parquet')\nevaluator.start_trial('your\u002Fpath\u002Fto\u002Fconfig.yaml')\n```\n\nor you can use the command line interface\n\n```bash\nautorag evaluate --config your\u002Fpath\u002Fto\u002Fdefault_config.yaml --qa_data_path your\u002Fpath\u002Fto\u002Fqa.parquet --corpus_data_path your\u002Fpath\u002Fto\u002Fcorpus.parquet\n```\n\nOnce it is done, you can see several files and folders created in your current directory.\nAt the trial folder named to numbers (like 0),\nyou can check `summary.csv` file that summarizes the evaluation results and the best RAG pipeline for your data.\n\nFor more details, you can check out how the folder structure looks like\nat [here](https:\u002F\u002Fmarker-inc-korea.github.io\u002FAutoRAG\u002Foptimization\u002Ffolder_structure.html).\n\n### 3. Run Dashboard\n\nYou can run a dashboard to easily see the result.\n\n```bash\nautorag dashboard --trial_dir \u002Fyour\u002Fpath\u002Fto\u002Ftrial_dir\n```\n\n#### sample dashboard\n\n![dashboard](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_2b1fb87600e3.png)\n\n### 4. Deploy your optimal RAG pipeline\n\n### 4-1. 
Run as Code\n\nYou can use an optimal RAG pipeline right away from the trial folder.\nThe trial folder is the numbered directory (such as 0, 1, 2, ...) that you pass to the dashboard.\n\n```python\nfrom autorag.deploy import Runner\n\nrunner = Runner.from_trial_folder('\u002Fyour\u002Fpath\u002Fto\u002Ftrial_dir')\nrunner.run('your question')\n```\n\n### 4-2. Run as an API server\n\nYou can run this pipeline as an API server.\n\nCheck out the API endpoint [here](.\u002Fdocs\u002Fsource\u002Fdeploy\u002Fapi_endpoint.md).\n\n```python\nimport nest_asyncio\nfrom autorag.deploy import ApiRunner\n\nnest_asyncio.apply()\n\nrunner = ApiRunner.from_trial_folder('\u002Fyour\u002Fpath\u002Fto\u002Ftrial_dir')\nrunner.run_api_server()\n```\n\n```bash\nautorag run_api --trial_dir your\u002Fpath\u002Fto\u002Ftrial_dir --host 0.0.0.0 --port 8000\n```\n\nThe CLI command uses the extracted config YAML file. To learn more, see\n[here](https:\u002F\u002Fmarker-inc-korea.github.io\u002FAutoRAG\u002Ftutorial.html#extract-pipeline-and-evaluate-test-dataset).\n\n### 4-3. 
Run as a Web Interface\n\nYou can run this pipeline as a web interface.\n\nCheck out the web interface [here](deploy\u002Fweb.md).\n\n```bash\nautorag run_web --trial_path your\u002Fpath\u002Fto\u002Ftrial_path\n```\n\n#### sample web interface\n\n\u003Cimg width=\"1491\" alt=\"web_interface\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_2980764ca3af.png\">\n\n## ☎️ FaQ\n\n💻 [Hardware Specs](https:\u002F\u002Fedai.notion.site\u002FHardware-specs-28cefcf2a26246ffadc91e2f3dc3d61c?pvs=4)\n\n⭐ [Running AutoRAG](https:\u002F\u002Fedai.notion.site\u002FAbout-running-AutoRAG-44a8058307af42068fc218a073ee480b?pvs=4)\n\n🍯 [Tips\u002FTricks](https:\u002F\u002Fedai.notion.site\u002FTips-Tricks-10708a0e36ff461cb8a5d4fb3279ff15?pvs=4)\n\n☎️ [TroubleShooting](https:\u002F\u002Fmedium.com\u002F@autorag\u002Fautorag-troubleshooting-5cf872b100e3)\n\n## Thanks for the shoutout\n\n### Company\n\n\u003Ca href=\"https:\u002F\u002Fwww.linkedin.com\u002Fposts\u002Fllamaindex_rag-pipelines-have-a-lot-of-hyperparameters-activity-7182053546593247232-HFMN\u002F\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_fc4c94b10dcb.png\" alt=\"llama index\" style=\"width:200px;height:auto;\">\n\u003C\u002Fa>\n\n### Individual\n\n- [Shubham Saboo](https:\u002F\u002Fwww.linkedin.com\u002Fposts\u002Fshubhamsaboo_just-found-the-solution-to-the-biggest-rag-activity-7255404464054939648-ISQ8\u002F)\n- [Kalyan KS](https:\u002F\u002Fwww.linkedin.com\u002Fposts\u002Fkalyanksnlp_rag-autorag-llms-activity-7258677155574788097-NgS0\u002F)\n\n---\n\n# ✨ Contributors ✨\n\nThanks go to these wonderful people:\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_293b7608edb1.png\" \u002F>\n\u003C\u002Fa>\n\n# Contribution\n\nWe are developing AutoRAG as 
open-source.\n\nSo this project welcomes contributions and suggestions. Feel free to contribute to this project.\n\nPlus, check out our detailed documentation at [here](https:\u002F\u002Fmarker-inc-korea.github.io\u002FAutoRAG\u002Findex.html).\n\n## Citation\n\n```bibtex\n@misc{kim2024autoragautomatedframeworkoptimization,\n      title={AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline},\n      author={Dongkyu Kim and Byoungwook Kim and Donggeon Han and Matouš Eibich},\n      year={2024},\n      eprint={2410.20878},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.20878},\n}\n```\n","# AutoRAG\n\n一款用于自动为您的数据寻找最优 RAG 流程的 RAG 自动机器学习工具。\n\n![缩略图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_192cd8d2e363.png)\n\n![PyPI - 下载量](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002FAutoRAG)\n[![LinkedIn](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLinkedIn-Connect-blue?style=flat-square&logo=linkedin)](https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002F104375108\u002Fadmin\u002Fdashboard\u002F)\n![X（原 Twitter）关注](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002FAutoRAG_HQ)\n[![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Follow-orange?style=flat-square&logo=huggingface)](https:\u002F\u002Fhuggingface.co\u002FAutoRAG)\n\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F7832\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_0b0cf02be0f5.png\" alt=\"Marker-Inc-Korea%2FAutoRAG | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\n市面上有许多 RAG 流程和模块，但您并不清楚哪种流程最适合“您自己的数据”和“您的应用场景”。手动尝试并评估所有 RAG 模块既耗时又困难。然而，如果不进行这样的尝试，您将永远无法确定哪种 RAG 流程最符合您的实际需求。\n\nAutoRAG 是一款专为“您的数据”寻找最优 RAG 流程的工具。您可以使用自己的评估数据自动评估多种 RAG 模块组合，并找到最适合您应用场景的 RAG 
流程。\n\nAutoRAG 提供了一种简单的方式来评估大量的 RAG 模块组合。立即试用，找到最适合您应用场景的 RAG 流程。\n\n请查阅我们的 📖 [文档](https:\u002F\u002Fmarker-inc-korea.github.io\u002FAutoRAG\u002F)！！\n\n---\n\n## YouTube 教程\n\nhttps:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fassets\u002F96727832\u002Fc0d23896-40c0-479f-a17b-aa2ec3183a26\n\n_默认静音，请开启声音以收听旁白_\n\n您也可以在 [YouTube](https:\u002F\u002Fyoutu.be\u002F2ojK8xjyXAU?feature=shared) 上观看。\n\n## 在 HuggingFace Space 中使用 AutoRAG 🚀\n\n- [💬 简单 RAG 聊天机器人](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAutoRAG\u002FNaive-RAG-chatbot)\n- [✏️ AutoRAG 数据生成](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAutoRAG\u002FAutoRAG-data-creation)\n- [🚀 AutoRAG RAG 流程优化](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAutoRAG\u002FRAG-Pipeline-Optimization)\n\n## Colab 教程\n\n- [步骤 1：AutoRAG 基础 | 优化您的 RAG 流程](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F19OEQXO_pHN6gnn2WdfPd4hjnS-4GurVd?usp=sharing)\n- [步骤 2：数据生成 | 为 RAG 优化创建您自己的数据](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1BOdzMndYgMY_iqhwKcCCS7ezHbZ4Oz5X?usp=sharing)\n- [步骤 3：使用自定义 LLM 和嵌入模型 | 使用自定义模型](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F12VpWcSTSOsLSyW0BKb-kPoEzK22ACxvS?usp=sharing)\n\n# 目录\n\n- [快速安装](#quick-install)\n- [数据生成](#data-creation)\n    - [解析](#1-parsing)\n    - [分块](#2-chunking)\n    - [问答对生成](#3-qa-creation)\n- [RAG 优化](#rag-optimization)\n    - [AutoRAG 如何优化 RAG 流程？](#how-autorag-optimizes-rag-pipeline)\n    - [指标](#metrics)\n    - [快速入门](#quick-start-1)\n        - [设置 YAML 文件](#1-set-yaml-file)\n        - [运行 AutoRAG](#2-run-autorag)\n        - [运行仪表板](#3-run-dashboard)\n        - [部署您最优的 RAG 流程](#4-deploy-your-optimal-rag-pipeline)\n- [常见问题](#-faq)\n\n# 快速安装\n\n我们建议使用 Python 3.10 或更高版本来运行 AutoRAG。\n\n```bash\npip install AutoRAG\n```\n\n如果您希望使用本地模型，则需要安装 GPU 版本。\n\n```bash\npip install \"AutoRAG[gpu]\"\n```\n\n或者，如果您只需要解析功能，可以使用解析版本：\n\n```bash\npip install \"AutoRAG[gpu,parse]\"\n```\n\n# 数据生成\n\n\u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAutoRAG\u002FAutoRAG-data-creation\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_0ac86e2beae4.png\" alt=\"Hugging Face 贴纸\" style=\"width:200px;height:auto;\">\n\u003C\u002Fa>\n\n![图片](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_e071efdb0c80.png)\n\n![图片](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_d6bcbcdbd8d8.png)\n\nRAG 优化需要两种类型的数据：问答数据集和语料库数据集。\n\n1. **问答** 数据集文件 (qa.parquet)\n2. **语料库** 数据集文件 (corpus.parquet)\n\n**问答** 数据集对于准确可靠的评估和优化至关重要。\n\n**语料库** 数据集则直接关系到 RAG 的性能。这是因为 RAG 会利用语料库检索文档，并基于这些文档生成答案。\n\n### 📌 支持的数据生成模块\n\n![图片](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_49ef22523536.png)\n\n- [支持的解析模块列表](https:\u002F\u002Fedai.notion.site\u002FSupporting-Parsing-Modules-e0b7579c7c0e4fb2963e408eeccddd75?pvs=4)\n- [支持的分块模块列表](https:\u002F\u002Fedai.notion.site\u002FSupporting-Chunk-Modules-8db803dba2ec4cd0a8789659106e86a3?pvs=4)\n\n## 快速入门\n\n### 1. 解析\n\n#### 设置 YAML 文件\n\n```yaml\nmodules:\n  - module_type: langchain_parse\n    parse_method: pdfminer\n```\n\n您也可以同时使用多个解析模块。不过，在这种情况下，每次解析结果都需要返回一个新的进程。\n\n#### 开始解析\n\n只需几行代码，即可解析您的原始文档。\n\n```python\nfrom autorag.parser import Parser\n\nparser = Parser(data_path_glob=\"your\u002Fdata\u002Fpath\u002F*\")\nparser.start_parsing(\"your\u002Fpath\u002Fto\u002Fparse_config.yaml\")\n```\n\n### 2. 
分块\n\n#### 设置 YAML 文件\n\n```yaml\nmodules:\n  - module_type: llama_index_chunk\n    chunk_method: Token\n    chunk_size: 1024\n    chunk_overlap: 24\n    add_file_name: en\n```\n\n您同样可以同时使用多个分块模块。在这种情况下，您需要先用一个语料库创建问答对，再将剩余的语料库映射到问答数据中。如果分块方法不同，检索基准也会有所差异，因此我们需要将其重新映射到问答数据集中。\n\n#### 开始分块\n\n只需几行代码，即可对解析后的结果进行分块。\n\n```python\nfrom autorag.chunker import Chunker\n\nchunker = Chunker.from_parquet(parsed_data_path=\"your\u002Fparsed\u002Fdata\u002Fpath\")\nchunker.start_chunking(\"your\u002Fpath\u002Fto\u002Fchunk_config.yaml\")\n```\n\n### 3. QA 数据集创建\n\n只需几行代码即可创建 QA 数据集。\n\n```python\nimport pandas as pd\nfrom llama_index.llms.openai import OpenAI\n\nfrom autorag.data.qa.filter.dontknow import dontknow_filter_rule_based\nfrom autorag.data.qa.generation_gt.llama_index_gen_gt import (\n\tmake_basic_gen_gt,\n\tmake_concise_gen_gt,\n)\nfrom autorag.data.qa.schema import Raw, Corpus\nfrom autorag.data.qa.query.llama_gen_query import factoid_query_gen\nfrom autorag.data.qa.sample import random_single_hop\n\nllm = OpenAI()\nraw_df = pd.read_parquet(\"your\u002Fpath\u002Fto\u002Fparsed.parquet\")\nraw_instance = Raw(raw_df)\n\ncorpus_df = pd.read_parquet(\"your\u002Fpath\u002Fto\u002Fcorpus.parquet\")\ncorpus_instance = Corpus(corpus_df, raw_instance)\n\ninitial_qa = (\n\tcorpus_instance.sample(random_single_hop, n=3)\n\t.map(\n\t\tlambda df: df.reset_index(drop=True),\n\t)\n\t.make_retrieval_gt_contents()\n\t.batch_apply(\n\t\tfactoid_query_gen,  # 查询生成\n\t\tllm=llm,\n\t)\n\t.batch_apply(\n\t\tmake_basic_gen_gt,  # 答案生成（基础版）\n\t\tllm=llm,\n\t)\n\t.batch_apply(\n\t\tmake_concise_gen_gt,  # 答案生成（简洁版）\n\t\tllm=llm,\n\t)\n\t.filter(\n\t\tdontknow_filter_rule_based,  # 过滤“不知道”情况\n\t\tlang=\"en\",\n\t)\n)\n\ninitial_qa.to_parquet('.\u002Fqa.parquet', '.\u002Fcorpus.parquet')\n```\n\n# RAG 优化\n\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAutoRAG\u002FRAG-Pipeline-Optimization\">\n\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_0ac86e2beae4.png\" alt=\"Hugging Face 贴纸\" style=\"width:200px;height:auto;\">\n\u003C\u002Fa>\n\n![图片](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_07bed2d76e90.png)\n\n![rag](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_548d236ede08.png)\n\n## AutoRAG 如何优化 RAG 流程？\n\n以下是仅显示节点的 AutoRAG RAG 结构图。\n\n![图片](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_93096663760e.png)\n\n以下是展示所有节点和模块的图像。\n\n![图片](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_7bccd6bc90a5.png)\n\n![rag_opt_gif](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_5825f3c68063.png)\n\n### 📌 支持 RAG 优化的节点与模块\n\n- [支持的 RAG 模块列表](https:\u002F\u002Fedai.notion.site\u002FSupporting-Nodes-modules-0ebc7810649f4e41aead472a92976be4?pvs=4)\n\n## 指标\n\nAutoRAG 中每个节点所使用的指标如下所示。\n\n![图片](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_1af1737d419d.png)\n\n![图片](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_3664e8599f48.png)\n\n- [支持的指标列表](https:\u002F\u002Fedai.notion.site\u002FSupporting-metrics-867d71caefd7401c9264dd91ba406043?pvs=4)\n\n以下是 AutoRAG 所支持指标的详细信息。\n\n- [检索指标](https:\u002F\u002Fedai.notion.site\u002FRetrieval-Metrics-dde3d9fa1d9547cdb8b31b94060d21e7?pvs=4)\n- [检索令牌指标](https:\u002F\u002Fedai.notion.site\u002FRetrieval-Token-Metrics-c3e2d83358e04510a34b80429ebb543f?pvs=4)\n- [生成指标](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7d4a3069-9186-4854-885d-ca0f7bcc17e8)\n\n## 快速入门\n\n### 1. 
设置 YAML 文件\n\n首先，您需要为您的 RAG 优化设置配置 YAML 文件。\n\n我们强烈建议初学者使用预先准备好的配置 YAML 文件。\n\n- [获取示例 YAML](sample_config\u002Frag)\n    - [示例 YAML 指南](https:\u002F\u002Fmarker-inc-korea.github.io\u002FAutoRAG\u002Foptimization\u002Fsample_config.html)\n- [自定义 YAML 指南](https:\u002F\u002Fmarker-inc-korea.github.io\u002FAutoRAG\u002Foptimization\u002Fcustom_config.html)\n\n以下是一个使用三个检索节点、`prompt_maker` 和 `generator` 节点的配置 YAML 文件示例。\n\n```yaml\nnode_lines:\n  - node_line_name: retrieve_node_line\n    nodes:\n      - node_type: lexical_retrieval\n        strategy:\n          metrics: [ retrieval_f1, retrieval_recall, retrieval_ndcg, retrieval_mrr ]\n        top_k: 3\n        modules:\n          - module_type: bm25\n      - node_type: semantic_retrieval\n        strategy:\n          metrics: [ retrieval_f1, retrieval_recall, retrieval_ndcg, retrieval_mrr ]\n        top_k: 3\n        modules:\n          - module_type: vectordb\n            vectordb: default\n      - node_type: hybrid_retrieval\n        strategy:\n          metrics: [ retrieval_f1, retrieval_recall, retrieval_ndcg, retrieval_mrr ]\n        top_k: 3\n        modules:\n          - module_type: hybrid_rrf\n            weight_range: (4,80)\n  - node_line_name: post_retrieve_node_line\n    nodes:\n      - node_type: prompt_maker  # 设置提示生成器节点\n        strategy:\n          metrics: # 设置生成指标\n            - metric_name: meteor\n            - metric_name: rouge\n            - metric_name: sem_score\n              embedding_model: openai\n        modules:\n          - module_type: fstring\n            prompt: \"阅读段落并回答给定的问题。 \\n 问题：{query} \\n 段落：{retrieved_contents} \\n 答案：\"\n      - node_type: generator  # 设置生成器节点\n        strategy:\n          metrics: # 设置生成指标\n            - metric_name: meteor\n            - metric_name: rouge\n            - metric_name: sem_score\n              embedding_model: openai\n        modules:\n          - module_type: openai_llm\n            llm: gpt-4o-mini\n            batch: 16\n```\n\n### 2. 
运行 AutoRAG\n\n只需几行代码即可评估您的 RAG 流程。\n\n```python\nfrom autorag.evaluator import Evaluator\n\nevaluator = Evaluator(qa_data_path='your\u002Fpath\u002Fto\u002Fqa.parquet', corpus_data_path='your\u002Fpath\u002Fto\u002Fcorpus.parquet')\nevaluator.start_trial('your\u002Fpath\u002Fto\u002Fconfig.yaml')\n```\n\n或者您也可以使用命令行界面：\n\n```bash\nautorag evaluate --config your\u002Fpath\u002Fto\u002Fdefault_config.yaml --qa_data_path your\u002Fpath\u002Fto\u002Fqa.parquet --corpus_data_path your\u002Fpath\u002Fto\u002Fcorpus.parquet\n```\n\n运行完成后，您当前目录下会生成多个文件和文件夹。\n在以数字命名的试验文件夹中（如 0），您可以查看 `summary.csv` 文件，其中总结了评估结果以及适合您数据的最佳 RAG 流程。\n\n更多详细信息，请参阅文件夹结构的说明：\n[这里](https:\u002F\u002Fmarker-inc-korea.github.io\u002FAutoRAG\u002Foptimization\u002Ffolder_structure.html)。\n\n### 3. 运行仪表板\n\n您可以通过运行仪表板轻松查看结果。\n\n```bash\nautorag dashboard --trial_dir \u002Fyour\u002Fpath\u002Fto\u002Ftrial_dir\n```\n\n#### 示例仪表板\n\n![dashboard](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_2b1fb87600e3.png)\n\n### 4. 部署您的最佳 RAG 流程\n\n### 4-1. 直接作为代码运行\n\n您可以直接从试验文件夹中使用最佳的 RAG 流程。试验文件夹是运行仪表板时使用的目录。（如 0、1、2、…）\n\n```python\nfrom autorag.deploy import Runner\n\nrunner = Runner.from_trial_folder('\u002Fyour\u002Fpath\u002Fto\u002Ftrial_dir')\nrunner.run('your question')\n```\n\n### 4-2. 作为 API 服务器运行\n\n您可以将此流水线作为 API 服务器运行。\n\nAPI 端点请参见[此处](.\u002Fdocs\u002Fsource\u002Fdeploy\u002Fapi_endpoint.md)。\n\n```python\nimport nest_asyncio\nfrom autorag.deploy import ApiRunner\n\nnest_asyncio.apply()\n\nrunner = ApiRunner.from_trial_folder('\u002Fyour\u002Fpath\u002Fto\u002Ftrial_dir')\nrunner.run_api_server()\n```\n\n```bash\nautorag run_api --trial_dir your\u002Fpath\u002Fto\u002Ftrial_dir --host 0.0.0.0 --port 8000\n```\n\n该命令行工具使用提取的配置 YAML 文件。如果您想了解更多，请参阅[此处](https:\u002F\u002Fmarker-inc-korea.github.io\u002FAutoRAG\u002Ftutorial.html#extract-pipeline-and-evaluate-test-dataset)。\n\n### 4-3. 
作为 Web 界面运行\n\n您也可以将此流水线作为 Web 界面运行。\n\nWeb 界面请参见[此处](deploy\u002Fweb.md)。\n\n```bash\nautorag run_web --trial_path your\u002Fpath\u002Fto\u002Ftrial_path\n```\n\n#### 示例 Web 界面\n\n\u003Cimg width=\"1491\" alt=\"web_interface\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_2980764ca3af.png\">\n\n## ☎️ 常见问题解答\n\n💻 [硬件规格](https:\u002F\u002Fedai.notion.site\u002FHardware-specs-28cefcf2a26246ffadc91e2f3dc3d61c?pvs=4)\n\n⭐ [运行 AutoRAG](https:\u002F\u002Fedai.notion.site\u002FAbout-running-AutoRAG-44a8058307af42068fc218a073ee480b?pvs=4)\n\n🍯 [技巧与窍门](https:\u002F\u002Fedai.notion.site\u002FTips-Tricks-10708a0e36ff461cb8a5d4fb3279ff15?pvs=4)\n\n☎️ [故障排除](https:\u002F\u002Fmedium.com\u002F@autorag\u002Fautorag-troubleshooting-5cf872b100e3)\n\n## 感谢鸣谢\n\n### 公司\n\n\u003Ca href=\"https:\u002F\u002Fwww.linkedin.com\u002Fposts\u002Fllamaindex_rag-pipelines-have-a-lot-of-hyperparameters-activity-7182053546593247232-HFMN\u002F\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_fc4c94b10dcb.png\" alt=\"llama index\" style=\"width:200px;height:auto;\">\n\u003C\u002Fa>\n\n### 个人\n\n- [Shubham Saboo](https:\u002F\u002Fwww.linkedin.com\u002Fposts\u002Fshubhamsaboo_just-found-the-solution-to-the-biggest-rag-activity-7255404464054939648-ISQ8\u002F)\n- [Kalyan KS](https:\u002F\u002Fwww.linkedin.com\u002Fposts\u002Fkalyanksnlp_rag-autorag-llms-activity-7258677155574788097-NgS0\u002F)\n\n---\n\n# ✨ 贡献者 ✨\n\n感谢以下各位优秀人士：\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_readme_293b7608edb1.png\" \u002F>\n\u003C\u002Fa>\n\n# 贡献\n\n我们正在以开源方式开发 AutoRAG。\n\n因此，本项目欢迎各种贡献和建议。欢迎您参与其中。\n\n此外，您还可以查阅我们的详细文档[此处](https:\u002F\u002Fmarker-inc-korea.github.io\u002FAutoRAG\u002Findex.html)。\n\n## 
引用\n\n```bibtex\n@misc{kim2024autoragautomatedframeworkoptimization,\n      title={AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline},\n      author={Dongkyu Kim and Byoungwook Kim and Donggeon Han and Matouš Eibich},\n      year={2024},\n      eprint={2410.20878},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.20878},\n}\n```","# AutoRAG 快速上手指南\n\nAutoRAG 是一款 RAG（检索增强生成）AutoML 工具，旨在自动为您的数据寻找最优的 RAG 流水线。它支持自动评估多种 RAG 模块组合，帮助您快速确定最适合自身业务场景的方案。\n\n## 环境准备\n\n*   **操作系统**：Linux, macOS, Windows\n*   **Python 版本**：推荐 **Python 3.10** 或更高版本\n*   **硬件要求**：\n    *   基础使用：CPU 即可\n    *   本地模型运行或大规模数据处理：建议配备 NVIDIA GPU\n*   **前置依赖**：确保已安装 `pip` 包管理工具\n\n## 安装步骤\n\n### 1. 基础安装\n适用于大多数标准场景：\n```bash\npip install AutoRAG\n```\n\n### 2. GPU 加速版（推荐）\n如果您计划使用本地大模型或进行高性能计算：\n```bash\npip install \"AutoRAG[gpu]\"\n```\n\n### 3. 完整功能版（含解析模块）\n如果您需要处理 PDF 等原始文档进行数据解析：\n```bash\npip install \"AutoRAG[gpu,parse]\"\n```\n\n> **提示**：国内用户若下载缓慢，可添加清华源或阿里源加速：\n> `pip install AutoRAG -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n## 基本使用\n\nAutoRAG 的核心工作流分为三个阶段：**数据创建** -> **配置流水线** -> **自动优化评估**。\n\n### 第一步：准备数据\nAutoRAG 需要两种核心数据文件：\n1.  **QA 数据集** (`qa.parquet`)：用于评估的问题与标准答案。\n2.  
**语料库数据集** (`corpus.parquet`)：用于检索的知识文档切片。\n\n您可以使用 AutoRAG 内置模块从原始文档生成这些数据，或直接加载已有的 parquet 文件。以下是一个通过代码生成 QA 数据的简化示例：\n\n```python\nimport pandas as pd\nfrom llama_index.llms.openai import OpenAI\nfrom autorag.data.qa.schema import Raw, Corpus\nfrom autorag.data.qa.query.llama_gen_query import factoid_query_gen\nfrom autorag.data.qa.generation_gt.llama_index_gen_gt import make_basic_gen_gt, make_concise_gen_gt\nfrom autorag.data.qa.filter.dontknow import dontknow_filter_rule_based\nfrom autorag.data.qa.sample import random_single_hop\n\n# 初始化 LLM\nllm = OpenAI()\n\n# 加载已解析和切片的数据 (假设已完成 parsing 和 chunking)\nraw_df = pd.read_parquet(\"your\u002Fpath\u002Fto\u002Fparsed.parquet\")\nraw_instance = Raw(raw_df)\n\ncorpus_df = pd.read_parquet(\"your\u002Fpath\u002Fto\u002Fcorpus.parquet\")\ncorpus_instance = Corpus(corpus_df, raw_instance)\n\n# 生成 QA 对并保存\ninitial_qa = (\n\tcorpus_instance.sample(random_single_hop, n=3)\n\t.map(lambda df: df.reset_index(drop=True))\n\t.make_retrieval_gt_contents()\n\t.batch_apply(factoid_query_gen, llm=llm)\n\t.batch_apply(make_basic_gen_gt, llm=llm)\n\t.batch_apply(make_concise_gen_gt, llm=llm)\n\t.filter(dontknow_filter_rule_based, lang=\"en\")\n)\n\ninitial_qa.to_parquet('.\u002Fqa.parquet', '.\u002Fcorpus.parquet')\n```\n\n### 第二步：配置优化策略 (YAML)\n创建一个配置文件（例如 `config.yaml`），定义您想要测试的检索器、提示词构建器和生成器组合。AutoRAG 将自动遍历这些组合并评分。\n\n```yaml\nnode_lines:\n  - node_line_name: retrieve_node_line\n    nodes:\n      - node_type: lexical_retrieval\n        strategy:\n          metrics: [ retrieval_f1, retrieval_recall ]\n        top_k: 3\n        modules:\n          - module_type: bm25\n      - node_type: semantic_retrieval\n        strategy:\n          metrics: [ retrieval_f1, retrieval_recall ]\n        top_k: 3\n        modules:\n          - module_type: vectordb\n            vectordb: default\n  - node_line_name: post_retrieve_node_line\n    nodes:\n      - node_type: prompt_maker\n        strategy:\n          metrics:\n            - metric_name: 
meteor\n            - metric_name: rouge\n        modules:\n          - module_type: fstring\n            prompt: \"Read the passages and answer the given question. \\n Question: {query} \\n Passage: {retrieved_contents} \\n Answer : \"\n      - node_type: generator\n        strategy:\n          metrics:\n            - metric_name: meteor\n            - metric_name: rouge\n        modules:\n          - module_type: openai_llm\n            llm: gpt-4o-mini\n            batch: 16\n```\n\n### 第三步：运行评估\n使用 Python 脚本或命令行启动自动优化过程。\n\n**方式 A：Python 代码**\n```python\nfrom autorag.evaluator import Evaluator\n\nevaluator = Evaluator(qa_data_path='your\u002Fpath\u002Fto\u002Fqa.parquet', corpus_data_path='your\u002Fpath\u002Fto\u002Fcorpus.parquet')\nevaluator.start_trial('your\u002Fpath\u002Fto\u002Fconfig.yaml')\n```\n\n**方式 B：命令行**\n```bash\nautorag evaluate --config your\u002Fpath\u002Fto\u002Fconfig.yaml --qa_data_path your\u002Fpath\u002Fto\u002Fqa.parquet --corpus_data_path your\u002Fpath\u002Fto\u002Fcorpus.parquet\n```\n\n### 第四步：查看结果与部署\n运行完成后，当前目录下会生成以数字命名的试验文件夹（如 `0`, `1`...）。\n*   查看 `summary.csv`：包含所有测试组合的评分排名及最佳流水线配置。\n*   部署：根据生成的最佳配置，您可以直接加载该流水线用于生产环境。","某电商公司的算法团队正致力于构建一个智能客服系统，需要让 AI 基于海量商品说明书和售后政策文档，准确回答用户的具体咨询。\n\n### 没有 AutoRAG 时\n- **盲目试错成本高**：面对数十种检索器、分块策略和生成模型的组合，工程师只能凭经验手动搭建并逐一测试，耗时数周仍难确定最佳方案。\n- **评估标准不统一**：缺乏自动化的评估框架，不同 pipeline 的效果对比依赖人工抽检，主观性强且难以量化“哪个更适合自家数据”。\n- **优化迭代缓慢**：一旦业务数据更新或场景微调，重新调整参数和模块需要重复大量机械性工作，导致模型上线周期被严重拉长。\n- **资源浪费严重**：为了追求效果，往往默认堆砌高性能大模型和复杂检索逻辑，却忽略了针对特定数据集的轻量化最优解。\n\n### 使用 AutoRAG 后\n- **自动化寻优**：AutoRAG 像 AutoML 一样自动遍历多种 RAG 模块组合，利用专属评估数据在几小时内即可锁定针对该电商数据的最佳 Pipeline。\n- **量化决策依据**：内置多维度的自动评估指标，直观展示不同配置下的准确率与召回率，让技术选型从“拍脑袋”变为“看数据”。\n- **敏捷迭代部署**：当新增促销规则文档时，只需重新运行配置，AutoRAG 能快速验证并输出新的最优策略，大幅缩短模型更新周期。\n- **性价比最大化**：通过自动搜索，发现了一套在保持高回答质量的同时，计算资源消耗更低的轻量级组合，显著降低了推理成本。\n\nAutoRAG 将原本繁琐的 RAG 
调优过程转化为自动化的数据驱动决策，帮助团队以最低成本快速落地最适合自身业务的最优检索增强生成方案。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMarker-Inc-Korea_AutoRAG_7f9f3fc6.png","Marker-Inc-Korea","Markr.AI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FMarker-Inc-Korea_64669623.png","ML research company for industrial use",null,"cheol@markr.ai","markr.ai","https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea",[81,85],{"name":82,"color":83,"percentage":84},"Python","#3572A5",99.8,{"name":86,"color":87,"percentage":88},"HTML","#e34c26",0.2,4709,392,"2026-04-18T19:53:50","Apache-2.0","未说明","非必需。仅在运行本地模型（local models）时需要安装 GPU 版本（AutoRAG[gpu]），具体显卡型号、显存大小及 CUDA 版本未在文档中明确指定。",{"notes":96,"python":97,"dependencies":98},"该工具是一个 RAG AutoML 框架，基础安装仅需 'pip install AutoRAG'。若需使用本地模型进行优化或解析 PDF 等文档，需分别安装 'AutoRAG[gpu]' 或 'AutoRAG[gpu,parse]' 扩展包。运行前必须准备 QA 数据集 (qa.parquet) 和语料库数据集 (corpus.parquet)。支持通过 YAML 配置文件灵活定义检索、提示词生成和生成器节点。","3.10+",[99,100,101,102,103],"langchain","llama_index","pandas","pdfminer (可选，用于解析)","openai (用于 LLM 和 Embedding)",[14,35,105,16],"其他",[107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124],"analysis","automl","benchmarking","document-parser","embeddings","evaluation","llm","llm-evaluation","llm-ops","open-source","ops","optimization","pipeline","python","qa","rag","rag-evaluation","retrieval-augmented-generation","2026-03-27T02:49:30.150509","2026-04-20T04:06:34.108709",[128,133,138,143,148,152],{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},43258,"如何在运行评估时跳过验证步骤以避免 'doc_id not found' 错误？","当使用 passage augmenter 节点但验证功能尚未完善时，可能会遇到 'ValueError: doc_id not found in corpus_data' 错误。可以通过以下两种方式跳过验证：\n1. 命令行方式：添加 --skip_validation true 参数\n   autorag evaluate --config your\u002Fpath\u002Fto\u002Fdefault_config.yaml --qa_data_path your\u002Fpath\u002Fto\u002Fqa.parquet --corpus_data_path your\u002Fpath\u002Fto\u002Fcorpus.parquet --project_dir .\u002Fyour\u002Fproject\u002Fdirectory --skip_validation true\n2. 
Python SDK 方式：在 start_trial 中设置 skip_validation=True\n   from autorag.evaluator import Evaluator\n   evaluator = Evaluator(qa_data_path='...', corpus_data_path='...', project_dir='...')\n   evaluator.start_trial('...', skip_validation=True)\n或者暂时不要在配置文件中启用 passage augmenter 节点。","https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fissues\u002F854",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},43259,"是否支持 Ollama 和 Hugging Face 模型，是否有相关的 Colab 示例？","目前官方已不再维护 Colab 笔记本。但是用户可以在本地通过 YAML 配置文件使用 Ollama 和 Hugging Face 模型。配置示例如下：\n在 vectordb 部分指定 embedding_model 为 huggingface_baai_bge_small；\n在 generator 节点中指定 module_type 为 vllm 以支持本地模型推理。\n虽然官方没有提供直接的 Colab 链接，但社区用户分享了相关的 notebook 代码供参考。","https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fissues\u002F1134",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},43260,"如何在运行 Web 界面（run_web）时添加自定义的 Hugging Face Embedding 模型？","目前 run_web 仅支持 CLI 启动，不像 run_api_server() 那样支持通过 Python SDK 动态注册自定义模型（如 autorag.embedding_models[\"custom\"] = ...）。这是一个已知的使用限制。如果遇到相关异步事件循环错误（如 'There is no current event loop in thread'），可能是由于 Gradio 与 asyncio\u002Fanyio 的兼容性问题，建议关注后续的代码重构或尝试使用 API 服务器模式代替 Web 模式进行自定义模型的部署。","https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fissues\u002F616",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},43261,"如何为项目贡献新 Logo 或参与设计工作？","项目欢迎设计师或擅长编写 Midjourney 提示词的用户贡献新的 Logo 设计。贡献者可以将设计提案提交到 Issue 中，一旦被采纳合并，即可成为官方贡献者。例如，新的 AutoRAG Logo 就是由社区成员 @taehallm 提出并被采纳的。此外，项目计划同步更新 Discord 服务器的 Logo。","https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fissues\u002F623",{"id":149,"question_zh":150,"answer_zh":151,"source_url":142},43262,"在进行代码合并或贡献前，如何处理代码格式化和 Linter 检查？","为了避免合并冲突并保持代码风格一致，建议在提交前使用 ruff 工具进行格式化和检查。具体步骤如下：\n1. 确保项目根目录下的 pyproject.toml 中包含 ruff 配置。\n2. 
运行以下命令自动修复问题并格式化代码：\n   ruff check --fix\n   ruff format\n项目已在相关版本中集成了 ruff Linter 并对所有代码进行了重新格式化。",{"id":153,"question_zh":154,"answer_zh":155,"source_url":132},43263,"AutoRAG 的数据创建教程与 Step2 笔记本中的 QA-Corpus 映射逻辑为何不一致？","教程文档和笔记本在数据流展示上可能存在差异，但核心逻辑是一致的：\n1. 初始阶段：从解析后的原始数据（initial_raw_df）生成初始分块（chunk），并基于此采样生成初始 QA 对。\n2. 优化阶段：利用 Chunker 对原始数据进行多种策略的分块，生成新的语料库。\n3. 映射更新：使用新的语料库实例更新现有的 QA 数据（qa.update_corpus），确保 QA 对与新分块内容的正确映射。\n如果在实际操作中遇到困惑，请确保 'raw' 实例始终指向最初的解析数据，而 'corpus' 实例指向当前使用的分块数据。",[157,162,167,172,177,182,187,192,197,202,207,212,217,222,227,232,237,242,247,252],{"id":158,"version":159,"summary_zh":160,"released_at":161},342935,"v0.3.22","## 变更内容\n* [功能请求] @hypoxisaurea 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1199 中新增了 NVIDIA 重排序器模块\n* 功能：@octo-patch 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1204 中将 MiniMax LLM 添加为一级生成器模块\n* 修复：@sebastiondev 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1205 中对 BM25 语料文件使用受限的反序列化器，以防止任意代码执行（CWE-502）\n* 将 LangChain 升级至 v1，并强化可选集成的回退机制，由 @vkehfdl1 完成，详见 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1211\n\n## 新贡献者\n* @hypoxisaurea 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1199 中完成了首次贡献\n* @octo-patch 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1204 中完成了首次贡献\n* @sebastiondev 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1205 中完成了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.21...v0.3.22","2026-04-03T15:50:54",{"id":163,"version":164,"summary_zh":165,"released_at":166},342936,"v0.3.21","## 变更内容\n* 移除动态版本号，并将构建后端替换为 hatchling。添加 ty… 由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1186 中完成\n\n\n**完整变更日志**: 
https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.20...v0.3.21","2025-11-14T05:40:08",{"id":168,"version":169,"summary_zh":170,"released_at":171},342937,"v0.3.20","## 变更内容\n* @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1160 中添加了 vllm 嵌入模型\n* @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1176 中删除了与 GUI 相关的代码，以精简代码库\n* @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1178 中重构了文档及 PyPI 发布的 GitHub Actions\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.19...v0.3.20","2025-11-14T05:28:38",{"id":173,"version":174,"summary_zh":175,"released_at":176},342938,"v0.3.19","## 变更内容\n* @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1150 中启用了使用 openai_llm、llama_index_llm 和 vllm_api 的聊天提示功能。\n* @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1151 中修改了 README.md，添加了新的徽章和图片。\n* @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1153 中修复了生成器模块。\n* @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1155 中修复了一些错误，并发布了 0.3.19 版本。\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.18...v0.3.19","2025-10-13T05:32:49",{"id":178,"version":179,"summary_zh":180,"released_at":181},342939,"v0.3.18","## 变更内容\n* 添加 chat_fstring 模块，并在 vLLM 中支持聊天功能（+推理）——由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1141 中完成\n* 修复较新版本 vLLM 中的 model_executor 删除错误——由 @Copilot 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1144 中完成\n* 为 AutoRAG 开发添加全面的 GitHub Copilot 使用说明——由 @Copilot 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1146 中完成\n* 将 Cohere 版本升级至 v.3.18.0——由 
@vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1147 中完成\n* 发布版本 0.3.18——由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1148 中完成\n\n## 新贡献者\n* @Copilot 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1144 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.17...v0.3.18","2025-09-20T07:26:37",{"id":183,"version":184,"summary_zh":185,"released_at":186},342940,"v0.3.17","## 变更内容\n* Hugging Face 模型在摄取数据时会自动禁用异步模式，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1126 中实现。\n* 在 sphinx.yml 中添加 uv run sphinx-build 命令，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1128 中完成。\n* 将 OpenAI 版本更新至最新，并替换 openai.resources.beta… 相关内容，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1131 中完成。\n* 在 \u002F.github\u002Fworkflows 中将 tj-actions\u002Fchanged-files 从 44 升级至 46，由 @dependabot[bot] 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1129 中执行。\n* 在 \u002Fapi 中将 Redis 从 4.5.1 升级至 4.5.4，由 @dependabot[bot] 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1083 中完成。\n* 将检索模块拆分为“混合”、“语义”和“词法”三个部分，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1135 中实现。\n* 发布版本 v0.3.17，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1136 中完成。\n* 尝试使用更新的 Pydantic 版本，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1139 中提出。\n\n## 新贡献者\n* @dependabot[bot] 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1129 中完成了首次贡献。\n\n**完整变更日志**: 
https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.16...v0.3.17","2025-08-31T07:18:43",{"id":188,"version":189,"summary_zh":190,"released_at":191},342941,"v0.3.16","## 变更内容\n* 由 @zenoengine 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1098 中修复了 README.md 中的失效链接。\n* 由 @sappho192 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1103 中将日文提示语修正得更加自然。\n* 由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1108 中防止了 Unicode 解码错误。\n* 由 @parssky 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1109 中为 semantic splitter llama_index 模块添加了对 openai_like 嵌入模型的支持。\n* 由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1111 中将 Chroma 升级到最新版本，并切换至 uv。\n* 由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1116 中移除了 auto-rag.com URL 的使用。\n* 由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1119 中解决了依赖冲突，并发布了 v0.3.15 版本。\n* 由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1121 中修复了 YAML 文件。\n\n## 新贡献者\n* @zenoengine 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1098 中完成了首次贡献。\n* @sappho192 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1103 中完成了首次贡献。\n* @parssky 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1109 中完成了首次贡献。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.14...v0.3.16","2025-06-22T15:09:16",{"id":193,"version":194,"summary_zh":195,"released_at":196},342942,"v0.3.14","## 变更内容\n* 将 AutoRAG 改造为 Monorepo 项目，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F960 中完成。\n* 将安装方式改为使用 yarn install，由 @vkehfdl1 在 
https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1090 中完成。\n* 使用 Docker Compose 运行 GUI Next.js 应用程序，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1092 中完成。\n* 在 openai_llm.py 中添加 gpt-4.5-preview 模型，由 @minsing-jin 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1095 中完成。\n* 发布版本 0.3.14，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1097 中完成。\n\n## 新贡献者\n* @minsing-jin 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1095 中完成了首次贡献。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.13...v0.3.14","2025-03-03T06:29:44",{"id":198,"version":199,"summary_zh":200,"released_at":201},342943,"v0.3.13","## 变更内容\n* 🚑 修复：更新 API 服务的容器镜像标签，以使用最新版本……由 @hongsw 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1048 中完成\n* 从段落过滤模块的 kwargs 中移除 embedding_model 参数，由 @rjwharry 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1043 中完成\n* 在 \u002Fv1\u002Fretrieve 端点中添加得分及其他元数据，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1055 中完成\n* 更新 Sphinx 的 GitHub Actions 配置，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1059 中完成\n* 修复：由于 pyproject.toml 格式错误导致 Poetry shell 启动失败的问题，由 @korjsh 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1051 中完成\n* 新功能：将 intfloat\u002Fmultilingual-e5-large-instruct 添加到嵌入模型列表中，由 @e7217 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1061 中完成\n* 添加 Cohere 重排序模型 v3.5，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1050 中完成\n* 新功能：启用动态嵌入模型 #1060，由 @e7217 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1063 中完成\n* 实现 Node Generator 模块，用于与 vllm 
服务 API 集成，由 @korjsh 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1062 中完成\n* 添加 Llama Index 的 Ollama 嵌入模型，并修复 embedding_model 的类型提示，由 @rjwharry 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1071 中完成\n* 修复：解析结果中存在重复项的问题（#1064），通过拼接每个解析结果来解决，由 @e7217 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1066 中完成\n* 发布版本 v0.3.13，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1075 中完成\n\n## 新贡献者\n* @korjsh 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1051 中完成了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.12...v0.3.13","2025-01-25T05:28:12",{"id":203,"version":204,"summary_zh":205,"released_at":206},342944,"v0.3.12","## 变更内容\n* 在 CLI 中，默认的 API 远程设置现为 False，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1028 中实现。\n* [热修复] 修复 AutoRAG API 错误，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1032 中完成。\n* 文档：更新 Milvus 配置示例，由 @e7217 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1030 中完成。\n* 添加关于移除与文件名相关文件的说明，由 @vkehfdl1 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1038 中完成。\n* 修改文件类型逻辑，由 @bwook00 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1044 中完成。\n* 将 all_files 设置为 True 时，更改 parse_result Parquet 文件的命名规则，由 @bwook00 在 https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1046 中完成。\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.11...v0.3.12","2024-12-09T08:40:05",{"id":208,"version":209,"summary_zh":210,"released_at":211},342945,"v0.3.11","## What's Changed\r\n* docs[fix]: modify contents on upstage parser by @e7217 in 
https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F967\r\n* Resolve Pydantic 2.10.0 conflict issue with latest LlamaIndex by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F973\r\n* Add Qdrant vectorDB by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F976\r\n* Replace to local embeddings at the gpu sample config YAML files by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F988\r\n* Add full YAML at vectorDB integration docs by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F994\r\n* feat: Set parameters for Milvus using the configuration file #998 by @e7217 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1001\r\n* docs: add warning about `AttributeError: vllm_model` by @e7217 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1014\r\n* fix: refactor  method to properly release vllm instance resources by @e7217 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1012\r\n* support parsing multiple types of documents at once by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1015\r\n* feat: add func to generate multiple quries by @e7217 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1009\r\n* Release\u002Fv0.3.11 by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F1017\r\n\r\n## New Contributors\r\n* @e7217 made their first contribution in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F967\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.10...v0.3.11","2024-11-29T13:40:33",{"id":213,"version":214,"summary_zh":215,"released_at":216},342946,"v0.3.10","## What's 
Changed\r\n* Add Integration part at docs by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F939\r\n* Update README.md by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F941\r\n* Add Weaviate VectorDB by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F949\r\n* add documentation for evaluate your custom rag by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F953\r\n* Add \u002Fv1\u002Fretrieve endpoint at API server by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F954\r\n* Add pinecone vector DB by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F955\r\n* Add Couchbase VectorDB by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F961\r\n* Release\u002Fv0.3.10 by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F964\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.9...v0.3.10","2024-11-20T06:02:56",{"id":218,"version":219,"summary_zh":220,"released_at":221},342947,"v0.3.9","## What's Changed\r\n* Edit documentation about data schema and descriptions by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F905\r\n* autorag —version by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F913\r\n* [Hotfix] fix hf space url at README.md by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F917\r\n* ✨ feat: improve sample size handling in Validator class by @hongsw in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F912\r\n* Fix error that missing init of huggingface llm and ollama by @vkehfdl1 in 
https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F919\r\n* Fix: added table_html variable initialization by @effortprogrammer in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F920\r\n* enhanced documentation at custom LLM models by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F923\r\n* just return original texts when there is no corresponding tokenizer a… by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F931\r\n* add Arxiv citation for our paper by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F932\r\n* delete tqdm by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F934\r\n* add demojize with emoji package by @rjwharry in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F935\r\n* Release\u002Fv0.3.9 by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F936\r\n\r\n## New Contributors\r\n* @effortprogrammer made their first contribution in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F920\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.8...v0.3.9","2024-11-11T02:33:16",{"id":223,"version":224,"summary_zh":225,"released_at":226},342948,"v0.3.8","## What's Changed\r\n* Feature\u002Fdocker deploy push by @hongsw in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F887\r\n* Edit stream API endpoint and add instructions deploying kotaemon to fly.io by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F891\r\n* Delete trial path logic at parse & chunk + add detail docs & tutorial at docs by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F894\r\n* 
Feature\u002F#892 by @rjwharry in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F895\r\n* Add documentation for custom_query_gen and make_custom_gen_gt function by @rjwharry in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F897\r\n* Edit api routes url by @eduumach in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F899\r\n* Add test code for query expansion with vectordb  by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F902\r\n* Add progress bar by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F903\r\n* dump version 0.3.8 by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F904\r\n\r\n## New Contributors\r\n* @eduumach made their first contribution in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F899\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.7...v0.3.8","2024-10-30T14:53:35",{"id":228,"version":229,"summary_zh":230,"released_at":231},342949,"v0.3.7","## What's Changed\r\n* fix the error and release 0.3.5-rc1 by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F842\r\n* Add Huggingface Space at README.md by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F847\r\n* Add new Sample YAML file by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F848\r\n* Fix README.md by @Jake-Song in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F850\r\n* Add AWS Bedrock llm and upgrade VERSION 0.3.6 by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F856\r\n* Add roadmap and other badges at README.md by @vkehfdl1 in 
https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F862\r\n* Add use multimodal feature at llama parse by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F868\r\n* ✨ feat: Update supporting nodes and modules information in index.md by @hongsw in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F859\r\n* Add External VectorDB Connections by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F872\r\n* Release\u002Fv0.3.7 by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F883\r\n\r\n## New Contributors\r\n* @Jake-Song made their first contribution in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F850\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.5...v0.3.7","2024-10-24T03:44:36",{"id":233,"version":234,"summary_zh":235,"released_at":236},342950,"v0.3.5","## What's Changed\r\n* Run validation at the start_trial  by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F826\r\n* AutoRAG api version & api docker container + gpu version docker container by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F823\r\n* Add FlashRank Reranker module by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F818\r\n* set the fixed port number of the panel dashboard by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F827\r\n* change stream to astream, and add non-async stream function by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F835\r\n* add setup python at sphinx.yml by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F836\r\n* Change recency filter 
parameter name to threshold_datetime from threshold by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F837\r\n* Release\u002Fv0.3.5 by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F838\r\n* [Hotfix] name change Konlpy at chunk_full.yaml by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F840\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.4...v0.3.5","2024-10-13T11:55:41",{"id":238,"version":239,"summary_zh":240,"released_at":241},342951,"v0.3.4","## What's Changed\r\n* Add OpenVINO Reranker module by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F808\r\n* Properly truncate to 8000 tokens when we use OpenAI Embeddings by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F812\r\n* Refactor API server with streaming and passage return  by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F810\r\n* ✨ feat: Added Docker push workflow, Dockerfile updates, and build script by @hongsw in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F807\r\n* Add VoyageAI Reranker module by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F809\r\n* calculate the right cosine similarity score at the get_id_scores by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F816\r\n* 日本語対応 by @wooheum-xin in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F814\r\n* Add Mixedbread AI Reranker Module by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F805\r\n* Release\u002Fv0.3.4 by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F813\r\n\r\n## New 
Contributors\r\n* @wooheum-xin made their first contribution in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F814\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.3...v0.3.4","2024-10-09T11:33:28",{"id":243,"version":244,"summary_zh":245,"released_at":246},342952,"v0.3.3","## What's Changed\r\n* [Parse Bug] Fix only parse the first page of the whole pdf files by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F783\r\n* [Parse Bug] Add non-table exists page to use clova.py by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F784\r\n* Prevent error that httpx uses different event loop at method chaining on the QA  by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F785\r\n* add deepeval metrics by @Eastsidegunn in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F750\r\n* Release\u002Fv0.3.3 by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F803\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.2...v0.3.3","2024-10-05T07:52:06",{"id":248,"version":249,"summary_zh":250,"released_at":251},342953,"v0.3.2","## What's Changed\r\n* [Hotfix] Fix parse path at support.py by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F778\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.1...v0.3.2","2024-10-03T02:30:09",{"id":253,"version":254,"summary_zh":255,"released_at":256},342954,"v0.3.1","## What's Changed\r\n* Add toctree by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F745\r\n* Fix minor errors at the documentations by @vkehfdl1 in 
https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F747\r\n* add effective_order at bleu as True by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F748\r\n* add passage dependency filter at data creation by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F751\r\n* Add Passage Dependency at README.md by @bwook00 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F761\r\n* docs: update data_format.md by @eltociear in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F772\r\n* change the README and tutorial of deploying the result. by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F769\r\n* Windows support (partially) AutoRAG by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F766\r\n* Feature\u002Fhongsw\u002F671 dockerfile Add Dockerfile and Docker configuration for AutoRAG production environment by @hongsw in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F763\r\n* Add total three evolving methods to QA creation by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F767\r\n* Possible error when the QA retrieval_gt shape will be different by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F774\r\n* dump version 0.3.1 by @vkehfdl1 in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F776\r\n\r\n## New Contributors\r\n* @eltociear made their first contribution in https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fpull\u002F772\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FMarker-Inc-Korea\u002FAutoRAG\u002Fcompare\u002Fv0.3.0...v0.3.1","2024-10-02T05:06:43"]