[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-opendatalab--PDF-Extract-Kit":3,"tool-opendatalab--PDF-Extract-Kit":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":83,"stars":88,"forks":89,"last_commit_at":90,"license":91,"difficulty_score":10,"env_os":92,"env_gpu":93,"env_ram":92,"env_deps":94,"category_tags":106,"github_topics":79,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":107,"updated_at":108,"faqs":109,"releases":140},2724,"opendatalab\u002FPDF-Extract-Kit","PDF-Extract-Kit","A Comprehensive Toolkit for High-Quality PDF Content Extraction","PDF-Extract-Kit 是一款专为高质量提取复杂 PDF 文档内容而设计的开源工具包。它有效解决了传统方法在处理包含数学公式、表格、多栏排版及扫描图片的文档时，常出现的布局混乱、公式识别错误或文字丢失等难题。\n\n该工具集成了业界领先的文档解析模型，涵盖布局检测、公式定位与识别（转为 LaTeX）、高精度 OCR 文字提取以及表格还原等核心任务。其独特的技术亮点在于采用了模块化设计，用户只需调整配置文件或少量代码，即可像搭积木一样灵活组合不同模型，构建定制化的文档处理应用。此外，项目还提供了全面的评估基准，帮助用户根据具体场景选择最优模型组合。\n\nPDF-Extract-Kit 主要面向开发者和研究人员。如果您希望基于大模型构建文档翻译、智能问答或数字助手等应用，它可以作为强大的底层引擎提供精准的结构化数据。对于需要深入探索文档解析算法的研究者，它也是一个理想的实验平台。值得注意的是，若您的目标仅是快速将 PDF 转换为 Markdown 格式，官方推荐配合使用基于此工具构建的终端应用 MinerU，以获得更便捷的开箱即用体验。","\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_e3cec2cb0262.png\" width=\"220px\" 
style=\"vertical-align:middle;\">\n\u003C\u002Fp>\n\n\u003Cdiv align=\"center\">\n\nEnglish | [简体中文](.\u002FREADME_zh-CN.md)\n\n[PDF-Extract-Kit-1.0 Tutorial](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Fget_started\u002Fpretrained_model.html)\n\n[[Models (🤗Hugging Face)]](https:\u002F\u002Fhuggingface.co\u002Fopendatalab\u002FPDF-Extract-Kit-1.0) | [[Models(\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_375e99f5914e.png\" width=\"20px\">ModelScope)]](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FOpenDataLab\u002FPDF-Extract-Kit-1.0) \n \n🔥🔥🔥 [MinerU: Efficient Document Content Extraction Tool Based on PDF-Extract-Kit](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU)\n\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n    👋 join us on \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTdedn9GTXq\" target=\"_blank\">Discord\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fr.vansin.top\u002F?r=MinerU\" target=\"_blank\">WeChat\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n## Overview\n\n`PDF-Extract-Kit` is a powerful open-source toolkit designed to efficiently extract high-quality content from complex and diverse PDF documents. 
Here are its main features and advantages:\n\n- **Integration of Leading Document Parsing Models**: Incorporates state-of-the-art models for layout detection, formula detection, formula recognition, OCR, and other core document parsing tasks.\n- **High-Quality Parsing Across Diverse Documents**: Fine-tuned with diverse document annotation data to deliver high-quality results across various complex document types.\n- **Modular Design**: The flexible modular design allows users to easily combine and construct various applications by modifying configuration files and minimal code, making application building as straightforward as stacking blocks.\n- **Comprehensive Evaluation Benchmarks**: Provides diverse and comprehensive PDF evaluation benchmarks, enabling users to choose the most suitable model based on evaluation results.\n\n**Experience PDF-Extract-Kit now and unlock the limitless potential of PDF documents!**\n\n> **Note:** PDF-Extract-Kit is designed for high-quality document processing and functions as a model toolbox.    \n> If you are interested in extracting high-quality document content (e.g., converting PDFs to Markdown), please use [MinerU](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU), which combines the high-quality predictions from PDF-Extract-Kit with specialized engineering optimizations for more convenient and efficient content extraction.    \n> If you're a developer looking to create engaging applications such as document translation, document Q&A, or document assistants, you'll find it very convenient to build your own projects using PDF-Extract-Kit. 
In particular, we will periodically update the PDF-Extract-Kit\u002Fproject directory with interesting applications, so stay tuned!\n\n**We welcome researchers and engineers from the community to contribute outstanding models and innovative applications by submitting PRs to become contributors to the PDF-Extract-Kit project.**\n\n## Model Overview\n\n| **Task Type**     | **Description**                                                                 | **Models**                    |\n|-------------------|---------------------------------------------------------------------------------|-------------------------------|\n| **Layout Detection** | Locate different elements in a document: including images, tables, text, titles, formulas | `DocLayout-YOLO_ft`, `YOLO-v10_ft`, `LayoutLMv3_ft` | \n| **Formula Detection** | Locate formulas in documents: including inline and block formulas            | `YOLOv8_ft`                   |  \n| **Formula Recognition** | Recognize formula images into LaTeX source code                             | `UniMERNet`                   |  \n| **OCR**           | Extract text content from images (including location and recognition)            | `PaddleOCR`                   | \n| **Table Recognition** | Recognize table images into corresponding source code (LaTeX\u002FHTML\u002FMarkdown)   | `PaddleOCR+TableMaster`, `StructEqTable` |  \n| **Reading Order** | Sort and concatenate discrete text paragraphs                                    | Coming Soon!                  | \n\n## News and Updates\n- `2024.10.22` 🎉🎉🎉 We are excited to announce that the table recognition model [StructTable-InternVL2-1B](https:\u002F\u002Fhuggingface.co\u002FU4R\u002FStructTable-InternVL2-1B), which supports LaTeX, HTML, and Markdown output, has been officially integrated into `PDF-Extract-Kit 1.0`. 
Please refer to the [table recognition algorithm documentation](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Falgorithm\u002Ftable_recognition.html) for usage instructions!\n- `2024.10.17` 🎉🎉🎉 We are excited to announce that the more accurate and faster layout detection model, [DocLayout-YOLO](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FDocLayout-YOLO), has been officially integrated into `PDF-Extract-Kit 1.0`. Please refer to the [layout detection algorithm documentation](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Falgorithm\u002Flayout_detection.html) for usage instructions!\n- `2024.10.10` 🎉🎉🎉 The official release of `PDF-Extract-Kit 1.0`, rebuilt with modularity for more convenient and flexible model usage! Please switch to the [release\u002F0.1.1](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Ftree\u002Frelease\u002F0.1.1) branch for the old version.\n- `2024.08.01` 🎉🎉🎉 Added the [StructEqTable](demo\u002FTabRec\u002FStructEqTable\u002FREADME_TABLE.md) module for table content extraction. Welcome to use it!\n- `2024.07.01` 🎉🎉🎉 We released `PDF-Extract-Kit`, a comprehensive toolkit for high-quality PDF content extraction, including `Layout Detection`, `Formula Detection`, `Formula Recognition`, and `OCR`.\n\n## Performance Demonstration\n\nMany current open-source SOTA models are trained and evaluated on academic datasets, achieving high-quality results only on single document types. To enable models to achieve stable and robust high-quality results on diverse documents, we constructed diverse fine-tuning datasets and fine-tuned some SOTA models to obtain practical parsing models. Below are some visual results of the models.\n\n### Layout Detection\n\nWe trained robust `Layout Detection` models using diverse PDF document annotations. 
Our fine-tuned models achieve accurate extraction results on diverse PDF documents such as papers, textbooks, research reports, and financial reports, and demonstrate high robustness to challenges like blurring and watermarks. The visualization example below shows the inference results of the fine-tuned LayoutLMv3 model.\n \n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_6594b1bf2b39.png)\n\n### Formula Detection\n\nSimilarly, we collected and annotated documents containing formulas in both English and Chinese, and fine-tuned advanced formula detection models. The visualization result below shows the inference results of the fine-tuned YOLO formula detection model:\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_22bc07ba0735.png)\n\n### Formula Recognition\n\n[UniMERNet](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FUniMERNet) is an algorithm designed for diverse formula recognition in real-world scenarios. By constructing large-scale training data and carefully designed results, it achieves excellent recognition performance for complex long formulas, handwritten formulas, and noisy screenshot formulas.\n\n### Table Recognition\n\n[StructEqTable](https:\u002F\u002Fgithub.com\u002FUniModal4Reasoning\u002FStructEqTable-Deploy) is a high-efficiency toolkit that converts table images into LaTeX\u002FHTML\u002FMarkdown. 
The latest version, powered by the InternVL2-1B foundation model, improves Chinese recognition accuracy and expands multi-format output options.\n\n#### For more visual and inference results of the models, please refer to the [PDF-Extract-Kit tutorial documentation](xxx).\n\n## Evaluation Metrics\n\nComing Soon!\n\n## Usage Guide\n\n### Environment Setup\n\n```bash\nconda create -n pdf-extract-kit-1.0 python=3.10\nconda activate pdf-extract-kit-1.0\npip install -r requirements.txt\n```\n> **Note:** If your device does not have a GPU, please install the CPU version dependencies using `requirements-cpu.txt` instead of `requirements.txt`.\n\n> **Note:** DocLayout-YOLO currently supports installation only from PyPI. If an error occurs during DocLayout-YOLO installation, please install it with `pip3 install doclayout-yolo==0.0.2 --extra-index-url=https:\u002F\u002Fpypi.org\u002Fsimple`.\n\n### Model Download\n\nPlease refer to the [Model Weights Download Tutorial](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Fget_started\u002Fpretrained_model.html) to download the required model weights. Note: You can choose to download all the weights or select specific ones. For detailed instructions, please refer to the tutorial.\n\n### Running Demos\n\n#### Layout Detection Model\n\n```bash \npython scripts\u002Flayout_detection.py --config=configs\u002Flayout_detection.yaml\n```\nLayout detection models support **DocLayout-YOLO** (default model), YOLO-v10, and LayoutLMv3. For YOLO-v10 and LayoutLMv3, please refer to [Layout Detection Algorithm](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Falgorithm\u002Flayout_detection.html). 
You can view the layout detection results in the `outputs\u002Flayout_detection` folder.\n\n#### Formula Detection Model\n\n```bash \npython scripts\u002Fformula_detection.py --config=configs\u002Fformula_detection.yaml\n```\nYou can view the formula detection results in the `outputs\u002Fformula_detection` folder.\n\n#### OCR Model\n\n```bash \npython scripts\u002Focr.py --config=configs\u002Focr.yaml\n```\nYou can view the OCR results in the `outputs\u002Focr` folder.\n\n#### Formula Recognition Model\n\n```bash \npython scripts\u002Fformula_recognition.py --config=configs\u002Fformula_recognition.yaml\n```\nYou can view the formula recognition results in the `outputs\u002Fformula_recognition` folder.\n\n#### Table Recognition Model\n\n```bash \npython scripts\u002Ftable_parsing.py --config configs\u002Ftable_parsing.yaml\n```\nYou can view the table recognition results in the `outputs\u002Ftable_parsing` folder.\n\n> **Note:** For more details on using the model, please refer to the [PDF-Extract-Kit-1.0 Tutorial](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Fget_started\u002Fpretrained_model.html).\n\n> This project focuses on using models for `high-quality` content extraction from `diverse` documents and does not involve reconstructing extracted content into new documents, such as PDF to Markdown. 
For such needs, please refer to our other GitHub project: [MinerU](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU).\n\n## To-Do List\n\n- [x] **Table Parsing**: Develop functionality to convert table images into corresponding LaTeX\u002FMarkdown format source code.\n- [ ] **Chemical Equation Detection**: Implement automatic detection of chemical equations.\n- [ ] **Chemical Equation\u002FDiagram Recognition**: Develop models to recognize and parse chemical equations and diagrams.\n- [ ] **Reading Order Sorting Model**: Build a model to determine the correct reading order of text in documents.\n\n**PDF-Extract-Kit** aims to provide high-quality PDF content extraction capabilities. We encourage the community to propose specific and valuable needs and welcome everyone to participate in continuously improving the PDF-Extract-Kit tool to advance research and industry development.\n\n## License\n\nThis project is open-sourced under the [AGPL-3.0](LICENSE) license.\n\nSince this project uses YOLO code and PyMuPDF for file processing, these components require compliance with the AGPL-3.0 license. 
Therefore, to ensure adherence to the licensing requirements of these dependencies, this repository as a whole adopts the AGPL-3.0 license.\n\n## Acknowledgement\n\n   - [LayoutLMv3](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Funilm\u002Ftree\u002Fmaster\u002Flayoutlmv3): Layout detection model\n   - [UniMERNet](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FUniMERNet): Formula recognition model\n   - [StructEqTable](https:\u002F\u002Fgithub.com\u002FUniModal4Reasoning\u002FStructEqTable-Deploy): Table recognition model\n   - [YOLO](https:\u002F\u002Fgithub.com\u002Fultralytics\u002Fultralytics): Formula detection model\n   - [PaddleOCR](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleOCR): OCR model\n   - [DocLayout-YOLO](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FDocLayout-YOLO): Layout detection model\n\n## Citation\nIf you find our models \u002F code \u002F papers useful in your research, please consider giving ⭐ and citations 📝, thx :)  \n```bibtex\n@article{wang2024mineru,\n  title={MinerU: An Open-Source Solution for Precise Document Content Extraction},\n  author={Wang, Bin and Xu, Chao and Zhao, Xiaomeng and Ouyang, Linke and Wu, Fan and Zhao, Zhiyuan and Xu, Rui and Liu, Kaiwen and Qu, Yuan and Shang, Fukai and others},\n  journal={arXiv preprint arXiv:2409.18839},\n  year={2024}\n}\n\n@misc{zhao2024doclayoutyoloenhancingdocumentlayout,\n      title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception}, \n      author={Zhiyuan Zhao and Hengrui Kang and Bin Wang and Conghui He},\n      year={2024},\n      eprint={2410.12628},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.12628}, \n}\n\n@misc{wang2024unimernet,\n      title={UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition}, \n      author={Bin Wang and Zhuangcheng Gu and Chao Xu and Bo Zhang and Botian Shi 
and Conghui He},\n      year={2024},\n      eprint={2404.15254},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n\n@article{he2024opendatalab,\n  title={Opendatalab: Empowering general artificial intelligence with open datasets},\n  author={He, Conghui and Li, Wei and Jin, Zhenjiang and Xu, Chao and Wang, Bin and Lin, Dahua},\n  journal={arXiv preprint arXiv:2407.13773},\n  year={2024}\n}\n```\n\n## Star History\n\n\u003Ca>\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_3c1184d1bb07.png&theme=dark\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_3c1184d1bb07.png\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_3c1184d1bb07.png\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n## Related Links\n- [UniMERNet (Real-World Formula Recognition Algorithm)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FUniMERNet)\n- [LabelU (Lightweight Multimodal Annotation Tool)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FlabelU)\n- [LabelLLM (Open Source LLM Dialogue Annotation Platform)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FLabelLLM)\n- [MinerU (One-Stop High-Quality Data Extraction Tool)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU)\n","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_e3cec2cb0262.png\" width=\"220px\" style=\"vertical-align:middle;\">\n\u003C\u002Fp>\n\n\u003Cdiv align=\"center\">\n\n英语 | [简体中文](.\u002FREADME_zh-CN.md)\n\n[PDF-Extract-Kit-1.0 教程](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Fget_started\u002Fpretrained_model.html)\n\n[[模型 (🤗Hugging 
Face)]](https:\u002F\u002Fhuggingface.co\u002Fopendatalab\u002FPDF-Extract-Kit-1.0) | [[模型(\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_375e99f5914e.png\" width=\"20px\">ModelScope)]](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FOpenDataLab\u002FPDF-Extract-Kit-1.0) \n \n🔥🔥🔥 [MinerU：基于 PDF-Extract-Kit 的高效文档内容提取工具](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU)\n\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n    👋 欢迎加入我们的 \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTdedn9GTXq\" target=\"_blank\">Discord\u003C\u002Fa> 和 \u003Ca href=\"https:\u002F\u002Fr.vansin.top\u002F?r=MinerU\" target=\"_blank\">微信\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n## 概述\n\n`PDF-Extract-Kit` 是一款功能强大的开源工具包，旨在高效地从复杂多样的 PDF 文档中提取高质量内容。其主要特点和优势如下：\n\n- **集成领先的文档解析模型**：整合了布局检测、公式检测、公式识别、OCR 等核心文档解析任务的最先进模型。\n- **跨多种文档类型的高质量解析**：通过多样化的文档标注数据进行微调，能够在各种复杂文档类型中提供高质量的解析结果。\n- **模块化设计**：灵活的模块化设计允许用户通过修改配置文件和少量代码轻松组合构建各类应用，使应用开发如同搭积木般简单。\n- **全面的评估基准**：提供了多样且全面的 PDF 评估基准，帮助用户根据评估结果选择最适合的模型。\n\n**立即体验 PDF-Extract-Kit，开启 PDF 文档的无限可能！**\n\n> **注意**：PDF-Extract-Kit 专注于高质量的文档处理，是一个模型工具箱。    \n> 如果您希望提取高质量的文档内容（例如将 PDF 转换为 Markdown），请使用 [MinerU](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU)，它结合了 PDF-Extract-Kit 的高质量预测结果，并进行了专门的工程优化，以实现更便捷高效的文档内容提取。    \n> 如果您是开发者，希望构建文档翻译、文档问答或文档助手等有趣的应用，那么使用 PDF-Extract-Kit 来搭建自己的项目将非常方便。特别是，我们将定期在 PDF-Extract-Kit\u002Fproject 目录下更新一些有趣的应用案例，敬请关注！\n\n**我们欢迎社区的研究人员和工程师通过提交 PR 成为 PDF-Extract-Kit 项目的贡献者，共同贡献优秀的模型和创新的应用！**\n\n## 模型概览\n\n| **任务类型**     | **描述**                                                                 | **模型**                    |\n|-------------------|---------------------------------------------------------------------------------|-------------------------------|\n| **布局检测** | 定位文档中的不同元素：包括图片、表格、文本、标题、公式等                       | `DocLayout-YOLO_ft`, `YOLO-v10_ft`, `LayoutLMv3_ft` | \n| **公式检测** | 在文档中定位公式：包括行内公式和独立公式                                     | 
`YOLOv8_ft`                   |  \n| **公式识别** | 将公式图像识别为 LaTeX 源代码                                                 | `UniMERNet`                   |  \n| **OCR**           | 从图像中提取文本内容（包括位置和识别）                                        | `PaddleOCR`                   | \n| **表格识别** | 将表格图像识别为对应的源代码（LaTeX\u002FHTML\u002FMarkdown）                           | `PaddleOCR+TableMaster`, `StructEqTable` |  \n| **阅读顺序** | 对分散的文本段落进行排序并拼接                                               | 即将推出！                  | \n\n## 新闻与更新\n- `2024.10.22` 🎉🎉🎉 我们很高兴地宣布，支持输出 LaTeX、HTML 和 Markdown 格式的表格识别模型 [StructTable-InternVL2-1B](https:\u002F\u002Fhuggingface.co\u002FU4R\u002FStructTable-InternVL2-1B) 已正式集成到 `PDF-Extract-Kit 1.0` 中。使用方法请参阅 [表格识别算法文档](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Falgorithm\u002Ftable_recognition.html)！\n- `2024.10.17` 🎉🎉🎉 我们很高兴地宣布，更加精准快速的布局检测模型 [DocLayout-YOLO](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FDocLayout-YOLO) 已正式集成到 `PDF-Extract-Kit 1.0` 中。使用方法请参阅 [布局检测算法文档](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Falgorithm\u002Flayout_detection.html)！\n- `2024.10.10` 🎉🎉🎉 正式发布 `PDF-Extract-Kit 1.0`，采用模块化设计，使模型使用更加便捷灵活！旧版本请切换至 [release\u002F0.1.1](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Ftree\u002Frelease\u002F0.1.1) 分支。\n- `2024.08.01` 🎉🎉🎉 增加了用于表格内容提取的 [StructEqTable](demo\u002FTabRec\u002FStructEqTable\u002FREADME_TABLE.md) 模块，欢迎大家使用！\n- `2024.07.01` 🎉🎉🎉 我们发布了 `PDF-Extract-Kit`，这是一个包含 `布局检测`、`公式检测`、`公式识别` 和 `OCR` 的综合性高质量 PDF 内容提取工具包。\n\n## 性能展示\n\n目前许多开源的 SOTA 模型都是在学术数据集上训练和评估的，只能在单一类型的文档上取得高质量的结果。为了使模型能够在多样化的文档上稳定、鲁棒地达到高质量效果，我们构建了多样化的微调数据集，并对部分 SOTA 模型进行了微调，从而得到了实用的解析模型。以下是部分模型的可视化结果。\n\n### 布局检测\n\n我们利用多样化的 PDF 文档标注数据训练了鲁棒的 `布局检测` 模型。经过微调后的模型能够在论文、教科书、研究报告、财务报告等多种 PDF 文档中实现准确的提取，并且对模糊、水印等挑战具有很高的鲁棒性。下面的示意图展示了微调后的 LayoutLMv3 模型的推理结果：\n \n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_6594b1bf2b39.png)\n\n### 
公式检测\n\n同样地，我们收集并标注了包含英汉双语公式的文档，对先进的公式检测模型进行了微调。下面的示意图展示了微调后的 YOLO 公式检测模型的推理结果：\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_22bc07ba0735.png)\n\n### 公式识别\n\n[UniMERNet](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FUniMERNet) 是一种专为现实场景中多样化公式识别设计的算法。通过构建大规模训练数据和精心设计的标注结果，该算法在复杂长公式、手写公式以及含噪声的截图公式识别任务上均取得了优异的性能。\n\n### 表格识别\n\n[StructEqTable](https:\u002F\u002Fgithub.com\u002FUniModal4Reasoning\u002FStructEqTable-Deploy) 是一款高效的工具包，可将表格图像转换为 LaTeX、HTML 或 Markdown 格式。最新版本基于 InternVL2-1B 基础模型，进一步提升了中文识别准确率，并扩展了多格式输出选项。\n\n#### 如需查看模型的更多可视化效果及推理结果，请参阅 [PDF-Extract-Kit 教程文档](xxx)。\n\n## 评估指标\n\n即将推出！\n\n## 使用指南\n\n### 环境搭建\n\n```bash\nconda create -n pdf-extract-kit-1.0 python=3.10\nconda activate pdf-extract-kit-1.0\npip install -r requirements.txt\n```\n> **注意：** 如果您的设备不支持 GPU，请使用 `requirements-cpu.txt` 安装 CPU 版本的依赖，而非 `requirements.txt`。\n\n> **注意：** 目前 Doclayout-YOLO 仅支持从 PyPI 安装。若安装 DocLayout-YOLO 时出现错误，请尝试使用以下命令进行安装：`pip3 install doclayout-yolo==0.0.2 --extra-index-url=https:\u002F\u002Fpypi.org\u002Fsimple`。\n\n### 模型下载\n\n请参考 [模型权重下载教程](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Fget_started\u002Fpretrained_model.html)，下载所需的模型权重。注意：您可以选择下载所有权重，也可以只下载部分特定权重。详细操作步骤请参阅该教程。\n\n### 示例运行\n\n#### 布局检测模型\n\n```bash\npython scripts\u002Flayout_detection.py --config=configs\u002Flayout_detection.yaml\n```\n布局检测模型支持 **DocLayout-YOLO**（默认模型）、YOLO-v10 和 LayoutLMv3。关于 YOLO-v10 和 LayoutLMv3 的更多信息，请参阅 [布局检测算法](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Falgorithm\u002Flayout_detection.html)。布局检测结果将保存在 `outputs\u002Flayout_detection` 文件夹中。\n\n#### 公式检测模型\n\n```bash\npython scripts\u002Fformula_detection.py --config=configs\u002Fformula_detection.yaml\n```\n公式检测结果将保存在 `outputs\u002Fformula_detection` 文件夹中。\n\n#### OCR 模型\n\n```bash\npython scripts\u002Focr.py --config=configs\u002Focr.yaml\n```\nOCR 结果将保存在 `outputs\u002Focr` 文件夹中。\n\n#### 公式识别模型\n\n```bash\npython 
scripts\u002Fformula_recognition.py --config=configs\u002Fformula_recognition.yaml\n```\n公式识别结果将保存在 `outputs\u002Fformula_recognition` 文件夹中。\n\n#### 表格识别模型\n\n```bash\npython scripts\u002Ftable_parsing.py --config configs\u002Ftable_parsing.yaml\n```\n表格识别结果将保存在 `outputs\u002Ftable_parsing` 文件夹中。\n\n> **注意：** 如需了解更多关于模型使用的详细信息，请参阅 [PDF-Extract-Kit-1.0 教程](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Fget_started\u002Fpretrained_model.html)。\n\n> 本项目专注于从“多样化”文档中提取“高质量”内容，而不涉及将提取的内容重新构建成新文档，例如将 PDF 转换为 Markdown。如需此类功能，请参考我们的另一个 GitHub 项目：[MinerU](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU)。\n\n## 待办事项\n\n- [x] **表格解析**：开发将表格图像转换为相应 LaTeX\u002FMarkdown 格式的源代码的功能。\n- [ ] **化学方程式检测**：实现化学方程式的自动检测。\n- [ ] **化学方程式\u002F图表识别**：开发用于识别和解析化学方程式及图表的模型。\n- [ ] **阅读顺序排序模型**：构建用于确定文档中文本正确阅读顺序的模型。\n\n**PDF-Extract-Kit** 致力于提供高质量的 PDF 内容提取能力。我们鼓励社区提出具体且有价值的需求，并欢迎所有人参与不断改进 PDF-Extract-Kit 工具，以推动研究与行业的发展。\n\n## 许可证\n\n本项目采用 [AGPL-3.0](LICENSE) 开源许可证。\n\n由于本项目使用了 YOLO 代码和 PyMuPDF 进行文件处理，这些组件需要遵守 AGPL-3.0 许可证的要求。因此，为确保符合这些依赖项的许可要求，整个仓库也采用了 AGPL-3.0 许可证。\n\n## 致谢\n\n   - [LayoutLMv3](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Funilm\u002Ftree\u002Fmaster\u002Flayoutlmv3)：布局检测模型\n   - [UniMERNet](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FUniMERNet)：公式识别模型\n   - [StructEqTable](https:\u002F\u002Fgithub.com\u002FUniModal4Reasoning\u002FStructEqTable-Deploy)：表格识别模型\n   - [YOLO](https:\u002F\u002Fgithub.com\u002Fultralytics\u002Fultralytics)：公式检测模型\n   - [PaddleOCR](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleOCR)：OCR 模型\n   - [DocLayout-YOLO](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FDocLayout-YOLO)：布局检测模型\n\n## 引用\n\n如果您在研究中发现我们的模型、代码或论文有所帮助，请考虑给予 ⭐ 和引用 📝，谢谢 :)  \n```bibtex\n@article{wang2024mineru,\n  title={MinerU: An Open-Source Solution for Precise Document Content Extraction},\n  author={Wang, Bin and Xu, Chao and Zhao, Xiaomeng and Ouyang, Linke and Wu, Fan and Zhao, Zhiyuan and Xu, Rui and Liu, Kaiwen 
and Qu, Yuan and Shang, Fukai and others},\n  journal={arXiv preprint arXiv:2409.18839},\n  year={2024}\n}\n\n@misc{zhao2024doclayoutyoloenhancingdocumentlayout,\n      title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception}, \n      author={Zhiyuan Zhao and Hengrui Kang and Bin Wang and Conghui He},\n      year={2024},\n      eprint={2410.12628},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.12628}, \n}\n\n@misc{wang2024unimernet,\n      title={UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition}, \n      author={Bin Wang and Zhuangcheng Gu and Chao Xu and Bo Zhang and Botian Shi and Conghui He},\n      year={2024},\n      eprint={2404.15254},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n\n@article{he2024opendatalab,\n  title={Opendatalab: Empowering general artificial intelligence with open datasets},\n  author={He, Conghui and Li, Wei and Jin, Zhenjiang and Xu, Chao and Wang, Bin and Lin, Dahua},\n  journal={arXiv preprint arXiv:2407.13773},\n  year={2024}\n}\n```\n\n## 星标历史\n\n\u003Ca>\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_3c1184d1bb07.png&theme=dark\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_3c1184d1bb07.png\" \u002F>\n   \u003Cimg alt=\"星标历史图表\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_readme_3c1184d1bb07.png\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n## 相关链接\n- [UniMERNet（真实世界公式识别算法）](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FUniMERNet)\n- [LabelU（轻量级多模态标注工具）](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FlabelU)\n- 
[LabelLLM（开源大模型对话标注平台）](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FLabelLLM)\n- [MinerU（一站式高质量数据提取工具）](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU)","# PDF-Extract-Kit 快速上手指南\n\nPDF-Extract-Kit 是一款强大的开源工具包，旨在从复杂多样的 PDF 文档中高效提取高质量内容。它集成了布局检测、公式检测与识别、OCR 及表格识别等领先的解析模型，采用模块化设计，方便开发者灵活构建应用。\n\n> **提示**：如果您仅需将 PDF 转换为 Markdown 等格式，推荐使用基于本工具优化的 [MinerU](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU)；如果您是开发者希望构建文档翻译、问答等自定义应用，请使用本工具。\n\n## 1. 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n- **操作系统**：Linux \u002F Windows \u002F macOS\n- **Python 版本**：3.10\n- **硬件要求**：\n  - 推荐配备 NVIDIA GPU 以加速推理。\n  - 若无 GPU，可使用 CPU 模式（速度较慢）。\n- **依赖管理**：已安装 `conda` 或 `pip`。\n\n## 2. 安装步骤\n\n### 2.1 创建虚拟环境\n\n建议使用 Conda 创建独立的 Python 环境：\n\n```bash\nconda create -n pdf-extract-kit-1.0 python=3.10\nconda activate pdf-extract-kit-1.0\n```\n\n### 2.2 安装依赖\n\n**GPU 版本（推荐）：**\n```bash\npip install -r requirements.txt\n```\n\n**CPU 版本（无显卡设备）：**\n```bash\npip install -r requirements-cpu.txt\n```\n\n> **注意**：若安装 `DocLayout-YOLO` 时报错，请尝试以下命令强制指定源安装：\n> ```bash\n> pip3 install doclayout-yolo==0.0.2 --extra-index-url=https:\u002F\u002Fpypi.org\u002Fsimple\n> ```\n\n### 2.3 下载模型权重\n\n本项目不包含预训练模型文件，需手动下载。您可以从 **Hugging Face** 或 **ModelScope（魔搭）** 下载。国内用户推荐优先使用 ModelScope 以获得更快速度。\n\n- **Hugging Face**: [PDF-Extract-Kit-1.0](https:\u002F\u002Fhuggingface.co\u002Fopendatalab\u002FPDF-Extract-Kit-1.0)\n- **ModelScope (国内加速)**: [PDF-Extract-Kit-1.0](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FOpenDataLab\u002FPDF-Extract-Kit-1.0)\n\n请下载所需任务的模型权重，并参考 [官方教程](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Fget_started\u002Fpretrained_model.html) 将其放置于正确目录。\n\n## 3. 
基本使用\n\n安装完成并下载模型后，即可运行脚本进行内容提取。以下是最常用的几个功能示例：\n\n### 3.1 布局检测 (Layout Detection)\n识别文档中的图片、表格、文本、标题和公式区域。默认使用 `DocLayout-YOLO` 模型。\n\n```bash\npython scripts\u002Flayout_detection.py --config=configs\u002Flayout_detection.yaml\n```\n*结果输出至：`outputs\u002Flayout_detection`*\n\n### 3.2 公式检测 (Formula Detection)\n定位文档中的行内公式和块级公式。\n\n```bash\npython scripts\u002Fformula_detection.py --config=configs\u002Fformula_detection.yaml\n```\n*结果输出至：`outputs\u002Fformula_detection`*\n\n### 3.3 光学字符识别 (OCR)\n从图像中提取文本内容及位置信息。\n\n```bash\npython scripts\u002Focr.py --config=configs\u002Focr.yaml\n```\n*结果输出至：`outputs\u002Focr`*\n\n### 3.4 公式识别 (Formula Recognition)\n将公式图像转换为 LaTeX 源代码。\n\n```bash\npython scripts\u002Fformula_recognition.py --config=configs\u002Fformula_recognition.yaml\n```\n*结果输出至：`outputs\u002Fformula_recognition`*\n\n### 3.5 表格识别 (Table Recognition)\n将表格图像转换为 LaTeX、HTML 或 Markdown 格式。\n\n```bash\npython scripts\u002Ftable_parsing.py --config configs\u002Ftable_parsing.yaml\n```\n*结果输出至：`outputs\u002Ftable_parsing`*\n\n> **说明**：具体配置文件路径及模型切换方式，请参阅项目根目录下的 `configs` 文件夹及 [详细文档](https:\u002F\u002Fpdf-extract-kit.readthedocs.io\u002Fen\u002Flatest\u002Fget_started\u002Fpretrained_model.html)。","某金融科技公司数据团队需要将数千份包含复杂数学公式和多层嵌套表格的历年研报 PDF，批量转换为结构化 Markdown 以构建垂直领域大模型知识库。\n\n### 没有 PDF-Extract-Kit 时\n- **公式识别混乱**：传统 OCR 工具将行内公式和独立公式块错误识别为乱码或普通文本，导致模型无法理解关键量化逻辑。\n- **表格结构丢失**：复杂的跨页表格和多线框表格被拆解为无序文本片段，行列对应关系完全错乱，数据失去分析价值。\n- **版面解析割裂**：缺乏统一的布局检测模型，需人工拼凑多个开源工具处理标题、图片和正文，开发维护成本极高且效果参差不齐。\n- **阅读顺序错乱**：双栏排版文档的文本流经常上下跳跃，后续清洗工作耗时巨大，严重拖慢知识库构建进度。\n\n### 使用 PDF-Extract-Kit 后\n- **公式完美还原**：利用集成的 UniMERNet 模型，精准将公式图像转换为标准 LaTeX 源码，完整保留数学语义。\n- **表格高保真重构**：通过 TableMaster 等专用模型，自动识别复杂表格结构并输出规范的 Markdown\u002FHTML 代码，行列数据准确对齐。\n- **一站式模块化解析**：借助其预训练的 Layout Detection 模型统一检测图文版式，灵活组合各任务模块，大幅降低工程集成难度。\n- **智能阅读排序**：内置的阅读顺序算法自动梳理双栏及多栏文档逻辑，直接输出连贯流畅的文本流，几乎无需人工后处理。\n\nPDF-Extract-Kit 
通过集成业界领先的专项模型，将非结构化复杂文档转化为高质量机器可读数据的能力提升了数倍，成为构建专业文档知识库的基石。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_PDF-Extract-Kit_6594b1bf.png","opendatalab","OpenDataLab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fopendatalab_37842245.jpg","OpenDataLab provides access to numerous significant open-source datasets.",null,"OpenDataLab@pjlab.org.cn","https:\u002F\u002Fopendatalab.org.cn","https:\u002F\u002Fgithub.com\u002Fopendatalab",[84],{"name":85,"color":86,"percentage":87},"Python","#3572A5",100,9539,719,"2026-04-03T08:56:05","AGPL-3.0","未说明","非必需（支持 CPU 模式，需安装 requirements-cpu.txt）；GPU 具体型号、显存大小及 CUDA 版本未在文档中明确说明",{"notes":95,"python":96,"dependencies":97},"1. 建议使用 conda 创建名为 'pdf-extract-kit-1.0' 的虚拟环境。2. 若设备不支持 GPU，请使用 'requirements-cpu.txt' 替代 'requirements.txt' 进行安装。3. DocLayout-YOLO 模型目前仅支持通过 PyPI 安装，若安装报错，请执行特定命令：'pip3 install doclayout-yolo==0.0.2 --extra-index-url=https:\u002F\u002Fpypi.org\u002Fsimple'。4. 本项目为模型工具箱，若需将 PDF 转换为 Markdown 等完整文档提取功能，建议配合使用 MinerU 工具。5. 
运行前需参考教程下载相应的模型权重文件。","3.10",[98,99,100,101,102,103,104,105],"doclayout-yolo==0.0.2","requirements.txt (包含 torch, transformers 等，具体版本未列出)","requirements-cpu.txt (CPU 版本依赖)","PyMuPDF","YOLO","PaddleOCR","UniMERNet","StructEqTable",[14,13,54],"2026-03-27T02:49:30.150509","2026-04-06T09:52:04.607253",[110,115,120,125,130,135],{"id":111,"question_zh":112,"answer_zh":113,"source_url":114},12619,"为什么运行时报错 'PIL.Image' has no attribute 'LINEAR'？","这是因为 Pillow 版本不兼容导致的。请尝试安装特定版本的 Pillow：`pip install pillow==8.4.0`。如果是在 Windows 上且使用 Python 3.11，该版本可能不支持，建议参考官方文档安装 ImageMagick 并检查环境配置：https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fblob\u002Fmain\u002Fdocs\u002FInstall_in_Windows_zh_cn.md","https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fissues\u002F47",{"id":116,"question_zh":117,"answer_zh":118,"source_url":119},12620,"运行 layoutlmv3 时出现 'Image' object has no attribute 'read' 错误怎么办？","这是一个已知 Bug，原因是代码未正确处理输入的 `PIL.Image.Image` 对象，导致将其再次传递给 `Image.open`。该问题已在 PR #160 中修复，请更新代码库后重新尝试：https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F160","https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fissues\u002F157",{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},12621,"如何获取 layoutlmv3-ft 的版面检测详细结果（如 boxes, scores, classes）和标注图片？","目前脚本默认不直接存储这些详细信息。您需要自行修改代码，逻辑是将输入路径转换为 `List[PIL.Images]` 再逐个传入 `layoutlmv3` 进行推理。若需获取标注图片，由于无法直接使用 `visualize=True`，您需要根据检测结果自行绘制。","https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fissues\u002F163",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},12622,"遇到 'Can't get attribute YOLOv10DetectionModel' 属性错误如何解决？","这通常是由于环境冲突或库版本不一致导致的。确保您安装了正确版本的 `doclayout-yolo`。如果问题依旧，检查相关脚本（如 `pdf2markdown.py`）中的变量引用是否正确（例如确认是否应将 `pdf_path` 改为 `fpath`），并重新拉取最新代码。","https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fissues\u002F153",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},12623,"安装 
detectron2 时提示 'not a supported wheel on this platform' 怎么办？","这表明下载的 `.whl` 文件与当前系统平台或 Python 版本不匹配。请勿手动下载预编译包，而是按照官方文档指示，根据您的操作系统（Linux\u002FWindows\u002FMac）和 Python 版本，使用正确的 pip 命令或从源码编译安装 detectron2。","https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fissues\u002F143",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},12624,"项目是否提供在线体验 Demo 地址？","项目曾提供在线体验地址（如 https:\u002F\u002Fopendatalab.com\u002FOpenSourceTools\u002FExtractor\u002FPDF），但用户反馈链接有时无法打开。如果官方链接失效，建议直接在本地部署环境以体验完整功能，或关注项目主页的最新公告获取可用的 Demo 链接。","https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fissues\u002F22",[141,146,151],{"id":142,"version":143,"summary_zh":144,"released_at":145},63035,"PDF-Extract-Kit-1.0.0-released","## 变更内容\n* @wangbinDL 重构了 pdf-extract-kit-0.1.1 的代码，支持模块化操作，使用户能够更方便、灵活地选择和组合所需的模型。\n* @wangbinDL 新增了公式识别、公式检测和版面检测的示例。\n* @wangbinDL 补充了 PDF-Extract-Kit-1.0 的文档。\n* @JulioZhao97 引入了一个新的版面检测模型（LayoutLMv3）。\n* @wufan-tb 增加了 OCR 支持。\n\n## 新贡献者\n* @JulioZhao97 首次贡献，加入了 LayoutLMv3 模型。","2024-10-11T08:56:25",{"id":147,"version":148,"summary_zh":149,"released_at":150},63036,"PDF-Extract-Kit-0.1.1-released","## 变更内容\n- 由 @wangbinDL 将许可证从 Apache 2.0 更新为 AGPL-3.0\n- 由 @wangbinDL 添加 MinerU 技术报告的 BibTeX 条目\n\n版本 0.1.1 是 PDF-Extract-Kit 1.0.0 重大架构变更之前的稳定版。虽然即将发布的 1.0.0 版本将带来更加简洁直观的用户体验，但同时也进行了大量修改。建议希望保持稳定性及熟悉旧版操作方式的用户继续使用 0.1.1 版本。","2024-10-09T03:11:17",{"id":152,"version":153,"summary_zh":154,"released_at":155},63037,"PDF-Extract-Kit-0.1.0-released","## 变更内容\n* @ouyanglinke 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F1 中添加了评估表格\n* @ouyanglinke 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F3 中修改了 README 文件\n* @ouyanglinke 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F4 中修改了 README 文件\n* @ouyanglinke 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F6 中在 README 中添加了许可证信息及模型检查点版本信息\n* 
@ouyanglinke 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F17 中添加了验证内容\n* @wangbinDL 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F34 中更新了 README 文件\n* @wufan-tb 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F35 中更新了代码说明，并移除了部分个人信息\n* @ouyanglinke 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F40 中将类别映射介绍添加到了验证 README 文件中\n* 解决 #36：@zhchbin 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F44 中添加了 Google Colab 链接\n* @sky-fly97 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F58 中添加了使用 StructEqTable 模型进行表格识别的功能\n* @wangbinDL 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F70 中优化了表格识别教程\n* 修复、重构与文档更新：@myhloli 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F88 中更新了 OCR 逻辑和安装指南\n* 修复（OCR）：@myhloli 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F92 中解决了 OCR 过程中因粘连导致部分行和跨度缺失的问题\n* 新功能：@jorgeolothar 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F89 中添加了批处理大小参数和垃圾回收功能\n* 开发分支合并至主分支：@myhloli 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F124 中完成了合并\n\n## 新贡献者\n* @ouyanglinke 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F1 中做出了首次贡献\n* @wufan-tb 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F35 中做出了首次贡献\n* @zhchbin 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F44 中做出了首次贡献\n* @jorgeolothar 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fpull\u002F89 中做出了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit\u002Fcommits\u002FPDF-Extract-Kit-0.1.0-released","2024-09-11T08:30:54"]