[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-opendatalab--MinerU":3,"tool-opendatalab--MinerU":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth 
Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,52],"视频",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[14,35],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":76,"owner_website":78,"owner_url":79,"languages":80,"stars":89,"forks":90,"last_commit_at":91,"license":92,"difficulty_score":10,"env_os":93,"env_gpu":94,"env_ram":95,"env_deps":96,"category_tags":104,"github_topics":106,"view_count":120,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":121,"updated_at":122,"faqs":123,"releases":157},4243,"opendatalab\u002FMinerU","MinerU","Transforms complex documents like PDFs into LLM-ready markdown\u002FJSON for your Agentic workflows.","MinerU 是一款专为大语言模型（LLM）打造的文档解析工具，旨在将复杂的 PDF 文件高效转化为机器易读的 Markdown 或 JSON 格式。在日常工作中，许多用户面临从扫描版论文、技术手册或包含复杂排版的文档中提取高质量文本的难题，传统方法往往难以保留原有的公式、表格和结构信息，导致后续 AI 处理效果不佳。MinerU 
正是为了解决这一痛点而生，它能精准识别并还原文档中的多栏布局、数学公式及图表内容，确保输出数据干净、结构化，直接适配各类智能体（Agentic）工作流。\n\n这款工具特别适合开发者、数据科学家以及需要构建知识库的研究人员使用。无论是希望微调专属模型的算法工程师，还是试图搭建企业级 RAG（检索增强生成）系统的技术团队，MinerU 都能提供强有力的支持。其核心技术亮点在于对复杂版面分析的深度优化，不仅支持批量处理，还能在保持高准确率的同时，大幅降低数据清洗的人力成本。通过 MinerU，用户可以轻松打通从原始文档到 AI 应用的数据链路，让非结构化文档真正变成可被智能体理解的高价值资产。","\u003Cdiv align=\"center\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fhtml\">\n\u003C!-- logo -->\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_0e6b9cc985cc.png\" width=\"300px\" style=\"vertical-align:middle;\">\n\u003C\u002Fp>\n\n\u003C!-- icon -->\n\n[![stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopendatalab\u002FMinerU.svg)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU)\n[![forks](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fopendatalab\u002FMinerU.svg)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU)\n[![open issues](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues-raw\u002Fopendatalab\u002FMinerU)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues)\n[![issue resolution](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues-closed-raw\u002Fopendatalab\u002FMinerU)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues)\n[![PyPI version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fmineru)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmineru\u002F)\n[![PyPI - Python 
Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fmineru)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmineru\u002F)\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_f617c87ed07f.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fmineru)\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_f617c87ed07f.png\u002Fmonth)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fmineru)\n[![OpenDataLab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fwebapp_on_mineru.net-blue?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB3aWR0aD0iMTM0IiBoZWlnaHQ9IjEzNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48cGF0aCBkPSJtMTIyLDljMCw1LTQsOS05LDlzLTktNC05LTksNC05LDktOSw5LDQsOSw5eiIgZmlsbD0idXJsKCNhKSIvPjxwYXRoIGQ9Im0xMjIsOWMwLDUtNCw5LTksOXMtOS00LTktOSw0LTksOS05LDksNCw5LDl6IiBmaWxsPSIjMDEwMTAxIi8+PHBhdGggZD0ibTkxLDE4YzAsNS00LDktOSw5cy05LTQtOS05LDQtOSw5LTksOSw0LDksOXoiIGZpbGw9InVybCgjYikiLz48cGF0aCBkPSJtOTEsMThjMCw1LTQsOS05LDlzLTktNC05LTksNC05LDktOSw5LDQsOSw5eiIgZmlsbD0iIzAxMDEwMSIvPjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgY2xpcC1ydWxlPSJldmVub2RkIiBkPSJtMzksNjJjMCwxNiw4LDMwLDIwLDM4LDctNiwxMi0xNiwxMi0yNlY0OWMwLTQsMy03LDYtOGw0Ni0xMmM1LTEsMTEsMywxMSw4djMxYzAsMzctMzAsNjYtNjYsNjYtMzcsMC02Ni0zMC02Ni02NlY0NmMwLTQsMy03LDYtOGwyMC02YzUtMSwxMSwzLDExLDh2MjF6bS0yOSw2YzAsMTYsNiwzMCwxNyw0MCwzLDEsNSwxLDgsMSw1LDAsMTAtMSwxNS0zQzM3LDk1LDI5LDc5LDI5LDYyVjQybC0xOSw1djIweiIgZmlsbD0idXJsKCNjKSIvPjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgY2xpcC1ydWxlPSJldmVub2RkIiBkPSJtMzksNjJjMCwxNiw4LDMwLDIwLDM4LDctNiwxMi0xNiwxMi0yNlY0OWMwLTQsMy03LDYtOGw0Ni0xMmM1LTEsMTEsMywxMSw4djMxYzAsMzctMzAsNjYtNjYsNjYtMzcsMC02Ni0zMC02Ni02NlY0NmMwLTQsMy03LDYtOGwyMC02YzUtMSwxMSwzLDExLDh2MjF6bS0yOSw2YzAsMTYsNiwzMCwxNyw0MCwzLDEsNSwxLDgsMSw1LDAsMTAtMSwxNS0zQzM3LDk1LDI5LDc5LDI5LDYyVjQybC0xOSw1djIweiIgZmlsbD0iIzAxMDEwMSIvPjxkZWZzPjxsaW5lYXJHcmFkaWVudCBpZD0iYSIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0
b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjxsaW5lYXJHcmFkaWVudCBpZD0iYiIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjxsaW5lYXJHcmFkaWVudCBpZD0iYyIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjwvZGVmcz48L3N2Zz4=&labelColor=white)](https:\u002F\u002Fmineru.net\u002FOpenSourceTools\u002FExtractor?source=github)\n[![HuggingFace](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo_on_HuggingFace-yellow.svg?logo=data:image\u002Fpng;base64,iVBORw0KGgoAAAANSUhEUgAAAF8AAABYCAMAAACkl9t\u002FAAAAk1BMVEVHcEz\u002FnQv\u002FnQv\u002FnQr\u002FnQv\u002FnQr\u002FnQv\u002FnQv\u002FnQr\u002FwRf\u002FtxT\u002Fpg7\u002FyRr\u002FrBD\u002FzRz\u002Fngv\u002FoAz\u002Fzhz\u002Fnwv\u002FtxT\u002Fngv\u002F0B3+zBz\u002FnQv\u002F0h7\u002Fwxn\u002FvRb\u002FthXkuiT\u002FrxH\u002FpxD\u002Fogzcqyf\u002FnQvTlSz\u002FczCxky7\u002FSjifdjT\u002FMj3+Mj3wMj15aTnDNz+DSD9RTUBsP0FRO0Q6O0WyIxEIAAAAGHRSTlMADB8zSWF3krDDw8TJ1NbX5efv8ff9\u002FfxKDJ9uAAAGKklEQVR42u2Z63qjOAyGC4RwCOfB2JAGqrSb2WnTw\u002F1f3UaWcSGYNKTdf\u002FP+mOkTrE+yJBulvfvLT2A5ruenaVHyIks33npl\u002F6C4s\u002FZLAM45SOi\u002F1FtZPyFur1OYofBX3w7d54Bxm+E8db+nDr12ttmESZ4zludJEG5S7TO72YPlKZFyE+YCYUJTBZsMiNS5Sd7NlDmKM2Eg2JQg8awbglfqgbhArjxkS7dgp2RH6hc9AMLdZYUtZN5DJr4molC8BfKrEkPKEnEVjLbgW1fLy77ZVOJagoIcLIl+IxaQZGjiX597HopF5CkaXVMDO9Pyix3AFV3kw4lQLCbHuMovz8FallbcQIJ5Ta0vks9RnolbCK84BtjKRS5uA43hYoZcOBGIG2Epbv6CvFVQ8m8loh66WNySsnN7htL58LNp+NXT8\u002FPhXiBXPMjLSxtwp8W9f\u002F1AngRierBkA+kk\u002FIpUSOeKByzn8y3kAAAfh\u002F\u002F0oXgV4roHm\u002Fkz4E2z\u002F\u002FzRc3\u002FlgwBzbM2mJxQEa5pqgX7d1L0htrhx7LKxOZlKbwcAWyEOWqYSI8YPtgDQVjpB5nvaHaSnBaQSD6hweDi8PosxD6\u002FPT09YY3xQA7LTCTKfYX+QHpA0GCcqmEHvr\u002FcyfKQTEuwg
bs2kPxJEB0iNjfJcCTPyocx+A0griHSmADiC91oNGVwJ69RudYe65vJmoqfpul0lrqXadW0jFKH5BKwAeCq+Den7s+3zfRJzA61\u002FUj\u002F9H\u002FVzLKTx9jFPPdXeeP+L7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz\u002FYn3kOAp2f1Kf0Weony7pn\u002FcPydvhQYV+eFOfmOu7VB\u002FViPe34\u002FEN3RFHY\u002FyRuT8ddCtMPH\u002FMcBAT5s+vRde\u002Fgf2c\u002FsPsjLK+m5IBQF5tO+h2tTlBGnP6693JdsvofjOPnnEHkh2TnV\u002FX1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ\u002Ft\u002FfL++6unpR1YGC2n\u002FKCoa0tTLoKiEeUPDl94nj+5\u002FTv3\u002FeT5vBQ60X1S0oZr+IWRR8Ldhu7AlLjPISlJcO9vrFotky9SpzDequlwEir5beYAc0R7D9KS1DXva0jhYRDXoExPdc6yw5GShkZXe9QdO\u002FuOvHofxjrV\u002FTNS6iMJS+4TcSTgk9n5agJdBQbB\u002F\u002FIfF\u002FHpvPt3Tbi7b6I6K0R72p6ajryEJrENW2bbeVUGjfgoals4L443c7BEE4mJO2SpbRngxQrAKRudRzGQ8jVOL2qDVjjI8K1gc3TIJ5KiFZ1q+gdsARPB4NQS4AjwVSt72DSoXNyOWUrU5mQ9nRYyjp89Xo7oRI6Bga9QNT1mQ\u002FptaJq5T\u002F7WcgAZywR\u002FXlPGAUDdet3LE+qS0TI+g+aJU8MIqjo0Kx8Ly+maxLjJmjQ18rA0YCkxLQbUZP1WqdmyQGJLUm7VnQFqodmXSqmRrdVpqdzk5LvmvgtEcW8PMGdaS23EOWyDVbACZzUJPaqMbjDxpA3Qrgl0AikimGDbqmyT8P8NOYiqrldF8rX+YN7TopX4UoHuSCYY7cgX4gHwclQKl1zhx0THf+tCAUValzjI7Wg9EhptrkIcfIJjA94evOn8B2eHaVzvBrnl2ig0So6hvPaz0IGcOvTHvUIlE2+prqAxLSQxZlU2stql1NqCCLdIiIN\u002Fi1DBEHUoElM9dBravbiAnKqgpi4IBkw+utSPIoBijDXJipSVV7MpOEJUAc5Qmm3BnUN+w3hteEieYKfRZSIUcXKMVf0u5wD4EwsUNVvZOtUT7A2GkffHjByWpHqvRBYrTV72a6j8zZ6W0DTE86Hn04bmyWX3Ri9WH7ZU6Q7h+ZHo0nHUAcsQvVhXRDZHChwiyi\u002FhnPuOsSEF6Exk3o6Y9DT1eZ+6cASXk2Y9k+6EOQMDGm6WBK10wOQJCBwren86cPPWUcRAnTVjGcU1LBgs9FURiX\u002Fe6479yZcLwCBmTxiawEwrOcleuu12t3tbLv\u002FN4RLYIBhYexm7Fcn4OJcn0+zc+s8\u002FVfPeddZHAGN6TT8eGczHdR\u002FGts1\u002FMzDkThr23zqrVfAMFT33Nx1RJsx1k5zuWILLnG\u002FvsH+Fv5D4NTVcp1Gzo8AAAAAElFTkSuQmCC&labelColor=white)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fopendatalab\u002FMinerU)\n[![ModelScope](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo_on_ModelScope-purple?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0b
GU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTUiIGZpbGw9IiM2MjRhZmYiIGQ9Im05OS4xNCwxMTUuNDlsMjUuNjUsMGwwLDI1LjY1bC0yNS42NSwwbDAsLTI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTYiIGZpbGw9IiM2MjRhZmYiIGQ9Im0xNzYuMDksMTQxLjE0bC0yNS42NDk5OSwwbDAsMjIuMTlsNDcuODQsMGwwLC00Ny44NGwtMjIuMTksMGwwLDI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTciIGZpbGw9IiMzNmNmZDEiIGQ9Im0xMjQuNzksODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTgiIGZpbGw9IiMzNmNmZDEiIGQ9Im0wLDY0LjE5bDI1LjY1LDBsMCwyNS42NWwtMjUuNjUsMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzE5IiBmaWxsPSIjNjI0YWZmIiBkPSJtMTk4LjI4LDg5Ljg0bDI1LjY0OTk5LDBsMCwyNS42NDk5OWwtMjUuNjQ5OTksMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIwIiBmaWxsPSIjMzZjZmQxIiBkPSJtMTk4LjI4LDY0LjE5bDI1LjY0OTk5LDBsMCwyNS42NWwtMjUuNjQ5OTksMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIxIiBmaWxsPSIjNjI0YWZmIiBkPSJtMTUwLjQ0LDQybDAsMjIuMTlsMjUuNjQ5OTksMGwwLDI1LjY1bDIyLjE5LDBsMCwtNDcuODRsLTQ3Ljg0LDB6Ii8+CiAgPHBhdGggaWQ9InN2Z18yMiIgZmlsbD0iIzM2Y2ZkMSIgZD0ibTczLjQ5LDg5Ljg0bDI1LjY1LDBsMCwyNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIzIiBmaWxsPSIjNjI0YWZmIiBkPSJtNDcuODQsNjQuMTlsMjUuNjUsMGwwLC0yMi4xOWwtNDcuODQsMGwwLDQ3Ljg0bDIyLjE5LDBsMCwtMjUuNjV6Ii8+CiAgPHBhdGggaWQ9InN2Z18yNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTQ3Ljg0LDExNS40OWwtMjIuMTksMGwwLDQ3Ljg0bDQ3Ljg0LDBsMCwtMjIuMTlsLTI1LjY1LDBsMCwtMjUuNjV6Ii8+CiA8L2c+Cjwvc3ZnPg==&labelColor=white)](https:\u002F\u002Fwww.modelscope.cn\u002Fstudios\u002FOpenDataLab\u002FMinerU)\n[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgist\u002Fmyhloli\u002Fa3cb16570ab3cfeadf9d8f0ac91b4fca\u002Fmineru_demo.ipynb)\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMinerU-Technical%20Report-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.18839)\n[![arXiv](https:\u002F\u002Fimg.shields.io\u00
2Fbadge\u002FMinerU2.5-Technical%20Report-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.22186)\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002Fopendatalab\u002FMinerU)\n\n\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F11174\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_4a68feb902da.png\" alt=\"opendatalab%2FMinerU | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\n\u003C!-- language -->\n\n[English](README.md) | [简体中文](README_zh-CN.md)\n\n\u003C!-- hot link -->\n\n\u003Cp align=\"center\">\n🚀\u003Ca href=\"https:\u002F\u002Fmineru.net\u002F?source=github\">Access MinerU Now→✅ Zero-Install Web Version ✅ Full-Featured Desktop Client ✅ Instant API Access; Skip deployment headaches – get all product formats in one click. Developers, dive in!\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C!-- join us -->\n\n\u003Cp align=\"center\">\n    👋 join us on \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTdedn9GTXq\" target=\"_blank\">Discord\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fmineru.net\u002Fcommunity-portal\u002F?aliasId=3c430f94\" target=\"_blank\">WeChat\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n\n\u003Cdetails>\n\u003Csummary>MinerU — High-accuracy document parsing engine for LLM · RAG · Agent workflows\u003C\u002Fsummary>\nConverts PDF · Word · PPT · Images · Web pages into structured Markdown \u002F JSON · VLM+OCR dual engine · 109 languages \u003Cbr>\nMCP Server · LangChain \u002F Dify \u002F FastGPT native integration · 10+ domestic AI chip support\n\n**🔍 Core Parsing Capabilities**\n\n- Formulas → LaTeX · Tables → HTML, accurate layout reconstruction\n- Supports scanned docs, handwriting, multi-column layouts, cross-page table merging\n- Output follows human reading order with automatic header\u002Ffooter removal\n- VLM + 
OCR dual engine, 109-language OCR recognition\n\n**🔌 Integration**\n\n| Use Case | Solution |\n|----------|----------|\n| AI Coding Tools | MCP Server — Cursor · Claude Desktop · Windsurf |\n| RAG Frameworks | LangChain · LlamaIndex · RAGFlow · RAG-Anything · Flowise · Dify · FastGPT |\n| Development | Python \u002F Go \u002F TypeScript SDK · CLI · REST API · Docker |\n| No-Code | mineru.net online · Gradio WebUI · Desktop client |\n\n**🖥️ Deployment (Private · Fully Offline)**\n\n| Inference Backend | Best For |\n|------------------|---------|\n| pipeline         | Fast & stable, no hallucination, runs on CPU or GPU |\n| vlm-engine       | High accuracy, supports vLLM \u002F LMDeploy \u002F mlx ecosystem |\n| hybrid-engine    | High accuracy, native text extraction, low hallucination |\n\nDomestic AI chips: Ascend · Cambricon · Enflame · MetaX · Moore Threads · Kunlunxin · Iluvatar · Hygon · Biren · T-Head\n\u003C\u002Fdetails>\n\n# Changelog\n\n- 2026\u002F03\u002F29 3.0.0 Released\n\n  This release delivers a systematic upgrade centered on **parsing capability, system architecture, and engineering usability**. 
The main updates include:\n  \n  - Native `DOCX` parsing\n    - Official support for native `DOCX` parsing, delivering high-precision results without hallucinations.\n    - Compared with the traditional workflow of first converting `DOCX` to `PDF` and then parsing it, end-to-end speed is improved by tens of times, making it better suited for scenarios with high requirements for both accuracy and throughput.\n  - `pipeline` backend upgrade\n    - The `pipeline` backend achieves a score of `86.2` on OmniDocBench (v1.5), surpassing the accuracy of the previous-generation mainstream VLM `MinerU2.0-2505-0.9B`.\n    - Added support for parsing images\u002Fformulas inside tables, seal text recognition, vertical text support, and interline formula numbering recognition, continuously improving parsing quality for complex document scenarios.\n    - While maintaining high accuracy, it keeps resource usage extremely low and continues to support inference in pure CPU environments.\n  - `API \u002F CLI \u002F Router` orchestration upgrade\n    - `mineru` now runs as an orchestration client based on `mineru-api`; when `--api-url` is not provided, it will automatically start a local temporary service.\n    - `mineru-api` adds a new asynchronous task endpoint `POST \u002Ftasks`, supporting task submission, status querying, and result retrieval; meanwhile, it retains the synchronous parsing endpoint `POST \u002Ffile_parse` for compatibility with legacy plugins.\n    - Added `mineru-router`, designed for unified entry deployment and task routing across multiple services and multiple GPUs; its interfaces are fully compatible with `mineru-api` and support automatic task load balancing.\n  - Deployment and usability improvements\n    - Resolved compatibility issues with `torch >= 2.8`; the base image has been upgraded to `vllm0.11.2 + torch2.9.0`, unifying installation paths across different Compute Capabilities.\n    - Optimized the parsing pipeline with a sliding-window mechanism, 
significantly reducing peak memory usage in long-document scenarios, so documents with tens of thousands of pages no longer need to be split manually.\n    - Batch inference in `pipeline` now supports streaming writes to disk, allowing completed parsing results to be written out in time and further improving the experience for long-running tasks.\n    - Completed thread-safety optimization and now fully supports multi-threaded concurrent inference; together with `mineru-router`, this enables one-click multi-GPU deployment and makes it easy to build high-concurrency, high-throughput parsing systems.\n    - Completely removed the use of two AGPLv3 models (`doclayoutyolo` and `mfd_yolov8`) and one CC-BY-NC-SA 4.0 model (`layoutreader`).  \n  \n  This update is not just a set of feature enhancements, but a key leap forward in MinerU's overall system capabilities. We specifically addressed the peak memory usage issue in long-document parsing. Through optimizations such as sliding windows and streaming writes to disk, ultra-long document parsing has moved from “requiring manual splitting and careful handling” to being “stable, scalable, and ready for production workloads.” At the same time, we completed thread-safety optimization and fully enabled multi-threaded concurrent inference, further improving single-machine resource utilization and runtime stability under high-concurrency workloads. On top of this, with `mineru-router` and the new `API \u002F CLI` orchestration framework, MinerU now supports one-click multi-GPU deployment, unified access across multiple services, and automatic task load balancing, significantly reducing the difficulty of large-scale deployment. 
As a result, MinerU is evolving from a standalone data production tool into a large-scale document parsing foundation for high-concurrency and high-throughput scenarios, providing enterprise-grade document data processing with infrastructure that is more stable, more efficient, and easier to scale.\n\n> 📝 View the complete [Changelog](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Freference\u002Fchangelog\u002F) for more historical version information\n\n# MinerU\n\n## Project Introduction\n\nMinerU is a document parsing tool that converts `PDF`, image, and `DOCX` inputs into machine-readable formats such as Markdown and JSON for downstream retrieval, extraction, and processing.\nMinerU was born during the pre-training process of [InternLM](https:\u002F\u002Fgithub.com\u002FInternLM\u002FInternLM). We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models.\nCompared to well-known commercial products, MinerU is still young. 
If you encounter any issues or if the results are not as expected, please open an [issue](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues) and **attach the relevant document or sample file**.\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F4bea02c9-6d54-4cd6-97ed-dff14340982c\n\n## Key Features\n\n- Support `PDF`, image, and `DOCX` inputs.\n- Remove headers, footers, footnotes, page numbers, etc., to ensure semantic coherence.\n- Output text in human-readable order, suitable for single-column, multi-column, and complex layouts.\n- Preserve the structure of the original document, including headings, paragraphs, lists, etc.\n- Extract images, image descriptions, tables, table titles, and footnotes.\n- Automatically recognize and convert formulas in the document to LaTeX format.\n- Automatically recognize and convert tables in the document to HTML format.\n- Automatically detect scanned PDFs and garbled PDFs and enable OCR for them.\n- OCR supports detection and recognition of 109 languages.\n- Supports multiple output formats, such as multimodal and NLP Markdown, JSON sorted by reading order, and rich intermediate formats.\n- Supports various visualization results, including layout visualization and span visualization, for efficient confirmation of output quality.\n- Built-in CLI, FastAPI, and Gradio WebUI for local orchestration and multi-service deployment.\n- Supports running in a pure CPU environment, and also supports GPU (CUDA) \u002F NPU (CANN) \u002F MPS acceleration.\n- Compatible with Windows, Linux, and macOS platforms.\n\n# Quick Start\n\nIf you encounter any installation issues, please first consult the \u003Ca href=\"#faq\">FAQ\u003C\u002Fa>. \u003C\u002Fbr>\nIf the parsing results are not as expected, refer to the \u003Ca href=\"#known-issues\">Known Issues\u003C\u002Fa>. 
\u003C\u002Fbr>\n\n## Online Experience\n\n### Official online web application\nThe official online version has the same functionality as the client, with a beautiful interface and rich features, requires login to use  \n \n- [![OpenDataLab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fwebapp_on_mineru.net-blue?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB3aWR0aD0iMTM0IiBoZWlnaHQ9IjEzNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48cGF0aCBkPSJtMTIyLDljMCw1LTQsOS05LDlzLTktNC05LTksNC05LDktOSw5LDQsOSw5eiIgZmlsbD0idXJsKCNhKSIvPjxwYXRoIGQ9Im0xMjIsOWMwLDUtNCw5LTksOXMtOS00LTktOSw0LTksOS05LDksNCw5LDl6IiBmaWxsPSIjMDEwMTAxIi8+PHBhdGggZD0ibTkxLDE4YzAsNS00LDktOSw5cy05LTQtOS05LDQtOSw5LTksOSw0LDksOXoiIGZpbGw9InVybCgjYikiLz48cGF0aCBkPSJtOTEsMThjMCw1LTQsOS05LDlzLTktNC05LTksNC05LDktOSw5LDQsOSw5eiIgZmlsbD0iIzAxMDEwMSIvPjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgY2xpcC1ydWxlPSJldmVub2RkIiBkPSJtMzksNjJjMCwxNiw4LDMwLDIwLDM4LDctNiwxMi0xNiwxMi0yNlY0OWMwLTQsMy03LDYtOGw0Ni0xMmM1LTEsMTEsMywxMSw4djMxYzAsMzctMzAsNjYtNjYsNjYtMzcsMC02Ni0zMC02Ni02NlY0NmMwLTQsMy03LDYtOGwyMC02YzUtMSwxMSwzLDExLDh2MjF6bS0yOSw2YzAsMTYsNiwzMCwxNyw0MCwzLDEsNSwxLDgsMSw1LDAsMTAtMSwxNS0zQzM3LDk1LDI5LDc5LDI5LDYyVjQybC0xOSw1djIweiIgZmlsbD0idXJsKCNjKSIvPjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgY2xpcC1ydWxlPSJldmVub2RkIiBkPSJtMzksNjJjMCwxNiw4LDMwLDIwLDM4LDctNiwxMi0xNiwxMi0yNlY0OWMwLTQsMy03LDYtOGw0Ni0xMmM1LTEsMTEsMywxMSw4djMxYzAsMzctMzAsNjYtNjYsNjYtMzcsMC02Ni0zMC02Ni02NlY0NmMwLTQsMy03LDYtOGwyMC02YzUtMSwxMSwzLDExLDh2MjF6bS0yOSw2YzAsMTYsNiwzMCwxNyw0MCwzLDEsNSwxLDgsMSw1LDAsMTAtMSwxNS0zQzM3LDk1LDI5LDc5LDI5LDYyVjQybC0xOSw1djIweiIgZmlsbD0iIzAxMDEwMSIvPjxkZWZzPjxsaW5lYXJHcmFkaWVudCBpZD0iYSIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjxsaW5lYXJHcmFkaWVudCBpZD0iYiIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ
9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjxsaW5lYXJHcmFkaWVudCBpZD0iYyIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjwvZGVmcz48L3N2Zz4=&labelColor=white)](https:\u002F\u002Fmineru.net\u002FOpenSourceTools\u002FExtractor?source=github)\n\n### Gradio-based online demo\nA WebUI developed based on Gradio, with a simple interface and only core parsing functionality, no login required  \n\n- [![ModelScope](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo_on_ModelScope-purple?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0bGU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTUiIGZpbGw9IiM2MjRhZmYiIGQ9Im05OS4xNCwxMTUuNDlsMjUuNjUsMGwwLDI1LjY1bC0yNS42NSwwbDAsLTI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTYiIGZpbGw9IiM2MjRhZmYiIGQ9Im0xNzYuMDksMTQxLjE0bC0yNS42NDk5OSwwbDAsMjIuMTlsNDcuODQsMGwwLC00Ny44NGwtMjIuMTksMGwwLDI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTciIGZpbGw9IiMzNmNmZDEiIGQ9Im0xMjQuNzksODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTgiIGZpbGw9IiMzNmNmZDEiIGQ9Im0wLDY0LjE5bDI1LjY1LDBsMCwyNS42NWwtMjUuNjUsMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzE5IiBmaWxsPSIjNjI0YWZmIiBkPSJtMTk4LjI4LDg5Ljg0bDI1LjY0OTk5LDBsMCwyNS42NDk5OWwtMjUuNjQ5OTksMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIwIiBmaWxsPSIjMzZjZmQxIiBkPSJtMTk4LjI4LDY0LjE5bDI1LjY0OTk5LDBsMCwyNS42NWwtMjUuNjQ5OTksMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIxIiBmaWxsPSIjNjI0YWZmIiBkPSJtMTUwLjQ0LDQybDAsMjIuMTlsMjUuNjQ5OTksMGwwLDI1LjY1bDIyLjE5LDBsMCwtNDcuODRsLTQ3Ljg0LDB6Ii8+CiAgPHBhdGggaWQ9InN2Z18yMiIgZmlsbD0iIzM2Y2ZkMSIgZD0ibTczLjQ5LDg5Ljg0bDI1LjY1LDBsMCwyNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIzIiB
maWxsPSIjNjI0YWZmIiBkPSJtNDcuODQsNjQuMTlsMjUuNjUsMGwwLC0yMi4xOWwtNDcuODQsMGwwLDQ3Ljg0bDIyLjE5LDBsMCwtMjUuNjV6Ii8+CiAgPHBhdGggaWQ9InN2Z18yNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTQ3Ljg0LDExNS40OWwtMjIuMTksMGwwLDQ3Ljg0bDQ3Ljg0LDBsMCwtMjIuMTlsLTI1LjY1LDBsMCwtMjUuNjV6Ii8+CiA8L2c+Cjwvc3ZnPg==&labelColor=white)](https:\u002F\u002Fwww.modelscope.cn\u002Fstudios\u002FOpenDataLab\u002FMinerU)\n- [![HuggingFace](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo_on_HuggingFace-yellow.svg?logo=data:image\u002Fpng;base64,iVBORw0KGgoAAAANSUhEUgAAAF8AAABYCAMAAACkl9t\u002FAAAAk1BMVEVHcEz\u002FnQv\u002FnQv\u002FnQr\u002FnQv\u002FnQr\u002FnQv\u002FnQv\u002FnQr\u002FwRf\u002FtxT\u002Fpg7\u002FyRr\u002FrBD\u002FzRz\u002Fngv\u002FoAz\u002Fzhz\u002Fnwv\u002FtxT\u002Fngv\u002F0B3+zBz\u002FnQv\u002F0h7\u002Fwxn\u002FvRb\u002FthXkuiT\u002FrxH\u002FpxD\u002Fogzcqyf\u002FnQvTlSz\u002FczCxky7\u002FSjifdjT\u002FMj3+Mj3wMj15aTnDNz+DSD9RTUBsP0FRO0Q6O0WyIxEIAAAAGHRSTlMADB8zSWF3krDDw8TJ1NbX5efv8ff9\u002FfxKDJ9uAAAGKklEQVR42u2Z63qjOAyGC4RwCOfB2JAGqrSb2WnTw\u002F1f3UaWcSGYNKTdf\u002FP+mOkTrE+yJBulvfvLT2A5ruenaVHyIks33npl\u002F6C4s\u002FZLAM45SOi\u002F1FtZPyFur1OYofBX3w7d54Bxm+E8db+nDr12ttmESZ4zludJEG5S7TO72YPlKZFyE+YCYUJTBZsMiNS5Sd7NlDmKM2Eg2JQg8awbglfqgbhArjxkS7dgp2RH6hc9AMLdZYUtZN5DJr4molC8BfKrEkPKEnEVjLbgW1fLy77ZVOJagoIcLIl+IxaQZGjiX597HopF5CkaXVMDO9Pyix3AFV3kw4lQLCbHuMovz8FallbcQIJ5Ta0vks9RnolbCK84BtjKRS5uA43hYoZcOBGIG2Epbv6CvFVQ8m8loh66WNySsnN7htL58LNp+NXT8\u002FPhXiBXPMjLSxtwp8W9f\u002F1AngRierBkA+kk\u002FIpUSOeKByzn8y3kAAAfh\u002F\u002F0oXgV4roHm\u002Fkz4E2z\u002F\u002FzRc3\u002FlgwBzbM2mJxQEa5pqgX7d1L0htrhx7LKxOZlKbwcAWyEOWqYSI8YPtgDQVjpB5nvaHaSnBaQSD6hweDi8PosxD6\u002FPT09YY3xQA7LTCTKfYX+QHpA0GCcqmEHvr\u002FcyfKQTEuwgbs2kPxJEB0iNjfJcCTPyocx+A0griHSmADiC91oNGVwJ69RudYe65vJmoqfpul0lrqXadW0jFKH5BKwAeCq+Den7s+3zfRJzA61\u002FUj\u002F9H\u002FVzLKTx9jFPPdXeeP+L7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz\u002FYn3kOAp2f1Kf0Weony7pn\u002FcPydvhQYV+eFOfmOu7VB\u002FViPe34\u002FEN3RFHY\u002FyRuT8ddCtMPH\u002FMcBAT5s+vRde\u
002Fgf2c\u002FsPsjLK+m5IBQF5tO+h2tTlBGnP6693JdsvofjOPnnEHkh2TnV\u002FX1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ\u002Ft\u002FfL++6unpR1YGC2n\u002FKCoa0tTLoKiEeUPDl94nj+5\u002FTv3\u002FeT5vBQ60X1S0oZr+IWRR8Ldhu7AlLjPISlJcO9vrFotky9SpzDequlwEir5beYAc0R7D9KS1DXva0jhYRDXoExPdc6yw5GShkZXe9QdO\u002FuOvHofxjrV\u002FTNS6iMJS+4TcSTgk9n5agJdBQbB\u002F\u002FIfF\u002FHpvPt3Tbi7b6I6K0R72p6ajryEJrENW2bbeVUGjfgoals4L443c7BEE4mJO2SpbRngxQrAKRudRzGQ8jVOL2qDVjjI8K1gc3TIJ5KiFZ1q+gdsARPB4NQS4AjwVSt72DSoXNyOWUrU5mQ9nRYyjp89Xo7oRI6Bga9QNT1mQ\u002FptaJq5T\u002F7WcgAZywR\u002FXlPGAUDdet3LE+qS0TI+g+aJU8MIqjo0Kx8Ly+maxLjJmjQ18rA0YCkxLQbUZP1WqdmyQGJLUm7VnQFqodmXSqmRrdVpqdzk5LvmvgtEcW8PMGdaS23EOWyDVbACZzUJPaqMbjDxpA3Qrgl0AikimGDbqmyT8P8NOYiqrldF8rX+YN7TopX4UoHuSCYY7cgX4gHwclQKl1zhx0THf+tCAUValzjI7Wg9EhptrkIcfIJjA94evOn8B2eHaVzvBrnl2ig0So6hvPaz0IGcOvTHvUIlE2+prqAxLSQxZlU2stql1NqCCLdIiIN\u002Fi1DBEHUoElM9dBravbiAnKqgpi4IBkw+utSPIoBijDXJipSVV7MpOEJUAc5Qmm3BnUN+w3hteEieYKfRZSIUcXKMVf0u5wD4EwsUNVvZOtUT7A2GkffHjByWpHqvRBYrTV72a6j8zZ6W0DTE86Hn04bmyWX3Ri9WH7ZU6Q7h+ZHo0nHUAcsQvVhXRDZHChwiyi\u002FhnPuOsSEF6Exk3o6Y9DT1eZ+6cASXk2Y9k+6EOQMDGm6WBK10wOQJCBwren86cPPWUcRAnTVjGcU1LBgs9FURiX\u002Fe6479yZcLwCBmTxiawEwrOcleuu12t3tbLv\u002FN4RLYIBhYexm7Fcn4OJcn0+zc+s8\u002FVfPeddZHAGN6TT8eGczHdR\u002FGts1\u002FMzDkThr23zqrVfAMFT33Nx1RJsx1k5zuWILLnG\u002FvsH+Fv5D4NTVcp1Gzo8AAAAAElFTkSuQmCC&labelColor=white)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fopendatalab\u002FMinerU)\n\n## Local Deployment\n\n\n> [!WARNING]\n> **Pre-installation Notice—Hardware and Software Environment Support**\n>\n> To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. 
This ensures that users deploying and running the project on recommended system configurations will get the best performance with the fewest compatibility issues.\n>\n> By focusing resources on the mainline environment, our team can more efficiently resolve potential bugs and develop new features.\n>\n> In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first. Most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.\n\n\u003Ctable>\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth rowspan=\"2\">Parsing Backend\u003C\u002Fth>\n      \u003Cth rowspan=\"2\">pipeline\u003C\u002Fth>\n      \u003Cth colspan=\"2\">*-auto-engine\u003C\u002Fth>\n      \u003Cth colspan=\"2\">*-http-client\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>hybrid\u003C\u002Fth>\n      \u003Cth>vlm\u003C\u002Fth>\n      \u003Cth>hybrid\u003C\u002Fth>\n      \u003Cth>vlm\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Cth>Backend Features\u003C\u002Fth>\n      \u003Ctd >Good Compatibility\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\">High Hardware Requirements\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\">For OpenAI Compatible Servers\u003Csup>2\u003C\u002Fsup>\u003C\u002Ftd>\n    \u003C\u002Ftr> \n    \u003Ctr>\n      \u003Cth>Accuracy\u003Csup>1\u003C\u002Fsup>\u003C\u002Fth>\n      \u003Ctd style=\"text-align:center;\">86+\u003C\u002Ftd>\n      \u003Ctd colspan=\"4\" style=\"text-align:center;\">90+\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>Operating System\u003C\u002Fth>\n      \u003Ctd colspan=\"5\" 
style=\"text-align:center;\">Linux\u003Csup>3\u003C\u002Fsup> \u002F Windows\u003Csup>4\u003C\u002Fsup> \u002F macOS\u003Csup>5\u003C\u002Fsup>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>Pure CPU Support\u003C\u002Fth>\n      \u003Ctd style=\"text-align:center;\">✅\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\" style=\"text-align:center;\">❌\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\" style=\"text-align:center;\">✅\u003C\u002Ftd>\n    \u003C\u002Ftr>\n        \u003Ctr>\n      \u003Cth>GPU Acceleration\u003C\u002Fth>\n      \u003Ctd colspan=\"4\" style=\"text-align:center;\">Volta and later architecture GPUs or Apple Silicon\u003C\u002Ftd>\n      \u003Ctd rowspan=\"2\">Not Required\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>Min VRAM\u003C\u002Fth>\n      \u003Ctd style=\"text-align:center;\">4GB\u003C\u002Ftd>\n      \u003Ctd style=\"text-align:center;\">8GB\u003C\u002Ftd>\n      \u003Ctd style=\"text-align:center;\">8GB\u003C\u002Ftd>\n      \u003Ctd style=\"text-align:center;\">2GB\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>RAM\u003C\u002Fth>\n      \u003Ctd colspan=\"3\" style=\"text-align:center;\">Min 16GB, Recommended 32GB or more\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\" style=\"text-align:center;\">Min 16GB\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>Disk Space\u003C\u002Fth>\n      \u003Ctd colspan=\"3\" style=\"text-align:center;\">Min 20GB, SSD Recommended\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\" style=\"text-align:center;\">Min 2GB\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>Python Version\u003C\u002Fth>\n      \u003Ctd colspan=\"5\" style=\"text-align:center;\">3.10-3.13\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\u003Csup>1\u003C\u002Fsup> Accuracy metrics are the End-to-End Evaluation Overall scores from OmniDocBench (v1.5), based on the latest version of `MinerU`.  
\n\u003Csup>2\u003C\u002Fsup> Servers compatible with the OpenAI API, such as local model servers or remote model services deployed via inference frameworks like `vLLM`\u002F`SGLang`\u002F`LMDeploy`.  \n\u003Csup>3\u003C\u002Fsup> On Linux, only distributions released in 2019 or later are supported.  \n\u003Csup>4\u003C\u002Fsup> Since the key dependency `ray` does not support Python 3.13 on Windows, only Python 3.10-3.12 is supported on Windows.  \n\u003Csup>5\u003C\u002Fsup> macOS requires version 14.0 or later.\n\n\n### Install MinerU\n\n#### Install MinerU using pip or uv\n```bash\npip install --upgrade pip\npip install uv\nuv pip install -U \"mineru[all]\"\n```\n\n#### Install MinerU from source code\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU.git\ncd MinerU\nuv pip install -e \".[all]\"\n```\n\n> [!TIP]\n> `mineru[all]` includes all core features, is compatible with Windows \u002F Linux \u002F macOS, and suits most users.\n> If you need to specify the inference framework for the VLM model, or only intend to install a lightweight client on an edge device, please refer to the documentation [Extension Modules Installation Guide](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fquick_start\u002Fextension_modules\u002F).\n\n---\n \n#### Deploy MinerU using Docker\nMinerU provides a convenient Docker deployment method, which helps quickly set up the environment and solve some tricky environment compatibility issues.\nYou can get the [Docker Deployment Instructions](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fquick_start\u002Fdocker_deployment\u002F) in the documentation.\n\n---\n\n### Using MinerU\n\n\nIf your device meets the GPU acceleration requirements in the table above, you can parse documents with a single command:\n```bash\nmineru -p \u003Cinput_path> -o \u003Coutput_path>\n```\nIf your device does not meet the GPU acceleration requirements, you can specify the backend as `pipeline` to run in a pure CPU 
environment:\n```bash\nmineru -p \u003Cinput_path> -o \u003Coutput_path> -b pipeline\n```\n\n`mineru` currently supports local `PDF`, image, and `DOCX` file or directory inputs, and can be used for document parsing through the CLI, API, WebUI, and `mineru-router`. For detailed instructions, please refer to the [Usage Guide](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fusage\u002F).\n\n# TODO\n\n- [x] Reading order based on the model  \n- [x] Recognition of `index` and `list` in the main text  \n- [x] Table recognition\n- [x] Heading Classification\n- [x] Handwritten Text Recognition  \n- [x] Vertical Text Recognition  \n- [x] Latin Accent Mark Recognition\n- [x] Code block recognition in the main text\n- [x] [Chemical formula recognition](docs\u002Fchemical_knowledge_introduction\u002Fintroduction.pdf)(mineru.net)\n- [ ] Geometric shape recognition\n\n# Known Issues\n\n- Reading order is determined by the model based on the spatial distribution of readable content, and may be out of order in some areas under extremely complex layouts.\n- Limited support for vertical text.\n- Tables of contents and lists are recognized through rules, and some uncommon list formats may not be recognized.\n- Code blocks are not yet supported in the layout model.\n- Comic books, art albums, primary school textbooks, and exercises cannot be parsed well.\n- Table recognition may result in row\u002Fcolumn recognition errors in complex tables.\n- OCR recognition may produce inaccurate characters in PDFs of lesser-known languages (e.g., diacritical marks in Latin script, easily confused characters in Arabic script).\n- Some formulas may not render correctly in Markdown.\n\n# FAQ\n\n- If you encounter any issues during usage, you can first check the [FAQ](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Ffaq\u002F) for solutions.  
\n- If your issue remains unresolved, you may also use [DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fopendatalab\u002FMinerU) to interact with an AI assistant, which can address most common problems.  \n- If you still cannot resolve the issue, you are welcome to join our community via [Discord](https:\u002F\u002Fdiscord.gg\u002FTdedn9GTXq) or [WeChat](https:\u002F\u002Fmineru.net\u002Fcommunity-portal\u002F?aliasId=3c430f94) to discuss with other users and developers.\n\n# All Thanks To Our Contributors\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_da0181fe3e88.png\" \u002F>\n\u003C\u002Fa>\n\n# License Information\n\n[LICENSE.md](LICENSE.md)\n\nThe source code in this repository is licensed under AGPLv3.\n\n# Acknowledgments\n\n- [UniMERNet](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FUniMERNet)\n- [TableStructureRec](https:\u002F\u002Fgithub.com\u002FRapidAI\u002FTableStructureRec)\n- [PaddleOCR](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleOCR)\n- [PaddleOCR2Pytorch](https:\u002F\u002Fgithub.com\u002Ffrotms\u002FPaddleOCR2Pytorch)\n- [fast-langdetect](https:\u002F\u002Fgithub.com\u002FLlmKira\u002Ffast-langdetect)\n- [pypdfium2](https:\u002F\u002Fgithub.com\u002Fpypdfium2-team\u002Fpypdfium2)\n- [pdftext](https:\u002F\u002Fgithub.com\u002Fdatalab-to\u002Fpdftext)\n- [pdfminer.six](https:\u002F\u002Fgithub.com\u002Fpdfminer\u002Fpdfminer.six)\n- [pypdf](https:\u002F\u002Fgithub.com\u002Fpy-pdf\u002Fpypdf)\n- [magika](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmagika)\n- [vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm)\n- [LMDeploy](https:\u002F\u002Fgithub.com\u002FInternLM\u002Flmdeploy)\n\n# Citation\n\n```bibtex\n@article{dong2026minerudiffusion,\n  title={MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding},\n  author={Dong, 
Hejun and Niu, Junbo and Wang, Bin and Zeng, Weijun and Zhang, Wentao and He, Conghui},\n  journal={arXiv preprint arXiv:2603.22458},\n  year={2026}\n}\n\n@article{niu2025mineru2,\n  title={MinerU2.5: A decoupled vision-language model for efficient high-resolution document parsing},\n  author={Niu, Junbo and Liu, Zheng and Gu, Zhuangcheng and Wang, Bin and Ouyang, Linke and Zhao, Zhiyuan and Chu, Tao and He, Tianyao and Wu, Fan and Zhang, Qintong and others},\n  journal={arXiv preprint arXiv:2509.22186},\n  year={2025}\n}\n\n@article{wang2024mineru,\n  title={MinerU: An open-source solution for precise document content extraction},\n  author={Wang, Bin and Xu, Chao and Zhao, Xiaomeng and Ouyang, Linke and Wu, Fan and Zhao, Zhiyuan and Xu, Rui and Liu, Kaiwen and Qu, Yuan and Shang, Fukai and others},\n  journal={arXiv preprint arXiv:2409.18839},\n  year={2024}\n}\n\n@article{he2024opendatalab,\n  title={OpenDataLab: Empowering general artificial intelligence with open datasets},\n  author={He, Conghui and Li, Wei and Jin, Zhenjiang and Xu, Chao and Wang, Bin and Lin, Dahua},\n  journal={arXiv preprint arXiv:2407.13773},\n  year={2024}\n}\n```\n\n# Star History\n\n\u003Ca>\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_93d56e76be9c.png&theme=dark\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_93d56e76be9c.png\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_93d56e76be9c.png\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n\n# Links\n- [MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU-Diffusion)\n- [Easy Data Preparation with latest LLMs-based Operators and 
Pipelines](https:\u002F\u002Fgithub.com\u002FOpenDCAI\u002FDataFlow)\n- [Vis3 (OSS browser based on s3)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FVis3)\n- [LabelU (A Lightweight Multi-modal Data Annotation Tool)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FlabelU)\n- [LabelLLM (An Open-source LLM Dialogue Annotation Platform)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FLabelLLM)\n- [PDF-Extract-Kit (A Comprehensive Toolkit for High-Quality PDF Content Extraction)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit)\n- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FOmniDocBench)\n- [Magic-HTML (Mixed web page extraction tool)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002Fmagic-html)\n- [Magic-Doc (Fast speed ppt\u002Fpptx\u002Fdoc\u002Fdocx\u002Fpdf extraction tool)](https:\u002F\u002Fgithub.com\u002FInternLM\u002Fmagic-doc) \n- [Dingo: A Comprehensive AI Data Quality Evaluation Tool](https:\u002F\u002Fgithub.com\u002FMigoXLab\u002Fdingo)\n","\u003Cdiv align=\"center\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fhtml\">\n\u003C!-- 标志 -->\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_0e6b9cc985cc.png\" width=\"300px\" style=\"vertical-align:middle;\">\n\u003C\u002Fp>\n\n\u003C!-- 图标 
-->\n\n[![星级](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopendatalab\u002FMinerU.svg)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU)\n[![复刻数](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fopendatalab\u002FMinerU.svg)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU)\n[![未解决问题](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues-raw\u002Fopendatalab\u002FMinerU)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues)\n[![问题解决率](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues-closed-raw\u002Fopendatalab\u002FMinerU)](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues)\n[![PyPI版本](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fmineru)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmineru\u002F)\n[![PyPI - Python版本](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fmineru)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmineru\u002F)\n[![下载量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_f617c87ed07f.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fmineru)\n[![月度下载量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_f617c87ed07f.png\u002Fmonth)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fmineru)\n[![OpenDataLab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fwebapp_on_mineru.net-blue?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB3aWR0aD0iMTM0IiBoZWlnaHQ9IjEzNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48cGF0aCBkPSJtMTIyLDljMCw1LTQsOS05LDlzLTktNC05LTksNC05LDktOSw5LDQsOSw5eiIgZmlsbD0idXJsKCNhKSIvPjxwYXRoIGQ9Im0xMjIsOWMwLDUtNCw5LTksOXMtOS00LTktOSw0LTksOS05LDksNCw5LDl6IiBmaWxsPSIjMDEwMTAxIi8+PHBhdGggZD0ibTkxLDE4YzAsNS00LDktOSw5cy05LTQtOS05LDQtOSw5LTksOSw0LDksOXoiIGZpbGw9InVybCgjYikiLz48cGF0aCBkPSJtOTEsMThjMCw1LTQsOS05LDlzLTktNC05LTksNC05LDktOSw5LDksNCw5LDl6IiBmaWxsPSIjMDEwMTAxIi8+PHBhdGggZmlsbC1ydWxlPSJldmVub2RkIiBjbGlwLnJ1bGU9ImV2ZW5vZGQiIGQ9Im0zOSw2MmMwLDE2LDgsMzAsMjAsMzgsNywtN
iwxMi0xNiwxMi0yNlY0OWMwLTQsMy03LDYtOGw0Ni0xMmM1LTEsMTEsMywxMSw4djMxYzAsMzctMzAsNjYtNjYsNjYtMzcsMC02Ni0zMC02Ni02NlY0NmMwLTQsMy03LDYtOGwyMC02YzUtMSwxMSwzLDExLDh2MjF6bS0yOSw2YzAsMTYsNiwzMCwxNyw0MCwzLDEsNSwxLDgsMSw1LDAsMTAtMSwxNS0zQzM3LDk1LDI5LDc5LDI5LDYyVjQybC0xOSw1djIweiIgZmlsbD0idXJsKCNjKSIvPjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgY2xpcC1ydWxlPSJldmVub2RkIiBkPSJtMzksNjJjMCwxNiw4LDMwLDIwLDM4LDctNiwxMi0xNiwxMi0yNlY0OWMwLTQsMy03LDYtOGw0Ni0xMmM1LTEsMTEsMywxMSw4djMxYzAsMzctMzAsNjYtNjYsNjYtMzcsMC02Ni0zMC02Ni02NlY0NmMwLTQsMy03LDYtOGwyMC02YzUtMSwxMSwzLDExLDh2MjF6bS0yOSw2YzAsMTYsNiwzMCwxNyw0MCwzLDEsNSwxLDgsMSw1LDAsMTAtMSwxNS0zQzM3LDk1LDI5LDc9LDI5LDYyVjQybC0xOSw1djIweiIgZmlsbD0iIzAxMDEwMSIvPjxkZWZzPjxsaW5lYXJHcmFkaWVudCBpZD0iYSIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjxsaW5lYXJHcmFkaWVudCBpZD0iYiIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjxsaW5lYXJHcmFkaWVudCBpZD0iYyIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjwvZGVmcz48L3N2Zz4=&labelColor=white)](https:\u002F\u002Fmineru.net\u002FOpenSourceTools\u002FExtractor?source=github)\n[![HuggingFace](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo_on_HuggingFace-yellow.svg?logo=data:image\u002Fpng;base64,iVBORw0KGgoAAAANSUhEUgAAAF8AAABYCAMAAACkl9t\u002FAAAAk1BMVEVHcEz\u002FnQv\u002FnQv\u002FnQr\u002FnQv\u002FnQr\u002FnQv\u002FnQv\u002FnQr\u002FwRf\u002FtxT\u002Fpg7\u002FyRr\u002FrBD\u002Fzhz\u002Fngv\u002F0B3+zBz\u002FnQv\u002F0h7\u002Fwxn\u002FvRb\u002FthXkuiT\u002FrxH\u002FpxD\u002Fogzcqyf\u002FnQvTlSz\u002FczCxky7\u002FSjifdjT\u002FMj3+Mj3wMj15aTnDNz+DSD9RTUBsP0FRO0Q6O0WyIxEIAAAAGHR
STlMADB8zSWF3krDDw8TJ1NbX5efv8ff9\u002FfxKDJ9uAAAGKklEQVR42u2Z63qjOAyGC4RwCOfB2JAGqrSb2WnTw\u002F1f3UaWcSGYNKTdf\u002FP+mOkTrE+yJBulvfvLT2A5ruenaVHyIks33npl\u002F6C4s\u002FZLAM45SOi\u002F1FtZPyFur1OYofBX3w7d54Bxm+E8db+nDr12ttmESZ4zludJAM4R7TO72YPlKZFyE+YCYUJTBZsMiNS5Sd7NlDmKM2Eg2JQg8awbglfqgbhArjxkS7dgp2RH6hc9AMLdZYUtZN5DJr4molC8BfKrEkPKEnEVjLbgW1fLy77ZVOJagoIcLIl+IxaQZGjX597HopF5Ta0vks9RnolbCK84BtjKRS5uA43hYoZcOBGIG2Epbv6CvFVQ8m8loh66WNySsnN7htL58LNp+NXT8\u002FPhXiBXPMjLSxtwp8W9f\u002F1AngRierBkA+kk\u002FIpUSOeKByzn8y3kAAAfh\u002F\u002F0oXgV4roHm\u002Fkz4E2z\u002F\u002FzRc3\u002FlgwBzbM2mJxQEa5pqgX7d1L0htrhx7LKxOZlKbwcAWyEOWqYSI8YPtgDQVjpB5nvaHaSnBaQSD6hweDi8PosxD6\u002FPT09YY3xQA7LTCTKfYX+QHpA0GCcqmEHvr\u002FcyfKQTEuwgbs2kPxJEB0iNjfJcCTPyocx+A0griHSmADiC91oNGVwJ69RudYe65vJmoqfpul0lrqXadW0jFKH5BKwAeCq+Den7s+3zfRJzA61\u002FUj\u002F9H\u002FVzLKTx9jFPPdXeeP+L7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz\u002FYn3RFHY\u002FyRuT8ddCtMPH6693JdsvofjOPnnEHkh2TnV\u002FX1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ\u002Ft\u002FfL++6unpR1YGC2n\u002FKCoa0tTLoKiEeUPDl94nj+5\u002FTv3\u002FeL7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz\u002FYn3RFHY\u002FyRuT8ddCtMPH6693JdsvofjOPnnEHkh2TnV\u002FX1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ\u002Ft\u002FfL++6unpR1YGC2n\u002FKCoa0tTLoKiEeUPDl94nj+5\u002FTv3\u002FeL7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz\u002FYn3RFHY\u002FyRuT8ddCtMPH6693JdsvofjOPnnEHkh2TnV\u002FX1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ\u002Ft\u002FfL++6unpR1YGC2n\u002FKCoa0tTLoKiEeUPDl94nj+5\u002FTv3\u002FeL7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz\u002FYn3RFHY\u002FyRuT8ddCtMPH6693JdsvofjOPnnEHkh2TnV\u002FX1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ\u002Ft\u002FfL++6unpR1YGC2n\u002FKCoa0tTLoKiEeUPDl94nj+5\u002FTv3\u002FeL7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz\u002FYn3RFHY\u002FyRuT8ddCtMPH6693JdsvofjOPnnEHkh2TnV\u002FX1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ\u002Ft\u002FfL++6unpR1YGC2n\u002FKCoa0tTLoKiEeUPDl94nj+5\u002FTv3\u002FeL7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz\u002FYn3RFHY\u002FyRuT8ddCtMP
......\n\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F11174\" target=\"_blank\">\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_4a68feb902da.png\" alt=\"opendatalab%2FMinerU | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\n\u003C!-- language -->\n\n[英语](README.md) | [简体中文](README_zh-CN.md)\n\n\u003C!-- hot link -->\n\n\u003Cp align=\"center\">\n🚀\u003Ca href=\"https:\u002F\u002Fmineru.net\u002F?source=github\">立即访问 MinerU→✅ 无需安装的网页版 ✅ 功能齐全的桌面客户端 ✅ 即时 API 访问；告别部署烦恼，一键获取所有产品形态。开发者们，快来体验吧！\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C!-- join us -->\n\n\u003Cp align=\"center\">\n    👋 欢迎加入我们的 \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTdedn9GTXq\" target=\"_blank\">Discord\u003C\u002Fa> 和 \u003Ca href=\"https:\u002F\u002Fmineru.net\u002Fcommunity-portal\u002F?aliasId=3c430f94\" target=\"_blank\">微信\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n\n\u003Cdetails>\n\u003Csummary>MinerU — 面向 LLM · RAG · Agent 工作流的高精度文档解析引擎\u003C\u002Fsummary>\n可将 PDF · Word · PPT · 图片 · 网页转换为结构化 Markdown \u002F JSON · VLM+OCR 双引擎 · 支持 109 种语言\u003Cbr>\nMCP 服务器 · LangChain \u002F Dify \u002F FastGPT 原生集成 · 支持 10 多款国产 AI 芯片\n\n**🔍 核心解析能力**\n\n- 公式 → LaTeX · 表格 → HTML，精准还原布局\n- 支持扫描件、手写文字、多栏布局、跨页表格合并\n- 输出遵循人类阅读顺序，并自动去除页眉页脚\n- VLM + OCR 双引擎，支持 109 种语言的 OCR 识别\n\n**🔌 集成**\n\n| 使用场景 | 解决方案 |\n|----------|----------|\n| AI 编程工具 | MCP 服务器 — Cursor · Claude Desktop · Windsurf |\n| RAG 框架 | LangChain · LlamaIndex · RAGFlow · RAG-Anything · Flowise · Dify · FastGPT |\n| 开发 | Python \u002F Go \u002F TypeScript SDK · CLI · REST API · Docker |\n| 无代码 | mineru.net 在线版 · Gradio WebUI · 桌面客户端 |\n\n**🖥️ 部署（私有·完全离线）**\n\n| 推理后端 | 最佳适用场景 |\n|------------------|---------|\n| pipeline         | 速度快且稳定，无幻觉，可在 CPU 或 GPU 上运行 |\n| vlm-engine       | 精度高，支持 vLLM \u002F LMDeploy \u002F mlx 生态系统 |\n| hybrid-engine    | 精度高，原生文本提取，幻觉少 |\n\n国产 AI 芯片：Ascend · Cambricon · Enflame · MetaX · Moore Threads · Kunlunxin · Iluvatar · Hygon · Biren · T-Head\n\u003C\u002Fdetails>\n\n\n\n# 更改日志\n\n- 
2026年3月29日 3.0.0 发布\n\n  本次发布围绕 **解析能力、系统架构和工程可用性** 进行了系统性升级。主要更新包括：\n  \n  - 原生 `DOCX` 解析\n    - 正式支持原生 `DOCX` 解析，结果精确无幻觉。\n    - 相比于传统先将 `DOCX` 转为 `PDF` 再解析的工作流程，端到端速度提升了数十倍，更适合对准确性和吞吐量都有较高要求的场景。\n  - `pipeline` 后端升级\n    - `pipeline` 后端在 OmniDocBench (v1.5) 上得分达到 `86.2`，超越了上一代主流 VLM `MinerU2.0-2505-0.9B` 的准确率。\n    - 新增支持解析表格内的图片\u002F公式、印章文字识别、竖排文本支持以及行间公式编号识别等功能，持续提升复杂文档场景下的解析质量。\n    - 在保持高精度的同时，资源占用极低，继续支持纯 CPU 环境下的推理。\n  - `API \u002F CLI \u002F Router` 编排升级\n    - `mineru` 现在以 `mineru-api` 为基础运行编排客户端；当未提供 `--api-url` 时，会自动启动本地临时服务。\n    - `mineru-api` 新增异步任务接口 `POST \u002Ftasks`，支持任务提交、状态查询和结果获取；同时保留同步解析接口 `POST \u002Ffile_parse`，以兼容旧版插件。\n    - 新增 `mineru-router`，专为多服务、多 GPU 环境下的统一入口部署及任务路由设计；其接口与 `mineru-api` 完全兼容，支持自动任务负载均衡。\n  - 部署与可用性改进\n    - 解决了与 `torch >= 2.8` 的兼容性问题；基础镜像已升级至 `vllm0.11.2 + torch2.9.0`，统一了不同计算能力下的安装路径。\n    - 优化了解析管道中的滑动窗口机制，大幅降低了长文档场景下的峰值内存使用，数万页的文档不再需要手动拆分。\n    - `pipeline` 中的批处理推理现在支持流式写入磁盘，已完成的解析结果可以及时写出，进一步改善长时间任务的体验。\n    - 完成了线程安全优化，全面支持多线程并发推理；结合 `mineru-router`，实现了多 GPU 的一键部署，轻松构建高并发、高吞吐量的解析系统。\n    - 彻底移除了两个 AGPLv3 许可模型（`doclayoutyolo` 和 `mfd_yolov8`）以及一个 CC-BY-NC-SA 4.0 许可模型（`layoutreader`）。  \n  \n  本次更新不仅是功能上的增强，更是 MinerU 整体系统能力的一次关键飞跃。我们特别解决了长文档解析中的峰值内存问题。通过滑动窗口和流式写盘等优化措施，超长文档解析已从“需手动拆分、小心处理”转变为“稳定、可扩展，可直接用于生产工作负载”。与此同时，我们完成了线程安全优化，全面启用了多线程并发推理，进一步提升了单机资源利用率和高并发工作负载下的运行稳定性。在此基础上，借助 `mineru-router` 和全新的 `API \u002F CLI` 编排框架，MinerU 现已支持多 GPU 的一键部署、多服务间的统一接入以及任务的自动负载均衡，大大降低了大规模部署的难度。因此，MinerU 正在从一款独立的数据生产工具，演变为面向高并发、高吞吐量场景的大规模文档解析基础平台，为企业级文档数据处理提供更加稳定、高效且易于扩展的基础设施。\n\n> 📝 查看完整 [更改日志](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Freference\u002Fchangelog\u002F) 获取更多历史版本信息\n\n# MinerU\n\n## 项目介绍\n\nMinerU 是一款文档解析工具，可将 `PDF`、图像和 `DOCX` 格式的输入转换为机器可读的格式，如 Markdown 和 JSON，以便进行下游的检索、提取和处理。\nMinerU 诞生于 [InternLM](https:\u002F\u002Fgithub.com\u002FInternLM\u002FInternLM) 的预训练过程中。我们专注于解决科学文献中的符号转换问题，希望为大模型时代的技术发展贡献力量。\n与知名的商业产品相比，MinerU 仍处于起步阶段。如果您遇到任何问题或结果不符合预期，请在 
[issue](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues) 上提交问题，并 **附上相关文档或示例文件**。\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F4bea02c9-6d54-4cd6-97ed-dff14340982c\n\n## 核心功能\n\n- 支持 `PDF`、图像和 `DOCX` 输入。\n- 去除页眉、页脚、脚注、页码等，确保语义连贯性。\n- 按照人类阅读顺序输出文本，适用于单栏、多栏及复杂布局。\n- 保留原始文档的结构，包括标题、段落、列表等。\n- 提取图片、图片说明、表格、表标题和脚注。\n- 自动识别并把文档中的公式转换为 LaTeX 格式。\n- 自动识别并把文档中的表格转换为 HTML 格式。\n- 自动检测扫描版 PDF 和乱码 PDF，并启用 OCR 功能。\n- OCR 支持 109 种语言的检测与识别。\n- 支持多种输出格式，如多模态和 NLP Markdown、按阅读顺序排序的 JSON，以及丰富的中间格式。\n- 支持多种可视化结果，包括布局可视化和跨度可视化，以高效确认输出质量。\n- 内置 CLI、FastAPI 和 Gradio WebUI，便于本地编排和多服务部署。\n- 支持纯 CPU 环境运行，同时也支持 GPU(CUDA)\u002FNPU(CANN)\u002FMPS 加速。\n- 兼容 Windows、Linux 和 Mac 平台。\n\n# 快速开始\n\n如果在安装过程中遇到任何问题，请先参阅 \u003Ca href=\"#faq\">常见问题解答\u003C\u002Fa>。\u003C\u002Fbr>\n如果解析结果不符合预期，请参考 \u003Ca href=\"#known-issues\">已知问题\u003C\u002Fa>。\u003C\u002Fbr>\n\n## 在线体验\n\n### 官方在线 Web 应用\n官方在线版本与客户端功能相同，界面美观、功能丰富，需登录后使用。\n\n- [![OpenDataLab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fwebapp_on_mineru.net-blue?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB3aWR0aD0iMTM0IiBoZWlnaHQ9IjEzNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48cGF0aCBkPSJtMTIyLDljMCw1LTQsOS05LDlzLTktNC05LTksNC05LDktOSw5LDQsOSw5eiIgZmlsbD0idXJsKCNhKSIvPjxwYXRoIGQ9Im0xMjIsOWMwLDUtNCw5LTksOXMtOS00LTktOSw0LTksOS05LDksNCw5LDl6IiBmaWxsPSIjMDEwMTAxIi8+PHBhdGggZD0ibTkxLDE4YzAsNS00LDktOSw5cy05LTQtOS05LDQtOSw5LTksOSw0LDksOXoiIGZpbGw9InVybCgjYikiLz48cGF0aCBkPSJtOTEsMThjMCw1LTQsOS05LDlzLTktNC05LTksNC05LDktOSw5LDQsOSw5eiIgZmlsbD0iIzAxMDEwMSIvPjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgY2xpcC1ydWxlPSJldmVub2RkIiBkPSJtMzksNjJjMCwxNiw4LDMwLDIwLDM4LDctNiwxMi0xNiwxMi0yNlY0OWMwLTQsMy03LDYtOGw0Ni0xMmM1LTEsMTEsMywxMSw4djMxYzAsMzctMzAsNjYtNjYsNjYtMzcsMC02Ni0zMC02Ni02NlY0NmMwLTQsMy03LDYtOGwyMC02YzUtMSwxMSwzLDExLDh2MjF6bS0yOSw2YzAsMTYsNiwzMCwxNyw0MCwzLDEsNSwxLDgsMSw1LDAsMTAtMSwxNS0zQzM3LDk1LDI5LDc5LDI5LDYyVjQybC0xOSw1djIweiIgZmlsbD0idXJsKCNjKSIvPjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgY2xpcC1ydWxlPSJldmVub2RkIiBkP
SJtMzksNjJjMCwxNiw4LDMwLDIwLDM4LDctNiwxMi0xNiwxMi0yNlY0OWMwLTQsMy03LDYtOGw0Ni0xMmM1LTEsMTEsMywxMSw4djMxYzAsMzctMzAsNjYtNjYsNjYtMzcsMC02Ni0zMC02Ni02NlY0NmMwLTQsMy03LDYtOGwyMC02YzUtMSwxMSwzLDExLDh2MjF6bS0yOSw2YzAsMTYsNiwzMCwxNyw0MCwzLDEsNSwxLDgsMSw1LDAsMTAtMSwxNS0zQzM3LDk1LDI5LDc9LDI5LDYyVjQybC0xOSw1djIweiIgZmlsbD0iIzAxMDEwMSIvPjxkZWZzPjxsaW5lYXJHcmFkaWVudCBpZD0iYSIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjxsaW5lYXJHcmFkaWVudCBpZD0iYiIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjxsaW5lYXJHcmFkaWVudCBpZD0iYyIgeDE9Ijg0IiB5MT0iNDEiIHgyPSI3NSIgeTI9IjEyMCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPjxzdG9wIHN0b3AtY29sb3I9IiNmZmYiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3AtY29sb3I9IiMyZTJlMmUiLz48L2xpbmVhckdyYWRpZW50PjwvZGVmcz48L3N2Zz4=&labelColor=white)](https:\u002F\u002Fmineru.net\u002FOpenSourceTools\u002FExtractor?source=github)\n\n### 基于Gradio的在线演示\n基于Gradio开发的Web界面，界面简洁，仅提供核心解析功能，无需登录  \n\n- [![ModelScope](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo_on_ModelScope-purple?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0bGU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuN......### 基于Gradio的在线演示\n基于Gradio开发的Web界面，界面简洁，仅提供核心解析功能，无需登录  \n\n- 
[![ModelScope](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo_on_ModelScope-purple?logo=data:image\u002Fsvg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0bGU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OWwtMjUuNjUsMGwwLC0......\n\n## 本地部署\n\n\n> [!WARNING]\n> **安装前须知—硬件与软件环境支持**\n>\n> 为确保项目的稳定性和可靠性，我们在开发过程中仅针对特定的硬件和软件环境进行优化和测试。这样可以保证用户在推荐的系统配置上部署和运行项目时，能够获得最佳性能并减少兼容性问题。\n>\n> 通过将资源集中在主流环境中，我们的团队可以更高效地解决潜在的bug并开发新功能。\n>\n> 在非主流环境中，由于硬件和软件配置的多样性以及第三方依赖的兼容性问题，我们无法保证项目100%可用。因此，对于希望在非推荐环境下使用本项目的人士，建议先仔细阅读文档和常见问题解答。大多数问题在FAQ中已有相应的解决方案。我们也鼓励社区反馈，以帮助我们逐步扩大支持范围。\n\n\u003Ctable>\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth rowspan=\"2\">解析后端\u003C\u002Fth>\n      \u003Cth rowspan=\"2\">pipeline\u003C\u002Fth>\n      \u003Cth colspan=\"2\">*-auto-engine\u003C\u002Fth>\n      \u003Cth colspan=\"2\">*-http-client\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>hybrid\u003C\u002Fth>\n      \u003Cth>vlm\u003C\u002Fth>\n      \u003Cth>hybrid\u003C\u002Fth>\n      \u003Cth>vlm\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Cth>后端特性\u003C\u002Fth>\n      \u003Ctd >兼容性好\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\">硬件要求高\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\">适用于OpenAI兼容服务器\u003Csup>2\u003C\u002Fsup>\u003C\u002Ftd>\n    \u003C\u002Ftr> \n    \u003Ctr>\n      \u003Cth>准确率\u003Csup>1\u003C\u002Fsup>\u003C\u002Fth>\n      \u003Ctd style=\"text-align:center;\">86+\u003C\u002Ftd>\n      
\u003Ctd colspan=\"4\" style=\"text-align:center;\">90+\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>操作系统\u003C\u002Fth>\n      \u003Ctd colspan=\"5\" style=\"text-align:center;\">Linux\u003Csup>3\u003C\u002Fsup> \u002F Windows\u003Csup>4\u003C\u002Fsup> \u002F macOS\u003Csup>5\u003C\u002Fsup>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>纯CPU支持\u003C\u002Fth>\n      \u003Ctd style=\"text-align:center;\">✅\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\" style=\"text-align:center;\">❌\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\" style=\"text-align:center;\">✅\u003C\u002Ftd>\n    \u003C\u002Ftr>\n        \u003Ctr>\n      \u003Cth>GPU加速\u003C\u002Fth>\n      \u003Ctd colspan=\"4\" style=\"text-align:center;\">Volta及更高架构的GPU或Apple Silicon\u003C\u002Ftd>\n      \u003Ctd rowspan=\"2\">无需\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>最小显存\u003C\u002Fth>\n      \u003Ctd style=\"text-align:center;\">4GB\u003C\u002Ftd>\n      \u003Ctd style=\"text-align:center;\">8GB\u003C\u002Ftd>\n      \u003Ctd style=\"text-align:center;\">8GB\u003C\u002Ftd>\n      \u003Ctd style=\"text-align:center;\">2GB\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>内存\u003C\u002Fth>\n      \u003Ctd colspan=\"3\" style=\"text-align:center;\">最低16GB，推荐32GB及以上\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\" style=\"text-align:center;\">最低16GB\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>磁盘空间\u003C\u002Fth>\n      \u003Ctd colspan=\"3\" style=\"text-align:center;\">最低20GB，建议使用SSD\u003C\u002Ftd>\n      \u003Ctd colspan=\"2\" style=\"text-align:center;\">最低2GB\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>Python版本\u003C\u002Fth>\n      \u003Ctd colspan=\"5\" style=\"text-align:center;\">3.10-3.13\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\u003Csup>1\u003C\u002Fsup> 准确率指标基于`MinerU`最新版本，在OmniDocBench（v1.5）中的端到端评估总分。  \n\u003Csup>2\u003C\u002Fsup> 
兼容OpenAI API的服务器，例如本地模型服务器或通过`vLLM`\u002F`SGLang`\u002F`LMDeploy`等推理框架部署的远程模型服务。  \n\u003Csup>3\u003C\u002Fsup> Linux仅支持2019年及以后发布的发行版。  \n\u003Csup>4\u003C\u002Fsup> 由于关键依赖项`ray`在Windows上不支持Python 3.13，因此仅支持3.10~3.12版本。  \n\u003Csup>5\u003C\u002Fsup> macOS需要14.0或更高版本。\n\n\n### 安装MinerU\n\n#### 使用pip或uv安装MinerU\n```bash\npip install --upgrade pip\npip install uv\nuv pip install -U \"mineru[all]\"\n```\n\n#### 从源代码安装MinerU\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU.git\ncd MinerU\nuv pip install -e .[all]\n```\n\n> [!TIP]\n> `mineru[all]`包含所有核心功能，兼容Windows \u002F Linux \u002F macOS系统，适合大多数用户。\n> 如果您需要指定VLM模型的推理框架，或者仅打算在边缘设备上安装轻量级客户端，请参阅文档中的[扩展模块安装指南](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fquick_start\u002Fextension_modules\u002F)。\n\n---\n \n#### 使用Docker部署MinerU\nMinerU提供了便捷的Docker部署方式，可以帮助快速搭建环境并解决一些棘手的环境兼容性问题。\n您可以在文档中找到[Docker部署说明](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fquick_start\u002Fdocker_deployment\u002F)。\n\n---\n\n### 使用MinerU\n\n\n如果您的设备符合上表中的GPU加速要求，您可以使用简单的命令行进行文档解析：\n```bash\nmineru -p \u003Cinput_path> -o \u003Coutput_path>\n```\n如果您的设备不符合GPU加速要求，可以将后端指定为`pipeline`，以便在纯CPU环境下运行：\n```bash\nmineru -p \u003Cinput_path> -o \u003Coutput_path> -b pipeline\n```\n\n`mineru`目前支持本地`PDF`、图像和`DOCX`文件或目录输入，并可通过CLI、API、WebUI以及`mineru-router`进行文档解析。有关详细说明，请参阅[使用指南](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fusage\u002F)。\n\n# 待办事项\n\n- [x] 基于模型的阅读顺序  \n- [x] 主文中`index`和`list`的识别  \n- [x] 表格识别\n- [x] 标题分类\n- [x] 手写文本识别  \n- [x] 竖排文本识别  \n- [x] 拉丁文重音符号识别\n- [x] 主文中代码块识别\n- [x] [化学式识别](docs\u002Fchemical_knowledge_introduction\u002Fintroduction.pdf)(mineru.net)\n- [ ] 几何形状识别\n\n# 已知问题\n\n- 阅读顺序由模型根据可读内容的空间分布决定，在布局极其复杂的区域可能会出现顺序错乱。\n- 对竖排文本的支持有限。\n- 目录和列表通过规则识别，某些不常见的列表格式可能无法被识别。\n- 布局模型中尚未支持代码块。\n- 漫画书、艺术画册、小学教材和练习册等难以很好地解析。\n- 表格识别在复杂表格中可能出现行列识别错误。\n- OCR识别在小语种PDF中可能出现字符不准确的情况（如拉丁字母中的变音符号、阿拉伯文字中容易混淆的字符）。\n- 部分公式在Markdown中可能无法正确渲染。\n\n# 常见问题解答\n\n- 
如果在使用过程中遇到任何问题，您可以先查看[常见问题解答](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Ffaq\u002F)以寻找解决方案。  \n- 如果问题仍未解决，您也可以使用[DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fopendatalab\u002FMinerU)与AI助手互动，它能够解决大多数常见问题。  \n- 如果仍然无法解决问题，欢迎您通过[Discord](https:\u002F\u002Fdiscord.gg\u002FTdedn9GTXq)或[微信](https:\u002F\u002Fmineru.net\u002Fcommunity-portal\u002F?aliasId=3c430f94)加入我们的社区，与其他用户和开发者交流讨论。\n\n# 感谢所有贡献者\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_da0181fe3e88.png\" \u002F>\n\u003C\u002Fa>\n\n# 许可证信息\n\n[LICENSE.md](LICENSE.md)\n\n本仓库中的源代码采用AGPLv3许可证。\n\n# 致谢\n\n- [UniMERNet](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FUniMERNet)\n- [TableStructureRec](https:\u002F\u002Fgithub.com\u002FRapidAI\u002FTableStructureRec)\n- [PaddleOCR](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleOCR)\n- [PaddleOCR2Pytorch](https:\u002F\u002Fgithub.com\u002Ffrotms\u002FPaddleOCR2Pytorch)\n- [fast-langdetect](https:\u002F\u002Fgithub.com\u002FLlmKira\u002Ffast-langdetect)\n- [pypdfium2](https:\u002F\u002Fgithub.com\u002Fpypdfium2-team\u002Fpypdfium2)\n- [pdftext](https:\u002F\u002Fgithub.com\u002Fdatalab-to\u002Fpdftext)\n- [pdfminer.six](https:\u002F\u002Fgithub.com\u002Fpdfminer\u002Fpdfminer.six)\n- [pypdf](https:\u002F\u002Fgithub.com\u002Fpy-pdf\u002Fpypdf)\n- [magika](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmagika)\n- [vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm)\n- [LMDeploy](https:\u002F\u002Fgithub.com\u002FInternLM\u002Flmdeploy)\n\n# 引用\n\n```bibtex\n@article{dong2026minerudiffusion,\n  title={MinerU-Diffusion: 将文档OCR重新思考为基于扩散解码的逆向渲染},\n  author={Dong, Hejun and Niu, Junbo and Wang, Bin and Zeng, Weijun and Zhang, Wentao and He, Conghui},\n  journal={arXiv预印本 arXiv:2603.22458},\n  year={2026}\n}\n\n@article{niu2025mineru2,\n  
title={Mineru2.5：一种用于高效高分辨率文档解析的解耦视觉-语言模型},\n  author={Niu, Junbo and Liu, Zheng and Gu, Zhuangcheng and Wang, Bin and Ouyang, Linke and Zhao, Zhiyuan and Chu, Tao and He, Tianyao and Wu, Fan and Zhang, Qintong et al.},\n  journal={arXiv预印本 arXiv:2509.22186},\n  year={2025}\n}\n\n@article{wang2024mineru,\n  title={Mineru：一种开源的精确文档内容提取解决方案},\n  author={Wang, Bin and Xu, Chao and Zhao, Xiaomeng and Ouyang, Linke and Wu, Fan and Zhao, Zhiyuan and Xu, Rui and Liu, Kaiwen and Qu, Yuan and Shang, Fukai et al.},\n  journal={arXiv预印本 arXiv:2409.18839},\n  year={2024}\n}\n\n@article{he2024opendatalab,\n  title={Opendatalab：以开放数据集赋能通用人工智能},\n  author={He, Conghui and Li, Wei and Jin, Zhenjiang and Xu, Chao and Wang, Bin and Lin, Dahua},\n  journal={arXiv预印本 arXiv:2407.13773},\n  year={2024}\n}\n```\n\n# 星标历史\n\n\u003Ca>\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_93d56e76be9c.png&theme=dark\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_93d56e76be9c.png\" \u002F>\n   \u003Cimg alt=\"星标历史图表\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_readme_93d56e76be9c.png\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n\n# 链接\n- [MinerU-Diffusion：将文档OCR重新思考为基于扩散解码的逆向渲染](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU-Diffusion)\n- [使用最新LLM驱动的算子和流水线轻松进行数据准备](https:\u002F\u002Fgithub.com\u002FOpenDCAI\u002FDataFlow)\n- [Vis3（基于S3的开源浏览器）](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FVis3)\n- [LabelU（一款轻量级多模态数据标注工具）](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FlabelU)\n- [LabelLLM（一个开源的LLM对话标注平台）](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FLabelLLM)\n- [PDF-Extract-Kit（一套全面的高质量PDF内容提取工具包）](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FPDF-Extract-Kit)\n- 
[OmniDocBench（一个全面的文档解析与评估基准）](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FOmniDocBench)\n- [Magic-HTML（混合网页提取工具）](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002Fmagic-html)\n- [Magic-Doc（快速提取PPT\u002FPPTX\u002FDOC\u002FDOCX\u002FPDF内容的工具）](https:\u002F\u002Fgithub.com\u002FInternLM\u002Fmagic-doc) \n- [Dingo：一款全面的人工智能数据质量评估工具](https:\u002F\u002Fgithub.com\u002FMigoXLab\u002Fdingo)","# MinerU 快速上手指南\n\nMinerU 是一款高精度的文档解析引擎，专为 LLM、RAG 和 Agent 工作流设计。它支持将 PDF、图像和 DOCX 转换为结构化的 Markdown 或 JSON，具备公式转 LaTeX、表格转 HTML 及多语言 OCR 识别能力。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Linux（2019 年及以后发布的发行版）、Windows、macOS（14.0 或更高版本）\n*   **Python 版本**：3.10 - 3.13（Windows 上仅支持 3.10 - 3.12）\n*   **硬件要求**：\n    *   **CPU 模式**：`pipeline` 后端支持纯 CPU 环境运行。\n    *   **GPU 模式**：需要 Volta 及更高架构的 NVIDIA GPU 或 Apple Silicon；同时也支持昇腾 (Ascend)、寒武纪等国产 AI 芯片。\n    *   **内存**：最低 16GB，推荐 32GB 及以上。\n*   **前置依赖**：建议安装 `pip` 包管理工具，并确保网络连接畅通（若访问 GitHub 或 PyPI 较慢，建议使用国内镜像源）。\n\n## 安装步骤\n\n### 方式一：通过 PyPI 安装（推荐）\n\n使用 pip 直接安装最新稳定版，`mineru[all]` 包含所有核心功能。国内用户推荐使用清华或阿里镜像源加速下载。\n\n```bash\n# 使用默认源安装\npip install -U \"mineru[all]\"\n\n# 或使用国内镜像源加速安装\npip install -U \"mineru[all]\" -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 方式二：从源码安装\n\n如果您需要体验最新功能或进行二次开发，可以从 GitHub 克隆源码安装。\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU.git\ncd MinerU\npip install -e .[all]\n```\n\n> **注意**：首次运行时，MinerU 会自动下载所需的模型文件。如果网络受限，请参照官方文档配置模型下载路径或使用离线包。\n\n## 基本使用\n\n安装完成后，您可以通过命令行工具 `mineru` 快速开始文档解析。\n\n### 1. 解析单个文件\n\n最简单的用法是通过 `-p` 指定输入文件（支持 PDF、DOCX 以及 PNG、JPG 等图像），并通过 `-o` 指定输出目录。\n\n```bash\nmineru -p input.pdf -o output_dir\n```\n\n*   `-p input.pdf`: 待解析的文档路径。\n*   `-o output_dir`: 解析结果（Markdown 文件及提取的图片）保存的目录。\n\n### 2. 批量解析文件夹\n\n`-p` 也可以指向一个包含多个文档的文件夹，从而进行批量处理：\n\n```bash\nmineru -p .\u002Fdocs_folder -o .\u002Fresults\n```\n\n### 3. 指定后端引擎\n\nMinerU 支持多种解析后端。自 2.7.0 起，默认后端为 `hybrid-auto-engine`（精度更高，但设备需满足 GPU 加速要求）；也可以选择 `vlm-auto-engine`，或通过 `*-http-client` 后端连接兼容 OpenAI API 的服务器。如果设备不满足 GPU 加速要求，可通过 `-b` 将后端指定为 `pipeline`，以便在纯 CPU 环境下运行：\n\n```bash\n# 在纯 CPU 环境下使用 pipeline 后端\nmineru -p input.pdf -o output_dir -b pipeline\n```\n\n### 4. 
查看帮助\n\n更多高级参数（如指定语言、输出格式、并发数等）可通过以下命令查看：\n\n```bash\nmineru --help\n```\n\n解析完成后，您将在输出目录中获得包含完整排版信息、公式（LaTeX）和表格（HTML）的 Markdown 文件，可直接用于 RAG 知识库构建或大模型训练。","某金融科技公司的数据团队需要构建一个基于大模型的财报分析助手，首要任务是将数千份包含复杂表格、公式和多栏排版的上市公司 PDF 年报转化为高质量的结构化数据。\n\n### 没有 MinerU 时\n- **排版混乱导致信息丢失**：直接提取的文本往往打乱原有的多栏布局，导致段落顺序错乱，大模型无法理解上下文逻辑。\n- **表格与公式解析失败**：PDF 中的关键财务表格被拆解为无意义的纯文本，数学公式变成乱码，严重阻碍量化分析。\n- **人工清洗成本极高**：工程师需编写大量脆弱的正则规则或安排专人手动校对，处理一份百页财报平均耗时数小时。\n- **非结构化数据难利用**：由于缺乏统一的 Markdown 或 JSON 格式，后续的智能体（Agent）工作流难以自动调用这些数据进行推理。\n\n### 使用 MinerU 后\n- **完美还原文档结构**：MinerU 精准识别并重组多栏排版，输出的 Markdown 完整保留了标题层级和阅读顺序，确保语义连贯。\n- **高精度还原图表公式**：自动将复杂财务报表转换为 HTML 表格，并将数学公式转为 LaTeX 格式，直接可供计算引擎使用。\n- **自动化流程效率倍增**：无需人工干预，MinerU 可在分钟级内完成单份财报的清洗与转换，整体数据处理效率提升数十倍。\n- **无缝对接智能体工作流**：生成的标准化 JSON\u002FMarkdown 数据可直接喂给下游 LLM，让财报分析助手能立即执行趋势预测和风险预警任务。\n\nMinerU 通过将“死”的复杂文档瞬间转化为大模型可理解的“活”数据，彻底打通了从原始资料到智能决策的最后一公里。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendatalab_MinerU_0e6b9cc9.png","opendatalab","OpenDataLab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fopendatalab_37842245.jpg","OpenDataLab provides access to numerous significant open-source datasets.",null,"OpenDataLab@pjlab.org.cn","https:\u002F\u002Fopendatalab.org.cn","https:\u002F\u002Fgithub.com\u002Fopendatalab",[81,85],{"name":82,"color":83,"percentage":84},"Python","#3572A5",99.2,{"name":86,"color":87,"percentage":88},"Dockerfile","#384d54",0.8,58197,4801,"2026-04-06T01:37:49","AGPL-3.0","Linux, macOS, Windows","非必需。pipeline 后端支持纯 CPU 运行；hybrid-auto-engine 和 vlm-auto-engine 后端需要 Volta 及更高架构的 NVIDIA GPU 或 Apple Silicon，并兼容国产 AI 芯片（如昇腾、寒武纪等）。最小显存：pipeline 为 4GB，hybrid-auto-engine 和 vlm-auto-engine 为 8GB。","最低 16GB，推荐 32GB 及以上",{"notes":97,"python":98,"dependencies":99},"1. 提供多种推理后端：pipeline（CPU\u002FGPU 通用，低资源）、vlm-auto-engine（高精度，需 GPU）、hybrid-auto-engine（默认后端，结合 pipeline 与 vlm 的优势），以及用于连接 OpenAI 兼容服务器的 *-http-client。2. 原生支持 DOCX 解析，无需转为 PDF。3. 支持多种部署方式：Python SDK、CLI、REST API、Docker 及桌面客户端。4. 兼容多种国产 AI 芯片（昇腾、寒武纪、摩尔线程等）。5. 
具体依赖库版本需参考官方 requirements 文件，此处仅列出核心组件。","3.10-3.13",[100,101,102,103],"mineru","mineru-api","torch (可选，用于 GPU 加速)","vLLM\u002FLMDeploy\u002Fmlx (可选，用于 VLM 引擎)",[15,16,105,14,35],"其他",[107,108,109,110,111,112,113,114,115,116,117,118,119],"extract-data","layout-analysis","ocr","parser","pdf","pdf-converter","python","document-analysis","pdf-parser","pdf-extractor-llm","pdf-extractor-pretrain","pdf-extractor-rag","ai4science",6,"2026-03-27T02:49:30.150509","2026-04-06T15:02:36.735734",[124,129,134,139,144,149,153],{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},19326,"如何部署支持多 GPU 并行处理的 MinerU 服务？","可以使用 LitServe 构建服务化方案。安装依赖：`pip install -U litserve python-multipart filetype` 以及对应的 torch 和 magic-pdf 版本。在服务端代码中，利用 `litserve` 的 `setup` 方法初始化模型，并在 `predict` 方法中调用 `do_parse` 进行解析。服务端会自动在多个 GPU 上并行处理请求，客户端只需多线程调用即可。注意：模型初始化通常在 `do_parse` 内部完成，无需在 `setup` 中额外实例化模型以避免资源浪费。","https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues\u002F667",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},19327,"为什么使用 pip 安装时始终获取到旧版本（如 0.6.1）而不是最新版？","这通常是因为 pip 缓存了旧版本或索引源优先级问题。尝试强制指定版本安装：`pip install -U \"magic-pdf[full]==1.1.0\" --extra-index-url https:\u002F\u002Fwheels.myhloli.com\u002F -i https:\u002F\u002Fmirrors.aliyun.com\u002Fpypi\u002Fsimple`。如果是在 Docker 环境中，请确保基础镜像没有预装旧版本，或者在 Dockerfile 中清理 pip 缓存后再安装。此外，检查是否与其他源（如清华源）冲突，导致优先拉取了旧版本。","https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues\u002F1723",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},19328,"在 Dify 中集成 MinerU 插件时报错\"UnsupportedProtocol\"或 URL 缺失协议怎么办？","该错误通常是因为环境变量配置未生效。需要修改 `.env` 文件中的 `FILES_URL` 为完整的 Dify 服务地址（包含 `http:\u002F\u002F` 或 `https:\u002F\u002F` 前缀，例如 `http:\u002F\u002Fip:port`）。修改后，仅仅执行 `docker compose restart` 可能无效，必须彻底重启容器：先执行 `docker compose down`，然后再执行 `docker compose up 
-d`，以确保新配置加载成功。","https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues\u002F2006",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},19329,"在 GPU 环境下开启表格识别（is_table_recog_enable: true）时报错\"axis 2 is out of bounds\"如何解决？","此问题出现在特定版本（如 0.7.1）的 GPU 模式下，当 PDF 中包含表格且开启表格识别配置时触发。虽然 CPU 模式下该配置正常，但 GPU 模式下可能存在维度计算 bug。临时解决方案是：如果文档中表格较少，可尝试在 `magic-pdf.json` 中将 `is_table_recog_enable` 设置为 `false` 以绕过报错；或者切换回 CPU 模式运行。建议关注官方后续版本更新以修复此 GPU 下的维度越界问题。","https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues\u002F659",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},19330,"如何在 Docker 容器中正确安装指定版本的 magic-pdf？","在 Dockerfile 中，确保使用正确的 pip 源和参数。示例如下：\n`RUN pip install -U magic-pdf[full] --extra-index-url https:\u002F\u002Fwheels.myhloli.com`\n如果仍然安装失败或版本不对，可以尝试显式指定版本号，并检查基础镜像（如 `python:3.10`）的网络连通性。若遇到依赖冲突，建议先卸载现有包再重新安装，或在构建前清除 pip 缓存。","https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fissues\u002F556",{"id":150,"question_zh":151,"answer_zh":152,"source_url":128},19331,"多 GPU 服务部署时，worker 启动过多导致 onnxruntime 报线程资源不足错误的原因是什么？","这是因为每个 worker 都会占用一定的线程资源，当每张卡启动的 worker 数量过多时，超过了 onnxruntime 默认的线程池限制或系统可用线程数。解决方法是限制每个 GPU 上启动的 worker 数量，或者在环境变量中调整 onnxruntime 的线程数配置（如 `OMP_NUM_THREADS`），确保总线程需求不超过系统承载能力。",{"id":154,"question_zh":155,"answer_zh":156,"source_url":143},19332,"PDF 中包含非简体中文（如粤语）时识别效果不佳或未开启 OCR 怎么办？","如果未开启 OCR，MinerU 主要依赖 PDF 内置文本层，对于特殊字体或生僻字（如部分粤语用字）可能识别错误或乱码。解决方案是强制开启 OCR 功能，让模型通过图像识别文字。可以在配置文件或调用参数中启用 OCR 模式，虽然速度会稍慢，但能显著提高对特殊字符和复杂排版的识别准确率。",[158,163,168,173,178,183,188,193,198,203,208,213,218,223,228,233,238,243,248,253],{"id":159,"version":160,"summary_zh":161,"released_at":162},117309,"mineru-3.0.8-released","## 变更内容\n\n* 修复：#4728 #4730 由 @myhloli 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F4731 中实现 MinerU 
的进程管理和关闭机制\n\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-3.0.7-released...mineru-3.0.8-released","2026-04-03T10:51:14",{"id":164,"version":165,"summary_zh":166,"released_at":167},117310,"mineru-3.0.7-released","## 变更内容\n\n* 修复：在 office_middle_json_mkcontent 中去除段落文本中的换行符，由 @myhloli 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F4717 中完成。\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-3.0.6-released...mineru-3.0.7-released","2026-04-01T13:23:31",{"id":169,"version":170,"summary_zh":171,"released_at":172},117311,"mineru-3.0.6-released","## 变更内容\n\n#4708：\n  - 功能：在 Markdown 处理中添加下划线主题分隔符的转义\n  - 修复：通过移除不必要的去除空白操作，修正段落文本提取\n  - 功能：增强段落文本提取功能，以包含内联内容控件\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-3.0.5-released...mineru-3.0.6-released","2026-04-01T12:54:58",{"id":174,"version":175,"summary_zh":176,"released_at":177},117312,"mineru-3.0.5-released","## 变更内容\n\n- 修复：在 3.0.4 版本中，改进了 Windows 系统下 FastAPI 子进程的关闭处理逻辑。\n- 修复：在 Swagger UI 中为文件上传添加自定义 JSON Schema，以支持 `fastapi>=0.130.0`。\n- 修复：更新 `pyproject.toml` 文件中 Windows 的 `sys_platform` 标识符，解决在 Windows 系统上安装 `[all]` 时无法自动安装 `lmdeploy` 的问题。\n- 新特性：在 `pyproject.toml` 中添加 albumentations 依赖项 #4701\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-3.0.4-released...mineru-3.0.5-released","2026-03-31T19:36:42",{"id":179,"version":180,"summary_zh":181,"released_at":182},117313,"mineru-3.0.4-released","## 变更内容\n* 功能新增：在 CLI 中添加了 `--enable-vlm-preload` 选项，用于在启动时预加载 VLM 模型，由 @myhloli 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F4693 中实现。\n\n\n**完整变更日志**: 
https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-3.0.3-released...mineru-3.0.4-released","2026-03-30T17:51:33",{"id":184,"version":185,"summary_zh":186,"released_at":187},117314,"mineru-3.0.3-released","## 变更内容\n* 修复：通过持久化执行器和回收逻辑增强 PDF 渲染，由 @myhloli 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F4688 中完成。\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-3.0.1-released...mineru-3.0.3-released","2026-03-30T12:26:04",{"id":189,"version":190,"summary_zh":191,"released_at":192},117315,"mineru-3.0.1-released","## 变更内容\n\n* 修复：重构 OCR 处理逻辑，以改进跨度处理并减少代码重复，由 @myhloli 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F4675 中完成。\n\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-3.0.0-released...mineru-3.0.1-released","2026-03-29T05:31:07",{"id":194,"version":195,"summary_zh":196,"released_at":197},117316,"mineru-3.0.0-released","## 变更内容\n\n- 2026年3月29日 发布 3.0.0\n\n本次发布围绕**解析能力、系统架构和工程易用性**进行了系统性升级。主要更新包括：\n\n- 原生 `DOCX` 解析\n  - 正式支持原生 `DOCX` 解析，结果精度高且无幻觉问题。\n  - 相较于传统先将 `DOCX` 转换为 `PDF` 再进行解析的工作流，端到端速度提升了数十倍，更适合对准确性和吞吐量均有较高要求的场景。\n  \n- `pipeline` 后端升级\n  - 在 OmniDocBench（v1.5）上获得 86.2 分，超越了上一代主流 VLM 模型 `MinerU2.0-2505-0.9B` 的精度。\n  - 新增对表格内图像\u002F公式解析、印章文字识别、竖排文本支持以及行间公式编号识别等功能，持续提升复杂文档场景下的解析质量。\n  - 在保持高精度的同时，资源占用极低，并继续支持纯 CPU 环境下的推理。\n\n- `API \u002F CLI \u002F Router` 编排升级\n  - `mineru` 现已作为基于 `mineru-api` 的编排客户端运行；当未指定 `--api-url` 时，会自动启动本地临时服务。\n  - `mineru-api` 新增异步任务接口 `POST \u002Ftasks`，支持任务提交、状态查询及结果获取；同时保留同步解析接口 `POST \u002Ffile_parse`，以兼容旧版插件。\n  - 新增 `mineru-router`，专为多服务、多 GPU 环境下的统一入口部署与任务路由设计；其接口与 `mineru-api` 完全兼容，并支持任务的自动负载均衡。\n\n- 部署与易用性改进\n  - 解决了与 `torch >= 2.8` 的兼容性问题；基础镜像升级至 `vllm0.11.2 + torch2.9.0`，统一了不同计算能力下的安装路径。\n  - 优化了解析管道，引入滑动窗口机制，显著降低了长文档场景下的峰值内存使用，使得数万页文档无需再手动拆分。\n  - `pipeline` 中的批量推理现支持流式写盘，可及时输出已完成的解析结果，进一步提升长时间任务的用户体验。\n  - 完成了线程安全优化，全面支持多线程并发推理；结合 
`mineru-router`，可实现一键多 GPU 部署，轻松构建高并发、高吞吐量的解析系统。\n  - 彻底移除了对两款 AGPLv3 许可模型（`doclayoutyolo` 和 `mfd_yolov8`）以及一款 CC-BY-NC-SA 4.0 许可模型（`layoutreader`）的依赖。\n\n此次更新不仅是一系列功能增强，更是 MinerU 整体发展中的关键飞跃。","2026-03-28T20:13:48",{"id":199,"version":200,"summary_zh":201,"released_at":202},117317,"mineru-2.7.6-released","## 变更内容\n\n- 2026年2月6日 2.7.6 版本发布\n  - 新增对国产算力平台昆仑芯和太初元碁的支持。\n\n- 2026年2月6日 2.7.6 版本发布\n  - 新增国产算力平台昆仑芯、太初元碁的适配支持，目前已由官方和厂商适配并支持的国产算力平台包括：\n    - [昇腾 Ascend](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FAscend) \n    - [平头哥 T-Head](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FTHead) \n    - [沐曦 METAX](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FMETAX) \n    - [海光 Hygon](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FHygon\u002F)\n    - [燧原 Enflame](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FEnflame\u002F)\n    - [摩尔线程 MooreThreads](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FMooreThreads\u002F)\n    - [天数智芯 IluvatarCorex](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FIluvatarCorex\u002F)\n    - [寒武纪 Cambricon](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FCambricon\u002F)\n    - [昆仑芯 Kunlunxin](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FKunlunxin\u002F)\n    - [太初元碁 Tecorigin](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FTecorigin\u002F)\n  - MinerU 持续兼容国产硬件平台，支持主流芯片架构。以安全可靠的技术，助力科研、政企用户迈向文档数字化新高度！\n\n## 新贡献者\n* @Arrmsgt 在 https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F4498 中做出了首次贡献\n\n**完整变更日志**: 
https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-2.7.5-released...mineru-2.7.6-released","2026-02-06T03:39:53",{"id":204,"version":205,"summary_zh":206,"released_at":207},117318,"mineru-2.7.5-released","## 变更内容\n\n- 修复在特定条件下 PDF 渲染超时检测失败的问题。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-2.7.4-released...mineru-2.7.5-released","2026-02-02T11:59:52",{"id":209,"version":210,"summary_zh":211,"released_at":212},117319,"mineru-2.7.4-released","## What's Changed\r\n\r\n- 2026\u002F01\u002F30 2.7.4 Release\r\n  - Added support for domestic computing platforms IluvatarCorex and Cambricon.\r\n\r\n- 2026\u002F01\u002F30 2.7.4 发布\r\n  - 新增国产算力平台天数智芯、寒武纪的适配支持，目前已由官方适配并支持的国产算力平台包括:\r\n    - [昇腾 Ascend](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FAscend) \r\n    - [平头哥 T-Head](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FTHead) \r\n    - [沐曦 METAX](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FMETAX) \r\n    - [海光 Hygon](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FHygon\u002F)\r\n    - [燧原 Enflame](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FEnflame\u002F)\r\n    - [摩尔线程 MooreThreads](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FMooreThreads\u002F)\r\n    - [天数智芯 IluvatarCorex](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FIluvatarCorex\u002F)\r\n    - [寒武纪 Cambricon](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FCambricon\u002F)\r\n  - MinerU 持续兼容国产硬件平台，支持主流芯片架构。以安全可靠的技术，助力科研、政企用户迈向文档数字化新高度！\r\n\r\n## New Contributors\r\n* @pgoslatara made their first 
contribution in https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F4421\r\n* @Copilot made their first contribution in https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F4434\r\n* @guguducken made their first contribution in https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F4435\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-2.7.3-released...mineru-2.7.4-released","2026-01-30T13:48:52",{"id":214,"version":215,"summary_zh":216,"released_at":217},117320,"mineru-2.7.3-released","## What's Changed\r\n\r\n- Fix bug : #4415\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-2.7.2-released...mineru-2.7.3-released","2026-01-26T11:40:42",{"id":219,"version":220,"summary_zh":221,"released_at":222},117321,"mineru-2.7.2-released","## What's Changed\r\n\r\n- 2026\u002F01\u002F23 2.7.2 Release\r\n  - Cross-page table merging optimization, improving merge success rate and merge quality\r\n \r\n- 2026\u002F01\u002F23 2.7.2 发布\r\n  - 新增国产算力平台海光、燧原、摩尔线程的适配支持，目前已由官方适配并支持的国产算力平台包括:\r\n    - [昇腾 Ascend](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FAscend) \r\n    - [平头哥 T-Head](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FTHead) \r\n    - [沐曦 METAX](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FMETAX) \r\n    - [海光 Hygon](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FHygon\u002F)\r\n    - [燧原 Enflame](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FEnflame\u002F)\r\n    - [摩尔线程 MooreThreads](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002Facceleration_cards\u002FMooreThreads\u002F)\r\n  - MinerU 
持续兼容国产硬件平台，支持主流芯片架构。以安全可靠的技术，助力科研、政企用户迈向文档数字化新高度！\r\n  - 跨页表合并优化，提升合并成功率与合并效果\r\n\r\n## New Contributors\r\n* @tommygood made their first contribution in https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F4365\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-2.7.1-released...mineru-2.7.2-released","2026-01-23T13:41:07",{"id":224,"version":225,"summary_zh":226,"released_at":227},117322,"mineru-2.7.1-released","## What's Changed\r\n\r\n- 2026\u002F01\u002F06 2.7.1 Release\r\n  - fix bug: #4300\r\n  - Updated pdfminer.six dependency version to resolve [CVE-2025-64512](https:\u002F\u002Fgithub.com\u002Fadvisories\u002FGHSA-wf5f-4jwr-ppcp)\r\n  - Support automatic correction of input image exif orientation to improve OCR recognition accuracy  #4283\r\n\r\n- 2026\u002F01\u002F06 2.7.1 发布\r\n  - fix bug: #4300\r\n  - 更新pdfminer.six的依赖版本以解决 [CVE-2025-64512](https:\u002F\u002Fgithub.com\u002Fadvisories\u002FGHSA-wf5f-4jwr-ppcp)\r\n  - 支持输入图像的exif方向自动校正，提升OCR识别效果  #4283\r\n  \r\n## New Contributors\r\n* @kingdomad made their first contribution in https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F4283\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-2.7.0-released...mineru-2.7.1-released","2026-01-06T06:59:26",{"id":229,"version":230,"summary_zh":231,"released_at":232},117323,"mineru-2.7.0-released","## What's Changed\r\n\r\n- 2025\u002F12\u002F30 2.7.0 Release\r\n  - Simplified installation process. No need to separately install `vlm` acceleration engine dependencies. Using `uv pip install mineru[all]` during installation will install all optional backend dependencies.\r\n  - Added new `hybrid` backend, which combines the advantages of `pipeline` and `vlm` backends. 
Built on vlm, it integrates some capabilities of pipeline, adding extra extensibility on top of high accuracy:\r\n    - Directly extracts text from text PDFs, natively supports multi-language recognition in text PDF scenarios, and greatly reduces parsing hallucinations;\r\n    - Supports text recognition in 109 languages for scanned PDF scenarios by specifying OCR language;\r\n    - Independent inline formula recognition switch, which can be disabled separately when inline formula recognition is not needed, improving the visual effect of parsing results.\r\n  - Simplified engine selection logic for `vlm\u002Fhybrid` backends. Users only need to specify the backend as `*-auto-engine`, and the system will automatically select the appropriate engine for inference acceleration based on the current environment, improving usability.\r\n  - Switched default parsing backend from `pipeline` to `hybrid-auto-engine`, improving out-of-the-box result consistency for new users and avoiding cognitive differences in parsing results.\r\n  - Added i18n support to gradio application, supporting switching between Chinese and English languages.\r\n \r\n- 2025\u002F12\u002F30 2.7.0 发布\r\n  - 简化安装流程，现在不再需要单独安装`vlm`加速引擎依赖包，安装时使用`uv pip install mineru[all]`即可安装所有可选后端的依赖包。\r\n  - 增加全新后端`hybrid`，该后端结合了`pipeline`和`vlm`后端的优势，在vlm的基础上，融入了pipeline的部分能力，在高精度的基础上增加了额外的扩展性：\r\n    - 从文本pdf中直接抽取文本，在文本pdf场景原生支持多语言识别，并极大减少解析幻觉；\r\n    - 通过指定ocr语言，在扫描pdf场景下支持109种语言的文本识别；\r\n    - 独立的行内公式识别开关，在不需要行内公式识别的场景下可单独关闭，提升解析结果视觉效果。\r\n  - 简化`vlm\u002Fhybrid`后端的引擎选择逻辑，用户只需指定后端为`*-auto-engine`，系统会根据当前环境自动选择合适的引擎进行推理加速，提升易用性.\r\n  - 默认解析后端从`pipeline`切换至`hybrid-auto-engine`，提升新用户开箱即用的结果一致性，避免出现解析结果认知差异。\r\n  - gradio应用增加i18n适配，支持中英文两种语言切换。\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-2.6.8-released...mineru-2.7.0-released","2025-12-30T10:24:37",{"id":234,"version":235,"summary_zh":236,"released_at":237},117324,"mineru-2.6.8-released","## What's 
Changed\r\n- Bug fix: #4189\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-2.6.7-released...mineru-2.6.8-released","2025-12-15T10:25:45",{"id":239,"version":240,"summary_zh":241,"released_at":242},117325,"mineru-2.6.7-released","## What's Changed\r\n\r\n- Bug fix: #4168\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-2.6.6-released...mineru-2.6.7-released","2025-12-12T09:25:04",{"id":244,"version":245,"summary_zh":246,"released_at":247},117326,"mineru-2.6.6-released","## What's Changed\r\n\r\n- 2025\u002F12\u002F02 2.6.6 Release\r\n  - `mineru-api` tool optimizations\r\n    - Added descriptive text to `mineru-api` interface parameters to improve API documentation readability.\r\n    - You can use the environment variable `MINERU_API_ENABLE_FASTAPI_DOCS` to control whether the auto-generated interface documentation page is enabled (enabled by default).\r\n    - Added concurrency configuration options for the `vlm-vllm-async-engine`, `vlm-lmdeploy-engine`, and `vlm-http-client` backends. 
Users can use the environment variable `MINERU_API_MAX_CONCURRENT_REQUESTS` to set the maximum number of concurrent API requests (unlimited by default).\r\n\r\n- 2025\u002F12\u002F02 2.6.6 发布\r\n  - `Ascend`适配优化\r\n    - 优化命令行工具初始化流程，使Ascend适配方案中`vlm-vllm-engine`后端在命令行工具中可用。\r\n    - 为Atlas 300I Duo(310p)设备更新适配文档。\r\n  - `mineru-api`工具优化\r\n    - 为`mineru-api`接口参数增加描述性文本，优化接口文档可读性。\r\n    - 可通过环境变量`MINERU_API_ENABLE_FASTAPI_DOCS`控制是否启用自动生成的接口文档页面，默认为启用。\r\n    - 为`vlm-vllm-async-engine`、`vlm-lmdeploy-engine`、`vlm-http-client`后端增加并发数配置选项，用户可通过环境变量`MINERU_API_MAX_CONCURRENT_REQUESTS`控制api接口的最大并发请求数，默认为不限制数量。\r\n\r\n\r\n\r\n## New Contributors\r\n* @zyileven made their first contribution in https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F4070\r\n* @Flynn-Zh made their first contribution in https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F4046\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-2.6.5-released...mineru-2.6.6-released","2025-12-01T19:54:45",{"id":249,"version":250,"summary_zh":251,"released_at":252},117327,"mineru-2.6.5-released","## What's Changed\r\n- 2025\u002F11\u002F26 2.6.5 Release\r\n  - Added support for a new backend vlm-lmdeploy-engine. 
Its usage is similar to vlm-vllm-(async)engine, but it uses lmdeploy as the inference engine and additionally supports native inference acceleration on Windows platforms compared to vllm.\r\n\r\n- 2025\u002F11\u002F26 2.6.5 发布\r\n  - 增加新后端`vlm-lmdeploy-engine`支持，使用方式与`vlm-vllm-(async)engine`类似，但使用`lmdeploy`作为推理引擎，与`vllm`相比额外支持Windows平台原生推理加速。\r\n  - 新增国产算力平台`昇腾\u002Fnpu`、`平头哥\u002Fppu`、`沐曦\u002Fmaca`的适配支持，用户可在对应平台上使用`pipeline`与`vlm`模型，并使用`vllm`\u002F`lmdeploy`引擎加速vlm模型推理，具体使用方式请参考[其他加速卡适配](https:\u002F\u002Fopendatalab.github.io\u002FMinerU\u002Fzh\u002Fusage\u002F)。\r\n    - 国产平台适配不易，我们已尽量确保适配的完整性和稳定性，但仍可能存在一些稳定性\u002F兼容问题与精度对齐问题，请大家根据适配文档页面内红绿灯情况自行选择合适的环境与场景进行使用。\r\n    - 如在使用国产化平台适配方案的过程中遇到任何文档未提及的问题，为便于其他用户查找解决方案，请在discussions的[指定帖子](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fdiscussions\u002F4064)中进行反馈。\r\n\r\n## New Contributors\r\n* @jinminxi104 made their first contribution in https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fpull\u002F3946\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-2.6.4-released...mineru-2.6.5-released","2025-11-26T03:52:02",{"id":254,"version":255,"summary_zh":256,"released_at":257},117328,"mineru-2.6.4-released","## What's Changed\r\n- 2025\u002F11\u002F04 2.6.4 Release\r\n  - Added timeout configuration for PDF image rendering, default is 300 seconds, can be configured via environment variable `MINERU_PDF_RENDER_TIMEOUT` to prevent long blocking of the rendering process caused by some abnormal PDF files.\r\n  - Added CPU thread count configuration options for ONNX models, default is the system CPU core count, can be configured via environment variables `MINERU_INTRA_OP_NUM_THREADS` and `MINERU_INTER_OP_NUM_THREADS` to reduce CPU resource contention conflicts in high concurrency scenarios.\r\n\r\n- 2025\u002F11\u002F04 2.6.4 发布\r\n  - 为pdf渲染图片增加超时配置，默认为300秒，可通过环境变量`MINERU_PDF_RENDER_TIMEOUT`进行配置，防止部分异常pdf文件导致渲染过程长时间阻塞。\r\n  - 
为onnx模型增加cpu线程数配置选项，默认为系统cpu核心数，可通过环境变量`MINERU_INTRA_OP_NUM_THREADS`和`MINERU_INTER_OP_NUM_THREADS`进行配置，以减少高并发场景下的对cpu资源的抢占冲突。\r\n\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU\u002Fcompare\u002Fmineru-2.6.3-released...mineru-2.6.4-released","2025-11-04T12:29:33"]