[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-deepdoctection--deepdoctection":3,"tool-deepdoctection--deepdoctection":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",151918,2,"2026-04-12T11:33:05",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":64,"owner_name":64,"owner_avatar_url":72,"owner_bio":73,"owner_company":74,"owner_location":74,"owner_email":74,"owner_twitter":74,"owner_website":74,"owner_url":75,"languages":76,"stars":92,"forks":93,"last_commit_at":94,"license":95,"difficulty_score":96,"env_os":97,"env_gpu":98,"env_ram":97,"env_deps":99,"category_tags":111,"github_topics":112,"view_count":32,"oss_zip_url":74,"oss_zip_packed_at":74,"status":17,"created_at":128,"updated_at":129,"faqs":130,"releases":160},6925,"deepdoctection\u002Fdeepdoctection","deepdoctection","A Repo For Document AI","deepdoctection 是一个专为文档智能理解打造的 Python 开源库，旨在帮助开发者高效构建从扫描版 PDF 到结构化数据的提取流水线。它主要解决了复杂文档处理中布局分析困难、OCR 识别精度不一以及后续文本分类难以串联的痛点，让用户无需从零开始整合各类算法。\n\n该工具非常适合需要处理大量非结构化文档的 AI 工程师、数据科学家及研究人员使用。无论是构建自动化归档系统，还是研发复杂的文档问答应用，deepdoctection 都能提供坚实的技术支撑。其核心亮点在于强大的生态整合能力：底层基于 PyTorch，无缝集成了 Detectron2 和 Hugging Face Transformers，支持 LayoutLM、LiLT 等多种前沿预训练模型的微调与推理；同时在 OCR 环节灵活兼容 Tesseract、DocTr 及 AWS 
Textract。此外，它还提供了完整的管道评估体系和丰富的教程笔记本，帮助用户轻松实现文档布局分析、表格识别、语言检测及图像矫正等全流程任务，让文档提取工作变得更加流畅可控。","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepdoctection_deepdoctection_readme_52607c1abf48.png\" alt=\"Deep Doctection Logo\" width=\"60%\">\n\u003C\u002Fp>\n\n![GitHub Repo stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdeepdoctection\u002Fdeepdoctection)\n![PyPI - Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fdeepdoctection)\n![PyPI - License](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fl\u002Fdeepdoctection)\n\n\n------------------------------------------------------------------------------------------------------------------------\n# NEW \n\nVersion `v.1.0` includes a major refactoring. Key changes include:\n\n* PyTorch-only support for all deep learning models.\n* Support for many more fine-tuned models from the Hugging Face Hub (BERT, RoBERTa, LayoutLM, LiLT, ...)\n* Decomposition into small sub-packages: dd-core, dd-datasets and deepdoctection\n* Type validation of core data structures\n* A new test suite\n\n------------------------------------------------------------------------------------------------------------------------\n\n\u003Cp align=\"center\">\n  \u003Ch1 align=\"center\">\n  A Package for Document Understanding\n  \u003C\u002Fh1>\n\u003C\u002Fp>\n\n\n**deep**doctection is a Python library that orchestrates scan and PDF document layout analysis, OCR and document \nand token classification. 
Build and run a pipeline for your document extraction tasks, develop your own document\nextraction workflow, fine-tune pre-trained models and use them seamlessly for inference.\n\n# Overview\n\n- Document layout analysis and table recognition in PyTorch with \n[**Detectron2**](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\u002Ftree\u002Fmain\u002Fdetectron2) and \n[**Transformers**](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers),\n- OCR with support for [**Tesseract**](https:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract), [**DocTr**](https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr) and \n  [**AWS Textract**](https:\u002F\u002Faws.amazon.com\u002Ftextract\u002F),\n- Document and token classification with the [**LayoutLM**](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Funilm) family,\n  [**LiLT**](https:\u002F\u002Fgithub.com\u002FjpWang\u002FLiLT) and many\n  [**Bert**](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fxlm-roberta)-style models including features like sliding windows.\n- Text mining for native PDFs with [**pdfplumber**](https:\u002F\u002Fgithub.com\u002Fjsvine\u002Fpdfplumber),\n- Language detection with the transformer-based `papluca\u002Fxlm-roberta-base-language-detection`. 
\n- Deskewing and rotating images with [**jdeskew**](https:\u002F\u002Fgithub.com\u002Fphamquiluan\u002Fjdeskew) or [**Tesseract**](https:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract).\n- Fine-tuning object detection, document or token classification models and evaluating whole pipelines.\n- Lots of [tutorials](https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fnotebooks).\n\nHave a look at the [**introduction notebook**](https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fnotebooks\u002Fblob\u002Fmain\u002FAnalyzer_Get_Started.ipynb) for an easy start.\n\nCheck the [**release notes**](https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Freleases) for recent updates.\n\n----------------------------------------------------------------------------------------\n\n# Hugging Face Space Demo\n\nCheck the demo of a document layout analysis pipeline with OCR on 🤗\n[**Hugging Face spaces**](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdeepdoctection\u002Fdeepdoctection).\n\n--------------------------------------------------------------------------------------------------------\n\n# Example \n\nThe following example shows how to use the built-in analyzer to decompose a PDF document into its layout structures.\n\n```python\nimport deepdoctection as dd\nfrom IPython.core.display import HTML\nfrom matplotlib import pyplot as plt\n\nanalyzer = dd.get_dd_analyzer()  # instantiate the built-in analyzer similar to the Hugging Face space demo\n\ndf = analyzer.analyze(path=\"\u002Fpath\u002Fto\u002Fyour\u002Fdoc.pdf\")  # set up the pipeline\ndf.reset_state()  # trigger initialization\n\ndoc = iter(df)\npage = next(doc)\n\nimage = page.viz(show_figures=True, show_residual_layouts=True)\nplt.figure(figsize=(25, 17))\nplt.axis('off')\nplt.imshow(image)\n```\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepdoctection_deepdoctection_readme_bfeaa45a3095.png\" 
\nalt=\"sample\" width=\"40%\">\n\u003C\u002Fp>\n\n```python\nHTML(page.tables[0].html)\n```\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepdoctection_deepdoctection_readme_8b4f81e20525.png\" \nalt=\"table\" width=\"40%\">\n\u003C\u002Fp>\n\n```python\nprint(page.text)\n```\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepdoctection_deepdoctection_readme_0452df6eb523.png\" \nalt=\"text\" width=\"40%\">\n\u003C\u002Fp>\n\n\n\n-----------------------------------------------------------------------------------------\n\n# Requirements\n\n![requirements](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepdoctection_deepdoctection_readme_99bcdcc80dd4.png)\n\n- Python >= 3.10\n- PyTorch >= 2.6\n- To fine-tune models, a GPU is recommended.\n\n------------------------------------------------------------------------------------------\n\n# Installation\n\nWe recommend using a virtual environment.\n\n## Get started installation\n\nFor a simple setup that is enough to parse documents with the default settings, install the following:\n\n```bash\nuv pip install timm  # needed for the default setup\nuv pip install transformers\nuv pip install python-doctr\nuv pip install deepdoctection\n```\n\nThis setup is sufficient to run the [**introduction notebook**](https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fnotebooks\u002Fblob\u002Fmain\u002FGet_Started.ipynb).\n\n### Full installation\n\nThe following installation will give you a general setup so that you can experiment with various configurations.\nRemember that you always have to install PyTorch separately.\n\nFirst install **Detectron2** separately, as it is not distributed via PyPI. 
Check the instructions\n[here](https:\u002F\u002Fdetectron2.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Finstall.html) or try:\n\n```bash\nuv pip install --no-build-isolation detectron2@git+https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdetectron2.git\n```\n\nThen install **deep**doctection with all its dependencies:\n\n```bash\nuv pip install deepdoctection[full]\n```\n\n### Installation with Conda or Mamba\n\nYou can install **deep**doctection using Conda or Mamba with the provided `environment.yml`:\n\n```bash\n# Using conda\nconda env create -f environment.yml\nconda activate deepdoctection\n\n# Using mamba (faster)\nmamba env create -f environment.yml\nmamba activate deepdoctection\n```\n\n\nFor further information, please consult the [**full installation instructions**](https:\u002F\u002Fdeepdoctection.readthedocs.io\u002Fen\u002Flatest\u002Finstall\u002F).\n\n\n## Installation from source\n\nDownload the repository or clone it via:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection.git\n```\n\nThe easiest way is to install with `make`; a virtual environment is required:\n\n```bash\nmake install-dd\n```\n\n\n## Running a Docker container from Docker Hub\n\nPre-built Docker images can be downloaded from [Docker Hub](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fdeepdoctection\u002Fdeepdoctection).\n\nAdditionally, specify a working directory to mount files to be processed into the container. Running\n\n```bash\ndocker compose up -d\n```\n\nwill start the container. There is no endpoint exposed, though.\n\n-----------------------------------------------------------------------------------------------\n\n# Credits\n\nWe thank all libraries that provide high-quality code and pre-trained models. Without them, it would have been impossible\nto develop this framework.\n\n\n# If you like **deep**doctection ...\n\n...you can easily support the project by making it more visible. 
Leaving a star or a recommendation will help.\n\n# License\n\nDistributed under the Apache 2.0 License. Check [LICENSE](https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fblob\u002Fmaster\u002FLICENSE) for additional information.\n","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepdoctection_deepdoctection_readme_52607c1abf48.png\" alt=\"Deep Doctection Logo\" width=\"60%\">\n\u003C\u002Fp>\n\n![GitHub 仓库星级](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdeepdoctection\u002Fdeepdoctection)\n![PyPI - 版本](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fdeepdoctection)\n![PyPI - 许可证](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fl\u002Fdeepdoctection)\n\n\n------------------------------------------------------------------------------------------------------------------------\n# 新版\n\n版本 `v.1.0` 包含了一次重大重构。主要变更包括：\n\n* 所有深度学习模型仅支持 PyTorch。\n* 支持来自 Hugging Face Hub 的更多微调模型（Bert、RobertA、LayoutLM、LiLT 等）。\n* 拆分为多个小型子包：dd-core、dd-datasets 和 deepdoctection。\n* 核心数据结构的类型验证。\n* 新的测试套件。\n\n------------------------------------------------------------------------------------------------------------------------\n\n\u003Cp align=\"center\">\n  \u003Ch1 align=\"center\">\n  用于文档理解的软件包\n  \u003C\u002Fh1>\n\u003C\u002Fp>\n\n\n**deep**doctection 是一个 Python 库，用于编排扫描和 PDF 文档的布局分析、OCR 以及文档和标记分类。您可以为自己的文档提取任务构建并运行流水线，开发自定义的文档提取工作流，微调预训练模型，并无缝地将其用于推理。\n\n# 概述\n\n- 使用 [**Detectron2**](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\u002Ftree\u002Fmain\u002Fdetectron2) 和 [**Transformers**](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) 在 PyTorch 中进行文档布局分析和表格识别，\n- OCR 支持 [**Tesseract**](https:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract)、[**DocTr**](https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr) 和 [**AWS Textract**](https:\u002F\u002Faws.amazon.com\u002Ftextract\u002F)，\n- 使用 
[**LayoutLM**](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Funilm) 系列、[**LiLT**](https:\u002F\u002Fgithub.com\u002FjpWang\u002FLiLT) 以及许多基于 [**Bert**](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fxlm-roberta) 风格的模型进行文档和标记分类，这些模型还包括滑动窗口等功能。\n- 使用 [**pdfplumber**](https:\u002F\u002Fgithub.com\u002Fjsvine\u002Fpdfplumber) 对原生 PDF 进行文本挖掘，\n- 基于 Transformer 的 `papluca\u002Fxlm-roberta-base-language-detection` 进行语言检测。\n- 使用 [**jdeskew**](https:\u002F\u002Fgithub.com\u002Fphamquiluan\u002Fjdeskew) 或 [**Tesseract**](https:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract) 对图像进行去倾斜和旋转处理。\n- 微调目标检测、文档或标记分类模型，并评估整个流水线。\n- 多个 [教程](https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fnotebooks)\n\n请查看 [**入门笔记本**](https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fnotebooks\u002Fblob\u002Fmain\u002FAnalyzer_Get_Started.ipynb)，以便快速上手。\n\n有关最新更新，请参阅 [**发布说明**](https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Freleases)。\n\n----------------------------------------------------------------------------------------\n\n# Hugging Face Space 演示\n\n在 🤗 [**Hugging Face spaces**](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdeepdoctection\u002Fdeepdoctection) 上查看带有 OCR 的文档布局分析流水线演示。\n\n--------------------------------------------------------------------------------------------------------\n\n# 示例\n\n以下示例展示了如何使用内置分析器将 PDF 文档分解为其布局结构。\n\n```python\nimport deepdoctection as dd\nfrom IPython.core.display import HTML\nfrom matplotlib import pyplot as plt\n\nanalyzer = dd.get_dd_analyzer()  # 实例化与 Hugging Face space 演示类似的内置分析器\n\ndf = analyzer.analyze(path = \"\u002Fpath\u002Fto\u002Fyour\u002Fdoc.pdf\")  # 设置流水线\ndf.reset_state()                 # 触发一些初始化\n\ndoc = iter(df)\npage = next(doc) \n\nimage = page.viz(show_figures=True, show_residual_layouts=True)\nplt.figure(figsize = (25,17))\nplt.axis('off')\nplt.imshow(image)\n```\n\n\u003Cp align=\"center\">\n  \u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepdoctection_deepdoctection_readme_bfeaa45a3095.png\" \nalt=\"样本\" width=\"40%\">\n\u003C\u002Fp>\n\n```\nHTML(page.tables[0].html)\n```\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepdoctection_deepdoctection_readme_8b4f81e20525.png\" \nalt=\"表格\" width=\"40%\">\n\u003C\u002Fp>\n\n```\nprint(page.text)\n```\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepdoctection_deepdoctection_readme_0452df6eb523.png\" \nalt=\"文本\" width=\"40%\">\n\u003C\u002Fp>\n\n\n\n-----------------------------------------------------------------------------------------\n\n# 要求\n\n![要求](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepdoctection_deepdoctection_readme_99bcdcc80dd4.png)\n\n- Python >= 3.10\n- PyTorch >= 2.6\n- 若要微调模型，建议使用 GPU。\n\n------------------------------------------------------------------------------------------\n\n# 安装\n\n我们建议使用虚拟环境。\n\n## 快速安装\n\n对于只需使用默认设置解析文档的简单配置，请安装以下内容：\n\n```\nuv pip install timm  # 默认配置所需\nuv pip install transformers\nuv pip install python-doctr\nuv pip install deepdoctection\n```\n\n此配置足以运行 [**入门笔记本**](https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fnotebooks\u002Fblob\u002Fmain\u002FGet_Started.ipynb)。\n\n### 完整安装\n\n以下安装将为您提供一个通用的配置，以便您可以尝试各种不同的设置。请记住，您始终需要单独安装 PyTorch。\n\n首先，由于 Detectron2 不通过 PyPI 发布，因此需要单独安装。请参考 [此处](https:\u002F\u002Fdetectron2.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Finstall.html) 的说明，或尝试：\n\n```\nuv pip install --no-build-isolation detectron2@git+https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdetectron2.git\n```\n\n然后安装包含所有依赖项的 **deep**doctection：\n\n```\nuv pip install deepdoctection[full]\n```\n\n### 使用 Conda 或 Mamba 安装\n\n您可以使用 Conda 或 Mamba 通过提供的 `environment.yml` 文件来安装 **deep**doctection：\n\n```bash\n# 使用 conda\nconda env create -f environment.yml\nconda activate deepdoctection\n\n# 使用 mamba（更快）\nmamba 
env create -f environment.yml\nmamba activate deepdoctection\n```\n\n\n有关更多信息，请参阅 [**完整安装指南**](https:\u002F\u002Fdeepdoctection.readthedocs.io\u002Fen\u002Flatest\u002Finstall\u002F)。\n\n## 从源代码安装\n\n您可以下载仓库或通过以下命令克隆：\n\n```\ngit clone https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection.git\n```\n\n最简单的方式是使用 make 进行安装。需要虚拟环境：\n\n```bash\nmake install-dd\n```\n\n## 从 Docker Hub 运行 Docker 容器\n\n预先存在的 Docker 镜像可以从 [Docker Hub](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fdeepdoctection\u002Fdeepdoctection) 下载。\n\n此外，还需指定一个工作目录，用于将待处理的文件挂载到容器中。\n\n```\ndocker compose up -d\n```\n\n将启动容器。不过，目前尚未暴露任何端点。\n\n-----------------------------------------------------------------------------------------------\n\n# 致谢\n\n我们感谢所有提供高质量代码和预训练模型的开源库。如果没有它们，开发本框架将是不可能的。\n\n# 如果你喜欢 **deep**doctection……\n\n……你可以通过让更多人了解该项目来轻松支持它。留下一颗星或推荐一下都会有所帮助。\n\n# 许可证\n\n本项目采用 Apache 2.0 许可证进行分发。更多信息请参阅 [LICENSE](https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fblob\u002Fmaster\u002FLICENSE) 文件。","# deepdoctection 快速上手指南\n\n**deepdoctection** 是一个用于文档理解的 Python 库，能够编排扫描版和 PDF 文档的布局分析、OCR（光学字符识别）以及文档\u002F令牌分类任务。它基于 PyTorch，集成了 Detectron2、Transformers、Tesseract 等主流工具，帮助用户快速构建文档提取工作流。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux, macOS 或 Windows (推荐 Linux)\n*   **Python 版本**: >= 3.10\n*   **PyTorch 版本**: >= 2.6\n*   **硬件建议**: 如果需要进行模型微调（Fine-tuning），强烈建议使用 GPU；仅进行推理任务可使用 CPU。\n\n> **注意**：PyTorch 需要单独安装，未包含在默认依赖中。\n\n## 安装步骤\n\n推荐使用虚拟环境（如 `venv` 或 `conda`）进行隔离安装。\n\n### 1. 基础安装（推荐新手）\n\n如果您只需要使用默认配置解析文档，运行以下命令即可。此配置足以运行官方入门教程。\n\n```bash\nuv pip install timm\nuv pip install transformers\nuv pip install python-doctr\nuv pip install deepdoctection\n```\n\n*(注：如果您使用国内网络环境较慢，可添加 `-i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple` 参数加速下载)*\n\n### 2. 
完整安装（高级用户）\n\n如果您需要实验不同的配置或使用对象检测功能，需先单独安装 **Detectron2**，然后安装完整版的 deepdoctection。\n\n**第一步：安装 Detectron2**\n由于 Detectron2 未通过 PyPI 分发，需从源码安装：\n\n```bash\nuv pip install --no-build-isolation detectron2@git+https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdetectron2.git\n```\n\n**第二步：安装 deepdoctection 全量依赖**\n\n```bash\nuv pip install deepdoctection[full]\n```\n\n### 3. 使用 Conda\u002FMamba 安装\n\n项目提供了 `environment.yml` 文件，可通过以下命令一键创建环境：\n\n```bash\n# 使用 conda\nconda env create -f environment.yml\nconda activate deepdoctection\n\n# 或使用 mamba (速度更快)\nmamba env create -f environment.yml\nmamba activate deepdoctection\n```\n\n## 基本使用\n\n以下示例展示了如何使用内置的分析器将 PDF 文档分解为布局结构，并可视化结果。\n\n```python\nimport deepdoctection as dd\nfrom IPython.core.display import HTML\nfrom matplotlib import pyplot as plt\n\n# 实例化内置分析器（类似 Hugging Face Space 演示中的配置）\nanalyzer = dd.get_dd_analyzer()\n\n# 设置管道并分析文档 (请替换为您的实际 PDF 路径)\ndf = analyzer.analyze(path=\"\u002Fpath\u002Fto\u002Fyour\u002Fdoc.pdf\")\n\n# 触发初始化\ndf.reset_state()\n\n# 获取文档迭代器并读取第一页\ndoc = iter(df)\npage = next(doc)\n\n# 可视化页面布局\nimage = page.viz(show_figures=True, show_residual_layouts=True)\nplt.figure(figsize=(25, 17))\nplt.axis('off')\nplt.imshow(image)\nplt.show()\n\n# 提取表格为 HTML 格式\nHTML(page.tables[0].html)\n\n# 提取纯文本内容\nprint(page.text)\n```\n\n运行上述代码后，您将看到带有布局标注的文档图像，并可分别获取结构化的表格数据和提取后的文本内容。","某金融合规团队每天需处理上千份扫描版贷款合同，从中提取借款人信息、金额及条款细节以录入风控系统。\n\n### 没有 deepdoctection 时\n- **流程割裂严重**：开发人员需分别调用 Tesseract 做 OCR、OpenCV 做图像矫正、再写规则解析坐标，代码耦合度高且难以维护。\n- **复杂版面识别率低**：面对包含多栏排版、嵌套表格的合同，传统方法无法区分“标题”与“正文”，导致关键数据错位或丢失。\n- **模型迭代困难**：若想提升特定字段的识别精度，缺乏统一的微调框架，重新训练和部署 LayoutLM 等模型需要大量重复造轮子的工作。\n- **人工复核成本高**：由于自动化结果不可靠，团队被迫保留大量人力进行二次校对，严重拖慢放款审批速度。\n\n### 使用 deepdoctection 后\n- **一键构建流水线**：通过几行代码即可串联起图像去斜、OCR 识别、版面分析及实体分类，自动协调 Detectron2 与 Transformers 完成端到端处理。\n- **精准理解文档结构**：利用预训练的 LayoutLM 和 LiLT 模型，能准确识别合同中的表格边界及段落逻辑，即使在手写签名或印章遮挡下也能定位关键字段。\n- **灵活微调与扩展**：内置对 Hugging Face 模型的支持，团队可基于历史标注数据快速微调专用模型，并无缝切换至推理环节以适应新合同模板。\n- 
**可视化调试高效**：直接生成带有布局框的可视化图像，开发人员能直观看到识别残差，迅速定位问题并优化策略，大幅减少人工复核比例。\n\ndeepdoctection 将原本繁琐破碎的文档处理工程转化为标准化的智能流水线，让非结构化文档数据的提取变得像处理数据库记录一样高效可靠。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepdoctection_deepdoctection_9f45b87b.png","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fdeepdoctection_8c6d0279.png","Document AI - Wrapping and using the best Open Source tools",null,"https:\u002F\u002Fgithub.com\u002Fdeepdoctection",[77,81,85,89],{"name":78,"color":79,"percentage":80},"Python","#3572A5",99.6,{"name":82,"color":83,"percentage":84},"Makefile","#427819",0.2,{"name":86,"color":87,"percentage":88},"Dockerfile","#384d54",0.1,{"name":90,"color":91,"percentage":88},"XSLT","#EB8CEB",3159,189,"2026-04-12T05:22:57","Apache-2.0",4,"未说明","微调模型时推荐需要 GPU（具体型号、显存大小及 CUDA 版本未在文中明确说明）",{"notes":100,"python":101,"dependencies":102},"Detectron2 未通过 PyPI 分发，需单独安装（推荐使用提供的 git 源）。建议使用虚拟环境（venv、Conda 或 Mamba）进行安装。提供预构建的 Docker 镜像。核心功能仅支持 PyTorch。",">=3.10",[103,104,105,106,107,108,109,110],"torch>=2.6","timm","transformers","python-doctr","detectron2","pdfplumber","jdeskew","tesseract-ocr",[15,35,14],[113,114,115,116,117,118,119,120,121,122,123,124,125,126,127],"document-parser","document-image-analysis","table-recognition","ocr","document-ai","document-understanding","python","document-layout-analysis","table-detection","pytorch","tensorflow","publaynet","pubtabnet","layoutlm","nlp","2026-03-27T02:49:30.150509","2026-04-13T04:04:31.471410",[131,136,141,146,151,156],{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},31202,"如何在管道中添加自定义 YOLO 模型（如 YOLOv10）作为图像布局服务？","您需要编写一个新的包装器（wrapper）。参考 `d2detect.py` 和 `pdftext.py` 的结构创建新文件（例如 `yolodetector.py`）。主要步骤包括：\n1. 定义一个函数将 YOLO 的推理结果转换为 `DetectionResult` 列表，提取边界框坐标、类别 ID、类别名称和置信度。\n2. 定义预测函数，接收图像数组和模型，调用 YOLO 模型并返回检测结果。\n3. 
创建一个继承自 `ObjectDetector` 的类（如 `YoloDetector`），在初始化中加载模型，并实现预测方法。\n完成后，将该检测器实例传递给 `ImageLayoutService` 即可集成到管道中。","https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fissues\u002F371",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},31203,"如何创建自定义数据集以进行微调？","您需要在 `datasets\u002Finstances` 目录下创建一个自定义文件（如 `customdataset.py`）。具体步骤如下：\n1. 定义一个类继承自 `dd.DatasetBase`。\n2. 实现 `_info` 类方法返回 `dd.DatasetInfo`。\n3. 实现 `_categories` 方法返回 `dd.DatasetCategories`。\n4. 实现 `_builder` 方法返回一个自定义的 DataFlowBuilder。\n5. 您需要手动编写 `CustomDataFlowBuilder` 类，使其继承自 `DataFlowBaseBuilder`，用于处理数据加载和注解文件解析。","https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fissues\u002F40",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},31204,"如果想使用非 LayoutParser 的模型（例如 DiT）作为布局检测器，应该替换代码中的哪个部分？","默认不支持开箱即用。您需要编写一个类似于 `HFDetrDerivedDetector` 的自定义检测器类（例如 `DiTDetector`）。\n在代码中，不要使用 `dd.D2FrcnnDetector`，而是实例化您新编写的检测器类。示例逻辑如下：\n```python\n# 获取权重和配置路径\npath_weights = dd.ModelCatalog.get_full_path_weights(\"local_model.pth\")\npath_config = dd.ModelCatalog.get_full_path_configs(\"local_model.pth\")\ncategories = dd.ModelCatalog.get_profile(\"local_model.pth\").categories\n\n# 替换为自定义检测器，而不是 D2FrcnnDetector\ndit_detector = DiTDetector(path_config, path_weights, categories) \nimage_layout = dd.ImageLayoutService(dit_detector)\npipe = dd.DoctectionPipe([image_layout])\n```\n相关教程可参考官方文档中关于运行第三方库预训练模型的章节。","https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fissues\u002F152",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},31205,"运行管道时输入路径（path）报错，支持哪些文件格式？","`path` 参数必须指向以下两种内容之一：\n1. 一个 PDF 文档文件。\n2. 
一个包含图像文件（如 png, jpg, tif 等）的**文件夹**。\n如果您直接传入单个图片文件的路径（例如 `.png`），可能会报错。请确保传入的是文件夹路径，或者如果是 PDF 则传入文件路径。","https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fissues\u002F60",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},31206,"DocTr OCR 系统中的文本检测和识别模型必须一起使用吗？","不一定。虽然 DocTr OCR 系统通常采用两步法（先检测后识别），但这两个模型可以独立初始化和运行。\n您可以只导入并使用其中一种模型。例如，如果只需要文本检测，可以仅使用 `doctr\u002Fmodels\u002Fdetection_predictor` 加载检测模型（如 DBNet），而无需加载识别模型（如 CRNN）。只需导入所需的模型类并实例化即可单独运行。","https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fissues\u002F368",{"id":157,"question_zh":158,"answer_zh":159,"source_url":145},31207,"在哪里可以找到运行第三方库预训练模型的教程或笔记本？","相关的 Notebook 和文档位置已更新。您可以访问以下链接获取最新教程：\n1. GitHub Notebook: https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fnotebooks\u002Fblob\u002Fmain\u002FRunning_pre_trained_models_from_third_party_libraries.ipynb\n2. 官方文档教程: https:\u002F\u002Fdeepdoctection.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Frunning_pre_trained_models_from_third_party_libraries_notebook\u002F",[161,166,171,176,181,186,191,196,201,206,211,216,221,226,231,236,241,246,251,256],{"id":162,"version":163,"summary_zh":164,"released_at":165},230873,"v.1.2.8","## 变更内容\n* 版本 1.2.8，由 @JaMe76 在 https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F454 中提交\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.2.7...v.1.2.8","2026-04-09T07:26:07",{"id":167,"version":168,"summary_zh":169,"released_at":170},230874,"v.1.2.7","## 变更内容\n* 添加 Extras 类并更新图像序列化 [dd_core][force ci]，由 @JaMe76 在 https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F453 中完成\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.2.6...v.1.2.7","2026-03-31T09:20:21",{"id":172,"version":173,"summary_zh":174,"released_at":175},230875,"v.1.2.6","## 变更内容\n* 版本 1.2.6，由 @JaMe76 在 
https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F452 中提交\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.2.5...v.1.2.6","2026-03-30T20:46:38",{"id":177,"version":178,"summary_zh":179,"released_at":180},230876,"v.1.2.5","## 变更内容\n* 版本 1.2.5，由 @JaMe76 在 https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F451 中提交\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.2.4...v.1.2.5","2026-03-20T10:22:07",{"id":182,"version":183,"summary_zh":184,"released_at":185},230877,"v.1.2.4","## 变更内容\n* 版本 1.2.4，由 @JaMe76 在 https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F450 中提交\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.2.3...v.1.2.4","2026-03-14T10:12:24",{"id":187,"version":188,"summary_zh":189,"released_at":190},230878,"v.1.2.3","## 变更内容\n* 版本 1.2.3，由 @JaMe76 在 https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F449 中提交\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.2.2...v.1.2.3","2026-03-12T09:52:40",{"id":192,"version":193,"summary_zh":194,"released_at":195},230879,"v.1.2.2","## 变更内容\n* 版本 1.2.2，由 @JaMe76 在 https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F448 中提交\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.2.1...v.1.2.2","2026-03-08T18:33:58",{"id":197,"version":198,"summary_zh":199,"released_at":200},230880,"v.1.2.1","## 变更内容\n* 版本 1.2.1，由 @JaMe76 在 https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F447 
中提交\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.2.0...v.1.2.1","2026-03-04T16:05:11",{"id":202,"version":203,"summary_zh":204,"released_at":205},230881,"v.1.2.0","## 变更内容\n* 版本 1.2.0，由 @JaMe76 在 https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F446 中提交\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.1.0...v.1.2.0","2026-03-02T10:30:05",{"id":207,"version":208,"summary_zh":209,"released_at":210},230882,"v.1.1.0","## 变更内容\n* 版本_1.1.0，由 @JaMe76 在 https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F445 中提交\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.0.7...v.1.1.0","2026-02-17T06:24:03",{"id":212,"version":213,"summary_zh":214,"released_at":215},230883,"v.1.0.7","## What's Changed\r\n* version_1.0.7 by @JaMe76 in https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F444\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.0.6...v.1.0.7","2026-02-13T19:44:07",{"id":217,"version":218,"summary_zh":219,"released_at":220},230884,"v.1.0.6","## What's Changed\r\n* Version 1.0.6 by @JaMe76 in https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F443\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.0.5...v.1.0.6","2026-02-01T11:38:15",{"id":222,"version":223,"summary_zh":224,"released_at":225},230885,"v.1.0.5","## What's Changed\r\n* Version 1.0.5 by @JaMe76 in https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F441\r\n\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.0.4...v.1.0.5","2026-01-14T12:56:54",{"id":227,"version":228,"summary_zh":229,"released_at":230},230886,"v.1.0.4","This release fixes two issues:\r\n\r\n- When serializing `ContainerAnnotation`s we now get the `value` attribute exported.\r\n- Detectron2 inference on `mps` takes forever (due to https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\u002Fissues\u002F4888). If `mps` is detected we switch to `cpu` for `D2FrcnnDetector`.\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.0.3...v.1.0.4","2026-01-08T08:48:56",{"id":232,"version":233,"summary_zh":234,"released_at":235},230887,"v.1.0.3","## What's Changed\r\n* Version 1.0.3 by @JaMe76 in https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F440\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.0.2...v.1.0.3","2026-01-07T10:06:47",{"id":237,"version":238,"summary_zh":239,"released_at":240},230888,"v.1.0.2","This release contains a small hotfix:\r\n\r\nWhen serializing `Image` and `Annotation` instances, some attributes did not export properly:\r\n- `image_id`\r\n- `annotation_id`","2026-01-06T15:49:58",{"id":242,"version":243,"summary_zh":244,"released_at":245},230889,"v.1.0.1","## What's Changed\r\n* Merge version `1.0.1` by @JaMe76 in https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F436\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.1.0.0...v.1.0.1","2026-01-04T10:33:17",{"id":247,"version":248,"summary_zh":249,"released_at":250},230890,"v.1.0.0","This release ships `version 1.0`. The scope is substantial and introduces breaking changes, meaning this release is not compatible with earlier versions.\r\n\r\n1. 
The core datapoint module (including `Annotation`, `CategoryAnnotation`, `ImageAnnotation`, `Image`, and `Page`) is now built on Pydantic’s `BaseModel` instead of `dataclass`. This comes with significantly more comprehensive type validation that was previously handled in an ad-hoc manner.\r\n\r\n2. The interface object `Page` and the annotation classes based on `ImageAnnotationBaseView` are no longer subclasses of `Image` and `Annotation`, respectively, but independent objects. The original Annotation-based source object is now stored as an internal attribute of `ImageAnnotationBaseView`; the same applies to `Image` and `Page`.\r\n\r\n3. Package configuration has been unified using the pydantic-settings library. The central `EnvSettings` object is responsible for reading `.env` files, creating environment variables, and copying configuration files (`profiles.json`, `dd_one_config.yaml`, etc.).\r\n\r\n4. The BERT-like model wrappers have been refactored so that many more models from the Transformers library can be used via configuration. This relies on `AutoConfig`, `AutoModel`, and `AutoTokenizer`. By leveraging tokenizer config files, maintaining a separate mapping of model → tokenizer will no longer be necessary.\r\n\r\n5. The analyzer pipeline has been extended with additional components for token classification and sequence classification. This enables using fine-tuned models from the Hugging Face Hub via configuration within a pipeline, without having to implement a custom pipeline.\r\n\r\n6. Models, wrappers, etc. based on TensorFlow\u002FTensorpack have been removed. The package is now PyTorch-only.\r\n\r\n7. FastText-based language detection has been removed.\r\n\r\n8. The points above describe changes to the library objects themselves. The remaining changes affect the overall organization of `deepdoctection`.\r\n\r\n9. 
`deepdoctection` is no longer monolithic and is split into separate packages: `dd_core`, `dd_datasets`, and `deepdoctection`.\r\n\r\n- `dd_core` contains the modules `utils`, `datapoint`, `mapper`, and `dataflow` and can be installed either as a minimal package or with the extra dependency set `full`. The motivation is to provide the core data model as a small standalone package when inference happens elsewhere and only `JSON` results need to be parsed. The `full` extras additionally include tools to convert PDF documents into image pixel formats.\r\n\r\n- `dd_datasets` depends on `dd_core` and contains the datasets module.\r\n\r\n- `deepdoctection` contains the remaining modules: `extern`, `configs`, `analyzer`, `pipe`, `eval`, and `train`. In addition to third-party libraries, it depends on `dd_core` for the reduced scope. With this combination, inference-mode pipelines can be run locally, which is sufficient for most use cases. With the `full` extras, `dd_datasets` and the training scripts can be used as well.\r\n\r\n10. The test suite has been extensively reworked:\r\n\r\n- Added ~300 unit tests for the refactored datapoint module\r\n- Reduced fixture dependencies\r\n- Introduced slow test cases\r\n- tox is used for testing, formatting, linting, and type checking; it can be run directly or via the Makefile.\r\n\r\n11. 
The documentation has been updated extensively to reflect the new structure and behavior.","2025-12-29T22:23:49",{"id":252,"version":253,"summary_zh":254,"released_at":255},230891,"v.1.0.0a","## What's Changed\r\n* version 1.0 by @JaMe76 in https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F434\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.0.46.2...v.1.0.0a","2025-12-29T19:47:15",{"id":257,"version":258,"summary_zh":259,"released_at":260},230892,"v.0.46.2","## What's Changed\r\n* version_0.46.2 by @JaMe76 in https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fpull\u002F430\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fdeepdoctection\u002Fdeepdoctection\u002Fcompare\u002Fv.0.46.1...v.0.46.2","2025-10-27T17:31:41"]