[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-mindee--doctr":3,"tool-mindee--doctr":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",145895,2,"2026-04-08T11:32:59",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108111,"2026-04-08T11:23:26",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 
AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":32,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":110,"github_topics":111,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":121,"updated_at":122,"faqs":123,"releases":154},5562,"mindee\u002Fdoctr","doctr","docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.","doctr 是一个基于深度学习构建的开源光学字符识别（OCR）库，旨在让文档文字提取变得简单高效且人人可用。它主要解决了从各类文档（如 PDF、图片或网页）中精准定位并识别文字内容的难题，尤其擅长处理复杂的版面布局。\n\n无论是需要集成 OCR 功能的开发者，还是从事文档分析的研究人员，doctr 都能提供极大的便利。其核心亮点在于采用“两阶段”端到端处理流程：先通过文本检测模型定位单词位置，再利用文本识别模型逐字还原内容。这种架构不仅支持用户灵活组合不同的检测与识别模型（如 DBNet 与 CRNN），还内置了对旋转文档和多方向文本框的智能处理能力，显著提升了在复杂场景下的识别准确率。\n\ndoctr 基于 PyTorch 开发，提供了简洁的 Python API，支持一键加载预训练模型，只需几行代码即可实现从文件读取到结果输出的全流程。此外，它还兼容 Docker 部署，并提供 Hugging Face 空间演示和 Colab 教程，方便用户快速上手验证效果。如果你正在寻找一个高性能、易扩展且社区活跃的 OCR 解决方案，doctr 值得尝试。","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmindee_doctr_readme_98976510ebb4.gif\" width=\"40%\">\n\u003C\u002Fp>\n\n[![Slack Icon](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSlack-Community-4A154B?style=flat-square&logo=slack&logoColor=white)](https:\u002F\u002Fslack.mindee.com) [![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202.0-blue.svg)](LICENSE) ![Build Status](https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fworkflows\u002Fbuilds\u002Fbadge.svg) [![Docker Images](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocker-4287f5?style=flat&logo=docker&logoColor=white)](https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpkgs\u002Fcontainer\u002Fdoctr) [![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fmindee\u002Fdoctr\u002Fbranch\u002Fmain\u002Fgraph\u002Fbadge.svg?token=577MO567NM)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fmindee\u002Fdoctr) [![CodeFactor](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmindee_doctr_readme_9ee0cb95ac54.png)](https:\u002F\u002Fwww.codefactor.io\u002Frepository\u002Fgithub\u002Fmindee\u002Fdoctr) [![Codacy 
Badge](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmindee_doctr_readme_549cb1d41474.png)](https:\u002F\u002Fapp.codacy.com\u002Fgh\u002Fmindee\u002Fdoctr?utm_source=github.com&utm_medium=referral&utm_content=mindee\u002Fdoctr&utm_campaign=Badge_Grade) [![Doc Status](https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fworkflows\u002Fdoc-status\u002Fbadge.svg)](https:\u002F\u002Fmindee.github.io\u002Fdoctr) [![Pypi](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpypi-v1.0.1-blue.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fpython-doctr\u002F) [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fmindee\u002Fdoctr) [![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fmindee\u002Fnotebooks\u002Fblob\u002Fmain\u002Fdoctr\u002Fquicktour.ipynb) [![Gurubase](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGurubase-Ask%20docTR%20Guru-006BFF)](https:\u002F\u002Fgurubase.io\u002Fg\u002Fdoctr)\n\n\n**Optical Character Recognition made seamless & accessible to anyone, powered by PyTorch**\n\nWhat you can expect from this repository:\n\n- efficient ways to parse textual information (localize and identify each word) from your documents\n- guidance on how to integrate this in your current architecture\n\n![OCR_example](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmindee_doctr_readme_a3f053914bad.png)\n\n## Quick Tour\n\n### Getting your pretrained model\n\nEnd-to-End OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identify all characters in the word).\nAs such, you can select the architecture used for [text detection](https:\u002F\u002Fmindee.github.io\u002Fdoctr\u002Flatest\u002Fmodules\u002Fmodels.html#doctr-models-detection), and the one for [text recognition](https:\u002F\u002Fmindee.github.io\u002Fdoctr\u002Flatest\u002F\u002Fmodules\u002Fmodels.html#doctr-models-recognition) from the list of available implementations.\n\n```python\nfrom doctr.models import ocr_predictor\n\nmodel = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)\n```\n\n### Reading files\n\nDocuments can be interpreted from PDF or images:\n\n```python\nfrom doctr.io import DocumentFile\n# PDF\npdf_doc = DocumentFile.from_pdf(\"path\u002Fto\u002Fyour\u002Fdoc.pdf\")\n# Image\nsingle_img_doc = DocumentFile.from_images(\"path\u002Fto\u002Fyour\u002Fimg.jpg\")\n# Webpage (requires `weasyprint` to be installed)\nwebpage_doc = DocumentFile.from_url(\"https:\u002F\u002Fwww.yoursite.com\")\n# Multiple page images\nmulti_img_doc = DocumentFile.from_images([\"path\u002Fto\u002Fpage1.jpg\", \"path\u002Fto\u002Fpage2.jpg\"])\n```\n\n### Putting it together\n\nLet's use the default pretrained model for an example:\n\n```python\nfrom doctr.io import DocumentFile\nfrom doctr.models import ocr_predictor\n\nmodel = ocr_predictor(pretrained=True)\n# PDF\ndoc = DocumentFile.from_pdf(\"path\u002Fto\u002Fyour\u002Fdoc.pdf\")\n# Analyze\nresult = model(doc)\n```\n\n### Dealing with rotated documents\n\nShould you use docTR on documents that include rotated pages, or pages with multiple box orientations,\nyou have multiple options to handle it:\n\n- If you only use straight document pages with straight words (horizontal, same reading direction),\nconsider passing `assume_straight_pages=True` to the ocr_predictor. 
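To sanity-check the pipeline end to end, here is a small sketch that swaps in the alternative `fast_base` and `parseq` architectures mentioned in the release notes further down, then flattens and walks the result. `render()`, `value`, and `geometry` follow the document model described below; treat the exact attribute names and architecture availability as version-dependent:

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Alternative pairing: FAST detector with the PARSeq recognizer
# (architecture strings per the release notes; availability varies by version)
model = ocr_predictor(det_arch='fast_base', reco_arch='parseq', pretrained=True)
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
result = model(doc)

# Flatten the whole document to plain text
print(result.render())

# Or walk the nested structure word by word
for page in result.pages:
    for block in page.blocks:
        for line in block.lines:
            for word in line.words:
                print(word.value, word.geometry)  # text plus relative box coordinates
```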
### Dealing with rotated documents

Should you use docTR on documents that include rotated pages, or pages with multiple box orientations, you have multiple options to handle it:

- If you only use straight document pages with straight words (horizontal, same reading direction), consider passing `assume_straight_pages=True` to the `ocr_predictor`. It will directly fit straight boxes on your page and return straight boxes, which makes it the fastest option.

- If you want the predictor to output straight boxes (no matter the orientation of your pages, the final localizations will be converted to straight boxes), pass `export_as_straight_boxes=True` to the predictor. Otherwise, if `assume_straight_pages=False`, it will return rotated bounding boxes (potentially with an angle of 0°).

If both options are set to `False`, the predictor will always fit and return rotated boxes.

To interpret your model's predictions, you can visualize them interactively as follows:

```python
# Display the result (requires matplotlib & mplcursors to be installed)
result.show()
```

![Visualization sample](https://oss.gittoolsai.com/images/mindee_doctr_readme_accc32d17532.gif)

Or even rebuild the original document from its predictions:

```python
import matplotlib.pyplot as plt

synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()
```

![Synthesis sample](https://oss.gittoolsai.com/images/mindee_doctr_readme_a85e4336ee64.png)

The `ocr_predictor` returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`).
To get a better understanding of our document model, check our [documentation](https://mindee.github.io/doctr/modules/io.html#document-structure).

You can also export them as a nested dict, more appropriate for JSON format:

```python
json_output = result.export()
```

### Use the KIE predictor

The KIE predictor is a more flexible predictor than OCR, as your detection model can detect multiple classes in a document. For example, you can have a detection model that detects just dates and addresses in a document.

The KIE predictor makes it possible to use a detector with multiple classes together with a recognition model, with the whole pipeline already set up for you.

```python
from doctr.io import DocumentFile
from doctr.models import kie_predictor

# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)

predictions = result.pages[0].predictions
for class_name in predictions.keys():
    list_predictions = predictions[class_name]
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")
```

The KIE predictor results per page are in a dictionary format, with each key representing a class name and its value being the predictions for that class.

### If you are looking for support from the Mindee team

[![Bad OCR test detection image asking the developer if they need help](https://oss.gittoolsai.com/images/mindee_doctr_readme_e6d321ea2778.png)](https://mindee.com/product/doctr)

## Installation

### Prerequisites

Python 3.10 (or higher) and [pip](https://pip.pypa.io/en/stable/) are required to install docTR.

### Latest release

You can then install the latest release of the package from [pypi](https://pypi.org/project/python-doctr/) as follows:

```shell
pip install python-doctr
```
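A quick post-install check, as a sketch; it assumes only that the package exposes a version string:

```python
import doctr

# Confirm the package imports and report the installed release
print(doctr.__version__)
```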
We try to keep extra dependencies to a minimum. You can install specific builds as follows:

```shell
# standard build
pip install python-doctr
# optional dependencies for visualization, html, and contrib modules can be installed as follows:
pip install "python-doctr[viz,html,contrib]"
```

### Developer mode

Alternatively, you can install it from source, which will require you to install [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).
First clone the project repository:

```shell
git clone https://github.com/mindee/doctr.git
pip install -e doctr/.
```

Again, if you prefer to avoid the risk of missing dependencies, you can install the build with the optional extras:

```shell
pip install -e "doctr/.[viz,html,contrib]"
```

## Models architectures

Credit where it's due: this repository implements, among others, architectures from published research papers.

### Text Detection

- DBNet: [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- LinkNet: [LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation](https://arxiv.org/pdf/1707.03718.pdf)
- FAST: [FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation](https://arxiv.org/pdf/2111.02394.pdf)

### Text Recognition

- CRNN: [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/pdf/1507.05717.pdf)
- SAR: [Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/pdf/1811.00751.pdf)
- MASTER: [MASTER: Multi-Aspect Non-local Network for Scene Text Recognition](https://arxiv.org/pdf/1910.02562.pdf)
- ViTSTR: [Vision Transformer for Fast and Efficient Scene Text Recognition](https://arxiv.org/pdf/2105.08582.pdf)
- PARSeq: [Scene Text Recognition with Permuted Autoregressive Sequence Models](https://arxiv.org/pdf/2207.06966)
- VIPTR: [A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition](https://arxiv.org/abs/2401.10110)

## More goodies

### Documentation

The full package documentation is available [here](https://mindee.github.io/doctr/) for detailed specifications.

### Demo app

A minimal demo app is provided for you to play with our end-to-end OCR models!

![Demo app](https://oss.gittoolsai.com/images/mindee_doctr_readme_156e17b9542a.png)

#### Live demo

Courtesy of :hugs: [Hugging Face](https://huggingface.co/) :hugs:, docTR now has a fully deployed version available on [Spaces](https://huggingface.co/spaces)!
Check it out [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/mindee/doctr)

#### Running it locally

If you prefer to use it locally, there is an extra dependency ([Streamlit](https://streamlit.io/)) that is required:

```shell
pip install -r demo/pt-requirements.txt
```

Then run your app in your default browser with:

```shell
streamlit run demo/app.py
```
### Docker container

We offer Docker container support for easy testing and deployment. [Here are the available docker tags.](https://github.com/mindee/doctr/pkgs/container/doctr)

#### Using GPU with docTR Docker Images

The docTR Docker images are GPU-ready and based on CUDA `12.2`. Make sure your host supports **at least CUDA `12.2`**, otherwise Torch won't be able to initialize the GPU.
Please also ensure that Docker is configured to use your GPU.

To verify and configure GPU support for Docker, please follow the instructions provided in the [NVIDIA Container Toolkit Installation Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).

Once Docker is configured to use GPUs, you can run docTR Docker containers with GPU support:

```shell
docker run -it --gpus all ghcr.io/mindee/doctr:torch-py3.9.18-2024-10 bash
```

#### Available Tags

The Docker images for docTR follow a specific tag nomenclature: `<deps>-py<python_version>-<doctr_version|YYYY-MM>`. Here's a breakdown of the tag structure:

- `<deps>`: `torch`, `torch-viz-html-contrib`.
- `<python_version>`: `3.9.18`, `3.10.13` or `3.11.8`.
- `<doctr_version>`: a tag >= `v0.11.0`
- `<YYYY-MM>`: e.g. `2024-10`

Here are examples of different image tags:

| Tag                                       | Description                                                                              |
|-------------------------------------------|------------------------------------------------------------------------------------------|
| `torch-viz-html-contrib-py3.11.8-2024-10` | PyTorch build with extra dependencies, Python `3.11.8`, from the latest commit on `main` in `2024-10`. |
| `torch-py3.11.8-2024-10`                  | PyTorch build, Python `3.11.8`, from the latest commit on `main` in `2024-10`.             |

#### Building Docker Images Locally

You can also build docTR Docker images locally on your computer.

```shell
docker build -t doctr .
```

You can specify custom Python versions and docTR versions using build arguments. For example, to build a docTR image with PyTorch, Python version `3.9.10`, and docTR version `v0.7.0`, run the following command:

```shell
docker build -t doctr --build-arg FRAMEWORK=torch --build-arg PYTHON_VERSION=3.9.10 --build-arg DOCTR_VERSION=v0.7.0 .
```

### Example script

An example script is provided for a simple document analysis of a PDF or image file:

```shell
python scripts/analyze.py path/to/your/doc.pdf
```

All script arguments can be checked using `python scripts/analyze.py --help`

### Minimal API integration

Looking to integrate docTR into your API? Here is a template to get you started with a fully working API using the wonderful [FastAPI](https://github.com/tiangolo/fastapi) framework.

#### Deploy your API locally

Specific dependencies are required to run the API template, which you can install as follows:

```shell
cd api/
pip install poetry
make lock
pip install -r requirements.txt
```

You can now run your API locally:

```shell
uvicorn --reload --workers 1 --host 0.0.0.0 --port=8002 --app-dir api/ app.main:app
```

Alternatively, you can run the same server in a Docker container if you prefer:

```shell
PORT=8002 docker-compose up -d --build
```

#### What you have deployed

Your API should now be running locally on port 8002. Access your automatically-built documentation at [http://localhost:8002/redoc](http://localhost:8002/redoc) and enjoy your four functional routes ("/detection", "/recognition", "/ocr", "/kie"). Here is an example with Python to send a request to the OCR route:

```python
import requests

params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"}

with open('/path/to/your/doc.jpg', 'rb') as f:
    files = [  # application/pdf, image/jpeg, image/png supported
        ("files", ("doc.jpg", f.read(), "image/jpeg")),
    ]
print(requests.post("http://localhost:8002/ocr", params=params, files=files).json())
```

### Example notebooks

Looking for more illustrations of docTR features? You might want to check the [Jupyter notebooks](https://github.com/mindee/doctr/tree/main/notebooks) designed to give you a broader overview.

## Supported By

This project is supported by [t2k GmbH](https://www.text2knowledge.de/de).

<p align="center">
  <img src="https://oss.gittoolsai.com/images/mindee_doctr_readme_e551dc4c5be1.png" width="40%">
</p>

## Citation

If you wish to cite this project, feel free to use this [BibTeX](http://www.bibtex.org/) reference:

```bibtex
@misc{doctr2021,
    title={docTR: Document Text Recognition},
    author={Mindee},
    year={2021},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/mindee/doctr}}
}
```

## Contributing

If you scrolled down to this section, you most likely appreciate open source. Do you feel like extending the range of our supported characters? Or perhaps submitting a paper implementation? Or contributing in any other way?

You're in luck: we compiled a short guide (cf. [`CONTRIBUTING`](https://mindee.github.io/doctr/contributing/contributing.html)) for you to easily do so!
## License

Distributed under the Apache 2.0 License. See [`LICENSE`](https://github.com/mindee/doctr?tab=Apache-2.0-1-ov-file#readme) for more information.
## Quickstart guide

**doctr** is an open-source, PyTorch-based optical character recognition (OCR) tool that makes extracting text information from documents (localization and recognition) simple and efficient. It uses a two-stage approach: text detection (locating words) followed by text recognition (decoding characters).

### Environment

Before starting, make sure your development environment meets the following requirements:

- **Operating system**: Linux, macOS, or Windows
- **Python**: 3.10 or higher
- **Package manager**: pip
- **Optional dependencies**:
  - `matplotlib` and `mplcursors` for visualizing results
  - `weasyprint` for parsing web pages
  - `streamlit` for the local demo app

> **Tip**: users in mainland China may want to use a domestic mirror, such as Alibaba Cloud or the Tsinghua University mirror, to speed up installation.

### Installation

#### Option 1: from PyPI (recommended)

Install the latest stable release:

```shell
pip install python-doctr -i https://pypi.tuna.tsinghua.edu.cn/simple
```

For the optional visualization, HTML export, and contrib modules, install the full set of extras:

```shell
pip install "python-doctr[viz,html,contrib]" -i https://pypi.tuna.tsinghua.edu.cn/simple
```

#### Option 2: from source (developer mode)

If you want the latest development version or plan to contribute:

```shell
git clone https://github.com/mindee/doctr.git
cd doctr
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple
```
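Before moving on, a small environment probe may save time; this sketch only checks for the optional packages listed above (package names as published on PyPI):

```python
import importlib.util
import sys

# Probe the optional dependencies named in the environment section
for pkg in ("matplotlib", "mplcursors", "weasyprint", "streamlit"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'installed' if found else 'missing (optional)'}")

print(sys.version)  # should report Python 3.10+
```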
### Basic usage

The simplest end-to-end OCR workflow covers model loading, document reading, and result analysis.

#### 1. Quick start example

This snippet loads a pretrained model, reads a PDF, and prints the recognition result:

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load a pretrained model (defaults: db_resnet50 for detection, crnn_vgg16_bn for recognition)
model = ocr_predictor(pretrained=True)

# Read a document (PDF, image, or list of images)
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# For an image: doc = DocumentFile.from_images("path/to/your/img.jpg")

# Run OCR
result = model(doc)

# Inspect the result structure (a nested object of Page, Block, Line, Word)
print(result.export())
```

#### 2. Visualizing results

With the visualization dependencies installed, you can display detection boxes and recognized text directly:

```python
# Interactive display
result.show()
```

Or synthesize a reconstructed document image:

```python
import matplotlib.pyplot as plt

synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0])
plt.axis('off')
plt.show()
```

#### 3. Handling rotated documents

If your documents contain rotated pages or non-horizontal text, tune these parameters:

- `assume_straight_pages=True`: assume pages and words are horizontal (fastest).
- `export_as_straight_boxes=True`: force horizontal output boxes regardless of the original angle.

```python
# For documents that may contain rotated content
model = ocr_predictor(pretrained=True, assume_straight_pages=False, export_as_straight_boxes=True)
result = model(doc)
```

#### 4. Key information extraction (KIE)

Beyond general OCR, doctr also supports extracting specific classes (such as dates and addresses):

```python
from doctr.models import kie_predictor

# Load the KIE predictor
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)

doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
result = model(doc)

# Iterate over predictions per class
predictions = result.pages[0].predictions
for class_name, list_predictions in predictions.items():
    for prediction in list_predictions:
        print(f"Class {class_name}: {prediction}")
```

## Use case

A fintech company needs to automatically process thousands of scanned invoices and contracts every day, loading the key fields into a database for compliance audits.

### Without doctr

- Traditional OCR engines perform poorly on tilted or wrinkled documents, so key fields (amounts, dates) come out garbled and require manual re-checking.
- Source files arrive as PDFs, images, and even web-page screenshots, forcing the team to write and maintain tedious per-format preprocessing scripts.
- Open-source alternatives often lack end-to-end deep learning models and cannot localize text at word level, falling short of structured-extraction needs.
- Deployment is complex, dependency conflicts are frequent, and there is no automatic correction for rotated documents, badly stretching development timelines.

### With doctr

- Thanks to its two-stage deep learning detection and recognition models, doctr recovers text accurately even from rotated documents or noisy backgrounds, with a clear jump in accuracy.
- The unified `DocumentFile` interface reads PDFs, local image sets, and even URLs directly, eliminating format-conversion boilerplate.
- doctr natively outputs a precise bounding box for every word, making downstream structured extraction (such as table reconstruction) simple and reliable.
- With pretrained models and a concise API, a few lines of code integrate high-accuracy OCR with automatic rotation handling, drastically shortening time to production.

doctr turns complex document text recognition into a few lines of code, letting teams stand up a reliable, automated document-processing pipeline at minimal cost.
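A minimal sketch of the kind of batch pipeline this scenario describes: scan a folder of PDFs and dump every document's word-level structure to JSON. The folder names and output layout are illustrative; the export uses the nested dict returned by `result.export()`:

```python
import json
from pathlib import Path

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)

# Hypothetical input folder; one JSON file per processed PDF
for pdf_path in Path("inbox").glob("*.pdf"):
    doc = DocumentFile.from_pdf(str(pdf_path))
    result = model(doc)
    # export() yields a nested dict (pages -> blocks -> lines -> words with boxes)
    with open(f"{pdf_path.stem}.json", "w", encoding="utf-8") as f:
        json.dump(result.export(), f, ensure_ascii=False)
```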
## Project metadata

- **Owner**: Mindee (`mindee`) · contact@mindee.com · Twitter `mindeeapi` · https://mindee.com · https://github.com/mindee
- **Stars / forks**: 6,002 / 634
- **Last commit**: 2026-04-08
- **License**: Apache-2.0
- **Difficulty score**: 2
- **Languages**: Python 99.7%, Dockerfile 0.2%, Makefile 0.1%
- **OS**: Linux, macOS, Windows
- **GPU**: not required; for Docker GPU acceleration you need an NVIDIA GPU and a host CUDA version of at least 12.2
- **RAM**: not specified
- **Python**: 3.10+
- **Dependencies**: torch; optional: weasyprint (web-page parsing), matplotlib and mplcursors (visualization), streamlit (local demo app)
- **Notes**: built on PyTorch with a two-stage detection-plus-recognition OCR pipeline. A standard install is just `pip install python-doctr`; visualization, HTML handling, and other extras are optional. Official Docker images based on CUDA 12.2 simplify GPU deployment.
- **Tags**: image, dev framework
- **GitHub topics**: ocr, deep-learning, document-recognition, tensorflow2, text-detection-recognition, text-detection, text-recognition, optical-character-recognition, pytorch

## FAQ

**On Windows, I get 'cannot load library pango-1.0' or other missing-library errors. What can I do?**

This is usually caused by incompatible versions of external libraries. Possible fixes:
1. Try uninstalling GraphViz; on Windows it can conflict with the external libraries shipped with GTK+.
2. Make sure your dependencies are up to date (especially weasyprint); newer versions dropped the hard cairo dependency and upgraded pango, with better Windows support.
3. If the problem persists, check that the GTK project and its runtime environment are correctly installed.

Source: https://github.com/mindee/doctr/issues/815

**How do I build a Conda package for doctr, and why is there no official Conda release?**

There is no official Conda package yet, mainly because some dependencies (such as PyMuPDF or specific pypdfium2 builds) are poorly supported or architecture-limited on conda-forge. Recommendations:
1. Install the core dependency from the dedicated channels: `conda install -c bblanchon -c pypdfium2-team pypdfium2`.
2. pypdfium2 does exist on conda-forge, but only for certain architectures (e.g. osx-64) and with inflexible packaging, so the dedicated channels above are preferred.
3. The community is exploring pushing more dependencies to conda-forge, but manually configured channels remain the best practice for now.

Source: https://github.com/mindee/doctr/issues/113

**How do I fix 'UnboundLocalError: local variable l1_loss referenced before assignment' when training on a custom dataset?**

This error is usually related to input image size or training configuration. Check the following:
1. Use a consistent image resolution for training and inference (for example, if you trained on 960x960 images, resize images to the same size at detection time).
2. Make sure the data loader handles differently sized images correctly; unify the resize step during preprocessing if necessary.
3. Confirm the model architecture (e.g. db_resnet50) is compatible with your input size; some architectures have specific input-size requirements.

Source: https://github.com/mindee/doctr/issues/1738

**How do I add special characters such as 'é' to the German vocabulary to support loanwords in training data?**

Although 'é' is not a standard German letter, it appears in some German words of French origin (Coupé, Varieté). If training fails because the character is missing, add it to the German vocabulary configuration by hand: edit the language configuration in the project source, add 'é' to that language's character set, then re-initialize the model or vocabulary object.

Source: https://github.com/mindee/doctr/issues/1141
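A minimal sketch of the kind of tweak that answer describes, assuming the vocabularies are exposed as the `VOCABS` dict in `doctr.datasets` (as in recent versions); apply it before building the model so the recognition head is sized to the extended charset:

```python
from doctr.datasets import VOCABS

# Hypothetical tweak: extend the German charset with 'é' for French loanwords
if "é" not in VOCABS["german"]:
    VOCABS["german"] += "é"
```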
**Does doctr support generating PDF/A files with a searchable text layer, or hOCR output?**

There are open feature requests and partial implementations. Current options:
1. Watch the latest PRs; some submissions have attempted to integrate hOCR output or direct PDF/A generation.
2. Use the text coordinates doctr extracts and compose a PDF with a hidden text layer yourself, using tools such as reportlab or pypdf.
3. Join the Mindee community Slack (#doctr channel) for the latest status and interim workarounds.

Source: https://github.com/mindee/doctr/issues/512

**The images rebuilt with synthesize() have broken or low-quality glyphs. What can I do?**

Poor reconstruction quality (such as broken glyphs) usually comes down to the rendering engine or font mapping. Try the following:
1. Check that a complete font library is installed; missing fonts force a fallback to a default font and can break rendering.
2. Make sure WeasyPrint or the underlying rendering library is up to date; older versions have rendering bugs.
3. For PDF input, adjust the `DocumentFile` loading parameters, or preprocess pages before synthesis (e.g. raise the DPI).

Source: https://github.com/mindee/doctr/issues/1528

## Releases

### v1.0.1 (2026-02-04)

<p align="center">
  <img src="https://user-images.githubusercontent.com/76527547/135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif" width="50%">
</p>

Note: docTR 1.0.1 requires Python >= 3.10

**What's changed**

Bug fixes:
- [misc] maintenance updates by @felixdittrich92 in https://github.com/mindee/doctr/pull/2037
- [bug] fix the RandomCrop bug and update the API conftest ordering by @felixdittrich92 in https://github.com/mindee/doctr/pull/2038

Misc:
- [misc] post-release work for v1.0.0 by @felixdittrich92 in https://github.com/mindee/doctr/pull/1986
- README typo fix by @SteffenSH in https://github.com/mindee/doctr/pull/1994
- update to support huggingface-hub 1.3.5 by @nik875 in https://github.com/mindee/doctr/pull/2036

New contributors:
- @SteffenSH made their first contribution in https://github.com/mindee/doctr/pull/1994
- @AdemBoukhris457 made their first contribution in https://github.com/mindee/doctr/pull/1993
- @nik875 made their first contribution in https://github.com/mindee/doctr/pull/2036

Full changelog: https://github.com/mindee/doctr/compare/v1.0.0...v1.0.1

### v1.0.0 (2025-07-09)

Note: docTR 1.0.0 requires Python >= 3.10

**What's changed**

Breaking changes: **TensorFlow** has been removed and is no longer a supported backend. docTR now uses **PyTorch** as its default and only deep learning backend.

The `torch` and `tf` install options are gone. Installing docTR is now simply:

```bash
pip install python-doctr
```

This installs docTR with PyTorch support by default.

Training script filenames have dropped their backend suffix, for example:

```
recognition/train_pytorch.py → recognition/train.py
```

New features:
- A new crnn_vgg16_bn checkpoint

Changelog:
- [BREAKING] remove the TensorFlow backend by @felixT2K in https://github.com/mindee/doctr/pull/1967
- [bug] fix the viptr ONNX export by @felixT2K in https://github.com/mindee/doctr/pull/1966
- [Fix] correct the condition for image dilation in orientation estimation by @Razlaw in https://github.com/mindee/doctr/pull/1971
- [models] new crnn_vgg16_bn checkpoint by @felixdittrich92 in https://github.com/mindee/doctr/pull/1969
- [CI] add Windows to the build CI jobs by @felixdittrich92 in https://github.com/mindee/doctr/pull/1981
- [misc] post-v0.12.0 release work by @felixT2K in https://github.com/mindee/doctr/pull/1965
- [misc/quality] adjust imports by @felixdittrich92 in https://github.com/mindee/doctr/pull/1984
- [misc] rename the reference scripts and corresponding CI job paths by @felixdittrich92 in https://github.com/mindee/doctr/pull/1985

New contributors:
- @Tanmay20030516 made their first contribution in https://github.com/mindee/doctr/pull/1980

Full changelog: https://github.com/mindee/doctr/compare/v0.12.0...v1.0.0

### v0.12.0 (2025-06-20)

Note: docTR 0.12.0 requires Python >= 3.10
Note: docTR 0.12.0 requires TensorFlow >= 2.15.0 or PyTorch >= 2.0.0

> [!WARNING]
> **TensorFlow backend deprecation notice**
>
> docTR with the TensorFlow backend is deprecated and will be removed in the next major release (v1.0.0).
> We **recommend switching to the PyTorch backend**, which is more actively maintained and supports the latest features and models.
> Alternatively, you can use [OnnxTR](https://github.com/felixdittrich92/OnnxTR), which **requires neither TensorFlow nor PyTorch**.
>
> This decision was made for the following reasons:
>
> - Better focus on improving the core library
> - Freeing resources to develop new features faster
> - More targeted optimizations by building on PyTorch

> [!WARNING]
> **This is the last minor release with TensorFlow backend support**

**What's changed**

New features:
- A new lightweight recognition model, `viptr_tiny`
- A new built-in dataset: COCO-Text V2
- A new interface for loading custom models

```python
# New
model = vitstr_small(pretrained=False, pretrained_backbone=False)
model.from_pretrained("<PATH_TO>")  # local path or URL to a .pt or .h5 file

# Replaces the backend-dependent way
reco_params = torch.load('<path_to_pt>', map_location="cpu")
reco_model.load_state_dict(reco_params)
# or with TensorFlow
reco_model.load_weights(..)
```

Changelog:
- [Feat] simplify and unify model loading via from_pretrained (breaking 🛠) by @felixdittrich92 in https://github.com/mindee/doctr/pull/1915
- [build] add the TensorFlow deprecation warning (breaking 🛠) by @felixdittrich92 in https://github.com/mindee/doctr/pull/1948
- [datasets] COCO-Text V2 integration by @sarjil77 in https://github.com/mindee/doctr/pull/1888
- [references] recognition: allow built-in datasets by @sarjil77 in https://github.com/mindee/doctr/pull/1904
- [Feat] PyTorch: VIP backbones and the VIPTR recognition module by @lkosh in https://github.com/mindee/doctr/pull/1912
- [Fix] remove the duplicated forward call in the recognition training scripts by @felixT2K in https://github.com/mindee/doctr/pull/1862
- [Fix] fix the invalid hOCR format and PDF/A compatibility by @felixdittrich92 in https://github.com/mindee/doctr/pull/1870
- [CI/CD] fix the conda dependency issue by @felixT2K in https://github.com/mindee/doctr/pull/1937
- [Fix] fix the merging of short strings by @Razlaw in https://github.com/mindee/doctr/pull/1947
- [Bug] fix a missing import by @felixT2K in https://github.com/mindee/doctr/pull/1958
- fix: only update min_loss when the validation loss improves, to ensure the best… by @sneakybatman in https://github.com/mindee/doctr/pull/1961

Improvements: …
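A sketch of picking up the two additions above: the new lightweight recognizer as a predictor component, and the unified loader. Using `viptr_tiny` as an `ocr_predictor` architecture string is an assumption extrapolated from the release note, and `<PATH_TO>` is the placeholder from the snippet above:

```python
from doctr.models import ocr_predictor, vitstr_small

# Hypothetical: pair the default detector with the new lightweight recognizer
model = ocr_predictor(reco_arch='viptr_tiny', pretrained=True)

# The unified loading interface introduced in this release
reco = vitstr_small(pretrained=False, pretrained_backbone=False)
reco.from_pretrained("<PATH_TO>")  # local path or URL, per the release note
```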
### v0.11.0 (2025-01-30)

Note: docTR 0.11.0 requires Python >= 3.10
Note: docTR 0.11.0 requires TensorFlow >= 2.15.0 or PyTorch >= 2.0.0

**What's changed**

New features:
- Added `torch.compile` support (PyTorch backend)
- Improved model training logging
- Created a small annotation tool designed for docTR (early stage): [doctr-labeler](https://github.com/text2knowledge/docTR-Labeler)

Compiling your models: compiling a PyTorch model with `torch.compile` optimizes it by converting it to a graph representation and applying backends that can improve performance. This can make inference faster and reduce memory overhead during execution. See the [PyTorch documentation](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) for more.

```python
import torch
from doctr.models import (
    ocr_predictor,
    vitstr_small,
    fast_base,
    mobilenet_v3_small_crop_orientation,
    mobilenet_v3_small_page_orientation,
    crop_orientation_predictor,
    page_orientation_predictor
)

# Compile the models
detection_model = torch.compile(
    fast_base(pretrained=True).eval()
)
recognition_model = torch.compile(
    vitstr_small(pretrained=True).eval()
)
crop_orientation_model = torch.compile(
    mobilenet_v3_small_crop_orientation(pretrained=True).eval()
)
page_orientation_model = torch.compile(
    mobilenet_v3_small_page_orientation(pretrained=True).eval()
)

predictor = ocr_predictor(
    detection_model, recognition_model, assume_straight_pages=False
)
# NOTE: setting the orientation predictors is only required with non-straight pages
# (`assume_straight_pages=False`) and when orientation classification is not disabled
predictor.crop_orientation_predictor = crop_orientation_predictor(crop_orientation_model)
predictor.page_orientation_predictor = page_orientation_predictor(page_orientation_model)

compiled_out = predictor(doc)
```

Changelog:
- [Feat] add torch.compile support by @felixdittrich92 in https://github.com/mindee/doctr/pull/1791
- feat: :sparkles: tqdm Slack integration by @odulcy-mindee in https://github.com/mindee/doctr/pull/1837
- [docs] add a note about the annotation tool in the training section by @felixdittrich92 in https://github.com/mindee/doctr/pull/1839
- feat: :sparkles: ClearML training-loss logging by @odulcy-mindee in https://github.com/mindee/doctr/pull/1844
- [Fix] docs deployment CI/CD job by @felixdittrich92 in https://github.com/mindee/doctr/pull/1781
- [CI] fix the PR labeler job and revert the docs deployment dependencies by @felixdittrich92 in https://github.com/mindee/doctr/pull/1786
- [build] fix the TensorFlow build dependency by @felixdittrich92 in https://github.com/mindee/doctr/pull/1807

### v0.10.0 (2024-10-21)

Note: docTR 0.10.0 requires Python >= 3.9
Note: docTR 0.10.0 requires TensorFlow >= 2.15.0 or PyTorch >= 2.0.0

**What's changed**

Soft-breaking change (TensorFlow backend only) 🛠
- The save format changed from `/weights` to `.weights.h5`

**Note:** please update your custom-trained models and any models uploaded to the Hugging Face Hub; this is the last release supporting manual loading from `/weights`.

New features:
- NumPy 2.0 support @felixdittrich92
- New and updated notebooks @felixdittrich92 → [notebooks](https://mindee.github.io/doctr/latest/notebooks.html)
- Custom orientation-classification model loading @felixdittrich92
- More control over the pipeline when dealing with rotated documents @milosacimovic @felixdittrich92
- Built-in datasets can now be loaded for detection directly by setting `detection_task=True`, mirroring the existing `recognition_task=True` @felixdittrich92

Disabling page orientation classification:

- If you deal with documents that have only small rotations (about -45° to 45°), you can disable page orientation classification to speed up inference.
- This only has an effect with `assume_straight_pages=False` and/or `straighten_pages=True` and/or `detect_orientation=True`.

```python
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True, assume_straight_pages=False, disable_page_orientation=True)
```

Disabling crop orientation classification:

- If you deal with documents that contain only horizontal text, you can disable crop orientation classification to speed up inference.
- This only has an effect with `assume_straight_pages=False` and/or `straighten_pages=True`.

```python
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True, assume_straight_pages=False, disable_crop_orientation=True)
```

Loading custom exported orientation-classification models: you can now load your own trained orientation models, as the snippet below shows:

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor, mobilenet_v3_small_page_orientation, mobilenet_v3_small_crop_orientation
from doctr.models.classification.zoo import crop_orientation_predictor, page_orientation_predictor

custom_page_orientation_model = mobilenet_v3_small_page_orientation("<path_to_custom_exported_onnx_model>")
custom_crop_orientation_model = mobilenet_v3_small_crop_orientation("<path_to_custom_exported_onnx_model>")

predictor = ocr_predictor(pretrained=True, assume_straight_pages=False, detect_orientation=True)

# Replace the default orientation models
predictor.crop_orientation_predictor = crop_orientation_predictor(custom_crop_orientation_model)
predictor.page_orientation_predictor = page_orientation_predictor(custom_page_orientation_model)
```

### v0.9.0 (2024-08-08)

Note: docTR 0.9.0 requires Python >= 3.9.
Note: docTR 0.9.0 requires TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.

**What's changed**

Soft-breaking changes 🛠
- The default `detection` model changed from `db_resnet50` to `fast_base`.
  **Note**: you can revert to the previous behavior by passing the detection model explicitly: `predictor = ocr_predictor(det_arch="db_resnet50", pretrained=True)`.
- The default for `resolve_blocks` changed from `True` to `False`.
  **Note**: you can revert by passing `resolve_blocks=True` to `ocr_predictor`.

New features:
- Pretrained checkpoints for the FAST models by @odulcy-mindee and @felixdittrich92
- A contrib module replacing the object-detection part, as a home for more pipelines, by @felixdittrich92
- Improved orientation detection by @felixdittrich92 and @odulcy-mindee
- Improved and updated API template by @felixdittrich92
- `objectness_score` added to the results by @felixdittrich92
- The general orientation of word crops added to the output by @felixdittrich92
- The library split into several optional parts by @felixdittrich92
- A page orientation predictor by @felixdittrich92 and @odulcy-mindee
- ONNX inference documentation by @felixdittrich92

✨ Installation ✨

We split docTR into several optional parts to make it lighter and to exclude pieces not needed at inference time. The optional parts are:
- visualization (the `.show()` method)
- HTML support (the `.from_url(...)` method)
- the contrib module

```
# TensorFlow without any optional dependencies
pip install "python-doctr[tf]"

# PyTorch without any optional dependencies
pip install "python-doctr[torch]"

# PyTorch with all available optional dependencies
pip install "python-doctr[torch,viz,html,contrib]"
```

✨ ONNX and OnnxTR ✨

We built a standalone library that offers an ultra-light way to run existing docTR ONNX-exported models, or your own custom models.

Advantages:
- The familiar docTR interface (`ocr_predictor`, etc.)
- **No** `PyTorch` or `TensorFlow` required; built on `onnxruntime`
- Smaller package size, lower inference latency, fewer resources required
- 8-bit quantized models for faster CPU inference

Try it out: [OnnxTR](https://github.com/felixdittrich92/OnnxTR)
docTR documentation: [ONNX / OnnxTR](https://mindee.github.io/doctr/using_doctr/using_model_export.html#using-your-onnx-exported-model)

![Screenshot from 2024-08-09 09-15-37](https://github.com/user-attachments/assets/60775627-9ba1-47f2-bea8-4ed92cbf58fe)

Changelog:
- [models] change the default model to `fast_base` (soft-breaking 🛠) by @felixdittrich92 in https://github.com/mindee/doctr/pull/1588
- …
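For context on the OnnxTR option above, here is a sketch of what its mirrored interface might look like. This is hypothetical, based only on the "familiar docTR interface" claim in the release note; check the OnnxTR README for the actual API:

```python
# Hypothetical OnnxTR usage mirroring doctr's ocr_predictor (verify against OnnxTR docs)
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor

model = ocr_predictor(det_arch='fast_base', reco_arch='crnn_vgg16_bn')
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
result = model(doc)
```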
","2024-08-08T13:57:18",{"id":186,"version":187,"summary_zh":188,"released_at":189},154609,"v0.8.1","\u003Cp align=\"center\">\r\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif\" width=\"50%\">\r\n\u003C\u002Fp>\r\n\r\n注意：doctr 0.8.1 需要 TensorFlow >= 2.11.0 或 PyTorch >= 1.12.0。\n\n## 变更内容\n\n- 修复了 Conda 安装包及 CI 作业，适用于 Conda 和 PyPI 发布版本\n- 修复了一些失效的链接\n\n- 预发布：来自 [FAST: 基于极简内核表示的更快任意形状文本检测器](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2111.02394) 的 FAST 文本检测模型 **-> 检查点将在下一次发布中提供**","2024-03-04T14:50:47",{"id":191,"version":192,"summary_zh":193,"released_at":194},154610,"v0.8.0","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif\" width=\"50%\">\n\u003C\u002Fp>\n\n注意：doctr 0.8.0 需要 TensorFlow >= 2.11.0 或 PyTorch >= 1.12.0。\n\n## 变更内容\n### 破坏性变更 🛠\n* 移除了 `db_resnet50_rotation`（PyTorch）和 `linknet_resnet18_rotation`（TensorFlow）模型（所有模型现在都可以处理旋转的文档）\n* `.show(doc)` 已更改为 `.show()`\n\n## 新特性\n* 所有模型现均已由 @odulcy-mindee 提供预训练检查点\n* 所有检测模型均由 @odulcy-mindee 使用旋转样本重新训练\n* @felixdittrich92 改进了在 -90 度到 90 度之间旋转的文档的方向检测\n* @frgfm 添加了 Conda 部署作业及收据相关功能\n* @odulcy-mindee 添加了官方 docTR Docker 镜像 => [docker-images](https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpkgs\u002Fcontainer\u002Fdoctr)\n* @felixdittrich92 进行了新的基准测试并改进了文档\n* @HamzaGbada 添加了 `WildReceipt` 数据集\n* @SkaarFacee 将 EarlyStopping 回调添加到所有训练脚本中\n* @felixdittrich92 为 `ocr_predictor` 添加了钩子机制，允许在管道中间根据需求操作检测预测结果\n\n```python\nfrom doctr.model import ocr_predictor\n\nclass CustomHook:\n    def __call__(self, loc_preds):\n        # 在这里操作位置预测\n        # 1. 输出结构需要与输入的位置预测保持一致\n        # 2. 注意坐标是相对值，范围应在 0 到 1 之间\n        return loc_preds\n\nmy_hook = CustomHook()\n\npredictor = ocr_predictor(pretrained=True)\n# 在管道中间添加一个钩子\npredictor.add_hook(my_hook)\n# 也可以添加多个钩子，它们会按顺序执行\nfor hook in [my_hook, my_hook, my_hook]:\n    predictor.add_hook(hook)\n```\n\n## 变更内容\n### 破坏性变更 🛠\n* [原型] 由 @felixdittrich92 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1336 中实现的基于分割图计算方向的功能\n### 新特性\n* 特性：:sparkles: @odulcy-mindee 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1322 中提供的 docTR 官方 Docker 镜像\n* @HamzaGbada 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1359 中添加了 wildreceipt 数据集\n* @SkaarFacee 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1397 中添加了提前停止功能\n* [PT \u002F TF] @felixdittrich92 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1425 中添加了 TextNet - FAST 主干网络\n* 特性：@frgfm 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1414 中添加了 Conda recipe 及相应的 CI 作业\n* [原型] @felixdittrich92 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1449 中扩展了检测结果的自定义功能\n### Bug 修复\n* [修复] @felixdittrich92 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1324 中修复了 PreProcessor 中的抗锯齿问题\n* [修复] @felixdittrich92 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1327 中修复了 parseq 和 vitstr 模型的概率计算问题\n* [修复] @felixdittrich92 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1324 中修复了概率值溢出的问题","2024-02-28T13:13:02",{"id":196,"version":197,"summary_zh":198,"released_at":199},154611,"v0.7.0","\u003Cp align=\"center\">\r\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif\" 
width=\"50%\">\r\n\u003C\u002Fp>\r\n\r\n注意：doctr 0.7.0 需要 TensorFlow >= 2.11.0 或 PyTorch >= 1.12.0。\r\n注意：我们将在 0.7.1 版本中发布缺失的 PyTorch 检查点。\n\n## 变更内容\n### 破坏性变更 🛠\n* 我们在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1279 中将 `preserve_aspect_ratio` 参数的默认值改为了 `True`。\n=> 若要恢复旧行为，可以在 `predictor` 实例中传入 `preserve_aspect_ratio=False`。\n\n## 新特性\n* 功能：使检测训练和推理支持多类别，由 @aminemindee 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1097 中实现。\n* 现所有 TensorFlow 模型均已提供预训练权重，由 @odulcy-mindee 完成。\n* 文档已更新，并添加了各模型对应的基准测试结果，由 @felixdittrich92 完成。\n* 在两个框架中新增了两种识别模型（ViTSTR 和 PARSeq），由 @felixdittrich92 和 @nikokks 共同完成。\n\n### KIE 预测器的加入\n与 OCR 相比，KIE 预测器更加灵活，因为您的检测模型可以在文档中检测多个类别。例如，您可以使用一个检测模型来识别文档中的日期和地址。\n\nKIE 预测器使得能够将多类别检测模型与识别模型结合使用，并为您预先配置好整个流程。\n\n```python\nfrom doctr.io import DocumentFile\nfrom doctr.models import kie_predictor\n\n# 模型\nmodel = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)\n# PDF\ndoc = DocumentFile.from_pdf(\"path\u002Fto\u002Fyour\u002Fdoc.pdf\")\n# 分析\nresult = model(doc)\n\npredictions = result.pages[0].predictions\nfor class_name in predictions.keys():\n    list_predictions = predictions[class_name]\n    for prediction in list_predictions:\n        print(f\"Prediction for {class_name}: {prediction}\")\n```\nKIE 预测器每页的结果以字典格式返回，其中每个键代表一个类别名称，其值则是该类别的预测结果。\n\n## 变更内容\n### 破坏性变更 🛠\n* 功能：使检测训练和推理支持多类别，由 @aminemindee 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1097 中实现。\n### 新特性\n* 功能：:sparkles: PyTorch 识别模型的多 GPU 支持，由 @odulcy-mindee 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1164 中实现。\n* [功能] 添加 TF 和 PT 版本的 PARSeq 模型，由 @nikokks 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1205 中完成。\n* [功能] 提高 PT 后端预测器的精度，由 @felixdittrich92 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1204 中实现。\n* 功能：:sparkles: TensorFlow 的 ClearML 支持，由 @odulcy-mindee 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1257 中实现。\n### Bug 修复\n* 修复分类模型在 CUDA 上的移动问题，由 @odulcy-mindee 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1125 中完成。\n* 修复：:wrench: Docker API 使用 GitHub 仓库的问题，由 @odulcy-mindee 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F1148 中完成。\n* 解压 SROIE 数据集归档文件时出现错误，由 @HamzaGbada 在 https:\u002F\u002Fgithub.com\u002Fm 中发现并报告。","2023-09-09T13:23:25",{"id":201,"version":202,"summary_zh":203,"released_at":204},154612,"v0.6.0","\u003Cp align=\"center\">\r\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif\" width=\"50%\">\r\n\u003C\u002Fp>\r\n\r\n# 本次发布亮点:\r\n\r\n**注意**: doctr 0.6.0 需要 TensorFlow >= 2.9.0 或 PyTorch >= 1.8.0。\n\n### 与 Hugging Face Hub 完全集成（docTr 与 Hugging Face 的结合）\n\n![hf](https:\u002F\u002Fassets.st-note.com\u002Fproduction\u002Fuploads\u002Fimages\u002F35450010\u002Frectangle_large_type_2_7f287c8bb8ad90f69c4a537719b32ace.png?fit=bounds&quality=85&width=1280)\n\n- 从 Hub 加载：\n\n```python\nfrom doctr.io import DocumentFile\nfrom doctr.models import ocr_predictor, from_hub\nimage = DocumentFile.from_images(['data\u002Fexample.jpg'])\n# 从 Hugging Face Hub 加载自定义检测模型\ndet_model = from_hub('Felix92\u002Fdoctr-torch-db-mobilenet-v3-large')\n# 从 Hugging Face Hub 加载自定义识别模型\nreco_model = from_hub('Felix92\u002Fdoctr-torch-crnn-mobilenet-v3-large-french')\n# 可以轻松将这些模型接入 OCR 预测器\npredictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)\nresult = predictor(image)\n```\n\n- 推送到 
Hub：\n\n```python\nfrom doctr.models import recognition, login_to_hub, push_to_hf_hub\nlogin_to_hub()\nmy_awesome_model = recognition.crnn_mobilenet_v3_large(pretrained=True)\npush_to_hf_hub(my_awesome_model, model_name='doctr-crnn-mobilenet-v3-large-french-v1', task='recognition', arch='crnn_mobilenet_v3_large')\n```\n文档：https:\u002F\u002Fmindee.github.io\u002Fdoctr\u002Fusing_doctr\u002Fsharing_models.html\n\n### 预定义数据集也可用于识别任务\n\n```python\nfrom doctr.datasets import CORD\n# 使用原始裁剪框（可能包含不规则形状）\ntrain_set = CORD(train=True, download=True, recognition_task=True)\n# 使用旋转后的裁剪框（始终为规则矩形）\ntrain_set = CORD(train=True, download=True, use_polygons=True, recognition_task=True)\nimg, target = train_set[0]\n```\n文档：https:\u002F\u002Fmindee.github.io\u002Fdoctr\u002Fusing_doctr\u002Fusing_datasets.html\n\n### 新模型（两个框架均有）\n\n- 分类：VisionTransformer (ViT)\n- 识别：用于场景文本识别的 Vision Transformer (ViTSTR)\n\n### 修复了识别模型中的错误\n\n- MASTER 和 SAR 架构现在在两个框架（TensorFlow 和 PyTorch）中均可正常运行。\n\n### ONNX 支持（实验性）\n\n- 所有模型现在都可以导出为 ONNX 格式（仅剩 TF 版本的 MobileNet 尚未支持，计划在 0.7.0 中实现）。\n\n注意：完整的生产级 ONNX 流程及构建计划将在 0.7.0 版本中推出（目前导出的模型仅到 logits 层，不包含任何后处理步骤）。\n\n### 其他功能改进\n\n- 我们的演示现在也兼容 PyTorch，感谢 @odulcy-mindee 的贡献。\n- 现在可以检测提取文本的语言，感谢 @aminemindee 的贡献。\n\n\n## 变更内容\n### 破坏性变更 🛠\n* 功能：✨ 允许 CRNN 后处理中使用大于 1 的束宽，由 @khalidMindee 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F630 中实现。\n* [修复] TensorFlow SAR_Resnet31 实现，由 @felixdittrich92 在 https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F9 中完成。","2022-09-29T11:51:15",{"id":206,"version":207,"summary_zh":208,"released_at":209},154613,"v0.5.1","\u003Cp align=\"center\">\r\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif\" width=\"50%\">\r\n\u003C\u002Fp>\r\n\r\nThis minor release includes: improvements to the documentation thanks to @felixdittrich92, bug fixes, support of rotation extended to the TensorFlow backend, a switch from PyMuPDF to pypdfium2, and a nice integration with the Hugging Face Hub thanks to @fg-mindee!\r\n\r\n**Note**: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.\r\n\r\n# Highlights\r\n\r\n### Improvement of the documentation\r\n\r\nThe documentation has been improved with a new theme and illustrations, and the docstrings have been completed and expanded.\r\nThis is how it renders:\r\n\r\n![doc](https:\u002F\u002Fuser-images.githubusercontent.com\u002F70526046\u002F159456296-48529ffd-9fd7-4517-bcd4-3d4de9368419.png)\r\n![Capture d’écran de 2022-03-22 11-08-31](https:\u002F\u002Fuser-images.githubusercontent.com\u002F70526046\u002F159457048-abd970b9-436e-40dd-b940-ec16baadb53b.png)\r\n\r\n### Rotated text detection extended to the TensorFlow backend\r\n\r\nWe provide weights for the `linknet_resnet18_rotation` model, which has been deeply modified: we implemented a new loss (based on Dice Loss and Focal Loss), we changed the computation of the targets so that polygons are shrunken the same way they are in DBNet (which greatly improves the precision of the segmenter), and we trained the model preserving the aspect ratio of the images.\r\nAll these improvements led to much better results, and the pretrained model is now very robust.\r\n\r\n### Preserving the aspect ratio in the detection task\r\n\r\nYou can now choose to preserve the aspect ratio in the detection_predictor:\r\n\r\n```\r\n>>> from doctr.models import detection_predictor\r\n>>> predictor = detection_predictor('db_resnet50_rotation', pretrained=True, assume_straight_pages=False, 
preserve_aspect_ratio=True)\r\n```\r\nThis option can also be activated in the high-level end-to-end predictor:\r\n\r\n```\r\n>>> from doctr.models import ocr_predictor\r\n>>> model = ocr_predictor('linknet_resnet18_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)\r\n```\r\n\r\n### Integration within the Hugging Face Hub\r\n\r\nThe artefact detection model is now available on the [Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Fmindee\u002Ffasterrcnn_mobilenet_v3_large_fpn), this is amazing:\r\n\r\n![Capture d’écran de 2022-03-22 11-33-14](https:\u002F\u002Fuser-images.githubusercontent.com\u002F70526046\u002F159462918-0ce6807b-4096-44f9-b238-60279ac9034b.png)\r\n\r\nOn DocTR, you can now use the `.from_hub()` method so that these 2 snippets are equivalent:\r\n\r\n```\r\n# Pretrained\r\nfrom doctr.models.obj_detection import fasterrcnn_mobilenet_v3_large_fpn\r\nmodel = fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)\r\n```\r\nand:\r\n\r\n```\r\n# HF Hub\r\nfrom doctr.models.obj_detection.factory import from_hub\r\nmodel = from_hub(\"mindee\u002Ffasterrcnn_mobilenet_v3_large_fpn\")\r\n```\r\n\r\n# Breaking changes\r\n\r\n### Replacing the PyMuPDF dependency with pypdfium2, which is license-compatible\r\n\r\nWe replaced the PyMuPDF dependency with pypdfium2 over a license-compatibility issue, so we lose the word and object extraction from source PDFs that was done with PyMuPDF. It wasn't used in any models, so it is not a big issue, and we will work on re-integrating such a feature in the future.\r\n\r\n\r\n# Full changelog\r\n## What's Changed\r\n### Breaking Changes 🛠\r\n* fix: polygon orientation + line aggregation by @charlesmindee in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F801\r\n* refactor: Switched from PyMuPDF to pypdfium2 by @fg-mindee in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F829\r\n### New Features\r\n* feat: Added RandomHorizontalFlip in TF by @SiddhantBahuguna in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F779\r\n* Imgur5k dataset integration by @felixdittrich92 in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F785\r\n* feat: Added support of GPU for predictors in PyTorch by @fg-mindee in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F808\r\n* Add SynthWordGenerator to text reco training scripts by @felixdittrich92 in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F825\r\n* fix: Fixed some ResNet architecture imprecisions by @fg-mindee in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F828\r\n* feat: Added shadow augmentation for all backends by @fg-mindee in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F811\r\n* feat: Added loading method for PyTorch artefact detection models from HF Hub by @fg-mindee in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F836\r\n* feat: add rotated linknet_resnet18 tensorflow ckpts by @charlesmindee in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F817\r\n### Bug Fixes\r\n* fix: Fixed rotation of img + target by @fg-mindee in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F784\r\n* fix: show sample when batch size is 1 by @charlesmindee in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F787\r\n* ci: Fixed PR label check job by @fg-mindee in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F792\r\n* ci: Fixed typo in the script 
ref by @fg-mindee in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F794\r\n* [datasets] fix description by @felixdittrich92 in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F795\r\n* fix: linknet target computation by @char","2022-03-22T10:41:57",{"id":211,"version":212,"summary_zh":213,"released_at":214},154614,"v0.5.0","\u003Cp align=\"center\">\r\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif\" width=\"50%\">\r\n\u003C\u002Fp>\r\n\r\nThis release adds support of rotated documents, and extends both the model & dataset zoos.\r\n\r\n**Note**: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.\r\n\r\n# Highlights\r\n\r\n## :upside_down_face: :smiley: Rotation-aware text detection :upside_down_face: :smiley:\r\n\r\nIt's no secret: this release's focus was to bring the same level of performance to rotated documents! \r\n\r\n![predictions](https:\u002F\u002Fuser-images.githubusercontent.com\u002F70526046\u002F147554907-93f403ba-686b-4029-9ef2-5adc821e7776.png)\r\n\r\ndocTR is meant to be your best tool for seamless document processing, and it couldn't do without supporting a very natural & common augmentation of input documents. This large project was subdivided into three parts:\r\n\r\n### Straightening pages before text detection\r\nWe developed a heuristic-based method to estimate the page skew and rotate the page before forwarding it to any deep learning model. Our thanks to @Rob192 for his contribution on this part :pray:\r\n\r\n_This behaviour can be enabled to avoid retraining the text detection models. However, the heuristics approach has its limits in terms of robustness._\r\n\r\n### Text detection training with rotated images\r\n\r\n![doctr_sample](https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F147919531-74077940-ac3f-4a9a-acfb-22a0e7881c03.png)\r\n\r\n\r\nThe core of this project was to enable our text detection models to produce non-degraded heatmaps & localization candidates when processing a rotated page.\r\n\r\n### Crop orientation resolution\r\n\r\n![rot2](https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F147919416-a4d8f9d0-b986-4886-aaba-42baf722876f.png)\r\n\r\nFinally, once the localization candidates have been extracted, there is no saying that they will read from left to right. In order to remove this doubt, a lightweight image orientation classifier was added to refine the crops that will be sent to text recognition!\r\n\r\n## :zebra: A wider pretrained classification model zoo :zebra:  \r\n\r\nThe stability of training deep learning models for complex tasks has mostly been helped by leveraging transfer learning. As such, OCR tasks usually require a backbone as a feature extractor. For this reason, all checkpoints of classification models in both PyTorch & TensorFlow have been updated :rocket: \r\n_Those were trained using our synthetic character classification dataset, for more details cf. [Character classification training](https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Ftree\u002Fmain\u002Freferences\u002Fclassification)_\r\n\r\n## :framed_picture: New public datasets join the fray\r\n\r\nThanks to @felixdittrich92, the list of supported datasets has considerably grown :partying_face: \r\nThis includes widely popular datasets used for benchmarks on OCR-related tasks; you can find the full list over here :point_right: #587\r\n\r\n## Synthetic text recognition dataset\r\n\r\nAdditionally, we followed up on the existing `CharGenerator` by introducing `WordGenerator`:\r\n- generates an image of a word whose length is randomly sampled within a specified range, with characters randomly sampled from the specified vocab\r\n- you can even pass a list of fonts so that each word's font family is randomly picked among them\r\n\r\nBelow are some samples using a `font_size=32`: \r\n![wordgenerator_sample](https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F147415761-05a5346c-03ef-494a-a6ce-1138072b60fa.png)
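\r\n\r\nAs a minimal usage sketch (the constructor arguments below should be treated as indicative rather than authoritative, and the font files are hypothetical):\r\n\r\n```python\r\nfrom doctr.datasets import VOCABS, WordGenerator\r\n\r\n# 1000 synthetic word crops, 1 to 32 characters each, drawn from the French vocab\r\nds = WordGenerator(\r\n    vocab=VOCABS['french'],\r\n    min_chars=1,\r\n    max_chars=32,\r\n    num_samples=1000,\r\n    font_family=[\"FreeMono.ttf\", \"FreeSans.ttf\"],  # hypothetical font list\r\n)\r\nimg, target = ds[0]  # the rendered image and the word it contains\r\n```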
\r\n\r\n## :bookmark_tabs: New notebooks\r\n\r\nTwo new notebooks have made their way into the [documentation](https:\u002F\u002Fmindee.github.io\u002Fdoctr\u002Flatest\u002Fnotebooks.html):\r\n- producing searchable PDFs from docTR analysis results\r\n- introduction to document artefact detection (QR codes, bar codes, ID pictures, etc.) with docTR\r\n\r\n![image](https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F147834457-81e54fd7-5aa6-4e48-b6a4-dc103ba9845a.png)\r\n\r\n\r\n# Breaking changes\r\n\r\n## Revamp of classification models\r\n\r\nWith the retraining of all classification backbones, several changes have been introduced:\r\n- Model naming: `linknet16` --> `linknet_resnet18`\r\n- Architecture changes: all classification backbones are available with a classification head now.\r\n\r\n## Enforcing relative coordinates in datasets\r\n\r\nIn order to unify our data pipelines, we forced the conversion to relative coordinates on all datasets!\r\n\r\n0.4.1 | 0.5.0\r\n-- | --\r\n`>>> from doctr.datasets import FUNSD` \u003Cbr\u002F> `>>> ds = FUNSD(train=True, download=True)` \u003Cbr\u002F> `>>> img, target = ds[0]`\u003Cbr\u002F> `>>> print(target['boxes'].dtype, target['boxes'].max())` \u003Cbr\u002F> `(dtype('int64'), 862)` | `>>> from doctr.datasets import FUNSD` \u003Cbr\u002F> `>>> ds = FUNSD(train=True, download=True)` \u003Cbr\u002F> `>>> img, target = ds[0]`\u003Cbr\u002F> `>>> print(target['boxes'].dtype, target['boxes'].max())` \u003Cbr\u002F> `(dtype('float32'), 0.98341835)`  |\r\n\r\n\r\n# Full changelog\r\n## Breaking Changes 🛠\r\n* refacto: :wrench: postprocessing with rotated boxes by @charlesmindee in https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fpull\u002F641\r\n* refa","2021-12-31T18:32:26",{"id":216,"version":217,"summary_zh":218,"released_at":219},154615,"v0.4.1","\u003Cp align=\"center\">\r\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif\" width=\"50%\">\r\n\u003C\u002Fp>\r\n\r\nThis patch release brings the support of AMP for PyTorch training to docTR, along with artefact object detection.\r\n\r\n**Note**: doctr 0.4.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.\r\n\r\n# Highlights\r\n\r\n## Automatic Mixed Precision (AMP) :zap:\r\n\r\nTraining scripts with the [PyTorch back-end](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Fnotes\u002Famp_examples.html) now benefit from AMP to reduce the RAM footprint and potentially increase the maximum batch size! This comes in especially handy for text detection, which requires high spatial resolution inputs!
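\r\n\r\nAs an illustration, a minimal AMP-enabled PyTorch training step could look like the sketch below (generic PyTorch, not the exact reference script; it assumes the model returns a dictionary with a `loss` entry when targets are passed):\r\n\r\n```python\r\nimport torch\r\nfrom torch.cuda.amp import GradScaler, autocast\r\n\r\nscaler = GradScaler()\r\n\r\ndef train_step(model, images, targets, optimizer):\r\n    optimizer.zero_grad()\r\n    with autocast():  # forward pass runs in mixed precision\r\n        loss = model(images, targets)['loss']\r\n    scaler.scale(loss).backward()  # scale the loss to avoid FP16 underflow\r\n    scaler.step(optimizer)  # unscales the gradients, then updates the weights\r\n    scaler.update()\r\n    return loss.item()\r\n```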
\r\n\r\n## Artefact detection :flying_saucer: \r\n\r\nDocument understanding goes beyond textual elements, as information can be encoded in other visual forms. For this reason, we have extended the range of supported tasks by adding object detection. This will be focused on non-textual elements in documents, including QR codes, barcodes, ID pictures, and logos.\r\n\r\nHere are some early results:\r\n\r\n![2x3_art(1)](https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F142852701-c220664a-8cd1-4a71-83b0-df8e6beb0485.jpg)\r\n\r\nThis release comes with a training & validation set [DocArtefacts](https:\u002F\u002Fmindee.github.io\u002Fdoctr\u002Flatest\u002Fdatasets.html#doctr.datasets.DocArtefacts), and a reference [training script](https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fblob\u002Fmain\u002Freferences\u002Fobj_detection\u002Ftrain_pytorch.py). Keep an eye out for the models we will be releasing in the next release!
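\r\n\r\nLoading the dataset follows the same pattern as the other datasets in the library; a sketch under that assumption:\r\n\r\n```python\r\nfrom doctr.datasets import DocArtefacts\r\n\r\n# download the training split and grab one sample with its detection target\r\ntrain_set = DocArtefacts(train=True, download=True)\r\nimg, target = train_set[0]\r\n```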
\r\n\r\n\r\n## Get more of docTR with Colab tutorials :book: \r\n\r\nYou've been waiting for it: from now on, we will regularly be adding new tutorials for docTR in the form of [jupyter notebooks](https:\u002F\u002Fjupyter.org\u002F) that you can open and run locally or on [Google Colab](https:\u002F\u002Fresearch.google.com\u002Fcolaboratory\u002F) for instance!\r\n\r\nCheck the new page in the documentation to have an updated list of all our community notebooks: https:\u002F\u002Fmindee.github.io\u002Fdoctr\u002Flatest\u002Fnotebooks.html\r\n\r\n# Breaking changes\r\n\r\n## Deprecated support of FP16 for datasets\r\n\r\nFloat precision can be leveraged in deep learning to decrease the RAM footprint of trainings. The common data type `float32` has a lower-resolution counterpart, `float16`, which is usually only supported on GPU for common deep learning operations. Initially, we were planning to make all our operations available in both to reduce the memory footprint in the end.\r\n\r\nHowever, with the latest developments of deep learning frameworks and their Automatic Mixed Precision mechanism, this isn't required anymore and only adds more constraints on the development side. We thus deprecated this feature from our datasets and predictors:\r\n\r\n0.4.0 | 0.4.1\r\n-- | --\r\n`>>> from doctr.datasets import FUNSD` \u003Cbr\u002F> `>>> ds = FUNSD(train=True, download=True, fp16=True)` \u003Cbr\u002F> `>>> print(getattr(ds, \"fp16\"))` \u003Cbr\u002F> `True` | `>>> from doctr.datasets import FUNSD` \u003Cbr\u002F> `>>> ds = FUNSD(train=True, download=True)` \u003Cbr\u002F> `>>> print(getattr(ds, \"fp16\"))` \u003Cbr\u002F> `None`  |\r\n\r\n\r\n\r\n# Detailed changes\r\n## New features\r\n\r\n* Adds Arabic to supported vocabs in #514 (@mzeidhassan)\r\n* Adds XML export method to DocumentBuilder in #544 (@felixdittrich92)\r\n* Adds flags to control the behaviour with rotated elements in #551 (@charlesmindee)\r\n* Adds unittest to ensure headers are correct in #556 (@fg-mindee)\r\n* Adds isort ordering & dedicated CI check in #557 (@fg-mindee)\r\n* Adds IIIT-5K to supported datasets in #589 (@felixdittrich92)\r\n* Adds support of AMP to all PyTorch training scripts in #604 (@fg-mindee)\r\n* Adds DocArtefacts dataset for object detection on non-textual elements in #583 (@SiddhantBahuguna)\r\n* Speeds up CTC decoding in PyTorch by x10 in #633 (@fg-mindee)\r\n* Added train script for artefact detection in #593 (@SiddhantBahuguna)\r\n* Added GPU support for classification and improve memory pinning in #629 (@fg-mindee)\r\n* Added an object detection metric in #628 (@fg-mindee)\r\n* Split DocArtefacts into subsets and updated its class mapping  in #601 (@fg-mindee)\r\n* Added README specific for the API with route examples in #612 (@fg-mindee)\r\n* Added SVT dataset integration in #620 (@felixdittrich92)\r\n* Added links to tutorial notebooks in the documentation in #619 (@fg-mindee)\r\n* Added new architectures to model selection in demo in #600 (@fg-mindee)\r\n* Add det\u002Freco_predictor arch in `OCRPredictor.__repr__` in #595 (@RBMindee)\r\n* Improves coverage by adding missing unittests in #545 (@fg-mindee)\r\n* Resolve both lines and blocks by default when building a doc in #548 (@charlesmindee)\r\n* Relocated test\u002F to tests\u002F and made contribution process easier in #598 (@fg-mindee)\r\n* Fixed Makefile by converting spaces to tabs in #615 (@fg-mindee)\r\n* Updated flake8 config to spot unused imports & undefined variables in #623 (@fg-mindee)\r\n* Adds 2 new rotation flags in the ocr_predictor in #632 (@charlesmindee)\r\n\r\n\r\n## Bug fixes\r\n\r\n* Fixed evaluation script clipping issue in #522 (@charlesmindee)\r\n* Fixed API template i","2021-11-22T11:22:33",{"id":221,"version":222,"summary_zh":223,"released_at":224},154616,"v0.4.0","\r\n\u003Cp align=\"center\">\r\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F135670324-5fee4530-26f9-413b-b6e0-282cdfbd746a.gif\" width=\"50%\">\r\n\u003C\u002Fp>\r\n\r\n\r\nThis release brings the support of PyTorch out of beta, makes text recognition more robust, and provides light architectures for complex tasks.\r\n\r\n**Note**: doctr 0.4.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.\r\n\r\n# Highlights\r\n\r\n### No more width limitation for text recognition\r\n\r\nSome documents, such as French ID cards, include very long strings that can be challenging to transcribe:\r\n\r\n![fr_id_card_sample (copy)](https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F135622390-f7725f84-aa0d-40a9-b109-06555b45eed3.jpg)\r\n\r\nThis release enables a smart split\u002Fmerge strategy for wide crops to avoid performance drops. 
Previously the whole crop was analyzed at once; now it is split into reasonably sized crops, inference is performed in batches, and the predictions are merged back together.\r\n\r\nThe following snippet:\r\n```python\r\nfrom doctr.io import DocumentFile\r\nfrom doctr.models import ocr_predictor\r\n\r\ndoc = DocumentFile.from_images('path\u002Fto\u002Fimg.png')\r\npredictor = ocr_predictor(pretrained=True)\r\nprint(predictor(doc).pages[0])\r\n```\r\n\r\nused to yield:\r\n\r\n```\r\nPage(\r\n  dimensions=(447, 640)\r\n  (blocks): [Block(\r\n    (lines): [Line(\r\n      (words): [\r\n        Word(value='1XXXXXX', confidence=0.0023),\r\n        Word(value='1XXXX', confidence=0.0018),\r\n      ]\r\n    )]\r\n    (artefacts): []\r\n  )]\r\n)\r\n```\r\n\r\nand now yields:\r\n\r\n```\r\nPage(\r\n  dimensions=(447, 640)\r\n  (blocks): [Block(\r\n    (lines): [Line(\r\n      (words): [\r\n        Word(value='IDFRABERTHIER\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C\u003C', confidence=0.49),\r\n        Word(value='8806923102858CORINNE\u003C\u003C\u003C\u003C\u003C\u003C\u003C6512068F6', confidence=0.22),\r\n      ]\r\n    )]\r\n    (artefacts): []\r\n  )]\r\n)\r\n```\r\n\r\n### Framework specific predictors\r\n\r\nPyTorch support is no longer in beta, so we made some efforts to unify switching from one deep learning backend to another :raised_hands:  Predictors are designed to be the recommended interface for inference with your models!\r\n\r\n0.3.1 (TensorFlow) | 0.3.1 (PyTorch) | 0.4.0\r\n-- | -- | --\r\n`>>> from doctr.models import detection_predictor` \u003Cbr\u002F> `>>> predictor = detection_predictor(pretrained=True)` \u003Cbr\u002F> `>>> out = predictor(doc, training=False)` | `>>> from doctr.models import detection_predictor` \u003Cbr\u002F> `>>> import torch` \u003Cbr\u002F> `>>> predictor = detection_predictor(pretrained=True)` \u003Cbr\u002F> `>>> predictor.model.eval()` \u003Cbr\u002F> `>>> with torch.no_grad(): out = predictor(doc)`  | `>>> from doctr.models import detection_predictor` \u003Cbr\u002F> `>>> predictor = detection_predictor(pretrained=True)` \u003Cbr\u002F> `>>> out = predictor(doc)` |\r\n\r\n### An ever-growing model zoo :zebra:\r\n\r\nAs PyTorch goes out of beta, we have bridged the gap between PyTorch & TensorFlow pretrained models' availability. 
Additionally, by leveraging our integration of light backbones, this release comes with lighter architectures for text detection and text recognition:\r\n- db_mobilenet_v3_large\r\n- crnn_mobilenet_v3_small\r\n- crnn_mobilenet_v3_large\r\n\r\nThe full list of supported architectures is available :point_right:  [here](https:\u002F\u002Fmindee.github.io\u002Fdoctr\u002Flatest\u002Fusing_models.html)\r\n\r\n### Demo live on HuggingFace Spaces\r\n\r\nIf you have enjoyed the Streamlit demo, but prefer not to run it on your own hardware, feel free to check out the online version on HuggingFace Spaces:\r\n[![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fosanseviero\u002Fdoctr)\r\n\r\nCourtesy of @osanseviero for deploying it, and [HuggingFace](https:\u002F\u002Fhuggingface.co\u002F) for hosting & serving :pray: \r\n\r\n# Breaking changes\r\n\r\n### Deprecated crnn_resnet31 & sar_vgg16_bn\r\n\r\nAfter going over some backbone compatibility and re-assessing whether all combinations should be trained, DocTR is focusing on reproducing the papers' authors' intent or improving upon it. As such, we have deprecated the following recognition models (that had no pretrained params): `crnn_resnet31`, `sar_vgg16_bn`.\r\n\r\n### Deprecated models.export\r\n\r\nSince `doctr.models.export` was specific to TensorFlow and didn't bring much more value than TensorFlow tutorials, we added instructions in the documentation and deprecated the submodule.\r\n\r\n\r\n# New features\r\n\r\n## Datasets\r\nResources to access data in efficient ways\r\n- Added entry in vocabs for Portuguese #464 (@fmobrj), English, Spanish & German #467 (@fg-mindee), ancient Greek #500 (@fg-mindee)\r\n\r\n## IO\r\nFeatures to manipulate input & outputs\r\n- Added `.synthesize` method to `Page` and `Document` #472 (@fg-mindee)\r\n\r\n## Models\r\nDeep learning model building and inference\r\n- Add dynamic crop splitting for wide inputs to recognition models #465 (@charlesmindee)\r\n- Added MobileNets with rectangular pooling #483 (@fg-mindee)\r\n- Added pretrained params for `db_mobilenet_v3_large` #485 #487 , `crnn_vgg16_bn` #487, `db_resnet50` #489, `crnn_mobilenet_v3_small` & `crnn_mobilenet_v3_large` #517 #","2021-10-01T18:58:43",{"id":226,"version":227,"summary_zh":228,"released_at":229},154617,"v0.3.1","This release stabilizes the support for the PyTorch backend while extending the range of features (new task, superior pretrained models, speed-ups).\r\n\r\n*Brought to you by @fg-mindee & @charlesmindee*\r\n\r\n**Note**: doctr 0.3.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.\r\n\r\n# Highlights\r\n\r\n### Improved pretrained parameters for your favorite models :rocket: \r\nWith each release, we hope to bring you improved models and more comprehensive evaluation results. As part of the 0.3.1 release, we provide you with:\r\n- improved params for `crnn_vgg16_bn` & `sar_resnet31` \r\n- evaluation results on a new private dataset (US tax forms)\r\n\r\n### Lighter backbones for faster architectures :zap:\r\nWithout any surprise, just like many other libraries, DocTR's future will involve some balance between speed and pure performance. 
To make this choice available to you, we added support of [MobileNet V3](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.02244.pdf) and pretrained it for character classification for both PyTorch & TensorFlow.\r\n\r\n### Speeding up preprocessors & datasets :train2:\r\nWhether you are a user looking for inference speed, or a dedicated model trainer looking for optimal data loading, you will be thrilled to know that we have greatly improved our data loading\u002Fprocessing by leveraging multi-threading!\r\n\r\n### Better demo app :art: \r\nWe value the accessibility of this project and thus commit to improving tools for entry-level users. Deploying a demo from a Python library is not the expertise of every developer, so this release improves the existing demo:\r\n\r\n![new_demo](https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Freleases\u002Fdownload\u002Fv0.3.0\u002Fdemo_update.png)\r\n\r\nPage selection was added for multi-page documents, the predictions are used to produce a synthesized version of the initial document, and you get the JSON export! We're looking forward to your feedback :hugs: \r\n\r\n### [beta] Character classification\r\nAs DocTR continues to move forward with more complex tasks, paving the way for a consistent training procedure will become necessary. Pretraining has shown potential in many deep learning tasks, and we want to explore opportunities to make training for OCR even more accessible.\r\n![char_classif](https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F131171140-8cb5846b-c976-4202-8ef1-031adef69deb.png)\r\n\r\nSo this release makes a big step forward by adding on-the-fly character generator and training scripts, which allows you to train a character classifier without any pre-existing data :hushed: \r\n\r\n\r\n# Breaking changes\r\n\r\n### Default dtype of TF datasets\r\n\r\nIn order to harmonize data processing between frameworks, the default data type of dataloaders has been switched to float32 for TensorFlow backend:\r\n\r\n0.3.0 | 0.3.1\r\n-- | --\r\n`>>> from doctr.datasets import FUNSD` \u003Cbr\u002F> `>>> ds = FUNSD()` \u003Cbr\u002F> `>>> img, target = ds[0]` \u003Cbr\u002F> `>>> print(img.dtype)` \u003Cbr\u002F> `\u003Cdtype: 'uint8'>` \u003Cbr\u002F> `>>> print(img.numpy().min(), img.numpy().max())` \u003Cbr\u002F> `0 255`  | `>>> from doctr.datasets import FUNSD` \u003Cbr\u002F> `>>> ds = FUNSD()` \u003Cbr\u002F> `>>> img, target = ds[0]` \u003Cbr\u002F> `>>> print(img.dtype)` \u003Cbr\u002F> `\u003Cdtype: 'float32'>` \u003Cbr\u002F> `>>> print(img.numpy().min(), img.numpy().max())` \u003Cbr\u002F> `0.0 1.0` |\r\n\r\n\r\n### I\u002FO module\r\nWhether it is for exporting predictions or loading input data, the library lets you play around with inputs and outputs using minimal code. 
Since its usage is constantly expanding, the `doctr.documents` module was repurposed into `doctr.io`.\r\n\r\n\r\n0.3.0 | 0.3.1\r\n-- | --\r\n`>>> from doctr.documents import DocumentFile` \u003Cbr\u002F> `>>> pdf_doc = DocumentFile.from_pdf(\"path\u002Fto\u002Fyour\u002Fdoc.pdf\").as_images()`  | `>>> from doctr.io import DocumentFile` \u003Cbr\u002F> `>>> pdf_doc = DocumentFile.from_pdf(\"path\u002Fto\u002Fyour\u002Fdoc.pdf\").as_images()` |\r\n\r\nIt now also includes an `image` submodule for easy tensor \u003C--> numpy conversion for all supported data types.\r\n\r\n### Multithreading relocated\r\nAs multithreading is getting increasingly used to boost performance across the entire library, it has been moved from the utilities of TF-only datasets to `doctr.utils.multithreading`:\r\n\r\n0.3.0 | 0.3.1\r\n-- | --\r\n`>>> from doctr.datasets.multithreading import multithread_exec` \u003Cbr\u002F> `>>> results = multithread_exec(lambda x: x ** 2, [1, 4, 8])`  | `>>> from doctr.utils.multithreading import multithread_exec` \u003Cbr\u002F> `>>> results = multithread_exec(lambda x: x ** 2, [1, 4, 8])` |\r\n\r\n# New features\r\n\r\n## Datasets\r\nResources to access data in efficient ways\r\n- Added support of FP16 (#367)\r\n- Added option to merge subsets for recognition datasets (#376)\r\n- Added dynamic sequence encoding (#393)\r\n- Added support of new label format datasets (#407)\r\n- Added character generator dataset for image classification (#412, #418)\r\n\r\n## IO\r\nFeatures to manipulate input & outputs\r\n- Added `Element` creation from dictionary (#386)\r\n- Added byte decoding function for PyTorch and TF (#390)\r\n- Added extra tensor conversion functions (#412)\r\n\r\n## Models\r\nDeep learning model building and inference\r\n- Added `crnn_resnet31` as a recognition model (#361)\r","2021-08-27T18:53:35",{"id":231,"version":232,"summary_zh":233,"released_at":234},154618,"v0.3.0","This release adds support for PyTorch backend & rotated text elements.\r\n\r\n*Release brought to you by @fg-mindee & @charlesmindee*\r\n\r\n**Note**: doctr 0.3.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.\r\n\r\n# Highlights\r\n### [beta] Welcome PyTorch :tada: \r\nThis release comes with exciting news: we added support of PyTorch for the whole library!\r\n\r\n\u003Cp align=\"center\">\u003Cimg src=\"https:\u002F\u002Fpytorch.org\u002Fassets\u002Fimages\u002Fpytorch-logo.png\" width=\"200\" height=\"200\">\u003C\u002Fp>\r\n\r\nIf you have both TensorFlow & PyTorch, simply switch the DocTR backend by using the `USE_TORCH` and `USE_TF` environment variables.\r\n```shell\r\nexport USE_TORCH='1'\r\n```\r\nThen DocTR will do the rest for you to play along with PyTorch:\r\n```python\r\nimport torch\r\nfrom doctr.models import db_resnet50\r\nmodel = db_resnet50(pretrained=True).eval()\r\nwith torch.no_grad():\r\n    out = model(torch.rand(1, 3, 1024, 1024))\r\n```\r\nMore pretrained models to come in the next releases!\r\n\r\n\r\n### Support of rotated boxes\r\nText elements are not always axis-aligned: this release adds the option to yield rotated bounding boxes as targets, together with rotated cropping at inference time, so that tilted text can be localized and read more reliably.\r\n\r\n![Rotated bounding boxes](https:\u002F\u002Fuser-images.githubusercontent.com\u002F70526046\u002F121030560-df055080-c7a9-11eb-8b19-a3a1a55cf145.png)\r\n\r\n### Page reconstruction\r\nFollowing up on some feedback about the lack of clarity for visualization of dense predictions, we added a page reconstruction feature. 
\r\n\r\n```python\r\nimport matplotlib.pyplot as plt\r\nfrom doctr.utils.visualization import synthesize_page\r\nfrom doctr.documents import DocumentFile\r\nfrom doctr.models import ocr_predictor\r\n\r\nmodel = ocr_predictor(pretrained=True)\r\n# PDF\r\ndoc = DocumentFile.from_pdf(\"path\u002Fto\u002Fyour\u002Fdoc.pdf\").as_images()\r\n# Analyze\r\nresult = model(doc)\r\n\r\n# Reconstruct the first page\r\nreconstructed_page = synthesize_page(result.export()[0])\r\nplt.imshow(reconstructed_page); plt.show()\r\n```\r\n![Original image](https:\u002F\u002Fuser-images.githubusercontent.com\u002F70526046\u002F122777414-4e9c3500-d2ac-11eb-8870-109deb1e28a9.png) ![Page reconstruction](https:\u002F\u002Fuser-images.githubusercontent.com\u002F70526046\u002F122777419-4f34cb80-d2ac-11eb-8dba-d4546071f361.png)\r\n\r\nUsing the predictions from our models, we try to synthesize the document with only its textual information!\r\n\r\n# Breaking changes\r\n\r\n### Renamed LinkNet\r\n\r\nWhile the paper doesn't introduce different versions of the LinkNet architectures, we want to keep the possibility to add more. In order to stabilize the interface early on, we renamed `linknet` into `linknet16`\r\n\r\n0.2.1 | 0.3.0\r\n-- | --\r\n`>>> from doctr.models import linknet` \u003Cbr\u002F> `>>> model = linknet(pretrained=True)`  | `>>> from doctr.models import linknet16` \u003Cbr\u002F> `>>> model = linknet16(pretrained=True)` |\r\n\r\n\r\n# New features\r\n\r\n## Datasets\r\nResources to access data in efficient ways\r\n- Added option to yield rotated bounding boxes as target (#281)\r\n- Added support of PyTorch for all datasets (#319)\r\n\r\n## Documents\r\nFeatures to manipulate document information\r\n- Added support of rotated bboxes (#281)\r\n- Added entry for MASTER (#300)\r\n- Updated LinkNet entry (#313)\r\n- Added code of conduct (#325)\r\n\r\n## Models\r\nDeep learning model building and inference\r\n- Added rotated cropping feature & inference mode (#281)\r\n- Added spatial masked loss support for LinkNet (#296)\r\n- Added page orientation estimation feature (#293)\r\n- Added box target rotation feature (#297)\r\n- Added support of MASTER recognition model & transformer (#300, #342)\r\n- Added Focal loss support to linknet (#304, #311)\r\n- Added PyTorch support for DBNet (#310, #313, #316), LinkNet (#317), `conv_sequence` & parameter loading (#323), `resnet31` (#327), `vgg16_bn` (#328), CRNN (#318), SAR (#333), MASTER (#329, #335, #340, #342)\r\n- Added cleaner verified file downloading function (#319)\r\n- Added upfront page orientation estimation (#324) by @Rob192 \r\n\r\n## Utils\r\nUtility features relevant to the library use cases.\r\n- Added Mask IoU computation (#290)\r\n- Added straight \u003C--> rotated bbox conversion and metric computation support (#281)\r\n- Added page synthesis feature (#320)\r\n- Added IoA, and NMS (#332)\r\n\r\n## Transforms\r\nData transformations operations\r\n- Added support of custom `Resize` in PyTorch (#313), `ColorInversion` (#322)\r\n\r\n## Test\r\nVerifications of the package well-being before release\r\n- Added unittest for maks IoU computation (#290)\r\n- Added unittests for rotated bbox support (#281, #297)\r\n- Added unittests for page orientation estimation (#293, #324)\r\n- Added unittests for MASTER (#300, #309)\r\n- Added test case for the focal loss of LinkNet (#304)\r\n- Added unittests for Pytorch integration (#310, #313, #317, #319, #322, #323, #327, #318, #329, #335, #340, #342)\r\n- Added unittests for IoA & NMS (#332)\r\n\r\n## 
Documentation\r\nOnline resources for potential users\r\n- Added instructions to install DocTR with PyTorch or TF (#306)\r\n- Added specific instructions to run checks in CONTRIBUTING (#321)\r\n\r\n## References\r\nReference training scrip","2021-07-02T19:32:01",{"id":236,"version":237,"summary_zh":238,"released_at":239},154619,"v0.2.1","This patch release fixes issues with the preprocessor and greatly improves text detection models.\r\n\r\n*Brought to you by @fg-mindee & @charlesmindee*\r\n\r\n**Note**: doctr 0.2.1 requires TensorFlow 2.4.0 or higher.\r\n\r\n# Highlights\r\n### Improved text detection\r\nWith this iteration, DocTR brings you a set of newly pretrained parameters for `db_resnet50`, which was trained using a much wider range of data augmentations!\r\n\r\narchitecture | FUNSD recall | FUNSD precision | CORD recall | CORD precision\r\n-- | -- | -- | -- | --\r\ndb_resnet50 + crnn_vgg16_bn (v0.2.0) | 64.8 | 70.3 | 67.7 | 78.4\r\ndb_resnet50 + crnn_vgg16_bn (v0.2.1) | 70.08 | 74.77 | 82.19 | 79.67\r\n\r\n![OCR sample](https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Freleases\u002Fdownload\u002Fv0.2.0\u002Focr.png)\r\n\r\n### Sequence prediction confidence\r\nUsers might be tempted to filter text recognition predictions, which was not easy previously without a prediction's confidence. We harmonized our recognition models to provide the sequence prediction probability.\r\n\r\nUsing the following image:\r\n![reco_sample](https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F117133599-c073fa00-ada4-11eb-831b-412de4d28341.jpeg)\r\n\r\nwith this snippet\r\n```python\r\nfrom doctr.documents import DocumentFile\r\nfrom doctr.models import recognition_predictor\r\npredictor = recognition_predictor(pretrained=True)\r\ndoc = DocumentFile.from_images(\"path\u002Fto\u002Freco_sample.jpg\")\r\nprint(predictor(doc))\r\n```\r\nwill get you a list of tuples (word value, sequence confidence):\r\n```\r\n[('invite', 0.9302278757095337)]\r\n```\r\n\r\n### More comprehensive representation of predictors\r\nFor those who play around with the predictors' components, you might value your understanding of their composition. 
In order to get a cleaner interface, we improved the representation of all predictors component.\r\n\r\nThe following snippet:\r\n```python\r\nfrom doctr.models import ocr_predictor\r\nprint(ocr_predictor())\r\n```\r\nnow yields a much cleaner representation of the predictor composition\r\n```\r\nOCRPredictor(\r\n  (det_predictor): DetectionPredictor(\r\n    (pre_processor): PreProcessor(\r\n      (resize): Resize(output_size=(1024, 1024), method='bilinear')\r\n      (normalize): Compose(\r\n        (transforms): [\r\n          LambdaTransformation(),\r\n          Normalize(mean=[0.7979999780654907, 0.7850000262260437, 0.7720000147819519], std=[0.2639999985694885, 0.27489998936653137, 0.28700000047683716]),\r\n        ]\r\n      )\r\n    )\r\n    (model): DBNet(\r\n      (feat_extractor): IntermediateLayerGetter()\r\n      (fpn): FeaturePyramidNetwork(channels=128)\r\n      (probability_head): \u003Ctensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f645f58e0>\r\n      (threshold_head): \u003Ctensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7ce15310>\r\n      (postprocessor): DBPostProcessor(box_thresh=0.1, max_candidates=1000)\r\n    )\r\n  )\r\n  (reco_predictor): RecognitionPredictor(\r\n    (pre_processor): PreProcessor(\r\n      (resize): Resize(output_size=(32, 128), method='bilinear', preserve_aspect_ratio=True, symmetric_pad=False)\r\n      (normalize): Compose(\r\n        (transforms): [\r\n          LambdaTransformation(),\r\n          Normalize(mean=[0.5, 0.5, 0.5], std=[1.0, 1.0, 1.0]),\r\n        ]\r\n      )\r\n    )\r\n    (model): CRNN(\r\n      (feat_extractor): \u003Cdoctr.models.backbones.vgg.VGG object at 0x7f6f7d866040>\r\n      (decoder): \u003Ctensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7cce2430>\r\n      (postprocessor): CTCPostProcessor(vocab_size=118)\r\n    )\r\n  )\r\n  (doc_builder): DocumentBuilder(resolve_lines=False, resolve_blocks=False, paragraph_break=0.035)\r\n)\r\n```\r\n\r\n\r\n# Breaking changes\r\n\r\n### Metrics' granularity\r\n\r\nRenamed `ExactMatch` to `TextMatch` since the metric now produces different levels of flexibility for the evaluation. 
Additionally, the constructor flags have been deprecated since the summary will provide all different types of evaluation.\r\n\r\n0.2.0 | 0.2.1\r\n-- | --\r\n`>>> from doctr.utils.metrics import ExactMatch` \u003Cbr\u002F> `>>> metric = ExactMatch(ignore_case=True)` \u003Cbr\u002F> `>>> metric.update([\"i\", \"am\", \"a\", \"jedi\"], [\"I\", \"am\", \"a\", \"sith\"])` \u003Cbr\u002F> `>>> print(metric.summary())` \u003Cbr\u002F> `0.75` | `>>> from doctr.utils.metrics import TextMatch` \u003Cbr\u002F> `>>> metric = TextMatch()` \u003Cbr\u002F> `>>> metric.update([\"i\", \"am\", \"a\", \"jedi\"], [\"I\", \"am\", \"a\", \"sith\"])` \u003Cbr\u002F> `>>> print(metric.summary())` \u003Cbr\u002F> `{'raw': 0.5, 'caseless': 0.75, 'unidecode': 0.5, 'unicase': 0.75}` |\r\n\r\nRaw being the exact match, caseless being the exact match of lower case counterparts, unidecode being the exact match of unidecoded counterparts, and unicase being the exact match of unidecoded lower-case counterparts.\r\n\r\n\r\n# New features\r\n\r\n## Models\r\nDeep learning model building and inference\r\n- Added detection features of faces (#258), bar codes (#260)\r\n- Added new pretrained weights for `db_resnet50` (#277)\r\n- Added sequence probability in text recognition (#284)\r\n\r\n## Utils\r\nUtility features relevant to the library use cases.\r\n- Added granularity on recognition metrics (#274)\r\n- Added visualiz","2021-05-28T13:52:42",{"id":241,"version":242,"summary_zh":243,"released_at":244},154620,"v0.2.0","This release improves model performances and extends library features considerably (including a minimal API template, new datasets, newly trained models).\r\n\r\n*Release handled by @fg-mindee & @charlesmindee*\r\n\r\n**Note**: doctr 0.2.0 requires TensorFlow 2.4.0 or higher.\r\n\r\n# Highlights\r\n### New pretrained weights\r\nEnjoy our newly trained detection and recognition models with improved robustness and performances!\r\nCheck our fully benchmark in the [documentation](https:\u002F\u002Fmindee.github.io\u002Fdoctr\u002Flatest\u002Fmodels.html#end-to-end-ocr) for further details.\r\n\r\n### Improved Line & block detection\r\nThis release comes with a large improvement of line detection. While it is only done in post-processing for now, we considered many cases to make sure you get a consistent and helpful result:\r\n\r\nBefore | After\r\n-- | --\r\n![Before](https:\u002F\u002Fuser-images.githubusercontent.com\u002F70526046\u002F116271250-1979d780-a780-11eb-99cc-f4564fa4c3f0.png) | ![After](https:\u002F\u002Fuser-images.githubusercontent.com\u002F70526046\u002F116271231-15e65080-a780-11eb-965f-3636de849ae6.png)\r\n\r\n### File reading from any source\r\nYou can now expect reading images or PDF from files, binary streams, or even URLs. 
We completely revamped our document reading pipeline with the new `DocumentFile` class methods\r\n\r\n```python\r\nfrom doctr.documents import DocumentFile\r\n# PDF\r\npdf_doc = DocumentFile.from_pdf(\"path\u002Fto\u002Fyour\u002Fdoc.pdf\").as_images()\r\n# Image\r\nsingle_img_doc = DocumentFile.from_images(\"path\u002Fto\u002Fyour\u002Fimg.jpg\")\r\n# Multiple page images\r\nmulti_img_doc = DocumentFile.from_images([\"path\u002Fto\u002Fpage1.jpg\", \"path\u002Fto\u002Fpage2.jpg\"])\r\n# Web page\r\nwebpage_doc = DocumentFile.from_url(\"https:\u002F\u002Fwww.yoursite.com\").as_images()\r\n```\r\nIf by any chance your PDF is a source file (web pages are converted into such PDFs) and not a scanned version, you will also be able to read the information inside\r\n```python\r\nfrom doctr.documents import DocumentFile\r\npdf_doc = DocumentFile.from_pdf(\"path\u002Fto\u002Fyour\u002Fdoc.pdf\")\r\n# Retrieve bounding box and text information\r\nwords = pdf_doc.get_words()\r\n```\r\n\r\n### Reference scripts for training\r\nBy adding multithreaded dataloaders and transformations to DocTR, we can now provide you with reference training scripts to train models on your own!\r\n\r\nText detection script (additional details available in [README](https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fblob\u002Fmain\u002Freferences\u002Fdetection\u002FREADME.md))\r\n```shell\r\npython references\u002Fdetection\u002Ftrain.py \u002Fpath\u002Fto\u002Fdataset db_resnet50 -b 8 --input-size 512 --epochs 20\r\n```\r\nText recognition script (additional details available in [README](https:\u002F\u002Fgithub.com\u002Fmindee\u002Fdoctr\u002Fblob\u002Fmain\u002Freferences\u002Frecognition\u002FREADME.md))\r\n```shell\r\npython references\u002Frecognition\u002Ftrain.py \u002Fpath\u002Fto\u002Fdataset crnn_vgg16_bn -b 8 --epochs 20\r\n```\r\n\r\n### Minimal API\r\nIf you enjoy DocTR, you might want to integrate it in your API. For your convenience, we added a minimal API template with routes for text detection, text recognition or plain OCR!\r\n\r\nRun it as follows in a docker container:\r\n```shell\r\nPORT=8050 docker-compose up -d --build\r\n```\r\nYour API is now running locally on port 8050! 
Navigate to http:\u002F\u002Flocalhost:8050\u002Fredoc to check your documentation\r\n![API doc](https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F117133559-b225de00-ada4-11eb-96ba-bd56c1e8d3f3.png)\r\n\r\nOr start making your first request!\r\n```python\r\nimport requests\r\nimport io\r\nwith open('\u002Fpath\u002Fto\u002Fyour\u002Fimage.jpeg', 'rb') as f:\r\n    data = f.read()\r\nresponse = requests.post(\"http:\u002F\u002Flocalhost:8050\u002Frecognition\", files={'file': io.BytesIO(data)})\r\n```\r\n\r\n\r\n# Breaking changes\r\n### Support dropped for TF \u003C 2.4.0\r\nIn order to ensure that all compression features are fully functional in DocTR, support for TensorFlow \u003C 2.4.0 has been dropped.\r\n\r\n### Less confusing predictor's inputs\r\n`OCRPredictor` used to be taking a list of documents as input, and now only takes list of pages.\r\n\r\n0.1.1 | 0.2.0\r\n-- | --\r\n`>>> predictor = ...` \u003Cbr\u002F> `>>> page = np.zeros((h, w, 3), dtype=np.uint8)` \u003Cbr\u002F> `>>> out = predictor([[page]])` | `>>> predictor = ...`\u003Cbr\u002F> `>>> page = np.zeros((h, w, 3), dtype=np.uint8)`\u003Cbr\u002F> `>>> out = predictor([page])` |\r\n\r\n### Model calls\r\nTo gain more flexibility on the training side, the model call method was changed to yield a dictionary with multiple entries\r\n\r\n0.1.1 | 0.2.0\r\n-- | --\r\n`>>> from doctr.models import db_resnet50, DBPostProcessor` \u003Cbr\u002F> `>>> model = db_resnet50(pretrained=True)` \u003Cbr\u002F> `>>> postprocessor = DBPostProcessor()` \u003Cbr\u002F> `>>> prob_map = model(input_t, training=False)` \u003Cbr\u002F> `>>> boxes = postprocessor(prob_map)` | `>>> from doctr.models import db_resnet50` \u003Cbr\u002F> `>>> model = db_resnet50(pretrained=True)`\u003Cbr\u002F> `>>> out = model(input_t, training=False)`\u003Cbr\u002F> `>>> boxes = out['boxes']` |\r\n\r\n# New features\r\n\r\n## Datasets\r\nEasy-to-use datasets for OCR\r\n- Added support of SROIE (#165) and CORD (#197)\r\n- Added recognition dataloader (#163)\r\n- Added sequence encoding function (#184)\r\n- Added `DataLoader` as a dataset wrapper for parallel high performance data reading (#198, #201)\r\n- Added sup","2021-05-11T16:53:16",{"id":246,"version":247,"summary_zh":248,"released_at":249},154621,"v0.1.1","This release patch fixes several bugs, introduces OCR datasets and improves model performances.\r\n\r\n*Release handled by @fg-mindee & @charlesmindee*\r\n\r\n**Note**: doctr 0.1.1 requires TensorFlow 2.3.0 or higher.\r\n\r\n# Highlights\r\n### Introduction of vision datasets\r\nWhether this is for training or evaluation purposes, DocTR provides you with objects to easily download and manipulate datasets. Access OCR datasets within a few lines of code:\r\n\r\n```\r\nfrom doctr.datasets import FUNSD\r\ntrain_set = FUNSD(train=True, download=True)\r\nimg, target = train_set[0]\r\n``` \r\n\r\n### Model evaluation\r\nWhile DocTR 0.1.0 gave you access to pretrained models, you had no way to find the performances of these models apart from computing them yourselves. As of now, we have added a performance benchmark in our documentation for all our models and made the evaluation script available for seamless reproducibility:\r\n```\r\npython scripts\u002Fevaluate.py ocr_db_crnn_vgg\r\n```\r\n\r\n### Demo app\r\nSince we want to make DocTR a convenience for you to build OCR-related applications and services, we made a minimal [Streamlit](https:\u002F\u002Fstreamlit.io\u002F) demo app to showcase its text detection capabilities. 
You can run the demo with the following commands:\r\n```\r\nstreamlit run demo\u002Fapp.py\r\n```\r\n\r\nHere is how it renders performing text detection on a sample document:\r\n![doctr_demo](https:\u002F\u002Fuser-images.githubusercontent.com\u002F76527547\u002F111645201-c4ea5080-8800-11eb-9807-fd69459e1067.png)\r\n\r\n\r\n# Breaking changes\r\n### Metric update & summary\r\nFor improved clarity, the evaluation metrics' methods were renamed.\r\n\r\n0.1.0 | 0.1.1\r\n-- | --\r\n`>>> from doctr.utils import ExactMatch` \u003Cbr\u002F> `>>> metric = ExactMatch()` \u003Cbr\u002F> `>>> metric.update_state(['Hello', 'world'], ['hello', 'world'])`\u003Cbr\u002F> `>>> metric.result()` | `>>> from doctr.utils import ExactMatch` \u003Cbr\u002F> `>>> metric = ExactMatch()`\u003Cbr\u002F> `>>> metric.update(['Hello', 'world'], ['hello', 'world'])`\u003Cbr\u002F> `>>> metric.summary()` |\r\n\r\n### Renaming of high-level predictors\r\nAs the range of backbones and combinations evolves, we have updated the name of high-level predictors:\r\n0.1.0 | 0.1.1\r\n-- | --\r\n`>>> from doctr.models import ocr_db_crnn` | `>>> from doctr.models import ocr_db_crnn_vgg` |\r\n\r\n# New features\r\n\r\n## Datasets\r\nEasy-to-use datasets for OCR\r\n- Added predefined vocabs (#116)\r\n- Added string encoding\u002Fdecoding utilities (#116)\r\n- Added `FUNSD` dataset (#136, #141)\r\n\r\n## Models\r\nDeep learning model building and inference\r\n- Added ResNet-31 backbone to SAR (#132) and CRNN (#148)\r\n\r\n## Utils\r\nUtility features relevant to the library use cases.\r\n- Added localization (#117) & end-to-end OCR (#122, #141) metrics\r\n\r\n\r\n## Test\r\nVerifications of the package well-being before release\r\n- Added unittests for evaluation metrics (#117, #122)\r\n- Added unittests for string encoding\u002Fdecoding (#116)\r\n- Added unittests for datasets (#136, #141)\r\n- Added unittests for pretrained `crnn_resnet31` (#148), and OCR predictors (#150)\r\n\r\n\r\n## Documentation\r\nOnline resources for potential users\r\n- Added pypi badge to README (#114)\r\n- Added pypi installation instructions to documentation (#114)\r\n- Added evaluation metric section (#117, #122, #158)\r\n- Added multi-version documentation deployment (#123)\r\n- Added datasets page in documentation (#136, #154)\r\n- Added performance benchmark on `FUNSD` in documentation (#143, #149, #150, #155)\r\n- Added instructions in README to run the demo app (#146)\r\n- Added `sar_resnet31` to recognition models documentation (#150) \r\n\r\n## Others\r\nOther tools and implementations\r\n- Added default label to bug report issues (#121)\r\n- Updated CI job for documentation build (#123)\r\n- Added CI job to ensure `analyze.py` script runs (#142)\r\n- Added evaluation script (#141, #145, #151)\r\n- Added text detection demo app (#146)\r\n\r\n# Bug fixes\r\n## Models\r\n- Fixed no-detection predictor export (#119)\r\n- Fixed edge case of polygon to box computation (#139)\r\n- Fixed DB `bitmap_to_boxes` method (#155)\r\n\r\n## Utils\r\n- Fixed typo in `ExactMatch` (#120)\r\n- Fixed IoU computation when boxes are distant (#140)\r\n\r\n## Test\r\n\r\n## Documentation\r\n- Fixed docstring examples of predictors (#126)\r\n- Fixed multi-version documentation build (#138)\r\n- Fixed docstrings of `VisionDataset` and `FUNSD` (#147)\r\n- Fixed usage instructions in README (#150)\r\n- Fixed installation instructions in documentation (#154)\r\n\r\n## Others\r\n- Fixed pypi release CI job (#153)\r\n\r\n# Improvements\r\n\r\n## Models\r\n- Added dimension check 
on predictor's inputs (#126)\r\n- Updated pretrained DBNet URLs (#129, #150)\r\n- Improved DBNet post-processing (#130, #150, #155, #157)\r\n- Moved normalization parameters to config (#133, #150)\r\n- Refactored file downloading (#136)\r\n- Increased default batch size for recognition (#143)\r\n- Updated `max_length` and `input_shape` of SAR (#143)\r\n- Added support of absolute coordinates for crop extraction (#145)\r\n- Added proper kernel sizing to silence TF unresolved checkpoints warnings (#152, #156)\r\n\r\n## Utils\r\n- Renamed state updating and summarizing ","2021-03-18T14:48:08",{"id":251,"version":252,"summary_zh":253,"released_at":254},154622,"v0.1.0","This first release adds pretrained models for end-to-end OCR and document manipulation utilities.\r\n\r\n*Release handled by @fg-mindee & @charlesmindee*\r\n\r\n**Note**: doctr 0.1.0 requires TensorFlow 2.3.0 or newer.\r\n\r\n# Highlights\r\n## Easy & high-performing document reading\r\nSince document processing is at the core of this project, being able to read documents efficiently is a priority. In this release, we considered PDF and image-based files.\r\n\r\nPDF reading is a wrapper around [PyMuPDF](https:\u002F\u002Fgithub.com\u002Fpymupdf\u002FPyMuPDF) back-end for fast file reading\r\n```\r\nfrom doctr.documents import read_pdf\r\n# from path\r\ndoc = read_pdf(\"path\u002Fto\u002Fyour\u002Fdoc.pdf\")\r\n# from stream\r\nwith open(\"path\u002Fto\u002Fyour\u002Fdoc.pdf\", 'rb') as f:\r\n    doc = read_pdf(f.read())\r\n```\r\nwhile image reading is using [OpenCV](https:\u002F\u002Fgithub.com\u002Fopencv\u002Fopencv) backend\r\n```\r\nfrom doctr.documents import read_img\r\npage = read_img(\"path\u002Fto\u002Fyour\u002Fimg.jpg\")\r\n```\r\n\r\n## Pretrained End-to-End OCR predictors\r\nWhether you conduct text detection, text recognition or end-to-end OCR, this release brings you pretrained models and advanced predictors (that will take care of all preprocessing, model inference and post-processing for you) for easy-to-use pythonic features\r\n\r\n### Text detection\r\nCurrently, only [DBNet](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1911.08947.pdf)-based architectures are supported, more to come in the next releases!\r\n```\r\nfrom doctr.documents import read_pdf\r\nfrom doctr.models import db_resnet50_predictor\r\nmodel = db_resnet50_predictor(pretrained=True)\r\ndoc = read_pdf(\"path\u002Fto\u002Fyour\u002Fdoc.pdf\")\r\nresult = model(doc)\r\n```\r\n\r\n### Text recognition\r\nThere are two architectures implemented for recognition: [CRNN](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1602.05875.pdf), and [SAR](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1811.00751.pdf)\r\n```\r\nfrom doctr.models import crnn_vgg16_bn_predictor\r\nmodel = crnn_vgg16_bn_predictor(pretrained=True)\r\n```\r\n\r\n### End-to-End OCR\r\nSimply combining two models into a two-stage architecture, OCR predictors bring you the easiest way to analyze your document\r\n```\r\nfrom doctr.documents import read_pdf\r\nfrom doctr.models import ocr_db_crnn\r\n\r\nmodel = ocr_db_crnn(pretrained=True)\r\ndoc = read_pdf(\"path\u002Fto\u002Fyour\u002Fdoc.pdf\")\r\nresult = model([doc])\r\n```\r\n\r\n\r\n# New features\r\n\r\n## Documents\r\nDocumentation reading and manipulation\r\n- Added PDF (#8, #18, #25, #83) and image (#30, #79) reading utilities\r\n- Added document structured elements for export (#16, #26, #61, #102)\r\n\r\n## Models\r\nDeep learning model building and inference\r\n- Added model export methods (#10) \r\n- Added preprocessing module (#20, #25, 
#36, #50, #55, #77)\r\n- Added text detection model and post-processing (#24, #32, #36, #43, #49, #51, #84): DBNet\r\n- Added image cropping function (#33, #44)\r\n- Added model param loading function (#49, #60)\r\n- Added text recognition post-processing (#35, #36, #37, #38, #43, #45, #49, #51, #63, #65, #74, #78, #84, #101, #107, #108, #111, #112): SAR & CRNN\r\n- Added task-specific predictors (#39, #52, #58, #62, #85, #98, #102)\r\n- Added VGG16 (#36), Resnet31 (#70) backbones\r\n\r\n## Utils\r\nUtility features relevant to the library use cases.\r\n- Added page interactive prediction visualization (#54, #82)\r\n- Added custom types (#87)\r\n- Added abstract auto-repr object (#102)\r\n- Added metric module (#110)\r\n\r\n\r\n## Test\r\nVerifications of the package well-being before release\r\n- Added pytest unittests (#7, #59, #75, #76, #80, #92, #104)\r\n\r\n\r\n## Documentation\r\nOnline resources for potential users\r\n- Updated README (#9, #48, #67, #68, #95)\r\n- Added CONTRIBUTING (#7, #29, #48, #67)\r\n- Added sphinx built documentation (#12, #36, #55, #86, #90, #91, #93, #96, #99, #106)\r\n\r\n## Others\r\nOther tools and implementations\r\n- Added python package setup (#7, #21, #67)\r\n- Added CI verifications (#7, #67, #69, #73)\r\n- Added dockerized environment with library installed (#17, #19)\r\n- Added issue template (#34)\r\n- Added environment collection script (#81)\r\n- Added analysis script (#85, #95, #103)","2021-03-05T19:21:56"]