[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-open-mmlab--mmocr":3,"tool-open-mmlab--mmocr":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",150037,2,"2026-04-10T23:33:47",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 
100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":74,"owner_website":78,"owner_url":79,"languages":80,"stars":93,"forks":94,"last_commit_at":95,"license":96,"difficulty_score":10,"env_os":97,"env_gpu":98,"env_ram":99,"env_deps":100,"category_tags":110,"github_topics":111,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":132,"updated_at":133,"faqs":134,"releases":165},6564,"open-mmlab\u002Fmmocr","mmocr","OpenMMLab Text Detection, Recognition and Understanding Toolbox","MMOCR 是 OpenMMLab 推出的一款开源工具箱，专注于文字检测、识别与理解任务。简单来说，它能帮助计算机“看懂”图片中的文字：不仅能精准定位文字在图像中的位置（检测），还能将图像中的文字内容转化为可编辑的文本（识别），甚至进一步分析文本的结构与含义（理解）。\n\n在日常应用中，无论是从扫描件中提取信息、识别路牌广告，还是处理复杂版面文档，MMOCR 都能有效解决传统方法难以应对的弯曲文字、多语言混合及复杂背景干扰等难题。它提供了一套完整且模块化的解决方案，让用户无需从零开始构建算法，即可快速部署高精度的 OCR 应用。\n\n这款工具特别适合人工智能开发者、科研人员以及需要处理大量图像文本数据的企业技术人员使用。对于研究者而言，MMOCR 内置了丰富的主流算法模型和详尽的训练评测流程，便于复现论文成果或进行二次创新；对于开发者，其清晰的代码结构和便捷的接口能大幅降低项目落地门槛。\n\nMMOCR 的技术亮点在于其强大的生态整合能力与灵活性。它不仅支持多种前沿深度学习架构，还允许用户通过配置轻松切换不同模型，实现从轻量级移动端部署到高精度服务器端推理的自由适配。凭借活跃的社区维护和持续","MMOCR 是 OpenMMLab 推出的一款开源工具箱，专注于文字检测、识别与理解任务。简单来说，它能帮助计算机“看懂”图片中的文字：不仅能精准定位文字在图像中的位置（检测），还能将图像中的文字内容转化为可编辑的文本（识别），甚至进一步分析文本的结构与含义（理解）。\n\n在日常应用中，无论是从扫描件中提取信息、识别路牌广告，还是处理复杂版面文档，MMOCR 都能有效解决传统方法难以应对的弯曲文字、多语言混合及复杂背景干扰等难题。它提供了一套完整且模块化的解决方案，让用户无需从零开始构建算法，即可快速部署高精度的 OCR 应用。\n\n这款工具特别适合人工智能开发者、科研人员以及需要处理大量图像文本数据的企业技术人员使用。对于研究者而言，MMOCR 内置了丰富的主流算法模型和详尽的训练评测流程，便于复现论文成果或进行二次创新；对于开发者，其清晰的代码结构和便捷的接口能大幅降低项目落地门槛。\n\nMMOCR 的技术亮点在于其强大的生态整合能力与灵活性。它不仅支持多种前沿深度学习架构，还允许用户通过配置轻松切换不同模型，实现从轻量级移动端部署到高精度服务器端推理的自由适配。凭借活跃的社区维护和持续的版本迭代，MMOCR 已成为当前中文乃至全球 OCR 领域极具参考价值的基础设施之一。","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_6b349ec9156a.png\" width=\"500px\"\u002F>\n  \u003Cdiv>&nbsp;\u003C\u002Fdiv>\n  \u003Cdiv align=\"center\">\n    \u003Cb>\u003Cfont size=\"5\">OpenMMLab website\u003C\u002Ffont>\u003C\u002Fb>\n    \u003Csup>\n      \u003Ca href=\"https:\u002F\u002Fopenmmlab.com\">\n        \u003Ci>\u003Cfont size=\"4\">HOT\u003C\u002Ffont>\u003C\u002Fi>\n      \u003C\u002Fa>\n    \u003C\u002Fsup>\n    &nbsp;&nbsp;&nbsp;&nbsp;\n    \u003Cb>\u003Cfont size=\"5\">OpenMMLab platform\u003C\u002Ffont>\u003C\u002Fb>\n    \u003Csup>\n      \u003Ca href=\"https:\u002F\u002Fplatform.openmmlab.com\">\n        \u003Ci>\u003Cfont size=\"4\">TRY IT 
OUT\u003C\u002Ffont>\u003C\u002Fi>\n      \u003C\u002Fa>\n    \u003C\u002Fsup>\n  \u003C\u002Fdiv>\n  \u003Cdiv>&nbsp;\u003C\u002Fdiv>\n\n[![build](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fworkflows\u002Fbuild\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Factions)\n[![docs](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_6bf48b3e9a6d.png)](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002F?badge=dev-1.x)\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fopen-mmlab\u002Fmmocr\u002Fbranch\u002Fmain\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fopen-mmlab\u002Fmmocr)\n[![license](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fopen-mmlab\u002Fmmocr.svg)](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fblob\u002Fmain\u002FLICENSE)\n[![PyPI](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fmmocr.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmmocr\u002F)\n[![Average time to resolve an issue](https:\u002F\u002Fisitmaintained.com\u002Fbadge\u002Fresolution\u002Fopen-mmlab\u002Fmmocr.svg)](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fissues)\n[![Percentage of issues still open](https:\u002F\u002Fisitmaintained.com\u002Fbadge\u002Fopen\u002Fopen-mmlab\u002Fmmocr.svg)](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fissues)\n\u003Ca href=\"https:\u002F\u002Fconsole.tiyaro.ai\u002Fexplore?q=mmocr&pub=mmocr\"> \u003Cimg src=\"https:\u002F\u002Ftiyaro-public-docs.s3.us-west-2.amazonaws.com\u002Fassets\u002Ftry_on_tiyaro_badge.svg\">\u003C\u002Fa>\n\n[📘Documentation](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002F) |\n[🛠️Installation](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fget_started\u002Finstall.html) |\n[👀Model Zoo](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmodelzoo.html) |\n[🆕Update News](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fnotes\u002Fchangelog.html) |\n[🤔Reporting Issues](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fissues\u002Fnew\u002Fchoose)\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\nEnglish | [简体中文](README_zh-CN.md)\n\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fopenmmlab.medium.com\u002F\" style=\"text-decoration:none;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_062337b0e5ec.png\" width=\"3%\" alt=\"\" \u002F>\u003C\u002Fa>\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F25839884\u002F218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png\" width=\"3%\" alt=\"\" \u002F>\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FraweFPmdzG\" style=\"text-decoration:none;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_6342e5371027.png\" width=\"3%\" alt=\"\" \u002F>\u003C\u002Fa>\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F25839884\u002F218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png\" width=\"3%\" alt=\"\" \u002F>\n  \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002FOpenMMLab\" style=\"text-decoration:none;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_04c3beda0b07.png\" width=\"3%\" alt=\"\" \u002F>\u003C\u002Fa>\n  \u003Cimg 
src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F25839884\u002F218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png\" width=\"3%\" alt=\"\" \u002F>\n  \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002Fopenmmlab\" style=\"text-decoration:none;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_204fe79b5a90.png\" width=\"3%\" alt=\"\" \u002F>\u003C\u002Fa>\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F25839884\u002F218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png\" width=\"3%\" alt=\"\" \u002F>\n  \u003Ca href=\"https:\u002F\u002Fspace.bilibili.com\u002F1293512903\" style=\"text-decoration:none;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_8655b6233577.png\" width=\"3%\" alt=\"\" \u002F>\u003C\u002Fa>\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F25839884\u002F218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png\" width=\"3%\" alt=\"\" \u002F>\n  \u003Ca href=\"https:\u002F\u002Fwww.zhihu.com\u002Fpeople\u002Fopenmmlab\" style=\"text-decoration:none;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_447c4737c11f.png\" width=\"3%\" alt=\"\" \u002F>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n## Latest Updates\n\n**The default branch is now `main` and the code on the branch has been upgraded to v1.0.0. The old `main` branch (v0.6.3) code now exists on the `0.x` branch.** If you have been using the `main` branch and encounter upgrade issues, please read the [Migration Guide](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmigration\u002Foverview.html) and notes on [Branches](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmigration\u002Fbranches.html) .\n\nv1.0.0 was released in 2023-04-06. Major updates from 1.0.0rc6 include:\n\n1. Support for SCUT-CTW1500, SynthText, and MJSynth datasets in Dataset Preparer\n2. Updated FAQ and documentation\n3. Deprecation of file_client_args in favor of backend_args\n4. Added a new MMOCR tutorial notebook\n\nTo know more about the updates in MMOCR 1.0, please refer to [What's New in MMOCR 1.x](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmigration\u002Fnews.html), or\nRead [Changelog](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fnotes\u002Fchangelog.html) for more details!\n\n## Introduction\n\nMMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the corresponding downstream tasks including key information extraction. It is part of the [OpenMMLab](https:\u002F\u002Fopenmmlab.com\u002F) project.\n\nThe main branch works with **PyTorch 1.6+**.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_0254ecd2898c.png\"\u002F>\n\u003C\u002Fdiv>\n\n### Major Features\n\n- **Comprehensive Pipeline**\n\n  The toolbox supports not only text detection and text recognition, but also their downstream tasks such as key information extraction.\n\n- **Multiple Models**\n\n  The toolbox supports a wide variety of state-of-the-art models for text detection, text recognition and key information extraction.\n\n- **Modular Design**\n\n  The modular design of MMOCR enables users to define their own optimizers, data preprocessors, and model components such as backbones, necks and heads as well as losses. 
Please refer to [Overview](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fget_started\u002Foverview.html) for how to construct a customized model.\n\n- **Numerous Utilities**\n\n  The toolbox provides a comprehensive set of utilities which can help users assess the performance of models. It includes visualizers which allow visualization of images, ground truths as well as predicted bounding boxes, and a validation tool for evaluating checkpoints during training.  It also includes data converters to demonstrate how to convert your own data to the annotation files which the toolbox supports.\n\n## Installation\n\nMMOCR depends on [PyTorch](https:\u002F\u002Fpytorch.org\u002F), [MMEngine](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmengine), [MMCV](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmcv) and [MMDetection](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmdetection).\nBelow are quick steps for installation.\nPlease refer to [Install Guide](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fget_started\u002Finstall.html) for more detailed instruction.\n\n```shell\nconda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y\nconda activate open-mmlab\npip3 install openmim\ngit clone https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr.git\ncd mmocr\nmim install -e .\n```\n\n## Get Started\n\nPlease see [Quick Run](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fget_started\u002Fquick_run.html) for the basic usage of MMOCR.\n\n## [Model Zoo](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmodelzoo.html)\n\nSupported algorithms:\n\n\u003Cdetails open>\n\u003Csummary>BackBone\u003C\u002Fsummary>\n\n- [x] [oCLIP](configs\u002Fbackbone\u002Foclip\u002FREADME.md) (ECCV'2022)\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>Text Detection\u003C\u002Fsummary>\n\n- [x] [DBNet](configs\u002Ftextdet\u002Fdbnet\u002FREADME.md) (AAAI'2020) \u002F [DBNet++](configs\u002Ftextdet\u002Fdbnetpp\u002FREADME.md) (TPAMI'2022)\n- [x] [Mask R-CNN](configs\u002Ftextdet\u002Fmaskrcnn\u002FREADME.md) (ICCV'2017)\n- [x] [PANet](configs\u002Ftextdet\u002Fpanet\u002FREADME.md) (ICCV'2019)\n- [x] [PSENet](configs\u002Ftextdet\u002Fpsenet\u002FREADME.md) (CVPR'2019)\n- [x] [TextSnake](configs\u002Ftextdet\u002Ftextsnake\u002FREADME.md) (ECCV'2018)\n- [x] [DRRG](configs\u002Ftextdet\u002Fdrrg\u002FREADME.md) (CVPR'2020)\n- [x] [FCENet](configs\u002Ftextdet\u002Ffcenet\u002FREADME.md) (CVPR'2021)\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>Text Recognition\u003C\u002Fsummary>\n\n- [x] [ABINet](configs\u002Ftextrecog\u002Fabinet\u002FREADME.md) (CVPR'2021)\n- [x] [ASTER](configs\u002Ftextrecog\u002Faster\u002FREADME.md) (TPAMI'2018)\n- [x] [CRNN](configs\u002Ftextrecog\u002Fcrnn\u002FREADME.md) (TPAMI'2016)\n- [x] [MASTER](configs\u002Ftextrecog\u002Fmaster\u002FREADME.md) (PR'2021)\n- [x] [NRTR](configs\u002Ftextrecog\u002Fnrtr\u002FREADME.md) (ICDAR'2019)\n- [x] [RobustScanner](configs\u002Ftextrecog\u002Frobust_scanner\u002FREADME.md) (ECCV'2020)\n- [x] [SAR](configs\u002Ftextrecog\u002Fsar\u002FREADME.md) (AAAI'2019)\n- [x] [SATRN](configs\u002Ftextrecog\u002Fsatrn\u002FREADME.md) (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era)\n- [x] [SVTR](configs\u002Ftextrecog\u002Fsvtr\u002FREADME.md) (IJCAI'2022)\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>Key Information Extraction\u003C\u002Fsummary>\n\n- [x] 
[SDMG-R](configs\u002Fkie\u002Fsdmgr\u002FREADME.md) (ArXiv'2021)\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>Text Spotting\u003C\u002Fsummary>\n\n- [x] [ABCNet](projects\u002FABCNet\u002FREADME.md) (CVPR'2020)\n- [x] [ABCNetV2](projects\u002FABCNet\u002FREADME_V2.md) (TPAMI'2021)\n- [x] [SPTS](projects\u002FSPTS\u002FREADME.md) (ACM MM'2022)\n\n\u003C\u002Fdetails>\n\nPlease refer to [model_zoo](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmodelzoo.html) for more details.\n\n## Projects\n\n[Here](projects\u002FREADME.md) are some implementations of SOTA models and solutions built on MMOCR, which are supported and maintained by community users. These projects demonstrate the best practices based on MMOCR for research and product development. We welcome and appreciate all the contributions to OpenMMLab ecosystem.\n\n## Contributing\n\nWe appreciate all contributions to improve MMOCR. Please refer to [CONTRIBUTING.md](.github\u002FCONTRIBUTING.md) for the contributing guidelines.\n\n## Acknowledgement\n\nMMOCR is an open-source project that is contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedbacks.\nWe hope the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their own new OCR methods.\n\n## Citation\n\nIf you find this project useful in your research, please consider cite:\n\n```bibtex\n@article{mmocr2022,\n    title={MMOCR:  A Comprehensive Toolbox for Text Detection, Recognition and Understanding},\n    author={MMOCR Developer Team},\n    howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr}},\n    year={2022}\n}\n```\n\n## License\n\nThis project is released under the [Apache 2.0 license](LICENSE).\n\n## OpenMMLab Family\n\n- [MMEngine](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmengine): OpenMMLab foundational library for training deep learning models\n- [MMCV](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmcv): OpenMMLab foundational library for computer vision.\n- [MIM](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmim): MIM installs OpenMMLab packages.\n- [MMClassification](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmclassification): OpenMMLab image classification toolbox and benchmark.\n- [MMDetection](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmdetection): OpenMMLab detection toolbox and benchmark.\n- [MMDetection3D](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmdetection3d): OpenMMLab's next-generation platform for general 3D object detection.\n- [MMRotate](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmrotate): OpenMMLab rotated object detection toolbox and benchmark.\n- [MMSegmentation](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmsegmentation): OpenMMLab semantic segmentation toolbox and benchmark.\n- [MMOCR](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr): OpenMMLab text detection, recognition, and understanding toolbox.\n- [MMPose](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmpose): OpenMMLab pose estimation toolbox and benchmark.\n- [MMHuman3D](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmhuman3d): OpenMMLab 3D human parametric model toolbox and benchmark.\n- [MMSelfSup](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmselfsup): OpenMMLab self-supervised learning toolbox and benchmark.\n- 
[MMRazor](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmrazor): OpenMMLab model compression toolbox and benchmark.\n- [MMFewShot](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmfewshot): OpenMMLab fewshot learning toolbox and benchmark.\n- [MMAction2](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmaction2): OpenMMLab's next-generation action understanding toolbox and benchmark.\n- [MMTracking](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmtracking): OpenMMLab video perception toolbox and benchmark.\n- [MMFlow](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmflow): OpenMMLab optical flow toolbox and benchmark.\n- [MMEditing](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmediting): OpenMMLab image and video editing toolbox.\n- [MMGeneration](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmgeneration): OpenMMLab image and video generative models toolbox.\n- [MMDeploy](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmdeploy): OpenMMLab model deployment framework.\n\n## Welcome to the OpenMMLab community\n\nScan the QR code below to follow the OpenMMLab team's [**Zhihu Official Account**](https:\u002F\u002Fwww.zhihu.com\u002Fpeople\u002Fopenmmlab) and join the OpenMMLab team's [**QQ Group**](https:\u002F\u002Fjq.qq.com\u002F?_wv=1027&k=aCvMxdr3), or join the official WeChat communication group by adding the official WeChat account, or join our [**Slack**](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fmmocrworkspace\u002Fshared_invite\u002Fzt-1ifqhfla8-yKnLO_aKhVA2h71OrK8GZw)\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fopen-mmlab\u002Fmmcv\u002Fmaster\u002Fdocs\u002Fen\u002F_static\u002Fzhihu_qrcode.jpg\" height=\"400\" \u002F>  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fopen-mmlab\u002Fmmcv\u002Fmaster\u002Fdocs\u002Fen\u002F_static\u002Fqq_group_qrcode.jpg\" height=\"400\" \u002F>  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fopen-mmlab\u002Fmmcv\u002Fmaster\u002Fdocs\u002Fen\u002F_static\u002Fwechat_qrcode.jpg\" height=\"400\" \u002F>\n\u003C\u002Fdiv>\n\nIn the OpenMMLab community, we will provide you with:\n\n- 📢 The latest core technologies of AI frameworks\n- 💻 Explanations of the source code of common PyTorch modules\n- 📰 News related to OpenMMLab releases\n- 🚀 Introductions to cutting-edge algorithms developed by OpenMMLab\n- 🏃 More efficient answers and feedback\n- 🔥 A platform for communication with developers from all walks of life\n\nThe OpenMMLab community looks forward to your participation! 
👬\n","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_6b349ec9156a.png\" width=\"500px\"\u002F>\n  \u003Cdiv>&nbsp;\u003C\u002Fdiv>\n  \u003Cdiv align=\"center\">\n    \u003Cb>\u003Cfont size=\"5\">OpenMMLab官网\u003C\u002Ffont>\u003C\u002Fb>\n    \u003Csup>\n      \u003Ca href=\"https:\u002F\u002Fopenmmlab.com\">\n        \u003Ci>\u003Cfont size=\"4\">热门\u003C\u002Ffont>\u003C\u002Fi>\n      \u003C\u002Fa>\n    \u003C\u002Fsup>\n    &nbsp;&nbsp;&nbsp;&nbsp;\n    \u003Cb>\u003Cfont size=\"5\">OpenMMLab平台\u003C\u002Ffont>\u003C\u002Fb>\n    \u003Csup>\n      \u003Ca href=\"https:\u002F\u002Fplatform.openmmlab.com\">\n        \u003Ci>\u003Cfont size=\"4\">立即体验\u003C\u002Ffont>\u003C\u002Fi>\n      \u003C\u002Fa>\n    \u003C\u002Fsup>\n  \u003C\u002Fdiv>\n  \u003Cdiv>&nbsp;\u003C\u002Fdiv>\n\n[![构建](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fworkflows\u002Fbuild\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Factions)\n[![文档](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_6bf48b3e9a6d.png)](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002F?badge=dev-1.x)\n[![代码覆盖率](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fopen-mmlab\u002Fmmocr\u002Fbranch\u002Fmain\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fopen-mmlab\u002Fmmocr)\n[![许可证](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fopen-mmlab\u002Fmmocr.svg)](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fblob\u002Fmain\u002FLICENSE)\n[![PyPI](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fmmocr.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmmocr\u002F)\n[![解决issue的平均时间](https:\u002F\u002Fisitmaintained.com\u002Fbadge\u002Fresolution\u002Fopen-mmlab\u002Fmmocr.svg)](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fissues)\n[![仍未解决的issue比例](https:\u002F\u002Fisitmaintained.com\u002Fbadge\u002Fopen\u002Fopen-mmlab\u002Fmmocr.svg)](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fissues)\n\u003Ca href=\"https:\u002F\u002Fconsole.tiyaro.ai\u002Fexplore?q=mmocr&pub=mmocr\"> \u003Cimg src=\"https:\u002F\u002Ftiyaro-public-docs.s3.us-west-2.amazonaws.com\u002Fassets\u002Ftry_on_tiyaro_badge.svg\">\u003C\u002Fa>\n\n[📘文档](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002F) |\n[🛠️安装](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fget_started\u002Finstall.html) |\n[👀模型库](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmodelzoo.html) |\n[🆕更新消息](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fnotes\u002Fchangelog.html) |\n[🤔提交问题](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fissues\u002Fnew\u002Fchoose)\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\nEnglish | [简体中文](README_zh-CN.md)\n\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fopenmmlab.medium.com\u002F\" style=\"text-decoration:none;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_062337b0e5ec.png\" width=\"3%\" alt=\"\" \u002F>\u003C\u002Fa>\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F25839884\u002F218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png\" width=\"3%\" alt=\"\" \u002F>\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FraweFPmdzG\" style=\"text-decoration:none;\">\n    \u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_6342e5371027.png\" width=\"3%\" alt=\"\" \u002F>\u003C\u002Fa>\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F25839884\u002F218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png\" width=\"3%\" alt=\"\" \u002F>\n  \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002FOpenMMLab\" style=\"text-decoration:none;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_04c3beda0b07.png\" width=\"3%\" alt=\"\" \u002F>\u003C\u002Fa>\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F25839884\u002F218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png\" width=\"3%\" alt=\"\" \u002F>\n  \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002Fopenmmlab\" style=\"text-decoration:none;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_204fe79b5a90.png\" width=\"3%\" alt=\"\" \u002F>\u003C\u002Fa>\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F25839884\u002F218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png\" width=\"3%\" alt=\"\" \u002F>\n  \u003Ca href=\"https:\u002F\u002Fspace.bilibili.com\u002F1293512903\" style=\"text-decoration:none;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_8655b6233577.png\" width=\"3%\" alt=\"\" \u002F>\u003C\u002Fa>\n  \u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F25839884\u002F218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png\" width=\"3%\" alt=\"\" \u002F>\n  \u003Ca href=\"https:\u002F\u002Fwww.zhihu.com\u002Fpeople\u002Fopenmmlab\" style=\"text-decoration:none;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_447c4737c11f.png\" width=\"3%\" alt=\"\" \u002F>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n## 最新动态\n\n**默认分支现已变更为`main`，该分支上的代码已升级至v1.0.0。原`main`分支（v0.6.3）的代码现位于`0.x`分支上。** 若您一直使用`main`分支并遇到升级问题，请阅读[迁移指南](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmigration\u002Foverview.html)以及关于[分支](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmigration\u002Fbranches.html)的相关说明。\n\nv1.0.0于2023年4月6日发布。相较于1.0.0rc6的主要更新包括：\n\n1. 数据集准备工具新增对SCUT-CTW1500、SynthText和MJSynth数据集的支持\n2. 更新了常见问题解答及文档\n3. 弃用file_client_args，改用backend_args\n4. 
新增MMOCR教程笔记本\n\n欲了解更多关于MMOCR 1.0版本的更新内容，请参阅[MMOCR 1.x版本新特性](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmigration\u002Fnews.html)，或查阅[变更日志](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fnotes\u002Fchangelog.html)以获取更多详细信息！\n\n## 简介\n\nMMOCR是一个基于PyTorch和mmdetection的开源工具箱，用于文本检测、文本识别及其下游任务，如关键信息提取等。它是[OpenMMLab](https:\u002F\u002Fopenmmlab.com\u002F)项目的一部分。\n\n主分支支持**PyTorch 1.6及以上版本**。\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_readme_0254ecd2898c.png\"\u002F>\n\u003C\u002Fdiv>\n\n### 主要特性\n\n- **全面的流水线**\n\n  该工具箱不仅支持文本检测和文本识别，还涵盖了其下游任务，如关键信息提取。\n\n- **多种模型**\n\n  工具箱支持大量最先进的文本检测、文本识别及关键信息提取模型。\n\n- **模块化设计**\n\n  MMOCR的模块化设计使用户能够自定义优化器、数据预处理方法以及模型组件，如骨干网络、颈部结构、头部模块和损失函数等。有关如何构建自定义模型的详细信息，请参阅[概述](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fget_started\u002Foverview.html)。\n\n- **丰富的实用工具**\n\n  该工具箱提供了一整套实用工具，可帮助用户评估模型性能。其中包括可视化工具，可用于展示图像、真实标签及预测框；还有验证工具，可在训练过程中评估检查点；此外，还提供了数据转换工具，演示如何将自有数据转换为工具箱支持的标注文件格式。\n\n## 安装\n\nMMOCR 依赖于 [PyTorch](https:\u002F\u002Fpytorch.org\u002F)、[MMEngine](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmengine)、[MMCV](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmcv) 和 [MMDetection](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmdetection)。以下是快速安装步骤。更多详细说明请参阅 [安装指南](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fget_started\u002Finstall.html)。\n\n```shell\nconda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y\nconda activate open-mmlab\npip3 install openmim\ngit clone https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr.git\ncd mmocr\nmim install -e .\n```\n\n## 快速入门\n\nMMOCR 的基本用法请参阅 [快速运行](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fget_started\u002Fquick_run.html)。\n\n## [模型库](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmodelzoo.html)\n\n支持的算法：\n\n\u003Cdetails open>\n\u003Csummary>骨干网络\u003C\u002Fsummary>\n\n- [x] [oCLIP](configs\u002Fbackbone\u002Foclip\u002FREADME.md) (ECCV'2022)\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>文本检测\u003C\u002Fsummary>\n\n- [x] [DBNet](configs\u002Ftextdet\u002Fdbnet\u002FREADME.md) (AAAI'2020) \u002F [DBNet++](configs\u002Ftextdet\u002Fdbnetpp\u002FREADME.md) (TPAMI'2022)\n- [x] [Mask R-CNN](configs\u002Ftextdet\u002Fmaskrcnn\u002FREADME.md) (ICCV'2017)\n- [x] [PANet](configs\u002Ftextdet\u002Fpanet\u002FREADME.md) (ICCV'2019)\n- [x] [PSENet](configs\u002Ftextdet\u002Fpsenet\u002FREADME.md) (CVPR'2019)\n- [x] [TextSnake](configs\u002Ftextdet\u002Ftextsnake\u002FREADME.md) (ECCV'2018)\n- [x] [DRRG](configs\u002Ftextdet\u002Fdrrg\u002FREADME.md) (CVPR'2020)\n- [x] [FCENet](configs\u002Ftextdet\u002Ffcenet\u002FREADME.md) (CVPR'2021)\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>文本识别\u003C\u002Fsummary>\n\n- [x] [ABINet](configs\u002Ftextrecog\u002Fabinet\u002FREADME.md) (CVPR'2021)\n- [x] [ASTER](configs\u002Ftextrecog\u002Faster\u002FREADME.md) (TPAMI'2018)\n- [x] [CRNN](configs\u002Ftextrecog\u002Fcrnn\u002FREADME.md) (TPAMI'2016)\n- [x] [MASTER](configs\u002Ftextrecog\u002Fmaster\u002FREADME.md) (PR'2021)\n- [x] [NRTR](configs\u002Ftextrecog\u002Fnrtr\u002FREADME.md) (ICDAR'2019)\n- [x] [RobustScanner](configs\u002Ftextrecog\u002Frobust_scanner\u002FREADME.md) (ECCV'2020)\n- [x] [SAR](configs\u002Ftextrecog\u002Fsar\u002FREADME.md) (AAAI'2019)\n- [x] [SATRN](configs\u002Ftextrecog\u002Fsatrn\u002FREADME.md) (CVPR'2020 Workshop on Text and 
Documents in the Deep Learning Era)\n- [x] [SVTR](configs\u002Ftextrecog\u002Fsvtr\u002FREADME.md) (IJCAI'2022)\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>关键信息抽取\u003C\u002Fsummary>\n\n- [x] [SDMG-R](configs\u002Fkie\u002Fsdmgr\u002FREADME.md) (ArXiv'2021)\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>文本定位\u003C\u002Fsummary>\n\n- [x] [ABCNet](projects\u002FABCNet\u002FREADME.md) (CVPR'2020)\n- [x] [ABCNetV2](projects\u002FABCNet\u002FREADME_V2.md) (TPAMI'2021)\n- [x] [SPTS](projects\u002FSPTS\u002FREADME.md) (ACM MM'2022)\n\n\u003C\u002Fdetails>\n\n更多详情请参阅 [model_zoo](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmodelzoo.html)。\n\n## 项目\n\n[这里](projects\u002FREADME.md)是一些基于 MMOCR 构建的 SOTA 模型和解决方案的实现，由社区用户支持和维护。这些项目展示了基于 MMOCR 进行研究和产品开发的最佳实践。我们欢迎并感谢所有对 OpenMMLab 生态系统的贡献。\n\n## 贡献\n\n我们非常感谢所有为改进 MMOCR 所做的贡献。请参阅 [CONTRIBUTING.md](.github\u002FCONTRIBUTING.md) 获取贡献指南。\n\n## 致谢\n\nMMOCR 是一个开源项目，由来自不同院校和公司的研究人员和工程师共同贡献而成。我们感谢所有实现其方法或添加新功能的贡献者，以及提供宝贵反馈的用户。我们希望这个工具箱和基准能够为不断增长的科研社区提供灵活的工具，以重新实现现有方法并开发新的 OCR 方法。\n\n## 引用\n\n如果您在研究中发现本项目有用，请考虑引用：\n\n```bibtex\n@article{mmocr2022,\n    title={MMOCR: 一种全面的文本检测、识别与理解工具箱},\n    author={MMOCR 开发团队},\n    howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr}},\n    year={2022}\n}\n```\n\n## 许可证\n\n本项目采用 [Apache 2.0 许可证](LICENSE) 发布。\n\n## OpenMMLab 家族\n\n- [MMEngine](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmengine): OpenMMLab 基础深度学习训练库\n- [MMCV](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmcv): OpenMMLab 计算机视觉基础库\n- [MIM](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmim): MIM 用于安装 OpenMMLab 软件包\n- [MMClassification](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmclassification): OpenMMLab 图像分类工具箱和基准\n- [MMDetection](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmdetection): OpenMMLab 目标检测工具箱和基准\n- [MMDetection3D](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmdetection3d): OpenMMLab 下一代通用 3D 物体检测平台\n- [MMRotate](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmrotate): OpenMMLab 旋转目标检测工具箱和基准\n- [MMSegmentation](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmsegmentation): OpenMMLab 语义分割工具箱和基准\n- [MMOCR](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr): OpenMMLab 文本检测、识别和理解工具箱\n- [MMPose](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmpose): OpenMMLab 姿态估计工具箱和基准\n- [MMHuman3D](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmhuman3d): OpenMMLab 3D 人体参数化模型工具箱和基准\n- [MMSelfSup](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmselfsup): OpenMMLab 自监督学习工具箱和基准\n- [MMRazor](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmrazor): OpenMMLab 模型压缩工具箱和基准\n- [MMFewShot](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmfewshot): OpenMMLab 少样本学习工具箱和基准\n- [MMAction2](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmaction2): OpenMMLab 下一代动作理解工具箱和基准\n- [MMTracking](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmtracking): OpenMMLab 视频感知工具箱和基准\n- [MMFlow](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmflow): OpenMMLab 光流计算工具箱和基准\n- [MMEditing](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmediting): OpenMMLab 图像和视频编辑工具箱\n- [MMGeneration](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmgeneration): OpenMMLab 图像和视频生成模型工具箱\n- [MMDeploy](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmdeploy): OpenMMLab 模型部署框架。\n\n## 欢迎加入 OpenMMLab 社区\n\n扫描下方二维码，关注 OpenMMLab 团队的[**知乎官方账号**](https:\u002F\u002Fwww.zhihu.com\u002Fpeople\u002Fopenmmlab)，并加入 OpenMMLab 团队的[**QQ 
群**](https:\u002F\u002Fjq.qq.com\u002F?_wv=1027&k=aCvMxdr3)；或者通过添加微信加入官方交流微信群；亦可加入我们的[**Slack**](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fmmocrworkspace\u002Fshared_invite\u002Fzt-1ifqhfla8-yKnLO_aKhVA2h71OrK8GZw)。\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fopen-mmlab\u002Fmmcv\u002Fmaster\u002Fdocs\u002Fen\u002F_static\u002Fzhihu_qrcode.jpg\" height=\"400\" \u002F>  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fopen-mmlab\u002Fmmcv\u002Fmaster\u002Fdocs\u002Fen\u002F_static\u002Fqq_group_qrcode.jpg\" height=\"400\" \u002F>  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fopen-mmlab\u002Fmmcv\u002Fmaster\u002Fdocs\u002Fen\u002F_static\u002Fwechat_qrcode.jpg\" height=\"400\" \u002F>\n\u003C\u002Fdiv>\n\n我们将为您带来 OpenMMLab 社区的：\n\n- 📢 分享 AI 框架的最新核心技术\n- 💻 解读 PyTorch 常用模块源码\n- 📰 OpenMMLab 相关发布资讯\n- 🚀 介绍 OpenMMLab 自研的前沿算法\n- 🏃 获得更高效的答疑与反馈\n- 🔥 提供一个与各界开发者交流的平台\n\nOpenMMLab 社区期待您的参与！👫","# MMOCR 快速上手指南\n\nMMOCR 是一个基于 PyTorch 和 MMDetection 的开源工具箱，专注于文本检测、文本识别及关键信息提取等任务。它是 OpenMMLab 项目的重要组成部分。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux (推荐), Windows, macOS\n*   **Python**: 3.8 及以上版本\n*   **PyTorch**: 1.6+ (推荐 1.10+)\n*   **CUDA**: 根据显卡型号安装对应的 CUDA Toolkit (示例中使用 11.3)\n*   **前置依赖**: MMOCR 依赖 MMEngine, MMCV 和 MMDetection，安装脚本会自动处理这些依赖。\n\n> **提示**：国内用户建议使用 Conda 并配置清华或阿里镜像源以加速下载。\n\n## 安装步骤\n\n推荐使用 `conda` 创建独立虚拟环境进行安装。以下是基于 Linux\u002FCUDA 环境的快速安装命令：\n\n1.  **创建并激活 Conda 环境**\n    ```bash\n    conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y\n    conda activate open-mmlab\n    ```\n    *(注：国内用户若下载缓慢，可添加 `-c https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fpkgs\u002Fmain\u002F` 等国内源参数)*\n\n2.  **安装 MIM (OpenMMLab 包管理工具)**\n    ```bash\n    pip3 install openmim\n    ```\n\n3.  
**克隆代码库并安装 MMOCR**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr.git\n    cd mmocr\n    mim install -e .\n    ```\n    *`-e` 参数表示以可编辑模式安装，方便后续开发调试。*\n\n## 基本使用\n\n安装完成后，您可以使用 Python 快速调用预训练模型进行推理。以下是一个最简单的文本检测与识别示例：\n\n```python\nfrom mmocr.apis import MMOCRInferencer\n\n# 初始化推理器\n# det: 文本检测模型 (如 'DBNet')\n# rec: 文本识别模型 (如 'CRNN')\ninferencer = MMOCRInferencer(det='DBNet', rec='CRNN')\n\n# 执行推理\n# 支持图片路径、文件夹或列表\nresults = inferencer('demo\u002Fdemo_text_det_rec.jpg')\n\n# 查看结果\nprint(results['predictions'][0])\n```\n\n**说明：**\n*   `det` 参数指定文本检测算法，可选模型包括 `DBNet`, `PANet`, `PSENet` 等。\n*   `rec` 参数指定文本识别算法，可选模型包括 `CRNN`, `SAR`, `ABINet` 等。\n*   首次运行时，工具会自动下载对应的预训练权重文件。\n\n更多高级用法（如关键信息提取、自定义数据集训练）请参考官方文档中的 [Quick Run](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fget_started\u002Fquick_run.html) 章节。","某电商物流团队每天需处理数万张手写快递面单，试图从中自动提取收件人姓名、电话及地址以更新数据库。\n\n### 没有 mmocr 时\n- 面对倾斜、模糊或手写潦草的面单图片，传统 OCR 引擎识别率极低，大量关键字段无法读取。\n- 缺乏针对文本检测与识别的联合优化流程，开发人员需自行拼接不同开源模型，调试成本极高且兼容性差。\n- 遇到复杂背景或艺术字体时系统直接报错，不得不安排数十名客服人员进行人工二次核对与录入。\n- 模型迭代困难，无法快速利用新收集的面单数据对特定字段（如电话号码）进行微调训练。\n\n### 使用 mmocr 后\n- 借助其内置的高精度检测与识别算法，即使面对严重倾斜或手写潦草的文本，也能实现端到端的精准提取。\n- 利用开箱即用的统一工具箱，一键部署从文本定位到内容识别的全流程，大幅降低了多模型集成的开发门槛。\n- 强大的鲁棒性使其能从容应对复杂背景干扰，自动化率提升至 95% 以上，几乎消除了人工复核环节。\n- 依托丰富的预训练模型和便捷的微调接口，团队可快速基于自有数据定制专用模型，持续优化特定场景表现。\n\nmmocr 将原本碎片化且高成本的 OCR 工程难题，转化为高效、精准且可持续迭代的自动化数据流水线。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopen-mmlab_mmocr_6b349ec9.png","open-mmlab","OpenMMLab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fopen-mmlab_7c171dd7.png","",null,"https:\u002F\u002Fopenmmlab.com","https:\u002F\u002Fgithub.com\u002Fopen-mmlab",[81,85,89],{"name":82,"color":83,"percentage":84},"Python","#3572A5",99.7,{"name":86,"color":87,"percentage":88},"Shell","#89e051",0.2,{"name":90,"color":91,"percentage":92},"Dockerfile","#384d54",0.1,4728,778,"2026-04-08T16:36:51","Apache-2.0","Linux, macOS, Windows","需要 NVIDIA GPU（用于加速），示例环境使用 CUDA 11.3，具体显存需求取决于模型大小","未说明",{"notes":101,"python":102,"dependencies":103},"默认分支已升级至 v1.0.0 版本，需配合 PyTorch 1.6+ 使用。建议使用 conda 创建虚拟环境进行安装。该工具依赖 OpenMMLab 全家桶（MMEngine, MMCV, MMDetection），安装时推荐使用 mim 工具管理依赖。旧版代码位于 0.x 分支。","3.8+",[104,105,106,107,108,109],"torch>=1.6","torchvision","MMEngine","MMCV","MMDetection","openmim",[15,14],[112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131],"pytorch","ocr","deep-learning","text-detection","text-recognition","sar","psenet","panet","maskrcnn","key-information-extraction","pan","dbnet","sdmg-r","crnn","segmentation-based-text-recognition","fcenet","abinet","abcnet","spts","svtr","2026-03-27T02:49:30.150509","2026-04-11T17:48:52.159976",[135,140,145,150,155,160],{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},29645,"训练 SAR 模型时准确率极低且长时间不提升，应该如何解决？","当在大小不平衡的数据集上训练模型时，可以通过重复（repeat）小数据集多次来缓解大数据集带来的偏差。例如 SAR 模型就使用了这种策略。如果精度和召回率很低，请检查是否需要对小样本数据进行重复采样，并注意避免过拟合，通常需要根据数据集情况调整重复次数。","https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fissues\u002F858",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},29646,"运行测试或推理脚本时遇到 'DataContainer' object is not subscriptable 错误怎么办？","该错误通常与数据容器格式有关。如果是使用 `ocr.py` 进行推理，尝试更换其他文本检测模型看问题是否依然存在。此外，确保使用的配置文件和检查点版本匹配，某些旧版本的配置可能在新版代码中不再兼容。","https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fissues\u002F481",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},29647,"在哪里可以找到关键信息提取（KIE）模型的训练代码和数据集准备指南？","官方团队已确认关键信息提取（KIE）的训练代码正在整理中，稍后会发布。目前用户可以关注官方仓库的更新或 Pull 
Request。对于自定义数据集，可以参考社区用户的经验进行标注和准备，或联系有经验的开发者获取帮助。","https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fissues\u002F478",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},29648,"训练 FCENet 模型时出现数据未在 GPU 上的错误或兼容性问题，如何解决？","该问题与 PyTorch 版本有关。测试发现 PyTorch 1.5.0 会出现此错误，而 PyTorch >= 1.6.0 版本可以正常工作。请将您的 PyTorch 升级到 1.6.0 或更高版本以解决该问题。","https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fissues\u002F271",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},29649,"训练 MASTER 模型时报错 '[Errno 21] Is a directory'，提示 label.lmdb 是目录而非文件，如何修复？","这是一个隐蔽的配置 bug。在使用 LMDB 格式加载数据时，需要正确配置 `AnnFileLoader`。请确保配置文件中使用如下结构：\n```python\ntrain_ann_file1 = f'{train_root}\u002FSyn90k\u002Flabel.lmdb'\ntrain1 = dict(\n    type='OCRDataset',\n    img_prefix=train_img_prefix1,\n    ann_file=train_ann_file1,\n    loader=dict(\n        type='AnnFileLoader',\n        repeat=1,\n        file_format='lmdb',\n        parser=dict(type='LineJsonParser', keys=['filename', 'text'])),\n    pipeline=None,\n    test_mode=False)\n```\n确保 `file_format` 设置为 'lmdb' 且路径指向包含 data.mdb 和 lock.mdb 的目录。","https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fissues\u002F1105",{"id":161,"question_zh":162,"answer_zh":163,"source_url":164},29650,"构建数据集时遇到 'AssertionError: AnnFileLoader' 错误，可能的原因是什么？","该错误通常发生在数据集配置文件构建阶段，可能是由于 `AnnFileLoader` 的参数配置不正确或缺少必要的字段（如 file_format、parser 等）。请检查配置文件中数据集部分的 `loader` 定义，确保指定了正确的文件格式（如 'lmdb' 或 'txt'）以及对应的解析器（parser），并验证路径是否存在且权限正确。","https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fissues\u002F1026",[166,171,176,181,186,191,196,201,206,211,216,221,226,231,236,241,246,251,256,261],{"id":167,"version":168,"summary_zh":169,"released_at":170},206209,"v1.0.1","我们非常高兴地宣布 MMOCR v1.0.1 正式发布！该版本包含重要的错误修复和功能增强。\n\n## 🆕 新特性\n- 由 [@A-new-b](https:\u002F\u002Fgithub.com\u002FA-new-b) 基于 mmpretrain 的调度器可视化功能现已集成到 MMOCR 中，详情请见 [#1866](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1866)。\n- 支持 AWS S3 数据获取器 @EnableAsync   https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1888\n\n## 🛠️ 错误修复\n- 由 [@frankstorming](https:\u002F\u002Fgithub.com\u002Ffrankstorming) 修复了 TypeError 错误，详情请见 [#1868](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1868)。\n- 由 [@gaotongxiao](https:\u002F\u002Fgithub.com\u002Fgaotongxiao) 更新了 IIIT5K 数据集的 MD5 校验值，详情请见 [#1848](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1848)。\n- 由 [@KevinNuNu](https:\u002F\u002Fgithub.com\u002FKevinNuNu) 修复了一些中文显示问题，详情请见 [#1922](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1922)。\n- 由 [@gaotongxiao](https:\u002F\u002Fgithub.com\u002Fgaotongxiao) 更新了持续集成（CI）配置中的分支设置，详情请见 [#1842](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1842)。\n\n## 📝 文档改进\n- 由 [@gaotongxiao](https:\u002F\u002Fgithub.com\u002Fgaotongxiao) 从文档中移除了版本标签页，详情请见 [#1843](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1843)。\n- 由 [@Harold-lkk](https:\u002F\u002Fgithub.com\u002FHarold-lkk) 更新了数据准备指南，详情请见 [#1784](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1784)。\n- 由 [@Lum1104](https:\u002F\u002Fgithub.com\u002FLum1104) 更新了 dataset_preparer.md 的英文版本，详情请见 [#1860](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1860)。\n\n## 新贡献者\n* @frankstorming 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1868 中完成了首次贡献。\n* @A-new-b 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1866 中完成了首次贡献。\n* 
@Lum1104 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1860 中完成了首次贡献。\n* @ly015 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1944 中完成了首次贡献。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fcompare\u002Fv1.0.0...v1.0.1","2023-07-04T07:11:53",{"id":172,"version":173,"summary_zh":174,"released_at":175},206210,"v1.0.0","我们很高兴地宣布 MMOCR 1.0 的首个正式版本发布！此次更新带来了多项增强功能、错误修复，并新增了对多个数据集的支持！\n\n## 🌟 亮点\n\n- 支持 SCUT-CTW1500、SynthText 和 MJSynth 数据集\n- 更新了常见问题解答和文档\n- 弃用 `file_client_args`，改用 `backend_args`\n- 新增 MMOCR 教程笔记本\n\n## 🆕 新特性与改进\n\n- @Mountchicken 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1677 中添加了 SCUT-CTW1500 数据集\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1774 中合入了 #1205 提交\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1772 中将 lanms-neo 设置为可选\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1779 中添加了 SynthText 数据集\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1765 中弃用了 `file_client_args`，改用 `backend_args`\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1791 中添加了 MJSynth 数据集\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1771 中新增了 MMOCR 教程笔记本\n- @hugotong6425 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1801 中将 `MMOCRInferencer` 中的 `batch_size` 拆分为 `det_batch_size`、`rec_batch_size` 和 `kie_batch_size`\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1806 中使 `train.py` 和 `test.py` 支持 `local-rank`\n- @cherryjm 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1824 中更新了 `stitch_boxes_into_lines` 函数\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1836 中增加了对 PyTorch 2.0 的测试\n\n## 📝 文档\n\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1773 中更新了常见问题解答\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1767 中从文档中移除了 `LoadImageFromLMDB`\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1766 中在文档中标注了相关项目\n- @jorie-peng 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1753 中添加了 opendatalab 的下载链接\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1469 中修复了文档中的一些失效链接\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1775 中修复了快速入门部分\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1782 中完善了数据集相关内容\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1817 中更新了常见问题解答\n- @fengshiwest 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1818 中增加了更多社交网络链接\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1834 中根据分支切换更新了文档\n\n## 🛠️ 错误修复\n\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1781 中将字典文件放置到 `.mim` 目录下\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1786 中使用 `svtr_small` 替代 `svtr_tiny` 进行测试\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1787 中将 PSE 权重添加到元文件中\n- @gaotongxiao 在 
https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1788 中处理了 Synthtext 的元文件\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1798 中清理了一些未使用的脚本\n- 如果目标路径不存在，移动单个文件时可能会抛出“文件不存在”的错误。由 @KevinNuNu 提供","2023-04-06T11:05:25",{"id":177,"version":178,"summary_zh":179,"released_at":180},206211,"v1.0.0rc6","## 亮点\n\n1. 在 `projects\u002F` 文件夹中新增了两个模型：ABCNet v2（仅用于推理）和 SPTS。\n2. 正式发布 `Inferencer`，这是 OpenMMLab 中的一个统一推理接口，方便用户使用所有预训练权重进行快速推理。[文档](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fuser_guides\u002Finference.html)\n3. 用户现在可以为文本识别任务使用测试时增强技术。[文档](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fuser_guides\u002Ftrain_test.html#test-time-augmentation)\n4. 通过 [`BatchAugSampler`](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1757) 支持 [批量增强](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2020\u002Fpapers\u002FHoffer_Augment_Your_Batch_Improving_Generalization_Through_Instance_Repetition_CVPR_2020_paper.pdf)，该技术被应用于 SPTS 模型中。\n5. 数据集准备工具已重构，以支持更灵活的配置。此外，用户现在可以将文本识别数据集准备为 LMDB 格式。[文档](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fuser_guides\u002Fdata_prepare\u002Fdataset_preparer.html#lmdb-format)\n6. 部分文本检测数据集已修订，以提高其正确性和与通用实践的一致性。\n7. 已消除来自 `shapely` 的潜在误报警告。\n\n## 依赖项\n\n本版本要求 MMEngine >= 0.6.0、MMCV >= 2.0.0rc4 和 MMDet >= 3.0.0rc5。\n\n## 新特性与改进\n\n- 废弃已过时的 LMDB 数据集格式，仅支持 img+label 格式，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1681 中实现。\n- ABCNet v2 推理功能，由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1657 中实现。\n- 添加 RepeatAugSampler，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1678 中实现。\n- SPTS 模型，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1696 中实现。\n- 重构 Inferencers，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1608 中实现。\n- 为 rescale_polygons 添加动态返回类型，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1702 中实现。\n- 修订上游版本限制，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1703 中实现。\n- TextRecogCropConverter 增加使用 OpenCV 的 warpPerspective 函数进行裁剪的功能，由 @KevinNuNu 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1667 中实现。\n- 将 cuDNN benchmark 设置为 False，由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1705 中实现。\n- 添加 ST 预训练的 DB 系列模型及日志，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1635 中实现。\n- 发布模型时仅保留 meta 和 state_dict，由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1729 中实现。\n- 文本识别 TTA 功能，由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1401 中实现。\n- 通过用 torch… 替代 np.transpose 来加速格式化操作，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1719 中实现。\n- 支持从注册表自动导入模块，由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1731 中实现。\n- 在 Inferencer 中支持批量可视化和转储功能，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1722 中实现。\n- 添加一个新的参数 font_prope","2023-03-07T12:27:43",{"id":182,"version":183,"summary_zh":184,"released_at":185},206212,"v1.0.0rc5","## 亮点\n\n1. 我们的模型库新增了两个模型：Aster 和 SVTR。同时，ABCNet 的完整实现也已开放。\n2. 数据集准备工具支持另外 5 个数据集：CocoTextV2、FUNSD、TextOCR、NAF、SROIE。\n3. 
新增了 4 种文本识别变换和 2 种辅助变换。详情请参阅以下 PR：https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1646、https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1632、https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1645。\n4. 变换 `FixInvalidPolygon` 在处理无效多边形方面更加智能，现在能够应对更多异常标注。因此，在 TotalText 数据集上可以无 bug 地完成一个完整的训练周期。同时，基于 TotalText 预训练的 DBNet 和 FCENet 权重也已发布。\n\n## 新特性与改进\n\n- 根据 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1617 中的 DataPrepare 更新 ic15 检测配置。\n- 由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1620 中将 icdardataset 元信息改为小写。\n- @Mountchicken 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1239 中添加了 ASTER 编码器。\n- @Mountchicken 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1625 中添加了 ASTER 解码器。\n- @Mountchicken 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1238 中添加了 ASTER 配置。\n- @Mountchicken 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1629 中更新了 ASTER 配置。\n- @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1503 中支持使用 browse_dataset.py 可视化原始数据集。\n- @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1514 中将 CocoTextv2 添加到数据集准备工具中。\n- @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1550 中将 Funsd 添加到数据集准备工具中。\n- @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1543 中将 TextOCR 添加到数据集准备工具中。\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1628 中优化了示例项目和 README。\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1632 中增强了 FixInvalidPolygon，并新增了 RemoveIgnored 变换。\n- @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1646 中实现了 ConditionApply。\n- @Mountchicken 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1609 中将 NAF 添加到数据集准备工具中。\n- @FerryHuang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1639 中将 SROIE 添加到数据集准备工具中。\n- @willpat1213 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1448 中添加了 SVTR 解码器。\n- @Mountchicken 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1651 中补充了缺失的单元测试。\n- @willpat1213 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1483 中添加了 SVTR 编码器。\n- @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1610 中实现了 ABCNet 的训练。\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1633 中为 DB 和 FCE 提供了 Totaltext 配置。\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1611 中为模型添加了别名。\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1645 中提供了 SVTR 变换。\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F 中添加了 SVTR 框架和配置。","2023-01-06T09:35:46",{"id":187,"version":188,"summary_zh":189,"released_at":190},206213,"v1.0.0rc4","## 亮点\n\n1. 数据集准备工具可在准备流程结束时自动生成基础数据集配置，并新增支持6个数据集：IIIT5k、CUTE80、ICDAR2013、ICDAR2015、SVT、SVTP。\n2. 
推出`projects\u002F`文件夹——长期以来，由于对代码质量要求严格，将新模型和功能集成到OpenMMLab的算法库中一直被认为较为繁琐，这不仅会阻碍SOTA模型的快速迭代，也可能抑制社区成员在此分享最新成果的积极性。为此，我们推出了`projects\u002F`文件夹，用于存放一些实验性功能、框架和模型，只需满足最低限度的代码质量要求即可。欢迎大家在此文件夹中提交自己优秀想法的实现！同时，我们也添加了首个[示例项目](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Ftree\u002Fdev-1.x\u002Fprojects\u002Fexample_project)，以展示一个优秀的项目应具备的要素（更多详情请参阅README.md原文）。\n3. 在`projects\u002F`文件夹内，我们发布了ABCNet的预览版本，这是MMOCR中首个文本检测与识别模型的实现。目前仅支持推理功能，但完整的实现将在不久后上线。\n\n## 新特性与改进\n\n- @xinke-wang在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1521中将SVT加入数据集准备工具。\n- @gaotongxiao在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1532中优化了bbox2poly函数。\n- @xinke-wang在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1523中将SVTP加入数据集准备工具。\n- @Harold-lkk在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1530中实现了Iiit5k数据集转换器。\n- @xinke-wang在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1522中将cute80加入数据集准备工具。\n- @xinke-wang在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1531中新增IC13数据集准备工具。\n- @gaotongxiao在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1524中创建了“Projects\u002F”文件夹，并提供了首个示例项目。\n- @Harold-lkk在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1541中将文件命名规则更改为{数据集名}\\_task_train\u002Ftest。\n- @IncludeMathH在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1547中向工具集添加了print_config.py脚本。\n- @gaotongxiao在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1553中新增get_md5函数。\n- @gaotongxiao在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1552中新增配置生成器。\n- @gaotongxiao在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1556中增加了对IC15_1811数据集的支持。\n- @gaotongxiao在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1555中更新了CT80配置。\n- @gaotongxiao在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1560中为所有文本检测和文本识别配置添加了配置生成器。\n- @Mountchicken在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1240中重构了TPS模块。\n- @gaotongxiao在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1561中新增TextSpottingConfigGenerator。\n- @Harold-lkk在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1596中引入了通用类型定义。\n- @gaotongxiao在https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmo中更新了文本识别配置及说明文档。","2022-12-06T09:24:57",{"id":192,"version":193,"summary_zh":194,"released_at":195},206214,"v0.6.3","## 亮点\n\n本次发布增强了推理脚本，并修复了一个可能导致 TorchServe 失败的 bug。\n\n此外，MMOCR 1.0.0rc3（[1.x 分支](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Ftree\u002F1.x)）中还推出了一个新的主干网络 oCLIP-ResNet 以及一个数据集准备工具 Dataset Preparer。更多关于新功能的信息，请参阅 [变更日志](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fnotes\u002Fchangelog.html)，有关 MMOCR 未来维护计划的详情，请查看 [维护计划](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmigration\u002Foverview.html)。\n\n## 新特性与改进\n\n- 将 numpy.float32 类型转换为 Python 内置 float 类型，由 @JunYao1020 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1462 中实现。\n- 当输出字符串中不包含“.”字符时，也将其视为有效结果，由 @JunYao1020 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1457 中实现。\n- 重构问题模板，由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1449 中完成。\n- 问题模板更新，由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1489 中完成。\n- 更新维护人员名单，由 @gaotongxiao 在 
https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1504 中完成。\n- 支持 MMCV \u003C 1.8.0 版本，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1508 中实现。\n\n## Bug 修复\n\n- 修复 CI 构建问题，由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1491 中完成。\n- \\[CI\\] 修复 CI 构建问题，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1463 中完成。\n\n## 文档\n\n- \\[文档\\] 在 README 中添加 MMYOLO 说明，由 @ysh329 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1475 中完成。\n- \\[文档\\] 更新 contributing.md 文件，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1490 中完成。\n\n## 新贡献者\n\n- @ysh329 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1475 中完成了首次贡献。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fcompare\u002Fv0.6.2...v0.6.3","2022-11-03T12:01:17",{"id":197,"version":198,"summary_zh":199,"released_at":200},206215,"v1.0.0rc3","## 亮点\n\n1. 我们发布了若干以 [oCLIP-ResNet](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fblob\u002F1.x\u002Fconfigs\u002Fbackbone\u002Foclip\u002FREADME.md) 为骨干网络的预训练模型。oCLIP-ResNet 是一种基于 oCLIP 训练的 ResNet 变体（相关论文：[ECCV 2022](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136880282.pdf)），能够显著提升文本检测模型的性能。\n\n2. 数据集的准备通常既繁琐又耗时，尤其是在 OCR 领域，往往需要使用多个数据集。为了减轻用户的负担，我们设计了 [Dataset Preparer](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fuser_guides\u002Fdata_prepare\u002Fdataset_preparer.html)，只需 **一行命令** 即可快速准备好多组数据集！此外，Dataset Preparer 采用模块化设计，每个模块负责处理准备流程中的一个标准化环节，从而缩短对新数据集的支持开发周期。\n\n## 新特性与改进\n\n- @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1484 中新增了 Dataset Preparer。\n- @HannibalAPE 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1458 中增加了对 oCLIP 中使用的改进版 ResNet 结构的支持。\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1509 中添加了 oCLIP 相关配置。\n\n## 文档更新\n\n- @rogachevai 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1494 中更新了 install.md 文件。\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1455 中优化了部分文档。\n- @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1502 中更新了与数据集准备工具相关的文档。\n- @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1505 中编写了 oCLIP 的 README 文件。\n\n## Bug 修复\n\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1500 中修复了由新数据流引入的 offline_eval 错误。\n\n## 新贡献者\n\n- @rogachevai 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1494 中完成了首次贡献。\n- @HannibalAPE 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1458 中完成了首次贡献。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fcompare\u002Fv1.0.0rc2...v1.0.0rc3","2022-11-03T11:59:42",{"id":202,"version":203,"summary_zh":204,"released_at":205},206216,"v1.0.0rc2","本次发布放宽了 `MMEngine` 的版本要求，调整为 `>=0.1.0, \u003C 1.0.0`。\n\n\n**完整更新日志**: https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fcompare\u002Fv1.0.0rc1...v1.0.0rc2","2022-10-14T06:26:33",{"id":207,"version":208,"summary_zh":209,"released_at":210},206217,"v0.6.2","## 亮点\n\n现在可以通过 Python 接口训练和测试模型。例如，您可以在 mmocr\u002F 目录下以如下方式训练模型：\n\n```python\n# 下面展示了如何使用这些修改的示例：\nfrom mmocr.tools.train import TrainArg, parse_args, run_train_cmd\nargs = 
TrainArg(config='\u002Fpath\u002Fto\u002Fconfig.py')\nargs.add_arg('--work-dir', '\u002Fpath\u002Fto\u002Fdir')\nargs = parse_args(args.arg_list)\nrun_train_cmd(args)\n```\n\n更多详情请参阅 PR [#1138](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1138)。\n\n此外，包含大量新功能的 MMOCR 1.0 发布候选版本现已在 [1.x 分支](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Ftree\u002F1.x) 上提供！请查看 [变更日志](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fnotes\u002Fchangelog.html)，了解有关新功能的更多信息；同时，请参阅 [维护计划](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmigration\u002Foverview.html)，以了解我们未来将如何维护 MMOCR。\n\n## 新特性\n\n- @wybryan 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1138 中添加了可直接在代码中使用的测试与训练 API。\n- @hsiehpinghan 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1437 中使 ResizeOCR 完全支持 mmcv.impad 的 pad_val 参数。\n\n## Bug 修复\n\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1256 中修复了 ABINet 配置问题。\n- @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1333 中修复了识别分数归一化问题。\n- @antoniolanza1996 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1433 中移除了 max_seq_len 不一致的问题。\n- @yjmm10 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1205 中修正了边界框点的排序问题。\n- @JunYao1020 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1446 中纠正了拼写错误，将“preperties”更正为“properties”。\n\n## 文档\n\n- @Venkat2811 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1272 中添加了 Tiyaro 上的演示、实验以及实时推理 API。\n- @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1369 中更新了 1.x 版本的相关信息。\n- @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1406 中为文档和版本切换菜单添加了全局说明。\n- @Nourollah 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1345 中更新了日志钩子配置，以支持 WandB。\n\n## 新贡献者\n\n- @Venkat2811 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1272 中完成了首次贡献。\n- @wybryan 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1139 中完成了首次贡献。\n- @hsiehpinghan 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1437 中完成了首次贡献。\n- @yjmm10 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1205 中完成了首次贡献。\n- @JunYao1020 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1446 中完成了首次贡献。\n- @Nourollah 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1345 中完成了首次贡献。\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fcompare\u002Fv0.6.1...v0.6.2","2022-10-14T06:21:26",{"id":212,"version":213,"summary_zh":214,"released_at":215},206218,"v1.0.0rc1","## 亮点\n\n本次发布修复了一个严重的 bug，该 bug 导致多 GPU 训练时指标报告不准确。同时，我们还发布了 MMOCR 1.0 架构中所有文本识别模型的权重，并将它们的推理简写重新添加到 `ocr.py` 中。此外，现在提供了更多文档章节。\n\n## 新特性与改进\n\n- 简化 Mask R-CNN 配置文件，由 @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1391 中完成\n- 自动缩放学习率，由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1326 中实现\n- 更新预训练权重路径，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1416 中完成\n- 理顺 pan_postprocessor 中重复的 split_result，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1418 中完成\n- 更新 ocr.py 和 inference.md 中的模型链接，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1431 
中完成\n- 更新识别配置文件，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1417 中完成\n- 可视化工具优化，由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1411 中完成\n- 支持在 dev-1.x 分支中获取 FLOPs 和参数量，由 @vansin 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1414 中实现\n\n## 文档\n\n- intersphinx 和 API 文档，由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1367 中完成\n- 修复 quickrun，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1374 中完成\n- 修复部分文档问题，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1385 中完成\n- 添加 DataElements 相关文档，由 @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1381 中完成\n- 配置文件英文说明，由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1372 中完成\n- 指标说明，由 @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1399 中完成\n- 在菜单中添加版本切换器，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1407 中完成\n- 数据增强说明，由 @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1392 中完成\n- 修复推理相关文档，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1415 中完成\n- 修复部分文档，由 @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1410 中完成\n- 在迁移指南中添加维护计划，由 @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1413 中完成\n- 更新识别模型说明，由 @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1402 中完成\n\n## Bug 修复\n\n- 仅在主进程中清空 metric.results，由 @Harold-lkk 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1379 中完成\n- 修复 MMDetWrapper 中的一个 bug，由 @xinke-wang 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1393 中完成\n- 修复 browse_dataset.py，由 @Mountchicken 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1398 中完成\n- ImgAugWrapper：若不适用则不裁剪多边形，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1231 中完成\n- 修复 CI 流程，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1365 中完成\n- 修复合并阶段测试，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1370 中完成\n- 移除对 PyTorch 1.5.1 的 CI 支持，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1371 中完成\n- 测试 Windows cu111 环境，由 @gaotongxiao 在 https:\u002F\u002Fgithub.com\u002Fopen","2022-10-09T11:21:02",{"id":217,"version":218,"summary_zh":219,"released_at":220},206219,"v1.0.0rc0","We are excited to announce the release of MMOCR 1.0.0rc0!\r\nMMOCR 1.0.0rc0 is the first version of MMOCR 1.x, a part of the OpenMMLab 2.0 projects.\r\nBuilt upon the new [training engine](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmengine),\r\nMMOCR 1.x unifies the interfaces of dataset, models, evaluation, and visualization with faster training and testing speed.\r\n\r\n## Highlights\r\n\r\n1. **New engines**. MMOCR 1.x is based on [MMEngine](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmengine), which provides a general and powerful runner that allows more flexible customizations and significantly simplifies the entrypoints of high-level interfaces.\r\n\r\n2. **Unified interfaces**. 
As a part of the OpenMMLab 2.0 projects, MMOCR 1.x unifies and refactors the interfaces and internal logic of training, testing, datasets, models, evaluation, and visualization. All the OpenMMLab 2.0 projects share the same design in those interfaces and logic to allow the emergence of multi-task\u002Fmodality algorithms.\r\n\r\n3. **Cross project calling**. Benefiting from the unified design, you can use the models implemented in other OpenMMLab projects, such as MMDet. We provide an example of how to use MMDetection's Mask R-CNN through `MMDetWrapper`. Check our documents for more details. More wrappers will be released in the future.\r\n\r\n4. **Stronger visualization**. We provide a series of useful tools which are mostly based on brand-new visualizers. As a result, it is now more convenient for users to explore the models and datasets.\r\n\r\n5. **More documentation and tutorials**. We have added a bunch of documentation and tutorials to help users get started more smoothly. Read them [here](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002F).\r\n\r\n## Breaking Changes\r\n\r\nWe briefly list the major breaking changes here.\r\nWe also have the [migration guide](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fmigration\u002Foverview.html) that provides complete details and migration instructions.\r\n\r\n### Dependencies\r\n\r\n- MMOCR 1.x relies on MMEngine to run. MMEngine is a new foundational library for training deep learning models in OpenMMLab 2.0 projects. The dependencies of file IO and training are migrated from MMCV 1.x to MMEngine.\r\n- MMOCR 1.x relies on MMCV>=2.0.0rc0. Although MMCV no longer maintains the training functionalities since 2.0.0rc0, MMOCR 1.x relies on the data transforms, CUDA operators, and image processing interfaces in MMCV. Note that since MMCV 2.0.0rc0, the package `mmcv` is the version that provides pre-built CUDA operators while `mmcv-lite` does not, and `mmcv-full` has been deprecated.\r\n\r\n### Training and testing\r\n\r\n- MMOCR 1.x uses Runner in [MMEngine](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmengine) rather than that in MMCV. The new Runner implements and unifies the building logic of dataset, model, evaluation, and visualizer. Therefore, MMOCR 1.x no longer maintains the building logic of those modules in `mmocr.train.apis` and `tools\u002Ftrain.py`. That code has been migrated into [MMEngine](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmengine\u002Fblob\u002Fmain\u002Fmmengine\u002Frunner\u002Frunner.py). Please refer to the [migration guide of Runner in MMEngine](https:\u002F\u002Fmmengine.readthedocs.io\u002Fen\u002Flatest\u002Fmigration\u002Frunner.html) for more details.\r\n- The Runner in MMEngine also supports testing and validation. The testing scripts have also been simplified, and they follow similar logic to the training scripts when building the runner.\r\n- The execution points of hooks in the new Runner have been enriched to allow more flexible customization. Please refer to the [migration guide of Hook in MMEngine](https:\u002F\u002Fmmengine.readthedocs.io\u002Fen\u002Flatest\u002Fmigration\u002Fhook.html) for more details.\r\n- Learning rate and momentum schedules have been migrated from `Hook` to `Parameter Scheduler` in MMEngine. 
Please refer to the [migration guide of Parameter Scheduler in MMEngine](https:\u002F\u002Fmmengine.readthedocs.io\u002Fen\u002Flatest\u002Fmigration\u002Fparam_scheduler.html) for more details.\r\n\r\n### Configs\r\n\r\n- The [Runner in MMEngine](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmengine\u002Fblob\u002Fmain\u002Fmmengine\u002Frunner\u002Frunner.py) uses a different config structure to ease the understanding of the components in the runner. Users can read the [config example of MMOCR](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fuser_guides\u002Fconfig.html) or refer to the [migration guide in MMEngine](https:\u002F\u002Fmmengine.readthedocs.io\u002Fen\u002Flatest\u002Fmigration\u002Frunner.html) for migration details.\r\n- The file names of configs and models are also refactored to follow the new rules unified across OpenMMLab 2.0 projects. Please refer to the [user guides of config](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Fdev-1.x\u002Fuser_guides\u002Fconfig.html) for more details.\r\n\r\n### Dataset\r\n\r\nThe Dataset classes implemented in MMOCR 1.x all inherit from `BaseDetDataset`, which inherits from the [BaseDataset in MMEngine](https:\u002F\u002Fmmengine.readthedocs.io\u002Fen\u002Flatest\u002Fadvanced_tutorials\u002Fbasedataset.html). There are several changes to Dataset in MMOCR 1.x.\r\n\r\n- All the datasets support serializing the data list to reduce memory usage when multiple workers are built to accelerate data loading.\r\n- The ","2022-09-01T06:30:00",{"id":222,"version":223,"summary_zh":224,"released_at":225},206220,"v0.6.1","## Highlights\r\n\r\n1. ArT dataset is available for text detection and recognition!\r\n2. Fixed several bugs that affect the correctness of the models.\r\n3. Thanks to [MIM](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmim), our installation is much simpler now! 
The [docs](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Finstall.html) has been renewed as well.\r\n\r\n## New Features & Enhancements\r\n\r\n- Add ArT by @xinke-wang in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1006\r\n- add ABINet_Vision api by @Abdelrahman350 in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1041\r\n- add codespell ignore and use mdformat by @Harold-lkk in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1022\r\n- Add mim to extras_requrie to setup.py, update mminstall… by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1062\r\n- Simplify normalized edit distance calculation by @maxbachmann in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1060\r\n- Test mim in CI by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1090\r\n- Remove redundant steps by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1091\r\n\r\n* Update links to SDMGR links by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1252\r\n\r\n## Bug Fixes\r\n\r\n- Remove unnecessary requirements by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1000\r\n- Remove confusing img_scales in pipelines by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1007\r\n- inplace operator \"+=\" will cause RuntimeError when model backward by @garvan2021 in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1018\r\n- Fix a typo problem in MASTER by @Mountchicken in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1031\r\n- Fix config name of MASTER in ocr.py by @Mountchicken in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1044\r\n- Relax OpenCV requirement by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1061\r\n- Restrict the minimum version of OpenCV to avoid potential vulnerability by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1065\r\n- typo by @tpoisonooo in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1024\r\n- Fix a typo in setup.py by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1095\r\n- fix #1067: add torchserve DockerFile and fix bugs by @Hegelim in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1073\r\n- Incorrect filename in labelme_converter.py by @xiefeifeihu in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1103\r\n- Fix dataset configs by @Mountchicken in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1106\r\n- Fix #1098: normalize text recognition scores by @Hegelim in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1119\r\n- Update ST_SA_MJ_train.py by @MingyuLau in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1117\r\n- PSENet metafile by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1121\r\n- Flexible ways of getting file name by @balandongiv in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1107\r\n- Updating edge-embeddings after each GNN layer by @amitbcp in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1134\r\n- links update by 
@TekayaNidham in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1141\r\n- bug fix: access params by cfg.get by @doem97 in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1145\r\n- Fix a bug in LmdbAnnFileBackend that cause breaking in Synthtext detection training by @Mountchicken in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1159\r\n- Fix typo of --lmdb-map-size default value by @easilylazy in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1147\r\n- Fixed docstring syntax error of line 19 & 21 by @APX103 in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1157\r\n- Update lmdb_converter and ct80 cropped image source in document by @doem97 in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1164\r\n- MMCV compatibility due to outdated MMDet by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1192\r\n- Update maximum version of mmcv by @xinke-wang in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1219\r\n- Update ABINet links for main by @Mountchicken in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1221\r\n- Update owners by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1248\r\n- Add back some missing fields in configs by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1171\r\n\r\n## Docs\r\n\r\n- Fix typos by @xinke-wang in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1001\r\n- Configure Myst-parser to parse anchor tag by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1012\r\n- Fix a error in docs\u002Fen\u002Ftutorials\u002Fdataset_types.md by @Mountchicken in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1034\r\n- Update readme according to the guideline by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1047\r\n- Limit markdown version by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1172\r\n- Limit extension versions by @Mountchicken in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1210\r\n\r\n* Update installation guide by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1254\r\n* Update image link @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F1255\r\n\r\n## New Contributors\r\n\r\n- @tpoisonooo made their first contribution in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F102","2022-08-04T06:03:40",{"id":227,"version":228,"summary_zh":229,"released_at":230},206221,"v0.6.0","## Highlights\r\n\r\n1. A new recognition algorithm [MASTER](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.02562) has been added into MMOCR, which was the championship solution for the \"ICDAR 2021 Competition on Scientific Table Image Recognition to Latex\"! The model pre-trained on SynthText and MJSynth is available for testing! Credit to @JiaquanYe\r\n2. [DBNet++](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.10304) has been released now! A new Adaptive Scale Fusion module has been equipped for feature enhancement. Benefiting from this, the new model achieved 2% better h-mean score than its predecessor on the ICDAR2015 dataset.\r\n3. Three more dataset converters are added: LSVT, RCTW and HierText. 
Check the dataset zoo ([Det](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Fdatasets\u002Fdet.html#) & [Recog](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Fdatasets\u002Frecog.html)) to explore further information.\r\n4. To enhance the data storage efficiency, MMOCR now supports loading both images and labels from .lmdb format annotations for the text recognition task. To enable such a feature, the new lmdb_converter.py is ready for use to pack your cropped images and labels into an lmdb file. For a detailed tutorial, please refer to the following sections and the [doc](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Ftools.html#convert-text-recognition-dataset-to-lmdb-format).\r\n5. Testing models on multiple datasets is a widely used evaluation strategy. MMOCR now supports automatically reporting mean scores when there is more than one dataset to evaluate, which enables a more convenient comparison between checkpoints. [Doc](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fdataset_types.html#getting-mean-evaluation-scores)\r\n6. Evaluation is more flexible and customizable now. For text detection tasks, you can set the score threshold range where the best results might come out. ([Doc](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fdataset_types.html#evaluation)) If too many results are flooding your text recognition train log, you can trim it by specifying a subset of metrics in evaluation config. Check out the [Evaluation](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fdataset_types.html#ocrdataset) section for details.\r\n7. MMOCR provides a script to convert the .json labels obtained by the popular annotation toolkit **Labelme** to MMOCR-supported data format. @Y-M-Y contributed a log analysis tool that helps users gain a better understanding of the entire training process. Read [tutorial docs](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Ftools.html) to get started.\r\n\r\n## Lmdb Dataset\r\n\r\nReading images or labels from files can be slow when data are excessive, e.g. on a scale of millions. Besides, in academia, most of the scene text recognition datasets are stored in lmdb format, including images and labels. To get closer to the mainstream practice and enhance the data storage efficiency, MMOCR now officially supports loading images and labels from lmdb datasets via a new pipeline [LoadImageFromLMDB](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fblob\u002F878383b9de8d0e598f31fbb844ffcb0c305deb8b\u002Fmmocr\u002Fdatasets\u002Fpipelines\u002Floading.py#L140).\r\nThis section is intended to serve as a quick walkthrough for you to master this update and apply it to facilitate your research.\r\n\r\n### Specifications\r\n\r\nTo better align with the academic community, MMOCR now requires the following specifications for lmdb datasets:\r\n\r\n  * The parameter describing the data volume of the dataset is `num-samples` instead of `total_number` (deprecated).\r\n  * Images and labels are stored with keys in the form of `image-000000001` and `label-000000001`, respectively.\r\n\r\n\r\n### Usage\r\n\r\n1. Use existing academic lmdb datasets if they meet the specifications; or the tool provided by MMOCR to pack images & annotations into a lmdb dataset.\r\n\r\n  - Previously, MMOCR had a function `txt2lmdb` (deprecated) that only supported converting labels to lmdb format. 
However, it is quite different from academic lmdb datasets, which usually contain both images and labels. Now MMOCR provides a new utility [lmdb_converter](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fblob\u002Fmain\u002Ftools\u002Fdata\u002Futils\u002Flmdb_converter.py) to convert recognition datasets with both images and labels to lmdb format.\r\n  - Say that your recognition data in MMOCR's format are organized as follows. (See an example in [ocr_toy_dataset](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Ftree\u002Fmain\u002Ftests\u002Fdata\u002Focr_toy_dataset)).\r\n\r\n    ```text\r\n    # Directory structure\r\n\r\n    ├──img_path\r\n    |      |—— img1.jpg\r\n    |      |—— img2.jpg\r\n    |      |—— ...\r\n    |——label.txt (or label.jsonl)\r\n\r\n    # Annotation format\r\n\r\n    label.txt:  img1.jpg HELLO\r\n                img2.jpg WORLD\r\n                ...\r\n\r\n    label.jsonl:    {'filename':'img1.jpg', 'text':'HELLO'}\r\n                    {'filename':'img2.jpg', 'text':'WORLD'}\r\n                    ...\r\n    ```\r\n\r\n  - Then pack these files up:\r\n\r\n    ```bash\r\n    python tools\u002Fdata\u002Futils\u002Flmdb_converter.py  {PATH_TO_LABEL} {OUTPUT_PATH} --i {PATH_TO_IMAGES}\r\n    ```\r\n\r\n  - Check out [t","2022-05-05T14:20:47",{"id":232,"version":233,"summary_zh":234,"released_at":235},206222,"v0.5.0","## Highlights\r\n\r\n1. MMOCR now supports SPACE recognition! (What a prominent feature!) Users only need to convert the recognition annotations that contain spaces from a plain `.txt` file to JSON line format `.jsonl`, and then revise a few configurations to enable the `LineJsonParser`. For more information, please read our step-by-step [tutorial](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fblank_recog.html).\r\n2. [Tesseract](https:\u002F\u002Fgithub.com\u002Ftesseract-ocr\u002Ftesseract) is now available in MMOCR! While MMOCR is more flexible to support various downstream tasks, users might sometimes not be satisfied with DL models and would like to turn to effective legacy solutions. Therefore, we offer this option in `mmocr.utils.ocr` by wrapping Tesseract as a detector and\u002For recognizer. Users can easily create an MMOCR object by `MMOCR(det=’Tesseract’, recog=’Tesseract’)`. Credit to @garvan2021\r\n3. We release data converters for **16** widely used OCR datasets, including multiple scenarios such as document, handwritten, and scene text. Now it is more convenient to generate annotation files for these datasets. Check the dataset zoo ( [Det](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Fdatasets\u002Fdet.html#) & [Recog](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Fdatasets\u002Frecog.html) ) to explore further information.\r\n4. Special thanks to @EighteenSprings @BeyondYourself @yangrisheng, who had actively participated in documentation translation!\r\n\r\n## Migration Guide - ResNet\r\n\r\nSome refactoring processes are still going on. For text recognition models, we unified the [`ResNet-like` architectures](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fblob\u002F72f945457324e700f0d14796dd10a51535c01a57\u002Fmmocr\u002Fmodels\u002Ftextrecog\u002Fbackbones\u002Fresnet.py) which are used as backbones. 
By introducing stage-wise and block-wise plugins, the refactored ResNet is highly flexible to support existing models, like ResNet31 and ResNet45, and other future designs of ResNet variants.\r\n\r\n### Plugin\r\n\r\n- `Plugin` is a module category inherited from MMCV's implementation of `PLUGIN_LAYERS`, which can be inserted between each stage of ResNet or into a basicblock. You can find a simple implementation of plugin at [mmocr\u002Fmodels\u002Ftextrecog\u002Fplugins\u002Fcommon.py](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fblob\u002F72f945457324e700f0d14796dd10a51535c01a57\u002Fmmocr\u002Fmodels\u002Ftextrecog\u002Fplugins\u002Fcommon.py), or click the button below.\r\n\r\n    \u003Cdetails close>\r\n    \u003Csummary>Plugin Example\u003C\u002Fsummary>\r\n\r\n    ```python\r\n    @PLUGIN_LAYERS.register_module()\r\n    class Maxpool2d(nn.Module):\r\n        \"\"\"A wrapper around nn.MaxPool2d().\r\n\r\n        Args:\r\n            kernel_size (int or tuple(int)): Kernel size for max pooling layer\r\n            stride (int or tuple(int)): Stride for max pooling layer\r\n            padding (int or tuple(int)): Padding for pooling layer\r\n        \"\"\"\r\n\r\n        def __init__(self, kernel_size, stride, padding=0, **kwargs):\r\n            super(Maxpool2d, self).__init__()\r\n            self.model = nn.MaxPool2d(kernel_size, stride, padding)\r\n\r\n        def forward(self, x):\r\n            \"\"\"\r\n            Args:\r\n                x (Tensor): Input feature map\r\n\r\n            Returns:\r\n                Tensor: The tensor after Maxpooling layer.\r\n            \"\"\"\r\n            return self.model(x)\r\n    ```\r\n\r\n    \u003C\u002Fdetails>\r\n\r\n### Stage-wise Plugins\r\n\r\n- ResNet is composed of stages, and each stage is composed of blocks. E.g., ResNet18 is composed of 4 stages, and each stage is composed of basicblocks. For each stage, we provide two ports to insert stage-wise plugins by giving `plugins` parameters in ResNet.\r\n\r\n    ```text\r\n    [port1: before stage] ---> [stage] ---> [port2: after stage]\r\n    ```\r\n\r\n- Take a ResNet with four stages as an example. Suppose we want to insert an additional convolution layer before each stage, and another convolution layer after stages 1, 2, and 4. Then you can define the special ResNet18 like this\r\n\r\n    ```python\r\n    resnet18_special = ResNet(\r\n            # for simplicity, some required\r\n            # parameters are omitted\r\n            plugins=[\r\n                dict(\r\n                    cfg=dict(\r\n                    type='ConvModule',\r\n                    kernel_size=3,\r\n                    stride=1,\r\n                    padding=1,\r\n                    norm_cfg=dict(type='BN'),\r\n                    act_cfg=dict(type='ReLU')),\r\n                    stages=(True, True, True, True),\r\n                    position='before_stage'),\r\n                dict(\r\n                    cfg=dict(\r\n                    type='ConvModule',\r\n                    kernel_size=3,\r\n                    stride=1,\r\n                    padding=1,\r\n                    norm_cfg=dict(type='BN'),\r\n                    act_cfg=dict(type='ReLU')),\r\n                    stages=(True, True, False, True),\r\n                    position='after_stage')\r\n            ])\r\n    ```\r\n\r\n- You can also insert more than one plugin in each port and those plugins will be executed in order. 
Let's take ResNet in [MASTER](https:\u002F\u002Farxiv.org\u002Fabs","2022-03-31T09:50:19",{"id":237,"version":238,"summary_zh":239,"released_at":240},206223,"v0.4.1","## Highlights\r\n\r\n1. Visualizing edge weights in OpenSet KIE is now supported! https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F677\r\n2. Some configurations have been optimized to significantly speed up the training and testing processes! Don't worry - you can still tune these parameters in case these modifications do not work. https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F757\r\n3. Now you can use CPU to train\u002Fdebug your model! https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F752\r\n4. We have fixed a severe bug that causes users unable to call `mmocr.apis.test` with our pre-built wheels. https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F667\r\n\r\n## New Features & Enhancements\r\n\r\n* Show edge score for openset kie by @cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F677\r\n* Download flake8 from github as pre-commit hooks by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F695\r\n* Deprecate the support for 'python setup.py test' by @Harold-lkk in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F722\r\n* Disable multi-processing feature of cv2 to speed up data loading by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F721\r\n* Extend ctw1500 converter to support text fields by @Harold-lkk in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F729\r\n* Extend totaltext converter to support text fields by @Harold-lkk in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F728\r\n* Speed up training by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F739\r\n* Add setup multi-processing both in train and test.py by @Harold-lkk in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F757\r\n* Support CPU training\u002Ftesting by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F752\r\n* Support specify gpu for testing and training with gpu-id instead of gpu-ids and gpus  by @Harold-lkk in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F756\r\n* Remove unnecessary custom_import from test.py  by @Harold-lkk in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F758\r\n\r\n## Bug Fixes\r\n\r\n* Fix satrn onnxruntime test by @AllentDan in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F679\r\n* Support both ConcatDataset and UniformConcatDataset by @cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F675\r\n* Fix bugs of show_results in single_gpu_test by @cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F667\r\n* Fix a bug for sar decoder when bi-rnn is used by @MhLiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F690\r\n* Fix opencv version to avoid some bugs by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F694\r\n* Fix py39 ci error by @Harold-lkk in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F707\r\n* Update visualize.py by @TommyZihao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F715\r\n* Fix link of config by 
@cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F726\r\n* Use yaml.safe_load instead of load by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F753\r\n* Add necessary keys to test_pipelines to enable test-time visualization by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F754\r\n\r\n## Docs\r\n\r\n* Fix recog.md by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F674\r\n* Add config tutorial by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F683\r\n* Add MMSelfSup\u002FMMRazor\u002FMMDeploy in readme by @cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F692\r\n* Add recog & det model summary by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F693\r\n* Update docs link by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F710\r\n* add pull request template.md by @Harold-lkk in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F711\r\n* Add website links to readme by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F731\r\n* update readme according to standard by @Harold-lkk in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F742\r\n\r\n## New Contributors\r\n\r\n* @MhLiao made their first contribution in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F690\r\n* @TommyZihao made their first contribution in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F715\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fcompare\u002Fv0.4.0...v0.4.1","2022-01-27T06:41:56",{"id":242,"version":243,"summary_zh":244,"released_at":245},206224,"v0.4.0","## Highlights\r\n\r\n1. We release a new text recognition model - [ABINet](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.06495.pdf) (CVPR 2021, Oral). With dedicated model design and useful data augmentation transforms, ABINet achieves the best performance on irregular text recognition tasks. [Check it out!](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Ftextrecog_models.html#read-like-humans-autonomous-bidirectional-and-iterative-language-modeling-for-scene-text-recognition)\r\n2. We are also working hard to fulfill the requests from our community. [OpenSet KIE](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Fkie_models.html#wildreceiptopenset) is one of the achievements, which extends the application of SDMGR from text node classification to node-pair relation extraction. We also provide a demo script to convert WildReceipt to open set domain, though it may not take full advantage of the OpenSet format. For more information, read our [tutorial](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fkie_closeset_openset.html).\r\n3. APIs of models can be exposed through TorchServe. [Docs](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Fmodel_serving.html)\r\n\r\n## Breaking Changes & Migration Guide\r\n\r\n### Postprocessor\r\n\r\nSome refactoring processes are still going on. For all text detection models, we unified their `decode` implementations into a new module category, `POSTPROCESSOR`, which is responsible for decoding different raw outputs into boundary instances. 
In all text detection configs, the `text_repr_type` argument in `bbox_head` is deprecated and will be removed in the future release.\r\n\r\n**Migration Guide**: Find a similar line from detection model's config:\r\n```\r\ntext_repr_type=xxx,\r\n```\r\nAnd replace it with\r\n```\r\npostprocessor=dict(type='{MODEL_NAME}Postprocessor', text_repr_type=xxx)),\r\n```\r\nTake a snippet of PANet's config as an example. Before the change, its config for `bbox_head` looks like:\r\n```\r\n    bbox_head=dict(\r\n        type='PANHead',\r\n        text_repr_type='poly',\r\n        in_channels=[128, 128, 128, 128],\r\n        out_channels=6,\r\n        loss=dict(type='PANLoss')),\r\n```\r\nAfterwards:\r\n```\r\n    bbox_head=dict(\r\n    type='PANHead',\r\n    in_channels=[128, 128, 128, 128],\r\n    out_channels=6,\r\n    loss=dict(type='PANLoss'),\r\n    postprocessor=dict(type='PANPostprocessor', text_repr_type='poly')),\r\n```\r\nThere are other postprocessors and each takes different arguments. Interested users can find their interfaces or implementations in `mmocr\u002Fmodels\u002Ftextdet\u002Fpostprocess` or through our [api docs](https:\u002F\u002Fmmocr.readthedocs.io\u002Fen\u002Flatest\u002Fapi.html#textdet-postprocess).\r\n\r\n### New Config Structure\r\n\r\nWe reorganized the `configs\u002F` directory by extracting reusable sections into `configs\u002F_base_`. Now the directory tree of `configs\u002F_base_` is organized as follows:\r\n\r\n```\r\n_base_\r\n├── det_datasets\r\n├── det_models\r\n├── det_pipelines\r\n├── recog_datasets\r\n├── recog_models\r\n├── recog_pipelines\r\n└── schedules\r\n```\r\n\r\nMost of model configs are making full use of base configs now, which makes the overall structural clearer and facilitates fair comparison across models. Despite the seemingly significant hierarchical difference, **these changes would not break the backward compatibility** as the names of model configs remain the same.\r\n\r\n## New Features\r\n* Support openset kie by @cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F498\r\n* Add converter for the Open Images v5 text annotations by Krylov et al. by @baudm in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F497\r\n* Support Chinese for kie show result by @cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F464\r\n* Add TorchServe support for text detection and recognition by @Harold-lkk in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F522\r\n* Save filename in text detection test results by @cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F570\r\n* Add codespell pre-commit hook and fix typos by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F520\r\n* Avoid duplicate placeholder docs in CN by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F582\r\n* Save results to json file for kie. 
by @cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F589\r\n* Add SAR_CN to ocr.py by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F579\r\n* mim extension for windows by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F641\r\n* Support muitiple pipelines for different datasets by @cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F657\r\n* ABINet Framework by @gaotongxiao in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F651\r\n\r\n## Refactoring\r\n* Refactor textrecog config structure by @cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F617\r\n* Refactor text detection config by @cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F626\r\n* refactor transformer modules by @cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F618\r\n* refactor textdet postprocess by @cuhk-hbsun in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F640\r\n\r\n## Docs\r\n* C++ example section by @apiaccess21 in https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr","2021-12-15T03:40:06",{"id":247,"version":248,"summary_zh":249,"released_at":250},206225,"v0.3.0","## Highlights\r\n1. We add a new text recognition model -- SATRN! Its pretrained checkpoint achieves the best performance over other provided text recognition models. A lighter version of SATRN is also released which can obtain ~98% of the performance of the original model with only 45 MB in size. (@2793145003) #405\r\n2. Improve the demo script, `ocr.py`, which supports applying end-to-end text detection, text recognition and key information extraction models on images with easy-to-use commands. Users can find its full documentation in the demo section. (@samayala22, @manjrekarom) #371, #386, #400, #374, #428\r\n3. Our documentation is reorganized into a clearer structure. More useful contents are on the way! #409, #454\r\n4. The requirement of `Polygon3` is removed since this project is no longer maintained or distributed. We unified all its references to equivalent substitutions in `shapely` instead. #448\r\n\r\n## Breaking Changes & Migration Guide\r\n1. Upgrade version requirement of MMDetection to 2.14.0 to avoid bugs #382\r\n2. MMOCR now has its own model and layer registries inherited from MMDetection's or MMCV's counterparts. (#436) The modified hierarchical structure of the model registries are now organized as follows.\r\n```text\r\nmmcv.MODELS -> mmdet.BACKBONES -> BACKBONES\r\nmmcv.MODELS -> mmdet.NECKS -> NECKS\r\nmmcv.MODELS -> mmdet.ROI_EXTRACTORS -> ROI_EXTRACTORS\r\nmmcv.MODELS -> mmdet.HEADS -> HEADS\r\nmmcv.MODELS -> mmdet.LOSSES -> LOSSES\r\nmmcv.MODELS -> mmdet.DETECTORS -> DETECTORS\r\nmmcv.ACTIVATION_LAYERS -> ACTIVATION_LAYERS\r\nmmcv.UPSAMPLE_LAYERS -> UPSAMPLE_LAYERS\r\n````\r\nTo migrate your old implementation to our new backend, you need to change the import path of any registries and their corresponding builder functions (including `build_detectors`) from `mmdet.models.builder` to `mmocr.models.builder`. 
If you have referred to any model or layer of MMDetection or MMCV in your model config, you need to add `mmdet.` or `mmcv.` prefix to its name to inform the model builder of the right namespace to work on.\r\n\r\nInterested users may check out [MMCV's tutorial on Registry](https:\u002F\u002Fmmcv.readthedocs.io\u002Fen\u002Flatest\u002Funderstand_mmcv\u002Fregistry.html) for in-depth explanations on its mechanism.\r\n \r\n## New Features\r\n- Automatically replace SyncBN with BN for inference #420, #453\r\n- Support batch inference for CRNN and SegOCR #407\r\n- Support exporting documentation in pdf or epub format #406\r\n- Support `persistent_workers` option in data loader #459\r\n\r\n## Bug Fixes\r\n- Remove depreciated key in kie_test_imgs.py #381\r\n- Fix dimension mismatch in batch testing\u002Finference of DBNet #383\r\n- Fix the problem of dice loss which stays at 1 with an empty target given #408\r\n- Fix a wrong link in ocr.py (@naarkhoo) #417\r\n- Fix undesired assignment to \"pretrained\" in test.py #418\r\n- Fix a problem in polygon generation of DBNet #421, #443\r\n- Skip invalid annotations in totaltext_converter #438\r\n- Add zero division handler in poly utils, remove Polygon3 #448\r\n\r\n## Improvements\r\n- Replace lanms-proper with lanms-neo to support installation on Windows (with special thanks to @gen-ko who has re-distributed this package!)\r\n- Support MIM #394\r\n- Add tests for PyTorch 1.9 in CI #401\r\n- Enables fullscreen layout in readthedocs #413\r\n- General documentation enhancement #395\r\n- Update version checker #427\r\n- Add copyright info #439\r\n- Update citation information #440\r\n\r\n## Contributors\r\nWe thank @2793145003, @samayala22, @manjrekarom, @naarkhoo, @gen-ko, @duanjiaqi, @gaotongxiao, @cuhk-hbsun, @innerlee, @wdsd641417025 for their contribution to this release!","2021-08-25T08:52:47",{"id":252,"version":253,"summary_zh":254,"released_at":255},206226,"v0.2.1","**Highlights**\r\n1. Upgrade to use MMCV-full **>= 1.3.8** and MMDetection **>= 2.13.0** for latest features\r\n2. Add ONNX and TensorRT export tool, supporting the deployment of DBNet, PSENet, PANet and CRNN (experimental) [#278](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F278), [#291](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F291), [#300](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F300), [#328](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F328)\r\n3. 
Unified parameter initialization method which uses init_cfg in config files [#365](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F365)\r\n\r\n**New Features**\r\n\r\n- Support TextOCR dataset [#293](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F293)\r\n- Support Total-Text dataset [#266](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F266), [#273](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F273), [#357](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F357)\r\n- Support grouping text detection box into lines [#290](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F290), [#304](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F304)\r\n- Add benchmark_processing script that benchmarks data loading process [#261](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F261)\r\n- Add SynthText preprocessor for text recognition models [#351](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F351), [#361](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F361)\r\n- Support batch inference during testing [#310](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F310)\r\n- Add user-friendly OCR inference script [#366](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F366)\r\n\r\n**Bug Fixes**\r\n\r\n- Fix improper class ignorance in SDMGR Loss [#221](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F221)\r\n- Fix potential numerical zero division error in DRRG [#224](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F224)\r\n- Fix installing requirements with pip and mim [#242](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F242)\r\n- Fix dynamic input error of DBNet [#269](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F269)\r\n- Fix space parsing error in LineStrParser [#285](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F285)\r\n- Fix textsnake decode error [#264](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F264)\r\n- Correct isort setup [#288](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F288)\r\n- Fix a bug in SDMGR config [#316](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F316)\r\n- Fix kie_test_img for KIE nonvisual [#319](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F319)\r\n- Fix metafiles [#342](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F342)\r\n- Fix different device problem in FCENet [#334](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F334)\r\n- Ignore improper tailing empty characters in annotation files [#358](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F358)\r\n- Docs fixes [#247](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F247), [#255](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F255), [#265](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F265), [#267](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F267), [#268](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F268), [#270](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F270), 
[#276](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F276), [#287](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F287), [#330](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F330), [#355](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F355), [#367](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F367)\r\n- Fix NRTR config [#356](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F356), [#370](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F370)\r\n\r\n**Improvements**\r\n\r\n- Add backend for resizeocr [#244](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F244)\r\n- Skip image processing pipelines in SDMGR novisual [#260](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F260)\r\n- Speedup DBNet [#263](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F263)\r\n- Update mmcv installation method in workflow [#323](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F323)\r\n- Add part of Chinese documentations [#353](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F353), [#362](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F362)\r\n- Add support for ConcatDataset with two workflows [#348](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F348)\r\n- Add list_from_file and list_to_file utils [#226](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F226)\r\n- Speed up sort_vertex [#239](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F239)\r\n- Support distributed evaluation of KIE [#234](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F234)\r\n- Add pretrained FCENet on IC15 [#258](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F258)\r\n- Support CPU for OCR demo [#227](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F227)\r\n- Avoid extra image pre-processing steps [#375](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F375)","2021-07-20T15:30:41",{"id":257,"version":258,"summary_zh":259,"released_at":260},206227,"v0.2.0","**Highlights**\r\n\r\n1. Add the NER approach Bert-softmax (NAACL'2019)\r\n2. Add the text detection method DRRG (CVPR'2020)\r\n3. Add the text detection method FCENet (CVPR'2021)\r\n4. Increase the ease of use via adding text detection and recognition end-to-end demo, and colab online demo.\r\n5. 
Simplify the installation.\r\n\r\n**New Features**\r\n\r\n- Add Bert-softmax for Ner task [#148](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F148)\r\n- Add DRRG [#189](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F189)\r\n- Add FCENet [#133](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F133)\r\n- Add end-to-end demo [#105](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F105)\r\n- Support batch inference [#86](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F86) [#87](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F87) [#178](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F178)\r\n- Add TPS preprocessor for text recognition [#117](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F117) [#135](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F135)\r\n- Add demo documentation [#151](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F151) [#166](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F166) [#168](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F168) [#170](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F170) [#171](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F171)\r\n- Add checkpoint for Chinese recognition [#156](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F156)\r\n- Add metafile [#175](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F175) [#176](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F176) [#177](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F177) [#182](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F182) [#183](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F183)\r\n- Add support for numpy array inference [#74](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F74)\r\n\r\n**Bug Fixes**\r\n\r\n- Fix the duplicated point bug due to transform for textsnake [#130](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F130)\r\n- Fix CTC loss NaN [#159](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F159)\r\n- Fix error raised if result is empty in demo [#144](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F141)\r\n- Fix results missing if one image has a large number of boxes [#98](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F98)\r\n- Fix package missing in dockerfile [#109](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F109)\r\n\r\n**Improvements**\r\n\r\n- Simplify installation procedure via removing compiling [#188](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F188)\r\n- Speed up panet post processing so that it can detect dense texts [#188](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F188)\r\n- Add zh-CN README [#70](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F70) [#95](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F95)\r\n- Support windows [#89](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F89)\r\n- Add Colab [#147](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F147) 
[#199](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F199)\r\n- Add 1-step installation using conda environment [#193](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F193) [#194](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F194) [#195](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmocr\u002Fpull\u002F195)","2021-05-18T15:24:28",{"id":262,"version":263,"summary_zh":264,"released_at":265},206228,"v0.1.0","**Main Features**\r\n\r\n- Support text detection, text recognition and the corresponding downstream tasks such as key information extraction.\r\n- For text detection, support both single-step (`PSENet`, `PANet`, `DBNet`, `TextSnake`) and two-step (`MaskRCNN`) methods.\r\n- For text recognition, support CTC-loss based method `CRNN`; Encoder-decoder (with attention) based methods `SAR`, `Robustscanner`; Segmentation based method `SegOCR`; Transformer based method `NRTR`.\r\n- For key information extraction, support GCN based method `SDMG-R`.\r\n- Provide checkpoints and log files for all of the methods above.","2021-04-13T14:05:20"]
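The v0.6.0 notes above fix the lmdb key scheme for recognition datasets (`num-samples`, `image-000000001`, `label-000000001`), but the packing walkthrough is truncated in this payload. Below is a minimal sketch of writing a toy dataset that follows that key scheme. It is not MMOCR's `lmdb_converter.py`; it uses the generic `lmdb` Python package directly, and the output path, map size, and sample contents are illustrative assumptions.

```python
# Minimal sketch: pack a toy recognition dataset into lmdb using the key
# scheme stated in the v0.6.0 notes (num-samples / image-XXXXXXXXX /
# label-XXXXXXXXX). Not MMOCR's converter; paths and sample data are made up.
import lmdb

samples = [
    (b"<jpeg bytes of img1.jpg>", "HELLO"),  # placeholder image bytes
    (b"<jpeg bytes of img2.jpg>", "WORLD"),
]

env = lmdb.open("toy_recog_lmdb", map_size=1 << 30)  # ~1 GiB map size (assumed)
with env.begin(write=True) as txn:
    for idx, (img_bytes, text) in enumerate(samples, start=1):
        txn.put(f"image-{idx:09d}".encode(), img_bytes)      # image-000000001, ...
        txn.put(f"label-{idx:09d}".encode(), text.encode())  # label-000000001, ...
    # `num-samples` replaces the deprecated `total_number` key.
    txn.put(b"num-samples", str(len(samples)).encode())
env.close()
```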
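Several of the 0.x notes above revolve around the same entry point: the end-to-end demo added in v0.2.0, the user-friendly inference script in v0.2.1, the improved `ocr.py` in v0.3.0, and the Tesseract wrappers in v0.5.0 all go through `mmocr.utils.ocr.MMOCR`. A hedged usage sketch follows; only the `MMOCR(det=..., recog=...)` constructor is quoted in these notes, while the `readtext` call, the `'DB_r18'`/`'CRNN'` aliases, the keyword argument, and the demo image path are assumptions drawn from the 0.x documentation rather than from this changelog.

```python
# Hedged sketch of MMOCR 0.x end-to-end inference. Only the
# MMOCR(det=..., recog=...) constructor appears in the notes above;
# readtext(), the model aliases, and the image path are assumptions.
from mmocr.utils.ocr import MMOCR

ocr = MMOCR(det='DB_r18', recog='CRNN')           # assumed model aliases
results = ocr.readtext('demo/demo_text_ocr.jpg',  # assumed demo image path
                       print_result=True)         # assumed keyword argument
print(results)
```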