[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-githubharald--SimpleHTR":3,"tool-githubharald--SimpleHTR":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160411,2,"2026-04-18T23:33:24",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":76,"owner_twitter":76,"owner_website":78,"owner_url":79,"languages":80,"stars":85,"forks":86,"last_commit_at":87,"license":88,"difficulty_score":89,"env_os":90,"env_gpu":91,"env_ram":92,"env_deps":93,"category_tags":100,"github_topics":101,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":108,"updated_at":109,"faqs":110,"releases":145},9452,"githubharald\u002FSimpleHTR","SimpleHTR","Handwritten Text Recognition (HTR) system implemented with TensorFlow.","SimpleHTR 是一个基于 TensorFlow 构建的开源手写文本识别（HTR）系统，旨在将包含手写单词或整行文本的图片自动转换为可编辑的数字文本。它有效解决了传统 OCR 工具在处理非标准、连笔或个性化手写内容时识别率低的问题，特别适用于需要从手写笔记、历史文档或表单中提取信息的场景。\n\n该工具非常适合开发者、人工智能研究人员以及需要定制手写识别功能的技术团队使用。用户可以直接利用预训练模型进行推理，也能基于 IAM 数据集重新训练模型以适应特定需求。SimpleHTR 的核心亮点在于其灵活的解码策略：除了支持基础的 CTC 解码外，还集成了独特的“词束搜索（Word Beam Search）”算法。该算法结合词典约束，能显著降低拼写错误，即使在处理复杂语境时也能比常规解码器更准确地还原原文。此外，系统不仅支持单个单词识别，还能处理包含多个单词的整行文本，并提供了清晰的命令行接口和 
Web 演示，方便用户快速上手验证效果。","# Handwritten Text Recognition with TensorFlow\n\n* **Update 2023\u002F2: a [web demo](https:\u002F\u002Fgithubharald.github.io\u002Ftext_reader.html) is available**\n* **Update 2023\u002F1: see [HTRPipeline](https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FHTRPipeline) for a package to read full pages**\n* **Update 2021\u002F2: recognize text on line level (multiple words)**\n* **Update 2021\u002F1: more robust model, faster dataloader, word beam search decoder also available for Windows**\n* **Update 2020: code is compatible with TF2**\n\n\nHandwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset.\nThe model takes **images of single words or text lines (multiple words) as input** and **outputs the recognized text**.\nAbout 3\u002F4 of the words in the validation set are recognized correctly, and the character error rate is around 10%.\n\n![htr](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgithubharald_SimpleHTR_readme_3d01d44bd5a7.png)\n\n\n## Run demo\n\n* Download one of the pretrained models\n  * [Model trained on word images](https:\u002F\u002Fwww.dropbox.com\u002Fs\u002Fmya8hw6jyzqm0a3\u002Fword-model.zip?dl=1): \n    only handles single words per image, but gives better results on the IAM word dataset\n  * [Model trained on text line images](https:\u002F\u002Fwww.dropbox.com\u002Fs\u002F7xwkcilho10rthn\u002Fline-model.zip?dl=1):\n    can handle multiple words in one image\n* Put the contents of the downloaded zip file into the `model` directory of the repository  \n* Go to the `src` directory \n* Run inference code:\n  * Execute `python main.py` to run the model on an image of a word\n  * Execute `python main.py --img_file ..\u002Fdata\u002Fline.png` to run the model on an image of a text line\n\nThe input images, and the expected outputs are shown below when the text line model is 
used.\n\n![test](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgithubharald_SimpleHTR_readme_326f03ef9661.png)\n```\n> python main.py\nInit with stored values from ..\u002Fmodel\u002Fsnapshot-13\nRecognized: \"word\"\nProbability: 0.9806370139122009\n```\n\n![test](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgithubharald_SimpleHTR_readme_f8c4e3545ac5.png)\n\n```\n> python main.py --img_file ..\u002Fdata\u002Fline.png\nInit with stored values from ..\u002Fmodel\u002Fsnapshot-13\nRecognized: \"or work on line level\"\nProbability: 0.6674373149871826\n```\n\n## Command line arguments\n* `--mode`: select between \"train\", \"validate\" and \"infer\". Defaults to \"infer\".\n* `--decoder`: select from CTC decoders \"bestpath\", \"beamsearch\" and \"wordbeamsearch\". Defaults to \"bestpath\". For option \"wordbeamsearch\" see details below.\n* `--batch_size`: batch size.\n* `--data_dir`: directory containing IAM dataset (with subdirectories `img` and `gt`).\n* `--fast`: use LMDB to load images faster.\n* `--line_mode`: train reading text lines instead of single words.\n* `--img_file`: image that is used for inference.\n* `--dump`: dumps the output of the NN to CSV file(s) saved in the `dump` folder. 
Can be used as input for the [CTCDecoder](https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FCTCDecoder).\n\n\n## Integrate word beam search decoding\n\nThe [word beam search decoder](https:\u002F\u002Fgithubharald.github.io\u002Fpublications.html) can be used instead of the two decoders shipped with TF.\nWords are constrained to those contained in a dictionary, but arbitrary non-word character strings (numbers, punctuation marks) can still be recognized.\nThe following illustration shows a sample for which word beam search is able to recognize the correct text, while the other decoders fail.\n\n![decoder_comparison](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgithubharald_SimpleHTR_readme_73e554ecb064.png)\n\nFollow these instructions to integrate word beam search decoding:\n\n1. Clone repository [CTCWordBeamSearch](https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FCTCWordBeamSearch)\n2. Compile and install by running `pip install .` at the root level of the CTCWordBeamSearch repository\n3. Specify the command line option `--decoder wordbeamsearch` when executing `main.py` to actually use the decoder\n\nThe dictionary is automatically created in training and validation mode by using all words contained in the IAM dataset (i.e. 
also including words from validation set) and is saved into the file `data\u002Fcorpus.txt`.\nFurther, the manually created list of word-characters can be found in the file `model\u002FwordCharList.txt`.\nBeam width is set to 50 to conform with the beam width of vanilla beam search decoding.\n\n\n## Train model on IAM dataset\n\n### Prepare dataset\nFollow these instructions to get the IAM dataset:\n\n* Register for free at this [website](http:\u002F\u002Fwww.fki.inf.unibe.ch\u002Fdatabases\u002Fiam-handwriting-database)\n* Download `words\u002Fwords.tgz`\n* Download `ascii\u002Fwords.txt`\n* Create a directory for the dataset on your disk, and create two subdirectories: `img` and `gt`\n* Put `words.txt` into the `gt` directory\n* Put the content (directories `a01`, `a02`, ...) of `words.tgz` into the `img` directory\n\n### Run training\n\n* Delete files from `model` directory if you want to train from scratch\n* Go to the `src` directory and execute `python main.py --mode train --data_dir path\u002Fto\u002FIAM`\n* The IAM dataset is split into 95% training data and 5% validation data  \n* If the option `--line_mode` is specified, \n  the model is trained on text line images created by combining multiple word images into one  \n* Training stops after a fixed number of epochs without improvement\n\nThe pretrained word model was trained with this command on a GTX 1050 Ti:\n```\npython main.py --mode train --fast --data_dir path\u002Fto\u002Fiam  --batch_size 500 --early_stopping 15\n```\n\nAnd the line model with:\n```\npython main.py --mode train --fast --data_dir path\u002Fto\u002Fiam  --batch_size 250 --early_stopping 10\n```\n\n\n### Fast image loading\nLoading and decoding the png image files from the disk is the bottleneck even when using only a small GPU.\nThe database LMDB is used to speed up image loading:\n* Go to the `src` directory and run `create_lmdb.py --data_dir path\u002Fto\u002Fiam` with the IAM data directory specified\n* A subfolder `lmdb` is 
created in the IAM data directory containing the LMDB files\n* When training the model, add the command line option `--fast`\n\nThe dataset should be located on an SSD drive.\nUsing the `--fast` option and a GTX 1050 Ti training on single words takes around 3h with a batch size of 500.\nTraining on text lines takes a bit longer.\n\n\n## Information about model\n\nThe model is a stripped-down version of the HTR system I implemented for [my thesis](https:\u002F\u002Fgithubharald.github.io\u002Fpublications.html).\nWhat remains is the bare minimum to recognize text with an acceptable accuracy.\nIt consists of 5 CNN layers, 2 RNN (LSTM) layers and the CTC loss and decoding layer.\nFor more details see this [Medium article](https:\u002F\u002Fharald-scheidl.medium.com\u002F2326a3487cd5).\n\n\n## References\n* [Build a Handwritten Text Recognition System using TensorFlow](https:\u002F\u002Fharald-scheidl.medium.com\u002F2326a3487cd5)\n* [Scheidl - Handwritten Text Recognition in Historical Documents](https:\u002F\u002Fgithubharald.github.io\u002Fpublications.html)\n* [Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm](https:\u002F\u002Fgithubharald.github.io\u002Fpublications.html)\n\n","# 使用 TensorFlow 进行手写文本识别\n\n* **更新 2023\u002F2：已提供一个 [网页演示](https:\u002F\u002Fgithubharald.github.io\u002Ftext_reader.html)**\n* **更新 2023\u002F1：请参阅 [HTRPipeline](https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FHTRPipeline)，这是一个用于读取整页文本的工具包**\n* **更新 2021\u002F2：支持行级文本识别（多个单词）**\n* **更新 2021\u002F1：模型更加稳健，数据加载器速度更快，词束搜索解码器现在也适用于 Windows 系统**\n* **更新 2020：代码兼容 TF2**\n\n\n使用 TensorFlow (TF) 实现的手写文本识别 (HTR) 系统，并在 IAM 离线 HTR 数据集上进行训练。该模型以**单个单词或文本行（多个单词）的图像作为输入**，并**输出识别后的文本**。\n验证集中的 3\u002F4 的单词被正确识别，字符错误率约为 10%。\n\n![htr](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgithubharald_SimpleHTR_readme_3d01d44bd5a7.png)\n\n\n## 运行演示\n\n* 下载其中一个预训练模型\n  * 
[基于单词图像训练的模型](https:\u002F\u002Fwww.dropbox.com\u002Fs\u002Fmya8hw6jyzqm0a3\u002Fword-model.zip?dl=1)：\n    每张图像仅处理单个单词，但在 IAM 单词数据集上效果更好\n  * [基于文本行图像训练的模型](https:\u002F\u002Fwww.dropbox.com\u002Fs\u002F7xwkcilho10rthn\u002Fline-model.zip?dl=1)：\n    可以处理一张图像中的多个单词\n* 将下载的压缩包内容解压到仓库的 `model` 目录中\n* 进入 `src` 目录\n* 运行推理代码：\n  * 执行 `python main.py` 来对单个单词的图像进行推理\n  * 执行 `python main.py --img_file ..\u002Fdata\u002Fline.png` 来对文本行的图像进行推理\n\n以下展示了使用文本行模型时的输入图像及预期输出。\n\n![test](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgithubharald_SimpleHTR_readme_326f03ef9661.png)\n```\n> python main.py\nInit with stored values from ..\u002Fmodel\u002Fsnapshot-13\nRecognized: \"word\"\nProbability: 0.9806370139122009\n```\n\n![test](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgithubharald_SimpleHTR_readme_f8c4e3545ac5.png)\n\n```\n> python main.py --img_file ..\u002Fdata\u002Fline.png\nInit with stored values from ..\u002Fmodel\u002Fsnapshot-13\nRecognized: \"or work on line level\"\nProbability: 0.6674373149871826\n```\n\n## 命令行参数\n* `--mode`: 选择“train”（训练）、“validate”（验证）或“infer”（推理）。默认为“infer”。\n* `--decoder`: 选择 CTC 解码器“bestpath”、“beamsearch”和“wordbeamsearch”。默认为“bestpath”。关于“wordbeamsearch”选项，请参见下文详细说明。\n* `--batch_size`: 批量大小。\n* `--data_dir`: 包含 IAM 数据集的目录（包含子目录 `img` 和 `gt`）。\n* `--fast`: 使用 LMDB 加速图像加载。\n* `--line_mode`: 训练文本行识别而非单个单词。\n* `--img_file`: 用于推理的图像。\n* `--dump`: 将神经网络的输出转储到 `dump` 文件夹中保存的 CSV 文件中。这些文件可用作 [CTCDecoder](https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FCTCDecoder) 的输入。\n\n\n## 集成词束搜索解码器\n\n[词束搜索解码器](https:\u002F\u002Fgithubharald.github.io\u002Fpublications.html)可以替代 TF 自带的两种解码器使用。\n单词会被限制在字典中包含的词汇范围内，但仍然可以识别任意非单词字符序列（如数字、标点符号）。\n下图展示了一个示例，在这个例子中，词束搜索能够正确识别文本，而其他解码器则失败。\n\n![decoder_comparison](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgithubharald_SimpleHTR_readme_73e554ecb064.png)\n\n按照以下步骤集成词束搜索解码器：\n\n1. 
克隆 [CTCWordBeamSearch](https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FCTCWordBeamSearch) 仓库\n2. 在 CTCWordBeamSearch 仓库的根目录下运行 `pip install .` 进行编译和安装\n3. 在执行 `main.py` 时指定命令行选项 `--decoder wordbeamsearch`，即可实际使用该解码器\n\n字典会在训练和验证模式下自动创建，使用 IAM 数据集中包含的所有单词（包括验证集中的单词），并保存到文件 `data\u002Fcorpus.txt` 中。此外，手动创建的单词字符列表可在文件 `model\u002FwordCharList.txt` 中找到。束宽设置为 50，以与原生束搜索解码器的束宽保持一致。\n\n\n## 在 IAM 数据集上训练模型\n\n### 准备数据集\n按照以下步骤获取 IAM 数据集：\n\n* 在此 [网站](http:\u002F\u002Fwww.fki.inf.unibe.ch\u002Fdatabases\u002Fiam-handwriting-database) 上免费注册\n* 下载 `words\u002Fwords.tgz`\n* 下载 `ascii\u002Fwords.txt`\n* 在您的磁盘上创建一个数据集目录，并创建两个子目录：`img` 和 `gt`\n* 将 `words.txt` 放入 `gt` 目录\n* 将 `words.tgz` 中的内容（目录 `a01`, `a02`, 等）放入 `img` 目录\n\n### 运行训练\n\n* 如果您想从头开始训练，请删除 `model` 目录中的文件\n* 进入 `src` 目录，执行 `python main.py --mode train --data_dir path\u002Fto\u002FIAM`\n* IAM 数据集被划分为 95% 的训练数据和 5% 的验证数据\n* 如果指定了 `--line_mode` 选项，\n  模型将基于将多个单词图像组合成一张图像的文本行图像进行训练\n* 当经过固定数量的轮次后仍无改进时，训练将停止\n\n预训练的单词模型是在 GTX 1050 Ti 显卡上使用以下命令训练的：\n```\npython main.py --mode train --fast --data_dir path\u002Fto\u002Fiam  --batch_size 500 --early_stopping 15\n```\n\n而文本行模型则是使用以下命令训练的：\n```\npython main.py --mode train --fast --data_dir path\u002Fto\u002Fiam  --batch_size 250 --early_stopping 10\n```\n\n\n### 快速图像加载\n即使只使用小型 GPU，从磁盘加载和解码 PNG 图像文件仍然是瓶颈。为了加速图像加载，使用了 LMDB 数据库：\n* 这里进入 `src` 目录，运行 `create_lmdb.py --data_dir path\u002Fto\u002Fiam`，指定 IAM 数据集目录\n* 在 IAM 数据集目录中会创建一个名为 `lmdb` 的子文件夹，其中包含 LMDB 文件\n* 在训练模型时，添加命令行选项 `--fast`\n\n数据集应位于 SSD 驱动器上。使用 `--fast` 选项并在 GTX 1050 Ti 上以 500 的批量大小训练单个单词大约需要 3 小时。而训练文本行则需要更长的时间。\n\n\n## 模型相关信息\n\n该模型是我为 [我的论文](https:\u002F\u002Fgithubharald.github.io\u002Fpublications.html) 实现的 HTR 系统的精简版本。它保留了以可接受的准确率识别文本所需的最低限度功能。模型由 5 层 CNN、2 层 RNN（LSTM）以及 CTC 损失和解码层组成。更多细节请参阅这篇 [Medium 文章](https:\u002F\u002Fharald-scheidl.medium.com\u002F2326a3487cd5)。\n\n\n## 参考文献\n* [使用 TensorFlow 构建手写文本识别系统](https:\u002F\u002Fharald-scheidl.medium.com\u002F2326a3487cd5)\n* [Scheidl - 
历史文献中的手写文本识别](https:\u002F\u002Fgithubharald.github.io\u002Fpublications.html)\n* [Scheidl - 词束搜索：一种连接时序分类解码算法](https:\u002F\u002Fgithubharald.github.io\u002Fpublications.html)","# SimpleHTR 快速上手指南\n\nSimpleHTR 是一个基于 TensorFlow 实现的手写文本识别（HTR）系统，支持识别单个单词或整行文本。本项目在 IAM 离线手写数据集上训练，字符错误率约为 10%。\n\n## 环境准备\n\n*   **操作系统**: Linux, macOS 或 Windows\n*   **Python**: 建议 Python 3.6+\n*   **深度学习框架**: TensorFlow 2.x (代码已兼容 TF2)\n*   **依赖库**:\n    ```bash\n    pip install tensorflow opencv-python numpy\n    ```\n    *(注：如需使用更快的数据加载功能，需额外安装 `lmdb`)*\n\n## 安装步骤\n\n1.  **克隆仓库**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FSimpleHTR.git\n    cd SimpleHTR\n    ```\n\n2.  **下载预训练模型**\n    根据需求选择以下任一模型，并将解压后的内容放入项目根目录下的 `model` 文件夹中：\n\n    *   **单词模型** (推荐用于单字识别，准确率更高):\n        [下载链接](https:\u002F\u002Fwww.dropbox.com\u002Fs\u002Fmya8hw6jyzqm0a3\u002Fword-model.zip?dl=1)\n    *   **文本行模型** (支持一行多词识别):\n        [下载链接](https:\u002F\u002Fwww.dropbox.com\u002Fs\u002F7xwkcilho10rthn\u002Fline-model.zip?dl=1)\n\n    *操作示例:*\n    ```bash\n    mkdir -p model\n    # 假设已下载 word-model.zip\n    unzip word-model.zip -d model\u002F\n    ```\n\n3.  **(可选) 集成高级解码器**\n    若需使用更精准的 \"Word Beam Search\" 解码器（需字典约束），请执行：\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FCTCWordBeamSearch.git\n    cd CTCWordBeamSearch\n    pip install .\n    cd ..\n    ```\n\n## 基本使用\n\n进入 `src` 目录即可运行推理。\n\n### 1. 识别单个单词图片\n使用默认配置（加载单词模型）：\n```bash\ncd src\npython main.py\n```\n**输出示例:**\n```text\nInit with stored values from ..\u002Fmodel\u002Fsnapshot-13\nRecognized: \"word\"\nProbability: 0.9806370139122009\n```\n\n### 2. 识别整行文本图片\n指定图片路径并使用文本行模型（确保 `model` 目录下放置的是 line-model）：\n```bash\npython main.py --img_file ..\u002Fdata\u002Fline.png\n```\n**输出示例:**\n```text\nInit with stored values from ..\u002Fmodel\u002Fsnapshot-13\nRecognized: \"or work on line level\"\nProbability: 0.6674373149871826\n```\n\n### 3. 
常用参数说明\n*   `--img_file`: 指定待识别的图片路径。\n*   `--decoder`: 选择解码器，可选 `bestpath` (默认), `beamsearch`, `wordbeamsearch`。\n*   `--mode`: 切换模式，可选 `infer` (默认), `train`, `validate`。\n\n*(进阶训练及数据集准备请参考项目原始文档)*","某档案馆数字化团队正致力于将大量 20 世纪的手写会议记录转化为可检索的电子文本，以便历史学家进行关键词研究。\n\n### 没有 SimpleHTR 时\n- **人工转录效率极低**：工作人员需逐字手动输入手写内容，处理一页文档平均耗时数小时，项目周期被无限拉长。\n- **通用 OCR 识别失败**：传统印刷体识别工具无法适应潦草的手写笔迹，输出结果充满乱码，几乎不可用。\n- **缺乏行级上下文理解**：现有的简单模型只能识别单个单词，无法处理连贯的句子，导致语义断裂，后期校对成本极高。\n- **部署门槛高**：自行训练深度学习模型需要深厚的 TensorFlow  expertise 和大量标注数据，团队难以在短时间内构建可用系统。\n\n### 使用 SimpleHTR 后\n- **自动化批量识别**：利用预训练模型直接对整行手写图像进行推理，几分钟内即可完成过去数小时的人工工作量，效率提升百倍。\n- **高精度手写适配**：SimpleHTR 专为 IAM 手写数据集训练，能准确识别连笔和特殊字迹，字符错误率控制在 10% 左右，大幅减少人工修正。\n- **智能行级解码**：通过集成\"Word Beam Search\"解码器，模型能结合词典约束识别整句内容，有效纠正形近词错误，保证语句通顺。\n- **开箱即用的灵活性**：团队无需从头训练，只需下载预训练权重并运行简单的 Python 命令（如 `python main.py --img_file`），即可在本地快速部署验证。\n\nSimpleHTR 通过将复杂的深度学习手写识别能力封装为易用的命令行工具，让非 AI 专家团队也能低成本实现高质量的历史文档数字化。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgithubharald_SimpleHTR_3d01d44b.png","githubharald","Harald Scheidl","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fgithubharald_42b65c37.jpg","Interested in computer vision, deep learning, C, C++ and Python.",null,"Austria","https:\u002F\u002Fgithubharald.github.io","https:\u002F\u002Fgithub.com\u002Fgithubharald",[81],{"name":82,"color":83,"percentage":84},"Python","#3572A5",100,2159,915,"2026-04-18T18:34:26","MIT",4,"未说明, Windows (明确提及支持 Word Beam Search 解码器)","非必需，但推荐使用。文中提及在 GTX 1050 Ti 上训练耗时约 3 小时，表明低端 NVIDIA GPU 即可运行。未指定具体显存大小或 CUDA 版本要求。","未说明",{"notes":94,"python":95,"dependencies":96},"1. 该工具基于 TensorFlow 2 实现。2. 若需使用更精准的 'wordbeamsearch' 解码器，需额外克隆并编译安装 'CTCWordBeamSearch' 仓库（涉及 C++ 扩展编译）。3. 为加速图像加载，建议将数据集放置在 SSD 硬盘上并使用 LMDB 格式。4. 预训练模型需手动下载并放入 'model' 目录。5. 
训练数据需自行从 IAM 数据库注册下载并按特定目录结构整理。","未说明 (需兼容 TensorFlow 2)",[97,98,99],"tensorflow>=2.0","CTCWordBeamSearch (可选，需单独编译安装)","lmdb (可选，用于加速数据加载)",[15,14],[102,103,104,105,106,107],"handwritten-text-recognition","ocr","recurrent-neural-networks","tensorflow","deep-learning","machine-learning","2026-03-27T02:49:30.150509","2026-04-19T15:38:06.051069",[111,116,121,126,131,136,141],{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},42400,"增加卷积层数量或使用批归一化（Batch Normalization）对模型精度有何影响？","实验表明，加倍卷积层数量可以显著提高准确率（例如达到 76% 的词准确率），且在约 25 个 epoch 时终止训练比训练小型网络到收敛更快且效果更好。此外，引入批归一化（BN）层也能带来明显收益，可视化数据显示 BN 能改变损失下降速率并防止过拟合。代码实现上，可在 `conv2d` 和 `relu` 之间插入 `tf.layers.batch_normalization` 层。","https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FSimpleHTR\u002Fissues\u002F38",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},42397,"如何配置模型以支持非英语语言（如泰语、波斯语等无空格语言）的字符集和训练？","对于无空格语言（如泰语），默认的 Word Beam Search (WBS) 可能会错误地插入非单词字符。解决方法涉及修改 C++ 源码：在 `Beam.cpp` 中，默认逻辑会在遇到非单词字符时重置当前单词记忆，这适用于有分隔符的语言。对于无空格语言，需要调整代码逻辑，使得在一个单词完成后立即开始新单词，而不强制插入分隔符。具体需查看 `LanguageModel::getNextChars` 到 `Beam::createChildBeam` 的数据传递逻辑。此外，字符列表长度需根据目标语言调整，若长度不匹配会导致 TensorFlow 报错，需确保解码器设置的字符集长度与实际字符列表一致。","https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FSimpleHTR\u002Fissues\u002F122",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},42398,"为什么模型只能识别单个单词，无法处理包含多个单词的句子图像？","该模型主要在 IAM 数据集上训练，该数据集主要由手写单词组成，因此模型可能学习了特定的语言属性（如某些字符组合更可能出现）。如果输入是句子，效果可能不佳。建议检查输入图像的书写风格是否与训练数据一致。若需处理句子，可能需要使用专门针对句子训练的模式或调整预处理步骤，确保图像高度固定（如 32 像素）且宽度可变，同时增强文本的清晰度和粗度以改善识别效果。","https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FSimpleHTR\u002Fissues\u002F9",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},42399,"如何在项目中找到或生成所需的 JSON 文件（summary.json）？","JSON 文件通常包含在预训练模型包中。如果遇到找不到 `summary.json` 的错误，请检查模型目录结构。确保只将一个模型（单词模型或行模型）的内容解压到 `model\u002F` 目录下，不要混合存放。例如，若使用单词模型，应将所有相关文件（包括 json、权重文件等）直接放在 `model\u002F` 文件夹根目录下，这样程序才能正确读取路径 
`..\u002Fmodel\u002Fsummary.json`。","https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FSimpleHTR\u002Fissues\u002F172",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},42401,"如何启用 GPU 进行模型训练？","代码默认支持 TensorFlow 的 GPU 加速，无需特殊标志。如果代码仅在 CPU 上运行，请首先确认本地 TensorFlow 是否正确安装并配置了 GPU 支持（即安装了 `tensorflow-gpu` 且 CUDA\u002FcuDNN 环境正常）。可以通过在其他训练中验证 GPU 是否工作来排查环境问题。若环境无误，代码应自动利用 GPU 进行训练。","https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FSimpleHTR\u002Fissues\u002F105",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},42402,"如何评估模型性能以及使用 Beam Search 的效果？","可以通过运行命令 `python main.py --validate --beamsearch` 来验证模型性能。使用该命令后，典型的输出结果包括字符错误率（Character error rate）和词准确率（Word accuracy）。例如，某次运行结果显示字符错误率为 10.46%，词准确率为 74.00%。这表明启用 Beam Search 能有效提升识别准确率。","https:\u002F\u002Fgithub.com\u002Fgithubharald\u002FSimpleHTR\u002Fissues\u002F31",{"id":142,"question_zh":143,"answer_zh":144,"source_url":135},42403,"训练时准确率始终为 0 且预测结果不变，可能是什么原因？","这种情况通常意味着模型未发生有效训练。常见原因包括：1. 字符集配置错误，导致解码器字符列表长度与设定值（如英语为 79）不匹配，引发 TensorFlow 错误；2. 图像预处理不当，建议将图像高度统一调整为 32 像素，宽度可变，并确保文本清晰加粗；3. 学习率或优化器设置问题。对于非拉丁语系（如波斯语），需特别注意字符集的定义和 CTC 解码器的适配性。",[]]