[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-imanoop7--Ollama-OCR":3,"tool-imanoop7--Ollama-OCR":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 
是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":68,"owner_location":68,"owner_email":68,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":10,"env_os":95,"env_gpu":95,"env_ram":95,"env_deps":96,"category_tags":101,"github_topics":68,"view_count":10,"oss_zip_url":68,"oss_zip_packed_at":68,"status":16,"created_at":102,"updated_at":103,"faqs":104,"releases":145},1017,"imanoop7\u002FOllama-OCR","Ollama-OCR",null,"Ollama-OCR 是一款开源的文字识别工具，通过调用 Ollama 平台上的先进视觉语言模型，帮你从图片和 PDF 文件中提取文字内容。它既提供 Python 包方便开发者集成，也附带网页应用让普通用户开箱即用。\n\n相比传统 OCR 工具，Ollama-OCR 能更好地理解复杂文档结构，无论是表格、图表还是信息图都能准确识别。它支持 LLaVA、Llama 3.2 Vision、Granite3.2-vision 等多种模型，可根据需求在速度和精度间灵活选择。输出格式也很丰富，包括 Markdown、JSON、结构化表格等六种选项，方便直接用于后续处理。\n\n所有识别过程都在本地完成，数据无需上传云端，特别适合对隐私要求高的场景。批量处理功能支持并行处理大量文件，还能通过自定义提示词精准提取特定信息（如日期、姓名），并支持多语言识别。\n\n这款工具主要面向开发者、研究人员以及需要处理大量文档的企业用户。如果你正在寻找一款识别准确、注重隐私且高度可定制的 OCR 解决方案，Ollama-OCR 值得一试。","\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fimanoop7\u002FOllama-OCR.svg?style=social&label=Star\" alt=\"Stargazers\">\u003C\u002Fa>\n\u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\u002Fgraphs\u002Fcommit-activity\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fm\u002Fimanoop7\u002FOllama-OCR.svg\" alt=\"Commit Activity\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fimanoop7\u002FOllama-OCR.svg\" alt=\"Last Commit\">\u003C\u002Fa>\n\n![Ollama OCR Logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_5912dfe85be2.jpg)\n\n\u003Ch1 align=\"center\">Ollama OCR\u003C\u002Fh1>\n\nA powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models through Ollama to extract text from images and PDF. Available both as a Python package and a Streamlit web application.\n\n## 🌟 Features\n\n### Supports PDF and Images (New! 🆕)\n\n- **Multiple Vision Models Support**\n  - [LLaVA](https:\u002F\u002Follama.com\u002Flibrary\u002Fllava): Efficient vision-language model for real-time processing (LLaVa model can generate wrong output sometimes)\n  - Llama 3.2 Vision: Advanced model with high accuracy for complex documents\n  - [Granite3.2-vision](https:\u002F\u002Follama.com\u002Flibrary\u002Fgranite3.2-vision): A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts,    infographics, plots, diagrams, and more.\n  - [Moondream](https:\u002F\u002Follama.com\u002Flibrary\u002Fmoondream): Small vision language model designed to run efficiently on edge devices.\n  - [Minicpm-v](https:\u002F\u002Follama.com\u002Flibrary\u002Fminicpm-v): MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344).\n\n- **Multiple Output Formats**\n  - Markdown: Preserves text formatting with headers and lists\n  - Plain Text: Clean, 
simple text extraction\n  - JSON: Structured data format\n  - Structured: Tables and organized data\n  - Key-Value Pairs: Extracts labeled information\n  - Table: Extract all tabular data.\n\n- **Batch Processing**\n  - Process multiple images in parallel\n  - Progress tracking for each image\n  - Image preprocessing (resize, normalize, etc.)\n\n- **Custom Prompts**\n  - Override default prompts with custom instructions for text extraction.\n\n## 📦 Package Installation\n\n```bash\npip install ollama-ocr\n```\n\n## 🚀 Quick Start\n### Prerequisites\n1. Install Ollama\n2. Pull the required model:\n\n```bash\nollama pull llama3.2-vision:11b\nollama pull granite3.2-vision\nollama pull moondream\nollama pull minicpm-v\n```\n## Using the Package\n\n### Single File Processing\n\n```python\nfrom ollama_ocr import OCRProcessor\n\n# Initialize OCR processor\nocr = OCRProcessor(model_name='llama3.2-vision:11b', base_url=\"http:\u002F\u002Fhost.docker.internal:11434\u002Fapi\u002Fgenerate\")  # You can use any vision model available on Ollama\n# you can pass your custom ollama api\n\n# Process an image\nresult = ocr.process_image(\n    image_path=\"path\u002Fto\u002Fyour\u002Fimage.png\", # path to your pdf files \"path\u002Fto\u002Fyour\u002Ffile.pdf\"\n    format_type=\"markdown\",  # Options: markdown, text, json, structured, key_value\n    custom_prompt=\"Extract all text, focusing on dates and names.\", # Optional custom prompt\n    language=\"English\" # Specify the language of the text (New! 
🆕)\n)\nprint(result)\n```\n### Batch File \n\n```python\nfrom ollama_ocr import OCRProcessor\n\n# Initialize OCR processor\nocr = OCRProcessor(model_name='llama3.2-vision:11b', max_workers=4)  # max workers for parallel processing\n\n# Process multiple images\n# Process multiple images with progress tracking\nbatch_results = ocr.process_batch(\n    input_path=\"path\u002Fto\u002Fimages\u002Ffolder\",  # Directory or list of image paths\n    format_type=\"markdown\",\n    recursive=True,  # Search subdirectories\n    preprocess=True,  # Enable image preprocessing\n    custom_prompt=\"Extract all text, focusing on dates and names.\", # Optional custom prompt\n    language=\"English\" # Specify the language of the text (New! 🆕)\n)\n# Access results\nfor file_path, text in batch_results['results'].items():\n    print(f\"\\nFile: {file_path}\")\n    print(f\"Extracted Text: {text}\")\n\n# View statistics\nprint(\"\\nProcessing Statistics:\")\nprint(f\"Total images: {batch_results['statistics']['total']}\")\nprint(f\"Successfully processed: {batch_results['statistics']['successful']}\")\nprint(f\"Failed: {batch_results['statistics']['failed']}\")\n```\n\n## 📋 Output Format Details\n\n1. **Markdown Format**: The output is a markdown string containing the extracted text from the image.\n2. **Text Format**: The output is a plain text string containing the extracted text from the image.\n3. **JSON Format**: The output is a JSON object containing the extracted text from the image.\n4. **Structured Format**: The output is a structured object containing the extracted text from the image.\n5. **Key-Value Format**: The output is a dictionary containing the extracted text from the image.\n6. 
**Table Format**: The output contains all tabular data extracted from the image.\n\n-----\n## 🌐 Streamlit Web Application (supports batch processing)\n- **User-Friendly Interface**\n  - Drag-and-drop file upload\n  - Real-time processing\n  - Download extracted text\n  - Image preview with details\n  - Responsive design\n  - Language Selection: Specify the language for better OCR accuracy. (New! 🆕)\n\n1. Clone the repository:\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR.git\ncd Ollama-OCR\n```\n2. Install dependencies:\n```bash\npip install -r requirements.txt\n```\n3. Go to the directory where app.py is located:\n```bash\ncd src\u002Follama_ocr\n```\n4. Run the Streamlit app:\n```bash\nstreamlit run app.py\n```\n\n## 📒 Example Notebooks\n- [Ollama OCR on Colab](example_notebooks\u002Follama_ocr_on_colab.ipynb): How to use Ollama-OCR on Google Colab.\n- [Example Notebook](example_notebooks\u002Fexample.ipynb): Example usage of Ollama OCR.\n- [Ollama OCR with Autogen](example_notebooks\u002Follama-ocr-with-autogen.ipynb): Use Ollama-OCR with AutoGen.\n- [Ollama OCR with LangGraph](example_notebooks\u002Follama-ocr-with-langgraph.ipynb): Use Ollama-OCR with LangGraph.\n\n\n## Example Output\n### Input Image\n![Input Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_b20985dfb96f.png)\n\n\n### Sample Output\n![Sample Output](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_0528b35f951b.png)\n![Sample Output](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_893d95bef3e8.png)\n\n\n## 📄 License\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n## 🙏 Acknowledgments\nBuilt with Ollama\nPowered by Vision Models\n\n\n## Star History\n\n\u003Ca href=\"https:\u002F\u002Fwww.star-history.com\u002F#imanoop7\u002FOllama-OCR&Date\">\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" 
srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_6bd56ba926e2.png&theme=dark\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_6bd56ba926e2.png\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_6bd56ba926e2.png\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n","\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fimanoop7\u002FOllama-OCR.svg?style=social&label=Star\" alt=\"Stargazers\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\u002Fgraphs\u002Fcommit-activity\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fm\u002Fimanoop7\u002FOllama-OCR.svg\" alt=\"Commit Activity\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fimanoop7\u002FOllama-OCR.svg\" alt=\"Last Commit\">\u003C\u002Fa>\n\n![Ollama OCR Logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_5912dfe85be2.jpg)\n\n\u003Ch1 align=\"center\">Ollama OCR（光学字符识别）\u003C\u002Fh1>\n\n一个强大的 OCR（光学字符识别）包，通过 Ollama 使用最先进的视觉语言模型（vision language models）从图像和 PDF 中提取文本。既可作为 Python 包使用，也可作为 Streamlit 网页应用使用。\n\n## 🌟 功能特性\n\n### 支持 PDF 和图像（新增！🆕）\n\n- **多视觉模型支持**\n  - [LLaVA](https:\u002F\u002Follama.com\u002Flibrary\u002Fllava)：高效的视觉语言模型（vision-language model），适用于实时处理（LLaVA 模型有时会生成错误输出）\n  - Llama 3.2 Vision：针对复杂文档具有高精度的先进模型\n  - [Granite3.2-vision](https:\u002F\u002Follama.com\u002Flibrary\u002Fgranite3.2-vision)：紧凑高效的视觉语言模型（vision-language model），专为视觉文档理解而设计，能够从表格、图表、信息图、绘图、图表等中自动提取内容。\n  - 
[Moondream](https:\u002F\u002Follama.com\u002Flibrary\u002Fmoondream)：小型视觉语言模型（vision language model），专为在边缘设备上高效运行而设计。\n  - [Minicpm-v](https:\u002F\u002Follama.com\u002Flibrary\u002Fminicpm-v)：MiniCPM-V 2.6 可以处理任意宽高比和最高 180 万像素（例如 1344x1344）的图像。\n\n- **多种输出格式**\n  - Markdown：保留文本格式，包括标题和列表\n  - Plain Text（纯文本）：干净、简单的文本提取\n  - JSON：结构化数据格式\n  - Structured（结构化）：表格和有组织的数据\n  - Key-Value Pairs（键值对）：提取带标签的信息\n  - Table（表格）：提取所有表格数据。\n\n- **批量处理（Batch Processing）**\n  - 并行处理多个图像\n  - 每个图像的处理进度跟踪\n  - 图像预处理（调整大小、归一化等）\n\n- **自定义提示词（Custom Prompts）**\n  - 使用自定义指令覆盖默认提示词以进行文本提取。\n\n## 📦 包安装\n\n```bash\npip install ollama-ocr\n```\n\n## 🚀 快速开始\n### 前置条件\n1. 安装 Ollama\n2. 拉取所需模型：\n\n```bash\nollama pull llama3.2-vision:11b\nollama pull granite3.2-vision\nollama pull moondream\nollama pull minicpm-v\n```\n## 使用包\n\n### 单文件处理\n\n```python\nfrom ollama_ocr import OCRProcessor\n\n# Initialize OCR processor\nocr = OCRProcessor(model_name='llama3.2-vision:11b', base_url=\"http:\u002F\u002Fhost.docker.internal:11434\u002Fapi\u002Fgenerate\")  # You can use any vision model available on Ollama\n# you can pass your custom ollama api\n\n# Process an image\nresult = ocr.process_image(\n    image_path=\"path\u002Fto\u002Fyour\u002Fimage.png\", # path to your pdf files \"path\u002Fto\u002Fyour\u002Ffile.pdf\"\n    format_type=\"markdown\",  # Options: markdown, text, json, structured, key_value\n    custom_prompt=\"Extract all text, focusing on dates and names.\", # Optional custom prompt\n    language=\"English\" # Specify the language of the text (New! 
🆕)\n)\nprint(result)\n```\n### 批量文件 \n\n```python\nfrom ollama_ocr import OCRProcessor\n\n# Initialize OCR processor\nocr = OCRProcessor(model_name='llama3.2-vision:11b', max_workers=4)  # max workers for parallel processing\n\n# Process multiple images\n# Process multiple images with progress tracking\nbatch_results = ocr.process_batch(\n    input_path=\"path\u002Fto\u002Fimages\u002Ffolder\",  # Directory or list of image paths\n    format_type=\"markdown\",\n    recursive=True,  # Search subdirectories\n    preprocess=True,  # Enable image preprocessing\n    custom_prompt=\"Extract all text, focusing on dates and names.\", # Optional custom prompt\n    language=\"English\" # Specify the language of the text (New! 🆕)\n)\n# Access results\nfor file_path, text in batch_results['results'].items():\n    print(f\"\\nFile: {file_path}\")\n    print(f\"Extracted Text: {text}\")\n\n# View statistics\nprint(\"\\nProcessing Statistics:\")\nprint(f\"Total images: {batch_results['statistics']['total']}\")\nprint(f\"Successfully processed: {batch_results['statistics']['successful']}\")\nprint(f\"Failed: {batch_results['statistics']['failed']}\")\n```\n\n## 📋 输出格式详情\n\n1. **Markdown 格式**：包含从图像中提取的文本的 markdown 字符串。\n2. **Text 格式**：包含从图像中提取的文本的纯文本字符串。\n3. **JSON 格式**：包含从图像中提取的文本的 JSON 对象。\n4. **Structured 格式**：包含从图像中提取的文本的结构化对象。\n5. **Key-Value 格式**：包含从图像中提取的文本的字典。\n6. **Table 格式**：提取所有表格数据。\n\n-----\n## 🌐 Streamlit 网页应用（支持批量处理）\n- **用户友好的界面**\n  - 拖放式文件上传\n  - 实时处理\n  - 下载提取的文本\n  - 带详细信息的图像预览\n  - 响应式设计\n  - 语言选择：指定语言以获得更好的 OCR 准确性。（新增！🆕）\n\n1. 克隆仓库：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR.git\ncd Ollama-OCR\n```\n2. 安装依赖：\n```bash\npip install -r requirements.txt\n```\n2. 进入 app.py 所在的目录：\n```bash\ncd src\u002Follama_ocr      \n```\n3. 
运行 Streamlit 应用：\n```bash\nstreamlit run app.py\n```\n\n## 📒 示例笔记本\n- [Ollama OCR on Colab](example_notebooks\\ollama_ocr_on_colab.ipynb)：如何在 Google Colab 上使用 Ollama-OCR。\n- [Example Notebook](example_notebooks\\example.ipynb)：Ollama OCR 的使用示例。\n- [Ollama OCR with Autogen](example_notebooks\\ollama-ocr-with-autogen.ipynb)：将 Ollama-OCR 与 Autogen 配合使用。\n- [Ollama OCR with LangGraph](example_notebooks\\ollama-ocr-with-langgraph.ipynb)：将 Ollama-OCR 与 LangGraph 配合使用。\n\n\n## 示例输出\n### 输入图像\n![Input Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_b20985dfb96f.png)\n\n\n### 示例输出\n![Sample Output](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_0528b35f951b.png)\n![Sample Output](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_893d95bef3e8.png)\n\n\n## 📄 许可证\n本项目采用 MIT 许可证 - 详见 LICENSE 文件。\n\n## 🙏 致谢\n基于 Ollama 构建\n由视觉模型提供支持\n\n\n## Star 历史\n\n\u003Ca href=\"https:\u002F\u002Fwww.star-history.com\u002F#imanoop7\u002FOllama-OCR&Date\">\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_6bd56ba926e2.png&theme=dark\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_6bd56ba926e2.png\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_readme_6bd56ba926e2.png\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>","# Ollama-OCR 快速上手指南\n\n## 环境准备\n\n### 系统要求\n- Python 3.x 环境\n- 已安装并运行 Ollama 服务\n\n### 前置依赖\n1. 安装 Ollama（参考 [Ollama 官网](https:\u002F\u002Follama.com)）\n2. 
拉取所需模型（推荐 `llama3.2-vision:11b`）：\n\n```bash\nollama pull llama3.2-vision:11b\n# 可选：其他模型\nollama pull granite3.2-vision\nollama pull moondream\nollama pull minicpm-v\n```\n\n## 安装步骤\n\n```bash\npip install ollama-ocr\n```\n\n## 基本使用\n\n### 1. 单文件处理（图片\u002FPDF）\n\n```python\nfrom ollama_ocr import OCRProcessor\n\n# 初始化\nocr = OCRProcessor(model_name='llama3.2-vision:11b')\n\n# 处理文件\nresult = ocr.process_image(\n    image_path=\"path\u002Fto\u002Fyour\u002Fimage.png\",  # 支持图片或 PDF\n    format_type=\"markdown\"  # 可选: markdown, text, json, structured, key_value, table\n)\n\nprint(result)\n```\n\n### 2. 批量处理\n\n```python\nfrom ollama_ocr import OCRProcessor\n\n# 初始化（设置 4 个并行工作线程）\nocr = OCRProcessor(model_name='llama3.2-vision:11b', max_workers=4)\n\n# 处理文件夹\nbatch_results = ocr.process_batch(\n    input_path=\"path\u002Fto\u002Fimages\u002Ffolder\",\n    format_type=\"markdown\",\n    recursive=True,  # 递归子目录\n    preprocess=True  # 启用图像预处理\n)\n\n# 查看结果\nfor file_path, text in batch_results['results'].items():\n    print(f\"\\n文件: {file_path}\")\n    print(f\"提取文本: {text}\")\n\n# 查看统计\nprint(f\"\\n总计: {batch_results['statistics']['total']}\")\nprint(f\"成功: {batch_results['statistics']['successful']}\")\nprint(f\"失败: {batch_results['statistics']['failed']}\")\n```\n\n### 3. 
使用自定义提示词和语言\n\n```python\nresult = ocr.process_image(\n    image_path=\"path\u002Fto\u002Fyour\u002Fimage.png\",\n    format_type=\"markdown\",\n    custom_prompt=\"提取所有文本，重点关注日期和人名。\",\n    language=\"Chinese\"  # 支持中文\n)\n```","某中型制造企业的技术团队正在为质量管理部门开发一套供应商文档数字化系统，每天需要处理来自200多家供应商的PDF质检报告、手写整改单和印刷版材料合格证，共计约500页混合格式的文档。\n\n### 没有 Ollama-OCR 时\n- 传统OCR工具对质检报告中的复杂表格和折线图识别错误率高达40%，工程师需要逐行核对数据，平均每小时只能处理15份报告\n- 手写的整改单因字迹潦草，识别率不足30%，几乎全靠人工录入，一个专员每天只能处理50张\n- 材料合格证中的中英混合技术参数和特殊符号（如μm、°C）经常被识别为乱码，导致数据入库失败\n- 批量处理时内存占用超过8GB，服务器每周崩溃2-3次，严重影响产线质检进度\n- 云端OCR服务将供应商图纸上传至公网，违反供应链数据保密协议，存在商业机密泄露风险\n\n### 使用 Ollama-OCR 后\n- Ollama-OCR的Llama 3.2 Vision模型能精准识别表格结构和技术图表，识别准确率提升至92%，工程师每小时可处理80份报告，效率提升5倍\n- 通过自定义提示词优化，手写体识别率达到85%，配合人工复核，单专员日处理量提升至300张，人力成本降低60%\n- 支持多语言混合识别和特殊符号保留，技术参数准确无误，数据入库成功率从70%提高到99.5%\n- 批量处理500页文档仅需25分钟，内存峰值控制在3GB以内，系统稳定运行无崩溃\n- 完全本地化部署在质检部门内网服务器，所有供应商文档数据不出厂区，完全符合ISO27001供应链安全要求\n\nOllama-OCR将文档处理从\"人工为主、工具为辅\"的半自动化模式，升级为真正高效、安全、可扩展的全自动化流水线，让技术团队用一周就完成了原定一个月的开发任务。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fimanoop7_Ollama-OCR_0528b35f.png","imanoop7","Anoop Maurya","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fimanoop7_a054b1b0.jpg","AI","imanoop_7","https:\u002F\u002Fimanoop7.github.io\u002F","https:\u002F\u002Fgithub.com\u002Fimanoop7",[83,87],{"name":84,"color":85,"percentage":86},"Jupyter Notebook","#DA5B0B",95.7,{"name":88,"color":89,"percentage":90},"Python","#3572A5",4.3,2276,256,"2026-04-03T07:14:47","MIT","未说明",{"notes":97,"python":95,"dependencies":98},"1. 需先安装 Ollama 服务并手动拉取所需模型（如 llama3.2-vision:11b、granite3.2-vision、moondream、minicpm-v）；2. 提供 Python 包和 Streamlit Web 界面两种使用方式；3. 支持 Docker 部署（示例中使用 host.docker.internal）；4. 包含 Google Colab、Autogen 和 LangGraph 的集成示例；5. 
支持自定义提示词、批量处理、语言选择及多种输出格式（Markdown、JSON、表格等）",[99,100],"ollama-ocr","streamlit",[14,37],"2026-03-27T02:49:30.150509","2026-04-06T08:46:06.342103",[105,110,115,120,125,130,135,140],{"id":106,"question_zh":107,"answer_zh":108,"source_url":109},4531,"处理PDF时出现PIL.UnidentifiedImageError错误怎么办？","此错误表示PIL库无法直接识别PDF文件。解决方案：1) 避免使用llava模型，改用支持PDF的模型如granite或llama3.2-vision；2) 确保代码中启用了PDF预处理功能，会自动将PDF转换为图像；3) 检查默认提示词是否针对OCR优化。有用户反馈切换为granite模型后问题解决。","https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\u002Fissues\u002F5",{"id":111,"question_zh":112,"answer_zh":113,"source_url":114},4532,"出现\"404 Client Error: Not Found for url: http:\u002F\u002Flocalhost:11434\u002Fapi\u002Fgenerate\"怎么办？","此错误通常由两个原因导致：1) **Ollama服务未运行**：需先安装Ollama应用(https:\u002F\u002Follama.com\u002Fdownload)，然后在终端执行`ollama serve`启动服务；2) **模型未下载**：使用`ollama pull \u003C模型名>`下载模型，并通过`ollama list`确认模型已存在。另外，确保代码或Streamlit界面中填写的模型名称与`ollama list`显示的完全一致。","https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\u002Fissues\u002F12",{"id":116,"question_zh":117,"answer_zh":118,"source_url":119},4533,"导入OCRProcessor时出现ModuleNotFoundError: No module named 'frontend'错误","这是PyMuPDF库的依赖冲突问题。请按顺序执行：1) 更新ollama_ocr包：`pip install --upgrade ollama-ocr`；2) 手动创建项目根目录下的\u002Fstatic文件夹；3) 安装tools包：`pip install tools`；4) 如果问题仍存在，尝试安装frontend包：`pip install frontend`。有用户通过执行这些步骤成功解决。","https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\u002Fissues\u002F27",{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},4534,"Ollama安装在远程服务器或Docker中，如何配置连接地址？","初始化OCRProcessor时通过base_url参数指定地址。例如：\n- **Docker连接宿主机**：`ocr = OCRProcessor(model_name='llama3.2-vision:11b', base_url=\"http:\u002F\u002Fhost.docker.internal:11434\")`\n- **远程服务器**：`ocr = OCRProcessor(model_name='llama3.2-vision:11b', 
base_url=\"http:\u002F\u002F\u003C服务器IP>:11434\")`\n注意：base_url只需填写协议和端口，不要包含\u002Fapi\u002Fgenerate路径。","https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\u002Fissues\u002F21",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},4535,"模型已安装但仍报404错误，提示\"page not found\"怎么办？","这是**模型名称不匹配**导致的典型问题。必须使用与`ollama list`命令显示**完全一致**的完整名称。例如，列表中显示为`llama3.2-vision:11b`，就不能只写`llama3.2-vision`。解决步骤：1) 运行`ollama list`查看已安装模型的完整名称；2) 在代码或Streamlit界面中复制粘贴完全相同的名称；3) 注意区分大小写和标签（如:11b）。","https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\u002Fissues\u002F24",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},4536,"pip安装的版本缺少custom_prompt参数怎么办？","PyPI上的版本可能滞后于GitHub最新代码。解决方案：1) 首先尝试更新：`pip install --upgrade ollama-ocr`；2) 如果仍不存在，等待维护者发布新版本（已确认会更新）；3) 作为临时方案，从GitHub源码安装最新版：`pip install git+https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR.git`。维护者建议创建新issue反馈问题。","https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\u002Fissues\u002F20",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},4537,"模型返回图像摘要而不是准确的OCR文本怎么办？","不同模型的文本提取能力差异很大：1) **更换模型**：推荐使用llama3.2-vision或granite模型，避免使用llava模型（倾向于描述图像）；2) **语言限制**：llama3.2-vision目前主要支持英文文档，非英文文档效果可能不佳；3) **检查提示词**：确保使用正确的OCR提示词，默认提示词已优化；4) **尝试小模型**：如果11B模型太慢，可尝试Qwen2-VL-2B等小型模型。","https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\u002Fissues\u002F3",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},4538,"llama3.2-vision:11b模型只返回图像描述而非OCR文本","这是模型行为问题，与Issue #3类似。具体解决：1) 明确指定输出格式参数，如`format_type=\"markdown\"`或`\"text\"`；2) 尝试添加自定义提示词强调提取文本：\"请准确提取图片中的所有文字内容，不要描述图像\"；3) 如果问题持续，切换到granite模型，多位用户反馈其在OCR任务上表现更稳定；4) 对于扫描质量差的图片，启用预处理功能`preprocess=True`。","https:\u002F\u002Fgithub.com\u002Fimanoop7\u002FOllama-OCR\u002Fissues\u002F14",[]]