[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-getomni-ai--zerox":3,"tool-getomni-ai--zerox":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 
是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":101,"forks":102,"last_commit_at":103,"license":104,"difficulty_score":23,"env_os":105,"env_gpu":105,"env_ram":105,"env_deps":106,"category_tags":114,"github_topics":115,"view_count":118,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":119,"updated_at":120,"faqs":121,"releases":156},723,"getomni-ai\u002Fzerox","zerox","OCR & Document Extraction using vision models","zerox 是一款基于视觉大模型的开源 OCR 与文档提取工具，旨在将各类文档（如 PDF、图片、DOCX）高效转换为机器可读的 Markdown 格式。传统 OCR 技术往往难以准确还原复杂的页面布局、表格或图表，导致数据质量参差不齐。zerox 通过“文件转图片 + 视觉模型解析”的逻辑，利用 AI 对视觉信息的强大理解力，完美解决了这一痛点，让文档内容真正能被 AI 轻松消化。\n\nzerox 主要面向开发者、数据研究人员以及需要自动化处理文档的团队。它提供了 Node.js 和 Python 两种开发包，支持灵活集成到现有工作流中。技术上，zerox 的一大亮点是广泛的模型兼容性，用户可以选择 OpenAI、Azure OpenAI、AWS Bedrock 或 Google Gemini 等多种主流视觉模型接口。此外，它还具备智能排版保持、多页并发处理及自定义系统提示词等功能，能根据需求调整输出风格。对于希望提升文档数字化效率、构建 RAG 系统或进行数据分析的用户来说，zerox 是一个可靠且强大的选择。","![Hero Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgetomni-ai_zerox_readme_c5b121324858.png)\n\n## Zerox OCR\n\n\u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002Fsmg2QfwtJ6\">\n  \u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgetomni-ai_zerox_readme_6118bdf8627d.png\" alt=\"Join us on Discord\" width=\"200px\">\n\u003C\u002Fa>\n\nA dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. The vision models just make sense!\n\nThe general logic:\n\n- Pass in a file (PDF, DOCX, image, etc.)\n- Convert that file into a series of images\n- Pass each image to GPT and ask nicely for Markdown\n- Aggregate the responses and return Markdown\n\nTry out the hosted version here: \u003Chttps:\u002F\u002Fgetomni.ai\u002Focr-demo>\nOr visit our full documentation at: \u003Chttps:\u002F\u002Fdocs.getomni.ai\u002Fzerox>\n\n## Getting Started\n\nZerox is available as both a Node and Python package.\n\n- [Node README](#node-zerox) - [npm package](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002Fzerox)\n- [Python README](#python-zerox) - [pip package](https:\u002F\u002Fpypi.org\u002Fproject\u002Fpy-zerox\u002F)\n\n| Feature                   | Node.js                      | Python                     |\n| ------------------------- | ---------------------------- | -------------------------- |\n| PDF Processing            | ✓ (requires graphicsmagick)  | ✓ (requires poppler)       |\n| Image Processing          | ✓                            | ✓                          |\n| OpenAI Support            | ✓                            | ✓                          |\n| Azure OpenAI Support      | ✓                            | ✓                          |\n| AWS Bedrock Support       | ✓                            | ✓                          |\n| Google Gemini Support     | ✓                            | ✓                          |\n| Vertex AI Support         | ✗                            | ✓                          |\n| Data Extraction           | ✓ (`schema`)                 | ✗                          |\n| Per-page Extraction       | ✓ 
(`extractPerPage`)         | ✗                          |\n| Custom System Prompts     | ✗                            | ✓ (`custom_system_prompt`) |\n| Maintain Format Option    | ✓ (`maintainFormat`)         | ✓ (`maintain_format`)      |\n| Async API                 | ✓                            | ✓                          |\n| Error Handling Modes      | ✓ (`errorMode`)              | ✗                          |\n| Concurrent Processing     | ✓ (`concurrency`)            | ✓ (`concurrency`)          |\n| Temp Directory Management | ✓ (`tempDir`)                | ✓ (`temp_dir`)             |\n| Page Selection            | ✓ (`pagesToConvertAsImages`) | ✓ (`select_pages`)         |\n| Orientation Correction    | ✓ (`correctOrientation`)     | ✗                          |\n| Edge Trimming             | ✓ (`trimEdges`)              | ✗                          |\n\n## Node Zerox\n\n(Node.js SDK - supports vision models from different providers like OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, Google Gemini, etc.)\n\n### Installation\n\n```sh\nnpm install zerox\n```\n\nZerox uses `graphicsmagick` and `ghostscript` for the PDF => image processing step. 
These should be pulled automatically, but you may need to manually install.\n\nOn linux use:\n\n```\nsudo apt-get update\nsudo apt-get install -y graphicsmagick\n```\n\n## Usage\n\n**With file URL**\n\n```ts\nimport { zerox } from \"zerox\";\n\nconst result = await zerox({\n  filePath: \"https:\u002F\u002Fomni-demo-data.s3.amazonaws.com\u002Ftest\u002Fcs101.pdf\",\n  credentials: {\n    apiKey: process.env.OPENAI_API_KEY,\n  },\n});\n```\n\n**From local path**\n\n```ts\nimport { zerox } from \"zerox\";\nimport path from \"path\";\n\nconst result = await zerox({\n  filePath: path.resolve(__dirname, \".\u002Fcs101.pdf\"),\n  credentials: {\n    apiKey: process.env.OPENAI_API_KEY,\n  },\n});\n```\n\n### Parameters\n\n```ts\nconst result = await zerox({\n  \u002F\u002F Required\n  filePath: \"path\u002Fto\u002Ffile\",\n  credentials: {\n    apiKey: \"your-api-key\",\n    \u002F\u002F Additional provider-specific credentials as needed\n  },\n\n  \u002F\u002F Optional\n  cleanup: true, \u002F\u002F Clear images from tmp after run\n  concurrency: 10, \u002F\u002F Number of pages to run at a time\n  correctOrientation: true, \u002F\u002F True by default, attempts to identify and correct page orientation\n  directImageExtraction: false, \u002F\u002F Extract data directly from document images instead of the markdown\n  errorMode: ErrorMode.IGNORE, \u002F\u002F ErrorMode.THROW or ErrorMode.IGNORE, defaults to ErrorMode.IGNORE\n  extractionPrompt: \"\", \u002F\u002F LLM instructions for extracting data from document\n  extractOnly: false, \u002F\u002F Set to true to only extract structured data using a schema\n  extractPerPage, \u002F\u002F Extract data per page instead of the entire document\n  imageDensity: 300, \u002F\u002F DPI for image conversion\n  imageHeight: 2048, \u002F\u002F Maximum height for converted images\n  llmParams: {}, \u002F\u002F Additional parameters to pass to the LLM\n  maintainFormat: false, \u002F\u002F Slower but helps maintain consistent 
formatting\n  maxImageSize: 15, \u002F\u002F Maximum size of images to compress, defaults to 15MB\n  maxRetries: 1, \u002F\u002F Number of retries to attempt on a failed page, defaults to 1\n  maxTesseractWorkers: -1, \u002F\u002F Maximum number of Tesseract workers. Zerox will start with a lower number and only reach maxTesseractWorkers if needed\n  model: ModelOptions.OPENAI_GPT_4O, \u002F\u002F Model to use (supports various models from different providers)\n  modelProvider: ModelProvider.OPENAI, \u002F\u002F Choose from OPENAI, BEDROCK, GOOGLE, or AZURE\n  outputDir: undefined, \u002F\u002F Save combined result.md to a file\n  pagesToConvertAsImages: -1, \u002F\u002F Page numbers to convert to image as array (e.g. `[1, 2, 3]`) or a number (e.g. `1`). Set to -1 to convert all pages\n  prompt: \"\", \u002F\u002F LLM instructions for processing the document\n  schema: undefined, \u002F\u002F Schema for structured data extraction\n  tempDir: \"\u002Fos\u002Ftmp\", \u002F\u002F Directory to use for temporary files (default: system temp directory)\n  trimEdges: true, \u002F\u002F True by default, trims pixels from all edges that contain values similar to the given background color, which defaults to that of the top-left pixel\n});\n```\n\nThe `maintainFormat` option tries to return the markdown in a consistent format by passing the output of a prior page in as additional context for the next page. This requires the requests to run synchronously, so it's a lot slower. 
But valuable if your documents have a lot of tabular data, or frequently have tables that cross pages.\n\n```\nRequest #1 => page_1_image\nRequest #2 => page_1_markdown + page_2_image\nRequest #3 => page_2_markdown + page_3_image\n```\n\n### Example Output\n\n```js\n{\n  completionTime: 10038,\n  fileName: 'invoice_36258',\n  inputTokens: 25543,\n  outputTokens: 210,\n  pages: [\n    {\n      page: 1,\n      content: '# INVOICE # 36258\\n' +\n        '**Date:** Mar 06 2012  \\n' +\n        '**Ship Mode:** First Class  \\n' +\n        '**Balance Due:** $50.10  \\n' +\n        '## Bill To:\\n' +\n        'Aaron Bergman  \\n' +\n        '98103, Seattle,  \\n' +\n        'Washington, United States  \\n' +\n        '## Ship To:\\n' +\n        'Aaron Bergman  \\n' +\n        '98103, Seattle,  \\n' +\n        'Washington, United States  \\n' +\n        '\\n' +\n        '| Item                                       | Quantity | Rate   | Amount  |\\n' +\n        '|--------------------------------------------|----------|--------|---------|\\n' +\n        \"| Global Push Button Manager's Chair, Indigo | 1        | $48.71 | $48.71  |\\n\" +\n        '| Chairs, Furniture, FUR-CH-4421             |          |        |         |\\n' +\n        '\\n' +\n        '**Subtotal:** $48.71  \\n' +\n        '**Discount (20%):** $9.74  \\n' +\n        '**Shipping:** $11.13  \\n' +\n        '**Total:** $50.10  \\n' +\n        '---\\n' +\n        '**Notes:**  \\n' +\n        'Thanks for your business!  \\n' +\n        '**Terms:**  \\n' +\n        'Order ID : CA-2012-AB10015140-40974  ',\n      contentLength: 747,\n    }\n  ],\n  extracted: null,\n  summary: {\n    totalPages: 1,\n    ocr: {\n      failed: 0,\n      successful: 1,\n    },\n    extracted: null,\n  },\n}\n```\n\n### Data Extraction\n\nZerox supports structured data extraction from documents using a schema. 
This allows you to pull specific information from documents in a structured format instead of getting the full markdown conversion.\n\nSet `extractOnly: true` and provide a `schema` to extract structured data. The schema follows the [JSON Schema standard](https:\u002F\u002Fjson-schema.org\u002Funderstanding-json-schema\u002F).\n\nUse `extractPerPage` to extract data per page instead of from the whole document at once.\n\nYou can also set `extractionModel`, `extractionModelProvider`, and `extractionCredentials` to use a different model for extraction than OCR. By default, the same model is used.\n\n### Supported Models\n\nZerox supports a wide range of models across different providers:\n\n- **Azure OpenAI**\n\n  - GPT-4 Vision (gpt-4o)\n  - GPT-4 Vision Mini (gpt-4o-mini)\n  - GPT-4.1 (gpt-4.1)\n  - GPT-4.1 Mini (gpt-4.1-mini)\n\n- **OpenAI**\n\n  - GPT-4 Vision (gpt-4o)\n  - GPT-4 Vision Mini (gpt-4o-mini)\n  - GPT-4.1 (gpt-4.1)\n  - GPT-4.1 Mini (gpt-4.1-mini)\n\n- **AWS Bedrock**\n\n  - Claude 3 Haiku (2024.03, 2024.10)\n  - Claude 3 Sonnet (2024.02, 2024.06, 2024.10)\n  - Claude 3 Opus (2024.02)\n\n- **Google Gemini**\n  - Gemini 1.5 (Flash, Flash-8B, Pro)\n  - Gemini 2.0 (Flash, Flash-Lite)\n\n```ts\nimport { zerox } from \"zerox\";\nimport { ModelOptions, ModelProvider } from \"zerox\u002Fnode-zerox\u002Fdist\u002Ftypes\";\n\n\u002F\u002F OpenAI\nconst openaiResult = await zerox({\n  filePath: \"path\u002Fto\u002Ffile.pdf\",\n  modelProvider: ModelProvider.OPENAI,\n  model: ModelOptions.OPENAI_GPT_4O,\n  credentials: {\n    apiKey: process.env.OPENAI_API_KEY,\n  },\n});\n\n\u002F\u002F Azure OpenAI\nconst azureResult = await zerox({\n  filePath: \"path\u002Fto\u002Ffile.pdf\",\n  modelProvider: ModelProvider.AZURE,\n  model: ModelOptions.OPENAI_GPT_4O,\n  credentials: {\n    apiKey: process.env.AZURE_API_KEY,\n    endpoint: process.env.AZURE_ENDPOINT,\n  },\n});\n\n\u002F\u002F AWS Bedrock\nconst bedrockResult = await zerox({\n  filePath: 
\"path\u002Fto\u002Ffile.pdf\",\n  modelProvider: ModelProvider.BEDROCK,\n  model: ModelOptions.BEDROCK_CLAUDE_3_SONNET_2024_10,\n  credentials: {\n    accessKeyId: process.env.AWS_ACCESS_KEY_ID,\n    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,\n    region: process.env.AWS_REGION,\n  },\n});\n\n\u002F\u002F Google Gemini\nconst geminiResult = await zerox({\n  filePath: \"path\u002Fto\u002Ffile.pdf\",\n  modelProvider: ModelProvider.GOOGLE,\n  model: ModelOptions.GOOGLE_GEMINI_1_5_PRO,\n  credentials: {\n    apiKey: process.env.GEMINI_API_KEY,\n  },\n});\n```\n\n## Python Zerox\n\n(Python SDK - supports vision models from different providers like OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, etc.)\n\n### Installation\n\n- Install **poppler** on the system; it should be available on the PATH. See the [pdf2image documentation](https:\u002F\u002Fpdf2image.readthedocs.io\u002Fen\u002Flatest\u002Finstallation.html) for instructions by platform.\n- Install py-zerox:\n\n```sh\npip install py-zerox\n```\n\nThe `pyzerox.zerox` function is an asynchronous API that performs OCR (Optical Character Recognition) using vision models. It processes PDF files and converts them into markdown format. 
Make sure to set up the environment variables for the model and the model provider before using this API.\n\nRefer to the [LiteLLM Documentation](https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders) for setting up the environment and passing the correct model name.\n\n### Usage\n\n```python\nfrom pyzerox import zerox\nimport os\nimport json\nimport asyncio\n\n### Model Setup (Use only Vision Models) Refer: https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders ###\n\n## placeholder for additional model kwargs which might be required for some models\nkwargs = {}\n\n## system prompt to use for the vision model\ncustom_system_prompt = None\n\n# to override\n# custom_system_prompt = \"For the below PDF page, do something..something...\" ## example\n\n###################### Example for OpenAI ######################\nmodel = \"gpt-4o-mini\" ## openai model\nos.environ[\"OPENAI_API_KEY\"] = \"\" ## your-api-key\n\n\n###################### Example for Azure OpenAI ######################\nmodel = \"azure\u002Fgpt-4o-mini\" ## \"azure\u002F\u003Cyour_deployment_name>\" -> format \u003Cprovider>\u002F\u003Cmodel>\nos.environ[\"AZURE_API_KEY\"] = \"\" # \"your-azure-api-key\"\nos.environ[\"AZURE_API_BASE\"] = \"\" # \"https:\u002F\u002Fexample-endpoint.openai.azure.com\"\nos.environ[\"AZURE_API_VERSION\"] = \"\" # \"2023-05-15\"\n\n\n###################### Example for Gemini ######################\nmodel = \"gemini\u002Fgemini-1.5-flash\" ## \"gemini\u002F\u003Cgemini_model>\" -> format \u003Cprovider>\u002F\u003Cmodel>\nos.environ['GEMINI_API_KEY'] = \"\" # your-gemini-api-key\n\n\n###################### Example for Anthropic ######################\nmodel = \"claude-3-opus-20240229\"\nos.environ[\"ANTHROPIC_API_KEY\"] = \"\" # your-anthropic-api-key\n\n###################### Vertex AI ######################\nmodel = \"vertex_ai\u002Fgemini-1.5-flash-001\" ## \"vertex_ai\u002F\u003Cmodel_name>\" -> format \u003Cprovider>\u002F\u003Cmodel>\n## GET CREDENTIALS\n## RUN ##\n# 
!gcloud auth application-default login - run this to add vertex credentials to your env\n## OR ##\nfile_path = 'path\u002Fto\u002Fvertex_ai_service_account.json'\n\n# Load the JSON file\nwith open(file_path, 'r') as file:\n    vertex_credentials = json.load(file)\n\n# Convert to JSON string\nvertex_credentials_json = json.dumps(vertex_credentials)\n\nvertex_credentials=vertex_credentials_json\n\n## extra args\nkwargs = {\"vertex_credentials\": vertex_credentials}\n\n###################### For other providers refer: https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders ######################\n\n# Define main async entrypoint\nasync def main():\n    file_path = \"https:\u002F\u002Fomni-demo-data.s3.amazonaws.com\u002Ftest\u002Fcs101.pdf\" ## local filepath and file URL supported\n\n    ## process only some pages or all\n    select_pages = None ## None for all, but could be int or list(int) page numbers (1 indexed)\n\n    output_dir = \".\u002Foutput_test\" ## directory to save the consolidated markdown file\n    result = await zerox(file_path=file_path, model=model, output_dir=output_dir,\n                        custom_system_prompt=custom_system_prompt,select_pages=select_pages, **kwargs)\n    return result\n\n\n# run the main function:\nresult = asyncio.run(main())\n\n# print markdown result\nprint(result)\n```\n\n### Parameters\n\n```python\nasync def zerox(\n    cleanup: bool = True,\n    concurrency: int = 10,\n    file_path: Optional[str] = \"\",\n    maintain_format: bool = False,\n    model: str = \"gpt-4o-mini\",\n    output_dir: Optional[str] = None,\n    temp_dir: Optional[str] = None,\n    custom_system_prompt: Optional[str] = None,\n    select_pages: Optional[Union[int, Iterable[int]]] = None,\n    **kwargs\n) -> ZeroxOutput:\n  ...\n```\n\nParameters\n\n- **cleanup** (bool, optional):\n  Whether to clean up temporary files after processing. Defaults to True.\n- **concurrency** (int, optional):\n  The number of concurrent processes to run. 
Defaults to 10.\n- **file_path** (Optional[str], optional):\n  The path to the PDF file to process. Defaults to an empty string.\n- **maintain_format** (bool, optional):\n  Whether to maintain the format from the previous page. Defaults to False.\n- **model** (str, optional):\n  The model to use for generating completions. Defaults to \"gpt-4o-mini\".\n  Refer to LiteLLM Providers for the correct model name, as it may differ depending on the provider.\n- **output_dir** (Optional[str], optional):\n  The directory to save the markdown output. Defaults to None.\n- **temp_dir** (Optional[str], optional):\n  The directory to store temporary files; defaults to a named folder in the system's temp directory. If it already exists, its contents will be deleted before Zerox uses it.\n- **custom_system_prompt** (Optional[str], optional):\n  The system prompt to use for the model; this overrides the default system prompt of Zerox. Generally it is not required unless you want some specific behavior. Defaults to None.\n- **select_pages** (Optional[Union[int, Iterable[int]]], optional):\n  Pages to process; can be a single page number or an iterable of page numbers. 
Defaults to None\n- **kwargs** (dict, optional):\n  Additional keyword arguments to pass to the litellm.completion method.\n  Refer to the LiteLLM Documentation and Completion Input for details.\n\nReturns\n\n- ZeroxOutput:\n  Contains the markdown content generated by the model and also some metadata (refer below).\n\n### Example Output (output from \"azure\u002Fgpt-4o-mini\")\n\nNote the output is manually wrapped for this documentation for better readability.\n\n````Python\nZeroxOutput(\n    completion_time=9432.975,\n    file_name='cs101',\n    input_tokens=36877,\n    output_tokens=515,\n    pages=[\n        Page(\n            content='| Type    | Description                          | Wrapper Class |\\n' +\n                    '|---------|--------------------------------------|---------------|\\n' +\n                    '| byte    | 8-bit signed 2s complement integer   | Byte          |\\n' +\n                    '| short   | 16-bit signed 2s complement integer  | Short         |\\n' +\n                    '| int     | 32-bit signed 2s complement integer  | Integer       |\\n' +\n                    '| long    | 64-bit signed 2s complement integer  | Long          |\\n' +\n                    '| float   | 32-bit IEEE 754 floating point number| Float         |\\n' +\n                    '| double  | 64-bit floating point number         | Double        |\\n' +\n                    '| boolean | may be set to true or false          | Boolean       |\\n' +\n                    '| char    | 16-bit Unicode (UTF-16) character    | Character     |\\n\\n' +\n                    'Table 26.2.: Primitive types in Java\\n\\n' +\n                    '### 26.3.1. Declaration & Assignment\\n\\n' +\n                    'Java is a statically typed language meaning that all variables must be declared before you can use ' +\n                    'them or refer to them. 
In addition, when declaring a variable, you must specify both its type and ' +\n                    'its identifier. For example:\\n\\n' +\n                    '```java\\n' +\n                    'int numUnits;\\n' +\n                    'double costPerUnit;\\n' +\n                    'char firstInitial;\\n' +\n                    'boolean isStudent;\\n' +\n                    '```\\n\\n' +\n                    'Each declaration specifies the variable’s type followed by the identifier and ending with a ' +\n                    'semicolon. The identifier rules are fairly standard: a name can consist of lowercase and ' +\n                    'uppercase alphabetic characters, numbers, and underscores but may not begin with a numeric ' +\n                    'character. We adopt the modern camelCasing naming convention for variables in our code. In ' +\n                    'general, variables must be assigned a value before you can use them in an expression. You do not ' +\n                    'have to immediately assign a value when you declare them (though it is good practice), but some ' +\n                    'value must be assigned before they can be used or the compiler will issue an error.\\n\\n' +\n                    'The assignment operator is a single equal sign, `=` and is a right-to-left assignment. That is, ' +\n                    'the variable that we wish to assign the value to appears on the left-hand-side while the value ' +\n                    '(literal, variable or expression) is on the right-hand-side. Using our variables from before, ' +\n                    'we can assign them values:\\n\\n' +\n                    '> 2 Instance variables, that is variables declared as part of an object do have default values. ' +\n                    'For objects, the default is `null`, for all numeric types, zero is the default value. 
For the ' +\n                    'boolean type, `false` is the default, and the default char value is `\\\\0`, the null-terminating ' +\n                    'character (zero in the ASCII table).',\n            content_length=2333,\n            page=1\n        )\n    ]\n)\n````\n\n## Supported File Types\n\nWe use a combination of `libreoffice` and `graphicsmagick` to do document => image conversion. For non-image \u002F non-PDF files, we use libreoffice to convert that file to a PDF, and then to an image.\n\n```js\n[\n  \"pdf\", \u002F\u002F Portable Document Format\n  \"doc\", \u002F\u002F Microsoft Word 97-2003\n  \"docx\", \u002F\u002F Microsoft Word 2007-2019\n  \"odt\", \u002F\u002F OpenDocument Text\n  \"ott\", \u002F\u002F OpenDocument Text Template\n  \"rtf\", \u002F\u002F Rich Text Format\n  \"txt\", \u002F\u002F Plain Text\n  \"html\", \u002F\u002F HTML Document\n  \"htm\", \u002F\u002F HTML Document (alternative extension)\n  \"xml\", \u002F\u002F XML Document\n  \"wps\", \u002F\u002F Microsoft Works Word Processor\n  \"wpd\", \u002F\u002F WordPerfect Document\n  \"xls\", \u002F\u002F Microsoft Excel 97-2003\n  \"xlsx\", \u002F\u002F Microsoft Excel 2007-2019\n  \"ods\", \u002F\u002F OpenDocument Spreadsheet\n  \"ots\", \u002F\u002F OpenDocument Spreadsheet Template\n  \"csv\", \u002F\u002F Comma-Separated Values\n  \"tsv\", \u002F\u002F Tab-Separated Values\n  \"ppt\", \u002F\u002F Microsoft PowerPoint 97-2003\n  \"pptx\", \u002F\u002F Microsoft PowerPoint 2007-2019\n  \"odp\", \u002F\u002F OpenDocument Presentation\n  \"otp\", \u002F\u002F OpenDocument Presentation Template\n];\n```\n\n## Credits\n\n- [Litellm](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm): \u003Chttps:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm> | This powers our python sdk to support all popular vision models from different providers.\n\n### License\n\nThis project is licensed under the MIT License.\n","![Hero 
Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgetomni-ai_zerox_readme_c5b121324858.png)\n\n## Zerox OCR\n\n\u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002Fsmg2QfwtJ6\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgetomni-ai_zerox_readme_6118bdf8627d.png\" alt=\"Join us on Discord\" width=\"200px\">\n\u003C\u002Fa>\n\n一种将文档进行 OCR（光学字符识别）处理以供 AI 摄入的极其简单的方法。毕竟，文档本质上是一种视觉呈现。包含奇怪的布局、表格、图表等。视觉模型在这里非常适用！\n\n基本逻辑如下：\n\n- 传入文件（PDF、DOCX、图片等）\n- 将该文件转换为一系列图片\n- 将每张图片发送给 GPT 并礼貌地请求 Markdown 格式\n- 聚合响应结果并返回 Markdown\n\n在此处试用托管版本：\u003Chttps:\u002F\u002Fgetomni.ai\u002Focr-demo>\n或者访问我们的完整文档：\u003Chttps:\u002F\u002Fdocs.getomni.ai\u002Fzerox>\n\n## 开始使用\n\nZerox 同时提供 Node 和 Python 包。\n\n- [Node README](#node-zerox) - [npm package](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002Fzerox)\n- [Python README](#python-zerox) - [pip package](https:\u002F\u002Fpypi.org\u002Fproject\u002Fpy-zerox\u002F)\n\n| 功能                   | Node.js                      | Python                     |\n| ------------------------- | ---------------------------- | -------------------------- |\n| PDF 处理            | ✓（需要 graphicsmagick）  | ✓（需要 poppler）       |\n| 图像处理          | ✓                            | ✓                          |\n| OpenAI 支持            | ✓                            | ✓                          |\n| Azure OpenAI 支持      | ✓                            | ✓                          |\n| AWS Bedrock 支持       | ✓                            | ✓                          |\n| Google Gemini 支持     | ✓                            | ✓                          |\n| Vertex AI 支持         | ✗                            | ✓                          |\n| 数据提取           | ✓（`schema`）                 | ✗                          |\n| 逐页提取       | ✓（`extractPerPage`）         | ✗                          |\n| 自定义系统提示词     | ✗                            | ✓（`custom_system_prompt`） |\n| 保持格式选项    | ✓（`maintainFormat`）         | ✓（`maintain_format`）      
|\n| 异步 API                 | ✓                            | ✓                          |\n| 错误处理模式      | ✓（`errorMode`）              | ✗                          |\n| 并发处理     | ✓（`concurrency`）            | ✓（`concurrency`）          |\n| 临时目录管理 | ✓（`tempDir`）                | ✓（`temp_dir`）             |\n| 页面选择            | ✓（`pagesToConvertAsImages`） | ✓（`select_pages`）         |\n| 方向校正    | ✓（`correctOrientation`）     | ✗                          |\n| 边缘裁剪             | ✓（`trimEdges`）              | ✗                          |\n\n## Node Zerox\n\n(Node.js SDK - 支持来自不同提供商的视觉模型，如 OpenAI、Azure OpenAI、Anthropic、AWS Bedrock、Google Gemini 等。)\n\n### 安装\n\n```sh\nnpm install zerox\n```\n\nZerox 使用 `graphicsmagick` 和 `ghostscript` 进行 PDF 转图片的处理步骤。这些通常会自动拉取，但您可能需要手动安装。\n\n在 Linux 上使用：\n\n```\nsudo apt-get update\nsudo apt-get install -y graphicsmagick\n```\n\n## 使用\n\n**使用文件 URL**\n\n```ts\nimport { zerox } from \"zerox\";\n\nconst result = await zerox({\n  filePath: \"https:\u002F\u002Fomni-demo-data.s3.amazonaws.com\u002Ftest\u002Fcs101.pdf\",\n  credentials: {\n    apiKey: process.env.OPENAI_API_KEY,\n  },\n});\n```\n\n**从本地路径**\n\n```ts\nimport { zerox } from \"zerox\";\nimport path from \"path\";\n\nconst result = await zerox({\n  filePath: path.resolve(__dirname, \".\u002Fcs101.pdf\"),\n  credentials: {\n    apiKey: process.env.OPENAI_API_KEY,\n  },\n});\n```\n\n### 参数\n\n```ts\nconst result = await zerox({\n  \u002F\u002F Required\n  filePath: \"path\u002Fto\u002Ffile\",\n  credentials: {\n    apiKey: \"your-api-key\",\n    \u002F\u002F Additional provider-specific credentials as needed\n  },\n\n  \u002F\u002F Optional\n  cleanup: true, \u002F\u002F Clear images from tmp after run\n  concurrency: 10, \u002F\u002F Number of pages to run at a time\n  correctOrientation: true, \u002F\u002F True by default, attempts to identify and correct page orientation\n  directImageExtraction: false, \u002F\u002F Extract data directly from document images instead of the 
markdown\n  errorMode: ErrorMode.IGNORE, \u002F\u002F ErrorMode.THROW or ErrorMode.IGNORE, defaults to ErrorMode.IGNORE\n  extractionPrompt: \"\", \u002F\u002F LLM instructions for extracting data from document\n  extractOnly: false, \u002F\u002F Set to true to only extract structured data using a schema\n  extractPerPage, \u002F\u002F Extract data per page instead of the entire document\n  imageDensity: 300, \u002F\u002F DPI for image conversion\n  imageHeight: 2048, \u002F\u002F Maximum height for converted images\n  llmParams: {}, \u002F\u002F Additional parameters to pass to the LLM\n  maintainFormat: false, \u002F\u002F Slower but helps maintain consistent formatting\n  maxImageSize: 15, \u002F\u002F Maximum size of images to compress, defaults to 15MB\n  maxRetries: 1, \u002F\u002F Number of retries to attempt on a failed page, defaults to 1\n  maxTesseractWorkers: -1, \u002F\u002F Maximum number of Tesseract workers. Zerox will start with a lower number and only reach maxTesseractWorkers if needed\n  model: ModelOptions.OPENAI_GPT_4O, \u002F\u002F Model to use (supports various models from different providers)\n  modelProvider: ModelProvider.OPENAI, \u002F\u002F Choose from OPENAI, BEDROCK, GOOGLE, or AZURE\n  outputDir: undefined, \u002F\u002F Save combined result.md to a file\n  pagesToConvertAsImages: -1, \u002F\u002F Page numbers to convert to image as array (e.g. `[1, 2, 3]`) or a number (e.g. `1`). 
Set to -1 to convert all pages\n  prompt: \"\", \u002F\u002F LLM instructions for processing the document\n  schema: undefined, \u002F\u002F Schema for structured data extraction\n  tempDir: \"\u002Fos\u002Ftmp\", \u002F\u002F Directory to use for temporary files (default: system temp directory)\n  trimEdges: true, \u002F\u002F True by default, trims pixels from all edges that contain values similar to the given background color, which defaults to that of the top-left pixel\n});\n```\n\n`maintainFormat` 选项尝试通过传递前一页面的输出作为下一页面的额外上下文，以一致的格式返回 Markdown。这需要请求同步运行，因此速度要慢得多。但如果您的文档包含大量表格数据，或经常有跨页表格，则非常有价值。\n\n```\n请求 #1 => page_1_image\n请求 #2 => page_1_markdown + page_2_image\n请求 #3 => page_2_markdown + page_3_image\n```\n\n### 示例输出\n\n```js\n{\n  completionTime: 10038,\n  fileName: 'invoice_36258',\n  inputTokens: 25543,\n  outputTokens: 210,\n  pages: [\n    {\n      page: 1,\n      content: '# INVOICE # 36258\\n' +\n        '**Date:** Mar 06 2012  \\n' +\n        '**Ship Mode:** First Class  \\n' +\n        '**Balance Due:** $50.10  \\n' +\n        '## Bill To:\\n' +\n        'Aaron Bergman  \\n' +\n        '98103, Seattle,  \\n' +\n        'Washington, United States  \\n' +\n        '## Ship To:\\n' +\n        'Aaron Bergman  \\n' +\n        '98103, Seattle,  \\n' +\n        'Washington, United States  \\n' +\n        '\\n' +\n        '| Item                                       | Quantity | Rate   | Amount  |\\n' +\n        '|--------------------------------------------|----------|--------|---------|\\n' +\n        \"| Global Push Button Manager's Chair, Indigo | 1        | $48.71 | $48.71  |\\n\" +\n        '| Chairs, Furniture, FUR-CH-4421             |          |        |         |\\n' +\n        '\\n' +\n        '**Subtotal:** $48.71  \\n' +\n        '**Discount (20%):** $9.74  \\n' +\n        '**Shipping:** $11.13  \\n' +\n        '**Total:** $50.10  \\n' +\n        '---\\n' +\n        '**Notes:**  \\n' +\n        'Thanks for your business!  
\\n' +\n        '**Terms:**  \\n' +\n        'Order ID : CA-2012-AB10015140-40974  ',\n      contentLength: 747,\n    }\n  ],\n  extracted: null,\n  summary: {\n    totalPages: 1,\n    ocr: {\n      failed: 0,\n      successful: 1,\n    },\n    extracted: null,\n  },\n}\n```\n\n### 数据提取\n\nZerox 支持使用模式（schema）从文档中提取结构化数据。这允许您以结构化格式从文档中拉取特定信息，而不是获取完整的 markdown 转换。\n\n设置 `extractOnly: true` 并提供 `schema` 以提取结构化数据。该模式遵循 [JSON Schema 标准](https:\u002F\u002Fjson-schema.org\u002Funderstanding-json-schema\u002F)。\n\n使用 `extractPerPage` 按页提取数据，而不是一次性从整个文档中提取。\n\n您还可以设置 `extractionModel`、`extractionModelProvider` 和 `extractionCredentials`，以便在提取时使用与 OCR 不同的模型。默认情况下，使用相同的模型。\n\n### 支持的模型\n\nZerox 支持来自不同提供商的多种模型：\n\n- **Azure OpenAI**\n\n  - GPT-4 Vision (gpt-4o)\n  - GPT-4 Vision Mini (gpt-4o-mini)\n  - GPT-4.1 (gpt-4.1)\n  - GPT-4.1 Mini (gpt-4.1-mini)\n\n- **OpenAI**\n\n  - GPT-4 Vision (gpt-4o)\n  - GPT-4 Vision Mini (gpt-4o-mini)\n  - GPT-4.1 (gpt-4.1)\n  - GPT-4.1 Mini (gpt-4.1-mini)\n\n- **AWS Bedrock**\n\n  - Claude 3 Haiku (2024.03, 2024.10)\n  - Claude 3 Sonnet (2024.02, 2024.06, 2024.10)\n  - Claude 3 Opus (2024.02)\n\n- **Google Gemini**\n  - Gemini 1.5 (Flash, Flash-8B, Pro)\n  - Gemini 2.0 (Flash, Flash-Lite)\n\n```ts\nimport { zerox } from \"zerox\";\nimport { ModelOptions, ModelProvider } from \"zerox\u002Fnode-zerox\u002Fdist\u002Ftypes\";\n\n\u002F\u002F OpenAI\nconst openaiResult = await zerox({\n  filePath: \"path\u002Fto\u002Ffile.pdf\",\n  modelProvider: ModelProvider.OPENAI,\n  model: ModelOptions.OPENAI_GPT_4O,\n  credentials: {\n    apiKey: process.env.OPENAI_API_KEY,\n  },\n});\n\n\u002F\u002F Azure OpenAI\nconst azureResult = await zerox({\n  filePath: \"path\u002Fto\u002Ffile.pdf\",\n  modelProvider: ModelProvider.AZURE,\n  model: ModelOptions.OPENAI_GPT_4O,\n  credentials: {\n    apiKey: process.env.AZURE_API_KEY,\n    endpoint: process.env.AZURE_ENDPOINT,\n  },\n});\n\n\u002F\u002F AWS Bedrock\nconst bedrockResult = await zerox({\n  filePath: 
\"path\u002Fto\u002Ffile.pdf\",\n  modelProvider: ModelProvider.BEDROCK,\n  model: ModelOptions.BEDROCK_CLAUDE_3_SONNET_2024_10,\n  credentials: {\n    accessKeyId: process.env.AWS_ACCESS_KEY_ID,\n    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,\n    region: process.env.AWS_REGION,\n  },\n});\n\n\u002F\u002F Google Gemini\nconst geminiResult = await zerox({\n  filePath: \"path\u002Fto\u002Ffile.pdf\",\n  modelProvider: ModelProvider.GOOGLE,\n  model: ModelOptions.GOOGLE_GEMINI_1_5_PRO,\n  credentials: {\n    apiKey: process.env.GEMINI_API_KEY,\n  },\n});\n```\n\n## Python Zerox\n\n(Python SDK - 支持来自不同提供商的视觉模型，如 OpenAI、Azure OpenAI、Anthropic、AWS Bedrock 等。)\n\n### 安装\n\n- 在系统上安装 **poppler**，确保其可在路径变量中找到。请参阅 [pdf2image 文档](https:\u002F\u002Fpdf2image.readthedocs.io\u002Fen\u002Flatest\u002Finstallation.html) 获取各平台的安装说明。\n- 安装 py-zerox:\n\n```sh\npip install py-zerox\n```\n\n`pyzerox.zerox` 函数是一个异步 API，使用视觉模型对 PDF 文件执行 OCR（光学字符识别），并将识别结果转换为 markdown 格式。在使用此 API 之前，请确保设置好模型和模型提供者的环境变量。\n\n有关设置环境和传递正确模型名称的说明，请参阅 [LiteLLM 文档](https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders)。\n\n### 使用\n\n```python\nfrom pyzerox import zerox\nimport os\nimport json\nimport asyncio\n\n### Model Setup (Use only Vision Models) Refer: https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders ###\n\n## placeholder for additional model kwargs which might be required for some models\nkwargs = {}\n\n## system prompt to use for the vision model\ncustom_system_prompt = None\n\n# to override\n# custom_system_prompt = \"For the below PDF page, do something..something...\" ## example\n\n###################### Example for OpenAI ######################\nmodel = \"gpt-4o-mini\" ## openai model\nos.environ[\"OPENAI_API_KEY\"] = \"\" ## your-api-key\n\n\n###################### Example for Azure OpenAI ######################\nmodel = \"azure\u002Fgpt-4o-mini\" ## \"azure\u002F\u003Cyour_deployment_name>\" -> format \u003Cprovider>\u002F\u003Cmodel>\nos.environ[\"AZURE_API_KEY\"] 
= \"\" # \"your-azure-api-key\"\nos.environ[\"AZURE_API_BASE\"] = \"\" # \"https:\u002F\u002Fexample-endpoint.openai.azure.com\"\nos.environ[\"AZURE_API_VERSION\"] = \"\" # \"2023-05-15\"\n\n\n###################### Example for Gemini ######################\nmodel = \"gemini\u002Fgemini-1.5-pro\" ## \"gemini\u002F\u003Cgemini_model>\" -> format \u003Cprovider>\u002F\u003Cmodel>\nos.environ['GEMINI_API_KEY'] = \"\" # your-gemini-api-key\n\n\n###################### Example for Anthropic ######################\nmodel=\"claude-3-opus-20240229\"\nos.environ[\"ANTHROPIC_API_KEY\"] = \"\" # your-anthropic-api-key\n\n###################### Vertex AI ######################\nmodel = \"vertex_ai\u002Fgemini-1.5-flash-001\" ## \"vertex_ai\u002F\u003Cmodel_name>\" -> format \u003Cprovider>\u002F\u003Cmodel>\n## GET CREDENTIALS\n## RUN ##\n# !gcloud auth application-default login - run this to add vertex credentials to your env\n## OR ##\nfile_path = 'path\u002Fto\u002Fvertex_ai_service_account.json'\n\n# Load the JSON file\nwith open(file_path, 'r') as file:\n    vertex_credentials = json.load(file)\n\n# Convert to JSON string\nvertex_credentials_json = json.dumps(vertex_credentials)\n\nvertex_credentials=vertex_credentials_json\n\n## extra args\nkwargs = {\"vertex_credentials\": vertex_credentials}\n\n###################### For other providers refer: https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders ######################\n```\n\n```python\n# Define main async entrypoint\nasync def main():\n    file_path = \"https:\u002F\u002Fomni-demo-data.s3.amazonaws.com\u002Ftest\u002Fcs101.pdf\" ## local filepath and file URL supported\n\n    ## process only some pages or all\n    select_pages = None ## None for all, but could be int or list(int) page numbers (1 indexed)\n\n    output_dir = \".\u002Foutput_test\" ## directory to save the consolidated markdown file\n    result = await zerox(file_path=file_path, model=model, output_dir=output_dir,\n                        
custom_system_prompt=custom_system_prompt,select_pages=select_pages, **kwargs)\n    return result\n\n\n# run the main function:\nresult = asyncio.run(main())\n\n# print markdown result\nprint(result)\n```\n\n### 参数\n\n```python\nasync def zerox(\n    cleanup: bool = True,\n    concurrency: int = 10,\n    file_path: Optional[str] = \"\",\n    maintain_format: bool = False,\n    model: str = \"gpt-4o-mini\",\n    output_dir: Optional[str] = None,\n    temp_dir: Optional[str] = None,\n    custom_system_prompt: Optional[str] = None,\n    select_pages: Optional[Union[int, Iterable[int]]] = None,\n    **kwargs\n) -> ZeroxOutput:\n  ...\n```\n\n参数\n\n- **cleanup** (bool, 可选):\n  是否在处理完成后清理临时文件。默认为 True。\n- **concurrency** (int, 可选):\n  要运行的并发进程数量。默认为 10。\n- **file_path** (Optional[str], 可选):\n  要处理的 PDF 文件的路径。默认为空字符串。\n- **maintain_format** (bool, 可选):\n  是否保留上一页的格式。默认为 False。\n- **model** (str, 可选):\n  用于生成补全内容的模型。默认为 \"gpt-4o-mini\"。\n  请参考 LiteLLM 提供商 (LiteLLM Providers) 以获取正确的模型名称，因为它可能因提供商而异。\n- **output_dir** (Optional[str], 可选):\n  保存 Markdown 输出的目录。默认为 None。\n- **temp_dir** (str, 可选):\n  存储临时文件的目录，默认为系统临时目录中的某个命名文件夹。如果已存在，Zerox 在使用前将删除其中的内容。\n- **custom_system_prompt** (str, 可选):\n  为模型使用的系统提示词，这将覆盖 Zerox 的默认系统提示词。通常不需要，除非您想要某些特定行为。默认为 None。\n- **select_pages** (Optional[Union[int, Iterable[int]]], 可选):\n  要处理的页面，可以是单个页码或页码的可迭代对象。默认为 None。\n- **kwargs** (dict, 可选):\n  传递给 litellm.completion 方法的额外关键字参数。\n  有关详细信息，请参阅 LiteLLM 文档和补全输入 (Completion Input)。\n\n返回\n\n- ZeroxOutput:\n  包含模型生成的 Markdown 内容以及一些元数据（见下文）。\n\n### 示例输出 (来自 \"azure\u002Fgpt-4o-mini\")\n\n注意：为了便于阅读，此文档中的输出已手动包装。\n\n````Python\nZeroxOutput(\n    completion_time=9432.975,\n    file_name='cs101',\n    input_tokens=36877,\n    output_tokens=515,\n    pages=[\n        Page(\n            content='| Type    | Description                          | Wrapper Class |\\n' +\n                    '|---------|--------------------------------------|---------------|\\n' +\n                    '| byte    | 8-bit 
signed 2s complement integer   | Byte          |\\n' +\n                    '| short   | 16-bit signed 2s complement integer  | Short         |\\n' +\n                    '| int     | 32-bit signed 2s complement integer  | Integer       |\\n' +\n                    '| long    | 64-bit signed 2s complement integer  | Long          |\\n' +\n                    '| float   | 32-bit IEEE 754 floating point number| Float         |\\n' +\n                    '| double  | 64-bit floating point number         | Double        |\\n' +\n                    '| boolean | may be set to true or false          | Boolean       |\\n' +\n                    '| char    | 16-bit Unicode (UTF-16) character    | Character     |\\n\\n' +\n                    'Table 26.2.: Primitive types in Java\\n\\n' +\n                    '### 26.3.1. Declaration & Assignment\\n\\n' +\n                    'Java is a statically typed language meaning that all variables must be declared before you can use ' +\n                    'them or refer to them. In addition, when declaring a variable, you must specify both its type and ' +\n                    'its identifier. For example:\\n\\n' +\n                    '```java\\n' +\n                    'int numUnits;\\n' +\n                    'double costPerUnit;\\n' +\n                    'char firstInitial;\\n' +\n                    'boolean isStudent;\\n' +\n                    '```\\n\\n' +\n                    'Each declaration specifies the variable’s type followed by the identifier and ending with a ' +\n                    'semicolon. The identifier rules are fairly standard: a name can consist of lowercase and ' +\n                    'uppercase alphabetic characters, numbers, and underscores but may not begin with a numeric ' +\n                    'character. We adopt the modern camelCasing naming convention for variables in our code. In ' +\n                    'general, variables must be assigned a value before you can use them in an expression. 
You do not ' +\n                    'have to immediately assign a value when you declare them (though it is good practice), but some ' +\n                    'value must be assigned before they can be used or the compiler will issue an error.\\n\\n' +\n                    'The assignment operator is a single equal sign, `=` and is a right-to-left assignment. That is, ' +\n                    'the variable that we wish to assign the value to appears on the left-hand-side while the value ' +\n                    '(literal, variable or expression) is on the right-hand-side. Using our variables from before, ' +\n                    'we can assign them values:\\n\\n' +\n                    '> 2 Instance variables, that is variables declared as part of an object do have default values. ' +\n                    'For objects, the default is `null`, for all numeric types, zero is the default value. For the ' +\n                    'boolean type, `false` is the default, and the default char value is `\\\\0`, the null-terminating ' +\n                    'character (zero in the ASCII table).',\n            content_length=2333,\n            page=1\n        )\n    ]\n)\n````\n\n## 支持的文件类型\n\n我们结合使用 `libreoffice`（LibreOffice）和 `graphicsmagick`（GraphicsMagick）来实现文档到图像的转换。对于非图像\u002F非 PDF（Portable Document Format）文件，我们使用 libreoffice 将该文件转换为 PDF，然后再转换为图像。\n\n```js\n[\n  \"pdf\", \u002F\u002F Portable Document Format\n  \"doc\", \u002F\u002F Microsoft Word 97-2003\n  \"docx\", \u002F\u002F Microsoft Word 2007-2019\n  \"odt\", \u002F\u002F OpenDocument Text\n  \"ott\", \u002F\u002F OpenDocument Text Template\n  \"rtf\", \u002F\u002F Rich Text Format\n  \"txt\", \u002F\u002F Plain Text\n  \"html\", \u002F\u002F HTML Document\n  \"htm\", \u002F\u002F HTML Document (alternative extension)\n  \"xml\", \u002F\u002F XML Document\n  \"wps\", \u002F\u002F Microsoft Works Word Processor\n  \"wpd\", \u002F\u002F WordPerfect Document\n  \"xls\", \u002F\u002F Microsoft Excel 97-2003\n  \"xlsx\", 
\u002F\u002F Microsoft Excel 2007-2019\n  \"ods\", \u002F\u002F OpenDocument Spreadsheet\n  \"ots\", \u002F\u002F OpenDocument Spreadsheet Template\n  \"csv\", \u002F\u002F Comma-Separated Values\n  \"tsv\", \u002F\u002F Tab-Separated Values\n  \"ppt\", \u002F\u002F Microsoft PowerPoint 97-2003\n  \"pptx\", \u002F\u002F Microsoft PowerPoint 2007-2019\n  \"odp\", \u002F\u002F OpenDocument Presentation\n  \"otp\", \u002F\u002F OpenDocument Presentation Template\n];\n```\n\n## 致谢\n\n- [Litellm](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm): \u003Chttps:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm> | 此项目为我们的 Python SDK（软件开发工具包）提供支持，以兼容来自不同提供商的所有流行视觉模型。\n\n### 许可证\n\n本项目根据 MIT License（MIT 许可证）获得许可。","# Zerox 快速上手指南\n\n## 简介\nZerox 是一款简单的 OCR 工具，旨在将文档（PDF、DOCX、图片等）转换为 Markdown 格式，以便 AI 模型直接摄入。它擅长处理包含复杂布局、表格和图表的文档。\n\n## 环境准备\n\n### Node.js 环境\nZerox 依赖 `graphicsmagick` 和 `ghostscript` 进行 PDF 转图片处理。\n- **Linux** 用户请手动安装：\n  ```sh\n  sudo apt-get update\n  sudo apt-get install -y graphicsmagick\n  ```\n- **Windows\u002FmacOS** 通常会自动处理，若遇到问题请查阅相关包管理器文档。\n\n### Python 环境\n- 系统需安装 `poppler`，并确保其位于环境变量 PATH 中。\n- 参考 [pdf2image 安装文档](https:\u002F\u002Fpdf2image.readthedocs.io\u002Fen\u002Flatest\u002Finstallation.html) 获取各平台安装说明。\n\n### API 密钥\n- 准备支持视觉模型的大厂 API 密钥（如 OpenAI, Azure, AWS Bedrock 等）。\n- 设置对应的环境变量（例如 `OPENAI_API_KEY`）。\n\n## 安装步骤\n\n### Node.js\n```sh\nnpm install zerox\n```\n\n### Python\n```sh\npip install py-zerox\n```\n\n## 基本使用\n\n### Node.js 示例\n支持传入本地文件路径或远程 URL，以下为使用 OpenAI 模型的极简示例：\n\n```ts\nimport { zerox } from \"zerox\";\n\nconst result = await zerox({\n  filePath: \"path\u002Fto\u002Ffile.pdf\",\n  credentials: {\n    apiKey: process.env.OPENAI_API_KEY,\n  },\n});\n\nconsole.log(result.pages[0].content);\n```\n\n### Python 示例\nPython SDK 为异步 API，需配置环境变量后调用：\n\n```python\nfrom pyzerox import zerox\nimport os\nimport asyncio\n\n# 设置环境变量\nos.environ[\"OPENAI_API_KEY\"] = \"your-api-key\"\n\nasync def main():\n    result = await zerox(\n      
  file_path=\"path\u002Fto\u002Ffile.pdf\",\n        model=\"gpt-4o-mini\"\n    )\n    print(result)\n\nasyncio.run(main())\n```\n\n## 支持的模型\nZerox 支持多种模型提供商，包括 OpenAI、Azure OpenAI、AWS Bedrock 和 Google Gemini。开发者可通过 `modelProvider` 和 `model` 参数灵活切换。","某金融科技公司的研发工程师正在搭建内部合规文档知识库，急需将数百份扫描版 PDF 合同转化为可被 AI 精准检索的结构化文本。\n\n### 没有 zerox 时\n- 传统 OCR 引擎对复杂表格和跨页内容识别率低，导致数据严重错位\n- 人工校对修复格式耗时巨大，单份百页合同需耗费数小时才能可用\n- 原始文档中的图表和特殊符号在转换过程中直接丢失，关键信息缺失\n- 需要针对不同供应商的 API 编写定制化解析脚本，系统耦合度高且维护困难\n\n### 使用 zerox 后\n- 直接输出高质量 Markdown，自动还原表格结构与段落层级关系\n- 基于视觉模型理解页面布局，无需人工干预即可处理复杂排版与图表\n- 支持 OpenAI、Azure 等多种大模型接口，灵活适配现有云架构环境\n- 通过 Node.js 或 Python 调用 zerox 快速集成，仅需几行代码即可实现批量自动化处理\n\nzerox 利用视觉模型替代传统 OCR 技术，让非结构化文档瞬间变为机器友好的 Markdown 格式。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgetomni-ai_zerox_1d42704d.png","getomni-ai","OmniAI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fgetomni-ai_4323f791.png","AI agents for lenders",null,"founders@getomni.ai","getomni_ai","https:\u002F\u002Fgetomni.ai","https:\u002F\u002Fgithub.com\u002Fgetomni-ai",[85,89,93,97],{"name":86,"color":87,"percentage":88},"TypeScript","#3178c6",67.6,{"name":90,"color":91,"percentage":92},"Python","#3572A5",27.2,{"name":94,"color":95,"percentage":96},"JavaScript","#f1e05a",3.1,{"name":98,"color":99,"percentage":100},"Makefile","#427819",2.1,12196,836,"2026-04-04T14:18:38","MIT","未说明",{"notes":107,"python":105,"dependencies":108},"Node 版本需系统安装 graphicsmagick 和 ghostscript；Python 版本需系统安装 poppler 并参考 pdf2image 文档。核心 OCR 及视觉处理依赖外部大模型 API（如 OpenAI、Azure、AWS Bedrock、Google Gemini），需配置对应 API Key 凭证。支持异步处理和并发控制。",[109,110,111,112,113,67],"graphicsmagick","ghostscript","poppler","pdf2image","py-zerox",[14],[116,117],"ocr","pdf",4,"2026-03-27T02:49:30.150509","2026-04-06T05:37:28.904190",[122,127,132,137,141,146,151],{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},3053,"使用 Gemini 模型处理图片时报错 \"Failed to process image\" 如何处理？","这通常是因为 Google 更改了 LLM 端点配置。请确保在代码中指定具体的模型版本号，例如将 model = 
\"gemini\u002Fgemini-1.5-pro\" 修改为 model = \"gemini\u002Fgemini-1.5-pro-002\"。","https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fissues\u002F67",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},3054,"运行 Azure OpenAI 示例代码时报错 \"Azure client is not an instance of AsyncAzureOpenAI\" 怎么办？","这是 LiteLLM 依赖库的版本兼容性问题。建议安装特定版本的 LiteLLM，执行命令：pip install litellm==1.53.3 后重试。","https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fissues\u002F170",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},3055,"如何确认自定义系统提示词（custom_system_prompt）是否正常工作？","可以在自定义提示词中明确指示模型返回 JSON 格式而非 Markdown。通过观察输出结果是否变为 JSON 格式，即可确认自定义提示词已生效。","https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fissues\u002F50",{"id":138,"question_zh":139,"answer_zh":140,"source_url":136},3056,"PDF 转 Markdown 时内容不完整或缺失是什么原因？","不同模型的处理能力存在差异。GPT-4o 比 4o-mini 能返回更多的 OCR 内容，但成本更高。如果 4o-mini 的结果不满意，建议切换至 4o 模型以获得更完整的提取效果。",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},3057,"如何将 Zerox 生成的 Markdown 内容转换为 HTML 以支持 EPUB 导出？","可以使用 Python 的 markdown 库进行手动转换。遍历 zerox_response['pages']，获取每页 content 并使用 markdown.markdown(content) 生成 HTML 字符串，最后保存为 .html 文件。","https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fissues\u002F51",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},3058,"安装后页面顺序混乱或功能异常，可能是版本问题吗？","PyPI 上托管的可能仍是旧版本（如 0.0.3）。建议先卸载 pyzerox，然后通过 Git 仓库安装最新版：pip uninstall pyzerox 然后 pip install git+https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox.git。","https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fissues\u002F43",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},3059,"项目是否支持 Azure OpenAI 或其他第三方模型提供商？","官方原生支持有限，推荐集成 LiteLLM 库（https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm）。它支持通过单一 API 接口调用几乎所有流行的大模型提供商，包括 Azure、Anthropic 和 AWS Bedrock 
等。","https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fissues\u002F13",[157,162,167,172,177,182,186,190,195],{"id":158,"version":159,"summary_zh":160,"released_at":161},102577,"v0.1.04","## What's Changed\r\n* 🩴 Add pre-process step to correct image orientation by @tylermaran in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F52\r\n* Fix ENAMETOOLONG error by truncating file names by @mefengl in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F61\r\n\r\n## New Contributors\r\n* @mefengl made their first contribution in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F61\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fcompare\u002Fv0.1.03...v0.1.04","2024-10-21T16:02:52",{"id":163,"version":164,"summary_zh":165,"released_at":166},102578,"v0.1.03","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fcompare\u002Fv0.1.02...v0.1.03","2024-10-20T18:31:17",{"id":168,"version":169,"summary_zh":170,"released_at":171},102579,"v0.1.02","## What's Changed\r\n* fix: allow empty select_pages again by @mg6 in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F53\r\n\r\n## New Contributors\r\n* @mg6 made their first contribution in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F53\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fcompare\u002Fv0.1.01...v0.1.02","2024-10-18T04:43:39",{"id":173,"version":174,"summary_zh":175,"released_at":176},102580,"v0.1.01","## What's Changed\r\n* Fixes the correct order processing in python by @michaelfeil in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F26\r\n* Page ordering\u002Fsorting fix, fix to use unique temp dir if userinput not provided also with better cleanup, added async shutil (aioshutil) by @pradhyumna85 in 
https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F34\r\n* Add model parameters to restrict max tokens by @xdotli in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F22\r\n* fix: page numbering when only converting specific pages by @dfdeagle47 in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F24\r\n* minor typo fix by @pradhyumna85 in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F36\r\n* Python SDK: Feat. process specific pages, Other fixes and improvements by @pradhyumna85 in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F39\r\n* Fix not getting file types from urls without a file extension by @xdotli in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F45\r\n\r\n## New Contributors\r\n* @michaelfeil made their first contribution in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F26\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fcompare\u002Fv0.1.0...v0.1.01","2024-10-15T19:15:31",{"id":178,"version":179,"summary_zh":180,"released_at":181},102581,"v0.1.0","Big thanks to @pradhyumna85 for this PR. 
\r\n\r\n## What's Changed\r\n* FEAT: Introducing support for vision models from all major providers like Azure OpenAI, Anthropic etc and custom system prompt in python SDK by @pradhyumna85 in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F21\r\n* fix - missing total token count calculation for maintain_format = False by @pradhyumna85 in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F32\r\n\r\n## New Contributors\r\n* @pradhyumna85 made their first contribution in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F21\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fcompare\u002Fv0.0.2...v0.1.0","2024-09-12T05:16:48",{"id":183,"version":184,"summary_zh":79,"released_at":185},102582,"v0.0.2","2024-09-10T19:28:39",{"id":187,"version":188,"summary_zh":79,"released_at":189},102583,"v0.0.1","2024-09-10T19:14:34",{"id":191,"version":192,"summary_zh":193,"released_at":194},102575,"v0.1.06","## What's Changed\r\n* Fix Tesseract scheduler change by @ZeeshanZulfiqarAli in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F108\r\n* Use Tesseract's Orientation detection by @ZeeshanZulfiqarAli in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F111\r\n* allow optional density for node by @alexander-densley in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F116\r\n* added performance testing by @kailingding in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F119\r\n* Update py-zerox installation instructions for clarity by @bjlange in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F120\r\n* Tyler\u002Fadd ignore warnings + increase default token limit by @tylermaran in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F100\r\n* add density and height to python as well by @alexander-densley in 
https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F118\r\n\r\n## New Contributors\r\n* @alexander-densley made their first contribution in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F116\r\n* @bjlange made their first contribution in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F120\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fcompare\u002Fv0.1.05...v0.1.06","2024-12-18T03:41:04",{"id":196,"version":197,"summary_zh":198,"released_at":199},102576,"v0.1.05","## What's Changed\r\n* Update README.md for graphicsmagick install, otherwise npm throws an error by @masparasol in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F59\r\n* docs: update README.md by @eltociear in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F68\r\n* Fine tuned models by @ZeeshanZulfiqarAli in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F81\r\n* Add testing mechanism by @ZeeshanZulfiqarAli in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F82\r\n* Add test cases by @ZeeshanZulfiqarAli in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F85\r\n* Add test cases 0005.pdf - 0010.pdf by @annapo23 in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F89\r\n* Add 10 more test cases by @kailingding in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F92\r\n* Add pre- and post-processing callbacks by @annapo23 in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F93\r\n* Add page number param for onPreProcess and onPostProcess callbacks by @ZeeshanZulfiqarAli in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F98\r\n* Use uuid as file name by @ZeeshanZulfiqarAli in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F102\r\n* Tesseract worker usage improvements by 
@ZeeshanZulfiqarAli in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F101\r\n* Revert \"Tesseract worker usage improvements\" by @annapo23 in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F105\r\n* add encoding to avoid error in windows by @rezawr in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F104\r\n\r\n## New Contributors\r\n* @masparasol made their first contribution in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F59\r\n* @ZeeshanZulfiqarAli made their first contribution in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F81\r\n* @kailingding made their first contribution in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F92\r\n* @rezawr made their first contribution in https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fpull\u002F104\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgetomni-ai\u002Fzerox\u002Fcompare\u002Fv0.1.04...v0.1.05","2024-11-23T01:38:07"]
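上文 README 中 `maintainFormat` 选项的链式调用逻辑（请求 #1 => page_1_image；请求 #2 => page_1_markdown + page_2_image）可以用下面的 Python 草图示意。其中 `run_with_maintain_format`、`ocr_page`、`fake_ocr` 等名称均为说明用的假设占位，并非 zerox 的真实 API；真实场景中 `ocr_page` 对应一次视觉模型请求：

```python
from typing import Callable, List

def run_with_maintain_format(
    page_images: List[str],
    ocr_page: Callable[[str, str], str],
) -> List[str]:
    """顺序处理各页：把前一页的 markdown 作为下一次请求的额外上下文。"""
    results: List[str] = []
    prior_markdown = ""  # 请求 #1 没有前置上下文
    for image in page_images:
        # 请求 #N = page_{N-1}_markdown + page_N_image
        markdown = ocr_page(prior_markdown, image)
        results.append(markdown)
        prior_markdown = markdown
    return results

# 桩函数：把每次请求收到的上下文记录进输出，便于观察链式传递
def fake_ocr(prior: str, image: str) -> str:
    return f"md({image})<-[{prior}]"

pages = run_with_maintain_format(
    ["page_1_image", "page_2_image", "page_3_image"], fake_ocr
)
print(pages[1])  # md(page_2_image)<-[md(page_1_image)<-[]]
```

由于每个请求都依赖上一页的输出，这种模式只能串行执行，这也正是 README 指出它比默认并发处理慢得多的原因。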