[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-facebookresearch--nougat":3,"tool-facebookresearch--nougat":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 
是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":23,"env_os":95,"env_gpu":96,"env_ram":97,"env_deps":98,"category_tags":107,"github_topics":79,"view_count":108,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":109,"updated_at":110,"faqs":111,"releases":142},232,"facebookresearch\u002Fnougat","nougat","Implementation of Nougat Neural Optical Understanding for Academic Documents","Nougat 是一款专为学术文档设计的智能 PDF 解析工具，能将论文、报告等复杂排版的 PDF 文件精准转换为结构化的 Markdown 格式，尤其擅长识别数学公式（LaTeX）和表格。它解决了传统 OCR 工具在处理科研文献时公式错乱、排版丢失、语义断裂的问题，让机器真正“读懂”学术内容。\n\n适合研究人员、数据工程师、教育工作者或任何需要批量处理学术 PDF 的用户使用，尤其对需提取公式或构建论文数据库的人非常实用。开发者也可通过 API 集成到自己的系统中。\n\n技术上，Nougat 基于深度学习模型，采用端到端架构直接从图像生成结构化文本，支持 GPU 加速，并提供轻量级（small）与标准版（base）两种模型选择。输出格式兼容 Mathpix Markdown，便于后续渲染或导入笔记软件。命令行操作简单，也支持指定页码或批量处理，灵活适应不同场景。虽然目前对部分设备的失败检测机制尚不稳定，但可通过参数调整规避问题。开源免费，由 Meta 研究团队维护，持续更新中。","\u003Cdiv align=\"center\">\n\u003Ch1>Nougat: Neural Optical Understanding for Academic 
Documents\u003C\u002Fh1>\n\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arxiv.2308.13418-white)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.13418)\n[![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Ffacebookresearch\u002Fnougat)](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat)\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fnougat-ocr?logo=pypi)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fnougat-ocr)\n[![Python 3.9+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.9+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002Frelease\u002Fpython-390\u002F)\n[![Code style: black](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcode%20style-black-000000.svg)](https:\u002F\u002Fgithub.com\u002Fpsf\u002Fblack)\n[![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗%20Hugging%20Face-Community%20Space-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fysharma\u002Fnougat)\n\n\u003C\u002Fdiv>\n\nThis is the official repository for Nougat, the academic document PDF parser that understands LaTeX math and tables.\n\nProject page: https:\u002F\u002Ffacebookresearch.github.io\u002Fnougat\u002F\n\n## Install\n\nFrom pip:\n```\npip install nougat-ocr\n```\n\nFrom repository:\n```\npip install git+https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\n```\n\n> Note, on Windows: If you want to utilize a GPU, make sure you first install the correct PyTorch version. 
Follow instructions [here](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F)\n\nThere are extra dependencies if you want to call the model from an API or generate a dataset.\nInstall via\n\n`pip install \"nougat-ocr[api]\"` or `pip install \"nougat-ocr[dataset]\"`\n\n### Get prediction for a PDF\n#### CLI\n\nTo get predictions for a PDF run\n\n```\n$ nougat path\u002Fto\u002Ffile.pdf -o output_directory\n```\n\nA path to a directory or to a file where each line is a path to a PDF can also be passed as a positional argument\n\n```\n$ nougat path\u002Fto\u002Fdirectory -o output_directory\n```\n\n```\nusage: nougat [-h] [--batchsize BATCHSIZE] [--checkpoint CHECKPOINT] [--model MODEL] [--out OUT]\n              [--recompute] [--markdown] [--no-skipping] pdf [pdf ...]\n\npositional arguments:\n  pdf                   PDF(s) to process.\n\noptions:\n  -h, --help            show this help message and exit\n  --batchsize BATCHSIZE, -b BATCHSIZE\n                        Batch size to use.\n  --checkpoint CHECKPOINT, -c CHECKPOINT\n                        Path to checkpoint directory.\n  --model MODEL_TAG, -m MODEL_TAG\n                        Model tag to use.\n  --out OUT, -o OUT     Output directory.\n  --recompute           Recompute already computed PDF, discarding previous predictions.\n  --full-precision      Use float32 instead of bfloat16. Can speed up CPU conversion for some setups.\n  --no-markdown         Do not add postprocessing step for markdown compatibility.\n  --markdown            Add postprocessing step for markdown compatibility (default).\n  --no-skipping         Don't apply failure detection heuristic.\n  --pages PAGES, -p PAGES\n                        Provide page numbers like '1-4,7' for pages 1 through 4 and page 7. Only works for single PDFs.\n```\n\nThe default model tag is `0.1.0-small`. 
If you want to use the base model, use `0.1.0-base`.\n```\n$ nougat path\u002Fto\u002Ffile.pdf -o output_directory -m 0.1.0-base\n```\n\nIn the output directory every PDF will be saved as a `.mmd` file, a lightweight markup language that is mostly compatible with [Mathpix Markdown](https:\u002F\u002Fgithub.com\u002FMathpix\u002Fmathpix-markdown-it) (we make use of the LaTeX tables).\n\n> Note: On some devices the failure detection heuristic is not working properly. If you experience a lot of `[MISSING_PAGE]` responses, try to run with the `--no-skipping` flag. Related: [#11](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\u002Fissues\u002F11), [#67](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\u002Fissues\u002F67)\n\n#### API\n\nWith the extra dependencies installed, you can use `app.py` to start an API. Call\n\n```sh\n$ nougat_api\n```\n\nThen get a prediction for a PDF file by making a POST request to http:\u002F\u002F127.0.0.1:8503\u002Fpredict\u002F. It also accepts parameters `start` and `stop` to limit the computation to select page numbers (boundaries are included).\n\nThe response is a string with the markdown text of the document.\n\n```sh\ncurl -X 'POST' \\\n  'http:\u002F\u002F127.0.0.1:8503\u002Fpredict\u002F' \\\n  -H 'accept: application\u002Fjson' \\\n  -H 'Content-Type: multipart\u002Fform-data' \\\n  -F 'file=@\u003CPDFFILE.pdf>;type=application\u002Fpdf'\n```\nTo limit the conversion to pages 1 to 5, use the start\u002Fstop parameters in the request URL: http:\u002F\u002F127.0.0.1:8503\u002Fpredict\u002F?start=1&stop=5\n\n## Dataset\n### Generate dataset\n\nTo generate a dataset you need \n\n1. A directory containing the PDFs\n2. A directory containing the `.html` files (processed `.tex` files by [LaTeXML](https:\u002F\u002Fmath.nist.gov\u002F~BMiller\u002FLaTeXML\u002F)) with the same folder structure\n3. 
A binary file of [pdffigures2](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fpdffigures2) and a corresponding environment variable `export PDFFIGURES_PATH=\"\u002Fpath\u002Fto\u002Fbinary.jar\"`\n\nNext run\n\n```\npython -m nougat.dataset.split_htmls_to_pages --html path\u002Fhtml\u002Froot --pdfs path\u002Fpdf\u002Froot --out path\u002Fpaired\u002Foutput --figure path\u002Fpdffigures\u002Foutputs\n```\n\nAdditional arguments include\n\n| Argument              | Description                                |\n| --------------------- | ------------------------------------------ |\n| `--recompute`         | recompute all splits                       |\n| `--markdown MARKDOWN` | Markdown output dir                        |\n| `--workers WORKERS`   | How many processes to use                  |\n| `--dpi DPI`           | What resolution the pages will be saved at |\n| `--timeout TIMEOUT`   | max time per paper in seconds              |\n| `--tesseract`         | Tesseract OCR prediction for each page     |\n\nFinally create a `jsonl` file that contains all the image paths, markdown text and meta information.\n\n```\npython -m nougat.dataset.create_index --dir path\u002Fpaired\u002Foutput --out index.jsonl\n```\n\nFor each `jsonl` file you also need to generate a seek map for faster data loading:\n\n```\npython -m nougat.dataset.gen_seek file.jsonl\n```\n\nThe resulting directory structure can look as follows:\n\n```\nroot\u002F\n├── images\n├── train.jsonl\n├── train.seek.map\n├── test.jsonl\n├── test.seek.map\n├── validation.jsonl\n└── validation.seek.map\n```\n\nNote that the `.mmd` and `.json` files in the `path\u002Fpaired\u002Foutput` (here `images`) are no longer required.\nThis can be useful for pushing to a S3 bucket by halving the amount of files.\n\n## Training\n\nTo train or fine tune a Nougat model, run \n\n```\npython train.py --config config\u002Ftrain_nougat.yaml\n```\n\n## Evaluation\n\nRun \n\n```\npython test.py --checkpoint 
path\u002Fto\u002Fcheckpoint --dataset path\u002Fto\u002Ftest.jsonl --save_path path\u002Fto\u002Fresults.json\n```\n\nTo get the results for the different text modalities, run\n\n```\npython -m nougat.metrics path\u002Fto\u002Fresults.json\n```\n\n## FAQ\n\n- Why am I only getting `[MISSING_PAGE]`?\n\n  Nougat was trained on scientific papers found on arXiv and PMC. Is the document you're processing similar to that?\n  What language is the document in? Nougat works best with English papers; other Latin-based languages might work. **Chinese, Russian, Japanese etc. will not work**.\n  If these requirements are fulfilled, the cause might be false positives in the failure detection when computing on CPU or older GPUs ([#11](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\u002Fissues\u002F11)). Try passing the `--no-skipping` flag for now.\n\n- Where can I download the model checkpoint from?\n\n  They are uploaded here on GitHub in the release section. You can also download them during the first execution of the program. 
Choose the preferred model by passing `--model 0.1.0-{base,small}`\n\n## Citation\n\n```\n@misc{blecher2023nougat,\n      title={Nougat: Neural Optical Understanding for Academic Documents}, \n      author={Lukas Blecher and Guillem Cucurull and Thomas Scialom and Robert Stojnic},\n      year={2023},\n      eprint={2308.13418},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG}\n}\n```\n\n## Acknowledgments\n\nThis repository builds on top of the [Donut](https:\u002F\u002Fgithub.com\u002Fclovaai\u002Fdonut\u002F) repository.\n\n## License\n\nNougat codebase is licensed under MIT.\n\nNougat model weights are licensed under CC-BY-NC.\n","\u003Cdiv align=\"center\">\n\u003Ch1>Nougat：用于学术文档的神经光学理解模型\u003C\u002Fh1>\n\n[![论文](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F论文-arxiv.2308.13418-white)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.13418)\n[![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Ffacebookresearch\u002Fnougat)](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat)\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fnougat-ocr?logo=pypi)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fnougat-ocr)\n[![Python 3.9+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.9+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002Frelease\u002Fpython-390\u002F)\n[![代码风格: black](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcode%20style-black-000000.svg)](https:\u002F\u002Fgithub.com\u002Fpsf\u002Fblack)\n[![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗%20Hugging%20Face-社区空间-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fysharma\u002Fnougat)\n\n\u003C\u002Fdiv>\n\n这是 Nougat 的官方仓库。Nougat 是一款能够理解 LaTeX 数学公式和表格的学术 PDF 文档解析器。\n\n项目主页：https:\u002F\u002Ffacebookresearch.github.io\u002Fnougat\u002F\n\n## 安装\n\n通过 pip 安装：\n```\npip install nougat-ocr\n```\n\n通过仓库安装：\n```\npip install 
git+https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\n```\n\n> 注意（Windows 用户）：如需使用 GPU，请先安装正确的 PyTorch 版本。请参考 [此处说明](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F)\n\n若需通过 API 调用模型或生成数据集，还需安装额外依赖项。\n可通过以下命令安装：\n\n`pip install \"nougat-ocr[api]\"` 或 `pip install \"nougat-ocr[dataset]\"`\n\n### 对 PDF 进行预测\n#### 命令行界面 (CLI)\n\n对单个 PDF 文件进行预测：\n\n```\n$ nougat path\u002Fto\u002Ffile.pdf -o output_directory\n```\n\n也可传入目录路径，或包含每行一个 PDF 路径的文件：\n\n```\n$ nougat path\u002Fto\u002Fdirectory -o output_directory\n```\n\n```\n用法: nougat [-h] [--batchsize BATCHSIZE] [--checkpoint CHECKPOINT] [--model MODEL] [--out OUT]\n              [--recompute] [--markdown] [--no-skipping] pdf [pdf ...]\n\n位置参数:\n  pdf                   待处理的 PDF 文件。\n\n选项:\n  -h, --help            显示帮助信息并退出\n  --batchsize BATCHSIZE, -b BATCHSIZE\n                        使用的批处理大小。\n  --checkpoint CHECKPOINT, -c CHECKPOINT\n                        检查点目录路径。\n  --model MODEL_TAG, -m MODEL_TAG\n                        使用的模型标签。\n  --out OUT, -o OUT     输出目录。\n  --recompute           重新计算已处理过的 PDF，丢弃之前的预测结果。\n  --full-precision      使用 float32 而非 bfloat16。在某些配置下可加速 CPU 转换。\n  --no-markdown         不添加为兼容 markdown 的后处理步骤。\n  --markdown            添加为兼容 markdown 的后处理步骤（默认）。\n  --no-skipping         不应用失败检测启发式方法。\n  --pages PAGES, -p PAGES\n                        指定页码范围，如 '1-4,7' 表示第 1 至 4 页及第 7 页。仅适用于单个 PDF。\n```\n\n默认模型标签为 `0.1.0-small`。如需使用基础模型，请使用 `0.1.0-base`：\n```\n$ nougat path\u002Fto\u002Ffile.pdf -o output_directory -m 0.1.0-base\n```\n\n输出目录中每个 PDF 将保存为 `.mmd` 文件（轻量级标记语言），基本兼容 [Mathpix Markdown](https:\u002F\u002Fgithub.com\u002FMathpix\u002Fmathpix-markdown-it)（我们利用了其中的 LaTeX 表格功能）。\n\n> 注意：在某些设备上，失败检测启发式方法可能无法正常工作。如果遇到大量 `[MISSING_PAGE]` 响应，请尝试使用 `--no-skipping` 参数运行。相关问题：[#11](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\u002Fissues\u002F11)，[#67](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\u002Fissues\u002F67)\n\n#### 
API\n\n安装额外依赖后，可通过 `app.py` 启动 API。执行：\n\n```sh\n$ nougat_api\n```\n\n向 http:\u002F\u002F127.0.0.1:8503\u002Fpredict\u002F 发送 POST 请求即可获取 PDF 文件的预测结果。请求支持 `start` 和 `stop` 参数，用于限定计算的页码范围（含边界）。\n\n响应内容为文档的 markdown 文本字符串。\n\n```sh\ncurl -X 'POST' \\\n  'http:\u002F\u002F127.0.0.1:8503\u002Fpredict\u002F' \\\n  -H 'accept: application\u002Fjson' \\\n  -H 'Content-Type: multipart\u002Fform-data' \\\n  -F 'file=@\u003CPDFFILE.pdf>;type=application\u002Fpdf'\n```\n如需将转换限制在第 1 至 5 页，请在请求 URL 中加入 start\u002Fstop 参数：http:\u002F\u002F127.0.0.1:8503\u002Fpredict\u002F?start=1&stop=5\n\n## 数据集\n### 生成数据集\n\n生成数据集需要：\n\n1. 包含 PDF 文件的目录\n2. 包含 `.html` 文件的目录（由 [LaTeXML](https:\u002F\u002Fmath.nist.gov\u002F~BMiller\u002FLaTeXML\u002F) 处理 `.tex` 文件生成），且目录结构与 PDF 目录一致\n3. [pdffigures2](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fpdffigures2) 的二进制文件，并设置环境变量 `export PDFFIGURES_PATH=\"\u002Fpath\u002Fto\u002Fbinary.jar\"`\n\n接着运行：\n\n```\npython -m nougat.dataset.split_htmls_to_pages --html path\u002Fhtml\u002Froot --pdfs path\u002Fpdf\u002Froot --out path\u002Fpaired\u002Foutput --figure path\u002Fpdffigures\u002Foutputs\n```\n\n其他参数包括：\n\n| 参数                  | 描述                                       |\n| --------------------- | ------------------------------------------ |\n| `--recompute`         | 重新计算所有分割                           |\n| `--markdown MARKDOWN` | Markdown 输出目录                          |\n| `--workers WORKERS`   | 使用的进程数                               |\n| `--dpi DPI`           | 页面保存时的分辨率                         |\n| `--timeout TIMEOUT`   | 每篇论文的最大处理时间（秒）               |\n| `--tesseract`         | 对每页进行 Tesseract OCR 预测              |\n\n最后，创建一个包含所有图像路径、markdown 文本和元信息的 `jsonl` 文件：\n\n```\npython -m nougat.dataset.create_index --dir path\u002Fpaired\u002Foutput --out index.jsonl\n```\n\n对于每个 `jsonl` 文件，还需生成 seek map 以加快数据加载速度：\n\n```\npython -m nougat.dataset.gen_seek file.jsonl\n```\n\n最终目录结构示例如下：\n\n```\nroot\u002F\n├── images\n├── train.jsonl\n├── 
train.seek.map\n├── test.jsonl\n├── test.seek.map\n├── validation.jsonl\n└── validation.seek.map\n```\n\n注意：`path\u002Fpaired\u002Foutput`（此处为 `images`）中的 `.mmd` 和 `.json` 文件不再需要。\n这在上传至 S3 存储桶时非常有用，可减少一半文件数量。\n\n## 训练\n\n要训练或微调 Nougat 模型，请运行：\n\n```\npython train.py --config config\u002Ftrain_nougat.yaml\n```\n\n## 评估\n\n运行：\n\n```\npython test.py --checkpoint path\u002Fto\u002Fcheckpoint --dataset path\u002Fto\u002Ftest.jsonl --save_path path\u002Fto\u002Fresults.json\n```\n\n要获取不同文本模态的结果，请运行：\n\n```\npython -m nougat.metrics path\u002Fto\u002Fresults.json\n```\n\n## 常见问题（FAQ）\n\n- 为什么我只得到 `[MISSING_PAGE]`？\n\n  Nougat 是在 arXiv 和 PMC 上的科学论文数据集上训练的。你正在处理的文档是否与此类似？\n  文档使用的是什么语言？Nougat 对英文论文效果最佳，其他基于拉丁字母的语言可能也能工作。**中文、俄文、日文等将无法正常工作**。\n  如果满足上述条件，仍出现此问题，可能是由于在 CPU 或较旧 GPU 上运行时失败检测出现了误报（[#11](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\u002Fissues\u002F11)）。目前可尝试传入 `--no-skipping` 参数。\n\n- 我可以从哪里下载模型检查点（model checkpoint）？\n\n  模型已上传至 GitHub 的发布（release）页面。你也可以在首次运行程序时自动下载。通过传入 `--model 0.1.0-{base,small}` 来选择你偏好的模型版本。\n\n## 引用（Citation）\n\n```\n@misc{blecher2023nougat,\n      title={Nougat: Neural Optical Understanding for Academic Documents}, \n      author={Lukas Blecher and Guillem Cucurull and Thomas Scialom and Robert Stojnic},\n      year={2023},\n      eprint={2308.13418},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG}\n}\n```\n\n## 致谢（Acknowledgments）\n\n本代码库基于 [Donut](https:\u002F\u002Fgithub.com\u002Fclovaai\u002Fdonut\u002F) 项目构建。\n\n## 许可证（License）\n\nNougat 代码库采用 MIT 许可证。\n\nNougat 模型权重采用 CC-BY-NC 许可证。","```markdown\n# Nougat 快速上手指南\n\n## 环境准备\n\n- **Python 版本**：3.9 或更高版本\n- **操作系统**：支持 Linux、macOS 和 Windows\n- **GPU 支持（可选）**：\n  - 若需使用 GPU 加速，请先安装对应版本的 PyTorch（推荐使用官方命令安装）：\n    ```bash\n    pip3 install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n    ```\n  - 国内用户可考虑使用清华源加速：\n    ```bash\n    pip3 install torch torchvision torchaudio -i 
https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n    ```\n\n## 安装步骤\n\n### 基础安装（推荐）\n\n```bash\npip install nougat-ocr\n```\n\n### 从 GitHub 源码安装（获取最新版）\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\n```\n\n### 扩展功能安装（按需选择）\n\n- 如需启动 API 服务：\n  ```bash\n  pip install \"nougat-ocr[api]\"\n  ```\n\n- 如需生成训练数据集：\n  ```bash\n  pip install \"nougat-ocr[dataset]\"\n  ```\n\n## 基本使用\n\n### 1. 命令行解析 PDF\n\n将单个 PDF 转换为 `.mmd` 格式（兼容 Mathpix Markdown）：\n\n```bash\nnougat path\u002Fto\u002Ffile.pdf -o output_directory\n```\n\n批量处理整个目录中的 PDF：\n\n```bash\nnougat path\u002Fto\u002Fdirectory -o output_directory\n```\n\n> 默认使用小型模型 `0.1.0-small`，如需更高精度可指定基础模型：\n\n```bash\nnougat path\u002Fto\u002Ffile.pdf -o output_directory -m 0.1.0-base\n```\n\n### 2. 启动 API 服务（需安装 `nougat-ocr[api]`）\n\n启动本地服务：\n\n```bash\nnougat_api\n```\n\n通过 curl 请求预测结果（默认监听 `http:\u002F\u002F127.0.0.1:8503\u002Fpredict\u002F`）：\n\n```bash\ncurl -X 'POST' \\\n  'http:\u002F\u002F127.0.0.1:8503\u002Fpredict\u002F' \\\n  -H 'accept: application\u002Fjson' \\\n  -H 'Content-Type: multipart\u002Fform-data' \\\n  -F 'file=@\u003CPDFFILE.pdf>;type=application\u002Fpdf'\n```\n\n指定页码范围（例如第1到第5页）：\n\n```bash\ncurl -X 'POST' 'http:\u002F\u002F127.0.0.1:8503\u002Fpredict\u002F?start=1&stop=5' \\\n  -H 'accept: application\u002Fjson' \\\n  -H 'Content-Type: multipart\u002Fform-data' \\\n  -F 'file=@\u003CPDFFILE.pdf>;type=application\u002Fpdf'\n```\n```","一位高校研究生正在整理导师提供的上百篇计算机视觉领域的PDF论文，准备构建一个可检索、可复制公式的本地知识库用于文献综述写作。\n\n### 没有 nougat 时\n- 复制论文中的数学公式时只能手动重打LaTeX，稍复杂的矩阵或积分符号极易出错，一篇论文就要花几小时校对\n- 表格内容无法直接复制为结构化数据，必须手动在Excel里重建，遇到跨页表格更是灾难\n- 部分扫描版PDF根本无法选中文本，只能截图后用通用OCR工具识别，结果丢失所有数学符号和排版结构\n- 不同论文的公式编号、章节标题格式不统一，后期整理Markdown文档时需要大量手工调整\n- 想批量处理100+篇PDF时无从下手，只能一篇篇人工操作，效率极低还容易遗漏\n\n### 使用 nougat 后\n- 直接运行 `nougat papers\u002F -o output\u002F` 命令，自动将全部PDF转为保留LaTeX公式的.mmd文件，公式复制粘贴零误差\n- 表格被精准转换为Markdown兼容的LaTeX表格语法，可直接导入数据分析工具或粘贴到论文中\n- 扫描版PDF也能准确识别数学符号和复杂排版，输出结果与原版印刷质量几乎一致\n- 
自动生成统一风格的Markdown结构，章节标题、公式编号、参考文献格式标准化，省去后期格式调整时间\n- 支持批量处理和API调用，配合脚本可实现全自动文献解析流水线，100篇论文2小时内完成结构化转换\n\nnougat 让学术文档从“不可编辑的图片”变成“可计算的知识资产”，彻底解放研究者的时间于创造性工作而非机械劳动。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_nougat_57e6edca.png","facebookresearch","Meta Research","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ffacebookresearch_449342bd.png","",null,"https:\u002F\u002Fopensource.fb.com","https:\u002F\u002Fgithub.com\u002Ffacebookresearch",[83,87],{"name":84,"color":85,"percentage":86},"Python","#3572A5",99.7,{"name":88,"color":89,"percentage":90},"Dockerfile","#384d54",0.3,9886,626,"2026-04-05T18:29:03","MIT","Linux, macOS, Windows","非必需，但推荐使用 NVIDIA GPU（Windows 需手动安装对应 PyTorch）","未说明",{"notes":99,"python":100,"dependencies":101},"Windows 用户若需 GPU 支持，须先按 PyTorch 官网指引安装对应版本；首次运行会自动下载模型，默认使用 small 模型，可选 base 模型；处理中文\u002F俄文\u002F日文等非拉丁语系文档效果不佳。","3.9+",[102,103,104,105,106],"torch","transformers","accelerate","LaTeXML","pdffigures2",[14,37],15,"2026-03-27T02:49:30.150509","2026-04-06T08:45:16.361486",[112,117,122,127,132,137],{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},683,"运行 `predict.py` 时报错 'cannot unpack non-iterable NoneType object' 怎么解决？","该错误源于 pypdfium2 的多进程处理 bug，当 PDF 页面数少于进程数时会触发。官方已通过限制进程数不超过页面数修复。建议更新到最新版本，或改用 GPU 加速以避免此问题。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\u002Fissues\u002F96",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},684,"微调中文模型时出现 'Warning: Found repetitions in sample0' 警告怎么办？","这是由于训练数据过于简单（如纯文本无格式）导致泛化性差。应生成包含复杂排版和多样样式的中文 MMD 数据，并混合英文数据一起训练以防遗忘。单纯增加数据量不一定有效，需注重数据多样性。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\u002Fissues\u002F174",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},685,"如何正确配置 YAML 文件微调中文 Nougat 模型？损失值为 NaN 怎么办？","应设置 model_path 为原始英文模型路径，tokenizer 需合并中英文字符集，训练时同时使用中英文数据防止灾难性遗忘。若出现 NaN 
损失，检查学习率是否过高、数据预处理是否异常或梯度爆炸。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\u002Fissues\u002F181",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},686,"在 M1\u002FM2 Mac 上使用 MPS 加速为何内存占用过高甚至崩溃？","部分 PDF 在 MPS 模式下会尝试分配超过 13GB 内存（即使设 --batchsize 1）。可尝试降级回 CPU 模式（内存\u003C4GB），或升级 PyTorch 至 2.1+ 并测试不同 PDF。某些文档本身可能导致资源异常分配。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\u002Fissues\u002F61",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},687,"报错 'cv2.dnn has no attribute DictValue' 如何修复？","编辑 OpenCV 安装目录下的 typing\u002F__init__.py 文件（如 \u002Fsite-packages\u002Fcv2\u002Ftyping\u002F__init__.py），注释掉第 169 行（或 macOS 最新版的第 171 行）：`# LayerId = cv2.dnn.DictValue` 即可临时解决。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\u002Fissues\u002F40",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},688,"生成的 .mmd 文件数学公式无法在 Markdown 编辑器中正常显示怎么办？","Nougat 输出的 MMD 格式默认兼容 Mathpix。若编辑器不支持，可请求添加命令行参数如 `--delimeter-dollar` 自动为公式添加 $...$ 包裹符，或手动后处理添加分隔符以便渲染引擎识别。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnougat\u002Fissues\u002F76",[143,148],{"id":144,"version":145,"summary_zh":146,"released_at":147},109950,"0.1.0-small","nougat-small weights","2023-08-22T13:21:34",{"id":149,"version":150,"summary_zh":151,"released_at":152},109951,"0.1.0-base","nougat-base weights","2023-08-22T13:21:18"]