[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-nlmatics--llmsherpa":3,"tool-nlmatics--llmsherpa":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":23,"env_os":95,"env_gpu":96,"env_ram":96,"env_deps":97,"category_tags":102,"github_topics":79,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":103,"updated_at":104,"faqs":105,"releases":136},3154,"nlmatics\u002Fllmsherpa","llmsherpa","Developer APIs to Accelerate LLM Projects","llmsherpa 是一套专为加速大语言模型（LLM）应用开发而设计的 API 工具集，其核心组件 LayoutPDFReader 能智能解析 PDF 文档并保留完整的版面布局信息。传统 PDF 转文本工具往往丢失段落结构、随意切断句子，导致在构建检索增强生成（RAG）系统时难以准确划分文本块或关联上下文。llmsherpa 通过识别章节层级、合并断行段落、提取表格与列表、去除页眉页脚及水印，甚至跨页连接内容，有效解决了这一痛点。\n\n该工具特别适合开发者和技术研究人员使用，尤其是那些需要处理大量文档数据以构建知识库、问答系统或进行文档向量化工作的团队。其独特亮点在于提供带有坐标信息的结构化数据块，支持更精准的“智能分块”，从而突破大模型上下文窗口的限制。此外，项目后端服务已完全开源（Apache 2.0 协议），支持 Docker 私有化部署，并兼容 DOCX、PPTX 等多种格式及内置 OCR 功能，让用户能在保障数据隐私的同时灵活定制解析流程。只需几行代码，即可将复杂的非结构化文档转化为大模型易于理解的高质量输入。","# LLM Sherpa\n\nLLM Sherpa provides strategic APIs to accelerate large language model (LLM) use cases.\n\n## What's New\n\n> [!IMPORTANT]\n> llmsherpa back end service is now fully open sourced under Apache 2.0 Licence. 
See [https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fnlm-ingestor](https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fnlm-ingestor)\n> - You can now run your own servers using a docker image!\n> - Support for different file formats: DOCX, PPTX, HTML, TXT, XML\n> - OCR Support is built in\n> - Blocks now have co-ordinates - use bbox property of blocks such as sections\n> - A new indent parser to better align all headings in a document to their corresponding level\n> - The free server and paid server are not updated with the latest code and users are requested to spawn their own servers using instructions in nlm-ingestor\n\n## LayoutPDFReader\n\nMost PDF to text parsers do not provide layout information. Often, even the sentences are split with arbitrary CR\u002FLFs, making it very difficult to find paragraph boundaries. This poses various challenges in chunking and adding long running contextual information such as section headers to the passages while indexing\u002Fvectorizing PDFs for LLM applications such as retrieval augmented generation (RAG). \n\nLayoutPDFReader solves this problem by parsing PDFs along with hierarchical layout information such as:\n\n1. Sections and subsections along with their levels.\n2. Paragraphs - combines lines.\n3. Links between sections and paragraphs.\n4. Tables along with the section the tables are found in.\n5. Lists and nested lists.\n6. Join content spread across pages.\n7. Removal of repeating headers and footers.\n8. Watermark removal.\n\nWith LayoutPDFReader, developers can find optimal chunks of text to vectorize, and a solution for limited context window sizes of LLMs. 
\n\nYou can experiment with the library directly in **Google Colab** [here](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1hx5Y2TxWriAuFXcwcjsu3huKyn39Q2id?usp=sharing)\n\nHere's a [writeup](https:\u002F\u002Fopen.substack.com\u002Fpub\u002Fambikasukla\u002Fp\u002Fefficient-rag-with-document-layout?r=ft8uc&utm_campaign=post&utm_medium=web) explaining the problem and our approach. \n\nHere's a LlamaIndex [blog](https:\u002F\u002Fmedium.com\u002F@kirankurup\u002Fmastering-pdfs-extracting-sections-headings-paragraphs-and-tables-with-cutting-edge-parser-faea18870125) explaining the need for smart chunking. \n\n**API Reference**: [https:\u002F\u002Fllmsherpa.readthedocs.io\u002F](https:\u002F\u002Fllmsherpa.readthedocs.io\u002F)\n\n[How to use with Google Gemini Pro](https:\u002F\u002Fmedium.com\u002Fnlmatics\u002Fusing-google-gemini-pro-with-your-pdfs-7c191a2fcd98)\n[How to use with Cohere Embed3](https:\u002F\u002Fmedium.com\u002Fnlmatics\u002Fask-your-pdf-with-cohere-embed-v3-3eb5dab36945)\n\n### Important Notes\n\n* The LayoutPDFReader is tested on a wide variety of PDFs. That being said, it is still challenging to get every PDF parsed correctly.\n* OCR is currently not supported. Only PDFs with a text layer are supported.\n\n> [!NOTE]\n> LLMSherpa uses a free and open API server. The server does not store your PDFs except for temporary storage during parsing. This server will be decommissioned soon. \n> Self-host your own private server using instructions at [https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fnlm-ingestor](https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fnlm-ingestor)\n\n> [!IMPORTANT]\n> The private server available at [Microsoft Azure Marketplace](https:\u002F\u002Fazuremarketplace.microsoft.com\u002Fen-us\u002Fmarketplace\u002Fapps\u002Fnlmaticscorp1686371242615.layout_pdf_parser?tab=Overview) \n> will be decommissioned soon. 
Please move to your self-hosted instance using instructions at [https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fnlm-ingestor](https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fnlm-ingestor).\n\n\n### Installation\n\n```bash\npip install llmsherpa\n```\n\n### Read a PDF file\n\nThe first step in using the LayoutPDFReader is to provide a URL or file path to it and get back a document object.\n\n```python\nfrom llmsherpa.readers import LayoutPDFReader\n\nllmsherpa_api_url = \"https:\u002F\u002Freaders.llmsherpa.com\u002Fapi\u002Fdocument\u002Fdeveloper\u002FparseDocument?renderFormat=all\"\npdf_url = \"https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.13461.pdf\" # also allowed is a file path e.g. \u002Fhome\u002Fdownloads\u002Fxyz.pdf\npdf_reader = LayoutPDFReader(llmsherpa_api_url)\ndoc = pdf_reader.read_pdf(pdf_url)\n\n```\n\n### Install LlamaIndex\n\nIn the following examples, we will use [LlamaIndex](https:\u002F\u002Fwww.llamaindex.ai\u002F) for simplicity. Install the library if you haven't already.\n\n```bash\npip install llama-index\n```\n\n### Setup OpenAI\n\n```python\nimport openai\nopenai.api_key = \"\u003CInsert API Key>\"  # replace the placeholder with your own key\n```\n\n### Vector search and Retrieval Augmented Generation with Smart Chunking\n\nLayoutPDFReader performs smart chunking, using the document structure to keep related text together:\n\n* All list items are together including the paragraph that precedes the list.\n* Items in a table are chunked together\n* Contextual information from section headers and nested section headers is included\n\nThe following code creates a LlamaIndex query engine from LayoutPDFReader document chunks:\n\n```python\nfrom llama_index.core import Document\nfrom llama_index.core import VectorStoreIndex\n\nindex = VectorStoreIndex([])\nfor chunk in doc.chunks():\n    index.insert(Document(text=chunk.to_context_text(), extra_info={}))\nquery_engine = index.as_query_engine()\n```\n\nLet's run one query:\n\n```python\nresponse = query_engine.query(\"list all the tasks that work 
with bart\")\nprint(response)\n```\n\nWe get the following response:\n\n```\nBART works well for text generation, comprehension tasks, abstractive dialogue, question answering, and summarization tasks.\n```\n\nLet's try another query that needs an answer from a table:\n\n```python\nresponse = query_engine.query(\"what is the bart performance score on squad\")\nprint(response)\n```\n\nHere's the response we get:\n\n```\nThe BART performance score on SQuAD is 88.8 for EM and 94.6 for F1.\n```\n\n### Summarize a Section using prompts\n\nLayoutPDFReader offers powerful ways to pick sections and subsections from a large document and use LLMs to extract insights from a section.\n\nThe following code looks for the Fine-tuning section of the document:\n\n```python\nfrom IPython.display import display, HTML\nselected_section = None\n# find a section in the document by title\nfor section in doc.sections():\n    if section.title == '3 Fine-tuning BART':\n        selected_section = section\n        break\n# use include_children=True and recurse=True to fully expand the section. 
\n# include_children only returns at one sublevel of children whereas recurse goes through all the descendants\nHTML(section.to_html(include_children=True, recurse=True))\n```\n\nRunning the above code yields the following HTML output:\n\n> \u003Ch3>3 Fine-tuning BART\u003C\u002Fh3>\u003Cp>The representations produced by BART can be used in several ways for downstream applications.\u003C\u002Fp>\u003Ch4>3.1 Sequence Classiﬁcation Tasks\u003C\u002Fh4>\u003Cp>For sequence classiﬁcation tasks, the same input is fed into the encoder and decoder, and the ﬁnal hidden state of the ﬁnal decoder token is fed into new multi-class linear classiﬁer.\\nThis approach is related to the CLS token in BERT; however we add the additional token to the end so that representation for the token in the decoder can attend to decoder states from the complete input (Figure 3a).\u003C\u002Fp>\u003Ch4>3.2 Token Classiﬁcation Tasks\u003C\u002Fh4>\u003Cp>For token classiﬁcation tasks, such as answer endpoint classiﬁcation for SQuAD, we feed the complete document into the encoder and decoder, and use the top hidden state of the decoder as a representation for each word.\\nThis representation is used to classify the token.\u003C\u002Fp>\u003Ch4>3.3 Sequence Generation Tasks\u003C\u002Fh4>\u003Cp>Because BART has an autoregressive decoder, it can be directly ﬁne tuned for sequence generation tasks such as abstractive question answering and summarization.\\nIn both of these tasks, information is copied from the input but manipulated, which is closely related to the denoising pre-training objective.\\nHere, the encoder input is the input sequence, and the decoder generates outputs autoregressively.\u003C\u002Fp>\u003Ch4>3.4 Machine Translation\u003C\u002Fh4>\u003Cp>We also explore using BART to improve machine translation decoders for translating into English.\\nPrevious work Edunov et al.\\n(2019) has shown that models can be improved by incorporating pre-trained encoders, but gains from using 
pre-trained language models in decoders have been limited.\\nWe show that it is possible to use the entire BART model (both encoder and decoder) as a single pretrained decoder for machine translation, by adding a new set of encoder parameters that are learned from bitext (see Figure 3b).\u003C\u002Fp>\u003Cp>More precisely, we replace BART’s encoder embedding layer with a new randomly initialized encoder.\\nThe model is trained end-to-end, which trains the new encoder to map foreign words into an input that BART can de-noise to English.\\nThe new encoder can use a separate vocabulary from the original BART model.\u003C\u002Fp>\u003Cp>We train the source encoder in two steps, in both cases backpropagating the cross-entropy loss from the output of the BART model.\\nIn the ﬁrst step, we freeze most of BART parameters and only update the randomly initialized source encoder, the BART positional embeddings, and the self-attention input projection matrix of BART’s encoder ﬁrst layer.\\nIn the second step, we train all model parameters for a small number of iterations.\u003C\u002Fp>\n\nNow, let's create a custom summary of this text using a prompt:\n\n```python\nfrom llama_index.llms.openai import OpenAI\ncontext = selected_section.to_html(include_children=True, recurse=True)\nquestion = \"list all the tasks discussed and one line about each task\"\nresp = OpenAI().complete(f\"read this text and answer question: {question}:\\n{context}\")\nprint(resp.text)\n```\n\nThe above code results in the following output:\n\n```\nTasks discussed in the text:\n\n1. Sequence Classification Tasks: The same input is fed into the encoder and decoder, and the final hidden state of the final decoder token is used for multi-class linear classification.\n2. Token Classification Tasks: The complete document is fed into the encoder and decoder, and the top hidden state of the decoder is used as a representation for each word for token classification.\n3. 
Sequence Generation Tasks: BART can be fine-tuned for tasks like abstractive question answering and summarization, where the encoder input is the input sequence and the decoder generates outputs autoregressively.\n4. Machine Translation: BART can be used to improve machine translation decoders by incorporating pre-trained encoders and using the entire BART model as a single pretrained decoder. The new encoder parameters are learned from bitext.\n```\n\n### Analyze a Table using prompts\n\nWith LayoutPDFReader, you can iterate through all the tables in a document and use the power of LLMs to analyze a table.\nLet's look at the 6th table in this document. If you are using a notebook, you can display the table as follows:\n\n```python\nfrom IPython.display import display, HTML\nHTML(doc.tables()[5].to_html())\n```\nThe output table structure looks like this:\n\n|  | SQuAD 1.1 EM\u002FF1 | SQuAD 2.0 EM\u002FF1 | MNLI m\u002Fmm | SST Acc | QQP Acc | QNLI Acc | STS-B Acc | RTE Acc | MRPC Acc | CoLA Mcc\n | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---\n | BERT | 84.1\u002F90.9 | 79.0\u002F81.8 | 86.6\u002F- | 93.2 | 91.3 | 92.3 | 90.0 | 70.4 | 88.0 | 60.6\n | UniLM | -\u002F- | 80.5\u002F83.4 | 87.0\u002F85.9 | 94.5 | - | 92.7 | - | 70.9 | - | 61.1\n | XLNet | 89.0\u002F94.5 | 86.1\u002F88.8 | 89.8\u002F- | 95.6 | 91.8 | 93.9 | 91.8 | 83.8 | 89.2 | 63.6\n | RoBERTa | 88.9\u002F94.6 | 86.5\u002F89.4 | 90.2\u002F90.2 | 96.4 | 92.2 | 94.7 | 92.4 | 86.6 | 90.9 | 68.0\n | BART | 88.8\u002F94.6 | 86.1\u002F89.2 | 89.9\u002F90.1 | 96.6 | 92.5 | 94.9 | 91.2 | 87.0 | 90.4 | 62.8\n\nNow let's ask a question to analyze this table:\n\n```python\nfrom llama_index.llms.openai import OpenAI\ncontext = doc.tables()[5].to_html()\nresp = OpenAI().complete(f\"read this table and answer question: which model has the best performance on squad 2.0:\\n{context}\")\nprint(resp.text)\n```\n\nThe above question will result in the following output:\n```\nThe model with the best 
performance on SQuAD 2.0 is RoBERTa, with an EM\u002FF1 score of 86.5\u002F89.4.\n```\n\nThat's it! LayoutPDFReader also supports tables with nested headers and header rows.\n\nHere's an example with nested headers:\n```python\nfrom IPython.display import display, HTML\nHTML(doc.tables()[6].to_html())\n```\n\n |  | CNN\u002FDailyMail |  |  | XSum |  | -\n | --- | --- | --- | --- | --- | --- | ---\n  |  | R1 | R2 | RL | R1 | R2 | RL\n | --- | --- | --- | --- | --- | --- | ---\n | Lead-3 | 40.42 | 17.62 | 36.67 | 16.30 | 1.60 | 11.95\n | PTGEN (See et al., 2017) | 36.44 | 15.66 | 33.42 | 29.70 | 9.21 | 23.24\n | PTGEN+COV (See et al., 2017) | 39.53 | 17.28 | 36.38 | 28.10 | 8.02 | 21.72\n | UniLM | 43.33 | 20.21 | 40.51 | - | - | -\n | BERTSUMABS (Liu & Lapata, 2019) | 41.72 | 19.39 | 38.76 | 38.76 | 16.33 | 31.15\n | BERTSUMEXTABS (Liu & Lapata, 2019) | 42.13 | 19.60 | 39.18 | 38.81 | 16.50 | 31.27\n | BART | 44.16 | 21.28 | 40.90 | 45.14 | 22.27 | 37.25\n\nNow let's ask an interesting question:\n\n```python\nfrom llama_index.llms.openai import OpenAI\ncontext = doc.tables()[6].to_html()\nquestion = \"tell me about R1 of bart for different datasets\"\nresp = OpenAI().complete(f\"read this table and answer question: {question}:\\n{context}\")\nprint(resp.text)\n```\nAnd we get the following answer:\n\n```\nR1 of BART for different datasets:\n\n- For the CNN\u002FDailyMail dataset, the R1 score of BART is 44.16.\n- For the XSum dataset, the R1 score of BART is 45.14.\n```\n\n\n### Get the Raw JSON\n\nTo get the complete JSON returned by the llmsherpa service and process it differently, read the `json` attribute:\n\n```python\ndoc.json\n```","# LLM Sherpa\n\nLLM Sherpa 提供战略级 API，以加速大型语言模型（LLM）用例的落地。\n\n## 最新动态\n\n> [!IMPORTANT]\n> llmsherpa 后端服务现已完全开源，采用 Apache 2.0 许可证。详情请参见 [https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fnlm-ingestor](https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fnlm-ingestor)\n> - 您现在可以使用 Docker 镜像运行自己的服务器！\n> - 支持多种文件格式：DOCX、PPTX、HTML、TXT、XML\n> - 内置 OCR 支持\n> - 
块现在带有坐标信息——您可以使用诸如章节等块的 bbox 属性\n> - 新增缩进解析器，更好地将文档中的所有标题与其对应的层级对齐\n> - 免费服务器和付费服务器未更新至最新代码，建议用户按照 nlm-ingestor 中的说明自行部署服务器\n\n## LayoutPDFReader\n\n大多数 PDF 转文本解析器无法提供布局信息。通常情况下，即使是句子也会被随意的换行符分割，导致很难找到段落边界。这给在为 LLM 应用（如检索增强生成 RAG）索引或向量化 PDF 文件时，进行分块以及将长篇上下文信息（如章节标题）添加到片段中带来了诸多挑战。\n\nLayoutPDFReader 通过解析 PDF 并提取层次化的布局信息来解决这一问题，例如：\n\n1. 章节和子章节及其层级。\n2. 段落——合并多行内容。\n3. 章节与段落之间的链接。\n4. 表格及其所在的章节。\n5. 列表及嵌套列表。\n6. 连接跨页的内容。\n7. 去除重复的页眉和页脚。\n8. 去除水印。\n\n借助 LayoutPDFReader，开发者可以找到最适合向量化的文本块，从而解决 LLM 上下文窗口有限的问题。\n\n您可以在 **Google Colab** 中直接体验该库 [这里](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1hx5Y2TxWriAuFXcwcjsu3huKyn39Q2id?usp=sharing)\n\n这里有一篇 [文章](https:\u002F\u002Fopen.substack.com\u002Fpub\u002Fambikasukla\u002Fp\u002Fefficient-rag-with-document-layout?r=ft8uc&utm_campaign=post&utm_medium=web) 解释了这个问题以及我们的解决方案。\n\nLlamaIndex 的一篇 [博客](https:\u002F\u002Fmedium.com\u002F@kirankurup\u002Fmastering-pdfs-extracting-sections-headings-paragraphs-and-tables-with-cutting-edge-parser-faea18870125) 也解释了智能分块的必要性。\n\n**API 参考文档**: [https:\u002F\u002Fllmsherpa.readthedocs.io\u002F](https:\u002F\u002Fllmsherpa.readthedocs.io\u002F)\n\n[如何与 Google Gemini Pro 配合使用](https:\u002F\u002Fmedium.com\u002Fnlmatics\u002Fusing-google-gemini-pro-with-your-pdfs-7c191a2fcd98)\n[如何与 Cohere Embed3 配合使用](https:\u002F\u002Fmedium.com\u002Fnlmatics\u002Fask-your-pdf-with-cohere-embed-v3-3eb5dab36945)\n\n### 重要提示\n\n * LayoutPDFReader 已在各种类型的 PDF 上进行了测试。尽管如此，要确保每一份 PDF 都能被正确解析仍然具有挑战性。\n* 目前不支持 OCR。仅支持带有文本层的 PDF。\n\n> [!注意]\n> LLMSherpa 使用一个免费且开放的 API 服务器。该服务器不会存储您的 PDF 文件，仅在解析过程中进行临时存储。此服务器将很快停止服务。\n> 请按照 [https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fnlm-ingestor](https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fnlm-ingestor) 中的说明，自行托管私有服务器。\n\n> [!重要]\n> 在 [Microsoft Azure Marketplace](https:\u002F\u002Fazuremarketplace.microsoft.com\u002Fen-us\u002Fmarketplace\u002Fapps\u002Fnlmaticscorp1686371242615.layout_pdf_parser?tab=Overview) 上提供的私有版本也将很快停止服务。请按照 
[https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fnlm-ingestor](https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fnlm-ingestor) 中的说明迁移到您自己托管的实例。\n\n\n### 安装\n\n```bash\npip install llmsherpa\n```\n\n### 读取 PDF 文件\n\n使用 LayoutPDFReader 的第一步是提供一个 URL 或文件路径，然后获取一个文档对象。\n\n```python\nfrom llmsherpa.readers import LayoutPDFReader\n\nllmsherpa_api_url = \"https:\u002F\u002Freaders.llmsherpa.com\u002Fapi\u002Fdocument\u002Fdeveloper\u002FparseDocument?renderFormat=all\"\npdf_url = \"https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.13461.pdf\" # 也可以使用文件路径，例如 \u002Fhome\u002Fdownloads\u002Fxyz.pdf\npdf_reader = LayoutPDFReader(llmsherpa_api_url)\ndoc = pdf_reader.read_pdf(pdf_url)\n\n```\n\n### 安装 LlamaIndex\n\n在下面的示例中，我们将使用 [LlamaIndex](https:\u002F\u002Fwww.llamaindex.ai\u002F) 来简化操作。如果您尚未安装该库，请先安装。\n\n```bash\npip install llama-index\n```\n\n### 设置 OpenAI\n\n```python\nimport openai\nopenai.api_key = #\u003C插入 API 密钥>\n```\n\n### 基于智能分块的向量搜索与检索增强生成\n\nLayoutPDFReader 会根据文档结构进行智能分块，将相关文本保持在一起：\n\n* 所有列表项及其前面的段落会放在一起。\n* 表格中的内容会被分块在一起。\n* 章节标题和嵌套章节标题中的上下文信息也会被包含进去。\n\n以下代码从 LayoutPDFReader 的文档块中创建了一个 LlamaIndex 查询引擎：\n\n```python\nfrom llama_index.core import Document\nfrom llama_index.core import VectorStoreIndex\n\nindex = VectorStoreIndex([])\nfor chunk in doc.chunks():\n    index.insert(Document(text=chunk.to_context_text(), extra_info={}))\nquery_engine = index.as_query_engine()\n```\n\n让我们运行一个查询：\n\n```python\nresponse = query_engine.query(\"列出所有与 BART 合作的任务\")\nprint(response)\n```\n\n我们得到如下响应：\n\n```\nBART 在文本生成、理解任务、摘要式对话、问答和摘要任务方面表现良好。\n```\n\n再尝试一个需要从表格中获取答案的查询：\n\n```python\nresponse = query_engine.query(\"BART 在 SQuAD 上的表现分数是多少\")\nprint(response)\n```\n\n以下是我们的回答：\n\n```\nBART 在 SQuAD 上的表现分数为 EM 88.8，F1 94.6。\n```\n\n### 使用提示总结某一章节\n\nLayoutPDFReader 提供了强大的方法，可以从大型文档中选择特定的章节和子章节，并利用 LLM 从该章节中提取见解。\n\n以下代码查找文档中的“微调”章节：\n\n```python\nfrom IPython.core.display import display, HTML\nselected_section = None\n# 根据标题在文档中查找某个章节\nfor section in doc.sections():\n    if 
section.title == '3 微调 BART':\n        selected_section = section\n        break\n# 使用 include_children=True 和 recurse=True 来完全展开该章节。\n\n# include_children 只返回子级的一层，而 recurse 会遍历所有后代\nHTML(section.to_html(include_children=True, recurse=True))\n```\n\n运行上述代码后，生成以下 HTML 输出：\n\n> \u003Ch3>3 微调 BART\u003C\u002Fh3>\u003Cp>BART 产生的表示可以用于下游任务的多种方式。\u003C\u002Fp>\u003Ch4>3.1 序列分类任务\u003C\u002Fh4>\u003Cp>对于序列分类任务，相同的输入同时送入编码器和解码器，解码器最后一个标记的最终隐藏状态会被送入一个新的多分类线性分类器。\\n这种方法与 BERT 中的 CLS 标记类似；然而，我们会在输入末尾添加一个额外的标记，以便解码器中的该标记能够关注到整个输入的解码器状态（图 3a）。\u003C\u002Fp>\u003Ch4>3.2 标记分类任务\u003C\u002Fh4>\u003Cp>对于标记分类任务，例如 SQuAD 的答案端点分类，我们将完整文档送入编码器和解码器，并使用解码器的顶层隐藏状态作为每个词的表示。\\n这个表示用于对标记进行分类。\u003C\u002Fp>\u003Ch4>3.3 序列生成任务\u003C\u002Fh4>\u003Cp>由于 BART 具有自回归解码器，它可以被直接微调用于序列生成任务，例如摘要式问答和文本摘要。\\n在这两类任务中，信息都是从输入中复制并经过处理的，这与去噪预训练目标密切相关。\\n在这里，编码器的输入是输入序列，而解码器则以自回归方式生成输出。\u003C\u002Fp>\u003Ch4>3.4 机器翻译\u003C\u002Fh4>\u003Cp>我们还探索使用 BART 来改进用于翻译成英语的机器翻译解码器。\\n先前的研究 Edunov 等人（2019）表明，通过引入预训练的编码器可以提升模型性能，但在解码器中使用预训练语言模型所带来的收益却相对有限。\\n我们证明，可以通过添加一组从双语平行语料中学习的新编码器参数，将整个 BART 模型（包括编码器和解码器）用作单一的预训练解码器来进行机器翻译（见图 3b）。\u003C\u002Fp>\u003Cp>更具体地说，我们将 BART 的编码器嵌入层替换为一个新的随机初始化编码器。\\n该模型以端到端的方式进行训练，从而让新的编码器学会将外语词汇映射为 BART 能够去噪并转换成英语的输入形式。\\n新编码器可以使用与原始 BART 模型不同的词汇表。\u003C\u002Fp>\u003Cp>我们分两步训练源端编码器，在每一步中都基于 BART 模型输出的交叉熵损失进行反向传播。\\n在第一步中，我们冻结了 BART 的大部分参数，只更新随机初始化的源端编码器、BART 的位置嵌入以及 BART 编码器第一层的自注意力输入投影矩阵。\\n在第二步中，我们对所有模型参数进行少量迭代的训练。\u003C\u002Fp>\n\n现在，让我们使用提示创建这段文字的自定义摘要：\n\n```python\nfrom llama_index.llms import OpenAI\ncontext = selected_section.to_html(include_children=True, recurse=True)\nquestion = \"列出文中讨论的所有任务，并为每项任务写一句话简介\"\nresp = OpenAI().complete(f\"阅读这段文字并回答问题：{question}：\\n{context}\")\nprint(resp.text)\n```\n\n上述代码生成如下输出：\n\n```\n文中讨论的任务：\n\n1. 序列分类任务：将相同的输入送入编码器和解码器，利用解码器最后一个标记的最终隐藏状态进行多分类线性分类。\n2. 标记分类任务：将完整文档送入编码器和解码器，使用解码器的顶层隐藏状态作为每个词的表示来进行标记分类。\n3. 序列生成任务：BART 可以被微调用于摘要式问答和文本摘要等任务，其中编码器的输入是输入序列，解码器则以自回归方式生成输出。\n4. 
机器翻译：BART 可以通过引入预训练的编码器，并将整个 BART 模型用作单一的预训练解码器来改进机器翻译解码器。新编码器的参数是从双语平行语料中学习得到的。\n```\n\n### 使用提示分析表格\n\n借助 LayoutPDFReader，您可以遍历文档中的所有表格，并利用大型语言模型的强大功能来分析表格。  \n让我们来看看本文档中的第6个表格。如果您使用的是笔记本，可以按如下方式显示该表格：\n\n```python\nfrom IPython.core.display import display, HTML\nHTML(doc.tables()[5].to_html())\n```\n\n输出的表格结构如下：\n\n|  | SQuAD 1.1 EM\u002FF1 | SQuAD 2.0 EM\u002FF1 | MNLI m\u002Fmm | SST Acc | QQP Acc | QNLI Acc | STS-B Acc | RTE Acc | MRPC Acc | CoLA Mcc |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| BERT | 84.1\u002F90.9 | 79.0\u002F81.8 | 86.6\u002F- | 93.2 | 91.3 | 92.3 | 90.0 | 70.4 | 88.0 | 60.6 |\n| UniLM | -\u002F- | 80.5\u002F83.4 | 87.0\u002F85.9 | 94.5 | - | 92.7 | - | 70.9 | - | 61.1 |\n| XLNet | 89.0\u002F94.5 | 86.1\u002F88.8 | 89.8\u002F- | 95.6 | 91.8 | 93.9 | 91.8 | 83.8 | 89.2 | 63.6 |\n| RoBERTa | 88.9\u002F94.6 | 86.5\u002F89.4 | 90.2\u002F90.2 | 96.4 | 92.2 | 94.7 | 92.4 | 86.6 | 90.9 | 68.0 |\n| BART | 88.8\u002F94.6 | 86.1\u002F89.2 | 89.9\u002F90.1 | 96.6 | 92.5 | 94.9 | 91.2 | 87.0 | 90.4 | 62.8 |\n\n现在我们来提出一个问题，以分析这个表格：\n\n```python\nfrom llama_index.llms import OpenAI\ncontext = doc.tables()[5].to_html()\nresp = OpenAI().complete(f\"阅读此表格并回答问题：哪个模型在 SQuAD 2.0 上表现最佳？\\n{context}\")\nprint(resp.text)\n```\n\n上述问题将产生以下输出：\n```\n在 SQuAD 2.0 上表现最佳的模型是 RoBERTa，其 EM\u002FF1 得分为 86.5\u002F89.4。\n```\n\n就是这样！LayoutPDFReader 还支持带有嵌套表头和表头行的表格。\n\n以下是一个包含嵌套表头的示例：\n```python\nfrom IPython.core.display import display, HTML\nHTML(doc.tables()[6].to_html())\n```\n\n|  | CNN\u002FDailyMail |  |  | XSum |  | -\n| --- | --- | --- | --- | --- | --- | ---\n|  | R1 | R2 | RL | R1 | R2 | RL |\n| --- | --- | --- | --- | --- | --- | ---\n| Lead-3 | 40.42 | 17.62 | 36.67 | 16.30 | 1.60 | 11.95 |\n| PTGEN (See 等, 2017) | 36.44 | 15.66 | 33.42 | 29.70 | 9.21 | 23.24 |\n| PTGEN+COV (See 等, 2017) | 39.53 | 17.28 | 36.38 | 28.10 | 8.02 | 21.72 |\n| UniLM | 43.33 | 20.21 | 40.51 | - | - | - |\n| BERTSUMABS (Liu & Lapata, 2019) | 41.72 | 19.39 
| 38.76 | 38.76 | 16.33 | 31.15 |\n| BERTSUMEXTABS (Liu & Lapata, 2019) | 42.13 | 19.60 | 39.18 | 38.81 | 16.50 | 31.27 |\n| BART | 44.16 | 21.28 | 40.90 | 45.14 | 22.27 | 37.25 |\n\n现在我们来提出一个有趣的问题：\n\n```python\nfrom llama_index.llms import OpenAI\ncontext = doc.tables()[6].to_html()\nquestion = \"请告诉我 BART 在不同数据集上的 R1 分数\"\nresp = OpenAI().complete(f\"阅读此表格并回答问题：{question}：\\n{context}\")\nprint(resp.text)\n```\n\n我们得到如下答案：\n\n```\nBART 在不同数据集上的 R1 分数如下：\n\n- 对于 CNN\u002FDailyMail 数据集，BART 的 R1 分数为 44.16。\n- 对于 XSum 数据集，BART 的 R1 分数为 45.14。\n```\n\n\n### 获取原始 JSON\n\n要获取 llmsherpa 服务返回的完整 JSON 并进行其他处理，只需调用 json 属性即可：\n\n```python\ndoc.json\n```","# LLM Sherpa 快速上手指南\n\nLLM Sherpa 提供了一套战略性的 API，旨在加速大语言模型（LLM）的应用开发。其核心组件 `LayoutPDFReader` 能够解析 PDF 文件的层级布局信息（如章节、段落、表格、列表等），解决传统解析器丢失结构信息的问题，非常适合用于构建检索增强生成（RAG）系统。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Linux, macOS 或 Windows\n*   **Python 版本**：推荐 Python 3.8 及以上版本\n*   **前置依赖**：\n    *   `pip` 包管理工具\n    *   （可选）若需运行示例中的向量搜索功能，需准备 OpenAI API Key\n*   **网络环境**：由于默认使用官方在线 API 服务解析 PDF，请确保网络可访问 `https:\u002F\u002Freaders.llmsherpa.com`。\n    *   *注意：官方免费服务器即将停用，生产环境建议参考 [nlm-ingestor](https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fnlm-ingestor) 项目自行部署 Docker 服务。*\n\n## 安装步骤\n\n使用 pip 直接安装核心库。国内用户若遇到下载缓慢，可指定清华或阿里镜像源。\n\n```bash\n# 标准安装\npip install llmsherpa\n\n# 或使用国内镜像加速安装\npip install llmsherpa -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n若需运行下文中的 RAG 示例，还需安装 LlamaIndex：\n\n```bash\npip install llama-index\n```\n\n## 基本使用\n\n### 1. 
解析 PDF 文件\n\n`LayoutPDFReader` 可以将 PDF 转换为包含丰富结构信息的文档对象。以下是最基础的用法：\n\n```python\nfrom llmsherpa.readers import LayoutPDFReader\n\n# 配置 API 地址 (建议使用自托管地址替换下方的默认地址)\nllmsherpa_api_url = \"https:\u002F\u002Freaders.llmsherpa.com\u002Fapi\u002Fdocument\u002Fdeveloper\u002FparseDocument?renderFormat=all\"\n\n# 支持网络 URL 或本地文件路径 (例如：\u002Fhome\u002Fdownloads\u002Fxyz.pdf)\npdf_url = \"https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.13461.pdf\"\n\n# 初始化阅读器并解析\npdf_reader = LayoutPDFReader(llmsherpa_api_url)\ndoc = pdf_reader.read_pdf(pdf_url)\n```\n\n### 2. 构建智能分块的 RAG 索引\n\n`LayoutPDFReader` 能根据文档结构（如将列表项、表格内容与其上下文关联）进行智能分块。结合 LlamaIndex 可快速构建问答引擎：\n\n```python\nfrom llama_index.core import Document\nfrom llama_index.core import VectorStoreIndex\nimport openai\n\n# 配置 OpenAI Key\nopenai.api_key = \"\u003CInsert API Key>\"\n\n# 创建索引并插入智能分块后的内容\nindex = VectorStoreIndex([])\nfor chunk in doc.chunks():\n    # to_context_text() 会自动包含章节标题等上下文信息\n    index.insert(Document(text=chunk.to_context_text(), extra_info={}))\n\n# 创建查询引擎\nquery_engine = index.as_query_engine()\n\n# 执行查询\nresponse = query_engine.query(\"list all the tasks that work with bart\")\nprint(response)\n```\n\n### 3. 
针对特定章节或表格进行分析\n\n您可以直接提取特定章节或表格的 HTML 内容，发送给 LLM 进行深度分析或总结。\n\n**提取并总结特定章节：**\n\n```python\nfrom llama_index.llms import OpenAI\n\n# 查找标题为 '3 Fine-tuning BART' 的章节\nselected_section = None\nfor section in doc.sections():\n    if section.title == '3 Fine-tuning BART':\n        selected_section = section\n        break\n\n# 获取包含子章节的完整 HTML 内容\ncontext = selected_section.to_html(include_children=True, recurse=True)\n\n# 使用 LLM 进行总结\nquestion = \"list all the tasks discussed and one line about each task\"\nresp = OpenAI().complete(f\"read this text and answer question: {question}:\\n{context}\")\nprint(resp.text)\n```\n\n**分析表格数据：**\n\n```python\n# 获取文档中的第 6 个表格 (索引从 0 开始)\ntable_html = doc.tables()[5].to_html()\n\n# 让 LLM 基于表格内容回答问题\nresp = OpenAI().complete(f\"read this table and answer question: which model has the best performance on squad 2.0:\\n{table_html}\")\nprint(resp.text)\n```\n\n> **重要提示**：目前的公共 API 服务器仅支持带有文本层的 PDF（不支持纯图片 PDF 的 OCR 识别）。如需处理复杂文件或保障数据隐私，请务必部署私有服务。","某金融科技团队正在构建基于 RAG（检索增强生成）的内部合规问答系统，需要处理数千份包含复杂表格和多层级标题的 PDF 监管文档。\n\n### 没有 llmsherpa 时\n- **段落支离破碎**：传统解析器常将完整的句子按物理行强行切断，导致向量切片中充满不连贯的碎片文本，严重影响语义理解。\n- **层级信息丢失**：无法识别章节与子章节的从属关系，当切片脱离上下文时，AI 无法判断该段落属于“反洗钱”还是“信贷风险”部分。\n- **表格数据错乱**：复杂的监管报表被还原为无结构的纯文本，行列对应关系完全混乱，导致关键数值无法被准确检索。\n- **噪音干扰严重**：每页重复出现的页眉、页脚和水印被当作正文内容索引，产生大量低质量的匹配结果，干扰回答准确性。\n\n### 使用 llmsherpa 后\n- **智能段落重组**：LayoutPDFReader 自动合并断行，还原文档逻辑段落，确保每个向量切片都是语义完整的自然语言单元。\n- **保留层级结构**：精准提取标题树状结构，将章节路径作为元数据附加到内容块上，让 AI 在回答时能精准定位上下文背景。\n- **表格结构化还原**：自动识别表格区域及其所属章节，保持行列逻辑完整，使模型能正确解读监管指标中的数值关系。\n- **自动清洗噪音**：内置算法智能剔除重复页眉、页脚及水印，仅保留高价值正文内容，显著提升检索信噪比。\n\nllmsherpa 通过还原文档的物理布局与逻辑结构，解决了非结构化 PDF 转化为高质量 LLM 语料的核心难题，大幅提升了 RAG 
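system answer accuracy.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnlmatics_llmsherpa_727f01fa.png","nlmatics","NLMatics","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fnlmatics_b917acb7.png","",null,"https:\u002F\u002Fwww.nlmatics.com","https:\u002F\u002Fgithub.com\u002Fnlmatics",[83,87],{"name":84,"color":85,"percentage":86},"Jupyter Notebook","#DA5B0B",77.7,{"name":88,"color":89,"percentage":90},"Python","#3572A5",22.3,1751,167,"2026-04-03T13:05:45","MIT","Linux, macOS, Windows","Not specified",{"notes":98,"python":96,"dependencies":99},"The tool runs primarily as a Python client library; the core parsing logic depends on a remote API service (by default the official free server, which is due to be shut down). Users are strongly advised to follow the documentation and deploy the backend service (nlm-ingestor) themselves; it can be run from a Docker image. Only PDFs with a text layer are currently supported, and OCR for image-only PDFs is not yet available. Installation only requires pip install llmsherpa, with no heavy local dependencies (such as PyTorch) unless you deploy the backend service yourself.",[67,100,101],"llama-index","openai",[26,13,51],"2026-03-27T02:49:30.150509","2026-04-06T07:13:11.939749",[106,111,116,121,126,131],{"id":107,"question_zh":108,"answer_zh":109,"source_url":110},14534,"How can I get metadata such as the PDF's page number, file name, or page labels?","The full block information is available through chunk.block_json, or you can use attributes such as chunk.page_idx (page index) and chunk.block_idx (block index) directly. The library does not currently attach the file name or URL automatically, but you can add this metadata to each chunk yourself after parsing the JSON, so that the source can be traced in a RAG application.","https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fllmsherpa\u002Fissues\u002F13",{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},14535,"Why do duplicate nodes or text appear in the output of to_html() or to_text()?","This is a known issue that typically occurs when the document tree contains nested sections (for example, Section-1 contains Section-2). The current to_text implementation recursively traverses all child nodes, so a subsection's text is emitted once while its parent section is traversed and again when the subsection itself is traversed. The maintainers have confirmed the issue and opened Pull Request #83 to fix it; updating to the latest version that includes the fix is recommended.","https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fllmsherpa\u002Fissues\u002F79",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},14536,"What should I do about a \"KeyError: 'result'\" error when running the code?","This is usually a transient problem caused by high server load rather than a logic error in the code. The workaround is to retry a few times (one user reported success after about 10 retries). The maintainers said they were adding server capacity and planned to offer a private hosting option. If the problem persists, try again later or check your network connection.","https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fllmsherpa\u002Fissues\u002F7",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},14537,"解析 PDF 时遇到 \"MaxRetryError\" 或 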
\"SSLCertVerificationError\" 错误如何解决？","这通常不是 llmsherpa 库本身的错误，而是运行代码的服务器与目标 PDF 源（如 arxiv.org）之间的连接问题。可能的原因包括：目标网站限制了频繁下载（节流）、临时网络连通性问题或本地 SSL 证书验证失败。建议检查网络连接，尝试更换网络环境，或在本地环境中确保证书链完整。如果是访问 arxiv 失败，可能是对方限制了请求频率。","https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fllmsherpa\u002Fissues\u002F34",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},14538,"运行 Demo 时遇到 \"List Index out of range\" 错误如何处理？","该问题已在版本 0.1.2 中修复。如果您遇到此错误，请将 llmsherpa 库升级到 0.1.2 或更高版本（命令：pip install --upgrade llmsherpa），然后重新运行代码即可解决。","https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fllmsherpa\u002Fissues\u002F5",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},14539,"解析本地文件时出错，是否代码中存在拼写错误？","是的，社区用户在 file_reader.py 第 63 行发现了一个潜在的逻辑错误，指出判断条件可能应该是 `!= \"c\"` 而不是 `!= \"\"`。维护者已确认该问题并接受了相关的 Pull Request 进行修正。建议更新到最新版本以获取此修复。","https:\u002F\u002Fgithub.com\u002Fnlmatics\u002Fllmsherpa\u002Fissues\u002F92",[137,141,146,151,156,160],{"id":138,"version":139,"summary_zh":79,"released_at":140},81455,"v0.1.5","2024-06-13T10:28:11",{"id":142,"version":143,"summary_zh":144,"released_at":145},81456,"v0.1.4","更新了 README，添加了关于 nlm-ingestor 的信息；为块增加了边界框，并向文档类添加了 to_html 和 to_text 方法。","2024-01-24T05:16:48",{"id":147,"version":148,"summary_zh":149,"released_at":150},81457,"v0.1.3","- 添加了文档\r\n- 为分块和所有节点添加了元数据","2023-11-01T22:00:30",{"id":152,"version":153,"summary_zh":154,"released_at":155},81458,"v0.1.2","- 改进了文档\n- 修复了头部顺序错乱导致的列表索引越界错误\n- 添加了 urllib3 作为依赖项","2023-10-21T20:53:56",{"id":157,"version":158,"summary_zh":154,"released_at":159},81459,"v0.1.1","2023-10-17T23:59:31",{"id":161,"version":162,"summary_zh":163,"released_at":164},81460,"v0.1.0","llmsherpa 首次发布，内置 LayoutPDFReader","2023-10-17T23:35:09"]