[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-google--langextract":3,"tool-google--langextract":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 
多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":98,"forks":99,"last_commit_at":100,"license":101,"difficulty_score":23,"env_os":102,"env_gpu":102,"env_ram":102,"env_deps":103,"category_tags":106,"github_topics":107,"view_count":119,"oss_zip_url":80,"oss_zip_packed_at":80,"status":16,"created_at":120,"updated_at":121,"faqs":122,"releases":151},992,"google\u002Flangextract","langextract","A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.","LangExtract 是一个开源的 Python 库，帮你从杂乱文本（比如医疗报告、小说或会议记录）中自动抓取结构化信息。它用大型语言模型（LLMs）理解你的需求——只需简单描述任务并提供几个例子，就能精准提取关键内容，如人物关系、药物名称或诊断结果，同时确保每个结果都能精确回溯到原文位置，避免“凭空捏造”。\n\n它解决了非结构化文本处理的痛点：传统方法常漏掉细节、难以验证准确性，尤其面对长文档时更棘手。LangExtract 通过三大亮点提升体验：一是“精确溯源”，自动高亮提取内容在原文的出处，方便快速核对；二是“交互式可视化”，一键生成可浏览的 HTML 文件，轻松审查成百上千条数据；三是高效处理长文本，用智能分块和并行计算提升召回率。它灵活支持云端模型（如 Gemini）和本地开源模型（通过 Ollama），无需模型微调，几分钟就能适配新领域。\n\n开发者、研究人员（尤其是医疗、金融或学术领域）会爱上它的易用性——你只需定义任务规则，就能构建定制化提取流程，省去繁琐编码。普通技术用户也能快速上手，让文本分析变得直观可靠。试试 LangExtract，把杂乱信息变成清晰结构，专注你的核心工作吧！","LangExtract 是一个开源的 Python 库，帮你从杂乱文本（比如医疗报告、小说或会议记录）中自动抓取结构化信息。它用大型语言模型（LLMs）理解你的需求——只需简单描述任务并提供几个例子，就能精准提取关键内容，如人物关系、药物名称或诊断结果，同时确保每个结果都能精确回溯到原文位置，避免“凭空捏造”。\n\n它解决了非结构化文本处理的痛点：传统方法常漏掉细节、难以验证准确性，尤其面对长文档时更棘手。LangExtract 通过三大亮点提升体验：一是“精确溯源”，自动高亮提取内容在原文的出处，方便快速核对；二是“交互式可视化”，一键生成可浏览的 HTML 文件，轻松审查成百上千条数据；三是高效处理长文本，用智能分块和并行计算提升召回率。它灵活支持云端模型（如 Gemini）和本地开源模型（通过 Ollama），无需模型微调，几分钟就能适配新领域。\n\n开发者、研究人员（尤其是医疗、金融或学术领域）会爱上它的易用性——你只需定义任务规则，就能构建定制化提取流程，省去繁琐编码。普通技术用户也能快速上手，让文本分析变得直观可靠。试试 LangExtract，把杂乱信息变成清晰结构，专注你的核心工作吧！","\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\">\n    \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fgoogle\u002Flangextract\u002Fmain\u002Fdocs\u002F_static\u002Flogo.svg\" alt=\"LangExtract Logo\" width=\"128\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n# LangExtract\n\n[![PyPI version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Flangextract.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Flangextract\u002F)\n[![GitHub 
stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgoogle\u002Flangextract.svg?style=social&label=Star)](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract)\n![Tests](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Factions\u002Fworkflows\u002Fci.yaml\u002Fbadge.svg)\n[![DOI](https:\u002F\u002Fzenodo.org\u002Fbadge\u002FDOI\u002F10.5281\u002Fzenodo.17015089.svg)](https:\u002F\u002Fdoi.org\u002F10.5281\u002Fzenodo.17015089)\n\n## Table of Contents\n\n- [Introduction](#introduction)\n- [Why LangExtract?](#why-langextract)\n- [Quick Start](#quick-start)\n- [Installation](#installation)\n- [API Key Setup for Cloud Models](#api-key-setup-for-cloud-models)\n- [Adding Custom Model Providers](#adding-custom-model-providers)\n- [Using OpenAI Models](#using-openai-models)\n- [Using Local LLMs with Ollama](#using-local-llms-with-ollama)\n- [More Examples](#more-examples)\n  - [*Romeo and Juliet* Full Text Extraction](#romeo-and-juliet-full-text-extraction)\n  - [Medication Extraction](#medication-extraction)\n  - [Radiology Report Structuring: RadExtract](#radiology-report-structuring-radextract)\n- [Community Providers](#community-providers)\n- [Contributing](#contributing)\n- [Testing](#testing)\n- [Disclaimer](#disclaimer)\n\n## Introduction\n\nLangExtract is a Python library that uses LLMs to extract structured information from unstructured text documents based on user-defined instructions. It processes materials such as clinical notes or reports, identifying and organizing key details while ensuring the extracted data corresponds to the source text.\n\n## Why LangExtract?\n\n1.  **Precise Source Grounding:** Maps every extraction to its exact location in the source text, enabling visual highlighting for easy traceability and verification.\n2.  **Reliable Structured Outputs:** Enforces a consistent output schema based on your few-shot examples, leveraging controlled generation in supported models like Gemini to guarantee robust, structured results.\n3.  **Optimized for Long Documents:** Overcomes the \"needle-in-a-haystack\" challenge of large document extraction by using an optimized strategy of text chunking, parallel processing, and multiple passes for higher recall.\n4.  **Interactive Visualization:** Instantly generates a self-contained, interactive HTML file to visualize and review thousands of extracted entities in their original context.\n5.  **Flexible LLM Support:** Supports your preferred models, from cloud-based LLMs like the Google Gemini family to local open-source models via the built-in Ollama interface.\n6.  **Adaptable to Any Domain:** Define extraction tasks for any domain using just a few examples. LangExtract adapts to your needs without requiring any model fine-tuning.\n7.  **Leverages LLM World Knowledge:** Utilize precise prompt wording and few-shot examples to influence how the extraction task may utilize LLM knowledge. The accuracy of any inferred information and its adherence to the task specification are contingent upon the selected LLM, the complexity of the task, the clarity of the prompt instructions, and the nature of the prompt examples.\n\n## Quick Start\n\n> **Note:** Using cloud-hosted models like Gemini requires an API key. See the [API Key Setup](#api-key-setup-for-cloud-models) section for instructions on how to get and configure your key.\n\nExtract structured information with just a few lines of code.\n\n### 1. Define Your Extraction Task\n\nFirst, create a prompt that clearly describes what you want to extract. 
Then, provide a high-quality example to guide the model.\n\n```python\nimport langextract as lx\nimport textwrap\n\n# 1. Define the prompt and extraction rules\nprompt = textwrap.dedent(\"\"\"\\\n    Extract characters, emotions, and relationships in order of appearance.\n    Use exact text for extractions. Do not paraphrase or overlap entities.\n    Provide meaningful attributes for each entity to add context.\"\"\")\n\n# 2. Provide a high-quality example to guide the model\nexamples = [\n    lx.data.ExampleData(\n        text=\"ROMEO. But soft! What light through yonder window breaks? It is the east, and Juliet is the sun.\",\n        extractions=[\n            lx.data.Extraction(\n                extraction_class=\"character\",\n                extraction_text=\"ROMEO\",\n                attributes={\"emotional_state\": \"wonder\"}\n            ),\n            lx.data.Extraction(\n                extraction_class=\"emotion\",\n                extraction_text=\"But soft!\",\n                attributes={\"feeling\": \"gentle awe\"}\n            ),\n            lx.data.Extraction(\n                extraction_class=\"relationship\",\n                extraction_text=\"Juliet is the sun\",\n                attributes={\"type\": \"metaphor\"}\n            ),\n        ]\n    )\n]\n```\n\n> **Note:** Examples drive model behavior. Each `extraction_text` should ideally be verbatim from the example's `text` (no paraphrasing), listed in order of appearance. LangExtract raises `Prompt alignment` warnings by default if examples don't follow this pattern—resolve these for best results.\n>\n> **Grounding:** LLMs may occasionally extract content from few-shot examples rather than the input text. LangExtract automatically detects this: extractions that cannot be located in the source text will have `char_interval = None`. Filter these out with `[e for e in result.extractions if e.char_interval]` to keep only grounded results.\n\n### 2. Run the Extraction\n\nProvide your input text and the prompt materials to the `lx.extract` function.\n\n```python\n# The input text to be processed\ninput_text = \"Lady Juliet gazed longingly at the stars, her heart aching for Romeo\"\n\n# Run the extraction\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=prompt,\n    examples=examples,\n    model_id=\"gemini-2.5-flash\",\n)\n```\n\n> **Model Selection**: `gemini-2.5-flash` is the recommended default, offering an excellent balance of speed, cost, and quality. For highly complex tasks requiring deeper reasoning, `gemini-2.5-pro` may provide superior results. For large-scale or production use, a Tier 2 Gemini quota is suggested to increase throughput and avoid rate limits. See the [rate-limit documentation](https:\u002F\u002Fai.google.dev\u002Fgemini-api\u002Fdocs\u002Frate-limits#tier-2) for details.\n>\n> **Model Lifecycle**: Note that Gemini models have a lifecycle with defined retirement dates. Users should consult the [official model version documentation](https:\u002F\u002Fcloud.google.com\u002Fvertex-ai\u002Fgenerative-ai\u002Fdocs\u002Flearn\u002Fmodel-versions) to stay informed about the latest stable and legacy versions.\n\n### 3. Visualize the Results\n\nThe extractions can be saved to a `.jsonl` file, a popular format for working with language model data. 
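\n\nBefore saving, it can help to keep only grounded extractions. The following is a minimal sketch, assuming the `result` object from step 2 and reusing the `char_interval` filter from the grounding note in step 1; the attributes printed are the ones this README already shows:\n\n```python\n# Keep only extractions that were located in the source text;\n# ungrounded extractions have char_interval = None (see the note above).\ngrounded = [e for e in result.extractions if e.char_interval]\nfor e in grounded:\n    print(e.extraction_class, e.extraction_text, e.char_interval)\n```\n\n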
LangExtract can then generate an interactive HTML visualization from this file to review the entities in context.\n\n```python\n# Save the results to a JSONL file\nlx.io.save_annotated_documents([result], output_name=\"extraction_results.jsonl\", output_dir=\".\")\n\n# Generate the visualization from the file\nhtml_content = lx.visualize(\"extraction_results.jsonl\")\nwith open(\"visualization.html\", \"w\") as f:\n    if hasattr(html_content, 'data'):\n        f.write(html_content.data)  # For Jupyter\u002FColab\n    else:\n        f.write(html_content)\n```\n\nThis creates an animated and interactive HTML file:\n\n![Romeo and Juliet Basic Visualization ](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle_langextract_readme_35d6f27b6b2c.gif)\n\n> **Note on LLM Knowledge Utilization:** This example demonstrates extractions that stay close to the text evidence - extracting \"longing\" for Lady Juliet's emotional state and identifying \"yearning\" from \"gazed longingly at the stars.\" The task could be modified to generate attributes that draw more heavily from the LLM's world knowledge (e.g., adding `\"identity\": \"Capulet family daughter\"` or `\"literary_context\": \"tragic heroine\"`). The balance between text-evidence and knowledge-inference is controlled by your prompt instructions and example attributes.\n\n### Scaling to Longer Documents\n\nFor larger texts, you can process entire documents directly from URLs with parallel processing and enhanced sensitivity:\n\n```python\n# Process Romeo & Juliet directly from Project Gutenberg\nresult = lx.extract(\n    text_or_documents=\"https:\u002F\u002Fwww.gutenberg.org\u002Ffiles\u002F1513\u002F1513-0.txt\",\n    prompt_description=prompt,\n    examples=examples,\n    model_id=\"gemini-2.5-flash\",\n    extraction_passes=3,    # Improves recall through multiple passes\n    max_workers=20,         # Parallel processing for speed\n    max_char_buffer=1000    # Smaller contexts for better accuracy\n)\n```\n\nThis approach can extract hundreds of entities from full novels while maintaining high accuracy. The interactive visualization seamlessly handles large result sets, making it easy to explore hundreds of entities from the output JSONL file. **[See the full *Romeo and Juliet* extraction example →](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002Fdocs\u002Fexamples\u002Flonger_text_example.md)** for detailed results and performance insights.\n\n### Vertex AI Batch Processing\n\nSave costs on large-scale tasks by enabling Vertex AI Batch API: `language_model_params={\"vertexai\": True, \"batch\": {\"enabled\": True}}`.\n\nSee an example of the Vertex AI Batch API usage in [this example](docs\u002Fexamples\u002Fbatch_api_example.md).\n\n## Installation\n\n### From PyPI\n\n```bash\npip install langextract\n```\n\n*Recommended for most users. 
For isolated environments, consider using a virtual environment:*\n\n```bash\npython -m venv langextract_env\nsource langextract_env\u002Fbin\u002Factivate  # On Windows: langextract_env\\Scripts\\activate\npip install langextract\n```\n\n### From Source\n\nLangExtract uses modern Python packaging with `pyproject.toml` for dependency management:\n\n*Installing with `-e` puts the package in development mode, allowing you to modify the code without reinstalling.*\n\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract.git\ncd langextract\n\n# For basic installation:\npip install -e .\n\n# For development (includes linting tools):\npip install -e \".[dev]\"\n\n# For testing (includes pytest):\npip install -e \".[test]\"\n```\n\n### Docker\n\n```bash\ndocker build -t langextract .\ndocker run --rm -e LANGEXTRACT_API_KEY=\"your-api-key\" langextract python your_script.py\n```\n\n## API Key Setup for Cloud Models\n\nWhen using LangExtract with cloud-hosted models (like Gemini or OpenAI), you'll need to\nset up an API key. On-device models don't require an API key. For developers\nusing local LLMs, LangExtract offers built-in support for Ollama and can be\nextended to other third-party APIs by updating the inference endpoints.\n\n### API Key Sources\n\nGet API keys from:\n\n*   [AI Studio](https:\u002F\u002Faistudio.google.com\u002Fapp\u002Fapikey) for Gemini models\n*   [Vertex AI](https:\u002F\u002Fcloud.google.com\u002Fvertex-ai\u002Fgenerative-ai\u002Fdocs\u002Fsdks\u002Foverview) for enterprise use\n*   [OpenAI Platform](https:\u002F\u002Fplatform.openai.com\u002Fapi-keys) for OpenAI models\n\n### Setting up API key in your environment\n\n**Option 1: Environment Variable**\n\n```bash\nexport LANGEXTRACT_API_KEY=\"your-api-key-here\"\n```\n\n**Option 2: .env File (Recommended)**\n\nAdd your API key to a `.env` file:\n\n```bash\n# Add API key to .env file\ncat >> .env \u003C\u003C 'EOF'\nLANGEXTRACT_API_KEY=your-api-key-here\nEOF\n\n# Keep your API key secure\necho '.env' >> .gitignore\n```\n\nIn your Python code:\n```python\nimport langextract as lx\n\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=\"Extract information...\",\n    examples=[...],\n    model_id=\"gemini-2.5-flash\"\n)\n```\n\n**Option 3: Direct API Key (Not Recommended for Production)**\n\nYou can also provide the API key directly in your code, though this is not recommended for production use:\n\n```python\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=\"Extract information...\",\n    examples=[...],\n    model_id=\"gemini-2.5-flash\",\n    api_key=\"your-api-key-here\"  # Only use this for testing\u002Fdevelopment\n)\n```\n\n**Option 4: Vertex AI (Service Accounts)**\n\nUse [Vertex AI](https:\u002F\u002Fcloud.google.com\u002Fvertex-ai\u002Fdocs\u002Fstart\u002Fintroduction-unified-platform) for authentication with service accounts:\n\n```python\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=\"Extract information...\",\n    examples=[...],\n    model_id=\"gemini-2.5-flash\",\n    language_model_params={\n        \"vertexai\": True,\n        \"project\": \"your-project-id\",\n        \"location\": \"global\"  # or regional endpoint\n    }\n)\n```\n\n## Adding Custom Model Providers\n\nLangExtract supports custom LLM providers via a lightweight plugin system. 
You can add support for new models without changing core code.\n\n- Add new model support independently of the core library\n- Distribute your provider as a separate Python package\n- Keep custom dependencies isolated\n- Override or extend built-in providers via priority-based resolution\n\nSee the detailed guide in [Provider System Documentation](langextract\u002Fproviders\u002FREADME.md) to learn how to:\n\n- Register a provider with `@registry.register(...)`\n- Publish an entry point for discovery\n- Optionally provide a schema with `get_schema_class()` for structured output\n- Integrate with the factory via `create_model(...)`\n\n## Using OpenAI Models\n\nLangExtract supports OpenAI models (requires optional dependency: `pip install langextract[openai]`):\n\n```python\nimport langextract as lx\nimport os\n\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=prompt,\n    examples=examples,\n    model_id=\"gpt-4o\",  # Automatically selects OpenAI provider\n    api_key=os.environ.get('OPENAI_API_KEY'),\n    fence_output=True,\n    use_schema_constraints=False\n)\n```\n\nNote: OpenAI models require `fence_output=True` and `use_schema_constraints=False` because LangExtract doesn't implement schema constraints for OpenAI yet.\n\n## Using Local LLMs with Ollama\nLangExtract supports local inference using Ollama, allowing you to run models without API keys:\n\n```python\nimport langextract as lx\n\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=prompt,\n    examples=examples,\n    model_id=\"gemma2:2b\",  # Automatically selects Ollama provider\n    model_url=\"http:\u002F\u002Flocalhost:11434\",\n    fence_output=False,\n    use_schema_constraints=False\n)\n```\n\n**Quick setup:** Install Ollama from [ollama.com](https:\u002F\u002Follama.com\u002F), run `ollama pull gemma2:2b`, then `ollama serve`.\n\nFor detailed installation, Docker setup, and examples, see [`examples\u002Follama\u002F`](examples\u002Follama\u002F).\n\n## More Examples\n\nAdditional examples of LangExtract in action:\n\n### *Romeo and Juliet* Full Text Extraction\n\nLangExtract can process complete documents directly from URLs. This example demonstrates extraction from the full text of *Romeo and Juliet* from Project Gutenberg (147,843 characters), showing parallel processing, sequential extraction passes, and performance optimization for long document processing.\n\n**[View *Romeo and Juliet* Full Text Example →](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002Fdocs\u002Fexamples\u002Flonger_text_example.md)**\n\n### Medication Extraction\n\n> **Disclaimer:** This demonstration is for illustrative purposes of LangExtract's baseline capability only. It does not represent a finished or approved product, is not intended to diagnose or suggest treatment of any disease or condition, and should not be used for medical advice.\n\nLangExtract excels at extracting structured medical information from clinical text. 
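\n\nFor instance, a minimal, hypothetical task in the Quick Start style (the clinical sentence and class names below are invented for illustration and are not taken from the linked examples):\n\n```python\nimport langextract as lx\n\n# Hypothetical medication task reusing the same few-shot pattern as the Quick Start.\nprompt = \"Extract medication names, dosages, and routes. Use exact text from the input.\"\nexamples = [\n    lx.data.ExampleData(\n        text=\"Patient was given 250 mg IV Cefazolin.\",\n        extractions=[\n            lx.data.Extraction(extraction_class=\"dosage\", extraction_text=\"250 mg\"),\n            lx.data.Extraction(extraction_class=\"route\", extraction_text=\"IV\"),\n            lx.data.Extraction(extraction_class=\"medication\", extraction_text=\"Cefazolin\"),\n        ],\n    )\n]\n\nresult = lx.extract(\n    text_or_documents=\"Take aspirin 100 mg by mouth once daily.\",\n    prompt_description=prompt,\n    examples=examples,\n    model_id=\"gemini-2.5-flash\",\n)\n```\n\n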
These examples demonstrate both basic entity recognition (medication names, dosages, routes) and relationship extraction (connecting medications to their attributes), showing LangExtract's effectiveness for healthcare applications.\n\n**[View Medication Examples →](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002Fdocs\u002Fexamples\u002Fmedication_examples.md)**\n\n### Radiology Report Structuring: RadExtract\n\nExplore RadExtract, a live interactive demo on HuggingFace Spaces that shows how LangExtract can automatically structure radiology reports. Try it directly in your browser with no setup required.\n\n**[View RadExtract Demo →](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fgoogle\u002Fradextract)**\n\n## Community Providers\n\nExtend LangExtract with custom model providers! Check out our [Community Provider Plugins](COMMUNITY_PROVIDERS.md) registry to discover providers created by the community or add your own.\n\nFor detailed instructions on creating a provider plugin, see the [Custom Provider Plugin Example](examples\u002Fcustom_provider_plugin\u002F).\n\n## Contributing\n\nContributions are welcome! See [CONTRIBUTING.md](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002FCONTRIBUTING.md) to get started\nwith development, testing, and pull requests. You must sign a\n[Contributor License Agreement](https:\u002F\u002Fcla.developers.google.com\u002Fabout)\nbefore submitting patches.\n\n\n\n## Testing\n\nTo run tests locally from the source:\n\n```bash\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract.git\ncd langextract\n\n# Install with test dependencies\npip install -e \".[test]\"\n\n# Run all tests\npytest tests\n```\n\nOr reproduce the full CI matrix locally with tox:\n\n```bash\ntox  # runs pylint + pytest on Python 3.10 and 3.11\n```\n\n### Ollama Integration Testing\n\nIf you have Ollama installed locally, you can run integration tests:\n\n```bash\n# Test Ollama integration (requires Ollama running with gemma2:2b model)\ntox -e ollama-integration\n```\n\nThis test will automatically detect if Ollama is available and run real inference tests.\n\n## Development\n\n### Code Formatting\n\nThis project uses automated formatting tools to maintain consistent code style:\n\n```bash\n# Auto-format all code\n.\u002Fautoformat.sh\n\n# Or run formatters separately\nisort langextract tests --profile google --line-length 80\npyink langextract tests --config pyproject.toml\n```\n\n### Pre-commit Hooks\n\nFor automatic formatting checks:\n```bash\npre-commit install  # One-time setup\npre-commit run --all-files  # Manual run\n```\n\n### Linting\n\nRun linting before submitting PRs:\n\n```bash\npylint --rcfile=.pylintrc langextract tests\n```\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for full development guidelines.\n\n## Disclaimer\n\nThis is not an officially supported Google product. If you use\nLangExtract in production or publications, please cite accordingly and\nacknowledge usage. 
Use is subject to the [Apache 2.0 License](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002FLICENSE).\nFor health-related applications, use of LangExtract is also subject to the\n[Health AI Developer Foundations Terms of Use](https:\u002F\u002Fdevelopers.google.com\u002Fhealth-ai-developer-foundations\u002Fterms).\n\n---\n\n**Happy Extracting!**\n","\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\">\n    \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fgoogle\u002Flangextract\u002Fmain\u002Fdocs\u002F_static\u002Flogo.svg\" alt=\"LangExtract 标志\" width=\"128\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n# LangExtract\n\n[![PyPI 版本](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Flangextract.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Flangextract\u002F)\n[![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgoogle\u002Flangextract.svg?style=social&label=Star)](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract)\n![测试](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Factions\u002Fworkflows\u002Fci.yaml\u002Fbadge.svg)\n[![DOI](https:\u002F\u002Fzenodo.org\u002Fbadge\u002FDOI\u002F10.5281\u002Fzenodo.17015089.svg)](https:\u002F\u002Fdoi.org\u002F10.5281\u002Fzenodo.17015089)\n\n## 目录\n\n- [简介](#简介)\n- [为什么选择 LangExtract？](#为什么选择-langextract)\n- [快速开始](#快速开始)\n- [安装](#安装)\n- [云模型的 API 密钥配置](#云模型的-api-密钥配置)\n- [添加自定义模型提供商](#添加自定义模型提供商)\n- [使用 OpenAI 模型](#使用-openai-模型)\n- [通过 Ollama 使用本地大语言模型（LLM）](#通过-ollama-使用本地大语言模型llm)\n- [更多示例](#更多示例)\n  - [*罗密欧与朱丽叶*全文提取](#罗密欧与朱丽叶-全文提取)\n  - [药物信息提取](#药物信息提取)\n  - [放射科报告结构化：RadExtract](#放射科报告结构化-radextract)\n- [社区提供商](#社区提供商)\n- [贡献指南](#贡献指南)\n- [测试](#测试)\n- [免责声明](#免责声明)\n\n## 简介\n\nLangExtract 是一个 Python 库，利用大型语言模型（LLM）根据用户定义的指令从非结构化文本文档中提取结构化信息。它处理临床笔记或报告等材料，识别并组织关键细节，同时确保提取的数据与源文本严格对应。\n\n## 为什么选择 LangExtract？\n\n1.  **精准的源文本定位（Precise Source Grounding）：** 将每次提取精确映射到源文本中的具体位置，支持可视化高亮显示，便于追溯和验证。\n2.  **可靠的结构化输出（Reliable Structured Outputs）：** 基于您的少样本示例（few-shot examples）强制执行一致的输出模式，利用 Gemini 等支持模型的受控生成能力，确保结果稳健且结构化。\n3.  **长文档优化（Optimized for Long Documents）：** 通过优化的文本分块策略、并行处理和多轮提取，克服大型文档中\"大海捞针\"的挑战，提高召回率。\n4.  **交互式可视化（Interactive Visualization）：** 即时生成独立的交互式 HTML 文件，在原始上下文中可视化和审查数千个提取实体。\n5.  **灵活的 LLM 支持（Flexible LLM Support）：** 支持您偏好的模型，从 Google Gemini 系列等云托管 LLM 到通过内置 Ollama 接口的本地开源模型。\n6.  **全领域适配（Adaptable to Any Domain）：** 仅需少量示例即可定义任意领域的提取任务。LangExtract 无需模型微调即可适配您的需求。\n7.  **利用 LLM 世界知识（Leverages LLM World Knowledge）：** 通过精确的提示词设计和少样本示例，影响提取任务如何利用 LLM 的知识。任何推断信息的准确性及其对任务规范的符合程度，取决于所选 LLM、任务复杂度、提示指令清晰度以及提示示例的性质。\n\n## 快速开始\n\n> **注意：** 使用 Gemini 等云托管模型需要 API 密钥。请参阅 [API 密钥配置](#云模型的-api-密钥配置) 部分获取密钥申请和配置说明。\n\n仅需几行代码即可提取结构化信息。\n\n### 1. 定义提取任务\n\n首先，创建一个清晰描述提取目标的提示词。然后提供高质量示例引导模型。\n\n```python\nimport langextract as lx\nimport textwrap\n\n# 1. 定义提示词和提取规则\nprompt = textwrap.dedent(\"\"\"\\\n    按出现顺序提取角色、情绪和关系。\n    使用原文文本进行提取，不得改写或重叠实体。\n    为每个实体提供有意义的属性以补充上下文。\"\"\")\n\n# 2. 提供高质量示例引导模型\nexamples = [\n    lx.data.ExampleData(\n        text=\"ROMEO. But soft! What light through yonder window breaks? 
It is the east, and Juliet is the sun.\",\n        extractions=[\n            lx.data.Extraction(\n                extraction_class=\"character\",\n                extraction_text=\"ROMEO\",\n                attributes={\"emotional_state\": \"wonder\"}\n            ),\n            lx.data.Extraction(\n                extraction_class=\"emotion\",\n                extraction_text=\"But soft!\",\n                attributes={\"feeling\": \"gentle awe\"}\n            ),\n            lx.data.Extraction(\n                extraction_class=\"relationship\",\n                extraction_text=\"Juliet is the sun\",\n                attributes={\"type\": \"metaphor\"}\n            ),\n        ]\n    )\n]\n```\n\n> **注意：** 示例驱动模型行为。每个 `extraction_text` 应理想地直接取自示例的 `text`（不得改写），并按出现顺序列出。若示例不符合此模式，LangExtract 默认会触发 `Prompt alignment` 警告——为获得最佳结果，请解决这些警告。\n>\n> **源文本定位（Grounding）：** LLM 有时可能从少样本示例而非输入文本中提取内容。LangExtract 会自动检测此情况：无法在源文本中定位的提取项将具有 `char_interval = None`。使用 `[e for e in result.extractions if e.char_interval]` 过滤，仅保留有定位的结果。\n\n### 2. 执行提取\n\n将输入文本和提示材料提供给 `lx.extract` 函数。\n\n```python\n# 待处理的输入文本\ninput_text = \"Lady Juliet gazed longingly at the stars, her heart aching for Romeo\"\n\n# 执行提取\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=prompt,\n    examples=examples,\n    model_id=\"gemini-2.5-flash\",\n)\n```\n\n> **模型选择：** `gemini-2.5-flash` 是推荐的默认选项，在速度、成本和质量间取得极佳平衡。对于需要深度推理的高复杂度任务，`gemini-2.5-pro` 可能提供更优结果。大规模或生产环境建议申请 Gemini Tier 2 配额以提升吞吐量并避免速率限制。详情参见 [速率限制文档](https:\u002F\u002Fai.google.dev\u002Fgemini-api\u002Fdocs\u002Frate-limits#tier-2)。\n>\n> **模型生命周期：** Gemini 模型具有明确的生命周期和退役日期。用户应查阅 [官方模型版本文档](https:\u002F\u002Fcloud.google.com\u002Fvertex-ai\u002Fgenerative-ai\u002Fdocs\u002Flearn\u002Fmodel-versions) 了解最新稳定版和旧版信息。\n\n### 3. 
可视化结果\n\n提取结果可保存为 `.jsonl` 文件（处理语言模型数据的常用格式）。LangExtract 可从此文件生成交互式 HTML 可视化界面，在上下文中审查实体。\n\n```python\n\n# 将结果保存到 JSONL 文件\nlx.io.save_annotated_documents([result], output_name=\"extraction_results.jsonl\", output_dir=\".\")\n\n# 从文件生成可视化\nhtml_content = lx.visualize(\"extraction_results.jsonl\")\nwith open(\"visualization.html\", \"w\") as f:\n    if hasattr(html_content, 'data'):\n        f.write(html_content.data)  # For Jupyter\u002FColab\n    else:\n        f.write(html_content)\n```\n\n这将生成一个动画和交互式的 HTML 文件：\n\n![Romeo and Juliet Basic Visualization ](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle_langextract_readme_35d6f27b6b2c.gif)\n\n> **关于 LLM（大型语言模型）知识利用的说明：** 本示例展示了紧密基于文本证据的提取——为朱丽叶小姐的情感状态提取 \"longing\"（渴望），并从 \"gazed longingly at the stars\" 中识别出 \"yearning\"（渴望）。该任务可以修改为生成更多依赖 LLM 世界知识的属性（例如，添加 `\"identity\": \"Capulet family daughter\"` 或 `\"literary_context\": \"tragic heroine\"`）。文本证据与知识推理之间的平衡由您的提示指令和示例属性控制。\n\n### 扩展到更长文档\n\n对于较大文本，您可以通过并行处理和增强敏感度直接从 URL 处理整个文档：\n\n```python\n# 直接从 Project Gutenberg 处理《罗密欧与朱丽叶》\nresult = lx.extract(\n    text_or_documents=\"https:\u002F\u002Fwww.gutenberg.org\u002Ffiles\u002F1513\u002F1513-0.txt\",\n    prompt_description=prompt,\n    examples=examples,\n    model_id=\"gemini-2.5-flash\",\n    extraction_passes=3,    # 通过多次提取提高召回率\n    max_workers=20,         # 并行处理以提升速度\n    max_char_buffer=1000    # 更小的上下文以提高准确性\n)\n```\n\n此方法可以从整部小说中提取数百个实体，同时保持高精度。交互式可视化无缝处理大型结果集，便于从输出的 JSONL 文件中探索数百个实体。**[查看完整的《罗密欧与朱丽叶》提取示例 →](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002Fdocs\u002Fexamples\u002Flonger_text_example.md)** 以获取详细结果和性能洞察。\n\n### Vertex AI（谷歌云顶点 AI）批量处理\n\n通过启用 Vertex AI 批量 API 降低大规模任务成本：`language_model_params={\"vertexai\": True, \"batch\": {\"enabled\": True}}`。\n\n参见 [this example](docs\u002Fexamples\u002Fbatch_api_example.md) 中的 Vertex AI 批量 API 使用示例。\n\n## 安装\n\n### 从 PyPI 安装\n\n```bash\npip install langextract\n```\n\n*推荐给大多数用户。对于隔离环境，建议使用虚拟环境：*\n\n```bash\npython -m venv langextract_env\nsource langextract_env\u002Fbin\u002Factivate  # 在 Windows 上：langextract_env\\Scripts\\activate\npip install langextract\n```\n\n### 从源码安装\n\nLangExtract 使用现代 Python 打包方式，通过 `pyproject.toml` 进行依赖管理：\n\n*使用 `-e` 安装会将包置于开发模式，允许您修改代码而无需重新安装。*\n\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract.git\ncd langextract\n\n# 基础安装：\npip install -e .\n\n# 开发环境（包含代码检查工具）：\npip install -e \".[dev]\"\n\n# 测试环境（包含 pytest）：\npip install -e \".[test]\"\n```\n\n### Docker\n\n```bash\ndocker build -t langextract .\ndocker run --rm -e LANGEXTRACT_API_KEY=\"your-api-key\" langextract python your_script.py\n```\n\n## 云模型的 API 密钥设置\n\n当将 LangExtract 与云托管模型（如 Gemini 或 OpenAI）一起使用时，您需要设置 API 密钥。设备上模型不需要 API 密钥。对于使用本地 LLM（大型语言模型）的开发者，LangExtract 提供对 Ollama 的内置支持，并可通过更新推理端点扩展到其他第三方 API。\n\n### API 密钥来源\n\n从以下位置获取 API 密钥：\n\n*   [AI Studio](https:\u002F\u002Faistudio.google.com\u002Fapp\u002Fapikey) 用于 Gemini 模型\n*   [Vertex AI](https:\u002F\u002Fcloud.google.com\u002Fvertex-ai\u002Fgenerative-ai\u002Fdocs\u002Fsdks\u002Foverview) 用于企业级应用\n*   [OpenAI Platform](https:\u002F\u002Fplatform.openai.com\u002Fapi-keys) 用于 OpenAI 模型\n\n### 在环境中设置 API 密钥\n\n**选项 1：环境变量**\n\n```bash\nexport LANGEXTRACT_API_KEY=\"your-api-key-here\"\n```\n\n**选项 2：.env 文件（推荐）**\n\n将您的 API 密钥添加到 `.env` 文件：\n\n```bash\n# 将 API 密钥添加到 .env 文件\ncat >> .env \u003C\u003C 'EOF'\nLANGEXTRACT_API_KEY=your-api-key-here\nEOF\n\n# 保护您的 API 密钥安全\necho '.env' >> .gitignore\n```\n\n在您的 Python 代码中：\n```python\nimport langextract as 
lx\n\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=\"Extract information...\",\n    examples=[...],\n    model_id=\"gemini-2.5-flash\"\n)\n```\n\n**选项 3：直接 API 密钥（不推荐用于生产环境）**\n\n您也可以在代码中直接提供 API 密钥，但不建议在生产环境中使用：\n\n```python\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=\"Extract information...\",\n    examples=[...],\n    model_id=\"gemini-2.5-flash\",\n    api_key=\"your-api-key-here\"  # 仅用于测试\u002F开发\n)\n```\n\n**选项 4：Vertex AI（服务账号）**\n\n使用 [Vertex AI](https:\u002F\u002Fcloud.google.com\u002Fvertex-ai\u002Fdocs\u002Fstart\u002Fintroduction-unified-platform) 通过服务账号进行身份验证：\n\n```python\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=\"Extract information...\",\n    examples=[...],\n    model_id=\"gemini-2.5-flash\",\n    language_model_params={\n        \"vertexai\": True,\n        \"project\": \"your-project-id\",\n        \"location\": \"global\"  # 或区域端点\n    }\n)\n```\n\n## 添加自定义模型提供者\n\nLangExtract 通过轻量级插件系统支持自定义 LLM 提供者。您可以在不更改核心代码的情况下添加对新模型的支持。\n\n- 独立于核心库添加新模型支持\n- 将您的提供者作为单独的 Python 包分发\n- 保持自定义依赖项隔离\n- 通过基于优先级的解析覆盖或扩展内置提供者\n\n参见 [Provider System Documentation](langextract\u002Fproviders\u002FREADME.md) 中的详细指南，了解如何：\n\n- 使用 `@registry.register(...)` 注册提供者\n- 发布用于发现的入口点\n- 可选地通过 `get_schema_class()` 提供模式以实现结构化输出\n- 通过 `create_model(...)` 与工厂集成\n\n## 使用 OpenAI 模型\n\nLangExtract 支持 OpenAI 模型（需要可选依赖：`pip install langextract[openai]`）：\n\n```python\nimport langextract as lx\nimport os\n\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=prompt,\n    examples=examples,\n    model_id=\"gpt-4o\",  # 自动选择 OpenAI 提供者\n    api_key=os.environ.get('OPENAI_API_KEY'),\n    fence_output=True,\n    use_schema_constraints=False\n)\n```\n\n注意：OpenAI 模型需要 `fence_output=True` 和 `use_schema_constraints=False`，因为 LangExtract 尚未为 OpenAI 实现模式约束。\n\n## 使用 Ollama 与本地大语言模型（LLM）\nLangExtract 支持通过 Ollama（本地大语言模型运行框架）进行本地推理，无需 API 密钥即可运行模型：\n\n```python\nimport langextract as lx\n\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=prompt,\n    examples=examples,\n    model_id=\"gemma2:2b\",  # 自动选择 Ollama 提供商\n    model_url=\"http:\u002F\u002Flocalhost:11434\",\n    fence_output=False,\n    use_schema_constraints=False\n)\n```\n\n**快速设置：** 从 [ollama.com](https:\u002F\u002Follama.com\u002F) 安装 Ollama，运行 `ollama pull gemma2:2b`，然后执行 `ollama serve`。\n\n详细安装指南、Docker（容器化平台）配置及示例请参阅 [`examples\u002Follama\u002F`](examples\u002Follama\u002F)。\n\n## 更多样例\n\nLangExtract 实际应用的其他示例：\n\n### *罗密欧与朱丽叶* 全文提取\n\nLangExtract 可直接从 URL 处理完整文档。本示例演示了从 Project Gutenberg 获取的 *罗密欧与朱丽叶* 全文（147,843 个字符）进行提取，展示了并行处理、顺序提取流程以及长文档处理的性能优化。\n\n**[查看 *罗密欧与朱丽叶* 全文示例 →](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002Fdocs\u002Fexamples\u002Flonger_text_example.md)**\n\n### 药物信息提取\n\n> **免责声明：** 本演示仅用于说明 LangExtract 的基础能力，不代表已完成或获批的产品，不用于诊断或建议任何疾病\u002F症状的治疗，不可作为医疗建议使用。\n\nLangExtract 擅长从临床文本中提取结构化医疗信息。这些示例展示了基础实体识别（药物名称、剂量、给药途径）和关系提取（关联药物与其属性），体现 LangExtract 在医疗健康应用中的有效性。\n\n**[查看药物提取示例 →](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002Fdocs\u002Fexamples\u002Fmedication_examples.md)**\n\n### 放射科报告结构化：RadExtract\n\n探索 RadExtract，这是一个 HuggingFace Spaces 上的实时交互式演示，展示 LangExtract 如何自动结构化放射科报告。无需配置，直接在浏览器中体验。\n\n**[查看 RadExtract 演示 →](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fgoogle\u002Fradextract)**\n\n## 社区提供商\n\n通过自定义模型提供商扩展 LangExtract！查阅我们的 [社区提供商插件](COMMUNITY_PROVIDERS.md) 
注册表，发现社区创建的提供商或添加您自己的插件。\n\n创建提供商插件的详细说明请参见 [自定义提供商插件示例](examples\u002Fcustom_provider_plugin\u002F)。\n\n## 贡献指南\n\n欢迎贡献！参阅 [CONTRIBUTING.md](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002FCONTRIBUTING.md) 了解开发、测试和提交拉取请求的流程。提交补丁前必须签署 [贡献者许可协议](https:\u002F\u002Fcla.developers.google.com\u002Fabout)。\n\n## 测试\n\n从源码本地运行测试：\n\n```bash\n# 克隆仓库\ngit clone https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract.git\ncd langextract\n\n# 安装测试依赖\npip install -e \".[test]\"\n\n# 运行所有测试\npytest tests\n```\n\n或使用 tox 本地复现完整 CI 矩阵（持续集成测试矩阵）：\n\n```bash\ntox  # 在 Python 3.10 和 3.11 上运行 pylint + pytest\n```\n\n### Ollama 集成测试\n\n若本地已安装 Ollama，可运行集成测试：\n\n```bash\n# 测试 Ollama 集成（需运行含 gemma2:2b 模型的 Ollama）\ntox -e ollama-integration\n```\n\n该测试将自动检测 Ollama 可用性并执行真实推理测试。\n\n## 开发指南\n\n### 代码格式化\n\n本项目使用自动化格式化工具保持代码风格一致：\n\n```bash\n# 自动格式化所有代码\n.\u002Fautoformat.sh\n\n# 或分别运行格式化工具\nisort langextract tests --profile google --line-length 80\npyink langextract tests --config pyproject.toml\n```\n\n### 预提交钩子\n\n用于自动格式化检查：\n```bash\npre-commit install  # 一次性设置\npre-commit run --all-files  # 手动执行\n```\n\n### 代码检查\n\n提交 PR 前请运行代码检查：\n\n```bash\npylint --rcfile=.pylintrc langextract tests\n```\n\n完整开发规范请参阅 [CONTRIBUTING.md](CONTRIBUTING.md)。\n\n## 免责声明\n\n本项目非 Google 官方支持产品。若在生产环境或出版物中使用 LangExtract，请适当引用并注明使用情况。使用受 [Apache 2.0 License](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002FLICENSE) 约束。医疗相关应用还需遵守 [Health AI Developer Foundations Terms of Use](https:\u002F\u002Fdevelopers.google.com\u002Fhealth-ai-developer-foundations\u002Fterms)。\n\n---\n\n**愉快提取！**","# LangExtract 快速上手指南\n\n## 环境准备\n- **系统要求**：Python 3.8 或更高版本（推荐 3.10+）\n- **前置依赖**：\n  - pip 包管理工具（随 Python 自动安装）\n  - 建议使用虚拟环境隔离依赖（避免全局污染）\n  - 使用云模型（如 Gemini）需提前准备 API Key（本地模型无需）\n\n## 安装步骤\n```bash\n# 创建虚拟环境（推荐）\npython -m venv langextract_env\nsource langextract_env\u002Fbin\u002Factivate  # Linux\u002FMac\n# Windows 用户执行：langextract_env\\Scripts\\activate\n\n# 使用国内镜像源加速安装（推荐清华源）\npip install langextract -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n\n# 验证安装\npython -c \"import langextract; print('安装成功')\"\n```\n\n## 基本使用\n以下是最简示例，从文本中提取角色、情绪和关系信息：\n\n```python\nimport langextract as lx\nimport textwrap\n\n# 1. 定义提取任务（保持 prompt 清晰明确）\nprompt = textwrap.dedent(\"\"\"\\\n    Extract characters, emotions, and relationships in order of appearance.\n    Use exact text for extractions. Do not paraphrase or overlap entities.\"\"\")\n\n# 2. 提供示例（关键！决定模型行为）\nexamples = [\n    lx.data.ExampleData(\n        text=\"ROMEO. But soft! What light through yonder window breaks?\",\n        extractions=[\n            lx.data.Extraction(\n                extraction_class=\"character\",\n                extraction_text=\"ROMEO\",\n                attributes={\"emotional_state\": \"wonder\"}\n            )\n        ]\n    )\n]\n\n# 3. 执行提取（使用默认 Gemini 模型）\ninput_text = \"Lady Juliet gazed longingly at the stars\"\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=prompt,\n    examples=examples,\n    model_id=\"gemini-2.5-flash\"  # 推荐默认模型\n)\n\n# 4. 查看结果（仅保留文本中的有效提取）\ngrounded_extractions = [e for e in result.extractions if e.char_interval]\nprint(grounded_extractions)\n```\n\n> **关键提示**：\n> 1. 示例必须严格使用原文片段（禁止改写），否则会触发 `Prompt alignment` 警告\n> 2. 首次使用云模型需配置 API Key（[设置指南](#api-key-setup-for-cloud-models)）\n> 3. 本地测试推荐 `gemini-2.5-flash` 模型，兼顾速度与准确性\n> 4. 
结果中的 `char_interval` 为 `None` 表示未在原文找到对应内容，需过滤","一位医疗数据分析师正在处理医院电子健康记录系统中的非结构化医生笔记，需要从数千份长篇临床文档中提取患者当前服用的药物清单，以支持用药安全分析。\n\n### 没有 langextract 时\n- 手动逐行阅读笔记耗时极长，处理一份2000字的文档平均需40分钟，团队每周浪费20+小时在重复劳动上。\n- 提取的药物信息缺乏原文位置标记，审核时需反复翻查原始文本，错误率高达15%，例如漏掉“阿司匹林 100mg”中的剂量单位。\n- 输出格式混乱：不同分析师对“二甲双胍”可能写成“Metformin”或“格华止”，导致后续数据整合失败。\n- 长文档中关键信息分散（如药物名藏在段落末尾），人工易遗漏低频药物，召回率不足70%。\n\n### 使用 langextract 后\n- 自动解析文档仅需2分钟\u002F份，通过Ollama本地模型并行处理，团队效率提升20倍，周节省15+小时。\n- 每个提取项（如“二甲双胍 500mg”）精确高亮原文位置，交互式HTML报告一键跳转验证，错误率降至5%以下。\n- 强制输出标准化JSON结构（含药物名、剂量、频率字段），确保所有数据统一格式，无缝对接分析系统。\n- 优化分块策略自动扫描全文，多轮提取覆盖边缘案例，低频药物召回率提升至95%，关键信息零遗漏。\n\nlangextract将非结构化医疗文本的提取过程转化为高效、精准且可追溯的自动化流程，释放数据价值。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle_langextract_35d6f27b.gif","google","Google","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fgoogle_c4bedcda.png","Google ❤️ Open Source",null,"opensource@google.com","GoogleOSS","https:\u002F\u002Fopensource.google\u002F","https:\u002F\u002Fgithub.com\u002Fgoogle",[86,90,94],{"name":87,"color":88,"percentage":89},"Python","#3572A5",99.6,{"name":91,"color":92,"percentage":93},"Shell","#89e051",0.4,{"name":95,"color":96,"percentage":97},"Dockerfile","#384d54",0,35479,2405,"2026-04-05T23:03:01","Apache-2.0","未说明",{"notes":104,"python":102,"dependencies":105},"使用云模型（如Gemini\u002FOpenAI）需配置API key；支持本地模型需安装Ollama；处理长文档时建议增加max_workers参数提升并行效率；首次运行需网络访问以调用模型服务。",[],[53,26,51,13],[108,109,110,111,112,113,114,115,116,117,118],"llm","nlp","python","gemini-ai","information-extration","large-language-models","structured-data","gemini","gemini-api","gemini-flash","gemini-pro",41,"2026-03-27T02:49:30.150509","2026-04-06T09:45:08.315267",[123,128,132,136,141,146],{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},4409,"使用LangExtract时出现'ResolverParsingError: Content must contain an 'extractions' key'错误，如何解决？","此错误通常因文本分块后某些块无实体导致LLM返回空JSON。解决方案：增加max_char_buffer参数值，例如在代码中设置max_char_buffer=3000、4000或5000（默认1000）。升级到最新版本可自动处理模型返回列表或包含\u003Cthink>标签的情况。具体代码示例：`extractor = LangExtract(max_char_buffer=4000)`。","https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fissues\u002F132",{"id":129,"question_zh":130,"answer_zh":131,"source_url":127},4410,"使用推理模型（如GPT-4o）时出现解析错误，如何处理包含\u003Cthink>标签的输出？","新版本LangExtract已自动处理；解析器会在解析失败时剥离\u003Cthink>标签。请确保升级到最新版本（如v1.1.0+），无需额外配置。维护者确认PR #300修复了此问题，推理模型现在可正常工作。",{"id":133,"question_zh":134,"answer_zh":135,"source_url":127},4411,"当模型返回列表[...]而非字典{'extractions': [...]}时，LangExtract报错，如何解决？","新版本支持顶层列表格式；升级LangExtract到最新版本即可自动处理此类输出。维护者通过PR #300添加了顶层列表回退机制，模型返回[...]时不再报错，直接解析为extractions数据。",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},4412,"如何使LangExtract支持日语、中文等非英语语言？","使用UnicodeTokenizer。在代码中初始化时指定tokenizer=UnicodeTokenizer，例如：`extractor = LangExtract(tokenizer=UnicodeTokenizer())`。参考[日语提取示例文档](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002Fdocs\u002Fexamples\u002Fjapanese_extraction.md)配置，确保升级到v1.1.0或更高版本。","https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fissues\u002F13",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},4413,"如何集成Outlines库进行结构化生成？","安装langextract-outlines包：`pip install 
langextract-outlines`。然后按照[COMMUNITY_PROVIDERS.md](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002FCOMMUNITY_PROVIDERS.md)中的说明配置，例如使用OutlinesProvider指定输出格式。这提供统一接口处理不同模型的结构化输出，无需手动管理模型特定参数。","https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fissues\u002F101",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},4414,"提取中文文本时，为什么char_interval属性为None？","需要启用UnicodeTokenizer以支持多字节字符。升级LangExtract到最新版本，并在初始化时指定tokenizer=UnicodeTokenizer，例如：`extractor = LangExtract(tokenizer=UnicodeTokenizer())`。参考[日语示例文档](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002Fdocs\u002Fexamples\u002Fjapanese_extraction.md)配置，确保正确对齐中文字符区间。","https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fissues\u002F201",[152,157,162,167,172,177,182,187,192,197,202,207,212],{"id":153,"version":154,"summary_zh":155,"released_at":156},113531,"v1.2.0","## What's New\n\n### Features\n- Cross-chunk context awareness for coreference resolution (#306)\n  - Resolves pronouns and references across chunk boundaries (e.g., \"She\" → \"Dr. Sarah Johnson\")\n  - New `context_window_chars` parameter on `extract()`\n\n### Bug Fixes\n- Load builtin providers before resolution regardless of config path (#419)\n  - Fixes `InferenceConfigError` when specifying provider by name via `ModelConfig(provider='ollama')`\n- Graceful handling of chunks with no extractable entities (#423)\n  - `suppress_parse_errors` now defaults to `True` in `extract()` so one unparseable chunk does not fail the entire document\n  - Sanitizes suppress-parse-error log path to exclude raw chunk text\n- Send `keep_alive` at top level for Ollama API (#421)\n- Support Enum\u002Fdataclass values in GCS batch cache hashing (#359)\n- Handle non-Gemini model output parsing edge cases (#300)\n\n### Documentation\n- Clarify that ungrounded extractions have `char_interval=None` (#420)\n- Clarify best practices for few-shot examples (#302)\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fcompare\u002Fv1.1.1...v1.2.0","2026-03-22T22:11:16",{"id":158,"version":159,"summary_zh":160,"released_at":161},113532,"v1.1.1","## What's New\n\n### Improvements\n- Multi-language tokenizer support with Unicode & Regex (#284)\n  - Significantly improves support for CJK (Chinese, Japanese, Korean) languages\n  - Better handling of non-Latin scripts\n\n### Bug Fixes\n- Fix Gemini Batch API project parameter passing (#286)\n  - Resolves \"Required parameter: project\" error when using Vertex AI\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fcompare\u002Fv1.1.0...v1.1.1","2025-11-27T04:49:59",{"id":163,"version":164,"summary_zh":165,"released_at":166},113533,"v1.1.0","## What's New\n\n### Features\n- Vertex AI Batch API Support (#279)\n  - Cost-effective processing with automatic chunking, GCS caching, and fault tolerance\n  - Automatic fallback to standard online prediction if batch job fails\n- FormatHandler and schema validation framework (#239)\n- Independent progress bar control (`show_progress`) (#227)\n- Zenodo DOI support (#218)\n- Alignment parameter support via `resolver_params` (#211)\n- Community Providers:\n  - Outlines (#250)\n  - vLLM (#244)\n  - llama-cpp-python (#202)\n\n### Improvements\n- Streamlined annotation layer with lazy streaming (#276)\n- Diverse text type benchmark with tokenization quality metrics (#272)\n- Enable `suppress_parse_errors` parameter in `resolver_params` (#261)\n- Resolve pylint naming 
convention warnings in provider modules (#273)\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fcompare\u002Fv1.0.9...v1.1.0\n","2025-11-14T22:21:00",{"id":168,"version":169,"summary_zh":170,"released_at":171},113534,"v1.0.9","## What's New\n\n### Features\n- Prompt alignment validation for few-shot examples (#215)\n  - Validates that example extractions exist in their source text\n  - Three modes: OFF, WARNING (default), ERROR\n  - New parameters: `prompt_validation_level` and `prompt_validation_strict`\n- Vertex AI authentication support for Gemini provider (#60)\n- llama-cpp-python community provider added (#202)\n\n### Improvements\n- Changed `debug=False` as default in `extract()` for cleaner output\n- Fixed router typings for provider plugins (#190)\n- Allow T-prefixed TypeVars in pylint (#194)\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fcompare\u002Fv1.0.8...v1.0.9","2025-08-31T19:50:26",{"id":173,"version":174,"summary_zh":175,"released_at":176},113535,"v1.0.8","## What's Changed\n\n### Features\n- Ollama timeout improvements (#154)\n  - Increased default timeout from 30s to 120s \n  - Made timeout configurable via ModelConfig\n  - Fixed kwargs not being passed through\n\n### Documentation\n- Improved visualization examples for Jupyter\u002FColab (#153)\n- Added Romeo & Juliet Colab notebook\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fcompare\u002Fv1.0.7...v1.0.8","2025-08-15T07:19:02",{"id":178,"version":179,"summary_zh":180,"released_at":181},113536,"v1.0.7","## What's New\n\n- Debug logging support when `debug=True` in `lx.extract()` (#142)\n- GPT-5 model registration fixes (#143)\n- Improved documentation for provider plugins and schema support\n- Automated plugin generator script for external providers\n- Base URL support for OpenAI-compatible endpoints (#138)\n\nSee the [full changelog](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fcompare\u002Fv1.0.6...v1.0.7) for details.","2025-08-14T11:37:40",{"id":183,"version":184,"summary_zh":185,"released_at":186},113537,"v1.0.6","## Major Features\n\n### Custom Model Provider Plugin Support\n- New provider registry infrastructure for extending LangExtract with custom LLM providers\n- Plugin discovery via entry points allows third-party packages to register providers\n- Example implementation available at [examples\u002Fcustom_provider_plugin](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Ftree\u002Fmain\u002Fexamples\u002Fcustom_provider_plugin)\n\n### Schema System Refactor  \n- Refactored schema system to support provider-specific schema implementations\n- Providers can now define their own schema constraints and validation\n- Better separation of concerns between core schema logic and provider implementations\n\n## Enhancements\n\n- **Ollama Provider**: Added support for Hugging Face style model IDs (e.g., `meta-llama\u002FLlama-3.2-1B-Instruct`)\n- **Extract API**: Added `model` and `config` parameters to `extract()` for more flexible model configuration\n- **Examples**: Updated Ollama quickstart to demonstrate ModelConfig pattern with JSON mode\n- **Testing**: Improved test infrastructure for provider registry and plugin system\n\n## Bug Fixes\n\n- Fixed lazy loading for provider pattern registration\n- Fixed unicode escaping in example generation\n- Fixed test failures related to provider registry initialization\n\n## Installation\n```bash\npip install 
langextract==1.0.6\n```\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fcompare\u002Fv1.0.5...v1.0.6","2025-08-13T10:22:16",{"id":188,"version":189,"summary_zh":190,"released_at":191},113538,"v1.0.5","## What's Changed\n\n### Bug Fixes\n- Fix chunking bug when newlines fall at chunk boundaries (#88) - Resolves issue where content was incorrectly chunked when newline characters appeared at chunk boundaries\n- Fix IPython import warnings and improve notebook detection (#86) - Eliminates import warnings in Jupyter notebooks and improves compatibility\n\n### New Features  \n- Add base_url parameter to OpenAILanguageModel (#51) - Enables using custom OpenAI-compatible endpoints for alternative LLM providers\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fcompare\u002Fv1.0.4...v1.0.5","2025-08-08T01:32:20",{"id":193,"version":194,"summary_zh":195,"released_at":196},113539,"v1.0.4","## What's Changed\n\n- **Added Ollama language model integration** – Full support for local LLMs via Ollama\n- **Docker deployment support** – Production-ready docker-compose setup with health checks\n- **Comprehensive examples** – Quickstart script and detailed documentation in `examples\u002Follama\u002F`\n- **Fixed OllamaLanguageModel parameter** – Changed from `model` to `model_id` for consistency (#57)\n- **Enhanced CI\u002FCD** – Added Ollama integration tests that run on every PR\n- **Improved documentation** – Consistent API examples across all language models\n\n## Technical Details\n\n- Supports all Ollama models (gemma2:2b, llama3.2, mistral, etc.)\n- Secure setup with localhost-only binding by default\n- Integration tests use lightweight models for faster CI runs\n- Docker setup includes automatic model pulling and health checks\n\n## Usage Example\n\n```python\nimport langextract as lx\n\nresult = lx.extract(\n    text_or_documents=input_text,\n    prompt_description=prompt,\n    examples=examples,\n    language_model_type=lx.inference.OllamaLanguageModel,\n    model_id=\"gemma2:2b\",\n    model_url=\"http:\u002F\u002Flocalhost:11434\",\n    fence_output=False,\n    use_schema_constraints=False\n)\n```\n\n**Quick setup:** Install Ollama from [ollama.com](https:\u002F\u002Follama.com\u002F), run `ollama pull gemma2:2b`, then `ollama serve`.\n\nFor detailed installation, Docker setup, and more examples, see [`examples\u002Follama\u002F`](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Ftree\u002Fmain\u002Fexamples\u002Follama).\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fcompare\u002Fv1.0.3...v1.0.4","2025-08-05T12:29:36",{"id":198,"version":199,"summary_zh":200,"released_at":201},113540,"v1.0.3","## v1.0.3 – OpenAI language model support\n\n### What's Changed\n- **Added OpenAI language model integration** – Support for GPT-4o, GPT-4o-mini, and other OpenAI models\n- **Enhanced documentation** – Added OpenAI usage examples and API key setup instructions to README\n- **Comprehensive test coverage** – Added unit tests for OpenAI backend\n\n### Technical Details\n- Uses modern OpenAI v1.x client API with parallel processing support\n- Note: Schema constraints for OpenAI are not yet implemented (use `use_schema_constraints=False`)\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fcompare\u002Fv1.0.2...v1.0.3","2025-08-03T17:26:25",{"id":203,"version":204,"summary_zh":205,"released_at":206},113541,"v1.0.2","## v1.0.2 – Slimmer install, Windows 
fix, OpenAI v1.x support\r\n\r\n### What’s Changed\r\n- **Removed `langfun` and `pylibmagic` dependencies** – lighter install; no `libmagic` needed on Windows  \r\n- Fixed Windows-installation failure [#25]  \r\n- Restored compatibility with modern **OpenAI SDK v1.x** [#16]  \r\n- Updated README and Dockerfile to match the new, slimmer dependency set\r\n\r\n### Note\r\n`LangFunLanguageModel` has been removed.  \r\nIf you still need LangFun support, please open a new issue so we can discuss re-adding it in a cross-platform way.\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fcompare\u002Fv1.0.1...v1.0.2","2025-08-03T13:47:12",{"id":208,"version":209,"summary_zh":210,"released_at":211},113542,"v1.0.1","### What's Changed\n- Fixed libmagic ImportError by adding pylibmagic dependency (#6)\n- Added `[full]` install option for easier setup\n- Added Docker support with pre-installed libmagic  \n- Updated troubleshooting documentation\n\n### Bug Fixes\n- Resolve \"failed to find libmagic\" error when importing langextract (#6)\n\n### Installation\n```bash\n# Standard install (now includes pylibmagic)\npip install langextract\n\n# Full install (explicit all dependencies)\npip install langextract[full]\n\n# Docker (libmagic pre-installed)\ndocker run --rm -e LANGEXTRACT_API_KEY=\"your-key\" langextract python script.py\n```\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fcompare\u002Fv1.0.0...v1.0.1","2025-08-02T06:40:38",{"id":213,"version":214,"summary_zh":215,"released_at":216},113543,"v1.0.0","A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.\n\n## Key Features\n- Extract structured data from any text using few-shot examples\n- Support for Gemini and Ollama models\n- Interactive HTML visualizations with source highlighting\n- Optimized for long documents with parallel processing and multiple extraction passes\n- Precise source grounding - every extraction maps to its location in the original text\n\n## Installation\n```bash\npip install langextract\n```\n\nSee the [documentation](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Flangextract\u002Fblob\u002Fmain\u002FREADME.md) for full usage examples.","2025-07-22T21:59:26"]