[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-aryn-ai--sycamore":3,"tool-aryn-ai--sycamore":62},[4,18,28,37,45,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":24,"last_commit_at":25,"category_tags":26,"status":17},9989,"n8n","n8n-io\u002Fn8n","n8n 是一款面向技术团队的公平代码（fair-code）工作流自动化平台，旨在让用户在享受低代码快速构建便利的同时，保留编写自定义代码的灵活性。它主要解决了传统自动化工具要么过于封闭难以扩展、要么完全依赖手写代码效率低下的痛点，帮助用户轻松连接 400 多种应用与服务，实现复杂业务流程的自动化。\n\nn8n 特别适合开发者、工程师以及具备一定技术背景的业务人员使用。其核心亮点在于“按需编码”：既可以通过直观的可视化界面拖拽节点搭建流程，也能随时插入 JavaScript 或 Python 代码、调用 npm 包来处理复杂逻辑。此外，n8n 原生集成了基于 LangChain 的 AI 能力，支持用户利用自有数据和模型构建智能体工作流。在部署方面，n8n 提供极高的自由度，支持完全自托管以保障数据隐私和控制权，也提供云端服务选项。凭借活跃的社区生态和数百个现成模板，n8n 让构建强大且可控的自动化系统变得简单高效。",184740,2,"2026-04-19T23:22:26",[16,14,13,15,27],"插件",{"id":29,"name":30,"github_repo":31,"description_zh":32,"stars":33,"difficulty_score":10,"last_commit_at":34,"category_tags":35,"status":17},10095,"AutoGPT","Significant-Gravitas\u002FAutoGPT","AutoGPT 是一个旨在让每个人都能轻松使用和构建 AI 的强大平台，核心功能是帮助用户创建、部署和管理能够自动执行复杂任务的连续型 AI 智能体。它解决了传统 AI 应用中需要频繁人工干预、难以自动化长流程工作的痛点，让用户只需设定目标，AI 即可自主规划步骤、调用工具并持续运行直至完成任务。\n\n无论是开发者、研究人员，还是希望提升工作效率的普通用户，都能从 AutoGPT 
中受益。开发者可利用其低代码界面快速定制专属智能体；研究人员能基于开源架构探索多智能体协作机制；而非技术背景用户也可直接选用预置的智能体模板，立即投入实际工作场景。\n\nAutoGPT 的技术亮点在于其模块化“积木式”工作流设计——用户通过连接功能块即可构建复杂逻辑，每个块负责单一动作，灵活且易于调试。同时，平台支持本地自托管与云端部署两种模式，兼顾数据隐私与使用便捷性。配合完善的文档和一键安装脚本，即使是初次接触的用户也能在几分钟内启动自己的第一个 AI 智能体。AutoGPT 正致力于降低 AI 应用门槛，让人人都能成为 AI 的创造者与受益者。",183572,"2026-04-20T04:47:55",[13,36,27,14,15],"语言模型",{"id":38,"name":39,"github_repo":40,"description_zh":41,"stars":42,"difficulty_score":10,"last_commit_at":43,"category_tags":44,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":46,"name":47,"github_repo":48,"description_zh":49,"stars":50,"difficulty_score":24,"last_commit_at":51,"category_tags":52,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",161147,"2026-04-19T23:31:47",[14,13,36],{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":59,"last_commit_at":60,"category_tags":61,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 
AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,27],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":68,"readme_en":69,"readme_zh":70,"quickstart_zh":71,"use_case_zh":72,"hero_image_url":73,"owner_login":74,"owner_name":75,"owner_avatar_url":76,"owner_bio":77,"owner_company":78,"owner_location":78,"owner_email":78,"owner_twitter":78,"owner_website":78,"owner_url":79,"languages":80,"stars":108,"forks":109,"last_commit_at":110,"license":111,"difficulty_score":24,"env_os":112,"env_gpu":113,"env_ram":114,"env_deps":115,"category_tags":126,"github_topics":128,"view_count":24,"oss_zip_url":78,"oss_zip_packed_at":78,"status":17,"created_at":138,"updated_at":139,"faqs":140,"releases":166},10152,"aryn-ai\u002Fsycamore","sycamore","🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.","Sycamore 是一款专为非结构化数据打造的开源搜索与分析平台，旨在让大语言模型（LLM）更轻松地理解和处理复杂文档。面对 PDF、演示文稿、手册等包含表格、图表和图像的复杂文件，传统工具往往难以精准提取内容，导致检索增强生成（RAG）效果不佳。Sycamore 通过智能分区技术，将这些杂乱文档转化为结构清晰、语义丰富的高质量数据块，从而显著提升下游搜索和分析的准确性。\n\n这款工具特别适合开发者、数据工程师及 AI 研究人员使用，尤其是那些需要构建企业级知识库、优化 RAG 应用或进行大规模文档 ETL（抽取、转换、加载）工作的团队。Sycamore 的核心亮点在于集成了 Aryn DocParse 服务，利用先进的视觉 AI 模型（DETR）精准识别文档布局，其数据分块准确度比同类系统高出数倍。此外，它基于 Ray 构建可扩展后端，提供灵活的 Python API 和\"DocSet\"抽象概念，让用户能像操作普通数据集一样轻松清洗、 enrich（增强）并加载数据至 OpenSearch、Pinecone 等主流向量数据库。无论是快速原型验证还是生产环境部署，Sycamore 都能帮","Sycamore 是一款专为非结构化数据打造的开源搜索与分析平台，旨在让大语言模型（LLM）更轻松地理解和处理复杂文档。面对 PDF、演示文稿、手册等包含表格、图表和图像的复杂文件，传统工具往往难以精准提取内容，导致检索增强生成（RAG）效果不佳。Sycamore 
通过智能分区技术，将这些杂乱文档转化为结构清晰、语义丰富的高质量数据块，从而显著提升下游搜索和分析的准确性。\n\n这款工具特别适合开发者、数据工程师及 AI 研究人员使用，尤其是那些需要构建企业级知识库、优化 RAG 应用或进行大规模文档 ETL（抽取、转换、加载）工作的团队。Sycamore 的核心亮点在于集成了 Aryn DocParse 服务，利用先进的视觉 AI 模型（DETR）精准识别文档布局，其数据分块准确度比同类系统高出数倍。此外，它基于 Ray 构建可扩展后端，提供灵活的 Python API 和\"DocSet\"抽象概念，让用户能像操作普通数据集一样轻松清洗、 enrich（增强）并加载数据至 OpenSearch、Pinecone 等主流向量数据库。无论是快速原型验证还是生产环境部署，Sycamore 都能帮助用户高效释放非结构化数据的价值。","\u003Ca name=\"readme-top\">\u003C\u002Fa>\n![SycamoreLogoFinal.svg](https:\u002F\u002Fraw.githubusercontent.com\u002Faryn-ai\u002Fsycamore\u002Fmain\u002Fdocs\u002Fsource\u002Fimages\u002Fsycamore_logo.svg)\n\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fsycamore-ai)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fsycamore-ai\u002F)\n[![PyPI - Python Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fsycamore-ai)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fsycamore-ai\u002F)\n[![Slack](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fslack-sycamore-brightgreen.svg?logo=slack)](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Faryn-community\u002Fshared_invite\u002Fzt-36vhennsx-mN3UsqD6PT2vxVZxpqdHsw)\n[![Docs](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Faryn-ai_sycamore_readme_13d664e1afd7.png)](https:\u002F\u002Fsycamore.readthedocs.io\u002Fen\u002Fstable\u002F?badge=stable)\n![License](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Faryn-ai\u002Fsycamore)\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002Faryn-ai\u002Fsycamore)\n\nSycamore is an open source, AI-powered document processing engine for ETL, RAG, LLM-based applications, and analytics on unstructured data. Sycamore can partition and enrich a wide range of document types including reports, presentations, transcripts, manuals, and more. It can analyze and chunk complex documents such as PDFs and images with embedded tables, figures, graphs, and other infographics. 
Check out an [example notebook](https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fblob\u002Fmain\u002Fnotebooks\u002Fsycamore-tutorial-intermediate-etl.ipynb).\n\nFor processing documents, Sycamore leverages [Aryn DocParse](https:\u002F\u002Fwww.aryn.ai\u002Fpost\u002Fannouncing-the-aryn-partitioning-service) (formerly known as the Aryn Partitioning Service), a serverless, GPU-powered API for segmenting and labeling documents, doing OCR, extracting tables and images, and more. It leverages Aryn's state-of-the-art, [open source deep learning DETR AI model](https:\u002F\u002Fhuggingface.co\u002FAryn\u002Fdeformable-detr-DocLayNet) trained on 80k+ enterprise documents, and it can lead to 6x more accurate data chunking and 2x improved recall on hybrid search or RAG when compared to alternative systems. You can [sign up for free here](http:\u002F\u002Fwww.aryn.ai\u002Fget-started), or choose to run the Aryn Partitioner locally.\n\nAryn DocParse takes [documents](https:\u002F\u002Fdocs.aryn.ai\u002Fdocparse\u002Fformats_supported) and returns the partitioned output in JSON, and you can use Sycamore for additional data extraction, enrichment, transforms, cleaning, and loading into downstream databases. You can choose the LLMs to use with these transforms.\n\nSycamore reliably loads your vector databases and hybrid search engines, including OpenSearch, Elasticsearch, Pinecone, DuckDB, Qdrant and Weaviate, with higher-quality data.\n\nThe Sycamore framework is built around a scalable and robust abstraction for document processing called a DocSet, and includes powerful high-level transformations in Python for data processing, enrichment, and cleaning. DocSets also encapsulate scalable data processing techniques, removing the undifferentiated heavy lifting of reliably loading chunks. 
DocSets' functional programming approach allows you to rapidly customize and experiment with your chunking for better quality RAG results.\n\n![Untitled](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Faryn-ai_sycamore_readme_481276fcbd59.png)\n\n## Features\n\n- Integrated with [Aryn DocParse](https:\u002F\u002Fsycamore.readthedocs.io\u002Fen\u002Fstable\u002Faryn_cloud\u002Faryn_partitioning_service.html), using a [state-of-the-art vision AI model](https:\u002F\u002Fhuggingface.co\u002FAryn\u002Fdeformable-detr-DocLayNet) for segmentation and preserving the semantic structure of documents\n- DocSet abstraction to scalably and reliably transform and manipulate unstructured documents\n- High-quality table extraction, OCR, visual summarization, LLM-powered UDFs, and other performant Python data transforms\n- Quickly create vector embeddings using your choice of AI model\n- Helpful features like automatic data crawlers (Amazon S3 and HTTP), Jupyter notebook for writing and iterating on jobs, and an OpenSearch hybrid search and RAG engine for testing\n- Scalable [Ray](https:\u002F\u002Fgithub.com\u002Fray-project\u002Fray) backend\n\n## Demo\n\n[Introduction to Aryn DocParse (formerly known as the Aryn Partitioning Service)](https:\u002F\u002Fwww.aryn.ai\u002F?name=ArynPartitioningService_Intro)\n\n## Get Started\n\nSycamore currently runs on Linux and macOS. To install, run:\n\n```pip install sycamore-ai```\n\nSycamore provides connectors to vector databases via Python extras. To install a connector, include it as an extra with your pip install. 
For example, \n\n```pip install sycamore-ai[duckdb]```\n\nSupported connectors include `duckdb`, `elasticsearch`, `opensearch`, `pinecone`, `qdrant`, and `weaviate`.\n\nTo use Aryn DocParse, [sign-up for free here](https:\u002F\u002Fwww.aryn.ai\u002Fget-started) and use the API key.\n\n## Resources\n\n- Documentation: https:\u002F\u002Fsycamore.readthedocs.io\n- Example notebook: https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fblob\u002Fmain\u002Fnotebooks\u002Fsycamore-tutorial-intermediate-etl.ipynb\n- Slack: https:\u002F\u002Fjoin.slack.com\u002Ft\u002Faryn-community\u002Fshared_invite\u002Fzt-36vhennsx-mN3UsqD6PT2vxVZxpqdHsw\n- Data preparation libraries (PyPi): https:\u002F\u002Fpypi.org\u002Fproject\u002Fsycamore-ai\u002F\n- Contact us: info@aryn.ai\n\n## Contributing\n\nCheck out our [Contributing Guide](https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fblob\u002Fmain\u002FCONTRIBUTING.md) for more information about how to contribute to Sycamore and set up your environment for development.\n\n## Contributors\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg alt=\"contributors\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Faryn-ai_sycamore_readme_969c1afe9b8b.png\"\u002F>\n\u003C\u002Fa>\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Faryn-ai_sycamore_readme_6845ce6b1ccd.png)](https:\u002F\u002Fstar-history.com\u002F#aryn-ai\u002Fsycamore&Date)\n\n\u003Cp align=\"right\" style=\"font-size: 14px; color: #555; margin-top: 20px;\">\n    \u003Ca href=\"#readme-top\" style=\"text-decoration: none; color: #007bff; font-weight: bold;\">\n        ↑ Back to Top ↑\n    \u003C\u002Fa>\n\u003C\u002Fp>\n","\u003Ca 
name=\"readme-top\">\u003C\u002Fa>\n![SycamoreLogoFinal.svg](https:\u002F\u002Fraw.githubusercontent.com\u002Faryn-ai\u002Fsycamore\u002Fmain\u002Fdocs\u002Fsource\u002Fimages\u002Fsycamore_logo.svg)\n\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fsycamore-ai)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fsycamore-ai\u002F)\n[![PyPI - Python Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fsycamore-ai)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fsycamore-ai\u002F)\n[![Slack](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fslack-sycamore-brightgreen.svg?logo=slack)](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Faryn-community\u002Fshared_invite\u002Fzt-36vhennsx-mN3UsqD6PT2vxVZxpqdHsw)\n[![Docs](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Faryn-ai_sycamore_readme_13d664e1afd7.png)](https:\u002F\u002Fsycamore.readthedocs.io\u002Fen\u002Fstable\u002F?badge=stable)\n![License](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Faryn-ai\u002Fsycamore)\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002Faryn-ai\u002Fsycamore)\n\nSycamore 是一款开源的、由 AI 驱动的文档处理引擎，适用于 ETL、RAG、基于 LLM 的应用以及非结构化数据的分析。Sycamore 能够对多种文档类型进行分区和增强，包括报告、演示文稿、文字记录、手册等。它还能分析并分块处理复杂的文档，例如包含表格、图表、图形和其他信息图的 PDF 和图像文件。请查看一个 [示例笔记本](https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fblob\u002Fmain\u002Fnotebooks\u002Fsycamore-tutorial-intermediate-etl.ipynb)。\n\n在文档处理方面，Sycamore 利用 [Aryn DocParse](https:\u002F\u002Fwww.aryn.ai\u002Fpost\u002Fannouncing-the-aryn-partitioning-service)（前身为 Aryn 分区服务），这是一个无服务器、基于 GPU 的 API，用于分割和标记文档、执行 OCR、提取表格和图像等。它采用了 Aryn 最先进的、[开源深度学习 DETR AI 模型](https:\u002F\u002Fhuggingface.co\u002FAryn\u002Fdeformable-detr-DocLayNet)，该模型在超过 8 万份企业文档上进行了训练，与替代系统相比，能够实现 6 倍更高的数据分块准确率，并使混合搜索或 RAG 的召回率提高 2 倍。您可以在 [这里免费注册](http:\u002F\u002Fwww.aryn.ai\u002Fget-started)，或者选择在本地运行 Aryn 分区器。\n\nAryn DocParse 接收 
[文档](https:\u002F\u002Fdocs.aryn.ai\u002Fdocparse\u002Fformats_supported)，并以 JSON 格式返回分区后的结果，而您可以使用 Sycamore 进行进一步的数据提取、增强、转换、清洗，并将其加载到下游数据库中。您可以选择用于这些转换的 LLM。\n\nSycamore 能可靠地将高质量的数据加载到您的向量数据库和混合搜索引擎中，包括 OpenSearch、ElasticSearch、Pinecone、DuckDB、Qdrant 和 Weaviate 等。\n\nSycamore 框架围绕一种名为 DocSet 的可扩展且稳健的文档处理抽象构建，包含强大的高级 Python 数据处理、增强和清洗转换功能。DocSet 还封装了可扩展的数据处理技术，从而消除了可靠加载数据块这一繁琐且重复性高的工作。DocSet 的函数式编程方法使您能够快速自定义和试验数据分块方式，以获得更优质的 RAG 结果。\n\n![Untitled](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Faryn-ai_sycamore_readme_481276fcbd59.png)\n\n## 特性\n\n- 与 [Aryn DocParse](https:\u002F\u002Fsycamore.readthedocs.io\u002Fen\u002Fstable\u002Faryn_cloud\u002Faryn_partitioning_service.html) 集成，使用 [最先进的视觉 AI 模型](https:\u002F\u002Fhuggingface.co\u002FAryn\u002Fdeformable-detr-DocLayNet) 进行分割，并保留文档的语义结构\n- DocSet 抽象层，可扩展且可靠地转换和操作非结构化文档\n- 高质量的表格提取、OCR、可视化摘要、基于 LLM 的 UDF 以及其他高效的 Python 数据转换\n- 可快速使用您选择的 AI 模型创建向量嵌入\n- 便捷的功能，如自动数据爬取程序（Amazon S3 和 HTTP）、用于编写和迭代作业的 Jupyter 笔记本，以及用于测试的 OpenSearch 混合搜索和 RAG 引擎\n- 可扩展的 [Ray](https:\u002F\u002Fgithub.com\u002Fray-project\u002Fray) 后端\n\n## 演示\n\n[关于 Aryn DocParse 的介绍（前身为 Aryn 分区服务）](https:\u002F\u002Fwww.aryn.ai\u002F?name=ArynPartitioningService_Intro)\n\n## 开始使用\n\nSycamore 目前支持 Linux 和 Mac OS。要安装，请运行：\n\n```pip install sycamore-ai```\n\nSycamore 通过 Python extras 提供与向量数据库的连接器。要安装连接器，只需在 pip 安装时将其作为 extra 包含即可。例如：\n\n```pip install sycamore-ai[duckdb]```\n\n支持的连接器包括 `duckdb`、`elasticsearch`、`opensearch`、`pinecone`、`qdrant` 和 `weaviate`。\n\n要使用 Aryn DocParse，请 [在此免费注册](https:\u002F\u002Fwww.aryn.ai\u002Fget-started)，并使用 API 密钥。\n\n## 资源\n\n- 文档：https:\u002F\u002Fsycamore.readthedocs.io\n- 示例笔记本：https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fblob\u002Fmain\u002Fnotebooks\u002Fsycamore-tutorial-intermediate-etl.ipynb\n- Slack：https:\u002F\u002Fjoin.slack.com\u002Ft\u002Faryn-community\u002Fshared_invite\u002Fzt-36vhennsx-mN3UsqD6PT2vxVZxpqdHsw\n- 
数据准备库（PyPi）：https:\u002F\u002Fpypi.org\u002Fproject\u002Fsycamore-ai\u002F\n- 联系我们：info@aryn.ai\n\n## 贡献\n\n请参阅我们的 [贡献指南](https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fblob\u002Fmain\u002FCONTRIBUTING.md)，了解更多关于如何为 Sycamore 做出贡献以及设置开发环境的信息。\n\n## 贡献者\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg alt=\"contributors\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Faryn-ai_sycamore_readme_969c1afe9b8b.png\"\u002F>\n\u003C\u002Fa>\n\n## 星标历史\n\n[![星标历史图表](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Faryn-ai_sycamore_readme_6845ce6b1ccd.png)](https:\u002F\u002Fstar-history.com\u002F#aryn-ai\u002Fsycamore&Date)\n\n\u003Cp align=\"right\" style=\"font-size: 14px; color: #555; margin-top: 20px;\">\n    \u003Ca href=\"#readme-top\" style=\"text-decoration: none; color: #007bff; font-weight: bold;\">\n        ↑ 返回顶部 ↑\n    \u003C\u002Fa>\n\u003C\u002Fp>","# Sycamore 快速上手指南\n\nSycamore 是一款开源的 AI 驱动文档处理引擎，专为非结构化数据的 ETL、RAG（检索增强生成）、LLM 应用及分析而设计。它能够智能分割和丰富各类文档（如 PDF、图片、报告等），提取表格、图表及文本内容，并高质量地加载至向量数据库。\n\n## 环境准备\n\n- **操作系统**：Linux 或 macOS\n- **Python 版本**：支持主流 Python 3.x 版本（具体兼容版本请参考 PyPI 页面）\n- **前置依赖**：\n  - `pip` 包管理工具\n  - （可选）若需使用特定向量数据库连接器，请确保对应数据库服务可用\n  - （可选）若使用 Aryn DocParse 云服务，需提前注册获取 API Key\n\n## 安装步骤\n\n1. **安装核心库**\n   ```bash\n   pip install sycamore-ai\n   ```\n\n2. **安装向量数据库连接器（按需选择）**\n   根据目标数据库安装对应的额外依赖，例如安装 DuckDB 连接器：\n   ```bash\n   pip install sycamore-ai[duckdb]\n   ```\n   支持的连接器包括：`duckdb`, `elasticsearch`, `opensearch`, `pinecone`, `qdrant`, `weaviate`。\n\n   > **提示**：国内用户如遇下载缓慢，可配置国内镜像源加速安装，例如使用清华源：\n   > ```bash\n   > pip install sycamore-ai -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n   > ```\n\n3. 
**配置 Aryn DocParse（可选但推荐）**\n   为了获得最佳的文档分割和 OCR 效果，建议注册 [Aryn 免费账号](https:\u002F\u002Fwww.aryn.ai\u002Fget-started) 获取 API Key，并在代码中配置使用。也可选择在本地运行分割器。\n\n## 基本使用\n\n以下是一个最简单的示例，展示如何使用 Sycamore 读取文档、进行分割处理并生成嵌入向量：\n\n```python\nimport sycamore\nfrom sycamore.data import Document\n\n# 初始化上下文 (可根据需要配置 Ray 后端等)\nctx = sycamore.Context()\n\n# 创建 DocSet 并加载文档 (以本地 PDF 为例)\ndoc_set = ctx.read.binary(paths=[\"path\u002Fto\u002Fyour\u002Fdocument.pdf\"], binary_format=\"pdf\")\n\n# 使用 Aryn DocParse 进行文档分割与 enrich (需配置 API_KEY)\n# doc_set = doc_set.partition(partitioner=ArynPartitioner(api_key=\"YOUR_ARYN_API_KEY\"))\n\n# 或者使用基础分割器\ndoc_set = doc_set.partition()\n\n# 将文档内容转换为向量嵌入 (需指定 embedding 模型)\n# doc_set = doc_set.embed(embedder=SentenceTransformerEmbedder(model_name=\"all-MiniLM-L6-v2\"))\n\n# 查看处理后的数据\ndocs = doc_set.take(5)\nfor doc in docs:\n    print(doc.text_representation)\n```\n\n> **说明**：\n> - `DocSet` 是 Sycamore 的核心抽象，用于链式调用各种数据处理操作。\n> - 实际生产中通常结合 `ray` 后端进行大规模分布式处理。\n> - 完整的高级 ETL 示例请参考官方 [Example Notebook](https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fblob\u002Fmain\u002Fnotebooks\u002Fsycamore-tutorial-intermediate-etl.ipynb)。","某金融合规团队需要从数千份包含复杂表格和图表的年度 PDF 财报中，快速提取关键风险数据并构建可检索的知识库。\n\n### 没有 sycamore 时\n- 传统 OCR 工具无法识别文档中的复杂表格结构，导致行列数据错乱，人工校对耗时极长。\n- 简单的文本切分策略破坏了段落语义连贯性，使得基于向量检索的问答系统经常返回断章取义的片段。\n- 处理图像内的图表和非结构化文本需要编写大量定制脚本，开发周期长达数周且难以维护。\n- 缺乏统一的文档抽象层，面对不同格式的财报（如扫描件与电子版）需要重复开发解析逻辑。\n\n### 使用 sycamore 后\n- 利用内置的 Aryn DocParse 和视觉 AI 模型，精准还原财报中的复杂表格与图表结构，数据提取准确率提升 6 倍。\n- 基于语义的智能分块技术自动保留上下文逻辑，显著优化了 RAG 系统的检索召回率，问答结果更加精准可靠。\n- 通过 Python 高级转换接口，仅需几行代码即可完成从 OCR、数据清洗到向量化嵌入的全流程，开发效率提升数倍。\n- 借助 DocSet 抽象概念，统一处理多种格式的文档流，轻松对接 OpenSearch 或 Pinecone 等向量数据库，实现规模化部署。\n\nsycamore 将非结构化文档处理从繁琐的手工定制转变为高效、精准的自动化流水线，让企业能真正挖掘沉睡文档中的数据价值。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Faryn-ai_sycamore_aebf9833.png","aryn-ai","Aryn, 
Inc.","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Faryn-ai_bcb897c9.png","",null,"https:\u002F\u002Fgithub.com\u002Faryn-ai",[81,85,89,93,97,101,105],{"name":82,"color":83,"percentage":84},"Python","#3572A5",75.8,{"name":86,"color":87,"percentage":88},"HTML","#e34c26",12.4,{"name":90,"color":91,"percentage":92},"Jupyter Notebook","#DA5B0B",10.3,{"name":94,"color":95,"percentage":96},"Shell","#89e051",1.1,{"name":98,"color":99,"percentage":100},"C++","#f34b7d",0.2,{"name":102,"color":103,"percentage":104},"Makefile","#427819",0.1,{"name":106,"color":107,"percentage":104},"Dockerfile","#384d54",598,67,"2026-04-19T17:54:49","Apache-2.0","Linux, macOS","可选。若使用云端 Aryn DocParse 服务则无需本地 GPU；若选择本地运行 Aryn Partitioner，需支持深度学习的 GPU（具体型号和显存未说明）。","未说明",{"notes":116,"python":117,"dependencies":118},"该工具核心功能依赖 Aryn DocParse 服务（默认通过 API 调用，需注册获取 API Key），也可选择本地部署。支持多种向量数据库连接器，需通过 pip extras 安装（如 sycamore-ai[duckdb]）。基于 Ray 构建可扩展后端。","3.8+",[119,120,121,122,123,124,125],"ray","duckdb","elasticsearch","opensearch","pinecone","qdrant","weaviate",[15,16,14,127,36,13],"其他",[129,130,131,132,133,134,135,122,136,137],"ai","dataprep","etl","information-retrieval","llm","ml","nlp","search","semantic-search","2026-03-27T02:49:30.150509","2026-04-20T19:33:11.129554",[141,146,151,156,161],{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},45584,"在 Windows 上导入 Sycamore 时遇到 'ModuleNotFoundError: No module named resource' 错误怎么办？","这是由于 `resource` 模块是 Unix 特有的，Windows 不支持。目前官方正在处理 Windows 兼容性问题。建议暂时不要在 Windows 上运行，或者关注官方后续发布的修复版本。如果必须使用，请等待官方合并相关修复或查看是否有新的 Windows 支持版本发布。","https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fissues\u002F960",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},45585,"导入 'ArynPartitioner' 时出现 ImportError 错误如何解决？","该问题通常是因为安装了错误的包或版本过旧。`ArynPartitioner` 是在 sycamore-ai 0.1.18 版本中引入的。请确保卸载名为 `sycamore` 的独立包（它与本项目无关），并安装或升级到最新版的 `sycamore-ai`（建议 0.1.19 或更高）。执行命令：`pip uninstall sycamore` 然后 `pip install -U 
sycamore-ai`。","https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fissues\u002F744",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},45586,"如何处理 PDF 文档解析时出现的 'Unable to merge' 或编码相关错误？","这通常是由于文本和二进制数据转换时的编码问题引起的。检查代码中涉及字符串转换的部分，确保显式指定 UTF-8 编码。例如，在将字符串转换为字节或反之亦然时，使用 `.encode(\"utf-8\")` 或 `.decode(\"utf-8\")`。特别是检查 `partition.py` 中类似第 51 行的代码，确保正确处理了编码格式。","https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fissues\u002F82",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},45587,"调用 SycamoreQueryClient.get_opensearch_schema 时遇到 ConnectionRefusedError 怎么办？","此错误表明客户端无法连接到 OpenSearch 服务。请检查以下几点：1. 确认 OpenSearch 服务是否正在运行；2. 检查连接地址和端口配置是否正确（默认通常是 localhost:9200）；3. 如果是远程服务，确认防火墙设置允许连接；4. 确认是否需要 HTTPS 以及证书配置是否正确。如果在本地运行，请确保已启动 OpenSearch 实例。","https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fissues\u002F1186",{"id":162,"question_zh":163,"answer_zh":164,"source_url":165},45588,"如何在生成嵌入之前将元数据增强到文本表示中以提升搜索效果？","可以通过自定义转换逻辑来实现。虽然目前可能没有直接的 `augment_text` API，但可以使用格式化字符串将文档属性（properties）拼接到原始文本前。例如，遍历文档列表，构建新的文本表示：`new_text = f\"This pertains to the part {doc.properties['part_name']}. 
{doc.text_representation}\"`。注意处理缺失属性的情况（使用 try\u002Fexcept 捕获 KeyError），建议在 `explode` 操作之后、嵌入生成之前执行此步骤。","https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fissues\u002F206",[167,172,177,182,187,192,197,202,207,212,217,222,227,232,237,242,247,252,257,262],{"id":168,"version":169,"summary_zh":170,"released_at":171},360464,"v0.1.29","本次 Sycamore 发布包含一些小的 bug 修复和功能增强。\n\n## 变更内容\n* 当没有表格结构时，使用单元格的标记边界框来代替单元格边界框，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1061 中实现。\n* 禁用在运行 KNN 查询时 OpenSearch 阅读器中的滚动功能。由 @austintlee 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1062 中实现。\n* 对 OCR 图像进行二值化处理以提升性能，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1063 中实现。\n* 修复了对于没有 `elem.table` 属性的表格元素的 `split_elements` 方法，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1064 中实现。\n* 修复了提取模式时返回为空的问题，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1067 中实现。\n* 将版本号升级至 v0.1.29，由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1068 中完成。\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fcompare\u002Fv0.1.28...v0.1.29","2024-12-09T22:52:07",{"id":173,"version":174,"summary_zh":175,"released_at":176},360460,"v0.1.33","本次 Sycamore 发布包含多项错误修复和改进。\n\n## 变更内容\n* 捡拾易得的 mypy 果实（Sycamore 版）。由 @alexaryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1265 中完成  \n* 各类代码层面的改进。由 @alexaryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1266 中完成  \n* 通过批处理某些操作来优化 OpenSearch 阅读器。由 @austin-aryn-ai 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1267 中完成  \n* 在查询中不包含 original_elements。由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1268 中完成  \n* 添加新的 OpenAI 模型，并回滚 LlmFilterPrompt 的更改。由 @austin-aryn-ai 在 
https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1271 中完成  \n* 为 materialize 添加元数据加载功能。由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1270 中完成  \n* 更新 httpcore 和 h11 以解决 Dependabot 问题。由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1274 中完成  \n* 将 os_client_args 放入默认命名空间，以避免意外情况。由 @alexaryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1275 中完成  \n* 添加对 Gemini 模型意外停止原因的调试信息。由 @eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1276 中完成  \n* 为 groupby 聚合结果总结分组名称。由 @bohou-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1261 中完成  \n* 将 torch 更新至 2.7.0，将 transformers 更新至 4.50.0。由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1279 中完成  \n* 添加 gpt-4.1 和 gpt-4.1-nano 模型。由 @Soeb-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1278 中完成  \n* 添加更完善的 view_pdf 函数。由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1277 中完成  \n* 对 materialize 文件名进行适度标准化。由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1283 中完成  \n* 增加对非聚类式 groupby 的支持。由 @bohou-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1281 中完成  \n* 在 LLM 提取实体时添加拆分实体的说明。由 @akarshgupta7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1286 中完成  \n* 添加用于展平实体的函数。由 @bohou-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1288 中完成  \n* 修复混合表格模型的回退逻辑，使其不再就地编辑 token。由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1290 中完成  \n* 基于文档集大小添加计算 K-means 聚类 K 值的启发式方法。由 @akarshgupta7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1287 中完成  \n* 修复非聚类式 groupby 路径中的错误。由 @bohou-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1291 中完成  \n* 将导入 Sycamore 
的速度提升约 10 倍；并添加用于计时导入的模块\u002F工具。由 @eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1292 中完成  \n* 改进 docset 文档，说明其为惰性加载机制。由 @eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1294 中完成  \n* 处理非聚类式聚合中的计数操作。由 @bohou-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1293 中完成  \n* 根据使用托管服务的测试结果进行的小幅修复。由 @eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1296 中完成  \n* 懒加载 tiktoken 分词器。由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1298 中完成  \n* 重构规划器，以添加查询预处理器功能。由 @HenryL27 在 h","2025-07-22T23:25:04",{"id":178,"version":179,"summary_zh":180,"released_at":181},360461,"v0.1.32","本次 Sycamore 发布包含多项错误修复和改进。\n\n## 变更内容\n* 由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1232 中为 aryn 写入器添加 autoschema 参数\n* 由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1234 中移除 aryn SDK 发布者工作流\n* 由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1230 中添加 extract_image_format 选项\n* 由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1238 中将 summaryDocument 反序列化为 SummaryDocument，而非 Document\n* 由 @dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1239 中修复摘要文档中无子文档的情况\n* 由 @dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1240 中修复 OS 文档重建读取问题\n* 由 @dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1210 中提出当同时提及 reconstruct_document 时优先使用 doc_reconstruct\n* 由 @eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1231 中添加对 easyocr 模型进行空气隔离的支持\n* 由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1243 中展开属性数据，而非引用\n* 由 @HenryL27 在 
https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1244 中提出若 deformable 失败则回退至 tatr\n* 由 @dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1245 中修复 MRR 的变量名，该变量因合并错误而损坏\n* 由 @dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1246 中向上游推送客户提示\n* 由 @bohou-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1241 中添加 materialize docset 的访问方法\n* 由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1247 中升级 Lint 依赖并重新运行 Lint\n* 由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1248 中添加 Claude 3.7 Sonnet\n* 由 @bohou-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1249 中将 QueryBookmark 重命名为 DataLoader\n* 由 @alexaryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1250 中提出 Embedder 现在是一个上下文管理器，可以释放资源\n* 由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1253 中为 llms 添加默认 llm_mode\n* 由 @alexaryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1254 中为我们的 OpenAI 和 OpenAIClientWrapper 类添加 close() 方法\n* 由 @dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1255 中修复 limit transform\n* 由 @bohou-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1251 中修复聚类测试的不稳定问题\n* 由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1258 中调整 llm 过滤提示，使其更加优化\n* 由 @alexaryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1256 中允许在 run_plan() 中覆盖 OpenSearch 的用户名和密码\n* 由 @austin-aryn-ai 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1257 中提出在读取完成后删除 PIT\n* 由 @austin-aryn-ai 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1259 中提出仅在引发 TypeError 时才检查序列化问题\n* 由 @HenryL27 在 
https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1262 中为 aryn 写入器提供更好的默认设置\n* 错误修复：缺失的 token 包含 4 个斜杠，由 @eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore 中修复","2025-04-21T20:02:38",{"id":183,"version":184,"summary_zh":185,"released_at":186},360462,"v0.1.31","本次 Sycamore 发布包含多项错误修复和改进。\n\n## 变更内容\n* 重构 llms 中的缓存 API，使 get 和 set API 具有对称性，由 @eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1108 中完成。\n* 修复需要预加载索引的 OpenSearch 测试，由 @austintlee 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1111 中完成。\n* 修复 conf.py 中的 source_directory 路径，由 @sravan1946 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F953 中完成。\n* 使 `lib\u002Fpoetry-lock\u002Fpoetry-lock-all.sh` 的失败提示更加明显，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1107 中完成。\n* aryn-opensearch-bedrock-rag-example.ipynb，由 @jonfritz 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1117 中完成。\n* 升级依赖项以修复安全问题，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1119 中完成。\n* [llm 统一 1\u002Fn] 添加整合后的 prompt 类，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1120 中完成。\n* 添加依赖项审查 Action，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1121 中完成。\n* 移除 Guidance，由 @Soeb-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1114 中完成。\n* 升级 PyPDF，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1124 中完成。\n* 将 anthropic API 密钥添加到测试工作流中，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1125 中完成。\n* 在 `aryn-sdk` 中添加对异步 DocParse 调用的支持，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1116 中完成。\n* 将 `aryn-sdk` 版本升级至 0.1.11，由 @MarkLindblad 在 
https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1127 中完成。\n* 添加 CodeQL 漏洞扫描，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1118 中完成。\n* 更新 fileformattools，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1133 中完成。\n* 从 LLMs 中捕获元数据，由 @Soeb-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1122 中完成。\n* 添加 Jupyter 工具（Finra 上游），由 @dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1135 中完成。\n* 修改 @context_params 的行为，使其仅传递显式参数。由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1136 中完成。\n* 将 OpenAI 升级至 ^1.60.2。由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1137 中完成。\n* 显式添加 tiktoken 并重新锁定依赖项。由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1138 中完成。\n* 将 HeaderAugmenterMerger 添加到文档中，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1139 中完成。\n* 为 rtd 再次添加 --no-root 参数，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1140 中完成。\n* 通过 `aryn-sdk` 改善异步 DocParse 的使用，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1134 中完成。\n* 将 `aryn-sdk` 版本升级至 0.1.12，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1141 中完成。\n* 修复自报告的 `aryn-sdk` 版本问题，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1142 中完成。\n* 将 `aryn-sdk` 版本升级至 0.1.12.post0，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1143 中完成。\n* 更新测试工作流，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1144 中完成。\n* [llm 统一 2\u002Fn] 实现 llm_map(_elements) 并移动 ext","2025-03-25T03:58:56",{"id":188,"version":189,"summary_zh":190,"released_at":191},360463,"v0.1.30","本次 Sycamore 发布包含多项错误修复和改进。\n\n## 变更内容\n* 在 
base_writer 中添加完整异常日志记录，由 @eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1069 中完成。\n* 修复 create_element 方法在处理无效元素类型时崩溃的问题，由 @eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1070 中完成。\n* 添加 docset.take_stream() 方法，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1071 中完成。\n* 对 `split_elements` 进行临时修复，以避免因某些表格元素导致递归深度超出限制，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1073 中完成。\n* 添加 TableMerger 用于合并元素文档，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1074 中完成。\n* 提高 `split_element` 的 `split_one` 方法的最大递归深度，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1075 中完成。\n* 合并元素与 LLM 过滤器，由 @dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1076 中完成。\n* 为 similarity 模块添加 GPU 支持，由 @austintlee 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F999 中完成。\n* 容忍实体抽取中的错误情况，由 @eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1078 中完成。\n* 移动可变形 DETR 的安全加载代码，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1055 中完成。\n* 允许通过函数重建 Doc 对象，由 @austintlee 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1072 中完成。\n* 为 LLM 实体抽取添加分词器和重排序功能，由 @dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1081 中完成。\n* 添加 Schema 对象及实体抽取支持，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1083 中完成。\n* 使 ttviz.cpp 能够再次编译，由 @alexaryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1082 中完成。\n* 在 OpenAI Embedder 中保留换行符，由 @dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1086 中完成。\n* 将默认嵌入模型更改为 OpenAI，由 @akarshgupta7 在 
https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1087 中完成。\n* 添加元素级别的嵌入功能，由 @dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1084 中完成。\n* 使 sycamore.query 能够使用 Schema 而不仅限于 OpenSearchSchema，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1088 中完成。\n* 添加混合表格提取器，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1089 中完成。\n* 添加 MapReduce 风格的摘要生成功能，用于处理大型文本的摘要任务，由 @austintlee 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1079 中完成。\n* 修复 max(nothing) 错误，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1091 中完成。\n* 延迟在嵌入器中初始化 OpenAI 客户端，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1092 中完成。\n* 修复 Windows 系统上的 materialize 功能，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1093 中完成。\n* 为 OpenSearch Writer 添加重试机制，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1085 中完成。\n* 实现属性抽取的类型转换，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1095 中完成。\n* 撤销过度的无根化操作，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1098 中完成。\n* 添加对 Anthropic LLM 的支持，由 @bsowell 在 https:\u002F\u002Fgithub.c 中完成。","2025-01-11T18:25:33",{"id":193,"version":194,"summary_zh":195,"released_at":196},360465,"v0.1.28","本次发布将 doc_ids 从 UUID 更新为 NanoIds，新增了一些文档标题功能，并提升了稳定性和性能。\n\n## 变更内容\n* 在 @Soeb-aryn 的贡献下，通过 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1023 添加了一次性提示以及多模态请求。\n* @mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1028 中修复了 query-ui 对 boto3 的依赖问题，并重新锁定依赖版本。\n* @mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1026 中更新了 NTSB 查询及 CIDR-25 论文的真值数据。\n* @mdwelsh 在 
https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1027 中为 query-server 添加了流式传输支持及测试。\n* @alexaryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1031 中在 MarkedMerger 的输出中提供了元素类型信息。\n* @mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1030 中修复了 SummarizeData，以确保下游的 .materialize 操作能够正常运行。\n* @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1034 中添加了 nanoid。\n* @akarshgupta7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1035 中移除了查询执行中的重复代码。\n* @alexaryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1032 中将 docids 从 UUID 转换为 NanoID。\n* @alexaryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1036 中在 file_scan 中使用 NanoIDs。\n* @Soeb-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1037 中提取了表格属性提示并修复了相关 bug。\n* @alexaryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1038 中将 DocIDs 转换为 Qdrant 和 Weaviate 所需的 UUID 格式，并编写了相应的单元测试。\n* @Soeb-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1033 中提出了基于章节标题获取标题的启发式方法。\n* @Soeb-aryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1041 中更新了 pdf_miner 类中的函数。\n* @akarshgupta7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1039 中添加了 ragas，用于计算字符串指标以进行评估。\n* @eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1040 中修复了 sort 函数，使其能够在未指定或为 None 的 default_value 下正常工作。\n* @akarshgupta7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1043 中向指标中添加了正确性分数。\n* @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1046 中改进了查询计划器。\n* @eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1045 中修复了 materialize，使其在 Ray 模式下能够容忍空输入目录。\n* @baitsguy 在 
https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1047 中进行了 PR 修复。\n* @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1048 中默认禁用了查询中的向量搜索重排序功能。\n* @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1049 中对向量搜索计划器的提示进行了更改。\n* @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1050 中使 OpenAIEmbedder 在客户端初始化后可序列化。\n* @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1051 中重命名了 ElasticSearch Notebook 中的 Embedding。\n* @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1053 中添加了可变形表格提取器。\n* 添加了一个用于线程局部变量的帮助程序，可用于向…添加元数据。","2024-12-05T23:24:27",{"id":198,"version":199,"summary_zh":200,"released_at":201},360466,"v0.1.27","本次 Sycamore 发布包含多项小的错误修复和改进。\n\n## 变更内容\n* 将 `aryn-sdk` 版本由 0.1.8 升级至 0.1.9，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1011 中完成  \n* 添加计划验证功能，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1001 中完成  \n* 如果存在分数属性，则按分数对检索文档进行排序，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1012 中完成  \n* 为 `summarize_data` 添加默认最大字符数 12 万，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1013 中完成  \n* 修复 Queryeval 文档集写入问题，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1014 中完成  \n* 添加 OpenSearch 示例的笔记本文件，由 @jonfritz 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1015 中完成  \n* 修复查询评估工具中的 NTSB 查询，由 @mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1016 中完成  \n* 将名称由 APS 更改为 DocParse，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1017 中完成  \n* 启用表格的 JSON 化功能，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1018 中完成  \n* 修复 `aryn-sdk` 中的 `convert_image_element` 示例，由 
@MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1019 中完成  \n* 修复 `aryn-sdk` 中的 DocParse 分块示例，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1021 中完成  \n* blacksmith.sh：将工作流迁移到 Blacksmith，由 @blacksmith-sh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1020 中完成  \n* 将单元测试恢复到 GitHub Actions，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1025 中完成  \n* 将版本号升级至 0.1.27，由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F1024 中完成  \n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fcompare\u002Fv0.1.26...v0.1.27","2024-11-14T19:46:54",{"id":203,"version":204,"summary_zh":205,"released_at":206},360467,"v0.1.26","本次发布包含多项稳定性和可靠性改进。\n\n## 变更内容\n* 跳过不稳定测试，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F956 中完成  \n* 修复 mypy 警告，由 @mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F947 中完成  \n* 解决 vcrpy 录制过程中出现的挂起问题，由 @alexaryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F950 中完成  \n* 对 LLM 计划器返回的计划进行后处理；修复 query-ui 的一些小问题，由 @amolvdeshpande 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F882 中完成  \n* 将 SDK 版本升级至 0.1.7，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F961 中完成  \n* 添加 HeaderAugmenterMerger，由 @dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F946 中完成  \n* 更新文档以反映 OpenAIPropertyExtractor 已更名为 LLMPropertyextractor，由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F964 中完成  \n* 对表格合并功能进行几处小幅修复和调整，由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F963 中完成  \n* 启用 query.summarize_data 中的 use_elements 参数，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F966 中完成  \n* 修复 Summarize 
Images 文档字符串中的语法错误，由 @jonfritz 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F967 中完成  \n* 在 MarkBreakByTokens 的文档字符串中添加缺失的 `tokenizer` 参数，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F969 中完成  \n* 添加大量连接器单元测试，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F957 中完成  \n* 添加 OCR 评估代码，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F685 中完成  \n* 修复查询标签检查问题，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F968 中完成  \n* 修复 SDK 阈值相关 Bug，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F970 中完成  \n* 在 OpenSearch 查询结果中为每个文档添加得分，由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F971 中完成  \n* 修复 HeaderAugmenterMerger，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F973 中完成  \n* 重构 `mark_bbox_preset`，使其功能可在 `DocSet` 外部调用，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F972 中完成  \n* 修复 `mark_bbox_preset` 的 `MarkDropHeaderFooter` 参数问题，由 @MarkLindblad 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F975 中完成  \n* 对 OpenSearch 进行改进，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F974 中完成  \n* 添加独立的安装说明页面，由 @AbhijitP-009 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F977 中完成  \n* 将 OCR\u002FPDFMiner 提取的文本标记与表格输出合并，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F976 中完成  \n* 使表格处理代码更加健壮，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F979 中完成  \n* 修复 align_headers 中的除零错误，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F978 中完成  \n* 允许在缓存查询执行时返回查询轨迹，由 @mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F959 中完成  \n* 在 SDK 
中添加增强表格选项，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F980 中完成  \n* 升级 SDK 版本，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F981 中完成  \n* 更新依赖锁文件，由 @karanataryn 完成","2024-11-08T00:02:00",{"id":208,"version":209,"summary_zh":210,"released_at":211},360468,"v0.1.25","本次 Sycamore 发布包含针对连接器及其他转换的大量错误修复。此外，还通过 Amazon Bedrock 添加了对 Anthropic LLM 的支持。\n\n## 变更内容\n* Sycamore Query 评估工具，由 @mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F912 中实现。\n* Luna 客户端本地模式（第二次尝试），由 @dtecuci 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F919 中实现。\n* 修复客户端中的一个小 bug，由 @mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F923 中完成。\n* 修复 DuckDB 拼写错误，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F924 中完成。\n* 将 OpenSearchSchema 改为规范的 Pydantic 模型，由 @mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F922 中实现。\n* 修复拼写错误，由 @Yashbhatt786 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F927 中完成。\n* 错误修复：DocumentSource 枚举序列化问题以及旧数据中缺失的 element_id，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F928 中完成。\n* 错误修复：移除 docset.rerank 和 Sycamore Query 代码生成中的 kwargs 参数，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F932 中完成。\n* 添加 Table Merger 功能，由 @dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F880 中实现。\n* 基本的 Bedrock LLM 客户端，由 @mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F931 中实现。\n* 在配置中接受查询计划示例，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F934 中完成。\n* 在 query-eval 中评估查询计划，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F936 中完成。\n* 为 json scan 和 json document scan 添加本地模式支持，由 @bohou-aryn 在 
https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F925 中完成。\n* 处理表格和单元格缺失的情况，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F938 中完成。\n* 在 Sycamore Query 客户端中支持 LLM 选择，由 @mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F935 中实现。\n* 修复 Crop To Bbox 错误，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F939 中完成。\n* 添加计划正确性指标汇总，并将 TopK 中的 K 设置为可选参数，由 @baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F940 中完成。\n* 不再使用 OpenAI 对空字符串进行嵌入，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F943 中完成。\n* 支持使用非 OpenAI LLM 进行 SummarizeImages 操作，由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F941 中完成。\n* 添加对标签和备注的支持，由 @mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F942 中完成。\n* 创建 LLMSchemaExtractor 和 LLMPropertyExtractor 类，由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F945 中完成。\n* 不在单元测试中运行嵌入式 Weaviate，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F951 中完成。\n* 修复章节标题中的空字符串问题，由 @HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F948 中完成。\n* 按页选择功能，由 @bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F937 中完成。\n* 修复笔记本测试，由 @eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F933 中完成。\n* 在单元测试中使用 pytest-xdist，由 @mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F952 中完成。\n* 更新 standardizer.py 文件，由 @jonfritz 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F944 中完成。\n* 修复 Unflattening Data 中的错误，由 @karanataryn 在 https:\u002F\u002Fgithub.com\u002Far","2024-10-18T23:23:31",{"id":213,"version":214,"summary_zh":215,"released_at":216},360469,"v0.1.24","本次 Sycamore 发布包含对 Weaviate 和 DuckDB 连接器以及多个示例笔记本的若干 bug 
修复。感谢 @Dnaynu 对 Sycamore 文档的贡献！\n\n## 变更内容\n* 修复读取器中的 asdict 方法。@HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F907 中完成。\n* 添加空表的文本表示。@dhruvkaliraman7 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F909 中完成。\n* 重构逻辑计划序列化。@mdwelsh 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F905 中完成。\n* 微小性能优化。@HenryL27 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F906 中完成。\n* Bug 修复：处理 OpenSearch 读取器在结果中无父文档时的文档重建问题。@baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F908 中完成。\n* 修复实体抽取中的 bug。@eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F911 中完成。\n* 增加从文件读取 Schema 的功能。@dtecuci 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F904 中完成。\n* 启用哈希上下文的复制。@alexaryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F910 中完成。\n* 添加从 pdfminer 中提取基于行的边界框的选项。@bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F874 中完成。\n* 支持本地模式下的随机采样。@bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F913 中完成。\n* 修复 OpenSearch 的 kwargs 问题。@baitsguy 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F914 中完成。\n* 修复 NTSB 示例中的拼写错误。@karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F917 中完成。\n* 更新 using_jupyter.md 文件。@jonfritz 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F902 中完成。\n* 文档：修正拼写错误。@Dnaynu 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F918 中完成。\n* 更新 DuckDB 读取器以适配包变更。@karanataryn 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F916 中完成。\n* 使 metadata-extraction.ipynb 脚本正常运行。@eric-anderson 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F915 中完成。\n* 将 Sycamore 版本升级至 0.1.24。@bsowell 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F921 
中完成。\n\n## 新贡献者\n* @Dnaynu 在 https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F918 中完成了首次贡献。\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fcompare\u002Fv0.1.23...v0.1.24","2024-10-14T19:52:52",{"id":218,"version":219,"summary_zh":220,"released_at":221},360470,"v0.1.23","This is a small release that fixes a bug in the Weaviate writer and includes a few other bug fixes and documentation improvements. \r\n\r\n## What's Changed\r\n* fix bug in weaviate writer causing api keys to be of wrong type by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F893\r\n* Expose local easyocr kwargs by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F894\r\n* Fix PDFMiner Output Parsing by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F890\r\n* Allow passing custom ocr object to arynpartitioner by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F895\r\n* Update Elasticsearch Port by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F896\r\n* Update Merger Parameters in Docs by @sohamkasar19 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F897\r\n* Fix Elasticsearch Docs by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F899\r\n* Cleanup Docs by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F900\r\n* Add smaller pdfminer bboxs to large detr bboxs by doing iob and not iou  by @dhruvkaliraman7 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F901\r\n* Fix anonymous reading in materialize and add rate limited logging. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F898\r\n* Bump version to v0.1.23. 
by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F903\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fcompare\u002Fv0.1.22...v0.1.23","2024-10-10T22:34:01",{"id":223,"version":224,"summary_zh":225,"released_at":226},360471,"v0.1.22","This Sycamore release includes support for Python 3.12, a connector for the Qdrant vector database, and many bug fixes and enhancements. Thanks to @Anush008 for contributing the Qdrant support! \r\n\r\n## What's Changed\r\n* bump sdk to 0.1.4 by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F823\r\n* Fix issue with empty tool response leading to hallucinations. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F818\r\n* Fix bug where prompt is modified by OpenAIEntityExtractor. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F824\r\n* Fix poetry.lock with missing dependency. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F825\r\n* Query trace viewer for Luna demo, and better PDF previews. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F828\r\n* Batch Processing Bug Fix by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F829\r\n* Get local mode working 1\u002Fn by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F826\r\n* Changing titles for some posts by @AbhijitP-009 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F827\r\n* Transform to convert Document into Markdown. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F811\r\n* Fix query trace viewer. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F830\r\n* Ingest more fields into OpenSearch schema for NTSB demo. 
by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F834\r\n* Fix bug with trace view. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F833\r\n* Improved sorting of elements by bbox for one and two columns. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F801\r\n* Make PDFMiner Pipelined by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F807\r\n* Fix error message on None value passed to DateTimeStandardizer. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F835\r\n* Sundry improvements while using luna in a customer. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F832\r\n* fix to pass string to tokenizer by @Soeb-aryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F831\r\n* Some improvements to query plans for Luna demo. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F836\r\n* Update requires_modules type annotations to work with mypy. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F837\r\n* Lazily Set Table Text Representation by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F839\r\n* Have Luna use .keyword field for path field. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F841\r\n* Add a simple logical query plan compare function by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F840\r\n* Improve luna property handling by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F842\r\n* Add support for Python 3.12. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F838\r\n* Fix Luna UI to show query plan operators. 
by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F847\r\n* bugfix to extract text summaries(dont just randomly assert) by @RitxmSaha in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F848\r\n* Ignore bad tables by @MarkLindblad in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F849\r\n* Add support for caching intermediate results of Luna queries. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F850\r\n* add read.opensearch(reconstruct_document =True) option by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F845\r\n* Fold in query-demo capability to query-ui. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F852\r\n* Define parallelism on nodes by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F853\r\n* Basic documentation for APS markdown option. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F854\r\n* Implement output_format in Aryn SDK partition_file(). by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F857\r\n* Add `local-inference` extra to `sycamore-ai` dependency in `apps\u002Fquery-ui`. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F859\r\n* Super basic FastAPI wrapper to Sycamore Query. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F855\r\n* Support output_format in ArynPartitioner. 
by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F858\r\n* Fix tile cannot extend outside image by @dhruvkaliraman7 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F856\r\n* Support Jupyter saving to S3 by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F860\r\n* Add PaddleOCR and Refactor Text Extraction by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F745\r\n* Fix broken test. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F863\r\n* Get Local Mode working 2\u002Fn by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F861\r\n* Remove package-mode by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F865\r\n* Add similarity scoring and rerank transform by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F864\r\n* adding docs for AssignDocProperties, Standardizer and ExtractTableProperties by @Soeb-aryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F866\r\n* Add newline before text elements. by @alexaryn","2024-10-09T19:54:27",{"id":228,"version":229,"summary_zh":230,"released_at":231},360472,"v0.1.21","This Sycamore release contains Aryn Partitioning Service client updates to support the new auto-threshold feature and add support for Microsoft Word (.doc and .docx) and Microsoft PowerPoint (.ppt and .pptx) files. It also contains a variety of bug fixes and stability improvements. \r\n\r\n## What's Changed\r\n* Fix Lib\u002FSycamore README by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F771\r\n* Allow custom SycamoreQueryClient in query-ui + cleanup by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F772\r\n* Sycamore changes to support new NTSB demo. 
by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F774\r\n* improving ExtractTableProperties and standardizer transforms by @Soeb-aryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F773\r\n* add materialize to transform toc by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F779\r\n* Fix Bugs in Sycamore Pipeline by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F777\r\n* New NTSB Luna demo UI. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F778\r\n* neo4j, refactor pipeline to not auto resolve entities + add support for images in pipeline. by @RitxmSaha in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F766\r\n* Fix issue with duplicate widget keys. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F780\r\n* Bug fixes in query path by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F781\r\n* Add querydemo to pyproject.toml. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F783\r\n* A few Luna demo fixes. 
by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F784\r\n* Bugfixes for context_vars by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F785\r\n* Fix Local Mode Read Bug by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F786\r\n* Make reorder_elements more like sorted() so we can use key= by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F787\r\n* Add new OpenSearch writer notebook by @jonfritz in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F788\r\n* Fix function signature reading in contextvars by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F789\r\n* Various Luna Demo fixes. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F790\r\n* Making changes to docs. Better titles etc. by @AbhijitP-009 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F793\r\n* Update our container support  by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F782\r\n* Add natural language result flag. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F794\r\n* Make Element Class More Robust by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F797\r\n* updated docs to explain the new default threshold setting for ArynPartitioner by @dtecuci in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F795\r\n* Add support for pushing query filters down to OpenSearch. 
by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F796\r\n* neo4j writer docs by @RitxmSaha in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F798\r\n* Docs for nms change (take 2) by @dtecuci in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F799\r\n* Fix TableTransformer Bug by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F800\r\n* Remove dead code. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F803\r\n* Verify .map can run parallel classes by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F802\r\n* bugfix to extract graph entities by @RitxmSaha in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F805\r\n* Update default threshold values for ArynPartitioner. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F804\r\n* Update type signatures for threshold in aryn_sdk. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F806\r\n* Change Bounding Box Validity Assertion by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F808\r\n* A few Luna demo tweaks. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F810\r\n* Couple of Luna demo bugfixes. 
by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F814\r\n* Add QueryVectorDatabase to SycamoreQuery by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F813\r\n* Add `.docx` documentation by @MarkLindblad in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F812\r\n* Ritam add example notebook by @RitxmSaha in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F815\r\n* query-ui: cosmetic changes by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F817\r\n* Improved NTSB ingestion pipeline for Luna demo. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F816\r\n* Bump version to v0.1.21. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F819\r\n* Reverts README change to restore poetry build. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F820\r\n* Fix typo scyamore -> sycamore. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F822\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fcompare\u002Fv0.1.20...v0.1.21","2024-09-18T18:10:25",{"id":233,"version":234,"summary_zh":235,"released_at":236},360473,"v0.1.20","This release refactors Sycamore’s dependencies to use extras, so that dependencies for connectors and local inference (e.g. creating vector embeddings) are pulled in only when needed. For example, to use the OpenSearch connector, you will need to: `pip install sycamore-ai[opensearch]`. To run a local vector embedding model, you will need to: `pip install sycamore-ai[local-inference]`. 
To do both, you will need to: `pip install sycamore-ai[opensearch,local-inference]`\r\n\r\nAlso, this release includes performance and stability improvements.\r\n\r\n## What's Changed\r\n* Dependencies 1\u002Fn: Remove need to restart colab runtime by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F728\r\n* Don't require installing neo4j unless it's used. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F733\r\n* Handle None cases for element.table = \u003C> by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F735\r\n* Fix materialize + S3 not working. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F734\r\n* Fixed neo4j relationship property loading + added support for loading lists and dictionaries as properties by @RitxmSaha in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F736\r\n* Handle non-hashable data types in opensearch schema extractor by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F737\r\n* docs: update README.md by @eltociear in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F739\r\n* Support concurrent libreoffice executions, fix bug to support s3 source paths in file_format_tools by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F741\r\n* Fix calls to structured outputs so that they can be cached by @RitxmSaha in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F738\r\n* fix 'SycamorePartitioner' error message by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F748\r\n* Fix context test by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F749\r\n* Enforce the constraint that each cell is only in one spanning cell. 
by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F754\r\n* Add context_params decorator to read args from Context by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F747\r\n* Remove unnecessary tracing code. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F752\r\n* Dependencies 2\u002F3: Move connectors to extras. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F740\r\n* Allow any pinecone error on create index by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F750\r\n* Allow all Exceptions while creating Connector Targets by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F753\r\n* Adding new ETL tutorial by @jonfritz in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F751\r\n* Add materialize to the ntsb loader for luna by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F742\r\n* Add Weaviate notebook by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F757\r\n* Update get_started.rst by @jonfritz in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F759\r\n* Update pinecone.md by @jonfritz in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F758\r\n* added new document structure + tests by @RitxmSaha in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F746\r\n* Dependencies 3\u002F3: Add partitioning extras. 
by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F755\r\n* Dependencies: Remove need to restart colab session for aryn-sdk by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F756\r\n* Default llm in transforms by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F760\r\n* Improve materialize by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F762\r\n* adding neo4j s3 proxy for aura db + split_calls flag for entity and relationship extractor. by @RitxmSaha in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F761\r\n* Fix show_pages in Google Colab. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F763\r\n* Jonfritz patch 3 tutorial by @jonfritz in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F764\r\n* Fix materialize to work even if it is re-executed on the same documents. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F765\r\n* add clear_materialize(path=\u003Csomething>) by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F767\r\n* Jonfritz patch 3 consoledocs by @jonfritz in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F768\r\n* Update docs with more info on dependencies. 
by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F769\r\n* bump sycamore version to 0.1.20 by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F770\r\n\r\n## New Contributors\r\n* @eltociear made their first contribution in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F739\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fcompare\u002Fv0.1.19...v0.1.20","2024-09-06T18:39:29",{"id":238,"version":239,"summary_zh":240,"released_at":241},360474,"v0.1.19","This release adds a materialize operation and enhanced query functionality, along with stability and performance improvements.\r\nIt also adds an experimental neo4j writer.\r\n\r\n## What's Changed\r\n* Add comment to MetadataDocument superclass call by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F607\r\n* Update Copyright Year by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F617\r\n* Merge elements in ntsb test ingest by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F619\r\n* Add github ref name to Helicone logs. 
by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F618\r\n* Add local (no-ray) execution mode to speed up lineage development by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F616\r\n* Integrate LLM Extract logic into Sycamore Transforms by @tranade in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F608\r\n* parse html tables better by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F621\r\n* Change Aryn-SDK Error Message by @karanataryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F622\r\n* Small update to `field_to_value` by @tranade in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F620\r\n* Jonfritz patch docsupdate by @jonfritz in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F624\r\n* Avoid repeat take_all in Eval Pipeline by @aanya-p in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F611\r\n* Add Evaluate as Transform by @Soeb-aryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F487\r\n* Checking in notebook that calls APS to analyze financial document (10k). by @AbhijitP-009 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F626\r\n* Refactor LogicalOperators to use pydantic. 
by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F610\r\n* Added Entity Extractor + HierarchicalDocument by @RitxmSaha in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F601\r\n* Rename SycamorePartitionerExample.ipynb to ArynPartitionerExample.ipynb by @jonfritz in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F628\r\n* Add LLMFilter as a DocSet Transform by @tranade in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F623\r\n* Create `count_distinct` for DocSet by @tranade in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F625\r\n* Jonfritz patch 3 update readme by @jonfritz in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F629\r\n* Update get_hash_context_file func by @pparmar30 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F603\r\n* Change PDFMiner cache to $HOME\u002F.sycamore\u002FPDFMinerCache. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F634\r\n* Add Context.config by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F627\r\n* Fixup git repo from accidental pushes via overrides by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F636\r\n* Include match and range filter functions by @tranade in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F630\r\n* uncap python version for aryn-sdk by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F638\r\n* Add support to materialize to write documents out to files. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F640\r\n* Refactor OpenSearchSchema to be more robust. 
by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F639\r\n* reading env variable as suggested and cosmetic changes by @Soeb-aryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F609\r\n* Fix code execution and trace display in Query UI by @tranade in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F646\r\n* Added OpenAI Async client by @RitxmSaha in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F632\r\n* A couple of small tweaks to make Sycamore more robust to missing or bogus data. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F649\r\n* Add generic traverse by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F648\r\n* Shift more operations by @tranade in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F631\r\n* Refactor Context and support args in Map by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F637\r\n* Changed OpenAI Cache integration test by @RitxmSaha in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F651\r\n* Run poetry-lock-all until the dependencies became consistent. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F652\r\n* Fix range filter problem by @tranade in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F654\r\n* Fix codegen syntax\u002Fformatting by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F655\r\n* fix table html parsing edge case by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F656\r\n* Code executor by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F657\r\n* Switch Luna tracing to use materialize. 
by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F653\r\n* Add documentation on output of Aryn Partitioning Service by @MarkLindblad in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F633\r\n* Revamp Sycamore Query demo UI. by @mdwelsh in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F659\r\n* Adding new docs for Aryn Partitioning Service. Added a gentle introduction to APS docs and rearranged some of the existing APS docs. by @AbhijitP-009 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F660\r\n* Bugfix for query dry-run mode by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpul","2024-08-27T19:02:53",{"id":243,"version":244,"summary_zh":245,"released_at":246},360475,"v0.1.18","This Sycamore release contains a variety of new features, including interfaces for reading from and writing to vector stores, with implementations for OpenSearch, DuckDB, Elasticsearch, Pinecone, and Weaviate. This release also contains performance enhancements, dependency upgrades, and bug fixes. \r\n\r\nThis release coincides with the launch of the Aryn Partitioning Service, which provides an endpoint for partitioning PDFs. This service is integrated with Sycamore and free to try at https:\u002F\u002Fwww.aryn.ai\u002Fget-started. 
\r\n\r\n## What's Changed\r\n* Provide better error messages on *Map mis-use by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F450\r\n* Run poetry lock in openai-proxy with poetry-lock-all.sh by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F447\r\n* use unstructured in weaviate IT by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F453\r\n* Allow disabling CUDA via env var by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F454\r\n* Lazily clean up temp files from DETR partitioner. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F452\r\n* fix pytest commands in contributing guide by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F441\r\n* Work around bad interaction between mypy and Python 3.9 by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F456\r\n* Enable metadata by default. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F442\r\n* Writer abstraction by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F451\r\n* Remove temporary file write in DETR partitioner. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F459\r\n* Demo UI feature for Manual Filters and Aggregations on input query by @sohamkasar19 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F460\r\n* convert opensearch writer to use base writer 1\u002F3 by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F461\r\n* Upgrade torch and Ray. 
by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F462\r\n* handle llm flakiness in convert_timestamp better by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F455\r\n* [Bug Fix] Demo UI pdf viewer by @sohamkasar19 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F464\r\n* convert weaviate writer to base db writer 2\u002F3 by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F465\r\n* Adding OpenAITokenizer to sycamore.functions by @Soeb-aryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F466\r\n* Batch detr inference by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F467\r\n* Add LogTime that logs time trace info via logging by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F469\r\n* Another approach to CUDA support in Docker, with less bloat. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F449\r\n* Choose MPS or CUDA automatically. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F458\r\n* Selectable sizes, autoscaling, better variable names. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F473\r\n* Fix show_pages to work with MetadataDocuments. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F471\r\n* Force ray down to 2.20.0. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F477\r\n* Instrumented more code with TimeTrace decorators. 
by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F474\r\n* Add opensearch reader by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F476\r\n* Batched sycamore pdf partitioner by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F478\r\n* Fix typo. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F482\r\n* Potential memory leak point by @bohou-aryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F483\r\n* Get integration tests working again. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F484\r\n* Make it possible to pass in a schema for property extraction. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F481\r\n* Switch batch at a time to True by default. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F485\r\n* convert pinecone writer to base writer 3\u002F3 by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F468\r\n* Add ArynPartitioner by @MarkLindblad in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F470\r\n* Remove old model server endpoint option by @MarkLindblad in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F488\r\n* Add DuckDB Writer by @karansampath in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F480\r\n* pinecone demo nb by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F486\r\n* Weaviate Scan by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F490\r\n* Add DuckDB Scan by @karansampath in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F492\r\n* Add filter_elements method on DocSet. 
by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F493\r\n* Support table deserialization from a dictionary. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F494\r\n* Add flatten option to weaviate writer by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F495\r\n* Add PDFMiner caching by @pparmar30 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F489\r\n* Use examples in few shot e","2024-07-30T15:11:45",{"id":248,"version":249,"summary_zh":250,"released_at":251},360476,"v0.1.17","This Sycamore release contains new writers to the Weaviate and Pinecone vector databases, enhancements to the demo UI, and numerous small features and bug fixes. \r\n\r\n## What's Changed\r\n* Add Sycamore Partitioner example notebook by @jonfritz in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F379\r\n* Various link updates and typo fixes in docs by @hsm207 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F381\r\n* Fix notebook link in docs. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F383\r\n* SummarizeImage example in the SycamorePartitionerExample notebook. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F382\r\n* Jonfritz patch 1 updated description by @jonfritz in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F384\r\n* Rename ...Request -> ...Call and ...Response -> ...Reply by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F385\r\n* Responsive Demo UI by @sohamkasar19 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F386\r\n* lineage 1\u002Fn: add support for metadata. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F387\r\n* Set table object to None when no table is found. 
by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F389\r\n* Fix integration tests. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F390\r\n* Add GPT-4o support. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F392\r\n* Updates to Demo UI by @sohamkasar19 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F391\r\n* Updates in demo ui for filtering by @sohamkasar19 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F394\r\n* ensure unique uuids post explode via sequence numbers by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F398\r\n* Fix model deserialization error by @bohou-aryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F395\r\n* Convert map.py transforms over to base map. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F399\r\n* Convert bbox_merge and mark_misc to new Map style by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F402\r\n* Add TimeTrace and instrument major pieces of code. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F388\r\n* Use proxy to provide default settings to the UI by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F404\r\n* Update CONTRIBUTING.md to note that integration tests are currently broken. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F403\r\n* FIX: Pdf viewer error by @sohamkasar19 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F406\r\n* Convert Merge from Ray Actor to Ray Task. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F401\r\n* Tools to look at TimeTrace output. 
by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F396\r\n* Switch Filter and Enbed over to BaseMapTransform by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F405\r\n* Convert classes over to new *Map classes 3\u002Fn by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F408\r\n* Explicitly enumerate notebooks to automatically test. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F335\r\n* Add timing for OpenSearch writer. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F407\r\n* Add weaviate writer by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F400\r\n* Switch from uuid1() to uuid4() in explode. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F410\r\n* Add remote model server support by @MarkLindblad in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F397\r\n* Refactor drawing code to support additional formats. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F409\r\n* Adjust assertion for batch_size resource_arg. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F411\r\n* Convert over to *Map 4\u002Fn by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F412\r\n* Fix OOM on CPU by reducing default batch size by @MarkLindblad in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F415\r\n* Update poetry lock files by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F414\r\n* Fix bug in runtests, it always detected changes. 
Add --force to force tests by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F413\r\n* Conversion to base map 5\u002Fn: Partition by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F416\r\n* Remove generate_map_class_from_callable -- *Map 6\u002Fn by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F417\r\n* Fix bug again.  Make JSON encoding work. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F418\r\n* TimeTrace: add RSS and improve usability by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F419\r\n* Fix split_and_convert_to_image when some pages have no elements. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F422\r\n* remove empty lists from documents in weaviate writer by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F421\r\n* Switch spread props & ndd to BaseMap -- *Map 7\u002Fn by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F425\r\n* Add show_pages pdf utility for visualizing pdf partitioning. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F424\r\n* TimeTrace: Fallback to RUSAGE_SELF when ","2024-06-06T18:21:55",{"id":253,"version":254,"summary_zh":255,"released_at":256},360477,"v0.1.16","This release contains support in the SycamorePartitioner for extracting table structure and images, as well as a new transform for summarizing images. It also includes a number of bug fixes and enhancements. \r\n\r\n## What's Changed\r\n* fix ui error when no title is extracted and we're not in ntsb setting by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F352\r\n* Fix almost all the pyproject.toml and poetry.lock files to have consistent requirements on python dependencies. 
by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F345\r\n* Bind mount to convey SSL cert\u002Fkey to Jupyter & UI by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F349\r\n* Use real SSL certificate for OpenSearch HTTP. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F353\r\n* copy lib\u002Fpoetry-lock into containers to make poetry happy by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F354\r\n* copy lib\u002Fpoetry-lock into remote-processor-service too. by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F355\r\n* copy in all of poetry-lock, not just the pyproject files by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F356\r\n* Update data model for table structure recognition. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F357\r\n* Put token-protected HTTPS proxy in front of UI proxy. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F359\r\n* Arxiv switched to HTTP for these PDFs; make it work. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F360\r\n* Add apt update to UI Dockerfiles. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F361\r\n* Use chown in our copy commands to make sure all files are owned by app by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F362\r\n* Add TableStructureExtractor interface and TableTransformer impl. 
by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F358\r\n* fix zsh path by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F367\r\n* Jupyter container improvements by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F369\r\n* Don't say localhost if it's not going to work. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F366\r\n* bump deploy timeout for reranking model from 60 to 120 by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F363\r\n* ingest all ntsb docs, automatically detect docker v not, spread path … by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F368\r\n* Fix typos in README by @hsm207 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F370\r\n* Fix default prep script when given an empty directory to import by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F371\r\n* fix typo by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F372\r\n* Add the ability to summarize images to partitioned docsets. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F365\r\n* Store element bbox as a tuple rather than BoundingBox. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F374\r\n* Jonfritz patch 1 partition update by @jonfritz in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F376\r\n* FIX: Error on initiate conversation without a conversation id  by @sohamkasar19 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F375\r\n* Add API docs for the SycamorePartitioner and table extraction. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F373\r\n* Fix malformed text from beautiful soup. 
by @bohou-aryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F351\r\n* Handle deserializing JSON documents when elements is None. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F377\r\n* Bump sycamore version to 0.1.16 by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F378\r\n\r\n## New Contributors\r\n* @hsm207 made their first contribution in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F370\r\n* @sohamkasar19 made their first contribution in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F375\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fcompare\u002Fv0.1.15...v0.1.16","2024-05-07T04:41:00",{"id":258,"version":259,"summary_zh":260,"released_at":261},360478,"v0.1.15","This release adds support for writing DocSets to JSONL files, as well as other incremental features and bug fixes. \r\n\r\n## What's Changed\r\n* Cache entire Amazon Textract response by @baitsguy in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F333\r\n* New query chosen in consultation with Mehul. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F336\r\n* Fix unit test mocking. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F338\r\n* Added ability to write JSONL block files. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F337\r\n* Fix bug in updating a single property and most workarounds for the bug. 
by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F341\r\n* Set RPS default version to follow VERSION again by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F342\r\n* Initial Container ITs by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F339\r\n* Force to opensearch V2.12.0.0 to make build work by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F343\r\n* minor fixups to NDD doc by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F346\r\n* Better container integration testing automation by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F344\r\n* Update NDD notebook with JSON\u002FPDF ingestion options. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F347\r\n* Bump sycamore version to v0.1.15 by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F348\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fcompare\u002Fv0.1.14...v0.1.15","2024-04-11T23:58:23",{"id":263,"version":264,"summary_zh":265,"released_at":266},360479,"v0.1.14","This release includes CPU support and OCR in the Sycamore Partitioner, caching for better performance and lower cost when using Textract for table extraction, an upgraded version of Ray (2.10), and more.\r\n\r\n## What's Changed\r\n* mark rps version as latest rc by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F291\r\n* Cleanup rewriting - cloning doesn't work by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F292\r\n* Fix integ test import error. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F293\r\n* Change notebook working directory when running outside container. 
by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F294\r\n* Fix bug in undocumented\u002Funtested prefix limiting feature. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F295\r\n* Implement CachedTextractTableExtractor by @bohou-aryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F288\r\n* Upgrade the openai Python library to 1.x and guidance to 0.1.x. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F242\r\n* Reorder partitioner output and fix model loading inefficiency by @bohou-aryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F277\r\n* Refactor sycamore to apps, lib by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F296\r\n* add averaged_perceptron_tagger to nltk downloads by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F301\r\n* fix jupyter bind mount path by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F302\r\n* Make sure filetype property is already set. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F298\r\n* initialize messages index on startup by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F303\r\n* Add demo UI by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F300\r\n* Address HTML viewer bug when doing sycamore_crawler_http_sort_all by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F304\r\n* Make SycamorePartitioner runnable on CPUs. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F299\r\n* Get all the containers building and working again. 
by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F305\r\n* Switch from Exception to RuntimeError by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F306\r\n* remove submodule steps from plugin checkout in dockerfile because sub… by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F309\r\n* Fix dockerfile to work post merge by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F310\r\n* Add some documentation for NDD: Sketcher at ingestion time. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F307\r\n* Add sketch() after explode() in all our default pipelines. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F312\r\n* Add remote processor service by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F311\r\n* use ADD instead of RUN git clone to checkout git repos by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F313\r\n* Change from nmslib to faiss everywhere. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F314\r\n* Add tesseract-ocr to container dependencies. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F316\r\n* compile docs with poetry by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F317\r\n* Add support for OCR in the Sycamore partitioner. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F315\r\n* Setup query-time NDD: pre-create RPS processors, add to pipelines by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F318\r\n* Changes needed for vanilla build of importer and RPS containers. 
by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F320\r\n* Add shingles to _source to enable query-time near duplicate detection by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F321\r\n* Fix importer to check for user, apply similar fix to crawlers by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F322\r\n* Remove obsolete files from the quickstart -> sycamore repo merge. by @eric-anderson in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F283\r\n* Upgrade to Ray 2.10.0. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F319\r\n* Upgrade guidance to 0.1.13. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F323\r\n* Remove mypy --explicit-package-bases flag and fix issues. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F324\r\n* Update poetry.lock files based on recent sycamore dependency changes. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F325\r\n* Deal with renamed file. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F329\r\n* Added -anon switch to S3 crawler for public buckets. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F327\r\n* add docs for RPS by @HenryL27 in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F328\r\n* Add Jupyter notebook to demonstrate query-time NDD. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F326\r\n* Expand NDD doc into separate file. by @alexaryn in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F330\r\n* Bump version to 0.1.14. by @bsowell in https:\u002F\u002Fgithub.com\u002Faryn-ai\u002Fsycamore\u002Fpull\u002F332\r\n* Add .profile to co","2024-04-02T19:38:56"]