[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-AuvaLab--itext2kg":3,"tool-AuvaLab--itext2kg":64},[4,18,28,36,44,52],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":24,"last_commit_at":25,"category_tags":26,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",159636,2,"2026-04-17T23:33:34",[14,13,27],"语言模型",{"id":29,"name":30,"github_repo":31,"description_zh":32,"stars":33,"difficulty_score":10,"last_commit_at":34,"category_tags":35,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 
的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[27,15,13,14],{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":24,"last_commit_at":42,"category_tags":43,"status":17},8553,"spec-kit","github\u002Fspec-kit","Spec Kit 是一款专为提升软件开发效率而设计的开源工具包，旨在帮助团队快速落地“规格驱动开发”（Spec-Driven Development）模式。传统开发中，需求文档往往与代码实现脱节，导致沟通成本高且结果不可控；而 Spec Kit 通过将规格说明书转化为可执行的指令，让 AI 直接依据明确的业务场景生成高质量代码，从而减少从零开始的随意编码，确保产出结果的可预测性。\n\n该工具特别适合希望利用 AI 辅助编程的开发者、技术负责人及初创团队。无论是启动全新项目还是在现有工程中引入规范化流程，用户只需通过简单的命令行操作，即可初始化项目并集成主流的 AI 编程助手。其核心技术亮点在于“规格即代码”的理念，支持社区扩展与预设模板，允许用户根据特定技术栈定制开发流程。此外，Spec Kit 强调官方维护的安全性，提供稳定的版本管理，帮助开发者在享受 AI 红利的同时，依然牢牢掌握架构设计的主动权，真正实现从“凭感觉写代码”到“按规格建系统”的转变。",88749,"2026-04-17T09:48:14",[27,15,13,14],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":24,"last_commit_at":50,"category_tags":51,"status":17},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 
提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[14,27],{"id":53,"name":54,"github_repo":55,"description_zh":56,"stars":57,"difficulty_score":24,"last_commit_at":58,"category_tags":59,"status":17},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85092,"2026-04-10T11:13:16",[15,16,60,61,13,62,27,14,63],"视频","插件","其他","音频",{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":75,"owner_avatar_url":76,"owner_bio":77,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":77,"owner_website":77,"owner_url":78,"languages":79,"stars":84,"forks":85,"last_commit_at":86,"license":87,"difficulty_score":24,"env_os":88,"env_gpu":88,"env_ram":88,"env_deps":89,"category_tags":94,"github_topics":95,"view_count":24,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":99,"updated_at":100,"faqs":101,"releases":136},8751,"AuvaLab\u002Fitext2kg","itext2kg","We build KGs the way nature builds matter","iText2KG（现已升级为 ATOM）是一款利用大语言模型从非结构化文本中构建和持续更新“时序知识图谱”的开源工具。传统方法往往忽视数据的时间动态性，且在处理长文本时容易遗漏关键事实或导致结果不稳定。ATOM 通过独特的“原子事实分解”技术，将文档拆解为最小、自包含的事实单元，有效解决了大模型在长上下文中的“遗忘效应”，显著提升了信息提取的完整性和多次运行的一致性。\n\n该工具的核心亮点在于其并行的三模块架构：首先拆分原子事实，接着并行提取包含时间跨度的五元组关系，最后利用基于距离度量的算法高效合并图谱。这一设计不仅实现了观察时间与事件时间的双重建模，避免了时间归属错误，更将处理延迟降低了 93% 以上，完美支持大规模动态数据的实时更新。\n\niText2KG 非常适合需要处理海量动态文本数据的 AI 研究人员、后端开发者以及知识图谱工程师使用。无论是构建随时间演变的行业知识库，还是研发需要精准时间推理的智能应用，它都能提供稳定、可扩展且无需特定领域微调的高效解决方案。","# 
ATOM: AdapTive and OptiMized Dynamic Temporal Knowledge Graph Construction Using LLMs\n\niText2KG is now ATOM. ATOM is a few-shot and scalable approach for building and continuously updating Temporal Knowledge Graphs (TKGs) from unstructured text.\n(The legacy iText2KG is still available in the repository; see its [README](.\u002FREADME_itext2kg.md).)\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_e22ec1c0eb12.png\" width=\"851px\" alt=\"ATOM Banner\">\n\u003C\u002Fp>\n\n![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fauvalab\u002Fitext2kg?style=social)\n![GitHub forks](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fauvalab\u002Fitext2kg?style=social)\n![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Fitext2kg)\n![Total Downloads](https:\u002F\u002Fimg.shields.io\u002Fpepy\u002Fdt\u002Fitext2kg)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-View-green?style=flat&logo=adobeacrobatreader)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.22590)\n![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fitext2kg)\n[![Demo](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-Available-blue)](.\u002Fexamples\u002F)\n![Status](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FStatus-Work%20in%20Progress-yellow)\n\n\u003Cp align=\"center\">\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_6b2de3ab64fe.png\" width=\"300\">\n    \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\".\u002Fdocs\u002Flogo_atom_black.png\" width=\"300\">\n    \u003Cimg alt=\"Logo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_6b2de3ab64fe.png\" width=\"300\">\n  \u003C\u002Fpicture>\n\u003C\u002Fp>\n\n## Overview\nTraditional static KG construction often overlooks the dynamic and 
time-sensitive nature of real-world data, limiting adaptability to continuous changes. Moreover, recent zero- or few-shot approaches that avoid domain-specific fine-tuning or reliance on prebuilt ontologies often suffer from instability across multiple runs, as well as incomplete coverage of key facts.\n\nATOM splits input documents into minimal, self-contained “atomic” facts, improving extraction exhaustivity and stability. From these atomic facts, atomic KGs are derived and then merged in parallel.\n\nIn a nutshell, ATOM addresses these limitations by:\n\n- ✅ **Improving exhaustivity**: Capturing comprehensive fact coverage from longer texts (~31% gain on factual exhaustivity, ~18% improvement in temporal exhaustivity)\n- ✅ **Ensuring stability**: Producing consistent TKGs across multiple runs (~17% improvement)\n- ✅ **Enabling scalability**: Supporting large-scale dynamic temporal updates through parallel architecture.\n\n## 🔥 News\n* [20\u002F10\u002F2025] ATOM - Major Enhancements:\n    -   **Complete Architectural Redesign**: ATOM now employs a three-module parallel pipeline for DTKG construction and updates.\n    -   **Atomic Fact Decomposition**: A new first module splits text into minimal \"atomic facts,\" addressing the \"forgetting effect\" where LLMs omit facts in longer contexts.\n    -   **Enhanced Exhaustivity and Stability**: The new architecture achieves significant gains: ~31% in factual exhaustivity, ~18% in temporal exhaustivity, and ~17% in stability.\n    -   **Dual-Time Modeling**: Implemented dual-time modeling (`t_obs` vs. `t_start`\u002F`t_end`) to prevent temporal misattribution in dynamic KGs.\n    -   **Parallel 5-Tuple Extraction**: Module-2 now directly extracts 5-tuples `(subject, predicate, object, t_start, t_end)` in parallel from atomic facts.\n    -   **Parallel Atomic Merge Architecture**: Module-3 uses an efficient, parallel pairwise merge algorithm, achieving 93.8% latency reduction vs. Graphiti and 95.3% vs. 
iText2KG.\n    -   **LLM-Independent Resolution**: Replaced slow LLM-based resolution with distance metrics (cosine similarity) for scalable, parallel merging.\n\n* [29\u002F07\u002F2025] iText2KG - New Features and Enhanced Capabilities:\n    -   **iText2KG_Star**: Introduced a simpler version that directly extracts relationships, eliminating the separate entity extraction step and reducing token consumption.\n    -   **Facts-Based KG Construction**: Enhanced the framework with facts-based KG construction using a Document Distiller.\n    -   **Dynamic Knowledge Graphs**: Added support for building dynamic KGs that evolve over time. See example: [Dynamic KG Construction](.\u002Fexamples\u002Fbuilding_dynamic_kg_openai_posts.ipynb). **NB: Temporal\u002Flogical conflicts resolution is not handled in this version.**\n\n* [19\u002F07\u002F2025] iText2KG - Major Performance and Reliability Updates:\n    -   **Asynchronous Architecture**: Migrated core methods to `async\u002Fawait` for non-blocking I\u002FO with LLM APIs.\n    -   **Logging System**: Implemented comprehensive logging to replace print statements.\n    -   **Enhanced Batch Processing**: Improved efficiency for handling multiple documents and LLM calls.\n    -   **Better Error Handling**: Added enhanced error handling and retry mechanisms.\n\n* [07\u002F10\u002F2024] iText2KG - Latest features:\n    -   Refactored code with data models for Entity, Relation, and KnowledgeGraph.\n    -   Entities are embedded using both name (0.6 weight) and label (0.4 weight) to differentiate concepts (e.g., Python:Language vs. 
Python:Snake).\n    -   Added a `max_tries` parameter to `build_graph` to handle LLM hallucinations.\n\n* [17\u002F09\u002F2024] iText2KG - Latest features:\n    -   Compatibility with all LangChain chat and embedding models.\n    -   The `build_graph` function can now expand existing graphs.\n    -   Compatible with Python 3.9+.\n\n* [16\u002F07\u002F2024] iText2KG - Addressed two major LLM hallucination issues:\n    -   Handled invented entities by replacing them with the most similar entity from the provided list.\n    -   Handled the \"forgetting effect\" (failing to assign relations) by re-prompting the LLM for missing entities.\n\n## Architecture\n\nATOM employs a three-module parallel pipeline that constructs and continuously updates DTKGs from unstructured text.\n\n**Module-1 (Atomic Fact Decomposition)** splits input documents `D_t` observed at time `t` into temporal atomic facts `{f_{t,1}, ..., f_{t,m_t}}` using LLM-based prompting, with an optimal chunk size of under 400 tokens. Each temporal atomic fact is a short, self-contained snippet that conveys exactly one piece of information.\n\n**Module-2 (Atomic TKGs Construction)** extracts 5-tuples (quintuples) in parallel from each atomic fact `f_{t,i}` to construct atomic temporal KGs `G^t_i`, and embeds their nodes and relations. It also performs temporal resolution during extraction: an end-validity fact is transformed into its affirmative counterpart, with only the `t_end` time modified (e.g., \"John Doe is no longer CEO of X on 01-01-2026\" → `(John_Doe, is_ceo, X, [.], [01-01-2026])`).\n\n**Module-3 (Parallel Atomic Merge)** employs a binary merge algorithm that merges pairs of atomic TKGs iteratively and in parallel until convergence, with three resolution phases: (1) entity resolution using exact match or a cosine similarity threshold `θ_E = 0.8`, (2) relation resolution, which merges relation names regardless of endpoints and timestamps using the threshold `θ_R = 0.7`, and (3) temporal resolution 
that merges observation and validity time sets for relations with similar `(e_s, r_p, e_o)`.\n\nThe resulting TKG snapshot `G^t_s` is then merged with the previous DTKG `G^{t-1}` to yield the updated DTKG: `G^t`.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_853061f48e52.png\" width=\"800px\" alt=\"ATOM Architecture\">\n\u003C\u002Fp>\n\n---\n## Example of the ATOM Workflow\n\nOn observation date 09-01-2007, ATOM processes the fact \"Steve Jobs was the CEO of Apple Inc. on January 9, 2007\" to create the 5-tuple `(Steve Jobs, is_ceo, Apple Inc., [09-01-2007], [.])` where `t_start = [09-01-2007]` and `t_end = [.]` (empty\u002Funknown).\n\nLater, on observation date 05-10-2011, ATOM processes the update \"Steve Jobs is no longer the CEO of Apple Inc. on 05-10-2011\". As described in **Module-2**, this **end validity fact** is transformed into its affirmative counterpart by modifying only the `t_end` time, producing `(Steve Jobs, is_ceo, Apple Inc., [.], [05-10-2011])`.\n\nDuring Module-3's temporal resolution phase, ATOM detects that both 5-tuples share the same `(e_s, r_p, e_o)` triple and merges their time lists to produce the final 5-tuple: `(Steve Jobs, is_ceo, Apple Inc., [09-01-2007], [05-10-2011])`. This correctly represents that Steve Jobs was CEO from January 9, 2007 to October 5, 2011, while maintaining dual-time modeling with `t_obs = [09-01-2007, 05-10-2011]` to track when each piece of information was observed.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_8601a427ecba.png\" width=\"800px\" alt=\"ATOM Workflow Diagram\">\n\u003C\u002Fp>\n\nFor more technical details, check out:\n-   **`atom\u002Fatom.py`**: Core logic for building, merging, and updating the knowledge graphs.\n\n---\n\n## Latency & Scalability\n\nATOM achieves significant latency reduction (93.8% vs. Graphiti, 95.3% vs. 
iText2KG) by replacing serial bottlenecks with a fully parallel architecture.\n\nKey architectural advantages include:\n\n1.  **Parallel 5-Tuple Extraction**: ATOM extracts 5-tuples in a single, parallelized step. This avoids the separate entity and relation extraction steps used by iText2KG and Graphiti, which double LLM calls and increase latency.\n2.  **LLM-Independent Merging**: The framework uses efficient distance metrics (cosine similarity) for entity\u002Frelation resolution. This avoids the computational bottlenecks of LLM-based resolution (used by Graphiti) and allows true parallelization as the graph scales.\n3.  **Parallel Atomic Merge**: Atomic TKGs are merged using an iterative pairwise algorithm, which runs in parallel (e.g., 8 threads with a batch size of 40).\n4.  **Early Temporal Resolution**: Temporal logic is handled during the extraction phase (Module-2), not during the merge phase.\n\nAs a result, the parallel merge process (Module-3) accounts for only 13% of ATOM's total latency. The remainder is attributed to API calls, which can be further minimized by increasing the batch size or scaling local LLM hardware.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_95d3eecc0a64.png\" width=\"800px\" alt=\"Latency Comparison\">\n\u003C\u002Fp>\n\n---\n\n## Example: Temporal Modeling (ATOM vs. Graphiti)\n\nThe following figure demonstrates the difference between ATOM's and Graphiti's temporal modeling using COVID-19 news from 09-01-2020 to 23-01-2020. 
For ATOM, timestamps are encoded in UNIX format to avoid the overhead of string parsing and timezone conversion.\n\nFor the fact \"The mysterious respiratory virus spread to at least 10 other countries\" observed on 23-01-2020, **Graphiti** treats the observation time as the validity start time (t_start), setting `valid_at = 23-01-2020` and implying the spread occurred on that specific date.\n\nIn contrast, **ATOM's** dual-time modeling keeps the observation time (`t_obs = 23-01-2020`) separate from the validity period. The article was published on 23-01-2020, but that does not mean the spread occurred on that date; it could have happened days or weeks earlier. This distinction is essential for temporal reasoning: Graphiti would infer that every event in a news article happened on the publication date, while ATOM models when information was observed separately from when events actually occurred, preventing temporal misattribution.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_d3deffa176cc.png\" width=\"800px\" alt=\"Temporal modeling: ATOM vs. Graphiti (COVID-19 news)\">\n\u003C\u002Fp>\n\n\nThe figure below compares temporal resolution in ATOM and Graphiti. Two atomic facts observed on January 28, 2020, report death counts as of January 24 (26 deaths) and January 27 (at least 80 deaths). Left (ATOM): temporal resolution detects the similar relations and extends their validity-period history (t_end in the figure). Right (Graphiti): a separate relation is created for each atomic fact, resulting in duplication. 
Moreover, Graphiti misinterprets \"By January 24, 2020\" and \"By January 27, 2020\" as validity start times rather than validity end times, leading to temporal misattribution.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_c64c5fd19417.png\" width=\"800px\" alt=\"Temporal resolution: ATOM vs. Graphiti\">\n\u003C\u002Fp>\n\n\n## Installation\n```bash\npip install --upgrade itext2kg\n```\n\n## ATOM\n\n### LLM Compatibility\n\nATOM is compatible with all language models supported by LangChain. To use ATOM, you need both a chat model and an embeddings model. For available chat models, see: https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fintegrations\u002Fchat\u002F. For embedding models, see: https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fintegrations\u002Ftext_embedding\u002F\n\nBefore use, install the LangChain integration package for each model provider you choose (e.g., `langchain-openai` for OpenAI models).\n\n### ATOM Arguments\n\n**Initialization:**\n- `llm_model`: A LangChain chat model instance for extracting relationships from text\n- `embeddings_model`: A LangChain embeddings model instance for computing semantic similarities\n\n**`build_graph` function:**\n(This function can also be used to build static KGs: fix an arbitrary observation time and pass your atomic facts.)\n\n- `atomic_facts` (List[str]): A list of atomic facts (short, self-contained text snippets) to process\n- `obs_timestamp` (str): The observation timestamp when the atomic facts were collected\n- `existing_knowledge_graph` (KnowledgeGraph, optional): An existing knowledge graph to merge with the new one\n- `ent_threshold` (float, default=0.8): Similarity threshold for entity resolution during merging\n- `rel_threshold` (float, default=0.7): Similarity threshold for relationship resolution during merging\n- `entity_name_weight` (float, default=0.8): Weight for entity name in similarity calculations\n- 
`entity_label_weight` (float, default=0.2): Weight for entity label in similarity calculations\n- `max_workers` (int, default=8): Maximum number of parallel workers for processing\n\n**`build_graph_from_different_obs_times` function:**\n- `atomic_facts_with_obs_timestamps` (dict): A dictionary where keys are observation timestamps (str) and values are lists of atomic facts for each timestamp\n- `existing_knowledge_graph` (KnowledgeGraph, optional): An existing knowledge graph to merge with the new ones\n- `ent_threshold` (float, default=0.8): Similarity threshold for entity resolution during merging\n- `rel_threshold` (float, default=0.7): Similarity threshold for relationship resolution during merging\n- `entity_name_weight` (float, default=0.8): Weight for entity name in similarity calculations\n- `entity_label_weight` (float, default=0.2): Weight for entity label in similarity calculations\n- `max_workers` (int, default=8): Maximum number of parallel workers for processing\n\n## Example: Building a TKG from Text\n\nThe following basic example shows how to use ATOM to build a dynamic TKG from the atomic facts of the 2020-COVID-NYT dataset. 
\n\n⚠️ Performance Note: For best performance, run ATOM from a dedicated Python script rather than a Jupyter notebook. In notebook environments, event loop conflicts and thread contention can significantly slow down ATOM's parallel processing.\n\nMore complex examples are coming soon.\n\n---\n\n```python\nimport pandas as pd\nimport asyncio\nimport ast\nfrom typing import Optional\n\n# Import LLM and Embeddings models using LangChain wrappers\nfrom langchain_openai import ChatOpenAI, OpenAIEmbeddings\nfrom itext2kg.atom import Atom\nfrom itext2kg import Neo4jStorage\n\n# Set up the OpenAI LLM and embeddings models (replace the placeholder with your API key)\nopenai_api_key = \"YOUR_OPENAI_API_KEY\"\nopenai_llm_model = ChatOpenAI(\n    api_key=openai_api_key,\n    model=\"gpt-4.1-2025-04-14\",\n    temperature=0,\n    max_tokens=None,\n    timeout=None,\n    max_retries=2,\n)\n\nopenai_embeddings_model = OpenAIEmbeddings(\n    api_key=openai_api_key,\n    model=\"text-embedding-3-large\",\n)\n\n# Load the 2020-COVID-NYT dataset pickle\nnews_covid = pd.read_pickle(\"..\u002Fdatasets\u002Fatom\u002Fnyt_news\u002F2020_nyt_COVID_last_version_ready.pkl\")\n\n# Define a helper function to convert the dataframe's atomic facts into a dictionary,\n# where keys are observation dates and values are the combined list of atomic facts for that date.\ndef to_dictionary(df: pd.DataFrame, max_elements: Optional[int] = 20):\n    # Facts may be stored as stringified lists; parse them into Python lists\n    if isinstance(df[\"factoids_g_truth\"].iloc[0], str):\n        df[\"factoids_g_truth\"] = df[\"factoids_g_truth\"].apply(ast.literal_eval)\n    # Concatenate the fact lists of each date and keep the first `max_elements` dates\n    grouped_df = df.groupby(\"date\")[\"factoids_g_truth\"].sum().reset_index()[:max_elements]\n    return {\n        str(date): factoids for date, factoids in grouped_df.set_index(\"date\")[\"factoids_g_truth\"].to_dict().items()\n    }\n\n# Convert the dataframe into the required dictionary format\nnews_covid_dict = to_dictionary(news_covid)\n\n# Initialize the ATOM pipeline with the OpenAI models\natom = 
Atom(llm_model=openai_llm_model, embeddings_model=openai_embeddings_model)\n\n# Build the knowledge graph across different observation timestamps.\n# The method is a coroutine: asyncio.run drives it from a script (in a notebook, `await` it directly).\nkg = asyncio.run(\n    atom.build_graph_from_different_obs_times(\n        atomic_facts_with_obs_timestamps=news_covid_dict,\n    )\n)\n\n# Visualize the resulting knowledge graph using Neo4j\nURI = \"bolt:\u002F\u002Flocalhost:7687\"\nUSERNAME = \"neo4j\"\nPASSWORD = \"##\"\nNeo4jStorage(uri=URI, username=USERNAME, password=PASSWORD).visualize_graph(knowledge_graph=kg)\n```\n\n## Evaluation Scripts, Dataset and Prompts\nThe 2020 COVID-19 dynamic temporal dataset is located in the folder `.\u002Fdatasets\u002Fatom` (it is also available on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Flairgiyassir\u002F2020-COVID-NYT)). The scripts to reproduce the results are in the folder `.\u002Fevaluation`, and the prompts are in the folder `.\u002Fitext2kg\u002Fatom\u002Fmodels\u002Fschemas\u002F`.\n\n## Public Collaboration\nWe welcome contributions from the community to improve ATOM.\n\n## Citation\n```bibtex\n@article{lairgi2024atom,\n  title={ATOM: AdapTive and OptiMized dynamic temporal knowledge graph construction using LLMs},\n  author={Lairgi, Yassir and Moncla, Ludovic and Benabdeslem, Khalid and Cazabet, R{\\'e}my and Cl{\\'e}au, Pierre},\n  journal={arXiv preprint arXiv:2510.22590},\n  year={2025},\n  url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.22590},\n  eprint={2510.22590},\n  archivePrefix={arXiv},\n  primaryClass={cs.AI}\n}\n```","# ATOM：基于大语言模型的自适应优化动态时序知识图谱构建\n\niText2KG 现已更名为 ATOM。ATOM 是一种少样本、可扩展的方法，用于从非结构化文本中构建并持续更新时序知识图谱（TKG）。\n（我们保留了仓库中的旧版 iText2KG，请参阅 [README](.\u002FREADME_itext2kg.md)。）\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_e22ec1c0eb12.png\" width=\"851px\" alt=\"ATOM Banner\">\n\u003C\u002Fp>\n\n![GitHub 
星标](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fauvalab\u002Fitext2kg?style=social)\n![GitHub 分支](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fauvalab\u002Fitext2kg?style=social)\n![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Fitext2kg)\n![总下载量](https:\u002F\u002Fimg.shields.io\u002Fpepy\u002Fdt\u002Fitext2kg)\n[![论文](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-View-green?style=flat&logo=adobeacrobatreader)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.22590)\n![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fitext2kg)\n[![演示](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-Available-blue)](.\u002Fexamples\u002F)\n![状态](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FStatus-Work%20in%20Progress-yellow)\n\n\u003Cp align=\"center\">\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_6b2de3ab64fe.png\" width=\"300\">\n    \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\".\u002Fdocs\u002Flogo_atom_black.png\" width=\"300\">\n    \u003Cimg alt=\"Logo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_6b2de3ab64fe.png\" width=\"300\">\n  \u003C\u002Fpicture>\n\u003C\u002Fp>\n\n## 概述\n传统的静态知识图谱构建方法往往忽视了现实世界数据的动态性和时间敏感性，从而限制了其对持续变化的适应能力。此外，近年来出现的一些零样本或少样本方法虽然避免了特定领域的微调或对预构建本体的依赖，但通常存在多次运行结果不稳定以及关键事实覆盖不全的问题。\n\nATOM 将输入文档分解为最小的、自包含的“原子”事实，从而提高了抽取的全面性和稳定性。随后，基于这些原子事实生成原子级知识图谱，并以并行方式将其合并。\n\n简而言之，ATOM 通过以下方式解决了上述局限性：\n\n- ✅ **提升全面性**：从较长文本中捕获更全面的事实覆盖（事实全面性提升约 31%，时间维度全面性提升约 18%）\n- ✅ **确保稳定性**：在多次运行中生成一致的时序知识图谱（稳定性提升约 17%）\n- ✅ **实现可扩展性**：通过并行架构支持大规模的动态时序更新。\n\n## 🔥 最新消息\n* [2025年10月20日] ATOM - 重大升级：\n    -   **全新架构设计**：ATOM 现采用三模块并行流水线进行 DTKG 的构建与更新。\n    -   **原子事实分解**：新增的第一模块将文本拆分为最小的“原子事实”，有效解决了 LLM 在长上下文中容易遗漏事实的“遗忘效应”问题。\n    -   **增强全面性与稳定性**：新架构显著提升了性能：事实全面性提升约 31%，时间维度全面性提升约 18%，稳定性提升约 17%。\n    -   **双时间建模**：实现了 `t_obs` 与 
`t_start`\u002F`t_end` 的双时间建模，以防止动态知识图谱中的时间归属错误。\n    -   **并行五元组抽取**：第二模块直接从原子事实中并行抽取 `(subject, predicate, object, t_start, t_end)` 五元组。\n    -   **并行原子合并架构**：第三模块采用高效的并行两两合并算法，相较于 Graphiti 延迟降低了 93.8%，相较于 iText2KG 则降低了 95.3%。\n    -   **LLM 无关解析**：用距离度量（余弦相似度）替代了耗时的 LLM 解析，实现了可扩展的并行合并。\n\n* [2025年7月29日] iText2KG - 新功能与增强能力：\n    -   **iText2KG_Star**：推出了一种更简单的版本，可直接抽取关系，省去了单独的实体抽取步骤，从而减少了 token 消耗。\n    -   **基于事实的知识图谱构建**：通过文档蒸馏器增强了基于事实的知识图谱构建框架。\n    -   **动态知识图谱**：新增了支持构建随时间演化的动态知识图谱的功能。示例请参见：[动态知识图谱构建](.\u002Fexamples\u002Fbuilding_dynamic_kg_openai_posts.ipynb)。**注意：该版本尚未处理时间或逻辑冲突。**\n\n* [2025年7月19日] iText2KG - 重大性能与可靠性更新：\n    -   **异步架构**：核心方法已迁移到 `async\u002Fawait`，以实现与 LLM API 的非阻塞 I\u002FO。\n    -   **日志系统**：引入了全面的日志记录机制，取代了原有的 print 语句。\n    -   **增强的批处理能力**：提高了处理多篇文档和多次 LLM 调用的效率。\n    -   **更好的错误处理**：增加了增强的错误处理和重试机制。\n\n* [2024年10月7日] iText2KG - 最新功能：\n    -   重构了代码，引入了实体、关系和知识图谱的数据模型。\n    -   实体嵌入同时考虑名称（权重 0.6）和标签（权重 0.4），以区分不同概念（例如，Python:Language 与 Python:Snake）。\n    -   在 `build_graph` 函数中新增了 `max_tries` 参数，用于应对 LLM 的幻觉问题。\n\n* [2024年9月17日] iText2KG - 最新功能：\n    -   兼容所有 LangChain 对话和嵌入模型。\n    -   `build_graph` 函数现在可以扩展现有图谱。\n    -   兼容 Python 3.9 及以上版本。\n\n* [2024年7月16日] iText2KG - 解决了两个主要的 LLM 幻觉问题：\n    -   对于虚构实体，用提供的实体列表中与其最相似的实体进行替换。\n    -   针对“遗忘效应”（未能分配关系），通过重新提示 LLM 来补全缺失的实体。\n\n## 架构\n\nATOM 采用三模块并行流水线，从非结构化文本中构建并持续更新动态时间知识图谱（DTKG）。\n\n**模块-1（原子事实分解）** 使用基于大语言模型的提示技术，以不超过400个标记的最佳分块大小，将时间 `t` 时观测到的输入文档 `D_t` 分解为时间原子事实 `{f_{t,1}, ..., f_{t,m_t}}`。每个时间原子事实都是一个简短、自包含的片段，仅传达一条信息。\n\n**模块-2（原子 TKG 构建）** 并行地从每个原子事实 `f_{t,i}` 中提取五元组（quintuples），以构建原子时间知识图谱 `G^t_i`；同时对节点和关系进行嵌入，并在提取过程中通过将结束有效性事实转换为肯定形式来处理时间解析问题，仅修改 `t_end` 时间（例如，“约翰·多伊于2026年1月1日不再担任X公司的首席执行官” → `(John_Doe, is_ceo, X, [.], [01-01-2026])`）。\n\n**模块-3（并行原子合并）** 采用二叉合并算法，通过迭代式成对并行合并原子 TKG，直至收敛。该过程包含三个解析阶段：(1) 实体解析，使用精确匹配或余弦相似度阈值 `θ_E = 0.8`；(2) 关系解析，忽略端点和时间戳合并关系名称，使用阈值 `θ_R = 0.7`；(3) 时间解析，对于具有相似 `(e_s, r_p, e_o)` 的关系，合并其观测时间和有效时间集合。\n\n最终得到的时间知识图谱快照 `G^t_s` 会与先前的 DTKG 
`G^{t-1}` 合并，从而生成更新后的 DTKG：`G^t`。\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_853061f48e52.png\" width=\"800px\" alt=\"ATOM 架构\">\n\u003C\u002Fp>\n\n---\n## ATOM 工作流示例\n\n在观测日期 2007年1月9日，ATOM 处理事实“史蒂夫·乔布斯于2007年1月9日担任苹果公司首席执行官”，创建五元组 `(Steve Jobs, is_ceo, Apple Inc., [09-01-2007], [.])`，其中 `t_start = [09-01-2007]`，`t_end = [.]`（空\u002F未知）。\n\n随后，在观测日期 2011年10月5日，ATOM 处理更新：“史蒂夫·乔布斯于2011年10月5日不再担任苹果公司首席执行官”。如 **模块-2** 所述，这一 **结束有效性事实** 被转换为其肯定形式，仅修改 `t_end` 时间，生成 `(Steve Jobs, is_ceo, Apple Inc., [.], [05-10-2011])`。\n\n在模块-3 的时间解析阶段，ATOM 检测到这两个五元组共享相同的 `(e_s, r_p, e_o)` 三元组，并合并它们的时间列表，最终生成五元组：`(Steve Jobs, is_ceo, Apple Inc., [09-01-2007], [05-10-2011])`。这正确地表示史蒂夫·乔布斯自2007年1月9日至2011年10月5日担任首席执行官，同时保持双时间建模，即 `t_obs = [09-01-2007, 05-10-2011]`，以追踪每条信息的观测时间。\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_8601a427ecba.png\" width=\"800px\" alt=\"ATOM 工作流示意图\">\n\u003C\u002Fp>\n\n更多技术细节，请参阅：\n-   **`atom\u002Fatom.py`**: 构建、合并和更新知识图谱的核心逻辑。\n\n---\n\n## 延迟与可扩展性\n\nATOM 通过用完全并行架构取代串行瓶颈，实现了显著的延迟降低（相比 Graphiti 降低93.8%，相比 iText2KG 降低95.3%）。\n\n关键的架构优势包括：\n\n1.  **并行五元组提取**：ATOM 在单个并行步骤中提取五元组。这避免了 iText2KG 和 Graphiti 采用的分别提取实体和关系的步骤，后者会增加两倍的大语言模型调用次数并延长延迟。\n2.  **独立于大语言模型的合并**：该框架使用高效的距离度量（余弦相似度）进行实体\u002F关系解析。这避免了基于大语言模型的解析所带来的计算瓶颈（Graphiti 采用的方式），并允许在图规模扩大时实现真正的并行化。\n3.  **并行原子合并**：原子 TKG 通过迭代式成对算法合并，且该过程可并行运行（例如，8个线程，批大小为40）。\n4.  **早期时间解析**：时间逻辑在提取阶段（模块-2）处理，而非在合并阶段。\n\n因此，模块-3 的并行合并过程仅占 ATOM 总延迟的13%。其余部分归因于 API 调用，可通过增大批大小或扩展本地大语言模型硬件进一步减少。\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_95d3eecc0a64.png\" width=\"800px\" alt=\"延迟对比图\">\n\u003C\u002Fp>\n\n---\n\n## 示例：时间建模（ATOM vs. 
Graphiti）\n\n下图展示了 ATOM 和 Graphiti 在时间建模上的差异，以2020年1月9日至23日关于新冠疫情的新闻为例。对于 ATOM，时间戳以 UNIX 格式编码，以消除与字符串解析操作和时区转换计算相关的开销。\n\n对于2020年1月23日观测到的事实“这种神秘的呼吸道病毒已传播到至少10个其他国家”，**Graphiti** 将观测时间视为有效开始时间（t_start），设置 `valid_at = 23-01-2020`，暗示传播发生在该特定日期。\n\n相比之下，**ATOM** 的双时间建模将观测时间（`t_obs = 23-01-2020`）与有效时间分开保存。它认识到文章是在2020年1月23日发表的，但这并不意味着传播恰好发生在那一天，传播可能早在几天或几周前就已经发生。这种区分对于时间推理至关重要：Graphiti 会推断新闻中的所有事件都发生在文章发表之日，而 ATOM 则正确地建模了信息的观测时间与事件实际发生时间之间的区别，从而避免时间误指。\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_d3deffa176cc.png\" width=\"800px\" alt=\"ATOM 与 Graphiti 的时间建模对比\">\n\u003C\u002Fp>\n\n\n下图显示了 ATOM 和 Graphiti 在时间解析方面的比较。两个于2020年1月28日观测到的原子事实分别报告了1月24日（26人死亡）和1月27日（至少80人死亡）的死亡人数。左图（ATOM）：通过检测相似关系并扩展其有效时间历史（图中 t_end）来进行时间解析。右图（Graphiti）：为每个原子事实创建单独的关系，导致重复。此外，Graphiti 错误地将“截至2020年1月24日”和“截至2020年1月27日”理解为有效开始时间，而非有效结束时间，从而导致时间误指。\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_readme_c64c5fd19417.png\" width=\"800px\" alt=\"ATOM 与 Graphiti 的时间解析对比\">\n\u003C\u002Fp>\n\n\n## 安装\n```bash\npip install --upgrade itext2kg\n```\n\n## ATOM\n\n### 大语言模型兼容性\n\nATOM 兼容 LangChain 支持的所有语言模型。要使用 ATOM，您需要同时具备聊天模型和嵌入模型。有关可用聊天模型，请参阅以下链接：https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fintegrations\u002Fchat\u002F。有关嵌入模型，请访问：https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fintegrations\u002Ftext_embedding\u002F。\n\n请确保在使用前为每个聊天模型安装必要的包。\n\n### ATOM 参数\n\n**初始化：**\n- `llm_model`: 一个用于从文本中提取关系的 LangChain 对话模型实例\n- `embeddings_model`: 一个用于计算语义相似度的 LangChain 嵌入模型实例\n\n**`build_graph` 函数：**\n（该函数也可用于构建静态知识图谱，只需固定一个任意的观测时间并传入你的原子事实即可）。\n\n- `atomic_facts` (List[str]): 需要处理的原子事实列表（简短、自成一体的文本片段）\n- `obs_timestamp` (str): 收集这些原子事实时的观测时间戳\n- `existing_knowledge_graph` (KnowledgeGraph, 可选): 与新知识图谱合并的现有知识图谱\n- `ent_threshold` (float, 默认值=0.8): 合并过程中实体消歧的相似度阈值\n- `rel_threshold` (float, 默认值=0.7): 合并过程中关系消歧的相似度阈值\n- `entity_name_weight` (float, 默认值=0.8): 相似度计算中实体名称的权重\n- 
`entity_label_weight` (float, 默认值=0.2): 相似度计算中实体标签的权重\n- `max_workers` (int, 默认值=8): 处理时的最大并行工作线程数\n\n**`build_graph_from_different_obs_times` 函数：**\n- `atomic_facts_with_obs_timestamps` (dict): 字典，键为观测时间戳（str），值为每个时间戳对应的原子事实列表\n- `existing_knowledge_graph` (KnowledgeGraph, 可选): 与新知识图谱合并的现有知识图谱\n- `ent_threshold` (float, 默认值=0.8): 合并过程中实体消歧的相似度阈值\n- `rel_threshold` (float, 默认值=0.7): 合并过程中关系消歧的相似度阈值\n- `entity_name_weight` (float, 默认值=0.8): 相似度计算中实体名称的权重\n- `entity_label_weight` (float, 默认值=0.2): 相似度计算中实体标签的权重\n- `max_workers` (int, 默认值=8): 处理时的最大并行工作线程数\n\n# 示例：从文本构建动态时空知识图谱\n\n以下是一个基本示例，展示了如何使用 ATOM 从 2020 年 COVID-NYT 的原子事实中构建动态时空知识图谱。\n\n⚠️ 性能提示：为了获得最佳性能，建议在专用的 Python 脚本中运行 ATOM，而不是在 Jupyter 笔记本中运行。这是因为 ATOM 的并行处理架构在笔记本环境中容易因事件循环冲突和线程竞争而导致显著的性能下降。\n\n更复杂的示例即将推出……\n\n---\n\n```python\nimport pandas as pd\nimport asyncio\nimport ast\n\n# 使用 LangChain 封装导入 LLM 和嵌入模型\nfrom langchain_openai import ChatOpenAI, OpenAIEmbeddings\nfrom itext2kg.atom import Atom\nfrom itext2kg import Neo4jStorage\n\n# 设置 OpenAI LLM 和嵌入模型（将“##”替换为您的 API 密钥）\nopenai_api_key = \"#\"\nopenai_llm_model = ChatOpenAI(\n    api_key=openai_api_key,\n    model=\"gpt-4.1-2025-04-14\",\n    temperature=0,\n    max_tokens=None,\n    timeout=None,\n    max_retries=2,\n)\n\nopenai_embeddings_model = OpenAIEmbeddings(\n    api_key=openai_api_key,\n    model=\"text-embedding-3-large\",\n)\n\n# 加载 2020 年 COVID-NYT 数据集的 pickle 文件\nnews_covid = pd.read_pickle(\"..\u002Fdatasets\u002Fatom\u002Fnyt_news\u002F2020_nyt_COVID_last_version_ready.pkl\")\n\n# 定义一个辅助函数，将数据框中的原子事实转换为字典，\n# 其中键为观测日期，值为该日期的所有原子事实组合。\ndef to_dictionary(df:pd.DataFrame, max_elements: int | None = 20): \n\n    if isinstance(df['factoids_g_truth'][0], str):\n        df[\"factoids_g_truth\"] = df[\"factoids_g_truth\"].apply(lambda x:ast.literal_eval(x))\n    grouped_df = df.groupby(\"date\")[\"factoids_g_truth\"].sum().reset_index()[:max_elements]\n    return {\n        str(date): factoids for date, factoids in 
grouped_df.set_index(\"date\")[\"factoids_g_truth\"].to_dict().items()\n        }\n\n# 将数据框转换为所需的字典格式\nnews_covid_dict = to_dictionary(news_covid)\n\n# 使用 OpenAI 模型初始化 ATOM 流水线\natom = Atom(llm_model=openai_llm_model, embeddings_model=openai_embeddings_model)\n\n# 在不同观测时间戳上构建知识图谱\nkg = await atom.build_graph_from_different_obs_times(\n    atomic_facts_with_obs_timestamps=news_covid_dict,\n    \n)\n\n# 使用 Neo4j 可视化生成的知识图谱\nURI = \"bolt:\u002F\u002Flocalhost:7687\"\nUSERNAME = \"neo4j\"\nPASSWORD = \"##\"\nNeo4jStorage(uri=URI, username=USERNAME, password=PASSWORD).visualize_graph(knowledge_graph=kg)\n```\n\n## 评估脚本、数据集和提示词\n2020 年 COVID-19 动态时空数据集位于 .\u002Fdatasets\u002Fatom 文件夹中（也可在 [huggingface](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Flairgiyassir\u002F2020-COVID-NYT) 上获取）。要复现结果，相关脚本位于 .\u002Fevaluation 文件夹中。提示词则位于 .\u002Fitext2kg\u002Fatom\u002Fmodels\u002Fschemas\u002F 文件夹中。\n\n\n\n## 公开协作\n我们欢迎社区贡献以改进 ATOM。\n\n## 引用\n```bibtex\n@article{lairgi2024atom,\n  title={ATOM: AdapTive and OptiMized dynamic temporal knowledge graph construction using LLMs},\n  author={Lairgi, Yassir and Moncla, Ludovic and Benabdeslem, Khalid and Cazabet, R{\\'e}my and Cl{\\'e}au, Pierre},\n  journal={arXiv preprint arXiv:2510.22590},\n  year={2025},\n  url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.22590},\n  eprint={2510.22590},\n  archivePrefix={arXiv},\n  primaryClass={cs.AI}\n}","# ATOM (原 iText2KG) 快速上手指南\n\nATOM 是一个基于大语言模型（LLM）的自适应、优化动态时序知识图谱（TKG）构建工具。它通过将非结构化文本分解为最小化的“原子事实”，并行提取五元组并进行高效合并，显著提升了知识图谱构建的完整性、稳定性和扩展性。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Linux, macOS 或 Windows\n*   **Python 版本**：Python 3.9 或更高版本\n*   **前置依赖**：\n    *   `pip` 包管理工具\n    *   有效的 LLM API 密钥（如 OpenAI API Key，或其他兼容 LangChain 的模型）\n    *   推荐安装 `langchain` 相关库以利用其广泛的模型支持\n\n## 安装步骤\n\n您可以直接通过 PyPI 安装最新版本的工具包（包名仍为 `itext2kg`）：\n\n```bash\npip install --upgrade itext2kg\n```\n\n> **提示**：如果您在中国大陆地区遇到下载速度慢的问题，建议使用国内镜像源加速安装：\n> ```bash\n> pip install --upgrade itext2kg -i 
https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 基本使用\n\nATOM 的核心工作流包含三个并行模块：原子事实分解、原子 TKG 构建和平行原子合并。以下是一个最简化的 Python 使用示例，展示如何从文本构建动态时序知识图谱。\n\n### 1. 导入模块与初始化\n\n```python\nfrom atom import ATOM\nfrom langchain_openai import ChatOpenAI\nimport os\n\n# 设置 API Key (请以实际环境变量或硬编码方式配置)\nos.environ[\"OPENAI_API_KEY\"] = \"your-api-key-here\"\n\n# 初始化 LLM 模型\nllm = ChatOpenAI(model=\"gpt-4o\", temperature=0)\n\n# 初始化 ATOM 引擎\n# 可根据需要调整线程数 (num_threads) 和批次大小 (batch_size) 以优化性能\natom_engine = ATOM(llm=llm, num_threads=8, batch_size=40)\n```\n\n### 2. 构建动态时序知识图谱\n\n准备包含时间信息的文本数据。ATOM 能够自动识别观察时间（`t_obs`）并推断事实的有效起止时间（`t_start`, `t_end`）。\n\n```python\n# 示例文档列表：每个文档包含文本内容和观察时间戳\ndocuments = [\n    {\n        \"text\": \"Steve Jobs was the CEO of Apple Inc. on January 9, 2007.\",\n        \"observation_time\": \"2007-01-09\"\n    },\n    {\n        \"text\": \"Steve Jobs is no longer the CEO of Apple Inc. on October 5, 2011.\",\n        \"observation_time\": \"2011-10-05\"\n    }\n]\n\n# 执行构建\u002F更新操作\n# 该函数会自动处理原子事实分解、五元组提取及并行合并\ndynamic_kg = atom_engine.build_graph(documents)\n\n# 查看生成的图谱结果\nprint(dynamic_kg)\n```\n\n### 3. 
结果说明\n\n执行上述代码后，ATOM 将输出一个合并后的时序知识图谱。针对上述示例，系统将正确生成如下五元组逻辑：\n\n*   **实体关系**：`(Steve Jobs, is_ceo, Apple Inc.)`\n*   **时间范围**：`t_start: 2007-01-09`, `t_end: 2011-10-05`\n*   **观察时间**：系统会保留两条信息的原始观察时间 (`t_obs`)，以区分事实发生时间与信息获取时间，避免时序归因错误。\n\n### 进阶提示\n\n*   **增量更新**：`build_graph` 函数支持传入现有的图谱对象，从而实现知识的连续动态更新。\n*   **异步处理**：底层架构已迁移至 `async\u002Fawait` 模式，在处理大量文档时能显著提高 I\u002FO 效率。\n*   **自定义阈值**：可在初始化时调整实体分辨率 (`theta_E`) 和关系分辨率 (`theta_R`) 的余弦相似度阈值，以适应不同领域的精度需求。","某金融风控团队需要从每日海量的新闻快讯和财报中，实时提取企业间的动态关联（如并购、高管变动）以构建时序知识图谱。\n\n### 没有 itext2kg 时\n- **关键事实遗漏**：面对长篇幅的深度报道，传统大模型容易受“遗忘效应”影响，漏掉文中后半段提及的关键时间点和事件细节。\n- **结果不稳定**：同一份文档多次运行提取，得到的实体关系不一致，导致风控规则无法固化，需人工反复校验。\n- **时间维度混乱**：难以区分“事件发生时间”与“信息观测时间”，常将过去发生的旧闻误判为最新风险信号。\n- **更新效率低下**：随着数据量激增，串行处理架构导致图谱更新延迟高达数小时，无法满足实时预警需求。\n\n### 使用 itext2kg 后\n- **事实覆盖全面**：itext2kg 先将文本拆解为最小“原子事实”再并行提取，使关键事实捕获率提升约 31%，彻底解决长文遗漏问题。\n- **输出高度稳定**：基于原子化分解的架构确保了多次运行结果的一致性，稳定性提升约 17%，大幅降低人工复核成本。\n- **时间建模精准**：独有的双重时间建模机制（区分观测时间与起止时间），有效防止了动态图谱中的时间归属错误。\n- **实时动态更新**：利用并行合并算法替代缓慢的大模型推理，延迟降低超过 93%，支持大规模数据的秒级图谱演进。\n\nitext2kg 通过原子化分解与并行架构，将非结构化文本转化为高覆盖率、高稳定性且具备精准时间维度的动态知识图谱，让实时智能决策成为可能。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAuvaLab_itext2kg_e22ec1c0.png","AuvaLab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FAuvaLab_ea9bd537.png",null,"https:\u002F\u002Fgithub.com\u002FAuvaLab",[80],{"name":81,"color":82,"percentage":83},"Python","#3572A5",100,934,101,"2026-04-17T08:39:14","Apache-2.0","未说明",{"notes":90,"python":91,"dependencies":92},"该工具主要依赖 LLM API（如 OpenAI）进行知识图谱构建，支持异步架构和批量处理。核心功能包括原子事实分解、并行五元组提取及基于余弦相似度的实体\u002F关系解析（无需 LLM 参与合并阶段）。安装命令为 `pip install --update itext2kg`。README 中未明确提及具体的操作系统、GPU 或内存硬件需求，表明其可能对本地算力要求不高，主要瓶颈在于 LLM API 调用延迟。","3.9+",[93],"langchain",[27,16],[96,97,98],"knowledge-graph","llms","temporal-knowledge-graph","2026-03-27T02:49:30.150509","2026-04-18T09:19:32.233435",[102,107,112,117,122,127,131],{"id":103,"question_zh":104,"answer_zh":105,"source_url":106},39236,"如何替换默认的 GPT 模型，使用其他大语言模型（如 
Qwen、Ollama 本地模型等）？","项目支持通过 LangChain 集成多种模型。对于本地模型，推荐使用 Ollama。在 Linux 上安装 Ollama 的命令为：`curl -fsSL https:\u002F\u002Follama.com\u002Finstall.sh | sh`。安装后下载所需模型（例如 `qwen2.5:7b` 或更大版本如 `qwen2:72b`）。维护者建议，较小的模型（如 7B 参数以下）可能无法正确格式化输出，若遇到解析错误，请尝试升级到大参数版本（14B, 32B, 72B）以获得更好的结构化输出能力。","https:\u002F\u002Fgithub.com\u002FAuvaLab\u002Fitext2kg\u002Fissues\u002F18",{"id":108,"question_zh":109,"answer_zh":110,"source_url":111},39237,"安装时遇到 'No module named distutils' 错误怎么办？","该错误通常是因为 numpy 1.24 与较新的 Python 版本（如 Python 3.12）不兼容导致的。临时解决方案是将 Python 版本降级至 3.11。此外，可以尝试先运行 `pip install setuptools` 看是否能解决问题。维护者表示将在后续版本中更新依赖要求以修复此问题。","https:\u002F\u002Fgithub.com\u002FAuvaLab\u002Fitext2kg\u002Fissues\u002F2",{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},39238,"运行代码时出现类型错误（如 'relationship' 是字符串而非字典，或 'name' 是列表而非字符串），如何解决？","这类错误（TypeError\u002FValidationError）的主要原因是使用的本地小模型（如 llama3-8b）无法稳定地生成结构化输出。唯一的根本解决方案是升级使用更大参数的模型（需要更多资源）。作为临时规避，可以检查并更新到最新版本（v0.0.8+），其中已处理部分此类问题。如果问题依旧，请考虑更换为 qwen2.5:7b 或更大的模型。","https:\u002F\u002Fgithub.com\u002FAuvaLab\u002Fitext2kg\u002Fissues\u002F22",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},39239,"遇到 Pydantic 版本兼容性错误怎么办？","由于 Pydantic 新版本发布可能导致兼容性问题，建议在 `requirements.txt` 中明确指定 Pydantic 版本（例如 `pydantic==2.9.2`）。维护者已在 v0.0.8 版本中处理了相关的依赖兼容性问题，建议升级 itext2kg 到最新版本。","https:\u002F\u002Fgithub.com\u002FAuvaLab\u002Fitext2kg\u002Fissues\u002F26",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},39240,"如何处理长文本场景或避免上下文长度限制？","如果文档过长导致上下文限制问题，可以在将文档送入 Document Distiller 或构建图谱之前，先将文档切分成块（chunking）。可以使用 LangChain 提供的文本分割器（Text Splitters）来实现。参考文档：https:\u002F\u002Fpython.langchain.com\u002Fv0.1\u002Fdocs\u002Fmodules\u002Fdata_connection\u002Fdocument_transformers\u002F","https:\u002F\u002Fgithub.com\u002FAuvaLab\u002Fitext2kg\u002Fissues\u002F6",{"id":128,"question_zh":129,"answer_zh":130,"source_url":126},39241,"示例代码中的 \"distilled_doc\" 是什么？我需要自己解析文档吗？","\"distilled_doc\" 指的是经过 \"Document Distiller\" 
处理后的精简文档（例如精简后的简历）。如果你发现代码中找不到该变量但示例中有，可能是因为该步骤需要预处理。如果文档过长，你可能需要先对文档进行分块处理，然后再传递给 Document Distiller 或 iText2KG 进行图谱构建。",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},39242,"论文中提到的 \"unresolved (false positive) entities\"（未解决\u002F假阳性实体）具体指什么？","这是指在生成知识图谱后，实体或关系列表中存在的无效或错误提取的项目。评估方法是：在使用不同基线方法生成图谱后，人工检查每个实体\u002F关系列表，计算其中未解决（即错误或冗余）的实体\u002F关系数量占总提取数量的比例。","https:\u002F\u002Fgithub.com\u002FAuvaLab\u002Fitext2kg\u002Fissues\u002F20",[137,142,147,152,157,162,167,172,176],{"id":138,"version":139,"summary_zh":140,"released_at":141},315187,"v1.0.0","- 全面架构重设计：ATOM 现采用三模块并行流水线来构建和更新动态知识图谱（DTKG）。  - 原子事实分解：新增的第一模块将文本拆分为最小的“原子事实”，有效缓解了大型语言模型在长上下文中容易遗漏事实的“遗忘效应”。  - 更强的完备性与稳定性：新架构带来了显著提升：事实完备性提升约31%，时间维度完备性提升约18%，稳定性提升约17%。  - 双时间建模：引入双时间建模机制（观测时间 t_obs 与起始\u002F结束时间 t_start\u002Ft_end），以避免在动态知识图谱中出现时间属性错配的问题。  - 并行五元组抽取：第二模块直接从原子事实中并行抽取五元组（主体、谓词、客体、起始时间、结束时间）。  - 并行原子合并架构：第三模块采用高效的并行两两合并算法，相较于 Graphiti 延迟降低93.8%，相较于 iText2KG 则降低95.3%。  - 不依赖大语言模型的消歧方案：用距离度量（余弦相似度）替代耗时的大语言模型消歧方法，实现可扩展的并行合并。","2025-10-28T09:28:27",{"id":143,"version":144,"summary_zh":145,"released_at":146},315188,"v0.0.9","# 改进\r\n\r\n- 我们修复了 Neo4j 存储中的 bug #38。","2025-09-01T10:38:56",{"id":148,"version":149,"summary_zh":150,"released_at":151},315189,"v0.0.8","### 新特性\n* iText2KG_Star\n    * 直接关系抽取（速度更快）\n    * 取消了单独的实体抽取步骤\n    * 无需处理孤立或虚构的实体\n* 动态知识图谱\n    * 支持时间序列追踪的动态知识图谱\n    * 可与现有图谱进行增量更新\n* 基于事实的构建\n    * 通过 Document Distiller 进行结构化事实抽取\n    * 构建更加全面的知识图谱\n\n### 改进\n* 异步迁移：所有方法现均为异步\u002Fawait 模式\n* 日志增强：引入结构化日志系统\n* 通用 LangChain 支持：兼容所有聊天和嵌入模型\n* 更完善的错误处理：达到生产级可靠性\n\n### 技术细节\n* 关系抽取能力提升\n* 兼容 Python 3.10 及以上版本\n* 提供全面的示例\n\n我们已修复以下问题：#34、#33、#29、#28、#26、#22","2025-07-29T10:12:22",{"id":153,"version":154,"summary_zh":155,"released_at":156},315190,"v0.0.7","- 对整个 iText2KG 代码进行了重构，新增了用于描述实体、关系和知识图谱的数据模型。\n- 每个实体都同时使用其名称和标签进行嵌入，以避免将名称相似但标签不同的概念合并在一起，例如“Python: Language”和“Python: Snake”。\n- 实体名称嵌入和实体标签的权重可配置，默认设置为：实体标签权重为 0.4，实体名称权重为 0.6。\n- 在 iText2KG.build_graph 函数中新增了 max_tries 
参数，用于实体和关系抽取，以防止在构建输出时出现幻觉现象。同时，还为该方法新增了 max_tries_isolated_entities 参数，以处理孤立实体处理过程中可能出现的幻觉问题。","2024-10-09T13:45:50",{"id":158,"version":159,"summary_zh":160,"released_at":161},315191,"v0.0.5","- 修复 #7 中报告的 bug。- 更新 iText2KG 的 build_graph 函数，使其在图构建完成后，能够对新构建的图与现有图进行匹配。","2024-09-20T04:36:53",{"id":163,"version":164,"summary_zh":165,"released_at":166},315192,"v0.0.4","- 现在，iText2KG 兼容 LangChain 支持的所有聊天\u002F嵌入模型。（#1）\n通过将已提取的实体和关系作为参数传递给 iText2KG 中的 build_graph 函数，可以扩展构建的图。\n- iText2KG 兼容 Python 3.9 及以上的所有版本。（#2）\n- 修复了整体架构中的一些 bug。","2024-09-17T09:08:07",{"id":168,"version":169,"summary_zh":170,"released_at":171},315193,"v0.0.3","- 更新依赖项中 Neo4j 的版本。 - 添加用于阈值估计的数据集。 - 添加论文链接。","2024-09-06T01:03:52",{"id":173,"version":174,"summary_zh":77,"released_at":175},315194,"V0.0.2","2024-07-16T11:28:21",{"id":177,"version":178,"summary_zh":77,"released_at":179},315195,"V0.0.1","2024-07-16T11:12:06"]