[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Hawksight-AI--semantica":3,"tool-Hawksight-AI--semantica":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",143909,2,"2026-04-07T11:33:18",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":76,"owner_url":77,"languages":78,"stars":83,"forks":84,"last_commit_at":85,"license":86,"difficulty_score":87,"env_os":88,"env_gpu":88,"env_ram":88,"env_deps":89,"category_tags":101,"github_topics":102,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":123,"updated_at":124,"faqs":125,"releases":154},5032,"Hawksight-AI\u002Fsemantica","semantica","Semantica 🧠 — A framework for building semantic layers, context graphs, and decision intelligence systems with explainability and provenance.","Semantica 是一个专为构建可解释、可追溯的 AI 系统而设计的框架，旨在为人工智能应用添加“语义层”和“决策智能”。当前许多 AI 代理虽然能力强大，却常被视为“黑盒”：它们缺乏真正的记忆结构，无法解释检索依据，没有决策记录可供审计，也难以追踪数据来源或发现事实冲突。这些缺陷使得 AI 在医疗、金融等对合规性要求极高的领域难以落地。\n\nSemantica 通过构建结构化的上下文图谱（Context Graphs），将实体、关系和决策过程清晰记录下来，让每一步推理都有据可查。它支持完整的决策生命周期管理，确保每个结论都能追溯到源头事实，并符合 W3C 
溯源标准。此外，内置的推理引擎支持多种逻辑推导方式，能主动检测数据冲突并进行实体消歧，从而提升系统的可靠性。\n\n这款工具特别适合需要构建高可信度 AI 应用的开发者、研究人员及企业架构师。它可以无缝集成到 LangChain、LlamaIndex 或 AutoGen 等现有工作流中，作为上层的“问责机制”而非替代品。如果你希望 AI 系统不仅聪明，而且透明、可控且值得信赖，Semantica 提供了坚实的技术基础。","\u003Cdiv align=\"center\">\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHawksight-AI_semantica_readme_b3f18647a21c.png\" alt=\"Semantica Logo\" width=\"420\"\u002F>\n\n# 🧠 Semantica\n\n**A Framework for Building Context Graphs and Decision Intelligence Layers for AI**\n\n[![Python 3.8+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.8+-blue.svg)](https:\u002F\u002Fwww.python.org\u002F)\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT)\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fsemantica.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fsemantica\u002F)\n[![Version](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fversion-0.3.0-brightgreen.svg)](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Freleases\u002Ftag\u002Fv0.3.0)\n[![Total 
Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHawksight-AI_semantica_readme_0a351f725a09.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fsemantica)\n[![CI](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Fworkflows\u002FCI\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Factions)\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Join-5865F2?logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.gg\u002FsV34vps5hH)\n[![X](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FX-Follow-black?logo=x&logoColor=white)](https:\u002F\u002Fx.com\u002FBuildSemantica)\n\n### ⭐ Give us a Star • 🍴 Fork us • 💬 Join our Discord • 🐦 Follow on X\n\n> **Transform Chaos into Intelligence. Build AI systems with context graphs, decision tracking, and advanced knowledge engineering that are explainable, traceable, and trustworthy — not black boxes.**\n\n\u003C\u002Fdiv>\n\n---\n\n## The Problem\n\nAI agents today are capable but not trustworthy:\n\n- **No memory structure** — agents store embeddings, not meaning. Retrieval is fuzzy; there's no way to ask *why* something was recalled.\n- **No decision trail** — agents make decisions continuously but record nothing. When something goes wrong, there's no history to debug or audit.\n- **No provenance** — outputs cannot be traced back to source facts. 
In regulated industries, this is a compliance blocker.\n- **No reasoning transparency** — black-box answers with no explanation of how a conclusion was reached.\n- **No conflict detection** — contradictory facts silently coexist in vector stores, producing unpredictable answers.\n\nThese aren't edge cases. They are the reason AI cannot be deployed in healthcare, finance, legal, and government without custom guardrails built from scratch.\n\n## The Solution\n\nSemantica is the **context and intelligence layer** you add to your AI stack:\n\n- **Context Graphs** — structured graph of entities, relationships, and decisions your agent builds as it works. Queryable, traceable, persistent.\n- **Decision Intelligence** — every decision is a first-class object: recorded, linked causally, searchable by precedent, and analyzable for downstream impact.\n- **Provenance** — every fact links to its source. W3C PROV-O compliant. Full lineage from ingestion to inference.\n- **Reasoning engines** — forward chaining, Rete networks, deductive, abductive, and SPARQL reasoning. Explainable inference paths, not black-box answers.\n- **Deduplication & QA** — conflict detection, entity resolution, and validation built into the pipeline.\n\nWorks alongside LangChain, LlamaIndex, AutoGen, CrewAI, and any LLM provider — Semantica is not a replacement, it's the accountability layer on top.\n\n### ⚡ Quick Installation\n\n```bash\npip install semantica\n```\n\n---\n\n## What's New in v0.3.0\n\n> First stable release — `Production\u002FStable` on PyPI. 
Ships across three stages: 0.3.0-alpha, 0.3.0-beta, and 0.3.0 stable.\n\n| Area | Highlights |\n|------|-----------|\n| **Context Graphs** | Temporal validity windows (`valid_from`\u002F`valid_until`), weighted BFS (`min_weight`), cross-graph navigation (`link_graph`, `navigate_to`, `resolve_links`) with full save\u002Fload persistence |\n| **Decision Intelligence** | Complete lifecycle: `record_decision` → `trace_decision_chain` → `analyze_decision_impact` → `find_similar_decisions`; hybrid precedent search; `PolicyEngine` with versioned rules |\n| **KG Algorithms** | PageRank, betweenness, community detection (Louvain), Node2Vec embeddings, link prediction, path finding — all returning structured dicts |\n| **Semantic Extraction** | LLM relation extraction fixed (no silent drops); `_match_pattern` rewritten; duplicate relation bug removed; `\"llm_typed\"` metadata corrected |\n| **Deduplication v2** | `blocking_v2`\u002F`hybrid_v2` candidate generation (**63.6% faster**); two-stage prefilter (**18–25% faster**); semantic dedup v2 (**6.98x faster**) |\n| **Delta Processing** | SPARQL-based incremental diff; `delta_mode` pipelines; snapshot versioning with `prune_versions()` |\n| **Export** | RDF format aliases (`\"ttl\"`, `\"json-ld\"`, etc.); ArangoDB AQL export; Apache Parquet export (Spark\u002FBigQuery\u002FDatabricks ready) |\n| **Pipeline** | `FailureHandler` with LINEAR\u002FEXPONENTIAL\u002FFIXED backoff; `PipelineValidator` returning `ValidationResult`; retry loop fixed |\n| **Graph Backends** | Apache AGE (SQL injection fixed), AWS Neptune, FalkorDB, PgVector (HNSW\u002FIVFFlat indexing) |\n| **Tests** | **886+ passing, 0 failures** — 335 context, ~430 KG, 70 semantic extraction, 85 real-world E2E |\n\nSee [RELEASE_NOTES.md](RELEASE_NOTES.md) for the full per-contributor breakdown and [CHANGELOG](CHANGELOG.md) for the complete diff.\n\n---\n\n## Unreleased \u002F Coming Next\n\n| Area | Highlights |\n|------|-----------|\n| **SHACL Constraints** | 
`OntologyEngine.to_shacl()` auto-derives SHACL shapes from any OWL ontology; `validate_graph()` returns structured `SHACLValidationReport` with plain-English violation explanations; three quality tiers (`\"basic\"`, `\"standard\"`, `\"strict\"`); three output formats (Turtle, JSON-LD, N-Triples); 3-level inheritance propagation |\n\n---\n\n## Features\n\n### Context & Decision Intelligence\n- **Context Graphs** — structured graph of entities, relationships, and decisions; queryable, causal, persistent\n- **Decision tracking** — record, link, and analyze every agent decision with `add_decision()`, `record_decision()`\n- **Causal chains** — link decisions with `add_causal_relationship()`, trace lineage with `trace_decision_chain()`\n- **Precedent search** — hybrid similarity search over past decisions with `find_similar_decisions()`\n- **Influence analysis** — `analyze_decision_impact()`, `analyze_decision_influence()` — understand downstream effects\n- **Policy engine** — enforce business rules with `check_decision_rules()`; automated compliance validation\n- **Agent memory** — `AgentMemory` with short\u002Flong-term storage, conversation history, and statistics\n- **Cross-system context capture** — `capture_cross_system_inputs()` for multi-agent pipelines\n\n### Knowledge Graphs\n- **Knowledge graph construction** — entities, relationships, properties, typed edges\n- **Graph algorithms** — PageRank, betweenness centrality, clustering coefficient, community detection\n- **Node embeddings** — Node2Vec embeddings via `NodeEmbedder`\n- **Similarity** — cosine similarity via `SimilarityCalculator`\n- **Link prediction** — score potential new edges via `LinkPredictor`\n- **Temporal graphs** — time-aware nodes and edges\n- **Incremental \u002F delta processing** — update graphs without full recompute\n\n### Semantic Extraction\n- **Entity extraction** — named entity recognition, normalization, classification\n- **Relation extraction** — triplet generation from raw text 
using LLMs or rule-based methods\n- **LLM-typed extraction** — extraction with typed relation metadata\n- **Deduplication v1** — Jaro-Winkler similarity, basic blocking\n- **Deduplication v2** — `blocking_v2`, `hybrid_v2`, `semantic_v2` strategies with `max_candidates_per_entity`\n- **Triplet deduplication** — `dedup_triplets()` for removing duplicate (subject, predicate, object) triples\n\n### Reasoning Engines\n- **Forward chaining** — `Reasoner` with IF\u002FTHEN string rules and dict facts\n- **Rete network** — `ReteEngine` for high-throughput production rule matching\n- **Deductive reasoning** — `DeductiveReasoner` for classical inference\n- **Abductive reasoning** — `AbductiveReasoner` for hypothesis generation from observations\n- **SPARQL reasoning** — `SPARQLReasoner` for query-based inference over RDF graphs\n\n### Provenance & Auditability\n- **Entity provenance** — `ProvenanceTracker.track_entity(id, source_url, metadata)`\n- **Algorithm provenance** — `AlgorithmTrackerWithProvenance` tracks computation lineage\n- **Graph builder provenance** — `GraphBuilderWithProvenance` records entity source lineage from URLs\n- **W3C PROV-O compliant** — lineage tracking across all modules\n- **Change management** — version control with checksums, audit trails, compliance support\n\n### Vector Store\n- **Backends** — FAISS, Pinecone, Weaviate, Qdrant, Milvus, PgVector, in-memory\n- **Semantic search** — top-k retrieval by embedding similarity\n- **Hybrid search** — vector + keyword with configurable weights\n- **Filtered search** — metadata-based filtering on any field\n- **Custom similarity weights** — tune retrieval per use case\n\n### 🌐 Graph Database Support\n- **AWS Neptune** — Amazon Neptune graph database with IAM authentication\n- **Apache AGE** — PostgreSQL graph extension with openCypher via SQL\n- **FalkorDB** — native support; `DecisionQuery` and `CausalChainAnalyzer` work directly with FalkorDB row\u002Fheader shapes\n\n### Data Ingestion\n- **File 
formats** — PDF, DOCX, HTML, JSON, CSV, Excel, PPTX, archives\n- **Web crawl** — `WebIngestor` with configurable depth\n- **Databases** — `DBIngestor` with SQL query support\n- **Snowflake** — `SnowflakeIngestor` with table\u002Fquery ingestion, pagination, and key-pair\u002FOAuth auth\n- **Docling** — advanced document parsing with table and layout extraction (PDF, DOCX, PPTX, XLSX)\n- **Media** — image OCR, audio\u002Fvideo metadata extraction\n\n### Export Formats\n- **RDF** — Turtle (`.ttl`), JSON-LD, N-Triples (`.nt`), XML via `RDFExporter`\n- **Parquet** — `ParquetExporter` for entities, relationships, and full KG export\n- **ArangoDB AQL** — ready-to-run INSERT statements via `ArangoAQLExporter`\n- **OWL ontologies** — export generated ontologies in Turtle or RDF\u002FXML\n- **SHACL shapes** — export auto-derived constraint shapes via `RDFExporter.export_shacl()` (`.ttl`, `.jsonld`, `.nt`, `.shacl`)\n\n### Pipeline & Production\n- **Pipeline builder** — `PipelineBuilder` with stage chaining and parallel workers\n- **Validation** — `PipelineValidator` returns `ValidationResult(valid, errors, warnings)` before execution\n- **Failure handling** — `FailureHandler` with `RetryPolicy` and `RetryStrategy` (exponential backoff, fixed, etc.)\n- **Parallel processing** — configurable worker count per pipeline stage\n- **LLM providers** — 100+ models via LiteLLM (OpenAI, Anthropic, Cohere, Mistral, Ollama, and more)\n\n### Ontology\n- **Auto-generation** — derive OWL ontologies from knowledge graphs via `OntologyGenerator`\n- **Import** — load existing OWL, RDF, Turtle, JSON-LD ontologies via `OntologyImporter`\n- **Validation** — HermiT\u002FPellet compatible consistency checking\n- **SHACL shape generation** — `OntologyEngine.to_shacl()` auto-derives SHACL node and property shapes from any Semantica ontology dict; zero hand-authoring; deterministic (same ontology → same shapes)\n- **SHACL validation** — `OntologyEngine.validate_graph()` runs shapes against a data 
graph and returns a `SHACLValidationReport` with machine-readable violations and plain-English explanations\n- **Quality tiers** — `\"basic\"` (structure + cardinality), `\"standard\"` (+ enumerations, inheritance), `\"strict\"` (+ `sh:closed` rejects undeclared properties)\n- **Inheritance propagation** — child shapes automatically include all ancestor property shapes (up to 3+ levels), cycle-safe\n- **Three output formats** — Turtle (`.ttl`), JSON-LD, N-Triples; file export via `export_shacl()`\n\n---\n\n## Modules\n\n| Module | What it provides |\n|---|---|\n| `semantica.context` | Context graphs, agent memory, decision tracking, causal analysis, precedent search, policy engine |\n| `semantica.kg` | Knowledge graph construction, graph algorithms, centrality, community detection, embeddings, link prediction, provenance |\n| `semantica.semantic_extract` | NER, relation extraction, event extraction, coreference, triplet generation, LLM-enhanced extraction |\n| `semantica.reasoning` | Forward chaining, Rete network, deductive, abductive, SPARQL reasoning, explanation generation |\n| `semantica.vector_store` | FAISS, Pinecone, Weaviate, Qdrant, Milvus, PgVector, in-memory; hybrid & filtered search |\n| `semantica.export` | RDF (Turtle\u002FJSON-LD\u002FN-Triples\u002FXML), Parquet, ArangoDB AQL, CSV, YAML, OWL, graph formats |\n| `semantica.ingest` | Files (PDF, DOCX, CSV, HTML), web crawl, feeds, databases, Snowflake, MCP, email, repositories |\n| `semantica.ontology` | Auto-generation (6-stage pipeline), OWL\u002FRDF export, import (OWL\u002FRDF\u002FTurtle\u002FJSON-LD), validation, versioning, **SHACL shape generation & validation** |\n| `semantica.pipeline` | Pipeline DSL, parallel workers, validation, retry policies, failure handling, resource scheduling |\n| `semantica.graph_store` | Graph database backends — Neo4j, FalkorDB, Apache AGE, Amazon Neptune; Cypher queries |\n| `semantica.embeddings` | Text embedding generation — Sentence-Transformers, FastEmbed, 
OpenAI, BGE; similarity calculation |\n| `semantica.deduplication` | Entity deduplication, similarity scoring, merging, clustering; blocking and semantic strategies |\n| `semantica.provenance` | W3C PROV-O compliant end-to-end lineage tracking, source attribution, audit trails |\n| `semantica.parse` | Document parsing — PDF, DOCX, PPTX, HTML, code, email, structured data, media with OCR |\n| `semantica.split` | Document chunking — recursive, semantic, entity-aware, relation-aware, graph-based, ontology-aware |\n| `semantica.normalize` | Data normalization for text, entities, dates, numbers, quantities, languages, encodings |\n| `semantica.conflicts` | Multi-source conflict detection (value, type, relationship, temporal, logical) with resolution strategies |\n| `semantica.change_management` | Version storage, change tracking, checksums, audit trails, compliance support for KGs and ontologies |\n| `semantica.triplet_store` | RDF triplet store integration — Blazegraph, Jena, RDF4J; SPARQL queries and bulk loading |\n| `semantica.visualization` | Interactive and static visualization of KGs, ontologies, embeddings, analytics, and temporal graphs |\n| `semantica.seed` | Seed data management for initial KG construction from CSV, JSON, databases, and APIs |\n| `semantica.core` | Framework orchestration, configuration management, knowledge base construction, plugin system |\n| `semantica.llms` | LLM provider integrations — Groq, OpenAI, Novita AI, HuggingFace, LiteLLM |\n| `semantica.utils` | Shared utilities — logging, validation, exception handling, constants, types, progress tracking |\n\n---\n\n## ⚡ Quick Start\n\n```python\nimport semantica\nfrom semantica.context import AgentContext, ContextGraph\nfrom semantica.vector_store import VectorStore\n\n# Build an agent with structured context\ncontext = AgentContext(\n    vector_store=VectorStore(backend=\"faiss\", dimension=768),\n    knowledge_graph=ContextGraph(advanced_analytics=True),\n    decision_tracking=True,\n    
kg_algorithms=True,\n)\n\n# Store memory\nmemory_id = context.store(\n    \"GPT-4 outperforms GPT-3.5 on reasoning benchmarks by 40%\",\n    conversation_id=\"research_session_1\",\n)\n\n# Record a decision with full context\ndecision_id = context.record_decision(\n    category=\"model_selection\",\n    scenario=\"Choose LLM for production reasoning pipeline\",\n    reasoning=\"GPT-4 benchmark advantage justifies 3x cost increase\",\n    outcome=\"selected_gpt4\",\n    confidence=0.91,\n    entities=[\"gpt4\", \"gpt35\", \"reasoning_pipeline\"],\n)\n\n# Find similar decisions from history\nprecedents = context.find_precedents(\"model selection reasoning\", limit=5)\n\n# Analyze downstream influence of this decision\ninfluence = context.analyze_decision_influence(decision_id)\n```\n\n**[📖 Full Quick Start](#-quick-start)** • **[🍳 Cookbook Examples](#-semantica-cookbook)** • **[💬 Join Discord](https:\u002F\u002Fdiscord.gg\u002FsV34vps5hH)** • **[⭐ Star Us](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica)**\n\n---\n\n## Core Value Proposition\n\n| **Trustworthy** | **Explainable** | **Auditable** |\n|:------------------:|:------------------:|:-----------------:|\n| Conflict detection & validation | Transparent reasoning paths | Complete provenance tracking |\n| Rule-based governance | Entity relationships & ontologies | W3C PROV-O compliant lineage |\n| Production-grade QA | Multi-hop graph reasoning | Source tracking & integrity verification |\n\n---\n\n## Key Features & Benefits\n\n### Not Just Another Agentic Framework\n\n**Semantica complements** LangChain, LlamaIndex, AutoGen, CrewAI, Google ADK, Agno, and other frameworks to enhance your agents with:\n\n| Feature | Benefit |\n|:--------|:--------|\n| **Context Graphs** | Structured knowledge representation with entity relationships and semantic context |\n| **Decision Tracking** | Complete decision lifecycle management with precedent search and causal analysis |\n| **KG Algorithms** | Advanced 
graph analytics including centrality, community detection, and embeddings |\n| **Vector Store Integration** | Hybrid search with custom similarity weights and advanced filtering |\n| **Auditable** | Complete provenance tracking with W3C PROV-O compliance |\n| **Explainable** | Transparent reasoning paths with entity relationships |\n| **Provenance-Aware** | End-to-end lineage from documents to responses |\n| **Validated** | Built-in conflict detection, deduplication, QA |\n| **Governed** | Rule-based validation and semantic consistency |\n| **Version Control** | Enterprise-grade change management with integrity verification |\n\n### Perfect For High-Stakes Use Cases\n\n| 🏥 **Healthcare** | 💰 **Finance** | ⚖️ **Legal** |\n|:-----------------:|:--------------:|:------------:|\n| Clinical decisions | Fraud detection | Evidence-backed research |\n| Drug interactions | Regulatory support | Contract analysis |\n| Patient safety | Risk assessment | Case law reasoning |\n\n| 🔒 **Cybersecurity** | 🏛️ **Government** | 🏭 **Infrastructure** | 🚗 **Autonomous** |\n|:-------------------:|:----------------:|:-------------------:|:-----------------:|\n| Threat attribution | Policy decisions | Power grids | Decision logs |\n| Incident response | Classified info | Transportation | Safety validation |\n\n### Powers Your AI Stack\n\n- **Context Graphs** — Structured knowledge representation with entity relationships and semantic context\n- **Decision Tracking Systems** — Complete decision lifecycle management with precedent search and causal analysis\n- **GraphRAG Systems** — Retrieval with graph reasoning and hybrid search using KG algorithms\n- **AI Agents** — Trustworthy, accountable multi-agent systems with semantic memory and decision history\n- **Reasoning Models** — Explainable AI decisions with reasoning paths and influence analysis\n- **Enterprise AI** — Governed, auditable platforms that support compliance and policy enforcement\n\n### Integrations\n\n- **Docling Support** — 
Document parsing with table extraction (PDF, DOCX, PPTX, XLSX)\n- **AWS Neptune** — Amazon Neptune graph database support with IAM authentication\n- **Apache AGE** — PostgreSQL graph extension backend (openCypher via SQL)\n- **Snowflake** — Native ingestion with `SnowflakeIngestor`; table\u002Fquery ingestion, pagination, key-pair & OAuth auth\n- **Custom Ontology Import** — Import existing ontologies (OWL, RDF, Turtle, JSON-LD)\n\n> **Built for environments where every answer must be explainable and governed.**\n\n---\n\n## Context Graphs & Decision Tracking\n\nSemantica's flagship module. Tracks every decision your agent makes as a structured graph node — with causal links, precedent search, impact analysis, and policy enforcement.\n\n```python\nfrom semantica.context import ContextGraph\n\ngraph = ContextGraph(advanced_analytics=True)\n\n# Record a loan approval decision\nloan_id = graph.add_decision(\n    category=\"loan_approval\",\n    scenario=\"Mortgage application — 780 credit score, 28% DTI\",\n    reasoning=\"Strong credit history, stable income for 8 years, low DTI\",\n    outcome=\"approved\",\n    confidence=0.95,\n)\n\n# Record a downstream dependent decision\nrate_id = graph.add_decision(\n    category=\"interest_rate\",\n    scenario=\"Set rate for approved mortgage\",\n    reasoning=\"Prime applicant qualifies for lowest tier rate\",\n    outcome=\"rate_set_6.2pct\",\n    confidence=0.98,\n)\n\n# Link the decisions causally\ngraph.add_causal_relationship(loan_id, rate_id, relationship_type=\"enables\")\n\n# Find similar past decisions using hybrid similarity\nsimilar    = graph.find_similar_decisions(\"mortgage approval\", max_results=5)\nchain      = graph.trace_decision_chain(loan_id)\nimpact     = graph.analyze_decision_impact(loan_id)\ncompliance = graph.check_decision_rules({\"category\": \"loan_approval\", \"confidence\": 0.95})\ninsights   = graph.get_decision_insights()\n```\n\n```python\nfrom semantica.context import AgentContext, 
AgentMemory\nfrom semantica.vector_store import VectorStore\n\ncontext = AgentContext(\n    vector_store=VectorStore(backend=\"inmemory\"),\n    knowledge_graph=ContextGraph(advanced_analytics=True),\n    decision_tracking=True,\n    graph_expansion=True,\n    kg_algorithms=True,\n)\n\ncontext.store(\"Regulation EU 2024\u002F1689 requires explainability for high-risk AI\", conversation_id=\"compliance_review\")\ncontext.store(\"Our fraud model flags 0.3% of transactions\", conversation_id=\"compliance_review\")\n\nresults = context.retrieve(\"AI regulation explainability requirements\", limit=3)\nhistory = context.get_conversation_history(\"compliance_review\")\nstats   = context.get_statistics()\n```\n\n---\n\n## Knowledge Graphs\n\n```python\nfrom semantica.kg import KnowledgeGraph, Entity, Relationship\nfrom semantica.kg import CentralityAnalyzer, NodeEmbedder, LinkPredictor\n\nkg = KnowledgeGraph()\n\nkg.add_entity(Entity(id=\"transformer\", label=\"Transformer\", type=\"Architecture\",\n                     properties={\"year\": 2017, \"paper\": \"Attention Is All You Need\"}))\nkg.add_entity(Entity(id=\"bert\", label=\"BERT\", type=\"Model\",\n                     properties={\"year\": 2018, \"parameters\": \"340M\"}))\nkg.add_entity(Entity(id=\"gpt4\", label=\"GPT-4\", type=\"Model\", properties={\"year\": 2023}))\n\nkg.add_relationship(Relationship(source=\"bert\", target=\"transformer\", type=\"based_on\"))\nkg.add_relationship(Relationship(source=\"gpt4\", target=\"transformer\", type=\"based_on\"))\n\n# Graph algorithms\nanalyzer    = CentralityAnalyzer(kg)\ncentrality  = analyzer.compute_pagerank()\nbetweenness = analyzer.compute_betweenness()\n\n# Node embeddings (Node2Vec)\nembedder   = NodeEmbedder()\nembeddings = embedder.compute_embeddings(kg, node_labels=[\"Model\"], relationship_types=[\"based_on\"])\n\n# Link prediction\npredictor = LinkPredictor()\nscore     = predictor.score_link(kg, \"gpt4\", \"bert\", method=\"common_neighbors\")\n\nmodels   
   = kg.find_nodes(type=\"Model\")\ndescendants = kg.get_neighbors(\"transformer\", direction=\"incoming\")\n```\n\n---\n\n## Semantic Extraction\n\n```python\nfrom semantica.semantic_extract import extract_entities, extract_relations, extract_triplets\n\ntext = \"\"\"\nOpenAI released GPT-4 in March 2023. Microsoft integrated GPT-4 into Azure OpenAI Service.\nAnthropic, founded by former OpenAI researchers, released Claude as a competing model.\n\"\"\"\n\nentities = extract_entities(text)\n# → [Entity(label=\"OpenAI\", type=\"Organization\"), Entity(label=\"GPT-4\", type=\"Model\"), ...]\n\nrelations = extract_relations(text)\n# → [Relation(source=\"OpenAI\", type=\"released\", target=\"GPT-4\"), ...]\n\ntriplets = extract_triplets(text)\n```\n\n```python\nfrom semantica.deduplication import DuplicateDetector\n\nentities = [\n    {\"id\": \"e1\", \"name\": \"OpenAI Inc.\", \"type\": \"Organization\"},\n    {\"id\": \"e2\", \"name\": \"Open AI\",    \"type\": \"Organization\"},\n    {\"id\": \"e3\", \"name\": \"Anthropic\",  \"type\": \"Organization\"},\n]\n\ndetector   = DuplicateDetector()\nduplicates = detector.detect_duplicates(entities, threshold=0.85)\n# → [(\"e1\", \"e2\")]\n\nduplicates_v2 = detector.detect_duplicates(entities, threshold=0.85, strategy=\"semantic_v2\")\n```\n\n---\n\n## Reasoning Engines\n\n```python\nfrom semantica.reasoning import Reasoner\n\nreasoner = Reasoner()\nreasoner.add_rule(\"IF Person(?x) THEN Mortal(?x)\")\nreasoner.add_rule(\"IF Employee(?x) AND WorksAt(?x, ?y) THEN HasEmployer(?x, ?y)\")\n\nresults = reasoner.infer_facts([\n    \"Person(Socrates)\",\n    \"Employee(Alice)\",\n    {\"source_name\": \"Alice\", \"target_name\": \"OpenAI\", \"type\": \"WorksAt\"},\n])\n# → [\"Mortal(Socrates)\", \"HasEmployer(Alice, OpenAI)\"]\n```\n\n```python\nfrom semantica.reasoning import ReteEngine\n\nrete = ReteEngine()\nrete.add_rule({\n    \"name\": \"flag_high_risk_transaction\",\n    \"conditions\": [\n        {\"field\": \"amount\",  
\"operator\": \">\",  \"value\": 10000},\n        {\"field\": \"country\", \"operator\": \"in\", \"value\": [\"IR\", \"KP\", \"SY\"]},\n    ],\n    \"action\": \"flag_for_compliance_review\",\n})\nmatches = rete.match({\"amount\": 15000, \"country\": \"IR\", \"id\": \"txn_9921\"})\n```\n\n```python\nfrom semantica.reasoning import DeductiveReasoner, AbductiveReasoner\n\ndeductive = DeductiveReasoner()\ndeductive.add_axiom(\"All transformers use attention mechanisms\")\ndeductive.add_fact(\"BERT is a transformer\")\nconclusion = deductive.reason(\"Does BERT use attention?\")\n\nabductive = AbductiveReasoner()\nabductive.add_observation(\"The model accuracy dropped 12% after deployment\")\nhypotheses = abductive.generate_hypotheses()\n# → [\"Distribution shift in production data\", \"Preprocessing pipeline mismatch\", ...]\n```\n\n---\n\n## Provenance Tracking\n\nW3C PROV-O compliant lineage tracking. Every fact traces back to its origin.\n\n```python\nfrom semantica.kg import ProvenanceTracker, AlgorithmTrackerWithProvenance\n\ntracker = ProvenanceTracker()\ntracker.track_entity(\"gpt4_benchmark\",\n    source_url=\"https:\u002F\u002Fopenai.com\u002Fresearch\u002Fgpt-4\",\n    metadata={\"metric\": \"MMLU\", \"score\": 86.4})\n\nalgo_tracker = AlgorithmTrackerWithProvenance(provenance=True)\nalgo_tracker.track_graph_construction(\n    algorithm=\"node2vec\",\n    input_data={\"nodes\": 1500, \"edges\": 4200},\n    parameters={\"dimensions\": 128, \"walk_length\": 80},\n)\n\nsources      = tracker.get_all_sources(\"gpt4_benchmark\")\nall_entities = tracker.get_all_entities()\n```\n\n---\n\n## Vector Store & Hybrid Search\n\n```python\nfrom semantica.vector_store import VectorStore\n\nvs = VectorStore(backend=\"faiss\", dimension=768)\n\nvs.store(\"The Transformer architecture revolutionized NLP\",\n         metadata={\"source\": \"arxiv\", \"year\": 2017}, id=\"doc_001\")\nvs.store(\"BERT introduced bidirectional pre-training for language understanding\",\n         
metadata={\"source\": \"arxiv\", \"year\": 2018}, id=\"doc_002\")\n\nresults = vs.search(\"attention mechanisms in language models\", top_k=5)\n\nresults = vs.hybrid_search(\n    query=\"transformer pre-training\",\n    top_k=10,\n    vector_weight=0.6,\n    keyword_weight=0.4,\n)\n\nresults = vs.search(\"pre-training\", top_k=5, filter={\"year\": 2018})\n```\n\n---\n\n## Data Ingestion\n\n```python\nfrom semantica.ingest import FileIngestor, WebIngestor, DBIngestor\n\nfile_ingestor = FileIngestor(recursive=True)\ndocs = file_ingestor.ingest(\".\u002Fresearch_papers\u002F\")\n\nweb_ingestor = WebIngestor(max_depth=2)\nweb_docs = web_ingestor.ingest(\"https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762\")\n\ndb_ingestor = DBIngestor(connection_string=\"postgresql:\u002F\u002Fuser:pass@localhost\u002Fkg_db\")\ndb_docs = db_ingestor.ingest(query=\"SELECT title, abstract FROM papers WHERE year >= 2020\")\n\nall_sources = docs + web_docs + db_docs\n```\n\n```python\nfrom semantica.parse import DoclingParser\n\n# Advanced table and layout extraction\ndocling = DoclingParser()\nparsed  = docling.parse(\"financial_report.pdf\")\n```\n\n```python\nfrom semantica.ingest import SnowflakeIngestor\n\n# Connect to Snowflake and ingest a table\ningestor = SnowflakeIngestor(\n    account=\"myorg-myaccount\",\n    user=\"analyst\",\n    password=\"...\",\n    warehouse=\"COMPUTE_WH\",\n    database=\"ANALYTICS\",\n    schema=\"PUBLIC\",\n)\n\n# Ingest a table with optional filtering and pagination\ndata = ingestor.ingest_table(\n    table_name=\"customer_events\",\n    where=\"event_date >= '2024-01-01'\",\n    limit=10000,\n)\n\n# Or run a custom SQL query\ndata = ingestor.ingest_query(\n    query=\"SELECT id, content, tags FROM knowledge_base WHERE active = TRUE\",\n    batch_size=500,\n)\n\n# Convert to Semantica documents for downstream pipeline use\ndocs = ingestor.export_as_documents(data, id_field=\"id\", text_fields=[\"content\"])\n\n# Key-pair and OAuth auth are also 
supported via env vars:\n# SNOWFLAKE_PRIVATE_KEY_PATH, SNOWFLAKE_TOKEN, SNOWFLAKE_AUTHENTICATOR\n```\n\n---\n\n## Export\n\n```python\nfrom semantica.export import RDFExporter, ParquetExporter, ArangoAQLExporter\n\nrdf_exporter = RDFExporter()\nturtle   = rdf_exporter.export_to_rdf(kg, format=\"turtle\")\njsonld   = rdf_exporter.export_to_rdf(kg, format=\"json-ld\")\nntriples = rdf_exporter.export_to_rdf(kg, format=\"nt\")\n\nparquet_exporter = ParquetExporter()\nparquet_exporter.export_entities(kg,        path=\"output\u002Fentities.parquet\")\nparquet_exporter.export_relationships(kg,   path=\"output\u002Frelationships.parquet\")\nparquet_exporter.export_knowledge_graph(kg, path=\"output\u002F\")\n\naql_exporter = ArangoAQLExporter()\naql_exporter.export(kg, path=\"output\u002Finsert.aql\")\n```\n\n---\n\n## Pipeline Orchestration\n\n```python\nfrom semantica.pipeline import PipelineBuilder, PipelineValidator, FailureHandler\nfrom semantica.pipeline import RetryPolicy, RetryStrategy\n\nbuilder = (\n    PipelineBuilder()\n    .add_stage(\"ingest\",      FileIngestor(recursive=True))\n    .add_stage(\"extract\",     extract_triplets)\n    .add_stage(\"deduplicate\", DuplicateDetector())\n    .add_stage(\"build_kg\",    KnowledgeGraph())\n    .add_stage(\"export\",      RDFExporter())\n    .with_parallel_workers(4)\n)\n\nvalidator = PipelineValidator()\nresult    = validator.validate(builder)\nif result.valid:\n    pipeline = builder.build()\n    pipeline.run(input_path=\".\u002Fdocuments\u002F\")\n\nretry_policy = RetryPolicy(strategy=RetryStrategy.EXPONENTIAL_BACKOFF, max_retries=3)\nhandler = FailureHandler()\nhandler.handle_failure(error=last_error, policy=retry_policy, retry_count=1)\n```\n\n---\n\n## Ontology\n\n```python\nfrom semantica.ontology import OntologyGenerator, OntologyImporter\n\ngenerator = OntologyGenerator()\nontology  = generator.generate(kg)\ngenerator.export(ontology, path=\"domain_ontology.owl\", format=\"turtle\")\n\nimporter = 
OntologyImporter()\nontology = importer.load(\"existing_ontology.owl\")\nontology = importer.load(\"schema.ttl\", format=\"turtle\")\nontology = importer.load(\"context.jsonld\")\n```\n\n### SHACL Shape Generation & Validation\n\nSemantica turns ontologies into executable data contracts. The constraints layer completes a hybrid reasoning system — symbolic constraints (SHACL) alongside semantic retrieval (embeddings).\n\n**Phase 1 — Generate shapes from any ontology dict:**\n\n```python\nfrom semantica.ontology import OntologyEngine\n\nengine   = OntologyEngine()\nontology = engine.from_data(data)          # or engine.from_text(...) \u002F engine.to_owl(...)\n\n# Generate SHACL shapes — zero hand-authoring\nshacl_ttl  = engine.to_shacl(ontology)                        # Turtle string (default)\nshacl_jld  = engine.to_shacl(ontology, format=\"json-ld\")      # JSON-LD string\nshacl_nt   = engine.to_shacl(ontology, format=\"n-triples\")    # N-Triples string\n\n# Write to file\nengine.export_shacl(ontology, path=\"shapes\u002Fdomain.ttl\")\n```\n\n**Quality tiers — control constraint strictness:**\n\n```python\n# \"basic\"    — node shapes, property paths, datatypes, cardinality\n# \"standard\" — + enumerations (sh:in), patterns, inheritance propagation  [DEFAULT]\n# \"strict\"   — + sh:closed true on all shapes (rejects undeclared properties)\n\nshacl = engine.to_shacl(ontology, quality_tier=\"strict\")\n```\n\n**Phase 2 — Validate a graph against the shapes:**\n\n```python\nimport pathlib\n\nreport = engine.validate_graph(\n    data_graph=pathlib.Path(\"data\u002Fgraph.ttl\").read_text(),\n    ontology=ontology,   # auto-generates SHACL before validating\n    explain=True,        # populate plain-English explanations on each violation\n)\n\nprint(report.summary())\n# → \"Graph does NOT conform: 2 violation(s).\"\n\nfor v in report.violations:\n    print(v.explanation)\n# → \"Node \u003Chttps:\u002F\u002Fexample.com\u002Fjohn> is missing required property 
\u003Cex:name>. At least 1 value(s) are required.\"\n# → \"Node \u003Chttps:\u002F\u002Fexample.com\u002Facme> has value '999' for \u003Cex:employeeCount> but the expected datatype is xsd:string.\"\n\nimport json\nprint(json.dumps(report.to_dict(), indent=2))  # machine-readable — feed to LLM or pipeline\n```\n\n**Or validate against a pre-built SHACL file:**\n\n```python\nreport = engine.validate_graph(\n    data_graph=graph_turtle_string,\n    shacl=\"shapes\u002Fdomain.ttl\",   # path or SHACL string\n)\n```\n\n**Regenerate shapes in CI to detect breaking ontology changes:**\n\n```bash\npython -c \"\nfrom semantica.ontology import OntologyEngine\nimport json, pathlib\nengine = OntologyEngine()\nonto = engine.from_data(json.loads(pathlib.Path('ontology.json').read_text()))\nengine.export_shacl(onto, 'shapes\u002Fshapes.ttl')\n\"\ngit diff shapes\u002Fshapes.ttl   # detects breaking ontology changes\n```\n\n> **Requires pyshacl for `validate_graph()`:** `pip install semantica[shacl]`\n> Shape generation (`to_shacl`, `export_shacl`) works without any optional dependencies.\n\n---\n\n## Integrations\n\n**Graph Databases**\n- AWS Neptune — Amazon Neptune with IAM authentication\n- Apache AGE — PostgreSQL + openCypher via SQL\n- FalkorDB — native support for decision queries and causal analysis\n\n**Vector Databases**\n- FAISS — high-performance dense vector search\n- Pinecone — serverless and pod-based managed vector database (`pip install semantica[vectorstore-pinecone]`)\n- Weaviate — GraphQL-based vector store with rich schema management (`pip install semantica[vectorstore-weaviate]`)\n- Qdrant — collection-based store with payload filtering (`pip install semantica[vectorstore-qdrant]`)\n- Milvus — scalable store with partition support and multiple index types (`pip install semantica[vectorstore-milvus]`)\n- PgVector — PostgreSQL pgvector extension with JSONB metadata (`pip install semantica[vectorstore-pgvector]`)\n- In-memory — lightweight, zero-dependency store 
for development and testing\n\n**Data Sources**\n- Snowflake — `SnowflakeIngestor` for table\u002Fquery ingestion, schema introspection, pagination, and multiple auth methods (password, key-pair, OAuth, SSO) (`pip install semantica[db-snowflake]`)\n\n**Document Parsing**\n- Docling — PDF, DOCX, PPTX, XLSX with table and layout extraction\n\n**LLM Providers**\n- 100+ models via LiteLLM — OpenAI, Anthropic, Cohere, Mistral, Ollama, Azure, AWS Bedrock, and more\n- Novita AI — OpenAI-compatible provider (`deepseek\u002Fdeepseek-v3.2` and more); configure via `NOVITA_API_KEY`\n\n**Agentic Frameworks**\n- Complements LangChain, LlamaIndex, AutoGen, CrewAI, Google ADK, and more\n\n> **Agno — First-Class Integration** `pip install semantica[agno]`\n>\n> Semantica ships a dedicated Agno integration with five ready-to-use components:\n> - **`AgnoContextStore`** — graph-backed agent memory\n> - **`AgnoKnowledgeGraph`** — multi-hop GraphRAG knowledge base\n> - **`AgnoDecisionKit`** — 6 decision-intelligence tools\n> - **`AgnoKGToolkit`** — 7 knowledge-graph pipeline tools\n> - **`AgnoSharedContext`** — shared context graph for multi-agent teams\n\n**Export**\n- RDF: Turtle, JSON-LD, N-Triples, XML · Parquet · ArangoDB AQL\n\n---\n\n## Installation\n\n```bash\n# Core\npip install semantica\n\n# With all optional dependencies\npip install semantica[all]\n\n# Vector store backends (install only what you need)\npip install semantica[vectorstore-pinecone]\npip install semantica[vectorstore-weaviate]\npip install semantica[vectorstore-qdrant]\npip install semantica[vectorstore-milvus]\npip install semantica[vectorstore-pgvector]\n\n# SHACL validation (validate_graph)\npip install semantica[shacl]\n\n# Snowflake ingestion\npip install semantica[db-snowflake]\n\n# From source\ngit clone https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica.git\ncd semantica\npip install -e \".[dev]\"\n\n# Run tests\npytest tests\u002F\n```\n\n---\n\n## 🤝 Community & Support\n\n### Join Our 
Community\n\n| **Channel** | **Purpose** |\n|:-----------:|:-----------|\n| [**Discord**](https:\u002F\u002Fdiscord.gg\u002FsV34vps5hH) | Real-time help, showcases |\n| [**GitHub Discussions**](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Fdiscussions) | Q&A, feature requests |\n\n### Enterprise Support\n\nEnterprise support, professional services, and commercial licensing will be available in the future. For now, we offer community support through Discord and GitHub Discussions.\n\n**Current Support:**\n- **Community Support** - Free support via [Discord](https:\u002F\u002Fdiscord.gg\u002FsV34vps5hH) and [GitHub Discussions](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Fdiscussions)\n- **Bug Reports** - [GitHub Issues](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Fissues)\n\n**Future Enterprise Offerings:**\n- Professional support with SLA\n- Enterprise licensing\n- Custom development services\n- Priority feature requests\n- Dedicated support channels\n\nStay tuned for updates!\n\n### Who Semantica Is For\n\n- **AI \u002F ML engineers** — GraphRAG, explainable agents, decision tracing\n- **Data engineers** — governed semantic pipelines with full provenance\n- **Knowledge engineers** — ontology management and KG construction at scale\n- **High-stakes domains** — healthcare, finance, legal, cybersecurity, government\n\n---\n\n## Resources\n\n- [Documentation](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Ftree\u002Fmain\u002Fdocs)\n- [Cookbook & Notebooks](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Ftree\u002Fmain\u002Fcookbook)\n- [Contributing Guide](CONTRIBUTING.md)\n- [Changelog](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Freleases)\n- [💬 Discord Community](https:\u002F\u002Fdiscord.gg\u002FsV34vps5hH)\n- [Follow on X](https:\u002F\u002Fx.com\u002FBuildSemantica)\n\n---\n\n## Contributing\n\nAll contributions welcome — bug fixes, new 
features, tests, and docs.\n\n1. Fork the repo and create a branch\n2. `pip install -e \".[dev]\"`\n3. Write tests alongside your changes\n4. Open a PR and tag `@KaifAhmad1` for review\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for full guidelines.\n\n---\n\n\u003Cdiv align=\"center\">\n\nMIT License · Built by [Hawksight AI](https:\u002F\u002Fgithub.com\u002FHawksight-AI) · [⭐ Star on GitHub](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica)\n\n[GitHub](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica) • [Discord](https:\u002F\u002Fdiscord.gg\u002FsV34vps5hH)\n","\u003Cdiv align=\"center\">\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHawksight-AI_semantica_readme_b3f18647a21c.png\" alt=\"Semantica Logo\" width=\"420\"\u002F>\n\n# 🧠 Semantica\n\n**用于构建上下文图和 AI 决策智能层的框架**\n\n[![Python 3.8+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.8+-blue.svg)](https:\u002F\u002Fwww.python.org\u002F)\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT)\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fsemantica.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fsemantica\u002F)\n[![Version](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fversion-0.3.0-brightgreen.svg)](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Freleases\u002Ftag\u002Fv0.3.0)\n[![Total 
Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHawksight-AI_semantica_readme_0a351f725a09.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fsemantica)\n[![CI](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Fworkflows\u002FCI\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Factions)\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Join-5865F2?logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.gg\u002FsV34vps5hH)\n[![X](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FX-Follow-black?logo=x&logoColor=white)](https:\u002F\u002Fx.com\u002FBuildSemantica)\n\n### ⭐ 给我们点个 Star • 🍴 Fork 我们 • 💬 加入我们的 Discord • 🐦 在 X 上关注我们\n\n> **将混沌转化为智能。构建具备上下文图、决策追踪和高级知识工程的 AI 系统，使其可解释、可追溯且值得信赖——而不是黑箱。**\n\n\u003C\u002Fdiv>\n\n---\n\n## 问题\n\n如今的 AI 代理虽然功能强大，却缺乏可信度：\n\n- **无记忆结构** — 代理存储的是嵌入向量，而非语义信息。检索结果模糊不清，无法追问“为何会召回该内容”。\n- **无决策轨迹** — 代理持续做出决策，却未记录任何信息。一旦出现问题，便无历史记录可供调试或审计。\n- **无出处溯源** — 输出结果无法追溯到原始事实来源。在受监管的行业中，这成为合规性障碍。\n- **推理透明度不足** — 黑箱式的回答，缺乏对结论得出过程的解释。\n- **无冲突检测机制** — 相互矛盾的事实会在向量存储中悄然共存，导致输出结果不可预测。\n\n这些问题并非边缘情况，而是阻碍 AI 在医疗、金融、法律和政府等领域部署的根本原因，除非从头构建定制化的安全防护机制。\n\n## 解决方案\n\nSemantica 是您为 AI 技术栈添加的 **上下文与智能层**：\n\n- **上下文图** — 由您的代理在运行过程中构建的实体、关系和决策的结构化图谱。可查询、可追溯、持久化存储。\n- **决策智能** — 每个决策都是第一类对象：被记录、以因果关系链接、可通过先例搜索，并可分析其下游影响。\n- **出处溯源** — 每条事实都与其来源相关联。符合 W3C PROV-O 标准。从数据摄入到推理的完整 lineage。\n- **推理引擎** — 前向链式推理、Rete 网络、演绎推理、溯因推理以及 SPARQL 推理。提供可解释的推理路径，而非黑箱答案。\n- **去重与质量保证** — 冲突检测、实体解析和验证内置于管道中。\n\nSemantica 可与 LangChain、LlamaIndex、AutoGen、CrewAI 以及任何 LLM 提供商协同工作。它并非替代品，而是位于顶层的责任保障层。\n\n### ⚡ 快速安装\n\n```bash\npip install semantica\n```\n\n---\n\n## v0.3.0 
新特性\n\n> 首个稳定版本 — PyPI 上标记为 `Production\u002FStable`。分三个阶段发布：0.3.0-alpha、0.3.0-beta 和 0.3.0 稳定版。\n\n| 领域 | 亮点 |\n|------|-----------|\n| **上下文图** | 时间有效性窗口（`valid_from`\u002F`valid_until`）、加权 BFS（`min_weight`）、跨图导航（`link_graph`、`navigate_to`、`resolve_links`）及完整的保存\u002F加载持久化支持 |\n| **决策智能** | 完整生命周期：`record_decision` → `trace_decision_chain` → `analyze_decision_impact` → `find_similar_decisions`；混合先例搜索；带有版本化规则的 `PolicyEngine` |\n| **知识图算法** | PageRank、介数中心性、社区发现（Louvain）、Node2Vec 嵌入、链接预测、路径查找 — 所有结果均返回结构化字典 |\n| **语义抽取** | LLM 关系抽取修复（不再出现静默丢失）；`_match_pattern` 重写；移除重复关系错误；修正 `\"llm_typed\"` 元数据 |\n| **去重 v2** | `blocking_v2`\u002F`hybrid_v2` 候选生成（**快 63.6%**）；两阶段预过滤器（**快 18–25%**）；语义去重 v2（**快 6.98 倍**） |\n| **增量处理** | 基于 SPARQL 的增量差异计算；`delta_mode` 管道；通过 `prune_versions()` 进行快照版本管理 |\n| **导出** | RDF 格式别名（`\"ttl\"`、`\"json-ld\"` 等）；ArangoDB AQL 导出；Apache Parquet 导出（适用于 Spark\u002FBigQuery\u002FDatabricks） |\n| **管道** | 带有 LINEAR\u002FEXPONENTIAL\u002FFIXED 退避策略的 `FailureHandler`；返回 `ValidationResult` 的 `PipelineValidator`；重试循环修复 |\n| **图数据库后端** | Apache AGE（修复 SQL 注入漏洞）、AWS Neptune、FalkorDB、PgVector（HNSW\u002FIVFFlat 索引） |\n| **测试** | **886+ 项通过，0 项失败** — 包括 335 项上下文相关测试、约 430 项知识图测试、70 项语义抽取测试、85 项真实场景端到端测试 |\n\n完整按贡献者划分的说明请参阅 [RELEASE_NOTES.md](RELEASE_NOTES.md)，完整变更日志请参阅 [CHANGELOG.md](CHANGELOG.md)。\n\n---\n\n## 尚未发布 \u002F 即将推出\n\n| 领域 | 亮点 |\n|------|-----------|\n| **SHACL 约束** | `OntologyEngine.to_shacl()` 可自动从任意 OWL 本体推导 SHACL 形状；`validate_graph()` 返回结构化的 `SHACLValidationReport`，附带纯英文违规说明；三种质量等级（`\"basic\"`、`\"standard\"`、`\"strict\"`）；三种输出格式（Turtle、JSON-LD、N-Triples）；三级继承传播 |\n\n---\n\n## 功能\n\n### 上下文与决策智能\n- **上下文图** — 实体、关系和决策的结构化图谱；可查询、具因果关系、持久化存储\n- **决策追踪** — 使用 `add_decision()` 和 `record_decision()` 记录、链接并分析每个代理决策\n- **因果链条** — 使用 `add_causal_relationship()` 链接决策，用 `trace_decision_chain()` 追踪其沿袭\n- **先例搜索** — 使用 `find_similar_decisions()` 对过往决策进行混合相似度搜索\n- **影响力分析** — `analyze_decision_impact()` 和 `analyze_decision_influence()` — 了解决策的下游效应\n- **政策引擎** — 使用 
`check_decision_rules()` 强制执行业务规则；实现自动化合规验证\n- **代理记忆** — `AgentMemory` 提供短期\u002F长期存储、对话历史和统计数据\n- **跨系统上下文捕获** — `capture_cross_system_inputs()` 用于多代理流水线\n\n### 知识图谱\n- **知识图谱构建** — 实体、关系、属性、类型化边\n- **图算法** — PageRank、介数中心性、聚类系数、社区发现\n- **节点嵌入** — 通过 `NodeEmbedder` 生成 Node2Vec 嵌入\n- **相似度** — 通过 `SimilarityCalculator` 计算余弦相似度\n- **链接预测** — 通过 `LinkPredictor` 评分潜在的新边\n- **时序图** — 考虑时间的节点和边\n- **增量\u002F差异处理** — 在不完全重新计算的情况下更新图\n\n### 语义抽取\n- **实体抽取** — 命名实体识别、归一化、分类\n- **关系抽取** — 使用大语言模型或基于规则的方法从原始文本中生成三元组\n- **大语言模型类型化抽取** — 抽取带有类型化关系元数据的结果\n- **去重 v1** — Jaro-Winkler 相似度、基础分块\n- **去重 v2** — `blocking_v2`、`hybrid_v2`、`semantic_v2` 策略，配合 `max_candidates_per_entity`\n- **三元组去重** — 使用 `dedup_triplets()` 移除重复的 (主体, 谓语, 客体) 三元组\n\n### 推理引擎\n- **正向链式推理** — 使用 IF\u002FTHEN 字符串规则和字典事实的 `Reasoner`\n- **Rete 网络** — 高吞吐量生产规则匹配的 `ReteEngine`\n- **演绎推理** — 用于经典推理的 `DeductiveReasoner`\n- **溯因推理** — 根据观测结果生成假设的 `AbductiveReasoner`\n- **SPARQL 推理** — 用于 RDF 图上基于查询的推理的 `SPARQLReasoner`\n\n### 来源追踪与可审计性\n- **实体来源追踪** — `ProvenanceTracker.track_entity(id, source_url, metadata)`\n- **算法来源追踪** — `AlgorithmTrackerWithProvenance` 追踪计算 lineage\n- **图构建者来源追踪** — `GraphBuilderWithProvenance` 记录来自 URL 的实体来源 lineage\n- **W3C PROV-O 兼容** — 所有模块的 lineage 追踪\n- **变更管理** — 使用校验和进行版本控制，提供审计轨迹和支持合规性\n\n### 向量存储\n- **后端** — FAISS、Pinecone、Weaviate、Qdrant、Milvus、PgVector、内存中存储\n- **语义搜索** — 按嵌入相似度检索 top-k 结果\n- **混合搜索** — 向量 + 关键词，权重可配置\n- **过滤搜索** — 基于元数据对任意字段进行过滤\n- **自定义相似度权重** — 针对不同用例调整检索方式\n\n### 🌐 图数据库支持\n- **AWS Neptune** — 带有 IAM 认证的 Amazon Neptune 图数据库\n- **Apache AGE** — PostgreSQL 图扩展，通过 SQL 支持 openCypher\n- **FalkorDB** — 原生支持；`DecisionQuery` 和 `CausalChainAnalyzer` 可直接处理 FalkorDB 的行\u002F表头格式\n\n### 数据摄入\n- **文件格式** — PDF、DOCX、HTML、JSON、CSV、Excel、PPTX、压缩包\n- **网页爬取** — 可配置深度的 `WebIngestor`\n- **数据库** — 支持 SQL 查询的 `DBIngestor`\n- **Snowflake** — `SnowflakeIngestor` 支持表\u002F查询摄入、分页以及密钥对\u002FOAuth 认证\n- **Docling** — 高级文档解析，支持表格和版面提取（PDF、DOCX、PPTX、XLSX）\n- **媒体** — 图像 
OCR、音频\u002F视频元数据提取\n\n### 导出格式\n- **RDF** — Turtle (`.ttl`)、JSON-LD、N-Triples (`.nt`)、XML，通过 `RDFExporter` 输出\n- **Parquet** — `ParquetExporter` 用于导出实体、关系及完整知识图谱\n- **ArangoDB AQL** — 通过 `ArangoAQLExporter` 生成可直接运行的 INSERT 语句\n- **OWL 本体** — 将生成的本体以 Turtle 或 RDF\u002FXML 格式导出\n- **SHACL 形状** — 通过 `RDFExporter.export_shacl()` 自动导出约束形状（`.ttl`、`.jsonld`、`.nt`、`.shacl`）\n\n### 流水线与生产\n- **流水线构建器** — `PipelineBuilder` 支持阶段串联和平行工作进程\n- **验证** — `PipelineValidator` 在执行前返回 `ValidationResult(valid, errors, warnings)`\n- **失败处理** — `FailureHandler` 提供 `RetryPolicy` 和 `RetryStrategy`（指数退避、固定等）\n- **并行处理** — 可为每个流水线阶段配置工作进程数量\n- **大语言模型提供商** — 通过 LiteLLM 支持 100 多种模型（OpenAI、Anthropic、Cohere、Mistral、Ollama 等）\n\n### 本体\n- **自动生成** — 通过 `OntologyGenerator` 从知识图谱推导 OWL 本体\n- **导入** — 通过 `OntologyImporter` 加载现有的 OWL、RDF、Turtle、JSON-LD 本体\n- **验证** — 兼容 HermiT\u002FPellet 的一致性检查\n- **SHACL 形状生成** — `OntologyEngine.to_shacl()` 会自动从任何 Semantica 本体字典中推导出 SHACL 节点和属性形状，无需手动编写，且具有确定性（同一本体生成相同形状）\n- **SHACL 验证** — `OntologyEngine.validate_graph()` 会将形状应用于数据图，并返回包含机器可读违规信息及英文说明的 `SHACLValidationReport`\n- **质量等级** — `\"basic\"`（结构 + 基数）、`\"standard\"`（+ 枚举、继承）、`\"strict\"`（+ `sh:closed` 拒绝未声明的属性）\n- **继承传播** — 子形状会自动包含所有祖先属性形状（最多 3 层以上），且循环安全\n- **三种输出格式** — Turtle (`.ttl`)、JSON-LD、N-Triples；可通过 `export_shacl()` 导出文件\n\n---\n\n## 模块\n\n| 模块 | 提供的功能 |\n|---|---|\n| `semantica.context` | 上下文图、智能体记忆、决策追踪、因果分析、先例检索、策略引擎 |\n| `semantica.kg` | 知识图谱构建、图算法、中心性分析、社区发现、嵌入、链接预测、溯源 |\n| `semantica.semantic_extract` | 命名实体识别、关系抽取、事件抽取、指代消解、三元组生成、大模型增强的抽取 |\n| `semantica.reasoning` | 正向链式推理、Rete网络、演绎推理、溯因推理、SPARQL推理、解释生成 |\n| `semantica.vector_store` | FAISS、Pinecone、Weaviate、Qdrant、Milvus、PgVector、内存存储；混合与过滤搜索 |\n| `semantica.export` | RDF（Turtle\u002FJSON-LD\u002FN-Triples\u002FXML）、Parquet、ArangoDB AQL、CSV、YAML、OWL、图格式 |\n| `semantica.ingest` | 文件（PDF、DOCX、CSV、HTML）、网页爬取、信息流、数据库、Snowflake、MCP、电子邮件、代码仓库 |\n| `semantica.ontology` | 
自动化生成（6阶段流程）、OWL\u002FRDF导出、导入（OWL\u002FRDF\u002FTurtle\u002FJSON-LD）、验证、版本控制、**SHACL形状生成与验证** |\n| `semantica.pipeline` | 流水线DSL、并行工作进程、验证、重试策略、故障处理、资源调度 |\n| `semantica.graph_store` | 图数据库后端——Neo4j、FalkorDB、Apache AGE、Amazon Neptune；Cypher查询 |\n| `semantica.embeddings` | 文本嵌入生成——Sentence-Transformers、FastEmbed、OpenAI、BGE；相似度计算 |\n| `semantica.deduplication` | 实体去重、相似度评分、合并、聚类；阻塞与语义策略 |\n| `semantica.provenance` | 符合W3C PROV-O标准的端到端血缘追踪、来源归因、审计轨迹 |\n| `semantica.parse` | 文档解析——PDF、DOCX、PPTX、HTML、代码、电子邮件、结构化数据、含OCR的媒体 |\n| `semantica.split` | 文档分块——递归式、语义式、实体感知式、关系感知式、基于图的、本体感知式 |\n| `semantica.normalize` | 数据标准化——文本、实体、日期、数字、数量、语言、编码 |\n| `semantica.conflicts` | 多源冲突检测（值、类型、关系、时间、逻辑）及解决策略 |\n| `semantica.change_management` | 版本存储、变更追踪、校验和、审计轨迹、KG与本体的合规支持 |\n| `semantica.triplet_store` | RDF三元组存储集成——Blazegraph、Jena、RDF4J；SPARQL查询与批量加载 |\n| `semantica.visualization` | KG、本体、嵌入、分析结果及时间序列图的交互式与静态可视化 |\n| `semantica.seed` | 种子数据管理——用于从CSV、JSON、数据库和API构建初始KG |\n| `semantica.core` | 框架编排、配置管理、知识库构建、插件系统 |\n| `semantica.llms` | 大模型提供商集成——Groq、OpenAI、Novita AI、HuggingFace、LiteLLM |\n| `semantica.utils` | 共享工具——日志记录、验证、异常处理、常量、类型、进度跟踪 |\n\n---\n\n## ⚡ 快速入门\n\n```python\nimport semantica\nfrom semantica.context import AgentContext, ContextGraph\nfrom semantica.vector_store import VectorStore\n\n# 构建具有结构化上下文的智能体\ncontext = AgentContext(\n    vector_store=VectorStore(backend=\"faiss\", dimension=768),\n    knowledge_graph=ContextGraph(advanced_analytics=True),\n    decision_tracking=True,\n    kg_algorithms=True,\n)\n\n# 存储记忆\nmemory_id = context.store(\n    \"GPT-4在推理基准测试中比GPT-3.5高出40%\",\n    conversation_id=\"research_session_1\",\n)\n\n# 记录带有完整上下文的决策\ndecision_id = context.record_decision(\n    category=\"model_selection\",\n    scenario=\"为生产推理流水线选择大模型\",\n    reasoning=\"GPT-4的基准优势足以证明其价格高出3倍的合理性\",\n    outcome=\"selected_gpt4\",\n    confidence=0.91,\n    entities=[\"gpt4\", \"gpt35\", \"reasoning_pipeline\"],\n)\n\n# 查找历史中的相似决策\nprecedents = 
context.find_precedents(\"模型选择推理\", limit=5)\n\n# 分析该决策的下游影响\ninfluence = context.analyze_decision_influence(decision_id)\n```\n\n**[📖 完整快速入门](#-quick-start)** • **[🍳 食谱示例](#-semantica-cookbook)** • **[💬 加入Discord](https:\u002F\u002Fdiscord.gg\u002FsV34vps5hH)** • **[⭐ 给我们点个赞](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica)**\n\n---\n\n## 核心价值主张\n\n| **可信** | **可解释** | **可审计** |\n|:------------------:|:------------------:|:-----------------:|\n| 冲突检测与验证 | 透明的推理路径 | 完整的溯源追踪 |\n| 基于规则的治理 | 实体关系与本体 | 符合W3C PROV-O标准的血缘 |\n| 生产级质量保证 | 多跳图推理 | 来源追踪与完整性验证 |\n\n---\n\n## 关键特性与优势\n\n### 不只是又一个智能体框架\n\n**Semantica补充了** LangChain、LlamaIndex、AutoGen、CrewAI、Google ADK、Agno等框架，通过以下方式增强您的智能体：\n\n| 特性 | 优势 |\n|:--------|:--------|\n| **上下文图** | 结构化的知识表示，包含实体关系与语义上下文 |\n| **决策追踪** | 完整的决策生命周期管理，支持先例检索与因果分析 |\n| **KG算法** | 高级图数据分析，包括中心性、社区发现和嵌入 |\n| **向量存储集成** | 混合搜索，支持自定义相似度权重与高级过滤 |\n| **可审计** | 完整的溯源追踪，符合W3C PROV-O标准 |\n| **可解释** | 透明的推理路径，结合实体关系 |\n| **溯源感知** | 从文档到响应的端到端血缘 |\n| **已验证** | 内置冲突检测、去重与质量保证 |\n| **受治理** | 基于规则的验证与语义一致性 |\n| **版本控制** | 企业级变更管理，确保完整性 |\n\n### 非常适合高风险应用场景\n\n| 🏥 **医疗保健** | 💰 **金融** | ⚖️ **法律** |\n|:-----------------:|:--------------:|:------------:|\n| 临床决策 | 欺诈检测 | 证据支持的研究 |\n| 药物相互作用 | 监管支持 | 合同分析 |\n| 患者安全 | 风险评估 | 判例法推理 |\n\n| 🔒 **网络安全** | 🏛️ **政府** | 🏭 **基础设施** | 🚗 **自动驾驶** |\n|:-------------------:|:----------------:|:-------------------:|:-----------------:|\n| 威胁归因 | 政策决策 | 电网 | 决策日志 |\n| 事件响应 | 机密信息 | 交通运输 | 安全验证 |\n\n### 驱动您的 AI 技术栈\n\n- **上下文图谱** — 带有实体关系和语义上下文的结构化知识表示\n- **决策追踪系统** — 全面的决策生命周期管理，支持先例检索与因果分析\n- **GraphRAG 系统** — 结合图推理与混合搜索的检索系统，采用知识图谱算法\n- **AI 代理** — 可信、可问责的多智能体系统，具备语义记忆与决策历史\n- **推理模型** — 提供带有推理路径与影响分析的可解释 AI 决策\n- **企业级 AI** — 受治理、可审计的平台，支持合规性与政策执行\n\n### 集成\n\n- **Docling 支持** — 文档解析与表格提取（PDF、DOCX、PPTX、XLSX）\n- **AWS Neptune** — 支持 Amazon Neptune 图数据库，并提供 IAM 身份验证\n- **Apache AGE** — PostgreSQL 图扩展后端（通过 SQL 使用 openCypher）\n- **Snowflake** — 原生数据摄取工具 `SnowflakeIngestor`；支持表\u002F查询摄取、分页、密钥对及 OAuth 认证\n- 
**自定义本体导入** — 导入现有本体（OWL、RDF、Turtle、JSON-LD）\n\n> **专为需要每一条回答都可解释且受监管的环境打造。**\n\n---\n\n## 上下文图谱与决策追踪\n\nSemantica 的旗舰模块。将您的代理所做的每一个决策以结构化的图节点形式记录下来——包含因果链接、先例检索、影响分析以及政策执行。\n\n```python\nfrom semantica.context import ContextGraph\n\ngraph = ContextGraph(advanced_analytics=True)\n\n# 记录一笔贷款审批决策\nloan_id = graph.add_decision(\n    category=\"loan_approval\",\n    scenario=\"抵押贷款申请 — 信用评分780，DTI 28%\",\n    reasoning=\"良好的信用记录，稳定收入8年，低 DTI\",\n    outcome=\"批准\",\n    confidence=0.95,\n)\n\n# 记录下游的从属决策\nrate_id = graph.add_decision(\n    category=\"interest_rate\",\n    scenario=\"为已批准的抵押贷款设定利率\",\n    reasoning=\"优质申请人符合最低等级利率条件\",\n    outcome=\"利率定为6.2%\",\n    confidence=0.98,\n)\n\n# 将两个决策以因果关系连接\ngraph.add_causal_relationship(loan_id, rate_id, relationship_type=\"enables\")\n\n# 使用混合相似度查找类似的历史决策\nsimilar    = graph.find_similar_decisions(\"抵押贷款审批\", max_results=5)\nchain      = graph.trace_decision_chain(loan_id)\nimpact     = graph.analyze_decision_impact(loan_id)\ncompliance = graph.check_decision_rules({\"category\": \"loan_approval\", \"confidence\": 0.95})\ninsights   = graph.get_decision_insights()\n```\n\n```python\nfrom semantica.context import AgentContext, AgentMemory\nfrom semantica.vector_store import VectorStore\n\ncontext = AgentContext(\n    vector_store=VectorStore(backend=\"inmemory\"),\n    knowledge_graph=ContextGraph(advanced_analytics=True),\n    decision_tracking=True,\n    graph_expansion=True,\n    kg_algorithms=True,\n)\n\ncontext.store(\"欧盟法规 2024\u002F1689 要求高风险 AI 必须具备可解释性\", conversation_id=\"compliance_review\")\ncontext.store(\"我们的欺诈模型会标记 0.3% 的交易\", conversation_id=\"compliance_review\")\n\nresults = context.retrieve(\"AI 法规中的可解释性要求\", limit=3)\nhistory = context.get_conversation_history(\"compliance_review\")\nstats   = context.get_statistics()\n```\n\n---\n\n## 知识图谱\n\n```python\nfrom semantica.kg import KnowledgeGraph, Entity, Relationship\nfrom semantica.kg import CentralityAnalyzer, NodeEmbedder, LinkPredictor\n\nkg = 
KnowledgeGraph()\n\nkg.add_entity(Entity(id=\"transformer\", label=\"Transformer\", type=\"Architecture\",\n                     properties={\"year\": 2017, \"paper\": \"Attention Is All You Need\"}))\nkg.add_entity(Entity(id=\"bert\", label=\"BERT\", type=\"Model\",\n                     properties={\"year\": 2018, \"parameters\": \"340M\"}))\nkg.add_entity(Entity(id=\"gpt4\", label=\"GPT-4\", type=\"Model\", properties={\"year\": 2023}))\n\nkg.add_relationship(Relationship(source=\"bert\", target=\"transformer\", type=\"based_on\"))\nkg.add_relationship(Relationship(source=\"gpt4\", target=\"transformer\", type=\"based_on\"))\n\n# 图算法\nanalyzer    = CentralityAnalyzer(kg)\ncentrality  = analyzer.compute_pagerank()\nbetweenness = analyzer.compute_betweenness()\n\n# 节点嵌入（Node2Vec）\nembedder   = NodeEmbedder()\nembeddings = embedder.compute_embeddings(kg, node_labels=[\"Model\"], relationship_types=[\"based_on\"])\n\n# 链接预测\npredictor = LinkPredictor()\nscore     = predictor.score_link(kg, \"gpt4\", \"bert\", method=\"common_neighbors\")\n\nmodels      = kg.find_nodes(type=\"Model\")\ndescendants = kg.get_neighbors(\"transformer\", direction=\"incoming\")\n```\n\n---\n\n## 语义抽取\n\n```python\nfrom semantica.semantic_extract import extract_entities, extract_relations, extract_triplets\n\ntext = \"\"\"\nOpenAI 于 2023 年 3 月发布了 GPT-4。微软将 GPT-4 集成到了 Azure OpenAI 服务中。\n由前 OpenAI 研究人员创立的 Anthropic 发布了 Claude 作为竞争模型。\n\"\"\"\n\nentities = extract_entities(text)\n# → [Entity(label=\"OpenAI\", type=\"Organization\"), Entity(label=\"GPT-4\", type=\"Model\"), ...]\n\nrelations = extract_relations(text)\n# → [Relation(source=\"OpenAI\", type=\"released\", target=\"GPT-4\"), ...]\n\ntriplets = extract_triplets(text)\n```\n\n```python\nfrom semantica.deduplication import DuplicateDetector\n\nentities = [\n    {\"id\": \"e1\", \"name\": \"OpenAI Inc.\", \"type\": \"Organization\"},\n    {\"id\": \"e2\", \"name\": \"Open AI\",    \"type\": \"Organization\"},\n    {\"id\": \"e3\", 
\"name\": \"Anthropic\",  \"type\": \"Organization\"},\n]\n\ndetector   = DuplicateDetector()\nduplicates = detector.detect_duplicates(entities, threshold=0.85)\n# → [(\"e1\", \"e2\")]\n\nduplicates_v2 = detector.detect_duplicates(entities, threshold=0.85, strategy=\"semantic_v2\")\n```\n\n---\n\n## 推理引擎\n\n```python\nfrom semantica.reasoning import Reasoner\n\nreasoner = Reasoner()\nreasoner.add_rule(\"IF Person(?x) THEN Mortal(?x)\")\nreasoner.add_rule(\"IF Employee(?x) AND WorksAt(?x, ?y) THEN HasEmployer(?x, ?y)\")\n\nresults = reasoner.infer_facts([\n    \"Person(Socrates)\",\n    \"Employee(Alice)\",\n    {\"source_name\": \"Alice\", \"target_name\": \"OpenAI\", \"type\": \"WorksAt\"},\n])\n# → [\"Mortal(Socrates)\", \"HasEmployer(Alice, OpenAI)\"]\n```\n\n```python\nfrom semantica.reasoning import ReteEngine\n\nrete = ReteEngine()\nrete.add_rule({\n    \"name\": \"flag_high_risk_transaction\",\n    \"conditions\": [\n        {\"field\": \"amount\",  \"operator\": \">\",  \"value\": 10000},\n        {\"field\": \"country\", \"operator\": \"in\", \"value\": [\"IR\", \"KP\", \"SY\"]},\n    ],\n    \"action\": \"flag_for_compliance_review\",\n})\nmatches = rete.match({\"amount\": 15000, \"country\": \"IR\", \"id\": \"txn_9921\"})\n```\n\n```python\nfrom semantica.reasoning import DeductiveReasoner, AbductiveReasoner\n\ndeductive = DeductiveReasoner()\ndeductive.add_axiom(\"所有 Transformer 模型都使用注意力机制\")\ndeductive.add_fact(\"BERT 是一种 Transformer\")\nconclusion = deductive.reason(\"BERT 是否使用注意力机制？\")\n\nabductive = AbductiveReasoner()\nabductive.add_observation(\"模型部署后准确率下降了 12%\")\nhypotheses = abductive.generate_hypotheses()\n\n# → [\"生产数据中的分布偏移\", \"预处理管道不匹配\", ...]\n```\n\n---\n\n## 出处追踪\n\n符合 W3C PROV-O 标准的谱系追踪。每个事实都能追溯到其源头。\n\n```python\nfrom semantica.kg import ProvenanceTracker, AlgorithmTrackerWithProvenance\n\ntracker = ProvenanceTracker()\ntracker.track_entity(\"gpt4_benchmark\",\n    
source_url=\"https:\u002F\u002Fopenai.com\u002Fresearch\u002Fgpt-4\",\n    metadata={\"metric\": \"MMLU\", \"score\": 86.4})\n\nalgo_tracker = AlgorithmTrackerWithProvenance(provenance=True)\nalgo_tracker.track_graph_construction(\n    algorithm=\"node2vec\",\n    input_data={\"nodes\": 1500, \"edges\": 4200},\n    parameters={\"dimensions\": 128, \"walk_length\": 80},\n)\n\nsources      = tracker.get_all_sources(\"gpt4_benchmark\")\nall_entities = tracker.get_all_entities()\n```\n\n---\n\n## 向量存储与混合搜索\n\n```python\nfrom semantica.vector_store import VectorStore\n\nvs = VectorStore(backend=\"faiss\", dimension=768)\n\nvs.store(\"Transformer 架构彻底革新了自然语言处理\",\n         metadata={\"source\": \"arxiv\", \"year\": 2017}, id=\"doc_001\")\nvs.store(\"BERT 引入了用于语言理解的双向预训练\",\n         metadata={\"source\": \"arxiv\", \"year\": 2018}, id=\"doc_002\")\n\nresults = vs.search(\"语言模型中的注意力机制\", top_k=5)\n\nresults = vs.hybrid_search(\n    query=\"Transformer 预训练\",\n    top_k=10,\n    vector_weight=0.6,\n    keyword_weight=0.4,\n)\n\nresults = vs.search(\"预训练\", top_k=5, filter={\"year\": 2018})\n```\n\n---\n\n## 数据摄取\n\n```python\nfrom semantica.ingest import FileIngestor, WebIngestor, DBIngestor\n\nfile_ingestor = FileIngestor(recursive=True)\ndocs = file_ingestor.ingest(\".\u002Fresearch_papers\u002F\")\n\nweb_ingestor = WebIngestor(max_depth=2)\nweb_docs = web_ingestor.ingest(\"https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762\")\n\ndb_ingestor = DBIngestor(connection_string=\"postgresql:\u002F\u002Fuser:pass@localhost\u002Fkg_db\")\ndb_docs = db_ingestor.ingest(query=\"SELECT title, abstract FROM papers WHERE year >= 2020\")\n\nall_sources = docs + web_docs + db_docs\n```\n\n```python\nfrom semantica.parse import DoclingParser\n\n# 高级表格和布局提取\ndocling = DoclingParser()\nparsed  = docling.parse(\"financial_report.pdf\")\n```\n\n```python\nfrom semantica.ingest import SnowflakeIngestor\n\n# 连接到 Snowflake 并摄取一张表\ningestor = SnowflakeIngestor(\n    
account=\"myorg-myaccount\",\n    user=\"analyst\",\n    password=\"...\",\n    warehouse=\"COMPUTE_WH\",\n    database=\"ANALYTICS\",\n    schema=\"PUBLIC\",\n)\n\n# 摄取一张表，可选进行过滤和分页\ndata = ingestor.ingest_table(\n    table_name=\"customer_events\",\n    where=\"event_date >= '2024-01-01'\",\n    limit=10000,\n)\n\n# 或者运行自定义 SQL 查询\ndata = ingestor.ingest_query(\n    query=\"SELECT id, content, tags FROM knowledge_base WHERE active = TRUE\",\n    batch_size=500,\n)\n\n# 转换为 Semantica 文档，供下游流程使用\ndocs = ingestor.export_as_documents(data, id_field=\"id\", text_fields=[\"content\"])\n\n# 支持通过环境变量进行密钥对和 OAuth 认证：\n# SNOWFLAKE_PRIVATE_KEY_PATH、SNOWFLAKE_TOKEN、SNOWFLAKE_AUTHENTICATOR\n```\n\n---\n\n## 导出\n\n```python\nfrom semantica.export import RDFExporter, ParquetExporter, ArangoAQLExporter\n\nrdf_exporter = RDFExporter()\nturtle   = rdf_exporter.export_to_rdf(kg, format=\"turtle\")\njsonld   = rdf_exporter.export_to_rdf(kg, format=\"json-ld\")\nntriples = rdf_exporter.export_to_rdf(kg, format=\"nt\")\n\nparquet_exporter = ParquetExporter()\nparquet_exporter.export_entities(kg,        path=\"output\u002Fentities.parquet\")\nparquet_exporter.export_relationships(kg,   path=\"output\u002Frelationships.parquet\")\nparquet_exporter.export_knowledge_graph(kg, path=\"output\u002F\")\n\naql_exporter = ArangoAQLExporter()\naql_exporter.export(kg, path=\"output\u002Finsert.aql\")\n```\n\n---\n\n## 流水线编排\n\n```python\nfrom semantica.pipeline import PipelineBuilder, PipelineValidator, FailureHandler\nfrom semantica.pipeline import RetryPolicy, RetryStrategy\n\nbuilder = (\n    PipelineBuilder()\n    .add_stage(\"ingest\",      FileIngestor(recursive=True))\n    .add_stage(\"extract\",     extract_triplets)\n    .add_stage(\"deduplicate\", DuplicateDetector())\n    .add_stage(\"build_kg\",    KnowledgeGraph())\n    .add_stage(\"export\",      RDFExporter())\n    .with_parallel_workers(4)\n)\n\nvalidator = PipelineValidator()\nresult    = validator.validate(builder)\n
if result.valid:\n    pipeline = builder.build()\n    pipeline.run(input_path=\".\u002Fdocuments\u002F\")\n\nretry_policy = RetryPolicy(strategy=RetryStrategy.EXPONENTIAL_BACKOFF, max_retries=3)\nhandler = FailureHandler()\nhandler.handle_failure(error=last_error, policy=retry_policy, retry_count=1)\n```\n\n---\n\n## 本体\n\n```python\nfrom semantica.ontology import OntologyGenerator, OntologyImporter\n\ngenerator = OntologyGenerator()\nontology  = generator.generate(kg)\ngenerator.export(ontology, path=\"domain_ontology.owl\", format=\"turtle\")\n\nimporter = OntologyImporter()\nontology = importer.load(\"existing_ontology.owl\")\nontology = importer.load(\"schema.ttl\", format=\"turtle\")\nontology = importer.load(\"context.jsonld\")\n```\n\n### SHACL 形状生成与验证\n\nSemantica 将本体转化为可执行的数据契约。约束层完善了混合推理系统——符号约束（SHACL）与语义检索（嵌入）相结合。\n\n**阶段 1 — 从任意本体字典生成形状：**\n\n```python\nfrom semantica.ontology import OntologyEngine\n\nengine   = OntologyEngine()\nontology = engine.from_data(data)          # 或 engine.from_text(...) 
\u002F engine.to_owl(...)\n\n# 生成 SHACL 形状——无需手动编写\nshacl_ttl  = engine.to_shacl(ontology)                        # Turtle 字符串（默认）\nshacl_jld  = engine.to_shacl(ontology, format=\"json-ld\")      # JSON-LD 字符串\nshacl_nt   = engine.to_shacl(ontology, format=\"n-triples\")    # N-Triples 字符串\n\n# 写入文件\nengine.export_shacl(ontology, path=\"shapes\u002Fdomain.ttl\")\n```\n\n**质量等级——控制约束严格程度：**\n\n```python\n# \"basic\"    — 节点形状、属性路径、数据类型、基数\n# \"standard\" — + 枚举（sh:in）、模式、继承传播  [默认]\n# \"strict\"   — + sh:closed true 在所有形状上（拒绝未声明的属性）\n\nshacl = engine.to_shacl(ontology, quality_tier=\"strict\")\n```\n\n**阶段 2 — 根据形状验证图：**\n\n```python\nimport pathlib\n\nreport = engine.validate_graph(\n    data_graph=pathlib.Path(\"data\u002Fgraph.ttl\").read_text(),\n    ontology=ontology,   # 自动在验证前生成 SHACL\n    explain=True,        # 为每项违规添加通俗易懂的解释\n)\n\nprint(report.summary())\n# → \"图不符合规范：2 处违规。\"\n\nfor v in report.violations:\n    print(v.explanation)\n# → \"节点 \u003Chttps:\u002F\u002Fexample.com\u002Fjohn> 缺少必需的属性 \u003Cex:name>。至少需要 1 个值。\"\n\n# → “节点 \u003Chttps:\u002F\u002Fexample.com\u002Facme> 在 \u003Cex:employeeCount> 上的值为 '999'，但预期的数据类型是 xsd:string。”\n\nimport json\nprint(json.dumps(report.to_dict(), indent=2))  # 机器可读 — 可输入到 LLM 或流水线中\n```\n\n**或者使用预构建的 SHACL 文件进行验证：**\n\n```python\nreport = engine.validate_graph(\n    data_graph=graph_turtle_string,\n    shacl=\"shapes\u002Fdomain.ttl\",   # 路径或 SHACL 字符串\n)\n```\n\n**在 CI 中重新生成形状以检测破坏性本体变更：**\n\n```bash\npython -c \"\nfrom semantica.ontology import OntologyEngine\nimport json, pathlib\nengine = OntologyEngine()\nonto = engine.from_data(json.loads(pathlib.Path('ontology.json').read_text()))\nengine.export_shacl(onto, 'shapes\u002Fshapes.ttl')\n\"\ngit diff shapes\u002Fshapes.ttl   # 检测破坏性本体变更\n```\n\n> **`validate_graph()` 需要 pyshacl：** `pip install semantica[shacl]`\n> 形状生成（`to_shacl`、`export_shacl`）无需任何可选依赖即可运行。\n\n---\n\n## 集成\n\n**图数据库**\n- AWS Neptune — 带有 IAM 认证的 Amazon Neptune\n- Apache AGE — 通过 SQL 使用 
PostgreSQL + openCypher\n- FalkorDB — 原生支持决策查询和因果分析\n\n**向量数据库**\n- FAISS — 高性能稠密向量搜索\n- Pinecone — 无服务器且基于 Pod 的托管向量数据库（`pip install semantica[vectorstore-pinecone]`）\n- Weaviate — 基于 GraphQL 的向量存储，具有丰富的模式管理功能（`pip install semantica[vectorstore-weaviate]`）\n- Qdrant — 基于集合的存储，支持负载过滤（`pip install semantica[vectorstore-qdrant]`）\n- Milvus — 具有分区支持和多种索引类型的可扩展存储（`pip install semantica[vectorstore-milvus]`）\n- PgVector — 带有 JSONB 元数据的 PostgreSQL pgvector 扩展（`pip install semantica[vectorstore-pgvector]`）\n- 内存中 — 轻量级、零依赖的存储，适用于开发和测试\n\n**数据源**\n- Snowflake — `SnowflakeIngestor` 用于表\u002F查询摄取、模式自省、分页以及多种认证方式（密码、密钥对、OAuth、SSO）（`pip install semantica[db-snowflake]`）\n\n**文档解析**\n- Docling — 支持 PDF、DOCX、PPTX、XLSX，并可提取表格和布局\n\n**LLM 提供商**\n- 通过 LiteLLM 支持 100 多种模型 — OpenAI、Anthropic、Cohere、Mistral、Ollama、Azure、AWS Bedrock 等\n- Novita AI — 兼容 OpenAI 的提供商（`deepseek\u002Fdeepseek-v3.2` 等）；可通过 `NOVITA_API_KEY` 进行配置\n\n**代理框架**\n- 补充 LangChain、LlamaIndex、AutoGen、CrewAI、Google ADK 等\n\n> **Agno — 一流集成** `pip install semantica[agno]`\n>\n> Semantica 提供专门的 Agno 集成，包含五个开箱即用的组件：\n> - **`AgnoContextStore`** — 基于图的代理记忆\n> - **`AgnoKnowledgeGraph`** — 多跳 GraphRAG 知识库\n> - **`AgnoDecisionKit`** — 6 种决策智能工具\n> - **`AgnoKGToolkit`** — 7 种知识图谱流水线工具\n> - **`AgnoSharedContext`** — 用于多代理团队的共享上下文图\n\n**导出**\n- RDF：Turtle、JSON-LD、N-Triples、XML · Parquet · ArangoDB AQL\n\n---\n\n## 安装\n\n```bash\n# 核心包\npip install semantica\n\n# 包含所有可选依赖\npip install semantica[all]\n\n# 向量存储后端（仅安装所需）\npip install semantica[vectorstore-pinecone]\npip install semantica[vectorstore-weaviate]\npip install semantica[vectorstore-qdrant]\npip install semantica[vectorstore-milvus]\npip install semantica[vectorstore-pgvector]\n\n# SHACL 验证（validate_graph）\npip install semantica[shacl]\n\n# Snowflake 数据摄取\npip install semantica[db-snowflake]\n\n# 从源码安装\ngit clone https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica.git\ncd semantica\npip install -e \".[dev]\"\n\n# 运行测试\npytest tests\u002F\n```\n\n---\n\n## 🤝 
社区与支持\n\n### 加入我们的社区\n\n| **频道** | **用途** |\n|:-----------:|:-----------|\n| [**Discord**](https:\u002F\u002Fdiscord.gg\u002FsV34vps5hH) | 实时帮助、案例展示 |\n| [**GitHub Discussions**](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Fdiscussions) | 问答、功能请求 |\n\n### 企业支持\n\n未来将提供企业支持、专业服务和商业许可。目前，我们通过 Discord 和 GitHub Discussions 提供社区支持。\n\n**当前支持：**\n- **社区支持** - 通过 [Discord](https:\u002F\u002Fdiscord.gg\u002FsV34vps5hH) 和 [GitHub Discussions](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Fdiscussions) 提供的免费支持\n- **Bug 报告** - [GitHub Issues](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Fissues)\n\n**未来的企业服务：**\n- 带有 SLA 的专业支持\n- 企业许可\n- 定制开发服务\n- 优先处理的功能请求\n- 专属支持渠道\n\n敬请期待更新！\n\n---\n\n## 适用人群\n\n- **AI \u002F ML 工程师** — GraphRAG、可解释代理、决策追踪\n- **数据工程师** — 受治理的语义管道，具备完整溯源能力\n- **知识工程师** — 大规模本体管理和知识图谱构建\n- **高风险领域** — 医疗保健、金融、法律、网络安全、政府\n\n---\n\n## 资源\n\n- [文档](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Ftree\u002Fmain\u002Fdocs)\n- [食谱与笔记本](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Ftree\u002Fmain\u002Fcookbook)\n- [贡献指南](CONTRIBUTING.md)\n- [变更日志](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Freleases)\n- [💬 Discord 社区](https:\u002F\u002Fdiscord.gg\u002FsV34vps5hH)\n- [关注 X](https:\u002F\u002Fx.com\u002FBuildSemantica)\n\n---\n\n## 贡献\n\n欢迎所有贡献 — Bug 修复、新功能、测试和文档。\n\n1. Fork 仓库并创建分支\n2. `pip install -e \".[dev]\"`\n3. 在修改的同时编写测试\n4. 
打开 PR 并标记 `@KaifAhmad1` 进行审查\n\n完整指南请参阅 [CONTRIBUTING.md](CONTRIBUTING.md)。\n\n---\n\n\u003Cdiv align=\"center\">\n\nMIT 许可证 · 由 [Hawksight AI](https:\u002F\u002Fgithub.com\u002FHawksight-AI) 构建 · [⭐ 在 GitHub 上点赞](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica)\n\n[GitHub](https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica) • [Discord](https:\u002F\u002Fdiscord.gg\u002FsV34vps5hH)","# Semantica 快速上手指南\n\nSemantica 是一个用于构建 AI 上下文图谱（Context Graphs）和决策智能层（Decision Intelligence Layers）的框架。它旨在解决当前 AI 代理缺乏记忆结构、决策追踪、数据来源溯源及推理透明度等问题，让 AI 系统变得可解释、可追踪且可信。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Linux, macOS, 或 Windows\n*   **Python 版本**：3.8 或更高版本 (推荐 3.9+)\n*   **包管理工具**：pip\n*   **前置依赖**：无特殊系统级依赖，核心功能通过 Python 包自动安装。若需使用特定图数据库后端（如 Apache AGE, AWS Neptune），请确保相关数据库服务已就绪。\n\n## 安装步骤\n\n### 1. 基础安装\n使用 pip 直接安装最新稳定版：\n\n```bash\npip install semantica\n```\n\n### 2. 国内加速安装（推荐）\n如果您在中国大陆地区，建议使用国内镜像源以加快下载速度：\n\n```bash\npip install semantica -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 3. 验证安装\n安装完成后，可通过以下命令检查版本：\n\n```bash\npython -c \"import semantica; print(semantica.__version__)\"\n```\n\n## 基本使用\n\nSemantica 的核心功能是构建上下文图谱并记录决策过程。以下是一个最小化的使用示例，展示如何创建图谱、添加实体关系、记录决策并进行因果追踪。\n\n### 示例：构建上下文图谱与决策追踪\n\n```python\nfrom semantica.context import ContextGraph, AgentMemory\nfrom semantica.decision import DecisionEngine\n\n# 1. 初始化上下文图谱和代理记忆\ngraph = ContextGraph()\nmemory = AgentMemory(graph)\ndecision_engine = DecisionEngine(graph)\n\n# 2. 构建知识图谱 (添加实体和关系)\n# 添加实体\ngraph.add_entity(\"Patient_001\", type=\"Person\", properties={\"age\": 45, \"symptom\": \"fever\"})\ngraph.add_entity(\"Medication_A\", type=\"Drug\", properties={\"name\": \"Aspirin\"})\n\n# 添加关系\ngraph.add_relationship(\"Patient_001\", \"has_symptom\", \"fever\")\ngraph.add_relationship(\"Medication_A\", \"treats\", \"fever\")\n\n# 3. 
记录 AI 决策\n# 模拟一个 AI 代理做出的治疗决策\ndecision_id = decision_engine.record_decision(\n    agent_id=\"MedicalAgent_v1\",\n    action=\"prescribe\",\n    target=\"Medication_A\",\n    context={\"patient\": \"Patient_001\", \"reason\": \"fever_reduction\"},\n    metadata={\"confidence\": 0.95}\n)\n\n# 4. 建立因果链\n# 将决策与之前的观察事实联系起来\ndecision_engine.add_causal_relationship(\n    cause_entity=\"fever\",\n    effect_decision=decision_id,\n    relationship_type=\"triggered_by\"\n)\n\n# 5. 查询与追溯\n# 追踪决策链路，查看该决策是如何产生的\nchain = decision_engine.trace_decision_chain(decision_id)\nprint(f\"决策链路: {chain}\")\n\n# 6. 查找类似先例\n# 基于混合相似度搜索过去的类似决策\nsimilar_decisions = decision_engine.find_similar_decisions(\n    query_context={\"symptom\": \"fever\"},\n    top_k=3\n)\nprint(f\"相似先例数量：{len(similar_decisions)}\")\n\n# 7. 持久化保存 (可选)\n# graph.save(\"my_context_graph.json\") \n```\n\n### 核心概念说明\n\n*   **ContextGraph**: 存储实体、关系及决策的结构化图谱，支持时间窗口和加权遍历。\n*   **DecisionEngine**: 用于记录决策、建立因果链、分析决策影响及搜索历史先例。\n*   **Provenance**: 所有添加的数据和决策默认支持来源追踪（符合 W3C PROV-O 标准），便于审计。\n\n此框架可与 LangChain、LlamaIndex 等主流 AI 开发库协同工作，作为上层的“问责层”使用，无需替换现有的 LLM 提供商或代理框架。","某金融合规团队正在构建一个自动化信贷审批 AI 助手，需要依据不断更新的监管政策和客户历史数据做出放贷决策，并随时接受审计审查。\n\n### 没有 semantica 时\n- **决策黑盒难审计**：AI 直接给出“拒绝贷款”的结论，但无法追溯是依据哪条法规或哪个数据点做出的判断，导致合规部门无法通过审计。\n- **事实冲突无感知**：向量数据库中同时存在客户“已还清债务”和“当前逾期”的矛盾嵌入记录，AI 随机采信其一，产生不可预测的错误结果。\n- **缺乏因果链条**：当政策更新时，无法快速定位哪些历史决策受到了旧政策影响，难以进行批量回溯和修正。\n- **记忆只有碎片**：系统仅存储模糊的语义相似度匹配，丢失了实体间明确的逻辑关系（如“公司 A 是公司 B 的子公司”），导致推理能力薄弱。\n\n### 使用 semantica 后\n- **全链路可解释**：semantica 将每个审批决策记录为独立对象，自动生成从原始数据到最终结论的完整推理路径，审计人员可一键查看“为何被拒”。\n- **自动冲突检测**：在知识写入阶段，semantica 自动识别并标记相互矛盾的事实（如还款状态冲突），强制人工或规则介入解决，杜绝逻辑谬误。\n- **动态影响分析**：利用决策智能功能，团队可反向追踪受特定法规变更影响的所有历史案例，迅速完成合规性重估。\n- **结构化上下文图谱**：构建包含时间有效性的上下文图谱，明确实体间的因果与从属关系，让 AI 基于严谨的逻辑网络而非模糊匹配进行推理。\n\nsemantica 通过将混沌的非结构化数据转化为可追溯、可解释的决策智能层，让金融 AI 从“不可信的黑盒”变成了“透明的合规专家”。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHawksight-AI_semantica_b3f18647.png","Hawksight-AI","Hawksight 
AI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FHawksight-AI_1354b94b.png","An open-source AI research lab advancing ethical, transparent, and community-driven intelligence systems across domains—ensuring truth, trust, and accessibility",null,"https:\u002F\u002Fgithub.com\u002FHawksight-AI",[79],{"name":80,"color":81,"percentage":82},"Python","#3572A5",100,943,153,"2026-04-07T00:30:38","MIT",1,"未说明",{"notes":90,"python":91,"dependencies":92},"该工具是一个用于构建上下文图谱和决策智能层的框架，支持多种图数据库后端（如 AWS Neptune, Apache AGE, FalkorDB, PgVector）和向量存储后端（如 FAISS, Pinecone, Weaviate）。它通过 LiteLLM 支持 100+ 种 LLM 模型。README 中未明确提及具体的操作系统、GPU 或内存硬性要求，表明其可能主要依赖 CPU 运行核心逻辑，或在连接外部 LLM\u002F向量库时依赖相应服务端的资源。安装方式为直接 pip install semantica。","3.8+",[64,93,94,95,96,97,98,99,100],"LiteLLM","FAISS","PgVector","Docling","Apache AGE","FalkorDB","AWS Neptune","Snowflake",[16,13,35,14],[103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122],"ai-agents","graphrag","knowledge-engineering","rag","semantic-layer","agentic-ai","semantic-web","context-management","graph-analytics","ai-infrastructure","data-infrastructure","developer-tools","llmops","python-library","agent-memory","graph-modeling","knowledge-graphs","ontology-engineering","schema-design","context-graph","2026-03-27T02:49:30.150509","2026-04-07T22:51:03.309800",[126,131,136,141,146,150],{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},22876,"为什么使用 `format=\"ttl\"` 导出 RDF 时会报错，提示不支持该格式？","这是因为 `RDFExporter` 目前不直接接受 \"ttl\" 作为格式别名，尽管它是标准的文件扩展名。解决方法是使用完整的格式名称 `format=\"turtle\"` 代替。开发团队已确认这是一个需要修复的问题，计划在未来版本中添加格式别名映射（如将 \"ttl\" 自动映射为 \"turtle\"），但在当前版本中请显式使用 \"turtle\"。","https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Fissues\u002F355",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},22877,"在处理大型知识图谱（如 5 万个节点）时，为什么会出现 502 网关超时错误？","这是由于 `ContextGraph` 的分页机制存在性能缺陷：它在应用 `skip` 和 `limit` 分页参数之前，先将整个图谱加载到内存中（O(N) 
复杂度），导致内存分配过大并阻塞事件循环。临时解决方案是减小单次请求的图谱规模或增加服务器超时时间。该问题已在后续更新中通过改用迭代器（如 `itertools.islice`）直接处理分页得到修复，将复杂度降低为 O(limit)。","https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Fissues\u002F430",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},22878,"Semantica 中的基准测试（Benchmarking）、评估框架（Evaluation）和质量保证（QA）有什么区别？","三者关注点不同且互补：1. **基准测试 (Benchmarking)**：衡量系统**性能**（如速度、吞吐量、延迟、可扩展性），解决“有多快”的问题；2. **评估框架 (Evaluation)**：衡量结果的**准确性\u002F质量**，解决“有多正确”的问题；3. **质量保证 (QA)**：检测并修复**数据质量问题**，解决“数据有多干净”的问题。生产级系统需要同时具备这三项能力。","https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Fissues\u002F231",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},22879,"如何对大型数据集进行高效更新，避免每次重新处理所有数据？","可以使用 Semantica 的**增量\u002F增量处理 (Incremental\u002FDelta Processing)** 功能。该功能通过捕获图谱快照并计算快照之间的差异（新增或删除的三元组），使管道只需处理变化的数据而非全量数据。这能显著降低计算成本和处理时间，适用于需要频繁更新的大型数据集。相关模块位于 `semantica\u002Fchange_management\u002F` 和 `semantica\u002Ftriplet_store\u002F`。","https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\u002Fissues\u002F323",{"id":147,"question_zh":148,"answer_zh":149,"source_url":135},22880,"在可视化图谱时遇到从 \"Nothing\" 到 \"Nothing\" 的连线错误（黑洞节点问题）该如何解决？","这是一个已知的前端渲染边缘情况，通常发生在数据中存在未正确解析的节点引用时。维护者建议在数据处理逻辑中加入**排除逻辑 (exclusion logic)**，在传递给前端渲染引擎之前过滤掉这些无效的空节点或空边。确保在 `context_graph.py` 相关的数据准备步骤中检查并清理此类异常数据。",{"id":151,"question_zh":152,"answer_zh":153,"source_url":130},22881,"新手在哪里可以找到关于导出 RDF 数据（如 Turtle 格式）的示例代码？","官方入门笔记本系列中的 `15_Export.ipynb` 本应包含相关示例。如果遇到缺失，可以直接参考 `semantica\u002Fexport` 模块下的 `RDFExporter` 类文档。使用时需实例化 `RDFExporter` 并调用 `export` 方法，注意格式参数需使用 `\"turtle\"` 而非 `\"ttl\"`（见其他 FAQ）。社区贡献者正在完善文档以填补这一空白。",[155,160,165,170,175,180,185,190,195,200,205,210,214,218,223],{"id":156,"version":157,"summary_zh":158,"released_at":159},136641,"v0.3.0","# 🧠 Semantica v0.3.0 — 首个稳定版\n\n**发布日期:** 2026-03-10 &nbsp;|&nbsp; **PyPI:** `pip install semantica` &nbsp;|&nbsp; **Python:** 3.8 – 3.12 &nbsp;|&nbsp; **许可证:** MIT\n\n> Semantica 的首个 `生产\u002F稳定` 
版本——一个用于构建上下文图和 AI 代理决策智能层的开源框架。本次发布整合了此前三个阶段的所有功能：**0.3.0-alpha**（2026-02-19）、**0.3.0-beta**（2026-03-07）以及 **0.3.0 稳定版**（2026-03-10）。\n\n```bash\npip install --upgrade semantica\n```\n\n> **无破坏性变更。** 所有新参数均带有安全默认值。所有新方法均为纯新增功能。\n\n---\n\n## 🚦 发布亮点\n\n- 🕐 **时间有效性** — 节点与边支持 `valid_from`\u002F`valid_until` 属性；可查询任意时间点的活跃内容\n- 🔗 **跨图导航** — 可链接独立的 `ContextGraph` 实例；支持跨图导航；保存与加载后仍保持完整\n- ⚖️ **加权 BFS 遍历** — 可通过 `min_weight` 参数按边置信度过滤多跳查询\n- 🧠 **决策智能** — 完整生命周期：记录 → 因果链 → 影响分析 → 先例搜索 → 政策执行\n- 🔄 **增量处理** — 基于 SPARQL 的增量图差异计算；仅变更数据会流经管道\n- 🗃️ **去重 v2** — 语义去重速度提升 6.98 倍，候选生成速度提升 63.6%\n- 📤 **新增导出格式** — ArangoDB AQL、Apache Parquet（适用于 Spark\u002FBigQuery\u002FDatabricks）\n- 🗄️ **图数据库后端** — Apache AGE、PgVector、AWS Neptune、FalkorDB\n- ✅ **886+ 测试全部通过 — 无失败**\n\n---\n\n## 👥 贡献者\n\n| 贡献者 | 贡献领域 |\n|-------------|-------|\n| [@KaifAhmad1](https:\u002F\u002Fgithub.com\u002FKaifAhmad1) | 主要维护者 — 上下文图、决策智能、知识图谱算法、语义抽取、流水线、溯源、Bug 修复、版本管理 |\n| [@ZohaibHassan16](https:\u002F\u002Fgithub.com\u002FZohaibHassan16) | 去重 v2 套件、增量\u002FDelta 处理、基准测试套件 |\n| [@Sameer6305](https:\u002F\u002Fgithub.com\u002FSameer6305) | Apache AGE 后端、PgVector 存储、Snowflake 连接器、Apache Arrow 导出 |\n| [@tibisabau](https:\u002F\u002Fgithub.com\u002Ftibisabau) | ArangoDB AQL 导出、Apache Parquet 导出 |\n| [@d4ndr4d3](https:\u002F\u002Fgithub.com\u002Fd4ndr4d3) | 修复 ResourceScheduler 死锁问题 |\n\n---\n\n## ✨ v0.3.0 稳定版 — 上下文图功能完整性\n\n> 2026-03-10 发布 · 所有更改均由 [@KaifAhmad1](https:\u002F\u002Fgithub.com\u002FKaifAhmad1) 完成\n\n### 🕐 时间有效性窗口\n\n节点和边现具备一等公民级别的 `valid_from` \u002F `valid_until` ISO 日期时间字段——直接存储在 `ContextNode` 和 `ContextEdge` 数据类中，而非深埋于元数据内。\n\n**新增 API:**\n- `add_node(valid_from=..., valid_until=...)` 和 `add_edge(valid_from=..., valid_until=...)` — 在创建时设置有效性窗口\n- `node.is_active(at_time=None)` 和 `edge.is_active(at_time=None)` — 如果在指定时间处于活跃状态则返回 `True`（默认为当前时间）\n- `graph.find_active_nodes(node_type=None, at_time=None)` — 
对整个图进行筛选以","2026-03-10T22:07:44",{"id":161,"version":162,"summary_zh":163,"released_at":164},136642,"v0.3.0-beta","# Semantica v0.3.0-beta — 发行说明\n\n**日期:** 2026-03-07 | **标签:** `v0.3.0-beta` | **状态:** 内部测试版（预发布）\n\n> 整合并整合所有 Alpha 版及未发布的功能，以便在公开发布 0.3.0 版本之前进行内部验证。\n\n---\n\n## 新增内容\n\n### 语义提取与推理\n- **多创始人 LLM 提取修复** (#354) — 对于无法匹配的关系主体\u002F客体，现在会生成合成的 `UNKNOWN` 实体，而不是静默丢弃；所有 LLM 返回的联合创始人将被保留\n- **推理引擎模式匹配重写** (#354) — `_match_pattern` 现在能够正确处理多词值、预先绑定的变量、重复的变量引用以及非贪婪分隔符\n\n### 导出\n- **RDF \u002F TTL 别名修复** (#355) — `format=\"ttl\"`、`\"nt\"`、`\"xml\"`、`\"rdf\"`、`\"json-ld\"` 均可正常解析，且不会破坏现有调用方\n- **ArangoDB AQL 导出** (#342) — 完整的 AQL INSERT 语句生成，适用于顶点和边；支持配置批量操作；17 个测试通过\n- **Apache Parquet 导出** (#343) — 列式存储，支持可配置的压缩算法（snappy、gzip、brotli、zstd、lz4）；明确指定 Arrow 模式；25 个测试通过\n\n### 去重 v2（Epic #333）\n- **候选生成 v2** (#338) — 引入 `blocking_v2` 和 `hybrid_v2` 策略，支持多键和音似阻塞；最坏情况下的性能提升 **63.6%**\n- **两阶段评分预过滤器** (#339) — 在昂贵的语义评分之前进行快速预过滤；批处理速度提升 **18–25%**\n- **语义去重 v2** (#340) — 可选的 `semantic_v2` 功能，包含规范化、O(1) 哈希匹配和加权评分；性能提升 **6.98 倍**；修复了无限递归 bug\n- **迁移指南** (#344) — 提供 `MIGRATION_V2.md` 文档，包含完整示例；确认性能提升 **5.86 倍**；向后兼容\n\n### 增量\u002F增量处理\n- **增量处理** (#349) — 原生 SPARQL 支持图快照之间的增量计算；新增 `delta_mode` 流水线配置项；提供 `prune_versions()` 用于快照保留；已具备生产就绪条件，适用于近实时流水线\n\n---\n\n## Bug 修复\n\n- **`NameError` — 缺少 `Type` 导入** 在 `utils\u002Fhelpers.py` 中；移除了 `config_manager.py` 中未使用的导入\n- **上下文模块** — 修复了 `retrieve_decision_precedents`、`hybrid_retrieval`、`dynamic_context_traversal`、`multi_hop_context_assembly`、`_retrieve_from_vector`、`_extract_entities_from_query`；新增了缺失的 `expand_context` 和 `_get_decision_query` 方法\n- **知识图谱模块** — 修复了 `calculate_pagerank`、`community_detector._to_networkx`、`detect_communities`、`_build_adjacency`；新增了 `ProvenanceTracker` 和 9 种领域特定的追踪方法\n- **流水线模块** — 修复了 `execution_engine` 中的重试循环；新增了 `RecoveryAction`，支持 LINEAR \u002F EXPONENTIAL \u002F FIXED 退避策略；修复了 `add_step` 的返回值；新增了 `validate` 别名\n- **测试文件** — 将表情符号替换为 ASCII 字符，以提高 Windows cp1252 兼容性；修复了 4 个测试文件中的断言顺序和循环相关 
bug\n\n---\n\n## 测试结果\n\n| 通过 | 跳过（外部服务） | 失败 |\n|---|---|--","2026-03-07T11:28:23",{"id":166,"version":167,"summary_zh":168,"released_at":169},136643,"v0.3.0-alpha","## 🎉 Semantica v0.3.0-alpha 版本发布\n\n此 Alpha 版本引入了全面的决策跟踪能力、先进的知识图谱算法以及可用于测试的生产就绪架构。\n\n### 🚀 主要特性\n\n#### **决策跟踪系统**\n- 完整的决策生命周期管理与审计追踪\n- 来源追踪与血缘管理\n- 政策合规性与异常处理\n- 决策影响分析与影响评分\n\n#### **高级知识图谱算法**\n- 基于 Node2Vec 的嵌入用于语义相似度计算\n- 中心性分析（度中心性、介数中心性、接近中心性、特征向量中心性）\n- 社区检测与图数据分析\n- 路径查找与链接预测\n\n#### **增强的上下文模块**\n- 统一的 AgentContext，配备细粒度的功能开关\n- 与决策跟踪系统的集成\n- 生产就绪的架构设计及验证\n- GraphStore 能力验证\n\n#### **向量存储功能**\n- 混合搜索，结合语义、结构和类别相似度\n- 可配置权重的高级检索\n- 集成 FastEmbed，实现高效操作\n\n### 🧪 测试与质量\n- 上下文模块与核心模块共通过 113+ 项测试\n- 全面的决策跟踪测试覆盖率\n- 增强的错误处理与边界情况测试\n- 已修复所有关键测试失败，确保发布就绪\n\n### 📦 安装\n```bash\npip install semantica==0.3.0a0","2026-02-19T18:46:07",{"id":171,"version":172,"summary_zh":173,"released_at":174},136644,"v0.2.7","## 概述\n版本 0.2.7 新增了 Snowflake 集成、Apache Arrow 导出以及基准测试套件。\n\n## 🚀 新特性\n\n### 用于数据摄取的 Snowflake 连接器\n**PR #276，作者 @Sameer6305**\n\n原生 Snowflake 连接器，支持多种身份验证方式（密码、OAuth、密钥对、SSO）。包含表\u002F查询摄取、模式自省以及 SQL 注入防护功能。\n\n**测试用例**：24\u002F24 通过  \n**依赖项**：`db-snowflake` 可选\n\n### Apache Arrow 导出支持\n**PR #273，作者 @Sameer6305**\n\n高性能列式导出，支持显式模式定义、压缩，并与 Pandas 和 DuckDB 兼容。\n\n**测试用例**：20\u002F20 通过  \n**依赖项**：`db-arrow` 可选\n\n### 全面的基准测试套件\n**PR #289，作者 @ZohaibHassan16、@KaifAhmad1**\n\n覆盖所有模块的 137+ 个基准测试，具备回归检测和 CI\u002FCD 集成功能。\n\n**特性**：统计分析、环境无关设计、命令行工具\n\n## 📊 质量保证\n- **总测试用例数**：44\u002F44 通过\n- **破坏性变更**：无\n- **向后兼容**：是\n\n## 🛠 安装\n```bash\npip install semantica==0.2.7\npip install semantica[db-snowflake,db-arrow]==0.2.7\n```\n\n## 🙏 贡献者\n- **@Sameer6305**：Snowflake 连接器、Arrow 导出\n- **@ZohaibHassan16**：基准测试套件实现\n- **@KaifAhmad1**：基准测试优化、CI\u002FCD 集成\n\n## 🔗 链接\n- **GitHub**：https:\u002F\u002Fgithub.com\u002FHawksight-AI\u002Fsemantica\n- **PyPI**：https:\u002F\u002Fpypi.org\u002Fproject\u002Fsemantica\u002F\n- **基准测试**：`python benchmarks\u002Fbenchmark_runner.py`\n\n## 📈 性能\n- **文本处理**：>10,000 
操作\u002F秒\n- **Arrow 导出**：速度提升 10 倍\n- **基准测试覆盖率**：137+ 个测试\n\n---\n\n**感谢所有贡献者使本次发布成为可能！**","2026-02-09T07:26:03",{"id":176,"version":177,"summary_zh":178,"released_at":179},136645,"v0.2.6","# Semantica v0.2.6\n\n**发布日期：** 2026年2月3日\n\n我们很高兴地宣布 Semantica v0.2.6 正式发布！本次版本带来了元数据溯源追踪、变更管理等方面的重大增强，以及多项重要的错误修复！\n\n---\n\n## 🎉 亮点\n\n### 重大特性\n\n- **符合 W3C PROV-O 标准的元数据溯源追踪** - 覆盖全部 17 个模块的企业级血缘追踪\n- **增强的变更管理** - 知识图谱和本体的版本控制\n- **CSV 数据导入优化** - 自动检测与健壮的错误处理机制\n- **全面的测试覆盖** - 导入模块覆盖率达 80%-86%\n\n### 错误修复\n\n- LLM 提供商的温度参数兼容性问题\n- JenaStore 空图初始化问题\n\n---\n\n## ✨ 新特性与改进\n\n### 符合 W3C PROV-O 标准的元数据溯源追踪\n**PRs：** #254、#246 | **贡献者：** @KaifAhmad1\n\n一个全面的元数据溯源追踪系统，完全符合 W3C PROV-O 标准，覆盖 Semantica 的所有 17 个模块。\n\n**核心模块：**\n- `ProvenanceManager` 用于集中式追踪\n- W3C PROV-O 模式（活动、实体、代理）\n- 存储后端：内存存储与 SQLite\n- SHA-256 完整性校验\n\n**模块集成：**\n- 语义抽取、LLM（Groq、OpenAI、HuggingFace、LiteLLM）\n- 流水线、上下文、数据导入、嵌入\n- 图\u002F向量\u002F三元组存储\n- 推理、冲突解决、去重\n- 导出、解析、归一化、本体、可视化\n\n**特性：**\n- 完整的血缘追踪：文档 → 块 → 实体 → 关系 → 图谱\n- LLM 追踪：令牌、成本、延迟\n- 源数据追踪及领域转换桥接公理\n\n**合规性：**\n- W3C PROV-O、FDA 21 CFR Part 11、SOX、HIPAA、TNFD\n\n**测试：**\n- 237 个测试用例，覆盖核心功能、所有 17 个模块的集成、边缘场景及向后兼容性\n\n**设计：**\n- 默认关闭溯源功能（`provenance=False`），用户可按需启用\n- 无破坏性变更\n- 不引入新依赖\n\n---\n\n### 增强的变更管理模块\n**PRs：** #248、#243 | **贡献者：** @KaifAhmad1\n\n面向知识图谱和本体的企业级版本控制功能，支持持久化存储与审计日志。\n\n**核心类：**\n- `TemporalVersionManager` - 知识图谱版本管理\n- `OntologyVersionManager` - 本体版本管理\n- `ChangeLogEntry` - 变更元数据追踪\n\n**存储：**\n- SQLite（持久化）与内存后端\n- 线程安全操作\n\n**特性：**\n- 使用 SHA-256 校验码确保完整性\n- 详细的实体\u002F关系差异对比\n- 本体结构比较\n- 邮箱验证功能\n\n**合规性：**\n- HIPAA、SOX、FDA 21 CFR Part 11\n- 不可变的审计日志\n\n**测试：**\n- 104 个测试用例（100% 通过）\n- 包括单元测试、集成测试、合规性测试、性能测试及边缘场景测试\n\n**性能：**\n- 处理 1 万个实体耗时 17.6 毫秒\n- 并发操作可达每秒 510 次以上\n- 可高效处理包含 5,000+ 个实体的图谱\n\n**迁移：**\n- 向后兼容\n- 类名简化\n- 无需外部依赖\n\n---\n\n### CSV 数据导入增强\n**PR：** #244 | **贡献者：** @saloni0318\n\n健壮的","2026-02-03T05:10:11",{"id":181,"version":182,"summary_zh":183,"released_at":184},136646,"v0.2.5","# Semantica v0.2.5\n\n## 🚀 
发布亮点\n本次发布带来了原生的 **Pinecone 向量存储** 支持、可配置的 **LLM 重试逻辑**，以及对 **语义提取** 模块的重大增强，包括对自定义 Hugging Face 模型（BYOM）的 robust 支持、NER\u002F关系抽取功能的改进，以及三元组抽取逻辑的完善。\n\n## 🌟 新特性\n\n### Pinecone 向量存储支持\n- 实现了原生的 `PineconeStore`，具备完整的 CRUD 能力。\n- 支持无服务器和基于 Pod 的索引、命名空间，以及元数据过滤。\n- 与统一的 `VectorStore` 接口及注册表完全集成。\n- *(关闭 #219，解决 #220)*\n\n### 可配置的 LLM 重试逻辑\n- 在 `NERExtractor`、`RelationExtractor` 和 `TripletExtractor` 中暴露了 `max_retries` 参数。\n- 默认设置为 3 次重试，以优雅地处理 JSON 验证失败或 API 超时问题。\n- 将重试配置传播到分块处理辅助工具中，确保长文档处理的一致性。\n\n### Bring Your Own Model (BYOM) 支持\n- **自定义 Hugging Face 模型**：在 `NERExtractor`、`RelationExtractor` 和 `TripletExtractor` 中实现了对自定义模型的全面支持。\n- **自定义分词器**：新增对具有非标准分词需求模型的支持。\n- **运行时覆盖**：`extract(model=...)` 现在能够正确覆盖配置默认值。\n\n### 增强的提取能力\n- **NER**：新增可配置的聚合策略（`simple`、`first`、`average`、`max`），并实现了 robust 的 IOB\u002FBILOU 解析。\n- **关系抽取**：实现了标准实体标记技术（`\u003Csubj>`、`\u003Cobj>`）以及结构化输出解析。\n- **三元组抽取**：针对 Seq2Seq 模型（如 REBEL）添加了专用解析逻辑，可直接从文本中生成结构化的三元组。\n\n## 🐛 Bug 修复\n\n- **LLM 提取稳定性**：通过严格限制 `max_retries` 来避免无限重试循环。\n- **模型参数优先级**：解决了配置默认值覆盖运行时参数的问题。\n- **导入处理**：通过改进的 mock 策略，修复了测试套件中的循环导入问题。\n\n## 📦 安装\n```bash\npip install semantica==0.2.5\n```","2026-01-27T16:26:12",{"id":186,"version":187,"summary_zh":188,"released_at":189},136647,"v0.2.4","### 新增\n- **本体入库模块**：\n  - 实现了用于解析 RDF\u002FOWL 文件（Turtle、RDF\u002FXML、JSON-LD、N3）的 `OntologyIngestor`。\n  - 添加了 `ingest_ontology` 方法，并统一了 `ingest(source_type=\"ontology\")` 接口。\n  - 增加了递归目录扫描功能，支持批量本体入库。\n  - 新增了用于存储一致元数据的 `OntologyData` 数据类。\n- **文档**：\n  - 更新了 `ontology_usage.md` 和 `ontology.md`，增加了使用示例和 API 详细信息。\n- **测试**：\n  - 添加了全面的测试套件 `tests\u002Fingest\u002Ftest_ontology_ingestor.py`。\n  - 新增了 `examples\u002Fdemo_ontology_ingest.py`，用于端到端演示。","2026-01-22T07:20:46",{"id":191,"version":192,"summary_zh":193,"released_at":194},136648,"v0.2.3","我们很高兴地宣布 **Semantica v0.2.3**！本次发布重点提升了稳定性、性能和开发者体验，包括对 LLM 关系抽取的关键修复、高性能向量存储入库功能的优化，以及循环依赖问题的解决。\n\n## 🚀 新增功能\n\n### **向量存储高性能入库**\n- **新增 `add_documents` 
API**：支持高吞吐量的入库操作，具备自动嵌入生成、批处理和并行处理能力。\n- **`embed_batch` 辅助函数**：高效为文本列表生成嵌入向量，无需立即存储。\n- **并行处理默认启用**：在 `VectorStore` 中默认启用并行入库（默认 `max_workers=6`），以提升处理速度。\n- **文档更新**：新增专用指南 `docs\u002Fvector_store_usage.md`，介绍高性能配置方法。\n- **测试用例**：新增 `tests\u002Fvector_store\u002Ftest_vector_store_parallel.py`，涵盖并行与串行性能对比及边缘场景测试。\n\n### **Amazon Neptune 开发环境**\n- **CloudFormation 模板**：新增 `cookbook\u002Fintroduction\u002Fneptune-setup.yaml`，用于部署带有公共端点和 IAM 身份验证的开发用 Neptune 集群。\n- **文档更新**：更新 `cookbook\u002Fintroduction\u002F21_Amazon_Neptune_Store.ipynb`，包含部署指南、成本估算及 IAM 最佳实践。\n- **代码检查**：将 `cfn-lint` 添加到 pre-commit 钩子中，用于 CloudFormation 模板的验证。\n\n### **全面的测试套件**\n- **单元测试**：新增 `tests\u002Ftest_relations_llm.py`，覆盖关系抽取的类型化和结构化响应路径。\n- **集成测试**：新增 `tests\u002Fintegration\u002Ftest_relations_groq.py`，用于实际 Groq API 的验证。\n\n## 🐛 修复问题\n\n### **LLM 关系抽取解析**\n- **零关系问题修复**：解决了即使 API 调用成功仍返回零结果的问题。\n- **响应规范化**：将 Instructor\u002FOpenAI\u002FGroq 返回的类型化响应统一为一致的字典格式。\n- **JSON 回退机制**：当类型化生成结果为空时，自动回退到结构化的 JSON 格式。\n- **参数清理**：移除了内部调用中不支持的关键字参数（如 `max_tokens`、`max_entities_prompt`），以避免 API 错误。\n\n### **Pipeline 循环导入**\n- **解决导入循环**：修复了 `pipeline_builder` 和 `pipeline_validator` 之间的循环依赖问题（Issue #192、#193）。\n- **延迟加载**：对 `PipelineValidator` 实现了延迟加载，确保模块导入的稳定性。\n\n### **JupyterLab 稳定性**\n- **进度输出控制**：新增 `SEMANTICA_DISABLE_JUPYTER_PROGRESS` 环境变量。\n- **内存优化**：当该变量启用时，会回退到控制台风格的输出，以防止 JupyterLab 因无限滚动表格而引发内存不足错误（Issue #181）。\n\n## ⚡ 变更\n\n### **关系抽取 API**\n- **接口简化**：移除未使用的关键字参数，防止参数泄漏。\n- **调试增强**：改进了错误处理和抽取流程中的详细日志记录。\n- **解析健壮性**：增强了对 API 响应的后处理解析能力。","2026-01-20T06:39:37",{"id":196,"version":197,"summary_zh":198,"released_at":199},136649,"v0.2.2","## 亮点\n\n- 所有核心抽取器均采用高吞吐量的**并行抽取引擎**。\n- 针对真实场景的抽取工作负载，性能大幅提升（约1.89倍加速）。\n- 示例和缓存机制中的**安全规范**进一步加强。\n- 更新了**Gemini SDK集成**及依赖约束，以确保安装更加稳定。\n\n---\n\n## 新增功能\n\n- **并行抽取引擎**\n  - 在所有核心抽取器中实现了并行批处理：\n    - `NERExtractor`、`RelationExtractor`、`TripletExtractor`\n    - `EventDetector`、`SemanticNetworkExtractor`\n  - 
为所有抽取器的`extract()`方法新增了`max_workers`参数，用户可根据CPU资源或速率限制调整并发度。\n  - 对大型文档的以下方法启用了**并行分块处理**：\n    - `_extract_entities_chunked`\n    - `_extract_relations_chunked`\n  - 增强了`ProgressTracker`的线程安全性，支持并发批量更新。\n\n- **语义抽取性能与回归测试**\n  - 添加了针对以下内容的回归测试套件：\n    - 默认最大工作线程数\n    - LLM提示中的实体过滤\n    - 抽取器复用场景\n  - 提供了一个可运行的基准测试脚本，用于测量以下组件的批处理延迟：\n    - `NERExtractor`、`RelationExtractor`、`TripletExtractor`\n    - `EventDetector`、`SemanticAnalyzer`、`SemanticNetworkExtractor`\n  - 当设置了`GROQ_API_KEY`时，新增了针对实体\u002F关系\u002F三元组的**Groq LLM冒烟测试**。\n\n---\n\n## 安全性\n\n- **凭据净化**\n  - 从8个教程笔记本中移除了硬编码的API密钥，以防止敏感信息泄露。\n  - 强制要求在所有示例中使用环境变量来设置`GROQ_API_KEY`。\n\n- **安全缓存**\n  - 更新了`ExtractionCache`，将敏感参数（如`api_key`、`token`、`password`等）排除在缓存键之外，从而实现安全的缓存共享。\n  - 将缓存键哈希算法由**MD5**升级为**SHA-256**，以提高抗碰撞能力。\n\n---\n\n## 变更\n\n- **Gemini SDK迁移**\n  - 将`GeminiProvider`迁移到新的`google-genai` SDK（`v0.1.0+`），以解决弃用问题。\n  - 增加了对`google.generativeai`的优雅回退机制，以保持向后兼容性。\n\n- **依赖解析**\n  - 将`opentelemetry-api`和`opentelemetry-sdk`固定为`1.37.0`版本，以解决pip冲突问题。\n  - 更新了`protobuf`和`grpcio`的约束条件，以提升稳定性。\n\n- **实体过滤范围**\n  - 从**非LLM**抽取流程中移除实体过滤，以避免准确率下降。\n  - 将实体筛选仅限于**LLM关系提示构建**阶段，同时仍会将返回的实体与原始完整列表进行匹配。\n\n- **批处理并发默认值**\n  - 统一了`semantic_extract`中`max_workers`的默认值：\n    - 基于机器学习的方法默认为单线程。\n    - 基于模式\u002F正则表达式\u002F规则\u002FLLM\u002FHuggingFace的方法则采用更高的并行度，但受CPU限制。\n  - 将全局`optimization.max_workers`的默认值提高到**8**，以提升批处理工作负载的吞吐量。\n\n---\n\n## 性能\n\n- **瓶颈优化（GitHub Iss","2026-01-14T19:13:17",{"id":201,"version":202,"summary_zh":203,"released_at":204},136650,"v0.2.1","## 🚀 摘要\n本次发布解决了长文档中关键的 LLM 提取失败问题（Bug #176），并修复了财报电话会议分析教程（Bug #177）。\n\n## 🛠 主要变更\n- **LLM 稳定性（修复 #176）**：\n  - 通过正确传递 `max_tokens` 参数，解决了 JSON 输出不完整的问题。\n  - 当达到令牌限制时，新增了**自动重试机制，并降低分块大小**。\n  - 将 Groq、OpenAI 和 Anthropic 的默认分块大小统一设置为 **64k**。\n- **教程修复（修复 #177）**：\n  - 通过修复 `SourceReference` 的使用方式，解决了 `03_Earnings_Call_Analysis.ipynb` 中的 `TypeError` 错误。\n- **改进**：\n  - 新增对较新提供商 API 的 `max_completion_tokens` 支持。\n  - 移除了语义类中的硬编码长度约束。\n\n## ✅ 验证\n- 
- Added tests verifying `max_tokens` propagation and error handling.
- Manually validated the Groq Llama 3.3 70B integration.
- **PyPI release**: built and uploaded `semantica-0.2.1`.

*Released: 2026-01-12*

---

# 🚀 Semantica v0.2.0 Release Notes

**Date:** 2026-01-10  
**Tag:** `v0.2.0`

## 🌟 Highlights

This release represents a major step forward in enterprise readiness and robustness. Key additions include native **Amazon Neptune** support for scalable graph storage, **Docling** integration for high-fidelity document parsing, and a new **Robust Extraction** system that ensures reliable data extraction even in edge cases.

### 🌊 Amazon Neptune Support
We've added a production-ready `AmazonNeptuneStore` that integrates seamlessly with AWS.
*   **Native SigV4 Authentication**: Secure connection handling with automatic token refresh.
*   **Resilient Connectivity**: Built-in retry logic with backoff for transient network issues.
*   **Full GraphStore Compatibility**: Passes the complete compliance test suite.

### 📄 Docling Integration
Parsing unstructured documents just got a major upgrade with the new `DoclingParser`.
*   **Multi-Format Support**: High-fidelity parsing for PDF, DOCX, PPTX, XLSX, HTML, and Images.
*   **Structure Preservation**: Superior table extraction and layout understanding.
*   **Flexible Export**: Convert documents to Markdown, JSON, or HTML with ease.

### 🛡️ Robust Extraction Fallbacks
No more empty results. We've implemented a smart fallback chain:
1.  **ML/LLM**: High-precision extraction.
2.  **Pattern**: Regex and rule-based matching.
3.  **Last Resort**: Heuristic fallback (e.g., capitalized words) to guarantee output.

---

## ✨ New Features

### Provenance & Tracking
*   **Traceability**: Added `batch_index` and `document_id` metadata to all extracted entities and relations.
*   **Observability**: Enhanced logging with count tracking for batch processes.

### Semantic Extraction
*   **Auto-Chunking**: Automatically handles long text inputs for LLM extraction.
*   **Resilient Providers**: Added retry logic (3 attempts w/ backoff) and robust JSON parsing for all LLM calls.
*   **Groq Enhancements**: Better diagnostics and connectivity testing for Groq users.

### 🛠️ Technical Improvements
*   **Schema Standardization**: Introduced canonical Pydantic models in `semantica/semantic_extract/schemas.py`.
*   **Testing**: Added comprehensive test suites for robustness, model switching, and end-to-end pipelines.
*   **Dependencies**: Added `GitPython` and `chardet` for better environment stability.

---

## 🐛 Bug Fixes & Refactoring

*   **Critical**: Fixed an issue where extractors returned empty lists on primary-method failure.
*   **Critical**: Resolved a `NameError` in `extraction_validator.py`.
*   **Embeddings**: Fixed a model-switching bug in `TextEmbedder` to ensure state clears correctly.
*   **Graph Analysis**: Fixed a `TypeError` when processing graphs with raw Entity objects.
*   **Cleanup**: Removed legacy deduplication logic in favor of the unified `semantica/conflicts` module.

---

## 📦 Installation & Upgrade

To install the latest version:
```bash
pip install semantica==0.2.0
```

To include Amazon Neptune support:
```bash
pip install semantica[graph-amazon-neptune]
```

To include Docling support:
```bash
pip install semantica[docling]
```

---
*Thank you to all our contributors!
If you encounter any issues, please report them on [GitHub Issues](https://github.com/Hawksight-AI/semantica/issues).*

---

# v0.0.5 *(released 2025-11-26)*

# v0.0.4 *(released 2025-11-26)*

---

# Semantica v0.0.3

Build powerful knowledge graphs and semantic layers with ease.

## 🎯 What's New in v0.0.3

### 🔧 GitHub Workflow Improvements
- **Fixed CI pipeline** - Simplified to build validation only
- Combined release and PyPI publishing workflows
- Removed unnecessary automation scripts
- Streamlined security scanning

### 📚 Documentation & Community
- Enhanced issue templates (Bug, Feature, Docs, Support, Partnerships)
- Updated PR template with clear guidelines
- Added comprehensive support documentation
- Funding and sponsorship configuration
- Clean `.github` folder structure with README

### 📖 Expanded Use Cases
- 10+ domain-specific cookbook examples
- Finance, Healthcare, Cybersecurity, Trading
- Supply Chain, Renewable Energy, Blockchain
- Intelligence, Biomedical, Advanced RAG

## 📦 Installation

```bash
pip install semantica==0.0.3
```

## 🚀 Quick Start

```python
from semantica import Semantica

# Initialize and build knowledge graph
core = Semantica()
kg = core.process(["Your documents..."])
kg.export("knowledge_graph.json")
```

## 🔗 Resources

- [Documentation](https://github.com/Hawksight-AI/semantica/tree/main/docs)
- [Cookbook Examples](https://github.com/Hawksight-AI/semantica/tree/main/cookbook)
- [GitHub Discussions](https://github.com/Hawksight-AI/semantica/discussions)
- [Report
Issues](https://github.com/Hawksight-AI/semantica/issues)

## 💼 Partnerships & Sponsorship

Interested in funding, grants, or partnerships? [Submit a proposal](https://github.com/Hawksight-AI/semantica/issues/new?template=grant.md)

## 📄 License

MIT

**Full Changelog**: https://github.com/Hawksight-AI/semantica/compare/v0.0.2...v0.0.3

*Released: 2025-11-25*

---

## 🧠 Semantica v0.0.1 - Initial Release

First release of Semantica - An Open Source Framework for building Semantic Layers and Knowledge Engineering.

### 🚀 Installation

```bash
pip install semantica
```

### ✨ Features

- **Semantic Layer Construction**: Build semantic layers from unstructured data
- **Knowledge Graph Generation**: Create and manage knowledge graphs
- **Entity & Relationship Extraction**: Extract entities and relationships from text
- **Conflict Resolution**: Multiple strategies for resolving data conflicts
- **Multiple Export Formats**: Export to RDF, OWL, JSON, CSV, YAML, and more
- **Vector Store Integration**: Store and query embeddings
- **Embedding Generation**: Generate embeddings for text, images, and audio
- **Data Ingestion**: Support for multiple data sources (files, web, databases, etc.)
- **Ontology Management**: Create and manage ontologies
- **Graph Analytics**: Analyze and visualize knowledge graphs

### 📦 Package Information

- **Package Name**: `semantica`
- **Version**: 0.0.1
- **Python Version**: >=3.8
- **License**: MIT
- **Author**: Hawksight AI

### 🔗 Links

- **PyPI**: https://pypi.org/project/semantica/
- **Documentation**: See README.md for usage examples

### 📝 Notes

This is the initial alpha release. We welcome feedback and contributions!

*Released: 2025-11-21*
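The parallel extraction engine described in the v0.2.2 notes amounts to fanning a batch of documents out over a worker pool, with `max_workers` as the concurrency knob. A minimal sketch of that pattern; `extract_batch` and the toy extractor are illustrative names, not semantica's real API (the real extractors expose the knob via `extract(..., max_workers=...)`):

```python
# Sketch of the v0.2.2 parallel batch engine: documents are processed
# concurrently on a thread pool; `max_workers` is tuned to CPU resources
# or provider rate limits.
from concurrent.futures import ThreadPoolExecutor

def extract_batch(docs, extract_one, max_workers=8):
    # map() preserves input order even though work completes out of order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_one, docs))

# Toy "extractor": capitalized tokens stand in for named entities
entities = extract_batch(
    ["Alice met Bob.", "Acme hired Carol."],
    lambda doc: [w for w in doc.split() if w[0].isupper()],
    max_workers=2,
)
assert entities == [["Alice", "Bob."], ["Acme", "Carol."]]
```

Threads (rather than processes) fit here because LLM-backed extraction is I/O-bound; a lower `max_workers` doubles as a crude rate limiter.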
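The "secure caching" change in v0.2.2 combines two ideas: credentials are excluded from the cache key, and the key is hashed with SHA-256 rather than MD5. A sketch of the scheme, assuming illustrative names (`SENSITIVE_KEYS`, `make_cache_key`) rather than the actual `ExtractionCache` internals:

```python
# Sketch of v0.2.2 secure cache keys: strip credentials, hash with SHA-256.
import hashlib
import json

SENSITIVE_KEYS = {"api_key", "token", "password"}  # per the release notes

def make_cache_key(method, params):
    # Drop credentials so users with different keys can share cache entries
    safe = {k: v for k, v in params.items() if k not in SENSITIVE_KEYS}
    payload = json.dumps({"method": method, "params": safe}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

k1 = make_cache_key("relations", {"model": "llama-3.3-70b", "api_key": "secret-a"})
k2 = make_cache_key("relations", {"model": "llama-3.3-70b", "api_key": "secret-b"})
assert k1 == k2          # credentials do not influence the key
assert len(k1) == 64     # SHA-256 hex digest
```

Serializing with `sort_keys=True` before hashing keeps the key stable across dict orderings, which matters for cache hit rates.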
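The v0.2.1 fix for truncated JSON on long documents pairs correct `max_tokens` handling with an automatic retry at a smaller chunk size. Under the assumption that a truncated reply fails JSON parsing, the strategy can be sketched as follows (`extract_with_retry` and `fake_llm` are hypothetical stand-ins, not semantica APIs):

```python
# Sketch of v0.2.1's retry-with-reduced-chunk-size strategy.
import json

def extract_with_retry(text, call_llm, chunk_size=64_000, max_retries=3):
    """Chunk the text and parse each LLM reply as JSON; if a reply is
    truncated (unparseable), halve the chunk size and retry."""
    for _ in range(max_retries):
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        try:
            return [json.loads(call_llm(c)) for c in chunks]
        except json.JSONDecodeError:
            chunk_size //= 2  # token limit likely hit: retry with smaller chunks
    raise RuntimeError("extraction failed after retries")

# Simulated provider: replies are truncated for chunks longer than 10 chars
def fake_llm(chunk):
    return '{"entities": []}' if len(chunk) <= 10 else '{"entities": ['

results = extract_with_retry("x" * 40, fake_llm, chunk_size=32)
assert len(results) == 5  # succeeded once chunks shrank to 8 chars
```

Halving on failure converges quickly while keeping chunks as large as the model's completion budget allows, which is why the notes cap the default at 64k.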