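[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-BaseModelAI--cleora":3,"tool-BaseModelAI--cleora":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw is a local AI assistant built for individuals, designed to give you a fully controllable intelligent companion on your own devices. It breaks free of the usual limitation of AI assistants being confined to a specific web page or app, connecting directly to the messaging channels you use every day, including WeChat, WhatsApp, Telegram, Discord, iMessage, and dozens of other platforms. Whichever chat app you send a message from, OpenClaw responds instantly; it even supports voice interaction on macOS, iOS, and Android devices and provides live canvas rendering for you to control.\n\nThe tool mainly addresses users' needs for data privacy, responsiveness, and an \"always-on\" experience. Because the AI is deployed locally, users get fast, private assistance without relying on cloud services, which truly puts them in charge of their own data. Its distinctive technical highlight is a powerful gateway architecture that separates the control plane from the core assistant, keeping cross-platform communication smooth and extensible.\n\nOpenClaw is well suited to tech enthusiasts and developers who want to build personalized workflows, as well as everyday users who care about privacy and do not want to be locked into a single ecosystem. Anyone with basic terminal skills (macOS, Linux, and Windows WSL2 are supported) can deploy it through a simple guided command-line setup. If you long to have one that understands you",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui is a browser-based interface built on Gradio that lets users easily run and use the powerful Stable Diffusion image-generation models locally. It removes the pain points of the raw models (reliance on the command line, a steep learning curve, and scattered features) by bringing the complex AI image-generation workflow into one intuitive graphical platform.\n\nCasual creators who want to get started quickly, designers who need fine control over image details, and developers and researchers who want to explore the models' potential in depth can all benefit. Its core highlight is an exceptionally rich feature set: beyond the basic modes of text-to-image, image-to-image, inpainting, and outpainting, it pioneered advanced features such as attention adjustment, prompt matrices, negative prompts, and \"high-resolution fix\". It also bundles face-restoration tools such as GFPGAN and CodeFormer, supports multiple neural-network upscalers, and lets users extend its capabilities without limit through an extension system. Even on devices with limited VRAM, stable-diffusion-webui offers optimization options that put high-quality AI art within reach.",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code is a high-performance optimization system built for AI coding assistants (such as Claude Code, Codex, and Cursor). More than a set of configuration files, it is a complete framework refined through long real-world use. It is designed to address the core pain points AI agents face in real development: low efficiency, memory loss, security risks, and a lack of continuous learning.\n\nBy introducing modular skills, instinct enhancement, persistent memory, and built-in security scanning, everything-claude-code can significantly improve AI performance on complex tasks and help developers build more stable, smarter production-grade AI agents. Its distinctive \"research-first\" development philosophy and token-consumption optimizations make model responses faster and cheaper while effectively defending against potential attack vectors.\n\nThe toolkit is especially suited to software developers, AI researchers, and technical teams that want to deeply customize their AI workflows. Whether you are building a large codebase or need AI help with security audits and automated testing, everything-claude-code provides strong underlying support. As an open-source project that won an Anthropic hackathon award, it combines multi-language support with a rich set of battle-tested hooks, helping AI truly grow into one that understands",151918,2,"2026-04-12T11:33:05",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI is a powerful, highly modular visual AI engine built for designing and executing complex Stable Diffusion image-generation workflows. Instead of traditional code writing, it uses an intuitive node-based flowchart interface, so users can build personalized generation pipelines simply by connecting functional modules.\n\nThis design neatly solves the complex configuration and limited flexibility of advanced AI image workflows. Without any programming background, users can freely combine models, tune parameters, and preview results in real time, easily handling everything from basic text-to-image to multi-step high-resolution fixes. ComfyUI is highly compatible: it supports Windows, macOS, and Linux, runs on a wide range of hardware including NVIDIA, AMD, Intel, and Apple Silicon, and was among the first to support cutting-edge models such as SDXL, Flux, and SD3.\n\nResearchers and developers who want to explore the algorithms' potential in depth, as well as designers and seasoned AI art enthusiasts seeking maximum creative freedom, will all find strong support in ComfyUI. Its modular architecture lets the community keep adding new features, making it one of the most flexible open-source diffusion-model tools with the richest ecosystem today and helping users turn ideas into reality efficiently.",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli is an open-source AI command-line tool from Google that brings the capabilities of the Gemini models directly into the user's terminal. For developers who are used to working on the command line, it offers the shortest path from prompt to model response, with intelligent assistance available without switching windows.\n\nThe tool mainly addresses the cost of frequent context switching during development. It lets users carry out code comprehension, generation, debugging, and automated operations tasks directly in a familiar terminal interface. Whether querying a large codebase, generating an app from a sketch, or running complex Git operations, gemini-cli handles them efficiently through natural-language instructions.\n\nIt is especially suited to software engineers, DevOps staff, and technical researchers. Its core highlights include:\n\n- a context window of up to 1 million tokens, with strong reasoning;\n- built-in tools such as Google Search, file operations, and shell command execution;\n- more distinctively, support for MCP (Model Context Protocol), which lets users flexibly add custom integrations and connect external capabilities such as image generation.\n\nIn addition, a personal Google account comes with a free usage quota, and the project is fully open source under the Apache 2.0 license, making it an ideal assistant for boosting terminal productivity.",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown is a lightweight Python tool from Microsoft's AutoGen team, designed to convert all kinds of files into Markdown efficiently. It can parse PDF, Word, Excel, PowerPoint, images (with OCR), audio (with speech transcription), HTML, and even YouTube links, accurately extracting key structural information such as headings, lists, tables, and links.\n\nAs AI applications become ubiquitous, large language models (LLMs) excel at processing text but struggle to read complex binary office documents directly. MarkItDown addresses exactly this gap: it converts unstructured or semi-structured files into Markdown, which models understand natively and which is highly token-efficient. This makes it an ideal bridge between local files and AI analysis pipelines. It also provides an MCP (Model Context Protocol) server that integrates seamlessly into LLM applications such as Claude Desktop.\n\nThe tool is especially suited to developers, data scientists, and AI researchers. It fits particularly well for those who need to build retrieval-augmented generation (RAG) systems over documents, run batch text analysis, or let AI assistants \"read\" local files directly. Although the output is reasonably human-readable, its core strength lies in serving machines",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":76,"owner_website":78,"owner_url":79,"languages":80,"stars":103,"forks":104,"last_commit_at":105,"license":106,"difficulty_score":107,"env_os":108,"env_gpu":109,"env_ram":110,"env_deps":111,"category_tags":121,"github_topics":122,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":136,"updated_at":137,"faqs":138,"releases":167},6937,"BaseModelAI\u002Fcleora","cleora","Cleora AI is a general-purpose open-source model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data. 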
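Created by the Synerise.com team.","Cleora is an open-source graph embedding engine designed for heterogeneous relational data. It efficiently produces stable, inductive vector representations of entities. It mainly addresses the pain points of traditional graph algorithms on large-scale data: slow computation, high memory usage, and unstable results caused by reliance on GPUs and random sampling. Researchers building recommender systems or analyzing social networks and developers pursuing high-performance production applications can all benefit from it.\n\nCleora's core technical highlight is its distinctive deterministic algorithm. It needs no negative sampling and no GPU, and computes all possible random-walk paths with a single matrix multiplication. This design eliminates random noise and guarantees reproducible results. It also delivers major performance gains: 240x faster than GraphSAGE and 50x less memory than NetMF. Cleora also ranks first in accuracy in benchmarks on several authoritative academic datasets and scales well, handling very large graphs without crashing. With just a few lines of code or a simple command-line invocation, users can go from input data to vectors quickly, easily turning complex graph data into usable machine-learning features.","\u003Cp align=\"center\">\n\n![Cleora logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FBaseModelAI_cleora_readme_924aa4805a50.png)\n\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">The Graph Embedding Engine\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\nCleora computes \u003Cb>all possible random walks in a single matrix multiplication\u003C\u002Fb>.\u003Cbr>\nNo negative sampling. No GPU. No noise. Just fast, deterministic, production-grade embeddings.\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fcleora.ai\">Website\u003C\u002Fa> &nbsp;·&nbsp;\n  \u003Ca href=\"https:\u002F\u002Fcleora.ai\u002Fdocs\">Documentation\u003C\u002Fa> &nbsp;·&nbsp;\n  \u003Ca href=\"https:\u002F\u002Fcleora.ai\u002Fapi\">API Reference\u003C\u002Fa> &nbsp;·&nbsp;\n  \u003Ca href=\"https:\u002F\u002Fcleora.ai\u002Fbenchmarks\">Benchmarks\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ccode>pip install pycleora\u003C\u002Fcode>\n\u003C\u002Fp>\n\n---\n\n\u003Cp align=\"center\">\n  \u003Cb>#1 Accuracy. 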
Every Dataset.\u003C\u002Fb>\u003Cbr>\n  Tested on \u003Cb>5 canonical academic datasets\u003C\u002Fb> against \u003Cb>7 competing algorithms\u003C\u002Fb> — Cleora wins on accuracy on \u003Cb>every single dataset\u003C\u002Fb>,\u003Cbr>and is the only algorithm that scales to every graph without crashing.\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cb>240x\u003C\u002Fb> Faster Than GraphSAGE &nbsp;·&nbsp;\n  \u003Cb>50x\u003C\u002Fb> Less Memory Than NetMF &nbsp;·&nbsp;\n  \u003Cb>~5 MB\u003C\u002Fb> Install &nbsp;·&nbsp;\n  \u003Cb>0\u003C\u002Fb> GPUs Required\n\u003C\u002Fp>\n\n---\n\n## Achievements\n\n:one:st place at [SIGIR eCom Challenge 2020](https:\u002F\u002Fsigir-ecom.github.io\u002Fecom20DCPapers\u002FSIGIR_eCom20_DC_paper_1.pdf)\n\n:two:nd place and Best Paper Award at [WSDM Booking.com Challenge 2021](http:\u002F\u002Fceur-ws.org\u002FVol-2855\u002Fchallenge_short_3.pdf)\n\n:two:nd place at [Twitter Recsys Challenge 2021](https:\u002F\u002Frecsys-twitter.com\u002Fcompetition_leaderboard\u002Flatest)\n\n:three:rd place at [KDD Cup 2021](https:\u002F\u002Fogb.stanford.edu\u002Fpaper\u002Fkddcup2021\u002Fmag240m_SyneriseAI.pdf)\n\n---\n\n## Installation\n\n```bash\npip install pycleora\n```\n\nOptional extras:\n\n```bash\npip install pycleora[viz]       # matplotlib for visualization\npip install pycleora[full]      # matplotlib + networkx + tqdm\n```\n\n## Quick Start\n\n```python\nfrom pycleora import SparseMatrix, embed, find_most_similar\n\nedges = [\"alice item_laptop\", \"alice item_mouse\", \"bob item_keyboard\"]\ngraph = SparseMatrix.from_iterator(iter(edges), \"complex::reflexive::product\")\n\nembeddings = embed(graph, feature_dim=256, num_iterations=40)\n\nsimilar = find_most_similar(graph, embeddings, \"alice\", top_k=5)\nfor r in similar:\n    print(f\"{r['entity_id']}: {r['similarity']:.4f}\")\n```\n\n`embed()` defaults to `feature_dim=256`, `num_iterations=40`, and whitening after every propagation step.\n\n### Step-by-Step 
Example\n\nThe high-level `embed()` function wraps the Markov propagation loop for convenience. Here's the full manual version, which gives you complete control over the process:\n\n```python\nfrom pycleora import SparseMatrix, whiten_embeddings\nimport numpy as np\nimport pandas as pd\nimport random\n\ncustomers = [f\"Customer_{i}\" for i in range(1, 20)]\nproducts = [f\"Product_{j}\" for j in range(1, 20)]\n\ndata = {\n    \"customer\": random.choices(customers, k=100),\n    \"product\": random.choices(products, k=100),\n}\n\ndf = pd.DataFrame(data)\ncustomer_products = df.groupby('customer')['product'].apply(list).values\ncleora_input = map(lambda x: ' '.join(x), customer_products)\n\nmat = SparseMatrix.from_iterator(cleora_input, columns='complex::reflexive::product')\n\nprint(mat.entity_ids)\n\nembeddings = mat.initialize_deterministically(256)\n\nNUM_ITERATIONS = 40\n\nfor i in range(NUM_ITERATIONS):\n    embeddings = mat.left_markov_propagate(embeddings)\n    embeddings \u002F= np.linalg.norm(embeddings, ord=2, axis=-1, keepdims=True)\n    embeddings = whiten_embeddings(embeddings)\n\nfor entity, embedding in zip(mat.entity_ids, embeddings):\n    print(entity, embedding)\n\nprint(np.dot(embeddings[0], embeddings[1]))\n```\n\n### CLI\n\n```bash\npycleora embed --input graph.tsv --output embeddings.npz --dim 256 --iterations 40\npycleora info --input graph.tsv\npycleora similar --input graph.tsv --entity alice --top-k 10\npycleora benchmark --dataset karate_club\n```\n\n---\n\n## Key Advantages\n\n### No Negative Sampling\nUnlike DeepWalk, Node2Vec, and LINE, Cleora doesn't approximate random walks with negative sampling. It computes **all walks exactly** via matrix multiplication. Less noise, higher accuracy, perfect reproducibility.\n\n### 240x Faster Than GraphSAGE\nZomato reported embedding generation in **under 5 minutes** with Cleora, compared to **20 hours with GraphSAGE** on the same dataset. 
Rust core with adaptive parallelism makes every CPU cycle count.\n\n### Deterministic Embeddings\nSame input always produces the same output. No random seeds, no stochastic variation, no \"run it 5 times and average\" workflows. Critical for reproducible research and production ML pipelines.\n\n### Heterogeneous Hypergraphs\nNatively handles multi-type nodes and edges, bipartite graphs, and hypergraphs. TSV input with typed columns like `complex::reflexive::product`. No graph preprocessing needed.\n\n### ~5 MB, Zero Dependencies\nThe entire library is ~5 MB. Compare: PyTorch Geometric is 500 MB+, DGL is 400 MB+. Cleora ships as a single compiled Rust extension. No CUDA, no cuDNN, no GPU driver headaches.\n\n### Stable & Inductive\nEmbeddings are stable across runs and support inductive learning: new nodes can be embedded without retraining the entire graph. Production-ready from day one.\n\n---\n\n## Supported Algorithms\n\n| Algorithm | Type | Description |\n|-----------|------|-------------|\n| **Cleora** | Spectral \u002F Random Walk | Iterative Markov propagation with per-iteration whitening — all random walks in one matrix multiplication |\n| **ProNE** | Spectral | Fast spectral propagation with Chebyshev polynomial approximation |\n| **RandNE** | Random Projection | Gaussian random projection for very fast, approximate embeddings |\n| **NetMF** | Matrix Factorization | Network Matrix Factorization — factorizes the DeepWalk matrix explicitly |\n| **DeepWalk** | Random Walk | Classic random walk + skip-gram approach |\n| **Node2Vec** | Random Walk | Biased random walks with tunable BFS\u002FDFS exploration |\n| **HOPE** | Matrix Factorization | High-Order Proximity preserved Embedding |\n| **GraRep** | Matrix Factorization | Graph Representations with Global Structural Information |\n| **MLP** | Neural Classifier | 2-layer MLP classifier in pure numpy\u002Fscipy — no PyTorch needed |\n\nAll algorithms are unified under a single API. 
Switch between methods by changing one parameter:\n\n```bash\npycleora embed --input graph.tsv --output out.npz --algorithm cleora\npycleora embed --input graph.tsv --output out.npz --algorithm prone\npycleora embed --input graph.tsv --output out.npz --algorithm node2vec\n```\n\n### Advanced Embedding Modes\n\nBeyond the standard algorithms, Cleora supports several advanced embedding strategies:\n\n- **Multiscale embeddings** — concatenates embeddings from different iteration depths (e.g. scales `[10, 20, 30, 40]`) to capture both local and global graph structure simultaneously\n- **Attention-weighted propagation** — uses softmax-normalized dot-product attention during propagation, dynamically weighting neighbor contributions\n- **Supervised refinement** — fine-tunes unsupervised embeddings using positive\u002Fnegative entity pairs with a triplet margin loss\n- **Directed graph embeddings** — handles asymmetric relationships where edge direction matters\n- **Weighted graph embeddings** — incorporates edge weights into the propagation step\n- **Node feature integration** — initializes embeddings with external features (text, image, numeric) before propagation\n- **PCA whitening** — built-in whitening after every iteration by default to decorrelate embedding dimensions and improve downstream task performance\n\n---\n\n## Batteries Included\n\npycleora ships with a comprehensive set of built-in modules:\n\n| Module | What it does |\n|--------|-------------|\n| `pycleora.community` | Community detection (Louvain) |\n| `pycleora.classify` | MLP and Label Propagation classifiers — no PyTorch needed |\n| `pycleora.sampling` | 6 graph sampling methods |\n| `pycleora.tuning` | Grid search and random search for hyperparameter tuning |\n| `pycleora.compress` | Embedding compression (PQ, scalar quantization) |\n| `pycleora.io_utils` | Save\u002Fload embeddings (NPZ, CSV, TSV), NetworkX conversion |\n| `pycleora.viz` | Embedding visualization (UMAP, t-SNE projections) |\n| 
`pycleora.metrics` | Evaluation metrics for embeddings |\n| `pycleora.benchmark` | Compare algorithms with time, memory, and accuracy metrics |\n| `pycleora.ensemble` | Combine embeddings from multiple algorithms |\n| `pycleora.align` | Embedding alignment across graphs |\n| `pycleora.search` | Nearest-neighbor entity search |\n| `pycleora.stats` | Graph statistics and degree analysis |\n| `pycleora.preprocess` | Graph preprocessing and filtering |\n| `pycleora.hetero` | Heterogeneous graph utilities |\n| `pycleora.generators` | Synthetic graph generators for testing |\n| `pycleora.datasets` | Real-world benchmark datasets (Facebook, Cora, CiteSeer, PubMed, PPI, roadNet-CA, and more) |\n\nSee the [full API reference](https:\u002F\u002Fcleora.ai\u002Fapi) for details on every function and parameter.\n\n---\n\n## Case Study: Zomato\n\n**From 20 hours to under 5 minutes** — powering recommendations for 80M+ users across 500+ cities.\n\nZomato's ML team needed graph embeddings to power \"People Like You\" restaurant recommendations. Their initial approach with **GraphSAGE took ~20 hours** just to process customer-restaurant interaction data for a single city region — making it impossible to scale across 500+ cities.\n\n**Pipeline:**\n1. **Customer-Restaurant Graph** — Bipartite graph of customer orders and restaurant interactions\n2. **Cleora Embeddings** (\u003C 5 minutes) — 197x faster than DeepWalk, no sampling of positive\u002Fnegative examples\n3. **EMDE Density Estimation** — Customer preferences modeled as probability density functions\n4. 
**Production Recommendations** — Restaurant recommendations, search ranking, dish suggestions, and \"People Like You\" lookalikes\n\n**Results:**\n\n| Metric | Value |\n|--------|-------|\n| Speed vs DeepWalk | **197x** faster |\n| Embedding generation | **\u003C 5 min** |\n| Cities scaled to | **500+** |\n| GPUs required | **0** |\n\n[Read the full Zomato blog post →](https:\u002F\u002Fwww.zomato.com\u002Fblog\u002Fconnecting-the-dots-strengthening-recommendations-for-our-customers-part-two\u002F)\n\n---\n\n## Benchmarks\n\nBenchmarked against **7 competing algorithms** on **5 real-world datasets** (ego-Facebook, Cora, CiteSeer, PubMed, PPI) plus a 2M-node scale test. All datasets are genuine academic benchmarks from SNAP, Planetoid, and DGL. Cleora wins on accuracy on **every single dataset**.\n\nFull interactive benchmark results at [cleora.ai\u002Fbenchmarks](https:\u002F\u002Fcleora.ai\u002Fbenchmarks).\n\n### Classification Accuracy\n\n| Dataset | Nodes | Cleora | NetMF | DeepWalk | Node2Vec | HOPE | GraRep | ProNE | RandNE |\n|---------|-------|--------|-------|----------|----------|------|--------|-------|--------|\n| **ego-Facebook** | 4K | **0.990** | 0.957 | 0.958 | 0.958 | 0.890 | T\u002FO | 0.075 | 0.212 |\n| **Cora** | 2.7K | **0.861** | 0.839 | 0.835 | 0.835 | 0.821 | 0.809 | 0.179 | 0.247 |\n| **CiteSeer** | 3.3K | **0.824** | 0.810 | 0.806 | 0.806 | 0.740 | 0.756 | 0.189 | 0.244 |\n| **PubMed** | 19.7K | **0.879** | OOM | T\u002FO | T\u002FO | T\u002FO | OOM | 0.339 | 0.351 |\n| **PPI** | 3.9K | **1.000** | OOM | T\u002FO | T\u002FO | T\u002FO | OOM | 0.023 | 0.073 |\n\n> **Only 3 of 8 algorithms survive at 19.7K nodes.** HOPE, NetMF, GraRep, DeepWalk, and Node2Vec all crash or time out. 
Cleora achieves perfect accuracy on PPI (50 classes).\n\n### Memory Efficiency\n\n| Dataset | Cleora | Best Competitor | Factor |\n|---------|--------|-----------------|--------|\n| ego-Facebook (4K) | **22 MB** | 572 MB | 26x less |\n| Cora (2.7K) | **14 MB** | 227 MB | 16x less |\n| CiteSeer (3.3K) | **16 MB** | 294 MB | 18x less |\n| PubMed (19.7K) | **97 MB** | 175 MB | Only 3 survived |\n| roadNet-CA (2M) | **4.1 GB** | — | Only Cleora finished |\n\n### Scale Test: roadNet-CA (2 Million Nodes)\n\n2 million nodes. 31 seconds. Every other algorithm crashes with out-of-memory. Cleora is the only library that survives at this scale on a single CPU.\n\n---\n\n## Library Comparison\n\n| Feature | **pycleora 3.2** | PyG | KarateClub | DGL | Node2Vec | StellarGraph |\n|---------|:---:|:---:|:---:|:---:|:---:|:---:|\n| CPU-only (no GPU needed) | **Yes** | Optional | Yes | Optional | Yes | Optional |\n| Rust-powered core | **Yes** | No (C++) | No | No (C++) | No | No (TF) |\n| No negative sampling needed | **Yes** | No | No | No | No | No |\n| Deterministic output | **Yes** | No | No | No | No | No |\n| Node2Vec \u002F DeepWalk | **Built-in** | Yes | Yes | Yes | Yes | Yes |\n| MLP classifier (no PyTorch) | **MLP** | Requires PyTorch | No | Requires PyTorch | No | Requires TF |\n| Graph sampling | **6 methods** | Yes | No | Yes | No | Yes |\n| Hyperparameter tuning | **Grid + Random** | Manual | No | Manual | No | Manual |\n| Install size | **~5 MB** | ~500 MB+ | ~15 MB | ~400 MB+ | ~2 MB | ~600 MB+ |\n| Actively maintained | **Yes** | Yes | Yes | Yes | Yes | Archived |\n\n---\n\n## Use Cases\n\n- **Recommendation Systems** — Products, content, restaurants, videos\n- **Knowledge Graphs** — Entity and relation embeddings\n- **Customer Lookalikes** — Find users with similar behavior patterns\n- **Entity Resolution** — Match entities across data sources\n- **Fraud Detection** — Detect anomalous patterns in transaction graphs\n- **Social Networks** — Community detection and 
link prediction\n- **Drug Discovery** — Molecule and protein interaction networks\n- **Supply Chain** — Supplier and logistics graph analysis\n\nSee [cleora.ai\u002Fuse-cases](https:\u002F\u002Fcleora.ai\u002Fuse-cases) for detailed walkthroughs with code examples.\n\n---\n\n## How It Works\n\n1. **Input Data** — Feed edge lists, interaction logs, or knowledge triples. Cleora accepts any TSV with typed columns.\n2. **Hypergraph Construction** — Builds a heterogeneous hypergraph where a single edge can connect multiple entities of different types.\n3. **Sparse Markov Matrix** — Constructs a sparse transition matrix (99%+ sparse). Rows normalized so each row sums to 1.\n4. **Single Matrix Multiplication = All Walks** — One sparse matrix multiplication captures *every possible random walk* of a given length. No sampling, no noise.\n5. **L2-Normalized + Whitened Propagation** — Each iteration replaces every node's embedding with the L2-normalized average of its neighbors and then whitens the embedding space. The default configuration runs 40 iterations at 256 dimensions.\n6. **Embeddings Ready** — Dense, deterministic embedding vectors for every entity. Same input always yields same output.\n\n---\n\n## Also Used By\n\n**Synerise** — AI\u002FML platform processing billions of e-commerce events daily. Cleora powers core recommendation and personalization: product embeddings from terabytes of transactions, substitute vs. complement detection, customer segmentation, cold-start solving — all on CPU in minutes.\n\n**Dailymotion** — Video platform with 350M+ monthly visitors. 
Personalized video recommendations with improved relevance and catalog coverage.\n\n**ML Competitions** — Cleora-powered solutions achieved top placements in KDD Cup 2021, WSDM WebTour 2021, and SIGIR eCom 2020 — beating deep learning approaches on travel, e-commerce, and web recommendation benchmarks.\n\n---\n\n## FAQ\n\n**Q: What should I embed?**\n\nA: Any entities that interact with each other, co-occur, or can be said to be present together in a given context. Examples include:\n\n- products in a shopping basket\n- locations frequented by the same people at similar times\n- employees collaborating together\n- chemical molecules present in specific circumstances\n- proteins produced by the same bacteria\n- drug interactions\n- co-authors of the same academic papers\n- companies occurring together in the same LinkedIn profiles\n\n**Q: How should I construct the input?**\n\nA: Group entities that co-occur in a similar context, and feed each group to Cleora as one whitespace-separated line with the `complex::reflexive` modifier. For example, if you have product data, you can group products by shopping basket or by user. If you have URLs, you can group them by browser session or by (user, time window) pairs. Check out the usage example above; grouping products by customer is just one possibility.\n\n**Q: Can I embed users and products simultaneously, to compare them with cosine similarity?**\n\nA: No. This approach is methodologically wrong; it stems from outdated matrix factorization methods. Instead, come up with good product embeddings first, then create user embeddings from them. Feeding two columns, e.g. `user product`, into Cleora results in a bipartite graph. Similar products will be close to each other and similar users will be close to each other. However, users and products will not necessarily be close to each other.\n\n**Q: What embedding dimensionality should I use?**\n\nA: The default is **256**. 
For larger production systems we often use _1024_ to _4096_, but `256` is the baseline shipped with the library.\n\n**Q: How many iterations of Markov propagation should I use?**\n\nA: The default is **40** whitening-enhanced propagation steps. For more local, co-occurrence-style behavior, lower that number manually. Higher values bias the embeddings more toward contextual similarity.\n\n**Q: How do I incorporate external information (e.g. entity metadata, images, or text) into the embeddings?**\n\nA: Initialize the embedding matrix with your own vectors, e.g. from a ViT, a sentence-transformers model, or a random projection of your numeric features. In that scenario, fewer Markov iterations than the default `40` often work best.\n\n**Q: My embeddings don't fit in memory. What do I do?**\n\nA: Markov propagation operates on each embedding dimension independently. Initialize your embeddings with a smaller number of dimensions, run Cleora, persist the result to disk, and repeat. You can then concatenate the resulting embedding vectors, but remember to re-normalize them afterwards! Note that the per-iteration whitening step mixes dimensions, so chunked runs with whitening enabled will not exactly reproduce a single full-width run.\n\n**Q: Is there a minimum number of entity occurrences?**\n\nA: No. An entity `A` that co-occurs just once with some other entity `B` will still get a proper embedding, i.e. `B` will be the most similar entity to `A`. Conversely, `A` will rank highly among the nearest neighbors of `B`, which may or may not be desirable depending on your use case. Feel free to prune your input to Cleora to eliminate low-frequency items.\n\n**Q: Are there any edge cases where Cleora can fail?**\n\nA: Cleora works best on relatively sparse hypergraphs. Suppose all your hyperedges contain some very common entity `X`, e.g. a _shopping bag_. Then `X` collapses the shortest paths in the random walk, since any two entities are at most two hops apart via `X`, and embedding quality degrades. It is good practice to remove such entities from the hypergraph.\n\n**Q: How can Cleora be so fast and accurate at the same time?**\n\nA: Not using negative sampling is a great boon. 
By constructing the (sparse) Markov transition matrix, Cleora explicitly performs all possible random walks in a hypergraph in one big step (a single matrix multiplication). That is what we call a single _iteration_. The default configuration performs 40 such iterations, with whitening after every step. Negative sampling and the random selection of walks both tend to introduce a lot of noise; Cleora is free of those burdens.\n\n---\n\n## Resources\n\n- **Website**: [cleora.ai](https:\u002F\u002Fcleora.ai)\n- **API Reference**: [cleora.ai\u002Fapi](https:\u002F\u002Fcleora.ai\u002Fapi)\n- **Benchmarks**: [cleora.ai\u002Fbenchmarks](https:\u002F\u002Fcleora.ai\u002Fbenchmarks)\n- **Whitepaper**: [\"Cleora: A Simple, Strong and Scalable Graph Embedding Scheme\"](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.02302)\n- **GitHub**: [github.com\u002FBaseModelAI\u002Fcleora](https:\u002F\u002Fgithub.com\u002FBaseModelAI\u002Fcleora)\n- **PyPI**: [pypi.org\u002Fproject\u002Fpycleora](https:\u002F\u002Fpypi.org\u002Fproject\u002Fpycleora\u002F)\n\n## Cite\n\nPlease cite [our paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.02302) (and the respective papers of the methods used) if you use this code in your own work:\n\n```\n@article{DBLP:journals\u002Fcorr\u002Fabs-2102-02302,\n  author    = {Barbara Rychalska and Piotr Babel and Konrad Goluchowski and Andrzej Michalowski and Jacek Dabrowski},\n  title     = {Cleora: {A} Simple, Strong and Scalable Graph Embedding Scheme},\n  journal   = {CoRR},\n  year      = {2021}\n}\n```\n\n## License\n\nMIT licensed. See [LICENSE](LICENSE) for details.\n\n## Contributing\n\nPull requests are welcome. For major changes, please open an issue first. 
Contact: cleora@synerise.com\n","\u003Cp align=\"center\">\n\n![Cleora logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FBaseModelAI_cleora_readme_924aa4805a50.png)\n\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">Graph Embedding Engine\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\nCleora computes \u003Cb>all possible random walks with a single matrix multiplication\u003C\u002Fb>.\u003Cbr>\nNo negative sampling, no GPU, no noise. Just fast, deterministic, production-grade embeddings.\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fcleora.ai\">Website\u003C\u002Fa> &nbsp;·&nbsp;\n  \u003Ca href=\"https:\u002F\u002Fcleora.ai\u002Fdocs\">Documentation\u003C\u002Fa> &nbsp;·&nbsp;\n  \u003Ca href=\"https:\u002F\u002Fcleora.ai\u002Fapi\">API Reference\u003C\u002Fa> &nbsp;·&nbsp;\n  \u003Ca href=\"https:\u002F\u002Fcleora.ai\u002Fbenchmarks\">Benchmarks\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ccode>pip install pycleora\u003C\u002Fcode>\n\u003C\u002Fp>\n\n---\n\n\u003Cp align=\"center\">\n  \u003Cb>#1 accuracy. On every dataset.\u003C\u002Fb>\u003Cbr>\n  Compared against \u003Cb>7 competing algorithms\u003C\u002Fb> on \u003Cb>5 canonical academic datasets\u003C\u002Fb>, Cleora achieves the highest accuracy on \u003Cb>every single dataset\u003C\u002Fb>,\u003Cbr>and is the only algorithm that scales to every graph without crashing.\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cb>240x\u003C\u002Fb> faster than GraphSAGE &nbsp;·&nbsp;\n  \u003Cb>50x\u003C\u002Fb> less memory than NetMF &nbsp;·&nbsp;\n  ~\u003Cb>5 MB\u003C\u002Fb> install &nbsp;·&nbsp;\n  \u003Cb>No\u003C\u002Fb> GPU required\n\u003C\u002Fp>\n\n---\n\n## Achievements\n\n\u003Cb>1st\u003C\u002Fb> place at [SIGIR eCom Challenge 2020](https:\u002F\u002Fsigir-ecom.github.io\u002Fecom20DCPapers\u002FSIGIR_eCom20_DC_paper_1.pdf)\n\n\u003Cb>2nd\u003C\u002Fb> place and Best Paper Award at [WSDM Booking.com Challenge 2021](http:\u002F\u002Fceur-ws.org\u002FVol-2855\u002Fchallenge_short_3.pdf)\n\n\u003Cb>2nd\u003C\u002Fb> place at [Twitter Recsys Challenge 2021](https:\u002F\u002Frecsys-twitter.com\u002Fcompetition_leaderboard\u002Flatest)\n\n\u003Cb>3rd\u003C\u002Fb> place at [KDD Cup 2021](https:\u002F\u002Fogb.stanford.edu\u002Fpaper\u002Fkddcup2021\u002Fmag240m_SyneriseAI.pdf)\n\n---\n\n## Installation\n\n```bash\npip install pycleora\n```\n\nOptional extras:\n\n```bash\npip install pycleora[viz]       # matplotlib for visualization\npip install pycleora[full]      # matplotlib + networkx + tqdm\n```\n\n## Quick Start\n\n```python\nfrom pycleora import SparseMatrix, embed, find_most_similar\n\nedges = [\"alice item_laptop\", \"alice item_mouse\", \"bob item_keyboard\"]\ngraph = SparseMatrix.from_iterator(iter(edges), \"complex::reflexive::product\")\n\nembeddings = embed(graph, feature_dim=256, num_iterations=40)\n\nsimilar = find_most_similar(graph, embeddings, \"alice\", top_k=5)\nfor r in similar:\n    print(f\"{r['entity_id']}: {r['similarity']:.4f}\")\n```\n\n`embed()` defaults to `feature_dim=256` and `num_iterations=40`, and whitens the embeddings after every propagation step.\n\n### Step-by-Step Example\n\nThe high-level `embed()` function wraps the Markov propagation loop for convenience. Below is the full manual version, which gives you complete control over the process:\n\n```python\nfrom pycleora import SparseMatrix, whiten_embeddings\nimport numpy as np\nimport pandas as pd\nimport random\n\ncustomers = [f\"Customer_{i}\" for i in range(1, 20)]\nproducts = [f\"Product_{j}\" for j in range(1, 20)]\n\ndata = {\n    \"customer\": random.choices(customers, k=100),\n    \"product\": random.choices(products, k=100),\n}\n\ndf = pd.DataFrame(data)\ncustomer_products = df.groupby('customer')['product'].apply(list).values\ncleora_input = map(lambda x: ' '.join(x), customer_products)\n\nmat = SparseMatrix.from_iterator(cleora_input, columns='complex::reflexive::product')\n\nprint(mat.entity_ids)\n\nembeddings = mat.initialize_deterministically(256)\n\nNUM_ITERATIONS = 40\n\nfor i in range(NUM_ITERATIONS):\n    embeddings = mat.left_markov_propagate(embeddings)\n    embeddings \u002F= np.linalg.norm(embeddings, ord=2, axis=-1, keepdims=True)\n    embeddings = whiten_embeddings(embeddings)\n\nfor entity, embedding in zip(mat.entity_ids, embeddings):\n    print(entity, embedding)\n\nprint(np.dot(embeddings[0], embeddings[1]))\n```\n\n### CLI\n\n```bash\npycleora embed --input graph.tsv --output embeddings.npz --dim 256 --iterations 40\npycleora info --input 
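graph.tsv\npycleora similar --input graph.tsv --entity alice --top-k 10\npycleora benchmark --dataset karate_club\n```\n\n---\n\n## Key Advantages\n\n### No Negative Sampling\nUnlike DeepWalk, Node2Vec, and LINE, Cleora does not use negative sampling to approximate random walks. It computes \u003Cb>all walks exactly\u003C\u002Fb> via matrix multiplication. Less noise, higher accuracy, and fully reproducible results.\n\n### 240x Faster Than GraphSAGE\nZomato reported generating embeddings in \u003Cb>under 5 minutes\u003C\u002Fb> with Cleora, compared to \u003Cb>20 hours\u003C\u002Fb> with GraphSAGE on the same dataset. The Rust core uses adaptive parallelism to make full use of every CPU core.\n\n### Deterministic Embeddings\nThe same input always produces the same output. No random seeds, no stochastic variation, no \"run it 5 times and average\" workflows. This is critical for reproducible research and production ML pipelines.\n\n### Heterogeneous Hypergraph Support\nNatively supports multi-type nodes and edges, bipartite graphs, and hypergraphs. Just provide a TSV file with typed columns (such as `complex::reflexive::product`); no graph preprocessing is needed.\n\n### ~5 MB, Zero Dependencies\nThe entire library is about 5 MB. By comparison, PyTorch Geometric is 500 MB+ and DGL is 400 MB+. Cleora ships as a single compiled Rust extension, with no CUDA, cuDNN, or GPU driver headaches.\n\n### Stable and Inductive\nEmbeddings stay stable across runs and support inductive learning: new nodes can be embedded directly without retraining the entire graph. Production-ready from day one.\n\n---\n\n## Supported Algorithms\n\n| Algorithm | Type | Description |\n|-----------|------|-------------|\n| **Cleora** | Spectral \u002F Random Walk | Iterative Markov propagation with whitening after each step; all random walks in one matrix multiplication |\n| **ProNE** | Spectral | Fast spectral propagation with Chebyshev polynomial approximation |\n| **RandNE** | Random Projection | Gaussian random projection for fast, approximate embeddings |\n| **NetMF** | Matrix Factorization | Network Matrix Factorization: explicitly factorizes the DeepWalk matrix |\n| **DeepWalk** | Random Walk | Classic random walk + skip-gram approach |\n| **Node2Vec** | Random Walk | Biased random walks with a tunable BFS\u002FDFS exploration trade-off |\n| **HOPE** | Matrix Factorization | High-Order Proximity preserved Embedding |\n| **GraRep** | Matrix Factorization | Graph Representations with Global Structural Information |\n| **MLP** | Neural Classifier | 2-layer MLP classifier in pure NumPy\u002FSciPy (no PyTorch needed) |\n\nAll algorithms are unified under a single API. Switch between methods by changing one parameter:\n\n```bash\npycleora embed --input graph.tsv --output out.npz --algorithm cleora\npycleora embed --input graph.tsv --output out.npz --algorithm prone\npycleora embed --input graph.tsv --output out.npz --algorithm node2vec\n```\n\n### Advanced Embedding Modes\n\nBeyond the standard algorithms, Cleora supports several advanced embedding strategies:\n\n- **Multiscale embeddings**: concatenates embeddings from different iteration depths (e.g. scales `[10, 20, 30, 40]`) to capture local and global graph structure simultaneously.\n- **Attention-weighted propagation**: uses softmax-normalized dot-product attention during propagation to dynamically weight neighbor contributions.\n- **Supervised refinement**: fine-tunes unsupervised embeddings using positive\u002Fnegative entity pairs and a triplet margin loss.\n- **Directed graph embeddings**: handles asymmetric relationships where edge direction matters.\n- **Weighted graph embeddings**: incorporates edge weights into the propagation step.\n- **Node feature integration**: initializes embeddings with external features (text, image, numeric) before propagation.\n- **PCA whitening**: built-in whitening after every iteration by default, to decorrelate embedding dimensions and improve downstream task performance.\n\n---\n\n## Batteries Included\n\npycleora ships with a comprehensive set of built-in modules:\n\n| Module | What it does |\n|--------|-------------|\n| `pycleora.community` | Community detection (Louvain) |\n| `pycleora.classify` | MLP and Label Propagation classifiers (no PyTorch needed) |\n| `pycleora.sampling` | 6 graph sampling methods |\n| `pycleora.tuning` | Grid search and random search for hyperparameter tuning |\n| `pycleora.compress` | Embedding compression (PQ, scalar quantization) |\n| `pycleora.io_utils` | Save\u002Fload embeddings (NPZ, CSV, TSV), NetworkX conversion |\n| `pycleora.viz` | Embedding visualization (UMAP, t-SNE projections) |\n| `pycleora.metrics` | Evaluation metrics for embeddings |\n| `pycleora.benchmark` | Compare algorithms by time, memory, and accuracy |\n| `pycleora.ensemble` | Combine embeddings from multiple algorithms |\n| `pycleora.align` | Embedding alignment across graphs |\n| `pycleora.search` | Nearest-neighbor entity search |\n| `pycleora.stats` | Graph statistics and degree analysis |\n| `pycleora.preprocess` | Graph preprocessing and filtering |\n| `pycleora.hetero` | Heterogeneous graph utilities |\n| `pycleora.generators` | Synthetic graph generators for testing |\n| `pycleora.datasets` | Real-world benchmark datasets (Facebook, Cora, CiteSeer, PubMed, PPI, roadNet-CA, and more) |\n\nSee the [full API reference](https:\u002F\u002Fcleora.ai\u002Fapi) for details on every function and parameter.\n\n---\n\n## Case Study: Zomato\n\n**From 20 hours to under 5 minutes**, powering recommendations for 80M+ users across 500+ cities.\n\nZomato's ML team needed graph embeddings to power \"People Like You\" restaurant recommendations. Their initial approach, **GraphSAGE, took about 20 hours just to process customer-restaurant interaction data for a single city region**, making it impossible to scale to 500+ cities.\n\n**Pipeline:**\n1. **Customer-Restaurant Graph**: bipartite graph of customer orders and restaurant interactions\n2. **Cleora Embeddings** (\u003C 5 minutes): 197x faster than DeepWalk, with no sampling of positive\u002Fnegative examples\n3. **EMDE Density Estimation**: customer preferences modeled as probability density functions\n4. 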
**生产环境推荐** — 餐厅推荐、搜索排序、菜品建议以及“像你一样的人”相似用户查找\n\n**结果：**\n\n| 指标 | 数值 |\n|--------|-------|\n| 速度 vs DeepWalk | **197 倍**更快 |\n| 嵌入生成 | **\u003C 5 分钟** |\n| 扩展城市数量 | **500+** |\n| 所需 GPU 数量 | **0** |\n\n[阅读完整的 Zomato 博客文章 →](https:\u002F\u002Fwww.zomato.com\u002Fblog\u002Fconnecting-the-dots-strengthening-recommendations-for-our-customers-part-two\u002F)\n\n---\n\n## 基准测试\n\n我们在 **5 个真实世界数据集**（ego-Facebook、Cora、CiteSeer、PubMed、PPI）以及一个 200 万节点的大规模测试上，与 **7 种竞争算法**进行了对比。所有数据集均来自 SNAP、Planetoid 和 DGL 的权威学术基准。Cleora 在 **每一个数据集上都取得了最高的准确率**。\n\n完整的交互式基准测试结果请访问 [cleora.ai\u002Fbenchmarks](https:\u002F\u002Fcleora.ai\u002Fbenchmarks)。\n\n### 分类准确率\n\n| 数据集 | 节点数 | Cleora | NetMF | DeepWalk | Node2Vec | HOPE | GraRep | ProNE | RandNE |\n|---------|-------|--------|-------|----------|----------|------|--------|-------|--------|\n| **ego-Facebook** | 4K | **0.990** | 0.957 | 0.958 | 0.958 | 0.890 | T\u002FO | 0.075 | 0.212 |\n| **Cora** | 2.7K | **0.861** | 0.839 | 0.835 | 0.835 | 0.821 | 0.809 | 0.179 | 0.247 |\n| **CiteSeer** | 3.3K | **0.824** | 0.810 | 0.806 | 0.806 | 0.740 | 0.756 | 0.189 | 0.244 |\n| **PubMed** | 19.7K | **0.879** | OOM | T\u002FO | T\u002FO | T\u002FO | OOM | 0.339 | 0.351 |\n| **PPI** | 3.9K | **1.000** | OOM | T\u002FO | T\u002FO | T\u002FO | OOM | 0.023 | 0.073 |\n\n> **在 19.7K 节点的数据集上，只有 3 种算法能够运行。** HOPE、NetMF、GraRep、DeepWalk 和 Node2Vec 全部崩溃或超时。而 Cleora 在 PPI 数据集上实现了完美的分类准确率（50 个类别）。\n\n### 内存效率\n\n| 数据集 | Cleora | 最佳竞争对手 | 差异倍数 |\n|---------|--------|-----------------|--------|\n| ego-Facebook (4K) | **22 MB** | 572 MB | 少 26 倍 |\n| Cora (2.7K) | **14 MB** | 227 MB | 少 16 倍 |\n| CiteSeer (3.3K) | **16 MB** | 294 MB | 少 18 倍 |\n| PubMed (19.7K) | **97 MB** | 175 MB | 只有 3 种算法存活 |\n| roadNet-CA (2M) | **4.1 GB** | — | 只有 Cleora 完成了计算 |\n\n### 大规模测试：roadNet-CA（200 万个节点）\n\n200 万个节点。31 秒。其他所有算法均因内存不足而崩溃。Cleora 是唯一一款能够在单台 CPU 机器上完成如此大规模计算的库。\n\n---\n\n## 库对比\n\n| 特性 | **pycleora 3.2** | PyG | KarateClub | DGL | Node2Vec | StellarGraph 
|\n|---------|:---:|:---:|:---:|:---:|:---:|:---:|\n| 仅 CPU 运行（无需 GPU） | **是** | 可选 | 是 | 可选 | 是 | 可选 |\n| Rust 驱动的核心 | **是** | 否（C++） | 否 | 否（C++） | 否 | 否（TF） |\n| 无需负采样 | **是** | 否 | 否 | 否 | 否 | 否 |\n| 确定性输出 | **是** | 否 | 否 | 否 | 否 | 否 |\n| Node2Vec \u002F DeepWalk | **内置** | 是 | 是 | 是 | 是 | 是 |\n| MLP 分类器（无需 PyTorch） | **MLP** | 需要 PyTorch | 否 | 需要 PyTorch | 否 | 需要 TF |\n| 图采样 | **6 种方法** | 是 | 否 | 是 | 否 | 是 |\n| 超参数调优 | **网格 + 随机** | 手动 | 否 | 手动 | 否 | 手动 |\n| 安装大小 | **~5 MB** | ~500 MB+ | ~15 MB | ~400 MB+ | ~2 MB | ~600 MB+ |\n| 积极维护 | **是** | 是 | 是 | 是 | 是 | 已归档 |\n\n---\n\n## 使用场景\n\n- **推荐系统** — 商品、内容、餐厅、视频\n- **知识图谱** — 实体和关系嵌入\n- **客户相似画像** — 找到行为模式相似的用户\n- **实体消歧** — 在不同数据源中匹配实体\n- **欺诈检测** — 检测交易图中的异常模式\n- **社交网络** — 社区发现和链接预测\n- **药物发现** — 分子与蛋白质相互作用网络\n- **供应链** — 供应商和物流图分析\n\n更多包含代码示例的详细教程，请参阅 [cleora.ai\u002Fuse-cases](https:\u002F\u002Fcleora.ai\u002Fuse-cases)。\n\n---\n\n## 工作原理\n\n1. **输入数据** — 提供边列表、交互日志或知识三元组。Cleora 接受任何带有类型列的 TSV 文件。\n2. **超图构建** — 构建一个异构超图，其中一条边可以连接多个不同类型实体。\n3. **稀疏马尔可夫矩阵** — 构造一个稀疏转移矩阵（99%以上为零）。行已归一化，使得每行之和为 1。\n4. **一次矩阵乘法 = 所有随机游走** — 通过一次稀疏矩阵乘法即可捕捉给定长度下的 *所有可能的随机游走*。无需采样，无噪声。\n5. **L2 归一化 + 白化传播** — 每次迭代都会用其邻居的 L2 归一化平均值替换每个节点的嵌入向量，并对嵌入空间进行白化处理。默认配置为 256 维下运行 40 次迭代。\n6. 
**嵌入就绪** — 为每个实体生成稠密且确定性的嵌入向量。相同的输入始终产生相同的输出。\n\n---\n\n## 其他应用案例\n\n**Synerise** — 一款每日处理数十亿电商事件的人工智能\u002F机器学习平台。Cleora 支撑了核心的推荐与个性化功能：从数 TB 的交易数据中提取商品嵌入，识别替代品与互补品，进行客户细分，解决冷启动问题——所有这些操作均可在 CPU 上几分钟内完成。\n\n**Dailymotion** — 一家月访问量超过 3.5 亿的视频平台。利用 Cleora 提供个性化视频推荐，显著提升了相关性和内容覆盖范围。\n\n**机器学习竞赛** — 基于 Cleora 的解决方案在 KDD Cup 2021、WSDM WebTour 2021 和 SIGIR eCom 2020 等比赛中名列前茅，在旅行、电商和网页推荐基准测试中击败了深度学习方法。\n\n---\n\n## 常见问题解答\n\n**问：我应该嵌入什么？**\n\n答：任何彼此交互、共同出现，或可在特定上下文中被认为同时存在的实体。例如：购物篮中的商品、同一人群在相近时间频繁光顾的地点、协同工作的员工、特定条件下共存的化学分子、由同种细菌产生的蛋白质、药物相互作用、撰写同一学术论文的作者、出现在同一 LinkedIn 个人资料中的公司等。\n\n**问：我该如何构造输入数据？**\n\n答：最佳做法是将处于相似上下文中的实体分组，并以空格分隔的形式逐行输入。使用 `complex::reflexive` 修饰符是个不错的选择。例如，若你有商品数据，可按购物篮或用户进行分组；若你有 URL 数据，则可按浏览器会话或 (用户, 时间窗口) 对进行分组。请参考上述使用示例。按客户分组只是众多可能性之一。\n\n**问：我可以同时嵌入用户和商品，然后用余弦相似度比较它们吗？**\n\n答：不可以。这种做法在方法论上是错误的，源于过时的矩阵分解方法。正确的做法是先获得高质量的商品嵌入，再基于这些嵌入生成用户嵌入。如果直接将两列数据（如“用户 商品”）输入 Cleora，将会得到一个二部图。相似的商品会彼此靠近，相似的用户也会彼此靠近，但用户和商品之间并不一定具有相似性。\n\n**问：应使用多少维的嵌入？**\n\n答：默认值为 **256**。对于大型生产系统，我们通常使用 _1024_ 到 _4096_ 维，但库提供的基础配置为 256 维。\n\n**问：我应该进行多少步的马尔可夫传播？**\n\n答：默认为 **40** 步传播，每一步之后都进行白化处理。若希望获得更局部、基于共现的行为特征，可手动减少迭代次数；较高的迭代次数则会更多地偏向于上下文相似性。\n\n**问：如何将外部信息，如实体元数据、图像、文本等融入嵌入中？**\n\n答：只需用来自 ViT、sentence-transformers 或数值特征随机投影的自定义向量初始化嵌入矩阵即可。在这种情况下，通常使用少于默认的 40 次马尔可夫迭代效果更好。\n\n**问：我的嵌入数据太大，无法放入内存，该怎么办？**\n\n答：Cleora 可以独立处理各个维度。先用较少的维度初始化嵌入，运行 Cleora，将其持久化到磁盘，然后再重复此过程。最后可以将结果嵌入向量拼接起来，但务必记得随后对其进行归一化！\n\n**问：是否存在实体出现次数的最低要求？**\n\n答：没有。即使实体 `A` 只与另一实体 `B` 共现 1 次，它也能获得合理的嵌入表示，即 `B` 将成为与 `A` 最相似的实体。反之亦然，`A` 会在 `B` 的最近邻中占据较高排名，但这是否符合需求取决于具体应用场景。您可以根据需要对输入数据进行过滤，去除低频项。\n\n**问：Cleora 是否存在某些特殊情况会导致失效？**\n\n答：Cleora 最适合处理相对稀疏的超图。如果您的所有超边都包含某个非常常见的实体，例如“购物袋”，那么它会使随机游走中的最短路径退化，从而降低嵌入质量。因此，建议将此类实体从超图中移除。\n\n**问：Cleora 是如何做到既快速又准确的呢？**\n\n答：不采用负采样是一个巨大优势。通过构建稀疏的马尔可夫转移矩阵，Cleora 能够在一个大步骤中显式地执行超图中的所有可能的随机游走（即一次矩阵乘法）。这就是所谓的单次“迭代”。默认配置为 40 次迭代，且每一步之后都会进行白化处理。而负采样或随机选择随机游走往往会引入大量噪声——Cleora 则完全避免了这些问题。\n\n---\n\n## 资源\n\n- **官网**: [cleora.ai](https:\u002F\u002Fcleora.ai)\n- **API 文档**: 
[cleora.ai\u002Fapi](https:\u002F\u002Fcleora.ai\u002Fapi)\n- **基准测试**: [cleora.ai\u002Fbenchmarks](https:\u002F\u002Fcleora.ai\u002Fbenchmarks)\n- **白皮书**: [\"Cleora: 一种简单、强大且可扩展的图嵌入方案\"](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.02302)\n- **GitHub**: [github.com\u002FBaseModelAI\u002Fcleora](https:\u002F\u002Fgithub.com\u002FBaseModelAI\u002Fcleora)\n- **PyPI**: [pypi.org\u002Fproject\u002Fpycleora](https:\u002F\u002Fpypi.org\u002Fproject\u002Fpycleora\u002F)\n\n## 引用\n\n如果您在自己的工作中使用此代码，请引用[我们的论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.02302)（以及所用方法的相应论文）：\n\n```\n@article{DBLP:journals\u002Fcorr\u002Fabs-2102-02302,\n  author    = {Barbara Rychalska and Piotr Babel and Konrad Goluchowski and Andrzej Michalowski and Jacek Dabrowski},\n  title     = {Cleora: {A} Simple, Strong and Scalable Graph Embedding Scheme},\n  journal   = {CoRR},\n  year      = {2021}\n}\n```\n\n## 许可证\n\n采用 MIT 许可证。详情请参阅 [LICENSE](LICENSE)。\n\n## 贡献\n\n欢迎提交 Pull Request。对于重大更改，请先提交 issue 进行讨论。联系方式：cleora@synerise.com","# Cleora 快速上手指南\n\nCleora 是一款高性能的图嵌入引擎，无需 GPU、无需负采样，通过单次矩阵乘法即可计算所有可能的随机游走。它以确定性、高准确率和极低内存占用著称，适合生产环境部署。\n\n## 环境准备\n\n*   **操作系统**：支持 Linux、macOS 和 Windows。\n*   **Python 版本**：建议 Python 3.8 及以上版本。\n*   **硬件要求**：**无需 GPU**。基于 Rust 核心构建，充分利用 CPU 多核并行计算，内存占用极低（比同类工具少 50 倍）。\n*   **前置依赖**：无重型依赖（如 PyTorch 或 TensorFlow）。基础安装仅需 `pip`。若需可视化功能，可选安装 `matplotlib`。\n\n> **国内加速提示**：建议使用国内镜像源安装以提升下载速度。\n> ```bash\n> pip install pycleora -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 安装步骤\n\n### 1. 基础安装\n安装核心库（约 5 MB）：\n```bash\npip install pycleora\n```\n\n### 2. 可选扩展\n如需使用可视化功能或完整依赖包（包含 `networkx`, `tqdm` 等）：\n```bash\n# 仅安装可视化工具\npip install pycleora[viz]\n\n# 安装完整功能包\npip install pycleora[full]\n```\n\n## 基本使用\n\n### 方法一：Python API（推荐）\n\n以下示例展示如何从边列表构建图、生成嵌入向量并查找相似节点。\n\n```python\nfrom pycleora import SparseMatrix, embed, find_most_similar\n\n# 1. 准备数据：格式为 \"源节点 目标节点\"\nedges = [\"alice item_laptop\", \"alice item_mouse\", \"bob item_keyboard\"]\n\n# 2. 
构建稀疏矩阵\n# columns 参数定义列类型，支持异构图谱 (complex::reflexive::product)\ngraph = SparseMatrix.from_iterator(iter(edges), \"complex::reflexive::product\")\n\n# 3. 生成嵌入向量\n# 默认特征维度 256，迭代 40 次，每步自动进行白化处理\nembeddings = embed(graph, feature_dim=256, num_iterations=40)\n\n# 4. 查找最相似的实体\nsimilar = find_most_similar(graph, embeddings, \"alice\", top_k=5)\nfor r in similar:\n    print(f\"{r['entity_id']}: {r['similarity']:.4f}\")\n```\n\n### 方法二：命令行工具 (CLI)\n\nCleora 提供了完整的命令行接口，适合处理大规模 TSV 文件。\n\n**生成嵌入：**\n```bash\npycleora embed --input graph.tsv --output embeddings.npz --dim 256 --iterations 40\n```\n\n**查看图信息：**\n```bash\npycleora info --input graph.tsv\n```\n\n**查找相似节点：**\n```bash\npycleora similar --input graph.tsv --entity alice --top-k 10\n```\n\n**运行基准测试：**\n```bash\npycleora benchmark --dataset karate_club\n```\n\n### 进阶控制（手动模式）\n\n如果需要完全控制传播过程（如自定义归一化方式），可以使用底层 API：\n\n```python\nfrom pycleora import SparseMatrix, whiten_embeddings\nimport numpy as np\n\n# 数据准备：每行是一组以空格分隔、处于同一上下文中的实体\nedges = [\"alice item_laptop\", \"alice item_mouse\", \"bob item_keyboard\"]\nmat = SparseMatrix.from_iterator(iter(edges), columns='complex::reflexive::product')\n\n# 确定性初始化\nembeddings = mat.initialize_deterministically(256)\n\nNUM_ITERATIONS = 40\nfor i in range(NUM_ITERATIONS):\n    # 左马尔可夫传播\n    embeddings = mat.left_markov_propagate(embeddings)\n    # L2 归一化\n    embeddings \u002F= np.linalg.norm(embeddings, ord=2, axis=-1, keepdims=True)\n    # 白化处理\n    embeddings = whiten_embeddings(embeddings)\n\n# 输出结果\nfor entity, embedding in zip(mat.entity_ids, embeddings):\n    print(entity, embedding)\n```","某大型电商平台的推荐算法团队正面临海量用户 - 商品交互数据的实时嵌入更新挑战，需要在有限算力下快速生成高质量的实体向量以支撑个性化推荐。\n\n### 没有 cleora 时\n- **训练效率低下**：传统图嵌入模型（如 GraphSAGE）依赖负采样和复杂的随机游走模拟，处理亿级边数据时耗时极长，无法满足每日多次迭代的需求。\n- **硬件成本高昂**：现有方案严重依赖高性能 GPU 集群进行加速，导致基础设施维护成本和电力消耗居高不下。\n- **结果不稳定**：由于随机初始化和采样过程的噪声干扰，每次运行生成的嵌入向量存在波动，导致线上推荐效果忽高忽低，难以复现最优状态。\n- **内存容易溢出**：面对稀疏且异构的大规模关系图，旧有算法内存占用巨大，常因显存不足导致任务崩溃，不得不频繁进行数据分片处理。\n\n### 使用 cleora 后\n- **极速计算响应**：cleora 通过单次矩阵乘法直接计算所有可能的随机游走，无需负采样，将原本数小时的训练过程缩短至分钟级，速度提升高达 240 倍。\n- **纯 CPU 
高效运行**：完全摆脱对 GPU 的依赖，仅需普通 CPU 服务器即可承载生产级负载，大幅降低了硬件门槛和运营成本。\n- **确定性输出**：算法具备完全确定性，消除了随机噪声，确保每次生成的嵌入向量严格一致，让推荐系统的 A\u002FB 测试和效果调优更加精准可靠。\n- **线性扩展能力**：凭借极低的内存占用（比 NetMF 少 50 倍），cleora 能轻松处理超大规模异构图谱而不会崩溃，实现了真正的端到端全量数据处理。\n\ncleora 以“无采样、无 GPU、确定性”的核心特性，将图嵌入从昂贵的实验性技术转变为高效、稳定的生产级基础设施。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FBaseModelAI_cleora_924aa480.png","BaseModelAI","BaseModel.AI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FBaseModelAI_bc608554.png","Apply science to behavioral data. Automatically.",null,"office@basemodel.ai","https:\u002F\u002Fbasemodel.ai","https:\u002F\u002Fgithub.com\u002FBaseModelAI",[81,85,89,93,97,100],{"name":82,"color":83,"percentage":84},"Jupyter Notebook","#DA5B0B",50.8,{"name":86,"color":87,"percentage":88},"Python","#3572A5",30.1,{"name":90,"color":91,"percentage":92},"Rust","#dea584",18.9,{"name":94,"color":95,"percentage":96},"Batchfile","#C1F12E",0.1,{"name":98,"color":99,"percentage":96},"Makefile","#427819",{"name":101,"color":102,"percentage":96},"Shell","#89e051",537,57,"2026-04-05T15:21:21","NOASSERTION",1,"Linux, macOS, Windows","不需要 GPU (0 GPUs Required)","未说明 (基准测试中 200 万节点图仅需 4.1GB)",{"notes":112,"python":113,"dependencies":114},"核心由 Rust 编写并编译为单个扩展包，安装包仅约 5MB。无需 CUDA、cuDNN 或 GPU 驱动。支持确定性嵌入（无随机种子影响），原生处理异构超图和加权图。可选依赖用于可视化或完整功能集。","未说明",[115,116,117,118,119,120],"numpy","scipy","pandas","matplotlib (可选)","networkx (可选)","tqdm (可选)",[16,13,14,15],[123,124,125,126,127,128,129,130,131,132,133,134,135],"ai","graphs","synerise","embeddings","ml","machine-learning","pytorch-biggraph","deepwalk","cleora-embeddings","entity","hypergraphs","inductive-entity-embeddings","datasets","2026-03-27T02:49:30.150509","2026-04-13T04:02:48.631035",[139,144,149,154,159,163],{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},31257,"Cleora 支持在线学习（实时更新模型）吗？","目前 Cleora 不支持在线学习或实时用新数据更新模型。维护者表示短期内（1-2 
个月）没有实施该功能的计划。如果未来有大量用户需要此功能，团队可能会考虑分配时间进行开发。建议关注官方更新或贡献代码。","https:\u002F\u002Fgithub.com\u002FBaseModelAI\u002Fcleora\u002Fissues\u002F30",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},31258,"如何复现白皮书中关于商品互补品和替代品的结果？数据格式和参数应如何设置？","复现 Dunhumby 白皮书结果时，输入数据的格式和列类型参数至关重要。不要使用 `transient::user complex::products`，而应使用 `complex::reflexive::CliqueNode`。官方已提供专门的 Colab Notebook 用于复现白皮书结果，其中包含了正确的数据预处理步骤和运行命令。建议直接参考该 Notebook 而不是通用的链接预测示例。","https:\u002F\u002Fgithub.com\u002FBaseModelAI\u002Fcleora\u002Fissues\u002F42",{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},31259,"为什么在单列节点类型图中，距离较远的节点生成的向量相似度反而比相邻节点更高？","这种现象通常发生在二分图（bipartite graph）结构中，或者数据中存在间接连接（如 A->C, C->D）。如果所有节点属于同一实体类型但出现此类情况，请检查数据是否隐含了二分结构。对于二分图，需要参考特定的配置讨论（如 Issue #29）。此外，可以尝试调整迭代次数（-n）或增加维度（-d），但如果最短路径扫描显示短路径节点相似度低，大概率是图结构或列类型配置（如 reflexive 的使用）需要调整。","https:\u002F\u002Fgithub.com\u002FBaseModelAI\u002Fcleora\u002Fissues\u002F28",{"id":155,"question_zh":156,"answer_zh":157,"source_url":158},31260,"在 Windows 上运行时，如果输入文件包含格式错误的行，程序会静默失败或部分计算而不报错吗？","早期版本在 Windows 上遇到格式错误的输入行时可能不会中止且警告不明显。在新版本（如 v1.2.3+）中，该问题已修复。现在程序会输出明确的警告日志信息指出格式错误的行，同时继续处理格式正确的行并生成有效大小的嵌入文件。建议升级到最新版本以获得更好的错误处理和调试信息。","https:\u002F\u002Fgithub.com\u002FBaseModelAI\u002Fcleora\u002Fissues\u002F57",{"id":160,"question_zh":161,"answer_zh":162,"source_url":148},31261,"Cleora 的列类型参数（如 transient, complex, reflexive）具体代表什么含义，如何影响结果？","列类型参数决定了节点在图中的行为模式：`transient` 表示瞬态节点（通常作为连接桥梁，如用户，不生成最终嵌入或不参与某些计算）；`complex` 表示复杂节点；`reflexive` 表示自反关系（节点与自身或同类型节点交互）。例如，在构建商品共现图时，使用 `complex::reflexive::CliqueNode` 可以让商品节点之间形成全连接（clique），从而正确捕捉互补\u002F替代关系。错误的配置（如将本应是 reflexive 的设为普通 complex）会导致向量空间分布不符合预期。",{"id":164,"question_zh":165,"answer_zh":166,"source_url":148},31262,"如何准备交易数据以用于训练 Cleora 模型？","交易数据通常需要按“购物篮”（basket）分组。每一行应包含一个用户 ID 和该用户在该次交易中购买的一系列商品 ID，格式通常为 `user_id \u003Ctab> product_id product_id ...`。可以使用 Pandas 等工具读取原始 CSV，按 `household_key` 和 `BASKET_ID` 分组，然后将商品 ID 列表拼接成字符串写入文本文件。注意不要直接套用链接预测（Link 
Prediction）的数据格式，因为互补品\u002F替代品分析需要不同的图结构构建方式（如 Clique 结构）。",[168,173,178,183,188,193,198,203,208,213,218],{"id":169,"version":170,"summary_zh":171,"released_at":172},230973,"v3.2.1","默认启用白化","2026-04-02T15:27:59",{"id":174,"version":175,"summary_zh":176,"released_at":177},230974,"v3.2.0","## pycleora 3.2.0\n\n**2026年3月 — 性能与生态版本**\n\n图嵌入。极速运行。\n\n---\n\n### 新特性\n\n- **Rust原生全嵌入循环（`embed_fast`）** — 整个嵌入流程（初始化、传播、归一化、迭代）现在都在一次Rust调用中完成。消除了每次迭代时Python与Rust之间的边界切换。在roadNet（200万节点）上速度提升至**3.7倍**：15.8秒 → 4.3秒。在Cora（2500节点）上速度提升至**1.7倍**。\n\n- **嵌入白化（`whiten_embeddings`）** — 一种后处理方法，通过特征值分解对嵌入维度进行均值中心化和去相关处理。在Cora数据集上，将节点分类准确率从0.26提升至0.70。结合多尺度方法，可达到**0.83的准确率**。作为`embed()`函数中的`whiten=True`参数提供。\n\n- **残差连接** — 将传播后的嵌入与前一次迭代的结果混合：`emb = (1-α)·propagated + α·prev`。防止深层迭代时出现过平滑现象。参数为`embed()`中的`residual_weight`。\n\n- **基于收敛的提前停止** — 自动检测嵌入何时趋于稳定（迭代间的RMSE低于阈值）。对于在`max_iterations`之前就收敛的图，可节省计算资源。参数为`embed()`中的`convergence_threshold`。\n\n- **图统计模块（`pycleora.stats`）** — 包含`graph_summary()`、`degree_distribution()`、`clustering_coefficient()`、`connected_components()`、`diameter()`、`betweenness_centrality()`、`pagerank()`等函数。\n\n- **图预处理模块（`pycleora.preprocess`）** — 包括`clean_graph()`、`largest_connected_component()`、`filter_by_degree()`等工具。\n\n- **近似最近邻搜索模块（`pycleora.search`）** — 提供`ANNIndex`类，用于近似最近邻查询。支持通过可选的hnswlib库使用HNSW算法，或在无依赖的情况下回退到球树法。\n\n- **嵌入压缩模块（`pycleora.compress`）** — 包括`pca_compress()`、`random_projection()`、`product_quantize()`等方法，并生成带有`reconstruct()`和`search()`方法的`PQIndex`。\n\n- **嵌入对齐模块（`pycleora.align`）** — 提供`procrustes()`、`cca_align()`、`alignment_score()`等方法。\n\n- **集成嵌入模块（`pycleora.ensemble`）** — `combine()`函数可通过拼接、平均、加权平均或SVD降维等方式合并多个嵌入矩阵。\n\n- **扩展的I\u002FO功能** — 支持从Pandas、Scipy稀疏矩阵、NumPy数组以及边列表加载图数据。\n\n### 改进\n\n- **Rust核心：双缓冲传播** — 交替使用两个预先分配的缓冲区，而非每次迭代都重新分配嵌入矩阵。这降低了垃圾回收压力和内存分配器的开销。\n\n- **Rust核心：更快的初始化哈希** — 在`init_value()`中将SipHash（`DefaultHasher`）替换为FxHash。FxHash的运行速度约为0.3周期\u002F字节，而SipHash约为4周期\u002F字节——对于大型图而言，初始化速度提升了10倍。\n\n- **Rust核心：嵌入过程中释放GIL** 
— `py.allow_threads()`会在整个Rust嵌入计算过程中释放Python的全局解释器锁（GIL），从而实现真正的多线程并行计算。\n\n- **Rust核心：向量化友好的内层循环** — SpMM核已重写，采用直接切片访问的方式。","2026-03-31T12:44:47",{"id":179,"version":180,"summary_zh":181,"released_at":182},230975,"v2.0.0","**Cleora** 目前已作为 Python 包 `pycleora` 提供。与先前版本相比，主要改进包括：\n* _性能优化_：嵌入速度提升约 10 倍\n* _性能优化_：显著降低内存占用\n* _最新研究_：提升了嵌入质量\n* _新特性_：除了支持从 TSV 文件创建图之外，还可直接从 Python 迭代器构建图\n* _新特性_：与 NumPy 无缝集成\n* _新特性_：通过自定义嵌入初始化支持物品属性\n* _新特性_：在每一步传播后可调整向量投影或归一化\n\n**重大变更：**\n* 不再支持 _transient_ 修饰符——对于超图嵌入，按瞬态实体分组来创建 `complex::reflexive` 列，效果更佳。\n","2024-11-24T21:52:40",{"id":184,"version":185,"summary_zh":186,"released_at":187},230976,"v1.2.3","### 变更\n- 升级依赖库 ([#60])。\n\n[#60]: https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F60\n\n### 修复\n- 检查输入中是否存在格式错误的行 ([#59])。\n\n[#59]: https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F59","2022-06-29T14:56:24",{"id":189,"version":190,"summary_zh":191,"released_at":192},230977,"v1.2.2","### 变更\n- 允许 cleora 以位置参数的形式接受多个输入文件。命名参数 `input` 即将弃用（[#55]）。\n\n[#55]: https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F55","2022-06-24T13:20:02",{"id":194,"version":195,"summary_zh":196,"released_at":197},230978,"v1.2.1","### 变更\n- 优化了 `--output-format numpy` 模式，使其在写入输出文件时不再需要额外内存（[#50]）。\n- 升级依赖库（[#52]）。\n\n[#50]: https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F50\n[#52]: https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F52","2022-04-13T10:04:48",{"id":199,"version":200,"summary_zh":201,"released_at":202},230979,"v1.2.0","### 新增\n- 在向量初始化时使用默认哈希器（[#47]）。\n\n[#47]: https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F47","2022-03-17T16:52:22",{"id":204,"version":205,"summary_zh":206,"released_at":207},230980,"v1.1.1","### 新增\n- 在训练过程中使用种子初始化嵌入表示（[#27]）。\n\n[#27]: 
https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F27","2021-05-14T12:58:11",{"id":209,"version":210,"summary_zh":211,"released_at":212},230981,"v1.1.0","### 变更\n- 将 `env_logger` 升级至 `0.8.2`，`smallvec` 升级至 `1.5.1`，移除了 `fnv` 哈希器 ([#11])。\n\n[#11]: https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F11\n\n### 新增\n- 为内存中和内存映射文件的嵌入计算添加了测试（快照）([#12])。\n- 支持 `NumPy` 输出格式（可通过程序参数 `--output-format` 使用）([#15])。\n- 包含实验的 Jupyter 笔记本([#16])。\n\n[#12]: https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F12\n[#15]: https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F15\n[#16]: https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F16\n\n### 改进\n- 在 `hash_to_id` 映射中使用 `vector`，实现了非分配式的笛卡尔积，并引入 `ryu` crate 以提升写入速度 ([#13])。\n- 对稀疏矩阵进行了重构（清理、简化、使用迭代器、提速）。同时，将 `Cargo.toml` 中的数据用于 `clap` crate ([#17])。\n- 统一并简化了内存中矩阵和内存映射矩阵的嵌入计算逻辑 ([#18])。\n\n[#13]: https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F13\n[#17]: https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F17\n[#18]: https:\u002F\u002Fgithub.com\u002FSynerise\u002Fcleora\u002Fpull\u002F18","2020-12-23T18:07:46",{"id":214,"version":215,"summary_zh":216,"released_at":217},230982,"v1.0.1","### 修复\n- 跳过读取无效的 UTF-8 行 (#8)。\n- 修复 Clippy 警告 (#7)。\n\n### 新增\n- JSON 支持 (#3)。\n- 快照测试 (#5)。","2020-11-23T16:37:32",{"id":219,"version":220,"summary_zh":221,"released_at":222},230983,"v1.0.0","Initial release.","2020-11-23T08:55:54"]