[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-kLabUM--rrcf":3,"tool-kLabUM--rrcf":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",143909,2,"2026-04-07T11:33:18",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":76,"owner_url":77,"languages":78,"stars":87,"forks":88,"last_commit_at":89,"license":90,"difficulty_score":91,"env_os":75,"env_gpu":92,"env_ram":92,"env_deps":93,"category_tags":98,"github_topics":99,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":109,"updated_at":110,"faqs":111,"releases":142},5232,"kLabUM\u002Frrcf","rrcf","🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams","rrcf 是一个专为流式数据设计的开源 Python 库，实现了鲁棒随机切割森林（RRCF）算法，用于高效检测数据中的异常点。在实时监控、物联网传感器网络或金融交易流等场景中，数据往往持续涌入且维度较高，传统方法难以兼顾速度与准确性，而 rrcf 正是为了解决这一痛点而生。\n\n该工具特别适合数据科学家、算法研究人员以及需要处理实时数据流的开发者使用。它能够优雅地应对高维数据，自动降低无关维度的干扰，并能有效识别被大量重复数据掩盖的罕见异常值。与其他黑盒模型不同，rrcf 提供的异常评分机制具有清晰的统计学意义，基于“共谋位移”（CoDisp）概念：如果插入一个新点显著增加了模型的复杂度，则该点极有可能是异常值。\n\n此外，rrcf 支持动态增删数据点，无需重新训练整个模型，非常适合资源受限或要求低延迟的在线应用场景。只需几行代码，用户即可构建随机切割树，实时插入新观测值并获取异常评分，是探索流数据异常检测的理想选择。","# rrcf 🌲🌲🌲\n[![Build Status](https:\u002F\u002Ftravis-ci.org\u002FkLabUM\u002Frrcf.svg?branch=master)](https:\u002F\u002Ftravis-ci.org\u002FkLabUM\u002Frrcf) [![Coverage Status](https:\u002F\u002Fcoveralls.io\u002Frepos\u002Fgithub\u002FkLabUM\u002Frrcf\u002Fbadge.svg?branch=master)](https:\u002F\u002Fcoveralls.io\u002Fgithub\u002FkLabUM\u002Frrcf?branch=master) [![Python 3.6](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.6-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002Frelease\u002Fpython-360\u002F) ![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FkLabUM\u002Frrcf.svg) [![status](http:\u002F\u002Fjoss.theoj.org\u002Fpapers\u002Ff8c83c0b01a984d0dbf934939b53c96d\u002Fstatus.svg)](http:\u002F\u002Fjoss.theoj.org\u002Fpapers\u002Ff8c83c0b01a984d0dbf934939b53c96d)\n\nImplementation of the *Robust Random Cut Forest Algorithm* for anomaly detection by [Guha et al. (2016)](http:\u002F\u002Fproceedings.mlr.press\u002Fv48\u002Fguha16.pdf).\n\n> S. Guha, N. Mishra, G. Roy, & O. Schrijvers, *Robust random cut forest based anomaly\n> detection on streams*, in Proceedings of the 33rd International conference on machine\n> learning, New York, NY, 2016 (pp. 2712-2721).\n\n## About\n\nThe *Robust Random Cut Forest* (RRCF) algorithm is an ensemble method for detecting outliers in streaming data. RRCF offers a number of features that many competing anomaly detection algorithms lack. Specifically, RRCF:\n\n- Is designed to handle streaming data.\n- Performs well on high-dimensional data.\n- Reduces the influence of irrelevant dimensions.\n- Gracefully handles duplicates and near-duplicates that could otherwise mask the presence of outliers.\n- Features an anomaly-scoring algorithm with a clear underlying statistical meaning.\n\nThis repository provides an open-source implementation of the RRCF algorithm and its core data structures for the purposes of facilitating experimentation and enabling future extensions of the RRCF algorithm.\n\n## Documentation\n\nRead the docs [here 📖](https:\u002F\u002Fklabum.github.io\u002Frrcf\u002F).\n\n## Installation\n\nUse `pip` to install `rrcf` via pypi:\n\n```shell\n$ pip install rrcf\n```\n\nCurrently, only Python 3 is supported.\n\n### Dependencies\n\nThe following dependencies are *required* to install and use `rrcf`:\n\n- [numpy](http:\u002F\u002Fwww.numpy.org\u002F) (>= 1.15)\n\nThe following *optional* dependencies are required to run the examples shown in the documentation:\n\n- [pandas](https:\u002F\u002Fpandas.pydata.org\u002F) (>= 0.23)\n- [scipy](https:\u002F\u002Fwww.scipy.org\u002F) (>= 1.2)\n- [scikit-learn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002F) (>= 0.20)\n- [matplotlib](https:\u002F\u002Fmatplotlib.org\u002F) (>= 3.0)\n\nListed version numbers have been tested and are known to work (this does not necessarily preclude older versions).\n\n## Robust random cut trees\n\nA robust random cut tree (RRCT) is a binary search tree that can be used to detect outliers in a point set. A RRCT can be instantiated from a point set. Points can also be added and removed from an RRCT.\n\n### Creating the tree\n\n```python\nimport numpy as np\nimport rrcf\n\n# A (robust) random cut tree can be instantiated from a point set (n x d)\nX = np.random.randn(100, 2)\ntree = rrcf.RCTree(X)\n\n# A random cut tree can also be instantiated with no points\ntree = rrcf.RCTree()\n```\n\n### Inserting points\n\n```python\ntree = rrcf.RCTree()\n\nfor i in range(6):\n    x = np.random.randn(2)\n    tree.insert_point(x, index=i)\n```\n\n```\n─+\n ├───+\n │   ├───+\n │   │   ├──(0)\n │   │   └───+\n │   │       ├──(5)\n │   │       └──(4)\n │   └───+\n │       ├──(2)\n │       └──(3)\n └──(1)\n```\n\n### Deleting points\n\n```\ntree.forget_point(2)\n```\n\n```\n─+\n ├───+\n │   ├───+\n │   │   ├──(0)\n │   │   └───+\n │   │       ├──(5)\n │   │       └──(4)\n │   └──(3)\n └──(1)\n```\n\n## Anomaly score\n\nThe likelihood that a point is an outlier is measured by its collusive displacement (CoDisp): if including a new point significantly changes the model complexity (i.e. bit depth), then that point is more likely to be an outlier.\n\n```python\n# Seed tree with zero-mean, normally distributed data\nX = np.random.randn(100,2)\ntree = rrcf.RCTree(X)\n\n# Generate an inlier and outlier point\ninlier = np.array([0, 0])\noutlier = np.array([4, 4])\n\n# Insert into tree\ntree.insert_point(inlier, index='inlier')\ntree.insert_point(outlier, index='outlier')\n```\n\n```python\ntree.codisp('inlier')\n>>> 1.75\n```\n\n```python\ntree.codisp('outlier')\n>>> 39.0\n```\n\n## Batch anomaly detection\n\nThis example shows how a robust random cut forest can be used to detect outliers in a batch setting. Outliers correspond to large CoDisp.\n\n```python\nimport numpy as np\nimport pandas as pd\nimport rrcf\n\n# Set parameters\nnp.random.seed(0)\nn = 2010\nd = 3\nnum_trees = 100\ntree_size = 256\n\n# Generate data\nX = np.zeros((n, d))\nX[:1000,0] = 5\nX[1000:2000,0] = -5\nX += 0.01*np.random.randn(*X.shape)\n\n# Construct forest\nforest = []\nwhile len(forest) \u003C num_trees:\n    # Select random subsets of points uniformly from point set\n    ixs = np.random.choice(n, size=(n \u002F\u002F tree_size, tree_size),\n                           replace=False)\n    # Add sampled trees to forest\n    trees = [rrcf.RCTree(X[ix], index_labels=ix) for ix in ixs]\n    forest.extend(trees)\n\n# Compute average CoDisp\navg_codisp = pd.Series(0.0, index=np.arange(n))\nindex = np.zeros(n)\nfor tree in forest:\n    codisp = pd.Series({leaf : tree.codisp(leaf) for leaf in tree.leaves})\n    avg_codisp[codisp.index] += codisp\n    np.add.at(index, codisp.index.values, 1)\navg_codisp \u002F= index\n```\n\n![Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FkLabUM_rrcf_readme_9ff51e2ce024.png)\n\n## Streaming anomaly detection\n\nThis example shows how the algorithm can be used to detect anomalies in streaming time series data.\n\n```python\nimport numpy as np\nimport rrcf\n\n# Generate data\nn = 730\nA = 50\ncenter = 100\nphi = 30\nT = 2*np.pi\u002F100\nt = np.arange(n)\nsin = A*np.sin(T*t-phi*T) + center\nsin[235:255] = 80\n\n# Set tree parameters\nnum_trees = 40\nshingle_size = 4\ntree_size = 256\n\n# Create a forest of empty trees\nforest = []\nfor _ in range(num_trees):\n    tree = rrcf.RCTree()\n    forest.append(tree)\n    \n# Use the \"shingle\" generator to create rolling window\npoints = rrcf.shingle(sin, size=shingle_size)\n\n# Create a dict to store anomaly score of each point\navg_codisp = {}\n\n# For each shingle...\nfor index, point in enumerate(points):\n    # For each tree in the forest...\n    for tree in forest:\n        # If tree is above permitted size, drop the oldest point (FIFO)\n        if len(tree.leaves) > tree_size:\n            tree.forget_point(index - tree_size)\n        # Insert the new point into the tree\n        tree.insert_point(point, index=index)\n        # Compute codisp on the new point and take the average among all trees\n        if not index in avg_codisp:\n            avg_codisp[index] = 0\n        avg_codisp[index] += tree.codisp(index) \u002F num_trees\n```\n\n![Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FkLabUM_rrcf_readme_292ce0c9c063.png)\n\n## Obtain feature importance\n\nThis example shows how to estimate the feature importance using the dimension of cut obtained during the calculation of the CoDisp.\n\n\n```python\nimport numpy as np\nimport pandas as pd\nimport rrcf\n\n# Set parameters\nnp.random.seed(0)\nn = 2010\nd = 3\nnum_trees = 100\ntree_size = 256\n\n# Generate data\nX = np.zeros((n, d))\nX[:1000,0] = 5\nX[1000:2000,0] = -5\nX += 0.01*np.random.randn(*X.shape)\n\n# Construct forest\nforest = []\nwhile len(forest) \u003C num_trees:\n    # Select random subsets of points uniformly from point set\n    ixs = np.random.choice(n, size=(n \u002F\u002F tree_size, tree_size),\n                           replace=False)\n    # Add sampled trees to forest\n    trees = [rrcf.RCTree(X[ix], index_labels=ix) for ix in ixs]\n    forest.extend(trees)\n\n\n# Compute average CoDisp with the cut dimension for each point\ndim_codisp = np.zeros([n,d],dtype=float)\nindex = np.zeros(n)\nfor tree in forest:\n    for leaf in tree.leaves:\n        codisp,cutdim = tree.codisp_with_cut_dimension(leaf)\n        \n        dim_codisp[leaf,cutdim] += codisp \n\n        index[leaf] += 1\n\navg_codisp = dim_codisp.sum(axis=1)\u002Findex\n\n#codisp anomaly threshold and calculate the mean over each feature\nfeature_importance_anomaly = np.mean(dim_codisp[avg_codisp>50,:],axis=0)\n#create a dataframe with the feature importance\ndf_feature_importance = pd.DataFrame(feature_importance_anomaly,columns=['feature_importance'])\ndf_feature_importance\n```\n![Image](https:\u002F\u002Fraw.githubusercontent.com\u002FkLabUM\u002Frrcf\u002Fmaster\u002Ffeature_importance.png)\n\n\n\n## Contributing\n\nWe welcome contributions to the `rrcf` repo. To contribute, submit a [pull request](https:\u002F\u002Fhelp.github.com\u002Fen\u002Farticles\u002Fabout-pull-requests) to the `dev` branch.\n\n#### Types of contributions\n\nSome suggested types of contributions include:\n\n- Bug fixes\n- Documentation improvements\n- Performance enhancements\n- Extensions to the algorithm\n\nCheck the issue tracker for any specific issues that need help. If you encounter a problem using `rrcf`, or have an idea for an extension, feel free to raise an issue.\n\n#### Guidelines for contributors\n\nPlease consider the following guidelines when contributing to the codebase:\n\n- Ensure that any new methods, functions or classes include docstrings. Docstrings should include a description of the code, as well as descriptions of the inputs (arguments) and outputs (returns). Providing an example use case is recommended (see existing methods for examples).\n- Write unit tests for any new code and ensure that all tests are passing with no warnings. Please ensure that overall code coverage does not drop below 80%.\n\n#### Running unit tests\n\nTo run unit tests, first ensure that `pytest` and `pytest-cov` are installed:\n\n```\n$ pip install pytest pytest-cov\n```\n\nTo run the tests, navigate to the root directory of the repo and run:\n\n```\n$ pytest --cov=rrcf\u002F\n```\n\n## Citing\n\nIf you have used this codebase in a publication and wish to cite it, please use the [`Journal of Open Source Software article`](https:\u002F\u002Fjoss.theoj.org\u002Fpapers\u002F10.21105\u002Fjoss.01336).\n\n> M. Bartos, A. Mullapudi, & S. Troutman, *rrcf: Implementation of the Robust\n> Random Cut Forest algorithm for anomaly detection on streams*,\n> in: Journal of Open Source Software, The Open Journal, Volume 4, Number 35.\n> 2019\n\n```bibtex\n@article{bartos_2019_rrcf,\n  title={{rrcf: Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams}},\n  authors={Matthew Bartos and Abhiram Mullapudi and Sara Troutman},\n  journal={{The Journal of Open Source Software}},\n  volume={4},\n  number={35},\n  pages={1336},\n  year={2019}\n}\n```\n","# rrcf 🌲🌲🌲\n[![构建状态](https:\u002F\u002Ftravis-ci.org\u002FkLabUM\u002Frrcf.svg?branch=master)](https:\u002F\u002Ftravis-ci.org\u002FkLabUM\u002Frrcf) [![覆盖率状态](https:\u002F\u002Fcoveralls.io\u002Frepos\u002Fgithub\u002FkLabUM\u002Frrcf\u002Fbadge.svg?branch=master)](https:\u002F\u002Fcoveralls.io\u002Fgithub\u002FkLabUM\u002Frrcf?branch=master) [![Python 3.6](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.6-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002Frelease\u002Fpython-360\u002F) ![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FkLabUM\u002Frrcf.svg) [![状态](http:\u002F\u002Fjoss.theoj.org\u002Fpapers\u002Ff8c83c0b01a984d0dbf934939b53c96d\u002Fstatus.svg)](http:\u002F\u002Fjoss.theoj.org\u002Fpapers\u002Ff8c83c0b01a984d0dbf934939b53c96d)\n\n由 [Guha 等人 (2016)](http:\u002F\u002Fproceedings.mlr.press\u002Fv48\u002Fguha16.pdf) 提出的用于异常检测的 *鲁棒随机切割森林算法* 的实现。\n\n> S. Guha, N. Mishra, G. Roy, & O. Schrijvers，《基于鲁棒随机切割森林的流式数据异常检测》，载于第33届国际机器学习会议论文集，纽约，纽约州，2016年（第2712–2721页）。\n\n## 关于\n\n*鲁棒随机切割森林*（RRCF）算法是一种用于检测流式数据中离群点的集成方法。RRCF 具有许多竞争性异常检测算法所不具备的功能。具体而言，RRCF：\n\n- 专为处理流式数据而设计。\n- 在高维数据上表现良好。\n- 减少无关维度的影响。\n- 能够优雅地处理重复和近似重复的数据，这些数据可能会掩盖离群点的存在。\n- 拥有一个具有明确统计意义的异常评分算法。\n\n本仓库提供了 RRCF 算法及其核心数据结构的开源实现，旨在促进实验研究，并为 RRCF 算法的未来扩展提供支持。\n\n## 文档\n\n请在此处阅读文档 📖[链接]。\n\n## 安装\n\n使用 `pip` 通过 pypi 安装 `rrcf`：\n\n```shell\n$ pip install rrcf\n```\n\n目前仅支持 Python 3。\n\n### 依赖项\n\n安装和使用 `rrcf` 需要以下依赖项：\n\n- [numpy](http:\u002F\u002Fwww.numpy.org\u002F)（>= 1.15）\n\n运行文档中示例所需的以下可选依赖项：\n\n- [pandas](https:\u002F\u002Fpandas.pydata.org\u002F)（>= 0.23）\n- [scipy](https:\u002F\u002Fwww.scipy.org\u002F)（>= 1.2）\n- [scikit-learn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002F)（>= 0.20）\n- [matplotlib](https:\u002F\u002Fmatplotlib.org\u002F)（>= 3.0）\n\n列出的版本号已经过测试并确认可用（但这并不排除较旧版本的可能性）。\n\n## 鲁棒随机切割树\n\n鲁棒随机切割树（RRCT）是一种二叉搜索树，可用于检测点集中的离群点。RRCT 可以从点集实例化。也可以向 RRCT 中添加或删除点。\n\n### 创建树\n\n```python\nimport numpy as np\nimport rrcf\n\n# 可以从点集（n x d）实例化一棵（鲁棒）随机切割树\nX = np.random.randn(100, 2)\ntree = rrcf.RCTree(X)\n\n# 随机切割树也可以在没有点的情况下实例化\ntree = rrcf.RCTree()\n```\n\n### 插入点\n\n```python\ntree = rrcf.RCTree()\n\nfor i in range(6):\n    x = np.random.randn(2)\n    tree.insert_point(x, index=i)\n```\n\n```\n─+\n ├───+\n │   ├───+\n │   │   ├──(0)\n │   │   └───+\n │   │       ├──(5)\n │   │       └──(4)\n │   └───+\n │       ├──(2)\n │       └──(3)\n └──(1)\n```\n\n### 删除点\n\n```\ntree.forget_point(2)\n```\n\n```\n─+\n ├───+\n │   ├───+\n │   │   ├──(0)\n │   │   └───+\n │   │       ├──(5)\n │   │       └──(4)\n │   └──(3)\n └──(1)\n```\n\n## 异常评分\n\n一个点是离群点的可能性由其协同位移（CoDisp）来衡量：如果加入一个新的点会显著改变模型复杂度（即位深），那么该点更有可能是离群点。\n\n```python\n# 使用零均值、正态分布的数据初始化树\nX = np.random.randn(100,2)\ntree = rrcf.RCTree(X)\n\n# 生成一个内点和一个离群点\ninlier = np.array([0, 0])\noutlier = np.array([4, 4])\n\n# 插入树中\ntree.insert_point(inlier, index='inlier')\ntree.insert_point(outlier, index='outlier')\n```\n\n```python\ntree.codisp('inlier')\n>>> 1.75\n```\n\n```python\ntree.codisp('outlier')\n>>> 39.0\n```\n\n## 批量异常检测\n\n此示例展示了如何使用鲁棒随机切割森林在批量环境中检测离群点。离群点对应于较大的 CoDisp 值。\n\n```python\nimport numpy as np\nimport pandas as pd\nimport rrcf\n\n# 设置参数\nnp.random.seed(0)\nn = 2010\nd = 3\nnum_trees = 100\ntree_size = 256\n\n# 生成数据\nX = np.zeros((n, d))\nX[:1000,0] = 5\nX[1000:2000,0] = -5\nX += 0.01*np.random.randn(*X.shape)\n\n# 构建森林\nforest = []\nwhile len(forest) \u003C num_trees:\n    # 从点集中均匀随机选择子集\n    ixs = np.random.choice(n, size=(n \u002F\u002F tree_size, tree_size),\n                           replace=False)\n    # 将采样的树添加到森林中\n    trees = [rrcf.RCTree(X[ix], index_labels=ix) for ix in ixs]\n    forest.extend(trees)\n\n# 计算平均 CoDisp\navg_codisp = pd.Series(0.0, index=np.arange(n))\nindex = np.zeros(n)\nfor tree in forest:\n    codisp = pd.Series({leaf : tree.codisp(leaf) for leaf in tree.leaves})\n    avg_codisp[codisp.index] += codisp\n    np.add.at(index, codisp.index.values, 1)\navg_codisp \u002F= index\n```\n\n![图片](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FkLabUM_rrcf_readme_9ff51e2ce024.png)\n\n## 流式异常检测\n\n此示例展示了如何使用该算法检测流式时间序列数据中的异常。\n\n```python\nimport numpy as np\nimport rrcf\n\n# 生成数据\nn = 730\nA = 50\ncenter = 100\nphi = 30\nT = 2*np.pi\u002F100\nt = np.arange(n)\nsin = A*np.sin(T*t-phi*T) + center\nsin[235:255] = 80\n\n# 设置树的参数\nnum_trees = 40\nshingle_size = 4\ntree_size = 256\n\n# 创建一个空树的森林\nforest = []\nfor _ in range(num_trees):\n    tree = rrcf.RCTree()\n    forest.append(tree)\n    \n# 使用“shingle”生成器创建滚动窗口\npoints = rrcf.shingle(sin, size=shingle_size)\n\n# 创建一个字典存储每个点的异常评分\navg_codisp = {}\n\n# 对于每一个 shingle...\nfor index, point in enumerate(points):\n    # 对于森林中的每一棵树...\n    for tree in forest:\n        # 如果树的叶子数量超过允许的最大值，则移除最老的叶子（FIFO）\n        if len(tree.leaves) > tree_size:\n            tree.forget_point(index - tree_size)\n        # 将新点插入树中\n        tree.insert_point(point, index=index)\n        # 计算新点的 CoDisp，并取所有树的平均值\n        if not index in avg_codisp:\n            avg_codisp[index] = 0\n        avg_codisp[index] += tree.codisp(index) \u002F num_trees\n```\n\n![图片](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FkLabUM_rrcf_readme_292ce0c9c063.png)\n\n## 获取特征重要性\n\n此示例展示了如何在计算 CoDisp 时，利用切割维度来估计特征的重要性。\n\n\n```python\nimport numpy as np\nimport pandas as pd\nimport rrcf\n\n# 设置参数\nnp.random.seed(0)\nn = 2010\nd = 3\nnum_trees = 100\ntree_size = 256\n\n# 生成数据\nX = np.zeros((n, d))\nX[:1000,0] = 5\nX[1000:2000,0] = -5\nX += 0.01*np.random.randn(*X.shape)\n\n# 构建森林\nforest = []\nwhile len(forest) \u003C num_trees:\n    # 从点集中均匀随机选择子集\n    ixs = np.random.choice(n, size=(n \u002F\u002F tree_size, tree_size),\n                           replace=False)\n    # 将采样的树加入森林\n    trees = [rrcf.RCTree(X[ix], index_labels=ix) for ix in ixs]\n    forest.extend(trees)\n\n\n# 计算每个点在切割维度上的平均CoDisp\ndim_codisp = np.zeros([n,d],dtype=float)\nindex = np.zeros(n)\nfor tree in forest:\n    for leaf in tree.leaves:\n        codisp,cutdim = tree.codisp_with_cut_dimension(leaf)\n        \n        dim_codisp[leaf,cutdim] += codisp \n\n        index[leaf] += 1\n\navg_codisp = dim_codisp.sum(axis=1)\u002Findex\n\n# 根据CoDisp异常阈值计算各特征的平均值\nfeature_importance_anomaly = np.mean(dim_codisp[avg_codisp>50,:],axis=0)\n# 创建包含特征重要性的数据框\ndf_feature_importance = pd.DataFrame(feature_importance_anomaly,columns=['feature_importance'])\ndf_feature_importance\n```\n![图片](https:\u002F\u002Fraw.githubusercontent.com\u002FkLabUM\u002Frrcf\u002Fmaster\u002Ffeature_importance.png)\n\n\n\n## 贡献说明\n\n我们欢迎对 `rrcf` 仓库的贡献。如需贡献，请向 `dev` 分支提交一个 [拉取请求](https:\u002F\u002Fhelp.github.com\u002Fen\u002Farticles\u002Fabout-pull-requests)。\n\n#### 贡献类型\n\n以下是一些建议的贡献类型：\n\n- 修复 bug\n- 改进文档\n- 提升性能\n- 扩展算法功能\n\n请查看问题跟踪器，了解是否有需要帮助的具体问题。如果您在使用 `rrcf` 时遇到问题，或有扩展功能的想法，欢迎随时提交问题。\n\n#### 贡献者指南\n\n在向代码库贡献时，请遵循以下指南：\n\n- 确保所有新增的方法、函数或类都包含文档字符串。文档字符串应包括代码描述、输入参数和输出结果的说明。建议提供示例用法（可参考现有方法）。\n- 为任何新代码编写单元测试，并确保所有测试都能通过且无警告。请确保整体代码覆盖率不低于80%。\n\n#### 运行单元测试\n\n要运行单元测试，首先确保已安装 `pytest` 和 `pytest-cov`：\n\n```\n$ pip install pytest pytest-cov\n```\n\n然后导航到仓库的根目录并运行：\n\n```\n$ pytest --cov=rrcf\u002F\n```\n\n## 引用说明\n\n如果您在论文或其他出版物中使用了本代码库并希望引用它，请使用 [`Journal of Open Source Software 文章`](https:\u002F\u002Fjoss.theoj.org\u002Fpapers\u002F10.21105\u002Fjoss.01336)。\n\n> M. Bartos, A. Mullapudi, & S. Troutman, *rrcf: Implementation of the Robust\n> Random Cut Forest algorithm for anomaly detection on streams*,\n> in: Journal of Open Source Software, The Open Journal, 第4卷第35期.\n> 2019\n\n```bibtex\n@article{bartos_2019_rrcf,\n  title={{rrcf: Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams}},\n  authors={Matthew Bartos and Abhiram Mullapudi and Sara Troutman},\n  journal={{The Journal of Open Source Software}},\n  volume={4},\n  number={35},\n  pages={1336},\n  year={2019}\n}\n```","# rrcf 快速上手指南\n\n`rrcf` 是 **鲁棒随机切割森林 (Robust Random Cut Forest)** 算法的 Python 实现，专为流式数据中的异常检测设计。它擅长处理高维数据，能有效识别离群点并降低无关维度的影响。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Linux, macOS 或 Windows\n*   **Python 版本**：仅支持 **Python 3** (推荐 3.6+)\n*   **核心依赖**：\n    *   `numpy` (>= 1.15)\n*   **可选依赖**（运行示例代码时需要）：\n    *   `pandas`, `scipy`, `scikit-learn`, `matplotlib`\n\n## 安装步骤\n\n使用 `pip` 即可快速安装。国内用户建议使用清华或阿里镜像源以加速下载。\n\n**标准安装：**\n```shell\npip install rrcf\n```\n\n**使用国内镜像源加速安装（推荐）：**\n```shell\npip install rrcf -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## 基本使用\n\n以下是使用 `rrcf` 进行异常检测的最简示例。该算法通过计算 **共谋位移 (CoDisp)** 来衡量点的异常程度：CoDisp 值越高，该点越可能是异常点。\n\n### 1. 构建树并插入数据\n\n```python\nimport numpy as np\nimport rrcf\n\n# 生成一些符合正态分布的正常数据 (100 个点，2 维)\nX = np.random.randn(100, 2)\n\n# 初始化随机切割树\ntree = rrcf.RCTree(X)\n\n# 定义一个正常点和一个异常点\ninlier = np.array([0, 0])   # 靠近中心，应为正常\noutlier = np.array([4, 4])  # 远离中心，应为异常\n\n# 将点插入树中\ntree.insert_point(inlier, index='inlier')\ntree.insert_point(outlier, index='outlier')\n```\n\n### 2. 计算异常分数 (CoDisp)\n\n```python\n# 获取正常点的异常分数\nscore_inlier = tree.codisp('inlier')\nprint(f\"Inlier CoDisp: {score_inlier}\")\n# 输出示例：1.75 (数值较小)\n\n# 获取异常点的异常分数\nscore_outlier = tree.codisp('outlier')\nprint(f\"Outlier CoDisp: {score_outlier}\")\n# 输出示例：39.0 (数值显著较大)\n```\n\n### 3. 简单判断逻辑\n\n在实际应用中，您可以设定一个阈值来判断新数据是否为异常：\n\n```python\nthreshold = 10.0  # 根据实际数据分布调整阈值\n\nif score_outlier > threshold:\n    print(\"检测到异常点！\")\nelse:\n    print(\"数据正常。\")\n```\n\n> **提示**：对于生产环境中的流式数据或批量数据，通常建议构建由多棵树组成的“森林” (`Robust Random Cut Forest`) 并取平均 CoDisp 值，以获得更稳定的检测结果。详细用法可参考官方文档。","某大型电商平台的实时风控团队正在处理每秒数万条的用户交易流，急需从海量正常订单中瞬间识别出罕见的欺诈行为。\n\n### 没有 rrcf 时\n- 传统静态模型难以适应交易数据的实时流动，必须定期停机重新训练，导致新出现的欺诈模式存在数小时的检测盲区。\n- 面对包含用户设备、地理位置、行为习惯等几十维特征的高维数据，常规算法容易受无关维度干扰，误报率居高不下。\n- 大量重复的正常交易（如批量采购）会掩盖少数异常点，导致基于密度的检测方法失效，漏掉精心伪装的团伙作弊。\n- 缺乏明确的统计解释，运维人员面对报警只能凭经验猜测，无法量化某个订单为何被判定为异常。\n\n### 使用 rrcf 后\n- rrcf 专为流数据设计，支持动态插入和删除节点，无需重训即可实时同化新交易，将欺诈识别延迟降低至毫秒级。\n- 其鲁棒随机切割机制自动降低无关维度的权重，在高维特征空间中依然能精准锁定真正的异常轨迹。\n- 算法天然免疫重复数据干扰，即使面对海量相似的正常订单，也能敏锐捕捉到那些试图混入其中的离群欺诈点。\n- 提供的“共谋位移”（CoDisp）评分具有清晰的统计学含义，让开发人员能直接依据分数阈值解释报警原因，大幅缩短排查时间。\n\nrrcf 通过其独特的流式处理能力与统计可解释性，将原本滞后的离线风控升级为实时、精准且透明的智能防御体系。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FkLabUM_rrcf_b4626649.png","kLabUM","Real-time water systems lab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FkLabUM_efae012f.png","",null,"https:\u002F\u002Fgithub.com\u002FkLabUM",[79,83],{"name":80,"color":81,"percentage":82},"Python","#3572A5",92.7,{"name":84,"color":85,"percentage":86},"TeX","#3D6117",7.3,522,117,"2026-04-04T17:15:31","MIT",1,"未说明",{"notes":94,"python":95,"dependencies":96},"目前仅支持 Python 3。若需运行文档中的示例代码，还需安装 pandas (>=0.23)、scipy (>=1.2)、scikit-learn (>=0.20) 和 matplotlib (>=3.0)。该工具主要用于流数据异常检测，无特殊硬件加速需求。","3.6+",[97],"numpy>=1.15",[16,14],[100,101,102,103,104,105,106,107,108],"outliers","detect-outliers","tree","random-forest","machine-learning","anomaly-detection","streaming-data","python","robust-random-cut-forest","2026-03-27T02:49:30.150509","2026-04-08T03:53:55.016357",[112,117,122,127,132,137],{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},23724,"如何正确保存（pickle）和加载 rrcf 树模型？直接对树对象使用 pickle.dump 会报错。","不能直接对整个树对象进行 pickle，而应该只序列化树的根节点（root）。加载时，需要创建一个空树，将加载的根节点赋值给它，并重新构建叶子节点映射。具体代码如下：\n\n保存：\npickle.dumps(tree.root)\n\n加载：\nwith open('tree.p', 'rb') as p:\n    root = pickle.load(p)\ntree = rrcf.RCTree()\ntree.root = root\ntree.map_leaves(tree.root, op=(lambda x, leaves: leaves.update({x.i : x})), leaves=tree.leaves)\ntree.ndim = len(next(iter(tree.leaves.values())).x)","https:\u002F\u002Fgithub.com\u002FkLabUM\u002Frrcf\u002Fissues\u002F65",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},23725,"如果遇到 'branch' variable can be referenced before assignment 错误，或者在大数据集上运行遇到问题怎么办？","该问题已在主分支修复。对于大数据集，建议不要逐个处理，而是采用以下策略：\n1. 使用批处理模式（Batch mode）。\n2. 使用并行化处理（Parallelization）。\n3. 在采样策略上，建议使用水库采样（Reservoir Sampling）代替简单的 FIFO 采样，以提高效率和代表性。","https:\u002F\u002Fgithub.com\u002FkLabUM\u002Frrcf\u002Fissues\u002F56",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},23726,"为什么序列化后反序列化的树，其叶子节点数量（len(leaves)）与原始树不一致？","这通常是由于数据中存在重复点导致的。to_dict 方法在早期版本中可能未正确跟踪重复点的原始索引，导致叶子节点字典丢失部分条目。虽然树结构本身可能是正确的，但叶子映射不完整。建议升级到最新版本以获取修复，或检查数据中是否存在大量重复值。","https:\u002F\u002Fgithub.com\u002FkLabUM\u002Frrcf\u002Fissues\u002F71",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},23727,"rrcf 实现中是否支持类似隔离森林的 subsample_size（子采样大小）参数？如何构建基于子采样的森林？","可以通过手动循环来实现子采样森林的构建。具体做法是：从数据集中随机抽取固定大小（subSampleSize）的子集，为每个子集构建一棵树，重复此过程直到达到所需的树的数量。示例代码逻辑如下：\n\nixs = np.random.choice(n, size=(num_subsets, tree_size), replace=False)\ntrees = [rrcf.RCTree(X[ix], index_labels=ix) for ix in ixs]\n\n然后计算所有树中每个点的平均 CoDisp 得分。","https:\u002F\u002Fgithub.com\u002FkLabUM\u002Frrcf\u002Fissues\u002F37",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},23728,"无法使用 copy.deepcopy 或 pickle 复制\u002F保存模型，报错 \"can't pickle module objects\" 如何解决？","这通常是因为使用的 PyPI 版本过旧。请卸载当前版本并从 GitHub 源码安装最新版，新版提供了 to_dict 和 from_dict 方法来替代直接的 pickle 操作，从而解决序列化问题。更新命令：\npip3 install git+https:\u002F\u002Fgithub.com\u002FkLabUM\u002Frrcf --upgrade","https:\u002F\u002Fgithub.com\u002FkLabUM\u002Frrcf\u002Fissues\u002F69",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},23729,"在流式数据处理中，新插入的点会被添加到所有树中吗？如果是，所有树最终会包含相同的点吗？","是的，如果你选择将新点插入到森林中的每一棵树，那么所有树最终都会包含该点。但是，该点在每棵树中的具体位置是随机的，由 insertpoint 算法决定。因此，虽然点集相同，但树的结构（拓扑）在不同树之间会有所不同，这正是集成方法有效性的来源。","https:\u002F\u002Fgithub.com\u002FkLabUM\u002Frrcf\u002Fissues\u002F60",[143,148,153,158,163,168,173],{"id":144,"version":145,"summary_zh":146,"released_at":147},145224,"0.4.4","更新已弃用的 NumPy 约定。","2023-03-15T20:10:41",{"id":149,"version":150,"summary_zh":151,"released_at":152},145225,"0.4.3","- 在 setup.py 中添加许可证","2020-06-10T01:13:04",{"id":154,"version":155,"summary_zh":156,"released_at":157},145226,"0.4.1","- 修复与输入输出方法中重复点相关的 bug","2020-05-23T17:57:55",{"id":159,"version":160,"summary_zh":161,"released_at":162},145227,"0.4","- 为 RCTrees 添加输入\u002F输出方法","2020-01-01T22:57:25",{"id":164,"version":165,"summary_zh":166,"released_at":167},145228,"0.3.2","- 向 RCTree 添加随机数生成器\r\n- 修复叶节点深度为负值的 bug","2019-05-26T03:32:49",{"id":169,"version":170,"summary_zh":171,"released_at":172},145229,"0.3.1","Zenodo 发布。","2019-03-28T18:35:02",{"id":174,"version":175,"summary_zh":176,"released_at":177},145230,"0.3","本次发布整合了 JOSS 评审中的所有建议：\nhttps:\u002F\u002Fgithub.com\u002Fopenjournals\u002Fjoss-reviews\u002Fissues\u002F1336\n\n此外，本次发布还提高了测试覆盖率，并增加了更多文档。","2019-03-26T05:43:36"]