[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-elki-project--elki":3,"tool-elki-project--elki":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",149489,2,"2026-04-10T11:32:46",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":98,"forks":99,"last_commit_at":100,"license":101,"difficulty_score":32,"env_os":102,"env_gpu":103,"env_ram":102,"env_deps":104,"category_tags":109,"github_topics":111,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":128,"updated_at":129,"faqs":130,"releases":159},6248,"elki-project\u002Felki","elki","ELKI Data Mining Toolkit ","ELKI 是一款基于 Java 开发的开源数据挖掘工具包，全称为“支持索引结构的知识发现应用开发环境”。它主要专注于无监督学习领域的研究，特别是在聚类分析和异常检测算法方面提供了丰富的实现。\n\n在数据挖掘研究中，公平地比较不同算法往往十分困难：要么缺乏统一的代码实现，要么因编程效率差异导致评估结果失真。ELKI 通过将数据挖掘算法与底层数据管理（如索引结构）彻底分离，有效解决了这一痛点。这种架构不仅让研究者能独立评估算法本身的优劣，还通过集成 R*-树等高效索引结构，显著提升了大规模数据处理时的性能与可扩展性。\n\nELKI 特别适合高校研究人员、计算机专业学生以及需要深度定制算法的开发者使用。它的核心设计理念是高度的灵活性与模块化，支持任意数据类型、距离度量标准及文件格式，并允许用户轻松扩展新的研究方法。与 Weka 或 RapidMiner 等通用框架不同，ELKI 
不提供图形化界面导向的流水线操作，而是致力于提供一个参数高度可配、环境公平透明的基准测试平台，帮助社区更科学地验证和对比各类数据挖掘算法的创新价值。","# ELKI\n##### Environment for Developing KDD-Applications Supported by Index-Structures\n[![Unit tests](https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Factions\u002Fworkflows\u002Fgradle-build.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Factions\u002Fworkflows\u002Fgradle-build.yml)\n[![License AGPL-3.0](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-AGPL--3-34D058.svg?labelColor=444D56)](https:\u002F\u002Felki-project.github.io\u002Flicense)\n[![DBLP:conf\u002Fsisap\u002FSchubert22](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCite%20as%3A-DBLP%3Aconf%2Fsisap%2FSchubert22-34D058?labelColor=444D56)](https:\u002F\u002Fdblp.org\u002Frec\u002Fconf\u002Fsisap\u002FSchubert22.html?view=bibtex)\n\n## Quick Summary\nELKI is an open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection.\nIn order to achieve high performance and scalability, ELKI offers many data index structures such as the R*-tree that can provide major performance gains.\nELKI is designed to be easy to extend for researchers and students in this domain, and welcomes contributions in particular of new methods.\nELKI aims at providing a large collection of highly parameterizable algorithms, in order to allow easy and fair evaluation and benchmarking of algorithms. 
\n\n## Download\n\nYou can [download](https:\u002F\u002Felki-project.github.io\u002Freleases\u002F) precompiled ELKI releases from the home page,\nor you can use standard Java dependency management such as Gradle and Maven.\n\nGradle:\n```groovy\ndependencies {\n    compile group: 'io.github.elki-project', name: 'elki', version:'0.8.0'\n}\n```\n\nMaven:\n```xml\n\u003C!-- https:\u002F\u002Fmvnrepository.com\u002Fartifact\u002Fio.github.elki-project\u002Felki -->\n\u003Cdependency>\n    \u003CgroupId>io.github.elki-project\u003C\u002FgroupId>\n    \u003CartifactId>elki\u003C\u002FartifactId>\n    \u003Cversion>0.8.0\u003C\u002Fversion>\n\u003C\u002Fdependency>\n```\n\n## Background\n\nData mining research leads to many algorithms for similar tasks. A fair and useful comparison of these algorithms is difficult due to several reasons:\n * Implementations of comparison partners are not at hand.\n * If implementations of different authors are provided, an evaluation in terms of efficiency is biased to evaluate the efforts of different authors in efficient programming instead of evaluating algorithmic merits.\n\nOn the other hand, efficient data management tools like index-structures can show considerable impact on data mining tasks and are therefore useful for a broad variety of algorithms.\n\nIn ELKI, data mining algorithms and data management tasks are separated and allow for an independent evaluation. This separation makes ELKI unique among data mining frameworks like Weka or Rapidminer and frameworks for index structures like GiST. At the same time, ELKI is open to arbitrary data types, distance or similarity measures, or file formats. The fundamental approach is the independence of file parsers or database connections, data types, distances, distance functions, and data mining algorithms. Helper classes, e.g. 
for algebraic or analytic computations are available for all algorithms on equal terms.\n\n\nWith the development and publication of ELKI, we humbly hope to serve the data mining and database research community beneficially. The framework is **free** for scientific usage (\"free\" as in \"open source\", see [License](https:\u002F\u002Felki-project.github.io\u002Flicense) for details). In case of application of ELKI in scientific publications, we would appreciate credit in the form of a [citation](https:\u002F\u002Felki-project.github.io\u002Fpublications) of the appropriate publication (see [our list of publications](https:\u002F\u002Felki-project.github.io\u002Fpublications)), that is, the publication related to the release of ELKI you were using.\n\nThe people behind ELKI are documented on the [Team](https:\u002F\u002Felki-project.github.io\u002Fteam) page.\n\n\n## The ELKI wiki: Tutorials, HowTos, Documentation\n\nBeginners may want to start at the HowTo documents, [Examples](https:\u002F\u002Felki-project.github.io\u002Fexamples\u002F) and [Tutorials](https:\u002F\u002Felki-project.github.io\u002Ftutorial\u002F) to help with difficult configuration scenarios and beginning with ELKI development.\n\nThis website serves as a community development hub and task tracker for [bug reports](https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Fissues), [Tutorials](https:\u002F\u002Felki-project.github.io\u002Ftutorial\u002F), [FAQ](https:\u002F\u002Felki-project.github.io\u002Ffaq), general issues and development tasks.\n\nThe most important documentation pages are: [Tutorial](https:\u002F\u002Felki-project.github.io\u002Ftutorial\u002F), [JavaDoc](https:\u002F\u002Felki-project.github.io\u002Fdev\u002Fjavadoc), [FAQ](https:\u002F\u002Felki-project.github.io\u002Ffaq),\n[InputFormat](https:\u002F\u002Felki-project.github.io\u002Fhowto\u002Finputformat), [DataTypes](https:\u002F\u002Felki-project.github.io\u002Fdatatypes), 
[DistanceFunctions](https:\u002F\u002Felki-project.github.io\u002Falgorithms\u002Fdistances), [DataSets](https:\u002F\u002Felki-project.github.io\u002Fdatasets\u002F), [Development](https:\u002F\u002Felki-project.github.io\u002Fdev\u002F), [Parameterization](https:\u002F\u002Felki-project.github.io\u002Fdev\u002Fparameterization),\n[Visualization](https:\u002F\u002Felki-project.github.io\u002Falgorithms\u002Fvisualization), [Benchmarking](https:\u002F\u002Felki-project.github.io\u002Fbenchmarking), and the\nlist of [Algorithms](https:\u002F\u002Felki-project.github.io\u002Falgorithms\u002F) and [RelatedPublications](https:\u002F\u002Felki-project.github.io\u002Freferences).\n\n## Getting ELKI: Download and Citation Policy\n\nYou can download ELKI including source code on the [Releases](https:\u002F\u002Felki-project.github.io\u002Freleases\u002F) page.\u003Cbr \u002F> ELKI uses the [AGPLv3 License](https:\u002F\u002Felki-project.github.io\u002Flicense), a well-known open source license.\n\nThere is a list of [Publications](https:\u002F\u002Felki-project.github.io\u002Fpublications) that accompany the ELKI releases. When using ELKI in your scientific work, you should cite the publication corresponding to the ELKI release you are using, to give credit. This also helps to improve the repeatability of your experiments. We would also appreciate if you contributed your algorithm to ELKI to allow others to reproduce your results and compare with your algorithm (which in turn will likely get you citations). 
We try to document every publication used for implementing ELKI: the page [RelatedPublications](https:\u002F\u002Felki-project.github.io\u002Frelated) is generated from the source code annotations.\n\n## Efficiency Benchmarking with ELKI\n\nELKI is quite fast (see [some of our benchmark results](https:\u002F\u002Felki-project.github.io\u002Fbenchmarking)) but the focus lies on a *broad coverage of algorithms and variations*.\nWe discourage cross-platform benchmarking, because it is easy to produce misleading results by comparing apples and oranges. For fair comparability, you should implement all algorithms within ELKI, and use the same APIs. We have also observed that JDK versions have a large impact on the runtime performance. To make your results reproducible, please [cite](https:\u002F\u002Felki-project.github.io\u002Fpublications) the version you have been using. See also [Benchmarking](https:\u002F\u002Felki-project.github.io\u002Fbenchmarking).\n\n\n## Bug Reports and Contact\n\nYou can [browse the open bug reports](https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Fissues) or [create a new bug report](https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Fissues\u002Fnew).\n\nWe also appreciate any comments, suggestions and code contributions.\u003Cbr\u002F> You can contact the core development team by e-mail: `elki () dbs ifi lmu de`\n\n## Design Goals\n\n * Extensibility - ELKI has a very modular design. We want to allow arbitrary combinations of data types, distance functions, algorithms, input formats, index structures and evaluation methods\n * Contributions - ELKI grows only as fast as people contribute. 
By having a modular design that allows small contributions such as single distance functions and single algorithms, we can have students and external contributors participate in the progress of ELKI\n * Completeness - for an exhaustive comparison of methods, we aim at covering as much published and credited work as we can\n * Fairness - It is easy to do an unfair comparison by badly implementing a competitor. We try to implement every method as well as we can, and by publishing the source code allow for external improvements. We try to add all proposed improvements, such as index structures for faster range and kNN queries\n * Performance - the modular architecture of ELKI allows optimized versions of algorithms and index structures for acceleration\n * Progress - ELKI is changing with every release. To accommodate new features and enhance performance, API breakages are unavoidable. We hope to get a stable API with the 1.0 release, but we are not there yet.\n\n## Building ELKI\n\nELKI is built using the [Gradle](https:\u002F\u002Fgradle.org\u002F) wrapper:\n\n    .\u002Fgradlew shadowJar\n\nwill produce a single executable `jar` file named `elki-bundle-\u003CVERSION>.jar`.\n\nIndividual jar files can be built using:\n\n    .\u002Fgradlew jar\n\nA complete build (with tests and JavaDoc, it will take a few minutes) can be triggered as:\n\n    .\u002Fgradlew build\n\nEclipse can build ELKI, and the easiest way is to use `elki-bundle` as classpath, which includes everything enabled.\n","# ELKI\n##### 
基于索引结构支持的KDD应用开发环境\n[![单元测试](https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Factions\u002Fworkflows\u002Fgradle-build.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Factions\u002Fworkflows\u002Fgradle-build.yml)\n[![AGPL-3.0许可证](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-AGPL--3-34D058.svg?labelColor=444D56)](https:\u002F\u002Felki-project.github.io\u002Flicense)\n[![DBLP:conf\u002Fsisap\u002FSchubert22](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCite%20as%3A-DBLP%3Aconf%2Fsisap%2FSchubert22-34D058?labelColor=444D56)](https:\u002F\u002Fdblp.org\u002Frec\u002Fconf\u002Fsisap\u002FSchubert22.html?view=bibtex)\n\n## 快速概览\nELKI是一款用Java编写的开源（AGPLv3）数据挖掘软件。ELKI的重点是算法研究，尤其侧重于聚类分析和异常检测中的无监督方法。\n为了实现高性能和可扩展性，ELKI提供了多种数据索引结构，例如R*-树，这些结构能够显著提升性能。\nELKI的设计宗旨是便于该领域的研究人员和学生进行扩展，并特别欢迎对新方法的贡献。ELKI旨在提供大量高度可参数化的算法，以便轻松、公正地评估和基准测试各种算法。\n\n## 下载\n您可以从主页上[下载](https:\u002F\u002Felki-project.github.io\u002Freleases\u002F)预编译的ELKI发布版本，也可以使用Gradle和Maven等标准的Java依赖管理工具。\n\nGradle:\n```groovy\ndependencies {\n    compile group: 'io.github.elki-project', name: 'elki', version:'0.8.0'\n}\n```\n\nMaven:\n```xml\n\u003C!-- https:\u002F\u002Fmvnrepository.com\u002Fartifact\u002Fio.github.elki-project\u002Felki -->\n\u003Cdependency>\n    \u003CgroupId>io.github.elki-project\u003C\u002FgroupId>\n    \u003CartifactId>elki\u003C\u002FartifactId>\n    \u003Cversion>0.8.0\u003C\u002Fversion>\n\u003C\u002Fdependency>\n```\n\n## 背景\n数据挖掘研究产生了许多用于相似任务的算法。然而，由于以下几个原因，对这些算法进行公平且有意义的比较十分困难：\n * 参与比较的算法实现往往难以获得。\n * 即使提供了不同作者的实现，基于效率的评估也容易偏向于衡量不同作者在高效编程方面的努力，而非算法本身的优劣。\n\n另一方面，高效的 
数据管理工具（如索引结构）可以对数据挖掘任务产生显著影响，因此适用于多种算法。\n\n在ELKI中，数据挖掘算法与数据管理任务被分离，从而允许独立评估。这种分离使ELKI在Weka或Rapidminer等数据挖掘框架以及GiST等索引结构框架中独树一帜。同时，ELKI支持任意数据类型、距离或相似度度量，以及多种文件格式。其核心理念是将文件解析器或数据库连接、数据类型、距离度量、距离函数以及数据挖掘算法相互独立。此外，ELKI还为所有算法提供了平等使用的辅助类，例如用于代数或分析计算的工具。\n\n通过开发和发布ELKI，我们谦逊地希望为数据挖掘和数据库研究社区带来有益的帮助。该框架对于科学研究用途是**免费**的（“免费”即“开源”，详情请参阅[许可证](https:\u002F\u002Felki-project.github.io\u002Flicense)）。如果在科学出版物中使用ELKI，我们非常感谢您能以引用[相应出版物](https:\u002F\u002Felki-project.github.io\u002Fpublications)的形式致谢（请参阅[我们的出版物列表](https:\u002F\u002Felki-project.github.io\u002Fpublications))，即与您所使用的ELKI版本相关的那篇论文。\n\nELKI背后的团队成员信息可在[团队](https:\u002F\u002Felki-project.github.io\u002Fteam)页面中找到。\n\n## ELKI维基：教程、操作指南、文档\n初学者可以从操作指南文档、[示例](https:\u002F\u002Felki-project.github.io\u002Fexamples\u002F)和[教程](https:\u002F\u002Felki-project.github.io\u002Ftutorial\u002F)开始，以帮助解决复杂的配置问题并入门ELKI开发。\n\n本网站既是社区开发中心，也是任务跟踪平台，用于处理[错误报告](https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Fissues)、[教程](https:\u002F\u002Felki-project.github.io\u002Ftutorial\u002F)、[常见问题解答](https:\u002F\u002Felki-project.github.io\u002Ffaq)、一般性问题及开发任务。\n\n最重要的文档页面包括：[教程](https:\u002F\u002Felki-project.github.io\u002Ftutorial\u002F)、[JavaDoc]((https:\u002F\u002Felki-project.github.io\u002Fdev\u002Fjavadoc))、[常见问题解答](https:\u002F\u002Felki-project.github.io\u002Ffaq)、[输入格式](https:\u002F\u002Felki-project.github.io\u002Fhowto\u002Finputformat)、[数据类型](https:\u002F\u002Felki-project.github.io\u002Fdatatypes)、[距离函数](https:\u002F\u002Felki-project.github.io\u002Falgorithms\u002Fdistances)、[数据集](https:\u002F\u002Felki-project.github.io\u002Fdatasets\u002F)、[开发](https:\u002F\u002Felki-project.github.io\u002Fdev\u002F)、[参数化](https:\u002F\u002Felki-project.github.io\u002Fdev\u002Fparameterization)、[可视化](https:\u002F\u002Felki-project.github.io\u002Falgorithms\u002Fvisualization)、[基准测试](https:\u002F\u002Felki-project.github.io\u002Fbenchmarking)，以及[算法列表](https:\u002F\u002Felki-project.github.io\u002Falgorithms\u002F)和[相关文献](https:\u002F\u00
2Felki-project.github.io\u002Freferences)。\n\n## 获取ELKI：下载与引用政策\n您可以在[发布页面](https:\u002F\u002Felki-project.github.io\u002Freleases\u002F)下载包含源代码的ELKI。\u003Cbr \u002F> ELKI采用广为人知的开源许可证——[AGPLv3许可证](https:\u002F\u002Felki-project.github.io\u002Flicense)。\n\nELKI发布时会附带一份[出版物列表](https:\u002F\u002Felki-project.github.io\u002Fpublications)。在您的科学研究中使用ELKI时，请引用与您所使用的ELKI版本相对应的那篇论文，以表示感谢。这也有助于提高实验的可重复性。此外，如果您愿意将自己的算法贡献给ELKI，我们将不胜感激，这样其他人就可以复现您的结果并与您的算法进行比较，而这也很可能为您带来更多的引用。我们尽量记录每一篇用于实现ELKI的出版物：[相关文献](https:\u002F\u002Felki-project.github.io\u002Frelated)页面正是根据源代码注释生成的。\n\n## 使用ELKI进行效率基准测试\nELKI的速度相当快（请参阅[部分基准测试结果](https:\u002F\u002Felki-project.github.io\u002Fbenchmarking))，但其重点在于*广泛的算法覆盖及其变体*。\n我们不建议进行跨平台的基准测试，因为将不同的系统或实现放在一起比较很容易得出误导性的结论。为了确保比较的公正性，您应当在ELKI内部实现所有算法，并使用相同的API。此外，我们还观察到Java JDK版本会对运行时性能产生较大影响。为使您的结果具有可重复性，请务必[引用](https:\u002F\u002Felki-project.github.io\u002Fpublications)您所使用的版本。更多信息请参阅[基准测试](https:\u002F\u002Felki-project.github.io\u002Fbenchmarking)。\n\n## 错误报告与联系方式\n\n您可以[浏览当前未解决的错误报告](https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Fissues)或[创建一个新的错误报告](https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Fissues\u002Fnew)。\n\n我们也非常欢迎任何评论、建议以及代码贡献。\u003Cbr\u002F> 您可以通过电子邮件联系核心开发团队：`elki () dbs ifi lmu de`\n\n## 设计目标\n\n * 可扩展性 - ELKI 采用高度模块化的设计。我们希望支持数据类型、距离函数、算法、输入格式、索引结构和评估方法之间的任意组合。\n * 社区参与 - ELKI 的发展速度取决于社区贡献者的参与程度。通过模块化设计，允许小型贡献（如单个距离函数或单个算法），我们可以吸引学生和外部贡献者参与到 ELKI 的发展中。\n * 完整性 - 为了对各种方法进行详尽的比较，我们的目标是尽可能涵盖已发表且被广泛认可的研究成果。\n * 公平性 - 如果实现竞争对手的方法时不够严谨，很容易导致不公平的比较结果。因此，我们力求以尽可能高质量的方式实现每种方法，并通过公开源代码来促进外部改进。同时，我们会积极采纳所有提出的改进建议，例如用于加速范围查询和 kNN 查询的索引结构。\n * 性能 - ELKI 的模块化架构使得算法和索引结构能够被优化，从而提升运行效率。\n * 持续进步 - ELKI 在每次发布中都在不断变化。为了引入新功能和提升性能，API 的破坏性变更在所难免。我们期望在 1.0 版本中达到稳定的 API，但目前尚未实现。\n\n## 构建 ELKI\n\nELKI 使用 [Gradle](https:\u002F\u002Fgradle.org\u002F) 封装脚本来构建：\n\n    .\u002Fgradlew shadowJar\n\n将生成一个名为 `elki-bundle-\u003CVERSION>.jar` 的可执行 JAR 文件。\n\n如果需要单独构建 JAR 文件，可以使用以下命令：\n\n    .\u002Fgradlew jar\n\n要执行完整的构建（包括测试和 
JavaDoc，这可能需要几分钟），可以运行：\n\n    .\u002Fgradlew build\n\nEclipse 也可以用来构建 ELKI，最简单的方式是将 `elki-bundle` 作为类路径，它包含了所有已启用的功能。","# ELKI 快速上手指南\n\nELKI 是一个用 Java 编写的开源数据挖掘软件，专注于聚类分析和异常检测等无监督学习算法的研究。它提供了丰富的索引结构（如 R*-tree）以提升性能，并支持高度可配置的算法评估与基准测试。\n\n## 环境准备\n\n*   **操作系统**：Windows、Linux 或 macOS（跨平台）。\n*   **Java 环境**：需安装 JDK 11 或更高版本（推荐 JDK 17+ 以获得更佳性能）。\n    *   验证安装：`java -version`\n*   **构建工具**（仅源码编译时需要）：项目内置了 Gradle Wrapper，无需单独安装 Gradle。\n*   **内存建议**：处理大型数据集时，建议分配足够的 JVM 堆内存。\n\n## 安装步骤\n\n你可以选择下载预编译包或通过依赖管理工具集成到项目中。\n\n### 方式一：下载预编译包（推荐新手）\n\n1.  访问 [ELKI 发布页面](https:\u002F\u002Felki-project.github.io\u002Freleases\u002F) 下载最新版本的 `elki-bundle-\u003CVERSION>.jar`。\n2.  将下载的 JAR 文件放置在任意目录即可使用。\n\n### 方式二：通过 Maven\u002FGradle 集成\n\n如果你正在开发 Java 项目，可直接在配置文件中添加依赖（当前稳定版本为 0.8.0）：\n\n**Maven (`pom.xml`):**\n```xml\n\u003Cdependency>\n    \u003CgroupId>io.github.elki-project\u003C\u002FgroupId>\n    \u003CartifactId>elki\u003C\u002FartifactId>\n    \u003Cversion>0.8.0\u003C\u002Fversion>\n\u003C\u002Fdependency>\n```\n\n**Gradle (`build.gradle`):**\n```groovy\ndependencies {\n    compile group: 'io.github.elki-project', name: 'elki', version:'0.8.0'\n}\n```\n\n### 方式三：源码编译\n\n如需自定义构建或贡献代码，可克隆仓库并编译：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki.git\ncd elki\n\n# 构建可执行的 bundle JAR 文件\n.\u002Fgradlew shadowJar\n\n# 或者构建完整项目（包含测试和文档）\n.\u002Fgradlew build\n```\n编译完成后，可执行文件位于 `elki-bundle\u002Fbuild\u002Flibs\u002F` 目录下。\n\n## 基本使用\n\nELKI 主要通过命令行运行，也支持作为库调用。以下是最简单的命令行使用示例。\n\n### 1. 准备数据\nELKI 支持多种格式，最常用的是 CSV。创建一个名为 `data.csv` 的文件，内容如下（每行代表一个数据点）：\n```csv\n1.0, 2.0\n2.0, 2.5\n3.0, 3.0\n8.0, 8.0\n8.5, 8.5\n9.0, 9.0\n```\n\n### 2. 运行算法\n使用下载的 `elki-bundle-\u003CVERSION>.jar` 运行一个简单的 DBSCAN 聚类算法：\n\n```bash\njava -jar elki-bundle-0.8.0.jar dbscan -input data.csv -dbscan.epsilon 0.5 -dbscan.minpts 2\n```\n\n*   `-input`: 指定输入数据文件路径。\n*   `-dbscan.epsilon`: 设置邻域半径参数。\n*   `-dbscan.minpts`: 设置核心点的最小邻居数。\n\n### 3. 
查看结果\n运行结束后，ELKI 会在控制台输出聚类统计信息，并在当前目录生成可视化文件（通常是 `.png` 或 `.svg`），展示聚类效果和异常点。\n\n> **提示**：ELKI 拥有大量可配置参数。对于复杂的配置场景，建议查阅官方 [Tutorials](https:\u002F\u002Felki-project.github.io\u002Ftutorial\u002F) 和 [算法列表](https:\u002F\u002Felki-project.github.io\u002Falgorithms\u002F) 获取详细参数说明。","某高校数据科学实验室的研究团队正在评估多种新型无监督异常检测算法，以识别金融交易数据中的欺诈模式。\n\n### 没有 elki 时\n- **复现成本高昂**：研究人员需手动编写不同算法的底层代码，耗费数周时间复现论文逻辑，且难以保证实现准确性。\n- **对比公平性缺失**：由于各算法由不同人编写，运行效率差异源于编码水平而非算法本身，导致基准测试（Benchmark）结果失真。\n- **缺乏索引加速**：处理百万级高维交易记录时，缺少 R*-tree 等高效索引结构支持，单次聚类实验耗时数小时甚至崩溃。\n- **扩展灵活性差**：想要尝试自定义距离函数或特殊数据类型时，需重构大量核心代码，严重阻碍创新验证。\n\n### 使用 elki 后\n- **开箱即用验证**：直接调用 elki 内置的数十种高度参数化算法，几分钟内即可完成从数据加载到模型运行的全流程。\n- **公正算法评估**：所有算法共享统一的数据管理与执行框架，消除了编程实现差异，确保性能对比真实反映算法优劣。\n- **索引性能飞跃**：利用 elki 集成的 R*-tree 等索引结构，大规模数据的离群点检测速度提升数十倍，实验迭代周期从天缩短至分钟。\n- **便捷定制扩展**：通过清晰的模块化设计，研究人员可轻松插入自定义的距离度量或数据解析器，无需改动核心架构。\n\nelki 通过解耦算法与数据管理，为科研人员提供了一个公平、高效且极易扩展的实验环境，极大加速了数据挖掘领域的创新步伐。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Felki-project_elki_c1f98aae.png","elki-project","ELKI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Felki-project_b4e18384.png","ELKI Data Mining Toolkit",null,"https:\u002F\u002Felki-project.github.io\u002F","https:\u002F\u002Fgithub.com\u002Felki-project",[80,84,88,92,95],{"name":81,"color":82,"percentage":83},"Java","#b07219",99.9,{"name":85,"color":86,"percentage":87},"Python","#3572A5",0.1,{"name":89,"color":90,"percentage":91},"GLSL","#5686a5",0,{"name":93,"color":94,"percentage":91},"Batchfile","#C1F12E",{"name":96,"color":97,"percentage":91},"Shell","#89e051",831,324,"2026-04-09T15:49:10","AGPL-3.0","未说明","不需要",{"notes":105,"python":103,"dependencies":106},"ELKI 是基于 Java 开发的数据挖掘软件，无需 Python 环境。构建项目需要安装 Gradle（项目自带 wrapper），运行需要 Java 开发工具包（JDK）。文中特别指出 JDK 版本对运行时性能有较大影响。许可证为 AGPLv3，主要用于科研场景下的聚类分析和异常检测算法研究。",[107,108],"Java 
(JDK)","Gradle",[16,14,110],"其他",[112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127],"data-mining","java","machine-learning","clustering","outlier-detection","anomalydetection","visualization","data-mining-algorithms","indexing","index","cluster-analysis","outliers","time-series","distance-functions","data-science","data-analysis","2026-03-27T02:49:30.150509","2026-04-10T20:47:17.053476",[131,136,140,145,149,154],{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},28270,"如何将 ELKI 的层次聚类结果（PointerHierarchyResult）转换为 SciPy  linkage 函数所需的数据结构？","你需要利用父指针（parent pointers）来构建合并信息。当多个数据库 ID（DBIDs）指向同一个父节点时，应使用父节点的距离值。在 `topologicalSort` 函数中，合并顺序（merge order）被用来在距离相同的情况下进一步区分合并次序。目前访问 `mergeOrder` 的唯一方法是将类放在相同的命名空间中，这是推荐的实现方式。","https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Fissues\u002F100",{"id":137,"question_zh":138,"answer_zh":139,"source_url":135},28271,"ELKI 是否支持处理超过 65,535 个实例的大数据集进行层次聚类？","目前的实现受限于 65,535 个实例。要支持更大的数据集，仅仅更改 `MatrixParadigm` 类是不够的，这需要对算法进行彻底的重写。维护者指出，虽然并行计算距离矩阵（例如使用 BLAS 库优化欧几里得距离）是可行的，但这涉及到底层架构的重大变更，目前库中尚未包含此类重写版本。",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},28272,"在实现测地线距离（Geodetic distance）或 R-Tree 地理索引时，如何正确处理经度环绕和方向判断（向东还是向西）？","代码中的注释可能引起混淆：\"East\" 既可以指“我们需要向东移动到矩形”，也可以指“我们位于矩形的东侧”，这两者是相反的。为了正确判断最短路径，需要测试点是在矩形的左侧还是右侧，这通常通过将矩形的平均经度旋转 180°（即对面的子午线）来实现。此外，角度计算应归一化到 0-360 度（或 0-2π 弧度）范围内。建议参考 ELKI 代码库中新增的单元测试（commit 30a4573），其中包含了使用 Web 工具计算的参考距离，可用于验证不同边界情况（如跨越赤道、靠近极点或跨越国际日期变更线）的实现是否正确。","https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Fissues\u002F29",{"id":146,"question_zh":147,"answer_zh":148,"source_url":144},28273,"为什么在测试测地线距离算法时，将矩形放置在 (0,0) 附近无法发现某些边界错误？","将矩形放置在 (0,0) 会导致一些复杂的边界情况（如需要使用跨轨道距离 cross-track 的边界）退化为赤道，从而掩盖错误。为了全面测试，应使用具有挑战性的地理位置，例如日本和阿根廷的边界框。还需要测试相反侧的情况（例如经度在 [0°;60°] - 180° 范围内的点），以及靠近赤道但距离矩形边 89° 的点，以验证是否应该飞向矩形边而不是角点。此外，还应添加跨越北极点的测试用例，因为在那里可能存在多条最佳路径。",{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},28274,"在 Windows 上使用 KDDCLIApplication 运行 
ELKI 时，如果输入文件路径不存在或格式不对，为什么会报\"Path component should be '\u002F'\"的错误？","这是因为 ELKI 的参数处理机制从 `Path` 切换到了 `URI` 以支持 Jar 包内的资源文件。在 Windows 上，如果手动输入 `file:\u002F\u002F\u002FC:\u002F...` 格式的路径，它会被转换为 `C:\\...` 格式，其中的反斜杠可能导致后续解析出错。简单的 `Paths.get` 无法处理 Jar 包内的文件，因此必须保留 URI 支持。解决此问题的关键是确保路径字符串格式正确，或者等待库对错误信息进行优化，以便更清晰地提示文件不存在而非路径格式错误。","https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Fissues\u002F75",{"id":155,"question_zh":156,"answer_zh":157,"source_url":158},28275,"CLIQUE 聚类算法中，什么样的密集单元（dense units）被认为是“连接”的？","根据原始论文 (AGGR98)，两个密集单元被认为是连接的，当且仅当它们共享一个面（common face）。这意味着它们在 n-1 个维度上是完全相同的，而在剩下的 1 个维度上是相邻的。如果实现中将仅在顶点或边缘接触但未共享完整面的单元视为同一簇，则不符合标准定义。正确的实现应严格检查这种“共享面”的邻接关系。","https:\u002F\u002Fgithub.com\u002Felki-project\u002Felki\u002Fissues\u002F53",[]]