[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-deepseek-ai--3FS":3,"tool-deepseek-ai--3FS":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":83,"stars":123,"forks":124,"last_commit_at":125,"license":126,"difficulty_score":127,"env_os":128,"env_gpu":129,"env_ram":130,"env_deps":131,"category_tags":145,"github_topics":146,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":148,"updated_at":149,"faqs":150,"releases":180},2222,"deepseek-ai\u002F3FS","3FS"," A high-performance distributed file system designed to address the challenges of AI training and inference workloads. 
","3FS（Fire-Flyer File System）是一款专为人工智能训练与推理场景打造的高性能分布式文件系统。它旨在解决大模型开发中面临的海量数据读写瓶颈、存储一致性难保障以及传统缓存成本过高等核心痛点。\n\n该系统特别适合从事大规模 AI 模型训练的算法工程师、系统架构师及科研人员使用。3FS 通过独特的“分离式架构”，能够聚合数千块 SSD 的吞吐能力与数百个存储节点的网络带宽，让应用无需关心数据物理位置即可高效访问。其技术亮点包括：采用 CRAQ 协议确保强一致性，简化了分布式应用的开发逻辑；提供标准的文件接口，开发者无需学习新的 API 即可上手；支持基于事务键值对的无状态元数据服务。\n\n在实际应用中，3FS 不仅能高效管理数据分析流水线的中间产物，还能消除训练时对数据预取或打乱的依赖，实现跨节点的随机高速访问。此外，它为推理过程中的 KVCache 提供了一种高性价比的替代方案，相比传统内存缓存，在保持高吞吐的同时显著扩大了容量。测试数据显示，在大规模集群下，3FS 的聚合读取吞吐量可高达 6.6 TiB\u002Fs，是构建高性能 AI 基础设施的理想选择。","#  Fire-Flyer File System\n\n[![Build](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3fs\u002Factions\u002Fworkflows\u002Fbuild.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3fs\u002Factions\u002Fworkflows\u002Fbuild.yml)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLICENSE-MIT-blue.svg)](LICENSE)\n\nThe Fire-Flyer File System (3FS) is a high-performance distributed file system designed to address the challenges of AI training and inference workloads. It leverages modern SSDs and RDMA networks to provide a shared storage layer that simplifies development of distributed applications. Key features and benefits of 3FS include:\n\n- Performance and Usability\n  - **Disaggregated Architecture** Combines the throughput of thousands of SSDs and the network bandwidth of hundreds of storage nodes, enabling applications to access storage resource in a locality-oblivious manner.\n  - **Strong Consistency** Implements Chain Replication with Apportioned Queries (CRAQ) for strong consistency, making application code simple and easy to reason about.\n  - **File Interfaces** Develops stateless metadata services backed by a transactional key-value store (e.g., FoundationDB). The file interface is well known and used everywhere. 
There is no need to learn a new storage API.\n\n- Diverse Workloads\n  - **Data Preparation** Organizes outputs of data analytics pipelines into hierarchical directory structures and manages a large volume of intermediate outputs efficiently.\n  - **Dataloaders** Eliminates the need for prefetching or shuffling datasets by enabling random access to training samples across compute nodes.\n  - **Checkpointing** Supports high-throughput parallel checkpointing for large-scale training.\n  - **KVCache for Inference** Provides a cost-effective alternative to DRAM-based caching, offering high throughput and significantly larger capacity.\n\n## Documentation\n\n* [Design Notes](docs\u002Fdesign_notes.md)\n* [Setup Guide](deploy\u002FREADME.md)\n* [USRBIO API Reference](src\u002Flib\u002Fapi\u002FUsrbIo.md)\n* [P Specifications](.\u002Fspecs\u002FREADME.md)\n\n## Performance\n\n### 1. Peak throughput\n\nThe following figure demonstrates the throughput of read stress test on a large 3FS cluster. This cluster consists of 180 storage nodes, each equipped with 2×200Gbps InfiniBand NICs and sixteen 14TiB NVMe SSDs. Approximately 500+ client nodes were used for the read stress test, with each client node configured with 1x200Gbps InfiniBand NIC. The final aggregate read throughput reached approximately 6.6 TiB\u002Fs with background traffic from training jobs.\n\n![Large block read throughput under stress test on a 180-node cluster](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_3FS_readme_6e837f20872b.jpg)\n\nTo benchmark 3FS, please use our [fio engine for USRBIO](benchmarks\u002Ffio_usrbio\u002FREADME.md).\n\n### 2. GraySort\n\nWe evaluated [smallpond](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Fsmallpond) using the GraySort benchmark, which measures sort performance on large-scale datasets. Our implementation adopts a two-phase approach: (1) partitioning data via shuffle using the prefix bits of keys, and (2) in-partition sorting. 
Both phases read\u002Fwrite data from\u002Fto 3FS.\n\nThe test cluster comprised 25 storage nodes (2 NUMA domains\u002Fnode, 1 storage service\u002FNUMA, 2×400Gbps NICs\u002Fnode) and 50 compute nodes (2 NUMA domains, 192 physical cores, 2.2 TiB RAM, and 1×200 Gbps NIC\u002Fnode). Sorting 110.5 TiB of data across 8,192 partitions completed in 30 minutes and 14 seconds, achieving an average throughput of *3.66 TiB\u002Fmin*.\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_3FS_readme_16422cca802b.png)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_3FS_readme_b28430808a72.png)\n\n### 3. KVCache\n\nKVCache is a technique used to optimize the LLM inference process. It avoids redundant computations by caching the key and value vectors of previous tokens in the decoder layers.\nThe top figure demonstrates the read throughput of all KVCache clients (1×400Gbps NIC\u002Fnode), highlighting both peak and average values, with peak throughput reaching up to 40 GiB\u002Fs. 
The bottom figure presents the IOPS of removing ops from garbage collection (GC) during the same time period.\n\n![KVCache Read Throughput](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_3FS_readme_7c204f769b50.png)\n![KVCache GC IOPS](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_3FS_readme_e18cfbfee9fd.png)\n\n## Check out source code\n\nClone 3FS repository from GitHub:\n\n\tgit clone https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3fs\n\nWhen `deepseek-ai\u002F3fs` has been cloned to a local file system, run the\nfollowing commands to check out the submodules:\n\n```bash\ncd 3fs\ngit submodule update --init --recursive\n.\u002Fpatches\u002Fapply.sh\n```\n\n## Install dependencies\n\nInstall dependencies:\n\n```bash\n# for Ubuntu 20.04.\napt install cmake libuv1-dev liblz4-dev liblzma-dev libdouble-conversion-dev libdwarf-dev libunwind-dev \\\n  libaio-dev libgflags-dev libgoogle-glog-dev libgtest-dev libgmock-dev clang-format-14 clang-14 clang-tidy-14 lld-14 \\\n  libgoogle-perftools-dev google-perftools libssl-dev libclang-rt-14-dev gcc-10 g++-10 libboost1.71-all-dev build-essential\n\n# for Ubuntu 22.04.\napt install cmake libuv1-dev liblz4-dev liblzma-dev libdouble-conversion-dev libdwarf-dev libunwind-dev \\\n  libaio-dev libgflags-dev libgoogle-glog-dev libgtest-dev libgmock-dev clang-format-14 clang-14 clang-tidy-14 lld-14 \\\n  libgoogle-perftools-dev google-perftools libssl-dev gcc-12 g++-12 libboost-all-dev build-essential\n\n# for openEuler 2403sp1\nyum install cmake libuv-devel lz4-devel xz-devel double-conversion-devel libdwarf-devel libunwind-devel \\\n    libaio-devel gflags-devel glog-devel gtest-devel gmock-devel clang-tools-extra clang lld \\\n    gperftools-devel gperftools openssl-devel gcc gcc-c++ boost-devel\n\n# for OpenCloudOS 9 and TencentOS 4\ndnf install epol-release wget git meson cmake perl lld gcc gcc-c++ autoconf lz4 lz4-devel xz xz-devel \\\n    double-conversion-devel libdwarf-devel 
libunwind-devel libaio-devel gflags-devel glog-devel \\\n    libuv-devel gmock-devel gperftools gperftools-devel openssl-devel boost-static boost-devel mono-devel \\\n    libevent-devel libibverbs-devel numactl-devel python3-devel\n```\n\nInstall other build prerequisites:\n\n- [`libfuse`](https:\u002F\u002Fgithub.com\u002Flibfuse\u002Flibfuse\u002Freleases\u002Ftag\u002Ffuse-3.16.1) 3.16.1 or newer version\n- [FoundationDB](https:\u002F\u002Fapple.github.io\u002Ffoundationdb\u002Fgetting-started-linux.html) 7.1 or newer version\n- [Rust](https:\u002F\u002Fwww.rust-lang.org\u002Ftools\u002Finstall) toolchain: minimal 1.75.0, recommended 1.85.0 or newer version (latest stable version) \n\n## Build 3FS\n\nBuild 3FS in `build` folder:\n\n```bash\n# Replace \u003Cmethod> with 'g++10' or 'g++11' based on your environment\ncmake -S . -B build \\\n      -DCMAKE_CXX_COMPILER=clang++-14 -DCMAKE_C_COMPILER=clang-14 \\\n      -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \\\n      -DSHUFFLE_METHOD=\u003Cmethod>\ncmake --build build -j 32\n```\n\nDue to the historical use of `std::shuffle`, binaries compiled with different compiler versions (e.g., `g++10` vs. `g++11 +`) may be incompatible ([issue](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3FS\u002Fissues\u002F368)). To resolve this, you must explicitly specify `-DSHUFFLE_METHOD` during compilation to lock in a consistent shuffle algorithm:\n\n- Existing Clusters: Use the method corresponding to the compiler version previously used to deploy the cluster (`g++10` or `g++11`).\n- New Clusters: You can choose either `g++10` or `g++11`. 
However, once the cluster is deployed, you must stick with the same configuration for all future builds to maintain compatibility.\n\n### Build 3FS using Docker\n- For TencentOS-4: `docker pull docker.io\u002Ftencentos\u002Ftencentos4-deepseek3fs-build:latest`\n- For OpenCloudOS-9: `docker pull docker.io\u002Fopencloudos\u002Fopencloudos9-deepseek3fs-build:latest`\n\n## Run a test cluster\n\nFollow instructions in [setup guide](deploy\u002FREADME.md) to run a test cluster.\n\n## Report Issues\n\nPlease visit https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3fs\u002Fissues to report issues.\n","#  火萤文件系统\n\n[![构建](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3fs\u002Factions\u002Fworkflows\u002Fbuild.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3fs\u002Factions\u002Fworkflows\u002Fbuild.yml)\n[![许可证](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLICENSE-MIT-blue.svg)](LICENSE)\n\n火萤文件系统（3FS）是一个高性能的分布式文件系统，旨在解决人工智能训练和推理工作负载中的挑战。它利用现代固态硬盘和RDMA网络，提供一个共享存储层，从而简化分布式应用程序的开发。3FS的主要特性与优势包括：\n\n- 性能与易用性\n  - **解耦架构** 结合了数千块SSD的吞吐量以及数百个存储节点的网络带宽，使应用能够以不依赖于本地性的形式访问存储资源。\n  - **强一致性** 实现了分摊查询的链式复制（CRAQ）机制来保证强一致性，使得应用代码简单且易于理解。\n  - **文件接口** 基于事务型键值存储（如FoundationDB）构建无状态元数据服务。文件接口广为人知且被广泛应用，无需学习新的存储API。\n\n- 多样化的工作负载\n  - **数据准备** 将数据分析流水线的输出组织成层次化的目录结构，并高效管理大量中间结果。\n  - **数据加载器** 通过支持跨计算节点对训练样本的随机访问，消除了预取或打乱数据集的需求。\n  - **检查点机制** 支持大规模训练中的高吞吐并行检查点操作。\n  - **推理用KV缓存** 提供了一种经济高效的DRAM替代方案，具备高吞吐量和显著更大的容量。\n\n## 文档\n\n* [设计说明](docs\u002Fdesign_notes.md)\n* [部署指南](deploy\u002FREADME.md)\n* [USRBIO API参考](src\u002Flib\u002Fapi\u002FUsrbIo.md)\n* [P规范](.\u002Fspecs\u002FREADME.md)\n\n## 性能\n\n### 1. 
峰值吞吐量\n\n下图展示了在大型3FS集群上进行读取压力测试时的吞吐量。该集群由180个存储节点组成，每个节点配备2张200Gbps InfiniBand网卡和16块14TiB NVMe SSD。大约500多个客户端节点参与了此次读取压力测试，每个客户端节点配置1张200Gbps InfiniBand网卡。最终总读取吞吐量达到约6.6 TiB\u002Fs，同时还有来自训练作业的背景流量。\n\n![180节点集群上的大块读取压力测试吞吐量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_3FS_readme_6e837f20872b.jpg)\n\n要对3FS进行基准测试，请使用我们的[用于USRBIO的fio引擎](benchmarks\u002Ffio_usrbio\u002FREADME.md)。\n\n### 2. GraySort\n\n我们使用GraySort基准测试评估了[smallpond](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Fsmallpond)，该测试用于衡量大规模数据集上的排序性能。我们的实现采用了两阶段方法：(1) 通过键的前缀位进行shuffle分区；(2) 在各分区内进行排序。这两个阶段均从3FS读写数据。\n\n测试集群由25个存储节点（每节点2个NUMA域，1个存储服务\u002FNUMA，2×400Gbps NIC\u002F节点）和50个计算节点（2个NUMA域，192个物理核心，2.2 TiB RAM，1×200 Gbps NIC\u002F节点）组成。在30分14秒内完成了对8,192个分区中110.5 TiB数据的排序，平均吞吐率达到*3.66 TiB\u002Fmin*。\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_3FS_readme_16422cca802b.png)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_3FS_readme_b28430808a72.png)\n\n### 3. 
KVCache\n\nKVCache是一种用于优化LLM推理过程的技术。它通过在解码器层中缓存先前标记的键和值向量，避免了重复计算。\n上图展示了所有KVCache客户端（1×400Gbps NIC\u002F节点）的读取吞吐量，突出了峰值和平均值，其中峰值吞吐量高达40 GiB\u002Fs。下图则展示了在同一时间段内垃圾回收（GC）移除操作的IOPS。\n\n![KVCache读取吞吐量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_3FS_readme_7c204f769b50.png)\n![KVCache GC IOPS](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_3FS_readme_e18cfbfee9fd.png)\n\n## 克隆源代码\n\n从GitHub克隆3FS仓库：\n\n\tgit clone https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3fs\n\n当`deepseek-ai\u002F3fs`已被克隆到本地文件系统后，运行以下命令以检出子模块：\n\n```bash\ncd 3fs\ngit submodule update --init --recursive\n.\u002Fpatches\u002Fapply.sh\n```\n\n## 安装依赖项\n\n安装依赖项：\n\n```bash\n# 对于Ubuntu 20.04。\napt install cmake libuv1-dev liblz4-dev liblzma-dev libdouble-conversion-dev libdwarf-dev libunwind-dev \\\n  libaio-dev libgflags-dev libgoogle-glog-dev libgtest-dev libgmock-dev clang-format-14 clang-14 clang-tidy-14 lld-14 \\\n  libgoogle-perftools-dev google-perftools libssl-dev libclang-rt-14-dev gcc-10 g++-10 libboost1.71-all-dev build-essential\n\n# 对于Ubuntu 22.04。\napt install cmake libuv1-dev liblz4-dev liblzma-dev libdouble-conversion-dev libdwarf-dev libunwind-dev \\\n  libaio-dev libgflags-dev libgoogle-glog-dev libgtest-dev libgmock-dev clang-format-14 clang-14 clang-tidy-14 lld-14 \\\n  libgoogle-perftools-dev google-perftools libssl-dev gcc-12 g++-12 libboost-all-dev build-essential\n\n# 对于openEuler 2403sp1\nyum install cmake libuv-devel lz4-devel xz-devel double-conversion-devel libdwarf-devel libunwind-devel \\\n    libaio-devel gflags-devel glog-devel gtest-devel gmock-devel clang-tools-extra clang lld \\\n    gperftools-devel gperftools openssl-devel gcc gcc-c++ boost-devel\n\n# 对于OpenCloudOS 9和TencentOS 4\ndnf install epol-release wget git meson cmake perl lld gcc gcc-c++ autoconf lz4 lz4-devel xz xz-devel \\\n    double-conversion-devel libdwarf-devel libunwind-devel libaio-devel gflags-devel glog-devel \\\n    libuv-devel 
gmock-devel gperftools gperftools-devel openssl-devel boost-static boost-devel mono-devel \\\n    libevent-devel libibverbs-devel numactl-devel python3-devel\n```\n\n安装其他构建前提条件：\n\n- [`libfuse`](https:\u002F\u002Fgithub.com\u002Flibfuse\u002Flibfuse\u002Freleases\u002Ftag\u002Ffuse-3.16.1) 3.16.1或更高版本\n- [FoundationDB](https:\u002F\u002Fapple.github.io\u002Ffoundationdb\u002Fgetting-started-linux.html) 7.1或更高版本\n- [Rust](https:\u002F\u002Fwww.rust-lang.org\u002Ftools\u002Finstall)工具链：最低1.75.0，推荐1.85.0或更高版本（最新稳定版）\n\n## 构建3FS\n\n在`build`文件夹中构建3FS：\n\n```bash\n\n# 请根据您的环境将 \u003Cmethod> 替换为 'g++10' 或 'g++11'\ncmake -S . -B build \\\n      -DCMAKE_CXX_COMPILER=clang++-14 -DCMAKE_C_COMPILER=clang-14 \\\n      -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \\\n      -DSHUFFLE_METHOD=\u003Cmethod>\ncmake --build build -j 32\n```\n\n由于历史上一直使用 `std::shuffle`，因此使用不同编译器版本（例如 `g++10` 与 `g++11+`）编译的二进制文件可能存在不兼容问题（[问题](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3FS\u002Fissues\u002F368)）。为了解决这一问题，您必须在编译时显式指定 `-DSHUFFLE_METHOD`，以锁定一致的洗牌算法：\n\n- 现有集群：请使用之前部署该集群所用编译器版本对应的洗牌方法（`g++10` 或 `g++11`）。\n- 新建集群：您可以选择 `g++10` 或 `g++11`。然而，一旦集群部署完成，后续的所有构建都必须保持相同的配置，以确保兼容性。\n\n### 使用 Docker 构建 3FS\n- 对于 TencentOS-4：`docker pull docker.io\u002Ftencentos\u002Ftencentos4-deepseek3fs-build:latest`\n- 对于 OpenCloudOS-9：`docker pull docker.io\u002Fopencloudos\u002Fopencloudos9-deepseek3fs-build:latest`\n\n## 运行测试集群\n\n请按照 [部署指南](deploy\u002FREADME.md) 中的说明运行一个测试集群。\n\n## 报告问题\n\n如有问题，请访问 https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3fs\u002Fissues 进行报告。","# 3FS (Fire-Flyer File System) 快速上手指南\n\n3FS 是一款专为 AI 训练和推理负载设计的高性能分布式文件系统。它利用现代 SSD 和 RDMA 网络，提供高吞吐量、强一致性的共享存储层，适用于数据准备、DataLoader、Checkpointing 及 LLM 推理 KVCache 等场景。\n\n## 1. 
环境准备\n\n### 系统要求\n*   **推荐操作系统**：Ubuntu 20.04\u002F22.04, openEuler 2403sp1, OpenCloudOS 9, TencentOS 4。\n*   **硬件建议**：支持 NVMe SSD 和 RDMA (InfiniBand\u002FRoCE) 网络的服务器集群。\n\n### 前置依赖\n在编译前，请根据操作系统安装基础依赖包。\n\n**Ubuntu 20.04:**\n```bash\napt install cmake libuv1-dev liblz4-dev liblzma-dev libdouble-conversion-dev libdwarf-dev libunwind-dev \\\n  libaio-dev libgflags-dev libgoogle-glog-dev libgtest-dev libgmock-dev clang-format-14 clang-14 clang-tidy-14 lld-14 \\\n  libgoogle-perftools-dev google-perftools libssl-dev libclang-rt-14-dev gcc-10 g++-10 libboost1.71-all-dev build-essential\n```\n\n**Ubuntu 22.04:**\n```bash\napt install cmake libuv1-dev liblz4-dev liblzma-dev libdouble-conversion-dev libdwarf-dev libunwind-dev \\\n  libaio-dev libgflags-dev libgoogle-glog-dev libgtest-dev libgmock-dev clang-format-14 clang-14 clang-tidy-14 lld-14 \\\n  libgoogle-perftools-dev google-perftools libssl-dev gcc-12 g++-12 libboost-all-dev build-essential\n```\n\n**openEuler 2403sp1:**\n```bash\nyum install cmake libuv-devel lz4-devel xz-devel double-conversion-devel libdwarf-devel libunwind-devel \\\n    libaio-devel gflags-devel glog-devel gtest-devel gmock-devel clang-tools-extra clang lld \\\n    gperftools-devel gperftools openssl-devel gcc gcc-c++ boost-devel\n```\n\n**OpenCloudOS 9 \u002F TencentOS 4:**\n```bash\ndnf install epol-release wget git meson cmake perl lld gcc gcc-c++ autoconf lz4 lz4-devel xz xz-devel \\\n    double-conversion-devel libdwarf-devel libunwind-devel libaio-devel gflags-devel glog-devel \\\n    libuv-devel gmock-devel gperftools gperftools-devel openssl-devel boost-static boost-devel mono-devel \\\n    libevent-devel libibverbs-devel numactl-devel python3-devel\n```\n\n**其他关键依赖（需手动安装）：**\n*   **libfuse**: 版本 3.16.1 或更高 ([下载链接](https:\u002F\u002Fgithub.com\u002Flibfuse\u002Flibfuse\u002Freleases\u002Ftag\u002Ffuse-3.16.1))\n*   **FoundationDB**: 版本 7.1 或更高 
([安装指南](https:\u002F\u002Fapple.github.io\u002Ffoundationdb\u002Fgetting-started-linux.html))\n*   **Rust Toolchain**: 最低 1.75.0，推荐 1.85.0+ ([安装命令](https:\u002F\u002Fwww.rust-lang.org\u002Ftools\u002Finstall))\n    ```bash\n    curl --proto '=https' --tlsv1.2 -sSf https:\u002F\u002Fsh.rustup.rs | sh\n    ```\n\n## 2. 安装与编译\n\n### 方法一：源码编译\n\n1.  **克隆代码并初始化子模块**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3fs\n    cd 3fs\n    git submodule update --init --recursive\n    .\u002Fpatches\u002Fapply.sh\n    ```\n\n2.  **配置与构建**\n    *注意：由于 `std::shuffle` 的历史兼容性问题，必须指定 `-DSHUFFLE_METHOD`。新建集群可选 `g++10` 或 `g++11`，但集群内所有节点必须保持一致。*\n\n    ```bash\n    # 替换 \u003Cmethod> 为 'g++10' 或 'g++11'\n    cmake -S . -B build \\\n          -DCMAKE_CXX_COMPILER=clang++-14 -DCMAKE_C_COMPILER=clang-14 \\\n          -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \\\n          -DSHUFFLE_METHOD=\u003Cmethod>\n    \n    cmake --build build -j 32\n    ```\n\n### 方法二：使用 Docker 构建（推荐国内用户）\n若本地环境配置复杂，可使用预置环境的 Docker 镜像进行构建。\n\n*   **TencentOS-4 环境:**\n    ```bash\n    docker pull docker.io\u002Ftencentos\u002Ftencentos4-deepseek3fs-build:latest\n    ```\n*   **OpenCloudOS-9 环境:**\n    ```bash\n    docker pull docker.io\u002Fopencloudos\u002Fopencloudos9-deepseek3fs-build:latest\n    ```\n\n## 3. 基本使用\n\n3FS 的核心功能是作为高性能共享存储运行。最简单的使用方式是部署一个测试集群。\n\n1.  **部署测试集群**\n    请参考官方部署指南启动服务（包含元数据服务和存储节点）：\n    [查看部署指南 (deploy\u002FREADME.md)](deploy\u002FREADME.md)\n\n2.  **挂载与访问**\n    部署完成后，3FS 将通过 FUSE 挂载点暴露标准文件接口。您可以像操作普通本地目录一样操作它：\n    ```bash\n    # 示例：将数据写入挂载点\n    cp large_dataset.bin \u002Fmnt\u002F3fs_mount_point\u002Fdata\u002F\n    \n    # 示例：在训练脚本中直接读取\n    python train.py --data_dir \u002Fmnt\u002F3fs_mount_point\u002Fdata\u002F\n    ```\n\n3.  
**性能基准测试（可选）**\n    如需验证读写性能，可使用内置的 fio 测试引擎：\n    [查看 Benchmark 说明 (benchmarks\u002Ffio_usrbio\u002FREADME.md)](benchmarks\u002Ffio_usrbio\u002FREADME.md)\n\n> **提示**：生产环境部署涉及多节点配置、RDMA 网络调优及 FoundationDB 集群搭建，请务必详细阅读 [Setup Guide](deploy\u002FREADME.md)。","某大型 AI 实验室正在训练千亿参数大模型，需同时处理海量训练数据加载、高频断点续训及推理阶段的 KVCache 缓存。\n\n### 没有 3FS 时\n- **数据加载瓶颈**：传统存储无法支撑数百个计算节点并发随机读取训练样本，被迫采用复杂的数据预取和打乱策略，仍常因 I\u002FO 等待导致 GPU 闲置。\n- **断点保存缓慢**：大规模训练 checkpointing 时，串行或低吞吐的写入过程耗时极长，显著拉长整体训练周期，且故障恢复风险高。\n- **推理成本高昂**：推理阶段依赖昂贵的 DRAM 存储 KVCache，显存容量受限导致批处理大小（Batch Size）难以提升，单位 token 生成成本居高不下。\n- **开发维护复杂**：缺乏强一致性保证，分布式应用需自行处理数据竞争和状态同步问题，代码逻辑复杂且容易出错。\n\n### 使用 3FS 后\n- **极致数据吞吐**：利用 3FS 的解耦架构和 RDMA 网络，实现跨节点数据的原地随机访问，彻底消除预取需求，GPU 利用率接近 100%。\n- **高速并行落盘**：支持高吞吐并行检查点写入，将原本数分钟的保存过程压缩至秒级，大幅缩短训练迭代时间并提升容错效率。\n- **大容量低成本缓存**：将 KVCache 卸载至 3FS，以 SSD 的低成本获得远超 DRAM 的缓存容量，显著提升推理吞吐量并降低单次调用成本。\n- **简化开发逻辑**：基于 CRAQ 协议的强一致性特性，让开发者无需关心底层数据同步，直接使用标准文件接口即可构建可靠的分布式应用。\n\n3FS 通过融合高性能 SSD 与 RDMA 网络，为 AI 全链路工作负载提供了统一、极速且易用的共享存储底座。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_3FS_6e837f20.jpg","deepseek-ai","DeepSeek","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fdeepseek-ai_04503588.png","",null,"service@deepseek.com","https:\u002F\u002Fwww.deepseek.com\u002F","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai",[84,88,92,96,100,104,108,112,116,119],{"name":85,"color":86,"percentage":87},"C++","#f34b7d",87,{"name":89,"color":90,"percentage":91},"Rust","#dea584",4.4,{"name":93,"color":94,"percentage":95},"Gnuplot","#f0a9f0",3.4,{"name":97,"color":98,"percentage":99},"Python","#3572A5",2.1,{"name":101,"color":102,"percentage":103},"C","#555555",1.6,{"name":105,"color":106,"percentage":107},"CMake","#DA3434",0.8,{"name":109,"color":110,"percentage":111},"Shell","#89e051",0.3,{"name":113,"color":114,"percentage":115},"Dockerfile","#384d54",0.2,{"name":117,"color":118,"percentage":115},"PowerShell","#012456",{"name":120,"color":121,"percentage":122},"Makefile","#427819",0,9792,1026,"
2026-04-05T09:32:44","MIT",5,"Linux (Ubuntu 20.04, Ubuntu 22.04, openEuler 2403sp1, OpenCloudOS 9, TencentOS 4)","未说明","未说明 (测试集群计算节点配置为 2.2 TiB RAM，但非最低运行要求)",{"notes":132,"python":133,"dependencies":134},"该工具是高性能分布式文件系统，主要依赖 RDMA 网络（如 InfiniBand）和 NVMe SSD 硬件以达到最佳性能。编译时需根据集群历史版本指定 SHUFFLE_METHOD (g++10 或 g++11) 以确保二进制兼容性。支持通过 Docker 镜像进行构建。元数据服务依赖 FoundationDB。","未说明 (仅需 python3-devel 用于开发依赖)",[135,136,137,138,139,140,141,142,143,144],"cmake","libuv","liblz4","liblzma","clang-14 \u002F gcc-10 or gcc-12","libfuse (>=3.16.1)","FoundationDB (>=7.1)","Rust toolchain (>=1.75.0, 推荐 >=1.85.0)","libboost","libaio",[13,51],[147],"distributed-file-system","2026-03-27T02:49:30.150509","2026-04-06T08:17:44.859271",[151,156,161,166,171,175],{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},10222,"在 Ubuntu 22.04 上构建项目失败（gmake Error 2）该如何解决？","构建失败通常是因为系统缺少某些必要的工具或依赖包。由于并行编译（-j）可能会掩盖具体的错误信息，建议使用单线程编译（添加 `-j 1` 参数）来查看具体的报错内容。此外，如果看到类似 `apache-arrow-cpp` 完成后的错误，请向上滚动日志查找更早出现的实际错误原因。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3FS\u002Fissues\u002F54",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},10223,"部署 Storage Service 时启动失败并报错，常见原因是什么？","最常见的原因是配置文件中的 `node_id` 填写错误。请检查存储节点的配置文件，确保 `node_id` 与该节点的实际规划一致，且集群中各节点的 ID 唯一。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3FS\u002Fissues\u002F220",{"id":162,"question_zh":163,"answer_zh":164,"source_url":165},10224,"执行 `admin_cli init-cluster` 命令时发生阻塞（hang）怎么办？","该问题通常由 FoundationDB (FDB) 的配置或网络问题引起。排查步骤如下：\n1. 检查 FDB 的监听地址：如果配置为 `127.0.0.1`，在容器或分布式环境中会导致连接失败，请将其修改为实际的服务器 IP 地址。\n2. 检查容器内网络：确保容器内部网络配置正确，能够访问 FDB 服务。\n3. 
避免端口冲突：如果在同一节点上同时运行了容器内和宿主机上的 FDB，可能会导致冲突，尝试停止其中一个实例。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3FS\u002Fissues\u002F272",{"id":167,"question_zh":168,"answer_zh":169,"source_url":170},10225,"在进行大量文件随机读取测试时，如何优化 `register_fd` 的性能？","如果涉及大量文件（例如 10 万个文件）的随机读取，频繁地注册和注销文件描述符（register_fd\u002Fderegister_fd）会成为性能瓶颈。建议的做法是：在测试开始前，一次性将所有需要访问的文件全部注册好（register），然后在测试过程中复用这些已注册的句柄，测试结束后再统一注销。这样可以显著减少系统调用开销。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3FS\u002Fissues\u002F398",{"id":172,"question_zh":173,"answer_zh":174,"source_url":170},10226,"多客户端并发测试时，单个客户端无法打满 InfiniBand 带宽且整体吞吐未达存储节点瓶颈，如何排查？","当单客户端能跑满带宽而多客户端并发时性能下降，可能涉及负载均衡或锁竞争问题。建议排查以下指标和配置：\n1. 检查不同客户端是否配置了不同的 `sl` (Service Level)，不同的 SL 设置可能会影响流量调度从而提升速度。\n2. 监控 Meta 节点的延迟（latency），确认是否存在元数据服务的争用。\n3. 调整客户端的空闲线程数（如 `max_idle_threads`），观察是否能改善并发处理能力。\n4. 检查存储节点的磁盘队列深度和 CPU 使用率，确认是否存在隐性瓶颈。",{"id":176,"question_zh":177,"answer_zh":178,"source_url":179},10227,"Storage 服务运行中出现 Panic，疑似并发控制导致的元数据不一致，如何处理？","如果在 SSD 上同时发生一个目标（target）被移除而另一个目标正在读写的情况，原有的并发控制逻辑可能导致位置（pos）元数据不一致。仅检查被移除目标是否有进行中（inflight）请求是不够的。解决方案是参考 `commit_chunk` 中处理 `chunk.is_remove` 的逻辑：确保在执行 `meta_store.remove` 和写入操作完成后，再进行 `meta_cache` 的清理，或者采用批处理移除（batch remove）的正确时序，防止中间状态被读操作重新加载导致不一致。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3FS\u002Fissues\u002F238",[]]