[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-yahoo--TensorFlowOnSpark":3,"tool-yahoo--TensorFlowOnSpark":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",151918,2,"2026-04-12T11:33:05",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":78,"owner_twitter":77,"owner_website":79,"owner_url":80,"languages":81,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":98,"env_os":99,"env_gpu":100,"env_ram":101,"env_deps":102,"category_tags":108,"github_topics":109,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":117,"updated_at":118,"faqs":119,"releases":147},6975,"yahoo\u002FTensorFlowOnSpark","TensorFlowOnSpark","TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.","TensorFlowOnSpark 是一款由 Yahoo 开源的工具，旨在将强大的 TensorFlow 深度学习框架无缝引入 Apache Spark 和 Hadoop 集群。它主要解决了在大规模分布式环境中部署深度学习模型的难题，让开发者无需对现有 TensorFlow 代码进行大量修改（通常少于 10 行），即可在共享的 CPU 或 GPU 服务器集群上运行训练和推理任务。\n\n这款工具特别适合拥有海量数据、已构建 Spark 数据处理流水线的机器学习工程师和数据科学家。如果你需要在企业级大数据平台上高效地扩展深度学习应用，TensorFlowOnSpark 是理想的选择。\n\n其核心技术亮点在于灵活的架构设计：既支持利用 TensorFlow 原生接口直接从 HDFS 读取数据，也支持通过 Spark RDD 将数据推送给计算节点。此外，它允许节点间直接通信以加速学习过程，并完整支持同步\u002F异步训练、模型并行、数据并行以及 TensorBoard 可视化等 TensorFlow 高级功能。无论是部署在私有云还是公有云，TensorFlowOnSpark 都能帮助用户轻松整合现有的大数","TensorFlowOnSpark 是一款由 Yahoo 开源的工具，旨在将强大的 TensorFlow 深度学习框架无缝引入 Apache Spark 和 Hadoop 集群。它主要解决了在大规模分布式环境中部署深度学习模型的难题，让开发者无需对现有 TensorFlow 代码进行大量修改（通常少于 10 行），即可在共享的 CPU 或 GPU 服务器集群上运行训练和推理任务。\n\n这款工具特别适合拥有海量数据、已构建 Spark 数据处理流水线的机器学习工程师和数据科学家。如果你需要在企业级大数据平台上高效地扩展深度学习应用，TensorFlowOnSpark 是理想的选择。\n\n其核心技术亮点在于灵活的架构设计：既支持利用 TensorFlow 原生接口直接从 HDFS 读取数据，也支持通过 Spark RDD 将数据推送给计算节点。此外，它允许节点间直接通信以加速学习过程，并完整支持同步\u002F异步训练、模型并行、数据并行以及 TensorBoard 可视化等 TensorFlow 高级功能。无论是部署在私有云还是公有云，TensorFlowOnSpark 都能帮助用户轻松整合现有的大数据生态与前沿的深度学习技术，实现高效的大规模模型训练。","\u003C!--\nCopyright 2019 Yahoo Inc.\nLicensed under the terms of the Apache 2.0 license.\nPlease see LICENSE file in the project root for terms.\n-->\n# TensorFlowOnSpark\n> _TensorFlowOnSpark brings scalable deep learning to Apache Hadoop and Apache Spark\nclusters._\n\n[![Build Status](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyahoo_TensorFlowOnSpark_readme_46e6945b3e3a.png)](https:\u002F\u002Fcd.screwdriver.cd\u002Fpipelines\u002F6384)\n[![Package](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpackage-pypi-blue.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Ftensorflowonspark\u002F)\n[![Downloads](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Ftensorflowonspark.svg)](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Ftensorflowonspark.svg)\n[![Documentation](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocumentation-latest-blue.svg)](https:\u002F\u002Fyahoo.github.io\u002FTensorFlowOnSpark\u002F)\n\nBy combining salient features from the [TensorFlow](https:\u002F\u002Fwww.tensorflow.org) deep learning framework with [Apache Spark](http:\u002F\u002Fspark.apache.org) and [Apache Hadoop](http:\u002F\u002Fhadoop.apache.org), TensorFlowOnSpark enables distributed\ndeep learning on a cluster of GPU and CPU servers.\n\nIt enables both distributed TensorFlow training and\ninferencing on Spark clusters, with a goal to minimize the amount\nof code changes required to run existing TensorFlow programs on a\nshared grid.  Its Spark-compatible API helps manage the TensorFlow\ncluster with the following steps:\n\n1. **Startup** - launches the Tensorflow main function on the executors, along with listeners for data\u002Fcontrol messages.\n1. **Data ingestion**\n   - **InputMode.TENSORFLOW** - leverages TensorFlow's built-in APIs to read data files directly from HDFS.\n   - **InputMode.SPARK** - sends Spark RDD data to the TensorFlow nodes via a `TFNode.DataFeed` class.  Note that we leverage the [Hadoop Input\u002FOutput Format](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fecosystem\u002Ftree\u002Fmaster\u002Fhadoop) to access TFRecords on HDFS.\n1. **Shutdown** - shuts down the Tensorflow workers and PS nodes on the executors.\n\n## Table of Contents\n\n- [Background](#background)\n- [Install](#install)\n- [Usage](#usage)\n- [API](#api)\n- [Contribute](#contribute)\n- [License](#license)\n\n## Background\n\nTensorFlowOnSpark was developed by Yahoo for large-scale distributed\ndeep learning on our Hadoop clusters in Yahoo's private cloud.\n\nTensorFlowOnSpark provides some important benefits (see [our\nblog](https:\u002F\u002Fdeveloper.yahoo.com\u002Fblogs\u002F157196317141\u002F))\nover alternative deep learning solutions.\n   * Easily migrate existing TensorFlow programs with \u003C10 lines of code change.\n   * Support all TensorFlow functionalities: synchronous\u002Fasynchronous training, model\u002Fdata parallelism, inferencing and TensorBoard.\n   * Server-to-server direct communication achieves faster learning when available.\n   * Allow datasets on HDFS and other sources pushed by Spark or pulled by TensorFlow.\n   * Easily integrate with your existing Spark data processing pipelines.\n   * Easily deployed on cloud or on-premise and on CPUs or GPUs.\n\n## Install\n\nTensorFlowOnSpark is provided as a pip package, which can be installed on single machines via:\n```\n# for tensorflow>=2.0.0\npip install tensorflowonspark\n\n# for tensorflow\u003C2.0.0\npip install tensorflowonspark==1.4.4\n```\n\nFor distributed clusters, please see our [wiki site](..\u002F..\u002Fwiki) for detailed documentation for specific environments, such as our getting started guides for [single-node Spark Standalone](https:\u002F\u002Fgithub.com\u002Fyahoo\u002FTensorFlowOnSpark\u002Fwiki\u002FGetStarted_Standalone), [YARN clusters](..\u002F..\u002Fwiki\u002FGetStarted_YARN) and [AWS EC2](..\u002F..\u002Fwiki\u002FGetStarted_EC2).  Note: the Windows operating system is not currently supported due to [this issue](https:\u002F\u002Fgithub.com\u002Fyahoo\u002FTensorFlowOnSpark\u002Fissues\u002F36).\n\n## Usage\n\nTo use TensorFlowOnSpark with an existing TensorFlow application, you can follow our [Conversion Guide](..\u002F..\u002Fwiki\u002FConversion-Guide) to describe the required changes.  Additionally, our [wiki site](..\u002F..\u002Fwiki) has pointers to some presentations which provide an overview of the platform.\n\n**Note: since TensorFlow 2.x breaks API compatibility with TensorFlow 1.x, the examples have been updated accordingly.  If you are using TensorFlow 1.x, you will need to checkout the `v1.4.4` tag for compatible examples and instructions.**\n\n## API\n\n[API Documentation](https:\u002F\u002Fyahoo.github.io\u002FTensorFlowOnSpark\u002F) is automatically generated from the code.\n\n## Contribute\n\nPlease join the [TensorFlowOnSpark user group](https:\u002F\u002Fgroups.google.com\u002Fforum\u002F#!forum\u002FTensorFlowOnSpark-users) for discussions and questions.  If you have a question, please review our [FAQ](..\u002F..\u002Fwiki\u002FFrequently-Asked-Questions) before posting.\n\nContributions are always welcome.  For more information, please see our [guide for getting involved](Contributing.md).\n\n## License\n\nThe use and distribution terms for this software are covered by the Apache 2.0 license.\nSee [LICENSE](LICENSE) file for terms.\n","\u003C!--\n版权所有 © 2019 雅虎公司。\n根据 Apache 2.0 许可证条款授权使用。\n有关条款，请参阅项目根目录下的 LICENSE 文件。\n-->\n# TensorFlowOnSpark\n> _TensorFlowOnSpark 将可扩展的深度学习引入 Apache Hadoop 和 Apache Spark 集群。_\n\n[![构建状态](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyahoo_TensorFlowOnSpark_readme_46e6945b3e3a.png)](https:\u002F\u002Fcd.screwdriver.cd\u002Fpipelines\u002F6384)\n[![软件包](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpackage-pypi-blue.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Ftensorflowonspark\u002F)\n[![下载量](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Ftensorflowonspark.svg)](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Ftensorflowonspark.svg)\n[![文档](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocumentation-latest-blue.svg)](https:\u002F\u002Fyahoo.github.io\u002FTensorFlowOnSpark\u002F)\n\n通过将 [TensorFlow](https:\u002F\u002Fwww.tensorflow.org) 深度学习框架的突出特性与 [Apache Spark](http:\u002F\u002Fspark.apache.org) 和 [Apache Hadoop](http:\u002F\u002Fhadoop.apache.org) 相结合，TensorFlowOnSpark 能够在由 GPU 和 CPU 服务器组成的集群上实现分布式深度学习。\n\n它支持在 Spark 集群上进行分布式 TensorFlow 训练和推理，旨在尽量减少将现有 TensorFlow 程序运行在共享计算资源上所需的代码修改量。其与 Spark 兼容的 API 可帮助按以下步骤管理 TensorFlow 集群：\n\n1. **启动** - 在执行器上启动 TensorFlow 主函数，并设置用于接收数据和控制消息的监听器。\n1. **数据摄取**\n   - **InputMode.TENSORFLOW** - 利用 TensorFlow 内置 API 直接从 HDFS 读取数据文件。\n   - **InputMode.SPARK** - 通过 `TFNode.DataFeed` 类将 Spark RDD 数据发送到 TensorFlow 节点。请注意，我们利用 [Hadoop 输入\u002F输出格式](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fecosystem\u002Ftree\u002Fmaster\u002Fhadoop) 来访问 HDFS 上的 TFRecords。\n1. **关闭** - 关闭执行器上的 TensorFlow 工作节点和参数服务器节点。\n\n## 目录\n\n- [背景](#background)\n- [安装](#install)\n- [使用](#usage)\n- [API](#api)\n- [贡献](#contribute)\n- [许可证](#license)\n\n## 基本信息\n\nTensorFlowOnSpark 由雅虎开发，用于在雅虎私有云中的 Hadoop 集群上进行大规模分布式深度学习。\n\n与替代性深度学习解决方案相比，TensorFlowOnSpark 具有一些重要优势（详见 [我们的博客](https:\u002F\u002Fdeveloper.yahoo.com\u002Fblogs\u002F157196317141\u002F)）：\n   * 可以仅通过不到 10 行代码的修改轻松迁移现有的 TensorFlow 程序。\n   * 支持所有 TensorFlow 功能：同步\u002F异步训练、模型并行与数据并行、推理以及 TensorBoard。\n   * 在条件允许的情况下，服务器间直接通信能够实现更快的训练速度。\n   * 支持存储在 HDFS 上的数据集以及其他由 Spark 推送或由 TensorFlow 拉取的数据源。\n   * 可轻松集成到您现有的 Spark 数据处理流水线中。\n   * 可方便地部署在云端或本地环境中，并且适用于 CPU 或 GPU 硬件。\n\n## 安装\n\nTensorFlowOnSpark 以 pip 包的形式提供，可在单机上通过以下命令安装：\n```\n# 对于 tensorflow>=2.0.0\npip install tensorflowonspark\n\n# 对于 tensorflow\u003C2.0.0\npip install tensorflowonspark==1.4.4\n```\n\n对于分布式集群，请参阅我们的 [wiki 站点](..\u002F..\u002Fwiki) 获取针对特定环境的详细文档，例如我们的入门指南，包括 [单节点 Spark Standalone](https:\u002F\u002Fgithub.com\u002Fyahoo\u002FTensorFlowOnSpark\u002Fwiki\u002FGetStarted_Standalone)、[YARN 集群](..\u002F..\u002Fwiki\u002FGetStarted_YARN) 和 [AWS EC2](..\u002F..\u002Fwiki\u002FGetStarted_EC2)。请注意：由于 [此问题](https:\u002F\u002Fgithub.com\u002Fyahoo\u002FTensorFlowOnSpark\u002Fissues\u002F36)，目前不支持 Windows 操作系统。\n\n## 使用\n\n要将 TensorFlowOnSpark 与现有的 TensorFlow 应用程序一起使用，您可以按照我们的 [转换指南](..\u002F..\u002Fwiki\u002FConversion-Guide) 描述所需更改。此外，我们的 [wiki 站点](..\u002F..\u002Fwiki) 还提供了一些演示文稿的链接，这些文稿概述了该平台。\n\n**注意：由于 TensorFlow 2.x 与 TensorFlow 1.x 的 API 不兼容，示例已相应更新。如果您使用的是 TensorFlow 1.x，则需要检出 `v1.4.4` 标签以获取兼容的示例和说明。**\n\n## API\n\n[API 文档](https:\u002F\u002Fyahoo.github.io\u002FTensorFlowOnSpark\u002F) 由代码自动生成。\n\n## 贡献\n\n请加入 [TensorFlowOnSpark 用户组](https:\u002F\u002Fgroups.google.com\u002Fforum\u002F#!forum\u002FTensorFlowOnSpark-users) 以参与讨论和提问。如果您有任何疑问，请在发帖前先查看我们的 [常见问题解答](..\u002F..\u002Fwiki\u002FFrequently-Asked-Questions)。\n\n我们始终欢迎贡献。有关更多信息，请参阅我们的 [参与指南](Contributing.md)。\n\n## 许可证\n\n本软件的使用和分发条款受 Apache 2.0 许可证约束。\n有关条款，请参阅 [LICENSE](LICENSE) 文件。","# TensorFlowOnSpark 快速上手指南\n\nTensorFlowOnSpark 将可扩展的深度学习带入 Apache Hadoop 和 Apache Spark 集群，支持在 GPU 和 CPU 服务器集群上进行分布式 TensorFlow 训练和推理。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**：Linux 或 macOS（**不支持 Windows**）\n- **集群环境**：Apache Spark  standalone、YARN 集群或 AWS EC2\n- **硬件**：支持 CPU 或 GPU 服务器\n\n### 前置依赖\n- Python (建议 3.6+)\n- Apache Spark (版本需与 TensorFlowOnSpark 兼容)\n- Apache Hadoop (用于访问 HDFS 数据)\n- TensorFlow\n  - 若使用 TensorFlow >= 2.0.0，需安装最新版 TensorFlowOnSpark\n  - 若使用 TensorFlow \u003C 2.0.0，需安装 TensorFlowOnSpark 1.4.4\n\n## 安装步骤\n\n### 单机安装\n通过 pip 安装（推荐使用国内镜像源加速）：\n\n```bash\n# 对于 TensorFlow >= 2.0.0\npip install tensorflowonspark -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n\n# 对于 TensorFlow \u003C 2.0.0\npip install tensorflowonspark==1.4.4 -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 分布式集群部署\n对于生产环境的分布式集群（如 YARN, Spark Standalone, AWS EC2），请参考官方 Wiki 获取特定环境的详细配置指南：\n- [Spark Standalone 单机入门](https:\u002F\u002Fgithub.com\u002Fyahoo\u002FTensorFlowOnSpark\u002Fwiki\u002FGetStarted_Standalone)\n- [YARN 集群入门](https:\u002F\u002Fgithub.com\u002Fyahoo\u002FTensorFlowOnSpark\u002Fwiki\u002FGetStarted_YARN)\n- [AWS EC2 入门](https:\u002F\u002Fgithub.com\u002Fyahoo\u002FTensorFlowOnSpark\u002Fwiki\u002FGetStarted_EC2)\n\n## 基本使用\n\n要将现有的 TensorFlow 应用程序迁移到 TensorFlowOnSpark，通常只需修改少于 10 行代码。主要流程包括启动集群、数据摄入和关闭节点。\n\n### 核心概念\nTensorFlowOnSpark 提供两种数据摄入模式：\n1. **InputMode.TENSORFLOW**: 利用 TensorFlow 内置 API 直接从 HDFS 读取数据文件。\n2. **InputMode.SPARK**: 通过 `TFNode.DataFeed` 类将 Spark RDD 数据发送给 TensorFlow 节点。\n\n### 简单示例逻辑\n以下是一个典型的集成逻辑概述（具体代码实现请参考 [转换指南](https:\u002F\u002Fgithub.com\u002Fyahoo\u002FTensorFlowOnSpark\u002Fwiki\u002FConversion-Guide)）：\n\n1. **初始化 SparkSession** 并导入 `TFSparkNode`。\n2. **定义映射函数**，在该函数中启动 TensorFlow 的主程序（`tf_main`）。\n3. **启动集群**：使用 `TFCluster.run()` 启动 Executor 上的 TensorFlow 主函数及监听器。\n4. **执行训练**：根据选择的数据模式（TENSORFLOW 或 SPARK）进行数据流处理。\n5. **关闭集群**：训练完成后调用 `cluster.shutdown()` 释放资源。\n\n> **注意**：由于 TensorFlow 2.x 与 1.x 存在 API 不兼容问题，如果您使用的是 TensorFlow 1.x，请检出 `v1.4.4` 标签以获取兼容的示例和说明。\n\n更多详细的 API 文档请访问：[TensorFlowOnSpark API Documentation](https:\u002F\u002Fyahoo.github.io\u002FTensorFlowOnSpark\u002F)","某大型电商数据团队需要在现有的 Hadoop\u002FSpark 集群上，利用历史交易日志训练大规模深度学习推荐模型。\n\n### 没有 TensorFlowOnSpark 时\n- **架构割裂**：数据预处理依赖 Spark，而模型训练需单独搭建 TensorFlow 集群，导致数据必须在 HDFS 和独立存储间反复搬运，流程繁琐且易出错。\n- **代码改造量大**：为了适配分布式训练，开发人员需要重写大量底层通信代码来管理参数服务器（PS）和工作节点，迁移现有单机脚本极其困难。\n- **资源利用率低**：GPU 计算集群与 CPU 数据处理集群相互隔离，无法共享同一套硬件资源，导致高峰期资源争抢而闲时大量闲置。\n- **运维复杂度高**：需要同时维护两套独立的调度系统和监控面板，故障排查时需在 Spark 日志和 TensorFlow 日志之间来回切换，定位问题耗时。\n\n### 使用 TensorFlowOnSpark 后\n- **流水线一体化**：直接在 Spark 集群上启动 TensorFlow 任务，通过 `InputMode.SPARK` 将 RDD 数据流无缝喂送给模型，消除了数据移动开销，实现“数据不动计算动”。\n- **极简迁移成本**：仅需修改不到 10 行代码即可将原有的 TensorFlow 程序转化为分布式应用，无需关心底层的集群启动、监听器注册及节点关闭逻辑。\n- **资源统一调度**：充分利用现有的 YARN 或 Standalone 集群，让 CPU 数据预处理与 GPU 模型训练在同一组服务器上协同工作，显著提升整体资源利用率。\n- **功能完整兼容**：原生支持同步\u002F异步训练、模型并行及 TensorBoard 可视化，开发者可像使用普通 Spark 任务一样管理深度学习生命周期，运维视角统一清晰。\n\nTensorFlowOnSpark 通过打通大数据处理与深度学习训练的壁垒，让企业能以最小的代码代价在现有基础设施上实现可扩展的分布式 AI 落地。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyahoo_TensorFlowOnSpark_8f6dd054.png","yahoo","Yahoo","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fyahoo_5a311dbd.png","This organization is the home to many of the active open source projects published by engineers at Yahoo Inc.",null,"ospo@yahooinc.com","https:\u002F\u002Fdeveloper.yahoo.com\u002Fopensource","https:\u002F\u002Fgithub.com\u002Fyahoo",[82,86,90],{"name":83,"color":84,"percentage":85},"Python","#3572A5",80.5,{"name":87,"color":88,"percentage":89},"Scala","#c22d40",17.8,{"name":91,"color":92,"percentage":93},"Shell","#89e051",1.7,3860,941,"2026-04-02T08:34:50","Apache-2.0",5,"Linux, macOS","支持 CPU 和 GPU 集群；具体显卡型号、显存大小及 CUDA 版本未说明（取决于底层 TensorFlow 版本需求）","未说明",{"notes":103,"python":101,"dependencies":104},"不支持 Windows 操作系统。该工具旨在将 TensorFlow 集成到 Apache Hadoop 和 Spark 集群中，支持分布式训练和推理。对于 TensorFlow 2.x 和 1.x 有不同的安装版本和示例代码要求。",[105,106,107],"tensorflow>=2.0.0 (或 tensorflow\u003C2.0.0 配合 tensorflowonspark==1.4.4)","apache-spark","apache-hadoop",[14],[110,111,73,112,113,114,115,116],"tensorflow","spark","machine-learning","cluster","featured","python","scala","2026-03-27T02:49:30.150509","2026-04-13T06:10:33.991697",[120,125,130,135,139,143],{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},31429,"在 YARN 上运行 TensorFlowOnSpark 时任务卡住且没有错误信息，如何解决？","这通常意味着环境变量 `spark.executorEnv.LD_LIBRARY_PATH` 需要指向包含 `libhdfs.so` 文件的目录。请确保该文件存在于所有执行器（executors）的相同路径下。如果问题仍然存在，尝试将执行器数量减少到 1 个以排查行为。","https:\u002F\u002Fgithub.com\u002Fyahoo\u002FTensorFlowOnSpark\u002Fissues\u002F33",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},31430,"使用 Keras 示例在 HDFS 上保存模型时，进程完成但找不到生成的模型文件，原因是什么？","根本原因是分布式系统的“不精确”行为与 Keras API 的“精确”行为之间存在冲突。特别是当输入数据（如 MNIST CSV）通过 Spark 生成且分区大小不均匀时（例如某些分区 5120 行，某些 6144 行），会导致模型保存失败。建议检查数据分区的均匀性，或调整代码以适应这种分布差异。","https:\u002F\u002Fgithub.com\u002Fyahoo\u002FTensorFlowOnSpark\u002Fissues\u002F398",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},31431,"随着训练轮次（epoch）增加，内存使用量不断上升并最终导致内存溢出（OOM），该如何解决？","在某些环境（如 Ubuntu）下，TensorFlow 可能存在内存泄漏问题。有用户反馈将操作系统从 Ubuntu 切换到 CentOS 后解决了该问题。此外，可以尝试将 Numpy 数组转换为 `tf.data` 格式传入 `model.fit`，或者检查是否使用了正确的 inputMode（Tensorflow 模式 vs Spark 模式）。","https:\u002F\u002Fgithub.com\u002Fyahoo\u002FTensorFlowOnSpark\u002Fissues\u002F534",{"id":136,"question_zh":137,"answer_zh":138,"source_url":124},31432,"提交任务时遇到 PythonException 且日志信息不足，应该如何调试？","截图或简短的错误堆栈通常不足以定位问题。请创建一个新的 Issue 并提供详细信息，包括：使用的软件版本（Spark, TensorFlow, Python 等）、尝试运行的示例代码、完整的 spark-submit 命令行参数以及执行器（executor）的详细日志。",{"id":140,"question_zh":141,"answer_zh":142,"source_url":129},31433,"在使用 Keras fit_generator 方法时，是否必须输入验证数据（validation_data）？","不是必须的。用户可以不输入验证数据进行训练。如果模型无法保存到 HDFS，通常与验证数据无关，而是由于数据分区不均匀导致的分布式同步问题。",{"id":144,"question_zh":145,"answer_zh":146,"source_url":134},31434,"如何在不同操作系统间选择以避免 TensorFlowOnSpark 的内存泄漏问题？","如果在 Ubuntu 等系统上遇到随 epoch 增加的内存泄漏问题，建议尝试在 CentOS 环境下运行。有案例表明，相同的代码在 CentOS 上运行正常，而在 Ubuntu 上会出现内存持续增长直至溢出的情况。",[148,153,158,163,168,173,178,183,188,193,198,203,208,213,218,223,228,233,238,243],{"id":149,"version":150,"summary_zh":151,"released_at":152},231143,"v2.2.5","- 允许与 `tensorflow-cpu` 包一起使用。\n- 依赖项更新\n- 小的修复。\n","2022-04-21T20:41:22",{"id":154,"version":155,"summary_zh":156,"released_at":157},231144,"v2.2.4","- 增加了将临时套接字\u002F端口的释放延迟到用户 `map_function` 中的选项，以应对用户代码可能无法及时绑定到分配的端口、从而导致其他进程抢先绑定同一端口的情况，例如在调用 TensorFlow API 之前需要进行大量预处理时。\n- 更新了 screwdriver.cd 构建模板。\n- 在推送至 PyPI 后触发文档发布。\n","2021-05-25T22:25:31",{"id":159,"version":160,"summary_zh":161,"released_at":162},231145,"v2.2.3","- 增加了在 TFParallel 中禁用 Spark 阻塞执行的功能\n- 更新了依赖，支持 Spark 3 和 Scala 2.12\n- 修复了文档构建问题","2021-03-23T16:46:56",{"id":164,"version":165,"summary_zh":166,"released_at":167},231146,"v2.2.2","- 将构建从 Travis CI 迁移到 Screwdriver.cd","2020-12-18T21:57:51",{"id":169,"version":170,"summary_zh":171,"released_at":172},231147,"v2.2.1","- 在 `TFOS_SERVER_PORT` 环境变量中添加了对端口范围的支持。\n- 更新了 `mnist\u002Fkeras\u002Fmnist_tf.py` 示例，添加了针对 TensorFlow 数据集问题的 workaround。\n- 为缺少 `executor_id` 的情况添加了更详细的错误信息。\n- 添加了针对 GPU 资源分配不同场景的单元测试。","2020-03-16T18:59:39",{"id":174,"version":175,"summary_zh":176,"released_at":177},231148,"v2.2.0","- 增加了对 Spark 3.0 GPU 资源的支持\n- 更新以支持 Spark 2.4.5\n- 修复了 `mnist_inference.py` 示例中的数据集顺序问题（感谢 @qsbao）\n- 添加了可选的环境变量，用于配置执行器上的 TF 服务器\u002FGRPC 端口以及 TensorBoard 端口\n- 修复了 TF1.x 向后兼容代码中 `TFNode.start_cluster_server` 的 bug\n- 修复了 TF2.1 中 `compat.export_saved_model` 的文件冲突问题\n- 移除了对 Python 2.x 的支持","2020-02-19T01:05:22",{"id":179,"version":180,"summary_zh":181,"released_at":182},231149,"v2.1.3","- 在不导入 TensorFlow 的情况下检测其版本，以避免在 GPU 资源分配之前进行运行时初始化。","2020-01-22T17:09:54",{"id":184,"version":185,"summary_zh":186,"released_at":187},231150,"v2.1.2","- 使用 `tf.config.list_physical_devices()` 来避免 TensorFlow 运行时的初始化。","2020-01-10T22:19:19",{"id":189,"version":190,"summary_zh":191,"released_at":192},231151,"v2.1.1","- 添加了 `compat.is_gpu_available()` 方法，用于：\n  - `tf.config.list_logical_devices('GPU')`（适用于 TensorFlow 2.1）\n  - `tf.test.is_cuda_available()`（适用于较早版本的 TensorFlow）。\n- 增加了在 `chief:0` 或 `master:0` 节点上启动 TensorBoard 的功能（适用于没有 `worker` 节点的小型集群）。","2020-01-09T23:42:56",{"id":194,"version":195,"summary_zh":196,"released_at":197},231152,"v2.1.0","- 新增 `compat` 模块，用于管理 TensorFlow 中的次要 API 变更。\n- 增加对 TF2.1.0rc0 的兼容性（包括导出 saved_models 和配置自动分片策略）。\n- 重新引入对 TF1.x 的兼容性（但 ML Pipeline API 中不再支持 InputMode.TENSORFLOW）。\n- 新增 `TFParallel` 类，用于通过 Spark 执行器实现单节点并行推理。\n- 更新示例以适配 TensorFlow API 的变更。\n- 更新为使用模块级别的日志记录器。","2019-12-09T18:38:03",{"id":199,"version":200,"summary_zh":201,"released_at":202},231153,"v2.0.0","- initial release compatible with TensorFlow 2.x.\r\n- API changes:\r\n  - removed `TFNode.start_cluster_server`, which is not required for `tf.keras` and `tf.estimator`.\r\n  - removed `TFNode.export_saved_model`, which can be replaced by TF native APIs now.\r\n  - added `TFNodeContext.num_workers` to count `master`, `chief`, and `worker` nodes.\r\n- Spark ML Pipeline API changes:\r\n  - Scala API has been removed for now, since the Java library for TensorFlow 2.0 is not available yet.\r\n  - removed `InputMode.TENSORFLOW` support for ML Pipelines, since the input data is always a Spark DataFrame for this API.\r\n  - added `HasMasterNode` and `HasGraceSecs` params.\r\n  - removed optional `export_fn` argument for Spark ML `TFEstimator` (use TF export APIs instead).\r\n  - added standard default values for `signature_def_key` and `tag_set` for Spark ML `TFModel`.\r\n  - modified inferencing code in `TFModel` for TF2.x APIs.\r\n- older TF 1.x examples have been replaced with TF 2.x compatible examples.","2019-10-02T16:31:45",{"id":204,"version":205,"summary_zh":206,"released_at":207},231154,"v1.4.4","- last expected release compatible with TensorFlow 1.x (aside from any critical fixes), since the `master` branch will be moving to TF 2.0 compatibility.\r\n- handle multiple outputs with signaturedef (thanks to @markromedia).\r\n- handle exceptions after data feeding.\r\n- moved API docs to sphinx_rtd_theme.\r\n- updated to Spark 2.4.4.","2019-09-30T22:50:45",{"id":209,"version":210,"summary_zh":211,"released_at":212},231155,"v1.4.3","- removed `tensorflow` as a dependency, in order to support other variants like `tensorflow-gpu` or `tf-nightly`.\r\n- allow use of `evaluator` node type in cluster (thanks to @bbshetty)\r\n- refactored cluster template generation.\r\n- updated wide-deep example to use models\u002Fofficial code.\r\n- restore termination of feed in mnist\u002Fspark example.\r\n- updated sample notebook instructions.\r\n- updated to use Spark 2.3.3.\r\n","2019-04-06T01:37:23",{"id":214,"version":215,"summary_zh":216,"released_at":217},231156,"v1.4.2","- Set TF_CONFIG for \"chief\" clusters (required by DistributionStrategy APIs)\r\n- Fix GPU allocation for multi-gpu nodes\r\n- Updated examples for MNIST\r\n- Updated Hadoop and Spark dependency versions","2019-01-22T18:46:53",{"id":219,"version":220,"summary_zh":221,"released_at":222},231157,"v1.4.1","- Added `util.single_node_env()`, which can be used to initialize the environment (HDFS compatibility + GPU allocation) for running a single-node instance of TensorFlow on the Spark driver.\r\n- Added an example of parallelized inferencing from a pre-trained SavedModel.","2018-12-03T21:58:25",{"id":224,"version":225,"summary_zh":226,"released_at":227},231158,"v1.4.0","- More deterministic GPU allocation for multi-GPU nodes.\r\n- Added `timeout` argument to `TFCluster.shutdown()` (default is 3 days).  This is intended to shutdown the Spark application in the event that any of the TF nodes hang for any reason.  Set to -1 to disable timeout.\r\n- Added ability to start reservation server on a specific port (contributed by @AvihayTsayeg).\r\n- Updated pipeline API for latest TF APIs (contributed by @AvihayTsayeg)\r\n- Added unit test for `tf.SparseTensor` support.\r\n- Updated examples to latest TF APIs (including workaround for https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorflow\u002Fissues\u002F21745).\r\n- Updated Spark version dependency for Scala Inferencing API.\r\n- Added `__version__` to module.","2018-11-16T18:43:07",{"id":229,"version":230,"summary_zh":231,"released_at":232},231159,"v1.3.4","- Travis CI integration for Python documentation and Scala Inferencing API builds.\r\n- Added `sys.path` to tensorboard search path.","2018-09-27T22:02:32",{"id":234,"version":235,"summary_zh":236,"released_at":237},231160,"v1.3.3","- Only set TF_CONFIG environment variable if cluster_spec has a \"master\", i.e. when using `tf.estimator`.\r\n- Updated `mnist\u002Fkeras\u002Fmnist_mlp_estimator.py` with example of distributed\u002Fparallel inferencing via `estimator.predict`.\r\n- Added optional `feed_timeout` argument to `TFCluster.train()` for InputMode.SPARK.\r\n- Added optional `grace_secs` argument to `TFCluster.shutdown()`.\r\n- Workaround for firewall proxy issue with `get_ip_address` (contributed by @viplav).\r\n- Add support for all Hadoop-compatible File System schemes (contributed by @vishnu2kmohan).\r\n- Added error messages to `assert` statements.\r\n- Initial Travis CI integration.\r\n\r\n","2018-09-06T20:44:49",{"id":239,"version":240,"summary_zh":241,"released_at":242},231161,"v1.3.2","- add grace period to `TFCluster.shutdown()`\r\n- add wide & deep example (contributed by @crafet)\r\n- update mnist\u002Fpipeline examples to `tf.data`, add instructions, and misc code cleanup (from @yileic)\r\n- parameterize versions in pom.xml and code cleanup (from @tmielika)\r\n- update Scala Inferencing pom.xml to latest tensorflow-hadoop artifact (contributed by @psuszyns)\r\n\r\n","2018-07-13T21:13:57",{"id":244,"version":245,"summary_zh":246,"released_at":247},231162,"v1.3.1","- Add keras\u002Festimator example\r\n- Update original keras example to latest ` tf.keras` apis\r\n- Update Scala Inferencing pom.xml to latest TF java version\r\n- Allow PS to use CPU on TF-GPU builds (contributed by @dratini6)\r\n- More pep8\r\n- More py2\u002Fpy3 compat\r\n","2018-07-13T17:52:03"]