[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-zer0n--deepframeworks":3,"tool-zer0n--deepframeworks":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",157379,2,"2026-04-15T23:32:42",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":75,"owner_email":75,"owner_twitter":75,"owner_website":77,"owner_url":78,"languages":75,"stars":79,"forks":80,"last_commit_at":81,"license":75,"difficulty_score":82,"env_os":83,"env_gpu":84,"env_ram":85,"env_deps":86,"category_tags":96,"github_topics":75,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":98,"updated_at":99,"faqs":100,"releases":131},7911,"zer0n\u002Fdeepframeworks","deepframeworks","Evaluation of Deep Learning Frameworks","deepframeworks 是一份针对主流深度学习框架的横向评测报告，旨在帮助开发者和研究人员在 Caffe、CNTK、TensorFlow、Theano 及 Torch 等工具中做出明智选择。它通过对比建模能力、接口友好度、部署灵活性、运行性能及生态系统等多个维度，解决了用户在技术选型时面临的信息不对称难题。\n\n这份资料特别适合需要深入理解框架底层特性与适用场景的算法工程师和学术研究者。其独特之处在于不仅提供了直观的星级评分，还详细剖析了各框架在处理卷积网络、循环神经网络（RNN\u002FLSTM）及注意力机制时的具体表现与代码复杂度。例如，报告指出了 Caffe 在视觉领域的优势及其在语言模型上的架构局限，也分析了 TensorFlow 当时在双向 RNN 支持上的不足。\n\n需要注意的是，deepframeworks 的研究数据主要基于 2015 年底至 2016 
年初的技术环境。虽然各大框架此后已历经多次重大迭代与性能飞跃，但文中关于架构设计哲学与扩展灵活性的深度分析，至今仍对理解深度学习工具的发展脉络具有重要的参考价值。","# Evaluation of Deep Learning Toolkits\n\n**Warning**: this research was done in late 2015 with slight modifications in early 2016. Many toolkits have improved significantly since then.\n\n**Abstract.** In this study, I evaluate some popular deep learning toolkits. The candidates are listed in alphabetical order: [Caffe](https:\u002F\u002Fgithub.com\u002FBVLC\u002Fcaffe), [CNTK](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FCNTK), [TensorFlow](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorflow), [Theano](https:\u002F\u002Fgithub.com\u002FTheano\u002FTheano), and [Torch](https:\u002F\u002Fgithub.com\u002Ftorch\u002Ftorch7).\n\nI also provide ratings in some areas because for a lot of people, ratings are useful. However, keep in mind that ratings are inherently subjective [1].\n\nIf you find something wrong or inadequate, please help improve by filing an issue.\n\n**Table of contents**\n\n1. [Modeling Capability](#modeling-capability)\n- [Interfaces](#interfaces)\n- [Model Deployment](#model-deployment)\n- [Performance](#performance)\n- [Architecture](#architecture)\n- [Ecosystem](#ecosystem)\n- [Cross-platform](#cross-platform) \n\n___\n\n## Modeling Capability\nIn this section, we evaluate each toolkit's ability to train common and state-of-the-art networks \u003Cu>without writing too much code\u003C\u002Fu>. 
Some of these networks are:\n\n- ConvNets: AlexNet, OxfordNet, GoogleNet\n- RecurrentNets: plain RNN, LSTM\u002FGRU, bidirectional RNN\n- Sequential modeling with attention.\n\nIn addition, we also evaluate the flexibility to create a new type of model.\n\n#### Caffe \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\nCaffe is perhaps the first mainstream industry-grade deep learning toolkit, started in late 2013, due to its excellent convnet implementation (at the time). It is still the most popular toolkit within the computer vision community, with many extensions being actively added. \n\nHowever, its support for recurrent networks and language modeling in general is poor, due to its legacy architecture, whose limitations are detailed in the [architecture section](#architecture).\n\n#### CNTK \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_2_stars.png\">\nCNTK is a deep learning system started by the speech people who [started the deep learning craze](http:\u002F\u002Fciteseerx.ist.psu.edu\u002Fviewdoc\u002Fdownload?doi=10.1.1.185.1908&rep=rep1&type=pdf) and grew into a more general platform-independent deep learning system. It is better known in the speech community than in the general deep learning community.\n\nIn CNTK (as in TensorFlow and Theano), a network is specified as a symbolic graph of vector operations, such as matrix add\u002Fmultiply or convolution. A layer is just a composition of those operations. 
The fine granularity of the building blocks (operations) allows users to invent new complex layer types without implementing them in a low-level language (as in Caffe).\n\n#### TensorFlow \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_and_a_half_stars.png\">\n**State-of-the-art models**\n\n- RNN API and implementation are suboptimal. The team also commented about it [here](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorflow\u002Fissues\u002F7) and [here](https:\u002F\u002Fgroups.google.com\u002Fa\u002Ftensorflow.org\u002Fforum\u002F?utm_medium=email&utm_source=footer#!msg\u002Fdiscuss\u002FB8HyI0tVtPY\u002FaR43OIuUAwAJ).\n- Bidirectional RNN [not available yet](https:\u002F\u002Fgroups.google.com\u002Fa\u002Ftensorflow.org\u002Fforum\u002F?utm_medium=email&utm_source=footer#!msg\u002Fdiscuss\u002FlwgaL7WEuW4\u002FUXaL4bYkAgAJ)\n- No 3D convolution, which is useful for video recognition\n\n**New models**\nSince TF uses the symbolic-graph-of-vector-operations approach, specifying a new network is fairly easy. Although it doesn't support symbolic loops yet (at least not well tested\u002Fdocumented, as of 05\u002F2016), RNNs can be made easy and efficient using the [bucketing trick](https:\u002F\u002Fwww.tensorflow.org\u002Fversions\u002Fr0.8\u002Ftutorials\u002Fseq2seq\u002Findex.html#bucketing-and-padding).\n\nHowever, TF has a major weakness in terms of modeling flexibility. Every computational flow has to be constructed as a static graph. That makes some computations difficult, such as [beam search](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorflow\u002Fissues\u002F654) (which is used frequently in sequence prediction tasks). 
\n\n\n#### Theano \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_and_a_half_stars.png\">\n**State-of-the-art models.** Theano has implementations of most state-of-the-art networks, either in the form of a higher-level framework (e.g. [Blocks](https:\u002F\u002Fgithub.com\u002Fmila-udem\u002Fblocks), [Keras](https:\u002F\u002Fgithub.com\u002Ffchollet\u002Fkeras), etc.) or in pure Theano.\n\n**New models.** Theano pioneered the trend of using a symbolic graph for programming a network. Theano's symbolic API supports looping control, so-called [scan](http:\u002F\u002Fdeeplearning.net\u002Fsoftware\u002Ftheano\u002Ftutorial\u002Floop.html), which makes implementing RNNs easy and efficient. Users don't always have to define a new model at the tensor operations level. There are a few higher-level frameworks, mentioned above, which make model definition and training simpler.\n\n#### Torch \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_5_stars.png\">\n**State-of-the-art models**\n\n- Excellent for conv nets. It's worth noting that temporal convolution can be done in TensorFlow\u002FTheano via `conv2d` but that's a trick. The native interface for temporal convolution in Torch makes it slightly more intuitive to use. \n- Rich set of RNNs available through a [non-official extension](https:\u002F\u002Fgithub.com\u002FElement-Research\u002Frnn) [2]\n\n**New models.** In Torch, there are multiple ways (stack of layers or graph of layers) to define a network but essentially, a network is defined as a graph of layers. 
Because of this coarser granularity, Torch is sometimes considered less flexible because for new layer types, users have to implement the full forward, backward, and gradient input update.\n\nHowever, defining a new layer in Torch is much easier than in Caffe because you don't have to program in C++. Plus, in Torch, the difference between new layer definition and network definition is minimal. In Caffe, layers are defined in C++ while networks are defined via `Protobuf`.\n\nTorch is more flexible than TensorFlow and Theano in that it is imperative while TF\u002FTheano are declarative (i.e. one has to declare a computational graph). That makes some operations, e.g. beam search, much easier to do in Torch.\n\n---\n\u003Ccenter>\n\u003Cimg src=\"http:\u002F\u002Fi.snag.gy\u002F0loNv.jpg\" height=\"450\">  \u003Cimg src=\"https:\u002F\u002Fcamo.githubusercontent.com\u002F49ac7d0f42e99d979c80a10d0ffd125f4b3df0ea\u002F68747470733a2f2f7261772e6769746875622e636f6d2f6b6f7261796b762f746f7263682d6e6e67726170682f6d61737465722f646f632f6d6c70335f666f72776172642e706e67\" height=\"450\">\u003Cbr>\n\u003Ci>Left: graph model of CNTK\u002FTheano\u002FTensorFlow; Right: graph model of Caffe\u002FTorch\u003C\u002Fi>\n\u003C\u002Fcenter>\n\n\n## Interfaces\n\n#### Caffe \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\nCaffe has a `pycaffe` interface but that's a mere secondary alternative to the command line interface. The model has to be defined in protobuf (usually with a plain text editor), even if you use `pycaffe`.\n\n#### CNTK \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_2_and_a_half_stars.png\">\nThe way to use CNTK, similar to Caffe, is to specify a config file and run the command line. 
CNTK has had Python support since V2.0, with C# support in progress.\n\n\n#### TensorFlow \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_and_a_half_stars.png\">\nTF supports two interfaces: Python and C++. This means that you can do experiments in a rich, high-level environment and deploy your model in an environment that requires native code or low latency.  \n\nIt would be perfect if TF supported `F#` or `TypeScript`. The lack of static typing in Python is just ... painful :).\n\n#### Theano \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_stars.png\">\nPython\n\n#### Torch \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_stars.png\">\nTorch runs on LuaJIT, which is amazingly fast (comparable with industrial languages such as C++\u002FC#\u002FJava). Hence developers don't have to think about symbolic programming, which can be limited. They can just write all kinds of computations without worrying about a performance penalty.\n\nHowever, let's face it, Lua is not yet a mainstream language.\n\n## Model Deployment\nHow easy is it to deploy a new model?\n\n#### Caffe \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_5_stars.png\">\nCaffe is C++-based and can be compiled on a variety of devices. It is cross-platform (a Windows port is available and maintained [here](https:\u002F\u002Fgithub.com\u002FMSRDL\u002Fcaffe)). 
This makes Caffe the best choice with respect to deployment.\n\n#### CNTK \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_and_a_half_stars.png\">\nLike Caffe, CNTK is also C++ based and is cross-platform. Hence, deployment should be easy in most cases. However, to my understanding, it doesn't work on ARM architecture, which limits its capability on mobile devices. \n\n#### TensorFlow \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_and_a_half_stars.png\">\nTF supports a C++ interface and the library can be compiled\u002Foptimized on ARM architectures because it uses [Eigen](http:\u002F\u002Feigen.tuxfamily.org) (instead of a BLAS library). This means that you can deploy your trained models on a variety of devices (servers or mobile devices) without having to implement a separate model decoder or load a Python\u002FLuaJIT interpreter [3].\n\nTF doesn't work on Windows yet, though, so TF models can't be deployed on Windows devices.\n\n#### Theano \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\nThe lack of a low-level interface and the inefficiency of the Python interpreter make Theano less attractive for industrial users. For a large model, the overhead of Python isn’t too bad but the dogma is still there.\n\nThe cross-platform nature (mentioned below) enables a Theano model to be deployed in a Windows environment, which helps it gain some points.\n\n#### Torch \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\nTorch requires LuaJIT to run models. This makes it less attractive than the bare-bones C++ support of Caffe\u002FCNTK\u002FTF. 
It’s not just the performance overhead, which is minimal. The bigger problem is integration, at the API level, with a larger production pipeline.\n\n\n## Performance\n### Single-GPU\nAll of these toolkits call cuDNN so as long as there are no major computations or memory allocations at the outer level, they should perform similarly.\n\nSoumith@FB has done some [benchmarking for ConvNets](https:\u002F\u002Fgithub.com\u002Fsoumith\u002Fconvnet-benchmarks). Deep Learning is not just about feedforward convnets, not just about ImageNet, and certainly not just about a few passes over the network. However, Soumith’s benchmark is the only notable one as of today. So we will base the Single-GPU performance rating on his benchmark.\n\n#### TensorFlow and Torch \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_5_stars.png\">\n\nTensorFlow used to be slow when it first came out but as of 05\u002F2016, it has reached the ballpark of other frameworks in terms of ConvNet speed. This is not surprising because every framework nowadays calls cuDNN for the actual computations.\n\nHere's my latest micro-benchmark of TensorFlow 0.8 versus earlier versions. The measurement is latency, in milliseconds, for one full minibatch forward-backward pass on a single Titan X GPU. 
\n\n| Network | TF 0.6 [[ref](https:\u002F\u002Fgithub.com\u002Fsoumith\u002Fconvnet-benchmarks\u002Fblob\u002Fefb3d9321d14856f49951980dbea2f554190161a\u002FREADME.md)] | TF 0.8 [my run] | Torch FP32 [my run] |\n|:------------:|:------:|----------:|------------:|\n| AlexNet      | 292  | 97  |  81  |\n| Inception v1 | 1237 | 518 |  470 |\n\n\n#### Theano \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\nOn big networks, Theano’s performance is on par with Torch7, according to [this benchmark](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1211.5590v1.pdf). The main issue of Theano is startup time, which is terrible, because Theano has to compile C\u002FCUDA code to binary. We don’t always train big models. In fact, DL researchers often spend more time debugging than training big models. TensorFlow doesn’t have this problem. It simply maps the symbolic tensor operations to the already-compiled corresponding function calls.\n\nEven `import theano` takes time because this `import` apparently does a lot of work. Also, after `import theano`, you are stuck with a pre-configured device (e.g. `GPU0`).\n\n### Multi-GPU\nTBD \n\n## Architecture\nDeveloper Zone\n\n#### Caffe \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\nCaffe's architecture was considered excellent when it was born but by modern standards, it is considered average. The main pain points of Caffe are its layer-wise design in C++ and the protobuf interface for model definition.\n\n**Layer-wise design.** The building block of a network in Caffe is the layer. 
\n- For new layer types, you have to define the full forward, backward, and gradient update. You can see an already [long list of layers implemented in (official) caffe](https:\u002F\u002Fgithub.com\u002FBVLC\u002Fcaffe\u002Ftree\u002Fmaster\u002Fsrc\u002Fcaffe\u002Flayers).\n- What's worse is that if you want to support both CPU and GPU, you need to implement extra functions, e.g. [`Forward_gpu` and `Backward_gpu`](https:\u002F\u002Fgithub.com\u002FBVLC\u002Fcaffe\u002Fblob\u002Fmaster\u002Fsrc\u002Fcaffe\u002Flayers\u002Fcudnn_conv_layer.cu).\n- Worse, you need to assign an int id to your layer type and add that to the [proto file](https:\u002F\u002Fgithub.com\u002FBVLC\u002Fcaffe\u002Fblob\u002Fmaster\u002Fsrc\u002Fcaffe\u002Fproto\u002Fcaffe.proto#L1046). If your pull request is not merged early, you may need to change the id because someone else has already claimed it.\n\n**Protobuf.** Caffe has a `pycaffe` interface but that's a mere replacement of the command line interface. The model has to be defined in protobuf (usually with a plain text editor), even if you use `pycaffe`.\n\n[*Copied from [my own answer on Quora](https:\u002F\u002Fwww.quora.com\u002FHow-is-TensorFlow-architected-differently-from-Caffe)*]\n \n#### CNTK\nTo be updated ...\n\n#### TensorFlow \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_5_stars.png\">\nTF has a clean, modular architecture with multiple frontends and execution platforms. 
Details are in the [white paper](http:\u002F\u002Fdownload.tensorflow.org\u002Fpaper\u002Fwhitepaper2015.pdf).\n\n\u003Cimg src=\"http:\u002F\u002Fi.snag.gy\u002FsJlZe.jpg\" width=\"500\">\n\n#### Theano \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\nThe architecture is fairly hacky: the whole code base is Python, where C\u002FCUDA code is packaged as Python strings. This makes it hard to navigate, debug, and refactor, and hence hard for developers to contribute.\n\n#### Torch \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_5_stars.png\">\nThe Torch7 and nn libraries are also well designed, with clean, modular interfaces.\n\n## Ecosystem\n- Caffe and CNTK: C++\n- TensorFlow: Python and C++\n- Theano: Python\n- Torch: Lua is not a mainstream language and hence libraries built for it are not as rich as ones built for Python.\n\n\n## Cross-platform\nCaffe, CNTK, TensorFlow and Theano work on all OSes. 
Torch does not work on Windows and there's no known plan to port from either camp.\n\n\u003Cbr>\n___ \n\n**Footnotes**\n\n[1] Note that I don’t aggregate ratings because different users\u002Fdevelopers have different priorities.\n\n[2] Disclaimer: I haven’t analyzed this extension carefully.\n\n[3] See my [blog post](http:\u002F\u002Fwww.kentran.net\u002F2014\u002F12\u002Fchallenges-in-machine-learning-practice.html) for why this is desirable.\n","# 深度学习工具包评估\n\n**警告**：本研究于2015年末完成，并在2016年初略有修改。自那时以来，许多工具包已显著改进。\n\n**摘要**。在本研究中，我评估了一些流行的深度学习工具包。候选工具包按字母顺序排列如下：[Caffe](https:\u002F\u002Fgithub.com\u002FBVLC\u002Fcaffe)、[CNTK](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FCNTK)、[TensorFlow](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorflow)、[Theano](https:\u002F\u002Fgithub.com\u002FTheano\u002FTheano) 和 [Torch](https:\u002F\u002Fgithub.com\u002Ftorch\u002Ftorch7)。\n\n我还提供了一些领域的评分，因为对许多人来说，评分很有用。然而，请记住，评分本质上是主观的 [1]。\n\n如果您发现任何错误或不足之处，请通过提交问题来帮助改进。\n\n**目录**\n\n1. [建模能力](#modeling-capability)\n   - [接口](#interfaces)\n   - [模型部署](#model-deployment)\n   - [性能](#performance)\n   - [架构](#architecture)\n   - [生态系统](#ecosystem)\n   - [跨平台](#cross-platform)\n\n___\n\n## 建模能力\n在这一部分，我们评估每个工具包在\u003Cu>无需编写过多代码\u003C\u002Fu>的情况下训练常见及最先进网络的能力。其中一些网络包括：\n\n- 卷积神经网络：AlexNet、OxfordNet、GoogleNet\n- 循环神经网络：普通RNN、LSTM\u002FGRU、双向RNN\n- 带注意力机制的序列建模。\n\n此外，我们还评估了创建新型模型的灵活性。\n\n#### Caffe \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\nCaffe或许是第一个主流的工业级深度学习工具包，始于2013年末，这得益于其当时出色的卷积神经网络实现。它至今仍是计算机视觉社区中最受欢迎的工具包，且不断有新的扩展被积极添加。\n\n然而，由于其遗留架构的限制，Caffe对循环神经网络和语言建模的支持较差，具体限制将在[架构部分](#architecture)中详细说明。\n\n#### CNTK \u003Cimg 
src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_2_stars.png\">\nCNTK是由语音领域的人士发起的一个深度学习系统，他们曾[掀起深度学习热潮](http:\u002F\u002Fciteseerx.ist.psu.edu\u002Fviewdoc\u002Fdownload?doi=10.1.1.185.1908&rep=rep1&type=pdf)，后来发展成为一个更通用、跨平台的深度学习系统。CNTK在语音社区中比在广义的深度学习社区中更为知名。\n\n在CNTK（以及TensorFlow和Theano）中，网络被定义为向量运算的符号图，例如矩阵加法、乘法或卷积。层只是这些运算的组合。构建块（运算）的细粒度使得用户无需使用低级语言实现即可发明新的复杂层类型（如Caffe那样）。\n\n#### TensorFlow \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_and_a_half_stars.png\">\n**最先进模型**\n\n- RNN API及其实现并不理想。团队也在[这里](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorflow\u002Fissues\u002F7)和[这里](https:\u002F\u002Fgroups.google.com\u002Fa\u002Ftensorflow.org\u002Fforum\u002F?utm_medium=email&utm_source=footer#!msg\u002Fdiscuss\u002FB8HyI0tVtPY\u002FaR43OIuUAwAJ)对此进行了评论。\n- 双向RNN[尚未可用](https:\u002F\u002Fgroups.google.com\u002Fa\u002Ftensorflow.org\u002Fforum\u002F?utm_medium=email&utm_source=footer#!msg\u002Fdiscuss\u002FlwgaL7WEuW4\u002FUXaL4bYkAgAJ)\n- 缺乏3D卷积，而3D卷积对于视频识别非常有用。\n\n**新模型**\n由于TF采用向量运算的符号图方法，指定一个新的网络相当容易。尽管它目前还不支持符号循环（至少截至2016年5月尚未经过充分测试或文档化），但可以使用[分桶技巧](https:\u002F\u002Fwww.tensorflow.org\u002Fversions\u002Fr0.8\u002Ftutorials\u002Fseq2seq\u002Findex.html#bucketing-and-padding)轻松高效地构建RNN。\n\n然而，TF在建模灵活性方面存在重大缺陷。所有的计算流程都必须以静态图的形式构建。这使得一些计算变得困难，例如[束搜索](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorflow\u002Fissues\u002F654)（常用于序列预测任务）。\n\n#### Theano \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_and_a_half_stars.png\">\n**最先进模型。** Theano实现了大多数最先进的网络，无论是通过高级框架（例如[Blocks](https:\u002F\u002Fgithub.com\u002Fmila-udem\u002Fblocks)、[Keras](https:\u002F\u002Fgithub.com\u002Ffchollet\u002Fkeras)等）还是纯Theano实现。\n\n**新模型。** 
Theano率先提出了使用符号图来编程网络的趋势。Theano的符号API支持循环控制，即所谓的[scan](http:\u002F\u002Fdeeplearning.net\u002Fsoftware\u002Ftheano\u002Ftutorial\u002Floop.html)，这使得实现RNN既简单又高效。用户不必总是从张量运算层面定义新模型。上述提到的一些高级框架使模型定义和训练更加简便。\n\n#### Torch \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_5_stars.png\">\n**最先进模型**\n\n- 非常适合卷积网络。值得注意的是，在TensorFlow\u002FTheano中可以通过`conv2d`实现时间卷积，但这只是一种技巧。而Torch原生的时间卷积接口使其使用起来更为直观。\n- 通过一个[非官方扩展](https:\u002F\u002Fgithub.com\u002FElement-Research\u002Frnn)提供了丰富的RNN集合 [2]\n\n**新模型。** 在Torch中，定义网络的方式有多种（层的堆叠或层的图），但本质上，网络是作为层的图来定义的。由于这种较粗的粒度，Torch有时被认为灵活性较低，因为对于新型层，用户需要实现完整的前向、反向传播以及梯度更新。\n\n然而，与Caffe不同，Torch定义新层要容易得多，因为你不需要用C++编程。此外，在Torch中，新层定义与网络定义之间的差异很小。而在Caffe中，层是用C++定义的，而网络则是通过`Protobuf`定义的。\n\nTorch比TensorFlow和Theano更具灵活性，因为它采用命令式编程，而TF\u002FTheano则采用声明式编程（即需要声明计算图）。这使得一些操作，例如束搜索，在Torch中更容易实现。\n\n---\n\u003Ccenter>\n\u003Cimg src=\"http:\u002F\u002Fi.snag.gy\u002F0loNv.jpg\" height=\"450\">  \u003Cimg src=\"https:\u002F\u002Fcamo.githubusercontent.com\u002F49ac7d0f42e99d979c80a10d0ffd125f4b3df0ea\u002F68747470733a2f2f7261772e6769746875622e636f6d2f6b6f7261796b762f746f7263682d6e6e67726170682f6d61737465722f646f632f6d6c70335f666f72776172642e706e67\" height=\"450\">\u003Cbr>\n\u003Ci>左：CNTK\u002FTheano\u002FTensorFlow的图模型；右：Caffe\u002FTorch的图模型\u003C\u002Fi>\n\u003C\u002Fcenter>\n\n## 接口\n\n#### Caffe \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\nCaffe 提供了 `pycaffe` 接口，但那只是命令行接口的次要替代方案。即使使用 `pycaffe`，模型仍然必须用 Protocol Buffers 定义（通常通过纯文本编辑器）。\n\n#### CNTK \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_2_and_a_half_stars.png\">\n与 Caffe 类似，使用 CNTK 的方式是通过指定配置文件并运行命令行。CNTK 自 2.0 版本起支持 Python，而 C# 支持仍在开发中。\n\n\n#### TensorFlow \u003Cimg 
src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_and_a_half_stars.png\">\nTF 支持两种接口：Python 和 C++。这意味着你可以在丰富的高级环境中进行实验，同时也可以在需要原生代码或低延迟的环境中部署模型。\n\n如果 TF 能够支持 `F#` 或 `TypeScript` 就更完美了。Python 缺乏静态类型检查确实让人感到非常痛苦 :).\n\n#### Theano \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_stars.png\">\nPython\n\n#### Torch \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_stars.png\">\nTorch 运行在 LuaJIT 上，速度惊人（可与 C++\u002FC#\u002FJava 等工业级语言媲美）。因此，开发者无需考虑可能受限的符号化编程，可以直接编写各种计算，而不必担心性能损失。\n\n然而，不得不承认，Lua 目前还不是主流语言。\n\n## 模型部署\n部署新模型有多容易？\n\n#### Caffe \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_5_stars.png\">\nCaffe 基于 C++，可以在多种设备上编译。它具有跨平台特性（Windows 移植版可用，并由 [这里](https:\u002F\u002Fgithub.com\u002FMSRDL\u002Fcaffe) 维护）。这使得 Caffe 在部署方面成为最佳选择。\n\n#### CNTK \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_and_a_half_stars.png\">\n与 Caffe 类似，CNTK 也基于 C++ 并且是跨平台的。因此，在大多数情况下部署应该比较容易。不过据我所知，它无法在 ARM 架构上运行，这限制了其在移动设备上的应用能力。\n\n#### TensorFlow \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_4_and_a_half_stars.png\">\nTF 支持 C++ 接口，并且由于使用了 [Eigen](http:\u002F\u002Feigen.tuxfamily.org)，该库可以在 ARM 架构上编译和优化（而不是依赖 BLAS 库）。这意味着你可以在各种设备上部署训练好的模型（无论是服务器还是移动设备），而无需实现单独的模型解码器或加载 Python\u002FLuaJIT 解释器 [3]。\n\n不过，TF 目前尚不支持 Windows，因此无法在 Windows 设备上部署 TF 模型。\n\n#### Theano \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\n缺乏底层接口以及 
Python 解释器的效率问题，使得 Theano 对工业用户吸引力不足。对于大型模型来说，Python 的开销不算太大，但这种局限性依然存在。\n\nTheano 具有跨平台特性（如下所述），因此可以在 Windows 环境中部署。这为其赢得了一些加分。\n\n#### Torch \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\nTorch 需要 LuaJIT 才能运行模型。这使其相比 Caffe\u002FCNTK\u002FTF 等直接支持 C++ 的框架吸引力较低。问题不仅在于性能开销——虽然很小——更大的挑战在于如何在 API 层面与更大的生产流水线集成。\n\n\n## 性能\n### 单 GPU\n这些工具包都调用了 cuDNN，因此只要在外层没有大规模的计算或内存分配，它们的性能应该相差不大。\n\nSoumith@FB 曾对卷积神经网络进行过一些 [基准测试](https:\u002F\u002Fgithub.com\u002Fsoumith\u002Fconvnet-benchmarks)。深度学习不仅仅是前馈卷积网络、ImageNet 数据集，更不是仅仅对网络进行几次迭代那么简单。然而，截至目前，Soumith 的基准测试仍是唯一值得注意的。因此，我们将根据他的基准测试来评估单 GPU 的性能评分。\n\n#### TensorFlow 和 Torch \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_5_stars.png\">\n\nTensorFlow 刚推出时速度较慢，但截至 2016 年 5 月，其卷积神经网络的速度已与其他框架相当。这并不令人意外，因为如今每个框架的实际计算都依赖于 cuDNN。\n\n以下是我在最新版本 TensorFlow 0.8 上进行的微基准测试结果。测量的是在单块 Titan X GPU 上，一个完整的小批量前向-反向传播的延迟，单位为毫秒。\n\n| 网络 | TF 0.6 [[ref](https:\u002F\u002Fgithub.com\u002Fsoumith\u002Fconvnet-benchmarks\u002Fblob\u002Fefb3d9321d14856f49951980dbea2f554190161a\u002FREADME.md)]                                                                     | TF 0.8 [我的测试] | Torch FP32 [我的测试] |\n|:------------------------:|:-----------------------------------------------------------------------------------------------------------:| ----------:| ------------:|\n| AlexNet      | 292  | 97  |  81  |\n| Inception v1 | 1237 | 518 |  470 |\n\n\n#### Theano \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\n根据 [这份基准测试](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1211.5590v1.pdf)，在大型网络上，Theano 的性能与 Torch7 不相上下。Theano 的主要问题在于启动时间极长，因为它需要将 C\u002FCUDA 代码编译成二进制文件。我们并非总是训练大型模型，事实上，深度学习研究人员往往花更多的时间调试，而非训练大型模型。而 TensorFlow 
则不存在这个问题，它只需将符号化的张量操作映射到已经编译好的相应函数调用即可。\n\n甚至仅仅是 `import theano` 也需要时间，因为这个导入似乎做了很多工作。此外，导入 Theano 后，你只能使用预设的设备（例如 `GPU0`）。\n\n### 多 GPU\n待定\n\n## 架构\n开发者专区\n\n#### Caffe \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\nCaffe的架构在刚诞生时被认为非常优秀，但按照现代标准来看，它只能算作中等水平。Caffe的主要痛点在于其基于C++的层式设计，以及使用Protocol Buffers接口来定义模型。\n\n**层式设计。** Caffe中网络的基本构建模块是“层”。\n- 对于新的层类型，你需要完整地实现前向传播、反向传播和梯度更新逻辑。你可以在[官方Caffe已实现的大量层列表](https:\u002F\u002Fgithub.com\u002FBVLC\u002Fcaffe\u002Ftree\u002Fmaster\u002Fsrc\u002Fcaffe\u002Flayers)中看到这些内容。\n- 更糟糕的是，如果你希望同时支持CPU和GPU，还需要额外实现一些函数，例如[`Forward_gpu`和`Backward_gpu`](https:\u002F\u002Fgithub.com\u002FBVLC\u002Fcaffe\u002Fblob\u002Fmaster\u002Fsrc\u002Fcaffe\u002Flayers\u002Fcudnn_conv_layer.cu)。\n- 甚至更麻烦的是，你必须为自己的层类型分配一个整数ID，并将其添加到[proto文件](https:\u002F\u002Fgithub.com\u002FBVLC\u002Fcaffe\u002Fblob\u002Fmaster\u002Fsrc\u002Fcaffe\u002Fproto\u002Fcaffe.proto#L1046)中。如果你的Pull Request没有及时被合并，你可能还需要更改这个ID，因为其他人可能已经占用了它。\n\n**Protocol Buffers。** Caffe提供了`pycaffe`接口，但这只是命令行接口的一种替代方式。即使使用`pycaffe`，模型仍然必须用Protocol Buffers格式定义（通常通过纯文本编辑器完成）。\n\n[*摘自我在Quora上的回答：[TensorFlow与Caffe的架构有何不同？](https:\u002F\u002Fwww.quora.com\u002FHow-is-TensorFlow-architected-differently-from-Caffe)*]\n\n#### CNTK\n待更新……\n\n#### TensorFlow \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_5_stars.png\">\nTF拥有清晰、模块化的架构，支持多种前端和执行平台。具体细节可见其[白皮书](http:\u002F\u002Fdownload.tensorflow.org\u002Fpaper\u002Fwhitepaper2015.pdf)。\n\n\u003Cimg src=\"http:\u002F\u002Fi.snag.gy\u002FsJlZe.jpg\" width=\"500\">\n\n#### Theano \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_3_stars.png\">\nTheano的架构相对较为“hacky”：整个代码库都是用Python编写的，而C\u002FCUDA代码则被打包成Python字符串的形式。这使得代码难以导航、调试和重构，也因此对开发者来说贡献起来较为困难。\n\n#### 
Torch \u003Cimg src=\"http:\u002F\u002Fwww.wpclipart.com\u002Fsigns_symbol\u002Fstars\u002F5_star_rating_system\u002F.cache\u002F5_Star_Rating_System_5_stars.png\">\nTorch7及其nn库同样设计良好，具有清晰、模块化的接口。\n\n## 生态系统\n- Caffe和CNTK：主要使用C++\n- TensorFlow：同时支持Python和C++\n- Theano：主要使用Python\n- Torch：Lua并非主流语言，因此基于Lua构建的库数量远不及Python生态中的丰富程度。\n\n\n## 跨平台支持\nCaffe、CNTK和Theano均能在所有操作系统上运行。而TensorFlow和Torch目前无法在Windows上运行，且两个阵营均暂无已知的移植计划。\n\n\u003Cbr>\n___\n\n**脚注**\n\n[1] 需要注意的是，我并未对评分进行简单平均，因为不同的用户或开发者往往有不同的优先级。\n\n[2] 免责声明：我对这一扩展尚未进行深入分析。\n\n[3] 关于为何这一点值得提倡，请参阅我的[博客文章](http:\u002F\u002Fwww.kentran.net\u002F2014\u002F12\u002Fchallenges-in-machine-learning-practice.html)。","# deepframeworks 快速上手指南\n\n> **重要提示**：本项目（deepframeworks）并非一个可安装的深度学习框架，而是一份**2015 年底至 2016 年初**发布的深度学习工具包评估研究报告。它对比了 Caffe、CNTK、TensorFlow、Theano 和 Torch 等主流框架的特性。\n>\n> 由于内容较为陈旧，许多被评估的框架已发生巨大变化（如 TensorFlow 已大幅演进，Torch 后来演变为 PyTorch）。本文档旨在帮助您快速浏览该评估报告的核心内容，而非指导您安装一个名为\"deepframeworks\"的软件。\n\n## 环境准备\n\n由于本项目本质上是 Markdown 格式的文档报告，**无需安装任何深度学习依赖或 GPU 驱动**即可阅读。\n\n*   **系统要求**：任意操作系统（Windows, macOS, Linux）。\n*   **前置依赖**：\n    *   Git（用于克隆仓库）\n    *   任意 Markdown 阅读器（或直接使用 GitHub 网页版查看）\n\n## 获取与查看步骤\n\n您可以通过克隆仓库在本地查看这份评估报告。\n\n1.  **克隆仓库**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fzer0n\u002Fdeepframeworks.git\n    ```\n    *(注：如果原仓库地址不可用，请在 GitHub 上搜索 `deepframeworks` 查找对应的存档仓库)*\n\n2.  **进入目录**\n    ```bash\n    cd deepframeworks\n    ```\n\n3.  **查看报告**\n    *   **推荐方式**：直接在 GitHub 网页端打开 `README.md` 文件阅读，格式最完整。\n    *   **本地方式**：使用 VS Code、Typora 或任何支持 Markdown 的编辑器打开 `README.md`。\n\n## 核心内容速览\n\n该报告从以下几个维度对当时的五大框架进行了评分（满分 5 星）：\n\n### 1. 建模能力 (Modeling Capability)\n评估在不编写大量代码的情况下训练主流网络（如 ConvNets, RNNs, LSTM）的能力。\n*   **Torch (5⭐)**: 最灵活，支持命令式编程，易于实现新层和复杂操作（如 Beam Search）。\n*   **TensorFlow \u002F Theano (4.5⭐)**: 基于符号图，拥有丰富的高层框架支持（如 Keras），但在当时对循环神经网络（RNN）的支持略显不足。\n*   **Caffe (3⭐)**: 卷积网络（ConvNets）表现优异，但受限于架构，对循环网络（RNN）和语言建模支持较差。\n*   **CNTK (2.5⭐)**: 起源于语音领域，基于符号图，当时在通用社区知名度较低。\n\n### 2. 
接口友好度 (Interfaces)\n*   **TensorFlow**: 支持 Python 和 C++，适合实验与部署分离，但当时缺乏静态类型支持。\n*   **Torch**: 基于 LuaJIT，性能极高且无需符号编程思维，但 Lua 语言非主流。\n*   **Theano**: 纯 Python 接口。\n*   **Caffe \u002F CNTK**: 主要依赖配置文件和命令行，Python 接口在当时仅为辅助或刚起步。\n\n### 3. 模型部署 (Model Deployment)\n*   **Caffe (5⭐)**: 基于 C++，跨平台支持最好，是当时部署的首选。\n*   **TensorFlow \u002F CNTK (4.5⭐)**: 支持 C++ 接口，TF 可利用 Eigen 库在 ARM 架构（移动端）运行；CNTK 当时不支持 ARM。\n*   **Theano \u002F Torch (3⭐)**: 分别依赖 Python 解释器和 LuaJIT，在生产环境集成和性能开销上不如纯 C++ 方案友好。\n\n### 4. 性能 (Performance)\n*   在单 GPU 环境下，由于各框架底层均调用 **cuDNN**，卷积网络的性能差异不大。\n*   **TensorFlow** 在早期版本较慢，但在 2016 年 5 月左右已优化至与其他框架持平。\n\n---\n*本指南仅基于原文档内容整理。对于现代深度学习开发，建议参考 PyTorch 或最新版 TensorFlow 的官方文档。*","某计算机视觉初创团队在 2016 年初期需要为视频识别项目选型深度学习框架，面临从 Caffe、TensorFlow 到 Torch 等多个主流工具的决策难题。\n\n### 没有 deepframeworks 时\n- 团队成员需分别阅读各框架冗长的英文文档和零散的 GitHub Issue，耗时数周仍难以横向对比其对双向 RNN 或 3D 卷积的支持情况。\n- 由于缺乏客观的建模能力评估，团队误选了当时对循环神经网络支持较弱的 Caffe，导致后续开发视频序列模型时需大量编写底层代码“造轮子”。\n- 在性能与生态系统的权衡上只能依靠社区道听途说，无法量化判断 TensorFlow 的符号图机制是否真能比 Theano 更灵活地构建新模型。\n- 试错成本极高，一旦框架选型失误，整个项目的架构推倒重来，直接延误产品上线周期。\n\n### 使用 deepframeworks 后\n- 团队直接参考 deepframeworks 提供的多维评分表，快速定位到 TensorFlow 在通用建模上的高分优势及其在 RNN 方面的具体短板，决策时间缩短至 2 天。\n- 依据报告中关于“建模灵活性”的详细分析，团队避开了对递归网络支持不佳的工具，选择了更适合序列建模的架构方案，减少了 80% 的自定义算子开发工作。\n- 通过报告中对接口友好度和跨平台能力的对比，团队提前预判了部署阶段的潜在坑点，制定了更稳妥的工程落地路径。\n- 借助清晰的星级评价和优缺点总结，非算法背景的技术负责人也能参与讨论，统一了团队内部的技术选型共识。\n\ndeepframeworks 将晦涩的技术细节转化为直观的决策依据，帮助开发者在深度学习框架混战初期以最低试错成本锁定最优技术栈。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fzer0n_deepframeworks_01a6ea13.png","zer0n","Kenneth Tran","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fzer0n_636b0bda.jpg",null,"Microsoft Research","http:\u002F\u002Fwww.kentran.net","https:\u002F\u002Fgithub.com\u002Fzer0n",2043,300,"2026-04-07T13:58:02",5,"Linux, macOS, Windows (部分支持\u002F需移植)","需要 NVIDIA GPU (基于 cuDNN)，具体型号和显存未说明，需安装对应版本的 CUDA","未说明",{"notes":87,"python":88,"dependencies":89},"该文档是 2015-2016 年的旧版评估报告，所列工具（如 Caffe, Theano, 旧版 
Torch\u002FTensorFlow）的版本和依赖关系已过时，不适用于现代环境。文中提到：Caffe 和 CNTK 基于 C++；Torch 依赖 LuaJIT 而非 Python；TensorFlow 当时不支持 Windows；CNTK 当时不支持 ARM 架构。所有框架在单 GPU 性能上均依赖 cuDNN。","支持 Python (Caffe\u002FTheano\u002FTensorFlow\u002FCNTK)，Torch 主要使用 LuaJIT",[90,91,92,93,94,95],"cuDNN","Eigen (TensorFlow)","BLAS","LuaJIT (Torch)","Protobuf (Caffe)","Keras\u002FBlocks (Theano 高层框架)",[14,97],"其他","2026-03-27T02:49:30.150509","2026-04-16T08:19:15.356479",[101,106,111,116,121,126],{"id":102,"question_zh":103,"answer_zh":104,"source_url":105},35430,"CNTK 是否有 Python 接口？","是的，CNTK 现已提供 Python 接口。虽然早期版本缺乏高级语言支持，但现在可以通过 Microsoft 官方仓库中的 contrib 部分找到 Python 绑定：https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FCNTK\u002Ftree\u002Fmaster\u002Fcontrib\u002FPython。","https:\u002F\u002Fgithub.com\u002Fzer0n\u002Fdeepframeworks\u002Fissues\u002F15",{"id":107,"question_zh":108,"answer_zh":109,"source_url":110},35429,"Torch 是否支持计算图（Computational Graph）方法？","支持。Torch 可以通过 `nngraph` (https:\u002F\u002Fgithub.com\u002Ftorch\u002Fnngraph) 或 `Autograd` (https:\u002F\u002Fgithub.com\u002Ftwitter\u002Ftorch-autograd) 库来实现计算图方法。此外，官方路线图中曾计划将 `nn` 和 `nngraph` 合并，以提供更统一的抽象层。","https:\u002F\u002Fgithub.com\u002Fzer0n\u002Fdeepframeworks\u002Fissues\u002F3",{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},35428,"TensorFlow 支持在 Windows 上运行吗？","是的，TensorFlow 自 0.12 版本起已原生支持 Windows。用户可以直接在 Win 10 上使用。若需 GPU 支持（需 CUDA 8），可运行命令：`pip install tensorflow-gpu`。虽然早期可以通过 Windows 的 Ubuntu Bash 子系统运行，但原生支持更为稳定且能更好地利用 GPU 资源。","https:\u002F\u002Fgithub.com\u002Fzer0n\u002Fdeepframeworks\u002Fissues\u002F19",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},35431,"Torch 模型是否容易部署到移动端或集成到其他系统？","是的，Lua 和 Torch 可以轻松部署到运行 iOS 和 Android 的智能手机上。对于服务器端集成，Lua 可以与 C、C++、CUDA、Java、Python 等多种语言进行交互，这使得它很容易集成到现有的生产管道中。尽管其可视化库相对较弱，但在部署和集成方面具有显著优势。","https:\u002F\u002Fgithub.com\u002Fzer0n\u002Fdeepframeworks\u002Fissues\u002F23",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},35432,"Caffe 是否仅限于 x86 
架构或需要 CUDA 才能运行？","不是。Caffe 并不强制要求 x86 架构或 CUDA，它已经被成功移植到 iOS 和 Android 平台上，具有良好的跨平台能力。","https:\u002F\u002Fgithub.com\u002Fzer0n\u002Fdeepframeworks\u002Fissues\u002F2",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},35433,"在 Torch 中实现新层（如 LSTM）是否需要手动编写完整的前向和反向传播代码？","不需要。在 Torch 中，新层可以像 Theano 一样通过组合现有层来构建，无需为每个新层手动实现完整的前向\u002F反向传播逻辑。例如，可以参考 karpathy 的 char-rnn 项目中的 LSTM 和 GRU 实现示例。","https:\u002F\u002Fgithub.com\u002Fzer0n\u002Fdeepframeworks\u002Fissues\u002F1",[]]