[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-hughperkins--DeepCL":3,"tool-hughperkins--DeepCL":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":78,"owner_location":78,"owner_email":78,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":108,"forks":109,"last_commit_at":110,"license":111,"difficulty_score":112,"env_os":113,"env_gpu":114,"env_ram":115,"env_deps":116,"category_tags":126,"github_topics":78,"view_count":23,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":127,"updated_at":128,"faqs":129,"releases":160},3187,"hughperkins\u002FDeepCL","DeepCL","OpenCL library to train deep convolutional neural networks","DeepCL 是一个基于 OpenCL 的开源库，专为训练深度卷积神经网络而设计。它旨在解决在异构计算硬件上高效构建和训练深度学习模型的需求，让开发者能够充分利用 GPU 或 APU 的并行计算能力，而不仅限于特定的厂商生态。\n\n这款工具非常适合需要底层控制权的深度学习研究人员、算法工程师以及希望探索强化学习（如 Q-learning）的开发者。通过提供 C++ 核心库以及便捷的 Python、Lua 封装和命令行接口，DeepCL 降低了高性能网络训练的门槛。其技术亮点在于广泛的兼容性，支持多种网络层类型（如卷积、池化、Dropout）、丰富的激活函数（包括 ReLU、ELU 等）以及多样化的优化器（如 SGD、Adadelta、Nesterov）。此外，它还支持多列网络架构和随机数据增强，并兼容 JPEG、MNIST 等多种数据格式。无论是进行图像识别研究还是博弈策略训练，DeepCL 都能为用户提供灵活且高效的解决方案。","  DeepCL\n==========\n\n- [Python API](python\u002FREADME.md)\n- [Command line API](doc\u002FCommandline.md)\n- [C++ API](doc\u002FNeuralNetAPI.md)\n- [Q-learning](doc\u002FQLearning.md)\n- [To build](doc\u002FBuild.md)\n- [Development](doc\u002FDevelopment.md)\n- 
[Changes](doc\u002FChanges.md)\n\nDeepCL\n==========\n\nOpenCL library to train deep convolutional networks\n- C++\n- OpenCL\n- Deep convolutional\n- Python wrappers\n- Lua wrappers\n- Q-learning\n\nAPIs:\n* [Python](python\u002FREADME.md)\n* [c++](doc\u002FNeuralNetAPI.md)\n* [command-line](doc\u002FCommandline.md)\n\nLayer types:\n* convolutional\n* max-pooling\n* normalization\n* activation\n* dropout\n* random translations\n* random patches\n* loss\n\nLoss layer types:\n* softmax\n* cross-entropy (synonymous with multinomial logistic, etc)\n* square loss\n\nTrainers:\n* SGD\n* Anneal\n* Nesterov\n* Adagrad\n* Rmsprop\n* Adadelta\n\nActivations:\n* tanh\n* scaled tanh (1.7519 * tanh(2\u002F3x) )\n* linear\n* sigmoid\n* relu\n* elu (new!)\n\n[Loader formats](doc\u002FLoaders.md):\n* jpegs\n* mnist\n* kgsv2\n* norb\n\nWeight initializers:\n* original\n* uniform\n* more possible...\n\nMulticolumn net also possible, as in [McDnn](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1202.2745.pdf)\n\n# Example usages\n\n- obtained 37.2% test accuracy, on next move prediction task, using 33.6 million training examples from [kgsgo v2 dataset](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fkgsgo-dataset-preprocessor)\n  - commandline used `.\u002Fdeepcl_train dataset=kgsgoall netdef=12*(32c5z-relu)-500n-tanh-361n numepochs=15 learningrate=0.0001`\n  - 2 epochs, 2 days per epoch, on an Amazon GPU instance, comprising half an NVidia GRID K520 GPU (about half as powerful as a GTX780)\n- obtained 99.5% test accuracy on MNIST, using `netdef=rt2-8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n numepochs=20 multinet=6 learningrate=0.002`\n   - epoch time 99.8 seconds, using an Amazon GPU instance, ie half an NVidia GRID K520 GPU (since we are learning 6 nets in parallel, so 16.6seconds per epoch per net)\n\n# Installation\n\n## Native library installation\n\nThis section installs the native libraries, and the command-line tools.  
You always need to do this part, even if you will use the Python wrappers.\n\n### Windows\n\n#### Pre-requisites:\n\n* OpenCL-enabled GPU or APU, along with appropriate OpenCL driver installed\n* Tested using Windows 2012 RC2, and (New!) Visual Studio 2015, this is how the CI builds run\n\n#### Procedure:\n\n* Download latest binary zip file from http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002F (eg from v8.0.0rc8)\n* unzip it, which creates the `dist` folder\n* To test it:\n  * open a cmd\n  * run `call dist\\bin\\activate.bat` (adjusting the path appropriately for wherever you downloaded deepcl binaries to)\n  * now, eg try `deepcl_unittests`\n  * (New!), you can choose which gpu to run tests on now, eg: `deepcl_unittests gpuindex=1`\n\nNote that you need to \"activate\" the installation each time you open a new cmd prompt (or you could add appropriate environment variables permanently, using Control Panel | System | Advanced System Settings | Environment Variables)\n\n### Linux\n\n#### Pre-requisites:\n\n* OpenCL-enabled GPU or APU, along with appropriate OpenCL driver installed (can check by running `clinfo`, which should show your desired GPU device)\n* Tested using Ubuntu 14.04 32-bit\u002F64-bit\n\n#### Procedure:\n\n* Download latest tar file from http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002F (eg from v8.0.0rc8)\n* untar it, which creates the `dist` sub-folder\n* in a bash prompt, run `source dist\u002Fbin\u002Factivate.sh` (adjust the path appropriate for wherever you untarred the binaries tar file to)\n* test by doing, from the same bash prompt, eg `deepcl_unittests`\n  * (New!), you can choose which gpu to run tests on now, eg: `deepcl_unittests gpuindex=1`\n\nNote that you need to \"activate\" the installation each time you open a new bash prompt (or you can call activate.sh from your `.bashrc` file)\n\n## Python wrappers\n\n* make sure you already installed the native library, and \"activate\"d it, by doing `call 
dist\\bin\\activate.bat`, or `source dist\u002Fbin\u002Factivate.sh`\n* run `pip install --pre DeepCL`\n* test by doing `python -c \"import PyDeepCL; cl = PyDeepCL.DeepCL()\"`\n\n## To build from source\n\nBuilding from source is only needed if installing from binaries doesn't work for your configuration, or if you want to modify DeepCL.\n\nSee [Build.md](doc\u002FBuild.md)\n\n## What if it doesn't run?\n\n* Check if you have an OpenCL-enabled device on your system\n  * ideally a GPU, or accelerator, since there is no attempt to optimize DeepCL for CPUs (at least, not currently, could change, feel free to submit a pull request :-) )\n* Try running `gpuinfo` (from [EasyCL](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FEasyCL), but built as part of this project too, for ease of use )\n  * it should output at least one OpenCL-enabled device\n  * if it doesn't, then you need to make sure you have an OpenCL-enabled device, and that appropriate drivers are installed, and that the ICD is configured appropriately (registry in Windows, and `\u002Fetc\u002FOpenCL\u002Fvendors` in linux)\n\n# What if I need a new feature?\n\nPlease raise an issue, let me know you're interested.\n* If it's on my list of things I was going to do sooner or later anyway (see below), I might do it sooner rather than later.\n* If it's to do with usability, I will try to make that a priority\n\nWhat if I want to contribute myself?\n=================\n\n- please feel free to fork this repository, tweak things, send a pull request.  Or get in contact. 
Or both :-)\n\nThird-party libraries\n=====================\n\n* [EasyCL](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FEasyCL)\n* [clew](https:\u002F\u002Fgithub.com\u002Fmartijnberger\u002Fclew)\n* [libpng++](http:\u002F\u002Fwww.nongnu.org\u002Fpngpp\u002Fdoc\u002F0.2.1\u002F)\n* lua\n* cogapp\n\nHardware\u002Fdriver specific issues\n===============================\n\n* If you're using Clover, you might want to look at:\n  * this thread [https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Fissues\u002F35](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Fissues\u002F35)\n  * this branch [https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Ftree\u002Fclover-compatibility](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Ftree\u002Fclover-compatibility)\n  * Note that Clover is NOT supported, these are just provided as \"starting-points\", in case someone wants to dabble in this :)\n\nRelated projects\n================\n\n* [kgsgo-dataset-preprocessor](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fkgsgo-dataset-preprocessor) Dataset based on kgsgo games; 33 million data points\n* [cltorch](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fcltorch)\n* [clnn](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fclnn)\n\nLicense\n=======\n\n[Mozilla Public License 2.0](http:\u002F\u002Fmozilla.org\u002FMPL\u002F2.0\u002F)\n\nRecent changes\n==============\n\n* 2017 May 2nd:\n  * branch `update-easycl-mac` updated to latest EasyCL, and unit-tests tested on Mac Sierra against:\n    * Intel HD Graphics 530 GPU\n    * Radeon Pro 450 GPU\n  * This latest EasyCL lets you use environment variable `CL_GPUOFFSET` to select gpus, eg set to `1` for second GPU, or `2` for third\n  * Thank you to my employer [ASAPP](http:\u002F\u002Fasapp.com) for providing me use of said Mac Sierra :-)\n* 7th August 2016:\n  * \"standard\" version of windows compiler changed from msvc2010 to msvc2015 update 3  (no change to linux\u002Fmac)\n 
 * \"standard\" version of python 3.x on windows changed from 3.4 to 3.5  (no change to linux\u002Fmac)\n  * (note: python2.7 continues to work as before on all of Windows 32\u002F64, linux, Mac)\n  * standard c++ version on linux\u002Fmac changed from c++0x to c++11\n* 29th July 2016:\n  * python fixes:\n    * CHANGE: must use numpy tensors now, `array.array` no longer accepted\n    * New feature: can provide numpy tensors as 4d tensors now, no longer have to be 1d tensors\n    * Bug fix: q-learning working again now (hopefully)\n* 26th July 2016:\n  * fixed some bugs in manifest loader\n  * no longer need to specify the number of images in the first line of the manifest file\n  * added `gpuindex=` option to `deepcl_unittests` (quite beta for now...)\n* 4th January 2016:\n  * fixed a number of build warnings on Mac, both in OpenCL build, and C++ build\n* 3rd January 2016:\n  * create Mac OS X build on Travis, and fix the build, https:\u002F\u002Ftravis-ci.org\u002Fhughperkins\u002FDeepCL\n* 27th November:\n  * added [ELU](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.07289v1.pdf)\n* Week of 26th October:\n  * created branch `clblas-2.8.0`, which works with Visual Studio 2015.  It uses the latest 2.8.x release of clBLAS.  
Thank you to jakakonda for helping to test this and get it working.\n* Aug 28th:\n  * merged 8.x branch to master, will release first version of 8.x shortly\n  * installation of 8.x from binaries on Windows works now, by doing, eg on 32-bit Windows 7, and assuming you already activated an appropriate python environment (assumes 7-zip is installed, in default location, otherwise do the unzip by hand):\n```\npowershell Set-ExecutionPolicy unrestricted\nrem following command is like `wget` in linux:\npowershell.exe -Command (new-object System.Net.WebClient).DownloadFile('http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002Fdeepcl-win32-v8.0.0rc8.zip', 'deepcl-win32-v8.0.0rc8.zip')\nrem following command is like `tar -xf` in linux:\n\"c:\\program files\\7-Zip\\7z.exe\" x deepcl-win32-v8.0.0rc8.zip\ncall dist\\bin\\activate.bat\npip install --pre DeepCL\npython -c \"import PyDeepCL; cl = PyDeepCL.DeepCL()\"\n# (last line is just to check works ok)\n```\n* Aug 26th: installation of 8.x from binaries on linux works now, by doing, eg on 64-bit Ubuntu 14.04:\n```\nmkdir 8.0.0rc4\ncd 8.0.0rc4\nwget http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002Fdeepcl-linux64-v8.0.0rc4.tar.bz2\ntar -xf deepcl-linux64-v8.0.0rc4.tar.bz2\nvirtualenv env\nsource env\u002Fbin\u002Factivate\nsource dist\u002Fbin\u002Factivate.sh\npip install --pre DeepCL\npython -c \"import PyDeepCL; cl = PyDeepCL.DeepCL()\"\n```\n(last line is just to check works ok)\n\n* Aug 21st-24th:\n  * 8.x finally builds again on all CI tested configurations!\n    * ubuntu 14.04 32-bit Python 2.7\n    * ubuntu 14.04 32-bit Python 3.4\n    * ubuntu 14.04 64-bit Python 2.7\n    * ubuntu 14.04 64-bit Python 3.4\n    * visual studio 2010 32-bit python 2.7\n    * visual studio 2010 32-bit python 3.4\n    * visual studio 2010 64-bit python 2.7\n    * visual studio 2010 64-bit python 3.4\n* Aug 19th-20th:\n  * Python wrappers now built using a very thin setup.py layer, on top of the standard native DeepCL 
build\n* Aug 18th:\n  * added BackwardIm2Col layer, which uses im2col for backward propagation\n  * added BackpropWeightsIm2Col layer, which uses im2col for weight update\n  * added BackwardAuto layer, which automatically selects fastest Backward layer\n  * added BackpropWeightsAuto layer, which automatically selects faster weight update layer\n  * under the covers:\n    * created ClBlasHelper, to handle Gemm and Gemv\n    * factorized im2col into Im2Col class\n* week up to Aug 17th:\n  * added forward and backward im2col layer\n  * forward im2col automatically used during forward propagation, where appropriate\n  * backwards has yet to be integrated\n  * under the covers:\n    * added clBLAS\n    * migrated the Python build process to use cmake, rather than setup.py (whether this turns out to be good or bad is a bit up in the air for now)\n* June 22nd:\n  * removed lua wrappers\n  * if you want to use lua with OpenCL, please consider using [cltorch](http:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fcltorch) and [clnn](http:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fclnn)\n\nTo get in contact\n=================\n\nJust create an issue on GitHub, using the link at the top right of this page.  Don't worry about whether you think the issue sounds silly or anything.  
The more feedback the better!\n\nNote that I'm currently focused 100.000% on [cuda-on-cl](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fcuda-on-cl), so please be patient during this period.\n","DeepCL\n==========\n\n- [Python API](python\u002FREADME.md)\n- [命令行 API](doc\u002FCommandline.md)\n- [C++ API](doc\u002FNeuralNetAPI.md)\n- [Q-learning](doc\u002FQLearning.md)\n- [构建指南](doc\u002FBuild.md)\n- [开发指南](doc\u002FDevelopment.md)\n- [变更记录](doc\u002FChanges.md)\n\nDeepCL\n==========\n\n一个用于训练深度卷积神经网络的 OpenCL 库\n- C++\n- OpenCL\n- 深度卷积\n- Python 封装\n- Lua 封装\n- Q-learning\n\nAPI：\n* [Python](python\u002FREADME.md)\n* [C++](doc\u002FNeuralNetAPI.md)\n* [命令行](doc\u002FCommandline.md)\n\n层类型：\n* 卷积层\n* 最大池化层\n* 归一化层\n* 激活层\n* Dropout 层\n* 随机平移层\n* 随机裁剪层\n* 损失层\n\n损失层类型：\n* softmax\n* 交叉熵（与多项逻辑回归等同）\n* 均方误差\n\n优化器：\n* SGD\n* Anneal\n* Nesterov\n* Adagrad\n* Rmsprop\n* Adadelta\n\n激活函数：\n* tanh\n* 缩放 tanh（1.7519 * tanh(2\u002F3x)）\n* 线性\n* sigmoid\n* ReLU\n* ELU（新！）\n\n[数据加载格式](doc\u002FLoaders.md)：\n* JPEG 图片\n* MNIST 数据集\n* KGSGO v2 数据集\n* NORB 数据集\n\n权重初始化方法：\n* 原始初始化\n* 均匀分布\n* 更多可能性...\n\n也支持多列网络结构，如 [McDNN](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1202.2745.pdf) 中所述。\n\n# 示例用法\n\n- 在下一步棋预测任务中，使用来自 [KGSGO v2 数据集](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fkgsgo-dataset-preprocessor) 的 3360 万条训练样本，获得了 37.2% 的测试准确率。\n  - 命令行指令为：`.\u002Fdeepcl_train dataset=kgsgoall netdef=12*(32c5z-relu)-500n-tanh-361n numepochs=15 learningrate=0.0001`\n  - 每个 epoch 耗时 2 天，共运行了 2 个 epoch，使用的硬件是一台亚马逊 GPU 实例，其中包含半块 NVidia GRID K520 GPU（性能约为 GTX780 的一半）。\n\n- 在 MNIST 数据集上，使用 `netdef=rt2-8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n numepochs=20 multinet=6 learningrate=0.002` 的配置，获得了 99.5% 的测试准确率。\n  - 每个 epoch 耗时 99.8 秒，使用的硬件同样是亚马逊的一半 NVidia GRID K520 GPU（由于我们并行训练 6 个网络，因此每个网络每个 epoch 只需约 16.6 秒）。\n\n# 安装\n\n## 原生库安装\n\n本节介绍原生库及命令行工具的安装。即使你打算使用 Python 封装，也需要先完成这一步。\n\n### Windows\n\n#### 先决条件：\n\n* 支持 OpenCL 的 GPU 或 APU，并已安装相应的 OpenCL 驱动程序\n* 已在 Windows 2012 RC2 和（全新）Visual 
Studio 2015 上测试通过，这也是持续集成构建所使用的环境。\n\n#### 操作步骤：\n\n* 从 http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002F 下载最新的二进制压缩包（例如 v8.0.0rc8 版本）\n* 解压后会生成 `dist` 文件夹\n* 测试方法：\n  * 打开命令提示符\n  * 运行 `call dist\\bin\\activate.bat`（请根据你下载 DeepCL 二进制文件的实际路径调整命令）\n  * 然后尝试运行 `deepcl_unittests`\n  * （新增功能）现在可以选择运行测试的 GPU 设备，例如：`deepcl_unittests gpuindex=1`\n\n请注意，每次打开新的命令提示符时都需要“激活”安装环境（或者你可以通过控制面板 | 系统 | 高级系统设置 | 环境变量永久添加相关环境变量）。\n\n### Linux\n\n#### 先决条件：\n\n* 支持 OpenCL 的 GPU 或 APU，并已安装相应的 OpenCL 驱动程序（可通过运行 `clinfo` 检查，该命令应显示你期望的 GPU 设备）\n* 已在 Ubuntu 14.04 32 位\u002F64 位系统上测试通过。\n\n#### 操作步骤：\n\n* 从 http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002F 下载最新的 tar 包（例如 v8.0.0rc8 版本）\n* 解压后会生成 `dist` 子文件夹\n* 在 Bash 终端中运行 `source dist\u002Fbin\u002Factivate.sh`（请根据解压二进制 tar 包的实际路径调整命令）\n* 测试方法：在同一 Bash 终端中运行 `deepcl_unittests`\n  * （新增功能）现在可以选择运行测试的 GPU 设备，例如：`deepcl_unittests gpuindex=1`\n\n请注意，每次打开新的 Bash 终端时都需要“激活”安装环境（或者你可以将 `activate.sh` 添加到你的 `.bashrc` 文件中）。\n\n## Python 封装\n\n* 确保你已经安装了原生库，并通过运行 `call dist\\bin\\activate.bat` 或 `source dist\u002Fbin\u002Factivate.sh` 完成激活\n* 运行 `pip install --pre DeepCL`\n* 测试方法：运行 `python -c \"import PyDeepCL; cl = PyDeepCL.DeepCL()\"`\n\n## 从源码构建\n\n仅当使用二进制包无法满足你的配置需求，或你希望对 DeepCL 进行修改时，才需要从源码构建。\n\n详情请参阅 [Build.md](doc\u002FBuild.md)。\n\n## 如果无法运行怎么办？\n\n* 检查你的系统是否配备了支持 OpenCL 的设备\n  * 理想情况下是 GPU 或加速器，因为目前 DeepCL 并未针对 CPU 进行优化（至少暂时如此，未来可能会改变，欢迎提交 Pull Request :-) ）\n* 尝试运行 `gpuinfo`（来自 [EasyCL](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FEasyCL)，但也作为本项目的一部分内置，以方便使用）\n  * 该工具应至少输出一个支持 OpenCL 的设备\n  * 如果没有，则需要确保你拥有支持 OpenCL 的设备，并已正确安装驱动程序以及配置 ICD（Windows 中为注册表，Linux 中为 `\u002Fetc\u002FOpenCL\u002Fvendors`）。\n\n# 如果我需要一个新功能怎么办？\n\n请提交一个问题，告诉我你对此感兴趣。\n* 如果这个功能本来就在我的待办事项列表中，迟早会实现的话（见下文），我可能会提前完成它。\n* 如果这个功能与用户体验相关，我会尽量将其列为优先事项。\n\n如果我想自己贡献代码呢？\n=================\n\n- 请随意 fork 这个仓库，修改代码并提交 pull request。或者直接联系我。也可以两者都做 :-)\n\n第三方库\n=====================\n\n* 
[EasyCL](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FEasyCL)\n* [clew](https:\u002F\u002Fgithub.com\u002Fmartijnberger\u002Fclew)\n* [libpng++](http:\u002F\u002Fwww.nongnu.org\u002Fpngpp\u002Fdoc\u002F0.2.1\u002F)\n* Lua\n* cogapp\n\n硬件\u002F驱动程序相关问题\n===============================\n\n* 如果你使用的是 Clover，你可能想看看：\n  * 这个讨论串 [https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Fissues\u002F35](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Fissues\u002F35)\n  * 这个分支 [https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Ftree\u002Fclover-compatibility](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Ftree\u002Fclover-compatibility)\n  * 请注意，Clover 并不受官方支持，这些内容只是作为“起点”提供给那些想要尝试的人 :)\n\n相关项目\n================\n\n* [kgsgo-dataset-preprocessor](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fkgsgo-dataset-preprocessor) 基于 kgsgo 棋局的数据集；包含 3300 万个数据点\n* [cltorch](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fcltorch)\n* [clnn](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fclnn)\n\n许可证\n=======\n\n[Mozilla 公共许可证 2.0](http:\u002F\u002Fmozilla.org\u002FMPL\u002F2.0\u002F)\n\n近期变更\n==============\n\n* 2017年5月2日：\n  * 分支 `update-easycl-mac` 更新至最新版 EasyCL，并在 Mac Sierra 上对以下 GPU 进行了单元测试：\n    * Intel HD Graphics 530 GPU\n    * Radeon Pro 450 GPU\n  * 最新版 EasyCL 允许使用环境变量 `CL_GPUOFFSET` 来选择 GPU，例如设置为 `1` 选择第二个 GPU，或设置为 `2` 选择第三个 GPU。\n  * 感谢我的雇主 [ASAPP](http:\u002F\u002Fasapp.com) 提供了这台 Mac Sierra 给我使用 :-)\n* 2016年8月7日：\n  * Windows 编译器的“标准”版本由 msvc2010 更改为 msvc2015 update 3（Linux 和 Mac 不变）\n  * Windows 上 Python 3.x 的“标准”版本由 3.4 更改为 3.5（Linux 和 Mac 不变）\n  * （注：Python 2.7 在所有 Windows 32\u002F64 位、Linux 和 Mac 上仍可正常使用）\n  * Linux 和 Mac 上的标准 C++ 版本由 c++0x 更改为 c++11\n* 2016年7月29日：\n  * Python 相关修复：\n    * 变更：现在必须使用 numpy 张量，不再接受 `array.array`\n    * 新特性：现在可以将 numpy 张量作为 4D 张量提供，不再局限于 1D 张量\n    * 错误修复：Q-learning 现在应该又能正常工作了\n* 2016年7月26日：\n  * 修复了清单加载器中的一些错误\n  * 不再需要在清单文件的第一行指定图像数量\n  * 为 `deepcl_unittests` 添加了 
`gpuindex=` 选项（目前还处于测试阶段...）\n* 2016年1月4日：\n  * 修复了 Mac 上 OpenCL 构建和 C++ 构建中的多个编译警告\n* 2016年1月3日：\n  * 在 Travis 上创建了 Mac OS X 构建，并修复了构建问题，https:\u002F\u002Ftravis-ci.org\u002Fhughperkins\u002FDeepCL\n* 11月27日：\n  * 添加了 [ELU](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.07289v1.pdf)\n* 10月第26周：\n  * 创建了分支 `clblas-2.8.0`，该分支与 Visual Studio 2015 兼容。它使用最新的 clBLAS 2.8.x 版本。感谢 jakakonda 的帮助，使该分支得以成功运行。\n* 8月28日：\n  * 将 8.x 分支合并到主分支，即将发布 8.x 的第一个版本\n  * 现在可以在 Windows 上通过二进制文件安装 8.x，例如在 32 位 Windows 7 上，假设你已经激活了合适的 Python 环境（假设已安装 7-Zip，默认位置，否则需手动解压）：\n```\npowershell Set-ExecutionPolicy unrestricted\nrem following command is like `wget` in linux:\npowershell.exe -Command (new-object System.Net.WebClient).DownloadFile('http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002Fdeepcl-win32-v8.0.0rc8.zip', 'deepcl-win32-v8.0.0rc8.zip')\nrem following command is like `tar -xf` in linux:\n\"c:\\program files\\7-Zip\\7z.exe\" x deepcl-win32-v8.0.0rc8.zip\ncall dist\\bin\\activate.bat\npip install --pre DeepCL\npython -c \"import PyDeepCL; cl = PyDeepCL.DeepCL()\"\n# (last line is just to check works ok)\n```\n* 8月26日：现在也可以在 Linux 上通过二进制文件安装 8.x，例如在 64 位 Ubuntu 14.04 上：\n```\nmkdir 8.0.0rc4\ncd 8.0.0rc4\nwget http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002Fdeepcl-linux64-v8.0.0rc4.tar.bz2\ntar -xf deepcl-linux64-v8.0.0rc4.tar.bz2\nvirtualenv env\nsource env\u002Fbin\u002Factivate\nsource dist\u002Fbin\u002Factivate.sh\npip install --pre DeepCL\npython -c \"import PyDeepCL; cl = PyDeepCL.DeepCL()\"\n```\n（最后一行只是为了检查是否正常工作）\n\n* 8月21日至24日：\n  * 8.x 终于在所有 CI 测试配置上重新成功构建！\n    * ubuntu 14.04 32 位 Python 2.7\n    * ubuntu 14.04 32 位 Python 3.4\n    * ubuntu 14.04 64 位 Python 2.7\n    * ubuntu 14.04 64 位 Python 3.4\n    * visual studio 2010 32 位 python 2.7\n    * visual studio 2010 32 位 python 3.4\n    * visual studio 2010 64 位 python 2.7\n    * visual studio 2010 64 位 python 3.4\n* 8月19日至20日：\n  * Python 包装现在基于标准的原生 DeepCL 构建，使用非常简单的 setup.py 层进行构建\n* 8月18日：\n  * 添加了 
BackwardIm2Col 层，该层在反向传播时使用 im2col\n  * 添加了 BackpropWeightsIm2Col 层，该层在更新权重时使用 im2col\n  * 添加了 BackwardAuto 层，该层会自动选择最快的反向传播层\n  * 添加了 BackpropWeightsAuto 层，该层会自动选择更快的权重更新层\n  * 在底层：\n    * 创建了 ClBlasHelper，用于处理 Gemm 和 Gemv\n    * 将 im2col 抽象为 Im2Col 类\n* 截止到8月17日的一周内：\n  * 添加了前向和反向 im2col 层\n  * 前向 im2col 会在适当的时候自动用于前向传播\n  * 反向部分尚未集成\n  * 在底层：\n    * 添加了 clBLAS\n    * 将 Python 构建流程迁移到使用 cmake，而不是 setup.py（这样做是好是坏目前还不太确定）\n* 6月22日：\n  * 移除了 Lua 包装\n  * 如果你想在 OpenCL 中使用 Lua，请考虑使用 [cltorch](http:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fcltorch) 和 [clnn](http:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fclnn)\n\n如何联系我\n=================\n\n只需在 GitHub 页面右上角创建一个问题即可。不用担心你觉得这个问题是否荒谬或其他。反馈越多越好！\n\n请注意，我目前 100% 的精力都集中在 [cuda-on-cl](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fcuda-on-cl) 上，因此在此期间请耐心等待。","# DeepCL 快速上手指南\n\nDeepCL 是一个基于 OpenCL 的开源库，用于训练深度卷积神经网络。它支持 C++、Python 和命令行接口，能够利用 GPU 加速深度学习任务的训练过程。\n\n## 环境准备\n\n### 系统要求\n*   **操作系统**: Windows (推荐 VS2015), Linux (Ubuntu 14.04+), macOS。\n*   **硬件**: 必须配备支持 **OpenCL** 的 GPU 或 APU（目前主要针对 GPU 优化，CPU 运行效率较低）。\n*   **驱动**: 安装对应显卡厂商提供的最新 OpenCL 驱动程序。\n    *   *验证方法*: 在终端运行 `clinfo` (Linux) 或后续安装步骤中的 `gpuinfo` 工具，确认能检测到至少一个 OpenCL 设备。\n\n### 前置依赖\n*   **Windows**: Visual Studio 2015 (用于编译或运行某些组件)，7-Zip (用于解压)。\n*   **Linux**: build-essential, cmake, python-dev, pip。\n*   **Python**: Python 2.7 或 3.4+ (推荐 3.5+)，需安装 `numpy`。\n\n> **注意**: 原文未提供中国镜像源。国内用户若下载二进制包速度慢，可尝试使用代理或寻找第三方镜像。安装 Python 包时，建议临时切换至国内 pip 源（如清华源）：\n> `pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple --pre DeepCL`\n\n## 安装步骤\n\nDeepCL 提供预编译的二进制包，推荐优先使用此方式安装。仅当需要修改源码或二进制包不兼容时才选择从源码构建。\n\n### 方案一：使用预编译二进制包（推荐）\n\n#### 1. Windows 安装\n1.  从官网下载最新二进制压缩包 (例如 `deepcl-win32-v8.0.0rc8.zip`)：\n    http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002F\n2.  解压文件，生成 `dist` 文件夹。\n3.  
打开命令提示符 (cmd)，激活环境并测试：\n    ```cmd\n    cd \u003C解压后的路径>\n    call dist\\bin\\activate.bat\n    deepcl_unittests\n    ```\n    *(可选)* 指定特定 GPU 进行测试：`deepcl_unittests gpuindex=1`\n\n#### 2. Linux 安装\n1.  从官网下载最新 tar 包 (例如 `deepcl-linux64-v8.0.0rc8.tar.bz2`)。\n2.  解压并激活环境：\n    ```bash\n    tar -xf deepcl-linux64-v8.0.0rc8.tar.bz2\n    cd dist\n    source bin\u002Factivate.sh\n    deepcl_unittests\n    ```\n    *(可选)* 指定特定 GPU 进行测试：`deepcl_unittests gpuindex=1`\n\n### 方案二：安装 Python 封装器\n\n在完成上述“原生库安装”并激活环境后，执行以下命令安装 Python 接口：\n\n```bash\npip install --pre DeepCL\n```\n\n**验证安装：**\n运行以下 Python 命令，若无报错则安装成功：\n```python\npython -c \"import PyDeepCL; cl = PyDeepCL.DeepCL()\"\n```\n\n### 方案三：从源码构建\n如果预编译包无法运行，请参考官方文档 [doc\u002FBuild.md](doc\u002FBuild.md) 进行源码编译。\n\n## 基本使用\n\nDeepCL 支持命令行直接训练、C++ API 和 Python API。以下是最常用的两种方式。\n\n### 1. 命令行使用 (Command Line)\n\n无需编写代码，通过定义网络结构和参数即可开始训练。\n\n**示例：在 MNIST 数据集上训练**\n该命令定义了一个包含卷积、池化、全连接层的网络，训练 20 个 epoch。\n\n```bash\n.\u002Fdeepcl_train dataset=mnist netdef=\"rt2-8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n\" numepochs=20 multinet=6 learningrate=0.002\n```\n\n**示例：围棋下一步预测任务 (kgsgo 数据集)**\n```bash\n.\u002Fdeepcl_train dataset=kgsgoall netdef=\"12*(32c5z-relu)-500n-tanh-361n\" numepochs=15 learningrate=0.0001\n```\n\n*参数说明:*\n*   `dataset`: 数据集类型 (支持 mnist, jpegs, kgsv2 等)。\n*   `netdef`: 网络结构定义字符串 (如 `8c5z` 表示 8 个 5x5 卷积核，`relu` 为激活函数，`mp2` 为 2x2 最大池化)。\n*   `numepochs`: 训练轮数。\n*   `learningrate`: 学习率。\n\n### 2. 
Python API 使用\n\n在 Python 脚本中调用 DeepCL 进行更灵活的控制。\n\n```python\nimport PyDeepCL\n\n# 初始化 DeepCL 上下文\ncl = PyDeepCL.DeepCL()\n\n# 获取可用的 GPU 设备数量\nnum_devices = cl.getNumDevices()\nprint(\"Found {} OpenCL devices\".format(num_devices))\n\n# 后续可结合 numpy 数组进行数据加载、网络定义和训练\n# 具体高级用法请参考 python\u002FREADME.md\n```\n\n### 支持的核心功能概览\n*   **层类型**: 卷积 (convolutional), 最大池化 (max-pooling), 归一化，Dropout, 随机平移\u002F裁剪等。\n*   **损失函数**: Softmax, 交叉熵 (cross-entropy), 平方损失 (square loss)。\n*   **优化器**: SGD, Nesterov, Adagrad, Rmsprop, Adadelta。\n*   **激活函数**: tanh, relu, sigmoid, elu, linear 等。","某计算机视觉初创团队需要在资源有限的云服务器上，快速训练一个用于工业零件缺陷检测的深度卷积神经网络。\n\n### 没有 DeepCL 时\n- **硬件依赖受限**：团队被迫绑定昂贵的 NVIDIA CUDA 生态，无法利用服务器上现有的 AMD GPU 或集成显卡资源，导致算力闲置。\n- **训练周期漫长**：在处理高分辨率零件图像时，单轮迭代耗时过长，调整网络结构（如卷积层或池化层）后需等待数天才能验证效果。\n- **开发灵活性差**：缺乏原生的 OpenCL 支持，难以通过命令行快速并行启动多个实验（Multinet），严重拖慢算法调优进度。\n- **环境配置复杂**：在不同操作系统间迁移项目时，常因驱动和底层库不兼容导致构建失败，耗费大量运维精力。\n\n### 使用 DeepCL 后\n- **硬件兼容自由**：借助 OpenCL 跨平台特性，团队成功激活了所有可用的 GPU 和 APU 设备，包括非 NVIDIA 芯片，最大化利用了云端算力。\n- **训练效率倍增**：利用 DeepCL 优化的卷积与池化算子，结合 Adagrad 等先进优化器，将单次实验的收敛时间从数天缩短至数小时。\n- **并行实验便捷**：通过命令行接口轻松实现“多列网络”并行训练，一次性对比多种激活函数（如 ReLU、ELU）和架构组合，加速模型选型。\n- **部署流程简化**：无论是 Linux 还是 Windows，只需简单激活脚本即可运行，配合 Python 封装接口，让算法工程师能专注于模型逻辑而非环境调试。\n\nDeepCL 通过打破硬件厂商锁定并提供高效的 OpenCL 加速，让中小团队在低成本设备上也能高效完成深度卷积网络的训练与迭代。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhughperkins_DeepCL_9bcd3782.png","hughperkins","Hugh 
Perkins","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fhughperkins_02f50c4a.png",null,"https:\u002F\u002Fcn.linkedin.com\u002Fin\u002Fhughperkins","https:\u002F\u002Fgithub.com\u002Fhughperkins",[82,86,90,93,97,101,105],{"name":83,"color":84,"percentage":85},"C++","#f34b7d",81.7,{"name":87,"color":88,"percentage":89},"C","#555555",7.4,{"name":91,"color":92,"percentage":89},"Python","#3572A5",{"name":94,"color":95,"percentage":96},"CMake","#DA3434",1.9,{"name":98,"color":99,"percentage":100},"JavaScript","#f1e05a",0.9,{"name":102,"color":103,"percentage":104},"Batchfile","#C1F12E",0.4,{"name":106,"color":107,"percentage":104},"Shell","#89e051",880,199,"2026-03-09T12:55:57","MPL-2.0",4,"Windows, Linux, macOS","必需 OpenCL 兼容的 GPU 或 APU（支持 NVIDIA, AMD\u002FATI, Intel 等），未指定具体型号或显存大小；不支持纯 CPU 运行（无优化）","未说明",{"notes":117,"python":118,"dependencies":119},"1. 必须安装对应显卡的 OpenCL 驱动程序并正确配置 ICD（Windows 注册表或 Linux \u002Fetc\u002FOpenCL\u002Fvendors）。2. 二进制包安装后需在每次新终端会话中运行激活脚本（Windows: activate.bat, Linux\u002Fmac: activate.sh）以设置环境变量。3. Python 接口要求使用 numpy 张量，不再支持 array.array。4. macOS 用户可通过环境变量 CL_GPUOFFSET 选择特定 GPU。5. 
该项目主要基于 OpenCL，而非 CUDA。","2.7, 3.4, 3.5+ (Windows 默认 3.5+, Linux\u002Fmac 支持 2.7\u002F3.4+)",[120,121,122,123,124,125],"OpenCL Driver","EasyCL","clBLAS (可选\u002F部分版本)","numpy (Python 接口必需)","libpng++","clew",[14,13],"2026-03-27T02:49:30.150509","2026-04-06T11:31:09.440225",[130,135,140,145,150,155],{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},14687,"如何在 Mac OSX 上安装 DeepCL？","DeepCL 现已支持在 Mac 上构建。建议用户使用 v2.4 标签版本进行安装。项目已配置 Travis CI 以确保持续构建在 Mac 上的兼容性。如果遇到问题，请检查是否使用了正确的版本标签。","https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Fissues\u002F32",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},14688,"如何避免每次预测时重新初始化 GPU 和网络以减少内存占用？","目前官方未直接提供保持 GPU 初始化的持久化模式。用户提出的变通方案是将预测程序作为“服务器”运行，监听特定文件夹中的 manifest 文件，收到文件后执行预测并输出结果，从而避免重复启动进程导致的重初始化。此外，对于网络规模 netdef=4*(60c5z-relu-mp2)-150n-150n-2n (输入 96x96x3)，约 2GB RAM + 1GB 显存的占用属于正常范围。","https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Fissues\u002F85",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},14689,"使用 Adadelta 训练器时出现 Loss 为 NaN 或权重爆炸怎么办？","该问题通常由两个原因导致：1. 对小于 FLT_MIN 的数字取了对数，修复方法是防止对 x\u003C=0 的值进行 log 运算；2. 
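A learning rate that is too high, which makes the weights and biases grow until they produce NaN; the fix is to lower the learning rate or increase weight decay (for example, set weight decay to 0.1). The related fix has been merged in PR #120.","https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Fissues\u002F87",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},14690,"How can I record the run time of each kernel during MNIST training, for optimization?","This can be done by merging the community-contributed kernel timing feature. Specifically, extract the kernel files and timer code from the relevant branch (such as fsword73-kernel1) and integrate them into your local version. This helps identify the most time-consuming kernels (for example, kernels that account for more than 50% of the workload) so that they can be optimized (for example, the filter operations in forward propagation); in practice this gave a speedup of about 53%.","https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Fissues\u002F66",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},14691,"How do I resolve the 'Out of resources, code -5' error when processing larger datasets (such as 14k*50*50)?","This error is usually related to insufficient GPU memory, and it can be triggered even when monitoring shows only a small increase in GPU memory usage. Switching to a graphics card with a different architecture (for example, from an R5 230 to another card) or running in a different Python environment (such as Intel Python 2.7 or Anaconda) may resolve it. Make sure you are using the latest versions of DeepCL and PyDeepCL, and check that the driver is the latest one provided by system updates.","https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Fissues\u002F83",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},14692,"How can I get multi-label output for a single image, or locate objects in a grid image?","DeepCL natively supports mainly single-label classification. For multi-label or multi-object detection (such as recognizing several characters in an image or specific objects in a grid), the currently recommended approach is to preprocess the image: split it into individual characters or grid cells, feed each one to the network for prediction, and then post-process the results. Although a bounding-box regression approach similar to face detection was discussed by the maintainer, the user in that issue ultimately said the feature was no longer needed, which suggests that at the time it had to be implemented with external image-processing logic.","https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Fissues\u002F93",[161,166,171,176,181,186,191,196,201,206,211,216,221,226,231,236,241,246,251,256],{"id":162,"version":163,"summary_zh":164,"released_at":165},81610,"v8.3.1","Compared with v8.1.3:\n- Added the ScaledTanh activation function in Python\n- Added the ELU activation function\n- Fixed some build warnings on Mac\n- Various fixes and optimizations (for details, see: https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\u002Fcommits\u002Fv8.3.1)\n\nNative library downloads for Linux and Windows:\n\n```\nhttp:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002F\nFor usage instructions, see the README: https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\n```\n\nThe Python package is published on PyPI: https:\u002F\u002Fpypi.python.org\u002Fpypi\u002FDeepCL\u002F8.3.1","2016-03-27T20:06:38",{"id":167,"version":168,"summary_zh":169,"released_at":170},81611,"v8.1.3","Code changes:\n- Removed the norbloader unit test, which kept failing (because it requires downloading the norb 
dataset).\n\nRelease changes:\n- Edward Geist has very generously created a Mac OS X Homebrew formula for this release, DeepCL v8.1.3, which you can find at: https:\u002F\u002Fgist.github.com\u002FGOFAI\u002F53a25ff22f31ec144608\n\nNative library downloads for Linux and Windows:\n- http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002F\n- For usage instructions, see the README: https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\n","2015-11-22T03:30:57",{"id":172,"version":173,"summary_zh":174,"released_at":175},81612,"v8.1.2","Mostly bug fixes (compared with v8.0.0):\n- Use `int64_t`, so that 64-bit integers are 64 bits on all platforms\n- Added guards to some kernels to prevent crashes for certain geometries\n- The Python build now works on OS X\n- Fixed some OpenCL warnings and some test failures\n- The Python build now also works when the OpenCL header files are not present\n\nThere is also one small new feature:\n- In Python, you can call the `getLabels` method on a SoftMax layer\n\nNative library downloads (for Linux and Windows; on Mac you still need to build it yourself for now):\n- http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002F\n- For usage instructions, see the README: https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL","2015-11-21T04:46:44",{"id":177,"version":178,"summary_zh":179,"released_at":180},81613,"v8.0.0","(These notes are a bit of an afterthought, but better late than never :-) Actually, I just copied them straight from the 8.0.0rc9 release notes...)\n\nNative library downloads:\n- http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002F\n- For usage instructions, see the README: https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\n\nPython wrappers:\n- First, activate the native library environment as described in the native library README above.\n- Then run `pip install --pre DeepCL`.\n- Finally, run `python -c \"import PyDeepCL; cl = PyDeepCL.DeepCL()\"` to verify that it works.\n\nChanges since 5.x:\n- Added clblas support.\n- Added im2col.\n- The Python wrappers are now just a thin layer over the standard native library; a separate native library is no longer built for Python.\n- Removed the Lua wrappers (if you want to train convolutional neural networks with Lua, use https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fcltorch and https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fclnn).\n- The installation process has changed, so be sure to follow the instructions in the README.\n- The command-line tools are now named deepcl_train and deepcl_predict, replacing the earlier deepclrun, deepclexec, and so on.\n\n(Note that 6.x had only a single release, immediately followed by a complete overhaul for clblas and the build system, which led to 8.x. I skipped 7.x by accident :-P)","2015-11-21T04:52:35",{"id":182,"version":183,"summary_zh":184,"released_at":185},81614,"v8.0.0rc9","This is probably the last rc release of 8.0.0 before the first non-rc release.\n\nNative library downloads:\n- http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002F\n- 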
For usage instructions, see the README: https:\u002F\u002Fgithub.com\u002Fhughperkins\u002FDeepCL\n\nPython wrappers:\n- First, activate the native library as described in the native library README above.\n- Then run `pip install --pre DeepCL`.\n- Finally, run `python -c \"import PyDeepCL; cl = PyDeepCL.DeepCL()\"` to check that everything works.\n\nChanges since 5.x:\n- Added clblas;\n- Added im2col;\n- The Python wrappers are now just a thin layer over the standard native library; a separate native library is no longer built for Python;\n- Removed the Lua wrappers (if you want to train convolutional neural networks with Lua, use https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fcltorch and https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fclnn);\n- The installation process has changed, so be sure to follow the instructions in the README;\n- The command-line tools are now named `deepcl_train` and `deepcl_predict`, replacing the earlier `deepclrun`, `deepclexec`, and so on.\n\n(Note that 6.x had only a single release, immediately followed by a complete overhaul for clblas and the build system, which led to 8.x. I skipped 7.x by accident :-P )\n","2015-08-28T00:19:45",{"id":187,"version":188,"summary_zh":189,"released_at":190},81615,"v6.0.0","This release does not add many new features; it mainly deprecates a few old ones and hopefully fixes the build problems that 5.8 had with Python.\n\nChanges in this release:\n- Removed the Lua wrappers. Lua is now supported through [cltorch](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fcltorch) and [clnn](https:\u002F\u002Fgithub.com\u002Fhughperkins\u002Fclnn). Both are still under development but have already made considerable progress.\n- The `deepclrun` and `train` commands are deprecated in favor of `deepcl_train`.\n- The `predict` command is renamed to `deepcl_predict`.\n- Under the hood, easycl has been upgraded to use Lua kernel templates.\n\nDownload the binaries from http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002F. Python users can install from [pypi](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002FDeepCL\u002F6.0.0).","2015-06-25T13:57:04",{"id":192,"version":193,"summary_zh":194,"released_at":195},81616,"v5.10.2","_New features:_\n- `predict` has a new command-line option, `outputlayer=`, to choose which layer's output is written (that is, not necessarily the last layer).\n\n_Bug fixes:_\n- When `predict` writes text output, each example now gets its own line, instead of all examples in a batch being merged onto one line.\n- `predict` no longer skips every other input example.\n- `gpuinfo` now runs correctly on HD5500 integrated graphics without causing display problems.\n\nCommand-line tools and C++ library: http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\nPython wrappers: https:\u002F\u002Fpypi.python.org\u002Fpypi\u002FDeepCL\u002F5.10.2\n","2015-05-31T06:28:06",{"id":197,"version":198,"summary_zh":199,"released_at":200},81617,"v5.9.0","_New features:_\n- `deepclrun` is renamed to `train`: it handles training and validation with labeled data\n- `deepclexec` is renamed to `predict`: it generates predictions from unlabeled data\n- `predict` can read input from files via 
GenericLoader, supporting the same formats as training, or from standard input.\n- `predict` can write output to a file or send it to standard output, in either text or binary format.\n\nDownload the command-line version and the library from http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads.\nThe Python wrappers are published on PyPI, at https:\u002F\u002Fpypi.python.org\u002Fpypi\u002FDeepCL\u002F5.9.0 and https:\u002F\u002Fpypi.python.org\u002Fpypi\u002FDeepCL\u002Fv5.9.0 (apparently I need to fix the version number, so that Linux and Windows can both install the same PyPI version).\n","2015-05-30T15:54:22",{"id":202,"version":203,"summary_zh":204,"released_at":205},81618,"v5.8.3","_Bug fixes:_\n- Cleaned up some memory leaks in the unit tests\n\nDownload from http:\u002F\u002Fdeepcl.hughperkins.com\u002FDownloads\u002F\n","2015-05-28T15:24:06",{"id":207,"version":208,"summary_zh":209,"released_at":210},81619,"v5.8.2","_Bug fixes:_\n- Fixed memory leaks in several clMath objects, caused by kernels not being deleted correctly.\n","2015-05-28T15:13:13",{"id":212,"version":213,"summary_zh":214,"released_at":215},81620,"v5.8.1","_Bug fixes:_\n- fixed a memory leak in the random number generator module, which meant that the amount of memory used was excessive, and that using random translations, random patches, or dropout used up all available memory within seconds\n- fixed a bug in the forward1 propagate kernel, which meant that it crashed for certain geometries\n","2015-05-28T14:40:22",{"id":217,"version":218,"summary_zh":219,"released_at":220},81621,"v5.8.0","### New:\n- Josef Moudrik has started work on `deepclexec`, to run prediction, on pre-trained weights, for new data samples\n- Added jpeg loader, so can load imagenet data now\n\n### Changes\n- Added dependency on libjpeg-turbo, for the jpeg loader.  
This can be turned off in cmake options, if you want\n\n### Changes under the covers\n- factorized applying bias into separate class for forward3 (it was already in a separate OpenCLKernel), and away from the convolutional forward opencl in forward4, and forward1\n- fixed a bug in forward4, where images larger than the square root of the maximum gpu workgroupsize sometimes had incorrect values for the last few pixels\n- migrated to latest version of EasyCL, which handles storing the device dirty flag for us, rather than having lots of flags in our code like `weightsCopiedToHost`, and so on\n- GenericLoaderv2 is now stateful, rather than using static methods as per original GenericLoader\n  - new NetLearnerOnDemandv2 uses GenericLoaderv2, as does new OnDemandBatcherv2\n  - deepclrun migrated to use GenericLoaderv2\n  - GenericLoaderv1Wrapper wraps existing GenericLoader implementations, so no need to re-write those in any way for now, and any new GenericLoader implementations can continue to be v1, via the wrapper, if they don't need state\n  - making GenericLoaderv2 stateful means we can read a jpeg manifest, eg for imagenet et al, once, and then hold it in memory\n","2015-05-17T15:05:20",{"id":222,"version":223,"summary_zh":224,"released_at":225},81622,"v5.7.0","### New:\n- added Adadelta\n  - in commandline, 'trainer=adadelta rho=0.9'\n  - in C++, `trainer = new Adadelta( cl, 0.9f );`\n  - In Python and Lua, as in C++\n","2015-05-04T10:35:48",{"id":227,"version":228,"summary_zh":229,"released_at":230},81623,"v5.6.0","### New\n- added WeightInitializer abstraction, so can customize how weights are initialized\n- two implementations:\n  - OriginalInitializer: default, corresponds to initialization method up to now\n  - UniformInitializer: samples uniformly, from a range parameterized by passed in `initialWeights` parameter\n","2015-05-03T11:05:24",{"id":232,"version":233,"summary_zh":234,"released_at":235},81624,"v5.5.0","### New\n- added Rmsprop\n  - available in 
commandline, using `trainer=rmsprop`\n  - available in c++, using `Rmsprop *trainer = new Rmsprop( cl );`, and using it to do training (`trainer->train(...)` etc)\n  - available in lua wrappers, by creating an Rmsprop object, and using for training\n  - available in python wrappers, by creating an Rmsprop object, and using for training\n\nNote: builds ok on Windows.\n","2015-05-03T08:48:57",{"id":237,"version":238,"summary_zh":239,"released_at":240},81625,"v5.4.0","### New:\n- added Adagrad\n  - in commandline, use `trainer=adagrad`\n  - In C++, use `Adagrad *adagrad = new Adagrad( cl ); adagrad->setLearningRate( 0.002f );`\n  - In Python and Lua, create an Adagrad instance, as per C++\n\n### Under the hood\n- a bunch of functions added to CLMathWrapper, like: per element inverse, per element add scalar, per element multiply, per element squared, per element square root\n- simplified SGD etc, by moving most stuff from bindState into the Trainer base class\n","2015-05-03T08:11:45",{"id":242,"version":243,"summary_zh":244,"released_at":245},81626,"v5.3.0","### New\n- added Nesterov trainer to python wrappers\n- added Nesterov trainer to lua wrappers\n","2015-05-03T03:36:02",{"id":247,"version":248,"summary_zh":249,"released_at":250},81627,"v5.2.0","### New:\n- added Nesterov trainer\n  - available from C++ api, and from commandline\n","2015-05-03T03:29:03",{"id":252,"version":253,"summary_zh":254,"released_at":255},81628,"v5.1.0","Under the hood:\n- added new CLMathWrapper class, to make per-element gpu array arithmetic easy\n  - migrated SGD and Annealer to use CLMathWrapper\n\nBug fixes:\n- fixed crash in unittests, caused by attempting to reuse kernels across multiple EasyCL instances\n- added appropriate exports, so builds on Windows again now\n","2015-05-03T01:37:58",{"id":257,"version":258,"summary_zh":259,"released_at":260},81629,"v5.0.0","### New:\n- Added Annealer trainer, and 'anneal=' commandline option\n  - python and lua wrappers can also create Annealer 
trainer, as well as the existing SGD trainer\n\n### Changes:\n- bunch of changes to many of the classes now in `src\u002Fbatch` directory, ie xxxBatcher classes, xxxNetAction type classes, and NetLearnerxxx classes, hence bumped major version to `5`, eg\n  - XXXBatcher.tick: added parameter 'epoch' to 'tick' and 'run' methods\n  - OnDemandBatcher constructor takes in a Batcher\\* object\n  - created new Batcher2, and NetAction2 classes\n  - removed BatchLearner class\n\n### Bug fixes\n- bunch of bugfixes to the python and lua wrappers\n","2015-05-02T14:06:30"]