[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-uTensor--uTensor":3,"tool-uTensor--uTensor":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":67,"owner_name":67,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":78,"owner_twitter":77,"owner_website":79,"owner_url":80,"languages":81,"stars":101,"forks":102,"last_commit_at":103,"license":104,"difficulty_score":105,"env_os":106,"env_gpu":106,"env_ram":106,"env_deps":107,"category_tags":116,"github_topics":117,"view_count":105,"oss_zip_url":77,"oss_zip_packed_at":77,"status":16,"created_at":128,"updated_at":129,"faqs":130,"releases":160},632,"uTensor\u002FuTensor","uTensor","TinyML AI inference library","uTensor 是一款专为资源受限设备打造的轻量级机器学习推理框架。它基于 TensorFlow 构建并针对 Arm 架构优化，核心运行时库体积仅约 2KB，完美契合嵌入式场景。面对传统 AI 模型难以在单片机或低功耗设备上运行的痛点，uTensor 提供了一套高效的解决方案。\n\n工作流程十分友好：开发者在 PC 端完成 TensorFlow 模型训练后，利用 uTensor 的离线工具即可生成可直接嵌入的 C++ 代码，实现“复制粘贴”式的部署。其技术亮点在于系统安全性与可调试性——uTensor 能在编译阶段精确锁定内存占用，杜绝运行时堆冲突，同时提供高层级接口屏蔽底层指针操作的复杂性，既保证了速度又降低了出错风险。\n\n它特别适合物联网工程师、嵌入式开发人员以及对 TinyML 感兴趣的研究者，帮助他们在有限的硬件资源上轻松运行智能算法。","# uTensor - Test Release\n[![CircleCI](https:\u002F\u002Fcircleci.com\u002Fgh\u002FuTensor\u002FuTensor.svg?style=svg)](https:\u002F\u002Fcircleci.com\u002Fgh\u002FuTensor\u002FuTensor)\nNote: If you are looking for stable releases, checkout master.\n\n## Tutorials\n\n### Building Tutorial Examples\n\nMake sure `cmake` is available on your system 
and run the following commands:\n\n```bash\n$ mkdir build\n$ cd build\n$ cmake -DPACKAGE_TUTORIALS=ON ..\n$ make\n```\n\nAfter the build process finishes, you should find the tutorial executables under the `build\u002Ftutorials\u002F` directory.\n\nFollow the instructions in the `README.md` in each tutorial directory to learn how to use `uTensor`.\n\nHere are the links to the tutorials:\n\n1. [Error Handling with uTensor](tutorials\u002Ferror_handling)\n2. [Custom Operator](tutorials\u002Fcustom_operator)\n\n## Introduction\n\n### What is it?\nuTensor is an extremely lightweight machine learning inference framework built on TensorFlow and optimized for Arm targets. It consists of a runtime library and an offline tool that handles most of the model translation work. This repo holds the core runtime and some example implementations of operators, memory managers\u002Fschedulers, and more, and the size of the core runtime is only ~2KB!\n\n| Module                       |         .text |       .data |        .bss |\n|------------------------------|---------------|-------------|-------------|\n| uTensor\u002Fsrc\u002FuTensor\u002Fcore     |   1275(+1275) |       4(+4) |     28(+28) |\n| uTensor\u002Fsrc\u002FuTensor\u002Ftensors  |     791(+791) |       0(+0) |       0(+0) |\n\n\n### How does the uTensor workflow work?\n\u003Cdiv>\u003Cimg src=docs\u002Fimg\u002FuTensorFlow.jpg width=600 align=center\u002F>\u003C\u002Fdiv>\n\nA model is constructed and trained in TensorFlow. uTensor takes the model and produces a .cpp and a .hpp file. These files contain the generated C++11 code needed for inference. 
Working with uTensor on the embedded side is as easy as copy-and-paste.\n\n### How does the uTensor runtime work?\n[Check out the detailed description here](src\u002FuTensor\u002FREADME.md)\n\n\n## Release Note\nThe rearchitecture is fundamentally centered around a few key ideas, and the structure of the code base and build tools naturally followed.\nOld key points:\n- Tensors describe how data is accessed and where from\n  - Performance of ops depends on which tensors are used\n- Operators are Tensor agnostic\n  - High performance ops can fetch blocks of data at once\n- Strive for low total power in execution\n- Low static and dynamic footprint, be small\n  - Low cost per Tensor throughout the entire system, since most generated models have 100+ including intermediates, also impacts dynamic footprint\n  - Lightweight class hierarchy\n  - Duh\n\nNew additional key ideas:\n- System safety\n  - All tensor metadata and actual data are owned in dedicated regions\n    - This can either be user provided, or one we create\n  - We can guarantee that runtime will use no more than N bytes of RAM at code gen time or at compile time!\n  - Generally should not collide with userspace or system space memory, i.e. 
don't share heaps\n- General implication: a safe runtime means we can safely update models remotely\n- As many compile-time errors as possible!\n    - Mismatched inputs, outputs, or numbers\n    - Wrong sizes used\n    - Impossible memory accesses\n    - etc.\n- Clear, Concise, and Debuggable\n  - The previous iteration of uTensor relied too heavily on codegen; making changes to a model for any reason was nearly impossible\n  - A developer should be able to make changes to the model without relying on code gen\n  - A developer should be able to look at a model file and immediately understand what the graph looks like, without a massive amount of jumping around\n  - The default tensor interface should behave like a higher-level language, but exploit the speed of C++\n    - Generally: No more pointer bullshit! C is super error prone, fight me\n      - Only specialized operators have access to raw data blocks, and these ops will be wicked fast\n  - Extensible, configurable, and optimize-outable error handling\n  - GDB debugging IS NOW TRIVIAL\n\nAs mentioned before, these key ideas need to be reflected not only in the code, but in the code structure, in such a way that it is Maintainable, Hackable, and User-extensible. Pretty much everything in the uTensor runtime can be divided into two components: the core, and everything else. The core library contains all the deep low-level functionality needed for the runtime to make the above guarantees, as well as the interfaces required for concrete implementations. Furthermore, the overhead of this core engine should be negligible relative to the system operation. Everything not in the core library really should just be thought of as reasonable defaults. For example, tensor implementations, default operators, example memory allocators, or even possible logging systems and error handlers. 
These modules should be the primary area for future optimization, especially before model deployment.\n\n## High level API\n\n```c++\nusing namespace uTensor;\n\nconst uint8_t s_a[4] = {1, 2, 3, 4};\nconst uint8_t s_b[4] = {5, 6, 7, 8};\nconst uint8_t s_c_ref[4] = {19, 22, 43, 50};\n\n\u002F\u002F These can also be embedded in models\n\u002F\u002F Recommended: avoid putting these directly on the heap or stack, as they can be large\nlocalCircularArenaAllocator\u003C256> meta_allocator; \u002F\u002F All tensor metadata gets stored here automatically, even when new is called\nlocalCircularArenaAllocator\u003C256> ram_allocator;  \u002F\u002F All temporary storage gets allocated here\n\nvoid foo() {\n  \u002F\u002F Tell the uTensor context which allocators to use\n  Context::get_default_context()->set_metadata_allocator(&meta_allocator);\n  Context::get_default_context()->set_ram_data_allocator(&ram_allocator);\n\n  \u002F\u002F Tensors are simply handles for accessing data as necessary; they are no larger than a pointer\n  \u002F\u002F RomTensor(TensorShape, data_type, data*);\n  Tensor a = new \u002F*const*\u002F RomTensor({2, 2}, u8, s_a);\n  Tensor b = new \u002F*const*\u002F RomTensor({2, 2}, u8, s_b);\n  Tensor c_ref = new RomTensor({2, 2}, u8, s_c_ref);\n  \u002F\u002F RamTensors are held internally and can be moved or cleared depending on the memory schedule (optional)\n  Tensor c = new RamTensor({2, 2}, u8);\n\n  \u002F\u002F Operators take in a fixed-size map of (input_name -> parameter), which gives compile-time errors on mismatched inputs\n  \u002F\u002F Also, the name binding + lack of parameter ordering makes ctags jumping and GDB sessions significantly more intuitive\n  MatrixMultOperator\u003Cuint8_t> mult_AB;\n  mult_AB\n      .set_inputs({{MatrixMultOperator\u003Cuint8_t>::a, a}, {MatrixMultOperator\u003Cuint8_t>::b, b}})\n      .set_outputs({{MatrixMultOperator\u003Cuint8_t>::c, c}})\n      .eval();\n\n  \u002F\u002F Compare results\n  TensorShape& c_shape = 
c->get_shape();\n  for (int i = 0; i \u003C c_shape[0]; i++) {\n    for (int j = 0; j \u003C c_shape[1]; j++) {\n      \u002F\u002F Just need to cast the access to the expected type\n      if( static_cast\u003Cuint8_t>(c(i, j)) != static_cast\u003Cuint8_t>(c_ref(i, j)) ) {\n        printf(\"Oh crap!\\n\");\n        exit(-1);\n      }\n    }\n  }\n}\n```\n\n## Building and testing locally\n\n```bash\ngit clone git@github.com:uTensor\u002FuTensor.git\ncd uTensor\u002F\ngit checkout proposal\u002Frearch\ngit submodule init\ngit submodule update\nmkdir build\ncd build\u002F\ncmake -DPACKAGE_TESTS=ON -DCMAKE_BUILD_TYPE=Debug ..\nmake\nmake test\n```\n\n## Building and running on Arm Mbed OS\n\nThe uTensor core library is configured as an Mbed library out of the box, so we just need to import it into our project and build as normal.\n\n```bash\nmbed new my_project\ncd my_project\nmbed import https:\u002F\u002Fgithub.com\u002FuTensor\u002FuTensor.git\n# Create main file\n# Run uTensor-cli workflow and copy model directory here\nmbed compile # as normal\n```\n\n## Building and running on Arm systems\nTODO\nNote: CMake support for Arm is currently experimental\nhttps:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F46916611\u002Fcross-compiling-googletest-for-arm64\n\nDefault build\n```bash\nmkdir build && cd build\ncmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE=..\u002Fextern\u002FCMSIS_5\u002FCMSIS\u002FDSP\u002Fgcc.cmake ..\n```\n\nWith CMSIS optimized kernels\n```bash\nmkdir build && cd build\ncmake -DARM_PROJECT=1 -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE=..\u002Fextern\u002FCMSIS_5\u002FCMSIS\u002FDSP\u002Fgcc.cmake ..\n```\n\n## Further Reading\n- [Why Edge Computing](https:\u002F\u002Ftowardsdatascience.com\u002Fwhy-machine-learning-on-the-edge-92fac32105e6)\n- [Why the Future of Machine Learning is Tiny](https:\u002F\u002Fpetewarden.com\u002F2018\u002F06\u002F11\u002Fwhy-the-future-of-machine-learning-is-tiny\u002F)\n- 
[TensorFlow](https:\u002F\u002Fwww.tensorflow.org)\n- [Mbed](https:\u002F\u002Fdeveloper.mbed.org)\n- [Node-Viewer](https:\u002F\u002Fgithub.com\u002Fneil-tan\u002Ftf-node-viewer\u002F)\n- [How to Quantize Neural Networks with TensorFlow](https:\u002F\u002Fpetewarden.com\u002F2016\u002F05\u002F03\u002Fhow-to-quantize-neural-networks-with-tensorflow\u002F)\n- [mxnet Handwritten Digit Recognition](https:\u002F\u002Fmxnet.incubator.apache.org\u002Ftutorials\u002Fpython\u002Fmnist.html)\n","# uTensor - 测试版本发布\n[![CircleCI](https:\u002F\u002Fcircleci.com\u002Fgh\u002FuTensor\u002FuTensor.svg?style=svg)](https:\u002F\u002Fcircleci.com\u002Fgh\u002FuTensor\u002FuTensor)\n注意：如果您正在寻找稳定版本，请检出 master 分支。\n\n## 教程\n\n### 构建教程示例\n\n确保您的系统上已安装 `cmake`（构建工具）并运行以下命令：\n\n```bash\n$ mkdir build\n$ cd build\n$ cmake -DPACKAGE_TUTORIALS=ON ..\n$ make\n```\n\n构建过程完成后，您应该在 `build\u002Ftutorials\u002F` 目录下找到教程可执行文件。\n\n遵循每个教程目录中 `README.md` 的说明来学习如何使用 `uTensor`。\n\n以下是教程链接：\n\n1. [使用 uTensor 进行错误处理](tutorials\u002Ferror_handling)\n2. 
[自定义算子](tutorials\u002Fcustom_operator)\n\n## 简介\n\n### 它是什么？\nuTensor 是一个极轻量级的机器学习推理框架，基于 Tensorflow（深度学习框架）构建并针对 Arm（处理器架构）目标进行了优化。它由一个运行时库（Runtime library）和一个离线工具组成，后者处理大部分模型转换工作。此仓库包含核心运行时以及一些算子（Operators）、内存管理器\u002F调度器（Memory managers\u002Fschedulers）的示例实现，且核心运行时的大小仅为 ~2KB！\n\n| Module                       |         .text |       .data |        .bss |\n|------------------------------|---------------|-------------|-------------|\n| uTensor\u002Fsrc\u002FuTensor\u002Fcore     |   1275(+1275) |       4(+4) |     28(+28) |\n| uTensor\u002Fsrc\u002FuTensor\u002Ftensors  |     791(+791) |       0(+0) |       0(+0) |\n\n\n### uTensor 工作流程是如何运作的？\n\u003Cdiv>\u003Cimg src=docs\u002Fimg\u002FuTensorFlow.jpg width=600 align=center\u002F>\u003C\u002Fdiv>\n\n模型在 Tensorflow 中构建和训练。uTensor 接收该模型并生成 .cpp 和 .hpp 文件。这些文件包含推理所需的生成的 C++11 代码。在嵌入式侧使用 uTensor 就像复制粘贴一样简单。\n\n### uTensor 运行时是如何工作的？\n[在此处查看详细描述](src\u002FuTensor\u002FREADME.md)\n\n\n## 发布说明\n此次重构从根本上围绕几个关键理念展开，代码库结构和构建工具也随之自然演变。\n旧的关键点：\n- 张量 (Tensor) 描述数据如何访问以及来自何处\n  - 算子 (Ops) 的性能取决于使用了哪些张量\n- 算子 (Operators) 是独立于张量的\n  - 高性能算子可以一次性获取数据块\n- 追求执行过程中的低总功耗\n- 低静态和动态占用空间，保持小巧\n  - 整个系统中每个张量的成本低，因为大多数生成的模型包含 100+ 个（包括中间变量），这也影响动态占用空间\n  - 轻量级类层次结构\n  - 显而易见\n\n新的附加关键理念：\n- 系统安全性\n  - 所有张量元数据和实际数据都拥有在专用区域中\n    - 这可以由用户提供，也可以是我们创建的\n  - 我们可以保证在代码生成时或编译时，运行时使用的 RAM 不超过 N 字节！\n  - 通常不应与用户空间或系统空间内存冲突，即不要共享堆\n  - 一般含义：安全的运行时意味着我们可以安全地远程更新模型\n  - 尽可能多的编译时错误！\n    - 输入、输出或数量不匹配\n    - 使用了错误的尺寸\n    - 不可能的内存访问\n    - 等等\n- 清晰、简洁且可调试\n  - uTensor 的前一版本几乎过于依赖代码生成 (Codegen)，出于任何原因修改模型都几乎不可能\n  - 开发人员应该能够在不依赖代码生成的情况下修改模型\n  - 开发人员应该能够查看模型文件并立即理解图的结构，而无需大量跳转\n  - 默认张量接口应表现得像高级语言，同时利用 C++ 的速度\n    - 一般来说：别再搞指针垃圾了！C 语言极易出错，不服来战\n      - 只有专用算子可以访问原始数据块，这些算子将非常快\n  - 可扩展、可配置且可优化掉的错误处理\n  - GDB（调试器）调试现在变得极其简单\n\n如前所述，这些关键理念不仅需要在代码中体现，还需要体现在代码结构中，使其具有可维护性、可修改性和用户可扩展性。uTensor 
运行时中的几乎所有内容都可以分为两个组件：核心和其他部分。核心库包含运行时实现上述保证所需的所有底层深度功能，以及具体实现所需的接口。此外，相对于系统操作，此核心引擎的开销应该可以忽略不计。核心库之外的所有内容实际上都应被视为合理的默认值。例如，张量实现、默认算子、示例内存分配器，甚至可能的日志系统和错误处理器。这些模块应该是未来优化的主要领域，特别是在模型部署之前。\n\n## 高级 API\n\n```c++\nusing namespace uTensor;\n\nconst uint8_t s_a[4] = {1, 2, 3, 4};\nconst uint8_t s_b[4] = {5, 6, 7, 8};\nconst uint8_t s_c_ref[4] = {19, 22, 43, 50};\n\n\u002F\u002F These can also be embedded in models\n\u002F\u002F Recommend, not putting these on the heap or stack directly as they can be large\nlocalCircularArenaAllocator\u003C256> meta_allocator; \u002F\u002F All tensor metadata gets stored here automatically, even when new is called\nlocalCircularArenaAllocator\u003C256> ram_allocator;  \u002F\u002F All temporary storage gets allocated here\n\nvoid foo() {\n  \u002F\u002F Tell the uTensor context which allocators to use\n  Context::get_default_context()->set_metadata_allocator(&meta_allocator);\n  Context::get_default_context()->set_ram_data_allocator(&ram_allocator);\n\n  \u002F\u002F Tensors are simply handles for accessing data as necessary, they are no larger than a pointer\n  \u002F\u002F RomTensor(TensorShape, data_type, data*);\n  Tensor a = new \u002F*const*\u002F RomTensor({2, 2}, u8, s_a);\n  Tensor b = new \u002F*const*\u002F RomTensor({2, 2}, u8, s_b);\n  Tensor c_ref = new RomTensor({2,2}, u8, s_c_ref);\n  \u002F\u002F RamTensors are held internally and can be moved or cleared depending on the memory schedule (optional)\n  Tensor c = new RamTensor({2, 2}, u8);\n\n  \u002F\u002F Operators take in a fixed size map of (input_name -> parameter), this gives compile time errors on input mismatching\n  \u002F\u002F Also, the name binding + lack of parameter ordering makes ctag jumping and GDB sessions significantly more intuitive\n  MatrixMultOperator\u003Cuint8_t> mult_AB;\n  mult_AB\n      .set_inputs({{MatrixMultOperator\u003Cuint8_t>::a, a}, {MatrixMultOperator\u003Cuint8_t>::b, b}})\n      .set_outputs({{MatrixMultOperator\u003Cuint8_t>::c, 
c}})\n      .eval();\n\n  \u002F\u002F Compare results\n  TensorShape& c_shape = c->get_shape();\n  for (int i = 0; i \u003C c_shape[0]; i++) {\n    for (int j = 0; j \u003C c_shape[1]; j++) {\n      \u002F\u002F Just need to cast the access to the expected type\n      if( static_cast\u003Cuint8_t>(c(i, j)) != static_cast\u003Cuint8_t>(c_ref(i, j)) ) {\n        printf(\"Oh crap!\\n\");\n        exit(-1);\n      }\n    }\n  }\n}\n```\n\n## 本地构建与测试\n\n```\ngit clone git@github.com:uTensor\u002FuTensor.git\ncd uTensor\u002F\ngit checkout proposal\u002Frearch\ngit submodule init\ngit submodule update\nmkdir build\ncd build\u002F\ncmake -DPACKAGE_TESTS=ON -DCMAKE_BUILD_TYPE=Debug ..\nmake\nmake test\n```\n\n## 在 Arm Mbed OS 上构建和运行\n\nuTensor 核心库开箱即用地配置为 Mbed (嵌入式开发平台) 库，因此我们只需将其导入项目并正常构建。\n\n```\nmbed new my_project\ncd my_project\nmbed import https:\u002F\u002Fgithub.com\u002FuTensor\u002FuTensor.git\n# Create main file\n# Run uTensor-cli workflow and copy model directory here\nmbed compile # as normal\n```\n\n## 在 Arm 系统上构建和运行\nTODO\n注意：CMake (跨平台构建工具) 对 ARM (处理器架构) 的支持目前处于实验阶段\nhttps:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F46916611\u002Fcross-compiling-googletest-for-arm64\n\n默认构建\n```\nmkdir build && cd build\ncmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE=..\u002Fextern\u002FCMSIS_5\u002FCMSIS\u002FDSP\u002Fgcc.cmake  ..\n```\n\n使用 CMSIS (微控制器软件接口标准) 优化内核\n```\nmkdir build && cd build\ncmake -DARM_PROJECT=1 -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE=..\u002Fextern\u002FCMSIS_5\u002FCMSIS\u002FDSP\u002Fgcc.cmake  ..\n```\n\n## 延伸阅读\n- [为什么是边缘计算](https:\u002F\u002Ftowardsdatascience.com\u002Fwhy-machine-learning-on-the-edge-92fac32105e6)\n- [为什么机器学习的未来是微型化](https:\u002F\u002Fpetewarden.com\u002F2018\u002F06\u002F11\u002Fwhy-the-future-of-machine-learning-is-tiny\u002F)\n- [TensorFlow](https:\u002F\u002Fwww.tensorflow.org)\n- [Mbed](https:\u002F\u002Fdeveloper.mbed.org)\n- 
[Node-Viewer](https:\u002F\u002Fgithub.com\u002Fneil-tan\u002Ftf-node-viewer\u002F)\n- [如何使用 TensorFlow 量化神经网络](https:\u002F\u002Fpetewarden.com\u002F2016\u002F05\u002F03\u002Fhow-to-quantize-neural-networks-with-tensorflow\u002F)\n- [MXNet 手写数字识别](https:\u002F\u002Fmxnet.incubator.apache.org\u002Ftutorials\u002Fpython\u002Fmnist.html)","# uTensor 快速上手指南\n\nuTensor 是一个基于 TensorFlow 构建的超轻量级机器学习推理框架，专为 ARM 目标设备优化。其核心运行时库仅约 2KB，适合资源受限的嵌入式环境。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n- **操作系统**：Linux \u002F macOS \u002F Windows (需支持 CMake)\n- **版本控制**：Git\n- **构建工具**：CMake (需安装在系统路径中)\n- **编译器**：支持 C++11 标准的 C++ 编译器\n- **依赖**：ARM Mbed OS (若用于嵌入式开发)\n\n## 安装步骤\n\n### 1. 克隆仓库并初始化子模块\n\n```bash\ngit clone git@github.com:uTensor\u002FuTensor.git\ncd uTensor\u002F\ngit checkout proposal\u002Frearch\ngit submodule init\ngit submodule update\n```\n\n> **注意**：如果您寻找稳定版本，可切换至 `master` 分支。本指南基于当前架构重构版本 (`proposal\u002Frearch`) 编写。\n\n### 2. 配置与编译\n\n创建构建目录并运行 CMake 配置：\n\n```bash\nmkdir build\ncd build\u002F\ncmake -DPACKAGE_TESTS=ON -DCMAKE_BUILD_TYPE=Debug ..\nmake\nmake test\n```\n\n编译完成后，测试用例将位于 `build\u002F` 目录下。\n\n### 3. 
嵌入式集成 (可选)\n\nuTensor 核心库已配置为 Mbed 库，可直接导入项目：\n\n```bash\nmbed new my_project\ncd my_project\nmbed import https:\u002F\u002Fgithub.com\u002FuTensor\u002FuTensor.git\n# 创建 main 文件\n# 运行 uTensor-cli 工作流并将模型目录复制至此\nmbed compile # 正常编译\n```\n\n## 基本使用\n\nuTensor 的工作流是：在 TensorFlow 中构建训练模型 -> 使用 uTensor 工具生成 `.cpp` 和 `.hpp` 文件 -> 在嵌入式端进行推理。\n\n以下是使用 uTensor 高级 API 进行矩阵乘法运算的示例：\n\n```c++\nusing namespace uTensor;\n\nconst uint8_t s_a[4] = {1, 2, 3, 4};\nconst uint8_t s_b[4] = {5, 6, 7, 8};\nconst uint8_t s_c_ref[4] = {19, 22, 43, 50};\n\n\u002F\u002F These can also be embedded in models\n\u002F\u002F Recommend, not putting these on the heap or stack directly as they can be large\nlocalCircularArenaAllocator\u003C256> meta_allocator; \u002F\u002F All tensor metadata gets stored here automatically, even when new is called\nlocalCircularArenaAllocator\u003C256> ram_allocator;  \u002F\u002F All temporary storage gets allocated here\n\nvoid foo() {\n  \u002F\u002F Tell the uTensor context which allocators to use\n  Context::get_default_context()->set_metadata_allocator(&meta_allocator);\n  Context::get_default_context()->set_ram_data_allocator(&ram_allocator);\n\n  \u002F\u002F Tensors are simply handles for accessing data as necessary, they are no larger than a pointer\n  \u002F\u002F RomTensor(TensorShape, data_type, data*);\n  Tensor a = new \u002F*const*\u002F RomTensor({2, 2}, u8, s_a);\n  Tensor b = new \u002F*const*\u002F RomTensor({2, 2}, u8, s_b);\n  Tensor c_ref = new RomTensor({2,2}, u8, s_c_ref);\n  \u002F\u002F RamTensors are held internally and can be moved or cleared depending on the memory schedule (optional)\n  Tensor c = new RamTensor({2, 2}, u8);\n\n  \u002F\u002F Operators take in a fixed size map of (input_name -> parameter), this gives compile time errors on input mismatching\n  \u002F\u002F Also, the name binding + lack of parameter ordering makes ctag jumping and GDB sessions significantly more intuitive\n  MatrixMultOperator\u003Cuint8_t> mult_AB;\n  
mult_AB\n      .set_inputs({{MatrixMultOperator\u003Cuint8_t>::a, a}, {MatrixMultOperator\u003Cuint8_t>::b, b}})\n      .set_outputs({{MatrixMultOperator\u003Cuint8_t>::c, c}})\n      .eval();\n\n  \u002F\u002F Compare results\n  TensorShape& c_shape = c->get_shape();\n  for (int i = 0; i \u003C c_shape[0]; i++) {\n    for (int j = 0; j \u003C c_shape[1]; j++) {\n      \u002F\u002F Just need to cast the access to the expected type\n      if( static_cast\u003Cuint8_t>(c(i, j)) != static_cast\u003Cuint8_t>(c_ref(i, j)) ) {\n        printf(\"Oh crap!\\n\");\n        exit(-1);\n      }\n    }\n  }\n}\n```\n\n更多教程示例（如错误处理、自定义算子）可在 `tutorials\u002F` 目录下找到。","某物联网团队正在为工业电机开发基于微控制器的异常振动监测终端，需要在资源受限的 ARM Cortex-M 芯片上运行深度学习模型，以实现预测性维护功能。\n\n### 没有 uTensor 时\n- 通用推理库体积庞大，导致 MCU 剩余内存不足，无法同时处理高频传感器数据。\n- 手动管理内存极易引发堆栈溢出，设备运行时偶尔会死机且难以复现故障原因。\n- 每次更新模型都需要重新编写底层 C 代码，部署周期长且容易引入人为错误。\n- 缺乏可视化的图结构支持，排查推理逻辑错误如同大海捞针，调试效率极低。\n\n### 使用 uTensor 后\n- 核心运行时仅约 2KB，极大释放了内存空间，确保多任务稳定并行运行。\n- 编译期即可锁定最大 RAM 占用，杜绝了运行时内存冲突的安全隐患，保障系统可靠性。\n- 直接复制生成的 C++ 文件即可集成模型，大幅简化了从训练到部署的流程，提升迭代速度。\n- 清晰的张量接口和调试支持，让开发者能直观理解模型图并快速定位逻辑问题。\n\nuTensor 以极小的资源开销实现了嵌入式 AI 的安全高效部署，让边缘智能触手可及。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FuTensor_uTensor_ecb5b070.png","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FuTensor_42e4e4d4.jpg","",null,"utensor@googlegroups.com","www.utensor.ai","https:\u002F\u002Fgithub.com\u002FuTensor",[82,86,90,94,98],{"name":83,"color":84,"percentage":85},"C++","#f34b7d",99.4,{"name":87,"color":88,"percentage":89},"Python","#3572A5",0.3,{"name":91,"color":92,"percentage":93},"C","#555555",0.2,{"name":95,"color":96,"percentage":97},"CMake","#DA3434",0,{"name":99,"color":100,"percentage":97},"Dockerfile","#384d54",1917,247,"2026-04-05T07:12:21","Apache-2.0",4,"未说明",{"notes":108,"python":106,"dependencies":109},"专为嵌入式和 Arm 架构优化的轻量级机器学习推理框架，核心运行时大小仅约 2KB。模型需在 TensorFlow 中训练并通过离线工具转换为 C++ 代码。支持在 Mbed OS 上部署。构建环境需安装 CMake 和 Make，部分功能可选依赖 CMSIS 
进行优化。",[110,111,112,113,114,115],"cmake","git","TensorFlow","Mbed CLI","CMSIS","C++11 编译器",[13,53],[118,119,120,121,122,123,124,125,126,127],"tensorflow","mbed","machine-learning","deep-learning","cortex-m","edge-computing","microcontroller","embedded","iot-middleware","iot","2026-03-27T02:49:30.150509","2026-04-06T05:15:23.234077",[131,136,140,145,150,155],{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},2594,"如何在代码中直接将 Tensor 初始化为零值？","可以通过 write 方法获取指针并进行循环赋值。具体代码如下：\n```cpp\nuint32_t *w_ptr = data->write\u003Cuint32_t>(0, 0);\nfor (uint32_t i = 0; i \u003C data->getSize(); ++i) {\n    *(w_ptr + i) = 0;\n}\n```\n如果需要其他初始化方式，也可以考虑使用向量初始化。","https:\u002F\u002Fgithub.com\u002FuTensor\u002FuTensor\u002Fissues\u002F96",{"id":137,"question_zh":138,"answer_zh":139,"source_url":135},2595,"如何移除对 SD 卡的依赖并将模型参数固化到芯片闪存？","可以将权重参数编译到代码中并固化到芯片。具体做法是将参数存储在只读闪存区域，创建 Tensor 时使用 const array 代替 idxImporter。这样可以直接从代码加载参数，无需外部存储。",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},2596,"运行 utensor-cli 时报错 'backend' must be unicode 怎么办？","这是 Python 版本兼容性导致的类型错误。建议将 utensor-cli 升级到 0.2.3 或更高版本，该版本已修复此问题。你可以运行 `utensor-cli -v` 检查当前版本。","https:\u002F\u002Fgithub.com\u002FuTensor\u002FuTensor\u002Fissues\u002F119",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},2597,"在 NRF52_DK 上编译新项目时出现 FLASH overflowed 错误如何解决？","这是因为设备 Flash 空间不足（NRF52_DK 需额外约 100kb+ 用于 BLE 栈）。临时解决方案是升级到 Flash 更大的设备。长期来看，项目正在减少对 stdc++ 的依赖以降低二进制体积。","https:\u002F\u002Fgithub.com\u002FuTensor\u002FuTensor\u002Fissues\u002F139",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},2598,"uTensor 生成的代码为何将多维输出识别为单值？","这涉及张量类型和大小映射的问题。生成器理论上已知输出层的大小和类型。如果返回单值，可能是读取时的类型转换（如 void* 强转）或迭代逻辑有误。建议检查生成的推理代码中 Tensor 的定义及 ctx.get 的调用方式。","https:\u002F\u002Fgithub.com\u002FuTensor\u002FuTensor\u002Fissues\u002F160",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},2599,"TensorStrides 构造函数是否存在索引下溢风险？","是的，原实现在使用无符号类型（size_t）递减循环索引时可能导致下溢。建议将循环索引改为有符号类型（如 
int32_t）。该问题已在后续提交中修复。","https:\u002F\u002Fgithub.com\u002FuTensor\u002FuTensor\u002Fissues\u002F215",[161,166],{"id":162,"version":163,"summary_zh":164,"released_at":165},102127,"v0.0.1","- ROM Tensor support\r\n- Updated uTensor ReadMe\r\n- Updated uTensor-CLI ReadMe\r\n- uTensor destructor hotfix","2018-09-17T21:02:18",{"id":167,"version":168,"summary_zh":169,"released_at":170},102128,"v0.0.0","- Updated uTensor ReadMe\r\n- Updated uTensor-CLI ReadMe\r\n- Added Contributor Guide\r\n- Added new uTensor project guide\r\n- Dropout Support","2018-07-02T15:03:38"]