[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-shamangary--FSA-Net":3,"tool-shamangary--FSA-Net":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":10,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":112,"github_topics":113,"view_count":23,"oss_zip_url":82,"oss_zip_packed_at":82,"status":16,"created_at":129,"updated_at":130,"faqs":131,"releases":167},1358,"shamangary\u002FFSA-Net","FSA-Net","[CVPR19] FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation from a Single Image","FSA-Net 是一款轻量级头部姿态估计开源模型，只需一张普通 RGB 照片就能实时输出人脸的俯仰、偏航、滚转三个角度。它摆脱了传统方法对关键点或深度图的依赖，把“精细结构聚合”思想引入特征融合：先按空间关系把特征图拆成若干局部区域，再动态加权汇总，既保留细节又大幅压缩计算量。实验显示，在公开数据集上其精度优于 Hopenet 等主流方案，而模型大小只有后者的 1\u002F100，单张 GTX-1080Ti 即可跑 30 FPS 以上。  \n适合计算机视觉研究者、AR\u002FVR 开发者、人机交互设计师以及想做人脸朝向检测的开发者快速集成；附带的 SSD\u002FMTCNN 人脸检测 demo 也让普通用户能直接用摄像头体验。","# FSA-Net\n**[CVPR19] FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation from a Single Image**\n\n**Code Author: Tsun-Yi Yang**\n\n**\\[Updates\\]**\n - `2019\u002F10\u002F06`: Big thanks to [Kapil Sachdeva](https:\u002F\u002Fgithub.com\u002Fksachdeva) again!!! 
The Keras Lambda layers have been replaced, and the converted [tf frozen models](https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Ftree\u002Fmaster\u002Fpre-trained\u002Fconverted-models) are now supported!\n - `2019\u002F09\u002F27`: Refactored the model code. Very beautiful and concise code contributed by [Kapil Sachdeva](https:\u002F\u002Fgithub.com\u002Fksachdeva).\n - `2019\u002F08\u002F30`: Demo update! Robust and fast SSD face detector added!\n\n\n\n### Comparison video\n(Baseline **Hopenet:** https:\u002F\u002Fgithub.com\u002Fnatanielruiz\u002Fdeep-head-pose)\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_0f1476aca961.gif\" height=\"320\"\u002F>\n\n### (New!!!) Fast and robust demo with SSD face detector (2019\u002F08\u002F30)\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_7850f3fb2c5f.gif\" height=\"300\"\u002F>\n\n### Webcam demo\n\n| Single person (LBP) | Multiple people (MTCNN)|\n| --- | --- |\n| \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_e3db95ab4292.gif\" height=\"220\"\u002F> | \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_5b19070f2fd6.gif\" height=\"220\"\u002F> |\n\n\n| Time sequence | Fine-grained structure|\n| --- | --- |\n| \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_a8506f910a64.png\" height=\"160\"\u002F> | \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_bf46bcb9cfce.png\" height=\"330\"\u002F> |\n\n\n\n### Results\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_1c81bd372f2b.png\" height=\"220\"\u002F>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_e2a98447e5ba.png\" height=\"220\"\u002F>\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_09a591a78e2a.png\" height=\"220\"\u002F>\n\n\n## Paper\n\n\n### PDF\nhttps:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fblob\u002Fmaster\u002F0191.pdf\n\n\n### Paper authors\n**[Tsun-Yi Yang](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=WhISCE4AAAAJ&hl=en), [Yi-Ting Chen](https:\u002F\u002Fsites.google.com\u002Fmedia.ee.ntu.edu.tw\u002Fyitingchen\u002F), [Yen-Yu Lin](https:\u002F\u002Fwww.citi.sinica.edu.tw\u002Fpages\u002Fyylin\u002Findex_zh.html), and [Yung-Yu Chuang](https:\u002F\u002Fwww.csie.ntu.edu.tw\u002F~cyy\u002F)**\n\n\n## Abstract\nThis paper proposes a method for head pose estimation from a single image. Previous methods often predict head poses through landmark or depth estimation and would require more computation than necessary. Our method is based on regression and feature aggregation. To obtain a compact model, we employ the soft stagewise regression scheme. Existing feature aggregation methods treat inputs as a bag of features and thus ignore their spatial relationship in a feature map. We propose to learn a fine-grained structure mapping for spatially grouping features before aggregation. The fine-grained structure provides part-based information and pooled values. By utilizing learnable and non-learnable importance over the spatial location, different variant models can be generated as a complementary ensemble. Experiments show that our method outperforms the state-of-the-art methods, including both the landmark-free ones and the ones based on landmark or depth estimation. Using a single RGB frame as input, our method even outperforms methods utilizing multi-modality information (RGB-D, RGB-Time) on estimating the yaw angle. 
Furthermore, the memory overhead of the proposed model is 100× smaller than that of previous methods.\n\n## Platform\n+ Keras\n+ TensorFlow\n+ GTX-1080Ti\n+ Ubuntu\n```\npython                    3.5.6                hc3d631a_0  \nkeras-applications        1.0.4                    py35_1    anaconda\nkeras-base                2.1.0                    py35_0    anaconda\nkeras-gpu                 2.1.0                         0    anaconda\nkeras-preprocessing       1.0.2                    py35_1    anaconda\ntensorflow                1.10.0          mkl_py35heddcb22_0  \ntensorflow-base           1.10.0          mkl_py35h3c3e929_0  \ntensorflow-gpu            1.10.0               hf154084_0    anaconda\ncudnn                     7.1.3                 cuda8.0_0  \ncuda80                    1.0                           0    soumith\nnumpy                     1.15.2          py35_blas_openblashd3ea46f_0  [blas_openblas]  conda-forge\nnumpy-base                1.14.3           py35h2b20989_0  \n```\n\n## Dependencies\n+ A guide for most dependencies (in Chinese):\nhttp:\u002F\u002Fshamangary.logdown.com\u002Fposts\u002F3009851\n+ Anaconda\n+ OpenCV\n+ MTCNN\n```\npip3 install mtcnn\n```\n+ Capsule: https:\u002F\u002Fgithub.com\u002FXifengGuo\u002FCapsNet-Keras\n+ Loupe_Keras: https:\u002F\u002Fgithub.com\u002Fshamangary\u002FLOUPE_Keras\n\n## Codes\n\nThis project has three main sections:\n1. Data pre-processing\n2. Training and testing\n3. Demo\n\nWe will go through the details in the following sections.\n\nThis repository is for the 300W-LP, AFLW2000, and BIWI datasets.\n\n\n### 1. Data pre-processing\n\n#### [For lazy people just like me] \n\nIf you don't want to re-download all the dataset images and redo the pre-processing, or simply don't care about the folder's data structure, there is a shortcut. 
Just download the file **data.zip** from the following link and replace the data folder with it.\n\n[Google drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1j6GMx33DCcbUOS8J3NHZ-BMHgk7H-oC_\u002Fview?usp=sharing)\n\nNow you can skip to the \"Training and testing\" stage.\n\n#### [Details]\n\nIn the paper, we define **Protocol 1** and **Protocol 2**.\n\n```\n\n# Protocol 1\n\nTraining: 300W-LP (A set of subsets: {AFW.npz, AFW_Flip.npz, HELEN.npz, HELEN_Flip.npz, IBUG.npz, IBUG_Flip.npz, LFPW.npz, LFPW_Flip.npz})\nTesting: AFLW2000.npz or BIWI_noTrack.npz\n\n\n# Protocol 2\n\nTraining: BIWI(70%)-> BIWI_train.npz\nTesting: BIWI(30%)-> BIWI_test.npz\n\n```\n(Note that the 300W-LP and AFLW2000 datasets share the same image arrangement, so I group them as **type1**. This label is unrelated to Protocol 1 or 2.)\n\nIf you want to do the pre-processing from scratch, you need to download the datasets first.\n\n#### Download the datasets\n\n+ [300W-LP, AFLW2000](http:\u002F\u002Fwww.cbsr.ia.ac.cn\u002Fusers\u002Fxiangyuzhu\u002Fprojects\u002F3DDFA\u002Fmain.htm)\n+ [BIWI](https:\u002F\u002Fdata.vision.ee.ethz.ch\u002Fcvl\u002Fgfanelli\u002Fhead_pose\u002Fhead_forest.html)\n\nPut the 300W-LP and AFLW2000 folders under **data\u002Ftype1\u002F**, and put the BIWI folder under **data\u002F**.\n\n#### Run pre-processing\n\n```\n# For 300W-LP and AFLW2000 datasets\n\ncd data\u002Ftype1\nsh run_created_db_type1.sh\n\n\n# For BIWI dataset\n\ncd data\npython TYY_create_db_biwi.py\npython TYY_create_db_biwi_70_30.py\n```\n\n\n### 2. Training and testing\n\n```\n\n# Training\nsh run_fsanet_train.sh\n\n# Testing\n# Note that we calculate the MAE of yaw, pitch, and roll independently, and average them into a single MAE for evaluation.\nsh run_fsanet_test.sh\n\n```\n\nJust check which model type you want to use in the shell script and you are good to go.\n\n\n### 3. 
Demo\n\nYou need a **webcam** to run the demos.\n\n\nNote that the center of the color axes is the detected face center.\nIdeally, each frame should have new face detection results.\nHowever, if face detection fails, the previous detection results are used to estimate poses.\n\n\nLBP is fast enough for real-time face detection, while MTCNN is much more accurate but slow.\n\n(2019\u002F08\u002F30 update!) SSD face detection is robust and fast! I borrowed some face detector code from https:\u002F\u002Fwww.pyimagesearch.com\n```\n# LBP face detector (fast but often misses faces)\ncd demo\nsh run_demo_FSANET.sh\n\n# MTCNN face detector (slow but accurate)\ncd demo\nsh run_demo_FSANET_mtcnn.sh\n\n# SSD face detector (fast and accurate)\ncd demo\nsh run_demo_FSANET_ssd.sh\n```\n\n### 4. Conversion to TensorFlow frozen graph\n\n```bash\ncd training_and_testing\npython keras_to_tf.py --trained-model-dir-path ..\u002Fpre-trained\u002F300W_LP_models\u002Ffsanet_var_capsule_3_16_2_21_5 --output-dir-path \u003Cyour_output_dir>\n```\n\nThe above command generates the TensorFlow frozen graph at \u003Cyour_output_dir>\u002Fconverted-models\u002Ftf\u002Ffsanet_var_capsule_3_16_2_21_5.pb\n\n### Modules explanation:\n\n1. ssr_G_model:\n\nhttps:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fblob\u002Fmaster\u002Flib\u002FFSANET_model.py#L441\n\n+ Two-stream structure for extracting the features.\n\n2. ssr_feat_S_model:\n\nhttps:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fblob\u002Fmaster\u002Flib\u002FFSANET_model.py#L442\n\n+ Generate fine-grained structure mappings from different scoring functions.\n+ Apply the mapping to the features and generate primary capsules.\n\n3. ssr_aggregation_model:\n\nhttps:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fblob\u002Fmaster\u002Flib\u002FFSANET_model.py#L443\n\n+ Feed the primary capsules into the capsule layer and output the final aggregated capsule features, 
which are then divided into 3 stages.\n\n4. ssr_F_model:\n\nhttps:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fblob\u002Fmaster\u002Flib\u002FFSANET_model.py#L444\n\n+ Feed the features of the previous 3 stages into the Soft Stagewise Regression (SSR) module. Each stage is further split into three parts: prediction, dynamic index shifting, and dynamic scaling. For a more detailed explanation, please refer to '[IJCAI18] SSR-Net'.\n\n5. SSRLayer:\n\nhttps:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fblob\u002Fmaster\u002Flib\u002FFSANET_model.py#L444\n\n+ Combine the prediction, dynamic index shifting, and dynamic scaling into the final regression output.\nIn this case, there are three outputs (yaw, pitch, roll).\n\n\n## 3rd party implementations\n+ https:\u002F\u002Fgithub.com\u002Faoru45\u002FFSANet.Pytorch\n+ https:\u002F\u002Fgithub.com\u002Fomasaht\u002Fheadpose-fsanet-pytorch\n\nAwesome VR flight simulation using FSANet in ONNX format\n+ https:\u002F\u002Fgithub.com\u002Fxuhao1\u002FFOXTracker\u002F\n","# FSA-Net\n**[CVPR19] FSA-Net：基于单张图像的细粒度结构聚合学习，用于头部姿态估计**\n\n**代码作者：杨孙毅**\n\n**【更新信息】**\n- **2019年10月6日**：再次向【Kapil Sachdeva】（https:\u002F\u002Fgithub.com\u002Fksachdeva）致以衷心的感谢！我们已成功替换 Keras 的 Lambda 层，并已支持转换后的 [TF 冻结模型](https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Ftree\u002Fmaster\u002Fpre-trained\u002Fconverted-models)！\n- **2019年9月27日**：对模型代码进行了重构。由【Kapil Sachdeva】（https:\u002F\u002Fgithub.com\u002Fksachdeva）贡献了极其优美且简洁的代码。\n- **2019年8月30日**：演示版本更新！新增了鲁棒且高效的 SSD 人脸检测器！\n\n### 对比视频\n（基准方法 **Hopenet**：https:\u002F\u002Fgithub.com\u002Fnatanielruiz\u002Fdeep-head-pose）\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_0f1476aca961.gif\" height=\"320\"\u002F>\n\n### （全新！！！）快速且稳健的 SSD 人脸检测器演示（2019年8月30日）\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_7850f3fb2c5f.gif\" height=\"300\"\u002F>\n\n### 网络摄像头演示\n\n| 单人场景（LBP） | 
多人场景（MTCNN）|\n| --- | --- |\n| \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_e3db95ab4292.gif\" height=\"220\"\u002F> | \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_5b19070f2fd6.gif\" height=\"220\"\u002F> |\n\n\n| 时间序列 | 细粒度结构|\n| --- | --- |\n| \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_a8506f910a64.png\" height=\"160\"\u002F> | \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_bf46bcb9cfce.png\" height=\"330\"\u002F> |\n\n\n\n### 结果\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_1c81bd372f2b.png\" height=\"220\"\u002F>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_e2a98447e5ba.png\" height=\"220\"\u002F>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_readme_09a591a78e2a.png\" height=\"220\"\u002F>\n\n## 论文\n\n### PDF\nhttps:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fblob\u002Fmaster\u002F0191.pdf\n\n\n### 论文作者\n**【杨孙毅】（https:\u002F\u002Fscholar.google.com\u002Fcitations?user=WhISCE4AAAAJ&hl=en）、【陈怡婷】（https:\u002F\u002Fsites.google.com\u002Fmedia.ee.ntu.edu.tw\u002Fyitingchen\u002F）、【林延宇】（https:\u002F\u002Fwww.citi.sinica.edu.tw\u002Fpages\u002Fyylin\u002Findex_zh.html）以及【庄永裕】（https:\u002F\u002Fwww.csie.ntu.edu.tw\u002F~cyy\u002F）**\n\n## 摘要\n本文提出了一种基于单张图像进行头部姿态估计的方法。以往的方法往往通过地标或深度估计来预测头部姿态，而这些方法往往需要额外的计算资源，远超实际需求。我们的方法以回归和特征聚合为基础。为了构建一个更轻量化的模型，我们采用了软分段回归方案。现有的特征聚合方法通常将输入视为一组特征，因此忽略了特征图中各特征之间的空间关系。我们提出了一种学习细粒度结构映射的方法，通过在特征聚合前对特征进行空间分组。细粒度结构能够提供基于部分的信息以及聚合后的值。通过利用可学习与不可学习的权重，针对空间位置进行差异化设计，我们可以生成多种互补的模型组合。实验结果表明，我们的方法在性能上优于当前最先进的方法，包括无地标方法以及基于地标或深度估计的方法。仅使用一帧 RGB 图像作为输入，我们的方法在估计yaw角时甚至超越了采用多模态信息（RGB-D、RGB-Time）的方法。此外，所提出的模型的内存开销比以往方法减少了 100 倍。\n\n## 平台\n+ Keras\n+ TensorFlow\n+ GTX-1080Ti\n+ Ubuntu\n```\npython        
            3.5.6                hc3d631a_0  \nkeras-applications        1.0.4                    py35_1    anaconda\nkeras-base                2.1.0                    py35_0    anaconda\nkeras-gpu                 2.1.0                         0    anaconda\nkeras-preprocessing       1.0.2                    py35_1    anaconda\ntensorflow                1.10.0          mkl_py35heddcb22_0  \ntensorflow-base           1.10.0          mkl_py35h3c3e929_0  \ntensorflow-gpu            1.10.0               hf154084_0    anaconda\ncudnn                     7.1.3                 cuda8.0_0  \ncuda80                    1.0                           0    soumith\nnumpy                     1.15.2          py35_blas_openblashd3ea46f_0  [blas_openblas]  conda-forge\nnumpy-base                1.14.3           py35h2b20989_0  \n```\n\n## 依赖项\n+ 一份关于大部分依赖项的指南。（中文版）\nhttp:\u002F\u002Fshamangary.logdown.com\u002Fposts\u002F3009851\n+ Anaconda\n+ OpenCV\n+ MTCNN\n```\npip3 install mtcnn\n```\n+ Capsule：https:\u002F\u002Fgithub.com\u002FXifengGuo\u002FCapsNet-Keras\n+ Loupe_Keras：https:\u002F\u002Fgithub.com\u002Fshamangary\u002FLOUPE_Keras\n\n## 代码\n\n本项目共分为三个主要部分：\n1. 数据预处理\n2. 训练与测试\n3. 演示\n\n接下来，我们将逐一详细讲解各个部分的内容。\n\n本仓库适用于 300W-LP、AFLW2000 和 BIWI 数据集。\n\n### 1. 
数据预处理\n\n#### 【献给像我这样懒惰的人】\n\n如果你不想重新下载每一份数据集的图片并重复进行预处理，或者你甚至并不在意文件夹中的数据结构，只需从以下链接下载 **data.zip** 文件，并将数据文件夹替换为该文件即可。\n\n[Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1j6GMx33DCcbUOS8J3NHZ-BMHgk7H-oC_\u002Fview?usp=sharing)\n\n现在，你可以直接跳转到“训练与测试”阶段。\n\n#### 【详细说明】\n\n在论文中，我们定义了 **协议 1** 和 **协议 2**。\n\n```\n\n# 协议 1\n\n训练：300W-LP（包含多个子集：{AFW.npz, AFW_Flip.npz, HELEN.npz, HELEN_Flip.npz, IBUG.npz, IBUG_Flip.npz, LFPW.npz, LFPW_Flip.npz}）\n\n测试：AFLW2000.npz 或 BIWI_noTrack.npz\n\n# 协议 2\n\n训练：BIWI（70%）→ BIWI_train.npz\n测试：BIWI（30%）→ BIWI_test.npz\n\n```\n（请注意，类型 1（300W-LP、AFLW2000）的数据集图像排列方式相同，我将其归类为 **类型 1**。这与协议 1 或协议 2 无关。）\n\n如果你想从头开始进行预处理，请先下载数据集。\n\n#### 下载数据集\n\n+ 【300W-LP、AFLW2000】（http:\u002F\u002Fwww.cbsr.ia.ac.cn\u002Fusers\u002Fxiangyuzhu\u002Fprojects\u002F3DDFA\u002Fmain.htm）\n+ 【BIWI】（https:\u002F\u002Fdata.vision.ee.ethz.ch\u002Fcvl\u002Fgfanelli\u002Fhead_pose\u002Fhead_forest.html）\n\n将 300W-LP 和 AFLW2000 文件夹放置于 **data\u002Ftype1\u002F** 目录下，将 BIWI 文件夹放置于 **data\u002F** 目录下。\n\n#### 运行预处理\n\n```\n# 对于 300W-LP 和 AFLW2000 数据集\n\ncd data\u002Ftype1\nsh run_created_db_type1.sh\n\n\n# 对于 BIWI 数据集\n\ncd data\npython TYY_create_db_biwi.py\npython TYY_create_db_biwi_70_30.py\n```\n\n\n### 2. 训练与测试\n\n```\n\n# 训练\nsh run_fsanet_train.sh\n\n# 测试\n# 请注意，我们分别计算 yaw、pitch、roll 的 MAE，然后将它们平均为单一的 MAE 以进行评估。\nsh run_fsanet_test.sh\n\n```\n\n只需在 Shell 脚本中确认自己想要使用的模型类型，一切就绪。\n\n### 3. 演示\n\n要正确处理演示文件，您需要一台**网络摄像头**。\n\n\n请注意，颜色轴的中心即为检测到的人脸中心。\n理想情况下，每一张帧都应包含新的面部检测结果。\n然而，如果面部检测失败，则会沿用之前的检测结果来估算人脸姿态。\n\n\nLBP 在实时面部检测中速度足够快；而 MTCNN 虽然精度更高，但速度较慢。\n\n（2019年8月30日更新！）SSD 面部检测既稳健又快速！我从 https:\u002F\u002Fwww.pyimagesearch.com 处借用了部分面部检测器代码。\n```\n# LBP 面部检测器（速度快，但往往容易漏检人脸）\ncd demo\nsh run_demo_FSANET.sh\n\n# MTCNN 面部检测器（速度慢，但精度高）\ncd demo\nsh run_demo_FSANET_mtcnn.sh\n\n# SSD 面部检测器（速度快且精度高）\ncd demo\nsh run_demo_FSANET_ssd.sh\n```\n\n### 4. 
转换为 TensorFlow 模型图\n\n```bash\ncd training_and_testing\npython keras_to_tf.py --trained-model-dir-path ..\u002Fpre-trained\u002F300W_LP_models\u002Ffsanet_var_capsule_3_16_2_21_5 --output-dir-path \u003Cyour_output_dir>\n```\n\n上述命令会将 TensorFlow 模型图生成至 `\u003Cyour_output_dir>\u002Fconverted-models\u002Ftf\u002Ffsanet_var_capsule_3_16_2_21_5.pb` 目录下。\n\n### 模块说明：\n\n1. ssr_G_model：\n   https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fblob\u002Fmaster\u002Flib\u002FFSANET_model.py#L441\n\n   + 采用双流结构来提取特征。\n\n2. ssr_feat_S_model：\n   https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fblob\u002Fmaster\u002Flib\u002FFSANET_model.py#L442\n\n   + 通过不同评分函数生成细粒度的结构映射。\n   + 将这些映射应用于特征，并生成主胶囊。\n\n3. ssr_aggregation_model：\n   https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fblob\u002Fmaster\u002Flib\u002FFSANET_model.py#L443\n\n   + 将主胶囊输入胶囊层，输出最终的聚合胶囊特征，并将其划分为三个阶段。\n\n4. ssr_F_model：\n   https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fblob\u002Fmaster\u002Flib\u002FFSANET_model.py#L444\n\n   + 以先前三个阶段的特征作为输入，进行软步进回归（SSR）模块的处理。每个阶段进一步拆分为三个子部分：预测、动态索引偏移以及动态缩放。如需了解更多细节，请参阅 '[IJCAI18] SSR-Net'。\n\n5. SSRLayer：\n   https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fblob\u002Fmaster\u002Flib\u002FFSANET_model.py#L444\n\n   + 以预测、动态索引偏移和动态缩放作为最终回归输出的输入。\n   在本例中，共有三个输出值： yaw、pitch 和 roll。\n\n## 第三方实现\n+ https:\u002F\u002Fgithub.com\u002Faoru45\u002FFSANet.Pytorch\n+ https:\u002F\u002Fgithub.com\u002Fomasaht\u002Fheadpose-fsanet-pytorch\n\n使用 FSANet ONNX 格式打造的超棒 VR 飞行模拟\n+ https:\u002F\u002Fgithub.com\u002Fxuhao1\u002FFOXTracker\u002F","# FSA-Net 快速上手指南\n\n## 环境准备\n- **系统**：Ubuntu 16.04+（Windows 亦可，推荐 Ubuntu）\n- **GPU**：GTX-1080Ti 或同等级别 NVIDIA 显卡（≥4 GB 显存）\n- **CUDA \u002F cuDNN**：CUDA 8.0 + cuDNN 7.1.3（或 CUDA 10.x + 对应 cuDNN）\n- **Python**：3.5 \u002F 3.6（官方示例基于 3.5.6）\n- **深度学习框架**  \n  - TensorFlow-GPU 1.10.0  \n  - Keras 2.1.0（GPU 版）\n\n## 安装步骤\n\n1. 
**创建并激活 Conda 环境**\n```bash\nconda create -n fsanet python=3.5\nconda activate fsanet\n```\n\n2. **一键安装主要依赖**\n```bash\n# TensorFlow-GPU 1.10.0（清华镜像加速）\npip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple tensorflow-gpu==1.10.0\n\n# Keras 2.1.0\npip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple keras-gpu==2.1.0\n\n# 其他依赖\npip install opencv-python mtcnn\n```\n\n3. **克隆代码与权重**\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net.git\ncd FSA-Net\n```\n\n4. **下载预处理数据（懒人包）**\n```bash\n# 下载 data.zip（Google Drive 或国内镜像）\n# 解压后替换项目根目录下的 data\u002F 文件夹\n```\n\n5. **可选：下载预训练模型**\n```bash\n# 已内置在 pre-trained\u002F 目录，无需额外操作\n```\n\n## 基本使用\n\n### 1. 快速测试（单张图片）\n```bash\ncd demo\n# 使用 SSD 人脸检测器（推荐）\nsh run_demo_FSANET_ssd.sh\n```\n程序会自动打开摄像头，实时显示头部姿态（yaw \u002F pitch \u002F roll）。\n\n### 2. 训练自己的模型\n```bash\n# 训练（默认 Protocol 1，300W-LP → AFLW2000）\nsh run_fsanet_train.sh\n\n# 测试\nsh run_fsanet_test.sh\n```\n\n### 3. 导出 TensorFlow 冻结图（部署用）\n```bash\ncd training_and_testing\npython keras_to_tf.py \\\n  --trained-model-dir-path ..\u002Fpre-trained\u002F300W_LP_models\u002Ffsanet_var_capsule_3_16_2_21_5 \\\n  --output-dir-path .\u002Fexported\n# 生成的 .pb 文件位于 .\u002Fexported\u002Fconverted-models\u002Ftf\u002F\n```\n\n至此，FSA-Net 已可跑通 Demo、训练及导出模型。","一家做线上少儿英语教育的初创公司，准备在 iPad 端推出“AI 纠音”功能：孩子对着摄像头朗读时，系统实时判断其头部朝向，若长时间低头或左顾右盼就弹出提醒，以保证注意力集中。\n\n### 没有 FSA-Net 时\n- 用传统人脸关键点 + 3D 姿态解算，iPad Air 2 上帧率只有 8 FPS，孩子一动就卡顿。  \n- 关键点抖动导致误判，孩子稍微低头 5° 就被系统判定“走神”，频繁弹窗打断课堂。  \n- 模型 120 MB，App Store 下载包体超标，用户投诉“更新一次 200 MB”。  \n- 多人课堂场景下，老师和孩子同框，算法把老师的姿态也算进去，界面乱成一团。  \n\n### 使用 FSA-Net 后\n- 单帧 RGB 直接回归姿态，iPad Air 2 稳定 28 FPS，动画流畅无掉帧。  \n- 误差从 ±6° 降到 ±2°，只在真正偏头 15° 以上才提醒，误报率下降 80%。  \n- 模型仅 1.2 MB，整包瘦身 90%，4G 用户也能秒下。  \n- 内置 SSD 人脸检测，自动锁定最近一张儿童脸，老师入镜也不干扰。  \n\nFSA-Net 让“AI 纠音”在低端平板上也能低延迟、高精度地守护孩子的注意力。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshamangary_FSA-Net_87564e36.png","shamangary","Tsun-Yi 
Yang","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fshamangary_c89211f1.jpg","Founder of MIMI AI, UK.\r\nEx-Meta. \r\nNational Taiwan University CSIE PhD\r\n","Amazon","London ","shamangary@hotmail.com",null,"https:\u002F\u002Fshamangary.github.io\u002F","https:\u002F\u002Fgithub.com\u002Fshamangary",[86,90],{"name":87,"color":88,"percentage":89},"Python","#3572A5",98.9,{"name":91,"color":92,"percentage":93},"Shell","#89e051",1.1,630,156,"2025-12-23T04:04:04","Apache-2.0","Linux","需要 NVIDIA GPU，推荐 GTX-1080Ti 或更高，CUDA 8.0，cuDNN 7.1.3","未说明",{"notes":102,"python":103,"dependencies":104},"建议使用 Anaconda 管理环境；已提供预训练 TensorFlow frozen graph；需准备摄像头运行 Demo；可选 LBP\u002FMTCNN\u002FSSD 三种人脸检测器，SSD 兼顾速度与精度；提供 300W-LP、AFLW2000、BIWI 数据集的预处理脚本与打包下载","3.5.6",[105,106,107,108,109,110,111],"tensorflow-gpu==1.10.0","keras-gpu==2.1.0","numpy==1.15.2","opencv-python","mtcnn","Capsule (https:\u002F\u002Fgithub.com\u002FXifengGuo\u002FCapsNet-Keras)","Loupe_Keras (https:\u002F\u002Fgithub.com\u002Fshamangary\u002FLOUPE_Keras)",[14,13],[114,115,116,117,118,119,120,121,122,123,124,125,126,127,128],"cvpr","2019","cvpr19","cvpr2019","fsa","fsanet","fsa-net","pose","esimtation","face","angle","regression","keras","tensorflow","head","2026-03-27T02:49:30.150509","2026-04-06T06:45:57.629284",[132,137,142,147,152,157,162],{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},6209,"运行 train.sh 时出现 IOError: Unable to open file ... 
fsanet_noS_capsule_3_16_2_192_5.h5 不存在，怎么办？","该错误通常是因为 Python 或 NumPy 版本不兼容导致的。将 Python 与 NumPy 版本与作者环境保持一致即可解决。作者已验证的环境：Python 3.x + NumPy 1.16 左右。","https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fissues\u002F1",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},6210,"FSA-Net 检测一张图片需要多长时间？Python 和 C++ 哪个更快？","在作者笔记本 CPU 上，11 秒的视频仅需 16 秒即可完成全部帧的欧拉角预测；相比之下，Hopenet 在 GPU 上需约 35 秒。实际速度请在你自己的设备上重新计时，以获得公平对比。","https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fissues\u002F27",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},6211,"训练 SSR-Net (ssrnet_mt) 时验证 MAE 高达 10，而训练 MAE 只有 3，正常吗？","这是 batch-size 设置不当导致的。作者使用 batch-size=8 重新训练后，在 AFLW2000 上得到 MAE=5.808（yaw\u002Fpitch\u002Froll = 4.822\u002F7.219\u002F5.382），在 BIWI 上得到 MAE=4.768。请尝试减小 batch-size 并重新训练。","https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fissues\u002F26",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},6212,"训练 FSA_net_Var_Capsule 到 30 epoch 后 loss 变成 nan，怎么解决？","将 TensorFlow 降级到 1.10 后重新训练即可避免 nan 问题。作者已确认该版本下训练稳定。","https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fissues\u002F4",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},6213,"在 Unity 中用 TensorFlowSharp 调用模型时出现 “Input to reshape is a tensor with 3136 values, but the requested shape has 4096” 怎么办？","作者未提供针对 TensorFlowSharp 的官方支持。请自行检查输入尺寸是否与模型要求一致（224×224×3），并确保 reshape 操作的张量维度匹配。","https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fissues\u002F40",{"id":158,"question_zh":159,"answer_zh":160,"source_url":161},6214,"运行 keras_to_tf.py 报错 “module 'tensorflow.io' has no attribute 'write_graph'”，需要升级 TensorFlow 吗？","脚本在 TensorFlow 1.15.0 下可直接运行。若使用 TF 1.10，请将脚本第 173 行：\n```\ntf.io.write_graph(frozen_graph, tf_dir_path, f\"{model_name}.pb\", as_text=False)\n```\n改为：\n```\ntf.train.write_graph(frozen_graph, tf_dir_path, f\"{model_name}.pb\", as_text=False)\n```\n注意：仓库暂不支持 TensorFlow 
2.x。","https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fissues\u002F36",{"id":163,"question_zh":164,"answer_zh":165,"source_url":166},6215,"把 batch_size 设成 64 或 128 会掉精度吗？","作者未在 Issue 中给出明确结论。根据 SSR-Net 的经验（见 Issue #26），batch-size 对验证精度影响较大，建议先用较小 batch-size（如 8）确保精度，再逐步增大以加速训练。","https:\u002F\u002Fgithub.com\u002Fshamangary\u002FFSA-Net\u002Fissues\u002F6",[]]