[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tool-okankop--Efficient-3DCNNs":3,"similar-okankop--Efficient-3DCNNs":84},{"id":4,"github_repo":5,"name":6,"description_en":7,"description_zh":8,"ai_summary_zh":8,"readme_en":9,"readme_zh":10,"quickstart_zh":11,"use_case_zh":12,"hero_image_url":13,"owner_login":14,"owner_name":15,"owner_avatar_url":16,"owner_bio":17,"owner_company":18,"owner_location":17,"owner_email":17,"owner_twitter":17,"owner_website":19,"owner_url":20,"languages":21,"stars":30,"forks":31,"last_commit_at":32,"license":33,"difficulty_score":34,"env_os":35,"env_gpu":36,"env_ram":35,"env_deps":37,"category_tags":45,"github_topics":17,"view_count":48,"oss_zip_url":17,"oss_zip_packed_at":17,"status":49,"created_at":50,"updated_at":51,"faqs":52,"releases":83},3528,"okankop\u002FEfficient-3DCNNs","Efficient-3DCNNs","PyTorch Implementation of \"Resource Efficient 3D Convolutional Neural Networks\", codes and pretrained models.","Efficient-3DCNNs 是一个基于 PyTorch 实现的开源项目，专注于提供资源高效型的 3D 卷积神经网络（3D CNN）。它主要解决了在视频理解任务中，传统 3D 模型计算量大、对硬件要求高从而难以部署的问题。通过引入并复现多种轻量化架构，该工具让高精度的视频分析也能在资源受限的设备上流畅运行。\n\n该项目非常适合从事计算机视觉研究的学者、需要部署视频算法的开发者，以及关注模型压缩与加速的技术人员使用。其核心亮点在于不仅实现了 3D SqueezeNet、3D MobileNet 系列及 3D ShuffleNet 系列等经典轻量模型，还补充了 3D ResNet 和 3D ResNeXt 作为性能基准。所有模型均支持通过调整“宽度乘数”来灵活控制复杂度，并在 Kinetics、Jester 和 UCF-101 等主流数据集上完成了验证。此外，项目提供了完整的预训练模型、数据预处理脚本及详细的运行配置，帮助用户快速复现论文结果或构建自己的高效视频分析应用。","# Efficient-3DCNNs\nPyTorch Implementation of the article \"[Resource Efficient 3D Convolutional Neural Networks](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.02422.pdf)\", codes and pretrained models.\n\n## Update!\n\n3D ResNet and 3D ResNeXt models are added! 
The details of these models can be found in [link](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.09577.pdf).\n\n## Requirements\n\n* [PyTorch 1.0.1.post2](http:\u002F\u002Fpytorch.org\u002F)\n* OpenCV\n* FFmpeg, FFprobe\n* Python 3\n\n## Pre-trained models\n\nPretrained models can be downloaded from [here](https:\u002F\u002Fdrive.google.com\u002Fopen?id=1eggpkmy_zjb62Xra6kQviLa67vzP_FR8).\n\nImplemented models:\n - 3D SqueezeNet\n - 3D MobileNet\n - 3D ShuffleNet\n - 3D MobileNetv2\n - 3D ShuffleNetv2\n \n For state-of-the-art comparison, the following models are also evaluated:\n - ResNet-18\n - ResNet-50\n - ResNet-101\n - ResNeXt-101\n \n All models (except for SqueezeNet) are evaluated for 4 different complexity levels by adjusting their 'width_multiplier' on 2 different hardware platforms.\n\n## Results\n\n\u003Cp align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fokankop_Efficient-3DCNNs_readme_d4a4a98a385c.png\" align=\"middle\" width=\"900\" title=\"Results of Efficient 3DCNNs\" \u002F>\u003C\u002Fp>\n\n## Dataset Preparation\n\n### Kinetics\n\n* Download videos using [the official crawler](https:\u002F\u002Fgithub.com\u002Factivitynet\u002FActivityNet\u002Ftree\u002Fmaster\u002FCrawler\u002FKinetics).\n  * Locate test set in ```video_directory\u002Ftest```.\n* Different from the other datasets, we did not extract frames from the videos. Instead, we read the frames directly from videos using OpenCV throughout the training. If you want to extract the frames for the Kinetics dataset, please follow the preparation steps in [Kensho Hara's codebase](https:\u002F\u002Fgithub.com\u002Fkenshohara\u002F3D-ResNets-PyTorch). 
You also need to modify the kinetics.py file in the datasets folder.\n\n* Generate annotation file in json format similar to ActivityNet using ```utils\u002Fkinetics_json.py```\n  * The CSV files (kinetics_{train, val, test}.csv) are included in the crawler.\n\n```bash\npython utils\u002Fkinetics_json.py train_csv_path val_csv_path video_dataset_path dst_json_path\n```\n\n### Jester\n\n* Download videos [here](https:\u002F\u002F20bn.com\u002Fdatasets\u002Fjester#download).\n* Generate n_frames files using ```utils\u002Fn_frames_jester.py```\n\n```bash\npython utils\u002Fn_frames_jester.py dataset_directory\n```\n\n* Generate annotation file in json format similar to ActivityNet using ```utils\u002Fjester_json.py```\n  * ```annotation_dir_path``` includes classInd.txt, trainlist.txt, vallist.txt\n\n```bash\npython utils\u002Fjester_json.py annotation_dir_path\n```\n\n### UCF-101\n\n* Download videos and train\u002Ftest splits [here](http:\u002F\u002Fcrcv.ucf.edu\u002Fdata\u002FUCF101.php).\n* Convert from avi to jpg files using ```utils\u002Fvideo_jpg_ucf101_hmdb51.py```\n\n```bash\npython utils\u002Fvideo_jpg_ucf101_hmdb51.py avi_video_directory jpg_video_directory\n```\n\n* Generate n_frames files using ```utils\u002Fn_frames_ucf101_hmdb51.py```\n\n```bash\npython utils\u002Fn_frames_ucf101_hmdb51.py jpg_video_directory\n```\n\n* Generate annotation file in json format similar to ActivityNet using ```utils\u002Fucf101_json.py```\n  * ```annotation_dir_path``` includes classInd.txt, trainlist0{1, 2, 3}.txt, testlist0{1, 2, 3}.txt\n\n```bash\npython utils\u002Fucf101_json.py annotation_dir_path\n```\n\n\n## Running the code\n\nModel configurations are given as follows:\n\n```misc\nShuffleNetV1-1.0x : --model shufflenet   --width_mult 1.0 --groups 3\nShuffleNetV2-1.0x : --model shufflenetv2 --width_mult 1.0\nMobileNetV1-1.0x  : --model mobilenet    --width_mult 1.0\nMobileNetV2-1.0x  : --model mobilenetv2  --width_mult 1.0 \nSqueezeNet\t  : --model squeezenet 
--version 1.1\nResNet-18\t  : --model resnet  --model_depth 18  --resnet_shortcut A\nResNet-50\t  : --model resnet  --model_depth 50  --resnet_shortcut B\nResNet-101\t  : --model resnet  --model_depth 101 --resnet_shortcut B\nResNeXt-101\t  : --model resnext --model_depth 101 --resnet_shortcut B --resnext_cardinality 32\n```\n\nPlease check all the 'Resource efficient 3D CNN models' in models folder and run the code by providing the necessary parameters. An example run is given as follows:\n\n- Training from scratch:\n```bash\npython main.py --root_path ~\u002F \\\n\t--video_path ~\u002Fdatasets\u002Fjester \\\n\t--annotation_path Efficient-3DCNNs\u002Fannotation_Jester\u002Fjester.json \\\n\t--result_path Efficient-3DCNNs\u002Fresults \\\n\t--dataset jester \\\n\t--n_classes 27 \\\n\t--model mobilenet \\\n\t--width_mult 0.5 \\\n\t--train_crop random \\\n\t--learning_rate 0.1 \\\n\t--sample_duration 16 \\\n\t--downsample 2 \\\n\t--batch_size 64 \\\n\t--n_threads 16 \\\n\t--checkpoint 1 \\\n\t--n_val_samples 1 \\\n```\n\n- Resuming training from a checkpoint:\n```bash\npython main.py --root_path ~\u002F \\\n\t--video_path ~\u002Fdatasets\u002Fjester \\\n\t--annotation_path Efficient-3DCNNs\u002Fannotation_Jester\u002Fjester.json \\\n\t--result_path Efficient-3DCNNs\u002Fresults \\\n\t--resume_path Efficient-3DCNNs\u002Fresults\u002Fjester_shufflenet_0.5x_G3_RGB_16_best.pth \\\n\t--dataset jester \\\n\t--n_classes 27 \\\n\t--model shufflenet \\\n\t--groups 3 \\\n\t--width_mult 0.5 \\\n\t--train_crop random \\\n\t--learning_rate 0.1 \\\n\t--sample_duration 16 \\\n\t--downsample 2 \\\n\t--batch_size 64 \\\n\t--n_threads 16 \\\n\t--checkpoint 1 \\\n\t--n_val_samples 1 \\\n```\n\n\n- Training from a pretrained model. 
Use '--ft_portion' and select 'complete' or 'last_layer' for fine-tuning:\n```bash\npython main.py --root_path ~\u002F \\\n\t--video_path ~\u002Fdatasets\u002Fjester \\\n\t--annotation_path Efficient-3DCNNs\u002Fannotation_UCF101\u002Fucf101_01.json \\\n\t--result_path Efficient-3DCNNs\u002Fresults \\\n\t--pretrain_path Efficient-3DCNNs\u002Fresults\u002Fkinetics_shufflenet_0.5x_G3_RGB_16_best.pth \\\n\t--dataset ucf101 \\\n\t--n_classes 600 \\\n\t--n_finetune_classes 101 \\\n\t--ft_portion last_layer \\\n\t--model shufflenet \\\n\t--groups 3 \\\n\t--width_mult 0.5 \\\n\t--train_crop random \\\n\t--learning_rate 0.1 \\\n\t--sample_duration 16 \\\n\t--downsample 1 \\\n\t--batch_size 64 \\\n\t--n_threads 16 \\\n\t--checkpoint 1 \\\n\t--n_val_samples 1 \\\n```\n\n### Augmentations\n\nThere are several augmentation techniques available. Please check spatial_transforms.py and temporal_transforms.py for the details of the augmentation methods.\n\nNote: Do not use \"RandomHorizontalFlip\" when training on the Jester dataset, as it alters the class type of some classes (e.g. Swipe_Left --> RandomHorizontalFlip() --> Swipe_Right)\n\n### Calculating Video Accuracy\n\nIn order to calculate video accuracy, you should first run the models with '--test' mode in order to create 'val.json'. Then, you need to run 'video_accuracy.py' in the utils folder to calculate video accuracies. \n\n### Calculating FLOPs\n\nIn order to calculate FLOPs, run the file 'calculate_FLOP.py'. You need to first uncomment the desired model in the file. 
\n\n## Citation\n\nPlease cite the following article if you use this code or pre-trained models:\n\n```bibtex\n@inproceedings{kopuklu2019resource,\n  title={Resource efficient 3d convolutional neural networks},\n  author={K{\\\"o}p{\\\"u}kl{\\\"u}, Okan and Kose, Neslihan and Gunduz, Ahmet and Rigoll, Gerhard},\n  booktitle={2019 IEEE\u002FCVF International Conference on Computer Vision Workshop (ICCVW)},\n  pages={1910--1919},\n  year={2019},\n  organization={IEEE}\n}\n```\n\n## Acknowledgement\nWe thank Kensho Hara for releasing his [codebase](https:\u002F\u002Fgithub.com\u002Fkenshohara\u002F3D-ResNets-PyTorch), which we build our work on top.\n","# 高效3D CNN\n论文“资源高效的3D卷积神经网络”（[链接](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.02422.pdf)）的PyTorch实现，包含代码和预训练模型。\n\n## 更新！\n\n新增了3D ResNet和3D ResNeXt模型！这些模型的详细信息可在[链接](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.09577.pdf)中找到。\n\n## 环境要求\n\n* [PyTorch 1.0.1.post2](http:\u002F\u002Fpytorch.org\u002F)\n* OpenCV\n* FFmpeg, FFprobe\n* Python 3\n\n## 预训练模型\n\n预训练模型可从[这里](https:\u002F\u002Fdrive.google.com\u002Fopen?id=1eggpkmy_zjb62Xra6kQviLa67vzP_FR8)下载。\n\n已实现的模型：\n- 3D SqueezeNet\n- 3D MobileNet\n- 3D ShuffleNet\n- 3D MobileNetv2\n- 3D ShuffleNetv2\n\n为了进行最先进方法的对比，还评估了以下模型：\n- ResNet-18\n- ResNet-50\n- ResNet-101\n- ResNext-101\n\n所有模型（除SqueezeNet外）均通过调整其‘width_multiplier’参数，在两种不同的硬件平台上评估了四种不同的复杂度级别。\n\n## 结果\n\n\u003Cp align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fokankop_Efficient-3DCNNs_readme_d4a4a98a385c.png\" align=\"middle\" width=\"900\" title=\"高效3D CNN的结果\" \u002F>\u003C\u002Fp>\n\n## 数据集准备\n\n### Kinetics\n\n* 使用[官方爬虫](https:\u002F\u002Fgithub.com\u002Factivitynet\u002FActivityNet\u002Ftree\u002Fmaster\u002FCrawler\u002FKinetics)下载视频。\n  * 测试集位于```video_directory\u002Ftest```。\n* 与其他数据集不同，我们没有从视频中提取帧。相反，在整个训练过程中，我们直接使用OpenCV从视频中读取帧。如果您想为Kinetics数据集提取帧，请遵循[Kensho 
Hara的代码库](https:\u002F\u002Fgithub.com\u002Fkenshohara\u002F3D-ResNets-PyTorch)中的准备步骤。您还需要修改datasets文件夹中的kinetics.py文件。\n\n* 使用```utils\u002Fkinetics_json.py```生成类似于ActivityNet的JSON格式标注文件\n  * 爬虫中包含了CSV文件（kinetics_{train, val, test}.csv）。\n\n```bash\npython utils\u002Fkinetics_json.py train_csv_path val_csv_path video_dataset_path dst_json_path\n```\n\n### Jester\n\n* 在[这里](https:\u002F\u002F20bn.com\u002Fdatasets\u002Fjester#download)下载视频。\n* 使用```utils\u002Fn_frames_jester.py```生成n_frames文件\n\n```bash\npython utils\u002Fn_frames_jester.py dataset_directory\n```\n\n* 使用```utils\u002Fjester_json.py```生成类似于ActivityNet的JSON格式标注文件\n  * ```annotation_dir_path```包括classInd.txt、trainlist.txt、vallist.txt\n\n```bash\npython utils\u002Fjester_json.py annotation_dir_path\n```\n\n### UCF-101\n\n* 在[这里](http:\u002F\u002Fcrcv.ucf.edu\u002Fdata\u002FUCF101.php)下载视频和训练\u002F测试划分。\n* 使用```utils\u002Fvideo_jpg_ucf101_hmdb51.py```将AVI文件转换为JPG文件\n\n```bash\npython utils\u002Fvideo_jpg_ucf101_hmdb51.py avi_video_directory jpg_video_directory\n```\n\n* 使用```utils\u002Fn_frames_ucf101_hmdb51.py```生成n_frames文件\n\n```bash\npython utils\u002Fn_frames_ucf101_hmdb51.py jpg_video_directory\n```\n\n* 使用```utils\u002Fucf101_json.py```生成类似于ActivityNet的JSON格式标注文件\n  * ```annotation_dir_path```包括classInd.txt、trainlist0{1, 2, 3}.txt、testlist0{1, 2, 3}.txt\n\n```bash\npython utils\u002Fucf101_json.py annotation_dir_path\n```\n\n\n## 运行代码\n\n模型配置如下：\n\n```misc\nShuffleNetV1-1.0x : --model shufflenet   --width_mult 1.0 --groups 3\nShuffleNetV2-1.0x : --model shufflenetv2 --width_mult 1.0\nMobileNetV1-1.0x  : --model mobilenet    --width_mult 1.0\nMobileNetV2-1.0x  : --model mobilenetv2  --width_mult 1.0 \nSqueezeNet\t  : --model squeezenet --version 1.1\nResNet-18\t  : --model resnet  --model_depth 18  --resnet_shortcut A\nResNet-50\t  : --model resnet  --model_depth 50  --resnet_shortcut B\nResNet-101\t  : --model resnet  --model_depth 101 --resnet_shortcut B\nResNeXt-101\t  : --model resnext 
--model_depth 101 --resnet_shortcut B --resnext_cardinality 32\n```\n\n请查看models文件夹中的所有“资源高效的3D CNN模型”，并提供必要的参数来运行代码。示例如下：\n\n- 从头开始训练：\n```bash\npython main.py --root_path ~\u002F \\\n\t--video_path ~\u002Fdatasets\u002Fjester \\\n\t--annotation_path Efficient-3DCNNs\u002Fannotation_Jester\u002Fjester.json \\\n\t--result_path Efficient-3DCNNs\u002Fresults \\\n\t--dataset jester \\\n\t--n_classes 27 \\\n\t--model mobilenet \\\n\t--width_mult 0.5 \\\n\t--train_crop random \\\n\t--learning_rate 0.1 \\\n\t--sample_duration 16 \\\n\t--downsample 2 \\\n\t--batch_size 64 \\\n\t--n_threads 16 \\\n\t--checkpoint 1 \\\n\t--n_val_samples 1 \\\n```\n\n- 从检查点恢复训练：\n```bash\npython main.py --root_path ~\u002F \\\n\t--video_path ~\u002Fdatasets\u002Fjester \\\n\t--annotation_path Efficient-3DCNNs\u002Fannotation_Jester\u002Fjester.json \\\n\t--result_path Efficient-3DCNNs\u002Fresults \\\n\t--resume_path Efficient-3DCNNs\u002Fresults\u002Fjester_shufflenet_0.5x_G3_RGB_16_best.pth \\\n\t--dataset jester \\\n\t--n_classes 27 \\\n\t--model shufflenet \\\n\t--groups 3 \\\n\t--width_mult 0.5 \\\n\t--train_crop random \\\n\t--learning_rate 0.1 \\\n\t--sample_duration 16 \\\n\t--downsample 2 \\\n\t--batch_size 64 \\\n\t--n_threads 16 \\\n\t--checkpoint 1 \\\n\t--n_val_samples 1 \\\n```\n\n- 从预训练模型开始训练。使用‘--ft_portion’并选择‘complete’或‘last_layer’进行微调：\n```bash\npython main.py --root_path ~\u002F \\\n\t--video_path ~\u002Fdatasets\u002Fjester \\\n\t--annotation_path Efficient-3DCNNs\u002Fannotation_UCF101\u002Fucf101_01.json \\\n\t--result_path Efficient-3DCNNs\u002Fresults \\\n\t--pretrain_path Efficient-3DCNNs\u002Fresults\u002Fkinetics_shufflenet_0.5x_G3_RGB_16_best.pth \\\n\t--dataset ucf101 \\\n\t--n_classes 600 \\\n\t--n_finetune_classes 101 \\\n\t--ft_portion last_layer \\\n\t--model shufflenet \\\n\t--groups 3 \\\n\t--width_mult 0.5 \\\n\t--train_crop random \\\n\t--learning_rate 0.1 \\\n\t--sample_duration 16 \\\n\t--downsample 1 \\\n\t--batch_size 64 \\\n\t--n_threads 16 
\\\n\t--checkpoint 1 \\\n\t--n_val_samples 1 \\\n```\n\n### 数据增强\n\n有几种可用的数据增强技术。请查看spatial_transforms.py和temporal_transforms.py以了解增强方法的详细信息。\n\n注意：不要对Jester数据集的训练使用“RandomHorizontalFlip”，因为它会改变某些类别的类别类型（例如，Swipe_Left --> RandomHorizontalFlip() --> Swipe_Right）。\n\n### 计算视频准确率\n\n为了计算视频准确率，您应该首先以‘--test’模式运行模型，以创建‘val.json’文件。然后，您需要在utils文件夹中运行‘video_accuracy.py’来计算视频准确率。\n\n### 计算FLOPs\n\n要计算FLOPs，请运行‘calculate_FLOP.py’文件。您需要先取消注释文件中所需的模型。\n\n## 引用\n\n如果您使用此代码或预训练模型，请引用以下文章：\n\n```bibtex\n@inproceedings{kopuklu2019resource,\n  title={Resource efficient 3d convolutional neural networks},\n  author={K{\\\"o}p{\\\"u}kl{\\\"u}, Okan and Kose, Neslihan and Gunduz, Ahmet and Rigoll, Gerhard},\n  booktitle={2019 IEEE\u002FCVF International Conference on Computer Vision Workshop (ICCVW)},\n  pages={1910--1919},\n  year={2019},\n  organization={IEEE}\n}\n```\n\n## 致谢\n我们感谢Kensho Hara公开了他的[代码库](https:\u002F\u002Fgithub.com\u002Fkenshohara\u002F3D-ResNets-PyTorch)，我们的工作正是基于该代码库展开的。","# Efficient-3DCNNs 快速上手指南\n\nEfficient-3DCNNs 是一个基于 PyTorch 的资源高效型 3D 卷积神经网络实现，包含 SqueezeNet、MobileNet、ShuffleNet 等轻量级模型及其预训练权重，专为视频动作识别任务设计。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux (推荐) 或 macOS\n*   **Python**: Python 3.x\n*   **核心依赖**:\n    *   PyTorch 1.0.1.post2 (或兼容版本)\n    *   OpenCV (`opencv-python`)\n    *   FFmpeg 和 FFprobe (用于视频读取)\n\n**安装依赖命令：**\n\n```bash\n# 建议先安装 ffmpeg (以 Ubuntu 为例)\nsudo apt-get update\nsudo apt-get install ffmpeg\n\n# 安装 Python 依赖\npip install torch==1.0.1.post2 torchvision opencv-python\n# 如果国内下载慢，推荐使用清华源：\n# pip install torch==1.0.1.post2 torchvision opencv-python -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## 安装步骤\n\n1.  **克隆仓库**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fokankop\u002FEfficient-3DCNNs.git\n    cd Efficient-3DCNNs\n    ```\n\n2.  
**下载预训练模型 (可选)**\n    如果您需要使用官方提供的预训练权重，可以从 Google Drive 下载，或使用国内加速链接（如有）。\n    *   官方地址：[Download Link](https:\u002F\u002Fdrive.google.com\u002Fopen?id=1eggpkmy_zjb62Xra6kQviLa67vzP_FR8)\n    *   支持模型：3D SqueezeNet, 3D MobileNet (v1\u002Fv2), 3D ShuffleNet (v1\u002Fv2), ResNet, ResNeXt。\n\n3.  **数据集准备**\n    本项目支持 Kinetics, Jester, UCF-101 等数据集。以 **Jester** 数据集为例：\n    \n    *   下载视频数据。\n    *   生成帧数文件：\n        ```bash\n        python utils\u002Fn_frames_jester.py \u003Cdataset_directory>\n        ```\n    *   生成 JSON 标注文件：\n        ```bash\n        python utils\u002Fjester_json.py \u003Cannotation_dir_path>\n        ```\n    *(注：`\u003Cannotation_dir_path>` 需包含 classInd.txt, trainlist.txt, vallist.txt)*\n\n## 基本使用\n\n以下是最简单的**从头开始训练 (Training from scratch)** 的示例命令。该示例使用 MobileNet 模型在 Jester 数据集上进行训练。\n\n请根据您的实际路径修改 `--root_path`, `--video_path`, 和 `--annotation_path` 参数。\n\n```bash\npython main.py --root_path ~\u002F \\\n\t--video_path ~\u002Fdatasets\u002Fjester \\\n\t--annotation_path Efficient-3DCNNs\u002Fannotation_Jester\u002Fjester.json \\\n\t--result_path Efficient-3DCNNs\u002Fresults \\\n\t--dataset jester \\\n\t--n_classes 27 \\\n\t--model mobilenet \\\n\t--width_mult 0.5 \\\n\t--train_crop random \\\n\t--learning_rate 0.1 \\\n\t--sample_duration 16 \\\n\t--downsample 2 \\\n\t--batch_size 64 \\\n\t--n_threads 16 \\\n\t--checkpoint 1 \\\n\t--n_val_samples 1\n```\n\n**常用模型配置参考：**\n\n*   **ShuffleNetV1**: `--model shufflenet --width_mult 1.0 --groups 3`\n*   **ShuffleNetV2**: `--model shufflenetv2 --width_mult 1.0`\n*   **MobileNetV1**: `--model mobilenet --width_mult 1.0`\n*   **MobileNetV2**: `--model mobilenetv2 --width_mult 1.0`\n*   **SqueezeNet**: `--model squeezenet --version 1.1`\n*   **ResNet-18**: `--model resnet --model_depth 18 --resnet_shortcut A`\n\n**微调预训练模型 (Fine-tuning):**\n若需加载预训练权重进行微调，添加 `--pretrain_path` 和 `--ft_portion` 参数：\n\n```bash\npython main.py --root_path ~\u002F \\\n\t--video_path ~\u002Fdatasets\u002Fjester 
\\\n\t--annotation_path Efficient-3DCNNs\u002Fannotation_UCF101\u002Fucf101_01.json \\\n\t--result_path Efficient-3DCNNs\u002Fresults \\\n\t--pretrain_path Efficient-3DCNNs\u002Fresults\u002Fkinetics_shufflenet_0.5x_G3_RGB_16_best.pth \\\n\t--dataset ucf101 \\\n\t--n_classes 600 \\\n\t--n_finetune_classes 101 \\\n\t--ft_portion last_layer \\\n\t--model shufflenet \\\n\t--groups 3 \\\n\t--width_mult 0.5 \\\n    ... (其他参数同上)\n```\n\n> **注意**: 在训练 Jester 数据集时，请勿使用 `RandomHorizontalFlip` 数据增强，因为这会改变部分手势类别的含义（例如将“向左滑”变为“向右滑”）。","某智慧零售团队需要在边缘计算设备（如 NVIDIA Jetson）上实时分析店内监控视频，以识别顾客的特定手势交互行为。\n\n### 没有 Efficient-3DCNNs 时\n- **硬件门槛过高**：传统的 3D ResNet-101 等模型参数量巨大，必须依赖昂贵的云端 GPU 集群进行推理，无法在低成本的边缘设备上运行。\n- **响应延迟严重**：视频数据需上传至云端处理再返回结果，网络传输导致识别延迟高达数百毫秒，无法满足“即时反馈”的交互需求。\n- **带宽成本激增**：为了维持实时性，需要持续上传多路高清视频流，产生了高昂的网络带宽费用和数据存储压力。\n- **部署灵活性差**：由于对算力要求苛刻，系统难以快速复制推广到成百上千个缺乏高性能服务器的线下门店。\n\n### 使用 Efficient-3DCNNs 后\n- **边缘端直接落地**：利用其提供的 3D MobileNetV2 或 3D ShuffleNetV2 等轻量化预训练模型，成功将算法部署在低功耗的边缘盒子上，无需云端依赖。\n- **实现毫秒级响应**：直接在本地完成视频帧的特征提取与分类，消除了网络往返时间，手势识别延迟降低至 50 毫秒以内，体验流畅自然。\n- **大幅降低运营成本**：仅需上传最终的识别事件标签而非原始视频流，带宽占用减少 99% 以上，显著节省了长期运营开支。\n- **弹性扩展能力增强**：凭借模型对不同复杂度等级（width_multiplier）的支持，团队可根据不同门店的硬件配置灵活调整模型大小，实现规模化快速铺设。\n\nEfficient-3DCNNs 通过提供资源高效型的 3D 卷积网络实现，成功打破了视频理解技术在边缘设备上的算力瓶颈，让实时智能分析变得低成本且触手可及。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fokankop_Efficient-3DCNNs_d41373e6.png","okankop","Okan Köpüklü","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fokankop_72ff2001.jpg",null,"Technical University of Munich","okankop.github.io","https:\u002F\u002Fgithub.com\u002Fokankop",[22,26],{"name":23,"color":24,"percentage":25},"Python","#3572A5",99.3,{"name":27,"color":28,"percentage":29},"Shell","#89e051",0.7,818,155,"2026-03-26T01:23:14","MIT",4,"未说明","未说明（基于 PyTorch 3D CNN 训练通常建议配备 NVIDIA GPU，但 README 未指定具体型号或显存要求）",{"notes":38,"python":39,"dependencies":40},"该工具主要用于视频动作识别。对于 Kinetics 数据集，默认配置是直接使用 OpenCV 从视频中读取帧进行训练，而非预先提取帧。Jester 
数据集训练时请勿使用“随机水平翻转”增强，以免改变动作类别标签（如左滑变为右滑）。代码库基于 Kensho Hara 的 3D-ResNets-PyTorch 项目构建。","Python 3",[41,42,43,44],"PyTorch 1.0.1.post2","OpenCV","FFmpeg","FFprobe",[46,47],"视频","其他",2,"ready","2026-03-27T02:49:30.150509","2026-04-06T05:37:41.556361",[53,58,63,68,73,78],{"id":54,"question_zh":55,"answer_zh":56,"source_url":57},16172,"加载预训练模型时，如何正确处理 state_dict 中的键名以及输入数据的归一化设置？","1. 加载模型时需去除并行数据加载器添加的前缀（通常是 'module.'，即前 7 个字符）。代码示例：\nstate_dict = {k[7:]: v for k, v in cp['state_dict'].items()}\nmodel.load_state_dict(state_dict)\n2. 模型是在输入范围为 [0, 255] 的情况下训练的。如果你使用 transforms.ToTensor() 将输入归一化到了 [0, 1]，则需要使用特定的均值：mean=[110.63666788, 103.16065604, 96.29023126]，标准差 std=[1, 1, 1]。\n3. 或者，直接使用项目中的 spatial_transforms.py 或 target_transforms.py 中的 ToTensor 函数，避免手动归一化到 [0, 1]。\n4. 注意输入图像格式：训练时使用的是 OpenCV 默认的 BGR 格式，如果转换为 RGB 可能会导致结果异常。","https:\u002F\u002Fgithub.com\u002Fokankop\u002FEfficient-3DCNNs\u002Fissues\u002F24",{"id":59,"question_zh":60,"answer_zh":61,"source_url":62},16173,"为什么测试时的准确率很低，且看起来模型没有在学习？","这通常是因为你查看的是“片段准确率”（clip accuracy），而不是最终的“视频准确率”（video accuracy）。在训练过程中显示的往往是单个片段的指标。你需要在完成训练后，运行专门的脚本（如 video_accuracy.py）来计算整个视频的分类准确率，那时的数值才会符合预期（例如 UCF101 上可达 84.9%）。","https:\u002F\u002Fgithub.com\u002Fokankop\u002FEfficient-3DCNNs\u002Fissues\u002F19",{"id":64,"question_zh":65,"answer_zh":66,"source_url":67},16174,"如何进行微调（Fine-tuning）？是否有具体的命令示例？","可以使用以下命令示例进行微调（以 UCF101 数据集为例，使用 Kinetics 预训练模型）：\npython main.py --root_path ~\u002F \\\n--video_path ~\u002Fdatasets\u002Fjester \\\n--annotation_path Efficient-3DCNNs\u002Fannotation_UCF101\u002Fucf101_01.json \\\n--result_path Efficient-3DCNNs\u002Fresults \\\n--pretrain_path Efficient-3DCNNs\u002Fresults\u002Fkinetics_shufflenet_0.5x_G3_RGB_16_best.pth \\\n--dataset ucf101 \\\n--n_classes 600 \\\n--n_finetune_classes 101 \\\n--ft_portion last_layer \\\n--model shufflenet \\\n--groups 3 \\\n--width_mult 0.5 \\\n--train_crop random \\\n--learning_rate 0.1 \\\n--sample_duration 16 \\\n--batch_size 64 \\\n--n_threads 
16 \\\n--checkpoint 1 \\\n--n_val_samples 1\n关键参数包括 --pretrain_path（预训练权重路径）、--n_finetune_classes（新数据集类别数）和 --ft_portion（微调部分，如 last_layer）。","https:\u002F\u002Fgithub.com\u002Fokankop\u002FEfficient-3DCNNs\u002Fissues\u002F3",{"id":69,"question_zh":70,"answer_zh":71,"source_url":72},16175,"Kinetics 数据集标注文件中的 begin_index 和 end_index 代表什么？视频是如何进行时间裁剪的？","begin_index 和 end_index 并不代表绝对的帧号或秒数，而是用于定义视频片段。在训练过程中，时间增强（temporal augmentation）是在每个 batch 创建时动态进行的。这意味着在每个 epoch 中，网络会接收到从视频中随机选择的一段连续帧（consecutive frames），而不是固定相同的片段。","https:\u002F\u002Fgithub.com\u002Fokankop\u002FEfficient-3DCNNs\u002Fissues\u002F2",{"id":74,"question_zh":75,"answer_zh":76,"source_url":77},16176,"论文或报告中提到的“视频分类准确率”是指 Top-1 还是 Top-5 准确率？","除非特别说明，通常指的是 Top-1 准确率。如果在实验中得到的结果与论文报告差距较大（例如论文报告 70.95%，而你的 Top-1 只有 52%），请确认你是否正确计算了视频级别的准确率（见相关问题关于 clip accuracy 与 video accuracy 的区别），并检查是否使用了正确的预训练模型和微调策略。","https:\u002F\u002Fgithub.com\u002Fokankop\u002FEfficient-3DCNNs\u002Fissues\u002F32",{"id":79,"question_zh":80,"answer_zh":81,"source_url":82},16177,"在 Jester 数据集上训练时遇到张量尺寸不匹配的错误（invalid argument 0: sizes of tensors must match...），如何解决？","该错误通常由数据加载或预处理阶段的维度不一致引起。虽然具体修复代码未在该线程详细展开，但维护者已确认该问题已解决。建议检查：\n1. 确保使用的注释文件（annotation file）与数据集版本匹配。\n2. 检查 --sample_duration 参数设置是否与数据预处理逻辑一致（例如是否所有视频都能提取出指定数量的帧）。\n3. 
重新克隆项目代码以确保使用最新修复版本。","https:\u002F\u002Fgithub.com\u002Fokankop\u002FEfficient-3DCNNs\u002Fissues\u002F5",[],[85,100,109,117,126,134],{"id":86,"name":87,"github_repo":88,"description_zh":89,"stars":90,"difficulty_score":48,"last_commit_at":91,"category_tags":92,"status":49},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[93,94,46,95,96,47,97,98,99],"图像","数据工具","插件","Agent","语言模型","开发框架","音频",{"id":101,"name":102,"github_repo":103,"description_zh":104,"stars":105,"difficulty_score":106,"last_commit_at":107,"category_tags":108,"status":49},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[96,93,98,97,47],{"id":110,"name":111,"github_repo":112,"description_zh":113,"stars":114,"difficulty_score":106,"last_commit_at":115,"category_tags":116,"status":49},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 
适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[97,93,98,47],{"id":118,"name":119,"github_repo":120,"description_zh":121,"stars":122,"difficulty_score":123,"last_commit_at":124,"category_tags":125,"status":49},3215,"awesome-machine-learning","josephmisiti\u002Fawesome-machine-learning","awesome-machine-learning 是一份精心整理的机器学习资源清单，汇集了全球优秀的机器学习框架、库和软件工具。面对机器学习领域技术迭代快、资源分散且难以甄选的痛点，这份清单按编程语言（如 Python、C++、Go 等）和应用场景（如计算机视觉、自然语言处理、深度学习等）进行了系统化分类，帮助使用者快速定位高质量项目。\n\n它特别适合开发者、数据科学家及研究人员使用。无论是初学者寻找入门库，还是资深工程师对比不同语言的技术选型，都能从中获得极具价值的参考。此外，清单还延伸提供了免费书籍、在线课程、行业会议、技术博客及线下聚会等丰富资源，构建了从学习到实践的全链路支持体系。\n\n其独特亮点在于严格的维护标准：明确标记已停止维护或长期未更新的项目，确保推荐内容的时效性与可靠性。作为机器学习领域的“导航图”，awesome-machine-learning 以开源协作的方式持续更新，旨在降低技术探索门槛，让每一位从业者都能高效地站在巨人的肩膀上创新。",72149,1,"2026-04-03T21:50:24",[98,47],{"id":127,"name":128,"github_repo":129,"description_zh":130,"stars":131,"difficulty_score":123,"last_commit_at":132,"category_tags":133,"status":49},2234,"scikit-learn","scikit-learn\u002Fscikit-learn","scikit-learn 是一个基于 Python 构建的开源机器学习库，依托于 SciPy、NumPy 等科学计算生态，旨在让机器学习变得简单高效。它提供了一套统一且简洁的接口，涵盖了从数据预处理、特征工程到模型训练、评估及选择的全流程工具，内置了包括线性回归、支持向量机、随机森林、聚类等在内的丰富经典算法。\n\n对于希望快速验证想法或构建原型的数据科学家、研究人员以及 Python 开发者而言，scikit-learn 是不可或缺的基础设施。它有效解决了机器学习入门门槛高、算法实现复杂以及不同模型间调用方式不统一的痛点，让用户无需重复造轮子，只需几行代码即可调用成熟的算法解决分类、回归、聚类等实际问题。\n\n其核心技术亮点在于高度一致的 API 设计风格，所有估算器（Estimator）均遵循相同的调用逻辑，极大地降低了学习成本并提升了代码的可读性与可维护性。此外，它还提供了强大的模型选择与评估工具，如交叉验证和网格搜索，帮助用户系统地优化模型性能。作为一个由全球志愿者共同维护的成熟项目，scikit-learn 以其稳定性、详尽的文档和活跃的社区支持，成为连接理论学习与工业级应用的最",65628,"2026-04-05T10:10:46",[98,47,94],{"id":135,"name":136,"github_repo":137,"description_zh":138,"stars":139,"difficulty_score":48,"last_commit_at":140,"category_tags":141,"status":49},3364,"keras","keras-team\u002Fkeras","Keras 
是一个专为人类设计的深度学习框架，旨在让构建和训练神经网络变得简单直观。它解决了开发者在不同深度学习后端之间切换困难、模型开发效率低以及难以兼顾调试便捷性与运行性能的痛点。\n\n无论是刚入门的学生、专注算法的研究人员，还是需要快速落地产品的工程师，都能通过 Keras 轻松上手。它支持计算机视觉、自然语言处理、音频分析及时间序列预测等多种任务。\n\nKeras 3 的核心亮点在于其独特的“多后端”架构。用户只需编写一套代码，即可灵活选择 TensorFlow、JAX、PyTorch 或 OpenVINO 作为底层运行引擎。这一特性不仅保留了 Keras 一贯的高层易用性，还允许开发者根据需求自由选择：利用 JAX 或 PyTorch 的即时执行模式进行高效调试，或切换至速度最快的后端以获得最高 350% 的性能提升。此外，Keras 具备强大的扩展能力，能无缝从本地笔记本电脑扩展至大规模 GPU 或 TPU 集群，是连接原型开发与生产部署的理想桥梁。",63927,"2026-04-04T15:24:37",[98,94,47]]