[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-philferriere--tfoptflow":3,"tool-philferriere--tfoptflow":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
companion quizzes, covering the complete journey from basic concepts to practical application, and it solves the beginner's problem of facing a vast body of knowledge with no structured guidance.\n\nDevelopers looking to change tracks, researchers who need to backfill algorithmic fundamentals, and hobbyists curious about AI can all benefit. Beyond clear theory, the course emphasizes hands-on practice so learners build solid skills step by step. Its standout feature is strong multi-language support: an automated pipeline provides versions in more than 50 languages, including Simplified Chinese, dramatically lowering the barrier for learners worldwide. The project runs on an open, collaborative model with an active community and continuously updated content, so learners get current, accurate material. If you are looking for a clear, friendly, professional way into machine learning, ML-For-Beginners is an ideal starting point.",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"Data Tools","Video","Plugin","Other","Audio",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow is a leading open-source retrieval-augmented generation (RAG) engine that builds a more accurate, reliable context layer for large language models. It pairs state-of-the-art RAG techniques with agent capabilities: it extracts knowledge efficiently from documents of all kinds and lets models reason and execute tasks on top of that knowledge.\n\nHallucination and stale knowledge are common pain points in LLM applications. By deeply parsing complex document structures (tables, charts, and mixed layouts), RAGFlow significantly improves retrieval accuracy, curbing fabricated answers and keeping responses both well-grounded and current. Its built-in agent mechanism goes a step further: the system can not only answer questions but also plan the steps needed to solve complex ones.\n\nThe tool is a strong fit for developers, enterprise engineering teams, and AI researchers -- whether building a private knowledge-base Q&A system or exploring how large models can land in vertical domains. RAGFlow offers a visual workflow editor and flexible APIs, lowering the bar for users without an algorithms background while satisfying professional developers' need for deep customization. Released under the Apache 2.0 license, it is becoming a key bridge between general-purpose large models and domain-specific knowledge.",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":10,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":114,"github_topics":115,"view_count":23,"oss_zip_url":82,"oss_zip_packed_at":82,"status":16,"created_at":125,"updated_at":126,"faqs":127,"releases":158},1323,"philferriere\u002Ftfoptflow","tfoptflow","Optical Flow Prediction with TensorFlow. Implements \"PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume,\" by Deqing Sun et al. (CVPR 2018)","tfoptflow is an open-source TensorFlow implementation of optical flow estimation: it turns two consecutive video frames into a \"motion map\" in which every pixel carries a vector telling you where it moved and how far. It fully reproduces the PWC-Net algorithm from the CVPR 2018 paper, fixing the pain points of earlier implementations that offered inference only (no training), ran only on Linux, or lacked multi-GPU support. Training, inference, mixed precision, and multi-GPU parallelism now work out of the box on both Windows and Linux, and the repo ships pre-trained models that outperform the original paper on the MPI-Sintel dataset.\nIf you work on video tracking, action recognition, segmentation, or any research\u002Fproduct that needs pixel-level motion information, tfoptflow lets you get started quickly, reproduce the results, and build on them.","# Optical Flow Prediction with TensorFlow\n\nThis repo provides a TensorFlow-based implementation of the wonderful paper \"PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume,\" by Deqing Sun et al. (CVPR 2018).\n\nThere are already a few attempts at implementing PWC-Net using TensorFlow out there. However, they either use outdated versions of the paper's CNN architectures, provide TF inference only (no TF training), work only on Linux platforms, or do not support multi-GPU training.\n\nThis implementation provides **both TF-based training and inference**. It is **portable**: because it doesn't use any dynamically loaded CUDA-based TensorFlow user ops, it **works on Linux and Windows**. It also **supports multi-GPU training** (the notebooks and results shown here were collected on a GTX 1080 Ti paired with a Titan X). The code also allows for **mixed-precision training**.
\n\nFinally, as shown in the [\"Links to pre-trained models\"](#links) section, we achieve better results than the ones reported in the official paper on the challenging MPI-Sintel 'final' dataset.\n\n# Table of Contents\n\n- [Background](#background)\n- [Environment Setup](#environment-setup)\n- [Links to pre-trained models](#links)\n- [PWC-Net](#pwc-net)\n  + [Basic Idea](#pwc-net-basic-idea)\n  + [Network](#pwc-net-network)\n  + [Jupyter Notebooks](#pwc-net-jupyter-notebooks)\n  + [Training](#pwc-net-training)\n    * [Multisteps learning rate schedule](#pwc-net-training-multisteps)\n    * [Cyclic learning rate schedule](#pwc-net-training-cyclic)\n    * [Mixed-precision training](#pwc-net-training-mixed-precision)\n  + [Evaluation](#pwc-net-eval)\n  + [Inference](#pwc-net-predict)\n    * [Running inference on the test split of a dataset](#pwc-net-predict-dataset)\n    * [Running inference on image pairs](#pwc-net-predict-img-pairs)\n- [Datasets](#datasets)\n- [References](#references)\n- [Acknowledgments](#acknowledgments)\n\n# Background\n\nThe purpose of **optical flow estimation** is to generate a dense 2D map of real-valued (u,v) vectors describing the motion occurring from one video frame to the next. This information can be very useful when trying to solve computer vision problems such as **object tracking, action recognition, and video object segmentation**.\n\nFigure (a) below, from [[2017a]](#2017a), shows training pairs (black-and-white frames 0 and 1) from the [Middlebury](http:\u002F\u002Fvision.middlebury.edu\u002Fflow\u002F) Optical Flow dataset as well as their color-coded optical flow ground truth. Figure (b) indicates the color coding used for easy visualization of the (u,v) flow fields. Usually, vector orientation is represented by color hue while vector length is encoded by color saturation:\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_52c065b73828.png)\n\nThe most common measures used to evaluate the quality of optical flow estimation are **angular error (AE)** and **endpoint error (EPE)**. The angular error between two optical flow vectors *(u\u003Csub>0\u003C\u002Fsub>, v\u003Csub>0\u003C\u002Fsub>)* and *(u\u003Csub>1\u003C\u002Fsub>, v\u003Csub>1\u003C\u002Fsub>)* is defined as *arccos((u\u003Csub>0\u003C\u002Fsub>, v\u003Csub>0\u003C\u002Fsub>) · (u\u003Csub>1\u003C\u002Fsub>, v\u003Csub>1\u003C\u002Fsub>))*. The endpoint error measures the distance between the endpoints of two optical flow vectors *(u\u003Csub>0\u003C\u002Fsub>, v\u003Csub>0\u003C\u002Fsub>)* and *(u\u003Csub>1\u003C\u002Fsub>, v\u003Csub>1\u003C\u002Fsub>)* and is defined as *sqrt((u\u003Csub>0\u003C\u002Fsub> - u\u003Csub>1\u003C\u002Fsub>)\u003Csup>2\u003C\u002Fsup> + (v\u003Csub>0\u003C\u002Fsub> - v\u003Csub>1\u003C\u002Fsub>)\u003Csup>2\u003C\u002Fsup>)*.
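\n\nTo make these two metrics concrete, here is a small NumPy sketch (not part of this repo; shapes and values are illustrative) that computes the average EPE and AE over dense (H, W, 2) flow maps:\n\n```python\nimport numpy as np\n\n# Average endpoint error between two (H, W, 2) flow maps\ndef epe(flow_pred, flow_gt):\n    return np.mean(np.linalg.norm(flow_pred - flow_gt, axis=-1))\n\n# Mean angular error in degrees, treating each (u, v) as a 2D vector; the\n# vectors are normalized so the arccos argument stays within [-1, 1]\ndef angular_error(flow_pred, flow_gt, eps=1e-8):\n    dot = np.sum(flow_pred * flow_gt, axis=-1)\n    norms = np.linalg.norm(flow_pred, axis=-1) * np.linalg.norm(flow_gt, axis=-1)\n    cos = np.clip(dot \u002F (norms + eps), -1.0, 1.0)\n    return np.mean(np.degrees(np.arccos(cos)))\n\n# Toy example: a constant 1-pixel rightward shift vs. a noisy estimate\nflow_gt = np.tile([1.0, 0.0], (436, 1024, 1))\nflow_pred = flow_gt + np.random.normal(scale=0.1, size=flow_gt.shape)\nprint(epe(flow_pred, flow_gt), angular_error(flow_pred, flow_gt))\n```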
\n\n\n# Environment Setup \u003Ca name=\"environment-setup\">\u003C\u002Fa>\n\nThe code in this repo was developed and tested using Anaconda3 v.5.2.0. To reproduce our conda environment, please refer to the following files:\n\n*On Ubuntu:*\n- [`conda list`](tfoptflow\u002Fsetup\u002Fdlubu36.txt) and [`conda env export`](tfoptflow\u002Fsetup\u002Fdlubu36.yml)\n\n*On Windows:*\n- [`conda list`](tfoptflow\u002Fsetup\u002Fdlwin36.txt) and [`conda env export`](tfoptflow\u002Fsetup\u002Fdlwin36.yml)\n\n# Links to pre-trained models \u003Ca name=\"links\">\u003C\u002Fa>\n\nPre-trained models can be found [here](http:\u002F\u002Fbit.ly\u002Ftfoptflow). They come in two flavors: \"small\" (`sm`, with 4,705,064 learned parameters) models don't use dense or residual connections, while \"large\" (`lg`, with 14,079,050 learned parameters) models do. They are all built with a 6-level pyramid, upsampling the level-2 prediction by 4 in each dimension to generate the final flow, and construct an 81-channel cost volume at each level from a search range (maximum displacement) of 4.\n\nPlease note that we trained these models using slightly different datasets and learning rate schedules. The official multistep schedule discussed in [[2018a]](#2018a) is as follows: S\u003Csub>long\u003C\u002Fsub> (1.2M iters of training, batch size 8) + S\u003Csub>fine\u003C\u002Fsub> (500k iters of finetuning, batch size 4). Ours is S\u003Csub>long\u003C\u002Fsub> only, 1.2M iters, batch size 8, on a mix of `FlyingChairs` and `FlyingThings3DHalfRes`. `FlyingThings3DHalfRes` is our own version of `FlyingThings3D` in which every input image pair and ground-truth flow has been **downsampled by two** in each dimension. We also use a **different set of augmentation techniques**.\n\n## Model performance\n\n| Model name | Notebooks | FlyingChairs (384x512) AEPE | Sintel clean (436x1024) AEPE | Sintel final (436x1024) AEPE |\n| :---: | :---: | :---: | :---: | :---: |\n| `pwcnet-lg-6-2-multisteps-chairsthingsmix` | [train](tfoptflow\u002Fpwcnet_train_lg-6-2-multisteps-chairsthingsmix.ipynb) | 1.44 ([notebook](tfoptflow\u002Fpwcnet_eval_lg-6-2-multisteps-chairsthingsmix_flyingchairs.ipynb)) | 2.60 ([notebook](tfoptflow\u002Fpwcnet_eval_lg-6-2-multisteps-chairsthingsmix_mpisintelclean.ipynb)) | 3.70 ([notebook](tfoptflow\u002Fpwcnet_eval_lg-6-2-multisteps-chairsthingsmix_mpisintelfinal.ipynb)) |\n| `pwcnet-sm-6-2-multisteps-chairsthingsmix` | [train](tfoptflow\u002Fpwcnet_train_sm-6-2-multisteps-chairsthingsmix.ipynb) | 1.71 ([notebook](tfoptflow\u002Fpwcnet_eval_sm-6-2-multisteps-chairsthingsmix_flyingchairs.ipynb)) | 2.96 ([notebook](tfoptflow\u002Fpwcnet_eval_sm-6-2-multisteps-chairsthingsmix_mpisintelclean.ipynb)) | 3.83 ([notebook](tfoptflow\u002Fpwcnet_eval_sm-6-2-multisteps-chairsthingsmix_mpisintelfinal.ipynb)) |\n\nAs a reference, here are the official, reported results:\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_070f128e3b84.png)\n\n## Model inference times\n\nWe also measured the following MPI-Sintel (436 x 1024) inference times on a few GPUs:\n\n| Model name |  Titan X  |  GTX 1080  |  GTX 1080 Ti  |\n| :---: | :---: | :---: | :---: |\n| `pwcnet-lg-6-2-cyclic-chairsthingsmix` | 90ms | 81ms | 68ms |\n| `pwcnet-sm-6-2-cyclic-chairsthingsmix` | 68.5ms | 64.4ms | 53.8ms |\n\nA few clarifications about the numbers above...\n\nFirst, please note that this implementation is, by design, portable, i.e., it doesn't use any user-defined CUDA kernels, whereas the official NVidia implementation does. Ours will work on any OS and any hardware configuration (even one without a GPU) that can run TensorFlow.\n\nSecond, the timing numbers we report are the inference times of the models trained on `FlyingChairs` and `FlyingThings3DHalfRes`. These are models you can train longer, or finetune using an additional dataset, should you want to do so. In other words, **these graphs haven't been frozen yet**.\n\nIn a typical production environment, you would freeze the model after final training\u002Ffinetuning and optimize the graph for whatever platform(s) you need to distribute it on using TensorFlow XLA or TensorRT. In that important context, **the inference numbers we report on unoptimized graphs are rather meaningless**.
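\n\nFor reference, freezing a trained checkpoint in TF1 typically looks like the minimal sketch below. The checkpoint path matches the models above, but the output node name is an assumption -- inspect your own graph to find the actual op name before using it:\n\n```python\nimport tensorflow as tf\n\nckpt_path = '.\u002Fmodels\u002Fpwcnet-lg-6-2-multisteps-chairsthingsmix\u002Fpwcnet.ckpt-595000'\noutput_nodes = ['pwcnet\u002Fflow_pred']  # assumed name of the final flow op\n\nwith tf.Session() as sess:\n    # Restore the graph structure and the trained weights\n    saver = tf.train.import_meta_graph(ckpt_path + '.meta')\n    saver.restore(sess, ckpt_path)\n    # Bake variables into constants so the graph can be optimized and deployed\n    frozen = tf.graph_util.convert_variables_to_constants(\n        sess, sess.graph.as_graph_def(), output_nodes)\n\nwith tf.gfile.GFile('pwcnet_frozen.pb', 'wb') as f:\n    f.write(frozen.SerializeToString())\n```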
\n\n# PWC-Net \u003Ca name=\"pwc-net\">\u003C\u002Fa>\n\n## Basic Idea \u003Ca name=\"pwc-net-basic-idea\">\u003C\u002Fa>\n\nPer [[2018a]](#2018a), PWC-Net improves on FlowNet2 [[2016a]](#2016a) by adding domain knowledge to the design of the network. The basic idea behind optical flow estimation is that a pixel will retain most of its brightness over time despite a positional change from one frame to the next (\"brightness constancy\"). We can grab a small patch around a pixel in video frame 1 and find another small patch in video frame 2 that maximizes some similarity function (e.g., normalized cross-correlation) of the two patches. Repeating that search over the entire frame, looking for a peak, generates what's called a **cost volume** (the C in PWC). This technique is fairly robust (invariant to color change) but expensive to compute, and in some cases you may need a fairly large patch to reduce the number of false positives in frame 1, raising the complexity even more.\n\nTo alleviate the cost of generating the cost volume, the first optimization is **pyramidal processing** (the P in PWC). Using a lower-resolution image lets you slide a smaller patch from frame 1 over a smaller version of frame 2, yielding a smaller motion vector, and then use that estimate as a hint for a more targeted search at the next resolution level of the pyramid. That multiscale motion estimation can be performed in the image domain or in the feature domain (i.e., using the downscaled feature maps generated by a convnet). In practice, PWC-Net **warps** (the W in PWC) the features of frame 2 using an upsampled version of the flow estimated at a lower resolution, because this leads to searching for a smaller motion increment in the next higher-resolution level of the pyramid (hence allowing for a smaller search range). Here's a screenshot of a [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=vVU8XV0Ac_0) given by [Deqing Sun](http:\u002F\u002Fresearch.nvidia.com\u002Fperson\u002Fdeqing-sun) that illustrates this process using a 2-level pyramid:\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_2d95bfdc3621.png)\n\nNote that none of the three optimizations used here (P\u002FW\u002FC) is unique to PWC-Net. These techniques were also used in SpyNet [[2016b]](#2016b) and FlowNet2 [[2016a]](#2016a). Here, however, they are applied **to the CNN features**, rather than to an image pyramid:\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_ab9c50bdbe1b.png)\n\nThe authors also acknowledge that careful data augmentation (e.g., adding horizontal flipping) was necessary to reach the best performance. To improve robustness, they also recommend training on multiple datasets (Sintel+KITTI+HD1K, for example) with careful rebalancing of class imbalance.\n\nSince this algorithm only works on two consecutive frames at a time, it has the same limitations as any method that uses image pairs only (instead of n frames with n>2). Namely, if an object moves out of frame, the predicted flow will likely have a large EPE. As the authors remark, techniques that use a larger number of frames can compensate for this limitation by propagating motion information over time. The model also sometimes fails for small, fast-moving objects.
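\n\nAs a toy illustration of the cost-volume construction described above (the real implementation operates on downscaled CNN features and is vectorized in TensorFlow; the shapes here are arbitrary assumptions), a brute-force NumPy version looks like this -- note that a search range of 4 yields the 81-channel volume mentioned in the pre-trained models section:\n\n```python\nimport numpy as np\n\ndef cost_volume(feat1, feat2, max_disp=4):\n    # Brute-force cost volume between two (H, W, C) feature maps.\n    # Returns (H, W, (2*max_disp+1)**2); with max_disp=4, that is 81 channels.\n    h, w, _ = feat1.shape\n    padded = np.pad(feat2, ((max_disp, max_disp), (max_disp, max_disp), (0, 0)),\n                    mode='constant')\n    channels = []\n    for dy in range(2 * max_disp + 1):\n        for dx in range(2 * max_disp + 1):\n            shifted = padded[dy:dy + h, dx:dx + w, :]\n            # Correlation score for displacement (dy - max_disp, dx - max_disp)\n            channels.append(np.mean(feat1 * shifted, axis=-1))\n    return np.stack(channels, axis=-1)\n\nfeat1 = np.random.rand(32, 64, 16)\nfeat2 = np.random.rand(32, 64, 16)\nprint(cost_volume(feat1, feat2).shape)  # (32, 64, 81)\n```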
\n\n\n## Network \u003Ca name=\"pwc-net-network\">\u003C\u002Fa>\n\nHere's a picture of the network architecture described in [[2018a]](#2018a):\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_b0c0c8a9e856.png)\n\n## Jupyter Notebooks \u003Ca name=\"pwc-net-jupyter-notebooks\">\u003C\u002Fa>\n\nThe recommended way to test this implementation is to use the following Jupyter notebooks:\n\n- [`Optical flow datasets (prep and inspect)`](tfoptflow\u002Fdataset_prep.ipynb): In this notebook, we:\n  + Load the optical flow datasets and (automatically) create the additional data necessary to train our models (a one-time operation on first load).\n  + Show sample images\u002Fflows from each dataset. Note that you must have downloaded and unpacked the master data files already. See [Datasets](#datasets) for download links to each dataset.\n\n- [`PWC-Net-large model training (with multisteps learning rate schedule)`](tfoptflow\u002Fpwcnet_train_lg-6-2-multisteps-chairsthingsmix.ipynb): In this notebook, we:\n  + Use a PWC-Net-large model (with dense and residual connections), a 6-level pyramid, and upsampling of level 2 by 4 as the final flow prediction.\n  + Train the model on a mix of the `FlyingChairs` and `FlyingThings3DHalfRes` datasets using the S\u003Csub>long\u003C\u002Fsub> schedule described in [[2016a]](#2016a).\n\n- [`PWC-Net-small model training (with multisteps learning rate schedule)`](tfoptflow\u002Fpwcnet_train_sm-6-2-multisteps-chairsthingsmix.ipynb): Same setup, but we train the small version of the model (no dense or residual connections).\n\n- [`PWC-Net-large model training (with cyclical learning rate schedule)`](tfoptflow\u002Fpwcnet_train_lg-6-2-cyclic-chairsthingsmix.ipynb): We train the large version of the model using the Cyclic\u003Csub>short\u003C\u002Fsub> schedule.\n\n- [`PWC-Net-small model training (with cyclical learning rate schedule)`](tfoptflow\u002Fpwcnet_train_sm-6-2-cyclic-chairsthingsmix.ipynb): We train the small version of the model (no dense or residual connections) using the Cyclic\u003Csub>short\u003C\u002Fsub> schedule.\n\n- [`PWC-Net-large model evaluation (on FlyingChairs validation split)`](tfoptflow\u002Fpwcnet_eval_lg-6-2-multisteps-chairsthingsmix_flyingchairs.ipynb): In this notebook, we:\n  + Evaluate the PWC-Net-large model trained on a mix of the `FlyingChairs` and `FlyingThings3DHalfRes` datasets using the S\u003Csub>long\u003C\u002Fsub> schedule.\n  + Run the evaluation on the **validation split** of the `FlyingChairs` dataset, yielding an average EPE of 1.44.\n  + Perform basic error analysis.\n\n- [`PWC-Net-large model evaluation (on MPI-Sintel 'clean')`](tfoptflow\u002Fpwcnet_eval_lg-6-2-multisteps-chairsthingsmix_mpisintelclean.ipynb): In this notebook, we:\n  + Evaluate the same S\u003Csub>long\u003C\u002Fsub>-trained PWC-Net-large model on the **'clean'** version of the MPI-Sintel dataset, yielding an average EPE of 2.60.\n  + Perform basic error analysis.\n\n- [`PWC-Net-large model evaluation (on MPI-Sintel 'final')`](tfoptflow\u002Fpwcnet_eval_lg-6-2-multisteps-chairsthingsmix_mpisintelfinal.ipynb): In this notebook, we:\n  + Evaluate the same S\u003Csub>long\u003C\u002Fsub>-trained PWC-Net-large model on the **'final'** version of the MPI-Sintel dataset, yielding an average EPE of 3.70.\n  + Perform basic error analysis.
\n\n## Training \u003Ca name=\"pwc-net-training\">\u003C\u002Fa>\n\n\n### Multisteps learning rate schedule \u003Ca name=\"pwc-net-training-multisteps\">\u003C\u002Fa>\n\nUnlike the original paper, we do not train on `FlyingChairs` and `FlyingThings3D` sequentially (i.e., pre-train on `FlyingChairs`, then finetune on `FlyingThings3D`). This is because the average flow magnitude on the `MPI-Sintel` dataset is only 13.5, while the average flow magnitudes on `FlyingChairs` and `FlyingThings3D` are 11.1 and 38, respectively. In our experiments, finetuning on `FlyingThings3D` would only yield worse results on `MPI-Sintel`.\n\nWe got more stable results by using a half-resolution version of the `FlyingThings3D` dataset, whose average flow magnitude of 19 is much closer to `FlyingChairs` and `MPI-Sintel` in that respect. We then trained on a mix of the `FlyingChairs` and `FlyingThings3DHalfRes` datasets. This mix, of course, could be extended with additional datasets.\n\nHere are the training curves for the S\u003Csub>long\u003C\u002Fsub> training notebooks listed above:\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_c00e8fa2c605.png)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_f91718701a2b.png)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_356cf3b1ee08.png)\n\nNote that if you click on the `IMAGE` tab in TensorBoard while running the training notebooks above, you will be able to visualize the progress of the training on a few validation samples (including the predicted flows at each pyramid level), as demonstrated here:\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_cf7527e2678f.png)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_398f0fdc6f3d.png)
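\n\nFor reference, a multistep schedule of this kind can be sketched in TF1 as follows. The boundaries, learning rates, and optimizer choice below are illustrative of the S\u003Csub>long\u003C\u002Fsub> recipe, not a copy of this repo's settings -- the training notebooks hold the values actually used:\n\n```python\nimport tensorflow as tf\n\n# Multistep schedule sketch: start at 1e-4 and halve the learning rate at\n# fixed iteration boundaries over a 1.2M-iteration run (values assumed)\nglobal_step = tf.train.get_or_create_global_step()\nboundaries = [400000, 600000, 800000, 1000000]\nvalues = [1e-4, 5e-5, 2.5e-5, 1.25e-5, 6.25e-6]\nlr = tf.train.piecewise_constant(global_step, boundaries, values)\noptimizer = tf.train.AdamOptimizer(learning_rate=lr)\n```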
\n\n### Cyclic learning rate schedule \u003Ca name=\"pwc-net-training-cyclic\">\u003C\u002Fa>\n\nIf you don't want to use the long training schedule but would still like to play with this code, try our very short **cyclic learning rate schedule** (100k iters, batch size 8). The results are nowhere near as good, but they allow for **quick experimentation**:\n\n| Model name | Notebooks | FlyingChairs (384x512) AEPE | Sintel clean (436x1024) AEPE | Sintel final (436x1024) AEPE |\n| :---: | :---: | :---: | :---: | :---: |\n| `pwcnet-lg-6-2-cyclic-chairsthingsmix` | [train](tfoptflow\u002Fpwcnet_train_lg-6-2-cyclic-chairsthingsmix.ipynb) | 2.67 ([notebook](tfoptflow\u002Fpwcnet_eval_lg-6-2-cyclic-chairsthingsmix_flyingchairs.ipynb)) | 3.99 ([notebook](tfoptflow\u002Fpwcnet_eval_lg-6-2-cyclic-chairsthingsmix_mpisintelclean.ipynb)) | 5.08 ([notebook](tfoptflow\u002Fpwcnet_eval_lg-6-2-cyclic-chairsthingsmix_mpisintelfinal.ipynb)) |\n| `pwcnet-sm-6-2-cyclic-chairsthingsmix` | [train](tfoptflow\u002Fpwcnet_train_sm-6-2-cyclic-chairsthingsmix.ipynb) | 2.79 ([notebook](tfoptflow\u002Fpwcnet_eval_sm-6-2-cyclic-chairsthingsmix_flyingchairs.ipynb)) | 4.34 ([notebook](tfoptflow\u002Fpwcnet_eval_sm-6-2-cyclic-chairsthingsmix_mpisintelclean.ipynb)) | 5.3 ([notebook](tfoptflow\u002Fpwcnet_eval_sm-6-2-cyclic-chairsthingsmix_mpisintelfinal.ipynb)) |\n\nBelow are the training curves for the Cyclic\u003Csub>short\u003C\u002Fsub> training notebooks:\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_276b8052438a.png)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_46226d1cc875.png)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_43f7fc376217.png)\n\n### Mixed-precision training \u003Ca name=\"pwc-net-training-mixed-precision\">\u003C\u002Fa>\n\nYou can speed up training even further by using **mixed-precision** training. But, again, don't expect the same level of accuracy:\n\n| Model name | Notebooks | FlyingChairs (384x512) AEPE | Sintel clean (436x1024) AEPE | Sintel final (436x1024) AEPE |\n| :---: | :---: | :---: | :---: | :---: |\n| `pwcnet-sm-6-2-cyclic-chairsthingsmix-fp16` | [train](tfoptflow\u002Fpwcnet_train_sm-6-2-cyclic-chairsthingsmix-fp16.ipynb) | 2.47 ([notebook](tfoptflow\u002Fpwcnet_eval_sm-6-2-cyclic-chairsthingsmix-fp16.ipynb)) | 3.77 ([notebook](tfoptflow\u002Fpwcnet_eval_sm-6-2-cyclic-chairsthingsmix-fp16.ipynb)) | 4.90 ([notebook](tfoptflow\u002Fpwcnet_eval_sm-6-2-cyclic-chairsthingsmix-fp16.ipynb)) |
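\n\nThe gist of the fp16 recipe is loss scaling: keep fp32 master weights, compute in half precision, and scale the loss up before backprop so small gradients survive the fp16 range, then scale the gradients back down before applying them. The following is a generic TF1 sketch of that idea, not this repo's exact implementation:\n\n```python\nimport tensorflow as tf\n\nw = tf.get_variable('w', shape=[8], dtype=tf.float32)   # fp32 master weight\nloss = tf.reduce_mean(tf.square(tf.cast(w, tf.float16)))  # fp16 compute path\n\nloss_scale = 128.0  # static scale; dynamic scaling is also common\nopt = tf.train.AdamOptimizer(1e-4)\ngrads_and_vars = opt.compute_gradients(tf.cast(loss, tf.float32) * loss_scale)\n# Undo the scaling before the update so step sizes stay correct\nunscaled = [(g \u002F loss_scale, v) for g, v in grads_and_vars if g is not None]\ntrain_op = opt.apply_gradients(unscaled)\n```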
\n\n## Evaluation \u003Ca name=\"pwc-net-eval\">\u003C\u002Fa>\n\nAs shown in the evaluation notebooks, and as expected, it becomes harder for the PWC-Net models to deliver accurate flow predictions when the average flow magnitude from one frame to the next is high:\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_2a7ac80f7503.png)\n\nIt is especially hard for this model -- as for any other two-frame motion estimator -- to generate accurate predictions when picture elements simply disappear out of frame or suddenly fly in:\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_a9d66b929cc5.png)\n\nStill, when the average motion is moderate, both the small and large models generate remarkable results:\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_20b04a8e1d4a.png)
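\n\nA simple way to reproduce this kind of error analysis is to bucket the per-pixel EPE by ground-truth flow magnitude; `epe_by_magnitude` below is a hypothetical helper, not part of this repo, and the bin edges are arbitrary:\n\n```python\nimport numpy as np\n\ndef epe_by_magnitude(flow_pred, flow_gt, bins=(0, 5, 10, 20, 40, np.inf)):\n    # Bucket per-pixel endpoint error by ground-truth flow magnitude,\n    # mirroring the 'harder when motion is large' observation above\n    err = np.linalg.norm(flow_pred - flow_gt, axis=-1).ravel()\n    mag = np.linalg.norm(flow_gt, axis=-1).ravel()\n    for lo, hi in zip(bins[:-1], bins[1:]):\n        mask = (mag >= lo) & (mag \u003C hi)\n        if mask.any():\n            print(f'|flow| in [{lo}, {hi}): mean EPE = {err[mask].mean():.2f}')\n```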
\n\n## Inference \u003Ca name=\"pwc-net-predict\">\u003C\u002Fa>\n\nThere are two ways you can call the code provided here to generate flow predictions for your own dataset:\n\n- Pass a list of image pairs to a `ModelPWCNet` object using its `predict_from_img_pairs()` method\n- Pass an `OpticalFlowDataset` object to a `ModelPWCNet` object and call its `predict()` method\n\n### Running inference on image pairs \u003Ca name=\"pwc-net-predict-img-pairs\">\u003C\u002Fa>\n\nIf you want to use a pre-trained PWC-Net model on your own set of images, you can pass a list of image pairs to a `ModelPWCNet` object using its `predict_from_img_pairs()` method, as demonstrated here:\n\n```python\nfrom __future__ import absolute_import, division, print_function\nfrom copy import deepcopy\nfrom skimage.io import imread\nfrom model_pwcnet import ModelPWCNet, _DEFAULT_PWCNET_TEST_OPTIONS\nfrom visualize import display_img_pairs_w_flows\n\n# Build a list of image pairs to process\nimg_pairs = []\nfor pair in range(1, 4):\n    image_path1 = f'.\u002Fsamples\u002Fmpisintel_test_clean_ambush_1_frame_00{pair:02d}.png'\n    image_path2 = f'.\u002Fsamples\u002Fmpisintel_test_clean_ambush_1_frame_00{pair+1:02d}.png'\n    image1, image2 = imread(image_path1), imread(image_path2)\n    img_pairs.append((image1, image2))\n\n# TODO: Set device to use for inference\n# Here, we're using a GPU (use '\u002Fdevice:CPU:0' to run inference on the CPU)\ngpu_devices = ['\u002Fdevice:GPU:0']\ncontroller = '\u002Fdevice:GPU:0'\n\n# TODO: Set the path to the trained model (make sure you've downloaded it first from http:\u002F\u002Fbit.ly\u002Ftfoptflow)\nckpt_path = '.\u002Fmodels\u002Fpwcnet-lg-6-2-multisteps-chairsthingsmix\u002Fpwcnet.ckpt-595000'\n\n# Configure the model for inference, starting with the default options\nnn_opts = deepcopy(_DEFAULT_PWCNET_TEST_OPTIONS)\nnn_opts['verbose'] = True\nnn_opts['ckpt_path'] = ckpt_path\nnn_opts['batch_size'] = 1\nnn_opts['gpu_devices'] = gpu_devices\nnn_opts['controller'] = controller\n\n# We're running the PWC-Net-large model in quarter-resolution mode,\n# i.e., with a 6-level pyramid, upsampling the level-2 prediction by 4 in each\n# dimension to produce the final flow\nnn_opts['use_dense_cx'] = True\nnn_opts['use_res_cx'] = True\nnn_opts['pyr_lvls'] = 6\nnn_opts['flow_pred_lvl'] = 2\n\n# The images in this dataset are not multiples of 64 in size, while the model\n# generates flows padded to multiples of 64. Hence, we need to crop the\n# predicted flows to their original size\nnn_opts['adapt_info'] = (1, 436, 1024, 2)\n\n# Instantiate the model in inference mode and display the model configuration\nnn = ModelPWCNet(mode='test', options=nn_opts)\nnn.print_config()\n\n# Generate the predictions and display them\npred_labels = nn.predict_from_img_pairs(img_pairs, batch_size=1, verbose=False)\ndisplay_img_pairs_w_flows(img_pairs, pred_labels)\n```\n\nThe code above can be found in the [`pwcnet_predict_from_img_pairs.ipynb`](tfoptflow\u002Fpwcnet_predict_from_img_pairs.ipynb) notebook and the [`pwcnet_predict_from_img_pairs.py`](tfoptflow\u002Fpwcnet_predict_from_img_pairs.py) script.\n\n### Running inference on the test split of a dataset \u003Ca name=\"pwc-net-predict-dataset\">\u003C\u002Fa>\n\nIf you want to train a PWC-Net model from scratch, or finetune a pre-trained PWC-Net model using your own dataset, you will need to **implement a dataset handler** that derives from the `OpticalFlowDataset` base class in [`dataset_base.py`](tfoptflow\u002Fdataset_base.py).\n\nWe provide several dataset handlers for well-known datasets, such as MPI-Sintel ([`dataset_mpisintel.py`](tfoptflow\u002Fdataset_mpisintel.py)), FlyingChairs ([`dataset_flyingchairs.py`](tfoptflow\u002Fdataset_flyingchairs.py)), FlyingThings3D ([`dataset_flyingthings3d.py`](tfoptflow\u002Fdataset_flyingthings3d.py)), and KITTI ([`dataset_kitti.py`](tfoptflow\u002Fdataset_kitti.py)). Any one of them is a good starting point for figuring out how to implement your own.\n\nPlease note that this is not complicated work; the derived class does little beyond telling the base class which lists of files to use for training, validation, and testing, leaving the heavy lifting to the base class.\n\nOnce you have a dataset handler, you can pass it to a `ModelPWCNet` object and call its `predict()` method to generate flow predictions for its test split, as shown in the [`pwcnet_predict.ipynb`](tfoptflow\u002Fpwcnet_predict.ipynb) notebook and the [`pwcnet_predict.py`](tfoptflow\u002Fpwcnet_predict.py) script.
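\n\nThe file lists a handler manages point at image pairs and their ground-truth flows, which FlyingChairs and MPI-Sintel ship in the Middlebury `.flo` format. The repo's dataset utilities read these for you, but a minimal standalone reader (independent of this repo) is easy to sketch:\n\n```python\nimport numpy as np\n\ndef read_flo(path):\n    # Minimal reader for Middlebury '.flo' ground-truth flow files:\n    # a float32 magic number, int32 width and height, then 2*w*h float32s\n    with open(path, 'rb') as f:\n        magic = np.fromfile(f, np.float32, count=1)[0]\n        assert magic == 202021.25, 'invalid .flo file'\n        w = int(np.fromfile(f, np.int32, count=1)[0])\n        h = int(np.fromfile(f, np.int32, count=1)[0])\n        data = np.fromfile(f, np.float32, count=2 * w * h)\n    return data.reshape(h, w, 2)  # one (u, v) vector per pixel\n```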
\n\n## Datasets \u003Ca name=\"datasets\">\u003C\u002Fa>\n\nDatasets most commonly used for optical flow estimation include:\n\n- FlyingThings3D [[image pairs](https:\u002F\u002Flmb.informatik.uni-freiburg.de\u002Fdata\u002FSceneFlowDatasets_CVPR16\u002FRelease_april16\u002Fdata\u002FFlyingThings3D\u002Fraw_data\u002Fflyingthings3d__frames_cleanpass.tar) + [flows](https:\u002F\u002Flmb.informatik.uni-freiburg.de\u002Fdata\u002FSceneFlowDatasets_CVPR16\u002FRelease_april16\u002Fdata\u002FFlyingThings3D\u002Fderived_data\u002Fflyingthings3d__optical_flow.tar.bz2) + [all_unused_files.txt](https:\u002F\u002Flmb.informatik.uni-freiburg.de\u002Fresources\u002Fdatasets\u002FSceneFlow\u002Fassets\u002Fall_unused_files.txt)]\n- FlyingChairs [[image pairs + flows](https:\u002F\u002Flmb.informatik.uni-freiburg.de\u002Fdata\u002FFlyingChairs\u002FFlyingChairs.zip) + [FlyingChairs_train_val split](https:\u002F\u002Flmb.informatik.uni-freiburg.de\u002Fresources\u002Fdatasets\u002FFlyingChairs\u002FFlyingChairs_train_val.txt)]\n- MPI Sintel [[zip]](http:\u002F\u002Ffiles.is.tue.mpg.de\u002Fsintel\u002FMPI-Sintel-complete.zip)\n- KITTI Flow 2012 [[zip]](http:\u002F\u002Fwww.cvlibs.net\u002Fdownload.php?file=data_stereo_flow.zip) and\u002For KITTI Flow 2015 [[zip]](http:\u002F\u002Fwww.cvlibs.net\u002Fdownload.php?file=data_scene_flow.zip)\n\nAdditional optical flow datasets (not used here):\n\n- Middlebury Optical Flow [[web]](http:\u002F\u002Fvision.middlebury.edu\u002Fflow\u002F)\n- Heidelberg HD1K Flow [[web]](http:\u002F\u002Fhci-benchmark.org\u002Fflow)\n\nPer [[2018a]](#2018a), KITTI and Sintel are currently the most challenging and widely used benchmarks for optical flow. The KITTI benchmark targets autonomous driving applications, and its semi-dense ground truth is collected using LIDAR. The 2012 set consists only of static scenes. The 2015 set extends it to dynamic scenes via human annotations and is more challenging for existing methods because of its large motions, severe illumination changes, and occlusions.\n\nThe Sintel benchmark is created from the open-source animated movie \"Sintel\" and comes in two passes, clean and final. The final pass contains strong atmospheric effects, motion blur, and camera noise, which cause severe problems for existing methods.\n\n## References\n\n### 2018\n\n- [2018a]\u003Ca name=\"2018a\">\u003C\u002Fa> Sun et al. 2018. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. [[arXiv]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1709.02371) [[web]](http:\u002F\u002Fresearch.nvidia.com\u002Fpublication\u002F2018-02_PWC-Net%3A-CNNs-for) [[PyTorch (Official)]](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FPWC-Net\u002Ftree\u002Fmaster\u002FPyTorch) [[PyTorch]](https:\u002F\u002Fgithub.com\u002Fsniklaus\u002Fpytorch-pwc) [[PyTorch]](https:\u002F\u002Fgithub.com\u002FRanhaoKang\u002FPWC-Net_pytorch) [[Caffe (Official)]](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FPWC-Net\u002Ftree\u002Fmaster\u002FCaffe) [[TensorFlow]](https:\u002F\u002Fgithub.com\u002Fdjl11\u002FPWC_Net_TensorFlow) [[TensorFlow]](https:\u002F\u002Fgithub.com\u002Fdaigo0927\u002FPWC-Net_tf) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=vVU8XV0Ac_0) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=LBJ20kxr1a0)\n\n### 2017\n\n- [2017a]\u003Ca name=\"2017a\">\u003C\u002Fa> Baghaie et al. 2017. Dense Descriptors for Optical Flow Estimation: A Comparative Study. [[web]](http:\u002F\u002Fwww.mdpi.com\u002F2313-433X\u002F3\u002F1\u002F12)\n\n### 2016\n\n- [2016a]\u003Ca name=\"2016a\">\u003C\u002Fa> Ilg et al. 2016. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. [[arXiv]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1612.01925) [[PyTorch (Official)]](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fflownet2-pytorch) [[TensorFlow]](https:\u002F\u002Fgithub.com\u002Fsampepose\u002Fflownet2-tf)\n- [2016b]\u003Ca name=\"2016b\">\u003C\u002Fa> Ranjan et al. 2016. SpyNet: Optical Flow Estimation using a Spatial Pyramid Network. [[arXiv]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.00850) [[Torch (Official)]](https:\u002F\u002Fgithub.com\u002Fanuragranj\u002Fspynet) [[PyTorch]](https:\u002F\u002Fgithub.com\u002Fsniklaus\u002Fpytorch-spynet)
\n\n### 2015\n\n- [2015a]\u003Ca name=\"2015a\">\u003C\u002Fa> Fischer et al. 2015. FlowNet: Learning Optical Flow with Convolutional Networks. [[arXiv]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1504.06852) [[TensorFlow (FlowNet-S)]](https:\u002F\u002Fgithub.com\u002FDingGit\u002Fflownet-tf)\n\n## Acknowledgments\n\nOther TensorFlow implementations we are indebted to:\n- https:\u002F\u002Fgithub.com\u002Fdaigo0927\u002FPWC-Net_tf by daigo0927\n- https:\u002F\u002Fgithub.com\u002Fdjl11\u002FPWC_Net_TensorFlow by djl11\n- https:\u002F\u002Fgithub.com\u002Ftensorpack\u002Ftensorpack\u002Ftree\u002Fmaster\u002Fexamples\u002FOpticalFlow by PatWie\n\n```\n@InProceedings{Sun2018PWC-Net,\n  author    = {Deqing Sun and Xiaodong Yang and Ming-Yu Liu and Jan Kautz},\n  title     = {{PWC-Net}: {CNNs} for Optical Flow Using Pyramid, Warping, and Cost Volume},\n  booktitle = CVPR,\n  year      = {2018},\n}\n```\n\n```\n@InProceedings{DFIB15,\n  author       = \"A. Dosovitskiy and P. Fischer and E. Ilg and P. H{\\\"a}usser and C. Hazirbas and V. Golkov and P. v.d. Smagt and D. Cremers and T. Brox\",\n  title        = \"FlowNet: Learning Optical Flow with Convolutional Networks\",\n  booktitle    = \"IEEE International Conference on Computer Vision (ICCV)\",\n  month        = \"Dec\",\n  year         = \"2015\",\n  url          = \"http:\u002F\u002Flmb.informatik.uni-freiburg.de\u002FPublications\u002F2015\u002FDFIB15\"\n}\n```\n\n# Contact Info\n\nIf you have any questions about this work, please feel free to contact us here:\n\n[![https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fphilferriere](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_readme_a5d35e87943b.png)](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fphilferriere)\n",
"# tfoptflow Quick Start\n\n## Requirements\n- **OS**: 64-bit Linux \u002F Windows\n- **Python**: 3.6+ (Anaconda3 5.2.0 recommended)\n- **GPU**: optional; single- and multi-GPU setups are supported (see the repo's conda env files for the exact CUDA stack)\n- **Dependencies**: TensorFlow (CPU or GPU build), as pinned in the conda env exports\n\n## Installation\n\n1. Clone the repo\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fphilferriere\u002Ftfoptflow.git\n   cd tfoptflow\n   ```\n\n2. Create and activate the conda environment\n   - Linux\n   ```bash\n   conda env create -f setup\u002Fdlubu36.yml\n   conda activate tfoptflow\n   ```\n   - Windows\n   ```bash\n   conda env create -f setup\u002Fdlwin36.yml\n   activate tfoptflow\n   ```\n\n3. Install extra dependencies (skip if you used the yml files)\n   ```bash\n   pip install opencv-python tqdm\n   ```\n\n## Basic Usage\n\n### 1. Download a pre-trained model\n```bash\n# Download page (open in a browser):\n# http:\u002F\u002Fbit.ly\u002Ftfoptflow\n# Pick pwcnet-lg-6-2-multisteps-chairsthingsmix.zip or pwcnet-sm-6-2-multisteps-chairsthingsmix.zip\nunzip pwcnet-lg-6-2-multisteps-chairsthingsmix.zip -d models\u002F\n```\n\n### 2. Inference on a single image pair\n```python\nfrom copy import deepcopy\nfrom skimage.io import imread\nfrom model_pwcnet import ModelPWCNet, _DEFAULT_PWCNET_TEST_OPTIONS\nfrom visualize import display_img_pairs_w_flows\n\n# Configure the large model for inference (see the README for the full example)\nnn_opts = deepcopy(_DEFAULT_PWCNET_TEST_OPTIONS)\nnn_opts['ckpt_path'] = 'models\u002Fpwcnet-lg-6-2-multisteps-chairsthingsmix\u002Fpwcnet.ckpt-595000'\nnn_opts['use_dense_cx'] = True\nnn_opts['use_res_cx'] = True\nnn_opts['pyr_lvls'] = 6\nnn_opts['flow_pred_lvl'] = 2\nnn = ModelPWCNet(mode='test', options=nn_opts)\n\n# predict_from_img_pairs() expects image arrays, not file paths\nimg_pairs = [(imread('data\u002Fframe_0010.png'), imread('data\u002Fframe_0011.png'))]\npred_labels = nn.predict_from_img_pairs(img_pairs, batch_size=1, verbose=True)\n\n# Visualize\ndisplay_img_pairs_w_flows(img_pairs, pred_labels)\n```\n\n### 3. Batch inference (dataset)\n```python\n# Expected directory layout\n# data\u002F\n#   sintel\u002F\n#     test\u002F\n#       final\u002F\n#         alley_1\u002F\n#           frame_0001.png\n#           frame_0002.png\n#           ...\n\n# Dataset-level inference goes through a dataset handler derived from\n# OpticalFlowDataset (see \"Running inference on the test split of a dataset\"\n# in the README). The handler class name and constructor arguments below are\n# assumptions -- check pwcnet_predict.py for the exact invocation.\nfrom dataset_mpisintel import MPISintelDataset  # class name may differ\nds = MPISintelDataset(mode='test', ds_root='data\u002Fsintel')  # args assumed\nnn = ModelPWCNet(mode='test', options=nn_opts, dataset=ds)\nnn.predict()  # generates flow predictions for the dataset's test split\n```\n\n
### 4. Explore the Jupyter notebooks\n```bash\njupyter notebook\n# Open pwcnet_predict_from_img_pairs.ipynb for interactive inference\n```\n\nThat completes the tfoptflow quick start. For training or finetuning, open `pwcnet_train_lg-6-2-multisteps-chairsthingsmix.ipynb` for the full workflow.","A short-video effects startup needs to ship a \"teleport\" filter within two weeks: the user shoots a 3-second clip and the system jumps the subject from point A to point B while the background stays continuous. The team has one algorithm engineer and two Windows gaming laptops.\n\n### Without tfoptflow\n- The engineer first tries classic Lucas-Kanade optical flow, but the OpenCV version manages only 3 fps on 1080p video at 10% GPU utilization, nowhere near real time.\n- A PyTorch PWC-Net is the next option, but the official repo ships Linux-only training scripts; the CUDA extension fails to build on Windows, and three days vanish into environment setup with nothing running.\n- Training data is just 500 clips of 1080p selfie video; a single 8 GB card only fits batch size 1, one epoch takes 6 hours, and 50 rounds of iteration would blow straight past the launch deadline.\n- The fallback is frame differencing, which makes the \"teleport\" look like a slide transition; users mock it as a bargain-bin effect.\n\n### With tfoptflow\n- A quick clone plus the provided conda environment runs on Windows out of the box; the pre-trained model downloads in 5 minutes, and 1080p flow inference reaches 25 fps with GPU utilization at 90% -- real time without strain.\n- The built-in multi-GPU and mixed-precision training scripts let two 1080 Ti cards (2×11 GB) push batch size to 8; one epoch drops to 25 minutes and finetuning finishes in 3 hours.\n- Transfer learning from the repo's MPI-Sintel pre-trained weights converges within 10 epochs on the 500 clips; EPE falls from 3.8 to 1.2 and subject edges stop tearing.\n- On launch day the filter climbs to #3 in the App Store \"Photo & Video\" chart, with users praising how seamless the teleport looks.\n\ntfoptflow lets a one-engineer, two-laptop team reproduce SOTA optical flow on Windows, turning a bargain-bin effect into a film-grade teleport.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilferriere_tfoptflow_52c065b7.png","philferriere","Phil Ferriere","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fphilferriere_bc8449f7.jpg","Former Cruise Senior Software\u002FResearch Engineer and Microsoft Tech\u002FDevelopment Lead passionate about Deep Learning with a focus on Computer Vision.","Freelance","Palm Springs, CA","pferriere@hotmail.com",null,"https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fphilferriere","https:\u002F\u002Fgithub.com\u002Fphilferriere",[86,90],{"name":87,"color":88,"percentage":89},"Jupyter Notebook","#DA5B0B",99.7,{"name":91,"color":92,"percentage":93},"Python","#3572A5",0.3,530,134,"2025-12-10T00:39:31","MIT","Linux, Windows","Not required; when a GPU is used, officially tested on GTX 1080 \u002F 1080 Ti \u002F Titan X (VRAM and CUDA version not specified)","Not specified",{"notes":102,"python":103,"dependencies":104},"Pure TensorFlow implementation with no custom CUDA kernels, so it also runs on CPU; full conda environment exports (.yml and .txt) are provided for Ubuntu and Windows","3.6 (matching Anaconda3 v5.2.0)",[105,106,107,108,109,110,111,112,113],"tensorflow","jupyter","numpy","opencv-python","matplotlib","scikit-image","h5py","pillow","tqdm",[52,14,51,13],[116,117,118,119,105,120,121,122,123,124],"optical-flow","computer-vision","cvpr2018","pwc-net","deep-learning","motion-estimation","mpi-sintel","flying-chairs","kitti-dataset","2026-03-27T02:49:30.150509","2026-04-06T06:53:16.723712",[128,133,138,143,148,153],{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},6046,"When using FlyingThings3DHalfResDataset, I get AttributeError: 'FlyingThings3DHalfResDataset' object has no attribute 'generate_files'. How do I fix this?","This error is usually caused by a malformed ds_root argument. FlyingThings3DHalfResDataset expects ds_root to be a tuple of two paths:\n```python\nds_root = ('\u002Fpath\u002Fto\u002FFlyingThings3D_FullRes', '\u002Fpath\u002Fto\u002FFlyingThings3D_HalfRes')\n```\nOn the first run, the code automatically generates the half-resolution data from the full-resolution data; no manual downsampling is needed.","https:\u002F\u002Fgithub.com\u002Fphilferriere\u002Ftfoptflow\u002Fissues\u002F26",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},6047,"Is there a PWC-Net model pre-trained on KITTI?","The official repository does not provide a KITTI pre-trained model. To finetune on KITTI, you need to train with the KITTI dataset yourself. Several users have asked in the issue for shared finetuning code and models, but there has been no official reply.","https:\u002F\u002Fgithub.com\u002Fphilferriere\u002Ftfoptflow\u002Fissues\u002F12",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},6048,"How does tfoptflow's accuracy compare with the official NVlabs implementation?","On the KITTI dataset, the official NVlabs implementation (https:\u002F\u002Fgithub.com\u002FNVlabs\u002FPWC-Net) is reported to be noticeably more accurate than the tfoptflow implementation; user-run comparisons show the official implementation producing better results.","https:\u002F\u002Fgithub.com\u002Fphilferriere\u002Ftfoptflow\u002Fissues\u002F17",
{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},6049,"When visualizing flow, downward vertical motion shows up green instead of yellow. How can I fix this?","This comes from differences between color-coding implementations. Use the official Middlebury MATLAB script (http:\u002F\u002Fvision.middlebury.edu\u002Fflow\u002Fsubmit\u002F) for visualization; its color mapping is more accurate. User jeffbaena reported finding an alternative solution and offered to share it by email.","https:\u002F\u002Fgithub.com\u002Fphilferriere\u002Ftfoptflow\u002Fissues\u002F20",{"id":149,"question_zh":150,"answer_zh":151,"source_url":152},6050,"Training pwcnet_finetune_sm-6-2-cyclic-chairsthingsmix.ipynb produces loss=nan. How do I fix it?","This is likely caused by numerical instability. As suggested by the maintainer, add gradient clipping or other numerical-stability handling around line 540 of model_pwcnet.py; refer to the latest updates in the repo for the exact change.","https:\u002F\u002Fgithub.com\u002Fphilferriere\u002Ftfoptflow\u002Fissues\u002F7",{"id":154,"question_zh":155,"answer_zh":156,"source_url":157},6051,"How should the FlyingThings3D_HalfRes dataset path be configured?","Keep the directory structure as:\n```\ndataset_root\u002F\n├── FlyingThings3D_HalfRes\u002F\n│   ├── frames_cleanpass\u002F\n│   └── optical_flow\u002F\n```\nand define the root in code:\n```python\n_FLYINGTHINGS3DHALFRES_ROOT = '\u002Fyour\u002Fpath\u002FFlyingThings3D_HalfRes'\n```\nNote that the half-resolution data is generated automatically on first run; no manual processing is needed.","https:\u002F\u002Fgithub.com\u002Fphilferriere\u002Ftfoptflow\u002Fissues\u002F9",[]]