[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-philipperemy--keras-tcn":3,"tool-philipperemy--keras-tcn":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":75,"owner_website":82,"owner_url":83,"languages":84,"stars":97,"forks":98,"last_commit_at":99,"license":100,"difficulty_score":101,"env_os":102,"env_gpu":103,"env_ram":104,"env_deps":105,"category_tags":110,"github_topics":111,"view_count":23,"oss_zip_url":117,"oss_zip_packed_at":117,"status":16,"created_at":118,"updated_at":119,"faqs":120,"releases":156},1298,"philipperemy\u002Fkeras-tcn","keras-tcn","Keras Temporal Convolutional Network. Supports Python and R.","keras-tcn 是一个专为 Keras 打造的“时间卷积网络”插件，几行代码就能把传统 LSTM\u002FGRU 换成更轻、更快、并行度更高的 TCN。它解决了长序列记忆不足、梯度消失、训练慢等痛点，在序列 MNIST、文本、金融时间序列等任务上常能直接超越循环网络。支持 Python 与 R，TensorFlow 2.9-2.19 均已验证，Mac 用户还能一键启用 GPU。开发者、机器学习研究者或任何想用卷积思路做时间序列建模的人都能即装即用；通过调节扩张卷积的层数、感受野、残差连接等参数，可灵活平衡精度与速度。","# Keras TCN\n\n*Keras Temporal Convolutional Network*. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.01271)]\n\nTested with Tensorflow 2.9, 2.10, 2.11, 2.12, 2.13, 2.14, 2.15, 2.16, 2.17, 2.18, 2.19 (Mar 13, 2025).\n\nFor a fully working example of Keras TCN using **R Language**, [browse here](https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fkeras-tcn\u002Fissues\u002F246).\n\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_ecb2431317e3.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fkeras-tcn)\n![Keras TCN CI](https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fkeras-tcn\u002Fworkflows\u002FKeras%20TCN%20CI\u002Fbadge.svg?branch=master)\n```bash\npip install keras-tcn\n```\n\nFor [MacOS users](https:\u002F\u002Fdeveloper.apple.com\u002Fmetal\u002Ftensorflow-plugin\u002F) to use the GPU: `pip install tensorflow-metal`.\n\n## Why TCN (Temporal Convolutional Network) instead of LSTM\u002FGRU?\n\n- TCNs exhibit longer memory than recurrent architectures with the same capacity.\n- TCNs perform better than LSTM\u002FGRU on long time series (Seq. 
MNIST, Adding Problem, Copy Memory, Word-level PTB...).\n- Parallelism (convolutional layers), flexible receptive field size (how far the model can see), stable gradients (compared to backpropagation through time, vanishing gradients)...\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_8be3ed217d4e.png\">\n  \u003Cb>Visualization of a stack of dilated causal convolutional layers (Wavenet, 2016)\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n## TCN Layer\n\n### TCN Class\n\n```python\nTCN(\n    nb_filters=64,\n    kernel_size=3,\n    nb_stacks=1,\n    dilations=(1, 2, 4, 8, 16, 32),\n    padding='causal',\n    use_skip_connections=True,\n    dropout_rate=0.0,\n    return_sequences=False,\n    activation='relu',\n    kernel_initializer='he_normal',\n    use_batch_norm=False,\n    use_layer_norm=False,\n    go_backwards=False,\n    return_state=False,\n    **kwargs\n)\n```\n\n### Arguments\n\n- `nb_filters`: Integer. The number of filters to use in the convolutional layers. Would be similar to `units` for LSTM. Can be a list.\n- `kernel_size`: Integer. The size of the kernel to use in each convolutional layer.\n- `dilations`: List\u002FTuple. A dilation list. Example is: [1, 2, 4, 8, 16, 32, 64].\n- `nb_stacks`: Integer. The number of stacks of residual blocks to use.\n- `padding`: String. The padding to use in the convolutions. 'causal' for a causal network (as in the original implementation) and 'same' for a non-causal network.\n- `use_skip_connections`: Boolean. If we want to add skip connections from input to each residual block.\n- `return_sequences`: Boolean. Whether to return the last output in the output sequence, or the full sequence.\n- `dropout_rate`: Float between 0 and 1. 
Fraction of the input units to drop.\n- `activation`: The activation used in the residual blocks o = activation(x + F(x)).\n- `kernel_initializer`: Initializer for the kernel weights matrix (Conv1D).\n- `use_batch_norm`: Whether to use batch normalization in the residual layers or not.\n- `use_layer_norm`: Whether to use layer normalization in the residual layers or not.\n- `go_backwards`: Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.\n- `return_state`: Boolean. Whether to return the last state in addition to the output. Default: False.\n- `kwargs`: Any other set of arguments for configuring the parent class Layer. For example \"name=str\", Name of the model. Use unique names when using multiple TCN.\n\n### Input shape\n\n3D tensor with shape `(batch_size, timesteps, input_dim)`.\n\n`timesteps` can be `None`. This can be useful if each sequence is of a different length: [Multiple Length Sequence Example](tasks\u002Fmulti_length_sequences.py).\n\n### Output shape\n\n- if `return_sequences=True`: 3D tensor with shape `(batch_size, timesteps, nb_filters)`.\n- if `return_sequences=False`: 2D tensor with shape `(batch_size, nb_filters)`.\n\n\n### How do I choose the correct set of parameters to configure my TCN layer?\n\nHere are some of my notes regarding my experience using TCN:\n\n- `nb_filters`: Present in any ConvNet architecture. It is linked to the predictive power of the model and affects the size of your network. The more, the better unless you start to overfit. It's similar to the number of units in an LSTM\u002FGRU architecture too.\n- `kernel_size`: Controls the spatial area\u002Fvolume considered in the convolutional ops. Good values are usually between 2 and 8. If you think your sequence heavily depends on t-1 and t-2, but less on the rest, then choose a kernel size of 2\u002F3. For NLP tasks, we prefer bigger kernel sizes. 
A large kernel size will make your network much bigger.\n- `dilations`: It controls how deep your TCN layer is. Usually, consider a list with multiples of two. You can guess how many dilations you need by matching the receptive field (of the TCN) with the length of features in your sequence. For example, if your input sequence is periodic, you might want to have multiples of that period as dilations.\n- `nb_stacks`: Not very useful unless your sequences are very long (like waveforms with hundreds of thousands of time steps).\n- `padding`: I have only used `causal` since TCN stands for Temporal Convolutional Network. Causal padding prevents information leakage.\n- `use_skip_connections`: Skip connections connect layers, similar to DenseNet. They help the gradients flow. Unless you experience a drop in performance, you should always activate it.\n- `return_sequences`: Same as the one present in the LSTM layer. Refer to the Keras doc for this parameter.\n- `dropout_rate`: Similar to `recurrent_dropout` for the LSTM layer. I usually don't use it much, or I set it to a low value like `0.05`.\n- `activation`: Leave it to default. I have never changed it.\n- `kernel_initializer`: If the training of the TCN gets stuck, it might be worth changing this parameter. For example: `glorot_uniform`.\n- `use_batch_norm`, `use_layer_norm`: Use normalization if your network is big enough and the task contains enough data. I usually prefer `use_layer_norm`, but you can try both and see which one works best.\n\n\n### Receptive field\n\nThe receptive field is defined as the maximum number of steps back in time from the current sample at time T that a filter from (block, layer, stack, TCN) can hit (effective history), plus 1. 
The receptive field of the TCN can be calculated using the formula:\n\u003Cp align=\"center\">\n  \u003Cimg width=\"400\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_df57f4ed1c6f.png\">\n\u003C\u002Fp>\n\nwhere N\u003Csub>stack\u003C\u002Fsub> is the number of stacks, N\u003Csub>b\u003C\u002Fsub> is the number of residual blocks per stack, d is a vector containing the dilations of each residual block in each stack, and K is the kernel size. The 2 is there because there are two `Conv1d` layers in a single `ResidualBlock`.\n\nIdeally you want your receptive field to be bigger than the largest length of input sequence, if you pass a sequence longer than your receptive field into the model, any extra values (further back in the sequence) will be replaced with zeros.\n\n#### Examples\n\n*NOTE*: Unlike the TCN, example figures only include a single `Conv1d` per layer, so the formula becomes R\u003Csub>field\u003C\u002Fsub> = 1 + (K-1)⋅N\u003Csub>stack\u003C\u002Fsub>⋅Σi di (without the factor 2).\n\n- If a dilated conv net has only one stack of residual blocks with a kernel size of `2` and dilations `[1, 2, 4, 8]`, its receptive field is `16`. 
The image below illustrates it:\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_3fc57d5f4242.png\">\n  \u003Cb>ks = 2, dilations = [1, 2, 4, 8], 1 block\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n- If a dilated conv net has 2 stacks of residual blocks, you would have the situation below, that is, an increase in the receptive field up to 31:\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_52713b42ee05.jpg\">\n  \u003Cb>ks = 2, dilations = [1, 2, 4, 8], 2 blocks\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n\n- If we increased the number of stacks to 3, the size of the receptive field would increase again, as shown below:\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_82291d8404b0.jpg\">\n  \u003Cb>ks = 2, dilations = [1, 2, 4, 8], 3 blocks\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n\n### Non-causal TCN\n\nMaking the TCN architecture non-causal allows it to take the future into account when making its prediction, as shown in the figure below.\n\nHowever, it is then no longer suitable for real-time applications.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_7346693bfce8.png\">\n  \u003Cb>Non-Causal TCN - ks = 3, dilations = [1, 2, 4, 8], 1 block\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\nTo use a non-causal TCN, specify `padding='valid'` or `padding='same'` when initializing the TCN layers.\n\n## Run\n\nOnce `keras-tcn` is installed as a package, you can get a glimpse of what is possible to do with TCNs. 
Some example tasks are available in the repository for this purpose:\n\n```bash\ncd adding_problem\u002F\npython main.py # run adding problem task\n\ncd copy_memory\u002F\npython main.py # run copy memory task\n\ncd mnist_pixel\u002F\npython main.py # run sequential mnist pixel task\n```\n\nReproducible results are possible on (NVIDIA) GPUs using the [tensorflow-determinism](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Ftensorflow-determinism) library. It was tested with keras-tcn by @lingdoc.\n\n## Tasks\n\n### Word PTB\n\nLanguage modeling remains one of the primary applications of recurrent networks. In this example, we show that TCN can beat LSTM on the [WordPTB](tasks\u002Fword_ptb\u002FREADME.md) task, without too much tuning.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_5754247cd59a.png\" width=\"800\">\u003Cbr>\n  \u003Ci>TCN vs LSTM (comparable number of weights)\u003C\u002Fi>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n### Adding Task\n\nThe task consists of feeding a large array of decimal numbers to the network, along with a boolean array of the same length. 
The objective is to sum the two decimals at the positions where the boolean array contains the two 1s.\n\n#### Explanation\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_d9fc43ae4f73.png\">\n  \u003Cb>Adding Problem Task\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n#### Implementation results\n\n```\n782\u002F782 [==============================] - 154s 197ms\u002Fstep - loss: 0.8437 - val_loss: 0.1883\n782\u002F782 [==============================] - 154s 196ms\u002Fstep - loss: 0.0702 - val_loss: 0.0111\n[...]\n782\u002F782 [==============================] - 152s 194ms\u002Fstep - loss: 6.9630e-04 - val_loss: 3.7180e-04\n```\n\n### Copy Memory Task\n\nThe copy memory task consists of a very large array:\n- At the beginning, there's the vector x of length N. This is the vector to copy.\n- At the end, N+1 9s are present. The first 9 is seen as a delimiter.\n- In the middle, only 0s are there.\n\nThe idea is to copy the content of the vector x to the end of the large array. 
The task is made sufficiently complex by increasing the number of 0s in the middle.\n\n#### Explanation\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_e9e6709975fb.png\">\n  \u003Cb>Copy Memory Task\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n#### Implementation results (first epochs)\n\n```\n118\u002F118 [==============================] - 17s 143ms\u002Fstep - loss: 1.1732 - accuracy: 0.6725 - val_loss: 0.1119 - val_accuracy: 0.9796\n[...]\n118\u002F118 [==============================] - 15s 125ms\u002Fstep - loss: 0.0268 - accuracy: 0.9885 - val_loss: 0.0206 - val_accuracy: 0.9908\n118\u002F118 [==============================] - 15s 125ms\u002Fstep - loss: 0.0228 - accuracy: 0.9900 - val_loss: 0.0169 - val_accuracy: 0.9933\n```\n\n### Sequential MNIST\n\n#### Explanation\n\nThe idea here is to consider MNIST images as 1-D sequences and feed them to the network. This task is particularly hard because sequences are 28*28 = 784 elements. In order to classify correctly, the network has to remember all the sequence. 
Usual LSTMs are unable to perform well on this task.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_f3b88ee0c2bd.png\">\n  \u003Cb>Sequential MNIST\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n#### Implementation results\n\n```\n1875\u002F1875 [==============================] - 46s 25ms\u002Fstep - loss: 0.0949 - accuracy: 0.9706 - val_loss: 0.0763 - val_accuracy: 0.9756\n1875\u002F1875 [==============================] - 46s 25ms\u002Fstep - loss: 0.0831 - accuracy: 0.9743 - val_loss: 0.0656 - val_accuracy: 0.9807\n[...]\n1875\u002F1875 [==============================] - 46s 25ms\u002Fstep - loss: 0.0486 - accuracy: 0.9840 - val_loss: 0.0572 - val_accuracy: 0.9832\n1875\u002F1875 [==============================] - 46s 25ms\u002Fstep - loss: 0.0453 - accuracy: 0.9858 - val_loss: 0.0424 - val_accuracy: 0.9862\n```\n\n## R Language\n\nFor a fully working example of Keras TCN using **R Language**, [browse here](https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fkeras-tcn\u002Fissues\u002F246).\n\n## References\n- https:\u002F\u002Fgithub.com\u002Flocuslab\u002FTCN\u002F (TCN for Pytorch)\n- https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.01271 (An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling)\n- https:\u002F\u002Farxiv.org\u002Fpdf\u002F1609.03499 (Original Wavenet paper)\n- https:\u002F\u002Fgithub.com\u002FBaichenjia\u002FTensorflow-TCN (Tensorflow Eager implementation of TCNs)\n\n## Citation\n\n```\n@misc{KerasTCN,\n  author = {Philippe Remy},\n  title = {Temporal Convolutional Networks for Keras},\n  year = {2020},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fkeras-tcn}},\n}\n```\n\n## Contributors\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fkeras-tcn\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_632e70129ee0.png\" \u002F>\n\u003C\u002Fa>\n","# Keras TCN\n\n*Keras 时间卷积网络*。[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.01271)]\n\n已在 Tensorflow 2.9、2.10、2.11、2.12、2.13、2.14、2.15、2.16、2.17、2.18、2.19 上进行测试（截至2025年3月13日）。\n\n如需使用 **R 语言** 的完整可用 Keras TCN 示例，请[浏览此处](https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fkeras-tcn\u002Fissues\u002F246)。\n\n[![下载量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_ecb2431317e3.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fkeras-tcn)\n![Keras TCN CI](https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fkeras-tcn\u002Fworkflows\u002FKeras%20TCN%20CI\u002Fbadge.svg?branch=master)\n```bash\npip install keras-tcn\n```\n\n对于 [MacOS 用户](https:\u002F\u002Fdeveloper.apple.com\u002Fmetal\u002Ftensorflow-plugin\u002F) 使用 GPU：`pip install tensorflow-metal`。\n\n## 为什么选择 TCN（时间卷积网络）而非 LSTM\u002FGRU？\n\n- 在相同容量下，TCN 具有比循环架构更长的记忆。\n- 在长时序数据上表现优于 LSTM\u002FGRU（例如 Seq. 
MNIST、加法问题、复制记忆、词级 PTB 等）。\n- 并行性（卷积层）、灵活的感受野大小（模型能“看到”多远）、稳定的梯度（相比通过时间反向传播的梯度消失问题）……\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_8be3ed217d4e.png\">\n  \u003Cb>扩张因果卷积层堆叠的可视化（Wavenet，2016）\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n## TCN 层\n\n### TCN 类\n\n```python\nTCN(\n    nb_filters=64,\n    kernel_size=3,\n    nb_stacks=1,\n    dilations=(1, 2, 4, 8, 16, 32),\n    padding='causal',\n    use_skip_connections=True,\n    dropout_rate=0.0,\n    return_sequences=False,\n    activation='relu',\n    kernel_initializer='he_normal',\n    use_batch_norm=False,\n    use_layer_norm=False,\n    go_backwards=False,\n    return_state=False,\n    **kwargs\n)\n```\n\n### 参数说明\n\n- `nb_filters`：整数。卷积层中使用的滤波器数量。类似于 LSTM 的 `units`。可以是一个列表。\n- `kernel_size`：整数。每个卷积层中使用的卷积核大小。\n- `dilations`：列表\u002F元组。扩张列表。示例：[1, 2, 4, 8, 16, 32, 64]。\n- `nb_stacks`：整数。使用的残差块堆栈数量。\n- `padding`：字符串。卷积中使用的填充方式。“causal”表示因果网络（如原始实现），“same”表示非因果网络。\n- `use_skip_connections`：布尔值。是否在每个残差块中添加从输入到输出的跳跃连接。\n- `return_sequences`：布尔值。是否返回输出序列中的最后一个结果，还是整个序列。\n- `dropout_rate`：介于 0 和 1 之间的浮点数。丢弃输入单元的比例。\n- `activation`：残差块中使用的激活函数，即 o = activation(x + F(x))。\n- `kernel_initializer`：卷积层权重矩阵的初始化方法（Conv1D）。\n- `use_batch_norm`：是否在残差层中使用批归一化。\n- `use_layer_norm`：是否在残差层中使用层归一化。\n- `go_backwards`：布尔值（默认为 False）。如果为 True，则反向处理输入序列并返回反转后的序列。\n- `return_state`：布尔值。是否在输出之外返回最后一个状态。默认为 False。\n- `kwargs`：用于配置父类 Layer 的其他参数。例如，“name=str”，模型名称。当使用多个 TCN 时，应使用唯一名称。\n\n### 输入形状\n\n形状为 `(batch_size, timesteps, input_dim)` 的三维张量。\n\n`timesteps` 可以是 `None`。这在每个序列长度不同时很有用：[多长度序列示例](tasks\u002Fmulti_length_sequences.py)。\n\n### 输出形状\n\n- 如果 `return_sequences=True`：形状为 `(batch_size, timesteps, nb_filters)` 的三维张量。\n- 如果 `return_sequences=False`：形状为 `(batch_size, nb_filters)` 的二维张量。\n\n\n### 如何选择合适的参数来配置我的 TCN 层？\n\n以下是我使用 TCN 的一些经验总结：\n\n- `nb_filters`：任何卷积网络架构中都存在。它与模型的预测能力相关，并影响网络规模。通常越多越好，除非开始过拟合。这也类似于 LSTM\u002FGRU 架构中的单元数量。\n- 
`kernel_size`：控制卷积操作中考虑的空间范围\u002F体积。通常取值在 2 到 8 之间。如果你认为序列主要依赖于 t-1 和 t-2，而对其他部分依赖较小，则可选择 2 或 3 的卷积核大小。对于 NLP 任务，我们倾向于更大的卷积核。较大的卷积核会使网络规模显著增大。\n- `dilations`：控制 TCN 层的深度。通常建议使用包含多个 2 的列表。可以通过将 TCN 的感受野与序列中特征的长度相匹配来估算需要多少个扩张。例如，如果输入序列是周期性的，可以考虑将周期的倍数作为扩张值。\n- `nb_stacks`：除非你的序列非常长（如包含数十万个时间步的波形），否则作用不大。\n- `padding`：我只使用过“causal”，因为 TCN 代表时间卷积网络。因果填充可以防止信息泄漏。\n- `use_skip_connections`：跳跃连接类似于 DenseNet，用于连接各层，有助于梯度流动。除非性能下降，否则应始终启用。\n- `return_sequences`：与 LSTM 层中的参数相同。请参考 Keras 文档了解该参数。\n- `dropout_rate`：类似于 LSTM 层中的 `recurrent_dropout`。我通常不太使用，或将其设置为较低的值，如 0.05。\n- `activation`：保持默认即可。我从未更改过。\n- `kernel_initializer`：如果 TCN 的训练陷入停滞，不妨尝试更改此参数。例如：“glorot_uniform”。\n\n- `use_batch_norm`、`use_layer_norm`：如果网络足够大且任务数据量充足，可以使用归一化。我通常更倾向于使用 `use_layer_norm`，但你可以尝试看看哪种效果更好。\n\n### 受体野\n\n受体野的定义是：从当前时刻 T 的样本向前追溯的最大步数，即（块、层、堆栈、TCN）中的滤波器能够作用到的有效历史长度 + 1。TCN 的受体野可以通过以下公式计算：\n\u003Cp align=\"center\">\n  \u003Cimg width=\"400\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_df57f4ed1c6f.png\">\n\u003C\u002Fp>\n\n其中，N\u003Csub>stack\u003C\u002Fsub> 表示堆栈的数量，N\u003Csub>b\u003C\u002Fsub> 表示每个堆栈中的残差块数量，d 是一个向量，包含每个堆栈中每个残差块的扩张率，K 则是卷积核的大小。之所以乘以 2，是因为单个 `ResidualBlock` 中包含两个 `Conv1d` 层。\n\n理想情况下，你的受体野应该大于输入序列的最大长度；如果你将一个超过受体野长度的序列输入模型，那么序列中更靠前的多余值将会被零填充替代。\n\n#### 示例\n\n*注*：与 TCN 不同，示例图中每层仅包含一个 `Conv1d`，因此公式变为 R\u003Csub>field\u003C\u002Fsub> = 1 + (K-1)⋅N\u003Csub>stack\u003C\u002Fsub>⋅Σi di（不带系数 2）。\n\n- 如果一个扩张卷积网络仅有一个由残差块组成的堆栈，卷积核大小为 `2`，扩张率为 `[1, 2, 4, 8]`，则其受体野为 `16`。下图展示了这一情况：\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_3fc57d5f4242.png\">\n  \u003Cb>ks = 2, dilations = [1, 2, 4, 8], 1 block\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n- 如果一个扩张卷积网络有 2 个由残差块组成的堆栈，则会出现如下情况，即受体野增至 31：\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_52713b42ee05.jpg\">\n  \u003Cb>ks 
= 2, dilations = [1, 2, 4, 8], 2 blocks\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n\n- 如果我们将堆栈数量增加到 3 个，则受体野的大小会再次增大，如下所示：\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_82291d8404b0.jpg\">\n  \u003Cb>ks = 2, dilations = [1, 2, 4, 8], 3 blocks\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n\n### 非因果 TCN\n\n将 TCN 架构设计为非因果，使其能够考虑未来信息来进行预测，如图所示。\n\n然而，这种设计不再适用于实时应用。\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_7346693bfce8.png\">\n  \u003Cb>非因果 TCN - ks = 3, dilations = [1, 2, 4, 8], 1 block\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n要使用非因果 TCN，需在初始化 TCN 层时指定 `padding='valid'` 或 `padding='same'`。\n\n## 运行\n\n一旦 `keras-tcn` 作为包安装完毕，你就可以初步了解 TCN 能够完成的任务。为此，仓库中提供了一些任务示例：\n\n```bash\ncd adding_problem\u002F\npython main.py # 运行加法问题任务\n\ncd copy_memory\u002F\npython main.py # 运行复制记忆任务\n\ncd mnist_pixel\u002F\npython main.py # 运行顺序 MNIST 像素任务\n```\n\n通过使用 [tensorflow-determinism](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Ftensorflow-determinism) 库，在（NVIDIA）GPU 上可以实现可复现的结果。该库已由 @lingdoc 在 keras-tcn 上进行了测试。\n\n## 任务\n\n### Word PTB\n\n语言建模仍然是循环网络的主要应用之一。在本例中，我们展示了 TCN 在 [WordPTB](tasks\u002Fword_ptb\u002FREADME.md) 任务上无需过多调优即可超越 LSTM。\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_5754247cd59a.png\" width=\"800\">\u003Cbr>\n  \u003Ci>TCN 与 LSTM（权重数量相当）\u003C\u002Fi>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n### 加法任务\n\n该任务要求将一个大型十进制数数组与一个相同长度的布尔数组一同输入网络。目标是在布尔数组中出现两个 1 的位置对这两个十进制数进行求和。\n\n#### 解释\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_d9fc43ae4f73.png\">\n  \u003Cb>加法问题任务\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n#### 实施结果\n\n```\n782\u002F782 [==============================] - 154s 197ms\u002Fstep - loss: 
0.8437 - val_loss: 0.1883\n782\u002F782 [==============================] - 154s 196ms\u002Fstep - loss: 0.0702 - val_loss: 0.0111\n[...]\n782\u002F782 [==============================] - 152s 194ms\u002Fstep - loss: 6.9630e-04 - val_loss: 3.7180e-04\n```\n\n### 复制记忆任务\n\n复制记忆任务涉及一个非常大的数组：\n- 开头是一个长度为 N 的向量 x，这是需要复制的向量。\n- 末尾有 N+1 个 9，其中第一个 9 被视为分隔符。\n- 中间只有 0。\n\n任务的目标是将向量 x 的内容复制到大数组的末尾。通过增加中间的 0 的数量，任务的复杂度得以提升。\n\n#### 解释\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_e9e6709975fb.png\">\n  \u003Cb>复制记忆任务\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n#### 实施结果（前几轮）\n\n```\n118\u002F118 [==============================] - 17s 143ms\u002Fstep - loss: 1.1732 - accuracy: 0.6725 - val_loss: 0.1119 - val_accuracy: 0.9796\n[...]\n118\u002F118 [==============================] - 15s 125ms\u002Fstep - loss: 0.0268 - accuracy: 0.9885 - val_loss: 0.0206 - val_accuracy: 0.9908\n118\u002F118 [==============================] - 15s 125ms\u002Fstep - loss: 0.0228 - accuracy: 0.9900 - val_loss: 0.0169 - val_accuracy: 0.9933\n```\n\n### 顺序 MNIST\n\n#### 解释\n\n这里的思路是将 MNIST 图像视为一维序列并输入网络。由于序列长度为 28×28=784 个元素，这项任务尤其困难。为了正确分类，网络必须记住整个序列。普通的 LSTM 在这项任务上表现不佳。\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_f3b88ee0c2bd.png\">\n  \u003Cb>顺序 MNIST\u003C\u002Fb>\u003Cbr>\u003Cbr>\n\u003C\u002Fp>\n\n#### 实施结果\n\n```\n1875\u002F1875 [==============================] - 46s 25ms\u002Fstep - loss: 0.0949 - accuracy: 0.9706 - val_loss: 0.0763 - val_accuracy: 0.9756\n1875\u002F1875 [==============================] - 46s 25ms\u002Fstep - loss: 0.0831 - accuracy: 0.9743 - val_loss: 0.0656 - val_accuracy: 0.9807\n[...]\n1875\u002F1875 [==============================] - 46s 25ms\u002Fstep - loss: 0.0486 - accuracy: 0.9840 - val_loss: 0.0572 - val_accuracy: 0.9832\n1875\u002F1875 [==============================] - 46s 25ms\u002Fstep 
- loss: 0.0453 - accuracy: 0.9858 - val_loss: 0.0424 - val_accuracy: 0.9862\n```\n\n## R 语言\n\n关于使用 **R 语言** 完整运行 Keras TCN 的示例，请[浏览此处](https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fkeras-tcn\u002Fissues\u002F246)。\n\n## 参考文献\n- https:\u002F\u002Fgithub.com\u002Flocuslab\u002FTCN\u002F（Pytorch 版 TCN）\n- https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.01271（通用卷积与循环网络在序列建模中的实证评估）\n- https:\u002F\u002Farxiv.org\u002Fpdf\u002F1609.03499（原始 Wavenet 论文）\n- https:\u002F\u002Fgithub.com\u002FBaichenjia\u002FTensorflow-TCN（TensorFlow Eager 版 TCN 实现）\n\n## 引用\n\n```\n@misc{KerasTCN,\n  author = {Philippe Remy},\n  title = {Temporal Convolutional Networks for Keras},\n  year = {2020},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fkeras-tcn}},\n}\n```\n\n## 贡献者\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fkeras-tcn\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_readme_632e70129ee0.png\" \u002F>\n\u003C\u002Fa>","# keras-tcn 快速上手指南\n\n## 环境准备\n- **系统**：Windows \u002F macOS \u002F Linux  \n- **Python**：≥3.7  \n- **TensorFlow**：2.9–2.19（已验证）  \n- **硬件**：CPU 即可；GPU 需 NVIDIA 显卡 + CUDA 11.2+（macOS 可用 `tensorflow-metal`）\n\n## 安装步骤\n```bash\n# 1. 创建并激活虚拟环境（可选）\npython -m venv venv\nsource venv\u002Fbin\u002Factivate  # Windows: venv\\Scripts\\activate\n\n# 2. 安装 keras-tcn\npip install keras-tcn -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n\n# 3. （可选）GPU 加速\n# NVIDIA\npip install tensorflow-determinism  # 保证结果可复现\n# macOS\npip install tensorflow-metal\n```\n\n## 基本使用\n```python\nfrom tensorflow.keras.models import Sequential\nfrom tensorflow.keras.layers import Dense\nfrom tcn import TCN\nimport numpy as np\n\n# 1. 构造示例数据：100 条序列，每条 50 步，每步 10 维特征\nx = np.random.rand(100, 50, 10).astype(np.float32)\ny = np.random.randint(0, 2, size=(100, 1))\n\n# 2. 
构建模型\nmodel = Sequential([\n    TCN(nb_filters=64,\n        kernel_size=3,\n        dilations=[1, 2, 4, 8],\n        padding='causal',\n        return_sequences=False),\n    Dense(1, activation='sigmoid')\n])\n\n# 3. 编译并训练\nmodel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])\nmodel.fit(x, y, epochs=5, batch_size=16)\n```\n\n运行后即可看到训练日志。","一家做智慧物流的初创公司，需要实时预测全国 2000 多个分拨中心的未来 7 天包裹到达量，以便提前调度人力与车辆。\n\n### 没有 keras-tcn 时\n- 用 LSTM 建模，序列长度一旦超过 96 步（4 天×24 小时），训练时间从 2 小时飙升到 8 小时，GPU 利用率却不到 30 %。  \n- 为了缓解梯度消失，不得不堆 3 层 LSTM，参数膨胀到 1200 万，线上推理延迟 180 ms，无法做到分钟级更新。  \n- 遇到“618”大促这种超长周期模式，模型只能看到最近 4 天，导致峰值预测误差高达 35 %，临时加人成本翻倍。  \n- 不同分拨中心的历史长度差异大，短的 30 天、长的 2 年，LSTM 需要手动补零或截断，代码里一团 if-else。  \n\n### 使用 keras-tcn 后\n- 同样 96 步输入，TCN 的并行卷积把训练时间从 8 小时压回 45 分钟，GPU 利用率稳定在 90 % 以上。  \n- 一层 TCN（nb_filters=128, dilations=[1,2,4,8,16,32]）只有 400 万参数，推理延迟降到 25 ms，分钟级滚动预测轻松实现。  \n- 扩张卷积让感受野直接覆盖 7 天（168 步），大促峰值误差降到 12 %，提前 2 天完成人力排班，节省加班费 20 万元。  \n- keras-tcn 支持变长输入，直接把 `(None, 24)` 喂进去即可，不同中心共用同一套模型文件，代码量减少 60 %。  \n\nkeras-tcn 用卷积替代循环，既让长序列预测更快更准，又把工程复杂度降到可维护范围。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphilipperemy_keras-tcn_9d7eb13a.png","philipperemy","Philippe Rémy","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fphilipperemy_ab4d68d0.jpg","From Paris to Bangkok via London, Tokyo, LA. Ex @ImperialCollegeLondon. Maths, Stats, Machine Learning.   
## Project Notes
- **Supported OS**: Linux, macOS, Windows
- **GPU**: optional; NVIDIA GPUs need CUDA 11.2+ (matching TensorFlow 2.9–2.19), and macOS users need `tensorflow-metal` for GPU support
- **Python version**: not stated upstream (TensorFlow 2.9–2.19 typically implies Python 3.7–3.11)
- **Dependency**: `tensorflow>=2.9,<=2.19`
- **License**: MIT
- **Install**: `pip install keras-tcn`; for reproducible experiments, additionally install `tensorflow-determinism`. Example tasks live in the `adding_problem/`, `copy_memory/`, and `mnist_pixel/` directories; run `main.py` in each to try them.

## FAQ

### After upgrading TCN, BatchNorm raises errors and the model structure changes. What can I do?
This was fixed in the latest release. If it still occurs:
1. Upgrade to the latest keras-tcn: `pip install keras-tcn --upgrade`.
2. Verify with the MNIST example: disable dropout and enable batch_norm; accuracy should reach ≈98.4% within 10 epochs.
3. If custom code still fails, temporarily pin an older version (e.g. 2.8.3).

(Source: https://github.com/philipperemy/keras-tcn/issues/88)

### Does TCN support a "stateful" mode for handling variable-length sequences?
The official implementation does not directly support stateful mode. To handle variable-length sequences or save memory:
- Cache past activations in an external FIFO queue (see the fast-wavenet implementation).
- Maintain the hidden state manually at inference time, splitting the TCN into per-timestep calls.
- Training still requires the full sequence; stateful operation mainly matters for online/streaming inference.

(Source: https://github.com/philipperemy/keras-tcn/issues/33)

### `from tcn import tcn` raises a SyntaxError (f-string) on Python 2.7. Why?
The source code uses Python 3.6+ f-strings. Either:
1. Upgrade to Python 3.6+.
2. If you must stay on Python 2.7, install the last compatible release without f-strings:
   ```bash
   pip install keras-tcn==2.7.1
   ```

(Source: https://github.com/philipperemy/keras-tcn/issues/24)

### The TensorBoard callback in Colab fails with a `tf.bool` type mismatch. What now?
This is a known conflict between TensorFlow 2.4–2.6 and weight normalization. Either:
1. Upgrade to TensorFlow ≥ 2.7: `pip install "tensorflow>=2.7"`.
2. Temporarily disable weight normalization by setting `use_weight_norm=False` on the TCN layer before adding the callback.

(Source: https://github.com/philipperemy/keras-tcn/issues/179)

### How do I get a complete multi-class time-series classification example?
Start from the regression example and change two things:
```python
# Final layer: multi-class output
model.add(Dense(num_classes, activation='softmax'))
# Loss: categorical cross-entropy
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```
The official examples have been updated; see the classification example in the README.

(Source: https://github.com/philipperemy/keras-tcn/issues/50)

### Does TCN support "many-to-many" regression?
Yes. Set `return_sequences=True` and keep the output sequence the same length as the input. The `copy_memory` example can be reused directly:
- Set `regression=True, classes=0`.
- With input and output sequences of equal length, this is many-to-many regression. No changes to TCN internals are needed.

(Source: https://github.com/philipperemy/keras-tcn/issues/7)

### What about the warning "Gradients do not exist for variables ['kernel', 'bias']"?
This warning usually means some layers do not participate in the loss computation. Check that:
1. A `loss` argument is passed to `model.compile()`.
2. When using `Sequential`, all TCN layers are inside the model and `return_sequences` is set correctly.
3. You are on the latest keras-tcn (≥ 3.5.1), which fixed some gradient-scope issues.

(Source: https://github.com/philipperemy/keras-tcn/issues/263)

## Releases

| Version | Date       | Notes |
| ------- | ---------- | ----- |
| 3.3.0   | 2021-02-16 |       |
| 3.2.1   | 2021-01-02 |       |
| 3.1.0   | 2020-04-25 | References https://arxiv.org/pdf/1803.01271.pdf |
| 2.7.0   | 2019-06-14 | Add activation at the end of the res block: o = activation(f(x) + x) |
| 2.5.7   | 2019-02-26 |       |
| 2.5.6   | 2019-02-24 |       |
| 2.3.6   | 2019-02-20 |       |
| 2.3.5   | 2019-01-09 |       |
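The 2.7.0 release note describes placing the activation after the residual addition, o = activation(f(x) + x). A minimal NumPy sketch of that pattern, with a toy causal dilated convolution standing in for the block's real layers (none of this is keras-tcn code):

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def causal_conv1d(x, w, dilation=1):
    """Toy causal dilated 1-D convolution over a (time,) signal."""
    k = len(w)
    y = np.zeros_like(x)
    for t in range(len(x)):
        for i in range(k):
            # Tap i looks back i * dilation steps; causal => never ahead.
            j = t - i * dilation
            if j >= 0:
                y[t] += w[i] * x[j]
    return y

def residual_block(x, w, dilation=1):
    # o = activation(f(x) + x): the activation is applied AFTER the
    # skip addition, as introduced in keras-tcn 2.7.0.
    f_x = causal_conv1d(x, w, dilation)
    return relu(f_x + x)

x = np.array([1.0, -2.0, 3.0, 0.5])
w = np.array([0.5, 0.25])  # kernel_size = 2
print(residual_block(x, w))  # values: [1.5, 0.0, 4.0, 1.5]
```

Placing the activation outside the sum (rather than on f(x) alone) means even the identity path passes through the nonlinearity, so negative residual outputs are clipped after, not before, the skip connection.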