[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-BloodAxe--pytorch-toolbelt":3,"tool-BloodAxe--pytorch-toolbelt":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",155373,2,"2026-04-14T11:34:08",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":78,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":95,"env_os":96,"env_gpu":97,"env_ram":98,"env_deps":99,"category_tags":104,"github_topics":105,"view_count":32,"oss_zip_url":79,"oss_zip_packed_at":79,"status":17,"created_at":122,"updated_at":123,"faqs":124,"releases":154},7510,"BloodAxe\u002Fpytorch-toolbelt","pytorch-toolbelt","PyTorch extensions for fast R&D prototyping and Kaggle farming","pytorch-toolbelt 是一款专为 PyTorch 设计的扩展库，旨在加速深度学习的研究原型开发与 Kaggle 竞赛实战。它并非要取代 Catalyst 或 Fast.ai 等高层框架，而是作为强有力的补充，提供了一系列开箱即用的“瑞士军刀”式功能，帮助开发者摆脱重复造轮子的困境。\n\n该工具主要解决了模型构建繁琐、常用模块缺失以及大尺寸图像推理困难等痛点。它将科研与竞赛中高频使用的代码封装成简洁接口，让用户能更专注于算法创新而非工程细节。特别适合从事计算机视觉的研究人员、数据科学家以及热衷于 Kaggle 竞赛的开发者使用。\n\n在技术亮点方面，pytorch-toolbelt 内置了灵活的编码器 - 解码器架构，可轻松搭建 U-Net 等经典模型；集成了 CoordConv、SCSE、Hypercolumn 等先进模块；提供了丰富的损失函数（如 Focal 
Loss、Dice Loss 等）以满足不同任务需求。此外，它还支持针对分割和分类任务的 GPU 加速测试时增强（TTA），并能高效处理高达 5000x5000 像素的超大图像推理，显著提升了实验效率与模型性能。","# Important Update\n\n![ukraine-flag](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FBloodAxe_pytorch-toolbelt_readme_4aab4defff0f.jpg)\n\nOn February 24th, 2022, Russia declared war and invaded peaceful Ukraine. \nAfter the annexation of Crimea and the occupation of the Donbas region, Putin's regime decided to destroy Ukrainian nationality.\nUkrainians show fierce resistance and demonstrate to the entire world what it's like to fight for the nation's independence.\n\nUkraine's government launched a website to help russian mothers, wives & sisters find their beloved ones killed or captured in Ukraine - https:\u002F\u002F200rf.com & https:\u002F\u002Ft.me\u002Frf200_now (Telegram channel).\nOur goal is to inform those still in Russia & Belarus, so they refuse to assault Ukraine. \n\nHelp us get maximum exposure to what is happening in Ukraine, violence, and inhuman acts of terror that the \"Russian World\" has brought to Ukraine. 
\nThis is a comprehensive Wiki on how you can help end this war: https:\u002F\u002Fhow-to-help-ukraine-now.super.site\u002F \n\nOfficial channels\n* [Official account of the Parliament of Ukraine](https:\u002F\u002Ft.me\u002Fverkhovnaradaofukraine)\n* [Ministry of Defence](https:\u002F\u002Fwww.facebook.com\u002FMinistryofDefence.UA)\n* [Office of the president](https:\u002F\u002Fwww.facebook.com\u002Fpresident.gov.ua)\n* [Cabinet of Ministers of Ukraine](https:\u002F\u002Fwww.facebook.com\u002FKabminUA)\n* [Center of strategic communications](https:\u002F\u002Fwww.facebook.com\u002FStratcomCentreUA)\n* [Minister of Foreign Affairs of Ukraine](https:\u002F\u002Ftwitter.com\u002FDmytroKuleba)\n\nGlory to Ukraine!\n\n\n# Pytorch-toolbelt\n\n`pytorch-toolbelt` is a Python library with a set of bells and whistles for PyTorch for fast R&D prototyping and Kaggle farming:\n\n## What's inside\n\n* Easy model building using flexible encoder-decoder architecture.\n* Modules: CoordConv, SCSE, Hypercolumn, Depthwise separable convolution and more.\n* GPU-friendly test-time augmentation (TTA) for segmentation and classification\n* GPU-friendly inference on huge (5000x5000) images\n* Everyday common routines (fix\u002Frestore random seed, filesystem utils, metrics)\n* Losses: BinaryFocalLoss, Focal, ReducedFocal, Lovasz, Jaccard and Dice losses, Wing Loss and more.\n* Extras for the [Catalyst](https:\u002F\u002Fgithub.com\u002Fcatalyst-team\u002Fcatalyst) library (visualization of batch predictions, additional metrics) \n\nShowcase: [Catalyst, Albumentations, Pytorch Toolbelt example: Semantic Segmentation @ CamVid](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1OUPJYU7TzH5Vz1si6FBkooackuIlzaGr#scrollTo=GUWuiO5K3aUm)\n\n# Why\n\nThe honest answer is \"I needed a convenient way to re-use code for my Kaggle career\". \nDuring 2018 I achieved a [Kaggle Master](https:\u002F\u002Fwww.kaggle.com\u002Fbloodaxe) badge and it has been a long path. 
\nVery often I found myself re-using most of the old pipelines over and over again. \nAt some point it crystallized into this repository. \n\nThis lib is not meant to replace catalyst \u002F ignite \u002F fast.ai high-level frameworks. Instead, it's designed to complement them.\n\n# Installation\n\n`pip install pytorch_toolbelt`\n\n# How do I ... \n\n## Model creation\n\n### Create Encoder-Decoder U-Net model\n\nBelow is a code snippet that creates a vanilla U-Net model for binary segmentation. \nBy design, both the encoder and the decoder produce a list of tensors, from fine (high-resolution, indexed `0`) to coarse (low-resolution) feature maps. \nAccess to all intermediate feature maps is beneficial if you want to apply deep supervision losses on them, or for encoder-decoder object detection tasks, where access to intermediate feature maps is necessary.\n \n```python\nfrom torch import nn\nfrom pytorch_toolbelt.modules import encoders as E\nfrom pytorch_toolbelt.modules import decoders as D\n\nclass UNet(nn.Module):\n    def __init__(self, input_channels, num_classes):\n        super().__init__()\n        self.encoder = E.UnetEncoder(in_channels=input_channels, out_channels=32, growth_factor=2)\n        self.decoder = D.UNetDecoder(self.encoder.channels, decoder_features=32)\n        self.logits = nn.Conv2d(self.decoder.channels[0], num_classes, kernel_size=1)\n\n    def forward(self, x):\n        x = self.encoder(x)\n        x = self.decoder(x)\n        return self.logits(x[0])\n```\n\n### Create Encoder-Decoder FPN model with pretrained encoder\n\nSimilarly to the previous example, you can change the decoder to an FPN with concatenation. 
\n\n```python\nfrom torch import nn\nfrom pytorch_toolbelt.modules import encoders as E\nfrom pytorch_toolbelt.modules import decoders as D\n\nclass SEResNeXt50FPN(nn.Module):\n    def __init__(self, num_classes, fpn_channels):\n        super().__init__()\n        self.encoder = E.SEResNeXt50Encoder()\n        self.decoder = D.FPNCatDecoder(self.encoder.channels, fpn_channels)\n        self.logits = nn.Conv2d(self.decoder.channels[0], num_classes, kernel_size=1)\n\n    def forward(self, x):\n        x = self.encoder(x)\n        x = self.decoder(x)\n        return self.logits(x[0])\n```\n\n### Change number of input channels for the Encoder\n\nAll encoders from `pytorch_toolbelt` support changing the number of input channels. Simply call `encoder.change_input_channels(num_channels)` and the first convolution layer will be changed.\nWhenever possible, the existing weights of the convolutional layer will be re-used (if the new number of channels is greater than the default, the new weight tensor will be padded with randomly-initialized weights).\nThe method returns `self`, so the call can be chained.\n\n\n```python\nfrom pytorch_toolbelt.modules import encoders as E\n\nencoder = E.SEResnet101Encoder()\nencoder = encoder.change_input_channels(6)\n```\n\n\n## Misc\n\n\n### Count number of parameters in encoder\u002Fdecoder and other modules\n\nWhen designing a model and optimizing the number of features in a neural network, I found it quite useful to print the number of parameters in high-level blocks (like `encoder` and `decoder`).\nHere is how to do it with `pytorch_toolbelt`:\n\n\n```python\nfrom torch import nn\nfrom pytorch_toolbelt.modules import encoders as E\nfrom pytorch_toolbelt.modules import decoders as D\nfrom pytorch_toolbelt.utils import count_parameters\n\nclass SEResNeXt50FPN(nn.Module):\n    def __init__(self, num_classes, fpn_channels):\n        super().__init__()\n        self.encoder = E.SEResNeXt50Encoder()\n        self.decoder = D.FPNCatDecoder(self.encoder.channels, 
fpn_channels)\n        self.logits = nn.Conv2d(self.decoder.channels[0], num_classes, kernel_size=1)\n\n    def forward(self, x):\n        x = self.encoder(x)\n        x = self.decoder(x)\n        return self.logits(x[0])\n\nnet = SEResNeXt50FPN(1, 128)\nprint(count_parameters(net))\n# Prints {'total': 34232561, 'trainable': 34232561, 'encoder': 25510896, 'decoder': 8721536, 'logits': 129}\n\n```\n\n### Compose multiple losses\n\nThere are multiple ways to combine losses, and high-level DL frameworks like Catalyst offer far more flexible ways to achieve this, but here is my 100%-pure PyTorch implementation:\n\n```python\nfrom pytorch_toolbelt import losses as L\n\n# Creates a loss function that is a weighted sum of focal loss \n# and lovasz loss with weights 1.0 and 0.5 respectively.\nloss = L.JointLoss(L.FocalLoss(), L.LovaszLoss(), 1.0, 0.5)\n```\n\n\n## TTA \u002F Inferencing\n\n### Apply Test-time augmentation (TTA) for the model\n\nTest-time augmentation (TTA) can be used in both training and testing phases. \n\n```python\nfrom pytorch_toolbelt.inference import tta\n\nmodel = UNet()\n\n# Truly functional TTA for image classification using horizontal flips:\nlogits = tta.fliplr_image2label(model, input)\n\n# Truly functional TTA for image segmentation using D4 augmentation:\nlogits = tta.d4_image2mask(model, input)\n\n```\n\n### Inference on huge images:\n\nQuite often, there is a need to perform image segmentation on enormously big images (5000px and more). There are a few problems with such big pixel arrays:\n 1. There are size limitations on the maximum size of CUDA tensors (concrete numbers depend on the driver and GPU version)\n 2. Heavy CNN architectures may eat up all available GPU memory with ease when inferencing relatively small 1024x1024 images, leaving no room for bigger image resolutions.\n  \nOne of the solutions is to slice the input image into tiles (optionally overlapping), feed each tile through the model, and merge the results back. 
\nIn this way you can guarantee an upper limit on GPU RAM usage, while keeping the ability to process arbitrary-sized images on the GPU.\n  \n\n```python\nimport numpy as np\nfrom torch.utils.data import DataLoader\nimport cv2\n\nfrom pytorch_toolbelt.inference.tiles import ImageSlicer, CudaTileMerger\nfrom pytorch_toolbelt.utils.torch_utils import tensor_from_rgb_image, to_numpy\n\n\nimage = cv2.imread('really_huge_image.jpg')\nmodel = get_model(...)\n\n# Cut large image into overlapping tiles\ntiler = ImageSlicer(image.shape, tile_size=(512, 512), tile_step=(256, 256))\n\n# HWC -> CHW. Optionally, do normalization here\ntiles = [tensor_from_rgb_image(tile) for tile in tiler.split(image)]\n\n# Allocate a CUDA buffer for holding the entire mask\nmerger = CudaTileMerger(tiler.target_shape, 1, tiler.weight)\n\n# Run predictions for tiles and accumulate them\nfor tiles_batch, coords_batch in DataLoader(list(zip(tiles, tiler.crops)), batch_size=8, pin_memory=True):\n    tiles_batch = tiles_batch.float().cuda()\n    pred_batch = model(tiles_batch)\n\n    merger.integrate_batch(pred_batch, coords_batch)\n\n# Normalize accumulated mask and convert back to numpy\nmerged_mask = np.moveaxis(to_numpy(merger.merge()), 0, -1).astype(np.uint8)\nmerged_mask = tiler.crop_to_orignal_size(merged_mask)\n```\n\n## Advanced examples\n\n1. [Inria Satellite Segmentation](https:\u002F\u002Fgithub.com\u002FBloodAxe\u002FCatalyst-Inria-Segmentation-Example)\n1. 
[CamVid Semantic Segmentation](https:\u002F\u002Fgithub.com\u002FBloodAxe\u002FCatalyst-CamVid-Segmentation-Example)\n\n\n## Citation\n\n```\n@misc{Khvedchenya_Eugene_2019_PyTorch_Toolbelt,\n  author = {Khvedchenya, Eugene},\n  title = {PyTorch Toolbelt},\n  year = {2019},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt}},\n  commit = {cc5e9973cdb0dcbf1c6b6e1401bf44b9c69e13f3}\n}\n```\n","# 重要更新\n\n![乌克兰国旗](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FBloodAxe_pytorch-toolbelt_readme_4aab4defff0f.jpg)\n\n2022年2月24日，俄罗斯宣布开战并入侵和平的乌克兰。继吞并克里米亚和占领顿巴斯地区之后，普京政权决定摧毁乌克兰民族认同。乌克兰人民展现出顽强的抵抗精神，向全世界展示了为国家独立而战的意义。\n\n乌克兰政府开通了一个网站，帮助俄罗斯的母亲、妻子和姐妹寻找在乌克兰阵亡或被俘的亲人——https:\u002F\u002F200rf.com 和 https:\u002F\u002Ft.me\u002Frf200_now（Telegram频道）。我们的目标是让仍身处俄罗斯和白俄罗斯的人们了解真相，从而拒绝参与对乌克兰的侵略。\n\n请帮助我们让更多人了解乌克兰正在发生的事情——“俄罗斯世界”给乌克兰带来的暴力与非人道恐怖行径。这里有一个全面的维基页面，介绍你可以如何帮助结束这场战争：https:\u002F\u002Fhow-to-help-ukraine-now.super.site\u002F\n\n官方渠道：\n* [乌克兰议会官方账号](https:\u002F\u002Ft.me\u002Fverkhovnaradaofukraine)\n* [乌克兰国防部](https:\u002F\u002Fwww.facebook.com\u002FMinistryofDefence.UA)\n* [乌克兰总统办公室](https:\u002F\u002Fwww.facebook.com\u002Fpresident.gov.ua)\n* [乌克兰内阁](https:\u002F\u002Fwww.facebook.com\u002FKabminUA)\n* [战略传播中心](https:\u002F\u002Fwww.facebook.com\u002FStratcomCentreUA)\n* [乌克兰外交部长](https:\u002F\u002Ftwitter.com\u002FDmytroKuleba)\n\n荣耀归于乌克兰！\n\n\n# Pytorch-toolbelt\n\n`pytorch-toolbelt` 是一个基于 PyTorch 的 Python 库，提供了一系列实用工具和模块，旨在加速研发原型设计和 Kaggle 竞赛中的模型训练：\n\n## 包含内容\n\n* 基于灵活编码器-解码器架构的便捷模型构建。\n* 模块：CoordConv、SCSE、Hypercolumn、深度可分离卷积等。\n* 针对分割和分类任务的 GPU 友好型测试时增强 TTA。\n* 对超大尺寸图像（如 5000x5000）进行 GPU 加速推理。\n* 常用工具函数：固定\u002F恢复随机种子、文件系统工具、评估指标等。\n* 损失函数：BinaryFocalLoss、Focal、ReducedFocal、Lovasz、Jaccard 和 Dice 损失、Wing Loss 等。\n* 与 [Catalyst](https:\u002F\u002Fgithub.com\u002Fcatalyst-team\u002Fcatalyst) 库的集成扩展功能（批量预测可视化、额外指标）。\n\n示例：[Catalyst、Albumentations、Pytorch Toolbelt 示例：CamVid 
数据集上的语义分割](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1OUPJYU7TzH5Vz1si6FBkooackuIlzaGr#scrollTo=GUWuiO5K3aUm)\n\n# 缘由\n\n坦率地说，“我需要一种方便的方式来复用代码，以支持我的 Kaggle 职业生涯”。2018 年，我获得了 [Kaggle Master](https:\u002F\u002Fwww.kaggle.com\u002Fbloodaxe) 称号，这是一段漫长的过程。我经常发现自己一遍又一遍地重复使用旧的代码流程。最终，这些经验积累形成了这个库。\n\n该库并非旨在取代 Catalyst、Ignite 或 Fast.ai 等高级框架，而是作为它们的补充。\n\n# 安装\n\n`pip install pytorch_toolbelt`\n\n# 如何...\n\n## 模型创建\n\n### 创建 U-Net 编码器-解码器模型\n\n以下代码片段创建了一个用于二分类分割的原生 U-Net 模型。按照设计，编码器和解码器都会输出一系列特征图，从精细（高分辨率，索引为 0）到粗糙（低分辨率）。访问所有中间特征图非常有用，例如在应用深度监督损失时，或者在目标检测任务中，中间特征图的访问往往是必要的。\n\n```python\nfrom torch import nn\nfrom pytorch_toolbelt.modules import encoders as E\nfrom pytorch_toolbelt.modules import decoders as D\n\nclass UNet(nn.Module):\n    def __init__(self, input_channels, num_classes):\n        super().__init__()\n        self.encoder = E.UnetEncoder(in_channels=input_channels, out_channels=32, growth_factor=2)\n        self.decoder = D.UNetDecoder(self.encoder.channels, decoder_features=32)\n        self.logits = nn.Conv2d(self.decoder.channels[0], num_classes, kernel_size=1)\n\n    def forward(self, x):\n        x = self.encoder(x)\n        x = self.decoder(x)\n        return self.logits(x[0])\n```\n\n### 创建带有预训练编码器的 FPN 编码器-解码器模型\n\n与前一个示例类似，你可以将解码器替换为带有特征融合的 FPN 解码器。\n\n```python\nfrom torch import nn\nfrom pytorch_toolbelt.modules import encoders as E\nfrom pytorch_toolbelt.modules import decoders as D\n\nclass SEResNeXt50FPN(nn.Module):\n    def __init__(self, num_classes, fpn_channels):\n        super().__init__()\n        self.encoder = E.SEResNeXt50Encoder()\n        self.decoder = D.FPNCatDecoder(self.encoder.channels, fpn_channels)\n        self.logits = nn.Conv2d(self.decoder.channels[0], num_classes, kernel_size=1)\n\n    def forward(self, x):\n        x = self.encoder(x)\n        x = self.decoder(x)\n        return self.logits(x[0])\n```\n\n### 更改编码器的输入通道数\n\n`pytorch_toolbelt` 中的所有编码器都支持更改输入通道数。只需调用 
`encoder.change_input_channels(num_channels)`，即可修改第一层卷积的输入通道数。在可能的情况下，现有卷积层的权重会被重用（如果新通道数大于默认值，则会用随机初始化的权重填充新增部分）。该方法返回 `self`，因此可以链式调用。\n\n```python\nfrom pytorch_toolbelt.modules import encoders as E\n\nencoder = E.SEResnet101Encoder()\nencoder = encoder.change_input_channels(6)\n```\n\n## 其他\n\n## 统计编码器\u002F解码器及其他模块的参数量\n\n在设计模型并优化神经网络中的特征数量时，我发现打印高层模块（如编码器和解码器）的参数量非常有帮助。以下是使用 `pytorch_toolbelt` 实现的方法：\n\n```python\nfrom torch import nn\nfrom pytorch_toolbelt.modules import encoders as E\nfrom pytorch_toolbelt.modules import decoders as D\nfrom pytorch_toolbelt.utils import count_parameters\n\nclass SEResNeXt50FPN(nn.Module):\n    def __init__(self, num_classes, fpn_channels):\n        super().__init__()\n        self.encoder = E.SEResNeXt50Encoder()\n        self.decoder = D.FPNCatDecoder(self.encoder.channels, fpn_channels)\n        self.logits = nn.Conv2d(self.decoder.channels[0], num_classes, kernel_size=1)\n\n    def forward(self, x):\n        x = self.encoder(x)\n        x = self.decoder(x)\n        return self.logits(x[0])\n\nnet = SEResNeXt50FPN(1, 128)\nprint(count_parameters(net))\n# 输出：{'total': 34232561, 'trainable': 34232561, 'encoder': 25510896, 'decoder': 8721536, 'logits': 129}\n```\n\n### 组合多个损失函数\n\n虽然像 Catalyst 这样的高级深度学习框架提供了更灵活的方式来组合多损失，但这里是一个纯 PyTorch 实现的多损失组合方法：\n\n```python\nfrom pytorch_toolbelt import losses as L\n\n# 创建一个损失函数，它是焦点损失和 Lovasz 损失的加权和，权重分别为 1.0 和 0.5。\nloss = L.JointLoss(L.FocalLoss(), L.LovaszLoss(), 1.0, 0.5)\n```\n\n## TTA \u002F 推理\n\n### 为模型应用测试时增强（TTA）\n\n测试时增强（TTA）可以在训练和测试阶段使用。\n\n```python\nfrom pytorch_toolbelt.inference import tta\n\nmodel = UNet()\n\n# 使用水平翻转实现真正有效的图像分类 TTA：\nlogits = tta.fliplr_image2label(model, input)\n\n# 使用 D4 增强实现真正有效的图像分割 TTA：\nlogits = tta.d4_image2mask(model, input)\n```\n\n### 对超大图像进行推理：\n\n在许多情况下，需要对非常大的图像（5000像素及以上）进行分割。处理如此大的像素数组会遇到几个问题：\n1. CUDA 张量的最大尺寸存在限制（具体数值取决于驱动程序和 GPU 版本）。\n2. 
复杂的 CNN 架构在推理相对较小的 1024x1024 图像时，可能会轻易耗尽所有可用的 GPU 内存，从而无法处理更大分辨率的图像。\n\n一种解决方案是将输入图像切分成小块（可选重叠），分别送入模型进行推理，然后将结果拼接起来。这样既可以确保 GPU 内存的使用上限，又能够处理任意大小的图像。\n\n```python\nimport numpy as np\nfrom torch.utils.data import DataLoader\nimport cv2\n\nfrom pytorch_toolbelt.inference.tiles import ImageSlicer, CudaTileMerger\nfrom pytorch_toolbelt.utils.torch_utils import tensor_from_rgb_image, to_numpy\n\n\nimage = cv2.imread('really_huge_image.jpg')\nmodel = get_model(...)\n\n# 将大图像切割成重叠的小块\ntiler = ImageSlicer(image.shape, tile_size=(512, 512), tile_step=(256, 256))\n\n# HWC -> CHW。可选在此处进行归一化\ntiles = [tensor_from_rgb_image(tile) for tile in tiler.split(image)]\n\n# 分配一个 CUDA 缓冲区来存储完整的掩码\nmerger = CudaTileMerger(tiler.target_shape, 1, tiler.weight)\n\n# 对每个小块进行预测并累积结果\nfor tiles_batch, coords_batch in DataLoader(list(zip(tiles, tiler.crops)), batch_size=8, pin_memory=True):\n    tiles_batch = tiles_batch.float().cuda()\n    pred_batch = model(tiles_batch)\n\n    merger.integrate_batch(pred_batch, coords_batch)\n\n# 对累积的掩码进行归一化，并转换回 NumPy 数组\nmerged_mask = np.moveaxis(to_numpy(merger.merge()), 0, -1).astype(np.uint8)\nmerged_mask = tiler.crop_to_orignal_size(merged_mask)\n```\n\n## 高级示例\n\n1. [Inria 卫星图像分割](https:\u002F\u002Fgithub.com\u002FBloodAxe\u002FCatalyst-Inria-Segmentation-Example)\n1. 
[CamVid 语义分割](https:\u002F\u002Fgithub.com\u002FBloodAxe\u002FCatalyst-CamVid-Segmentation-Example)\n\n\n## 引用\n\n```\n@misc{Khvedchenya_Eugene_2019_PyTorch_Toolbelt,\n  author = {Khvedchenya, Eugene},\n  title = {PyTorch Toolbelt},\n  year = {2019},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt}},\n  commit = {cc5e9973cdb0dcbf1c6b6e1401bf44b9c69e13f3}\n}\n```","# PyTorch Toolbelt 快速上手指南\n\n`pytorch-toolbelt` 是一个专为 PyTorch 设计的工具库，旨在加速研发原型设计和 Kaggle 竞赛。它提供了灵活的编码器 - 解码器架构、丰富的模块（如 CoordConv, SCSE）、高效的测试时增强（TTA）以及多种常用的损失函数。\n\n## 环境准备\n\n*   **系统要求**：Linux, macOS 或 Windows\n*   **Python 版本**：建议 Python 3.7+\n*   **前置依赖**：\n    *   PyTorch (建议最新版本)\n    *   torchvision\n    *   NumPy\n    *   OpenCV-Python (`cv2`) (用于大图像推理示例)\n\n请确保已正确安装 CUDA 驱动及对应的 PyTorch GPU 版本以获得最佳性能。\n\n## 安装步骤\n\n使用 pip 进行安装。国内用户推荐使用清华源或阿里源以加速下载。\n\n**通用安装命令：**\n```bash\npip install pytorch-toolbelt\n```\n\n**使用国内镜像源安装（推荐）：**\n```bash\npip install pytorch-toolbelt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## 基本使用\n\n### 1. 
构建编码器 - 解码器模型 (U-Net)\n\n`pytorch-toolbelt` 的核心优势在于快速搭建基于 Encoder-Decoder 架构的模型。以下是一个创建用于二分类分割的 vanilla U-Net 的示例：\n\n```python\nfrom torch import nn\nfrom pytorch_toolbelt.modules import encoders as E\nfrom pytorch_toolbelt.modules import decoders as D\n\nclass UNet(nn.Module):\n    def __init__(self, input_channels, num_classes):\n        super().__init__()\n        # 定义编码器：输入通道数，基础输出通道数，增长因子\n        self.encoder = E.UnetEncoder(in_channels=input_channels, out_channels=32, growth_factor=2)\n        # 定义解码器：使用编码器的通道配置\n        self.decoder = D.UNetDecoder(self.encoder.channels, decoder_features=32)\n        # 定义最终输出层\n        self.logits = nn.Conv2d(self.decoder.channels[0], num_classes, kernel_size=1)\n\n    def forward(self, x):\n        x = self.encoder(x)\n        x = self.decoder(x)\n        # 取最高分辨率的特征图进行预测\n        return self.logits(x[0])\n\n# 实例化模型\nmodel = UNet(input_channels=3, num_classes=1)\n```\n\n### 2. 组合损失函数\n\n库中内置了多种常用的损失函数（如 Focal Loss, Lovasz Loss, Dice Loss 等），并支持轻松组合：\n\n```python\nfrom pytorch_toolbelt import losses as L\n\n# 创建一个加权组合损失：Focal Loss (权重 1.0) + Lovasz Loss (权重 0.5)\nloss_fn = L.JointLoss(L.FocalLoss(), L.LovaszLoss(), 1.0, 0.5)\n\n# 在训练循环中使用\n# output = model(input)\n# loss = loss_fn(output, target)\n```\n\n### 3. 统计模型参数量\n\n在设计模型时，快速查看各部分（Encoder\u002FDecoder）的参数量非常有用：\n\n```python\nfrom pytorch_toolbelt.utils import count_parameters\n\n# 假设已经定义了上述 model\nparams_info = count_parameters(model)\nprint(params_info)\n# 输出示例：{'total': ..., 'trainable': ..., 'encoder': ..., 'decoder': ...}\n```\n\n### 4. 测试时增强 (TTA)\n\n利用内置的 TTA 功能提升推理效果，支持分类和分割任务：\n\n```python\nfrom pytorch_toolbelt.inference import tta\n\n# 图像分类：使用水平翻转 TTA\n# logits = tta.fliplr_image2label(model, input_tensor)\n\n# 图像分割：使用 D4 (旋转 + 翻转) 增强 TTA\n# mask_logits = tta.d4_image2mask(model, input_tensor)\n```\n\n### 5. 
超大图像推理\n\n针对无法一次性放入显存的超大图像（如 5000x5000 像素），库提供了切片推理与合并工具：\n\n```python\nimport numpy as np\nimport cv2\nimport torch\nfrom torch.utils.data import DataLoader\nfrom pytorch_toolbelt.inference.tiles import ImageSlicer, CudaTileMerger\nfrom pytorch_toolbelt.utils.torch_utils import tensor_from_rgb_image, to_numpy\n\n# 读取大图\nimage = cv2.imread('really_huge_image.jpg')\nmodel = get_model(...)  # 加载你自己的模型\n\n# 1. 将大图切割为重叠的瓦片 (tile)\ntiler = ImageSlicer(image.shape, tile_size=(512, 512), tile_step=(256, 256))\ntiles = [tensor_from_rgb_image(tile) for tile in tiler.split(image)]\n\n# 2. 初始化 CUDA 合并器\nmerger = CudaTileMerger(tiler.target_shape, 1, tiler.weight)\n\n# 3. 分批推理并合并结果\nfor tiles_batch, coords_batch in DataLoader(list(zip(tiles, tiler.crops)), batch_size=8, pin_memory=True):\n    tiles_batch = tiles_batch.float().cuda()\n    with torch.no_grad():\n        pred_batch = model(tiles_batch)\n    merger.integrate_batch(pred_batch, coords_batch)\n\n# 4. 获取最终合并后的掩码\nmerged_mask = np.moveaxis(to_numpy(merger.merge()), 0, -1).astype(np.uint8)\nmerged_mask = tiler.crop_to_orignal_size(merged_mask)\n```","某计算机视觉团队正在开发一套针对高分辨率病理切片（5000x5000 像素）的肿瘤分割系统，需要在有限时间内验证多种网络架构以提升 Kaggle 竞赛成绩。\n\n### 没有 pytorch-toolbelt 时\n- **模型搭建繁琐**：手动编写 U-Net 等编码器 - 解码器结构耗时费力，难以快速调整中间特征层以应用深度监督损失。\n- **大图推理崩溃**：直接对超大尺寸病理图进行推理极易导致显存溢出（OOM），需自行编写复杂的分块滑动窗口逻辑。\n- **实验迭代缓慢**：缺乏内置的高级损失函数（如 Focal Loss、Dice Loss）和测试时增强（TTA）模块，每次尝试新策略都要重复造轮子。\n- **代码复用率低**：种子固定、文件处理等日常工具函数散落在各个脚本中，维护困难且容易引入不一致性。\n\n### 使用 pytorch-toolbelt 后\n- **架构灵活构建**：利用其灵活的 Encoder-Decoder API，几行代码即可组装包含 CoordConv 或 SCSE 模块的自定义模型，轻松访问所有中间特征图。\n- **高效大图处理**：调用内置的 GPU 友好型推理接口，无缝支持 5000x5000 级别图像的显存优化处理，无需关心底层分块细节。\n- **策略快速验证**：直接导入 BinaryFocalLoss、Lovasz 等现成损失函数及 TTA 模块，显著缩短从想法到实验结果的路径。\n- **工程规范统一**：复用库中经过验证的日常工具例程，确保随机种子、指标计算等环节的一致性，让团队专注于核心算法创新。\n\npytorch-toolbelt 
通过提供丰富的预置模块和工程化组件，将研究人员从重复的基础设施搭建中解放出来，实现了从“写代码”到“搞研发”的效率飞跃。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FBloodAxe_pytorch-toolbelt_2d4a9386.png","BloodAxe","Eugene Khvedchenya","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FBloodAxe_206d5841.jpg","Senior Deep Learning Engineer @NVidia\r\nKaggle Grandmaster\r\nAuthor of pytorch-toolbelt\r\nCo-Author of albumentations (No more after 2022)","@NVidia","Odesa, Ukraine","ekhvedchenya@gmail.com",null,"https:\u002F\u002Fcomputer-vision-talks.com","https:\u002F\u002Fgithub.com\u002FBloodAxe",[83,87],{"name":84,"color":85,"percentage":86},"Python","#3572A5",100,{"name":88,"color":89,"percentage":90},"Makefile","#427819",0,1571,126,"2026-04-13T07:40:21","MIT",1,"","需要 NVIDIA GPU（用于 TTA 和大图像推理），具体型号和显存大小未说明，但支持处理 5000x5000 像素的大图像","未说明",{"notes":100,"python":98,"dependencies":101},"该库主要用于 PyTorch 的快速研发原型设计和 Kaggle 竞赛。支持编码器 - 解码器架构模型构建、多种损失函数、测试时增强（TTA）以及超大图像的切片推理。安装命令为 `pip install pytorch_toolbelt`。README 中未明确列出具体的操作系统、Python 版本或详细的硬件配置要求。",[102,103],"torch","catalyst (可选)",[14,15,16],[106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121],"pytorch","kaggle","image-classification","image-segmentation","deep-learning","segmentation","python","image-processing","machine-learning","focal-loss","jaccard-loss","tta","test-time-augmentation","augmentation","object-detection","pipeline","2026-03-27T02:49:30.150509","2026-04-15T04:29:14.794933",[125,130,135,140,145,149],{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},33666,"在使用 ImageSlicer 和 CudaTileMerger 进行大图推理时，为什么会出现特征丢失或形状异常的问题？","这通常不是合并函数本身的问题，而是因为没有正确设置重叠（overlap）和权重。关键解决方案是：\n1. 确保切片之间有重叠，以减轻边缘效应。\n2. 
使用 `weight='mean'` 参数。\n推荐配置示例：`ImageSlicer(tile_size=512, tile_step=256, weight=\"mean\")`。如果是在 Windows 上遇到特定问题，请检查代码实现是否有平台相关的 Bug，Linux 环境下通常表现正常。","https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fissues\u002F23",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},33667,"BalancedBCEWithLogitsLoss 中的 `gamma` 参数有什么作用？","`gamma` 是作者对该损失函数的扩展参数。当 `gamma=1` 时，它等同于标准的 Balanced BCE (BBCE)。当 `gamma > 1` 时，它会衰减平衡权重。例如，如果 `gamma = 2`，正样本比例为 0.25，负样本比例为 0.75，则对应的权重将分别变为 0.25^2 和 0.75^2。这允许用户调整对难易样本的关注程度。","https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fissues\u002F65",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},33668,"为什么当目标标签全为 0（没有正样本）时，Dice Loss 返回 0 而不是 1 或 NaN？","这是设计行为。当没有正样本目标（即传入全零张量）时，Dice 指标在数学上是未定义的。为了避免损失计算中出现 `NaN`，代码逻辑会将其回退（fallback）为 0。如果你希望避免这种情况，可以在输入中添加一个极小值（epsilon），例如 `zeros += 1e-90`，但这通常不需要，因为返回 0 表示在该特定情况下没有产生惩罚。","https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fissues\u002F59",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},33669,"pytorch-toolbelt 是否必须依赖 `opencv-python`？在无头（headless）Docker 环境中如何使用？","库并不强制绑定特定的 OpenCV 包。在无头环境（如 Docker）中，建议安装 `opencv-python-headless` 而不是标准的 `opencv-python`。库内部会检查是否已安装任何 OpenCV 版本（通过 `import cv2`），如果已存在则不会强制安装其他版本。如果遇到依赖冲突，可以手动安装 headless 版本并确保其在导入路径中优先。","https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fissues\u002F71",{"id":146,"question_zh":147,"answer_zh":148,"source_url":129},33670,"如何优化金字塔补丁权重损失（pyramid patch weight loss）的计算性能？","原始的 Numpy 实现可能较慢。可以通过以下方式优化：\n1. 替换 `sqrt` 和 `square` 等慢速操作。\n2. 使用更高效的权重生成逻辑，例如基于距离中心的线性权重：`De_x = np.minimum(x, width - x)` 和 `De_y = np.minimum(y, height - y)`，然后计算 `res = De_x * De_y`。\n3. 
如果需要更强的中心权重，可以对结果进行平方处理 `np.square(res)`。这些方法比原始实现效果更好且速度更快。",{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},33671,"在使用 tiling 工具配合 YOLOv5 模型时，遇到 'The size of tensor a must match the size of tensor b' 错误怎么办？","该错误通常是因为模型输出的特征图尺寸与切片工具预期的权重图尺寸不匹配。YOLOv5 的输出格式可能与工具默认假设不同。解决方案包括：\n1. 检查模型输出张量的维度，确保其与图像切片后的尺寸逻辑一致。\n2. 如果使用的是 DetectMultiBackend，需确认其输出是否已经过适当的后处理或维度调整。\n3. 考虑切换到实现更明确的模型版本（如 YOLOv4 Darknet 实现），或者手动调整 `integrate_batch` 中的张量拼接逻辑以适配 YOLOv5 的输出形状。","https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fissues\u002F68",[155,160,165,169,174,179,184,189,194,199,204,209,214,219,224,229,234,239,244,249],{"id":156,"version":157,"summary_zh":158,"released_at":159},259786,"0.8.0","## 变更内容\n\n* @BloodAxe 在 https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fpull\u002F92 中对功能和模块进行了全面重构\n* @BloodAxe 在 https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fpull\u002F96 中对功能和模块进行了全面重构\n\n* @BloodAxe 在 https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fpull\u002F97 中发布了 0.7.0 版本\n* @BloodAxe 在 https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fpull\u002F98 中合并了来自 BloodAxe\u002Fdevelop 分支的 #97 拉取请求\n* @MrPrajwalB 在 https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fpull\u002F102 中指出，简化的焦点损失实现对于二分类情况是不正确的\n\n## 新贡献者\n* @MrPrajwalB 在 https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fpull\u002F102 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fcompare\u002F0.7.0...0.8.0","2024-11-21T20:06:36",{"id":161,"version":162,"summary_zh":163,"released_at":164},259787,"0.7.0","# 新功能\n\n* 所有编码器、解码器和头部现在都继承自 `HasOutputFeaturesSpecification` 接口，以便查询该模块输出的通道数和步幅。\n* 新增损失类 `QualityFocalLoss`，源自 https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.04388。\n* 新函数 `pad_tensor_to_size`——一个用于 N 维张量 `[B,C, ...]` 形状的通用填充函数。\n* 添加了 `DropPath` 层（又称 DropConnect）。\n* 提供了 SegFormer 主干网络的预训练权重。\n* `first_class_background_init` 
用于初始化最后一个输出卷积或线性层：权重设为零，偏置设置为 `[logit(bg_prob), logit(1-bg_prob), ...]`。\n* 新函数 `instantiate_normalization_block` 可根据名称创建归一化层。此函数已在部分解码器层和头部中使用。\n\n# 改进\n\n* 通过显式禁用该函数的 AMP 自动混合精度，并将预测值和目标值转换为 `float32` 类型，提高了 `focal_loss_with_logits` 函数的数值精度。\n* `MultiscaleTTA` 现在允许设置输入和预测结果缩放时的插值模式和 `align_corners` 参数。\n* `BinaryFocalLoss` 现在实现了 `__repr__` 方法。\n* `name_for_stride` 现在可以接受 `stride` 参数为 `None` 的情况。此时该函数不执行任何操作，直接返回输入参数 `name`。\n* `RandomSubsetDataset` 现在新增可选的 `weights` 参数，用于按给定概率选择样本。\n\n# 错误修复\n\n* `RandomSubsetDataset` 中 `get_collate_fn` 的实现存在错误，它会直接返回 collate 函数本身，而不是调用该函数。\n\n# 破坏性变更\n\n* 解码器的签名已更改，要求第一个参数 `input_spec` 必须为 `FeatureMapsSpecification` 类型。\n* 重写了 BiFPN 解码器，以支持任意数量的输入特征图，以及用户自定义的归一化、激活函数和 BiFPN 块。\n* 重写了 UNetDecoder，允许将上采样块指定为字符串类型。\n\n* 已移除 `WeightedLoss` 和 `JointLoss` 类。如果您的代码曾使用过这些类，您可以将其复制粘贴到项目中继续使用，但我强烈建议您改用现代深度学习框架，这些框架支持通过配置文件定义损失函数。\n```python\nclass WeightedLoss(_Loss):\n    \"\"\"围绕损失函数的包装类，应用固定因子进行加权。此类有助于平衡不同尺度的多个损失。\"\"\"\n\n    def __init__(self, loss, weight=1.0):\n        super().__init__()\n        self.loss = loss\n        self.weight = weight\n\n    def forward(self, *input):\n        return self.loss(*input) * self.weight\n\n\nclass JointLoss(_Loss):\n    \"\"\"\n    将两个损失函数封装为一个。该类计算两个损失的加权和。\n    \"\"\"\n\n    def __init__(self, first: nn.Module, second: nn.Module, first_weight=1.0, second_weight=1.0):\n        super().__init__()\n        self.first = WeightedLoss(first, first_weight)\n        self.second = WeightedLoss(second, second_weight)\n\n    def forward(self, *input):\n        return self.first(*input) + self.second(*input)\n```","2023-08-19T14:26:10",{"id":166,"version":167,"summary_zh":79,"released_at":168},259788,"0.6.2","2022-12-25T21:39:53",{"id":170,"version":171,"summary_zh":172,"released_at":173},259789,"0.6.1","# PyTorch Toolbelt 0.6.1\n\n* 修复了 CI 流水线\n* 增加了对 Python 3.10 的支持\n* 修复了在使用 mask 参数时 DatasetMeanStdCalculator 中的 bug","2022-10-25T15:59:08",{"id":175,"version":176,"summary_zh":177,"released_at":178},259790,"0.6.0","# 破坏性变更\n所有与 
Catalyst 相关的回调函数均已移至 Catalyst 库的 [fork](https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fcatalyst)。","2022-10-20T21:13:05",{"id":180,"version":181,"summary_zh":182,"released_at":183},259791,"0.5.3","# Bug修复\n\n- 修复了 https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fissues\u002F78，感谢 https:\u002F\u002Fgithub.com\u002Fmehran66 指出该问题。\n\n# 新功能\n\n- InriaAerialImageDataset，用于处理 Inria 高空影像数据集。\n- `get_collate_for_dataset` 函数，用于在传入的数据集实例暴露 `get_collate_fn` 方法时获取 collate 函数。该函数也适用于 ConcatDataset。\n\n# 改进\n\n- `DatasetMeanStdCalculator` 支持 `dtype` 参数，用于指定累加器类型（默认为 float64）。","2022-10-20T18:47:56",{"id":185,"version":186,"summary_zh":187,"released_at":188},259792,"0.5.2","# Bug修复\n\n* 修复了 `ApplySoftmaxTo` 和 `ApplySigmoidTo` 模块中的一个 bug，该 bug 可能导致当输入为字符串时，激活函数无法正确应用到输入上。\n\n# 新增 API\n\n* 添加了 `fs.find_images_in_dir_recursive`\n* 添加了 `utils.describe_outputs`，用于返回复杂输出（如字典、嵌套列表等）的易读表示，以便查看每个张量的形状、均值和标准差。\n\n# 其他\n\n更多的 MyPy 修复及类型注解","2022-08-26T14:59:11",{"id":190,"version":191,"summary_zh":192,"released_at":193},259793,"0.5.1","## 新增 API\n\n* 添加了 `fs.find_subdirectories_in_dir`，用于获取指定目录下的子目录列表（非递归）。\n* 添加了对 TTA 预测结果进行 log-odds 平均的函数，以及对应的 `logodd_mean` 函数。\n\n## 改进\n\n* 在 `plot_confusion_matrix` 中，可以通过 `show_scores` 参数禁用在每个单元格中绘制分数（默认为 `True`）。\n* `freeze_model` 方法现在会返回传入的 `module` 参数。","2022-06-27T19:58:11",{"id":195,"version":196,"summary_zh":197,"released_at":198},259794,"0.5.0","# 版本 0.5.0\n\n这是 Pytorch Toolbelt 的重大版本更新。距离上一次更新已经过去了一段时间，自 0.4.4 版本以来，我们进行了许多改进和更新：\n\n## 新特性\n\n* 添加了类 `pytorch_toolbelt.datasets.DatasetMeanStdCalculator`，用于计算无法完全加载到内存中的数据集的均值和标准差。\n* 新增解码器模块：`BiFPNDecoder`\n* 新增编码器：`SwinTransformer`、`SwinB`、`SwinL`、`SwinT`、`SwinS`\n* 在分布式工具中添加了 `broadcast_from_master` 函数。该方法允许将张量从主节点广播到所有其他节点。\n* 在 DDP 中添加了 `reduce_dict_sum`，用于收集并拼接来自所有节点的字典列表。\n* 添加了 `master_print`，作为 `print` 的直接替代品，仅在零号进程上输出到标准输出。\n\n## Bug 修复\n\n* 修复了 lovasz 损失中的 bug，由 @seefun 在 https:\u002F\u002Fgithub.com\u002FBloodAxe\u002Fpytorch-toolbelt\u002Fpull\u002F62 中完成。\n\n## 
破坏性变更\n\n* 边界框匹配方法被拆分为两个：`match_bboxes` 和 `match_bboxes_hungarian`。前者使用预测边界框的得分，优先匹配置信度最高的预测；而后者则通过匈牙利算法匹配边界框，以最大化整体 IoU。\n* `set_manual_seed` 现在也会设置 NumPy 的随机种子。\n* `to_numpy` 现在能够正确处理 `None` 以及所有可迭代对象（不仅限于元组和列表）。\n\n## 修复与改进（无破坏性变更）\n\n* 为 `ApplySoftmaxTo` 添加了 `dim` 参数，用于指定 softmax 运算作用的通道（默认值为 1，此前为硬编码）。\n* `ApplySigmoidTo` 现在会就地应用 sigmoid 操作（纯粹的性能优化）。\n* `TileMerger` 现在支持指定一个 `device`（遵循 PyTorch 语义），用于存储累积瓦片的中间张量。\n* 所有 TTA 函数现在都支持 PyTorch Tracing。\n* `MultiscaleTTA` 现在支持返回单个 Tensor 的模型（键值对输出仍保持原有行为）。\n* `balanced_binary_cross_entropy_with_logits` 和 `BalancedBCEWithLogitsLoss` 现在支持 `ignore_index` 参数。\n* `BiTemperedLogisticLoss` 和 `BinaryBiTemperedLogisticLoss` 也新增了 `ignore_index` 参数的支持。\n* `focal_loss_with_logits` 现在同样支持 `ignore_index` 参数。被忽略值的计算已从 `BinaryFocalLoss` 移至该函数中。\n* 减少了编码器中来自 `timm` 的模板代码和硬编码内容。现在 `GenericTimmEncoder` 直接从 `timm` 的编码器实例查询输出步幅和特征图。\n* 基于 HRNet 的编码器现在增加了一个 `use_incre_features` 参数，用于指定输出特征图是否应增加特征数量。\n* `change_extension`、`read_rgb_image` 和 `read_image_as_is` 函数现在支持 `Path` 类型的输入参数。返回类型（str）保持不变。\n* `count_parameters` 现在接受 `human_friendly` 参数，可以以更友好的格式（如 `21.1M`）显示参数数量，而非原始数字 `21123123`。\n* `plot_confusion_matrix` 现在新增了 `format_string` 参数（默认为 None），用于自定义混淆矩阵中数值的显示格式。","2022-03-10T21:22:20",{"id":200,"version":201,"summary_zh":202,"released_at":203},259795,"0.4.4","# 新特性\n\n- 针对3D数据新增了平铺式处理类 `VolumeSlicer` 和 `VolumeMerger`。其设计与 `ImageSlicer` 类似。现在可以在不担心内存溢出的情况下，对超大体积的3D数据进行分割。\n- 支持在D2、D4以及翻转变换TTA中对标签（标量或1D向量）进行增强和反增强操作。\n- 平衡二元交叉熵损失 (`BalancedBCEWithLogitsLoss`)。\n- 双温逻辑损失 (`BiTemperedLogisticLoss`)。\n- 新增辅助模块 `SelectByIndex` 用于从模型的命名输出中选择特定项（适用于 `nn.Sequential`）。\n- 新增来自 `torchvision` 的编码器 `MobileNetV3Large` 和 `MobileNetV3Small`。\n- 新增来自 `timm` 包的编码器（HRNets、ResNetD、EfficientNetV2等）。\n- DeepLabV3 和 DeepLabV3+ 解码器。\n- 基于纯 PyTorch 的边界框匹配实现 (`match_bboxes`)，支持使用匈牙利算法在 CPU 和 GPU 上进行匹配。\n\n# Bug修复\n- 修复了 Lovasz Loss 中的 bug (#62)，感谢 @seefun。\n\n# 破坏性变更\n\n- 将 `BinaryLovaszLoss` 类中的参数 `ignore` 重命名为 `ignore_index`。\n- 将 `FPNSumDecoder` 和 
`FPNCatDecoder` 构造函数中的 `fpn_channels` 参数重命名为 `channels`。\n- 将 `HRNetSegmentationDecoder` 构造函数中的 `output_channels` 参数重命名为 `channels`。\n- `conv1x1` 默认不再将偏置设为零。\n- 将最低 PyTorch 版本提升至 1.8.1。\n\n# 其他改进\n\n- 修复了 `Ensembler` 类与 `torch.jit.tracing` 搭配使用时的兼容性问题。\n- 大量文档字符串和类型注解的优化与增强。\n\n","2021-08-12T07:48:52",{"id":205,"version":206,"summary_zh":207,"released_at":208},259796,"0.4.3","# PyTorch Toolbelt 0.4.3\n\n## 模块\n\n- 在 `get_activation_block` 中添加了缺失的 `sigmoid` 激活函数支持\n- 使编码器支持 JIT 编译和跟踪\n- 更好地支持来自 `timm` 的编码器（这些编码器以 `Timm` 为前缀命名）\n\n## 工具函数\n\n- `rgb_image_from_tensor` 现在会对值进行裁剪\n\n## TTA 和集成\n\n- `Ensembler` 现在通过 `reduction` 参数支持算术平均、几何平均和调和平均。\n- 同时也将几何平均和调和平均引入到所有 TTA 函数中。\n\n## 数据集\n\n- `read_binary_mask`\n- 重构 `SegmentationDataset`，以支持用于深度监督的步进掩码\n- 添加了 `RandomSubsetDataset` 和 `RandomSubsetWithMaskDataset`，用于根据某些条件对数据集进行采样（例如，仅采样特定类别的样本）\n\n## 其他\n\n一如既往，增加了更多测试，改进了类型注解和注释。","2021-04-02T10:55:06",{"id":210,"version":211,"summary_zh":212,"released_at":213},259797,"0.4.2","# Breaking Changes\r\n\r\n* Bump up minimal PyTorch version to 1.7.1\r\n\r\n# New features\r\n\r\n* New dataset classes `ClassificationDataset`, `SegmentationDataset` for easy every-day use in Kaggle \r\n* New losses: `FocalCosineLoss`, `BiTemperedLogisticLoss`, `SoftF1Loss`\r\n* Support of new activations for `get_activation_block` (Silu, Softplus, Gelu)\r\n* More encoders from timm package: NFNets, NFRegNet, HRNet, DPN\r\n* `RocAucMetricCallback` for Catalyst\r\n* `MultilabelAccuracyCallback` and `AccuracyCallback` with DDP support\r\n\r\n# Bugfixes\r\n\r\n* Fix invalid prefix in catalyst registry from `tbt` to `tbt.`\r\n","2021-03-03T09:36:08",{"id":215,"version":216,"summary_zh":217,"released_at":218},259798,"0.4.1","# New features\r\n\r\n* Added Soft-F1 loss for direct optimization of F1 score (Binary case only)\r\n* Fully reworked the TTA module for inference (kept backward compatibility where possible). 
\r\n* Added support of `ignore_index` to Dice & Jaccard losses.\r\n* Improved Lovasz loss to work in `fp16` mode.\r\n* Added option to override selected params in `make_n_channel_input`.\r\n* More encoders from the `timm` package. \r\n* `FPNFuse` module now works on 2D, 3D and N-D inputs.\r\n* Added Global K-Max 2D pooling block.\r\n* Added Generalized mean pooling 2D block.\r\n* Added `softmax_over_dim_X`, `argmax_over_dim_X` shorthand functions for use in metrics to get soft\u002Fhard labels without using lambda functions.\r\n* Added helper visualization functions to add a fancy header to an image and stack images of different sizes.\r\n* Improved rendering of confusion matrix.\r\n\r\n# Catalyst goodies\r\n\r\n* Encoders & Losses are available in Catalyst registry \r\n* `StopIfNanCallback`\r\n* Added `OutputDistributionCallback` to log distribution of predictions to TensorBoard.\r\n* Added `UMAPCallback` to visualize embedding space using UMAP in TensorBoard. \r\n\r\n\r\n# Breaking Changes\r\n\r\n* Renamed `CudaTileMerger` to `TileMerger`. 
`TileMerger` allows specifying the target device explicitly.\r\n* `tensor_from_rgb_image` removed in favor of `image_to_tensor`.\r\n\r\n# Bug fixes & Improvements\r\n\r\n* Improve numeric stability of `focal_loss_with_logits` when `reduction=\"sum\"` \r\n* Prevent `NaN` in FocalLoss when all elements are equal to `ignore_index` value.\r\n* A LOT of type hints.\r\n\r\n","2021-01-14T10:30:56",{"id":220,"version":221,"summary_zh":222,"released_at":223},259799,"0.4.0","# New features\r\n* Memory-efficient `Swish` and `Mish` activation functions (Credits go to http:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models)\r\n* Refactor EfficientNet encoders (no pretrained weights yet)\r\n\r\n# Fixes\r\n* Fixed incorrect default value for `ignore_index` in `SoftCrossEntropyLoss`\r\n\r\n# Breaking changes\r\n* All catalyst-related utils updated to be compatible with Catalyst 20.8.2\r\n* Remove PIL package dependency\r\n\r\n# Improvements\r\n* More comments, more type hints","2020-08-19T10:27:31",{"id":225,"version":226,"summary_zh":227,"released_at":228},259800,"0.3.2","# New features\r\n\r\n* Many helpful callbacks for the Catalyst library: HyperParameterCallback and LossAdapter, to name a few.\r\n* New losses for deep model supervision (helpful when the sizes of the target and output mask are different)\r\n* Stacked Hourglass encoder\r\n* Context Aggregation Network decoder\r\n\r\n# Breaking Changes\r\n\r\n* ABN module will now resolve as nn.Sequential(BatchNorm2d, Activation) instead of a hand-crafted module. This enables easier conversion of batch normalization modules to nn.SyncBatchNorm.\r\n\r\n* Almost every Encoder\u002FDecoder implementation has been refactored for better clarity and flexibility. 
Please double-check your pipelines.\r\n\r\n# Important bugfixes\r\n\r\n* Improved numerical stability of Dice \u002F Jaccard losses (using log_sigmoid() + exp() instead of plain sigmoid())\r\n\r\n\r\n# Other\r\n\r\n* Lots of comments for functions and modules\r\n* Code cleanup, thanks to DeepSource\r\n* Type annotations for modules and functions\r\n* Update of README\r\n","2020-04-28T19:28:44",{"id":230,"version":231,"summary_zh":232,"released_at":233},259801,"0.3.1","# Fixes\r\n\r\n* Fixed bug in computation of the IoU metric in `binary_dice_iou_score` function\r\n* Fixed incorrect default value in `SoftCrossEntropyLoss` #38 \r\n\r\n# Improvements\r\n\r\n* Function `draw_binary_segmentation_predictions` now has parameter `image_format` (`rgb`|`bgr`|`gray`) to specify the image format so images are visualized correctly in TensorBoard\r\n* More type annotations across the codebase\r\n\r\n\r\n# New features\r\n\r\n* New visualization function `draw_multilabel_segmentation_predictions`\r\n","2020-02-25T14:26:34",{"id":235,"version":236,"summary_zh":237,"released_at":238},259802,"0.3.0","# Pytorch Toolbelt 0.3.0\r\n\r\nThis release has a huge set of new features, bugfixes and breaking changes. So be careful when upgrading. 
\r\n`pip install pytorch-toolbelt==0.3.0`\r\n\r\n# New features\r\n\r\n## Encoders\r\n\r\n* HRNetV2\r\n* DenseNets\r\n* EfficientNet \r\n* `Encoder` class has a `change_input_channels` method to change the number of channels in the input image\r\n\r\n## New losses\r\n\r\n* `BCELoss` with support of `ignore_index`\r\n* `SoftBCELoss` (Label smoothing loss for binary case with support of `ignore_index`)\r\n* `SoftCrossEntropyLoss` (Label smoothing loss for multiclass case with support of `ignore_index`)\r\n\r\n## Catalyst goodies\r\n\r\n* Online pseudolabeling callback\r\n* Training signal annealing callback\r\n\r\n## Other\r\n\r\n* Support for new activation functions in `ABN` block: Swish, Mish, HardSigmoid\r\n* New decoders (Unet, FPN, DeeplabV3, PPM) to simplify creation of segmentation models\r\n* `CREDITS.md` to include all the references to code\u002Farticles. The existing list is definitely not complete, so feel free to make PRs\r\n* Object context block from OCNet\r\n\r\n# API changes\r\n\r\n* Focal loss now supports normalized focal loss and reduced focal loss extensions.\r\n* Optimize computation of pyramid weight matrix #34 \r\n* Default value `align_corners=False` in `F.interpolate` when doing bilinear upsampling.\r\n\r\n# Bugfixes\r\n\r\n* Fix missing call to batch normalization block in `FPNBottleneckBN`\r\n* Fix numerical stability for `DiceLoss` and `JaccardLoss` when `log_loss=True`\r\n* Fix numerical stability when computing normalized focal loss\r\n","2020-01-17T20:40:23",{"id":240,"version":241,"summary_zh":242,"released_at":243},259803,"0.2.1","# New features\r\n\r\n- Added normalized focal loss\r\n\r\n# Bugfixes\r\n\r\n- Fixed wrong shape of intermediate layers of DenseNet","2019-10-07T19:45:40",{"id":245,"version":246,"summary_zh":247,"released_at":248},259804,"0.2.0","# PyTorch Toolbelt 0.2.0\r\n\r\nThis release is dedicated to housekeeping work. Dice\u002FIoU metrics and losses have been redesigned to reduce the amount of duplicated code and bring more clarity. 
Code is now auto-formatted using Black.\r\n\r\n`pip install pytorch_toolbelt==0.2.0`\r\n\r\n## Catalyst contrib\r\n\r\n- Refactor Dice\u002FIoU loss into a single metric `IoUMetricsCallback` with a few cool features: `metric=\"dice|jaccard\"` to choose which metric should be used; `mode=binary|multiclass|multilabel` to specify the problem type (binary, multiclass or multi-label segmentation); `classes_of_interest=[1,2,4]` to select the set of classes for which the metric should be computed; and `nan_score_on_empty=False` to compute `Dice Accuracy` (counts as 1.0 if both `y_true` and `y_pred` are empty; 0.0 if `y_pred` is not empty).\r\n- Added L-p regularization callback to apply L1 and L2 regularization to the model, with support of regularization strength scheduling.\r\n\r\n\r\n## Losses\r\n\r\n- Refactor `DiceLoss`\u002F`JaccardLoss` losses in the same fashion as metrics.\r\n\r\n## Models\r\n\r\n- Add Densenet encoders\r\n- Bugfix: Fix missing BN+Relu in `UNetDecoder`\r\n- Global pooling modules can squeeze spatial channel dimensions if `flatten=True`.\r\n\r\n## Misc\r\n\r\n- Add more unit tests\r\n- Code-style is now managed with Black\r\n- `to_numpy` now supports `int`, `float` scalar types\r\n","2019-10-04T21:04:06",{"id":250,"version":251,"summary_zh":252,"released_at":253},259805,"0.1.4","# PyTorch Toolbelt 0.1.4\r\n\r\n* Minor release to update Catalyst contrib modules to latest Catalyst (requires catalyst>=19.8)","2019-09-12T09:38:22"]
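The tile-weighting tip in the Q&A at the top of this section (applying `np.square(res)` for stronger center emphasis when blending overlapping tile predictions) can be sketched in plain NumPy. This is a hypothetical stand-alone helper for illustration only, not pytorch-toolbelt's actual `TileMerger` weight implementation:

```python
import numpy as np

def pyramid_weight(h, w):
    # Distance-to-border weight: largest at the tile center, tapering
    # toward the edges, so overlapping tile predictions blend smoothly
    # when accumulated by a tile merger.
    y = np.minimum(np.arange(1, h + 1), np.arange(h, 0, -1))
    x = np.minimum(np.arange(1, w + 1), np.arange(w, 0, -1))
    res = np.outer(y, x).astype(np.float32)
    return res / res.max()  # normalize so the center weight is 1.0

w = pyramid_weight(5, 5)
# Stronger center emphasis, as suggested in the answer above:
w_sharp = np.square(w)
```

Squaring keeps the maximum at the tile center while suppressing border weights more aggressively, so in overlap regions the prediction is dominated by whichever tile's center covers that pixel.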