[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-johnmarktaylor91--torchlens":3,"tool-johnmarktaylor91--torchlens":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":23,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":107,"github_topics":108,"view_count":23,"oss_zip_url":108,"oss_zip_packed_at":108,"status":16,"created_at":109,"updated_at":110,"faqs":111,"releases":140},3941,"johnmarktaylor91\u002Ftorchlens","torchlens","Package for extracting and mapping the results of every single tensor operation in a PyTorch model in one line of code. 
","TorchLens 是一款专为 PyTorch 设计的开源工具，旨在让开发者无需修改任何模型代码，仅用一行命令即可提取并可视化模型前向传播中每一个张量操作的中间结果。在深度学习研究与调试中，深入理解复杂网络的内部计算结构往往困难重重，传统方法需要手动插入钩子或大幅重构代码。TorchLens 完美解决了这一痛点，它能自动记录从简单循环网络到拥有数千个操作的大型 Transformer 模型（如 Swin V2）的全部激活值与详细元数据，并生成直观的计算图可视化。\n\n该工具特别适合深度学习研究人员、算法工程师以及需要排查模型行为的教育工作者使用。其核心技术亮点在于“零侵入”设计：用户只需定义标准模型并传入输入数据，TorchLens 便能返回包含完整执行日志的 ModelHistory 对象，支持按需查询任意层级的输出状态。无论是分析梯度消失问题、验证网络架构逻辑，还是教学演示，TorchLens 都能提供透明、详尽的内部视角，帮助用户轻松洞察黑盒模型的运作细节，是探索 PyTorch 模型内部机制的高效助手。","# \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjohnmarktaylor91_torchlens_readme_5ab59957d81e.png\" width=8% height=8%> TorchLens\n\n**Quick Links**\n\n- [Paper introducing TorchLens](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-023-40807-0)\n- [CoLab tutorial](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1ORJLGZPifvdsVPFqq1LYT3t5hV560SoW?usp=sharing)\n- [\\\"Menagerie\\\" of model visualizations](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Fu\u002F0\u002Ffolders\u002F1BsM6WPf3eB79-CRNgZejMxjg38rN6VCb)\n- [Metadata provided by TorchLens](https:\u002F\u002Fstatic-content.springer.com\u002Fesm\u002Fart%3A10.1038%2Fs41598-023-40807-0\u002FMediaObjects\u002F41598_2023_40807_MOESM1_ESM.pdf)\n\n## Overview\n\n*TorchLens* is a package for doing exactly two things:\n\n1) Easily extracting the activations from every single intermediate operation in a PyTorch model—no\n   modifications needed—in one line of code. 
\"Every operation\" means every operation; \"one line\" means one line.\n2) Understanding the model's computational structure via an intuitive automatic visualization and extensive\n   metadata ([partial list here](https:\u002F\u002Fstatic-content.springer.com\u002Fesm\u002Fart%3A10.1038%2Fs41598-023-40807-0\u002FMediaObjects\u002F41598_2023_40807_MOESM1_ESM.pdf))\n   about the network's computational graph.\n\nHere it is in action for a very simple recurrent model; as you can see, you just define the model like normal and pass\nit in, and *TorchLens* returns a full log of the forward pass along with a visualization:\n\n```python\nimport torch\nimport torch.nn as nn\n\nimport torchlens as tl\n\n\nclass SimpleRecurrent(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.fc = nn.Linear(in_features=5, out_features=5)\n\n    def forward(self, x):\n        for r in range(4):\n            x = self.fc(x)\n            x = x + 1\n            x = x * 2\n        return x\n\n\nsimple_recurrent = SimpleRecurrent()\nx = torch.rand(6, 5)  # a batch of six 5-dimensional inputs\nmodel_history = tl.log_forward_pass(simple_recurrent, x,\n                                    layers_to_save='all',\n                                    vis_mode='rolled')\nprint(model_history['linear_1_1:2'].activation)  # second pass of first linear layer\n\n'''\ntensor([[-0.0690, -1.3957, -0.3231, -0.1980,  0.7197],\n        [-0.1083, -1.5051, -0.2570, -0.2024,  0.8248],\n        [ 0.1031, -1.4315, -0.5999, -0.4017,  0.7580],\n        [-0.0396, -1.3813, -0.3523, -0.2008,  0.6654],\n        [ 0.0980, -1.4073, -0.5934, -0.3866,  0.7371],\n        [-0.1106, -1.2909, -0.3393, -0.2439,  0.7345]])\n'''\n```\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjohnmarktaylor91_torchlens_readme_98594a08ddf5.png\" width=30% height=30%>\n\nAnd here it is for a very complex transformer model ([swin_v2_b](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.14030)) with 1932 operations\nin its forward pass; you can grab the saved outputs of every last one:\n\n\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjohnmarktaylor91_torchlens_readme_57534eeb90a8.jpg\" width=\"70%\" height=\"70%\">\n\nThe goal of *TorchLens* is to do this for any PyTorch model whatsoever. You can see a bunch of example model\nvisualizations in this [model menagerie](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Fu\u002F0\u002Ffolders\u002F1BsM6WPf3eB79-CRNgZejMxjg38rN6VCb).\n\n## Installation\n\nTo install *TorchLens*, first install graphviz if you haven't already (required to generate the network visualizations),\nand then install *TorchLens* using pip:\n\n```bash\nsudo apt install graphviz\npip install torchlens\n```\n\n*TorchLens* is compatible with versions 1.8.0+ of PyTorch.\n\n## How-To Guide\n\nBelow is a quick demo of how to use it; for an interactive demonstration, see\nthe [CoLab walkthrough](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1ORJLGZPifvdsVPFqq1LYT3t5hV560SoW?usp=sharing).\n\nThe main function of *TorchLens* is `log_forward_pass`: when called on a model and input, it runs a\nforward pass on the model and returns a ModelHistory object containing the intermediate layer activations and\naccompanying metadata, along with a visual representation of every operation that occurred during the forward pass:\n\n```python\nimport torch\nimport torchvision\nimport torchlens as tl\n\nalexnet = torchvision.models.alexnet()\nx = torch.rand(1, 3, 224, 224)\nmodel_history = tl.log_forward_pass(alexnet, x, layers_to_save='all', vis_mode='unrolled')\nprint(model_history)\n\n'''\nLog of AlexNet forward pass:\n\tModel structure: purely feedforward, without branching; 23 total modules.\n\t24 tensors (4.8 MB) computed in forward pass; 24 tensors (4.8 MB) saved.\n\t16 parameter operations (61100840 params total; 248.7 MB).\n\tRandom seed: 3210097511\n\tTime elapsed: 0.288s\n\tModule Hierarchy:\n\t\tfeatures:\n\t\t    features.0, features.1, features.2, features.3, features.4, features.5, features.6, features.7,\n\t\t    
features.8, features.9, features.10, features.11, features.12\n\t\tavgpool\n\t\tclassifier:\n\t\t    classifier.0, classifier.1, classifier.2, classifier.3, classifier.4, classifier.5, classifier.6\n\tLayers:\n\t\t0: input_1_0\n\t\t1: conv2d_1_1\n\t\t2: relu_1_2\n\t\t3: maxpool2d_1_3\n\t\t4: conv2d_2_4\n\t\t5: relu_2_5\n\t\t6: maxpool2d_2_6\n\t\t7: conv2d_3_7\n\t\t8: relu_3_8\n\t\t9: conv2d_4_9\n\t\t10: relu_4_10\n\t\t11: conv2d_5_11\n\t\t12: relu_5_12\n\t\t13: maxpool2d_3_13\n\t\t14: adaptiveavgpool2d_1_14\n\t\t15: flatten_1_15\n\t\t16: dropout_1_16\n\t\t17: linear_1_17\n\t\t18: relu_6_18\n\t\t19: dropout_2_19\n\t\t20: linear_2_20\n\t\t21: relu_7_21\n\t\t22: linear_3_22\n\t\t23: output_1_23\n'''\n```\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjohnmarktaylor91_torchlens_readme_a20d5a8a75e2.png\" width=30% height=30%>\n\nYou can pull out information about a given layer, including its activations and helpful metadata, by indexing\nthe ModelHistory object in any of these equivalent ways:\n\n1) the name of a layer (with the convention that 'conv2d_3_7' is the 3rd convolutional layer, and the 7th layer overall)\n2) the name of a module (e.g., 'features' or 'classifier.3') for which that layer is an output, or\n3) the ordinal position of the layer (e.g., 2 for the 2nd layer, -5 for the fifth-to-last; inputs and outputs count as\n   layers here).\n\nTo quickly figure out these names, you can look at the graph visualization, or at the output of printing the\nModelHistory object (both shown above). 
Here are some examples of how to pull out information about a\nparticular layer, and also how to pull out the actual activations from that layer:\n\n```python\nprint(model_history['conv2d_3_7'])  # pulling out layer by its name\n# The following commented lines pull out the same layer:\n# model_history['conv2d_3'] you can omit the second number (since strictly speaking it's redundant)\n# model_history['conv2d_3_7:1'] colon indicates the pass of a layer (here just one)\n# model_history['features.6'] can grab a layer by the module for which it is an output\n# model_history[7] the 7th layer overall\n# model_history[-17] the 17th-to-last layer\n'''\nLayer conv2d_3_7, operation 8\u002F24:\n\tOutput tensor: shape=(1, 384, 13, 13), dtype=torch.float32, size=253.5 KB\n\t\ttensor([[ 0.0503, -0.1089, -0.1210, -0.1034, -0.1254],\n        [ 0.0789, -0.0752, -0.0581, -0.0372, -0.0181],\n        [ 0.0949, -0.0780, -0.0401, -0.0209, -0.0095],\n        [ 0.0929, -0.0353, -0.0220, -0.0324, -0.0295],\n        [ 0.1100, -0.0337, -0.0330, -0.0479, -0.0235]])...\n\tParams: Computed from params with shape (384,), (384, 192, 3, 3); 663936 params total (2.5 MB)\n\tParent Layers: maxpool2d_2_6\n\tChild Layers: relu_3_8\n\tFunction: conv2d (grad_fn=ConvolutionBackward0)\n\tComputed inside module: features.6\n\tTime elapsed:  5.670E-04s\n\tOutput of modules: features.6\n\tOutput of bottom-level module: features.6\n\tLookup keys: -17, 7, conv2d_3_7, conv2d_3_7:1, features.6, features.6:1\n'''\n\n# You can pull out the actual output activations from a layer with the activation field:\nprint(model_history['conv2d_3_7'].activation)\n'''\ntensor([[[[-0.0867, -0.0787, -0.0817,  ..., -0.0820, -0.0655, -0.0195],\n          [-0.1213, -0.1130, -0.1386,  ..., -0.1331, -0.1118, -0.0520],\n          [-0.0959, -0.0973, -0.1078,  ..., -0.1103, -0.1091, -0.0760],\n          ...,\n          [-0.0906, -0.1146, -0.1308,  ..., -0.1076, -0.1129, -0.0689],\n          [-0.1017, -0.1256, -0.1100,  ..., -0.1160, 
-0.1035, -0.0801],\n          [-0.1006, -0.0941, -0.1204,  ..., -0.1146, -0.1065, -0.0631]]...\n'''\n```\n\nIf you do not wish to save the activations for all layers (e.g., to save memory), you can specify which layers to save\nwith the `layers_to_save` argument when calling `log_forward_pass`; you can either indicate layers in the same way\nas indexing them above, or by passing in a desired substring for filtering the layers (e.g., 'conv'\nwill pull out all conv layers):\n\n```python\n# Pull out conv2d_3_7, the output of the 'features' module, the fifth-to-last layer, and all linear (i.e., fc) layers:\nmodel_history = tl.log_forward_pass(alexnet, x, vis_mode='unrolled',\n                                    layers_to_save=['conv2d_3_7', 'features', -5, 'linear'])\nprint(model_history.layer_labels)\n'''\n['conv2d_3_7', 'maxpool2d_3_13', 'linear_1_17', 'dropout_2_19', 'linear_2_20', 'linear_3_22']\n'''\n```\n\nThe main function of *TorchLens* is `log_forward_pass`; the remaining functions are:\n\n1) `get_model_metadata`, to retrieve all model metadata without saving any activations (e.g., to figure out which\n   layers you wish to save; note that this is the same as calling `log_forward_pass` with `layers_to_save=None`)\n2) `show_model_graph`, which visualizes the model graph without saving any activations\n3) `validate_model_activations`, which runs a procedure to check that the activations are correct: specifically,\n   it runs a forward pass and saves all intermediate activations, re-runs the forward pass from each intermediate\n   layer, and checks that the resulting output matches the ground-truth output. It also checks that swapping in\n   random nonsense activations instead of the saved activations generates the wrong output. 
**If this function ever\n   returns False (i.e., the saved activations are wrong), please contact me via email (johnmarkedwardtaylor@gmail.com)\n   or on this GitHub page with a description of the problem, and I will update TorchLens to fix the problem.**\n\nAnd that's it. *TorchLens* remains in active development, and the goal is for it to work with any PyTorch model\nwhatsoever without exception. As of the time of this writing, it has been tested with over 700\nimage, video, auditory, multimodal, and language models, including feedforward, recurrent, transformer,\nand graph neural networks.\n\n## Miscellaneous Features\n\n- You can visualize models at different levels of nesting depth using the `vis_nesting_depth` argument\n  to `log_forward_pass`; for example, here you can see one of GoogLeNet's \"inception\" modules at different levels of\n  nesting depth:\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjohnmarktaylor91_torchlens_readme_743e77279612.png\" width=80% height=80%>\n\n- An experimental feature is to extract not just the activations from all of a model's operations,\n  but also the gradients from a backward pass (which you can compute based on any intermediate layer, not just the\n  model's output), and also visualize the path taken by the backward pass (shown with blue arrows below). 
See the CoLab tutorial for\n  instructions on how to do this.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjohnmarktaylor91_torchlens_readme_f51f1a2efb26.png\" width=30% height=30%>\n\n- You can see the literal code that was used to run the model with the `func_call_stack` field.\n  Each entry is a `FuncCallLocation` object with a clean repr and source context:\n\n```python\nprint(model_history['conv2d_3'].func_call_stack[0])\n'''\nFuncCallLocation:\n  file: \u002Fusr\u002Flocal\u002Flib\u002Fpython3.10\u002Fdist-packages\u002Ftorchvision\u002Fmodels\u002Falexnet.py\n  line: 48\n  function: forward\n  code:\n          x = self.features(x)\n          x = self.avgpool(x)\n    --->  x = self.classifier(x)\n          return x\n'''\n```\n\n## Planned Features\n\n1) Further in the future, I am considering adding functionality to not just save activations,\n   but counterfactually intervene on them (e.g., how would the output have changed if these parameters\n   were different or if a different nonlinearity were used). Let me know if you'd find this useful\n   and if so, what specific kind of functionality you'd want.\n2) I am planning to add an option to only visualize a single submodule of a model rather than the full graph at once.\n\n## Other Packages You Should Check Out\n\nThe goal is for *TorchLens* to completely solve the problem of extracting activations and metadata\nfrom deep neural networks and visualizing their structure so that nobody has to think about this stuff ever again, but\nit intentionally leaves out certain functionality: for example, it has no functions for loading models or stimuli, or\nfor analyzing the extracted activations. This is in part because it's impossible to predict all the things you might\nwant to do with the activations, or all the possible models you might want to look at, but also because there are\nalready outstanding packages for doing these things. 
Here are a few; let me know if I've missed any!\n\n- [Cerbrec](https:\u002F\u002Fcerbrec.com): Program for interactively visualizing and debugging deep neural networks (uses TorchLens under\n  the hood for extracting the graphs of PyTorch models!)\n- [ThingsVision](https:\u002F\u002Fgithub.com\u002FViCCo-Group\u002Fthingsvision): has excellent functionality for loading vision models,\n  loading stimuli, and analyzing the extracted activations\n- [Net2Brain](https:\u002F\u002Fgithub.com\u002Fcvai-roig-lab\u002FNet2Brain): excellent end-to-end functionality similar to ThingsVision's,\n  along with functionality for comparing extracted activations to neural data.\n- [surgeon-pytorch](https:\u002F\u002Fgithub.com\u002Farchinetai\u002Fsurgeon-pytorch): easy-to-use functionality for extracting activations\n  from models, along with functionality for training a model using loss functions based on intermediate layer\n  activations\n- [deepdive](https:\u002F\u002Fgithub.com\u002FColinConwell\u002FDeepDive): has outstanding functionality for loading and benchmarking\n  many different models\n- [torchvision feature_extraction module](https:\u002F\u002Fpytorch.org\u002Fvision\u002Fstable\u002Ffeature_extraction.html): can extract\n  activations from models with static computational graphs\n- [rsatoolbox3](https:\u002F\u002Fgithub.com\u002Frsagroup\u002Frsatoolbox): total solution for performing representational similarity\n  analysis on DNN activations and brain data\n\n## Acknowledgments\n\nThe development of *TorchLens* benefitted greatly from discussions with Nikolaus Kriegeskorte, George Alvarez,\nAlfredo Canziani, Tal Golan, and the Visual Inference Lab at Columbia University. Thank you to Kale Kundert\nfor helpful discussion and for his code contributions enabling PyTorch Lightning compatibility.\nAll network visualizations were created with graphviz. 
Logo created by Nikolaus Kriegeskorte.\n\n## Citing TorchLens\n\nTo cite *TorchLens*, you can\ncite [this paper describing the package](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-023-40807-0) (and consider adding a star\nto this repo if you find *TorchLens* useful):\n\nTaylor, J., Kriegeskorte, N. Extracting and visualizing hidden activations and computational graphs of PyTorch models\nwith *TorchLens*. Sci Rep 13, 14375 (2023). https:\u002F\u002Fdoi.org\u002F10.1038\u002Fs41598-023-40807-0\n\n## Contact\n\nAs *TorchLens* is still in active development, I would love your feedback. If you have any questions, comments,\nor suggestions (or if you'd be interested in collaborating!), please email johnmarkedwardtaylor@gmail.com,\nreach me via [twitter](https:\u002F\u002Ftwitter.com\u002Fjohnmark_taylor), or post on\nthe [issues](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fissues)\nor [discussion](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fdiscussions) pages for this GitHub\nrepository.\n","# \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjohnmarktaylor91_torchlens_readme_5ab59957d81e.png\" width=8% height=8%> TorchLens\n\n**快速链接**\n\n- [介绍TorchLens的论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-023-40807-0)\n- [Colab教程](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1ORJLGZPifvdsVPFqq1LYT3t5hV560SoW?usp=sharing)\n- [模型可视化“动物园”](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Fu\u002F0\u002Ffolders\u002F1BsM6WPf3eB79-CRNgZejMxjg38rN6VCb)\n- [TorchLens提供的元数据](https:\u002F\u002Fstatic-content.springer.com\u002Fesm\u002Fart%3A10.1038%2Fs41598-023-40807-0\u002FMediaObjects\u002F41598_2023_40807_MOESM1_ESM.pdf)\n\n## 概述\n\n*TorchLens* 是一个用于完成以下两件事的工具包：\n\n1) 无需任何修改，只需一行代码即可轻松提取 PyTorch 模型中每一个中间操作的激活值。“每一个操作”指的就是每一个操作；“一行”就是真正的一行。\n2) 
通过直观的自动可视化以及关于网络计算图的丰富元数据（[部分列表在此](https:\u002F\u002Fstatic-content.springer.com\u002Fesm\u002Fart%3A10.1038%2Fs41598-023-40807-0\u002FMediaObjects\u002F41598_2023_40807_MOESM1_ESM.pdf)），帮助理解模型的计算结构。\n\n下面是一个非常简单的循环模型的示例：正如你所看到的，你只需像平常一样定义模型并将其传入，*TorchLens* 就会返回完整的前向传播日志以及可视化结果：\n\n```python\nclass SimpleRecurrent(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.fc = nn.Linear(in_features=5, out_features=5)\n\n    def forward(self, x):\n        for r in range(4):\n            x = self.fc(x)\n            x = x + 1\n            x = x * 2\n        return x\n\n\nsimple_recurrent = SimpleRecurrent()\nmodel_history = tl.log_forward_pass(simple_recurrent, x,\n                                    layers_to_save='all',\n                                    vis_mode='rolled')\nprint(model_history['linear_1_1:2'].activation)  # 第一次线性层的第二次传递\n\n'''\ntensor([[-0.0690, -1.3957, -0.3231, -0.1980,  0.7197],\n        [-0.1083, -1.5051, -0.2570, -0.2024,  0.8248],\n        [ 0.1031, -1.4315, -0.5999, -0.4017,  0.7580],\n        [-0.0396, -1.3813, -0.3523, -0.2008,  0.6654],\n        [ 0.0980, -1.4073, -0.5934, -0.3866,  0.7371],\n        [-0.1106, -1.2909, -0.3393, -0.2439,  0.7345]])\n'''\n```\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjohnmarktaylor91_torchlens_readme_98594a08ddf5.png\" width=30% height=30%>\n\n再来看一个非常复杂的 Transformer 模型（[swin_v2_b](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.14030)），其前向传播过程中有 1932 个操作；你可以获取到每一个操作的保存输出：\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjohnmarktaylor91_torchlens_readme_57534eeb90a8.jpg\" width=\"70%\" height=\"70%\">\n\n*TorchLens* 的目标是能够对任何 PyTorch 模型实现上述功能。你可以在这个 [模型动物园](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Fu\u002F0\u002Ffolders\u002F1BsM6WPf3eB79-CRNgZejMxjg38rN6VCb) 中看到许多模型可视化的示例。\n\n## 安装\n\n要安装 *TorchLens*，首先确保已安装 graphviz（生成网络可视化所需的工具），然后使用 pip 安装 *TorchLens*：\n\n```bash\nsudo apt install graphviz\npip install 
torchlens\n```\n\n*TorchLens* 兼容 PyTorch 1.8.0 及以上版本。\n\n## 使用指南\n\n以下是使用方法的简短演示；如需交互式演示，请参阅 [CoLab 演示](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1ORJLGZPifvdsVPFqq1LYT3t5hV560SoW?usp=sharing)。\n\n*TorchLens* 的主要函数是 `log_forward_pass`：当它被调用时，会对给定的模型和输入执行一次前向传播，并返回一个 ModelHistory 对象，其中包含中间层的激活值及配套的元数据，同时还会生成前向传播过程中每个操作的可视化表示：\n\n```python\nimport torch\nimport torchvision\nimport torchlens as tl\n\nalexnet = torchvision.models.alexnet()\nx = torch.rand(1, 3, 224, 224)\nmodel_history = tl.log_forward_pass(alexnet, x, layers_to_save='all', vis_mode='unrolled')\nprint(model_history)\n\n'''\nAlexNet前向传播日志：\n\t模型结构：纯前馈，无分支；共23个模块。\n\t前向传播中计算了24个张量（4.8 MB）；保存了24个张量（4.8 MB）。\n\t参数操作16次（总参数数61100840个；248.7 MB）。\n\t随机种子：3210097511\n\t耗时：0.288秒\n\t模块层级：\n\t\tfeatures:\n\t\t    features.0, features.1, features.2, features.3, features.4, features.5, features.6, features.7,\n\t\t    features.8, features.9, features.10, features.11, features.12\n\t\tavgpool\n\t\tclassifier:\n\t\t    classifier.0, classifier.1, classifier.2, classifier.3, classifier.4, classifier.5, classifier.6\n\t层：\n\t\t0: input_1_0\n\t\t1: conv2d_1_1\n\t\t2: relu_1_2\n\t\t3: maxpool2d_1_3\n\t\t4: conv2d_2_4\n\t\t5: relu_2_5\n\t\t6: maxpool2d_2_6\n\t\t7: conv2d_3_7\n\t\t8: relu_3_8\n\t\t9: conv2d_4_9\n\t\t10: relu_4_10\n\t\t11: conv2d_5_11\n\t\t12: relu_5_12\n\t\t13: maxpool2d_3_13\n\t\t14: adaptiveavgpool2d_1_14\n\t\t15: flatten_1_15\n\t\t16: dropout_1_16\n\t\t17: linear_1_17\n\t\t18: relu_6_18\n\t\t19: dropout_2_19\n\t\t20: linear_2_20\n\t\t21: relu_7_21\n\t\t22: linear_3_22\n\t\t23: output_1_23\n'''\n```\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjohnmarktaylor91_torchlens_readme_a20d5a8a75e2.png\" width=30% height=30%>\n\n你可以通过以下任意一种等效方式索引 ModelHistory 对象，来提取特定层的信息，包括其激活值和有用的元数据：\n\n1) 层的名称（约定为 'conv2d_3_7' 表示第3个卷积层，整体的第7层）\n2) 该层所属模块的名称（例如 'features' 或 'classifier.3'）\n3) 层的序号位置（例如 2 表示第2层，-5 表示倒数第5层；这里的输入和输出也被计为层）。\n\n为了快速确定这些名称，你可以查看图可视化，或者直接打印 ModelHistory 
对象的输出（两者均已在上文展示）。以下是一些提取特定层信息以及该层实际激活值的示例：\n\n```python\nprint(model_history['conv2d_3_7'])  # 通过层名提取\n# 以下注释掉的行也能提取同一层：\n# model_history['conv2d_3'] 可以省略第二个数字（因为严格来说它是冗余的）\n# model_history['conv2d_3_7:1'] 冒号表示某层的哪一次传递（这里只有一次）\n# model_history['features.6'] 可以通过该层所属的模块来提取\n# model_history[7] 整体的第7层\n\n# model_history[-17] 倒数第17层\n'''\n层 conv2d_3_7，操作 8\u002F24：\n\t输出张量：形状=(1, 384, 13, 13)，类型=torch.float32，大小=253.5 KB\n\t\t张量([[ 0.0503, -0.1089, -0.1210, -0.1034, -0.1254],\n        [ 0.0789, -0.0752, -0.0581, -0.0372, -0.0181],\n        [ 0.0949, -0.0780, -0.0401, -0.0209, -0.0095],\n        [ 0.0929, -0.0353, -0.0220, -0.0324, -0.0295],\n        [ 0.1100, -0.0337, -0.0330, -0.0479, -0.0235]])...\n\t参数：由形状为 (384,) 和 (384, 192, 3, 3) 的参数计算得出；共 663936 个参数（2.5 MB）\n\t父层：maxpool2d_2_6\n\t子层：relu_3_8\n\t函数：conv2d（grad_fn=ConvolutionBackward0）\n\t在模块中计算：features.6\n\t耗时：5.670E-04s\n\t模块输出：features.6\n\t底层模块输出：features.6\n\t查找键：-17、7、conv2d_3_7、conv2d_3_7:1、features.6、features.6:1\n'''\n\n# 你可以通过 activation 字段提取某一层的实际输出激活值：\nprint(model_history['conv2d_3_7'].activation)\n'''\ntensor([[[[-0.0867, -0.0787, -0.0817,  ..., -0.0820, -0.0655, -0.0195],\n          [-0.1213, -0.1130, -0.1386,  ..., -0.1331, -0.1118, -0.0520],\n          [-0.0959, -0.0973, -0.1078,  ..., -0.1103, -0.1091, -0.0760],\n          ...,\n          [-0.0906, -0.1146, -0.1308,  ..., -0.1076, -0.1129, -0.0689],\n          [-0.1017, -0.1256, -0.1100,  ..., -0.1160, -0.1035, -0.0801],\n          [-0.1006, -0.0941, -0.1204,  ..., -0.1146, -0.1065, -0.0631]]...\n'''\n```\n\n如果你不想保存所有层的激活值（例如为了节省内存），可以在调用 `log_forward_pass` 时使用 `layers_to_save` 参数指定要保存的层；你可以像上面那样通过索引方式指定层，也可以传入一个子字符串来筛选层（例如，'conv' 会提取所有卷积层）：\n\n```python\n# 提取 conv2d_3_7、'features' 模块的输出、倒数第五层以及所有线性层（即 fc 层）：\nmodel_history = tl.log_forward_pass(alexnet, x, vis_mode='unrolled',\n                                    layers_to_save=['conv2d_3_7', 'features', -5, 'linear'])\nprint(model_history.layer_labels)\n'''\n['conv2d_3_7', 'maxpool2d_3_13', 
'linear_1_17', 'dropout_2_19', 'linear_2_20', 'linear_3_22']\n'''\n```\n\n*TorchLens* 的主要功能是 `log_forward_pass`；其余功能包括：\n\n1) `get_model_metadata`：用于检索所有模型元数据而无需保存任何激活值（例如，用于确定要保存哪些层；请注意，这与调用 `log_forward_pass` 并设置 `layers_to_save=None` 相同）\n2) `show_model_graph`：用于可视化模型图结构，同样无需保存任何激活值\n3) `validate_model_activations`：运行一个流程来检查激活值是否正确：具体来说，它会先执行一次前向传播并保存所有中间激活值，然后从每个中间层重新开始前向传播，并验证结果是否与真实输出一致。此外，它还会检查用随机噪声替换保存的激活值是否会生成错误的输出。**如果此函数返回 False（即保存的激活值有误），请通过电子邮件 (johnmarkedwardtaylor@gmail.com) 或本 GitHub 页面告知我问题详情，我将更新 TorchLens 以修复该问题。**\n\n以上就是全部内容。*TorchLens* 目前仍在积极开发中，目标是能够毫无例外地兼容任何 PyTorch 模型。截至撰写本文时，它已在超过 700 种图像、视频、音频、多模态和语言模型上进行了测试，涵盖前馈网络、循环神经网络、Transformer 以及图神经网络等。\n\n## 其他功能\n\n- 你可以使用 `log_forward_pass` 的 `vis_nesting_depth` 参数以不同嵌套深度可视化模型；例如，这里展示了 GoogLeNet 的“inception”模块在不同嵌套深度下的视图：\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjohnmarktaylor91_torchlens_readme_743e77279612.png\" width=80% height=80%>\n\n- 一项实验性功能是可以不仅提取模型所有操作的激活值，还可以提取反向传播中的梯度（这些梯度可以基于任意中间层计算，而不仅仅是模型的输出），并可视化反向传播的路径（如下方蓝色箭头所示）。有关如何实现此功能的说明，请参阅 CoLab 教程。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjohnmarktaylor91_torchlens_readme_f51f1a2efb26.png\" width=30% height=30%>\n\n- 你可以通过 func_call_stack 字段查看运行模型所用的确切代码。每一项都是一个 `FuncCallLocation` 对象，包含清晰的表示和源代码上下文：\n\n```python\nprint(model_history['conv2d_3'].func_call_stack[0])\n'''\nFuncCallLocation：\n  文件：\u002Fusr\u002Flocal\u002Flib\u002Fpython3.10\u002Fdist-packages\u002Ftorchvision\u002Fmodels\u002Falexnet.py\n  行：48\n  函数：forward\n  代码：\n          x = self.features(x)\n          x = self.avgpool(x)\n    --->  x = self.classifier(x)\n          return x\n'''\n```\n\n## 计划中的功能\n\n1) 在未来，我计划增加一项功能，不仅可以保存激活值，还可以对它们进行反事实干预（例如，如果这些参数不同或使用了不同的非线性激活函数，输出会如何变化）。请告诉我你是否觉得这项功能有用，以及你希望具体实现什么样的功能。\n2) 我计划添加一个选项，允许只可视化模型的一个子模块，而不是一次性展示整个图结构。\n\n## 其他值得了解的工具包\n\n我们的目标是让 *TorchLens* 
完全解决从深度神经网络中提取激活值和元数据、并可视化其结构的问题，从而让任何人都不再需要为此操心。然而，它有意省略了一些功能：例如，没有用于加载模型或刺激数据、也没有用于分析提取出的激活值的函数。这部分原因在于，我们无法预知你可能想对这些激活值做哪些操作，也无法预见所有你可能想要研究的模型类型；另一方面，目前已经有一些非常优秀的工具包可以完成这些任务。以下是一些推荐的工具包——如果你觉得我遗漏了什么，请告诉我！\n\n- [Cerbrec](cerbrec.com)：一款用于交互式可视化和调试深度神经网络的程序（底层使用了 *TorchLens* 来提取 PyTorch 模型的计算图！）\n- [ThingsVision](https:\u002F\u002Fgithub.com\u002FViCCo-Group\u002Fthingsvision)：提供了强大的视觉模型加载、刺激数据加载以及提取激活值后分析的功能。\n- [Net2Brain](https:\u002F\u002Fgithub.com\u002Fcvai-roig-lab\u002FNet2Brain)：与 ThingsVision 类似，同样具备出色的端到端功能，并且还支持将提取的激活值与神经科学数据进行比较。\n- [surgeon-pytorch](https:\u002F\u002Fgithub.com\u002Farchinetai\u002Fsurgeon-pytorch)：易于使用的模型激活值提取功能，同时还支持基于中间层激活值定义损失函数来训练模型。\n- [deepdive](https:\u002F\u002Fgithub.com\u002FColinConwell\u002FDeepDive)：在加载和基准测试多种不同模型方面表现出色。\n- [torchvision feature_extraction 模块](https:\u002F\u002Fpytorch.org\u002Fvision\u002Fstable\u002Ffeature_extraction.html)：可以从具有静态计算图的模型中提取激活值。\n- [rsatoolbox3](https:\u002F\u002Fgithub.com\u002Frsagroup\u002Frsatoolbox)：一个完整的解决方案，可用于对深度神经网络的激活值和脑数据进行表征相似性分析。\n\n## 致谢\n\n*TorchLens* 的开发得益于与 Nikolaus Kriegeskorte、George Alvarez、Alfredo Canziani、Tal Golan 以及哥伦比亚大学视觉推理实验室的深入讨论。同时，也感谢 Kale Kundert 提供的有益讨论及代码贡献，使得 *TorchLens* 能够兼容 PyTorch Lightning。所有网络可视化均使用 Graphviz 生成。Logo 由 Nikolaus Kriegeskorte 设计。\n\n## 引用 TorchLens\n\n若需引用 *TorchLens*，您可以参考[介绍该工具包的论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-023-40807-0)，并在发现 *TorchLens* 有用时为本仓库点亮一颗星：\n\nTaylor, J., Kriegeskorte, N. 使用 *TorchLens* 提取并可视化 PyTorch 模型的隐藏激活值和计算图。Sci Rep 13, 14375 (2023). 
https:\u002F\u002Fdoi.org\u002F10.1038\u002Fs41598-023-40807-0\n\n## 联系方式\n\n由于 *TorchLens* 仍处于积极开发阶段，我们非常期待您的反馈。如有任何问题、意见或建议，或者您有兴趣开展合作，请通过以下方式与我们联系：\n- 邮箱：johnmarkedwardtaylor@gmail.com\n- Twitter：[twitter.com\u002Fjohnmark_taylor](https:\u002F\u002Ftwitter.com\u002Fjohnmark_taylor)\n- GitHub 仓库的 [Issues](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fissues) 或 [Discussions](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fdiscussions) 页面。","# TorchLens 快速上手指南\n\nTorchLens 是一个强大的 PyTorch 工具，旨在无需修改模型代码的情况下，一键提取模型中**每一个**中间操作的激活值（Activations），并提供直观的计算图可视化及详细的元数据。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **Python 版本**：兼容主流 Python 3.x 版本。\n*   **PyTorch 版本**：需要 **1.8.0** 或更高版本。\n*   **系统依赖**：必须安装 **Graphviz** 用于生成网络结构可视化图像。\n    *   **Ubuntu\u002FDebian**: `sudo apt install graphviz`\n    *   **macOS (Homebrew)**: `brew install graphviz`\n    *   **Windows**: 请下载并安装 Graphviz 安装包，并将 `bin` 目录添加到系统环境变量 PATH 中。\n\n## 安装步骤\n\n推荐使用国内镜像源加速安装过程。\n\n1.  **安装系统依赖 Graphviz**（如已安装可跳过）：\n    ```bash\n    sudo apt install graphviz\n    ```\n\n2.  **通过 pip 安装 TorchLens**（使用清华源）：\n    ```bash\n    pip install torchlens -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n    ```\n\n## 基本使用\n\nTorchLens 的核心功能是 `log_forward_pass`。只需一行代码，即可记录模型的前向传播过程，获取所有中间层的激活值和计算图。\n\n### 最简单的使用示例\n\n以下示例展示了如何对预训练的 AlexNet 模型进行完整的前向传播日志记录：\n\n```python\nimport torch\nimport torchvision\nimport torchlens as tl\n\n# 1. 加载模型和输入数据\nalexnet = torchvision.models.alexnet()\nx = torch.rand(1, 3, 224, 224)\n\n# 2. 执行前向传播记录\n# layers_to_save='all': 保存所有层的激活值\n# vis_mode='unrolled': 展开显示计算图\nmodel_history = tl.log_forward_pass(alexnet, x, layers_to_save='all', vis_mode='unrolled')\n\n# 3. 查看摘要信息\nprint(model_history)\n\n# 4. 
提取特定层的激活值\n# 可以通过层名称（如 'conv2d_3_7'）、模块名或索引位置访问\nlayer_info = model_history['conv2d_3_7']\nactivations = layer_info.activation\n\nprint(f\"激活值形状：{activations.shape}\")\n```\n\n### 关键功能说明\n\n*   **自动可视化**：运行上述代码后，TorchLens 会自动生成并显示模型的计算结构图（需支持图形显示的终端或 Notebook 环境）。\n*   **灵活索引**：您可以通过多种方式访问特定层的数据：\n    *   按层名称：`model_history['conv2d_3_7']`\n    *   按模块输出：`model_history['features.6']`\n    *   按顺序索引：`model_history[7]`（按执行顺序的位置索引）或 `model_history[-1]`（最后一层）\n*   **节省内存**：如果不需要保存所有层，可通过 `layers_to_save` 参数指定特定层或关键词（如 `['conv', 'linear']`）来过滤。\n\n```python\n# 仅保存卷积层和名为 'features' 的模块输出\nmodel_history = tl.log_forward_pass(alexnet, x, layers_to_save=['conv', 'features'])\n```","某计算机视觉研究员正在调试一个包含复杂循环结构和动态分支的自定义 PyTorch 模型，试图定位导致梯度消失的具体算子位置。\n\n### 没有 torchlens 时\n- **代码侵入性强**：为了获取中间层输出，必须在模型源码中手动插入大量钩子（hooks）或修改 `forward` 函数，破坏了原有代码结构且容易引入新 bug。\n- **动态结构难追踪**：面对循环或条件分支，难以确定某次特定迭代（如第 3 次循环中的线性层）的具体激活值，只能靠打印猜测。\n- **黑盒可视化缺失**：缺乏自动生成的计算图，无法直观看到数据在数百个算子间的真实流向，排查拓扑错误全靠脑补。\n- **调试效率低下**：每次尝试提取不同层的特征都需要重新编写提取逻辑并重启训练，单次排查往往耗时数小时。\n\n### 使用 torchlens 后\n- **零代码侵入**：仅需调用 `tl.log_forward_pass` 一行代码，无需修改任何模型定义，即可自动捕获所有中间算子的输出结果。\n- **精准定位迭代**：通过生成的元数据字典，可直接索引到具体操作（如 `linear_1_3:2` 表示第 1 个线性层、全模型第 3 个操作的第 2 次通过），瞬间锁定异常数值。\n- **自动全景可视**：自动生成包含 1900+ 个节点的计算图可视化报告，清晰展示包括滚动循环在内的完整数据流转路径。\n- **即时反馈闭环**：一次性获取全量历史日志与结构信息，将原本数小时的“猜谜式”调试缩短为分钟级的精准分析。\n\ntorchlens 将 PyTorch 模型的黑盒前向传播转化为透明、可索引且可视化的白盒过程，让复杂网络的内部机理一目了然。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjohnmarktaylor91_torchlens_5ab59957.png","johnmarktaylor91","JohnMark Taylor","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fjohnmarktaylor91_4c4723ba.jpg","Visual neuroscience postdoc at Columbia University, working in the Kriegeskorte Lab. 
","Columbia University","New York, NY","johnmarkedwardtaylor@gmail.com","johnmark_taylor","johnmarktaylor.com","https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91",[86,90],{"name":87,"color":88,"percentage":89},"Python","#3572A5",99.5,{"name":91,"color":92,"percentage":93},"Jupyter Notebook","#DA5B0B",0.5,640,28,"2026-03-23T03:18:45","GPL-3.0","Linux, macOS, Windows","未说明 (依赖 PyTorch 环境，CPU\u002FGPU 均可运行)","未说明 (取决于模型大小及是否保存所有中间层激活值)",{"notes":102,"python":103,"dependencies":104},"安装前需先在操作系统层面安装 Graphviz 工具（例如 Linux 下使用 'sudo apt install graphviz'），否则无法生成网络可视化图像。该工具兼容 PyTorch 1.8.0 及以上版本，旨在支持任意 PyTorch 模型（包括 CNN、RNN、Transformer 等）。若需保存所有中间层激活值，内存消耗会随模型深度显著增加，可通过 layers_to_save 参数筛选特定层以节省内存。","未说明 (示例中显示使用了 Python 3.10)",[105,106],"torch>=1.8.0","graphviz",[13],null,"2026-03-27T02:49:30.150509","2026-04-06T08:40:10.923742",[112,117,122,127,132,136],{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},17988,"为什么在使用 TorchLens 可视化时，某些变量看起来被原地更新了（例如 y=x1）？","这是因为执行的函数是“原地”加法操作（in-place add，例如 `iadd`，如 `x += 1`）。在这种情况下，x1 会被原地更新，因此后续引用该变量的结果（如 y）会显示为更新后的值。这是 PyTorch 的正常行为，TorchLens 如实反映了计算图。如果希望逻辑更清晰，建议使用非原地操作，例如将 `x1 += x2` 改为 `add_output = x1 + x2`。","https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fissues\u002F23",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},17989,"TorchLens 是否支持带有性能指标（如运行时间、内存使用）的颜色编码图形可视化？","是的，该功能已在最新发布的版本中添加到了主分支。现在用户可以利用 TorchLens 进行带有颜色编码的计算图可视化，以展示每个计算图元素的性能指标（主要是运行时间，也包括内存指标），从而帮助识别性能瓶颈。请确保升级到最新版本以使用此功能。","https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fissues\u002F20",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},17990,"如何在使用 ESCNN 等第三方库构建的模型上使用 TorchLens？","TorchLens 通过给 PyTorch 命名空间中的所有函数添加装饰器来记录调用结果。对于像 ESCNN 这样导入方式特殊的库，核心问题在于装饰时机。目前的机制是在调用 `log_forward_pass` 时进行装饰，以确保外部调用时函数保持“干净”状态。如果遇到兼容性问题，可能需要确保在导入相关库之前或之后正确初始化 TorchLens，或者等待维护者实现更自动化的装饰方案（如在导入 TorchLens 时自动 exhaustive 
地装饰所有引用）。目前建议先尝试标准用法，若失败可检查导入顺序。","https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fissues\u002F18",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},17991,"在使用 Torchvision 的 Swin Transformer 模型并指定 `layers_to_save` 子集时，为什么报错说“计算图已改变”？","当使用 `log_forward_pass` 提取特定层（`layers_to_save`）的激活值时，如果输入数据或随机种子与初始调用时不一致，或者模型内部结构因动态特性导致计算图变化，就会触发此错误。对于 Swin Transformer 等复杂模型，确保每次运行时的输入张量形状、数据类型完全一致，并且固定随机种子（如使用 `torch.manual_seed()`）。如果问题依旧，可能是模型内部的动态控制流导致图结构不稳定，建议尝试不指定 `layers_to_save` 先运行完整日志，或检查模型是否处于 eval 模式以固定行为。","https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fissues\u002F58",{"id":133,"question_zh":134,"answer_zh":135,"source_url":116},17992,"运行 TorchLens 时出现关于 `has_cuda`, `has_cudnn` 等属性已弃用的警告，该如何处理？","这些警告源自 PyTorch 新版本中废弃了 `torch.has_cuda` 等旧属性，建议使用新的 API（如 `torch.backends.cuda.is_built()`）。虽然这些警告通常不影响 TorchLens 的核心功能，但它们表明你的 PyTorch 版本较新而 TorchLens 可能依赖了旧版接口。你可以忽略这些 UserWarning，或者升级 TorchLens 到最新版本（如果已修复）。如果警告伴随报错（如 TypeError），则需检查是否因 PyTorch 版本不兼容导致参数传递错误，此时应同步更新 TorchLens。",{"id":137,"question_zh":138,"answer_zh":139,"source_url":126},17993,"TorchLens 是如何实现对 PyTorch 函数的监控和日志记录的？","TorchLens 的工作原理是给 PyTorch 命名空间中的所有函数附加一个复杂的装饰器（decorator）。每当这些被装饰的函数被调用时，其结果就会被记录下来。目前，这种装饰操作是在调用 TorchLens 的主函数（如 `log_forward_pass`）时动态进行的，而不是在导入 TorchLens 库时立即执行。这样做的好处是，在 TorchLens 函数调用之外，PyTorch 函数保持其原始的“干净”状态，避免对日常训练或推理造成不必要的开销或副作用。",[141,146,151,156,161,166,171,176,181,186,191,196,201,206,211,216,221,226,231,236],{"id":142,"version":143,"summary_zh":144,"released_at":145},108396,"v1.0.1","## v1.0.1（2026-03-23）\n\n_本版本依据 GPL-3.0-only 许可证发布。_\n\n### 错误修复\n\n- **decoration**：在包装 Tensor.__getitem__ 后，清除过时的 sq_item C 插槽（[`b2c6085`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fb2c6085e31695b973b8d11c23c773af837a846cc)）\n\n当使用 Python 函数替换 C 扩展类型上的 __getitem__ 时，CPython 会在 tp_as_sequence 中设置 sq_item 插槽。这会导致 PySequence_Check(tensor) 返回 True（而在原生 PyTorch 中为 False），从而使得 `torch.tensor([0-d_tensor, 
...])` 将元素当作序列进行迭代并调用 len()——而对于零维张量，这会引发 TypeError。无论恢复原始的 wrapper_descriptor 还是使用 delattr，该插槽都不会被清除。\n\n修复方法：在每次 decoration\u002Fundecoration 循环后（decorate_all_once、unwrap_torch、wrap_torch），通过 ctypes 将 sq_item 置为 null。这样做是安全的，因为张量索引使用的是 mp_subscript（映射协议），而非 sq_item（序列协议）。已通过 tp_name 保护机制验证；在非 CPython 实现上则会静默失败。\n\n新增 9 个回归测试，覆盖所有生命周期路径。\n\n### 杂项\n\n- 添加 secret 检测的 pre-commit 钩子（[`0e2889a`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F0e2889ae90f822840895d9331e912a570d2a9acf)）\n\n添加 detect-private-key（pre-commit-hooks）和 detect-secrets（Yelp），以在密钥、令牌及高熵字符串泄露到仓库之前将其捕获。\n\n---\n\n**详细变更**：[v1.0.0...v1.0.1](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv1.0.0...v1.0.1)","2026-03-23T01:31:38",{"id":147,"version":148,"summary_zh":149,"released_at":150},108397,"v1.0.0","## v1.0.0 (2026-03-13)\n\n_本版本采用 GPL-3.0-only 许可证发布。_\n\n### 错误修复\n\n- **装饰**：为 mypy 的返回值检查，将 mode.device 强制转换为 str ([`45c0ff3`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F45c0ff3be8586675817905dcb273e9bad7ac0519))\n\nCI 中的 mypy（使用更严格的 torch stub 文件）检测到 mode.device 返回的是 torch.device 类型，而非 str。通过显式地进行 str() 转换，可以满足 Optional[str] 的返回类型注解。\n\n### 功能改进\n\n- **装饰**：惰性包装 — 导入 torchlens 不再有副作用 ([`b5da8b8`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fb5da8b8b15136f108534ba29022ab6579bdb9315))\n\n重大变更：torch 函数不再在导入时被自动包装。包装操作会延迟到首次调用 log_forward_pass() 时才进行，并且一旦完成就会持续生效。\n\n具体改动包括以下三点：\n\n1. 惰性装饰：移除了 __init__.py 中的 decorate_all_once() 和 patch_detached_references() 调用。现在由 _ensure_model_prepared() 在首次使用时通过 wrap_torch() 触发包装。\n\n2. 公开的包装\u002F解包 API：\n   - torchlens.wrap_torch() — 安装包装器（幂等操作）\n   - torchlens.unwrap_torch() — 恢复原 torch 可调用对象\n   - torchlens.wrapped() — 上下文管理器（进入时包装，退出时解包）\n   - log_forward_pass(unwrap_when_done=True) — 一次性便捷方法\n   此外，旧的名称（undecorate_all_globally、redecorate_all_globally）仍作为内部别名保留。\n\n3. 
torch.identity 修复：装饰后的 identity 函数现在存储在 _state._decorated_identity 中，而不是通过猴子补丁的方式修改 torch.identity（该函数在 PyTorch 类型 stub 中并不存在）。这一改动消除了两个 mypy 错误。\n\n测试已更新：共通过 75 个测试用例，其中包括 12 个新的生命周期测试。\n\n### 重大变更\n\n- **装饰**：torch 函数不再在导入时被自动包装。包装操作会延迟到首次调用 log_forward_pass() 时才进行，并且一旦完成就会持续生效。\n\n---\n\n**详细变更日志**：[v0.22.0...v1.0.0](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.22.0...v1.0.0)","2026-03-13T22:06:26",{"id":152,"version":153,"summary_zh":154,"released_at":155},108398,"v0.22.0","## v0.22.0 (2026-03-13)\n\n_本版本依据 GPL-3.0-only 许可证发布。_\n\n### 杂项任务\n\n- **类型**: 移除过时的类型注解噪声 ([`5ba099d`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F5ba099df82f789f308f86d78a66c82b1baf3751d))\n\n### 文档\n\n- **维护**: 更新维护者说明 ([`685a358`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F685a358caa12b3210de68e136c59959de4c3f94f))\n\n- **维护**: 将 CLAUDE.md 和 AGENTS.md 拆分为架构与实现两个角色 ([`18f8ae9`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F18f8ae9851e91be5ca7c335facb45209212f6d95))\n\n打破符号链接镜像惯例：CLAUDE.md 现在包含架构层面的内容（是什么、为什么以及如何关联），而 AGENTS.md 则包含实现层面的内容（规范、陷阱、已知缺陷、测试命令）。纯实现相关的子目录（.github、scripts、tests、utils）仅保留 AGENTS.md。同时填充了 .project-context\u002F 模板文件（架构、规范、陷阱、决策）。\n\n### 功能\n\n- **装饰**: 添加全局取消装饰覆盖选项 ([`b0bafeb`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fb0bafeb51fbf9b406bd34a84ed826cd4b9c607bb))\n\n- **可视化**: 添加 dagua 与 torchlens 的集成 ([`35d5bcd`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F35d5bcdb976420b86be54048ab4503cbbc146763))\n\n### 重构\n\n- **类型**: 完成包级别的 mypy 清理 ([`6f9a3fe`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F6f9a3fe423367c9bd9e32949cdf06945d90786ee))\n\n---\n\n**详细变更**: 
[v0.21.3...v0.22.0](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.21.3...v0.22.0)","2026-03-13T20:13:26",{"id":157,"version":158,"summary_zh":159,"released_at":160},108399,"v0.21.3","## v0.21.3（2026-03-11）\n\n*本版本依据 GPL-3.0-only 许可证发布。*\n\n### 错误修复\n\n- **tests**：使 SIGALRM 信号安全测试具有确定性（[`b3fc461`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fb3fc46154f0942c829c3bbeb9b07f3d900471b79)）\n  \n  将基于定时器的 SIGALRM 替换为在 `forward()` 内直接调用 `os.kill()`，以确保信号始终在日志记录过程中触发。这消除了当前向传播在定时器到期之前完成时导致的不稳定的跳过现象。\n\n  共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**详细变更**：[v0.21.2...v0.21.3](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.21.2...v0.21.3)","2026-03-11T23:34:20",{"id":162,"version":163,"summary_zh":164,"released_at":165},108400,"v0.21.2","## v0.21.2 (2026-03-09)\n\n_本版本依据 GPL-3.0-only 许可证发布。_\n\n### 错误修复\n\n- **vis**: 避免在大型图上 ELK 布局失败时引发 graphviz.Digraph 内存炸弹 ([`f5563ee`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Ff5563eee457383772c9b148cca058b87e126b6b3))\n\n当 ELK 布局在 100 万节点以上的图上因 OOM 或超时而失败时，之前的回退路径会在 Python 中构建一个 graphviz.Digraph — 由于嵌套子图的 body 列表不断复制，内存占用会呈指数级增长，最终导致程序无限挂起。现在，render_elk_direct 会在内部处理这种失败情况：复用已收集的 Phase 1 数据生成不含位置信息的 DOT 文本，并直接使用 sfdp 渲染，完全绕过 graphviz.Digraph。\n\n共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n- **vis**: 对于大型图绕过 ELK，改用 Python 拓扑布局 ([`37cce3a`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F37cce3ab5607591f622b4fd7f5916bca16736d59))\n\nELK 的应力算法会分配两个 O(n²) 复杂度的距离矩阵（每个矩阵大小为 n² × 16 字节）。在 10 万个节点的情况下，这将消耗 160 GB 内存；而在 100 万个节点时，则需要 16 TB 内存——这就是导致 std::bad_alloc 异常的根本原因。此前设置的 15 万节点阈值根本无法解决问题。\n\n对于超过 10 万个节点的图，我们现在完全跳过 ELK，转而在 Python 中计算拓扑排序布局（采用 Kahn 算法，时间复杂度为 O(n+m)）。模块边界框则根据节点位置计算得出。最终结果会输入到相同的 neato -n 渲染流程中，从而保留聚类框。\n\n如果 ELK 在较小的图上也失败，我们将同样使用 Python 拓扑布局作为回退方案，而不是沿用之前会构建 graphviz.Digraph 并因嵌套子图 body 列表复制而爆内存的 sfdp 
路径。\n\n共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**详细变更**：[v0.21.1...v0.21.2](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.21.1...v0.21.2)","2026-03-09T17:53:27",{"id":167,"version":168,"summary_zh":169,"released_at":170},108401,"v0.21.1","## v0.21.1 (2026-03-09)\n\n_本版本依据 GPL-3.0-only 许可证发布。_\n\n### 错误修复\n\n- **postprocess**: 修复 _build_module_param_info 中的 mypy 类型错误 ([`11ea006`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F11ea006c9a683675c926a9b0d649a2fa24b46558))\n\n共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n### 杂项任务\n\n- 触发 CI ([`99f4102`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F99f4102fa34564868291214d6b6cf3dfc239fdce))\n\n共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n### 性能改进\n\n- **postprocess**: 针对大型模型优化流水线 ([`a211417`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fa2114170b4f5b441d4c0ab7add26435820fc8f8f))\n\n- 每步详细计时：将分组的 _vtimed 块解包为单步计时，并附带图统计摘要，方便用户定位具体哪一步耗时较长（O16）；  \n- 根据包含模块元组缓存 module_str，避免在第 6 步中重复拼接字符串（O8）；  \n- 在 _undecorate_all_saved_tensors 中添加早期退出判断，跳过无 captured_args\u002Fkwargs 的层的 BFS 遍历（O5）；  \n- 在 _build_module_logs 中预先计算 buffer_layers_by_module 字典，消除每模块 O(模块 × 缓冲区) 的扫描操作（O6）；  \n- 第 11 步重命名时采用单次遍历重建参数列表，取代原先的三次遍历——枚举 + 构建索引集合 + 过滤模式（O2）；  \n- 在 _trim_and_reorder 中用 dict 替代 OrderedDict（Python 3.7+ 会保留插入顺序），以降低内存分配开销（O4）；  \n- 在 _refine_iso_groups 中采用反向索引方法：时间复杂度从 O(成员²) 的全组合方式降至 O(成员 × 邻居)（O9）；  \n- 在 _merge_iso_groups_to_layers 中，先以 frozenset 形式预计算每个子图的参数类型，再进行配对循环（O10）；  \n- 使用集合实现 O(n) 复杂度的冲突检测，替代 _find_isomorphic_matches 中 O(n²) 的 .count() 调用（O12）\n\n共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**详细变更**：[v0.21.0...v0.21.1](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.21.0...v0.21.1)","2026-03-09T15:04:08",{"id":172,"version":173,"summary_zh":174,"released_at":175},108402,"v0.21.0","## 
v0.21.0 (2026-03-09)\n\n_本版本依据 GPL-3.0-only 许可证发布。_\n\n### 错误修复\n\n- **capture**: 修复 output_tensors 字段字典中的 mypy 类型错误 ([`d54e9a9`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fd54e9a99df47442ddc396ca20666203cfbeb5f06))\n\n将 fields_dict 注解为 Dict[str, Any]，并以正确类型提取 param_shapes，以满足 mypy 的严格类型推断。\n\n共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n- **vis**: 将堆内存限制传递给 ELK Worker 线程，以防止在 100 万节点图中发生 OOM ([`23ef8d8`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F23ef8d8c175846fce1b48427a0c980a825486b9c))\n\n运行 ELK 布局的 Node.js Worker 没有在其 resourceLimits 中显式设置 maxOldGenerationSizeMb — 只设置了 stackSizeMb。而 --max-old-space-size 标志仅控制主线程的 V8 隔离区，而非 Worker 的。这导致 Worker 在处理约 16GB 内存的 100 万节点图时仍会 OOM，尽管主线程已配置为最多 64GB。\n\n- 向 Worker 的 resourceLimits 添加 maxOldGenerationSizeMb 和 maxYoungGenerationSizeMb，并通过 _TL_HEAP_MB 环境变量传递。\n- 添加 _available_memory_mb() 函数来检测系统 RAM，并将堆分配上限设为 (可用 - 4GB)，以避免与 Python 进程竞争资源。\n- 在 OOM 诊断信息中包含可用系统内存。\n\n此外，还包括来自 feat\u002Fgrand-rename 分支的字段和参数重命名。\n\n共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n### 文档更新\n\n- 使用深度探索会议 4 的发现更新所有 CLAUDE.md 文件 ([`b15c5bf`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fb15c5bfa665011002a67cdcc7710f4737f1fc5d6))\n\n同步项目及子包文档与当前代码库：\n- 更新了全部 36 个模块的行数统计。\n- 在 visualization\u002F 目录下添加了 elk_layout.py 的文档。\n- 将 arg_positions.py 和 salient_args.py 移至 capture\u002F 目录。\n- 记录了 13 个新 bug（ELK-IF-THEN、BFLOAT16-TOL 等）。\n- 更新了测试数量（共 16 个文件，总计 1,004 个测试）。\n- 在 validation\u002F、utils\u002F 和 decoration\u002F 目录下新增已知 bug 章节。\n- 更新了 data_classes\u002F 目录，添加了新字段和属性。\n\n共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n### 功能改进\n\n- 为提高清晰度，重命名所有数据结构字段和函数参数 ([`f0d7452`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Ff0d7452e272bede7a874e1bda2dbce68dbb94697))\n\n对全部 8 种数据结构（ModelLog、LayerPassLog、LayerLog、ParamLog、ModuleLog、BufferLog、ModulePassLog、FuncCallLocation）以及面向用户的函数参数进行了约 68 
处重命名。主要变更如下：\n\n- tensor_contents → activation，grad_contents → gradient\n- 所有 *_fsize* → *_memory*（例如 tensor_fsize → tensor_memory）\n- func_applied_name → func_name，gradfunc → grad_fn_name\n- is_bottom_level_submodule_output → is_leaf_module_output\n- containing_module_origin → containing_module\n- spouse_layers → co_parent_layers，orig_ancestors → root_ancestors\n- model_is_recurrent → is_recurrent，elapsed_time_* → time_*\n- vis_opt → vis_mode，save_only → vis_save_only\n- 修正拼写错误：output_descendents → output_des","2026-03-09T12:04:33",{"id":177,"version":178,"summary_zh":179,"released_at":180},108403,"v0.20.5","## v0.20.5 (2026-03-09)\n\n_本版本依据 GPL-3.0-only 许可证发布。_\n\n### 错误修复\n\n- **vis**: 防止 100 万节点的 ELK 渲染因内存不足而被 OOM 杀死 ([#128](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fpull\u002F128), [`d9a1525`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fd9a15259e452bf3c224732a0bc5a1671f9cbb56e))\n\n100 万节点的渲染在 RSS 达到约 74GB 时被 OOM 杀死，原因如下：1. 模型参数（约 8–10GB）在 ELK 子进程中一直存活；2. `preexec_fn` 强制执行 fork+exec，导致写时复制使 74GB 的进程内存占用翻倍；3. 堆和栈的计算公式产生了荒谬的值（堆 5.6TB，栈 15GB）；4. 
在启动子进程前未进行内存清理。\n\n更改内容：\n- `render_large_graph.py`: 将 `log_forward_pass` 与 `render_graph` 分离，在 ELK 渲染前释放模型和自动求导相关资源；\n- `elk_layout.py`: 将堆内存上限设为 64GB，栈内存下限设为 4096MB、上限设为 8192MB；将 JSON 写入临时文件（在子进程启动前释放字符串占用的内存）；在启动子进程前调用 `gc.collect`；并在模块级别设置 `RLIMIT_STACK`（移除 `preexec_fn` 及其强制的 fork+exec 操作）。\n\n共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**详细变更**: [v0.20.4...v0.20.5](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.20.4...v0.20.5)","2026-03-09T02:49:14",{"id":182,"version":183,"summary_zh":184,"released_at":185},108404,"v0.20.4","## v0.20.4 (2026-03-09)\n\n_本版本依据 GPL-3.0-only 许可证发布。_\n\n### 错误修复\n\n- **postprocess**: 修复条件分支检测中仅向后传播的泛洪问题，并添加 THEN 标签 ([#88](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fpull\u002F88), [`d737828`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fd7378281a4be5e56602902cd4cf1aa555b391d44))\n\n  问题 #88：_mark_conditional_branches 曾以双向方式（父节点与子节点）进行泛洪，导致非条件节点的子节点也被错误地标记为处于条件分支中。此次修复将泛洪范围限制为仅父层。\n\n  此外，在 save_source_context=True 时，通过 AST 分析新增了 THEN 分支检测功能，并在可视化中添加 IF\u002FTHEN 边缘标签。还增加了 8 个新的测试模型、22 个新测试用例，并修复了 MODEL_LOG_FIELD_ORDER 中缺失的 'verbose' 字段。\n\n  共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n- **vis**: 使用 Worker 线程处理 ELK 布局，以解决大型图谱中的栈溢出问题 ([`3fe6a84`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F3fe6a84c3d9d8a5856dfadcf451ef85700a2385c))\n\n  V8 的 --stack-size 标志会默默地将栈大小限制在远低于请求值的水平，从而在拥有 100 万以上节点的图谱中引发“调用栈大小已超出最大值”的错误。现改用 Node.js 的 Worker 线程，并设置 resourceLimits.stackSizeMb 参数，可在 V8 隔离域级别可靠地提供请求的栈大小。\n\n  共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**详细变更**：[v0.20.3...v0.20.4](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.20.3...v0.20.4)","2026-03-09T01:03:14",{"id":187,"version":188,"summary_zh":189,"released_at":190},108405,"v0.20.3","## v0.20.3 (2026-03-08)\n\n_本版本依据 GPL-3.0-only 许可证发布。_\n\n### Bug 修复\n\n- **vis**: 
将 ELK Node.js 堆栈大小下限提高至 4GB，以支持大型图谱（[`29af94e`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F29af94ed51b9018ba214f2de15c48e5454a773b3)）\n\n此前的 128MB 对于 ELK 在 50 万节点以上的大规模图谱上的递归布局来说过于不足。\n\n共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n- **vis**: 提升 ELK Node.js 子进程的操作系统堆栈限制（[`da82c9d`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fda82c9d0fd66c2fdcbaabb86b31750811693a930)）\n\n此前操作系统的软堆栈限制（ulimit -s）小于传递给 Node.js 的 --stack-size 参数值，导致在处理大规模图谱（50 万节点以上）时出现段错误，而非让 V8 使用请求的堆栈空间。通过使用 preexec_fn 仅在子进程中将 RLIMIT_STACK 设置为无限制。\n\n共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n### 性能改进\n\n- **decoration**: 优化模型准备流程，并将会话属性移至 ModelLog 字典中（[`b63a4fa`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fb63a4fa3871113a5cc73554c576fb11c5eac2cd2)）\n\n针对 _prepare_model_session 及相关初始化代码进行了五项性能优化：\n\n- PERF-38：用 deque 替代 _traverse_model_modules 中的 O(N²) 列表拼接；  \n- PERF-37：在 _get_class_metadata 中按类缓存 user_methods，并将 _pytorch_internal 集合提升至模块级别的 frozenset；  \n- PERF-36：直接遍历 module._parameters，而非对 named_parameters 地址进行拆分后再查字典；  \n- PERF-39：跳过已准备好的模型的 patch_model_instance 操作；  \n- 将 4 个会话作用域内的模块属性（tl_module_pass_num、tl_module_pass_labels、tl_tensors_entered_labels、tl_tensors_exited_labels）从 nn.Module 实例中移至以 id(module) 为键的 ModelLog 字典中，并移除 tl_source_model_log（死代码）。此举消除了 _cleanup_model_session 中针对每个模块的清理迭代。\n\n在 1 万个模块的情况下：ensure_prepared 的重复调用耗时从约 48 毫秒降至约 0.4 毫秒（提升 111 倍），会话初始化速度提升约 1.3 倍，清理速度提升约 1.4 倍。\n\n共同作者：Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**详细变更**：[v0.20.2...v0.20.3](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.20.2...v0.20.3)","2026-03-08T23:49:07",{"id":192,"version":193,"summary_zh":194,"released_at":195},108406,"v0.20.2","## v0.20.2 (2026-03-08)\n\n_This release is published under the GPL-3.0-only License._\n\n### Bug Fixes\n\n- **vis**: Increase ELK Node.js stack size to prevent overflow 
([`b8edbc8`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fb8edbc8c779dda44b046c9efa4855b3a6fad46f7))\n\nBump --stack-size floor from 64MB to 128MB and multiplier from 16x to 48x (matching heap scaling) to prevent \"Maximum call stack size exceeded\" in elkjs on large graphs.\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n### Chores\n\n- **scripts**: Enable loop detection in render_large_graph ([`803e16f`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F803e16f3ecec7e2fbb1c662ce963f7de51f4d16c))\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**Detailed Changes**: [v0.20.1...v0.20.2](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.20.1...v0.20.2)\n","2026-03-08T20:26:07",{"id":197,"version":198,"summary_zh":199,"released_at":200},108407,"v0.20.1","## v0.20.1 (2026-03-08)\n\n_This release is published under the GPL-3.0-only License._\n\n### Chores\n\n- **scripts**: Use log_forward_pass vis_opt instead of separate render call ([`d2aea0f`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fd2aea0fab326e02444d99aadc7dcf12abf9c8b10))\n\nLet verbose mode handle all phase timing instead of manual timestamps. 
Use log_forward_pass's built-in vis_opt to render in one call.\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n### Performance Improvements\n\n- **model_prep**: Optimize _prepare_model_session for large models ([`2892323`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F2892323ac2910532c9bbde894280ba917a72e966))\n\n- Hoist set(dir(nn.Module)) to module-level constant _NN_MODULE_ATTRS\n- Replace dir(module) MRO walk with __dict__ scans for attrs and methods\n- Pre-build address→module dict to eliminate per-parameter tree walks\n- Use model.modules() with cached tl_module_address instead of second DFS\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**Detailed Changes**: [v0.20.0...v0.20.1](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.20.0...v0.20.1)\n","2026-03-08T19:42:57",{"id":202,"version":203,"summary_zh":204,"released_at":205},108408,"v0.20.0","## v0.20.0 (2026-03-08)\n\n_This release is published under the GPL-3.0-only License._\n\n### Chores\n\n- **scripts**: Unify large graph render scripts into single parameterized script ([`07a8186`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F07a8186537cf88bb39f4abae55a05dc01ec16457))\n\nReplace `run_250k.py` and `run_1M.py` with `render_large_graph.py` that accepts any node count as a CLI argument, plus --format, --seed, and\n--outdir options.\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n### Features\n\n- **logging**: Add verbose mode for timed progress messages ([`0603f10`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F0603f1035b8be7c2c44a416099fac282eaafa686))\n\nAdd `verbose: bool = False` parameter to `log_forward_pass`, `show_model_graph`, and internal pipeline functions. When enabled, prints `[torchlens]`-prefixed progress at each major pipeline stage with timing. 
Also fixes `_trim_and_reorder_model_history_fields` to preserve all non-ordered attributes (not just private ones).\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**Detailed Changes**: [v0.19.0...v0.20.0](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.19.0...v0.20.0)\n","2026-03-08T19:01:49",{"id":207,"version":208,"summary_zh":209,"released_at":210},108409,"v0.19.0","## v0.19.0 (2026-03-08)\n\n_This release is published under the GPL-3.0-only License._\n\n### Chores\n\n- Add large graph render scripts to scripts\u002F ([`d02233b`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fd02233bc050657ed6088b8338e056c2c531d45bd))\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n### Documentation\n\n- Update RESULTS.md and tests\u002FCLAUDE.md with current counts ([`75fa346`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F75fa3463b16505d21b3498a29c848e5da248bd36))\n\n- Total tests: 892 → 951, test files: 14 → 15, toy models: 249 → 250\n- Add test_large_graphs.py (51 tests) to file tables\n- Add decoration overhead benchmark table to profiling baselines\n- Add large graph scaling section (100 to 1M nodes)\n- Update all per-file test counts to current values\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n### Features\n\n- **capture**: Add func_config field for lightweight hyperparameter extraction ([`7144d6d`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F7144d6d5200bb17377956ffb8c579553ae34ddfe))\n\nAdds a `func_config` dict to every LayerPassLog\u002FLayerLog containing computation-defining hyperparameters (kernel_size, stride, in\u002Fout channels, dropout p, etc.) extracted at capture time with zero tensor cloning. 
Empty for source tensors and output nodes.\n\nAlso fixes pre-existing test failures in test_validation.py (read-only property assignments) and adds detect_loops to MODEL_LOG_FIELD_ORDER.\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n### Testing\n\n- **profiling**: Add decoration overhead benchmark to profiling report ([`c8fbffd`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fc8fbffdc43b4f97a2e7ab90cecd78a8d59950592))\n\nMeasures per-call overhead of TorchLens's toggle-gated wrappers when logging is disabled. Benchmarks 11 functions from cheap (relu, add) to heavy (conv2d, SDPA) — confirms ~600ns overhead on cheap ops, \u003C1% on real compute.\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**Detailed Changes**: [v0.18.0...v0.19.0](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.18.0...v0.19.0)\n","2026-03-08T18:41:14",{"id":212,"version":213,"summary_zh":214,"released_at":215},108410,"v0.18.0","## v0.18.0 (2026-03-08)\n\n_This release is published under the GPL-3.0-only License._\n\n### Features\n\n- **data**: Add MACs properties to LayerPassLog, LayerLog, ModelLog, ModuleLog ([`c60e7b9`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fc60e7b9a2fd29de09d82c5be076e75c6e4c8b39d))\n\nMACs (multiply-accumulate operations) = FLOPs \u002F 2. 
Added:\n- LayerPassLog: macs_forward, macs_backward properties\n- LayerLog: macs_forward, macs_backward properties\n- ModelLog: total_macs_forward, total_macs_backward, total_macs, macs_by_type()\n- ModuleLog: flops_forward, flops_backward, flops, macs_forward, macs_backward, macs\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n### Refactoring\n\n- **data**: Convert 23 stored fields to computed @properties across data classes ([`2c22208`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F2c22208a72636bb796268510a3529cf1183519eb))\n\nReplace redundant stored fields with computed @property methods that derive their values from existing data. Eliminates ~155 lines of write-site code across 13 files while preserving identical behavior and passing all invariants.\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**Detailed Changes**: [v0.17.0...v0.18.0](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.17.0...v0.18.0)\n","2026-03-08T15:00:34",{"id":217,"version":218,"summary_zh":219,"released_at":220},108411,"v0.17.0","## v0.17.0 (2026-03-08)\n\n_This release is published under the GPL-3.0-only License._\n\n### Bug Fixes\n\n- **types**: Add mypy annotations for defaultdict and deque in elk_layout ([`98478a3`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F98478a33a28e890554bad6325478efb8a2ff5f85))\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n### Features\n\n- **vis**: Scale ELK rendering to 250k+ nodes, add detect_loops option ([`4707631`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F470763146ce30334b5853cfc7cfd104035fd9922))\n\nELK's layered algorithm (Sugiyama) uses O(n²) memory for crossing minimization, causing std::bad_alloc at ~150k+ nodes. 
This adds:\n\n- Auto-switch to ELK stress algorithm above 150k nodes (O(n) memory)\n- Topological position seeding for stress to preserve directional flow\n- Increased Node.js heap allocation (16GB floor, 48x JSON size)\n- Better error messages when Node.js OOM-kills (was silent empty stderr)\n- `detect_loops` parameter on log_forward_pass\u002Fshow_model_graph to skip expensive isomorphic subgraph expansion, keeping only same-param grouping (Rule 1). Default True (existing behavior unchanged).\n- 8 loop comparison tests rendering with\u002Fwithout loop detection\n\nSuccessfully renders 250k-node graphs in ~19 minutes (was impossible). 1M-node render in progress.\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**Detailed Changes**: [v0.16.4...v0.17.0](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.16.4...v0.17.0)\n","2026-03-08T13:55:50",{"id":222,"version":223,"summary_zh":224,"released_at":225},108412,"v0.16.4","## v0.16.4 (2026-03-08)\n\n_This release is published under the GPL-3.0-only License._\n\n### Bug Fixes\n\n- **validation**: Use call nesting depth not address depth in invariant Q, fix test inputs ([#117](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fpull\u002F117), [`98660ff`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F98660ff114f31a2324d3268c0d925cf0b31c02c9))\n\n---\n\n**Detailed Changes**: [v0.16.3...v0.16.4](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.16.3...v0.16.4)\n","2026-03-08T01:55:56",{"id":227,"version":228,"summary_zh":229,"released_at":230},108413,"v0.16.3","## v0.16.3 (2026-03-08)\n\n_This release is published under the GPL-3.0-only License._\n\n### Bug Fixes\n\n- **tests**: Rename model_kwargs to input_kwargs and fix test configs ([`8768339`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F87683393e2c38a2703a73643c7b975517bffd589))\n\n- 
Rename model_kwargs= to input_kwargs= in 17 test call sites to match API\n- Skip test_gpt_bigcode (JIT-compiled attention incompatible with TorchLens)\n- Fix test_deformable_detr backbone out_features to match 2-layer ResNet\n- Add index, symint, checkkeypaddingmask to _UNARY_FUNCS ArgSpec entries\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n- **types**: Resolve 14 pre-existing mypy errors across 4 files ([`7adcc36`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F7adcc362552697cf8961024d9bc1dcd7a4fb347f))\n\n- rng.py: annotate rng_dict as Dict[str, object] for mixed-type values\n- elk_layout.py: annotate out_shape as tuple to avoid index-out-of-range\n- model_prep.py: annotate meta as Dict[str, object], add type: ignore for\n  dynamic tl_buffer_address attrs on Tensor\n- output_tensors.py: add type: ignore for ArgSpec arg-type and assignment\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**Detailed Changes**: [v0.16.2...v0.16.3](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.16.2...v0.16.3)\n","2026-03-08T00:29:34",{"id":232,"version":233,"summary_zh":234,"released_at":235},108414,"v0.16.2","## v0.16.2 (2026-03-08)\n\n_This release is published under the GPL-3.0-only License._\n\n### Bug Fixes\n\n- **validation**: Exempt exponential_ from perturbation check ([`a6dce40`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fa6dce40dc314a36a0296a7c65dd4faf04647b126))\n\nIn-place RNG op — output is determined by RNG state, not input values. Fixes test_gumbel_vq_model.\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n- **validation**: Exempt maximum\u002Fminimum from perturbation check ([`9d658bc`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F9d658bc1d80222523ec7e410c0cf1d38665b16f2))\n\ntorch.maximum\u002Fminimum with extreme-valued args (e.g. 
RWKV's negative infinity masks) are insensitive to perturbation — same as max\u002Fmin.\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n- **validation**: Fix perturbation precision and add exemptions for C++ ops ([`7f86655`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F7f86655705a8d914126a174ff8b95cf3cc123cee))\n\n- Scale constant-tensor perturbation by ±10% of magnitude to ensure\n  float32-distinguishable values while staying in safe range\n- Add posthoc magnitude-ratio exemption: when non-perturbed parent's\n  magnitude dwarfs the perturbed parent's (>100x), float32 arithmetic\n  swallows the perturbation — exempt rather than fail\n- Add maximum\u002Fminimum to posthoc exemption (element-wise binary ops\n  insensitive to perturbation when one arg dominates)\n- Skip perturbation for _op (torchvision C++ PyCapsule ops like nms,\n  roi_align) — perturbed coordinates segfault native extensions\n- Fix nystromformer test: seq_len must equal num_landmarks² (64)\n\nFixes: test_fcos_resnet50_train, test_retinanet_resnet50_train, test_nystromformer, test_maskrcnn_resnet50_train\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n- **validation**: Scale perturbation range by magnitude for large constants ([`efcd876`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fefcd8766b38c6ea9f5d3715fac4994509985299b))\n\nConstant tensors with large values (e.g. 1e8) were perturbed by ±1.0, which is indistinguishable in float32 precision. 
Now scales expansion by abs(value) * 1e-4 to ensure perturbation is always detectable.\n\nFixes test_fcos_resnet50_train validation failure.\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n- **validation**: Use relative threshold for near-constant tensor perturbation ([`e0c82c7`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fe0c82c7a63e63d2f6e033274d152afc68ecb060b))\n\nNear-constant float tensors (e.g. [2.6785714, 2.6785717]) had a value range smaller than float32 precision, so perturbation within [min, max] produced the same value after rounding. Changed exact `lo == hi` check to relative threshold `hi - lo \u003C max(1e-6, abs(lo) * 1e-6)` to trigger range expansion for these cases.\n\nFixes test_ssd300_vgg16_train validation failure.\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n- **vis**: Probe nvm paths when node is not on PATH ([`4d3ae24`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F4d3ae2414bbcc0835d4cb01da230f158004da690))\n\nNon-interactive shells (IDE test runners, cron, subprocesses) often lack nvm's PATH additions, causing elk_available() to return False even when elkjs is installed. 
Add _find_node_binary() to probe ~\u002F.nvm\u002Fversions\u002Fnode\u002F as a fallback, and inject the node binary's directory into the subprocess PATH so elkjs detection works regardless of shell configuration.\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**Detailed Changes**: [v0.16.1...v0.16.2](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.16.1...v0.16.2)\n","2026-03-08T00:02:51",{"id":237,"version":238,"summary_zh":239,"released_at":240},108415,"v0.16.1","## v0.16.1 (2026-03-07)\n\n_This release is published under the GPL-3.0-only License._\n\n### Bug Fixes\n\n- **tests**: Use relative import for example_models in test_large_graphs ([`e2d0ae4`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002Fe2d0ae476e4d54c2211a3629e8958032c82b1c0f))\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n- **vis**: Harden ELK heap scaling and fix flaky signal safety test ([`41b9f89`](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcommit\u002F41b9f893692bc39a4ea0342092df62b2f2ee2b38))\n\n- Bump ELK Node.js heap scaling from 8x to 16x JSON size to prevent OOM\n  on 250k+ node graphs\n- Mark 100k node tests as @rare (too slow for regular runs)\n- Fix flaky TestSignalSafety: use setitimer(50ms) instead of alarm(1s),\n  increase model iterations to 50k, skip if alarm doesn't fire\n\nCo-Authored-By: Claude Opus 4.6 \u003Cnoreply@anthropic.com>\n\n---\n\n**Detailed Changes**: [v0.16.0...v0.16.1](https:\u002F\u002Fgithub.com\u002Fjohnmarktaylor91\u002Ftorchlens\u002Fcompare\u002Fv0.16.0...v0.16.1)\n","2026-03-07T22:00:23"]
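The v0.16.2 perturbation-precision fixes all come down to float32 granularity. A minimal standalone sketch (plain Python, not torchlens code) showing why a ±1.0 perturbation vanishes at magnitude 1e8 while the magnitude-scaled delta `abs(value) * 1e-4` survives, and why the relative threshold `hi - lo < max(1e-6, abs(lo) * 1e-6)` fires for the near-constant example where an exact `lo == hi` check would not:

```python
import struct

def as_float32(x: float) -> float:
    """Round-trip a Python float through IEEE-754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

value = 1e8
# At magnitude 1e8 the float32 spacing (ulp) is 8.0, so a +/-1.0
# perturbation rounds back to the original value: undetectable.
assert as_float32(value + 1.0) == as_float32(value)

# The magnitude-scaled delta from the fix is 1e4 here, far above the
# ulp, so it survives rounding and remains detectable downstream.
delta = abs(value) * 1e-4
assert as_float32(value + delta) != as_float32(value)

# Near-constant range check: the exact lo == hi test misses ranges
# narrower than float32 precision; the relative threshold catches them.
lo, hi = 2.6785714, 2.6785717
assert lo != hi  # exact equality check would not trigger expansion
assert hi - lo < max(1e-6, abs(lo) * 1e-6)  # relative threshold does
```

The same arithmetic explains the ">100x magnitude ratio" exemption: when one operand dwarfs the other by more than the ~7 decimal digits float32 carries, the smaller operand's perturbation is swallowed entirely.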
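The v0.16.2 nvm-probing fix can be sketched in a few lines. This is a hypothetical stand-in for the `_find_node_binary()` helper the changelog describes (the real torchlens probing logic may differ): check PATH first, fall back to nvm's default install layout, then prepend the binary's directory to the PATH handed to subprocesses.

```python
import glob
import os
import shutil
from typing import Optional

def find_node_binary() -> Optional[str]:
    """Locate a `node` executable: PATH first, then nvm's install tree.

    Hypothetical sketch of the fallback described in the changelog.
    """
    on_path = shutil.which("node")
    if on_path:
        return on_path
    # Non-interactive shells (IDE runners, cron) often miss nvm's PATH
    # edits, so probe the default layout ~/.nvm/versions/node/*/bin/node
    # directly and prefer the highest-sorting (newest-looking) version.
    pattern = os.path.expanduser("~/.nvm/versions/node/*/bin/node")
    candidates = sorted(glob.glob(pattern))
    return candidates[-1] if candidates else None

node = find_node_binary()
if node is not None:
    # Prepend the binary's directory so child processes resolve `node`
    # regardless of the parent shell's configuration.
    child_path = os.path.dirname(node) + os.pathsep + os.environ.get("PATH", "")
```

Injecting the directory into the child's PATH (rather than invoking the absolute path everywhere) also lets elkjs's own subprocess spawns find `node`.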
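The flaky-test fix in v0.16.1 swaps `alarm(1s)` for `setitimer(50ms)`: `signal.alarm()` only takes whole seconds, while `signal.setitimer()` accepts float seconds, so the interrupt lands while the work under test is still running. A minimal POSIX-only sketch of that pattern (not the actual TestSignalSafety code):

```python
import signal
import time

fired = {"flag": False}

def on_alarm(signum, frame):
    fired["flag"] = True

signal.signal(signal.SIGALRM, on_alarm)
# setitimer accepts float seconds, enabling a 50 ms timer; alarm()
# rounds to whole seconds, so its earliest fire is a full second away.
signal.setitimer(signal.ITIMER_REAL, 0.05)

# Busy loop standing in for the model iterations the signal interrupts;
# bail out after 1 s so a missed signal cannot hang the run (the real
# test skips instead of failing in that case).
deadline = time.monotonic() + 1.0
while not fired["flag"] and time.monotonic() < deadline:
    pass

signal.setitimer(signal.ITIMER_REAL, 0.0)  # cancel any pending timer
```

The 50 ms timer makes the interrupt-mid-work window deterministic instead of racing a one-second alarm against a workload that might finish first.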