[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-rgeirhos--texture-vs-shape":3,"tool-rgeirhos--texture-vs-shape":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",148568,2,"2026-04-09T23:34:24",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108111,"2026-04-08T11:23:26",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":78,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":99,"forks":100,"last_commit_at":101,"license":102,"difficulty_score":10,"env_os":103,"env_gpu":104,"env_ram":103,"env_deps":105,"category_tags":113,"github_topics":114,"view_count":32,"oss_zip_url":79,"oss_zip_packed_at":79,"status":17,"created_at":122,"updated_at":123,"faqs":124,"releases":153},6062,"rgeirhos\u002Ftexture-vs-shape","texture-vs-shape","Pre-trained models, data, code & materials from the paper \"ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness\" (ICLR 2019 Oral)","texture-vs-shape 是一个源自 ICLR 2019 口头报告论文的开源项目，提供了预训练模型、数据集及核心代码。它主要揭示并解决了一个关键问题：在 ImageNet 上训练的卷积神经网络（CNN）往往存在严重的“纹理偏差”。这意味着模型在识别物体时，过度依赖表面纹理而非形状特征。例如，当看到一只拥有大象纹理的猫时，模型会错误地将其识别为大象，这与人类主要依靠形状认知的机制截然不同。\n\n该项目通过提供风格化 
ImageNet（Stylized-ImageNet）等独特数据集，帮助研究人员量化模型的形状偏差，并验证了增加形状偏差能显著提升模型的准确率与鲁棒性。其技术亮点在于不仅复现了论文中的对比实验，还持续更新工具箱，支持轻松评估任意 PyTorch 或 TensorFlow 模型的形状偏好，甚至能绘制直观的偏差分析图表。\n\ntexture-vs-shape 非常适合 AI 研究人员、深度学习开发者以及对模型可解释性感兴趣的数据科学家使用。如果你希望深入探究神经网络的内部决策机制，或者致力于构建更抗干扰、更接近人类视觉系统的计算机视觉模型，这套资源将提供坚实的数据基础与分析工","texture-vs-shape 是一个源自 ICLR 2019 口头报告论文的开源项目，提供了预训练模型、数据集及核心代码。它主要揭示并解决了一个关键问题：在 ImageNet 上训练的卷积神经网络（CNN）往往存在严重的“纹理偏差”。这意味着模型在识别物体时，过度依赖表面纹理而非形状特征。例如，当看到一只拥有大象纹理的猫时，模型会错误地将其识别为大象，这与人类主要依靠形状认知的机制截然不同。\n\n该项目通过提供风格化 ImageNet（Stylized-ImageNet）等独特数据集，帮助研究人员量化模型的形状偏差，并验证了增加形状偏差能显著提升模型的准确率与鲁棒性。其技术亮点在于不仅复现了论文中的对比实验，还持续更新工具箱，支持轻松评估任意 PyTorch 或 TensorFlow 模型的形状偏好，甚至能绘制直观的偏差分析图表。\n\ntexture-vs-shape 非常适合 AI 研究人员、深度学习开发者以及对模型可解释性感兴趣的数据科学家使用。如果你希望深入探究神经网络的内部决策机制，或者致力于构建更抗干扰、更接近人类视觉系统的计算机视觉模型，这套资源将提供坚实的数据基础与分析工具，助力你的科研工作更加严谨高效。","### :tada: Update (Aug 2021): \nPlotting the shape bias of your model has never been easier! The comprehensive toolbox at [bethgelab:model-vs-human](https:\u002F\u002Fgithub.com\u002Fbethgelab\u002Fmodel-vs-human) supports all datasets reported here (e.g. texture-shape cue conflict, silhouettes-only, edges-only) and comes with code to evaluate arbitrary PyTorch \u002F TensorFlow models.\n\n# Data, code and materials from \u003Cbr>\"ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness\"\n\nThis repository contains information, data and materials from the paper [ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness](https:\u002F\u002Fopenreview.net\u002Fforum?id=Bygh9j09KX) by Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. We hope that you may find this repository a useful resource for your own research.\n\nThe core idea is explained in the Figure below: If a Convolutional Neural Network sees a cat with elephant texture, it thinks it's an elephant even though the shape is still clearly a cat. 
We found this \"texture bias\" to be common for ImageNet-trained CNNs, which is in contrast to the widely held belief that CNNs mostly learn to recognise objects by detecting their shapes.\n![intro_figure](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frgeirhos_texture-vs-shape_readme_fff19b338a62.png) \n\nPlease don't hesitate to contact me at robert.geirhos@bethgelab.org or open an issue if you have any questions! Reproducibility & Open Science are important to me, and I appreciate feedback on what could be improved.\n\nThis README follows the structure of the repository: one section per subdirectory (in alphabetical order).\n\n##### Related repositories:\nNote that Stylized-ImageNet, an important dataset used in this paper, has its own repository at [rgeirhos:Stylized-ImageNet](https:\u002F\u002Fgithub.com\u002Frgeirhos\u002FStylized-ImageNet). The cue conflict dataset used to evaluate shape\u002Ftexture bias is NOT the same as Stylized-ImageNet.\n\nSome aspects of this repository are borrowed from our earlier work, \"Generalisation in humans and deep neural networks\" (published at NeurIPS 2018). The corresponding code, data and materials can be obtained from [rgeirhos:generalisation-humans-DNNs](https:\u002F\u002Fgithub.com\u002Frgeirhos\u002Fgeneralisation-humans-DNNs). For convenience, some human data from that repository (which are used in the texture-vs-shape work for comparison) are included here directly (under ``raw-data\u002Fraw-data-from-generalisation-paper\u002F``).\n\n\n## code\nThe ``code\u002F`` directory contains mapping functionality that can be used to determine the corresponding entry-level class (one of 16, e.g. \"dog\") from a vector of length 1,000 (the softmax output of a typical ImageNet classifier). 
To use it, follow the steps below:\n\n```python\n    import probabilities_to_decision # module in code\u002F; make sure this directory is on your Python path\n\n    # get softmax output (class probabilities, not raw logits)\n    softmax_output = SomeCNN(input_image) # replace with your favourite CNN\n\n    # convert to a 1-D numpy array of length 1,000\n    softmax_output_numpy = SomeConversionToNumpy(softmax_output) # replace with conversion\n\n    # create mapping\n    mapping = probabilities_to_decision.ImageNetProbabilitiesTo16ClassesMapping()\n\n    # obtain decision (one of the 16 entry-level classes)\n    decision_from_16_classes = mapping.probabilities_to_decision(softmax_output_numpy)\n```\n\n## data-analysis\nThe ``data-analysis\u002F`` directory contains the main analysis script ``data-analysis.R`` and some helper functionality. All generated plots are stored in the ``paper-figures\u002F`` directory.\n\nPlease note: For AlexNet, VGG-16 and GoogLeNet we used the caffe implementation; for ResNet-50, the torchvision implementation. When the shape bias of AlexNet and VGG-16 is computed with torchvision instead, the values differ (as pointed out in issue #7): the shape bias of AlexNet is 25.3%, and that of VGG-16 is 9.2%. Both values are lower than the caffe-based values reported in the paper, i.e. with the torchvision implementation these two models show an even more extreme texture bias. In general, we recommend using the torchvision implementation, since it is the more commonly used and more up-to-date framework.\n\n## lab-experiment\nEverything necessary to run an experiment in the lab with human participants. This is based on MATLAB.\n\n##### experimental-code\nContains the main MATLAB experiment, `shape_texture_experiment.m`, as well as a `.yaml` file that specifies the parameter values used in an experiment (such as the stimulus presentation duration). 
Some functions depend on our in-house iShow library, which can be obtained from [here](http:\u002F\u002Fdx.doi.org\u002F10.5281\u002Fzenodo.34217).\n\n##### helper-functions\nSome of the helper functions are based on other people's code; please see the corresponding files for the copyright notices.\n\n\n## models\nThe file ``load_pretrained_models.py`` loads the following models trained on Stylized-ImageNet:\n\n```python\n    from load_pretrained_models import load_model\n\n    model_A = \"resnet50_trained_on_SIN\"\n    model_B = \"resnet50_trained_on_SIN_and_IN\"\n    model_C = \"resnet50_trained_on_SIN_and_IN_then_finetuned_on_IN\"\n\n    model = load_model(model_name=model_A) # or model_B or model_C\n```\nThese correspond to the models reported in Table 2 of the paper (method details in Section A.5 of the Appendix). Additionally, AlexNet and VGG-16 trained on SIN are provided. Please note that the overall performance of those two models is not great, since the hyperparameters used during training were likely suboptimal. The top-1\u002Ftop-5 performance of VGG-16 trained on SIN and evaluated on ImageNet is: Prec@1 52.260, Prec@5 76.390 (evaluated on SIN: Prec@1 48.958, Prec@5 73.092).\n\nWe used the [PyTorch ImageNet training script](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fexamples\u002Ftree\u002Fmaster\u002Fimagenet) to train the models. These are the training hyperparameters:\n\n- batch size: 256\n- optimizer: SGD (`torch.optim.SGD`)\n- momentum: 0.9\n- weight decay: 1e-4\n- number of epochs: 60 (`model_A`) and 45 (`model_B`), respectively. 
However, these 45 epochs for `model_B` correspond to 90 epochs of normal ImageNet training, since the dataset used to train `model_B` is twice as large (combined ImageNet and Stylized-ImageNet): in every epoch, the classifier sees twice as many images as in a standard epoch.\n- learning rate: 0.1, multiplied by 0.1 every 20 epochs (`model_A`) and every 15 epochs (`model_B`), respectively.\n- pretrained on ImageNet: True (for `model_A` and for `model_B`), i.e. using `torchvision.models.resnet50(pretrained=True)`. Initialising the model weights with the standard weights from ImageNet training proved beneficial for overall accuracy.\n\n`model_C` was initialised with the weights of `model_B`. Fine-tuning on ImageNet was then performed for 60 epochs using a learning rate of 0.01, multiplied by 0.1 after 30 epochs. The other hyperparameters (batch size, optimizer, momentum & weight decay) were identical to the ones used for training `model_A` and `model_B`.\n\nFor dataset preprocessing, we used the standard ImageNet normalization for both IN and SIN (as e.g. 
used in the [PyTorch ImageNet training script](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fexamples\u002Ftree\u002Fmaster\u002Fimagenet)), with the following mean and standard deviation:\n\n- mean = [0.485, 0.456, 0.406]\n- std = [0.229, 0.224, 0.225]\n\nThese were the training transformations:\n\n```python\n    from torchvision import transforms\n\n    train_transforms = transforms.Compose([\n                                  transforms.RandomResizedCrop(224),\n                                  transforms.RandomHorizontalFlip(),\n                                  transforms.ToTensor()])\n```\n\nand these were the validation transformations:\n```python\n    from torchvision import transforms\n\n    val_transforms = transforms.Compose([\n                                      transforms.Resize(256),\n                                      transforms.CenterCrop(224),\n                                      transforms.ToTensor()])\n```\nInput format: RGB.\n\n#### Shape bias and IN accuracies of different SIN-trained models\n\nThese are the shape bias values of a standard ResNet-50 and of the three SIN-trained models mentioned above. As a rough guideline, the more epochs a model was trained on ImageNet, the lower its shape bias; the more epochs a model was trained on Stylized-ImageNet, the higher its shape bias. Fine-tuning on ImageNet (as for `model_C`) improves ImageNet performance, even beyond that of a standard ResNet-50, but it also means that the model \"forgets\" the shape bias it had before fine-tuning.\n\n| model | shape bias | top-1 IN acc (%) | top-5 IN acc (%) |\n|---|---|---|---|\n| standard ResNet-50 | 21.39% | 76.13 | 92.86 |\n| model_A            | 81.37% | 60.18 | 82.62 |\n| model_B            | 34.65% | 74.59 | 92.14 |\n| model_C            | 20.54% | 76.72 | 93.28 |\n\nNote that these values were computed using a slightly different probability aggregation method than the one reported in the paper. 
Here we used the average: ImageNet class probabilities were mapped to the corresponding 16-class-ImageNet category by taking the average of all corresponding fine-grained category probabilities. We recommend this approach over other aggregation methods (summation, max, ...). The updated appendix of [this paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1808.08750), page 22f, explains why average aggregation is the principled and preferable choice.\n\n## paper-figures\nContains all figures of the paper. All figures reporting results can be generated by the scripts in `data-analysis\u002F`.\n\n\n## raw-data\nFor each observer and network experiment, a ``.csv`` file contains the raw data. In total, this includes 48,560 human psychophysical trials from 97 participants in a controlled lab setting.\n\n\n## stimuli\nThese are the raw stimuli used in our experiments. Each directory contains stimulus images split into 16 subdirectories (one per category).\n\n\n# FAQ\n\n#### Code to run style transfer:\nI used [Leon Gatys' code](https:\u002F\u002Fgithub.com\u002Fleongatys\u002FPytorchNeuralStyleTransfer) to run style transfer with the default settings and hyperparameters specified in that code. 
The final content and style loss depend on the image.\n\n#### Can you share Stylized-ImageNet directly?\nUnfortunately, due to copyright restrictions I am not allowed to share this version of ImageNet directly, since not all of the original ImageNet images are permitted for using \u002F sharing \u002F modification.\n\n#### In addition to the cue conflict stimuli, can you share the stimuli for the texture experiment \u002F the 'original' experiment \u002F ...?\nUnfortunately, the image permissions do not allow me to share or distribute these stimuli.\n\n#### How do I compute the shape bias of a model?\nIt's simple: check out [bethgelab:model-vs-human](https:\u002F\u002Fgithub.com\u002Fbethgelab\u002Fmodel-vs-human), which supports plotting the shape bias for arbitrary PyTorch \u002F TensorFlow models (dataset name: cue-conflict). Alternatively, if you'd like to go through the steps one-by-one, here's what you'll need to do:\n\n1. Evaluate your models on all 1,280 images here (https:\u002F\u002Fgithub.com\u002Frgeirhos\u002Ftexture-vs-shape\u002Ftree\u002Fmaster\u002Fstimuli\u002Fstyle-transfer-preprocessed-512).\n2. Map model decisions to 16 classes using the code provided above (https:\u002F\u002Fgithub.com\u002Frgeirhos\u002Ftexture-vs-shape#code).\n3. Exclude images without a cue conflict (e.g. texture=cat, shape=cat).\n4. Take the subset of \"correctly\" classified images (either shape or texture category correctly predicted).\n5. Compute \"shape bias\" as the following fraction: (correct shape decisions) \u002F (correct shape decisions + correct texture decisions).\n","### :tada: 更新（2021年8月）：\n现在，绘制您模型的形状偏向性从未如此简单！位于 [bethgelab:model-vs-human](https:\u002F\u002Fgithub.com\u002Fbethgelab\u002Fmodel-vs-human) 的综合工具箱支持此处报告的所有数据集（例如纹理-形状线索冲突、仅轮廓、仅边缘等），并附带用于评估任意 PyTorch 或 TensorFlow 模型的代码。\n\n# 来自论文\u003Cbr>“ImageNet 训练的 CNN 倾向于纹理；增强形状偏向性可提高准确性和鲁棒性”的数据、代码及材料\n\n本仓库包含由 Robert Geirhos、Patricia Rubisch、Claudio Michaelis、Matthias Bethge、Felix A. 
Wichmann 和 Wieland Brendel 共同撰写的论文《ImageNet 训练的 CNN 倾向于纹理；增强形状偏向性可提高准确性和鲁棒性》的相关信息、数据和材料。我们希望该仓库能为您的研究提供有价值的参考。\n\n核心思想如图所示：如果卷积神经网络看到一只带有大象纹理的猫，它会认为这是一只大象，尽管其形状仍然清晰地呈现出猫的特征。我们发现这种“纹理偏向性”在 ImageNet 训练的 CNN 中非常普遍，这与长期以来广泛持有的观点——即 CNN 主要通过识别物体的形状来完成分类——形成了鲜明对比。\n![intro_figure](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frgeirhos_texture-vs-shape_readme_fff19b338a62.png) \n\n如有任何疑问，请随时通过 robert.geirhos@bethgelab.org 与我联系，或在 Issues 中提交问题！可重复性和开放科学对我而言至关重要，我也非常欢迎关于如何进一步改进的意见和建议。\n\n本 README 的结构与仓库目录结构相对应：每个子目录对应一个部分（按字母顺序排列）。\n\n##### 相关仓库：\n请注意，本文中使用的重要数据集 Stylized-ImageNet 拥有自己的仓库，地址为 [rgeirhos:Stylized-ImageNet](https:\u002F\u002Fgithub.com\u002Frgeirhos\u002FStylized-ImageNet)。用于评估形状\u002F纹理偏向性的线索冲突数据集与 Stylized-ImageNet 并不相同。\n\n本仓库的部分内容借鉴了我们之前的工作“人类与深度神经网络中的泛化能力”（发表于 NeurIPS 2018）。相应的代码、数据和材料可在 [rgeirhos:generalisation-humans-DNNs](https:\u002F\u002Fgithub.com\u002Frgeirhos\u002Fgeneralisation-humans-DNNs) 获取。为方便起见，本仓库直接包含了来自该论文的人类数据（这些数据被用于纹理与形状相关研究的比较），存放于 ``raw-data\u002Fraw-data-from-generalisation-paper\u002F`` 目录下。\n\n## code\n`code\u002F` 目录包含映射功能，可用于根据长度为 1,000 的向量（典型 ImageNet 分类器的 softmax 输出）确定对应的初级类别（共 16 类，例如“狗”）。使用方法如下：\n\n```python\n    # 获取 softmax 输出\n    softmax_output = SomeCNN(input_image) # 替换为您喜欢的 CNN\n\n    # 转换为 numpy 数组\n    softmax_output_numpy = SomeConversionToNumpy(softmax_output) # 替换为实际转换函数\n\n    # 创建映射\n    mapping = probabilities_to_decision.ImageNetProbabilitiesTo16ClassesMapping()\n    \n    # 得到最终决策\n    decision_from_16_classes = mapping.probabilities_to_decision(softmax_output_numpy)\n```\n\n## data-analysis\n`data-analysis\u002F` 目录包含主分析脚本 `data-analysis.R` 及一些辅助功能。所有生成的图表将保存在 `paper-figures\u002F` 目录中。\n\n请注意：对于 AlexNet、VGG-16 和 GoogLeNet，我们使用的是 Caffe 实现；而对于 ResNet-50，则采用了 torchvision 实现。当使用 torchvision 对 AlexNet 和 VGG-16 进行形状偏向性计算时，结果与 Caffe 实现存在差异（如 Issue #7 所指出）。AlexNet 的形状偏向性为 25.3%，而 VGG-16 为 9.2%。这两个数值均低于论文中基于 Caffe 实现所报告的结果。这意味着，采用 torchvision 实现时，我们测得的这两款模型的纹理偏向性更为极端。总体而言，我们建议使用 torchvision 
实现，因为它更常用且框架更新。\n\n## lab-experiment\n实验室中进行人类受试者实验所需的一切内容。该部分基于 MATLAB 开发。\n\n##### experimental-code\n包含主 MATLAB 实验程序 `shape_texture_experiment.m`，以及一个 `.yaml` 文件，用于指定实验中的具体参数值（如刺激呈现时间）。部分函数依赖于我们内部开发的 iShow 库，该库可从 [这里](http:\u002F\u002Fdx.doi.org\u002F10.5281\u002Fzenodo.34217) 获取。\n\n##### helper-functions\n部分辅助函数引用了他人的代码，请查阅相应文件以了解版权信息。\n\n## 模型\n文件 ``load_pretrained_models.py`` 将加载在 Stylized-ImageNet 上训练的以下模型：\n\n```python\n\n    from load_pretrained_models import load_model\n    \n    model_A = \"resnet50_trained_on_SIN\"\n    model_B = \"resnet50_trained_on_SIN_and_IN\"\n    model_C = \"resnet50_trained_on_SIN_and_IN_then_finetuned_on_IN\"\n    \n    model = load_model(model_name = model_A) # 或 model_B 或 model_C\n```\n这些对应于论文表2中报告的模型（方法细节见附录A.5节）。此外，还提供了在SIN上训练的AlexNet和VGG-16。请注意，由于训练时使用的超参数可能不够优化，这两款模型的整体性能并不理想。在SIN上训练并在ImageNet上评估的VGG-16的Top1\u002FTop5准确率分别为：Prec@1 52.260，Prec@5 76.390（在SIN上评估时：Prec@1 48.958，Prec@5 73.092）。\n\n我们使用了[PyTorch ImageNet训练脚本](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fexamples\u002Ftree\u002Fmaster\u002Fimagenet)来训练这些模型。以下是训练时的超参数设置：\n\n- 批量大小：256\n- 优化器：SGD (`torch.optim.SGD`)\n- 动量：0.9\n- 权重衰减：1e-4\n- 训练轮数：60轮（model_A），45轮（model_B）。然而，model_B的45轮相当于正常ImageNet训练的90轮，因为用于训练model_B的数据集是ImageNet和Stylized-ImageNet的合并，因此每轮分类器看到的图片数量是标准训练轮次的两倍。\n- 学习率：0.1，每20轮（model_A）或每15轮（model_B）乘以0.1进行衰减。\n- 是否预训练于ImageNet：是（对于model_A和model_B），即使用`torchvision.models.resnet50(pretrained=True)`。用ImageNet训练的初始权重来初始化模型，被证明有利于提高整体准确率。\n\nmodel_C是以model_B的权重进行初始化的。随后在ImageNet上进行了60轮微调，学习率为0.01，每30轮乘以0.1进行衰减。其他超参数（批量大小、优化器、动量和权重衰减）与训练model_A和model_B时相同。\n\n在数据预处理方面，我们对IN和SIN都采用了标准的ImageNet归一化处理（如[PyTorch ImageNet训练脚本](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fexamples\u002Ftree\u002Fmaster\u002Fimagenet)中所用），其均值和标准差如下：\n\n- 均值 = [0.485, 0.456, 0.406]\n- 标准差 = [0.229, 0.224, 0.225]\n\n训练时使用的变换为：\n\n```python\n    train_transforms = transforms.Compose([\n                                  transforms.RandomResizedCrop(224),\n                       
           transforms.RandomHorizontalFlip(),\n                                  transforms.ToTensor()])\n```\n\n验证时使用的变换为：\n\n```python                                  \n    val_transforms = transforms.Compose([\n                                      transforms.Resize(256),\n                                      transforms.CenterCrop(224),\n                                      transforms.ToTensor()])                 \n```\n\n输入格式：RGB。\n\n#### 不同SIN训练模型的形状偏差与IN准确率\n\n以下是上述四款模型的形状偏差值。大致来说，模型在ImageNet上训练的轮数越多，其形状偏差越低；而在Stylized-ImageNet上训练的轮数越多，其形状偏差越高。在ImageNet上进行微调（如model_C）会提升ImageNet上的表现，甚至优于标准的ResNet-50，但这也意味着模型会“忘记”微调前的形状偏差。\n\n| 模型 | 形状偏差 | Top-1 IN准确率 | Top-5 IN准确率 |\n|---|---|---|---|\n| 标准ResNet-50 | 21.39% | 76.13 | 92.86 |\n| model_A            | 81.37% | 60.18 | 82.62 |\n| model_B            | 34.65% | 74.59 | 92.14 |\n| model_C            | 20.54% | 76.72 | 93.28 |\n\n请注意，这些数值是使用与论文中略有不同的概率聚合方法计算得出的。我们这里采用的是平均法：将ImageNet类别的概率映射到对应的16类ImageNet类别中，使用所有相关细粒度类别概率的平均值。我们建议使用这种方法，而不是其他聚合方式（求和、取最大等）。[这篇论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1808.08750)的更新版附录第22页及后续内容解释了为什么平均聚合方法才是原则性和更优的选择。\n\n## paper-figures\n包含论文中的所有图表。所有报告结果的图表都可以通过`data-analysis\u002F`中的脚本生成。\n\n## raw-data\n此处为每个观察者和网络实验提供一个`.csv`文件，其中包含原始数据，共计48,560次人类心理物理试验，涉及97名参与者，在受控的实验室环境中完成。\n\n## stimuli\n这些是我们实验中使用的原始刺激材料。每个目录包含按16个子目录（每个类别一个）划分的刺激图像。\n\n# 常见问题解答\n\n#### 运行风格迁移的代码：\n我使用了[Leon Gatys的代码](https:\u002F\u002Fgithub.com\u002Fleongatys\u002FPytorchNeuralStyleTransfer)，并按照代码中指定的默认设置和超参数运行了风格迁移。最终的内容损失和风格损失取决于具体的图像。\n\n#### 能否直接分享Stylized-ImageNet？\n很遗憾，由于版权限制，我无法直接分享这一版本的ImageNet，因为并非所有的原始ImageNet图像都允许使用、分享或修改。\n\n#### 除了线索冲突刺激之外，能否分享纹理实验\u002F“原始”实验等的刺激材料？\n很遗憾，图像使用权限不允许我分享或分发这些刺激材料。\n\n#### 如何计算模型的形状偏差？\n很简单：可以查看[bethgelab:model-vs-human](https:\u002F\u002Fgithub.com\u002Fbethgelab\u002Fmodel-vs-human)，它支持绘制任意PyTorch\u002FTensorFlow模型的形状偏差曲线（数据集名称：线索冲突）。或者，如果您想一步步操作，可以按照以下步骤进行：\n\n1. 
在这里的1,280张图像上评估您的模型（https:\u002F\u002Fgithub.com\u002Frgeirhos\u002Ftexture-vs-shape\u002Ftree\u002Fmaster\u002Fstimuli\u002Fstyle-transfer-preprocessed-512）。\n2. 使用上述提供的代码将模型决策映射到16个类别（https:\u002F\u002Fgithub.com\u002Frgeirhos\u002Ftexture-vs-shape#code）。\n3. 排除没有线索冲突的图像（例如，纹理=猫，形状=猫）。\n4. 选取“正确”分类的图像子集（即正确预测了形状或纹理类别）。\n5. 计算“形状偏差”，公式为：（正确形状判断的数量）\u002F（正确形状判断的数量 + 正确纹理判断的数量）。","# texture-vs-shape 快速上手指南\n\n本指南帮助开发者快速复现论文《ImageNet-trained CNNs are biased towards texture》中的核心实验，评估模型在“纹理”与“形状”之间的偏好偏差（Shape Bias）。\n\n## 环境准备\n\n*   **操作系统**: Linux \u002F macOS \u002F Windows\n*   **语言环境**: Python 3.6+\n*   **核心依赖**:\n    *   PyTorch (推荐用于加载预训练模型和评估)\n    *   NumPy\n    *   Matplotlib (用于绘图)\n    *   R 语言及必要包 (可选，仅当需要运行 `data-analysis.R` 复现论文图表时)\n*   **前置知识**: 熟悉 PyTorch 模型推理流程。\n\n> **注意**: 本项目部分历史实验基于 Caffe 或 MATLAB，但官方推荐使用 **PyTorch (torchvision)** 实现，因其更通用且更新及时。使用 torchvision 实现的模型（如 AlexNet, VGG-16）计算出的形状偏差值可能略低于原论文（基于 Caffe），但这反映了更极端的纹理偏差。\n\n## 安装步骤\n\n1.  **克隆仓库**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Frgeirhos\u002Ftexture-vs-shape.git\n    cd texture-vs-shape\n    ```\n\n2.  **安装 Python 依赖**\n    确保已安装 PyTorch。若未安装，可使用以下命令（国内用户推荐使用清华源加速）：\n    ```bash\n    pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple torch torchvision numpy matplotlib\n    ```\n    *注：本项目主要提供数据处理脚本和预训练权重加载逻辑，无复杂的 `setup.py` 一键安装，需手动配置环境。*\n\n3.  **获取预训练模型权重**\n    代码库中的 `models\u002Fload_pretrained_models.py` 用于加载在 Stylized-ImageNet (SIN) 上训练的模型。\n    *   由于版权限制，作者无法直接分发 Stylized-ImageNet 数据集及部分刺激图像。\n    *   你需要自行下载预训练权重文件（通常需在原仓库 Issue 或相关论文附录中查找链接），并将其放置在 `models\u002F` 目录下，或修改 `load_pretrained_models.py` 中的路径指向你的本地权重文件。\n\n## 基本使用\n\n### 1. 
加载在 Stylized-ImageNet 上训练的模型\n\n使用提供的脚本加载具有不同形状偏差的 ResNet-50 变体：\n\n```python\nfrom load_pretrained_models import load_model\n\n# 可选模型:\n# model_A: 仅在 SIN 上训练 (高形状偏差)\n# model_B: 在 SIN 和 ImageNet 混合数据上训练\n# model_C: 先在混合数据训练，再在 ImageNet 微调 (低形状偏差，高精度)\n\nmodel_name = \"resnet50_trained_on_SIN\" \nmodel = load_model(model_name=model_name)\n\nmodel.eval()\n```\n\n### 2. 将模型输出映射为 16 个基础类别\n\n为了计算形状偏差，需要将 ImageNet 的 1000 类输出概率映射到 16 个入口级类别（如 \"dog\", \"cat\" 等）。\n\n```python\nimport numpy as np\nimport torch\n# probabilities_to_decision.py 位于仓库的 code\u002F 目录，需先将该目录加入 Python 路径\nfrom probabilities_to_decision import ImageNetProbabilitiesTo16ClassesMapping\n\n# 假设 input_image 已经过预处理，形状为 [1, 3, 224, 224]\n# 模型输出的是 logits，需先经 softmax 转换为概率\nwith torch.no_grad():\n    softmax_output = torch.nn.functional.softmax(model(input_image), dim=1)\n\n# 转换为长度为 1000 的一维 numpy 数组\nsoftmax_output_numpy = softmax_output.squeeze(0).cpu().numpy()\n\n# 创建映射器\nmapping = ImageNetProbabilitiesTo16ClassesMapping()\n\n# 获取决策结果 (16 个类别之一，如 \"cat\")\ndecision_from_16_classes = mapping.probabilities_to_decision(softmax_output_numpy)\n```\n\n### 3. 计算形状偏差 (Shape Bias)\n\n要评估任意模型的形状偏差，请遵循以下逻辑流程（或使用推荐的工具箱 [bethgelab\u002Fmodel-vs-human](https:\u002F\u002Fgithub.com\u002Fbethgelab\u002Fmodel-vs-human)）：\n\n1.  **评估**: 在冲突数据集（Cue Conflict Dataset，即纹理与形状不一致的图像，共 1280 张）上运行模型推理。\n2.  **映射**: 使用上述代码将 1000 类概率映射为 16 类决策。\n3.  **筛选**: 排除无冲突样本（纹理和形状属于同一类），仅保留存在冲突的样本。\n4.  
**统计**: 在模型预测正确（预测结果等于形状标签 **或** 纹理标签）的样本中，计算形状偏差：\n\n$$ \\text{Shape Bias} = \\frac{\\text{预测为形状类别的正确次数}}{\\text{预测为形状类别的正确次数} + \\text{预测为纹理类别的正确次数}} $$\n\n*提示：官方强烈建议使用 **平均概率聚合方法** (average aggregation)，即将细粒度类别的概率取平均值映射到大类，而非求和或取最大值。*","某自动驾驶感知团队在优化车辆对极端天气下障碍物的识别能力时，发现模型在浓雾或伪装场景中频繁误判物体类别。\n\n### 没有 texture-vs-shape 时\n- 模型过度依赖表面纹理特征，将覆盖积雪的“猫”形状物体误识别为“雪堆”，或将迷彩涂装的车辆漏检。\n- 缺乏量化评估手段，工程师仅凭直觉调整数据增强策略，无法确认模型是否真正提升了形状敏感度。\n- 复现论文中的“纹理 - 形状冲突”测试集耗时费力，需手动收集并标注大量特殊对抗样本。\n- 不同框架（PyTorch\u002FTensorFlow）下的模型对比困难，难以统一标准来衡量架构改进带来的鲁棒性收益。\n\n### 使用 texture-vs-shape 后\n- 利用内置的冲突数据集快速诊断出模型存在严重的“纹理偏差”，明确将优化目标转向提升形状偏置（Shape Bias）。\n- 通过提供的映射代码，直接将千分类 Softmax 输出转换为决策类，高效评估模型在剪影或边缘-only 数据上的表现。\n- 借助标准化的分析脚本，一键生成形状偏置对比图，直观验证引入风格化 ImageNet 训练后模型鲁棒性的提升幅度。\n- 统一了跨框架评估流程，团队得以在同一基准下对比 ResNet 与 ViT 等架构在极端条件下的泛化能力差异。\n\ntexture-vs-shape 通过提供标准化的偏差诊断工具与数据集，帮助开发者从“看纹理”转向“看形状”，显著提升了模型在复杂现实场景中的准确性与鲁棒性。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frgeirhos_texture-vs-shape_fff19b33.png","rgeirhos","Robert Geirhos","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Frgeirhos_3ca4a650.jpg","Research Scientist, Google DeepMind","Google","Toronto",null,"robertgeirhos.com","https:\u002F\u002Fgithub.com\u002Frgeirhos",[83,87,91,95],{"name":84,"color":85,"percentage":86},"R","#198CE7",67.9,{"name":88,"color":89,"percentage":90},"MATLAB","#e16737",18.1,{"name":92,"color":93,"percentage":94},"Python","#3572A5",13.5,{"name":96,"color":97,"percentage":98},"Shell","#89e051",0.5,812,107,"2026-04-08T22:32:36","NOASSERTION","未说明","未说明 (代码基于 PyTorch，通常支持 CPU 或 GPU 运行，但 README 未明确指定显卡型号、显存或 CUDA 版本要求)",{"notes":106,"python":103,"dependencies":107},"1. 该仓库主要包含数据分析脚本 (R)、模型加载代码 (PyTorch) 和人类实验代码 (MATLAB)。2. 预训练模型 (如 ResNet-50) 可通过提供的 Python 脚本直接加载。3. 由于版权限制，作者无法直接分享 Stylized-ImageNet 数据集及部分原始刺激图像，用户需自行准备或参考相关仓库。4. 计算形状偏差 (Shape Bias) 推荐使用更新的 'model-vs-human' 工具箱，或直接使用本仓库提供的映射代码处理 1000 类输出到 16 类决策。5. 
部分旧模型 (AlexNet, VGG-16) 的原始结果基于 Caffe，若使用 torchvision 复现，形状偏差数值会有所不同（纹理偏差更极端）。",[108,109,110,111,112],"torch","torchvision","numpy","R (用于数据分析脚本)","MATLAB (仅用于人类实验部分)",[14],[115,116,117,118,119,120,121],"deep-learning","psychophysics","human-vision","object-recognition","features","shape","texture","2026-03-27T02:49:30.150509","2026-04-10T10:34:30.187500",[125,130,135,140,144,149],{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},27458,"是否有工具可以验证神经网络中的形状偏差方法？","是的，维护者团队最近开源了一个名为 `model-vs-human` 的 Python 工具箱（地址：https:\u002F\u002Fgithub.com\u002Fbethgelab\u002Fmodel-vs-human），该工具专门用于评估预训练模型的形状偏差。如果您在复现形状偏差相关实验或验证新方法时遇到困难，可以使用此工具箱来辅助分析和验证结果。","https:\u002F\u002Fgithub.com\u002Frgeirhos\u002Ftexture-vs-shape\u002Fissues\u002F26",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},27453,"如何获取预训练的 Stylized-ImageNet (SIN) VGG-16 模型？","维护者已同意分享该模型。用户可以直接使用 SIN 训练的 VGG-16 模型，但需要注意：为了获得与正常 VGG 模型可比的结果，可能需要对内容\u002F风格权重进行平方处理（content\u002Fstyle weights^2）。如果在代码中遇到预处理问题，请确认是否错误地使用了 Caffe 预处理，SIN 训练模型通常不需要或需要特定的预处理配置。","https:\u002F\u002Fgithub.com\u002Frgeirhos\u002Ftexture-vs-shape\u002Fissues\u002F9",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},27454,"运行 Docker 镜像后为什么工作文件夹是空的，没有生成结果？","Docker 镜像的主要作用是让您无需安装各种库即可生成 Stylized-ImageNet 数据集，它不包含论文中的数据分析和绘图部分。因此，运行镜像后只会得到空的工作文件夹或 Notebook。若要复现论文中的图表（如混淆矩阵），您需要单独运行仓库中的 R 脚本：`data-analysis\u002Fdata-analysis.R`。Docker 镜像与该 R 脚本或相关的 .py 文件没有直接连接。","https:\u002F\u002Fgithub.com\u002Frgeirhos\u002Ftexture-vs-shape\u002Fissues\u002F13",{"id":141,"question_zh":142,"answer_zh":143,"source_url":129},27455,"为什么无法复现 Cue Conflict 实验的结果（准确率与论文不符）？","这通常是因为 torchvision 库更新了预训练模型的权重版本。如果您使用的是较新版本的 torchvision，请尝试显式指定旧的权重版本以匹配原始结果。例如，对于 ResNet50，请将代码从 `models.resnet50(pretrained=True)` 修改为 `models.resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)`。此外，维护者还开源了一个名为 `model-vs-human` 的 Python 工具箱，可用于评估预训练模型的形状偏差，建议尝试使用该工具解决复现问题。",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},27456,"ResNet-50 
架构本身是否有修改使其在 Stylized-ImageNet 上具有形状偏差？","ResNet-50 的架构本身没有进行修改。模型表现出形状偏差（Shape-biased）还是纹理偏差（Texture-biased）完全取决于训练数据集。在标准 ImageNet 上训练时，模型倾向于学习纹理特征；而在 Stylized-ImageNet (SIN) 上训练时，由于纹理信息被破坏，模型被迫学习形状特征，从而表现出形状偏差。您可以直接使用相同的 ResNet-50 架构，只需更换训练数据集即可改变其偏差特性。","https:\u002F\u002Fgithub.com\u002Frgeirhos\u002Ftexture-vs-shape\u002Fissues\u002F24",{"id":150,"question_zh":151,"answer_zh":152,"source_url":134},27457,"在使用 SIN 训练的模型进行风格迁移时，如何调整权重以获得最佳效果？","为了在使用 SIN 训练的模型（如 SIN-VGG16）时获得与标准模型相当的风格迁移效果，建议对内容和风格权重进行平方操作。可以通过以下代码逻辑实现权重归一化：遍历模块损失列表，将强度设置为 `i.strength = i.strength**2 \u002F max(i.target.size())`。如果仅进行平方操作而不除以目标大小，也可能改善结果，具体需根据输出图像效果微调。",[]]