[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-piEsposito--blitz-bayesian-deep-learning":3,"tool-piEsposito--blitz-bayesian-deep-learning":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",159636,2,"2026-04-17T23:33:34",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 
都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":78,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":95,"forks":96,"last_commit_at":97,"license":98,"difficulty_score":99,"env_os":100,"env_gpu":101,"env_ram":101,"env_deps":102,"category_tags":109,"github_topics":110,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":117,"updated_at":118,"faqs":119,"releases":150},8864,"piEsposito\u002Fblitz-bayesian-deep-learning","blitz-bayesian-deep-learning","A simple and extensible library to create Bayesian Neural Network layers on PyTorch.","blitz-bayesian-deep-learning（简称 BLiTZ）是一个专为 PyTorch 设计的轻量级开源库，旨在帮助开发者轻松构建贝叶斯神经网络。与传统神经网络只能输出单一预测值不同，BLiTZ 通过引入概率分布来模拟权重的不确定性，使模型不仅能给出预测结果，还能提供该结果的置信区间。\n\n这一特性有效解决了传统深度学习模型在面临噪声数据或分布外样本时“过度自信”的问题。例如在金融交易或医疗诊断等高风险场景中，知道预测值的可靠范围往往比单纯的数值估计更具决策价值。BLiTZ 基于经典的“神经网络权重不确定性”理论，允许用户在几乎不改变原有 PyTorch 代码结构的前提下，将普通网络层替换为贝叶斯层，并自动计算模型复杂度的代价函数，从而简化了训练流程。\n\n该工具特别适合需要评估模型不确定性的 AI 研究人员、数据科学家以及希望提升模型鲁棒性的深度学习开发者。其核心亮点在于高度的可扩展性与易用性：用户既可以快速上手进行回归或分类任务，也能通过其核心的权重采样类自定义更复杂的网络结构。如果你希望在保持 PyTorch 原生开发体验的同时，为模型增添“自知之明”，BLiTZ 是","blitz-bayesian-deep-learning（简称 BLiTZ）是一个专为 PyTorch 设计的轻量级开源库，旨在帮助开发者轻松构建贝叶斯神经网络。与传统神经网络只能输出单一预测值不同，BLiTZ 通过引入概率分布来模拟权重的不确定性，使模型不仅能给出预测结果，还能提供该结果的置信区间。\n\n这一特性有效解决了传统深度学习模型在面临噪声数据或分布外样本时“过度自信”的问题。例如在金融交易或医疗诊断等高风险场景中，知道预测值的可靠范围往往比单纯的数值估计更具决策价值。BLiTZ 基于经典的“神经网络权重不确定性”理论，允许用户在几乎不改变原有 PyTorch 代码结构的前提下，将普通网络层替换为贝叶斯层，并自动计算模型复杂度的代价函数，从而简化了训练流程。\n\n该工具特别适合需要评估模型不确定性的 AI 研究人员、数据科学家以及希望提升模型鲁棒性的深度学习开发者。其核心亮点在于高度的可扩展性与易用性：用户既可以快速上手进行回归或分类任务，也能通过其核心的权重采样类自定义更复杂的网络结构。如果你希望在保持 PyTorch 原生开发体验的同时，为模型增添“自知之明”，BLiTZ 是一个值得尝试的专业选择。","# Blitz - Bayesian Layers in Torch Zoo\n\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FpiEsposito_blitz-bayesian-deep-learning_readme_9af0f5025df1.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fblitz-bayesian-pytorch)\n\nBLiTZ is a simple and extensible library to create Bayesian Neural Network Layers (based on what's proposed in the [Weight Uncertainty in Neural Networks paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1505.05424)) on PyTorch. 
By using BLiTZ layers and utils, you can add uncertainty to your model and gather its complexity cost in a simple way that does not affect the interaction between your layers, as if you were using standard PyTorch.\n\nBy using our core weight sampler classes, you can extend and improve this library to add uncertainty to a much wider scope of layers, in a way that stays well integrated with PyTorch. Pull requests are also welcome.\n\n \n# Index\n * [Install](#Install)\n * [Documentation](#Documentation)\n * [A simple example for regression](#A-simple-example-for-regression)\n   * [Importing the necessary modules](#Importing-the-necessary-modules)\n   * [Loading and scaling data](#Loading-and-scaling-data)\n   * [Creating our variational regressor class](#Creating-our-variational-regressor-class)\n   * [Defining a confidence interval evaluating function](#Defining-a-confidence-interval-evaluating-function)\n   * [Creating our regressor and loading data](#Creating-our-regressor-and-loading-data)\n   * [Our main training and evaluating loop](#Our-main-training-and-evaluating-loop)\n * [Bayesian Deep Learning in a Nutshell](#Bayesian-Deep-Learning-in-a-Nutshell)\n   * [First of all, a deterministic NN layer linear-transformation](#First-of-all,-a-deterministic-NN-layer-linear-transformation)\n   * [The purpose of Bayesian Layers](#The-purpose-of-Bayesian-Layers)\n   * [Weight sampling on Bayesian Layers](#Weight-sampling-on-Bayesian-Layers)\n   * [It is possible to optimize our trainable weights](#It-is-possible-to-optimize-our-trainable-weights)\n   * [It is also true that there is complexity cost function differentiable along its variables](#It-is-also-true-that-there-is-complexity-cost-function-differentiable-along-its-variables)\n   * [To get the whole cost function at the nth sample](#To-get-the-whole-cost-function-at-the-nth-sample)\n   * [Some notes and wrap up](#Some-notes-and-wrap-up)\n * [Citing](#Citing)\n * [References](#References)\n   \n   \n## Install\n\nTo install BLiTZ you can use the pip command:\n\n```\npip install blitz-bayesian-pytorch\n```\nOr, via conda:\n\n```\nconda install -c conda-forge blitz-bayesian-pytorch\n```\n\nYou can also git-clone it and pip-install it locally:\n\n```\nconda create -n blitz python=3.9\nconda activate blitz\ngit clone https:\u002F\u002Fgithub.com\u002FpiEsposito\u002Fblitz-bayesian-deep-learning.git\ncd blitz-bayesian-deep-learning\npip install .\n```\n\n## Documentation\n\nDocumentation for our layers, weight (and prior distribution) samplers and utils:\n * [Bayesian Layers](doc\u002Flayers.md)\n * [Weight and prior distribution samplers](doc\u002Fsamplers.md)\n * [Utils (for easy integration with PyTorch)](doc\u002Futils.md)\n * [Losses](doc\u002Flosses.md)
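\n\nBefore the full walkthrough, here is a micro-example of the drop-in idea (a sketch using only the documented `BayesianLinear` constructor; the tensor shapes are arbitrary): two forward passes of the same Bayesian layer draw different weights, so they return different outputs.\n\n```python\nimport torch\nfrom blitz.modules import BayesianLinear\n\nlayer = BayesianLinear(10, 3)   # constructed like nn.Linear(10, 3)\nx = torch.randn(4, 10)\n\n# each forward pass samples fresh weights, so outputs differ between calls\ny1 = layer(x)\ny2 = layer(x)\nprint(torch.allclose(y1, y2))   # expected: False, since the weights were re-sampled\n```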
\n\n## A simple example for regression\n\n(You can see it for yourself by running [this example](blitz\u002Fexamples\u002Fbayesian_regression_boston.py) on your machine).\n\nWe will now see how Bayesian Deep Learning can be used for regression, in order to gather a confidence interval over our datapoints rather than a single pointwise prediction. A confidence interval for your prediction may even be more useful information than a low-error point estimate.\n\nI base this argument on the fact that, in some contexts, a good high-probability confidence interval lets you make a more reliable decision than a very close point estimate: if you are trying to profit from a trading operation, for example, a good confidence interval may tell you whether the value the operation settles at will be, at least, lower (or higher) than some threshold X.\n\nKnowing that a value will surely (or with high probability) lie within a given interval can support sensitive decisions better than a very close estimate that, if it ends up above or below some limit value, may cause a loss on a transaction. The point is that, sometimes, knowing whether there will be a profit may be more useful than measuring it.\n\nTo demonstrate that, we will create a Bayesian Neural Network regressor for the Boston-house-data toy dataset, trying to create confidence intervals (CI) for the house prices we are trying to predict. We will perform some scaling, and the CI will be set at about 75%. It will be interesting to see that, for about 90% of the datapoints, the true value lies below the CI's upper limit or (inclusively) above its lower one.\n\n## Importing the necessary modules\nBesides the well-known modules, we bring from BLiTZ the `variational_estimator` decorator, which helps us handle the `BayesianLinear` layers in the module while keeping it fully integrated with the rest of Torch, and, of course, `BayesianLinear`, which is our layer that features weight uncertainty.\n\n```python\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\nimport numpy as np\n\nfrom blitz.modules import BayesianLinear\nfrom blitz.utils import variational_estimator\n\n# note: load_boston was removed in scikit-learn 1.2; run this example with an earlier version\nfrom sklearn.datasets import load_boston\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.model_selection import train_test_split\n```\n\n## Loading and scaling data\n\nNothing new under the sun here; we are importing and standard-scaling the data to help with the training.\n\n```python\nX, y = load_boston(return_X_y=True)\nX = StandardScaler().fit_transform(X)\ny = StandardScaler().fit_transform(np.expand_dims(y, -1))\n\nX_train, X_test, y_train, y_test = train_test_split(X,\n                                                    y,\n                                                    test_size=.25,\n                                                    random_state=42)\n\n\nX_train, y_train = torch.tensor(X_train).float(), torch.tensor(y_train).float()\nX_test, y_test = torch.tensor(X_test).float(), torch.tensor(y_test).float()\n```
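\n\nSince `load_boston` is gone from recent scikit-learn releases, a hedged workaround (my substitution, not part of the original example) is to swap in another tabular regression dataset such as California housing; it has 8 features instead of 13, so the regressor below would be built with `input_dim=8`.\n\n```python\nimport numpy as np\nfrom sklearn.datasets import fetch_california_housing\nfrom sklearn.preprocessing import StandardScaler\n\n# drop-in style replacement for the removed load_boston call\nX, y = fetch_california_housing(return_X_y=True)\nX = StandardScaler().fit_transform(X)\ny = StandardScaler().fit_transform(np.expand_dims(y, -1))\n# X.shape[1] == 8 here, so create BayesianRegressor(8, 1) later on\n```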
\n\n## Creating our variational regressor class\n\nWe can create our class by inheriting from nn.Module, as we would do with any Torch network. Our decorator introduces the methods that handle the Bayesian features, such as calculating the complexity cost of the Bayesian layers and doing many feedforward passes (sampling different weights on each one) in order to sample our loss.\n\n```python\n@variational_estimator\nclass BayesianRegressor(nn.Module):\n    def __init__(self, input_dim, output_dim):\n        super().__init__()\n        #self.linear = nn.Linear(input_dim, output_dim)\n        self.blinear1 = BayesianLinear(input_dim, 512)\n        self.blinear2 = BayesianLinear(512, output_dim)\n        \n    def forward(self, x):\n        x_ = self.blinear1(x)\n        x_ = F.relu(x_)\n        return self.blinear2(x_)\n```\n\n## Defining a confidence interval evaluating function\n\nThis function creates a confidence interval for each prediction in the batch whose label values we are trying to sample. We can then measure the accuracy of our predictions by checking how many of the prediction distributions actually include the correct label for the datapoint.\n\n\n```python\ndef evaluate_regression(regressor,\n                        X,\n                        y,\n                        samples = 100,\n                        std_multiplier = 2):\n    preds = [regressor(X) for i in range(samples)]\n    preds = torch.stack(preds)\n    means = preds.mean(axis=0)\n    stds = preds.std(axis=0)\n    ci_upper = means + (std_multiplier * stds)\n    ci_lower = means - (std_multiplier * stds)\n    ic_acc = (ci_lower \u003C= y) * (ci_upper >= y)\n    ic_acc = ic_acc.float().mean()\n    return ic_acc, (ci_upper >= y).float().mean(), (ci_lower \u003C= y).float().mean()\n```\n\n## Creating our regressor and loading data\n\nNotice here that we create our `BayesianRegressor` as we would do with any other neural network.\n\n```python\nregressor = BayesianRegressor(13, 1)\noptimizer = optim.Adam(regressor.parameters(), lr=0.01)\ncriterion = torch.nn.MSELoss()\n\nds_train = torch.utils.data.TensorDataset(X_train, y_train)\ndataloader_train = torch.utils.data.DataLoader(ds_train, batch_size=16, shuffle=True)\n\nds_test = torch.utils.data.TensorDataset(X_test, y_test)\ndataloader_test = torch.utils.data.DataLoader(ds_test, batch_size=16, shuffle=True)\n```
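\n\nOnce the regressor is trained (next section), the same repeated-sampling idea yields a predictive interval for new datapoints. A small usage sketch built on the objects defined above (`predict_interval` is my helper name, not a BLiTZ API):\n\n```python\n@torch.no_grad()\ndef predict_interval(regressor, x, samples=100, std_multiplier=2):\n    # stack several stochastic predictions and summarize their spread\n    preds = torch.stack([regressor(x) for _ in range(samples)])\n    mean, std = preds.mean(dim=0), preds.std(dim=0)\n    return mean - std_multiplier * std, mean + std_multiplier * std\n\n# e.g. lower, upper = predict_interval(regressor, X_test[:1])\n```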
\n\n## Our main training and evaluating loop\n\nWe run a training loop that differs from a common Torch training loop only in that its loss is sampled by the `sample_elbo` method. Everything else can be done as usual, since our purpose with BLiTZ is to make it easy for you to iterate on your data with different Bayesian NNs, without trouble.\n\nHere is our very simple training loop:\n\n```python\niteration = 0\nfor epoch in range(100):\n    for i, (datapoints, labels) in enumerate(dataloader_train):\n        optimizer.zero_grad()\n        \n        loss = regressor.sample_elbo(inputs=datapoints,\n                           labels=labels,\n                           criterion=criterion,\n                           sample_nbr=3)\n        loss.backward()\n        optimizer.step()\n        \n        iteration += 1\n        if iteration%100==0:\n            ic_acc, under_ci_upper, over_ci_lower = evaluate_regression(regressor,\n                                                                        X_test,\n                                                                        y_test,\n                                                                        samples=25,\n                                                                        std_multiplier=3)\n            \n            print(\"CI acc: {:.2f}, CI upper acc: {:.2f}, CI lower acc: {:.2f}\".format(ic_acc, under_ci_upper, over_ci_lower))\n            print(\"Loss: {:.4f}\".format(loss))\n```
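\n\nConceptually, `sample_elbo` draws `sample_nbr` sets of weights and, for each draw, accumulates the fit-to-data loss plus the model's accumulated complexity cost, then averages. A rough sketch of the idea (my illustration, not BLiTZ's actual implementation; it assumes the decorated module exposes its summed KL cost through a method like the `nn_kl_divergence` utility described in the utils docs):\n\n```python\nimport torch\n\ndef sample_elbo_sketch(model, inputs, labels, criterion, sample_nbr):\n    # one term per stochastic forward pass: performance cost + complexity cost\n    losses = []\n    for _ in range(sample_nbr):\n        outputs = model(inputs)                     # fresh weights are sampled here\n        fit_cost = criterion(outputs, labels)       # e.g. MSE against the labels\n        complexity_cost = model.nn_kl_divergence()  # summed KL over the Bayesian layers\n        losses.append(fit_cost + complexity_cost)\n    return torch.stack(losses).mean()               # Monte Carlo estimate of the ELBO\n```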
\n\n## Bayesian Deep Learning in a Nutshell\nA very quick explanation of how uncertainty is introduced in Bayesian Neural Networks, and of how we model their loss in order to objectively improve the confidence of the predictions and reduce the variance, without dropout.\n\n## First of all, a deterministic NN layer linear transformation\n\nAs we know, in deterministic (non-Bayesian) neural network layers, the trainable parameters correspond directly to the weights used in the linear transformation of the previous layer's output (or of the input, if that is the case). It corresponds to the following equation:\n\n$$z^{(i+1)} = W^{(i+1)} \cdot z^{(i)} + b^{(i+1)}$$\n\n*(z^{(i)} corresponds to the activated output of layer i)*\n\n## The purpose of Bayesian Layers\n\nBayesian layers seek to introduce uncertainty in their weights by sampling them, on each feedforward operation, from a distribution parametrized by trainable variables.\n\nThis allows us not just to optimize the performance metrics of the model, but also to gather the uncertainty of the network's predictions over a specific datapoint (by sampling it many times and measuring the dispersion), and to reduce the variance of the network over the prediction as much as possible, making it possible to know how much uncertainty we still have over the label when we model it as a function of our specific datapoint.\n\n## Weight sampling on Bayesian Layers\nTo do so, on each feedforward operation we sample the parameters of the linear transformation with the following equations (where **ρ** parametrizes the standard deviation and **μ** parametrizes the mean of the sampled linear-transformation parameters):\n\nFor the weights:\n\n$$W^{(i)}_{(n)} = \mathcal{N}(0,1) \cdot \log(1 + \rho^{(i)}) + \mu^{(i)}$$\n\n*Where the sampled W corresponds to the weights used on the linear transformation for the ith layer on the nth sample.*\n\nFor the biases:\n\n$$b^{(i)}_{(n)} = \mathcal{N}(0,1) \cdot \log(1 + \rho^{(i)}) + \mu^{(i)}$$\n\n*Where the sampled b corresponds to the biases used on the linear transformation for the ith layer on the nth sample.*
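\n\nA minimal PyTorch sketch of this reparametrized sampling (an illustration of the equations above, written in the softplus form σ = log(1 + e^ρ) used by the cited paper; this is not BLiTZ's exact internal code, which lives in its weight sampler classes):\n\n```python\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nclass ToyBayesianLinear(nn.Module):\n    # illustrative linear layer that re-samples its weights on every forward pass\n    def __init__(self, in_features, out_features):\n        super().__init__()\n        # mu and rho are the trainable variables that parametrize the weight distribution\n        self.weight_mu = nn.Parameter(torch.zeros(out_features, in_features))\n        self.weight_rho = nn.Parameter(torch.full((out_features, in_features), -3.0))\n        self.bias_mu = nn.Parameter(torch.zeros(out_features))\n        self.bias_rho = nn.Parameter(torch.full((out_features,), -3.0))\n\n    def forward(self, x):\n        # sigma = softplus(rho) keeps the std positive; epsilon ~ N(0, 1)\n        w = self.weight_mu + F.softplus(self.weight_rho) * torch.randn_like(self.weight_mu)\n        b = self.bias_mu + F.softplus(self.bias_rho) * torch.randn_like(self.bias_mu)\n        return F.linear(x, w, b)\n```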
\n\n## It is possible to optimize our trainable weights\n\nEven though we have a random multiplier for our weights and biases, it is possible to optimize them: given some differentiable function of the sampled weights and trainable parameters (in our case, the loss), we sum the derivatives of the function relative to both of them:\n\n1. Let $\epsilon \sim \mathcal{N}(0,1)$\n2. Let $\theta = (\mu, \rho)$\n3. Let $w = \mu + \log(1 + e^{\rho}) \cdot \epsilon$\n4. Let $f(w, \theta)$ be differentiable relative to its variables\n\nTherefore:\n\n5. $$\Delta_{\mu} = \frac{\delta f(w, \theta)}{\delta w} + \frac{\delta f(w, \theta)}{\delta \mu}$$\n\nand\n\n6. $$\Delta_{\rho} = \frac{\delta f(w, \theta)}{\delta w} \frac{\epsilon}{1 + e^{\rho}} + \frac{\delta f(w, \theta)}{\delta \rho}$$\n\n## It is also true that there is a complexity cost function differentiable along its variables\n\nIt is known that the cross-entropy loss (and MSE) are differentiable. Therefore, if we prove that there is a complexity-cost function that is differentiable, we can leave it to our framework to take the derivatives and compute the gradients on the optimization step.\n\n**The complexity cost is calculated, on the feedforward operation, by each of the Bayesian layers (with the layer's pre-defined, simpler a priori distribution and its empirical distribution). The complexity costs of all layers are summed and added to the loss.**\n\nAs proposed in the [Weight Uncertainty in Neural Networks paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1505.05424), we can gather the complexity cost of a distribution by taking the [Kullback-Leibler Divergence](https:\u002F\u002Fjhui.github.io\u002F2017\u002F01\u002F05\u002FDeep-learning-Information-theory\u002F) from it to a much simpler distribution and, by making some approximations, we can differentiate this function relative to its variables (the distributions):\n\n1. Let $P(w)$ be a low-entropy distribution pdf set by hand, which will be assumed as an \"a priori\" distribution for the weights\n\n2. Let $Q(w | \theta)$ be the a posteriori empirical distribution pdf for our sampled weights, given its parameters.\n\nTherefore, for each scalar on the sampled W matrix:\n\n3. $$D_{KL}(Q(w | \theta) \lVert P(w)) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n} Q(w^{(i)} | \theta) \left( \log{Q(w^{(i)} | \theta)} - \log{P(w^{(i)})} \right)$$\n\nBy assuming a very large n, we can approximate:\n\n4. $$D_{KL}(Q(w | \theta) \lVert P(w)) \approx \frac{1}{n} \sum_{i=0}^{n} Q(w^{(i)} | \theta) \left( \log{Q(w^{(i)} | \theta)} - \log{P(w^{(i)})} \right)$$\n\nand therefore:\n\n5. $$D_{KL}(Q(w | \theta) \lVert P(w)) = \mu_Q \sum_{i=0}^{n} \left( \log{Q(w^{(i)} | \theta)} - \log{P(w^{(i)})} \right)$$\n\nAs the expected value (mean) of the Q distribution ends up just scaling the values, we can take it out of the equation (as there will be no framework-tracing), and take the complexity cost of the nth sample as:\n\n6. $$C^{(n)}(w^{(n)}, \theta) = \log{Q(w^{(n)} | \theta)} - \log{P(w^{(n)})}$$\n\nWhich is differentiable relative to all of its parameters.
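\n\nA hedged sketch of equation 6 for one sampled weight tensor, using `torch.distributions` (my illustration, not BLiTZ's internal code; for simplicity the prior here is a single Gaussian, while BLiTZ's default prior is a two-Gaussian scale mixture controlled by `prior_sigma_1` and `prior_sigma_2`):\n\n```python\nimport torch\nfrom torch.distributions import Normal\n\ndef complexity_cost_sketch(w, mu, sigma, prior_sigma=1.0):\n    # log Q(w | theta) - log P(w), summed over one sampled weight tensor w\n    posterior = Normal(mu, sigma)                      # empirical variational distribution Q\n    prior = Normal(torch.zeros_like(w), prior_sigma)   # simple, hand-set prior P\n    return (posterior.log_prob(w) - prior.log_prob(w)).sum()\n```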
\n\n## To get the whole cost function at the nth sample:\n\n1. Let a performance (fit to data) function be: $P^{(n)}(w^{(n)}, \theta)$\n\nTherefore the whole cost function on the nth sample of weights will be:\n\n2. $$L^{(n)}(w^{(n)}, \theta) = C^{(n)}(w^{(n)}, \theta) + P^{(n)}(w^{(n)}, \theta)$$\n\nWe can estimate the true full cost function by Monte Carlo sampling it (feedforwarding the network X times and taking the mean over the full loss), and then backpropagating using our estimated value. It works for a low number of experiments per backprop, and even for unitary experiments.\n\n## Some notes and wrap up\nWe came to the end of a Bayesian Deep Learning in a Nutshell tutorial. Knowing what is being done here, you can implement your BNN model as you wish.\n\nMaybe you can optimize by doing one optimization step per sample, or use this Monte-Carlo-ish method to gather the loss several times, take its mean, and only then optimize. Your move.\n\nFYI: **Our Bayesian layers and utils help to calculate the complexity cost along the layers on each feedforward operation, so you don't need to mind it too much.**\n \n## References:\n * [Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural networks. arXiv preprint arXiv:1505.05424, 2015.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1505.05424)\n \n \n## Citing\n\nIf you use `BLiTZ` in your research, you can cite it as follows:\n\n```bibtex\n@misc{esposito2020blitzbdl,\n    author = {Piero Esposito},\n    title = {BLiTZ - Bayesian Layers in Torch Zoo (a Bayesian Deep Learning library for Torch)},\n    year = {2020},\n    publisher = {GitHub},\n    journal = {GitHub repository},\n    howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FpiEsposito\u002Fblitz-bayesian-deep-learning\u002F}},\n}\n```\n \n###### Made by Pi Esposito\n","# Blitz - Torch Zoo 中的贝叶斯层\n\n[![下载量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FpiEsposito_blitz-bayesian-deep-learning_readme_9af0f5025df1.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fblitz-bayesian-pytorch)\n\nBLiTZ 是一个简单且可扩展的库，用于在 PyTorch 上构建贝叶斯神经网络层（基于《神经网络中的权重不确定性》论文中提出的方案）。通过使用 BLiTZ 的层和工具，您可以以一种不会影响各层之间交互的方式，轻松地为模型添加不确定性并计算复杂度代价，就像使用标准 PyTorch 一样。\n\n借助我们核心的权重采样器类，您可以扩展和完善此库，以更广泛地为各种层引入不确定性，并且这些操作将与 PyTorch 完美集成。我们也欢迎贡献代码！\n\n \n# 目录\n * [安装](#Install)\n * [文档](#Documentation)\n * [回归任务的简单示例](#A-simple-example-for-regression)\n   * [导入必要的模块](#Importing-the-necessary-modules)\n   * [加载并标准化数据](#Loading-and-scaling-data)\n   * [创建我们的变分回归器类](#Creating-our-variational-regressor-class)\n   * [定义置信区间评估函数](#Defining-a-confidence-interval-evaluating-function)\n   * [创建回归器并加载数据](#Creating-our-regressor-and-loading-data)\n   * [主训练与评估循环](#Our-main-training-and-evaluating-loop)\n * [贝叶斯深度学习简述](#Bayesian-Deep-Learning-in-a-Nutshell)\n   * [首先，确定性神经网络层的线性变换](#First-of-all,-a-deterministic-NN-layer-linear-transformation)\n   * [贝叶斯层的目的](#The-purpose-of-Bayesian-Layers)\n   * [贝叶斯层中的权重采样](#Weight-sampling-on-Bayesian-Layers)\n   * [我们可以优化可训练的权重](#It-is-possible-to-optimize-our-trainable-weights)\n   * [同时，存在一个关于其变量可导的复杂度代价函数](#It-is-also-true-that-there-is-complexity-cost-function-differentiable-along-its-variables)\n   * [获取第 n 次采样的完整代价函数](#To-get-the-whole-cost-function-at-the-nth-sample)\n   * [一些说明与总结](#Some-notes-and-wrap-up)\n * [引用](#Citing)\n * [参考文献](#References)\n   \n   \n## 安装\n\n要安装 BLiTZ，您可以使用 pip 命令：\n\n```\npip install blitz-bayesian-pytorch\n```\n或者通过 conda：\n\n```\nconda install -c conda-forge blitz-bayesian-pytorch\n```\n\n您也可以通过 git 克隆并在本地使用 pip 安装：\n\n```\nconda 
create -n blitz python=3.9\nconda activate blitz\ngit clone https:\u002F\u002Fgithub.com\u002FpiEsposito\u002Fblitz-bayesian-deep-learning.git\ncd blitz-bayesian-deep-learning\npip install .\n```\n\n## 文档\n\n关于我们层、权重（及先验分布）采样器和工具的文档：\n * [贝叶斯层](doc\u002Flayers.md)\n * [权重与先验分布采样器](doc\u002Fsamplers.md)\n * [工具（便于与 PyTorch 集成）](doc\u002Futils.md)\n * [损失函数](doc\u002Flosses.md)\n\n## 回归任务的简单示例\n\n（您可以在自己的机器上运行 [这个示例](blitz\u002Fexamples\u002Fbayesian_regression_boston.py) 来亲自体验）。\n\n接下来我们将展示如何利用贝叶斯深度学习进行回归任务，从而为每个数据点预测一个置信区间，而不是单一的连续值预测。为预测结果提供置信区间，有时甚至比低误差的精确估计更有用。\n\n我的理由是，在某些情况下，拥有一个高置信度的区间范围，往往能帮助我们做出比非常接近真实值的估计更为可靠的决策：例如，在进行交易操作时，如果能够知道价格至少会在某个特定范围内波动，就能判断这笔交易是否会带来收益或损失。了解某个数值是否一定会落在某个区间内，对于需要谨慎决策的场景来说，可能比一个非常接近但可能超出界限的精确估计更有价值。关键在于，有时候知道“是否有收益”本身可能比精确衡量收益大小更重要。\n\n为了证明这一点，我们将使用 Boston-house-data 这个玩具数据集，构建一个贝叶斯神经网络回归器，尝试为待预测房价生成置信区间。我们将对数据进行标准化处理，置信区间的置信度设为 75%。有趣的是，我们会发现大约 90% 的预测置信区间要么低于上限，要么高于下限。\n  \n## 导入必要的模块\n\n除了常用的模块外，我们还将从 BLiTZ 库中引入 `variational_estimator` 装饰器，它可以帮助我们在保持与 PyTorch 完全兼容的同时处理 BayesianLinear 层；当然，还有我们的核心层——具有权重不确定性的 `BayesianLinear`。\n\n```python\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\nimport numpy as np\n\nfrom blitz.modules import BayesianLinear\nfrom blitz.utils import variational_estimator\n\nfrom sklearn.datasets import load_boston\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.model_selection import train_test_split\n```\n\n## 加载并标准化数据\n\n这里没有什么新内容，我们只是加载数据并对其进行标准化处理，以便更好地进行训练。\n\n```python\nX, y = load_boston(return_X_y=True)\nX = StandardScaler().fit_transform(X)\ny = StandardScaler().fit_transform(np.expand_dims(y, -1))\n\nX_train, X_test, y_train, y_test = train_test_split(X,\n                                                    y,\n                                                    test_size=.25,\n                                                    random_state=42)\n\n\nX_train, y_train = torch.tensor(X_train).float(), torch.tensor(y_train).float()\nX_test, y_test = torch.tensor(X_test).float(), torch.tensor(y_test).float()\n```\n\n# 创建我们的变分回归器类\n\n我们可以像构建任何 PyTorch 网络一样，通过继承 `nn.Module` 来创建我们的类。装饰器为我们提供了处理贝叶斯特性的方法，例如计算贝叶斯层的复杂度代价，以及执行多次前向传播（每次采样不同的权重）来估计损失。\n\n```python\n@variational_estimator\nclass BayesianRegressor(nn.Module):\n    def __init__(self, input_dim, output_dim):\n        super().__init__()\n        #self.linear = nn.Linear(input_dim, output_dim)\n        self.blinear1 = BayesianLinear(input_dim, 512)\n        self.blinear2 = BayesianLinear(512, output_dim)\n        \n    def forward(self, x):\n        x_ = self.blinear1(x)\n        x_ = F.relu(x_)\n        return self.blinear2(x_)\n```\n\n# 定义置信区间评估函数\n\n该函数为我们尝试对标签值进行采样的批次上的每个预测创建一个置信区间。然后，我们可以通过计算预测分布中有多少确实包含了数据点的正确标签来衡量预测的准确性。\n\n\n```python\ndef evaluate_regression(regressor,\n                        X,\n                        y,\n                        samples = 100,\n                        std_multiplier = 2):\n    preds = [regressor(X) for i in range(samples)]\n    preds = torch.stack(preds)\n    means = preds.mean(axis=0)\n    stds = preds.std(axis=0)\n    ci_upper = means + (std_multiplier * stds)\n    ci_lower = means - (std_multiplier * stds)\n    ic_acc = (ci_lower \u003C= y) * (ci_upper >= y)\n    ic_acc = ic_acc.float().mean()\n    return ic_acc, (ci_upper >= y).float().mean(), (ci_lower \u003C= y).float().mean()\n```\n\n# 创建回归器并加载数据\n\n请注意，我们在这里创建 `BayesianRegressor` 的方式与其他神经网络相同。\n\n```python\nregressor = BayesianRegressor(13, 1)\noptimizer = optim.Adam(regressor.parameters(), 
lr=0.01)\ncriterion = torch.nn.MSELoss()\n\nds_train = torch.utils.data.TensorDataset(X_train, y_train)\ndataloader_train = torch.utils.data.DataLoader(ds_train, batch_size=16, shuffle=True)\n\nds_test = torch.utils.data.TensorDataset(X_test, y_test)\ndataloader_test = torch.utils.data.DataLoader(ds_test, batch_size=16, shuffle=True)\n```\n\n## 主训练和评估循环\n\n我们的训练循环与普通的 PyTorch 训练循环唯一的不同之处在于，其损失是通过 `sample_elbo` 方法进行采样的。其他部分都可以正常进行，因为 BLiTZ 的目的就是让你在使用不同的贝叶斯神经网络时，能够更轻松地迭代数据，而无需担心复杂性。\n\n以下是我们的简单训练循环：\n\n```python\niteration = 0\nfor epoch in range(100):\n    for i, (datapoints, labels) in enumerate(dataloader_train):\n        optimizer.zero_grad()\n        \n        loss = regressor.sample_elbo(inputs=datapoints,\n                           labels=labels,\n                           criterion=criterion,\n                           sample_nbr=3)\n        loss.backward()\n        optimizer.step()\n        \n        iteration += 1\n        if iteration%100==0:\n            ic_acc, under_ci_upper, over_ci_lower = evaluate_regression(regressor,\n                                                                        X_test,\n                                                                        y_test,\n                                                                        samples=25,\n                                                                        std_multiplier=3)\n            \n            print(\"CI acc: {:.2f}, CI upper acc: {:.2f}, CI lower acc: {:.2f}\".format(ic_acc, under_ci_upper, over_ci_lower))\n            print(\"Loss: {:.4f}\".format(loss))\n```\n\n## 贝叶斯深度学习简述\n快速解释一下如何在贝叶斯神经网络中引入不确定性，以及我们如何对其损失进行建模，以便客观地提高预测的置信度并减少方差，而无需使用 Dropout 技术。\n\n## 首先，确定性神经网络层的线性变换\n\n如我们所知，在确定性的（非贝叶斯）神经网络层中，可训练参数直接对应于其前一层（或输入层，如果是第一层的话）线性变换中使用的权重。这可以用以下公式表示：\n\n$$z^{(i+1)} = W^{(i+1)} \cdot z^{(i)} + b^{(i+1)}$$\n\n*（z^{(i)} 表示第 i 层的激活输出）*\n\n## 贝叶斯层的目的\n\n贝叶斯层旨在通过在每次前向传播时从由可训练变量参数化的分布中采样权重，从而在其权重中引入不确定性。\n\n这样做的好处不仅在于优化模型的性能指标，还在于可以收集网络对特定数据点预测的不确定性（通过多次采样并测量分散程度），并尽可能地减少网络预测的方差，从而让我们了解在基于特定数据点进行建模时，我们对标签仍然有多少不确定性。\n\n## 贝叶斯层中的权重采样\n为此，在每次前向传播时，我们会使用以下公式对线性变换的参数进行采样（其中 **ρ** 参数化标准差，**μ** 参数化样本线性变换参数的均值）：\n\n对于权重：\n\n$$W^{(i)}_{(n)} = \mathcal{N}(0,1) \cdot \log(1 + \rho^{(i)}) + \mu^{(i)}$$\n\n*其中采样的 W 对应于第 i 层在第 n 次采样时用于线性变换的权重。*\n\n对于偏置：\n\n$$b^{(i)}_{(n)} = \mathcal{N}(0,1) \cdot \log(1 + \rho^{(i)}) + \mu^{(i)}$$\n\n*其中采样的 b 对应于第 i 层在第 n 次采样时用于线性变换的偏置。*\n\n## 可以优化我们的可训练权重\n\n尽管我们的权重和偏置带有随机乘数，但我们仍然可以通过给定一个关于采样权重和可训练参数的可微分函数（在本例中为损失），再分别对这两类参数求导来优化它们：\n\n1. 设 $\epsilon \sim \mathcal{N}(0,1)$\n2. 设 $\theta = (\mu, \rho)$\n3. 设 $w = \mu + \log(1 + e^{\rho}) \cdot \epsilon$\n4. 
设 $f(w, \theta)$ 关于其变量可微分\n\n因此：\n\n5. $$\Delta_{\mu} = \frac{\delta f(w, \theta)}{\delta w} + \frac{\delta f(w, \theta)}{\delta \mu}$$\n\n以及\n\n6. $$\Delta_{\rho} = \frac{\delta f(w, \theta)}{\delta w} \frac{\epsilon}{1 + e^{\rho}} + \frac{\delta f(w, \theta)}{\delta \rho}$$\n\n## 同样地，也存在一种复杂度代价函数，其对各变量可导\n\n众所周知，交叉熵损失（以及均方误差）都是可导的。因此，如果我们证明存在一种可导的复杂度代价函数，就可以让我们的框架自动求导并计算优化步骤中的梯度。\n\n**复杂度代价是在前向传播过程中，由每个贝叶斯层分别计算得出的（基于预先定义的较简单先验分布及其经验分布）。各层的复杂度代价之和会被累加到总损失中。**\n\n正如[《神经网络中的权重不确定性》论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1505.05424)所提出的，我们可以通过计算该分布与一个更简单的分布之间的[KL散度](https:\u002F\u002Fjhui.github.io\u002F2017\u002F01\u002F05\u002FDeep-learning-Information-theory\u002F)来获取分布的复杂度代价，并通过一些近似处理，使这一函数能够对其变量（即各个分布）求导：\n\n1. 设 $P(w)$ 为人工设定的低熵分布的概率密度函数，将其视为权重的“先验”分布。\n2. 设 $Q(w | \theta)$ 为我们采样的权重在给定参数下的后验经验分布的概率密度函数。\n\n因此，对于采样矩阵 W 中的每一个标量：\n\n3. $$D_{KL}(Q(w | \theta) \lVert P(w)) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n} Q(w^{(i)} | \theta) \left( \log{Q(w^{(i)} | \theta)} - \log{P(w^{(i)})} \right)$$\n\n假设 n 非常大时，我们可以近似得到：\n\n4. $$D_{KL}(Q(w | \theta) \lVert P(w)) \approx \frac{1}{n} \sum_{i=0}^{n} Q(w^{(i)} | \theta) \left( \log{Q(w^{(i)} | \theta)} - \log{P(w^{(i)})} \right)$$\n\n从而有：\n\n5. $$D_{KL}(Q(w | \theta) \lVert P(w)) = \mu_Q \sum_{i=0}^{n} \left( \log{Q(w^{(i)} | \theta)} - \log{P(w^{(i)})} \right)$$\n\n由于 Q 分布的期望值（均值）只是对数值进行缩放，因此可以将其从表达式中提取出来（因为不会影响框架的梯度追踪）。于是，第 n 个样本的复杂度代价为：\n\n6. $$C^{(n)}(w^{(n)}, \theta) = \log{Q(w^{(n)} | \theta)} - \log{P(w^{(n)})}$$\n\n该表达式对其所有参数均可导。\n\n## 要得到第 n 个样本的完整代价函数：\n\n1. 设性能（拟合数据）函数为：$P^{(n)}(w^{(n)}, \theta)$\n\n因此，第 n 个权重样本的完整代价函数为：\n\n2. 
$$L^{(n)}(w^{(n)}, \theta) = C^{(n)}(w^{(n)}, \theta) + P^{(n)}(w^{(n)}, \theta)$$\n\n我们可以通过蒙特卡洛采样来估计真实的完整代价函数（即多次前向传播网络并取总损失的平均值），然后使用该估计值进行反向传播。这种方法即使每次反向传播只进行少量实验，甚至只进行一次实验，也同样有效。\n\n## 一些说明与总结\n至此，《贝叶斯深度学习概览》教程就告一段落了。了解了这里的内容后，您可以根据自己的需求实现贝叶斯神经网络模型。\n\n例如，您可以在每次采样时执行一次优化步骤，或者采用这种类似蒙特卡洛的方法多次收集损失、取其平均值后再进行优化。一切由您决定。\n\n温馨提示：**我们的贝叶斯层及工具类会在每次前向传播时自动计算各层的复杂度代价，因此您无需过多关注这一点。**\n \n## 参考文献：\n * [查尔斯·布伦德尔、朱利安·科恩比斯、科雷·卡武克乔卢和达安·维尔斯特拉。神经网络中的权重不确定性。arXiv 预印本 arXiv:1505.05424，2015 年。](https:\u002F\u002Farxiv.org\u002Fabs\u002F1505.05424)\n \n \n## 引用方式\n\n如果您在研究中使用了 `BLiTZ`，可以按如下方式引用（BibTeX 条目保留英文原文，以便他人检索与引用）：\n\n```bibtex\n@misc{esposito2020blitzbdl,\n    author = {Piero Esposito},\n    title = {BLiTZ - Bayesian Layers in Torch Zoo (a Bayesian Deep Learning library for Torch)},\n    year = {2020},\n    publisher = {GitHub},\n    journal = {GitHub repository},\n    howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FpiEsposito\u002Fblitz-bayesian-deep-learning\u002F}},\n}\n```\n \n###### 制作：皮耶罗·埃斯波西托","# BLiTZ 快速上手指南\n\nBLiTZ (Bayesian Layers in Torch Zoo) 是一个基于 PyTorch 的轻量级扩展库，旨在帮助开发者轻松构建贝叶斯神经网络（BNN)。它允许你在不改变原有 PyTorch 层交互逻辑的前提下，为模型引入权重不确定性（Weight Uncertainty），从而获取预测结果的置信区间并计算模型复杂度成本。\n\n## 环境准备\n\n在开始之前，请确保你的开发环境满足以下要求：\n\n*   **操作系统**: Linux, macOS 或 Windows\n*   **Python 版本**: 推荐 Python 3.8 - 3.9 (原文示例使用 3.9)\n*   **核心依赖**:\n    *   PyTorch (建议安装最新稳定版)\n    *   NumPy\n    *   Scikit-learn (用于数据预处理和加载示例数据集)\n\n## 安装步骤\n\n你可以选择通过 `pip`、`conda` 或源码方式进行安装。国内用户若遇到下载速度慢的问题，建议使用国内镜像源加速。\n\n### 方式一：使用 pip 安装（推荐）\n\n```bash\n# 使用官方源\npip install blitz-bayesian-pytorch\n\n# 【推荐】国内用户使用清华镜像源加速\npip install blitz-bayesian-pytorch -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 方式二：使用 conda 安装\n\n```bash\n# 使用官方 channel\nconda install -c conda-forge blitz-bayesian-pytorch\n\n# 【推荐】国内用户使用中科大镜像源加速\nconda install -c https:\u002F\u002Fmirrors.ustc.edu.cn\u002Fanaconda\u002Fcloud\u002Fconda-forge blitz-bayesian-pytorch\n```\n\n### 方式三：源码安装（适合需要修改源码的开发者）\n\n```bash\nconda create -n blitz python=3.9\nconda activate blitz\ngit clone https:\u002F\u002Fgithub.com\u002FpiEsposito\u002Fblitz-bayesian-deep-learning.git\ncd blitz-bayesian-deep-learning\npip install .\n```\n\n## 基本使用\n\n以下是一个完整的回归任务示例，展示如何使用 BLiTZ 构建贝叶斯回归模型并评估预测置信区间。\n\n### 1. 导入必要模块\n\n除了常规的 PyTorch 模块外，我们需要从 `blitz` 导入 `BayesianLinear` 层和 `variational_estimator` 装饰器。\n\n```python\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\nimport numpy as np\n\n# 从 BLiTZ 导入核心组件\nfrom blitz.modules import BayesianLinear\nfrom blitz.utils import variational_estimator\n\n# 注意：load_boston 已在 scikit-learn 1.2 中移除，需使用 1.2 之前的版本运行本示例\nfrom sklearn.datasets import load_boston\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.model_selection import train_test_split\n```\n\n### 2. 数据加载与预处理\n\n加载波士顿房价数据集并进行标准化处理，随后转换为 PyTorch Tensor。\n\n```python\nX, y = load_boston(return_X_y=True)\nX = StandardScaler().fit_transform(X)\ny = StandardScaler().fit_transform(np.expand_dims(y, -1))\n\nX_train, X_test, y_train, y_test = train_test_split(X,\n                                                    y,\n                                                    test_size=.25,\n                                                    random_state=42)\n\nX_train, y_train = torch.tensor(X_train).float(), torch.tensor(y_train).float()\nX_test, y_test = torch.tensor(X_test).float(), torch.tensor(y_test).float()\n```\n\n### 3. 
定义贝叶斯回归模型\n\n使用 `@variational_estimator` 装饰器修饰你的 `nn.Module` 类。这将自动注入处理贝叶斯特性所需的方法（如计算复杂度成本和多次采样前向传播）。\n\n```python\n@variational_estimator\nclass BayesianRegressor(nn.Module):\n    def __init__(self, input_dim, output_dim):\n        super().__init__()\n        # 使用 BayesianLinear 替代普通的 nn.Linear\n        self.blinear1 = BayesianLinear(input_dim, 512)\n        self.blinear2 = BayesianLinear(512, output_dim)\n        \n    def forward(self, x):\n        x_ = self.blinear1(x)\n        x_ = F.relu(x_)\n        return self.blinear2(x_)\n```\n\n### 4. 定义置信区间评估函数\n\n该函数通过多次采样预测结果来计算均值和标准差，进而生成置信区间，并统计真实标签落在区间内的比例。\n\n```python\ndef evaluate_regression(regressor,\n                        X,\n                        y,\n                        samples = 100,\n                        std_multiplier = 2):\n    preds = [regressor(X) for i in range(samples)]\n    preds = torch.stack(preds)\n    means = preds.mean(axis=0)\n    stds = preds.std(axis=0)\n    ci_upper = means + (std_multiplier * stds)\n    ci_lower = means - (std_multiplier * stds)\n    ic_acc = (ci_lower \u003C= y) * (ci_upper >= y)\n    ic_acc = ic_acc.float().mean()\n    return ic_acc, (ci_upper >= y).float().mean(), (ci_lower \u003C= y).float().mean()\n```\n\n### 5. 训练与评估循环\n\n初始化模型和优化器。注意训练循环中的损失计算需使用 `sample_elbo` 方法，该方法会自动处理贝叶斯层的采样和复杂度成本计算。\n\n```python\n# 初始化模型\nregressor = BayesianRegressor(13, 1)\noptimizer = optim.Adam(regressor.parameters(), lr=0.01)\ncriterion = torch.nn.MSELoss()\n\n# 创建 DataLoader\nds_train = torch.utils.data.TensorDataset(X_train, y_train)\ndataloader_train = torch.utils.data.DataLoader(ds_train, batch_size=16, shuffle=True)\n\nds_test = torch.utils.data.TensorDataset(X_test, y_test)\ndataloader_test = torch.utils.data.DataLoader(ds_test, batch_size=16, shuffle=True)\n\n# 训练循环\niteration = 0\nfor epoch in range(100):\n    for i, (datapoints, labels) in enumerate(dataloader_train):\n        optimizer.zero_grad()\n        \n        # 关键步骤：使用 sample_elbo 计算包含不确定性成本的损失\n        loss = regressor.sample_elbo(inputs=datapoints,\n                           labels=labels,\n                           criterion=criterion,\n                           sample_nbr=3)\n        loss.backward()\n        optimizer.step()\n        \n        iteration += 1\n        if iteration % 100 == 0:\n            ic_acc, under_ci_upper, over_ci_lower = evaluate_regression(regressor,\n                                                                        X_test,\n                                                                        y_test,\n                                                                        samples=25,\n                                                                        std_multiplier=3)\n            \n            print(\"CI acc: {:.2f}, CI upper acc: {:.2f}, CI lower acc: {:.2f}\".format(ic_acc, under_ci_upper, over_ci_lower))\n            print(\"Loss: {:.4f}\".format(loss))\n```","某量化交易团队正在构建基于深度学习的股价趋势预测模型，旨在为高频交易决策提供数据支撑。\n\n### 没有 blitz-bayesian-deep-learning 时\n- 模型仅能输出单一的点估计值（如“预计上涨 2.5%”），无法告知该预测的可信程度，交易员难以判断是否值得下注。\n- 面对市场剧烈波动或罕见数据（分布外样本），传统神经网络往往盲目自信地给出错误预测，导致严重的资金回撤。\n- 若要手动实现贝叶斯神经网络以获取不确定性，需从零推导变分推断公式并重写反向传播逻辑，开发周期长达数周且极易出错。\n- 缺乏对模型复杂度成本的原生支持，难以在预测精度与模型过拟合风险之间找到最佳平衡点。\n\n### 使用 blitz-bayesian-deep-learning 后\n- 直接调用库中的贝叶斯层替换标准 PyTorch 层，模型不仅能预测股价，还能输出置信区间（如“95% 概率上涨 1%-4%”），让交易策略更具鲁棒性。\n- 利用权重采样机制，模型在面对异常市场数据时会自动扩大预测方差，提示交易员“当前预测不可靠”，从而触发风控机制避免盲目操作。\n- 无需修改现有训练循环架构，仅需几行代码即可集成不确定性估算，将原本数周的算法验证工作缩短至几天。\n- 内置的复杂度成本函数可自动计算并优化变分下界，帮助团队快速筛选出既准确又不过度复杂的轻量级交易模型。\n\nblitz-bayesian-deep-learning 
的核心价值在于让开发者能以极简的代码成本，将“知其然”的点预测升级为“知其所以然”的概率决策，显著提升 AI 在高风险场景下的可靠性。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FpiEsposito_blitz-bayesian-deep-learning_b05177d6.png","piEsposito","Pi Esposito","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FpiEsposito_ae332b62.png","ai computer nerd",null,"pi@piesposi.to","piesposi_to","https:\u002F\u002Fpiesposito.github.io\u002F","https:\u002F\u002Fgithub.com\u002FpiEsposito",[83,87,91],{"name":84,"color":85,"percentage":86},"Python","#3572A5",53.8,{"name":88,"color":89,"percentage":90},"Jupyter Notebook","#DA5B0B",46.1,{"name":92,"color":93,"percentage":94},"Shell","#89e051",0.1,981,111,"2026-04-06T07:24:29","GPL-3.0",1,"","未说明",{"notes":103,"python":104,"dependencies":105},"该工具是一个基于 PyTorch 的贝叶斯神经网络库，支持通过 pip 或 conda 安装。示例代码中使用了 scikit-learn 进行数据加载和预处理。README 未明确指定操作系统、GPU 或内存的具体需求，通常取决于所运行的具体模型规模和数据集大小。建议使用 conda 创建虚拟环境进行安装。","3.9",[106,107,108],"torch","numpy","scikit-learn",[14],[111,112,113,114,115,116],"pytorch","pytorch-implementation","pytorch-tutorial","bayesian-layers","bayesian-deep-learning","bayesian-neural-networks","2026-03-27T02:49:30.150509","2026-04-18T14:24:34.001600",[120,125,130,135,140,145],{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},39766,"如何在 GPU (CUDA) 上运行模型？","虽然该库主要设计为轻量级 CPU 运行，但支持 CUDA。你需要确保不仅将模型移动到 GPU，还要将输入数据也移动到相同的设备。具体步骤如下：\n1. 定义设备：`device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')`\n2. 将模型移至设备：`model = model.to(device)`\n3. 在训练循环中，将每个 batch 的数据移至设备：`inputs, targets = inputs.to(device), targets.to(device)`。\n注意：如果只移动模型而不移动数据，会报 `RuntimeError: Expected object of device type cuda but got device type cpu` 错误。","https:\u002F\u002Fgithub.com\u002FpiEsposito\u002Fblitz-bayesian-deep-learning\u002Fissues\u002F9",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},39767,"为什么贝叶斯 MLP 在回归任务中无法学习（输出几乎为常数）？","这通常是因为网络过大或先验参数设置不当导致的。解决方案包括：\n1. **减少节点数量**：尝试使用更小的网络结构。\n2. **增加先验 sigma (prior_sigma_1)**：增大权重的先验方差有助于模型探索解空间。\n例如，将层定义修改为：`self.blinear1 = BayesianLinear(1, 5, prior_sigma_1=1)`。对于较大的贝叶斯网络，可能需要更多的训练轮次 (epochs) 才能收敛。","https:\u002F\u002Fgithub.com\u002FpiEsposito\u002Fblitz-bayesian-deep-learning\u002Fissues\u002F51",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},39768,"代码中存在巨大的内存需求问题怎么办？","如果在训练简单模型时出现异常高的内存占用（如 >12GB），且内存随 epoch 增加而增长，可能是因为某些操作在每个 epoch 中意外地保留了梯度追踪历史。建议检查代码中是否有未 detach 的张量参与了累积计算。维护者指出这可能是由于梯度通过 epochs 被持续追踪导致的，通常需要检查损失计算或中间变量的处理方式，确保不需要反向传播的张量及时脱离计算图。","https:\u002F\u002Fgithub.com\u002FpiEsposito\u002Fblitz-bayesian-deep-learning\u002Fissues\u002F68",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},39769,"log_posterior() 函数末尾的 -0.5 是否正确？","这是一个已确认的代码错误。在高斯分布的对数似然计算中，末尾不应包含 `-0.5` 这一项。维护者已确认该常数项是错误的并同意将其移除。如果你在使用旧版本代码，可以手动在 `blitz\u002Fmodules\u002Fweight_sampler.py` 的第 50 行左右删除公式末尾的 `- 0.5`。","https:\u002F\u002Fgithub.com\u002FpiEsposito\u002Fblitz-bayesian-deep-learning\u002Fissues\u002F58",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},39770,"为什么预测结果的方差太小，置信区间无法覆盖真实值？","如果模型均值预测准确但方差过小，通常是因为复杂度成本权重 (complexity_cost_weight) 过高或先验参数设置限制了后验分布的灵活性。建议尝试以下调整：\n1. 降低 `complexity_cost_weight`，减少对 KL 散度的惩罚，允许后验分布更自由地拟合数据不确定性。\n2. 调整 `prior_sigma_1` 和 `prior_sigma_2` 以改变先验分布的宽度。\n3. 
尝试不同的架构，有时卷积层或特定的层组合能更好地捕捉不确定性。","https:\u002F\u002Fgithub.com\u002FpiEsposito\u002Fblitz-bayesian-deep-learning\u002Fissues\u002F50",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},39771,"训练集和测试集的数据似乎有重叠，这是正常的吗？","正常情况下，训练集 (X_train, y_train) 和测试集 (X_test, y_test) 应该是完全独立的数据划分，不应包含相同的数据样本。如果你在 `X_test` 和 `y_train` 之间发现了相同的数据，这通常不是库本身的问题，而是数据预处理或切片代码中的逻辑错误。请检查你的数据分割代码（如 `train_test_split` 的使用）以及数据形状的变换过程，确保没有发生索引错位或数据泄露。","https:\u002F\u002Fgithub.com\u002FpiEsposito\u002Fblitz-bayesian-deep-learning\u002Fissues\u002F75",[151,156,161,166,171,176],{"id":152,"version":153,"summary_zh":154,"released_at":155},315708,"0.2.8","修复了一些小 bug，并优化了 GPU 使用。","2022-04-15T13:05:29",{"id":157,"version":158,"summary_zh":159,"released_at":160},315709,"0.2.7","添加径向和翻转图层","2020-11-28T16:52:35",{"id":162,"version":163,"summary_zh":164,"released_at":165},315710,"0.2.6","支持 PyTorch 1.7","2020-11-22T15:50:20",{"id":167,"version":168,"summary_zh":169,"released_at":170},315711,"0.2.5","版本 0.2.5","2020-07-01T15:47:14",{"id":172,"version":173,"summary_zh":174,"released_at":175},315712,"0.2.3","新增内容：\n\n变分推断新特性\n新示例\n内置贝叶斯VGG模型\n在贝叶斯层中使用不同先验分布","2020-05-25T18:44:09",{"id":177,"version":178,"summary_zh":77,"released_at":179},315713,"0.2.1","2020-05-20T15:39:02"]