[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-hidet-org--hidet":3,"tool-hidet-org--hidet":61},[4,18,26,36,44,52],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",141543,2,"2026-04-06T11:32:54",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 
API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":53,"name":54,"github_repo":55,"description_zh":56,"stars":57,"difficulty_score":10,"last_commit_at":58,"category_tags":59,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,60],"视频",{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":103,"forks":104,"last_commit_at":105,"license":106,"difficulty_score":10,"env_os":107,"env_gpu":108,"env_ram":109,"env_deps":110,"category_tags":115,"github_topics":116,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":121,"updated_at":122,"faqs":123,"releases":149},4599,"hidet-org\u002Fhidet","hidet","An open-source efficient deep learning framework\u002Fcompiler, written in python.","Hidet 是一款基于 Python 开发的开源深度学习编译器，旨在将 PyTorch 和 ONNX 模型端到端地编译为高效的 CUDA 内核。它主要解决了深度学习模型在 NVIDIA GPU 上推理速度慢、资源利用率低的问题，通过自动应用一系列图级和算子级的优化策略，显著提升模型运行性能。\n\n这款工具特别适合需要在生产环境中部署高效推理服务的开发者，以及关注底层系统优化的研究人员。用户只需简单的几行代码，即可利用 `torch.compile` 接口将现有的 PyTorch 模型转换为高性能版本，无需手动编写复杂的 CUDA 代码。\n\nHidet 的独特之处在于其源自 ASPLOS '23 
学术研究的“任务映射编程范式”，能够智能搜索并生成最优的张量程序。目前，它专注于 Linux 环境下基于 CUDA Toolkit 11.6+ 的 NVIDIA GPU 加速，要求使用 Python 3.9 及以上版本。作为一个由 CentML 团队积极维护的 Apache 2.0 许可项目，Hidet 为社区提供了一个透明、可扩展且高性能的深度学习编译解决方案，帮助用户轻松释放硬件潜力。","# Hidet: An Open-Source Deep Learning Compiler\n[**Documentation**](http:\u002F\u002Fhidet.org\u002Fdocs)  |\n[**Research Paper**](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3575693.3575702)  |\n[**Releases**](https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Freleases) |\n[**Contributing**](https:\u002F\u002Fhidet.org\u002Fdocs\u002Fstable\u002Fdeveloper-guides\u002Fcontributing.html)\n\n![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fhidet-org\u002Fhidet)\n![GitHub Workflow Status](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fhidet-org\u002Fhidet\u002Ftests.yaml)\n\n\nHidet is an open-source deep learning compiler, written in Python. \nIt supports end-to-end compilation of DNN models from PyTorch and ONNX to efficient CUDA kernels.\nA series of graph-level and operator-level optimizations are applied to improve performance.\n\nCurrently, Hidet focuses on optimizing inference workloads on NVIDIA GPUs, and requires\n- Linux OS\n- CUDA Toolkit 11.6+\n- Python 3.9+\n\n## Getting Started\n\n### Installation\nPlease install hidet via\n```bash\npip install hidet\n```\n\nYou can also install hidet by [building from source](https:\u002F\u002Fdocs.hidet.org\u002Fstable\u002Fgetting-started\u002Fbuild-from-source.html#).\n\n### Usage\n\nOptimize a PyTorch model through hidet (requires PyTorch 2.3):\n```python\nimport torch\n\n# Define a PyTorch model\nmodel = torch.hub.load('pytorch\u002Fvision:v0.6.0', 'resnet18', pretrained=True).cuda().eval()\nx = torch.rand(1, 3, 224, 224).cuda()\n\n# Compile the model through Hidet\n# Optional: set optimization options (see our documentation for more details)\n#   import hidet \n#   hidet.torch.dynamo_config.search_space(2)  # tune each tunable operator\nmodel_opt = torch.compile(model, backend='hidet')  \n\n# Run the optimized model\ny = model_opt(x)\n```\nSee the following tutorials for more usage examples:\n- [Quick Start](http:\u002F\u002Fhidet.org\u002Fdocs\u002Fstable\u002Fgallery\u002Fgetting-started\u002Fquick-start.html)\n\n## Publication\nHidet originates from the following research work:\n\n>  **Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs**  \n>  Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, and Gennady Pekhimenko.  \n>  ASPLOS '23\n\nIf you use **Hidet** in your research, please cite our\n[paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3575693.3575702).\n\n## Development \nHidet is currently under active development by a team at [CentML Inc](https:\u002F\u002Fcentml.ai\u002F). \n\n## Contributing\nWe welcome contributions from the community. Please see the\n[contribution guide](https:\u002F\u002Fhidet.org\u002Fdocs\u002Fstable\u002Fdeveloper-guides\u002Fcontributing.html)\nfor more details.\n\n## License\nHidet is released under the [Apache 2.0 license](LICENSE).\n","# Hidet：一个开源深度学习编译器\n[**文档**](http:\u002F\u002Fhidet.org\u002Fdocs)  |\n[**研究论文**](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3575693.3575702)  |\n[**发布版本**](https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Freleases) |\n[**贡献指南**](https:\u002F\u002Fhidet.org\u002Fdocs\u002Fstable\u002Fdeveloper-guides\u002Fcontributing.html)\n\n![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fhidet-org\u002Fhidet)\n![GitHub 工作流状态](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fhidet-org\u002Fhidet\u002Ftests.yaml)\n\n\nHidet 是一个用 Python 编写的开源深度学习编译器。\n它支持从 PyTorch 和 ONNX 到高效 CUDA 内核的端到端 DNN 模型编译。\n通过一系列图级和算子级优化来提升性能。\n\n目前，Hidet 主要专注于优化 NVIDIA GPU 上的推理工作负载，需要满足以下条件：\n- Linux 操作系统\n- CUDA Toolkit 11.6+\n- Python 3.9+\n\n## 快速入门\n\n### 安装\n请通过以下命令安装 Hidet：\n```bash\npip install hidet\n```\n\n您也可以通过 
[从源码构建](https:\u002F\u002Fdocs.hidet.org\u002Fstable\u002Fgetting-started\u002Fbuild-from-source.html#) 的方式安装 Hidet。\n\n### 使用\n\n通过 Hidet 优化一个 PyTorch 模型（需 PyTorch 2.3）：\n```python\nimport torch\n\n# 定义 PyTorch 模型\nmodel = torch.hub.load('pytorch\u002Fvision:v0.6.0', 'resnet18', pretrained=True).cuda().eval()\nx = torch.rand(1, 3, 224, 224).cuda()\n\n# 通过 Hidet 编译模型\n# 可选：设置优化选项（详情请参阅我们的文档）\n#   import hidet \n#   hidet.torch.dynamo_config.search_space(2)  # 调优每个可调优算子\nmodel_opt = torch.compile(model, backend='hidet')  \n\n# 运行优化后的模型\ny = model_opt(x)\n```\n更多用法请参阅以下教程：\n- [快速入门](http:\u002F\u002Fhidet.org\u002Fdocs\u002Fstable\u002Fgallery\u002Fgetting-started\u002Fquick-start.html)\n\n## 出版物\nHidet 源自以下研究工作：\n\n>  **Hidet：面向深度学习张量程序的任务映射编程范式**  \n>  丁耀耀、Cody Hao Yu、郑博健、刘一智、王一达、Gennady Pekhimenko。  \n>  ASPLOS '23\n\n如果您在研究中使用了 **Hidet**，欢迎引用我们的\n[论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3575693.3575702)。\n\n## 开发\nHidet 目前由 [CentML Inc](https:\u002F\u002Fcentml.ai\u002F) 团队积极开发中。\n\n## 贡献\n我们欢迎社区的贡献。更多详情请参阅\n[贡献指南](https:\u002F\u002Fhidet.org\u002Fdocs\u002Fstable\u002Fdeveloper-guides\u002Fcontributing.html)。\n\n## 许可证\nHidet 采用 [Apache 2.0 许可证](LICENSE) 发布。","# Hidet 快速上手指南\n\nHidet 是一个用 Python 编写的开源深度学习编译器，支持将 PyTorch 和 ONNX 模型端到端编译为高效的 CUDA 内核，专为 NVIDIA GPU 上的推理负载优化。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Linux\n*   **CUDA Toolkit**：版本 11.6 或更高\n*   **Python**：版本 3.9 或更高\n*   **PyTorch**：建议使用 PyTorch 2.3（用于 `torch.compile` 集成）\n\n## 安装步骤\n\n推荐使用 pip 进行安装：\n\n```bash\npip install hidet\n```\n\n如果您需要从源码构建以获取最新功能或进行开发，请参考官方文档中的 [从源码构建指南](https:\u002F\u002Fdocs.hidet.org\u002Fstable\u002Fgetting-started\u002Fbuild-from-source.html#)。\n\n> **提示**：国内开发者若遇到下载速度慢的问题，可使用国内镜像源加速安装：\n> ```bash\n> pip install hidet -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 基本使用\n\n以下是通过 Hidet 优化 PyTorch 模型的最简示例。该示例加载一个预训练的 ResNet18 模型并使用 Hidet 后端进行编译加速。\n\n```python\nimport torch\n\n# 定义 PyTorch 模型\nmodel = 
torch.hub.load('pytorch\u002Fvision:v0.6.0', 'resnet18', pretrained=True).cuda().eval()\nx = torch.rand(1, 3, 224, 224).cuda()\n\n# 通过 Hidet 编译模型\n# 可选：设置优化选项（详见官方文档）\n#   import hidet \n#   hidet.torch.dynamo_config.search_space(2)  # 调整每个可调节算子的搜索空间\nmodel_opt = torch.compile(model, backend='hidet')  \n\n# 运行优化后的模型\ny = model_opt(x)\n```\n\n更多详细用法和进阶教程，请访问 [Hidet 快速入门教程](http:\u002F\u002Fhidet.org\u002Fdocs\u002Fstable\u002Fgallery\u002Fgetting-started\u002Fquick-start.html)。","某计算机视觉团队正在将基于 PyTorch 训练的 ResNet18 模型部署到 NVIDIA GPU 服务器上，以支撑高并发的实时图像分类服务。\n\n### 没有 hidet 时\n- **推理延迟过高**：直接运行原生 PyTorch 模型时，由于算子粒度细且缺乏底层优化，单次推理耗时难以满足毫秒级响应需求。\n- **显存占用冗余**：框架在执行过程中产生大量临时中间张量，导致显存利用率低，限制了单卡可支持的并发批次大小。\n- **调优门槛极高**：若想提升性能，开发人员需手动编写复杂的 CUDA 内核代码，不仅开发周期长，还极易引入难以排查的 Bug。\n- **硬件算力浪费**：默认执行路径无法充分挖掘 GPU 的并行计算潜力，导致昂贵的硬件资源在高峰期仍处于“吃不饱”状态。\n\n### 使用 hidet 后\n- **端到端编译加速**：通过 `torch.compile` 接入 hidet 后端，自动将模型编译为高效 CUDA 内核，显著降低推理延迟，轻松达成实时性指标。\n- **图级优化省显存**：hidet 自动应用算子融合与内存复用策略，大幅减少中间变量开销，使单卡并发处理能力成倍提升。\n- **零样本代码改造**：无需重写任何模型代码或手写 CUDA，仅用几行配置即可触发深层优化，让算法工程师专注业务逻辑。\n- **极致硬件利用**：内置的搜索空间自动寻找当前硬件下的最优执行方案，确保 GPU 算力被充分释放，降低单位请求的计算成本。\n\nhidet 的核心价值在于让开发者无需成为 CUDA 专家，也能通过简单的 Python 接口获得媲美手工优化的极致推理性能。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhidet-org_hidet_6ef7e6e8.png","hidet-org","Hidet","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fhidet-org_aba8894f.png","An open-source efficient deep learning framework.",null,"https:\u002F\u002Fhidet.org","https:\u002F\u002Fgithub.com\u002Fhidet-org",[80,84,88,92,96,100],{"name":81,"color":82,"percentage":83},"Python","#3572A5",97,{"name":85,"color":86,"percentage":87},"C++","#f34b7d",2.4,{"name":89,"color":90,"percentage":91},"C","#555555",0.4,{"name":93,"color":94,"percentage":95},"Shell","#89e051",0.1,{"name":97,"color":98,"percentage":99},"CMake","#DA3434",0,{"name":101,"color":102,"percentage":99},"Dockerfile","#384d54",739,68,"2026-04-02T06:15:15","Apache-2.0","Linux","必需 NVIDIA GPU，需安装 CUDA Toolkit 
11.6+","未说明",{"notes":111,"python":112,"dependencies":113},"该工具专注于优化 NVIDIA GPU 上的推理工作负载。支持从 PyTorch 和 ONNX 端到端编译为高效的 CUDA 内核。可通过 pip 直接安装或从源码构建。","3.9+",[114],"torch>=2.3",[14],[117,118,119,120],"deep-learning","compiler","inference","framework","2026-03-27T02:49:30.150509","2026-04-07T03:52:50.812881",[124,129,134,139,144],{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},20926,"在使用 Hidet 编译 Stable Diffusion 时遇到编译错误或速度无提升，如何解决？","部分编译错误已修复（如 T4 编译和缩放点积注意力映射）。对于不支持的 'channels last' 格式和自定义模块，可以通过以下 Dynamo 配置禁用相关部分来解决：\n1. 将模型转换为连续内存格式：`pipe.unet = pipe.unet.to(memory_format=torch.contiguous_format)`\n2. 在图中禁止特定的注意力处理器：`torch._dynamo.disallow_in_graph(diffusers.models.attention_processor.Attention)`\n如果仍有非连续张量错误，请检查是否应用了上述内存格式转换。","https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fissues\u002F225",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},20927,"如何查看或保存 Hidet 编译后的 CUDA 内核连接关系及模型权重？","Hidet 提供了保存编译模型的方法。使用编译模型的 `save` 方法（例如参考测试代码中的用法），可以将权重存储到生成的文件中（通常是 `weights.npz` 文件，可用 numpy 加载）。同时生成的 `graph_execution.json` 文件描述了如何使用这些权重以及执行图的结构。如果需要自定义导出逻辑，可以修改 `python\u002Fhidet\u002Fdrivers\u002Fbuild_graph.py` 文件中的相关函数。","https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fissues\u002F385",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},20928,"在哪里可以找到 Hidet 与其他 PyTorch Dynamo 后端的性能基准测试数据和脚本？","性能基准测试的追踪信息和详细数据曾在相关 Issue 中更新（包含 ResNet50 和 BERT 等模型在 RTX 4090 上的对比数据）。生成这些报告的基准测试脚本位于仓库的 `hidet\u002Fscripts\u002Fbench` 目录下，用户可以自行运行以获取最新的性能对比报告。","https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fissues\u002F154",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},20929,"遇到 `torch.abs` 在 GPU 和 CPU 上输出不一致或乘法交换律失效的问题怎么办？","这是一个已知的 Bug，已在 PR #391 中修复。该问题涉及乘法交换律应用于 `torch.abs` 输入时导致的计算不匹配。建议升级到包含此修复的最新版本。此外，PyTorch 社区的相关补丁（PR #113253）也可能有助于解决此类底层算子问题。","https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fissues\u002F382",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},20930,"Hidet 
是否有自动并行化（Automatic Parallelization）的计划或文档？","关于自动并行化的详细讨论和技术方案曾以 RFC（请求意见稿）的形式提出。虽然相关的追踪 Issue 已关闭，但 RFC 文档的内容仍然具有很高的参考价值，可供开发者了解 Hidet 在该方向的设计思路和未来规划。建议查阅仓库中 `rfcs` 目录下的相关文档（如 `0003-auto-parallelization.md`）。","https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fissues\u002F335",[150,155,160,165,170,175,180,185,190,195,200,205,210],{"id":151,"version":152,"summary_zh":153,"released_at":154},126940,"v0.6.1","## 变更内容\n- [性能] [llama8B] 添加子图重写规则，以融合 Matmul 和类型转换操作 (#1021) (Peterson Guo)\n- [CI][GPU] 禁用 A10、A100、L4 GPU 测试，仅启用 H100 测试 (#1022) (Michael Wojcikiewicz)\n- [原语] 为 cp-async 指令添加 volatile 标记 (#1018) (Yaoyao Ding)\n- [增强][CuTe] 在 Hexcute 中支持 `scan` 运算符 (#996) (xiaocenxiaocen)\n- [CuTe] 移除 Hexcute 中的一个未使用文件 (#1017) (xiaocenxiaocen)\n- [增强][CuTe] 简化 Hexcute 中的逐元素操作 (#995) (xiaocenxiaocen)\n- [Bufix] 修复 InlineLetStmt 传递中的一个错误 (#1002) (Yaoyao Ding)\n- [Bugfix] 修复使用 tempfile 的工具函数中的一个错误 (#1001) (Yaoyao Ding)\n- [Bugfix] 修复广播运算符的动态形状问题 (#1015) (Yaoyao Ding)\n- [特性] 支持张量内存数据移动指令 (#1014) (Bolin Sun)\n- [测试][CuTe] 为 `tma` 张量和 hopper gemm 内核编写测试 (#940) (xiaocenxiaocen)\n- [特性] 张量内存分配 + 寄存器加载\u002F存储指令 (#1013) (Bolin Sun)\n- [选项] 支持通过环境变量指定选项值 (#1012) (Yaoyao Ding)\n- [性能] [hidet+dmwl] 在 hidet 中添加 einops.rearrange 支持 (#961) (Peterson Guo)\n- [CI] 更新同步分支 (Yaoyao Ding)\n- [特性][CuTe] 将 `tma` 回退到普通的异步复制操作 (#945) (xiaocenxiaocen)\n- [特性][CuTe] 添加对 `tma` 张量和指令的支持 (#939) (xiaocenxiaocen)\n- [CI] 在 A100 和 A10 上设置每周功能测试 (#984) (Alibek Tokayev)\n- [CI] 将夜间轮子包上传至轮子服务器 (#990) (Yaoyao Ding)\n- [安全] 在工作流中添加 `permissions` 部分 (#993) (Yaoyao Ding)\n- [版本] 使用 `setuptools-scm` 管理 hidet 版本 (#988) (Yaoyao Ding)\n- [特性] 添加 TCPStore 作为 NCCL 进程组的新初始化方法 (#987) (Max Hu)\n- [安全] 检查由请求数据构造的路径 (#968) (Yaoyao Ding)\n- [变换] 移除自动量化传递 (#972) (Yaoyao Ding)\n- [格式化] 使用 black 的并行格式化功能 (#977) (Yaoyao Ding)\n- [工作流] 启用 public-synced-main 分支上的测试 (#985) (Yaoyao Ding)\n- [CI] 在 tests.yaml 中添加 CI 排除路径 (#974) (Alibek Tokayev)\n- [增强] 将编译后图的内存管理委托给 torch (#978) (zhumakhan)\n- [测试] 测试另一个方向的同步。(#499) (Yaoyao 
Ding)\n- [测试] 测试提交同步 (#983) (Yaoyao Ding)\n- [CI] 添加用于同步的工作流 (Yaoyao Ding)\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fcompare\u002Fv0.6.0...v0.6.1","2025-09-02T03:47:26",{"id":156,"version":157,"summary_zh":158,"released_at":159},126941,"v0.6.0","## 变更内容\n* [依赖] 将 pygraphviz 移至开发依赖\n* [修复] 完成发布收尾工作\n* [CI] 使用默认运行器，而非大型运行器\n* [文档] 移动发布指南\n* [应用] 逐步淘汰应用层抽象\n* [安全] 修复使用 tempfile.mktemp() 的安全问题\n* [BUG] 简化符号变量\n* [CI] 添加 CI 依赖安装步骤\n* [Bugfix] 添加信号处理器以清理 NCCL 同步文件\n* [CI] 为所有工作流添加 permissions.contents: read 权限\n* [修复] 修复因 `mbarrier_try_wait` 接口变更导致的 CI 失败\n* [性能] 对批量矩阵乘法进行批次维度展平\n* [CI] 修复 Nightly 工作流\n* [CI] 所有功能测试均使用基础 Docker 镜像\n* [特性][CuTe] 在 Hexcute 中添加 `mbarrier` 算子\n* [特性] 添加流图可视化功能\n* [发布][收尾] 准备发布\n* [BUG][CI] 修复由 PyTorch 2.7.0 版本引起的 CI 失败\n* [修复] 修复矩阵乘法问题\n* [Bug] 释放 hidet 中为 kv 缓存预留的内存\n* [Bugfix] 为流图缓存哈希键添加 rank 信息\n* [BUG] 修复在使用 cuBLAS 编译模型时触发的内存错误\n* [测试] 为 `torch.compile` 和 `split` 算子增加更多测试用例\n* [Bugfix] 调整网格维度以支持更大批次尺寸\n* [性能] 默认启用区间调度表\n* [特性] fp8_scaled_mm\n* [Bugfix] 修复并重构代码，以支持 Deepseek R1 的编译\n* [修复] 修正 int8 Tensor Core 的 mma 配置名称\n* [图缓存] 在需要时将图可视化结果转储至缓存\n* [热修复] 修复当前 CI 失败的问题\n* [性能] 加速广播操作\n* [性能] 改进 `Expr` 简化过程\n* [包] 重构依赖配置\n* [依赖项] 将 black 升级至 25.1.0\n* [重构] 重构数据类型的属性方法\n* [特性] fp8_mm\n* [依赖项] 移除文档构建中对 jinja2 版本的限制\n* [依赖项] 移除依赖 TensorFlow 的 gpt2 示例\n* [性能] 优化图调度表，并支持嵌套形状\n* [Hopper] 在 hexcute 中添加 wgmma 指令\n* [特性] 支持将 FlowGraph 缓存至 CompiledGraph\n* [修复] 修复 `instantiate_symbols` 传递中的一个错误\n* [选项] 添加用于控制两个 nvcc 编译标志的选项\n* [增强] 增加收集不支持算子的功能\n* [传递] 优化加法链\n* [性能] 支持 fused_moe_awq_gptq\n* [特性] 完成 warp 特化\n* [CI][修复] 捕获心跳日志，确保 build-docs 失败时程序退出\n* [Bugfix] 在计算张量字节数时，将张量形状强制转换为 int64\n* [代码生成][运行时] 为保护公共函数添加 try-catch 语句\n* [性能] 实现 identity 操作\n* [修复][BUG] 移除 cute 中代码生成阶段的赋值语句\n* [CI][修复] 捕获心跳日志，确保任何测试失败时程序都能退出\n* [IR][运行时] 支持指针类型符号变量\n* [BUG] 修复形状中出现复杂表达式的问题\n* [Hopper] 添加成本模型\n* [CI][修复] 在所有测试工作流中对 build-docs 使用相同的修复措施\n* 
[B","2025-05-26T04:49:41",{"id":161,"version":162,"summary_zh":163,"released_at":164},126942,"v0.5.0","## 变更内容\n- [BUG] 添加编译服务器要求 (#661) 由 **Vadim Gimpelson** 提交，提交号：300fd33d\n- [BUG] 对 vllm 的 TP 进行多项修复 (#651) 由 **Vadim Gimpelson** 提交，提交号：9c29f669\n- 使用 wgmma 实现 matmul_f16 (#627) 由 **kjiang170** 提交，提交号：9f0ea7dd\n- [BUG] VLLM（以及 DMWL）使用 hidet 后端进行编译 (#647) 由 **zhumakhan** 提交，提交号：6c6be7a0\n- [IR] 在张量映射创建中添加对 `swizzle`、`interleave` 和 `l2Promotion` 的支持 (#643) 由 **Bolin Sun** 提交，提交号：21ff63f5\n- [BUG] 修复将哈希值附加到签名的问题 (#638) 由 **xiaocenxiaocen** 提交，提交号：dbd66132\n- Hexcute 基础分支（所有相关 PR 将合并到此基础 PR 中。）(#294) 由 **xiaocenxiaocen** 提交，提交号：b1fdf17d\n- [PERF] 将 parallel_k 的默认值设置为 'disabled' (#634) 由 **Vadim Gimpelson** 提交，提交号：135212bd\n- 在必要时适配 bfloat16 (#624) 由 **ZichuWu** 提交，提交号：9045865d\n- [Bug] 并行编译同步 (#616) 由 **ZichuWu** 提交，提交号：4c16c576\n- [COMPTIME] 加速热启动速度 (#625) 由 **Vadim Gimpelson** 提交，提交号：22c657b0\n- [BUG] 修复 torch2.5 内存溢出问题及文档构建修复 (#637) 由 **zhumakhan** 提交，提交号：bf32f8b1\n- 撤销 “[BUG] 修复 torch2.5 内存溢出问题” (#635) 由 **zhumakhan** 提交，提交号：9131a5c5\n- [BUG] 修复 torch2.5 内存溢出问题 (#609) 由 **zhumakhan** 提交，提交号：fe59c639\n- [CI] 修复构建和发布到内部 Hidet PYPI 索引中的小拼写错误 (#598) 由 **xinli-centml** 提交，提交号：f8400fe1\n- [PERF] 在另一个位置支持 bf16 (#623) 由 **Vadim Gimpelson** 提交，提交号：7f773490\n- [Tests] 适配 bfloat16 的测试\u002F运算符 (#615) 由 **ZichuWu** 提交，提交号：ba9c0ad5\n- [DISTRIBUTED] 在 `torch.compile` 模式下支持 `all_reduce` (#612) 由 **Vadim Gimpelson** 提交，提交号：0bca591c\n- [torchAPI] 从 torch 继承 CUDA 流 (#618) 由 **Vadim Gimpelson** 提交，提交号：ad4e00a0\n- [BUG] 修复共享映射实现中的错误 (#608) 由 **Vadim Gimpelson** 提交，提交号：ffdbde4b\n- [CI] 关闭 tests\u002Flang 的搜索空间 2 (#617) 由 **ZichuWu** 提交，提交号：5f7fae83\n- [Tests] 适配 bfloat16 测试用例的 tests\u002Flang (#594) 由 **ZichuWu** 提交，提交号：5b829cbd\n- [Tests] 适配 bfloat16 的 tests\u002Ffrontends (#592) 由 **ZichuWu** 提交，提交号：a5b72e62\n- [Tests] 适配 bfloat16 测试用例的 tests\u002Fir (#593) 由 **ZichuWu** 提交，提交号：545aeea4\n- [Tests] 调整 tests\u002Fmodels 中的测试用例以适应 bfloat16。（#595）由 **ZichuWu** 提交，提交号：bedff214\n- 为所有 
CompiledGraph 使用一个全局 CUDA 工作区 (#603) 由 **Max Hu** 提交，提交号：66523079\n- [Fix] 修复在适配 `bfloat16` 数据类型测试用例时遇到的小错误 (#607) 由 **Bolin Sun** 提交，提交号：275070da\n- Kaihang\u002Fwgmma 支持 tf32 u8 i8 (#549) 由 **kjiang170** 提交，提交号：a0e6658f\n- [CI] 排除 tests\u002Funit_tests\u002Ftest_dynamic_shape.py::test_attention[cuda] (#606) 由 **Vadim Gimpelson** 提交，提交号：5579392a\n- [Tests] 调整 tests\u002Funit-tests 中的测试用例以适应 bfloat16。（#596）由 **ZichuWu** 提交，提交号：0e5ec55b\n- [BUG] 修复将 fxgraph 正确转换为 hidet 流程图的问题，并扩展用户站点包中查找 nccl 库的功能 (#604) 由 **Vadim Gimpelson** 提交，提交号：1995d431\n- [Tests] 为 tests\u002Fcuda 添加 bfloat16 测试用例 (#590) 由 **ZichuWu** 提交，提交号：febfbd71\n- [Tests] 调整 tests\u002Futils 中的测试用例以适应 bfloat16。（#597）由 **ZichuWu** 提交，提交号：36aab6f3\n- [Tests] 将 tests\u002Fapps 中的 float16 更改为 bfloat16 (#589) 由 **ZichuWu**","2024-12-21T21:47:09",{"id":166,"version":167,"summary_zh":168,"released_at":169},126943,"v0.4.1","## 变更内容\n- [修复] 修复由 `any` 运算符触发的错误 (#369)，作者：**Bolin Sun**，提交号：6a4c2e54\n- [修复] 为 mobilebert-uncased 模型添加了 `torch.t` 支持 (#353)，作者：**zhumakhan**，提交号：95d95a4c\n- [CI] 在测试和发布测试执行中使用相同的镜像 (#463)，作者：**c-fteixeira**，提交号：49fd3325\n- [BUG] 修复图中不允许操作的 bug (#464)，作者：**Vadim Gimpelson**，提交号：d84f2c5b\n- [CI] 将发布工作流迁移到内部 ARC 运行器 (#461)，作者：**c-fteixeira**，提交号：b5d6aafd\n- [CI] 更新 CI 容器 (#460)，作者：**Vadim Gimpelson**，提交号：b9735910\n- [Bug] 将 test_arithmetic.py 重命名为 test_arithmetic2.py (#459)，作者：**Vadim Gimpelson**，提交号：6aa6cf82\n- 更新 requirements-dev.txt，使用 PyTorch 版本 >= 2.3.0 (#458)，作者：**Vadim Gimpelson**，提交号：6b322953\n- [CI] 重复 start_instance (#361)，作者：**vadiklyutiy**，提交号：cf5caddf\n- [运算符] 添加 `leaky_relu` 支持 (#360)，作者：**Bolin Sun**，提交号：7401cccb\n- [修复] 修复在编译 `torch.nn.Upsample` 模块时因 `align_corners=True` 而触发的错误 (#344)，作者：**Bolin Sun**，提交号：2c34cfc0\n- [性能] 为 `add_hints_pass` 中的循环引入远程解决方法 (#356)，作者：**vadiklyutiy**，提交号：3195be5b\n- [运算符] 注册那些其 PyTorch 函数等价物已被 Hidet 支持的张量方法 (#347)，作者：**Bolin Sun**，提交号：44ab5ad3\n- [性能] 引入 add_hint_pass (#355)，作者：**vadiklyutiy**，提交号：c014dab1\n- [CI] 将 NVIDIA Docker 容器升级至 24.4 版本 
(#354)，作者：**vadiklyutiy**，提交号：cb809b99\n- [修复] 将注意力掩码的类型从 fp32 转换为 f16 (#323)，作者：**zhumakhan**，提交号：9a10dc01\n- [修复] 为 conv-bert-base 模型添加缺失的 `torch.multiply` 和 `torch.nn.functional.unfold` 操作 (#351)，作者：**zhumakhan**，提交号：18842eeb\n- [修复] 修复 `register_methods` 中的 bug (#331)，作者：**Bolin Sun**，提交号：c87c5153\n- [修复] 处理 `setitem` 中关于 dtype 和设备的特殊情况 (#332)，作者：**Bolin Sun**，提交号：ff9445e2\n- [BUG] 修复 `bench_op.py` 中的 search_space bug (#348)，作者：**vadiklyutiy**，提交号：29e4c0e8\n- [OPS] 禁止在 fxgraph 中使用不支持的函数 (#317)，作者：**vadiklyutiy**，提交号：984cf75e\n- [选项] 移除 dynamo_config['search_space'] (#342)，作者：**vadiklyutiy**，提交号：0814bd8e\n- [运算符] 添加对 `torch.Tensor.view_as` 的支持 (#334)，作者：**Bolin Sun**，提交号：5f19dd05\n- [运算符] 添加对 `torch.nn.TransformerEncoder` 的支持 (#327)，作者：**Bolin Sun**，提交号：d625146e\n- [选项] 从 `torch.compile()` 继承 `options` (#260)，作者：**vadiklyutiy**，提交号：3638a0b5\n- [运算符] 为 `Tensor` 类添加 `__ge__` 方法 (#330)，作者：**Bolin Sun**，提交号：ed5fefff\n- [修复] 修复由 `ClampOp` 触发的错误 (#329)，作者：**Bolin Sun**，提交号：05984cb8\n- [修复] 处理在 `getitem` 中因设备差异导致的 hidet 错误 (#322)，作者：**Bolin Sun**，提交号：5a908205\n- [修复] 修复由 `register_functions.py` 中的 `tensor_reshape` 函数触发的 RuntimeError (#328)，作者：**Bolin Sun**，提交号：0cd2f838\n- [运算符] 添加在编译 `DALLE2_pytorch` 时遇到的 PyTorch 运算符 (#319)，作者：**Bolin Sun**，提交号：ecb99b1d\n- [修复] 修复由于尝试修改 `immutable_list` 导致的 `tensor_expand` 中的 bug (#320)，作者：**Bolin Sun**，提交号：bb89e227\n- [杂项] 替换","2024-07-30T02:40:55",{"id":171,"version":172,"summary_zh":173,"released_at":174},126944,"v0.4.0","## 变更内容\n- [修复] 修复由运算符 `any` 触发的错误 (#369)，作者：**Bolin Sun**，提交哈希：6a4c2e54\n- [修复] 为 mobilebert-uncased 模型添加了 `torch.t` 操作 (#353)，作者：**zhumakhan**，提交哈希：95d95a4c\n- [CI] 在测试和发布测试执行中使用相同的镜像 (#463)，作者：**c-fteixeira**，提交哈希：49fd3325\n- [BUG] 修复图中不允许操作的 bug (#464)，作者：**Vadim Gimpelson**，提交哈希：d84f2c5b\n- [CI] 将发布工作流迁移到内部 ARC 运行器 (#461)，作者：**c-fteixeira**，提交哈希：b5d6aafd\n- [CI] 更新 CI 容器 (#460)，作者：**Vadim Gimpelson**，提交哈希：b9735910\n- [Bug] 将 test_arithmetic.py 重命名为 test_arithmetic2.py (#459)，作者：**Vadim 
Gimpelson**，提交哈希：6aa6cf82\n- 更新 requirements-dev.txt，使用 PyTorch 版本 >= 2.3.0 (#458)，作者：**Vadim Gimpelson**，提交哈希：6b322953\n- [CI] 重复 start_instance (#361)，作者：**vadiklyutiy**，提交哈希：cf5caddf\n- [运算符] 添加对 `leaky_relu` 的支持 (#360)，作者：**Bolin Sun**，提交哈希：7401cccb\n- [修复] 修复在编译 `torch.nn.Upsample` 模块时因 `align_corners=True` 而触发的错误 (#344)，作者：**Bolin Sun**，提交哈希：2c34cfc0\n- [性能] 为 `add_hints_pass` 中的循环引入远程解决方法 (#356)，作者：**vadiklyutiy**，提交哈希：3195be5b\n- [运算符] 注册 PyTorch 函数等价物受 Hidet 支持的张量方法 (#347)，作者：**Bolin Sun**，提交哈希：44ab5ad3\n- [性能] 引入 add_hint_pass (#355)，作者：**vadiklyutiy**，提交哈希：c014dab1\n- [CI] 将 NVIDIA Docker 容器升级至 24.4 版本 (#354)，作者：**vadiklyutiy**，提交哈希：cb809b99\n- [修复] 将注意力掩码的类型从 fp32 转换为 f16 (#323)，作者：**zhumakhan**，提交哈希：9a10dc01\n- [修复] 为 conv-bert-base 模型添加缺失的 `torch.multiply` 和 `torch.nn.functional.unfold` 操作 (#351)，作者：**zhumakhan**，提交哈希：18842eeb\n- [修复] 修复 `register_methods` 中的 bug (#331)，作者：**Bolin Sun**，提交哈希：c87c5153\n- [修复] 处理 `setitem` 中关于 dtype 和设备的特殊情况 (#332)，作者：**Bolin Sun**，提交哈希：ff9445e2\n- [BUG] 修复 `bench_op.py` 中的 search_space bug (#348)，作者：**vadiklyutiy**，提交哈希：29e4c0e8\n- [OPS] 禁止在 fxgraph 中使用不支持的函数 (#317)，作者：**vadiklyutiy**，提交哈希：984cf75e\n- [选项] 移除 dynamo_config['search_space'] (#342)，作者：**vadiklyutiy**，提交哈希：0814bd8e\n- [运算符] 添加对 `torch.Tensor.view_as` 的支持 (#334)，作者：**Bolin Sun**，提交哈希：5f19dd05\n- [运算符] 添加对 `torch.nn.TransformerEncoder` 的支持 (#327)，作者：**Bolin Sun**，提交哈希：d625146e\n- [选项] 从 `torch.compile()` 继承 `options` (#260)，作者：**vadiklyutiy**，提交哈希：3638a0b5\n- [运算符] 为 `Tensor` 类添加 `__ge__` 方法 (#330)，作者：**Bolin Sun**，提交哈希：ed5fefff\n- [修复] 修复由 `ClampOp` 触发的错误 (#329)，作者：**Bolin Sun**，提交哈希：05984cb8\n- [修复] 处理因 `getitem` 中设备差异导致的 hidet 错误 (#322)，作者：**Bolin Sun**，提交哈希：5a908205\n- [修复] 修复由 `register_functions.py` 中的 `tensor_reshape` 函数触发的 RuntimeError (#328)，作者：**Bolin Sun**，提交哈希：0cd2f838\n- [运算符] 添加在编译 `DALLE2_pytorch` 时遇到的 PyTorch 运算符 (#319)，作者：**Bolin Sun**，提交哈希：ecb99b1d\n- [修复] 修复由于尝试修改 `immutable_list` 导致的 `tensor_expand` 中的 bug (#320)，作者：**Bolin 
Sun**，提交哈希：bb89e227\n- [杂项] 替换","2024-07-28T09:50:47",{"id":176,"version":177,"summary_zh":178,"released_at":179},126945,"v0.3.1","## 变更内容\n* [版本] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F361 中将版本号提升至 v0.3.1.dev\n* [选项] 由 @serach24 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F362 中新增禁用命令式执行的选项\n* [图][基准测试] 由 @Aalanli 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F363 中更新了基准测试函数\n* [编译服务器] 由 @xinli-git 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F365 中更新了编译服务器的依赖项\n* [工具] 由 @destefy 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F367 中更改了多进程上下文\n* [Dynamo] 由 @destefy 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F369 中重构了 Hidet 远程编译的相关代码\n* [图][Dynamo 后端] 由 @Aalanli 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F371 中实现了左移、右移和取模操作\n* [图][算子] 由 @Aalanli 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F372 中修复了归约操作的 bug，并新增了 uint8x4 类型\n* [编译图] 由 @destefy 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F377 中添加了存储调度表的选项\n* [图][张量] 由 @xiaocenxiaocen 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F374 中移除了不必要的同步操作\n* [图][Dynamo 后端] 由 @Aalanli 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F383 中修复了命令式运行中的一个小 bug\n* [图] 由 @Aalanli 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F384 中修复了 CompiledGraph 的别名问题\n* [前端] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F387 中为 `torch.sqrt` 添加了映射\n* [修复][图] 由 @destefy 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F392 中修改为先将编译图写入临时文件\n* [算子] 由 @BolinSNLHM 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F378 中提升了 x86 CPU 上的 FP32 矩阵乘法性能\n* [Bug 修复] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F391 
中修复了与 C\u002FC++ 整数提升相关的 bug\n* [选项] 由 @destefy 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F393 中新增了将 Var 类的 id 属性默认设置为 0 的选项\n* [CI] 由 @hjjq 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F394 中添加了 CI 工作流及脚本\n* [CI] 由 @hjjq 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F395 中修复了死锁问题\n* [算子] 由 @hjjq 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F366 中对归约操作进行了优化\n* [CI] 由 @hjjq 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F396 中通过工作流启动和停止编译服务器\n* [算子] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F399 中支持池化算子的高级选项\n* [Torch] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F400 中实现了 __torch_func__ 协议\n* [文档] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F401 中增加了更多文档\n* [Bug 修复] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F402 中修复了自动调度器中的性能问题\n* [库] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F404 中添加了 cublas 库\n* [算子] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F405 中新增了 `hidet.ops.matmul_cublas` 算子\n* [融合] 允许进行浅层融合 c","2024-04-03T15:21:06",{"id":181,"version":182,"summary_zh":183,"released_at":184},126946,"v0.3.0","## 注释\n\n在本次发布中，我们进一步增强了对大型语言模型推理、分布式推理和量化技术的支持。同时，我们也使 hidet script 更加稳定，并为其添加了更多文档。此外，还支持了更多的算子和模型。详细信息如下。\n\n### 前端\n* [前端] 由 @Aalanli 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F294 中实现的动态形状 FX 追踪\n* [Torch] 由 @hjjq 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F310 中实现的 PyTorch 权重窃取功能\n* [Dynamo 前端] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F319 中重构的动态形状支持\n* [Torch][图][算子] 由 @hjjq 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F347 中为 torchvision 模型支持添加并修复的各项内容\n* 
[Dynamo] 对注意力机制进行小幅优化，并注册了几项函数，由 @xinli-git 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F345 中完成\n\n### 算子与模型\n* [算子] 由 @Aalanli 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F290 中对卷积运算进一步提升性能\n* [算子] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F296 中重构矩阵乘法实现\n* [模型支持] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F303 中新增对 wav2vec 的支持\n* [算子] 由 @hjjq 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F307 中更新了动态形状下的注意力机制\n* [算子] 由 @hjjq 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F308 中实现了自适应池化归约操作\n* [归约] 由 @xinli-git 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F311 中将归约算子优化并统一到一处\n* [算子] 由 @xinli-git 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F316 中通过向量化加载、动态形状等手段优化了归一化算子\n* [模型] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F322 中为 T5 添加了缺失的算子\n* [Bug 修复] 由 @xinli-git 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F325 中修复了归约操作应在初始化共享内存为零后执行同步线程的问题\n* [模型] 由 @Aalanli 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F324 中实现了 Llama 2 支持\n* [模型] 由 @Aalanli 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F333 中修复了 Llama 2 相关问题\n* [算子] 由 @hjjq 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F337 中实现了复合逐元素运算\n* [算子] 由 @yaoyaoding 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F343 中新增 clamp\u002Fisinf\u002Fany\u002Fall 等算子，并增强了 where 算子的功能\n* [Torch][算子] 由 @hjjq 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F348 中进一步提升了对 torchvision 模型的支持\n* [算子] 由 @hjjq 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F349 中新增了 einsum 算子\n* [算子][图][回归] 由 @hjjq 在 https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F356 中对 CNN 进行了优化\n* 
[Graph] Fix several minor bugs by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F358\n\n### Distributed Inference\n* [Distributed] all_reduce operator and distributed info in graphs by @soodoshll in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F284\n* [Distributed] by @soodoshll in https:\u002F\u002Fgithub.com\u002Fhidet-","2023-09-28T15:53:46",{"id":186,"version":187,"summary_zh":188,"released_at":189},126947,"v0.2.4","## What's Changed\n* [Version] Bump version to v0.2.4.dev by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F188\n* [Dynamo] Module tests + operator support by AndreSlavescu in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F148\n* Refactor the compilation workflow to support CPU without CUDA by LDY1998 in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F189\n* [Stack] Allow the ulimit stack size to be less than expected by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F195\n* [Readme] Add platform requirements by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F196\n* [DataType] Add complex64 and complex128 data types by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F200\n* [Example] Add an example of running the GPT-2 model by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F203\n* [Fusion] Use the inline pass in fusion to allow template call functions with kernel params by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F197\n* [Frontend][Operator] Add missing operators for dinov2 by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F206\n* [Backend] Add OpenMP support by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F208\n* [Operator] Update batch_matmul to use Hidet Script by hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F207\n* [Cache] Add a cache management command-line interface by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F212\n* [IR] Compile-time constant folding for constant expressions by yaoyaoding in 
https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F209\n* [Torch][Operator] Allow changing the device of PyTorch tensors when possible by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F214\n* [Torch][Operator] Add operator mappings for torch.min\u002Fmax\u002Fminimum\u002Fmaximum by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F216\n* [Typo] Fix a typo in resnext.py by eltociear in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F210\n* [Operator] Add missing operators for llama by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F219\n* [IR] Add dynamic shape support at the Task and FlowGraph level by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F220\n* [Torch] Add mappings for `torch.ops.aten.add` and `torch.ops.aten.cos` by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F223\n* [Operator][Backend] Add nvcc flags for faster math and update the Attention schedule by hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F221\n* [CI] Always clear the cache before tests by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F224\n* Fix batch_matmul for invalid mma configs on sm \u003C 80 by xinli-git in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F227\n* [Dynamic Shape] Add more dynamic shape support by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F228\n* [CI] Add `importlib_metadata` to `requirements-dev.txt` by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F233\n* [Script] Add list comprehension support to Hidet Script by yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F235\n* [Refactor][Dynamic Sh","2023-06-21T02:00:36",{"id":191,"version":192,"summary_zh":193,"released_at":194},126948,"v0.2.3","## What's Changed\n* [Version] Bump version to v0.2.3.dev by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F144\n* 
[Workflow] Update the workflow to use stable PyTorch by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F145\n* [Operator] Fall back from matmul to batch_matmul when the device is below SM80 by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F146\n* [Dynamo] Add nonlinear operator support and tests by @AndreSlavescu in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F143\n* Remove the tutorial message by @LDY1998 in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F149\n* [BUG] Resolve a conversion compilation issue by @xinli-git in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F150\n* [Dynamo] Fix Dynamo tests and dump the graph IR by @xinli-git in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F153\n* [CI] Benchmark periodically by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F155\n* [CI] Update the benchmark script by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F156\n* [CI] Add more environment information to the benchmark script by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F157\n* [CI] Remove the benchmark workflow and run it on a dedicated server instead by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F159\n* [CI] Update the benchmark script by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F160\n* [CI] Change the search space in the benchmark script from 0 to 2 by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F161\n* [CI] Update the benchmark script by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F162\n* [CI] Update the benchmark script by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F163\n* [IR][Pass] Refactor the fusion implementation by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F164\n* [Dynamo] Add operator support to run UNet2DConditionModel from diffusers by @xinli-git in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F151\n* [IR][Dynamic Shape] Enhance the Tensor Program IR to support dynamic shapes by @yaoyaoding in 
https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F165\n* [Operator] Allow matmul_f16 to fuse with trailing operations by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F167\n* [CI] Update the benchmark script by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F168\n* [CUDA] Lazy initialization of the CUDA context by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F169\n* [Bugfix] Allow one backend to fail in the benchmark script by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F170\n* [Bugfix] Use the auto-scheduler for fp64 reduction by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F171\n* [Operator] Add the `gather` operator and mappings for `torch.zeros` and `torch.neg` by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F174\n* [CI] Update the benchmark script by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F179\n* [Bugfix] Add the `_stacklevel` parameter to the PyTorch softmax mapping by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F178\n* [IR] Add an unroll directive for loop statements by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F180\n* [Operator] Fl","2023-04-24T20:31:35",{"id":196,"version":197,"summary_zh":198,"released_at":199},126949,"v0.2.2","## What's Changed\n* [Version] Bump version to 0.2.2.dev by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F118\n* [Option] Add the `debug_cache_tuning` option by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F120\n* [Fix] Remove the lambda in the shfl primitives by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F121\n* [IR][Refactor] Refactor the functor\u002Fvisitor\u002Frewriter by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F122\n* [Fix] Fix a bug in the IR printer by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F123\n* 
[Fix] Fix a bug in `IRModule.update_function` by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F124\n* [Frontend] Fix a typo by @digital-nomad-cheng in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F127\n* [Operator] Add support for external kernels in hidet by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F128\n* [Testing] Reorganize the frontend test files by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F129\n* [Dynamo] Add operator support by @AndreSlavescu in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F131\n* [Fix] Allow grid compute to be inlined by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F134\n* [Graph] Type-cast optimizations by @xinli-git in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F135\n* [Fix] Fix a bug that mapped blockDim to blockIdx by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F136\n* [Fix] Fix a bug in the rule-based simplifier by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F137\n* [Workflow] Update the concurrency group of the CI workflow by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F138\n* [Runtime] Add `src_path` and `source()` members to `CompiledFunction` by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F139\n* [Runtime][IR] Support colored source code and add blockDim to extern_vars by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F140\n* [Fix] Convert tensors to CPU before dumping by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F141\n\n## New Contributors\n* @digital-nomad-cheng made their first contribution in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F127\n* @xinli-git made their first contribution in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F135\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fcompare\u002Fv0.2.1...v0.2.2","2023-03-24T00:51:05",{"id":201,"version":202,"summary_zh":203,"released_at":204},126950,"v0.2.1","## What's Changed\r\n* [Version] 
Bump version to 0.2.1.dev by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F73\r\n* [CI] Prevent fork repos from running workflow by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F74\r\n* [Fixbug] Fix a bug in ``trace_from`` when the inputs are directly used as outputs by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F76\r\n* [Operator] Add reduce_f16 and squeeze as Reduce's resolve variants by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F75\r\n* [IR] Input specification assertion message for valid IR check by @AndreSlavescu in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F78\r\n* [Operator] Add conv3d, max_pool3d, avg_pool3d by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F79\r\n* [Dynamo] Add the entry point registration for dynamo by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F80\r\n* [Fix] Update shape utility functions to expect Sequence instead of List by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F86\r\n* [Bugfix] 'double'->'float64' in onnx dtype conversion by @soodoshll in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F88\r\n* [Fix] Mark the reduce fp16 operator not fusible by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F100\r\n* [Fixbug] Use uint64_t instead of unsigned long long for literals by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F101\r\n* [Fixbug] Fix a bug in the minimum and maximum operator by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F102\r\n* [Dynamo] Update dynamo registration after pytorch refactored that part by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F84\r\n* [Fixbug] 
Fix bugs in binary_arithmetic op and swizzle layout by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F104\r\n* [Fixbug] Call fuse in reduce_fp16 operator by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F105\r\n* [ONNX] Fix the out of bound error in onnx slice function during importing by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F106\r\n* [Fixbug] Reverse map of binary operator by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F107\r\n* [Fixbug] Add attributes to Clip operator by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F108\r\n* [Fixbug] Binary arithmetic ops raise error when one is scalar on GPU by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F109\r\n* [Graph] Refactor forward function of FlowGraph by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F110\r\n* [Fixbug] Use int64 as the output of arg-reduce by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F111\r\n* [README] Update readme by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F114\r\n* [Fixbug] Fix a bug when a graph output is constant by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F113\r\n* [Community] Create CODE_OF_CONDUCT.md by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F115\r\n* [Community] Update issue templates by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F116\r\n* [Fixbug] Resolve the min\u002Fmax function according to compute capability by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F112\r\n* [Workflow] Update workflow by @yaoyaoding in 
https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F117\r\n* [Workflow] Update publish workflow by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F119\r\n\r\n## New Contributors\r\n* @soodoshll made their first contribution in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F88\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fcompare\u002Fv0.2.0...v0.2.1","2023-02-18T06:26:58",{"id":206,"version":207,"summary_zh":208,"released_at":209},126951,"v0.2.0","## What's Changed\r\n* [Version] Bump version to 0.2.dev by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F60\r\n* [Frontend] Add `torch.tensor` binding by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F61\r\n* [Version] Add __version__ to root namespace by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F62\r\n* [FFI] Add SharedLibrary class to track the usage of dynamic library by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F63\r\n* [Operator] Fix a bug in resize2d operator definition by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F64\r\n* [CI] Update scripts to build wheel by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F65\r\n* [CI] Remove docs workflow by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F66\r\n* [Docs] Update README.md and require cuda-python>=11.6.1 by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F67\r\n* [Docs] Add instructions for installing nightly version of hidet by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F68\r\n* [Docs] Fixed typo in docs by @AndreSlavescu in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F69\r\n* 
[Operator] Add dilation support for conv2d by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F71\r\n* [Fixbug] Cast back to original data type in the mix precision pass: by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F72\r\n* [CI] Add automatic publish workflow (PyPI) by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F70\r\n\r\n## New Contributors\r\n* @AndreSlavescu made their first contribution in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F69\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fcompare\u002Fv0.1...v0.2.0","2023-01-13T23:59:42",{"id":211,"version":212,"summary_zh":213,"released_at":214},126952,"v0.1","This is the first release of hidet. \r\n\r\nFor the usage of hidet, please visit: [https:\u002F\u002Fdocs.hidet.org](https:\u002F\u002Fdocs.hidet.org)\r\n\r\n## What's Changed\r\n* [Docs] Update documentation by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F2\r\n* [Operator] Add leaky_relu and conv2d_transpose operator by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F3\r\n* [Doc] Add doc on how to define operator computation by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F4\r\n* [Bug] fix bugs in reshape and conv2d_transpose by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F5\r\n* [Option] Add option module  by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F6\r\n* [Docs] Add documentation on how to add new operators by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F7\r\n* [Operator] Add PRelu op by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F8\r\n* [Docs] Add documentation for operator cache & fix a typo by @yaoyaoding in 
https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F9\r\n* [Operator] Add Abs and And operator by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F10\r\n* [CI] Update github workflow by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F11\r\n* [CI] Update docs workflow, not delete remote dest dir by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F12\r\n* [Operator] Add conv2d_transpose_gemm operator & fix a bug by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F13\r\n* [Runtime] force to use gpu tensor buffer in cuda graph by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F14\r\n* [Functor] Fix a bug in IR functor by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F15\r\n* [Graph] Force users to give an input order when multiple symbolic inputs are found in traced graph by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F17\r\n* [Operator] Add BitShift, Bitwise*, Ceil Operators by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F19\r\n* [IR] Refactor scalar type system by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F18\r\n* [IR] Refactoring math functions by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F20\r\n* [Operator] Fix a bug when resolve matmul to batch_matmul by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F21\r\n* [Operator] Add cubic interpolation to Resize Operator by @hjjq in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F22\r\n* [Packfunc] Refactor packed func & add vector type by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F23\r\n* [Pass] Add lower_special_cast pass and refactor 
resolve rule registration by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F24\r\n* [Docs] Change github repo url by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F25\r\n* [Operator] Add float16 precision matrix multiplication by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F26\r\n* [Docs] Add a guide on operator resolving by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F27\r\n* [CI] Avoid interactive query in apt installation of tzdata by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F28\r\n* [Docs] Add sub-graph rewrite tutorial by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F29\r\n* [Tensor] Implement dlpack tensor exchange protocol by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F30\r\n* [Frontend] Add a torch dynamo backend based on hidet \"onnx2hidet\" by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F31\r\n* [Frontend] Add hidet dynamo backend based on torch.fx by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F32\r\n* [Frontend] Make onnx dependency optional by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F33\r\n* [Frontend] Add more operator mappings for pytorch frontend by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F34\r\n* [Operator] Fix a bug in take (index can be in [-r, r-1]) by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F35\r\n* [Frontend] Add an option to print correctness report in hidet backend of torch dynamo by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F36\r\n* [IR] Refactor the attribute 'dtype' of hidet.Tensor from 'str' to 'DataType' by 
@yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F37\r\n* [Operator] Add a constant operator and deprecates manually implemented fill cuda kernel by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F38\r\n* [ONNX] Add reduce l2 onnx operator by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F40\r\n* [CLI] Add the 'hidet' command line interface by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F39\r\n* [Codegen] Add explicit conversion type for float16 by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F41\r\n* [Docs] Add the documentation for 'hidet' backend of PyTorch dynamo by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F42\r\n* [Runtime] Refactor the cuda runtime api used in hidet by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F43\r\n* [Testing] Remove redundant models in hidet.testing by @yaoyaoding in https:\u002F\u002Fgithub.com\u002Fhidet-org\u002Fhidet\u002Fpull\u002F44\r\n* [Runtime][IR] Refactor the de","2023-01-06T02:57:18"]