[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-fastmachinelearning--hls4ml":3,"tool-fastmachinelearning--hls4ml":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",142651,2,"2026-04-06T23:34:12",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":76,"owner_website":78,"owner_url":79,"languages":80,"stars":117,"forks":118,"last_commit_at":119,"license":120,"difficulty_score":121,"env_os":122,"env_gpu":123,"env_ram":122,"env_deps":124,"category_tags":133,"github_topics":134,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":146,"updated_at":147,"faqs":148,"releases":179},4983,"fastmachinelearning\u002Fhls4ml","hls4ml","Machine learning on FPGAs using HLS","hls4ml 是一款专为在 FPGA（现场可编程门阵列）上实现机器学习推理而设计的开源工具。它能够将 TensorFlow、Keras 等主流框架训练好的模型，自动转换为高效的高层次综合（HLS）代码，进而生成可在硬件上运行的固件。\n\n传统深度学习模型在通用处理器上运行往往存在延迟高、功耗大的问题，难以满足某些极端场景的需求。hls4ml 正是为了解决这一痛点而生，它能将推理延迟压缩至微秒甚至纳秒级，同时保持极低的功耗。这使得它在高能物理实验（如欧洲核子研究中心的触发系统）、量子计算控制、核聚变反馈回路以及卫星环境监测等对实时性要求极高的领域大放异彩。\n\n这款工具非常适合嵌入式 AI 开发者、科研人员以及需要部署超低延迟算法的工程师使用。即使没有深厚的硬件描述语言背景，用户也能通过简单的 Python 接口完成从模型导入到硬件比特流生成的全流程。hls4ml 
的独特亮点在于其高度可配置的资源优化策略，允许用户在精度、速度和资源占用之间灵活权衡，真正实现了“软件定义硬件”的便捷体验，让前沿的机器学习算法能在边缘端高效落地。","\u003Cp align=\"center\">\n   \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Ffastmachinelearning.github.io\u002Fraw\u002Fmaster\u002Fimages\u002Fhls4ml_logo.svg\" alt=\"hls4ml\" width=\"400\"\u002F>\n\u003C\u002Fp>\n\n[![DOI](https:\u002F\u002Fzenodo.org\u002Fbadge\u002F108329371.svg)](https:\u002F\u002Fzenodo.org\u002Fbadge\u002Flatestdoi\u002F108329371)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-red.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n[![Documentation Status](https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Factions\u002Fworkflows\u002Fbuild-sphinx.yml\u002Fbadge.svg)](https:\u002F\u002Ffastmachinelearning.org\u002Fhls4ml)\n[![PyPI version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fhls4ml.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fhls4ml)\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffastmachinelearning_hls4ml_readme_674db87017b0.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fhls4ml)\n\u003Ca href=\"https:\u002F\u002Fanaconda.org\u002Fconda-forge\u002Fhls4ml\u002F\">\u003Cimg alt=\"conda-forge\" src=\"https:\u002F\u002Fimg.shields.io\u002Fconda\u002Fdn\u002Fconda-forge\u002Fhls4ml.svg?label=conda-forge\">\u003C\u002Fa>\n\nA package for machine learning inference in FPGAs. We create firmware implementations of machine learning algorithms using high level synthesis language (HLS). We translate traditional open-source machine learning package models into HLS that can be configured for your use-case!\n\nhls4ml is designed for ultra-low-latency inference on FPGAs. While it has strong roots in high-energy physics applications (e.g., L1 trigger systems at the CERN Large Hadron Collider), it has also been adopted across diverse scientific and industrial domains. 
Example use cases include control systems for quantum computing, feedback loops in nuclear fusion, low-power environmental monitoring on satellites, and biomedical signal processing (e.g., arrhythmia classification).\n\n\nIf you have any questions, comments, or ideas regarding hls4ml or just want to show us how you use hls4ml, don't hesitate to reach us through the [discussions](https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fdiscussions) tab.\n\n# Documentation & Tutorial\n\nFor more information visit the webpage: [https:\u002F\u002Ffastmachinelearning.org\u002Fhls4ml\u002F](https:\u002F\u002Ffastmachinelearning.org\u002Fhls4ml\u002F).\n\nFor introductory material on FPGAs, HLS and ML inferences using hls4ml, check out the [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=2y3GNY4tf7A&ab_channel=SystemsGroupatETHZ%C3%BCrich).\n\nDetailed tutorials on how to use `hls4ml`'s various functionalities can be found [here](https:\u002F\u002Fgithub.com\u002Fhls-fpga-machine-learning\u002Fhls4ml-tutorial).\n\n# Installation\n```bash\npip install hls4ml\n```\n\nTo install the extra dependencies for profiling:\n\n```bash\npip install hls4ml[profiling]\n```\n\n# Getting Started\n### Creating an HLS project\n```Python\nimport hls4ml\n\n# Fetch a keras model from our example repository\n# This will download our example model to your working directory and return an example configuration file\nconfig = hls4ml.utils.fetch_example_model('KERAS_3layer.json')\n\n# You can print the configuration to see some default parameters\nprint(config)\n\n# Convert it to a hls project\nhls_model = hls4ml.converters.keras_v2_to_hls(config)\n\n# Print full list of example models if you want to explore more\nhls4ml.utils.fetch_example_list()\n```\n\n### Building a project.\nWe will build the project using Xilinx Vivado HLS, which can be downloaded and installed from 
[here](https:\u002F\u002Fwww.xilinx.com\u002Fproducts\u002Fdesign-tools\u002Fvivado\u002Fintegration\u002Fesl-design.html). Alongside Vivado HLS, hls4ml also supports Vitis HLS, Intel HLS, Catapult HLS and has some experimental support for Intel oneAPI. The target back-end can be changed using the argument backend when building the model.\n\n```Python\n# Use Vivado HLS to synthesize the model\n# This might take several minutes\nhls_model.build()\n\n# Print out the report if you want\nhls4ml.report.read_vivado_report('my-hls-test')\n```\n\n# FAQ\n\nList of frequently asked questions and common HLS synthesis issues can be found [here](https:\u002F\u002Ffastmachinelearning.org\u002Fhls4ml\u002Fintro\u002Ffaq.html)\n\n# Citation\nIf you use this software in a publication, please cite the software\n```bibtex\n@software{fastml_hls4ml,\n  author       = {{FastML Team}},\n  title        = {fastmachinelearning\u002Fhls4ml},\n  year         = 2025,\n  publisher    = {Zenodo},\n  version      = {v1.3.0},\n  doi          = {10.5281\u002Fzenodo.1201549},\n  url          = {https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml}\n}\n```\nthe first publication:\n```bibtex\n@article{Duarte:2018ite,\n    author = \"Duarte, Javier and others\",\n    title = \"{Fast inference of deep neural networks in FPGAs for particle physics}\",\n    eprint = \"1804.06913\",\n    archivePrefix = \"arXiv\",\n    primaryClass = \"physics.ins-det\",\n    reportNumber = \"FERMILAB-PUB-18-089-E\",\n    doi = \"10.1088\u002F1748-0221\u002F13\u002F07\u002FP07027\",\n    journal = \"JINST\",\n    volume = \"13\",\n    number = \"07\",\n    pages = \"P07027\",\n    year = \"2018\"\n}\n```\nand the latest overview paper:\n```bibtex\n@article{Schulte:2025mai,\n    author = \"Schulte, Jan-Frederik and others\",\n    title = \"{hls4ml: A Flexible, Open-Source Platform for Deep Learning Acceleration on Reconfigurable Hardware}\",\n    eprint = \"2512.01463\",\n    archivePrefix = \"arXiv\",\n    
primaryClass = \"cs.AR\",\n    reportNumber = \"FERMILAB-PUB-25-0890-CSAID-ETD-PPD\",\n    month = \"12\",\n    year = \"2025\"\n}\n```\n\nAdditionally, if you use specific features developed in later papers, please cite those as well. For example, CNNs:\n```bibtex\n@article{Aarrestad:2021zos,\n    author = \"Aarrestad, Thea and others\",\n    title = \"{Fast convolutional neural networks on FPGAs with hls4ml}\",\n    eprint = \"2101.05108\",\n    archivePrefix = \"arXiv\",\n    primaryClass = \"cs.LG\",\n    reportNumber = \"FERMILAB-PUB-21-130-SCD\",\n    doi = \"10.1088\u002F2632-2153\u002Fac0ea1\",\n    journal = \"Mach. Learn. Sci. Tech.\",\n    volume = \"2\",\n    number = \"4\",\n    pages = \"045015\",\n    year = \"2021\"\n}\n@article{Ghielmetti:2022ndm,\n    author = \"Ghielmetti, Nicol\\`{o} and others\",\n    title = \"{Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml}\",\n    eprint = \"2205.07690\",\n    archivePrefix = \"arXiv\",\n    primaryClass = \"cs.CV\",\n    reportNumber = \"FERMILAB-PUB-22-435-PPD\",\n    doi = \"10.1088\u002F2632-2153\u002Fac9cb5\",\n    journal =\"Mach. Learn. Sci. Tech.\",\n    year = \"2022\"\n}\n```\nDistributed arithmetic:\n```bibtex\n@misc{Sun:2025,\n      title={da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs},\n      author={Chang Sun and others},\n      year={2025},\n      eprint={2507.04535},\n      archivePrefix={arXiv},\n      primaryClass={cs.AR},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.04535},\n}\n```\nbinary\u002Fternary networks:\n```bibtex\n@article{Loncar:2020hqp,\n    author = \"Ngadiuba, Jennifer and others\",\n    title = \"{Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML}\",\n    eprint = \"2003.06308\",\n    archivePrefix = \"arXiv\",\n    primaryClass = \"cs.LG\",\n    reportNumber = \"FERMILAB-PUB-20-167-PPD-SCD\",\n    doi = \"10.1088\u002F2632-2153\u002Faba042\",\n    journal = \"Mach. Learn. 
Sci. Tech.\",\n    volume = \"2\",\n    pages = \"015001\",\n    year = \"2021\"\n}\n```\n\n# Acknowledgments\nIf you benefited from participating in our community, we ask that you please acknowledge the Fast Machine Learning collaboration, and particular individuals who helped you, in any publications.\nPlease use the following text for this acknowledgment:\n  > We acknowledge the Fast Machine Learning collective as an open community of multi-domain experts and collaborators. This community and \\\u003Cnames of individuals\\>, in particular, were important for the development of this project.\n\n# Funding\nWe gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for \u003Ca href=\"https:\u002F\u002Fa3d3.ai\">Accelerating AI Algorithms for Data Driven Discovery (A3D3)\u003C\u002Fa> under Cooperative Agreement No. \u003Ca href=\"https:\u002F\u002Fwww.nsf.gov\u002Fawardsearch\u002FshowAward?AWD_ID=2117997\">PHY-2117997\u003C\u002Fa>, U.S. Department of Energy (DOE) Office of Science, Office of Advanced Scientific Computing Research under the Real‐time Data Reduction Codesign at the Extreme Edge for Science (XDR) Project (\u003Ca href=\"https:\u002F\u002Fscience.osti.gov\u002F-\u002Fmedia\u002Fgrants\u002Fpdf\u002Ffoas\u002F2021\u002FSC_FOA_0002501.pdf\">DE-FOA-0002501\u003C\u002Fa>), DOE Office of Science, Office of High Energy Physics Early Career Research Program (\u003Ca href=\"https:\u002F\u002Fpamspublic.science.energy.gov\u002FWebPAMSExternal\u002FInterface\u002FCommon\u002FViewPublicAbstract.aspx?rv=df0ae4ab-a46e-481a-9acc-3856b6b041e5&rtc=24&PRoleId=10\">DE-SC0021187\u003C\u002Fa>, DE-0000247070), the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant No. 
\u003Ca href=\"https:\u002F\u002Fdoi.org\u002F10.3030\u002F772369\">772369\u003C\u002Fa>), and the Eric & Wendy Schmidt Fund for Strategic Innovation through the CERN Next Generation Triggers project under grant agreement number SIF-2023-004.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffastmachinelearning_hls4ml_readme_f80e373e8691.png\" alt=\"A3D3\" width=\"130\"\u002F>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffastmachinelearning_hls4ml_readme_383b95622182.png\" alt=\"NSF\" width=\"130\"\u002F>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffastmachinelearning_hls4ml_readme_0e555d197d4a.png\" alt=\"DOE\" width=\"130\"\u002F>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffastmachinelearning_hls4ml_readme_15ec46ad483d.png\" alt=\"ERC\" width=\"130\"\u002F>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffastmachinelearning_hls4ml_readme_09cf91ee2512.png\" alt=\"NGT\" width=\"130\" \u002F>\n\u003C\u002Fp>\n","\u003Cp align=\"center\">\n   \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Ffastmachinelearning.github.io\u002Fraw\u002Fmaster\u002Fimages\u002Fhls4ml_logo.svg\" alt=\"hls4ml\" width=\"400\"\u002F>\n\u003C\u002Fp>\n\n[![DOI](https:\u002F\u002Fzenodo.org\u002Fbadge\u002F108329371.svg)](https:\u002F\u002Fzenodo.org\u002Fbadge\u002Flatestdoi\u002F108329371)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-red.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n[![Documentation Status](https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Factions\u002Fworkflows\u002Fbuild-sphinx.yml\u002Fbadge.svg)](https:\u002F\u002Ffastmachinelearning.org\u002Fhls4ml)\n[![PyPI 
version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fhls4ml.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fhls4ml)\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffastmachinelearning_hls4ml_readme_674db87017b0.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fhls4ml)\n\u003Ca href=\"https:\u002F\u002Fanaconda.org\u002Fconda-forge\u002Fhls4ml\u002F\">\u003Cimg alt=\"conda-forge\" src=\"https:\u002F\u002Fimg.shields.io\u002Fconda\u002Fdn\u002Fconda-forge\u002Fhls4ml.svg?label=conda-forge\">\u003C\u002Fa>\n\n一个用于在FPGA上进行机器学习推理的软件包。我们使用高层次综合语言（HLS）创建机器学习算法的固件实现。我们将传统的开源机器学习模型转换为可针对您的用例进行配置的HLS代码！\n\nhls4ml专为FPGA上的超低延迟推理而设计。尽管它在高能物理应用领域（例如，欧洲核子研究中心大型强子对撞机的L1触发系统）有着深厚的基础，但也已被广泛应用于各种科学和工业领域。其典型应用场景包括量子计算控制系统、核聚变反馈回路、卫星上的低功耗环境监测以及生物医学信号处理（如心律失常分类）等。\n\n如果您对hls4ml有任何疑问、意见或想法，或者只是想与我们分享您如何使用hls4ml，请随时通过[讨论区](https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fdiscussions)与我们联系。\n\n# 文档与教程\n\n更多信息请访问官网：[https:\u002F\u002Ffastmachinelearning.org\u002Fhls4ml\u002F](https:\u002F\u002Ffastmachinelearning.org\u002Fhls4ml\u002F)。\n\n有关FPGA、HLS以及使用hls4ml进行ML推理的入门资料，请观看[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=2y3GNY4tf7A&ab_channel=SystemsGroupatETHZ%C3%BCrich)。\n\n关于如何使用`hls4ml`各项功能的详细教程，请参阅[这里](https:\u002F\u002Fgithub.com\u002Fhls-fpga-machine-learning\u002Fhls4ml-tutorial)。\n\n# 安装\n```bash\npip install hls4ml\n```\n\n若需安装用于性能分析的额外依赖项：\n\n```bash\npip install hls4ml[profiling]\n```\n\n# 快速开始\n### 创建一个HLS项目\n```Python\nimport hls4ml\n\n# 从我们的示例仓库中获取一个Keras模型\n# 这将下载我们的示例模型到您的工作目录，并返回一个示例配置文件\nconfig = hls4ml.utils.fetch_example_model('KERAS_3layer.json')\n\n# 您可以打印配置以查看一些默认参数\nprint(config)\n\n# 将其转换为HLS项目\nhls_model = hls4ml.converters.keras_v2_to_hls(config)\n\n# 如果您想进一步探索，可以打印所有示例模型的列表\nhls4ml.utils.fetch_example_list()\n```\n\n### 构建项目。\n我们将使用Xilinx Vivado HLS来构建该项目，您可以从[这里](https:\u002F\u002Fwww.xilinx.com\u002Fproducts\u002Fdesign-tools\u002Fvivado\u002Fintegration\u002Fesl-design.html)下载并安装。除了Vivado 
HLS之外，hls4ml还支持Vitis HLS、Intel HLS、Catapult HLS，并对Intel oneAPI提供了一些实验性支持。您可以在构建模型时通过backend参数更改目标后端。\n\n```Python\n# 使用Vivado HLS合成模型\n# 这可能需要几分钟时间\nhls_model.build()\n\n# 如果需要，可以打印报告\nhls4ml.report.read_vivado_report('my-hls-test')\n```\n\n# 常见问题解答\n\n常见问题及常见的HLS综合问题列表请参见[这里](https:\u002F\u002Ffastmachinelearning.org\u002Fhls4ml\u002Fintro\u002Ffaq.html)。\n\n# 引用\n如果您在论文或其他出版物中使用本软件，请引用以下内容：\n```bibtex\n@software{fastml_hls4ml,\n  author       = {{FastML Team}},\n  title        = {fastmachinelearning\u002Fhls4ml},\n  year         = 2025,\n  publisher    = {Zenodo},\n  version      = {v1.3.0},\n  doi          = {10.5281\u002Fzenodo.1201549},\n  url          = {https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml}\n}\n```\n首次发表的相关论文：\n```bibtex\n@article{Duarte:2018ite,\n    author = \"Duarte, Javier and others\",\n    title = \"{Fast inference of deep neural networks in FPGAs for particle physics}\",\n    eprint = \"1804.06913\",\n    archivePrefix = \"arXiv\",\n    primaryClass = \"physics.ins-det\",\n    reportNumber = \"FERMILAB-PUB-18-089-E\",\n    doi = \"10.1088\u002F1748-0221\u002F13\u002F07\u002FP07027\",\n    journal = \"JINST\",\n    volume = \"13\",\n    number = \"07\",\n    pages = \"P07027\",\n    year = \"2018\"\n}\n```\n以及最新的综述论文：\n```bibtex\n@article{Schulte:2025mai,\n    author = \"Schulte, Jan-Frederik and others\",\n    title = \"{hls4ml: A Flexible, Open-Source Platform for Deep Learning Acceleration on Reconfigurable Hardware}\",\n    eprint = \"2512.01463\",\n    archivePrefix = \"arXiv\",\n    primaryClass = \"cs.AR\",\n    reportNumber = \"FERMILAB-PUB-25-0890-CSAID-ETD-PPD\",\n    month = \"12\",\n    year = \"2025\"\n}\n```\n\n此外，如果您使用了后期论文中开发的特定功能，也请一并引用。例如，关于卷积神经网络的论文：\n```bibtex\n@article{Aarrestad:2021zos,\n    author = \"Aarrestad, Thea and others\",\n    title = \"{Fast convolutional neural networks on FPGAs with hls4ml}\",\n    eprint = \"2101.05108\",\n    archivePrefix = \"arXiv\",\n    primaryClass = \"cs.LG\",\n    
reportNumber = \"FERMILAB-PUB-21-130-SCD\",\n    doi = \"10.1088\u002F2632-2153\u002Fac0ea1\",\n    journal =\"Mach. Learn. Sci. Tech.\",\n    volume = \"2\",\n    number = \"4\",\n    pages = \"045015\",\n    year = \"2021\"\n}\n@article{Ghielmetti:2022ndm,\n    author = \"Ghielmetti, Nicol\\`{o} and others\",\n    title = \"{Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml}\",\n    eprint = \"2205.07690\",\n    archivePrefix = \"arXiv\",\n    primaryClass = \"cs.CV\",\n    reportNumber = \"FERMILAB-PUB-22-435-PPD\",\n    doi = \"10.1088\u002F2632-2153\u002Fac9cb5\",\n    journal =\"Mach. Learn. Sci. Tech.\",\n    year = \"2022\"\n}\n```\n关于分布式算术的论文：\n```bibtex\n@misc{Sun:2025,\n      title={da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs},\n      author={Chang Sun and others},\n      year={2025},\n      eprint={2507.04535},\n      archivePrefix={arXiv},\n      primaryClass={cs.AR},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.04535},\n}\n```\n关于二值化\u002F三值化网络的论文：\n```bibtex\n@article{Loncar:2020hqp,\n    author = \"Ngadiuba, Jennifer and others\",\n    title = \"{Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML}\",\n    eprint = \"2003.06308\",\n    archivePrefix = \"arXiv\",\n    primaryClass = \"cs.LG\",\n    reportNumber = \"FERMILAB-PUB-20-167-PPD-SCD\",\n    doi = \"10.1088\u002F2632-2153\u002Faba042\",\n    journal = \"Mach. Learn. Sci. 
Tech.\",\n    volume = \"2\",\n    pages = \"015001\",\n    year = \"2021\"\n}\n```\n\n# 致谢\n如果您从参与我们的社区中受益，我们恳请您在任何出版物中注明 Fast Machine Learning 合作团队，以及特别帮助过您的个人。\n请使用以下文字进行致谢：\n  > 我们感谢 Fast Machine Learning 团队作为一个由多领域专家和合作者组成的开放社区。该社区以及\\\u003C个人姓名\\>，尤其是他们，对本项目的开发起到了重要作用。\n\n# 资助\n我们衷心感谢美国国家科学基金会（NSF）“数据革命赋能”（HDR）研究所下“加速面向数据驱动发现的人工智能算法”（A3D3）项目在合作协议编号\u003Ca href=\"https:\u002F\u002Fwww.nsf.gov\u002Fawardsearch\u002FshowAward?AWD_ID=2117997\">PHY-2117997\u003C\u002Fa>下的支持；感谢美国能源部科学办公室高级科学计算研究处，在“面向科学的极端边缘实时数据压缩协同设计”（XDR）项目（\u003Ca href=\"https:\u002F\u002Fscience.osti.gov\u002F-\u002Fmedia\u002Fgrants\u002Fpdf\u002Ffoas\u002F2021\u002FSC_FOA_0002501.pdf\">DE-FOA-0002501\u003C\u002Fa>）中的资助；感谢美国能源部科学办公室高能物理早期职业研究计划（\u003Ca href=\"https:\u002F\u002Fpamspublic.science.energy.gov\u002FWebPAMSExternal\u002FInterface\u002FCommon\u002FViewPublicAbstract.aspx?rv=df0ae4ab-a46e-481a-9acc-3856b6b041e5&rtc=24&PRoleId=10\">DE-SC0021187\u003C\u002Fa>, DE-0000247070）的支持；感谢欧洲研究理事会（ERC）在欧盟“地平线2020”研究与创新计划框架下提供的资助（ grant No. 
\u003Ca href=\"https:\u002F\u002Fdoi.org\u002F10.3030\u002F772369\">772369\u003C\u002Fa>）；以及感谢埃里克与温迪·施密特战略创新基金通过 CERN 下一代触发器项目，在资助协议编号 SIF-2023-004 下提供的支持。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffastmachinelearning_hls4ml_readme_f80e373e8691.png\" alt=\"A3D3\" width=\"130\"\u002F>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffastmachinelearning_hls4ml_readme_383b95622182.png\" alt=\"NSF\" width=\"130\"\u002F>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffastmachinelearning_hls4ml_readme_0e555d197d4a.png\" alt=\"DOE\" width=\"130\"\u002F>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffastmachinelearning_hls4ml_readme_15ec46ad483d.png\" alt=\"ERC\" width=\"130\"\u002F>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffastmachinelearning_hls4ml_readme_09cf91ee2512.png\" alt=\"NGT\" width=\"130\" \u002F>\n\u003C\u002Fp>","# hls4ml 快速上手指南\n\nhls4ml 是一个用于在 FPGA 上进行机器学习推理的开源工具包。它能够将主流的机器学习模型（如 Keras\u002FTensorFlow）转换为高层次综合（HLS）代码，从而实现超低延迟的硬件加速推理。\n\n## 1. 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：推荐 Linux (Ubuntu\u002FCentOS) 或 macOS。Windows 用户建议使用 WSL2。\n*   **Python 版本**：Python 3.8 或更高版本。\n*   **HLS 编译器（必需）**：\n    *   **Xilinx Vivado HLS** 或 **Vitis HLS**（最常用，需单独下载安装）。\n    *   也支持 Intel HLS Compiler、Catapult HLS 等后端。\n    *   *注意：hls4ml 本身不包含 HLS 编译器，请务必先从厂商官网安装相应工具并配置好环境变量。*\n*   **基础依赖**：`pip`, `git`, `cmake` (部分功能需要)。\n\n## 2. 安装步骤\n\n### 标准安装\n使用 pip 直接安装最新稳定版：\n\n```bash\npip install hls4ml\n```\n\n### 安装额外功能（可选）\n如果您需要使用性能分析（profiling）功能，请安装额外依赖：\n\n```bash\npip install hls4ml[profiling]\n```\n\n> **提示**：国内用户若遇到下载速度慢的问题，可使用清华源或阿里源加速安装：\n> ```bash\n> pip install hls4ml -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 3. 
基本使用\n\n以下是最简单的快速入门流程，演示如何获取一个示例模型并将其转换为 HLS 项目。\n\n### 第一步：转换模型\n运行以下 Python 代码，它将自动下载一个示例 Keras 模型配置文件，并将其转换为 hls4ml 项目结构。\n\n```Python\nimport hls4ml\n\n# 获取示例模型配置 (会自动下载到当前目录)\nconfig = hls4ml.utils.fetch_example_model('KERAS_3layer.json')\n\n# 查看默认配置参数\nprint(config)\n\n# 将 Keras 模型转换为 HLS 项目\nhls_model = hls4ml.converters.keras_v2_to_hls(config)\n\n# (可选) 查看所有可用的示例模型列表\n# hls4ml.utils.fetch_example_list()\n```\n\n### 第二步：构建与综合\n调用 `.build()` 方法将启动后端的 HLS 编译器（默认为 Vivado HLS）进行代码综合。此过程可能需要几分钟。\n\n```Python\n# 使用已安装的 HLS 工具链综合模型\n# 确保已在系统环境变量中配置好 vivado_hls 或 vitis_hls\nhls_model.build()\n\n# 读取并打印综合报告 (以 Vivado 为例)\nhls4ml.report.read_vivado_report('my-hls-test')\n```\n\n构建完成后，您可以在生成的目录中找到可综合的 C++ 代码、测试台以及资源利用率报告。您可以进一步修改配置（如量化精度、流水线策略等）以优化 FPGA 资源消耗和延迟。","某高能物理实验室团队需要在大型强子对撞机（LHC）的 L1 触发系统中，实时处理海量传感器数据以在微秒级内识别罕见粒子碰撞事件。\n\n### 没有 hls4ml 时\n- **开发门槛极高**：算法工程师精通 Python 和 TensorFlow，但完全不懂 FPGA 底层的 Verilog\u002FVHDL 硬件描述语言，无法将训练好的模型部署到硬件上。\n- **迭代周期漫长**：手动将神经网络层翻译为硬件逻辑需要数周时间，一旦模型结构微调，整个硬件代码需推倒重来，严重拖慢实验进度。\n- **延迟难以达标**：传统软件推理或通用硬件方案无法满足纳秒至微秒级的超低延迟要求，导致大量有效物理事件因处理超时而丢失。\n- **资源优化困难**：缺乏自动化工具来平衡模型精度与 FPGA 有限的逻辑资源，往往造成硬件利用率低下或模型被迫过度简化。\n\n### 使用 hls4ml 后\n- **无缝转换模型**：团队直接导入现有的 Keras 模型，hls4ml 自动将其转换为高层次综合（HLS）代码，无需编写任何底层硬件代码。\n- **敏捷迭代验证**：修改模型结构后，仅需重新运行转换脚本并在几分钟内完成综合，将原本数周的开发周期缩短至小时级。\n- **极致低延迟推理**：生成的固件专为 FPGA 并行计算优化，实现了纳秒级的推理延迟，确保在极短的时间窗口内精准捕获关键碰撞信号。\n- **智能资源调配**：通过配置量化参数，hls4ml 自动优化模型以适应特定 FPGA 芯片的资源限制，在保证精度的同时最大化硬件效率。\n\nhls4ml 打破了算法与硬件间的壁垒，让科学家能以软件开发的效率，在 FPGA 上实现工业级的超低延迟智能推理。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffastmachinelearning_hls4ml_b2803d57.png","fastmachinelearning","Fast Machine Learning Lab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ffastmachinelearning_c5174850.png","Real-time and accelerated ML for fundamental 
sciences",null,"fml@fastmachinelearning.org","http:\u002F\u002Ffastmachinelearning.org\u002F","https:\u002F\u002Fgithub.com\u002Ffastmachinelearning",[81,85,89,93,97,101,105,109,113],{"name":82,"color":83,"percentage":84},"Python","#3572A5",49.5,{"name":86,"color":87,"percentage":88},"C++","#f34b7d",46.8,{"name":90,"color":91,"percentage":92},"Tcl","#e4cc98",1.4,{"name":94,"color":95,"percentage":96},"SystemVerilog","#DAE1C2",0.8,{"name":98,"color":99,"percentage":100},"Shell","#89e051",0.7,{"name":102,"color":103,"percentage":104},"Verilog","#b2b7f8",0.4,{"name":106,"color":107,"percentage":108},"CMake","#DA3434",0.3,{"name":110,"color":111,"percentage":112},"C","#555555",0.1,{"name":114,"color":115,"percentage":116},"Makefile","#427819",0,1931,539,"2026-04-06T16:37:00","Apache-2.0",4,"未说明","不需要 GPU（专为 FPGA 推理设计，依赖 HLS 综合工具而非图形处理器）",{"notes":125,"python":122,"dependencies":126},"该工具主要用于将机器学习模型转换为 FPGA 固件代码。核心运行依赖是外部的高层次综合 (HLS) 工具（如 Xilinx Vivado HLS），需单独下载并配置环境变量。安装可通过 pip 或 conda-forge 进行，支持添加 '[profiling]' 额外依赖以启用性能分析功能。",[127,128,129,130,131,132],"keras (用于模型转换)","Xilinx Vivado HLS (主要后端)","Vitis HLS","Intel HLS Compiler","Catapult HLS","Intel oneAPI (实验性支持)",[14],[135,136,137,138,139,140,141,142,143,144,145],"hls","machine-learning","fpga","python","keras","pytorch","onnx","vivado","vivado-hls","neural-network","intel-hls","2026-03-27T02:49:30.150509","2026-04-07T18:37:16.629052",[149,154,159,164,169,174],{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},22635,"为什么使用 QKeras 模型时 HLS 预测结果很差，而普通 Dense 层正常？","这通常是由于配置策略（Strategy）和 IO 类型（io_type）组合不当导致的。特别是对于 Conv2D 层，`Strategy: Resource` 配合 `io_type: io_parallel` 的组合可能存在问题。建议尝试将 `io_type` 设置为 `io_stream`，这通常能解决问题并获得更好的综合结果。配置示例如下：\n```python\nconfig = hls4ml.utils.config_from_keras_model(model, granularity='name')\nconfig['Model']['Strategy'] = 'Resource'\nconfig['LayerName']['softmax']['Strategy'] = 'Stable'\nhls_model = hls4ml.converters.convert_from_keras_model(\n    model,\n    
hls_config=config,\n    output_dir='..\u002Foutput\u002Fmodel_std\u002Fhls4ml_prj',\n    io_type='io_stream'\n)\n```\n如果仍有警告但能运行，说明配置已生效。","https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fissues\u002F437",{"id":155,"question_zh":156,"answer_zh":157,"source_url":158},22636,"CNN 模型中滤波器（filters）数量有限制吗？为什么 32 个滤波器的层会导致综合卡住？","如果遇到综合卡在卷积层（如 32 个滤波器）或出现精度大幅下降的问题，特别是在使用了模型剪枝（pruning）后，很可能是因为剪枝破坏了模型层与层之间的原始连接关系，导致图不连通（Graph disconnected）。建议尝试使用未剪枝的模型进行验证。如果未剪枝模型工作正常，则问题出在剪枝后的模型结构上，需要检查剪枝后的张量连接是否正确。","https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fissues\u002F572",{"id":160,"question_zh":161,"answer_zh":162,"source_url":163},22637,"如何在网络压缩（移除低权重）后，在 HLS 翻译和 RTL 实现中跳过这些零权重以节省资源？","在当前的 HLS4ML 版本中，直接将权重设置为零通常不会自动触发硬件资源的跳过优化（即仍然会执行乘法操作）。虽然社区曾探讨过通过逻辑实现乘法跳过，但目前最直接的方法是确认当前版本是否支持该特性。根据讨论，如果仅仅将权重设为零，估计的资源使用情况可能不会有显著变化，因为底层实现可能仍将其视为常规乘法。用户需关注后续版本更新或手动优化生成的 RTL 代码来实现真正的稀疏性加速。","https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fissues\u002F17",{"id":165,"question_zh":166,"answer_zh":167,"source_url":168},22638,"如何评估 FPGA 上神经网络的性能指标（如 GOPS）？","评估性能时，GOPS（每秒十亿次操作）的定义通常基于 Intel 白皮书，即一次浮点乘法或加法算作一次操作。但需注意，峰值浮点计算能力高度依赖于所选的 FPGA 系列和具体实现。如果神经网络未充分利用 FPGA 的 DSP 或逻辑单元（LC），测得的 GOPS 可能低于其他文献值。另一个重要的评价指标是能效比（GOPS\u002FWatt），这通常是 FPGA 相对于 GPU 的主要优势所在。在对比性能时，应明确定义操作单位并考虑硬件利用率。","https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fissues\u002F31",{"id":170,"question_zh":171,"answer_zh":172,"source_url":173},22639,"hls4ml 是否支持 RNN 或 LSTM 层？","是的，hls4ml 已经支持循环神经网络（RNN）层。早期的讨论主要集中在如何实现状态内存（state memory）和节点函数。目前该功能已通过相关 Pull Request（如 #329）得到解决和集成。用户可以像使用卷积层或全连接层一样，在模型中包含 RNN\u002FLSTM 层并进行转换。如果是自定义实现，需注意状态变量的暴露方式，确保其能在调用函数间正确传递以保持序列记忆。","https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fissues\u002F13",{"id":175,"question_zh":176,"answer_zh":177,"source_url":178},22640,"如何生成 hls4ml 的自动 API 参考文档？","hls4ml 项目已支持使用 Sphinx 从 Python 文档字符串（Docstrings）自动生成 API 参考文档。该功能已通过 Pull Request #268 
实现并合并。用户现在可以通过构建文档流程来获取最新的 API 参考，无需手动编写。具体的构建步骤通常涉及安装 Sphinx 及相关依赖，然后在项目根目录运行文档构建命令（如 `make html`），生成的文档将包含所有公开 API 的详细说明。","https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fissues\u002F256",[180,185,190,195,200,205,210,215,220,225,230,235,240,245,250,255,260,265,269,274],{"id":181,"version":182,"summary_zh":183,"released_at":184},136347,"v1.3.0","与 v1.2.0 相比的主要变化包括：\n\n* 由 @dimdano 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1390 中实现的 AI Engine (AIE) 后端支持，作为外部插件。\n* ☢️ 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1240 中实现的针对抗辐射 PolarFire 系列 FPGA 的 Libero 后端。\n* 由 @laurilaatu 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1424 中实现的用于 oneAPI 的 Einsum 和 EinsumDense 操作。\n* 由 @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1426 中实现的 Vivado\u002FVitis 支持样本广播合并操作。\n* 由 @marco66colombo 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1257 中实现的基于 pytest 的综合测试。\n\n完整的变更列表如下：\n\n* 由 @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1395 中将版本号提升至 1.2.0，并切换到 ruff 进行代码格式化。\n* 由 @dimdano 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1390 中实现的 AI Engine (AIE) 后端支持，作为外部插件。\n* [pre-commit.ci] 由 @pre-commit-ci[bot] 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1397 中进行的 pre-commit 自动更新。\n* [pre-commit.ci] 由 @pre-commit-ci[bot] 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1399 中进行的 pre-commit 自动更新。\n* 由 @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1403 中更新 README.md 中的资助部分。\n* 由 @jmduarte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1404 中将 NGT 标志添加到 README。\n* [pre-commit.ci] 由 @pre-commit-ci[bot] 在 
https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1402 中进行的 pre-commit 自动更新。\n* 由 @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1405 中手动将 checkout 版本递增至 v6。\n* [pre-commit.ci] 由 @pre-commit-ci[bot] 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1408 中进行的 pre-commit 自动更新。\n* 由 @dimdano 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1409 中实现的后端预测钩子。\n* [pre-commit.ci] 由 @pre-commit-ci[bot] 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1411 中进行的 pre-commit 自动更新。\n* [pre-commit.ci] 由 @pre-commit-ci[bot] 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1414 中进行的 pre-commit 自动更新。\n* 由 @marco66colombo 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1257 中实现的基于 pytest 的综合测试。\n* [pre-commit.ci] 由 @pre-commit-ci[bot] 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1416 中进行的 pre-commit 自动更新。\n* 由 @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1417 中移除 pytest 9 兼容性所需的参数化 fixture。\n* 由 @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1413 中提升 upload-artifact 版本。\n* [pre-commit.ci] 由 @pre-commit-ci[bot] 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1420 中进行的 pre-commit 自动更新。\n* 由 @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1419 中允许部分配置定义。\n* 由 @marco66colombo 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1412 中在测试中添加 Keras3 环境。\n* ☢️ 由 @vloncar 在 https:\u002F\u002Fgi","2026-03-20T14:59:12",{"id":186,"version":187,"summary_zh":188,"released_at":189},136348,"v1.2.0","与 v1.1.0 相比的主要变化如下：\n\n* @calad0i 在 
https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1116 中为 Keras v3 添加了前端。\n* @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1191 中实现了 Dense、Conv1\u002F2D 和 EinsumDense 的分布式算术策略，以及 HGQ2 支持。\n* @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1247 中提供了 PyTorch 扩展 API。\n* @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1158 中构建了用于保存和加载 hls4ml 模型的新基础设施。\n* @dimdano 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1174 中支持将模型图拆分为多图，以减少 HLS 综合时间。\n* @enlupi 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1310 中为 Keras 前端和 Vitis 后端添加了双向 RNN 层支持。\n\n\n完整的变更列表如下：\n* @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1234 中更新了 v1.1.0 的文档。\n* @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1235 中更新了 README 中的发布版本。\n* @jmduarte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1236 中修复了 `README` 和 `CITATION.cff` 文件中的年份和版本信息。\n* @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1247 中提供了简单的 PyTorch 扩展 API。\n* @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1177 中增加了对 TimeDistributed 层的支持。\n* @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1250 中移除了对 `remove_node(..., rewire)` 的调用，因为该参数已被弃用。\n* [pre-commit.ci] 自动更新 pre-commit 配置，由 @pre-commit-ci[bot] 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1251 中完成。\n* @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1255 中为 PyTorch 扩展 API 测试使用了唯一的层名称。\n* @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1253 中修复了 oneAPI 
转换类型中的逻辑，避免对同一变量进行两次转换。\n* oneAPI 后端更新报告，由 @enlupi 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1222 中提交。\n* @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1263 中修复了 ONNX 处理多输出时的问题。\n* 文档更新：将快速入门示例更新为使用 Vitis 后端，由 @nikiburggraf 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1258 中完成。\n* @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1249 中新增了控制 TensorBoard 输出位置的功能，可选择输出到标准输出、文件或两者兼有。\n* @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1274 中更新了 pyupgrade 的目标版本。\n* @GiuseppeDiGuglielmo 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1282 中增加了在绘制模型信息时显示累加器精度的功能。\n* @GiuseppeDiGuglielmo 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1281 中为箱线图添加了一条垂直线，表示 x = 1 = 2^0。\n* @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1285 中修复了当 axis=3 或 -1 时 concat3d 的问题。\n* 用于保存和加载的基础设施","2025-11-03T20:19:32",{"id":191,"version":192,"summary_zh":193,"released_at":194},136349,"v1.1.0","## 变更内容\n与 v1.0.0 相比的主要变更如下：\n\n* @steltze 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1037 中为 Vitis 后端引入了新的 FIFO 深度优化器。\n* @laurilaatu 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1131 和 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1195 中通过添加深度可分离卷积以及 RNN 状态和激活量化器，扩展了 oneAPI 后端。\n* @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1124 中为 vivado\u002Fvitis 引入了新的通用转置实现，并由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1165 中适配到 oneAPI。\n* @steltze 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1079 中为 io_stream 资源策略实现了新的 1D 和 2D 
深度可分离卷积。\n\n完整的变更列表如下：\n\n* 不覆盖已设置的 accum_t，修复逐点运算输出分辨率，由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1146 中完成。\n* 分割 hgq 测试并将 qkeras 测试单独隔离，以使测试运行时间控制在 1 小时以内，由 @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1153 中完成。\n* oneAPI 的深度可分离卷积，由 @laurilaatu 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1131 中实现。\n* [pre-commit.ci] pre-commit 自动更新，由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1159 中完成。\n* 修复 Vivado Accelerator 缺失分区因子变量的问题，由 @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1160 中解决。\n* 修复 PyTorch 中通道最后转换的错误，由 @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1161 中完成。\n* 支持 PyTorch 解析器中的 Constant 节点，由 @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1123 中实现。\n* oneAPI 2025.0 的头文件变更，由 @laurilaatu 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1149 中完成。\n* 更新 Torch 性能分析器，由 @jicampos 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1156 中完成。\n* 为 vivado\u002Fvitis 添加通用转置功能，由 @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1124 中实现。\n* 移除 np.float_（已在 NumPy >= 2.0 中弃用），由 @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1172 中完成。\n* 在 insert_node 中添加无输入检查，由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1170 中完成。\n* 增加对卷积实现的检查，由 @jicampos 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1155 中完成。\n* 延迟转换器导入并迁移到 pyproject.toml，由 @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1094 中完成。\n* 修复 PyTorch 上采样解析问题，由 @vloncar 在 
https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1186 中完成。\n* [pre-commit.ci] pre-commit 自动更新，由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1182 中完成。\n* 修复数据类型不一致导致的量化 RNN 问题，由 @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1171 中完成。\n* 支持 PyTorch 解析器中的多输出，由 @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F1151 中完成。\n* 通用","2025-03-17T16:51:08",{"id":196,"version":197,"summary_zh":198,"released_at":199},136350,"v1.0.0","## 变更内容\n\nhls4ml v1.0.0 “foxglove” 引入了多项重大改进：\n\n* @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F979 中引入的全新 QONNX 前端\n* @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F855 中实现的 hls4ml 自动推断数据类型精度的功能\n* @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F955 中添加的 Intel oneAPI 实验性后端\n* @dgburnette 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F956 中添加的 Siemens Catapult 后端\n* @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F914 中新增的对 HGQ 代理模型的支持\n* @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F768 和 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F809 中提供的硬件感知优化 API\n\n其他改进和修复的完整列表如下：\n* [pre-commit.ci] pre-commit 自动更新，由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F949 中完成\n* [pre-commit.ci] pre-commit 自动更新，由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F953 中完成\n* hls4ml 优化 API [第 1 部分]，由 @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F768 中实现\n* QKeras 对 RNN 层的支持，由 @laurilaatu 在 
https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F856 中添加\n* [pre-commit.ci] pre-commit 自动更新，由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F962 中完成\n* 尝试通过限制 tensorflow-model-optimization 来修复 Sphinx 问题，由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F967 中实施\n* 将 pre-commit\u002Faction 从 3.0.0 升级到 3.0.1，由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F968 中完成\n* 将 fractional（及其他）改为属性，并移动量化器，由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F964 中完成\n* [pre-commit.ci] pre-commit 自动更新，由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F969 中完成\n* [pre-commit.ci] pre-commit 自动更新，由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F971 中完成\n* vitis 后端 tarball 修复，由 @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F972 中完成\n* 移除 nnet_dense_resource.h 的特殊 Vitis 版本，由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F975 中完成\n* 允许 Vitis 综合测试，由 @jmduarte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F927 中实现\n* 修复综合测试的清理工作（源自 927 的遗留问题），由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F989 中完成\n* 通过固定 tensorflow\u003C=2.15 来修复 Sphinx，由 @jmduarte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F992 中完成\n* [pre-commit.ci] pre-commit 自动更新，由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F984 中完成\n* 添加时钟不确定性配置选项，由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F870 中实现\n* 为 Catapult 
后端准备初始变更集","2024-12-09T22:47:29",{"id":201,"version":202,"summary_zh":203,"released_at":204},136351,"v0.8.1","## 变更内容\n* 由 @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F906 中修复 #905 问题\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F921 中自动更新 pre-commit 配置\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F930 中修复 README.md 中的 logo 显示问题\n* 由 @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F909 中修复当浮点位数 ≥ 14 时的写入器精度问题\n* 由 @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F907 中使 repack_stream 优化器继承原始精度\n* 由 @schsu 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F941 中更新 A3D3 资助编号\n* 由 @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F911 中为生成流克隆时添加精度继承功能\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F942 中自动更新 pre-commit 配置\n* 由 @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F908 中修复 Quartus 多输出与流处理的问题\n* 由 @Landay7 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F940 中修复 Keras LSTM 层的性能分析问题\n* 由 @jmduarte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F937 中修复可能乱序的多个输入问题\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F944 中自动更新 pre-commit 配置\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F943 中将 actions\u002Fupload-artifact 版本从 3 升级到 4\n* 由 @calad0i 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F934 中改进了 replace_node 函数\n* 由 @jmitrevs 在 
https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F945 中升级至 0.8.1 版本\n\n## 新贡献者\n* @schsu 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F941 中完成了首次贡献\n* @Landay7 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F940 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fcompare\u002Fv0.8.0...v0.8.1","2023-12-19T21:00:03",{"id":206,"version":207,"summary_zh":208,"released_at":209},136352,"v0.8.0","## 变更内容\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F781 中实现：将流水线风格与策略解耦\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F770 中实现：在 ModelGraph 和层中不使用 reader\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F795 中实现：移除 tf_to_hls\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F796 中实现：pre-commit 自动更新\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F802 中实现：修复 QConv2DBatchnorm 权重的解析问题\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F801 中实现：pre-commit 自动更新\n* 讨论 - 内联卷积会显著降低延迟（高达 x15 - x20）由 @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F800 中提出\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F807 中实现：pre-commit 自动更新\n* 由 @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F806 中实现：修复量化 po2 的位数过度分配问题\n* 由 @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F797 中实现：将 Conv 层中的零传播到乘法配置中\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F815 中实现：修复 Vitis Conv1D\u002F2D 延迟策略\n* 改进使用 
torch.FX 解析 PyTorch 模型的功能 - 清理工作由 @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F799 中完成\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F816 中实现：pre-commit 自动更新\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F794 中实现：支持解析嵌套模型\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F821 中实现：修复 n 维密集层转换为 1x1 卷积时的权重加载问题\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F828 中实现：pre-commit 自动更新\n* 由 @joshlerner 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F827 中实现：修复 GarNetStacked 和 GarNet 内部数组精度的权重加载问题\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F830 中实现：pre-commit 自动更新\n* 由 @drankincms 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F833 中实现：修复 GRU\u002FLSTM 的性能分析问题\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F835 中实现：pre-commit 自动更新\n* 由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F836 中实现：移除已过时且未使用的 Docker 目录\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F842 中实现：pre-commit 自动更新\n* 由 @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F847 中实现：移除 PyTorch 和 Keras 之间已过时且不再使用的参数映射\n* 由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F804 中实现：使二值 CNN 在 Keras 和 hls4ml 之间的结果保持一致\n* 由 @jmitrevs 在 https:\u002F\u002Fgithu 中实现：ExponentPrecisionType 和 XnorPrecisionType 不再继承自 
IntegerPrecisionType","2023-11-16T00:10:25",{"id":211,"version":212,"summary_zh":213,"released_at":214},136353,"v0.8.0rc1","## 变更内容\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F781 中实现：将流水线风格与策略解耦\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F770 中实现：在 ModelGraph 和层中不使用 reader\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F795 中实现：移除 tf_to_hls\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F796 中实现：pre-commit 自动更新\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F802 中实现：修复 QConv2DBatchnorm 权重的解析问题\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F801 中实现：pre-commit 自动更新\n* 讨论 - 内联卷积会显著降低延迟（高达 x15 - x20）由 @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F800 中提出\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F807 中实现：pre-commit 自动更新\n* 由 @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F806 中实现：修复量化 po2 的位数过度分配问题\n* 由 @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F797 中实现：将 Conv 层的零值传播到乘法配置中\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F815 中实现：修复 Vitis Conv1D\u002F2D 延迟策略\n* 改进使用 torch.FX 解析 PyTorch 模型的功能 - 清理工作由 @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F799 中完成\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F816 中实现：pre-commit 自动更新\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F794 中实现：支持解析嵌套模型\n* 由 @vloncar 在 
https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F821 中实现：修复 n 维密集层转换为 1x1 卷积时的权重加载问题\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F828 中实现：pre-commit 自动更新\n* 由 @joshlerner 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F827 中实现：修复 GarNetStacked 和 GarNet 内部数组精度的权重加载问题\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F830 中实现：pre-commit 自动更新\n* 由 @drankincms 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F833 中实现：修复 GRU\u002FLSTM 的性能分析问题\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F835 中实现：pre-commit 自动更新\n* 由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F836 中实现：移除已过时且未使用的 Docker 目录\n* [pre-commit.ci] 由 @pre-commit-ci 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F842 中实现：pre-commit 自动更新\n* 由 @JanFSchulte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F847 中实现：移除 PyTorch 和 Keras 之间已过时且不再使用的参数映射\n* 由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F804 中实现：使二值 CNN 在 Keras 和 hls4ml 之间的匹配一致\n* 由 @jmitrevs 在 https:\u002F\u002Fgithu 中实现：ExponentPrecisionType 和 XnorPrecisionType 不再继承自 IntegerPrecisionType","2023-11-08T00:09:05",{"id":216,"version":217,"summary_zh":218,"released_at":219},136354,"v0.7.1","## 变更内容\n* 由 @jmduarte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F778 中将版本号升级至 v0.7.0\n* 由 @drankincms 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F760 中修复了在 io_parallel 特殊情况下且采用完全并行化时的 2D 卷积层问题\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F780 中修复了当策略为 resource 时的 RNN 层问题\n* 
由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F786 中更新了 Jenkins 测试环境，以避免依赖地狱\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F785 中显式设置了逐点卷积的策略\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F788 中对 0.7.1 的文档进行了小幅修复\n* 由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F791 中发布了 0.7.1 版本\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fcompare\u002Fv0.7.0...v0.7.1","2023-05-13T15:55:52",{"id":221,"version":222,"summary_zh":223,"released_at":224},136355,"v0.7.0","## 变更内容\n* 由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F403 中修复 conv1d 的 io_parallel 资源问题\n* 由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F407 中加速 CI 测试\n* 由 @jmduarte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F399 中修复 GlobalPooling1D 层\n* 由 @jmduarte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F414 中修复批处理的多输入问题\n* 由 @siorpaes 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F424 中修复 ‘qkeras_mnist_dense’ 示例的构建问题 #423\n* 由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F435 中更新至 pyyaml 6.0\n* 由 @nicologhielmetti 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F420 中更新 `axi_stream_driver`\n* 重塑修复：对于 flatten 不重新打包流；移除最终的重塑，由 @jmduarte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F443 中完成\n* 由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F448 中修复使用 `io_type = io_parallel` 和 `Strategy: Resource` 的 Conv2D\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F384 中支持对多维张量应用 Softmax\n* 由 
@thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F447 中禁用部分不支持的层\n* 修复：量化 relu 和无符号性能分析（第二部分），由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F441 中完成\n* 在 config.py 中添加 GarNet 和 GarNetStack，由 @yiiyama 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F344 中完成\n* 由 @jmduarte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F480 中支持 ZeroPadding 层\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F395 中引入新的后端开发框架\n* 由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F499 中注册 `ApplyAlpha` 层模板\n* 由 @nicologhielmetti 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F501 中扩展解析功能\n* 由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F490 中移除乘法中的中间类型转换\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F511 中将 QKeras 添加为包依赖项\n* 由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F510 中从配置中复制流程\n* 由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F508 中更新 VivadoAccelerator 后端\n* 由 @nemerchiedde 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F527 中优化查找表\n* 由 @ChiRuiChen 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F520 中添加 Upsampling2D 测试用例\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F475 中支持 UpSampling1D\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F521 中实现 RNN 支持（第一部分）\n* 由 @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F523 中实现 Quartus 自定义矩阵乘法和量化\n* 由 @bo3z 在 
https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F540 中实现 Quartus 上与 Vivado 等效的 Softmax 实现\n* 确保 po2 q 中的比例缩放使用 2 位","2023-04-26T17:08:12",{"id":226,"version":227,"summary_zh":228,"released_at":229},136356,"v0.7.0rc1","## 变更内容\n* 由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F403 中修复 conv1d 的 io_parallel 资源问题\n* 由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F407 中加速 CI 测试\n* 由 @jmduarte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F399 中修复 GlobalPooling1D 层\n* 由 @jmduarte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F414 中修复批处理的多输入问题\n* 由 @siorpaes 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F424 中修复 ‘qkeras_mnist_dense’ 示例的构建问题 #423\n* 由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F435 中更新至 pyyaml 6.0\n* 由 @nicologhielmetti 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F420 中更新 `axi_stream_driver`\n* 重塑修复：对于 flatten 不重新打包流；移除最终的 reshape，由 @jmduarte 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F443 中完成\n* 由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F448 中修复使用 `io_type = io_parallel` 和 `Strategy: Resource` 的 Conv2D\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F384 中支持对多维张量应用 Softmax\n* 由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F447 中禁用部分不支持的层\n* 修复：量化 relu 和无符号性能分析（第二部分），由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F441 中完成\n* 在 config.py 中添加 GarNet 和 GarNetStack，由 @yiiyama 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F344 中实现\n* 由 @jmduarte 在 
https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F480 中支持 ZeroPadding 层\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F395 中引入新的后端开发框架\n* 由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F499 中注册 `ApplyAlpha` 层模板\n* 由 @nicologhielmetti 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F501 中扩展解析功能\n* 由 @jmitrevs 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F490 中移除乘法中的中间类型转换\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F511 中将 QKeras 添加为包依赖项\n* 由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F510 中从配置中复制流程\n* 由 @thesps 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F508 中更新 VivadoAccelerator 后端\n* 由 @nemerchiedde 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F527 中优化查找表\n* 由 @ChiRuiChen 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F520 中添加 Upsampling2D 测试用例\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F475 中支持 UpSampling1D\n* 由 @vloncar 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F521 中实现 RNN 支持（第一部分）\n* 由 @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F523 中实现 Quartus 自定义矩阵乘法和量化\n* 由 @bo3z 在 https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F540 中实现 Quartus 上与 Vivado 等效的 Softmax 实现\n* 确保 po2 q 中的比例缩放位数为 2 位","2023-04-15T01:44:06",{"id":231,"version":232,"summary_zh":233,"released_at":234},136357,"v0.6.0","## What's Changed\r\n* `VivadoAccelerator` backend: target `pynq-z2` and `zcu102` boards directly from hls4ml by @nicologhielmetti\r\n* Updated `PyTorch` and `ONNX` converters by @Duchstf \r\n* `line_buffer` Conv2D 
implementation for `io_stream`: reduced resource usage and latency by @Keb-L, @violatingcp, @vloncar \r\n* Support `QConv2DBatchnorm` layer from `QKeras` by @nicologhielmetti \r\n* Improved profiling plots - easier to compare original vs `hls4ml` converted models by @maksgraczyk \r\n* Better derivation of data types for `QKeras` models by @jmduarte, @thesps \r\n* Improved CI by @thesps\r\n* More support for models with branches, skip connections, `Merge` and `Concatenate` layers by @jmduarte, @vloncar \r\n* Support for `Dense` layers over multi-dimensional tensors by @vloncar \r\n* Overall improvements by @vloncar, @jmduarte, @thesps, @jmitrevs & others\r\n\r\n## New Contributors\r\n* @siorpaes made their first contribution in https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F424\r\n* @jmitrevs made their first contribution in https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F403\r\n* @anders-wind made their first contribution in https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F302\r\n* @KOVI89alipes made their first contribution in https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F318\r\n* @maksgraczyk made their first contribution in https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F323\r\n* @Keb-L made their first contribution in https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F332\r\n* @ConsVin made their first contribution in https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F307\r\n* @nicologhielmetti made their first contribution in https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fpull\u002F298\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Ffastmachinelearning\u002Fhls4ml\u002Fcompare\u002Fv0.5.0...v0.6.0","2021-11-12T12:25:59",{"id":236,"version":237,"summary_zh":238,"released_at":239},136358,"v0.5.0","What's new:\r\n- Streaming IO layer implementations, especially of Convolutional layers, accessed through the config with `IOType: io_stream`. Scales CNN support to much larger models than previously possible (see [arXiv:2101.05108](https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.05108))\r\n- New [documentation and API reference](https:\u002F\u002Ffastmachinelearning.org\u002Fhls4ml\u002F)\r\n- Further optimizations for QKeras \u002F quantization aware training. A 'shift' operation is now used for `po2` quantizers\r\n- Allow redefinition of weights directory for standalone project compilation\r\n- `profiling` for PyTorch models\r\n\r\nDeprecated:\r\n- `IOType : io_serial` is deprecated, and superseded by new `IOType: io_stream`\r\n\r\nBugfixes:\r\n- Fix to Initiation Interval and different min\u002Fmax latency for `Strategy: Resource`\r\n- Fix warnings in `hls4ml` command line script flow\r\n- Write yml config from Python API - for mixed API \u002F command line flow","2021-03-05T17:15:46",{"id":241,"version":242,"summary_zh":243,"released_at":244},136359,"v0.5.0-beta","Pre-release of hls4ml version `v0.5.0`. \r\n\r\nWhat's new:\r\n- Streaming IO layer implementations, especially of Convolutional layers, accessed through the config with `io_type: io_stream`. Scales CNN support to much larger models than previously possible (see [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.05108))\r\n- New [documentation and API reference](https:\u002F\u002Ffastmachinelearning.org\u002Fhls4ml\u002F)\r\n- Further optimizations for QKeras \u002F quantization aware training. 
A 'shift' operation is now used for `po2` quantizers\r\n- Allow redefinition of weights directory for standalone project compilation","2021-01-18T15:37:23",{"id":246,"version":247,"summary_zh":248,"released_at":249},136360,"v0.4.0","What's new:\r\n\r\n - Support for GarNet layer (see [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2008.03601))\r\n - Input layer precision added to config generator utility\r\n - New 'SkipOptimizers' config option. Now you can run all Optimizers by default (as in v0.3.0) but subtract any specified by 'SkipOptimizers' e.g. `hls_config['SkipOptimizers'] = ['fuse_consecutive_batch_normalization']`\r\n - Print out the latency report from Cosimulation\r\n \r\nBugfixes:\r\n\r\n - Fixes related to tensorflow 2.3: new Functional API, changes to handling of Input layer\r\n - Fix error with config generator utility and activation layers for `granularity='name'`\r\n - Fix issue with reloading of emulation library after configuration change\r\n - Fix to handling of layers with `use_bias=False` and merged Dense and BatchNormalization\r\n\r\n","2020-10-30T16:49:04",{"id":251,"version":252,"summary_zh":253,"released_at":254},136361,"v0.3.0","What's new:\r\n- API expansion: \r\n  - Create configuration dictionary from model object\r\n  - Run 'C Simulation' from Python with `hls_model.predict(X)`\r\n  - Trace model layer output with `hls_model.trace(X)`\r\n  - Write HLS project, run synthesis flow from Python\r\n- QKeras support: convert models trained using layers and quantizers from QKeras\r\n- Example models moved to separate repo, added as a submodule with an API to retrieve them\r\n- New Softmax implementations\r\n- Minor fixes: weights exported at higher precision, concatenate layer shape corrected","2020-07-31T09:06:47",{"id":256,"version":257,"summary_zh":258,"released_at":259},136362,"v0.2.0","What's new:\r\n- `tf_to_hls`: convert tensorflow protobuf (`.pb`) models to HLS projects\r\n- Support for Keras model `.h5` files (extending 
existing support for `.json` architecture + `.h5` weights format)\r\n- Support larger Conv1D \u002F 2D layers\r\n- Support for binary and ternary layers from QKeras\r\n- API enhancements for addition of custom layer and new backends\r\n- Keras and HLS model profiling tool\r\n- `hls4ml report` command to gather HLS build reports\r\n- `hls4ml build -l` command to run logic synthesis\r\n- Fused Batch Normalization and Dense layer optimization pass\r\n","2020-03-31T10:48:40",{"id":261,"version":262,"summary_zh":263,"released_at":264},136363,"v0.1.6","- Support for larger Dense layers (enabled with `Strategy: Resource` in the configuration file)\r\n- Binary\u002FTernary NN refinements\r\n- Built-in optimization framework\r\n- Optional C\u002FRTL validation\r\n","2020-02-10T17:05:20",{"id":266,"version":267,"summary_zh":76,"released_at":268},136364,"v0.1.5","2019-08-02T20:38:07",{"id":270,"version":271,"summary_zh":272,"released_at":273},136365,"v0.1.2","Update license","2018-03-20T16:55:43",{"id":275,"version":276,"summary_zh":277,"released_at":278},136366,"v0.1.1","second beta version: fixed README","2018-03-16T18:07:40"]