[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-gomlx--gomlx":3,"tool-gomlx--gomlx":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",150720,2,"2026-04-11T11:33:10",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":64,"owner_name":72,"owner_avatar_url":73,"owner_bio":74,"owner_company":75,"owner_location":75,"owner_email":76,"owner_twitter":75,"owner_website":77,"owner_url":78,"languages":79,"stars":95,"forks":96,"last_commit_at":97,"license":98,"difficulty_score":10,"env_os":99,"env_gpu":100,"env_ram":101,"env_deps":102,"category_tags":112,"github_topics":113,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":119,"updated_at":120,"faqs":121,"releases":152},6699,"gomlx\u002Fgomlx","gomlx","GoMLX: An Accelerated Machine Learning Framework For Go","GoMLX 是一个专为 Go 语言打造的高性能机器学习与数学计算框架，常被开发者视为\"Go 语言版的 PyTorch 或 TensorFlow\"。它旨在解决 Go 生态中缺乏原生、高效且功能完整的深度学习工具的痛点，让开发者无需依赖 Python 环境即可在 Go 项目中直接完成模型的训练、微调、修改及组合。\n\n这套框架非常适合熟悉 Go 语言的软件工程师、后端开发者以及希望将 AI 能力无缝集成到现有 Go 服务中的研究人员。其设计理念高度契合 Go 哲学，强调代码的可读性与逻辑的透明性，虽然写法上略显严谨，但能帮助用户建立清晰的心智模型，避免隐性陷阱。\n\nGoMLX 的技术亮点在于其灵活的双后端架构：一方面提供纯 Go 实现的后端，确保极高的可移植性，甚至能在浏览器（WASM）或嵌入式设备上运行；另一方面集成了基于 OpenXLA 的高性能引擎，支持即时编译（JIT），可充分利用 NVIDIA GPU、TPU 
等多硬件加速资源。这使得它既能胜任轻量级边缘计算任务，也能通过分布式执行技术处理大规模模型训练，性能表现可与主流 Python 框架媲美。","# **_GoMLX_**, an Accelerated ML and Math Framework\n\n[![GoDev](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fgo.dev-reference-007d9c?logo=go&logoColor=white)](https:\u002F\u002Fpkg.go.dev\u002Fgithub.com\u002Fgomlx\u002Fgomlx?tab=doc)\n[![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fgomlx\u002Fgomlx)](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002FLICENSE)\n[![Go Report Card](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgomlx_gomlx_readme_2b4a70945b89.png)](https:\u002F\u002Fgoreportcard.com\u002Freport\u002Fgithub.com\u002Fgomlx\u002Fgomlx)\n[![Linux\u002Famd64 Tests](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Flinux_amd64_tests.yaml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Flinux_amd64_tests.yaml)\n[![Linux\u002Farm64 Tests](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Flinux_arm64_tests.yaml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Flinux_arm64_tests.yaml)\n[![Darwin\u002Farm64 Tests](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Fdarwin_tests.yaml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Fdarwin_tests.yaml)\n[![Windows\u002Famd64 Tests](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Fwindows_amd64_tests.yaml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Fwindows_amd64_tests.yaml)\n![Coverage](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCoverage-71.4%25-yellow)\n[![Slack](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSlack-GoMLX-purple.svg?logo=slack)](https:\u002F\u002Fapp.slack.com\u002Fclient\u002FT029RQSE6\u002FC08TX33BX6U)\n[![Sponsor 
gomlx](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSponsor-gomlx-white?logo=github&style=flat-square)](https:\u002F\u002Fgithub.com\u002Fsponsors\u002Fgomlx)\n\n## 📖 About **_GoMLX_**\n\u003Cimg align=\"right\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgomlx_gomlx_readme_71a8ac0bad19.png\" alt=\"GoMLX Gopher\" width=\"220px\"\u002F>\n\n**GoMLX** is an easy-to-use set of Machine Learning and generic math libraries and tools. \nIt can be seen as a **PyTorch\u002FJax\u002FTensorFlow for Go**.\n\nIt can be used to train, fine-tune, modify, and combine machine learning models. \nIt provides all the tools to make that work easy: from a complete set of differentiable operators, \nall the way to UI tools to plot metrics while training in a notebook.\n\nIt runs almost everywhere Go runs, using a pure Go backend. \nIt runs even in the browser with WASM ([see demo created with GoMLX](https:\u002F\u002Fjanpfeifer.github.io\u002FhiveGo\u002Fwww\u002Fhive\u002F)). \nIt will likely work on embedded devices as well (see [Tamago](https:\u002F\u002Fgithub.com\u002Fusbarmory\u002Ftamago)).\n\nIt also supports a very optimized backend engine based on [OpenXLA](https:\u002F\u002Fgithub.com\u002Fopenxla\u002Fxla) \nthat uses just-in-time compilation for CPUs, GPUs (Nvidia, and likely AMD ROCm, Intel, Macs) and Google's TPUs.\nIt also supports modern distributed execution (**new, still being actively improved**) for multi-TPU or multi-GPU\nusing XLA Shardy, an evolution of the [GSPMD distribution](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.04663).\n\nIt's the same engine that powers Google's [Jax](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fjax), \n[TensorFlow](https:\u002F\u002Ftensorflow.org\u002F) and [PyTorch\u002FXLA](https:\u002F\u002Fdocs.pytorch.org\u002Fxla\u002Fmaster\u002Flearn\u002Fxla-overview.html),\nand it has the same speed in many cases. 
\nUse this backend to train large models or to train on large datasets.\n\n> [!Tip]\n> * See our 🎓 [**tutorial**](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Ftutorial.html) 🎓\n> * See _Eli Bendersky_'s blog post [\"GoMLX: ML in Go without Python\"](https:\u002F\u002Feli.thegreenplace.net\u002F2024\u002Fgomlx-ml-in-go-without-python\u002F) \n>   (a bit outdated, but still useful)\n> * A [guided example for Kaggle Dogs Vs Cats](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fdogsvscats.html).\n> * A simple [GoMLX slide deck](https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002F1QWp_N9_7_n7gQKePHfmb5AFFBXnN6DTqjpWxNC0Ecpk\u002Fedit?usp=sharing) with small sample code.  \n\n\u003Cdiv>\n\u003Cp>It was developed to be a full-featured ML platform for Go, productionizable and easy to experiment with ML ideas\n(see Long-Term Goals below).\u003C\u002Fp>\n\nIt strives to be **simple to read and reason about**, leading the user to a correct and transparent mental model \nof what is going on (no surprises), in line with Go philosophy,\nat the cost of more typing (more verbose) at times.\n\nIt is also incredibly flexible and easy to extend and try non-conventional ideas: use it to experiment with new\noptimizer ideas, complex regularizers, funky multitasking, etc.\n\nDocumentation is kept up to date (if it is not well-documented, it is as if the code is not there), \nand error messages are useful (always with a stack-trace) and try to make it easy to solve issues.\n\u003C\u002Fdiv>\n\n## 🗺️ Overview\n\n**GoMLX** is a full-featured ML framework, supporting various well-known ML components  \nfrom the bottom to the top of the stack. 
But it is still only a slice of what a major ML library\u002Fframework should provide \n(like TensorFlow, Jax, or PyTorch).\n\n### Examples developed using GoMLX\n\n  * **🚀 NEW 🚀** [KaLM-Gemma3 12B parameters](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-huggingface\u002Ftree\u002Fmain\u002Fexamples\u002Fkalmgemma3): Tencent's top-ranked sentence encoder\n    for RAGs, using [go-huggingface](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-huggingface\u002F) to load the model and tokenizer, and **GoMLX** to execute it.\n  * **🚀 NEW 🚀** [Gemma 3 270M](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples\u002Fgemma3): Demonstrates ONNX-converted\n    text generation (LLM) using the [onnx-community\u002Fgemma-3-270m-it-ONNX](https:\u002F\u002Fhuggingface.co\u002Fonnx-community\u002Fgemma-3-270m-it-ONNX) \n    model with GoMLX. \n    It uses the [`gomlx\u002Fonnx-gomlx`](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fonnx-gomlx) package to convert the model, and [`gomlx\u002Fgo-huggingface`](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-huggingface) to download the model and run the tokenizer.\n  * **🚀 NEW 🚀** [GPT-2](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples\u002Fgpt2): Demonstrates text generation using the\n    new (experimental) transformer and generator packages.\n  * **🚀 NEW 🚀** [BERT-base-NER](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples\u002FBERT-base-NER): A BERT-base model fine-tuned\n    for Named Entity Recognition. 
It's also an ONNX-converted model, from the [dslim\u002Fbert-base-NER model](https:\u002F\u002Fhuggingface.co\u002Fdslim\u002Fbert-base-NER) on HuggingFace.\n  * **🚀 NEW 🚀** [MixedBread Reranker v1](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples\u002Fmxbai-rerank): A cross-encoder reranking \n    example, see [HuggingFace MixedBread Reranker v1 page](https:\u002F\u002Fhuggingface.co\u002Fmixedbread-ai\u002Fmxbai-rerank-base-v1).\n    It uses the [`gomlx\u002Fonnx-gomlx`](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fonnx-gomlx) package to convert the model, and [`gomlx\u002Fgo-huggingface`](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-huggingface) to download the model and run the tokenizer.\n\n  * [Adult\u002FCensus model](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fuci-adult.html);\n  * [How do KANs learn?](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fkan_shapes.html); \n  * [Cifar-10 demo](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fcifar.html); \n  * [MNIST demo (library and command-line only)](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples\u002Fmnist);\n  * [Dogs & Cats classifier demo](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fdogsvscats.html); \n  * [IMDB Movie Review demo](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fimdb.html); \n  * [Diffusion model for Oxford Flowers 102 dataset (generates random flowers)](examples\u002Foxfordflowers102\u002FOxfordFlowers102_Diffusion.ipynb);\n    * [Flow Matching Study Notebook](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fflow_matching.html) based on Meta's [\"Flow Matching Guide and Code\"](https:\u002F\u002Fai.meta.com\u002Fresearch\u002Fpublications\u002Fflow-matching-guide-and-code\u002F).\n  * [GNN model for OGBN-MAG (experimental)](examples\u002Fogbnmag\u002Fogbn-mag.ipynb).\n  * Last, a trivial [synthetic linear 
model](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002Fexamples\u002Flinear\u002Flinear.go), for those curious to see a barebones simple model.\n  * Neural Style Transfer 10-year Celebration: [see a demo written using GoMLX](https:\u002F\u002Fgithub.com\u002Fjanpfeifer\u002Fstyletransfer\u002Fblob\u002Fmain\u002Fdemo.ipynb) of the [original paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1508.06576).\n  * [Triplet Losses](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002Fml\u002Ftrain\u002Flosses\u002Ftriplet.go): various negative sampling strategies as well as various distance metrics.\n  * [AlphaZero AI for the game of Hive](https:\u002F\u002Fgithub.com\u002Fjanpfeifer\u002FhiveGo\u002F): it uses a trivial GNN to evaluate\n    positions on the board. It includes a [WASM demo (runs GoMLX in the browser!)](https:\u002F\u002Fjanpfeifer.github.io\u002FhiveGo\u002Fwww\u002Fhive\u002F) and a command-line UI to test your skills!\n\n### Backends\n\nGoMLX is a friendly \"intermediary ML API\" that hosts a common API and a library of ML layers and such. By itself, however, it doesn't execute any computation: it relies on different backends to compile and execute the computation on very different hardware.\n\nThere is a common backend interface (currently in `github.com\u002Fgomlx\u002Fgomlx\u002Fbackends`, but it will soon go to its own repo), and 3 different implementations:\n\n   1. **`xla`**: [OpenXLA](https:\u002F\u002Fgithub.com\u002Fopenxla\u002Fxla) backend for CPUs, GPUs, and TPUs. State-of-the-art as these things go, but only static-shape.\n      Available for linux\u002Famd64, linux\u002Farm64 (CPU) and darwin\u002Farm64 (CPU) for now. It uses [go-xla](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-xla), the Go version of the APIs.\n   2. 
**`go`**: a pure Go backend (no C\u002FC++ dependencies): slower but very portable (compiles to WASM\u002FWindows\u002Fetc.): \n      * SIMD support is underway (see [SIMD for Go](https:\u002F\u002Fgithub.com\u002Fgolang\u002Fgo\u002Fissues\u002F73787) and the under-development [go-highway](https:\u002F\u002Fgithub.com\u002Fajroetker\u002Fgo-highway)); \n      * **🚀 NEW 🚀**: added support for some **fused operations** and for some types of quantization, greatly improving performance\n        in some cases.\n      * See also [GoMLX compiled to WASM to power the AI for a game of Hive](https:\u002F\u002Fjanpfeifer.github.io\u002FhiveGo\u002Fwww\u002Fhive\u002F)\n      * Dynamic shape support planned (maybe mid-2026).\n   3. **🚀 NEW 🚀** **[go-darwinml](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-darwinml)**: Go bindings to Apple's CoreML supporting Metal acceleration, MLX, and any Darwin-related backend. \n\n### Highlights\n\n* Converting ONNX models to GoMLX with [onnx-gomlx](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fonnx-gomlx): both as an alternative to `onnxruntime` (leveraging XLA),\n  and as a way to further fine-tune models. 
See also [go-huggingface](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-huggingface) to easily download ONNX model files from HuggingFace.\n* [Docker \"gomlx_jupyterlab\"](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fjanpfeifer\u002Fgomlx_jupyterlab) with integrated JupyterLab and [GoNB](https:\u002F\u002Fgithub.com\u002Fjanpfeifer\u002Fgonb) (a Go kernel for Jupyter notebooks)\n* Autodiff: automatic differentiation—only gradients for now, no jacobian.\n* Context: automatic variable management for ML models.\n* ML layers library with some of the most popular machine learning \"layers\": FFN layers,  \n  various activation functions, layer and batch normalization, convolutions, pooling, dropout, Multi-Head-Attention\n  (for transformer layers), LSTM, KAN (B-Splines, [GR-KAN\u002FKAT networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.10594), Discrete-KAN, PiecewiseLinear KAN),\n  PiecewiseLinear (for calibration and normalization), various regularizations,\n  FFT (reverse\u002Fdifferentiable), learnable rational functions (both for activations and [GR-KAN\u002FKAT networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.10594)),\n  VNN (Vector Neural Networks) for SO(3)-Equivariant\u002FInvariant layers, etc.\n* Training library, with some pretty-printing. 
Including plots for Jupyter notebook, using [GoNB, a Go Kernel](https:\u002F\u002Fgithub.com\u002Fjanpfeifer\u002Fgonb).\n  * Also, various debugging tools: collecting values for particular nodes for plotting, simply logging  the value\n    of nodes during training, stack-trace of the code where nodes are created.\n* `gomlx_checkpoints`, the command line tool to inspect checkpoints of train(-ing) models, **generate plots**\n  with loss and arbitrary evaluation metrics using Plotly.\n  See [example of training session](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fgomlx_checkpoints_plot_example.html),\n  with the effects of a learning rate change during the training.\n  It also allows plotting different models together, to compare their evolution.\n* SGD and Adam (AdamW and Adamax) optimizers.\n* Various losses and metrics.\n* Pre-Trained models to use: InceptionV3 (image model), many more from HuggingFace using [onnx-gomlx](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fonnx-gomlx).\n  See also [go-huggingface](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-huggingface) to easily download ONNX model files from HuggingFace. \n* Read Numpy arrays into GoMLX tensors -- see package `github.com\u002Fgomlx\u002Fgomlx\u002Fpkg\u002Fcore\u002Ftensors\u002Fnumpy`.\n* (**Experimental**) Support static linking of PJRT: slower to build the Go program, but deploying it doesn't require installing a PJRT plugin in the machine you are deploying it. 
It requires you to compile your own static PJRT plugin from XLA sources.\n  Use `go build --tags=pjrt_cpu_static` or include `import _ \"github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fxla\u002Fcpu\u002Fstatic\"`.\n* **Auto-installation of XLA PJRT plugins** (for CPU, GPU and TPUs; Linux and Macs)\n  in the user's local lib directory (`$HOME\u002F.local\u002Flib` in Linux and `$HOME\u002FLibrary\u002FApplication Support\u002FXLA` in Mac).\n  It can be disabled by setting `GOMLX_NO_AUTO_INSTALL` or programmatically by \n  calling `xla.EnableAutoInstall(false)`.\n* **Distributed Execution** (across multiple GPUs or TPUs) with only a few hints from the user.\n  One only needs to configure a distributed dataset, and the trainer picks up from there.\n  See code change in [UCI-Adult demo](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002Fexamples\u002Fadult\u002Fdemo\u002Fmain.go#L222). **Experimental**: \n  please report any issues and help us improve it.\n\n\n## 👥 Support\n\n* Discussion in the [Slack channel #gomlx](https:\u002F\u002Fapp.slack.com\u002Fclient\u002FT029RQSE6\u002FC08TX33BX6U) (you can [join the Slack server here](https:\u002F\u002Finvite.slack.golangbridge.org\u002F)).\n* [Q&A and discussions](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fdiscussions\u002Fcategories\u002Fq-a)\n* [Issues](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fissues)\n* Random brainstorming on projects: just start a Q&A, and I'm happy to meet on Discord somewhere or VC.\n* [Google Groups: groups.google.com\u002Fg\u002Fgomlx-discuss](https:\u002F\u002Fgroups.google.com\u002Fg\u002Fgomlx-discuss)\n\n## \u003Ca id=\"installation\">\u003C\u002Fa>🛠️ + ⚙️ Installation \n\n**For most users, no installation is needed.**\n\n**For XLA**, it will by default auto-install the required XLA PJRT plugins (for CPU, GPU and TPUs; Linux and Macs)\nin the user's local lib directory (`$HOME\u002F.local\u002Flib\u002Fgo-xla` in Linux; 
`$HOME\u002FLibrary\u002FApplication Support\u002Fgo-xla` in Mac;\n`$HOME\\AppData\\Local\\go-xla` in Windows).\nIt can be disabled by setting `GOMLX_NO_AUTO_INSTALL` or programmatically by calling `xla.EnableAutoInstall(false)`.\n\nIf you want to manually pre-install for building production dockers, a specific version, or such custom setups,\nsee [github.com\u002Fgomlx\u002Fgo-xla](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-xla) for details; \nthere is a self-explanatory simple installer program.\n\nIf you want to use only a pure **Go backend**, simply do `import _ \"github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fsimplego\"` and \nthere is no need to install anything.\n\n## 🐳  [Pre-built Docker](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fjanpfeifer\u002Fgomlx_jupyterlab)\n\nThe easiest way to start playing with it is [pulling the docker image](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fjanpfeifer\u002Fgomlx_jupyterlab)\nthat includes **GoMLX** + [JupyterLab](https:\u002F\u002Fjupyterlab.readthedocs.io\u002F) + [GoNB](https:\u002F\u002Fgithub.com\u002Fjanpfeifer\u002Fgonb) (a Go kernel for Jupyter) and \n[Nvidia's CUDA runtime](https:\u002F\u002Fhub.docker.com\u002Flayers\u002Fnvidia\u002Fcuda\u002F11.8.0-cudnn8-runtime-ubuntu22.04\u002Fimages\u002Fsha256-08aed54a213b52e9cb658760b6d985db2f4c5f7e8f11ac45ec66b5c746237823?context=explore)\n(for optional support of GPU) pre-installed -- it is ~5GB to download.\n\nFrom a directory you want to make visible in Jupyter, do:\n> For GPU support add the flag `--gpus all` to the `docker run` command below.\n\n```bash\ndocker pull janpfeifer\u002Fgomlx_jupyterlab:latest\ndocker run -it --rm -p 8888:8888 -v \"${PWD}\":\u002Fhome\u002Fjupyter\u002Fwork janpfeifer\u002Fgomlx_jupyterlab:latest\n```\n\nIt will display a URL starting with `127.0.0.1:8888` in the terminal (it will include a secret token needed) that you can open in your browser.\n\nYou can open and interact with the tutorial from there, it is 
included in the docker under the directory `Projects\u002Fgomlx\u002Fexamples\u002Ftutorial`.\n\nMore details on the [docker here](docker\u002Fjupyterlab\u002FREADME.md).\n\nIt runs on Windows as well: _Docker Desktop_ uses WSL2 under the hood.\n\n## 🧭 Tutorial\n\nSee the [tutorial here](examples\u002Ftutorial\u002Ftutorial.ipynb). It covers a bit of everything. \n\nAfter that, look at the demos in the [examples\u002F](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples) directory.\n\nThe library itself is well-documented (please open issues if something is missing), and the code is\nnot too hard to read. \n_Godoc_ is available on [pkg.go.dev](https:\u002F\u002Fpkg.go.dev\u002Fgithub.com\u002Fgomlx\u002Fgomlx).\n\nFinally, feel free to ask questions: time allowing (when not at work), I'm always happy to help. \nI'm often connected to the [Slack channel #gomlx](https:\u002F\u002Fapp.slack.com\u002Fclient\u002FT029RQSE6\u002FC08TX33BX6U);\nalternatively, use the [groups.google.com\u002Fg\u002Fgomlx-discuss](https:\u002F\u002Fgroups.google.com\u002Fg\u002Fgomlx-discuss) group.\n\n### Inference & Productionization\n\nInference (serving a model) is currently done using the Go code that created the model, along with the checkpoint\nwith the trained weights and hyperparameters used to train the model. 
\nIn other words, it uses the same tools used for training.\n\nIt's straightforward, for instance, to create a Docker image with a pretrained model and serve it from there,\nor to include it in your own application.\n\nFor a simple example of how to do this and export a model inference as a library, see \n[`...\u002Fexamples\u002Fcifar\u002Fclassifier`](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002Fexamples\u002Fcifar\u002Fclassifier\u002Fclassifier.go), \nand its use in the last cells of the [Cifar-10 demo](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fcifar.html).\n\nIn the future we plan to also export models to ONNX or XLA's StableHLO, so that one can use tools that serve those directly,\nwithout linking GoMLX -- it will save a little executable size.\n\n## 🎯 Long-term Goals\n\n1. Building and training models in Go -- as opposed to Python (or some other language) -- with focus on:\n   - Being easy(ier) to read and reason about, leading the user to a correct and transparent mental\n     model of what is going on, even if that means being more verbose when writing.\n   - Clean, separable APIs: individual APIs should be self-contained and decoupled where possible.\n   - Composability: Any component should be replaceable, so they can be customized and experimented with.\n     That means sometimes more coding (there is not one magic train object that does everything),\n     but it makes it clear what is happening, and makes it easy to replace parts with custom versions.\n   - Up-to-date documentation: if the documentation is not there or if it's badly written, it's as \n     if the code was not there either.\n   - Clear and actionable error reporting.\n2. 
To be a productive research and educational platform to experiment with new ML ideas and learn.\n   - Support mirrored training on multiple devices and various forms of distributed training (model and\u002For data\n     parallelism), in particular to support training large language models and similarly large models.\n3. To be a robust and reliable platform for production. Some subgoals:\n   - Support modern accelerator hardware like TPUs and GPUs.\n   - Multiple backends beyond XLA, e.g.: llamacpp, WebNN (with Wasm), pure Go version, etc.\n   - Import pre-trained models from [Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Fmodels) and allow fine-tuning -- ONNX versions already working for many models in [onnx-gomlx](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fonnx-gomlx).  \n   - Compile models to binary as in C-libraries and\u002For WebAssembly, to be linked and consumed (inference) anywhere\n     (any language).\n\n## 🤔 FAQ\n\n- **What are the environment variables used by GoMLX?** \n  - `GOMLX_BACKEND`: defines the backend engine to use (if using `backends.New()`). The value is formatted as \"\u003Cbackend_name>[:\u003Cbackend_config>]\",\n    with the config part being optional. Examples:\n    - `GOMLX_BACKEND=go`: Use the `SimpleGo` backend, the pure Go implementation that is very portable but slow.\n    - `GOMLX_BACKEND=\"xla:cpu\"`: Use XLA (the faster backend, only runs on Linux now) for CPU\n    - `GOMLX_BACKEND=\"xla:cuda\"`: Use XLA for Nvidia CUDA\n    - `GOMLX_BACKEND=\"xla:\u002Fpath\u002Fto\u002Fmy\u002Fpjrt_plugin.so\"`: Use XLA with an arbitrary PJRT. 
PJRT is a plugin system for XLA to support different hardware.\n      One can install PJRTs built for NVIDIA GPUs (there is an installation script for that); there is also one for ROCm (not tested by the author) and one\n      for TPU (Google Cloud), and there are reports of PJRTs being built even for new accelerators (e.g.: [Tenstorrent XLA](https:\u002F\u002Fgithub.com\u002Ftenstorrent\u002Ftt-xla)).\n  - `PJRT_PLUGIN_LIBRARY_PATH`: the underlying XLA backend uses this variable as an extra directory to search for plugin locations.\n    It searches the system's library paths (`$LD_LIBRARY_PATH`, `\u002Fetc\u002Fld.so.conf`), the default `\u002Fusr\u002Flocal\u002Flib\u002Fgomlx\u002Fpjrt` and `$PJRT_PLUGIN_LIBRARY_PATH` if set.\n  - `GOMLX_NO_AUTO_INSTALL`: if set to `1`, GoMLX will not automatically install PJRTs when running on a system without them.\n  - `XLA_FLAGS`: optional controls for the XLA backend. It should be set to a semicolon (\";\") separated list of options. If you set it to `--help`, \n    the backend will print out some help for all options. There is also a description on the page [XLA Flags Guidance](https:\u002F\u002Fopenxla.org\u002Fxla\u002Fflags_guidance).\n- **What backends to include when using GoMLX?**\n  - The recommendation is to use `import _ \"github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fdefault\"`, which will import the `xla` (or the alias `stablehlo`) and\n    `go` (_SimpleGo_) backends. If you add `-tags=noxla` to the compiler it won't include the XLA backend.\n  - `import _ \"github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fsimplego\"` to include only `go` (no C++ dependencies).\n  - `import _ \"github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fxla\"` to import only XLA.\n- **Where are the AI context files (for Gemini\u002FClaude Code\u002FConductor, etc.)?**\n  - To prevent cluttering the root folder of the project, they are all moved to the `.agents\u002F` directory and also included in the `.gitignore`. 
This way\n    one can simply symlink whichever AI configuration file they use to the root directory of their local copy to use them.\n\n## 🤝 Collaborating\n\nThe project is looking forward to contributions from anyone interested. \nMany parts are not yet set in stone, so there is plenty of space for improvements and re-designs for those interested\nand with good experience in Go, Machine Learning, and APIs in general. See the [TODO file](docs\u002FTODO.md)\nfor inspiration.\n\nNo governance guidelines have been established yet.\n\nSee the section Support above to get in touch (Slack channel or Google Groups)!\n\n## 💖 Support the Project\n\nIf you find this project helpful, please consider \n[supporting our work through GitHub Sponsors](https:\u002F\u002Fgithub.com\u002Fsponsors\u002Fgomlx). \n\nYour contribution helps us (currently mostly [me](https:\u002F\u002Fgithub.com\u002Fjanpfeifer)) dedicate more time to maintenance\nand add new features for the entire GoMLX ecosystem.\n\nIt also helps us acquire access (buying or cloud) to hardware for more portability, e.g.: ROCm, Apple Metal (GPU), \nMulti-GPU\u002FTPU, Nvidia DGX Spark, Tenstorrent, etc.\n\n## 🚀 Advanced Topics\n\n* [CHANGELOG](docs\u002FCHANGELOG.md)\n* [TODO](docs\u002FTODO.md)\n* [Error Handling](docs\u002Ferror_handling.md)\n* [Developing](docs\u002Fdeveloping.md)\n\n##  💖 Thanks\n\n* [Go](golang.org)\n* [OpenXLA](https:\u002F\u002Fgithub.com\u002Fopenxla\u002Fxla)\n* [TensorFlow](https:\u002F\u002Fwww.tensorflow.org\u002F)\n* [Jax](https:\u002F\u002Fgithub.com\u002Fjax-ml\u002Fjax)\n* [PyTorch](https:\u002F\u002Fpytorch.org\u002F)\n* [ONNX](https:\u002F\u002Fonnx.ai\u002F)\n* [HuggingFace](https:\u002F\u002Fhuggingface.co\u002F)\n\n## ⚖️ License \n\n> Copyright 2025 Jan Pfeifer\n\n**GoMLX** is distributed under the terms of the [Apache License Version 2.0](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002FLICENSE).\nUnless it is explicitly stated otherwise, any 
contribution intentionally submitted for inclusion in this project shall be licensed under [Apache License Version 2.0](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002FLICENSE)\nwithout any additional terms or conditions.\n","# **_GoMLX_**，一个加速的机器学习和数学框架\n\n[![GoDev](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fgo.dev-reference-007d9c?logo=go&logoColor=white)](https:\u002F\u002Fpkg.go.dev\u002Fgithub.com\u002Fgomlx\u002Fgomlx?tab=doc)\n[![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fgomlx\u002Fgomlx)](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002FLICENSE)\n[![Go Report Card](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgomlx_gomlx_readme_2b4a70945b89.png)](https:\u002F\u002Fgoreportcard.com\u002Freport\u002Fgithub.com\u002Fgomlx\u002Fgomlx)\n[![Linux\u002Famd64 测试](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Flinux_amd64_tests.yaml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Flinux_amd64_tests.yaml)\n[![Linux\u002Farm64 测试](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Flinux_arm64_tests.yaml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Flinux_arm64_tests.yaml)\n[![Darwin\u002Farm64 测试](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Fdarwin_tests.yaml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Fdarwin_tests.yaml)\n[![Windows\u002Famd64 
测试](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Fwindows_amd64_tests.yaml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Factions\u002Fworkflows\u002Fwindows_amd64_tests.yaml)\n![覆盖率](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCoverage-71.4%25-yellow)\n[![Slack](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSlack-GoMLX-purple.svg?logo=slack)](https:\u002F\u002Fapp.slack.com\u002Fclient\u002FT029RQSE6\u002FC08TX33BX6U)\n[![赞助 gomlx](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSponsor-gomlx-white?logo=github&style=flat-square)](https:\u002F\u002Fgithub.com\u002Fsponsors\u002Fgomlx)\n\n## 📖 关于 **_GoMLX_**\n\u003Cimg align=\"right\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgomlx_gomlx_readme_71a8ac0bad19.png\" alt=\"GoMLX Gopher\" width=\"220px\"\u002F>\n\n**GoMLX** 是一套易于使用的机器学习和通用数学库及工具。 \n它可以被视为 **Go 语言版的 PyTorch\u002FJax\u002FTensorFlow**。\n\n它可用于训练、微调、修改和组合机器学习模型。 \n提供了使这些操作变得简单的所有工具：从完整的可微分算子集，\n到在笔记本中进行训练时用于绘制指标的 UI 工具。\n\n它几乎可以在任何运行 Go 的地方运行，使用纯 Go 后端实现。 \n甚至可以在浏览器中通过 WASM 运行（[查看用 GoMLX 创建的演示](https:\u002F\u002Fjanpfeifer.github.io\u002FhiveGo\u002Fwww\u002Fhive\u002F)）。\n很可能它也能在嵌入式设备上运行（参见 [Tamago](https:\u002F\u002Fgithub.com\u002Fusbarmory\u002Ftamago)）。\n\n此外，它还支持基于 [OpenXLA](https:\u002F\u002Fgithub.com\u002Fopenxla\u002Fxla) 的高度优化后端引擎，\n该引擎利用即时编译技术，可在 CPU、GPU（Nvidia，以及可能的 AMD ROCm 和 Intel，还有 Mac 设备）以及 Google 的 TPU 上运行。\n它还支持现代分布式执行功能（**新功能，仍在积极改进中**），可通过 XLA Shardy 实现多 TPU 或多 GPU 分布式训练，\n这是对 [GSPMD 分布式策略](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.04663) 的进一步发展。\n\n该引擎与 Google 的 [Jax](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fjax)、\n[TensorFlow](https:\u002F\u002Ftensorflow.org\u002F) 和 [Pytorch\u002FXLA](https:\u002F\u002Fdocs.pytorch.org\u002Fxla\u002Fmaster\u002Flearn\u002Fxla-overview.html) 所使用的引擎相同，\n在许多情况下性能也相当。 \n使用此后端可以训练大型模型或处理大规模数据集。\n\n> [!Tip]\n> * 请参阅我们的 🎓 [教程](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Ftutorial.html) 
🎓\n> * 参阅 Eli Bendersky 的博客文章“GoMLX：无需 Python 的 Go 语言机器学习”（略显过时，但仍具参考价值）\n> * 一个针对 Kaggle 狗猫分类问题的引导式示例：[狗猫分类](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fdogsvscats.html)。\n> * 一份简单的 [GoMLX 幻灯片介绍](https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002F1QWp_N9_7_n7gQKePHfmb5AFFBXnN6DTqjpWxNC0Ecpk\u002Fedit?usp=sharing)，包含少量示例代码。\n\n\u003Cdiv>\n\u003Cp>它被开发出来是为了成为 Go 语言的一个功能齐全的机器学习平台，既可用于生产环境，又便于用户试验各种机器学习想法——详见下文的长期目标。\u003C\u002Fp>\n\n它力求做到 **简单易读、易于理解**，帮助用户建立正确且透明的心理模型，\n从而避免意外情况的发生——这完全符合 Go 语言的设计哲学。\n当然，在某些情况下可能会需要更多的代码输入（即更冗长）。\u003C\u002Fp>\n\n同时，它也非常灵活，易于扩展，适合尝试非常规的想法：例如，您可以使用它来试验新的优化器方案、复杂的正则化方法、奇特的多任务学习等。\n\u003C\u002Fdiv>\n\n## 🗺️ 概述\n\n**GoMLX** 是一个功能齐全的机器学习框架，支持从底层到高层的各种知名机器学习组件。 \n然而，它目前仍然只是主流机器学习库\u002F框架（如 TensorFlow、Jax 或 PyTorch）所提供功能的一部分。\n\n### 使用 GoMLX 开发的示例\n\n  * **🚀 新增 🚀** [KaLM-Gema3 12B 参数](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-huggingface\u002Ftree\u002Fmain\u002Fexamples\u002Fkalmgemma3)：腾讯排名靠前的用于 RAG 的句子编码器，使用 [go-huggingface](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-huggingface\u002F) 加载模型和分词器，并通过 **GoMLX** 执行。\n  * **🚀 新增 🚀** [Gemma 3 270M](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples\u002Fgemma3)：演示了使用 [onnx-community\u002Fgemma-3-270m-it-ONNX](https:\u002F\u002Fhuggingface.co\u002Fonnx-community\u002Fgemma-3-270m-it-ONNX) 模型并通过 ONNX 转换后的文本生成（LLM），采用 GoMLX 运行。该示例使用 [`gomlx\u002Fonnx-gomlx`](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fonnx-gomlx) 包进行模型转换，并借助 [`gomlx\u002Fgo-huggingface`](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-huggingface) 下载模型及运行分词器。\n  * **🚀 新增 🚀** [GPT-2](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples\u002Fgpt2)：展示了使用新的（实验性）变换器和生成器包进行文本生成。\n  * **🚀 新增 🚀** [BERT-base-NER](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples\u002FBERT-base-NER)：一个针对命名实体识别任务微调的 BERT-base 模型。该模型同样是由 HuggingFace 上的 [dslim\u002Fbert-base-NER 
模型](https:\u002F\u002Fhuggingface.co\u002Fdslim\u002Fbert-base-NER) 转换为 ONNX 格式的。\n  - **🚀 新增 🚀** [MixedBread Reranker v1](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples\u002Fmxbai-rerank)：一个交叉编码器重排序示例，详情请参见 [HuggingFace MixedBread Reranker v1 页面](https:\u002F\u002Fhuggingface.co\u002Fmixedbread-ai\u002Fmxbai-rerank-base-v1)。该示例使用 [`gomlx\u002Fonnx-gomlx`](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fonnx-gomlx) 包进行模型转换，并借助 [`gomlx\u002Fgo-huggingface`](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-huggingface) 下载模型及运行分词器。\n\n  * [Adult\u002FCensus 模型](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fuci-adult.html);\n  * [KAN 如何学习？](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fkan_shapes.html); \n  * [Cifar-10 示例](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fcifar.html); \n  * [MNIST 示例（仅限库和命令行版本)](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples\u002Fmnist)\n  * [狗猫分类器示例](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fdogsvscats.html); \n  * [IMDB 电影评论示例](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fimdb.html); \n  * [牛津花卉 102 数据集的扩散模型（生成随机花卉）](examples\u002Foxfordflowers102\u002FOxfordFlowers102_Diffusion.ipynb);\n    * [流匹配研究笔记本](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fflow_matching.html)，基于 Meta 的 [\"流匹配指南与代码\"](https:\u002F\u002Fai.meta.com\u002Fresearch\u002Fpublications\u002Fflow-matching-guide-and-code\u002F)。\n  * [OGBN-MAG 的 GNN 模型（实验性）](examples\u002Fogbnmag\u002Fogbn-mag.ipynb)。\n  * 最后，还有一个简单的 [合成线性模型](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002Fexamples\u002Flinear\u002Flinear.go)，供好奇者查看一个最基础的简单模型。\n  * 神经风格迁移十周年纪念：[查看使用 GoMLX 编写的演示](https:\u002F\u002Fgithub.com\u002Fjanpfeifer\u002Fstyletransfer\u002Fblob\u002Fmain\u002Fdemo.ipynb)，基于 [原始论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1508.06576)。\n  * 
[三元组损失](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002Fml\u002Ftrain\u002Flosses\u002Ftriplet.go)：包含多种负采样策略以及不同的距离度量方法。\n  * [AlphaZero AI 用于蜂巢棋游戏](https:\u002F\u002Fgithub.com\u002Fjanpfeifer\u002FhiveGo\u002F)：它使用一个简单的 GNN 来评估棋盘上的局面。其中包含一个 [WASM 演示（在浏览器中运行 GoMLX！)](https:\u002F\u002Fjanpfeifer.github.io\u002FhiveGo\u002Fwww\u002Fhive\u002F) 和一个命令行界面，供玩家测试自己的技巧！\n\n### 后端\n\nGoMLX 是一个友好的“中间 ML API”，提供通用的 API 和一系列 ML 层等组件。但它本身并不执行任何计算：它依赖于不同的后端来编译并在各种硬件上执行计算。\n\n目前有一个通用的后端接口（位于 `github.com\u002Fgomlx\u002Fgomlx\u002Fbackends`，但很快将迁移到独立的仓库），并有三种不同的实现：\n\n   1. **`xla`**：适用于 CPU、GPU 和 TPU 的 [OpenXLA](https:\u002F\u002Fgithub.com\u002Fopenxla\u002Fxla) 后端。就现有技术而言属于最先进的水平，但仅支持静态形状。\n      目前支持 linux\u002Famd64、linux\u002Farm64（CPU）以及 darwin\u002Farm64（CPU）。使用 [go-xla](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-xla) 提供的 Go 版本 API。\n   2. **`go`**：纯 Go 后端（无 C\u002FC++ 依赖）：速度较慢但非常便携（可编译为 WASM、Windows 等）：\n      * SIMD 支持正在推进中（参见 [Go 的 SIMD 讨论](https:\u002F\u002Fgithub.com\u002Fgolang\u002Fgo\u002Fissues\u002F73787) 以及正在开发的 [go-highway](https:\u002F\u002Fgithub.com\u002Fajroetker\u002Fgo-highway)）；\n      * **🚀 新增 🚀**：新增对部分 **融合操作** 和某些量化类型的支持，从而在某些情况下显著提升性能。\n      * 另请参阅 [GoMLX 编译为 WASM 为蜂巢棋游戏的 AI 提供支持](https:\u002F\u002Fjanpfeifer.github.io\u002FhiveGo\u002Fwww\u002Fhive\u002F)。\n      * 计划支持动态形状（可能在 2026 年年中）。\n   3. 
**🚀 新增 🚀** **[go-darwinml](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-darwinml)**：Apple CoreML 的 Go 绑定，支持 Metal 加速、MLX 以及所有 DarwinOS 相关的后端。\n\n### 亮点\n\n* 使用 [onnx-gomlx](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fonnx-gomlx) 将 ONNX 模型转换为 GoMLX：既可以作为 `onnxruntime` 的替代方案（利用 XLA），也可以用于进一步微调模型。同时，还可以通过 [go-huggingface](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-huggingface) 轻松从 HuggingFace 下载 ONNX 模型文件。\n* [Docker \"gomlx_jupyterlab\"](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fjanpfeifer\u002Fgomlx_jupyterlab)，集成了 JupyterLab 和 [GoNB](https:\u002F\u002Fgithub.com\u002Fjanpfeifer\u002Fgonb)（Jupyter 笔记本的 Go 内核）。\n* 自动微分：目前仅支持梯度计算，暂不支持雅可比矩阵。\n* 上下文管理：自动管理机器学习模型中的变量。\n* 包含多种流行机器学习“层”的 ML 层库：前馈神经网络层、各类激活函数、层归一化和批归一化、卷积、池化、Dropout、多头注意力机制（用于 Transformer 层）、LSTM、KAN（B 样条、[GR-KAN\u002FKAT 网络](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.10594)、离散 KAN、分段线性 KAN）、分段线性层（用于校准和归一化）、各种正则化方法、可逆\u002F可微分快速傅里叶变换、可学习有理函数（既可用于激活函数，也可用于 [GR-KAN\u002FKAT 网络](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.10594)）、VNN（向量神经网络）用于 SO(3) 等变\u002F不变层等。\n* 训练库，带有格式化的打印输出，并支持在 Jupyter 笔记本中使用 [GoNB, a Go Kernel](https:\u002F\u002Fgithub.com\u002Fjanpfeifer\u002Fgonb) 绘制图表。\n  * 此外，还提供多种调试工具：收集特定节点的值以供绘图、在训练过程中简单记录节点的值，以及生成创建节点的代码堆栈跟踪信息。\n* `gomlx_checkpoints` 是一个命令行工具，用于检查训练模型的检查点，并**生成图表**，使用 Plotly 绘制损失曲线和任意评估指标。请参阅[训练会话示例](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fgomlx_checkpoints_plot_example.html)，其中展示了训练过程中学习率变化的效果。该工具还允许将不同模型的训练过程绘制在同一张图上，以便比较它们的演变。\n* SGD 和 Adam（AdamW 和 Adamax）优化器。\n* 多种损失函数和评估指标。\n* 可直接使用的预训练模型：InceptionV3（图像模型），以及通过 [onnx-gomlx](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fonnx-gomlx) 获取的更多来自 HuggingFace 的模型。同时，可以使用 [go-huggingface](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-huggingface) 轻松从 HuggingFace 下载 ONNX 模型文件。\n* 将 Numpy 数组读取为 GoMLX 张量——请参阅包 `github.com\u002Fgomlx\u002Fgomlx\u002Fpkg\u002Fcore\u002Ftensors\u002Fnumpy`。\n* （实验性）支持 PJRT 的静态链接：虽然构建 Go 程序的速度较慢，但部署时无需在目标机器上安装 PJRT 插件。这需要您从 XLA 源码编译自己的静态 PJRT 插件。可以使用 `go build 
--tags=pjrt_cpu_static` 命令，或通过 `import _ \"github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fxla\u002Fcpu\u002Fstatic\"` 来启用。\n* **自动安装 XLA PJRT 插件**（适用于 CPU、GPU 和 TPU；支持 Linux 和 Mac），安装位置为用户的本地库目录（Linux 下为 `$HOME\u002F.local\u002Flib`，Mac 下为 `$HOME\u002FLibrary\u002FApplication Support\u002FXLA`）。可通过设置 `GOMLX_NO_AUTO_INSTALL` 或编程方式调用 `xla.EnableAutoInstall(false)` 来禁用此功能。\n* **分布式执行**（跨多个 GPU 或 TPU），用户只需提供少量提示即可。只需配置好分布式数据集，训练器便会自动完成其余工作。请参阅 [UCI-Adult 示例](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002Fexamples\u002Fadult\u002Fdemo\u002Fmain.go#L222) 中的代码更改。此功能目前仍处于实验阶段，请报告任何问题并帮助我们改进它。\n\n\n## 👥 支持\n\n* 在 [Slack 频道 #gomlx](https:\u002F\u002Fapp.slack.com\u002Fclient\u002FT029RQSE6\u002FC08TX33BX6U) 进行讨论（您可以[在此加入 Slack 服务器](https:\u002F\u002Finvite.slack.golangbridge.org\u002F)）。\n* [问答与讨论](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fdiscussions\u002Fcategories\u002Fq-a)\n* [问题反馈](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fissues)\n* 随机项目头脑风暴：只需发起一个问答，我很乐意在 Discord 或视频通话中与您交流。\n* [Google Groups: groups.google.com\u002Fg\u002Fgomlx-discuss](https:\u002F\u002Fgroups.google.com\u002Fg\u002Fgomlx-discuss)\n\n## \u003Ca id=\"installation\">\u003C\u002Fa>🛠️ + ⚙️ 安装 \n\n**对于大多数用户来说，无需安装。**\n\n**对于 XLA**，系统默认会自动安装所需的 XLA PJRT 插件（适用于 CPU、GPU 和 TPU；支持 Linux 和 Mac），安装位置为用户的本地库目录（Linux 下为 `$HOME\u002F.local\u002Flib\u002Fgo-xla`；Mac 下为 `$HOME\u002FLibrary\u002FApplication Support\u002Fgo-xla`；Windows 下为 `$HOME\\AppData\\Local\\go-xla`）。可以通过设置 `GOMLX_NO_AUTO_INSTALL` 或编程方式调用 `xla.EnableAutoInstall(false)` 来禁用此功能。\n\n如果您希望手动预安装以构建生产级 Docker 镜像、指定特定版本，或进行其他自定义设置，请参阅 [github.com\u002Fgomlx\u002Fgo-xla](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgo-xla) 以获取详细信息，其中包含易于理解的简单安装程序。\n\n如果您只想使用纯 **Go 后端**，只需导入 `_ \"github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fsimplego\"` 即可，无需安装任何额外组件。\n\n## 🐳  [预构建的 Docker](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fjanpfeifer\u002Fgomlx_jupyterlab)\n\n最简单的入门方式就是拉取这个包含 **GoMLX** + 
[JupyterLab](https:\u002F\u002Fjupyterlab.readthedocs.io\u002F) + [GoNB](https:\u002F\u002Fgithub.com\u002Fjanpfeifer\u002Fgonb)（Jupyter 的 Go 内核）以及 **Nvidia 的 CUDA 运行时**（用于可选的 GPU 支持）的 Docker 镜像——镜像大小约为 5GB。CUDA 运行时已预先安装，无需额外操作。\n\n在您希望在 Jupyter 中可见的目录下执行以下命令：\n\n> 如果需要 GPU 支持，请在下面的 `docker run` 命令中添加 `--gpus all` 标志。\n\n```bash\ndocker pull janpfeifer\u002Fgomlx_jupyterlab:latest\ndocker run -it --rm -p 8888:8888 -v \"${PWD}\":\u002Fhome\u002Fjupyter\u002Fwork janpfeifer\u002Fgomlx_jupyterlab:latest\n```\n\n终端中会显示一个以 `127.0.0.1:8888` 开头的 URL（包含所需的访问令牌），您可以在浏览器中打开该链接。\n\n您还可以从那里打开并交互式地体验教程，该教程已包含在 Docker 的 `Projects\u002Fgomlx\u002Fexamples\u002Ftutorial` 目录中。\n\n更多关于该 Docker 的详细信息请参见 [docker\u002Fjupyterlab\u002FREADME.md](docker\u002Fjupyterlab\u002FREADME.md)。\n\n该 Docker 也支持 Windows 系统：_Docker Desktop_ 底层使用 WSL2 技术。\n\n## 🧭 教程\n\n请参阅[此处的教程](examples\u002Ftutorial\u002Ftutorial.ipynb)。它涵盖了各个方面的内容。\n\n之后，可以查看位于[examples\u002F](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples)目录中的示例。\n\n该库本身文档齐全（如果发现遗漏之处，请提交问题），代码也相对容易阅读。\n_Godoc_ 文档可在 [pkg.go.dev](https:\u002F\u002Fpkg.go.dev\u002Fgithub.com\u002Fgomlx\u002Fgomlx) 上找到。\n\n最后，欢迎随时提问：在时间允许的情况下（非工作时间），我非常乐意提供帮助：\n我经常在线于 [Slack 频道 #gomlx](https:\u002F\u002Fapp.slack.com\u002Fclient\u002FT029RQSE6\u002FC08TX33BX6U)；\n或者访问 [groups.google.com\u002Fg\u002Fgomlx-discuss](https:\u002F\u002Fgroups.google.com\u002Fg\u002Fgomlx-discuss)。\n\n### 推理与生产部署\n\n目前，模型的推理或服务是通过使用创建模型时所用的 Go 代码，并结合包含训练权重和超参数的检查点来实现的。\n换句话说，就是使用与训练相同的工具来进行推理。\n\n例如，直接构建一个包含预训练模型的 Docker 容器并从中提供服务，或者将其集成到您自己的应用程序中，都是非常简单的操作。\n\n关于如何实现这一点以及将模型推理导出为库的简单示例，请参阅\n[`...\u002Fexamples\u002Fcifar\u002Fclassifer`](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002Fexamples\u002Fcifar\u002Fclassifier\u002Fclassifier.go),\n以及在[Cifar-10 示例](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Fcifar.html)的最后几节中的使用方法。\n\n未来，我们计划还将模型导出为 ONNX 或 XLA 的 StableHLO 格式，这样就可以直接使用支持这些格式的工具进行服务，\n而无需链接 
GoMLX——这将略微减小可执行文件的大小。\n\n## 🎯 长期目标\n\n1. 在 Go 中构建和训练模型——而非 Python（或其他语言）——重点在于：\n   - 易于阅读和理解，使用户能够形成对当前操作过程正确且透明的心理模型。即使这意味着编写时会更冗长。\n   - 清晰、可分离的 API：每个 API 应尽可能自成一体且相互解耦。\n   - 可组合性：任何组件都应可替换，以便进行自定义和实验。\n     这意味着有时需要更多的编码工作（不存在一个能完成所有任务的“魔术”训练对象），\n     但这样做可以使流程更加清晰，并便于用自定义版本替换部分组件。\n   - 及时更新的文档：如果文档缺失或编写不佳，就如同代码不存在一样。\n   - 清晰且可操作的错误报告\n2. 成为一个高效的研究和教育平台，用于试验新的机器学习思想并进行学习。\n   - 支持多设备上的镜像训练以及各种形式的分布式训练（模型并行和\u002F或数据并行），\n     特别是支持大型语言模型及类似大规模模型的训练。\n3. 成为一个稳健可靠的生产级平台。一些子目标包括：\n   - 支持现代加速硬件，如 TPU 和 GPU。\n   - 提供除 XLA 之外的多种后端，例如：llamacpp、WebNN（通过 Wasm）、纯 Go 实现等。\n   - 导入来自[Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Fmodels)的预训练模型并支持微调——ONNX 版本已经在 [onnx-gomlx](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fonnx-gomlx) 中对许多模型生效。\n   - 将模型编译为 C 库和\u002F或 WebAssembly 二进制文件，以便在任何地方（任何语言）进行链接和推理使用。\n\n## 🤔 常见问题解答\n\n- **GoMLX 使用哪些环境变量？**\n  - `GOMLX_BACKEND`：定义要使用的后端引擎（如果使用 `backends.New()`）。其值格式为“\u003Cbackend_name>[:\u003Cbackend_config>]”，其中配置部分是可选的。示例：\n    - `GOMLX_BACKEND=go`：使用 `SimpleGo` 后端，即纯 Go 实现，具有很好的可移植性但速度较慢。\n    - `GOMLX_BACKEND=\"xla:cpu\"`：使用 XLA 后端（更快的后端，目前仅支持 Linux 系统）运行 CPU 计算。\n    - `GOMLX_BACKEND=\"xla:cuda\"`：使用 XLA 后端运行 Nvidia CUDA。\n    - `GOMLX_BACKEND=\"xla:\u002Fpath\u002Fto\u002Fmy\u002Fpjrt_plugin.so\"`：使用 XLA 并加载任意 PJRT 插件。PJRT 是 XLA 的插件系统，用于支持不同的硬件。\n      用户可以安装为 NVIDIA GPU 构建的 PJRT（有相应的安装脚本），也有适用于 ROCm 的 PJRT（作者未测试过），\n      以及适用于 TPU（Google Cloud）的 PJRT，甚至还有针对新型加速器（例如：[TensTorrent XLA](https:\u002F\u002Fgithub.com\u002Ftenstorrent\u002Ftt-xla)）开发的 PJRT。\n  - `PJRT_PLUGIN_LIBRARY_PATH`：底层 XLA 后端会将此变量视为额外的目录，用于搜索插件位置。\n    它会依次查找系统的库路径（`$LD_LIBRARY_PATH`、`\u002Fetc\u002Fld.so.conf`）、默认路径 `\u002Fusr\u002Flocal\u002Flib\u002Fgomlx\u002Fpjrt`，以及用户设置的 `$PJRT_PLUGIN_LIBRARY_PATH`。\n  - `GOMLX_NO_AUTO_INSTALL`：若设置为 `1`，GoMLX 在运行于尚未安装 PJRT 的系统上时，将不会自动安装 PJRT。\n  - `XLA_FLAGS`：用于控制 XLA 后端的可选参数。应以分号（`;`）分隔的选项列表形式设置。若设置为 `--help`，后端会打印出所有选项的帮助信息。\n    更多信息请参阅[XLA 
标志指南](https:\u002F\u002Fopenxla.org\u002Fxla\u002Fflags_guidance)页面。\n- **使用 GoMLX 时应包含哪些后端？**\n  - 建议使用 `import _ \"github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fdefault\"`，这将导入 `xla`（或别名 `stablehlo`）和 `go`（_SimpleGo_）后端。\n    如果在编译时添加 `-tags=noxla` 参数，则不会包含 XLA 后端。\n  - 使用 `import _ \"github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fsimplego\"` 仅导入 `go` 后端（无 C++ 依赖）。\n  - 使用 `import _ \"github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fxla\"` 仅导入 XLA 后端。\n- **AI 上下文文件（Gemini\u002FClaude Code\u002FConductor 等）存放在哪里？**\n  - 为避免项目根目录过于杂乱，这些文件已被移至 `.agents\u002F` 目录，并已加入 `.gitignore` 文件中。\n    这样一来，用户只需将所需的 AI 配置文件以符号链接的形式指向本地副本的根目录，即可轻松使用它们。\n\n## 🤝 合作\n\n该项目欢迎所有感兴趣的人士贡献代码。\n许多部分尚未最终确定，因此对于那些具备 Go、机器学习和 API 设计经验的人来说，仍有大量改进和重新设计的空间。\n请参阅[TODO 文件](docs\u002FTODO.md)获取灵感。\n\n目前尚未制定治理规范。\n\n如需联系，请参阅上方的“支持”部分！（可通过 Slack 频道或 Google Groups）\n\n## 💖 支持本项目\n\n如果您觉得本项目有所帮助，请考虑通过[GitHub Sponsors](https:\u002F\u002Fgithub.com\u002Fsponsors\u002Fgomlx)支持我们的工作。\n\n您的捐助将帮助我们（目前主要是[我](https:\u002F\u002Fgithub.com\u002Fjanpfeifer)）投入更多时间进行维护，\n并为整个 GoMLX 生态系统添加新功能。\n\n同时，这笔资金也将用于获取更多硬件资源的使用权（购买或云服务），以提升跨平台兼容性，例如：ROCm、Apple Metal（GPU）、多 GPU\u002FTPU、NVIDIA DGX Spark、Tenstorrent 等。\n\n## 🚀 高级主题\n\n* [更新日志](docs\u002FCHANGELOG.md)\n* [待办事项](docs\u002FTODO.md)\n* [错误处理](docs\u002Ferror_handling.md)\n* [开发指南](docs\u002Fdeveloping.md)\n\n##  💖 感谢\n\n* [Go](golang.org)\n* [OpenXLA](https:\u002F\u002Fgithub.com\u002Fopenxla\u002Fxla)\n* [TensorFlow](https:\u002F\u002Fwww.tensorflow.org\u002F)\n* [Jax](https:\u002F\u002Fgithub.com\u002Fjax-ml\u002Fjax)\n* [PyTorch](https:\u002F\u002Fpytorch.org\u002F)\n* [ONNX](https:\u002F\u002Fonnx.ai\u002F)\n* [HuggingFace](https:\u002F\u002Fhuggingface.co\u002F)\n\n## ⚖️ 许可证 \n\n> 版权归2025年扬·普费弗所有\n\n**GoMLX** 根据 [Apache许可证第2.0版](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002FLICENSE) 进行分发。\n除非另有明确说明，否则任何有意提交以纳入本项目的贡献，都将依据 
[Apache许可证第2.0版](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fblob\u002Fmain\u002FLICENSE) 进行许可，且不附加任何额外条款或条件。","# GoMLX 快速上手指南\n\nGoMLX 是专为 Go 语言设计的机器学习与数学框架，被誉为\"Go 版本的 PyTorch\u002FJAX\u002FTensorFlow\"。它支持纯 Go 后端（可运行于浏览器\u002FWASM）以及基于 OpenXLA 的高性能后端（支持 CPU、GPU 和 TPU 加速）。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**：Linux (amd64\u002Farm64), macOS (arm64\u002Famd64), Windows (amd64)\n- **Go 版本**：建议使用 Go 1.21 或更高版本\n- **硬件加速（可选）**：\n  - 若使用 GPU\u002FTPU 加速，需安装对应的 NVIDIA CUDA 驱动或 Google TPU 环境。\n  - 若仅使用纯 Go 后端，无需额外硬件依赖，甚至可在浏览器中通过 WASM 运行。\n\n### 前置依赖\n- 确保已安装 Git。\n- 若计划使用 Jupyter Notebook 进行交互式开发，推荐安装 [GoNB](https:\u002F\u002Fgithub.com\u002Fjanpfeifer\u002Fgonb)。\n\n## 安装步骤\n\n### 1. 初始化 Go 模块\n在你的项目目录中初始化 Go 模块（如果尚未初始化）：\n\n```bash\ngo mod init my-gomlx-project\n```\n\n### 2. 安装 GoMLX 核心库\n使用 `go get` 命令安装 GoMLX：\n\n```bash\ngo get github.com\u002Fgomlx\u002Fgomlx\n```\n\n### 3. 安装后端引擎（按需选择）\n\n#### 方案 A：纯 Go 后端（默认，高兼容性）\n无需额外操作，代码默认使用纯 Go 后端，适合开发调试、WASM 部署或无 GPU 环境。\n\n#### 方案 B：OpenXLA 后端（高性能，支持 GPU\u002FTPU）\n若需利用 GPU 或 TPU 加速训练大模型，需引入 XLA 后端包：\n\n```bash\ngo get github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fxla\n```\n*注意：使用 XLA 后端时，请确保系统已安装相应的 CUDA 工具包（针对 NVIDIA GPU）或 TPU 运行时库。*\n\n#### 方案 C：ONNX 模型支持（可选）\n若需加载 HuggingFace 上的 ONNX 模型：\n\n```bash\ngo get github.com\u002Fgomlx\u002Fonnx-gomlx\ngo get github.com\u002Fgomlx\u002Fgo-huggingface\n```\n\n## 基本使用\n\n以下是一个最简单的示例，展示如何创建一个线性模型并执行前向传播计算。\n\n### 示例代码：构建简单的线性层\n\n创建文件 `main.go`，写入以下内容：\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\"\n\t\"github.com\u002Fgomlx\u002Fgomlx\u002Fgraph\"\n\t\"github.com\u002Fgomlx\u002Fgomlx\u002Flayers\"\n\t\"github.com\u002Fgomlx\u002Fgomlx\u002Ftypes\u002Ftensors\"\n)\n\nfunc main() {\n\t\u002F\u002F 1. 获取默认后端 (默认为纯 Go 后端，若安装了 xla 包且环境支持可自动切换)\n\tbackend := backends.DefaultBackend()\n\n\t\u002F\u002F 2. 创建计算图上下文\n\tctx := graph.NewContext(backend)\n\n\t\u002F\u002F 3. 
定义输入张量 (形状: [batch_size=2, features=3])\n\tinput := tensors.FromScalarSlice(backend, []float32{1, 2, 3, 4, 5, 6}, []int{2, 3})\n\n\t\u002F\u002F 4. 在计算图中构建模型：一个简单的全连接层 (Dense Layer)\n\t\u002F\u002F 输入维度 3 -> 输出维度 1\n\toutput := layers.Dense(ctx, input, 1, nil, \"linear_layer\")\n\n\t\u002F\u002F 5. 执行计算并获取结果\n\tresult := output.Value()\n\t\n\tfmt.Printf(\"输入形状：%v\\n\", input.Shape())\n\tfmt.Printf(\"输出结果：%v\\n\", result.Slice())\n\t\n\t\u002F\u002F 清理资源\n\tctx.Done()\n}\n```\n\n### 运行程序\n\n```bash\ngo run main.go\n```\n\n### 下一步建议\n- **教程学习**：访问官方 [Tutorial](https:\u002F\u002Fgomlx.github.io\u002Fgomlx\u002Fnotebooks\u002Ftutorial.html) 了解完整的训练流程。\n- **实战案例**：参考 `examples` 目录下的 [MNIST](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples\u002Fmnist) 或 [GPT-2](https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Ftree\u002Fmain\u002Fexamples\u002Fgpt2) 示例。\n- **交互式开发**：使用 Docker 启动集成环境：\n  ```bash\n  docker run -p 8888:8888 janpfeifer\u002Fgomlx_jupyterlab\n  ```","某电商平台的后端团队希望在不引入 Python 依赖的情况下，直接在现有的 Go 微服务架构中集成实时欺诈检测模型。\n\n### 没有 gomlx 时\n- **架构割裂**：必须额外部署 Python 推理服务，通过 gRPC 或 HTTP 进行跨语言调用，显著增加了系统延迟和运维复杂度。\n- **训练门槛高**：数据科学家需将模型从实验环境重写为生产代码，Go 工程师因缺乏原生深度学习库而无法直接参与模型迭代。\n- **硬件加速困难**：纯 Go 实现难以利用 GPU 或 TPU 进行大规模训练，导致处理海量交易数据时耗时过长。\n- **调试黑盒**：跨语言边界使得错误追踪困难，内存管理和数据类型转换容易引发隐蔽的运行时崩溃。\n\n### 使用 gomlx 后\n- **原生集成**：直接在 Go 代码中构建、训练并部署模型，消除了外部服务依赖，将推理延迟降低至毫秒级。\n- **统一技术栈**：利用 gomlx 提供的可微分算子和类似 PyTorch 的 API，Go 工程师能无缝接手模型优化工作，加速功能上线。\n- **高性能计算**：通过集成的 OpenXLA 后端，自动调用 JIT 编译技术加速 CPU、GPU 及 TPU 运算，大幅缩短大模型训练时间。\n- **透明可控**：遵循 Go 语言简洁透明的哲学，开发者可清晰掌控从数据预处理到梯度更新的每一步，便于排查问题和定制非标准算法。\n\ngomlx 让 Go 开发者能在单一语言环境下享受工业级的机器学习能力，彻底打破了算法工程化中的语言壁垒。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgomlx_gomlx_f73ccaeb.png","GoMLX","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fgomlx_43e45cbc.jpg","GoMLX 
Project",null,"gomlx-discuss@googlegroups.com","http:\u002F\u002Fgithub.com\u002Fgomlx","https:\u002F\u002Fgithub.com\u002Fgomlx",[80,84,88,92],{"name":81,"color":82,"percentage":83},"Go","#00ADD8",97.2,{"name":85,"color":86,"percentage":87},"Jupyter Notebook","#DA5B0B",2.6,{"name":89,"color":90,"percentage":91},"Dockerfile","#384d54",0.1,{"name":93,"color":94,"percentage":91},"Shell","#89e051",1359,73,"2026-04-11T08:51:53","Apache-2.0","Linux, macOS, Windows","非必需。若使用高性能后端 (OpenXLA\u002FXLA)，支持 NVIDIA GPU (可能支持 AMD ROCm, Intel GPU) 及 Google TPU；纯 Go 后端无需 GPU。未指定具体显存大小或 CUDA 版本要求。","未说明",{"notes":103,"python":104,"dependencies":105},"1. 该工具是 Go 语言的机器学习框架，旨在无需 Python 即可运行。2. 提供两种主要后端：'go'后端为纯 Go 实现，兼容性极强（支持浏览器 WASM、嵌入式设备），但速度较慢；'xla'后端基于 OpenXLA，支持 JIT 编译到 CPU\u002FGPU\u002FTPU，性能高但目前主要支持 Linux 和 macOS 的 CPU 及部分 GPU。3. 新增对 Apple CoreML\u002FMetal 的后端支持 (go-darwinml)。4. 支持在浏览器中通过 WASM 运行。","无需 Python (纯 Go 语言框架)",[106,107,108,109,110,111],"Go (语言环境)","github.com\u002Fgomlx\u002Fgomlx","github.com\u002Fgomlx\u002Fgo-xla (可选，用于 XLA 后端)","github.com\u002Fgomlx\u002Fonnx-gomlx (可选，用于 ONNX 模型)","github.com\u002Fgomlx\u002Fgo-huggingface (可选，用于加载 HuggingFace 模型)","github.com\u002Fjanpfeifer\u002Fgonb (可选，用于 Jupyter Notebook 支持)",[14],[114,115,116,117,118],"go","golang","machine-learning","neural-network","xla","2026-03-27T02:49:30.150509","2026-04-12T05:24:58.565539",[122,127,132,137,142,147],{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},30233,"GoMLX 支持哪些模型格式（如 PyTorch、TensorFlow、ONNX）？","GoMLX 原生支持通过 ONNX 导入模型。您可以将 PyTorch 或 TensorFlow 模型转换为 ONNX 格式，然后在 GoMLX 中使用。例如，LSTM 模型已得到支持：您可以在 Python 中创建模型并导出为 ONNX，然后使用 GoMLX 的转换器加载。具体示例代码可参考 `onnx-gomlx` 仓库中的 Jupyter Notebook（`onnx-py.ipynb` 和 `onnx-go.ipynb`），其中展示了从 Python 创建到 Go 转换的完整流程。","https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fissues\u002F80",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},30234,"GoMLX 是否还依赖已弃用的 XLA Client API？","不再依赖。从 v0.11.0 版本开始，GoMLX 已迁移至 `gopjrt` 库，不再依赖旧的 
C\u002FC++ XLA Client API。现在它仅依赖 XlaBuilder（用于构建 ML 程序）和 PJRT C API（用于编译和执行）。如果您遇到相关报错，请确保升级到最新版本。","https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fissues\u002F52",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},30235,"在使用深度卷积（Depthwise Convolution）时遇到梯度形状不匹配错误怎么办？","这是一个已知问题，发生在反向传播计算梯度时，输入梯度的通道维度被错误计算为 1 而不是原始通道数。该问题已在 PR #206 中修复，该补丁实现了对通道分组和批次分组的卷积梯度正确计算。请升级到包含此修复的最新版本（维护者提到在发布大版本更新后已关闭此问题），如果问题仍然存在，请检查是否使用了正确的 `FilterGroupCount` 配置。","https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fissues\u002F132",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},30236,"SimpleGo 后端是否存在数据竞争（Data Race）问题？","是的，早期版本在并行执行路径中存在数据竞争，特别是在 `executeParallel` 函数中对 `execBuf.results` 和 `execBuf.owned` 的无锁写入。该问题已在 PR #393 中修复，修复方案是将结果写入操作移入锁保护范围内，并解决了缓冲区泄漏问题。请使用合并了 PR #393 之后的版本（v0.27.2 之后），并通过 `go test -race` 验证问题是否解决。","https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fissues\u002F387",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},30237,"SimpleGo 纯 Go 后端支持哪些算子？是否支持 AveragePool 和 Pad？","SimpleGo 后端旨在提供无需 CGO 的调试体验，但算子支持仍在完善中。AveragePool 和 Pad 算子此前缺失，但维护者已确认这是常见操作并计划添加。关于 InceptionV3 等模型的支持进度，可跟踪 Issue #191。注意：Pure Go 后端性能比 XLA 或 ONNX Runtime (ORT) 慢约 10 倍，仅推荐用于调试或小规模测试；在生产环境中，CPU 上 ORT 速度约为 XLA 的 2 倍，GPU 上 XLA 更快。","https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fissues\u002F243",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},30238,"如何在 macOS 上编译 GoMLX 及其 XLA C 库？","在 macOS 上编译时可能会遇到缺少 `gperftools\u002Fmalloc_extension.h` 头文件的错误。这通常是因为缺少 Google Performance Tools (gperftools) 依赖。您需要先安装 gperftools（例如通过 Homebrew: `brew install gperftools`），并确保编译器能找到相应的头文件路径。如果问题依旧，请检查环境变量 `CGO_CFLAGS` 和 `CGO_LDFLAGS` 是否正确指向了安装路径。","https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fgomlx\u002Fissues\u002F23",[153,158,163,168,173,178,183,188,193,198,203,208,213,218,223,228,233,238,243,248],{"id":154,"version":155,"summary_zh":156,"released_at":157},214581,"v0.27.2","### 核心模块：\n\n- 包 `graph`：\n  - `DotGeneral` 现在会将 
`AccumulatorDType` 和 `OutputDType` 传递给后端（而不是假定后端不支持这些类型并进行转换）。此外，默认情况下，半精度浮点数会使用 float32 作为累加器。\n    - 对于 `xla` 后端：为变量（权重）与 `DotGeneral` 操作的左操作数之间添加了一个“hacky”依赖关系。这是因为 XLA CPU 会对权重进行临时重排布局，而这一依赖关系确保在同一时间只分配一个临时缓冲区，从而避免在模型各层之间重复分配大量临时内存。在一个包含 48 层、总大小约 22GB 的模型中，这种方法节省了高达 48GB 的临时内存。\n\n- 包 `ml\u002Fmodel\u002Ftransformer`：\n  - 添加了架构参数，取值为“standard”或“gemma”。\n  - 激活函数现在接受 `activations.Type` 类型的值（而非字符串），不过仍支持通过上下文超参数以字符串形式进行转换。\n  - 新增了 `WithTransposedWeights()` 和 `WithCausalMask()` 选项。\n  - 代码得到简化。\n\n- 包 `ml\u002Flayers`：\n  - 为归一化类型添加了常量。\n\n\n### 后端：\n\n- 后端 `xla`：\n  - 将对 `github.com\u002Fgomlx\u002Fgo-xla` 的依赖更新至 v0.2.2，修复了 NVIDIA CUDA 驱动路径的问题。\n  - 对于不支持的累加数据类型（目前仅支持 float32），`DotGeneral` 会先自动将输入数据类型转换为累加数据类型。\n  - 如果传递 `-vmodule=executable=1` 参数，会启用可执行文件内存消耗的日志记录。\n  - 添加了 `OptimizationBarrier` 操作，但尚未在 `graph` 包中公开。\n  - 为 `DotGeneral` 操作的权重与左操作数之间添加了一个“hack”依赖，以大幅降低临时内存的使用量。相关讨论见 https:\u002F\u002Fgithub.com\u002Fopenxla\u002Fstablehlo\u002Fissues\u002F2923。\n\n- 后端 `simplego`（即“go”后端）：\n  - 对于带有累加数据类型的 `DotGeneral`，它会先自动将输入数据类型转换为累加数据类型（例外是半精度类型，默认使用 float32 作为累加器）。","2026-03-21T16:32:17",{"id":159,"version":160,"summary_zh":161,"released_at":162},214582,"v0.27.1","- 包 `examples\u002F...`:\n  - 更新了 `gemma3`、`mxbai-rerank` 和 `bert-base-ner`，使其使用新的 `onnx-gomlx` v0.4.1 API，并升级了依赖版本。\n\n- 包 `graph`:\n  - `Floor` 和 `Ceil` 操作现在对整数数据类型是恒等操作。\n\n- 包 `simplego`:\n  - 移除了执行时的 panic：改为返回错误。\n  - 修复了未实现错误中缺少注解和堆栈跟踪的问题。\n  - 实现了 `Pad()` 操作，并在 `graph.TestPad` 中添加了一些测试用例。\n\n- 包 `xla`:\n  - 将 `TF_CPP_MIN_LOG_LEVEL` 的默认值更改为 3。详情请参阅 https:\u002F\u002Fgithub.com\u002Fopenxla\u002Fxla\u002Fissues\u002F26466。\n","2026-03-17T10:19:45",{"id":164,"version":165,"summary_zh":166,"released_at":167},214583,"v0.27.0","# v0.27.0：图函数；改进的 Go 后端（融合运算）；量化数据类型；更多机器学习层及修复\n\n- 包 `backends`：进行了重大重构，以增加对函数\u002F闭包的支持。\n  - 添加了 `backends.Function`，现在它包含了所有的“算子”方法。\n  - 添加了 `NewFunction`、`Closure` 和 `Call`。\n  - 将 `backends.Op` 重命名为 `backends.Value`。\n  - 添加了 
`FusedOps`，允许后端暴露融合的（更高效的）操作——在不支持或计算梯度时，会自动回退到分解后的操作。\n  - 添加了 `ErrNotImplemented` 错误和 `IsNotImplemented(err)` 函数。\n  - 添加了 `Quantization` 结构体、`QuantizationScheme`（线性、NF4）以及 `NF4LookupTable`。\n  - 移除了 `Dot()` 操作（与 `DotGeneral` 重复）。\n  - `DotGeneral()` 现在接受一个 `DotGeneralConfig` 结构体，其中包含设置累加器和输出数据类型的选项。\n- 包 `simplego`：\n  - 增加了对 `Float16` 的支持（感谢 @timkaye11）\n  - 增加了计算节点去重功能（即“公共子表达式消除”CSE）（感谢 @timkaye11、@janpfeifer）\n    - CSI-Adult 示例训练速度提升了约 6%。\n  - DotGeneral：对分块路径进行预阻塞，这可能导致分块节点的去重（@timekaye11）。\n  - DotGeneral：增加了 smallMatMul 执行路径，针对小型矩阵乘法进行了优化（感谢 @timkaye11）。\n  - 实验性地支持利用 SIMD 指令的 `packgemm`（@ajroetker、@janpfeifer）。\n  - 增加了函数\u002F闭包支持（感谢 @ajroetker）。\n  - 添加了 `Reverse` 操作。\n  - 增加了融合运算：`FusedGelu`、`FusedDense`、`FusedSoftmax`、`FusedLayerNorm`、\n    `FusedScaledDotProductAttention`、`FusedAttentionQKVProjection`。\n  - 添加了 `FusedQuantizedDense`：融合了解量化 + 矩阵乘法 + 偏置 + 激活函数，适用于 Int4\u002FInt8 权重，\n    支持线性和 NF4 量化方案、分块尺度，并可选零点。\n  - `FusedScaledDotProductAttention`：增加了 `ScaledDotProductAttentionConfig` 选项结构体，\n    其中包含 `QuantizedMatmuls` 标志，用于选择是否使用 uint8 量化的 Q@K\u002Fattn@V 矩阵乘法\n    （等待 go-highway 发布以实现实际加速）。\n  - `Bitcast` 被重构为纯位重新解释；字节内的拆分操作被移至 `ConvertDType`。\n- 新包 `bucketing`：\n  - 用于管理张量（或其他任何内容）分桶的工具——感谢 @ajroetker。\n- 包 `dtypes`：\n  - 增加了 `Uint2`、`Uint4`、`Int2`、`Int4`。\n- 包 `graph`：\n  - 增加了 `Function` 概念（以及对闭包的支持）和 `Function.Call` 操作。\n  - 控制流：增加了 `While` 和 `If` 操作。\n  - 排序操作：增加了 `Sort`、`SortFunc`、`TopK`、`BottomK`。\n  - 修复了针对打包的子字节类型（`Int4`、`Int2`、`Uint4` 和 `Uint2`）的 `Bitcast`，\n    使其能够在 `uint8`（字节）之间来回“位转换”，从而简化量化过程。\n  - 添加了 `Atan2` 函数。\n  - 增加了测试辅助函数，以便同时测试各种后端。\n  - 修复了 `Gather` 对 `indexVectorAxis` 的验证逻辑，改为检查其与 `startIndices` 的秩是否匹配，而不是与 `opera","2026-03-13T07:10:30",{"id":169,"version":170,"summary_zh":171,"released_at":172},214584,"v0.26.0","API 变更：`dtypes` 包已从 `github.com\u002Fgomlx\u002Fgopjrt\u002Fdtypes` 移至 `github.com\u002Fgomlx\u002Fgomlx\u002Fpkg\u002Fcore\u002Fdtypes`。\n这应该只需简单地更改导入路径即可。\n\nXLA：\n- go-xla（取代现已废弃的 stablehlo 和 gopjrt 库）\n  
- 添加了标准插件（CPU 和 GPU\u002FTPU，如可用）的自动安装功能。\n    （可通过设置环境变量 `GOMLX_NO_AUTO_INSTALL` 为任意值来禁用）\n  - 修复了插件销毁时的一些内存泄漏；\n  - 在某些低延迟场景中提升了性能（使用 GenPool 替代 sync.Pool）：\n- 移除了旧的 `gomlx\u002Fbackends\u002Fxla`（即使用已弃用的 `xlabuilder` API 的版本）。\n- 将 `gomlx\u002Fbackends\u002Fstablehlo` 重命名为 `gomlx\u002Fbackends\u002Fxla`，并改用新的 `go-xla` 库。\n- 新增 `xla.EnableAutoInstall(enabled bool)` 方法，用于启用或禁用标准插件的自动安装。\n  同时新增 `xla.AutoInstall()` 方法，可立即自动安装标准插件。\n- 实现了新 `gomlx\u002Fgomlx\u002Fpkg\u002Fcore\u002Fdtypes`（以及 `bfloat16`）与旧 `gomlx\u002Fgomlx\u002Fpkg\u002Fcore\u002Fdtypes`（及相应 `bfloat16`）之间的转换。\n- 为 XLA CPU 增加了 Linux\u002Farm64 和 Windows\u002Famd64 平台的支持。\n\n其他更新：\n- `tensors` 包：\n  - 新增 `CopyFlatData()` 方法，该方法会返回错误（此前曾被重命名为 `MustCopyFlatData`）。\n- `graph` 包：\n  - 新增 `RNGStateFromSeedForGraph` 函数，用于根据种子为图生成随机数状态。\n- `pkg\u002Fcore\u002Fdtypes` 包：\n  - 引入了新类型，这些类型是从现已废弃的 Gopjrt 中复制而来。\n- `simplego` 包：\n  - 支持按优先级注册执行器。","2025-12-17T19:45:22",{"id":174,"version":175,"summary_zh":176,"released_at":177},214585,"v0.25.0","今年年末的这款重磅更新带来了一份大礼：分布式执行功能。对于任何需要扩展训练（或推理）规模的人来说，这都是必选功能。不过，正如这类新特性一样，目前仍处于**实验性**阶段。我本人会实际使用它，并希望在年底前或明年初解决所有遗留问题——包括补充一份分布式教程（目前已有演示）。🦌\n\n此外，还进行了一些 API 的清理工作，主要是将原本会触发 panic 的地方改为返回错误。因此，如果你升级到这个版本，可能需要做一些小的调整，算是有点“格林奇式”的“礼物”吧。不过这些改动都非常简单，比如在某些函数和方法的返回值列表中添加了错误类型。\n\n我们还开展了一些外部合作🎉，期待未来能有更多这样的合作！毕竟还有很多有趣的事情可以做呢——比如量化、在“Simple Go”后端中利用 SIMD 指令等。\n\n**亮点：**\n\n- 分布式（跨设备）执行：支持自动分片和 SPMD 策略；同时新增了对“可移植设备”执行的支持。\n- API 变更：（只需进行简单的适配）\n  - 大多数非图构建相关的 API 现在都会返回错误，而不再触发 panic。不过，图构建相关函数仍然使用 panic 来报告错误——因为用其他方式表达数学运算实在太麻烦了。\n  - 所有“Rng”均更名为“RNG”——在 Go 语言中，缩写通常采用大写字母。\n\n分布式计算的改进与重构：\n\n- `graph` 包：\n  - 修复并优化了文档。\n  - 新增了 `IsNegative`、`IsPositive`、`IsNonNegative`、`IsNonPositive` 等函数。\n  - 添加了 `SubScalar` 以及针对 `*Scalar` 函数的测试。\n  - 新增了 `Graph.WithDistributedStrategy`、`Graph.WithDeviceMesh`、`Graph.DeviceMesh` 和 `Graph.NumDevices`。\n  - 增加了 `Graph.Distributed()` 方法，支持跨设备的集体操作（如 `AllReduce`）。\n  - 
重命名：s\u002F`Exec.InDevice`\u002F`Exec.WithDevice`；s\u002F`Exec.SetName`\u002F`Exec.WithName`。\n  - 新增了 `RunOnDevice`。\n  - 新增了 `Exec.AutoSharding` 和 `Exec.SPMD`。\n\n- `context` 包：\n  - 新增了 `context.MustGetParam[T](ctx, key)` 和 `context.MustGetGraphParam[T](ctx, graph, key)`。\n  - 同样新增了 `Exec.AutoSharding` 和 `Exec.SPMD`。\n  - 新增了 `Variable.DistributedValue` 和 `Variable.SetDistributedValue`。\n\n- `train` 包：\n  - 新增了 `train.DistributedDataset` 和 `train.BaseDataset`。\n  - `Dataset.Reset` 现在会返回错误，而不是触发 panic。\n  - `Trainer.TrainStep`、`Trainer.EvalStep` 和 `Trainer.Eval` 现在都会返回错误，而非直接 panic。\n  - 新增了 `Trainer.WithDeviceAssignment`。\n  - 新增了 `Trainer.DistributedTrainStep`、`Trainer.DistributedEvalStep` 和 `Trainer.DistributedEval`。\n\n- `datasets` 包：\n  - 新增了 `datasets.DistributedAccumulator`：用于将普通数据集转换为分布式数据集。\n  - 新增了 `datasets.OnDevice`：用于提前将数据上传到各个设备上。\n\n- `backend` 包：\n  - 新增了 `Backend.CopyToDevice`。\n  - `Builder.Parameter()` 现在接受一个可选的 `ShardingSpec` 参数，用于处理分片输入。\n  - 新增了 `AllReduce` 操作。\n  - `Backend.NumDevices()` 返回","2025-12-05T13:36:04",{"id":179,"version":180,"summary_zh":181,"released_at":182},214586,"v0.24.1","* 更新依赖至 Gopjrt v0.8.4：新增对 macOS (darwin\u002Farm64) 的支持，并添加了 CPU PJRT 插件。\n* 在 Darwin 系统上，默认包含 `stablehlo`（即 `xla`）。\n* GitHub Actions：\n  * 添加了 macOS 测试。\n  * 移除了不必要的 `apt install` 包安装步骤。\n","2025-10-23T08:56:31",{"id":184,"version":185,"summary_zh":186,"released_at":187},214587,"v0.24.0","此版本**破坏了 API**：主要是目录结构的变更，因此修复起来较为简单。\n\n> 注意：破坏 API 并非轻率之举（这总是让我感到遗憾）。经过两年的有机发展，项目中逐渐积累了一些小的 API 问题，现在是时候进行一次“焕新”了。\n\n* **本次发布的主要亮点**：\n  * 废弃旧的“xla”后端（现称为“oldxla”），转而使用“stablehlo”（同样别名为“xla”）：\n    在大多数情况下无需任何改动（`github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fdefault` 会自动将两者互换），\n    但在特殊场景下可能需要进行少量调整。\n  * 大规模重构：已导出的 GoMLX 包被移至 `\u002Fpkg` 目录下。具体变更如下：\n    * **这要求修改导入路径**：核心包（`tensors`、`shapes` 和 `graph`）位于 `pkg\u002Fcore`；\n      机器学习相关包（`context`、`layers`、`train`、`datasets` 等）位于 `pkg\u002Fml`；\n      辅助工具包（`fsutil`、`sets`、`xslices`、`xsync`）位于 
`pkg\u002Fsupport`。\n    * 对 `graph.Exec` 和 `context.Exec` 进行了规范化处理，API 略有变化：\n      `Exec.Exec...` 方法现在返回错误，而 `Exec.MustExec...` 方法则会 panic（取代了旧的 `Exec.Call` 格式）；\n      `graph.NewExec` 和 `context.NewExec` 会返回错误，而 `graph.MustNewExec` 和 `context.MustNewExec` 则会 panic。\n    * 旧的 `ml\u002Fdata` 下的文件工具现归入 `pkg\u002Fsupport\u002Ffsutil`；原 `ml\u002Fdata` 包已被重命名为 `pkg\u002Fml\u002Fdatasets`，目前仅包含各类数据集类型。\n    * 未被移动的包：\n      * `backends` 包：计划于今年晚些时候或 2026 年初迁移到独立的代码库。\n      * `ui` 和 `example` 包：由于它们只是附加组件，暂时保留在原处。核心 `GoMLX` 不依赖这些包，因此对其外部依赖的要求相对宽松。\n\n\u003Chr\u002F>\n\n* 将外部的简单 `must` 和 `exceptions` 工具包复制到 `\u002Finternal\u002F...` 目录下，以消除对外部依赖的引用。\n* `xla` 包（旧版）：现已**废弃**，更名为 `oldxla`。`stablehlo` 包取而代之，并同时接管了 `xla` 后端的名称别名。\n  * 旧版本现注册为后端“oldxla”。\n  * 仅在使用 `oldxla` 标签编译时才会包含在 `github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fdefault` 中。\n* `stablehlo` 包：\n  * 现已默认完全替代 `xla`。即使设置 `GOMLX_BACKEND=xla`，实际使用的仍然是 `stablehlo` 后端。\n  * 新增了 `github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fstablehlo\u002Fcpu\u002Fdynamic` 和 `github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fstablehlo\u002Fcpu\u002Fstatic`\n    用于可选地强制启用 CPU PJRT 插件的动态或静态链接。\n  * 默认禁用 XLA 日志，通过将 `TF_CPP_MIN_LOG_LEVEL` 设置为 2（仅记录错误级别）来实现，前提是该环境变量尚未被设置。\n* `graph` 包：\n  * `NewExec`、`NewExecAny`、`Exec`、`ExecOnce` 和 `ExecOnceN` 现在会在失败时返回错误。\n  * `MustNewExec`、`MustNewExecAny`、`MustExec`、`MustExecOnce` 和 `MustExecOnceN` 则会 panic。","2025-10-21T06:24:25",{"id":189,"version":190,"summary_zh":191,"released_at":192},214588,"v0.23.2","- 更新依赖库至新的 Gopjrt v0.8.2 -- 解决了 CUDA PJRT 向后兼容性方面的问题（此前缺失）。\n- `stablehlo` 包：\n  - 增加了对布尔值比较的支持，并添加了相应的测试用例。\n  - 修复了在形状推导过程中对 Gather 操作的错误检查。","2025-09-29T17:53:16",{"id":194,"version":195,"summary_zh":196,"released_at":197},214589,"v0.23.1","* 包 `backends`：\n  * 移除了算子 `Broadcast`：该算子是多余的，因为 `BroadcastInDim` 已经涵盖了其功能。\n* 包 `graph`：\n  * 之前未定义 `BroadcastPrefix` 的反向传播。现在改用 `BroadcastInDim` 实现后，反向传播已正常工作。\n* 包 `simplego`：\n  * 仅在发生错误时记录 ConvGeneral 
的统计信息。\n","2025-09-25T08:03:14",{"id":199,"version":200,"summary_zh":201,"released_at":202},214590,"v0.23.0","* 包 `shapes`：\n  * 新增 `FromAnyValue`：从 Go 类型中提取形状。\n* 新后端：`stablehlo`（简称为 _“hlo”_），基于 https:\u002F\u002Fgithub.com\u002Fgomlx\u002Fstablehlo。\n  * 所有标准的二元和一元算子均已实现。\n  * 少数标准算子也已实现。\n  * 如果在编译 `backends\u002Fdefault` 时使用 `-tags=stablehlo` 标签，则会包含 `stablehlo` 后端。\n  * 大幅清理了代码生成器：大多数不再依赖 `gopjrt\u002Fxlabuilder`。\n* 包 `graph`：\n  * `ArgMin`、`ArgMax`：\n    * 修复了 `ArgMin` 现在可以接受负轴的问题。\n    * 对于 `stablehlo` 和 `go` 后端，NaN 值将被有意选中（与 Jax\u002FTensorFlow\u002FPyTorch 保持一致）。\n  * `Clip` 现在使用后端算子 `Clamp`。\n  * `Inverse` 重命名为 `Reciprocal`——`Inverse` 现已成为 `Reciprocal` 的弃用别名。\n  * 为多种归约操作添加了测试。\n  * 新增 `IsNaN`。\n  * 修复了当提供的掩码仅是输入张量前几个维度时的 `MaskedReduceMean` 问题。\n* 包 `nanlogger`：\n  * `NanLogger.WithStopAtFirst` 现在可用于控制 `NanLogger.Trace` 的默认行为。\n* 包 `backends`：\n  * 算子不再自动生成：现在由其自身作为事实来源（而非从 XLA 代码中生成）。\n  * 新增 `IsNaN`。\n  * 多处注释得到改进。\n  * 移除了错误的 `SelectAndScatterSum`，该算子现已在 `gopjrt` 中被弃用。\n* 包 `train`：\n  * `Loop.EveryNSteps` 会考虑当前的全局步数（而不是始终从 0 开始计数）。\n  * 实现了 `train.Dataset` 接口的数据集现在也可以实现 `ShortName() string` 方法，以提供用于指标的简短名称。\n* 包 `losses`：\n  * `MeanSquaredError`：修复了权重\u002F掩码对掩码的预期不一致问题。\n* 包 `commandline`：\n  * 公开了命令行更新频率参数 `RefreshPeriod`。\n  * 修复了进度条\u002F指标表格的闪烁问题。\n  * 改进了颜色，并使步数输出更易读。\n* `gomlx_checkpoints` CLI 工具：\n  * 新增 `-plot` 参数，用于为所有指标生成图表。该工具支持多种模型，因此可用于比较不同模型。","2025-09-21T14:56:48",{"id":204,"version":205,"summary_zh":206,"released_at":207},214591,"v0.22.1","* Package `backends`:\r\n  * `ConvGeneralDilated` renamed to `ConvGeneral`\r\n* Package `backends\u002Fshapeinference`:\r\n  * Added `ConvGeneralOp` to infer the output shape of a convolution.\r\n* Package `backends\u002Fsimplego`:\r\n  * Implemented `ConvGeneral` operation: supporting strides, padding, dilations (input and kernel),\r\n    and grouping (channels or batch), as well as transposing (arbitrary axes) convolutions.\r\n* Package `types\u002Fshapes`: \r\n  * `Shape.Iter()` and 
`Shape.IterOn()` also yields the flat index being iterated.\r\n  * Added `Shape.Strides()` and `Shape.IterOnAxes()`.\r\n* Package `graph`:\r\n  * Names of parameters for `ConvGeneral` were standardized to \"input,\" \"kernel\" and \"channels.\"\r\n  * `ConvGeneralDilated` is being aliased to `ConvGeneral` and the former will be deprecated on\r\n    a future version.\r\n  * `ConvGeneral`: added gradient for grouped (by channels or by batch) convolutions.\r\n  * Fixed shape of the kernel for `images.ChannelFirst` configuration.\r\n  * Added `Split`.\r\n  * `TransposeAllDims` -> `TransposeAllAxes`.\r\n* Package `layers`:\r\n  * Updated the configuration names for `Convolution`, to match the standards in the `graph` package.\r\n  * Added `ChannelGroupCount()` and `BatchGroupCount()` to `Convolution` configuration.\r\n* Updated to gopjrt v0.8.0, with the changes to the convolution API.\r\n\r\n(release v0.22.0 was cancelled due to a couple of bugs noticed slightly after release)\r\n\r\n","2025-08-22T12:03:39",{"id":209,"version":210,"summary_zh":211,"released_at":212},214592,"v0.21.1","* Package `tensors` and `graph`:\r\n  * Added support for zero-dim tensors.\r\n* Package `backends`:\r\n  * Method **`New()` will return an error (as opposed to panic)**.\r\n    The temporarily `NewOrErr` was marked as deprecated, use `New` instead.\r\n* Package `optimizers`:\r\n  * New `AdamConfig.WithBackoffSteps()` (or the hyperparameter `adam_backoff`) that prevents gradient steps\r\n    from being taken until the given number of steps has executed. 
This allows a better estimate (moving average) of\r\n    the gradients (\"momentum\") and their variances to be calculated before applying them.\r\n  * New `optimizers.ParamAdamBeta1` and `optimizers.ParamAdamBeta2` hyperparameters to control Adam beta1 and beta2\r\n    hyperparameters.\r\n* Package `context`:\r\n  * Added `Variable.DType()`.\r\n  * Variable `#rngstate` marked as non-trainable during creation.\r\n* `gomlx_checkpoints`:\r\n  * Added `-perturb`.\r\n  * It now has its own `go.mod`, which keeps its dependencies separate.\r\n* Docker:\r\n  * Included `openssh-client` (ssh) and `dlv` (Go debugger) by default.\r\n* `SimpleGo` (\"go\") backend:\r\n  * Fixed mishandling of multi-output operations and a race condition on parallel execution (#197)\r\n  * Refactoring and cleanup of execution loops.\r\n  * Separated `TestDotGeneral_PerformanceTable` behind the build tag `perf`.\r\n","2025-08-16T09:07:39",{"id":214,"version":215,"summary_zh":216,"released_at":217},214593,"v0.21.0","* Package `simplego`:\r\n  * Added `GetBackend` that returns a singleton backend, created with the default configuration at the first request.\r\n* Package `ui\u002Fcommandline`:\r\n  * Added optional extra arbitrary metrics to print in the command-line with `AttachProgressBar`.\r\n  * Added `FormatDuration` to pretty-print duration.\r\n* Package `graph`\r\n  * Added gradients of `Cos` and `Sin` that were missing.\r\n  * Fixed (removed) the extra empty line in auto-generated function comments that was preventing the documentation\r\n     from being assigned to the functions.\r\n  * Added parameters `sorted` and `unique` to `Scatter` (like the other functions `Scatter*`) -- **Small API change**.\r\n  * Added `ScatterUpdate`, for now only for `unique=true`.\r\n  * Package `nanlogger`:\r\n    * Also allows report-only traces.\r\n    * Created context parameter `optimizer.ParamNanLogger`: if set to NanLogger, it will trace all occurrences\r\n      of NaN values in gradient: great to 
debug where the NaN values first appear in the model.\r\n* Package `ml\u002Ftrain`:\r\n  * Improved support for accumulated gradients. Fixed evaluation (context reuse) when using accumulated gradients.\r\n  * Added `Trainer.WithMaxExecutors`.\r\n* Package `ml\u002Ftrain\u002Fmetrics`:\r\n  * `MeanMetric` allows for disabling dynamic batch weighting.  API slightly changed: `NewMeanMetric` now\r\n    returns a `MeanMetric` struct, not an interface.\r\n  * Added `StreamingMedianMetric`.\r\n* Package `ml\u002Ftrain\u002Foptimizers`:\r\n  * Added `RMSProp()` optimizer.\r\n* Package `ml\u002Flayers`:\r\n  * Added normalizing 1\u002Fsqrt(d_k) factor to attention logits in the MultiHeadAttention layer: this will break current\r\n    models using it.\r\n  * Added `RMSNorm` normalizer.\r\n* `gomlx_checkpoints` command-line tool:\r\n  * Added support for multiple models to allow comparing models.\r\n  * Fixed the printing of metrics with tiny values.\r\n* Package `context`:\r\n  * Allow VariableInitializers to use the `context.Context` itself, with its own random initializer.\r\n  * `DefaultInitializer` now creates an initializer. The new default uses the He initializer, the same one used in PyTorch.\r\n  * Package `initializers`:\r\n    * They now use the `context` random number generator state, which simplifies things. \r\n    * `ParamInitialSeed` removed, since the RNG is initialized by `Context.RngStateWithSeed()`.\r\n* Fixed some flaky tests.\r\n","2025-07-01T10:29:41",{"id":219,"version":220,"summary_zh":221,"released_at":222},214594,"v0.20.1","* Package `train`:\r\n  * Better handling of loss (without regularization) in metrics. Added `SetLossNoRegularization` and `GetLossNoRegularization`.\r\n  * Added `Trainer.AccumulateGradients(n)` to accumulate n steps of gradients before applying them. 
This is useful if \r\n    the desired batch size doesn't fit in memory, so it accumulates the gradients until the virtual batch size gradient\r\n    is calculated.\r\n* Package `optimizers`:\r\n  * Added support for the new `train.OptimizeWithGradients` interface, to support gradient accumulators. \r\n  * Cleaned up `StochasticGradientDescent` API. Added option to disable decay for testing.\r\n* Package `vnn`:\r\n  * Added `Config.Scaler` to add a scaler operator just after the linear projection of a layer. It allows the VNN\r\n    to operate on magnitude-independent vectors.\r\n  * Fixed the `LayerNormalization`, to make it more stable in backprop.\r\n  * Fixed `Relu`: added support for non-shared non-linearities and a \"leak\" parameter (\"vnn_relu_negative_slope\").\r\n  * Added `VNN().ActivationFn()` to allow setting arbitrary activation functions.\r\n* Package `types\u002Ftensors\u002Fnumpy`:\r\n  * Added support for \"Fortran order\" files.\r\n* Package `tensors`:\r\n  * Attempting to finalize an \"on-device\" tensor whose backend has already been finalized is now a no-op -- as opposed to a panic.\r\n  * Access to an on-device or shared buffer now checks that the backend hasn't been finalized.\r\n    And if it has, it panics with a meaningful error message.\r\n  * Added integration tests.\r\n","2025-06-12T07:26:44",{"id":224,"version":225,"summary_zh":226,"released_at":227},214595,"v0.20.0","* Package `backends`:\r\n  * **API CHANGE**: Method `NewWithConfig()` changed\r\n  * Method **`New()` will be changed to return an error (as opposed to panic) in the next version**.\r\n    Temporarily the methods `MustNew()` (which panics on errors, like today) and `NewOrErr` (which returns\r\n    an error) were created to have a clear API, and `New()` was marked as deprecated. 
In the next version\r\n    `New()` will change the API.\r\n  * Added `IsFinalized()` to the Backend API, to better handle attempts to access finalized backends.\r\n  * Fixed bug in `xla` backend where an error was not being sent when Backend was already finalized.\r\n* Package `types\u002Ftensors\u002Fnumpy` with methods to read and write tensors from\u002Fto `.npy` and `.npz` files.\r\n* Package `simplego`:\r\n  * Fixed bug introduced in the parallelized version of Erf(x).\r\n* Package `tensors`:\r\n  * Added `Tensor.ToLocal()` to detach a tensor from its backend.\r\n* Package `ui\u002Fgonb\u002Fplotly`: \r\n  * Updated dependencies to new go-plotly v0.7.0 (many changes to the API), while preserving as much as possible\r\n    the GoMLX API offered.\r\n* Updated example notebooks to use `github.com\u002Fgomlx\u002Fgomlx\u002Fbackends\u002Fdefault` (instead of only `\u002Fxla`) and to\r\n  use the new `backends.MustNew()`.","2025-06-03T16:47:46",{"id":229,"version":230,"summary_zh":231,"released_at":232},214596,"v0.19.5","This release speeds up SimpleGo, in some cases by a couple of orders of magnitude. It should now be ~5x to 20x slower than the state of the art (oneDNN), which is a good place to be -- it provides portability and C-independence, and works just fine for many use cases.\r\n\r\nSIMD is not here yet, but we look forward to providing experimental support once 1.25 is out.\r\n\r\nFrom the CHANGELOG:\r\n\r\n* Package `simplego`, the pure Go backend:\r\n  * Added several benchmarks for SimpleGo DotGeneral. 
Run with:\r\n    `go test .\u002Fbackends\u002Fsimplego\u002F -test.v -test.run PerformanceTable -perf`\r\n  * DotGeneral reimplemented in 2 different versions:\r\n    * Version for small inner matrices, with block iteration and loop unrolling.\r\n    * Version for larger inner matrices: re-package inputs in ~4K blocks, and recursively partition matrices.\r\n    * Added parallelization: at batch level and in the partitioning in the larger matrices.\r\n  * Parallel execution of the Ops: that helps a lot during training (cut the training time almost in half for the adult \r\n    dataset), but it may hurt inference if you are running many batches in parallel. \r\n    So it dynamically decides to run sequentially or in parallel depending on the number of computations\r\n    being executed concurrently.\r\n    Added also configurations `GOMLX_BACKEND=go:ops_sequential` and `GOMLX_BACKEND=go:ops_parallel` \r\n    to force one type of execution or another.\r\n  * Parallelized Erf(x): this will become a model on how to parallelize other unary functions -- probably\r\n    when SIMD is available.\r\n","2025-05-30T11:07:19",{"id":234,"version":235,"summary_zh":236,"released_at":237},214597,"v0.19.4","* Vector Neural Networks (VNN): allows one to build 3D rotation (SO(3)) equivariant and\u002For invariant networks. 
See package `ml\u002Flayers\u002Fvnn`.\r\n* Package `xla`\r\n  * Remove dependencies to `gopjrt` internal protos: requires updated `Gopjrt`.\r\n* Package `tensors`\r\n  * Fixed pretty-print of booleans.\r\n","2025-05-24T08:51:07",{"id":239,"version":240,"summary_zh":241,"released_at":242},214598,"v0.19.3","(This release merges the v0.19.2, which was not properly released)\r\n\r\n* Package `simplego`:\r\n  * Fixed `Gather` of scalar values.\r\n  * Fixed `Where` checking of shape.\r\n  * New ops: `NotEqual`, `Erf`, `ArgMinMax`, `ReduceWindow`, `ReduceBitwise{And,Or,Xor}` and \r\n    `ReduceLogical{And,Or,Xor}`\r\n  * Fixed initialization of re-used buffers where needed.\r\n* Package `backends\u002Fdefault`:\r\n  * Only include XLA by default on linux\u002Famd64 platforms.\r\n* Package `shapeinference`:\r\n  * Changed to return errors instead of exceptions.\r\n* Package `types\u002Ftensors`:\r\n  * Removed dependency to `gopjrt\u002Fpjrt` -- otherwise we'll always need to install the C\u002FC++ library.\r\n* Package `types\u002Fshape`:\r\n  * Added `Shape.Iter()` and `Shape.IterOn()`.\r\n* Package `backend`:\r\n  * `Backend` interface now returns errors instead of panicking.\r\n* Package `graph`:\r\n  * Added `NewExecOrError` and `Exec.CallOrError` as error-returning alternatives.\r\n* gofmt cleanups by @zjtv\r\n","2025-05-20T08:40:33",{"id":244,"version":245,"summary_zh":246,"released_at":247},214599,"v0.19.2","* GoMLX now runs on Windows natively!\r\n* Package `simplego`:\r\n  * Fixed `Gather` of scalar values.\r\n  * Fixed `Where` checking of shape.\r\n  * Added `Erf` op.\r\n* Package `types\u002Ftensors`:\r\n  * Removed dependency to `gopjrt\u002Fpjrt` -- otherwise we'll always need to install the C\u002FC++ library.\r\n","2025-05-01T12:15:54",{"id":249,"version":250,"summary_zh":251,"released_at":252},214600,"v0.19.1","* `go mod tidy`\r\n* Package `simplego`:\r\n  * \"not implemented\" error now includes the name of the corresponding method that was not 
implemented.\r\n  * Several memory fixes.\r\n  * Added `Slice` and `RngBitsGenerator` ops.\r\n* Updated to Gopjrt v0.7.0, with more memory fixes. **Requires an update of the C++ libraries**.\r\n","2025-04-30T19:01:48"]