[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-lablup--backend.ai":3,"tool-lablup--backend.ai":64},[4,18,28,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":24,"last_commit_at":25,"category_tags":26,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,2,"2026-04-06T19:52:38",[27,14],"插件",{"id":29,"name":30,"github_repo":31,"description_zh":32,"stars":33,"difficulty_score":24,"last_commit_at":34,"category_tags":35,"status":17},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85052,"2026-04-08T11:03:08",[15,16,36,27,13,37,38,14,39],"视频","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":17},2181,"OpenHands","OpenHands\u002FOpenHands","OpenHands 是一个专注于 AI 驱动开发的开源平台，旨在让智能体（Agent）像人类开发者一样理解、编写和调试代码。它解决了传统编程中重复性劳动多、环境配置复杂以及人机协作效率低等痛点，通过自动化流程显著提升开发速度。\n\n无论是希望提升编码效率的软件工程师、探索智能体技术的研究人员，还是需要快速原型验证的技术团队，都能从中受益。OpenHands 提供了灵活多样的使用方式：既可以通过命令行（CLI）或本地图形界面在个人电脑上轻松上手，体验类似 Devin 的流畅交互；也能利用其强大的 Python SDK 自定义智能体逻辑，甚至在云端大规模部署上千个智能体并行工作。\n\n其核心技术亮点在于模块化的软件智能体 SDK，这不仅构成了平台的引擎，还支持高度可组合的开发模式。此外，OpenHands 在 SWE-bench 基准测试中取得了 77.6% 的优异成绩，证明了其解决真实世界软件工程问题的能力。平台还具备完善的企业级功能，支持与 Slack、Jira 等工具集成，并提供细粒度的权限管理，适合从个人开发者到大型企业的各类用户场景。",70806,"2026-04-08T11:10:08",[38,13,14,27],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":24,"last_commit_at":54,"category_tags":55,"status":17},51,"gstack","garrytan\u002Fgstack","gstack 是 Y Combinator CEO Garry Tan 亲自开源的一套 AI 工程化配置，旨在将 Claude Code 升级为你的虚拟工程团队。面对单人开发难以兼顾产品战略、架构设计、代码审查及质量测试的挑战，gstack 提供了一套标准化解决方案，帮助开发者实现堪比二十人团队的高效产出。\n\n这套配置特别适合希望提升交付效率的创始人、技术负责人，以及初次尝试 Claude Code 的开发者。gstack 的核心亮点在于内置了 15 个具有明确职责的 AI 角色工具，涵盖 CEO、设计师、工程经理、QA 等职能。用户只需通过简单的斜杠命令（如 `\u002Freview` 进行代码审查、`\u002Fqa` 执行测试、`\u002Fplan-ceo-review` 规划功能），即可自动化处理从需求分析到部署上线的全链路任务。\n\n所有操作基于 Markdown 和斜杠命令，无需复杂配置，完全免费且遵循 MIT 协议。gstack 不仅是一套工具集，更是一种现代化的软件工厂实践，让单人开发者也能拥有严谨的工程流程。",66972,"2026-04-08T11:10:00",[13,27],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":24,"last_commit_at":62,"category_tags":63,"status":17},3074,"gpt4free","xtekky\u002Fgpt4free","gpt4free 
是一个由社区驱动的开源项目，旨在聚合多种可访问的大型语言模型（LLM）和媒体生成接口，让用户能更灵活、便捷地使用前沿 AI 能力。它解决了直接调用各类模型时面临的接口分散、门槛高或成本昂贵等痛点，通过统一的标准将不同提供商的资源整合在一起。\n\n无论是希望快速集成 AI 功能的开发者、需要多模型对比测试的研究人员，还是想免费体验最新技术的普通用户，都能从中受益。gpt4free 提供了丰富的使用方式：既包含易于上手的 Python 和 JavaScript 客户端库，也支持部署本地图形界面（GUI），更提供了兼容 OpenAI 标准的 REST API，方便无缝替换现有应用后端。\n\n其技术亮点在于强大的多提供商支持架构，能够动态调度包括 Opus、Gemini、DeepSeek 等多种主流模型资源，并支持 Docker 一键部署及本地推理。项目秉持社区优先原则，在降低使用门槛的同时，也为贡献者提供了扩展新接口的便利框架，是探索和利用多样化 AI 资源的实用工具。",65970,"2026-04-04T01:02:03",[27,38,13],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":80,"owner_email":81,"owner_twitter":80,"owner_website":82,"owner_url":83,"languages":84,"stars":121,"forks":122,"last_commit_at":123,"license":124,"difficulty_score":125,"env_os":126,"env_gpu":127,"env_ram":128,"env_deps":129,"category_tags":142,"github_topics":143,"view_count":24,"oss_zip_url":80,"oss_zip_packed_at":80,"status":17,"created_at":155,"updated_at":156,"faqs":157,"releases":186},5527,"lablup\u002Fbackend.ai","backend.ai","Backend.AI is a streamlined, container-based computing cluster platform that hosts popular computing\u002FML frameworks and diverse programming languages, with pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU, Gaudi NPU, Google TPU, GraphCore IPU and other NPUs.","Backend.AI 是一个基于容器的高效计算集群平台，旨在为人工智能和科学计算提供灵活的资源管理方案。它解决了在多用户环境下，如何统一调度异构硬件资源（如 NVIDIA\u002FAMD GPU、Google TPU 及各种国产 NPU）并隔离运行环境的难题，让用户能够按需或批量启动包含各类主流框架和编程语言的计算会话。\n\n该平台特别适合需要高性能算力的开发者、数据科学家及研究人员使用。无论是进行模型训练、数据分析还是算法验证，用户都能通过 Backend.AI 轻松获取所需的计算资源，而无需关心底层基础设施的复杂配置。\n\nBackend.AI 的核心亮点在于其自研的\"Sokovan\"编排器，能够智能分配和隔离底层资源，支持高度定制化的作业调度。同时，它提供了完善的 REST 和 GraphQL API 接口，便于系统集成与自动化管理。在访问体验上，Backend.AI 支持通过浏览器直接连接容器内的 Jupyter、Web 终端，甚至允许通过 SSH 或 VSCode 远程开发，实现了安全便捷的“云端本地化”操作体验。对于希望构建私有 AI 
算力池或优化现有集群效率的团队来说，这是一个功能强","Backend.AI 是一个基于容器的高效计算集群平台，旨在为人工智能和科学计算提供灵活的资源管理方案。它解决了在多用户环境下，如何统一调度异构硬件资源（如 NVIDIA\u002FAMD GPU、Google TPU 及各种国产 NPU）并隔离运行环境的难题，让用户能够按需或批量启动包含各类主流框架和编程语言的计算会话。\n\n该平台特别适合需要高性能算力的开发者、数据科学家及研究人员使用。无论是进行模型训练、数据分析还是算法验证，用户都能通过 Backend.AI 轻松获取所需的计算资源，而无需关心底层基础设施的复杂配置。\n\nBackend.AI 的核心亮点在于其自研的\"Sokovan\"编排器，能够智能分配和隔离底层资源，支持高度定制化的作业调度。同时，它提供了完善的 REST 和 GraphQL API 接口，便于系统集成与自动化管理。在访问体验上，Backend.AI 支持通过浏览器直接连接容器内的 Jupyter、Web 终端，甚至允许通过 SSH 或 VSCode 远程开发，实现了安全便捷的“云端本地化”操作体验。对于希望构建私有 AI 算力池或优化现有集群效率的团队来说，这是一个功能强大且扩展性极佳的选择。","Backend.AI\n==========\n\n[![PyPI release version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fbackend.ai-manager.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fbackend.ai-manager\u002F)\n![Supported Python versions](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fbackend.ai-manager.svg)\n![Wheels](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fwheel\u002Fbackend.ai-manager.svg)\n[![Gitter](https:\u002F\u002Fbadges.gitter.im\u002Flablup\u002Fbackend.ai.svg)](https:\u002F\u002Fgitter.im\u002Flablup\u002Fbackend.ai)\n\nBackend.AI is a streamlined, container-based computing cluster platform\nthat hosts popular computing\u002FML frameworks and diverse programming languages,\nwith pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU,\nRebellions, FuriosaAI, HyperAccel, Google TPU, Graphcore IPU and other NPUs.\n\nIt allocates and isolates the underlying computing resources for multi-tenant\ncomputation sessions on-demand or in batches with customizable job schedulers with its own orchestrator named \"Sokovan\".\n\nAll its functions are exposed as REST and GraphQL APIs.\n\n\nRequirements\n------------\n\n### Python & Build Tools\n\n- **Python**: 3.13.x (main branch requires CPython 3.13.7)\n- **Pantsbuild**: 2.27.x\n- See [full version compatibility table](src\u002Fai\u002Fbackend\u002FREADME.md#development-setup)\n\n### Infrastructure\n\n**Required**:\n- Docker 20.10+ (with Compose 
v2)\n- PostgreSQL 16+ (tested with 16.3)\n- Redis 7.2+ (tested with 7.2.11)\n- etcd 3.5+ (tested with 3.5.14)\n- Prometheus 3.x (tested with 3.1.0)\n\n**Recommended** (for observability):\n- Grafana 11.x (tested with 11.4.0)\n- Loki 3.x (tested with 3.5.0)\n- Tempo 2.x (tested with 2.7.2)\n- OpenTelemetry Collector\n\n→ Detailed infrastructure setup: [Infrastructure Documentation](src\u002Fai\u002Fbackend\u002FREADME.md#infrastructure-layer)\n\n### System\n\n- **OS**: Linux (Debian\u002FRHEL-based) or macOS\n- **Permissions**: sudo access for installation\n- **Resources**: 4+ CPU cores, 8GB+ RAM recommended for development\n\n\nGetting Started\n---------------\n\n### Quick Start (Development)\n\n#### 1. Clone and Install\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai.git\ncd backend.ai\n.\u002Fscripts\u002Finstall-dev.sh\n```\n\nThis script will:\n- Check required dependencies (Docker, Python, etc.)\n- Set up Python virtual environment with Pantsbuild\n- Start halfstack infrastructure (PostgreSQL, Redis, etcd, Grafana, etc.)\n- Initialize database schemas\n- Create default API keypairs and user accounts\n\n#### 2. Start Backend.AI Services\n\nStart each component in separate terminals:\n\n**Manager** (Terminal 1):\n```bash\n.\u002Fbackend.ai mgr start-server --debug\n```\n\n**Agent** (Terminal 2):\n```bash\n.\u002Fbackend.ai ag start-server --debug\n```\n\n**Storage Proxy** (Terminal 3):\n```bash\n.\u002Fpy -m ai.backend.storage.server\n```\n\n**Web Server** (Terminal 4):\n```bash\n.\u002Fpy -m ai.backend.web.server\n```\n\n**App Proxy** (Terminal 5-6, optional for in-container service access):\n```bash\n.\u002Fbackend.ai app-proxy-coordinator start-server --debug\n.\u002Fbackend.ai app-proxy-worker start-server --debug\n```\n\n#### 3. 
Run Your First Session\n\nSet up client environment:\n```bash\nsource env-local-user-session.sh\n# This script prints your default User ID and Password;\n.\u002Fbackend.ai login\n# When prompted, enter the User ID and Password shown above.\n```\n\nRun a simple Python session:\n```bash\n.\u002Fbackend.ai run python -c \"print('Hello Backend.AI!')\"\n```\n\nOr access Web UI at **http:\u002F\u002Flocalhost:8090** with credentials from `env-local-*.sh` files.\n\n### Accessing Compute Sessions (aka Kernels)\n\nBackend.AI provides websocket tunneling into individual computation sessions (containers),\nso that users can use their browsers and client CLI to access in-container applications directly\nin a secure way.\n\n* Jupyter: data scientists' favorite tool\n   * Most container images have intrinsic Jupyter and JupyterLab support.\n* Web-based terminal\n   * All container sessions have intrinsic ttyd support.\n* SSH\n   * All container sessions have intrinsic SSH\u002FSFTP\u002FSCP support with auto-generated per-user SSH keypair.\n     PyCharm and other IDEs can use on-demand sessions using SSH remote interpreters.\n* VSCode\n   * Most container sessions have intrinsic web-based VSCode support.\n\n### Working with Storage\n\nBackend.AI provides an abstraction layer on top of existing network-based storages\n(e.g., NFS\u002FSMB), called vfolders (virtual folders).\nEach vfolder works like a cloud storage that can be mounted into any computation\nsessions and shared between users and user groups with differentiated privileges.\n\n### Installation for Multi-node Tests & Production\n\nPlease consult [our documentation](http:\u002F\u002Fdocs.backend.ai) for community-supported materials.\nContact the sales team (contact@lablup.com) for professional paid support and deployment options.\n\n\nArchitecture\n------------\n\nFor comprehensive system architecture, component interactions, and infrastructure details, see:\n\n**[Component Architecture 
Documentation](src\u002Fai\u002Fbackend\u002FREADME.md)**\n\nThis document covers:\n- System architecture diagrams and component flow\n- Port numbers and infrastructure setup\n- Component dependencies and communication protocols\n- Development and production environment configuration\n\n\nContents in This Repository\n---------------------------\n\nThis repository contains all open-source server-side components and the client SDK for Python\nas a reference implementation of API clients.\n\n### Directory Structure\n\n* `src\u002Fai\u002Fbackend\u002F`: Source codes\n  - `manager\u002F`: Manager as the cluster control-plane\n  - `manager\u002Fapi`: Manager API handlers\n  - `account_manager\u002F`: Unified user profile and SSO management\n  - `agent\u002F`: Agent as per-node controller\n  - `agent\u002Fdocker\u002F`: Agent's Docker backend\n  - `agent\u002Fk8s\u002F`: Agent's Kubernetes backend\n  - `agent\u002Fdummy\u002F`: Agent's dummy backend\n  - `kernel\u002F`: Agent's kernel runner counterpart\n  - `runner\u002F`: Agent's in-kernel prebuilt binaries\n  - `helpers\u002F`: Agent's in-kernel helper package\n  - `common\u002F`: Shared utilities\n  - `client\u002F`: Client SDK\n  - `cli\u002F`: Unified CLI for all components\n  - `install\u002F`: SCIE-based TUI installer\n  - `storage\u002F`: Storage proxy for offloading storage operations\n  - `storage\u002Fapi`: Storage proxy's manager-facing and client-facing APIs\n  - `appproxy\u002F`: App proxy for accessing container apps from outside\n  - `appproxy\u002Fcoordinator`: App proxy coordinator who provisions routing circuits\n  - `appproxy\u002Fworker`: App proxy worker who forwards the traffic\n  - `web\u002F`: Web UI server\n    - `static\u002F`: Backend.AI WebUI release artifacts\n  - `logging\u002F`: Logging subsystem\n  - `plugin\u002F`: Plugin subsystem\n  - `test\u002F`: Integration test suite\n  - `testutils\u002F`: Shared utilities used by unit tests\n  - `meta\u002F`: Legacy meta package\n  - 
`accelerator\u002F`: Intrinsic accelerator plugins\n* `docs\u002F`: Unified documentation\n* `tests\u002F`\n  - `manager\u002F`, `agent\u002F`, ...: Per-component unit tests\n* `configs\u002F`\n  - `manager\u002F`, `agent\u002F`, ...: Per-component sample configurations\n* `docker\u002F`: Dockerfiles for auxiliary containers\n* `fixtures\u002F`\n  - `manager\u002F`, ...: Per-component fixtures for development setup and tests\n* `plugins\u002F`: A directory to place plugins such as accelerators, monitors, etc.\n* `scripts\u002F`: Scripts to assist development workflows\n  - `install-dev.sh`: The single-node development setup script from the working copy\n* `stubs\u002F`: Type annotation stub packages written by us\n* `tools\u002F`: A directory to host Pants-related tooling\n* `dist\u002F`: A directory to put build artifacts (.whl files) and Pants-exported virtualenvs\n* `changes\u002F`: News fragments for towncrier\n* `pants.toml`: The Pants configuration\n* `pyproject.toml`: Tooling configuration (towncrier, pytest, mypy)\n* `BUILD`: The root build config file\n* `**\u002FBUILD`: Per-directory build config files\n* `BUILD_ROOT`: An indicator to mark the build root directory for Pants\n* `CLAUDE.md`: The steering guide for agent-assisted development\n* `requirements.txt`: The unified requirements file\n* `*.lock`, `tools\u002F*.lock`: The dependency lock files\n* `docker-compose.*.yml`: Per-version recommended halfstack container configs\n* `README.md`: This file\n* `MIGRATION.md`: The migration guide for updating between major releases\n* `VERSION`: The unified version declaration\n\nServer-side components are licensed under LGPLv3 to promote non-proprietary open\ninnovation in the open-source community while other shared libraries and client SDKs\nare distributed under the MIT license.\n\nThere is no obligation to open your service\u002Fsystem codes if you just run the\nserver-side components as-is (e.g., just run as daemons or import the components\nwithout 
modification in your codes).\nPlease contact us (contact-at-lablup-com) for commercial consulting and more\nlicensing details\u002Foptions about individual use-cases.\n\n\nMajor Components\n----------------\n\nBackend.AI consists of the following core components:\n\n### Server-Side Components\n\n**[Manager](src\u002Fai\u002Fbackend\u002Fmanager\u002FREADME.md)** - Central API gateway and orchestrator\n- Routes REST\u002FGraphQL requests and orchestrates cluster operations\n- Session scheduling via Sokovan orchestrator\n- User authentication and RBAC authorization\n- Plugin interfaces: `backendai_scheduler_v10`, `backendai_agentselector_v10`, `backendai_hook_v20`, `backendai_webapp_v20`, `backendai_monitor_stats_v10`, `backendai_monitor_error_v10`\n- Legacy repo: https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-manager\n\n**[Agent](src\u002Fai\u002Fbackend\u002Fagent\u002FREADME.md)** - Kernel lifecycle management on compute nodes\n- Manages Docker containers (kernels) on individual nodes\n- Self-registers to cluster via heartbeats\n- Plugin interfaces: `backendai_accelerator_v21`, `backendai_monitor_stats_v10`, `backendai_monitor_error_v10`\n- Legacy repo: https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-agent\n\n**[Storage Proxy](src\u002Fai\u002Fbackend\u002Fstorage\u002FREADME.md)** - Virtual folder and storage backend abstraction\n- Unified interface for multiple storage backends\n- Real-time performance metrics and acceleration APIs\n- Legacy repo: https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-storage-proxy\n\n**[Webserver](src\u002Fai\u002Fbackend\u002Fweb\u002FREADME.md)** - Web UI hosting and session management\n- Hosts Backend.AI WebUI (SPA)\n- Session management and API request signing\n- Legacy repo: https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-webserver\n\n**Synchronizing the static Backend.AI WebUI version:**\n```console\n$ scripts\u002Fdownload-webui-release.sh \u003Ctarget version to download>\n```\n\n**[App 
Proxy](src\u002Fai\u002Fbackend\u002Fappproxy\u002Fcoordinator\u002FREADME.md)** - Service routing and load balancing\n- Routes traffic to in-container services (Jupyter, VSCode, etc.)\n- Dynamic circuit provisioning and health monitoring\n\n### Container Runtime Components\n\n**[Kernels](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-kernels)** - Container image recipes\n- Dockerfile-based computing environment recipes\n- Support for popular ML frameworks and programming languages\n\n**[Jail](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-jail)** - Programmable sandbox (Rust)\n- ptrace-based system call filtering\n- Resource control and security enforcement\n\n**[Hook](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-hook)** - In-container runtime library\n- libc overrides for resource control\n- Web-based interactive stdin support\n\n### Client SDK Libraries\n\nWe offer client SDKs in popular programming languages (MIT License):\n\n- **Python** - `pip install backend.ai-client` | [GitHub](src\u002Fai\u002Fbackend\u002Fclient) | Includes CLI\n- **Java** - [Releases](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-client-java)\n- **Javascript** - `npm install backend.ai-client` | [GitHub](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-client-js)\n- **PHP** - (under preparation) `composer require lablup\u002Fbackend.ai-client` | [GitHub](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-client-php)\n\n\nPlugins\n-------\n\nBackend.AI supports plugin-based extensibility via Python package entrypoints:\n\n**Accelerator Plugins** (`backendai_accelerator_v21`)\n- [CUDA](src\u002Fai\u002Fbackend\u002Faccelerator\u002Fcuda_open) - NVIDIA GPU support\n- [CUDA Mock](src\u002Fai\u002Fbackend\u002Faccelerator\u002Fcuda_mock) - Development without actual GPUs\n- [ROCm](src\u002Fai\u002Fbackend\u002Faccelerator\u002Frocm) - AMD GPU support\n- [Furiosa](src\u002Fai\u002Fbackend\u002Faccelerator\u002Ffuriosa) - Furiosa NPU 
(Warboy \u002F RNGD) support\n- [Hyperaccel](src\u002Fai\u002Fbackend\u002Faccelerator\u002Fhyperaccel) - Hyperaccel LPU support\n- [IPU](src\u002Fai\u002Fbackend\u002Faccelerator\u002Fipu) - Graphcore IPU support\n- [Rebellions](src\u002Fai\u002Fbackend\u002Faccelerator\u002Frebellions) - Rebellions NPU (ATOM, ATOM+, ATOM Max) support\n- [Tenstorrent](src\u002Fai\u002Fbackend\u002Faccelerator\u002Ftenstorrent) - Tenstorrent NPU (Wormhole, Blackhole) support\n- [TPU](src\u002Fai\u002Fbackend\u002Faccelerator\u002Ftpu) - Google TPU (v2, v3) support\n\n**Monitoring Plugins**\n- [`backendai_monitor_stats_v10`](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-monitor-datadog) - Datadog statistics collector\n- [`backendai_monitor_error_v10`](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-monitor-sentry) - Sentry exception collector\n\n\nLegacy Components\n-----------------\n\n**[Media Library](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-media)** - Multi-media output support (no longer maintained)\n\n**IDE Extensions** - (Deprecated: Use in-kernel Jupyter Lab, VSCode Server, or SSH instead)\n- [VSCode Live Code Runner](https:\u002F\u002Fgithub.com\u002Flablup\u002Fvscode-live-code-runner)\n- [Atom Live Code Runner](https:\u002F\u002Fgithub.com\u002Flablup\u002Fatom-live-code-runner)\n\nDevelopment\n-----------\n\n### Building Packages\n\nBuild Python wheels or SCIE (Self-Contained Installable Executables):\n\n```bash\n.\u002Fscripts\u002Fbuild-wheels.sh  # Build .whl packages\n.\u002Fscripts\u002Fbuild-scies.sh   # Build SCIE packages\n```\n\nPackages are placed in `dist\u002F` directory.\n\n### Code Quality Hooks\n\nBackend.AI uses Git pre-commit hooks to maintain code quality:\n\n```bash\n# Automatically runs on every commit:\n# - Auto-formatting (pants fmt)\n# - Linting (pants lint)\n\n# Bypass hooks if needed (use sparingly)\ngit commit --no-verify\n```\n\nThe pre-commit hook validates:\n- Code style and formatting\n\nType checking and 
tests run in CI for comprehensive coverage.\n\nSee [CLAUDE.md](CLAUDE.md#hooks-and-code-quality) for detailed hook system documentation.\n\n### Development Guide\n\nFor detailed development setup, build system usage, and contribution guidelines:\n- [Development Setup](src\u002Fai\u002Fbackend\u002FREADME.md#development-setup) - Python versions, Pantsbuild, dependency management\n- [CONTRIBUTING.md](.github\u002FCONTRIBUTING.md) - Contribution guidelines and development workflow\n- [MIGRATION.md](MIGRATION.md) - Migration guide for major version updates\n\n#### CLI Tab Completion for Development\n\nTo enable shell tab completion for the Backend.AI CLI during development:\n\n```bash\n# For bash\u002Fzsh (from repository root) - MUST use 'source', not execute directly\nsource scripts\u002Fsetup-dev-completion.sh\n\n# For fish shell\nsource scripts\u002Fsetup-dev-completion.fish\n\n# Or create a permanent alias\n# bash\u002Fzsh:\necho 'alias bai-dev=\"cd \u002Fpath\u002Fto\u002Fbackend.ai && source scripts\u002Fsetup-dev-completion.sh\"' >> ~\u002F.zshrc\n# fish:\necho 'alias bai-dev=\"cd \u002Fpath\u002Fto\u002Fbackend.ai; and source scripts\u002Fsetup-dev-completion.fish\"' >> ~\u002F.config\u002Ffish\u002Fconfig.fish\n```\n\nThis will enable tab completion for:\n- `backend.ai \u003Ctab>` - Show all commands\n- `backend.ai session \u003Ctab>` - Session management\n- `backend.ai admin \u003Ctab>` - Admin operations\n- `backend.ai --\u003Ctab>` - Global options\n\nSupports **bash**, **zsh**, and **fish** shells.\n\nLicense\n-------\n\nRefer to [LICENSE file](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002Fmain\u002FLICENSE).\n","Backend.AI\n==========\n\n[![PyPI 发布版本](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fbackend.ai-manager.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fbackend.ai-manager\u002F)\n![支持的 Python 
版本](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fbackend.ai-manager.svg)\n![轮子包](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fwheel\u002Fbackend.ai-manager.svg)\n[![Gitter](https:\u002F\u002Fbadges.gitter.im\u002Flablup\u002Fbackend.ai.svg)](https:\u002F\u002Fgitter.im\u002Flablup\u002Fbackend.ai)\n\nBackend.AI 是一个精简的、基于容器的计算集群平台，支持主流的计算和机器学习框架以及多种编程语言，并且能够插拔式地接入异构加速器，包括 CUDA GPU、ROCm GPU、Rebellions、FuriosaAI、HyperAccel、Google TPU、Graphcore IPU 等 NPU。\n\n它通过名为“Sokovan”的编排器，利用自定义的任务调度器，按需或批量为多租户计算会话分配并隔离底层计算资源。\n\n其所有功能均以 REST 和 GraphQL API 的形式对外暴露。\n\n\n要求\n----\n\n### Python 及构建工具\n\n- **Python**: 3.13.x（主分支需要 CPython 3.13.7）\n- **Pantsbuild**: 2.27.x\n- 请参阅 [完整的版本兼容性表](src\u002Fai\u002Fbackend\u002FREADME.md#development-setup)\n\n### 基础设施\n\n**必需**：\n- Docker 20.10+（带 Compose v2）\n- PostgreSQL 16+（已测试 16.3）\n- Redis 7.2+（已测试 7.2.11）\n- etcd 3.5+（已测试 3.5.14）\n- Prometheus 3.x（已测试 3.1.0）\n\n**推荐**（用于可观测性）：\n- Grafana 11.x（已测试 11.4.0）\n- Loki 3.x（已测试 3.5.0）\n- Tempo 2.x（已测试 2.7.2）\n- OpenTelemetry 收集器\n\n→ 详细的基础设施设置：[基础设施文档](src\u002Fai\u002Fbackend\u002FREADME.md#infrastructure-layer)\n\n### 系统\n\n- **操作系统**：Linux（基于 Debian\u002FRHEL）或 macOS\n- **权限**：安装时需具有 sudo 权限\n- **资源**：建议开发环境至少配备 4 核 CPU 和 8GB 内存\n\n\n快速入门\n--------\n\n### 快速入门（开发）\n\n#### 1. 克隆并安装\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai.git\ncd backend.ai\n.\u002Fscripts\u002Finstall-dev.sh\n```\n\n该脚本将执行以下操作：\n- 检查所需的依赖项（Docker、Python 等）\n- 使用 Pantsbuild 设置 Python 虚拟环境\n- 启动半堆栈基础设施（PostgreSQL、Redis、etcd、Grafana 等）\n- 初始化数据库模式\n- 创建默认的 API 密钥对和用户账户\n\n#### 2. 
启动 Backend.AI 服务\n\n在不同的终端中分别启动各个组件：\n\n**管理器**（终端 1）：\n```bash\n.\u002Fbackend.ai mgr start-server --debug\n```\n\n**代理**（终端 2）：\n```bash\n.\u002Fbackend.ai ag start-server --debug\n```\n\n**存储代理**（终端 3）：\n```bash\n.\u002Fpy -m ai.backend.storage.server\n```\n\n**Web 服务器**（终端 4）：\n```bash\n.\u002Fpy -m ai.backend.web.server\n```\n\n**应用代理**（终端 5-6，可选，用于容器内服务访问）：\n```bash\n.\u002Fbackend.ai app-proxy-coordinator start-server --debug\n.\u002Fbackend.ai app-proxy-worker start-server --debug\n```\n\n#### 3. 运行您的第一个会话\n\n设置客户端环境：\n```bash\nsource env-local-user-session.sh\n# 该脚本会打印您的默认用户 ID 和密码；\n.\u002Fbackend.ai login\n# 在提示时输入上述显示的用户 ID 和密码。\n```\n\n运行一个简单的 Python 会话：\n```bash\n.\u002Fbackend.ai run python -c \"print('Hello Backend.AI!')\"\n```\n\n或者使用 `env-local-*.sh` 文件中的凭据，访问 Web UI：**http:\u002F\u002Flocalhost:8090**。\n\n### 访问计算会话（即内核）\n\nBackend.AI 提供了到各个计算会话（容器）的 WebSocket 隧道，使用户能够通过浏览器和客户端 CLI 安全地直接访问容器内的应用程序。\n\n* Jupyter：数据科学家最喜爱的工具\n   * 大多数容器镜像都内置了 Jupyter 和 JupyterLab 支持。\n* 基于 Web 的终端\n   * 所有容器会话都内置了 ttyd 支持。\n* SSH\n   * 所有容器会话都内置了 SSH\u002FSFTP\u002FSCP 支持，并自动生成每个用户的 SSH 密钥对。\n     PyCharm 和其他 IDE 可以使用 SSH 远程解释器来创建按需会话。\n* VSCode\n   * 大多数容器会话都内置了基于 Web 的 VSCode 支持。\n\n### 存储操作\n\nBackend.AI 在现有的网络存储（如 NFS\u002FSMB）之上提供了一个抽象层，称为 vfolder（虚拟文件夹）。\n每个 vfolder 就像一个云存储，可以挂载到任何计算会话中，并可在用户和用户组之间共享，同时赋予不同的权限。\n\n### 多节点测试及生产环境的安装\n\n请参阅我们的 [文档](http:\u002F\u002Fdocs.backend.ai)，获取社区支持的相关资料。\n如需专业的付费支持和部署方案，请联系销售团队（contact@lablup.com）。\n\n\n架构\n----\n\n有关完整的系统架构、组件交互及基础设施细节，请参阅：\n\n**[组件架构文档](src\u002Fai\u002Fbackend\u002FREADME.md)**\n\n本文档涵盖：\n- 系统架构图和组件流程\n- 端口号和基础设施设置\n- 组件依赖关系和通信协议\n- 开发与生产环境配置\n\n\n本仓库内容\n------------\n\n本仓库包含所有开源的服务器端组件以及 Python 客户端 SDK，\n作为 API 客户端的参考实现。\n\n### 目录结构\n\n* `src\u002Fai\u002Fbackend\u002F`: 源代码\n  - `manager\u002F`: 管理器作为集群控制平面\n  - `manager\u002Fapi`: 管理器的 API 处理程序\n  - `account_manager\u002F`: 统一的用户档案和 SSO 管理\n  - `agent\u002F`: 代理作为每个节点的控制器\n  - `agent\u002Fdocker\u002F`: 代理的 Docker 后端\n  - `agent\u002Fk8s\u002F`: 
代理的 Kubernetes 后端\n  - `agent\u002Fdummy\u002F`: 代理的虚拟后端\n  - `kernel\u002F`: 代理的内核运行时对应组件\n  - `runner\u002F`: 代理在内核中预编译的二进制文件\n  - `helpers\u002F`: 代理在内核中的辅助工具包\n  - `common\u002F`: 共享工具\n  - `client\u002F`: 客户端 SDK\n  - `cli\u002F`: 所有组件的统一命令行界面\n  - `install\u002F`: 基于 SCIE 的 TUI 安装程序\n  - `storage\u002F`: 存储代理，用于卸载存储操作\n  - `storage\u002Fapi`: 存储代理面向管理器和客户端的 API\n  - `appproxy\u002F`: 应用代理，用于从外部访问容器应用\n  - `appproxy\u002Fcoordinator`: 应用代理协调器，负责配置路由电路\n  - `appproxy\u002Fworker`: 应用代理工作节点，负责转发流量\n  - `web\u002F`: Web UI 服务器\n    - `static\u002F`: Backend.AI WebUI 发布产物\n  - `logging\u002F`: 日志子系统\n  - `plugin\u002F`: 插件子系统\n  - `test\u002F`: 集成测试套件\n  - `testutils\u002F`: 单元测试中使用的共享工具\n  - `meta\u002F`: 旧版元数据包\n  - `accelerator\u002F`: 内置加速器插件\n* `docs\u002F`: 统一文档\n* `tests\u002F`\n  - `manager\u002F`, `agent\u002F`, ...: 各组件的单元测试\n* `configs\u002F`\n  - `manager\u002F`, `agent\u002F`, ...: 各组件的示例配置\n* `docker\u002F`: 辅助容器的 Dockerfile\n* `fixtures\u002F`\n  - `manager\u002F`, ...: 各组件用于开发环境搭建和测试的 fixture 数据\n* `plugins\u002F`: 用于放置加速器、监控器等插件的目录\n* `scripts\u002F`: 用于辅助开发流程的脚本\n  - `install-dev.sh`: 从工作副本进行单节点开发环境搭建的脚本\n* `stubs\u002F`: 我们编写的类型注解存根包\n* `tools\u002F`: 用于存放与 Pants 工具相关的文件的目录\n* `dist\u002F`: 用于存放构建产物（.whl 文件）和由 Pants 导出的虚拟环境的目录\n* `changes\u002F`: 用于 towncrier 的新闻片段\n* `pants.toml`: Pants 构建工具的配置文件\n* `pyproject.toml`: 工具配置文件（towncrier、pytest、mypy）\n* `BUILD`: 根级构建配置文件\n* `**\u002FBUILD`: 各目录的构建配置文件\n* `BUILD_ROOT`: 用于标记 Pants 构建根目录的标识符\n* `CLAUDE.md`: AI 智能体辅助开发的指导手册\n* `requirements.txt`: 统一的依赖清单\n* `*.lock`, `tools\u002F*.lock`: 依赖锁定文件\n* `docker-compose.*.yml`: 各版本推荐的半栈容器配置\n* `README.md`: 当前文件\n* `MIGRATION.md`: 主要版本间升级的迁移指南\n* `VERSION`: 统一的版本声明\n\n服务器端组件采用 LGPLv3 许可证，旨在促进开源社区中的非专有开放创新；而其他共享库和客户端 SDK 则采用 MIT 许可证。\n\n如果您仅按原样运行服务器端组件（例如，直接以守护进程方式运行，或在您的代码中不加修改地导入这些组件），则没有义务公开您的服务或系统源代码。\n如需商业咨询及针对具体使用场景的更多许可细节和选项，请联系我们（contact-at-lablup-com）。\n\n\n主要组件\n----------------\n\nBackend.AI 由以下核心组件组成：\n\n### 
服务器端组件\n\n**[管理器](src\u002Fai\u002Fbackend\u002Fmanager\u002FREADME.md)** - 中央 API 网关和编排器\n- 路由 REST\u002FGraphQL 请求并编排集群操作\n- 通过 Sokovan 编排器进行会话调度\n- 用户身份验证和 RBAC 授权\n- 插件接口：`backendai_scheduler_v10`、`backendai_agentselector_v10`、`backendai_hook_v20`、`backendai_webapp_v20`、`backendai_monitor_stats_v10`、`backendai_monitor_error_v10`\n- 旧版仓库：https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-manager\n\n**[代理](src\u002Fai\u002Fbackend\u002Fagent\u002FREADME.md)** - 计算节点上的内核生命周期管理\n- 管理各个节点上的 Docker 容器（内核）\n- 通过心跳机制自动注册到集群\n- 插件接口：`backendai_accelerator_v21`、`backendai_monitor_stats_v10`、`backendai_monitor_error_v10`\n- 旧版仓库：https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-agent\n\n**[存储代理](src\u002Fai\u002Fbackend\u002Fstorage\u002FREADME.md)** - 虚拟文件夹和存储后端抽象\n- 提供多存储后端的统一接口\n- 实时性能指标和加速 API\n- 旧版仓库：https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-storage-proxy\n\n**[Web 服务器](src\u002Fai\u002Fbackend\u002Fweb\u002FREADME.md)** - Web UI 托管和会话管理\n- 托管 Backend.AI WebUI（SPA）\n- 会话管理和 API 请求签名\n- 旧版仓库：https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-webserver\n\n**同步静态 Backend.AI WebUI 版本：**\n```console\n$ scripts\u002Fdownload-webui-release.sh \u003C要下载的目标版本>\n```\n\n**[应用代理](src\u002Fai\u002Fbackend\u002Fappproxy\u002Fcoordinator\u002FREADME.md)** - 服务路由和负载均衡\n- 将流量路由到容器内的服务（Jupyter、VSCode 等）\n- 动态电路配置和健康监测\n\n### 容器运行时组件\n\n**[内核](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-kernels)** - 容器镜像配方\n- 基于 Dockerfile 的计算环境配方\n- 支持流行的机器学习框架和编程语言\n\n**[Jail](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-jail)** - 可编程沙盒（Rust）\n- 基于 ptrace 的系统调用过滤\n- 资源控制和安全强化\n\n**[Hook](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-hook)** - 容器内运行时库\n- libc 替换以实现资源控制\n- 基于 Web 的交互式 stdin 支持\n\n### 客户端 SDK 库\n\n我们提供多种流行编程语言的客户端 SDK（MIT 许可证）：\n\n- **Python** - `pip install backend.ai-client` | [GitHub](src\u002Fai\u002Fbackend\u002Fclient) | 包含 CLI\n- **Java** - 
[发布页面](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-client-java)\n- **Javascript** - `npm install backend.ai-client` | [GitHub](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-client-js)\n- **PHP** - （准备中）`composer require lablup\u002Fbackend.ai-client` | [GitHub](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-client-php)\n\n\n插件\n-------\n\nBackend.AI 通过 Python 包的入口点支持基于插件的扩展性：\n\n**加速器插件** (`backendai_accelerator_v21`)\n- [CUDA](src\u002Fai\u002Fbackend\u002Faccelerator\u002Fcuda_open) - NVIDIA GPU 支持\n- [CUDA Mock](src\u002Fai\u002Fbackend\u002Faccelerator\u002Fcuda_mock) - 无需实际 GPU 的开发环境\n- [ROCm](src\u002Fai\u002Fbackend\u002Faccelerator\u002Frocm) - AMD GPU 支持\n- [Furiosa](src\u002Fai\u002Fbackend\u002Faccelerator\u002Ffuriosa) - Furiosa NPU（Warboy \u002F RNGD）支持\n- [Hyperaccel](src\u002Fai\u002Fbackend\u002Faccelerator\u002Fhyperaccel) - Hyperaccel LPU 支持\n- [IPU](src\u002Fai\u002Fbackend\u002Faccelerator\u002Fipu) - Graphcore IPU 支持\n- [Rebellions](src\u002Fai\u002Fbackend\u002Faccelerator\u002Frebellions) - Rebellions NPU（ATOM、ATOM+、ATOM Max）支持\n- [Tenstorrent](src\u002Fai\u002Fbackend\u002Faccelerator\u002Ftenstorrent) - Tenstorrent NPU（Wormhole、Blackhole）支持\n- [TPU](src\u002Fai\u002Fbackend\u002Faccelerator\u002Ftpu) - Google TPU（v2、v3）支持\n\n**监控插件**\n- [`backendai_monitor_stats_v10`](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-monitor-datadog) - Datadog 统计信息收集器\n- [`backendai_monitor_error_v10`](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-monitor-sentry) - Sentry 异常收集器\n\n\n遗留组件\n-----------------\n\n**[媒体库](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai-media)** - 多媒体输出支持（已不再维护）\n\n**IDE 扩展** - （已弃用：请改用内核中的 Jupyter Lab、VSCode Server 或 SSH）\n- [VSCode Live Code Runner](https:\u002F\u002Fgithub.com\u002Flablup\u002Fvscode-live-code-runner)\n- [Atom Live Code Runner](https:\u002F\u002Fgithub.com\u002Flablup\u002Fatom-live-code-runner)\n\n开发\n-----------\n\n### 构建软件包\n\n构建 Python 的 wheel 
文件或 SCIE（自包含可安装执行文件）：\n\n```bash\n.\u002Fscripts\u002Fbuild-wheels.sh  # 构建 .whl 包\n.\u002Fscripts\u002Fbuild-scies.sh   # 构建 SCIE 包\n```\n\n生成的包会放置在 `dist\u002F` 目录下。\n\n### 代码质量钩子\n\nBackend.AI 使用 Git 的 pre-commit 钩子来保持代码质量：\n\n```bash\n# 每次提交时自动运行：\n# - 自动格式化（pants fmt）\n# - 代码检查（pants lint）\n\n# 如有需要可绕过钩子（请谨慎使用）\ngit commit --no-verify\n```\n\npre-commit 钩子会验证：\n- 代码风格和格式\n\n类型检查和测试则在 CI 中运行，以确保全面覆盖。\n\n详细钩子系统文档请参阅 [CLAUDE.md](CLAUDE.md#hooks-and-code-quality)。\n\n### 开发指南\n\n有关详细的开发环境搭建、构建系统使用以及贡献指南：\n- [开发环境搭建](src\u002Fai\u002Fbackend\u002FREADME.md#development-setup) - Python 版本、Pantsbuild、依赖管理\n- [CONTRIBUTING.md](.github\u002FCONTRIBUTING.md) - 贡献指南和开发流程\n- [MIGRATION.md](MIGRATION.md) - 主要版本更新迁移指南\n\n#### 开发时启用 CLI Tab 补全功能\n\n在开发过程中为 Backend.AI CLI 启用 Shell Tab 补全功能：\n\n```bash\n# 对于 bash\u002Fzsh（从仓库根目录） - 必须使用 'source'，不能直接执行\nsource scripts\u002Fsetup-dev-completion.sh\n\n# 对于 fish shell\nsource scripts\u002Fsetup-dev-completion.fish\n\n# 或者创建永久别名\n# bash\u002Fzsh:\necho 'alias bai-dev=\"cd \u002Fpath\u002Fto\u002Fbackend.ai && source scripts\u002Fsetup-dev-completion.sh\"' >> ~\u002F.zshrc\n# fish:\necho 'alias bai-dev=\"cd \u002Fpath\u002Fto\u002Fbackend.ai; and source scripts\u002Fsetup-dev-completion.fish\"' >> ~\u002F.config\u002Ffish\u002Fconfig.fish\n```\n\n这将启用以下 Tab 补全功能：\n- `backend.ai \u003Ctab>` - 显示所有命令\n- `backend.ai session \u003Ctab>` - 会话管理\n- `backend.ai admin \u003Ctab>` - 管理操作\n- `backend.ai --\u003Ctab>` - 全局选项\n\n支持 **bash**、**zsh** 和 **fish** shell。\n\n许可证\n-------\n\n请参阅 [LICENSE 文件](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002Fmain\u002FLICENSE)。","# Backend.AI 快速上手指南\n\nBackend.AI 是一个基于容器的计算集群平台，支持多种机器学习框架和编程语言，并提供对 CUDA、ROCm、TPU 等多种异构加速器的插件式支持。它通过 REST 和 GraphQL API 暴露所有功能，适合多租户按需或批量分配计算资源。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: Linux (基于 Debian\u002FRHEL) 或 macOS\n- **权限**: 需要 `sudo` 权限进行安装\n- **硬件资源**: 开发环境推荐 4+ CPU 核心，8GB+ 内存\n\n### 前置依赖\n请确保已安装以下基础软件：\n\n**必需组件**:\n- **Python**: 3.13.x (主分支要求 
CPython 3.13.7)\n- **Docker**: 20.10+ (需包含 Compose v2)\n- **PostgreSQL**: 16+ (测试版本 16.3)\n- **Redis**: 7.2+ (测试版本 7.2.11)\n- **etcd**: 3.5+ (测试版本 3.5.14)\n- **Prometheus**: 3.x (测试版本 3.1.0)\n\n**可选组件** (用于可观测性):\n- Grafana 11.x, Loki 3.x, Tempo 2.x, OpenTelemetry Collector\n\n**构建工具**:\n- **Pantsbuild**: 2.27.x\n\n> **提示**: 国内用户若遇到 Python 包下载缓慢，可配置国内镜像源（如清华源、阿里源）加速 `pip` 和 `pants` 依赖下载。\n\n## 安装步骤\n\n### 1. 克隆代码并初始化环境\n执行官方提供的开发安装脚本，该脚本会自动检查依赖、创建虚拟环境、启动基础设施服务（数据库、缓存等）并初始化默认账户。\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai.git\ncd backend.ai\n.\u002Fscripts\u002Finstall-dev.sh\n```\n\n### 2. 启动后端服务\n需要在**不同的终端窗口**中分别启动以下核心组件：\n\n**终端 1 - Manager (集群控制平面):**\n```bash\n.\u002Fbackend.ai mgr start-server --debug\n```\n\n**终端 2 - Agent (节点控制器):**\n```bash\n.\u002Fbackend.ai ag start-server --debug\n```\n\n**终端 3 - Storage Proxy (存储代理):**\n```bash\n.\u002Fpy -m ai.backend.storage.server\n```\n\n**终端 4 - Web Server (Web UI 服务):**\n```bash\n.\u002Fpy -m ai.backend.web.server\n```\n\n**终端 5 & 6 - App Proxy (可选，用于容器内应用访问):**\n```bash\n.\u002Fbackend.ai app-proxy-coordinator start-server --debug\n.\u002Fbackend.ai app-proxy-worker start-server --debug\n```\n\n## 基本使用\n\n### 1. 配置客户端会话\n加载本地用户环境变量并登录。脚本会打印默认的用户 ID 和密码。\n\n```bash\nsource env-local-user-session.sh\n# 根据提示输入上面脚本输出的 User ID 和 Password\n.\u002Fbackend.ai login\n```\n\n### 2. 运行第一个计算任务\n使用 CLI 运行一个简单的 Python 会话：\n\n```bash\n.\u002Fbackend.ai run python -c \"print('Hello Backend.AI!')\"\n```\n\n### 3. 访问 Web 界面\n打开浏览器访问 **http:\u002F\u002Flocalhost:8090**，使用 `env-local-*.sh` 文件中生成的凭证登录。\n\n### 4. 
常用功能入口\n登录后，您可以直接通过浏览器或 CLI 访问容器内的开发环境：\n- **Jupyter\u002FJupyterLab**: 大多数镜像内置支持，适合数据科学开发。\n- **Web Terminal**: 所有会话内置 `ttyd` 支持，提供网页版终端。\n- **VSCode**: 多数会话内置网页版 VSCode 支持。\n- **SSH\u002FSFTP**: 所有会话支持 SSH 连接，可使用 PyCharm 等 IDE 通过远程解释器连接。\n- **虚拟文件夹 (vfolders)**: 类似云存储，可挂载到任意计算会话并在用户间共享。","某生物科技公司算法团队需在混合了 NVIDIA GPU 和 Google TPU 的集群上，为多名数据科学家并发运行基因测序分析任务。\n\n### 没有 backend.ai 时\n- **资源调度混乱**：人工分配硬件导致高端算力闲置与低效争抢并存，无法根据任务自动匹配 CUDA GPU 或 TPU。\n- **环境配置繁琐**：每位成员需手动在服务器上安装依赖、配置 Docker，不同项目间的库版本冲突频发，排查耗时。\n- **多租户隔离缺失**：团队成员共用服务器权限，误操作易导致他人任务中断，且难以精确统计个人算力消耗。\n- **远程开发受阻**：缺乏统一入口，开发者无法直接通过 VSCode 或 Jupyter 安全地接入容器内部进行调试。\n\n### 使用 backend.ai 后\n- **智能异构调度**：backend.ai 自动识别任务需求，将深度学习模型分发至 GPU，将大规模并行计算调度至 TPU，资源利用率提升 40%。\n- **一键环境就绪**：基于预置容器镜像，开发者秒级启动包含特定框架的计算会话，彻底消除环境配置差异带来的“在我机器上能跑”问题。\n- **精细化隔离与计量**：为每位研究员提供独立的沙箱会话，互不干扰，并自动生成详细的算力使用报表，便于成本分摊。\n- **无缝开发体验**：通过内置的 WebSocket 隧道，团队成员可直接在本地 IDE 中 SSH 连接远程容器，或在浏览器中打开 JupyterLab 即时编码。\n\nbackend.ai 将复杂的异构集群转化为按需取用的标准化算力服务，让算法团队专注于核心科研而非运维琐事。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flablup_backend.ai_a66b8264.png","lablup","Lablup","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Flablup_f3f520e5.png","lab | up: Make AI Accessible - A start-up to innovate AI research \u002F service 
processes.",null,"contact@lablup.com","https:\u002F\u002Fwww.lablup.com","https:\u002F\u002Fgithub.com\u002Flablup",[85,89,93,96,100,104,107,111,115,118],{"name":86,"color":87,"percentage":88},"Python","#3572A5",97.3,{"name":90,"color":91,"percentage":92},"Jinja","#a52a22",0.9,{"name":94,"color":95,"percentage":92},"Shell","#89e051",{"name":97,"color":98,"percentage":99},"HTML","#e34c26",0.4,{"name":101,"color":102,"percentage":103},"Starlark","#76d275",0.2,{"name":105,"color":106,"percentage":103},"Roff","#ecdebe",{"name":108,"color":109,"percentage":110},"CSS","#663399",0.1,{"name":112,"color":113,"percentage":114},"Dockerfile","#384d54",0,{"name":116,"color":117,"percentage":114},"JavaScript","#f1e05a",{"name":119,"color":120,"percentage":114},"C","#555555",629,172,"2026-04-08T07:17:23","LGPL-3.0",4,"Linux (Debian\u002FRHEL-based), macOS","非必需（支持插件化异构加速器）。可选支持：NVIDIA CUDA GPU, AMD ROCm GPU, Rebellions, FuriosaAI, HyperAccel, Google TPU, Graphcore IPU 等。开发环境可使用 CUDA Mock 插件在无实体 GPU 下运行。","最低：未说明；推荐：8GB+ (开发环境)",{"notes":130,"python":131,"dependencies":132},"1. 安装需要 sudo 权限。2. 核心编排器名为 'Sokovan'。3. 开发环境建议使用提供的 install-dev.sh 脚本一键搭建包含数据库和监控组件的半栈基础设施。4. 生产环境或多节点测试需参考官方文档。5. 
服务端组件基于 LGPLv3 协议，客户端 SDK 基于 MIT 协议。","3.13.x (主分支要求 CPython 3.13.7)",[133,134,135,136,137,138,139,140,141],"Docker 20.10+ (含 Compose v2)","PostgreSQL 16+","Redis 7.2+","etcd 3.5+","Prometheus 3.x","Pantsbuild 2.27.x","Grafana 11.x (推荐)","Loki 3.x (推荐)","Tempo 2.x (推荐)",[27,16],[144,145,146,147,148,149,150,151,152,153,154],"python","docker","distributed-computing","api","documentation","cloud-computing","backendai","containers","hpc","monitoring","paas","2026-03-27T02:49:30.150509","2026-04-08T21:02:06.380669",[158,163,168,173,178,182],{"id":159,"question_zh":160,"answer_zh":161,"source_url":162},25084,"如何在本地 VSCode 中通过 Remote SSH 模式连接到 Backend.AI 的会话容器？","Backend.AI 支持生成用于 VSCode 远程连接的专用密码。当会话容器启动时，系统会在主目录下自动生成一个 `.password` 文件（由 root 拥有，权限 644）。用户可以通过 Web UI 的文件下载 API 获取该密码，然后构造 VSCode 的远程 URL 格式：`vscode:\u002F\u002Fvscode-remote\u002Fssh-remote+\u003C用户名>%3A\u003CURL 编码的密码>@\u003CIP>:\u003C端口>\u002F\u003C远程路径>`。注意用户名和密码中不能包含冒号，需进行 URL 编码。","https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F646",{"id":164,"question_zh":165,"answer_zh":166,"source_url":167},25083,"为什么使用 SSH 私钥连接会话时仍然提示输入密码？","这通常是因为安装脚本使用了 root 权限（sudo）运行，导致容器内生成的默认用户权限或文件所有权出现异常。解决方案是：不要使用 sudo 运行安装脚本，而是以普通用户身份重新执行 `install-dev.sh`。安装过程中如果需要特权操作，脚本内部会自行处理，但整体环境和后续运行应基于普通用户权限。","https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F1600",{"id":169,"question_zh":170,"answer_zh":171,"source_url":172},25085,"在 CentOS 7 (gcc 4.8.5) 上安装 Backend.AI 时遇到 hiredis-py 编译错误怎么办？","这是由于 CentOS 7 默认的 gcc 版本过旧（4.8.5），不支持 C99 标准中的某些语法（如 for 循环内的变量声明），而 `hiredis` 库需要 C99 支持。解决方法包括：1. 升级 gcc 编译器到支持 C99 的版本；2. 或者尝试寻找预编译的 Python wheel 包以避免源码编译；3. 
如果可能，建议在更新的 Linux 发行版（如 Ubuntu 20.04+）上进行部署以避开此类底层依赖问题。","https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F609",{"id":174,"question_zh":175,"answer_zh":176,"source_url":177},25086,"如何在本地开发环境中无需配置认证头即可调用 Manager API？","对于本地开发设置，可以使用 `backend.ai proxy` 命令。该命令会启动一个不安全的但透明的 API 代理，它针对 Manager 服务并预配置了客户端 SDK 所需的凭据。通过此代理发起请求时，客户端不需要附加任何认证头或 Cookie 即可直接调用 API。","https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F268",{"id":179,"question_zh":180,"answer_zh":181,"source_url":167},25087,"在 Ubuntu 上安装 Backend.AI 的完整前置依赖和步骤是什么？","推荐步骤如下：\n1. 安装 Docker：更新 apt 源，添加 Docker 官方 GPG 密钥和仓库，然后安装 `docker-ce`, `docker-ce-cli`, `containerd.io`。\n2. 配置 Docker 用户权限：执行 `sudo usermod -aG docker $USER` 将当前用户加入 docker 组。\n3. 安装 Docker Compose：创建插件目录并下载二进制文件到 `\u002Fusr\u002Flocal\u002Flib\u002Fdocker\u002Fcli-plugins\u002F`。\n4. 确保使用非 root 用户运行安装脚本 `install-dev.sh` 以避免权限问题。",{"id":183,"question_zh":184,"answer_zh":185,"source_url":177},25088,"Manager 服务启动时报错 'Error initializing cleanup_contexts: monitoring_ctx' 是什么原因？","该错误通常发生在插件加载阶段，特别是当环境配置不完整或依赖项版本冲突时。维护者指出相关代码和所需材料均为开源，建议开发者直接阅读源代码排查具体插件加载失败的原因。此外，确保遵循官方文档的安装步骤，检查 Python 版本兼容性（如 Python 3.9）以及 etcd 服务是否正常运行。",[187,192,197,202,207,212,217,222,227,232,237,242,247,252,257,262,267,272,277,282],{"id":188,"version":189,"summary_zh":190,"released_at":191},154492,"26.3.3","### 修复\n* 修复 vfolder 邀请 RBAC 重新迁移时出现的 ON CONFLICT 列不匹配问题，该问题会导致在 alembic 升级过程中抛出 InvalidColumnReferenceError 错误。（[#10471](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10471)）\n\n### 完整变更日志\n\n查看从项目开始到本版本（26.3.3）的[完整变更日志](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002F26.3.3\u002FCHANGELOG.md）。\n\n### 全部提交记录\n\n查看从版本（26.3.2）到版本（26.3.3）之间的[全部提交记录](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fcompare\u002F26.3.2...26.3.3)。\n","2026-03-24T09:00:12",{"id":193,"version":194,"summary_zh":195,"released_at":196},154493,"26.3.2","### 修复\n* 添加安全的 Prometheus 指标封装，以防止 `mmap` 
错误传播到业务逻辑中 ([#10395](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10395))\n* 移除 Web 服务器 `config.toml.j2` 模板中的重复 `debug` 字段 ([#10423](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10423))\n* 在管理器中添加缺失的 OpenTelemetrySpec 初始化，从而支持将跟踪和日志导出到 OTEL 收集器。([#10439](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10439))\n\n### 完整变更日志\n\n请查看[完整变更日志](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002F26.3.2\u002FCHANGELOG.md)，其中包含了自本版本（26.3.2）之前的全部变更记录。\n\n\n### 完整提交日志\n\n请查看[完整提交日志](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fcompare\u002F26.3.1...26.3.2)，其中列出了从版本 26.3.1 到 26.3.2 之间的所有提交记录。\n","2026-03-24T06:18:08",{"id":198,"version":199,"summary_zh":200,"released_at":201},154504,"26.2.2","### Fixes\n* Filter out zero-valued resource slots from the `resource_allocation_limit_for_sessions` scaling group response. ([#9221](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9221))\n* Fix fair share StringFilter ignoring negation and case-insensitivity flags in Domain, Project, and User filter conditions ([#9222](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9222))\n* Fix resourceInfo quantity values serialized with excessive decimal places (scale=6) by normalizing Decimals in the GQL layer. ([#9223](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9223))\n* Fix search_rg_* fair share queries incorrectly excluding entities without fair share records when top-level filters are applied, by using INNER JOIN'd columns in RG-context condition factories. 
([#9225](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9225))\n* Move filter specs to common and add build_query_condition to REST filters ([#9247](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9247))\n\n### Full Changelog\n\nCheck out [the full changelog](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002F26.2.2\u002FCHANGELOG.md) until this release (26.2.2).\n\n\n### Full Commit Logs\n\nCheck out [the full commit logs](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fcompare\u002F26.2.1...26.2.2) between release (26.2.1) and (26.2.2).\n","2026-02-23T14:47:38",{"id":203,"version":204,"summary_zh":205,"released_at":206},154505,"26.2.1","### Features\n* Apply prometheus_client multiprocess mode for multi-worker metric aggregation ([#8786](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8786))\n* Convert JSONB reads to normalized tables and fix resource freeing ([#8787](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8787))\n* Disable introspection in hive router configuration ([#8789](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8789))\n* Add GitHub Action to periodically update the default seccomp profile ([#8791](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8791))\n* make GQL query resolver return types nullable for graceful error handling ([#8793](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8793))\n* Add normalized `usage_bucket_entries` table for per-slot resource usage tracking and remove legacy JSONB dual-writes for agents\u002Fsessions occupied slots. 
([#8858](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8858))\n\n### Improvements\n* Optimize `invalidate_kernel_related_cache` by replacing SCAN with Index SET ([#8785](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8785))\n* Downgrade route provisioning failure from error to warning log ([#8801](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8801))\n\n### Fixes\n* Rename strawberry `Kernel` GQL types to follow V2 naming convention ([#8769](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8769))\n* Handle missing scratch recovery files gracefully to prevent agent startup failure ([#8796](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8796)),\n  ([#8814](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8814))\n* Fix model service dry run failing when extra mounts are present by correctly extracting `folder_id` from `VFolderID` objects. Previously, the full `VFolderID` object was passed instead of the UUID, causing type mismatches in mount configuration. ([#8804](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8804))\n* Fix Cartesian product warning in model serving repository by using `select_from()` pattern for JOIN queries, resolving SQLAlchemy warnings about missing join conditions between `users` and `keypairs` tables. ([#8813](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8813))\n* Raise proper GPFS exceptions for 4xx HTTP responses instead of silently passing them. 
Previously, `base_response_handler` ignored all 4xx status codes, causing authentication failures (401) to be misidentified as `GPFSNotFoundError` and preventing proper error diagnosis ([#8815](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8815))\n* Fix UndefinedObjectError on vfolder queries after upgrade by adding alembic migration to rename the `vfolderpermission` PostgreSQL enum type to `vfoldermountpermission`. ([#9067](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9067))\n* Fix alembic migration naming convention for BA-4308 by renaming file and revision ID to use standard 12-char hex format. ([#9070](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9070))\n* Fix Prometheus metrics scrape crash when `\u002Ftmp` multiprocess directory is cleaned by OS, by using a persistent directory (`\u002Fvar\u002Frun\u002Fbackendai\u002Fprometheus\u002F`) outside `\u002Ftmp`. ([#9114](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9114))\n* Fix FK violation on agent heartbeat by seeding `resource_slot_types` table during oneshot DB initialization. ([#9129](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9129))\n* Fix cross-architecture binary collision in build-scies CI job by including runner architecture in Pants cache key. ([#9185](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9185))\n* Pass SCIENCE_AUTH_API_GITHUB_COM_BEARER to Pants subprocess environment to prevent GitHub API rate limit errors during scie build in CI. ([#9186](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9186))\n* Fix release CI failure where `dist\u002Fexport\u002F` directory caused asset upload loop to error with \"is a directory\". 
([#9194](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9194))\n\n### Documentation Updates\n* Add \u002Fsubmit and \u002Frelease skills ([#8819](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8819))\n\n### Miscellaneous\n* Separate ClientConnectionError handling from generic exception in web proxy to prevent unnecessary 500 error responses and noisy error logs on normal client disconnections. ([#9158](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9158))\n* Split GitHub Release asset upload into individual `gh release upload` calls with retry to prevent API timeout on bulk upload. ([#9184](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9184))\n\n### Full Changelog\n\nCheck out [the full changelog](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002F26.2.1\u002FCHANGELOG.md) until this release (26.2.1).\n\n\n### Full Commit Logs\n\nCheck out [the full commit logs](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fcompare\u002F26.2.0...26.2.1) between release (26.2.0) and (26.2.1).\n","2026-02-22T07:43:32",{"id":208,"version":209,"summary_zh":210,"released_at":211},154506,"25.15.10","### External Dependency Updates\n* Upgrade valkey-glide to ~=2.2.2 and opentelemetry to ~=1.39.0 ([#9170](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9170))\n\n### Full Changelog\n\nCheck out [the full changelog](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002F25.15.10\u002FCHANGELOG.md) until this release (25.15.10).\n\n\n### Full Commit Logs\n\nCheck out [the full commit logs](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fcompare\u002F25.15.9...25.15.10) between release (25.15.9) and (25.15.10).\n","2026-02-20T09:49:35",{"id":213,"version":214,"summary_zh":215,"released_at":216},154494,"26.3.1","### 功能特性\n* 为 GraphQL 过滤类型添加 AND、OR、NOT 
逻辑运算符，以支持复杂的布尔过滤表达式。([#10250](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10250))\n* 在管理器的内部应用中新增内部健康检查端点 (`\u002Fhealth`)，并将公共健康检查处理器简化为单纯的存活探针。([#10308](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10308))\n\n### 改进\n* 添加 `TimeoutSeconds` 注解类型，以集中并简化请求 DTO 中的会话超时验证逻辑。([#10267](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10267))\n\n### 修复\n* 修复全局容器注册表 RBAC 迁移逻辑，使其映射到项目范围而非域范围。([#10082](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10082))\n* 修复资源预设检查在扩展组无活动会话时返回错误占用率的问题。([#10268](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10268))\n* 修复会话依赖关系 GraphQL 数据加载器因键映射错误及缺少急加载而导致返回空结果的问题。([#10280](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10280))\n* 在依赖注入重构后，通过将 `db` 和 `config_provider` 注入根应用上下文，恢复 Web 应用插件（OpenID、TOTP）对它们的访问权限。([#10292](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10292))\n* 为 `AuthorizeRequest` 和 `AuthorizeAction` 添加 `otp` 字段，以兼容 TOTP 双因素认证。([#10305](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10305))\n* 在利用率空闲检查中排除无法度量的指标，而不是将统计收集失败视为 0% 使用率。([#10316](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10316))\n* 在依赖注入重构后，通过将 `etcd` 和 `valkey_stat` 注入根应用上下文，恢复 Web 应用插件（Cloud）对它们的访问权限。([#10318](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10318))\n* 将经过身份验证的 TOTP 端点路由至 `web_handler`，而非匿名处理程序。([#10345](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10345))\n\n### 完整变更日志\n\n请查看[完整变更日志](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002F26.3.1\u002FCHANGELOG.md)，了解自本版本（26.3.1）之前的全部变更记录。\n\n\n### 完整提交日志\n\n请查看[完整提交日志](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fcompare\u002F26.3.0...26.3.1)，了解从版本 (26.3.0) 到 (26.3.1) 
之间的所有提交记录。\n","2026-03-24T01:27:29",{"id":218,"version":219,"summary_zh":220,"released_at":221},154495,"25.11.4","### 功能\n* 定期执行资源使用量重新计算（[#5646](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F5646)）\n\n### 改进\n* 为 `gather_container_measures` 调用添加每个插件的超时时间（120秒），以防止单个卡住的插件阻塞所有其他插件的统计收集（[#9781](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9781)）\n\n### 修复\n* 修复 Valkey 客户端地址的值类型错误（[#5649](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F5649)）\n* 从 `StatContext` 中移除不必要的 `asyncio.Lock`，因为自并发已由 `TimerDelayPolicy.CANCEL` 阻止，且每个收集方法都操作独立的数据结构（[#9256](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9256)）\n* 修复容器 net_rx\u002Fnet_tx 统计因未检查 setns() 返回值而读取宿主机命名空间计数器的问题（[#9681](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9681)）\n* 在 `netstat_ns()` 之前预先验证命名空间路径，以防止线程池因过时网络命名空间中的卡死线程而耗尽资源（[#9782](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9782)）\n* 将无法测量的指标排除在利用率空闲检查之外，而不是将统计收集失败视为 0% 使用率（[#10316](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F10316)）\n\n### 完整变更日志\n\n请查看[完整变更日志](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002F25.11.4\u002FCHANGELOG.md)，直至本版本（25.11.4）。\n\n\n### 完整提交日志\n\n请查看[完整提交日志](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fcompare\u002F25.11.3...25.11.4)，比较版本（25.11.3）与（25.11.4）之间的更改。\n","2026-03-20T03:45:17",{"id":223,"version":224,"summary_zh":225,"released_at":226},154496,"25.15.12","### 修复\n* 从 `StatContext` 中移除不必要的 `asyncio.Lock`，因为 `TimerDelayPolicy.CANCEL` 已经防止了自并发，且每个收集方法都操作独立的数据结构（[#9256](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9256)）\n* 将 CANCEL 延迟策略应用于统计收集定时器，以防止任务堆积（[#9257](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9257)）\n* 修复由于 `SessionRow.result` 列 ORM 定义中 PostgreSQL 枚举类型名称不匹配（`sessionresult` 对 `sessionresults`）而导致的会话创建时出现的 
`DatatypeMismatchError`。（[#9278](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9278)）\n* 在全局切换后，`ModifyContainerRegistryNode` 操作会因重复关联错误而失败（[#9468](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9468)）\n* 修复 `AccountManagerConfig` 中默认组 gid 的错误（使用 `st_gid` 而不是 `st_uid`）。（[#9571](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9571)）\n* 允许超级管理员绕过代理摘要 GraphQL 解析器中的 `hide_agents` 限制（[#9623](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9623)）\n* 在解析 `gpu_alloc_map` 字段时，在 GQL 解析器层面去除 `DeviceId` 键中的 `GPU-` 前缀，以修复 `UUIDFloatMap` 验证错误。（[#9642](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9642)）\n* 验证云检测 IMDS 响应，并强化元数据 JSON 解析，以防止在非主流云服务商上产生误报。（[#9653](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9653)）\n* 通过检查 `setns()` 的返回值，修复容器 net_rx\u002Fnet_tx 统计读取主机命名空间计数器的问题。（[#9681](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9681)）\n* 通过用基于 GraphQL 的分页列表替换已弃用的 REST API，修复 CLI 中的服务名查找问题。（[#9745](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9745)）\n* 在执行 `netstat_ns` 之前预先验证命名空间路径，以防止线程池耗尽。（[#9782](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9782)）\n\n### 完整变更日志\n\n请查看[完整变更日志](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002F25.15.12\u002FCHANGELOG.md)，直至本版本（25.15.12）。\n\n\n### 完整提交日志\n\n请查看[完整提交日志](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fcompare\u002F25.15.11...25.15.12)，对比版本（25.15.11）与（25.15.12）之间的更改。\n","2026-03-16T02:06:11",{"id":228,"version":229,"summary_zh":230,"released_at":231},154497,"26.3.0","### 功能特性\n\n#### 客户端 SDK v2\n我们对 Backend.AI 客户端库进行了全面重写，推出了 SDK v2，提供了一个可注入认证的类型化异步 HTTP 客户端、覆盖所有 API 表面的领域特定客户端类、WebSocket\u002FSSE 流式传输支持，以及跨所有领域的 Pydantic 类型化请求\u002F响应 DTO。\n\n* 添加 SDK v2 基础架构，包括异步 HTTP 
客户端、可注入认证、类型化异常和领域客户端存根（[#8903](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8903)）\n* 为 SDK v2 基础客户端添加 204 No Content 响应支持及 `typed_request_no_content()` 方法（[#8936](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8936)）\n* 为 SDK v2 的 `BackendAIClient` 添加 WebSocket 和 SSE 连接支持（[#8937](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8937)）\n* 为 SDK v2 基础客户端添加二进制和 multipart 请求支持（上传\u002F下载），并恢复延迟会话文件操作（[#8952](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8952)）\n* 为 SDK v2 添加匿名客户端，用于未认证的端点（[#9478](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9478)）\n* 为 `BackendAIAnonymousClient` 添加 `extra_headers` 支持，允许在将请求代理到管理器时进行每请求头注入（例如 `X-Forwarded-For`、`X-Forwarded-Host`、`X-Forwarded-Proto`）（[#9594](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9594)）\n* 在 Web 服务器上注册一个持久化的 `BackendAIClientRegistry`，并将其用于 `update-password-no-auth` API 处理程序（[#9595](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9595)）\n* 实现 SDK v2 的 `AuthClient`，包含所有认证领域 REST 端点的异步方法（[#8913](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8913)）\n* 添加 SDK v2 的 Config 和 Infrastructure 领域客户端，实现对全部异步 REST API 的全覆盖（[#8914](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8914)）\n* 添加 SDK v2 的 Model Serving 领域客户端，涵盖全部 14 种 API 方法（列表、搜索、创建、删除、扩展、同步、路由、令牌、错误、运行时）（[#8915](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8915)）\n* 实现 SDK v2 的 `SessionClient`，提供针对所有会话管理端点的类型化异步方法（[#8916](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8916)）\n* 添加 SDK v2 的容器镜像仓库和存储端点领域客户端（[#8918](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8918)）\n* 添加 SDK v2 的 `VFolderClient`，实现对虚拟文件夹管理端点的全覆盖，包括 CRUD 操作、文件操作、共享\u002F邀请以及管理员操作（[#8919](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8919)）\n* 添加 
SDK v2 的 Template 和 Operations 领域客户端，提供完整的 CRUD 支持（[#8912](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8912)）\n* 添加 SDK v2 的 `StreamingClient`，用于 WebSocket（终端、执行、代理）和 SSE（会话事件、后台任务事件）操作（[#8939](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8939)）\n* 添加 SDK v2 的 `NotificationClient`，用于通知通道和规则管理（[#8940](https:\u002F\u002Fgith","2026-03-16T02:05:23",{"id":233,"version":234,"summary_zh":235,"released_at":236},154498,"26.3.0rc2","### 功能特性\n* 添加用于 Prometheus 查询预设管理操作及执行查询的 GraphQL API ([#9643](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9643))\n* 引入 DeploymentSubStep 类型，作为跟踪部署策略进度的基础构建块。([#9817](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9817))\n* 根据尝试次数和超时时间，将部署协调器失败分类为 need_retry、expired 或 give_up ([#9871](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9871))\n* 为支持零停机部署，添加部署生命周期子步骤处理器（provisioning、progressing）([#9880](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9880))\n* 为支持零停机部署，添加部署策略评估器和结果应用器 ([#9888](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9888))\n\n### 改进\n* 在 `gather_container_measures` 调用中为每个插件添加超时限制（120秒），以防止单个卡住的插件阻塞所有其他插件的指标采集 ([#9781](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9781))\n* 将 `KernelStatusInMatchSpec` 从 `api\u002Fgql\u002Fkernel\u002Ftypes.py` 移至 `data\u002Fkernel\u002Ftypes.py`，以解决来自 `repositories\u002Fscheduler\u002Foptions.py` 的跨层导入问题。([#9828](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9828))\n* 将 SessionLifecycleManager 从模型层移至服务层，以修复向上依赖的导入违规问题 ([#9830](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9830))\n* 提取共享的 PaginationInfo DTO 类，并用统一的导入替换各模块中的重复定义。([#9869](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9869))\n* 在 `recalc_resource_usage()` 中加入 agent_resources 资源协调逻辑，用于检测并自动修正 `agent_resources.used` 与实际资源分配之间的偏差 
([#9931](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9931))\n\n### 修复\n* 在路由更新时同步 `circuit.route_info`，确保推理指标采集始终针对当前路由集，从而修复自动扩缩容在首次缩容后无法重新触发扩容的问题 ([#9760](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9760))\n* 在调用 `netstat_ns()` 之前预先验证命名空间路径，以避免因过期网络命名空间中的卡死线程而导致线程池耗尽 ([#9782](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9782))\n* 为 MemoryPlugin 的 sysfs_impl 指标采集添加每调用超时和错误隔离机制，防止因容器损坏导致的永久性挂起 ([#9783](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9783))\n* 在 7 个文件中的 22 个 admin_* GraphQL 解析器上添加缺失的 check_admin_only() 检查，以防止非超级管理员用户进行未授权访问。([#9827](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9827))\n* 通过更新时间戳并跳过 populate_fixture() 中 NOT NULL + server_default 列的 None 值，修复填充角色 fixture 时出现的 NotNullViolationError 错误。([#9894](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9894))\n* 修复待处理会话查询中缺少 `is_preemptible` 列的问题，该问题导致调度","2026-03-12T08:43:20",{"id":238,"version":239,"summary_zh":240,"released_at":241},154507,"25.15.9","### Fixes\n* Read HTTP responses before connection closes in Agent Watcher APIs ([#7165](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7165))\n* Fix race condition in kernel metrics cleanup by removing automatic cleanup from `collect_container_stat()` ([#8250](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8250))\n* `agent_summary_list` pagination may skip agents due to JOIN with `KernelRow` ([#8351](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8351))\n* Add explicit `pycares` dependency to `backend.ai-common` to prevent version mismatch when installing from wheel ([#8357](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8357))\n* Prevent kernel status regression from TERMINATED to TERMINATING in `destroy_session()` which caused a livelock in model serving sessions 
([#8406](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8406))\n* Add missing ordering when querying pending sessions ([#8419](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8419))\n* Add VNC and RDP protocol support for remote desktop service ports. ([#8431](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8431))\n* Handle missing constraint and index in permission tables migration downgrade ([#8446](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8446))\n* Make `group` parameter required in `check_presets` API. Previously, the default value \"default\" caused `ProjectNotFound` errors when the default group did not exist in the system. ([#8462](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8462))\n* Allow committing sessions whose base image has been deleted by resolving both ALIVE and DELETED images during session commit validation ([#8511](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8511))\n* Check vfolder permissions of requester's keypair rather when clone vfolder. This allows users with sufficient permission can clone vfolders from model store. ([#8569](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8569))\n* Fix Cartesian product warning in model serving repository by using `select_from()` pattern for JOIN queries, resolving SQLAlchemy warnings about missing join conditions between `users` and `keypairs` tables. 
([#8813](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8813))\n\n### Full Changelog\n\nCheck out [the full changelog](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002F25.15.9\u002FCHANGELOG.md) until this release (25.15.9).\n\n\n### Full Commit Logs\n\nCheck out [the full commit logs](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fcompare\u002F25.15.8...25.15.9) between release (25.15.8) and (25.15.9).\n","2026-02-20T06:38:59",{"id":243,"version":244,"summary_zh":245,"released_at":246},154499,"26.2.5","### Improvements\n* Add `agent_resources` reconciliation logic to `recalc_resource_usage()` to detect and automatically repair drift between `agent_resources.used` and the actual resource allocations ([#9931](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9931))\n\n### Fixes\n* Fix the `ModifyContainerRegistryNode` operation failing with a duplicate association error after the global switch is enabled ([#9468](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9468))\n* Share a single Docker client instance across all containers during stats collection. Previously each container created its own Docker() instance, which caused file descriptor exhaustion at scale ([#9469](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9469))\n* Validate that an inference session mounts at least one model-type virtual folder before sending the RPC to the agent ([#9557](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9557))\n* Fix the wrong default group GID in `AccountManagerConfig` (use `st_gid` instead of `st_uid`) ([#9571](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9571))\n* Fix the `status_info` field not being cleared when a session transitions to RUNNING ([#9600](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9600))\n* Allow superadmins to bypass the `hide_agents` restriction in the `agent_summary` GraphQL resolver ([#9623](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9623))\n* Strip the `GPU-` prefix from `DeviceId` keys at the GQL resolver layer that resolves the `gpu_alloc_map` field to fix a `UUIDFloatMap` validation error ([#9642](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9642))\n* Validate the cloud-detection IMDS response and harden metadata JSON parsing to prevent false positives on non-mainstream cloud providers ([#9653](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9653))\n* Fix container 
`net_rx\u002Fnet_tx` statistics reading the host namespace counters because the return value of `setns()` was not checked ([#9681](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9681))\n* Fix service name lookup in the CLI by replacing the deprecated REST API with a GraphQL-based paginated list ([#9745](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9745))\n* Synchronize `circuit.route_info` on route updates so that inference metric collection always targets the current route set, fixing autoscaling that could not re-trigger scale-out after the first scale-in cycle ([#9760](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9760))\n* Pre-validate the namespace path before calling `netstat_ns()` to prevent thread pool exhaustion caused by threads hanging in stale network namespaces ([#9782](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9782))\n* Fix the kernel `container_id` property being shadowed by the `UserDict` data dictionary, which caused container stats collection to fail after an agent restart ([#9790](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9790))\n* Fix resource leaks in the force-terminate and batch-terminate paths, where `resource_allocations.free_at` was not set and `agent_resources.used` was not decremented ([#9930](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9930))\n* Fix ghost resource usage in `check-presets` by adding a kernel status filter to the occupancy query ([#9967](https:\u002F\u002Fgithub.co","2026-03-12T08:43:32",{"id":248,"version":249,"summary_zh":250,"released_at":251},154500,"26.3.0rc1","### Features\n* Apply the RBAC Creator pattern to images ([#8073](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8073))\n* Migrate user creation to the RBACEntityCreator pattern for automatic scope association ([#8627](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8627))\n* Add the `SessionV2` Strawberry GraphQL schema ([#8641](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8641))\n* Enable OpenTelemetry distributed tracing in the Manager by activating the global TracerProvider and instrumenting the aiohttp server\u002Fclient to support W3C Trace Context propagation. ([#8694](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8694))\n* Add RBAC GraphQL types and schema registration for roles, permissions, and entities. ([#8746](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8746))\n* Implement the `admin_bulk_update_users_v2` 
mutation for the Strawberry GraphQL V2 API. ([#8771](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8771))\n* Add OpenTelemetry spans in the GraphQL resolver middleware to record per-resolver latency ([#8777](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8777))\n* Generate dropbear SSH host keys on the Agent side instead of inside the container. This prevents SSH connection failures caused by users accidentally deleting the host keys inside the container. ([#8783](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8783))\n* Apply prometheus_client multiprocess mode to metric aggregation across multiple worker processes ([#8786](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8786))\n* Convert JSONB reads to normalized tables and fix resource release issues ([#8787](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8787))\n* Disable introspection in the Hive router configuration ([#8789](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8789))\n* Add a GitHub Action to periodically update the default seccomp profile ([#8791](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8791))\n* Make GraphQL query resolver return types nullable for more graceful error handling ([#8793](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8793))\n* Convert ResourceSlot reads to list[SlotQuantity] and fix allocation atomicity issues ([#8794](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8794))\n* Add a REST API endpoint for paginated Agent listing with status filtering ([#8820](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8820))\n* Add a REST endpoint for listing compute sessions, with nested container data, pagination, and structured filtering support ([#8821](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8821))\n* Add a paginated REST endpoint for listing inference services ([#8822](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8822))\n* Add DomainComposer for structured dependency injection of repositories and domain objects ([#8855](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8855))\n* Add OrchestrationComposer for dependency injection of the sokovan orchestrator, leader election, and idle checker hosts at Layer 5 ([#8857](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8857))\n* Add a normalized `usage_bucket_entries` table for per-slot resource usage tracking, and remove the legacy JSONB 
dual writes for Agent and session occupied slots. ([#8858]","2026-03-11T02:11:31",{"id":253,"version":254,"summary_zh":255,"released_at":256},154501,"26.2.4","### Fixes\n* Fix session `occupied_slots` returning empty values by computing it from the normalized `resource_allocations` table instead of relying on the deprecated JSONB column, which is no longer written after Phase 3. ([#9433](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9433))\n* Change the Prometheus multiprocess default directory from the environment-dependent `tempfile.gettempdir()` to the hardcoded `\u002Ftmp\u002Fbackend.ai\u002Fprometheus` to avoid permission issues across deployment environments, while supporting overrides via the `BACKENDAI_PROMETHEUS_DIR` environment variable or the `base_dir` parameter. ([#9434](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9434))\n\n### Full Changelog\n\nCheck out [the full changelog](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002F26.2.4\u002FCHANGELOG.md) until this release (26.2.4).\n\n### Full Commit Logs\n\nCheck out [the full commit logs](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fcompare\u002F26.2.3...26.2.4) between release (26.2.3) and (26.2.4).\n","2026-02-27T04:35:12",{"id":258,"version":259,"summary_zh":260,"released_at":261},154502,"26.2.3","### Fixes\n* Remove the unnecessary `asyncio.Lock` from `StatContext` as self-concurrency is already prevented by `TimerDelayPolicy.CANCEL` and each collect method operates on independent data structures ([#9256](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9256))\n* Apply CANCEL delay policy to stat collection timers to prevent task accumulation ([#9257](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9257))\n* Fix `DatatypeMismatchError` on session creation caused by mismatched PostgreSQL enum type name (`sessionresult` vs `sessionresults`) in `SessionRow.result` column ORM definition. ([#9278](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9278))\n* Fix prometheus multiprocess dir permission error by using a uid-based path (`\u002Ftmp\u002Fbackendai.{uid}\u002Fprometheus\u002F`) and logging clear errors on `mkdir()` failures. 
([#9319](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9319))\n* Allow fair share weight updates without resource group membership by adding graceful fallback when scaling group doesn't exist. ([#9320](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9320))\n* Allow fair share and usage bucket queries for entities not registered in a resource group, returning default values instead of raising errors. ([#9321](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9321))\n* Fix session creation failure caused by mismatched PostgreSQL enum type name by standardizing on the singular form `sessionresult` (matching database convention) - removed explicit plural name from ORM and added migration to rename `sessionresults` → `sessionresult` where needed. ([#9326](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9326))\n* Restore AppProxyConnectionError HTTP status code from 503 to 500 to match client-side error handling ([#9330](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9330))\n* Skip TERMINATING state and transition directly to TERMINATED on force-terminate to immediately free resources ([#9341](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9341))\n* Add missing scheduling history records for enqueue (initial PENDING) and terminating state transitions ([#9342](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9342))\n\n### Full Changelog\n\nCheck out [the full changelog](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002F26.2.3\u002FCHANGELOG.md) until this release (26.2.3).\n\n\n### Full Commit Logs\n\nCheck out [the full commit logs](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fcompare\u002F26.2.2...26.2.3) between release (26.2.2) and (26.2.3).\n","2026-02-25T09:40:19",{"id":263,"version":264,"summary_zh":265,"released_at":266},154503,"25.15.11","### 
Fixes\n* Fix cross-architecture binary collision in build-scies CI job by including runner architecture in Pants cache key. ([#9185](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9185))\n* Pass SCIENCE_AUTH_API_GITHUB_COM_BEARER to Pants subprocess environment to prevent GitHub API rate limit errors during scie build in CI. ([#9186](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9186))\n* Fix release CI failure where `dist\u002Fexport\u002F` directory caused asset upload loop to error with \"is a directory\". ([#9194](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9194))\n\n### Miscellaneous\n* Split GitHub Release asset upload into individual `gh release upload` calls with retry to prevent API timeout on bulk upload. ([#9184](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F9184))\n\n### Full Changelog\n\nCheck out [the full changelog](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002F25.15.11\u002FCHANGELOG.md) until this release (25.15.11).\n\n\n### Full Commit Logs\n\nCheck out [the full commit logs](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fcompare\u002F25.15.10...25.15.11) between release (25.15.10) and (25.15.11).\n","2026-02-23T15:21:03",{"id":268,"version":269,"summary_zh":270,"released_at":271},154508,"26.2.0","### Features\n\n#### GraphQL V2 API\nImplemented the Strawberry-based GraphQL V2 API layer across core entities (User, Project, Domain, Kernel, Image) with full repository and service layers, DataLoaders, nested filter\u002Forder support, cross-entity relationships, and scope-based naming conventions following BEP-1041.\n\n* Define `KernelV2` GraphQL types with structured fields ([#8079](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8079))\n* Add `kernels` resolver to `AgentV2` GQL type 
([#8080](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8080))\n* Implement BatchQuerier-based `SearchImageAction` ([#8339](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8339))\n* Implement `Image` strawberry GraphQL DataLoaders ([#8340](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8340))\n* Define `ImageV2` GraphQL schema types with structured fields ([#8396](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8396))\n* Add `ImageAlias` support to Strawberry GraphQL schema ([#8400](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8400))\n* Implement BEP-1041 scope-based GraphQL API naming convention ([#8467](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8467))\n* Add User V2 GraphQL API type definitions ([#8556](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8556))\n* Add User repository and service layer for V2 API ([#8557](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8557))\n* Add Project V2 repository and service layer with search capabilities following User V2 pattern ([#8558](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8558))\n* Add ProjectV2 GraphQL type definitions ([#8559](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8559))\n* Add Domain V2 repository layer ([#8560](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8560))\n* Add Domain V2 GraphQL API type definitions ([#8561](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8561))\n* Add admin_update_resource_group GraphQL mutation ([#8563](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8563))\n* Replace strawberry `image` GQL queries with `admin`, `container-registry` scoped queries 
([#8567](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8567))\n* Add entity-scoped scheduling history GraphQL APIs ([#8571](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8571))\n* Add relationship fields and expand FK coverage ([#8607](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8607))\n* Complete GraphQL V2 query implementation for user\u002Fproject\u002Fdomain ([#8608](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8608))\n* Implement `admin_bulk_create_users` mutation for Strawberry GraphQL V2 API. ([#8609](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8609))\n* Register domain_v2 GraphQL queries and add rg_domains_v2 ([#8610](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8610))\n* Add EXISTS-based nested filter\u002Forder helpers for Group and User ([#8671](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8671))\n* Add nested filter\u002Forder for Project and User in DomainV2 GQL ([#8675](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8675))\n* Add nested filter\u002Forder for Domain and User in ProjectV2 GQL ([#8678](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8678))\n* Add nested filter\u002Forder for Domain and Project in UserV2 GQL ([#8681](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8681))\n* Extend ResourceGroup filter\u002Forder fields and add Agent STATUS order ([#8682](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8682))\n* Add core entity DataLoaders for user, project, domain, and agent ([#8696](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8696))\n* Add tiebreaker_order to PaginationSpec for deterministic pagination ([#8697](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8697))\n* Add 
cross-entity relationship fields to V2 GQL types ([#8699](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8699))\n* Implement KernelV2GQL nested entity fields using DataLoaders ([#8700](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8700))\n* Add DataLoaders and resolve_nodes for remaining simple-key Node types ([#8709](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8709))\n* Make GQL query resolver return types nullable for graceful error handling ([#8793](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8793))\n\n#### Fair Share Scheduler Enhancements\nExtended the Fair Share scheduling system with resource group-scoped APIs, usage bucket queries, Fair Share metric calculation, client SDK methods, and CLI commands.\n\n* Filter nested usage buckets by parent period ([#8448](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8448))\n* Fix project\u002Fuser fair share filtering by domain and resource group ([#8471](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8471))\n* Add scheduler field to ResourceGroupGQL ([#8485](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8485))\n* Add Resource group scoped fair share query APIs ([#8504](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8504))\n* Provide default values for missing fair share records at Repository layer ([#8562](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8562))\n* Add usage bucket scoped APIs with date filters ([#8570](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8570))\n* Add REST API bulk upsert for fair share weights ([","2026-02-13T13:34:04",{"id":273,"version":274,"summary_zh":275,"released_at":276},154509,"26.2.0rc1","### Features\n* Support service definition override in legacy model service creation and dry run 
([#6934](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F6934))\n* Apply `RBACFieldCreator` pattern to Kernel creation, establishing Session-Kernel relationship via `EntityFieldRow`. Add `FieldType` enum to distinguish field-scoped entities from scope-scoped entities. ([#8035](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8035))\n* Apply RBAC Creator and Granter\u002FRevoker patterns to VFolder. VFolder creation now uses `RBACEntityCreator` for automatic scope association, and permission grant\u002Frevoke operations use `RBACGranter`\u002F`RBACRevoker` for consistent object-level permission management. ([#8038](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8038))\n* Add method based authorization decorator for pydantic migration ([#8046](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8046))\n* Apply RBAC Creator pattern to Model Deployment, automatically creating USER scope associations when endpoints are created for proper ownership tracking and permission management. ([#8048](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8048))\n* Define `KernelV2` GraphQL types with structured fields ([#8079](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8079))\n* Add `kernels` resolver to `AgentV2` GQL type ([#8080](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8080))\n* Add Upload-Offset header validation for TUS upload, adhering to the TUS protocol spec ([#8161](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8161))\n* Automatically install Rover CLI in TUI installer during `install_halfstack` if not already installed, and configure `APOLLO_ELV2_LICENSE` environment variable for supergraph schema generation. 
([#8312](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8312))\n* Implement BatchQuerier-based `SearchImageAction` ([#8339](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8339))\n* Implement `Image` strawberry GraphQL DataLoaders ([#8340](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8340))\n* Add exponential backoff for Redis consumer reconnection ([#8387](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8387))\n* Add Image ID-based service logic ([#8392](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8392))\n* Add per-process utilization metrics to Prometheus export. This enables CPU and memory monitoring at the process level within containers, allowing operators to identify resource-heavy processes and perform more granular resource analysis ([#8394](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8394))\n* Define `ImageV2` GraphQL schema types with structured fields ([#8396](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8396))\n* Add `ImageAlias` support to Strawberry GraphQL schema ([#8400](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8400))\n* Add a zip-based stream class for multi-file install requests. 
([#8424](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8424))\n* Implement querier pattern for RBAC permission and scope fetchers, enabling SQL-level pagination and filtering for permission groups, scoped permissions, and object permissions ([#8432](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8432))\n* Filter nested usage buckets by parent period ([#8448](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8448))\n* Implement BEP-1041 scope-based GraphQL API naming convention ([#8467](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8467))\n* Fix project\u002Fuser fair share filtering by domain and resource group ([#8471](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8471))\n* Add scheduler field to ResourceGroupGQL ([#8485](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8485))\n* Add resource group, container registry, artifact registry, and storage host as new RBAC scope types ([#8491](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8491))\n* Enhance project CSV export with JOIN fields ([#8492](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8492))\n* Add JOIN field support to User, Session, and Keypair CSV exports ([#8493](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8493))\n* Add periodic background task for volume stats observation ([#8497](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8497))\n* Add Prometheus Gauges for volume performance metrics and cache-based API response ([#8498](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8498))\n* Enable mypy strict mode across entire codebase ([#8499](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8499))\n* Add Resource group scoped fair share query APIs 
([#8504](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8504))\n* Add User V2 GraphQL API type definitions ([#8556](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8556))\n* Add User repository and service layer for V2 API ([#8557](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8557))\n* Add Project V2 repository and service layer with search capabilities following User V2 pattern ([#8558](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8558))\n* Add ProjectV2 GraphQL type definitions ([#8559](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8559))\n* Add Domain V2 repository layer ([#8560](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8560))\n* Add Domain V2 GraphQL API type","2026-02-12T06:49:01",{"id":278,"version":279,"summary_zh":280,"released_at":281},154510,"25.15.8","### Fixes\n* Fix missing `.vimrc` in runner package by restricting gitignore patterns to project root ([#8238](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8238))\n\n### External Dependency Updates\n* Fix pycares version to 4.11 because aiohttp does not support pycares 5.0, which introduced breaking changes. ([#7451](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7451))\n\n### Full Changelog\n\nCheck out [the full changelog](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fblob\u002F25.15.8\u002FCHANGELOG.md) until this release (25.15.8).\n\n\n### Full Commit Logs\n\nCheck out [the full commit logs](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fcompare\u002F25.15.7...25.15.8) between release (25.15.7) and (25.15.8).\n","2026-01-26T03:47:51",{"id":283,"version":284,"summary_zh":285,"released_at":286},154511,"26.1.0","### Features\n\n#### Fair Share Scheduler\nImplemented a Fair Share scheduling system for equitable resource distribution. 
This system tracks usage by domain and project, adjusting scheduling priorities based on configurable weights.\n\n* Add Fair Share and Resource Usage History row models with tests and migration ([#8008](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8008))\n* Implement Fair Share and Resource Usage History repository layers ([#8030](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8030))\n* Implement FairShareService and ResourceUsageService ([#8031](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8031))\n* Implement Fair Share API layer and CLI ([#8033](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8033))\n* Implement FairShareObserver with factor calculation ([#8208](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8208))\n* Add GQL mutations for fair share configuration ([#8215](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8215))\n* Add REST API endpoints for fair share weight and spec management ([#8220](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8220))\n* Add Fair Share weight management with bulk operations and resource validation ([#8278](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8278))\n* Implement FairShareSequencer with scheduling_rank for Sokovan scheduler ([#8152](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8152))\n\n#### Sokovan Scheduler Redesign\nRedesigned the Sokovan scheduler with a Coordinator-centric architecture, improving state transition logic and integrating scheduling history recording.\n\n* Unify scheduler handlers with SessionLifecycleHandler pattern ([#7867](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7867))\n* Add status transition types and implement status_transitions() in all handlers 
([#8111](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8111))\n* Separate SessionPromotionHandler from SessionLifecycleHandler ([#8112](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8112))\n* Redesign Sokovan scheduler with Coordinator-centric architecture ([#8137](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8137))\n* Centralize state change logic to Coordinator and fix kernel status transitions ([#7919](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7919))\n\n#### Scheduling History\nAdded comprehensive scheduling history tracking and API stack for monitoring and debugging scheduler operations.\n\n* Integrate history recording into ScheduleCoordinator status transitions ([#7896](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7896))\n* Add sub_steps support to scheduling history tables ([#7898](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7898))\n* Add recorder instrumentation to session lifecycle handlers ([#7905](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7905))\n* Implement Scheduling History API stack ([#7917](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7917))\n* Improve scheduling history CLI and recording ([#8063](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8063))\n* Implement DeploymentRecorderContext and RouteRecorderContext ([#7972](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7972))\n* Integrate history recording with deployment\u002Froute coordinators ([#7976](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7976))\n* Add recorder instrumentation to deployment and route executors ([#7994](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7994))\n\n#### Email Notification System\nImplemented email notification channel using 
SMTP, supporting both authenticated SMTP servers and unauthenticated relay servers.\n\n* Implement email notification channel support using SMTP, supporting both authenticated SMTP servers and unauthenticated relay servers ([#7941](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7941))\n* Add email notification support for model service endpoint lifecycle transitions. Notifications are sent through configured notification channels and include endpoint details, previous\u002Fnew status, and transition result ([#8159](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F8159))\n\n#### Agent RPC Connection Pooling\nImplemented RPC connection pooling for improved communication efficiency with agents.\n\n* Implement Agent RPC Connection Pool ([#7909](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7909))\n* Migrate registry.py to AgentClientPool and remove legacy client ([#7973](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7973))\n\n#### CSV Export Infrastructure\nImplemented streaming CSV export infrastructure for bulk data exports.\n\n* Add streaming export infrastructure for CSV export ([#7918](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7918))\n* Implement domain CSV export for projects, users, sessions ([#7944](https:\u002F\u002Fgithub.com\u002Flablup\u002Fbackend.ai\u002Fissues\u002F7944))\n\n#### RBAC System Improvements\nAdded features to support permission inheritance when an entity exists as a field of another entity.\n\n* Add RBAC-aware entity management utilities including Creator, Purger, and Granter classes. Creator handles entity creation with automatic scope association, Purger performs entity deletion with cascading removal of related RBAC data (permissions, roles, associations), and ","2026-01-26T03:04:02"]