[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-getumbrel--llama-gpt":3,"tool-getumbrel--llama-gpt":62},[4,18,26,36,46,54],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160411,2,"2026-04-18T23:33:24",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":42,"last_commit_at":43,"category_tags":44,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,45],"插件",{"id":47,"name":48,"github_repo":49,"description_zh":50,"stars":51,"difficulty_score":32,"last_commit_at":52,"category_tags":53,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 
都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[14,15,13],{"id":55,"name":56,"github_repo":57,"description_zh":58,"stars":59,"difficulty_score":32,"last_commit_at":60,"category_tags":61,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[45,13,15,14],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":106,"forks":107,"last_commit_at":108,"license":109,"difficulty_score":10,"env_os":110,"env_gpu":111,"env_ram":112,"env_deps":113,"category_tags":120,"github_topics":121,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":138,"updated_at":139,"faqs":140,"releases":169},9323,"getumbrel\u002Fllama-gpt","llama-gpt","A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. 
New: Code Llama support!","LlamaGPT 是一款可完全本地部署、离线运行的类 ChatGPT 聊天机器人，核心由强大的 Llama 2 及 Code Llama 模型驱动。它的核心价值在于极致的隐私保护：所有对话数据仅在用户设备内部处理，绝不上传至云端或第三方服务器，彻底消除了数据泄露的隐患。\n\n这一特性完美解决了用户对敏感信息交互的隐私顾虑，以及在没有网络连接环境下仍需使用智能助手的需求。无论是注重数据安全的企业管理者、需要隔离环境测试代码的开发者、关注隐私的研究人员，还是希望在家中搭建私人智能助手的普通技术爱好者，都能从中受益。\n\n在技术亮点方面，LlamaGPT 不仅支持从 7B 到 70B 多种规模的对话模型，还新增了对 Code Llama 系列的支持，使其具备出色的代码生成与解释能力。此外，它兼容 Nvidia GPU 加速，并提供 OpenAI 格式的 API 接口，方便集成到现有工作流中。安装过程十分灵活，既可通过 umbrelOS 一键部署，也支持在 M1\u002FM2 Mac 或任何装有 Docker 的系统上快速运行，让私有化大模型的应用门槛大幅降低。","\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fapps.umbrel.com\u002Fapp\u002Fllama-gpt\">\n    \u003Cimg width=\"150\" height=\"150\" src=\"https:\u002F\u002Fi.imgur.com\u002FLI59cui.png\" alt=\"LlamaGPT\" width=\"200\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Ch1 align=\"center\">LlamaGPT\u003C\u002Fh1>\n  \u003Cp align=\"center\">\n    A self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device.\n    \u003Cbr\u002F>\n    \u003Cstrong>New: Support for Code Llama models and Nvidia GPUs.\u003C\u002Fstrong>\n    \u003Cbr \u002F>\n    \u003Cbr \u002F>\n    \u003Ca href=\"https:\u002F\u002Fumbrel.com\">\u003Cstrong>umbrel.com (we're hiring) »\u003C\u002Fstrong>\u003C\u002Fa>\n    \u003Cbr \u002F>\n    \u003Cbr \u002F>\n    \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Fumbrel\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002Fumbrel?style=social\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Ft.me\u002Fgetumbrel\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcommunity-chat-%235351FB\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Freddit.com\u002Fr\u002Fgetumbrel\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Freddit\u002Fsubreddit-subscribers\u002Fgetumbrel?style=social\">\n    \u003C\u002Fa>\n    \u003Ca 
href=\"https:\u002F\u002Fcommunity.umbrel.com\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcommunity-forum-%235351FB\">\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fumbrel.com\u002F#start\">\n    \u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002Fsj5vqEG.jpg\" width=\"100%\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n## Contents\n\n1. [Demo](#demo)\n2. [Supported Models](#supported-models)\n3. [How to install](#how-to-install)\n   - [On umbrelOS home server](#install-llamagpt-on-your-umbrelos-home-server)\n   - [On M1\u002FM2 Mac](#install-llamagpt-on-m1m2-mac)\n   - [Anywhere else with Docker](#install-llamagpt-anywhere-else-with-docker)\n   - [Kubernetes](#install-llamagpt-with-kubernetes)\n4. [OpenAI-compatible API](#openai-compatible-api)\n5. [Benchmarks](#benchmarks)\n6. [Roadmap and contributing](#roadmap-and-contributing)\n7. [Acknowledgements](#acknowledgements)\n\n## Demo\n\nhttps:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt\u002Fassets\u002F10330103\u002F5d1a76b8-ed03-4a51-90bd-12ebfaf1e6cd\n\n## Supported models\n\nCurrently, LlamaGPT supports the following models. 
Support for running custom models is on the roadmap.\n\n| Model name                               | Model size | Model download size | Memory required |\n| ---------------------------------------- | ---------- | ------------------- | --------------- |\n| Nous Hermes Llama 2 7B Chat (GGML q4_0)  | 7B         | 3.79GB              | 6.29GB          |\n| Nous Hermes Llama 2 13B Chat (GGML q4_0) | 13B        | 7.32GB              | 9.82GB          |\n| Nous Hermes Llama 2 70B Chat (GGML q4_0) | 70B        | 38.87GB             | 41.37GB         |\n| Code Llama 7B Chat (GGUF Q4_K_M)         | 7B         | 4.24GB              | 6.74GB          |\n| Code Llama 13B Chat (GGUF Q4_K_M)        | 13B        | 8.06GB              | 10.56GB         |\n| Phind Code Llama 34B Chat (GGUF Q4_K_M)  | 34B        | 20.22GB             | 22.72GB         |\n\n## How to install\n\n### Install LlamaGPT on your umbrelOS home server\n\nRunning LlamaGPT on an [umbrelOS](https:\u002F\u002Fumbrel.com) home server is one click. 
Simply install it from the [Umbrel App Store](https:\u002F\u002Fapps.umbrel.com\u002Fapp\u002Fllama-gpt).\n\n[![LlamaGPT on Umbrel App Store](https:\u002F\u002Fapps.umbrel.com\u002Fapp\u002Fllama-gpt\u002Fbadge-light.svg)](https:\u002F\u002Fapps.umbrel.com\u002Fapp\u002Fllama-gpt)\n\n### Install LlamaGPT on M1\u002FM2 Mac\n\nMake sure you have Docker and Xcode installed.\n\nThen, clone this repo and `cd` into it:\n\n```\ngit clone https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt.git\ncd llama-gpt\n```\n\nRun LlamaGPT with the following command:\n\n```\n.\u002Frun-mac.sh --model 7b\n```\n\nYou can access LlamaGPT at http:\u002F\u002Flocalhost:3000.\n\n> To run 13B or 70B chat models, replace `7b` with `13b` or `70b` respectively.\n> To run 7B, 13B or 34B Code Llama models, replace `7b` with `code-7b`, `code-13b` or `code-34b` respectively.\n\nTo stop LlamaGPT, press `Ctrl + C` in Terminal.\n\n### Install LlamaGPT anywhere else with Docker\n\nYou can run LlamaGPT on any x86 or arm64 system. Make sure you have Docker installed.\n\nThen, clone this repo and `cd` into it:\n\n```\ngit clone https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt.git\ncd llama-gpt\n```\n\nRun LlamaGPT with the following command:\n\n```\n.\u002Frun.sh --model 7b\n```\n\nOr, if you have an Nvidia GPU, you can run LlamaGPT with CUDA support by adding the `--with-cuda` flag:\n\n```\n.\u002Frun.sh --model 7b --with-cuda\n```\n\nYou can access LlamaGPT at `http:\u002F\u002Flocalhost:3000`.\n\n> To run 13B or 70B chat models, replace `7b` with `13b` or `70b` respectively.\n> To run Code Llama 7B, 13B or 34B models, replace `7b` with `code-7b`, `code-13b` or `code-34b` respectively.\n\nTo stop LlamaGPT, press `Ctrl + C` in Terminal.\n\n> Note: On the first run, it may take a while for the model to be downloaded to the `\u002Fmodels` directory. 
You may also see lots of output like this for a few minutes, which is normal:\n>\n> ```\n> llama-gpt-llama-gpt-ui-1       | [INFO  wait] Host [llama-gpt-api-13b:8000] not yet available...\n> ```\n>\n> After the model has been automatically downloaded and loaded, and the API server is running, you'll see output like:\n>\n> ```\n> llama-gpt-ui_1   | ready - started server on 0.0.0.0:3000, url: http:\u002F\u002Flocalhost:3000\n> ```\n>\n> You can then access LlamaGPT at http:\u002F\u002Flocalhost:3000.\n\n---\n\n### Install LlamaGPT with Kubernetes\n\nFirst, make sure you have a running Kubernetes cluster and that `kubectl` is configured to interact with it.\n\nThen, clone this repo and `cd` into it.\n\nTo deploy to Kubernetes, first create a namespace:\n\n```bash\nkubectl create ns llama\n```\n\nThen apply the manifests under the `\u002Fdeploy\u002Fkubernetes` directory:\n\n```bash\nkubectl apply -k deploy\u002Fkubernetes\u002F. -n llama\n```\n\nFinally, expose the service the way you normally would in your cluster.\n\n## OpenAI compatible API\n\nThanks to llama-cpp-python, a drop-in replacement for the OpenAI API is available at `http:\u002F\u002Flocalhost:3001`. Open http:\u002F\u002Flocalhost:3001\u002Fdocs to see the API documentation.\n\n## Benchmarks\n\nWe've tested LlamaGPT models on the following hardware with the default system prompt and the user prompt \"How does the universe expand?\" at temperature 0 to guarantee deterministic results. 
Generation speed is averaged over the first 10 generations.\n\nFeel free to add your own benchmarks to this table by opening a pull request.\n\n#### Nous Hermes Llama 2 7B Chat (GGML q4_0)\n\n| Device                              | Generation speed |\n| ----------------------------------- | ---------------- |\n| M1 Max MacBook Pro (64GB RAM)       | 54 tokens\u002Fsec    |\n| GCP c2-standard-16 vCPU (64 GB RAM) | 16.7 tokens\u002Fsec  |\n| Ryzen 5700G 4.4GHz 4c (16 GB RAM)   | 11.50 tokens\u002Fsec |\n| GCP c2-standard-4 vCPU (16 GB RAM)  | 4.3 tokens\u002Fsec   |\n| Umbrel Home (16GB RAM)              | 2.7 tokens\u002Fsec   |\n| Raspberry Pi 4 (8GB RAM)            | 0.9 tokens\u002Fsec   |\n\n#### Nous Hermes Llama 2 13B Chat (GGML q4_0)\n\n| Device                              | Generation speed |\n| ----------------------------------- | ---------------- |\n| M1 Max MacBook Pro (64GB RAM)       | 20 tokens\u002Fsec    |\n| GCP c2-standard-16 vCPU (64 GB RAM) | 8.6 tokens\u002Fsec   |\n| GCP c2-standard-4 vCPU (16 GB RAM)  | 2.2 tokens\u002Fsec   |\n| Umbrel Home (16GB RAM)              | 1.5 tokens\u002Fsec   |\n\n#### Nous Hermes Llama 2 70B Chat (GGML q4_0)\n\n| Device                              | Generation speed |\n| ----------------------------------- | ---------------- |\n| M1 Max MacBook Pro (64GB RAM)       | 4.8 tokens\u002Fsec   |\n| GCP e2-standard-16 vCPU (64 GB RAM) | 1.75 tokens\u002Fsec  |\n| GCP c2-standard-16 vCPU (64 GB RAM) | 1.62 tokens\u002Fsec  |\n\n#### Code Llama 7B Chat (GGUF Q4_K_M)\n\n| Device                        | Generation speed |\n| ----------------------------- | ---------------- |\n| M1 Max MacBook Pro (64GB RAM) | 41 tokens\u002Fsec    |\n\n#### Code Llama 13B Chat (GGUF Q4_K_M)\n\n| Device                        | Generation speed |\n| ----------------------------- | ---------------- |\n| M1 Max MacBook Pro (64GB RAM) | 25 tokens\u002Fsec    |\n\n#### Phind Code Llama 34B Chat (GGUF Q4_K_M)\n\n| Device                      
  | Generation speed |\n| ----------------------------- | ---------------- |\n| M1 Max MacBook Pro (64GB RAM) | 10.26 tokens\u002Fsec |\n\n## Roadmap and contributing\n\nWe're looking to add more features to LlamaGPT. You can see the roadmap [here](https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt\u002Fissues\u002F8#issuecomment-1681321145). The highest priorities are:\n\n- [x] Moving the model out of the Docker image and into a separate volume.\n- [x] Add Metal support for M1\u002FM2 Macs.\n- [x] Add support for Code Llama models.\n- [x] Add CUDA support for NVIDIA GPUs.\n- [ ] Add ability to load custom models.\n- [ ] Allow users to switch between models.\n\nIf you're a developer who'd like to help with any of these, please open an issue to discuss the best way to tackle the challenge. If you're looking to help but not sure where to begin, check out [these issues](https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt\u002Flabels\u002Fgood%20first%20issue) that have specifically been marked as being friendly to new contributors.\n\n## Acknowledgements\n\nA massive thank you to the following developers and teams for making LlamaGPT possible:\n\n- [Mckay Wrigley](https:\u002F\u002Fgithub.com\u002Fmckaywrigley) for building [Chatbot UI](https:\u002F\u002Fgithub.com\u002Fmckaywrigley).\n- [Georgi Gerganov](https:\u002F\u002Fgithub.com\u002Fggerganov) for implementing [llama.cpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp).\n- [Andrei](https:\u002F\u002Fgithub.com\u002Fabetlen) for building the [Python bindings for llama.cpp](https:\u002F\u002Fgithub.com\u002Fabetlen\u002Fllama-cpp-python).\n- [NousResearch](https:\u002F\u002Fnousresearch.com) for [fine-tuning the Llama 2 7B and 13B models](https:\u002F\u002Fhuggingface.co\u002FNousResearch).\n- [Phind](https:\u002F\u002Fwww.phind.com\u002F) for [fine-tuning the Code Llama 34B model](https:\u002F\u002Fwww.phind.com\u002Fblog\u002Fcode-llama-beats-gpt4).\n- [Tom 
Jobbins](https:\u002F\u002Fhuggingface.co\u002FTheBloke) for [quantizing the Llama 2 models](https:\u002F\u002Fhuggingface.co\u002FTheBloke\u002FNous-Hermes-Llama-2-7B-GGML).\n- [Meta](https:\u002F\u002Fai.meta.com\u002Fllama) for releasing Llama 2 and Code Llama under a permissive license.\n\n---\n\n[![License](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fgetumbrel\u002Fllama-gpt?color=%235351FB)](https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt\u002Fblob\u002Fmaster\u002FLICENSE.md)\n\n[umbrel.com](https:\u002F\u002Fumbrel.com)\n","\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fapps.umbrel.com\u002Fapp\u002Fllama-gpt\">\n    \u003Cimg width=\"150\" height=\"150\" src=\"https:\u002F\u002Fi.imgur.com\u002FLI59cui.png\" alt=\"LlamaGPT\" width=\"200\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Ch1 align=\"center\">LlamaGPT\u003C\u002Fh1>\n  \u003Cp align=\"center\">\n    一款自托管、离线的类ChatGPT聊天机器人，基于Llama 2模型。100%私密，数据不会离开您的设备。\n    \u003Cbr\u002F>\n    \u003Cstrong>新增：支持Code Llama模型和Nvidia GPU。\u003C\u002Fstrong>\n    \u003Cbr \u002F>\n    \u003Cbr \u002F>\n    \u003Ca href=\"https:\u002F\u002Fumbrel.com\">\u003Cstrong>umbrel.com（我们正在招聘）»\u003C\u002Fstrong>\u003C\u002Fa>\n    \u003Cbr \u002F>\n    \u003Cbr \u002F>\n    \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Fumbrel\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002Fumbrel?style=social\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Ft.me\u002Fgetumbrel\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcommunity-chat-%235351FB\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Freddit.com\u002Fr\u002Fgetumbrel\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Freddit\u002Fsubreddit-subscribers\u002Fgetumbrel?style=social\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fcommunity.umbrel.com\">\n      \u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcommunity-forum-%235351FB\">\n    \u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fumbrel.com\u002F#start\">\n    \u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002Fsj5vqEG.jpg\" width=\"100%\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n## 目录\n\n1. [演示](#demo)\n2. [支持的模型](#supported-models)\n3. [安装方法](#how-to-install)\n   - [在umbrelOS家庭服务器上](#install-llamagpt-on-your-umbrelos-home-server)\n   - [在M1\u002FM2 Mac上](#install-llamagpt-on-m1m2-mac)\n   - [其他任何使用Docker的地方](#install-llamagpt-anywhere-else-with-docker)\n   - [Kubernetes](#install-llamagpt-with-kubernetes)\n4. [与OpenAI兼容的API](#openai-compatible-api)\n5. [基准测试](#benchmarks)\n6. [路线图与贡献](#roadmap-and-contributing)\n7. [致谢](#acknowledgements)\n\n## 演示\n\nhttps:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt\u002Fassets\u002F10330103\u002F5d1a76b8-ed03-4a51-90bd-12ebfaf1e6cd\n\n## 支持的模型\n\n目前，LlamaGPT支持以下模型。运行自定义模型的支持计划在未来实现。\n\n| 模型名称                               | 模型规模 | 模型下载大小 | 所需内存 |\n| ---------------------------------------- | ---------- | ------------------- | --------------- |\n| Nous Hermes Llama 2 7B Chat (GGML q4_0)  | 7B         | 3.79GB              | 6.29GB          |\n| Nous Hermes Llama 2 13B Chat (GGML q4_0) | 13B        | 7.32GB              | 9.82GB          |\n| Nous Hermes Llama 2 70B Chat (GGML q4_0) | 70B        | 38.87GB             | 41.37GB         |\n| Code Llama 7B Chat (GGUF Q4_K_M)         | 7B         | 4.24GB              | 6.74GB          |\n| Code Llama 13B Chat (GGUF Q4_K_M)        | 13B        | 8.06GB              | 10.56GB         |\n| Phind Code Llama 34B Chat (GGUF Q4_K_M)  | 34B        | 20.22GB             | 22.72GB         |\n\n## 安装方法\n\n### 
在您的umbrelOS家庭服务器上安装LlamaGPT\n\n在[umbrelOS](https:\u002F\u002Fumbrel.com)家庭服务器上运行LlamaGPT只需点击一下即可。只需从[Umbrel应用商店](https:\u002F\u002Fapps.umbrel.com\u002Fapp\u002Fllama-gpt)安装即可。\n\n[![LlamaGPT在Umbrel应用商店](https:\u002F\u002Fapps.umbrel.com\u002Fapp\u002Fllama-gpt\u002Fbadge-light.svg)](https:\u002F\u002Fapps.umbrel.com\u002Fapp\u002Fllama-gpt)\n\n### 在M1\u002FM2 Mac上安装LlamaGPT\n\n请确保您已安装Docker和Xcode。\n\n然后，克隆此仓库并进入其中：\n\n```\ngit clone https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt.git\ncd llama-gpt\n```\n\n使用以下命令运行LlamaGPT：\n\n```\n.\u002Frun-mac.sh --model 7b\n```\n\n您可以通过http:\u002F\u002Flocalhost:3000访问LlamaGPT。\n\n> 要运行13B或70B聊天模型，请分别将`7b`替换为`13b`或`70b`。\n> 要运行7B、13B或34B的Code Llama模型，请分别将`7b`替换为`code-7b`、`code-13b`或`code-34b`。\n\n要停止LlamaGPT，请在终端中按`Ctrl + C`。\n\n### 在其他任何地方使用Docker安装LlamaGPT\n\n您可以在任何x86或arm64系统上运行LlamaGPT。请确保已安装Docker。\n\n然后，克隆此仓库并进入其中：\n\n```\ngit clone https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt.git\ncd llama-gpt\n```\n\n使用以下命令运行LlamaGPT：\n\n```\n.\u002Frun.sh --model 7b\n```\n\n或者，如果您有Nvidia GPU，可以使用`--with-cuda`标志以CUDA支持运行LlamaGPT，例如：\n\n```\n.\u002Frun.sh --model 7b --with-cuda\n```\n\n您可以通过`http:\u002F\u002Flocalhost:3000`访问LlamaGPT。\n\n> 要运行13B或70B聊天模型，请分别将`7b`替换为`13b`或`70b`。\n> 要运行Code Llama 7B、13B或34B模型，请分别将`7b`替换为`code-7b`、`code-13b`或`code-34b`。\n\n要停止LlamaGPT，请在终端中按`Ctrl + C`。\n\n> 注意：首次运行时，模型下载到`\u002Fmodels`目录可能需要一些时间。您还可能会看到类似以下内容的大量输出，持续几分钟，这是正常现象：\n>\n> ```\n> llama-gpt-llama-gpt-ui-1       | [INFO  wait] Host [llama-gpt-api-13b:8000] not yet available...\n> ```\n>\n> 当模型自动下载并加载完毕，且API服务器开始运行后，您将看到如下输出：\n>\n> ```\n> llama-gpt-ui_1   | ready - started server on 0.0.0.0:3000, url: http:\u002F\u002Flocalhost:3000\n> ```\n>\n> 此时，您就可以通过http:\u002F\u002Flocalhost:3000访问LlamaGPT了。\n\n---\n\n### 使用Kubernetes安装LlamaGPT\n\n首先，确保您有一个正在运行的Kubernetes集群，并且`kubectl`已配置好以便与其交互。\n\n然后，克隆此仓库并进入其中。\n\n要部署到Kubernetes，首先创建一个命名空间：\n\n```bash\nkubectl create 
ns llama\n```\n\n然后应用`\u002Fdeploy\u002Fkubernetes`目录下的清单文件：\n\n```bash\nkubectl 
apply -k deploy\u002Fkubernetes\u002F. -n llama\n```\n\n按照常规方式公开您的服务。\n\n## 与OpenAI兼容的API\n\n得益于llama-cpp-python，一个可直接替代OpenAI API的接口已在`http:\u002F\u002Flocalhost:3001`提供。打开http:\u002F\u002Flocalhost:3001\u002Fdocs查看API文档。\n\n## 基准测试\n\n我们在以下硬件上使用默认系统提示和用户提示“宇宙是如何膨胀的？”以温度0运行LlamaGPT模型，以确保结果的确定性。生成速度取前10次生成的平均值。\n\n欢迎通过提交拉取请求向此表格添加您自己的基准测试结果。\n\n#### Nous Hermes Llama 2 7B Chat (GGML q4_0)\n\n| 设备                              | 生成速度 |\n| ----------------------------------- | -------- |\n| M1 Max MacBook Pro (64GB 内存)       | 54 tokens\u002F秒    |\n| GCP c2-standard-16 vCPU (64 GB 内存) | 16.7 tokens\u002F秒  |\n| Ryzen 5700G 4.4GHz 4核 (16 GB 内存)   | 11.50 tokens\u002F秒 |\n| GCP c2-standard-4 vCPU (16 GB 内存)  | 4.3 tokens\u002F秒   |\n| Umbrel Home (16GB 内存)              | 2.7 tokens\u002F秒   |\n| Raspberry Pi 4 (8GB 内存)            | 0.9 tokens\u002F秒   |\n\n#### Nous Hermes Llama 2 13B Chat (GGML q4_0)\n\n| 设备                              | 生成速度 |\n| ----------------------------------- | -------- |\n| M1 Max MacBook Pro (64GB 内存)       | 20 tokens\u002F秒    |\n| GCP c2-standard-16 vCPU (64 GB 内存) | 8.6 tokens\u002F秒   |\n| GCP c2-standard-4 vCPU (16 GB 内存)  | 2.2 tokens\u002F秒   |\n| Umbrel Home (16GB 内存)              | 1.5 tokens\u002F秒   |\n\n#### Nous Hermes Llama 2 70B Chat (GGML q4_0)\n\n| 设备                              | 生成速度 |\n| ----------------------------------- | -------- |\n| M1 Max MacBook Pro (64GB 内存)       | 4.8 tokens\u002F秒   |\n| GCP e2-standard-16 vCPU (64 GB 内存) | 1.75 tokens\u002F秒  |\n| GCP c2-standard-16 vCPU (64 GB 内存) | 1.62 tokens\u002F秒  |\n\n#### Code Llama 7B Chat (GGUF Q4_K_M)\n\n| 设备                        | 生成速度 |\n| ----------------------------- | ---------- |\n| M1 Max MacBook Pro (64GB 内存) | 41 tokens\u002F秒    |\n\n#### Code Llama 13B Chat (GGUF Q4_K_M)\n\n| 设备                        | 生成速度 |\n| ----------------------------- | ---------- |\n| M1 Max MacBook Pro (64GB 内存) | 25 tokens\u002F秒    |\n\n#### Phind Code Llama 34B Chat 
(GGUF Q4_K_M)\n\n| 设备                        | 生成速度 |\n| ----------------------------- | ---------- |\n| M1 Max MacBook Pro (64GB 内存) | 10.26 tokens\u002F秒 |\n\n## 路线图与贡献\n\n我们计划为LlamaGPT添加更多功能。您可以在此处查看路线图：[链接](https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt\u002Fissues\u002F8#issuecomment-1681321145)。目前的优先事项包括：\n\n- [x] 将模型从Docker镜像中移出，放入独立的存储卷。\n- [x] 为M1\u002FM2 Mac添加Metal支持。\n- [x] 添加对Code Llama模型的支持。\n- [x] 为NVIDIA GPU添加CUDA支持。\n- [ ] 添加加载自定义模型的功能。\n- [ ] 允许用户在不同模型之间切换。\n\n如果您是开发者并希望参与其中，请开一个议题讨论最佳实现方式。如果您想帮忙但不确定从何入手，可以查看这些已被标记为适合新手贡献的议题：[链接](https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt\u002Flabels\u002Fgood%20first%20issue)。\n\n## 致谢\n\n衷心感谢以下开发者和团队使LlamaGPT成为可能：\n\n- [Mckay Wrigley](https:\u002F\u002Fgithub.com\u002Fmckaywrigley) 构建了[Chatbot UI](https:\u002F\u002Fgithub.com\u002Fmckaywrigley)。\n- [Georgi Gerganov](https:\u002F\u002Fgithub.com\u002Fggerganov) 实现了[llama.cpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp)。\n- [Andrei](https:\u002F\u002Fgithub.com\u002Fabetlen) 构建了[llama.cpp的Python绑定](https:\u002F\u002Fgithub.com\u002Fabetlen\u002Fllama-cpp-python)。\n- [NousResearch](https:\u002F\u002Fnousresearch.com) 对[Llama 2 7B和13B模型进行了微调](https:\u002F\u002Fhuggingface.co\u002FNousResearch)。\n- [Phind](https:\u002F\u002Fwww.phind.com\u002F) 对[Code Llama 34B模型进行了微调](https:\u002F\u002Fwww.phind.com\u002Fblog\u002Fcode-llama-beats-gpt4)。\n- [Tom Jobbins](https:\u002F\u002Fhuggingface.co\u002FTheBloke) 对[Llama 2模型进行了量化](https:\u002F\u002Fhuggingface.co\u002FTheBloke\u002FNous-Hermes-Llama-2-7B-GGML)。\n- [Meta](https:\u002F\u002Fai.meta.com\u002Fllama) 以宽松许可协议发布了Llama 2和Code Llama。\n\n---\n\n[![许可证](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fgetumbrel\u002Fllama-gpt?color=%235351FB)](https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt\u002Fblob\u002Fmaster\u002FLICENSE.md)\n\n[umbrel.com](https:\u002F\u002Fumbrel.com)","# LlamaGPT 快速上手指南\n\nLlamaGPT 是一个自托管、完全离线的类 ChatGPT 聊天机器人，由 Llama 2 和 Code Llama 
模型驱动。所有数据均在本地处理，确保 100% 隐私。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**：macOS (M1\u002FM2), Linux (x86_64\u002Farm64), 或任何支持 Docker 的系统。\n- **内存需求**（根据所选模型不同）：\n  - 7B 模型：至少 6.3 GB 可用内存\n  - 13B 模型：至少 9.8 GB 可用内存\n  - 34B 模型：至少 22.7 GB 可用内存\n  - 70B 模型：至少 41.4 GB 可用内存\n- **硬件加速**（可选但推荐）：\n  - Mac: M1\u002FM2 芯片（自动启用 Metal 加速）\n  - NVIDIA GPU: 需安装支持 CUDA 的驱动\n\n### 前置依赖\n- **Docker**：必须安装 Docker Desktop 或 Docker Engine。\n- **Git**：用于克隆项目仓库。\n- **Xcode Command Line Tools**：仅限 macOS 用户。\n  ```bash\n  xcode-select --install\n  ```\n\n## 安装步骤\n\n### 1. 克隆项目\n打开终端，执行以下命令获取源码：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt.git\ncd llama-gpt\n```\n\n### 2. 运行服务\n\n#### 方案 A：通用 Docker 运行（推荐 Linux\u002FWindows\u002F其他 Mac）\n适用于大多数 x86 或 arm64 架构设备。\n\n**运行 7B 对话模型：**\n```bash\n.\u002Frun.sh --model 7b\n```\n\n**运行 13B 或 70B 对话模型：**\n```bash\n.\u002Frun.sh --model 13b\n# 或\n.\u002Frun.sh --model 70b\n```\n\n**运行 Code Llama 代码模型：**\n```bash\n.\u002Frun.sh --model code-7b\n# 或 code-13b, code-34b\n```\n\n**启用 NVIDIA GPU 加速（CUDA）：**\n如果您的设备配有 NVIDIA 显卡，请添加 `--with-cuda` 标志：\n```bash\n.\u002Frun.sh --model 7b --with-cuda\n```\n\n#### 方案 B：macOS (M1\u002FM2) 专用运行\n针对 Apple Silicon 芯片优化的脚本：\n```bash\n.\u002Frun-mac.sh --model 7b\n```\n*注：同样支持替换为 `13b`, `70b`, `code-7b` 等参数。*\n\n#### 方案 C：Umbrel OS 用户\n如果您使用 Umbrel 家庭服务器，可直接在 Umbrel App Store 中一键安装 LlamaGPT。\n\n> **注意**：首次运行时，系统会自动下载模型文件到 `\u002Fmodels` 目录，可能需要几分钟时间。期间出现 `Host not yet available` 日志属于正常现象，请耐心等待直到看到 `started server on 0.0.0.0:3000`。\n\n## 基本使用\n\n### 访问聊天界面\n服务启动成功后，在浏览器中打开：\n```text\nhttp:\u002F\u002Flocalhost:3000\n```\n即可开始与本地 AI 进行私密对话。\n\n### 调用 OpenAI 兼容 API\nLlamaGPT 提供了与 OpenAI 格式兼容的 API 接口，方便集成到其他应用。\n- **API 地址**：`http:\u002F\u002Flocalhost:3001`\n- **API 文档**：`http:\u002F\u002Flocalhost:3001\u002Fdocs`\n\n您可以使用标准的 OpenAI SDK 指向该地址进行开发。\n\n### 停止服务\n在运行脚本的终端窗口中按下 `Ctrl + C` 即可停止服务。","某金融初创公司的后端团队需要在本地开发环境中频繁调试敏感的交易算法代码，同时确保核心逻辑绝不外泄。\n\n### 没有 llama-gpt 时\n- **数据泄露风险高**：开发者若使用在线 AI 
助手排查代码漏洞，必须将包含商业机密的交易逻辑片段上传至云端，存在极大的合规隐患。\n- **网络依赖性强**：在封闭的内网开发环境或网络不稳定时，无法随时获取代码建议，导致调试工作被迫中断。\n- **响应延迟影响心流**：等待云端 API 返回结果的网络延迟经常打断程序员的思维连贯性，降低复杂算法的编写效率。\n- **成本不可控**：团队高频次的代码查询会导致云端 Token 消耗激增，显著增加项目的运营预算。\n\n### 使用 llama-gpt 后\n- **100% 数据隐私安全**：借助 llama-gpt 的离线自托管特性，所有代码分析与生成均在本地设备完成，敏感交易数据从未离开过公司内网。\n- **完全离线可用**：无论网络状况如何，开发者都能通过本地部署的 llama-gpt 即时获得基于 Code Llama 模型的专业代码补全与纠错建议。\n- **零延迟即时反馈**：利用本地 Nvidia GPU 或 M1\u002FM2 芯片加速，llama-gpt 提供毫秒级的代码响应，让开发者保持专注的心流状态。\n- **零边际使用成本**：一次性部署后，团队可无限次调用 llama-gpt 进行高强度代码审查，无需担心按量计费带来的预算压力。\n\nllama-gpt 通过构建纯本地的智能编程伴侣，完美解决了敏感行业在享受 AI 提效红利与严守数据隐私之间的两难困境。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgetumbrel_llama-gpt_9b32e5d0.png","getumbrel","Umbrel","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fgetumbrel_f84b145c.png","The ultimate home server and home cloud OS. Buy an Umbrel Home, or install umbrelOS on a Raspberry Pi.",null,"umbrel","https:\u002F\u002Fumbrel.com","https:\u002F\u002Fgithub.com\u002Fgetumbrel",[82,86,90,94,98,102],{"name":83,"color":84,"percentage":85},"TypeScript","#3178c6",89.5,{"name":87,"color":88,"percentage":89},"Shell","#89e051",7.5,{"name":91,"color":92,"percentage":93},"Dockerfile","#384d54",1.6,{"name":95,"color":96,"percentage":97},"JavaScript","#f1e05a",0.9,{"name":99,"color":100,"percentage":101},"CSS","#663399",0.3,{"name":103,"color":104,"percentage":105},"Makefile","#427819",0.2,10978,713,"2026-04-18T13:50:57","MIT","Linux, macOS","非必需。支持 NVIDIA GPU (需启用 --with-cuda 标志以使用 CUDA 加速)；M1\u002FM2 Mac 支持 Metal 加速。具体显存需求取决于模型大小（例如运行 7B 模型约需 6-7GB 内存\u002F显存，70B 模型需 41GB+）。","最低 6.29GB (运行 7B 模型)，推荐 16GB+；运行 70B 模型需 41.37GB+",{"notes":114,"python":115,"dependencies":116},"主要通过 Docker 部署（支持 x86 和 arm64 架构），也可在 Kubernetes 上运行。首次运行会自动下载模型文件（大小从 3.79GB 到 38.87GB 不等），启动时间较长。支持 Nous Hermes Llama 2 系列及 Code Llama 系列模型。提供兼容 OpenAI 的 API 
接口。","未说明",[117,118,119],"Docker","llama.cpp","llama-cpp-python",[13,35,14,15],[122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137],"ai","chatgpt","gpt","gpt-4","gpt4all","llama","llama-2","llama-cpp","llama2","llamacpp","llm","localai","openai","self-hosted","code-llama","codellama","2026-03-27T02:49:30.150509","2026-04-19T09:17:46.354688",[141,146,151,156,161,165],{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},42053,"启动时卡在 'Host [...] not yet available' 或出现 AssertionError 错误怎么办？","这通常是因为模型格式不兼容（GGML 与 GGUF）或版本过旧导致的。请执行以下步骤修复：\n1. 拉取最新代码：`git pull origin master`\n2. 重新运行启动脚本：\n   - Linux\u002FWindows: `.\u002Frun.sh --model 7b` (可将 7b 替换为 13b, 70b, code-7b 等)\n   - Mac M1\u002FM2: `.\u002Frun-mac.sh --model 7b`\n此问题已在主分支修复，支持 Code Llama 模型并默认使用 GGUF 格式。","https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt\u002Fissues\u002F86",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},42054,"在 Windows 上运行时容器重启循环并报错 'Syntax error' 或 ': not found' 如何解决？","这是由于 Windows 和 Unix 之间的换行符不一致导致的。可以通过以下任一方法解决：\n方法一（推荐）：确保已拉取最新代码，维护者已添加 `.gitattributes` 文件自动处理换行符，然后强制重建容器：\n`docker compose up -d --build`\n\n方法二（手动修复）：\n1. 进入 api 文件夹：`cd api`\n2. 转换换行符：`dos2unix run.sh` (需安装 dos2unix) 或在 Git Bash 中运行\n3. 重建容器：`docker compose up -d --build`","https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt\u002Fissues\u002F10",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},42055,"如何启用 NVIDIA CUDA 或 Apple Metal (M1\u002FM2) 加速支持？","项目现已支持 CUDA 和 Metal 加速。\n1. **CUDA (NVIDIA)**: 确保安装了 NVIDIA Container Toolkit，并使用带有 `--with-cuda` 参数的启动脚本，例如：`.\u002Frun.sh --model 7b --with-cuda`。项目已通过 PR #72 原生支持 CUDA 构建。\n2. 
**Metal (Mac M1\u002FM2)**: 使用专门的启动脚本 `.\u002Frun-mac.sh --model 7b`。该脚本会自动配置 `LLAMA_METAL=1` 进行构建。\n如果使用的是旧版本，可能需要手动编译支持 CUDA 的 llama-cpp-python 镜像并映射模型文件。","https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt\u002Fissues\u002F6",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},42056,"Docker 容器启动崩溃或依赖处理完成后立即退出怎么办？","如果是旧版本导致的崩溃，尝试重建容器通常能解决问题：\n`docker compose up -d --build`\n\n此外，为了避免模型重载并修复某些警告，可以在 `docker-compose.yaml` 的服务配置中添加以下环境变量和权限设置：\n```yaml\nenvironment:\n  USE_MLOCK: 1\ncap_add:\n  - IPC_LOCK\n```\n添加后请记得重新运行 `docker compose up -d` 生效。","https:\u002F\u002Fgithub.com\u002Fgetumbrel\u002Fllama-gpt\u002Fissues\u002F4",{"id":162,"question_zh":163,"answer_zh":164,"source_url":145},42057,"运行大模型（如 34b）时卡住不动，但小模型（7b\u002F13b）正常是怎么回事？","这通常是因为显存或内存不足导致模型无法加载。虽然日志显示 'Host not yet available'，但实际是后端 API 因内存不足启动失败。\n1. 检查系统可用内存和显存。例如 34b 模型即使量化也需要较多资源（建议 32GB+ RAM 和大显存）。\n2. 尝试减少分配给容器的资源限制，或者换用较小的模型（如 7b 或 13b）测试是否正常。\n3. 确保使用了正确的量化版本模型（如 q4_0），以降低内存需求。",{"id":166,"question_zh":167,"answer_zh":168,"source_url":145},42058,"模型文件存放在哪里？如何手动管理模型？","默认情况下，模型会下载并存储在克隆仓库目录下的 `models\u002F` 文件夹中。\n如果需要手动指定或预下载模型：\n1. 从 HuggingFace 下载对应的 `.bin` 或 `.gguf` 模型文件。\n2. 将文件放置在本地的 `models\u002F` 目录中。\n3. 重新启动容器，系统通常会检测到现有文件而跳过下载。\n注意：不同版本的 llama-gpt 可能对应不同的模型格式（GGML 或 GGUF），请确保下载的模型格式与代码版本匹配。",[]]