[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-OpenGVLab--LLaMA-Adapter":3,"tool-OpenGVLab--LLaMA-Adapter":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",152630,2,"2026-04-12T23:33:54",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 
人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":72,"owner_avatar_url":73,"owner_bio":74,"owner_company":75,"owner_location":75,"owner_email":76,"owner_twitter":77,"owner_website":75,"owner_url":78,"languages":79,"stars":104,"forks":105,"last_commit_at":106,"license":107,"difficulty_score":108,"env_os":109,"env_gpu":110,"env_ram":111,"env_deps":112,"category_tags":118,"github_topics":75,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":120,"updated_at":121,"faqs":122,"releases":153},6991,"OpenGVLab\u002FLLaMA-Adapter","LLaMA-Adapter","[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters","LLaMA-Adapter 是一套高效的大语言模型微调方案，旨在让 LLaMA 系列模型快速具备指令遵循和多模态理解能力。它核心解决了传统全量微调成本高昂、耗时漫长且资源消耗巨大的痛点，仅需约 1.2M 的可训练参数和 1 小时即可完成适配，极大降低了算力门槛。\n\n该工具非常适合希望低成本定制大模型的 AI 研究人员、开发者以及需要快速验证想法的技术团队。其独特的技术亮点在于引入了“零初始化注意力机制”（Zero-init Attention），通过在预训练模型中插入轻量级适配器层，在几乎不改变原有参数的前提下实现高性能微调。此外，LLaMA-Adapter 不仅支持文本指令微调，还扩展到了视觉 - 语言等多模态任务（如 LLaMA-Adapter V2），并能与 LangChain 等框架无缝集成。作为已被 ICLR 2024 收录的开源项目，它为社区提供了一条通往高性能大模型应用的便捷路径。","# LLaMA-Adapter: Efficient Fine-tuning of LLaMA 🚀\n\n## Announcement: We release **[LLaMA2-Accessory](https:\u002F\u002Fgithub.com\u002FAlpha-VLLM\u002FLLaMA2-Accessory)**, an open-source toolkit for **pre-training**, **fine-tuning** and **deployment** of **LLMs** and **mutlimodal LLMs**.🔥\n\nOfficial implementation of ['LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention'](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.16199.pdf) and ['LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model'](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.15010.pdf).\n\n\u003Cp align=\"center\">                                                                                                                                          \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpenGVLab_LLaMA-Adapter_readme_764bc6b39e7e.png\"\u002F width=\"100%\"> \u003Cbr>\n\u003C\u002Fp>\n\n\nThis repo proposes **LLaMA-Adapter (V2)**, a lightweight adaption method for fine-tuning **Instruction-following** and **Multi-modal** [LLaMA](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fllama) models 🔥.\n\nTry out the web demo 🤗 of LLaMA-Adapter: [![Hugging Face 
Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fcsuhan\u002FLLaMA-Adapter), [LLaMA-Adapter V2](http:\u002F\u002Fllama-adapter.opengvlab.com\u002F) and [ImageBind-LLM](http:\u002F\u002Fimagebind-llm.opengvlab.com\u002F).\n\nJoin us at [Wechat](https:\u002F\u002Fgithub.com\u002FAlpha-VLLM\u002FLLaMA2-Accessory\u002Fblob\u002Fmain\u002Fdocs\u002Fwechat.md)!\n\n## News\n- **[2024.01.18]** [LLaMA-Adapter](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.16199.pdf) is accepted by **ICLR 2024**!🎉\n- **[2023.12.08]** We release [OneLLM](https:\u002F\u002Fgithub.com\u002Fcsuhan\u002FOneLLM) which aligns **eight** modalities to language using a unified framework.🔥🔥🔥\n- **[2023.11.11]** We release [SPHINX](https:\u002F\u002Fgithub.com\u002FAlpha-VLLM\u002FLLaMA2-Accessory\u002Ftree\u002Fmain\u002FSPHINX), a new multi-modal LLM, which is a huge leap from LLaMA-Adapter-V2.🔥🔥🔥\n- **[2023.10.11]** We release **LLaMA-Adapter V2.1**, an improved version of LLaMA-Adapter V2 with stronger multi-modal reasoning performance. Check [llama_adapter_v2_multimodal7b](llama_adapter_v2_multimodal7b) for details.\n- **[2023.08.28]** We release quantized LLM with [OmniQuant](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FOmniQuant), which is an efficient, accurate, and omnibearing (even extremely low bit) quantization algorithm. Multimodal version is coming soon.🔥🔥🔥\n- **[2023.07.24]** We release **[LLaMA2-Accessory](https:\u002F\u002Fgithub.com\u002FAlpha-VLLM\u002FLLaMA2-Accessory)**, an open-source toolkit for **pre-training**, **fine-tuning** and **deployment** of **Large Language Models (LLMs)** and **multimodal LLMs**. Please check [Alpha-VLLM\u002FLLaMA2-Accessory](https:\u002F\u002Fgithub.com\u002FAlpha-VLLM\u002FLLaMA2-Accessory) for more details!🔥🔥🔥\n- **[2023.07.05]** We release the pretrain\u002Ffinetune code of [llama_adapter_v2_multimodal7b](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Ftree\u002Fmain\u002Fllama_adapter_v2_multimodal7b).\n- **[2023.07.04]** We release the code for reproducing [Gorilla](https:\u002F\u002Fgithub.com\u002FShishirPatil\u002Fgorilla) by both full finetune and LLaMA-Adapter, please see [gorilla\u002FREADME.md](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Fblob\u002Fmain\u002Fgorilla\u002FREADME.md).\n- **[2023.06.08]** We release the [demo](http:\u002F\u002Fimagebind-llm.opengvlab.com\u002F) of ImageBind-LLM.\n- **[2023.06.06]** We release [Point-Bind](https:\u002F\u002Fgithub.com\u002FZrrSkywalker\u002FPoint-Bind) to extend ImageBind with 3D point clouds, which achieves 3D instruction-following capacity for [imagebind_LLM](imagebind_LLM).\n- **[2023.06.05]** We support the integration of LLaMA-Adapter (both V1 and V2) and [LangChain](https:\u002F\u002Fpython.langchain.com\u002Fen\u002Flatest\u002Findex.html). 
Check out the [Notebook](\u002Fdocs\u002Flangchain_LLaMA_AdapterV2_demo.ipynb).\n- **[2023.05.29]** We release the code of ImageBind-LLM at [imagebind_LLM](imagebind_LLM).\n- **[2023.05.23]** We release the [demos](http:\u002F\u002Fllama-adapter.opengvlab.com\u002F) and [multi-modal code](llama_adapter_v2_multimodal7b) of LLaMA-Adapter V2!\n- **[2023.05.05]** We release the paper and code of our new work [Personalize Segment Anything](https:\u002F\u002Fgithub.com\u002FZrrSkywalker\u002FPersonalize-SAM), which efficiently fine-tunes Segment Anything with **10 seconds**, and improves DreamBooth for better **text-to-image generation**. \n- **[2023.04.30]** We noticed that GPT-4 evaluation has a strong positional bias in favor of the first response. We will soon update the paper to reveal the position bias. Great thanks to [Canwen Xu](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=oopKCDMAAAAJ&hl=en).\n- **[2023.04.28]** We release **LLaMA-Adapter V2**, a multi-modal instruction model. Check out our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.15010), [demos](#demos) and [code](llama_adapter_v2_chat65b)!\n- **[2023.03.28]**  The [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.16199.pdf) and [training code](alpaca_finetuning_v1) for **LLaMA-Adapter V1** are released. 📌\n\n\n## Released Models \n\n| Name                                                         | Approach                                               | Data                                                         | Modality                           | Visual         | Text                  |\n| ------------------------------------------------------------ | ------------------------------------------------------ | ------------------------------------------------------------ | ---------------------------------- | -------------- | --------------------- |\n| [LLaMA-Adapter V1](.\u002Falpaca_finetuning_v1)                   | prefix, gate                                           | Alpaca                                                       | Text                               | ×              | LLaMA-7B              |\n| [LLaMA-Adapter V2 dialog](.\u002Fllama_adapter_v2_chat65b)        | scale, bias, norm                                      | ShareGPT                                                     | Text                               | ×              | LLaMA-65B             |\n| [LLaMA-Adapter V2 multimodal](.\u002Fllama_adapter_v2_multimodal7b) | [P] prefix, projection, gate \u003Cbr \u002F>[F] bias, norm      | [P] Image-Text-V1\u003Cbr \u002F>[F] GPT4LLM, LLaVA                    | Image&Text                         | CLIP-ViT-L\u002F14  | LLaMA-7B              |\n| [LLaMA-Adapter V2.1 multimodal](.\u002Fllama_adapter_v2_multimodal7b) | [P] prefix, projection, gate \u003Cbr \u002F>[F] bias, norm, lora      | [P] Image-Text-V1\u003Cbr \u002F>[F] GPT4LLM, LLaVA, VQAv2                    | Image&Text                         | CLIP-ViT-L\u002F14  | LLaMA-7B              |\n| [ImageBind-LLM](.\u002Fimagebind_LLM)                             | [P] prefix, projection, gate\u003Cbr \u002F>[F] bias, norm, lora | [P] Image-Text-V1\u003Cbr \u002F>[F] [Instruction Following](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Fblob\u002Fmain\u002Fimagebind_LLM\u002Fdocs\u002Ftrain.md#data-1) | ImageBind Modalities + Point Cloud | imagebind_huge | Open-Chinese-LLaMA-7B |\n| ImageBind-dialog                                             | [P] prefix, projection, gate\u003Cbr \u002F>[F] bias, norm, 
lora | [P] Image-Text-V1\u003Cbr \u002F>[F] LLaVA, ShareGPT                   | ImageBind Modalities + Point Cloud | imagebind_huge | Open-Chinese-LLaMA-7B |\n\n+ [P] means **P**re-train and [F] means **F**ine-tune\n+ **Image-Text-V1** is a concatenation of LAION400M, COYO, MMC4, SBU, Conceptual Captions, and COCO\n+ **ImageBind Modalities** include image, video, text, audio, depth, thermal, IMU\n+ **ImageBind-dialog** will be released soon\n\n\n## Overview\nEfficiency Comparison:\n|  Model | Parameters | Storage Space | Training Time |\n| :-----: | :-----: |:-----:| :-----: |\n|  [Alpaca](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca) | 7B |13G| 3 Hours|\n|  LLaMA-Adapter | 1.2M |4.7M| 1 Hour|\n\nBy inserting adapters into LLaMA's transformer, our method only introduces **1.2M** learnable parameters, and turns a LLaMA into an instruction-following model within **1 hour**. For stabilizing training at early stages, we propose a novel **Zero-init Attention** with zero gating mechanism to adaptively incorporate the instructional signals. After fine-tuning, LLaMA-Adapter can generate high-quality instruction-following sentences, comparable to the fully fine-tuned [Stanford Alpaca](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca) and [Alpaca-Lora](https:\u002F\u002Fgithub.com\u002Ftloen\u002Falpaca-lora).\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpenGVLab_LLaMA-Adapter_readme_542afa855171.png\"\u002F width=\"90%\">\n\u003C\u002Fdiv>\n\nOur approach can be simply extended to **Multi-modal Input Instructions**. The reasoning framework of image-conditioned LLaMA-Adapter for [ScienceQA](https:\u002F\u002Fscienceqa.github.io\u002F) is as follows, which is also shared by other modalities, such as audio and video.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpenGVLab_LLaMA-Adapter_readme_c6570d563fdc.png\"\u002F width=\"90%\">\n\u003C\u002Fdiv>\n\n\n\n\n## Setup\n\nHere is a from-scratch script for **LLaMA-Adapter V1**.\n```bash\nconda create -n llama_adapter -y python=3.8\nconda activate llama_adapter\n\n# install pytorch\nconda install pytorch cudatoolkit -c pytorch -y\n\n# install dependencies and llama-adapter\npip install -r requirements.txt\npip install -e .\n```\n**Note**: **To set up other models**, please refer to [llama_adapter_v2_chat65b](llama_adapter_v2_chat65b), [llama_adapter_v2_multimodal7b](llama_adapter_v2_multimodal7b) and [imagebind_LLM](imagebind_LLM) for more details.\n\n## Inference\n\nPlease request access to the pre-trained LLaMA from [this form](https:\u002F\u002Fforms.gle\u002Fjk851eBVbX1m5TAv5) (official) or download the LLaMA-7B from [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fnyanko7\u002FLLaMA-7B\u002Ftree\u002Fmain) (unofficial). Then, obtain the weights of our LLaMA-Adapter from [here](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Freleases\u002Fdownload\u002Fv.1.0.0\u002Fllama_adapter_len10_layer30_release.pth). 
We denote the path to the downloaded weights of LLaMA and adapters as `TARGET_FOLDER` and `ADAPTER_PATH`.\n\nHere is an example to generate instruction-following sentences with the 7B LLaMA model and our LLaMA-Adapter:\n```bash\ntorchrun --nproc_per_node 1 example.py \\\n         --ckpt_dir $TARGET_FOLDER\u002Fmodel_size\\\n         --tokenizer_path $TARGET_FOLDER\u002Ftokenizer.model \\\n         --adapter_path $ADAPTER_PATH\n```\n\n## Training\n\nWe release the simple fine-tuning code of LLaMA-Adapter on the LLaMA-7B model [here](alpaca_finetuning_v1), which is for effortless reproduction with minimal dependencies. We will soon release the fine-tuning code for LLaMA-65B and multi-modal LLaMA-Adapter.\n\nPlease download the 52K instruction-following training [data](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca\u002Fblob\u002Fmain\u002Falpaca_data.json) from Stanford Alpaca, and put it under `DATA_PATH`. Then run:\n\n```bash\ncd alpaca_finetuning_v1\n\ntorchrun --nproc_per_node 8 finetuning.py \\\n         --model Llama7B_adapter \\\n         --llama_model_path $TARGET_FOLDER\u002F \\\n         --data_path $DATA_PATH\u002Falpaca_data.json \\\n         --adapter_layer 30 \\\n         --adapter_len 10 \\\n         --max_seq_len 512 \\\n         --batch_size 4 \\\n         --epochs 5 \\\n         --warmup_epochs 2 \\\n         --blr 9e-3 \\\n         --weight_decay 0.02 \\\n         --output_dir .\u002Fcheckpoint\u002F\n```\n\n## Comparison with Other Methods\n\n### Instruction 1:\nHow can I develop my critical thinking skills? Give me three suggestions.\n\n* **LLaMA-Adapter V2:**\n1. Read widely: Reading books, articles, and other sources of information can help you develop your critical thinking skills. By exposing yourself to different perspectives and ideas, you can learn to analyze and evaluate information more effectively.\n\n2. Practice mindfulness: Mindfulness is the practice of being present in the moment and paying attention to your thoughts and feelings. By being more aware of your thoughts and emotions, you can better understand and evaluate them. This can help you develop a more objective and rational approach to thinking.\n\n3. Engage in open-minded discussions: Join a discussion group or debate club where you can engage in open-minded discussions with others. This can help you develop your critical thinking skills by challenging your assumptions and encouraging you to consider different perspectives.\n\n* **LLaMA-Adapter V1:**\n1. Read widely and critically. \n2. Engage in discussions with people who have different perspectives. \n3. Ask questions and seek out answers.\n\n### Instruction 2:\nAs a superhero, how would you explain your origin story to a curious child?\n\n* **LLaMA-Adapter V2:**\n\nAs a superhero, I have a very interesting and exciting origin story to share with you. Once upon a time, there was a young boy named John who lived in a small town. John was always curious and wanted to help others, so he spent his days exploring the world around him and learning about different cultures and people. One day, while he was out exploring, he stumbled upon a mysterious cave. Inside the cave, John found a strange device that had the power to give him incredible abilities. With these abilities, John could fly, shoot lasers from his eyes, and even turn invisible.\n\nJohn knew that he had to use these powers for good, so he decided to become a superhero and use his abilities to protect his town and the people who lived there. 
He started to train every day, learning how to use his powers and how to fight against evil. Eventually, John became the superhero known as \"The Invisible Man\" and he used his powers to help others and make the world a better place.\n\nThat's my origin story, and I'm sure you'll love it!\n\n* **LLaMA-Adapter V1:**\n\nOnce upon a time, there was a young boy who had a special power. He used his power to help people in need and protect the innocent. He became known as the superhero, and he used his powers to fight evil and protect the world.\n\n### Instruction 3: \nTell me about alpacas.\n\n* **LLaMA-Adapter**: Alpacas are members of the camelid family and are native to the Andes Mountains of South America. They are typically found in herds of 10-20 animals and are known for their gentle nature and their ability to thrive in harsh climates. Alpacas are typically white or grey in color and have long, silky coats. They are also known for their ability to produce high-quality wool, which is used in the production of clothing and other products.\n\n\u003Cdetails>\u003Csummary>OTHER METHODS\u003C\u002Fsummary>\n\n* **Alpaca-LoRA**: Alpacas are members of the camelid family and are native to the Andes Mountains of South America. They are known for their soft, luxurious fleece, which is used to make clothing, blankets, and other items. Alpacas are herbivores and graze on grasses and other plants. They are social animals and live in herds of up to 20 individuals.\n\n* **Stanford Alpaca**: Alpacas are small, fluffy animals related to camels and llamas. They are native to Peru and Bolivia, and were first domesticated around 5,000 years ago. They are kept mainly for their fine, soft fleece, which is used to make knitwear and other garments. Alpacas are herd animals and live in small family groups, led by an older male. They are highly social animals and can form strong bonds with their owners.\n\n* **text-davinci-003**: Alpacas are a domesticated species of South American camelid, similar to llamas. They are native to the Andes Mountains in South America and are kept as livestock for their soft, luxurious wool. Alpacas are social animals and live in herds of up to 20 individuals. They are typically kept in small herds of two to five animals, and are relatively easy to care for. Alpacas are herbivores and feed on grass, hay, and other vegetation. They are also known for their gentle and friendly nature, making them popular as pets.\n\n\u003C\u002Fdetails>\n\n---\n\n### Instruction 4: \nTell me about the president of Mexico in 2019.\n\n* **LLaMA-Adapter**: The president of Mexico in 2019 was Andrés Manuel López Obrador, who took office on December 1st, 2018. He is a member of the left-wing party Morena and is known for his populist policies. He has focused on tackling corruption, reducing poverty, and improving the country's infrastructure. He has also been a vocal critic of the United States' immigration policies.\n\n\u003Cdetails>\u003Csummary>OTHER METHODS\u003C\u002Fsummary>\n\n* **Alpaca-LoRA**: The president of Mexico in 2019 was Andrés Manuel López Obrador, who took office on December 1, 2018. He is a member of the National Regeneration Movement (MORENA) political party and is the first left-wing president of Mexico since 1946. He is known for his anti-corruption and anti-neoliberal policies, as well as his commitment to improving the living conditions of the Mexican people.\n\n* **Stanford Alpaca**: In 2019, Andres Manuel Lopez Obrador was sworn in as President of Mexico. 
He is the country’s 58th president and the first from the left-wing Morena party. He was elected in 2018 on a promise to combat corruption and reduce poverty and has faced criticism for his handling of the economy, crime and the country’s relationship with the United States. He is also known for his ambitious social welfare programs and has been praised for raising the minimum wage and providing aid to low-income families. He is seen as a champion of democracy and human rights in Mexico.\n\n* **text-davinci-003**: The president of Mexico in 2019 is Andrés Manuel López Obrador, commonly known as AMLO. He is a Mexican politician who has been the President of Mexico since December 1, 2018. He is a member of the National Regeneration Movement (MORENA) party and previously served as the Head of Government of Mexico City from 2000 to 2005. He is a left-wing populist and has promised to reduce corruption and inequality in Mexico. He has also promised to increase public spending on social programs and infrastructure projects.\n\n\u003C\u002Fdetails>\n\n---\n\n### Instruction 5: \nWrite a Python program that prints the first 10 Fibonacci numbers.\n\n* **LLaMA-Adapter**:\n```python\ndef fibonacci():\n    a = 0\n    b = 1\n    for i in range(10):\n        a, b = b, a + b\n        print(a, b)\n    print(\"Fibonacci numbers: \", a, b)\n\nfibonacci()\n```\n\n\u003Cdetails>\u003Csummary>OTHER METHODS\u003C\u002Fsummary>\n\n* **Alpaca-LoRA**:\n\n```python\ndef fibonacci(n):\n    if n == 0:\n        return 0\n    elif n == 1:\n        return 1\n    else:\n        return fibonacci(n-1) + fibonacci(n-2)\n\nprint(fibonacci(10))\n```\n\n* **Stanford Alpaca**:\n\n```python\nnumbers = [0, 1]\n\nfor i in range(2, 11):\n    numbers.append(numbers[i-2] + numbers[i-1])\n\nprint(numbers)\n# Output: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]\n```\n\n* **text-davinci-003**:\n\n```python\n# Fibonacci numbers are the numbers in the following integer sequence, called the Fibonacci sequence, and characterized by the fact that every number after the first two is the sum of the two preceding ones.\n\n# Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...\n\n# Program:\n\n# Initialize the first two numbers of the Fibonacci sequence\na = 0\nb = 1\n\n# Print the first 10 numbers of the Fibonacci sequence\nprint(a)\nprint(b)\n\nfor i in range(2,10):\n    c = a + b\n    a = b\n    b = c\n    print(c)\n```\n\n\u003C\u002Fdetails>\n\n---\n\n### Instruction 6: \nWrite a program that prints the numbers from 1 to 100. But for multiples of three print 'Fizz' instead of the number and for the multiples of five print 'Buzz'. 
For numbers which are multiples of both three and five print 'FizzBuzz'.\n\n* **LLaMA-Adapter**:\n\n```python\ndef print_numbers(n):\n    for i in range(1, 101):\n        if i % 3 == 0 and i % 5 == 0:\n            print(\"FizzBuzz\")\n        elif i % 3 == 0:\n            print(\"Fizz\")\n        elif i % 5 == 0:\n            print(\"Buzz\")\n        else:\n            print(i)\n```\n\n\u003Cdetails>\u003Csummary>OTHER METHODS\u003C\u002Fsummary>\n\n* **Alpaca-LoRA**:\n\n```python\nfor i in range(1, 101):\n    if i % 3 == 0 and i % 5 == 0:\n        print(\"FizzBuzz\")\n    elif i % 3 == 0:\n        print(\"Fizz\")\n    elif i % 5 == 0:\n        print(\"Buzz\")\n    else:\n        print(i)\n```\n\n* **Stanford Alpaca**:\n\n```python\nfor i in range(1, 101):\n    if i % 3 == 0 and i % 5 == 0:\n        print('FizzBuzz')\n    elif i % 3 == 0:\n        print('Fizz')\n    elif i % 5 == 0:\n        print('Buzz')\n    else:\n        print(i)\n```\n\n* **text-davinci-003**:\n\n```javascript\nfor (let i = 1; i \u003C= 100; i++) {\n  if (i % 3 === 0 && i % 5 === 0) {\n    console.log(\"FizzBuzz\");\n  } else if (i % 3 === 0) {\n    console.log(\"Fizz\");\n  } else if (i % 5 === 0) {\n    console.log(\"Buzz\");\n  } else {\n    console.log(i);\n  }\n}\n```\n\n\u003C\u002Fdetails>\n\n\n\n## Contributors\n[Peng Gao](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=_go6DPsAAAAJ&hl=zh-CN), [Jiaming Han](https:\u002F\u002Fcsuhan.com), [Renrui Zhang](https:\u002F\u002Fgithub.com\u002FZrrSkywalker), [Aojun Zhou](https:\u002F\u002Fgithub.com\u002Faojunzz)\n\n## Hiring Announcement\n\n🔥 **We are hiring** interns, postdocs, and full-time researchers at the **General Vision Group, Shanghai AI Lab**, with a focus on multi-modality and vision foundation models. If you are interested, please contact [gaopengcuhk@gmail.com](mailto:gaopengcuhk@gmail.com).\n\n## Citation\nIf you find our LLaMA-Adapter code and paper useful, please kindly cite:\n```bash\n@article{zhang2023llamaadapter,\n  title = {LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention},\n  author={Zhang, Renrui and Han, Jiaming and Liu, Chris and Gao, Peng and Zhou, Aojun and Hu, Xiangfei and Yan, Shilin and Lu, Pan and Li, Hongsheng and Qiao, Yu},\n  journal={arXiv preprint arXiv:2303.16199},\n  year={2023}\n}\n```\n\nIf you find our LLaMA-Adapter V2 code and paper useful, please kindly cite:\n```bash\n@article{gao2023llamaadapterv2,\n  title = {LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model},\n  author={Gao, Peng and Han, Jiaming and Zhang, Renrui and Lin, Ziyi and Geng, Shijie and Zhou, Aojun and Zhang, Wei and Lu, Pan and He, Conghui and Yue, Xiangyu and Li, Hongsheng and Qiao, Yu},\n  journal={arXiv preprint arXiv:2304.15010},\n  year={2023}\n}\n```\n\n## Acknowledgement\nThis repo benefits from [LLaMA](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fllama), [Stanford Alpaca](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca), and [Alpaca-Lora](https:\u002F\u002Fgithub.com\u002Ftloen\u002Falpaca-lora). 
Thanks for their wonderful works.\n","# LLaMA-Adapter：高效微调LLaMA 🚀\n\n## 公告：我们发布了 **[LLaMA2-Accessory](https:\u002F\u002Fgithub.com\u002FAlpha-VLLM\u002FLLaMA2-Accessory)**，一个用于 **预训练**、**微调** 和 **部署** **LLM** 及 **多模态 LLM** 的开源工具包。🔥\n\n官方实现了论文 ['LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention'](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.16199.pdf) 和 ['LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model'](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.15010.pdf)。\n\n\u003Cp align=\"center\">                                                                                                                                          \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpenGVLab_LLaMA-Adapter_readme_764bc6b39e7e.png\"\u002F width=\"100%\"> \u003Cbr>\n\u003C\u002Fp>\n\n\n本仓库提出了 **LLaMA-Adapter (V2)**，一种轻量级的适配方法，用于微调 **指令跟随** 和 **多模态** [LLaMA](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fllama) 模型 🔥。\n\n快来体验 LLaMA-Adapter 的网页演示 🤗：[![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fcsuhan\u002FLLaMA-Adapter)、[LLaMA-Adapter V2](http:\u002F\u002Fllama-adapter.opengvlab.com\u002F) 和 [ImageBind-LLM](http:\u002F\u002Fimagebind-llm.opengvlab.com\u002F)。\n\n欢迎加入我们的 [Wechat](https:\u002F\u002Fgithub.com\u002FAlpha-VLLM\u002FLLaMA2-Accessory\u002Fblob\u002Fmain\u002Fdocs\u002Fwechat.md) 社区！\n\n## 最新消息\n- **[2024.01.18]** [LLaMA-Adapter](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.16199.pdf) 被 **ICLR 2024** 接收！🎉\n- **[2023.12.08]** 我们发布了 [OneLLM](https:\u002F\u002Fgithub.com\u002Fcsuhan\u002FOneLLM)，该模型使用统一框架将 **八种** 模态对齐到语言。🔥🔥🔥\n- **[2023.11.11]** 我们发布了 [SPHINX](https:\u002F\u002Fgithub.com\u002FAlpha-VLLM\u002FLLaMA2-Accessory\u002Ftree\u002Fmain\u002FSPHINX)，一款全新的多模态 LLM，相比 LLaMa-Adapter-V2 实现了巨大飞跃。🔥🔥🔥\n- **[2023.10.11]** 我们发布了 **LLaMA-Adapter V2.1**，这是 LLaMA-Adapter V2 的改进版本，具有更强的多模态推理能力。详情请参见 [llama_adapter_v2_multimodal7b](llama_adapter_v2_multimodal7b)。\n- **[2023.08.28]** 我们通过 [OmniQuant](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FOmniQuant) 发布了量化后的 LLM，这是一种高效、精准且全面覆盖（甚至极低比特）的量化算法。多模态版本即将推出。🔥🔥🔥\n- **[2023.07.24]** 我们发布了 **[LLaMA2-Accessory](https:\u002F\u002Fgithub.com\u002FAlpha-VLLM\u002FLLaMA2-Accessory)**，一个用于 **预训练**、**微调** 和 **部署** **大型语言模型 (LLM)** 以及 **多模态 LLM** 的开源工具包。更多详情请访问 [Alpha-VLLM\u002FLLaMA2-Accessory](https:\u002F\u002Fgithub.com\u002FAlpha-VLLM\u002FLLaMA2-Accessory)!🔥🔥🔥\n- **[2023.07.05]** 我们发布了 [llama_adapter_v2_multimodal7b](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Ftree\u002Fmain\u002Fllama_adapter_v2_multimodal7b) 的预训练\u002F微调代码。\n- **[2023.07.04]** 我们发布了通过全量微调和 LLaMA-Adapter 复现 [Gorilla](https:\u002F\u002Fgithub.com\u002FShishirPatil\u002Fgorilla) 的代码，请参阅 [gorilla\u002FREADME.md](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Fblob\u002Fmain\u002Fgorilla\u002FREADME.md)。\n- **[2023.06.08]** 我们发布了 ImageBind-LLM 的演示网站：[http:\u002F\u002Fimagebind-llm.opengvlab.com\u002F]。\n- **[2023.06.06]** 我们发布了 [Point-Bind](https:\u002F\u002Fgithub.com\u002FZrrSkywalker\u002FPoint-Bind)，它将 ImageBind 扩展至 3D 点云，从而为 [imagebind_LLM](imagebind_LLM) 提供 3D 指令跟随能力。\n- **[2023.06.05]** 我们支持将 LLaMA-Adapter（包括 V1 和 V2）与 [LangChain](https:\u002F\u002Fpython.langchain.com\u002Fen\u002Flatest\u002Findex.html) 集成。请查看 [Notebook](\u002Fdocs\u002Flangchain_LLaMA_AdapterV2_demo.ipynb)。\n- **[2023.05.29]** 我们在 [imagebind_LLM](imagebind_LLM) 中发布了 ImageBind-LLM 
的代码。\n- **[2023.05.23]** 我们发布了 LLaMA-Adapter V2 的 [演示](http:\u002F\u002Fllama-adapter.opengvlab.com\u002F) 和 [多模态代码](llama_adapter_v2_multimodal7b)!\n- **[2023.05.05]** 我们发布了新工作 [Personalize Segment Anything](https:\u002F\u002Fgithub.com\u002FZrrSkywalker\u002FPersonalize-SAM) 的论文和代码，该方法可在 **10 秒内** 高效微调 Segment Anything，并改进 DreamBooth 以实现更好的 **文本到图像生成**。\n- **[2023.04.30]** 我们注意到 GPT-4 的评估存在强烈的首答偏好偏差。我们将很快更新论文以揭示这一位置偏差。特别感谢 [Canwen Xu](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=oopKCDMAAAAJ&hl=en)。\n- **[2023.04.28]** 我们发布了 **LLaMA-Adapter V2**，这是一款多模态指令模型。请参阅我们的 [论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.15010)、[演示](#demos) 和 [代码](llama_adapter_v2_chat65b)!\n- **[2023.03.28]**  **LLaMA-Adapter V1** 的 [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.16199.pdf) 和 [训练代码](alpaca_finetuning_v1) 已发布。📌\n\n## 已发布模型 \n\n| 名称                                                         | 方法                                               | 数据                                                         | 模态                           | 视觉         | 文本                  |\n| ------------------------------------------------------------ | ------------------------------------------------------ | ------------------------------------------------------------ | ---------------------------------- | -------------- | --------------------- |\n| [LLaMA-Adapter V1](.\u002Falpaca_finetuning_v1)                   | 前缀、门控                                           | Alpaca                                                       | 文本                               | ×              | LLaMA-7B              |\n| [LLaMA-Adapter V2 对话](.\u002Fllama_adapter_v2_chat65b)        | 缩放、偏置、归一化                                      | ShareGPT                                                     | 文本                               | ×              | LLaMA-65B             |\n| [LLaMA-Adapter V2 多模态](.\u002Fllama_adapter_v2_multimodal7b) | [P] 前缀、投影、门控 \u003Cbr \u002F>[F] 偏置、归一化      | [P] 图像-文本-V1\u003Cbr \u002F>[F] GPT4LLM、LLaVA                    | 图像&文本                         | CLIP-ViT-L\u002F14  | LLaMA-7B              |\n| [LLaMA-Adapter V2.1 多模态](.\u002Fllama_adapter_v2_multimodal7b) | [P] 前缀、投影、门控 \u003Cbr \u002F>[F] 偏置、归一化、LoRA      | [P] 图像-文本-V1\u003Cbr \u002F>[F] GPT4LLM、LLaVA、VQAv2                    | 图像&文本                         | CLIP-ViT-L\u002F14  | LLaMA-7B              |\n| [ImageBind-LLM](.\u002Fimagebind_LLM)                             | [P] 前缀、投影、门控\u003Cbr \u002F>[F] 偏置、归一化、LoRA | [P] 图像-文本-V1\u003Cbr \u002F>[F] [指令遵循](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Fblob\u002Fmain\u002Fimagebind_LLM\u002Fdocs\u002Ftrain.md#data-1) | ImageBind 模态 + 点云 | imagebind_huge | Open-Chinese-LLaMA-7B |\n| ImageBind-对话                                             | [P] 前缀、投影、门控\u003Cbr \u002F>[F] 偏置、归一化、LoRA | [P] 图像-文本-V1\u003Cbr \u002F>[F] LLaVA、ShareGPT                   | ImageBind 模态 + 点云 | imagebind_huge | Open-Chinese-LLaMA-7B |\n\n+ [P] 表示 **P**re-train（预训练），[F] 表示 **F**ine-tune（微调）\n+ **图像-文本-V1** 是 LAION400M、COYO、MMC4、SBU、Conceptual Captions 和 COCO 的拼接\n+ **ImageBind 模态** 包括图像、视频、文本、音频、深度、热成像、IMU\n+ **ImageBind-对话** 将于近期发布\n\n\n## 概述\n效率对比：\n|  模型 | 参数量 | 存储空间 | 训练时间  \n| :-----: | :-----: |:-----:| :-----: |\n|  [Alpaca](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca) | 7B |13G| 3小时|\n|  LLaMA-Adapter | 1.2M |4.7M| 1小时|\n\n通过将适配器插入 LLaMA 的 Transformer 中，我们的方法仅引入了 **1.2M** 可学习参数，并在 **1 小时** 内将 LLaMA 转变为一个指令跟随模型。为了在训练初期稳定模型，我们提出了一种新颖的 **零初始化注意力** 
方法，结合零门控机制，以自适应地融入指令信号。微调完成后，LLaMA-Adapter 能够生成高质量的指令跟随句子，其效果可与完全微调的 [Stanford Alpaca](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca) 和 [Alpaca-Lora](https:\u002F\u002Fgithub.com\u002Ftloen\u002Falpaca-lora) 相媲美。\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpenGVLab_LLaMA-Adapter_readme_542afa855171.png\"\u002F width=\"90%\">\n\u003C\u002Fdiv>\n\n我们的方法可以轻松扩展到 **多模态输入指令**。例如，针对 [ScienceQA](https:\u002F\u002Fscienceqa.github.io\u002F) 的图像条件 LLaMA-Adapter 推理框架如下，这一框架同样适用于其他模态，如音频和视频。\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpenGVLab_LLaMA-Adapter_readme_c6570d563fdc.png\"\u002F width=\"90%\">\n\u003C\u002Fdiv>\n\n\n\n\n## 设置\n\n以下是用于 **LLaMA-Adapter V1** 的从头开始脚本。\n```bash\nconda create -n llama_adapter -y python=3.8\nconda activate llama_adapter\n\n# 安装 PyTorch\nconda install pytorch cudatoolkit -c pytorch -y\n\n# 安装依赖和 LLaMA-Adapter\npip install -r requirements.txt\npip install -e .\n```\n**注意**: **要设置其他模型**，请参考 [llama_adapter_v2_chat65b](llama_adapter_v2_chat65b)、[llama_adapter_v2_multimodal7b](llama_adapter_v2_multimodal7b) 和 [imagebind_LLM](imagebind_LLM)，以获取更多详细信息。\n\n## 推理\n\n请通过 [此表单](https:\u002F\u002Fforms.gle\u002Fjk851eBVbX1m5TAv5)（官方）申请访问预训练的 LLaMA，或从 [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fnyanko7\u002FLLaMA-7B\u002Ftree\u002Fmain)（非官方）下载 LLaMA-7B。然后，从 [这里](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Freleases\u002Fdownload\u002Fv.1.0.0\u002Fllama_adapter_len10_layer30_release.pth) 获取我们 LLaMA-Adapter 的权重。我们将下载的 LLaMA 和适配器权重路径分别记为 `TARGET_FOLDER` 和 `ADAPTER_PATH`。\n\n以下是一个使用 7B LLaMA 模型和我们的 LLaMA-Adapter 生成指令跟随句子的示例：\n```bash\ntorchrun --nproc_per_node 1 example.py \\\n         --ckpt_dir $TARGET_FOLDER\u002Fmodel_size\\\n         --tokenizer_path $TARGET_FOLDER\u002Ftokenizer.model \\\n         --adapter_path $ADAPTER_PATH\n```\n\n## 训练\n\n我们在 [这里](alpaca_finetuning_v1) 发布了 LLaMA-Adapter 在 LLaMA-7B 模型上的简单微调代码，便于用最少的依赖项轻松复现。我们很快还将发布 LLaMA-65B 和多模态 LLaMA-Adapter 的微调代码。\n\n请从 Stanford Alpaca 下载包含 52,000 条指令跟随训练数据的 [数据集](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca\u002Fblob\u002Fmain\u002Falpaca_data.json)，并将其放置在 `DATA_PATH` 目录下。然后运行：\n\n```bash\ncd alpaca_finetuning_v1\n\ntorchrun --nproc_per_node 8 finetuning.py \\\n         --model Llama7B_adapter \\\n         --llama_model_path $TARGET_FOLDER\u002F \\\n         --data_path $DATA_PATH\u002Falpaca_data.json \\\n         --adapter_layer 30 \\\n         --adapter_len 10 \\\n         --max_seq_len 512 \\\n         --batch_size 4 \\\n         --epochs 5 \\\n         --warmup_epochs 2 \\\n         --blr 9e-3 \\\n         --weight_decay 0.02 \\\n         --output_dir .\u002Fcheckpoint\u002F\n```\n\n## 与其他方法的比较\n\n### 指令 1：\n我该如何培养自己的批判性思维能力？请给出三条建议。\n\n* **LLaMA-Adapter V2：**\n1. 广泛阅读：阅读书籍、文章及其他信息来源，有助于提升你的批判性思维能力。通过接触不同的观点和想法，你可以更有效地分析和评估信息。\n\n2. 练习正念：正念是指活在当下，关注自己的思想和感受。通过更加觉察自己的思想和情绪，你可以更好地理解并评估它们，从而形成更为客观和理性的思维方式。\n\n3. 参与开放式的讨论：加入讨论小组或辩论俱乐部，与其他成员进行开放式的交流。这可以帮助你挑战自身的假设，鼓励你从不同角度思考问题，进而提升批判性思维能力。\n\n* **LLaMA-Adapter V1：**\n1. 广泛且批判性地阅读。\n2. 与持有不同观点的人展开讨论。\n3. 
提出问题并寻求答案。\n\n### 指令 2：\n作为一名超级英雄，你会如何向一个好奇的孩子解释你的起源故事呢？\n\n* **LLaMA-Adapter V2：**\n\n作为一名超级英雄，我有一个非常有趣又激动人心的起源故事要跟你分享。从前，有个叫约翰的小男孩，他住在一座小镇上。约翰总是充满好奇心，也渴望帮助他人，因此他每天都会探索周围的世界，了解不同的文化和人群。有一天，他在外出探险时偶然发现了一处神秘的山洞。在山洞里，约翰找到了一件神奇的装置，它赋予了他超凡的能力。有了这些能力，约翰可以飞翔、用眼睛发射激光，甚至让自己隐形。\n\n约翰明白自己必须善用这些力量，于是他决定成为一名超级英雄，用自己的能力保护小镇和镇上的居民。从此，他每天刻苦训练，学习如何运用自己的超能力，以及如何对抗邪恶势力。最终，约翰成为了人们口中的“隐形侠”，他利用自己的超能力帮助他人，让世界变得更加美好。\n\n这就是我的起源故事，我相信你一定会喜欢的！\n\n* **LLaMA-Adapter V1：**\n\n从前，有一位拥有特殊能力的少年。他用这种能力去帮助有需要的人，保护无辜者。后来，他被大家称为超级英雄，用他的超能力与邪恶作斗争，守护着这个世界。\n\n### 指令 3：\n请介绍一下羊驼。\n\n* **LLaMA-Adapter**：羊驼属于骆驼科动物，原产于南美洲的安第斯山脉。它们通常以10至20只的群体生活，性格温顺，适应性强，能在恶劣的气候条件下生存。羊驼的毛色多为白色或灰色，拥有一身长而柔滑的皮毛。此外，羊驼还以其高品质的羊毛闻名，这种羊毛常被用于制作服装和其他产品。\n\n\u003Cdetails>\u003Csummary>其他方法\u003C\u002Fsummary>\n\n* **Alpaca-LoRA**：羊驼是骆驼科的一员，原产于南美洲的安第斯山脉。它们以柔软奢华的羊驼毛著称，这种毛被用来制作衣服、毯子等物品。羊驼是草食动物，以草和其他植物为食。它们群居，通常以不超过20只的群体生活。\n\n* **斯坦福羊驼**：羊驼是一种小型、毛茸茸的动物，与骆驼和美洲驼同属一类。它们原产于秘鲁和玻利维亚，大约在5000年前就被驯化。人们饲养羊驼主要是为了获取其细软的羊驼毛，用于制作针织品和其他服饰。羊驼是群居动物，通常以小家庭为单位生活，由一只年长的雄性领导。它们非常善于社交，能与主人建立深厚的感情。\n\n* **text-davinci-003**：羊驼是南美洲的一种家养骆驼科动物，与美洲驼相似。它们原产于南美洲的安第斯山脉，作为牲畜饲养，主要取其柔软奢华的羊驼毛。羊驼是群居动物，通常以不超过20只的群体生活。它们一般以2到5只为一群，饲养起来相对容易。羊驼是草食动物，以草、干草和其他植被为食。它们性格温和友善，因此也很适合作为宠物。\n\n\u003C\u002Fdetails>\n\n---\n\n### 指令 4：\n请介绍一下2019年墨西哥的总统。\n\n* **LLaMA-Adapter**：2019年墨西哥的总统是安德烈斯·曼努埃尔·洛佩斯·奥夫拉多尔，他于2018年12月1日就任。他是左翼政党莫雷纳党的一员，以民粹主义政策著称。他致力于打击腐败、减少贫困，并改善国家的基础设施建设。同时，他也一直公开批评美国的移民政策。\n\n\u003Cdetails>\u003Csummary>其他方法\u003C\u002Fsummary>\n\n* **Alpaca-LoRA**：2019年墨西哥的总统是安德烈斯·曼努埃尔·洛佩斯·奥夫拉多尔，他于2018年12月1日就任。他是国家复兴运动（MORENA）政党的成员，也是自1946年以来首位来自左翼的墨西哥总统。他以反腐败和反新自由主义政策闻名，同时也致力于改善墨西哥人民的生活状况。\n\n* **斯坦福羊驼**：2019年，安德烈斯·曼努埃尔·洛佩斯·奥夫拉多尔宣誓就任墨西哥总统。他是该国第58任总统，也是首位来自左翼政党莫雷纳的总统。他在2018年以打击腐败、减少贫困为竞选承诺当选，但随后因处理经济、犯罪问题以及与美国关系的方式而受到批评。他以雄心勃勃的社会福利计划著称，曾提高最低工资并向低收入家庭提供援助，因此备受赞誉。他被视为墨西哥民主与人权的捍卫者。\n\n* **text-davinci-003**：2019年墨西哥的总统是安德烈斯·曼努埃尔·洛佩斯·奥夫拉多尔，简称AMLO。他是一位墨西哥政治家，自2018年12月1日起担任墨西哥总统。他是国家复兴运动（MORENA）党的一员，此前曾在2000年至2005年间担任墨西哥城政府首长。他是一位左翼民粹主义者，曾承诺减少墨西哥的腐败和不平等现象。他还表示将增加对社会项目和基础设施建设的公共支出。\n\n\u003C\u002Fdetails>\n\n---\n\n### 指令 5：\n编写一个 Python 程序，打印前 10 个斐波那契数。\n\n* **LLaMA-Adapter**:\n```python\ndef fibonacci():\n    a = 0\n    b = 1\n    for i in range(10):\n        a, b = b, a + b\n        print(a, b)\n    print(\"斐波那契数: \", a, b)\n\nfibonacci()\n```\n\n\u003Cdetails>\u003Csummary>其他方法\u003C\u002Fsummary>\n\n* **Alpaca-LoRA**：\n\n```python\ndef fibonacci(n):\n    if n == 0:\n        return 0\n    elif n == 1:\n        return 1\n    else:\n        return fibonacci(n-1) + fibonacci(n-2)\n\nprint(fibonacci(10))\n```\n\n* **斯坦福 Alpaca**：\n\n```python\nnumbers = [0, 1]\n\nfor i in range(2, 11):\n    numbers.append(numbers[i-2] + numbers[i-1])\n\nprint(numbers)\n# 输出：[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]\n```\n\n* **text-davinci-003**：\n\n```python\n# 斐波那契数列是如下整数序列中的数字，称为斐波那契数列，其特点是每个数都等于前两个数之和。\n\n# 斐波那契数列：0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...\n\n# 程序：\n\n# 初始化斐波那契数列的前两个数字\na = 0\nb = 1\n\n# 打印斐波那契数列的前 10 个数字\nprint(a)\nprint(b)\n\nfor i in range(2,10):\n    c = a + b\n    a = b\n    b = c\n    print(c)\n```\n\n\u003C\u002Fdetails>\n\n---\n\n### 指令 6：\n编写一个程序，打印从 1 到 100 的数字。但对于 3 的倍数，打印“Fizz”代替该数字；对于 5 的倍数，打印“Buzz”代替该数字。对于既是 3 的倍数又是 5 的倍数的数字，打印“FizzBuzz”。\n\n* **LLaMA-Adapter**：\n\n```python\ndef print_numbers(n):\n    for i in range(1, 101):\n        if i % 3 == 0 and i % 5 == 0:\n            print(\"FizzBuzz\")\n        elif i % 3 == 0:\n            print(\"Fizz\")\n        elif i % 5 == 0:\n            print(\"Buzz\")\n        else:\n            print(i)\n```\n\n\u003Cdetails>\u003Csummary>其他方法\u003C\u002Fsummary>\n\n* 
**Alpaca-LoRA**：\n\n```python\nfor i in range(1, 101):\n    if i % 3 == 0 and i % 5 == 0:\n        print(\"FizzBuzz\")\n    elif i % 3 == 0:\n        print(\"Fizz\")\n    elif i % 5 == 0:\n        print(\"Buzz\")\n    else:\n        print(i)\n```\n\n* **斯坦福 Alpaca**：\n\n```python\nfor i in range(1, 101):\n    if i % 3 == 0 and i % 5 == 0:\n        print('FizzBuzz')\n    elif i % 3 == 0:\n        print('Fizz')\n    elif i % 5 == 0:\n        print('Buzz')\n    else:\n        print(i)\n```\n\n* **text-davinci-003**：\n\n```javascript\nfor (let i = 1; i \u003C= 100; i++) {\n  if (i % 3 === 0 && i % 5 === 0) {\n    console.log(\"FizzBuzz\");\n  } else if (i % 3 === 0) {\n    console.log(\"Fizz\");\n  } else if (i % 5 === 0) {\n    console.log(\"Buzz\");\n  } else {\n    console.log(i);\n  }\n}\n```\n\n\u003C\u002Fdetails>\n\n\n\n## 贡献者\n[高鹏](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=_go6DPsAAAAJ&hl=zh-CN), [韩嘉明](https:\u002F\u002Fcsuhan.com), [张仁瑞](https:\u002F\u002Fgithub.com\u002FZrrSkywalker), [周傲俊](https:\u002F\u002Fgithub.com\u002Faojunzz)\n\n## 招聘公告\n\n🔥 **我们正在招聘** 实习生、博士后和全职研究员，加入 **上海人工智能实验室通用视觉组**，研究方向为多模态和视觉基础模型。如有兴趣，请联系 [gaopengcuhk@gmail.com](mailto:gaopengcuhk@gmail.com)。\n\n## 引用\n如果您觉得我们的 LLaMA-Adapter 代码和论文有用，请引用以下内容：\n```bash\n@article{zhang2023llamaadapter,\n  title = {LLaMA-Adapter: 使用零初始化注意力高效微调语言模型},\n  author={Zhang, Renrui and Han, Jiaming and Liu, Chris and Gao, Peng and Zhou, Aojun and Hu, Xiangfei and Yan, Shilin and Lu, Pan and Li, Hongsheng and Qiao, Yu},\n  journal={arXiv 预印本 arXiv:2303.16199},\n  year={2023}\n}\n```\n\n如果您觉得我们的 LLaMA-Adapter V2 代码和论文有用，请引用以下内容：\n```bash\n@article{gao2023llamaadapterv2,\n  title = {LLaMA-Adapter V2：参数高效的视觉指令模型},\n  author={Gao, Peng and Han, Jiaming and Zhang, Renrui and Lin, Ziyi and Geng, Shijie and Zhou, Aojun and Zhang, Wei and Lu, Pan and He, Conghui and Yue, Xiangyu and Li, Hongsheng and Qiao, Yu},\n  journal={arXiv 预印本 arXiv:2304.15010},\n  year={2023}\n}\n```\n\n## 致谢\n本仓库受益于 [LLaMA](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fllama)、[斯坦福 Alpaca](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca) 和 [Alpaca-Lora](https:\u002F\u002Fgithub.com\u002Ftloen\u002Falpaca-lora)。感谢他们的杰出工作。","# LLaMA-Adapter 快速上手指南\n\nLLaMA-Adapter 是一种轻量级的适配方法，仅需引入极少量的可学习参数（约 1.2M），即可在 1 小时内将 LLaMA 模型微调为指令遵循模型。本指南基于官方 V1 版本提供快速启动流程。\n\n## 环境准备\n\n*   **操作系统**: Linux (推荐 Ubuntu)\n*   **Python 版本**: 3.8+\n*   **硬件要求**: NVIDIA GPU (支持 CUDA)，显存建议 16GB 以上（针对 7B 模型微调\u002F推理）\n*   **前置依赖**:\n    *   Conda (推荐用于环境管理)\n    *   PyTorch (需匹配 CUDA 版本)\n    *   LLaMA 模型权重 (需自行申请或下载)\n\n## 安装步骤\n\n### 1. 创建并激活虚拟环境\n```bash\nconda create -n llama_adapter -y python=3.8\nconda activate llama_adapter\n```\n\n### 2. 安装 PyTorch\n请根据您的 CUDA 版本选择合适的安装命令（以下为通用示例）：\n```bash\nconda install pytorch cudatoolkit -c pytorch -y\n```\n> **国内加速提示**: 如遇下载缓慢，可使用清华源：\n> `conda install pytorch cudatoolkit -c https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fcloud\u002Fpytorch -y`\n\n### 3. 安装依赖与工具包\n克隆仓库后，在项目根目录执行：\n```bash\npip install -r requirements.txt\npip install -e .\n```\n> **国内加速提示**: 建议使用 pip 国内镜像：\n> `pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n> `pip install -e . -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n## 基本使用\n\n### 前置准备：获取模型权重\n在运行前，您需要准备以下两个路径的权重文件：\n1.  
**LLaMA 原生权重 (`TARGET_FOLDER`)**:\n    *   官方申请：[Google Form](https:\u002F\u002Fforms.gle\u002Fjk851eBVbX1m5TAv5)\n    *   HuggingFace 非官方下载：[LLaMA-7B](https:\u002F\u002Fhuggingface.co\u002Fnyanko7\u002FLLaMA-7B\u002Ftree\u002Fmain)\n2.  **LLaMA-Adapter 微调权重 (`ADAPTER_PATH`)**:\n    *   下载地址：[llama_adapter_len10_layer30_release.pth](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Freleases\u002Fdownload\u002Fv.1.0.0\u002Fllama_adapter_len10_layer30_release.pth)\n\n### 推理示例\n以下命令展示如何使用 7B LLaMA 模型配合 Adapter 生成指令遵循回复。请替换 `$TARGET_FOLDER` 和 `$ADAPTER_PATH` 为您的实际本地路径。\n\n```bash\ntorchrun --nproc_per_node 1 example.py \\\n         --ckpt_dir $TARGET_FOLDER\u002Fmodel_size\\\n         --tokenizer_path $TARGET_FOLDER\u002Ftokenizer.model \\\n         --adapter_path $ADAPTER_PATH\n```\n\n### 微调示例 (可选)\n若需从头微调，请下载 [Stanford Alpaca 数据集](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca\u002Fblob\u002Fmain\u002Falpaca_data.json) 并放置于 `DATA_PATH` 目录下，然后进入 `alpaca_finetuning_v1` 目录运行：\n\n```bash\ncd alpaca_finetuning_v1\n\ntorchrun --nproc_per_node 8 finetuning.py \\\n         --model Llama7B_adapter \\\n         --llama_model_path $TARGET_FOLDER\u002F \\\n         --data_path $DATA_PATH\u002Falpaca_data.json \\\n         --adapter_layer 30 \\\n         --adapter_len 10 \\\n         --max_seq_len 512 \\\n         --batch_size 4 \\\n         --epochs 5 \\\n         --warmup_epochs 2 \\\n         --blr 9e-3 \\\n         --weight_decay 0.02 \\\n         --output_dir .\u002Fcheckpoint\u002F\n```\n\n> **注意**: 如需使用多模态版本 (V2\u002FV2.1) 或 65B 模型，请参考仓库中 `llama_adapter_v2_multimodal7b` 及 `llama_adapter_v2_chat65b` 目录下的具体文档。","某初创教育科技公司希望将通用的 LLaMA 大模型快速改造为能够解答高中物理题并遵循特定教学风格的专属助教。\n\n### 没有 LLaMA-Adapter 时\n- **训练成本高昂**：全量微调需要更新数十亿参数，必须租用昂贵的多卡 GPU 集群，单次实验成本高达数千美元。\n- **时间周期漫长**：数据准备到模型收敛通常需要数天甚至数周，无法跟上课程内容的快速迭代节奏。\n- **存储压力巨大**：每针对一个新学科（如化学或历史）微调一个模型，都需要保存一份完整的巨型权重文件，服务器存储空间迅速告急。\n- **灾难性遗忘风险**：在注入新知识时，模型容易丢失原有的通用语言理解能力，导致回答变得生硬或逻辑混乱。\n\n### 使用 LLaMA-Adapter 后\n- **极致参数高效**：仅需训练约 120 万可学习参数（Zero-init Attention 机制），单张消费级显卡即可完成微调，硬件成本降低两个数量级。\n- **小时级快速交付**：从数据输入到获得可用模型仅需 1 小时，团队可以在一天内完成多个学科模型的验证与部署。\n- **轻量级存储管理**：只需保存微小的适配器权重文件，同一基座模型可灵活挂载多个不同学科的“插件”，极大节省存储空间。\n- **能力稳定保留**：独特的零初始化注意力机制确保模型在掌握物理知识的同时，完美保留了 LLaMA 原有的流畅对话与推理底座能力。\n\nLLaMA-Adapter 通过极致的参数效率，让中小企业也能以低成本、分钟级的速度实现大模型的垂直领域定制化落地。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpenGVLab_LLaMA-Adapter_6c1dea29.png","OpenGVLab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FOpenGVLab_4422f20f.jpg","General Vision Team of Shanghai AI Laboratory",null,"opengvlab@gmail.com","opengvlab","https:\u002F\u002Fgithub.com\u002FOpenGVLab",[80,84,88,92,96,100],{"name":81,"color":82,"percentage":83},"Python","#3572A5",95,{"name":85,"color":86,"percentage":87},"JavaScript","#f1e05a",2.7,{"name":89,"color":90,"percentage":91},"Shell","#89e051",1.5,{"name":93,"color":94,"percentage":95},"Rust","#dea584",0.4,{"name":97,"color":98,"percentage":99},"Scheme","#1e4aec",0.2,{"name":101,"color":102,"percentage":103},"C++","#f34b7d",0.1,5926,382,"2026-04-10T10:49:52","GPL-3.0",4,"Linux","必需 NVIDIA GPU。训练 LLaMA-7B 示例使用 8 卡环境 (torchrun --nproc_per_node 8)，推理支持单卡。具体显存需求未说明，但需容纳 LLaMA 基座模型 (7B\u002F65B) 及 CUDA 加速环境。","未说明",{"notes":113,"python":114,"dependencies":115},"1. 官方安装脚本仅针对 Linux (conda 环境)。\n2. 需单独申请或下载 LLaMA 基座模型权重 (7B 或 65B)。\n3. 不同版本模型 (V1, V2, Multimodal) 的具体依赖可能不同，需参考对应子目录文档。\n4. 
训练示例基于 Alpaca 数据集。","3.8",[116,117],"pytorch","cudatoolkit",[35,14,119],"其他","2026-03-27T02:49:30.150509","2026-04-13T13:34:54.241138",[123,128,133,138,143,148],{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},31485,"LLaMA-Adapter-V2 的多模态能力（图像、视频、音频等）代码在哪里？","多模态的预训练、微调和推理代码已发布。支持图像、视频、文本、音频和点云输入，以及中英文双语回复。您可以访问以下仓库获取代码：\nhttps:\u002F\u002Fgithub.com\u002FZrrSkywalker\u002FLLaMA-Adapter\u002Ftree\u002Fmain\u002Fimagebind_LLM\n此外，完整的微调代码也已上传至：\nhttps:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Ftree\u002Fmain\u002Fllama_adapter_v2_multimodal","https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Fissues\u002F38",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},31486,"运行 imagebind_LLM 示例时遇到缺少 'knn.so' 文件或构建错误怎么办？","这通常是由于 PyTorch 版本与 CUDA 版本不匹配导致的。建议将 torch 版本降级以匹配您的 CUDA 版本（例如用户反馈降级到 torch==1.10.0 或匹配特定 CUDA 版本的 torch）。此外，确保已手动安装缺失的依赖项，如：\nconda install -c conda-forge cartopy\npip install matplotlib --no-cache-dir --force-reinstall\npip install ninja\n如果涉及 cpp_extension 构建问题，可能需要修改 pytorch 源码中的命令（将 'ninja', '-v' 改为 'ninja', '--version'），但首选方案是检查并修正 torch 和 CUDA 的版本兼容性。","https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Fissues\u002F91",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},31487,"哪个文件包含用于微调的预训练权重（Pretrained Weights）？","名为 \"7B.pth\" 的检查点已经是经过指令跟随数据微调后的模型。如果您需要预训练后但未进行指令微调的权重（即仅经过 COCO 和 CC3M 等数据预训练的权重），请使用 \"7B-pretrained.pth\"。\n下载地址：\n微调版：https:\u002F\u002Fhuggingface.co\u002FCxxs\u002FImageBind-LLM\u002Fblob\u002Fmain\u002F7B.pth\n预训练版：https:\u002F\u002Fhuggingface.co\u002FCxxs\u002FImageBind-LLM\u002Fblob\u002Fmain\u002F7B-pretrained.pth","https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Fissues\u002F66",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},31488,"运行 llama_adapter_v2_multimodal7b 的 demo 时输出为空或只返回特定 token IDs 怎么办？","这个问题通常是因为使用了错误的基座模型版本。请确保使用的是 **LLaMA-7B** (第一代)，而不是 **LLaMA-2-7B**。多模态 Adapter V2 的演示脚本目前仅兼容第一代 LLaMA 模型。如果您配置的是 LLaMA-2，请切换回 LLaMA-7B 重新尝试。","https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Fissues\u002F107",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},31489,"如何实现对多张图像的推理（例如同时描述两张图片）？","目前官方仍在探索如何将 LLaMA-Adapter V2 适配于多图像理解任务，尚未提供简单的现成代码修改方案来直接支持多图推理。虽然理论上可以将多张图像视为独立的 token 序列加入，但具体的实现细节和最佳实践仍在研究中。建议关注官方仓库的后续更新以获取正式支持。","https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Fissues\u002F71",{"id":149,"question_zh":150,"answer_zh":151,"source_url":152},31490,"Visual Instruction 模型是否支持 7B 模型？相关代码在哪里？","是的，该功能已支持。视觉指令模型的相关代码（包括预训练、微调和推理）已在 imagebind_LLM 目录中发布，并且支持 7B 模型。您可以在此处找到完整代码：\nhttps:\u002F\u002Fgithub.com\u002FZrrSkywalker\u002FLLaMA-Adapter\u002Ftree\u002Fmain\u002Fimagebind_LLM\n此外，针对多模态版本的 M1 Mac 加速分支也可在社区的 fork 中找到。","https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FLLaMA-Adapter\u002Fissues\u002F24",[154,159,164],{"id":155,"version":156,"summary_zh":157,"released_at":158},231258,"v.2.1.0","发布 LLaMA-Adapter V2.1 多模态模型","2023-10-12T08:35:30",{"id":160,"version":161,"summary_zh":162,"released_at":163},231259,"v.2.0.0","发布 LLaMA-Adapter V2 多模态模型","2023-05-25T16:51:13",{"id":165,"version":166,"summary_zh":75,"released_at":167},231260,"v.1.0.0","2023-03-19T18:32:44"]