[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-JIA-Lab-research--LongLoRA":3,"tool-JIA-Lab-research--LongLoRA":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",159636,2,"2026-04-17T23:33:34",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 
人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":73,"owner_company":75,"owner_location":75,"owner_email":75,"owner_twitter":75,"owner_website":75,"owner_url":76,"languages":77,"stars":82,"forks":83,"last_commit_at":84,"license":85,"difficulty_score":10,"env_os":86,"env_gpu":87,"env_ram":86,"env_deps":88,"category_tags":96,"github_topics":97,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":103,"updated_at":104,"faqs":105,"releases":135},8673,"JIA-Lab-research\u002FLongLoRA","LongLoRA","Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)","LongLoRA 是一套专为提升大语言模型长上下文处理能力而设计的开源方案，包含核心算法 LongLoRA 及配套的长指令微调数据集 LongAlpaca。它主要解决了传统大模型在处理长篇文档、多轮对话时，因上下文窗口限制导致信息丢失或计算资源消耗过大的难题。\n\n通过引入高效的移位短注意力机制（Shifted Short Attention）并仅对少量参数进行微调，LongLoRA 能以极低的显存成本，将模型的上下文理解长度显著扩展至 32k 甚至更长，同时保持原有的通用能力。该项目已作为口头报告论文入选 ICLR 2024，并提供了从数据生成、模型训练到流式推理的完整代码实现，支持 Llama 2、GPTNeoX 等多种主流架构，还可与 QLoRA 结合进一步降低硬件门槛。\n\nLongLoRA 非常适合 AI 研究人员、大模型开发者以及需要处理长文本任务的技术团队使用。无论是希望低成本定制专属长文本模型的研究者，还是致力于开发文档分析、法律合同审查等长上下文应用的企业开发者，都能利用 LongLoRA 快速构建高效可靠的解决方案。","\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_33b6676a97b3.png\" alt=\"Stanford-Alpaca\" style=\"width: 100%; min-width: 300px; display: block; margin: auto;\">\n\u003C\u002Fp>\n\n# LongLoRA and LongAlpaca for Long-context LLMs\n\n\n[![Huggingface Models](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModels-Huggingface%20Models-bron)](https:\u002F\u002Fhuggingface.co\u002FYukang)\n[![Data](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FData-LongAlpaca%2012k-light)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYukang\u002FLongAlpaca-12k)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Arxiv%20Link-green)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.12307)\n\n[![Code License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode%20License-Apache_2.0-yellow.svg)](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Fblob\u002Fmain\u002FLICENSE)\n[![Data License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FData%20License-CC%20By%20NC%204.0-orange.svg)](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Fblob\u002Fmain\u002FDATA_LICENSE)\n[![Weight 
License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeight%20License-CC%20By%20NC%204.0-red)](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Fblob\u002Fmain\u002FWEIGHT_LICENSE)\n\n\n## TABLE OF CONTENTS\n1. [News](#news)\n2. [Highlights](#highlights)\n3. [How to contribute](#how-to-contribute)\n4. [Requirements](#usage-requirements)\n5. [Installation and quick guide](#installation-and-quick-guide)\n6. [LongAlpaca Data](#longalpaca-data)\n7. [Models](#models)\n8. [Training](#training)\n9. [Evaluation](#evaluation)\n10. [Demo](#demo)\n11. [Streaming Inference](#streaming-inference)\n12. [Data Generation via Pdf2Text](#data-generation-via-pdf2text)\n13. [Examples](#examples)\n14. [Citation](#citation)\n15. [Acknowledgement](#acknowledgement)\n16. [License](#license)\n      \n## News\n- [x] [2024.1.17] [LongLoRA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.12307) has been accepted by **ICLR 2024** as an **Oral** presentation.\n- [x] [2023.11.19] We release a new version of LongAlpaca models, [LongAlpaca-7B-16k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-7B-16k), [LongAlpaca-13B-16k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-13B-16k), and [LongAlpaca-70B-16k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-70B-16k). These models are fine-tuned with LongLoRA in SFT on [LongAlpaca-16k-length](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYukang\u002FLongAlpaca-16k-length), a subset of the LongAlpaca-12k dataset. We evaluate the [LongAlpaca-7B-16k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-7B-16k) model on the LongBench and L-Eval benchmarks; results can be found [here](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Ftree\u002Fmain\u002Fbenchmarks).\n- [x] [2023.11.2] We have updated our LongAlpaca models from alpaca prompting to llama2 prompting, which is consistent with their pre-trained models. Please refer to the [inference code](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Fblob\u002F2345c6d030f61ac3a031906386a103a5b05e0e6f\u002Finference.py#L18) with the llama2 prompting.\n- [x] [2023.10.23] We support the combination of [QLoRA](https:\u002F\u002Fgithub.com\u002Fartidoro\u002Fqlora) and LongLoRA in the [supervised fine-tuning](supervised-fine-tune-qlora.py) script, for further reduction of the GPU memory cost. We release the LoRA weights of a 7B model at [LongAlpaca-7B-qlora-weights](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-7B-qlora-weights).\n- [x] [2023.10.18] We support [StreamingLLM](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fstreaming-llm) inference on our LongAlpaca models. 
This increases the context-length of the multi-round dialogue in StreamingLLM.\n- [x] [2023.10.8] **We release the long instruction-following dataset**, [LongAlpaca-12k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYukang\u002FLongAlpaca-12k) and **the corresponding models**, [LongAlpaca-7B](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-7B), [LongAlpaca-13B](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-13B), and [LongAlpaca-70B](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-70B).\n- (*The previous sft models*, [Llama-2-13b-chat-longlora-32k-sft](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-chat-longlora-32k-sft) and [Llama-2-70b-chat-longlora-32k-sft](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-70b-chat-longlora-32k-sft), *have been deprecated*.)\n- [x] [2023.10.3] We add support for GPTNeoX models. Please refer to this [PR](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Fpull\u002F32) for usage. Thanks to @naubull2 for this contribution.\n- [x] [2023.9.22] We release all our fine-tuned [models](https:\u002F\u002Fhuggingface.co\u002FYukang), including **70B-32k models**, [LLaMA2-LongLoRA-70B-32k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-70b-longlora-32k) and [LLaMA2-LongLoRA-7B-100k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-100k-ft). Welcome to check them out!\n- [x] [2023.9.22] We release the [Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2309.12307) and this GitHub repo, including training and evaluation code.\n\n**LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models [[Paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2309.12307)]** \u003Cbr \u002F>\n[Yukang Chen](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=6p0ygKUAAAAJ&hl=en),\n[Shengju Qian](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=QNnWmasAAAAJ),\n[Haotian Tang](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=WxL13BAAAAAJ&hl),\n[Xin Lai](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=tqNDPA4AAAAJ&hl=zh-CN),\n[Zhijian Liu](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=3coYSTUAAAAJ&hl=en),\n[Song Han](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=E0iCaa4AAAAJ&hl=zh-CN),\n[Jiaya Jia](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=XPAkzTEAAAAJ&hl=en)\u003Cbr \u002F>\n\n## Highlights\n1. In the LongLoRA approach, the proposed shifted short attention is easy to implement, compatible with Flash-Attention, and not required during inference.\n2. We released all our models, ranging from 7B to 70B in size and from 8k to 100k in context length, including [LLaMA2-LongLoRA-7B-100k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-100k-ft), [LLaMA2-LongLoRA-13B-64k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-64k), and [LLaMA2-LongLoRA-70B-32k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-70b-longlora-32k).\n3. We built up a long-context instruction-following dataset, [LongAlpaca-12k](#longalpaca-data). We released the corresponding [LongAlpaca-7B](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-7B), [LongAlpaca-13B](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-13B) and [LongAlpaca-70B](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-70B) models. 
To the best of our knowledge, this is the first open-sourced long-context 70B model.\n\n\n## How to Contribute\n- Make sure to have git installed.\n- Create your own [fork](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Ffork) of the project.\n- Clone the repository to your local machine with `git clone`, using the URL of this project.\n- Read both the `Requirements` and `Installation and Quick Guide` sections below.\n- Commit and push your changes.\n- Make a pull request when finished modifying the project.\n\n\n## Usage Requirements\nTo download and use the [pre-trained weights](#pre-trained-weights) you will need:\n1. A Hugging Face (HF) account with a valid email. Note that the email used for HF must also be used for the license agreement.\n2. Acceptance of the Meta [license and acceptable use policy](https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fllama-downloads\u002F)\n\n\n## Installation and Quick Guide\nTo install and run the application:\n1. [Fork this repo](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Ffork) on GitHub.\n2. Clone the repository to your local machine with `git clone`, using the URL of this project.\n3. Run the following code:\n```\npip install -r requirements.txt\npip install flash-attn --no-build-isolation\n```\n4. Use a [released model](#released-models) or [fine-tune](#fine-tuning) a model to fit your preferences.\n5. Test your model by chatting with it.\n6. Deploy your own demo.\n\n## LongAlpaca Data\n\nLongAlpaca-12k contains 9k long QA entries that we collected and 3k short QA entries sampled from the original [Alpaca data](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca\u002Fblob\u002Fmain\u002Falpaca_data.json). This is to avoid the model degrading at short instruction following. The data we collected covers various types and amounts, as shown in the following figure.\n\n\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_f11d968bceb0.png\" alt=\"Stanford-Alpaca\" style=\"width: 60%; min-width: 300px; display: block; margin: auto;\">\n\u003C\u002Fp>\n\n\n| Data           | Short QA | Long QA  | Total    | Download |\n|:---------------|----------|----------|----------|----------|\n| LongAlpaca-12k | 3k       | 9k       | 12k      | [Link](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYukang\u002FLongAlpaca-12k) |\n\nFollowing the original Alpaca format, our Long QA data uses the following prompts for fine-tuning:\n- `instruction`: `str`, describes the task the model should perform. For example, to answer a question after reading a book section or paper. 
We vary the contents and questions to make instructions diverse.\n- `output`: `str`, the answer to the instruction.\n\nFor simplicity, we did not use the `input` field of the Alpaca format.\n\n## Models\n\n### Models with supervised fine-tuning\n| Model          | Size | Context | Train   | Link                                                       |\n|:---------------|------|---------|---------|------------------------------------------------------------|\n| LongAlpaca-7B  | 7B   | 32768   | Full FT | [Model](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-7B)       |\n| LongAlpaca-13B | 13B  | 32768   | Full FT | [Model](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-13B)      |\n| LongAlpaca-70B | 70B  | 32768   | LoRA+ | [Model](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-70B) [(LoRA-weight)](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-70B-lora) |\n\n\n### Models with context extension via full fine-tuning\n| Model                       | Size | Context | Train | Link                                                              |\n|:----------------------------|------|---------|-------|-------------------------------------------------------------------|\n| Llama-2-7b-longlora-8k-ft   | 7B   | 8192    | Full FT    | [Model](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-8k-ft)  |\n| Llama-2-7b-longlora-16k-ft  | 7B   | 16384   | Full FT    | [Model](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-16k-ft)  |\n| Llama-2-7b-longlora-32k-ft  | 7B   | 32768   | Full FT    | [Model](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-32k-ft)  |\n| Llama-2-7b-longlora-100k-ft | 7B   | 100000  | Full FT    | [Model](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-100k-ft) |\n| Llama-2-13b-longlora-8k-ft  | 13B  | 8192    | Full FT    | [Model](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-8k-ft)  |\n| Llama-2-13b-longlora-16k-ft | 13B  | 16384   | Full FT    | [Model](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-16k-ft) |\n| Llama-2-13b-longlora-32k-ft | 13B  | 32768   | Full FT    | [Model](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-32k-ft) |\n\n### Models with context extension via improved LoRA fine-tuning\n| Model                       | Size | Context | Train | Link                                                                |\n|:----------------------------|------|---------|-------|---------------------------------------------------------------------|\n| Llama-2-7b-longlora-8k      | 7B   | 8192    | LoRA+ | [LoRA-weight](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-8k) |\n| Llama-2-7b-longlora-16k     | 7B   | 16384   | LoRA+ | [LoRA-weight](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-16k)       |\n| Llama-2-7b-longlora-32k     | 7B   | 32768   | LoRA+ | [LoRA-weight](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-32k)       |\n| Llama-2-13b-longlora-8k     | 13B  | 8192    | LoRA+ | [LoRA-weight](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-8k)       |\n| Llama-2-13b-longlora-16k    | 13B  | 16384   | LoRA+ | [LoRA-weight](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-16k)      |\n| Llama-2-13b-longlora-32k    | 13B  | 32768   | LoRA+ | [LoRA-weight](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-32k)      |\n| 
Llama-2-13b-longlora-64k    | 13B  | 65536   | LoRA+ | [LoRA-weight](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-64k)      |\n| Llama-2-70b-longlora-32k    | 70B  | 32768   | LoRA+ | [LoRA-weight](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-70b-longlora-32k)      |\n| Llama-2-70b-chat-longlora-32k    | 70B  | 32768   | LoRA+ | [LoRA-weight](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-70b-chat-longlora-32k) |\n\n## Training\n### Pre-trained weights\nWe use LLaMA2 models as the pre-trained weights and fine-tune them to long context window sizes. Download the ones you need from the table below.\n\n| Pre-trained weights                                                        |\n|:---------------------------------------------------------------------------|\n| [Llama-2-7b-hf](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-7b-hf)           |\n| [Llama-2-13b-hf](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-13b-hf)         |\n| [Llama-2-70b-hf](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-70b-hf)         |\n| [Llama-2-7b-chat-hf](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-7b-chat-hf) |\n| [Llama-2-13b-chat-hf](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-13b-chat-hf)         |\n| [Llama-2-70b-chat-hf](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-70b-chat-hf)         |\n\nThis project also supports GPTNeoX models as the base model architecture. Some candidate pre-trained weights may include [GPT-NeoX-20B](https:\u002F\u002Fhuggingface.co\u002FEleutherAI\u002Fgpt-neox-20b), [Polyglot-ko-12.8B](https:\u002F\u002Fhuggingface.co\u002FEleutherAI\u002Fpolyglot-ko-12.8b) and other variants.\n\n### Fine-tuning\n```\ntorchrun --nproc_per_node=8 fine-tune.py  \\\n        --model_name_or_path path_to\u002FLlama-2-7b-hf \\\n        --bf16 True \\\n        --output_dir path_to_saving_checkpoints       \\\n        --cache_dir path_to_cache \\\n        --model_max_length 8192 \\\n        --use_flash_attn True \\\n        --low_rank_training False \\\n        --num_train_epochs 1  \\\n        --per_device_train_batch_size 1     \\\n        --per_device_eval_batch_size 2     \\\n        --gradient_accumulation_steps 8     \\\n        --evaluation_strategy \"no\"     \\\n        --save_strategy \"steps\"     \\\n        --save_steps 1000     \\\n        --save_total_limit 2     \\\n        --learning_rate 2e-5     \\\n        --weight_decay 0.0     \\\n        --warmup_steps 20     \\\n        --lr_scheduler_type \"constant_with_warmup\"     \\\n        --logging_steps 1     \\\n        --deepspeed \"ds_configs\u002Fstage2.json\" \\\n        --tf32 True \\\n        --max_steps 1000\n```\n\n- Please remember to change `path_to\u002FLlama-2-7b-hf`, `path_to_saving_checkpoints`, `path_to_cache` to your own directories.\n- Note that you can change `model_max_length` to other values.\n- You can change `ds_configs\u002Fstage2.json` to `ds_configs\u002Fstage3.json` if you want.\n- Please set `use_flash_attn` to `False` if you use V100 machines or have not installed flash attention.\n- You can set `low_rank_training` to `False` if you want to use full fine-tuning. It will cost more GPU memory and be slower, but the performance will be a bit better.\n- When training is finished, to get the full model weight, run:\n```\ncd path_to_saving_checkpoints && python zero_to_fp32.py . 
pytorch_model.bin\n```\nNote that `path_to_saving_checkpoints` might be the global_step directory, depending on the DeepSpeed version.\n\n### Supervised Fine-tuning\n```\ntorchrun --nproc_per_node=8 supervised-fine-tune.py  \\\n        --model_name_or_path path_to_Llama2_chat_models \\\n        --bf16 True \\\n        --output_dir path_to_saving_checkpoints       \\\n        --model_max_length 16384 \\\n        --use_flash_attn True \\\n        --data_path LongAlpaca-16k-length.json \\\n        --low_rank_training True \\\n        --num_train_epochs 5  \\\n        --per_device_train_batch_size 1     \\\n        --per_device_eval_batch_size 2     \\\n        --gradient_accumulation_steps 8     \\\n        --evaluation_strategy \"no\"     \\\n        --save_strategy \"steps\"     \\\n        --save_steps 98     \\\n        --save_total_limit 2     \\\n        --learning_rate 2e-5     \\\n        --weight_decay 0.0     \\\n        --warmup_steps 20     \\\n        --lr_scheduler_type \"constant_with_warmup\"     \\\n        --logging_steps 1     \\\n        --deepspeed \"ds_configs\u002Fstage2.json\" \\\n        --tf32 True\n```\n- There is no need to run supervised fine-tuning on top of the context-extended fine-tuned models. It is fine to directly use Llama2-chat models as the base model, as the amount of long instruction-following data is enough for SFT.\n- Our long instruction-following data can be found in [LongAlpaca-12k.json](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYukang\u002FLongAlpaca-12k).\n- Note that supervised-fine-tune.py can be replaced by supervised-fine-tune-qlora.py if you want to try 4-bit quantized fine-tuning for further GPU memory reduction. This follows [QLoRA](https:\u002F\u002Fgithub.com\u002Fartidoro\u002Fqlora).\n- If you run into issues saving pytorch_model.bin after the QLoRA SFT, please refer to this [issue](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Fissues\u002F123).\n\n### Get trainable weights in low-rank training\nIn low-rank training, we set the embedding and normalization layers as trainable. Please use the following line to extract the trainable weights `trainable_params.bin` from `pytorch_model.bin`:\n```\npython3 get_trainable_weights.py --checkpoint_path path_to_saving_checkpoints --trainable_params \"embed,norm\"\n```\n\n### Merge LoRA Weights\nMerge the LoRA weights in `pytorch_model.bin` with the trainable parameters in `trainable_params.bin`, and save the resulting model in the Hugging Face format to your desired path:\n```\npython3 merge_lora_weights_and_save_hf_model.py \\\n        --base_model path_to\u002FLlama-2-7b-hf \\\n        --peft_model path_to_saving_checkpoints \\\n        --context_size 8192 \\\n        --save_path path_to_saving_merged_model\n```\nFor example,\n```\npython3 merge_lora_weights_and_save_hf_model.py \\\n        --base_model \u002Fdataset\u002Fpretrained-models\u002FLlama-2-7b-hf \\\n        --peft_model \u002Fdataset\u002Fyukangchen\u002Fhf_models\u002Flora-models\u002FLlama-2-7b-longlora-8k \\\n        --context_size 8192 \\\n        --save_path \u002Fdataset\u002Fyukangchen\u002Fmodels\u002FLlama-2-7b-longlora-8k-merged\n```\n\n\n## Evaluation\n### Perplexity Validation\nTo evaluate a model that is trained in the low-rank setting, please set both `base_model` and `peft_model`. `base_model` is the pre-trained weight. `peft_model` is the path to the saved checkpoint, which should contain `trainable_params.bin`, `adapter_model.bin` and `adapter_config.json`. 
For example,\n```\npython3 eval.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to\u002FLlama-2-7b-hf --peft_model path_to_saving_checkpoints --data_path pg19\u002Ftest.bin\n```\n\nOr evaluate with multiple GPUs as follows.\n```\ntorchrun --nproc_per_node=auto eval_distributed.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to\u002FLlama-2-7b-hf --peft_model path_to_saving_checkpoints --data_path pg19\u002Ftest.bin\n```\n\nTo evaluate a model that is fully fine-tuned, you only need to set `base_model` as the path to the saved checkpoint, which should contain `pytorch_model.bin` and `config.json`; `peft_model` should be omitted.\n```\npython3 eval.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to_saving_checkpoints --data_path pg19\u002Ftest.bin\n```\n\nOr evaluate with multiple GPUs as follows.\n```\ntorchrun --nproc_per_node=auto eval_distributed.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to_saving_checkpoints --data_path pg19\u002Ftest.bin\n```\n\n- Note that `--seq_len` sets the sequence length for evaluation, while `--context_size` sets the context length of the model during fine-tuning. `--seq_len` should not be larger than `--context_size`.\n\n- We have already tokenized the validation and test splits of the PG19 and proof-pile datasets into `pg19\u002Fvalidation.bin`, `pg19\u002Ftest.bin`, and `proof-pile\u002Ftest_sampled_data.bin`, with the tokenizer of LLaMA. `proof-pile\u002Ftest_sampled_data.bin` contains 128 documents randomly sampled from the full proof-pile test split; each document has at least 32768 tokens. We also release the sampled ids in [proof-pile\u002Ftest_sampled_ids.bin](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1cnzWODLRQYAd7HeugzLCIhaqzaLZv7J5\u002Fview?usp=share_link). You can download them from the links below.\n\n| Dataset    | Split      | Link                                                                                                         |\n|:-----------|------------|--------------------------------------------------------------------------------------------------------------|\n| PG19       | validation | [pg19\u002Fvalidation.bin](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1rbJvb0qRIf2mQoN2ON7S93TbTzMnlrN6\u002Fview?usp=share_link) |\n| PG19       | test       | [pg19\u002Ftest.bin](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1QANDMdctpacPAYgS04adDXqByGEq-Ret\u002Fview?usp=share_link)       |\n| Proof-pile | test       | [proof-pile\u002Ftest_sampled_data.bin](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1bUI5lPDvrqzY_XXJJ2sSuvZx0Y9AZClE\u002Fview?usp=share_link)         |\n \n\n### Passkey Retrieval\nWe provide a way to test the passkey retrieval accuracy. For example,\n```\npython3 passkey_retrivial.py \\\n        --context_size 32768 \\\n        --base_model path_to\u002FLlama-2-7b-longlora-32k \\\n        --max_tokens 32768 \\\n        --interval 1000\n```\n- Note that `context_size` is the context length during fine-tuning.\n- `max_tokens` is the maximum document length in the passkey retrieval evaluation.\n- `interval` is the step by which the document length increases. 
It is a rough number because the document grows by sentences.\n\n## Demo\n### Local Inference\nTo chat with LongAlpaca models, run:\n```\npython3 inference.py  \\\n        --base_model path_to_model \\\n        --question $question \\\n        --context_size $context_length \\\n        --max_gen_len $max_gen_len \\\n        --flash_attn True \\\n        --material $material_content\n```\nTo ask a question related to a book:\n```\npython3 inference.py  \\\n        --base_model \u002Fdata\u002Fmodels\u002FLongAlpaca-13B \\\n        --question \"Why doesn't Professor Snape seem to like Harry?\" \\\n        --context_size 32768 \\\n        --max_gen_len 512 \\\n        --flash_attn True \\\n        --material \"materials\u002FHarry Potter and the Philosophers Stone_section2.txt\"\n```\n\nTo ask a question related to a paper:\n```\npython3 inference.py  \\\n        --base_model \u002Fdata\u002Fmodels\u002FLongAlpaca-13B \\\n        --question \"What are the main contributions and novelties of this work?\" \\\n        --context_size 32768 \\\n        --max_gen_len 512 \\\n        --flash_attn True \\\n        --material \"materials\u002Fpaper1.txt\"\n```\n- Note that inference.py can be replaced by inference-qlora.py if you want to try 4-bit quantized inference for further GPU memory reduction. This follows [QLoRA](https:\u002F\u002Fgithub.com\u002Fartidoro\u002Fqlora).\n\n### Online Demo\nTo deploy your own demo, run:\n```\npython3 demo.py  \\\n\t--base_model path_to_model \\\n\t--context_size $context_size \\\n\t--max_gen_len $max_gen_len \\\n\t--flash_attn True\n```\nFor example:\n```\npython3 demo.py  \\\n\t--base_model \u002Fdata\u002Fmodels\u002FLongAlpaca-13B \\\n\t--context_size 32768 \\\n\t--max_gen_len 512 \\\n\t--flash_attn True\n```\n- Note that `flash_attn=True` will make generation slower but save a lot of GPU memory.\n\n## Streaming Inference\nWe support the inference of LongAlpaca models with [StreamingLLM](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fstreaming-llm). This increases the context-length of the multi-round dialogue in StreamingLLM.\nHere is an example,\n```\npython run_streaming_llama_longalpaca.py \\\n\t--enable_streaming \\\n\t--test_filepath outputs_stream.json \\\n\t--use_flash_attn True \\\n\t--recent_size 32768\n```\n- Please use a smaller `recent_size`, for example 8192, if you meet OOM issues.\n- `test_filepath` is the json file that contains prompts for inference. We provide an example file [outputs_stream.json](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F13WGepnamWR8FKQS2UceyhNgV1ALHNx3w\u002Fview?usp=share_link), which is a subset of LongAlpaca-12k. You can replace it with your own questions.\n\n## Data Generation via Pdf2text\nDuring our dataset collection, we converted papers and books from PDF to text. The conversion quality has a large influence on the final model quality, and we consider this step non-trivial. We release our pdf2txt conversion tool in the folder `pdf2txt`. It is built upon `pdf2image`, `easyocr`, `ditod` and `detectron2`. 
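\n\nFor illustration, here is a minimal, hypothetical sketch of this kind of OCR pass (not the released `pdf2txt` tool itself, which additionally uses `ditod` and `detectron2` for layout detection): render each PDF page to an image with `pdf2image`, then read it with `easyocr`.\n\n```python\n# Hypothetical sketch only, not the released pdf2txt tool:\n# render PDF pages to images, then OCR each page with easyocr.\nimport numpy as np\nimport easyocr\nfrom pdf2image import convert_from_path\n\nreader = easyocr.Reader([\"en\"])  # assumes the English OCR model\n\ndef pdf_to_text(pdf_path):\n    pages = convert_from_path(pdf_path, dpi=200)  # one PIL image per page\n    page_texts = []\n    for page in pages:\n        # detail=0 makes readtext return plain strings instead of boxes\n        lines = reader.readtext(np.array(page), detail=0)\n        page_texts.append(\" \".join(lines))\n    return \"\\n\\n\".join(page_texts)\n```\n\nOCR-based extraction like this also recovers text from scanned pages, which plain PDF text parsers miss.\n\n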
Please refer to the [README.md](pdf2txt\u002FREADME.md) in `pdf2txt` for more details.\n\n## Examples\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_6d3b7532553a.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_22ff1b1f46d9.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_644c80dc0706.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_3d83f3a07064.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_7a1669111742.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_9447e1bec944.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_3e42232f28f7.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_7c0c2368f8f5.png\" width=\"100%\"> \u003C\u002Fp>\n\n## Citation\nIf you find this project useful in your research, please consider citing:\n\n```\n@inproceedings{longlora,\n  author       = {Yukang Chen and Shengju Qian and Haotian Tang and Xin Lai and Zhijian Liu and Song Han and Jiaya Jia},\n  title        = {LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models},\n  booktitle    = {The International Conference on Learning Representations (ICLR)},\n  year         = {2024},\n}\n```\n\n\n```\n@misc{long-alpaca,\n  author = {Yukang Chen and Shaozuo Yu and Shengju Qian and Haotian Tang and Xin Lai and Zhijian Liu and Song Han and Jiaya Jia},\n  title = {Long Alpaca: Long-context Instruction-following models},\n  year = {2023},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA}},\n}\n```\n## Acknowledgement\n- This work is built upon [LLaMA2](https:\u002F\u002Fai.meta.com\u002Fllama) as the pre-trained model.\n- This work can also build upon [GPTNeoX-HF](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fgpt_neox), which is based on [EleutherAI\u002FGPTNeoX](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Fgpt-neox), as the pre-trained model architecture.\n- This work uses [DeepSpeed](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDeepSpeed), [peft](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpeft), and [Flash-Attention2](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention) for acceleration.\n- Some evaluation code is modified from [Landmark Attention](https:\u002F\u002Fgithub.com\u002Fepfml\u002Flandmark-attention).\n- We use [LongChat](https:\u002F\u002Fgithub.com\u002FDachengLi1\u002FLongChat) for the retrieval evaluation.\n- We follow [StreamingLLM](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fstreaming-llm) for streaming inference.\n- We combine [QLoRA](https:\u002F\u002Fgithub.com\u002Fartidoro\u002Fqlora) with LongLoRA for 
supervised fine-tuning.\n\n\n## License\n- LongLoRA is licensed under the Apache License 2.0. This means that it requires the preservation of copyright and license notices. \n- Data and weights are under the CC-BY-NC 4.0 License. They are licensed for research use only, and only non-commercial use is allowed. Models trained using the dataset should not be used outside of research purposes.\n","\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_33b6676a97b3.png\" alt=\"斯坦福-Alpaca\" style=\"width: 100%; min-width: 300px; display: block; margin: auto;\">\n\u003C\u002Fp>\n\n# 长上下文大模型的 LongLoRA 与 LongAlpaca\n\n\n[![Hugging Face 模型](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModels-Huggingface%20Models-bron)](https:\u002F\u002Fhuggingface.co\u002FYukang)\n[![数据](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FData-LongAlpaca%2012k-light)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYukang\u002FLongAlpaca-12k)\n[![论文](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Arxiv%20Link-green)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.12307)\n\n[![代码许可证](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode%20License-Apache_2.0-yellow.svg)](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Fblob\u002Fmain\u002FLICENSE)\n[![数据许可证](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FData%20License-CC%20By%20NC%204.0-orange.svg)](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Fblob\u002Fmain\u002FDATA_LICENSE)\n[![权重许可证](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeight%20License-CC%20By%20NC%204.0-red)](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Fblob\u002Fmain\u002FWEIGHT_LICENSE)\n\n\n## 目录\n1. [新闻](#news)\n2. [亮点](#highlights)\n3. [如何贡献](#how-to-contribute)\n4. [使用要求](#usage-requirements)\n5. [安装与快速指南](#installation-and-quick-guide)\n6. [LongAlpaca 数据](#longalpaca-data)\n7. [模型](#models)\n8. [训练](#training)\n9. [评估](#evaluation)\n10. [演示](#demo)\n11. [流式推理](#streaming-inference)\n12. [通过 Pdf2Text 生成数据](#data-generation-via-pdf2text)\n13. [示例](#examples)\n14. [引用](#citation)\n15. [致谢](#acknowledgement)\n16. 
[许可证](#license)\n      \n## 新闻\n- [x] [2024.1.17] [LongLoRA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.12307) 已被 **ICLR 2024** 接受为 **口头报告**。\n- [x] [2023.11.19] 我们发布了新版本的 LongAlpaca 模型，包括 [LongAlpaca-7B-16k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-7B-16k)、[LongAlpaca-13B-16k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-13B-16k) 和 [LongAlpaca-70B-16k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-70B-16k)。这些模型是在 LongLoRA 的 SFT 下，基于 LongAlpaca-12k 数据集的一个子集进行微调的，即 [LongAlpaca-16k-length](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYukang\u002FLongAlpaca-16k-length)。我们对 [LongAlpaca-7B-16k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-7B-16k) 模型进行了 LongBench 和 L-Eval 基准测试，结果可在此处查看：[链接](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Ftree\u002Fmain\u002Fbenchmarks)。\n- [x] [2023.11.2] 我们已将 LongAlpaca 模型的提示方式从 alpaca 提示更新为 llama2 提示，以与其预训练模型保持一致。请参考带有 llama2 提示的 [推理代码](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Fblob\u002F2345c6d030f61ac3a031906386a103a5b05e0e6f\u002Finference.py#L18)。\n- [x] [2023.10.23] 我们支持在 [监督微调](supervised-fine-tune-qlora.py) 中将 [QLoRA](https:\u002F\u002Fgithub.com\u002Fartidoro\u002Fqlora) 与 LongLoRA 结合使用，以进一步降低 GPU 内存占用。我们已在 [LongAlpaca-7B-qlora-weights](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-7B-qlora-weights) 上发布了 7B 模型的 LoRA 权重。\n- [x] [2023.10.18] 我们支持在我们的 LongAlpaca 模型上使用 [StreamingLLM](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fstreaming-llm) 进行推理。这增加了 StreamingLLM 中多轮对话的上下文长度。\n- [x] [2023.10.8] **我们发布了长指令遵循数据集**，即 [LongAlpaca-12k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYukang\u002FLongAlpaca-12k)，以及 **相应的模型**，包括 [LongAlpaca-7B](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-7B)、[LongAlpaca-13B](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-13B) 和 [LongAlpaca-70B](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-70B)。\n- (*之前的 sft 模型*，[Llama-2-13b-chat-longlora-32k-sft](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-chat-longlora-32k-sft) 和 [Llama-2-70b-chat-longlora-32k-sft](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-70b-chat-longlora-32k-sft)，*已被弃用*。)\n- [x] [2023.10.3] 我们新增了对 GPTNeoX 模型的支持。请参阅此 [PR](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Fpull\u002F32) 了解使用方法。感谢 @naubull2 的贡献。\n- [x] [2023.9.22] 我们发布了所有微调过的 [模型](https:\u002F\u002Fhuggingface.co\u002FYukang)，其中包括 **70B-32k 模型**，如 [LLaMA2-LongLoRA-70B-32k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-70b-longlora-32k) 和 [LLaMA2-LongLoRA-7B-100k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-100k-ft)。欢迎大家查看！\n- [x] [2023.9.22] 我们发布了 [论文](http:\u002F\u002Farxiv.org\u002Fabs\u002F2309.12307) 和这个 GitHub 仓库，其中包含了训练和评估代码。\n\n**LongLoRA：高效微调长上下文大型语言模型 [[论文](http:\u002F\u002Farxiv.org\u002Fabs\u002F2309.12307)]** \u003Cbr \u002F>\n[Yukang Chen](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=6p0ygKUAAAAJ&hl=en),\n[Shengju Qian](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=QNnWmasAAAAJ),\n[Haotian Tang](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=WxL13BAAAAAJ&hl),\n[Xin Lai](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=tqNDPA4AAAAJ&hl=zh-CN),\n[Zhijian Liu](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=3coYSTUAAAAJ&hl=en),\n[Song Han](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=E0iCaa4AAAAJ&hl=zh-CN),\n[Jiaya 
Jia](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=XPAkzTEAAAAJ&hl=en)\u003Cbr \u002F>\n\n## 亮点\n1. 在 LongLoRA 方法中，提出的偏移短注意力机制易于实现，兼容 Flash-Attention，并且在推理过程中无需使用。\n2. 我们发布了所有模型，涵盖 7B 至 70B 不等的参数规模，上下文长度从 8k 到 100k 不等，包括 [LLaMA2-LongLoRA-7B-100k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-100k-ft)、[LLaMA2-LongLoRA-13B-64k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-64k) 和 [LLaMA2-LongLoRA-70B-32k](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-70b-longlora-32k) 等。\n3. 我们构建了一个长上下文指令遵循数据集，即 [LongAlpaca-12k](#longalpaca-data)。我们还发布了相应的 [LongAlpaca-7B](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-7B)、[LongAlpaca-13B](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-13B) 和 [LongAlpaca-70B](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-70B) 模型。据我们所知，这是首个开源的长上下文 70B 模型。\n\n\n## 如何贡献\n- 请确保已安装 git。\n- 创建您自己的项目 [fork](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Ffork)。\n- 使用 git clone 并粘贴该项目的 URL，在您的本地机器上克隆该仓库。\n- 仔细阅读下方的 `使用要求` 和 `安装与快速指南` 部分。\n- 提交并推送您的更改。\n- 完成项目修改后，请发起一个 pull request。\n\n\n## 使用要求\n要下载和使用 [预训练权重](#pre-trained-weights)，您需要：\n1. 具有有效邮箱的 Hugging Face (HF) 账户。请注意，用于 HF 的邮箱也必须用于许可协议。\n2. 接受 Meta 的 [许可及可接受使用政策](https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fllama-downloads\u002F)\n\n## 安装与快速指南\n要安装并运行该应用：\n1. 在 GitHub 上 [Fork 此仓库](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Ffork)\n2. 使用 `git clone` 命令，并粘贴该项目的 URL，将仓库克隆到本地。\n3. 运行以下命令：\n```\npip install -r requirements.txt\npip install flash-attn --no-build-isolation\n```\n4. 选择一个 [已发布的模型](#released-models) 或者 [进行微调](#fine-tuning)，以满足您的需求。\n5. 通过聊天测试您的模型。\n6. 部署您自己的演示。\n\n## LongAlpaca 数据\n\nLongAlpaca-12k 包含我们收集的 9,000 条长问答数据，以及从原始 [Alpaca 数据](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca\u002Fblob\u002Fmain\u002Falpaca_data.json) 中采样的 3,000 条短问答数据。这样做是为了避免模型在处理短指令时性能下降的情况。我们收集的数据类型和数量如图所示。\n\n\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_f11d968bceb0.png\" alt=\"Stanford-Alpaca\" style=\"width: 60%; min-width: 300px; display: block; margin: auto;\">\n\u003C\u002Fp>\n\n\n| 数据           | 短问答 | 长问答  | 总计    | 下载 |\n|:---------------|----------|----------|----------|----------|\n| LongAlpaca-12k | 3k       | 9k       | 12k      | [链接](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYukang\u002FLongAlpaca-12k) |\n\n遵循原始 Alpaca 的格式，我们的长问答数据使用以下提示进行微调：\n- `instruction`: `str`，描述模型应执行的任务。例如，在阅读书籍章节或论文后回答问题。我们通过变化内容和问题来使指令更加多样化。\n- `output`: `str`，即对指令的回答。\n\n为了简化流程，我们未使用 Alpaca 格式中的 `input` 字段。\n\n## 模型\n\n### 经过监督微调的模型\n| 模型          | 参数量 | 上下文长度 | 训练方式   | 链接                                                       |\n|:---------------|------|---------|---------|------------------------------------------------------------|\n| LongAlpaca-7B  | 70亿 | 32,768  | 全量微调 | [模型](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-7B)       |\n| LongAlpaca-13B | 130亿 | 32,768  | 全量微调 | [模型](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-13B)      |\n| LongAlpaca-70B | 700亿 | 32,768  | LoRA+ | [模型](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-70B) [(LoRA权重)](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLongAlpaca-70B-lora) |\n\n\n### 通过全量微调扩展上下文长度的模型\n| 模型                       | 参数量 | 上下文长度 | 训练方式 | 链接                                                              
|\n|:----------------------------|------|---------|-------|-------------------------------------------------------------------|\n| Llama-2-7b-longlora-8k-ft   | 70亿 | 8,192    | 全量微调    | [模型](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-8k-ft)  |\n| Llama-2-7b-longlora-16k-ft  | 70亿 | 16,384   | 全量微调    | [模型](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-16k-ft)  |\n| Llama-2-7b-longlora-32k-ft  | 70亿 | 32,768   | 全量微调    | [模型](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-32k-ft)  |\n| Llama-2-7b-longlora-100k-ft | 70亿 | 100,000  | 全量微调    | [模型](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-100k-ft) |\n| Llama-2-13b-longlora-8k-ft  | 130亿 | 8,192    | 全量微调    | [模型](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-8k-ft)  |\n| Llama-2-13b-longlora-16k-ft | 130亿 | 16,384   | 全量微调    | [模型](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-16k-ft) |\n| Llama-2-13b-longlora-32k-ft | 130亿 | 32,768   | 全量微调    | [模型](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-32k-ft) |\n\n### 通过改进的 LoRA 微调扩展上下文长度的模型\n| 模型                       | 参数量 | 上下文长度 | 训练方式 | 链接                                                                |\n|:----------------------------|------|---------|-------|---------------------------------------------------------------------|\n| Llama-2-7b-longlora-8k      | 70亿 | 8,192    | LoRA+ | [LoRA 权重](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-8k) |\n| Llama-2-7b-longlora-16k     | 70亿 | 16,384   | LoRA+ | [LoRA 权重](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-16k)       |\n| Llama-2-7b-longlora-32k     | 70亿 | 32,768   | LoRA+ | [LoRA 权重](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-7b-longlora-32k)       |\n| Llama-2-13b-longlora-8k     | 130亿 | 8,192    | LoRA+ | [LoRA 权重](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-8k)       |\n| Llama-2-13b-longlora-16k    | 130亿 | 16,384   | LoRA+ | [LoRA 权重](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-16k)      |\n| Llama-2-13b-longlora-32k    | 130亿 | 32,768   | LoRA+ | [LoRA 权重](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-32k)      |\n| Llama-2-13b-longlora-64k    | 130亿 | 65,536   | LoRA+ | [LoRA 权重](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-13b-longlora-64k)      |\n| Llama-2-70b-longlora-32k    | 700亿 | 32,768   | LoRA+ | [LoRA 权重](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-70b-longlora-32k)      |\n| Llama-2-70b-chat-longlora-32k    | 700亿 | 32,768   | LoRA+ | [LoRA 权重](https:\u002F\u002Fhuggingface.co\u002FYukang\u002FLlama-2-70b-chat-longlora-32k) |\n\n## 训练\n### 预训练权重\n我们使用 LLaMA2 模型作为预训练权重，并对其进行微调以支持更长的上下文窗口。请根据您的需求下载相应的权重。\n\n| 预训练权重                                                        |\n|:---------------------------------------------------------------------------|\n| [Llama-2-7b-hf](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-7b-hf)           |\n| [Llama-2-13b-hf](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-13b-hf)         |\n| [Llama-2-70b-hf](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-70b-hf)         |\n| [Llama-2-7b-chat-hf](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-7b-chat-hf) |\n| [Llama-2-13b-chat-hf](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-13b-chat-hf)         |\n| 
[Llama-2-70b-chat-hf](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-70b-chat-hf)         |\n\n本项目也支持 GPTNeoX 模型作为基础架构。一些候选的预训练权重可能包括 [GPT-NeoX-20B](https:\u002F\u002Fhuggingface.co\u002FEleutherAI\u002Fgpt-neox-20b)、[Polyglot-ko-12.8B](https:\u002F\u002Fhuggingface.co\u002FEleutherAI\u002Fpolyglot-ko-12.8b) 等其他变体。\n\n### 微调\n```\ntorchrun --nproc_per_node=8 fine-tune.py  \\\n        --model_name_or_path path_to\u002FLlama-2-7b-hf \\\n        --bf16 True \\\n        --output_dir path_to_saving_checkpoints       \\\n        --cache_dir path_to_cache \\\n        --model_max_length 8192 \\\n        --use_flash_attn True \\\n        --low_rank_training False \\\n        --num_train_epochs 1  \\\n        --per_device_train_batch_size 1     \\\n        --per_device_eval_batch_size 2     \\\n        --gradient_accumulation_steps 8     \\\n        --evaluation_strategy \"no\"     \\\n        --save_strategy \"steps\"     \\\n        --save_steps 1000     \\\n        --save_total_limit 2     \\\n        --learning_rate 2e-5     \\\n        --weight_decay 0.0     \\\n        --warmup_steps 20     \\\n        --lr_scheduler_type \"constant_with_warmup\"     \\\n        --logging_steps 1     \\\n        --deepspeed \"ds_configs\u002Fstage2.json\" \\\n        --tf32 True \\\n        --max_steps 1000\n```\n\n- 请记得将 `path_to\u002FLlama-2-7b-hf`、`path_to_saving_checkpoints` 和 `path_to_cache` 替换为您的实际路径。\n- 您可以调整 `model_max_length` 的值。\n- 如果需要，您可以将 `ds_configs\u002Fstage2.json` 替换为 `ds_configs\u002Fstage3.json`。\n- 如果您使用的是 V100 显卡或未安装 Flash Attention，请将 `use_flash_attn` 设置为 `False`。\n- 如果您希望进行全参数微调，可以将 `low_rank_training` 设置为 `False`。虽然这会占用更多显存且训练速度较慢，但模型性能可能会更好。\n- 训练完成后，要获取完整的模型权重：\n```\ncd path_to_saving_checkpoints && python zero_to_fp32.py . pytorch_model.bin\n```\n请注意，`path_to_saving_checkpoints` 可能是某个 global_step 目录，具体取决于 DeepSpeed 的版本。\n\n### 有监督微调\n```\ntorchrun --nproc_per_node=8 supervised-fine-tune.py  \\\n        --model_name_or_path path_to_Llama2_chat_models \\\n        --bf16 True \\\n        --output_dir path_to_saving_checkpoints       \\\n        --model_max_length 16384 \\\n        --use_flash_attn True \\\n        --data_path LongAlpaca-16k-length.json \\\n        --low_rank_training True \\\n        --num_train_epochs 5  \\\n        --per_device_train_batch_size 1     \\\n        --per_device_eval_batch_size 2     \\\n        --gradient_accumulation_steps 8     \\\n        --evaluation_strategy \"no\"     \\\n        --save_strategy \"steps\"     \\\n        --save_steps 98     \\\n        --save_total_limit 2     \\\n        --learning_rate 2e-5     \\\n        --weight_decay 0.0     \\\n        --warmup_steps 20     \\\n        --lr_scheduler_type \"constant_with_warmup\"     \\\n        --logging_steps 1     \\\n        --deepspeed \"ds_configs\u002Fstage2.json\" \\\n        --tf32 True\n```\n- 对于已扩展上下文的微调模型，无需再进行有监督微调。可以直接使用 Llama2-chat 模型作为基础模型，因为长指令遵循数据已经足够用于 SFT。\n- 我们的长指令遵循数据可在 [LongAlpaca-12k.json](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYukang\u002FLongAlpaca-12k) 中找到。\n- 如果您想进一步减少显存占用，可以使用 `supervised-fine-tune-qlora.py` 来尝试 4 位量化微调。这基于 [QLoRA](https:\u002F\u002Fgithub.com\u002Fartidoro\u002Fqlora)。\n- 如果在 QLoRA SFT 后保存 `pytorch_model.bin` 时遇到问题，请参考此 [issue](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA\u002Fissues\u002F123)。\n\n### 获取低秩训练中的可训练权重\n在低秩训练中，我们设置嵌入层和归一化层为可训练。请使用以下命令从 `pytorch_model.bin` 中提取可训练权重 `trainable_params.bin`：\n```\npython3 get_trainable_weights.py --checkpoint_path path_to_saving_checkpoints 
--trainable_params \"embed,norm\"\n```\n\n### 合并 LoRA 权重\n将 `pytorch_model.bin` 和可训练参数 `trainable_params.bin` 的 LoRA 权重合并，并将结果模型以 Hugging Face 格式保存到您指定的路径：\n```\npython3 merge_lora_weights_and_save_hf_model.py \\\n        --base_model path_to\u002FLlama-2-7b-hf \\\n        --peft_model path_to_saving_checkpoints \\\n        --context_size 8192 \\\n        --save_path path_to_saving_merged_model\n```\n例如：\n```\npython3 merge_lora_weights_and_save_hf_model.py \\\n        --base_model \u002Fdataset\u002Fpretrained-models\u002FLlama-2-7b-hf \\\n        --peft_model \u002Fdataset\u002Fyukangchen\u002Fhf_models\u002Flora-models\u002FLlama-2-7b-longlora-8k \\\n        --context_size 8192 \\\n        --save_path \u002Fdataset\u002Fyukangchen\u002Fmodels\u002FLlama-2-7b-longlora-8k-merged\n```\n\n\n## 评估\n\n### 困惑度验证\n要在低秩设置下评估模型，请同时设置 `base_model` 和 `peft_model`。`base_model` 是预训练权重，而 `peft_model` 是保存检查点的路径，其中应包含 `trainable_params.bin`、`adapter_model.bin` 和 `adapter_config.json`。例如：\n```\npython3 eval.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to\u002FLlama-2-7b-hf --peft_model path_to_saving_checkpoints --data_path pg19\u002Ftest.bin\n```\n\n或者使用多 GPU 进行评估，如下所示：\n```\ntorchrun --nproc_per_node=auto eval_distributed.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to\u002FLlama-2-7b-hf --peft_model path_to_saving_checkpoints --data_path pg19\u002Ftest.bin\n```\n\n对于完全微调后的模型，只需将 `base_model` 设置为保存检查点的路径，该路径应包含 `pytorch_model.bin` 和 `config.json`。此时无需指定 `peft_model`。\n```\npython3 eval.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to_saving_checkpoints --data_path pg19\u002Ftest.bin\n```\n\n或者使用多 GPU 进行评估，如下所示：\n```\ntorchrun --nproc_per_node=auto eval_distributed.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to_saving_checkpoints --data_path pg19\u002Ftest.bin\n```\n\n- 请注意，`--seq_len` 用于设置评估时的序列长度，而 `--context_size` 则用于设置微调期间模型的上下文长度。`--seq_len` 不应大于 `--context_size`。\n\n- 我们已经使用 LLaMA 的分词器将 PG19 和 proof-pile 数据集的验证集和测试集分别分词为 `pg19\u002Fvalidation.bin`、`pg19\u002Ftest.bin` 和 `proof-pile\u002Ftest_sampled_data.bin`。其中，`proof-pile\u002Ftest_sampled_data.bin` 包含从 proof-pile 测试集中随机抽取的 128 篇文档，每篇文档至少有 32768 个标记。我们还发布了这些文档的索引文件 [proof-pile\u002Ftest_sampled_ids.bin](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1cnzWODLRQYAd7HeugzLCIhaqzaLZv7J5\u002Fview?usp=share_link)，您可以通过以下链接下载：\n\n| 数据集    | 划分      | 链接                                                                                                         |\n|:-----------|------------|--------------------------------------------------------------------------------------------------------------|\n| PG19       | 验证集    | [pg19\u002Fvalidation.bin](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1rbJvb0qRIf2mQoN2ON7S93TbTzMnlrN6\u002Fview?usp=share_link) |\n| PG19       | 测试集    | [pg19\u002Ftest.bin](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1QANDMdctpacPAYgS04adDXqByGEq-Ret\u002Fview?usp=share_link)       |\n| Proof-pile | 测试集    | [proof-pile\u002Ftest_sampled_data.bin](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1bUI5lPDvrqzY_XXJJ2sSuvZx0Y9AZClE\u002Fview?usp=share_link)         |\n\n### 密钥检索\n我们提供了一种测试密钥检索准确性的方法。例如：\n```\npython3 passkey_retrivial.py \\\n        --context_size 32768 \\\n        --base_model path_to\u002FLlama-2-7b-longlora-32k \\\n        --max_tokens 32768 \\\n        --interval 1000\n```\n\n- 请注意，`context_size` 是微调期间的上下文长度。\n- `max_tokens` 是密钥检索评估中文档的最大长度。\n- `interval` 
是文档长度递增的间隔，由于文档是以句子为单位增加的，因此这个数值只是一个大致的参考。\n\n## 演示\n### 本地推理\n要与 LongAlpaca 模型对话，可以运行：\n```\npython3 inference.py  \\\n        --base_model path_to_model \\\n        --question $question \\\n        --context_size $context_length \\\n        --max_gen_len $max_gen_len \\\n        --flash_attn True \\\n        --material $material_content\n```\n\n例如，针对一本书提出问题：\n```\npython3 inference.py  \\\n        --base_model \u002Fdata\u002Fmodels\u002FLongAlpaca-13B \\\n        --question \"为什么斯内普教授似乎不喜欢哈利？\" \\\n        --context_size 32768 \\\n        --max_gen_len 512 \\\n        --flash_attn True \\\n        --material \"materials\u002FHarry Potter and the Philosophers Stone_section2.txt\"\n```\n\n又如，针对一篇论文提出问题：\n```\npython3 inference.py  \\\n        --base_model \u002Fdata\u002Fmodels\u002FLongAlpaca-13B \\\n        --question \"这项工作的主要贡献和创新点是什么？\" \\\n        --context_size 32768 \\\n        --max_gen_len 512 \\\n        --flash_attn True \\\n        --material \"materials\u002Fpaper1.txt\"\n```\n\n- 请注意，如果您希望进一步减少 GPU 内存占用并尝试 4 位量化推理，可以将 `inference.py` 替换为 `inference-qlora.py`。这遵循 [QLoRA](https:\u002F\u002Fgithub.com\u002Fartidoro\u002Fqlora) 的方法。\n\n### 在线演示\n要部署您自己的演示程序，可以运行：\n```\npython3 demo.py  \\\n\t--base_model path_to_model \\\n\t--context_size $context_size \\\n\t--max_gen_len $max_gen_len \\\n\t--flash_attn True\n```\n\n例如：\n```\npython3 demo.py  \\\n\t--base_model \u002Fdata\u002Fmodels\u002FLongAlpaca-13B \\\n\t--context_size 32768 \\\n\t--max_gen_len 512 \\\n\t--flash_attn True\n```\n\n- 请注意，将 `flash_attn` 设置为 `True` 会降低生成速度，但能显著节省 GPU 内存。\n\n## 流式推理\n我们支持使用 [StreamingLLM](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fstreaming-llm) 对 LongAlpaca 模型进行流式推理。这能够增加 StreamingLLM 中多轮对话的上下文长度。以下是一个示例：\n```\npython run_streaming_llama_longalpaca.py \\\n\t--enable_streaming \\\n\t--test_filepath outputs_stream.json \\\n\t--use_flash_attn True \\\n\t--recent_size 32768\n```\n\n- 如果遇到 OOM 问题，请适当减小 `recent_size`，例如设置为 8192。\n- `test_filepath` 是包含推理提示的 JSON 文件。我们提供了一个示例文件 [outputs_stream.json](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F13WGepnamWR8FKQS2UceyhNgV1ALHNx3w\u002Fview?usp=share_link)，它是 LongAlpaca-12k 的一个子集。您可以将其替换为您自己的问题。\n\n## 通过 Pdf2text 生成数据\n在收集数据的过程中，我们将论文和书籍从 PDF 转换为文本。转换质量对最终模型的质量有着重要影响。我们认为这一步骤并不简单。为此，我们发布了一个用于 PDF 到文本转换的工具，位于 `pdf2txt` 文件夹中。该工具基于 `pdf2image`、`easyocr`、`ditod` 和 `detectron2` 构建。更多详细信息请参阅 `pdf2txt` 文件夹中的 [README.md](pdf2txt\u002FREADME.md)。\n\n## 示例\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_6d3b7532553a.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_22ff1b1f46d9.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_644c80dc0706.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_3d83f3a07064.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_7a1669111742.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_9447e1bec944.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> 
\n\n## Examples\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_6d3b7532553a.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_22ff1b1f46d9.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_644c80dc0706.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_3d83f3a07064.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_7a1669111742.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_9447e1bec944.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_3e42232f28f7.png\" width=\"100%\"> \u003C\u002Fp>\n\u003Cp align=\"center\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_readme_7c0c2368f8f5.png\" width=\"100%\"> \u003C\u002Fp>\n\n## Citation\nIf you find this project useful in your research, please consider citing:\n\n```\n@inproceedings{longlora,\n  author       = {Yukang Chen and Shengju Qian and Haotian Tang and Xin Lai and Zhijian Liu and Song Han and Jiaya Jia},\n  title        = {LongLoRA: Efficient Fine-tuning of Long-context Large Language Models},\n  booktitle    = {The International Conference on Learning Representations (ICLR)},\n  year         = {2024},\n}\n```\n\n\n```\n@misc{long-alpaca,\n  author = {Yukang Chen and Shaozuo Yu and Shengju Qian and Haotian Tang and Xin Lai and Zhijian Liu and Song Han and Jiaya Jia},\n  title = {Long Alpaca: Long-context Instruction-following models},\n  year = {2023},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA}},\n}\n```\n## Acknowledgements\n- This work is built upon [LLaMA2](https:\u002F\u002Fai.meta.com\u002Fllama) as the pre-trained model.\n- It can also be built upon [GPTNeoX-HF](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fgpt_neox), whose architecture comes from the [EleutherAI\u002FGPTNeoX](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Fgpt-neox) pre-trained models.\n- This work uses [DeepSpeed](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDeepSpeed), [peft](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpeft), and [Flash-Attention2](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention) for acceleration.\n- Some evaluation code is referenced and modified from [Landmark Attention](https:\u002F\u002Fgithub.com\u002Fepfml\u002Flandmark-attention).\n- We use [LongChat](https:\u002F\u002Fgithub.com\u002FDachengLi1\u002FLongChat) for the retrieval evaluation.\n- The streaming inference builds on [StreamingLLM](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fstreaming-llm).\n- For supervised fine-tuning, we combine [QLoRA](https:\u002F\u002Fgithub.com\u002Fartidoro\u002Fqlora) with LongLoRA.\n\n\n## License\n- LongLoRA is licensed under the Apache License 2.0, which requires that copyright and license notices be preserved.\n- The data and weights are licensed under CC-BY-NC 4.0. They are permitted for research use only and may not be used commercially. Models trained with this dataset must likewise not be used outside of research.","# LongLoRA Quick Start Guide\n\nLongLoRA is an efficient fine-tuning method for long-context large language models (LLMs). It can extend the context window of models such as LLaMA2 up to 100k tokens while preserving inference speed. This guide helps developers in China set up the environment and get started quickly.\n\n## Environment Setup\n\nBefore you begin, make sure the following system and dependency requirements are met:\n\n*   **Operating system**: Linux (Ubuntu 20.04+ recommended)\n*   **Python**: 3.8 or later\n*   **GPU**: a CUDA-capable NVIDIA GPU (memory requirements depend on model size; at least 16GB is recommended for 7B models, while 70B requires multiple GPUs or a quantization scheme)\n*   **Accounts**:\n    *   Register a [Hugging Face](https:\u002F\u002Fhuggingface.co\u002F) account.\n    *   Accept Meta's [LLaMA model license](https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fllama-downloads\u002F) (required if you use a LLaMA2 base model).\n*   **Network**: the models and data are hosted on Hugging Face, so users in mainland China are advised to configure a proxy or a mirror to speed up downloads.\n\n## Installation\n\n1.  **Clone the repository**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FLongLoRA.git\n    cd LongLoRA\n    ```\n\n2.  **Install the Python dependencies**\n    Configuring a domestic pip mirror first can speed up installation:\n    ```bash\n    pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n    ```\n\n3.  **Install Flash-Attention**\n    LongLoRA relies on Flash-Attention for best performance. Install it with build isolation disabled, so the build can see your already-installed `torch`:\n    ```bash\n    pip install flash-attn --no-build-isolation\n    ```\n    *Note: if compilation fails, make sure a matching CUDA Toolkit and a C++ build environment are installed.*
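\n\nBefore loading any models, a quick sanity check along these lines (a throwaway sketch, not a repository script) can confirm that CUDA and flash-attn are usable:\n```python\nimport torch\n\nassert torch.cuda.is_available(), \"No CUDA device found\"\nprint(\"GPU:\", torch.cuda.get_device_name(0))\n\ntry:\n    import flash_attn  # only checking that the compiled package imports\n    print(\"flash-attn version:\", flash_attn.__version__)\nexcept ImportError as exc:\n    print(\"flash-attn is not installed correctly:\", exc)\n```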
\n\n## Basic Usage\n\n### 1. Load a Pre-trained Model\nYou can load an already fine-tuned LongAlpaca model (for example the 7B version) directly from Hugging Face. Here is the simplest inference example:\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\n\n# Model path (swap in another version if needed, e.g. Yukang\u002FLongAlpaca-13B)\nmodel_name = \"Yukang\u002FLongAlpaca-7B\"\n\n# Load the tokenizer and the model\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_name,\n    torch_dtype=torch.float16,\n    device_map=\"auto\"\n)\n\n# Build the input (following the LLaMA2 chat format)\nprompt = \"\"\"[INST] \u003C\u003CSYS>>\nYou are a helpful assistant.\n\u003C\u003C\u002FSYS>>\n\nSummarize the following long text briefly:\n(place your long text here...)\n[\u002FINST]\"\"\"\n\n# Generate a reply\ninputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\noutputs = model.generate(\n    **inputs,\n    max_new_tokens=512,\n    do_sample=True,\n    temperature=0.7,\n    top_p=0.9\n)\n\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n```\n\n### 2. Inference with LoRA Weights\nIf you are using a model that ships only LoRA weights (such as `Llama-2-7b-longlora-32k`), load them on top of the base model:\n\n```python\nimport torch\nfrom peft import PeftModel\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\n\nbase_model_name = \"meta-llama\u002FLlama-2-7b-hf\"  # requires accepting the license and logging in\nlora_weights_path = \"Yukang\u002FLlama-2-7b-longlora-32k\"\n\ntokenizer = AutoTokenizer.from_pretrained(base_model_name)\nbase_model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.float16, device_map=\"auto\")\n\n# Attach the LoRA adapter\nmodel = PeftModel.from_pretrained(base_model, lora_weights_path)\n\n# The remaining inference steps are the same as above...\n```
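\n\nFor deployment you can also fold the adapter into the base weights once, instead of keeping the `PeftModel` wrapper around. A minimal sketch using peft's `merge_and_unload` follows; note that the repository's `merge_lora_weights_and_save_hf_model.py` script plays this role in full, additionally restoring the trainable embedding and norm parameters from `trainable_params.bin`, which a plain merge like this does not cover:\n```python\n# Fold the LoRA weights into the base model and save a standalone checkpoint.\nmerged = model.merge_and_unload()\nmerged.save_pretrained(\"path_to_saving_merged_model\")\ntokenizer.save_pretrained(\"path_to_saving_merged_model\")\n```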
\n\n### 3. Streaming Inference\nLongLoRA supports StreamingLLM-style streaming inference to handle very long multi-round dialogue while limiting GPU memory use. As described in the README above, this goes through the dedicated streaming script rather than `inference.py`:\n\n```bash\npython run_streaming_llama_longalpaca.py --enable_streaming --test_filepath outputs_stream.json --use_flash_attn True --recent_size 32768\n```\n\n> **Tip**: for long-text tasks, make sure your input data follows the LongAlpaca format (instruction-output pairs) so you can take full advantage of the 32k+ context window.","A legal-tech team needs to build an assistant that can read and analyze contracts and judgments running to tens of thousands of words, helping lawyers quickly extract key clauses and risk points.\n\n### Without LongLoRA\n- **Severe context truncation**: conventional LLMs are limited by their native context window (e.g. 4k or 8k) and must truncate long legal documents, missing key disclaimer clauses or grounds for judgment near the end of a document.\n- **Prohibitive training cost**: extending a model's context via full fine-tuning makes memory usage grow quadratically with sequence length, beyond what ordinary GPUs can hold, forcing teams onto expensive multi-GPU clusters and raising the barrier to entry.\n- **Slow inference**: attention over long sequences is computationally heavy, so responses arrive with high latency and cannot support lawyers querying in real time during meetings.\n- **Catastrophic forgetting**: forcing a model to accommodate long inputs noticeably degrades its general instruction following and short-text reasoning.\n\n### With LongLoRA\n- **Native long-context support**: shifted short attention lets the model handle documents of 32k tokens and beyond without architectural changes, so it can follow the logic of an entire contract.\n- **Efficient, low-memory fine-tuning**: parameter-efficient fine-tuning injects long-text capability on a single consumer GPU with modest memory, sharply reducing the compute budget.\n- **Fast inference preserved**: the optimized attention computation significantly reduces long-sequence overhead, producing accurate summaries of hundred-page documents quickly.\n- **Capabilities retained**: the base model's general dialogue and reasoning abilities are preserved while the context is extended.\n\nLongLoRA breaks the context-length bottleneck of large models at very low compute cost, making document-heavy intelligent applications practical to run locally.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJIA-Lab-research_LongLoRA_29260225.png","JIA-Lab-research","JIA Lab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FJIA-Lab-research_a2efb296.png",null,"https:\u002F\u002Fgithub.com\u002FJIA-Lab-research",[78],{"name":79,"color":80,"percentage":81},"Python","#3572A5",100,2694,286,"2026-04-14T02:09:25","Apache-2.0","Not specified","Requires an NVIDIA GPU (due to the flash-attn dependency); exact memory needs depend on model size (7B\u002F13B\u002F70B) and context length (up to 100k). QLoRA can be combined with it to reduce memory cost.",{"notes":89,"python":86,"dependencies":90},"1. flash-attn must be installed with build isolation disabled (--no-build-isolation). 2. Using the pre-trained weights requires a Hugging Face account and acceptance of Meta's license agreement. 3. Supports the GPTNeoX and LLaMA2 architectures. 4. Provides StreamingLLM inference support to extend multi-round dialogue context. 5. For some large models (e.g. 70B), only LoRA weights are provided rather than fully fine-tuned weights.",[91,92,93,94,95],"flash-attn","transformers","torch","accelerate","bitsandbytes",[35,14],[98,99,100,101,102],"fine-tuning-llm","large-language-models","long-context","llm","lora","2026-03-27T02:49:30.150509","2026-04-18T09:19:14.209890",[106,111,116,121,126,131],{"id":107,"question_zh":108,"answer_zh":109,"source_url":110},38851,"Does the project support fine-tuning on multi-round conversation data?","The project currently uses the Llama-2 prompt format for SFT. Users have suggested leveraging the `chat_templating` feature of the Hugging Face Transformers library to better support multi-round conversation (the Llama-2-chat tokenizer already ships the relevant template), but the formatting logic has at times been commented out in the code. If you need multi-round conversation support, check the latest code to confirm whether template formatting is enabled, or implement the data formatting yourself following the community PRs and the Hugging Face chat-template documentation.","https:\u002F\u002Fgithub.com\u002FJIA-Lab-research\u002FLongLoRA\u002Fissues\u002F107",{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},38847,"How do I fix a CUDA out-of-memory (OOM) error during supervised fine-tuning (SFT) of Llama-2-7b on 8x40GB A100 GPUs?","If you still run out of memory with per_device_train_batch_size=1, low_rank_training=True, and use_flash_attn=True, try quantization. Users report that QLoRA training via the `supervised-fine-tune-qlora.py` script runs successfully, although training takes longer. Also make sure Flash Attention is actually in use.","https:\u002F\u002Fgithub.com\u002FJIA-Lab-research\u002FLongLoRA\u002Fissues\u002F91",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},38848,"Dataset preprocessing takes so long that it trips the 30-minute timeout. How do I fix this?","This usually happens because multiprocessing starts workers with the default 'fork' method, which copies lock state and triggers NCCL backend timeouts. The fix is to switch the pool's start method to 'forkserver':\n```python\nimport multiprocessing\nctx = multiprocessing.get_context(method='forkserver')\nwith ctx.Pool(processes=30) as pool:\n    pass  # rest of the preprocessing code goes here\n```\nThis avoids the timeout caused by the locks inherited on fork.","https:\u002F\u002Fgithub.com\u002FJIA-Lab-research\u002FLongLoRA\u002Fissues\u002F45",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},38849,"Is the RedPajama dataset required to obtain long-context ability, or can I fine-tune directly on long-text instruction data?","RedPajama is not strictly required. The maintainers state that skipping RedPajama and instruction-tuning directly on a long-text QA dataset (such as LongQA) can still give the model long-context conversational ability. You can try skipping the RedPajama pre-training stage and going straight to instruction fine-tuning.","https:\u002F\u002Fgithub.com\u002FJIA-Lab-research\u002FLongLoRA\u002Fissues\u002F41",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},38850,"On GPUs without Flash Attention support (e.g. V100), SFT fails with a `q_len % group_size > 0` divisibility error. What should I do?","Without Flash Attention, the sequence length (q_len) may not be divisible by group_size, which causes this error. The suggested fix is to pad the data so its length is divisible by group_size. The stock code uses `padding=\"longest\"` (pad to the longest sample in the batch), so variable-length sequences may still fail to divide evenly. On older cards with hard memory limits, you may need to pad to a fixed max_length, or modify the logic to handle the non-divisible case (for example, adjusting group_size dynamically to the current length). The maintainers strongly recommend using an environment with Flash Attention support whenever possible to avoid this trouble.","https:\u002F\u002Fgithub.com\u002FJIA-Lab-research\u002FLongLoRA\u002Fissues\u002F65",{"id":132,"question_zh":133,"answer_zh":134,"source_url":125},38852,"How should I build a dataset so that LongLoRA extends the context length while preserving Llama-2-chat's original short-context ability?","To keep the original abilities while extending the context, build the dataset in the same format as Llama-2-chat. The maintainers mention that they followed the Alpaca data format and selected training samples with instructions of varying lengths. The key point is that the data format should stay compatible with the model's pre-training or original chat format, so the model inherits its existing conversational ability while learning to handle longer sequences through the LongLoRA mechanism.",[]]