[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-PhoebusSi--Alpaca-CoT":3,"tool-PhoebusSi--Alpaca-CoT":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 
50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":78,"owner_twitter":78,"owner_website":81,"owner_url":82,"languages":83,"stars":107,"forks":108,"last_commit_at":109,"license":110,"difficulty_score":111,"env_os":112,"env_gpu":113,"env_ram":114,"env_deps":115,"category_tags":127,"github_topics":128,"view_count":111,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":144,"updated_at":145,"faqs":146,"releases":177},406,"PhoebusSi\u002FAlpaca-CoT","Alpaca-CoT","We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., lora, p-tuning) together for easy use. We welcome open-source enthusiasts to initiate any meaningful PR on this repo and integrate as many LLM related technologies as possible. 
[**中文**](./CN_README.md) | [**English**](./README.md)

<div id="top"></div>

![Alpaca-CoT](https://oss.gittoolsai.com/images/PhoebusSi_Alpaca-CoT_readme_f3fc654840d1.jpg)
# Alpaca-CoT: An Instruction-Tuning Platform with Unified Interface for Instruction Collection, Parameter-efficient Methods, and Large Language Models

> We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-tuning) for easy use. We welcome open-source enthusiasts to open any meaningful PR on this repo and to integrate as many LLM-related technologies as possible.

[![LICENSE](https://img.shields.io/github/license/PhoebusSi/Alpaca-CoT)](https://github.com/PhoebusSi/Alpaca-CoT/blob/main/LICENSE.txt)
[![torch](https://img.shields.io/badge/pytorch-%3E=1.13-red?logo=pytorch)](https://pytorch.org/)
[![data](https://img.shields.io/badge/huggingface-dataset-yellow)](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT)
[![model](https://img.shields.io/badge/huggingface-model-yellow)](https://huggingface.co/QingyiSi/Alpaca-CoT)
[![wandb](https://img.shields.io/badge/wandb-tools-orange?logo=WeightsAndBiases)](https://wandb.ai)
[![colab](https://img.shields.io/badge/Google-Colab-blue?logo=Google%20Colab)](https://colab.research.google.com/drive/1wfrKqyPkz5BGD1Gkij_cvbUeweIDdRav?usp=sharing)

This is the repository for the `Alpaca-CoT` project, which aims to build an instruction fine-tuning (IFT) platform with an extensive instruction collection (especially the CoT datasets) and a unified interface for various large language models and parameter-efficient methods. We are constantly expanding our [instruction-tuning data collection](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/) and integrating more LLMs and more parameter-efficient methods. In addition, we created a new branch, [`tabular_llm`](https://github.com/PhoebusSi/Alpaca-CoT/tree/tabular_llm), to build a Tabular LLM for solving table-intelligence tasks.

You are warmly welcome to provide us with any non-collected instruction-tuning datasets (or their sources).
We will uniformly format them, train the Alpaca model (and other LLMs in the near future) with these datasets, open-source the [model checkpoints](https://huggingface.co/QingyiSi/Alpaca-CoT/tree/main), and conduct extensive empirical studies. We hope that our project can make a modest contribution to the open-sourcing of large language models and lower the barrier for NLP researchers to get started.

<img src="https://oss.gittoolsai.com/images/PhoebusSi_Alpaca-CoT_readme_473b6eb4f8ce.jpg" width="100" height="100" align="right" />
You can also choose to join our group chat (WeChat) and communicate with more people with the same interests. At present, the number of group members is too large to join directly through the group QR code, so you need to contact me first to get into the group.

## News
- ⚠ If you want to use methods other than LoRA, please install the edited version of PEFT shipped in this project: `pip install -e ./peft`.
- 🚀12.8: LLM `InternLM` was merged.
- 🚀8.16: `4-bit quantization` is available for `lora`, `qlora` and `adalora`.
- 🚀8.16: Parameter-efficient methods `QLoRA`, `Sequential adapter` and `Parallel adapter` were merged.
- 🚀7.24: LLM `ChatGLM v2` was merged.
- 🚀7.20: LLM `Baichuan` was merged.
- 6.25: Added model evaluation code, including Belle-eval and MMCU.
<details><summary> - more </summary>
<p>

- 5.20: Fixed bugs in model saving and added wandb support.
- 5.15: More datasets, such as `GPT4Tools`, `Auto CoT` and `pCLUE`, were added.
- 🚀5.5: A new branch [`tabular_llm`](https://github.com/PhoebusSi/Alpaca-CoT/tree/tabular_llm) was created to build a Tabular LLM. We collect instruction fine-tuning data for table-related tasks like table question answering and use it to fine-tune LLMs in this repo.
- 🚀5.4: All parameter-efficient methods in PEFT (e.g., P-tuning) were merged and can be selected directly via a hyperparameter.
- 🚀5.4: LLM `MOSS` was merged.
- 4.21: Datasets `GAOKAO`, `camel`, `FLAN-Muffin` and `COIG` were collected and formatted.
- 4.15: Datasets `webGPT`, `dolly`, `baize`, `hh-rlhf` and `OIG(part)` were collected and formatted.
- 4.12: Now you can try Alpaca-CoT on <a href="https://colab.research.google.com/drive/1wfrKqyPkz5BGD1Gkij_cvbUeweIDdRav?usp=sharing">Google Colab</a>.
- 4.11: Added a `multi-turn conversation` function by [@paulcx](https://github.com/paulcx).
- 4.9: Datasets `firefly`, `instruct` and `Code Alpaca` were collected and formatted; they can be found [here](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main).
- 4.7: Added `Parameter merging`, `Local chatting`, `Batch predicting` and `Web service building` functions by [@weberrr](https://github.com/weberrr).
- 4.4: Datasets `GPTeacher`, `Guanaco`, `HC3`, `prosocial-dialog`, `belle-chat&belle-math`, `xP3` and `natural-instructions` were collected and formatted.
- 4.3: The Chinese CoT dataset `CoT_CN_data.json` can be found [here](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main).

</p>
</details>

## Overview

![img](https://oss.gittoolsai.com/images/PhoebusSi_Alpaca-CoT_readme_1669fbcd13a2.png)

[LLaMA](https://arxiv.org/abs/2302.13971v1) [1] is a great work that demonstrates amazing zero-shot and few-shot abilities. It significantly reduces the cost of training, fine-tuning, and using competitive large language models: LLaMA-13B outperforms GPT-3 (175B), and LLaMA-65B is competitive with PaLM-540B. Recently, to boost the instruction-following ability of LLaMA, [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) [2] fine-tuned LLaMA-7B on 52K instruction-following examples generated with the [Self-Instruct](https://arxiv.org/abs/2212.10560) [3] technique. However, the LLM research community still faces three challenges: 1. even LLaMA-7B has high computing-resource requirements; 2. there are few open-source datasets for instruction fine-tuning; and 3. there is a lack of empirical study on the impact of various types of instructions on model abilities, such as the ability to respond to Chinese instructions and CoT reasoning.

To this end, we propose this project, which leverages various improvements that were subsequently proposed, with the following advantages:
- 1. This repo contains code, modified from [here](https://github.com/tloen/alpaca-lora) and [here](https://github.com/huggingface/peft), that can **_finetune LLaMA cheaply and efficiently_** (without performance degradation compared to Stanford Alpaca) by using [low-rank adaptation (LoRA)](https://arxiv.org/pdf/2106.09685.pdf) [4], [PEFT](https://github.com/huggingface/peft) and [bitsandbytes](https://github.com/TimDettmers/bitsandbytes). The `7b`, `13b` and `30b` versions of the LLaMA models can easily be trained on a single 80G A100.
- 2. The models published in this repo significantly **_improve CoT (reasoning) capability_**.
- 3. The models published in this repo significantly **_improve the ability to follow Chinese instructions_**.
- 4. This repo contains **_a [collection of instruction-finetuning datasets](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT) that is continuously growing_**, which so far includes English, Chinese and CoT instructions. A collection of checkpoints trained with various instruction datasets is also provided.
- 5. This repo **_integrates multiple LLMs and unifies their interfaces_**, so they can be switched easily via a hyperparameter. Currently it includes **LLaMA**, **ChatGLM** [5], **Bloom** [6] and **MOSS**, and more will be added in the future so that researchers can easily invoke and compare different LLMs.
- 6. This repo **_integrates multiple parameter-efficient methods and unifies their interfaces_**, so they can be switched easily via a hyperparameter. Currently it includes **LoRA**, **P-tuning** [5], **AdaLoRA** and **prefix tuning**, and more will be added in the future so that researchers can easily invoke and compare different parameter-efficient methods (a minimal sketch of this switching idea appears after this list).
- 7. This repo contains **_extensive empirical studies and qualitative analysis_**, which may provide valuable findings and promote future exploration of LLMs.

**To the best of our knowledge, this work is the first to study _CoT reasoning_ based on LLaMA and Alpaca.** Therefore, we abbreviate our work to `Alpaca-CoT`.
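As a toy illustration of the unified switching mentioned in points 5 and 6, a dispatch table is one natural way such a `--model_type` switch can be implemented. This is a minimal sketch, not the repository's actual code; the target-module names below are taken from the example commands later in this README and should be treated as assumptions.

```python
# Minimal sketch of a unified model/PEFT switch (assumed; not the repo's actual code).
# Target-module names mirror the --lora_target_modules values in the commands below.
TARGET_MODULES = {
    "llama": ["q_proj", "v_proj"],    # LLaMA-style separate attention projections
    "chatglm": ["query_key_value"],   # fused QKV projection
    "bloom": ["query_key_value"],     # fused QKV projection
}

def lora_targets(model_type: str) -> list[str]:
    """Return the LoRA target modules for a given --model_type."""
    try:
        return TARGET_MODULES[model_type]
    except KeyError:
        raise ValueError(f"unsupported model_type: {model_type!r}") from None

print(lora_targets("llama"))  # ['q_proj', 'v_proj']
```

The point of the design is that adding a new LLM or PEFT method only extends a table like this, rather than forking the training script.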
## Data Collection

The relative sizes of the collected datasets are shown in this graph:

![img](https://oss.gittoolsai.com/images/PhoebusSi_Alpaca-CoT_readme_606571b6cd76.png)

Referring to [this repository](https://github.com/yaodongC/awesome-instruction-dataset) ([@yaodongC](https://github.com/yaodongC)), we labeled each collected dataset according to the following rules:

(Lang) Lingual tags:
- EN: Instruction datasets in English
- CN: Instruction datasets in Chinese
- ML: [Multi-lingual] Instruction datasets in multiple languages

(Task) Task tags:
- MT: [Multi-task] Datasets containing multiple tasks
- TS: [Task-specific] Datasets tailored for specific tasks

(Gen) Generation method:
- HG: [Human Generated] Datasets created by humans
- SI: [Self-Instruct] Datasets generated with self-instruct methods
- MIX: [Mixed] Datasets containing both human- and machine-generated data
- COL: [Collection] Datasets made from a collection of other datasets

### Statistics

| Dataset | Samples | Lang | Task | Gen | Type | Source | URL |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| [Chain of Thought](https://github.com/google-research/FLAN) | 74771 | EN/CN | MT | HG | instruct with CoT reasoning | annotating CoT on existing data | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/Chain-of-Thought) |
| [GPT4all](https://github.com/nomic-ai/gpt4all) | 806199 | EN | MT | COL | code, stories and dialogs | distillation from GPT-3.5-turbo | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/GPT4all) |
| [GPTeacher](https://github.com/teknium1/GPTeacher) | 29013 | EN | MT | SI | general, roleplay, toolformer | GPT-4 & toolformer | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/GPTeacher) |
| [Guanaco](https://huggingface.co/datasets/JosephusCheung/GuanacoDataset) | 534610 | ML | MT | SI | various linguistic tasks | text-davinci-003 | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/Guanaco) |
| [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3) | 37175 | EN/CN | TS | MIX | dialogue evaluation | human or ChatGPT | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/HC3) |
| [alpaca](https://github.com/tatsu-lab/stanford_alpaca) | 52002 | EN | MT | SI | general instruct | text-davinci-003 | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/alpaca) |
| [Natural Instructions](https://github.com/allenai/natural-instructions) | 5040134 | ML | MT | COL | diverse nlp tasks | human annotated datasets collection | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/Natural-Instructions) |
| [belle_cn](https://huggingface.co/BelleGroup) | 1079517 | CN | TS/MT | SI | general, mathematical reasoning, dialogue | text-davinci-003 | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/belle_cn) |
| [instinwild](https://github.com/XueFuzhao/InstructionWild) | 52191 | EN/CN | MT | SI | generation, open-qa, mind-storm | text-davinci-003 | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/instinwild) |
| [prosocial dialog](https://huggingface.co/datasets/allenai/prosocial-dialog) | 165681 | EN | TS | MIX | dialogue | GPT-3 rewrites questions + manual human feedback | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/prosocial-dialog) |
| [finance_en](https://huggingface.co/datasets/gbharti/finance-alpaca) | 68912 | EN | TS | COL | financial related qa | GPT-3.5 | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/) |
| [xP3](https://huggingface.co/datasets/bigscience/xP3) | 78883588 | ML | MT | COL | a collection of prompts & datasets across 46 languages & 16 NLP tasks | human annotated datasets collection | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/xP3) |
| [firefly](https://github.com/yangjianxin1/Firefly) | 1649398 | CN | MT | COL | 23 nlp tasks | human annotated datasets collection | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/firefly) |
| [instruct](https://huggingface.co/datasets/swype/instruct) | 888969 | EN | MT | COL | augmentation of GPT4All, Alpaca and open-source Meta datasets | augmentation performed with the NLP tools provided by AllenAI | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/instruct) |
| [Code Alpaca](https://github.com/sahil280114/codealpaca) | 20022 | EN | TS | SI | code generation, editing, optimization | text-davinci-003 | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/CodeAlpaca) |
| [Alpaca_GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM) | 52002 | EN/CN | MT | SI | general instruct | generated by GPT-4 using Alpaca | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/alpacaGPT4) |
| [webGPT](https://huggingface.co/datasets/openai/webgpt_comparisons) | 18994 | EN | TS | MIX | information retrieval (IR) QA | fine-tuned GPT-3; each instruction has two outputs and the better one is selected | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/webGPT) |
| [dolly 2.0](https://github.com/databrickslabs/dolly) | 15015 | EN | TS | HG | closed QA, summarization, etc., with Wikipedia as references | human annotated | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/dolly) |
| [baize](https://github.com/project-baize/baize-chatbot) | 653699 | EN | MT | COL | a collection of Alpaca, Quora, StackOverflow and MedQuAD questions | human annotated datasets collection | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/baize) |
| [hh-rlhf](https://github.com/anthropics/hh-rlhf) | 284517 | EN | TS | MIX | dialogue | dialog between human and RLHF models | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/hh-rlhf) |
| [OIG(part)](https://laion.ai/blog/oig-dataset/) | 49237 | EN | MT | COL | created from various tasks, such as question answering | data augmentation plus human annotated datasets collection | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/OIG) |
| [GAOKAO](https://github.com/OpenLMLab/GAOKAO-Bench) | 2785 | CN | MT | COL | multiple-choice, fill-in-the-blank and open-ended examination questions | human annotated | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/GAOKAO) |
| [camel](https://github.com/lightaime/camel) | 760620 | EN | MT | SI | role-playing conversations in AI society, code, math, physics, chemistry, biology | gpt-3.5-turbo | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/camel) |
| [FLAN-Muffin](https://huggingface.co/datasets/Muennighoff/flan) | 1764800 | EN | MT | COL | 60 nlp tasks | human annotated datasets collection | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/FLAN-Muffin) |
| [COIG(FlagInstruct)](https://huggingface.co/datasets/BAAI/COIG) | 298428 | CN | MT | COL | collected from exam, translated, human-value-alignment and counterfactual-correction multi-round chat instructions | automatic tools plus manual verification | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/COIG) |
| [GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 71446 | EN | MT | SI | a collection of tool-related instructions | gpt-3.5-turbo | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/gpt4tools) |
| [ShareChat](https://huggingface.co/datasets/RyokoAI/ShareGPT52K) | 1663241 | EN | MT | MIX | general instruct | crowdsourced conversations between people and ChatGPT (ShareGPT) | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/ShareGPT) |
| [Auto CoT](https://github.com/amazon-science/auto-cot) | 5816 | EN | MT | COL | arithmetic, commonsense, symbolic and other logical reasoning tasks | human annotated datasets collection | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/Auto-CoT) |
| [MOSS](https://github.com/OpenLMLab/MOSS) | 1583595 | EN/CN | TS | SI | general instruct | text-davinci-003 | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/MOSS) |
| [ultrachat](https://github.com/thunlp/UltraChat) | 28247446 | EN | | | questions about the world, writing and creation, assistance on existing materials | two separate gpt-3.5-turbo | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/ultrachat) |
| [Chinese-medical](https://github.com/Toyhom/Chinese-medical-dialogue-data) | 792099 | CN | TS | COL | questions about medical advice | crawl | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/Chinese-medical) |
| [CSL](https://github.com/ydli-ai/csl) | 396206 | CN | MT | COL | paper text generation, keyword extraction, text summarization and text classification | crawl | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/CSL) |
| [pCLUE](https://github.com/CLUEbenchmark/pCLUE) | 1200705 | CN | MT | COL | general instruct | | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/pCLUE) |
| [news_commentary](https://huggingface.co/datasets/P01son/instructions) | 252776 | CN | TS | COL | translation | | [download](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/news_commentary) |
| [StackLLaMA](https://huggingface.co/datasets/lvwerra/stack-exchange-paired) | todo | EN | | | | | |

### Download
You can download all the formatted data [here](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main). Then you should put it in the [data](https://github.com/PhoebusSi/alpaca-CoT/tree/main/data) folder.

You can download all checkpoints trained on the various types of instruction data from [here](https://huggingface.co/QingyiSi/Alpaca-CoT/tree/main). Then, after setting `LoRA_WEIGHTS` (in `generate.py`) to the local path, you can run model inference directly.

### Data Formatting
All data in our collection is formatted into the same template, where each sample is as follows:
```
[
{"instruction": instruction string,
"input": input string, # (may be empty)
"output": output string}
]
```
Note that for CoT datasets, we first use the [template](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py) provided by FLAN to turn the original dataset into various chain-of-thought forms, and then convert it to the above format. The formatting script can be found [here](https://github.com/PhoebusSi/alpaca-CoT/blob/main/data/origin_cot_data/formating.py).
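To make the template concrete, here is a minimal sketch (not part of the repository) of loading one of the formatted JSON files and rendering a sample into an Alpaca-style prompt. The prompt wording follows the widely used Stanford Alpaca template and is an assumption, not necessarily the exact string used by `uniform_finetune.py`.

```python
import json

# Hypothetical path: any formatted file from the collection placed in ./data.
with open("data/alpaca_data_cleaned.json", encoding="utf-8") as f:
    samples = json.load(f)  # a list of {"instruction", "input", "output"} dicts

def render_prompt(sample: dict) -> str:
    """Render one formatted sample into an Alpaca-style training prompt (assumed wording)."""
    if sample.get("input"):
        return (f"Below is an instruction that describes a task, paired with an input.\n\n"
                f"### Instruction:\n{sample['instruction']}\n\n"
                f"### Input:\n{sample['input']}\n\n### Response:\n{sample['output']}")
    return (f"Below is an instruction that describes a task.\n\n"
            f"### Instruction:\n{sample['instruction']}\n\n### Response:\n{sample['output']}")

print(render_prompt(samples[0]))
```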
## Multi-interface Unified Platform
### Setup
```
pip install -r requirements.txt
```
Note: make sure Python >= 3.9 when fine-tuning ChatGLM.

**PEFT**
* If you want to use methods other than LoRA, please install the edited version shipped in this project:
```
pip install -e ./peft
```

### Instruction Finetuning
To let researchers conduct systematic IFT research on LLMs, we have collected different types of instruction data, integrated multiple LLMs, and unified the interfaces, making it easy to customize the desired combination:
- `--model_type`: Set the LLM you want to use. Currently, [llama, chatglm, bloom, moss] are supported. The latter two have strong Chinese capabilities, and more LLMs will be integrated in the future.
- `--peft_type`: Set the PEFT method you want to use. Currently, [lora, adalora, prefix tuning, p tuning, prompt] are supported.
- `--data`: Set the data used for IFT to flexibly tailor the desired instruction-following ability. For example, for strong reasoning ability, set "alpaca-cot"; for strong Chinese ability, set "belle1.5m"; for coding and story-generation ability, set "gpt4all"; and for finance-related response ability, set "finance".
- `--model_name_or_path`: Set this to load different versions of the model weights for the target `--model_type`. For example, to load the 13b version of the LLaMA weights, you can set decapoda-research/llama-13b-hf.

**Single GPU**
- for LLaMA
```
python3 uniform_finetune.py --model_type llama --model_name_or_path decapoda-research/llama-7b-hf \
    --data alpaca-belle-cot --lora_target_modules q_proj v_proj \
    --per_gpu_train_batch_size 4 --learning_rate 3e-4 --epochs 1
```

Note: for multiple datasets, you can pass several paths to `--data`, e.g., `--data ./data/alpaca.json ./data/finance.json <path2yourdata_1>`

- for ChatGLM
```
python3 uniform_finetune.py --model_type chatglm --model_name_or_path THUDM/chatglm-6b \
    --data alpaca-belle-cot --lora_target_modules query_key_value \
    --lora_r 32 --lora_alpha 32 --lora_dropout 0.1 --per_gpu_train_batch_size 2 \
    --learning_rate 2e-5 --epochs 1
```
Note that `load_in_8bit` is not yet suitable for ChatGLM, so its batch size must be smaller than for the other models.

- for BLOOM
```
python3 uniform_finetune.py --model_type bloom --model_name_or_path bigscience/bloomz-7b1-mt \
    --data alpaca-belle-cot --lora_target_modules query_key_value \
    --per_gpu_train_batch_size 4 --learning_rate 3e-4 --epochs 1
```

- for MOSS
```
python3 uniform_finetune.py --model_type moss --model_name_or_path fnlp/moss-moon-003-sft \
    --data alpaca --lora_target_modules q_proj v_proj --per_gpu_train_batch_size 1 \
    --learning_rate 3e-4 --epochs 3
```

- for InternLM
```
python3 uniform_finetune.py --model_type internlm --model_name_or_path internlm/internlm-7b \
    --data alpaca --lora_target_modules q_proj v_proj --lora_r 32 --lora_alpha 32 \
    --lora_dropout 0.1 --per_gpu_train_batch_size 1 --learning_rate 2e-5 --epochs 1 \
    --compute_dtype="fp32"
```

Note that you can also pass a local path (where LLM weights are saved) to `--model_name_or_path`, and the data type `--data` can be freely set according to your interests.
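For readers who want to see what such a LoRA fine-tuning run boils down to, here is a minimal, self-contained sketch using the public `transformers`/`peft` APIs. It is illustrative only: `uniform_finetune.py` wraps considerably more logic (data formatting, 8-bit loading, multiple PEFT types), and the hyperparameter values simply mirror the LLaMA command above.

```python
# Minimal LoRA fine-tuning sketch (assumed, illustrative; not uniform_finetune.py itself).
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

# Any causal LM works here; the README's example uses decapoda-research/llama-7b-hf.
model = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # LoRA rank (the repo's --lora_r)
    lora_alpha=16,                        # scaling factor (--lora_alpha)
    lora_dropout=0.05,                    # (--lora_dropout)
    target_modules=["q_proj", "v_proj"],  # matches --lora_target_modules for LLaMA
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable

# From here, a standard transformers.Trainer loop over the formatted
# instruction data (rendered into prompts as sketched earlier) would run.
```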
**Multiple GPUs**
``` bash
torchrun --nnodes 1 --nproc_per_node $ngpu uniform_finetune.py $args --data $data
```

- for LLaMA
```
python3 -m torch.distributed.launch --nproc_per_node 4 \
    --nnodes=1 --node_rank=0 --master_addr=xxx --master_port=yyy uniform_finetune.py \
    --model_type llama --model_name_or_path decapoda-research/llama-7b-hf \
    --data alpaca-belle-cot --lora_target_modules q_proj v_proj \
    --per_gpu_train_batch_size 4 --learning_rate 3e-4 --epochs 1
```
- for ChatGLM
```
python3 -m torch.distributed.launch --nproc_per_node 4 \
    --nnodes=1 --node_rank=0 --master_addr=xxx --master_port=yyy \
    uniform_finetune.py --model_type chatglm --model_name_or_path THUDM/chatglm-6b \
    --data alpaca-belle-cot --lora_target_modules query_key_value \
    --lora_r 32 --lora_alpha 32 --lora_dropout 0.1 --per_gpu_train_batch_size 2 \
    --learning_rate 2e-5 --epochs 1
```
Note that `load_in_8bit` is not yet suitable for ChatGLM, so its batch size must be smaller than for the other models.

- for BLOOM
```
python3 -m torch.distributed.launch --nproc_per_node 4 \
    --nnodes=1 --node_rank=0 --master_addr=xxx --master_port=yyy \
    uniform_finetune.py --model_type bloom --model_name_or_path bigscience/bloomz-7b1-mt \
    --data alpaca-belle-cot --lora_target_modules query_key_value \
    --per_gpu_train_batch_size 4 --learning_rate 3e-4 --epochs 1
```

- for InternLM
```
python3 -m torch.distributed.launch --nproc_per_node 4 \
    --nnodes=1 --node_rank=0 --master_addr=xxx --master_port=yyy \
    uniform_finetune.py --model_type internlm --model_name_or_path internlm/internlm-7b \
    --data alpaca --lora_target_modules q_proj v_proj --lora_r 32 --lora_alpha 32 \
    --lora_dropout 0.1 --per_gpu_train_batch_size 1 --learning_rate 2e-5 --epochs 1 \
    --compute_dtype="fp32"
```

### Inference
```
python3 generate.py --data alpaca-belle-cot --model_type llama

python3 generate.py --data alpaca-belle-cot --model_type chatglm

python3 generate.py --data alpaca-belle-cot --model_type bloom
```
More details of instruction fine-tuning and inference can be found [here](https://github.com/tloen/alpaca-lora), which we modified from. Note that the folders `saved-xxx7b` are the save paths for LoRA weights, and the LLaMA weights are automatically downloaded from Hugging Face.

### Inference Hyper-parameter Explanation
```
top_p=0.9,
        # Moderately raise the nucleus-sampling probability threshold to enlarge the candidate-token set and increase generation diversity.

temperature=1.0,
        # The previous low temperature could lead to severe polarization of the generated-token probability distribution, degenerating the sampling strategy into greedy decoding.

do_sample=True,
        # do_sample is False by default; setting it to True switches generation to a multinomial sampling decoding strategy.

no_repeat_ngram_size=6,
        # Set the probability of any repeated n-gram to 0, ensuring that no 6-gram appears twice. This setting is an empirical preliminary exploration.

repetition_penalty=1.8,
        # For tokens that have already appeared, reduce the probability of their reoccurrence in subsequent predictions. This setting is an empirical preliminary exploration.
```
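These kwargs map directly onto Hugging Face `generate()`. A minimal sketch of applying them (assumed and illustrative, not the repository's `generate.py`; the model name is a placeholder) looks like this:

```python
# Minimal sketch of applying the inference hyper-parameters above with
# Hugging Face generate(); illustrative, not the repo's generate.py.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloomz-7b1-mt"  # placeholder; any supported LLM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("### Instruction:\nWhat is 17 * 23?\n\n### Response:\n",
                   return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    top_p=0.9,                # nucleus sampling threshold, as explained above
    temperature=1.0,
    do_sample=True,           # enable multinomial sampling
    no_repeat_ngram_size=6,   # forbid repeated 6-grams
    repetition_penalty=1.8,   # penalize tokens that already appeared
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```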
### Parameter merging
```
python3 merge.py --model_type llama --size 7b --lora_dir xxx --merged_dir yyy
```

### Local chatting
```
python3 server.py --model_type chatglm --size 6b --lora_dir xxx
```
### Batch predicting
```
python3 predict.py --model_type chatglm --size 6b --data for_dict_data --lora_dir xxx --result_dir yyy
```

### Web service building
```
python3 web.py --model_type chatglm --size 6b --lora_dir xxx
```
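As background for the `merge.py` step above, merging LoRA weights back into the base model is a standard `peft` operation; the following sketch (assumed and illustrative, not the repo's `merge.py`, with the `xxx`/`yyy` placeholders from the command kept as placeholders) shows the core of what such a script typically does.

```python
# Minimal LoRA-merging sketch (assumed; not the repo's merge.py).
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
model = PeftModel.from_pretrained(base, "xxx")  # "xxx" = --lora_dir placeholder
merged = model.merge_and_unload()               # fold adapters into the base weights
merged.save_pretrained("yyy")                   # "yyy" = --merged_dir placeholder
```

Merging removes the adapter indirection, so the result can be served like any ordinary checkpoint.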
## Empirical Study of Instruction-tuning Open LLMs in Chinese (As of June 25th)
<details><summary>Note: The following experimental results are all obtained from ___An Empirical Study of Instruction-tuning Large Language Models in Chinese___.</summary>
<p>

### 1. Benchmarks
The paper selects two evaluation benchmarks, Belle-eval and MMCU, to comprehensively evaluate LLM competencies in Chinese.

Belle-eval is constructed by self-instruct with ChatGPT. It has 1,000 diverse instructions across 10 categories, covering common NLP tasks (e.g., QA) and challenging tasks (e.g., code and math). We use ChatGPT to rate the model responses against the golden answers. This benchmark is treated as an assessment of AGI (instruction-following) capability.

MMCU is a collection of Chinese multiple-choice questions in four professional disciplines: medicine, law, psychology and education (e.g., Gaokao examinations). It lets LLMs take human-society exams in a multiple-choice format, making it suitable for evaluating the breadth and depth of LLM knowledge across multiple disciplines.

<p align="center">
    <img src="https://oss.gittoolsai.com/images/PhoebusSi_Alpaca-CoT_readme_0acc26d0020d.png" width="35%">
</p>

Data statistics of Belle-eval and MMCU are shown in the table above.

### 2. Main Factors
We conduct experiments to study the three main factors in instruction-tuning LLMs: LLM bases, parameter-efficient methods, and Chinese instruction datasets.

#### 2.1 LLM Bases
For open LLMs, we test existing LLMs, and LLMs fine-tuned with LoRA on Alpaca-GPT4, on Belle-eval and MMCU respectively.

<p align="center">
    <img src="https://oss.gittoolsai.com/images/PhoebusSi_Alpaca-CoT_readme_957c8ddf0b7a.png" width="80%">
    <img src="https://oss.gittoolsai.com/images/PhoebusSi_Alpaca-CoT_readme_add64bb6aada.png" width="40%">
</p>

Table 2 shows the scores of open LLMs on Belle-eval, and Table 3 shows the accuracy of LLMs on MMCU. All the open LLMs are fine-tuned with the same parameter-efficient method (LoRA) and the same instruction dataset (Alpaca-GPT4).

___Experimental Results:___
1. Evaluation of Existing LLMs

    ___Performance on Belle-eval___

    (1) Among base LLMs, Bloom performs the best.

    (2) Among sft LLMs, ChatGLM outperforms the others by large margins, thanks to being trained with the most Chinese tokens and with HFRL.

    (3) The Open QA, Math, Close QA and Extract categories are still very challenging for existing open LLMs.

    (4) Vicuna and moss-sft show clear improvements over their bases, LLaMA and moss-base, respectively.

    (5) In contrast, the sft models Bloomz and Bloomz-mt perform worse than their base model Bloom, because they tend to generate shorter responses.

    ___Performance on MMCU___

    (1) All base LLMs perform poorly because, before fine-tuning, it is difficult for them to generate content in the specified format, e.g., outputting option numbers.

    (2) Every sft LLM outperforms its corresponding base LLM. In particular, Bloomz performs the best (even beating ChatGLM) because it can directly generate the option number as required without generating other irrelevant content, which is due to the characteristics of its supervised fine-tuning dataset, xP3.

    (3) Among the four disciplines, law is the most challenging for LLMs.

<p align="center">
    <img src="https://oss.gittoolsai.com/images/PhoebusSi_Alpaca-CoT_readme_2a6adbf6b511.png" width="40%">
</p>

The performance of LLMs after instruction-tuning on Alpaca-GPT4-zh is shown in Figure 1.

2. Instruction-tuning Different LLMs

    (1) On Belle-eval, the performance improvement that instruction-tuning brings to sft LLMs is not as significant as for base LLMs, except for Bloomz and Bloomz-mt.

    (2) Vicuna and ChatGLM suffer performance drops after instruction-tuning: Vicuna is trained on real human-ChatGPT conversations, which are of better quality than Alpaca-GPT4, and ChatGLM adopts HFRL, which may make it unsuitable for further instruction-tuning.

    (3) On MMCU, most LLMs gain performance after instruction-tuning, with the exception of Bloomz and Bloomz-mt, whose performance unexpectedly decreases significantly.

    (4) After instruction-tuning, Bloom improves significantly and performs well on both benchmarks. Although ChatGLM beats Bloom consistently, it suffers performance drops during instruction-tuning. Therefore, among all open LLMs, Bloom is the most suitable foundation model for the subsequent Chinese instruction-tuning experiments.

#### 2.2 Parameter-efficient Methods
For parameter-efficient methods other than LoRA, the paper applies a range of such methods to instruction-tune Bloom on the Alpaca-GPT4 dataset.

<p align="center">
    <img src="https://oss.gittoolsai.com/images/PhoebusSi_Alpaca-CoT_readme_0eb1d6ebf45a.png" width="40%">
    <img src="https://oss.gittoolsai.com/images/PhoebusSi_Alpaca-CoT_readme_1ec45380c7b2.png" width="40%">
</p>

___Experimental Results:___

1. Comparison of Parameter-efficient Methods

    (1) SadapterH performs the best among all parameter-efficient methods and can be used as an alternative to LoRA.

    (2) P-tuning and prompt tuning underperform the others by large margins, indicating that adding trainable layers only at the embedding layer is not enough to support LLMs on generation tasks.
    (3) Although AdaLoRA is an improvement over LoRA, its performance drops clearly, possibly because LoRA's trainable-parameter budget for LLMs is not suitable for further reduction.

    (4) Comparing the upper and lower parts, it can be seen that increasing the number of trainable parameters for sequential adapters (i.e., SadapterP and SadapterH) does not bring gains, while the opposite is observed for parallel adapters (i.e., P-adapter).

2. Training Loss

    (1) Prompt tuning and P-tuning converge the slowest and have the highest losses after convergence. This shows that embedding-only adapters are not suitable for instruction-tuning LLMs.

    (2) The initial loss of AdaLoRA is very high because it must simultaneously learn a parameter-budget allocation, which keeps the model from fitting the training data well at first.

    (3) The other methods converge quickly on the training data and fit it well.

#### 2.3 Chinese Instruction Datasets
For the impact of various types of Chinese instruction datasets, the authors gather popular open Chinese instruction sets (as shown in Table 5) and fine-tune Bloom with LoRA on each.

<p align="center">
    <img src="https://oss.gittoolsai.com/images/PhoebusSi_Alpaca-CoT_readme_a9dbbd10c232.png" width="80%">
    <img src="https://oss.gittoolsai.com/images/PhoebusSi_Alpaca-CoT_readme_d45c1b6b8541.png" width="80%">
    <img src="https://oss.gittoolsai.com/images/PhoebusSi_Alpaca-CoT_readme_1e7b291a4719.png" width="40%">
</p>

Tables 6 and 7 show the results of fine-tuning Bloom on the different instruction datasets.

___Experimental Results:___

1. Performance on Belle-eval

    (1) Instruction data constructed by ChatGPT (e.g., via self-instruct methods or by collecting real human-ChatGPT conversations) consistently enhances instruction-following ability, with score increases of 3.1 to 11 points.

    (2) Among these datasets, Belle yields the best performance due to its largest amount of instruction data. However, models trained on moss-sft-data, which contains even more data built in a similar way, perform unsatisfactorily.

    (3) The Alpaca-GPT4 instructions bring the second-best performance: only 49K samples, yet comparable to the 1.54M of Belle.

    (4) Instinwild brings the smallest performance gains, because the seed instructions it crawls from tweets ("in the wild") are not as comprehensive as those carefully designed by humans (as in Alpaca).

    (5) These ChatGPT-based data mainly improve open-generation tasks such as Brain Storm and Generation, while causing significant decreases on tasks that require high reading-comprehension skills, such as Close QA and Extract.

    (6) The dataset-collection instruction sets damage the model's instruction-following ability, because the form and intent of each NLP or examination dataset are uniform and can easily be overfitted.

    (7) Among them, COIG-trans performs the best because it involves over 2,000 different tasks with a wide variety of task instructions. In contrast, xP3 and COIG-ccmc have the worst negative impact on model performance: both cover only a few task types (translation and QA for the former, counterfactual-correction conversations for the latter), which hardly cover the popular instructions and tasks of humans.
Performance on MMCU\n\n    (1) Instruction-tuning on each dataset always results in a performance improvement. \n\n    (2) Among the ChatGPT-based data shown in the upper part, ShareGPT-zh underperforms the others by large margins. This may be because real users rarely ask multiple-choice questions about academic topics. \n\n    (3) Among the dataset-collection data shown in the lower part, HC3 and COIG-ccmc result in the lowest accuracy, because HC3 contains only 13K unique questions and the task format of COIG-ccmc differs significantly from MMCU. \n\n    (4) COIG-exam brings the greatest accuracy improvement, benefiting from its task format being similar to MMCU's.\n\n### 3. Other Factors\nThe paper further studies four other factors: CoT, expansion of Chinese vocabulary, language of prompts, and human-value alignment.\n\n#### 3.1 CoT\nFor CoT, the authors compare the performance before and after adding CoT data during instruction-tuning.\n\n___Experiment Settings:___\n\nThe authors collect 9 CoT datasets and their prompts from FLAN, and then translate them into Chinese using Google Translate.\n\nThe setting that directly adds the CoT data is denoted as \"Alpaca-GPT4+CoT\". In addition, appending the sentence \"先思考，再决定\" (\"think step by step\" in Chinese) to the end of each instruction, to induce the model to respond based on CoT, is denoted as \"Alpaca-GPT4+CoT*\". A sketch of how both settings can be constructed is given after the experimental results below.\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_8af41667a4cb.png\" width=\"40%\">\n\u003C\u002Fp>\n\n___Experimental Results:___ \n\n1. \"Alpaca-GPT4+CoT\" outperforms \"Alpaca-GPT4\" on the Code and Math tasks that require strong reasoning ability. Besides, there is also a significant improvement on the MMCU Education task.\n\n2. As shown in the \"Alpaca-GPT4+CoT*\" line, this simple sentence can further improve the performance on the reasoning tasks Code and Education, while the Math performance is slightly inferior to \"Alpaca-GPT4+CoT\". This calls for further exploration of more robust prompts.\n
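\nAs referenced above, here is a minimal sketch of how the two CoT settings can be constructed (illustrative code, not the repo's own preprocessing script; the file paths are hypothetical, and the CoT samples are assumed to already follow the unified {\"instruction\", \"input\", \"output\"} format):\n\n```\nimport json\n\nCOT_TRIGGER = \"先思考，再决定\"  # appended to induce step-by-step reasoning\n\n# Load the base instruction data and the translated CoT data\n# (both assumed to be in the unified format).\nwith open(\"data\u002Falpaca_gpt4_zh.json\", encoding=\"utf-8\") as f:\n    alpaca_gpt4 = json.load(f)\nwith open(\"data\u002Fcot_zh.json\", encoding=\"utf-8\") as f:\n    cot = json.load(f)\n\n# \"Alpaca-GPT4+CoT\": simply mix the two sources.\nplus_cot = alpaca_gpt4 + cot\n\n# \"Alpaca-GPT4+CoT*\": additionally append the trigger sentence\n# to the end of every instruction.\nplus_cot_star = [\n    {**sample, \"instruction\": sample[\"instruction\"].rstrip() + COT_TRIGGER}\n    for sample in plus_cot\n]\n\nwith open(\"data\u002Falpaca_gpt4_plus_cot_star.json\", \"w\", encoding=\"utf-8\") as f:\n    json.dump(plus_cot_star, f, ensure_ascii=False, indent=2)\n```\n\n#### 3.2 Expansion of Chinese Vocabulary\nFor the expansion of Chinese vocabulary, the authors test the influence of the number of Chinese tokens in the tokenizer's vocabulary on LLMs' ability to express Chinese. For example, if a Chinese character is in the vocabulary, it can be represented by a single token; otherwise it may require multiple tokens.\n\n___Experiment Settings:___ The authors mainly conduct experiments on LLaMA, whose SentencePiece vocabulary (32K) covers far fewer Chinese characters than Bloom's (250K).\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_a8f7bfa8b538.png\" width=\"45%\">\n\u003C\u002Fp>\n\n___Experimental Results:___\n\n1. Pre-training on a larger Chinese corpus with an expanded Chinese vocabulary is consistently helpful for instruction-following ability.\n\n2. 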
Counterintuitively, \"llama-voc-pre-l\" (100B) is inferior to \"llama-voc-pre\" (20B) on MMCU, which shows that pre-training on more data does not necessarily lead to higher performance on academic exams.\n
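\nTo see the vocabulary gap behind this subsection directly, the two tokenizers can be compared on a short Chinese sentence (a quick, illustrative check; the exact token counts depend on tokenizer versions, and the model IDs are the ones used elsewhere in this repo):\n\n```\nfrom transformers import AutoTokenizer\n\ntext = \"床前明月光，疑是地上霜。\"\n\n# LLaMA's 32K SentencePiece vocabulary covers few Chinese characters,\n# so most characters fall back to multi-token pieces.\nllama_tok = AutoTokenizer.from_pretrained(\"decapoda-research\u002Fllama-7b-hf\")\n\n# Bloom's 250K vocabulary represents most Chinese characters as single tokens.\nbloom_tok = AutoTokenizer.from_pretrained(\"bigscience\u002Fbloomz-7b1-mt\")\n\nprint(len(llama_tok.tokenize(text)))  # noticeably more tokens\nprint(len(bloom_tok.tokenize(text)))  # roughly one token per character\n```\n\n#### 3.3 Language of Prompts\n\nFor the language of prompts, the authors test whether Chinese or English prompts are more suitable for instruction fine-tuning.\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_277dde909c67.png\" width=\"60%\">\n\u003C\u002Fp>\n\nFigure 4 shows the results of using Chinese and English prompts based on LLaMA and Bloom. When instruction-tuning LLaMA, using Chinese prompts improves the performance on both benchmarks compared to English prompts, while the opposite phenomenon is observed on Bloom.\n\n___Experimental Results:___\n\n1. For models with weaker Chinese abilities (e.g., LLaMA), using Chinese prompts can effectively help them respond in Chinese.\n\n2. For models with good Chinese abilities (e.g., Bloom), using prompts in English (the language they are better at) can better guide the model during instruction fine-tuning.\n\n#### 3.4 Human-value Alignment\nTo avoid LLMs generating toxic content, aligning them with human values is a crucial issue. We add the human-value alignment data built by COIG into instruction-tuning to explore its impact. \n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_cd00a707ac64.png\" width=\"30%\">\n\u003C\u002Fp>\n\nFigure 5 compares the results of instruction-tuning with and without human-value alignment.\n\n___Experimental Results:___ The human-value alignment data results in a slight performance drop. How to balance the harmlessness and performance of LLMs is a research direction worth exploring in the future.\n\n\n\u003C\u002Fp>\n\u003C\u002Fdetails> \n\n## Quantitative Analysis\n\u003Cdetails>\u003Csummary>Note: The following figure shows the statistics of the datasets collected as of March 26, displayed only as the motivation for data collection. More datasets have been collected since, such as finance-related instruction datasets.\u003C\u002Fsummary>\n\u003Cp>\n\n![data collection statistics](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_4a1bed34cc83.png)\nThe current collection of instruction-finetuning datasets consists mainly of three parts:\n- `alpaca_data_cleaned.json`: about 52K English instruction-following training samples.\n- `CoT_data.json`: 9 CoT datasets involving about 75k samples. (published by FLAN [7])\n- `belle_data_cn.json`: about 0.5M Chinese instruction-following training samples. (published by BELLE [8])\n\n### Ablation of CoT and Chinese Instructions\n\n\n![ablation-cot](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_49d95f4ba257.png)\n\"w\u002Fo CoT\" and \"w\u002Fo CN\" denote models that exclude CoT data and Chinese instructions from their instruction-finetuning data, respectively.\n\nThe above table shows two examples (involving numerical calculations) that require a certain amount of reasoning ability to respond correctly.\nAs shown in the middle column, `Ours w\u002Fo CoT` fails to generate the correct response, which shows that once the finetuning data does not contain CoT data, the model's reasoning ability significantly decreases. 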
This further demonstrates that CoT data is essential for LLMs.\n\n![ablation-cot](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_c1396b21bd1d.png)\n\nThe above table shows two examples that require the ability to respond to Chinese instructions.\nAs shown in the right column, either the content generated by `Ours w\u002Fo CN` is unreasonable, or the Chinese instructions are answered in English by `Ours w\u002Fo CN`. This shows that removing Chinese data during finetuning makes the model unable to handle Chinese instructions, and further demonstrates the need to collect Chinese instruction-finetuning data.\n\n\n![ablation-cot](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_f4b18d68f3c4.png)\n\nThe above table shows a relatively difficult example, which requires both a certain accumulation of knowledge of Chinese history and the ability to state historical events logically and completely. As shown in this table, `Ours w\u002Fo CN` can only generate a short and erroneous response, because, lacking Chinese finetuning data, it naturally lacks the corresponding knowledge of Chinese history. Although `Ours w\u002Fo CoT` lists some relevant Chinese historical events, its logic of expression is self-contradictory, which is caused by the lack of CoT data.\n\n**In summary, the models finetuned on our complete dataset (English, Chinese, and CoT instruction data) significantly improve reasoning and Chinese instruction-following abilities.**\n\n### The Effect of CoT Data\n\n![CoT-comparison](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_db6d5ab2a00b.png)\nSamples in odd-numbered rows do not use the CoT prompt, such as \"step-by-step reasoning.\" Both `Ours(w\u002FCoT)` and Alpaca are based on LLaMA-7B, and the only difference between the two is that the instruction-finetuning data of `Ours(w\u002FCoT)` additionally contains CoT data.\n\nFrom the above table, we find that:\n- `Ours(w\u002FCoT)` always generates the correct rationale before the answer, while Alpaca fails to generate any reasonable rationale, as shown in the first 4 examples (commonsense questions). This shows that using CoT data for finetuning can significantly improve reasoning ability.\n- For `Ours(w\u002FCoT)`, the CoT prompt (e.g., concatenating 'step-by-step' with the input question) has little effect on easy examples (e.g., commonsense questions) and has an important effect on challenging questions (e.g., questions requiring reasoning, like the last four examples).\n- For Alpaca, the CoT prompt has little or even a negative effect. For the last two examples, after adding the CoT prompt, Alpaca changes the correct generated answer to a wrong one. This may be due to the inconsistency between the input forms of finetuning and inference.\n\n\n### The Effect of Chinese Instruction Data\n\n_Quantitative comparison of responses to Chinese instructions._\n![CN_compare_CN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_20630df5f6a2.png)\n\nOur model is finetuned from a 7B LLaMA on 52K English instructions and 0.5M Chinese instructions. Stanford Alpaca (our reimplementation) is finetuned from a 7B LLaMA on 52K English instructions. 
BELLE is finetuned from a 7B BLOOM on 2M Chinese instructions.\n\nFrom the above table, several observations can be made:\n- Compared to Alpaca, `ours (w\u002F CN)` has a stronger ability to understand Chinese instructions. For the first example, Alpaca fails to distinguish between the `instruction` part and the `input` part, while ours does.\n- Chinese instruction-finetuning data can significantly enhance the ability to interact in Chinese. For the second example, `ours (w\u002F CN)` not only provides the correct code, but also the corresponding Chinese annotation, while Alpaca does not. In addition, as shown in examples 3-5, Alpaca can only respond to Chinese instructions in English.\n- Compared to BELLE, `ours (w\u002F CN)`'s performance on instructions requiring an open response (as shown in the last two examples) still needs to be improved. BELLE's outstanding performance on such instructions is due to: 1. its BLOOM backbone model encounters much more multilingual data during pre-training; 2. its Chinese instruction-finetuning data is larger than ours, that is, 2M vs 0.5M.\n\n\n\n _Quantitative comparison of responses to English instructions. The purpose of this subsection is to explore whether finetuning on Chinese instructions has a negative impact on Alpaca._\n![CN_compare_EN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_2b3356b2496d.png)\n\n\nFrom the above table, we find that:\n- Finetuning with Chinese instruction data does not weaken the original English instruction-following ability; on the contrary, it even brings a certain enhancement in generating better responses to English instructions. The response of `ours (w\u002F CN)` shows more detail than that of Alpaca; e.g., for the third example, `ours (w\u002F CN)` lists three more provinces than Alpaca.\n\n\u003C\u002Fp>\n\u003C\u002Fdetails> \n\n\n\n\n## Citation\nPlease cite this repo if you use its data collection, code, or experimental findings.\n```\n@misc{si2023empirical,\n      title={An Empirical Study of Instruction-tuning Large Language Models in Chinese}, \n      author={Qingyi Si and Tong Wang and Zheng Lin and Xu Zhang and Yanan Cao and Weiping Wang},\n      year={2023},\n      eprint={2310.07328},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\nFor data and models, please also cite the sources of the original data, parameter-efficient methods and LLMs.\n\nWe would like to express our special gratitude to APUS AilMe Lab for sponsoring the 8 A100 GPUs for the experiments.\n\n\n\u003Cp align=\"right\">(\u003Ca href=\"#top\">back to top\u003C\u002Fa>)\u003C\u002Fp>\n\n## All Thanks To Our Contributors \n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002FAlpaca-CoT\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_da68014e60e1.png\" \u002F>\n\u003C\u002Fa>\n","[**中文**](.\u002FCN_README.md) | [**English**](.\u002FREADME.md)\n\n\u003Cdiv id=\"top\">\u003C\u002Fdiv>\n\n![Alpaca-CoT](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_f3fc654840d1.jpg)\n\n# Alpaca-CoT：一个统一接口的指令微调平台，支持指令收集、参数高效方法和大语言模型（Large Language Models, 
LLMs）\n[![LICENSE](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FPhoebusSi\u002FAlpaca-CoT)](https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002FAlpaca-CoT\u002Fblob\u002Fmain\u002FLICENSE.txt)\n[![torch](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpytorch-%3E=1.13-red?logo=pytorch)](https:\u002F\u002Fpytorch.org\u002F)\n[![data](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fhuggingface-dataset-yellow?logo=data:image\u002Fpng;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAABGdBTUEAALGPC\u002FxhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAABmJLR0QA\u002FwD\u002FAP+gvaeTAAAJXUlEQVRYCQXBeWzW933A8TfYQIBAMOa0\u002FXx+QJZoaRJIQpusySJSaGKY0nWpRrZVmTq1Ug+t\u002FzRd2\u002F2zqUv\u002FWNdKW9RN3SZtU7NpUmqwuQyGmPswl\u002FHx\u002FT0c7ZIWSihgG\u002FP4enxgP++9XgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPDRHVf+us9nb\u002Fb7ym8G3XRz0PXXbrkMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAICP+3x8YNi\u002FnZnxqOWPb3uvc8r+49p\u002FSu+nSSfvfTwx6ZH+ocp37pUrBQAAAAAAAAAAAAAAAAAAAAAAAPrHrBsq+6+OfjTojZ\u002FomVe1dZW2zNPm2dpSpbsWaNta7fxTvdtsZWroTmnEH6k1AAAAAAAAAAAAAAAAAAAA9A\u002F5OadHP\u002FSjH+q+ldqCts\u002FXjmXaWadd9XqpXi+s1lNLtW2O7kTbN2j\u002FHqemvVwu+xIAAAAAAAAAAAAAAAAA\u002FaXK1xz\u002FsOyJl3UnerJGewpazDTPNA\u002FNQ\u002FPQPNM80zzTrnr9YKE2zdL0TXXy\u002FvCYbwAAAAAAAAAAAAAAcKfknzl2ddK2dbqvWrsatJhpCk2hKTSFptAUmkJTaArNQ4uZnlupO9FzX3DGyZGRcT8DAAAAAAAAAADAbwf8RGWqr8\u002F2p7R1rvaG5qEpNIWm0BSaQlNoCk2hKTSFptAUWsy0s053oj1fdXLa34yOVlYDAAAAAAAAoM4en3S\u002F6Su6C+0paB6aQlNoMTQPTaEpNIWm0BSaQouheWgKTaHFTM+t0ia0f4elMf8bAAAAAAAAgJv33ObA0Rl3ztZzKzXPNIWm0GJ47\u002FQ6p3syLYam0BSaQlNoseDgmbVOdWdaLGgKTaHFTI8s0gPr9EFpfHTUZwAAAAAAABifcqeX3tT9czTPNIWmsHK54H\u002F9\u002FbNu\u002FswW3\u002F7qi45cWKN5aApNoVcK\u002FvzdDW7ZvMVvfOn3vdexVouhKTSFdhd0J3r7Zw6O+O8AAAAAAFy5XlldGbrW594lenaF5pmm0Dwcv7jG17dt9plPNrrhuUYv7fhdvVLQFJqH0z3hm29scsPGRp96ZqvH3ntSrxY0habQYqYfzNdTW5zRX97SBQAAAABc7\u002FN1b72vzWhPQVNoCk1hpRj+w\u002Fc+5Q+\u002F97zH\u002FucTDp5Zq3loCk2hKTz7\u002FhPu+enTfvmLL\u002Fvb449qMTSFptA807PLdU+Njt140He\u002F8hwAAAAAt+\u002F7Xf\u002Fv73TvLM0zTaEpNIXm4UxvWEmh1xo0D02hKTSFptArBb1ScKor02JoCk2hKTQPvVSvzbN1qMP+kl8EAAAAYHDEH1v8uh6o1jzTFJpCU2gKTaF56NWCXi5oCk2hKTQPvVrQywXNQ1NoCk2hKTSF9hS0BR3Y6537fhsAAACAUtl\u002FMn1FD87RPNMUmkJTaAq9WvDoe0\u002F6ztsv+IsDj2mxoL9o0GsNjnWucd+\u002FPe33v\u002FWC5YtrNA9NoSk0habQnoK2oH3N3h32rwEAAAC4N+IPvPot3T9bi2s0zzTPNIXmodcKHvrPp3xyw1Y3bfqsf\u002Fnll\u002FzHv\u002Fmk7\u002FzV827\u002Fo00+s7HR7Z9\u002FxanuTPNM89AUmmeaZ9pT0GZ08JD9w5VvAAAAAHD9buXPvf4T3YUeq9G2hdqxQvNMO+u0u86B0+v8g8bNPvvCa776e42++HyjW15o9OVPv+b6jY3+y\u002Fc36rWCdtbppXrNQ0\u002FVattCPbxYW+bpcKrcvu8rAAAA2NRUdbfkuw60a9MsbanWtoXaXK0HH9aWam2u1gu17vjper\u002F00lY\u002F3Pp52xtft3fb52zf\u002FId+bfsmBzrW6cka3Vmlu+Zo20Jtrta2hboD3R86MTAzNOYPbGqqAgBgqOy3HTyg+x\u002FVptm6d57mmR6r0fdn6bEaPfKI7p6jeYOl\u002F13v2De3OPrWNkf+Yqvj77zkxOlHtbdOm6v1xFJtX6Q\u002Fn6VnlmlvQVuqdcc8PbRey10OlX0bgCatcmaq0+Mb9eA8PVGrLXP06BLNM22dr2eWa0+DNlfryVq9FprCyoW1Vi6t0SsFvZLp4Ud091zNMz1Wo4ce1mKmhxbp7rl6apnum6UX3nB8qpKamqxCrVJ7PPmyts7VnoJ21ev++XpiqV5cra3ztbtBjy3R5mq9VKd5pnloHppnen6l7qzS08u0s05bH9Kuej28WA8u1N6CdjXortl66S3Lk15Vq7hxzyctpRH31WrrfG1doKeXa29BP1ikHcv1WI2eqtUU2jpf987T7gZNoXmmnXXaMkcPPax5pkeX6KlaPVmrRx7R3oIer9V983XfPD34O1Ym+oYHhn2CW\u002FdsdPC0Ns\u002FW86u0u0GPLNFDi\u002FRSnXbVa55pCs1Duxt091xtX6x5pin0wAJtfUh7C5pCU2ieaVe9XlytbQ\u002Fr8aXaW9DTy3TXIh3\u002FtaURP8vNQTdYujzp7oXatlCPL9ULq\u002FX8Kr24WvNMU2gKTaHFTE\u002FW6u452lvQrnptrtazKzTPNIWm0BSaZ3p+lV5YredX6bGl2vqQ7i\u002FoRP\u002FkyKRPc3vAFxzs1J2ztGO5djXokRo9u1KLmabQFJpCU2gx0xNLdc9czTPtadCWaj27QvNMU2gKT
aEptJjpqeV6bKl2F\u002FTEEt29RMu3LI1VPsXAgItnZrzi1e9oE3pwnl5Ypb2hxUyLmeaZ5pkWM+1u0N1V2oIeWaCH5+sOtHWeptBipnmmeabFTPNMewp6bqUeqNYd1Xr9XScf+Et1EQDlqcqL0zP+ynsH9NSr2vyQ7kIPztUTj+iZZdqxTI8v1ha07THtfFM7XtOz27TrLd1Xp3vQE0u0Y7meqdXji7Vtjraguxbp2S\u002FocIdT016fmvLTAAAAjI5WVpfG\u002FLH6K0eLeuOf9cJ2bV+vrfXaWqcfrNcr39XyTccnPTE97eGp6Ur72KTHHb6svV\u002FXQ0\u002FovtXaWtDDz2nnW\u002Frxf+j4Rz6Y9sbwuD9SVwEAAAAAAFAqWVMqu\u002F3+iO+NT9lTmZ6443j\u002FmON95cr0xJ3RCU+WJ90OAAAAUCr7J2MTnqg8KN9x\u002FG7ZiYGxyvTknfKEvaUxfzY85h+XStYAAAAAAAAAAAAAAKDOHh52ed+Qjw1N+PjtEVcAAAAAAAAAANweccXQhI\u002F3DfnY8LDL1dkAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD8P8HSw4EMlPZhAAAAJXRFWHRkYXRlOmNyZWF0ZQAyMDIzLTA0LTEyVDEyOjI0OjQxKzAwOjAwUmNguAAAACV0RVh0ZGF0ZTptb2RpZnkAMjAyMy0wNC0xMlQxMjoyNDo0MSswMDowMCM+2AQAAAAodEVYdGRhdGU6dGltZXN0YW1wADIwMjMtMDQtMTJUMTI6MjQ6NDErMDA6MDB0K\u002FnbAAAAAElFTkSuQmCC)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT)\n[![model](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fhuggingface-model-yellow?logo=data:image\u002Fpng;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAABGdBTUEAALGPC\u002FxhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAABmJLR0QA\u002FwD\u002FAP+gvaeTAAAJXUlEQVRYCQXBeWzW933A8TfYQIBAMOa0\u002FXx+QJZoaRJIQpusySJSaGKY0nWpRrZVmTq1Ug+t\u002FzRd2\u002F2zqUv\u002FWNdKW9RN3SZtU7NpUmqwuQyGmPswl\u002FHx\u002FT0c7ZIWSihgG\u002FP4enxgP++9XgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPDRHVf+us9nb\u002Fb7ym8G3XRz0PXXbrkMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAICP+3x8YNi\u002FnZnxqOWPb3uvc8r+49p\u002FSu+nSSfvfTwx6ZH+ocp37pUrBQAAAAAAAAAAAAAAAAAAAAAAAPrHrBsq+6+OfjTojZ\u002FomVe1dZW2zNPm2dpSpbsWaNta7fxTvdtsZWroTmnEH6k1AAAAAAAAAAAAAAAAAAAA9A\u002F5OadHP\u002FSjH+q+ldqCts\u002FXjmXaWadd9XqpXi+s1lNLtW2O7kTbN2j\u002FHqemvVwu+xIAAAAAAAAAAAAAAAAA\u002FaXK1xz\u002FsOyJl3UnerJGewpazDTPNA\u002FNQ\u002FPQPNM80zzTrnr9YKE2zdL0TXXy\u002FvCYbwAAAAAAAAAAAAAAcKfknzl2ddK2dbqvWrsatJhpCk2hKTSFptAUmkJTaArNQ4uZnlupO9FzX3DGyZGRcT8DAAAAAAAAAADAbwf8RGWqr8\u002F2p7R1rvaG5qEpNIWm0BSaQlNoCk2hKTSFptAUWsy0s053oj1fdXLa34yOVlYDAAAAAAAAoM4en3S\u002F6Su6C+0paB6aQlNoMTQPTaEpNIWm0BSaQouheWgKTaHFTM+t0ia0f4elMf8bAAAAAAAAgJv33ObA0Rl3ztZzKzXPNIWm0GJ47\u002FQ6p3syLYam0BSaQlNoseDgmbVOdWdaLGgKTaHFTI8s0gPr9EFpfHTUZwAAAAAAABifcqeX3tT9czTPNIWmsHK54H\u002F9\u002FbNu\u002FswW3\u002F7qi45cWKN5aApNoVcK\u002FvzdDW7ZvMVvfOn3vdexVouhKTSFdhd0J3r7Zw6O+O8AAAAAAFy5XlldGbrW594lenaF5pmm0Dwcv7jG17dt9plPNrrhuUYv7fhdvVLQFJqH0z3hm29scsPGRp96ZqvH3ntSrxY0habQYqYfzNdTW5zRX97SBQAAAABc7\u002FN1b72vzWhPQVNoCk1hpRj+w\u002Fc+5Q+\u002F97zH\u002FucTDp5Zq3loCk2hKTz7\u002FhPu+enTfvmLL\u002Fvb449qMTSFptA807PLdU+Njt140He\u002F8hwAAAAAt+\u002F7Xf\u002Fv73TvLM0zTaEpNIXm4UxvWEmh1xo0D02hKTSFptArBb1ScKor02JoCk2hKTQPvVSvzbN1qMP+kl8EAAAAYHDEH1v8uh6o1jzTFJpCU2gKTaF56NWCXi5oCk2hKTQPvVrQywXNQ1NoCk2hKTSF9hS0BR3Y6537fhsAAACAUtl\u002FMn1FD87RPNMUmkJTaAq9WvDoe0\u002F6ztsv+IsDj2mxoL9o0GsNjnWucd+\u002FPe33v\u002FWC5YtrNA9NoSk0habQnoK2oH3N3h32rwEAAAC4N+IPvPot3T9bi2s0zzTPNIXmodcKHvrPp3xyw1Y3bfqsf\u002Fnll\u002FzHv\u002Fmk7\u002FzV827\u002Fo00+s7HR7Z9\u002FxanuTPNM89AUmmeaZ9pT0GZ08JD9w5VvAAAAAHD9buXPvf4T3YUeq9G2hdqxQvNMO+u0u86B0+v8g8bNPvvCa776e42++HyjW15o9OVPv+b6jY3+y\u002Fc36rWCdtbppXrNQ0\u002FVattCPbxYW+bpcKrcvu8rAAAA2NRUdbfkuw60a9MsbanWtoXaXK0HH9aWam2u1gu17vjper\u002F00lY\u002F3Pp52xtft3fb52zf\u002FId+bfsmBzrW6cka3Vmlu+Zo20Jtrta2hboD3R86MTAzNOYPbGqqAgBgqOy3HTyg+x\u002FVptm6d57mmR6r0fdn6bEaPfKI7p6jeYOl\u002F13v2De3OPrWNkf+Yqvj77zkxOlHtbdOm6v1xFJtX6Q\u002Fn6VnlmlvQVuqdcc8PbRey10OlX0bgCatcmaq0+Mb9eA8PVGrLXP06BLNM22dr2eWa0+DNlfryVq9FprCy
oW1Vi6t0SsFvZLp4Ud091zNMz1Wo4ce1mKmhxbp7rl6apnum6UX3nB8qpKamqxCrVJ7PPmyts7VnoJ21ev++XpiqV5cra3ztbtBjy3R5mq9VKd5pnloHppnen6l7qzS08u0s05bH9Kuej28WA8u1N6CdjXortl66S3Lk15Vq7hxzyctpRH31WrrfG1doKeXa29BP1ikHcv1WI2eqtUU2jpf987T7gZNoXmmnXXaMkcPPax5pkeX6KlaPVmrRx7R3oIer9V983XfPD34O1Ym+oYHhn2CW\u002FdsdPC0Ns\u002FW86u0u0GPLNFDi\u002FRSnXbVa55pCs1Duxt091xtX6x5pin0wAJtfUh7C5pCU2ieaVe9XlytbQ\u002Fr8aXaW9DTy3TXIh3\u002FtaURP8vNQTdYujzp7oXatlCPL9ULq\u002FX8Kr24WvNMU2gKTaHFTE\u002FW6u452lvQrnptrtazKzTPNIWm0BSaZ3p+lV5YredX6bGl2vqQ7i\u002FoRP\u002FkyKRPc3vAFxzs1J2ztGO5djXokRo9u1KLmabQFJpCU2gx0xNLdc9czTPtadCWaj27QvNMU2gKTaEptJjpqeV6bKl2F\u002FTEEt29RMu3LI1VPsXAgItnZrzi1e9oE3pwnl5Ypb2hxUyLmeaZ5pkWM+1u0N1V2oIeWaCH5+sOtHWeptBipnmmeabFTPNMewp6bqUeqNYd1Xr9XScf+Et1EQDlqcqL0zP+ynsH9NSr2vyQ7kIPztUTj+iZZdqxTI8v1ha07THtfFM7XtOz27TrLd1Xp3vQE0u0Y7meqdXji7Vtjraguxbp2S\u002FocIdT016fmvLTAAAAjI5WVpfG\u002FLH6K0eLeuOf9cJ2bV+vrfXaWqcfrNcr39XyTccnPTE97eGp6Ur72KTHHb6svV\u002FXQ0\u002FovtXaWtDDz2nnW\u002Frxf+j4Rz6Y9sbwuD9SVwEAAAAAAFAqWVMqu\u002F3+iO+NT9lTmZ6443j\u002FmON95cr0xJ3RCU+WJ90OAAAAUCr7J2MTnqg8KN9x\u002FG7ZiYGxyvTknfKEvaUxfzY85h+XStYAAAAAAAAAAAAAAKDOHh52ed+Qjw1N+PjtEVcAAAAAAAAAANweccXQhI\u002F3DfnY8LDL1dkAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD8P8HSw4EMlPZhAAAAJXRFWHRkYXRlOmNyZWF0ZQAyMDIzLTA0LTEyVDEyOjI0OjQxKzAwOjAwUmNguAAAACV0RVh0ZGF0ZTptb2RpZnkAMjAyMy0wNC0xMlQxMjoyNDo0MSswMDowMCM+2AQAAAAodEVYdGRhdGU6dGltZXN0YW1wADIwMjMtMDQtMTJUMTI6MjQ6NDErMDA6MDB0K\u002FnbAAAAAElFTkSuQmCC)](https:\u002F\u002Fhuggingface.co\u002FQingyiSi\u002FAlpaca-CoT)\n[![wandb](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fwandb-tools-orange?logo=WeightsAndBiases)](https:\u002F\u002Fwandb.ai)\n[![colab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGoogle-Colab-blue?logo=Google%20Colab)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1wfrKqyPkz5BGD1Gkij_cvbUeweIDdRav?usp=sharing)\n\n这是 `Alpaca-CoT` 项目的代码仓库，旨在构建一个指令微调（Instruction Finetuning, IFT）平台，该平台包含丰富的指令数据集（尤其是思维链（Chain-of-Thought, CoT）数据集），并为多种大语言模型（Large Language Models, LLMs）和参数高效微调方法提供统一接口。我们正在不断扩充我们的 [指令微调数据集](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002F)，并集成更多的大语言模型和参数高效方法。此外，我们创建了一个新分支 [`tabular_llm`](https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002FAlpaca-CoT\u002Ftree\u002Ftabular_llm)，用于构建面向表格智能任务（Table Intelligence Tasks）的表格大语言模型（Tabular LLM）。\n\n我们诚挚欢迎各位贡献尚未被我们收集的指令微调数据集（或其来源）。我们将统一格式化这些数据，使用它们训练 Alpaca 模型（以及未来早期阶段的其他大语言模型），开源相应的 [模型检查点](https:\u002F\u002Fhuggingface.co\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain)，并开展广泛的实证研究。我们希望本项目能为大语言模型的开源进程做出微薄贡献，并降低自然语言处理（NLP）研究人员的入门门槛。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_473b6eb4f8ce.jpg\" width = \"100\" height = \"100\" align=right \u002F>\n您也可以选择加入我们的微信群聊，与更多志同道合的人交流。目前群成员人数较多，无法直接通过群二维码入群，需要先联系我才能加入。\n\n## 最新动态\n- ⚠ 如果您想使用除 LoRA 以外的其他方法，请安装本项目中修改后的版本：`pip install -e .\u002Fpeft`。\n\n- 🚀12.8：已集成大语言模型 `InternLM`。\n- 🚀8.16：`lora`、`qlora` 和 `adalora` 均支持 `4bit 量化`。\n- 🚀8.16：已集成参数高效方法 `Qlora`、`Sequential adapter` 和 `Parallel adapter`。\n- 🚀7.24：已集成大语言模型 `ChatGLM v2`。\n- 🚀7.20：已集成大语言模型 `Baichuan`。\n- 6.25：新增模型评估代码，包括 belle 和 MMCU。\n\u003Cdetails>\u003Csummary> - 更多 \u003C\u002Fsummary>\n\u003Cp>\n    \n- 5.20：修复模型保存中的 bug，并增加 wandb 支持。\n- 5.15：新增更多数据集，如 `GPT4Tools`、`Auto CoT`、`pCLUE`。\n- 🚀5.5：创建新分支 [`tabular_llm`](https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002FAlpaca-CoT\u002Ftree\u002Ftabular_llm) 以构建表格大语言模型。我们收集了面向表格相关任务（如表格问答）的指令微调数据，并使用这些数据在本仓库中微调大语言模型。\n- 🚀5.4：已集成 PEFT 中的所有参数高效方法（例如 
p-tuning），可通过超参数直接设置。\n- 🚀5.4：已集成大语言模型 `MOSS`。\n- 4.21：已收集并格式化数据集 `GAOKAO`、`camel`、`FLAN-Muffin`、`COIG`。\n- 4.15：已收集并格式化数据集 `webGPT`、`dolly`、`baize`、`hh-rlhf`、`OIG(part)`。\n- 4.12：现在您可以在 \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1wfrKqyPkz5BGD1Gkij_cvbUeweIDdRav?usp=sharing\" >Google Colab\u003C\u002Fa> 上尝试 Alpaca-CoT。\n- 4.11：由 [@paulcx](https:\u002F\u002Fgithub.com\u002Fpaulcx) 新增 `多轮对话` 功能。\n- 4.9：已收集并格式化数据集 `firefly`、`instruct`、`Code Alpaca`，可在 [此处](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain) 找到。\n- 4.7：由 [@weberr](https:\u002F\u002Fgithub.com\u002Fweberrr) 新增 `参数合并`、`本地聊天`、`批量预测` 和 `Web 服务构建` 功能。\n- 4.4：已收集并格式化数据集 `GPTeacher`、`Guanaco`、`HC3`、`prosocial-dialog`、`belle-chat&belle-math`、`xP3` 和 `natural-instructions`。\n- 4.3：中文 CoT 数据集 `CoT_CN_data.json` 可在 [此处](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain) 找到。\n  \n\u003C\u002Fp>\n\u003C\u002Fdetails> \n\n## 概览\n\n![img](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_1669fbcd13a2.png)\n\n\n[LLaMA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.13971v1) [1] 是一项杰出的工作，展示了强大的零样本（zero-shot）和少样本（few-shot）能力。它显著降低了训练、微调和使用具有竞争力的大语言模型的成本，例如 LLaMA-13B 的性能优于 GPT-3(175B)，而 LLaMA-65B 则可与 PaLM-540B 相媲美。近期，为了提升 LLaMA 的指令遵循能力，[Stanford Alpaca](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca) [2] 使用 [Self-Instruct](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10560) [3] 技术生成的 52K 条指令数据对 LLaMA-7B 进行了微调。然而，当前大语言模型研究社区仍面临三大挑战：1. 即使是 LLaMA-7B 对计算资源仍有较高要求；2. 开源的指令微调数据集较少；3. 缺乏对不同类型指令（如中文指令和 CoT 推理）对模型能力影响的实证研究。\n\n为此，我们提出了本项目，整合了后续提出的多种改进方法，具有以下优势：\n- 1. 本仓库包含基于 [此处](https:\u002F\u002Fgithub.com\u002Ftloen\u002Falpaca-lora) 和 [此处](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpeft) 修改的代码，通过使用 [低秩自适应（Low-Rank Adaptation, LoRA）](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2106.09685.pdf) [4]、[PEFT](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpeft) 和 [bitsandbytes](https:\u002F\u002Fgithub.com\u002FTimDettmers\u002Fbitsandbytes)，能够**_以低成本高效地微调 LLaMA_**（性能不逊于 Stanford Alpaca）。LLaMA 的 `7b`、`13b` 和 `30b` 版本均可在单张 80G A100 上轻松训练。\n- 2. 本仓库发布的模型显著**_提升了 CoT（推理）能力_**。\n- 3. 本仓库发布的模型显著**_提升了中文指令遵循能力_**。\n- 4. 本仓库包含**_[持续扩充的指令微调数据集集合](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT)_**，目前已涵盖英文、中文及 CoT 指令。此外，还提供了使用各类指令数据集训练得到的模型检查点集合。\n- 5. 本仓库**_集成了多种大语言模型并统一了其接口_**，可通过超参数轻松切换。目前已支持 **LLaMA**、**ChatGLM**[5]、**Bloom**[6] 和 **MOSS**，未来将持续增加更多模型，便于研究人员轻松调用和比较不同大语言模型。\n- 6. 本仓库**_集成了多种参数高效微调方法并统一了其接口_**，可通过超参数轻松切换。目前已支持 **LoRA**、**P-tuning**[5]、**adalora** 和 **prefix tuning**，未来将持续增加更多方法，便于研究人员轻松调用和比较不同参数高效方法。\n- 7. 
本仓库包含**_广泛的实证研究和定性分析_**，可能提供有价值的发现，推动未来大语言模型的探索。\n\n**据我们所知，本工作是首个基于 LLaMA 和 Alpaca 研究 _CoT 推理_ 的项目。** 因此，我们将本工作简称为 `Alpaca-CoT`。\n\n## 数据收集\n\n所收集数据集的相对规模可通过下图展示：\n\n![img](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_606571b6cd76.png)\n\n\n参考 [此列表](https:\u002F\u002Fgithub.com\u002FyaodongC\u002Fawesome-instruction-dataset)（[@yaodongC](https:\u002F\u002Fgithub.com\u002FyaodongC)），我们根据以下规则对每个收集的数据集进行了标注：\n\n(Lang) 语言标签（Lingual-Tags）:\n- EN: 英文指令数据集（Instruction datasets in English）\n- CN: 中文指令数据集（Instruction datasets in Chinese）\n- ML: 多语言指令数据集（[Multi-lingual] Instruction datasets in multiple languages）\n\n(Task) 任务标签（Task-Tags）:\n- MT: 多任务数据集（[Multi-task] Datasets containing multiple tasks）\n- TS: 特定任务数据集（[Task-specific] Datasets tailored for specific tasks）\n\n(Gen) 生成方法（Generation-method）:\n- HG: 人工生成数据集（[Human Generated Dataset] Datasets created by humans）\n- SI: 自指令生成数据集（[Self-Instruct] Datasets generated using self-instruct methods）\n- MIX: 混合数据集（[Mixed Dataset] Dataset contains both human and machine generated data）\n- COL: 数据集集合（[Collection of Dataset] Dataset made from a collection of other datasets）\n\n### 统计信息\n\n| 数据集（Dataset）                                                              | 数量（Nums） | 语言（Lang） | 任务类型（Task） | 生成方式（Gen） | 类型（Type）                                                                                                      | 来源（Src）                                                                    | 下载链接（Url）                                                                                 |\n| :----------------------------------------------------------------------------- | :------- | :----------- | :-------- | :----------| :---------------------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------- |\n| [Chain of Thought](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002FFLAN)                    | 74771    | EN\u002FCN        | MT        | HG         | 包含思维链（Chain-of-Thought, CoT）推理的指令数据                                                                  | 在已有数据上标注 CoT                                                           | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FChain-of-Thought)      |\n| [GPT4all](https:\u002F\u002Fgithub.com\u002Fnomic-ai\u002Fgpt4all)                                 | 806199   | EN           | MT        | COL        | 代码、故事和对话                                                                                                   | 从 GPT-3.5-turbo 蒸馏得到                                                      | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FGPT4all)               |\n| [GPTeacher](https:\u002F\u002Fgithub.com\u002Fteknium1\u002FGPTeacher)                             | 29013    | EN           | MT        | SI         | 通用、角色扮演、Toolformer                                                                                         | GPT-4 与 Toolformer                                                            | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FGPTeacher)             |\n| [Guanaco](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FJosephusCheung\u002FGuanacoDataset)      
 | 534610   | ML           | MT        | SI         | 多种语言学任务                                                                                                     | text-davinci-003                                                               | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FGuanaco)               |\n| [HC3](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FHello-SimpleAI\u002FHC3)                      | 37175    | EN\u002FCN        | TS        | MIX        | 对话评估                                                                                                           | 人类或 ChatGPT                                                                 | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FHC3)                   |\n| [alpaca](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca)                         | 52002    | EN           | MT        | SI         | 通用指令                                                                                                           | text-davinci-003                                                               | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002Falpaca)                |\n| [Natural Instructions](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fnatural-instructions)        | 5040134  | ML           | MT        | COL        | 多样化的 NLP 任务                                                                                                  | 人工标注的数据集集合                                                           | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FNatural-Instructions)  |\n| [belle_cn](https:\u002F\u002Fhuggingface.co\u002FBelleGroup)                                  | 1079517  | CN           | TS\u002FMT     | SI         | 通用、数学推理、对话                                                                                               | text-davinci-003                                                               | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002Fbelle_cn)              |\n| [instinwild](https:\u002F\u002Fgithub.com\u002FXueFuzhao\u002FInstructionWild)                     | 52191    | EN\u002FCN        | MT        | SI         | 生成、开放问答（open-QA）、头脑风暴                                                                                | text-davinci-003                                                               | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002Finstinwild)            |\n| [prosocial dialog](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fallenai\u002Fprosocial-dialog)   | 165681   | EN           | TS        | MIX        | 对话                                                                                                               | GPT-3 改写问题 + 人工手动反馈                                                  | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002Fprosocial-dialog)      |\n| [finance_en](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fgbharti\u002Ffinance-alpaca)           | 68912    | EN           | TS        | COL        | 金融相关问答                                                                                                       
| GPT3.5                                                                         | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002F)                      |\n| [xP3](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fbigscience\u002FxP3)                          | 78883588 | ML           | MT        | COL        | 覆盖 46 种语言和 16 项 NLP 任务的提示（prompt）与数据集集合                                                        | 人工标注的数据集集合                                                           | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FxP3)                   |\n| [firefly](https:\u002F\u002Fgithub.com\u002Fyangjianxin1\u002FFirefly)                             | 1649398  | CN           | MT        | COL        | 23 项 NLP 任务                                                                                                     | 人工标注的数据集集合                                                           | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002Ffirefly)               |\n| [instruct](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fswype\u002Finstruct)                     | 888969   | EN           | MT        | COL        | GPT4All、Alpaca 和开源 Meta 数据集的增强版本                                                                       | 使用 AllenAI 提供的高级 NLP 工具进行增强                                        | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002Finstruct)              |\n| [Code Alpaca](https:\u002F\u002Fgithub.com\u002Fsahil280114\u002Fcodealpaca)                       | 20022    | EN           | TS        | SI         | 代码生成、编辑、优化                                                                                               | text-davinci-003                                                               | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FCodeAlpaca)            |\n| [Alpaca_GPT4](https:\u002F\u002Fgithub.com\u002FInstruction-Tuning-with-GPT-4\u002FGPT-4-LLM)      | 52002    | EN\u002FCN        | MT        | SI         | 通用指令                                                                                                           | 使用 GPT-4 基于 Alpaca 生成                                                    | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FalpacaGPT4)            |\n| [webGPT](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopenai\u002Fwebgpt_comparisons)            | 18994    | EN           | TS        | MIX        | 信息检索（IR）问答                                                                                                 | 微调后的 GPT-3，每条指令有两个输出，选择更优者                                 | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FwebGPT)                |\n| [dolly 2.0](https:\u002F\u002Fgithub.com\u002Fdatabrickslabs\u002Fdolly)                           | 15015    | EN           | TS        | HG         | 封闭式问答、摘要等，参考维基百科                                                                                   | 人工标注                                                                       | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002Fdolly)                 |\n| 
[baize](https:\u002F\u002Fgithub.com\u002Fproject-baize\u002Fbaize-chatbot)                        | 653699   | EN           | MT        | COL        | 来自 Alpaca、Quora、StackOverFlow 和 MedQuAD 的问题集合                                                            | 人工标注的数据集集合                                                           | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002Fbaize)                 |\n| [hh-rlhf](https:\u002F\u002Fgithub.com\u002Fanthropics\u002Fhh-rlhf)                               | 284517   | EN           | TS        | MIX        | 对话                                                                                                               | 人类与 RLHF 模型之间的对话                                                     | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002Fhh-rlhf)               |\n| [OIG(part)](https:\u002F\u002Flaion.ai\u002Fblog\u002Foig-dataset\u002F)                                | 49237    | EN           | MT        | COL        | 来自多种任务（如问答）的数据                                                                                       | 使用数据增强和人工标注的数据集集合                                             | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FOIG)                   |\n| [GAOKAO](https:\u002F\u002Fgithub.com\u002FOpenLMLab\u002FGAOKAO-Bench)                            | 2785     | CN           | MT        | COL        | 考试中的选择题、填空题和开放题                                                                                     | 人工标注                                                                       | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FGAOKAO)                |\n| [camel](https:\u002F\u002Fgithub.com\u002Flightaime\u002Fcamel)                                    | 760620   | EN           | MT        | SI         | AI 社会中的角色扮演对话，涵盖代码、数学、物理、化学、生物等领域                                                   | gpt-3.5-turbo                                                                  | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002Fcamel)                 |\n| [FLAN-Muffin](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMuennighoff\u002Fflan)                | 1764800  | EN           | MT        | COL        | 60 项 NLP 任务                                                                                                     | 人工标注的数据集集合                                                           | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FFLAN-Muffin)           |\n| [COIG(FlagInstruct)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBAAI\u002FCOIG)                | 298428   | CN           | MT        | COL        | 来自考试、翻译、人类价值观对齐指令和反事实修正多轮对话的数据                                                       | 使用自动化工具并辅以人工验证                                                   | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FCOIG)                  |\n| [GPT4Tools](https:\u002F\u002Fgithub.com\u002FStevenGrove\u002FGPT4Tools)                          | 71446    | EN           | MT        | SI         | 工具相关指令集合                                                                                                   | 
gpt-3.5-turbo                                                                  | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002Fgpt4tools)             |\n| [ShareChat](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRyokoAI\u002FShareGPT52K)               | 1663241  | EN           | MT        | MIX        | 通用指令                                                                                                           | 众包收集的人类与 ChatGPT（ShareGPT）之间的对话                                 | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FShareGPT)              |\n| [Auto CoT](https:\u002F\u002Fgithub.com\u002Famazon-science\u002Fauto-cot)                         | 5816     | EN           | MT        | COL        | 算术、常识、符号及其他逻辑推理任务                                                                                 | 人工标注的数据集集合                                                           | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FAuto-CoT)              |\n| [MOSS](https:\u002F\u002Fgithub.com\u002FOpenLMLab\u002FMOSS)                                      | 1583595  | EN\u002FCN        | TS        | SI         | 通用指令                                                                                                           | text-davinci-003                                                               | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FMOSS)                  |\n| [ultrachat](https:\u002F\u002Fgithub.com\u002Fthunlp\u002FUltraChat)                               | 28247446 | EN           |           |            | 关于世界的问题、写作与创作、对现有材料的辅助                                                                       | 两个独立的 gpt-3.5-turbo                                                       | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002Fultrachat)             |\n| [Chinese-medical](https:\u002F\u002Fgithub.com\u002FToyhom\u002FChinese-medical-dialogue-data)     | 792099   | CN           | TS        | COL        | 医疗建议相关问题                                                                                                   | 爬取                                                                           | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FChinese-medical)       |\n| [CSL](https:\u002F\u002Fgithub.com\u002Fydli-ai\u002Fcsl)                                          | 396206   | CN           | MT        | COL        | 论文文本生成、关键词提取、文本摘要和文本分类                                                                       | 爬取                                                                           | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FCSL)                   |\n| [pCLUE](https:\u002F\u002Fgithub.com\u002FCLUEbenchmark\u002FpCLUE)                                | 1200705  | CN           | MT        | COL        | 通用指令                                                                                                           |                                                                                | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002FpCLUE)                 |\n| 
[news_commentary](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FP01son\u002Finstructions)         | 252776   | CN           | TS        | COL        | 翻译                                                                                                               |                                                                                | [download](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain\u002Fnews_commentary)       |\n| [StackLLaMA](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Flvwerra\u002Fstack-exchange-paired)    | todo     | EN           |           |            |                                                                                                                   |                                                                                |                                                                                                 |\n\n### 下载\n你可以从 [此处](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain) 下载所有已格式化的数据。下载后，请将它们放入 [data](https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002Falpaca-CoT\u002Ftree\u002Fmain\u002Fdata) 文件夹中。\n\n你也可以从 [此处](https:\u002F\u002Fhuggingface.co\u002FQingyiSi\u002FAlpaca-CoT\u002Ftree\u002Fmain) 下载在各类指令数据上训练好的所有模型检查点（checkpoints）。随后，只需在 `generate.py` 中将 `LoRA_WEIGHTS` 设置为本地路径，即可直接执行模型推理（inference）。\n\n### 数据格式化\n我们收集的所有数据均被统一格式化为相同的模板，其中每个样本格式如下：\n```\n[\n{\"instruction\": instruction string,\n\"input\": input string, # (可能为空)\n\"output\": output string}\n]\n```\n注意：对于思维链（Chain-of-Thought, CoT）数据集，我们首先使用 FLAN 提供的 [模板](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002FFLAN\u002Fblob\u002Fmain\u002Fflan\u002Fv2\u002Ftemplates.py) 将原始数据集转换为多种 CoT 形式，然后再转换为上述格式。格式化脚本可在 [此处](https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002Falpaca-CoT\u002Fblob\u002Fmain\u002Fdata\u002Forigin_cot_data\u002Fformating.py) 找到。\n
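\n以下是一段读取统一格式数据并做基本校验的示意代码（非仓库自带脚本，文件路径仅为示例）：\n\n```\nimport json\n\n# 读取统一格式的数据文件（示例路径）\nwith open(\"data\u002Falpaca.json\", encoding=\"utf-8\") as f:\n    samples = json.load(f)\n\n# 每条样本都应包含 instruction、input、output 三个字段，\n# 其中 input 可能为空字符串\nfor sample in samples:\n    assert set(sample) == {\"instruction\", \"input\", \"output\"}\n    assert sample[\"instruction\"]\n\nprint(f\"共 {len(samples)} 条样本\")\n```\n\n## 多接口统一平台\n### 环境设置\n```\npip install -r requirements.txt\n```\n注意：在微调 ChatGLM 时，请确保 Python 版本 >= 3.9。\n\n**PEFT**\n* 如果你想使用除 LoRA 以外的其他方法，请安装本项目中提供的修改版 PEFT：\n```\npip install -e .\u002Fpeft\n```\n\n\n### 指令微调（Instruction Finetuning）\n为了便于研究人员对大语言模型（LLMs）进行系统的指令微调（IFT）研究，我们收集了不同类型的指令数据，集成了多个 LLM，并统一了接口，方便用户灵活组合所需配置：\n- `--model_type`：设置要使用的 LLM。目前支持 [llama, chatglm, bloom, moss]。后两者具备较强的中文能力，未来将集成更多 LLM。\n- `--peft_type`：设置要使用的参数高效微调（PEFT）方法。目前支持 [lora, adalora, prefix tuning, p tuning, prompt]。\n- `--data`：设置用于 IFT 的数据类型，以灵活定制所需的指令遵循能力。例如，若需强推理能力，可设为 \"alpaca-cot\"；若需强中文能力，可设为 \"belle1.5m\"；若需代码与故事生成能力，可设为 \"gpt4all\"；若需金融相关回复能力，可设为 \"finance\"。\n- `--model_name_or_path`：用于加载目标 LLM（由 `--model_type` 指定）的不同版本模型权重。例如，若要加载 LLaMA 的 13B 版本权重，可设置为 `decapoda-research\u002Fllama-13b-hf`。\n\n**单 GPU**\n- 对于 LLaMA\n```\npython3 uniform_finetune.py --model_type llama --model_name_or_path decapoda-research\u002Fllama-7b-hf \\\n    --data alpaca-belle-cot --lora_target_modules q_proj v_proj \\\n    --per_gpu_train_batch_size 4 --learning_rate 3e-4 --epochs 1\n```\n\n注意：对于多个数据集，可使用 `--data` 如下形式：`--data .\u002Fdata\u002Falpaca.json .\u002Fdata\u002Ffinance.json \u003Cpath2yourdata_1>`\n\n- 对于 ChatGLM\n```\npython3 uniform_finetune.py --model_type chatglm --model_name_or_path THUDM\u002Fchatglm-6b \\\n    --data alpaca-belle-cot --lora_target_modules query_key_value \\\n    --lora_r 32 --lora_alpha 32 --lora_dropout 0.1 --per_gpu_train_batch_size 2 \\\n    --learning_rate 2e-5 --epochs 1\n```\n注意：`load_in_8bit` 目前尚不适用于 ChatGLM，因此 batch_size 必须比其他模型更小。\n\n- 对于 BLOOM\n```\npython3 uniform_finetune.py --model_type bloom 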
--model_name_or_path bigscience\u002Fbloomz-7b1-mt \\\n    --data alpaca-belle-cot --lora_target_modules query_key_value \\\n    --per_gpu_train_batch_size 4 --learning_rate 3e-4 --epochs 1\n```\n\n- 对于 MOSS\n```\npython3 uniform_finetune.py --model_type moss --model_name_or_path fnlp\u002Fmoss-moon-003-sft \\\n    --data alpaca --lora_target_modules q_proj v_proj --per_gpu_train_batch_size 1 \\\n    --learning_rate 3e-4 --epochs 3\n```\n\n- 对于 InternLM\n```\npython3 uniform_finetune.py --model_type internlm --model_name_or_path internlm\u002Finternlm-7b \\\n    --data alpaca --lora_target_modules q_proj v_proj --lora_r 32 --lora_alpha 32 \\\n    --lora_dropout 0.1 --per_gpu_train_batch_size 1 --learning_rate 2e-5 --epochs 1 \\\n    --compute_dtype=\"fp32\"\n```\n\n注意：你也可以将本地路径（保存 LLM 权重的位置）传给 `--model_name_or_path`。数据类型 `--data` 可根据你的兴趣自由设置。\n\n**多 GPU**\n``` bash\ntorchrun --nnodes 1 --nproc_per_node $ngpu uniform_finetune.py $args --data $data \n```\n\n- 对于 LLaMA\n```\npython3 -m torch.distributed.launch --nproc_per_node 4 \\\n    --nnodes=1 --node_rank=0 --master_addr=xxx --master_port=yyy uniform_finetune.py \\\n    --model_type llama --model_name_or_path decapoda-research\u002Fllama-7b-hf \\\n    --data alpaca-belle-cot --lora_target_modules q_proj v_proj \\\n    --per_gpu_train_batch_size 4 --learning_rate 3e-4 --epochs 1\n```\n- 对于 ChatGLM\n```\npython3 -m torch.distributed.launch --nproc_per_node 4 \\\n    --nnodes=1 --node_rank=0 --master_addr=xxx --master_port=yyy \\\n    uniform_finetune.py --model_type chatglm --model_name_or_path THUDM\u002Fchatglm-6b \\\n    --data alpaca-belle-cot --lora_target_modules query_key_value \\\n    --lora_r 32 --lora_alpha 32 --lora_dropout 0.1 --per_gpu_train_batch_size 2 \\\n    --learning_rate 2e-5 --epochs 1\n```\n注意：`load_in_8bit` 目前尚不适用于 ChatGLM，因此 batch_size 必须比其他模型更小。\n\n- 对于 BLOOM\n```\npython3 -m torch.distributed.launch --nproc_per_node 4 \\\n    --nnodes=1 --node_rank=0 --master_addr=xxx --master_port=yyy \\\n    uniform_finetune.py --model_type bloom --model_name_or_path bigscience\u002Fbloomz-7b1-mt \\\n    --data alpaca-belle-cot --lora_target_modules query_key_value \\\n    --per_gpu_train_batch_size 4 --learning_rate 3e-4 --epochs 1\n```\n\n- 对于 InternLM\n```\npython3 -m torch.distributed.launch --nproc_per_node 4 \\\n    --nnodes=1 --node_rank=0 --master_addr=xxx --master_port=yyy \\\n    uniform_finetune.py --model_type internlm --model_name_or_path internlm\u002Finternlm-7b \\\n    --data alpaca --lora_target_modules q_proj v_proj --lora_r 32 --lora_alpha 32 \\\n    --lora_dropout 0.1 --per_gpu_train_batch_size 1 --learning_rate 2e-5 --epochs 1 \\\n    --compute_dtype=\"fp32\"\n```\n\n\n\n### 推理（Inference）\n```\npython3 generate.py --data alpaca-belle-cot --model_type llama\n\npython3 generate.py --data alpaca-belle-cot --model_type chatglm\n\npython3 generate.py --data alpaca-belle-cot --model_type bloom\n\n```\n有关指令微调和推理的更多细节，可参考我们所基于修改的项目 [此处](https:\u002F\u002Fgithub.com\u002Ftloen\u002Falpaca-lora)。注意：`saved-xxx7b` 文件夹是 LoRA 权重的保存路径，而 LLaMA 权重会自动从 Hugging Face 下载。\n
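\n下一节「推理超参数说明」中列出的生成参数，大致可按如下方式传入 transformers 的生成接口（示意代码，假设模型与输入已按上文加载；`generate.py` 的实际调用以仓库代码为准）：\n\n```\nfrom transformers import GenerationConfig\n\n# 与下方说明一致的一组生成超参数\ngeneration_config = GenerationConfig(\n    top_p=0.9,\n    temperature=1.0,\n    do_sample=True,\n    no_repeat_ngram_size=6,\n    repetition_penalty=1.8,\n)\n\n# 假设 model 与 input_ids 已就绪：\n# output_ids = model.generate(input_ids, generation_config=generation_config)\n```\n\n### 推理超参数说明\n```\ntop_p=0.9,\n        #适度提高 nucleus sampling（核采样）的概率阈值，以增加候选 token 的数量，从而提升生成多样性。\n\ntemperature=1.0,\n        #之前较低的 temperature 参数可能导致生成词的概率分布严重极化，使生成策略退化为贪心解码（greedy decoding）。\n\ndo_sample=True,\n        #do_sample 参数默认为 False。设为 True 后，生成方法将转变为 beam-search multinomial sampling（束搜索多项式采样）解码策略。\n\nno_repeat_ngram_size=6,\n        #将下一个重复 n-gram 的概率设为 0，确保不会出现重复的 n-gram。此设置为经验性初步探索。\n\nrepetition_penalty=1.8,\n        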
#对于先前已出现过的词，在后续预测过程中通过引入 repetition_penalty（重复惩罚）参数降低其再次出现的概率。此设置为经验性初步探索。\n```\n\n\n### 参数合并\n```\npython3 merge.py --model_type llama --size 7b --lora_dir xxx --merged_dir yyy\n```\n\n### 本地聊天\n```\npython3 server.py --model_type chatglm --size 6b --lora_dir xxx\n```\n### 批量预测\n```\npython3 predict.py --model_type chatglm --size 6b --data for_dict_data --lora_dir xxx --result_dir yyy\n```\n\n### Web 服务部署\n\n```\npython3 web.py --model_type chatglm --size 6b --lora_dir xxx\n```\n\n## 中文指令微调开源大语言模型的实证研究（截至 6 月 25 日）\n\u003Cdetails>\u003Csummary>注：以下实验结果均来自《An Empirical Study of Instruction-tuning Large Language Models in Chinese》（中文指令微调大语言模型实证研究）。\u003C\u002Fsummary>\n\u003Cp>\n\n### 1. 基准测试（Benchmarks）\n本文选取了两个评估基准 Belle-eval 和 MMCU，以全面评估大语言模型（LLM）在中文场景下的能力。\n\nBelle-eval 是通过 ChatGPT 自我指导（self-instruct）构建的，包含 1,000 条多样化的指令，涵盖 10 个类别，包括常见的 NLP 任务（如问答 QA）和具有挑战性的任务（如代码和数学）。我们使用 ChatGPT 根据标准答案对模型回复进行评分。该基准被视为对通用人工智能（AGI）指令遵循能力的评估。\n\nMMCU 是一个中文多项选择题数据集，涵盖医学、法律、心理学和教育四个专业学科（例如高考题目）。它允许大语言模型以多项选择题的形式参与人类社会的考试，因此适合评估 LLM 在多学科知识广度与深度方面的能力。\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_0acc26d0020d.png\" width=\"35%\">\n\u003C\u002Fp>\n\n上表展示了 Belle-eval 和 MMCU 的数据统计信息。\n\n### 2. 主要影响因素\n我们通过实验研究了指令微调大语言模型中的三个主要因素：基础大语言模型（LLM bases）、参数高效微调方法（Parameter-efficient Methods）和中文指令数据集（Chinese Instruction Datasets）。\n\n#### 2.1 基础大语言模型（LLM Bases）\n针对开源大语言模型，我们在 Belle-eval 和 MMCU 上分别测试了现有 LLM 以及在 Alpaca-GPT4 数据集上使用 LoRA 微调后的 LLM。\n\n   \u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_957c8ddf0b7a.png\" width=\"80%\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_add64bb6aada.png\" width=\"40%\">\n\u003C\u002Fp>\n\n表 2 展示了开源 LLM 在 Belle-eval 上的得分，表 3 展示了 LLM 在 MMCU 上的准确率。所有开源 LLM 均使用相同的参数高效方法 LoRA 和相同的指令数据集 Alpaca-GPT4 进行微调。\n\n___实验结果：___  \n1. 现有 LLM 的评估\n\n    ___Belle-eval 上的表现___\n\n    (1) 在基础 LLM 中，Bloom 表现最佳。\n\n    (2) 在经过监督微调（sft）的 LLM 中，ChatGLM 显著优于其他模型，这得益于其使用了最多的中文 token 和人类反馈强化学习（HFRL, Human Feedback Reinforcement Learning）。\n\n    (3) 开放式问答（Open QA）、数学（Math）、封闭式问答（CloseQA）和信息抽取（Extract）等类别对现有开源 LLM 仍然极具挑战性。\n\n    (4) Vicuna 和 moss-sft 相较于其基础模型 LLaMA 和 moss-base 分别有明显提升。\n\n    (5) 相比之下，sft 模型 Bloomz 和 Bloomz-mt 的表现反而低于基础模型 Bloom，因为它们倾向于生成更短的回复。\n\n    ___MMCU 上的表现___\n\n    (1) 所有基础 LLM 表现都很差，因为在微调前几乎无法按指定格式生成内容（例如输出选项编号）。\n\n    (2) 所有 sft LLM 均优于其对应的基础 LLM。其中 Bloomz 表现最佳（甚至超过 ChatGLM），因为它能直接按要求生成选项编号，而不产生无关内容，这也与其监督微调数据集 xP3 的数据特性有关。\n\n    (3) 在四个学科中，法律对 LLM 最具挑战性。\n\n   \u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_2a6adbf6b511.png\" width=\"40%\">\n\u003C\u002Fp>\n\n图 1 展示了在 Alpaca-GPT4-zh 数据集上进行指令微调后 LLM 的性能结果。\n\n2. 
对不同 LLM 进行指令微调\n\n    (1) 在 Belle-eval 上，除 Bloomz 和 Bloomz-mt 外，sft LLM 经指令微调带来的性能提升不如基础 LLM 显著。\n\n    (2) Vicuna 和 ChatGLM 在指令微调后性能下降，因为 Vicuna 是基于真实的人类-ChatGPT 对话训练的，其数据质量优于 Alpaca-GPT4；而 ChatGLM 采用了 HFRL，可能不再适合进一步的指令微调。\n\n    (3) 在 MMCU 上，大多数 LLM 在指令微调后性能均有提升，但 Bloomz 和 Bloomz-mt 却意外地出现了显著性能下降。\n\n    (4) 指令微调后，Bloom 在两个基准上均有显著提升且表现良好。尽管 ChatGLM 始终优于 Bloom，但在指令微调过程中出现了性能下降。因此，在所有开源 LLM 中，Bloom 最适合作为后续中文指令微调探索的基础模型。\n\n#### 2.2 参数高效微调方法（Parameter-efficient Methods）\n除了 LoRA 之外，本文还收集了多种参数高效微调方法，并在 Alpaca-GPT4 数据集上对 Bloom 模型进行了指令微调。\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_0eb1d6ebf45a.png\" width=\"40%\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_1ec45380c7b2.png\" width=\"40%\">\n\u003C\u002Fp>\n\n___实验结果：___\n\n1. 参数高效方法对比\n\n    (1) SadapterH 在所有参数高效方法中表现最佳，可作为 LoRA 的替代方案。\n\n    (2) P-tuning 和 prompt-tuning 显著落后于其他方法，表明仅在嵌入层（embedding layer）添加可训练参数不足以支持 LLM 完成生成任务。\n\n    (3) 尽管 AdaLoRA 是 LoRA 的改进版本，但其性能明显下降，可能是因为 LLM 所需的 LoRA 可训练参数量已不适合进一步缩减。\n\n(4) 对比上下两部分可以看出，增加串行适配器（即 SadapterP 和 SadapterH）的可训练参数数量并未带来性能提升，而并行适配器（即 P-adapter）则呈现出相反的现象。\n\n2. 训练损失（Training Loss）\n\n    (1) Prompt-tuning 和 P-tuning 收敛最慢，且收敛后的损失最高。这表明仅调整嵌入层（embedding-only）的适配器不适合用于大语言模型（LLM）的指令微调（instruction-tuning）。\n\n    (2) AdaLoRA 的初始损失非常高，因为它需要同时学习参数预算分配（parameter budget allocation），导致模型难以很好地拟合训练数据。\n\n    (3) 其他方法能够快速在训练数据上收敛，并实现良好的拟合效果。\n\n#### 2.3 中文指令数据集（Chinese instruction Datasets）\n\n为了研究不同类型中文指令数据集的影响，作者收集了流行的开源中文指令数据（如表 5 所示），并使用 LoRA 对 Bloom 模型进行微调。\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_a9dbbd10c232.png\" width=\"80%\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_d45c1b6b8541.png\" width=\"80%\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_1e7b291a4719.png\" width=\"40%\">\n\u003C\u002Fp>\n\n表 6 和表 7 展示了 Bloom 在不同指令数据集上的微调结果。\n\n___实验结果（Experimental Results）：___\n\n1. Belle-eval 上的表现\n\n    (1) 由 ChatGPT 构建的指令数据（例如通过自指令方法生成，或收集真实的人类与 ChatGPT 对话）始终能显著提升模型的指令遵循能力，得分提高 3.1 ∼ 11 分。\n\n    (2) 在这些数据集中，Belle 表现最佳，因其包含的指令数据量最大。然而，以类似方式构建但数据量更多的 moss-sft-data 所训练的模型表现却不理想。\n\n    (3) Alpaca-GPT4 指令带来的性能提升位居第二，仅用 49K 条指令即可媲美拥有 1.54M 条指令的 Belle。\n\n    (4) Instinwild 带来的性能增益最小，因为它从推特（“in wild”）爬取的种子指令不如 Alpaca 等由人工精心设计的指令全面。\n\n    (5) 这些基于 ChatGPT 的数据主要对开放生成任务（如头脑风暴 Brain Storm 和文本生成 Generation）有显著提升效果，但在需要高阅读理解能力的任务（如封闭式问答 Close QA 和信息抽取 Extract）上表现明显下降。\n\n    (6) 这些指令数据反而损害了模型的指令遵循能力，因为各类 NLP 或考试类数据集的形式和意图较为单一，容易导致过拟合。\n\n    (7) 其中，COIG-trans 表现最佳，因其涵盖超过 2000 种不同任务，任务指令形式丰富多样。相比之下，xP3 和 COIG-ccmc 对模型性能的负面影响最大。前者仅覆盖少量任务类型（翻译和问答），后者则专注于反事实修正对话（counterfactual correction conversations），均未能覆盖人类常用的主流指令和任务。\n\n2. MMCU 上的表现\n\n    (1) 在每个数据集上进行指令微调均能带来性能提升。\n\n    (2) 在上半部分所示的基于 ChatGPT 的数据中，ShareGPT-zh 的表现远低于其他数据集。这可能是因为真实用户很少就学术主题提出多项选择题。\n\n    (3) 在下半部分所示的数据集收集类数据中，HC3 和 COIG-ccmc 的准确率最低。原因在于 HC3 的独特问题仅有 13K 条，而 COIG-ccmc 的任务格式与 MMCU 差异显著。\n\n    (4) COIG-exam 带来了最大的准确率提升，得益于其任务格式与 MMCU 高度相似。\n\n### 3. 
其他因素  \n本节进一步分析四个其他因素：思维链（CoT, Chain-of-Thought）、中文词汇表扩展、提示语言（Language of Prompts）以及人类价值观对齐（Human-value Alignment）。\n\n#### 3.1 思维链（CoT）\n在 CoT 方面，作者比较了在指令微调（instruction-tuning）过程中加入 CoT 数据前后的模型性能。\n\n___实验设置：___\n\n我们从 FLAN 中收集了 9 个 CoT 数据集及其对应的提示（prompts），并使用 Google Translate 将其翻译为中文。随后，比较在指令微调中加入 CoT 数据前后的性能差异。\n\n首先，将加入 CoT 数据的方式标记为 “Alpaca-GPT4+CoT”。此外，在每条指令末尾添加一句“先思考，再决定”（即英文中的 \"think step by step\"），以引导模型基于 CoT 进行回答，并将这种方式标记为 “Alpaca-GPT4+CoT*”。\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_8af41667a4cb.png\" width=\"40%\">\n\u003C\u002Fp>\n\n___实验结果：___ \n\n1. “Alpaca-GPT4+CoT” 在需要强推理能力的代码（Code）和数学（Math）任务上优于 “Alpaca-GPT4”。此外，在 MMCU 教育（Education）任务上也有显著提升。\n\n2. 如 “Alpaca-GPT4+CoT*” 所示，仅添加这一简单句子即可进一步提升代码和教育类推理任务的性能，但数学任务的表现略逊于 “Alpaca-GPT4+CoT”。这可能需要进一步探索更鲁棒的提示设计。\n\n#### 3.2 中文词汇表扩展\n在中文词汇表扩展方面，作者测试了分词器（tokenizer）词汇表中中文 token 数量对大语言模型（LLMs）中文表达能力的影响。例如，若某个汉字在词汇表中，则可用单个 token 表示；否则可能需要多个 token 来表示。\n\n___实验设置：___ 作者主要在 LLaMA 上进行实验。LLaMA 使用 SentencePiece 分词器（词汇表大小为 32K），其覆盖的中文字符少于 Bloom（词汇表大小为 250K）。\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_a8f7bfa8b538.png\" width=\"45%\">\n\u003C\u002Fp>\n\n___实验结果：___\n\n1. 在扩大中文词汇表的基础上，使用更多中文语料进行预训练，始终有助于提升模型的指令遵循能力。\n\n2. 与直觉相反，“llama-voc-pre-l”（100B）在 MMCU 上的表现不如 “llama-voc-pre”（20B），这表明在学术考试任务上，更多的预训练数据未必带来更高的性能。\n\n
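词汇表覆盖度的差异可以直接从分词结果观察。下面的小脚本仅作示意（模型名取自前文出现过的仓库，实际使用的 tokenizer 以各模型官方发布为准），统计同一句中文在不同分词器下的 token 数：\n\n```python\nfrom transformers import AutoTokenizer\n\ntext = \"今天天气真好，我们一起去公园散步吧。\"\n\n# 词表较小的 LLaMA（32K）会把未覆盖的汉字切成多个字节级 token，\n# 而词表更大的 Bloom（250K）对多数汉字可用单个 token 表示\nfor name in [\"decapoda-research\u002Fllama-7b-hf\", \"bigscience\u002Fbloom-7b1\"]:\n    tok = AutoTokenizer.from_pretrained(name)\n    ids = tok(text)[\"input_ids\"]\n    print(name, \"token 数 =\", len(ids))\n```\n\n#### 3.3 提示语言（Language of Prompts）\n\n在提示语言方面，作者测试了使用中文提示进行指令微调的适用性。\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_277dde909c67.png\" width=\"60%\">\n\u003C\u002Fp>\n\n图 4 展示了基于 LLaMA 和 Bloom 使用中文与英文提示的结果。在对 LLaMA 进行指令微调时，使用中文提示相比英文提示在两个基准测试上均能提升性能；而在 Bloom 上则观察到相反的现象。\n\n___实验结果：___\n\n1. 对于中文能力较弱的模型（如 LLaMA），使用中文提示能有效帮助其生成中文回答。\n\n2. 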
对于中文能力较强的模型（如 Bloom），使用其更擅长的语言（英文）作为提示，能更好地引导模型理解指令微调的过程。\n\n#### 3.4 人类价值观对齐（Human-value Alignment）\n为了避免大语言模型生成有害内容，使其与人类价值观对齐是一个关键问题。我们在指令微调中加入了由 COIG 构建的人类价值观对齐数据，以探索其影响。\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_cd00a707ac64.png\" width=\"30%\">\n\u003C\u002Fp>\n\n图 5 比较了在指令微调中加入与不加入人类价值观对齐数据的结果。\n\n___实验结果：___ 加入人类价值观对齐会导致性能略有下降。如何在未来研究中平衡大语言模型的无害性（harmlessness）与性能，是一个值得探索的方向。\n\n\n\u003C\u002Fp>\n\u003C\u002Fdetails> \n\n## 定量分析\n\u003Cdetails>\u003Csummary>注：下图展示了截至 3 月 26 日所收集数据集的统计情况，仅用于说明数据收集的动机。此后已收集更多数据集，例如金融相关的指令数据集。\u003C\u002Fsummary>\n\u003Cp>\n\n![data collection statistics](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_4a1bed34cc83.png)\n当前收集的指令微调数据集主要包括以下三部分：\n- `alpaca_data_cleaned.json`：约 5.2 万条英文指令遵循训练样本。\n- `CoT_data.json`：9 个 CoT 数据集，共约 7.5 万条样本。（由 FLAN[7] 发布）\n- `belle_data_cn.json`：约 50 万条中文指令遵循训练样本。（由 BELLE[8] 发布）\n\n### CoT 与中文指令的消融实验（Ablation of CoT and Chinese Instructions）\n\n\n![ablation-cot](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_49d95f4ba257.png)\n“w\u002Fo CoT” 和 “w\u002Fo CN” 分别表示在指令微调数据中排除 CoT 数据和中文指令的模型。\n\n上表展示了两个需要一定推理能力（涉及数值计算）才能正确回答的示例。  \n如中间列所示，`Ours w\u002Fo CoT` 未能生成正确回答，表明一旦微调数据中不含 CoT 数据，模型的推理能力会显著下降。这进一步证明 CoT 数据对大语言模型至关重要。\n\n![ablation-cot](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_c1396b21bd1d.png)\n\n上表展示了两个需要具备中文指令响应能力的示例。  \n如右列所示，`Ours w\u002Fo CN` 要么生成的内容不合理，要么用英文回答中文指令。这表明在微调过程中移除中文数据会导致模型无法处理中文指令，进一步证明了收集中文指令微调数据的必要性。\n\n![ablation-cot](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_f4b18d68f3c4.png)\n\n上表展示了一个相对困难的示例，既需要一定的中国历史知识积累，又要求具备逻辑完整地陈述历史事件的能力。如表所示，`Ours w\u002Fo CN` 仅能生成简短且错误的回答，这是由于缺乏中文微调数据，自然也缺少相应的中国历史知识。尽管 `Ours w\u002Fo CoT` 列出了一些相关的中国历史事件，但其表述逻辑自相矛盾，这是由于缺少 CoT 数据所致。\n\n**综上所述，使用我们完整的数据集（包含英文、中文和 CoT 指令数据）进行微调的模型，能显著提升模型的推理能力和中文指令遵循能力。**\n\n### CoT 数据的效果\n\n![CoT-comparison](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_db6d5ab2a00b.png)  \n奇数行的样本未使用 CoT（Chain-of-Thought，思维链）提示，例如“逐步推理”。`Ours(w\u002FCoT)` 和 Alpaca 均基于 LLaMA-7B，两者唯一的区别在于 `Ours(w\u002FCoT)` 的指令微调数据中额外包含了 CoT 数据。\n\n从上表中，我们发现：\n- `Ours(w\u002FCoT)` 总是在答案前生成正确的推理过程，而 Alpaca 无法生成任何合理的推理过程，如前四个例子（常识性问题）所示。这表明使用 CoT 数据进行微调可以显著提升模型的推理能力。\n- 对于 `Ours(w\u002FCoT)`，CoT 提示（例如在输入问题前拼接“step-by-step”）对简单问题（如常识性问题）影响较小，但对复杂问题（如后四个需要推理的问题）有显著效果。\n- 对于 Alpaca，CoT 提示始终效果甚微，甚至产生负面影响。在最后两个例子中，加入 CoT 提示后，Alpaca 将原本正确的答案改成了错误答案。这可能是由于微调阶段和推理阶段的输入形式不一致所致。\n\n\n### 中文指令数据的效果\n\n_对中文指令响应的定量比较。_  \n![CN_compare_CN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_20630df5f6a2.png)\n\n我们的模型基于 7B LLaMA，在 52K 条英文指令和 50 万条中文指令上进行了微调。Stanford Alpaca（我们复现的版本）基于 7B LLaMA，在 52K 条英文指令上微调。BELLE 则基于 7B BLOOM，在 200 万条中文指令上微调。\n\n从上表中可以得出以下观察：\n- 与 Alpaca 相比，`ours (w\u002F CN)` 具备更强的中文指令理解能力。在第一个例子中，Alpaca 无法区分 `instruction` 部分和 `input` 部分，而我们的模型可以正确区分。\n- 中文指令微调数据能显著增强模型用中文交互的能力。在第二个例子中，`ours (w\u002F CN)` 不仅提供了正确的代码，还附上了对应的中文注释，而 Alpaca 没有做到这一点。此外，如第 3–5 个例子所示，Alpaca 只能用英文回应中文指令。\n- 与 BELLE 相比，`ours (w\u002F CN)` 在需要开放式回答的指令上（如最后两个例子）仍有待提升。BELLE 在此类指令上的出色表现归因于：1）其 BLOOM 主干模型在预训练阶段接触了更多多语言数据；2）其中文指令微调数据量远超我们（200 万 vs 50 万）。\n\n\n_对英文指令响应的定量比较。本小节旨在探究中文指令微调是否会对 Alpaca 的英文指令遵循能力产生负面影响。_  \n![CN_compare_EN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_2b3356b2496d.png)\n\n\n从上表中，我们发现：\n- 
使用中文指令数据进行微调并不会削弱模型原有的英文指令遵循能力；相反，在生成英文指令的响应方面还有一定提升。例如在第三个例子中，`ours (w\u002F CN)` 列出了比 Alpaca 多三个省份的详细信息。\n\n\n\u003C\u002Fp>\n\u003C\u002Fdetails> \n\n\n\n\n## 引用\n如果您使用了本仓库中的数据收集、代码或实验发现，请引用本仓库。\n```\n@misc{si2023empirical,\n      title={An Empirical Study of Instruction-tuning Large Language Models in Chinese}, \n      author={Qingyi Si and Tong Wang and Zheng Lin and Xu Zhang and Yanan Cao and Weiping Wang},\n      year={2023},\n      eprint={2310.07328},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n对于数据和模型，请同时引用原始数据、参数高效方法以及大语言模型（LLMs）的来源。\n\n我们特别感谢 APUS AilMe Lab 赞助了实验所需的 8 块 A100 GPU。\n\n\n\u003Cp align=\"right\">(\u003Ca href=\"#top\">回到顶部\u003C\u002Fa>)\u003C\u002Fp>\n\n## 特别感谢所有贡献者\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002FAlpaca-CoT\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_readme_da68014e60e1.png\" \u002F>\n\u003C\u002Fa>","# Alpaca-CoT 快速上手指南\n\n## 环境准备\n\n- **操作系统**：Linux \u002F macOS（推荐 Ubuntu 20.04+）\n- **Python 版本**：≥ 3.8\n- **PyTorch 版本**：≥ 1.13（建议使用 CUDA 11.7\u002F11.8）\n- **显存要求**：至少 16GB（用于 LLaMA 等大模型微调；若使用 4-bit 量化可降低至 8GB）\n\n> 💡 国内用户建议使用清华源或阿里云镜像加速依赖安装：\n> ```bash\n> pip config set global.index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 安装步骤\n\n1. 克隆仓库：\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002FAlpaca-CoT.git\n   cd Alpaca-CoT\n   ```\n\n2. 安装基础依赖：\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. （可选）如需使用除 LoRA 外的其他参数高效微调方法（如 QLoRA、Adapter 等），请安装项目内定制版 PEFT：\n   ```bash\n   pip install -e .\u002Fpeft\n   ```\n\n4. 登录 Hugging Face（用于下载模型和数据集）：\n   ```bash\n   huggingface-cli login\n   ```\n   > 若在国内访问较慢，可配置 HF 镜像（如通过 `HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com`）\n\n## 基本使用\n\n以下示例展示如何使用 Alpaca-CoT 对 LLaMA 模型进行 LoRA 微调：\n\n```bash\npython src\u002Ftrain_bash.py \\\n    --model_name_or_path decapoda-research\u002Fllama-7b-hf \\\n    --dataset alpaca_cot \\\n    --do_train \\\n    --finetuning_type lora \\\n    --lora_target q_proj,v_proj \\\n    --output_dir output\u002Fllama-7b-lora \\\n    --per_device_train_batch_size 4 \\\n    --gradient_accumulation_steps 4 \\\n    --lr_scheduler_type cosine \\\n    --logging_steps 10 \\\n    --save_steps 1000 \\\n    --learning_rate 5e-5 \\\n    --num_train_epochs 3.0 \\\n    --fp16\n```\n\n> ✅ 数据集 `alpaca_cot` 已自动从 [Hugging Face 数据集库](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FQingyiSi\u002FAlpaca-CoT) 加载，包含大量思维链（Chain-of-Thought）指令数据。\n\n训练完成后，可使用以下命令进行本地推理测试：\n```bash\npython src\u002Fcli_demo.py \\\n    --model_name_or_path decapoda-research\u002Fllama-7b-hf \\\n    --adapter_name_or_path output\u002Fllama-7b-lora \\\n    --finetuning_type lora\n```","某高校 NLP 实验室的研究员正在尝试基于开源大模型（如 LLaMA）进行指令微调，目标是构建一个能生成高质量思维链（Chain-of-Thought, CoT）推理的问答系统。\n\n### 没有 Alpaca-CoT 时\n- 需要手动整合来自不同来源的 CoT 指令数据（如 FLAN、Alpaca、Self-Instruct），格式不统一，清洗和对齐耗时费力。\n- 尝试 LoRA、P-tuning 等参数高效微调方法时，每换一种方法都要重写训练脚本，代码复用率极低。\n- 切换不同大模型（如从 LLaMA 换到 Bloom）需重新适配整个训练流程，接口差异大，调试成本高。\n- 团队成员各自维护不同版本的微调代码，协作困难，难以复现彼此结果。\n\n### 使用 Alpaca-CoT 后\n- 内置统一格式的 CoT 指令数据集接口，一行配置即可加载多种来源数据，省去繁琐预处理。\n- 支持 LoRA、P-tuning 等主流参数高效方法的即插即用，只需修改配置文件即可切换，无需重写训练逻辑。\n- 提供标准化的大模型接入层，LLaMA、Bloom 等模型只需指定名称即可自动适配，训练流程完全一致。\n- 整个团队基于同一套代码库开发，PR 可直接贡献回主干，实验可复现性显著提升。\n\nAlpaca-CoT 通过统一数据、模型与微调方法的接口，大幅降低大模型指令微调的技术门槛和工程成本。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPhoebusSi_Alpaca-CoT_9692e09e.png","PhoebusSi","Qingyi 
Si","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FPhoebusSi_7c85e915.jpg",null,"UCAS","Beijing,China","https:\u002F\u002Fphoebussi.github.io\u002F","https:\u002F\u002Fgithub.com\u002FPhoebusSi",[84,88,92,96,100,104],{"name":85,"color":86,"percentage":87},"Jupyter Notebook","#DA5B0B",89.7,{"name":89,"color":90,"percentage":91},"Python","#3572A5",8.1,{"name":93,"color":94,"percentage":95},"MDX","#fcb32c",2.1,{"name":97,"color":98,"percentage":99},"Dockerfile","#384d54",0.1,{"name":101,"color":102,"percentage":103},"Shell","#89e051",0,{"name":105,"color":106,"percentage":103},"Makefile","#427819",2800,251,"2026-04-01T01:05:06","Apache-2.0",4,"Linux, macOS, Windows","需要 NVIDIA GPU，显存 8GB+（用于 4bit 量化训练如 QLoRA），CUDA 版本需与 PyTorch >=1.13 兼容（建议 CUDA 11.7+）","未说明",{"notes":116,"python":114,"dependencies":117},"项目支持多种大语言模型（如 LLaMA、ChatGLM2、Baichuan、InternLM 等）和参数高效微调方法（如 LoRA、QLoRA、Adapter 等）；若使用非 LoRA 方法，需安装项目内定制版 PEFT（pip install -e .\u002Fpeft）；支持 Google Colab 运行；首次运行需从 Hugging Face 下载数据集和模型文件。",[118,119,120,121,122,123,124,125,126],"torch>=1.13","transformers","accelerate","peft","bitsandbytes","wandb","datasets","sentencepiece","protobuf",[51,13,26],[129,130,131,132,133,134,135,136,137,138,139,140,141,142,143],"chatglm","llama","llm","lora","chatgpt","cot","instruction-tuning","alpaca","moss","p-tuning","parameter-efficient","pytorch","tabul","tabular-data","tabular-model","2026-03-27T02:49:30.150509","2026-04-06T07:12:54.428228",[147,152,157,162,167,172],{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},1514,"训练时模型输出大量重复内容（如重复表情符号或词语），如何解决？","这通常是因为训练参数设置不当导致的。建议检查以下几点：1）确保训练数据质量良好；2）不要盲目增大 batch size，过大的 batch size 可能影响收敛；3）对于约 5 万条指令数据，3 个 epoch 是合适的；4）可尝试使用默认的 per_gpu_train_batch_size 和 gradient_accumulation_steps 配置，避免因显存优化过度而影响训练稳定性。","https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002FAlpaca-CoT\u002Fissues\u002F33",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},1515,"使用 ChatGLM-6B 在 A100 80G 显卡上多卡训练时报 OOM（显存不足），即使调整 LoRA rank 为 8 或使用 torchrun 也无效，怎么办？","OOM 问题可能与数据集中每条样本长度过长有关。建议修改 uniform_finetune.py 中 ChatGLM 的 tokenize 函数，启用 truncation=True 并设置 max_length=args.cutoff_len，同时将 padding 设为 False 而非 \"max_length\"。具体代码如下：\n```python\nif \"chatglm\" in args.model_type:\n    def prompt_tokenize(prompt):\n        input_ids = tokenizer.encode(prompt,\n                                     truncation=True,\n                                     max_length=args.cutoff_len,\n                                     padding=False)\n        return {\"input_ids\": input_ids, \"labels\": copy.deepcopy(input_ids)}\n    def completion_tokenize(completion):\n        input_ids = tokenizer.encode(completion)\n        return {\"input_ids\": input_ids, \"labels\": copy.deepcopy(input_ids)}\n```","https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002FAlpaca-CoT\u002Fissues\u002F103",{"id":158,"question_zh":159,"answer_zh":160,"source_url":161},1516,"使用 MOSS 模型时出现 RuntimeError: expected scalar type Half but found Float 错误，如何解决？","该错误通常由混合精度类型不匹配引起。可尝试以下任一方案：1）将 from_pretrained 中的 load_in_8bit=True 替换为 torch_dtype=torch.float16；2）在加载模型后调用 model.half()；3）升级 CUDA、bitsandbytes 和 transformers 到最新版本。例如：\n```python\nmodel_class.model.from_pretrained(args.model_name_or_path, torch_dtype=torch.float16, device_map=device_map)\n```","https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002FAlpaca-CoT\u002Fissues\u002F106",{"id":163,"question_zh":164,"answer_zh":165,"source_url":166},1517,"CoT 中文数据集中的翻译存在不一致问题（如选项与答案描述不同），会影响训练效果吗？","确实存在部分中文 CoT 数据翻译不一致的问题，虽然语义相近，但可能影响模型复述选项的能力。项目方已对 ecqa 
数据集进行了重新处理并更新。尽管无法覆盖所有样本，但模型仍能从中学习推理逻辑，整体语义理解影响不大。","https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002FAlpaca-CoT\u002Fissues\u002F62",{"id":168,"question_zh":169,"answer_zh":170,"source_url":171},1518,"Colab 上运行模型效果差，输出大量重复内容，如何复现 README 中的效果？","Colab 上默认加载的 LoRA 权重可能是基于纯英文 instruction 数据训练的，导致中文效果不佳。建议指定使用包含中英混合数据训练的权重目录，例如：--lora_dir \"saved-alpaca-belle-cot7b\"。此外，确保使用项目提供的最新 server.py 脚本启动服务。","https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002FAlpaca-CoT\u002Fissues\u002F52",{"id":173,"question_zh":174,"answer_zh":175,"source_url":176},1519,"多 GPU 训练保存的 LoRA 权重，在单 GPU 上推理时加载失败，报错 shape 不匹配（如 lora_A.weight 形状不一致），如何解决？","该问题通常是因为多卡训练时 LoRA 的 rank（r 参数）与单卡推理时使用的配置不一致。请确保推理时使用的 generate.py 脚本与训练时的 LoRA 配置（特别是 r 值）完全一致。若问题仍存在，建议使用项目最新版的 generate.py 文件，并确认未手动修改 LoRA 相关超参。","https:\u002F\u002Fgithub.com\u002FPhoebusSi\u002FAlpaca-CoT\u002Fissues\u002F36",[]]