[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-01-ai--Yi":3,"tool-01-ai--Yi":64},[4,17,25,39,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":10,"last_commit_at":23,"category_tags":24,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":26,"name":27,"github_repo":28,"description_zh":29,"stars":30,"difficulty_score":10,"last_commit_at":31,"category_tags":32,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[33,34,35,36,14,37,15,13,38],"图像","数据工具","视频","插件","其他","音频",{"id":40,"name":41,"github_repo":42,"description_zh":43,"stars":44,"difficulty_score":45,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[14,33,13,15,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":45,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[15,33,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":45,"last_commit_at":62,"category_tags":63,"status":16},2181,"OpenHands","OpenHands\u002FOpenHands","OpenHands 是一个专注于 AI 
驱动开发的开源平台，旨在让智能体（Agent）像人类开发者一样理解、编写和调试代码。它解决了传统编程中重复性劳动多、环境配置复杂以及人机协作效率低等痛点，通过自动化流程显著提升开发速度。\n\n无论是希望提升编码效率的软件工程师、探索智能体技术的研究人员，还是需要快速原型验证的技术团队，都能从中受益。OpenHands 提供了灵活多样的使用方式：既可以通过命令行（CLI）或本地图形界面在个人电脑上轻松上手，体验类似 Devin 的流畅交互；也能利用其强大的 Python SDK 自定义智能体逻辑，甚至在云端大规模部署上千个智能体并行工作。\n\n其核心技术亮点在于模块化的软件智能体 SDK，这不仅构成了平台的引擎，还支持高度可组合的开发模式。此外，OpenHands 在 SWE-bench 基准测试中取得了 77.6% 的优异成绩，证明了其解决真实世界软件工程问题的能力。平台还具备完善的企业级功能，支持与 Slack、Jira 等工具集成，并提供细粒度的权限管理，适合从个人开发者到大型企业的各类用户场景。",70612,"2026-04-05T11:12:22",[15,14,13,36],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":100,"forks":101,"last_commit_at":102,"license":103,"difficulty_score":45,"env_os":104,"env_gpu":105,"env_ram":104,"env_deps":106,"category_tags":109,"github_topics":110,"view_count":112,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":113,"updated_at":114,"faqs":115,"releases":144},2282,"01-ai\u002FYi","Yi","A series of large language models trained from scratch by developers @01-ai","Yi 是由零一万物（01.AI）从头训练的一系列开源大型语言模型，旨在打造下一代强大的双语人工智能助手。它基于 3 万亿 token 的多语言语料库进行训练，在语言理解、常识推理及阅读理解等核心能力上表现卓越，有效解决了高质量开源模型稀缺以及中英文场景下性能不平衡的痛点。\n\n在技术亮点方面，Yi 系列模型在全球权威评测中屡获佳绩。其中，Yi-34B-Chat 曾在 AlpacaEval 榜单上位居全球第二，仅次于 GPT-4 Turbo，超越了包括 Claude 和 Mixtral 在内的众多知名模型；其基座模型也在开源界名列前茅，展现了极高的智能水平。此外，项目提供了完善的生态支持，涵盖从快速部署、量化压缩到微调训练的全流程工具链，并兼容多种主流推理框架。\n\nYi 非常适合各类用户群体：开发者可将其作为构建智能应用的坚实基座，轻松集成到现有系统中；研究人员能利用其开源权重深入探索大模型机制或进行垂直领域微调；企业用户则可借助其出色的双语能力优化客服、翻译等业务场景。无论是希望体验顶尖 AI 技术的普通爱好者，还是追求高效落地的专业团队，Yi 都是一个值得信赖的选择。","\u003Cp align=\"left\">\n    &nbspEnglish&nbsp | &nbsp; \u003Ca href=\"README_CN.md\">中文\u003C\u002Fa>\n\u003C\u002Fp>\n\u003Cbr>\u003Cbr>\n\n\u003Cdiv align=\"center\">\n\n\u003Cpicture>\n  \u003Csource 
media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002F01-ai\u002FYi\u002Fmain\u002Fassets\u002Fimg\u002FYi_logo_icon_dark.svg\" width=\"200px\">\n  \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002F01-ai\u002FYi\u002Fmain\u002Fassets\u002Fimg\u002FYi_logo_icon_light.svg\" width=\"200px\"> \n  \u003Cimg alt=\"specify theme context for images\" src=\"https:\u002F\u002Fraw.githubusercontent.com\u002F01-ai\u002FYi\u002Fmain\u002Fassets\u002Fimg\u002FYi_logo_icon_light.svg\" width=\"200px\">\n\u003C\u002Fpicture>\n\n\u003C\u002Fbr>\n\u003C\u002Fbr>\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Factions\u002Fworkflows\u002Fbuild_docker_image.yml\">\n  \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Factions\u002Fworkflows\u002Fbuild_docker_image.yml\u002Fbadge.svg\">\n\u003C\u002Fa>\n\u003Ca href=\"mailto:oss@01.ai\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F✉️-yi@01.ai-FFE01B\">\n\u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\n\u003Cdiv id=\"top\">\u003C\u002Fdiv>  \n\n\u003Cdiv align=\"center\">\n  \u003Ch3 align=\"center\">Building the Next Generation of Open-Source and Bilingual LLMs\u003C\u002Fh3>\n\u003C\u002Fdiv>\n\u003Cp align=\"center\">\n🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002F01-ai\" target=\"_blank\">Hugging Face\u003C\u002Fa> • 🤖 \u003Ca href=\"https:\u002F\u002Fwww.modelscope.cn\u002Forganization\u002F01ai\u002F\" target=\"_blank\">ModelScope\u003C\u002Fa> • 🟣 \u003Ca href=\"https:\u002F\u002Fwisemodel.cn\u002Forganization\u002F01.AI\" target=\"_blank\">wisemodel\u003C\u002Fa>\n\u003C\u002Fp> \n\n\u003Cp align=\"center\">\n    👩‍🚀 Ask questions or discuss ideas on \u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fdiscussions\" target=\"_blank\"> GitHub \u003C\u002Fa>\n\u003C\u002Fp> \n\n\u003Cp align=\"center\">\n    👋 Join us on \u003Ca 
href=\"https:\u002F\u002Fdiscord.gg\u002FhYUwWddeAu\" target=\"_blank\"> 👾 Discord \u003C\u002Fa> or \u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi-1.5\u002Fissues\u002F2\" target=\"_blank\"> 💬 WeChat \u003C\u002Fa>\n\u003C\u002Fp> \n\n\u003Cp align=\"center\">\n    📝 Check out \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.04652\"> Yi Tech Report \u003C\u002Fa>\n\u003C\u002Fp> \n\n\u003Cp align=\"center\">\n    📚 Grow at \u003Ca href=\"#learning-hub\"> Yi Learning Hub \u003C\u002Fa>\n\u003C\u002Fp> \n\n\u003Cp align=\"center\">\n    💪 Learn at \u003Ca href=\"https:\u002F\u002F01-ai.github.io\u002F\"> Yi Tech Blog \u003C\u002Fa>\n\u003C\u002Fp> \n\n\u003C!-- DO NOT REMOVE ME -->\n\n\u003Chr>\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>📕 Table of Contents\u003C\u002Fb>\u003C\u002Fsummary>\n\n- [What is Yi?](#what-is-yi)\n  - [Introduction](#introduction)\n  - [Models](#models)\n    - [Chat models](#chat-models)\n    - [Base models](#base-models)\n    - [Model info](#model-info)\n  - [News](#news)\n- [How to use Yi?](#how-to-use-yi)\n  - [Quick start](#quick-start)\n    - [Choose your path](#choose-your-path)\n    - [pip](#quick-start---pip)\n    - [docker](#quick-start---docker)\n    - [llama.cpp](#quick-start---llamacpp)\n    - [conda-lock](#quick-start---conda-lock)\n    - [Web demo](#web-demo)\n  - [Fine-tuning](#fine-tuning)\n  - [Quantization](#quantization)\n  - [Deployment](#deployment)\n  - [FAQ](#faq)\n  - [Learning hub](#learning-hub)\n- [Why Yi?](#why-yi)\n  - [Ecosystem](#ecosystem)\n    - [Upstream](#upstream)\n    - [Downstream](#downstream)\n      - [Serving](#serving)\n      - [Quantization](#quantization-1)\n      - [Fine-tuning](#fine-tuning-1)\n      - [API](#api)\n  - [Benchmarks](#benchmarks)\n    - [Base model performance](#base-model-performance)\n    - [Chat model performance](#chat-model-performance)\n  - [Tech report](#tech-report)\n    - [Citation](#citation)\n- [Who can use Yi?](#who-can-use-yi)\n- 
[Misc.](#misc)\n  - [Acknowledgements](#acknowledgments)\n  - [Disclaimer](#disclaimer)\n  - [License](#license)\n\n\u003C\u002Fdetails>\n\n\u003Chr>\n\n# What is Yi?\n\n## Introduction \n\n- 🤖 The Yi series models are the next generation of open-source large language models trained from scratch by [01.AI](https:\u002F\u002F01.ai\u002F).\n\n- 🙌 Targeted as a bilingual language model and trained on a 3T multilingual corpus, the Yi series models rank among the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. For example,\n  \n  - Yi-34B-Chat model **landed in second place (following GPT-4 Turbo)**, outperforming other LLMs (such as GPT-4, Mixtral, Claude) on the AlpacaEval Leaderboard (based on data available up to January 2024).\n\n  - Yi-34B model **ranked first among all existing open-source models** (such as Falcon-180B, Llama-70B, Claude) in **both English and Chinese** on various benchmarks, including Hugging Face Open LLM Leaderboard (pre-trained) and C-Eval (based on data available up to November 2023).\n  \n  - 🙏 (Credits to Llama) Thanks to the Transformer and Llama open-source communities, as they reduce the effort required to build from scratch and enable the use of the same tools within the AI ecosystem.  
\n\n  \u003Cdetails style=\"display: inline;\">\u003Csummary> If you're interested in Yi's adoption of the Llama architecture and license usage policy, see  \u003Cspan style=\"color:  green;\">Yi's relation with Llama.\u003C\u002Fspan> ⬇️\u003C\u002Fsummary> \u003Cul> \u003Cbr>\n  \n  \n> 💡 TL;DR\n> \n> The Yi series models adopt the same model architecture as Llama but are **NOT** derivatives of Llama.\n\n- Both Yi and Llama are based on the Transformer structure, which has been the standard architecture for large language models since 2018.\n\n- Grounded in the Transformer architecture, Llama has become a new cornerstone for the majority of state-of-the-art open-source models due to its excellent stability, reliable convergence, and robust compatibility. This positions Llama as the recognized foundational framework for models including Yi.\n\n- Thanks to the Transformer and Llama architectures, other models can leverage their power, reducing the effort required to build from scratch and enabling the utilization of the same tools within their ecosystems.\n\n- However, the Yi series models are NOT derivatives of Llama, as they do not use Llama's weights.\n\n  - As Llama's structure is employed by the majority of open-source models, the key factors determining model performance are training datasets, training pipelines, and training infrastructure.\n\n  - Developing in a unique and proprietary way, Yi has independently created its own high-quality training datasets, efficient training pipelines, and robust training infrastructure entirely from the ground up. This effort has led to excellent performance with Yi series models ranking just behind GPT-4 and surpassing Llama on the [Alpaca Leaderboard in Dec 2023](https:\u002F\u002Ftatsu-lab.github.io\u002Falpaca_eval\u002F). 
\n\u003C\u002Ful>\n\u003C\u002Fdetails>\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n## News \n\n\u003Cdetails>\n  \u003Csummary>🔥 \u003Cb>2024-07-29\u003C\u002Fb>: The \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FHaijian06\u002FYi\u002Ftree\u002Fmain\u002FCookbook\">Yi Cookbook 1.0 \u003C\u002Fa> is released, featuring tutorials and examples in both Chinese and English.\u003C\u002Fsummary>\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>🎯 \u003Cb>2024-05-13\u003C\u002Fb>: The \u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi-1.5\">Yi-1.5 series models \u003C\u002Fa> are open-sourced, further improving coding, math, reasoning, and instruction-following abilities.\u003C\u002Fsummary>\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>🎯 \u003Cb>2024-03-16\u003C\u002Fb>: The \u003Ccode>Yi-9B-200K\u003C\u002Fcode> is open-sourced and available to the public.\u003C\u002Fsummary>\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>🎯 \u003Cb>2024-03-08\u003C\u002Fb>: \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.04652\">Yi Tech Report\u003C\u002Fa> is published! \u003C\u002Fsummary>\n\u003C\u002Fdetails>\n\n\n\u003Cdetails open>\n  \u003Csummary>🔔 \u003Cb>2024-03-07\u003C\u002Fb>: The long text capability of the Yi-34B-200K has been enhanced. \u003C\u002Fsummary>\n  \u003Cbr>\nIn the \"Needle-in-a-Haystack\" test, the Yi-34B-200K's performance is improved by 10.5%, rising from 89.3% to an impressive 99.8%. 
We continue to pre-train the model on a 5B-token long-context data mixture and demonstrate near-all-green performance.\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>🎯 \u003Cb>2024-03-06\u003C\u002Fb>: The \u003Ccode>Yi-9B\u003C\u002Fcode> is open-sourced and available to the public.\u003C\u002Fsummary>\n  \u003Cbr>\n\u003Ccode>Yi-9B\u003C\u002Fcode> stands out as the top performer among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5 and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension.\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>🎯 \u003Cb>2024-01-23\u003C\u002Fb>: The Yi-VL models, \u003Ccode>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-VL-34B\">Yi-VL-34B\u003C\u002Fa>\u003C\u002Fcode> and \u003Ccode>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-VL-6B\">Yi-VL-6B\u003C\u002Fa>\u003C\u002Fcode>, are open-sourced and available to the public.\u003C\u002Fsummary>\n  \u003Cbr>\n  \u003Ccode>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-VL-34B\">Yi-VL-34B\u003C\u002Fa>\u003C\u002Fcode> has ranked \u003Cstrong>first\u003C\u002Fstrong> among all existing open-source models in the latest benchmarks, including \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.16502\">MMMU\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.11944\">CMMMU\u003C\u002Fa> (based on data available up to January 2024).\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>🎯 \u003Cb>2023-11-23\u003C\u002Fb>: \u003Ca href=\"#chat-models\">Chat models\u003C\u002Fa> are open-sourced and available to the public.\u003C\u002Fsummary>\n\u003Cbr>This release contains two chat models based on previously released base models, two 8-bit models quantized by GPTQ, and two 4-bit models quantized by AWQ.\n\n- `Yi-34B-Chat`\n- 
`Yi-34B-Chat-4bits`\n- `Yi-34B-Chat-8bits`\n- `Yi-6B-Chat`\n- `Yi-6B-Chat-4bits`\n- `Yi-6B-Chat-8bits`\n\nYou can try some of them interactively at:\n\n- [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002F01-ai\u002FYi-34B-Chat)\n- [Replicate](https:\u002F\u002Freplicate.com\u002F01-ai)\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>🔔 \u003Cb>2023-11-23\u003C\u002Fb>: The Yi Series Models Community License Agreement is updated to \u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fblob\u002Fmain\u002FMODEL_LICENSE_AGREEMENT.txt\">v2.1\u003C\u002Fa>.\u003C\u002Fsummary>\n\u003C\u002Fdetails>\n\n\u003Cdetails> \n\u003Csummary>🔥 \u003Cb>2023-11-08\u003C\u002Fb>: Invited test of Yi-34B chat model.\u003C\u002Fsummary>\n\u003Cbr>Application form:\n\n- [English](https:\u002F\u002Fcn.mikecrm.com\u002Fl91ODJf)\n- [Chinese](https:\u002F\u002Fcn.mikecrm.com\u002FgnEZjiQ)\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>🎯 \u003Cb>2023-11-05\u003C\u002Fb>: \u003Ca href=\"#base-models\">The base models, \u003C\u002Fa>\u003Ccode>Yi-6B-200K\u003C\u002Fcode> and \u003Ccode>Yi-34B-200K\u003C\u002Fcode>, are open-sourced and available to the public.\u003C\u002Fsummary>\n\u003Cbr>This release contains two base models with the same parameter sizes as the previous\nrelease, except that the context window is extended to 200K.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>🎯 \u003Cb>2023-11-02\u003C\u002Fb>: \u003Ca href=\"#base-models\">The base models, \u003C\u002Fa>\u003Ccode>Yi-6B\u003C\u002Fcode> and \u003Ccode>Yi-34B\u003C\u002Fcode>, are open-sourced and available to the public.\u003C\u002Fsummary>\n\u003Cbr>The first public release contains two bilingual (English\u002FChinese) base models\nwith the parameter sizes of 6B and 34B.  
Both of them are trained with 4K\nsequence length and can be extended to 32K during inference time.\n\n\u003C\u002Fdetails>\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n## Models\n\nYi models come in multiple sizes and cater to different use cases. You can also fine-tune Yi models to meet your specific requirements. \n\nIf you want to deploy Yi models, make sure you meet the [software and hardware requirements](#deployment).\n\n### Chat models\n\n| Model | Download  |\n|---|---|\n|Yi-34B-Chat\t| • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-34B-Chat)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-34B-Chat\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-34B-Chat) |\n|Yi-34B-Chat-4bits\t| • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-34B-Chat-4bits)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-34B-Chat-4bits\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-34B-Chat-4bits) |\n|Yi-34B-Chat-8bits | • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-34B-Chat-8bits)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-34B-Chat-8bits\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-34B-Chat-8bits) |\n|Yi-6B-Chat| • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-6B-Chat)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-6B-Chat\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B-Chat) |\n|Yi-6B-Chat-4bits | • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-6B-Chat-4bits)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-6B-Chat-4bits\u002Fsummary)  • [🟣 
wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B-Chat-4bits) |\n|Yi-6B-Chat-8bits\t| • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-6B-Chat-8bits)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-6B-Chat-8bits\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B-Chat-8bits) |\n\n\u003Csub>\u003Csup> - 4-bit series models are quantized by AWQ. \u003Cbr> - 8-bit series models are quantized by GPTQ \u003Cbr> - All quantized models have a low barrier to use since they can be deployed on consumer-grade GPUs (e.g., 3090, 4090). \u003C\u002Fsup>\u003C\u002Fsub>\n\n### Base models\n\n| Model | Download |\n|---|---|\n|Yi-34B| • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-34B)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-34B\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B-Chat-8bits) |\n|Yi-34B-200K|• [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-34B-200K)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-34B-200K\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B-Chat-8bits)|\n|Yi-9B|• [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-9B)  • [🤖 ModelScope](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B-Chat-8bits)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-9B)|\n|Yi-9B-200K | • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-9B-200K)  • [🤖 ModelScope](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-9B-200K)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B-Chat-8bits) |\n|Yi-6B| • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-6B)  • [🤖 
ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-6B\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B-Chat-8bits) |\n|Yi-6B-200K\t| • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-6B-200K)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-6B-200K\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B-Chat-8bits) |\n\n\u003Csub>\u003Csup> - 200k is roughly equivalent to 400,000 Chinese characters.  \u003Cbr> - If you want to use the previous version of the Yi-34B-200K (released on Nov 5, 2023), run `git checkout 069cd341d60f4ce4b07ec394e82b79e94f656cf` to download the weight. \u003C\u002Fsup>\u003C\u002Fsub>\n\n### Model info\n\n- For chat and base models\n\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Model\u003C\u002Fth>\n\u003Cth>Intro\u003C\u002Fth>\n\u003Cth>Default context window\u003C\u002Fth>\n\u003Cth>Pretrained tokens\u003C\u002Fth>\n\u003Cth>Training Data Date\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>6B series models\u003C\u002Ftd>\n\u003Ctd>They are suitable for personal and academic use.\u003C\u002Ftd>\n\u003Ctd rowspan=\"3\">4K\u003C\u002Ftd>\n\u003Ctd>3T\u003C\u002Ftd>\n\u003Ctd rowspan=\"3\">Up to June 2023\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>9B series models\u003C\u002Ftd>\n\u003Ctd>It is the best at coding and math in the Yi series models.\u003C\u002Ftd>\n\u003Ctd>Yi-9B is continuously trained based on Yi-6B, using 0.8T tokens.\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>34B series models\u003C\u002Ftd>\n\u003Ctd>They are suitable for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. 
It&#39;s a cost-effective solution that&#39;s affordable and equipped with emergent abilities.\u003C\u002Ftd>\n\u003Ctd>3T\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\n\n- For chat models\n  \n  \u003Cdetails style=\"display: inline;\">\u003Csummary>For chat model limitations, see the explanations below. ⬇️\u003C\u002Fsummary>\n   \u003Cul>\n    \u003Cbr>The released chat model has undergone exclusive training using Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to enhance the likelihood of generating higher quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training.\n\n    \u003Cbr>However, this higher diversity might amplify certain existing issues, including:\n      \u003Cli>Hallucination: This refers to the model generating factually incorrect or nonsensical information. With the model's responses being more varied, there's a higher chance of hallucinations that are not based on accurate data or logical reasoning.\u003C\u002Fli>\n      \u003Cli>Non-determinism in re-generation: When attempting to regenerate or sample responses, inconsistencies in the outcomes may occur. The increased diversity can lead to varying results even under similar input conditions.\u003C\u002Fli>\n      \u003Cli>Cumulative Error: This occurs when errors in the model's responses compound over time. As the model generates more diverse responses, the likelihood of small inaccuracies building up into larger errors increases, especially in complex tasks like extended reasoning, mathematical problem-solving, etc.\u003C\u002Fli>\n      \u003Cli>To achieve more coherent and consistent responses, it is advisable to adjust generation configuration parameters such as temperature, top_p, or top_k. 
These adjustments can help in the balance between creativity and coherence in the model's outputs.\u003C\u002Fli>\n  \u003C\u002Ful>\n  \u003C\u002Fdetails>\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n\n# How to use Yi?\n\n- [Quick start](#quick-start)\n  - [Choose your path](#choose-your-path)\n  - [pip](#quick-start---pip)\n  - [docker](#quick-start---docker)\n  - [conda-lock](#quick-start---conda-lock)\n  - [llama.cpp](#quick-start---llamacpp)\n  - [Web demo](#web-demo)\n- [Fine-tuning](#fine-tuning)\n- [Quantization](#quantization)\n- [Deployment](#deployment)\n- [FAQ](#faq)\n- [Learning hub](#learning-hub)\n\n## Quick start\n\n> **💡 Tip**: If you want to get started with the Yi model and explore different methods for inference, check out the [Yi Cookbook](https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Ftree\u002Fmain\u002FCookbook).\n\n### Choose your path\n\nSelect one of the following paths to begin your journey with Yi!\n\n ![Quick start - Choose your path](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_8ce0f0e81389.png)\n\n#### 🎯 Deploy Yi locally\n\nIf you prefer to deploy Yi models locally, \n\n  - 🙋‍♀️ and you have **sufficient** resources (for example, NVIDIA A800 80GB), you can choose one of the following methods:\n    - [pip](#quick-start---pip)\n    - [Docker](#quick-start---docker)\n    - [conda-lock](#quick-start---conda-lock)\n\n  - 🙋‍♀️ and you have **limited** resources (for example, a MacBook Pro), you can use [llama.cpp](#quick-start---llamacpp).\n\n#### 🎯 Not to deploy Yi locally\n\nIf you prefer not to deploy Yi models locally, you can explore Yi's capabilities using any of the following options.\n\n##### 🙋‍♀️ Run Yi with APIs\n\nIf you want to explore more features of Yi, you can adopt one of these methods:\n\n- Yi APIs (Yi official)\n  - [Early access has been granted](https:\u002F\u002Fx.com\u002F01AI_Yi\u002Fstatus\u002F1735728934560600536?s=20) 
to some applicants. Stay tuned for the next round of access!\n\n- [Yi APIs](https:\u002F\u002Freplicate.com\u002F01-ai\u002Fyi-34b-chat\u002Fapi?tab=nodejs) (Replicate)\n\n##### 🙋‍♀️ Run Yi in playground\n\nIf you want to chat with Yi with more customizable options (e.g., system prompt, temperature, repetition penalty, etc.), you can try one of the following options:\n\n  - [Yi-34B-Chat-Playground](https:\u002F\u002Fplatform.lingyiwanwu.com\u002Fprompt\u002Fplayground) (Yi official)\n    - Access is available through a whitelist. Welcome to apply (fill out a form in [English](https:\u002F\u002Fcn.mikecrm.com\u002Fl91ODJf) or [Chinese](https:\u002F\u002Fcn.mikecrm.com\u002FgnEZjiQ)).\n  \n  - [Yi-34B-Chat-Playground](https:\u002F\u002Freplicate.com\u002F01-ai\u002Fyi-34b-chat) (Replicate) \n\n##### 🙋‍♀️ Chat with Yi\n\n If you want to chat with Yi, you can use one of these online services, which offer a similar user experience:\n\n- [Yi-34B-Chat](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002F01-ai\u002FYi-34B-Chat) (Yi official on Hugging Face)\n  - No registration is required.\n\n- [Yi-34B-Chat](https:\u002F\u002Fplatform.lingyiwanwu.com\u002F) (Yi official beta)\n  - Access is available through a whitelist. 
Welcome to apply (fill out a form in [English](https:\u002F\u002Fcn.mikecrm.com\u002Fl91ODJf) or [Chinese](https:\u002F\u002Fcn.mikecrm.com\u002FgnEZjiQ)).\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### Quick start - pip \n\nThis tutorial guides you through every step of running **Yi-34B-Chat locally on an A800 (80G)** and then performing inference.\n\n#### Step 0: Prerequisites\n\n- Make sure Python 3.10 or a later version is installed.\n\n- If you want to run other Yi models, see [software and hardware requirements](#deployment).\n\n#### Step 1: Prepare your environment \n\nTo set up the environment and install the required packages, execute the following command.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi.git\ncd Yi\npip install -r requirements.txt\n```\n\n#### Step 2: Download the Yi model\n\nYou can download the weights and tokenizer of Yi models from the following sources:\n\n- [Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai)\n- [ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Forganization\u002F01ai\u002F)\n- [WiseModel](https:\u002F\u002Fwisemodel.cn\u002Forganization\u002F01.AI)\n\n#### Step 3: Perform inference\n\nYou can perform inference with Yi chat or base models as below.\n\n##### Perform inference with Yi chat model\n\n1. 
Create a file named  `quick_start.py` and copy the following content to it.\n\n    ```python\n    from transformers import AutoModelForCausalLM, AutoTokenizer\n\n    model_path = '\u003Cyour-model-path>'\n\n    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)\n\n    # Since transformers 4.35.0, the GPT-Q\u002FAWQ model can be loaded using AutoModelForCausalLM.\n    model = AutoModelForCausalLM.from_pretrained(\n        model_path,\n        device_map=\"auto\",\n        torch_dtype='auto'\n    ).eval()\n\n    # Prompt content: \"hi\"\n    messages = [\n        {\"role\": \"user\", \"content\": \"hi\"}\n    ]\n\n    input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')\n    output_ids = model.generate(input_ids.to('cuda'))\n    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)\n\n    # Model response: \"Hello! How can I assist you today?\"\n    print(response)\n    ```\n\n2. Run `quick_start.py`.\n\n    ```bash\n    python quick_start.py\n    ```\n\n    Then you can see an output similar to the one below. 🥳\n\n    ```bash\n    Hello! How can I assist you today?\n    ```\n\n##### Perform inference with Yi base model\n\n- Yi-34B\n\n  The steps are similar to [pip - Perform inference with Yi chat model](#perform-inference-with-yi-chat-model).\n\n  You can use the existing file [`text_generation.py`](https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Ftree\u002Fmain\u002Fdemo).\n\n  ```bash\n  python demo\u002Ftext_generation.py  --model \u003Cyour-model-path>\n  ```\n\n  Then you can see an output similar to the one below. 🥳\n\n  \u003Cdetails>\n\n  \u003Csummary>Output. ⬇️ \u003C\u002Fsummary>\n\n  \u003Cbr>\n\n  **Prompt**: Let me tell you an interesting story about cat Tom and mouse Jerry,\n\n  **Generation**: Let me tell you an interesting story about cat Tom and mouse Jerry, which happened in my childhood. 
My father had a big house with two cats living inside it to kill mice. One day when I was playing at home alone, I found one of the tomcats lying on his back near our kitchen door, looking very much like he wanted something from us but couldn’t get up because there were too many people around him! He kept trying for several minutes before finally giving up...\n\n  \u003C\u002Fdetails>\n\n- Yi-9B\n  \n  Input\n\n  ```python\n  from transformers import AutoModelForCausalLM, AutoTokenizer\n  \n  MODEL_DIR = \"01-ai\u002FYi-9B\"\n  model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype=\"auto\")\n  tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, use_fast=False)\n  \n  input_text = \"# write the quick sort algorithm\"\n  inputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\n  outputs = model.generate(**inputs, max_length=256)\n  print(tokenizer.decode(outputs[0], skip_special_tokens=True))\n  ```\n\n  Output\n\n  ```python\n  # write the quick sort algorithm\n  def quick_sort(arr):\n      if len(arr) \u003C= 1:\n          return arr\n      pivot = arr[len(arr) \u002F\u002F 2]\n      left = [x for x in arr if x \u003C pivot]\n      middle = [x for x in arr if x == pivot]\n      right = [x for x in arr if x > pivot]\n      return quick_sort(left) + middle + quick_sort(right)\n  \n  # test the quick sort algorithm\n  print(quick_sort([3, 6, 8, 10, 1, 2, 1]))\n  ```\n\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### Quick start - Docker \n\u003Cdetails>\n\u003Csummary> Run Yi-34B-chat locally with Docker: a step-by-step guide. 
⬇️\u003C\u002Fsummary> \n\u003Cbr>This tutorial guides you through every step of running \u003Cstrong>Yi-34B-Chat on an A800 GPU\u003C\u002Fstrong> or \u003Cstrong>4*4090\u003C\u002Fstrong> locally and then performing inference.\n \u003Ch4>Step 0: Prerequisites\u003C\u002Fh4>\n\u003Cp>Make sure you've installed \u003Ca href=\"https:\u002F\u002Fdocs.docker.com\u002Fengine\u002Finstall\u002F?open_in_browser=true\">Docker\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Flatest\u002Finstall-guide.html\">nvidia-container-toolkit\u003C\u002Fa>.\u003C\u002Fp>\n\n\u003Ch4> Step 1: Start Docker \u003C\u002Fh4>\n\u003Cpre>\u003Ccode>docker run -it --gpus all \\\n-v &lt;your-model-path&gt;:\u002Fmodels \\\nghcr.io\u002F01-ai\u002Fyi:latest\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>Alternatively, you can pull the Yi Docker image from \u003Ccode>registry.lingyiwanwu.com\u002Fci\u002F01-ai\u002Fyi:latest\u003C\u002Fcode>.\u003C\u002Fp>\n\n\u003Ch4>Step 2: Perform inference\u003C\u002Fh4>\n    \u003Cp>You can perform inference with Yi chat or base models as below.\u003C\u002Fp>\n\n\u003Ch5>Perform inference with Yi chat model\u003C\u002Fh5>\n    \u003Cp>The steps are similar to \u003Ca href=\"#perform-inference-with-yi-chat-model\">pip - Perform inference with Yi chat model\u003C\u002Fa>.\u003C\u002Fp>\n    \u003Cp>\u003Cstrong>Note\u003C\u002Fstrong> that the only difference is to set \u003Ccode>model_path = '&lt;your-model-mount-path&gt;'\u003C\u002Fcode> instead of \u003Ccode>model_path = '&lt;your-model-path&gt;'\u003C\u002Fcode>.\u003C\u002Fp>\n\u003Ch5>Perform inference with Yi base model\u003C\u002Fh5>\n    \u003Cp>The steps are similar to \u003Ca href=\"#perform-inference-with-yi-base-model\">pip - Perform inference with Yi base model\u003C\u002Fa>.\u003C\u002Fp>\n    \u003Cp>\u003Cstrong>Note\u003C\u002Fstrong> that the only difference is to set \u003Ccode>--model 
&lt;your-model-mount-path&gt;\u003C\u002Fcode> instead of \u003Ccode>--model &lt;your-model-path&gt;\u003C\u002Fcode>.\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n### Quick start - conda-lock\n\n\u003Cdetails>\n\u003Csummary>You can use \u003Ccode>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fconda\u002Fconda-lock\">conda-lock\u003C\u002Fa>\u003C\u002Fcode> to generate fully reproducible lock files for conda environments. ⬇️\u003C\u002Fsummary>\n\u003Cbr>\nYou can refer to \u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fblob\u002Febba23451d780f35e74a780987ad377553134f68\u002Fconda-lock.yml\">conda-lock.yml\u003C\u002Fa> for the exact versions of the dependencies. Additionally, you can utilize \u003Ccode>\u003Ca href=\"https:\u002F\u002Fmamba.readthedocs.io\u002Fen\u002Flatest\u002Fuser_guide\u002Fmicromamba.html\">micromamba\u003C\u002Fa>\u003C\u002Fcode> for installing these dependencies.\n\u003Cbr>\nTo install the dependencies, follow these steps:\n\n1. Install micromamba by following the instructions available \u003Ca href=\"https:\u002F\u002Fmamba.readthedocs.io\u002Fen\u002Flatest\u002Finstallation\u002Fmicromamba-installation.html\">here\u003C\u002Fa>.\n\n2. Execute \u003Ccode>micromamba install -y -n yi -f conda-lock.yml\u003C\u002Fcode> to create a conda environment named \u003Ccode>yi\u003C\u002Fcode> and install the necessary dependencies.\n\u003C\u002Fdetails>\n\n\n### Quick start - llama.cpp\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_llama.cpp.md\">The following tutorial \u003C\u002Fa> will guide you through every step of running a quantized model (\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FXeIaso\u002Fyi-chat-6B-GGUF\u002Ftree\u002Fmain\">Yi-chat-6B-2bits\u003C\u002Fa>) locally and then performing inference.\n\u003Cdetails>\n\u003Csummary> Run Yi-chat-6B-2bits locally with llama.cpp: a step-by-step guide. 
⬇️\u003C\u002Fsummary> \n\u003Cbr>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_llama.cpp.md\">This tutorial\u003C\u002Fa> guides you through every step of running a quantized model (\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FXeIaso\u002Fyi-chat-6B-GGUF\u002Ftree\u002Fmain\">Yi-chat-6B-2bits\u003C\u002Fa>) locally and then performing inference.\u003C\u002Fp>\n\n- [Step 0: Prerequisites](#step-0-prerequisites)\n- [Step 1: Download llama.cpp](#step-1-download-llamacpp)\n- [Step 2: Download Yi model](#step-2-download-yi-model)\n- [Step 3: Perform inference](#step-3-perform-inference)\n\n#### Step 0: Prerequisites \n\n- This tutorial assumes you use a MacBook Pro with 16GB of memory and an Apple M2 Pro chip.\n  \n- Make sure [`git-lfs`](https:\u002F\u002Fgit-lfs.com\u002F) is installed on your machine.\n  \n#### Step 1: Download `llama.cpp`\n\nTo clone the [`llama.cpp`](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp) repository, run the following command.\n\n```bash\ngit clone git@github.com:ggerganov\u002Fllama.cpp.git\n```\n\n#### Step 2: Download Yi model\n\n2.1 To clone [XeIaso\u002Fyi-chat-6B-GGUF](https:\u002F\u002Fhuggingface.co\u002FXeIaso\u002Fyi-chat-6B-GGUF\u002Ftree\u002Fmain) with just pointers, run the following command.\n\n```bash\nGIT_LFS_SKIP_SMUDGE=1 git clone https:\u002F\u002Fhuggingface.co\u002FXeIaso\u002Fyi-chat-6B-GGUF\n```\n\n2.2 To download a quantized Yi model ([yi-chat-6b.Q2_K.gguf](https:\u002F\u002Fhuggingface.co\u002FXeIaso\u002Fyi-chat-6B-GGUF\u002Fblob\u002Fmain\u002Fyi-chat-6b.Q2_K.gguf)), run the following command.\n\n```bash\ngit-lfs pull --include yi-chat-6b.Q2_K.gguf\n```\n\n#### Step 3: Perform inference\n\nTo perform inference with the Yi model, you can use one of the following methods.\n\n- [Method 1: Perform inference in terminal](#method-1-perform-inference-in-terminal)\n  \n- [Method 2: Perform inference in 
web](#method-2-perform-inference-in-web)\n\n##### Method 1: Perform inference in terminal\n\nTo compile `llama.cpp` using 4 threads and then conduct inference, navigate to the `llama.cpp` directory, and run the following command.\n\n> ##### Tips\n> \n> - Replace `\u002FUsers\u002Fyu\u002Fyi-chat-6B-GGUF\u002Fyi-chat-6b.Q2_K.gguf` with the actual path of your model.\n>\n> - By default, the model operates in completion mode.\n> \n> - For additional output customization options (for example, system prompt, temperature, repetition penalty, etc.), run `.\u002Fmain -h` to check detailed descriptions and usage.\n\n```bash\nmake -j4 && .\u002Fmain -m \u002FUsers\u002Fyu\u002Fyi-chat-6B-GGUF\u002Fyi-chat-6b.Q2_K.gguf -p \"How do you feed your pet fox? Please answer this question in 6 simple steps:\\nStep 1:\" -n 384 -e\n\n...\n\nHow do you feed your pet fox? Please answer this question in 6 simple steps:\n\nStep 1: Select the appropriate food for your pet fox. You should choose high-quality, balanced prey items that are suitable for their unique dietary needs. These could include live or frozen mice, rats, pigeons, or other small mammals, as well as fresh fruits and vegetables.\n\nStep 2: Feed your pet fox once or twice a day, depending on the species and its individual preferences. Always ensure that they have access to fresh water throughout the day.\n\nStep 3: Provide an appropriate environment for your pet fox. Ensure it has a comfortable place to rest, plenty of space to move around, and opportunities to play and exercise.\n\nStep 4: Socialize your pet with other animals if possible. Interactions with other creatures can help them develop social skills and prevent boredom or stress.\n\nStep 5: Regularly check for signs of illness or discomfort in your fox. 
Be prepared to provide veterinary care as needed, especially for common issues such as parasites, dental health problems, or infections.\n\nStep 6: Educate yourself about the needs of your pet fox and be aware of any potential risks or concerns that could affect their well-being. Regularly consult with a veterinarian to ensure you are providing the best care.\n\n...\n\n```\n\nNow you have successfully asked a question to the Yi model and got an answer! 🥳\n\n##### Method 2: Perform inference in web\n\n1. To initialize a lightweight and swift chatbot, run the following command.\n\n    ```bash\n    cd llama.cpp\n    .\u002Fserver --ctx-size 2048 --host 0.0.0.0 --n-gpu-layers 64 --model \u002FUsers\u002Fyu\u002Fyi-chat-6B-GGUF\u002Fyi-chat-6b.Q2_K.gguf\n    ```\n\n    Then you can get an output like this:\n\n\n    ```bash\n    ...\n    \n    llama_new_context_with_model: n_ctx      = 2048\n    llama_new_context_with_model: freq_base  = 5000000.0\n    llama_new_context_with_model: freq_scale = 1\n    ggml_metal_init: allocating\n    ggml_metal_init: found device: Apple M2 Pro\n    ggml_metal_init: picking default device: Apple M2 Pro\n    ggml_metal_init: ggml.metallib not found, loading from source\n    ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil\n    ggml_metal_init: loading '\u002FUsers\u002Fyu\u002Fllama.cpp\u002Fggml-metal.metal'\n    ggml_metal_init: GPU name:   Apple M2 Pro\n    ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)\n    ggml_metal_init: hasUnifiedMemory              = true\n    ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB\n    ggml_metal_init: maxTransferRate               = built-in GPU\n    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   128.00 MiB, ( 2629.44 \u002F 10922.67)\n    llama_new_context_with_model: KV self size  =  128.00 MiB, K (f16):   64.00 MiB, V (f16):   64.00 MiB\n    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, ( 2629.45 \u002F 10922.67)\n    
llama_build_graph: non-view tensors processed: 676\u002F676\n    llama_new_context_with_model: compute buffer total size = 159.19 MiB\n    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   156.02 MiB, ( 2785.45 \u002F 10922.67)\n    Available slots:\n    -> Slot 0 - max context: 2048\n    \n    llama server listening at http:\u002F\u002F0.0.0.0:8080\n    ```\n\n2. To access the chatbot interface, open your web browser and enter `http:\u002F\u002F0.0.0.0:8080` into the address bar. \n   \n    ![Yi model chatbot interface - llama.cpp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_52402f9ee848.png)\n\n\n3. Enter a question, such as \"How do you feed your pet fox? Please answer this question in 6 simple steps\" into the prompt window, and you will receive a corresponding answer.\n\n    ![Ask a question to Yi model - llama.cpp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_a4e0fafc523e.png)\n\n\u003C\u002Ful>\n\u003C\u002Fdetails>\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### Web demo\n\nYou can build a web UI demo for Yi **chat** models (note that Yi base models are not supported in this scenario).\n\n[Step 1: Prepare your environment](#step-1-prepare-your-environment). \n\n[Step 2: Download the Yi model](#step-2-download-the-yi-model).\n\nStep 3. To start a web service locally, run the following command.\n\n```bash\npython demo\u002Fweb_demo.py -c \u003Cyour-model-path>\n```\n\nYou can access the web UI by entering the address provided in the console into your browser. 
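The web demo wraps the same chat-model inference flow shown in `quick_start.py`: messages are serialized with the model's chat template before generation. Yi chat models use a ChatML-style prompt layout; below is a minimal hand-rolled sketch of that layout, useful for inspecting or debugging the prompts a UI sends to the model. The function name is illustrative only, and the template shipped with the tokenizer (`tokenizer.apply_chat_template`) remains authoritative.

```python
# Sketch of the ChatML-style prompt layout used by Yi chat models.
# build_chatml_prompt is a hypothetical helper for illustration; in real code,
# prefer tokenizer.apply_chat_template, which is the authoritative template.

def build_chatml_prompt(messages, add_generation_prompt=True):
    """Serialize a list of {"role", "content"} dicts into a ChatML string."""
    parts = []
    for m in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([{"role": "user", "content": "hi"}])
print(prompt)
```

Comparing this string against the token IDs produced by `tokenizer.apply_chat_template(..., tokenize=False)` is a quick way to confirm a chat front end is formatting prompts as the model expects.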
\n\n ![Quick start - web demo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_29f31cde98e5.gif)\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### Fine-tuning\n\n```bash\nbash finetune\u002Fscripts\u002Frun_sft_Yi_6b.sh\n```\n\nOnce finished, you can compare the finetuned model and the base model with the following command:\n\n```bash\nbash finetune\u002Fscripts\u002Frun_eval.sh\n```\n\u003Cdetails style=\"display: inline;\">\u003Csummary>For advanced usage (like fine-tuning based on your custom data), see the explanations below. ⬇️ \u003C\u002Fsummary> \u003Cul>\n\n### Finetune code for Yi 6B and 34B\n\n#### Preparation\n\n##### From Image\n\nBy default, we use a small dataset from [BAAI\u002FCOIG](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBAAI\u002FCOIG) to finetune the base model.\nYou can also prepare your customized dataset in the following `jsonl` format:\n\n```json\n{ \"prompt\": \"Human: Who are you? Assistant:\", \"chosen\": \"I'm Yi.\" }\n```\n\nAnd then mount them in the container to replace the default ones:\n\n```bash\ndocker run -it \\\n    -v \u002Fpath\u002Fto\u002Fsave\u002Ffinetuned\u002Fmodel\u002F:\u002Ffinetuned-model \\\n    -v \u002Fpath\u002Fto\u002Ftrain.jsonl:\u002Fyi\u002Ffinetune\u002Fdata\u002Ftrain.json \\\n    -v \u002Fpath\u002Fto\u002Feval.jsonl:\u002Fyi\u002Ffinetune\u002Fdata\u002Feval.json \\\n    ghcr.io\u002F01-ai\u002Fyi:latest \\\n    bash finetune\u002Fscripts\u002Frun_sft_Yi_6b.sh\n```\n\n##### From Local Server\n\nMake sure you have conda. 
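A quick way to check whether conda is already available (a generic shell check, not specific to this repo):

```shell
# Print the conda version if conda is on PATH; otherwise report it missing.
if command -v conda >/dev/null 2>&1; then
    conda --version
else
    echo "conda not found"
fi
```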
If not, use\n\n```bash\nmkdir -p ~\u002Fminiconda3\nwget https:\u002F\u002Frepo.anaconda.com\u002Fminiconda\u002FMiniconda3-latest-Linux-x86_64.sh -O ~\u002Fminiconda3\u002Fminiconda.sh\nbash ~\u002Fminiconda3\u002Fminiconda.sh -b -u -p ~\u002Fminiconda3\nrm -rf ~\u002Fminiconda3\u002Fminiconda.sh\n~\u002Fminiconda3\u002Fbin\u002Fconda init bash\nsource ~\u002F.bashrc\n```\n\nThen, create a conda env:\n\n```bash\nconda create -n dev_env python=3.10 -y\nconda activate dev_env\npip install torch==2.0.1 deepspeed==0.10 tensorboard transformers datasets sentencepiece accelerate ray==2.7\n```\n\n#### Hardware Setup\n\nFor the Yi-6B model, a node with 4 GPUs, each with GPU memory larger than 60GB, is recommended.\n\nFor the Yi-34B model, because the zero-offload technique consumes a lot of CPU memory, be careful to limit the number of GPUs used in 34B finetune training. Use CUDA_VISIBLE_DEVICES to limit the number of GPUs (as shown in scripts\u002Frun_sft_Yi_34b.sh).\n\nA typical hardware setup for finetuning the 34B model is a node with 8 GPUs (limited to 4 at runtime by CUDA_VISIBLE_DEVICES=0,1,2,3), each with GPU memory larger than 80GB, and total CPU memory larger than 900GB.\n\n#### Quick Start\n\nDownload an LLM base model (6B or 34B) to MODEL_PATH. A typical model folder looks like this:\n\n```bash\n|-- $MODEL_PATH\n|   |-- config.json\n|   |-- pytorch_model-00001-of-00002.bin\n|   |-- pytorch_model-00002-of-00002.bin\n|   |-- pytorch_model.bin.index.json\n|   |-- tokenizer_config.json\n|   |-- tokenizer.model\n|   |-- ...\n```\n\nDownload a dataset from huggingface to local storage DATA_PATH, e.g. 
Dahoas\u002Frm-static.\n\n```bash\n|-- $DATA_PATH\n|   |-- data\n|   |   |-- train-00000-of-00001-2a1df75c6bce91ab.parquet\n|   |   |-- test-00000-of-00001-8c7c51afc6d45980.parquet\n|   |-- dataset_infos.json\n|   |-- README.md\n```\n\n`finetune\u002Fyi_example_dataset` has example datasets, which are modified from [BAAI\u002FCOIG](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBAAI\u002FCOIG)\n\n```bash\n|-- $DATA_PATH\n    |--data\n        |-- train.jsonl\n        |-- eval.jsonl\n```\n\n`cd` into the scripts folder, copy and paste the script, and run. For example:\n\n```bash\ncd finetune\u002Fscripts\n\nbash run_sft_Yi_6b.sh\n```\n\nFor the Yi-6B base model, setting training_debug_steps=20 and num_train_epochs=4 can output a chat model, which takes about 20 minutes.\n\nFor the Yi-34B base model, it takes a relatively long time for initialization. Please be patient.\n\n#### Evaluation\n\n```bash\ncd finetune\u002Fscripts\n\nbash run_eval.sh\n```\n\nThen you'll see the answer from both the base model and the finetuned model.\n\u003C\u002Ful>\n\u003C\u002Fdetails>\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### Quantization\n\n#### GPT-Q\n```bash\npython quantization\u002Fgptq\u002Fquant_autogptq.py \\\n  --model \u002Fbase_model                      \\\n  --output_dir \u002Fquantized_model            \\\n  --trust_remote_code\n```\n\nOnce finished, you can then evaluate the resulting model as follows:\n\n```bash\npython quantization\u002Fgptq\u002Feval_quantized_model.py \\\n  --model \u002Fquantized_model                       \\\n  --trust_remote_code\n```\n\n\u003Cdetails style=\"display: inline;\">\u003Csummary>For details, see the explanations below. ⬇️\u003C\u002Fsummary> \u003Cul>\n\n#### GPT-Q quantization\n\n[GPT-Q](https:\u002F\u002Fgithub.com\u002FIST-DASLab\u002Fgptq) is a PTQ (Post-Training Quantization)\nmethod. 
It saves memory and provides potential speedups while retaining the accuracy\nof the model. \n\nYi models can be GPT-Q quantized without much effort. \nWe provide a step-by-step tutorial below.\n\nTo run GPT-Q, we will use [AutoGPTQ](https:\u002F\u002Fgithub.com\u002FPanQiWei\u002FAutoGPTQ) and\n[exllama](https:\u002F\u002Fgithub.com\u002Fturboderp\u002Fexllama).\nHugging Face Transformers has integrated optimum and auto-gptq to perform\nGPTQ quantization on language models.\n\n##### Do Quantization\n\nThe `quant_autogptq.py` script is provided for you to perform GPT-Q quantization:\n\n```bash\npython quant_autogptq.py --model \u002Fbase_model \\\n    --output_dir \u002Fquantized_model --bits 4 --group_size 128 --trust_remote_code\n```\n\n##### Run Quantized Model\n\nYou can run a quantized model using `eval_quantized_model.py`:\n\n```bash\npython eval_quantized_model.py --model \u002Fquantized_model --trust_remote_code\n```\n\u003C\u002Ful>\n\u003C\u002Fdetails>\n\n#### AWQ\n\n```bash\npython quantization\u002Fawq\u002Fquant_autoawq.py \\\n  --model \u002Fbase_model                      \\\n  --output_dir \u002Fquantized_model            \\\n  --trust_remote_code\n```\n\nOnce finished, you can then evaluate the resulting model as follows:\n\n```bash\npython quantization\u002Fawq\u002Feval_quantized_model.py \\\n  --model \u002Fquantized_model                       \\\n  --trust_remote_code\n```\n\u003Cdetails style=\"display: inline;\">\u003Csummary>For details, see the explanations below. ⬇️\u003C\u002Fsummary> \u003Cul>\n\n#### AWQ quantization\n\n[AWQ](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fllm-awq) is a PTQ (Post-Training Quantization)\nmethod. It's an efficient and accurate low-bit weight quantization (INT3\u002F4) for LLMs.\n\nYi models can be AWQ quantized without much effort. 
\nWe provide a step-by-step tutorial below.\n\nTo run AWQ, we will use [AutoAWQ](https:\u002F\u002Fgithub.com\u002Fcasper-hansen\u002FAutoAWQ).\n\n##### Do Quantization\n\nThe `quant_autoawq.py` script is provided for you to perform AWQ quantization:\n\n```bash\npython quant_autoawq.py --model \u002Fbase_model \\\n    --output_dir \u002Fquantized_model --bits 4 --group_size 128 --trust_remote_code\n```\n\n##### Run Quantized Model\n\nYou can run a quantized model using `eval_quantized_model.py`:\n\n```bash\npython eval_quantized_model.py --model \u002Fquantized_model --trust_remote_code\n```\n\n\n\u003C\u002Ful>\n\u003C\u002Fdetails>\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### Deployment\n\nIf you want to deploy Yi models, make sure you meet the software and hardware requirements. \n\n#### Software requirements\n\nBefore using Yi quantized models, make sure you've installed the correct software listed below.\n\n| Model | Software\n|---|---\nYi 4-bit quantized models | [AWQ and CUDA](https:\u002F\u002Fgithub.com\u002Fcasper-hansen\u002FAutoAWQ?tab=readme-ov-file#install-from-pypi)\nYi 8-bit quantized models |  [GPTQ and CUDA](https:\u002F\u002Fgithub.com\u002FPanQiWei\u002FAutoGPTQ?tab=readme-ov-file#quick-installation)\n\n#### Hardware requirements\n\nBefore deploying Yi in your environment, make sure your hardware meets the following requirements.\n\n##### Chat models\n\n| Model                | Minimum VRAM |        Recommended GPU Example       |\n|:----------------------|:--------------|:-------------------------------------:|\n| Yi-6B-Chat           | 15 GB         | 1 x RTX 3090 (24 GB) \u003Cbr> 1 x RTX 4090 (24 GB) \u003Cbr>  1 x A10 (24 GB)  \u003Cbr> 1 x A30 (24 GB)              |\n| Yi-6B-Chat-4bits     | 4 GB          | 1 x RTX 3060 (12 GB)\u003Cbr> 1 x RTX 4060 (8 GB)                   |\n| Yi-6B-Chat-8bits     | 8 GB          | 1 x RTX 3070 (8 GB) \u003Cbr> 1 x RTX 4060 (8 GB)       
            |\n| Yi-34B-Chat          | 72 GB         | 4 x RTX 4090 (24 GB)\u003Cbr> 1 x A800 (80GB)               |\n| Yi-34B-Chat-4bits    | 20 GB         | 1 x RTX 3090 (24 GB) \u003Cbr> 1 x RTX 4090 (24 GB) \u003Cbr> 1 x A10 (24 GB)  \u003Cbr> 1 x A30 (24 GB)  \u003Cbr> 1 x A100 (40 GB) |\n| Yi-34B-Chat-8bits    | 38 GB         | 2 x RTX 3090 (24 GB) \u003Cbr> 2 x RTX 4090 (24 GB)\u003Cbr> 1 x A800  (40 GB) |\n\nBelow are detailed minimum VRAM requirements under different batch use cases.\n\n|  Model                  | batch=1 | batch=4 | batch=16 | batch=32 |\n| ----------------------- | ------- | ------- | -------- | -------- |\n| Yi-6B-Chat              | 12 GB   | 13 GB   | 15 GB    | 18 GB    |\n| Yi-6B-Chat-4bits  | 4 GB    | 5 GB    | 7 GB     | 10 GB    |\n| Yi-6B-Chat-8bits  | 7 GB    | 8 GB    | 10 GB    | 14 GB    |\n| Yi-34B-Chat       | 65 GB   | 68 GB   | 76 GB    | > 80 GB   |\n| Yi-34B-Chat-4bits | 19 GB   | 20 GB   | 30 GB    | 40 GB    |\n| Yi-34B-Chat-8bits | 35 GB   | 37 GB   | 46 GB    | 58 GB    |\n\n##### Base models\n\n| Model                | Minimum VRAM |        Recommended GPU Example       |\n|----------------------|--------------|:-------------------------------------:|\n| Yi-6B                | 15 GB         | 1 x RTX 3090 (24 GB) \u003Cbr> 1 x RTX 4090 (24 GB) \u003Cbr> 1 x A10 (24 GB)  \u003Cbr> 1 x A30 (24 GB)                |\n| Yi-6B-200K           | 50 GB         | 1 x A800 (80 GB)                            |\n| Yi-9B                | 20 GB         | 1 x RTX 4090 (24 GB)                           |\n| Yi-34B               | 72 GB         | 4 x RTX 4090 (24 GB) \u003Cbr> 1 x A800 (80 GB)               |\n| Yi-34B-200K          | 200 GB        | 4 x A800 (80 GB)                        |\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### FAQ\n\u003Cdetails>\n\u003Csummary> If you have any questions while using the Yi series models, the answers provided below could 
serve as a helpful reference for you. ⬇️\u003C\u002Fsummary> \n\u003Cbr> \n\n#### 💡Fine-tuning\n- \u003Cstrong>Base model or Chat model - which to fine-tune?\u003C\u002Fstrong>\n  \u003Cbr>The choice of pre-trained language model for fine-tuning hinges on the computational resources you have at your disposal and the particular demands of your task.\n    - If you are working with a substantial volume of fine-tuning data (say, over 10,000 samples), the Base model could be your go-to choice.\n    - On the other hand, if your fine-tuning data is not quite as extensive, opting for the Chat model might be a more fitting choice.\n    - It is generally advisable to fine-tune both the Base and Chat models, compare their performance, and then pick the model that best aligns with your specific requirements.\n- \u003Cstrong>Yi-34B versus Yi-34B-Chat for full-scale fine-tuning - what is the difference?\u003C\u002Fstrong>\n  \u003Cbr>\n  The key distinction between full-scale fine-tuning on `Yi-34B` and `Yi-34B-Chat` comes down to the fine-tuning approach and outcomes.\n    - Yi-34B-Chat employs a Supervised Fine-Tuning (SFT) method, resulting in responses that mirror human conversation style more closely.\n    - The Base model's fine-tuning is more versatile, with a relatively high performance potential.\n    - If you are confident in the quality of your data, fine-tuning with `Yi-34B` could be your go-to.\n    - If you are aiming for model-generated responses that better mimic human conversational style, or if you have doubts about your data quality, `Yi-34B-Chat` might be your best bet.\n\n#### 💡Quantization\n- \u003Cstrong>Quantized model versus original model - what is the performance gap?\u003C\u002Fstrong>\n    - The performance variance is largely contingent on the quantization method employed and the specific use cases of these models. 
For instance, for the models provided officially with AWQ, benchmark results show that quantization might cause a minor performance drop of a few percentage points.\n    - Subjectively speaking, in situations like logical reasoning, even a 1% performance shift could impact the accuracy of the output results.\n    \n#### 💡General\n- \u003Cstrong>Where can I source fine-tuning question answering datasets?\u003C\u002Fstrong>\n    - You can find fine-tuning question answering datasets on platforms like Hugging Face, with datasets like [m-a-p\u002FCOIG-CQIA](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fm-a-p\u002FCOIG-CQIA) readily available. \n    - Additionally, GitHub offers fine-tuning frameworks, such as [hiyouga\u002FLLaMA-Factory](https:\u002F\u002Fgithub.com\u002Fhiyouga\u002FLLaMA-Factory), which integrates pre-made datasets.\n\n- \u003Cstrong>What is the GPU memory requirement for fine-tuning Yi-34B FP16?\u003C\u002Fstrong>\n  \u003Cbr>\n  The GPU memory needed for fine-tuning 34B FP16 hinges on the specific fine-tuning method employed. For full-parameter fine-tuning, you'll need 8 GPUs, each with 80 GB; however, more economical solutions like LoRA require less. For more details, check out [hiyouga\u002FLLaMA-Factory](https:\u002F\u002Fgithub.com\u002Fhiyouga\u002FLLaMA-Factory). Also, consider using BF16 instead of FP16 for fine-tuning to optimize performance.\n\n- \u003Cstrong>Are there any third-party platforms that support chat functionality for the Yi-34B-200K model?\u003C\u002Fstrong>\n  \u003Cbr>\n  If you're looking for third-party chat platforms, options include [fireworks.ai](https:\u002F\u002Ffireworks.ai\u002Flogin?callbackURL=https:\u002F\u002Ffireworks.ai\u002Fmodels\u002Ffireworks\u002Fyi-34b-chat).\n  \u003C\u002Fdetails>\n\n### Learning hub\n\n\u003Cdetails>\n\u003Csummary> If you want to learn Yi, you can find a wealth of helpful educational resources here. ⬇️\u003C\u002Fsummary> \n\u003Cbr> \n\nWelcome to the Yi learning hub! 
\n\nWhether you're a seasoned developer or a newcomer, you can find a wealth of helpful educational resources to enhance your understanding and skills with Yi models, including insightful blog posts, comprehensive video tutorials, hands-on guides, and more.  \n\nThe content you find here has been generously contributed by knowledgeable Yi experts and passionate enthusiasts. We extend our heartfelt gratitude for your invaluable contributions! \n\nAt the same time, we also warmly invite you to join our collaborative effort by contributing to Yi. If you have already made contributions to Yi, please don't hesitate to showcase your remarkable work in the table below.\n\nWith all these resources at your fingertips, you're ready to start your exciting journey with Yi. Happy learning! 🥳\n\n#### Tutorials\n\n##### Blog tutorials\n\n| Deliverable                                                  | Date       | Author                                                       |\n| ------------------------------------------------------------ | ---------- | ------------------------------------------------------------ |\n| [使用 Dify、Meilisearch、零一万物模型实现最简单的 RAG   应用（三）：AI 电影推荐](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FRi2ap9_5EMzdfiBhSSL_MQ) | 2024-05-20 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n| [使用autodl服务器，在A40显卡上运行，   Yi-34B-Chat-int4模型，并使用vllm优化加速，显存占用42G，速度18 words-s](https:\u002F\u002Fblog.csdn.net\u002Ffreewebsys\u002Farticle\u002Fdetails\u002F134698597?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-17-134698597-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-05-20 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [Yi-VL   
最佳实践](https:\u002F\u002Fmodelscope.cn\u002Fdocs\u002Fyi-vl最佳实践) | 2024-05-20 | [ModelScope](https:\u002F\u002Fgithub.com\u002Fmodelscope)                  |\n| [一键运行零一万物新鲜出炉Yi-1.5-9B-Chat大模型](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FntMs2G_XdWeM3I6RUOBJrA) | 2024-05-13 | [Second State](https:\u002F\u002Fgithub.com\u002Fsecond-state)              |\n| [零一万物开源Yi-1.5系列大模型](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002Fd-ogq4hcFbsuL348ExJxpA) | 2024-05-13 | [刘聪](https:\u002F\u002Fgithub.com\u002Fliucongg)                          |\n| [零一万物Yi-1.5系列模型发布并开源！ 34B-9B-6B   多尺寸，魔搭社区推理微调最佳实践教程来啦！](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002F3wD-0dCgXB646r720o8JAg) | 2024-05-13 | [ModelScope](https:\u002F\u002Fgithub.com\u002Fmodelscope)                  |\n| [Yi-34B   本地部署简单测试](https:\u002F\u002Fblog.csdn.net\u002Farkohut\u002Farticle\u002Fdetails\u002F135331469?ops_request_misc=%7B%22request%5Fid%22%3A%22171636390616800185813639%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636390616800185813639&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-10-135331469-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-05-13 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [驾辰龙跨Llama持Wasm，玩转Yi模型迎新春过大年（上）](https:\u002F\u002Fblog.csdn.net\u002Fweixin_53443275\u002Farticle\u002Fdetails\u002F136091398?ops_request_misc=%7B%22request%5Fid%22%3A%22171636390616800185813639%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636390616800185813639&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-5-136091398-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-05-13 | [Words  worth](https:\u002F\u002Fblog.csdn.net\u002Fweixin_53443275?type=blog) |\n| [驾辰龙跨Llama持Wasm，玩转Yi模型迎新春过大年（下篇）](https:\u002F\u002Fblog.csdn.net\u002Fweixin_53443275\u002Farticle\u002Fdetails\u002F136096309) | 2024-05-13 
| [Words  worth](https:\u002F\u002Fblog.csdn.net\u002Fweixin_53443275?type=blog) |\n| [Ollama新增两个命令，开始支持零一万物Yi-1.5系列模型](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FbBgzGJvUqIohodcy9U-pFw) | 2024-05-13 | AI工程师笔记                                                 |\n| [使用零一万物 200K 模型和 Dify 快速搭建模型应用](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F686774859) | 2024-05-13 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n| [(持更) 零一万物模型折腾笔记：社区 Yi-34B 微调模型使用](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F671549900) | 2024-05-13 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n| [Python+ERNIE-4.0-8K-Yi-34B-Chat大模型初探](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FWaygSfn5T8ZPB1mPdGADEQ) | 2024-05-11 | 江湖评谈                                                     |\n| [技术布道   Vue及Python调用零一万物模型和Prompt模板（通过百度千帆大模型平台）](https:\u002F\u002Fblog.csdn.net\u002Fucloud2012\u002Farticle\u002Fdetails\u002F137187469) | 2024-05-11 | [MumuLab](https:\u002F\u002Fblog.csdn.net\u002Fucloud2012?type=blog)        |\n| [多模态大模型Yi-VL-plus体验 效果很棒](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F694736111) | 2024-04-27 | [大家好我是爱因](https:\u002F\u002Fwww.zhihu.com\u002Fpeople\u002Fiamein)        |\n| [使用autodl服务器，两个3090显卡上运行，   Yi-34B-Chat-int4模型，并使用vllm优化加速，显存占用42G，速度23 words-s](https:\u002F\u002Fblog.csdn.net\u002Ffreewebsys\u002Farticle\u002Fdetails\u002F134725765?ops_request_misc=%7B%22request%5Fid%22%3A%22171636356716800211598950%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636356716800211598950&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-9-134725765-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-04-27 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [Getting Started with Yi-1.5-9B-Chat](https:\u002F\u002Fwww.secondstate.io\u002Farticles\u002Fyi-1.5-9b-chat\u002F) | 2024-04-27 | [Second 
State](https:\u002F\u002Fgithub.com\u002Fsecond-state)              |\n| [基于零一万物yi-vl-plus大模型简单几步就能批量生成Anki图片笔记](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002F_ea6g0pzzeO4WyYtuWycWQ) | 2024-04-24 | [正经人王同学](https:\u002F\u002Fgithub.com\u002Fzjrwtx)                    |\n| [【AI开发：语言】一、Yi-34B超大模型本地部署CPU和GPU版](https:\u002F\u002Fblog.csdn.net\u002Falarey\u002Farticle\u002Fdetails\u002F137769471?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-16-137769471-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-04-21 | [My的梦想已实现](https:\u002F\u002Fblog.csdn.net\u002Falarey?type=blog)     |\n| [【Yi-34B-Chat-Int4】使用4个2080Ti显卡11G版本，运行Yi-34B模型，5年前老显卡是支持的，可以正常运行，速度   21 words-s，vllm要求算力在7以上的显卡就可以](https:\u002F\u002Fblog.csdn.net\u002Ffreewebsys\u002Farticle\u002Fdetails\u002F134754086) | 2024-03-22 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [零一万物大模型部署+微调总结](https:\u002F\u002Fblog.csdn.net\u002Fv_wus\u002Farticle\u002Fdetails\u002F135704126?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-18-135704126-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-03-22 | [v_wus](https:\u002F\u002Fblog.csdn.net\u002Fv_wus?type=blog)               |\n| 
[零一万物Yi大模型vllm推理时Yi-34B或Yi-6bchat重复输出的解决方案](https:\u002F\u002Fblog.csdn.net\u002Fqq_39667443\u002Farticle\u002Fdetails\u002F136028776?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-6-136028776-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-03-02 | [郝铠锋](https:\u002F\u002Fblog.csdn.net\u002Fqq_39667443?type=blog)        |\n| [Yi-34B微调训练](https:\u002F\u002Fblog.csdn.net\u002Flsjlnd\u002Farticle\u002Fdetails\u002F135336984?ops_request_misc=%7B%22request%5Fid%22%3A%22171636343416800188513953%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636343416800188513953&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-12-135336984-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-03-02 | [lsjlnd](https:\u002F\u002Fblog.csdn.net\u002Flsjlnd?type=blog)             |\n| [实测零一万物Yi-VL多模态语言模型：能准确“识图吃瓜”](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002Ffu4O9XvJ03JhimsEyI-SsQ) | 2024-02-02 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n| [零一万物开源Yi-VL多模态大模型，魔搭社区推理&微调最佳实践来啦！](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F680098411) | 2024-01-26 | [ModelScope](https:\u002F\u002Fgithub.com\u002Fmodelscope)                  |\n| [单卡 3 小时训练 Yi-6B 大模型 Agent：基于 Llama   Factory 实战](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F678989191) | 2024-01-22 | [郑耀威](https:\u002F\u002Fgithub.com\u002Fhiyouga)                         |\n| [零一科技Yi-34B   
Chat大模型环境搭建&推理](https:\u002F\u002Fblog.csdn.net\u002Fzzq1989_\u002Farticle\u002Fdetails\u002F135597181?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-8-135597181-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-01-15 | [要养家的程序员](https:\u002F\u002Fblog.csdn.net\u002Fzzq1989_?type=blog)   |\n| [基于LLaMA   Factory，单卡3小时训练专属大模型 Agent](https:\u002F\u002Fblog.csdn.net\u002Fm0_59596990\u002Farticle\u002Fdetails\u002F135760285?ops_request_misc=%7B%22request%5Fid%22%3A%22171636343416800188513953%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636343416800188513953&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-10-135760285-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-01-15 | [机器学习社区](https:\u002F\u002Fblog.csdn.net\u002Fm0_59596990?type=blog)  |\n| [双卡   3080ti 部署 Yi-34B 大模型 - Gradio + vLLM 踩坑全记录](https:\u002F\u002Fblog.csdn.net\u002Farkohut\u002Farticle\u002Fdetails\u002F135321242?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-10-135321242-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-01-02 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| 
[【大模型部署实践-3】3个能在3090上跑起来的4bits量化Chat模型（baichuan2-13b、InternLM-20b、Yi-34b）](https:\u002F\u002Fblog.csdn.net\u002Fqq_40302568\u002Farticle\u002Fdetails\u002F135040985?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-30-135040985-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-01-02 | [aq_Seabiscuit](https:\u002F\u002Fblog.csdn.net\u002Fqq_40302568?type=blog) |\n| [只需 24G   显存，用 vllm 跑起来 Yi-34B 中英双语大模型](https:\u002F\u002Fblog.csdn.net\u002Farkohut\u002Farticle\u002Fdetails\u002F135274973) | 2023-12-28 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [零一万物模型官方   Yi-34B 模型本地离线运行部署使用笔记（物理机和docker两种部署方式），200K 超长文本内容，34B 干翻一众 70B   模型，打榜分数那么高，这模型到底行不行？](https:\u002F\u002Fblog.csdn.net\u002Fu014374009\u002Farticle\u002Fdetails\u002F136327696) | 2023-12-28 | [代码讲故事](https:\u002F\u002Fblog.csdn.net\u002Fu014374009?type=blog)     |\n| [LLM -   大模型速递之 Yi-34B 入门与 LoRA 微调](https:\u002F\u002Fblog.csdn.net\u002FBIT_666\u002Farticle\u002Fdetails\u002F134990402) | 2023-12-18 | [BIT_666](https:\u002F\u002Fbitddd.blog.csdn.net\u002F?type=blog)           |\n| [通过vllm框架进行大模型推理](https:\u002F\u002Fblog.csdn.net\u002Fweixin_45920955\u002Farticle\u002Fdetails\u002F135300561?ops_request_misc=%7B%22request%5Fid%22%3A%22171636343416800188513953%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636343416800188513953&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-13-135300561-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2023-12-18 | [土山炮](https:\u002F\u002Fblog.csdn.net\u002Fweixin_45920955?type=blog)    |\n| [CPU 混合推理，非常见大模型量化方案：“二三五六” 位量化方案](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F671698216) | 2023-12-12 | 
[苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n| [零一万物模型折腾笔记：官方 Yi-34B 模型基础使用](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F671387298) | 2023-12-10 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n| [Running Yi-34B-Chat locally using LlamaEdge](https:\u002F\u002Fwww.secondstate.io\u002Farticles\u002Fyi-34b\u002F) | 2023-11-30 | [Second State](https:\u002F\u002Fgithub.com\u002Fsecond-state)              |\n| [本地运行零一万物 34B 大模型，使用 Llama.cpp &   21G 显存](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F668921042) | 2023-11-26 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n\n##### GitHub Project\n\n| Deliverable                                                  | Date       | Author                                      |\n| ------------------------------------------------------------ | ---------- | ------------------------------------------- |\n| [yi-openai-proxy](https:\u002F\u002Fgithub.com\u002Fsoulteary\u002Fyi-openai-proxy) | 2024-05-11 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)        |\n| [基于零一万物 Yi 模型和 B 站构建大语言模型高质量训练数据集](https:\u002F\u002Fgithub.com\u002Fzjrwtx\u002FbilibiliQA_databuilder) | 2024-04-29 | [正经人王同学](https:\u002F\u002Fgithub.com\u002Fzjrwtx)   |\n| [基于视频网站和零一万物大模型构建大语言模型高质量训练数据集](https:\u002F\u002Fgithub.com\u002Fzjrwtx\u002FVideoQA_databuilder) | 2024-04-25 | [正经人王同学](https:\u002F\u002Fgithub.com\u002Fzjrwtx)   |\n| [基于零一万物yi-34b-chat-200k输入任意文章地址，点击按钮即可生成无广告或推广内容的简要笔记，并生成分享图给好友](https:\u002F\u002Fgithub.com\u002Fzjrwtx\u002Fopen_summary) | 2024-04-24 | [正经人王同学](https:\u002F\u002Fgithub.com\u002Fzjrwtx)   |\n| [Food-GPT-Yi-model](https:\u002F\u002Fgithub.com\u002FThisisHubert\u002FFoodGPT-Yi-model) | 2024-04-21 | [Hubert S](https:\u002F\u002Fgithub.com\u002FThisisHubert) |\n\n##### Video tutorials\n\n| Deliverable                                                  | Date       | Author                                                       |\n| 
------------------------------------------------------------ | ---------- | ------------------------------------------------------------ |\n| [Run dolphin-2.2-yi-34b on IoT Devices](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=NJ89T5mO25Y) | 2023-11-30 | [Second State](https:\u002F\u002Fgithub.com\u002Fsecond-state)              |\n| [只需 24G 显存，用 vllm 跑起来 Yi-34B 中英双语大模型](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV17t4y1f7Ee\u002F) | 2023-12-28 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [Install Yi 34B Locally - Chinese English Bilingual LLM](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=CVQvj4Wrh4w&t=476s) | 2023-11-05 | [Fahd Mirza](https:\u002F\u002Fwww.youtube.com\u002F@fahdmirza)             |\n| [Dolphin Yi 34b - Brand New Foundational Model TESTED](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=On3Zuv27V3k&t=85s) | 2023-11-27 | [Matthew Berman](https:\u002F\u002Fwww.youtube.com\u002F@matthew_berman)    |\n| [Yi-VL-34B 多模态大模型 - 用两张 A40 显卡跑起来](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Q5411y7AG\u002F) | 2024-01-28 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [4060Ti 16G显卡安装零一万物最新开源的Yi-1.5版大语言模型](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV16i421X7Jx\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-05-14 | [titan909](https:\u002F\u002Fspace.bilibili.com\u002F526393761)             |\n| [Yi-1.5: True Apache 2.0 Competitor to LLAMA-3](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=KCDYrfWeTRc) | 2024-05-13 | [Prompt Engineering](https:\u002F\u002Fwww.youtube.com\u002F@engineerprompt) |\n| [Install Yi-1.5 Model Locally - Beats Llama 3 in Various Benchmarks](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Ba-G7Il0UkA) | 2024-05-13 | [Fahd Mirza](https:\u002F\u002Fwww.youtube.com\u002F@fahdmirza)             |\n| [how to install Ollama and run Yi 6B](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=4Jnar7OUHqQ) | 
2024-05-13 | [Ridaa Davids](https:\u002F\u002Fwww.youtube.com\u002F@quantanovabusiness)  |\n| [地表最强混合智能AI助手：llama3_70B+Yi_34B+Qwen1.5_110B](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Xm411C7V1\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-05-04 | [朱扎特](https:\u002F\u002Fspace.bilibili.com\u002F494512200?spm_id_from=333.788.0.0) |\n| [ChatDoc学术论文辅助--基于Yi-34B和langchain进行PDF知识库问答](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV11i421C7B5\u002F?spm_id_from=333.999.0.0&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-05-03 | [朱扎特](https:\u002F\u002Fspace.bilibili.com\u002F494512200?spm_id_from=333.788.0.0) |\n| [基于Yi-34B的领域知识问答项目演示](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1zZ42177ZA\u002F?spm_id_from=333.999.0.0&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-05-02 | [朱扎特](https:\u002F\u002Fspace.bilibili.com\u002F494512200?spm_id_from=333.788.0.0) |\n| [使用RTX4090+GaLore算法 全参微调Yi-6B大模型](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1ax4y1U7Ep\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-03-24 | [小工蚂创始人](https:\u002F\u002Fspace.bilibili.com\u002F478674499?spm_id_from=333.788.0.0) |\n| [无内容审查NSFW大语言模型Yi-34B-Chat蒸馏版测试,RolePlay,《天龙八部》马夫人康敏,本地GPU,CPU运行](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=VL-W0TnLCns) | 2024-03-20 | [刘悦的技术博客](https:\u002F\u002Fv3u.cn\u002F)                            |\n| [无内容审查NSFW大语言模型整合包,Yi-34B-Chat,本地CPU运行,角色扮演潘金莲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=rBvbgwz3oHM) | 2024-03-16 | [刘悦的技术博客](https:\u002F\u002Fv3u.cn\u002F)                            |\n| [量化 Yi-34B-Chat 并在单卡 RTX 4090 使用 vLLM 部署](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1jx421y7xj\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-03-05 | [白鸽巢](https:\u002F\u002Fspace.bilibili.com\u002F138938660?spm_id_from=333.788.0.0) |\n| 
[Yi-VL-34B（5）：使用3个3090显卡24G版本，运行Yi-VL-34B模型，支持命令行和web界面方式，理解图片的内容转换成文字](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1BB421z7oA\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-02-27 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [Win环境KoboldCpp本地部署大语言模型进行各种角色扮演游戏](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV14J4m1e77f\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-02-25 | [魚蟲蟲](https:\u002F\u002Fspace.bilibili.com\u002F431981179?spm_id_from=333.788.0.0) |\n| [无需显卡本地部署Yi-34B-Chat进行角色扮演游戏 P2](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV19v421677y\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-02-23 | [魚蟲蟲](https:\u002F\u002Fspace.bilibili.com\u002F431981179?spm_id_from=333.788.0.0) |\n| [【wails】（2）：使用go-llama.cpp 运行 yi-01-6b大模型，使用本地CPU运行，速度还可以，等待下一版本更新](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV194421F7Fy\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-02-20 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [【xinference】（6）：在autodl上，使用xinference部署yi-vl-chat和qwen-vl-chat模型，可以使用openai调用成功](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV19Z421z7cv\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-02-06 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [无需显卡本地部署Yi-34B-Chat进行角色扮演游戏 P1](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1tU421o7Co\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-02-05 | [魚蟲蟲](https:\u002F\u002Fspace.bilibili.com\u002F431981179?spm_id_from=333.788.0.0) |\n| [2080Ti部署YI-34B大模型 
xinference-oneapi-fastGPT本地知识库使用指南](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1hC411z7xu\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-30 | [小饭护法要转码](https:\u002F\u002Fspace.bilibili.com\u002F39486865?spm_id_from=333.788.0.0) |\n| [Best Story Writing AI Model - Install Yi 6B 200K Locally on Windows](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=cZs2jRtl0bs) | 2024-01-22 | [Fahd Mirza](https:\u002F\u002Fwww.youtube.com\u002F@fahdmirza)             |\n| [Mac 本地运行大语言模型方法与常见问题指南（Yi 34B 模型+32 GB 内存测试）](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1VT4y1b7Th\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-21 | [小吴苹果机器人](https:\u002F\u002Fspace.bilibili.com\u002F1732749682?spm_id_from=333.788.0.0) |\n| [【Dify知识库】（11）：Dify0.4.9改造支持MySQL，成功接入yi-6b 做对话，本地使用fastchat启动，占8G显存，完成知识库配置](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1ia4y1y7JH\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-21 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [这位LLM先生有点暴躁,用的是YI-6B的某个量化版,#LLM #大语言模型 #暴躁老哥](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=eahXJrdtQuc) | 2024-01-20 | [晓漫吧](https:\u002F\u002Fwww.youtube.com\u002F@xiaomanba)                 |\n| [大模型推理 NvLink 桥接器有用吗｜双卡 A6000 测试一下](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1AW4y1w7DC\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-17 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [大模型推理 A40 vs A6000 谁更强 - 对比 Yi-34B 的单、双卡推理性能](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1aK4y1z7GF\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-15 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [C-Eval 大语言模型评测基准- 用 LM Evaluation Harness + vLLM 
跑起来](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Yw411g7ZL\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-11 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [双显卡部署 Yi-34B 大模型 - vLLM + Gradio 踩坑记录](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1p94y1c7ak\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-01 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [手把手教学！使用 vLLM 快速部署 Yi-34B-Chat](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1ew41157Mk\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-12-26 | [白鸽巢](https:\u002F\u002Fspace.bilibili.com\u002F138938660?spm_id_from=333.788.0.0) |\n| [如何训练企业自己的大语言模型？Yi-6B LORA微调演示 #小工蚁](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1uc41117zz\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-12-21 | [小工蚂创始人](https:\u002F\u002Fspace.bilibili.com\u002F478674499?spm_id_from=333.788.0.0) |\n| [Yi-34B（4）：使用4个2080Ti显卡11G版本，运行Yi-34B模型，5年前老显卡是支持的，可以正常运行，速度 21 words\u002Fs](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1nj41157L3\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-12-02 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [使用autodl服务器，RTX 3090 * 3 显卡上运行， Yi-34B-Chat模型，显存占用60G](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1BM411R7ae\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-12-01 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [使用autodl服务器，两个3090显卡上运行， Yi-34B-Chat-int4模型，用vllm优化，增加 --num-gpu 2，速度23 
words\u002Fs](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Hu4y1L7BH\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-12-01 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [Yi大模型一键本地部署 技术小白玩转AI](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV16H4y117md\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-12-01 | [技术小白玩转AI](https:\u002F\u002Fspace.bilibili.com\u002F3546586137234288?spm_id_from=333.788.0.0) |\n| [01.AI's Yi-6B: Overview and Fine-Tuning](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=mye-UOkAliQ) | 2023-11-28 | [AI Makerspace](https:\u002F\u002Fwww.youtube.com\u002F@AI-Makerspace)      |\n| [Yi 34B Chat LLM outperforms Llama 70B](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=RYtrF-R5jDc) | 2023-11-27 | [DLExplorer](https:\u002F\u002Fwww.youtube.com\u002F@DLExplorers-lg7dt)     |\n| [How to run open source models on mac Yi 34b on m3 Max](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=GAo-dopkgjI) | 2023-11-26 | [TECHNO PREMIUM](https:\u002F\u002Fwww.youtube.com\u002F@technopremium91)   |\n| [Yi-34B - 200K - The BEST & NEW CONTEXT WINDOW KING ](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=7WBojwwv5Qo) | 2023-11-24 | [Prompt Engineering](https:\u002F\u002Fwww.youtube.com\u002F@engineerprompt) |\n| [Yi 34B : The Rise of Powerful Mid-Sized Models - Base,200k & Chat](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=bWCjwtu_tHs) | 2023-11-24 | [Sam Witteveen](https:\u002F\u002Fwww.youtube.com\u002F@samwitteveenai)     |\n| [在IoT设备运行破解版李开复大模型dolphin-2.2-yi-34b（还可作为私有OpenAI API服务器）](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1SQ4y18744\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-11-15 | [Second State](https:\u002F\u002Fgithub.com\u002Fsecond-state)              |\n| [Run dolphin-2.2-yi-34b on IoT Devices (Also works as a Private OpenAI API 
Server)](https://www.youtube.com/watch?v=NJ89T5mO25Y) | 2023-11-14 | [Second State](https://github.com/second-state) |
| [How to Install Yi 34B 200K Llamafied on Windows Laptop](https://www.youtube.com/watch?v=enoha4K4HkQ) | 2023-11-11 | [Fahd Mirza](https://www.youtube.com/@fahdmirza) |

</details>

# Why Yi?

  - [Ecosystem](#ecosystem)
    - [Upstream](#upstream)
    - [Downstream](#downstream)
      - [Serving](#serving)
      - [Quantization](#quantization-1)
      - [Fine-tuning](#fine-tuning-1)
      - [API](#api)
  - [Benchmarks](#benchmarks)
    - [Chat model performance](#chat-model-performance)
    - [Base model performance](#base-model-performance)
      - [Yi-34B and Yi-34B-200K](#yi-34b-and-yi-34b-200k)
      - [Yi-9B](#yi-9b)

## Ecosystem

Yi has a comprehensive ecosystem, offering a range of tools, services, and models to enrich your experiences and maximize productivity.

- [Upstream](#upstream)
- [Downstream](#downstream)
  - [Serving](#serving)
  - [Quantization](#quantization-1)
  - [Fine-tuning](#fine-tuning-1)
  - [API](#api)

### Upstream

The Yi series models follow the same model architecture as Llama. By choosing Yi, you can leverage existing tools, libraries, and resources within the Llama ecosystem, eliminating the need to create new tools and enhancing development efficiency.

For example, the Yi series models are saved in the format of the Llama model. You can directly use `LlamaForCausalLM` and `LlamaTokenizer` to load the model.
For more information, see [Use the chat model](#31-use-the-chat-model).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The checkpoints use the Llama format, so the Auto classes resolve to the
# Llama implementations out of the box.
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-34B", use_fast=False)

model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-34B", device_map="auto")
```

<p align="right"> [
  <a href="#top">Back to top ⬆️ </a>  ] 
</p>

### Downstream

> 💡 Tip
>
> - Feel free to create a PR and share the fantastic work you've built using the Yi series models.
>
> - To help others quickly understand your work, it is recommended to use the format `<model-name>: <model-intro> + <model-highlights>`.

#### Serving

If you want to get up and running with Yi in a few minutes, you can use one of the following services built upon Yi.

- Yi-34B-Chat: you can chat with Yi using one of the following platforms:
  - [Yi-34B-Chat | Hugging Face](https://huggingface.co/spaces/01-ai/Yi-34B-Chat)
  - [Yi-34B-Chat | Yi Platform](https://platform.lingyiwanwu.com/): **note** that it is currently available through a whitelist. You are welcome to apply (fill out a form in [English](https://cn.mikecrm.com/l91ODJf) or [Chinese](https://cn.mikecrm.com/gnEZjiQ)) and experience it firsthand!

- [Yi-6B-Chat (Replicate)](https://replicate.com/01-ai): you can use this model with more options by setting additional parameters and calling APIs.

- [ScaleLLM](https://github.com/vectorch-ai/ScaleLLM#supported-models): you can use this service to run Yi models locally with added flexibility and customization.

#### Quantization

If you have limited computational capabilities, you can use Yi's quantized models as follows.
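The memory savings from quantization are easy to ballpark: weight memory scales linearly with bits per parameter. A rough illustrative sketch (it deliberately ignores activations, the KV cache, and quantization metadata overhead):

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight-only memory footprint in GB."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# Yi-34B: ~68 GB of weights at fp16 vs ~17 GB at 4-bit (GPTQ/AWQ/GGUF Q4).
print(f"fp16: {weight_memory_gb(34, 16):.0f} GB, 4-bit: {weight_memory_gb(34, 4):.0f} GB")
```

This is why a 4-bit Yi-34B can fit on a single 24 GB consumer GPU, as several of the community posts above demonstrate, while the fp16 checkpoint needs multiple cards.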
These quantized models have reduced precision but offer increased efficiency, such as faster inference speed and lower memory usage.

- [TheBloke/Yi-34B-GPTQ](https://huggingface.co/TheBloke/Yi-34B-GPTQ)
- [TheBloke/Yi-34B-GGUF](https://huggingface.co/TheBloke/Yi-34B-GGUF)
- [TheBloke/Yi-34B-AWQ](https://huggingface.co/TheBloke/Yi-34B-AWQ)

#### Fine-tuning

If you want to explore the diverse capabilities of the Yi family, you can look into the community's fine-tuned models below.

- [TheBloke Models](https://huggingface.co/TheBloke): this site hosts numerous fine-tuned models derived from various LLMs, including Yi.

  This is not an exhaustive list for Yi; to name a few, sorted by downloads:
  - [TheBloke/dolphin-2_2-yi-34b-AWQ](https://huggingface.co/TheBloke/dolphin-2_2-yi-34b-AWQ)
  - [TheBloke/Yi-34B-Chat-AWQ](https://huggingface.co/TheBloke/Yi-34B-Chat-AWQ)
  - [TheBloke/Yi-34B-Chat-GPTQ](https://huggingface.co/TheBloke/Yi-34B-Chat-GPTQ)

- [SUSTech/SUS-Chat-34B](https://huggingface.co/SUSTech/SUS-Chat-34B): this model ranked first among all models below 70B and outperformed deepseek-llm-67b-chat, a model twice its size. You can check the result on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

- [OrionStarAI/OrionStar-Yi-34B-Chat-Llama](https://huggingface.co/OrionStarAI/OrionStar-Yi-34B-Chat-Llama): this model outperformed other models (such as GPT-4, Qwen-14B-Chat, and Baichuan2-13B-Chat) in the C-Eval and CMMLU evaluations on the [OpenCompass LLM Leaderboard](https://opencompass.org.cn/leaderboard-llm).
- [NousResearch/Nous-Capybara-34B](https://huggingface.co/NousResearch/Nous-Capybara-34B): this model was trained with a 200K context length for 3 epochs on the Capybara dataset.

#### API

- [amazing-openai-api](https://github.com/soulteary/amazing-openai-api): this tool converts Yi model APIs into the OpenAI API format out of the box.
- [LlamaEdge](https://www.secondstate.io/articles/yi-34b/#create-an-openai-compatible-api-service-for-the-yi-34b-chat-model): this tool builds an OpenAI-compatible API server for Yi-34B-Chat using a portable Wasm (WebAssembly) file, powered by Rust.

<p align="right"> [
  <a href="#top">Back to top ⬆️ </a>  ] 
</p>

## Tech report

For the detailed capabilities of the Yi series models, see [Yi: Open Foundation Models by 01.AI](https://arxiv.org/abs/2403.04652).

### Citation

```
@misc{ai2024yi,
    title={Yi: Open Foundation Models by 01.AI},
    author={01.
AI and : and Alex Young and Bei Chen and Chao Li and Chengen Huang and Ge Zhang and Guanwei Zhang and Heng Li and Jiangcheng Zhu and Jianqun Chen and Jing Chang and Kaidong Yu and Peng Liu and Qiang Liu and Shawn Yue and Senbin Yang and Shiming Yang and Tao Yu and Wen Xie and Wenhao Huang and Xiaohui Hu and Xiaoyi Ren and Xinyao Niu and Pengcheng Nie and Yuchi Xu and Yudong Liu and Yue Wang and Yuxuan Cai and Zhenyu Gu and Zhiyuan Liu and Zonghong Dai},
    year={2024},
    eprint={2403.04652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

## Benchmarks

- [Chat model performance](#chat-model-performance)
- [Base model performance](#base-model-performance)

### Chat model performance

The Yi-34B-Chat model demonstrates exceptional performance, ranking first among all existing open-source models on benchmarks including MMLU, CMMLU, BBH, GSM8K, and more.

![Chat model performance](https://oss.gittoolsai.com/images/01-ai_Yi_readme_d317706bd8fe.png)

<details>
<summary> Evaluation methods and challenges. ⬇️ </summary>

- **Evaluation methods**: we evaluated various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA.
- **Zero-shot vs. few-shot**: in chat models, the zero-shot approach is more commonly employed.
- **Evaluation strategy**: our evaluation strategy involves generating responses while following instructions explicitly or implicitly (such as using few-shot examples).
We then isolate the relevant answers from the generated text.
- **Challenges faced**: some models are not well suited to producing output in the specific format required by the instructions of a few datasets, which leads to suboptimal results.

<strong>*</strong>: C-Eval results are evaluated on the validation datasets.
</details>

### Base model performance

#### Yi-34B and Yi-34B-200K

The Yi-34B and Yi-34B-200K models stand out as the top performers among open-source models, especially excelling in MMLU, CMMLU, common-sense reasoning, reading comprehension, and more.

![Base model performance](https://oss.gittoolsai.com/images/01-ai_Yi_readme_aaa969cb7c68.png)

<details>
<summary> Evaluation methods. ⬇️</summary>

- **Disparity in results**: while benchmarking open-source models, we noted a disparity between results from our pipeline and those reported by public sources such as OpenCompass.
- **Investigation findings**: a deeper investigation reveals that variations in prompts, post-processing strategies, and sampling techniques across models may lead to significant outcome differences.
- **Uniform benchmarking process**: our methodology aligns with the original benchmarks: consistent prompts and post-processing strategies are used, and greedy decoding is applied during evaluations, without any post-processing of the generated content.
- **Efforts to retrieve unreported scores**: for scores that were not reported by the original authors (including scores reported with different settings), we try to obtain results with our own pipeline.
- **Extensive model evaluation**: to evaluate the model's capability extensively, we adopted the methodology outlined in Llama 2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, and CSQA to assess common-sense reasoning.
SQuAD, QuAC, and BoolQ were incorporated to evaluate reading comprehension.
- **Special configurations**: CSQA was tested exclusively with a 7-shot setup, while all other tests were conducted with a 0-shot configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval (0-shot@1), and MBPP (3-shot@1) under the category "Math & Code".
- **Falcon-180B caveat**: Falcon-180B was not tested on QuAC and OBQA due to technical constraints. Its performance score is an average over the other tasks, and since these two tasks generally receive lower scores, Falcon-180B's capabilities are likely not underestimated.
</details>

#### Yi-9B

Yi-9B is nearly the best among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5, and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension.

![Yi-9B benchmark - details](https://oss.gittoolsai.com/images/01-ai_Yi_readme_c6971329b01f.png)

- In terms of **overall** ability (Mean-All), Yi-9B performs best among similarly sized open-source models, surpassing DeepSeek-Coder, DeepSeek-Math, Mistral-7B, SOLAR-10.7B, and Gemma-7B.

  ![Yi-9B benchmark - overall](https://oss.gittoolsai.com/images/01-ai_Yi_readme_19980488c3c4.png)

- In terms of **coding** ability (Mean-Code), Yi-9B's performance is second only to DeepSeek-Coder-7B, surpassing Yi-34B, SOLAR-10.7B, Mistral-7B, and Gemma-7B.

  ![Yi-9B benchmark - code](https://oss.gittoolsai.com/images/01-ai_Yi_readme_b2eafeb91353.png)

- In terms of **math** ability (Mean-Math), Yi-9B's performance is second only to DeepSeek-Math-7B, surpassing SOLAR-10.7B, Mistral-7B, and Gemma-7B.

  ![Yi-9B benchmark - math](https://oss.gittoolsai.com/images/01-ai_Yi_readme_ef6008782aea.png)

- In terms of **common sense and reasoning** ability (Mean-Text), Yi-9B's
performance is on par with Mistral-7B, SOLAR-10.7B, and Gemma-7B.\n\n  ![Yi-9B benchmark - text](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_41deed45235f.png)\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n# Who can use Yi?\n\nEveryone! 🙌 ✅\n\nThe code and weights of the Yi series models are distributed under the [Apache 2.0 license](https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fblob\u002Fmain\u002FLICENSE), which means the Yi series models are free for personal usage, academic purposes, and commercial use. \n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n# Misc.\n\n### Acknowledgments\n\nA heartfelt thank you to each of you who have made contributions to the Yi community! You have helped make Yi not just a project, but a vibrant, growing home for innovation.\n\n[![yi contributors](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_c30fd069c6c5.png)](https:\u002F\u002Fgithub.com\u002F01-ai\u002Fyi\u002Fgraphs\u002Fcontributors)\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### Disclaimer\n\nWe use data compliance checking algorithms during the training process to\nensure the compliance of the trained model to the best of our ability. Due to\ncomplex data and the diversity of language model usage scenarios, we cannot\nguarantee that the model will generate correct and reasonable output in all\nscenarios. Please be aware that there is still a risk of the model producing\nproblematic outputs. 
We will not be responsible for any risks and issues\nresulting from misuse, misguidance, illegal usage, and related misinformation,\nas well as any associated data security concerns.\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### License\n\nThe code and weights of the Yi-1.5 series models are distributed under the [Apache 2.0 license](https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fblob\u002Fmain\u002FLICENSE).\n\nIf you create derivative works based on this model, please include the following attribution in your derivative works:\n\n    This work is a derivative of [The Yi Series Model You Base On] by 01.AI, used under the Apache 2.0 License.\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">Back to top ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n","\u003Cp align=\"left\">\n    &nbspEnglish&nbsp | &nbsp; \u003Ca href=\"README_CN.md\">中文\u003C\u002Fa>\n\u003C\u002Fp>\n\u003Cbr>\u003Cbr>\n\n\u003Cdiv align=\"center\">\n\n\u003Cpicture>\n  \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002F01-ai\u002FYi\u002Fmain\u002Fassets\u002Fimg\u002FYi_logo_icon_dark.svg\" width=\"200px\">\n  \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002F01-ai\u002FYi\u002Fmain\u002Fassets\u002Fimg\u002FYi_logo_icon_light.svg\" width=\"200px\"> \n  \u003Cimg alt=\"specify theme context for images\" src=\"https:\u002F\u002Fraw.githubusercontent.com\u002F01-ai\u002FYi\u002Fmain\u002Fassets\u002Fimg\u002FYi_logo_icon_light.svg\" width=\"200px\">\n\u003C\u002Fpicture>\n\n\u003C\u002Fbr>\n\u003C\u002Fbr>\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Factions\u002Fworkflows\u002Fbuild_docker_image.yml\">\n  \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Factions\u002Fworkflows\u002Fbuild_docker_image.yml\u002Fbadge.svg\">\n\u003C\u002Fa>\n\u003Ca 
href=\"mailto:oss@01.ai\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F✉️-yi@01.ai-FFE01B\">\n\u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\n\u003Cdiv id=\"top\">\u003C\u002Fdiv>  \n\n\u003Cdiv align=\"center\">\n  \u003Ch3 align=\"center\">构建下一代开源双语大模型\u003C\u002Fh3>\n\u003C\u002Fdiv>\n\u003Cp align=\"center\">\n🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002F01-ai\" target=\"_blank\">Hugging Face\u003C\u002Fa> • 🤖 \u003Ca href=\"https:\u002F\u002Fwww.modelscope.cn\u002Forganization\u002F01ai\u002F\" target=\"_blank\">ModelScope\u003C\u002Fa> • 🟣 \u003Ca href=\"https:\u002F\u002Fwisemodel.cn\u002Forganization\u002F01.AI\" target=\"_blank\">wisemodel\u003C\u002Fa>\n\u003C\u002Fp> \n\n\u003Cp align=\"center\">\n    👩‍🚀 在 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fdiscussions\" target=\"_blank\"> GitHub \u003C\u002Fa> 上提问或讨论想法\n\u003C\u002Fp> \n\n\u003Cp align=\"center\">\n    👋 加入我们的 \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FhYUwWddeAu\" target=\"_blank\"> 👾 Discord \u003C\u002Fa> 或 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi-1.5\u002Fissues\u002F2\" target=\"_blank\"> 💬 微信 \u003C\u002Fa>\n\u003C\u002Fp> \n\n\u003Cp align=\"center\">\n    📝 查看 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.04652\"> Yi 技术报告 \u003C\u002Fa>\n\u003C\u002Fp> \n\n\u003Cp align=\"center\">\n    📚 在 \u003Ca href=\"#learning-hub\"> Yi 学习中心 \u003C\u002Fa> 不断成长\n\u003C\u002Fp> \n\n\u003Cp align=\"center\">\n    💪 在 \u003Ca href=\"https:\u002F\u002F01-ai.github.io\u002F\"> Yi 技术博客 \u003C\u002Fa> 学习更多知识\n\u003C\u002Fp> \n\n\u003C!-- DO NOT REMOVE ME -->\n\n\u003Chr>\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>📕 目录\u003C\u002Fb>\u003C\u002Fsummary>\n\n- [什么是 Yi？](#what-is-yi)\n  - [简介](#introduction)\n  - [模型](#models)\n    - [对话模型](#chat-models)\n    - [基础模型](#base-models)\n    - [模型信息](#model-info)\n  - [新闻](#news)\n- [如何使用 Yi？](#how-to-use-yi)\n  - [快速入门](#quick-start)\n    - 
[选择你的路径](#choose-your-path)\n    - [pip](#quick-start---pip)\n    - [docker](#quick-start---docker)\n    - [llama.cpp](#quick-start---llamacpp)\n    - [conda-lock](#quick-start---conda-lock)\n    - [Web 演示](#web-demo)\n  - [微调](#fine-tuning)\n  - [量化](#quantization)\n  - [部署](#deployment)\n  - [常见问题解答](#faq)\n  - [学习中心](#learning-hub)\n- [为什么选择 Yi？](#why-yi)\n  - [生态系统](#ecosystem)\n    - [上游](#upstream)\n    - [下游](#downstream)\n      - [推理服务](#serving)\n      - [量化](#quantization-1)\n      - [微调](#fine-tuning-1)\n      - [API](#api)\n  - [基准测试](#benchmarks)\n    - [基础模型性能](#base-model-performance)\n    - [对话模型性能](#chat-model-performance)\n  - [技术报告](#tech-report)\n    - [引用](#citation)\n- [谁可以使用 Yi？](#who-can-use-yi)\n- [其他](#misc)\n  - [致谢](#acknowledgments)\n  - [免责声明](#disclaimer)\n  - [许可证](#license)\n\n\u003C\u002Fdetails>\n\n\u003Chr>\n\n# 什么是 Yi？\n\n## 简介 \n\n- 🤖 Yi 系列模型是由 [01.AI](https:\u002F\u002F01.ai\u002F) 从零开始训练的下一代开源大型语言模型。\n\n- 🙌 作为一款双语语言模型，Yi 系列模型基于 3T 多语言语料库进行训练，已成为全球最强大的 LLM 之一，在语言理解、常识推理、阅读理解等方面表现出色。例如：\n  \n  - Yi-34B-Chat 模型在 AlpacaEval 排行榜上**位居第二（仅次于 GPT-4 Turbo）**，超越了 GPT-4、Mixtral、Claude 等其他 LLM（数据截至 2024 年 1 月）。\n\n  - Yi-34B 模型在多个基准测试中，包括 Hugging Face Open LLM Leaderboard（预训练）和 C-Eval（数据截至 2023 年 11 月），**在英语和中文两个语种中均排名第一**，领先于 Falcon-180B、Llama-70B、Claude 等现有开源模型。\n\n  - 🙏 （感谢 Llama）得益于 Transformer 和 Llama 开源社区的努力，它们大大减少了从零构建模型所需的工作量，并使得 AI 生态系统中的工具得以共享使用。  \n\n  \u003Cdetails style=\"display: inline;\">\u003Csummary> 如果你对 Yi 如何采用 Llama 架构以及其许可使用政策感兴趣，请参阅 \u003Cspan style=\"color:  green;\">Yi 与 Llama 的关系。\u003C\u002Fspan> ⬇️\u003C\u002Fsummary> \u003Cul> \u003Cbr>\n  \n  \n> 💡 TL;DR\n> \n> Yi 系列模型采用了与 Llama 相同的模型架构，但**并非** Llama 的衍生品。\n\n- Yi 和 Llama 都基于 Transformer 结构，自 2018 年以来，Transformer 已成为大型语言模型的标准架构。\n\n- 基于 Transformer 架构，Llama 凭借其出色的稳定性、可靠的收敛性和强大的兼容性，已成为大多数先进开源模型的新基石，也因此被公认为包括 Yi 在内的诸多模型的基础框架。\n\n- 正是由于 Transformer 和 Llama 架构的存在，其他模型能够充分利用其优势，从而减少从零构建模型所需的努力，并在各自的生态系统中共享使用相同的工具。\n\n- 然而，Yi 系列模型**并非** Llama 的衍生品，因为它们并未使用 Llama 
的权重。\n\n  - 由于大多数开源模型都采用了 Llama 的结构，决定模型性能的关键因素在于训练数据集、训练流程以及训练基础设施。\n\n  - Yi 以独特且自主的方式，完全从头开始构建了高质量的训练数据集、高效的训练流程以及稳健的训练基础设施。正是这些努力使 Yi 系列模型取得了优异的成绩，其性能紧随 GPT4 之后，并在 2023 年 12 月的 [Alpaca 排行榜] 上超越了 Llama。\n\u003C\u002Ful>\n\u003C\u002Fdetails>\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n## 新闻 \n\n\u003Cdetails>\n  \u003Csummary>🔥 \u003Cb>2024-07-29\u003C\u002Fb>: \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FHaijian06\u002FYi\u002Ftree\u002Fmain\u002FCookbook\">Yi Cookbook 1.0\u003C\u002Fa> 正式发布，包含中英文教程与示例。\u003C\u002Fsummary>\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>🎯 \u003Cb>2024-05-13\u003C\u002Fb>: \u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi-1.5\">Yi-1.5 系列模型\u003C\u002Fa> 开源，进一步提升了代码、数学、推理及指令遵循能力。\u003C\u002Fsummary>\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>🎯 \u003Cb>2024-03-16\u003C\u002Fb>: \u003Ccode>Yi-9B-200K\u003C\u002Fcode> 已开源并面向公众开放。\u003C\u002Fsummary>\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>🎯 \u003Cb>2024-03-08\u003C\u002Fb>: \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.04652\">Yi 技术报告\u003C\u002Fa> 发表！\u003C\u002Fsummary>\n\u003C\u002Fdetails>\n\n\n\u003Cdetails open>\n  \u003Csummary>🔔 \u003Cb>2024-03-07\u003C\u002Fb>: Yi-34B-200K 的长文本处理能力得到增强。\u003C\u002Fsummary>\n  \u003Cbr>\n在“大海捞针”测试中，Yi-34B-200K 的表现提升了 10.5%，从 89.3% 提升至令人印象深刻的 99.8%。我们仍在使用 50 亿 token 的长上下文数据混合集对该模型进行预训练，并展现出近乎全绿的成绩。\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>🎯 \u003Cb>2024-03-06\u003C\u002Fb>: \u003Ccode>Yi-9B\u003C\u002Fcode> 已开源并面向公众开放。\u003C\u002Fsummary>\n  \u003Cbr>\n\u003Ccode>Yi-9B\u003C\u002Fcode> 在一系列类似规模的开源模型中（包括 Mistral-7B、SOLAR-10.7B、Gemma-7B、DeepSeek-Coder-7B-Base-v1.5 等）表现最为突出，尤其在代码、数学、常识推理和阅读理解方面表现出色。\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>🎯 \u003Cb>2024-01-23\u003C\u002Fb>: Yi-VL 模型，\u003Ccode>\u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-VL-34B\">Yi-VL-34B\u003C\u002Fa>\u003C\u002Fcode> 和 \u003Ccode>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-VL-6B\">Yi-VL-6B\u003C\u002Fa>\u003C\u002Fcode>, 已开源并面向公众开放。\u003C\u002Fsummary>\n  \u003Cbr>\n\u003Ccode>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-VL-34B\">Yi-VL-34B\u003C\u002Fa>\u003C\u002Fcode> 在最新基准测试中，包括 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.16502\">MMMU\u003C\u002Fa> 和 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.11944\">CMMMU\u003C\u002Fa>（基于截至 2024 年 1 月的数据），在所有现有开源模型中位居\u003Cstrong>第一\u003C\u002Fstrong>。\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>🎯 \u003Cb>2023-11-23\u003C\u002Fb>: \u003Ca href=\"#chat-models\">聊天模型\u003C\u002Fa> 已开源并面向公众开放。\u003C\u002Fsummary>\n\u003Cbr>本次发布包含两款基于先前发布的基础模型的聊天模型、两款由 GPTQ 量化的 8 位模型，以及两款由 AWQ 量化的 4 位模型。\n\n- `Yi-34B-Chat`\n- `Yi-34B-Chat-4bits`\n- `Yi-34B-Chat-8bits`\n- `Yi-6B-Chat`\n- `Yi-6B-Chat-4bits`\n- `Yi-6B-Chat-8bits`\n\n您可以在以下平台交互体验部分模型：\n\n- [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002F01-ai\u002FYi-34B-Chat)\n- [Replicate](https:\u002F\u002Freplicate.com\u002F01-ai)\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>🔔 \u003Cb>2023-11-23\u003C\u002Fb>: Yi 系列模型社区许可协议更新至 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fblob\u002Fmain\u002FMODEL_LICENSE_AGREEMENT.txt\">v2.1\u003C\u002Fa>。\u003C\u002Fsummary>\n\u003C\u002Fdetails>\n\n\u003Cdetails> \n\u003Csummary>🔥 \u003Cb>2023-11-08\u003C\u002Fb>: Yi-34B 聊天模型邀请测试。\u003C\u002Fsummary>\n\u003Cbr>申请表：\n\n- [英文](https:\u002F\u002Fcn.mikecrm.com\u002Fl91ODJf)\n- [中文](https:\u002F\u002Fcn.mikecrm.com\u002FgnEZjiQ)\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>🎯 \u003Cb>2023-11-05\u003C\u002Fb>: \u003Ca href=\"#base-models\">基础模型\u003C\u002Fa>，\u003Ccode>Yi-6B-200K\u003C\u002Fcode> 和 \u003Ccode>Yi-34B-200K\u003C\u002Fcode>, 
已开源并面向公众开放。\u003C\u002Fsummary>\n\u003Cbr>本次发布包含两款与此前版本参数规模相同的基础模型，区别在于上下文窗口扩展至 20 万 token。\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>🎯 \u003Cb>2023-11-02\u003C\u002Fb>: \u003Ca href=\"#base-models\">基础模型\u003C\u002Fa>，\u003Ccode>Yi-6B\u003C\u002Fcode> 和 \u003Ccode>Yi-34B\u003C\u002Fcode>, 已开源并面向公众开放。\u003C\u002Fsummary>\n\u003Cbr>首次公开发布包含两款双语（英\u002F中）的基础模型，参数规模分别为 60 亿和 340 亿。两者均以 4 千序列长度进行训练，推理时可扩展至 3.2 万。\n\u003C\u002Fdetails>\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n## 模型\n\nYi 模型拥有多种尺寸，可满足不同应用场景的需求。您还可以对 Yi 模型进行微调，以更好地适配您的具体要求。\n\n如果您希望部署 Yi 模型，请确保满足[软硬件要求](#deployment)。\n\n### 聊天模型\n\n| 模型 | 下载 |\n|---|---|\n|Yi-34B-Chat\t| • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-34B-Chat)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-34B-Chat\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-34B-Chat) |\n|Yi-34B-Chat-4bits\t| • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-34B-Chat-4bits)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-34B-Chat-4bits\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-34B-Chat-4bits) |\n|Yi-34B-Chat-8bits | • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-34B-Chat-8bits)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-34B-Chat-8bits\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-34B-Chat-8bits) |\n|Yi-6B-Chat| • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-6B-Chat)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-6B-Chat\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B-Chat) |\n|Yi-6B-Chat-4bits | • [🤗 Hugging 
Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-6B-Chat-4bits)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-6B-Chat-4bits\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B-Chat-4bits) |\n|Yi-6B-Chat-8bits\t| • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-6B-Chat-8bits)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-6B-Chat-8bits\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B-Chat-8bits) |\n\n\u003Csub>\u003Csup> - 4 位系列模型采用 AWQ 量化。 \u003Cbr> - 8 位系列模型采用 GPTQ 量化。 \u003Cbr> - 所有量化后的模型使用门槛较低，可在消费级 GPU 上部署（如 3090、4090）。 \u003C\u002Fsup>\u003C\u002Fsub>\n\n### 基础模型\n\n| 模型 | 下载 |\n|---|---|\n|Yi-34B| • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-34B)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-34B\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-34B) |\n|Yi-34B-200K|• [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-34B-200K)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-34B-200K\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-34B-200K)|\n|Yi-9B|• [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-9B)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-9B\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-9B)|\n|Yi-9B-200K | • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-9B-200K)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-9B-200K\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-9B-200K) |\n|Yi-6B| • [🤗 Hugging 
Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-6B)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-6B\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B) |\n|Yi-6B-200K\t| • [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai\u002FYi-6B-200K)  • [🤖 ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002F01ai\u002FYi-6B-200K\u002Fsummary)  • [🟣 wisemodel](https:\u002F\u002Fwisemodel.cn\u002Fmodels\u002F01.AI\u002FYi-6B-200K) |\n\n\u003Csub>\u003Csup> - 200k 大致相当于 40 万汉字。  \u003Cbr> - 如果您想使用 Yi-34B-200K 的旧版本（于 2023 年 11 月 5 日发布），请运行 `git checkout 069cd341d60f4ce4b07ec394e82b79e94f656cf` 来下载权重。 \u003C\u002Fsup>\u003C\u002Fsub>\n\n### 模型信息\n\n- 对于聊天和基础模型\n\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>模型\u003C\u002Fth>\n\u003Cth>简介\u003C\u002Fth>\n\u003Cth>默认上下文窗口\u003C\u002Fth>\n\u003Cth>预训练 token 数量\u003C\u002Fth>\n\u003Cth>训练数据日期\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>6B 系列模型\u003C\u002Ftd>\n\u003Ctd>它们适用于个人和学术用途。\u003C\u002Ftd>\n\u003Ctd rowspan=\"3\">4K\u003C\u002Ftd>\n\u003Ctd>3T\u003C\u002Ftd>\n\u003Ctd rowspan=\"3\">截至 2023 年 6 月\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>9B 系列模型\u003C\u002Ftd>\n\u003Ctd>它是 Yi 系列模型中在编码和数学方面表现最好的。\u003C\u002Ftd>\n\u003Ctd>Yi-9B 是在 Yi-6B 的基础上持续训练的，使用了 0.8T 的 token。\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>34B 系列模型\u003C\u002Ftd>\n\u003Ctd>它们适用于个人、学术以及商业用途（尤其是中小企业）。这是一种经济实惠且具备涌现能力的成本效益解决方案。\u003C\u002Ftd>\n\u003Ctd>3T\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\n\n- 对于聊天模型\n  \n  \u003Cdetails style=\"display: inline;\">\u003Csummary>关于聊天模型的局限性，请参阅下方说明。⬇️\u003C\u002Fsummary>\n   \u003Cul>\n    \u003Cbr>发布的聊天模型经过了监督微调（SFT）的专属训练。与其他标准聊天模型相比，我们的模型能够产生更多样化的回复，因此适用于各种下游任务，例如创意场景。此外，这种多样性有望提高生成更高质量回复的可能性，这将有利于后续的强化学习（RL）训练。\n\n    \u003Cbr>然而，这种更高的多样性可能会放大某些现有问题，包括：\n      
\u003Cli>幻觉：指模型生成事实不正确或不合逻辑的信息。随着模型回复变得更加多样化，出现基于不准确数据或缺乏逻辑推理的幻觉的可能性也会增加。\u003C\u002Fli>\n      \u003Cli>重新生成时的非确定性：在尝试重新生成或采样回复时，可能会出现结果不一致的情况。多样性的增加可能导致即使在相似的输入条件下，也会产生不同的结果。\u003C\u002Fli>\n      \u003Cli>累积误差：当模型回复中的错误随着时间推移而不断累积时就会发生这种情况。随着模型生成更多样化的回复，小的不准确性逐渐积累成较大误差的可能性会增加，尤其是在复杂的任务中，如长篇推理、数学问题求解等。\u003C\u002Fli>\n      \u003Cli>为了获得更加连贯和一致的回复，建议调整生成配置参数，例如温度、top_p 或 top_k。这些调整有助于在模型输出的创造性和连贯性之间取得平衡。\u003C\u002Fli>\n  \u003C\u002Ful>\n  \u003C\u002Fdetails>\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n\n# 如何使用 Yi？\n\n- [快速入门](#quick-start)\n  - [选择你的路径](#choose-your-path)\n  - [pip](#quick-start---pip)\n  - [docker](#quick-start---docker)\n  - [conda-lock](#quick-start---conda-lock)\n  - [llama.cpp](#quick-start---llamacpp)\n  - [Web 演示](#web-demo)\n- [微调](#fine-tuning)\n- [量化](#quantization)\n- [部署](#deployment)\n- [常见问题](#faq)\n- [学习中心](#learning-hub)\n\n## 快速入门\n\n> **💡 提示**: 如果您想开始使用 Yi 模型并探索不同的推理方法，请查看 [Yi 烹饪书](https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Ftree\u002Fmain\u002FCookbook)。\n\n### 选择你的路径\n\n请选择以下其中一个路径，开始你与 Yi 的旅程！\n\n![快速入门 - 选择你的路径](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_8ce0f0e81389.png)\n\n#### 🎯 在本地部署 Yi\n\n如果你更倾向于在本地部署 Yi 模型：\n\n  - 🙋‍♀️ 如果你拥有 **充足** 的资源（例如 NVIDIA A800 80GB），你可以选择以下方法之一：\n    - [pip](#快速入门---pip)\n    - [Docker](#快速入门---docker)\n    - [conda-lock](#快速入门---conda-lock)\n\n  - 🙋‍♀️ 如果你只有 **有限** 的资源（例如 MacBook Pro），可以使用 [llama.cpp](#快速入门---llamacpp)。\n\n#### 🎯 不在本地部署 Yi\n\n如果你不希望在本地部署 Yi 模型，可以通过以下任意一种方式来体验 Yi 的能力。\n\n##### 🙋‍♀️ 使用 API 运行 Yi\n\n如果你想探索 Yi 的更多功能，可以采用以下方法之一：\n\n- Yi 官方 API\n  - 已向部分申请者开放了 **早期访问权限**（[推文链接](https:\u002F\u002Fx.com\u002F01AI_Yi\u002Fstatus\u002F1735728934560600536?s=20)）。敬请关注下一轮开放！\n\n- [Yi API](https:\u002F\u002Freplicate.com\u002F01-ai\u002Fyi-34b-chat\u002Fapi?tab=nodejs)（Replicate）\n\n##### 🙋‍♀️ 在 Playground 中运行 Yi\n\n如果你想以更多自定义选项（如系统提示、温度、重复惩罚等）与 Yi 对话，可以尝试以下几种方式：\n\n  - 
[Yi-34B-Chat-Playground](https:\u002F\u002Fplatform.lingyiwanwu.com\u002Fprompt\u002Fplayground)（Yi 官方）\n    - 需通过白名单访问。欢迎申请（填写 [英文表单](https:\u002F\u002Fcn.mikecrm.com\u002Fl91ODJf) 或 [中文表单](https:\u002F\u002Fcn.mikecrm.com\u002FgnEZjiQ)）。\n\n  - [Yi-34B-Chat-Playground](https:\u002F\u002Freplicate.com\u002F01-ai\u002Fyi-34b-chat)（Replicate）\n\n##### 🙋‍♀️ 与 Yi 聊天\n\n如果你想与 Yi 对话，可以使用以下在线服务，它们提供相似的用户体验：\n\n- [Yi-34B-Chat](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002F01-ai\u002FYi-34B-Chat)（Hugging Face 上的 Yi 官方版本）\n  - 无需注册。\n\n- [Yi-34B-Chat](https:\u002F\u002Fplatform.lingyiwanwu.com\u002F)（Yi 官方测试版）\n  - 需通过白名单访问。欢迎申请（填写 [英文表单](https:\u002F\u002Fcn.mikecrm.com\u002Fl91ODJf) 或 [中文表单](https:\u002F\u002Fcn.mikecrm.com\u002FgnEZjiQ)）。\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### 快速入门 - pip \n\n本教程将指导你完成在 A800（80G）上本地运行 **Yi-34B-Chat** 并进行推理的全过程。\n\n#### 步骤 0：先决条件\n\n- 确保已安装 Python 3.10 或更高版本。\n\n- 如果你想运行其他 Yi 模型，请参阅 [软硬件要求](#部署)。\n\n#### 步骤 1：准备环境 \n\n要设置环境并安装所需包，请执行以下命令。\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi.git\ncd yi\npip install -r requirements.txt\n```\n\n#### 步骤 2：下载 Yi 模型\n\n你可以从以下来源下载 Yi 模型的权重和分词器：\n\n- [Hugging Face](https:\u002F\u002Fhuggingface.co\u002F01-ai)\n- [ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Forganization\u002F01ai\u002F)\n- [WiseModel](https:\u002F\u002Fwisemodel.cn\u002Forganization\u002F01.AI)\n\n#### 步骤 3：进行推理\n\n你可以按照以下方式对 Yi 的聊天模型或基础模型进行推理。\n\n##### 使用 Yi 聊天模型进行推理\n\n1. 
创建一个名为 `quick_start.py` 的文件，并将以下内容复制到其中。\n\n    ```python\n    from transformers import AutoModelForCausalLM, AutoTokenizer\n\n    model_path = '\u003Cyour-model-path>'\n\n    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)\n\n    # 自 transformers 4.35.0 起，GPT-Q\u002FAWQ 模型可使用 AutoModelForCausalLM 加载。\n    model = AutoModelForCausalLM.from_pretrained(\n        model_path,\n        device_map=\"auto\",\n        torch_dtype='auto'\n    ).eval()\n\n    # 提示内容：“hi”\n    messages = [\n        {\"role\": \"user\", \"content\": \"hi\"}\n    ]\n\n    input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')\n    output_ids = model.generate(input_ids.to('cuda'))\n    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)\n\n    # 模型响应：“Hello! How can I assist you today?”\n    print(response)\n    ```\n\n2. 运行 `quick_start.py`。\n\n    ```bash\n    python quick_start.py\n    ```\n\n    你将看到类似以下的输出。🥳\n\n    ```bash\n    Hello! 
How can I assist you today?\n    ```\n\n##### 使用 Yi 基础模型进行推理\n\n- Yi-34B\n\n  步骤与 [pip - 使用 Yi 聊天模型进行推理](#使用-Yi-聊天模型进行推理) 类似。\n\n  你可以直接使用现有的文件 [`text_generation.py`](https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Ftree\u002Fmain\u002Fdemo)。\n\n  ```bash\n  python demo\u002Ftext_generation.py --model \u003Cyour-model-path>\n  ```\n\n  你将看到类似以下的输出。🥳\n\n  \u003Cdetails>\n\n  \u003Csummary>输出。⬇️ \u003C\u002Fsummary>\n\n  \u003Cbr>\n\n  **提示**：让我给你讲一个关于猫汤姆和老鼠杰瑞的有趣故事，\n\n  **生成**：让我给你讲一个关于猫汤姆和老鼠杰瑞的有趣故事，这发生在我小时候。我父亲有一栋大房子，里面养了两只猫来抓老鼠。有一天，我独自在家玩耍时，发现其中一只公猫正仰面躺在厨房门口附近，看起来好像想让我们做点什么，却因为周围人太多而起不来！它试了好几分钟，最后还是放弃了……\n\n  \u003C\u002Fdetails>\n\n- Yi-9B\n  \n  输入\n\n  ```python\n  from transformers import AutoModelForCausalLM, AutoTokenizer\n\n  MODEL_DIR = \"01-ai\u002FYi-9B\"\n  model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype=\"auto\")\n  tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, use_fast=False)\n\n  input_text = \"# 写出快速排序算法\"\n  inputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\n  outputs = model.generate(**inputs, max_length=256)\n  print(tokenizer.decode(outputs[0], skip_special_tokens=True))\n  ```\n\n  输出\n\n  ```python\n  # 写出快速排序算法\n  def quick_sort(arr):\n      if len(arr) \u003C= 1:\n          return arr\n      pivot = arr[len(arr) \u002F\u002F 2]\n      left = [x for x in arr if x \u003C pivot]\n      middle = [x for x in arr if x == pivot]\n      right = [x for x in arr if x > pivot]\n      return quick_sort(left) + middle + quick_sort(right)\n  \n  # 测试快速排序算法\n  print(quick_sort([3, 6, 8, 10, 1, 2, 1]))\n  ```\n\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### 快速入门 - Docker \n\u003Cdetails>\n\u003Csummary> 使用 Docker 在本地运行 Yi-34B-chat：分步指南。⬇️\u003C\u002Fsummary> \n\u003Cbr>本教程将引导您完成在本地使用 \u003Cstrong>A800 GPU\u003C\u002Fstrong> 或 \u003Cstrong>4*4090\u003C\u002Fstrong> 运行 \u003Cstrong>Yi-34B-Chat\u003C\u002Fstrong> 并进行推理的每一个步骤。\n \u003Ch4>步骤 
0：先决条件\u003C\u002Fh4>\n\u003Cp>请确保已安装 \u003Ca href=\"https:\u002F\u002Fdocs.docker.com\u002Fengine\u002Finstall\u002F?open_in_browser=true\">Docker\u003C\u002Fa> 和 \u003Ca href=\"https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Flatest\u002Finstall-guide.html\">NVIDIA 容器工具包\u003C\u002Fa>。\u003C\u002Fp>\n\n\u003Ch4> 步骤 1：启动 Docker \u003C\u002Fh4>\n\u003Cpre>\u003Ccode>docker run -it --gpus all \\\n-v &lt;your-model-path&gt;:\u002Fmodels \\\nghcr.io\u002F01-ai\u002Fyi:latest\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>或者，您也可以从 \u003Ccode>registry.lingyiwanwu.com\u002Fci\u002F01-ai\u002Fyi:latest\u003C\u002Fcode> 拉取 Yi 的 Docker 镜像。\u003C\u002Fp>\n\n\u003Ch4>步骤 2：进行推理\u003C\u002Fh4>\n    \u003Cp>您可以按照如下方式使用 Yi 聊天模型或基础模型进行推理。\u003C\u002Fp>\n\n\u003Ch5>使用 Yi 聊天模型进行推理\u003C\u002Fh5>\n    \u003Cp>步骤与 \u003Ca href=\"#perform-inference-with-yi-chat-model\">pip - 使用 Yi 聊天模型进行推理\u003C\u002Fa> 类似。\u003C\u002Fp>\n    \u003Cp>\u003Cstrong>注意\u003C\u002Fstrong>，唯一的区别是需要将 \u003Ccode>model_path = '&lt;your-model-path&gt;'\u003C\u002Fcode> 替换为 \u003Ccode>model_path = '&lt;your-model-mount-path&gt;'\u003C\u002Fcode>。\u003C\u002Fp>\n\u003Ch5>使用 Yi 基础模型进行推理\u003C\u002Fh5>\n    \u003Cp>步骤与 \u003Ca href=\"#perform-inference-with-yi-base-model\">pip - 使用 Yi 基础模型进行推理\u003C\u002Fa> 类似。\u003C\u002Fp>\n    \u003Cp>\u003Cstrong>注意\u003C\u002Fstrong>，唯一的区别是需要将 \u003Ccode>--model &lt;your-model-path&gt;\u003C\u002Fcode> 替换为 \u003Ccode>--model &lt;your-model-mount-path&gt;\u003C\u002Fcode>。\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n### 快速入门 - conda-lock\n\n\u003Cdetails>\n\u003Csummary>您可以使用 \u003Ccode>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fconda\u002Fconda-lock\">conda-lock\u003C\u002Fa>\u003C\u002Fcode> 为 Conda 环境生成完全可复现的锁定文件。⬇️\u003C\u002Fsummary>\n\u003Cbr>\n您可以参考 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fblob\u002Febba23451d780f35e74a780987ad377553134f68\u002Fconda-lock.yml\">conda-lock.yml\u003C\u002Fa> 
来获取依赖项的确切版本。此外，您还可以使用 \u003Ccode>\u003Ca href=\"https:\u002F\u002Fmamba.readthedocs.io\u002Fen\u002Flatest\u002Fuser_guide\u002Fmicromamba.html\">micromamba\u003C\u002Fa>\u003C\u002Fcode> 来安装这些依赖项。\n\u003Cbr>\n要安装依赖项，请按照以下步骤操作：\n\n1. 按照 \u003Ca href=\"https:\u002F\u002Fmamba.readthedocs.io\u002Fen\u002Flatest\u002Finstallation\u002Fmicromamba-installation.html\">此处\u003C\u002Fa> 的说明安装 micromamba。\n\n2. 执行 \u003Ccode>micromamba install -y -n yi -f conda-lock.yml\u003C\u002Fcode>，以创建名为 \u003Ccode>yi\u003C\u002Fcode> 的 Conda 环境并安装必要的依赖项。\n\u003C\u002Fdetails>\n\n### 快速入门 - llama.cpp\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_llama.cpp.md\">以下教程\u003C\u002Fa>将指导您完成在本地运行量化模型（\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FXeIaso\u002Fyi-chat-6B-GGUF\u002Ftree\u002Fmain\">Yi-chat-6B-2bits\u003C\u002Fa>）并进行推理的每一个步骤。\n\u003Cdetails>\n\u003Csummary> 使用 llama.cpp 在本地运行 Yi-chat-6B-2bits：分步指南。⬇️\u003C\u002Fsummary> \n\u003Cbr>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_llama.cpp.md\">本教程\u003C\u002Fa>将指导您完成在本地运行量化模型（\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FXeIaso\u002Fyi-chat-6B-GGUF\u002Ftree\u002Fmain\">Yi-chat-6B-2bits\u003C\u002Fa>）并进行推理的每一个步骤。\u003C\u002Fp>\n\n- [步骤 0：先决条件](#step-0-prerequisites)\n- [步骤 1：下载 llama.cpp](#step-1-download-llamacpp)\n- [步骤 2：下载 Yi 模型](#step-2-download-yi-model)\n- [步骤 3：进行推理](#step-3-perform-inference)\n\n#### 步骤 0：先决条件 \n\n- 本教程假设您使用配备 16GB 内存和 Apple M2 Pro 芯片的 MacBook Pro。\n  \n- 请确保您的机器上已安装 [`git-lfs`](https:\u002F\u002Fgit-lfs.com\u002F)。\n  \n#### 步骤 1：下载 `llama.cpp`\n\n要克隆 [`llama.cpp`](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp) 仓库，运行以下命令。\n\n```bash\ngit clone git@github.com:ggerganov\u002Fllama.cpp.git\n```\n\n#### 步骤 2：下载 Yi 模型\n\n2.1 若要仅通过指针克隆 
[XeIaso\u002Fyi-chat-6B-GGUF](https:\u002F\u002Fhuggingface.co\u002FXeIaso\u002Fyi-chat-6B-GGUF\u002Ftree\u002Fmain)，运行以下命令。\n\n```bash\nGIT_LFS_SKIP_SMUDGE=1 git clone https:\u002F\u002Fhuggingface.co\u002FXeIaso\u002Fyi-chat-6B-GGUF\n```\n\n2.2 若要下载量化后的 Yi 模型（[yi-chat-6b.Q2_K.gguf](https:\u002F\u002Fhuggingface.co\u002FXeIaso\u002Fyi-chat-6B-GGUF\u002Fblob\u002Fmain\u002Fyi-chat-6b.Q2_K.gguf)），运行以下命令。\n\n```bash\ngit-lfs pull --include yi-chat-6b.Q2_K.gguf\n```\n\n#### 步骤 3：进行推理\n\n要使用 Yi 模型进行推理，您可以采用以下方法之一。\n\n- [方法 1：在终端中进行推理](#method-1-perform-inference-in-terminal)\n  \n- [方法 2：在网页中进行推理](#method-2-perform-inference-in-web)\n\n##### 方法 1：在终端中进行推理\n\n要使用 4 个线程编译 `llama.cpp` 并进行推理，请进入 `llama.cpp` 目录，然后运行以下命令。\n\n> ##### 提示\n> \n> - 请将 `\u002FUsers\u002Fyu\u002Fyi-chat-6B-GGUF\u002Fyi-chat-6b.Q2_K.gguf` 替换为您模型的实际路径。\n>\n> - 默认情况下，模型以补全模式运行。\n> \n> - 如需更多输出自定义选项（例如系统提示、温度、重复惩罚等），可运行 `.\u002Fmain -h` 查看详细说明和用法。\n\n```bash\nmake -j4 && .\u002Fmain -m \u002FUsers\u002Fyu\u002Fyi-chat-6B-GGUF\u002Fyi-chat-6b.Q2_K.gguf -p \"如何喂养您的宠物狐狸？请用 6 个简单步骤回答这个问题：\\n第 1 步：\" -n 384 -e\n\n...\n\n如何喂养您的宠物狐狸？请用 6 个简单步骤回答这个问题：\n\n第 1 步：为您的宠物狐狸选择合适的食物。您应选择高质量、均衡的猎物，这些猎物应符合其独特的饮食需求。这些可能包括活体或冷冻的老鼠、大鼠、鸽子或其他小型哺乳动物，以及新鲜的水果和蔬菜。\n\n第 2 步：根据狐狸的种类及其个人偏好，每天喂食一到两次。务必确保它们全天都能获得新鲜的水。\n\n第 3 步：为您的宠物狐狸提供合适的环境。确保它有一个舒适的休息场所、充足的活动空间，以及玩耍和锻炼的机会。\n\n第 4 步：如果可能的话，让您的宠物与其他动物互动。与其他生物的互动可以帮助它们培养社交技能，防止无聊或压力。\n\n第 5 步：定期检查您的狐狸是否有疾病或不适的迹象。准备好在必要时提供兽医护理，尤其是针对寄生虫、牙齿健康问题或感染等常见问题。\n\n第 6 步：了解您的宠物狐狸的需求，并注意任何可能影响其福祉的风险或担忧。定期咨询兽医，以确保您提供最佳的护理。\n\n...\n\n```\n\n现在您已成功向 Yi 模型提问并获得了答案！🥳\n\n##### 方法 2：在网页中进行推理\n\n1. 
要初始化一个轻量且快速的聊天机器人，运行以下命令。\n\n    ```bash\n    cd llama.cpp\n    .\u002Fserver --ctx-size 2048 --host 0.0.0.0 --n-gpu-layers 64 --model \u002FUsers\u002Fyu\u002Fyi-chat-6B-GGUF\u002Fyi-chat-6b.Q2_K.gguf\n    ```\n\n    然后您将看到如下输出：\n\n\n    ```bash\n    ...\n    \n    llama_new_context_with_model: n_ctx      = 2048\n    llama_new_context_with_model: freq_base  = 5000000.0\n    llama_new_context_with_model: freq_scale = 1\n    ggml_metal_init: allocating\n    ggml_metal_init: found device: Apple M2 Pro\n    ggml_metal_init: picking default device: Apple M2 Pro\n    ggml_metal_init: ggml.metallib not found, loading from source\n    ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil\n    ggml_metal_init: loading '\u002FUsers\u002Fyu\u002Fllama.cpp\u002Fggml-metal.metal'\n    ggml_metal_init: GPU name:   Apple M2 Pro\n    ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)\n    ggml_metal_init: hasUnifiedMemory              = true\n    ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB\n    ggml_metal_init: maxTransferRate               = built-in GPU\n    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   128.00 MiB, ( 2629.44 \u002F 10922.67)\n    llama_new_context_with_model: KV self size  =  128.00 MiB, K (f16):   64.00 MiB, V (f16):   64.00 MiB\n    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, ( 2629.45 \u002F 10922.67)\n    llama_build_graph: non-view tensors processed: 676\u002F676\n    llama_new_context_with_model: compute buffer total size = 159.19 MiB\n    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   156.02 MiB, ( 2785.45 \u002F 10922.67)\n    Available slots:\n    -> Slot 0 - max context: 2048\n    \n    llama server listening at http:\u002F\u002F0.0.0.0:8080\n    ```\n\n2. 
要访问聊天机器人界面，打开您的网页浏览器，在地址栏中输入 `http:\u002F\u002F0.0.0.0:8080`。\n   \n    ![Yi 模型聊天机器人界面 - llama.cpp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_52402f9ee848.png)\n\n\n3. 在提示框中输入一个问题，例如“如何喂养您的宠物狐狸？请用 6 个简单步骤回答这个问题”，您将收到相应的答案。\n\n    ![向 Yi 模型提问 - llama.cpp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_a4e0fafc523e.png)\n\n\u003C\u002Ful>\n\u003C\u002Fdetails>\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### 网页演示\n\n您可以为 Yi **聊天** 模型构建一个 Web UI 演示（请注意，在此场景下不支持 Yi 基础模型）。\n\n[步骤 1：准备环境](#step-1-prepare-your-environment)。\n\n[步骤 2：下载 Yi 模型](#step-2-download-the-yi-model)。\n\n步骤 3. 要在本地启动 Web 服务，请运行以下命令。\n\n```bash\npython demo\u002Fweb_demo.py -c \u003Cyour-model-path>\n```\n\n您可以通过在浏览器中输入控制台提供的地址来访问 Web UI。\n\n![快速入门 - 网页演示](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_29f31cde98e5.gif)\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### 微调\n\n```bash\nbash finetune\u002Fscripts\u002Frun_sft_Yi_6b.sh\n```\n\n完成后，您可以使用以下命令比较微调后的模型和基础模型：\n\n```bash\nbash finetune\u002Fscripts\u002Frun_eval.sh\n```\n\u003Cdetails style=\"display: inline;\">\u003Csummary>对于高级用法（例如基于自定义数据进行微调），请参阅下方的说明。⬇️ \u003C\u002Fsummary> \u003Cul>\n\n### Yi 6B 和 34B 的微调代码\n\n#### 准备工作\n\n##### 从镜像中获取\n\n默认情况下，我们使用来自 [BAAI\u002FCOIG](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBAAI\u002FCOIG) 的小型数据集来微调基础模型。\n您也可以按照以下 `jsonl` 格式准备自定义数据集：\n\n```json\n{ \"prompt\": \"Human: 你是谁？Assistant:\", \"chosen\": \"我是 Yi。\" }\n```\n\n然后将这些数据挂载到容器中，以替换默认的数据集：\n\n```bash\ndocker run -it \\\n    -v \u002Fpath\u002Fto\u002Fsave\u002Ffinetuned\u002Fmodel\u002F:\u002Ffinetuned-model \\\n    -v \u002Fpath\u002Fto\u002Ftrain.jsonl:\u002Fyi\u002Ffinetune\u002Fdata\u002Ftrain.json \\\n    -v \u002Fpath\u002Fto\u002Feval.jsonl:\u002Fyi\u002Ffinetune\u002Fdata\u002Feval.json \\\n    ghcr.io\u002F01-ai\u002Fyi:latest \\\n    bash 
finetune\u002Fscripts\u002Frun_sft_Yi_6b.sh\n```\n\n##### 从本地服务器中获取\n\n请确保您已安装 conda。如果没有，请执行以下操作：\n\n```bash\nmkdir -p ~\u002Fminiconda3\nwget https:\u002F\u002Frepo.anaconda.com\u002Fminiconda\u002FMiniconda3-latest-Linux-x86_64.sh -O ~\u002Fminiconda3\u002Fminiconda.sh\nbash ~\u002Fminiconda3\u002Fminiconda.sh -b -u -p ~\u002Fminiconda3\nrm -rf ~\u002Fminiconda3\u002Fminiconda.sh\n~\u002Fminiconda3\u002Fbin\u002Fconda init bash\nsource ~\u002F.bashrc\n```\n\n然后创建一个 conda 环境：\n\n```bash\nconda create -n dev_env python=3.10 -y\nconda activate dev_env\npip install torch==2.0.1 deepspeed==0.10 tensorboard transformers datasets sentencepiece accelerate ray==2.7\n```\n\n#### 硬件配置\n\n对于 Yi-6B 模型，建议使用配备 4 张 GPU 的节点，每张 GPU 的显存需大于 60GB。\n\n对于 Yi-34B 模型，由于 ZeRO-Offload（零冗余优化器卸载）技术会消耗大量 CPU 内存，请务必谨慎限制 34B 微调训练中使用的 GPU 数量。请使用 CUDA_VISIBLE_DEVICES 来限制 GPU 数量（如 scripts\u002Frun_sft_Yi_34b.sh 中所示）。\n\n典型的 34B 模型微调硬件配置为：配备 8 张 GPU 的节点（通过 CUDA_VISIBLE_DEVICES=0,1,2,3 限制为 4 张），每张 GPU 的显存需大于 80GB，且总 CPU 内存需大于 900GB。\n\n#### 快速开始\n\n将 LLM 基础模型下载到 MODEL_PATH（6B 和 34B）。典型的模型文件夹结构如下：\n\n```bash\n|-- $MODEL_PATH\n|   |-- config.json\n|   |-- pytorch_model-00001-of-00002.bin\n|   |-- pytorch_model-00002-of-00002.bin\n|   |-- pytorch_model.bin.index.json\n|   |-- tokenizer_config.json\n|   |-- tokenizer.model\n|   |-- ...\n```\n\n从 Hugging Face 下载数据集到本地存储 DATA_PATH，例如 Dahoas\u002Frm-static。\n\n```bash\n|-- $DATA_PATH\n|   |-- data\n|   |   |-- train-00000-of-00001-2a1df75c6bce91ab.parquet\n|   |   |-- test-00000-of-00001-8c7c51afc6d45980.parquet\n|   |-- dataset_infos.json\n|   |-- README.md\n```\n\n`finetune\u002Fyi_example_dataset` 包含示例数据集，这些数据集改编自 [BAAI\u002FCOIG](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBAAI\u002FCOIG)。\n\n```bash\n|-- $DATA_PATH\n    |--data\n        |-- train.jsonl\n        |-- eval.jsonl\n```\n\n进入 scripts 文件夹，复制并粘贴脚本，然后运行。例如：\n\n```bash\ncd finetune\u002Fscripts\n\nbash run_sft_Yi_6b.sh\n```\n\n对于 Yi-6B 基础模型，设置 training_debug_steps=20 和 num_train_epochs=4 
可以输出一个聊天模型，整个过程大约需要 20 分钟。\n\n对于 Yi-34B 基础模型，初始化过程相对较长，请耐心等待。\n\n#### 评估\n\n```bash\ncd finetune\u002Fscripts\n\nbash run_eval.sh\n```\n\n随后您将看到来自基础模型和微调后模型的回答。\n\u003C\u002Ful>\n\u003C\u002Fdetails>\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### 量化\n\n#### GPT-Q\n```bash\npython quantization\u002Fgptq\u002Fquant_autogptq.py \\\n  --model \u002Fbase_model                      \\\n  --output_dir \u002Fquantized_model            \\\n  --trust_remote_code\n```\n\n完成之后，您可以按照以下方式评估生成的模型：\n\n```bash\npython quantization\u002Fgptq\u002Feval_quantized_model.py \\\n  --model \u002Fquantized_model                       \\\n  --trust_remote_code\n```\n\n\u003Cdetails style=\"display: inline;\">\u003Csummary>详情请参阅下方说明。⬇️\u003C\u002Fsummary> \u003Cul>\n\n#### GPT-Q 量化\n\n[GPT-Q](https:\u002F\u002Fgithub.com\u002FIST-DASLab\u002Fgptq) 是一种 PTQ（训练后量化）方法。它可以在保持模型精度的同时节省内存并带来潜在的速度提升。\n\nYi 模型可以轻松地进行 GPT-Q 量化。我们将在下面提供一个分步教程。\n\n为了运行 GPT-Q，我们将使用 [AutoGPTQ](https:\u002F\u002Fgithub.com\u002FPanQiWei\u002FAutoGPTQ) 和 [exllama](https:\u002F\u002Fgithub.com\u002Fturboderp\u002Fexllama)。Hugging Face Transformers 已经集成了 optimum 和 auto-gptq，以对语言模型执行 GPT-Q 量化。\n\n##### 进行量化\n\n我们提供了 `quant_autogptq.py` 脚本，供您执行 GPT-Q 量化：\n\n```bash\npython quant_autogptq.py --model \u002Fbase_model \\\n    --output_dir \u002Fquantized_model --bits 4 --group_size 128 --trust_remote_code\n```\n\n##### 运行量化后的模型\n\n您可以使用 `eval_quantized_model.py` 来运行量化后的模型：\n\n```bash\npython eval_quantized_model.py --model \u002Fquantized_model --trust_remote_code\n```\n\u003C\u002Ful>\n\u003C\u002Fdetails>\n\n#### AWQ\n\n```bash\npython quantization\u002Fawq\u002Fquant_autoawq.py \\\n  --model \u002Fbase_model                      \\\n  --output_dir \u002Fquantized_model            \\\n  --trust_remote_code\n```\n\n完成之后，您可以按照以下方式评估生成的模型：\n\n```bash\npython quantization\u002Fawq\u002Feval_quantized_model.py \\\n  --model \u002Fquantized_model                       \\\n  
--trust_remote_code\n```\n\n\u003Cdetails style=\"display: inline;\">\u003Csummary>详情请参阅下方说明。⬇️\u003C\u002Fsummary> \u003Cul>\n\n#### AWQ 量化\n\n[AWQ](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fllm-awq) 是一种 PTQ（训练后量化）方法。它是一种高效且准确的低比特权重量化（INT3\u002F4），适用于大型语言模型。\n\nYi 模型可以轻松地进行 AWQ 量化。我们将在下面提供一个分步教程。\n\n为了运行 AWQ，我们将使用 [AutoAWQ](https:\u002F\u002Fgithub.com\u002Fcasper-hansen\u002FAutoAWQ)。\n\n##### 进行量化\n\n我们提供了 `quant_autoawq.py` 脚本，供您执行 AWQ 量化：\n\n```bash\npython quant_autoawq.py --model \u002Fbase_model \\\n    --output_dir \u002Fquantized_model --bits 4 --group_size 128 --trust_remote_code\n```\n\n##### 运行量化后的模型\n\n您可以使用 `eval_quantized_model.py` 来运行量化后的模型：\n\n```bash\npython eval_quantized_model.py --model \u002Fquantized_model --trust_remote_code\n```\n\n\n\u003C\u002Ful>\n\u003C\u002Fdetails>\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### 部署\n\n如果您想部署 Yi 模型，请确保满足软件和硬件要求。\n\n#### 软件要求\n\n在使用 Yi 量化模型之前，请确保已安装以下正确的软件。\n\n| 模型 | 软件 |\n|---|---|\n| Yi 4-bit 量化模型 | [AWQ 和 CUDA](https:\u002F\u002Fgithub.com\u002Fcasper-hansen\u002FAutoAWQ?tab=readme-ov-file#install-from-pypi) |\n| Yi 8-bit 量化模型 | [GPTQ 和 CUDA](https:\u002F\u002Fgithub.com\u002FPanQiWei\u002FAutoGPTQ?tab=readme-ov-file#quick-installation) |\n\n#### 硬件要求\n\n在将 Yi 部署到您的环境中之前，请确保您的硬件满足以下要求。\n\n##### 对话模型\n\n| 模型                | 最小显存 | 推荐 GPU 示例       |\n|:----------------------|:--------------|:-------------------------------------:|\n| Yi-6B-Chat           | 15 GB         | 1 x RTX 3090 (24 GB) \u003Cbr> 1 x RTX 4090 (24 GB) \u003Cbr>  1 x A10 (24 GB)  \u003Cbr> 1 x A30 (24 GB)              |\n| Yi-6B-Chat-4bits     | 4 GB          | 1 x RTX 3060 (12 GB)\u003Cbr> 1 x RTX 4060 (8 GB)                   |\n| Yi-6B-Chat-8bits     | 8 GB          | 1 x RTX 3070 (8 GB) \u003Cbr> 1 x RTX 4060 (8 GB)                   |\n| Yi-34B-Chat          | 72 GB         | 4 x RTX 4090 (24 GB)\u003Cbr> 1 x A800 (80GB)               |\n| Yi-34B-Chat-4bits    | 
20 GB         | 1 x RTX 3090 (24 GB) \u003Cbr> 1 x RTX 4090 (24 GB) \u003Cbr> 1 x A10 (24 GB)  \u003Cbr> 1 x A30 (24 GB)  \u003Cbr> 1 x A100 (40 GB) |\n| Yi-34B-Chat-8bits    | 38 GB         | 2 x RTX 3090 (24 GB) \u003Cbr> 2 x RTX 4090 (24 GB)\u003Cbr> 1 x A800  (40 GB) |\n\n以下是不同批量情况下的详细最小显存要求。\n\n|  模型                  | batch=1 | batch=4 | batch=16 | batch=32 |\n| ----------------------- | ------- | ------- | -------- | -------- |\n| Yi-6B-Chat              | 12 GB   | 13 GB   | 15 GB    | 18 GB    |\n| Yi-6B-Chat-4bits  | 4 GB    | 5 GB    | 7 GB     | 10 GB    |\n| Yi-6B-Chat-8bits  | 7 GB    | 8 GB    | 10 GB    | 14 GB    |\n| Yi-34B-Chat       | 65 GB   | 68 GB   | 76 GB    | > 80 GB   |\n| Yi-34B-Chat-4bits | 19 GB   | 20 GB   | 30 GB    | 40 GB    |\n| Yi-34B-Chat-8bits | 35 GB   | 37 GB   | 46 GB    | 58 GB    |\n\n##### 基础模型\n\n| 模型                | 最小显存 | 推荐 GPU 示例       |\n|----------------------|--------------|:-------------------------------------:|\n| Yi-6B                | 15 GB         | 1 x RTX 3090 (24 GB) \u003Cbr> 1 x RTX 4090 (24 GB) \u003Cbr> 1 x A10 (24 GB)  \u003Cbr> 1 x A30 (24 GB)                |\n| Yi-6B-200K           | 50 GB         | 1 x A800 (80 GB)                            |\n| Yi-9B                | 20 GB         | 1 x RTX 4090 (24 GB)                           |\n| Yi-34B               | 72 GB         | 4 x RTX 4090 (24 GB) \u003Cbr> 1 x A800 (80 GB)               |\n| Yi-34B-200K          | 200 GB        | 4 x A800 (80 GB)                        |\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### 常见问题解答\n\u003Cdetails>\n\u003Csummary> 如果您在使用 Yi 系列模型时有任何疑问，以下提供的答案可以作为您的参考。⬇️\u003C\u002Fsummary> \n\u003Cbr> \n\n#### 💡微调\n- \u003Cstrong>基础模型还是对话模型——该选择哪一个进行微调？\u003C\u002Fstrong>\n  \u003Cbr>选择哪种预训练语言模型进行微调，主要取决于您可用的计算资源以及任务的具体需求。\n    - 如果您拥有大量的微调数据（比如超过1万条样本），那么基础模型可能是您的首选。\n    - 相反，如果您拥有的微调数据量较少，那么对话模型可能更适合您。\n    - 通常建议同时微调基础模型和对话模型，比较它们的表现，然后根据具体需求选择最合适的模型。\n- 
\u003Cstrong>Yi-34B 和 Yi-34B-Chat 进行全参数微调有什么区别？\u003C\u002Fstrong>\n  \u003Cbr>\n  在 `Yi-34B` 和 `Yi-34B-Chat` 上进行全参数微调的主要区别在于微调方法及其效果。\n    - Yi-34B-Chat 采用监督微调（SFT）方法，生成的回复更接近人类对话风格。\n    - 基础模型的微调则更为通用，且具有较高的性能潜力。\n    - 如果您对自己的数据质量有信心，可以选择使用 `Yi-34B` 进行微调。\n    - 如果您希望模型生成的回复更贴近人类对话风格，或者对数据质量有所顾虑，那么 `Yi-34B-Chat` 可能是更好的选择。\n\n#### 💡量化\n- \u003Cstrong>量化模型与原始模型相比，性能差距有多大？\u003C\u002Fstrong>\n    - 性能差异主要取决于所使用的量化方法以及这些模型的具体应用场景。例如，对于 AWQ 官方提供的模型而言，在基准测试中，量化可能会导致性能小幅下降几个百分点。\n    - 从主观感受来看，在逻辑推理等场景下，即使只有 1% 的性能变化，也可能影响输出结果的准确性。\n    \n#### 💡综合\n- \u003Cstrong>我在哪里可以找到用于问答任务的微调数据集？\u003C\u002Fstrong>\n    - 您可以在 Hugging Face 等平台上找到问答任务的微调数据集，例如 [m-a-p\u002FCOIG-CQIA](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fm-a-p\u002FCOIG-CQIA) 数据集就非常容易获取。\n    - 此外，GitHub 上也提供了微调框架，比如 [hiyouga\u002FLLaMA-Factory](https:\u002F\u002Fgithub.com\u002Fhiyouga\u002FLLaMA-Factory)，其中整合了现成的数据集。\n\n- \u003Cstrong>微调 Yi-34B FP16 需要多少显存？\u003C\u002Fstrong>\n  \u003Cbr>\n  微调 34B FP16 所需的显存大小取决于具体的微调方法。如果是全参数微调，则需要 8 张每张 80 GB 显存的 GPU；而像 LoRA 这样的经济型方案则所需显存较少。更多细节请参阅 [hiyouga\u002FLLaMA-Factory](https:\u002F\u002Fgithub.com\u002Fhiyouga\u002FLLaMA-Factory)。此外，为了优化性能，也可以考虑使用 BF16 而不是 FP16 进行微调。\n\n- \u003Cstrong>是否有第三方平台支持 Yi-34b-200k 模型的聊天功能？\u003C\u002Fstrong>\n  \u003Cbr>\n  如果您正在寻找第三方聊天平台，可以选择 [fireworks.ai](https:\u002F\u002Ffireworks.ai\u002Flogin?callbackURL=https:\u002F\u002Ffireworks.ai\u002Fmodels\u002Ffireworks\u002Fyi-34b-chat)。\n\n\u003C\u002Fdetails>\n\n### 学习中心\n\n\u003Cdetails>\n\u003Csummary> 如果您想学习 Yi，这里有许多有用的教育资源可供您参考。⬇️\u003C\u002Fsummary> \n\u003Cbr> \n\n欢迎来到 Yi 学习中心！\n\n无论您是经验丰富的开发者还是初学者，都可以在这里找到丰富的学习资源，帮助您更好地理解并掌握 Yi 模型的相关知识，包括深度博客文章、全面的视频教程、实践指南等等。\n\n这里的内容由众多精通 Yi 的专家和热情爱好者无私分享而来。我们衷心感谢大家的宝贵贡献！\n\n同时，我们也诚挚邀请您加入我们的共建行列，为 Yi 贡献一份力量。如果您已经为 Yi 做过贡献，请不要犹豫，在下方表格中展示您的优秀成果吧。\n\n有了这些资源的帮助，相信您已经准备好开启一段精彩的 Yi 学习之旅了！祝您学习愉快！🥳\n\n#### 教程\n\n##### 博客教程\n\n| 可交付成果                                                  | 日期       | 作者                                                       
|\n| ------------------------------------------------------------ | ---------- | ------------------------------------------------------------ |\n| [使用 Dify、Meilisearch、零一万物模型实现最简单的 RAG   应用（三）：AI 电影推荐](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FRi2ap9_5EMzdfiBhSSL_MQ) | 2024-05-20 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n| [使用autodl服务器，在A40显卡上运行，   Yi-34B-Chat-int4模型，并使用vllm优化加速，显存占用42G，速度18 words-s](https:\u002F\u002Fblog.csdn.net\u002Ffreewebsys\u002Farticle\u002Fdetails\u002F134698597?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-17-134698597-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-05-20 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [Yi-VL   最佳实践](https:\u002F\u002Fmodelscope.cn\u002Fdocs\u002Fyi-vl最佳实践) | 2024-05-20 | [ModelScope](https:\u002F\u002Fgithub.com\u002Fmodelscope)                  |\n| [一键运行零一万物新鲜出炉Yi-1.5-9B-Chat大模型](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FntMs2G_XdWeM3I6RUOBJrA) | 2024-05-13 | [Second State](https:\u002F\u002Fgithub.com\u002Fsecond-state)              |\n| [零一万物开源Yi-1.5系列大模型](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002Fd-ogq4hcFbsuL348ExJxpA) | 2024-05-13 | [刘聪](https:\u002F\u002Fgithub.com\u002Fliucongg)                          |\n| [零一万物Yi-1.5系列模型发布并开源！ 34B-9B-6B   多尺寸，魔搭社区推理微调最佳实践教程来啦！](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002F3wD-0dCgXB646r720o8JAg) | 2024-05-13 | [ModelScope](https:\u002F\u002Fgithub.com\u002Fmodelscope)                  |\n| [Yi-34B   
本地部署简单测试](https:\u002F\u002Fblog.csdn.net\u002Farkohut\u002Farticle\u002Fdetails\u002F135331469?ops_request_misc=%7B%22request%5Fid%22%3A%22171636390616800185813639%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636390616800185813639&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-10-135331469-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-05-13 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [驾辰龙跨Llama持Wasm，玩转Yi模型迎新春过大年（上）](https:\u002F\u002Fblog.csdn.net\u002Fweixin_53443275\u002Farticle\u002Fdetails\u002F136091398?ops_request_misc=%7B%22request%5Fid%22%3A%22171636390616800185813639%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636390616800185813639&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-5-136091398-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-05-13 | [Words  worth](https:\u002F\u002Fblog.csdn.net\u002Fweixin_53443275?type=blog) |\n| [驾辰龙跨Llama持Wasm，玩转Yi模型迎新春过大年（下篇）](https:\u002F\u002Fblog.csdn.net\u002Fweixin_53443275\u002Farticle\u002Fdetails\u002F136096309) | 2024-05-13 | [Words  worth](https:\u002F\u002Fblog.csdn.net\u002Fweixin_53443275?type=blog) |\n| [Ollama新增两个命令，开始支持零一万物Yi-1.5系列模型](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FbBgzGJvUqIohodcy9U-pFw) | 2024-05-13 | AI工程师笔记                                                 |\n| [使用零一万物 200K 模型和 Dify 快速搭建模型应用](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F686774859) | 2024-05-13 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n| [(持更) 零一万物模型折腾笔记：社区 Yi-34B 微调模型使用](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F671549900) | 2024-05-13 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n| [Python+ERNIE-4.0-8K-Yi-34B-Chat大模型初探](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FWaygSfn5T8ZPB1mPdGADEQ) | 2024-05-11 | 江湖评谈       
                                              |\n| [技术布道   Vue及Python调用零一万物模型和Prompt模板（通过百度千帆大模型平台）](https:\u002F\u002Fblog.csdn.net\u002Fucloud2012\u002Farticle\u002Fdetails\u002F137187469) | 2024-05-11 | [MumuLab](https:\u002F\u002Fblog.csdn.net\u002Fucloud2012?type=blog)        |\n| [多模态大模型Yi-VL-plus体验 效果很棒](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F694736111) | 2024-04-27 | [大家好我是爱因](https:\u002F\u002Fwww.zhihu.com\u002Fpeople\u002Fiamein)        |\n| [使用autodl服务器，两个3090显卡上运行，   Yi-34B-Chat-int4模型，并使用vllm优化加速，显存占用42G，速度23 words-s](https:\u002F\u002Fblog.csdn.net\u002Ffreewebsys\u002Farticle\u002Fdetails\u002F134725765?ops_request_misc=%7B%22request%5Fid%22%3A%22171636356716800211598950%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636356716800211598950&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-9-134725765-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-04-27 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [Getting Started with Yi-1.5-9B-Chat](https:\u002F\u002Fwww.secondstate.io\u002Farticles\u002Fyi-1.5-9b-chat\u002F) | 2024-04-27 | [Second State](https:\u002F\u002Fgithub.com\u002Fsecond-state)              |\n| [基于零一万物yi-vl-plus大模型简单几步就能批量生成Anki图片笔记](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002F_ea6g0pzzeO4WyYtuWycWQ) | 2024-04-24 | [正经人王同学](https:\u002F\u002Fgithub.com\u002Fzjrwtx)                    |\n| [【AI开发：语言】一、Yi-34B超大模型本地部署CPU和GPU版](https:\u002F\u002Fblog.csdn.net\u002Falarey\u002Farticle\u002Fdetails\u002F137769471?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-16-137769471-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-04-21 | 
[My的梦想已实现](https:\u002F\u002Fblog.csdn.net\u002Falarey?type=blog)     |\n| [【Yi-34B-Chat-Int4】使用4个2080Ti显卡11G版本，运行Yi-34B模型，5年前老显卡是支持的，可以正常运行，速度   21 words-s，vllm要求算力在7以上的显卡就可以](https:\u002F\u002Fblog.csdn.net\u002Ffreewebsys\u002Farticle\u002Fdetails\u002F134754086) | 2024-03-22 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [零一万物大模型部署+微调总结](https:\u002F\u002Fblog.csdn.net\u002Fv_wus\u002Farticle\u002Fdetails\u002F135704126?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-18-135704126-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-03-22 | [v_wus](https:\u002F\u002Fblog.csdn.net\u002Fv_wus?type=blog)               |\n| [零一万物Yi大模型vllm推理时Yi-34B或Yi-6bchat重复输出的解决方案](https:\u002F\u002Fblog.csdn.net\u002Fqq_39667443\u002Farticle\u002Fdetails\u002F136028776?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-6-136028776-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-03-02 | [郝铠锋](https:\u002F\u002Fblog.csdn.net\u002Fqq_39667443?type=blog)        |\n| [Yi-34B微调训练](https:\u002F\u002Fblog.csdn.net\u002Flsjlnd\u002Farticle\u002Fdetails\u002F135336984?ops_request_misc=%7B%22request%5Fid%22%3A%22171636343416800188513953%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636343416800188513953&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-12-135336984-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-03-02 | [lsjlnd](https:\u002F\u002Fblog.csdn.net\u002Flsjlnd?type=blog)             |\n| 
[实测零一万物Yi-VL多模态语言模型：能准确“识图吃瓜”](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002Ffu4O9XvJ03JhimsEyI-SsQ) | 2024-02-02 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n| [零一万物开源Yi-VL多模态大模型，魔搭社区推理&微调最佳实践来啦！](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F680098411) | 2024-01-26 | [ModelScope](https:\u002F\u002Fgithub.com\u002Fmodelscope)                  |\n| [单卡 3 小时训练 Yi-6B 大模型 Agent：基于 Llama   Factory 实战](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F678989191) | 2024-01-22 | [郑耀威](https:\u002F\u002Fgithub.com\u002Fhiyouga)                         |\n| [零一科技Yi-34B   Chat大模型环境搭建&推理](https:\u002F\u002Fblog.csdn.net\u002Fzzq1989_\u002Farticle\u002Fdetails\u002F135597181?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-8-135597181-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-01-15 | [要养家的程序员](https:\u002F\u002Fblog.csdn.net\u002Fzzq1989_?type=blog)   |\n| [基于LLaMA   Factory，单卡3小时训练专属大模型 Agent](https:\u002F\u002Fblog.csdn.net\u002Fm0_59596990\u002Farticle\u002Fdetails\u002F135760285?ops_request_misc=%7B%22request%5Fid%22%3A%22171636343416800188513953%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636343416800188513953&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-10-135760285-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-01-15 | [机器学习社区](https:\u002F\u002Fblog.csdn.net\u002Fm0_59596990?type=blog)  |\n| [双卡   3080ti 部署 Yi-34B 大模型 - Gradio + vLLM 
踩坑全记录](https:\u002F\u002Fblog.csdn.net\u002Farkohut\u002Farticle\u002Fdetails\u002F135321242?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-10-135321242-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-01-02 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [【大模型部署实践-3】3个能在3090上跑起来的4bits量化Chat模型（baichuan2-13b、InternLM-20b、Yi-34b）](https:\u002F\u002Fblog.csdn.net\u002Fqq_40302568\u002Farticle\u002Fdetails\u002F135040985?ops_request_misc=%7B%22request%5Fid%22%3A%22171636168816800227489911%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636168816800227489911&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-30-135040985-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2024-01-02 | [aq_Seabiscuit](https:\u002F\u002Fblog.csdn.net\u002Fqq_40302568?type=blog) |\n| [只需 24G   显存，用 vllm 跑起来 Yi-34B 中英双语大模型](https:\u002F\u002Fblog.csdn.net\u002Farkohut\u002Farticle\u002Fdetails\u002F135274973) | 2023-12-28 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [零一万物模型官方   Yi-34B 模型本地离线运行部署使用笔记（物理机和docker两种部署方式），200K 超长文本内容，34B 干翻一众 70B   模型，打榜分数那么高，这模型到底行不行？](https:\u002F\u002Fblog.csdn.net\u002Fu014374009\u002Farticle\u002Fdetails\u002F136327696) | 2023-12-28 | [代码讲故事](https:\u002F\u002Fblog.csdn.net\u002Fu014374009?type=blog)     |\n| [LLM -   大模型速递之 Yi-34B 入门与 LoRA 微调](https:\u002F\u002Fblog.csdn.net\u002FBIT_666\u002Farticle\u002Fdetails\u002F134990402) | 2023-12-18 | [BIT_666](https:\u002F\u002Fbitddd.blog.csdn.net\u002F?type=blog)           |\n| 
[通过vllm框架进行大模型推理](https:\u002F\u002Fblog.csdn.net\u002Fweixin_45920955\u002Farticle\u002Fdetails\u002F135300561?ops_request_misc=%7B%22request%5Fid%22%3A%22171636343416800188513953%22%2C%22scm%22%3A%2220140713.130102334.pc%5Fblog.%22%7D&request_id=171636343416800188513953&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-13-135300561-null-null.nonecase&utm_term=Yi大模型&spm=1018.2226.3001.4450) | 2023-12-18 | [土山炮](https:\u002F\u002Fblog.csdn.net\u002Fweixin_45920955?type=blog)    |\n| [CPU 混合推理，非常见大模型量化方案：“二三五六” 位量化方案](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F671698216) | 2023-12-12 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n| [零一万物模型折腾笔记：官方 Yi-34B 模型基础使用](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F671387298) | 2023-12-10 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n| [Running Yi-34B-Chat locally using LlamaEdge](https:\u002F\u002Fwww.secondstate.io\u002Farticles\u002Fyi-34b\u002F) | 2023-11-30 | [Second State](https:\u002F\u002Fgithub.com\u002Fsecond-state)              |\n| [本地运行零一万物 34B 大模型，使用 Llama.cpp &   21G 显存](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F668921042) | 2023-11-26 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)                         |\n\n##### GitHub 项目\n\n| 可交付成果                                                  | 日期       | 作者                                      |\n| ------------------------------------------------------------ | ---------- | ------------------------------------------- |\n| [yi-openai-proxy](https:\u002F\u002Fgithub.com\u002Fsoulteary\u002Fyi-openai-proxy) | 2024-05-11 | [苏洋](https:\u002F\u002Fgithub.com\u002Fsoulteary)        |\n| [基于零一万物 Yi 模型和 B 站构建大语言模型高质量训练数据集](https:\u002F\u002Fgithub.com\u002Fzjrwtx\u002FbilibiliQA_databuilder) | 2024-04-29 | [正经人王同学](https:\u002F\u002Fgithub.com\u002Fzjrwtx)   |\n| 
[基于视频网站和零一万物大模型构建大语言模型高质量训练数据集](https:\u002F\u002Fgithub.com\u002Fzjrwtx\u002FVideoQA_databuilder) | 2024-04-25 | [正经人王同学](https:\u002F\u002Fgithub.com\u002Fzjrwtx)   |\n| [基于零一万物yi-34b-chat-200k输入任意文章地址，点击按钮即可生成无广告或推广内容的简要笔记，并生成分享图给好友](https:\u002F\u002Fgithub.com\u002Fzjrwtx\u002Fopen_summary) | 2024-04-24 | [正经人王同学](https:\u002F\u002Fgithub.com\u002Fzjrwtx)   |\n| [Food-GPT-Yi-model](https:\u002F\u002Fgithub.com\u002FThisisHubert\u002FFoodGPT-Yi-model) | 2024-04-21 | [Hubert S](https:\u002F\u002Fgithub.com\u002FThisisHubert) |\n\n##### 视频教程\n\n| 交付成果                                                  | 日期       | 作者                                                       |\n| ------------------------------------------------------------ | ---------- | ------------------------------------------------------------ |\n| [在物联网设备上运行 dolphin-2.2-yi-34b](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=NJ89T5mO25Y) | 2023-11-30 | [Second State](https:\u002F\u002Fgithub.com\u002Fsecond-state)              |\n| [仅需 24G 显存，用 vllm 跑起来 Yi-34B 中英双语大模型](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV17t4y1f7Ee\u002F) | 2023-12-28 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [本地安装 Yi 34B - 中英双语大语言模型](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=CVQvj4Wrh4w&t=476s) | 2023-11-05 | [Fahd Mirza](https:\u002F\u002Fwww.youtube.com\u002F@fahdmirza)             |\n| [Dolphin Yi 34b - 全新基础模型测试](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=On3Zuv27V3k&t=85s) | 2023-11-27 | [Matthew Berman](https:\u002F\u002Fwww.youtube.com\u002F@matthew_berman)    |\n| [Yi-VL-34B 多模态大模型 - 用两张 A40 显卡跑起来](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Q5411y7AG\u002F) | 2024-01-28 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [4060Ti 16G显卡安装零一万物最新开源的Yi-1.5版大语言模型](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV16i421X7Jx\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-05-14 
| [titan909](https:\u002F\u002Fspace.bilibili.com\u002F526393761)             |\n| [Yi-1.5: 真正的 Apache 2.0 许可竞争者，媲美 LLAMA-3](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=KCDYrfWeTRc) | 2024-05-13 | [Prompt Engineering](https:\u002F\u002Fwww.youtube.com\u002F@engineerprompt) |\n| [本地安装 Yi-1.5 模型 - 在多项基准测试中超越 Llama 3](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Ba-G7Il0UkA) | 2024-05-13 | [Fahd Mirza](https:\u002F\u002Fwww.youtube.com\u002F@fahdmirza)             |\n| [如何安装 Ollama 并运行 Yi 6B](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=4Jnar7OUHqQ) | 2024-05-13 | [Ridaa Davids](https:\u002F\u002Fwww.youtube.com\u002F@quantanovabusiness)  |\n| [地表最强混合智能AI助手：llama3_70B+Yi_34B+Qwen1.5_110B](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Xm411C7V1\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-05-04 | [朱扎特](https:\u002F\u002Fspace.bilibili.com\u002F494512200?spm_id_from=333.788.0.0) |\n| [ChatDoc学术论文辅助--基于Yi-34B和langchain进行PDF知识库问答](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV11i421C7B5\u002F?spm_id_from=333.999.0.0&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-05-03 | [朱扎特](https:\u002F\u002Fspace.bilibili.com\u002F494512200?spm_id_from=333.788.0.0) |\n| [基于Yi-34B的领域知识问答项目演示](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1zZ42177ZA\u002F?spm_id_from=333.999.0.0&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-05-02 | [朱扎特](https:\u002F\u002Fspace.bilibili.com\u002F494512200?spm_id_from=333.788.0.0) |\n| [使用RTX4090+GaLore算法 全参微调Yi-6B大模型](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1ax4y1U7Ep\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-03-24 | [小工蚂创始人](https:\u002F\u002Fspace.bilibili.com\u002F478674499?spm_id_from=333.788.0.0) |\n| [无内容审查NSFW大语言模型Yi-34B-Chat蒸馏版测试,RolePlay,《天龙八部》马夫人康敏,本地GPU,CPU运行](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=VL-W0TnLCns) | 2024-03-20 | [刘悦的技术博客](https:\u002F\u002Fv3u.cn\u002F)           
                 |\n| [无内容审查NSFW大语言模型整合包,Yi-34B-Chat,本地CPU运行,角色扮演潘金莲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=rBvbgwz3oHM) | 2024-03-16 | [刘悦的技术博客](https:\u002F\u002Fv3u.cn\u002F)                            |\n| [量化 Yi-34B-Chat 并在单卡 RTX 4090 使用 vLLM 部署](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1jx421y7xj\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-03-05 | [白鸽巢](https:\u002F\u002Fspace.bilibili.com\u002F138938660?spm_id_from=333.788.0.0) |\n| [Yi-VL-34B（5）：使用3个3090显卡24G版本，运行Yi-VL-34B模型，支持命令行和web界面方式，理解图片的内容转换成文字](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1BB421z7oA\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-02-27 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [Win环境KoboldCpp本地部署大语言模型进行各种角色扮演游戏](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV14J4m1e77f\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-02-25 | [魚蟲蟲](https:\u002F\u002Fspace.bilibili.com\u002F431981179?spm_id_from=333.788.0.0) |\n| [无需显卡本地部署Yi-34B-Chat进行角色扮演游戏 P2](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV19v421677y\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-02-23 | [魚蟲蟲](https:\u002F\u002Fspace.bilibili.com\u002F431981179?spm_id_from=333.788.0.0) |\n| [【wails】（2）：使用go-llama.cpp 运行 yi-01-6b大模型，使用本地CPU运行，速度还可以，等待下一版本更新](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV194421F7Fy\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-02-20 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [【xinference】（6）：在autodl上，使用xinference部署yi-vl-chat和qwen-vl-chat模型，可以使用openai调用成功](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV19Z421z7cv\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 
2024-02-06 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [无需显卡本地部署Yi-34B-Chat进行角色扮演游戏 P1](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1tU421o7Co\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-02-05 | [魚蟲蟲](https:\u002F\u002Fspace.bilibili.com\u002F431981179?spm_id_from=333.788.0.0) |\n| [2080Ti部署YI-34B大模型 xinference-oneapi-fastGPT本地知识库使用指南](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1hC411z7xu\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-30 | [小饭护法要转码](https:\u002F\u002Fspace.bilibili.com\u002F39486865?spm_id_from=333.788.0.0) |\n| [最佳故事写作AI模型 - 在Windows上本地安装 Yi 6B 200K](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=cZs2jRtl0bs) | 2024-01-22 | [Fahd Mirza](https:\u002F\u002Fwww.youtube.com\u002F@fahdmirza)             |\n| [Mac 本地运行大语言模型方法与常见问题指南（Yi 34B 模型+32 GB 内存测试）](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1VT4y1b7Th\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-21 | [小吴苹果机器人](https:\u002F\u002Fspace.bilibili.com\u002F1732749682?spm_id_from=333.788.0.0) |\n| [【Dify知识库】（11）：Dify0.4.9改造支持MySQL，成功接入yi-6b 做对话，本地使用fastchat启动，占8G显存，完成知识库配置](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1ia4y1y7JH\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-21 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [这位LLM先生有点暴躁,用的是YI-6B的某个量化版,#LLM #大语言模型 #暴躁老哥](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=eahXJrdtQuc) | 2024-01-20 | [晓漫吧](https:\u002F\u002Fwww.youtube.com\u002F@xiaomanba)                 |\n| [大模型推理 NvLink 桥接器有用吗｜双卡 A6000 测试一下](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1AW4y1w7DC\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-17 | 
[漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [大模型推理 A40 vs A6000 谁更强 - 对比 Yi-34B 的单、双卡推理性能](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1aK4y1z7GF\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-15 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [C-Eval 大语言模型评测基准- 用 LM Evaluation Harness + vLLM 跑起来](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Yw411g7ZL\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-11 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [双显卡部署 Yi-34B 大模型 - vLLM + Gradio 踩坑记录](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1p94y1c7ak\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2024-01-01 | [漆妮妮](https:\u002F\u002Fspace.bilibili.com\u002F1262370256)              |\n| [手把手教学！使用 vLLM 快速部署 Yi-34B-Chat](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1ew41157Mk\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-12-26 | [白鸽巢](https:\u002F\u002Fspace.bilibili.com\u002F138938660?spm_id_from=333.788.0.0) |\n| [如何训练企业自己的大语言模型？Yi-6B LORA微调演示 #小工蚁](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1uc41117zz\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-12-21 | [小工蚂创始人](https:\u002F\u002Fspace.bilibili.com\u002F478674499?spm_id_from=333.788.0.0) |\n| [Yi-34B（4）：使用4个2080Ti显卡11G版本，运行Yi-34B模型，5年前老显卡是支持的，可以正常运行，速度 21 words\u002Fs](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1nj41157L3\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-12-02 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [使用autodl服务器，RTX 3090 * 3 显卡上运行， 
Yi-34B-Chat模型，显存占用60G](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1BM411R7ae\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-12-01 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [使用autodl服务器，两个3090显卡上运行， Yi-34B-Chat-int4模型，用vllm优化，增加 --num-gpu 2，速度23 words\u002Fs](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Hu4y1L7BH\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-12-01 | [fly-iot](https:\u002F\u002Fgitee.com\u002Ffly-iot)                         |\n| [Yi大模型一键本地部署 技术小白玩转AI](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV16H4y117md\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-12-01 | [技术小白玩转AI](https:\u002F\u002Fspace.bilibili.com\u002F3546586137234288?spm_id_from=333.788.0.0) |\n| [01.AI's Yi-6B: 概述和微调](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=mye-UOkAliQ) | 2023-11-28 | [AI Makerspace](https:\u002F\u002Fwww.youtube.com\u002F@AI-Makerspace)      |\n| [Yi 34B Chat LLM 击败 Llama 70B](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=RYtrF-R5jDc) | 2023-11-27 | [DLExplorer](https:\u002F\u002Fwww.youtube.com\u002F@DLExplorers-lg7dt)     |\n| [如何在mac上运行开源模型 Yi 34b on m3 Max](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=GAo-dopkgjI) | 2023-11-26 | [TECHNO PREMIUM](https:\u002F\u002Fwww.youtube.com\u002F@technopremium91)   |\n| [Yi-34B - 200K - 最佳 & 新的上下文窗口之王 ](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=7WBojwwv5Qo) | 2023-11-24 | [Prompt Engineering](https:\u002F\u002Fwww.youtube.com\u002F@engineerprompt) |\n| [Yi 34B : 强大的中等规模模型崛起 - Base,200k & Chat](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=bWCjwtu_tHs) | 2023-11-24 | [Sam Witteveen](https:\u002F\u002Fwww.youtube.com\u002F@samwitteveenai)     |\n| [在IoT设备运行破解版李开复大模型dolphin-2.2-yi-34b（还可作为私有OpenAI 
API服务器）](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1SQ4y18744\u002F?spm_id_from=333.337.search-card.all.click&vd_source=ab85f93e294a2f6be11db57c29c6d706) | 2023-11-15 | [Second State](https:\u002F\u002Fgithub.com\u002Fsecond-state)              |\n| [Run dolphin-2.2-yi-34b on IoT Devices (Also works as a Private OpenAI API Server)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=NJ89T5mO25Y) | 2023-11-14 | [Second State](https:\u002F\u002Fgithub.com\u002Fsecond-state)              |\n| [如何在Windows笔记本电脑上安装 Yi 34B 200K Llamafied](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=enoha4K4HkQ) | 2023-11-11 | [Fahd Mirza](https:\u002F\u002Fwww.youtube.com\u002F@fahdmirza)             |\n\n\u003C\u002Fdetails>\n\n\n\n\n# 为什么选择 Yi？\n\n  - [生态系统](#ecosystem)\n    - [上游](#upstream)\n    - [下游](#downstream)\n      - [推理服务](#serving)\n      - [量化](#quantization-1)\n      - [微调](#fine-tuning-1)\n      - [API](#api)\n  - [基准测试](#benchmarks)\n    - [聊天模型性能](#chat-model-performance)\n    - [基础模型性能](#base-model-performance)\n      - [Yi-34B 和 Yi-34B-200K](#yi-34b-and-yi-34b-200k)\n      - [Yi-9B](#yi-9b)\n\n## 生态系统\n\nYi 拥有全面的生态系统，提供一系列工具、服务和模型，以丰富您的使用体验并最大化生产力。\n\n- [上游](#upstream)\n- [下游](#downstream)\n  - [推理服务](#serving)\n  - [量化](#quantization-1)\n  - [微调](#fine-tuning-1)\n  - [API](#api)\n\n### 上游\n\nYi 系列模型沿用了与 Llama 相同的模型架构。选择 Yi，您可以充分利用 Llama 生态系统中现有的工具、库和资源，无需重新开发新工具，从而提升开发效率。\n\n例如，Yi 系列模型以 Llama 模型的格式保存。您可以直接使用 `LlamaForCausalLM` 和 `LlamaTokenizer` 加载模型。更多信息请参阅 [使用聊天模型](#31-use-the-chat-model)。\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(\"01-ai\u002FYi-34b\", use_fast=False)\n\nmodel = AutoModelForCausalLM.from_pretrained(\"01-ai\u002FYi-34b\", device_map=\"auto\")\n```\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### 下游\n\n> 💡 小贴士\n> \n> - 欢迎创建 PR，分享您使用 Yi 系列模型所构建的优秀成果。\n>\n> - 为了帮助他人快速理解您的工作，建议采用 `\u003C模型名称>: \u003C模型简介> + 
\u003C模型亮点>` 的格式。\n\n#### 推理服务\n\n如果您希望在几分钟内快速上手 Yi，可以使用以下基于 Yi 构建的服务。\n\n- Yi-34B-Chat：您可以通过以下平台与 Yi 对话：\n  - [Yi-34B-Chat | Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002F01-ai\u002FYi-34B-Chat)\n  - [Yi-34B-Chat | Yi 平台](https:\u002F\u002Fplatform.lingyiwanwu.com\u002F)：**注意** 目前仅限白名单用户使用。欢迎申请（填写 [英文](https:\u002F\u002Fcn.mikecrm.com\u002Fl91ODJf) 或 [中文](https:\u002F\u002Fcn.mikecrm.com\u002FgnEZjiQ) 表单）并亲身体验！\n\n- [Yi-6B-Chat (Replicate)](https:\u002F\u002Freplicate.com\u002F01-ai)：您可以通过设置额外参数并调用 API 来使用该模型，获得更多选项。\n\n- [ScaleLLM](https:\u002F\u002Fgithub.com\u002Fvectorch-ai\u002FScaleLLM#supported-models)：您可使用此服务在本地运行 Yi 模型，同时享受更高的灵活性和自定义能力。\n\n#### 量化\n\n如果您计算资源有限，可以使用 Yi 的量化模型，如下所示。\n\n这些量化模型虽然精度有所降低，但效率更高，例如推理速度更快、内存占用更小。\n\n- [TheBloke\u002FYi-34B-GPTQ](https:\u002F\u002Fhuggingface.co\u002FTheBloke\u002FYi-34B-GPTQ)\n- [TheBloke\u002FYi-34B-GGUF](https:\u002F\u002Fhuggingface.co\u002FTheBloke\u002FYi-34B-GGUF)\n- [TheBloke\u002FYi-34B-AWQ](https:\u002F\u002Fhuggingface.co\u002FTheBloke\u002FYi-34B-AWQ)\n\n#### 微调\n\n如果您希望探索 Yi 繁荣家族中的多样化能力，可以深入研究以下微调模型。\n\n- [TheBloke Models](https:\u002F\u002Fhuggingface.co\u002FTheBloke)：该网站托管了许多基于各种 LLM（包括 Yi）微调的模型。\n\n这并非 Yi 的完整列表，但按下载量排序，列举几个示例：\n  - [TheBloke\u002Fdolphin-2_2-yi-34b-AWQ](https:\u002F\u002Fhuggingface.co\u002FTheBloke\u002Fdolphin-2_2-yi-34b-AWQ)\n  - [TheBloke\u002FYi-34B-Chat-AWQ](https:\u002F\u002Fhuggingface.co\u002FTheBloke\u002FYi-34B-Chat-AWQ)\n  - [TheBloke\u002FYi-34B-Chat-GPTQ](https:\u002F\u002Fhuggingface.co\u002FTheBloke\u002FYi-34B-Chat-GPTQ)\n\n- [SUSTech\u002FSUS-Chat-34B](https:\u002F\u002Fhuggingface.co\u002FSUSTech\u002FSUS-Chat-34B)：该模型在所有 70B 以下的模型中排名第一，并且性能优于两倍规模的 deepseek-llm-67b-chat。您可以在 [Open LLM Leaderboard](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FHuggingFaceH4\u002Fopen_llm_leaderboard) 上查看结果。\n\n- [OrionStarAI\u002FOrionStar-Yi-34B-Chat-Llama](https:\u002F\u002Fhuggingface.co\u002FOrionStarAI\u002FOrionStar-Yi-34B-Chat-Llama)：该模型在 [OpenCompass 
LLM Leaderboard](https:\u002F\u002Fopencompass.org.cn\u002Fleaderboard-llm) 的 C-Eval 和 CMMLU 测试中，表现超越了其他模型（如 GPT-4、Qwen-14B-Chat、Baichuan2-13B-Chat）。\n\n- [NousResearch\u002FNous-Capybara-34B](https:\u002F\u002Fhuggingface.co\u002FNousResearch\u002FNous-Capybara-34B)：该模型使用 Capybara 数据集训练，上下文长度为 20 万，训练了 3 个 epoch。\n\n#### API\n\n- [amazing-openai-api](https:\u002F\u002Fgithub.com\u002Fsoulteary\u002Famazing-openai-api)：该工具可将 Yi 模型的 API 直接转换为 OpenAI API 格式。\n- [LlamaEdge](https:\u002F\u002Fwww.secondstate.io\u002Farticles\u002Fyi-34b\u002F#create-an-openai-compatible-api-service-for-the-yi-34b-chat-model)：该工具利用 Rust 编写的便携式 Wasm（WebAssembly）文件，为 Yi-34B-Chat 构建了一个兼容 OpenAI 的 API 服务器。\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n## 技术报告\n\n有关 Yi 系列模型的详细能力，请参阅 [Yi: 由 01.AI 开放的基础模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.04652)。\n\n### 引用\n\n```\n@misc{ai2024yi,\n    title={Yi: Open Foundation Models by 01.AI},\n    author={01. AI and : and Alex Young and Bei Chen and Chao Li and Chengen Huang and Ge Zhang and Guanwei Zhang and Heng Li and Jiangcheng Zhu and Jianqun Chen and Jing Chang and Kaidong Yu and Peng Liu and Qiang Liu and Shawn Yue and Senbin Yang and Shiming Yang and Tao Yu and Wen Xie and Wenhao Huang and Xiaohui Hu and Xiaoyi Ren and Xinyao Niu and Pengcheng Nie and Yuchi Xu and Yudong Liu and Yue Wang and Yuxuan Cai and Zhenyu Gu and Zhiyuan Liu and Zonghong Dai},\n    year={2024},\n    eprint={2403.04652},\n    archivePrefix={arXiv},\n    primaryClass={cs.CL}\n}\n```\n\n## 基准测试\n\n- [聊天模型性能](#chat-model-performance)\n- [基础模型性能](#base-model-performance)\n\n### 聊天模型性能\n\nYi-34B-Chat 模型表现出色，在 MMLU、CMMLU、BBH、GSM8k 等多项基准测试中位居所有开源模型之首。\n\n![聊天模型性能](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_d317706bd8fe.png) \n\n\u003Cdetails>\n\u003Csummary> 评估方法与挑战。⬇️ \u003C\u002Fsummary>\n\n- **评估方法**：我们使用零样本和少样本两种方式对各类基准进行了评估，TruthfulQA 除外。\n- **零样本 vs. 
少样本**：在聊天模型中，通常更倾向于采用零样本方法。\n- **评估策略**：我们的评估策略是在明确或隐含地遵循指令（例如通过少样本示例）的情况下生成响应，随后从生成的文本中提取相关答案。\n- **面临的挑战**：部分模型并不擅长以少数数据集指令所要求的特定格式输出，这导致了次优的结果。\n\n\u003Cstrong>*\u003C\u002Fstrong>：C-Eval 的结果基于验证数据集进行评估\n\u003C\u002Fdetails>\n\n### 基础模型性能\n\n#### Yi-34B 和 Yi-34B-200K \n\nYi-34B 和 Yi-34B-200K 模型在开源模型中表现突出，尤其在 MMLU、CMMLU、常识推理、阅读理解等方面表现出色。\n\n![基础模型性能](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_aaa969cb7c68.png)\n\n\u003Cdetails>\n\u003Csummary> 评估方法。⬇️ \u003C\u002Fsummary>\n\n- **结果差异**：在对开源模型进行基准测试时，我们发现自身流水线的结果与 OpenCompass 等公开来源报告的结果存在差异。\n- **调查结果**：深入分析表明，不同模型在提示词、后处理策略以及采样技术上的差异可能导致显著的结果偏差。\n- **统一的基准测试流程**：我们的方法遵循原始基准测试的标准——使用一致的提示词和后处理策略，并在评估过程中采用贪婪解码，不对生成内容进行任何后处理。\n- **尝试获取未公开的分数**：对于原始作者未报告的分数（包括以不同设置报告的分数），我们尝试用自身的流水线重新计算。\n- **全面的模型评估**：为全面评估模型能力，我们采用了 Llama2 中的方法。具体而言，我们加入了 PIQA、SIQA、HellaSwag、WinoGrande、ARC、OBQA 和 CSQA 来评估常识推理能力。同时，还引入了 SquAD、QuAC 和 BoolQ 来评估阅读理解能力。\n- **特殊配置**：CSQA 仅采用 7-shot（7 个少样本示例）设置进行测试，其余测试均采用 0-shot 设置。此外，我们在“数学与代码”类别下引入了 GSM8K（8-shot@1）、MATH（4-shot@1）、HumanEval（0-shot@1）和 MBPP（3-shot@1）。\n- **Falcon-180B 的说明**：由于技术限制，Falcon-180B 未在 QuAC 和 OBQA 上进行测试。其性能评分是其他任务的平均值，考虑到这两项任务的得分普遍较低，Falcon-180B 的实际能力可能并未被低估。\n\u003C\u002Fdetails>\n\n#### Yi-9B\n\nYi-9B 在一系列类似规模的开源模型中几乎处于最佳位置（包括 Mistral-7B、SOLAR-10.7B、Gemma-7B、DeepSeek-Coder-7B-Base-v1.5 等），尤其在代码、数学、常识推理和阅读理解方面表现优异。\n\n![Yi-9B 基准测试详情](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_c6971329b01f.png)\n\n- 在 **综合** 能力方面（Mean-All），Yi-9B 表现优于 DeepSeek-Coder、DeepSeek-Math、Mistral-7B、SOLAR-10.7B 和 Gemma-7B，位居同类开源模型之首。\n\n  ![Yi-9B 基准测试 - 综合](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_19980488c3c4.png)\n\n- 在 **编码** 能力方面（Mean-Code），Yi-9B 的表现仅次于 DeepSeek-Coder-7B，超越了 Yi-34B、SOLAR-10.7B、Mistral-7B 和 Gemma-7B。\n\n  ![Yi-9B 基准测试 - 编码](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_b2eafeb91353.png)\n\n- 在 **数学** 能力方面（Mean-Math），Yi-9B 的表现仅次于 DeepSeek-Math-7B，超过了 SOLAR-10.7B、Mistral-7B 和 Gemma-7B。\n\n  ![Yi-9B 基准测试 
- 数学](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_ef6008782aea.png)\n\n- 在 **常识与推理** 能力方面（Mean-Text），Yi-9B 的表现与 Mistral-7B、SOLAR-10.7B 和 Gemma-7B 不相上下。\n\n  ![Yi-9B 基准测试 - 文本](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_41deed45235f.png)\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n# 谁可以使用 Yi？\n\n所有人都可以！🙌 ✅\n\nYi 系列模型的代码和权重依据 [Apache 2.0 许可证](https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fblob\u002Fmain\u002FLICENSE) 进行分发，这意味着 Yi 系列模型可供个人使用、学术研究以及商业用途，且完全免费。\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n# 其他\n\n### 致谢\n\n衷心感谢每一位为 Yi 社区做出贡献的人！正是你们的努力，让 Yi 不仅仅是一个项目，更成为一个充满活力、不断发展的创新家园。\n\n[![yi contributors](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_readme_c30fd069c6c5.png)](https:\u002F\u002Fgithub.com\u002F01-ai\u002Fyi\u002Fgraphs\u002Fcontributors)\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### 免责声明\n\n我们在训练过程中使用了数据合规性检查算法，以尽可能确保训练后的模型符合规范。然而，由于数据的复杂性和语言模型应用场景的多样性，我们无法保证模型在所有情况下都能生成正确且合理的输出。请注意，模型仍有可能产生问题输出的风险。对于因误用、误导、非法使用及相关虚假信息而引发的风险和问题，以及由此产生的任何数据安全顾虑，我们概不负责。\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>\n\n### 许可证\n\nYi-1.5 系列模型的代码和权重依据 [Apache 2.0 许可证](https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fblob\u002Fmain\u002FLICENSE) 进行分发。\n\n如果您基于此模型创建衍生作品，请在您的衍生作品中包含以下署名：\n\n    本作品是基于 01.AI 的 [您所使用的 Yi 系列模型] 的衍生作品，依据 Apache 2.0 许可证使用。\n\n\u003Cp align=\"right\"> [\n  \u003Ca href=\"#top\">返回顶部 ⬆️ \u003C\u002Fa>  ] \n\u003C\u002Fp>","# Yi 大模型快速上手指南\n\nYi 系列是由零一万物（01.AI）研发的新一代开源双语大语言模型，在中文和英文理解、逻辑推理及代码能力上表现卓越。本指南将帮助您快速在本地部署并运行 Yi 模型。\n\n## 1. 
环境准备\n\n### 系统要求\n- **操作系统**: Linux (推荐 Ubuntu 20.04+), macOS, Windows (WSL2)\n- **GPU**: NVIDIA GPU (推荐显存 16GB 以上以运行 34B 量化版，8GB 以上可运行 6B 量化版)\n- **CUDA**: 11.8 或更高版本\n- **Python**: 3.8 - 3.11\n\n### 前置依赖\n确保已安装以下基础工具：\n- `git`\n- `pip` 或 `conda`\n- `nvidia-docker` (若使用 Docker 方案)\n\n## 2. 安装步骤\n\n您可以根据需求选择以下任意一种方式快速启动。\n\n### 方案 A：使用 pip 快速安装 (推荐)\n适合快速体验推理功能。\n\n```bash\n# 安装 transformers 和 accelerate\npip install transformers accelerate torch --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n\n# 国内加速源可选 (阿里云镜像)\n# pip install transformers accelerate torch -i https:\u002F\u002Fmirrors.aliyun.com\u002Fpypi\u002Fsimple\u002F\n```\n\n### 方案 B：使用 Docker (环境隔离最佳)\n适合生产部署或避免环境冲突。\n\n```bash\n# 拉取官方构建的镜像\ndocker pull 01ai\u002Fyi:latest\n\n# 运行容器 (需挂载模型目录)\ndocker run --gpus all -it -v \u002Fpath\u002Fto\u002Fmodels:\u002Fmodels 01ai\u002Fyi:latest\n```\n\n### 方案 C：使用 llama.cpp (低显存\u002FCPU 推理)\n适合在消费级显卡或纯 CPU 环境下运行量化模型。\n\n```bash\n# 克隆仓库并编译\ngit clone https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp.git\ncd llama.cpp\nmake LLAMA_CUBLAS=1\n\n# 下载 Yi 的 GGUF 量化模型 (以 Yi-34B-Chat-4bits 为例)\n# 请从 HuggingFace 或 ModelScope 下载对应的 .gguf 文件\n```\n\n## 3. 
基本使用\n\n### 方法一：Python 代码调用 (Transformers)\n这是最通用的开发方式，适用于二次开发和集成。\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\n\n# 加载模型 (建议使用国内镜像源 ModelScope 或指定 local_files_only)\n# 模型路径可以是本地路径，如 \".\u002FYi-6B-Chat\"\nmodel_path = \"01-ai\u002FYi-6B-Chat\"\n\ntokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_path,\n    device_map=\"auto\",\n    torch_dtype=torch.float16,\n    trust_remote_code=True\n)\n\n# 构造对话输入\nprompt = \"你好，请介绍一下你自己。\"\nmessages = [\n    {\"role\": \"user\", \"content\": prompt}\n]\n\n# 生成回复\ntext = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\ninputs = tokenizer([text], return_tensors=\"pt\").to(model.device)\n\noutputs = model.generate(**inputs, max_new_tokens=512)\n# 仅解码新生成的 token，避免把提示词原样重复在输出中\nresponse = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)\n\nprint(response)\n```\n\n### 方法二：命令行交互 (llama.cpp)\n如果您下载了 GGUF 格式的量化模型，可使用以下命令直接对话：\n\n```bash\n# 替换为您下载的 .gguf 模型文件路径\n.\u002Fmain -m models\u002Fyi-34b-chat-4bits.gguf -p \"你好，今天天气怎么样？\" -n 512 --color\n```\n\n### 方法三：Web Demo\n项目支持启动本地 Web 界面进行交互测试：\n\n```bash\n# 进入项目目录后运行\npython web_demo.py --model-path \u002Fpath\u002Fto\u002FYi-6B-Chat\n```\n*注：具体脚本文件名请参考仓库最新 `examples` 目录。*\n\n---\n**提示**：国内开发者推荐优先从 **ModelScope (魔搭社区)** 或 **Wisemodel (始智 AI)** 下载模型权重，以获得更快的下载速度。","某跨境电商公司的技术团队需要构建一个能同时处理海量英文产品评论和中文客服工单的智能分析系统，以实时提取用户情感倾向并生成回复建议。\n\n### 没有 Yi 时\n- **语言割裂严重**：团队需分别部署英文和中文两套模型，导致架构复杂且维护成本高昂，难以实现双语上下文关联分析。\n- **推理延迟过高**：通用小模型在长文本（如详细评测）上表现不佳，而调用国外顶尖闭源 API 不仅网络延迟大，还面临数据出境合规风险。\n- **领域适配困难**：开源模型在电商垂直领域的术语理解上偏差较大，微调后容易出现“灾难性遗忘”，丢失原有的多语言能力。\n- **资源消耗巨大**：为了兼顾精度与速度，不得不采购昂贵的 GPU 集群来运行多个冗余模型，算力预算经常超标。\n\n### 使用 Yi 后\n- **原生双语统一**：利用 Yi 基于 3T 多语料从头训练的特性，单模型即可完美覆盖中英混合输入，无需切换上下文，架构简化为单一服务。\n- **性能与合规兼得**：Yi-34B-Chat 在 AlpacaEval 等榜单表现接近 GPT-4 Turbo，本地化部署既消除了网络延迟，又确保了敏感用户数据不出境。\n- **垂直领域精准**：凭借强大的基座能力，仅需少量电商数据进行微调，Yi 就能准确识别“物流时效”、“材质触感”等专业术语的情感色彩。\n- 
**降本增效显著**：得益于高效的推理优化，团队在同等算力下吞吐量提升明显，大幅降低了单位请求的硬件成本。\n\nYi 以其卓越的原生双语能力和开源灵活性，帮助企业在保障数据主权的前提下，用更低成本实现了媲美顶尖闭源模型的业务智能化升级。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F01-ai_Yi_52402f9e.png","01-ai","01.AI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002F01-ai_a7914c00.png","A global company building AI 2.0 platform and applications",null,"yi@01.ai","01ai_yi","https:\u002F\u002F01.ai\u002F","https:\u002F\u002Fgithub.com\u002F01-ai",[85,89,93,97],{"name":86,"color":87,"percentage":88},"Jupyter Notebook","#DA5B0B",92.2,{"name":90,"color":91,"percentage":92},"Python","#3572A5",7.6,{"name":94,"color":95,"percentage":96},"Shell","#89e051",0.1,{"name":98,"color":99,"percentage":96},"Dockerfile","#384d54",7838,487,"2026-04-02T18:30:08","Apache-2.0","未说明","未说明（文中提及支持量化版本如 4-bit\u002F8-bit 以降低硬件门槛，但未列出具体显存或 CUDA 版本要求）",{"notes":107,"python":104,"dependencies":108},"README 主要介绍了模型系列（Yi, Yi-1.5, Yi-VL）、参数量（6B, 9B, 34B 等）、上下文长度（4K, 32K, 200K）及获取渠道。文中提到可通过 pip、docker、llama.cpp 或 conda-lock 快速开始，并提供了量化模型（GPTQ\u002FAWQ）以适应不同硬件，但当前提供的文本片段中未包含具体的操作系统、GPU 型号、内存大小、Python 版本或依赖库的详细版本号。建议查看仓库中的 'Quick start' 具体章节或 Dockerfile 以获取详细环境配置。",[104],[15],[111],"large-language-models",4,"2026-03-27T02:49:30.150509","2026-04-06T07:12:04.311246",[116,121,126,131,136,140],{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},10482,"如何使用自定义数据集对 Yi-VL 模型进行微调？","可以使用 modelscope\u002Fswift 库进行微调。在命令行参数中指定自定义数据集路径：\n--custom_train_dataset_path xxx.jsonl \\\n--custom_val_dataset_path yyy.jsonl\n\n数据格式应为 JSON\u002FJSONL 列表，包含 query（问题）、response（回答）和 images（图片路径列表）。示例如下：\n[{\"query\": \"55555\", \"response\": \"66666\", \"images\": [\"image_path\"]},\n{\"query\": \"eeeee\", \"response\": \"fffff\", \"history\": [], \"images\": [\"image_path\"]},\n{\"query\": \"EEEEE\", \"response\": \"FFFFF\", \"history\": [[\"AAAAA\", \"BBBBB\"], [\"CCCCC\", \"DDDDD\"]], \"images\": [\"image_path\", \"image_path2\", 
\"image_path3\"]}]","https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fissues\u002F348",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},10483,"运行长上下文模型（如 Yi-6B-200K）时遇到 CUDA device-side assert 或 CUBLAS 错误怎么办？","这通常与显存不足或配置有关。尝试修改模型目录下的 config.json 文件，将 max_position_embeddings 的值调整为目标长度（例如 100000）。如果调整后出现 CUBLAS_STATUS_EXECUTION_FAILED 错误，说明长序列带来了额外的显存消耗，当前显卡显存可能不足以支撑该配置，建议减少序列长度或更换显存更大的显卡。","https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fissues\u002F100",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},10484,"微调 Yi-34B 模型时遇到 CUDA Out Of Memory (OOM) 错误如何解决？","请检查 transformers 库的版本，建议使用 4.35 或更高版本。同时在代码加载模型时（from_pretrained），添加参数 use_flash_attention_2=True 以启用 Flash Attention 2，这能显著降低显存占用。示例：\nmodel = AutoModelForCausalLM.from_pretrained(..., use_flash_attention_2=True)","https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fissues\u002F163",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},10485,"微调后的模型生成结果中包含大量多余的 \u003C\u002Fs> 标记，如何解决？","这是因为使用了快速的分词器（fast tokenizer）导致的兼容性问题。建议在加载分词器时设置 use_fast=False。目前官方推荐始终使用非快速模式的分词器以避免此类生成异常。","https:\u002F\u002Fgithub.com\u002F01-ai\u002FYi\u002Fissues\u002F80",{"id":137,"question_zh":138,"answer_zh":139,"source_url":135},10486,"微调 Yi 模型时应该使用哪个 EOS Token（结束符）？是 \u003C|endoftext|> 还是 \u003C|im_end|>？","不建议将 \u003C|im_end|> 设为 EOS Token，因为这会导致生成过早停止。\u003C|im_end|> 应作为文本末尾的可选标记，而真正的 EOS Token 应使用 \u003C|endoftext|> (ID 64001)。",{"id":141,"question_zh":142,"answer_zh":143,"source_url":125},10487,"在 4090 等消费级显卡上运行长上下文模型遇到显存错误或执行失败怎么办？","长序列推理会消耗大量额外显存。如果在修改 config.json 中的 max_position_embeddings 后遇到 CUBLAS 执行失败，通常是因为显存不足。官方表示由于缺乏 4090 硬件难以直接复现，但建议用户尝试降低最大序列长度或使用显存更大的专业卡（如 A100\u002FA800）来运行超长上下文任务。",[145],{"id":146,"version":147,"summary_zh":79,"released_at":79},71051,"0.1.0"]
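上文 FAQ 中描述的 swift 自定义数据集（query / response / images / history 字段）可以用几行脚本生成并校验。以下是一个最小示意，其中图片路径与对话内容均为虚构示例，仅演示 JSONL 形式的记录构造与字段检查：

```python
import json

def make_record(query, response, images, history=None):
    """构造一条 swift 风格的多模态微调样本（字段名按上文 FAQ 约定）。"""
    rec = {"query": query, "response": response, "images": list(images)}
    if history is not None:
        rec["history"] = history  # [[用户轮, 助手轮], ...]
    return rec

records = [
    make_record("描述这张图片", "这是一只猫。", ["cat.jpg"]),
    make_record("它们在做什么？", "它们在玩耍。", ["cat.jpg", "dog.jpg"],
                history=[["你好", "你好！"]]),
]

# 序列化为 JSONL（每行一个 JSON 对象），并做最小校验：必需字段齐全、images 为列表
jsonl_lines = [json.dumps(r, ensure_ascii=False) for r in records]
for line in jsonl_lines:
    r = json.loads(line)
    assert {"query", "response", "images"} <= set(r)
    assert isinstance(r["images"], list)

print(len(jsonl_lines))
```

生成的各行写入文件后，即可通过 `--custom_train_dataset_path` / `--custom_val_dataset_path` 传给 swift 的微调命令。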