[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-deepseek-ai--DeepSeek-Coder-V2":3,"tool-deepseek-ai--DeepSeek-Coder-V2":64},[4,17,25,39,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":10,"last_commit_at":23,"category_tags":24,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":26,"name":27,"github_repo":28,"description_zh":29,"stars":30,"difficulty_score":10,"last_commit_at":31,"category_tags":32,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[33,34,35,36,14,37,15,13,38],"图像","数据工具","视频","插件","其他","音频",{"id":40,"name":41,"github_repo":42,"description_zh":43,"stars":44,"difficulty_score":45,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[14,33,13,15,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":45,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[15,33,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":45,"last_commit_at":62,"category_tags":63,"status":16},2181,"OpenHands","OpenHands\u002FOpenHands","OpenHands 是一个专注于 AI 
驱动开发的开源平台，旨在让智能体（Agent）像人类开发者一样理解、编写和调试代码。它解决了传统编程中重复性劳动多、环境配置复杂以及人机协作效率低等痛点，通过自动化流程显著提升开发速度。\n\n无论是希望提升编码效率的软件工程师、探索智能体技术的研究人员，还是需要快速原型验证的技术团队，都能从中受益。OpenHands 提供了灵活多样的使用方式：既可以通过命令行（CLI）或本地图形界面在个人电脑上轻松上手，体验类似 Devin 的流畅交互；也能利用其强大的 Python SDK 自定义智能体逻辑，甚至在云端大规模部署上千个智能体并行工作。\n\n其核心技术亮点在于模块化的软件智能体 SDK，这不仅构成了平台的引擎，还支持高度可组合的开发模式。此外，OpenHands 在 SWE-bench 基准测试中取得了 77.6% 的优异成绩，证明了其解决真实世界软件工程问题的能力。平台还具备完善的企业级功能，支持与 Slack、Jira 等工具集成，并提供细粒度的权限管理，适合从个人开发者到大型企业的各类用户场景。",70612,"2026-04-05T11:12:22",[15,14,13,36],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":79,"stars":83,"forks":84,"last_commit_at":85,"license":86,"difficulty_score":87,"env_os":78,"env_gpu":88,"env_ram":89,"env_deps":90,"category_tags":93,"github_topics":79,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":94,"updated_at":95,"faqs":96,"releases":127},2113,"deepseek-ai\u002FDeepSeek-Coder-V2","DeepSeek-Coder-V2","DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence","DeepSeek-Coder-V2 是一款开源的混合专家（MoE）代码大模型，旨在打破闭源模型在代码智能领域的性能垄断。它在 DeepSeek-V2 的基础上，额外使用了 6 万亿个令牌进行持续预训练，不仅大幅提升了代码生成与数学推理能力，还保持了出色的通用语言处理水平。\n\n这款模型主要解决了开发者在面对复杂编程任务时，开源工具往往难以媲美顶尖闭源模型（如 GPT-4 Turbo）的痛点。它将支持的编程语言从 86 种扩展至 338 种，并将上下文窗口从 16K 显著提升至 128K，使其能够轻松处理超长代码库和复杂的项目逻辑。\n\nDeepSeek-Coder-V2 非常适合软件工程师、算法研究人员以及需要高质量代码辅助的企业团队使用。无论是日常代码补全、跨语言项目迁移，还是高难度的算法推导，它都能提供强有力的支持。其独特的技术亮点在于采用高效的 MoE 架构，在实现比肩闭源模型性能的同时，依然保持开源免费的优势，并支持 MIT 代码许可证，让使用者可以更自由地进行本地部署和二次开发。","\u003C!-- markdownlint-disable first-line-h1 -->\n\u003C!-- markdownlint-disable html -->\n\u003C!-- markdownlint-disable no-duplicate-header -->\n\n\u003Cdiv align=\"center\">\n  \u003Cimg 
src=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002Ffigures\u002Flogo.svg?raw=true\" width=\"60%\" alt=\"DeepSeek-V2\" \u002F>\n\u003C\u002Fdiv>\n\u003Chr>\n\u003Cdiv align=\"center\" style=\"line-height: 1;\">\n  \u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\" target=\"_blank\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Homepage\" src=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002Ffigures\u002Fbadge.svg?raw=true\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fchat.deepseek.com\u002F\" target=\"_blank\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Chat\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤖%20Chat-DeepSeek%20V2-536af5?color=536af5&logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\" target=\"_blank\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Hugging Face\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\" style=\"line-height: 1;\">\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTc7c45Zzu5\" target=\"_blank\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Discord\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002Ffigures\u002Fqr.jpeg?raw=true\" target=\"_blank\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Wechat\" 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Fdeepseek_ai\" target=\"_blank\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Twitter Follow\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter-deepseek_ai-white?logo=x&logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\" style=\"line-height: 1;\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002FLICENSE-CODE\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Code License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode_License-MIT-f5de53?&color=f5de53\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002FLICENSE-MODEL\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Model License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel_License-Model_Agreement-f5de53?&color=f5de53\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003Cp align=\"center\">\n  \u003Ca href=\"#2-model-downloads\">Model Download\u003C\u002Fa> |\n  \u003Ca href=\"#3-evaluation-results\">Evaluation Results\u003C\u002Fa> |\n  \u003Ca href=\"#5-api-platform\">API Platform\u003C\u002Fa> |\n  \u003Ca href=\"#6-how-to-run-locally\">How to Use\u003C\u002Fa> |\n  \u003Ca href=\"#7-license\">License\u003C\u002Fa> |\n  \u003Ca href=\"#8-citation\">Citation\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.11931\">\u003Cb>Paper Link\u003C\u002Fb>👁️\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n# DeepSeek-Coder-V2: Breaking 
the Barrier of Closed-Source Models in Code Intelligence\n\n## 1. Introduction\nWe present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 on an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K.\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder-V2_readme_249ac926c37b.png\">\n\u003C\u002Fp>\n\n\nIn standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT-4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. The list of supported programming languages can be found [here](supported_langs.txt).\n\n## 2. Model Downloads\n\nWe release DeepSeek-Coder-V2 to the public in two sizes, 16B and 236B total parameters, based on the [DeepSeekMoE](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.06066) framework, with only 2.4B and 21B active parameters respectively. Both base and instruct models are available. 
\n\n\u003Cdiv align=\"center\">\n\n|            **Model**            | **#Total Params** | **#Active Params** | **Context Length** |                         **Download**                         |\n| :-----------------------------: | :---------------: | :----------------: | :----------------: | :----------------------------------------------------------: |\n|   DeepSeek-Coder-V2-Lite-Base   |        16B        |        2.4B        |        128k        | [🤗 HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2-Lite-Base) |\n| DeepSeek-Coder-V2-Lite-Instruct |        16B        |        2.4B        |        128k        | [🤗 HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct) |\n|     DeepSeek-Coder-V2-Base      |       236B        |        21B         |        128k        | [🤗 HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2-Base) |\n|   DeepSeek-Coder-V2-Instruct    |       236B        |        21B         |        128k        | [🤗 HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2-Instruct) |\n\n\u003C\u002Fdiv>\n\n\n\n## 3. 
Evaluation Results\n### 3.1 Code Generation\n\n\n|  | #TP | #AP | HumanEval | MBPP+ | LiveCodeBench | USACO |\n|:------------|:--------:|:--------:|:--------:|:--------:|:--------:|:-----------:|\n| **Closed-Source Models** |  |  |  |  |  |  |\n| **Gemini-1.5-Pro**                  |  -   |  -   |   83.5    | **74.6**  |   34.1   |   4.9    |\n| **Claude-3-Opus**                   |  -   |  -   |   84.2    |   72.0    |   34.6   |   7.8    |\n| **GPT-4-Turbo-1106**                |  -   |  -   |   87.8    |   69.3    |   37.1   |   11.1   |\n| **GPT-4-Turbo-0409**                |  -   |  -   |   88.2    |   72.2    | **45.7** |   12.3   |\n| **GPT-4o-0513**                     |  -   |  -   | **91.0** |   73.5    |   43.4   | **18.8** |\n| **Open-Source Models**              |      |      |           |           |          |          |\n| **CodeStral**                       | 22B  | 22B  |   78.1    |   68.2    |   31.0   |   4.6    |\n| **DeepSeek-Coder-Instruct**         | 33B  | 33B  |   79.3    |   70.1   |   22.5   |   4.2    |\n| **Llama3-Instruct**                 | 70B  | 70B  |   81.1    |   68.8   |   28.7   |   3.3    |\n| **DeepSeek-Coder-V2-Lite-Instruct** | 16B | 2.4B | 81.1 | 68.8 | 24.3 | 6.5 |\n| **DeepSeek-Coder-V2-Instruct** | 236B | 21B  | **90.2** | **76.2** | **43.4** | **12.1** |\n\n### 3.2 Code Completion\n\n\n| Model                           | #TP  | #AP  | RepoBench (Python) | RepoBench (Java) | HumanEval FIM |\n| :------------------------------ | :--: | :--: | :----------------: | :--------------: | :-----------: |\n| **CodeStral**                   | 22B  | 22B  |      **46.1**      |     **45.7**     |     83.0      |\n| **DeepSeek-Coder-Base**         |  7B  |  7B  |        36.2        |       43.3       |     86.1      |\n| **DeepSeek-Coder-Base**         | 33B  | 33B  |        39.1        |       44.8       |   **86.4**    |\n| **DeepSeek-Coder-V2-Lite-Base** | 16B  | 2.4B |        38.9        |       43.3       |   **86.4**    
|\n\n### 3.3 Code Fixing\n\n\n|                                     | #TP  | #AP  | Defects4J | SWE-Bench |  Aider   |\n| ----------------------------------- | :--: | :--: | :-------: | :-------: | :------: |\n| **Closed-Source Models**            |      |      |           |           |          |\n| **Gemini-1.5-Pro**                  |  -   |  -   |   18.6    |   19.3    |   57.1   |\n| **Claude-3-Opus**                   |  -   |  -   |   25.5    |   11.7    |   68.4   |\n| **GPT-4-Turbo-1106**                |  -   |  -   |   22.8    |   22.7    |   65.4   |\n| **GPT-4-Turbo-0409**                |  -   |  -   |   24.3    |   18.3    |   63.9   |\n| **GPT-4o-0513**                     |  -   |  -   | **26.1**  | **26.7**  | **72.9** |\n| **Open-Source Models**              |      |      |           |           |          |\n| **CodeStral**                       | 22B  | 22B  |   17.8    |    2.7    |   51.1   |\n| **DeepSeek-Coder-Instruct**         | 33B  | 33B  |   11.3    |    0.0    |   54.5   |\n| **Llama3-Instruct**                 | 70B  | 70B  |   16.2    |     -     |   49.2   |\n| **DeepSeek-Coder-V2-Lite-Instruct** | 16B  | 2.4B |    9.2    |    0.0    |   44.4   |\n| **DeepSeek-Coder-V2-Instruct**      | 236B | 21B  | **21.0**  | **12.7**  | **73.7** |\n\n### 3.4 Mathematical Reasoning\n\n\n|                                     | #TP  | #AP  |  GSM8K   |   MATH   | AIME 2024 | Math Odyssey |\n| ----------------------------------- | :--: | :--: | :------: | :------: | :-------: | :----------: |\n| **Closed-Source Models**            |      |      |          |          |           |              |\n| **Gemini-1.5-Pro**                  |  -   |  -   |   90.8   |   67.7   |   2\u002F30    |     45.0     |\n| **Claude-3-Opus**                   |  -   |  -   |   95.0   |   60.1   |   2\u002F30    |     40.6     |\n| **GPT-4-Turbo-1106**                |  -   |  -   |   91.4   |   64.3   |   1\u002F30    |     49.1     |\n| **GPT-4-Turbo-0409**           
     |  -   |  -   |   93.7   |   73.4   | **3\u002F30**  |     46.8     |\n| **GPT-4o-0513**                     |  -   |  -   | **95.8** | **76.6** |   2\u002F30    |   **53.2**   |\n| **Open-Source Models**              |      |      |          |          |           |              |\n| **Llama3-Instruct**                 | 70B  | 70B  |   93.0   |   50.4   |   1\u002F30    |     27.9     |\n| **DeepSeek-Coder-V2-Lite-Instruct** | 16B  | 2.4B |   86.4   |   61.8   |   0\u002F30    |     44.4     |\n| **DeepSeek-Coder-V2-Instruct**      | 236B | 21B  | **94.9** | **75.7** | **4\u002F30**  |   **53.7**   |\n\n### 3.5 General Natural Language\n\n|      Benchmark       | Domain  | DeepSeek-V2-Lite Chat | DeepSeek-Coder-V2-Lite Instruct | DeepSeek-V2 Chat | DeepSeek-Coder-V2 Instruct |\n| :------------------: | :-----: | :-------------------: | :-----------------------------: | :--------------: | :------------------------: |\n|       **BBH**        | English |         48.1          |              61.2               |       79.7       |          **83.9**          |\n|       **MMLU**       | English |         55.7          |              60.1               |       78.1       |          **79.2**          |\n|     **ARC-Easy**     | English |         86.1          |              88.9               |     **98.1**     |            97.4            |\n|  **ARC-Challenge**   | English |         73.4          |              77.4               |       92.3       |          **92.8**          |\n|     **TriviaQA**     | English |         65.2          |              59.5               |     **86.7**     |            82.3            |\n| **NaturalQuestions** | English |         35.5          |              30.8               |     **53.4**     |            47.5            |\n|     **AGIEval**      | English |         42.8          |              28.7               |     **61.4**     |             60             |\n|     **CLUEWSC**      | Chinese |         80.0          |          
    76.5               |     **89.9**     |            85.9            |\n|      **C-Eval**      | Chinese |         60.1          |              61.6               |       78.0       |          **79.4**          |\n|      **CMMLU**       | Chinese |         62.5          |              62.7               |     **81.6**     |            80.9            |\n|    **Arena-Hard**    |    -    |         11.4          |              38.1               |       41.6       |          **65.0**          |\n|  **AlpacaEval 2.0**  |    -    |         16.9          |              17.7               |     **38.9**     |            36.9            |\n|     **MT-Bench**     |    -    |         7.37          |              7.81               |     **8.97**     |            8.77            |\n|    **AlignBench**    |    -    |         6.02          |              6.83               |     **7.91**     |            7.84            |\n\n### 3.6 Context Window\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"80%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder-V2_readme_2d44e2ec176c.png\">\n\u003C\u002Fp>\n\n\nEvaluation results on the ``Needle In A Haystack`` (NIAH) tests. DeepSeek-Coder-V2 performs well across all context window lengths up to **128K**.\n\n## 4. Chat Website\n\nYou can chat with DeepSeek-Coder-V2 on DeepSeek's official website: [chat.deepseek.com](https:\u002F\u002Fchat.deepseek.com\u002Fsign_in)\n\n## 5. API Platform\nWe also provide an OpenAI-compatible API on the DeepSeek Platform: [platform.deepseek.com](https:\u002F\u002Fplatform.deepseek.com\u002F), billed on a pay-as-you-go basis.\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"40%\" src=\"figures\u002Fmodel_price.jpg\">\n\u003C\u002Fp>\n\n\n## 6. How to run locally\n**Here, we provide some examples of how to use the DeepSeek-Coder-V2-Lite model. 
If you want to utilize DeepSeek-Coder-V2 in BF16 format for inference, 80GB*8 GPUs are required.**\n\n### Inference with Huggingface's Transformers\nYou can directly employ [Huggingface's Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) for model inference.\n\n#### Code Completion\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Base\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Base\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\ninput_text = \"#write a quick sort algorithm\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_length=128)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n```\n\n#### Code Insertion\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Base\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Base\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\ninput_text = \"\"\"\u003C｜fim▁begin｜>def quick_sort(arr):\n    if len(arr) \u003C= 1:\n        return arr\n    pivot = arr[0]\n    left = []\n    right = []\n\u003C｜fim▁hole｜>\n        if arr[i] \u003C pivot:\n            left.append(arr[i])\n        else:\n            right.append(arr[i])\n    return quick_sort(left) + [pivot] + quick_sort(right)\u003C｜fim▁end｜>\"\"\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_length=128)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])\n```\n\n#### Chat Completion\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer 
= AutoTokenizer.from_pretrained(\"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\nmessages=[\n    { 'role': 'user', 'content': \"write a quick sort algorithm in python.\"}\n]\ninputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors=\"pt\").to(model.device)\n# tokenizer.eos_token_id is the id of \u003C｜end▁of▁sentence｜> token\noutputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)\nprint(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))\n```\n\n\n\nThe complete chat template can be found in `tokenizer_config.json` in the Hugging Face model repository.\n\nAn example of the chat template is shown below:\n\n```bash\n\u003C｜begin▁of▁sentence｜>User: {user_message_1}\n\nAssistant: {assistant_message_1}\u003C｜end▁of▁sentence｜>User: {user_message_2}\n\nAssistant:\n```\n\nYou can also add an optional system message:\n\n```bash\n\u003C｜begin▁of▁sentence｜>{system_message}\n\nUser: {user_message_1}\n\nAssistant: {assistant_message_1}\u003C｜end▁of▁sentence｜>User: {user_message_2}\n\nAssistant:\n```\n\nNote that in the last round of dialogue, \"Assistant:\" must have no space after the colon. 
Adding a space might cause the following issues on the 16B-Lite model:\n- English questions receiving Chinese responses.\n- Responses containing garbled text.\n- Responses repeating excessively.\n\nOlder versions of Ollama had this bug (see https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2\u002Fissues\u002F12), but it has been fixed in the latest version.\n\n\n### Inference with SGLang (recommended)\n[SGLang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang) currently supports MLA optimizations, FP8 (W8A8), FP8 KV Cache, and Torch Compile, offering the best latency and throughput among open-source frameworks. Here are some example commands to launch an OpenAI API-compatible server:\n\n```bash\n# BF16, tensor parallelism = 8\npython3 -m sglang.launch_server --model deepseek-ai\u002FDeepSeek-Coder-V2-Instruct --tp 8 --trust-remote-code\n\n# BF16, w\u002F torch.compile (The compilation can take several minutes)\npython3 -m sglang.launch_server --model deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct --trust-remote-code --enable-torch-compile\n\n# FP8, tensor parallelism = 8, FP8 KV cache\npython3 -m sglang.launch_server --model neuralmagic\u002FDeepSeek-Coder-V2-Instruct-FP8 --tp 8 --trust-remote-code --kv-cache-dtype fp8_e5m2\n```\n\nAfter launching the server, you can query it with the OpenAI Python client:\n\n```python\nimport openai\nclient = openai.Client(\n    base_url=\"http:\u002F\u002F127.0.0.1:30000\u002Fv1\", api_key=\"EMPTY\")\n\n# Chat completion\nresponse = client.chat.completions.create(\n    model=\"default\",\n    messages=[\n        {\"role\": \"system\", \"content\": \"You are a helpful AI assistant\"},\n        {\"role\": \"user\", \"content\": \"List 3 countries and their capitals.\"},\n    ],\n    temperature=0,\n    max_tokens=64,\n)\nprint(response)\n```\n\n\n### Inference with vLLM (recommended)\nTo use [vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) for model inference, please merge this pull request into your vLLM 
codebase: https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\u002Fpull\u002F4650.\n\n```python\nfrom transformers import AutoTokenizer\nfrom vllm import LLM, SamplingParams\n\nmax_model_len, tp_size = 8192, 1\nmodel_name = \"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nllm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)\nsampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])\n\nmessages_list = [\n    [{\"role\": \"user\", \"content\": \"Who are you?\"}],\n    [{\"role\": \"user\", \"content\": \"write a quick sort algorithm in python.\"}],\n    [{\"role\": \"user\", \"content\": \"Write a piece of quicksort code in C++.\"}],\n]\n\nprompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]\n\noutputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)\n\ngenerated_text = [output.outputs[0].text for output in outputs]\nprint(generated_text)\n```\n\n\n\n## 7. License\n\nThis code repository is licensed under [the MIT License](LICENSE-CODE). The use of DeepSeek-Coder-V2 Base\u002FInstruct models is subject to [the Model License](LICENSE-MODEL). DeepSeek-Coder-V2 series (including Base and Instruct) supports commercial use.\n\n## 8. Citation\n```latex\n@article{zhu2024deepseek,\n  title={DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence},\n  author={Zhu, Qihao and Guo, Daya and Shao, Zhihong and Yang, Dejian and Wang, Peiyi and Xu, Runxin and Wu, Y and Li, Yukun and Gao, Huazuo and Ma, Shirong and others},\n  journal={arXiv preprint arXiv:2406.11931},\n  year={2024}\n}\n```\n\n## 9. 
Contact\nIf you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).\n","\u003C!-- markdownlint-disable first-line-h1 -->\n\u003C!-- markdownlint-disable html -->\n\u003C!-- markdownlint-disable no-duplicate-header -->\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002Ffigures\u002Flogo.svg?raw=true\" width=\"60%\" alt=\"DeepSeek-V2\" \u002F>\n\u003C\u002Fdiv>\n\u003Chr>\n\u003Cdiv align=\"center\" style=\"line-height: 1;\">\n  \u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\" target=\"_blank\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Homepage\" src=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002Ffigures\u002Fbadge.svg?raw=true\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fchat.deepseek.com\u002F\" target=\"_blank\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Chat\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤖%20Chat-DeepSeek%20V2-536af5?color=536af5&logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\" target=\"_blank\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Hugging Face\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\" style=\"line-height: 1;\">\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTc7c45Zzu5\" target=\"_blank\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Discord\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da\" style=\"display: inline-block; vertical-align: 
middle;\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002Ffigures\u002Fqr.jpeg?raw=true\" target=\"_blank\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Wechat\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Fdeepseek_ai\" target=\"_blank\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Twitter Follow\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter-deepseek_ai-white?logo=x&logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\" style=\"line-height: 1;\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002FLICENSE-CODE\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Code License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode_License-MIT-f5de53?&color=f5de53\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002FLICENSE-MODEL\" style=\"margin: 2px;\">\n    \u003Cimg alt=\"Model License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel_License-Model_Agreement-f5de53?&color=f5de53\" style=\"display: inline-block; vertical-align: middle;\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003Cp align=\"center\">\n  \u003Ca href=\"#2-model-downloads\">模型下载\u003C\u002Fa> |\n  \u003Ca href=\"#3-evaluation-results\">评估结果\u003C\u002Fa> |\n  \u003Ca href=\"#5-api-platform\">API平台\u003C\u002Fa> |\n  \u003Ca href=\"#6-how-to-run-locally\">使用方法\u003C\u002Fa> |\n  \u003Ca href=\"#7-license\">许可证\u003C\u002Fa> |\n  \u003Ca 
href=\"#8-citation\">引用\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.11931\">\u003Cb>论文链接\u003C\u002Fb>👁️\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n# DeepSeek-Coder-V2：突破代码智能领域闭源模型的壁垒\n\n## 1. 引言\n我们推出了DeepSeek-Coder-V2，这是一个开源的专家混合（MoE）代码语言模型，在代码相关任务中达到了与GPT4-Turbo相当的性能。具体来说，DeepSeek-Coder-V2是在DeepSeek-V2的中间检查点基础上，通过额外的6万亿个token继续预训练而成。通过这一持续的预训练，DeepSeek-Coder-V2显著提升了DeepSeek-V2的编码和数学推理能力，同时在通用语言任务上保持了相近的性能。与DeepSeek-Coder-33B相比，DeepSeek-Coder-V2在各类代码相关任务以及推理和通用能力方面都有了显著提升。此外，DeepSeek-Coder-V2将支持的编程语言从86种扩展到了338种，并将上下文长度从16K扩展至128K。\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder-V2_readme_249ac926c37b.png\">\n\u003C\u002Fp>\n\n\n在标准基准测试中，DeepSeek-Coder-V2在编码和数学基准测试中均表现出优于GPT4-Turbo、Claude 3 Opus和Gemini 1.5 Pro等闭源模型的性能。支持的编程语言列表可以在此处找到[这里](supported_langs.txt)。\n\n## 2. 模型下载\n\n我们基于[DeepSeekMoE](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.06066)框架发布了具有16B和236B参数的DeepSeek-Coder-V2，其有效参数仅为2.4B和21B，包括基础模型和指令微调模型，现已向公众开放。\n\n\u003Cdiv align=\"center\">\n\n|            **模型**            | **总参数量** | **有效参数量** | **上下文长度** |                         **下载**                         |\n| :-----------------------------: | :---------------: | :----------------: | :----------------: | :----------------------------------------------------------: |\n|   DeepSeek-Coder-V2-Lite-Base   |        16B        |        2.4B        |        128k        | [🤗 HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2-Lite-Base) |\n| DeepSeek-Coder-V2-Lite-Instruct |        16B        |        2.4B        |        128k        | [🤗 HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct) |\n|     DeepSeek-Coder-V2-Base      |       236B        |        21B         |        128k        | [🤗 
HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2-Base) |\n|   DeepSeek-Coder-V2-Instruct    |       236B        |        21B         |        128k        | [🤗 HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2-Instruct) |\n\n\u003C\u002Fdiv>\n\n\n\n## 3. 评估结果\n### 3.1 代码生成\n\n\n|  | #TP | #AP | HumanEval | MBPP+ | LiveCodeBench | USACO |\n|:------------|:--------:|:--------:|:--------:|:--------:|:--------:|:-----------:|\n| **闭源模型** |  |  |  |  |  |  |\n| **Gemini-1.5-Pro**                  |  -   |  -   |   83.5    | **74.6**  |   34.1   |   4.9    |\n| **Claude-3-Opus**                   |  -   |  -   |   84.2    |   72.0    |   34.6   |   7.8    |\n| **GPT-4-Turbo-1106**                |  -   |  -   |   87.8    |   69.3    |   37.1   |   11.1   |\n| **GPT-4-Turbo-0409**                |  -   |  -   |   88.2    |   72.2    | **45.7** |   12.3   |\n| **GPT-4o-0513**                     |  -   |  -   | **91.0** |   73.5    |   43.4   | **18.8** |\n| **开源模型**              |      |      |           |           |          |          |\n| **CodeStral**                       | 22B  | 22B  |   78.1    |   68.2    |   31.0   |   4.6    |\n| **DeepSeek-Coder-Instruct**         | 33B  | 33B  |   79.3    |   70.1   |   22.5   |   4.2    |\n| **Llama3-Instruct**                 | 70B  | 70B  |   81.1    |   68.8   |   28.7   |   3.3    |\n| **DeepSeek-Coder-V2-Lite-Instruct** | 16B | 2.4B | 81.1 | 68.8 | 24.3 | 6.5 |\n| **DeepSeek-Coder-V2-Instruct** | 236B | 21B  | **90.2** | **76.2** | **43.4** | **12.1** |\n\n### 3.2 代码补全\n\n\n| 模型                           | #TP  | #AP  | RepoBench (Python) | RepoBench (Java) | HumanEval FIM |\n| :------------------------------ | :--: | :--: | :----------------: | :--------------: | :-----------: |\n| **CodeStral**                   | 22B  | 22B  |      **46.1**      |     **45.7**     |     83.0      |\n| **DeepSeek-Coder-Base**         |  7B  |  7B  |        36.2   
     |       43.3       |     86.1      |\n| **DeepSeek-Coder-Base**         | 33B  | 33B  |        39.1        |       44.8       |   **86.4**    |\n| **DeepSeek-Coder-V2-Lite-Base** | 16B  | 2.4B |        38.9        |       43.3       |   **86.4**    |\n\n### 3.3 代码修复\n\n\n|                                     | #TP  | #AP  | Defects4J | SWE-Bench |  Aider   |\n| ----------------------------------- | :--: | :--: | :-------: | :-------: | :------: |\n| **闭源模型**            |      |      |           |           |          |\n| **Gemini-1.5-Pro**                  |  -   |  -   |   18.6    |   19.3    |   57.1   |\n| **Claude-3-Opus**                   |  -   |  -   |   25.5    |   11.7    |   68.4   |\n| **GPT-4-Turbo-1106**                |  -   |  -   |   22.8    |   22.7    |   65.4   |\n| **GPT-4-Turbo-0409**                |  -   |  -   |   24.3    |   18.3    |   63.9   |\n| **GPT-4o-0513**                     |  -   |  -   | **26.1**  | **26.7**  | **72.9** |\n| **开源模型**              |      |      |           |           |          |\n| **CodeStral**                       | 22B  | 22B  |   17.8    |    2.7    |   51.1   |\n| **DeepSeek-Coder-Instruct**         | 33B  | 33B  |   11.3    |    0.0    |   54.5   |\n| **Llama3-Instruct**                 | 70B  | 70B  |   16.2    |     -     |   49.2   |\n| **DeepSeek-Coder-V2-Lite-Instruct** | 16B  | 2.4B |    9.2    |    0.0    |   44.4   |\n| **DeepSeek-Coder-V2-Instruct**      | 236B | 21B  | **21.0**  | **12.7**  | **73.7** |\n\n### 3.4 数学推理\n\n\n|                                     | #TP  | #AP  |  GSM8K   |   MATH   | AIME 2024 | Math Odyssey |\n| ----------------------------------- | :--: | :--: | :------: | :------: | :-------: | :----------: |\n| **闭源模型**            |      |      |          |          |           |              |\n| **Gemini-1.5-Pro**                  |  -   |  -   |   90.8   |   67.7   |   2\u002F30    |     45.0     |\n| **Claude-3-Opus**                   |  -   |  -   |   95.0   |   
60.1   |   2\u002F30    |     40.6     |\n| **GPT-4-Turbo-1106**                |  -   |  -   |   91.4   |   64.3   |   1\u002F30    |     49.1     |\n| **GPT-4-Turbo-0409**                |  -   |  -   |   93.7   |   73.4   | **3\u002F30**  |     46.8     |\n| **GPT-4o-0513**                     |  -   |  -   | **95.8** | **76.6** |   2\u002F30    |   **53.2**   |\n| **开源模型**              |      |      |          |          |           |              |\n| **Llama3-Instruct**                 | 70B  | 70B  |   93.0   |   50.4   |   1\u002F30    |     27.9     |\n| **DeepSeek-Coder-V2-Lite-Instruct** | 16B  | 2.4B |   86.4   |   61.8   |   0\u002F30    |     44.4     |\n| **DeepSeek-Coder-V2-Instruct**      | 236B | 21B  | **94.9** | **75.7** | **4\u002F30**  |   **53.7**   |\n\n### 3.5 通用自然语言\n\n|      基准测试       | 领域  | DeepSeek-V2-Lite Chat | DeepSeek-Coder-V2-Lite Instruct | DeepSeek-V2 Chat | DeepSeek-Coder-V2 Instruct |\n| :------------------: | :-----: | :-------------------: | :-----------------------------: | :--------------: | :------------------------: |\n|       **BBH**        | 英语 |         48.1          |              61.2               |       79.7       |          **83.9**          |\n|       **MMLU**       | 英语 |         55.7          |              60.1               |       78.1       |          **79.2**          |\n|     **ARC-Easy**     | 英语 |         86.1          |              88.9               |     **98.1**     |            97.4            |\n|  **ARC-Challenge**   | 英语 |         73.4          |              77.4               |       92.3       |          **92.8**          |\n|     **TriviaQA**     | 英语 |         65.2          |              59.5               |     **86.7**     |            82.3            |\n| **NaturalQuestions** | 英语 |         35.5          |              30.8               |     **53.4**     |            47.5            |\n|     **AGIEval**      | 英语 |         42.8          |              28.7               |     
**61.4**     |             60             |\n|     **CLUEWSC**      | 中文 |         80.0          |              76.5               |     **89.9**     |            85.9            |\n|      **C-Eval**      | 中文 |         60.1          |              61.6               |       78.0       |          **79.4**          |\n|      **CMMLU**       | 中文 |         62.5          |              62.7               |     **81.6**     |            80.9            |\n|    **Arena-Hard**    |    -    |         11.4          |              38.1               |       41.6       |          **65.0**          |\n|  **AlpacaEval 2.0**  |    -    |         16.9          |              17.7               |     **38.9**     |            36.9            |\n|     **MT-Bench**     |    -    |         7.37          |              7.81               |     **8.97**     |            8.77            |\n|    **Alignbench**    |    -    |         6.02          |              6.83               |     **7.91**     |            7.84            |\n\n### 3.6 上下文窗口\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"80%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder-V2_readme_2d44e2ec176c.png\">\n\u003C\u002Fp>\n\n\n上图为“Needle In A Haystack”（NIAH）测试的评估结果：DeepSeek-Coder-V2 在最长 **128K** 的各上下文窗口长度下均表现良好。\n\n## 4. 聊天网站\n\n您可以在 DeepSeek 官方网站上与 DeepSeek-Coder-V2 进行对话：[chat.deepseek.com](https:\u002F\u002Fchat.deepseek.com\u002Fsign_in)\n\n## 5. API 平台\n我们还在 DeepSeek 平台提供与 OpenAI 兼容的 API：[platform.deepseek.com](https:\u002F\u002Fplatform.deepseek.com\u002F)，您可以按需付费，价格极具竞争力。\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"40%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder-V2_readme_3d346d5851f5.jpg\">\n\u003C\u002Fp>\n\n\n## 6. 
如何本地运行\n**在此，我们提供了一些使用 DeepSeek-Coder-V2-Lite 模型的示例。如果您想以 BF16 格式运行 DeepSeek-Coder-V2 进行推理，则需要 80GB*8 张 GPU。**\n\n### 使用 Hugging Face 的 Transformers 进行推理\n您可以直接使用 [Hugging Face 的 Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) 进行模型推理。\n\n#### 代码补全\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Base\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Base\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\ninput_text = \"#write a quick sort algorithm\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_length=128)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n```\n\n#### 代码插入\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Base\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Base\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\ninput_text = \"\"\"\u003C｜fim▁begin｜>def quick_sort(arr):\n    if len(arr) \u003C= 1:\n        return arr\n    pivot = arr[0]\n    left = []\n    right = []\n\u003C｜fim▁hole｜>\n        if arr[i] \u003C pivot:\n            left.append(arr[i])\n        else:\n            right.append(arr[i])\n    return quick_sort(left) + [pivot] + quick_sort(right)\u003C｜fim▁end｜>\"\"\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_length=128)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])\n```\n\n#### 聊天补全\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = 
AutoTokenizer.from_pretrained(\"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\nmessages=[\n    { 'role': 'user', 'content': \"write a quick sort algorithm in python.\"}\n]\ninputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors=\"pt\").to(model.device)\n# tokenizer.eos_token_id 是 \u003C｜end▁of▁sentence｜> 标记的 id\noutputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)\nprint(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))\n```\n\n\n\n完整的聊天模板可以在 Hugging Face 模型仓库中的 `tokenizer_config.json` 文件中找到。\n\n聊天模板示例如下：\n\n```bash\n\u003C｜begin▁of▁sentence｜>User: {user_message_1}\n\nAssistant: {assistant_message_1}\u003C｜end▁of▁sentence｜>User: {user_message_2}\n\nAssistant:\n```\n\n您还可以添加可选的系统消息：\n\n```bash\n\u003C｜begin▁of▁sentence｜>{system_message}\n\nUser: {user_message_1}\n\nAssistant: {assistant_message_1}\u003C｜end▁of▁sentence｜>User: {user_message_2}\n\nAssistant:\n```\n\n在最后一轮对话中，请注意“Assistant:”后面没有空格。如果添加空格，可能会导致 16B-Lite 模型出现以下问题：\n- 英文问题返回中文回答。\n- 回答包含乱码。\n- 回答过度重复。\n\nOllama 的旧版本曾存在此 bug（参见 https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2\u002Fissues\u002F12），但最新版本已修复。\n\n\n### 使用 SGLang 进行推理（推荐）\n[SGLang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang) 目前支持 MLA 优化、FP8 (W8A8)、FP8 KV 缓存以及 Torch Compile，在开源框架中提供了最佳的延迟和吞吐量。以下是启动兼容 OpenAI API 的服务器的一些示例命令：\n\n```bash\n# BF16，张量并行度 = 8\npython3 -m sglang.launch_server --model deepseek-ai\u002FDeepSeek-Coder-V2-Instruct --tp 8 --trust-remote-code\n\n# BF16，启用 torch.compile（编译可能需要几分钟）\npython3 -m sglang.launch_server --model deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct --trust-remote-code --enable-torch-compile\n\n# FP8，张量并行度 = 8，FP8 KV 缓存\npython3 
-m sglang.launch_server --model neuralmagic\u002FDeepSeek-Coder-V2-Instruct-FP8 --tp 8 --trust-remote-code --kv-cache-dtype fp8_e5m2\n```\n\n启动服务器后，您可以使用 OpenAI API 进行查询：\n\n```\nimport openai\nclient = openai.Client(\n    base_url=\"http:\u002F\u002F127.0.0.1:30000\u002Fv1\", api_key=\"EMPTY\")\n\n# 聊天补全\nresponse = client.chat.completions.create(\n    model=\"default\",\n    messages=[\n        {\"role\": \"system\", \"content\": \"You are a helpful AI assistant\"},\n        {\"role\": \"user\", \"content\": \"List 3 countries and their capitals.\"},\n    ],\n    temperature=0,\n    max_tokens=64,\n)\nprint(response)\n```\n\n\n### 使用 vLLM 进行推理（推荐）\n要使用 [vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) 进行模型推理，请将此 Pull Request 合并到您的 vLLM 代码库中：https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\u002Fpull\u002F4650。\n\n```python\nfrom transformers import AutoTokenizer\nfrom vllm import LLM, SamplingParams\n\nmax_model_len, tp_size = 8192, 1\nmodel_name = \"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nllm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)\nsampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])\n\nmessages_list = [\n    [{\"role\": \"user\", \"content\": \"Who are you?\"}],\n    [{\"role\": \"user\", \"content\": \"write a quick sort algorithm in python.\"}],\n    [{\"role\": \"user\", \"content\": \"Write a piece of quicksort code in C++.\"}],\n]\n\nprompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]\n\noutputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)\n\ngenerated_text = [output.outputs[0].text for output in outputs]\nprint(generated_text)\n```\n\n\n\n## 7. 
许可证\n\n本代码仓库采用 [MIT 许可证](LICENSE-CODE) 许可。DeepSeek-Coder-V2 Base\u002FInstruct 模型的使用受 [模型许可证](LICENSE-MODEL) 约束。DeepSeek-Coder-V2 系列（包括 Base 和 Instruct）支持商业用途。\n\n## 8. 引用\n```latex\n@article{zhu2024deepseek,\n  title={DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence},\n  author={Zhu, Qihao and Guo, Daya and Shao, Zhihong and Yang, Dejian and Wang, Peiyi and Xu, Runxin and Wu, Y and Li, Yukun and Gao, Huazuo and Ma, Shirong and others},\n  journal={arXiv preprint arXiv:2406.11931},\n  year={2024}\n}\n```\n\n## 9. 联系方式\n如果您有任何疑问，请提交 issue 或联系我们的邮箱 [service@deepseek.com](service@deepseek.com)。","# DeepSeek-Coder-V2 快速上手指南\n\nDeepSeek-Coder-V2 是一款开源的混合专家（MoE）代码语言模型，在代码生成、补全及数学推理任务上表现卓越，支持 338 种编程语言和 128K 上下文长度。本指南将帮助您快速部署并使用该模型。\n\n## 1. 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux (推荐 Ubuntu 20.04+) 或 macOS。\n*   **Python 版本**: 3.8 或更高版本。\n*   **GPU 显存**:\n    *   **DeepSeek-Coder-V2-Lite (16B\u002F2.4B 激活)**: 建议至少 24GB 显存（使用量化后可更低）。\n    *   **DeepSeek-Coder-V2 (236B\u002F21B 激活)**: 需要多卡并行或高显存服务器（建议使用 vLLM 或框架原生分布式推理）。\n*   **前置依赖**:\n    *   `pip` (包管理工具)\n    *   `git`\n    *   `cuda` (如需 GPU 加速，建议版本 11.8 或 12.1+)\n\n## 2. 安装步骤\n\n推荐使用 `pip` 安装必要的依赖库。为了获得最佳推理速度，建议安装 `vllm` 或 `transformers`。\n\n### 基础依赖安装\n\n```bash\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\npip install transformers accelerate sentencepiece protobuf\n```\n\n### 可选：高性能推理引擎 (推荐)\n\n对于生产环境或大模型推理，强烈建议安装 `vllm` 以获得更高的吞吐量：\n\n```bash\npip install vllm\n```\n\n> **提示**：国内用户若下载缓慢，可配置清华源或阿里源加速：\n> `pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple \u003Cpackage_name>`\n\n## 3. 基本使用\n\n### 方式一：使用 Hugging Face Transformers (最简单)\n\n此方法适合快速测试和轻量级应用。以下示例以 **DeepSeek-Coder-V2-Lite-Instruct** 为例。\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\n\n# 1. 
加载模型和分词器\n# 模型名称可根据需求替换为 deepseek-ai\u002FDeepSeek-Coder-V2-Instruct (236B)\nmodel_name = \"deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct\"\n\ntokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_name, \n    torch_dtype=torch.bfloat16, \n    device_map=\"auto\",\n    trust_remote_code=True\n)\n\n# 2. 构建对话消息\n# 使用分词器自带的聊天模板生成提示，避免手写特殊标记出错\nmessages = [\n    {\"role\": \"user\", \"content\": \"Write a python function for fibonacci.\"}\n]\n\n# 3. 生成代码\ninputs = tokenizer.apply_chat_template(\n    messages, add_generation_prompt=True, return_tensors=\"pt\"\n).to(model.device)\noutputs = model.generate(\n    inputs, \n    max_new_tokens=512, \n    do_sample=False, \n    eos_token_id=tokenizer.eos_token_id\n)\n\n# 4. 仅解码新生成的部分并打印结果\nresult = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)\nprint(result)\n```\n\n### 方式二：使用 vLLM 进行高效推理\n\n如果您需要处理高并发请求或使用完整的 236B 模型，请使用 vLLM 启动服务。\n\n**启动 API 服务：**\n\n```bash\n# Lite 版本示例\npython -m vllm.entrypoints.api_server \\\n    --model deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct \\\n    --trust-remote-code \\\n    --port 8000\n\n# 完整版本示例 (需多卡环境)\n# python -m vllm.entrypoints.api_server \\\n#     --model deepseek-ai\u002FDeepSeek-Coder-V2-Instruct \\\n#     --trust-remote-code \\\n#     --tensor-parallel-size 8 \\\n#     --port 8000\n```\n\n**调用示例 (curl)：**\n\n```bash\n# 提示需遵循模型的聊天模板：“User: ...”，空两行后接“Assistant:”，且“Assistant:”后不加空格\ncurl http:\u002F\u002Flocalhost:8000\u002Fgenerate \\\n    -d '{\n        \"prompt\": \"You are a coding assistant.\\n\\nUser: Sort this list in Python: [3, 1, 4]\\n\\nAssistant:\",\n        \"max_tokens\": 200,\n        \"temperature\": 0\n    }'\n```\n\n### 模型下载说明\n\n所有模型权重均托管在 Hugging Face。您可以直接通过代码自动下载，或手动访问以下地址：\n\n*   **Lite 版 (16B)**: [DeepSeek-Coder-V2-Lite-Instruct](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct)\n*   **完整版 
(236B)**: [DeepSeek-Coder-V2-Instruct](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2-Instruct)\n\n> **注意**：使用前请仔细阅读并同意对应的 [代码许可证 (MIT)](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002FLICENSE-CODE) 和 [模型协议](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-V2\u002Fblob\u002Fmain\u002FLICENSE-MODEL)。","某金融科技公司后端团队正紧急重构一个遗留的单体支付系统，需将其拆分为微服务并迁移至云原生架构，同时必须兼容十余种老旧编程语言编写的核心算法模块。\n\n### 没有 DeepSeek-Coder-V2 时\n- **多语言支持受限**：团队被迫组合使用多个专用工具来处理 Python、Go 及冷门的 COBOL 代码，上下文切换频繁且容易出错。\n- **长上下文理解困难**：面对数十万行的遗留代码库，现有模型因 16K 上下文限制无法一次性读取完整逻辑，导致重构时经常遗漏关键依赖或产生幻觉。\n- **复杂逻辑推理不足**：在处理涉及高精度数学计算的利息算法时，通用模型常给出逻辑错误的代码片段，需要资深工程师花费大量时间人工审查和修正。\n- **闭源模型成本高**：若追求同等智能水平需调用昂贵的闭源 API，且敏感金融代码上传至第三方服务器存在合规与数据泄露风险。\n\n### 使用 DeepSeek-Coder-V2 后\n- **全栈语言统一处理**：凭借对 338 种编程语言的广泛支持，DeepSeek-Coder-V2 能流畅理解并转换从现代 Go 到古老 Fortran 的所有模块，实现单一工作流覆盖。\n- **全景代码分析**：利用 128K 超长上下文窗口，DeepSeek-Coder-V2 可一次性摄入整个服务模块的代码，精准梳理调用链路，确保重构后的逻辑完整性。\n- **专家级推理能力**：基于 MoE 架构增强的数学与代码推理能力，DeepSeek-Coder-V2 能直接生成经过验证的高精度算法代码，大幅减少人工调试时间。\n- **安全自主可控**：团队可本地部署该开源模型，在完全不外传代码的前提下享受媲美 GPT-4 Turbo 的智能辅助，完美满足金融级安全合规要求。\n\nDeepSeek-Coder-V2 通过突破闭源模型的性能壁垒，让企业在保障数据安全的同时，以开源成本实现了全语言、长上下文的顶级代码智能重构。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder-V2_249ac926.png","deepseek-ai","DeepSeek","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fdeepseek-ai_04503588.png","",null,"service@deepseek.com","https:\u002F\u002Fwww.deepseek.com\u002F","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai",6588,1060,"2026-04-05T08:28:41","MIT",4,"未说明（模型基于 DeepSeekMoE 架构，参数量巨大：Lite 版总参数 16B\u002F激活 2.4B，完整版总参数 236B\u002F激活 21B，通常推理需要高性能多卡 GPU 集群或量化处理）","未说明",{"notes":91,"python":89,"dependencies":92},"README 提供的片段主要介绍了模型性能、参数量（包含 Base 和 Instruct 版本）、支持的编程语言数量（338 种）以及上下文长度（128K）。文中未列出具体的运行环境配置（如操作系统、Python 版本、依赖库列表等）。用户需前往提供的 Hugging Face 
链接查看具体的模型卡片以获取详细的部署要求和依赖信息。",[],[15],"2026-03-27T02:49:30.150509","2026-04-06T07:13:12.643073",[97,102,107,112,117,122],{"id":98,"question_zh":99,"answer_zh":100,"source_url":101},9732,"为什么模型总是用中文回复，即使系统提示明确要求只使用英文？","这通常是因为使用的推理框架（如 llama.cpp 或 Ollama）中的对话模板（template）配置不正确。llama.cpp 目前未硬编码 deepseek-coder-v2 的模板，默认回退到 chatml 模板导致问题。Ollama 的默认模板在 'Assistant:' 后有多余空格且缺少结束符。\n\n解决方案是手动修正模板。对于 Ollama，请使用以下正确的 Modelfile 配置：\n\nTEMPLATE \"\"\"{{ if .System }}{{ .System }}\n\n{{ end }}{{ if .Prompt }}User: {{ .Prompt }}\n\n{{ end }}Assistant:{{ .Response }}\u003C｜end▁of▁sentence｜>\"\"\"\n\nPARAMETER stop \"User:\"\nPARAMETER stop \"Assistant:\"\nPARAMETER stop \"\u003C｜end▁of▁sentence｜>\"\n\n应用此模板后，模型将严格遵循指令输出英文。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2\u002Fissues\u002F12",{"id":103,"question_zh":104,"answer_zh":105,"source_url":106},9733,"在使用 DeepSpeed 进行 bf16 混合精度微调时遇到 dtype 不匹配错误（float != BFloat16）怎么办？","这是一个已知的类型处理问题，官方已经修复。请更新 `modeling_deepseek.py` 文件以解决该错误。\n\n您可以参考 HuggingFace 上的最新提交来获取修复后的代码：\nhttps:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2-Lite-Base\u002Fcommit\u002Fe5e79b92c85f8f9182ec006b575483227201fd5e\n\n该问题通常源于 Router 函数中使用 float32 而其他部分使用 bfloat16，更新后的代码正确处理了这种混合精度下的类型转换。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2\u002Fissues\u002F6",{"id":108,"question_zh":109,"answer_zh":110,"source_url":111},9734,"如何在 8 张 GPU 上使用 vLLM 加载 DeepSeek-Coder-V2-Instruct 模型以避免显存溢出（OOM）？","加载失败通常是因为张量并行度（tensor-parallel-size）设置不当。默认示例可能将其设置为 1，这意味着尝试将所有张量放入单张 GPU，从而导致 OOM。\n\n在 8 张 GPU（如 A100 或 A800）上运行时，必须显式设置张量并行度为 8。请在启动命令中添加参数：\n\n--tensor-parallel-size 8\n\n此外，请确保使用较新版本的 vLLM（如 0.5.3.post1 或更高版本），旧版本可能存在加载问题。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2\u002Fissues\u002F30",{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},9735,"运行 DeepSeek-Coder-V2-Lite 模型需要什么样的 GPU 和显存配置？","对于 **DeepSeek-Coder-V2-Lite** 模型：\n1. 
**BF16 格式**：单张 40GB 显存的 GPU（如 A100 40G）即可运行。\n2. **消费级显卡（如 3x24GB 4090）**：如果显存总和足够但单卡不足，需要启用张量并行（TP 2）或流水线并行（PP 2）。\n3. **低显存方案**：如果硬件资源有限，建议使用 Ollama 运行量化版本（Quantized version）的模型。\n\n注意：完整版 DeepSeek-Coder-V2 (236B) BF16 格式则需要 80GB * 8 GPUs。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2\u002Fissues\u002F11",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},9736,"DeepSeek API 上提供的 Coder V2 模型具体是哪个版本？","DeepSeek 平台 API 上提供的模型是 **DeepSeek-Coder-V2 236B** 版本。这是该系列的大参数版本，具有强大的代码生成和理解能力。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2\u002Fissues\u002F2",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},9737,"在哪里可以找到 DeepSeek-Coder-V2 的源代码？感觉没有完全开源。","本项目主要开源的是模型的权重（Weights）以及用于推理和微调的代码脚本（位于 `deepseek_v2` 目录下的 `modeling_deepseek.py` 等文件）。\n\n如果您指的是完整的训练代码库或底层算子实现，部分核心组件可能未完全公开，但提供的推理代码足以让用户下载权重并在本地运行模型。请确保您克隆了完整的仓库并查看了 `deepseek_v2` 文件夹，那里包含了加载和运行模型所需的核心 Python 代码。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder-V2\u002Fissues\u002F65",[]]