[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-deepseek-ai--DeepSeek-Coder":3,"tool-deepseek-ai--DeepSeek-Coder":65},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160411,2,"2026-04-18T23:33:24",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,3,"2026-04-06T11:19:32",[15,26,14,13],"图像",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":10,"last_commit_at":33,"category_tags":34,"status":16},8553,"spec-kit","github\u002Fspec-kit","Spec Kit 是一款专为提升软件开发效率而设计的开源工具包，旨在帮助团队快速落地“规格驱动开发”（Spec-Driven Development）模式。传统开发中，需求文档往往与代码实现脱节，导致沟通成本高且结果不可控；而 Spec Kit 通过将规格说明书转化为可执行的指令，让 AI 直接依据明确的业务场景生成高质量代码，从而减少从零开始的随意编码，确保产出结果的可预测性。\n\n该工具特别适合希望利用 AI 辅助编程的开发者、技术负责人及初创团队。无论是启动全新项目还是在现有工程中引入规范化流程，用户只需通过简单的命令行操作，即可初始化项目并集成主流的 AI 编程助手。其核心技术亮点在于“规格即代码”的理念，支持社区扩展与预设模板，允许用户根据特定技术栈定制开发流程。此外，Spec Kit 强调官方维护的安全性，提供稳定的版本管理，帮助开发者在享受 AI 红利的同时，依然牢牢掌握架构设计的主动权，真正实现从“凭感觉写代码”到“按规格建系统”的转变。",88749,"2026-04-17T09:48:14",[15,26,14,13],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":10,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85267,"2026-04-18T11:00:28",[26,51,52,53,14,54,15,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":62,"last_commit_at":63,"category_tags":64,"status":16},5784,"funNLP","fighting41love\u002FfunNLP","funNLP 是一个专为中文自然语言处理（NLP）打造的超级资源库，被誉为“NLP 民工的乐园”。它并非单一的软件工具，而是一个汇集了海量开源项目、数据集、预训练模型和实用代码的综合性平台。\n\n面对中文 NLP 领域资源分散、入门门槛高以及特定场景数据匮乏的痛点，funNLP 提供了“一站式”解决方案。这里不仅涵盖了分词、命名实体识别、情感分析、文本摘要等基础任务的标准工具，还独特地收录了丰富的垂直领域资源，如法律、医疗、金融行业的专用词库与数据集，甚至包含古诗词生成、歌词创作等趣味应用。其核心亮点在于极高的全面性与实用性，从基础的字典词典到前沿的 BERT、GPT-2 模型代码，再到高质量的标注数据和竞赛方案，应有尽有。\n\n无论是刚刚踏入 NLP 领域的学生、需要快速验证想法的算法工程师，还是从事人工智能研究的学者，都能在这里找到急需的“武器弹药”。对于开发者而言，它能大幅减少寻找数据和复现模型的时间；对于研究者，它提供了丰富的基准测试资源和前沿技术参考。funNLP 以开放共享的精神，极大地降低了中文自然语言处理的开发与研究成本，是中文 AI 社区不可或缺的宝藏仓库。",79857,1,"2026-04-08T20:11:31",[15,51,54],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":71,"readme_en":72,"readme_zh":73,"quickstart_zh":74,"use_case_zh":75,"hero_image_url":76,"owner_login":77,"owner_name":78,"owner_avatar_url":79,"owner_bio":80,"owner_company":81,"owner_location":81,"owner_email":82,"owner_twitter":81,"owner_website":83,"owner_url":84,"languages":85,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":23,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":106,"github_topics":81,"view_count":10,"oss_zip_url":81,"oss_zip_packed_at":81,"status":16,"created_at":107,"updated_at":108,"faqs":109,"releases":139},9382,"deepseek-ai\u002FDeepSeek-Coder","DeepSeek-Coder","DeepSeek Coder: Let the Code Write Itself","DeepSeek-Coder 是一款专为编程任务打造的高性能开源代码大模型系列，旨在让代码编写更加智能高效。它通过在 2 万亿 token 的高质量数据上进行从零训练（其中 87% 为代码，13% 为中英文自然语言），能够理解并生成超过 80 种编程语言，从常见的 Python、Java 到专业的 CUDA、Solidity 均能胜任。\n\n该工具主要解决了开发者在代码补全、项目级上下文理解及代码填空等场景中的痛点。凭借 16K 的超长上下文窗口和独特的“填空”训练任务，DeepSeek-Coder 不仅能单行补全，更能基于整个项目文件进行精准的代码生成与修复，显著提升了开发效率。在 HumanEval、MBPP 等多个权威基准测试中，其表现超越了同类开源模型，甚至部分版本可媲美 GPT-3.5 Turbo。\n\nDeepSeek-Coder 非常适合软件工程师、算法研究人员以及需要辅助编程的学生使用。无论是希望在本地部署轻量级模型（提供 1B 至 33B 多种尺寸可选）以保护代码隐私的企业团队，还是追求极致性能的资深开发者，都能找到合适的版本。其强大的多语言支持和高性价比的推理能力，使其成为构建智能编程助手或进行代码相关研究的理想选择。","DeepSeek-Coder 是一款专为编程任务打造的高性能开源代码大模型系列，旨在让代码编写更加智能高效。它通过在 2 万亿 token 的高质量数据上进行从零训练（其中 87% 为代码，13% 为中英文自然语言），能够理解并生成超过 80 种编程语言，从常见的 Python、Java 到专业的 CUDA、Solidity 均能胜任。\n\n该工具主要解决了开发者在代码补全、项目级上下文理解及代码填空等场景中的痛点。凭借 16K 的超长上下文窗口和独特的“填空”训练任务，DeepSeek-Coder 不仅能单行补全，更能基于整个项目文件进行精准的代码生成与修复，显著提升了开发效率。在 HumanEval、MBPP 等多个权威基准测试中，其表现超越了同类开源模型，甚至部分版本可媲美 GPT-3.5 Turbo。\n\nDeepSeek-Coder 非常适合软件工程师、算法研究人员以及需要辅助编程的学生使用。无论是希望在本地部署轻量级模型（提供 1B 至 33B 多种尺寸可选）以保护代码隐私的企业团队，还是追求极致性能的资深开发者，都能找到合适的版本。其强大的多语言支持和高性价比的推理能力，使其成为构建智能编程助手或进行代码相关研究的理想选择。","\u003Cp align=\"center\">\n\u003Cimg width=\"1000px\" alt=\"DeepSeek Coder\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_08bce2316927.png\">\n\u003C\u002Fp>\n\u003Cp align=\"center\">\u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\">[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_c8241abaf847.png\" width=\"20px\"> Homepage]\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fchat.deepseek.com\u002F\">[🤖 Chat with DeepSeek Coder]\u003C\u002Fa> | \u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\">[🤗 Models Download]\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTc7c45Zzu5\">[Discord]\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fguoday\u002Fassert\u002Fblob\u002Fmain\u002FQR.png?raw=true\">[WeChat (微信)]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2401.14196\">\u003Cb>Paper Link\u003C\u002Fb>👁️\u003C\u002Fa>\n\u003C\u002Fp>\n\u003Chr>\n\n\n### 1. Introduction of DeepSeek Coder\n\nDeepSeek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on project-level code corpus by employing a window size of 16K and an extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, DeepSeek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_b9a43d8cbe71.png\" alt=\"result\" width=\"70%\">\n\u003C\u002Fp>\n\n- **Massive Training Data**: Trained from scratch on 2T tokens, including 87% code and 13% linguistic data in both English and Chinese languages.\n\n- **Highly Flexible & Scalable**: Offered in model sizes of 1B, 5.7B, 6.7B and 33B, enabling users to choose the setup most suitable for their requirements.\n\n- **Superior Model Performance**: State-of-the-art performance among publicly available code models on HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks.\n\n- **Advanced Code Completion Capabilities**: A window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks.\n\n#### Supported Programming Languages\n`['ada', 'agda', 'alloy', 'antlr', 'applescript', 'assembly', 'augeas', 'awk', 'batchfile', 'bluespec', 'c', 'c-sharp', 'clojure', 'cmake', 'coffeescript', 'common-lisp', 'cpp', 'css', 'cuda', 'dart', 'dockerfile', 'elixir', 'elm', 'emacs-lisp', 'erlang', 'f-sharp', 'fortran', 'glsl', 'go', 'groovy', 'haskell', 'html', 'idris', 'isabelle', 'java', 'java-server-pages', 'javascript', 'json', 'julia', 'jupyter-notebook', 'kotlin', 'lean', 'literate-agda', 'literate-coffeescript', 'literate-haskell', 'lua', 'makefile', 'maple', 'markdown', 'mathematica', 'matlab', 'ocaml', 'pascal', 'perl', 'php', 'powershell', 'prolog', 'protocol-buffer', 'python', 'r', 'racket', 'restructuredtext', 'rmarkdown', 'ruby', 'rust', 'sas', 'scala', 'scheme', 'shell', 'smalltalk', 'solidity', 'sparql', 'sql', 'stan', 'standard-ml', 'stata', 'systemverilog', 'tcl', 'tcsh', 'tex', 'thrift', 'typescript', 'verilog', 'vhdl', 'visual-basic', 'xslt', 'yacc', 'yaml', 'zig']`\n\n### 2. Evaluation Results\nWe evaluate DeepSeek Coder on various coding-related benchmarks.\nOnly `pass@1` results on HumanEval (Python and Multilingual), MBPP, and DS-1000 are reported here:\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_4c419f51a76d.png\" alt=\"table\" width=\"70%\">\n\u003C\u002Fp>\n\n\nThe result shows that DeepSeek-Coder-Base-33B significantly outperforms existing open-source code LLMs. 
Compared with CodeLlama-34B, it leads by 7.9%, 9.3%, 10.8% and 5.9% on HumanEval Python, HumanEval Multilingual, MBPP and DS-1000, respectively.\nSurprisingly, our DeepSeek-Coder-Base-7B reaches the performance of CodeLlama-34B.\nThe DeepSeek-Coder-Instruct-33B model after instruction tuning outperforms GPT-3.5-Turbo on HumanEval and achieves comparable results to GPT-3.5-Turbo on MBPP.\n\nMore evaluation details can be found in the [Detailed Evaluation](#6-detailed-evaluation-results).\n\n\n### 3. Procedure of Data Creation and Model Training\n\n#### Data Creation\n\n- Step 1: Collect code data from GitHub and apply the same filtering rules as [StarCoder Data](https:\u002F\u002Fgithub.com\u002Fbigcode-project\u002Fbigcode-dataset) to filter the data.\n- Step 2: Parse the dependencies of files within the same repository and rearrange the file positions based on those dependencies.\n- Step 3: Concatenate dependent files to form a single example and apply repo-level MinHash deduplication.\n- Step 4: Filter out remaining low-quality code, such as code with syntax errors or poor readability.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_def754f0f3c2.png\" alt=\"data_creation\" width=\"100%\">\n\n#### Model Training\n\n- Step 1: Initial pre-training on a dataset consisting of 87% code, 10% code-related language (GitHub Markdown and StackExchange), and 3% non-code-related Chinese language. Models are pre-trained on 1.8T tokens with a 4K window size in this step.\n- Step 2: Further pre-training with an extended 16K window size on an additional 200B tokens, resulting in foundational models (**DeepSeek-Coder-Base**).\n- Step 3: Instruction fine-tuning on 2B tokens of instruction data, resulting in instruction-tuned models (**DeepSeek-Coder-Instruct**).\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_38995ed64995.png\" alt=\"model_pretraining\" width=\"100%\">\n\n\n### 4. How to Use\nBefore proceeding, you'll need to install the necessary dependencies. You can do this by running the following command:\n```\npip install -r requirements.txt\n```\nA demo is also available on the [🤗 Hugging Face Space](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdeepseek-ai\u002Fdeepseek-coder-33b-instruct), and you can run the demo locally using `app.py` in the [demo](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Fdeepseek-coder\u002Ftree\u002Fmain\u002Fdemo) folder.  
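\n\nIf your GPU cannot hold the bfloat16 weights of the larger checkpoints, they can also be loaded with 4-bit quantization through the standard `transformers` + `bitsandbytes` API. This is a minimal sketch, not an officially documented configuration, and some quality loss relative to full precision should be expected:\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n\n# 4-bit NF4 quantization roughly quarters the VRAM needed for the weights\nbnb_config = BitsAndBytesConfig(\n    load_in_4bit=True,\n    bnb_4bit_quant_type=\"nf4\",\n    bnb_4bit_compute_dtype=torch.bfloat16,\n)\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\n    \"deepseek-ai\u002Fdeepseek-coder-6.7b-base\",\n    trust_remote_code=True,\n    quantization_config=bnb_config,\n    device_map=\"auto\",  # requires the accelerate and bitsandbytes packages\n)\n```\n\n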
(Thanks to all the HF team for their support)\n\nHere are some examples of how to use our model.\n\n#### 1) Code Completion\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\ninput_text = \"#write a quick sort algorithm\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_length=128)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n```\nThis code will output the following result:\n```\ndef quick_sort(arr):\n    if len(arr) \u003C= 1:\n        return arr\n    pivot = arr[0]\n    left = []\n    right = []\n    for i in range(1, len(arr)):\n        if arr[i] \u003C pivot:\n            left.append(arr[i])\n        else:\n            right.append(arr[i])\n    return quick_sort(left) + [pivot] + quick_sort(right)\n```\n\n#### 2) Code Insertion\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\ninput_text = \"\"\"\u003C｜fim▁begin｜>def quick_sort(arr):\n    if len(arr) \u003C= 1:\n        return arr\n    pivot = arr[0]\n    left = []\n    right = []\n\u003C｜fim▁hole｜>\n        if arr[i] \u003C pivot:\n            left.append(arr[i])\n        else:\n            right.append(arr[i])\n    return quick_sort(left) + [pivot] + quick_sort(right)\u003C｜fim▁end｜>\"\"\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_length=128)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])\n```\nThis code will output the following result:\n```\n   for i in range(1, len(arr)):\n```\n\n#### 3) Chat Model Inference\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-instruct\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-instruct\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\nmessages=[\n    { 'role': 'user', 'content': \"write a quick sort algorithm in python.\"}\n]\ninputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors=\"pt\").to(model.device)\n# tokenizer.eos_token_id is the id of \u003C|EOT|> token\noutputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)\nprint(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))\n```\nThis code will output the following result:\n```\nSure, here is a simple implementation of the Quick Sort algorithm in Python:\n\ndef quick_sort(arr):\n    if len(arr) \u003C= 1:\n        return arr\n    else:\n        pivot = arr[0]\n        less_than_pivot = [x for x in arr[1:] if x \u003C= pivot]\n        greater_than_pivot = [x for x in arr[1:] if x > pivot]\n        return quick_sort(less_than_pivot) + [pivot] + quick_sort(greater_than_pivot)\n\n# Test the function\narr = [10, 7, 8, 9, 1, 
5]\nprint(\"Original array:\", arr)\nprint(\"Sorted array:\", quick_sort(arr))\n\nThis code works by selecting a 'pivot' element from the array and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. The pivot element is then in its final position. The process is then repeated for the sub-arrays.\n```\n\nIf you don't want to use the provided API `apply_chat_template` which loads the template from `tokenizer_config.json`, you can use the following template to chat with our model. Replace the `['content']` with your instructions and the model's previous (if any) responses, then the model will generate the response to the currently given instruction.\n```\nYou are an AI programming assistant, utilizing the DeepSeek Coder model, developed by DeepSeek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.\n### Instruction:\n['content']\n### Response:\n['content']\n\u003C|EOT|>\n### Instruction:\n['content']\n### Response:\n\n```\n\n#### 4) Repository Level Code Completion\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\n\ninput_text = \"\"\"#utils.py\nimport torch\nfrom sklearn import datasets\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.metrics import accuracy_score\n\ndef load_data():\n    iris = datasets.load_iris()\n    X = iris.data\n    y = iris.target\n\n    # Standardize the data\n    scaler = StandardScaler()\n    X = scaler.fit_transform(X)\n\n    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n\n    # Convert numpy data to PyTorch tensors\n    X_train = torch.tensor(X_train, dtype=torch.float32)\n    X_test = torch.tensor(X_test, dtype=torch.float32)\n    y_train = torch.tensor(y_train, dtype=torch.int64)\n    y_test = torch.tensor(y_test, dtype=torch.int64)\n\n    return X_train, X_test, y_train, y_test\n\ndef evaluate_predictions(y_test, y_pred):\n    return accuracy_score(y_test, y_pred)\n\n\n# model.py\nimport torch\nimport torch.nn as nn\nimport torch.optim as optim\nfrom torch.utils.data import DataLoader, TensorDataset\n\nclass IrisClassifier(nn.Module):\n    def __init__(self):\n        super(IrisClassifier, self).__init__()\n        self.fc = nn.Sequential(\n            nn.Linear(4, 16),\n            nn.ReLU(),\n            nn.Linear(16, 3)\n        )\n\n    def forward(self, x):\n        return self.fc(x)\n\n    def train_model(self, X_train, y_train, epochs, lr, batch_size):\n        criterion = nn.CrossEntropyLoss()\n        optimizer = optim.Adam(self.parameters(), lr=lr)\n\n        # Create DataLoader for batches\n        dataset = TensorDataset(X_train, y_train)\n        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)\n\n        for epoch in range(epochs):\n            for batch_X, batch_y in dataloader:\n                optimizer.zero_grad()\n                outputs = self(batch_X)\n                loss = criterion(outputs, batch_y)\n                loss.backward()\n                optimizer.step()\n\n    def 
predict(self, X_test):\n        with torch.no_grad():\n            outputs = self(X_test)\n            _, predicted = outputs.max(1)\n        return predicted.numpy()\n\n\n# main.py\nfrom utils import load_data, evaluate_predictions\nfrom model import IrisClassifier as Classifier\n\ndef main():\n    # Model training and evaluation\n\"\"\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_new_tokens=140)\nprint(tokenizer.decode(outputs[0]))\n```\n\n---\nIn the following scenario, the DeepSeek-Coder-6.7B model effectively calls the class **IrisClassifier** and its member function from the `model.py` file, and also utilizes functions from the `utils.py` file, to correctly complete the **main** function in the `main.py` file for model training and evaluation.\n\n![Completion GIF](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_51592700b774.gif)\n\n### 5. How to Fine-tune DeepSeek-Coder\n\nWe provide the script `finetune\u002Ffinetune_deepseekcoder.py` for users to fine-tune our models on downstream tasks.\n\nThe script supports training with [DeepSpeed](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDeepSpeed). You need to install the required packages first:\n\n```bash\npip install -r finetune\u002Frequirements.txt\n```\n\nPlease follow [Sample Dataset Format](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnickrosh\u002FEvol-Instruct-Code-80k-v1) to prepare your training data.\nEach line is a JSON-serialized string with two required fields, `instruction` and `output`.\n\nAfter data preparation, you can use the sample shell script to fine-tune `deepseek-ai\u002Fdeepseek-coder-6.7b-instruct`.\nRemember to specify `DATA_PATH` and `OUTPUT_PATH`, and choose appropriate hyper-parameters (e.g., `learning_rate`, `per_device_train_batch_size`) for your scenario.\n\n```bash\nDATA_PATH=\"\u003Cyour_data_path>\"\nOUTPUT_PATH=\"\u003Cyour_output_path>\"\nMODEL=\"deepseek-ai\u002Fdeepseek-coder-6.7b-instruct\"\n\ncd finetune && deepspeed finetune_deepseekcoder.py \\\n    --model_name_or_path $MODEL \\\n    --data_path $DATA_PATH \\\n    --output_dir $OUTPUT_PATH \\\n    --num_train_epochs 3 \\\n    --model_max_length 1024 \\\n    --per_device_train_batch_size 16 \\\n    --per_device_eval_batch_size 1 \\\n    --gradient_accumulation_steps 4 \\\n    --evaluation_strategy \"no\" \\\n    --save_strategy \"steps\" \\\n    --save_steps 100 \\\n    --save_total_limit 100 \\\n    --learning_rate 2e-5 \\\n    --warmup_steps 10 \\\n    --logging_steps 1 \\\n    --lr_scheduler_type \"cosine\" \\\n    --gradient_checkpointing True \\\n    --report_to \"tensorboard\" \\\n    --deepspeed configs\u002Fds_config_zero3.json \\\n    --bf16 True\n```\n\n### 6. 
Detailed Evaluation Results\n\nThe reproducible code for the following evaluation results can be found in the [Evaluation](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Fdeepseek-coder\u002Ftree\u002Fmain\u002FEvaluation) directory.\n#### 1) Multilingual HumanEval Benchmark\n![HumanEval](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_842f24f79b18.png)\n\n#### 2) MBPP Benchmark\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_2ab49937fbb1.png\" alt=\"MBPP\" width=\"40%\">\n\n#### 3) DS-1000 Benchmark\n![DS-1000](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_0ee99a272b15.png)\n\n#### 4) Program-Aid Math Reasoning Benchmark\n![Math](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_a42e08c306eb.png)\n\n### Inference with vLLM\n\nYou can also employ [vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) for high-throughput inference.\n\n**Text Completion**\n\n```python\nfrom vllm import LLM, SamplingParams\n\ntp_size = 4 # Tensor Parallelism\nsampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)\nmodel_name = \"deepseek-ai\u002Fdeepseek-coder-6.7b-base\"\nllm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)\n\nprompts = [\n    \"If everyone in a country loves one another,\",\n    \"The research should also focus on the technologies\",\n    \"To determine if the label is correct, we need to\"\n]\noutputs = llm.generate(prompts, sampling_params)\n\ngenerated_text = [output.outputs[0].text for output in outputs]\nprint(generated_text)\n```\n\n**Chat Completion**\n\n```python\nfrom transformers import AutoTokenizer\nfrom vllm import LLM, SamplingParams\n\ntp_size = 4 # Tensor Parallelism\nsampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)\nmodel_name = \"deepseek-ai\u002Fdeepseek-coder-6.7b-instruct\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nllm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)\n\nmessages_list = [\n    [{\"role\": \"user\", \"content\": \"Who are you?\"}],\n    [{\"role\": \"user\", \"content\": \"What can you do?\"}],\n    [{\"role\": \"user\", \"content\": \"Explain Transformer briefly.\"}],\n]\nprompts = [tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) for messages in messages_list]\n\nsampling_params.stop = [tokenizer.eos_token]\noutputs = llm.generate(prompts, sampling_params)\n\ngenerated_text = [output.outputs[0].text for output in outputs]\nprint(generated_text)\n```\n\n### 7. Q&A\n\n#### Could You Provide the tokenizer.model File for Model Quantization?\n\nDeepSeek Coder utilizes the [HuggingFace Tokenizer](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftokenizers\u002Findex) to implement the Bytelevel-BPE algorithm, with specially designed pre-tokenizers to ensure optimal performance. Currently, there is no direct way to convert the tokenizer into a SentencePiece tokenizer. 
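\n\nIf a downstream tool only needs the tokenizer files rather than a literal `tokenizer.model`, the fast-tokenizer artifacts (`tokenizer.json` plus its configs) can be re-exported with the standard `transformers` API. A small sketch, assuming the model repository is reachable:\n\n```python\nfrom transformers import AutoTokenizer\n\n# DeepSeek Coder ships a HuggingFace fast tokenizer (tokenizer.json), not a SentencePiece model\ntok = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True)\ntok.save_pretrained(\".\u002Fdeepseek-coder-tokenizer\")  # writes tokenizer.json, tokenizer_config.json, special_tokens_map.json\n```\n\n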
We are contributing to open-source quantization methods to facilitate the use of the HuggingFace Tokenizer.\n\n##### GGUF (llama.cpp)\n\nWe have submitted a [PR](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp\u002Fpull\u002F4070) to the popular quantization repository [llama.cpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp) to fully support all HuggingFace pre-tokenizers, including ours.\n\nWhile waiting for the PR to be merged, you can generate your GGUF model using the following steps:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FDOGEwbx\u002Fllama.cpp.git\ncd llama.cpp\ngit checkout regex_gpt2_preprocess\n# set up the environment according to the README\nmake\npython3 -m pip install -r requirements.txt\n# generate the GGUF model\npython convert-hf-to-gguf.py \u003CMODEL_PATH> --outfile \u003CGGUF_PATH> --model-name deepseekcoder\n# use q4_0 quantization as an example\n.\u002Fquantize \u003CGGUF_PATH> \u003COUTPUT_PATH> q4_0\n.\u002Fmain -m \u003COUTPUT_PATH> -n 128 -p \u003CPROMPT>\n```\n##### GPTQ (exllamav2)\n\n`UPDATE:` [exllamav2](https:\u002F\u002Fgithub.com\u002Fturboderp\u002Fexllamav2) now supports the HuggingFace Tokenizer. Please pull the latest version and try it out.\n\nRemember to set RoPE scaling to 4 for correct output; more discussion can be found in this [PR](https:\u002F\u002Fgithub.com\u002Fturboderp\u002Fexllamav2\u002Fpull\u002F189).\n\n#### How to use deepseek-coder-instruct to complete code?\n\nAlthough the deepseek-coder-instruct models are not specifically trained for code completion tasks during supervised fine-tuning (SFT), they retain the capability to perform code completion effectively. To enable this functionality, you simply need to adjust the `eos_token_id` parameter: set `eos_token_id` to 32014, as opposed to its default value of 32021 in the deepseek-coder-instruct configuration. This modification prompts the model to recognize the end of a sequence differently, thereby facilitating code completion tasks.\n\n\n### 8. Resources\n[awesome-deepseek-coder](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Fawesome-deepseek-coder) is a curated list of open-source projects related to DeepSeek Coder.\n\n### 9. License\nThis code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use.\n\nSee [LICENSE-CODE](LICENSE-CODE) and [LICENSE-MODEL](LICENSE-MODEL) for more details.\n\n### 10. Citation\n```\n@misc{deepseek-coder,\n  author = {Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y.K. Li, Fuli Luo, Yingfei Xiong, Wenfeng Liang},\n  title = {DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence},\n  journal = {CoRR},\n  volume = {abs\u002F2401.14196},\n  year = {2024},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.14196},\n}\n```\n\n### 11. 
Contact\n\nIf you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).\n","\u003Cp align=\"center\">\n\u003Cimg width=\"1000px\" alt=\"DeepSeek Coder\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_08bce2316927.png\">\n\u003C\u002Fp>\n\u003Cp align=\"center\">\u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\">[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_c8241abaf847.png\" width=\"20px\"> 首页]\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fchat.deepseek.com\u002F\">[🤖 与 DeepSeek Coder 聊天]\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\">[🤗 模型下载]\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTc7c45Zzu5\">[Discord]\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fguoday\u002Fassert\u002Fblob\u002Fmain\u002FQR.png?raw=true\">[WeChat (微信)]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2401.14196\">\u003Cb>论文链接\u003C\u002Fb>👁️\u003C\u002Fa>\n\u003C\u002Fp>\n\u003Chr>\n\n\n### 1. DeepSeek Coder 简介\n\nDeepSeek Coder 是一系列从头开始训练的代码语言模型，总共使用了 2T 的数据进行训练，其中 87% 是代码，13% 是自然语言，涵盖英语和中文两种语言。我们提供了多种规模的代码模型，从 1B 到 33B 不等。每个模型都基于项目级别的代码语料库进行预训练，采用了 16K 的窗口大小以及额外的填空任务，以支持项目级别的代码补全和修复。在编码能力方面，DeepSeek Coder 在多个编程语言和各类基准测试中均达到了开源代码模型中的最先进水平。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_b9a43d8cbe71.png\" alt=\"result\" width=\"70%\">\n\u003C\u002Fp>\n\n- **海量训练数据**：从头开始训练，使用了 2T 的数据，其中包括 87% 的代码和 13% 的语言数据，覆盖英语和中文。\n  \n- **高度灵活且可扩展**：提供 1B、5.7B、6.7B 和 33B 四种模型规模，用户可以根据自身需求选择最适合的配置。\n\n- **卓越的模型性能**：在 HumanEval、MultiPL-E、MBPP、DS-1000 和 APPS 等公开可用的代码模型基准测试中，性能处于领先地位。\n\n- **先进的代码补全能力**：采用 16K 的窗口大小和填空任务，支持项目级别的代码补全和修复。\n\n#### 支持的编程语言\n`['ada', 'agda', 'alloy', 'antlr', 'applescript', 'assembly', 'augeas', 'awk', 'batchfile', 'bluespec', 'c', 'c-sharp', 'clojure', 'cmake', 'coffeescript', 'common-lisp', 'cpp', 'css', 'cuda', 'dart', 'dockerfile', 'elixir', 'elm', 'emacs-lisp', 'erlang', 'f-sharp', 'fortran', 'glsl', 'go', 'groovy', 'haskell', 'html', 'idris', 'isabelle', 'java', 'java-server-pages', 'javascript', 'json', 'julia', 'jupyter-notebook', 'kotlin', 'lean', 'literate-agda', 'literate-coffeescript', 'literate-haskell', 'lua', 'makefile', 'maple', 'markdown', 'mathematica', 'matlab', 'ocaml', 'pascal', 'perl', 'php', 'powershell', 'prolog', 'protocol-buffer', 'python', 'r', 'racket', 'restructuredtext', 'rmarkdown', 'ruby', 'rust', 'sas', 'scala', 'scheme', 'shell', 'smalltalk', 'solidity', 'sparql', 'sql', 'stan', 'standard-ml', 'stata', 'systemverilog', 'tcl', 'tcsh', 'tex', 'thrift', 'typescript', 'verilog', 'vhdl', 'visual-basic', 'xslt', 'yacc', 'yaml', 'zig']`\n\n### 2. 
评估结果\n我们在多个与编码相关的基准测试上对 DeepSeek Coder 进行了评估。\n此处仅报告 HumanEval（Python 和多语言）、MBPP 以及 DS-1000 上的 `pass@1` 结果：\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_4c419f51a76d.png\" alt=\"table\" width=\"70%\">\n\u003C\u002Fp>\n\n\n结果显示，DeepSeek-Coder-Base-33B 显著优于现有的开源代码大模型。与 CodeLlama-34B 相比，在 HumanEval Python、HumanEval 多语言、MBPP 和 DS-1000 上分别领先 7.9%、9.3%、10.8% 和 5.9%。令人惊讶的是，我们的 DeepSeek-Coder-Base-7B 已经达到了 CodeLlama-34B 的性能水平。而经过指令微调后的 DeepSeek-Coder-Instruct-33B 模型，在 HumanEval 上的表现超越了 GPT35-turbo，而在 MBPP 上则与之不相上下。\n\n更多评估细节请参见 [详细评估结果](#6-detailed-evaluation-results)。\n\n\n### 3. 数据构建与模型训练流程\n\n#### 数据构建\n\n- 步骤 1：从 GitHub 收集代码数据，并应用与 [StarCoder Data](https:\u002F\u002Fgithub.com\u002Fbigcode-project\u002Fbigcode-dataset) 相同的过滤规则来清理数据。\n- 步骤 2：解析同一仓库内文件之间的依赖关系，根据依赖顺序重新排列文件位置。\n- 步骤 3：将相互依赖的文件拼接成一个完整的示例，并使用仓库级别的 minhash 进行去重。\n- 步骤 4：进一步过滤掉低质量的代码，例如存在语法错误或可读性较差的代码。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_def754f0f3c2.png\" alt=\"data_creation\" width=\"100%\">\n\n#### 模型训练\n\n- 步骤 1：初始预训练阶段使用的数据集中包含 87% 的代码、10% 的代码相关语言（如 GitHub Markdown 和 StackExchange）以及 3% 的非代码相关中文语言。在此步骤中，模型使用 1.8T 的数据和 4K 的窗口大小进行预训练。\n- 步骤 2：进一步使用 200B 的数据进行扩展的 16K 窗口大小预训练，最终得到基础模型（**DeepSeek-Coder-Base**）。\n- 步骤 3：使用 2B 的指令数据进行指令微调，从而得到指令微调后的模型（**DeepSeek-Coder-Instruct**）。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_38995ed64995.png\" alt=\"model_pretraining\" width=\"100%\">\n\n### 4. 使用方法\n在开始之前，您需要安装必要的依赖项。可以通过运行以下命令来完成：\n```\npip install -r requirements.txt\n```\n此外，在 [🤗 Hugging Face Space](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdeepseek-ai\u002Fdeepseek-coder-33b-instruct) 上也提供了一个演示，您也可以使用 [demo](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Fdeepseek-coder\u002Ftree\u002Fmain\u002Fdemo) 文件夹中的 `app.py` 在本地运行该演示。（感谢 HF 团队的所有支持）\n\n以下是关于我们模型的一些使用示例。\n\n#### 1) 代码补全\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\ninput_text = \"#write a quick sort algorithm\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_length=128)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n```\n这段代码将输出以下结果：\n```\ndef quick_sort(arr):\n    if len(arr) \u003C= 1:\n        return arr\n    pivot = arr[0]\n    left = []\n    right = []\n    for i in range(1, len(arr)):\n        if arr[i] \u003C pivot:\n            left.append(arr[i])\n        else:\n            right.append(arr[i])\n    return quick_sort(left) + [pivot] + quick_sort(right)\n```\n\n#### 2) 代码插入\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\ninput_text = \"\"\"\u003C｜fim▁begin｜>def quick_sort(arr):\n    if len(arr) \u003C= 1:\n        return arr\n    pivot = arr[0]\n    left = []\n    right = []\n\u003C｜fim▁hole｜>\n        if arr[i] \u003C pivot:\n            
left.append(arr[i])\n        else:\n            right.append(arr[i])\n    return quick_sort(left) + [pivot] + quick_sort(right)\u003C｜fim▁end｜>\"\"\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_length=128)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])\n```\n这段代码将输出以下结果：\n```\n   for i in range(1, len(arr)):\n```\n\n#### 3) 聊天模型推理\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-instruct\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-instruct\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\nmessages=[\n    { 'role': 'user', 'content': \"write a quick sort algorithm in python.\"}\n]\ninputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors=\"pt\").to(model.device)\n# tokenizer.eos_token_id 是 \u003C|EOT|> 标记的 ID\noutputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)\nprint(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))\n```\n这段代码将输出以下结果：\n```\n当然，这里有一个简单的 Python 快速排序算法实现：\n\ndef quick_sort(arr):\n    if len(arr) \u003C= 1:\n        return arr\n    else:\n        pivot = arr[0]\n        less_than_pivot = [x for x in arr[1:] if x \u003C= pivot]\n        greater_than_pivot = [x for x in arr[1:] if x > pivot]\n        return quick_sort(less_than_pivot) + [pivot] + quick_sort(greater_than_pivot)\n\n# 测试函数\narr = [10, 7, 8, 9, 1, 5]\nprint(\"原始数组:\", arr)\nprint(\"排序后的数组:\", quick_sort(arr))\n\n这段代码通过选择数组中的一个“基准”元素，并根据其与基准的关系将其他元素分为两组：小于基准和大于基准。然后，基准元素就位于其最终位置上。接下来，对这两组分别重复上述过程。\n```\n\n如果您不想使用提供的 API `apply_chat_template`，它会从 `tokenizer_config.json` 中加载模板，那么您可以使用以下模板与我们的模型进行对话。将 `['content']` 替换为您的指令以及模型之前的回复（如果有的话），模型就会针对当前给出的指令生成回复。\n```\n你是一位 AI 编程助手，使用由 DeepSeek 公司开发的 DeepSeek Coder 模型，只回答与计算机科学相关的问题。对于政治敏感问题、安全与隐私问题以及其他非计算机科学类问题，你将拒绝回答。\n### 指令：\n['content']\n### 回答：\n['content']\n\u003C|EOT|>\n### 指令：\n['content']\n### 回答：\n\n```\n\n#### 4) 仓库级别的代码补全\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\n\ninput_text = \"\"\"#utils.py\nimport torch\nfrom sklearn import datasets\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.metrics import accuracy_score\n\ndef load_data():\n    iris = datasets.load_iris()\n    X = iris.data\n    y = iris.target\n\n    # 数据标准化\n    scaler = StandardScaler()\n    X = scaler.fit_transform(X)\n\n    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n\n    # 将 numpy 数据转换为 PyTorch 张量\n    X_train = torch.tensor(X_train, dtype=torch.float32)\n    X_test = torch.tensor(X_test, dtype=torch.float32)\n    y_train = torch.tensor(y_train, dtype=torch.int64)\n    y_test = torch.tensor(y_test, dtype=torch.int64)\n\n    return X_train, X_test, y_train, y_test\n\ndef evaluate_predictions(y_test, y_pred):\n    return accuracy_score(y_test, y_pred)\n\n# model.py\nimport torch\nimport torch.nn as nn\nimport torch.optim as 
optim\nfrom torch.utils.data import DataLoader, TensorDataset\n\nclass IrisClassifier(nn.Module):\n    def __init__(self):\n        super(IrisClassifier, self).__init__()\n        self.fc = nn.Sequential(\n            nn.Linear(4, 16),\n            nn.ReLU(),\n            nn.Linear(16, 3)\n        )\n\n    def forward(self, x):\n        return self.fc(x)\n\n    def train_model(self, X_train, y_train, epochs, lr, batch_size):\n        criterion = nn.CrossEntropyLoss()\n        optimizer = optim.Adam(self.parameters(), lr=lr)\n\n        # 创建数据加载器以进行批次处理\n        dataset = TensorDataset(X_train, y_train)\n        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)\n\n        for epoch in range(epochs):\n            for batch_X, batch_y in dataloader:\n                optimizer.zero_grad()\n                outputs = self(batch_X)\n                loss = criterion(outputs, batch_y)\n                loss.backward()\n                optimizer.step()\n\n    def predict(self, X_test):\n        with torch.no_grad():\n            outputs = self(X_test)\n            _, predicted = outputs.max(1)\n        return predicted.numpy()\n\n\n# main.py\nfrom utils import load_data, evaluate_predictions\nfrom model import IrisClassifier as Classifier\n\ndef main():\n    # 模型训练和评估\n\"\"\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_new_tokens=140)\nprint(tokenizer.decode(outputs[0]))\n```\n\n---\n在以下场景中，DeepSeek-Coder-6.7B 模型能够有效地调用 `model.py` 文件中的 `IrisClassifier` 类及其成员函数，并使用 `utils.py` 文件中的函数，从而正确完成 `main.py` 文件中的主函数，实现模型的训练与评估。\n\n![完成示例动图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_51592700b774.gif)\n\n### 5. 如何微调 DeepSeek-Coder\n\n我们提供了脚本 `finetune\u002Ffinetune_deepseekcoder.py`，供用户在其下游任务上对我们的模型进行微调。\n\n该脚本支持使用 [DeepSpeed](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDeepSpeed) 进行训练。您需要通过以下命令安装所需的依赖包：\n\n```bash\npip install -r finetune\u002Frequirements.txt\n```\n\n请按照 [样本数据集格式](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnickrosh\u002FEvol-Instruct-Code-80k-v1) 准备您的训练数据。每行应为一个 JSON 序列化的字符串，包含两个必填字段：`instruction` 和 `output`。\n\n数据准备完成后，您可以使用示例 Shell 脚本来微调 `deepseek-ai\u002Fdeepseek-coder-6.7b-instruct`。请务必指定 `DATA_PATH` 和 `OUTPUT_PATH`，并根据您的具体情况选择合适的超参数（例如：`learning_rate`、`per_device_train_batch_size`）。\n\n```bash\nDATA_PATH=\"\u003Cyour_data_path>\"\nOUTPUT_PATH=\"\u003Cyour_output_path>\"\nMODEL=\"deepseek-ai\u002Fdeepseek-coder-6.7b-instruct\"\n\ncd finetune && deepspeed finetune_deepseekcoder.py \\\n    --model_name_or_path $MODEL \\\n    --data_path $DATA_PATH \\\n    --output_dir $OUTPUT_PATH \\\n    --num_train_epochs 3 \\\n    --model_max_length 1024 \\\n    --per_device_train_batch_size 16 \\\n    --per_device_eval_batch_size 1 \\\n    --gradient_accumulation_steps 4 \\\n    --evaluation_strategy \"no\" \\\n    --save_strategy \"steps\" \\\n    --save_steps 100 \\\n    --save_total_limit 100 \\\n    --learning_rate 2e-5 \\\n    --warmup_steps 10 \\\n    --logging_steps 1 \\\n    --lr_scheduler_type \"cosine\" \\\n    --gradient_checkpointing True \\\n    --report_to \"tensorboard\" \\\n    --deepspeed configs\u002Fds_config_zero3.json \\\n    --bf16 True\n```\n\n### 6. 
详细评估结果\n\n以下评估结果的可复现代码可在 [Evaluation](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Fdeepseek-coder\u002Ftree\u002Fmain\u002FEvaluation) 目录中找到。\n#### 1) 多语言 HumanEval 基准测试\n![HumanEval](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_842f24f79b18.png)\n\n#### 2) MBPP 基准测试\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_2ab49937fbb1.png\" alt=\"MBPP\" width=\"40%\">\n\n#### 3) DS-1000 基准测试\n![DS-1000](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_0ee99a272b15.png)\n\n#### 4) Program-Aid 数学推理基准测试\n![Math](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_readme_a42e08c306eb.png)\n\n### 使用 vLLM 进行推理\n\n您也可以使用 [vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) 进行高吞吐量推理。\n\n**文本补全**\n\n```python\nfrom vllm import LLM, SamplingParams\n\ntp_size = 4 # 张量并行度\nsampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)\nmodel_name = \"deepseek-ai\u002Fdeepseek-coder-6.7b-base\"\nllm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)\n\nprompts = [\n    \"如果一个国家的所有人都彼此相爱，\",\n    \"研究还应关注那些技术，\",\n    \"为了确定标签是否正确，我们需要\"\n]\noutputs = llm.generate(prompts, sampling_params)\n\ngenerated_text = [output.outputs[0].text for output in outputs]\nprint(generated_text)\n```\n\n**聊天补全**\n\n```python\nfrom transformers import AutoTokenizer\nfrom vllm import LLM, SamplingParams\n\ntp_size = 4 # 张量并行度\nsampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)\nmodel_name = \"deepseek-ai\u002Fdeepseek-coder-6.7b-instruct\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nllm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)\n\nmessages_list = [\n    [{\"role\": \"user\", \"content\": \"你是谁？\"}],\n    [{\"role\": \"user\", \"content\": \"你能做什么？\"}],\n    [{\"role\": \"user\", \"content\": \"请简要解释一下 Transformer。\"}],\n]\nprompts = [tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) for messages in messages_list]\n\nsampling_params.stop = [tokenizer.eos_token]\noutputs = llm.generate(prompts, sampling_params)\n\ngenerated_text = [output.outputs[0].text for output in outputs]\nprint(generated_text)\n```\n\n### 7. 
问答\n\n#### 是否可以提供用于模型量化的 tokenizer.model 文件？\n\nDeepSeek Coder 使用 [HuggingFace Tokenizer](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftokenizers\u002Findex) 实现 Bytelevel-BPE 算法，并设计了特殊的预分词器以确保最佳性能。目前尚无直接方法将该分词器转换为 SentencePiece 格式。我们正在致力于开源量化方法，以促进 HuggingFace Tokenizer 的应用。\n\n##### GGUF (llama.cpp)\n\n我们已向流行的量化库 [llama.cpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp) 提交了一个 [PR](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp\u002Fpull\u002F4070)，旨在全面支持所有 HuggingFace 预分词器，包括我们的预分词器。\n\n在 PR 合并之前，您可以通过以下步骤生成自己的 GGUF 模型：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FDOGEwbx\u002Fllama.cpp.git\ncd llama.cpp\ngit checkout regex_gpt2_preprocess\n# 按照 README 设置环境\nmake\npython3 -m pip install -r requirements.txt\n# 生成 GGUF 模型\npython convert-hf-to-gguf.py \u003CMODEL_PATH> --outfile \u003CGGUF_PATH> --model-name deepseekcoder\n\n# 以 q4_0 量化为例\n.\u002Fquantize \u003CGGUF_PATH> \u003COUTPUT_PATH> q4_0\n.\u002Fmain -m \u003COUTPUT_PATH> -n 128 -p \u003CPROMPT>\n```\n##### GPTQ(exllamav2)\n\n`更新：`[exllamav2](https:\u002F\u002Fgithub.com\u002Fturboderp\u002Fexllamav2) 已经支持 Huggingface Tokenizer。请拉取最新版本并尝试使用。\n\n请注意，为了获得正确的输出，需将 RoPE 缩放因子设置为 4，更多讨论可参见此 [PR](https:\u002F\u002Fgithub.com\u002Fturboderp\u002Fexllamav2\u002Fpull\u002F189)。\n\n#### 如何使用 DeepSeek-Coder-Instruct 完成代码？\n\n尽管 DeepSeek-Coder-Instruct 模型在监督微调（SFT）过程中并未专门针对代码补全任务进行训练，但它们仍然具备高效执行代码补全的能力。要启用此功能，只需调整 eos_token_id 参数即可。将 eos_token_id 设置为 32014，而不是 DeepSeek-Coder-Instruct 配置中的默认值 32021。这一修改会促使模型以不同的方式识别序列的结束，从而更好地支持代码补全任务。\n\n\n### 8. 资源\n[awesome-deepseek-coder](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Fawesome-deepseek-coder) 是一份关于 DeepSeek Coder 的开源项目精选列表。\n\n### 9. 许可证\n本代码仓库采用 MIT 许可证。DeepSeek Coder 模型的使用则受模型许可证约束。DeepSeek Coder 支持商业用途。\n\n更多详情请参阅 [LICENSE-CODE](LICENSE-CODE) 和 [LICENSE-MODEL](LICENSE-MODEL)。\n\n### 10. 引用\n```\n@misc{deepseek-coder,\n  author = {郭达亚、朱启浩、杨德健、谢振达、董凯、张文涛、陈冠廷、毕晓、吴Y、李YK、罗富利、熊英飞、梁文峰},\n  title = {DeepSeek-Coder：当大型语言模型遇见编程——代码智能的崛起},\n  journal = {CoRR},\n  volume = {abs\u002F2401.14196},\n  year = {2024},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.14196},\n}\n```\n\n### 11. 联系方式\n\n如有任何问题，请提交 Issue 或发送邮件至 [service@deepseek.com](mailto:service@deepseek.com)。","# DeepSeek-Coder 快速上手指南\n\nDeepSeek-Coder 是由深度求索（DeepSeek）推出的一系列代码语言模型，支持 80+ 种编程语言。该模型在 2T token 数据上从头训练（含 87% 代码和 13% 自然语言），提供从 1B 到 33B 多种尺寸版本，具备卓越的项目级代码补全、填空及指令遵循能力。\n\n## 1. 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux (推荐) 或 Windows\u002FmacOS\n*   **Python**: 3.8 及以上版本\n*   **GPU**: 推荐使用 NVIDIA GPU 并安装 CUDA 驱动\n    *   运行 6.7B 模型建议显存 ≥ 16GB (使用 bfloat16 精度)\n    *   运行 33B 模型建议显存 ≥ 48GB (或使用多卡\u002F量化方案)\n*   **前置依赖**:\n    *   `torch` (建议 2.0+)\n    *   `transformers` (建议 4.35+)\n    *   `accelerate`\n\n## 2. 安装步骤\n\n### 方法一：通过 requirements.txt 安装（推荐）\n如果您已克隆官方仓库，请直接运行：\n```bash\npip install -r requirements.txt\n```\n\n### 方法二：手动安装核心依赖\n若未克隆仓库，可直接安装必要的 Python 包。国内用户建议使用清华或阿里镜像源加速下载：\n```bash\npip install torch transformers accelerate -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n> **注意**：首次加载模型时会自动从 Hugging Face 下载权重。国内网络若连接缓慢，建议配置 Hugging Face 镜像环境变量：\n> ```bash\n> export HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com\n> ```\n\n## 3. 
基本使用\n\nDeepSeek-Coder 提供两种主要类型的模型：\n*   **Base 版** (`-base`): 适用于代码补全 (Completion) 和代码填空 (Infilling)。\n*   **Instruct 版** (`-instruct`): 适用于对话交互和指令遵循 (Chat)。\n\n以下是最简单的使用示例（以 6.7B 模型为例）：\n\n### 场景一：代码补全 (Code Completion)\n适用于根据注释或上下文生成后续代码。\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\n\n# 加载 Base 模型\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\n\ninput_text = \"#write a quick sort algorithm\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\n\n# 生成代码\noutputs = model.generate(**inputs, max_length=128)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n```\n\n### 场景二：代码填空 (Code Insertion)\n适用于在现有代码中间插入缺失逻辑（使用 `\u003C｜fim▁begin｜>`, `\u003C｜fim▁hole｜>`, `\u003C｜fim▁end｜>` 标记）。\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\n\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\n\ninput_text = \"\"\"\u003C｜fim▁begin｜>def quick_sort(arr):\n    if len(arr) \u003C= 1:\n        return arr\n    pivot = arr[0]\n    left = []\n    right = []\n\u003C｜fim▁hole｜>\n        if arr[i] \u003C pivot:\n            left.append(arr[i])\n        else:\n            right.append(arr[i])\n    return quick_sort(left) + [pivot] + quick_sort(right)\u003C｜fim▁end｜>\"\"\"\n\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_length=128)\n\n# 仅输出填充部分\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])\n```\n\n### 场景三：对话模式 (Chat \u002F Instruct)\n适用于向模型提问或请求编写完整功能模块。\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\n\n# 加载 Instruct 模型\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-instruct\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-instruct\", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()\n\nmessages = [\n    { 'role': 'user', 'content': \"write a quick sort algorithm in python.\"}\n]\n\n# 应用对话模板\ninputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors=\"pt\").to(model.device)\n\noutputs = model.generate(\n    inputs, \n    max_new_tokens=512, \n    do_sample=False, \n    top_k=50, \n    top_p=0.95, \n    num_return_sequences=1, \n    eos_token_id=tokenizer.eos_token_id\n)\n\nprint(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))\n```","某初创团队的后端工程师正在紧急重构一个包含十万行代码的遗留电商系统，需将核心交易模块从 Python 迁移至 Go 语言并修复潜在并发漏洞。\n\n### 没有 DeepSeek-Coder 时\n- **跨语言迁移效率极低**：工程师需手动逐行翻译 Python 逻辑为 Go 代码，不仅耗时数天，还极易因语法习惯差异引入逻辑错误。\n- **上下文理解断裂**：面对长达数千行的复杂函数，通用模型受限于短上下文窗口，无法关联项目其他文件中的依赖定义，导致生成的代码无法直接运行。\n- **多语言支持薄弱**：在处理涉及 SQL 存储过程与 Shell 部署脚本的混合编程场景时，工具频繁出现语法幻觉，需要人工反复修正。\n- **调试成本高昂**：生成的代码缺乏对项目级结构的感知，缺少必要的错误处理机制，导致测试阶段报错频发，排查困难。\n\n### 使用 DeepSeek-Coder 后\n- **智能项目级迁移**：利用其 16K 上下文窗口和项目级预训练能力，DeepSeek-Coder 能一次性读取整个模块，自动完成从 Python 到 Go 的高保真转换，保持原有业务逻辑不变。\n- **精准代码填充**：通过“填空式”生成任务，它能准确识别缺失的并发锁机制与接口定义，直接在现有代码框架中补全高质量的 Go 实现。\n- **全栈语言无缝切换**：凭借对 80+ 种编程语言的深度掌握，DeepSeek-Coder 在同一会话中流畅处理 Go 主程序、SQL 查询优化及 
Docker 配置，无需切换工具。\n- **一次通过率显著提升**：生成的代码天然符合项目规范且包含完善的异常处理，在 HumanEval 等基准测试中表现超越同类开源模型，大幅减少后续调试时间。\n\nDeepSeek-Coder 凭借其对项目级上下文的深刻理解和卓越的跨语言能力，将原本需要数天的重构工作压缩至数小时，真正实现了让代码自我编写。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdeepseek-ai_DeepSeek-Coder_b46e564a.png","deepseek-ai","DeepSeek","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fdeepseek-ai_04503588.png","",null,"service@deepseek.com","https:\u002F\u002Fwww.deepseek.com\u002F","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai",[86,90],{"name":87,"color":88,"percentage":89},"Python","#3572A5",99.7,{"name":91,"color":92,"percentage":93},"Shell","#89e051",0.3,23052,2757,"2026-04-18T14:05:54","MIT","未说明","需要 NVIDIA GPU (代码示例使用 .cuda())，显存需求取决于模型大小：1B\u002F5.7B\u002F6.7B 版本建议 8GB+，33B 版本建议 24GB+ (需使用 bfloat16 精度)","未说明 (建议系统内存大于模型权重大小，33B 模型建议 64GB+)",{"notes":102,"python":98,"dependencies":103},"1. 代码示例明确使用 torch.bfloat16 数据类型和 CUDA 加速，因此需要支持 bfloat16 的较新 NVIDIA 显卡（如 Ampere 架构及以上）和相应 CUDA 版本。2. 提供多种尺寸模型（1B, 5.7B, 6.7B, 33B），请根据硬件条件选择。3. 加载模型时需设置 trust_remote_code=True。4. 支持项目级代码补全，上下文窗口大小为 16K。",[104,105],"torch","transformers",[15],"2026-03-27T02:49:30.150509","2026-04-19T09:15:04.784667",[110,115,120,125,129,134],{"id":111,"question_zh":112,"answer_zh":113,"source_url":114},42098,"如何在单张 A100 显卡上加速 33B 模型的推理？有模型并行方案吗？","直接使用 Hugging Face 原生加载方式会比较慢，建议部署时使用 vLLM，速度会快很多。在 vLLM 中通常使用张量并行（Tensor Parallelism, TP），一般设置为 TP4 或 TP8。虽然理论上 TP 可以减轻单卡访存压力，但在某些情况下（如自回归解码），单卡到多卡引入的通信同步开销可能导致多卡 TP 不如单卡快，具体性能需根据实际框架和硬件测试。如果显存不足（非 80G 版本），也可以考虑使用 GGUF 格式进行量化加载。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder\u002Fissues\u002F15",{"id":116,"question_zh":117,"answer_zh":118,"source_url":119},42099,"为什么 FIM (Fill-In-the-Middle) 功能无法正常工作，输出乱码？","使用 FIM 功能时请注意以下几点：\n1. 对于 Base 模型，必须在 prompt 前添加 bos token。\n2. 虽然官方未专门针对 FIM 任务进行微调，但 Instruction 模型依然具备 FIM 能力。\n3. 
确保特殊 token（如 `\u003C｜fim▁begin｜>`, `\u003C｜fim▁hole｜>`, `\u003C｜fim▁end｜>`）编码正确。\n\n参考代码示例：\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nmodel = AutoModelForCausalLM.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\u002F\", trust_remote_code=True, device_map=\"auto\")\ntokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai\u002Fdeepseek-coder-6.7b-base\u002F\", trust_remote_code=True, padding_side='right')\nprompt = \"\"\"\u003C｜fim▁begin｜>#!\u002Fusr\u002Fbin\u002Fenv python3\n\u003C｜fim▁hole｜>\n\u003C｜fim▁end｜>\"\"\"\nids = tokenizer(prompt, return_tensors='pt').input_ids\nout = model.generate(ids, max_new_tokens=100, do_sample=False, top_k=50, top_p=0.95)\n```","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder\u002Fissues\u002F71",{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},42100,"微调 DeepSeek-Coder 时，自定义数据集应该是什么格式？","自定义数据集应整理为 JSON 格式，包含 `instruction` 和 `output` 字段。可以参考 HuggingFace 上的 `Evol-Instruct-Code-80k-v1` 数据集格式。示例如下：\n```json\n[\n  {\n    \"instruction\": \"give python syntax in a Nutshell\",\n    \"output\": \"Python is a high-level programming language...\"\n  },\n  {\n    \"instruction\": \"Print the content in between the curly brackets\",\n    \"output\": \"print(content)\"\n  }\n]\n```\n加载数据集后可直接使用官方提供的微调脚本（如 `finetune_deepseekcoder.py`）进行训练，用法与其他 Llama 模型类似。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder\u002Fissues\u002F137",{"id":126,"question_zh":127,"answer_zh":128,"source_url":124},42101,"微调后保存的模型加载时报错 `mismatched_sizes` 怎么办？","如果在微调保存后加载模型遇到 `mismatched_sizes` 错误，可以在加载预训练模型时添加 `ignore_mismatched_sizes=True` 参数。同时确保使用 `trust_remote_code=True`。示例代码如下：\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\n\ntokenizer = AutoTokenizer.from_pretrained(\"SaveOutputFolder\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\n    \"SaveOutputFolder\", \n    ignore_mismatched_sizes=True, \n    trust_remote_code=True, \n    torch_dtype=torch.bfloat16\n).cuda()\n\ninput_text = \"#write a quick sort algorithm\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_length=128, top_k=50, top_p=0.95, do_sample=True)\n```",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},42102,"微调后进行 HumanEval 评测报错 `Failed to extract code block with error list index out of range` 是什么原因？","该错误通常是因为评测脚本无法从模型输出中正确提取代码块。解决方法是修改 `Evaluation\u002FHumaneval\u002Futils\u002Futils.py` 文件中的 `extract_generation_code` 函数，增加一个包装器（wrapper）来处理输出格式，确保能正确解析生成的代码部分。此外，需确认输出中是否确实包含代码块，有时即使输出格式略有不同，只要逻辑正确，Pass@1 指标仍可能正常。如果是特定语言（如 Java）出现此问题，请检查该语言的提示词模板是否正确引导模型输出代码块。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder\u002Fissues\u002F111",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},42103,"在哪里可以找到 `tokenizer.model` 文件以支持 exllama 等第三方框架？","DeepSeek-Coder 基于 Llama 架构，通常不直接提供单独的 `tokenizer.model` 文件（这是 SentencePiece 的旧格式）。如果您需要使用 exllama 或其他依赖该文件的框架，可以尝试从对应的 HuggingFace 模型仓库中下载配置文件，或者使用 HuggingFace 的 `transformers` 库加载 tokenizer 并转换格式。部分社区成员可能已经上传了转换后的文件，建议在 HuggingFace 数据集或模型页面搜索相关资源，或在 Discord 社区询问是否有现成的转换工具。","https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder\u002Fissues\u002F50",[]]