[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-microsoft--DialoGPT":3,"tool-microsoft--DialoGPT":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",154349,2,"2026-04-13T23:32:16",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":78,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":10,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":108,"github_topics":109,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":119,"updated_at":120,"faqs":121,"releases":150},7320,"microsoft\u002FDialoGPT","DialoGPT","Large-scale pretraining for dialogue","DialoGPT 是一款由微软研发的大规模预训练对话生成模型，旨在让机器像人类一样进行自然流畅的多轮交流。它主要解决了传统聊天机器人回复生硬、缺乏上下文连贯性以及难以维持长期对话的问题。通过在 Reddit 上提取的 1.47 亿条多轮对话数据进行训练，DialoGPT 在单次对话的图灵测试中，其生成回复的质量已能与人类表现相媲美。\n\n这款工具特别适合 AI 研究人员、开发者以及对自然语言处理感兴趣的技术爱好者使用。研究人员可以利用其开源代码复现前沿成果或探索对话系统的边界；开发者则能基于提供的小型、中型和大型三种不同规模的预训练模型，快速构建智能客服、虚拟助手或创意写作辅助工具。普通用户也可通过集成的演示项目体验与高智商 AI 聊天的乐趣。\n\nDialoGPT 的核心技术亮点在于其基于 OpenAI GPT-2 架构进行了大规模针对性预训练，并支持分布式训练与混合精度计算，使得在有限硬件资源下高效训练大模型成为可能。此外，项目还结合了 DialogRPT 
排序模型进一步优化回复质量。值得注意的是，虽然官方已推出性能更强的继任者 GODEL，但 DialoGPT 凭借其成熟的生态和在 ","DialoGPT 是一款由微软研发的大规模预训练对话生成模型，旨在让机器像人类一样进行自然流畅的多轮交流。它主要解决了传统聊天机器人回复生硬、缺乏上下文连贯性以及难以维持长期对话的问题。通过在 Reddit 上提取的 1.47 亿条多轮对话数据进行训练，DialoGPT 在单次对话的图灵测试中，其生成回复的质量已能与人类表现相媲美。\n\n这款工具特别适合 AI 研究人员、开发者以及对自然语言处理感兴趣的技术爱好者使用。研究人员可以利用其开源代码复现前沿成果或探索对话系统的边界；开发者则能基于提供的小型、中型和大型三种不同规模的预训练模型，快速构建智能客服、虚拟助手或创意写作辅助工具。普通用户也可通过集成的演示项目体验与高智商 AI 聊天的乐趣。\n\nDialoGPT 的核心技术亮点在于其基于 OpenAI GPT-2 架构进行了大规模针对性预训练，并支持分布式训练与混合精度计算，使得在有限硬件资源下高效训练大模型成为可能。此外，项目还结合了 DialogRPT 排序模型进一步优化回复质量。值得注意的是，虽然官方已推出性能更强的继任者 GODEL，但 DialoGPT 凭借其成熟的生态和在 Hugging Face 上的便捷调用方式，依然是学习对话式 AI 的经典范例。","# A State-of-the-Art Large-scale Pretrained Response Generation Model (DialoGPT)\n\n## This project page is no longer maintained as DialoGPT is superseded by [GODEL](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FGODEL), which outperforms DialoGPT according to the results of [this paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2206.11309.pdf). Unless you use DialoGPT for reproducibility reasons, we highly recommend you switch to [GODEL](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FGODEL).\n\nThis repository contains the source code and trained model for a large-scale pretrained dialogue response generation model. 
The [human evaluation results](#human_eval) indicate that the responses generated by DialoGPT are comparable in quality to human responses under a single-turn conversation Turing test.\n\n\u003C!--See more details on our [project page](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fproject\u002Flarge-scale-pretraining-for-response-generation\u002F)-->\n\nThe repository is based on [huggingface pytorch-transformer](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransfer-learning-conv-ai) and [OpenAI GPT-2](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgpt-2), containing the data extraction script, model training code, and pretrained small (117M), medium (345M), and large (762M) model checkpoints.\n\nThe model is trained on 147M multi-turn dialogues from Reddit discussion threads. The largest model can be trained in several hours on a machine with 8 V100 GPUs (though this is not required), using distributed training and the FP16 option. \n\nThe included script can be used to reproduce the results of the DSTC-7 grounded dialogue generation challenge and a 6k multi-reference dataset created from Reddit data. \n\nProject webpage: [https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fproject\u002Flarge-scale-pretraining-for-response-generation\u002F](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fproject\u002Flarge-scale-pretraining-for-response-generation\u002F)\n\nArXiv paper: [https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.00536](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.00536)\n\n\n## News ##\n\n***(Update 07\u002F09\u002F2022) Changes on the files.pushshift.io\u002Freddit server caused our data generation pipeline to break. These problems have now been fixed, and the steps explained in the Data Preparation subsection below should work again. 
Data is generated in about 10 hours with 8 processes (`-j 8`), and 800GB of temporary disk space is needed.***\n\n***(Update 06\u002F23\u002F2021) We have released a retrieval-augmented\u002Fgrounded version of DialoGPT (RetGen); please check out the [RetGen repo](https:\u002F\u002Fgithub.com\u002Fdreasysnail\u002FRetGen) and the [RetGen paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.06597)***\n\n***(Update 05\u002F20\u002F2021) An awesome [video walkthrough](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Zo679MYoJns) on YouTube for DialoGPT by [Prakhar Mishra](http:\u002F\u002Fwsl.iiitb.ac.in\u002Fprakhar-mishra\u002F)***\n\n***(Update 03\u002F31\u002F2021) A 3rd-party demo by [AK391](https:\u002F\u002Fgithub.com\u002FAK391) using Gradio; try out the [web demo](https:\u002F\u002Fgradio.app\u002Fg\u002FAK391\u002FDialoGPT)***\n\n\n***(Update 09\u002F15\u002F2020) A set of large-scale [dialog ranking models](https:\u002F\u002Fgithub.com\u002Fgolsun\u002FDialogRPT) has been released!***\n\nDialoGPT generation is improved by integrating it with our latest dialog ranking models, [DialogRPT](https:\u002F\u002Fgithub.com\u002Fgolsun\u002FDialogRPT)\n\n***(Update 07\u002F08\u002F2020) The 6K multi-ref test set has been released!***\n\nTo generate the data, please run `demo.py` with the data option set to 'full'; the generated 6k multi-ref test set will be located at\n\n`.\u002Fdata\u002Ftest.refs.txt`\n\n***(Update 03\u002F10\u002F2020) Model cards available in Huggingface Transformers!***\n\nPlease check out our model cards in the Hugging Face Transformers repository. With a few lines of code it is straightforward to play with DialoGPT interactively. 
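A minimal sketch of that Hugging Face usage, assuming the `transformers` and `torch` packages are installed; `extend_history` and `chat` are hypothetical helper names for illustration, not part of the release:

```python
def extend_history(history_ids, new_ids):
    """Concatenate token-id lists to form the running dialogue context."""
    return (history_ids or []) + new_ids

def chat(turns, model_name="microsoft/DialoGPT-small"):
    """Generate one reply per user turn, feeding the growing history back in."""
    # Heavy dependencies imported here so the helper above stays dependency-free.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    history, replies = [], []
    for turn in turns:
        # Each utterance is terminated with the EOS token, as in training.
        history = extend_history(history, tokenizer.encode(turn + tokenizer.eos_token))
        input_ids = torch.tensor([history])
        output = model.generate(input_ids, max_length=1000,
                                pad_token_id=tokenizer.eos_token_id)
        history = output[0].tolist()
        # Decode only the newly generated tokens, not the whole context.
        replies.append(tokenizer.decode(history[input_ids.shape[-1]:],
                                        skip_special_tokens=True))
    return replies
```

Calling `chat(["Does money buy happiness?"])` downloads the small checkpoint on first use and returns the model's replies.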
\n\n[small model: https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FDialoGPT-small](https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FDialoGPT-small)\n\n[medium model: https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FDialoGPT-medium](https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FDialoGPT-medium)\n\n[large model: https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FDialoGPT-large](https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FDialoGPT-large)\n\n[**(New)** Ranking model: https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FDialogRPT-updown](https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FDialogRPT-updown?text=I+love+NLP%21+%3C%7Cendoftext%7C%3E+Me+too%21)\n\n\n***(Update 01\u002F06\u002F2020) Some third-party decoding script implementations:***\n\n- [https:\u002F\u002Fgithub.com\u002Fpolakowo\u002Fgpt2bot](https:\u002F\u002Fgithub.com\u002Fpolakowo\u002Fgpt2bot) GPT2Bot implementation based on telegram by polakowo, [ref](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT\u002Fissues\u002F3#issuecomment-573904419)\n- [https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1PslHE4Rl4RqSa20s7HEp0ZKITBir6ezE](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1PslHE4Rl4RqSa20s7HEp0ZKITBir6ezE) A colab interactive notebook by qywu,[ref](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT\u002Fissues\u002F3#issuecomment-551410203)\n- [https:\u002F\u002Fgithub.com\u002Fandreamad8\u002FDialoGPT2-Interact](https:\u002F\u002Fgithub.com\u002Fandreamad8\u002FDialoGPT2-Interact) An interactive script featuring multiturn chatbot by andreamad8,[ref](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT\u002Fissues\u002F3#issuecomment-551450016)\n- [https:\u002F\u002Fgithub.com\u002FLHolten\u002FDialoGTP-MMI-decoder](https:\u002F\u002Fgithub.com\u002FLHolten\u002FDialoGTP-MMI-decoder) An MMI implementation by 
LHolten,[ref](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT\u002Fissues\u002F3#issuecomment-558318401)\n- [https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1-_KjlAV3J1IVDw_9KogjKDCzgFY7Jp7E](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1-_KjlAV3J1IVDw_9KogjKDCzgFY7Jp7E) A colab interactive notebook by illuminascent@Reddit,[ref](https:\u002F\u002Fwww.reddit.com\u002Fr\u002FMachineLearning\u002Fcomments\u002Fdt5woy\u002Fp_dialogpt_state_of_the_art_conversational_model\u002F?st=k530k3oo&sh=f6cd20fd)\n- [https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F15wa925dj7jvdvrz8_z3vU7btqAFQLVlG](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F15wa925dj7jvdvrz8_z3vU7btqAFQLVlG) A great tutorial of how to finetune DialoGPT to build a customized bot built by [Rostyslav Neskorozhenyi](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fslanj\u002F). [ref](https:\u002F\u002Ftowardsdatascience.com\u002Fmake-your-own-rick-sanchez-bot-with-transformers-and-dialogpt-fine-tuning-f85e6d1f4e30) \n- [https:\u002F\u002Fgradio.app\u002Fg\u002FAK391\u002FDialoGPT](https:\u002F\u002Fgradio.app\u002Fg\u002FAK391\u002FDialoGPT) A 3rd party demo by [AK391](https:\u002F\u002Fgithub.com\u002FAK391) using Gradio [web demo](https:\u002F\u002Fgradio.app\u002Fg\u002FAK391\u002FDialoGPT)  \n\n\u003C!--**This github repository will be updated soon. Please stay tuned.**-->\n\u003C!--## Minimal Computational Configurations-->\n## Recommended Configuration\n\n- Linux Ubuntu 16.04\n- GPU with at least 12G memory\n\nDialoGPT was developed entirely on **Ubuntu 16.04**, and -- depending on our availability -- we try to provide support if you experience difficulties running the code on the same configuration. However, we are **unable to provide support for other distributions or operating systems**. 
Portions of the code may run on other UNIX flavors (macOS, Windows Subsystem for Linux, Cygwin, etc.), but it is recommended to use Ubuntu for the main training code.\n\nThe training code can be run on CPU, but it can be slow. We recommend using GPUs to train and fine-tune all models. There is no minimum number of GPUs. However, when using distributed training across multiple GPUs, the speed-up vs. the number of GPUs is roughly sub-linear. To simulate the same batch size with fewer GPUs, please use a larger `gradient_accumulation_steps` in model training. \n\nThe 117M and 345M models can be loaded on a single GPU with 12GB of memory. The 762M model requires a single GPU with more than 16GB of memory for efficient training. Training speed on a benchmark dataset with 50M training instances and V100 GPUs:\n\n| n\\_gpu           | epoch time (h) | token\u002Fsec  |\n|----------------------|--------|--------|\n| 1              | 118 | 10847 |\n| 2              | 62 | 20645 |\n| 4              | 34 | 37647 |\n| 8              | 18 | 71356 |\n\nFine-tuning from our pretrained model on a new dataset typically requires 1-2 epochs.\n\n\n## Setup & Installation (TL;DR)\n\nWe created a demo script, `demo.py`, to ease deployment of this system. `demo.py` contains a pipeline of **model downloading**, data extraction, data preprocessing, and model training over a dummy dataset in a single command line. 
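The batch-size bookkeeping described above (effective batch = per-GPU batch × `gradient_accumulation_steps` × number of GPUs) can be sketched as a quick sanity check; `effective_batch_size` is a hypothetical helper, and the numbers below use the training script's documented defaults (`train_batch_size=4`, `gradient_accumulation_steps=2`):

```python
def effective_batch_size(per_gpu_batch: int, grad_accum_steps: int, n_gpu: int) -> int:
    """Global batch size seen by the optimizer per update step."""
    return per_gpu_batch * grad_accum_steps * n_gpu

# Script defaults: train_batch_size=4, gradient_accumulation_steps=2.
eight_gpu = effective_batch_size(4, 2, 8)   # a full 8-V100 node
# Matching that on 2 GPUs requires 4x the accumulation steps:
two_gpu = effective_batch_size(4, 8, 2)
assert eight_gpu == two_gpu == 64
```

The speed-up remains sub-linear in GPU count, but the optimizer sees the same batch size either way.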
\n\n\n\n#### Train model with Conda Environment\n\nPlease use the command lines below to clone the repository, install the requirements, and load the Conda environment (note that the Nvidia CUDA 10.0 developer toolkit is required):\n\n\n```bash\nsudo apt-get install -y make wget gzip bzip2 xz-utils zstd sed\n```\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT.git\ncd DialoGPT\nconda env create -f LSP-linux.yml -n LSP\nconda activate LSP\n```\n\nIf you run this on an architecture other than Linux, please use `LSP-generic.yml` instead of `LSP-linux.yml`, but note that the generic environment has not been tested on all platforms, so stability cannot be guaranteed.\nTo use fp16 training, please install apex using the commands below:\n  \n```bash\nconda activate LSP\ngit clone https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex\ncd apex\ngit reset --hard 3d01e4a0a188cc8df54bc6e44cf5eb40ff6b4cc5\npip install -v --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" .\npython3.6 demo.py\n```\n\n#### Train model with Docker environment\nTo start, first install Docker and nvidia-docker from their official repos.\nThe image environment for running the code can be loaded as below:  \n\n*Nvidia-docker v2.**\n\n```bash\n$ docker run --gpus all --ipc=host --rm -it -v $PWD:\u002Fworkspace --network=host icaruszyz\u002Flarge-scale-training:dialogpt bash\n```\n*Nvidia-docker v1.**\n\n```bash\n$ nvidia-docker --rm -it -v $PWD:\u002Fworkspace --network=host icaruszyz\u002Flarge-scale-training:dialogpt bash\n```\n\nInside the docker container, run \n\n```bash\npython demo.py\n```\n\n\n\n## Pipeline details\n\nThis section explains all components of `demo.py`.\n\n#### Data loading\nBefore running `demo.py`, you can set *DATA_FOLDER* (default value `.\u002Fmodels`) in `demo.py` as the place to download all the data and pretrained\u002Ffine-tuned models. 
Then simply run\n```bash\npython demo.py\n```\nto \n\n* automatically download models and data, \n* prepare the raw data into a db that is ready for the program, and\n* generate a training script.\n\nNote that by default `demo.py` uses dummy data; please specify the Reddit training data with the `--data` option. Three options are available: `dummy`, `small`, and `full`. \n\n```bash\npython demo.py --data small\npython demo.py --data full\n```\n\nThe small Reddit data is around 140MB and the full Reddit data is more than 27GB. You may want to prepare a cup of coffee when processing the full Reddit data, because **it takes a long time**!\n\nTo generate the 6k multi-ref test set data, please run `demo.py` with the data option set to 'full'; the generated file will be located at\n\n`.\u002Fdata\u002Ftest.refs.txt`\n\n#### Pretrained model\n\nThe pretrained and fine-tuned models are available on Azure blob storage.\nPlease run\u002Fsee `demo.py` for more details about how to download\u002Fuse those models, or download them directly using the links in `demo_utils.py`.\n\n#### Preparing data\nFirst, use `prepare4db.sh` to convert a tsv data file into the format that the following script can recognize.\nThe training data then needs to be processed into a database file with the command line below:\n\n```bash\npython prepro.py --corpus $DATA_PATH\n```\n\n\n\n#### Using the training script\n\nThe training script can be used in single-GPU or multi-GPU settings (distributed training across multiple GPUs within a single node):\n\n```bash\npython .\u002FLSP_train.py  # Single GPU training\npython -m torch.distributed.launch --nproc_per_node=8 .\u002FLSP_train.py  # Training on 8 GPUs\n```\n\n\nThe training script accepts several arguments to tweak the training: \n\nArgument | Type | Default value | Description\n---------|------|---------------|------------\nmax\\_seq\\_length | `int` | `128` | Maximum number of tokens for each training instance. 
\ntrain\\_input\\_file | `str` | `\"\"` | Path of the training dataset in .db format\neval\\_input\\_file | `str` | `\"\"` | Path of the validation set in tsv format\ncontinue_from | `int` | `0` | Resume training after the specified number of steps\nfp16 | `boolean` | `True` | Whether to use 16-bit floating point for model training\ntrain\\_batch\\_size | `int` | `4` | Batch size for training\nvalid\\_batch\\_size | `int` | `4` | Batch size for validation\ngradient\\_accumulation\\_steps | `int` | `2` | Number of steps over which gradients are accumulated\nlearning\\_rate | `float` | `1e-5` | Learning rate\nlr\\_schedule | `str` | `noam` | Learning rate schedule; can be chosen from [`noam`, `noamwd`, `BERT`, `None`]\nnum\\_optim\\_steps | `int` | `1000000` | Number of training optimization steps\nno_token_id | `boolean` | `True` | If True, use an all-zeros token-type embedding\n\n\nDuring training, two log files are updated: `train_log.txt` and `eval_log.txt` contain the model loss, perplexity, and training speed (tokens\u002Fsec) statistics for the training and dev sets. \n\nThe log files and saved model checkpoints can be found in `.\u002Fmodels\u002Foutput_model`\n\n#### Model decoding\nWe note that even with a properly filtered Reddit dataset, our model can sometimes still generate moderately toxic\u002Finappropriate responses. For this reason, we are unable to provide the decoding script at this time (the live demo and decoding script are currently accessible by invitation only).\nWe are still working on a controlled decoding method to prevent the system from toxic generation. Please stay tuned. 
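The MMI (maximum mutual information) reranking used with the reverse model released below can be sketched with placeholder scorers. This is a hedged illustration only: `mmi_rerank`, `log_p_forward`, `log_p_backward`, and the weight `lam` are hypothetical names standing in for the forward and reverse DialoGPT models and their interpolation weight.

```python
def mmi_rerank(candidates, log_p_forward, log_p_backward, lam=0.5):
    """Rank responses by log P(response|source) + lam * log P(source|response).

    The two scorers are placeholders for the forward and reverse DialoGPT models.
    """
    scored = sorted(((log_p_forward(c) + lam * log_p_backward(c), c)
                     for c in candidates), reverse=True)
    return [c for _, c in scored]

# Toy scores: a bland reply is likely under the forward model but tells the
# reverse model almost nothing about the source, so MMI demotes it.
fwd = {"i don't know": -1.0, "money can buy a boat": -2.0}.get
bwd = {"i don't know": -8.0, "money can buy a boat": -1.0}.get
candidates = ["i don't know", "money can buy a boat"]
```

Penalizing responses that carry little information about the source is what pushes generic replies down the ranking.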
\n\n**See issues [#3](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT\u002Fissues\u002F3) and [Reddit discussions](https:\u002F\u002Fwww.reddit.com\u002Fr\u002FMachineLearning\u002Fcomments\u002Fdt5woy\u002Fp_dialogpt_state_of_the_art_conversational_model\u002F) for some discussions on third-party decoding methods.** \n\nSee below for some third-party decoding methods:\n- [https:\u002F\u002Fgithub.com\u002Fpolakowo\u002Fgpt2bot](https:\u002F\u002Fgithub.com\u002Fpolakowo\u002Fgpt2bot) GPT2Bot implementation based on telegram by polakowo, [ref](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT\u002Fissues\u002F3#issuecomment-573904419)\n- [https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1PslHE4Rl4RqSa20s7HEp0ZKITBir6ezE](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1PslHE4Rl4RqSa20s7HEp0ZKITBir6ezE) A colab interactive notebook by qywu,[ref](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT\u002Fissues\u002F3#issuecomment-551410203)\n- [https:\u002F\u002Fgithub.com\u002Fandreamad8\u002FDialoGPT2-Interact](https:\u002F\u002Fgithub.com\u002Fandreamad8\u002FDialoGPT2-Interact) An interactive script featuring multiturn chatbot by andreamad8,[ref](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT\u002Fissues\u002F3#issuecomment-551450016)\n- [https:\u002F\u002Fgithub.com\u002FLHolten\u002FDialoGTP-MMI-decoder](https:\u002F\u002Fgithub.com\u002FLHolten\u002FDialoGTP-MMI-decoder) An MMI implementation by LHolten,[ref](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT\u002Fissues\u002F3#issuecomment-558318401)\n- [https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1-_KjlAV3J1IVDw_9KogjKDCzgFY7Jp7E](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1-_KjlAV3J1IVDw_9KogjKDCzgFY7Jp7E) A colab interactive notebook by 
illuminascent@Reddit,[ref](https:\u002F\u002Fwww.reddit.com\u002Fr\u002FMachineLearning\u002Fcomments\u002Fdt5woy\u002Fp_dialogpt_state_of_the_art_conversational_model\u002F?st=k530k3oo&sh=f6cd20fd)\n- [https:\u002F\u002Fgradio.app\u002Fg\u002FAK391\u002FDialoGPT](https:\u002F\u002Fgradio.app\u002Fg\u002FAK391\u002FDialoGPT) A 3rd party demo by [AK391](https:\u002F\u002Fgithub.com\u002FAK391) using Gradio [web demo](https:\u002F\u002Fgradio.app\u002Fg\u002FAK391\u002FDialoGPT)  \n\n## Models\n\nWe release 6 fine-tuned models which can be further fine-tuned on low-resource  user-customized dataset. The total parameters in these models range from 117M to 762M, in accord with OpenAI GPT-2 model sizes.   \n\n| Model           |  Fine-tuned from GPT-2| Trained from scratch\n|----------------------|--------|--------|\n| DialoGPT 762M model| [\\[link\\]](https:\u002F\u002Facvrpublicycchen.blob.core.windows.net\u002Fdialogpt\u002Fmultiref\u002Flarge_ft.pkl) [\\[huggingface model card\\]](https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FDialoGPT-large)  | [\\[link\\]](https:\u002F\u002Facvrpublicycchen.blob.core.windows.net\u002Fdialogpt\u002Fmultiref\u002Flarge_fs.pkl) |\n| DialoGPT 345M model| [\\[link\\]](https:\u002F\u002Facvrpublicycchen.blob.core.windows.net\u002Fdialogpt\u002Fmultiref\u002Fmedium_ft.pkl) [\\[huggingface model card\\]](https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FDialoGPT-medium) | [\\[link\\]](https:\u002F\u002Facvrpublicycchen.blob.core.windows.net\u002Fdialogpt\u002Fmultiref\u002Fmedium_fs.pkl) | \n| DialoGPT 117M model| [\\[link\\]](https:\u002F\u002Facvrpublicycchen.blob.core.windows.net\u002Fdialogpt\u002Fmultiref\u002Fsmall_ft.pkl) [\\[huggingface model card\\]](https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FDialoGPT-small)| [\\[link\\]](https:\u002F\u002Facvrpublicycchen.blob.core.windows.net\u002Fdialogpt\u002Fmultiref\u002Fsmall_fs.pkl) | \n| DialoGPT 345M model (reverse, for MMI)| 
[link](https:\u002F\u002Facvrpublicycchen.blob.core.windows.net\u002Fdialogpt\u002Fmultiref\u002Fsmall_reverse.pkl) | -| \n| [DialogRPT](https:\u002F\u002Fgithub.com\u002Fgolsun\u002FDialogRPT) (**new** ranking models) | [link](https:\u002F\u002Fgithub.com\u002Fgolsun\u002FDialogRPT) | -| \n\n\nThe model files can be loaded exactly like the GPT-2 model checkpoints from Huggingface's [Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers). You can find the corresponding configuration files (`merges.txt`, `config.json`, `vocab.json`) in DialoGPT's repo under `.\u002Fconfigs\u002F*`.\n\nThe reverse model predicts the source from the target and is used for MMI reranking. \n\nThe [DialogRPT](https:\u002F\u002Fgithub.com\u002Fgolsun\u002FDialogRPT) models are our recently proposed ranking models, used to predict the human feedback (upvotes, replies) on responses. These models can be used to improve the quality of DialoGPT generation (see our [EMNLP paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2009.06978) for details).\n\n## Retraining full models\n\n### Data Preparation\n\nThe first step to retrain the full models is to generate the aforementioned 27GB Reddit dataset. This involves downloading the full Reddit submission and comment dumps from [https:\u002F\u002Ffiles.pushshift.io\u002Freddit](https:\u002F\u002Ffiles.pushshift.io\u002Freddit) and creating intermediate files, which overall requires **700GB of local disk space**. Downloading and processing the full data takes about 1-2 days, depending on your (CPU) compute capabilities (e.g., ~24 hours with 8 cores on a recent computer). Assuming you ran the above setup and installation steps (conda activate LSP, etc.), you can create the full dataset by running either:\n\n```\npython demo.py --data full\n```\nor\n```\ncd reddit_extractor; SIZE=full make -j 8; cd ..\n```\n\nThe former command calls the latter, so the two methods are equivalent. 
We recommend the former, as the latter is mostly useful if you run into a problem or want to customize any arguments (e.g., the `make` command lets you build only a subset of the data). Note that the downloading phase can be error-prone, for example depending on your geolocation (firewall, etc.). If the above commands fail to generate `data\u002Ftrain.tsv`, or if that file is not anywhere close to 27GB, something went wrong. In that case, you may want to inspect `reddit_extractor\u002Fwget-log` and `reddit_extractor\u002Flogs\u002F*.log` for any obvious errors (e.g., wget unable to download from pushshift.io). If the error messages don't make sense to you, feel free to contact us; please be sure to include any error messages gathered from these log files.\n\nTraining data statistics: the generated training tsv file should be roughly 26.8 GB uncompressed, with 146.8M training instances, 3.87B source tokens, and 2.14B target tokens (including utterance-level 0\u002F1 weights). The resulting train.tsv file should contain 146,846,215 lines.\n\n\n### Training\n\nWe recommend generating the above data with `demo.py --data full`, as it (1) generates the data, (2) converts it into DB format, and (3) trains a model using `python LSP_train.py`. Please edit `demo.py` directly if you want to customize any of the hyperparameters.\n\n\n## Evaluations\n\n#### DSTC-7 challenge\n\nOur model achieved state-of-the-art results in the [DSTC-7 Challenge response generation task](https:\u002F\u002Fgithub.com\u002Fmgalley\u002FDSTC7-End-to-End-Conversation-Modeling). \n\n\n| Experiment         | NIST2 | NIST4 | BLEU2  | BLEU4 | METEOR | ENT-4 | DIST-1 | DIST-2 | Avg. 
Len |\n|--------------------|-------|-------|--------|-------|--------|----------|------------|------------|---------|\n| Human response     | 2.62  | 2.65  | 12.35% | 3.13% | 8.31%  | 10.45    | 16.66%     | 67.01%     | 18.8    |\n| DSTC-7 Winner      | 2.51  | 2.52  | 14.35% | 1.83% | 8.07%  | 9.03     | 10.89%     | 32.49%     | 15.1    |\n| DialoGPT 345M      | 2.80  | 2.82  | 14.16% | 2.31% | 8.51%  | **10.08**    | 9.13%      | 39.73%     | 16.9    |\n| DialoGPT 345M (BS) | **2.92**  | **2.97**  | **19.18%** | **6.05%** | **9.29%**  | 9.57     | **15.73%**     | **51.03%**     | 14.2    |\n\nwhere ENT represents the [Entropy score](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.05972), and DIST represents the [Distinct score](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1510.03055.pdf). For all metrics except the average length, larger are better.  \n\n\u003C!--| Experiment           | NIST1  | NIST2  | NIST3  | NIST4  | BLEU1  | BLEU2  | BLEU3  | BLEU4  | METEOR | ENT-1 | ENT-2 | ENT-3 | ENT-4 | DIST-1 | DIST-2 | Len |\n|----------------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|----------|----------|----------|----------|------------|------------|---------|\n| Human                | 2.4237 | 2.6244 | 2.6472 | 2.65   | 0.3408 | 0.1235 | 0.0572 | 0.0313 | 0.0831 | 6.5893   | 9.7423   | 10.4101  | 10.4450  | 0.1666     | 0.6701     | 18.7568 |\n| DSTC-7 Winner | 2.3408 | 2.5102 | 2.522  | 2.523  | 0.4122 | 0.1435 | 0.0501 | 0.0183 | 0.0807 | 5.3832   | 7.6065   | 8.5304   | 9.0298   | 0.1089     | 0.3249     | 15.1327 |\n| DialoGPT           | 2.5863 | 2.804  | 2.823  | 2.8246 | 0.3927 | 0.1416 | 0.0555 | 0.0231 | 0.0851 | 5.5791   | 8.5109   | 9.6872   | 10.0765  | 0.0913     | 0.3973     | 16.9484 |\n| DialoGPT(beam search)       | **2.5943**| **2.9163** | **2.9624** | **2.9681**| **0.4238** | **0.1918** | **0.1027** | **0.0605** | **0.0929** | **6.0815**   | **8.7379**   | 9.4037   | 9.5697   | 0.1573     | 0.5103  
   | 14.1603 |-->\n\nNote that superior automatic evaluation compared to human responses does not necessarily imply that our model achieves human parity. Please check out our paper for a more detailed analysis.\n\n\nTo fine-tune the `345M` DialoGPT model on the DSTC-7 challenge data on a server with 8 V100 GPUs, please run the following command line (the DSTC data can be found at the [DSTC-7 repo](https:\u002F\u002Fgithub.com\u002Fmgalley\u002FDSTC7-End-to-End-Conversation-Modeling)): \n\n```bash\npython3 -m torch.distributed.launch --nproc_per_node=8 train_LSP.py --init_checkpoint .\u002Fmodels\u002Fmedium\u002Fmedium_ft.pkl --train_input_file .\u002Fdata\u002FDSTC_train.db --eval_input_file .\u002Fdata\u002FDSTC_valid.tsv --model_name_or_path .\u002Fmodel\u002Fmedium\u002F --learning_rate 1e-4  --train_batch_size 64 --eval_batch_size 64 --no_token_id\n```\n\nThe trained model can be found at [DSTC medium model](https:\u002F\u002Facvrpublicycchen.blob.core.windows.net\u002Fdialogpt\u002FDSTC\u002Fmedium_ft.pkl).\n\n\n#### Evaluation\n\n1. Please **download** the following 3rd-party packages and save them into the empty folder `3rdparty`:\n\t* [**mteval-v14c.pl**](https:\u002F\u002Fgoo.gl\u002FYUFajQ) to compute [NIST](http:\u002F\u002Fwww.mt-archive.info\u002FHLT-2002-Doddington.pdf). You may need to install the following [perl](https:\u002F\u002Fwww.perl.org\u002Fget.html) modules (e.g. via `cpan install`): XML::Twig, Sort::Naturally and String::Util.\n\t* [**meteor-1.5**](http:\u002F\u002Fwww.cs.cmu.edu\u002F~alavie\u002FMETEOR\u002Fdownload\u002Fmeteor-1.5.tar.gz) to compute [METEOR](http:\u002F\u002Fwww.cs.cmu.edu\u002F~alavie\u002FMETEOR\u002Findex.html). It requires [Java](https:\u002F\u002Fwww.java.com\u002Fen\u002Fdownload\u002Fhelp\u002Fdownload_options.xml).\n\n\n2. 
Please follow the [DSTC-7 official repo](https://github.com/mgalley/DSTC7-End-to-End-Conversation-Modeling/tree/master/data_extraction) to extract the data, and put `data-official-test/test.refs.txt` into the `./dstc/data/` folder.\n\n3. Run the extraction script below to produce the human response hypothesis file `human.resp.txt`:\n\n\t```bash\n\tpython extract_human.py\n\t```\n\n4. Finally, to reproduce the results of the human hypothesis on the DSTC dataset, please run the following commands under the repo folder:\n\n\t```bash\n\tpython batch_eval.py\n\t```\n\nThe evaluation results will be generated in the folder `./dstc/eval/`\n\n\n## 6K multi-ref dataset result\n\n### Automatic evaluation\n\nWe test on the 6K multi-ref dataset from Reddit. The results are summarized below:\n\n| Experiment         | NIST2 | NIST4 | BLEU2  | BLEU4 | METEOR | ENT-4 | DIST-1 | DIST-2 | Avg. Len |\n|--------------------|-------|-------|--------|-------|--------|----------|------------|------------|---------|\n| Human response     | 3.41  | 4.25  | 17.90% | 7.48% | 10.64% | 11       | 14.50%     | 63.00%     | 13.1    |\n| DialoGPT 117M      | 2.39  | 2.41  | 10.54% | 1.55% | 7.53%  | 10.78    | 8.60%      | 39.90%     | 12.8    |\n| DialoGPT 345M      | 3     | 3.06  | 16.96% | 4.56% | 9.81%  | 9.13     | 6.80%      | 26.30%     | 12.2    |\n| DialoGPT 762M      | 2.84  | 2.9   | 18.66% | 5.25% | 9.66%  | 9.72     | 7.76%      | 29.93%     | 11.2    |\n| DialoGPT 345M (BS) | **3.4**  | **3.5**   | **21.76%** | **7.92%** | 10.74%  | 10.48     | **12.38%**     | **48.74%**    | 11.3    |\n| DialoGPT 345M (w/MMI)| 3.28  | 3.33 | 15.68% | 3.94% | **11.23%**  | **11.25**     | 9.39%    | 45.55%   | 17.2    |\n\n### \u003Ca name=\"human_eval\">\u003C/a>Human evaluation \n\nWe further conduct human evaluations (6K examples for each method; each example is evaluated by 3 human judges). 
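The ENT and DIST columns in the automatic-evaluation tables are corpus-level n-gram statistics, so they are easy to sanity-check by hand. Below is a minimal sketch of Distinct-n and n-gram entropy; whitespace tokenization and pooling n-grams over all hypothesis lines are assumptions here, so the numbers will not exactly match the official evaluation scripts:

```python
from collections import Counter
from math import log

def ngram_counts(hyps, n):
    # Pool n-gram counts over all hypothesis lines.
    counts = Counter()
    for hyp in hyps:
        toks = hyp.split()
        for i in range(len(toks) - n + 1):
            counts[tuple(toks[i:i + n])] += 1
    return counts

def distinct_n(hyps, n):
    # DIST-n: number of unique n-grams / total number of n-grams.
    counts = ngram_counts(hyps, n)
    total = sum(counts.values())
    return len(counts) / total if total else 0.0

def entropy_n(hyps, n):
    # ENT-n: Shannon entropy (in nats) of the pooled n-gram distribution.
    counts = ngram_counts(hyps, n)
    total = sum(counts.values())
    return -sum(c / total * log(c / total) for c in counts.values())

hyps = ['i am not sure', 'i am sure', 'that is great']
print(distinct_n(hyps, 1))  # 0.7 (7 unique unigrams out of 10)
```

For instance, DIST-1 divides the number of unique unigrams by the total unigram count, which is why short, repetitive responses drive the score down.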
The results show strong evidence that our generation quality approaches the quality of real human responses under this non-interactive Turing test:\n\n\n*Relevance*: A and B, which one is more relevant to the source prompt?\n\n| System A | A Wins (%) | Ties (%) | B Wins (%) | System B|\n|--------------------|-------|-------|--------|-------|\n|DialoGPT 345M|2671      (45%)   | 513         (9%) |   2816       (47%)| Human responses|\n|DialoGPT 345M| 3281       (72%)|    394         (9%)  |  882         (19%)| [PersonalityChat](https://docs.microsoft.com/en-us/azure/cognitive-services/project-personality-chat/overview)|\n|DialoGPT 345M w/ MMI| **2871**     (48%)|    522         (9%)  |  2607      (43%)| Human responses|\n\n*Informativeness*: A and B, which one is more contentful and informative? \n\n| System A | A Wins (%) | Ties (%) | B Wins (%) | System B|\n|--------------------|-------|-------|--------|-------|\n|DialoGPT 345M| 2722       (45%) |  234         (4%) |  3044       (51%)| Human responses|\n|DialoGPT 345M|3490       (77%) |   206         (5%)  |  861         (19%)| [PersonalityChat](https://docs.microsoft.com/en-us/azure/cognitive-services/project-personality-chat/overview)|\n|DialoGPT 345M w/ MMI| **3011**       (50%)|    234        (4%)  |  2755       (46%)| Human responses|\n\n\n*Human-Like*: A and B, which one do you think is more likely to be generated by a human?\n\n| System A | A Wins (%) | Ties (%) | B Wins (%) | System B|\n|--------------------|-------|-------|--------|-------|\n|DialoGPT 345M|2716       (45%)  | 263         (4%)  | 3021       (50%)| Human responses|\n|DialoGPT 345M|3462       (76%) |  196         (4%)  | 899         (20%)| [PersonalityChat](https://docs.microsoft.com/en-us/azure/cognitive-services/project-personality-chat/overview)|\n|DialoGPT 345M w/ MMI| **2978**      (50%)|    241         
(4%)  |  2781        (46%)| Human responses|\n\n\nPlease see full details in our [arxiv paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.00536). \n\n\n\n\n\u003C!--Relevance\nSystem Wins      (%)         Ties        (%)         Losses   (%)\n2 vs 1     2671       (0.45)    513         (0.09)    2816       (0.47)\n2 vs 3     3281       (0.72)    394         (0.09)    882         (0.19)\n2 vs 4     2379       (0.40)    527         (0.09)    3094       (0.52)\n2 vs 5     3019       (0.50)    581         (0.10)    2400       (0.40)\n2 vs 6     2726       (0.45)    576         (0.10)    2698       (0.45)\n \nInformativeness\nSystem Wins      (%)         Ties        (%)         Losses   (%)\n2 vs 1     2722       (0.45)    234         (0.04)    3044       (0.51)\n2 vs 3     3490       (0.77)    206         (0.05)    861         (0.19)\n2 vs 4     2474       (0.41)    257         (0.04)    3269       (0.54)\n2 vs 5     3230       (0.54)    362         (0.06)    2408       (0.40)\n2 vs 6     2856       (0.48)    303         (0.05)    2841       (0.47)\n \nHuman-Like\nSystem Wins      (%)         Ties        (%)         Losses   (%)\n2 vs 1     2716       (0.45)    263         (0.04)    3021       (0.50)\n2 vs 3     3462       (0.76)    196         (0.04)    899         (0.20)\n2 vs 4     2478       (0.41)    289         (0.05)    3233       (0.54)\n2 vs 5     3233       (0.54)    340         (0.06)    2427       (0.40)\n2 vs 6     2847       (0.47)    321         (0.05)    2832       (0.47)\n--> \n\n\n\u003C!--| Experiment                   | NIST1 | NIST2 | NIST3 | NIST4 | BLEU1  | BLEU2  | BLEU3  | BLEU4 | METEOR | ENT-4 | DIST-1 | DIST-2 |\n|------------------------------|-------|-------|-------|-------|--------|--------|--------|-------|--------|----------|------------|------------|\n| Human response               | 2.99  | 3.41  | 3.83  | 4.25  | 39.61% | 17.90% | 10.71% | 7.48% | 10.64% | 11       | 14.50%     | 63.00%     |\n| DialoGPT 117M      | 2.25  | 2.39  
| 2.41  | 2.41  | 35.43% | 10.54% | 3.85%  | 1.55% | 7.53%  | 10.78    | 8.60%      | 39.90%     |\n| DialoGPT 345M      |**2.67** | **3**     | **3.05**  | **3.06**  | 40.97% | 16.96% | 8.31%  | 4.56% | **9.81%**  | 9.13     | 6.80%      | 26.30%     |\n| DialoGPT 762M      | 2.51  | 2.84  | 2.89  | 2.9   | **44.45%** | **18.66%** | **9.32%**  | **5.25%** | 9.66%  | **9.72**     | **7.76%**      | **29.93%**    |-->\n\n\u003C!--| DialoGPT 762M  (beam search)| 2.51  | 2.9   | 2.97  | 2.98  | 44.84% | 21.08% | 12.01% | 7.57% | 10.11% | 10.06    | 11.62%     | 44.07%     |-->\n\u003C!--## ConvAI challenge -->\n\u003C!--| DialoGPT 345M (beam search)  | 2.94  | 3.4   | 3.49  | 3.5   | 45.27% | 21.76% | 12.51% | 7.92% | 10.74% | 10.48    | 12.38%     | 48.74%     |-->\n## Generated Examples\n* Multi-turn generation examples from an interactive environment:\n\n|Role | Response |\n|---------|--------|\n|User | Does money buy happiness? |\n| Bot |  Depends how much money you spend on it .|\n|User | What is the best way to buy happiness ? |\n| Bot | You just have to be a millionaire by your early 20s, then you can be happy . |\n|User |This is so difficult ! |\n| Bot | You have no idea how hard it is to be a millionaire and happy . There is a reason the rich have a lot of money |\n\n* Multi-turn  self-playing  dialogue with user prompt\n\n|Role | Response |\n|---------|--------|\n|User | what is the meaning of life ?|\n| Bot | The  meaning  is  to  be  with  your  family  and friends .|\n|Bot | I’m going to guess :  It means that your parents and friends have loved you . That’s the meaning , I believe . |\n| Bot | I think  the  meaning  is  to  make  sure  you’re alive when you die . So you’ll be with your family , friends , and yourself .|\n|Bot |It’s not , but thanks for trying to help . 
|\n\n\n\n\n\n* Commonsense question answering examples\n\n|User | Bot |\n|---------|--------|\n|who is the first president of the United States | George Washington |\n|what is the boiling point of water? | I think it's about 212 F. |\n|which one is bigger, sun or moon? | The sun . |\n|which animal has black and white stripes? | A black and white striped cat . |\n\n* Handling context that does not appear in training examples (note that training examples are sampled from 2005 to 2017)\n\n|User | Bot |\n|---------|--------|\n|The trading war between China and US is still happening . | I'm sure the Chinese government and the U . S . will be fine with that as long as they keep their arms on the table and the money flowing . |\n|who won the world cup in 2018 ? | Germany , I believe . |\n|Nvidia's Titan RTX is really good . | I have this card and i love it . No complaints . |\n|Can Porsche beat Tesla with its new Taycan EV ? | I don't think so . The Tesla is more powerful than the porsche .\n\n* Our bot's response to philosophy questions\n\n|User | Bot |\n|---------|--------|\n|What is the meaning of a good life ? | I think it's that we're all connected to our past lives and the meaning of life is to live the way you want to and how you want to . |\n|Which is more important -- being successful or being happy ? | Being successful is just endless . Being happy is the only thing that matters . |\n|Is peace the only way to stop war ? |  Not if the world is not willing to work with you . Peace is the ultimate goal , but peace requires more than that . |\n| How much freedom should people have ? |  The right amount of freedom is the freedom to do as you please , as long as you don't hurt people or property .|\n\n\n* Finally  \n\n|User | Bot |\n|---------|--------|\n|What do you think of the DialoGPT repo ? | It's pretty good , but it's very easy to find a bug . 
|\n\nPlease open an issue if you spot any :)\n\n\n\n## Try our system\nAccess to the live demo and decoding script is currently by invitation only. Please stay tuned for the full release. \n\n\n\u003C!--The live demo and decoding script access is upon approval request. Please apply [here](https://docs.google.com/spreadsheets/d/1epTNfaqva1isVO_o9pbyhVLsnzDn58dGkcLB0OUVcqs/edit?usp=sharing)-->\n\n\u003C!--This model should give a Hits@1 over 79, perplexity of 20.5 and F1 of 16.5 using the convai2 evaluation script (see below).\n\nThese numbers are slightly lower than the number we obtained in the ConvAI2 competition. Here is what you can tweak to reach the same results:\n\n- in the ConvAI2 competition we also used tweaked position embeddings so that the history of the dialog always starts with the same embeddings. This is easy to add with pytorch-pretrained-bert and should improve the hits@1 metric.\n- in the ConvAI2 competition we used a beam search decoder. While the results are better in terms of f1 metric, our feeling is that the human experience is less compelling with beam search versus the nucleus sampling decoder which is provided in the present repository.-->\n\n\u003C!--## Using the interaction script\n\nThe training script saves all the experiments and checkpoints in a sub-folder named with the timestamp of the experiment in the `./runs` folder of the repository base folder.\n\nYou can then use the interactive script to interact with the model simply by pointing to this folder.\n\nHere is an example command line to run the interactive script:\n\n```bash\npython ./interact.py --model_checkpoint ./data/Apr17_13-31-38_thunder/  # run the interactive script with a training checkpoint\npython ./interact.py  # run the interactive script with the finetuned model on our S3\n```\n\nThe fine-tuned model will give FINAL Hits@1: 0.715\n\nThe interactive script accepts a few arguments to tweak the decoding 
algorithm:\n\nArgument | Type | Default value | Description\n---------|------|---------------|------------\ndataset_path | `str` | `\"\"` | Path or url of the dataset. If empty download from S3.\ndataset_cache | `str` | `'.\u002Fdataset_cache.bin'` | Path or url of the dataset cache\nmodel | `str` | `\"openai-gpt\"` | Path, url or short name of the model\nmax_history | `int` | `2` | Number of previous utterances to keep in history\ndevice | `str` | `cuda` if `torch.cuda.is_available()` else `cpu` | Device (cuda or cpu)\nno_sample | action `store_true` | Set to use greedy decoding instead of sampling\nmax_length | `int` | `20` | Maximum length of the output utterances\nmin_length | `int` | `1` | Minimum length of the output utterances\nseed | `int` | `42` | Seed\ntemperature | `int` | `0.7` | Sampling softmax temperature\ntop_k | `int` | `0` | Filter top-k tokens before sampling (`\u003C=0`: no filtering)\ntop_p | `float` | `0.9` | Nucleus filtering (top-p) before sampling (`\u003C=0.0`: no filtering)\n\n## Running ConvAI2 evaluation scripts\n\nTo run the evaluation scripts of the ConvAI2 challenge, you first need to install `ParlAI` in the repo base folder like this:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FParlAI.git\ncd ParlAI\npython setup.py develop\n```\n\nYou can then run the evaluation script from `ParlAI` base folder:\n\n```bash\ncd ParlAI\npython ..\u002Fconvai_evaluation.py --eval_type hits@1  # to download and evaluate our fine-tuned model on hits@1 metric\npython ..\u002Fconvai_evaluation.py --eval_type hits@1  --model_checkpoint .\u002Fdata\u002FApr17_13-31-38_thunder\u002F  # to evaluate a training checkpoint on hits@1 metric\n```\n\nThe evaluation script accept a few arguments to select the evaluation metric and tweak the decoding algorithm:\n\nArgument | Type | Default value | Description\n---------|------|---------------|------------\neval_type | `str` | `\"hits@1\"` | Evaluate the model on `hits@1`, `ppl` or 
`f1` metric on the ConvAI2 validation dataset\nmodel | `str` | `\"openai-gpt\"` | Path, url or short name of the model\nmax_history | `int` | `2` | Number of previous utterances to keep in history\ndevice | `str` | `cuda` if `torch.cuda.is_available()` else `cpu` | Device (cuda or cpu)\nno_sample | action `store_true` | Set to use greedy decoding instead of sampling\nmax_length | `int` | `20` | Maximum length of the output utterances\nmin_length | `int` | `1` | Minimum length of the output utterances\nseed | `int` | `42` | Seed\ntemperature | `int` | `0.7` | Sampling softmax temperature\ntop_k | `int` | `0` | Filter top-k tokens before sampling (`\u003C=0`: no filtering)\ntop_p | `float` | `0.9` | Nucleus filtering (top-p) before sampling (`\u003C=0.0`: no filtering)\n\n-->\n\n## Related Projects\n\n* RetGen: [https://github.com/dreasysnail/RetGen](https://github.com/dreasysnail/RetGen). Retrieval-augmented/grounded DialoGPT and beyond. RetGen is a joint training framework that simultaneously optimizes a dense passage retriever and a knowledge-grounded text generator in an end-to-end fashion. \n\n* Microsoft ICECAPS: [https://github.com/microsoft/icecaps](https://github.com/microsoft/icecaps). \n\n\tAs an orthogonal repository to this project, \n\tMicrosoft Icecaps is an open-source toolkit (in TensorFlow) for building neural conversational systems. Icecaps provides an array of tools from recent conversation modeling and general NLP literature within a flexible paradigm that enables complex multi-task learning setups. \n\n* Pretrained UniLM: [https://github.com/microsoft/unilm](https://github.com/microsoft/unilm)\n* MT-DNN: [https://github.com/namisan/mt-dnn](https://github.com/namisan/mt-dnn)\n* A Chinese counterpart of DialoGPT by yangjianxin1. 
[https://github.com/yangjianxin1/GPT2-chitchat](https://github.com/yangjianxin1/GPT2-chitchat). We are glad to see that the MMI strategy we used in DialoGPT has improved the performance of this project as well!\n\n## Contact\n\nPlease contact [DialoGPT@microsoft.com](mailto:DialoGPT@microsoft.com) if you have any questions/suggestions. However, responses may be sporadic; please expect delays.\n\n## Contributing\n\nThis project welcomes contributions and suggestions.  Most contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.\n\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide\na CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\nprovided by the bot. You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\nFor more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\n## Disclaimer\n\nThis repository aims to facilitate research in large-scale pretraining for conversational data. This toolkit contains only part of the modeling machinery needed to actually produce a model weight file in a running dialog. On its own, this model provides only information about the weights of various text spans; in order for a researcher to actually use it, they will need to bring conversational data of their own and decode the response generation from the pretrained system. 
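One decoding recipe referenced throughout this README is MMI reranking: sample several candidate responses from the forward model, then rescore them with the reverse model P(source | response) so that bland, fits-anything replies are demoted. A minimal, model-agnostic sketch follows; the scoring callables and the weight `lam` are illustrative stand-ins, not values from the paper, and in practice the scores would be log-probabilities from the forward and reverse DialoGPT checkpoints:

```python
def mmi_rerank(candidates, forward_logp, reverse_logp, lam=0.5):
    '''Rerank sampled hypotheses by log P(hyp|src) + lam * log P(src|hyp).

    The reverse term penalizes generic responses that are plausible
    continuations of almost any source utterance.
    '''
    def score(hyp):
        return forward_logp(hyp) + lam * reverse_logp(hyp)
    return max(candidates, key=score)

# Toy illustration with hand-assigned scores: the bland response is the
# forward model's favourite, but the reverse model demotes it.
forward = {'i do not know .': -1.0, 'george washington .': -1.5}.__getitem__
reverse = {'i do not know .': -9.0, 'george washington .': -2.0}.__getitem__
best = mmi_rerank(['i do not know .', 'george washington .'], forward, reverse)
print(best)  # george washington .
```

The reverse model released in the Models section above is exactly the component that would supply the `reverse_logp` scores in a real pipeline.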
Microsoft is not responsible for any generation from third-party utilization of the pretrained system. \n\n\n\n## Citation\nIf you use this code in your research, you can cite our [arxiv paper](https://arxiv.org/abs/1911.00536):\n```bibtex\n@inproceedings{zhang2019dialogpt,\n    title={DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation},\n    author={Yizhe Zhang and Siqi Sun and Michel Galley and Yen-Chun Chen and Chris Brockett and Xiang Gao and Jianfeng Gao and Jingjing Liu and Bill Dolan},\n    year={2020},\n    booktitle={ACL, system demonstration}\n}\n```\n\n\n\n","# A State-of-the-Art Large-Scale Pretrained Response Generation Model (DialoGPT)\n\n## This project page is no longer maintained, as DialoGPT has been superseded by [GODEL](https://github.com/microsoft/GODEL). Based on the results in [this paper](https://arxiv.org/pdf/2206.11309.pdf), GODEL outperforms DialoGPT. Unless you need DialoGPT for reproducibility purposes, we highly recommend switching to [GODEL](https://github.com/microsoft/GODEL).\n\nThis repository contains the source code and a trained model for a large-scale pretrained dialogue response generation model. The [human evaluation results](#human_eval) indicate that the responses generated by DialoGPT are comparable in quality to human responses under a single-turn conversation Turing test.\n\n\u003C!--Please see our [project page](https://www.microsoft.com/en-us/research/project/large-scale-pretraining-for-response-generation/) for more details-->\n\nThe repository is based on [huggingface pytorch-transformer](https://github.com/huggingface/transfer-learning-conv-ai) and [OpenAI 
GPT-2](https://github.com/openai/gpt-2), and contains the data extraction scripts, model training code, and pretrained small (117M), medium (345M) and large (762M) model checkpoints.\n\nThe model is trained on 147M multi-turn dialogues from Reddit discussion threads. The largest model can be trained within several hours on a machine with 8 V100 GPUs (though this is not required), with distributed training and FP16 support.\n\nThe included scripts can be used to reproduce the results on the DSTC-7 grounded dialogue generation challenge and on a 6K multi-reference dataset created from Reddit data.\n\nProject webpage: [https://www.microsoft.com/en-us/research/project/large-scale-pretraining-for-response-generation/](https://www.microsoft.com/en-us/research/project/large-scale-pretraining-for-response-generation/)\n\nArXiv paper: [https://arxiv.org/abs/1911.00536](https://arxiv.org/abs/1911.00536)\n\n\n## News ##\n\n***(Update 07/09/2022) Changes on the files.pushshift.io/reddit server broke our data generation pipeline. These issues have now been fixed, and the steps described in the “Data Preparation” subsection below should work again. Generating the data takes about 10 hours with 8 processes (`-j 8`) and requires 800GB of temporary disk space.***\n\n***(Update 06/23/2021) We released a retrieval-augmented/grounded version of DialoGPT (RetGen); please check out the [RetGen repo](https://github.com/dreasysnail/RetGen) and the [RetGen paper](https://arxiv.org/abs/2105.06597)***\n\n***(Update 05/20/2021) A great [video tutorial](https://www.youtube.com/watch?v=Zo679MYoJns) on DialoGPT by [Prakhar Mishra](http://wsl.iiitb.ac.in/prakhar-mishra/) is available on YouTube***\n\n***(Update 03/31/2021) Third-party developer [AK391](https://github.com/AK391) built a [web demo](https://gradio.app/g/AK391/DialoGPT) with Gradio; come and try it out***\n\n\n***(Update 09/15/2020) A set of large-scale [dialog ranking models](https://github.com/golsun/DialogRPT) has been released!***\n\nDialoGPT generation is further improved by integrating it with our latest dialog ranking model, [DialogRPT](https://github.com/golsun/DialogRPT).\n\n***(Update 07/08/2020) The 6K multi-ref test set has been released!***\n\nTo generate the data, please run `demo.py` with the data option set to ‘full’; the generated 6K multi-ref test set will be located at:\n\n`./data/test.refs.txt`\n\n***(Update 03/10/2020) Model cards are available in Huggingface Transformers!***\n\nPlease check out our model cards in the Huggingface 
Transformers repo. With only a few lines of code, it is very convenient to chat with DialoGPT interactively.\n\n[Small model: https://huggingface.co/microsoft/DialoGPT-small](https://huggingface.co/microsoft/DialoGPT-small)\n\n[Medium model: https://huggingface.co/microsoft/DialoGPT-medium](https://huggingface.co/microsoft/DialoGPT-medium)\n\n[Large model: https://huggingface.co/microsoft/DialoGPT-large](https://huggingface.co/microsoft/DialoGPT-large)\n\n[**(NEW)** Ranking model: https://huggingface.co/microsoft/DialogRPT-updown](https://huggingface.co/microsoft/DialogRPT-updown?text=I+love+NLP%21+%3C%7Cendoftext%7C%3E+Me+too%21)\n\n\n***(Update 01/06/2020) Some third-party decoding script implementations:***\n\n- [https://github.com/polakowo/gpt2bot](https://github.com/polakowo/gpt2bot) GPT2Bot for Telegram by polakowo, [ref](https://github.com/microsoft/DialoGPT/issues/3#issuecomment-573904419)\n- [https://colab.research.google.com/drive/1PslHE4Rl4RqSa20s7HEp0ZKITBir6ezE](https://colab.research.google.com/drive/1PslHE4Rl4RqSa20s7HEp0ZKITBir6ezE) Colab interactive notebook by qywu, [ref](https://github.com/microsoft/DialoGPT/issues/3#issuecomment-551410203)\n- [https://github.com/andreamad8/DialoGPT2-Interact](https://github.com/andreamad8/DialoGPT2-Interact) an interactive script for multi-turn chatbots by andreamad8, [ref](https://github.com/microsoft/DialoGPT/issues/3#issuecomment-551450016)\n- [https://github.com/LHolten/DialoGTP-MMI-decoder](https://github.com/LHolten/DialoGTP-MMI-decoder) MMI decoder by LHolten, [ref](https://github.com/microsoft/DialoGPT/issues/3#issuecomment-558318401)\n- 
[https://colab.research.google.com/drive/1-_KjlAV3J1IVDw_9KogjKDCzgFY7Jp7E](https://colab.research.google.com/drive/1-_KjlAV3J1IVDw_9KogjKDCzgFY7Jp7E) Colab interactive notebook by illuminascent@Reddit, [ref](https://www.reddit.com/r/MachineLearning/comments/dt5woy/p_dialogpt_state_of_the_art_conversational_model/?st=k530k3oo&sh=f6cd20fd)\n- [https://colab.research.google.com/drive/15wa925dj7jvdvrz8_z3vU7btqAFQLVlG](https://colab.research.google.com/drive/15wa925dj7jvdvrz8_z3vU7btqAFQLVlG) a great tutorial by Rostyslav Neskorozhenyi (LinkedIn: slanj) on fine-tuning DialoGPT to build a custom bot. [ref](https://towardsdatascience.com/make-your-own-rick-sanchez-bot-with-transformers-and-dialogpt-fine-tuning-f85e6d1f4e30) \n- [https://gradio.app/g/AK391/DialoGPT](https://gradio.app/g/AK391/DialoGPT) a [web demo](https://gradio.app/g/AK391/DialoGPT) built with Gradio by third-party developer [AK391](https://github.com/AK391)  \n\n\u003C!--**This GitHub repository will be updated soon. Please stay tuned.**-->\n\u003C!--## Minimum computational configurations-->\n\n## Recommended Configuration\n\n- Linux Ubuntu 16.04\n- GPU with at least 12G memory\n\nDialoGPT was developed entirely on **Ubuntu 16.04**. Depending on our resources, we will try to provide support if you have difficulty running the code on the same configuration; however, we are **unable to provide support for other distributions or operating systems**. Some of the code may run on other UNIX-like systems (macOS, Windows Subsystem for Linux, Cygwin, etc.), but Ubuntu is recommended for the main training code.\n\nThe training code can run on CPU, but it will be slow. We recommend using GPUs to train and fine-tune all models. There is no minimum requirement on the number of GPUs. However, with distributed training across multiple GPUs, the speedup scales roughly sub-linearly with the number of GPUs. To simulate the same batch size on fewer GPUs, please increase the `gradient_accumulation_steps` argument during model training.\n\nThe 117M and 345M models can be loaded on a single GPU with 12G memory, while the 762M model requires a single GPU with more than 16G memory for efficient training. Below is a comparison of training speed on V100 GPUs, on a benchmark dataset of 50M training instances:\n\n| n\\_gpu           | Hours per epoch | Tokens per second |\n|----------------------|--------|--------|\n| 1              | 118 | 10847 |\n| 2              | 62 | 20645 |\n| 4              | 34 | 37647 |\n| 8              | 18 | 71356 |\n\nStarting from our pretrained model, fine-tuning on a new dataset typically takes only 1–2 epochs.\n\n## Setup & Installation (in brief)\n\nWe have created a demo script 
`demo.py` to make it easier to deploy this system. `demo.py` contains a full pipeline: **model downloading**, data extraction, data preprocessing and model training on a sample dataset, all with a single command.\n\n\n\n#### Train model with Conda environment\n\nPlease use the commands below to clone the code, install the requirements and activate the Conda environment (note that the Nvidia CUDA 10.0 developer toolkit is required):\n\n\n```bash\nsudo apt-get install -y make wget gzip bzip2 xz-utils zstd sed\n```\n\n```bash\ngit clone https://github.com/microsoft/DialoGPT.git\ncd DialoGPT\nconda env create -f LSP-linux.yml -n LSP\nconda activate LSP\n```\n\nIf you run this on an architecture other than Linux, please use `LSP-generic.yml` instead of `LSP-linux.yml`; note, however, that the generic version has not been tested on all platforms, so stability is not guaranteed. To train with fp16 precision, please install Apex with the following commands:\n\n```bash\nconda activate LSP\ngit clone https://github.com/NVIDIA/apex\ncd apex\ngit reset --hard 3d01e4a0a188cc8df54bc6e44cf5eb40ff6b4cc5\npip install -v --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" .\npython3.6 demo.py\n```\n\n#### Train model with Docker environment\nFirst, please install Docker and Nvidia-Docker from their official repos.\nThe image environment for running the code can be loaded as follows:\n\n*Nvidia-Docker v2.*\n\n```bash\n$ docker run --gpus all --ipc=host --rm -it -v $PWD:/workspace --network=host icaruszyz/large-scale-training:dialogpt bash\n```\n\n*Nvidia-Docker v1.*\n\n```bash\n$ nvidia-docker --rm -it -v $PWD:/workspace --network=host icaruszyz/large-scale-training:dialogpt bash\n```\n\nInside the Docker container, run:\n\n```bash\npython demo.py\n```\n\n## Pipeline details\n\nThis section explains all the components of `demo.py`.\n\n#### Data loading\nBefore running `demo.py`, you can set *DATA_FOLDER* (default value `./models`) in `demo.py` as the directory into which all the data and pretrained/fine-tuned models will be downloaded. Then simply run:\n```bash\npython demo.py\n```\nto\n\n* automatically download the models and data,\n* prepare the raw data into database files that the program can use directly, and\n* generate the training script.\n\nNote that by default `demo.py` uses sample data; please use the `--data` option to specify the Reddit training data. The available options are `dummy`, `small` and `full`:\n\n```bash\npython demo.py --data small\npython demo.py --data full\n```\n\nThe small Reddit data is around 140MB, while the full Reddit data is more than 27GB. You may prepare a cup of coffee when processing the full Reddit data, because **it takes a long time**!\n\nTo generate the 6,000-example multi-ref test set data, please run `demo.py` with the data option set to `full`; the generated file will be located at:\n\n`./data/test.refs.txt`\n\n#### Pretrained model\n\nThe pretrained and fine-tuned models are available on Azure Blob Storage.\nPlease see or run `demo.py` for more details about how to download and use these models. You can also directly use 
the links in `demo_utils.py` to download them.\n\n#### Data Preparation\nFirst, use the `prepare4db.sh` script to convert the TSV-format data file into the correct format that the subsequent scripts can recognize.\nThen, the training data needs to be processed into a database file with the following command:\n\n```bash\npython prepro.py --corpus $DATA_PATH\n```\n\n\n\n#### Using the training script\n\nThe training script can be used in single-GPU or multi-GPU settings (i.e., distributed training across multiple GPUs within a single machine):\n\n```bash\npython ./LSP_train.py  # Single GPU training\npython -m torch.distributed.launch --nproc_per_node=8 ./LSP_train.py  # Training on 8 GPUs\n```\n\n\nThe training script accepts several arguments to tweak the training:\n\nArgument | Type | Default value | Description\n---------|------|---------------|------------\nmax_seq_length | `int` | `128` | Maximum number of tokens per training instance.\ntrain_input_file | `str` | `\"\"` | Path of the training dataset in .db format.\neval_input_file | `str` | `\"\"` | Path of the validation set in tsv format.\ncontinue_from | `int` | `0` | Resume training from the specified step.\nfp16 | `boolean` | `True` | Whether to train the model with 16-bit floats.\ntrain_batch_size | `int` | `4` | Training batch size.\nvalid_batch_size | `int` | `4` | Validation batch size.\ngradient_accumulation_steps | `int` | `2` | Accumulate gradients over multiple steps.\nlearning_rate | `float` | `1e-5` | Learning rate.\nlr_schedule | `str` | `noam` | Learning rate schedule; can be chosen from [`noam`, `noamwd`, `BERT`, `None`].\nnum_optim_steps | `int` | `1000000` | Number of training optimization steps.\nno_token_id | `boolean` | `True` | If set to True, use all-zero token type embeddings.\n\n\nDuring training, two log files will be updated: `train_log.txt` and `eval_log.txt`, which contain statistics such as the model loss, perplexity and training speed (tokens/sec) on the training and validation sets.\n\nThe log files and saved model checkpoints are stored under `./models/output_model`.\n\n#### Model decoding\nWe note that even with a properly filtered Reddit dataset, our model can still sometimes generate responses that are somewhat toxic or inappropriate. As a result, we are currently unable to provide the decoding script (access to the live demo and decoding script is by invitation only for now).\nWe are still working on a controlled decoding method to prevent the model from generating toxic content. Please stay tuned.\n\n**Please see issue [#3](https://github.com/microsoft/DialoGPT/issues/3) and the [Reddit discussion](https://www.reddit.com/r/MachineLearning/comments/dt5woy/p_dialogpt_state_of_the_art_conversational_model/) for discussions of third-party decoding methods.**\n\nBelow are some third-party decoding methods:\n- [https://github.com/polakowo/gpt2bot](https://github.com/polakowo/gpt2bot) GPT2Bot for Telegram by polakowo, [ref](https://github.com/microsoft/DialoGPT/issues/3#issuecomment-573904419)\n- 
[https://colab.research.google.com/drive/1PslHE4Rl4RqSa20s7HEp0ZKITBir6ezE](https://colab.research.google.com/drive/1PslHE4Rl4RqSa20s7HEp0ZKITBir6ezE) Colab interactive notebook by qywu, [ref](https://github.com/microsoft/DialoGPT/issues/3#issuecomment-551410203)\n- [https://github.com/andreamad8/DialoGPT2-Interact](https://github.com/andreamad8/DialoGPT2-Interact) an interactive script for multi-turn chatbots by andreamad8, [ref](https://github.com/microsoft/DialoGPT/issues/3#issuecomment-551450016)\n- [https://github.com/LHolten/DialoGTP-MMI-decoder](https://github.com/LHolten/DialoGTP-MMI-decoder) the MMI method implemented by LHolten, [ref](https://github.com/microsoft/DialoGPT/issues/3#issuecomment-558318401)\n- [https://colab.research.google.com/drive/1-_KjlAV3J1IVDw_9KogjKDCzgFY7Jp7E](https://colab.research.google.com/drive/1-_KjlAV3J1IVDw_9KogjKDCzgFY7Jp7E) Colab interactive notebook by illuminascent@Reddit, [ref](https://www.reddit.com/r/MachineLearning/comments/dt5woy/p_dialogpt_state_of_the_art_conversational_model/?st=k530k3oo&sh=f6cd20fd)\n- [https://gradio.app/g/AK391/DialoGPT](https://gradio.app/g/AK391/DialoGPT) a third-party demo web page built with Gradio by AK391, [ref](https://gradio.app/g/AK391/DialoGPT)\n\n## Models\n\nWe release 6 fine-tuned models which can be further fine-tuned on low-resource, user-customized datasets. The total parameters in these models range from 117M to 762M, in accordance with the OpenAI GPT-2 model sizes.\n\n| Model           | Fine-tuned from GPT-2 | Trained from scratch |\n|----------------------|--------|--------|\n| DialoGPT 762M model| [\\[link\\]](https://acvrpublicycchen.blob.core.windows.net/dialogpt/multiref/large_ft.pkl) [\\[Hugging Face model card\\]](https://huggingface.co/microsoft/DialoGPT-large)  | 
[\\[link\\]](https://acvrpublicycchen.blob.core.windows.net/dialogpt/multiref/large_fs.pkl) |\n| DialoGPT 345M model| [\\[link\\]](https://acvrpublicycchen.blob.core.windows.net/dialogpt/multiref/medium_ft.pkl) [\\[Hugging Face model card\\]](https://huggingface.co/microsoft/DialoGPT-medium) | [\\[link\\]](https://acvrpublicycchen.blob.core.windows.net/dialogpt/multiref/medium_fs.pkl) | \n| DialoGPT 117M model| [\\[link\\]](https://acvrpublicycchen.blob.core.windows.net/dialogpt/multiref/small_ft.pkl) [\\[Hugging Face model card\\]](https://huggingface.co/microsoft/DialoGPT-small)| [\\[link\\]](https://acvrpublicycchen.blob.core.windows.net/dialogpt/multiref/small_fs.pkl) | \n| DialoGPT 345M model (reverse, for MMI)| [link](https://acvrpublicycchen.blob.core.windows.net/dialogpt/multiref/small_reverse.pkl) | -| \n| [DialogRPT](https://github.com/golsun/DialogRPT) (**new** ranking model) | [link](https://github.com/golsun/DialogRPT) | -| \n\n\nThe model files can be loaded exactly like the GPT-2 model checkpoints from Hugging Face's [Transformers](https://github.com/huggingface/transformers) library. The corresponding configuration files (`merges.txt`, `config.json`, `vocab.json`) can be found under `./configs/*` in the DialoGPT repo.\n\nThe reverse model predicts the source from the target sequence, and is used for MMI reranking.\n\n[DialogRPT](https://github.com/golsun/DialogRPT) is our recently proposed ranking model that predicts human feedback (upvotes, replies) on responses. These models can be used to improve the quality of DialoGPT generation (please see our EMNLP paper [arXiv:2009.06978](https://arxiv.org/abs/2009.06978) for details).\n\n## Retraining the full models\n\n### Data Preparation\n\nThe first step in retraining the full models is to generate the aforementioned 27GB Reddit dataset. This involves downloading full Reddit submission and comment dumps from [https://files.pushshift.io/reddit](https://files.pushshift.io/reddit) and creating intermediate files, which requires **700GB of local disk space** in total. Downloading and processing the full data takes about 1 to 2 days, depending on your compute power (e.g., about 24 hours on a modern machine with an 8-core processor). Assuming you have gone through the setup and installation steps above (e.g., activated the conda environment LSP), you can create the full dataset by running either of the following commands:\n\n```\npython demo.py --data full\n```\nor\n```\ncd 
reddit_extractor; SIZE=full make -j 8; cd ..\n```\n\n前一个命令会调用后一个命令，因此这两种方法是等效的。我们推荐使用前者，因为后者主要在遇到问题或需要自定义参数时才更有用（例如，`make`命令允许你只构建部分数据）。请注意，下载阶段可能会因地理位置等因素而出现错误（如防火墙等）。如果上述命令未能生成`data\u002Ftrain.tsv`文件，或者该文件的大小远未达到27GB，则说明过程中出现了问题。此时，你可以检查`reddit_extractor\u002Fwget-log`和`reddit_extractor\u002Flogs\u002F*.log`文件，查看是否有明显的错误信息（如wget无法从pushshift.io下载）。如果你看不懂这些错误信息，欢迎随时联系我们，并务必附上从这些日志文件中收集到的相关错误信息。\n\n训练数据统计：生成的训练tsv文件未压缩时大小约为26.8GB，包含1.468亿条训练实例，38.7亿个源端标记和21.4亿个目标端标记（包括话语级别的0\u002F1权重）。最终的train.tsv文件应包含146,846,215行。\n\n\n### 训练\n\n我们建议使用`demo.py --data full`来生成上述数据，因为它（1）生成数据，（2）将其转换为数据库格式，（3）使用`python LSP_train.py`训练模型。如果你想自定义任何超参数，请直接编辑`demo.py`文件。\n\n## 评估\n\n### DSTC-7 挑战赛\n\n我们的模型在[DSTC-7 挑战赛响应生成任务](https:\u002F\u002Fgithub.com\u002Fmgalley\u002FDSTC7-End-to-End-Conversation-Modeling)中取得了当前最佳的成绩。\n\n\n| 实验         | NIST2 | NIST4 | BLEU2  | BLEU4 | METEOR | ENT-4 | DIST-1 | DIST-2 | 平均长度 |\n|--------------------|-------|-------|--------|-------|--------|----------|------------|------------|---------|\n| 人类回复     | 2.62  | 2.65  | 12.35% | 3.13% | 8.31%  | 10.45    | 16.66%     | 67.01%     | 18.8    |\n| DSTC-7 冠军      | 2.51  | 2.52  | 14.35% | 1.83% | 8.07%  | 9.03     | 10.89%     | 32.49%     | 15.1    |\n| DialoGPT 345M      | 2.80  | 2.82  | 14.16% | 2.31% | 8.51%  | **10.08**    | 9.13%      | 39.73%     | 16.9    |\n| DialoGPT 345M (束搜索) | **2.92**  | **2.97**  | **19.18%** | **6.05%** | **9.29%**  | 9.57     | **15.73%**     | **51.03%**     | 14.2    |\n\n其中，ENT 表示[熵得分](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.05972)，DIST 表示[多样性得分](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1510.03055.pdf)。除平均长度外，其他指标数值越大越好。  \n\n\u003C!--| 实验           | NIST1  | NIST2  | NIST3  | NIST4  | BLEU1  | BLEU2  | BLEU3  | BLEU4  | METEOR | ENT-1 | ENT-2 | ENT-3 | ENT-4 | DIST-1 | DIST-2 | Len 
|\n|----------------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|----------|----------|----------|----------|------------|------------|---------|\n| 人类                | 2.4237 | 2.6244 | 2.6472 | 2.65   | 0.3408 | 0.1235 | 0.0572 | 0.0313 | 0.0831 | 6.5893   | 9.7423   | 10.4101  | 10.4450  | 0.1666     | 0.6701     | 18.7568 |\n| DSTC-7 冠军 | 2.3408 | 2.5102 | 2.522  | 2.523  | 0.4122 | 0.1435 | 0.0501 | 0.0183 | 0.0807 | 5.3832   | 7.6065   | 8.5304   | 9.0298   | 0.1089     | 0.3249     | 15.1327 |\n| DialoGPT           | 2.5863 | 2.804  | 2.823  | 2.8246 | 0.3927 | 0.1416 | 0.0555 | 0.0231 | 0.0851 | 5.5791   | 8.5109   | 9.6872   | 10.0765  | 0.0913     | 0.3973     | 16.9484 |\n| DialoGPT(束搜索)       | **2.5943**| **2.9163** | **2.9624** | **2.9681**| **0.4238** | **0.1918** | **0.1027** | **0.0605** | **0.0929** | **6.0815**   | **8.7379**   | 9.4037   | 9.5697   | 0.1573     | 0.5103     | 14.1603 |-->\n\n请注意，尽管自动评估结果优于人类回复，但这并不意味着我们的模型达到了人类水平。更多详细分析请参阅我们的论文。\n\n\n要在配备 8 张 V100 GPU 的服务器上，使用 DSTC-7 挑战赛数据对 `345M` DialoGPT 模型进行微调，请运行以下命令行（DSTC 数据可在 [DSTC-7 仓库](https:\u002F\u002Fgithub.com\u002Fmgalley\u002FDSTC7-End-to-End-Conversation-Modeling) 中找到）：\n\n```bash\npython3 -m torch.distributed.launch --nproc_per_node=8 train_LSP.py --init_checkpoint .\u002Fmodels\u002Fmedium\u002Fmedium_ft.pkl --train_input_file .\u002Fdata\u002FDSTC_train.db --eval_input_file .\u002Fdata\u002FDSTC_valid.tsv --model_name_or_path .\u002Fmodel\u002Fmedium\u002F --learning_rate 1e-4  --train_batch_size 64 --eval_batch_size 64 --no_token_id\n```\n\n训练好的模型可在此处找到：[DSTC medium 模型](https:\u002F\u002Facvrpublicycchen.blob.core.windows.net\u002Fdialogpt\u002FDSTC\u002Fmedium_ft.pkl)\n\n\n#### 评估\n\n1. 
请**下载**以下第三方软件包，并将其保存到空文件夹 `3rdparty` 中：\n\t* [**mteval-v14c.pl**](https:\u002F\u002Fgoo.gl\u002FYUFajQ) 用于计算[NIST](http:\u002F\u002Fwww.mt-archive.info\u002FHLT-2002-Doddington.pdf)。您可能需要安装以下[Perl](https:\u002F\u002Fwww.perl.org\u002Fget.html)模块（例如通过 `cpan install`）：`XML::Twig`、`Sort::Naturally` 和 `String::Util`。\n\t* [**meteor-1.5**](http:\u002F\u002Fwww.cs.cmu.edu\u002F~alavie\u002FMETEOR\u002Fdownload\u002Fmeteor-1.5.tar.gz) 用于计算[METEOR](http:\u002F\u002Fwww.cs.cmu.edu\u002F~alavie\u002FMETEOR\u002Findex.html)。它需要[Java](https:\u002F\u002Fwww.java.com\u002Fen\u002Fdownload\u002Fhelp\u002Fdownload_options.xml)环境。\n\n\n2. 请按照[DSTC-7 官方仓库](https:\u002F\u002Fgithub.com\u002Fmgalley\u002FDSTC7-End-to-End-Conversation-Modeling\u002Ftree\u002Fmaster\u002Fdata_extraction)中的说明提取数据，并将 `data-official-test\u002Ftest.refs.txt` 放入 `.\u002Fdstc\u002Fdata\u002F` 文件夹中。\n\n3. 运行以下提取脚本以生成人类回复假设文件 `human.resp.txt`：\n\n\t```bash\n\tpython extract_human.py\n\t```\n\n4. 最后，要复现 DSTC 数据集上的人类假设结果，请在仓库文件夹下运行以下命令：\n\n\t```bash\n\tpython batch_eval.py\n\t```\n\n评估结果将生成在 `.\u002Fdstc\u002Feval\u002F` 文件夹中。\n\n\n## 6K 多参考数据集结果\n\n### 自动评估\n\n我们在来自 Reddit 的 6K 多参考数据集上进行了测试。结果汇总如下：\n\n| 实验         | NIST2 | NIST4 | BLEU2  | BLEU4 | METEOR | ENT-4 | DIST-1 | DIST-2 | 平均长度 |\n|--------------------|-------|-------|--------|-------|--------|----------|------------|------------|---------|\n| 人类回复     | 3.41  | 4.25  | 17.90% | 7.48% | 10.64% | 11       | 14.50%     | 63.00%     | 13.1    |\n| DialoGPT 117M      | 2.39  | 2.41  | 10.54% | 1.55% | 7.53%  | 10.78    | 8.60%      | 39.90%     | 12.8    |\n| DialoGPT 345M      | 3     | 3.06  | 16.96% | 4.56% | 9.81%  | 9.13     | 6.80%      | 26.30%     | 12.2    |\n| DialoGPT 762M      | 2.84  | 2.9   | 18.66% | 5.25% | 9.66%  | 9.72     | 7.76%      | 29.93%     | 11.2    |\n| DialoGPT 345M (束搜索) | **3.4**  | **3.5**   | **21.76%** | **7.92%** | 10.74%  | 10.48     | **12.38%**     | **48.74%**    | 11.3    |\n| DialoGPT 345M (带 MMI) | 3.28  | 3.33 | 15.68% | 
3.94% | **11.23%**  | **11.25**     | 9.39%    | 45.55%   | 17.2    |\n\n### \u003Ca name=\"human_eval\">\u003C\u002Fa>人工评估\n\n我们进一步进行了人工评估（每种方法使用6,000个示例，每个示例由3位人类评委进行评价）。结果表明，在这种非交互式图灵测试下，我们的生成质量正逐渐接近真实人类回复的质量，证据十分充分：\n\n\n*相关性*：A和B中，哪一个与源提示更相关？\n\n| 系统 A | A胜出 (%) | 平局 (%) | B胜出 (%) | 系统 B|\n|--------------------|-------|-------|--------|-------|\n|DialoGPT 345M|2671      (45%)   | 513         (9%) |   2816       (47%)|人类回复|\n|DialoGPT 345M| 3281       (72%)|    394         (9%)  |  882         (19%)| [PersonalityChat](https:\u002F\u002Fdocs.microsoft.com\u002Fen-us\u002Fazure\u002Fcognitive-services\u002Fproject-personality-chat\u002Foverview)|\n|DialoGPT 345M w\u002F MMI| **2871**     (48%)|    522         (9%)  |  2607      (43%)|人类回复|\n\n*信息量*：A和B中，哪一个内容更丰富、更具信息量？\n\n| 系统 A | A胜出 (%) | 平局 (%) | B胜出 (%) | 系统 B|\n|--------------------|-------|-------|--------|-------|\n|DialoGPT 345M| 2722       (45%) |  234         (4%) |  3044       (51%)|人类回复|\n|DialoGPT 345M|3490       (77%) |   206         (5%)  |  861         (19%)| [PersonalityChat](https:\u002F\u002Fdocs.microsoft.com\u002Fen-us\u002Fazure\u002Fcognitive-services\u002Fproject-personality-chat\u002Foverview)|\n|DialoGPT 345M w\u002F MMI| **3011**       (50%)|    234        (4%)  |  2755       (46%)|人类回复|\n\n\n*拟人度*：A和B中，你认为哪一个更有可能是由人类生成的？\n\n| 系统 A | A胜出 (%) | 平局 (%) | B胜出 (%) | 系统 B|\n|--------------------|-------|-------|--------|-------|\n|DialoGPT 345M|2716       (45%)  | 263         (4%)  | 3021       (50%)|人类回复|\n|DialoGPT 345M|3462       (76%) |  196         (4%)  | 899         (20%)| [PersonalityChat](https:\u002F\u002Fdocs.microsoft.com\u002Fen-us\u002Fazure\u002Fcognitive-services\u002Fproject-personality-chat\u002Foverview)|\n|DialoGPT 345M w\u002F MMI| **2978**      (50%)|    241         (4%)  |  2781        (46%)|人类回复|\n\n\n详细内容请参阅我们的[arXiv论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.00536)。\n\n\n\n\n\u003C!--相关性\n系统 胜出      (%)         平局        (%)         失败   (%)\n2 vs 1     2671  
     (0.45)    513         (0.09)    2816       (0.47)\n2 vs 3     3281       (0.72)    394         (0.09)    882         (0.19)\n2 vs 4     2379       (0.40)    527         (0.09)    3094       (0.52)\n2 vs 5     3019       (0.50)    581         (0.10)    2400       (0.40)\n2 vs 6     2726       (0.45)    576         (0.10)    2698       (0.45)\n \n信息量\n系统 胜出      (%)         平局        (%)         失败   (%)\n2 vs 1     2722       (0.45)    234         (0.04)    3044       (0.51)\n2 vs 3     3490       (0.77)    206         (0.05)    861         (0.19)\n2 vs 4     2474       (0.41)    257         (0.04)    3269       (0.54)\n2 vs 5     3230       (0.54)    362         (0.06)    2408       (0.40)\n2 vs 6     2856       (0.48)    303         (0.05)    2841       (0.47)\n \n拟人度\n系统 胜出      (%)         平局        (%)         失败   (%)\n2 vs 1     2716       (0.45)    263         (0.04)    3021       (0.50)\n2 vs 3     3462       (0.76)    196         (0.04)    899         (0.20)\n2 vs 4     2478       (0.41)    289         (0.05)    3233       (0.54)\n2 vs 5     3233       (0.54)    340         (0.06)    2427       (0.40)\n2 vs 6     2847       (0.47)    321         (0.05)    2832       (0.47)\n--> \n\n\n\u003C!--| 实验                   | NIST1 | NIST2 | NIST3 | NIST4 | BLEU1  | BLEU2  | BLEU3  | BLEU4 | METEOR | ENT-4 | DIST-1 | DIST-2 |\n|------------------------------|-------|-------|-------|-------|--------|--------|--------|-------|--------|----------|------------|------------|\n| 人类回复               | 2.99  | 3.41  | 3.83  | 4.25  | 39.61% | 17.90% | 10.71% | 7.48% | 10.64% | 11       | 14.50%     | 63.00%     |\n| DialoGPT 117M      | 2.25  | 2.39  | 2.41  | 2.41  | 35.43% | 10.54% | 3.85%  | 1.55% | 7.53%  | 10.78    | 8.60%      | 39.90%     |\n| DialoGPT 345M      |**2.67** | **3**     | **3.05**  | **3.06**  | 40.97% | 16.96% | 8.31%  | 4.56% | **9.81%**  | 9.13     | 6.80%      | 26.30%     |\n| DialoGPT 762M      | 2.51  | 2.84  | 2.89  | 2.9   | **44.45%** | 
**18.66%** | **9.32%**  | **5.25%** | 9.66%  | **9.72**     | **7.76%**      | **29.93%**    |-->\n\n\u003C!--| DialoGPT 762M  (beam search)| 2.51  | 2.9   | 2.97  | 2.98  | 44.84% | 21.08% | 12.01% | 7.57% | 10.11% | 10.06    | 11.62%     | 44.07%     |-->\n\u003C!--## ConvAI挑战 -->\n\u003C!--| DialoGPT 345M (beam search)  | 2.94  | 3.4   | 3.49  | 3.5   | 45.27% | 21.76% | 12.51% | 7.92% | 10.74% | 10.48    | 12.38%     | 48.74%     |-->\n\n## 生成的示例\n* 交互式环境中的多轮生成示例：\n\n|角色 | 回答 |\n|---------|--------|\n|用户 | 金钱能买到幸福吗？ |\n| 机器人 | 这取决于你花了多少钱在它上面。|\n|用户 | 那么，怎样才能最好地买到幸福呢？ |\n| 机器人 | 你只需要在20岁出头就成为百万富翁，这样你就能感到幸福了。|\n|用户 | 这也太难了吧！ |\n| 机器人 | 你根本不知道当个既有钱又幸福的人有多难。富人之所以有钱，自有其原因啊。|\n\n* 带有用户提示的多轮自对话\n\n|角色 | 回答 |\n|---------|--------|\n|用户 | 人生的意义是什么？ |\n| 机器人 | 意义就是和家人、朋友在一起。|\n|机器人 | 我猜：人生的意义就是你的父母和朋友们一直爱你。我相信这就是意义所在。|\n| 机器人 | 我觉得人生的意义在于，当你离开这个世界时，还能活着。这样你就可以和家人、朋友以及你自己在一起。|\n|机器人 | 不是这样的，不过还是谢谢你尝试帮助我。|\n\n\n\n\n* 常识问答示例\n\n|用户 | 机器人 |\n|---------|--------|\n|美国的第一任总统是谁？ | 乔治·华盛顿 |\n|水的沸点是多少？ | 我觉得大概是212华氏度。|\n|太阳和月亮哪个更大？ | 太阳。|\n|哪种动物身上有黑白相间的条纹？ | 一只黑白条纹的猫。|\n\n* 处理训练样本中未出现的情境（请注意，训练样本取自2005年至2017年）\n\n|用户 | 机器人 |\n|---------|--------|\n|中美之间的贸易战仍在继续。 | 我相信只要双方都能保持冷静、继续谈生意，中国政府和美国政府就能妥善处理这个问题。|\n|2018年世界杯冠军是哪支球队？ | 我认为是德国队。|\n|英伟达的Titan RTX显卡真的很棒。 | 我就有这张卡，非常喜欢，没有任何问题。|\n|保时捷的新款Taycan电动车能打败特斯拉吗？ | 我不这么认为。特斯拉比保时捷更强大。\n\n* 我们的机器人对哲学问题的回答\n\n|用户 | 机器人 |\n|---------|--------|\n|什么是美好生活？ | 我认为，我们每个人都与自己的前世有着联系，而人生的意义就在于按照自己想要的方式生活。|\n|成功和快乐哪个更重要？ | 成功是永无止境的。而快乐才是唯一重要的事情。|\n|和平是阻止战争的唯一途径吗？ | 如果世界不愿意与你合作，那就不是了。和平固然是最终目标，但实现和平还需要更多努力。|\n|人们应该拥有多少自由？ | 适当的自由就是可以随心所欲地做自己想做的事，只要不伤害他人或破坏财物即可。|\n\n* 最后\n\n|用户 | 机器人 |\n|---------|--------|\n|你觉得DialoGPT这个仓库怎么样？ | 这个仓库挺不错的，不过很容易发现一些bug。|\n\n如果你发现了什么问题，请提交一个issue吧：)\n\n\n\n## 体验我们的系统\n目前，实时演示和解码脚本的访问权限仅限受邀用户。敬请期待正式发布。\n\n\n\u003C!--实时演示和解码脚本的访问权限需经过申请批准。请在此处提交申请[这里](https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1epTNfaqva1isVO_o9pbyhVLsnzDn58dGkcLB0OUVcqs\u002Fedit?usp=sharing)--> 
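\n上文提到的 MMI 重排序（用反向模型从候选回复反推源句子，并据此重排候选）可以用一小段代码示意其核心逻辑。以下仅为最小示意：打分函数 `reverse_nll` 假定由调用方基于反向模型实现，函数名与接口均为本文虚构，并非本仓库的真实 API。

```python
from typing import Callable, List, Tuple

def mmi_rerank(source: str,
               candidates: List[str],
               reverse_nll: Callable[[str, str], float]) -> Tuple[str, float]:
    """MMI 重排序示意：reverse_nll(candidate, source) 返回在反向模型下
    由候选回复预测源句子的负对数似然；该值越小，说明候选与源句的
    互信息越高。返回得分最优的候选及其分数。"""
    scored = [(cand, reverse_nll(cand, source)) for cand in candidates]
    return min(scored, key=lambda pair: pair[1])

if __name__ == "__main__":
    # 演示用的玩具打分函数（纯示意）：偏好与源句词重叠更多的候选
    def toy_reverse_nll(candidate: str, source: str) -> float:
        return -float(len(set(candidate.split()) & set(source.split())))

    best, _ = mmi_rerank(
        "does money buy happiness",
        ["maybe", "money can not buy happiness", "no idea"],
        toy_reverse_nll,
    )
    print(best)  # money can not buy happiness
```

实际使用时，前向模型先采样若干候选回复，再由反向模型逐一打分取最优；详细做法请参阅论文中的 MMI 部分。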
\n\n\u003C!--该模型使用convai2评估脚本（见下文）应能达到79%以上的Hits@1、20.5的困惑度以及16.5的F1分数。\n\n这些数字略低于我们在ConvAI2竞赛中取得的成绩。若要达到相同效果，你可以尝试以下调整：\n\n- 在ConvAI2竞赛中，我们还对位置嵌入进行了调整，使对话历史始终从相同的嵌入开始。这用pytorch-pretrained-bert很容易实现，并且能够提升Hits@1指标。\n- 在ConvAI2竞赛中，我们使用的是束搜索解码器。虽然从F1指标来看结果更好，但我们认为，相较于当前仓库中提供的核采样检测器，束搜索带来的用户体验并不那么吸引人。-->\n\n\u003C!--## 使用交互脚本\n\n训练脚本会将所有实验和检查点保存在仓库根目录下的`.\u002Fruns`文件夹中，以实验时间戳命名的子文件夹里。\n\n之后，你只需指向该文件夹，即可使用交互脚本与模型进行互动。\n\n以下是运行交互脚本的示例命令行：\n\n```bash\npython .\u002Finteract.py --model_checkpoint .\u002Fdata\u002FApr17_13-31-38_thunder\u002F  # 使用训练检查点运行交互脚本\npython .\u002Finteract.py  # 使用我们S3上的微调模型运行交互脚本\n```\n\n微调后的模型最终Hits@1为0.715。\n\n交互脚本接受几个参数来调整解码算法：\n\n参数 | 类型 | 默认值 | 描述\n---------|------|---------------|------------\ndataset_path | `str` | `\"\"` | 数据集的路径或URL。若为空，则从S3下载。\ndataset_cache | `str` | `'.\u002Fdataset_cache.bin'` | 数据集缓存的路径或URL。\nmodel | `str` | `\"openai-gpt\"` | 模型的路径、URL或简称。\nmax_history | `int` | `2` | 保留对话历史中前几轮发言的数量。\ndevice | `str` | 如果`torch.cuda.is_available()`则为`cuda`，否则为`cpu` | 设备（cuda或cpu）。\nno_sample | `store_true` | 设置为真以使用贪婪解码而非采样。\nmax_length | `int` | `20` | 输出发言的最大长度。\nmin_length | `int` | `1` | 输出发言的最小长度。\nseed | `int` | `42` | 种子。\ntemperature | `int` | `0.7` | 采样softmax温度。\ntop_k | `int` | `0` | 采样前过滤掉top-k个词（`\u003C=0`：不进行过滤）。\ntop_p | `float` | `0.9` | 采样前进行核过滤（top-p）（`\u003C=0.0`：不进行过滤）。\n\n## 运行 ConvAI2 评估脚本\n\n要运行 ConvAI2 挑战赛的评估脚本，您首先需要在仓库的根目录下安装 `ParlAI`，具体步骤如下：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FParlAI.git\ncd ParlAI\npython setup.py develop\n```\n\n然后您可以从 `ParlAI` 的根目录运行评估脚本：\n\n```bash\ncd ParlAI\npython ..\u002Fconvai_evaluation.py --eval_type hits@1  # 下载并评估我们在 hits@1 指标上的微调模型\npython ..\u002Fconvai_evaluation.py --eval_type hits@1  --model_checkpoint .\u002Fdata\u002FApr17_13-31-38_thunder\u002F  # 在 hits@1 指标上评估某个训练检查点\n```\n\n评估脚本接受几个参数来选择评估指标并调整解码算法：\n\n参数 | 类型 | 默认值 | 描述\n---------|------|---------------|------------\neval_type | `str` | `\"hits@1\"` | 在 ConvAI2 验证集上使用 
`hits@1`、`ppl` 或 `f1` 指标评估模型\nmodel | `str` | `\"openai-gpt\"` | 模型的路径、URL 或简称\nmax_history | `int` | `2` | 保留历史记录中的前几轮对话轮次\ndevice | `str` | 如果 `torch.cuda.is_available()` 则为 `cuda`，否则为 `cpu` | 设备（cuda 或 cpu）\nno_sample | 动作 `store_true` | 设置为使用贪婪解码而非采样\nmax_length | `int` | `20` | 输出话语的最大长度\nmin_length | `int` | `1` | 输出话语的最小长度\nseed | `int` | `42` | 随机种子\ntemperature | `float` | `0.7` | 采样 softmax 温度\ntop_k | `int` | `0` | 采样前过滤 top-k 个词（`\u003C=0`：不进行过滤）\ntop_p | `float` | `0.9` | 采样前进行核过滤（top-p）（`\u003C=0.0`：不进行过滤）\n\n-->\n\n## 相关项目\n\n* RetGen: [https:\u002F\u002Fgithub.com\u002Fdreasysnail\u002FRetGen](https:\u002F\u002Fgithub.com\u002Fdreasysnail\u002FRetGen)。基于检索增强\u002F知识增强的 DialoGPT 及其扩展。RetGen 是一个联合训练框架，能够以端到端的方式同时优化密集段落检索器和知识增强文本生成器。\n\n* 微软 ICECAPS: [https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Ficecaps](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Ficecaps)。\n\n作为本项目的互补性资源库，\n微软 Icecaps 是一个用于构建神经对话系统的开源工具包（基于 TensorFlow）。Icecaps 提供了一系列来自最新对话建模和通用 NLP 文献的工具，采用灵活的范式支持复杂的多任务学习设置。\n\n* 预训练 UniLM: [https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Funilm](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Funilm)\n* MT-DNN: [https:\u002F\u002Fgithub.com\u002Fnamisan\u002Fmt-dnn](https:\u002F\u002Fgithub.com\u002Fnamisan\u002Fmt-dnn)\n* 由 yangjianxin1 开发的中文版 DialoGPT。[https:\u002F\u002Fgithub.com\u002Fyangjianxin1\u002FGPT2-chitchat](https:\u002F\u002Fgithub.com\u002Fyangjianxin1\u002FGPT2-chitchat)。我们很高兴地看到，我们在 DialoGPT 中使用的 MMI 策略也显著提升了该项目的性能！\n\n## 联系方式\n\n如有任何问题或建议，请联系 [DialoGPT@microsoft.com](mailto:DialoGPT@microsoft.com)。不过，我们的回复可能会比较延迟，请您谅解。\n\n## 贡献说明\n\n本项目欢迎各类贡献和建议。大多数贡献都需要您同意一份贡献者许可协议（CLA），声明您有权并将您的贡献权利授予我们使用。详情请访问 https:\u002F\u002Fcla.opensource.microsoft.com。\n\n当您提交拉取请求时，CLA 机器人会自动判断您是否需要签署 CLA，并相应地标记您的 PR（例如状态检查、评论等）。您只需按照机器人提供的指示操作即可。对于所有使用我们 CLA 的仓库，您只需完成一次此流程。\n\n本项目已采纳 [微软开源行为准则](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002F)。更多信息请参阅 
[行为准则常见问题解答](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002Ffaq\u002F)，或通过 [opencode@microsoft.com](mailto:opencode@microsoft.com) 联系我们，提出更多问题或意见。\n\n## 免责声明\n\n本仓库旨在促进大规模预训练在对话数据领域的研究。该工具包仅包含构建实际对话模型权重文件所需的部分建模组件。单独使用该模型只能提供不同文本片段的权重信息；若研究人员希望真正利用它，还需自行准备对话数据，并从预训练系统中解码生成响应。微软不对第三方使用该预训练系统所产生的任何结果承担责任。\n\n\n\n## 引用信息\n如果您在研究中使用了本代码，可以引用我们的 arXiv 论文：\n```bash\n@inproceedings{zhang2019dialogpt,\n    title={DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation},\n    author={Yizhe Zhang and Siqi Sun and Michel Galley and Yen-Chun Chen and Chris Brockett and Xiang Gao and Jianfeng Gao and Jingjing Liu and Bill Dolan},\n    year={2020},\n    booktitle={ACL, system demonstration}\n}\n```","# DialoGPT 快速上手指南\n\n> **重要提示**：本项目已不再维护。DialoGPT 已被微软推出的 **[GODEL](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FGODEL)** 模型取代，后者在多项指标上表现更优。除非您为了复现旧论文结果，否则强烈建议切换至 GODEL。\n\nDialoGPT 是一个基于 Reddit 讨论线程中 1.47 亿轮多回合对话训练的大规模预训练对话响应生成模型。它在单轮图灵测试中生成的回复质量可与人类相媲美。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**：推荐 **Ubuntu 16.04**（官方主要支持环境）。其他 Linux 发行版、macOS 或 WSL 可能运行部分代码，但不保证稳定性。\n- **GPU**：\n  - Small (117M) \u002F Medium (345M) 模型：至少 12GB 显存。\n  - Large (762M) 模型：建议 16GB+ 显存。\n- **CUDA**：需要 Nvidia CUDA 10.0 开发工具包。\n\n### 前置依赖\n确保系统已安装以下基础工具：\n```bash\nsudo apt-get install -y make wget gzip bzip2 xz-utils zstd sed\n```\n\n## 安装步骤\n\n您可以选择使用 **Conda** 或 **Docker** 进行部署。\n\n### 方案一：使用 Conda 环境（推荐）\n\n1. **克隆项目并创建环境**\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT.git\n   cd DialoGPT\n   conda env create -f LSP-linux.yml -n LSP\n   conda activate LSP\n   ```\n   *注：若非 Linux 系统，请将 `LSP-linux.yml` 替换为 `LSP-generic.yml`（稳定性未完全保证）。*\n\n2. 
**安装 Apex (用于 FP16 加速训练)**\n   若需使用混合精度训练，请执行以下命令安装 NVIDIA Apex：\n   ```bash\n   conda activate LSP\n   git clone https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex\n   cd apex\n   git reset --hard 3d01e4a0a188cc8df54bc6e44cf5eb40ff6b4cc5\n   pip install -v --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" .\n   ```\n\n### 方案二：使用 Docker 环境\n\n如果您已安装 Docker 和 Nvidia-Docker，可直接拉取镜像运行：\n\n**Nvidia-Docker v2:**\n```bash\ndocker run --gpus all --ipc=host --rm -it -v $PWD:\u002Fworkspace --network=host icaruszyz\u002Flarge-scale-training:dialogpt bash\n```\n\n**Nvidia-Docker v1:**\n```bash\nnvidia-docker run --rm -it -v $PWD:\u002Fworkspace --network=host icaruszyz\u002Flarge-scale-training:dialogpt bash\n```\n\n进入容器后，即可直接运行脚本。\n\n## 基本使用\n\n项目提供了一个集成脚本 `demo.py`，可一键完成模型下载、数据预处理和演示运行。\n\n### 1. 快速体验（使用虚拟数据）\n运行以下命令将自动下载预训练模型并使用虚拟数据进行流程测试：\n```bash\npython demo.py\n```\n\n### 2. 使用真实数据训练\u002F微调\n`demo.py` 支持通过 `--data` 参数指定数据集规模：\n- `dummy`: 虚拟数据（默认）\n- `small`: 小型 Reddit 数据集 (~140MB)\n- `full`: 完整 Reddit 数据集 (>27GB，处理耗时较长)\n\n**示例：使用小型数据集运行**\n```bash\npython demo.py --data small\n```\n\n**示例：使用完整数据集运行（生成 6k 多参考测试集）**\n```bash\npython demo.py --data full\n```\n*运行完成后，生成的测试集位于 `.\u002Fdata\u002Ftest.refs.txt`。*\n\n### 3. 手动训练模型\n如果您已准备好数据库格式的数据，可以直接使用训练脚本。\n\n**单 GPU 训练：**\n```bash\npython .\u002FLSP_train.py\n```\n\n**多 GPU 分布式训练（例如 8 卡）：**\n```bash\npython -m torch.distributed.launch --nproc_per_node=8 .\u002FLSP_train.py\n```\n\n**常用训练参数说明：**\n- `--max_seq_length`: 单个实例的最大 token 数（默认 128）\n- `--train_input_file`: .db 格式的训练数据路径\n- `--fp16`: 是否启用 16 位浮点训练（默认 True）\n\n### 4. 
直接使用 Hugging Face Transformers\n如果您仅需进行推理而不需要重新训练，推荐使用 Hugging Face 提供的预训练模型，无需配置复杂环境：\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(\"microsoft\u002FDialoGPT-medium\")\nmodel = AutoModelForCausalLM.from_pretrained(\"microsoft\u002FDialoGPT-medium\")\n\n# 简单对话示例：连续生成 3 轮回复，并将每轮输出拼接回上下文\ninput_text = \"Hello, how are you?\"\nnew_input_ids = tokenizer.encode(input_text + tokenizer.eos_token, return_tensors='pt')\n\nbot_input_ids = new_input_ids\nfor i in range(3):\n    outputs = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)\n    print(tokenizer.decode(outputs[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True))\n    bot_input_ids = torch.cat([bot_input_ids, outputs[:, bot_input_ids.shape[-1]:]], dim=1)\n```\n*可用模型版本：`microsoft\u002FDialoGPT-small`, `microsoft\u002FDialoGPT-medium`, `microsoft\u002FDialoGPT-large`*","某初创团队正在开发一款面向 Z 世代的心理健康陪伴聊天机器人，需要其具备自然、共情且多轮次的对话能力。\n\n### 没有 DialoGPT 时\n- 开发人员不得不依赖基于规则或简单检索的旧式引擎，导致回复生硬刻板，经常答非所问，无法理解用户复杂的情绪表达。\n- 为了覆盖多样化的对话场景，团队需人工编写数万条问答对，不仅耗时数月，而且难以维持上下文连贯性，多轮对话极易“失忆”。\n- 用户测试反馈极差，因机器人缺乏“人味”和逻辑跳跃处理能力，导致用户留存率低，项目面临搁浅风险。\n\n### 使用 DialoGPT 后\n- 团队直接调用在 1.47 亿条 Reddit 多轮对话数据上预训练的 DialoGPT 模型，使其能生成符合人类直觉的自然回复，显著提升了共情能力。\n- 借助其强大的上下文记忆机制，DialoGPT 能轻松处理长程多轮交互，无需人工穷举场景即可应对各种突发话题，开发周期从数月缩短至数周。\n- 在单轮图灵测试中，DialoGPT 生成的回复质量已媲美真人，用户互动时长大幅增加，成功验证了产品的核心价值。\n\nDialoGPT 通过大规模预训练技术，将原本高昂的自然语言对话构建成本降至最低，让开发者能专注于业务逻辑而非基础语料积累。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_DialoGPT_df702f78.png","microsoft","Microsoft","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmicrosoft_4900709c.png","Open source projects and samples from 
Microsoft",null,"opensource@microsoft.com","OpenAtMicrosoft","https:\u002F\u002Fopensource.microsoft.com","https:\u002F\u002Fgithub.com\u002Fmicrosoft",[83,87,90],{"name":84,"color":85,"percentage":86},"Python","#3572A5",97.9,{"name":88,"color":89,"percentage":32},"Makefile","#427819",{"name":91,"color":92,"percentage":93},"Shell","#89e051",0,2423,351,"2026-04-10T19:11:44","MIT","Linux (Ubuntu 16.04)","必需。推荐 NVIDIA GPU。小\u002F中型模型需单卡 12GB+ 显存；大型模型需单卡 16GB+ 显存。训练示例基于 V100。要求 CUDA 10.0。","未说明（数据处理需 800GB 临时磁盘空间）",{"notes":102,"python":103,"dependencies":104},"官方仅在 Ubuntu 16.04 上提供完整支持，其他系统（macOS, WSL）可能运行但不保证稳定性且无官方支持。数据预处理脚本依赖 pushshift.io 服务器。大型模型分布式训练在 8 张 V100 上需数小时。支持 FP16 混合精度训练以加速。","3.6",[105,106,107],"pytorch-transformers (Hugging Face)","apex (NVIDIA)","conda",[16,35,14],[110,111,112,113,114,115,116,117,118],"dialogue","machine-learning","pytorch","transformer","text-generation","dialogpt","gpt-2","text-data","data-processing","2026-03-27T02:49:30.150509","2026-04-14T12:35:36.466976",[122,127,132,137,142,146],{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},32868,"为什么 DialoGPT 模型在对话过程中会不断重复相同的句子或短语？","这通常不是预期行为，可能是由于训练过程中的“灾难性遗忘”（catastrophic forgetting）导致的。如果模型原本能输出良好的回复但现在表现不佳，建议检查在 Reddit 数据集上的损失值（loss），以确认训练是否出了问题。此外，生成时未正确设置停止条件或采样参数也可能导致重复。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT\u002Fissues\u002F45",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},32869,"在批量处理数据时，如果在右侧使用 0 进行填充（padding），为什么会导致模型性能严重下降？","虽然在 Transformer 架构中自注意力机制理论上不受右侧填充影响，但在 DialoGPT 的实际应用中，填充位置至关重要。如果在上下文和分隔符之间插入填充（即格式为 \"context [PAD][PAD] [SEP]\"），会改变上下文的语义表示，且这种序列在训练阶段从未出现过，从而导致性能下降。正确的做法是将填充放在整个序列的末尾（即 \"context [SEP] response [SEP] [PAD][PAD]\"），或者确保批次内输入序列长度一致以避免填充干扰。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT\u002Fissues\u002F20",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},32870,"DialoGPT 的多轮对话格式中，不同对话轮次之间是否需要特殊的分隔标记？","是的，同一会话中的对话轮次会被连接成一段长文本，并以结束文本令牌（end-of-text token, 
\u003C|endoftext|>）结尾。根据源码实现，轮次之间的格式通常为 \"Turn1\u003C|endoftext|>Turn2\u003C|endoftext|>...TurnN\"。具体实现可参考官方训练脚本 LSP_train.py 中的相关代码逻辑。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT\u002Fissues\u002F17",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},32871,"下载 Reddit 数据集时遇到 'make: wget: No such file or directory' 错误怎么办？","该错误通常是因为系统中未安装 'wget' 工具。在 macOS 或 Linux 上，可以通过包管理器安装（例如 macOS 使用 'brew install wget'，Ubuntu 使用 'sudo apt-get install wget'）。此外，部分旧的数据集链接（如 pushshift.io 上的 v2 版本文件）可能已失效（返回 404 错误），此时需要寻找替代的数据源或更新后的数据集地址。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDialoGPT\u002Fissues\u002F60",{"id":143,"question_zh":144,"answer_zh":145,"source_url":126},32872,"微调后的模型出现生成乱码或无限重复单个字符的问题，可能的原因是什么？","这种情况通常表明模型在微调过程中发生了严重的过拟合或灾难性遗忘。建议检查微调数据集的质量和格式是否与预训练数据一致，并监控训练过程中的损失值变化。如果损失值异常波动或未收敛，可能需要调整学习率、增加正则化或重新检查数据预处理步骤（如 tokenizer 的使用是否正确）。",{"id":147,"question_zh":148,"answer_zh":149,"source_url":126},32873,"在多轮对话生成时，如何正确构建输入张量以避免历史对话丢失？","在每一轮对话中，需要将新的用户输入编码后追加到之前的聊天历史 ID 中。具体操作是使用 torch.cat 将历史 tensor (chat_history_ids) 和新输入 tensor (new_user_input_ids) 在维度 -1 上拼接。注意首轮对话时直接使用新输入，后续轮次需拼接。同时，生成时应限制最大长度（max_length）并正确设置 pad_token_id 为 eos_token_id，以防止显存溢出并确保生成停止。",[]]
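文末 FAQ 中描述的多轮输入拼接步骤（首轮直接使用编码后的新输入，之后用 torch.cat 在最后一维上拼接历史与新输入）可以抽成一个小函数来示意。以下为最小示例，假设环境中已安装 PyTorch；`append_turn` 是本文为演示取的名字，并非本仓库或 transformers 的真实接口。

```python
import torch

def append_turn(chat_history_ids, new_user_input_ids):
    """按 FAQ 所述构建多轮对话输入：首轮（历史为 None）直接返回
    编码后的新输入；后续轮次将历史 id 与新输入 id 在最后一维上
    拼接，从而保留完整的对话历史。"""
    if chat_history_ids is None:
        return new_user_input_ids
    return torch.cat([chat_history_ids, new_user_input_ids], dim=-1)

if __name__ == "__main__":
    # 用虚构的 token id（含 eos=50256）演示拼接后的形状，均为 (1, 序列长度)
    history = append_turn(None, torch.tensor([[10, 11, 50256]]))
    history = append_turn(history, torch.tensor([[12, 13, 14, 50256]]))
    print(history.shape)  # torch.Size([1, 7])
```

生成阶段仍需如 FAQ 所说限制 max_length，并将 pad_token_id 设为 eos_token_id，以防显存溢出并确保生成正常停止。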