[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-minimaxir--gpt-2-simple":3,"tool-minimaxir--gpt-2-simple":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":90,"forks":91,"last_commit_at":92,"license":93,"difficulty_score":10,"env_os":94,"env_gpu":95,"env_ram":96,"env_deps":97,"category_tags":106,"github_topics":107,"view_count":23,"oss_zip_url":82,"oss_zip_packed_at":82,"status":16,"created_at":112,"updated_at":113,"faqs":114,"releases":134},1475,"minimaxir\u002Fgpt-2-simple","gpt-2-simple","Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts","gpt-2-simple是一个简化GPT-2模型微调与文本生成的Python工具。它将OpenAI官方模型、微调脚本和生成管理整合为简洁API，让开发者无需复杂配置即可快速训练定制化文本生成器。只需几行代码即可完成模型下载、微调和生成，支持在Google Colab免费使用GPU训练，并能指定开头短语或保存生成结果。特别适合有一定Python基础的研究人员或开发者，用于快速验证文本生成效果或处理中小规模数据集。虽然当前更推荐使用aitextgen（训练效率更高），但gpt-2-simple的检查点仍可无缝迁移，为轻量级项目提供灵活选择。","# gpt-2-simple\n\n![gen_demo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fminimaxir_gpt-2-simple_readme_b0db6f023a98.png)\n\nA simple Python package that wraps existing model fine-tuning and generation scripts for [OpenAI](https:\u002F\u002Fopenai.com)'s [GPT-2 text generation model](https:\u002F\u002Fopenai.com\u002Fblog\u002Fbetter-language-models\u002F) (specifically the \"small\" 
124M and \"medium\" 355M hyperparameter versions). Additionally, this package allows easier generation of text, generating to a file for easy curation, allowing for prefixes to force the text to start with a given phrase.\n\nThis package incorporates and makes minimal low-level changes to:\n\n- Model management from OpenAI's [official GPT-2 repo](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgpt-2) (MIT License)\n- Model finetuning from Neil Shepperd's [fork](https:\u002F\u002Fgithub.com\u002Fnshepperd\u002Fgpt-2) of GPT-2 (MIT License)\n- Text generation output management from [textgenrnn](https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Ftextgenrnn) (MIT License \u002F also created by me)\n\nFor finetuning, it is **strongly** recommended to use a GPU, although you can generate using a CPU (albeit much more slowly). If you are training in the cloud, using a Colaboratory notebook or a Google Compute Engine VM w\u002F the [TensorFlow Deep Learning](https:\u002F\u002Fcloud.google.com\u002Fdeep-learning-vm\u002F) image is strongly recommended, as the GPT-2 model is hosted on GCP.\n\nYou can use gpt-2-simple to retrain a model using a GPU **for free** in [this Colaboratory notebook](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce), which also demos additional features of the package.\n\nNote: Development on gpt-2-simple has mostly been superseded by [aitextgen](https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Faitextgen), which has similar AI text generation capabilities with more efficient training time and resource usage. If you do not require using TensorFlow, I recommend using aitextgen instead. 
Checkpoints trained using gpt-2-simple can be [loaded using aitextgen](https:\u002F\u002Fdocs.aitextgen.io\u002Fgpt-2-simple\u002F) as well.\n\n## Install\n\ngpt-2-simple can be installed [via PyPI](https:\u002F\u002Fpypi.org\u002Fproject\u002Fgpt_2_simple\u002F):\n\n```shell\npip3 install gpt-2-simple\n```\n\nYou will also need to install the corresponding TensorFlow 2.X version (min 2.5.1) for your system (e.g. `tensorflow` or `tensorflow-gpu`).\n\n## Usage\n\nAn example of downloading the model to the local system, finetuning it on a dataset, and generating some text.\n\nWarning: the pretrained 124M model, and thus any finetuned model, is 500 MB! (the pretrained 355M model is 1.5 GB)\n\n```python\nimport gpt_2_simple as gpt2\nimport os\nimport requests\n\nmodel_name = \"124M\"\nif not os.path.isdir(os.path.join(\"models\", model_name)):\n\tprint(f\"Downloading {model_name} model...\")\n\tgpt2.download_gpt2(model_name=model_name)   # model is saved into current directory under \u002Fmodels\u002F124M\u002F\n\n\nfile_name = \"shakespeare.txt\"\nif not os.path.isfile(file_name):\n\turl = \"https:\u002F\u002Fraw.githubusercontent.com\u002Fkarpathy\u002Fchar-rnn\u002Fmaster\u002Fdata\u002Ftinyshakespeare\u002Finput.txt\"\n\tdata = requests.get(url)\n\n\twith open(file_name, 'w') as f:\n\t\tf.write(data.text)\n\n\nsess = gpt2.start_tf_sess()\ngpt2.finetune(sess,\n              file_name,\n              model_name=model_name,\n              steps=1000)   # steps is max number of training steps\n\ngpt2.generate(sess)\n```\n\nThe generated model checkpoints are by default in `\u002Fcheckpoint\u002Frun1`. If you want to load a model from that folder and generate text from it:\n\n```python\nimport gpt_2_simple as gpt2\n\nsess = gpt2.start_tf_sess()\ngpt2.load_gpt2(sess)\n\ngpt2.generate(sess)\n```\n\nAs with textgenrnn, you can generate and save text for later use (e.g. 
an API or a bot) by using the `return_as_list` parameter.\n\n```python\nsingle_text = gpt2.generate(sess, return_as_list=True)[0]\nprint(single_text)\n```\n\nYou can pass a `run_name` parameter to `finetune` and `load_gpt2` if you want to store\u002Fload multiple models in a `checkpoint` folder.\n\nThere is also a command-line interface for both finetuning and generation with strong defaults for just running on a Cloud VM w\u002F GPU. For finetuning (which will also download the model if not present):\n\n```shell\ngpt_2_simple finetune shakespeare.txt\n```\n\nAnd for generation, which generates texts to files in a `gen` folder:\n\n```shell\ngpt_2_simple generate\n```\n\nMost of the same parameters available in the functions are available as CLI arguments, e.g.:\n\n```shell\ngpt_2_simple generate --temperature 1.0 --nsamples 20 --batch_size 20 --length 50 --prefix \"\u003C|startoftext|>\" --truncate \"\u003C|endoftext|>\" --include_prefix False --nfiles 5\n```\n\nSee below to see what some of the CLI arguments do.\n\nNB: _Restart the Python session first_ if you want to finetune on another dataset or load another model.\n\n## Differences Between gpt-2-simple And Other Text Generation Utilities\n\nThe method GPT-2 uses to generate text is slightly different from that of other packages like textgenrnn (specifically, GPT-2 generates the full text sequence purely on the GPU and decodes it afterward), and this behavior cannot easily be changed without hacking the underlying model code. As a result:\n\n- In general, GPT-2 is better at maintaining context over its entire generation length, making it good for generating conversational text. 
The text is also generally grammatically correct, with proper capitalization and few typos.\n- The original GPT-2 model was trained on a _very_ large variety of sources, allowing the model to incorporate idioms not seen in the input text.\n- GPT-2 can only generate a maximum of 1024 tokens per request (about 3-4 paragraphs of English text).\n- GPT-2 cannot stop early upon reaching a specific end token. (workaround: pass the `truncate` parameter to a `generate` function to only collect text until a specified end token. You may want to reduce `length` appropriately.)\n- Higher temperatures work better (e.g. 0.7 - 1.0) to generate more interesting text, while other frameworks work better between 0.2 - 0.5.\n- When finetuning GPT-2, it has no sense of the beginning or end of a document within a larger text. You'll need to use a bespoke character sequence to indicate the beginning and end of a document. Then while generating, you can specify a `prefix` targeting the beginning token sequences, and a `truncate` targeting the end token sequence. You can also set `include_prefix=False` to discard the prefix token while generating (e.g. if it's something unwanted like `\u003C|startoftext|>`).\n- If you pass a single-column `.csv` file to `finetune()`, it will automatically parse the CSV into a format ideal for training with GPT-2 (including prepending `\u003C|startoftext|>` and suffixing `\u003C|endoftext|>` to every text document, so the `truncate` tricks above are helpful when generating output). This is necessary to handle both quotes and newlines in each text document correctly.\n- GPT-2 allows you to generate texts in parallel by setting a `batch_size` that is divisible into `nsamples`, resulting in much faster generation. Works very well with a GPU (can set `batch_size` up to 20 on Colaboratory's K80)!\n- Due to GPT-2's architecture, it scales up nicely with more powerful GPUs. 
For the 124M model, if you want to train for longer periods of time, GCP's P100 GPU is about 3x faster than a K80\u002FT4 for only 3x the price, making it price-comparable (the V100 is about 1.5x faster than the P100 but about 2x the price). The P100 uses 100% of the GPU even with `batch_size=1`, and about 88% of the V100 GPU.\n- If you have a partially-trained GPT-2 model and want to continue finetuning it, you can pass `overwrite=True` to `finetune`, which will continue training and remove the previous iteration of the model without creating a duplicate copy. This can be especially useful for transfer learning (e.g. heavily finetune GPT-2 on one dataset, then finetune on another dataset to get a \"merging\" of both datasets).\n- If your input text dataset is massive (>100 MB), you may want to preencode and compress the dataset using `gpt2.encode_dataset(file_path)`. The output is a compressed `.npz` file which will load much faster into the GPU for finetuning.\n- The 774M \"large\" model may not support finetuning, as it will cause modern GPUs to go out-of-memory (you may get lucky if you use a P100 GPU on Colaboratory). However, you can still generate from the default pretrained model using `gpt2.load_gpt2(sess, model_name='774M')` and `gpt2.generate(sess, model_name='774M')`.\n- The 1558M \"extra large\" model (the full-size GPT-2) may not work out-of-the-box with the GPU included with the Colaboratory Notebook. 
More testing is needed to identify optimal configurations for it.\n\n## Interactive Apps Using gpt-2-simple\n\n- [gpt2-small](https:\u002F\u002Fminimaxir.com\u002Fapps\u002Fgpt2-small\u002F) — App using the default GPT-2 124M pretrained model\n- [gpt2-reddit](https:\u002F\u002Fminimaxir.com\u002Fapps\u002Fgpt2-reddit\u002F) — App to generate Reddit titles based on a specified subreddit and\u002For keyword(s)\n- [gpt2-mtg](https:\u002F\u002Fminimaxir.com\u002Fapps\u002Fgpt2-mtg\u002F) — App to generate Magic: The Gathering cards\n\n## Text Generation Examples Using gpt-2-simple\n\n- [ResetEra](https:\u002F\u002Fwww.resetera.com\u002Fthreads\u002Fi-trained-an-ai-on-thousands-of-resetera-thread-conversations-and-it-created-hot-gaming-shitposts.112167\u002F) — Generated video game forum discussions ([GitHub w\u002F dumps](https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Fresetera-gpt-2))\n- [\u002Fr\u002Flegaladvice](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Flegaladviceofftopic\u002Fcomments\u002Fbfqf22\u002Fi_trained_a_moreadvanced_ai_on_rlegaladvice\u002F) — Title generation ([GitHub w\u002F dumps](https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Flegaladvice-gpt2))\n- [Hacker News](https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Fhacker-news-gpt-2) — Tens of thousands of generated Hacker News submission titles\n\n## Maintainer\u002FCreator\n\nMax Woolf ([@minimaxir](https:\u002F\u002Fminimaxir.com))\n\n_Max's open-source projects are supported by his [Patreon](https:\u002F\u002Fwww.patreon.com\u002Fminimaxir). 
If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use._\n\n## License\n\nMIT\n\n## Disclaimer\n\nThis repo has no affiliation or relationship with OpenAI.\n","# gpt-2-simple\n\n![gen_demo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fminimaxir_gpt-2-simple_readme_b0db6f023a98.png)\n\n这是一个简单的 Python 包，用于封装 OpenAI 的 GPT-2 文本生成模型（特别是“small”1.24亿参数版本和“medium”3.55亿参数版本）的现有模型微调与生成脚本。此外，该包还支持更便捷地生成文本，可将生成结果保存到文件以便于后续整理，并允许指定前缀以强制文本以特定短语开头。\n\n此包对以下内容进行了整合，并仅做了少量底层改动：\n\n- 来自 OpenAI 官方 GPT-2 仓库（MIT 许可证）的模型管理\n- 来自 Neil Shepperd 分支的 GPT-2 微调代码（MIT 许可证）\n- 来自 [textgenrnn](https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Ftextgenrnn) 的文本生成输出管理（MIT 许可证／同样由我开发）\n\n对于微调，**强烈建议**使用 GPU，尽管您也可以用 CPU 进行生成（但速度会慢很多）。如果您在云端训练，强烈推荐使用 Colaboratory Notebook 或配备 TensorFlow 深度学习镜像的 Google Compute Engine 虚拟机。（因为 GPT-2 模型托管在 GCP 上）\n\n您可以使用 gpt-2-simple 在 [此 Colaboratory Notebook](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce) 中免费利用 GPU 重新训练模型，该 Notebook 还演示了该包的其他功能。\n\n注意：gpt-2-simple 的开发工作已基本被 [aitextgen](https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Faitextgen) 取代，后者具备相似的 AI 文本生成能力，且训练时间和资源消耗更加高效。如果您不需要使用 TensorFlow，我建议改用 aitextgen。使用 gpt-2-simple 训练得到的检查点同样可以 [通过 aitextgen 加载](https:\u002F\u002Fdocs.aitextgen.io\u002Fgpt-2-simple\u002F)。\n\n## 安装\n\ngpt-2-simple 可通过 PyPI 安装：\n\n```shell\npip3 install gpt-2-simple\n```\n\n此外，您还需要为您的系统安装对应的 TensorFlow 2.X 版本（最低 2.5.1），例如 `tensorflow` 或 `tensorflow-gpu`。\n\n## 使用方法\n\n以下是一个示例，展示如何将模型下载到本地、在数据集上进行微调并生成一些文本。\n\n警告：预训练的 1.24 亿参数模型及其微调后的模型大小均为 500 MB！（预训练的 3.55 亿参数模型则为 1.5 GB）\n\n```python\nimport gpt_2_simple as gpt2\nimport os\nimport requests\n\nmodel_name = \"124M\"\nif not os.path.isdir(os.path.join(\"models\", model_name)):\n    print(f\"正在下载 {model_name} 模型...\")\n    gpt2.download_gpt2(model_name=model_name)   # 模型会保存到当前目录下的 \u002Fmodels\u002F124M\u002F\n\nfile_name = \"shakespeare.txt\"\nif not 
os.path.isfile(file_name):\n    url = \"https:\u002F\u002Fraw.githubusercontent.com\u002Fkarpathy\u002Fchar-rnn\u002Fmaster\u002Fdata\u002Ftinyshakespeare\u002Finput.txt\"\n    data = requests.get(url)\n\n    with open(file_name, 'w') as f:\n        f.write(data.text)\n\n\nsess = gpt2.start_tf_sess()\ngpt2.finetune(sess,\n              file_name,\n              model_name=model_name,\n              steps=1000)   # steps 是最大训练步数\n\ngpt2.generate(sess)\n```\n\n生成的模型检查点默认保存在 `\u002Fcheckpoint\u002Frun1`。如果您想从该文件夹加载模型并生成文本：\n\n```python\nimport gpt_2_simple as gpt2\n\nsess = gpt2.start_tf_sess()\ngpt2.load_gpt2(sess)\n\ngpt2.generate(sess)\n```\n\n与 textgenrnn 类似，您可以通过设置 `return_as_list` 参数来生成并保存文本，以便后续使用（例如作为 API 或机器人）。\n\n```python\nsingle_text = gpt2.generate(sess, return_as_list=True)[0]\nprint(single_text)\n```\n\n如果您希望在 `checkpoint` 文件夹中存储或加载多个模型，可以在 `finetune` 和 `load_gpt2` 中传入 `run_name` 参数。\n\n此外，该包还提供了命令行界面，支持微调与生成操作，默认配置适合直接在配备 GPU 的云虚拟机上运行。对于微调（如果模型尚未下载，则会自动下载）：\n\n```shell\ngpt_2_simple finetune shakespeare.txt\n```\n\n对于生成操作，会将生成的文本保存到 `gen` 文件夹中：\n\n```shell\ngpt_2_simple generate\n```\n\n大部分函数中可用的参数同样支持作为 CLI 参数，例如：\n\n```shell\ngpt_2_simple generate --temperature 1.0 --nsamples 20 --batch_size 20 --length 50 --prefix \"\u003C|startoftext|>\" --truncate \"\u003C|endoftext|>\" --include_prefix False --nfiles 5\n```\n\n请参阅下方了解部分 CLI 参数的具体作用。\n\n注意：如果您想在另一个数据集上微调或加载另一个模型，请先**重启 Python 会话**。\n\n## gpt-2-simple 与其他文本生成工具的区别\n\nGPT-2 生成文本的方式与其他工具（如 textgenrnn）略有不同——具体而言，GPT-2 是在 GPU 上一次性生成完整的文本序列，之后再解码，这一行为在不修改底层模型代码的情况下难以改变。因此：\n\n- 总体而言，GPT-2 在整个生成过程中更能保持上下文连贯性，特别适合生成对话类文本。此外，生成的文本语法正确，大小写规范，错别字也较少。\n- 原始 GPT-2 模型是在种类极其丰富的数据源上训练的，这使得模型能够融入输入文本中未出现过的习语。\n- GPT-2 每次请求最多只能生成 1024 个标记（约相当于 3-4 段英文文本）。\n- GPT-2 无法在遇到特定结束标记时提前停止。（解决方法：在 `generate` 函数中传入 `truncate` 参数，以仅收集到指定结束标记为止的文本。你可能需要适当调小 `length` 参数。）\n- 较高的温度设置（例如 0.7 - 1.0）能生成更有趣的文本，而其他框架的最佳温度范围则在 0.2 - 0.5 之间。\n- 在微调 GPT-2 时，它无法感知更大文本中文档的开头和结尾。你需要使用自定义字符序列来标记文档的起始和结束。然后，在生成时，你可以指定一个 `prefix` 
来定位文档开头标记，以及一个 `truncate` 来定位文档结束标记。你还可以设置 `include_prefix=False`，以便在生成时忽略前缀标记（例如，如果你不想生成 `\u003C|startoftext|>` 这样的标记）。\n- 如果你向 `finetune()` 传递一个单列的 `.csv` 文件，它会自动将 CSV 解析成适合 GPT-2 训练的格式（包括在每个文本文档前添加 `\u003C|startoftext|>`，并在末尾添加 `\u003C|endoftext|>`，因此上述 `truncate` 技巧在生成输出时非常有用）。这样做是为了正确处理每个文本文档中的引号和换行符。\n- GPT-2 支持通过设置可被 `nsamples` 整除的 `batch_size` 来并行生成文本，从而大幅提升生成速度。配合 GPU 使用效果极佳（在 Colaboratory 的 K80 上，`batch_size` 最高可设为 20！）。\n- 由于 GPT-2 的架构设计，它能很好地适应更强大的 GPU。对于 1.24 亿参数的模型，如果你想长时间训练，GCP 的 P100 GPU 比 K80\u002FT4 快约 3 倍，而价格也仅为 3 倍，性价比相当（V100 比 P100 快约 1.5 倍，但价格贵约 2 倍）。P100 即使在 `batch_size=1` 时也能充分利用 GPU，而 V100 的利用率约为 88%。\n- 如果你已有部分训练好的 GPT-2 模型，并希望继续微调，可以在调用 `finetune` 时设置 `overwrite=True`，这样就能在不创建副本的情况下继续训练并覆盖之前的模型迭代。这在迁移学习中尤其有用（例如，先用一个数据集对 GPT-2 进行深度微调，再用另一个数据集微调，实现两个数据集的“融合”）。\n- 如果你的输入文本数据集非常庞大（超过 100 MB），你可以使用 `gpt2.encode_dataset(file_path)` 对数据集进行预编码和压缩。输出是一个压缩后的 `.npz` 文件，加载到 GPU 中进行微调的速度会快很多。\n- 7.74 亿参数的“large”模型可能不支持微调，因为它会导致现代 GPU 显存不足（如果你在 Colaboratory 上使用 P100 GPU，可能会幸运一些）。不过，你仍然可以使用默认预训练模型进行生成，只需调用 `gpt2.load_gpt2(sess, model_name='774M')` 和 `gpt2.generate(sess, model_name='774M')`。\n- 15.58 亿参数的“extra large”完整模型可能无法直接在 Colaboratory Notebook 自带的 GPU 上运行。还需要更多测试来确定最适合它的配置。\n\n## 使用 gpt-2-simple 的交互式应用\n\n- [gpt2-small](https:\u002F\u002Fminimaxir.com\u002Fapps\u002Fgpt2-small\u002F) — 使用默认 GPT-2 1.24 亿参数预训练模型的应用\n- [gpt2-reddit](https:\u002F\u002Fminimaxir.com\u002Fapps\u002Fgpt2-reddit\u002F) — 根据指定的 subreddit 和\u002F或关键词生成 Reddit 标题的应用\n- [gpt2-mtg](https:\u002F\u002Fminimaxir.com\u002Fapps\u002Fgpt2-mtg\u002F) — 生成万智牌卡牌的应用\n\n## 使用 gpt-2-simple 的文本生成示例\n\n- [ResetEra](https:\u002F\u002Fwww.resetera.com\u002Fthreads\u002Fi-trained-an-ai-on-thousands-of-resetera-thread-conversations-and-it-created-hot-gaming-shitposts.112167\u002F) — 生成的电子游戏论坛讨论帖（[GitHub 包含转储文件](https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Fresetera-gpt-2)）\n- 
[\u002Fr\u002Flegaladvice](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Flegaladviceofftopic\u002Fcomments\u002Fbfqf22\u002Fi_trained_a_moreadvanced_ai_on_rlegaladvice\u002F) — 标题生成（[GitHub 包含转储文件](https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Flegaladvice-gpt2)）\n- [Hacker News](https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Fhacker-news-gpt-2) — 数以万计的 Hacker News 提交标题生成\n\n## 维护者\u002F创作者\n\nMax Woolf ([@minimaxir](https:\u002F\u002Fminimaxir.com))\n\nMax 的开源项目由他的 [Patreon](https:\u002F\u002Fwww.patreon.com\u002Fminimaxir) 支持。如果你觉得这个项目对你有帮助，任何金钱上的捐助都将受到欢迎，并会被用于有意义的创作用途。\n\n## 许可协议\n\nMIT\n\n## 免责声明\n\n本仓库与 OpenAI 无任何关联或合作关系。","# gpt-2-simple 快速上手指南\n\n## 环境准备\n- **系统要求**：建议使用 NVIDIA GPU（如 T4\u002FP100），CPU 也可运行但速度较慢  \n- **前置依赖**：Python 3.6+，TensorFlow 2.5.1+  \n- **国内加速推荐**：使用清华源加速安装依赖\n\n## 安装步骤\n```shell\npip install gpt_2_simple tensorflow -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## 基本使用\n```python\nimport gpt_2_simple as gpt2\nimport requests\n\n# 下载示例数据（莎士比亚文本）\nurl = \"https:\u002F\u002Fraw.githubusercontent.com\u002Fkarpathy\u002Fchar-rnn\u002Fmaster\u002Fdata\u002Ftinyshakespeare\u002Finput.txt\"\nwith open(\"shakespeare.txt\", \"w\") as f:\n    f.write(requests.get(url).text)\n\n# 下载124M预训练模型（首次运行自动下载，约500MB）\ngpt2.download_gpt2(model_name=\"124M\")\n\n# 启动TensorFlow会话\nsess = gpt2.start_tf_sess()\n\n# 微调模型（1000步）\ngpt2.finetune(sess, \"shakespeare.txt\", steps=1000)\n\n# 生成文本并打印\ngenerated_text = gpt2.generate(sess, return_as_list=True)[0]\nprint(generated_text)\n```\n\n> 注意：若需使用自己的数据，替换 `shakespeare.txt` 为本地文件路径。首次运行会自动下载模型文件。","一位独立游戏开发者正在为一款复古风格的文字冒险游戏创作剧情对话，需要让NPC角色使用莎士比亚式古英语风格说话，但手动编写数百条符合语境的对话耗时且风格不统一。\n\n### 没有 gpt-2-simple 时\n- 每条对话都需要人工模仿莎士比亚语言风格，耗时数小时才能写出20条合格内容\n- 语言风格容易偏离，有时出现现代词汇或语法错误，破坏沉浸感\n- 无法批量生成不同语境下的对话变体，每次调整都要重写\n- 缺乏训练数据管理能力，无法将已验证的优质对话纳入模型持续优化\n- 需要依赖外部API或付费模型，成本高且无法本地部署保护游戏内容隐私\n\n### 使用 gpt-2-simple 后\n- 仅用一份《莎士比亚全集》文本作为训练数据，30分钟内即可在免费Colab上完成模型微调\n- 生成的对话自然贴合伊丽莎白时代语言习惯，连“thou”“hath”等古语用法都准确自然\n- 
可通过指定前缀（如“NPC对主角说：”）批量生成100+条风格一致的对话，效率提升20倍\n- 训练后的模型保存为本地文件，可随时加载、迭代，加入新对话后重新微调无需从头开始\n- 完全离线运行，所有剧情内容不外传，保障游戏知识产权安全\n\ngpt-2-simple 让开发者从“人工写诗”转变为“训练一个莎士比亚灵魂”，以极低成本实现了专业级文本生成能力。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fminimaxir_gpt-2-simple_b0db6f02.png","minimaxir","Max Woolf","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fminimaxir_3ab20437.jpg","Senior Data Scientist @buzzfeed. Plotter of pretty charts.","@buzzfeed ","San Francisco","max@minimaxir.com",null,"https:\u002F\u002Fminimaxir.com","https:\u002F\u002Fgithub.com\u002Fminimaxir",[86],{"name":87,"color":88,"percentage":89},"Python","#3572A5",100,3406,670,"2026-04-02T08:38:46","NOASSERTION","Linux, macOS, Windows","推荐使用 NVIDIA GPU，显存 8GB+，CUDA 版本需与 TensorFlow 2.5.1+ 兼容","16GB+",{"notes":98,"python":99,"dependencies":100},"建议使用 Colab 等云平台免费运行，首次运行需下载 500MB 至 1.5GB 模型文件；训练建议使用 GPU，CPU 生成速度极慢；支持通过 .csv 文件自动处理文本格式；模型检查点默认保存在 \u002Fcheckpoint\u002Frun1 目录；推荐使用 aitextgen 替代以获得更高效训练","3.8+",[101,102,103,104,105],"tensorflow>=2.5.1","requests","numpy","tqdm","regex",[13,26],[108,109,110,111],"text-generation","tensorflow","openai","textgenrnn","2026-03-27T02:49:30.150509","2026-04-06T08:52:36.163440",[115,120,124,129],{"id":116,"question_zh":117,"answer_zh":118,"source_url":119},8832,"运行finetune时出现JSONDecodeError如何解决？","在调用 `gpt2.finetune()` 之前添加 `import tensorflow as tf; tf.reset_default_graph()`。示例代码：\n```python\nimport tensorflow as tf\ntf.reset_default_graph()\nsess = gpt2.start_tf_sess()\ngpt2.finetune(sess, 'java_train_java.txt', model_name=model_name, steps=1000)\n```","https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Fgpt-2-simple\u002Fissues\u002F97",{"id":121,"question_zh":122,"answer_zh":123,"source_url":119},8833,"训练1.8GB文本文件是否需要GPU？","是的，尤其对于大文本文件，通常需要GPU进行训练。",{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},8834,"'Gold sampling'是什么意思？","'Gold' 指的是实际文本延续，即ground 
truth，是人工撰写的正确续写。在示例中，它是模型生成的对比基准。","https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Fgpt-2-simple\u002Fissues\u002F51",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},8835,"如何让GPT-2聊天机器人记住对话上下文？","创建一个'memory-context bank'存储对话历史，将整个对话作为前缀输入模型。例如，存储查询和响应，每次生成时使用完整上下文。具体实现可参考Colab notebook: https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1qbVsxvCjfTdWCqBzciLo1YeY0FhzQ-pn","https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Fgpt-2-simple\u002Fissues\u002F109",[135,140,145,150,155,160,165,170,175,180,185,190,195,200,205,210,215],{"id":136,"version":137,"summary_zh":138,"released_at":139},115874,"v0.8.1","Thanks to https:\u002F\u002Fgithub.com\u002FYaleDHLab via https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Fgpt-2-simple\u002Fpull\u002F275, gpt-2-simple now supports TensorFlow 2 by default, and the minimum TensorFlow version is now 2.5.1! The Colab Notebook has also been updated to no longer use TensorFlow 1.X.\r\n\r\nNote: Development on gpt-2-simple has mostly been superseded by [aitextgen](https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Faitextgen), which has similar AI text generation capabilities with more efficient training time and resource usage. If you do not require using TensorFlow, I recommend using aitextgen instead. Checkpoints trained using gpt-2-simple can be [loaded using aitextgen](https:\u002F\u002Fdocs.aitextgen.io\u002Fgpt-2-simple\u002F) as well.\r\n\r\n","2021-10-18T02:38:39",{"id":141,"version":142,"summary_zh":143,"released_at":144},115875,"v0.7.2","* Switched the model URL from GCP to Azure. 
(#253) \r\n* Pin TensorFlow 1.15 (#200)\r\n* Add checkpoint loading from other checkpoints (#175)","2021-02-14T21:13:20",{"id":146,"version":147,"summary_zh":148,"released_at":149},115876,"v0.7.1","Some have successfully finetuned 774M\u002F1558M, so the assert has been removed.","2019-12-28T04:05:21",{"id":151,"version":152,"summary_zh":153,"released_at":154},115877,"v0.7","* Multi-GPU support (#127) (not fully tested; will add some docs when done)\r\n* Fixed checkpoint dir bug (#134)\r\n* Added a hard assert if a TensorFlow version >= 2.0 is used (#137)","2019-12-01T18:39:34",{"id":156,"version":157,"summary_zh":158,"released_at":159},115878,"v0.6","* 774M is explicitly blocked from being fine-tuned and will trigger an assert if attempted. If a way to finetune it without being super-painful is added, the ability to finetune it will be restored.\r\n* Allow ability to generate text from the default pretrained models by passing `model_name` to `gpt2.load_gpt2()` and `gpt2.generate()` (this _will_ work with 774M).\r\n* Add `sgd` as an `optimizer` parameter to `finetune` (default: `adam`)\r\n* Support for changed model names, w\u002F changes more prominent in the README.","2019-08-28T17:11:35",{"id":161,"version":162,"summary_zh":163,"released_at":164},115879,"v0.5.4","Merged a few PRs:\r\n\r\nFixed generate cmd run name: #78 \r\nResolved most deprecation warnings: #83 \r\nOptional model parameters: #90 \r\n\r\nThis does not make the package fully TF 2.0 compatible, but it's a big step!\r\n\r\n","2019-07-29T00:07:52",{"id":166,"version":167,"summary_zh":168,"released_at":169},115880,"v0.5.3","Assertion was triggering false positives, so removing it.","2019-06-19T05:35:46",{"id":171,"version":172,"summary_zh":173,"released_at":174},115881,"v0.5.2","Minor fix to prevent issue hit with gpt-2-cloud-run.\r\n\r\nA goal of the release was to allow a graph reset without resetting the parameters; that did not seem to work, so holding off on that 
release.","2019-06-18T04:00:18",{"id":176,"version":177,"summary_zh":178,"released_at":179},115882,"v0.5.1","Merged PRs (including fix for prefix issue). (see commits for more info)","2019-06-16T03:16:49",{"id":181,"version":182,"summary_zh":183,"released_at":184},115883,"v0.5","## Adapted a few functions from Neil Shepperd's fork:\r\n\t\r\n* Nucleus Sampling (`top_p`) when generating text, which results in surprisingly different results. (setting `top_p=0.9` works well). Supersedes `top_k` when used. (#51)\r\n* An `encode_dataset()` function to preencode and compress a large dataset before loading it for finetuning. (#19, #54)\r\n\r\n## Improvements to continuing model training:\r\n\r\n* `overwrite` argument for `finetune`: with `restore_from=\"latest\"`, this continues model training without creating a duplicate copy of the model, and is therefore good for transfer learning using multiple datasets (#20)\r\n* You can continue to `finetune` a model without having the original GPT-2 model present.\r\n\t\r\n## Improvements with I\u002FO involving Colaboratory\r\n* Checkpoint folders are now packaged into a `.tar` file when copying to Google Drive, and when copying from Google Drive, the `.tar` file is automatically unpackaged into the correct checkpoint format. (you can pass `copy_folder=True` to the `copy_checkpoint` function to revert to the old behavior). (#37: thanks @woctezuma !)\r\n* `copy_checkpoint_to_gdrive` and `copy_checkpoint_from_gdrive` now take a `run_name` argument instead of a `checkpoint_folder` argument.\r\n\r\n## Miscellaneous\r\n\r\n* Added CLI arguments for `top_k`, `top_p`, `overwrite`.\r\n* Cleaned up redundant function parameters (#39)","2019-05-20T03:53:44",{"id":186,"version":187,"summary_zh":188,"released_at":189},115884,"v0.4.2","* `load_gpt2()` in a fresh session is much faster and uses much less memory when loaded. 
(for the 117M model, the system will stay under 2 GB RAM, which is the critical point for cloud services)\r\n* `start_tf_sess()` now accepts a `threads` parameter, which is useful if you know exactly how many threads will be used.","2019-05-05T22:51:19",{"id":191,"version":192,"summary_zh":193,"released_at":194},115885,"v0.4.1","Number of CSV tokens was inadvertently doubled. (#25)","2019-05-05T16:35:39",{"id":196,"version":197,"summary_zh":198,"released_at":199},115886,"v0.4","* Support the 345M model (thanks to Neil Shepperd for the [gradient checkpointing](https:\u002F\u002Fgithub.com\u002Fnshepperd\u002Fgpt-2\u002Fcommit\u002F47df6da611716b4826e3397cd68d711c6951c8e5) implementation!)\r\n* Support model_name in the CLI for above support\r\n* Support run_name in the CLI\r\n* Support `.csv` files as an input dataset to `finetune` (will parse the CSV as if it was done via `encode_csv()`).\r\n* Fix one-off issues (#21)\r\n","2019-05-05T05:39:39",{"id":201,"version":202,"summary_zh":203,"released_at":204},115887,"v0.3.1","* Fix one-off error where checkpoint saved a step early.\r\n* Fix issue where `restore_from='fresh'` uses the counter from a previously-trained checkpoint.\r\n* If `restore_from='latest'`, `steps` will now train for the specified amount of steps, instead of training until the specified number of steps. (#13, #14)","2019-04-23T03:36:14",{"id":206,"version":207,"summary_zh":208,"released_at":209},115888,"v0.3","* Added a basic CLI.\r\n* Added a `include_prefix` parameter to give an option to exclude the input prefix.\r\n* Improved regex for truncation.","2019-04-21T17:20:52",{"id":211,"version":212,"summary_zh":213,"released_at":214},115889,"v0.2","* `is_gpt2_downloaded`: Check if the model is downloaded.\r\n* `encode_csv`: Convert a CSV to a format suitable for GPT-2.","2019-04-20T17:43:16",{"id":216,"version":217,"summary_zh":82,"released_at":218},115890,"v0.1","2019-04-19T00:19:44"]