[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-google-research--electra":3,"tool-google-research--electra":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",149489,2,"2026-04-10T11:32:46",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":84,"forks":85,"last_commit_at":86,"license":87,"difficulty_score":88,"env_os":89,"env_gpu":90,"env_ram":89,"env_deps":91,"category_tags":99,"github_topics":100,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":104,"updated_at":105,"faqs":106,"releases":136},6325,"google-research\u002Felectra","electra","ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators","ELECTRA 是一种高效的自监督语言表示学习方法，旨在以更低的计算成本预训练 Transformer 模型。与传统方法通过“生成”缺失文本不同，ELECTRA 创新地采用“判别”机制：它让模型像侦探一样，去识别输入中哪些词是原始的“真词”，哪些是由另一个小型网络替换的“假词”。这种类似生成对抗网络（GAN）的思路，极大地提升了数据利用效率。\n\n这一设计解决了以往大模型预训练极其消耗算力、小模型效果不佳的痛点。即使在单张 GPU 上训练小型模型，ELECTRA 也能获得强劲的性能；而在大规模训练下，它在 SQuAD 问答和 GLUE 语言理解等基准测试中均达到了业界领先水平，表现优于同量级的 BERT 或 ALBERT 模型。\n\nELECTRA 
非常适合自然语言处理领域的研究人员和开发者使用，尤其是那些希望在有限硬件资源下进行模型预训练，或需要为分类、问答、序列标注等下游任务微调高性能模型的技术团队。其核心亮点在于独特的“替换令牌检测”预训练任务，不仅大幅降低了训练门槛，还衍生出了基于能量模型的 Electric 变体，可用于文本重排序等高级场景。","# ELECTRA\n\n## Introduction\n\n**ELECTRA** is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish \"real\" input tokens vs \"fake\" input tokens generated by another neural network, similar to the discriminator of a [GAN](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F) dataset.\n\nFor a detailed description and experimental results, please refer to our ICLR 2020 paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https:\u002F\u002Fopenreview.net\u002Fpdf?id=r1xMH1BtvB).\n\nThis repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks including classification tasks (e.g., [GLUE](https:\u002F\u002Fgluebenchmark.com\u002F)), QA tasks (e.g., [SQuAD](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F)), and sequence tagging tasks (e.g., [text chunking](https:\u002F\u002Fwww.clips.uantwerpen.be\u002Fconll2000\u002Fchunking\u002F)).\n\nThis repository also contains code for **Electric**, a version of ELECTRA inspired by [energy-based models](http:\u002F\u002Fyann.lecun.com\u002Fexdb\u002Fpublis\u002Fpdf\u002Flecun-06.pdf). Electric provides a more principled view of ELECTRA as a \"negative sampling\" [cloze model](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FCloze_test). 
It can also efficiently produce [pseudo-likelihood scores](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.14659.pdf) for text, which can be used to re-rank the outputs of speech recognition or machine translation systems. For details on Electric, please refer to our EMNLP 2020 paper [Pre-Training Transformers as Energy-Based Cloze Models](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.emnlp-main.20.pdf).\n\n\n\n## Released Models\n\nWe are initially releasing three pre-trained models:\n\n| Model | Layers | Hidden Size | Params | GLUE score (test set) | Download |\n| --- | --- | --- | --- | --- | --- |\n| ELECTRA-Small | 12 | 256 | 14M | 77.4 | [link](https:\u002F\u002Fstorage.googleapis.com\u002Felectra-data\u002Felectra_small.zip) |\n| ELECTRA-Base | 12 | 768 | 110M | 82.7 | [link](https:\u002F\u002Fstorage.googleapis.com\u002Felectra-data\u002Felectra_base.zip) |\n| ELECTRA-Large | 24 | 1024 | 335M | 85.2 | [link](https:\u002F\u002Fstorage.googleapis.com\u002Felectra-data\u002Felectra_large.zip) |\n\nThe models were trained on uncased English text. They correspond to ELECTRA-Small++, ELECTRA-Base++, and ELECTRA-1.75M in our paper. We hope to release other models, such as multilingual models, in the future.\n\nOn [GLUE](https:\u002F\u002Fgluebenchmark.com\u002F), ELECTRA-Large scores slightly better than ALBERT\u002FXLNet, ELECTRA-Base scores better than BERT-Large, and ELECTRA-Small scores slightly worse than [TinyBERT](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.10351) (but uses no distillation). 
See the expected results section below for detailed performance numbers.\n\n\n\n## Requirements\n* Python 3\n* [TensorFlow](https:\u002F\u002Fwww.tensorflow.org\u002F) 1.15 (although we hope to support TensorFlow 2.0 at a future date)\n* [NumPy](https:\u002F\u002Fnumpy.org\u002F)\n* [scikit-learn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002F) and [SciPy](https:\u002F\u002Fwww.scipy.org\u002F) (for computing some evaluation metrics).\n\n## Pre-training\nUse `build_pretraining_dataset.py` to create a pre-training dataset from a dump of raw text. It has the following arguments:\n\n* `--corpus-dir`: A directory containing raw text files to turn into ELECTRA examples. A text file can contain multiple documents with empty lines separating them.\n* `--vocab-file`: File defining the WordPiece vocabulary.\n* `--output-dir`: Where to write out ELECTRA examples.\n* `--max-seq-length`: The number of tokens per example (128 by default).\n* `--num-processes`: If >1, parallelize across multiple processes (1 by default).\n* `--blanks-separate-docs`: Whether blank lines indicate document boundaries (True by default).\n* `--do-lower-case\u002F--no-lower-case`: Whether to lowercase the input text (True by default).\n\nUse `run_pretraining.py` to pre-train an ELECTRA model. It has the following arguments:\n\n* `--data-dir`: a directory where pre-training data, model weights, etc. are stored. By default, the training loads examples from `\u003Cdata-dir>\u002Fpretrain_tfrecords` and a vocabulary from `\u003Cdata-dir>\u002Fvocab.txt`.\n* `--model-name`: a name for the model being trained. Model weights will be saved in `\u003Cdata-dir>\u002Fmodels\u002F\u003Cmodel-name>` by default.\n* `--hparams` (optional): a JSON dict or path to a JSON file containing model hyperparameters, data paths, etc. 
See `configure_pretraining.py` for the supported hyperparameters.\n\nIf training is halted, re-running `run_pretraining.py` with the same arguments will continue the training where it left off.\n\nYou can continue pre-training from the released ELECTRA checkpoints by:\n1. Setting the model name to point to a downloaded model (e.g., `--model-name electra_small` if you downloaded weights to `$DATA_DIR\u002Fmodels\u002Felectra_small`).\n2. Setting `num_train_steps` by (for example) adding `\"num_train_steps\": 4010000` to the `--hparams`. This will continue training the small model for 10000 more steps (it has already been trained for 4e6 steps).\n3. Increasing the learning rate to account for the linear learning rate decay. For example, to start with a learning rate of 2e-4, you should set the `learning_rate` hparam to 2e-4 * (4e6 + 10000) \u002F 10000.\n4. For ELECTRA-Small, you also need to specify `\"generator_hidden_size\": 1.0` in the `hparams`, because we did not use a small generator for that model.\n\n## Quickstart: Pre-train a small ELECTRA model.\nThese instructions pre-train a small ELECTRA model (12 layers, 256 hidden size). Unfortunately, the data we used in the paper is not publicly available, so we will use the [OpenWebTextCorpus](https:\u002F\u002Fskylion007.github.io\u002FOpenWebTextCorpus\u002F) released by Aaron Gokaslan and Vanya Cohen instead. The fully-trained model (~4 days on a V100 GPU) should perform roughly in between [GPT](https:\u002F\u002Fs3-us-west-2.amazonaws.com\u002Fopenai-assets\u002Fresearch-covers\u002Flanguage-unsupervised\u002Flanguage_understanding_paper.pdf) and BERT-Base in terms of GLUE performance. By default, the model is trained on length-128 sequences, so it is not suitable for question answering. See the \"expected results\" section below for more details on model performance.\n\n#### Setup\n1. Place a vocabulary file in `$DATA_DIR\u002Fvocab.txt`. 
Our ELECTRA models all used the exact same vocabulary as English uncased BERT, which you can download [here](https:\u002F\u002Fstorage.googleapis.com\u002Felectra-data\u002Fvocab.txt).\n2. Download the [OpenWebText](https:\u002F\u002Fskylion007.github.io\u002FOpenWebTextCorpus\u002F) corpus (12 GB) and extract it (i.e., run `tar xf openwebtext.tar.xz`). Place it in `$DATA_DIR\u002Fopenwebtext`.\n3. Run `python3 build_openwebtext_pretraining_dataset.py --data-dir $DATA_DIR --num-processes 5`. It pre-processes\u002Ftokenizes the data and outputs examples as [tfrecord](https:\u002F\u002Fwww.tensorflow.org\u002Ftutorials\u002Fload_data\u002Ftfrecord) files under `$DATA_DIR\u002Fpretrain_tfrecords`. The tfrecords require roughly 30 GB of disk space.\n\n#### Pre-training the model.\nRun `python3 run_pretraining.py --data-dir $DATA_DIR --model-name electra_small_owt`\nto train a small ELECTRA model for 1 million steps on the data. This takes slightly over 4 days on a Tesla V100 GPU. However, the model should achieve decent results after 200k steps (10 hours of training on a V100 GPU).\n\nTo customize the training, add `--hparams '{\"hparam1\": value1, \"hparam2\": value2, ...}'` to the run command. `--hparams` can also be a path to a `.json` file containing the hyperparameters. Some particularly useful options:\n\n* `\"debug\": true` trains a tiny ELECTRA model for a few steps.\n* `\"model_size\": one of \"small\", \"base\", or \"large\"`: determines the size of the model.\n* `\"electra_objective\": false` trains a model with masked language modeling instead of replaced token detection (essentially BERT with dynamic masking and no next-sentence prediction).\n* `\"num_train_steps\": n` controls how long the model is pre-trained for.\n* `\"pretrain_tfrecords\": \u003Cpaths>` determines where the pre-training data is located. 
Note that you need to specify the files themselves, not just the directory (e.g., `\u003Cdata-dir>\u002Fpretrain_tfrecords\u002Fpretrain_data.tfrecord*`).\n* `\"vocab_file\": \u003Cpath>` and `\"vocab_size\": n` can be used to set a custom WordPiece vocabulary.\n* `\"learning_rate\": lr, \"train_batch_size\": n`, etc. can be used to change training hyperparameters.\n* `\"model_hparam_overrides\": {\"hidden_size\": n, \"num_hidden_layers\": m}`, etc. can be used to change the hyperparameters for the underlying transformer (the `\"model_size\"` flag sets the default values).\n\nSee `configure_pretraining.py` for the full set of supported hyperparameters.\n\n#### Evaluating the pre-trained model.\n\nTo evaluate the model on a downstream task, see the fine-tuning instructions below. To evaluate the generator\u002Fdiscriminator on the OpenWebText data, run `python3 run_pretraining.py --data-dir $DATA_DIR --model-name electra_small_owt --hparams '{\"do_train\": false, \"do_eval\": true}'`. This will print out eval metrics such as the accuracy of the generator and discriminator, and also write the metrics to `data-dir\u002Fmodel-name\u002Fresults`.\n\n## Fine-tuning\n\nUse `run_finetuning.py` to fine-tune and evaluate an ELECTRA model on a downstream NLP task. It expects three arguments:\n\n* `--data-dir`: a directory where data, model weights, etc. are stored. By default, the script loads fine-tuning data from `\u003Cdata-dir>\u002Ffinetuning_data\u002F\u003Ctask-name>` and a vocabulary from `\u003Cdata-dir>\u002Fvocab.txt`.\n* `--model-name`: the name of the pre-trained model; the pre-trained weights should exist in `data-dir\u002Fmodels\u002Fmodel-name`.\n* `--hparams`: a JSON dict containing model hyperparameters, data paths, etc. (e.g., `--hparams '{\"task_names\": [\"rte\"], \"model_size\": \"base\", \"learning_rate\": 1e-4, ...}'`). See `configure_finetuning.py` for the supported hyperparameters. 
Instead of a dict, this can also be a path to a `.json` file containing the hyperparameters. You must specify `\"task_names\"` and `\"model_size\"` (see examples below).\n\nEval metrics will be saved in `data-dir\u002Fmodel-name\u002Fresults` and model weights will be saved in `data-dir\u002Fmodel-name\u002Ffinetuning_models` by default. Evaluation is done on the dev set by default. To customize the training, add `--hparams '{\"hparam1\": value1, \"hparam2\": value2, ...}'` to the run command. Some particularly useful options:\n\n* `\"debug\": true` fine-tunes a tiny ELECTRA model for a few steps.\n* `\"task_names\": [\"task_name\"]`: specifies the tasks to train on. This is a list because the codebase nominally supports multi-task learning (although be warned that this has not been thoroughly tested).\n* `\"model_size\": one of \"small\", \"base\", or \"large\"`: determines the size of the model; you must set this to the same size as the pre-trained model.\n* `\"do_train\"` and `\"do_eval\"`: train and\u002For evaluate a model (both are set to true by default). To use `\"do_eval\": true` with `\"do_train\": false`, you need to specify the `init_checkpoint`, e.g., `python3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{\"model_size\": \"base\", \"task_names\": [\"mnli\"], \"do_train\": false, \"do_eval\": true, \"init_checkpoint\": \"\u003Cdata-dir>\u002Fmodels\u002Felectra_base\u002Ffinetuning_models\u002Fmnli_model_1\"}'`.\n* `\"num_trials\": n`: If >1, does multiple fine-tuning\u002Fevaluation runs with different random seeds.\n* `\"learning_rate\": lr, \"train_batch_size\": n`, etc. can be used to change training hyperparameters.\n* `\"model_hparam_overrides\": {\"hidden_size\": n, \"num_hidden_layers\": m}`, etc. 
can be used to change the hyperparameters for the underlying transformer (the `\"model_size\"` flag sets the default values).\n\n### Setup\nGet a pre-trained ELECTRA model either by training your own (see the pre-training instructions above) or by downloading the released ELECTRA weights and unzipping them under `$DATA_DIR\u002Fmodels` (e.g., you should have a directory `$DATA_DIR\u002Fmodels\u002Felectra_large` if you are using the large model).\n\n\n### Finetune ELECTRA on a GLUE task\n\nDownload the GLUE data by running [this script](https:\u002F\u002Fgist.github.com\u002FW4ngatang\u002F60c2bdb54d156a41194446737ce03e2e). From the directory containing the downloaded data, set it up by running `mv CoLA cola && mv MNLI mnli && mv MRPC mrpc && mv QNLI qnli && mv QQP qqp && mv RTE rte && mv SST-2 sst && mv STS-B sts && mv diagnostic\u002Fdiagnostic.tsv mnli && mkdir -p $DATA_DIR\u002Ffinetuning_data && mv * $DATA_DIR\u002Ffinetuning_data`.\n\nThen run `run_finetuning.py`. For example, to fine-tune ELECTRA-Base on MNLI:\n```\npython3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{\"model_size\": \"base\", \"task_names\": [\"mnli\"]}'\n```\nOr fine-tune the small model pre-trained with the instructions above on CoLA:\n```\npython3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_small_owt --hparams '{\"model_size\": \"small\", \"task_names\": [\"cola\"]}'\n```\n\n### Finetune ELECTRA on question answering\n\nThe code supports [SQuAD](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F) 1.1 and 2.0, as well as datasets in [the 2019 MRQA shared task](https:\u002F\u002Fgithub.com\u002Fmrqa\u002FMRQA-Shared-Task-2019).\n\n* **SQuAD 1.1**: Download the [train](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002Fdataset\u002Ftrain-v1.1.json) and [dev](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002Fdataset\u002Fdev-v1.1.json) datasets and move them under `$DATA_DIR\u002Ffinetuning_data\u002Fsquadv1\u002F(train|dev).json`.\n* **SQuAD 2.0**: Download 
the datasets from the [SQuAD website](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F) and move them under `$DATA_DIR\u002Ffinetuning_data\u002Fsquad\u002F(train|dev).json`.\n* **MRQA tasks**: Download the data from [here](https:\u002F\u002Fgithub.com\u002Fmrqa\u002FMRQA-Shared-Task-2019#datasets). Move the data to `$DATA_DIR\u002Ffinetuning_data\u002F(newsqa|naturalqs|triviaqa|searchqa)\u002F(train|dev).jsonl`.\n\nThen run, for example:\n```\npython3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{\"model_size\": \"base\", \"task_names\": [\"squad\"]}'\n```\n\nThis repository uses the official evaluation code released by the [SQuAD](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F) authors and [the MRQA shared task](https:\u002F\u002Fgithub.com\u002Fmrqa\u002FMRQA-Shared-Task-2019) to compute metrics.\n\n### Finetune ELECTRA on sequence tagging\n\nDownload the CoNLL-2000 text chunking dataset from [here](https:\u002F\u002Fwww.clips.uantwerpen.be\u002Fconll2000\u002Fchunking\u002F) and put it under `$DATA_DIR\u002Ffinetuning_data\u002Fchunk\u002F(train|dev).txt`. Then run:\n```\npython3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{\"model_size\": \"base\", \"task_names\": [\"chunk\"]}'\n```\n\n### Adding a new task\nThe easiest way to run on a new task is to implement a new `finetune.task.Task`, add it to `finetune.task_builder.py`, and then use `run_finetuning.py` as normal. For classification, QA, or sequence tagging tasks, you can inherit from `finetune.classification.classification_tasks.ClassificationTask`, `finetune.qa.qa_tasks.QATask`, or `finetune.tagging.tagging_tasks.TaggingTask`, respectively.\nFor preprocessing data, we use the same tokenizer as [BERT](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fbert).\n\n\n## Expected Results\nHere are expected results for ELECTRA on various tasks (test set for chunking, dev set for the other tasks). 
Note that variance in fine-tuning can be [quite large](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.06305), so for some tasks you may see big fluctuations in scores when fine-tuning from the same checkpoint multiple times. The below scores show median performance over a large number of random seeds.  ELECTRA-Small\u002FBase\u002FLarge are our released models. ELECTRA-Small-OWT is the OpenWebText-trained model from above (it performs a bit worse than ELECTRA-Small due to being trained for less time and on a smaller dataset).\n\n|  | CoLA | SST | MRPC | STS  | QQP  | MNLI | QNLI | RTE | SQuAD 1.1 | SQuAD 2.0 | Chunking |\n| --- | --- | --- | --- | ---  | ---  | --- | --- | --- | ---| ---| --- |\n| Metrics | MCC | Acc | Acc | Spearman  | Acc  | Acc | Acc | Acc | EM | EM | F1 |\n| ELECTRA-Large| 69.1 | 96.9 | 90.8 | 92.6 | 92.4 | 90.9 | 95.0 | 88.0 | 89.7 | 88.1 | 97.2 |\n| ELECTRA-Base | 67.7 | 95.1 | 89.5 | 91.2 | 91.5  | 88.8  | 93.2 | 82.7 | 86.8 | 80.5 | 97.1 |\n| ELECTRA-Small | 57.0 | 91.2 | 88.0 |  87.5 | 89.0  | 81.3 | 88.4 | 66.7 | 75.8 | 70.1 |  96.5 |\n| ELECTRA-Small-OWT | 56.8 | 88.3 | 87.4 |  86.8 | 88.3  | 78.9 | 87.9 | 68.5 | -- | -- |  -- |\n\nSee [here](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Felectra\u002Fissues\u002F3) for losses \u002F training curves of the models during pre-training.\n\n## Electric\n\nTo train [Electric](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.emnlp-main.20.pdf), use the same pre-training script and command as ELECTRA. Pass `\"electra_objective\": false` and  `\"electric_objective\": true` to the hyperparameters. We plan to release pre-trained Electric models soon!\n\n## Citation\nIf you use this code for your publication, please cite the original paper:\n```\n@inproceedings{clark2020electra,\n  title = {{ELECTRA}: Pre-training Text Encoders as Discriminators Rather Than Generators},\n  author = {Kevin Clark and Minh-Thang Luong and Quoc V. Le and Christopher D. 
Manning},\n  booktitle = {ICLR},\n  year = {2020},\n  url = {https:\u002F\u002Fopenreview.net\u002Fpdf?id=r1xMH1BtvB}\n}\n```\n\nIf you use the code for Electric, please cite the Electric paper:\n```\n@inproceedings{clark2020electric,\n  title = {Pre-Training Transformers as Energy-Based Cloze Models},\n  author = {Kevin Clark and Minh-Thang Luong and Quoc V. Le and Christopher D. Manning},\n  booktitle = {EMNLP},\n  year = {2020},\n  url = {https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.emnlp-main.20.pdf}\n}\n```\n\n## Contact Info\nFor help or issues using ELECTRA, please submit a GitHub issue.\n\nFor personal communication related to ELECTRA, please contact [Kevin Clark](https:\u002F\u002Fcs.stanford.edu\u002F~kevclark\u002F) (`kevclark@cs.stanford.edu`).\n","# ELECTRA\n\n## 简介\n\n**ELECTRA** 是一种自监督的语言表示学习方法。它可以在相对较少的计算资源下对 Transformer 网络进行预训练。ELECTRA 模型通过区分“真实”输入标记与由另一个神经网络生成的“虚假”输入标记来进行训练，这类似于 [GAN](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1406.2661.pdf) 中的判别器。在小规模情况下，ELECTRA 即使只使用单个 GPU 训练也能取得优异的效果。而在大规模情况下，ELECTRA 在 [SQuAD 2.0](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F) 数据集上达到了最先进的水平。\n\n有关详细说明和实验结果，请参阅我们在 ICLR 2020 上发表的论文 [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https:\u002F\u002Fopenreview.net\u002Fpdf?id=r1xMH1BtvB)。\n\n本仓库包含用于预训练 ELECTRA 的代码，其中包括在单个 GPU 上运行的小型 ELECTRA 模型。它还支持在下游任务上对 ELECTRA 进行微调，这些任务包括分类任务（例如 [GLUE](https:\u002F\u002Fgluebenchmark.com\u002F)）、问答任务（例如 [SQuAD](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F)）以及序列标注任务（例如 [文本分块](https:\u002F\u002Fwww.clips.uantwerpen.be\u002Fconll2000\u002Fchunking\u002F)）。\n\n此外，本仓库还包含 **Electric** 的代码，它是受 [基于能量的模型](http:\u002F\u002Fyann.lecun.com\u002Fexdb\u002Fpublis\u002Fpdf\u002Flecun-06.pdf) 启发的 ELECTRA 版本。Electric 将 ELECTRA 更加严谨地视为一种“负采样”的 [完形填空模型](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FCloze_test)。它还可以高效地生成文本的 
[伪似然分数](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.14659.pdf)，这些分数可用于对语音识别或机器翻译系统的输出进行重新排序。有关 Electric 的详细信息，请参阅我们在 EMNLP 2020 上发表的论文 [Pre-Training Transformers as Energy-Based Cloze Models](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.emnlp-main.20.pdf)。\n\n\n\n## 已发布的模型\n\n我们最初发布了三个预训练模型：\n\n| 模型 | 层数 | 隐藏层大小 | 参数量 | GLUE 分数（测试集） | 下载链接 |\n| --- | --- | --- | --- | ---  | --- |\n| ELECTRA-Small | 12 | 256 | 14M | 77.4  | [链接](https:\u002F\u002Fstorage.googleapis.com\u002Felectra-data\u002Felectra_small.zip) |\n| ELECTRA-Base | 12 | 768 | 110M | 82.7 | [链接](https:\u002F\u002Fstorage.googleapis.com\u002Felectra-data\u002Felectra_base.zip) |\n| ELECTRA-Large | 24 | 1024 | 335M |  85.2 | [链接](https:\u002F\u002Fstorage.googleapis.com\u002Felectra-data\u002Felectra_large.zip) |\n\n这些模型是在不区分大小写的英文文本上训练的。它们分别对应于我们论文中的 ELECTRA-Small++、ELECTRA-Base++ 和 ELECTRA-1.75M。我们希望在未来发布其他模型，例如多语言模型。\n\n在 [GLUE](https:\u002F\u002Fgluebenchmark.com\u002F) 基准上，ELECTRA-Large 的得分略高于 ALBERT\u002FXLNET，ELECTRA-Base 的得分优于 BERT-Large，而 ELECTRA-Small 的得分则略低于 [TinyBERT](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.10351)（但未使用蒸馏技术）。详细的性能指标请参见下方的预期结果部分。\n\n\n\n## 要求\n* Python 3\n* [TensorFlow](https:\u002F\u002Fwww.tensorflow.org\u002F) 1.15（尽管我们希望在未来支持 TensorFlow 2.0）\n* [NumPy](https:\u002F\u002Fnumpy.org\u002F)\n* [scikit-learn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002F) 和 [SciPy](https:\u002F\u002Fwww.scipy.org\u002F)（用于计算部分评估指标）。\n\n## 预训练\n使用 `build_pretraining_dataset.py` 可以从原始文本转储中创建预训练数据集。该脚本具有以下参数：\n\n* `--corpus-dir`: 包含要转换为 ELECTRA 示例的原始文本文件的目录。一个文本文件可以包含多个文档，各文档之间用空行分隔。\n* `--vocab-file`: 定义 WordPiece 词汇表的文件。\n* `--output-dir`: 存储 ELECTRA 示例的输出目录。\n* `--max-seq-length`: 每个示例的最大标记数（默认为 128）。\n* `--num-processes`: 如果大于 1，则会并行化多个进程（默认为 1）。\n* `--blanks-separate-docs`: 是否将空行视为文档边界（默认为 True）。\n* `--do-lower-case\u002F--no-lower-case`: 是否将输入文本转换为小写（默认为 True）。\n\n使用 `run_pretraining.py` 可以对 ELECTRA 模型进行预训练。该脚本具有以下参数：\n\n* `--data-dir`: 
存储预训练数据、模型权重等的目录。默认情况下，训练会从 `\u003Cdata-dir>\u002Fpretrain_tfrecords` 加载示例，并从 `\u003Cdata-dir>\u002Fvocab.txt` 加载词汇表。\n* `--model-name`: 正在训练的模型名称。模型权重默认会保存到 `\u003Cdata-dir>\u002Fmodels\u002F\u003Cmodel-name>`。\n* `--hparams`（可选）：包含模型超参数、数据路径等的 JSON 字典或 JSON 文件路径。支持的超参数请参阅 `configure_pretraining.py`。\n\n如果训练被中断，再次运行带有相同参数的 `run_pretraining.py` 将会从中断处继续训练。\n\n您可以通过以下步骤从已发布的 ELECTRA 检查点继续预训练：\n1. 将 `model-name` 设置为指向已下载的模型（例如，如果您将权重下载到 `$DATA_DIR\u002Felectra_small`，则设置为 `--model-name electra_small`）。\n2. 设置 `num_train_steps`（例如，在 `--hparams` 中添加 `\"num_train_steps\": 4010000`）。这将使小型模型再训练 10000 步（它已经训练了 400 万步）。\n3. 提高学习率以适应线性学习率衰减。例如，若要以 2e-4 的初始学习率开始，应将 `learning_rate` 超参数设置为 2e-4 * (4e6 + 10000) \u002F 10000。\n4. 对于 ELECTRA-Small，还需要在 `hparams` 中指定 `\"generator_hidden_size\": 1.0`，因为我们并未为此模型使用小型生成器。\n\n## 快速入门：预训练一个小型 ELECTRA 模型。\n这些说明将预训练一个小型 ELECTRA 模型（12 层，隐藏层大小为 256）。遗憾的是，我们在论文中使用的数据并未公开，因此我们将改用由 Aaron Gokaslan 和 Vanya Cohen 发布的 [OpenWebTextCorpus](https:\u002F\u002Fskylion007.github.io\u002FOpenWebTextCorpus\u002F) 数据集。完全训练好的模型（在 v100 GPU 上大约需要 4 天）在 GLUE 任务上的表现大致介于 [GPT](https:\u002F\u002Fs3-us-west-2.amazonaws.com\u002Fopenai-assets\u002Fresearch-covers\u002Flanguage-unsupervised\u002Flanguage_understanding_paper.pdf) 和 BERT-Base 之间。默认情况下，该模型是在长度为 128 的序列上进行训练的，因此不适合用于问答任务。有关模型性能的更多详细信息，请参阅下方的“预期结果”部分。\n\n#### 设置\n1. 将词汇表文件放置在 `$DATA_DIR\u002Fvocab.txt` 中。我们的 ELECTRA 模型均使用与英语小写 BERT 完全相同的词汇表，您可以从[这里](https:\u002F\u002Fstorage.googleapis.com\u002Felectra-data\u002Fvocab.txt)下载。\n2. 下载 [OpenWebText](https:\u002F\u002Fskylion007.github.io\u002FOpenWebTextCorpus\u002F) 数据集（12GB），并解压它（即运行 `tar xf openwebtext.tar.xz`）。将其放置在 `$DATA_DIR\u002Fopenwebtext` 目录下。\n3. 
运行 `python3 build_openwebtext_pretraining_dataset.py --data-dir $DATA_DIR --num-processes 5`。此脚本会预处理\u002F分词数据，并将示例以 [tfrecord](https:\u002F\u002Fwww.tensorflow.org\u002Ftutorials\u002Fload_data\u002Ftfrecord) 文件的形式输出到 `$DATA_DIR\u002Fpretrain_tfrecords` 目录下。这些 tfrecord 文件大约需要 30GB 的磁盘空间。\n\n#### 预训练模型。\n运行 `python3 run_pretraining.py --data-dir $DATA_DIR --model-name electra_small_owt`，即可在该数据集上对小型 ELECTRA 模型进行 100 万步的预训练。这在 Tesla V100 GPU 上大约需要略多于 4 天的时间。不过，该模型在经过 20 万步（约 10 小时的 v100 GPU 训练）后便能达到不错的效果。\n\n若要自定义训练过程，可在运行命令中添加 `--hparams '{\"hparam1\": value1, \"hparam2\": value2, ...}'`。`--hparams` 也可以是一个包含超参数的 `.json` 文件路径。一些特别有用的选项包括：\n\n* `\"debug\": true`：训练一个微型 ELECTRA 模型，仅进行几步。\n* `\"model_size\": one of \"small\", \"base\", or \"large\"`：决定模型的大小。\n* `\"electra_objective\": false`：训练一个采用掩码语言建模而非替换标记检测的模型（本质上是带有动态掩码、无下一句预测的 BERT）。\n* `\"num_train_steps\": n`：控制预训练的持续时间。\n* `\"pretrain_tfrecords\": \u003Cpaths>`：指定预训练数据的位置。请注意，您需要指定具体的文件名，而不仅仅是目录（例如 `\u003Cdata-dir>\u002Fpretrain_tf_records\u002Fpretrain_data.tfrecord*`）。\n* `\"vocab_file\": \u003Cpath>` 和 `\"vocab_size\": n` 可用于设置自定义的 WordPiece 词汇表。\n* `\"learning_rate\": lr, \"train_batch_size\": n` 等参数可用于调整训练超参数。\n* `\"model_hparam_overrides\": {\"hidden_size\": n, \"num_hidden_layers\": m}` 等参数可用于修改底层 Transformer 的超参数（`\"model_size\"` 标志会设置默认值）。\n\n完整的支持超参数列表请参阅 `configure_pretraining.py`。\n\n#### 评估预训练模型。\n\n若要在下游任务上评估模型，请参阅下方的微调说明。若要在 OpenWebText 数据上评估生成器和判别器，则运行 `python3 run_pretraining.py --data-dir $DATA_DIR --model-name electra_small_owt --hparams '{\"do_train\": false, \"do_eval\": true}'`。此命令将打印出生成器和判别器的准确率等评估指标，并将这些指标保存到 `data-dir\u002Fmodel-name\u002Fresults` 文件中。\n\n## 微调\n\n使用 `run_finetuning.py` 脚本可以在下游 NLP 任务上对 ELECTRA 模型进行微调和评估。该脚本需要三个参数：\n\n* `--data-dir`：存储数据、模型权重等的目录。默认情况下，脚本会从 `\u003Cdata-dir>\u002Ffinetuning_data\u002F\u003Ctask-name>` 加载微调数据，并从 `\u003Cdata-dir>\u002Fvocab.txt` 加载词汇表。\n* `--model-name`：预训练模型的名称；预训练权重应位于 `data-dir\u002Fmodels\u002Fmodel-name` 目录下。\n* 
`--hparams`：包含模型超参数、数据路径等信息的 JSON 字典（例如 `--hparams '{\"task_names\": [\"rte\"], \"model_size\": \"base\", \"learning_rate\": 1e-4, ...}'`）。支持的超参数请参阅 `configure_finetuning.py`。除了字典之外，此参数也可以是一个包含超参数的 `.json` 文件路径。您必须指定 `\"task_names\"` 和 `\"model_size\"`（见下方示例）。\n\n评估指标默认会保存到 `data-dir\u002Fmodel-name\u002Fresults`，模型权重则默认保存到 `data-dir\u002Fmodel-name\u002Ffinetuning_models`。默认情况下，评估是在开发集上进行的。若要自定义训练过程，可在运行命令中添加 `--hparams '{\"hparam1\": value1, \"hparam2\": value2, ...}'`。一些特别有用的选项包括：\n\n* `\"debug\": true`：对微型 ELECTRA 模型进行几步的微调。\n* `\"task_names\": [\"task_name\"]`：指定要训练的任务。此处使用列表是因为代码库理论上支持多任务学习，但请注意，这一功能尚未经过充分测试。\n* `\"model_size\": one of \"small\", \"base\", or \"large\"`：决定模型的大小；您必须将其设置为与预训练模型相同的尺寸。\n* `\"do_train\" and \"do_eval\"`：训练和\u002F或评估模型（默认均为真）。若要使用 `\"do_eval\": true` 且 `\"do_train\": false`，则需要指定 `init_checkpoint`，例如 `python3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{\"model_size\": \"base\", \"task_names\": [\"mnli\"], \"do_train\": false, \"do_eval\": true, \"init_checkpoint\": \"\u003Cdata-dir>\u002Fmodels\u002Felectra_base\u002Ffinetuning_models\u002Fmnli_model_1\"}'`。\n* `\"num_trials\": n`：如果大于 1，则会使用不同的随机种子进行多次微调和评估。\n* `\"learning_rate\": lr, \"train_batch_size\": n` 等参数可用于调整训练超参数。\n* `\"model_hparam_overrides\": {\"hidden_size\": n, \"num_hidden_layers\": m}` 等参数可用于修改底层 Transformer 的超参数（`\"model_size\"` 标志会设置默认值）。\n\n### 设置\n获取一个预训练的 ELECTRA 模型，方法有两种：一是自行训练（参见上述预训练说明），二是下载官方发布的 ELECTRA 权重，并将其解压到 `$DATA_DIR\u002Fmodels` 目录下（例如，如果您使用大型模型，则应有一个 `$DATA_DIR\u002Fmodels\u002Felectra_large` 目录）。\n\n### 在 GLUE 任务上微调 ELECTRA\n\n通过运行[此脚本](https:\u002F\u002Fgist.github.com\u002FW4ngatang\u002F60c2bdb54d156a41194446737ce03e2e)下载 GLUE 数据。然后执行以下命令来整理数据：`mv CoLA cola && mv MNLI mnli && mv MRPC mrpc && mv QNLI qnli && mv QQP qqp && mv RTE rte && mv SST-2 sst && mv STS-B sts && mv diagnostic\u002Fdiagnostic.tsv mnli && mkdir -p 
$DATA_DIR\u002Ffinetuning_data && mv * $DATA_DIR\u002Ffinetuning_data`。\n\n接着运行 
`run_finetuning.py`. For example, to fine-tune ELECTRA-Base on MNLI:\n```\npython3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{\"model_size\": \"base\", \"task_names\": [\"mnli\"]}'\n```\n\nOr, to fine-tune the small model pre-trained with the instructions above on CoLA:\n```\npython3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_small_owt --hparams '{\"model_size\": \"small\", \"task_names\": [\"cola\"]}'\n```\n\n### Fine-tune ELECTRA on question answering\n\nThe code supports [SQuAD](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F) 1.1 and 2.0, as well as the datasets from the [2019 MRQA Shared Task](https:\u002F\u002Fgithub.com\u002Fmrqa\u002FMRQA-Shared-Task-2019).\n\n* **SQuAD 1.1**: Download the [train](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002Fdataset\u002Ftrain-v1.1.json) and [dev](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002Fdataset\u002Fdev-v1.1.json) datasets and move them to `$DATA_DIR\u002Ffinetuning_data\u002Fsquadv1\u002F(train|dev).json`.\n* **SQuAD 2.0**: Download the datasets from the [SQuAD website](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F) and move them to `$DATA_DIR\u002Ffinetuning_data\u002Fsquad\u002F(train|dev).json`.\n* **MRQA tasks**: Download the data from [here](https:\u002F\u002Fgithub.com\u002Fmrqa\u002FMRQA-Shared-Task-2019#datasets) and move it to `$DATA_DIR\u002Ffinetuning_data\u002F(newsqa|naturalqs|triviaqa|searchqa)\u002F(train|dev).jsonl`.\n\nThen run, for example:\n```\npython3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{\"model_size\": \"base\", \"task_names\": [\"squad\"]}'\n```\n\nThis repository computes metrics with the official evaluation code released by the [SQuAD](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F) authors and the [MRQA shared task](https:\u002F\u002Fgithub.com\u002Fmrqa\u002FMRQA-Shared-Task-2019).\n\n### Fine-tune ELECTRA on sequence tagging\n\nDownload the CoNLL-2000 text chunking dataset from [here](https:\u002F\u002Fwww.clips.uantwerpen.be\u002Fconll2000\u002Fchunking\u002F) and put it under `$DATA_DIR\u002Ffinetuning_data\u002Fchunk\u002F(train|dev).txt`. Then run:\n```\npython3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{\"model_size\": \"base\", \"task_names\": 
[\"chunk\"]}'\n```\n\n### Adding a new task\nThe easiest way to run on a new task is to implement a new `finetune.task.Task`, add it to `finetune.task_builder.py`, and then use `run_finetuning.py` as normal. For classification, question answering, and sequence tagging tasks, you can inherit from `finetune.classification.classification_tasks.ClassificationTask`, `finetune.qa.qa_tasks.QATask`, or `finetune.tagging.tagging_tasks.TaggingTask`, respectively.\n\nFor data preprocessing, we use the same tokenizer as [BERT](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fbert).\n\n## Expected results\nBelow are the expected results for ELECTRA on various tasks (test set for chunking, dev set for the other tasks). Note that variance in fine-tuning can be [quite large](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.06305), so for some tasks you may see large fluctuations in scores when fine-tuning from the same checkpoint multiple times. The scores below are median performance over a large number of random seeds. ELECTRA-Small\u002FBase\u002FLarge are our released models. ELECTRA-Small-OWT is the OpenWebText-trained model from above (it performs a bit worse than ELECTRA-Small because it was trained for less time and on a smaller dataset).\n\n| | CoLA | SST | MRPC | STS | QQP | MNLI | QNLI | RTE | SQuAD 1.1 | SQuAD 2.0 | Chunking |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| Metric | MCC | Acc | Acc | Spearman | Acc | Acc | Acc | Acc | EM | EM | F1 |\n| ELECTRA-Large | 69.1 | 96.9 | 90.8 | 92.6 | 92.4 | 90.9 | 95.0 | 88.0 | 89.7 | 88.1 | 97.2 |\n| ELECTRA-Base | 67.7 | 95.1 | 89.5 | 91.2 | 91.5 | 88.8 | 93.2 | 82.7 | 86.8 | 80.5 | 97.1 |\n| ELECTRA-Small | 57.0 | 91.2 | 88.0 | 87.5 | 89.0 | 81.3 | 88.4 | 66.7 | 75.8 | 70.1 | 96.5 |\n| ELECTRA-Small-OWT | 56.8 | 88.3 | 87.4 | 86.8 | 88.3 | 78.9 | 87.9 | 68.5 | -- | -- | -- |\n\nSee [here](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Felectra\u002Fissues\u002F3) for losses and training curves of the models during pre-training.\n\n## Electric\n\nTo train [Electric](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.emnlp-main.20.pdf), use the same pre-training script and command as for ELECTRA, and pass `\"electra_objective\": false` and `\"electric_objective\": true` in the hyperparameters. We plan to release pre-trained Electric models soon!\n\n## Citation\nIf you use this code for your publication, please cite the original paper:\n```\n@inproceedings{clark2020electra,\n  title = {{ELECTRA}: Pre-training Text Encoders as Discriminators Rather Than Generators},\n  author = {Kevin Clark and Minh-Thang Luong and Quoc V. Le and Christopher D. 
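Manning},\n  booktitle = {ICLR},\n  year = {2020},\n  url = {https:\u002F\u002Fopenreview.net\u002Fpdf?id=r1xMH1BtvB}\n}\n```\n\nIf you use this code to train Electric, please cite the Electric paper:\n```\n@inproceedings{clark2020electric,\n  title = {Pre-Training Transformers as Energy-Based Cloze Models},\n  author = {Kevin Clark and Minh-Thang Luong and Quoc V. Le and Christopher D. Manning},\n  booktitle = {EMNLP},\n  year = {2020},\n  url = {https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.emnlp-main.20.pdf}\n}\n```\n\n## Contact Info\nFor help or issues using ELECTRA, please submit a GitHub issue.\n\nFor personal communication related to ELECTRA, please contact [Kevin Clark](https:\u002F\u002Fcs.stanford.edu\u002F~kevclark\u002F) (`kevclark@cs.stanford.edu`).","# ELECTRA Quick Start Guide\n\nELECTRA is an efficient method for self-supervised language representation learning. It trains a model to distinguish real input tokens from fake tokens generated by another neural network (similar to the discriminator in a GAN), which yields strong pre-training results with relatively little compute. This guide helps you quickly set up the environment and run a small ELECTRA model.\n\n## Prerequisites\n\nBefore you begin, make sure your system meets the following requirements:\n\n*   **Operating system**: Linux or macOS (on Windows, use WSL)\n*   **Python**: 3.x (TensorFlow 1.15 does not support Python 3.8 or later, so use 3.7 or earlier)\n*   **Deep learning framework**: TensorFlow 1.15 (the official code is written for TF 1.x and does not directly support TF 2.x)\n*   **Core dependencies**:\n    *   NumPy\n    *   scikit-learn\n    *   SciPy\n\n**Faster downloads in mainland China**:\nTensorFlow 1.15 is old and downloads from the official index can be slow, so consider installing the dependencies from the Tsinghua or Alibaba PyPI mirror:\n```bash\npip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple tensorflow==1.15.0 numpy scikit-learn scipy\n```\n\n## Installation\n\n1.  **Clone the repository**\n    First, get the ELECTRA source code:\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Felectra.git\n    cd electra\n    ```\n\n2.  **Install the Python dependencies**\n    Make sure TensorFlow 1.15 and the other libraries listed above are installed. If they are not, run:\n    ```bash\n    pip install -r requirements.txt\n    # If there is no requirements.txt, install them manually:\n    # pip install tensorflow==1.15.0 numpy scikit-learn scipy\n    ```\n\n3.  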
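**Prepare the data and vocabulary**\n    Create a data directory, set `$DATA_DIR` to its path, and download the required files:\n    \n    *   **Download the vocabulary** (the same as uncased BERT):\n        ```bash\n        mkdir -p $DATA_DIR\n        wget -O $DATA_DIR\u002Fvocab.txt https:\u002F\u002Fstorage.googleapis.com\u002Felectra-data\u002Fvocab.txt\n        ```\n    \n    *   **Download the pre-training corpus** (OpenWebText in this example):\n        ```bash\n        cd $DATA_DIR\n        # Download and extract (about 12GB)\n        # Keep the extracted directory name: the build script reads the corpus from $DATA_DIR\u002Fopenwebtext\n        wget https:\u002F\u002Fskylion007.github.io\u002FOpenWebTextCorpus\u002Fopenwebtext.tar.xz\n        tar xf openwebtext.tar.xz\n        # Return to the electra repository for the next step\n        cd -\n        ```\n\n4.  **Build the pre-training dataset**\n    Run the script that converts the raw text into the TFRecord format ELECTRA expects:\n    ```bash\n    python3 build_openwebtext_pretraining_dataset.py --data-dir $DATA_DIR --num-processes 5\n    ```\n    *Note: the generated tfrecord files are stored in `$DATA_DIR\u002Fpretrain_tfrecords` and take up about 30GB of disk space.*\n\n## Basic usage\n\nThe following examples show how to pre-train a small ELECTRA model (12 layers, hidden size 256) from scratch.\n\n### 1. Start pre-training\nRun the following command on a single GPU (e.g., a Tesla V100). A full run takes about 4 days, but 200k steps (about 10 hours) already give decent results.\n\n```bash\npython3 run_pretraining.py --data-dir $DATA_DIR --model-name electra_small_owt\n```\n\n**Customizing training**:\nUse the `--hparams` argument to change the model size or the number of steps, or to enable debug mode. For example, a debug run of only a few steps:\n```bash\npython3 run_pretraining.py --data-dir $DATA_DIR --model-name electra_debug --hparams '{\"debug\": true}'\n```\n\nCommonly used hyperparameters:\n*   `\"model_size\"`: one of \"small\", \"base\", or \"large\".\n*   `\"num_train_steps\"`: the total number of pre-training steps.\n*   `\"learning_rate\"`: the learning rate.\n\n### 2. Evaluate the pre-trained model\nAfter training, you can evaluate the generator and discriminator on the OpenWebText data:\n\n```bash\npython3 run_pretraining.py --data-dir $DATA_DIR --model-name electra_small_owt --hparams '{\"do_train\": false, \"do_eval\": true}'\n```\nThe evaluation results are saved in `$DATA_DIR\u002Felectra_small_owt\u002Fresults`.\n\n### 3. 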
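Fine-tune on a downstream task\nFine-tune the pre-trained model on a downstream task (e.g., the RTE task from the GLUE benchmark). This assumes the RTE data is in `$DATA_DIR\u002Ffinetuning_data\u002Frte`, where the script looks for fine-tuning data by default:\n\n```bash\npython3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_small_owt --hparams '{\"task_names\": [\"rte\"], \"model_size\": \"small\", \"learning_rate\": 1e-4}'\n```\n\n*   `task_names`: the list of task names to fine-tune on.\n*   `model_size`: must match the size of the pre-trained model.\n*   By default, the fine-tuned weights are saved in `$DATA_DIR\u002Felectra_small_owt\u002Ffinetuning_models`.","A startup's algorithm team needs to quickly build a high-accuracy automatic classifier for Chinese customer-service tickets using only a single GPU.\n\n### Without ELECTRA\n- **High training cost**: Traditional pre-trained models (such as BERT-Large) have so many parameters that the team's single GPU cannot handle them, so the team must rent an expensive multi-GPU cluster and strain its budget.\n- **Long development cycle**: With limited compute, pre-training and fine-tuning are extremely slow. Going from data preparation to deployment often takes weeks, which is too slow for urgent business needs.\n- **Poor results with little labeled data**: With few labeled tickets, directly reusing a large general-purpose model tends to overfit, classification accuracy plateaus, and many complex tickets still need manual handling.\n- **Hard to deploy**: The large model size causes high inference latency, which makes it difficult to integrate into an online customer-service system with strict real-time requirements.\n\n### With ELECTRA\n- **Efficient use of resources**: ELECTRA's discriminative pre-training performs well with very little compute. The team trained a high-quality model on a single GPU, which greatly lowered the hardware barrier.\n- **Much faster iteration**: Thanks to the efficient training strategy, the model converges much faster, and the validation cycle for new features shrank from weeks to days, in time for the business launch.\n- **Strong performance on small data**: Even with limited labeled data, the ELECTRA-Small model generalized better than distilled models of similar size, and automatic ticket classification accuracy improved by 15%.\n- **Lightweight and easy to deploy**: The resulting model has few parameters and fast inference. It fits easily into the existing service architecture and routes tickets automatically with millisecond-level latency.\n\nWith its discriminator-based pre-training paradigm, ELECTRA lets small and medium-sized teams get state-of-the-art natural language understanding at a very low compute cost.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle-research_electra_c9e601fc.png","google-research","Google Research","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fgoogle-research_c23b2adf.png","",null,"https:\u002F\u002Fresearch.google","https:\u002F\u002Fgithub.com\u002Fgoogle-research",[80],{"name":81,"color":82,"percentage":83},"Python","#3572A5",100,2373,349,"2026-04-07T09:17:15","Apache-2.0",4,"Not specified","Pre-training requires an NVIDIA GPU (the examples mention a Tesla V100), and a single GPU is enough for the small model; VRAM and CUDA version requirements are not stated",{"notes":92,"python":93,"dependencies":94},"This tool is built on TensorFlow 1.15 (TF 2.0 is not supported yet). Pre-training the small model takes about 4 days on a Tesla V100, but it reaches decent results after 200k steps (about 10 hours). The processed pre-training data (tfrecords) takes about 30GB 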
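of disk space. Training can be continued from the released checkpoints.","3",[95,96,97,98],"TensorFlow==1.15","NumPy","scikit-learn","SciPy",[35,14],[101,102,103],"nlp","deep-learning","tensorflow","2026-03-27T02:49:30.150509","2026-04-11T03:24:25.824119",[107,112,117,122,127,131],{"id":108,"question_zh":109,"answer_zh":110,"source_url":111},28634,"How do I fix the 'KeyError: electra' error when loading an ELECTRA model with Hugging Face Transformers?","This error usually appears with older versions of the Transformers library (such as 2.7.0), which did not yet support the ELECTRA architecture. The maintainers confirmed that ELECTRA is available in PyTorch through the Hugging Face Transformers library from v2.8.0 onward. Upgrade transformers to the latest version: `pip install --upgrade transformers`. After upgrading, you can load the tokenizer with `AutoTokenizer.from_pretrained(\"google\u002Felectra-small-generator\")` and the model with `AutoModel.from_pretrained(\"google\u002Felectra-small-generator\")`. Note that fine-tuning usually needs only the discriminator model; the generator is mainly used during pre-training.","https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Felectra\u002Fissues\u002F32",{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},28635,"Are there plans to support loading or using ELECTRA models in PyTorch?","Yes. ELECTRA is now available in PyTorch through the Hugging Face Transformers library, starting with Transformers v2.8.0. The maintainers also mentioned that support for ELECTRA's pre-training method was being implemented and would be added in a follow-up pull request. If you want to fine-tune or run inference with PyTorch, simply install the latest transformers release.","https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Felectra\u002Fissues\u002F19",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},28636,"How do I fix the 'adam_m not found in checkpoint' error during further pre-training?","This is a known issue that typically occurs when continuing pre-training from the official pre-trained checkpoints, because optimizer state variables (such as adam_m) are mismatched or missing. Although many community members have hit this problem, the official repository does not yet provide a fix. Common workarounds are to modify the loading logic so that missing optimizer variables are skipped, or to start training without loading the optimizer state. Because of the complexity of TensorFlow 1.x, many users eventually give up on further pre-training with this codebase and switch to the Hugging Face implementation, which handles checkpoint loading more gracefully.","https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Felectra\u002Fissues\u002F45",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},28637,"How do I fix a 'NaN loss' that makes training diverge when training an ELECTRA Base model?","When training larger models (such as Base or Large), the loss can become NaN. Some users tried lowering the learning rate, but this did not reliably fix the problem. The cause may be related to the hardware, the TensorFlow version, or the hyperparameter settings. If you hit this problem, first try a substantially lower initial learning rate and check that data preprocessing is correct. If the problem persists, note that the official codebase offers limited support for such issues; many users switch to the more stable Hugging Face 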
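implementation or adjust the batch size.","https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Felectra\u002Fissues\u002F36",{"id":128,"question_zh":129,"answer_zh":130,"source_url":111},28638,"What is the difference between the ELECTRA 'generator' and 'discriminator' models on Hugging Face, and which one should I use?","The ELECTRA architecture has two main components: a generator and a discriminator. The generator is a smaller model that replaces input tokens to create training examples; the discriminator is the main model and predicts whether each token has been replaced. For fine-tuning on most downstream tasks (such as classification or question answering), you only need the **discriminator**. The generator is mainly used during pre-training. So if you are looking for a model on Hugging Face for a specific task, choose one whose name contains 'discriminator' (e.g., `google\u002Felectra-base-discriminator`).",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},28639,"Where can I find the final loss values or training curves of the ELECTRA Base and Large models for reference?","Many users training ELECTRA models for languages other than English want the loss curves or final loss values of the official Base and Large models as a reference. The project README links to this issue for the losses and training curves of the models during pre-training, so start with the discussion there; beyond that, the repository documentation does not list final loss values. Otherwise, rely on the descriptions in the paper or monitor your own training run. If TensorBoard shows only a single point, make sure the training parameters set an evaluation frequency that logs metrics at multiple steps.","https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Felectra\u002Fissues\u002F3",[]]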