[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-zihangdai--xlnet":3,"tool-zihangdai--xlnet":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",159636,2,"2026-04-17T23:33:34",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":75,"owner_email":77,"owner_twitter":75,"owner_website":75,"owner_url":78,"languages":79,"stars":92,"forks":93,"last_commit_at":94,"license":95,"difficulty_score":96,"env_os":97,"env_gpu":98,"env_ram":99,"env_deps":100,"category_tags":106,"github_topics":107,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":111,"updated_at":112,"faqs":113,"releases":151},8912,"zihangdai\u002Fxlnet","xlnet","XLNet: Generalized Autoregressive Pretraining for Language Understanding","XLNet 是一种先进的无监督语言表示学习方法，旨在提升机器对自然语言的理解能力。它主要解决了传统预训练模型（如 BERT）在建模时因人为掩盖部分文本而导致的训练与预测不一致问题，同时克服了处理长篇文章时上下文记忆不足的局限。\n\nXLNet 的核心亮点在于创新性地采用了“广义自回归预训练”目标，通过随机排列输入顺序来更全面地捕捉文字间的依赖关系；此外，它还集成了 Transformer-XL 架构，显著增强了对长文本语境的处理能力。在阅读理解、情感分析、自然语言推理及文档排序等 20 多项主流任务中，XLNet 的表现均超越了同期的 BERT 模型，取得了业界领先的成果。\n\n这款工具非常适合人工智能研究人员、算法工程师及 NLP 开发者使用。如果您正在构建需要深度理解语义的问答系统、分类器或搜索排序应用，XLNet 提供的预训练模型（包括 Base 和 Large 
版本）能帮助您快速搭建高性能基线。虽然普通用户无法直接操作代码，但许多基于 XLNet 优化的后端服务已在无形中提升了大家日常使用的智能助手和搜索体验。","## Introduction\n\n**XLNet** is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective. Additionally, XLNet employs [Transformer-XL](https:\u002F\u002Farxiv.org\u002Fabs\u002F1901.02860) as the backbone model, exhibiting excellent performance for language tasks involving long context. Overall, XLNet achieves state-of-the-art (SOTA) results on various downstream language tasks including question answering, natural language inference, sentiment analysis, and document ranking.\n\nFor a detailed description of technical details and experimental results, please refer to our paper:\n\n​        [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.08237)\n\n​        Zhilin Yang\\*, Zihang Dai\\*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le \n\n​        (*: equal contribution) \n\n​        Preprint 2019\n\n\n\n\n## Release Notes\n\n* July 16, 2019: XLNet-Base.\n* June 19, 2019: initial release with XLNet-Large and code.\n\n## Results\n\nAs of June 19, 2019, XLNet outperforms BERT on 20 tasks and achieves state-of-the-art results on 18 tasks. Below are some comparison between XLNet-Large and BERT-Large, which have similar model sizes:\n\n### Results on Reading Comprehension\n\nModel | [RACE accuracy](http:\u002F\u002Fwww.qizhexie.com\u002Fdata\u002FRACE_leaderboard.html) | SQuAD1.1 EM | SQuAD2.0 EM\n--- | --- | --- | ---\nBERT-Large | 72.0 | 84.1 | 78.98\nXLNet-Base | | | 80.18\nXLNet-Large | **81.75** | **88.95** | **86.12**\n\nWe use SQuAD dev results in the table to exclude other factors such as using additional training data or other data augmentation techniques. 
See [SQuAD leaderboard](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F) for test numbers.\n\n### Results on Text Classification\n\nModel | IMDB | Yelp-2 | Yelp-5 | DBpedia | Amazon-2 | Amazon-5\n--- | --- | --- | --- | --- | --- | ---\nBERT-Large | 4.51 | 1.89 | 29.32 | 0.64 | 2.63 | 34.17\nXLNet-Large | **3.79** | **1.55** | **27.80** | **0.62** | **2.40** | **32.26**\n\nThe above numbers are error rates.\n\n### Results on GLUE\n\nModel | MNLI | QNLI | QQP | RTE | SST-2 | MRPC | CoLA | STS-B\n--- | --- | --- | --- | --- | --- | --- | --- | ---\nBERT-Large | 86.6 | 92.3 | 91.3 | 70.4 | 93.2 | 88.0 | 60.6 | 90.0\nXLNet-Base | 86.8 | 91.7 | 91.4 | 74.0 | 94.7 | 88.2 | 60.2 | 89.5\nXLNet-Large | **89.8** | **93.9** | **91.8** | **83.8** | **95.6** | **89.2** | **63.6** | **91.8**\n\nWe use single-task dev results in the table to exclude other factors such as multi-task learning or using ensembles.\n\n## Pre-trained models\n\n### Released Models\n\nAs of \u003Cu>July 16, 2019\u003C\u002Fu>, the following models have been made available:\n* **[`XLNet-Large, Cased`](https:\u002F\u002Fstorage.googleapis.com\u002Fxlnet\u002Freleased_models\u002Fcased_L-24_H-1024_A-16.zip)**: 24-layer, 1024-hidden, 16-heads\n* **[`XLNet-Base, Cased`](https:\u002F\u002Fstorage.googleapis.com\u002Fxlnet\u002Freleased_models\u002Fcased_L-12_H-768_A-12.zip)**: 12-layer, 768-hidden, 12-heads. 
This model is trained on full data (different from the one in the paper).\n\nWe only release cased models for now because on the tasks we consider, we found: (1) for the base setting, cased and uncased models have similar performance; (2) for the large setting, cased models are a bit better in some tasks.\n\nEach .zip file contains three items:\n*   A TensorFlow checkpoint (`xlnet_model.ckpt`) containing the pre-trained weights (which is actually 3 files).\n*   A [Sentence Piece](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fsentencepiece) model (`spiece.model`) used for (de)tokenization.\n*   A config file (`xlnet_config.json`) which specifies the hyperparameters of the model.\n\n\n### Future Release Plan\n\nWe also plan to continuously release more pretrained models under different settings, including:\n* A pretrained model that is **finetuned on Wikipedia**. This can be used for tasks with Wikipedia text such as SQuAD and HotpotQA.\n* Pretrained models with other hyperparameter configurations, targeting specific downstream tasks.\n* Pretrained models that benefit from new techniques.\n\n### Subscribing to XLNet on Google Groups\n\nTo receive notifications about updates, announcements and new releases, we recommend subscribing to the XLNet on [Google Groups](https:\u002F\u002Fgroups.google.com\u002Fforum\u002F#!forum\u002Fxlnet).\n\n\n\n## Fine-tuning with XLNet\n\nAs of \u003Cu>June 19, 2019\u003C\u002Fu>, this code base has been tested with TensorFlow 1.13.1 under Python2.\n\n### Memory Issue during Finetuning\n\n- Most of the SOTA results in our paper were produced on TPUs, which generally have more RAM than common GPUs. As a result, it is currently very difficult (costly) to re-produce most of the `XLNet-Large` SOTA results in the paper using GPUs with 12GB - 16GB of RAM, because a 16GB GPU is only able to hold a \u003Cu>single sequence with length 512\u003C\u002Fu> for `XLNet-Large`. 
Therefore, a large number (ranging from 32 to 128, equal to `batch_size`) of GPUs are required to reproduce many results in the paper.\n- We are experimenting with gradient accumulation to potentially relieve the memory burden, which could be included in a near-future update.\n- **Alternative methods** of finetuning XLNet on **constrained hardware** have been presented in [renatoviolin's repo](https:\u002F\u002Fgithub.com\u002Frenatoviolin\u002Fxlnet), which obtained 86.24 F1 on SQuAD2.0 with a 8GB memory GPU.\n\nGiven the memory issue mentioned above, using the default finetuning scripts (`run_classifier.py` and `run_squad.py`), we benchmarked the maximum batch size on a single **16GB** GPU with TensorFlow **1.13.1**:\n\n| System        | Seq Length | Max Batch Size |\n| ------------- | ---------- | -------------- |\n| `XLNet-Base`  | 64         | 120            |\n| ...           | 128        | 56             |\n| ...           | 256        | 24             |\n| ...           | 512        | 8              |\n| `XLNet-Large` | 64         | 16             |\n| ...           | 128        | 8              |\n| ...           | 256        | 2              |\n| ...           | 512        | 1              |\n\nIn most cases, it is possible to reduce the batch size `train_batch_size` or the maximum sequence length `max_seq_length` to fit in given hardware. The decrease in performance depends on the task and the available resources.\n\n\n### Text Classification\u002FRegression\n\nThe code used to perform classification\u002Fregression finetuning is in `run_classifier.py`. It also contains examples for standard one-document classification, one-document regression, and document pair classification. 
Here, we provide two concrete examples of how `run_classifier.py` can be used.\n\nFrom here on, we assume XLNet-Large and XLNet-Base have been downloaded to `$LARGE_DIR` and `$BASE_DIR` respectively.\n\n\n#### (1) STS-B: sentence pair relevance regression (with GPUs)\n\n- Download the [GLUE data](https:\u002F\u002Fgluebenchmark.com\u002Ftasks) by running [this script](https:\u002F\u002Fgist.github.com\u002FW4ngatang\u002F60c2bdb54d156a41194446737ce03e2e) and unpack it to some directory `$GLUE_DIR`.\n\n- Perform **multi-GPU** (4 V100 GPUs) finetuning with XLNet-Large by running\n\n  ```shell\n  CUDA_VISIBLE_DEVICES=0,1,2,3 python run_classifier.py \\\n    --do_train=True \\\n    --do_eval=False \\\n    --task_name=sts-b \\\n    --data_dir=${GLUE_DIR}\u002FSTS-B \\\n    --output_dir=proc_data\u002Fsts-b \\\n    --model_dir=exp\u002Fsts-b \\\n    --uncased=False \\\n    --spiece_model_file=${LARGE_DIR}\u002Fspiece.model \\\n    --model_config_path=${LARGE_DIR}\u002Fxlnet_config.json \\\n    --init_checkpoint=${LARGE_DIR}\u002Fxlnet_model.ckpt \\\n    --max_seq_length=128 \\\n    --train_batch_size=8 \\\n    --num_hosts=1 \\\n    --num_core_per_host=4 \\\n    --learning_rate=5e-5 \\\n    --train_steps=1200 \\\n    --warmup_steps=120 \\\n    --save_steps=600 \\\n    --is_regression=True\n  ```\n\n- Evaluate the finetuning results with a single GPU by\n\n  ```shell\n  CUDA_VISIBLE_DEVICES=0 python run_classifier.py \\\n    --do_train=False \\\n    --do_eval=True \\\n    --task_name=sts-b \\\n    --data_dir=${GLUE_DIR}\u002FSTS-B \\\n    --output_dir=proc_data\u002Fsts-b \\\n    --model_dir=exp\u002Fsts-b \\\n    --uncased=False \\\n    --spiece_model_file=${LARGE_DIR}\u002Fspiece.model \\\n    --model_config_path=${LARGE_DIR}\u002Fxlnet_config.json \\\n    --max_seq_length=128 \\\n    --eval_batch_size=8 \\\n    --num_hosts=1 \\\n    --num_core_per_host=1 \\\n    --eval_all_ckpt=True \\\n    --is_regression=True\n\n  # Expected performance: \"eval_pearsonr 0.916+ \"\n  
```\n\n**Notes**:\n\n- In the context of GPU training, `num_core_per_host` denotes the number of GPUs to use.\n- In the multi-GPU setting, `train_batch_size` refers to the \u003Cu>per-GPU batch size\u003C\u002Fu>.\n- `eval_all_ckpt` allows one to evaluate all saved checkpoints (save frequency is controlled by `save_steps`) after training finishes and choose the best model based on dev performance.\n- `data_dir` and `output_dir` refer to the directories of the \"raw data\" and \"preprocessed tfrecords\" respectively, while `model_dir` is the working directory for saving checkpoints and TensorFlow events. **`model_dir` should be set to a separate folder from `init_checkpoint`.**\n- To try out \u003Cu>XLNet-base\u003C\u002Fu>, one can simply set `--train_batch_size=32` and `--num_core_per_host=1`, along with corresponding changes to `init_checkpoint` and `model_config_path`.\n- For GPUs with smaller RAM, please proportionally decrease the `train_batch_size` and increase `num_core_per_host` to use the same training setting.\n- **Important**: we separate the training and evaluation into \"two phases\", as using multiple GPUs to perform evaluation is tricky (one has to correctly separate the data across GPUs). 
To ensure correctness, we only support single-GPU evaluation for now.\n\n\n#### (2) IMDB: movie review sentiment classification (with TPU V3-8)\n\n- Download and unpack the IMDB dataset by running\n\n  ```shell\n  wget http:\u002F\u002Fai.stanford.edu\u002F~amaas\u002Fdata\u002Fsentiment\u002FaclImdb_v1.tar.gz\n  tar zxvf aclImdb_v1.tar.gz\n  ```\n\n- Launch a Google Cloud TPU V3-8 instance (see the [Google Cloud TPU tutorial](https:\u002F\u002Fcloud.google.com\u002Ftpu\u002Fdocs\u002Ftutorials\u002Fmnist) for how to set up Cloud TPUs).\n\n- Set up your Google storage bucket path `$GS_ROOT` and move the IMDB dataset and pretrained checkpoint into your Google storage.\n\n- Perform TPU finetuning with XLNet-Large by running\n\n  ```shell\n  python run_classifier.py \\\n    --use_tpu=True \\\n    --tpu=${TPU_NAME} \\\n    --do_train=True \\\n    --do_eval=True \\\n    --eval_all_ckpt=True \\\n    --task_name=imdb \\\n    --data_dir=${IMDB_DIR} \\\n    --output_dir=${GS_ROOT}\u002Fproc_data\u002Fimdb \\\n    --model_dir=${GS_ROOT}\u002Fexp\u002Fimdb \\\n    --uncased=False \\\n    --spiece_model_file=${LARGE_DIR}\u002Fspiece.model \\\n    --model_config_path=${GS_ROOT}\u002F${LARGE_DIR}\u002Fxlnet_config.json \\\n    --init_checkpoint=${GS_ROOT}\u002F${LARGE_DIR}\u002Fxlnet_model.ckpt \\\n    --max_seq_length=512 \\\n    --train_batch_size=32 \\\n    --eval_batch_size=8 \\\n    --num_hosts=1 \\\n    --num_core_per_host=8 \\\n    --learning_rate=2e-5 \\\n    --train_steps=4000 \\\n    --warmup_steps=500 \\\n    --save_steps=500 \\\n    --iterations=500\n\n  # Expected performance: \"eval_accuracy 0.962+ \"\n  ```\n\n**Notes**:\n\n- To obtain the SOTA on the IMDB dataset, using sequence length 512 is **necessary**. Therefore, we show how this can be done with a TPU V3-8.\n- Alternatively, one can use a sequence length smaller than 512, a smaller batch size, or switch to XLNet-base to train on GPUs. 
But a performance drop is expected.\n- Notice that the `data_dir` and `spiece_model_file` both use a local path rather than a Google Storage path. The reason is that data preprocessing is actually performed locally. Hence, using local paths leads to a faster preprocessing speed.\n\n### SQuAD2.0\n\nThe code for the SQuAD dataset is included in `run_squad.py`.\n\nTo run the code:\n\n(1) Download the SQuAD2.0 dataset into `$SQUAD_DIR` by:\n\n```shell\nmkdir -p ${SQUAD_DIR} && cd ${SQUAD_DIR}\nwget https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002Fdataset\u002Ftrain-v2.0.json\nwget https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002Fdataset\u002Fdev-v2.0.json\n```\n\n(2) Perform data preprocessing using the script `scripts\u002Fprepro_squad.sh`.\n\n- This takes quite some time because character positions (raw data) must be accurately mapped to sentence piece positions (used for training).\n\n- For faster parallel preprocessing, please refer to the flags `--num_proc` and `--proc_id` in `run_squad.py`.\n\n(3) Perform training and evaluation.\n\nFor the best performance, XLNet-Large uses \u003Cu>sequence length 512\u003C\u002Fu> and \u003Cu>batch size 48\u003C\u002Fu> for training.\n\n- As a result, reproducing the best result with GPUs is quite difficult.\n\n- For training with one TPU v3-8, one can simply run the script `scripts\u002Ftpu_squad_large.sh` after both the TPU and Google storage have been set up.\n- `run_squad.py` will automatically perform threshold searching on the SQuAD dev set and output the score. With `scripts\u002Ftpu_squad_large.sh`, the expected F1 score should be around 88.6 (median of our multiple runs).\n\nAlternatively, one can use XLNet-Base with GPUs (e.g. three V100). 
One set of reasonable hyper-parameters can be found in the script `scripts\u002Fgpu_squad_base.sh`.\n\n\n### RACE reading comprehension\n\nThe code for the reading comprehension task [RACE](https:\u002F\u002Fwww.cs.cmu.edu\u002F~glai1\u002Fdata\u002Frace\u002F) is included in `run_race.py`.\n\n- Notably, the average length of the passages in RACE is over 300 tokens (not pieces), which is \u003Cu>significantly longer\u003C\u002Fu> than other popular reading comprehension datasets such as SQuAD.\n- Also, many questions are very difficult and require complex reasoning for machines to solve (see [one example here](misc\u002Frace_example.md)).\n\n\nTo run the code:\n\n(1) Download the RACE dataset from the [official website](https:\u002F\u002Fwww.cs.cmu.edu\u002F~glai1\u002Fdata\u002Frace\u002F) and unpack the raw data to `$RACE_DIR`.\n\n(2) Perform training and evaluation:\n\n- The SOTA performance (accuracy 81.75) of RACE is produced using XLNet-Large with sequence length 512 and batch size 32, which requires a large TPU v3-32 in the pod setting. Please refer to the script `scripts\u002Ftpu_race_large_bsz32.sh` for this setting.\n- Using XLNet-Large with sequence length 512 and batch size 8 on a TPU v3-8 can give you an accuracy of around 80.3 (see `scripts\u002Ftpu_race_large_bsz8.sh`).\n\n### Using Google Colab\n\n[An example](notebooks\u002Fcolab_imdb_gpu.ipynb) of using Google Colab with GPUs has been provided. Note that since the hardware is constrained in the example, the results are worse than the best we can get. It mainly serves as an example and should be modified accordingly to maximize performance.\n\n\n## Custom Usage of XLNet\n\n### XLNet Abstraction\n\nFor finetuning, it is likely that you will be able to modify existing files such as `run_classifier.py`, `run_squad.py` and `run_race.py` for your task at hand. However, we also provide an abstraction of XLNet to enable more flexible usage. 
Below is an example:\n\n```python\nimport xlnet\n\n# some code omitted here...\n# initialize FLAGS\n# initialize instances of tf.Tensor, including input_ids, seg_ids, and input_mask\n\n# XLNetConfig contains hyperparameters that are specific to a model checkpoint.\nxlnet_config = xlnet.XLNetConfig(json_path=FLAGS.model_config_path)\n\n# RunConfig contains hyperparameters that could be different between pretraining and finetuning.\nrun_config = xlnet.create_run_config(is_training=True, is_finetune=True, FLAGS=FLAGS)\n\n# Construct an XLNet model\nxlnet_model = xlnet.XLNetModel(\n    xlnet_config=xlnet_config,\n    run_config=run_config,\n    input_ids=input_ids,\n    seg_ids=seg_ids,\n    input_mask=input_mask)\n\n# Get a summary of the sequence using the last hidden state\nsummary = xlnet_model.get_pooled_out(summary_type=\"last\")\n\n# Get a sequence output\nseq_out = xlnet_model.get_sequence_output()\n\n# build your applications based on `summary` or `seq_out`\n```\n\n### Tokenization\n\nBelow is an example of doing tokenization in XLNet:\n```python\nimport sentencepiece as spm\nfrom prepro_utils import preprocess_text, encode_ids\n\n# some code omitted here...\n# initialize FLAGS\n\ntext = \"An input text string.\"\n\nsp_model = spm.SentencePieceProcessor()\nsp_model.Load(FLAGS.spiece_model_file)\ntext = preprocess_text(text, lower=FLAGS.uncased)\nids = encode_ids(sp_model, text)\n```\nwhere `FLAGS.spiece_model_file` is the SentencePiece model file in the same zip as the pretrained model, `FLAGS.uncased` is a bool indicating whether to do uncasing.\n\n\n## Pretraining with XLNet\n\nRefer to `train.py` for pretraining on TPUs and `train_gpu.py` for pretraining on GPUs. 
First we need to preprocess the text data into tfrecords.\n\n```shell\npython data_utils.py \\\n\t--bsz_per_host=32 \\\n\t--num_core_per_host=16 \\\n\t--seq_len=512 \\\n\t--reuse_len=256 \\\n\t--input_glob=*.txt \\\n\t--save_dir=${SAVE_DIR} \\\n\t--num_passes=20 \\\n\t--bi_data=True \\\n\t--sp_path=spiece.model \\\n\t--mask_alpha=6 \\\n\t--mask_beta=1 \\\n\t--num_predict=85\n```\n\nwhere `input_glob` defines all input text files, `save_dir` is the output directory for tfrecords, and `sp_path` is a [Sentence Piece](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fsentencepiece) model. Here is our script to train the Sentence Piece model\n\n```bash\nspm_train \\\n\t--input=$INPUT \\\n\t--model_prefix=sp10m.cased.v3 \\\n\t--vocab_size=32000 \\\n\t--character_coverage=0.99995 \\\n\t--model_type=unigram \\\n\t--control_symbols=\u003Ccls>,\u003Csep>,\u003Cpad>,\u003Cmask>,\u003Ceod> \\\n\t--user_defined_symbols=\u003Ceop>,.,(,),\",-,–,£,€ \\\n\t--shuffle_input_sentence \\\n\t--input_sentence_size=10000000\n```\n\nSpecial symbols are used, including `control_symbols` and `user_defined_symbols`. We use `\u003Ceop>` and `\u003Ceod>` to denote End of Paragraph and End of Document respectively.\n\nThe input text files to `data_utils.py` must use the following format:\n* Each line is a sentence.\n* An empty line means End of Document.\n* (Optional) If one also wants to model paragraph structures, `\u003Ceop>` can be inserted at the end of certain lines (without any space) to indicate that the corresponding sentence ends a paragraph.\n\nFor example, the text input file could be:\n```\nThis is the first sentence.\nThis is the second sentence and also the end of the paragraph.\u003Ceop>\nAnother paragraph.\n\nAnother document starts here.\n```\n\nAfter preprocessing, we are ready to pretrain an XLNet. 
Below are the hyperparameters used for pretraining XLNet-Large:\n\n```shell\npython train.py \\\n  --record_info_dir=$DATA\u002Ftfrecords \\\n  --train_batch_size=2048 \\\n  --seq_len=512 \\\n  --reuse_len=256 \\\n  --mem_len=384 \\\n  --perm_size=256 \\\n  --n_layer=24 \\\n  --d_model=1024 \\\n  --d_embed=1024 \\\n  --n_head=16 \\\n  --d_head=64 \\\n  --d_inner=4096 \\\n  --untie_r=True \\\n  --mask_alpha=6 \\\n  --mask_beta=1 \\\n  --num_predict=85\n```\n\nwhere we only list the most important flags and the other flags could be adjusted based on specific use cases.\n\n","## 引言\n\n**XLNet** 是一种基于新型广义排列语言建模目标的无监督语言表示学习方法。此外，XLNet 采用 [Transformer-XL](https:\u002F\u002Farxiv.org\u002Fabs\u002F1901.02860) 作为骨干模型，在处理长上下文的语言任务中表现出色。总体而言，XLNet 在多项下游语言任务上取得了当前最优（SOTA）成绩，包括问答、自然语言推理、情感分析和文档排序等。\n\n有关技术细节和实验结果的详细说明，请参阅我们的论文：\n\n​        [XLNet：用于语言理解的广义自回归预训练](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.08237)\n\n​        Zhilin Yang\\*, Zihang Dai\\*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. 
Le \n\n​        (*: 共同第一作者) \n\n​        预印本 2019 年\n\n\n\n\n## 发布说明\n\n* 2019年7月16日：XLNet-Base。\n* 2019年6月19日：首次发布 XLNet-Large 及其代码。\n\n## 结果\n\n截至2019年6月19日，XLNet 在20项任务中超越了 BERT，并在18项任务中取得了当前最优成绩。以下是 XLNet-Large 和 BERT-Large 的一些对比，两者的模型规模相近：\n\n### 阅读理解任务结果\n\n模型 | [RACE 准确率](http:\u002F\u002Fwww.qizhexie.com\u002Fdata\u002FRACE_leaderboard.html) | SQuAD1.1 EM | SQuAD2.0 EM\n--- | --- | --- | ---\nBERT-Large | 72.0 | 84.1 | 78.98\nXLNet-Base | | | 80.18\nXLNet-Large | **81.75** | **88.95** | **86.12**\n\n表中使用的是 SQuAD 开发集的结果，以排除其他因素的影响，例如使用额外的训练数据或数据增强技术。测试数据可参见 [SQuAD 排行榜](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F)。\n\n### 文本分类任务结果\n\n模型 | IMDB | Yelp-2 | Yelp-5 | DBpedia | Amazon-2 | Amazon-5\n--- | --- | --- | --- | --- | --- | ---\nBERT-Large | 4.51 | 1.89 | 29.32 | 0.64 | 2.63 | 34.17\nXLNet-Large | **3.79** | **1.55** | **27.80** | **0.62** | **2.40** | **32.26**\n\n以上数字为错误率。\n\n### GLUE 任务结果\n\n模型 | MNLI | QNLI | QQP | RTE | SST-2 | MRPC | CoLA | STS-B\n--- | --- | --- | --- | --- | --- | --- | --- | ---\nBERT-Large | 86.6 | 92.3 | 91.3 | 70.4 | 93.2 | 88.0 | 60.6 | 90.0\nXLNet-Base | 86.8 | 91.7 | 91.4 | 74.0 | 94.7 | 88.2 | 60.2 | 89.5\nXLNet-Large | **89.8** | **93.9** | **91.8** | **83.8** | **95.6** | **89.2** | **63.6** | **91.8**\n\n表中使用的是单任务开发集的结果，以排除多任务学习或集成方法等其他因素的影响。\n\n## 预训练模型\n\n### 已发布的模型\n\n截至\u003Cu>2019年7月16日\u003C\u002Fu>,已提供以下模型：\n* **[`XLNet-Large, Cased`](https:\u002F\u002Fstorage.googleapis.com\u002Fxlnet\u002Freleased_models\u002Fcased_L-24_H-1024_A-16.zip)**：24层，1024隐藏单元，16头注意力机制\n* **[`XLNet-Base, Cased`](https:\u002F\u002Fstorage.googleapis.com\u002Fxlnet\u002Freleased_models\u002Fcased_L-12_H-768_A-12.zip)**：12层，768隐藏单元，12头注意力机制。该模型是在完整数据集上训练的（与论文中的不同）。\n\n目前我们仅发布了大小写敏感的模型，因为在我们所考虑的任务中，我们发现：(1) 对于基础设置，大小写敏感和不敏感的模型性能相近；(2) 对于大型设置，大小写敏感的模型在某些任务中表现略优。\n\n每个 .zip 文件包含三项内容：\n*   一个 TensorFlow 检查点文件 (`xlnet_model.ckpt`)，其中包含预训练权重（实际上由3个文件组成）。\n*   一个 [Sentence 
Piece](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fsentencepiece) 模型文件 (`spiece.model`)，用于分词和反向分词。\n*   一个配置文件 (`xlnet_config.json`)，用于指定模型的超参数。\n\n\n### 未来发布计划\n\n我们还计划持续发布更多不同设置下的预训练模型，包括：\n* 一个在维基百科上进行微调的预训练模型。这可用于处理包含维基百科文本的任务，如 SQuAD 和 HotpotQA。\n* 具有其他超参数配置的预训练模型，以针对特定的下游任务。\n* 利用新技术改进的预训练模型。\n\n### 订阅 Google Groups 上的 XLNet\n\n为了接收有关更新、公告和新发布的通知，我们建议您订阅 [Google Groups](https:\u002F\u002Fgroups.google.com\u002Fforum\u002F#!forum\u002Fxlnet) 上的 XLNet 讨论组。\n\n\n\n## 使用 XLNet 进行微调\n\n截至\u003Cu>2019年6月19日\u003C\u002Fu>,该代码库已在 Python2 环境下使用 TensorFlow 1.13.1 进行了测试。\n\n### 微调过程中的内存问题\n\n- 我们论文中大多数 SOTA 结果都是在 TPU 上产生的，而 TPU 的内存通常比普通 GPU 更大。因此，目前使用配备 12GB 至 16GB 内存的 GPU 来复现论文中大部分 `XLNet-Large` 的 SOTA 结果非常困难（成本高昂），因为一台 16GB 的 GPU 只能容纳长度为 \u003Cu>512 的单个序列\u003C\u002Fu>用于 `XLNet-Large`。因此，需要大量（32 至 128 张，即等于 `batch_size`）GPU 才能复现论文中的许多结果。\n- 我们正在尝试梯度累积的方法来缓解内存压力，这可能会在不久的将来的新版本中加入。\n- 在 [renatoviolin 的仓库](https:\u002F\u002Fgithub.com\u002Frenatoviolin\u002Fxlnet) 中，已经提出了在**资源受限的硬件**上微调 XLNet 的**替代方法**，该方法使用 8GB 显存的 GPU 在 SQuAD2.0 上获得了 86.24 的 F1 分数。\n\n鉴于上述内存问题，我们使用默认的微调脚本（`run_classifier.py` 和 `run_squad.py`），并在单个配备 **16GB** 显存的 GPU 上，使用 TensorFlow **1.13.1** 测试了最大批量大小：\n\n| 系统        | 序列长度 | 最大批量大小 |\n| ------------- | ---------- | -------------- |\n| `XLNet-Base`  | 64         | 120            |\n| ...           | 128        | 56             |\n| ...           | 256        | 24             |\n| ...           | 512        | 8              |\n| `XLNet-Large` | 64         | 16             |\n| ...           | 128        | 8              |\n| ...           | 256        | 2              |\n| ...           
| 512        | 1              |\n\n在大多数情况下，可以通过减小批量大小 `train_batch_size` 或最大序列长度 `max_seq_length` 来适应现有硬件。性能的下降程度取决于具体任务和可用资源。\n\n### 文本分类\u002F回归\n\n用于执行分类\u002F回归微调的代码位于 `run_classifier.py` 中。该文件还包含标准单文档分类、单文档回归以及文档对分类的示例。在此，我们提供两个具体的使用 `run_classifier.py` 的示例。\n\n从现在开始，我们假设 XLNet-Large 和 XLNet-base 已分别下载到 `$LARGE_DIR` 和 `$BASE_DIR` 目录中。\n\n\n#### (1) STS-B：句子对相关性回归（使用 GPU）\n\n- 通过运行 [此脚本](https:\u002F\u002Fgist.github.com\u002FW4ngatang\u002F60c2bdb54d156a41194446737ce03e2e) 下载 [GLUE 数据集](https:\u002F\u002Fgluebenchmark.com\u002Ftasks)，并将其解压到某个目录 `$GLUE_DIR` 中。\n\n- 使用 XLNet-Large 在 **多 GPU**（4 张 V100 GPU）上进行微调，运行以下命令：\n\n  ```shell\n  CUDA_VISIBLE_DEVICES=0,1,2,3 python run_classifier.py \\\n    --do_train=True \\\n    --do_eval=False \\\n    --task_name=sts-b \\\n    --data_dir=${GLUE_DIR}\u002FSTS-B \\\n    --output_dir=proc_data\u002Fsts-b \\\n    --model_dir=exp\u002Fsts-b \\\n    --uncased=False \\\n    --spiece_model_file=${LARGE_DIR}\u002Fspiece.model \\\n    --model_config_path=${LARGE_DIR}\u002Fxlnet_config.json \\\n    --init_checkpoint=${LARGE_DIR}\u002Fxlnet_model.ckpt \\\n    --max_seq_length=128 \\\n    --train_batch_size=8 \\\n    --num_hosts=1 \\\n    --num_core_per_host=4 \\\n    --learning_rate=5e-5 \\\n    --train_steps=1200 \\\n    --warmup_steps=120 \\\n    --save_steps=600 \\\n    --is_regression=True\n  ```\n\n- 使用单张 GPU 评估微调结果，运行以下命令：\n\n  ```shell\n  CUDA_VISIBLE_DEVICES=0 python run_classifier.py \\\n    --do_train=False \\\n    --do_eval=True \\\n    --task_name=sts-b \\\n    --data_dir=${GLUE_DIR}\u002FSTS-B \\\n    --output_dir=proc_data\u002Fsts-b \\\n    --model_dir=exp\u002Fsts-b \\\n    --uncased=False \\\n    --spiece_model_file=${LARGE_DIR}\u002Fspiece.model \\\n    --model_config_path=${LARGE_DIR}\u002Fxlnet_config.json \\\n    --max_seq_length=128 \\\n    --eval_batch_size=8 \\\n    --num_hosts=1 \\\n    --num_core_per_host=1 \\\n    --eval_all_ckpt=True \\\n    --is_regression=True\n\n  # 预期性能： \"eval_pearsonr 0.916+ \"\n  
```\n\n**注意事项**：\n\n- 在 GPU 训练的上下文中，`num_core_per_host` 表示使用的 GPU 数量。\n- 在多 GPU 设置中，`train_batch_size` 指的是 \u003Cu>每块 GPU 的批次大小\u003C\u002Fu>。\n- `eval_all_ckpt` 允许在训练结束后评估所有保存的检查点（保存频率由 `save_steps` 控制），并根据验证集表现选择最佳模型。\n- `data_dir` 和 `output_dir` 分别指“原始数据”和“预处理后的 TFRecords 文件”的目录，而 `model_dir` 是用于保存检查点和 TensorFlow 事件的工作目录。**`model_dir` 应设置为与 `init_checkpoint` 不同的独立文件夹。**\n- 如果要尝试使用 \u003Cu>XLNet-base\u003C\u002Fu>,只需将 `--train_batch_size` 设置为 32，并将 `--num_core_per_host` 设置为 1，同时相应地更改 `init_checkpoint` 和 `model_config_path`。\n- 对于显存较小的 GPU，请按比例减少 `train_batch_size` 并增加 `num_core_per_host` 来使用相同的训练设置。\n- **重要提示**：我们将训练和评估分为“两个阶段”，因为使用多 GPU 进行评估比较复杂（需要正确地在各 GPU 之间划分数据）。为确保准确性，目前我们仅支持单 GPU 评估。\n\n\n#### (2) IMDB：电影评论情感分类（使用 TPU V3-8）\n\n- 通过运行以下命令下载并解压 IMDB 数据集：\n\n  ```shell\n  wget http:\u002F\u002Fai.stanford.edu\u002F~amaas\u002Fdata\u002Fsentiment\u002FaclImdb_v1.tar.gz\n  tar zxvf aclImdb_v1.tar.gz\n  ```\n\n- 启动一个 Google Cloud TPU V3-8 实例（请参阅 [Google Cloud TPU 教程](https:\u002F\u002Fcloud.google.com\u002Ftpu\u002Fdocs\u002Ftutorials\u002Fmnist) 了解如何设置 Cloud TPU）。\n\n- 设置您的 Google 存储桶路径 `$GS_ROOT`，并将 IMDB 数据集和预训练检查点移动到您的 Google 存储中。\n\n- 使用 XLNet-Large 在 TPU 上进行微调，运行以下命令：\n\n  ```shell\n  python run_classifier.py \\\n    --use_tpu=True \\\n    --tpu=${TPU_NAME} \\\n    --do_train=True \\\n    --do_eval=True \\\n    --eval_all_ckpt=True \\\n    --task_name=imdb \\\n    --data_dir=${IMDB_DIR} \\\n    --output_dir=${GS_ROOT}\u002Fproc_data\u002Fimdb \\\n    --model_dir=${GS_ROOT}\u002Fexp\u002Fimdb \\\n    --uncased=False \\\n    --spiece_model_file=${LARGE_DIR}\u002Fspiece.model \\\n    --model_config_path=${GS_ROOT}\u002F${LARGE_DIR}\u002Fmodel_config.json \\\n    --init_checkpoint=${GS_ROOT}\u002F${LARGE_DIR}\u002Fxlnet_model.ckpt \\\n    --max_seq_length=512 \\\n    --train_batch_size=32 \\\n    --eval_batch_size=8 \\\n    --num_hosts=1 \\\n    --num_core_per_host=8 \\\n    --learning_rate=2e-5 \\\n    --train_steps=4000 \\\n    --warmup_steps=500 \\\n    
--save_steps=500 \\\n    --iterations=500\n\n  # 预期性能： \"eval_accuracy 0.962+ \"\n  ```\n\n**注意事项**：\n\n- 要在 IMDB 数据集中获得 SOTA 性能，使用序列长度 512 是 **必要的**。因此，我们展示了如何使用 TPU V3-8 实现这一点。\n- 或者，可以使用小于 512 的序列长度、更小的批次大小，或切换到 XLNet-base 在 GPU 上进行训练。但预计性能会有所下降。\n- 注意，`data_dir` 和 `spiece_model_file` 均使用本地路径而非 Google 存储路径。原因是数据预处理实际上是在本地完成的。因此，使用本地路径可以加快预处理速度。\n\n### SQuAD2.0\n\nSQuAD 数据集的相关代码包含在 `run_squad.py` 中。\n\n运行代码步骤如下：\n\n(1) 将 SQuAD2.0 数据集下载到 `$SQUAD_DIR` 目录中：\n\n```shell\nmkdir -p ${SQUAD_DIR} && cd ${SQUAD_DIR}\nwget https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002Fdataset\u002Ftrain-v2.0.json\nwget https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002Fdataset\u002Fdev-v2.0.json\n```\n\n(2) 使用脚本 `scripts\u002Fprepro_squad.sh` 进行数据预处理。\n\n- 由于需要准确地将字符位置（原始数据）映射到 SentencePiece 位置（用于训练），这一过程可能需要较长时间。\n\n- 如需更快的并行预处理，请参考 `run_squad.py` 中的 `--num_proc` 和 `--proc_id` 标志。\n\n(3) 进行训练和评估。\n\n为了获得最佳性能，XLNet-Large 在训练时应使用 \u003Cu>序列长度 512\u003C\u002Fu> 和 \u003Cu>批次大小 48\u003C\u002Fu>。\n\n- 因此，使用 GPU 复现最佳结果相当困难。\n\n- 如果使用一台 TPU v3-8 进行训练，可以在 TPU 和 Google 存储都设置好之后，直接运行脚本 `scripts\u002Ftpu_squad_large.sh`。\n- `run_squad.py` 会自动在 SQuAD 的验证集上进行阈值搜索并输出分数。使用 `scripts\u002Ftpu_squad_large.sh`，预期 F1 分数应在 88.6 左右（我们多次运行的中位数）。\n\n此外，也可以使用 XLNet-Base 在 GPU 上进行训练（例如三张 V100）。一组合理的超参数可在脚本 `scripts\u002Fgpu_squad_base.sh` 中找到。\n\n### RACE 阅读理解\n\n阅读理解任务 [RACE](https:\u002F\u002Fwww.cs.cmu.edu\u002F~glai1\u002Fdata\u002Frace\u002F) 的代码包含在 `run_race.py` 中。\n\n- 值得注意的是，RACE 数据集中文章的平均长度超过 300 个标记（token，而非 SentencePiece 片段），这比 SQuAD 等其他流行的阅读理解数据集要\u003Cu>显著更长\u003C\u002Fu>。\n- 此外，许多问题非常困难，机器需要进行复杂的推理才能解答（请参阅[此处的一个示例](misc\u002Frace_example.md)）。\n\n\n运行代码的步骤如下：\n\n(1) 从[RACE 官方网站](https:\u002F\u002Fwww.cs.cmu.edu\u002F~glai1\u002Fdata\u002Frace\u002F)下载数据集，并将原始数据解压到 `$RACE_DIR` 目录下。\n\n(2) 进行训练和评估：\n\n- RACE 的当前最优性能（准确率 81.75%）是使用 XLNet-Large 模型、序列长度为 512、批量大小为 32 得到的，这需要在 pod 配置下的 TPU v3-32 上运行。有关此设置，请参考脚本 `scripts\u002Ftpu_race_large_bsz32.sh`。\n- 在 TPU v3-8 上使用 XLNet-Large 模型、序列长度为 
512、批量大小为 8，可以获得约 80.3% 的准确率（参见 `scripts\u002Ftpu_race_large_bsz8.sh`）。\n\n\n### 使用 Google Colab\n\n我们提供了一个使用带有 GPU 的 Google Colab 的[示例](notebooks\u002Fcolab_imdb_gpu.ipynb)。请注意，由于该示例中的硬件资源有限，结果可能达不到最佳性能。它主要用作一个示例，应根据实际情况进行修改以最大化性能。\n\n\n## XLNet 的自定义使用\n\n### XLNet 抽象层\n\n对于微调任务，您很可能可以直接修改现有的文件，例如 `run_classifier.py`、`run_squad.py` 和 `run_race.py` 来适应您的具体任务。不过，我们也提供了一个 XLNet 的抽象层，以便更灵活地使用。以下是一个示例：\n\n```python\nimport xlnet\n\n# 此处省略部分代码...\n# 初始化 FLAGS\n# 初始化 tf.Tensor 实例，包括 input_ids、seg_ids 和 input_mask\n\n# XLNetConfig 包含特定于模型检查点的超参数。\nxlnet_config = xlnet.XLNetConfig(json_path=FLAGS.model_config_path)\n\n# RunConfig 包含在预训练和微调过程中可能不同的超参数。\nrun_config = xlnet.create_run_config(is_training=True, is_finetune=True, FLAGS=FLAGS)\n\n# 构建 XLNet 模型\nxlnet_model = xlnet.XLNetModel(\n    xlnet_config=xlnet_config,\n    run_config=run_config,\n    input_ids=input_ids,\n    seg_ids=seg_ids,\n    input_mask=input_mask)\n\n# 使用最后一个隐藏状态获取序列摘要\nsummary = xlnet_model.get_pooled_out(summary_type=\"last\")\n\n# 获取序列输出\nseq_out = xlnet_model.get_sequence_output()\n\n# 根据 `summary` 或 `seq_out` 构建您的应用\n```\n\n### 分词\n\n以下是 XLNet 中分词的示例：\n```python\nimport sentencepiece as spm\nfrom prepro_utils import preprocess_text, encode_ids\n\n# 此处省略部分代码...\n# 初始化 FLAGS\n\ntext = \"一个输入文本字符串。\"\n\nsp_model = spm.SentencePieceProcessor()\nsp_model.Load(FLAGS.spiece_model_file)\ntext = preprocess_text(text, lower=FLAGS.uncased)\nids = encode_ids(sp_model, text)\n```\n其中 `FLAGS.spiece_model_file` 是与预训练模型位于同一压缩包中的 SentencePiece 模型文件，`FLAGS.uncased` 是一个布尔值，指示是否进行小写转换。\n\n\n## 使用 XLNet 进行预训练\n\n关于在 TPU 上进行预训练，请参考 `train.py`；而在 GPU 上进行预训练，则可参考 `train_gpu.py`。首先，我们需要将文本数据预处理为 tfrecords 格式。\n\n```shell\npython data_utils.py \\\n\t--bsz_per_host=32 \\\n\t--num_core_per_host=16 \\\n\t--seq_len=512 \\\n\t--reuse_len=256 \\\n\t--input_glob=*.txt \\\n\t--save_dir=${SAVE_DIR} \\\n\t--num_passes=20 \\\n\t--bi_data=True \\\n\t--sp_path=spiece.model \\\n\t--mask_alpha=6 \\\n\t--mask_beta=1 
\\\n\t--num_predict=85\n```\n\n其中 `input_glob` 定义了所有输入文本文件，`save_dir` 是 tfrecords 的输出目录，而 `sp_path` 是一个 [SentencePiece](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fsentencepiece) 模型。以下是用于训练 SentencePiece 模型的脚本：\n\n```bash\nspm_train \\\n\t--input=$INPUT \\\n\t--model_prefix=sp10m.cased.v3 \\\n\t--vocab_size=32000 \\\n\t--character_coverage=0.99995 \\\n\t--model_type=unigram \\\n\t--control_symbols=\u003Ccls>,\u003Csep>,\u003Cpad>,\u003Cmask>,\u003Ceod> \\\n\t--user_defined_symbols=\u003Ceop>,.,(,),\",-,–,£,€ \\\n\t--shuffle_input_sentence \\\n\t--input_sentence_size=10000000\n```\n\n这里使用了特殊符号，包括 `control_symbols` 和 `user_defined_symbols`。我们用 `\u003Ceop>` 和 `\u003Ceod>` 分别表示段落结束和文档结束。\n\n传递给 `data_utils.py` 的输入文本文件必须采用以下格式：\n* 每一行代表一个句子。\n* 空行表示文档结束。\n* （可选）如果希望对段落结构进行建模，可以在某些行末尾插入 `\u003Ceop>`（不加空格），以表明该句属于某个段落的结尾。\n\n例如，输入文本文件可以是：\n```\n这是第一句话。\n这是第二句话，也是段落的结尾。\u003Ceop>\n另一个段落。\n\n这里开始另一篇文档。\n```\n\n预处理完成后，我们就可以开始预训练 XLNet 了。以下是 XLNet-Large 预训练所使用的超参数：\n\n```shell\npython train.py \\\n  --record_info_dir=$DATA\u002Ftfrecords \\\n  --train_batch_size=2048 \\\n  --seq_len=512 \\\n  --reuse_len=256 \\\n  --mem_len=384 \\\n  --perm_size=256 \\\n  --n_layer=24 \\\n  --d_model=1024 \\\n  --d_embed=1024 \\\n  --n_head=16 \\\n  --d_head=64 \\\n  --d_inner=4096 \\\n  --untie_r=True \\\n  --mask_alpha=6 \\\n  --mask_beta=1 \\\n  --num_predict=85\n```\n\n这里仅列出了最重要的几个标志，其他标志可以根据具体应用场景进行调整。","# XLNet 快速上手指南\n\nXLNet 是一种基于广义排列语言建模目标的无监督语言表示学习方法，采用 Transformer-XL 作为骨干模型，在长上下文任务中表现优异。本指南将帮助开发者快速完成环境配置、模型下载及微调运行。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: Linux (推荐) 或 macOS\n- **Python 版本**: Python 2.7 (官方代码库主要基于 Py2 测试，若使用 Py3 需自行适配)\n- **深度学习框架**: TensorFlow 1.13.1\n- **硬件建议**:\n    - **XLNet-Base**: 单张 16GB GPU 可运行（序列长度需调整）。\n    - **XLNet-Large**: 显存需求极高。单张 16GB GPU 仅能支持序列长度 512 的 batch_size=1。复现论文 SOTA 结果通常需要使用 TPU 或多卡 GPU 集群。\n\n### 前置依赖\n安装必要的 Python 库：\n```bash\npip install tensorflow==1.13.1\npip install sentencepiece\npip install numpy\n```\n\n## 安装步骤\n\n### 1. 
获取源代码\n克隆官方仓库：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fzihangdai\u002Fxlnet.git\ncd xlnet\n```\n\n### 2. 下载预训练模型\n目前官方主要提供 Cased 版本模型。请根据需求选择下载 `XLNet-Large` 或 `XLNet-Base`。\n\n**XLNet-Large (24 层)**\n```bash\nmkdir models_large\ncd models_large\nwget https:\u002F\u002Fstorage.googleapis.com\u002Fxlnet\u002Freleased_models\u002Fcased_L-24_H-1024_A-16.zip\nunzip cased_L-24_H-1024_A-16.zip\n# 解压后包含：xlnet_model.ckpt, spiece.model, xlnet_config.json\nexport LARGE_DIR=$(pwd)\ncd ..\n```\n\n**XLNet-Base (12 层)**\n```bash\nmkdir models_base\ncd models_base\nwget https:\u002F\u002Fstorage.googleapis.com\u002Fxlnet\u002Freleased_models\u002Fcased_L-12_H-768_A-12.zip\nunzip cased_L-12_H-768_A-12.zip\nexport BASE_DIR=$(pwd)\ncd ..\n```\n\n> **注意**: 国内用户若下载缓慢，可尝试使用代理加速或寻找国内镜像源存储的模型文件。\n\n## 基本使用\n\n以下以 **文本分类\u002F回归任务 (GLUE 数据集 STS-B)** 为例，演示如何使用 `run_classifier.py` 进行微调。\n\n### 1. 准备数据\n下载并解压 GLUE 数据集（以 STS-B 为例）：\n```bash\n# 假设已下载脚本 download_glue_data.py\npython download_glue_data.py --data_dir=$GLUE_DIR --tasks=STS-B\n```\n\n### 2. 单机多卡微调 (GPU)\n假设使用 4 张 V100 GPU 进行 `XLNet-Large` 的微调训练：\n\n```bash\nCUDA_VISIBLE_DEVICES=0,1,2,3 python run_classifier.py \\\n  --do_train=True \\\n  --do_eval=False \\\n  --task_name=sts-b \\\n  --data_dir=${GLUE_DIR}\u002FSTS-B \\\n  --output_dir=proc_data\u002Fsts-b \\\n  --model_dir=exp\u002Fsts-b \\\n  --uncased=False \\\n  --spiece_model_file=${LARGE_DIR}\u002Fspiece.model \\\n  --model_config_path=${LARGE_DIR}\u002Fxlnet_config.json \\\n  --init_checkpoint=${LARGE_DIR}\u002Fxlnet_model.ckpt \\\n  --max_seq_length=128 \\\n  --train_batch_size=8 \\\n  --num_hosts=1 \\\n  --num_core_per_host=4 \\\n  --learning_rate=5e-5 \\\n  --train_steps=1200 \\\n  --warmup_steps=120 \\\n  --save_steps=600 \\\n  --is_regression=True\n```\n\n**参数说明**:\n- `num_core_per_host`: GPU 数量。\n- `train_batch_size`: **每张卡**的批次大小。\n- `model_dir`: 保存检查点的目录，**必须**与 `init_checkpoint` 所在目录分开。\n\n### 3. 
单卡评估 (GPU)\n训练完成后，使用单张 GPU 评估效果：\n\n```bash\nCUDA_VISIBLE_DEVICES=0 python run_classifier.py \\\n  --do_train=False \\\n  --do_eval=True \\\n  --task_name=sts-b \\\n  --data_dir=${GLUE_DIR}\u002FSTS-B \\\n  --output_dir=proc_data\u002Fsts-b \\\n  --model_dir=exp\u002Fsts-b \\\n  --uncased=False \\\n  --spiece_model_file=${LARGE_DIR}\u002Fspiece.model \\\n  --model_config_path=${LARGE_DIR}\u002Fxlnet_config.json \\\n  --max_seq_length=128 \\\n  --eval_batch_size=8 \\\n  --num_hosts=1 \\\n  --num_core_per_host=1 \\\n  --eval_all_ckpt=True \\\n  --is_regression=True\n```\n\n### 显存优化提示\n若遇到显存不足 (OOM) 错误：\n1. 减小 `--max_seq_length` (如从 512 降至 128)。\n2. 减小 `--train_batch_size`。\n3. 对于低显存显卡，建议优先尝试 `XLNet-Base` 模型，并将 `--train_batch_size` 设为 32，`--num_core_per_host` 设为 1。","某大型电商平台的智能客服团队正在升级其自动问答系统，旨在更精准地理解用户复杂的投诉描述并直接从海量知识库中提取答案。\n\n### 没有 xlnet 时\n- **长文本理解能力弱**：面对用户长篇大论的故障描述，旧模型（如 BERT）因上下文窗口限制，经常遗漏关键的前置条件或后置结果，导致答非所问。\n- **语义歧义处理差**：在涉及多重否定或复杂指代（如“我不觉得这个不像坏的”）的句子中，模型极易判断错误，将正面反馈误判为负面。\n- **答案抽取精度低**：在阅读理解任务中，模型难以从冗长的商品评论或政策文档中定位到确切的答案片段，准确率徘徊在 78% 左右。\n- **依赖大量标注数据**：为了提升特定场景的表现，团队需要耗费数周时间人工标注成千上万条训练数据，成本高昂且迭代缓慢。\n\n### 使用 xlnet 后\n- **长上下文精准捕捉**：借助 Transformer-XL 架构，xlnet 能完整建模超长用户日志，准确关联段落首尾信息，彻底解决长文本遗忘问题。\n- **复杂逻辑推理增强**：基于广义自回归预训练目标，xlnet 在处理双重否定及复杂语序时表现卓越，情感分析与意图识别的错误率显著下降。\n- **SOTA 级答案定位**：在 SQuAD 等阅读理解基准上，xlnet-Large 将精确匹配率（EM）提升至 86% 以上，能从文档中一次性锁定标准答案。\n- **小样本高效迁移**：凭借更强的语言表示学习能力，仅需少量微调数据即可在垂直领域达到最佳效果，大幅缩短了模型上线周期。\n\nxlnet 通过突破性的排列语言建模机制，让机器真正具备了像人类一样灵活理解任意顺序文本的能力，显著提升了复杂场景下的自然语言处理上限。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fzihangdai_xlnet_87016d23.png","zihangdai","Zihang Dai","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fzihangdai_0c78b4f1.png",null,"xAI","zander.dai@gmail.com","https:\u002F\u002Fgithub.com\u002Fzihangdai",[80,84,88],{"name":81,"color":82,"percentage":83},"Python","#3572A5",96.9,{"name":85,"color":86,"percentage":87},"Jupyter 
Notebook","#DA5B0B",1.8,{"name":89,"color":90,"percentage":91},"Shell","#89e051",1.2,6177,1154,"2026-04-14T02:34:18","Apache-2.0",4,"未说明","非必需（支持 TPU），若使用 GPU 则需 NVIDIA GPU。XLNet-Large 在序列长度 512 时单卡至少需 16GB 显存（仅能容纳 batch_size=1）；XLNet-Base 在序列长度 512 时需约 8-16GB 显存。文中测试环境为 V100 GPU。","未说明（但指出 TPU 通常比普通 GPU 拥有更多内存，复现 SOTA 结果对显存要求极高）",{"notes":101,"python":102,"dependencies":103},"1. 该代码库基于 2019 年发布，主要依赖 TensorFlow 1.13.1 和 Python 2，与现代环境兼容性较差。2. 大多数 SOTA 结果是在 TPU 上产生的，使用 12GB-16GB 显存的 GPU 很难复现 XLNet-Large 的结果，通常需要多卡并行（32-128 张）或减小批次大小\u002F序列长度。3. 目前仅提供带大小写（cased）的预训练模型。4. 评估阶段目前仅支持单 GPU 模式以确保正确性。","Python 2",[104,105],"TensorFlow==1.13.1","SentencePiece",[35,14],[108,109,110],"tensorflow","nlp","deep-learning","2026-03-27T02:49:30.150509","2026-04-18T14:24:40.000147",[114,119,123,128,132,137,141,146],{"id":115,"question_zh":116,"answer_zh":117,"source_url":118},39946,"如何从 XLNet 模型中提取词嵌入（Word Embeddings）？","可以使用项目中的 `gpu_extract` 脚本来获取词嵌入。运行该脚本后，即可成功生成包含词嵌入的 JSON 输出文件。注意确保使用的是修复了真实 token 与填充 token 对齐问题的最新版本代码。","https:\u002F\u002Fgithub.com\u002Fzihangdai\u002Fxlnet\u002Fissues\u002F39",{"id":120,"question_zh":121,"answer_zh":122,"source_url":118},39947,"如何在运行提取脚本时强制使用 GPU 而不是 CPU？","默认情况下脚本可能运行在 CPU 上。要强制使用 GPU，请确保使用 `gpu_extract` 脚本，并在运行环境中正确配置了 GPU 支持（例如安装正确的 CUDA 版本和 TensorFlow GPU 版）。用户反馈直接运行该脚本即可在单 GPU 环境下获得词嵌入的 JSON 输出。",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},39948,"在 TPU v3-8 上运行 SQuAD 大型模型脚本时遇到内存不足（OOM）错误怎么办？","遇到 OOM 错误时，建议采取以下步骤：\n1. 确保使用的虚拟机环境与之前运行 BERT 时一致，无需特殊更改。\n2. 创建 TPU 时指定正确的 TensorFlow 版本，例如：`ctpu up --name your-xlnet-tpu --tpu-size v3-8 --tpu-only --tf-version 1.14.1.dev20190518 --zone us-central1-b`。\n3. 为了快速调试，可以先设置较小的 `train_steps`（例如 20）来排查是否是配置问题。\n4. 
检查虚拟机上的 TensorFlow 版本是否满足 XLNet 的要求（使用 `pip list | grep tensorflow` 查看）。","https:\u002F\u002Fgithub.com\u002Fzihangdai\u002Fxlnet\u002Fissues\u002F15",{"id":129,"question_zh":130,"answer_zh":131,"source_url":127},39949,"在 Colab TPU 上进行预训练时遇到 TensorFlow 版本兼容性问题如何解决？","Colab TPU 存在版本冲突：TensorFlow 1.14.0 可以运行分类任务但无法运行预训练（缺少 `tensorflow.contrib.tpu.proto`），而 1.13.1 会导致硬件状态错误。目前的变通方案是：对于微调任务（如 `run_classifier`），使用 TensorFlow 1.14.0；对于预训练，可能需要等待官方更新自定义的 `tpu_estimator.py` 以兼容新版本，或者尝试在本地环境中配置特定的 TF 1.13.1 环境。有用户通过减少 `train_steps` 到 6000 成功运行了部分任务。",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},39950,"预训练过程中 Loss 不下降甚至波动增加是否正常？","预训练初期 Loss 波动或暂时上升可能是正常现象，特别是当数据集较大或模型较深时。用户反馈在使用 160 万句子数据集、配置为 6 层、隐藏层维度 768、序列长度 512 的参数下进行预训练时，Loss 会在一定范围内波动。关键在于持续观察长期趋势，并确保学习率调度（lr）和梯度裁剪（gnorm）正常工作。如果长期不收敛，需检查数据预处理和质量。","https:\u002F\u002Fgithub.com\u002Fzihangdai\u002Fxlnet\u002Fissues\u002F124",{"id":138,"question_zh":139,"answer_zh":140,"source_url":136},39951,"从头预训练 XLNet 需要多长时间以及推荐什么硬件配置？","根据社区经验，使用两张 32GB Tesla V100 GPU 预训练 20 万条数据大约需要 5 天时间。对于更大的数据集（如 160 万句子），时间会相应增加。推荐的预训练参数包括：`train_batch_size=32`, `seq_len=512`, `reuse_len=256`, `mem_len=384`, `n_layer=6`, `d_model=768` 等，具体可根据显存大小调整。",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},39952,"如何将训练好的 XLNet 模型导出用于服务（Serving）？","导出模型需要编写一个 `serving_input_fn` 函数。在该函数中，定义输入特征规范（feature_spec），包括 `input_ids`, `input_mask`, `segment_ids` 等占位符。使用 `tf.parse_example` 解析序列化后的 Example 字符串，并返回 `tf.estimator.export.ServingInputReceiver` 对象。确保在 `tf.variable_scope(\"model\")` 下构建模型以正确加载权重。","https:\u002F\u002Fgithub.com\u002Fzihangdai\u002Fxlnet\u002Fissues\u002F113",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},39953,"在推理（Inference）模式下仍然得到随机输出怎么办？","如果在推理时得到随机结果，请确保在创建运行配置时将 `is_training` 和 `is_finetune` 都设置为 `False`。代码示例：`run_config = xlnet.xlnet.create_run_config(is_training=False, is_finetune=False, FLAGS=FLAGS)`。如果问题依旧，检查输入数据的预处理（如分词、mask 
生成）是否与预训练时保持一致，并确认模型权重已正确初始化加载。","https:\u002F\u002Fgithub.com\u002Fzihangdai\u002Fxlnet\u002Fissues\u002F160",[]]