[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-allenai--longformer":3,"tool-allenai--longformer":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":83,"stars":96,"forks":97,"last_commit_at":98,"license":99,"difficulty_score":10,"env_os":100,"env_gpu":101,"env_ram":102,"env_deps":103,"category_tags":110,"github_topics":79,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":111,"updated_at":112,"faqs":113,"releases":144},1993,"allenai\u002Flongformer","longformer","Longformer: The Long-Document Transformer","Longformer 是专为处理超长文档设计的 Transformer 模型，能有效应对传统模型因上下文长度限制而无法处理长文本的问题。它通过创新的“滑动窗口注意力”机制，在保持计算效率的同时，支持长达 16,384 个标记的输入，远超常规模型的 512–1024 标记限制，特别适合文档摘要、法律文书分析、学术论文处理等任务。Longformer 还提供编码器-解码器版本（LED），支持长文本到长文本的序列生成，如自动摘要或问答。它已无缝集成到 Hugging Face Transformers 库中，支持梯度检查点、FP16 精度和 CPU\u002FTPU 运行，大幅降低显存需求，使普通 GPU 也能处理长文本任务。开发者和研究人员可直接加载预训练模型进行微调，无需从头训练，极大提升了效率。适合需要处理长文本的 NLP 工程师、学术研究者和文档智能应用的开发者使用，普通用户则可通过集成其能力的应用间接受益。","# \u003Cp align=center>`Longformer`\u003C\u002Fp>\n`Longformer` and `LongformerEncoderDecoder (LED)` are pretrained transformer models for long documents.\n\n**\\*\\*\\*\\*\\* New December 1st, 2020: LongformerEncoderDecoder \\*\\*\\*\\*\\***\n\nA `LongformerEncoderDecoder (LED)` model is now available. 
It supports seq2seq tasks with long input. With gradient checkpointing, fp16, and a 48GB GPU, the input length can be up to 16K tokens. Check the updated paper for the model details and evaluation.\n\n* Pretrained models:  1) [`led-base-16384`](https:\u002F\u002Fai2-s2-research.s3-us-west-2.amazonaws.com\u002Flongformer\u002Flongformer-encdec-base-16384.tar.gz),  2) [`led-large-16384`](https:\u002F\u002Fai2-s2-research.s3-us-west-2.amazonaws.com\u002Flongformer\u002Flongformer-encdec-large-16384.tar.gz)\n\n* Requirements: Make sure to use the huggingface\u002Ftransformers fork specified in `requirements.txt`. It adds support for gradient checkpointing and allows different maximum sequence length for the input and output. You can also run `pip install git+https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer.git`\n\n* Check the script `scripts\u002Fsummarization.py` for an example of how to use the model.\n\n\n**\\*\\*\\*\\*\\* New July 23rd, 2020: Speed degradation \\*\\*\\*\\*\\***\n\nA significant speed degradation in huggingface\u002Ftransformers was recently discovered and fixed (check [this PR](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Fpull\u002F5811) for details). To avoid this problem, either use the old [release v2.11.0](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Ftree\u002Fv2.11.0), which doesn't support gradient checkpointing, or use the master branch. This problem should be fixed with the next huggingface\u002Ftransformers release.\n\n\n**\\*\\*\\*\\*\\* New June 29th, 2020: Easier to use Gradient checkpointing \\*\\*\\*\\*\\***\n\nGradient checkpointing has been released with huggingface\u002Ftransformers [release v3.0.0](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Ftree\u002Fv3.0.0). Gradient checkpointing reduces memory by 5x which makes it possible to process longer sequences on smaller GPUs. 
To use, try something like the following:\n\n```\nfrom transformers import LongformerModel\nmodel = LongformerModel.from_pretrained('allenai\u002Flongformer-base-4096', gradient_checkpointing=True)\n```\n\n**\\*\\*\\*\\*\\* New June 2nd, 2020: Integrating with Huggingface + Train your own long model + Gradient checkpointing \\*\\*\\*\\*\\***\n\n1. `Longformer` is now integrated in the huggingface\u002Ftransformers [release v2.11.0](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Ftree\u002Fv2.11.0). Now you can do\n```\nfrom transformers import LongformerModel\nmodel = LongformerModel.from_pretrained(\"allenai\u002Flongformer-base-4096\")\n```\nThe release also includes `LongformerForQA` and other `LongformerForTaskName` with automatic setting of global attention.\n\n2. We added a [notebook](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fallenai\u002Flongformer\u002Fblob\u002Fmaster\u002Fscripts\u002Fconvert_model_to_long.ipynb) to show how to convert an existing pretrained model into its \"long\" version. \n\n3. Gradient checkpointing has been merged into HF master ([check PR](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Fpull\u002F4659)). Gradient checkpointing can reduce memory usage significantly (5x for `longformer-base-4096`), allowing longer sequences on smaller GPUs. \n\n\n**\\*\\*\\*\\*\\* New April 27th, 2020: A PyTorch implementation of the sliding window attention  \\*\\*\\*\\*\\***\n\nWe added a PyTorch implementation of the sliding window attention that doesn't require the custom CUDA kernel. It is limited in functionality but more convenient to use for finetuning on downstream tasks. 
\n\n**Advantage**: supports CPU, TPU and fp16, which aren't supported by the custom CUDA kernel\n\n**Limitations**: uses 2x more memory (but fp16 offsets that), and doesn’t support dilation and autoregressive attention (not needed for finetuning)\n\nTherefore, it is suitable for finetuning on downstream tasks but not a good choice for language modeling. The code snippet below and the TriviaQA scripts were updated to use this new implementation.\n\n**\\*\\*\\*\\*\\* End new information \\*\\*\\*\\*\\***\n\n### How to use\n\n1. Download pretrained model\n  * [`longformer-base-4096`](https:\u002F\u002Fai2-s2-research.s3-us-west-2.amazonaws.com\u002Flongformer\u002Flongformer-base-4096.tar.gz)\n  * [`longformer-large-4096`](https:\u002F\u002Fai2-s2-research.s3-us-west-2.amazonaws.com\u002Flongformer\u002Flongformer-large-4096.tar.gz)\n\n2. Install environment and code\n\n    ```bash\n    conda create --name longformer python=3.7\n    conda activate longformer\n    conda install cudatoolkit=10.0\n    pip install git+https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer.git\n    ```\n\n3. Run the model\n\n    ```python\n    import torch\n    from longformer.longformer import Longformer, LongformerConfig\n    from longformer.sliding_chunks import pad_to_window_size\n    from transformers import RobertaTokenizer\n\n    config = LongformerConfig.from_pretrained('longformer-base-4096\u002F') \n    # choose the attention mode 'n2', 'tvm' or 'sliding_chunks'\n    # 'n2': for regular n2 attention\n    # 'tvm': a custom CUDA kernel implementation of our sliding window attention\n    # 'sliding_chunks': a PyTorch implementation of our sliding window attention\n    config.attention_mode = 'sliding_chunks'\n\n    model = Longformer.from_pretrained('longformer-base-4096\u002F', config=config)\n    tokenizer = RobertaTokenizer.from_pretrained('roberta-base')\n    tokenizer.model_max_length = model.config.max_position_embeddings\n\n    SAMPLE_TEXT = ' '.join(['Hello world! 
'] * 1000)  # long input document\n \n    input_ids = torch.tensor(tokenizer.encode(SAMPLE_TEXT)).unsqueeze(0)  # batch of size 1\n\n    # TVM code doesn't work on CPU. Uncomment this if `config.attention_mode = 'tvm'`\n    # model = model.cuda(); input_ids = input_ids.cuda()\n\n    # Attention mask values -- 0: no attention, 1: local attention, 2: global attention\n    attention_mask = torch.ones(input_ids.shape, dtype=torch.long, device=input_ids.device) # initialize to local attention\n    attention_mask[:, [1, 4, 21,]] =  2  # Set global attention based on the task. For example,\n                                         # classification: the \u003Cs> token\n                                         # QA: question tokens\n\n    # padding seqlen to the nearest multiple of 512. Needed for the 'sliding_chunks' attention\n    input_ids, attention_mask = pad_to_window_size(\n            input_ids, attention_mask, config.attention_window[0], tokenizer.pad_token_id)\n\n    output = model(input_ids, attention_mask=attention_mask)[0]\n    ```\n\n### Model pretraining\n\n[This notebook](https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer\u002Fblob\u002Fmaster\u002Fscripts\u002Fconvert_model_to_long.ipynb) demonstrates our procedure for training Longformer starting from the RoBERTa checkpoint. The same procedure can be followed to get a long-version of other existing pretrained models. \n\n### TriviaQA\n\n* Training scripts: `scripts\u002Ftriviaqa.py`\n* Pretrained large model: [`here`](https:\u002F\u002Fai2-s2-research.s3-us-west-2.amazonaws.com\u002Flongformer\u002Ftriviaqa-longformer-large.tar.gz) (replicates leaderboard results)\n* Instructions: `scripts\u002Fcheatsheet.txt`\n\n\n### CUDA kernel\n\nOur custom CUDA kernel is implemented in TVM.  For now, the kernel only works on GPUs and Linux. We tested it on Ubuntu, Python 3.7, CUDA10, PyTorch >= 1.2.0. 
If it doesn't work for your environment, please create a new issue.\n\n**Compiling the kernel**: We already include the compiled binaries of the CUDA kernel, so most users won't need to compile it, but if you are interested, check `scripts\u002Fcheatsheet.txt` for instructions.\n\n\n### Known issues\n\nPlease check the repo [issues](https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer\u002Fissues) for a list of known issues that we are planning to address soon. If your issue is not discussed, please create a new one. \n\n\n### Citing\n\nIf you use `Longformer` in your research, please cite [Longformer: The Long-Document Transformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.05150).\n```\n@article{Beltagy2020Longformer,\n  title={Longformer: The Long-Document Transformer},\n  author={Iz Beltagy and Matthew E. Peters and Arman Cohan},\n  journal={arXiv:2004.05150},\n  year={2020},\n}\n```\n\n`Longformer` is an open-source project developed by [the Allen Institute for Artificial Intelligence (AI2)](http:\u002F\u002Fwww.allenai.org).\nAI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.\n","# \u003Cp align=center>`Longformer`\u003C\u002Fp>\n`Longformer` 和 `LongformerEncoderDecoder (LED)` 是针对长文档的预训练 Transformer 模型。\n\n**\\*\\*\\*\\*\\* 2020年12月1日新功能：LongformerEncoderDecoder \\*\\*\\*\\*\\***\n\n现在提供了一个 `LongformerEncoderDecoder (LED)` 模型。它支持长输入的序列到序列任务。借助梯度检查点、fp16 和 48GB GPU，输入长度最高可达 16K tokens。请查看更新后的论文以获取模型详情和评估结果。\n\n* 预训练模型：1) [`led-base-16384`](https:\u002F\u002Fai2-s2-research.s3-us-west-2.amazonaws.com\u002Flongformer\u002Flongformer-encdec-base-16384.tar.gz), 2) [`led-large-16384`](https:\u002F\u002Fai2-s2-research.s3-us-west-2.amazonaws.com\u002Flongformer\u002Flongformer-encdec-large-16384.tar.gz)\n\n* 要求：请确保使用 `requirements.txt` 中指定的 huggingface\u002Ftransformers 分支。该分支增加了对梯度检查点的支持，并允许输入和输出设置不同的最大序列长度。您也可以运行 `pip install 
git+https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer.git`\n\n* 请查看脚本 `scripts\u002Fsummarization.py` 以了解如何使用该模型的示例。\n\n\n**\\*\\*\\*\\*\\* 2020年7月23日新功能：速度下降 \\*\\*\\*\\*\\***\n\n最近在 hugginface\u002Ftransformers 中发现并修复了一个显著的速度下降问题（详情请查看 [此 PR](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Fpull\u002F5811)）。为避免此问题，您可以使用旧版本 [v2.11.0](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Ftree\u002Fv2.11.0)，但该版本不支持梯度检查点；或者使用 master 分支。下一个 hugginface\u002Ftransformers 发布版应会修复此问题。\n\n\n**\\*\\*\\*\\*\\* 2020年6月29日新功能：更易使用梯度检查点 \\*\\*\\*\\*\\***\n\n梯度检查点已随 huggingface\u002Ftransformers [v3.0.0 版本](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Ftree\u002Fv3.0.0)发布。梯度检查点可将内存减少 5 倍，从而在较小的 GPU 上处理更长的序列成为可能。要使用，可以尝试如下代码：\n\n```\nfrom transformers import LongformerModel\nmodel = LongformerModel.from_pretrained('allenai\u002Flongformer-base-4096', gradient_checkpointing=True)\n```\n\n**\\*\\*\\*\\*\\* 2020年6月2日新功能：与 Huggingface 集成 + 训练自己的长模型 + 梯度检查点 \\*\\*\\*\\*\\*\n\n1. `Longformer` 现已集成到 huggingface\u002Ftransformers [v2.11.0 版本](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Ftree\u002Fv2.11.0)中。现在您可以这样操作：\n```\nfrom transformers import LongformerModel\nmodel = LongformerModel.from_pretrained(\"allenai\u002Flongformer-base-4096\")\n```\n该版本还包含 `LongformerForQA` 和其他 `LongformerForTaskName`，并自动设置全局注意力。\n\n2. 我们添加了一个 [notebook](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fallenai\u002Flongformer\u002Fblob\u002Fmaster\u002Fscripts\u002Fconvert_model_to_long.ipynb) 来展示如何将现有预训练模型转换为“长”版本。\n\n3. 
梯度检查点已合并到 HF master 分支（[查看 PR](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Fpull\u002F4659)）。梯度检查点可显著减少内存占用（对于 `longformer-base-4096` 可减少 5 倍），从而在较小的 GPU 上处理更长的序列成为可能。\n\n\n**\\*\\*\\*\\*\\* 2020年4月27日新功能：滑动窗口注意力的 PyTorch 实现 \\*\\*\\*\\*\\***\n\n我们添加了滑动窗口注意力的 PyTorch 实现，无需自定义 CUDA 内核。它的功能有限，但在下游任务微调时使用起来更方便。\n\n**优势**：支持 CPU、TPU 和 fp16，而自定义 CUDA 内核不支持这些。\n\n**局限性**：内存占用增加一倍（但 fp16 可部分抵消），且不支持扩张和自回归注意力（微调时不需要）。\n\n因此，它适合下游任务微调，但不适合作为语言建模的首选。下面的代码片段和 TriviaQA 脚本已更新为使用这一新实现。\n\n**\\*\\*\\*\\*\\* 新信息结束 \\*\\*\\*\\*\\***\n\n### 使用方法\n\n1. 下载预训练模型\n  * [`longformer-base-4096`](https:\u002F\u002Fai2-s2-research.s3-us-west-2.amazonaws.com\u002Flongformer\u002Flongformer-base-4096.tar.gz)\n  * [`longformer-large-4096`](https:\u002F\u002Fai2-s2-research.s3-us-west-2.amazonaws.com\u002Flongformer\u002Flongformer-large-4096.tar.gz)\n\n2. 安装环境和代码\n\n    ```bash\n    conda create --name longformer python=3.7\n    conda activate longformer\n    conda install cudatoolkit=10.0\n    pip install git+https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer.git\n    ```\n\n3. 运行模型\n\n    ```python\n    import torch\n    from longformer.longformer import Longformer, LongformerConfig\n    from longformer.sliding_chunks import pad_to_window_size\n    from transformers import RobertaTokenizer\n\n    config = LongformerConfig.from_pretrained('longformer-base-4096\u002F') \n    # 选择注意力模式 'n2'、'tvm' 或 'sliding_chunks'\n    # 'n2': 用于常规 n2 注意力\n    # 'tvm': 我们的滑动窗口注意力的自定义 CUDA 内核实现\n    # 'sliding_chunks': 我们的滑动窗口注意力的 PyTorch 实现\n    config.attention_mode = 'sliding_chunks'\n\n    model = Longformer.from_pretrained('longformer-base-4096\u002F', config=config)\n    tokenizer = RobertaTokenizer.from_pretrained('roberta-base')\n    tokenizer.model_max_length = model.config.max_position_embeddings\n\n    SAMPLE_TEXT = ' '.join(['Hello world! 
'] * 1000)  # 长输入文档\n \n    input_ids = torch.tensor(tokenizer.encode(SAMPLE_TEXT)).unsqueeze(0)  # 批次大小为 1\n\n    # TVM 代码无法在 CPU 上运行。如果 `config.attention_mode = 'tvm'`，请取消注释\n    # model = model.cuda(); input_ids = input_ids.cuda()\n\n    # 注意力掩码值 —— 0: 不关注，1: 局部关注，2: 全局关注\n    attention_mask = torch.ones(input_ids.shape, dtype=torch.long, device=input_ids.device) # 初始化为局部关注\n    attention_mask[:, [1, 4, 21,]] =  2  # 根据任务设置全局关注。例如，\n                                         # 分类：\u003Cs> 标记\n                                         # QA：问题标记\n\n    # 将序列长度填充到最接近 512 的倍数。这是‘sliding_chunks’注意力所必需的\n    input_ids, attention_mask = pad_to_window_size(\n            input_ids, attention_mask, config.attention_window[0], tokenizer.pad_token_id)\n\n    output = model(input_ids, attention_mask=attention_mask)[0]\n    ```\n\n### 模型预训练\n\n[此 notebook](https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer\u002Fblob\u002Fmaster\u002Fscripts\u002Fconvert_model_to_long.ipynb) 展示了我们从 RoBERTa 检查点开始训练 Longformer 的流程。同样的流程也可用于获得其他现有预训练模型的长版本。\n\n### TriviaQA\n\n* 训练脚本：`scripts\u002Ftriviaqa.py`\n* 预训练大模型：[`这里`](https:\u002F\u002Fai2-s2-research.s3-us-west-2.amazonaws.com\u002Flongformer\u002Ftriviaqa-longformer-large.tar.gz)（复现排行榜结果）\n* 使用说明：`scripts\u002Fcheatsheet.txt`\n\n### CUDA内核\n\n我们的自定义CUDA内核是在TVM中实现的。目前，该内核仅支持GPU和Linux系统。我们已在Ubuntu、Python 3.7、CUDA10以及PyTorch >= 1.2.0环境下进行了测试。如果它在您的环境中无法正常运行，请提交一个新的问题。\n\n**编译内核**：我们已内置了CUDA内核的编译二进制文件，因此大多数用户无需再进行编译。不过，如果您有兴趣，可查看`scripts\u002Fcheatsheet.txt`以获取相关说明。\n\n\n### 已知问题\n\n请查看仓库的[issues](https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer\u002Fissues)，了解我们近期计划解决的已知问题列表。如果您的问题未被提及，请创建一个新的issue。\n\n\n### 引用\n\n如果您在研究中使用了`Longformer`，请引用[Longformer: The Long-Document Transformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.05150)。\n```\n@article{Beltagy2020Longformer,\n  title={Longformer: The Long-Document Transformer},\n  author={Iz Beltagy and Matthew E. 
Peters and Arman Cohan},\n  journal={arXiv:2004.05150},\n  year={2020},\n}\n```\n\n`Longformer`是由[艾伦人工智能研究所（AI2）](http:\u002F\u002Fwww.allenai.org)开发的一个开源项目。AI2是一家非营利性研究机构，其使命是通过具有重大影响力的AI研究与工程，为全人类做出贡献。","# Longformer 快速上手指南\n\n## 环境准备\n\n- **系统要求**：Linux 或 macOS（CUDA 内核仅支持 Linux + GPU）\n- **Python 版本**：3.7+\n- **硬件建议**：至少 16GB 显存（使用梯度检查点可降低至 8GB）\n- **前置依赖**：PyTorch ≥ 1.2.0，CUDA 10.0+（如使用 TVM 内核）\n\n> 推荐使用国内镜像加速：`pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n## 安装步骤\n\n```bash\nconda create --name longformer python=3.7\nconda activate longformer\nconda install cudatoolkit=10.0\npip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple git+https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer.git\n```\n\n> 若仅需基础功能（推荐新手），可直接使用 Hugging Face Transformers：\n> ```bash\n> pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple transformers>=3.0.0\n> ```\n\n## 基本使用\n\n### 1. 加载预训练模型（推荐方式）\n\n```python\nfrom transformers import LongformerModel, LongformerTokenizer\n\nmodel = LongformerModel.from_pretrained('allenai\u002Flongformer-base-4096', gradient_checkpointing=True)\ntokenizer = LongformerTokenizer.from_pretrained('allenai\u002Flongformer-base-4096')\ntokenizer.model_max_length = 4096\n```\n\n### 2. 处理长文本示例\n\n```python\nimport torch\n\nSAMPLE_TEXT = 'Hello world! 
' * 1000  # 构造长文本\ninput_ids = tokenizer.encode(SAMPLE_TEXT, return_tensors='pt')\n\n# 设置注意力掩码：1=局部注意力，2=全局注意力（如 [CLS] 位置）\nattention_mask = torch.ones(input_ids.shape, dtype=torch.long)\nattention_mask[0, 0] = 2  # 对 [CLS] token 设置全局注意力\n\n# 模型推理\noutputs = model(input_ids, attention_mask=attention_mask)\nlast_hidden_state = outputs[0]  # 获取输出\n```\n\n> ✅ 推荐使用 `gradient_checkpointing=True` 以节省显存，适用于消费级 GPU  \n> ✅ 支持 Hugging Face 生态，可直接用于 `LongformerForSequenceClassification`、`LongformerForQuestionAnswering` 等任务","一家法律科技公司正在开发智能合同审查系统，帮助法务团队快速分析长达50页以上的商业合同，识别关键条款如违约责任、保密义务和争议解决机制。传统方法依赖人工逐条阅读，效率低且易遗漏。\n\n### 没有 longformer 时\n- 合同文本通常超过8000个token，远超标准Transformer的512token限制，必须手动切分段落，导致上下文断裂，模型无法理解跨段落的逻辑关联。\n- 切分后分别处理多个片段，再人工拼接结果，耗时长达3–5小时\u002F份合同，且容易因片段顺序错乱误判条款关系。\n- 使用BERT等模型时，为适配短序列不得不删减重要内容，导致关键条款（如“本协议终止后保密义务仍持续三年”）被截断丢失。\n- 模型训练数据因无法完整输入长文档，只能使用摘要或人工标注的短样本，泛化能力差，对复杂条款识别准确率不足65%。\n- 部署时需多GPU并行处理，硬件成本高，中小律所难以负担。\n\n### 使用 longformer 后\n- 直接输入完整合同（最高支持16K token），无需切分，模型能完整捕捉“定义条款→适用范围→违约后果”之间的逻辑链条。\n- 一键完成条款抽取与分类，处理一份合同时间从数小时缩短至8分钟以内，准确率提升至92%以上。\n- 通过梯度检查点技术，在24GB显存的消费级GPU上即可运行，无需昂贵集群，显著降低部署门槛。\n- 支持端到端的seq2seq任务，可直接生成结构化摘要（如“保密义务：持续3年，适用于所有员工及第三方”），减少人工复核工作量。\n- 模型在真实合同数据上微调后，能识别隐含条款（如“默认适用纽约州法律”），远超传统规则引擎的识别能力。\n\nlongformer 让法律AI从“片段分析”进化为“全文理解”，真正实现合同审查的自动化与精准化。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fallenai_longformer_a4f7ff52.png","allenai","Ai2","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fallenai_65c450d5.png","",null,"ai2-info@allenai.org","http:\u002F\u002Fwww.allenai.org","https:\u002F\u002Fgithub.com\u002Fallenai",[84,88,92],{"name":85,"color":86,"percentage":87},"Python","#3572A5",89.5,{"name":89,"color":90,"percentage":91},"Jupyter Notebook","#DA5B0B",10.4,{"name":93,"color":94,"percentage":95},"Shell","#89e051",0.1,2189,288,"2026-04-04T13:23:49","Apache-2.0","Linux","需要 NVIDIA GPU，显存 48GB 可支持 16K tokens 输入，推荐使用 CUDA 10.0，CUDA 内核仅支持 Linux 环境","未说明",{"notes":104,"python":105,"dependencies":106},"建议使用 conda 创建独立环境，首次运行需下载约 
5GB 模型文件；若使用 sliding_chunks 注意力模式，可避免 CUDA 内核依赖，支持 CPU\u002FTPU\u002Ffp16；梯度检查点可降低显存占用 5 倍，适合小显存 GPU；推荐使用官方 fork 的 transformers 以支持长序列和梯度检查点功能","3.7",[107,108,109],"torch>=1.2.0","transformers","tvm",[26,13],"2026-03-27T02:49:30.150509","2026-04-06T05:36:24.458533",[114,119,124,129,134,139],{"id":115,"question_zh":116,"answer_zh":117,"source_url":118},9006,"运行 LongFormer TVM 内核时出现段错误或非法指令，如何排查？","该问题通常由 TVM 编译的 CUDA 内核与当前 GPU 架构不兼容导致。建议重新编译 lib_diagonaled_mm_float32_cuda.so，确保使用与 GPU 匹配的 compute capability（如 sm_70）。可参考 TVM 社区优化讨论：https:\u002F\u002Fdiscuss.tvm.ai\u002Ft\u002Fdeveloping-a-faster-schedule-for-longformers-kernel\u002F6367","https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer\u002Fissues\u002F71",{"id":120,"question_zh":121,"answer_zh":122,"source_url":123},9002,"使用 LongFormer 时出现 CUDA 错误 'device-side assert triggered'，如何解决？","该错误通常是由于 pad_token_id 不为 0 导致位置嵌入索引越界。解决方案是确保 pad_token_id 设置为 0，或在模型配置中调整 pad_token_id 以匹配词表范围。参考 Hugging Face RoBERTa 实现中对位置 ID 的处理：https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Fblob\u002Fmaster\u002Fsrc\u002Ftransformers\u002Fmodeling_roberta.py#L804","https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer\u002Fissues\u002F99",{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},9003,"如何在 CPU 上运行 LongFormer 的滑动窗口注意力机制？","LongFormer 已支持在 CPU 上运行的 'sliding_chunks' 实现，无需使用 TVM CUDA 内核。只需将 config.attention_mode 设置为 'sliding_chunks' 而非 'tvm'，即可在 CPU 上正常运行。","https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer\u002Fissues\u002F3",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},9004,"如何在 LongFormer 中实现文本分类任务？","官方示例已迁移至 Hugging Face Transformers 仓库。请参考最新文本分类示例：https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Ftree\u002Fmaster\u002Fexamples\u002Fpytorch\u002Ftext-classification，使用该目录下的脚本进行训练和评估。","https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer\u002Fissues\u002F34",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},9005,"导入 TVM 的 nvcc 
模块时报错 'cannot import name nvcc'，如何解决？","TVM 的 nvcc 模块已被移除或重构，不再支持直接从 tvm.contrib 导入 nvcc。建议检查 TVM 版本兼容性，或改用 tvm.runtime.module 或 tvm.driver 代替。建议升级到最新 TVM 版本并参考官方文档重构 CUDA 编译流程。","https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer\u002Fissues\u002F52",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},9007,"如何训练支持 16K 以上 token 的 LongFormer 模型？","可通过以下方法支持长序列：1) 使用 fp16 混合精度训练；2) 启用 gradient_checkpointing；3) 减小 attention_window（如从 512 降至 256）；4) 使用正弦位置编码替代学习位置编码；5) 减小中间层维度。参考：https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer\u002Fissues\u002F62","https:\u002F\u002Fgithub.com\u002Fallenai\u002Flongformer\u002Fissues\u002F62",[145,149],{"id":146,"version":147,"summary_zh":79,"released_at":148},106457,"v0.2","2020-05-18T02:19:35",{"id":150,"version":151,"summary_zh":79,"released_at":152},106458,"v0.1","2020-05-17T23:34:23"]