[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-studio-ousia--luke":3,"tool-studio-ousia--luke":64},[4,17,27,35,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",141543,2,"2026-04-06T11:32:54",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,3,"2026-04-06T11:19:32",[15,26,14,13],"图像",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":10,"last_commit_at":33,"category_tags":34,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 
都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85013,"2026-04-06T11:09:19",[26,43,44,45,14,46,15,13,47],"数据工具","视频","插件","其他","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":23,"last_commit_at":54,"category_tags":55,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 
协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[14,26,13,15,46],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74963,"2026-04-06T11:16:39",[15,26,13,46],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":83,"stars":92,"forks":93,"last_commit_at":94,"license":95,"difficulty_score":23,"env_os":96,"env_gpu":97,"env_ram":98,"env_deps":99,"category_tags":107,"github_topics":79,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":108,"updated_at":109,"faqs":110,"releases":145},4663,"studio-ousia\u002Fluke","luke","LUKE -- Language Understanding with Knowledge-based Embeddings","LUKE 是一款基于 Transformer 架构的先进预训练模型，全称为“基于知识嵌入的语言理解”。与传统模型仅关注单词不同，LUKE 的核心突破在于能够同时学习“单词”与“实体”（如人名、地名、机构名）的上下文表示。它通过独特的“实体感知自注意力机制”，让模型在处理文本时能更精准地识别和理解其中的关键实体及其相互关系。\n\n这一特性使 LUKE 在多项高难度自然语言处理任务中表现卓越，包括抽取式问答、命名实体识别、关系分类及实体类型判断等，并在 SQuAD、CoNLL-2003 等多个权威基准测试中取得了业界领先的成果。此外，项目方还发布了针对日语优化的版本，在日语自然语言理解任务上同样刷新了最佳成绩。\n\nLUKE 主要面向 AI 研究人员、NLP 算法工程师及开发者。如果你正在构建需要深度语义理解或复杂实体关系分析的应用，或者希望复现前沿学术论文中的实验，LUKE 提供了完整的预训练与微调代码，并兼容 Hugging Face Transformers 和 AllenNLP 等主流框架，能帮助你高效地将顶尖技术落地到实际业务场景中。","\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstudio-ousia_luke_readme_5148a9b3731d.png\" width=\"200\" alt=\"LUKE\">\n\n[![CircleCI](https:\u002F\u002Fcircleci.com\u002Fgh\u002Fstudio-ousia\u002Fluke.svg?style=svg&circle-token=49524bfde04659b8b54509f7e0f06ec3cf38f15e)](https:\u002F\u002Fcircleci.com\u002Fgh\u002Fstudio-ousia\u002Fluke)\n\n---\n\n**LUKE** (**L**anguage **U**nderstanding with **K**nowledge-based\n**E**mbeddings) is a new pretrained contextualized representation of words and\nentities based on transformer. It was proposed in our paper\n[LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.01057).\nIt achieves state-of-the-art results on important NLP benchmarks including\n**[SQuAD v1.1](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F)** (extractive\nquestion answering),\n**[CoNLL-2003](https:\u002F\u002Fwww.clips.uantwerpen.be\u002Fconll2003\u002Fner\u002F)** (named entity\nrecognition), **[ReCoRD](https:\u002F\u002Fsheng-z.github.io\u002FReCoRD-explorer\u002F)**\n(cloze-style question answering),\n**[TACRED](https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Ftacred\u002F)** (relation\nclassification), and\n**[Open Entity](https:\u002F\u002Fwww.cs.utexas.edu\u002F~eunsol\u002Fhtml_pages\u002Fopen_entity.html)**\n(entity typing).\n\nThis repository contains the source code to pretrain the model and fine-tune it\nto solve downstream tasks.\n\n## News\n\n**November 9, 2022: The large version of LUKE-Japanese is available**\n\nThe large version of LUKE-Japanese is available on the Hugging Face Model Hub:\n\n- [luke-japanese-large](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-large)\n- [luke-japanese-large-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-large-lite)\n\nThis model achieves state-of-the-art results on three datasets 
in\n[JGLUE](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002FJGLUE).\n\n| Model                         | MARC-ja   | JSTS                | JNLI      | JCommonsenseQA |\n| ----------------------------- | --------- | ------------------- | --------- | -------------- |\n|                               | acc       | Pearson\u002FSpearman    | acc       | acc            |\n| **LUKE Japanese large**       | **0.965** | **0.932**\u002F**0.902** | **0.927** | 0.893          |\n| _Baselines:_                  |           |\n| Tohoku BERT large             | 0.955     | 0.913\u002F0.872         | 0.900     | 0.816          |\n| Waseda RoBERTa large (seq128) | 0.954     | 0.930\u002F0.896         | 0.924     | **0.907**      |\n| Waseda RoBERTa large (seq512) | 0.961     | 0.926\u002F0.892         | 0.926     | 0.891          |\n| XLM RoBERTa large             | 0.964     | 0.918\u002F0.884         | 0.919     | 0.840          |\n\n**October 27, 2022: The Japanese version of LUKE is available**\n\nThe Japanese version of LUKE is now available on the Hugging Face Model Hub:\n\n- [luke-japanese-base](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-base)\n- [luke-japanese-base-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-base-lite)\n\nThis model outperforms other base-sized models on four datasets in\n[JGLUE](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002FJGLUE).\n\n| Model                  | MARC-ja   | JSTS                | JNLI      | JCommonsenseQA |\n| ---------------------- | --------- | ------------------- | --------- | -------------- |\n|                        | acc       | Pearson\u002FSpearman    | acc       | acc            |\n| **LUKE Japanese base** | **0.965** | **0.916**\u002F**0.877** | **0.912** | **0.842**      |\n| _Baselines:_           |           |\n| Tohoku BERT base       | 0.958     | 0.909\u002F0.868         | 0.899     | 0.808          |\n| NICT BERT base         | 0.958     | 0.910\u002F0.871   
      | 0.902     | 0.823          |\n| Waseda RoBERTa base    | 0.962     | 0.913\u002F0.873         | 0.895     | 0.840          |\n| XLM RoBERTa base       | 0.961     | 0.877\u002F0.831         | 0.893     | 0.687          |\n\n**April 13, 2022: The mLUKE fine-tuning code is available**\n\n[The example code](examples) is updated. Now it is based on\n[allennlp](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fallennlp) and\n[transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers). You can reproduce\nthe experiments in the [LUKE](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.01057) and\n[mLUKE](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.08151) papers with this implementation. For\nthe details, please see `README.md` under each example directory. The older code\nused in [the LUKE paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.01057) has been moved to\n[`examples\u002Flegacy`](examples\u002Flegacy).\n\n**April 13, 2022: The detailed instructions for pretraining LUKE models are\navailable**\n\nFor those interested in pretraining LUKE models, we explain how to prepare\ndatasets and run the pretraining code on [`pretraining.md`](pretraining.md).\n\n**November 24, 2021: Entity disambiguation example is available**\n\nThe example code of entity disambiguation based on LUKE has been added to this\nrepository. This model was originally proposed in\n[our paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.00426), and achieved state-of-the-art\nresults on five standard entity disambiguation datasets: AIDA-CoNLL, MSNBC,\nAQUAINT, ACE2004, and WNED-WIKI.\n\nFor further details, please refer to\n[`examples\u002Fentity_disambiguation`](examples\u002Fentity_disambiguation).\n\n**August 3, 2021: New example code based on Hugging Face Transformers and\nAllenNLP is available**\n\nNew fine-tuning examples of three downstream tasks, i.e., _NER_, _relation\nclassification_, and _entity typing_, have been added to LUKE. 
These examples\nare developed based on Hugging Face Transformers and AllenNLP. The fine-tuning\nmodels are defined using simple AllenNLP's Jsonnet config files!\n\nThe example code is available in [`examples`](examples).\n\n**May 5, 2021: LUKE is added to Hugging Face Transformers**\n\nLUKE has been added to the\n[master branch of the Hugging Face Transformers library](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers).\nYou can now solve entity-related tasks (e.g., named entity recognition, relation\nclassification, entity typing) easily using this library.\n\nFor example, the LUKE-large model fine-tuned on the TACRED dataset can be used\nas follows:\n\n```python\nfrom transformers import LukeTokenizer, LukeForEntityPairClassification\nmodel = LukeForEntityPairClassification.from_pretrained(\"studio-ousia\u002Fluke-large-finetuned-tacred\")\ntokenizer = LukeTokenizer.from_pretrained(\"studio-ousia\u002Fluke-large-finetuned-tacred\")\ntext = \"Beyoncé lives in Los Angeles.\"\nentity_spans = [(0, 7), (17, 28)]  # character-based entity spans corresponding to \"Beyoncé\" and \"Los Angeles\"\ninputs = tokenizer(text, entity_spans=entity_spans, return_tensors=\"pt\")\noutputs = model(**inputs)\nlogits = outputs.logits\npredicted_class_idx = int(logits[0].argmax())\nprint(\"Predicted class:\", model.config.id2label[predicted_class_idx])\n# Predicted class: per:cities_of_residence\n```\n\nWe also provide the following three Colab notebooks that show how to reproduce\nour experimental results on CoNLL-2003, TACRED, and Open Entity datasets using\nthe library:\n\n- [Reproducing experimental results of LUKE on CoNLL-2003 Using Hugging Face Transformers](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fstudio-ousia\u002Fluke\u002Fblob\u002Fmaster\u002Fnotebooks\u002Fhuggingface_conll_2003.ipynb)\n- [Reproducing experimental results of LUKE on TACRED Using Hugging Face 
Transformers](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fstudio-ousia\u002Fluke\u002Fblob\u002Fmaster\u002Fnotebooks\u002Fhuggingface_tacred.ipynb)\n- [Reproducing experimental results of LUKE on Open Entity Using Hugging Face Transformers](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fstudio-ousia\u002Fluke\u002Fblob\u002Fmaster\u002Fnotebooks\u002Fhuggingface_open_entity.ipynb)\n\nPlease refer to the\n[official documentation](https:\u002F\u002Fhuggingface.co\u002Ftransformers\u002Fmaster\u002Fmodel_doc\u002Fluke.html)\nfor further details.\n\n**November 5, 2020: LUKE-500K (base) model**\n\nWe released LUKE-500K (base), a new pretrained LUKE model that is smaller than\nthe existing LUKE-500K (large). The experimental results of LUKE-500K (base) and\nLUKE-500K (large) on SQuAD v1.1 and CoNLL-2003 are as follows:\n\n| Task                          | Dataset                                                      | Metric | LUKE-500K (base) | LUKE-500K (large) |\n| ----------------------------- | ------------------------------------------------------------ | ------ | ---------------- | ----------------- |\n| Extractive Question Answering | [SQuAD v1.1](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F)    | EM\u002FF1  | 86.1\u002F92.3        | 90.2\u002F95.4         |\n| Named Entity Recognition      | [CoNLL-2003](https:\u002F\u002Fwww.clips.uantwerpen.be\u002Fconll2003\u002Fner\u002F) | F1     | 93.3             | 94.3              |\n\nWe tuned only the batch size and learning rate in the experiments based on\nLUKE-500K (base).\n\n## Comparison with State-of-the-Art\n\nLUKE outperforms the previous state-of-the-art methods on five important NLP\ntasks:\n\n| Task                           | Dataset                                                                      | Metric | LUKE-500K (large) | Previous SOTA                                                             |\n| ------------------------------ | 
---------------------------------------------------------------------------- | ------ | ----------------- | ------------------------------------------------------------------------- |\n| Extractive Question Answering  | [SQuAD v1.1](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F)                    | EM\u002FF1  | **90.2**\u002F**95.4** | 89.9\u002F95.1 ([Yang et al., 2019](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.08237))         |\n| Named Entity Recognition       | [CoNLL-2003](https:\u002F\u002Fwww.clips.uantwerpen.be\u002Fconll2003\u002Fner\u002F)                 | F1     | **94.3**          | 93.5 ([Baevski et al., 2019](https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.07785))           |\n| Cloze-style Question Answering | [ReCoRD](https:\u002F\u002Fsheng-z.github.io\u002FReCoRD-explorer\u002F)                         | EM\u002FF1  | **90.6**\u002F**91.2** | 83.1\u002F83.7 ([Li et al., 2019](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FD19-6011\u002F)) |\n| Relation Classification        | [TACRED](https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Ftacred\u002F)                          | F1     | **72.7**          | 72.0 ([Wang et al. , 2020](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.01808))             |\n| Fine-grained Entity Typing     | [Open Entity](https:\u002F\u002Fwww.cs.utexas.edu\u002F~eunsol\u002Fhtml_pages\u002Fopen_entity.html) | F1     | **78.2**          | 77.6 ([Wang et al. 
, 2020](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.01808))             |\n\nThese numbers are reported in\n[our EMNLP 2020 paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.01057).\n\n## Installation\n\nLUKE can be installed using [Poetry](https:\u002F\u002Fpython-poetry.org\u002F):\n\n```bash\npoetry install\n\n# If you want to run pretraining for LUKE\npoetry install --extras \"pretraining opennlp\"\n# If you want to run pretraining for mLUKE\npoetry install --extras \"pretraining icu\"\n```\n\nThe virtual environment automatically created by Poetry can be activated by\n`poetry shell`.\n\n**A note on installing `torch`**\n\nThe pytorch installed via `poetry install` does not necessarily match your\nhardware. In such case, see [the official site](https:\u002F\u002Fpytorch.org\u002F) and\nreinstall the correct version with the `pip` command.\n\n```bash\npoetry run pip3 uninstall torch torchvision torchaudio\n# Example for Linux with CUDA 11.3\npoetry run pip3 install torch torchvision torchaudio --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu113\n```\n\n## Released Models\n\nOur pretrained models can be used with the\n[transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) library. 
The model\ndocumentations can be found in the following links:\n[LUKE](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fen\u002Fmodel_doc\u002Fluke) and\n[mLUKE](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fen\u002Fmodel_doc\u002Fmluke).\n\nCurrently, the following models are available on\n[the Hugging Face Model Hub](https:\u002F\u002Fhuggingface.co\u002Fmodels).\n\n|           Name            |                                         model_name                                          | Entity Vocab Size | Params |\n| :-----------------------: | :-----------------------------------------------------------------------------------------: | :---------------: | :----: |\n|      **LUKE (base)**      |           [studio-ousia\u002Fluke-base](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-base)           |       500K        | 253 M  |\n|     **LUKE (large)**      |          [studio-ousia\u002Fluke-large](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-large)          |       500K        | 484 M  |\n|     **mLUKE (base)**      |          [studio-ousia\u002Fmluke-base](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fmluke-base)          |       1.2M        | 586 M  |\n|     **mLUKE (large)**     |         [studio-ousia\u002Fmluke-large](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fmluke-large)         |       1.2M        | 868 M  |\n| **LUKE Japanese (base)**  |  [studio-ousia\u002Fluke-japanese-base](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-base)  |       570K        | 281 M  |\n| **LUKE Japanese (large)** | [studio-ousia\u002Fluke-japanese-large](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-large) |       570K        | 562 M  |\n\n### Lite Models\n\nThe entity embeddings cause a large memory footprint as they contain all the\nWikipedia entities that we used in pretraining. 
However, some downstream\ntasks (e.g., entity typing, named entity recognition, and relation\nclassification) need only the special entity embeddings such as `[MASK]`. You\nmay also want to use only the word representations.\n\nWith these use cases in mind, we have uploaded\nlite models that contain only the special entity embeddings. On such tasks, these models perform exactly\nthe same as the full models but have far fewer parameters, which enables\nfine-tuning on small GPUs.\n\n|           Name            |                                              model_name                                               | Params |\n| :-----------------------: | :---------------------------------------------------------------------------------------------------: | :----: |\n|      **LUKE (base)**      |           [studio-ousia\u002Fluke-base-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-base-lite)           | 125 M  |\n|     **LUKE (large)**      |          [studio-ousia\u002Fluke-large-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-large-lite)          | 356 M  |\n|     **mLUKE (base)**      |          [studio-ousia\u002Fmluke-base-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fmluke-base-lite)          | 279 M  |\n|     **mLUKE (large)**     |         [studio-ousia\u002Fmluke-large-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fmluke-large-lite)         | 561 M  |\n| **LUKE Japanese (base)**  |  [studio-ousia\u002Fluke-japanese-base-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-base-lite)  | 134 M  |\n| **LUKE Japanese (large)** | [studio-ousia\u002Fluke-japanese-large-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-large-lite) | 415 M  |\n\n## Fine-tuning LUKE models\n\nWe release the fine-tuning code based on\n[allennlp](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fallennlp) 
and\n[transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) under\n[`examples`](examples). You can run fine-tuning experiments with the\npredefined config files and the `allennlp train` command. For details and\nexample commands for each task, see the task directories under\n[`examples`](examples).\n\n## Pretraining LUKE models\n\nThe detailed instructions for pretraining LUKE models can be found in\n[`pretraining.md`](pretraining.md).\n\n## Citation\n\nIf you use LUKE in your work, please cite the\n[original paper](https:\u002F\u002Faclanthology.org\u002F2020.emnlp-main.523\u002F).\n\n```\n@inproceedings{yamada-etal-2020-luke,\n    title = \"{LUKE}: Deep Contextualized Entity Representations with Entity-aware Self-attention\",\n    author = \"Yamada, Ikuya  and\n      Asai, Akari  and\n      Shindo, Hiroyuki  and\n      Takeda, Hideaki  and\n      Matsumoto, Yuji\",\n    booktitle = \"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)\",\n    year = \"2020\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https:\u002F\u002Faclanthology.org\u002F2020.emnlp-main.523\",\n    doi = \"10.18653\u002Fv1\u002F2020.emnlp-main.523\",\n}\n```\n\nFor mLUKE, please cite\n[this paper](https:\u002F\u002Faclanthology.org\u002F2022.acl-long.505\u002F).\n\n```\n@inproceedings{ri-etal-2022-mluke,\n    title = \"m{LUKE}: {T}he Power of Entity Representations in Multilingual Pretrained Language Models\",\n    author = \"Ri, Ryokan  and\n      Yamada, Ikuya  and\n      Tsuruoka, Yoshimasa\",\n    booktitle = \"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)\",\n    year = \"2022\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https:\u002F\u002Faclanthology.org\u002F2022.acl-long.505\",\n}\n```\n","\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstudio-ousia_luke_readme_5148a9b3731d.png\" width=\"200\" alt=\"LUKE\">\n\n[![CircleCI](https:\u002F\u002Fcircleci.com\u002Fgh\u002Fstudio-ousia\u002Fluke.svg?style=svg&circle-token=49524bfde04659b8b54509f7e0f06ec3cf38f15e)](https:\u002F\u002Fcircleci.com\u002Fgh\u002Fstudio-ousia\u002Fluke)\n\n---\n\n**LUKE**（**L**anguage **U**nderstanding with **K**nowledge-based\n**E**mbeddings）是一种基于Transformer的新型预训练上下文词和实体表示模型。该模型在我们的论文\n[LUKE: 基于实体感知自注意力机制的深度上下文实体表示](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.01057)中提出。它在多项重要的自然语言处理基准测试中取得了当前最优性能，包括\n**[SQuAD v1.1](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F)**（抽取式问答）、\n**[CoNLL-2003](https:\u002F\u002Fwww.clips.uantwerpen.be\u002Fconll2003\u002Fner\u002F)**（命名实体识别）、\n**[ReCoRD](https:\u002F\u002Fsheng-z.github.io\u002FReCoRD-explorer\u002F)**（完形填空式问答）、\n**[TACRED](https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Ftacred\u002F)**（关系分类）以及\n**[Open Entity](https:\u002F\u002Fwww.cs.utexas.edu\u002F~eunsol\u002Fhtml_pages\u002Fopen_entity.html)**（实体类型标注）。\n\n本仓库包含用于预训练该模型及针对下游任务进行微调的源代码。\n\n## 最新消息\n\n**2022年11月9日：LUKE-Japanese大模型现已发布**\n\nLUKE-Japanese大模型已在Hugging Face Model Hub上线：\n\n- [luke-japanese-large](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-large)\n- [luke-japanese-large-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-large-lite)\n\n该模型在[JGLUE](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002FJGLUE)中的三个数据集上均取得了当前最优成绩。\n\n| 模型                         | MARC-ja   | JSTS                | JNLI      | JCommonsenseQA |\n| ----------------------------- | --------- | ------------------- | --------- | -------------- |\n|                               | acc       | Pearson\u002FSpearman    | acc       | acc            |\n| **LUKE Japanese large**       | **0.965** | **0.932**\u002F**0.902** | **0.927** | 0.893          |\n| _基线模型:_                  |           |\n| Tohoku BERT large       
      | 0.955     | 0.913\u002F0.872         | 0.900     | 0.816          |\n| Waseda RoBERTa large (seq128) | 0.954     | 0.930\u002F0.896         | 0.924     | **0.907**      |\n| Waseda RoBERTa large (seq512) | 0.961     | 0.926\u002F0.892         | 0.926     | 0.891          |\n| XLM RoBERTa large             | 0.964     | 0.918\u002F0.884         | 0.919     | 0.840          |\n\n**2022年10月27日：LUKE的日语版本现已发布**\n\nLUKE的日语版本现已在Hugging Face Model Hub上线：\n\n- [luke-japanese-base](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-base)\n- [luke-japanese-base-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-base-lite)\n\n该模型在[JGLUE](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002FJGLUE)中的四个数据集上均优于其他基础尺寸模型。\n\n| 模型                  | MARC-ja   | JSTS                | JNLI      | JCommonsenseQA |\n| ---------------------- | --------- | ------------------- | --------- | -------------- |\n|                        | acc       | Pearson\u002FSpearman    | acc       | acc            |\n| **LUKE Japanese base** | **0.965** | **0.916**\u002F**0.877** | **0.912** | **0.842**      |\n| _基线模型:_           |           |\n| Tohoku BERT base       | 0.958     | 0.909\u002F0.868         | 0.899     | 0.808          |\n| NICT BERT base         | 0.958     | 0.910\u002F0.871         | 0.902     | 0.823          |\n| Waseda RoBERTa base    | 0.962     | 0.913\u002F0.873         | 0.895     | 0.840          |\n| XLM RoBERTa base       | 0.961     | 0.877\u002F0.831         | 0.893     | 0.687          
|\n\n**2022年4月13日：mLUKE微调代码现已发布**\n\n[示例代码](examples)已更新，现基于\n[allennlp](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fallennlp)和\n[transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)。您可使用此实现复现[LUKE](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.01057)和\n[mLUKE](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.08151)论文中的实验。详细信息请参阅各示例目录下的`README.md`文件。此前用于[LUKE论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.01057)的旧版代码已被移至\n[`examples\u002Flegacy`](examples\u002Flegacy)。\n\n**2022年4月13日：LUKE模型预训练的详细说明现已发布**\n\n对于希望预训练LUKE模型的用户，我们已在[`pretraining.md`](pretraining.md)中详细介绍了如何准备数据集以及运行预训练代码。\n\n**2021年11月24日：实体消歧义示例现已提供**\n\n本仓库新增了基于LUKE的实体消歧义示例代码。该模型最初在\n[我们的论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.00426)中提出，并在AIDA-CoNLL、MSNBC、AQUAINT、ACE2004和WNED-WIKI这五个标准实体消歧义数据集中取得了当前最优成绩。\n\n更多详情请参阅\n[`examples\u002Fentity_disambiguation`](examples\u002Fentity_disambiguation)。\n\n**2021年8月3日：基于Hugging Face Transformers和AllenNLP的新示例代码现已发布**\n\nLUKE新增了三个下游任务的微调示例，分别为_命名实体识别_、_关系分类_和_实体类型标注_。这些示例基于Hugging Face Transformers和AllenNLP开发，微调模型通过简单的AllenNLP Jsonnet配置文件定义！\n\n示例代码可在[`examples`](examples)中找到。\n\n**2021年5月5日：LUKE已加入Hugging Face Transformers库**\n\nLUKE已被添加到\n[Hugging Face Transformers库的主分支](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)。现在您可以使用该库轻松解决与实体相关的任务（如命名实体识别、关系分类、实体类型标注等）。\n\n例如，经过TACRED数据集微调的LUKE-large模型可以按如下方式使用：\n\n```python\nfrom transformers import LukeTokenizer, LukeForEntityPairClassification\nmodel = LukeForEntityPairClassification.from_pretrained(\"studio-ousia\u002Fluke-large-finetuned-tacred\")\ntokenizer = LukeTokenizer.from_pretrained(\"studio-ousia\u002Fluke-large-finetuned-tacred\")\ntext = \"Beyoncé lives in Los Angeles.\"\nentity_spans = [(0, 7), (17, 28)]  # 分别对应“Beyoncé”和“Los Angeles”的字符级实体跨度\ninputs = tokenizer(text, entity_spans=entity_spans, return_tensors=\"pt\")\noutputs = model(**inputs)\nlogits = outputs.logits\npredicted_class_idx = int(logits[0].argmax())\nprint(\"预测类别:\", 
model.config.id2label[predicted_class_idx])\n# 预测类别: per:cities_of_residence\n```\n\n我们还提供了以下三个 Colab 笔记本，展示了如何使用该库在 CoNLL-2003、TACRED 和 Open Entity 数据集上复现我们的实验结果：\n\n- [使用 Hugging Face Transformers 复现 LUKE 在 CoNLL-2003 上的实验结果](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fstudio-ousia\u002Fluke\u002Fblob\u002Fmaster\u002Fnotebooks\u002Fhuggingface_conll_2003.ipynb)\n- [使用 Hugging Face Transformers 复现 LUKE 在 TACRED 上的实验结果](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fstudio-ousia\u002Fluke\u002Fblob\u002Fmaster\u002Fnotebooks\u002Fhuggingface_tacred.ipynb)\n- [使用 Hugging Face Transformers 复现 LUKE 在 Open Entity 上的实验结果](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fstudio-ousia\u002Fluke\u002Fblob\u002Fmaster\u002Fnotebooks\u002Fhuggingface_open_entity.ipynb)\n\n更多详细信息请参阅\n[官方文档](https:\u002F\u002Fhuggingface.co\u002Ftransformers\u002Fmaster\u002Fmodel_doc\u002Fluke.html)。\n\n**2020年11月5日：LUKE-500K（base）模型**\n\n我们发布了 LUKE-500K（base），这是一个比现有的 LUKE-500K（large）更小的新预训练 LUKE 模型。LUKE-500K（base）和 LUKE-500K（large）在 SQuAD v1.1 和 CoNLL-2003 上的实验结果如下：\n\n| 任务                          | 数据集                                                      | 指标 | LUKE-500K（base） | LUKE-500K（large） |\n| ----------------------------- | ------------------------------------------------------------ | ------ | ---------------- | ----------------- |\n| 抽取式问答                    | [SQuAD v1.1](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F)    | EM\u002FF1  | 86.1\u002F92.3        | 90.2\u002F95.4         |\n| 命名实体识别                | [CoNLL-2003](https:\u002F\u002Fwww.clips.uantwerpen.be\u002Fconll2003\u002Fner\u002F) | F1     | 93.3             | 94.3              |\n\n我们在基于 LUKE-500K（base）的实验中仅调整了批量大小和学习率。\n\n## 与当前最先进方法的比较\n\nLUKE 在五个重要的自然语言处理任务上均优于之前的最先进方法：\n\n| 任务                           | 数据集                                                                      | 指标 | LUKE-500K（large） | 之前的最先进方法                                                       
      |\n| ------------------------------ | ---------------------------------------------------------------------------- | ------ | ----------------- | ------------------------------------------------------------------------- |\n| 抽取式问答                     | [SQuAD v1.1](https:\u002F\u002Frajpurkar.github.io\u002FSQuAD-explorer\u002F)                    | EM\u002FF1  | **90.2**\u002F**95.4** | 89.9\u002F95.1 ([Yang et al., 2019](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.08237))         |\n| 命名实体识别                   | [CoNLL-2003](https:\u002F\u002Fwww.clips.uantwerpen.be\u002Fconll2003\u002Fner\u002F)                 | F1     | **94.3**          | 93.5 ([Baevski et al., 2019](https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.07785))           |\n| Cloze 式问答                   | [ReCoRD](https:\u002F\u002Fsheng-z.github.io\u002FReCoRD-explorer\u002F)                         | EM\u002FF1  | **90.6**\u002F**91.2** | 83.1\u002F83.7 ([Li et al., 2019](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FD19-6011\u002F)) |\n| 关系分类                       | [TACRED](https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Ftacred\u002F)                          | F1     | **72.7**          | 72.0 ([Wang et al., 2020](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.01808))             |\n| 细粒度实体类型标注             | [Open Entity](https:\u002F\u002Fwww.cs.utexas.edu\u002F~eunsol\u002Fhtml_pages\u002Fopen_entity.html) | F1     | **78.2**          | 77.6 ([Wang et al., 2020](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.01808))             |\n\n这些数据均发表在\n[我们的 EMNLP 2020 论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.01057)中。\n\n## 安装\n\n可以使用 [Poetry](https:\u002F\u002Fpython-poetry.org\u002F) 安装 LUKE：\n\n```bash\npoetry install\n\n# 如果要运行 LUKE 的预训练\npoetry install --extras \"pretraining opennlp\"\n# 如果要运行 mLUKE 的预训练\npoetry install --extras \"pretraining icu\"\n```\n\nPoetry 自动创建的虚拟环境可以通过 `poetry shell` 激活。\n\n**关于安装 `torch` 的说明**\n\n通过 `poetry install` 安装的 PyTorch 
可能并不完全匹配您的硬件配置。在这种情况下，请访问 [官方站点](https:\u002F\u002Fpytorch.org\u002F)，并使用 `pip` 命令重新安装适合您硬件的正确版本。\n\n```bash\npoetry run pip3 uninstall torch torchvision torchaudio\n# 以 Linux 系统且配备 CUDA 11.3 为例\npoetry run pip3 install torch torchvision torchaudio --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu113\n```\n\n## 发布的模型\n\n我们的预训练模型可以与\n[transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) 库一起使用。模型文档可在以下链接中找到：\n[LUKE](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fen\u002Fmodel_doc\u002Fluke) 和\n[mLUKE](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fen\u002Fmodel_doc\u002Fmluke)。\n\n目前，以下模型已在\n[Hugging Face Model Hub](https:\u002F\u002Fhuggingface.co\u002Fmodels) 上发布。\n\n|           名称            |                                         model_name                                          | 实体词汇表大小 | 参数量 |\n| :-----------------------: | :-----------------------------------------------------------------------------------------: | :---------------: | :----: |\n|      **LUKE (base)**      |           [studio-ousia\u002Fluke-base](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-base)           |       500K        | 253 M  |\n|     **LUKE (large)**      |          [studio-ousia\u002Fluke-large](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-large)          |       500K        | 484 M  |\n|     **mLUKE (base)**      |          [studio-ousia\u002Fmluke-base](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fmluke-base)          |       1.2M        | 586 M  |\n|     **mLUKE (large)**     |         [studio-ousia\u002Fmluke-large](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fmluke-large)         |       1.2M        | 868 M  |\n| **LUKE 日语版 (base)**  |  [studio-ousia\u002Fluke-japanese-base](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-base)  |       570K        | 281 M  |\n| **LUKE 日语版 (large)** | 
[studio-ousia\u002Fluke-japanese-large](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-large) |       570K        | 562 M  |\n\n### 精简模型\n\n实体嵌入会占用大量内存，因为它们包含了我们在预训练中使用的所有维基百科实体。然而，在一些下游任务中（例如实体类型标注、命名实体识别和关系分类），我们只需要特殊的实体嵌入，比如 `[MASK]`。此外，你可能只想使用词表示。\n\n考虑到这些应用场景，为了使我们的模型更易于使用，我们上传了仅包含特殊实体嵌入的精简模型。这些模型的使用方式与完整模型完全相同，但参数量要少得多，因此可以在小型 GPU 上进行微调。\n\n|           名称            |                                              model_name                                               | 参数量 |\n| :-----------------------: | :---------------------------------------------------------------------------------------------------: | :----: |\n|      **LUKE (base)**      |           [studio-ousia\u002Fluke-base-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-base-lite)           | 125 M  |\n|     **LUKE (large)**      |          [studio-ousia\u002Fluke-large-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-large-lite)          | 356 M  |\n|     **mLUKE (base)**      |          [studio-ousia\u002Fmluke-base-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fmluke-base-lite)          | 279 M  |\n|     **mLUKE (large)**     |         [studio-ousia\u002Fmluke-large-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fmluke-large-lite)         | 561 M  |\n| **LUKE 日语版 (base)**  |  [studio-ousia\u002Fluke-japanese-base-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-base-lite)  | 134 M  |\n| **LUKE 日语版 (large)** | [studio-ousia\u002Fluke-japanese-large-lite](https:\u002F\u002Fhuggingface.co\u002Fstudio-ousia\u002Fluke-japanese-large-lite) | 415 M  |\n\n## 微调 LUKE 模型\n\n我们基于 [allennlp](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fallennlp) 和 [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) 发布了微调代码，位于 [`examples`](examples) 目录下。你可以使用预定义的配置文件和 `allennlp train` 命令方便地运行微调实验。有关每个任务的详细信息和示例命令，请参阅 [`examples`](examples) 下的任务目录。\n\n## 预训练 LUKE 模型\n\n预训练 LUKE 模型的详细说明可在 
[`pretraining.md`](pretraining.md) 中找到。\n\n## 引用\n\n如果你在工作中使用了 LUKE，请引用[原始论文](https:\u002F\u002Faclanthology.org\u002F2020.emnlp-main.523\u002F)。\n\n```\n@inproceedings{yamada-etal-2020-luke,\n    title = \"{LUKE}: Deep Contextualized Entity Representations with Entity-aware Self-attention\",\n    author = \"Yamada, Ikuya and Asai, Akari and Shindo, Hiroyuki and Takeda, Hideaki and Matsumoto, Yuji\",\n    booktitle = \"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)\",\n    year = \"2020\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https:\u002F\u002Faclanthology.org\u002F2020.emnlp-main.523\",\n    doi = \"10.18653\u002Fv1\u002F2020.emnlp-main.523\",\n}\n```\n\n对于 mLUKE，请引用[这篇论文](https:\u002F\u002Faclanthology.org\u002F2022.acl-long.505\u002F)。\n\n```\n@inproceedings{ri-etal-2022-mluke,\n    title = \"m{LUKE}: {T}he Power of Entity Representations in Multilingual Pretrained Language Models\",\n    author = \"Ri, Ryokan and Yamada, Ikuya and Tsuruoka, Yoshimasa\",\n    booktitle = \"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)\",\n    year = \"2022\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https:\u002F\u002Faclanthology.org\u002F2022.acl-long.505\",\n}\n```","# LUKE 快速上手指南\n\nLUKE (Language Understanding with Knowledge-based Embeddings) 是一种基于 Transformer 的预训练模型，为单词和实体提供上下文表示。它在命名实体识别 (NER)、关系分类、实体类型标注等涉及实体的 NLP 任务中表现出色。\n\n## 环境准备\n\n*   **操作系统**: Linux, macOS, Windows\n*   **Python**: 3.7 或更高版本\n*   **包管理工具**: 推荐安装 [Poetry](https:\u002F\u002Fpython-poetry.org\u002F) 以管理依赖。\n*   **硬件**: 如需进行微调或推理加速，建议配备 NVIDIA GPU 并安装对应的 CUDA 驱动。\n\n## 安装步骤\n\n### 1. 使用 Poetry 安装（推荐）\n\n克隆仓库并安装基础依赖：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fluke.git\ncd luke\npoetry install\n```\n\n激活虚拟环境：\n\n```bash\npoetry shell\n```\n\n### 2. 配置 PyTorch 版本\n\nPoetry 默认安装的 PyTorch 可能不包含 CUDA 支持或不匹配您的硬件。请根据官方指引重新安装适配您环境的版本。\n\n**示例：Linux + CUDA 11.3**\n\n```bash\npoetry run pip3 uninstall torch torchvision torchaudio\npoetry run pip3 install torch torchvision torchaudio --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu113\n```\n\n> **国内加速提示**：如果下载缓慢，可通过 `-i` 参数指定清华或阿里等 PyPI 镜像源。PyPI 镜像只同步 PyPI 上的默认构建，如需特定 CUDA 版本的 PyTorch，仍应保留上面的 `--extra-index-url`。\n> 例如：`poetry run pip3 install torch ... 
-i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n### 3. 直接使用 Hugging Face Transformers（无需克隆仓库）\n\n如果您仅需使用预训练模型进行推理或微调，可直接安装 `transformers` 库；下方示例返回 PyTorch 张量，因此还需安装 `torch`：\n\n```bash\npip install transformers torch\n# 国内加速\npip install transformers torch -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## 基本使用\n\nLUKE 已集成至 Hugging Face `transformers` 库。以下是一个最简单的**实体对分类**示例（基于 TACRED 数据集微调的模型）：\n\n```python\nfrom transformers import LukeTokenizer, LukeForEntityPairClassification\n\n# 加载模型和分词器\nmodel = LukeForEntityPairClassification.from_pretrained(\"studio-ousia\u002Fluke-large-finetuned-tacred\")\ntokenizer = LukeTokenizer.from_pretrained(\"studio-ousia\u002Fluke-large-finetuned-tacred\")\n\n# 准备输入文本和实体跨度（字符索引）\ntext = \"Beyoncé lives in Los Angeles.\"\nentity_spans = [(0, 7), (17, 28)]  # 对应 \"Beyoncé\" 和 \"Los Angeles\"\n\n# 编码并推理\ninputs = tokenizer(text, entity_spans=entity_spans, return_tensors=\"pt\")\noutputs = model(**inputs)\nlogits = outputs.logits\n\n# 获取预测结果\npredicted_class_idx = int(logits[0].argmax())\nprint(\"Predicted class:\", model.config.id2label[predicted_class_idx])\n# 输出: Predicted class: per:cities_of_residence\n```\n\n### 可用模型\n您可以在 [Hugging Face Model Hub](https:\u002F\u002Fhuggingface.co\u002Fmodels?search=luke) 找到更多预训练模型，包括：\n*   `studio-ousia\u002Fluke-base` (基础版)\n*   `studio-ousia\u002Fluke-large` (大型版)\n*   `studio-ousia\u002Fluke-japanese-base` (日语版)","某金融科技公司正在构建智能客服系统，需要从海量客户投诉工单中自动提取关键实体（如人名、公司名）并精准识别它们之间的复杂关系（如“持股”、“任职”），以辅助风控决策。\n\n### 没有 luke 时\n- 传统模型将文本中的实体仅视为普通词汇序列，无法区分“苹果”是指水果还是科技公司，导致实体识别准确率低下。\n- 在处理长距离依赖的关系分类任务时，模型难以捕捉相隔较远的两个实体间的语义关联，经常误判或漏判关键风险关系。\n- 面对包含大量专业术语和嵌套实体的金融文档，通用预训练模型表现乏力，需要耗费大量人力进行规则后处理和数据标注修正。\n- 模型缺乏显式的知识嵌入能力，对于未出现在训练集中的罕见实体组合，泛化能力极差，频繁出现“幻觉”式错误。\n\n### 使用 luke 后\n- luke 基于知识的实体嵌入机制让模型把实体作为独立的输入单元来表示，提升了在 CoNLL-2003 等基准上的命名实体识别精度，有助于区分多义词。\n- 借助实体感知自注意力机制，luke 能对实体间的交互建模，即使在复杂的长句中，也能在 TACRED 这类关系分类任务上取得论文发表时最先进的结果。\n- 针对金融领域的特定下游任务微调后，luke 可以减少对人工规则的依赖，直接从非结构化文本中提取结构化的实体与关系数据。\n- 凭借强大的上下文表征能力，luke 
在面对未见过的实体组合时具有更好的泛化性，有助于降低冷启动场景下的模型失效风险。\n\nluke 将词的上下文表示与实体嵌入结合在同一模型中，缓解了传统模型在复杂实体关系理解上的瓶颈。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstudio-ousia_luke_8b30549b.png","studio-ousia","Studio Ousia","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fstudio-ousia_b40105f6.png","",null,"info@ousia.jp","http:\u002F\u002Fwww.ousia.jp","https:\u002F\u002Fgithub.com\u002Fstudio-ousia",[84,88],{"name":85,"color":86,"percentage":87},"Jupyter Notebook","#DA5B0B",71.4,{"name":89,"color":90,"percentage":91},"Python","#3572A5",28.6,727,99,"2026-02-18T15:04:28","Apache-2.0","Linux, macOS","非绝对必需（取决于任务），预训练或运行大型模型建议 NVIDIA GPU。需根据硬件手动安装对应 CUDA 版本的 PyTorch（文中示例为 CUDA 11.3）。显存大小未明确说明，但大型模型通常需要较高显存。","未说明",{"notes":100,"python":101,"dependencies":102},"1. 项目使用 Poetry 进行依赖管理和环境构建。2. 通过 'poetry install' 安装的 PyTorch 版本可能不匹配当前硬件，官方建议根据硬件环境（如 CUDA 版本）使用 pip 手动重新安装正确的 PyTorch 版本。3. 提供预训练和微调代码，微调示例基于 AllenNLP 和 Hugging Face Transformers。4. 支持日语模型（LUKE-Japanese）及多语言模型（mLUKE）。","未说明 (通过 Poetry 管理)",[103,104,105,106],"torch","transformers","allennlp","poetry",[15],"2026-03-27T02:49:30.150509","2026-04-07T07:13:00.396981",[111,116,121,126,131,136,141],{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},21209,"如何获取 LUKE 模型的预训练（Pretraining）指导说明？","官方已提供详细的预训练指导文档，您可以直接访问以下链接查看：https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fluke\u002Fblob\u002Fmaster\u002Fpretraining.md。该文档包含了环境配置、数据集准备及训练命令等详细信息。","https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fluke\u002Fissues\u002F41",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},21210,"为什么我复现的 SQuAD 任务结果低于论文报告的分数？","如果代码在维护者环境中能复现报告的性能，通常是因为您的运行环境存在技术问题。常见原因包括批次大小（batch size）设置不当。例如，有用户将 batch size 调整为 48 后性能仍低，后来发现应调整为 18。请仔细检查您的超参数配置（特别是 batch size）以及依赖环境是否与官方一致。","https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fluke\u002Fissues\u002F56",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},21211,"LUKE 是否支持非英语语言（如土耳其语）的预训练和实体消歧？","目前的实体消歧（Entity 
Disambiguation）评估脚本是专门为论文中使用的英文数据集设计的。虽然核心的预训练代码（不包括特定数据集类，如 `EntityDisambiguationDataset`）理论上可以用于其他语言的数据集，但官方尚未提供直接支持其他语言 Wikipedia dump 进行端到端训练和测试的选项。如需支持其他语言，可能需要自行修改数据集加载部分。","https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fluke\u002Fissues\u002F126",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},21212,"在使用预训练模型进行关系分类（Relation Classification）时，为什么微调后的结果与使用生成的检查点结果差异巨大？","这通常是由环境不一致导致的。维护者指出，他们可以使用提供的检查点文件复现报告的结果。如果您得到的 F1 分数显著低于预期（例如预期 72 却得到 64），请确保使用 `poetry` 创建完全一致的实验环境，并检查数据加载工具（data loading utils）和评估指标的实现是否正确。","https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fluke\u002Fissues\u002F57",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},21213,"LUKE 预训练过程中如何处理实体不平衡问题？","类似于 BERT 的掩码语言模型（Masked Language Model），LUKE 在预训练期间简单地从词汇表中的所有实体中预测每一个实体，官方并未针对实体不平衡问题进行特殊的额外处理。","https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fluke\u002Fissues\u002F67",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},21214,"是否有用于 AllenNLP 或 Transformers 的 LUKE NER（命名实体识别）示例代码？","有的，社区贡献了一个使用 LUKE 结合 AllenNLP\u002FTransformers 解决 NER 任务的新示例。您可以访问以下地址获取代码和说明：https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fluke\u002Ftree\u002Fdownstream_allennlp\u002Fexamples_allennlp\u002Fner。","https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fluke\u002Fissues\u002F38",{"id":142,"question_zh":143,"answer_zh":144,"source_url":135},21215,"实体词汇表中的标题能否映射到唯一的 Wikipedia 页面 ID 或 Wikidata 实体 ID？","目前存在一些标题无法对齐到唯一 ID 的情况（部分缺失，部分对应同一 ID）。关于发布实体词汇表标题与 Wikipedia\u002FWikidata ID 之间映射关系的需求，官方尚未直接提供该映射文件，且相关讨论因缺乏后续活动而关闭。用户可能需要自行构建或使用外部工具进行对齐。",[]]