[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-taishi-i--awesome-japanese-nlp-resources":3,"tool-taishi-i--awesome-japanese-nlp-resources":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",151314,2,"2026-04-11T23:32:58",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":74,"owner_company":76,"owner_location":74,"owner_email":74,"owner_twitter":74,"owner_website":74,"owner_url":77,"languages":74,"stars":78,"forks":79,"last_commit_at":80,"license":81,"difficulty_score":82,"env_os":83,"env_gpu":84,"env_ram":84,"env_deps":85,"category_tags":88,"github_topics":89,"view_count":32,"oss_zip_url":74,"oss_zip_packed_at":74,"status":17,"created_at":99,"updated_at":100,"faqs":101,"releases":138},6732,"taishi-i\u002Fawesome-japanese-nlp-resources","awesome-japanese-nlp-resources","A curated list of resources dedicated to Python libraries, LLMs, dictionaries, and corpora of NLP for Japanese","awesome-japanese-nlp-resources 是一个专为日语自然语言处理（NLP）打造的精选资源库。它系统地整理了涵盖 Python、C++、Rust、Java 等多种编程语言的库，以及大语言模型（LLMs）、专业词典和各类语料库。\n\n针对日语独特的语言结构（如复杂的形态变化和缺乏空格分词），开发者在进行文本分析时往往面临工具分散、标准不一的难题。这份资源库通过人工筛选与分类，一站式解决了寻找高质量日语 NLP 工具的痛点。它不仅收录了 850 多个 GitHub 项目和 278 个 Hugging Face 
模型与数据集，还细致地按功能划分为形态素分析、句法解析、情感分析、机器翻译及命名实体识别等类别，甚至包含了最新的预训练模型和教程。\n\n无论是正在构建日语聊天机器人的 AI 工程师、需要处理大规模日文数据的研究人员，还是希望快速上手日语 NLP 的学生，都能从中高效找到所需资源。其独特的亮点在于跨语言支持的广度以及对前沿大模型资源的及时更新，同时提供多语言文档支持，极大地降低了全球开发者进入日语 NLP 领域的门槛。作为一个开放协作的项目，它持续社区贡献，确保所列资","awesome-japanese-nlp-resources 是一个专为日语自然语言处理（NLP）打造的精选资源库。它系统地整理了涵盖 Python、C++、Rust、Java 等多种编程语言的库，以及大语言模型（LLMs）、专业词典和各类语料库。\n\n针对日语独特的语言结构（如复杂的形态变化和缺乏空格分词），开发者在进行文本分析时往往面临工具分散、标准不一的难题。这份资源库通过人工筛选与分类，一站式解决了寻找高质量日语 NLP 工具的痛点。它不仅收录了 850 多个 GitHub 项目和 278 个 Hugging Face 模型与数据集，还细致地按功能划分为形态素分析、句法解析、情感分析、机器翻译及命名实体识别等类别，甚至包含了最新的预训练模型和教程。\n\n无论是正在构建日语聊天机器人的 AI 工程师、需要处理大规模日文数据的研究人员，还是希望快速上手日语 NLP 的学生，都能从中高效找到所需资源。其独特的亮点在于跨语言支持的广度以及对前沿大模型资源的及时更新，同时提供多语言文档支持，极大地降低了全球开发者进入日语 NLP 领域的门槛。作为一个开放协作的项目，它持续社区贡献，确保所列资源始终保持前沿与实用。","# awesome-japanese-nlp-resources\n\n[![Awesome](https:\u002F\u002Fcdn.rawgit.com\u002Fsindresorhus\u002Fawesome\u002Fd7305f38d29fed78fa85652e3a63e154dd8e8829\u002Fmedia\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources)\n[![PRs](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRs-welcome-brightgreen)](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fpulls)\n[![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources-search)\n[![License: CC0-1.0](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-CC0_1.0-lightgrey.svg)](http:\u002F\u002Fcreativecommons.org\u002Fpublicdomain\u002Fzero\u002F1.0\u002F)\n[![CC0](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaishi-i_awesome-japanese-nlp-resources_readme_b7657951a0bb.png)](http:\u002F\u002Fcreativecommons.org\u002Fpublicdomain\u002Fzero\u002F1.0\u002F)\n\nA curated list of resources dedicated to Python libraries, LLMs, dictionaries, and corpora of NLP for Japanese\n\n- Listed information on [850 GitHub 
repositories](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002FREADME.full.md)\n- Listed information on [278 Hugging Face repositories](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002Fhuggingface.md) (models and datasets)\n- 🎉 We are excited to announce the release of [awesome-japanese-nlp-survey](https:\u002F\u002Fawesome-japanese-nlp-survey.vercel.app\u002F) on March 1, 2026!\n\n\n[English](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002FREADME.en.md) | [日本語 (Japanese) ](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002FREADME.ja.md) | [繁體中文 (Chinese) ](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002FREADME.zh-hant.md) | [简体中文 (Chinese) ](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002FREADME.zh-hans.md)\n\n\n## 🎉 The latest additions\n\n**Corpus**\n * [kamuskita](https:\u002F\u002Fgithub.com\u002Fmatbahasa\u002Fkamuskita) - マレー語勉強会で作っているオープンなマレー語・日本語辞典『みんなのマレー語辞典』\n\n_Updated on Apr 06, 2026_\n\n## Contents\n * [Hugging Face](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002Fhuggingface.md)\n   * [Models](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002Fhuggingface.md#models)\n   * [Datasets](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002Fhuggingface.md#datasets)\n * [Python library](#python-library)\n   * [Morphology analysis](#morphology-analysis)\n   * [Parsing](#parsing)\n   * [Converter](#converter)\n   * [Preprocessor](#preprocessor)\n   * [Sentence spliter](#sentence-spliter)\n 
  * [Sentiment analysis](#sentiment-analysis)\n   * [Machine translation](#machine-translation)\n   * [Named entity recognition](#named-entity-recognition)\n   * [OCR](#ocr)\n   * [Tool for pretrained models](#tool-for-pretrained-models)\n   * [Others](#others)\n * [C++](#c)\n   * [Morphology analysis](#morphology-analysis-1)\n   * [Parsing](#parsing-1)\n   * [Others](#others-1)\n * [Rust crate](#rust-crate)\n   * [Morphology analysis](#morphology-analysis-2)\n   * [Converter](#converter-1)\n   * [Search engine library](#search-engine-library)\n   * [Others](#others-2)\n * [JavaScript](#javaScript)\n   * [Morphology analysis](#morphology-analysis-3)\n   * [Converter](#converter-2)\n   * [Others](#others-3)\n * [Go](#go)\n   * [Morphology analysis](#morphology-analysis-4)\n   * [Others](#others-4)\n * [Java](#java)\n   * [Morphology analysis](#morphology-analysis-5)\n   * [Others](#others-5)\n * [Pretrained model](#pretrained-model)\n   * [Word2Vec](#word2Vec)\n   * [Transformer based models](#transformer-based-models)\n * [ChatGPT](#chatgpt)\n * [Dictionary and IME](#dictionary-and-ime)\n * [Corpus](#corpus)\n   * [Part-of-speech tagging \u002F Named entity recognition](#part-of-speech-tagging--named-entity-recognition)\n   * [Text classification](#text-classification)\n   * [Parallel corpus](#parallel-corpus)\n   * [Dialog corpus](#dialog-corpus)\n   * [Others](#others-3)\n * [Tutorial](#tutorial)\n * [Research summary](#research-summary)\n * [Reference](#reference)\n * [Contributors](#contributors)\n\n\n## Python library\n\n### Morphology analysis\nLibraries that split Japanese text into words or morphemes and assign part-of-speech and base forms\n\n * [sudachi.rs](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002Fsudachi.rs) - SudachiPy 0.6* and above are developed as Sudachi.rs.\n * [Janome](https:\u002F\u002Fgithub.com\u002Fmocobeta\u002Fjanome) - Japanese morphological analysis engine written in pure Python\n * 
[mecab-python3](https:\u002F\u002Fgithub.com\u002FSamuraiT\u002Fmecab-python3) - mecab-python. You can find the original version here: http:\u002F\u002Ftaku910.github.io\u002Fmecab\u002F\n * [mecab](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fmecab) - This repository is for building Windows 64-bit MeCab binary and improving MeCab Python binding.\n * [fugashi](https:\u002F\u002Fgithub.com\u002Fpolm\u002Ffugashi) - A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.\n * [nagisa](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fnagisa) - A Japanese tokenizer based on recurrent neural networks\n * [pyknp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fpyknp) - A Python Module for JUMAN++\u002FKNP\n * [Mykytea-python](https:\u002F\u002Fgithub.com\u002Fchezou\u002FMykytea-python) - Python wrapper for KyTea\n * [konoha](https:\u002F\u002Fgithub.com\u002Fhimkt\u002Fkonoha) - Konoha: Simple wrapper of Japanese Tokenizers\n * [natto-py](https:\u002F\u002Fgithub.com\u002Fburuzaemon\u002Fnatto-py) - natto-py combines the Python programming language with MeCab, the part-of-speech and morphological analyzer for the Japanese language.\n * [rakutenma-python](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Frakutenma-python) - Rakuten MA (Python version)\n * [python-vaporetto](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fpython-vaporetto) - Vaporetto is a fast and lightweight pointwise prediction based tokenizer. 
This is a Python wrapper for Vaporetto.\n * [dango](https:\u002F\u002Fgithub.com\u002Fmkartawijaya\u002Fdango) - An easy to use tokenizer for Japanese text, aimed at language learners and non-linguists\n * [rhoknp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Frhoknp) - Yet another Python binding for Juman++\u002FKNP\n * [python-vibrato](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fpython-vibrato) -  Viterbi-based accelerated tokenizer (Python wrapper)\n * [jagger-python](https:\u002F\u002Fgithub.com\u002Flighttransport\u002Fjagger-python) - Python binding for Jagger(C++ implementation of Pattern-based Japanese Morphological Analyzer)\n * [Mecari](https:\u002F\u002Fgithub.com\u002Fzbller\u002FMecari) - Mecari (Japanese Morphological Analysis with Graph Neural Networks)\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [SudachiPy](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachiPy) | 📥 375k | 📦 63M | ⭐ 429 | 🔴 october 2022|\n| 🔗 [Janome](https:\u002F\u002Fgithub.com\u002Fmocobeta\u002Fjanome) | 📥 50k | 📦 12M | ⭐ 909 | 🟡 october 2025|\n| 🔗 [mecab-python3](https:\u002F\u002Fgithub.com\u002FSamuraiT\u002Fmecab-python3) | 📥 206k | 📦 36M | ⭐ 581 | 🟡 november 2025|\n| 🔗 [mecab](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fmecab\u002Ftree\u002Fmaster\u002Fmecab\u002Fpython) | 📥 24k | 📦 724k | ⭐ 271 | 🔴 october 2024|\n| 🔗 [fugashi](https:\u002F\u002Fgithub.com\u002Fpolm\u002Ffugashi) | 📥 120k | 📦 14M | ⭐ 518 | 🟡 october 2025|\n| 🔗 [nagisa](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fnagisa) | 📥 49k | 📦 8M | ⭐ 416 | 🟢 february|\n| 🔗 [pyknp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fpyknp) | 📥 1k | 📦 3M | ⭐ 93 | 🟢 january|\n| 🔗 [Mykytea-python](https:\u002F\u002Fgithub.com\u002Fchezou\u002FMykytea-python) | 📥 2k | 📦 562k | ⭐ 36 | 🟢 last monday|\n| 🔗 [konoha](https:\u002F\u002Fgithub.com\u002Fhimkt\u002Fkonoha) | 📥 50k | 📦 6M | ⭐ 261 | 🟢 march|\n| 🔗 
[natto-py](https:\u002F\u002Fgithub.com\u002Fburuzaemon\u002Fnatto-py) | 📥 38k | 📦 34M | ⭐ 95 | 🔴 november 2023|\n| 🔗 [rakutenma-python](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Frakutenma-python) | 📥 14 | 📦 27k | ⭐ 23 | 🔴 may 2017|\n| 🔗 [python-vaporetto](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fpython-vaporetto) | 📥 229 | 📦 175k | ⭐ 21 | 🟡 june 2025|\n| 🔗 [dango](https:\u002F\u002Fgithub.com\u002Fmkartawijaya\u002Fdango) | 📥 42 | 📦 26k | ⭐ 25 | 🔴 november 2021|\n| 🔗 [rhoknp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Frhoknp) | 📥 20k | 📦 1M | ⭐ 38 | 🟢 march|\n| 🔗 [python-vibrato](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fpython-vibrato) | 📥 138 | 📦 117k | ⭐ 43 | 🔴 september 2024|\n| 🔗 [jagger-python](https:\u002F\u002Fgithub.com\u002Flighttransport\u002Fjagger-python) | 📥 631 | 📦 300k | ⭐ 13 | 🔴 march 2024|\n| 🔗 [Mecari](https:\u002F\u002Fgithub.com\u002Fzbller\u002FMecari) | - | - | ⭐ 39 | 🟡 september 2025|\n\n\n### Parsing\nLibraries that analyze syntactic and dependency structures of Japanese sentences\n\n * [ginza](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fginza) - A Japanese NLP Library using spaCy as framework based on Universal Dependencies\n * [cabocha](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fcabocha) - Yet Another Japanese Dependency Structure Analyzer\n * [UniDic2UD](https:\u002F\u002Fgithub.com\u002FKoichiYasuoka\u002FUniDic2UD) - Tokenizer POS-tagger Lemmatizer and Dependency-parser for modern and contemporary Japanese\n * [camphr](https:\u002F\u002Fgithub.com\u002FPKSHATechnology-Research\u002Fcamphr) - Camphr - NLP library for creating pipeline components\n * [SuPar-UniDic](https:\u002F\u002Fgithub.com\u002FKoichiYasuoka\u002FSuPar-UniDic) - Tokenizer POS-tagger Lemmatizer and Dependency-parser for modern and contemporary Japanese with BERT models\n * [depccg](https:\u002F\u002Fgithub.com\u002Fmasashi-y\u002Fdepccg) - A* CCG Parser with a Supertag and Dependency Factored Model\n * 
[bertknp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fbertknp) - A Japanese dependency parser based on BERT\n * [esupar](https:\u002F\u002Fgithub.com\u002FKoichiYasuoka\u002Fesupar) - Tokenizer POS-Tagger and Dependency-parser with BERT\u002FRoBERTa\u002FDeBERTa models for Japanese and other languages\n * [yomikata](https:\u002F\u002Fgithub.com\u002Fpassaglia\u002Fyomikata) - Heteronym disambiguation library using a fine-tuned BERT model.\n * [jdepp-python](https:\u002F\u002Fgithub.com\u002Flighttransport\u002Fjdepp-python) - Python binding for J.DepP(C++ implementation of Japanese Dependency Parsers)\n * [lightblue](https:\u002F\u002Fgithub.com\u002Fdaisukebekki\u002Flightblue) - A CCG parser for Japanese with DTS-representations\n * [natsume-simple](https:\u002F\u002Fgithub.com\u002Fborh-lab\u002Fnatsume-simple) - natsume-simpleは日本語の係り受け関係検索システム\n * [jdeppy](https:\u002F\u002Fgithub.com\u002Fmatsurih\u002Fjdeppy) - Python wrapper for J.DepP, fast Japanese Dependency Parser\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [ginza](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fginza) | 📥 12k | 📦 2M | ⭐ 841 | 🔴 march 2024|\n| 🔗 [cabocha](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fcabocha\u002Ftree\u002Fmaster\u002Fpython) | 📥 98 | 📦 54k | ⭐ 7 | 🔴 august 2022|\n| 🔗 [UniDic2UD](https:\u002F\u002Fgithub.com\u002FKoichiYasuoka\u002FUniDic2UD) | 📥 256 | 📦 330k | ⭐ 38 | 🟡 december 2025|\n| 🔗 [camphr](https:\u002F\u002Fgithub.com\u002FPKSHATechnology-Research\u002Fcamphr) | 📥 580 | 📦 271k | ⭐ 337 | 🔴 august 2021|\n| 🔗 [SuPar-UniDic](https:\u002F\u002Fgithub.com\u002FKoichiYasuoka\u002FSuParUniDic) | 📥 32 | 📦 119k | ⭐ 21 | 🔴 repo not found|\n| 🔗 [depccg](https:\u002F\u002Fgithub.com\u002Fmasashi-y\u002Fdepccg) | 📥 60 | 📦 46k | ⭐ 98 | 🔴 august 2023|\n| 🔗 [bertknp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fbertknp) | - | - | ⭐ 23 | 🔴 october 2021|\n| 🔗 [esupar](https:\u002F\u002Fgithub.com\u002FKoichiYasuoka\u002Fesupar) | 
📥 516 | 📦 171k | ⭐ 55 | 🟢 february|\n| 🔗 [yomikata](https:\u002F\u002Fgithub.com\u002Fpassaglia\u002Fyomikata) | 📥 33 | 📦 50k | ⭐ 32 | 🔴 october 2023|\n| 🔗 [jdepp-python](https:\u002F\u002Fgithub.com\u002Flighttransport\u002Fjdepp-python) | 📥 647 | 📦 285k | ⭐ 4 | 🔴 february 2024|\n| 🔗 [lightblue](https:\u002F\u002Fgithub.com\u002Fdaisukebekki\u002Flightblue) | - | - | ⭐ 28 | 🟢 march|\n| 🔗 [natsume-simple](https:\u002F\u002Fgithub.com\u002Fborh-lab\u002Fnatsume-simple) | - | - | ⭐ 5 | 🔴 february 2025|\n| 🔗 [jdeppy](https:\u002F\u002Fgithub.com\u002Fmatsurih\u002Fjdeppy) | 📥 10 | 📦 11k | ⭐ 3 | 🔴 february 2022|\n\n\n### Converter\nLibraries that convert between character types such as kana, romaji, and full-width\u002Fhalf-width forms\n\n * [pykakasi](https:\u002F\u002Fgithub.com\u002Fmiurahr\u002Fpykakasi) - Lightweight converter from Japanese Kana-kanji sentences into Kana-Roman.\n * [cutlet](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fcutlet) - Japanese to romaji converter in Python\n * [alphabet2kana](https:\u002F\u002Fgithub.com\u002Fshihono\u002Falphabet2kana) - Convert English alphabet to Katakana\n * [Convert-Numbers-to-Japanese](https:\u002F\u002Fgithub.com\u002FGreatdane\u002FConvert-Numbers-to-Japanese) - Converts Arabic numerals, or 'western' style numbers, to a Japanese context.\n * [mozcpy](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fmozcpy) - Mozc for Python: Kana-Kanji converter\n * [jamorasep](https:\u002F\u002Fgithub.com\u002Ftachi-hi\u002Fjamorasep) - Japanese text parser to separate Hiragana\u002FKatakana string into morae (syllables).\n * [text2phoneme](https:\u002F\u002Fgithub.com\u002Fkorguchi\u002Ftext2phoneme) - 日本語文を音素列へ変換するスクリプト\n * [jntajis-python](https:\u002F\u002Fgithub.com\u002Fopencollector\u002Fjntajis-python) - A fast character conversion and transliteration library based on the scheme defined for Japan National Tax Agency (国税庁) 's\n * [wiredify](https:\u002F\u002Fgithub.com\u002Feggplants\u002Fwiredify) - Convert japanese 
kana from ba-bi-bu-be-bo into va-vi-vu-ve-vo\n * [mecab-text-cleaner](https:\u002F\u002Fgithub.com\u002F34j\u002Fmecab-text-cleaner) - Simple Python package (CLI\u002FPython API) for getting japanese readings (yomigana) and accents using MeCab.\n * [pynormalizenumexp](https:\u002F\u002Fgithub.com\u002Ftkscode\u002Fpynormalizenumexp) - 数量表現や時間表現の抽出・正規化を行うNormalizeNumexpのPython実装\n * [Jusho](https:\u002F\u002Fgithub.com\u002Fnagataaaas\u002FJusho) - Easy wrapper for the postal code data of Japan\n * [yurenizer](https:\u002F\u002Fgithub.com\u002Fsea-turt1e\u002Fyurenizer) - Japanese text normalizer that resolves spelling inconsistencies. （日本語表記揺れ解消ツール）\n * [e2k](https:\u002F\u002Fgithub.com\u002FPatchethium\u002Fe2k) - A tool for automatic English to Katakana conversion\n * [alkana.py](https:\u002F\u002Fgithub.com\u002Fzomysan\u002Falkana.py) - A tool to get the katakana reading of an alphabetical string.\n * [englishtokanaconverter](https:\u002F\u002Fgithub.com\u002Factlaboratory\u002Fenglishtokanaconverter) - 英語文字列をカタカナに変換するプログラム\n * [kanjiconv](https:\u002F\u002Fgithub.com\u002Fsea-turt1e\u002Fkanjiconv) - Kanji Converter to Hiragana, Katakana, Roman alphabet.\n * [kanjize](https:\u002F\u002Fgithub.com\u002Fnagataaaas\u002Fkanjize) - Kanjize(カンジャイズ): Easy converter between Kanji-Number and Integer\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [pykakasi](https:\u002F\u002Fgithub.com\u002Fmiurahr\u002Fpykakasi) | 📥 298k | 📦 30M | ⭐ 445 | 🔴 july 2022|\n| 🔗 [cutlet](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fcutlet) | 📥 18k | 📦 2M | ⭐ 374 | 🟡 june 2025|\n| 🔗 [alphabet2kana](https:\u002F\u002Fgithub.com\u002Fshihono\u002Falphabet2kana) | 📥 215 | 📦 58k | ⭐ 14 | 🟢 february|\n| 🔗 [Convert-Numbers-to-Japanese](https:\u002F\u002Fgithub.com\u002FGreatdane\u002FConvert-Numbers-to-Japanese) | - | - | ⭐ 50 | 🔴 november 2020|\n| 🔗 [mozcpy](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fmozcpy) | 📥 114 | 📦 13k | ⭐ 47 | 🔴 february 
2025|\n| 🔗 [jamorasep](https:\u002F\u002Fgithub.com\u002Ftachi-hi\u002Fjamorasep) | 📥 89 | 📦 9k | ⭐ 11 | 🟢 february|\n| 🔗 [text2phoneme](https:\u002F\u002Fgithub.com\u002Fkorguchi\u002Ftext2phoneme) | - | - | ⭐ 13 | 🔴 may 2023|\n| 🔗 [jntajis-python](https:\u002F\u002Fgithub.com\u002Fopencollector\u002Fjntajis-python) | 📥 1k | 📦 117k | ⭐ 21 | 🟢 march|\n| 🔗 [wiredify](https:\u002F\u002Fgithub.com\u002Feggplants\u002Fwiredify) | 📥 27 | 📦 6k | ⭐ 3 | 🟡 december 2025|\n| 🔗 [mecab-text-cleaner](https:\u002F\u002Fgithub.com\u002F34j\u002Fmecab-text-cleaner) | 📥 10 | 📦 4k | ⭐ 7 | 🟢 february|\n| 🔗 [pynormalizenumexp](https:\u002F\u002Fgithub.com\u002Ftkscode\u002Fpynormalizenumexp) | 📥 30 | 📦 14k | ⭐ 8 | 🔴 april 2024|\n| 🔗 [Jusho](https:\u002F\u002Fgithub.com\u002Fnagataaaas\u002FJusho) | 📥 217 | 📦 55k | ⭐ 11 | 🔴 june 2024|\n| 🔗 [yurenizer](https:\u002F\u002Fgithub.com\u002Fsea-turt1e\u002Fyurenizer) | 📥 51 | 📦 18k | ⭐ 5 | 🔴 march 2025|\n| 🔗 [e2k](https:\u002F\u002Fgithub.com\u002FPatchethium\u002Fe2k) | 📥 368 | 📦 26k | ⭐ 16 | 🟢 march|\n| 🔗 [alkana.py](https:\u002F\u002Fgithub.com\u002Fzomysan\u002Falkana.py) | - | - | ⭐ 34 | 🔴 october 2021|\n| 🔗 [englishtokanaconverter](https:\u002F\u002Fgithub.com\u002Factlaboratory\u002Fenglishtokanaconverter) | - | - | ⭐ 4 | 🟢 yesterday|\n| 🔗 [kanjiconv](https:\u002F\u002Fgithub.com\u002Fsea-turt1e\u002Fkanjiconv) | 📥 133 | 📦 12k | ⭐ 17 | 🟡 october 2025|\n| 🔗 [kanjize](https:\u002F\u002Fgithub.com\u002Fnagataaaas\u002Fkanjize) | 📥 12k | 📦 1M | ⭐ 68 | 🟡 june 2025|\n\n\n### Preprocessor\nLibraries that normalize and clean text before analysis\n\n * [neologdn](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fneologdn) - Japanese text normalizer for mecab-neologd\n * [jaconv](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fjaconv) - Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku\n * [mojimoji](https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fmojimoji) - A fast converter between 
Japanese hankaku and zenkaku characters\n * [text-cleaning](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Ftext-cleaning) - A powerful text cleaner for Japanese web texts\n * [HojiChar](https:\u002F\u002Fgithub.com\u002FHojiChar\u002FHojiChar) - 複数の前処理を構成して管理するテキスト前処理ツール\n * [utsuho](https:\u002F\u002Fgithub.com\u002Fjuno-rmks\u002Futsuho) - Utsuho is a Python module that facilitates bidirectional conversion between half-width katakana and full-width katakana in Japanese.\n * [python-habachen](https:\u002F\u002Fgithub.com\u002FHizuru3\u002Fpython-habachen) - Yet Another Fast Japanese String Converter\n * [kairyou](https:\u002F\u002Fgithub.com\u002Fbikatr7\u002Fkairyou) - Quickly preprocesses Japanese text using NLP\u002FNER from SpaCy for Japanese translation or other NLP tasks.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [neologdn](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fneologdn) | 📥 8k | 📦 1M | ⭐ 287 | 🟡 december 2025|\n| 🔗 [jaconv](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fjaconv) | 📥 567k | 📦 64M | ⭐ 344 | 🟢 february|\n| 🔗 [mojimoji](https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fmojimoji) | 📥 70k | 📦 11M | ⭐ 152 | 🔴 january 2024|\n| 🔗 [text-cleaning](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Ftext-cleaning) | - | - | ⭐ 12 | 🔴 november 2022|\n| 🔗 [HojiChar](https:\u002F\u002Fgithub.com\u002FHojiChar\u002FHojiChar) | 📥 19k | 📦 919k | ⭐ 125 | 🟡 november 2025|\n| 🔗 [utsuho](https:\u002F\u002Fgithub.com\u002Fjuno-rmks\u002Futsuho) | 📥 291 | 📦 21k | ⭐ 4 | 🟢 march|\n| 🔗 [python-habachen](https:\u002F\u002Fgithub.com\u002FHizuru3\u002Fpython-habachen) | 📥 26k | 📦 2M | ⭐ 6 | 🟡 october 2025|\n| 🔗 [kairyou](https:\u002F\u002Fgithub.com\u002Fbikatr7\u002Fkairyou) | 📥 58 | 📦 31k | ⭐ 6 | 🟡 june 2025|\n\n\n### Sentence spliter\nLibraries that automatically detect sentence boundaries and split text\n\n * [Bunkai](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fbunkai) - Sentence boundary disambiguation 
tool for Japanese texts (日本語文境界判定器)\n * [japanese-sentence-breaker](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fjapanese-sentence-breaker) - Japanese Sentence Breaker\n * [sengiri](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fsengiri) - Yet another sentence-level tokenizer for the Japanese text\n * [budoux](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fbudoux) - Standalone. Small. Language-neutral. BudouX is the successor to Budou, the machine learning powered line break organizer tool.\n * [ja_sentence_segmenter](https:\u002F\u002Fgithub.com\u002Fwwwcojp\u002Fja_sentence_segmenter) - japanese sentence segmentation library for python\n * [hasami](https:\u002F\u002Fgithub.com\u002Fmkartawijaya\u002Fhasami) - A tool to perform sentence segmentation on Japanese text\n * [kuzukiri](https:\u002F\u002Fgithub.com\u002Falinear-corp\u002Fkuzukiri) - Japanese Text Segmenter for Python written in Rust\n * [ja-senter-benchmark](https:\u002F\u002Fgithub.com\u002Fhkiyomaru\u002Fja-senter-benchmark) - Comparison of Japanese Sentence Segmentation Tools\n * [fast-bunkai](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Ffast-bunkai) - Japanese sentence splitting(日本語文境界判定器), 40–250× faster via a Rust-accelerated Python library with near-perfect API compatibility with megagonlabs\u002Fbunkai.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [bunkai](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fbunkai) | 📥 571 | 📦 109k | ⭐ 199 | 🔴 august 2023|\n| 🔗 [japanese-sentence-breaker](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fjapanese-sentence-breaker) | 📥 4 | 📦 5k | ⭐ 14 | 🔴 february 2021|\n| 🔗 [sengiri](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fsengiri) | 📥 100 | 📦 136k | ⭐ 24 | 🟡 november 2025|\n| 🔗 [budoux](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fbudoux) | 📥 9k | 📦 451k | ⭐ 1.6k | 🟢 last thursday|\n| 🔗 [ja_sentence_segmenter](https:\u002F\u002Fgithub.com\u002Fwwwcojp\u002Fja_sentence_segmenter) | 📥 2k | 📦 193k | ⭐ 74 | 🔴 
april 2023|\n| 🔗 [hasami](https:\u002F\u002Fgithub.com\u002Fmkartawijaya\u002Fhasami) | 📥 158 | 📦 39k | ⭐ 6 | 🔴 february 2021|\n| 🔗 [kuzukiri](https:\u002F\u002Fgithub.com\u002Falinear-corp\u002Fkuzukiri) | 📥 183 | 📦 27k | ⭐ 6 | 🟡 june 2025|\n| 🔗 [ja-senter-benchmark](https:\u002F\u002Fgithub.com\u002Fhkiyomaru\u002Fja-senter-benchmark) | - | - | ⭐ 9 | 🔴 february 2023|\n| 🔗 [fast-bunkai](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Ffast-bunkai) | 📥 71 | 📦 4k | ⭐ 71 | 🟡 october 2025|\n\n\n### Sentiment analysis\nLibraries that detect emotions or polarity in text\n\n * [oseti](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Foseti) - Dictionary based Sentiment Analysis for Japanese\n * [negapoji](https:\u002F\u002Fgithub.com\u002Fliaoziyang\u002Fnegapoji) - Japanese negative positive classification.日本語文書のネガポジを判定。\n * [pymlask](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fpymlask) - Emotion analyzer for Japanese text\n * [asari](https:\u002F\u002Fgithub.com\u002FHironsan\u002Fasari) - Japanese sentiment analyzer implemented in Python.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [oseti](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Foseti) | 📥 379 | 📦 167k | ⭐ 97 | 🟡 august 2025|\n| 🔗 [negapoji](https:\u002F\u002Fgithub.com\u002Fliaoziyang\u002Fnegapoji) | - | - | ⭐ 151 | 🔴 august 2017|\n| 🔗 [pymlask](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fpymlask) | 📥 40 | 📦 66k | ⭐ 116 | 🔴 july 2024|\n| 🔗 [asari](https:\u002F\u002Fgithub.com\u002FHironsan\u002Fasari) | 📥 91 | 📦 80k | ⭐ 152 | 🔴 october 2022|\n\n\n### Machine translation\nLibraries that automatically translate text between languages\n\n * [jparacrawl-finetune](https:\u002F\u002Fgithub.com\u002FMorinoseiMorizo\u002Fjparacrawl-finetune) - An example usage of JParaCrawl pre-trained Neural Machine Translation (NMT) models.\n * [JASS](https:\u002F\u002Fgithub.com\u002FMao-KU\u002FJASS) - JASS: Japanese-specific Sequence to Sequence Pre-training 
for Neural Machine Translation (LREC2020) & Linguistically Driven Multi-Task Pre-Training for Low-Resource Neural Machine Translation (ACM TALLIP)\n * [PheMT](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FPheMT) - A phenomenon-wise evaluation dataset for Japanese-English machine translation robustness. The dataset is based on the MTNT dataset, with additional annotations of four linguistic phenomena; Proper Noun, Abbreviated Noun, Colloquial Expression, and Variant. COLING 2020.\n * [VISA](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FVISA) - An ambiguous subtitles dataset for visual scene-aware machine translation\n * [plamo-translate-cli](https:\u002F\u002Fgithub.com\u002Fpfnet\u002Fplamo-translate-cli) - A command-line interface for translation using the plamo-2-translate model with local execution.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [jparacrawl-finetune](https:\u002F\u002Fgithub.com\u002FMorinoseiMorizo\u002Fjparacrawl-finetune) | - | - | ⭐ 105 | 🔴 april 2021|\n| 🔗 [JASS](https:\u002F\u002Fgithub.com\u002FMao-KU\u002FJASS) | - | - | ⭐ 16 | 🔴 january 2022|\n| 🔗 [PheMT](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FPheMT) | - | - | ⭐ 19 | 🔴 february 2021|\n| 🔗 [VISA](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FVISA) | - | - | ⭐ 14 | 🔴 october 2022|\n| 🔗 [plamo-translate-cli](https:\u002F\u002Fgithub.com\u002Fpfnet\u002Fplamo-translate-cli) | - | - | ⭐ 339 | 🟡 october 2025|\n\n\n### Named entity recognition\nLibraries that extract names of people, places, and organizations from text\n\n * [namaco](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002Fnamaco) - Character Based Named Entity Recognition.\n * [entitypedia](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002Fentitypedia) - Entitypedia is an Extended Named Entity Dictionary from Wikipedia.\n * [noyaki](https:\u002F\u002Fgithub.com\u002Fken11\u002Fnoyaki) - Converts character span label information to tokenized text-based label information.\n * 
[bert-japanese-ner-finetuning](https:\u002F\u002Fgithub.com\u002Fken11\u002Fbert-japanese-ner-finetuning) - Code to perform finetuning of the BERT model. A sample that creates and uses a model for the named entity recognition task by fine-tuning a BERT model.\n * [joint-information-extraction-hs](https:\u002F\u002Fgithub.com\u002Faih-uth\u002Fjoint-information-extraction-hs) - Code for inferring the extraction accuracy of named entities and relations from a case report corpus based on detailed annotation criteria\n * [pygeonlp](https:\u002F\u002Fgithub.com\u002Fgeonlp-platform\u002Fpygeonlp) - pygeonlp, A python module for geotagging Japanese texts.\n * [bert-ner-japanese](https:\u002F\u002Fgithub.com\u002Fjurabiinc\u002Fbert-ner-japanese) - A fine-tuning program for Japanese named entity recognition with BERT\n * [huggingface-finetune-japanese](https:\u002F\u002Fgithub.com\u002Ftsmatz\u002Fhuggingface-finetune-japanese) - Examples to finetune encoder-only and encoder-decoder transformers for Japanese language (Hugging Face) Resources\n * [novelanalysisbyner](https:\u002F\u002Fgithub.com\u002Flychee1223\u002Fnovelanalysisbyner) - Named entity recognition by fine-tuning BERT\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [namaco](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002Fnamaco) | - | - | ⭐ 40 | 🔴 february 2018|\n| 🔗 [entitypedia](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002Fentitypedia) | - | - | ⭐ 13 | 🔴 december 2018|\n| 🔗 [noyaki](https:\u002F\u002Fgithub.com\u002Fken11\u002Fnoyaki) | 📥 131 | 📦 20k | ⭐ 5 | 🔴 august 2022|\n| 🔗 [bert-japanese-ner-finetuning](https:\u002F\u002Fgithub.com\u002Fken11\u002Fbert-japanese-ner-finetuning) | - | - | ⭐ 11 | 🔴 june 2022|\n| 🔗 [joint-information-extraction-hs](https:\u002F\u002Fgithub.com\u002Faih-uth\u002Fjoint-information-extraction-hs) | - | - | ⭐ 1 | 🔴 november 2021|\n| 🔗 [pygeonlp](https:\u002F\u002Fgithub.com\u002Fgeonlp-platform\u002Fpygeonlp) | 📥 70 | 📦 22k | ⭐ 22 | 🟢 march|\n| 🔗 [bert-ner-japanese](https:\u002F\u002Fgithub.com\u002Fjurabiinc\u002Fbert-ner-japanese) | - | - | ⭐ 5 | 🔴 september 2022|\n| 🔗 
[huggingface-finetune-japanese](https:\u002F\u002Fgithub.com\u002Ftsmatz\u002Fhuggingface-finetune-japanese) | - | - | ⭐ 16 | 🔴 october 2023|\n| 🔗 [novelanalysisbyner](https:\u002F\u002Fgithub.com\u002Flychee1223\u002Fnovelanalysisbyner) | - | - | ⭐ 2 | 🔴 june 2024|\n\n\n### OCR\nLibraries that recognize and extract text from images\n\n * [Manga OCR](https:\u002F\u002Fgithub.com\u002Fkha-white\u002Fmanga-ocr) - Optical character recognition for Japanese text, with the main focus being Japanese manga\n * [mokuro](https:\u002F\u002Fgithub.com\u002Fkha-white\u002Fmokuro) - Read Japanese manga inside browser with selectable text.\n * [handwritten-japanese-ocr](https:\u002F\u002Fgithub.com\u002Fyas-sim\u002Fhandwritten-japanese-ocr) - Handwritten Japanese OCR demo using touch panel to draw the input text using Intel OpenVINO toolkit\n * [OCR_Japanease](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FOCR_Japanease) - Japanese OCR\n * [ndlocr_cli](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndlocr_cli) - The NDLOCR application\n * [donut](https:\u002F\u002Fgithub.com\u002Fclovaai\u002Fdonut) - Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022\n * [JMTrans](https:\u002F\u002Fgithub.com\u002Fttop32\u002FJMTrans) - manga translator - get japanese manga from url to translate manga image\n * [Kindai-OCR](https:\u002F\u002Fgithub.com\u002Fducanh841988\u002FKindai-OCR) - OCR system for recognizing modern Japanese magazines\n * [text_recognition](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Ftext_recognition) - Text recognition module for NDLOCR\n * [Poricom](https:\u002F\u002Fgithub.com\u002Fblueaxis\u002FPoricom) - Optical character recognition in manga images. 
Manga OCR desktop application\n * [owocr](https:\u002F\u002Fgithub.com\u002Faurorawright\u002Fowocr) - Optical character recognition for Japanese text\n * [yomitoku](https:\u002F\u002Fgithub.com\u002Fkotaro-kinoshita\u002Fyomitoku) - Yomitoku is an AI-powered document image analysis package designed specifically for the Japanese language.\n * [findtextcenternet](https:\u002F\u002Fgithub.com\u002Flithium0003\u002Ffindtextcenternet) - Japanese OCR with CenterNet\n * [simple-ocr-for-manga](https:\u002F\u002Fgithub.com\u002Fyisusdev2005\u002Fsimple-ocr-for-manga) - A simple OCR for manga (Japanese traditional and Japanese vertical)\n * [jp-ocr-evaluation](https:\u002F\u002Fgithub.com\u002Fyoshino\u002Fjp-ocr-evaluation) - Evaluates OCR performance on images of Japanese text\n * [paddleocr-vl-sft-for-japanese-manga-on-rtx-3060](https:\u002F\u002Fgithub.com\u002Fopenvino-book\u002Fpaddleocr-vl-sft-for-japanese-manga-on-rtx-3060) - Fine-tune PaddleOCR-VL on the Manga109s dataset for Japanese manga text recognition. The base model struggles with vertical Japanese text reading order in manga. 
After fine-tuning, the model correctly handles manga-specific text layouts.\n * [MangaOCR](https:\u002F\u002Fgithub.com\u002Fgnurt2041\u002FMangaOCR) - A lightweight OCR model for Japanese text, especially in Manga\n * [meikiocr](https:\u002F\u002Fgithub.com\u002Frtr46\u002Fmeikiocr) - high-speed, high-accuracy, local ocr for japanese video games\n * [meikipop](https:\u002F\u002Fgithub.com\u002Frtr46\u002Fmeikipop) - universal japanese ocr popup dictionary for windows, linux and macos\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [manga-ocr](https:\u002F\u002Fgithub.com\u002Fkha-white\u002Fmanga-ocr) | 📥 4k | 📦 267k | ⭐ 2.6k | 🟡 june 2025|\n| 🔗 [mokuro](https:\u002F\u002Fgithub.com\u002Fkha-white\u002Fmokuro) | 📥 1k | 📦 94k | ⭐ 1.6k | 🟢 february|\n| 🔗 [handwritten-japanese-ocr](https:\u002F\u002Fgithub.com\u002Fyas-sim\u002Fhandwritten-japanese-ocr) | - | - | ⭐ 38 | 🔴 april 2022|\n| 🔗 [OCR_Japanease](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FOCR_Japanease) | - | - | ⭐ 246 | 🔴 april 2021|\n| 🔗 [ndlocr_cli](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndlocr_cli) | - | - | ⭐ 654 | 🟡 september 2025|\n| 🔗 [donut](https:\u002F\u002Fgithub.com\u002Fclovaai\u002Fdonut) | 📥 291 | 📦 198k | ⭐ 6.8k | 🔴 july 2023|\n| 🔗 [JMTrans](https:\u002F\u002Fgithub.com\u002Fttop32\u002FJMTrans) | - | - | ⭐ 90 | 🔴 january 2021|\n| 🔗 [Kindai-OCR](https:\u002F\u002Fgithub.com\u002Fducanh841988\u002FKindai-OCR) | - | - | ⭐ 153 | 🔴 july 2023|\n| 🔗 [text_recognition](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Ftext_recognition) | - | - | ⭐ 8 | 🔴 july 2023|\n| 🔗 [Poricom](https:\u002F\u002Fgithub.com\u002Fblueaxis\u002FPoricom) | - | - | ⭐ 421 | 🔴 june 2023|\n| 🔗 [owocr](https:\u002F\u002Fgithub.com\u002Faurorawright\u002Fowocr) | - | - | ⭐ 223 | 🟢 last monday|\n| 🔗 [yomitoku](https:\u002F\u002Fgithub.com\u002Fkotaro-kinoshita\u002Fyomitoku) | 📥 1k | 📦 86k | ⭐ 1.4k | 🟢 march|\n| 🔗 
[findtextcenternet](https:\u002F\u002Fgithub.com\u002Flithium0003\u002Ffindtextcenternet) | - | - | ⭐ 59 | 🟡 august 2025|\n| 🔗 [simple-ocr-for-manga](https:\u002F\u002Fgithub.com\u002Fyisusdev2005\u002Fsimple-ocr-for-manga) | - | - | ⭐ 7 | 🔴 repo not found|\n| 🔗 [jp-ocr-evaluation](https:\u002F\u002Fgithub.com\u002Fyoshino\u002Fjp-ocr-evaluation) | - | - | ⭐ 1 | 🔴 march 2024|\n| 🔗 [paddleocr-vl-sft-for-japanese-manga-on-rtx-3060](https:\u002F\u002Fgithub.com\u002Fopenvino-book\u002Fpaddleocr-vl-sft-for-japanese-manga-on-rtx-3060) | - | - | ⭐ 11 | 🟡 december 2025|\n| 🔗 [MangaOCR](https:\u002F\u002Fgithub.com\u002Fgnurt2041\u002FMangaOCR) | - | - | ⭐ 35 | 🔴 may 2024|\n| 🔗 [meikiocr](https:\u002F\u002Fgithub.com\u002Frtr46\u002Fmeikiocr) | 📥 1k | 📦 23k | ⭐ 69 | 🟢 last wednesday|\n| 🔗 [meikipop](https:\u002F\u002Fgithub.com\u002Frtr46\u002Fmeikipop) | - | - | ⭐ 257 | 🔴 invalid|\n\n\n### Tool for pretrained models\nLibraries that utilize pretrained models to improve accuracy and efficiency\n\n * [JGLUE](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002FJGLUE) - JGLUE: Japanese General Language Understanding Evaluation\n * [ginza-transformers](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fginza-transformers) - Use custom tokenizers in spacy-transformers\n * [t5_japanese_dialogue_generation](https:\u002F\u002Fgithub.com\u002FJinyamyzk\u002Ft5_japanese_dialogue_generation) - T5による会話生成\n * [japanese_text_classification](https:\u002F\u002Fgithub.com\u002FMasao-Taketani\u002Fjapanese_text_classification) - To investigate various DNN text classifiers including MLP, CNN, RNN, BERT approaches.\n * [Japanese-BERT-Sentiment-Analyzer](https:\u002F\u002Fgithub.com\u002Fizuna385\u002FJapanese-BERT-Sentiment-Analyzer) - Deploying sentiment analysis server with FastAPI and BERT\n * [jmlm_scoring](https:\u002F\u002Fgithub.com\u002Fminhpqn\u002Fjmlm_scoring) - Masked Language Model-based Scoring for Japanese and Vietnamese\n * 
[allennlp-shiba-model](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fallennlp-shiba-model) - AllenNLP integration for Shiba: Japanese CANINE model\n * [evaluate_japanese_w2v](https:\u002F\u002Fgithub.com\u002Fshihono\u002Fevaluate_japanese_w2v) - script to evaluate pre-trained Japanese word2vec model on Japanese similarity dataset\n * [gector-ja](https:\u002F\u002Fgithub.com\u002Fjonnyli1125\u002Fgector-ja) - BERT-based GEC tagging for Japanese\n * [Japanese-BPEEncoder](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FJapanese-BPEEncoder) - Japanese-BPEEncoder\n * [Japanese-BPEEncoder_V2](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FJapanese-BPEEncoder_V2) - Japanese-BPEEncoder Version 2\n * [transformer-copy](https:\u002F\u002Fgithub.com\u002Fyouichiro\u002Ftransformer-copy) - 日本語文法誤り訂正ツール\n * [japanese-stable-diffusion](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fjapanese-stable-diffusion) - Japanese Stable Diffusion is a Japanese specific latent text-to-image diffusion model capable of generating photo-realistic images given any text input.\n * [nagisa_bert](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fnagisa_bert) - A BERT model for nagisa\n * [prefix-tuning-gpt](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fprefix-tuning-gpt) - Example code for prefix-tuning GPT\u002FGPT-NeoX models and for inference with trained prefixes\n * [JGLUE-benchmark](https:\u002F\u002Fgithub.com\u002Fnobu-g\u002FJGLUE-benchmark) - Training and evaluation scripts for JGLUE, a Japanese language understanding benchmark\n * [jptranstokenizer](https:\u002F\u002Fgithub.com\u002Fretarfi\u002Fjptranstokenizer) - Japanese Tokenizer for transformers library\n * [jp-stable](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Flm-evaluation-harness\u002Ftree\u002Fjp-stable) - JP Language Model Evaluation Harness\n * [compare-ja-tokenizer](https:\u002F\u002Fgithub.com\u002Fhitachi-nlp\u002Fcompare-ja-tokenizer) - How do different tokenizers perform on downstream tasks in scriptio 
continua languages?: A case study in Japanese-ACL SRW 2023\n * [lm-evaluation-harness-jp-stable](https:\u002F\u002Fgithub.com\u002Ftdc-yamada-ya\u002Flm-evaluation-harness-jp-stable) - A framework for few-shot evaluation of autoregressive language models.\n * [llm-lora-classification](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fllm-lora-classification) - llm-lora-classification\n * [rinna_gpt-neox_ggml-lora](https:\u002F\u002Fgithub.com\u002Fyukaryavka\u002Frinna_gpt-neox_ggml-lora) - The repository contains scripts and merge scripts that have been modified to adapt an Alpaca-Lora adapter for LoRA tuning when assuming the use of the \"rinna\u002Fjapanese-gpt-neox...\" [gpt-neox] model converted to ggml.\n * [japanese-llm-roleplay-benchmark](https:\u002F\u002Fgithub.com\u002Foshizo\u002Fjapanese-llm-roleplay-benchmark) - This repository was created to evaluate the character role-play performance of Japanese LLMs.\n * [japanese-llm-ranking](https:\u002F\u002Fgithub.com\u002Fyuzu-ai\u002Fjapanese-llm-ranking) - This repository supports YuzuAI's Rakuda leaderboard of Japanese LLMs, which is a Japanese-focused analogue of LMSYS' Vicuna eval.\n * [llm-jp-eval](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-eval) - This tool automatically evaluates Japanese large language models across multiple datasets.\n * [llm-jp-sft](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-sft) - This repository contains the code for supervised fine-tuning of LLM-jp models.\n * [llm-jp-tokenizer](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-tokenizer) - A repository collecting tokenizer-related work for LLMs developed by the LLM Study Group (LLM-jp).\n * [japanese-lm-fin-harness](https:\u002F\u002Fgithub.com\u002Fpfnet-research\u002Fjapanese-lm-fin-harness) - Japanese Language Model Financial Evaluation Harness\n * [ja-vicuna-qa-benchmark](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fja-vicuna-qa-benchmark) - Japanese Vicuna QA Benchmark\n * 
[swallow-evaluation](https:\u002F\u002Fgithub.com\u002Fswallow-llm\u002Fswallow-evaluation) - Swallow project: evaluation scripts for large language models\n * [swallow-evaluation-instruct](https:\u002F\u002Fgithub.com\u002Fswallow-llm\u002Fswallow-evaluation-instruct) - Swallow project: evaluation framework for post-trained large language models\n * [pretrained_doc2vec_ja](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fpretrained_doc2vec_ja) - pretrained doc2vec models on Japanese Wikipedia\n * [pl-bert-ja](https:\u002F\u002Fgithub.com\u002Fkyamauchi1023\u002Fpl-bert-ja) - A repository of Japanese Phoneme-Level BERT\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [JGLUE](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002FJGLUE) | - | - | ⭐ 338 | 🔴 march 2025|\n| 🔗 [ginza-transformers](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fginza-transformers) | 📥 1k | 📦 186k | ⭐ invalid | 🔴 august 2022|\n| 🔗 [t5_japanese_dialogue_generation](https:\u002F\u002Fgithub.com\u002FJinyamyzk\u002Ft5_japanese_dialogue_generation) | - | - | ⭐ 3 | 🔴 november 2021|\n| 🔗 [japanese_text_classification](https:\u002F\u002Fgithub.com\u002FMasao-Taketani\u002Fjapanese_text_classification) | - | - | ⭐ 9 | 🔴 january 2020|\n| 🔗 [Japanese-BERT-Sentiment-Analyzer](https:\u002F\u002Fgithub.com\u002Fizuna385\u002FJapanese-BERT-Sentiment-Analyzer) | - | - | ⭐ invalid | 🔴 april 2021|\n| 🔗 [jmlm_scoring](https:\u002F\u002Fgithub.com\u002Fminhpqn\u002Fjmlm_scoring) | - | - | ⭐ 5 | 🔴 february 2022|\n| 🔗 [allennlp-shiba-model](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fallennlp-shiba-model) | 📥 32 | 📦 20k | ⭐ 12 | 🔴 june 2021|\n| 🔗 [evaluate_japanese_w2v](https:\u002F\u002Fgithub.com\u002Fshihono\u002Fevaluate_japanese_w2v) | - | - | ⭐ 12 | 🔴 november 2024|\n| 🔗 [gector-ja](https:\u002F\u002Fgithub.com\u002Fjonnyli1125\u002Fgector-ja) | - | - | ⭐ 19 | 🔴 june 2021|\n| 🔗 [Japanese-BPEEncoder](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FJapanese-BPEEncoder) | - | - | ⭐ 41 | 🔴 september 2021|\n| 🔗 
[Japanese-BPEEncoder_V2](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FJapanese-BPEEncoder_V2) | - | - | ⭐ 41 | 🔴 january 2023|\n| 🔗 [transformer-copy](https:\u002F\u002Fgithub.com\u002Fyouichiro\u002Ftransformer-copy) | - | - | ⭐ 29 | 🔴 september 2020|\n| 🔗 [japanese-stable-diffusion](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fjapanese-stable-diffusion) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [nagisa_bert](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fnagisa_bert) | 📥 40 | 📦 57k | ⭐ 5 | 🟢 february|\n| 🔗 [prefix-tuning-gpt](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fprefix-tuning-gpt) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [JGLUE-benchmark](https:\u002F\u002Fgithub.com\u002Fnobu-g\u002FJGLUE-benchmark) | - | - | ⭐ 18 | 🟢 last thursday|\n| 🔗 [jptranstokenizer](https:\u002F\u002Fgithub.com\u002Fretarfi\u002Fjptranstokenizer) | 📥 83 | 📦 28k | ⭐ 5 | 🔴 february 2024|\n| 🔗 [jp-stable](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Flm-evaluation-harness\u002Ftree\u002Fjp-stable) | - | - | ⭐ 154 | 🔴 november 2023|\n| 🔗 [compare-ja-tokenizer](https:\u002F\u002Fgithub.com\u002Fhitachi-nlp\u002Fcompare-ja-tokenizer) | - | - | ⭐ 6 | 🔴 june 2023|\n| 🔗 [lm-evaluation-harness-jp-stable](https:\u002F\u002Fgithub.com\u002Ftdc-yamada-ya\u002Flm-evaluation-harness-jp-stable) | - | - | ⭐ 1 | 🔴 june 2023|\n| 🔗 [llm-lora-classification](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fllm-lora-classification) | - | - | ⭐ 98 | 🔴 july 2023|\n| 🔗 [rinna_gpt-neox_ggml-lora](https:\u002F\u002Fgithub.com\u002Fyukaryavka\u002Frinna_gpt-neox_ggml-lora) | - | - | ⭐ 19 | 🔴 may 2023|\n| 🔗 [japanese-llm-roleplay-benchmark](https:\u002F\u002Fgithub.com\u002Foshizo\u002Fjapanese-llm-roleplay-benchmark) | - | - | ⭐ 40 | 🔴 november 2023|\n| 🔗 
[japanese-llm-ranking](https:\u002F\u002Fgithub.com\u002Fyuzu-ai\u002Fjapanese-llm-ranking) | - | - | ⭐ 50 | 🔴 march 2024|\n| 🔗 [llm-jp-eval](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-eval) | - | - | ⭐ 150 | 🟢 last monday|\n| 🔗 [llm-jp-sft](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-sft) | - | - | ⭐ 62 | 🔴 june 2024|\n| 🔗 [llm-jp-tokenizer](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-tokenizer) | - | - | ⭐ 46 | 🟢 last monday|\n| 🔗 [japanese-lm-fin-harness](https:\u002F\u002Fgithub.com\u002Fpfnet-research\u002Fjapanese-lm-fin-harness) | - | - | ⭐ 77 | 🟢 january|\n| 🔗 [ja-vicuna-qa-benchmark](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fja-vicuna-qa-benchmark) | - | - | ⭐ 33 | 🔴 june 2024|\n| 🔗 [swallow-evaluation](https:\u002F\u002Fgithub.com\u002Fswallow-llm\u002Fswallow-evaluation) | - | - | ⭐ 24 | 🟡 september 2025|\n| 🔗 [swallow-evaluation-instruct](https:\u002F\u002Fgithub.com\u002Fswallow-llm\u002Fswallow-evaluation-instruct) | - | - | ⭐ 27 | 🟡 october 2025|\n| 🔗 [pretrained_doc2vec_ja](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fpretrained_doc2vec_ja) | - | - | ⭐ 25 | 🔴 january 2019|\n| 🔗 [pl-bert-ja](https:\u002F\u002Fgithub.com\u002Fkyamauchi1023\u002Fpl-bert-ja) | - | - | ⭐ 24 | 🔴 december 2023|\n\n\n### Others\nGeneral-purpose tools supporting Japanese language processing\n\n * [namedivider-python](https:\u002F\u002Fgithub.com\u002Frskmoi\u002Fnamedivider-python) - A tool for dividing the Japanese full name into a family name and a given name.\n * [asa-python](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fasa-python) - A curated list of resources dedicated to Python libraries of NLP for Japanese\n * [python_asa](https:\u002F\u002Fgithub.com\u002FTakeuchi-Lab-LM\u002Fpython_asa) - Python version of the Japanese semantic role labeling system (ASA)\n * [toiro](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Ftoiro) - A comparison tool of Japanese tokenizers\n * [ja-timex](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fja-timex) - 
A rule-based analyzer that extracts and normalizes temporal expressions written in natural language\n * [JapaneseTokenizers](https:\u002F\u002Fgithub.com\u002FKensuke-Mitsuzawa\u002FJapaneseTokenizers) - A set of metrics for feature selection from text data\n * [daaja](https:\u002F\u002Fgithub.com\u002Fkajyuuen\u002Fdaaja) - This repository has implementations of data augmentation for NLP for Japanese.\n * [accel-brain-code](https:\u002F\u002Fgithub.com\u002Faccel-brain\u002Faccel-brain-code) - The purpose of this repository is to make prototypes as case study in the context of proof of concept(PoC) and research and development(R&D) that I have written in my website. The main research topics are Auto-Encoders in relation to the representation learning, the statistical machine learning for energy-based models, adversarial generation net…\n * [kyoto-reader](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fkyoto-reader) - A processor for KyotoCorpus, KWDLC, and AnnotatedFKCCorpus\n * [nlplot](https:\u002F\u002Fgithub.com\u002Ftakapy0210\u002Fnlplot) - Visualization Module for Natural Language Processing\n * [rake-ja](https:\u002F\u002Fgithub.com\u002Fkanjirz50\u002Frake-ja) - Rapid Automatic Keyword Extraction algorithm for Japanese\n * [jel](https:\u002F\u002Fgithub.com\u002Fizuna385\u002Fjel) - Japanese Entity Linker.\n * [MedNER-J](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FMedNER-J) - Latest version of MedEX\u002FJ (Japanese disease name extractor)\n * [zunda-python](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fzunda-python) - Zunda: Japanese Enhanced Modality Analyzer client for Python.\n * [AIO2_DPR_baseline](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FAIO2_DPR_baseline) - https:\u002F\u002Fwww.nlp.ecei.tohoku.ac.jp\u002Fprojects\u002Faio\u002F\n * [showcase](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fshowcase) - A PyTorch implementation of the Japanese Predicate-Argument Structure (PAS) analyser presented in the paper of Matsubayashi & Inui (2018) with some improvements.\n * 
[darts-clone-python](https:\u002F\u002Fgithub.com\u002Frixwew\u002Fdarts-clone-python) - Darts-clone python binding\n * [jrte-corpus_example](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fjrte-corpus_example) - Example code for Japanese Realistic Textual Entailment Corpus\n * [desuwa](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fdesuwa) - Feature annotator to morphemes and phrases based on KNP rule files (pure-Python)\n * [HotPepperGourmetDialogue](https:\u002F\u002Fgithub.com\u002FHironsan\u002FHotPepperGourmetDialogue) - Restaurant Search System through Dialogue in Japanese.\n * [nlp-recipes-ja](https:\u002F\u002Fgithub.com\u002Fupura\u002Fnlp-recipes-ja) - Sample code for natural language processing in Japanese\n * [Japanese_nlp_scripts](https:\u002F\u002Fgithub.com\u002Folsgaard\u002FJapanese_nlp_scripts) - Small example scripts for working with Japanese texts in Python\n * [DNorm-J](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FDNorm-J) - Japanese version of DNorm\n * [pyknp-eventgraph](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fpyknp-eventgraph) - EventGraph is a development platform for high-level NLP applications in Japanese.\n * [ishi](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fishi) - Ishi: A volition classifier for Japanese\n * [python-npylm](https:\u002F\u002Fgithub.com\u002Fmusyoku\u002Fpython-npylm) - Unsupervised morphological analysis with a Bayesian hierarchical language model\n * [python-npycrf](https:\u002F\u002Fgithub.com\u002Fmusyoku\u002Fpython-npycrf) - Semi-supervised morphological analysis integrating conditional random fields with a Bayesian hierarchical language model\n * [unsupervised-pos-tagging](https:\u002F\u002Fgithub.com\u002Fmusyoku\u002Funsupervised-pos-tagging) - Unsupervised part-of-speech tagging\n * [negima](https:\u002F\u002Fgithub.com\u002Fcocodrips\u002Fnegima) - Negima is a Python package to extract phrases in Japanese text by using the part-of-speech-based rules you define.\n * [YouyakuMan](https:\u002F\u002Fgithub.com\u002Fneilctwu\u002FYouyakuMan) - Extractive summarizer using BertSum as summarization model\n * 
[japanese-numbers-python](https:\u002F\u002Fgithub.com\u002Ftakumakanari\u002Fjapanese-numbers-python) - A parser for Japanese number (Kanji, arabic) in the natural language.\n * [kantan](https:\u002F\u002Fgithub.com\u002Fitayperl\u002Fkantan) - Lookup japanese words by radical patterns\n * [make-meidai-dialogue](https:\u002F\u002Fgithub.com\u002Fknok\u002Fmake-meidai-dialogue) - Get Japanese dialogue corpus\n * [japanese_summarizer](https:\u002F\u002Fgithub.com\u002Fryuryukke\u002Fjapanese_summarizer) - A summarizer for Japanese articles.\n * [chirptext](https:\u002F\u002Fgithub.com\u002Fletuananh\u002Fchirptext) - ChirpText is a collection of text processing tools for Python.\n * [yubin](https:\u002F\u002Fgithub.com\u002Falvations\u002Fyubin) - Japanese Address Munger\n * [jawiki-cleaner](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fjawiki-cleaner) - Japanese Wikipedia Cleaner\n * [japanese2phoneme](https:\u002F\u002Fgithub.com\u002Fiory\u002Fjapanese2phoneme) - A python library to convert Japanese to phoneme.\n * [anlp_nlp2021_d3-1](https:\u002F\u002Fgithub.com\u002Farusl\u002Fanlp_nlp2021_d3-1) - This repository contains codes related to the experiments in \"An Experimental Evaluation of Japanese Tokenizers for Sentiment-Based Text Classification\"\n * [aozora_classification](https:\u002F\u002Fgithub.com\u002Fshibuiwilliam\u002Faozora_classification) - This project aims to classify Japanese sentence to how well similar to some Japanese classical writers, such as Soseki Natsume, Ogai Mori, Ryunosuke Akutagawa and so on.\n * [aozora-corpus-generator](https:\u002F\u002Fgithub.com\u002Fborh\u002Faozora-corpus-generator) - Generates plain or tokenized text files from the Aozora Bunko\n * [JLM](https:\u002F\u002Fgithub.com\u002Fjiali-ms\u002FJLM) - A fast LSTM Language Model for large vocabulary language like Japanese and Chinese\n * [NTM](https:\u002F\u002Fgithub.com\u002Fm3yrin\u002FNTM) - Testing of Neural Topic Modeling for Japanese articles\n * 
[EN-JP-ML-Lexicon](https:\u002F\u002Fgithub.com\u002FMachine-Learning-Tokyo\u002FEN-JP-ML-Lexicon) - This is an English-Japanese lexicon for Machine Learning and Deep Learning terminology.\n * [text-generation](https:\u002F\u002Fgithub.com\u002Fdiscus0434\u002Ftext-generation) - Easy-to-use scripts to fine-tune GPT-2-JA with your own texts, to generate sentences, and to tweet them automatically.\n * [chainer_nic](https:\u002F\u002Fgithub.com\u002Fyuyay\u002Fchainer_nic) - Neural Image Caption (NIC) on chainer, its pretrained models on English and Japanese image caption datasets.\n * [unihan-lm](https:\u002F\u002Fgithub.com\u002FJetRunner\u002Funihan-lm) - The official repository for \"UnihanLM: Coarse-to-Fine Chinese-Japanese Language Model Pretraining with the Unihan Database\", AACL-IJCNLP 2020\n * [mbart-finetuning](https:\u002F\u002Fgithub.com\u002Fken11\u002Fmbart-finetuning) - Code to perform finetuning of the mBART model.\n * [xvector_jtubespeech](https:\u002F\u002Fgithub.com\u002Fsarulab-speech\u002Fxvector_jtubespeech) - xvector model on jtubespeech\n * [TinySegmenterMaker](https:\u002F\u002Fgithub.com\u002Fshogo82148\u002FTinySegmenterMaker) - A tool for building your own trained models for TinySegmenter.\n * [Grongish](https:\u002F\u002Fgithub.com\u002Fshogo82148\u002FGrongish) - A script for converting between Japanese and Grongi\n * [WordCloud-Japanese](https:\u002F\u002Fgithub.com\u002Faocattleya\u002FWordCloud-Japanese) - A script that renders Japanese text in WordCloud with a morphological-analysis-like display without using MeCab (a morphological analysis engine)\n * [snark](https:\u002F\u002Fgithub.com\u002Fhiraokusky\u002Fsnark) - A DB access library that uses Japanese WordNet\n * [toEmoji](https:\u002F\u002Fgithub.com\u002Fmkan0141\u002FtoEmoji) - Something that converts Japanese sentences into emoji-only sentences\n * [termextract](https:\u002F\u002Fgithub.com\u002Fkanjirz50\u002Ftermextract) - An implementation exercise for a technical term extraction algorithm\n * [JDT-with-KenLM-scoring](https:\u002F\u002Fgithub.com\u002FTUT-SLP-lab\u002FJDT-with-KenLM-scoring) - Scores the response candidates of Japanese-Dialog-Transformer with a KenLM N-gram language model and filters or reranks them.\n * 
[mixture-of-unigram-model](https:\u002F\u002Fgithub.com\u002FKentoW\u002Fmixture-of-unigram-model) - Mixture of Unigram Model and Infinite Mixture of Unigram Model in Python. (混合ユニグラムモデルと無限混合ユニグラムモデル)\n * [hidden-markov-model](https:\u002F\u002Fgithub.com\u002FKentoW\u002Fhidden-markov-model) - Hidden Markov Model (HMM) and Infinite Hidden Markov Model (iHMM) in Python. (隠れマルコフモデルと無限隠れマルコフモデル)\n * [Ngram-language-model](https:\u002F\u002Fgithub.com\u002FKentoW\u002FNgram-language-model) - Ngram language model in Python. (Nグラム言語モデル)\n * [ASRDeepSpeech](https:\u002F\u002Fgithub.com\u002FJeanMaximilienCadic\u002FASRDeepSpeech) - Automatic Speech Recognition with deepspeech2 model in pytorch with support from Zakuro AI.\n * [neural_ime](https:\u002F\u002Fgithub.com\u002Fyohokuno\u002Fneural_ime) - Neural IME: Neural Input Method Engine\n * [neural_japanese_transliterator](https:\u002F\u002Fgithub.com\u002FKyubyong\u002Fneural_japanese_transliterator) - Can neural networks transliterate Romaji into Japanese correctly?\n * [tinysegmenter](https:\u002F\u002Fgithub.com\u002FSamuraiT\u002Ftinysegmenter) - tokenizer specified for Japanese\n * [AugLy-jp](https:\u002F\u002Fgithub.com\u002Fchck\u002FAugLy-jp) - Data Augmentation for Japanese Text on AugLy\n * [furigana4epub](https:\u002F\u002Fgithub.com\u002FMumumu4\u002Ffurigana4epub) - A Python script for adding furigana to Japanese epub books using Mecab and Unidic.\n * [PyKatsuyou](https:\u002F\u002Fgithub.com\u002FSmashinFries\u002FPyKatsuyou) - Japanese verb\u002Fadjective inflections tool\n * [jageocoder](https:\u002F\u002Fgithub.com\u002Ft-sagara\u002Fjageocoder) - Pure Python Japanese address geocoder\n * [pygeonlp](https:\u002F\u002Fgithub.com\u002Fgeonlp-platform\u002Fpygeonlp) - pygeonlp, A python module for geotagging Japanese texts.\n * [nksnd](https:\u002F\u002Fgithub.com\u002Fyoriyuki\u002Fnksnd) - New kana-kanji conversion engine\n * [JaMIE](https:\u002F\u002Fgithub.com\u002Fracerandom\u002FJaMIE) - A Japanese 
Medical Information Extraction Toolkit\n * [fasttext-vs-word2vec-on-twitter-data](https:\u002F\u002Fgithub.com\u002FGINK03\u002Ffasttext-vs-word2vec-on-twitter-data) - A comparison of fasttext and word2vec, with execution and training scripts\n * [minimal-search-engine](https:\u002F\u002Fgithub.com\u002FGINK03\u002Fminimal-search-engine) - A minimal search engine\u002FPageRank\u002Ftf-idf\n * [5ch-analysis](https:\u002F\u002Fgithub.com\u002FGINK03\u002F5ch-analysis) - Scrapes past 5ch logs to track words that were once popular (e.g., 香具師, orz)\n * [tweet_extructor](https:\u002F\u002Fgithub.com\u002FtatHi\u002Ftweet_extructor) - A tweet downloader for the Twitter Japanese reputation analysis dataset\n * [japanese-word-aggregation](https:\u002F\u002Fgithub.com\u002Fhkiyomaru\u002Fjapanese-word-aggregation) - Aggregating Japanese words based on Juman++ and ConceptNet5.5\n * [jinf](https:\u002F\u002Fgithub.com\u002Fhkiyomaru\u002Fjinf) - A Japanese inflection converter\n * [kwja](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fkwja) - A unified language analyzer for Japanese\n * [mlm-scoring-transformers](https:\u002F\u002Fgithub.com\u002FRyutaro-A\u002Fmlm-scoring-transformers) - Reproduced package based on Masked Language Model Scoring (ACL2020).\n * [ClipCap-for-Japanese](https:\u002F\u002Fgithub.com\u002FJapanese-Image-Captioning\u002FClipCap-for-Japanese) - [PyTorch] ClipCap for Japanese\n * [SAT-for-Japanese](https:\u002F\u002Fgithub.com\u002FJapanese-Image-Captioning\u002FSAT-for-Japanese) - [PyTorch] Show, Attend and Tell for Japanese\n * [cihai](https:\u002F\u002Fgithub.com\u002Fcihai\u002Fcihai) - Python library for CJK (Chinese, Japanese, and Korean) language dictionary\n * [marine](https:\u002F\u002Fgithub.com\u002F6gsn\u002Fmarine) - MARINE : Multi-task leaRnIng-based JapaNese accent Estimation\n * [whisper-asr-finetune](https:\u002F\u002Fgithub.com\u002Fsarulab-speech\u002Fwhisper-asr-finetune) - Finetuning Whisper ASR model\n * [japanese_chatbot](https:\u002F\u002Fgithub.com\u002FCjangCjengh\u002Fjapanese_chatbot) - A PyTorch Implementation of japanese chatbot using 
BERT and Transformer's decoder\n * [radicalchar](https:\u002F\u002Fgithub.com\u002Fyamamaya\u002Fradicalchar) - A library for normalizing radical characters\n * [akaza](https:\u002F\u002Fgithub.com\u002Ftokuhirom\u002Fakaza) - Yet another Japanese IME for IBus\u002FLinux\n * [posuto](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fposuto) - Japanese postal code data.\n * [tacotron2-japanese](https:\u002F\u002Fgithub.com\u002FCjangCjengh\u002Ftacotron2-japanese) - Tacotron2 implementation of Japanese\n * [ibus-hiragana](https:\u002F\u002Fgithub.com\u002Fesrille\u002Fibus-hiragana) - Hiragana IME for IBus\n * [furiganapad](https:\u002F\u002Fgithub.com\u002Fesrille\u002Ffuriganapad) - Furigana Pad\n * [chikkarpy](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002Fchikkarpy) - Japanese synonym library\n * [ja-tokenizer-docker-py](https:\u002F\u002Fgithub.com\u002Fp-geon\u002Fja-tokenizer-docker-py) - Mecab + NEologd + Docker + Python3\n * [JapaneseEmbeddingEval](https:\u002F\u002Fgithub.com\u002Foshizo\u002FJapaneseEmbeddingEval) - JapaneseEmbeddingEval\n * [gptuber-by-langchain](https:\u002F\u002Fgithub.com\u002Fkarakuri-ai\u002Fgptuber-by-langchain) - GPT acts as a YouTuber\n * [shuwa](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fshuwa) - Extend GNOME On-Screen Keyboard for Input Methods\n * [japanese-nli-model](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002Fjapanese-nli-model) - This repository provides the code for Japanese NLI model, a fine-tuned masked language model.\n * [tra-fugu](https:\u002F\u002Fgithub.com\u002Ftos-kamiya\u002Ftra-fugu) - A tool for Japanese-English translation and English-Japanese translation by using FuguMT\n * [fugumt](https:\u002F\u002Fgithub.com\u002Fs-taka\u002Ffugumt) - A translation environment that uses the machine translation engine published on ぷるーふおぶこんせぷと (Proof of Concept). It can translate text entered in a form and translate PDFs.\n * [JaSPICE](https:\u002F\u002Fgithub.com\u002Fkeio-smilab23\u002FJaSPICE) - JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models\n * 
[Retrieval-based-Voice-Conversion-WebUI-JP-localization](https:\u002F\u002Fgithub.com\u002Fyantaisa11\u002FRetrieval-based-Voice-Conversion-WebUI-JP-localization) - jp-localization\n * [pyopenjtalk](https:\u002F\u002Fgithub.com\u002Fr9y9\u002Fpyopenjtalk) - Python wrapper for OpenJTalk\n * [yomigana-ebook](https:\u002F\u002Fgithub.com\u002Frabbit19981023\u002Fyomigana-ebook) - Make learning Japanese easier by adding readings for every kanji in the eBook\n * [N46Whisper](https:\u002F\u002Fgithub.com\u002FAyanaminn\u002FN46Whisper) - Whisper based Japanese subtitle generator\n * [japanese_llm_simple_webui](https:\u002F\u002Fgithub.com\u002Fnoir55\u002Fjapanese_llm_simple_webui) - A simple web interface for Japanese-capable LLMs (large language models) such as Rinna-3.6B and OpenCALM\n * [pdf-translator](https:\u002F\u002Fgithub.com\u002Fdiscus0434\u002Fpdf-translator) - pdf-translator translates English PDF files into Japanese, preserving the original layout.\n * [japanese_qa_demo_with_haystack_and_es](https:\u002F\u002Fgithub.com\u002FShingo-Kamata\u002Fjapanese_qa_demo_with_haystack_and_es) - A sample Japanese question answering system using Haystack + Elasticsearch + wikipedia(ja)\n * [mozc-devices](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmozc-devices) - Automatically exported from code.google.com\u002Fp\u002Fmozc-morse\n * [natsume](https:\u002F\u002Fgithub.com\u002Ffaruzan0820\u002Fnatsume) - A Japanese text frontend processing toolkit\n * [vits-japros-webui](https:\u002F\u002Fgithub.com\u002Flitagin02\u002Fvits-japros-webui) - A Gradio WebUI for training and speech synthesis with Japanese TTS (VITS)\n * [ja-law-parser](https:\u002F\u002Fgithub.com\u002Ftakuyaa\u002Fja-law-parser) - A Japanese law parser\n * [dictation-kit](https:\u002F\u002Fgithub.com\u002Fjulius-speech\u002Fdictation-kit) - Japanese dictation kit using Julius\n * [julius4seg](https:\u002F\u002Fgithub.com\u002FHiroshiba\u002Fjulius4seg) - A segmentation support tool using Julius\n * [voicevox_engine](https:\u002F\u002Fgithub.com\u002FVOICEVOX\u002Fvoicevox_engine) - The speech synthesis engine of VOICEVOX, a free, medium-quality text-to-speech software\n * 
[LLaVA-JP](https:\u002F\u002Fgithub.com\u002Ftosiyuki\u002FLLaVA-JP) - LLaVA-JP is a Japanese VLM trained with the LLaVA method\n * [RAG-Japanese](https:\u002F\u002Fgithub.com\u002FAkimParis\u002FRAG-Japanese) - Open source RAG with Llama Index for Japanese LLM in low resource setting\n * [bertjsc](https:\u002F\u002Fgithub.com\u002Fer-ri\u002Fbertjsc) - Japanese Spelling Error Corrector using BERT (Masked-Language Model).\n * [llm-leaderboard](https:\u002F\u002Fgithub.com\u002Fwandb\u002Fllm-leaderboard) - A project for evaluating LLMs on Japanese tasks\n * [jglue-evaluation-scripts](https:\u002F\u002Fgithub.com\u002Fnobu-g\u002Fjglue-evaluation-scripts) - Training and evaluation scripts for JGLUE, a Japanese language understanding benchmark\n * [BLIP2-Japanese](https:\u002F\u002Fgithub.com\u002FZhaoPeiduo\u002FBLIP2-Japanese) - Modifying LAVIS' BLIP2 Q-former with models pretrained on Japanese datasets.\n * [wikipedia-passages-jawiki-embeddings-utils](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Fwikipedia-passages-jawiki-embeddings-utils) - Scripts for converting Japanese Wikipedia text into various Japanese embeddings and faiss indexes.\n * [simple-simcse-ja](https:\u002F\u002Fgithub.com\u002Fhpprc\u002Fsimple-simcse-ja) - Exploring Japanese SimCSE\n * [wikipedia-japanese-open-rag](https:\u002F\u002Fgithub.com\u002Flawofcycles\u002Fwikipedia-japanese-open-rag) - A sample Gradio-based RAG that answers user questions using Japanese Wikipedia articles\n * [gpt4-autoeval](https:\u002F\u002Fgithub.com\u002Fnorthern-system-service\u002Fgpt4-autoeval) - A script that uses GPT-4 to automatically evaluate language model responses\n * [t5-japanese](https:\u002F\u002Fgithub.com\u002Fsonoisa\u002Ft5-japanese) - Japanese T5 model\n * [japanese_llm_eval](https:\u002F\u002Fgithub.com\u002Flightblue-tech\u002Fjapanese_llm_eval) - A repo for evaluating Japanese LLMs\n * [jmteb](https:\u002F\u002Fgithub.com\u002Fsbintuitions\u002Fjmteb) - The evaluation scripts of JMTEB (Japanese Massive Text Embedding Benchmark)\n * 
[pydomino](https:\u002F\u002Fgithub.com\u002Fdwangomediavillage\u002Fpydomino) - A tool for aligning phoneme labels to Japanese speech\n * [easynovelassistant](https:\u002F\u002Fgithub.com\u002Fzuntan03\u002Feasynovelassistant) - A simple novel-generation assistant built on LightChatAssistant-TypeB, a lightweight, unrestricted and uncensored local Japanese LLM. Its Generate forever mode, possible because it runs locally, keeps generating so you can collect the good outputs. Text-to-speech is also supported.\n * [clip-japanese](https:\u002F\u002Fgithub.com\u002Fsonoisa\u002Fclip-japanese) - Sample code for QLoRA instruction tuning on Japanese datasets\n * [rime-jaroomaji](https:\u002F\u002Fgithub.com\u002Flazyfoxchan\u002Frime-jaroomaji) - Japanese rōmaji input schema for Rime IME\n * [deep-question-generation](https:\u002F\u002Fgithub.com\u002Fsonoisa\u002Fdeep-question-generation) - Automatic quiz generation using deep learning (Japanese T5 model)\n * [magpie-nemotron](https:\u002F\u002Fgithub.com\u002Faratako\u002Fmagpie-nemotron) - Code for building a synthetic dialogue dataset using the Magpie method and Nemotron-4-340B-Instruct\n * [qlora_ja](https:\u002F\u002Fgithub.com\u002Fsosuke115\u002Fqlora_ja) - Sample code for QLoRA instruction tuning on Japanese datasets\n * [mozcdic-ut-jawiki](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-jawiki) - Mozc UT Jawiki Dictionary is a dictionary generated from the Japanese Wikipedia for Mozc.\n * [shisa-v2](https:\u002F\u002Fgithub.com\u002Fshisa-ai\u002Fshisa-v2) - Japanese \u002F English Bilingual LLM\n * [llm-translator](https:\u002F\u002Fgithub.com\u002Fhpprc\u002Fllm-translator) - Mixtral-based Ja-En (En-Ja) Translation model\n * [llm-jp-asr](https:\u002F\u002Fgithub.com\u002Ftosiyuki\u002Fllm-jp-asr) - Code for training a speech recognition model in which Whisper's decoder is replaced with llm-jp-1.3b-v1.0\n * [rag-japanese](https:\u002F\u002Fgithub.com\u002Fakimfromparis\u002Frag-japanese) - Open source RAG with Llama Index for Japanese LLM in low resource setting\n * [monaka](https:\u002F\u002Fgithub.com\u002Fkomiya-lab\u002Fmonaka) - A Japanese Parser (including historical Japanese)\n * [jp-translate.cloud](https:\u002F\u002Fgithub.com\u002Fmatthewbieda\u002Fjp-translate.cloud) - A state-of-the-art open-source Japanese \u003C--> English 
machine translation system based on the latest NMT research.\n * [substring-word-finder](https:\u002F\u002Fgithub.com\u002Ftoufu-24\u002Fsubstring-word-finder) - Determines whether contiguous substrings are words\n * [heron-vlm-leaderboard](https:\u002F\u002Fgithub.com\u002Fwandb\u002Fheron-vlm-leaderboard) - This project is a benchmarking tool for evaluating and comparing the performance of various Vision Language Models (VLMs). It uses two datasets: LLaVA-Bench-In-the-Wild and Japanese HERON Bench to measure model performance.\n * [text2dataset](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Ftext2dataset) - Easily turn large English text datasets into Japanese text datasets using open LLMs.\n * [mecab-web-api](https:\u002F\u002Fgithub.com\u002Fbungoume\u002Fmecab-web-api) - A Japanese morphological analysis web API using MeCab\n * [mecab_controller](https:\u002F\u002Fgithub.com\u002Fajatt-tools\u002Fmecab_controller) - Mecab wrapper to generate furigana readings.\n * [vits](https:\u002F\u002Fgithub.com\u002Fzassou65535\u002Fvits) - A text-to-speech reader and voice changer based on VITS\n * [akari_chatgpt_bot](https:\u002F\u002Fgithub.com\u002Fakarigroup\u002Fakari_chatgpt_bot) - A chatbot app that converses using speech recognition, text generation, and speech synthesis\n * [kudasai](https:\u002F\u002Fgithub.com\u002Fbikatr7\u002Fkudasai) - Streamlining Japanese-English Translation with Advanced Preprocessing and Integrated Translation Technologies\n * [mecab-visualizer](https:\u002F\u002Fgithub.com\u002Fsophiefy\u002Fmecab-visualizer) - A tool for visualizing MeCab morphological analysis results\n * [add-dictionary](https:\u002F\u002Fgithub.com\u002Fmassao000\u002Fadd-dictionary) - A GUI app for adding entries to the OpenJTalk user dictionary\n * [j-moshi](https:\u002F\u002Fgithub.com\u002Fnu-dialogue\u002Fj-moshi) - J-Moshi: A Japanese Full-duplex Spoken Dialogue System\n * [jatts](https:\u002F\u002Fgithub.com\u002Funilight\u002Fjatts) - JATTS: Japanese TTS (for research)\n * [tsukasa-speech](https:\u002F\u002Fgithub.com\u002Frespaired\u002Ftsukasa-speech) - a Frontier Japanese Speech Generation net\n * 
[symptom-expression-search](https:\u002F\u002Fgithub.com\u002Fpo3rin\u002Fsymptom-expression-search) - An experiment in semantic-structure search that absorbs variation in patient expressions, using Elasticsearch, GiNZA, and a patient expression dictionary\n * [llm-jp-judge](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-judge) - A Python tool for automatic evaluation of generated text\n * [asagi-vlm-colaboratory-sample](https:\u002F\u002Fgithub.com\u002Fkazuhito00\u002Fasagi-vlm-colaboratory-sample) - A sample for trying out Asagi (a large-scale Japanese VLM built with synthetic datasets) on Colaboratory\n * [llm-jp-eval-mm](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-eval-mm) - This tool automatically evaluates Japanese multi-modal large language models across multiple datasets.\n * [manga109api](https:\u002F\u002Fgithub.com\u002Fmanga109\u002Fmanga109api) - Simple python API to read annotation data of Manga109\n * [fastrtc-jp](https:\u002F\u002Fgithub.com\u002Froute250\u002Ffastrtc-jp) - An add-on kit providing Japanese TTS and STT for fastrtc\n * [whisper-transcription](https:\u002F\u002Fgithub.com\u002Ffumifumi0831\u002Fwhisper-transcription) - A Python speech transcription tool using the Whisper model\n * [pocket-researcher](https:\u002F\u002Fgithub.com\u002Fu-masao\u002Fpocket-researcher) - An autonomous research agent powered by LLMs, for quick information gathering and overviews.\n * [jtransbench](https:\u002F\u002Fgithub.com\u002Fwebbigdata-jp\u002Fjtransbench) - A tool to easily benchmark Japanese translation skills\n * [easyllasa](https:\u002F\u002Fgithub.com\u002Fzuntan03\u002Feasyllasa) - EasyLlasa is a TSTS (TextSpeechToSpeech) tool that generates Japanese speech from 5 to 15 seconds of Japanese audio plus Japanese text.\n * [kanjikana-model](https:\u002F\u002Fgithub.com\u002Fdigital-go-jp\u002Fkanjikana-model) - A model for matching the kanji and kana forms of personal names\n * [deep-openreview-research-ja](https:\u002F\u002Fgithub.com\u002Ftb-yasu\u002Fdeep-openreview-research-ja) - An AI agent with Japanese support that automatically discovers and analyzes OpenReview papers\n * [pitchbench](https:\u002F\u002Fgithub.com\u002Fshewiiii\u002Fpitchbench) - Experimental Japanese pitch accent based LLM Benchmark\n * 
[mini-transformer-from-scratch](https:\u002F\u002Fgithub.com\u002Fzuofanf\u002Fmini-transformer-from-scratch) - English to Japanese Transformer from scratch\n * [vv_core_inference](https:\u002F\u002Fgithub.com\u002Fhiroshiba\u002Fvv_core_inference) - Inference code for the deep learning models used inside the VOICEVOX core\n * [pyopenjtalk-plus](https:\u002F\u002Fgithub.com\u002Ftsukumijima\u002Fpyopenjtalk-plus) - pyopenjtalk-plus: A Python wrapper for OpenJTalk with additional improvements\n * [japanese_spelling_correction](https:\u002F\u002Fgithub.com\u002Fphkhanhtrinh23\u002Fjapanese_spelling_correction) - Japanese Spelling Correction\n * [py-kaomoji](https:\u002F\u002Fgithub.com\u002Fshibuiwilliam\u002Fpy-kaomoji) - python kaomoji\n * [llm-jp-vila](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-vila) - This repository contains the code for training llm-jp\u002Fllm-jp-3-vila-14b, modified from VILA repository.\n * [kanjivg-radical](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fkanjivg-radical) - kanjivg-radical\n * [japanese-wordnet-visualization](https:\u002F\u002Fgithub.com\u002FHemingwayLee\u002Fjapanese-wordnet-visualization) - This project visualizes the Japanese Wordnet (日本語ワードネット) with a web application built with Django\n * [piper-plus](https:\u002F\u002Fgithub.com\u002Fayutaz\u002Fpiper-plus) - Enhanced Piper TTS with Japanese support, WebAssembly, multi-GPU training, and quality improvements.\n * [Japanera](https:\u002F\u002Fgithub.com\u002Fnagataaaas\u002FJapanera) - Easy Tools for Japanese Era System\n * [bert-abstractive-text-summarization](https:\u002F\u002Fgithub.com\u002Fiwasakiyuuki\u002Fbert-abstractive-text-summarization) - Japanese Sentence Summarization with BERT\n * [kyujipy](https:\u002F\u002Fgithub.com\u002Fdrturnon\u002Fkyujipy) - A Python library to convert Japanese texts from Shinjitai (新字体) to Kyujitai (舊字體) and vice versa\n * [jitenbot](https:\u002F\u002Fgithub.com\u002Fkonstantindjairo\u002Fjitenbot) - Web crawler for creating personal copies of Japanese 
dictionaries\n * [ja-icd10](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fja-icd10) - A Python package for handling Japanese-language information from ICD-10 (International Classification of Diseases)\n * [pl-bert-vits2](https:\u002F\u002Fgithub.com\u002Ftonnetonne814\u002Fpl-bert-vits2) - VITS2 using Phoneme-Level Japanese BERT\n * [ndc_predictor](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndc_predictor) - The machine learning model of NDCPredictor (a pretrained fastText model that predicts Nippon Decimal Classification from bibliographic information)\n * [pfmt-bench-fin-ja](https:\u002F\u002Fgithub.com\u002Fpfnet-research\u002Fpfmt-bench-fin-ja) - pfmt-bench-fin-ja: Preferred Multi-turn Benchmark for Finance in Japanese\n * [marine-plus](https:\u002F\u002Fgithub.com\u002Ftsukumijima\u002Fmarine-plus) - MARINE : Multi-task leaRnIng-based JapaNese accent Estimation (also supports Windows)\n * [ja-tokenizer-benchmark](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fja-tokenizer-benchmark) - Compare the speed of various Japanese tokenizers in Python.\n * [yat](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fyat) - yat: Yet Another Tokenizer for Japanese NLP\n * [igakuqa119](https:\u002F\u002Fgithub.com\u002Fdocto-rin\u002Figakuqa119) - Evaluating LLMs on the 119th Japanese Medical Licensing Examination\n * [japanese-luw-tokenizer](https:\u002F\u002Fgithub.com\u002Fkoichiyasuoka\u002Fjapanese-luw-tokenizer) - Japanese Long-Unit-Word Tokenizer with RemBertTokenizerFast of Transformers\n * [ibus-jig](https:\u002F\u002Fgithub.com\u002Fy-koj\u002Fibus-jig) - ibus-jig: Japanese-language Input-method using GPT-4\n * [jp-stopword-filter](https:\u002F\u002Fgithub.com\u002FBrambleXu\u002Fjp-stopword-filter) - A lightweight Python library designed to filter stopwords from Japanese text based on customizable rules.\n * [yasumail](https:\u002F\u002Fgithub.com\u002Fterallite\u002Fyasumail) - Synthetic Japanese business email generator for ML training data\n * [himotoki](https:\u002F\u002Fgithub.com\u002Fmsr2903\u002Fhimotoki) - A Python-based Japanese Tokenizer, Dictionary, Morphological Analyzer and Romanization Tool. 
Based on JMDict for Language Learning.\n * [diafill-toolkit](https:\u002F\u002Fgithub.com\u002Fsbintuitions\u002Fdiafill-toolkit) - A toolkit for synthesizing filler-rich, short-utterance Japanese dialogue scripts for speech-based interaction using Large Language Models (LLMs). This project is designed to generate data in two phases: Seed Generation (metadata creation) and Dialogue Generation (script creation).\n * [eval_vertical_ja](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Feval_vertical_ja) - Evaluating Multimodal Large Language Models on Vertically Written Japanese Text\n * [jp-llm-corpus-pii-filter](https:\u002F\u002Fgithub.com\u002Fmatsuolab\u002Fjp-llm-corpus-pii-filter) - This code filters special care-required personal information (要配慮個人情報), the category of personal information that demands particular care, out of training corpora for large language models (LLMs).\n * [Novel2DialCorpus](https:\u002F\u002Fgithub.com\u002Fganbon\u002FNovel2DialCorpus) - A method for building a chit-chat dialogue corpus from novel text\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [namedivider-python](https:\u002F\u002Fgithub.com\u002Frskmoi\u002Fnamedivider-python) | 📥 730 | 📦 82k | ⭐ 251 | 🟡 november 2025|\n| 🔗 [asa-python](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fasa-python) | 📥 36 | 📦 31k | ⭐ 11 | 🔴 february 2019|\n| 🔗 [python_asa](https:\u002F\u002Fgithub.com\u002FTakeuchi-Lab-LM\u002Fpython_asa) | - | - | ⭐ 22 | 🔴 january 2020|\n| 🔗 [toiro](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Ftoiro) | 📥 13 | 📦 27k | ⭐ 121 | 🟡 november 2025|\n| 🔗 [ja-timex](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fja-timex) | 📥 551 | 📦 93k | ⭐ 140 | 🔴 november 2023|\n| 🔗 [JapaneseTokenizers](https:\u002F\u002Fgithub.com\u002FKensuke-Mitsuzawa\u002FJapaneseTokenizers) | - | - | ⭐ 137 | 🔴 march 2019|\n| 🔗 [daaja](https:\u002F\u002Fgithub.com\u002Fkajyuuen\u002Fdaaja) | 📥 66 | 📦 25k | ⭐ 64 | 🔴 february 2023|\n| 🔗 
[accel-brain-code](https:\u002F\u002Fgithub.com\u002Faccel-brain\u002Faccel-brain-code) | 📥 251 | 📦 150k | ⭐ 323 | 🔴 december 2023|\n| 🔗 [JGLUE](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002FJGLUE) | - | - | ⭐ 338 | 🔴 march 2025|\n| 🔗 [kyoto-reader](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fkyoto-reader) | 📥 64 | 📦 52k | ⭐ 10 | 🔴 june 2024|\n| 🔗 [nlplot](https:\u002F\u002Fgithub.com\u002Ftakapy0210\u002Fnlplot) | 📥 212 | 📦 109k | ⭐ 238 | 🔴 september 2022|\n| 🔗 [rake-ja](https:\u002F\u002Fgithub.com\u002Fkanjirz50\u002Frake-ja) | - | - | ⭐ 21 | 🔴 october 2018|\n| 🔗 [jel](https:\u002F\u002Fgithub.com\u002Fizuna385\u002Fjel) | 📥 13 | 📦 8k | ⭐ 11 | 🔴 july 2021|\n| 🔗 [MedNER-J](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FMedNER-J) | - | - | ⭐ 18 | 🔴 may 2022|\n| 🔗 [zunda-python](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fzunda-python) | 📥 10 | 📦 6k | ⭐ 10 | 🔴 november 2019|\n| 🔗 [AIO2_DPR_baseline](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FAIO2_DPR_baseline) | - | - | ⭐ 16 | 🔴 january 2022|\n| 🔗 [showcase](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fshowcase) | 📥 4 | 📦 7k | ⭐ 6 | 🔴 june 2018|\n| 🔗 [darts-clone-python](https:\u002F\u002Fgithub.com\u002Frixwew\u002Fdarts-clone-python) | 📥 3k | 📦 9M | ⭐ 20 | 🔴 april 2022|\n| 🔗 [jrte-corpus_example](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fjrte-corpus_example) | - | - | ⭐ 3 | 🔴 november 2021|\n| 🔗 [desuwa](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fdesuwa) | 📥 18 | 📦 10k | ⭐ 6 | 🔴 may 2022|\n| 🔗 [HotPepperGourmetDialogue](https:\u002F\u002Fgithub.com\u002FHironsan\u002FHotPepperGourmetDialogue) | - | - | ⭐ 277 | 🔴 may 2016|\n| 🔗 [nlp-recipes-ja](https:\u002F\u002Fgithub.com\u002Fupura\u002Fnlp-recipes-ja) | - | - | ⭐ 66 | 🔴 april 2021|\n| 🔗 [Japanese_nlp_scripts](https:\u002F\u002Fgithub.com\u002Folsgaard\u002FJapanese_nlp_scripts) | - | - | ⭐ 26 | 🔴 june 2019|\n| 🔗 [DNorm-J](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FDNorm-J) | - | - | ⭐ 9 | 🔴 june 2022|\n| 🔗 
[pyknp-eventgraph](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fpyknp-eventgraph) | 📥 86 | 📦 66k | ⭐ 9 | 🔴 september 2022|\n| 🔗 [ishi](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fishi) | 📥 6 | 📦 6k | ⭐ 2 | 🔴 may 2020|\n| 🔗 [python-npylm](https:\u002F\u002Fgithub.com\u002Fmusyoku\u002Fpython-npylm) | - | - | ⭐ 34 | 🔴 january 2019|\n| 🔗 [python-npycrf](https:\u002F\u002Fgithub.com\u002Fmusyoku\u002Fpython-npycrf) | - | - | ⭐ 11 | 🔴 march 2018|\n| 🔗 [unsupervised-pos-tagging](https:\u002F\u002Fgithub.com\u002Fmusyoku\u002Funsupervised-pos-tagging) | - | - | ⭐ 16 | 🔴 october 2017|\n| 🔗 [negima](https:\u002F\u002Fgithub.com\u002Fcocodrips\u002Fnegima) | 📥 17 | 📦 16k | ⭐ 14 | 🔴 august 2018|\n| 🔗 [YouyakuMan](https:\u002F\u002Fgithub.com\u002Fneilctwu\u002FYouyakuMan) | - | - | ⭐ 52 | 🔴 september 2020|\n| 🔗 [japanese-numbers-python](https:\u002F\u002Fgithub.com\u002Ftakumakanari\u002Fjapanese-numbers-python) | 📥 1k | 📦 2M | ⭐ 21 | 🔴 april 2020|\n| 🔗 [kantan](https:\u002F\u002Fgithub.com\u002Fitayperl\u002Fkantan) | - | - | ⭐ 8 | 🔴 october 2024|\n| 🔗 [make-meidai-dialogue](https:\u002F\u002Fgithub.com\u002Fknok\u002Fmake-meidai-dialogue) | - | - | ⭐ 40 | 🔴 september 2017|\n| 🔗 [japanese_summarizer](https:\u002F\u002Fgithub.com\u002Fryuryukke\u002Fjapanese_summarizer) | - | - | ⭐ 10 | 🔴 august 2022|\n| 🔗 [chirptext](https:\u002F\u002Fgithub.com\u002Fletuananh\u002Fchirptext) | 📥 6k | 📦 212k | ⭐ 7 | 🔴 october 2022|\n| 🔗 [yubin](https:\u002F\u002Fgithub.com\u002Falvations\u002Fyubin) | 📥 7 | 📦 3k | ⭐ 3 | 🔴 october 2019|\n| 🔗 [jawiki-cleaner](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fjawiki-cleaner) | 📥 34 | 📦 24k | ⭐ 6 | 🔴 february 2021|\n| 🔗 [japanese2phoneme](https:\u002F\u002Fgithub.com\u002Fiory\u002Fjapanese2phoneme) | 📥 5 | 📦 4k | ⭐ 1 | 🔴 february 2022|\n| 🔗 [anlp_nlp2021_d3-1](https:\u002F\u002Fgithub.com\u002Farusl\u002Fanlp_nlp2021_d3-1) | - | - | ⭐ 1 | 🔴 march 2022|\n| 🔗 
[aozora_classification](https:\u002F\u002Fgithub.com\u002Fshibuiwilliam\u002Faozora_classification) | - | - | ⭐ 11 | 🔴 september 2017|\n| 🔗 [aozora-corpus-generator](https:\u002F\u002Fgithub.com\u002Fborh\u002Faozora-corpus-generator) | - | - | ⭐ 8 | 🟡 june 2025|\n| 🔗 [JLM](https:\u002F\u002Fgithub.com\u002Fjiali-ms\u002FJLM) | - | - | ⭐ 111 | 🔴 june 2019|\n| 🔗 [NTM](https:\u002F\u002Fgithub.com\u002Fm3yrin\u002FNTM) | - | - | ⭐ 13 | 🔴 july 2019|\n| 🔗 [EN-JP-ML-Lexicon](https:\u002F\u002Fgithub.com\u002FMachine-Learning-Tokyo\u002FEN-JP-ML-Lexicon) | - | - | ⭐ 40 | 🔴 march 2021|\n| 🔗 [text-generation](https:\u002F\u002Fgithub.com\u002Fdiscus0434\u002Ftext-generation) | - | - | ⭐ invalid | 🟡 august 2025|\n| 🔗 [chainer_nic](https:\u002F\u002Fgithub.com\u002Fyuyay\u002Fchainer_nic) | - | - | ⭐ 17 | 🔴 december 2018|\n| 🔗 [unihan-lm](https:\u002F\u002Fgithub.com\u002FJetRunner\u002Funihan-lm) | - | - | ⭐ 2 | 🔴 november 2020|\n| 🔗 [mbart-finetuning](https:\u002F\u002Fgithub.com\u002Fken11\u002Fmbart-finetuning) | - | - | ⭐ 3 | 🔴 october 2021|\n| 🔗 [xvector_jtubespeech](https:\u002F\u002Fgithub.com\u002Fsarulab-speech\u002Fxvector_jtubespeech) | - | - | ⭐ 47 | 🔴 november 2023|\n| 🔗 [TinySegmenterMaker](https:\u002F\u002Fgithub.com\u002Fshogo82148\u002FTinySegmenterMaker) | - | - | ⭐ 72 | 🔴 september 2022|\n| 🔗 [Grongish](https:\u002F\u002Fgithub.com\u002Fshogo82148\u002FGrongish) | - | - | ⭐ 25 | 🟡 december 2025|\n| 🔗 [WordCloud-Japanese](https:\u002F\u002Fgithub.com\u002Faocattleya\u002FWordCloud-Japanese) | - | - | ⭐ 9 | 🔴 january 2020|\n| 🔗 [snark](https:\u002F\u002Fgithub.com\u002Fhiraokusky\u002Fsnark) | - | - | ⭐ 11 | 🔴 march 2020|\n| 🔗 [toEmoji](https:\u002F\u002Fgithub.com\u002Fmkan0141\u002FtoEmoji) | - | - | ⭐ 4 | 🔴 april 2018|\n| 🔗 [termextract](https:\u002F\u002Fgithub.com\u002Fkanjirz50\u002Ftermextract) | - | - | ⭐ 18 | 🔴 september 2018|\n| 🔗 [JDT-with-KenLM-scoring](https:\u002F\u002Fgithub.com\u002FTUT-SLP-lab\u002FJDT-with-KenLM-scoring) | - | - | ⭐ 1 | 🔴 
july 2022|\n| 🔗 [mixture-of-unigram-model](https:\u002F\u002Fgithub.com\u002FKentoW\u002Fmixture-of-unigram-model) | - | - | ⭐ 6 | 🔴 june 2017|\n| 🔗 [hidden-markov-model](https:\u002F\u002Fgithub.com\u002FKentoW\u002Fhidden-markov-model) | - | - | ⭐ 5 | 🔴 june 2017|\n| 🔗 [Ngram-language-model](https:\u002F\u002Fgithub.com\u002FKentoW\u002FNgram-language-model) | - | - | ⭐ 5 | 🔴 december 2017|\n| 🔗 [ASRDeepSpeech](https:\u002F\u002Fgithub.com\u002FJeanMaximilienCadic\u002FASRDeepSpeech) | - | - | ⭐ 69 | 🔴 september 2022|\n| 🔗 [neural_ime](https:\u002F\u002Fgithub.com\u002Fyohokuno\u002Fneural_ime) | - | - | ⭐ 67 | 🔴 december 2016|\n| 🔗 [neural_japanese_transliterator](https:\u002F\u002Fgithub.com\u002FKyubyong\u002Fneural_japanese_transliterator) | - | - | ⭐ 178 | 🔴 september 2017|\n| 🔗 [tinysegmenter](https:\u002F\u002Fgithub.com\u002FSamuraiT\u002Ftinysegmenter) | 📥 112k | 📦 173k | ⭐ repo not found | 🔴 november 2015|\n| 🔗 [AugLy-jp](https:\u002F\u002Fgithub.com\u002Fchck\u002FAugLy-jp) | 📥 85 | 📦 30k | ⭐ 7 | 🔴 september 2021|\n| 🔗 [furigana4epub](https:\u002F\u002Fgithub.com\u002FMumumu4\u002Ffurigana4epub) | 📥 22 | 📦 12k | ⭐ 29 | 🔴 september 2021|\n| 🔗 [PyKatsuyou](https:\u002F\u002Fgithub.com\u002FSmashinFries\u002FPyKatsuyou) | 📥 93 | 📦 20k | ⭐ 12 | 🔴 march 2025|\n| 🔗 [jageocoder](https:\u002F\u002Fgithub.com\u002Ft-sagara\u002Fjageocoder) | 📥 4k | 📦 354k | ⭐ 95 | 🟢 last tuesday|\n| 🔗 [pygeonlp](https:\u002F\u002Fgithub.com\u002Fgeonlp-platform\u002Fpygeonlp) | 📥 70 | 📦 22k | ⭐ 22 | 🟢 march|\n| 🔗 [nksnd](https:\u002F\u002Fgithub.com\u002Fyoriyuki\u002Fnksnd) | - | - | ⭐ 26 | 🔴 may 2018|\n| 🔗 [JaMIE](https:\u002F\u002Fgithub.com\u002Fracerandom\u002FJaMIE) | - | - | ⭐ 9 | 🟢 march|\n| 🔗 [fasttext-vs-word2vec-on-twitter-data](https:\u002F\u002Fgithub.com\u002FGINK03\u002Ffasttext-vs-word2vec-on-twitter-data) | - | - | ⭐ 48 | 🔴 august 2017|\n| 🔗 [minimal-search-engine](https:\u002F\u002Fgithub.com\u002FGINK03\u002Fminimal-search-engine) | - | - | ⭐ 19 | 🔴 july 
2019|\n| 🔗 [5ch-analysis](https:\u002F\u002Fgithub.com\u002FGINK03\u002F5ch-analysis) | - | - | ⭐ 75 | 🔴 november 2018|\n| 🔗 [tweet_extructor](https:\u002F\u002Fgithub.com\u002FtatHi\u002Ftweet_extructor) | - | - | ⭐ 3 | 🔴 august 2022|\n| 🔗 [japanese-word-aggregation](https:\u002F\u002Fgithub.com\u002Fhkiyomaru\u002Fjapanese-word-aggregation) | - | - | ⭐ 2 | 🔴 august 2018|\n| 🔗 [jinf](https:\u002F\u002Fgithub.com\u002Fhkiyomaru\u002Fjinf) | 📥 619 | 📦 56k | ⭐ 4 | 🔴 december 2022|\n| 🔗 [kwja](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fkwja) | 📥 340 | 📦 57k | ⭐ 141 | 🟡 august 2025|\n| 🔗 [mlm-scoring-transformers](https:\u002F\u002Fgithub.com\u002FRyutaro-A\u002Fmlm-scoring-transformers) | - | - | ⭐ 6 | 🔴 december 2022|\n| 🔗 [ClipCap-for-Japanese](https:\u002F\u002Fgithub.com\u002FJapanese-Image-Captioning\u002FClipCap-for-Japanese) | - | - | ⭐ 12 | 🔴 october 2022|\n| 🔗 [SAT-for-Japanese](https:\u002F\u002Fgithub.com\u002FJapanese-Image-Captioning\u002FSAT-for-Japanese) | - | - | ⭐ 2 | 🔴 october 2022|\n| 🔗 [cihai](https:\u002F\u002Fgithub.com\u002Fcihai\u002Fcihai) | 📥 833 | 📦 213k | ⭐ 93 | 🟢 today|\n| 🔗 [marine](https:\u002F\u002Fgithub.com\u002F6gsn\u002Fmarine) | 📥 43 | 📦 15k | ⭐ 36 | 🔴 september 2022|\n| 🔗 [whisper-asr-finetune](https:\u002F\u002Fgithub.com\u002Fsarulab-speech\u002Fwhisper-asr-finetune) | - | - | ⭐ 32 | 🔴 december 2022|\n| 🔗 [japanese_chatbot](https:\u002F\u002Fgithub.com\u002FCjangCjengh\u002Fjapanese_chatbot) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [radicalchar](https:\u002F\u002Fgithub.com\u002Fyamamaya\u002Fradicalchar) | - | - | ⭐ 9 | 🔴 december 2022|\n| 🔗 [akaza](https:\u002F\u002Fgithub.com\u002Ftokuhirom\u002Fakaza) | - | - | ⭐ 249 | 🟢 yesterday|\n| 🔗 [posuto](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fposuto) | 📥 6k | 📦 696k | ⭐ 226 | 🟢 last wednesday|\n| 🔗 [tacotron2-japanese](https:\u002F\u002Fgithub.com\u002FCjangCjengh\u002Ftacotron2-japanese) | - | - | ⭐ 269 | 🔴 september 2022|\n| 🔗 
[ibus-hiragana](https:\u002F\u002Fgithub.com\u002Fesrille\u002Fibus-hiragana) | - | - | ⭐ 78 | 🟢 march|\n| 🔗 [furiganapad](https:\u002F\u002Fgithub.com\u002Fesrille\u002Ffuriganapad) | - | - | ⭐ 19 | 🟡 april 2025|\n| 🔗 [chikkarpy](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002Fchikkarpy) | 📥 418 | 📦 60k | ⭐ 55 | 🔴 february 2022|\n| 🔗 [ja-tokenizer-docker-py](https:\u002F\u002Fgithub.com\u002Fp-geon\u002Fja-tokenizer-docker-py) | - | - | ⭐ 36 | 🔴 may 2022|\n| 🔗 [JapaneseEmbeddingEval](https:\u002F\u002Fgithub.com\u002Foshizo\u002FJapaneseEmbeddingEval) | - | - | ⭐ 183 | 🔴 october 2024|\n| 🔗 [gptuber-by-langchain](https:\u002F\u002Fgithub.com\u002Fkarakuri-ai\u002Fgptuber-by-langchain) | - | - | ⭐ 63 | 🔴 january 2023|\n| 🔗 [shuwa](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fshuwa) | - | - | ⭐ 146 | 🔴 december 2022|\n| 🔗 [japanese-nli-model](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002Fjapanese-nli-model) | - | - | ⭐ 6 | 🔴 october 2022|\n| 🔗 [tra-fugu](https:\u002F\u002Fgithub.com\u002Ftos-kamiya\u002Ftra-fugu) | - | - | ⭐ 6 | 🔴 march 2023|\n| 🔗 [fugumt](https:\u002F\u002Fgithub.com\u002Fs-taka\u002Ffugumt) | - | - | ⭐ 64 | 🔴 february 2021|\n| 🔗 [JaSPICE](https:\u002F\u002Fgithub.com\u002Fkeio-smilab23\u002FJaSPICE) | 📥 4 | 📦 2k | ⭐ 9 | 🔴 november 2023|\n| 🔗 [Retrieval-based-Voice-Conversion-WebUI-JP-localization](https:\u002F\u002Fgithub.com\u002Fyantaisa11\u002FRetrieval-based-Voice-Conversion-WebUI-JP-localization) | - | - | ⭐ 48 | 🔴 april 2023|\n| 🔗 [pyopenjtalk](https:\u002F\u002Fgithub.com\u002Fr9y9\u002Fpyopenjtalk) | 📥 19k | 📦 1M | ⭐ 249 | 🟡 april 2025|\n| 🔗 [yomigana-ebook](https:\u002F\u002Fgithub.com\u002Frabbit19981023\u002Fyomigana-ebook) | 📥 22 | 📦 7k | ⭐ 26 | 🔴 february 2024|\n| 🔗 [N46Whisper](https:\u002F\u002Fgithub.com\u002FAyanaminn\u002FN46Whisper) | - | - | ⭐ 1.7k | 🔴 february 2025|\n| 🔗 [japanese_llm_simple_webui](https:\u002F\u002Fgithub.com\u002Fnoir55\u002Fjapanese_llm_simple_webui) | - | - | ⭐ 17 | 🔴 may 2024|\n| 🔗 
[pdf-translator](https:\u002F\u002Fgithub.com\u002Fdiscus0434\u002Fpdf-translator) | - | - | ⭐ 339 | 🔴 may 2024|\n| 🔗 [japanese_qa_demo_with_haystack_and_es](https:\u002F\u002Fgithub.com\u002FShingo-Kamata\u002Fjapanese_qa_demo_with_haystack_and_es) | - | - | ⭐ 1 | 🔴 december 2022|\n| 🔗 [mozc-devices](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmozc-devices) | - | - | ⭐ 2.7k | 🟡 november 2025|\n| 🔗 [natsume](https:\u002F\u002Fgithub.com\u002Ffaruzan0820\u002Fnatsume) | 📥 0 | 📦 3k | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [vits-japros-webui](https:\u002F\u002Fgithub.com\u002Flitagin02\u002Fvits-japros-webui) | - | - | ⭐ 42 | 🔴 january 2024|\n| 🔗 [ja-law-parser](https:\u002F\u002Fgithub.com\u002Ftakuyaa\u002Fja-law-parser) | - | - | ⭐ 25 | 🔴 january 2024|\n| 🔗 [dictation-kit](https:\u002F\u002Fgithub.com\u002Fjulius-speech\u002Fdictation-kit) | - | - | ⭐ 164 | 🔴 april 2019|\n| 🔗 [julius4seg](https:\u002F\u002Fgithub.com\u002FHiroshiba\u002Fjulius4seg) | - | - | ⭐ 7 | 🔴 august 2021|\n| 🔗 [voicevox_engine](https:\u002F\u002Fgithub.com\u002FVOICEVOX\u002Fvoicevox_engine) | - | - | ⭐ 1.7k | 🟢 last wednesday|\n| 🔗 [LLaVA-JP](https:\u002F\u002Fgithub.com\u002Ftosiyuki\u002FLLaVA-JP) | - | - | ⭐ 64 | 🔴 june 2024|\n| 🔗 [RAG-Japanese](https:\u002F\u002Fgithub.com\u002FAkimParis\u002FRAG-Japanese) | - | - | ⭐ 10 | 🟡 may 2025|\n| 🔗 [bertjsc](https:\u002F\u002Fgithub.com\u002Fer-ri\u002Fbertjsc) | - | - | ⭐ 14 | 🔴 august 2024|\n| 🔗 [llm-leaderboard](https:\u002F\u002Fgithub.com\u002Fwandb\u002Fllm-leaderboard) | - | - | ⭐ 92 | 🟡 september 2025|\n| 🔗 [jglue-evaluation-scripts](https:\u002F\u002Fgithub.com\u002Fnobu-g\u002Fjglue-evaluation-scripts) | - | - | ⭐ 18 | 🟢 last thursday|\n| 🔗 [BLIP2-Japanese](https:\u002F\u002Fgithub.com\u002FZhaoPeiduo\u002FBLIP2-Japanese) | - | - | ⭐ 13 | 🟡 september 2025|\n| 🔗 [wikipedia-passages-jawiki-embeddings-utils](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Fwikipedia-passages-jawiki-embeddings-utils) | - | - | ⭐ 11 | 🔴 march 2024|\n| 
🔗 [simple-simcse-ja](https:\u002F\u002Fgithub.com\u002Fhpprc\u002Fsimple-simcse-ja) | - | - | ⭐ 69 | 🔴 october 2023|\n| 🔗 [wikipedia-japanese-open-rag](https:\u002F\u002Fgithub.com\u002Flawofcycles\u002Fwikipedia-japanese-open-rag) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [gpt4-autoeval](https:\u002F\u002Fgithub.com\u002Fnorthern-system-service\u002Fgpt4-autoeval) | - | - | ⭐ 16 | 🔴 june 2024|\n| 🔗 [t5-japanese](https:\u002F\u002Fgithub.com\u002Fsonoisa\u002Ft5-japanese) | - | - | ⭐ 118 | 🟡 september 2025|\n| 🔗 [japanese_llm_eval](https:\u002F\u002Fgithub.com\u002Flightblue-tech\u002Fjapanese_llm_eval) | - | - | ⭐ 5 | 🔴 invalid|\n| 🔗 [jmteb](https:\u002F\u002Fgithub.com\u002Fsbintuitions\u002Fjmteb) | - | - | ⭐ 89 | 🟢 march|\n| 🔗 [pydomino](https:\u002F\u002Fgithub.com\u002Fdwangomediavillage\u002Fpydomino) | - | - | ⭐ 39 | 🟡 august 2025|\n| 🔗 [easynovelassistant](https:\u002F\u002Fgithub.com\u002Fzuntan03\u002Feasynovelassistant) | - | - | ⭐ 222 | 🔴 july 2024|\n| 🔗 [clip-japanese](https:\u002F\u002Fgithub.com\u002Fsonoisa\u002Fclip-japanese) | - | - | ⭐ 13 | 🟡 september 2025|\n| 🔗 [rime-jaroomaji](https:\u002F\u002Fgithub.com\u002Flazyfoxchan\u002Frime-jaroomaji) | - | - | ⭐ 48 | 🟢 last thursday|\n| 🔗 [deep-question-generation](https:\u002F\u002Fgithub.com\u002Fsonoisa\u002Fdeep-question-generation) | - | - | ⭐ 12 | 🔴 march 2023|\n| 🔗 [magpie-nemotron](https:\u002F\u002Fgithub.com\u002Faratako\u002Fmagpie-nemotron) | - | - | ⭐ 9 | 🔴 july 2024|\n| 🔗 [qlora_ja](https:\u002F\u002Fgithub.com\u002Fsosuke115\u002Fqlora_ja) | - | - | ⭐ 1 | 🔴 july 2024|\n| 🔗 [mozcdic-ut-jawiki](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-jawiki) | - | - | ⭐ 28 | 🟢 last thursday|\n| 🔗 [shisa-v2](https:\u002F\u002Fgithub.com\u002Fshisa-ai\u002Fshisa-v2) | - | - | ⭐ 28 | 🟡 december 2025|\n| 🔗 [llm-translator](https:\u002F\u002Fgithub.com\u002Fhpprc\u002Fllm-translator) | - | - | ⭐ 20 | 🔴 january 2025|\n| 🔗 
[llm-jp-asr](https:\u002F\u002Fgithub.com\u002Ftosiyuki\u002Fllm-jp-asr) | - | - | ⭐ 9 | 🔴 september 2024|\n| 🔗 [rag-japanese](https:\u002F\u002Fgithub.com\u002Fakimfromparis\u002Frag-japanese) | - | - | ⭐ 10 | 🟡 may 2025|\n| 🔗 [monaka](https:\u002F\u002Fgithub.com\u002Fkomiya-lab\u002Fmonaka) | - | - | ⭐ 5 | 🔴 january 2025|\n| 🔗 [jp-translate.cloud](https:\u002F\u002Fgithub.com\u002Fmatthewbieda\u002Fjp-translate.cloud) | - | - | ⭐ 3 | 🔴 september 2024|\n| 🔗 [substring-word-finder](https:\u002F\u002Fgithub.com\u002Ftoufu-24\u002Fsubstring-word-finder) | - | - | ⭐ 4 | 🟡 november 2025|\n| 🔗 [heron-vlm-leaderboard](https:\u002F\u002Fgithub.com\u002Fwandb\u002Fheron-vlm-leaderboard) | - | - | ⭐ 6 | 🔴 december 2024|\n| 🔗 [text2dataset](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Ftext2dataset) | - | - | ⭐ 28 | 🔴 january 2025|\n| 🔗 [mecab-web-api](https:\u002F\u002Fgithub.com\u002Fbungoume\u002Fmecab-web-api) | - | - | ⭐ 40 | 🔴 july 2022|\n| 🔗 [mecab_controller](https:\u002F\u002Fgithub.com\u002Fajatt-tools\u002Fmecab_controller) | - | - | ⭐ 19 | 🟢 march|\n| 🔗 [vits](https:\u002F\u002Fgithub.com\u002Fzassou65535\u002Fvits) | - | - | ⭐ 92 | 🔴 february 2023|\n| 🔗 [akari_chatgpt_bot](https:\u002F\u002Fgithub.com\u002Fakarigroup\u002Fakari_chatgpt_bot) | - | - | ⭐ 48 | 🟡 october 2025|\n| 🔗 [kudasai](https:\u002F\u002Fgithub.com\u002Fbikatr7\u002Fkudasai) | - | - | ⭐ 26 | 🟡 june 2025|\n| 🔗 [mecab-visualizer](https:\u002F\u002Fgithub.com\u002Fsophiefy\u002Fmecab-visualizer) | - | - | ⭐ 2 | 🔴 september 2023|\n| 🔗 [add-dictionary](https:\u002F\u002Fgithub.com\u002Fmassao000\u002Fadd-dictionary) | - | - | ⭐ 3 | 🟡 october 2025|\n| 🔗 [j-moshi](https:\u002F\u002Fgithub.com\u002Fnu-dialogue\u002Fj-moshi) | - | - | ⭐ 305 | 🟡 june 2025|\n| 🔗 [jatts](https:\u002F\u002Fgithub.com\u002Funilight\u002Fjatts) | - | - | ⭐ 44 | 🟢 march|\n| 🔗 [tsukasa-speech](https:\u002F\u002Fgithub.com\u002Frespaired\u002Ftsukasa-speech) | - | - | ⭐ 63 | 🟡 may 2025|\n| 🔗 
[symptom-expression-search](https:\u002F\u002Fgithub.com\u002Fpo3rin\u002Fsymptom-expression-search) | - | - | ⭐ 2 | 🔴 february 2021|\n| 🔗 [llm-jp-judge](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-judge) | - | - | ⭐ 40 | 🟡 december 2025|\n| 🔗 [asagi-vlm-colaboratory-sample](https:\u002F\u002Fgithub.com\u002Fkazuhito00\u002Fasagi-vlm-colaboratory-sample) | - | - | ⭐ 1 | 🔴 march 2025|\n| 🔗 [llm-jp-eval-mm](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-eval-mm) | - | - | ⭐ 41 | 🟢 january|\n| 🔗 [manga109api](https:\u002F\u002Fgithub.com\u002Fmanga109\u002Fmanga109api) | 📥 190 | 📦 46k | ⭐ 129 | 🔴 march 2022|\n| 🔗 [fastrtc-jp](https:\u002F\u002Fgithub.com\u002Froute250\u002Ffastrtc-jp) | - | - | ⭐ 5 | 🟡 may 2025|\n| 🔗 [whisper-transcription](https:\u002F\u002Fgithub.com\u002Ffumifumi0831\u002Fwhisper-transcription) | - | - | ⭐ 17 | 🟢 january|\n| 🔗 [pocket-researcher](https:\u002F\u002Fgithub.com\u002Fu-masao\u002Fpocket-researcher) | - | - | ⭐ 10 | 🟡 april 2025|\n| 🔗 [jtransbench](https:\u002F\u002Fgithub.com\u002Fwebbigdata-jp\u002Fjtransbench) | - | - | ⭐ 13 | 🟡 october 2025|\n| 🔗 [easyllasa](https:\u002F\u002Fgithub.com\u002Fzuntan03\u002Feasyllasa) | - | - | ⭐ 25 | 🟡 september 2025|\n| 🔗 [kanjikana-model](https:\u002F\u002Fgithub.com\u002Fdigital-go-jp\u002Fkanjikana-model) | - | - | ⭐ 114 | 🟡 december 2025|\n| 🔗 [deep-openreview-research-ja](https:\u002F\u002Fgithub.com\u002Ftb-yasu\u002Fdeep-openreview-research-ja) | - | - | ⭐ 13 | 🟡 november 2025|\n| 🔗 [pitchbench](https:\u002F\u002Fgithub.com\u002Fshewiiii\u002Fpitchbench) | - | - | ⭐ 1 | 🟢 february|\n| 🔗 [mini-transformer-from-scratch](https:\u002F\u002Fgithub.com\u002Fzuofanf\u002Fmini-transformer-from-scratch) | - | - | ⭐ 2 | 🟡 november 2025|\n| 🔗 [vv_core_inference](https:\u002F\u002Fgithub.com\u002Fhiroshiba\u002Fvv_core_inference) | - | - | ⭐ 31 | 🟡 december 2025|\n| 🔗 
[pyopenjtalk-plus](https:\u002F\u002Fgithub.com\u002Ftsukumijima\u002Fpyopenjtalk-plus) | 📥 24k | 📦 456k | ⭐ 56 | 🔴 invalid|\n| 🔗 [japanese_spelling_correction](https:\u002F\u002Fgithub.com\u002Fphkhanhtrinh23\u002Fjapanese_spelling_correction) | - | - | ⭐ 14 | 🔴 september 2023|\n| 🔗 [py-kaomoji](https:\u002F\u002Fgithub.com\u002Fshibuiwilliam\u002Fpy-kaomoji) | 📥 28 | 📦 37k | ⭐ 6 | 🔴 december 2018|\n| 🔗 [llm-jp-vila](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-vila) | - | - | ⭐ 10 | 🟡 august 2025|\n| 🔗 [kanjivg-radical](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fkanjivg-radical) | - | - | ⭐ 106 | 🔴 august 2018|\n| 🔗 [japanese-wordnet-visualization](https:\u002F\u002Fgithub.com\u002FHemingwayLee\u002Fjapanese-wordnet-visualization) | - | - | ⭐ 3 | 🔴 november 2022|\n| 🔗 [piper-plus](https:\u002F\u002Fgithub.com\u002Fayutaz\u002Fpiper-plus) | - | - | ⭐ 106 | 🟢 today|\n| 🔗 [Japanera](https:\u002F\u002Fgithub.com\u002Fnagataaaas\u002FJapanera) | 📥 3k | 📦 366k | ⭐ 35 | 🟡 june 2025|\n| 🔗 [bert-abstractive-text-summarization](https:\u002F\u002Fgithub.com\u002Fiwasakiyuuki\u002Fbert-abstractive-text-summarization) | - | - | ⭐ 49 | 🔴 december 2019|\n| 🔗 [kyujipy](https:\u002F\u002Fgithub.com\u002Fdrturnon\u002Fkyujipy) | 📥 25 | 📦 23k | ⭐ 22 | 🟢 january|\n| 🔗 [jitenbot](https:\u002F\u002Fgithub.com\u002Fkonstantindjairo\u002Fjitenbot) | - | - | ⭐ 4 | 🔴 december 2024|\n| 🔗 [ja-icd10](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fja-icd10) | - | - | ⭐ 5 | 🔴 july 2021|\n| 🔗 [pl-bert-vits2](https:\u002F\u002Fgithub.com\u002Ftonnetonne814\u002Fpl-bert-vits2) | - | - | ⭐ 14 | 🔴 december 2023|\n| 🔗 [ndc_predictor](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndc_predictor) | - | - | ⭐ 11 | 🔴 august 2021|\n| 🔗 [pfmt-bench-fin-ja](https:\u002F\u002Fgithub.com\u002Fpfnet-research\u002Fpfmt-bench-fin-ja) | - | - | ⭐ 9 | 🔴 march 2025|\n| 🔗 [marine-plus](https:\u002F\u002Fgithub.com\u002Ftsukumijima\u002Fmarine-plus) | 📥 299 | 📦 12k | ⭐ 8 | 🟢 march|\n| 🔗 
[ja-tokenizer-benchmark](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fja-tokenizer-benchmark) | - | - | ⭐ 7 | 🔴 february 2022|\n| 🔗 [yat](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fyat) | - | - | ⭐ 7 | 🔴 june 2018|\n| 🔗 [igakuqa119](https:\u002F\u002Fgithub.com\u002Fdocto-rin\u002Figakuqa119) | - | - | ⭐ 8 | 🟢 january|\n| 🔗 [japanese-luw-tokenizer](https:\u002F\u002Fgithub.com\u002Fkoichiyasuoka\u002Fjapanese-luw-tokenizer) | - | - | ⭐ 6 | 🔴 december 2021|\n| 🔗 [ibus-jig](https:\u002F\u002Fgithub.com\u002Fy-koj\u002Fibus-jig) | - | - | ⭐ 4 | 🔴 december 2023|\n| 🔗 [jp-stopword-filter](https:\u002F\u002Fgithub.com\u002FBrambleXu\u002Fjp-stopword-filter) | 📥 8 | 📦 5k | ⭐ 4 | 🔴 november 2024|\n| 🔗 [yasumail](https:\u002F\u002Fgithub.com\u002Fterallite\u002Fyasumail) | - | - | ⭐ 2 | 🟢 january|\n| 🔗 [himotoki](https:\u002F\u002Fgithub.com\u002Fmsr2903\u002Fhimotoki) | 📥 73 | 📦 4k | ⭐ 3 | 🟢 february|\n| 🔗 [diafill-toolkit](https:\u002F\u002Fgithub.com\u002Fsbintuitions\u002Fdiafill-toolkit) | - | - | ⭐ 0 | 🟢 january|\n| 🔗 [eval_vertical_ja](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Feval_vertical_ja) | - | - | ⭐ 1 | 🟡 november 2025|\n| 🔗 [jp-llm-corpus-pii-filter](https:\u002F\u002Fgithub.com\u002Fmatsuolab\u002Fjp-llm-corpus-pii-filter) | - | - | ⭐ 7 | 🔴 march 2025|\n| 🔗 [Novel2DialCorpus](https:\u002F\u002Fgithub.com\u002Fganbon\u002FNovel2DialCorpus) | - | - | ⭐ 0 | 🟢 february|\n\n\n## C++\n\n### Morphology analysis\nHigh-performance libraries for Japanese morphological analysis\n\n * [mecab](https:\u002F\u002Fgithub.com\u002Ftaku910\u002Fmecab) - Yet another Japanese morphological analyzer\n * [jumanpp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fjumanpp) - Juman++ (a Morphological Analyzer Toolkit)\n * [kytea](https:\u002F\u002Fgithub.com\u002Fneubig\u002Fkytea) - The Kyoto Text Analysis Toolkit for word segmentation and pronunciation 
estimation, etc.\n * [juman](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fjuman) - Japanese Morphological Analysis System JUMAN\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [mecab](https:\u002F\u002Fgithub.com\u002Ftaku910\u002Fmecab) | - | - | ⭐ 1.1k | 🔴 february 2025|\n| 🔗 [jumanpp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fjumanpp) | - | - | ⭐ 411 | 🔴 march 2023|\n| 🔗 [kytea](https:\u002F\u002Fgithub.com\u002Fneubig\u002Fkytea) | - | - | ⭐ 212 | 🔴 april 2020|\n| 🔗 [juman](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fjuman) | - | - | ⭐ 12 | 🔴 december 2021|\n\n### Parsing\nLibraries for dependency and syntactic parsing of Japanese sentences\n\n * [cabocha](https:\u002F\u002Fgithub.com\u002Ftaku910\u002Fcabocha) - Yet Another Japanese Dependency Structure Analyzer\n * [knp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fknp) - A Japanese Parser\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [cabocha](https:\u002F\u002Fgithub.com\u002Ftaku910\u002Fcabocha) | - | - | ⭐ 121 | 🔴 february 2025|\n| 🔗 [knp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fknp) | - | - | ⭐ 34 | 🔴 november 2023|\n\n### Others\nOther Japanese NLP and text processing libraries\n\n * [jsc](https:\u002F\u002Fgithub.com\u002Fyohokuno\u002Fjsc) - Joint source channel model for Japanese Kana Kanji conversion, Chinese pinyin input and CJE mixed input.\n * [aquaskk](https:\u002F\u002Fgithub.com\u002Fcodefirst\u002Faquaskk) - An input method without morphological analysis.\n * [mozc](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmozc) - Mozc - a Japanese Input Method Editor designed for multi-platform\n * [trimatch](https:\u002F\u002Fgithub.com\u002Ftuem\u002Ftrimatch) - Trimatch: An (Exact|Prefix|Approximate) String Matching Library\n * [resembla](https:\u002F\u002Fgithub.com\u002Ftuem\u002Fresembla) - Resembla: Word-based Japanese similar sentence search library\n * 
[corvusskk](https:\u002F\u002Fgithub.com\u002Fnathancorvussolis\u002Fcorvusskk) - ▽▼ SKK-like Japanese Input Method Editor for Windows\n * [mozuku](https:\u002F\u002Fgithub.com\u002Ft3tra-dev\u002Fmozuku) - An LSP server for analyzing and proofreading Japanese text.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [jsc](https:\u002F\u002Fgithub.com\u002Fyohokuno\u002Fjsc) | - | - | ⭐ 15 | 🔴 december 2012|\n| 🔗 [aquaskk](https:\u002F\u002Fgithub.com\u002Fcodefirst\u002Faquaskk) | - | - | ⭐ 369 | 🔴 july 2023|\n| 🔗 [mozc](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmozc) | - | - | ⭐ 2.9k | 🟢 yesterday|\n| 🔗 [trimatch](https:\u002F\u002Fgithub.com\u002Ftuem\u002Ftrimatch) | - | - | ⭐ 2 | 🟢 february|\n| 🔗 [resembla](https:\u002F\u002Fgithub.com\u002Ftuem\u002Fresembla) | - | - | ⭐ 73 | 🟡 august 2025|\n| 🔗 [corvusskk](https:\u002F\u002Fgithub.com\u002Fnathancorvussolis\u002Fcorvusskk) | - | - | ⭐ 362 | 🟢 march|\n| 🔗 [mozuku](https:\u002F\u002Fgithub.com\u002Ft3tra-dev\u002Fmozuku) | - | - | ⭐ 411 | 🟢 last friday|\n\n\n## Rust crate\n\n### Morphology analysis\nFast Japanese morphological analysis crates written in Rust\n\n * [lindera](https:\u002F\u002Fgithub.com\u002Flindera-morphology\u002Flindera) - A morphological analysis library.\n * [vaporetto](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fvaporetto) - Vaporetto: Very Accelerated POintwise pREdicTion based TOkenizer\n * [goya](https:\u002F\u002Fgithub.com\u002FLeko\u002Fgoya) - Japanese Morphological Analysis written in Rust\n * [vibrato](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fvibrato) - vibrato: Viterbi-based accelerated tokenizer\n * [yoin](https:\u002F\u002Fgithub.com\u002Fagatan\u002Fyoin) - A Japanese Morphological Analyzer written in pure Rust\n * [mecab-rs](https:\u002F\u002Fgithub.com\u002Ftsurai\u002Fmecab-rs) - Safe Rust bindings for mecab a part-of-speech and morphological analyzer library\n * [awabi](https:\u002F\u002Fgithub.com\u002Fnakagami\u002Fawabi) - A morphological analyzer using 
mecab dictionary\n * [kanpyo](https:\u002F\u002Fgithub.com\u002Ftogatoga\u002Fkanpyo) - Japanese Morphological Analyzer written in Rust\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [lindera](https:\u002F\u002Fgithub.com\u002Flindera-morphology\u002Flindera) | - | 📦 1M | ⭐ 610 | 🟢 today|\n| 🔗 [vaporetto](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fvaporetto) | - | 📦 196k | ⭐ 255 | 🟢 february|\n| 🔗 [goya](https:\u002F\u002Fgithub.com\u002FLeko\u002Fgoya) | - | 📦 11k | ⭐ 83 | 🔴 december 2021|\n| 🔗 [vibrato](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fvibrato) | - | 📦 60k | ⭐ 404 | 🟢 february|\n| 🔗 [yoin](https:\u002F\u002Fgithub.com\u002Fagatan\u002Fyoin) | - | 📦 3k | ⭐ 26 | 🔴 october 2017|\n| 🔗 [mecab-rs](https:\u002F\u002Fgithub.com\u002Ftsurai\u002Fmecab-rs) | - | 📦 40k | ⭐ 71 | 🔴 september 2023|\n| 🔗 [awabi](https:\u002F\u002Fgithub.com\u002Fnakagami\u002Fawabi) | - | 📦 24k | ⭐ 10 | 🟡 november 2025|\n| 🔗 [kanpyo](https:\u002F\u002Fgithub.com\u002Ftogatoga\u002Fkanpyo) | - | 📦 2.5k | ⭐ 109 | 🟢 february|\n\n\n### Converter\nCrates for script and character conversion in Japanese text\n\n * [wana_kana_rust](https:\u002F\u002Fgithub.com\u002FPSeitz\u002Fwana_kana_rust) - Utility library for checking and converting between Japanese characters - Hiragana, Katakana - and Romaji\n * [unicode-jp-rs](https:\u002F\u002Fgithub.com\u002Fgemmarx\u002Funicode-jp-rs) - A Rust library to convert Japanese Half-width-kana[半角ｶﾅ] and Wide-alphanumeric[全角英数] into normal ones\n * [kana](https:\u002F\u002Fgithub.com\u002Fgbrlsnchs\u002Fkana) - [Mirror] CLI program for transliterating romaji text to either hiragana or katakana\n * [kanaria](https:\u002F\u002Fgithub.com\u002Fsamunohito\u002Fkanaria) - A library that provides functions such as converting between, and detecting, hiragana and katakana and half-width and full-width characters.\n * [japanese-address-parser](https:\u002F\u002Fgithub.com\u002Fyuukitoriyama\u002Fjapanese-address-parser) - A library that splits Japanese addresses into prefecture, municipality, town name, and the remainder.\n * 
[yosina](https:\u002F\u002Fgithub.com\u002Fyosina-lib\u002Fyosina) - Yosina is a transliteration library that deals with the letters and symbols used in Japanese writing.\n * [mojimoji-rs](https:\u002F\u002Fgithub.com\u002Feuropeanplaice\u002Fmojimoji-rs) - Rust implementation of a fast converter between Japanese hankaku and zenkaku characters, mojimoji.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [wana_kana_rust](https:\u002F\u002Fgithub.com\u002FPSeitz\u002Fwana_kana_rust) | - | 📦 360k | ⭐ 90 | 🔴 march 2025|\n| 🔗 [unicode-jp-rs](https:\u002F\u002Fgithub.com\u002Fgemmarx\u002Funicode-jp-rs) | - | 📦 64k | ⭐ 19 | 🔴 april 2020|\n| 🔗 [kana](https:\u002F\u002Fgithub.com\u002Fgbrlsnchs\u002Fkana) | - | - | ⭐ 12 | 🔴 january 2023|\n| 🔗 [kanaria](https:\u002F\u002Fgithub.com\u002Fsamunohito\u002Fkanaria) | - | - | ⭐ 21 | 🟢 february|\n| 🔗 [japanese-address-parser](https:\u002F\u002Fgithub.com\u002Fyuukitoriyama\u002Fjapanese-address-parser) | - | - | ⭐ 10 | 🟢 march|\n| 🔗 [yosina](https:\u002F\u002Fgithub.com\u002Fyosina-lib\u002Fyosina) | - | - | ⭐ 24 | 🟢 march|\n| 🔗 [mojimoji-rs](https:\u002F\u002Fgithub.com\u002Feuropeanplaice\u002Fmojimoji-rs) | - | - | ⭐ 4 | 🔴 november 2022|\n\n\n### Search engine library\nLibraries for Japanese full-text search and indexing\n\n * [lindera-tantivy](https:\u002F\u002Fgithub.com\u002Flindera-morphology\u002Flindera-tantivy) - Lindera tokenizer for Tantivy.\n * [tantivy-vibrato](https:\u002F\u002Fgithub.com\u002Fakr4\u002Ftantivy-vibrato) - A Tantivy tokenizer using Vibrato.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [lindera-tantivy](https:\u002F\u002Fgithub.com\u002Flindera-morphology\u002Flindera-tantivy) | - | 📦 178k | ⭐ 69 | 🟢 january|\n| 🔗 [tantivy-vibrato](https:\u002F\u002Fgithub.com\u002Fakr4\u002Ftantivy-vibrato) | - | 📦 1.5k | ⭐ 3 | 🔴 january 2023|\n\n\n### Others\nSupplementary crates for Japanese text and IME processing\n\n * 
[daachorse](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fdaachorse) - A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust.\n * [find-simdoc](https:\u002F\u002Fgithub.com\u002Flegalforce-research\u002Ffind-simdoc) - Finding all pairs of similar documents time- and memory-efficiently\n * [crawdad](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fcrawdad) - Rust library of natural language dictionaries using character-wise double-array tries.\n * [tokenizer-speed-bench](https:\u002F\u002Fgithub.com\u002Flegalforce-research\u002Ftokenizer-speed-bench) -  Comparison code of various tokenizers\n * [stringmatch-bench](https:\u002F\u002Fgithub.com\u002Flegalforce-research\u002Fstringmatch-bench) - Here provides benchmark tools to compare the performance of data structures for string matching.\n * [vime](https:\u002F\u002Fgithub.com\u002Falgon-320\u002Fvime) - Using Vim as an input method for X11 apps\n * [voicevox_core](https:\u002F\u002Fgithub.com\u002FVOICEVOX\u002Fvoicevox_core) - 無料で使える中品質なテキスト読み上げソフトウェア、VOICEVOXのコア\n * [akaza](https:\u002F\u002Fgithub.com\u002Fakaza-im\u002Fakaza) - Yet another Japanese IME for IBus\u002FLinux\n * [Jotoba](https:\u002F\u002Fgithub.com\u002FWeDontPanic\u002FJotoba) - A free online, self-hostable, multilang Japanese dictionary.\n * [dvorakjp-romantable](https:\u002F\u002Fgithub.com\u002Fshinespark\u002Fdvorakjp-romantable) - Google 日本語入力用DvorakJPローマ字テーブル \u002F DvorakJP Roman Table for Google Japanese Input\n * [niinii](https:\u002F\u002Fgithub.com\u002FNetdex\u002Fniinii) -  Japanese glossator for assisted reading of text using Ichiran\n * [cskk](https:\u002F\u002Fgithub.com\u002Fnaokiri\u002Fcskk) - SKK (Simple Kana Kanji henkan) library\n * [japanki](https:\u002F\u002Fgithub.com\u002Ftysonwu\u002Fjapanki) - Learn Japanese vocabs 🇯🇵 by doing quizzes on CLI!\n * [jpreprocess](https:\u002F\u002Fgithub.com\u002Fjpreprocess\u002Fjpreprocess) - Japanese text preprocessor for 
Text-to-Speech applications (OpenJTalk rewrite in rust language)\n * [listup_precedent](https:\u002F\u002Fgithub.com\u002Fjapanese-law-analysis\u002Flistup_precedent) - Software that generates a list of court precedent data by scraping the Japanese courts website (https:\u002F\u002Fwww.courts.go.jp\u002Findex.html)\n * [jisho](https:\u002F\u002Fgithub.com\u002Feagleflo\u002Fjisho) - Jisho is a CLI tool & Rust library that provides a Japanese-English dictionary.\n * [kanalizer](https:\u002F\u002Fgithub.com\u002Fvoicevox\u002Fkanalizer) - A library that infers readings from English words.\n * [koharu](https:\u002F\u002Fgithub.com\u002Fmayocream\u002Fkoharu) - Automated manga translation tool with LLM, written in Rust.\n * [yomine](https:\u002F\u002Fgithub.com\u002Fmcgrizzz\u002Fyomine) - A Japanese vocabulary mining tool designed to help language learners mine new words and expressions.\n * [matsuba](https:\u002F\u002Fgithub.com\u002Fmrpicklepinosaur\u002Fmatsuba) - lightweight japanese ime written in rust\n * [hujiang_dictionary](https:\u002F\u002Fgithub.com\u002Fasutorufa\u002Fhujiang_dictionary) - Japanese dictionary by Rust, support Telegram bot, AWS Lambda and Cloudflare Workers. 
Support LLM and search RAG.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [daachorse](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fdaachorse) | - | 📦 781k | ⭐ 249 | 🟢 today|\n| 🔗 [find-simdoc](https:\u002F\u002Fgithub.com\u002Flegalforce-research\u002Ffind-simdoc) | - | 📦 29k | ⭐ 62 | 🔴 march 2025|\n| 🔗 [crawdad](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fcrawdad) | - | 📦 65k | ⭐ 37 | 🔴 january 2025|\n| 🔗 [tokenizer-speed-bench](https:\u002F\u002Fgithub.com\u002Flegalforce-research\u002Ftokenizer-speed-bench) | - | - | ⭐ 4 | 🔴 march 2023|\n| 🔗 [stringmatch-bench](https:\u002F\u002Fgithub.com\u002Flegalforce-research\u002Fstringmatch-bench) | - | - | ⭐ 3 | 🔴 september 2022|\n| 🔗 [vime](https:\u002F\u002Fgithub.com\u002Falgon-320\u002Fvime) | - | - | ⭐ 230 | 🔴 november 2022|\n| 🔗 [voicevox_core](https:\u002F\u002Fgithub.com\u002FVOICEVOX\u002Fvoicevox_core) | - | - | ⭐ 1.1k | 🟢 march|\n| 🔗 [akaza](https:\u002F\u002Fgithub.com\u002Fakaza-im\u002Fakaza) | - | - | ⭐ 249 | 🟢 yesterday|\n| 🔗 [Jotoba](https:\u002F\u002Fgithub.com\u002FWeDontPanic\u002FJotoba) | - | - | ⭐ 200 | 🔴 january 2024|\n| 🔗 [dvorakjp-romantable](https:\u002F\u002Fgithub.com\u002Fshinespark\u002Fdvorakjp-romantable) | - | - | ⭐ 56 | 🟢 february|\n| 🔗 [niinii](https:\u002F\u002Fgithub.com\u002FNetdex\u002Fniinii) | - | - | ⭐ 14 | 🟢 march|\n| 🔗 [cskk](https:\u002F\u002Fgithub.com\u002Fnaokiri\u002Fcskk) | - | - | ⭐ 80 | 🟢 march|\n| 🔗 [japanki](https:\u002F\u002Fgithub.com\u002Ftysonwu\u002Fjapanki) | - | - | ⭐ 3 | 🔴 october 2023|\n| 🔗 [jpreprocess](https:\u002F\u002Fgithub.com\u002Fjpreprocess\u002Fjpreprocess) | - | - | ⭐ 54 | 🟢 february|\n| 🔗 [listup_precedent](https:\u002F\u002Fgithub.com\u002Fjapanese-law-analysis\u002Flistup_precedent) | - | - | ⭐ 6 | 🟢 last thursday|\n| 🔗 [jisho](https:\u002F\u002Fgithub.com\u002Feagleflo\u002Fjisho) | - | - | ⭐ 18 | 🟢 last thursday|\n| 🔗 [kanalizer](https:\u002F\u002Fgithub.com\u002Fvoicevox\u002Fkanalizer) | - | - 
| ⭐ 27 | 🟢 march|\n| 🔗 [koharu](https:\u002F\u002Fgithub.com\u002Fmayocream\u002Fkoharu) | - | - | ⭐ 1.8k | 🟢 today|\n| 🔗 [yomine](https:\u002F\u002Fgithub.com\u002Fmcgrizzz\u002Fyomine) | - | - | ⭐ 49 | 🟢 february|\n| 🔗 [matsuba](https:\u002F\u002Fgithub.com\u002Fmrpicklepinosaur\u002Fmatsuba) | - | - | ⭐ 18 | 🔴 march 2023|\n| 🔗 [hujiang_dictionary](https:\u002F\u002Fgithub.com\u002Fasutorufa\u002Fhujiang_dictionary) | - | - | ⭐ 70 | 🟢 today|\n\n\n## JavaScript\n\n### Morphology analysis\nJapanese morphological analysis libraries for browser and Node.js\n\n * [kuromoji.js](https:\u002F\u002Fgithub.com\u002Ftakuyaa\u002Fkuromoji.js) - JavaScript implementation of Japanese morphological analyzer\n * [rakutenma](https:\u002F\u002Fgithub.com\u002Frakuten-nlp\u002Frakutenma) -  Rakuten MA - morphological analyzer (word segmentor + PoS Tagger) for Chinese and Japanese written purely in JavaScript.\n * [node-mecab-ya](https:\u002F\u002Fgithub.com\u002Fgolbin\u002Fnode-mecab-ya) - Yet another mecab wrapper for nodejs\n * [juman-bin](https:\u002F\u002Fgithub.com\u002Fthammin\u002Fjuman-bin) - a User-Extensible Morphological Analyzer for Japanese. 
日本語形態素解析システム\n * [node-mecab-async](https:\u002F\u002Fgithub.com\u002Fhecomi\u002Fnode-mecab-async) - Asynchronous japanese morphological analyser using MeCab.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [kuromoji.js](https:\u002F\u002Fgithub.com\u002Ftakuyaa\u002Fkuromoji.js) | 📥 181k\u002Fweek | 📦 8.6M | ⭐ 971 | 🔴 november 2018|\n| 🔗 [rakutenma](https:\u002F\u002Fgithub.com\u002Frakuten-nlp\u002Frakutenma) | 📥 36\u002Fweek | 📦 906 | ⭐ 472 | 🔴 january 2015|\n| 🔗 [node-mecab-ya](https:\u002F\u002Fgithub.com\u002Fgolbin\u002Fmecab-ya) | 📥 95\u002Fweek | 📦 7.4k | ⭐ 110 | 🔴 repo not found|\n| 🔗 [juman-bin](https:\u002F\u002Fgithub.com\u002Fthammin\u002Fjuman-bin) | 📥 10\u002Fweek | 📦 305 | ⭐ 3 | 🔴 may 2017|\n| 🔗 [node-mecab-async](https:\u002F\u002Fgithub.com\u002Fhecomi\u002Fnode-mecab-async) | 📥 5k\u002Fweek | 📦 340k | ⭐ 104 | 🔴 october 2017|\n\n\n### Converter\nLibraries for converting Japanese scripts and readings\n\n * [kuroshiro](https:\u002F\u002Fgithub.com\u002Fhexenq\u002Fkuroshiro) - Japanese language library for converting Japanese sentence to Hiragana, Katakana or Romaji with furigana and okurigana modes supported.\n * [kuroshiro-analyzer-kuromoji](https:\u002F\u002Fgithub.com\u002Fhexenq\u002Fkuroshiro-analyzer-kuromoji) - Kuromoji morphological analyzer for kuroshiro.\n * [hepburn](https:\u002F\u002Fgithub.com\u002Flovell\u002Fhepburn) - Node.js module for converting Japanese Hiragana and Katakana script to, and from, Romaji using Hepburn romanisation\n * [japanese-numerals-to-number](https:\u002F\u002Fgithub.com\u002Ftwada\u002Fjapanese-numerals-to-number) - Converts Japanese Numerals into number\n * [jslingua](https:\u002F\u002Fgithub.com\u002Fkariminf\u002Fjslingua) - Javascript libraries to process text: Arabic, Japanese, etc.\n * [WanaKana](https:\u002F\u002Fgithub.com\u002FWaniKani\u002FWanaKana) - Javascript library for detecting and transliterating Hiragana \u003C--> Katakana \u003C--> Romaji\n * 
[node-romaji-name](https:\u002F\u002Fgithub.com\u002Fjeresig\u002Fnode-romaji-name) - Normalize and fix common issues with Romaji-based Japanese names.\n * [kyujitai.js](https:\u002F\u002Fgithub.com\u002Fhakatashi\u002Fkyujitai.js) - Utility collections for making Japanese text old-fashioned\n * [normalize-japanese-addresses](https:\u002F\u002Fgithub.com\u002Fgeolonia\u002Fnormalize-japanese-addresses) - オープンソースの住所正規化ライブラリ。\n * [jaconv](https:\u002F\u002Fgithub.com\u002Fkazuhikoarase\u002Fjaconv) - 日本語文字変換ライブラリ (javascript)\n * [romaji-conv](https:\u002F\u002Fgithub.com\u002Fkoozaki\u002Fromaji-conv) - Convert romaji into hiragana\n * [japanese-addresses-v2](https:\u002F\u002Fgithub.com\u002Fgeolonia\u002Fjapanese-addresses-v2) - 全国の住所データAPI\n * [jptext-to-emoji](https:\u002F\u002Fgithub.com\u002Felzup\u002Fjptext-to-emoji) - テキストの単語を絵文字に変換する\n * [japanese.js](https:\u002F\u002Fgithub.com\u002Fhakatashi\u002Fjapanese.js) - Util collection for Japanese text processing. Hiraganize, Katakanize, and Romanize.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [kuroshiro](https:\u002F\u002Fgithub.com\u002Fhexenq\u002Fkuroshiro) | 📥 12k\u002Fweek | 📦 435k | ⭐ 963 | 🔴 june 2021|\n| 🔗 [kuroshiro-analyzer-kuromoji](https:\u002F\u002Fgithub.com\u002Fhexenq\u002Fkuroshiro-analyzer-kuromoji) | 📥 12k\u002Fweek | 📦 410k | ⭐ 68 | 🔴 august 2018|\n| 🔗 [hepburn](https:\u002F\u002Fgithub.com\u002Flovell\u002Fhepburn) | 📥 154k\u002Fweek | 📦 3.7M | ⭐ 137 | 🟡 september 2025|\n| 🔗 [japanese-numerals-to-number](https:\u002F\u002Fgithub.com\u002Ftwada\u002Fjapanese-numerals-to-number) | 📥 41k\u002Fweek | 📦 2.3M | ⭐ 59 | 🔴 february 2023|\n| 🔗 [jslingua](https:\u002F\u002Fgithub.com\u002Fkariminf\u002Fjslingua) | 📥 71\u002Fweek | 📦 8.3k | ⭐ 53 | 🔴 october 2023|\n| 🔗 [WanaKana](https:\u002F\u002Fgithub.com\u002FWaniKani\u002FWanaKana) | 📥 rate limited by upstream service | 📦 2.2M | ⭐ 912 | 🟡 september 2025|\n| 🔗 
[node-romaji-name](https:\u002F\u002Fgithub.com\u002Fjeresig\u002Fnode-romaji-name) | 📥 440\u002Fweek | 📦 14k | ⭐ 41 | 🔴 december 2023|\n| 🔗 [kyujitai.js](https:\u002F\u002Fgithub.com\u002Fhakatashi\u002Fkyujitai.js) | 📥 rate limited by upstream service | 📦 1.1k | ⭐ 23 | 🔴 august 2020|\n| 🔗 [normalize-japanese-addresses](https:\u002F\u002Fgithub.com\u002Fgeolonia\u002Fnormalize-japanese-addresses) | - | - | ⭐ 946 | 🟡 july 2025|\n| 🔗 [jaconv](https:\u002F\u002Fgithub.com\u002Fkazuhikoarase\u002Fjaconv) | - | - | ⭐ 87 | 🟡 june 2025|\n| 🔗 [romaji-conv](https:\u002F\u002Fgithub.com\u002Fkoozaki\u002Fromaji-conv) | - | - | ⭐ 26 | 🟢 february|\n| 🔗 [japanese-addresses-v2](https:\u002F\u002Fgithub.com\u002Fgeolonia\u002Fjapanese-addresses-v2) | - | - | ⭐ 71 | 🔴 january 2025|\n| 🔗 [jptext-to-emoji](https:\u002F\u002Fgithub.com\u002Felzup\u002Fjptext-to-emoji) | - | - | ⭐ 2 | 🟢 february|\n| 🔗 [japanese.js](https:\u002F\u002Fgithub.com\u002Fhakatashi\u002Fjapanese.js) | - | - | ⭐ 167 | 🔴 august 2020|\n\n\n### Others\nOther libraries for Japanese NLP in JavaScript\n\n * [bangumi-data](https:\u002F\u002Fgithub.com\u002Fbangumi-data\u002Fbangumi-data) - Raw data for Japanese Anime\n * [yomichan](https:\u002F\u002Fgithub.com\u002FFooSoft\u002Fyomichan) - Japanese pop-up dictionary extension for Chrome and Firefox.\n * [proofreading-tool](https:\u002F\u002Fgithub.com\u002Fgecko655\u002Fproofreading-tool) - GUIで動作する文書校正ツール GUI tool for textlinting.\n * [kanjigrid](https:\u002F\u002Fgithub.com\u002Fminosvasilias\u002Fkanjigrid) - A web-app displaying the 2200 kanji characters taught in James Heisig's \"Remembering the Kanji\", 6th edition.\n * [japanese-toolkit](https:\u002F\u002Fgithub.com\u002Fechamudi\u002Fjapanese-toolkit) - Monorepo for Kanji, Furigana, Japanese DB, and others\n * [analyze-desumasu-dearu](https:\u002F\u002Fgithub.com\u002Ftextlint-ja\u002Fanalyze-desumasu-dearu) - 文の敬体(ですます調)、常体(である調)を解析するJavaScriptライブラリ\n * 
[hatsuon](https:\u002F\u002Fgithub.com\u002FDJTB\u002Fhatsuon) - Japanese pitch accent utils\n * [sentiment_ja_js](https:\u002F\u002Fgithub.com\u002Fotodn\u002Fsentiment_ja_js) - Sentiment Analysis in Japanese. sentiment_ja with JavaScript\n * [mecab-ipadic-seed](https:\u002F\u002Fgithub.com\u002Ftakuyaa\u002Fmecab-ipadic-seed) - mecab-ipadic seed dictionary reader\n * [Japanese-Word-Of-The-Day](https:\u002F\u002Fgithub.com\u002FLuanRT\u002FJapanese-Word-Of-The-Day) - Well, a different Japanese word everyday.\n * [oskim](https:\u002F\u002Fgithub.com\u002Fesrille\u002Foskim) - Extend GNOME On-Screen Keyboard for Input Methods\n * [tweetMapping](https:\u002F\u002Fgithub.com\u002Fwtnv-lab\u002FtweetMapping) - 東日本大震災発生から24時間以内につぶやかれたジオタグ付きツイートのデジタルアーカイブです。\n * [pitch-accent](https:\u002F\u002Fgithub.com\u002Fshirakaba\u002Fpitch-accent) - Predict pitch accent in Japanese\n * [kana2ipa](https:\u002F\u002Fgithub.com\u002Famanoese\u002Fkana2ipa) - 「ひらがな」または「カタカナ」を日本語で発音する際の音声記号(IPA)に変換するコマンド\n * [voicevox](https:\u002F\u002Fgithub.com\u002FVOICEVOX\u002Fvoicevox) - 無料で使える中品質なテキスト読み上げソフトウェア、VOICEVOXのエディター\n * [kamiya-codec](https:\u002F\u002Fgithub.com\u002Ffasiha\u002Fkamiya-codec) - Towards a Japanese verb conjugator and deconjugator based on Taeko Kamiya's *The Handbook of Japanese Verbs* and *The Handbook of Japanese Adjectives and Adverbs* opuses.\n * [closewords](https:\u002F\u002Fgithub.com\u002Fotoneko1102\u002Fclosewords) - 最も似た単語を単語群から検索する日本語(漢字含む)対応のライブラリ\n * [japanese-analyzer](https:\u002F\u002Fgithub.com\u002Fcokice\u002Fjapanese-analyzer) - Japanese Sentence Analyzer (日本語文章解析器)\n * [japanese-furigana-normalize](https:\u002F\u002Fgithub.com\u002Fmarvnc\u002Fjapanese-furigana-normalize) - Normalize Japanese Furigana\n * [yama](https:\u002F\u002Fgithub.com\u002Fsapjax\u002Fyama) - acquire Japanese vocabulary on any website\n * [kaitai](https:\u002F\u002Fgithub.com\u002Fcompile10\u002Fkaitai) - An application for analyzing Japanese sentence structure using AI. 
This tool visualizes how words and phrases relate to each other, showing grammatical relationships with interactive diagrams.\n * [tsukeru-furigana-converter](https:\u002F\u002Fgithub.com\u002Fln2058\u002Ftsukeru-furigana-converter) - Browser extension (Chrome\u002FEdge\u002FFirefox) that injects furigana into Japanese webpages on-demand; includes dictionary tooltips, JLPT filtering, and vocab\u002FAnki export.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [bangumi-data](https:\u002F\u002Fgithub.com\u002Fbangumi-data\u002Fbangumi-data) | 📥 830\u002Fweek | 📦 58k | ⭐ 598 | 🟢 last wednesday|\n| 🔗 [yomichan](https:\u002F\u002Fgithub.com\u002FFooSoft\u002Fyomichan) | - | - | ⭐ 1.1k | 🔴 february 2023|\n| 🔗 [proofreading-tool](https:\u002F\u002Fgithub.com\u002Fgecko655\u002Fproofreading-tool) | - | - | ⭐ 87 | 🟡 october 2025|\n| 🔗 [kanjigrid](https:\u002F\u002Fgithub.com\u002Fminosvasilias\u002Fkanjigrid) | - | - | ⭐ 44 | 🔴 november 2018|\n| 🔗 [japanese-toolkit](https:\u002F\u002Fgithub.com\u002Fechamudi\u002Fjapanese-toolkit) | - | - | ⭐ 63 | 🔴 january 2023|\n| 🔗 [analyze-desumasu-dearu](https:\u002F\u002Fgithub.com\u002Ftextlint-ja\u002Fanalyze-desumasu-dearu) | 📥 90k\u002Fweek | 📦 rate limited by upstream service | ⭐ 18 | 🔴 january 2025|\n| 🔗 [hatsuon](https:\u002F\u002Fgithub.com\u002FDJTB\u002Fhatsuon) | 📥 16\u002Fweek | 📦 911 | ⭐ 38 | 🔴 march 2022|\n| 🔗 [sentiment_ja_js](https:\u002F\u002Fgithub.com\u002Fotodn\u002Fsentiment_ja_js) | - | - | ⭐ 10 | 🔴 december 2021|\n| 🔗 [mecab-ipadic-seed](https:\u002F\u002Fgithub.com\u002Ftakuyaa\u002Fmecab-ipadic-seed) | 📥 127\u002Fweek | 📦 6.1k | ⭐ 8 | 🔴 july 2016|\n| 🔗 [Japanese-Word-Of-The-Day](https:\u002F\u002Fgithub.com\u002FLuanRT\u002FJapanese-Word-Of-The-Day) | 📥 1\u002Fweek | 📦 rate limited by upstream service | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [oskim](https:\u002F\u002Fgithub.com\u002Fesrille\u002Foskim) | - | - | ⭐ 2 | 🔴 february 2023|\n| 🔗 
[tweetMapping](https:\u002F\u002Fgithub.com\u002Fwtnv-lab\u002FtweetMapping) | - | - | ⭐ 26 | 🟢 march|\n| 🔗 [pitch-accent](https:\u002F\u002Fgithub.com\u002Fshirakaba\u002Fpitch-accent) | 📥 9\u002Fweek | 📦 102 | ⭐ 2 | 🔴 september 2023|\n| 🔗 [kana2ipa](https:\u002F\u002Fgithub.com\u002Famanoese\u002Fkana2ipa) | - | - | ⭐ 17 | 🔴 october 2020|\n| 🔗 [voicevox](https:\u002F\u002Fgithub.com\u002FVOICEVOX\u002Fvoicevox) | - | - | ⭐ 3.1k | 🟢 today|\n| 🔗 [kamiya-codec](https:\u002F\u002Fgithub.com\u002Ffasiha\u002Fkamiya-codec) | - | - | ⭐ 22 | 🟡 may 2025|\n| 🔗 [closewords](https:\u002F\u002Fgithub.com\u002Fotoneko1102\u002Fclosewords) | - | - | ⭐ 4 | 🟢 march|\n| 🔗 [japanese-analyzer](https:\u002F\u002Fgithub.com\u002Fcokice\u002Fjapanese-analyzer) | - | - | ⭐ 714 | 🟡 december 2025|\n| 🔗 [japanese-furigana-normalize](https:\u002F\u002Fgithub.com\u002Fmarvnc\u002Fjapanese-furigana-normalize) | - | - | ⭐ 6 | 🔴 july 2024|\n| 🔗 [yama](https:\u002F\u002Fgithub.com\u002Fsapjax\u002Fyama) | - | - | ⭐ 8 | 🟢 february|\n| 🔗 [kaitai](https:\u002F\u002Fgithub.com\u002Fcompile10\u002Fkaitai) | - | - | ⭐ 1 | 🟢 yesterday|\n| 🔗 [tsukeru-furigana-converter](https:\u002F\u002Fgithub.com\u002Fln2058\u002Ftsukeru-furigana-converter) | - | - | ⭐ 1 | 🟢 march|\n\n\n## Go\n\n### Morphology analysis\nLightweight Japanese morphological analysis libraries in Go\n\n * [kagome](https:\u002F\u002Fgithub.com\u002Fikawaha\u002Fkagome) - Self-contained Japanese Morphological Analyzer written in pure Go\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [kagome](https:\u002F\u002Fgithub.com\u002Fikawaha\u002Fkagome) | - | - | ⭐ 959 | 🟢 last friday|\n\n\n### Others\nAdditional Go-based Japanese text processing libraries\n\n * [ojosama](https:\u002F\u002Fgithub.com\u002Fjiro4989\u002Fojosama) - テキストを壱百満天原サロメお嬢様風の口調に変換します\n * [nihongo](https:\u002F\u002Fgithub.com\u002Fgojp\u002Fnihongo) - Japanese Dictionary\n * 
[yomichan-import](https:\u002F\u002Fgithub.com\u002FFooSoft\u002Fyomichan-import) - External dictionary importer for Yomichan.\n * [imas-ime-dic](https:\u002F\u002Fgithub.com\u002Fmaruamyu\u002Fimas-ime-dic) - THE IDOLM@STER words dictionary for Japanese IME (by imas-db.jp)\n * [go-kakasi](https:\u002F\u002Fgithub.com\u002Fsarumaj\u002Fgo-kakasi) - Kanji transliteration to hiragana\u002Fkatakana\u002Fromaji, in Go\n * [go-moji](https:\u002F\u002Fgithub.com\u002Fktnyt\u002Fgo-moji) - A Go library for Zenkaku\u002FHankaku conversion\n * [ojichat](https:\u002F\u002Fgithub.com\u002Fgreymd\u002Fojichat) - おじさんがLINEやメールで送ってきそうな文を生成する\n * [name](https:\u002F\u002Fgithub.com\u002Fkuniwak\u002Fname) - Name Searcher in Japanese\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [ojosama](https:\u002F\u002Fgithub.com\u002Fjiro4989\u002Fojosama) | - | - | ⭐ 387 | 🟢 march|\n| 🔗 [nihongo](https:\u002F\u002Fgithub.com\u002Fgojp\u002Fnihongo) | - | - | ⭐ 83 | 🔴 february 2024|\n| 🔗 [yomichan-import](https:\u002F\u002Fgithub.com\u002FFooSoft\u002Fyomichan-import) | - | - | ⭐ 86 | 🔴 february 2023|\n| 🔗 [imas-ime-dic](https:\u002F\u002Fgithub.com\u002Fmaruamyu\u002Fimas-ime-dic) | - | - | ⭐ 32 | 🟢 january|\n| 🔗 [go-kakasi](https:\u002F\u002Fgithub.com\u002Fsarumaj\u002Fgo-kakasi) | - | - | ⭐ 6 | 🟢 last thursday|\n| 🔗 [go-moji](https:\u002F\u002Fgithub.com\u002Fktnyt\u002Fgo-moji) | - | - | ⭐ 20 | 🔴 april 2019|\n| 🔗 [ojichat](https:\u002F\u002Fgithub.com\u002Fgreymd\u002Fojichat) | - | - | ⭐ 1.3k | 🔴 october 2024|\n| 🔗 [name](https:\u002F\u002Fgithub.com\u002Fkuniwak\u002Fname) | - | - | ⭐ 11 | 🔴 january 2025|\n\n\n## Java\n\n### Morphology analysis\nJapanese morphological analysis and dictionary management libraries\n\n * [kuromoji](https:\u002F\u002Fgithub.com\u002Fatilika\u002Fkuromoji) - Kuromoji is a self-contained and very easy to use Japanese morphological analyzer designed for search\n * 
[Sudachi](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachi) - A Japanese Tokenizer for Business\n * [SudachiDict](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachiDict) - A lexicon for Sudachi\n * [meval](https:\u002F\u002Fgithub.com\u002Fteru-oka-1933\u002Fmeval) - MevAL, a performance evaluation system for morphological analyzers\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [kuromoji](https:\u002F\u002Fgithub.com\u002Fatilika\u002Fkuromoji) | - | - | ⭐ 1k | 🔴 september 2019|\n| 🔗 [Sudachi](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachi) | - | - | ⭐ 953 | 🔴 november 2024|\n| 🔗 [SudachiDict](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachiDict) | - | - | ⭐ 285 | 🟢 january|\n| 🔗 [meval](https:\u002F\u002Fgithub.com\u002Fteru-oka-1933\u002Fmeval) | - | - | ⭐ 7 | 🔴 august 2019|\n\n\n### Others\nJava libraries for Japanese NLP and OCR\n\n * [kanjitomo-ocr](https:\u002F\u002Fgithub.com\u002Fsakarika\u002Fkanjitomo-ocr) - Java library for identifying Japanese characters from images\n * [jakaroma](https:\u002F\u002Fgithub.com\u002Fnicolas-raoul\u002Fjakaroma) - Java library and command-line tool to transliterate Japanese kanji to romaji (Latin alphabet)\n * [kakasi-java](https:\u002F\u002Fgithub.com\u002Fnicolas-raoul\u002Fkakasi-java) - Kanji transliteration to hiragana\u002Fkatakana\u002Fromaji, in Java\n * [Kamite](https:\u002F\u002Fgithub.com\u002Ffauu\u002FKamite) - A desktop language immersion companion for learners of Japanese\n * [react-native-japanese-tokenizer](https:\u002F\u002Fgithub.com\u002Fcraftzdog\u002Freact-native-japanese-tokenizer) - Async Japanese Tokenizer Native Plugin for React Native for iOS and Android\n * [elasticsearch-analysis-japanese](https:\u002F\u002Fgithub.com\u002Fsuguru\u002Felasticsearch-analysis-japanese) - Japanese analyzer for Elasticsearch using the kuromoji Japanese tokenizer\n * [moji4j](https:\u002F\u002Fgithub.com\u002Fandree-surya\u002Fmoji4j) - A Java library that converts
between Japanese Hiragana, Katakana, and Romaji scripts.\n * [neologdn-java](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fneologdn-java) - Japanese text normalizer for mecab-neologd\n * [elasticsearch-sudachi](https:\u002F\u002Fgithub.com\u002Fworksapplications\u002Felasticsearch-sudachi) - The Japanese analysis plugin for elasticsearch\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [kanjitomo-ocr](https:\u002F\u002Fgithub.com\u002Fsakarika\u002Fkanjitomo-ocr) | - | - | ⭐ 205 | 🔴 may 2021|\n| 🔗 [jakaroma](https:\u002F\u002Fgithub.com\u002Fnicolas-raoul\u002Fjakaroma) | - | - | ⭐ 68 | 🟡 june 2025|\n| 🔗 [kakasi-java](https:\u002F\u002Fgithub.com\u002Fnicolas-raoul\u002Fkakasi-java) | - | - | ⭐ 55 | 🔴 april 2016|\n| 🔗 [Kamite](https:\u002F\u002Fgithub.com\u002Ffauu\u002FKamite) | - | - | ⭐ 133 | 🔴 march 2025|\n| 🔗 [react-native-japanese-tokenizer](https:\u002F\u002Fgithub.com\u002Fcraftzdog\u002Freact-native-japanese-tokenizer) | - | - | ⭐ 38 | 🔴 june 2023|\n| 🔗 [elasticsearch-analysis-japanese](https:\u002F\u002Fgithub.com\u002Fsuguru\u002Felasticsearch-analysis-japanese) | - | - | ⭐ 29 | 🔴 march 2012|\n| 🔗 [moji4j](https:\u002F\u002Fgithub.com\u002Fandree-surya\u002Fmoji4j) | - | - | ⭐ 33 | 🔴 june 2022|\n| 🔗 [neologdn-java](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fneologdn-java) | - | - | ⭐ 5 | 🟢 february|\n| 🔗 [elasticsearch-sudachi](https:\u002F\u002Fgithub.com\u002Fworksapplications\u002Felasticsearch-sudachi) | - | - | ⭐ 220 | 🟢 last wednesday|\n\n\n## Pretrained model\n\n### Word2Vec\nModels that convert words into numeric vectors to capture semantic similarity\n\n * [japanese-words-to-vectors](https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fjapanese-words-to-vectors) - Word2vec (word to vectors) approach for Japanese language using Gensim and Mecab.\n * [chiVe](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FchiVe) - Japanese word embedding with Sudachi and NWJC\n * 
[elmo-japanese](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Felmo-japanese) - elmo-japanese\n * [embedrank](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fembedrank) - Python Implementation of EmbedRank\n * [aovec](https:\u002F\u002Fgithub.com\u002Feggplants\u002Faovec) - Easy aozorabunko Word2Vec Builder - 青空文庫全書籍のWord2Vecビルダー+構築済みモデル\n * [dependency-based-japanese-word-embeddings](https:\u002F\u002Fgithub.com\u002Flapras-inc\u002Fdependency-based-japanese-word-embeddings) - This is a repository for the AI LAB article \"係り受けに基づく日本語単語埋込 (Dependency-based Japanese Word Embeddings)\" ( Article URL https:\u002F\u002Fai-lab.lapras.com\u002Fnlp\u002Fjapanese-word-embedding\u002F)\n * [jawikivec](https:\u002F\u002Fgithub.com\u002Fwikiwikification\u002Fjawikivec) - Yet Another Japanese-Wikipedia Entity Vectors\n * [jawiki_word_vector_updater](https:\u002F\u002Fgithub.com\u002Fkamigaito\u002Fjawiki_word_vector_updater) - 最新の日本語Wikipediaのダンプデータから，MeCabを用いてIPA辞書と最新のNeologd辞書の両方で形態素解析を実施し，その結果に基づいた word2vec，fastText，GloVeの単語分散表現を学習するためのスクリプト\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [japanese-words-to-vectors](https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fjapanese-words-to-vectors) | - | - | ⭐ 87 | 🔴 august 2020|\n| 🔗 [chiVe](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FchiVe) | - | - | ⭐ 172 | 🔴 march 2024|\n| 🔗 [elmo-japanese](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Felmo-japanese) | - | - | ⭐ 4 | 🔴 october 2019|\n| 🔗 [embedrank](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fembedrank) | - | - | ⭐ 48 | 🔴 march 2019|\n| 🔗 [aovec](https:\u002F\u002Fgithub.com\u002Feggplants\u002Faovec) | 📥 111 | 📦 82k | ⭐ 3 | 🔴 january 2023|\n| 🔗 [dependency-based-japanese-word-embeddings](https:\u002F\u002Fgithub.com\u002Flapras-inc\u002Fdependency-based-japanese-word-embeddings) | - | - | ⭐ 8 | 🔴 august 2019|\n| 🔗 [jawikivec](https:\u002F\u002Fgithub.com\u002Fwikiwikification\u002Fjawikivec) | - | - | ⭐ 2 | 🔴 november 
2018|\n| 🔗 [jawiki_word_vector_updater](https:\u002F\u002Fgithub.com\u002Fkamigaito\u002Fjawiki_word_vector_updater) | - | - | ⭐ 11 | 🔴 may 2020|\n\n\n### Transformer based models\nModels that use self-attention to understand context and perform advanced language tasks\n\n * [bert-japanese](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fbert-japanese) - BERT models for Japanese text.\n * [japanese-pretrained-models](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fjapanese-pretrained-models) - Code for producing Japanese pretrained models provided by rinna Co., Ltd.\n * [bert-japanese](https:\u002F\u002Fgithub.com\u002Fyoheikikuta\u002Fbert-japanese) - BERT with SentencePiece for Japanese text.\n * [SudachiTra](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachiTra) - Japanese tokenizer for Transformers\n * [japanese-dialog-transformers](https:\u002F\u002Fgithub.com\u002Fnttcslab\u002Fjapanese-dialog-transformers) - Code for evaluating Japanese pretrained models provided by NTT Ltd.\n * [shiba](https:\u002F\u002Fgithub.com\u002Foctanove\u002Fshiba) - Pytorch implementation and pre-trained Japanese model for CANINE, the efficient character-level transformer.\n * [Dialog](https:\u002F\u002Fgithub.com\u002Freppy4620\u002FDialog) - A PyTorch Implementation of japanese chatbot using BERT and Transformer's decoder\n * [language-pretraining](https:\u002F\u002Fgithub.com\u002Fretarfi\u002Flanguage-pretraining) - BERT and ELECTRA models of PyTorch implementations for Japanese text.\n * [medbertjp](https:\u002F\u002Fgithub.com\u002Fou-medinfo\u002Fmedbertjp) - Trials of pre-trained BERT models for the medical domain in Japanese.\n * [ILYS-aoba-chatbot](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FILYS-aoba-chatbot) - ILYS-aoba-chatbot\n * [t5-japanese](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Ft5-japanese) - Codes to pre-train Japanese T5 models\n * [pytorch_bert_japanese](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fpytorch_bert_japanese) - 
PytorchでBERTの日本語学習済みモデルを利用する\n * [Laboro-BERT-Japanese](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FLaboro-BERT-Japanese) - Laboro BERT Japanese: Japanese BERT Pre-Trained With Web-Corpus\n * [RoBERTa-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FRoBERTa-japanese) - Japanese BERT Pretrained Model\n * [aMLP-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FaMLP-japanese) - aMLP Transformer Model for Japanese\n * [bert-japanese-aozora](https:\u002F\u002Fgithub.com\u002Fakirakubo\u002Fbert-japanese-aozora) - Japanese BERT trained on Aozora Bunko and Wikipedia, pre-tokenized by MeCab with UniDic & SudachiPy\n * [sbert-ja](https:\u002F\u002Fgithub.com\u002Fcolorfulscoop\u002Fsbert-ja) - Code to train Sentence BERT Japanese model for Hugging Face Model Hub\n * [BERT-Japan-vaccination](https:\u002F\u002Fgithub.com\u002FPatrickJohnRamos\u002FBERT-Japan-vaccination) - Official fine-tuning code for \"Emotion Analysis of Japanese Tweets and Comparison to Vaccinations in Japan\"\n * [gpt2-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002Fgpt2-japanese) - Japanese GPT2 Generation Model\n * [text2text-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002Ftext2text-japanese) - gpt-2 based text2text conversion model\n * [gpt-ja](https:\u002F\u002Fgithub.com\u002Fcolorfulscoop\u002Fgpt-ja) - GPT-2 Japanese model for HuggingFace's transformers\n * [friendly_JA-Model](https:\u002F\u002Fgithub.com\u002Fastremo\u002Ffriendly_JA-Model) - MT model trained using the friendly_JA Corpus attempting to make Japanese easier\u002Fmore accessible to occidental people by using the Latin\u002FEnglish derived katakana lexicon instead of the standard Sino-Japanese lexicon\n * [albert-japanese](https:\u002F\u002Fgithub.com\u002Falinear-corp\u002Falbert-japanese) - BERT with SentencePiece for Japanese text.\n * [ja_text_bert](https:\u002F\u002Fgithub.com\u002FKosuke-Szk\u002Fja_text_bert) - 日本語WikipediaコーパスでBERTのPre-Trainedモデルを生成するためのリポジトリ\n * 
[DistilBERT-base-jp](https:\u002F\u002Fgithub.com\u002FBandaiNamcoResearchInc\u002FDistilBERT-base-jp) - A Japanese DistilBERT pretrained model, which was trained on Wikipedia.\n * [bert](https:\u002F\u002Fgithub.com\u002Finformatix-inc\u002Fbert) - This repository provides snippets to use RoBERTa pre-trained on a Japanese corpus. Our dataset consists of Japanese Wikipedia and web-crawled articles, 25GB in total. The released model is built based on that from HuggingFace.\n * [Laboro-DistilBERT-Japanese](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FLaboro-DistilBERT-Japanese) - Laboro DistilBERT Japanese\n * [luke](https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fluke) - LUKE -- Language Understanding with Knowledge-based Embeddings\n * [GPTSAN](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FGPTSAN) - General-purpose Switch Transformer-based Japanese language model\n * [japanese-clip](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fjapanese-clip) - Japanese CLIP by rinna Co., Ltd.\n * [AcademicBART](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FAcademicBART) - We pretrained a BART-based Japanese masked language model on paper abstracts from the academic database CiNii Articles.\n * [AcademicRoBERTa](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FAcademicRoBERTa) - We pretrained a RoBERTa-based Japanese masked language model on paper abstracts from the academic database CiNii Articles.\n * [LINE-DistilBERT-Japanese](https:\u002F\u002Fgithub.com\u002Fline\u002FLINE-DistilBERT-Japanese) - DistilBERT model pre-trained on 131 GB of Japanese web text.
The teacher model is a BERT-base model built in-house at LINE.\n * [Japanese-Alpaca-LoRA](https:\u002F\u002Fgithub.com\u002Fkunishou\u002FJapanese-Alpaca-LoRA) - Links to Low-Rank Adapters created by fine-tuning LLaMA on a Japanese translation of the Stanford Alpaca dataset, plus sample generation code\n * [albert-japanese-tinysegmenter](https:\u002F\u002Fgithub.com\u002Fnknytk\u002Falbert-japanese-tinysegmenter) - Pretrained models, code, and guidance for pretraining the official ALBERT (https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Falbert) on Japanese Wikipedia resources\n * [japanese-llama-experiment](https:\u002F\u002Fgithub.com\u002Flighttransport\u002Fjapanese-llama-experiment) - Japanese LLaMa experiment\n * [easylightchatassistant](https:\u002F\u002Fgithub.com\u002Fzuntan03\u002Feasylightchatassistant) - EasyLightChatAssistant is an environment for easily trying out LightChatAssistant, a lightweight local Japanese model with no censorship or restrictions, in KoboldCpp.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [bert-japanese](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fbert-japanese) | - | - | ⭐ 544 | 🔴 march 2024|\n| 🔗 [japanese-pretrained-models](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fjapanese-pretrained-models) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [bert-japanese](https:\u002F\u002Fgithub.com\u002Fyoheikikuta\u002Fbert-japanese) | - | - | ⭐ 498 | 🔴 february 2021|\n| 🔗 [SudachiTra](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachiTra) | 📥 445 | 📦 164k | ⭐ 79 | 🔴 december 2023|\n| 🔗 [japanese-dialog-transformers](https:\u002F\u002Fgithub.com\u002Fnttcslab\u002Fjapanese-dialog-transformers) | - | - | ⭐ 245 | 🔴 june 2023|\n| 🔗 [shiba](https:\u002F\u002Fgithub.com\u002Foctanove\u002Fshiba) | 📥 8 | 📦 7k | ⭐ 89 | 🔴 november 2023|\n| 🔗 [Dialog](https:\u002F\u002Fgithub.com\u002Freppy4620\u002FDialog) | - | - | ⭐ 72 | 🔴 october 2020|\n| 🔗 [language-pretraining](https:\u002F\u002Fgithub.com\u002Fretarfi\u002Flanguage-pretraining) | - | - | ⭐ 50 | 🔴 may 2023|\n| 🔗
[medbertjp](https:\u002F\u002Fgithub.com\u002Fou-medinfo\u002Fmedbertjp) | - | - | ⭐ 12 | 🔴 november 2020|\n| 🔗 [ILYS-aoba-chatbot](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FILYS-aoba-chatbot) | - | - | ⭐ 23 | 🔴 october 2021|\n| 🔗 [t5-japanese](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Ft5-japanese) | - | - | ⭐ 40 | 🔴 september 2021|\n| 🔗 [pytorch_bert_japanese](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fpytorch_bert_japanese) | - | - | ⭐ 35 | 🔴 june 2019|\n| 🔗 [Laboro-BERT-Japanese](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FLaboro-BERT-Japanese) | - | - | ⭐ 73 | 🔴 may 2022|\n| 🔗 [RoBERTa-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FRoBERTa-japanese) | - | - | ⭐ 23 | 🔴 november 2021|\n| 🔗 [aMLP-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FaMLP-japanese) | - | - | ⭐ 16 | 🔴 may 2022|\n| 🔗 [bert-japanese-aozora](https:\u002F\u002Fgithub.com\u002Fakirakubo\u002Fbert-japanese-aozora) | - | - | ⭐ 40 | 🔴 august 2020|\n| 🔗 [sbert-ja](https:\u002F\u002Fgithub.com\u002Fcolorfulscoop\u002Fsbert-ja) | - | - | ⭐ 11 | 🔴 august 2021|\n| 🔗 [BERT-Japan-vaccination](https:\u002F\u002Fgithub.com\u002FPatrickJohnRamos\u002FBERT-Japan-vaccination) | - | - | ⭐ 7 | 🔴 may 2022|\n| 🔗 [gpt2-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002Fgpt2-japanese) | - | - | ⭐ 324 | 🔴 september 2023|\n| 🔗 [text2text-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002Ftext2text-japanese) | - | - | ⭐ 33 | 🔴 july 2021|\n| 🔗 [gpt-ja](https:\u002F\u002Fgithub.com\u002Fcolorfulscoop\u002Fgpt-ja) | - | - | ⭐ 3 | 🔴 september 2021|\n| 🔗 [friendly_JA-Model](https:\u002F\u002Fgithub.com\u002Fastremo\u002Ffriendly_JA-Model) | - | - | ⭐ 1 | 🔴 may 2022|\n| 🔗 [albert-japanese](https:\u002F\u002Fgithub.com\u002Falinear-corp\u002Falbert-japanese) | - | - | ⭐ 33 | 🔴 october 2021|\n| 🔗 [ja_text_bert](https:\u002F\u002Fgithub.com\u002FKosuke-Szk\u002Fja_text_bert) | - | - | ⭐ 115 | 🔴 november 2018|\n| 🔗 
[DistilBERT-base-jp](https:\u002F\u002Fgithub.com\u002FBandaiNamcoResearchInc\u002FDistilBERT-base-jp) | - | - | ⭐ 161 | 🔴 april 2020|\n| 🔗 [bert](https:\u002F\u002Fgithub.com\u002Finformatix-inc\u002Fbert) | - | - | ⭐ 28 | 🔴 april 2022|\n| 🔗 [Laboro-DistilBERT-Japanese](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FLaboro-DistilBERT-Japanese) | - | - | ⭐ 16 | 🔴 december 2020|\n| 🔗 [luke](https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fluke) | - | - | ⭐ 727 | 🔴 june 2023|\n| 🔗 [GPTSAN](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FGPTSAN) | - | - | ⭐ 118 | 🔴 september 2023|\n| 🔗 [japanese-clip](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fjapanese-clip) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [AcademicBART](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FAcademicBART) | - | - | ⭐ 2 | 🔴 july 2024|\n| 🔗 [AcademicRoBERTa](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FAcademicRoBERTa) | - | - | ⭐ 9 | 🔴 september 2024|\n| 🔗 [LINE-DistilBERT-Japanese](https:\u002F\u002Fgithub.com\u002Fline\u002FLINE-DistilBERT-Japanese) | - | - | ⭐ 46 | 🔴 march 2023|\n| 🔗 [Japanese-Alpaca-LoRA](https:\u002F\u002Fgithub.com\u002Fkunishou\u002FJapanese-Alpaca-LoRA) | - | - | ⭐ 141 | 🔴 april 2023|\n| 🔗 [albert-japanese-tinysegmenter](https:\u002F\u002Fgithub.com\u002Fnknytk\u002Falbert-japanese-tinysegmenter) | - | - | ⭐ 13 | 🔴 september 2023|\n| 🔗 [japanese-llama-experiment](https:\u002F\u002Fgithub.com\u002Flighttransport\u002Fjapanese-llama-experiment) | - | - | ⭐ 54 | 🟡 december 2025|\n| 🔗 [easylightchatassistant](https:\u002F\u002Fgithub.com\u002Fzuntan03\u002Feasylightchatassistant) | - | - | ⭐ 44 | 🔴 april 2024|\n\n\n## ChatGPT\nResources for using ChatGPT and APIs for Japanese dialogue and text generation\n\n * [VRChatGPT](https:\u002F\u002Fgithub.com\u002FYuchi-Games\u002FVRChatGPT) - ChatGPTを使ってVRChat上でお喋り出来るようにするプログラム。\n * [AITuberDegikkoMirii](https:\u002F\u002Fgithub.com\u002FM-gen\u002FAITuberDegikkoMirii) - AITuberの基礎となる部分を開発しています\n * 
[wanna](https:\u002F\u002Fgithub.com\u002Fhirokidaichi\u002Fwanna) - Shell command launcher with natural language\n * [ChatdollKit](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit) - ChatdollKit enables you to make your 3D model into a chatbot\n * [ChuanhuChatGPTJapanese](https:\u002F\u002Fgithub.com\u002Fgyokuro33\u002FChuanhuChatGPTJapanese) - GUI for ChatGPT API For Japanese\n * [AISisterAIChan](https:\u002F\u002Fgithub.com\u002Fmanju-summoner\u002FAISisterAIChan) - An Ukagaka ghost, 「AI妹アイちゃん」 (AI Sister Ai-chan), powered by ChatGPT 3.5. A ChatGPT API key is required separately to use it.\n * [vrchatbot](https:\u002F\u002Fgithub.com\u002FGeson-anko\u002Fvrchatbot) - Repository for building an AI bot in VRChat\n * [gptuber-by-langchain](https:\u002F\u002Fgithub.com\u002Fkarakuri-ai\u002Fgptuber-by-langchain) - GPT acts as a YouTuber\n * [openai-chatfriend](https:\u002F\u002Fgithub.com\u002Fsupershaneski\u002Fopenai-chatfriend) - A chatbox application built using Nuxt 3 and powered by the OpenAI text completion endpoint. You can select different personalities for your AI friend. The default one responds in Japanese.
You can use this app to practice your Nihongo skills!\n * [chrome-ext-translate-to-hiragana-with-chatgpt](https:\u002F\u002Fgithub.com\u002Ffranzwong\u002Fchrome-ext-translate-to-hiragana-with-chatgpt) - This Chrome extension can translate selected Japanese text to Hiragana by using ChatGPT.\n * [azure-search-openai-demo](https:\u002F\u002Fgithub.com\u002Fnohanaga\u002Fazure-search-openai-demo) - This sample demonstrates several approaches for building ChatGPT-like experiences over your own data using the Retrieval Augmented Generation pattern.\n * [chatvrm](https:\u002F\u002Fgithub.com\u002Fpixiv\u002Fchatvrm) - ChatVRM is a demo application that lets you easily chat with a 3D character in the browser.\n * [sftly-replace](https:\u002F\u002Fgithub.com\u002Fkmizu\u002Fsftly-replace) - A Chrome extension to replace the selected text softly\n * [summarize_arxv](https:\u002F\u002Fgithub.com\u002Frkmt\u002Fsummarize_arxv) - Summarize arXiv paper with figures\n * [aiavatarkit](https:\u002F\u002Fgithub.com\u002Fuezo\u002Faiavatarkit) - Building AI-based conversational avatars lightning fast\n * [pva-aoai-integration-solution](https:\u002F\u002Fgithub.com\u002FCity-of-Kobe\u002Fpva-aoai-integration-solution) - This repository packages and publishes, as a solution, the flows and other assets created for the trial use of ChatGPT at Kobe City Hall.\n * [jp-azureopenai-samples](https:\u002F\u002Fgithub.com\u002Fazure-samples\u002Fjp-azureopenai-samples) - Free sample apps (reference architectures, sample code, and deployment instructions) intended as a reference for implementing applications with Azure OpenAI.\n * [character_chat](https:\u002F\u002Fgithub.com\u002Fmutaguchi\u002Fcharacter_chat) - A chat script that uses the OpenAI API to converse in Japanese with a character you configure.\n * [chatgpt-slackbot](https:\u002F\u002Fgithub.com\u002Fsifue\u002Fchatgpt-slackbot) - A Slackbot script for using OpenAI's ChatGPT API on Slack (assumes use in Japanese)\n * [chatgpt-prompt-sample-japanese](https:\u002F\u002Fgithub.com\u002Fdahatake\u002Fchatgpt-prompt-sample-japanese) - Sample prompts for ChatGPT.\n * [kanji-flashcard-app-gpt4](https:\u002F\u002Fgithub.com\u002Fadilmoujahid\u002Fkanji-flashcard-app-gpt4) - A Japanese Kanji Flashcard App built using Python and Langchain, enhanced
with the intelligence of GPT-4.\n * [IgakuQA](https:\u002F\u002Fgithub.com\u002Fjungokasai\u002FIgakuQA) - Evaluating GPT-4 and ChatGPT on Japanese Medical Licensing Examinations\n * [japagen](https:\u002F\u002Fgithub.com\u002Fretrieva\u002Fjapagen) - 日本語タスクにおけるLLMを用いた疑似学習データ生成の検討\n * [generativeai-prompt-sample-japanese](https:\u002F\u002Fgithub.com\u002Fdahatake\u002Fgenerativeai-prompt-sample-japanese) - ChatGPTやCopilotなど各種生成AI用の「日本語]の Prompt のサンプル\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [VRChatGPT](https:\u002F\u002Fgithub.com\u002FYuchi-Games\u002FVRChatGPT) | - | - | ⭐ 15 | 🔴 march 2023|\n| 🔗 [AITuberDegikkoMirii](https:\u002F\u002Fgithub.com\u002FM-gen\u002FAITuberDegikkoMirii) | - | - | ⭐ 5 | 🔴 march 2023|\n| 🔗 [wanna](https:\u002F\u002Fgithub.com\u002Fhirokidaichi\u002Fwanna) | 📥 68 | 📦 20k | ⭐ 142 | 🔴 april 2023|\n| 🔗 [ChatdollKit](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit) | - | - | ⭐ 1.1k | 🟢 march|\n| 🔗 [ChuanhuChatGPTJapanese](https:\u002F\u002Fgithub.com\u002Fgyokuro33\u002FChuanhuChatGPTJapanese) | - | - | ⭐ 1 | 🔴 march 2023|\n| 🔗 [AISisterAIChan](https:\u002F\u002Fgithub.com\u002Fmanju-summoner\u002FAISisterAIChan) | - | - | ⭐ 26 | 🔴 may 2023|\n| 🔗 [vrchatbot](https:\u002F\u002Fgithub.com\u002FGeson-anko\u002Fvrchatbot) | - | - | ⭐ 29 | 🔴 december 2022|\n| 🔗 [gptuber-by-langchain](https:\u002F\u002Fgithub.com\u002Fkarakuri-ai\u002Fgptuber-by-langchain) | - | - | ⭐ 63 | 🔴 january 2023|\n| 🔗 [openai-chatfriend](https:\u002F\u002Fgithub.com\u002Fsupershaneski\u002Fopenai-chatfriend) | - | - | ⭐ 16 | 🔴 april 2023|\n| 🔗 [chrome-ext-translate-to-hiragana-with-chatgpt](https:\u002F\u002Fgithub.com\u002Ffranzwong\u002Fchrome-ext-translate-to-hiragana-with-chatgpt) | - | - | ⭐ 1 | 🔴 april 2023|\n| 🔗 [azure-search-openai-demo](https:\u002F\u002Fgithub.com\u002Fnohanaga\u002Fazure-search-openai-demo) | - | - | ⭐ 46 | 🔴 december 2023|\n| 🔗 [chatvrm](https:\u002F\u002Fgithub.com\u002Fpixiv\u002Fchatvrm) | 
- | - | ⭐ 834 | 🟡 may 2025|\n| 🔗 [sftly-replace](https:\u002F\u002Fgithub.com\u002Fkmizu\u002Fsftly-replace) | - | - | ⭐ 4 | 🔴 may 2023|\n| 🔗 [summarize_arxv](https:\u002F\u002Fgithub.com\u002Frkmt\u002Fsummarize_arxv) | - | - | ⭐ 173 | 🔴 may 2023|\n| 🔗 [aiavatarkit](https:\u002F\u002Fgithub.com\u002Fuezo\u002Faiavatarkit) | - | - | ⭐ 573 | 🟢 yesterday|\n| 🔗 [pva-aoai-integration-solution](https:\u002F\u002Fgithub.com\u002FCity-of-Kobe\u002Fpva-aoai-integration-solution) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [jp-azureopenai-samples](https:\u002F\u002Fgithub.com\u002Fazure-samples\u002Fjp-azureopenai-samples) | - | - | ⭐ 280 | 🟢 march|\n| 🔗 [character_chat](https:\u002F\u002Fgithub.com\u002Fmutaguchi\u002Fcharacter_chat) | - | - | ⭐ 16 | 🔴 june 2023|\n| 🔗 [chatgpt-slackbot](https:\u002F\u002Fgithub.com\u002Fsifue\u002Fchatgpt-slackbot) | - | - | ⭐ 64 | 🔴 july 2024|\n| 🔗 [chatgpt-prompt-sample-japanese](https:\u002F\u002Fgithub.com\u002Fdahatake\u002Fchatgpt-prompt-sample-japanese) | - | - | ⭐ 428 | 🟢 last thursday|\n| 🔗 [kanji-flashcard-app-gpt4](https:\u002F\u002Fgithub.com\u002Fadilmoujahid\u002Fkanji-flashcard-app-gpt4) | - | - | ⭐ 6 | 🔴 october 2023|\n| 🔗 [IgakuQA](https:\u002F\u002Fgithub.com\u002Fjungokasai\u002FIgakuQA) | - | - | ⭐ 49 | 🔴 march 2023|\n| 🔗 [japagen](https:\u002F\u002Fgithub.com\u002Fretrieva\u002Fjapagen) | - | - | ⭐ 1 | 🔴 october 2024|\n| 🔗 [generativeai-prompt-sample-japanese](https:\u002F\u002Fgithub.com\u002Fdahatake\u002Fgenerativeai-prompt-sample-japanese) | - | - | ⭐ 428 | 🟢 last thursday|\n\n\n## Dictionary and IME\nResources for Japanese dictionaries and input method editors (IME)\n\n * [mecab-ipadic-neologd](https:\u002F\u002Fgithub.com\u002Fneologd\u002Fmecab-ipadic-neologd) - Neologism dictionary based on the language resources on the Web for mecab-ipadic\n * [tdmelodic](https:\u002F\u002Fgithub.com\u002FPKSHATechnology-Research\u002Ftdmelodic) - A Japanese accent dictionary generator\n * 
[jamdict](https:\u002F\u002Fgithub.com\u002Fneocl\u002Fjamdict) - Python 3 library for manipulating Jim Breen's JMdict, KanjiDic2, JMnedict and kanji-radical mappings\n * [unidic-py](https:\u002F\u002Fgithub.com\u002Fpolm\u002Funidic-py) - Unidic packaged for installation via pip.\n * [Japanese-Company-Lexicon](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002FJapanese-Company-Lexicon) - Japanese Company Lexicon (JCLdic)\n * [manbyo-sudachi](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fmanbyo-sudachi) - Sudachi向け万病辞書\n * [jawiki-kana-kanji-dict](https:\u002F\u002Fgithub.com\u002Ftokuhirom\u002Fjawiki-kana-kanji-dict) - Generate SKK\u002FMeCab dictionary from Wikipedia(Japanese edition)\n * [JIWC-Dictionary](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FJIWC-Dictionary) - dictionary to find emotion related to text\n * [JumanDIC](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FJumanDIC) - This repository contains source dictionary files to build dictionaries for JUMAN and Juman++.\n * [ipadic-py](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fipadic-py) - IPAdic packaged for easy use from Python.\n * [unidic-lite](https:\u002F\u002Fgithub.com\u002Fpolm\u002Funidic-lite) - A small version of UniDic for easy pip installs.\n * [emoji-ime-dictionary](https:\u002F\u002Fgithub.com\u002Fpeaceiris\u002Femoji-ime-dictionary) - 日本語で絵文字入力をするための IME 追加辞書 orange_book Google 日本語入力などで日本語から絵文字への変換を可能にする IME 拡張辞書\n * [google-ime-dictionary](https:\u002F\u002Fgithub.com\u002Fpeaceiris\u002Fgoogle-ime-dictionary) - 日英変換・英語略語展開のための IME 追加辞書 orange_book 日本語から英語への和英変換や英語略語の展開を Google 日本語入力や ATOK などで可能にする IME 拡張辞書\n * [dic-nico-intersection-pixiv](https:\u002F\u002Fgithub.com\u002Fncaq\u002Fdic-nico-intersection-pixiv) - ニコニコ大百科とピクシブ百科事典の共通部分のIME辞書\n * [google-ime-user-dictionary-ja-en](https:\u002F\u002Fgithub.com\u002FKEINOS\u002Fgoogle-ime-user-dictionary-ja-en) - GoogleIME用カタカナ語辞書プロジェクトのアーカイブです。Project archive of Google IME user dictionary from Katakana word ( Japanese loanword ) to 
English.\n * [emoticon](https:\u002F\u002Fgithub.com\u002Ftiwanari\u002Femoticon) - Google日本語入力の顔文字辞書∩(,,Ò‿Ó,,)∩\n * [mecab-mozcdic](https:\u002F\u002Fgithub.com\u002Fakirakubo\u002Fmecab-mozcdic) - open source mozc dictionaryをMeCab辞書のフォーマットに変換したものです。\n * [denonbu-ime-dic](https:\u002F\u002Fgithub.com\u002Falbno273\u002Fdenonbu-ime-dic) - 電音IME: Microsoft IMEなどで利用することを想定した「電音部」関連用語の辞書\n * [nijisanji-ime-dic](https:\u002F\u002Fgithub.com\u002FUmichang\u002Fnijisanji-ime-dic) - Microsoft IMEなどで利用することを想定した「にじさんじ」関連用語の用語辞書です。\n * [pokemon-ime-dic](https:\u002F\u002Fgithub.com\u002FUmichang\u002Fpokemon-ime-dic) - Microsoft IMEなどで利用することを想定した、現状判明している全てのポケモンの名前を網羅した用語辞書です。\n * [EJDict](https:\u002F\u002Fgithub.com\u002Fkujirahand\u002FEJDict) - English-Japanese Dictionary data (Public Domain) EJDict-hand\n * [Ayashiy-Nipongo-Dic](https:\u002F\u002Fgithub.com\u002FRinrin0413\u002FAyashiy-Nipongo-Dic) - 贵樣ばこゐ辞畫を使て正レい日本语を使ラことが出來ゑ。\n * [genshin-dict](https:\u002F\u002Fgithub.com\u002Fkotofurumiya\u002Fgenshin-dict) - Windows\u002FmacOSで使える原神の単語辞書です\n * [jmdict-simplified](https:\u002F\u002Fgithub.com\u002Fscriptin\u002Fjmdict-simplified) - JMdict and JMnedict in JSON format\n * [mozcdict-ext](https:\u002F\u002Fgithub.com\u002Freasonset\u002Fmozcdict-ext) - Convert external words into Mozc system dictionary\n * [mh-dict-jp](https:\u002F\u002Fgithub.com\u002Futubo\u002Fmh-dict-jp) - MonsterHunterのユーザー辞書を作りたい…\n * [jitenbot](https:\u002F\u002Fgithub.com\u002Fstephenmk\u002Fjitenbot) - Convert data from Japanese dictionary websites and applications into portable file formats\n * [mecab-unidic-neologd](https:\u002F\u002Fgithub.com\u002Fneologd\u002Fmecab-unidic-neologd) - Neologism dictionary based on the language resources on the Web for mecab-unidic\n * [hololive-dictionary](https:\u002F\u002Fgithub.com\u002Fheppokofrontend\u002Fhololive-dictionary) - ホロライブ（ホロライブプロダクション）に関する辞書ファイルです。.\u002Fdictionary フォルダ内のテキストファイルを使って、IMEに単語を追加できます。詳細はREADME.mdをご覧ください。\n * 
[jmdict-yomitan](https:\u002F\u002Fgithub.com\u002Fthemoeway\u002Fjmdict-yomitan) - JMdict, JMnedict, KANJIDIC for Yomitan\u002FYomichan.\n * [yomichan-jlpt-vocab](https:\u002F\u002Fgithub.com\u002Fstephenmk\u002Fyomichan-jlpt-vocab) - JLPT level tags for words in Yomichan\n * [Jitendex](https:\u002F\u002Fgithub.com\u002Fstephenmk\u002FJitendex) - A free and openly licensed Japanese-to-English dictionary compatible with multiple dictionary clients\n * [jiten](https:\u002F\u002Fgithub.com\u002Fobfusk\u002Fjiten) - japanese android\u002Fcli\u002Fweb dictionary based on jmdict\u002Fkanjidic — 日本語　辞典　和英辞典　漢英字典　和独辞典　和蘭辞典\n * [pixiv-yomitan](https:\u002F\u002Fgithub.com\u002FMarvNC\u002Fpixiv-yomitan) - Pixiv Encyclopedia Dictionary for Yomitan\n * [uchinaaguchi_dict](https:\u002F\u002Fgithub.com\u002Fnanjakkun\u002Fuchinaaguchi_dict) - うちなーぐち辞典（沖縄語辞典）\n * [yomitan-dictionaries](https:\u002F\u002Fgithub.com\u002Fmarvnc\u002Fyomitan-dictionaries) - Japanese and Chinese dictionaries for Yomitan.\n * [mouse_over_dictionary](https:\u002F\u002Fgithub.com\u002Fkengo700\u002Fmouse_over_dictionary) - マウスオーバーした単語を自動で読み取る汎用辞書ツール\n * [jisyo](https:\u002F\u002Fgithub.com\u002Fskk-dict\u002Fjisyo) - かな漢字変換エンジン SKKのための新しい辞書形式\n * [skk-jisyo.emoji-ja](https:\u002F\u002Fgithub.com\u002Fymrl\u002Fskk-jisyo.emoji-ja) - 日本語の読みから Emoji に変換するための SKK 辞書 😂\n * [anthy](https:\u002F\u002Fgithub.com\u002Fnetsphere-labs\u002Fanthy) - Anthy is a kana-kanji conversion engine for Japanese. 
It converts roma-ji to kana, and the kana text to a mixed kana and kanji.\n * [aws_dic_for_google_ime](https:\u002F\u002Fgithub.com\u002Fkonyu\u002Faws_dic_for_google_ime) - AWSサービス名のGoogle日本語入力向けの辞書\n * [cl-skkserv](https:\u002F\u002Fgithub.com\u002Ftani\u002Fcl-skkserv) - Common LispによるSKK辞書サーバーとその拡張\n * [anthy](https:\u002F\u002Fgithub.com\u002Fxorgy\u002Fanthy) - Anthy maintenance\n * [anthy-unicode](https:\u002F\u002Fgithub.com\u002Ffujiwarat\u002Fanthy-unicode) - Anthy Unicode - Another Anthy\n * [azooKey](https:\u002F\u002Fgithub.com\u002Fensan-hcl\u002FazooKey) - azooKey: A Japanese Keyboard iOS Application Fully Developed in Swift\n * [azookey-desktop](https:\u002F\u002Fgithub.com\u002Fensan-hcl\u002Fazookey-desktop) - Japanese Input Method \"azooKey\" for Desktop, supporting macOS\n * [fcitx5-hazkey](https:\u002F\u002Fgithub.com\u002F7ka-hiira\u002Ffcitx5-hazkey) - Japanese input method for fcitx5, powered by azooKey engine\n * [mozcdic-ut-place-names](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-place-names) - Mozc UT Place Name Dictionary is a dictionary converted from the Japan Post's ZIP code data for Mozc.\n * [azookeykanakanjiconverter](https:\u002F\u002Fgithub.com\u002Fensan-hcl\u002Fazookeykanakanjiconverter) - Kana-Kanji Conversion Module written in Swift\n * [libkkc](https:\u002F\u002Fgithub.com\u002Fueno\u002Flibkkc) - Japanese Kana Kanji conversion input method library\n * [libskk](https:\u002F\u002Fgithub.com\u002Fueno\u002Flibskk) - Japanese SKK input method library\n * [kanayomi-dict](https:\u002F\u002Fgithub.com\u002Fwarihima\u002Fkanayomi-dict) - openjtalk形式のユーザー辞書\n * [cjkvi-dict](https:\u002F\u002Fgithub.com\u002Fcjkvi\u002Fcjkvi-dict) - 漢字データベースの辞書関連データ\n * [wlsp-classical](https:\u002F\u002Fgithub.com\u002Fyocjyet\u002Fwlsp-classical) - 古典日本語の分類語彙表データ\n * [kanji-dict](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Fkanji-dict) - 漢字の書き順(筆順)・読み方・画数・部首・用例・成り立ちを調べるための漢字辞書です。Unicode 15.1 のすべての漢字 98,682字を収録しています。\n * 
[Kaomoji_proj](https:\u002F\u002Fgithub.com\u002Fmtripg6666tdr\u002FKaomoji_proj) - (๑ ᴖ ᴑ ᴖ ๑)みょんかおもじ（旧Kaomoji_proj）はMicrosoft社の入力ソフト、Microsoft IME向けの顔文字の辞書を作成するプロジェクトです。\n * [kotlin-kana-kanji-converter](https:\u002F\u002Fgithub.com\u002FKazumaProject\u002Fkotlin-kana-kanji-converter) - Kotlin かな漢字変換プログラム\n * [alfred-japanese-dictionary](https:\u002F\u002Fgithub.com\u002Fchrisgrieser\u002Falfred-japanese-dictionary) - Japanese-English Dictionary using jisho.org with audio, csv export of entries, and preview of dictionary sites.\n * [ichiran](https:\u002F\u002Fgithub.com\u002Ftshatrov\u002Fichiran) - Linguistic tools for texts in Japanese language\n * [mikan](https:\u002F\u002Fgithub.com\u002Fmojyack\u002Fmikan) - A Japanese input method.\n * [colloquial-kansai-dictionary](https:\u002F\u002Fgithub.com\u002Fsethclydesdale\u002Fcolloquial-kansai-dictionary) - A quick reference for the material taught in Colloquial Kansai Japanese.\n * [jisho-open](https:\u002F\u002Fgithub.com\u002Fhlorenzi\u002Fjisho-open) - Web frontend for the JMdict Japanese-English dictionary project, with study list support!\n * [macskk](https:\u002F\u002Fgithub.com\u002Fmtgto\u002Fmacskk) - Yet Another macOS SKK Input Method\n * [nandoku](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Fnandoku) - 難読漢字を学年別にまとめた辞書です。\n * [japanese_android_ime](https:\u002F\u002Fgithub.com\u002Fnelsonapenn\u002Fjapanese_android_ime) - A FOSS Japanese IME for Android\n * [anthywl](https:\u002F\u002Fgithub.com\u002Ftadeokondrak\u002Fanthywl) - Japanese input method for Sway using libanthy\n * [sekka](https:\u002F\u002Fgithub.com\u002Fkiyoka\u002Fsekka) - Yet another Japanese Input Method inspired by SKK.\n * [sumibi](https:\u002F\u002Fgithub.com\u002Fkiyoka\u002Fsumibi) - Japanese input method powered by ChatGPT API\n * [jinmei-dict](https:\u002F\u002Fgithub.com\u002Fs1r-j\u002Fjinmei-dict) - 辞書データから人名だけを抜き出し、読み仮名（カタカナ）をキーとして、候補となる書き文字をリストで保持するようなJSON形式に整形しています。\n * 
[japanesekeyboard](https:\u002F\u002Fgithub.com\u002Fkazumaproject\u002Fjapanesekeyboard) - Sumire: a fully offline Japanese keyboard app\n * [japanesearabic](https:\u002F\u002Fgithub.com\u002Fa-hamdi\u002Fjapanesearabic) - JapaneseArabic Dictionary (Yomitan)\n * [o-dic](https:\u002F\u002Fgithub.com\u002Fmakotoga\u002Fo-dic) - Okinawa dictionary\n * [skk-emoji-jisyo](https:\u002F\u002Fgithub.com\u002Fuasi\u002Fskk-emoji-jisyo) - SKK emoji dictionary\n * [mozcdic-ut-personal-names](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-personal-names) - A personal name dictionary for Mozc.\n * [mozcdic-ut-sudachidict](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-sudachidict) - A dictionary converted from SudachiDict for Mozc.\n * [nihongo](https:\u002F\u002Fgithub.com\u002Fsph-mn\u002Fnihongo) - Japanese language data and dictionary\n * [kagome-dict](https:\u002F\u002Fgithub.com\u002Fikawaha\u002Fkagome-dict) - Dictionary Library for Kagome v2\n * [canna](https:\u002F\u002Fgithub.com\u002Fcanna-input\u002Fcanna) - Canna Japanese input system\n * [kansai-accent-dictionary](https:\u002F\u002Fgithub.com\u002Fnullponull\u002Fkansai-accent-dictionary) - Keihan-type accent (Kansai dialect) dictionary - a Japanese dialect accent dictionary containing 4,615 words\n * [jitendex](https:\u002F\u002Fgithub.com\u002Fjitendex\u002Fjitendex) - A free, offline, and openly licensed Japanese-to-English dictionary. 
Updates monthly!\n * [karukan](https:\u002F\u002Fgithub.com\u002Ftogatoga\u002Fkarukan) - Japanese Input Method System for Linux, Neural Kana-Kanji Conversion Engine + fcitx5 IME\n * [shitto-mania-dic](https:\u002F\u002Fgithub.com\u002Fjunikematsu\u002Fshitto-mania-dic) - 嫉妬辞書（Shitto-Mania \u002F Jealousy Dictionary）\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [mecab-ipadic-neologd](https:\u002F\u002Fgithub.com\u002Fneologd\u002Fmecab-ipadic-neologd) | - | - | ⭐ 2.8k | 🔴 september 2020|\n| 🔗 [tdmelodic](https:\u002F\u002Fgithub.com\u002FPKSHATechnology-Research\u002Ftdmelodic) | - | - | ⭐ 124 | 🔴 march 2024|\n| 🔗 [jamdict](https:\u002F\u002Fgithub.com\u002Fneocl\u002Fjamdict) | 📥 337 | 📦 54k | ⭐ 168 | 🔴 june 2021|\n| 🔗 [unidic-py](https:\u002F\u002Fgithub.com\u002Fpolm\u002Funidic-py) | 📥 72k | 📦 10M | ⭐ 109 | 🔴 february 2025|\n| 🔗 [Japanese-Company-Lexicon](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002FJapanese-Company-Lexicon) | - | - | ⭐ 100 | 🔴 january 2023|\n| 🔗 [manbyo-sudachi](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fmanbyo-sudachi) | - | - | ⭐ 7 | 🔴 april 2021|\n| 🔗 [jawiki-kana-kanji-dict](https:\u002F\u002Fgithub.com\u002Ftokuhirom\u002Fjawiki-kana-kanji-dict) | - | - | ⭐ 61 | 🟢 last tuesday|\n| 🔗 [JIWC-Dictionary](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FJIWC-Dictionary) | - | - | ⭐ 40 | 🔴 january 2021|\n| 🔗 [JumanDIC](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FJumanDIC) | - | - | ⭐ 4 | 🔴 august 2022|\n| 🔗 [ipadic-py](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fipadic-py) | 📥 32k | 📦 7M | ⭐ 24 | 🔴 october 2021|\n| 🔗 [unidic-lite](https:\u002F\u002Fgithub.com\u002Fpolm\u002Funidic-lite) | 📥 78k | 📦 10M | ⭐ 49 | 🔴 september 2020|\n| 🔗 [emoji-ime-dictionary](https:\u002F\u002Fgithub.com\u002Fpeaceiris\u002Femoji-ime-dictionary) | - | - | ⭐ 366 | 🔴 january 2023|\n| 🔗 [google-ime-dictionary](https:\u002F\u002Fgithub.com\u002Fpeaceiris\u002Fgoogle-ime-dictionary) | - | - | ⭐ 104 | 🔴 january 2023|\n| 🔗 
[dic-nico-intersection-pixiv](https:\u002F\u002Fgithub.com\u002Fncaq\u002Fdic-nico-intersection-pixiv) | - | - | ⭐ 83 | 🔴 september 2024|\n| 🔗 [google-ime-user-dictionary-ja-en](https:\u002F\u002Fgithub.com\u002FKEINOS\u002Fgoogle-ime-user-dictionary-ja-en) | - | - | ⭐ 58 | 🔴 december 2016|\n| 🔗 [emoticon](https:\u002F\u002Fgithub.com\u002Ftiwanari\u002Femoticon) | - | - | ⭐ 44 | 🔴 may 2020|\n| 🔗 [mecab-mozcdic](https:\u002F\u002Fgithub.com\u002Fakirakubo\u002Fmecab-mozcdic) | - | - | ⭐ 10 | 🔴 january 2018|\n| 🔗 [denonbu-ime-dic](https:\u002F\u002Fgithub.com\u002Falbno273\u002Fdenonbu-ime-dic) | - | - | ⭐ 2 | 🔴 november 2022|\n| 🔗 [nijisanji-ime-dic](https:\u002F\u002Fgithub.com\u002FUmichang\u002Fnijisanji-ime-dic) | - | - | ⭐ 38 | 🟢 march|\n| 🔗 [pokemon-ime-dic](https:\u002F\u002Fgithub.com\u002FUmichang\u002Fpokemon-ime-dic) | - | - | ⭐ 0 | 🔴 january 2020|\n| 🔗 [EJDict](https:\u002F\u002Fgithub.com\u002Fkujirahand\u002FEJDict) | - | - | ⭐ 254 | 🟡 november 2025|\n| 🔗 [Ayashiy-Nipongo-Dic](https:\u002F\u002Fgithub.com\u002FRinrin0413\u002FAyashiy-Nipongo-Dic) | - | - | ⭐ 26 | 🔴 may 2024|\n| 🔗 [genshin-dict](https:\u002F\u002Fgithub.com\u002Fkotofurumiya\u002Fgenshin-dict) | - | - | ⭐ 126 | 🟢 february|\n| 🔗 [jmdict-simplified](https:\u002F\u002Fgithub.com\u002Fscriptin\u002Fjmdict-simplified) | - | - | ⭐ 349 | 🟢 last monday|\n| 🔗 [mozcdict-ext](https:\u002F\u002Fgithub.com\u002Freasonset\u002Fmozcdict-ext) | - | - | ⭐ 69 | 🟡 september 2025|\n| 🔗 [mh-dict-jp](https:\u002F\u002Fgithub.com\u002Futubo\u002Fmh-dict-jp) | - | - | ⭐ 5 | 🟡 april 2025|\n| 🔗 [jitenbot](https:\u002F\u002Fgithub.com\u002Fstephenmk\u002Fjitenbot) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [mecab-unidic-neologd](https:\u002F\u002Fgithub.com\u002Fneologd\u002Fmecab-unidic-neologd) | - | - | ⭐ 87 | 🔴 september 2020|\n| 🔗 [hololive-dictionary](https:\u002F\u002Fgithub.com\u002Fheppokofrontend\u002Fhololive-dictionary) | - | - | ⭐ 24 | 🔴 december 2024|\n| 🔗 
[jmdict-yomitan](https:\u002F\u002Fgithub.com\u002Fthemoeway\u002Fjmdict-yomitan) | - | - | ⭐ 259 | 🟢 february|\n| 🔗 [yomichan-jlpt-vocab](https:\u002F\u002Fgithub.com\u002Fstephenmk\u002Fyomichan-jlpt-vocab) | - | - | ⭐ 126 | 🟡 august 2025|\n| 🔗 [Jitendex](https:\u002F\u002Fgithub.com\u002Fstephenmk\u002FJitendex) | - | - | ⭐ 466 | 🟢 today|\n| 🔗 [jiten](https:\u002F\u002Fgithub.com\u002Fobfusk\u002Fjiten) | - | - | ⭐ 129 | 🔴 december 2023|\n| 🔗 [pixiv-yomitan](https:\u002F\u002Fgithub.com\u002FMarvNC\u002Fpixiv-yomitan) | - | - | ⭐ 55 | 🟢 march|\n| 🔗 [uchinaaguchi_dict](https:\u002F\u002Fgithub.com\u002Fnanjakkun\u002Fuchinaaguchi_dict) | - | - | ⭐ 4 | 🟢 last monday|\n| 🔗 [yomitan-dictionaries](https:\u002F\u002Fgithub.com\u002Fmarvnc\u002Fyomitan-dictionaries) | - | - | ⭐ 755 | 🟢 march|\n| 🔗 [mouse_over_dictionary](https:\u002F\u002Fgithub.com\u002Fkengo700\u002Fmouse_over_dictionary) | - | - | ⭐ 72 | 🔴 january 2020|\n| 🔗 [jisyo](https:\u002F\u002Fgithub.com\u002Fskk-dict\u002Fjisyo) | - | - | ⭐ 28 | 🔴 september 2023|\n| 🔗 [skk-jisyo.emoji-ja](https:\u002F\u002Fgithub.com\u002Fymrl\u002Fskk-jisyo.emoji-ja) | - | - | ⭐ 30 | 🔴 march 2018|\n| 🔗 [aws_dic_for_google_ime](https:\u002F\u002Fgithub.com\u002Fkonyu\u002Faws_dic_for_google_ime) | - | - | ⭐ 7 | 🔴 november 2019|\n| 🔗 [cl-skkserv](https:\u002F\u002Fgithub.com\u002Ftani\u002Fcl-skkserv) | - | - | ⭐ 31 | 🔴 october 2024|\n| 🔗 [anthy](https:\u002F\u002Fgithub.com\u002Fxorgy\u002Fanthy) | - | - | ⭐ 3 | 🔴 july 2013|\n| 🔗 [anthy-unicode](https:\u002F\u002Fgithub.com\u002Ffujiwarat\u002Fanthy-unicode) | - | - | ⭐ 42 | 🟢 march|\n| 🔗 [azooKey](https:\u002F\u002Fgithub.com\u002Fensan-hcl\u002FazooKey) | - | - | ⭐ 684 | 🟢 yesterday|\n| 🔗 [azookey-desktop](https:\u002F\u002Fgithub.com\u002Fensan-hcl\u002Fazookey-desktop) | - | - | ⭐ 876 | 🟢 last monday|\n| 🔗 [fcitx5-hazkey](https:\u002F\u002Fgithub.com\u002F7ka-hiira\u002Ffcitx5-hazkey) | - | - | ⭐ 183 | 🟢 february|\n| 🔗 
[mozcdic-ut-place-names](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-place-names) | - | - | ⭐ 22 | 🟢 last thursday|\n| 🔗 [azookeykanakanjiconverter](https:\u002F\u002Fgithub.com\u002Fensan-hcl\u002Fazookeykanakanjiconverter) | - | - | ⭐ 139 | 🟢 last tuesday|\n| 🔗 [libkkc](https:\u002F\u002Fgithub.com\u002Fueno\u002Flibkkc) | - | - | ⭐ 112 | 🔴 august 2024|\n| 🔗 [libskk](https:\u002F\u002Fgithub.com\u002Fueno\u002Flibskk) | - | - | ⭐ 100 | 🟢 march|\n| 🔗 [kanayomi-dict](https:\u002F\u002Fgithub.com\u002Fwarihima\u002Fkanayomi-dict) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [cjkvi-dict](https:\u002F\u002Fgithub.com\u002Fcjkvi\u002Fcjkvi-dict) | - | - | ⭐ 110 | 🔴 september 2017|\n| 🔗 [wlsp-classical](https:\u002F\u002Fgithub.com\u002Fyocjyet\u002Fwlsp-classical) | - | - | ⭐ 2 | 🟡 november 2025|\n| 🔗 [kanji-dict](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Fkanji-dict) | - | - | ⭐ 6 | 🟢 march|\n| 🔗 [Kaomoji_proj](https:\u002F\u002Fgithub.com\u002Fmtripg6666tdr\u002FKaomoji_proj) | - | - | ⭐ 11 | 🟡 october 2025|\n| 🔗 [kotlin-kana-kanji-converter](https:\u002F\u002Fgithub.com\u002FKazumaProject\u002Fkotlin-kana-kanji-converter) | - | - | ⭐ 5 | 🟢 last wednesday|\n| 🔗 [alfred-japanese-dictionary](https:\u002F\u002Fgithub.com\u002Fchrisgrieser\u002Falfred-japanese-dictionary) | - | - | ⭐ 6 | 🟢 february|\n| 🔗 [ichiran](https:\u002F\u002Fgithub.com\u002Ftshatrov\u002Fichiran) | - | - | ⭐ 390 | 🟢 january|\n| 🔗 [mikan](https:\u002F\u002Fgithub.com\u002Fmojyack\u002Fmikan) | - | - | ⭐ 24 | 🟡 june 2025|\n| 🔗 [colloquial-kansai-dictionary](https:\u002F\u002Fgithub.com\u002Fsethclydesdale\u002Fcolloquial-kansai-dictionary) | - | - | ⭐ 9 | 🟢 february|\n| 🔗 [jisho-open](https:\u002F\u002Fgithub.com\u002Fhlorenzi\u002Fjisho-open) | - | - | ⭐ 57 | 🟢 february|\n| 🔗 [macskk](https:\u002F\u002Fgithub.com\u002Fmtgto\u002Fmacskk) | - | - | ⭐ 287 | 🟢 today|\n| 🔗 [nandoku](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Fnandoku) | - | - | ⭐ 1 | 🟢 february|\n| 🔗 
[japanese_android_ime](https:\u002F\u002Fgithub.com\u002Fnelsonapenn\u002Fjapanese_android_ime) | - | - | ⭐ 2 | 🟡 september 2025|\n| 🔗 [anthywl](https:\u002F\u002Fgithub.com\u002Ftadeokondrak\u002Fanthywl) | - | - | ⭐ 34 | 🟡 april 2025|\n| 🔗 [sekka](https:\u002F\u002Fgithub.com\u002Fkiyoka\u002Fsekka) | - | - | ⭐ 24 | 🟡 july 2025|\n| 🔗 [sumibi](https:\u002F\u002Fgithub.com\u002Fkiyoka\u002Fsumibi) | - | - | ⭐ 43 | 🟢 march|\n| 🔗 [jinmei-dict](https:\u002F\u002Fgithub.com\u002Fs1r-j\u002Fjinmei-dict) | - | - | ⭐ 7 | 🔴 april 2020|\n| 🔗 [japanesekeyboard](https:\u002F\u002Fgithub.com\u002Fkazumaproject\u002Fjapanesekeyboard) | - | - | ⭐ 226 | 🟢 last friday|\n| 🔗 [japanesearabic](https:\u002F\u002Fgithub.com\u002Fa-hamdi\u002Fjapanesearabic) | - | - | ⭐ 19 | 🟡 may 2025|\n| 🔗 [o-dic](https:\u002F\u002Fgithub.com\u002Fmakotoga\u002Fo-dic) | - | - | ⭐ 6 | 🔴 invalid|\n| 🔗 [skk-emoji-jisyo](https:\u002F\u002Fgithub.com\u002Fuasi\u002Fskk-emoji-jisyo) | - | - | ⭐ 140 | 🔴 january 2025|\n| 🔗 [mozcdic-ut-personal-names](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-personal-names) | - | - | ⭐ 26 | 🟢 last thursday|\n| 🔗 [mozcdic-ut-sudachidict](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-sudachidict) | - | - | ⭐ 22 | 🟢 february|\n| 🔗 [nihongo](https:\u002F\u002Fgithub.com\u002Fsph-mn\u002Fnihongo) | - | - | ⭐ 20 | 🔴 january 2025|\n| 🔗 [kagome-dict](https:\u002F\u002Fgithub.com\u002Fikawaha\u002Fkagome-dict) | - | - | ⭐ 15 | 🟢 march|\n| 🔗 [canna](https:\u002F\u002Fgithub.com\u002Fcanna-input\u002Fcanna) | - | - | ⭐ 4 | 🟡 august 2025|\n| 🔗 [kansai-accent-dictionary](https:\u002F\u002Fgithub.com\u002Fnullponull\u002Fkansai-accent-dictionary) | - | - | ⭐ 1 | 🟡 december 2025|\n| 🔗 [jitendex](https:\u002F\u002Fgithub.com\u002Fjitendex\u002Fjitendex) | - | - | ⭐ 466 | 🟢 today|\n| 🔗 [karukan](https:\u002F\u002Fgithub.com\u002Ftogatoga\u002Fkarukan) | - | - | ⭐ 262 | 🟢 february|\n| 🔗 
[shitto-mania-dic](https:\u002F\u002Fgithub.com\u002Fjunikematsu\u002Fshitto-mania-dic) | - | - | ⭐ 0 | 🟢 march|\n\n\n## Corpus\n\n### Part-of-speech tagging \u002F Named entity recognition\nCorpora annotated with part-of-speech tags and named entities\n\n * [ner-wikipedia-dataset](https:\u002F\u002Fgithub.com\u002Fstockmarkteam\u002Fner-wikipedia-dataset) - Japanese named entity recognition dataset built from Wikipedia\n * [IOB2Corpus](https:\u002F\u002Fgithub.com\u002FHironsan\u002FIOB2Corpus) - Japanese IOB2 tagged corpus for Named Entity Recognition.\n * [TwitterCorpus](https:\u002F\u002Fgithub.com\u002Ftmu-nlp\u002FTwitterCorpus) - Tokyo Metropolitan University Japanese Twitter Corpus\n * [UD_Japanese-PUD](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002FUD_Japanese-PUD) - Parallel Universal Dependencies.\n * [UD_Japanese-GSD](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002FUD_Japanese-GSD) - Japanese data from the Google UDT 2.0.\n * [KWDLC](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FKWDLC) - Kyoto University Web Document Leads Corpus\n * [AnnotatedFKCCorpus](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FAnnotatedFKCCorpus) - Annotated Fuman Kaitori Center Corpus\n * [UD_Japanese-GSDLUW](https:\u002F\u002Fgithub.com\u002FUniversalDependencies\u002FUD_Japanese-GSDLUW) - Long-unit-word version of UD_Japanese-GSD\n * [ud_japanese-bccwj](https:\u002F\u002Fgithub.com\u002Funiversaldependencies\u002Fud_japanese-bccwj) - This Universal Dependencies (UD) Japanese treebank is based on the definition of UD Japanese convention described in the UD documentation.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [ner-wikipedia-dataset](https:\u002F\u002Fgithub.com\u002Fstockmarkteam\u002Fner-wikipedia-dataset) | - | - | ⭐ 142 | 🔴 september 2023|\n| 🔗 [IOB2Corpus](https:\u002F\u002Fgithub.com\u002FHironsan\u002FIOB2Corpus) | - | - | ⭐ 61 | 🔴 february 2020|\n| 🔗 [TwitterCorpus](https:\u002F\u002Fgithub.com\u002Ftmu-nlp\u002FTwitterCorpus) | - | - | ⭐ 21 | 🔴 march 2016|\n| 🔗 
[UD_Japanese-PUD](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002FUD_Japanese-PUD) | - | - | ⭐ 0 | 🔴 may 2020|\n| 🔗 [UD_Japanese-GSD](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002FUD_Japanese-GSD) | - | - | ⭐ 28 | 🔴 may 2022|\n| 🔗 [KWDLC](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FKWDLC) | - | - | ⭐ 83 | 🔴 december 2023|\n| 🔗 [AnnotatedFKCCorpus](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FAnnotatedFKCCorpus) | - | - | ⭐ 18 | 🔴 december 2023|\n| 🔗 [anthy](https:\u002F\u002Fgithub.com\u002Fnetsphere-labs\u002Fanthy) | - | - | ⭐ 16 | 🔴 february 2023|\n| 🔗 [UD_Japanese-GSDLUW](https:\u002F\u002Fgithub.com\u002FUniversalDependencies\u002FUD_Japanese-GSDLUW) | - | - | ⭐ 3 | 🟡 november 2025|\n| 🔗 [ud_japanese-bccwj](https:\u002F\u002Fgithub.com\u002Funiversaldependencies\u002Fud_japanese-bccwj) | - | - | ⭐ 26 | 🟡 november 2025|\n\n\n### Parallel corpus\nBilingual corpora containing aligned sentences for translation tasks\n\n * [small_parallel_enja](https:\u002F\u002Fgithub.com\u002Fodashi\u002Fsmall_parallel_enja) - 50k English-Japanese Parallel Corpus for Machine Translation Benchmark.\n * [Web-Crawled-Corpus-for-Japanese-Chinese-NMT](https:\u002F\u002Fgithub.com\u002Fzhang-jinyi\u002FWeb-Crawled-Corpus-for-Japanese-Chinese-NMT) - A Web Crawled Corpus for Japanese-Chinese NMT\n * [CourseraParallelCorpusMining](https:\u002F\u002Fgithub.com\u002Fshyyhs\u002FCourseraParallelCorpusMining) - Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation\n * [JESC](https:\u002F\u002Fgithub.com\u002Frpryzant\u002FJESC) - A large parallel corpus of English and Japanese\n * [AMI-Meeting-Parallel-Corpus](https:\u002F\u002Fgithub.com\u002Ftsuruoka-lab\u002FAMI-Meeting-Parallel-Corpus) - AMI Meeting Parallel Corpus\n * [giant_ja-en_parallel_corpus](https:\u002F\u002Fgithub.com\u002FDayuanJiang\u002Fgiant_ja-en_parallel_corpus) - This directory includes a giant Japanese-English subtitle corpus. 
The raw data comes from Stanford’s JESC project.\n * [jesc_small](https:\u002F\u002Fgithub.com\u002Fyusugomori\u002Fjesc_small) - Small Japanese-English Subtitle Corpus\n * [graded-enja-corpus](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Fgraded-enja-corpus) - A Japanese-English parallel corpus that takes prohibited terms and word difficulty levels into account.\n * [cjk-compsci-terms](https:\u002F\u002Fgithub.com\u002Fdahlia\u002Fcjk-compsci-terms) - CJK computer science terms comparison\n * [Laboro-ParaCorpus](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FLaboro-ParaCorpus) - Scripts for creating a Japanese-English parallel corpus and training NMT models\n * [google-vs-deepl-je](https:\u002F\u002Fgithub.com\u002FTzawa\u002Fgoogle-vs-deepl-je) - google-vs-deepl-je\n * [matcha](https:\u002F\u002Fgithub.com\u002Fehimenlp\u002Fmatcha) - A dataset for Japanese text simplification built from articles in MATCHA, a media outlet for tourists visiting Japan.\n * [en-ja-el](https:\u002F\u002Fgithub.com\u002Fshigashiyama\u002Fen-ja-el) - EnJaEL: En-Ja Parallel Entity Linking Dataset (Version 1.0)\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [small_parallel_enja](https:\u002F\u002Fgithub.com\u002Fodashi\u002Fsmall_parallel_enja) | - | - | ⭐ 98 | 🔴 september 2019|\n| 🔗 [Web-Crawled-Corpus-for-Japanese-Chinese-NMT](https:\u002F\u002Fgithub.com\u002Fzhang-jinyi\u002FWeb-Crawled-Corpus-for-Japanese-Chinese-NMT) | - | - | ⭐ 15 | 🔴 september 2023|\n| 🔗 [CourseraParallelCorpusMining](https:\u002F\u002Fgithub.com\u002Fshyyhs\u002FCourseraParallelCorpusMining) | - | - | ⭐ 15 | 🔴 august 2024|\n| 🔗 [JESC](https:\u002F\u002Fgithub.com\u002Frpryzant\u002FJESC) | - | - | ⭐ 89 | 🔴 november 2017|\n| 🔗 [AMI-Meeting-Parallel-Corpus](https:\u002F\u002Fgithub.com\u002Ftsuruoka-lab\u002FAMI-Meeting-Parallel-Corpus) | - | - | ⭐ 11 | 🔴 december 2020|\n| 🔗 [giant_ja-en_parallel_corpus](https:\u002F\u002Fgithub.com\u002FDayuanJiang\u002Fgiant_ja-en_parallel_corpus) | - | - | ⭐ 5 | 🔴 august 2019|\n| 🔗 
[jesc_small](https:\u002F\u002Fgithub.com\u002Fyusugomori\u002Fjesc_small) | - | - | ⭐ 3 | 🔴 july 2019|\n| 🔗 [graded-enja-corpus](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Fgraded-enja-corpus) | - | - | ⭐ 6 | 🟡 august 2025|\n| 🔗 [cjk-compsci-terms](https:\u002F\u002Fgithub.com\u002Fdahlia\u002Fcjk-compsci-terms) | - | - | ⭐ 150 | 🟢 february|\n| 🔗 [Laboro-ParaCorpus](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FLaboro-ParaCorpus) | - | - | ⭐ 18 | 🔴 november 2021|\n| 🔗 [google-vs-deepl-je](https:\u002F\u002Fgithub.com\u002FTzawa\u002Fgoogle-vs-deepl-je) | - | - | ⭐ 4 | 🔴 march 2020|\n| 🔗 [matcha](https:\u002F\u002Fgithub.com\u002Fehimenlp\u002Fmatcha) | - | - | ⭐ 6 | 🔴 january 2025|\n| 🔗 [en-ja-el](https:\u002F\u002Fgithub.com\u002Fshigashiyama\u002Fen-ja-el) | - | - | ⭐ 2 | 🔴 january 2025|\n\n\n### Dialog corpus\nCollections of conversation data for training dialogue systems\n\n * [JMRD](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FJMRD) - Japanese Movie Recommendation Dialogue dataset\n * [open2ch-dialogue-corpus](https:\u002F\u002Fgithub.com\u002F1never\u002Fopen2ch-dialogue-corpus) - Dialogue corpus created by crawling Open2ch\n * [BSD](https:\u002F\u002Fgithub.com\u002Ftsuruoka-lab\u002FBSD) - The Business Scene Dialogue corpus\n * [asdc](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fasdc) - Accommodation Search Dialog Corpus\n * [japanese-corpus](https:\u002F\u002Fgithub.com\u002FMokkeMeguru\u002Fjapanese-corpus) - Japanese dialogue data for seq2seq, etc.\n * [BPersona-chat](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FBPersona-chat) - This repository contains the Japanese–English bilingual chat corpus BPersona-chat published in the paper Chat Translation Error Detection for Assisting Cross-lingual Communications at AACL-IJCNLP 2022's Workshop Eval4NLP 2022.\n * [japanese-daily-dialogue](https:\u002F\u002Fgithub.com\u002Fjqk09a\u002Fjapanese-daily-dialogue) - Japanese Daily Dialogue, or 日本語日常対話コーパス in Japanese, is a high-quality multi-turn dialogue dataset 
containing daily conversations on five topics: daily life, school, travel, health, and entertainment.\n * [llm-japanese-dataset](https:\u002F\u002Fgithub.com\u002Fmasanorihirano\u002Fllm-japanese-dataset) - Japanese chat dataset for building LLMs\n * [kokorochat](https:\u002F\u002Fgithub.com\u002Fuec-inabalab\u002Fkokorochat) - Japanese counseling dialogue dataset collected through role-play\n * [JMultiWOZ-TC](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002FJMultiWOZ-TC) - Evaluation of agent function calling in multi-turn dialogue\n * [HOTATE](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FHOTATE) - Japanese dialogue dataset annotated with honne (true feelings) and tatemae (public stance)\n * [ETCDataset](https:\u002F\u002Fgithub.com\u002FUEC-InabaLab\u002FETCDataset) - The Emotion Transcription in Conversation Dataset is a Japanese dialogue dataset of about 1,000 dialogues in which each utterance is accompanied by a description of the speaker's feelings written by the speaker.\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [JMRD](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FJMRD) | - | - | ⭐ 29 | 🔴 july 2022|\n| 🔗 [open2ch-dialogue-corpus](https:\u002F\u002Fgithub.com\u002F1never\u002Fopen2ch-dialogue-corpus) | - | - | ⭐ 99 | 🔴 june 2021|\n| 🔗 [BSD](https:\u002F\u002Fgithub.com\u002Ftsuruoka-lab\u002FBSD) | - | - | ⭐ 73 | 🔴 november 2021|\n| 🔗 [asdc](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fasdc) | - | - | ⭐ 25 | 🔴 august 2023|\n| 🔗 [japanese-corpus](https:\u002F\u002Fgithub.com\u002FMokkeMeguru\u002Fjapanese-corpus) | - | - | ⭐ 3 | 🔴 october 2018|\n| 🔗 [BPersona-chat](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FBPersona-chat) | - | - | ⭐ 5 | 🔴 january 2023|\n| 🔗 [japanese-daily-dialogue](https:\u002F\u002Fgithub.com\u002Fjqk09a\u002Fjapanese-daily-dialogue) | - | - | ⭐ 56 | 🔴 march 2023|\n| 🔗 [llm-japanese-dataset](https:\u002F\u002Fgithub.com\u002Fmasanorihirano\u002Fllm-japanese-dataset) | - | - | ⭐ 88 | 🔴 january 2024|\n| 🔗 [kokorochat](https:\u002F\u002Fgithub.com\u002Fuec-inabalab\u002Fkokorochat) | - | - | ⭐ 20 | 🟡 august 2025|\n| 🔗 [JMultiWOZ-TC](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002FJMultiWOZ-TC) | - | - | ⭐ 0 | 🟢 march|\n| 🔗 
[HOTATE](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FHOTATE) | - | - | ⭐ 1 | 🟢 february|\n| 🔗 [ETCDataset](https:\u002F\u002Fgithub.com\u002FUEC-InabaLab\u002FETCDataset) | - | - | ⭐ 12 | 🟢 january|\n\n### Others\nCorpora for tasks such as question answering or entailment recognition\n\n * [jrte-corpus](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fjrte-corpus) - Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)\n * [kanji-data](https:\u002F\u002Fgithub.com\u002Fdavidluzgouveia\u002Fkanji-data) - A JSON kanji dataset with updated JLPT levels and WaniKani information\n * [JapaneseWordSimilarityDataset](https:\u002F\u002Fgithub.com\u002Ftmu-nlp\u002FJapaneseWordSimilarityDataset) - Japanese Word Similarity Dataset\n * [simple-jppdb](https:\u002F\u002Fgithub.com\u002Ftmu-nlp\u002Fsimple-jppdb) - A paraphrase database for Japanese text simplification\n * [chABSA-dataset](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002FchABSA-dataset) - chakki's Aspect-Based Sentiment Analysis dataset\n * [JaQuAD](https:\u002F\u002Fgithub.com\u002FSkelterLabsInc\u002FJaQuAD) - JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension (2022, Skelter Labs)\n * [JaNLI](https:\u002F\u002Fgithub.com\u002Fverypluming\u002FJaNLI) - Japanese Adversarial Natural Language Inference Dataset\n * [ebe-dataset](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Febe-dataset) - Evidence-based Explanation Dataset (AACL-IJCNLP 2020)\n * [emoji-ja](https:\u002F\u002Fgithub.com\u002Fyagays\u002Femoji-ja) - Dictionary of Japanese readings\u002Fkeywords\u002Fcategories for Unicode emoji\n * [nayose-wikipedia-ja](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fnayose-wikipedia-ja) - Japanese name aggregation (nayose) dataset created from Wikipedia\n * [ja.text8](https:\u002F\u002Fgithub.com\u002FHironsan\u002Fja.text8) - Japanese text8 corpus for word embedding.\n * [ThreeLineSummaryDataset](https:\u002F\u002Fgithub.com\u002FKodairaTomonori\u002FThreeLineSummaryDataset) - Three-line summary dataset\n * 
[japanese](https:\u002F\u002Fgithub.com\u002Fhingston\u002Fjapanese) - This repo contains a list of the 44,998 most common Japanese words in order of frequency, as determined by the University of Leeds Corpus.\n * [kanji-frequency](https:\u002F\u002Fgithub.com\u002Fscriptin\u002Fkanji-frequency) - Kanji usage frequency data collected from various sources\n * [TEDxJP-10K](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FTEDxJP-10K) - TEDxJP-10K ASR Evaluation Dataset\n * [CoARiJ](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002FCoARiJ) - Corpus of Annual Reports in Japan\n * [technological-book-corpus-ja](https:\u002F\u002Fgithub.com\u002Ftextlint-ja\u002Ftechnological-book-corpus-ja) - Raw corpus\u002Ftools collecting technical books written in Japanese\n * [ita-corpus-chuwa](https:\u002F\u002Fgithub.com\u002Fshirayu\u002Fita-corpus-chuwa) - Chunked word annotation for ITA corpus\n * [wikipedia-utils](https:\u002F\u002Fgithub.com\u002Fsingletongue\u002Fwikipedia-utils) - Utility scripts for preprocessing Wikipedia texts for NLP\n * [inappropriate-words-ja](https:\u002F\u002Fgithub.com\u002FMosasoM\u002Finappropriate-words-ja) - Collects inappropriate expressions in Japanese. Intended to be useful for tasks such as data cleaning in natural language processing.\n * [house-of-councillors](https:\u002F\u002Fgithub.com\u002Fsmartnews-smri\u002Fhouse-of-councillors) - Data on parliamentary groups, members, bills, and written questions, compiled from the official website of the House of Councillors.\n * [house-of-representatives](https:\u002F\u002Fgithub.com\u002Fsmartnews-smri\u002Fhouse-of-representatives) - Diet bill database: House of Representatives\n * [STAIR-captions](https:\u002F\u002Fgithub.com\u002FSTAIR-Lab-CIT\u002FSTAIR-captions) - STAIR captions: large-scale Japanese image caption dataset\n * [Winograd-Schema-Challenge-Ja](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FWinograd-Schema-Challenge-Ja) - Japanese Translation of Winograd Schema Challenge\n * [speechBSD](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FspeechBSD) - An extension of the BSD corpus with audio and speaker attribute information\n * [ita-corpus](https:\u002F\u002Fgithub.com\u002Fmmorise\u002Fita-corpus) - Sentence list of the ITA corpus\n * 
[rohan4600](https:\u002F\u002Fgithub.com\u002Fmmorise\u002Frohan4600) - Mora-balanced Japanese corpus\n * [anlp-jp-history](https:\u002F\u002Fgithub.com\u002Fwhym\u002Fanlp-jp-history) - Complete list of presentations at the annual meetings of the Association for Natural Language Processing, including a machine-readable version\n * [keigo_transfer_task](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fkeigo_transfer_task) - Evaluation dataset for the honorific (keigo) conversion task\n * [loanwords_gairaigo](https:\u002F\u002Fgithub.com\u002Fjamesohortle\u002Floanwords_gairaigo) - English loanwords in Japanese\n * [jawikicorpus](https:\u002F\u002Fgithub.com\u002Fwikiwikification\u002Fjawikicorpus) - Japanese-Wikipedia Wikification Corpus\n * [GeneralPolicySpeechOfPrimeMinisterOfJapan](https:\u002F\u002Fgithub.com\u002Fyuukimiyo\u002FGeneralPolicySpeechOfPrimeMinisterOfJapan) - A corpus of Japanese text from the general policy speeches of the Prime Minister of Japan\n * [wrime](https:\u002F\u002Fgithub.com\u002Fids-cv\u002Fwrime) - WRIME: a dataset for subjective and objective emotion analysis\n * [jtubespeech](https:\u002F\u002Fgithub.com\u002Fsarulab-speech\u002Fjtubespeech) - JTubeSpeech: Corpus of Japanese speech collected from YouTube\n * [WikipediaWordFrequencyList](https:\u002F\u002Fgithub.com\u002Fmaeda6uiui-backup\u002FWikipediaWordFrequencyList) - List of frequent words used in Japanese Wikipedia\n * [kokkosho_data](https:\u002F\u002Fgithub.com\u002Frindybell\u002Fkokkosho_data) - Dataset of vehicle defect information\n * [pdmocrdataset-part1](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fpdmocrdataset-part1) - OCR training dataset created as part of the project to convert digitized materials to OCR text\n * [huriganacorpus-ndlbib](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhuriganacorpus-ndlbib) - Furigana dataset created from Japanese National Bibliography data\n * [jvs_hiho](https:\u002F\u002Fgithub.com\u002FHiroshiba\u002Fjvs_hiho) - Self-made labels for the JVS (Japanese versatile speech) corpus\n * [hirakanadic](https:\u002F\u002Fgithub.com\u002Fpo3rin\u002Fhirakanadic) - Allows Sudachi to normalize from hiragana to katakana from any compound word list\n * [animedb](https:\u002F\u002Fgithub.com\u002Fanilogia\u002Fanimedb) - Database listing anime works spanning about 100 years\n * 
[security_words](https:\u002F\u002Fgithub.com\u002FSaitoLab\u002Fsecurity_words) - Japanese-English correspondence of names of public organizations related to cybersecurity\n * [Data-on-Japanese-Diet-Members](https:\u002F\u002Fgithub.com\u002Fsugi2000\u002FData-on-Japanese-Diet-Members) - Data on members of the Japanese Diet\n * [honkoku-data](https:\u002F\u002Fgithub.com\u002Fyuta1984\u002Fhonkoku-data) - Transcription texts created on Minna de Honkoku (https:\u002F\u002Fhonkoku.org), a crowdsourced transcription platform for historical Japanese documents.\n * [wikihow_japanese](https:\u002F\u002Fgithub.com\u002FKatsumata420\u002Fwikihow_japanese) - wikiHow dataset (Japanese version)\n * [engineer-vocabulary-list](https:\u002F\u002Fgithub.com\u002Fmercari\u002Fengineer-vocabulary-list) - Engineer Vocabulary List in Japanese\u002FEnglish\n * [JSICK](https:\u002F\u002Fgithub.com\u002Fverypluming\u002FJSICK) - Japanese Sentences Involving Compositional Knowledge (JSICK) Dataset\u002FJSICK-stress Test Set\n * [phishurl-list](https:\u002F\u002Fgithub.com\u002FJPCERTCC\u002Fphishurl-list) - Phishing URL dataset from JPCERT\u002FCC\n * [jcms](https:\u002F\u002Fgithub.com\u002Fshigashiyama\u002Fjcms) - A Japanese Corpus of Many Specialized Domains (JCMS)\n * [aozorabunko_text](https:\u002F\u002Fgithub.com\u002Faozorahack\u002Faozorabunko_text) - text-only archives of www.aozora.gr.jp\n * [friendly_JA-Corpus](https:\u002F\u002Fgithub.com\u002Fastremo\u002Ffriendly_JA-Corpus) - friendly_JA is a parallel Japanese-to-Japanese corpus aimed at making Japanese easier by using the Latin\u002FEnglish derived katakana lexicon instead of the standard Sino-Japanese lexicon\n * [topokanji](https:\u002F\u002Fgithub.com\u002Fscriptin\u002Ftopokanji) - Topologically ordered lists of kanji for effective learning\n * [isbn4groups](https:\u002F\u002Fgithub.com\u002Furibo\u002Fisbn4groups) - Data and related materials on Japanese-language publications in ISBN-13 (978-4-XXXXXXXXX)\n * [NMeCab](https:\u002F\u002Fgithub.com\u002Fkomutan\u002FNMeCab) - NMeCab: About Japanese 
morphological analyzer on .NET\n * [ndlngramdata](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndlngramdata) - Dataset of n-gram frequency statistics for OCR text data created from digitized materials\n * [ndlngramviewer_v2](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndlngramviewer_v2) - Source code and related files for NDL Ngram Viewer, which was renewed in January 2023\n * [data_set](https:\u002F\u002Fgithub.com\u002Fjapanese-law-analysis\u002Fdata_set) - Datasets related to laws and court precedents\n * [huggingface-datasets_wrime](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_wrime) - WRIME for huggingface datasets\n * [ndl-minhon-ocrdataset](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndl-minhon-ocrdataset) - NDL OCR training dataset for classical Japanese books (processed Minna de Honkoku data)\n * [PAX_SAPIENTICA](https:\u002F\u002Fgithub.com\u002FAsPJT\u002FPAX_SAPIENTICA) - GIS & Archaeological Simulator. 2023 in development.\n * [j-liwc2015](https:\u002F\u002Fgithub.com\u002Ftasukuigarashi\u002Fj-liwc2015) - Japanese version of LIWC2015\n * [huggingface-datasets_livedoor-news-corpus](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_livedoor-news-corpus) - Japanese Livedoor news corpus for huggingface datasets\n * [huggingface-datasets_JGLUE](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_JGLUE) - JGLUE: Japanese General Language Understanding Evaluation for huggingface datasets\n * [commonsense-moral-ja](https:\u002F\u002Fgithub.com\u002FLanguage-Media-Lab\u002Fcommonsense-moral-ja) - JCommonsenseMorality is a dataset created through crowdsourcing that reflects the commonsense morality of Japanese annotators.\n * [comet-atomic-ja](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002Fcomet-atomic-ja) - COMET-ATOMIC ja\n * [dcsg-ja](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002Fdcsg-ja) - Dialogue Commonsense Graph in Japanese\n * [japanese-toxic-dataset](https:\u002F\u002Fgithub.com\u002Finspection-ai\u002Fjapanese-toxic-dataset) - \"Proposal and Evaluation of Japanese Toxicity Schema\" provides a schema and dataset for toxicity in the Japanese 
language.\n * [camera](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002Fcamera) - CAMERA (CyberAgent Multimodal Evaluation for Ad Text GeneRAtion) is the Japanese ad text generation dataset.\n * [Japanese-Fakenews-Dataset](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FJapanese-Fakenews-Dataset) - Japanese fake news dataset\n * [jpn_explainable_qa_dataset](https:\u002F\u002Fgithub.com\u002Faiishii\u002Fjpn_explainable_qa_dataset) - jpn_explainable_qa_dataset\n * [copa-japanese](https:\u002F\u002Fgithub.com\u002Fnlp-titech\u002Fcopa-japanese) - COPA Dataset in Japanese\n * [WLSP-familiarity](https:\u002F\u002Fgithub.com\u002Fmasayu-a\u002FWLSP-familiarity) - Word Familiarity Rate for 'Word List by Semantic Principles (WLSP)'\n * [ProSub](https:\u002F\u002Fgithub.com\u002Fmatbahasa\u002FProSub) - A cross-linguistic study of pronoun substitutes and address terms\n * [commonsense-moral-ja](https:\u002F\u002Fgithub.com\u002FLanguage-Media-Lab\u002Fcommonsense-moral-ja) - JCommonsenseMorality is a dataset created through crowdsourcing that reflects the commonsense morality of Japanese annotators.\n * [ramendb](https:\u002F\u002Fgithub.com\u002Fnuko-yokohama\u002Framendb) - Scraping tool for, and data collected from, the Nantoka Database ( https:\u002F\u002Fsupleks.jp\u002F )\n * [huggingface-datasets_CAMERA](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_CAMERA) - CAMERA (CyberAgent Multimodal Evaluation for Ad Text GeneRAtion) for huggingface datasets\n * [FactCheckSentenceNLI-FCSNLI-](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002FFactCheckSentenceNLI-FCSNLI-) - FactCheckSentenceNLI dataset\n * [databricks-dolly-15k-ja](https:\u002F\u002Fgithub.com\u002Fkunishou\u002Fdatabricks-dolly-15k-ja) - A Japanese translation of databricks-dolly-15k.jsonl, which was used as training data for databricks\u002Fdolly-v2-12b.\n * [EaST-MELD](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FEaST-MELD) - EaST-MELD is an English-Japanese dataset for emotion-aware speech translation based on MELD.\n * 
[meconaudio](https:\u002F\u002Fgithub.com\u002Felith-co-jp\u002Fmeconaudio) - Mecon Audio (Medical Conference Audio) is a dataset of read-aloud minutes from the Advanced Medical Care Conference (先進医療会議) hosted by the Ministry of Health, Labour and Welfare.\n * [japanese-addresses](https:\u002F\u002Fgithub.com\u002Fgeolonia\u002Fjapanese-addresses) - Open data of address records at the town and block (町丁目) level for all of Japan (277,191 entries)\n * [aozorasearch](https:\u002F\u002Fgithub.com\u002Fmyokoym\u002Faozorasearch) - A full-text search system for Aozora Bunko built on Groonga: a search library and web application.\n * [llm-jp-corpus](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-corpus) - This repository contains scripts to reproduce the LLM-jp corpus.\n * [alpaca_ja](https:\u002F\u002Fgithub.com\u002Fshi3z\u002Falpaca_ja) - A Japanese version of the Alpaca dataset\n * [instruction_ja](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Finstruction_ja) - Japanese instruction data\n * [japanese-family-names](https:\u002F\u002Fgithub.com\u002Fsiikamiika\u002Fjapanese-family-names) - Top 5000 Japanese family names, with readings, ordered by frequency.\n * [kanji-data-media](https:\u002F\u002Fgithub.com\u002Fkanjialive\u002Fkanji-data-media) - Japanese language data on kanji, radicals, media files, fonts and related resources from Kanji alive\n * [reazonspeech](https:\u002F\u002Fgithub.com\u002Freazon-research\u002Freazonspeech) - Construct large-scale Japanese audio corpus at home\n * [huriganacorpus-aozora](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhuriganacorpus-aozora) - Furigana dataset created from Aozora Bunko and Sapie braille data\n * [koniwa](https:\u002F\u002Fgithub.com\u002Fkoniwa\u002Fkoniwa) - An open collection of annotated voices in Japanese language\n * [JMMLU](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002FJMMLU) - Japanese Massive Multitask Language Understanding Benchmark\n * [hurigana-speech-corpus-aozora](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhurigana-speech-corpus-aozora) - Dataset of the Aozora Bunko speech corpus annotated with furigana\n * [jqara](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Fjqara) - JQaRA: Japanese Question Answering with Retrieval 
Augmentation - a Japanese Q&A dataset for evaluating retrieval augmentation (RAG)\n * [jemhopqa](https:\u002F\u002Fgithub.com\u002Faiishii\u002Fjemhopqa) - JEMHopQA (Japanese Explainable Multi-hop Question Answering) is a Japanese multi-hop QA dataset that can evaluate internal reasoning.\n * [jacred](https:\u002F\u002Fgithub.com\u002Fyoumima\u002Fjacred) - Repository for Japanese Document-level Relation Extraction Dataset (plan to be released in March).\n * [jades](https:\u002F\u002Fgithub.com\u002Fnaist-nlp\u002Fjades) - JADES is a dataset for text simplification in Japanese, described in \"JADES: New Text Simplification Dataset in Japanese Targeted at Non-Native Speakers\" (the paper will be available soon).\n * [do-not-answer-ja](https:\u002F\u002Fgithub.com\u002Fkunishou\u002Fdo-not-answer-ja) - A dataset made by machine-translating Do-Not-Answer, the safety evaluation dataset released by the University of Melbourne in August 2023, into Japanese so that it can also be used to evaluate Japanese LLMs, then revising it to account for Japanese culture.\n * [oasst1-89k-ja](https:\u002F\u002Fgithub.com\u002Fkunishou\u002Foasst1-89k-ja) - A Japanese translation of OASST1, OpenAssistant's open-source dataset.\n * [jacwir](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Fjacwir) - JaCWIR: Japanese Casual Web IR - a small, casual dataset of web titles and summaries for evaluating Japanese information retrieval\n * [japanese-technical-dict](https:\u002F\u002Fgithub.com\u002Flaoshubaby\u002Fjapanese-technical-dict) - A table pairing katakana terms commonly used in the science and technology industry with their source words, for learners of Japanese\n * [j-unimorph](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fj-unimorph) - Dataset of UniMorph in Japanese\n * [GazeVQA](https:\u002F\u002Fgithub.com\u002Friken-grp\u002FGazeVQA) - Dataset for the LREC-COLING 2024 paper \"A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions\"\n * [J-CRe3](https:\u002F\u002Fgithub.com\u002Friken-grp\u002FJ-CRe3) - Code for J-CRe3 experiments (Ueda et al., LREC-COLING, 2024)\n * [jmed-llm](https:\u002F\u002Fgithub.com\u002Fsociocom\u002Fjmed-llm) - JMED-LLM: Japanese Medical Evaluation Dataset for Large Language Models\n * [lawtext](https:\u002F\u002Fgithub.com\u002Fyamachig\u002Flawtext) - Plain text 
format for Japanese law\n * [pdmocrdataset-part2](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fpdmocrdataset-part2) - OCR training dataset created in the OCR processing program research and development project\n * [japanesetopicwsd](https:\u002F\u002Fgithub.com\u002Fnut-jnlp\u002Fjapanesetopicwsd) - Evaluation set for topic-based word sense disambiguation\n * [temporalNLI_dataset](https:\u002F\u002Fgithub.com\u002Ftomo-vv\u002FtemporalNLI_dataset) - Jamp: Controlled Japanese Temporal Inference Dataset for Evaluating Generalization Capacity of Language Models\n * [JSeM](https:\u002F\u002Fgithub.com\u002FDaisukeBekki\u002FJSeM) - Japanese semantic test suite (FraCaS counterpart and extensions)\n * [niilc-qa](https:\u002F\u002Fgithub.com\u002Fmynlp\u002Fniilc-qa) - NIILC QA data\n * [chain-of-thought-ja-dataset](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002Fchain-of-thought-ja-dataset) - Dataset of paper \"Verification of Chain-of-Thought Prompting in Japanese\"\n * [WikipediaAnnotatedCorpus](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FWikipediaAnnotatedCorpus) - This is a Japanese text corpus that consists of Wikipedia articles with various linguistic annotations.\n * [elaws-history](https:\u002F\u002Fgithub.com\u002Fkissge\u002Felaws-history) - Periodically downloads and archives the complete law data set distributed by e-Gov Law Search\n * [Japanese-RP-Bench](https:\u002F\u002Fgithub.com\u002FAratako\u002FJapanese-RP-Bench) - Japanese-RP-Bench is a benchmark for measuring the Japanese role-play ability of LLMs.\n * [hdic](https:\u002F\u002Fgithub.com\u002Fshikeda\u002Fhdic) - HDIC : Integrated Database of Hanzi Dictionaries in Early Japan\n * [awesome-japan-opendata](https:\u002F\u002Fgithub.com\u002Fjapan-opendata\u002Fawesome-japan-opendata) - Awesome Japan Open Data - a curated list of open data information in Japan\n * [kanji-data](https:\u002F\u002Fgithub.com\u002Fmimneko\u002Fkanji-data) - Data about kanji, including the Jōyō kanji list\n * [openchj-genji](https:\u002F\u002Fgithub.com\u002Ftogiso\u002Fopenchj-genji) - Morphological information data for The Tale of Genji\n * [AdParaphrase](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002FAdParaphrase) - This repository contains data for our paper 
\"AdParaphrase: Paraphrase Dataset for Analyzing Linguistic Features toward Generating Attractive Ad Texts\".\n * [Jamp_sp](https:\u002F\u002Fgithub.com\u002Fynklab\u002FJamp_sp) - Jamp_sp: Controlled Japanese Temporal Inference Dataset Considering Aspect\n * [jnli-neg](https:\u002F\u002Fgithub.com\u002Fasahi-y\u002Fjnli-neg) - Public repository for JNLI-Neg, a Japanese natural language inference dataset for evaluating the understanding of negation.\n * [swallow-corpus](https:\u002F\u002Fgithub.com\u002Fswallow-llm\u002Fswallow-corpus) - This repository provides Python implementation for building Swallow Corpus Version 1, a large Japanese web corpus (Okazaki et al., 2024), from Common Crawl archives.\n * [jalecon](https:\u002F\u002Fgithub.com\u002Fnaist-nlp\u002Fjalecon) - A Dataset of Japanese Lexical Complexity for Non-Native Readers\n * [multils-japanese](https:\u002F\u002Fgithub.com\u002Fnaist-nlp\u002Fmultils-japanese) - MultiLS-Japanese Lexical Complexity Prediction and Lexical Simplification Dataset for Japanese: annotator profiles, unaggregated annotation, and annotation guidelines.\n * [nwjc](https:\u002F\u002Fgithub.com\u002Fmasayu-a\u002Fnwjc) - NINJAL Web Japanese Corpus\n * [open-mantra-dataset](https:\u002F\u002Fgithub.com\u002Fmantra-inc\u002Fopen-mantra-dataset) - Dataset introduced in the paper \"Towards Fully Automated Manga Translation\" presented in AAAI21\n * [public-annotations](https:\u002F\u002Fgithub.com\u002Fmanga109\u002Fpublic-annotations) - Various annotations of Manga109 dataset\n * [gimei](https:\u002F\u002Fgithub.com\u002Fwillnet\u002Fgimei) - random Japanese name and address generator\n * [safety-boundary-test](https:\u002F\u002Fgithub.com\u002Fsbintuitions\u002Fsafety-boundary-test) - Test set for evaluating the safety behavior of Japanese language models\n * [j-ono-data](https:\u002F\u002Fgithub.com\u002FObakeConstructs\u002Fj-ono-data) - A simple, open-source collection of Japanese onomatopoeic and mimetic sound words in JSON format. 
With manga samples.\n * [kanji](https:\u002F\u002Fgithub.com\u002Fsylhare\u002Fkanji) - List of Japanese kanji radicals to learn\n * [jethics](https:\u002F\u002Fgithub.com\u002Flanguage-media-lab\u002Fjethics) - Overview page for JETHICS, a dataset for evaluating understanding of Japanese morality (to be updated)\n * [waon](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fwaon) - WAON: Large-Scale and High-Quality Japanese Image-Text Dataset for Vision-Language Models\n * [kuci](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fkuci) - Kyoto University Commonsense Inference dataset (KUCI)\n * [japanese-address-testdata](https:\u002F\u002Fgithub.com\u002Ft-sagara\u002Fjapanese-address-testdata) - Test dataset of Japanese addresses that are hard to parse\n * [jlpt-word-list](https:\u002F\u002Fgithub.com\u002Felzup\u002Fjlpt-word-list) - Japanese word list from JLPT vocabulary\n * [hiragana_mojigazo](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhiragana_mojigazo) - Character image dataset (73-character hiragana version)\n * [lawqa_jp](https:\u002F\u002Fgithub.com\u002Fdigital-go-jp\u002Flawqa_jp) - Multiple-choice QA dataset on Japanese laws and regulations\n * [yjcaptions](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002Fyjcaptions) - YJ Captions 26k Dataset\n * [ja-vg-vqa](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002Fja-vg-vqa) - Japanese Visual Genome VQA dataset\n * [lawhub](https:\u002F\u002Fgithub.com\u002Flwhb\u002Flawhub) - Repository to track Japanese Law in text format\n * [japanese-subtitles-word-kanji-frequency-lists](https:\u002F\u002Fgithub.com\u002Fchriskempson\u002Fjapanese-subtitles-word-kanji-frequency-lists) - A word frequency list derived from subtitles from Japanese drama, anime and films.\n * [jconj](https:\u002F\u002Fgithub.com\u002Fyamagoya\u002Fjconj) - A table-based Japanese word conjugator\n * [extract_jawp_names](https:\u002F\u002Fgithub.com\u002Fhiroshi-manabe\u002Fextract_jawp_names) - Extracts personal names from Japanese Wikipedia.\n * [cejc_yomichan_freq_dict](https:\u002F\u002Fgithub.com\u002Fforsakeninfinity\u002Fcejc_yomichan_freq_dict) - Frequency dictionary for yomichan based on the Corpus of 
Everyday Japanese Conversation dataset\n * [wikidict-ja](https:\u002F\u002Fgithub.com\u002Fopen-dict-data\u002Fwikidict-ja) - Wikipedia Bilingual Reference Data (Japanese)\n * [ajimee-bench](https:\u002F\u002Fgithub.com\u002Fazookey\u002Fajimee-bench) - AJIMEE-Bench (Advanced Japanese IME Evaluation Benchmark)\n * [j-spaw](https:\u002F\u002Fgithub.com\u002Ftakamichi-lab\u002Fj-spaw) - J-SpAW: Japanese speech corpus for speaker verification and anti-spoofing\n * [camera3](https:\u002F\u002Fgithub.com\u002Fcyberagentailab\u002Fcamera3) - CAMERA3: An Evaluation Dataset for Controllable Ad Text Generation in Japanese\n * [jgpqa](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fjgpqa) - Japanese translation of the GPQA dataset\n * [tanaka-corpus-plus](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Ftanaka-corpus-plus) - The Tanaka Corpus with noise removed.\n * [emotioncorpusjapanesetokushimaa2lab](https:\u002F\u002Fgithub.com\u002Fkmatsu-tokudai\u002Femotioncorpusjapanesetokushimaa2lab) - Japanese emotion corpus Tokushima Univ. 
A-2 Lab.\n * [osworld-jp](https:\u002F\u002Fgithub.com\u002Fkarakuri-ai\u002Fosworld-jp) - Japanese computer-use benchmark for language-aware evaluation\n * [quasi_japanese_reviews](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fquasi_japanese_reviews) - Quasi Japanese Reviews (pseudo review data)\n * [psychiatry-clinical-notes](https:\u002F\u002Fgithub.com\u002Fsociocom\u002Fpsychiatry-clinical-notes) - Dataset from a questionnaire on writing psychiatric first-visit clinical notes\n * [merged-town-names](https:\u002F\u002Fgithub.com\u002Fyuukitoriyama\u002Fmerged-town-names) - Table mapping former place names that disappeared through municipal mergers and similar changes to their new names\n * [japanesetextemoticondata](https:\u002F\u002Fgithub.com\u002Fkuroshiba-ginji\u002Fjapanesetextemoticondata) - Japanese text-emoticon data.\n * [mishearing-corpus](https:\u002F\u002Fgithub.com\u002Fkishiyamat\u002Fmishearing-corpus) - Mishearing corpus: a Japanese dataset of about 10,000 entries managed with CSV and Table Schema and validated automatically with VS Code, pre-commit, Frictionless, and GitHub Actions\n * [kotowaza](https:\u002F\u002Fgithub.com\u002Fseptn\u002Fkotowaza) - Structured JSON dataset of Japanese proverbs (kotowaza) with meanings in Indonesian & English, examples, JLPT levels, and tags.\n * [selective-rag-kasensabo](https:\u002F\u002Fgithub.com\u002Ftk-yasuno\u002Fselective-rag-kasensabo) - An MVP of a practical agentic RAG system that automatically classifies the specialization granularity (fine\u002Fcoarse) of questions about construction technical standards with 96% accuracy and selects the best-suited RAG system (ColBERT\u002FNaive). Four RAG systems were built on the technical standards for rivers, erosion control (sabo), and dams published in November 2025, and their accuracy and speed were compared on 200 questions of differing granularity.\n * [jmle2026-bench](https:\u002F\u002Fgithub.com\u002Fnaoto-iwase\u002Fjmle2026-bench) - LLM benchmark on the 120th Japanese Medical Licensing Examination (Feb 7-8, 2026)\n * [JSTS-Neg](https:\u002F\u002Fgithub.com\u002Freiko-y\u002FJSTS-Neg) - Public repository for JSTS-Neg, a Japanese semantic textual similarity dataset for evaluating the understanding of negation. JSTS-Neg was created by extending the JSTS dataset included in JGLUE.\n * [business-slide-questions](https:\u002F\u002Fgithub.com\u002Fstockmarkteam\u002Fbusiness-slide-questions) - This repository provides BusinessSlideVQA, a Visual Question Answering (VQA) benchmark for business documents (slides).\n * [WLSP-antonym](https:\u002F\u002Fgithub.com\u002Fmasayu-a\u002FWLSP-antonym) - 
Antonym relations for 'Word List by Semantic Principles (WLSP)'\n * [YouCook2-JP](https:\u002F\u002Fgithub.com\u002Fnlab-mpg\u002FYouCook2-JP) - Japanese translation of the YouCook2 dataset.\n * [E2U](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FE2U) - Data related to tsutawaru-ka (つたわる化, making information easier to get across)\n * [annotation-2025](https:\u002F\u002Fgithub.com\u002FTiny-Colony\u002Fannotation-2025) - This repository publishes data for comparing human and LLM-generated interpretations of text.\n * [jhpt](https:\u002F\u002Fgithub.com\u002Fnict-astrec-att\u002Fjhpt) - A parallel dataset that aligns, segment by segment, the original text of historical Japanese materials with modern Japanese translations (reference translations). See the paper for details.\n * [JBE-QA](https:\u002F\u002Fgithub.com\u002Fhancules\u002FJBE-QA) - Japanese Bar Exam QA\n * [JMedWiC](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FJMedWiC) - A Japanese medical-domain dataset for judging whether word senses are identical, built by automatically extracting pseudo synonymous and non-synonymous pairs with a masked language model and assigning labels through manual synonymy annotation.\n * [Doppelganger-JC](https:\u002F\u002Fgithub.com\u002F0017-alt\u002FDoppelganger-JC) - This is a dataset benchmarking the misuse of cross-lingual homographs between Chinese and Japanese in LLMs.\n * [modelvista-3lang](https:\u002F\u002Fgithub.com\u002Fkuramitsulab\u002Fmodelvista-3lang) - VLM evaluation benchmark for understanding software diagrams (Japanese, English, and Korean)\n * [japanese-hr-niah](https:\u002F\u002Fgithub.com\u002Fkufu\u002Fjapanese-hr-niah) - Benchmark for evaluating long-context LLM performance in the Japanese HR and labor domain\n * [nijl-manyoshutei](https:\u002F\u002Fgithub.com\u002Fkokubunken\u002Fnijl-manyoshutei) - This repository publishes TEI\u002FXML data and related files for the Hirose-bon Man'yōshū held by Kansai University, under the CC-BY license.\n * [kamuskita](https:\u002F\u002Fgithub.com\u002Fmatbahasa\u002Fkamuskita) - Minna no Malay-go Jiten (みんなのマレー語辞典), an open Malay-Japanese dictionary created by a Malay study group\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [jrte-corpus](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fjrte-corpus) | - | - | ⭐ 77 | 🔴 june 2023|\n| 🔗 
[kanji-data](https:\u002F\u002Fgithub.com\u002Fdavidluzgouveia\u002Fkanji-data) | - | - | ⭐ 215 | 🟢 february|\n| 🔗 [JapaneseWordSimilarityDataset](https:\u002F\u002Fgithub.com\u002Ftmu-nlp\u002FJapaneseWordSimilarityDataset) | - | - | ⭐ 102 | 🔴 december 2021|\n| 🔗 [simple-jppdb](https:\u002F\u002Fgithub.com\u002Ftmu-nlp\u002Fsimple-jppdb) | - | - | ⭐ 32 | 🔴 march 2017|\n| 🔗 [chABSA-dataset](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002FchABSA-dataset) | - | - | ⭐ 140 | 🔴 september 2018|\n| 🔗 [JaQuAD](https:\u002F\u002Fgithub.com\u002FSkelterLabsInc\u002FJaQuAD) | - | - | ⭐ 110 | 🔴 january 2022|\n| 🔗 [JaNLI](https:\u002F\u002Fgithub.com\u002Fverypluming\u002FJaNLI) | - | - | ⭐ 17 | 🔴 may 2023|\n| 🔗 [ebe-dataset](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Febe-dataset) | - | - | ⭐ 18 | 🔴 december 2020|\n| 🔗 [emoji-ja](https:\u002F\u002Fgithub.com\u002Fyagays\u002Femoji-ja) | - | - | ⭐ 83 | 🔴 march 2025|\n| 🔗 [nayose-wikipedia-ja](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fnayose-wikipedia-ja) | - | - | ⭐ 35 | 🔴 march 2020|\n| 🔗 [ja.text8](https:\u002F\u002Fgithub.com\u002FHironsan\u002Fja.text8) | - | - | ⭐ invalid | 🔴 october 2017|\n| 🔗 [ThreeLineSummaryDataset](https:\u002F\u002Fgithub.com\u002FKodairaTomonori\u002FThreeLineSummaryDataset) | - | - | ⭐ 31 | 🔴 april 2018|\n| 🔗 [japanese](https:\u002F\u002Fgithub.com\u002Fhingston\u002Fjapanese) | - | - | ⭐ 87 | 🔴 september 2018|\n| 🔗 [kanji-frequency](https:\u002F\u002Fgithub.com\u002Fscriptin\u002Fkanji-frequency) | - | - | ⭐ 156 | 🟢 march|\n| 🔗 [TEDxJP-10K](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FTEDxJP-10K) | - | - | ⭐ 24 | 🔴 january 2021|\n| 🔗 [CoARiJ](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002FCoARiJ) | - | - | ⭐ 94 | 🔴 december 2020|\n| 🔗 [technological-book-corpus-ja](https:\u002F\u002Fgithub.com\u002Ftextlint-ja\u002Ftechnological-book-corpus-ja) | - | - | ⭐ 26 | 🔴 july 2023|\n| 🔗 [ita-corpus-chuwa](https:\u002F\u002Fgithub.com\u002Fshirayu\u002Fita-corpus-chuwa) | - | - | ⭐ 5 
| 🔴 august 2021|\n| 🔗 [wikipedia-utils](https:\u002F\u002Fgithub.com\u002Fsingletongue\u002Fwikipedia-utils) | - | - | ⭐ 78 | 🔴 april 2024|\n| 🔗 [inappropriate-words-ja](https:\u002F\u002Fgithub.com\u002FMosasoM\u002Finappropriate-words-ja) | - | - | ⭐ 202 | 🔴 december 2021|\n| 🔗 [house-of-councillors](https:\u002F\u002Fgithub.com\u002Fsmartnews-smri\u002Fhouse-of-councillors) | - | - | ⭐ 107 | 🟢 yesterday|\n| 🔗 [house-of-representatives](https:\u002F\u002Fgithub.com\u002Fsmartnews-smri\u002Fhouse-of-representatives) | - | - | ⭐ 178 | 🟢 yesterday|\n| 🔗 [STAIR-captions](https:\u002F\u002Fgithub.com\u002FSTAIR-Lab-CIT\u002FSTAIR-captions) | - | - | ⭐ 90 | 🔴 july 2018|\n| 🔗 [Winograd-Schema-Challenge-Ja](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FWinograd-Schema-Challenge-Ja) | - | - | ⭐ 6 | 🔴 january 2019|\n| 🔗 [speechBSD](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FspeechBSD) | - | - | ⭐ 3 | 🔴 february 2024|\n| 🔗 [ita-corpus](https:\u002F\u002Fgithub.com\u002Fmmorise\u002Fita-corpus) | - | - | ⭐ 229 | 🟢 march|\n| 🔗 [rohan4600](https:\u002F\u002Fgithub.com\u002Fmmorise\u002Frohan4600) | - | - | ⭐ 70 | 🟢 march|\n| 🔗 [anlp-jp-history](https:\u002F\u002Fgithub.com\u002Fwhym\u002Fanlp-jp-history) | - | - | ⭐ 3 | 🔴 april 2024|\n| 🔗 [keigo_transfer_task](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fkeigo_transfer_task) | - | - | ⭐ 21 | 🔴 november 2022|\n| 🔗 [loanwords_gairaigo](https:\u002F\u002Fgithub.com\u002Fjamesohortle\u002Floanwords_gairaigo) | - | - | ⭐ 19 | 🔴 january 2021|\n| 🔗 [jawikicorpus](https:\u002F\u002Fgithub.com\u002Fwikiwikification\u002Fjawikicorpus) | - | - | ⭐ 4 | 🔴 november 2018|\n| 🔗 [GeneralPolicySpeechOfPrimeMinisterOfJapan](https:\u002F\u002Fgithub.com\u002Fyuukimiyo\u002FGeneralPolicySpeechOfPrimeMinisterOfJapan) | - | - | ⭐ 6 | 🔴 january 2020|\n| 🔗 [wrime](https:\u002F\u002Fgithub.com\u002Fids-cv\u002Fwrime) | - | - | ⭐ 174 | 🟡 september 2025|\n| 🔗 [jtubespeech](https:\u002F\u002Fgithub.com\u002Fsarulab-speech\u002Fjtubespeech) | - | - 
| ⭐ 229 | 🔴 march 2023|\n| 🔗 [WikipediaWordFrequencyList](https:\u002F\u002Fgithub.com\u002Fmaeda6uiui-backup\u002FWikipediaWordFrequencyList) | - | - | ⭐ 2 | 🔴 april 2022|\n| 🔗 [kokkosho_data](https:\u002F\u002Fgithub.com\u002Frindybell\u002Fkokkosho_data) | - | - | ⭐ 1 | 🔴 july 2019|\n| 🔗 [pdmocrdataset-part1](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fpdmocrdataset-part1) | - | - | ⭐ 83 | 🔴 june 2024|\n| 🔗 [huriganacorpus-ndlbib](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhuriganacorpus-ndlbib) | - | - | ⭐ 31 | 🔴 september 2021|\n| 🔗 [jvs_hiho](https:\u002F\u002Fgithub.com\u002FHiroshiba\u002Fjvs_hiho) | - | - | ⭐ 31 | 🔴 february 2021|\n| 🔗 [hirakanadic](https:\u002F\u002Fgithub.com\u002Fpo3rin\u002Fhirakanadic) | 📥 28 | 📦 14k | ⭐ 7 | 🔴 july 2023|\n| 🔗 [animedb](https:\u002F\u002Fgithub.com\u002Fanilogia\u002Fanimedb) | - | - | ⭐ 330 | 🔴 january 2023|\n| 🔗 [security_words](https:\u002F\u002Fgithub.com\u002FSaitoLab\u002Fsecurity_words) | - | - | ⭐ 27 | 🔴 august 2023|\n| 🔗 [Data-on-Japanese-Diet-Members](https:\u002F\u002Fgithub.com\u002Fsugi2000\u002FData-on-Japanese-Diet-Members) | - | - | ⭐ 3 | 🔴 september 2022|\n| 🔗 [honkoku-data](https:\u002F\u002Fgithub.com\u002Fyuta1984\u002Fhonkoku-data) | - | - | ⭐ 18 | 🟢 march|\n| 🔗 [wikihow_japanese](https:\u002F\u002Fgithub.com\u002FKatsumata420\u002Fwikihow_japanese) | - | - | ⭐ 35 | 🔴 december 2020|\n| 🔗 [engineer-vocabulary-list](https:\u002F\u002Fgithub.com\u002Fmercari\u002Fengineer-vocabulary-list) | - | - | ⭐ 1.9k | 🔴 november 2020|\n| 🔗 [JSICK](https:\u002F\u002Fgithub.com\u002Fverypluming\u002FJSICK) | - | - | ⭐ 45 | 🔴 may 2023|\n| 🔗 [phishurl-list](https:\u002F\u002Fgithub.com\u002FJPCERTCC\u002Fphishurl-list) | - | - | ⭐ 205 | 🟢 march|\n| 🔗 [jcms](https:\u002F\u002Fgithub.com\u002Fshigashiyama\u002Fjcms) | - | - | ⭐ 9 | 🟢 last friday|\n| 🔗 [aozorabunko_text](https:\u002F\u002Fgithub.com\u002Faozorahack\u002Faozorabunko_text) | - | - | ⭐ 91 | 🔴 march 2023|\n| 🔗 
[friendly_JA-Corpus](https:\u002F\u002Fgithub.com\u002Fastremo\u002Ffriendly_JA-Corpus) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [topokanji](https:\u002F\u002Fgithub.com\u002Fscriptin\u002Ftopokanji) | - | - | ⭐ 200 | 🔴 january 2016|\n| 🔗 [isbn4groups](https:\u002F\u002Fgithub.com\u002Furibo\u002Fisbn4groups) | - | - | ⭐ 1 | 🔴 june 2024|\n| 🔗 [NMeCab](https:\u002F\u002Fgithub.com\u002Fkomutan\u002FNMeCab) | - | - | ⭐ 99 | 🔴 march 2024|\n| 🔗 [ndlngramdata](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndlngramdata) | - | - | ⭐ 15 | 🔴 january 2023|\n| 🔗 [ndlngramviewer_v2](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndlngramviewer_v2) | - | - | ⭐ 3 | 🔴 july 2023|\n| 🔗 [data_set](https:\u002F\u002Fgithub.com\u002Fjapanese-law-analysis\u002Fdata_set) | - | - | ⭐ 51 | 🔴 january 2025|\n| 🔗 [huggingface-datasets_wrime](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_wrime) | - | - | ⭐ 4 | 🔴 january 2023|\n| 🔗 [ndl-minhon-ocrdataset](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndl-minhon-ocrdataset) | - | - | ⭐ 20 | 🟢 march|\n| 🔗 [PAX_SAPIENTICA](https:\u002F\u002Fgithub.com\u002FAsPJT\u002FPAX_SAPIENTICA) | - | - | ⭐ 181 | 🟡 december 2025|\n| 🔗 [j-liwc2015](https:\u002F\u002Fgithub.com\u002Ftasukuigarashi\u002Fj-liwc2015) | - | - | ⭐ 13 | 🔴 november 2024|\n| 🔗 [huggingface-datasets_livedoor-news-corpus](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_livedoor-news-corpus) | - | - | ⭐ 2 | 🔴 october 2023|\n| 🔗 [huggingface-datasets_JGLUE](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_JGLUE) | - | - | ⭐ 12 | 🔴 march 2025|\n| 🔗 [commonsense-moral-ja](https:\u002F\u002Fgithub.com\u002FLanguage-Media-Lab\u002Fcommonsense-moral-ja) | - | - | ⭐ 15 | 🟡 november 2025|\n| 🔗 [comet-atomic-ja](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002Fcomet-atomic-ja) | - | - | ⭐ 31 | 🔴 march 2024|\n| 🔗 [dcsg-ja](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002Fdcsg-ja) | - | - | ⭐ 6 | 🔴 march 2023|\n| 🔗 
[japanese-toxic-dataset](https:\u002F\u002Fgithub.com\u002Finspection-ai\u002Fjapanese-toxic-dataset) | - | - | ⭐ 21 | 🔴 january 2023|\n| 🔗 [camera](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002Fcamera) | - | - | ⭐ 26 | 🔴 august 2024|\n| 🔗 [Japanese-Fakenews-Dataset](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FJapanese-Fakenews-Dataset) | - | - | ⭐ 20 | 🔴 may 2021|\n| 🔗 [jpn_explainable_qa_dataset](https:\u002F\u002Fgithub.com\u002Faiishii\u002Fjpn_explainable_qa_dataset) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [copa-japanese](https:\u002F\u002Fgithub.com\u002Fnlp-titech\u002Fcopa-japanese) | - | - | ⭐ 1 | 🔴 february 2023|\n| 🔗 [WLSP-familiarity](https:\u002F\u002Fgithub.com\u002Fmasayu-a\u002FWLSP-familiarity) | - | - | ⭐ 12 | 🔴 january 2025|\n| 🔗 [ProSub](https:\u002F\u002Fgithub.com\u002Fmatbahasa\u002FProSub) | - | - | ⭐ 5 | 🟡 april 2025|\n| 🔗 [ramendb](https:\u002F\u002Fgithub.com\u002Fnuko-yokohama\u002Framendb) | - | - | ⭐ 7 | 🟢 last friday|\n| 🔗 [huggingface-datasets_CAMERA](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_CAMERA) | - | - | ⭐ 3 | 🔴 march 2023|\n| 🔗 [FactCheckSentenceNLI-FCSNLI-](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002FFactCheckSentenceNLI-FCSNLI-) | - | - | ⭐ 0 | 🔴 march 2021|\n| 🔗 [databricks-dolly-15k-ja](https:\u002F\u002Fgithub.com\u002Fkunishou\u002Fdatabricks-dolly-15k-ja) | - | - | ⭐ 89 | 🔴 july 2023|\n| 🔗 [EaST-MELD](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FEaST-MELD) | - | - | ⭐ 0 | 🔴 june 2023|\n| 🔗 [meconaudio](https:\u002F\u002Fgithub.com\u002Felith-co-jp\u002Fmeconaudio) | - | - | ⭐ 10 | 🔴 october 2023|\n| 🔗 [japanese-addresses](https:\u002F\u002Fgithub.com\u002Fgeolonia\u002Fjapanese-addresses) | - | - | ⭐ 761 | 🟡 december 2025|\n| 🔗 [aozorasearch](https:\u002F\u002Fgithub.com\u002Fmyokoym\u002Faozorasearch) | - | - | ⭐ 22 | 🟢 
march|\n| 🔗 [llm-jp-corpus](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-corpus) | - | - | ⭐ 44 | 🔴 october 2023|\n| 🔗 [alpaca_ja](https:\u002F\u002Fgithub.com\u002Fshi3z\u002Falpaca_ja) | - | - | ⭐ 86 | 🔴 may 2023|\n| 🔗 [instruction_ja](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Finstruction_ja) | - | - | ⭐ 24 | 🔴 july 2023|\n| 🔗 [japanese-family-names](https:\u002F\u002Fgithub.com\u002Fsiikamiika\u002Fjapanese-family-names) | - | - | ⭐ 18 | 🔴 june 2017|\n| 🔗 [kanji-data-media](https:\u002F\u002Fgithub.com\u002Fkanjialive\u002Fkanji-data-media) | - | - | ⭐ 409 | 🔴 november 2023|\n| 🔗 [reazonspeech](https:\u002F\u002Fgithub.com\u002Freazon-research\u002Freazonspeech) | - | - | ⭐ 380 | 🟢 january|\n| 🔗 [huriganacorpus-aozora](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhuriganacorpus-aozora) | - | - | ⭐ 22 | 🔴 january 2024|\n| 🔗 [koniwa](https:\u002F\u002Fgithub.com\u002Fkoniwa\u002Fkoniwa) | - | - | ⭐ 60 | 🟡 april 2025|\n| 🔗 [JMMLU](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002FJMMLU) | - | - | ⭐ 38 | 🟡 october 2025|\n| 🔗 [hurigana-speech-corpus-aozora](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhurigana-speech-corpus-aozora) | - | - | ⭐ 48 | 🔴 march 2025|\n| 🔗 [jqara](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Fjqara) | - | - | ⭐ 43 | 🟡 september 2025|\n| 🔗 [jemhopqa](https:\u002F\u002Fgithub.com\u002Faiishii\u002Fjemhopqa) | - | - | ⭐ 30 | 🟡 april 2025|\n| 🔗 [jacred](https:\u002F\u002Fgithub.com\u002Fyoumima\u002Fjacred) | - | - | ⭐ 8 | 🔴 march 2024|\n| 🔗 [jades](https:\u002F\u002Fgithub.com\u002Fnaist-nlp\u002Fjades) | - | - | ⭐ 0 | 🔴 december 2022|\n| 🔗 [do-not-answer-ja](https:\u002F\u002Fgithub.com\u002Fkunishou\u002Fdo-not-answer-ja) | - | - | ⭐ 24 | 🔴 december 2023|\n| 🔗 [oasst1-89k-ja](https:\u002F\u002Fgithub.com\u002Fkunishou\u002Foasst1-89k-ja) | - | - | ⭐ 16 | 🔴 november 2023|\n| 🔗 [jacwir](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Fjacwir) | - | - | ⭐ 8 | 🟡 september 2025|\n| 🔗 
[japanese-technical-dict](https:\u002F\u002Fgithub.com\u002Flaoshubaby\u002Fjapanese-technical-dict) | - | - | ⭐ 3 | 🔴 november 2024|\n| 🔗 [j-unimorph](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fj-unimorph) | - | - | ⭐ 9 | 🟢 january|\n| 🔗 [GazeVQA](https:\u002F\u002Fgithub.com\u002Friken-grp\u002FGazeVQA) | - | - | ⭐ 0 | 🔴 september 2024|\n| 🔗 [J-CRe3](https:\u002F\u002Fgithub.com\u002Friken-grp\u002FJ-CRe3) | - | - | ⭐ 10 | 🔴 january 2025|\n| 🔗 [jmed-llm](https:\u002F\u002Fgithub.com\u002Fsociocom\u002Fjmed-llm) | - | - | ⭐ 56 | 🔴 september 2024|\n| 🔗 [lawtext](https:\u002F\u002Fgithub.com\u002Fyamachig\u002Flawtext) | - | - | ⭐ 94 | 🟢 january|\n| 🔗 [pdmocrdataset-part2](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fpdmocrdataset-part2) | - | - | ⭐ 15 | 🔴 june 2024|\n| 🔗 [japanesetopicwsd](https:\u002F\u002Fgithub.com\u002Fnut-jnlp\u002Fjapanesetopicwsd) | - | - | ⭐ 2 | 🔴 september 2018|\n| 🔗 [temporalNLI_dataset](https:\u002F\u002Fgithub.com\u002Ftomo-vv\u002FtemporalNLI_dataset) | - | - | ⭐ 1 | 🔴 july 2023|\n| 🔗 [JSeM](https:\u002F\u002Fgithub.com\u002FDaisukeBekki\u002FJSeM) | - | - | ⭐ 13 | 🔴 november 2024|\n| 🔗 [niilc-qa](https:\u002F\u002Fgithub.com\u002Fmynlp\u002Fniilc-qa) | - | - | ⭐ 18 | 🔴 november 2015|\n| 🔗 [chain-of-thought-ja-dataset](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002Fchain-of-thought-ja-dataset) | - | - | ⭐ 5 | 🔴 september 2023|\n| 🔗 [WikipediaAnnotatedCorpus](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FWikipediaAnnotatedCorpus) | - | - | ⭐ 29 | 🟢 february|\n| 🔗 [elaws-history](https:\u002F\u002Fgithub.com\u002Fkissge\u002Felaws-history) | - | - | ⭐ 5 | 🟢 yesterday|\n| 🔗 [Japanese-RP-Bench](https:\u002F\u002Fgithub.com\u002FAratako\u002FJapanese-RP-Bench) | - | - | ⭐ 18 | 🔴 september 2024|\n| 🔗 [hdic](https:\u002F\u002Fgithub.com\u002Fshikeda\u002Fhdic) | - | - | ⭐ 41 | 🟢 march|\n| 🔗 [awesome-japan-opendata](https:\u002F\u002Fgithub.com\u002Fjapan-opendata\u002Fawesome-japan-opendata) | - | - | ⭐ 159 | 🟢 march|\n| 🔗 
[kanji-data](https:\u002F\u002Fgithub.com\u002Fmimneko\u002Fkanji-data) | - | - | ⭐ 18 | 🟢 february|\n| 🔗 [openchj-genji](https:\u002F\u002Fgithub.com\u002Ftogiso\u002Fopenchj-genji) | - | - | ⭐ 2 | 🔴 march 2025|\n| 🔗 [AdParaphrase](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002FAdParaphrase) | - | - | ⭐ 1 | 🟡 may 2025|\n| 🔗 [Jamp_sp](https:\u002F\u002Fgithub.com\u002Fynklab\u002FJamp_sp) | - | - | ⭐ 0 | 🔴 june 2024|\n| 🔗 [jnli-neg](https:\u002F\u002Fgithub.com\u002Fasahi-y\u002Fjnli-neg) | - | - | ⭐ 0 | 🟡 december 2025|\n| 🔗 [swallow-corpus](https:\u002F\u002Fgithub.com\u002Fswallow-llm\u002Fswallow-corpus) | - | - | ⭐ 6 | 🔴 november 2024|\n| 🔗 [jalecon](https:\u002F\u002Fgithub.com\u002Fnaist-nlp\u002Fjalecon) | - | - | ⭐ 5 | 🔴 july 2023|\n| 🔗 [multils-japanese](https:\u002F\u002Fgithub.com\u002Fnaist-nlp\u002Fmultils-japanese) | - | - | ⭐ 0 | 🔴 invalid|\n| 🔗 [nwjc](https:\u002F\u002Fgithub.com\u002Fmasayu-a\u002Fnwjc) | - | - | ⭐ 10 | 🔴 april 2022|\n| 🔗 [open-mantra-dataset](https:\u002F\u002Fgithub.com\u002Fmantra-inc\u002Fopen-mantra-dataset) | - | - | ⭐ 199 | 🔴 march 2023|\n| 🔗 [gimei](https:\u002F\u002Fgithub.com\u002Fwillnet\u002Fgimei) | - | - | ⭐ 424 | 🟢 january|\n| 🔗 [safety-boundary-test](https:\u002F\u002Fgithub.com\u002Fsbintuitions\u002Fsafety-boundary-test) | - | - | ⭐ 9 | 🟡 july 2025|\n| 🔗 [j-ono-data](https:\u002F\u002Fgithub.com\u002FObakeConstructs\u002Fj-ono-data) | - | - | ⭐ 7 | 🟢 last thursday|\n| 🔗 [kanji](https:\u002F\u002Fgithub.com\u002Fsylhare\u002Fkanji) | - | - | ⭐ 28 | 🟢 last friday|\n| 🔗 [jethics](https:\u002F\u002Fgithub.com\u002Flanguage-media-lab\u002Fjethics) | - | - | ⭐ 2 | 🟡 june 2025|\n| 🔗 [waon](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fwaon) | - | - | ⭐ 6 | 🟡 november 2025|\n| 🔗 [kuci](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fkuci) | - | - | ⭐ 5 | 🔴 february 2024|\n| 🔗 [japanese-address-testdata](https:\u002F\u002Fgithub.com\u002Ft-sagara\u002Fjapanese-address-testdata) | - | - | ⭐ 14 | 🔴 september 2023|\n| 
🔗 [jlpt-word-list](https:\u002F\u002Fgithub.com\u002Felzup\u002Fjlpt-word-list) | - | - | ⭐ 66 | 🔴 february 2022|\n| 🔗 [hiragana_mojigazo](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhiragana_mojigazo) | - | - | ⭐ 18 | 🔴 april 2020|\n| 🔗 [lawqa_jp](https:\u002F\u002Fgithub.com\u002Fdigital-go-jp\u002Flawqa_jp) | - | - | ⭐ 267 | 🟢 february|\n| 🔗 [yjcaptions](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002Fyjcaptions) | - | - | ⭐ 60 | 🔴 november 2016|\n| 🔗 [ja-vg-vqa](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002Fja-vg-vqa) | - | - | ⭐ 30 | 🔴 november 2018|\n| 🔗 [lawhub](https:\u002F\u002Fgithub.com\u002Flwhb\u002Flawhub) | - | - | ⭐ 152 | 🔴 november 2020|\n| 🔗 [japanese-subtitles-word-kanji-frequency-lists](https:\u002F\u002Fgithub.com\u002Fchriskempson\u002Fjapanese-subtitles-word-kanji-frequency-lists) | - | - | ⭐ 40 | 🔴 december 2023|\n| 🔗 [jconj](https:\u002F\u002Fgithub.com\u002Fyamagoya\u002Fjconj) | - | - | ⭐ 35 | 🔴 may 2020|\n| 🔗 [extract_jawp_names](https:\u002F\u002Fgithub.com\u002Fhiroshi-manabe\u002Fextract_jawp_names) | - | - | ⭐ 21 | 🔴 december 2022|\n| 🔗 [cejc_yomichan_freq_dict](https:\u002F\u002Fgithub.com\u002Fforsakeninfinity\u002Fcejc_yomichan_freq_dict) | - | - | ⭐ 11 | 🔴 june 2023|\n| 🔗 [wikidict-ja](https:\u002F\u002Fgithub.com\u002Fopen-dict-data\u002Fwikidict-ja) | - | - | ⭐ 5 | 🔴 june 2016|\n| 🔗 [ajimee-bench](https:\u002F\u002Fgithub.com\u002Fazookey\u002Fajimee-bench) | - | - | ⭐ 20 | 🔴 january 2025|\n| 🔗 [j-spaw](https:\u002F\u002Fgithub.com\u002Ftakamichi-lab\u002Fj-spaw) | - | - | ⭐ 5 | 🟡 august 2025|\n| 🔗 [camera3](https:\u002F\u002Fgithub.com\u002Fcyberagentailab\u002Fcamera3) | - | - | ⭐ 4 | 🔴 may 2024|\n| 🔗 [jgpqa](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fjgpqa) | - | - | ⭐ 2 | 🟡 september 2025|\n| 🔗 [tanaka-corpus-plus](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Ftanaka-corpus-plus) | - | - | ⭐ 2 | 🔴 june 2021|\n| 🔗 
[emotioncorpusjapanesetokushimaa2lab](https:\u002F\u002Fgithub.com\u002Fkmatsu-tokudai\u002Femotioncorpusjapanesetokushimaa2lab) | - | - | ⭐ 2 | 🔴 september 2024|\n| 🔗 [osworld-jp](https:\u002F\u002Fgithub.com\u002Fkarakuri-ai\u002Fosworld-jp) | - | - | ⭐ 2 | 🟢 last friday|\n| 🔗 [quasi_japanese_reviews](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fquasi_japanese_reviews) | - | - | ⭐ 1 | 🔴 july 2023|\n| 🔗 [psychiatry-clinical-notes](https:\u002F\u002Fgithub.com\u002Fsociocom\u002Fpsychiatry-clinical-notes) | - | - | ⭐ 1 | 🟡 october 2025|\n| 🔗 [merged-town-names](https:\u002F\u002Fgithub.com\u002Fyuukitoriyama\u002Fmerged-town-names) | - | - | ⭐ 1 | 🔴 may 2022|\n| 🔗 [japanesetextemoticondata](https:\u002F\u002Fgithub.com\u002Fkuroshiba-ginji\u002Fjapanesetextemoticondata) | - | - | ⭐ 1 | 🔴 march 2021|\n| 🔗 [mishearing-corpus](https:\u002F\u002Fgithub.com\u002Fkishiyamat\u002Fmishearing-corpus) | - | - | ⭐ 1 | 🟢 january|\n| 🔗 [kotowaza](https:\u002F\u002Fgithub.com\u002Fseptn\u002Fkotowaza) | - | - | ⭐ 2 | 🟢 february|\n| 🔗 [selective-rag-kasensabo](https:\u002F\u002Fgithub.com\u002Ftk-yasuno\u002Fselective-rag-kasensabo) | - | - | ⭐ 1 | 🟡 november 2025|\n| 🔗 [jmle2026-bench](https:\u002F\u002Fgithub.com\u002Fnaoto-iwase\u002Fjmle2026-bench) | - | - | ⭐ 10 | 🟢 march|\n| 🔗 [JSTS-Neg](https:\u002F\u002Fgithub.com\u002Freiko-y\u002FJSTS-Neg) | - | - | ⭐ 1 | 🟢 february|\n| 🔗 [business-slide-questions](https:\u002F\u002Fgithub.com\u002Fstockmarkteam\u002Fbusiness-slide-questions) | - | - | ⭐ 2 | 🟡 may 2025|\n| 🔗 [WLSP-antonym](https:\u002F\u002Fgithub.com\u002Fmasayu-a\u002FWLSP-antonym) | - | - | ⭐ 0 | 🔴 march 2021|\n| 🔗 [YouCook2-JP](https:\u002F\u002Fgithub.com\u002Fnlab-mpg\u002FYouCook2-JP) | - | - | ⭐ 0 | 🟡 august 2025|\n| 🔗 [E2U](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FE2U) | - | - | ⭐ 0 | 🟢 march|\n| 🔗 [annotation-2025](https:\u002F\u002Fgithub.com\u002FTiny-Colony\u002Fannotation-2025) | - | - | ⭐ 0 | 🟢 january|\n| 🔗 
[jhpt](https:\u002F\u002Fgithub.com\u002Fnict-astrec-att\u002Fjhpt) | - | - | ⭐ 3 | 🟢 march|\n| 🔗 [JBE-QA](https:\u002F\u002Fgithub.com\u002Fhancules\u002FJBE-QA) | - | - | ⭐ 0 | 🟡 november 2025|\n| 🔗 [JMedWiC](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FJMedWiC) | - | - | ⭐ 3 | 🟢 march|\n| 🔗 [Doppelganger-JC](https:\u002F\u002Fgithub.com\u002F0017-alt\u002FDoppelganger-JC) | - | - | ⭐ 1 | 🟢 january|\n| 🔗 [modelvista-3lang](https:\u002F\u002Fgithub.com\u002Fkuramitsulab\u002Fmodelvista-3lang) | - | - | ⭐ 2 | 🟢 march|\n| 🔗 [japanese-hr-niah](https:\u002F\u002Fgithub.com\u002Fkufu\u002Fjapanese-hr-niah) | - | - | ⭐ 1 | 🟢 january|\n| 🔗 [nijl-manyoshutei](https:\u002F\u002Fgithub.com\u002Fkokubunken\u002Fnijl-manyoshutei) | - | - | ⭐ 2 | 🟢 march|\n| 🔗 [kamuskita](https:\u002F\u002Fgithub.com\u002Fmatbahasa\u002Fkamuskita) | - | - | ⭐ 2 | 🟢 last thursday|\n\n\n## Tutorial\nGuides and tutorials for learning Japanese NLP tools and techniques\n\n * [spacy_tutorial](https:\u002F\u002Fgithub.com\u002Fyuibi\u002Fspacy_tutorial) - spaCy tutorial in English and Japanese. 
spacy-transformers, BERT, GiNZA.\n * [fastTextJapaneseTutorial](https:\u002F\u002Fgithub.com\u002Ficoxfog417\u002FfastTextJapaneseTutorial) - Tutorial for training fastText on a Japanese corpus\n * [allennlp-NER-ja](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fallennlp-NER-ja) - AllenNLP-NER-ja: Named entity recognition for Japanese with AllenNLP\n * [chariot-PyTorch-Japanese-text-classification](https:\u002F\u002Fgithub.com\u002Fymym3412\u002Fchariot-PyTorch-Japanese-text-classification) - Experiment in Japanese text classification using chariot and PyTorch\n * [ginza-examples](https:\u002F\u002Fgithub.com\u002Fpoyo46\u002Fginza-examples) - An introduction to GiNZA, a Japanese NLP library\n * [DocumentClassificationUsingBERT-Japanese](https:\u002F\u002Fgithub.com\u002Fnekoumei\u002FDocumentClassificationUsingBERT-Japanese) - DocumentClassificationUsingBERT-Japanese\n * [BERT_Japanese_Google_Colaboratory](https:\u002F\u002Fgithub.com\u002FYutaroOgawa\u002FBERT_Japanese_Google_Colaboratory) - How to run Japanese BERT on Google Colaboratory.\n * [bert-book](https:\u002F\u002Fgithub.com\u002Fstockmarkteam\u002Fbert-book) - Support page for the book 「BERTによる自然言語処理入門: Transformersを使った実践プログラミング」 (Introduction to NLP with BERT: Practical Programming with Transformers)\n * [janome-tutorial](https:\u002F\u002Fgithub.com\u002Fmocobeta\u002Fjanome-tutorial) - An introductory text-mining tutorial using Janome.\n * [handson-language-models](https:\u002F\u002Fgithub.com\u002Fhnishi\u002Fhandson-language-models) - Hands-on materials for Japanese language models\n * [JapaneseNLI](https:\u002F\u002Fgithub.com\u002Fverypluming\u002FJapaneseNLI) - Trying Japanese textual inference on Google Colab\n * [deep-learning-with-pytorch-ja](https:\u002F\u002Fgithub.com\u002FGin5050\u002Fdeep-learning-with-pytorch-ja) - Japanese-language version of the deep-learning-with-pytorch repository.\n * [bert-classification-tutorial](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fbert-classification-tutorial) - Text classification with BERT (2023 edition)\n * [python-nlp-book](https:\u002F\u002Fgithub.com\u002Fpython-nlp-book\u002Fpython-nlp-book) - Support page for the book ディープラーニングによる自然言語処理 (Natural Language Processing with Deep Learning; Kyoritsu Shuppan)\n * [llm-book](https:\u002F\u002Fgithub.com\u002Fghmagazine\u002Fllm-book) - 
GitHub repository for the book 「大規模言語モデル入門」 (Introduction to Large Language Models; Gijutsu-Hyoron, 2023)\n * [nlp2024-tutorial-3](https:\u002F\u002Fgithub.com\u002Fhiroshi-matsuda-rit\u002Fnlp2024-tutorial-3) - NLP2024 Tutorial 3, 「作って学ぶ日本語大規模言語モデル」 (Learning Japanese large language models by building them): environment setup steps and source code\n * [japanese-ir-tutorial](https:\u002F\u002Fgithub.com\u002Fmpkato\u002Fjapanese-ir-tutorial) - Japanese information retrieval tutorial\n * [nlpbook](https:\u002F\u002Fgithub.com\u002Fmamorlis\u002Fnlpbook) - Support site for the book 「自然言語処理の教科書」 (A Textbook of Natural Language Processing)\n * [kantan-regex-book](https:\u002F\u002Fgithub.com\u002Fmakenowjust\u002Fkantan-regex-book) - Learning regular expression engines by building one\n * [bert-classification-tutorial-2024](https:\u002F\u002Fgithub.com\u002Fhpprc\u002Fbert-classification-tutorial-2024) - Text classification with BERT (2024 edition)\n * [Gemma2_2b_Japanese_finetuning_colab.ipynb](https:\u002F\u002Fgithub.com\u002Fqianniu95\u002Fgemma2_2b_finetune_jp_tutorial\u002Fblob\u002Fmain\u002FGemma2_2b_Japanese_finetuning_colab.ipynb) - Fine-Tuning Google Gemma for Japanese Instructions\n * [nlp100v2020](https:\u002F\u002Fgithub.com\u002Fupura\u002Fnlp100v2020) - Solving 「言語処理100本ノック 2020」 (NLP 100 Exercises 2020) in Python\n * [textmining-ja](https:\u002F\u002Fgithub.com\u002Fpaithiov909\u002Ftextmining-ja) - Practice in natural language processing and text analysis with R\n * [nlp2025-tutorial-2](https:\u002F\u002Fgithub.com\u002Fyuiseki\u002Fnlp2025-tutorial-2) - Materials and source code for the NLP2025 tutorial 「地理情報と言語処理 実践入門」 (A Practical Introduction to Geographic Information and Language Processing)\n * [nlp100v2025](https:\u002F\u002Fgithub.com\u002Fupura\u002Fnlp100v2025) - Solving 「言語処理100本ノック 2025」 (NLP 100 Exercises 2025) in Python\n * [topic-models-ao](https:\u002F\u002Fgithub.com\u002Fanemptyarchive\u002Ftopic-models-ao) - Notes on the book 『トピックモデル』 (Topic Models; Machine Learning Professional Series)\n * [slp2025](https:\u002F\u002Fgithub.com\u002Fryota-komatsu\u002Fslp2025) - Materials for the 音学シンポジウム2025 tutorial 「マルチモーダル大規模言語モデル入門」 (Introduction to Multimodal Large Language Models)\n * [book_impress_it-basic-education-ai](https:\u002F\u002Fgithub.com\u002Fliber-craft-co-ltd\u002Fbook_impress_it-basic-education-ai) - Impress book 「IT基礎教養 自然言語処理＆画像解析」 (IT Fundamentals: Natural Language Processing and Image Analysis)\n * [genai-agent-advanced-book](https:\u002F\u002Fgithub.com\u002Fmasamasa59\u002Fgenai-agent-advanced-book) - Source code used in the book 「現場で活用するための生成AIエージェント実践入門」 (Kodansha Scientific)\n * 
[course2024-nlp](https:\u002F\u002Fgithub.com\u002Ftomonari-masada\u002Fcourse2024-nlp) - Advanced Topics in Natural Language Processing, Graduate School of Artificial Intelligence and Science, Rikkyo University (academic year 2024)\n * [support-genai-book](https:\u002F\u002Fgithub.com\u002Fyoheikikuta\u002Fsupport-genai-book) - Support page for the book 原論文から解き明かす生成AI (Gijutsu-Hyoron)\n * [ir100](https:\u002F\u002Fgithub.com\u002Fir100\u002Fir100) - 情報検索100本ノック (Information Retrieval 100 Exercises)\n * [kaggle_llm_book](https:\u002F\u002Fgithub.com\u002Fsinchir0\u002Fkaggle_llm_book) - Support site for the book 『Kaggle ではじめる大規模言語モデル入門　～自然言語処理〈実践〉プログラミング～』\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [spacy_tutorial](https:\u002F\u002Fgithub.com\u002Fyuibi\u002Fspacy_tutorial) | - | - | ⭐ 65 | 🔴 january 2020|\n| 🔗 [fastTextJapaneseTutorial](https:\u002F\u002Fgithub.com\u002Ficoxfog417\u002FfastTextJapaneseTutorial) | - | - | ⭐ 205 | 🔴 september 2016|\n| 🔗 [allennlp-NER-ja](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fallennlp-NER-ja) | - | - | ⭐ 5 | 🔴 may 2022|\n| 🔗 [chariot-PyTorch-Japanese-text-classification](https:\u002F\u002Fgithub.com\u002Fymym3412\u002Fchariot-PyTorch-Japanese-text-classification) | - | - | ⭐ 5 | 🔴 march 2019|\n| 🔗 [ginza-examples](https:\u002F\u002Fgithub.com\u002Fpoyo46\u002Fginza-examples) | - | - | ⭐ 15 | 🔴 january 2021|\n| 🔗 [DocumentClassificationUsingBERT-Japanese](https:\u002F\u002Fgithub.com\u002Fnekoumei\u002FDocumentClassificationUsingBERT-Japanese) | - | - | ⭐ 0 | 🟡 august 2025|\n| 🔗 [BERT_Japanese_Google_Colaboratory](https:\u002F\u002Fgithub.com\u002FYutaroOgawa\u002FBERT_Japanese_Google_Colaboratory) | - | - | ⭐ 29 | 🔴 january 2022|\n| 🔗 [bert-book](https:\u002F\u002Fgithub.com\u002Fstockmarkteam\u002Fbert-book) | - | - | ⭐ 264 | 🔴 february 2024|\n| 🔗 [janome-tutorial](https:\u002F\u002Fgithub.com\u002Fmocobeta\u002Fjanome-tutorial) | - | - | ⭐ 31 | 🔴 march 2019|\n| 🔗 [handson-language-models](https:\u002F\u002Fgithub.com\u002Fhnishi\u002Fhandson-language-models) | - | - | ⭐ 3 | 🔴 march 2021|\n| 🔗 [JapaneseNLI](https:\u002F\u002Fgithub.com\u002Fverypluming\u002FJapaneseNLI) 
| - | - | ⭐ 6 | 🔴 june 2021|\n| 🔗 [deep-learning-with-pytorch-ja](https:\u002F\u002Fgithub.com\u002FGin5050\u002Fdeep-learning-with-pytorch-ja) | - | - | ⭐ 143 | 🔴 may 2021|\n| 🔗 [bert-classification-tutorial](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fbert-classification-tutorial) | - | - | ⭐ 234 | 🔴 may 2024|\n| 🔗 [python-nlp-book](https:\u002F\u002Fgithub.com\u002Fpython-nlp-book\u002Fpython-nlp-book) | - | - | ⭐ 10 | 🔴 may 2023|\n| 🔗 [llm-book](https:\u002F\u002Fgithub.com\u002Fghmagazine\u002Fllm-book) | - | - | ⭐ 467 | 🟡 december 2025|\n| 🔗 [nlp2024-tutorial-3](https:\u002F\u002Fgithub.com\u002Fhiroshi-matsuda-rit\u002Fnlp2024-tutorial-3) | - | - | ⭐ 113 | 🔴 april 2024|\n| 🔗 [japanese-ir-tutorial](https:\u002F\u002Fgithub.com\u002Fmpkato\u002Fjapanese-ir-tutorial) | - | - | ⭐ 3 | 🔴 june 2024|\n| 🔗 [nlpbook](https:\u002F\u002Fgithub.com\u002Fmamorlis\u002Fnlpbook) | - | - | ⭐ 14 | 🟡 april 2025|\n| 🔗 [kantan-regex-book](https:\u002F\u002Fgithub.com\u002Fmakenowjust\u002Fkantan-regex-book) | - | - | ⭐ 22 | 🔴 march 2024|\n| 🔗 [bert-classification-tutorial-2024](https:\u002F\u002Fgithub.com\u002Fhpprc\u002Fbert-classification-tutorial-2024) | - | - | ⭐ 30 | 🔴 july 2024|\n| 🔗 [Gemma2_2b_Japanese_finetuning_colab.ipynb](https:\u002F\u002Fgithub.com\u002Fqianniu95\u002Fgemma2_2b_finetune_jp_tutorial\u002Fblob\u002Fmain\u002FGemma2_2b_Japanese_finetuning_colab.ipynb) | - | - | ⭐ repo not found | 🔴 august 2024|\n| 🔗 [nlp100v2020](https:\u002F\u002Fgithub.com\u002Fupura\u002Fnlp100v2020) | - | - | ⭐ 90 | 🟡 april 2025|\n| 🔗 [textmining-ja](https:\u002F\u002Fgithub.com\u002Fpaithiov909\u002Ftextmining-ja) | - | - | ⭐ 3 | 🟢 march|\n| 🔗 [nlp2025-tutorial-2](https:\u002F\u002Fgithub.com\u002Fyuiseki\u002Fnlp2025-tutorial-2) | - | - | ⭐ 17 | 🟢 february|\n| 🔗 [nlp100v2025](https:\u002F\u002Fgithub.com\u002Fupura\u002Fnlp100v2025) | - | - | ⭐ 90 | 🟡 april 2025|\n| 🔗 [public-annotations](https:\u002F\u002Fgithub.com\u002Fmanga109\u002Fpublic-annotations) | - | - | ⭐ 13 | 🟡 
april 2025|\n| 🔗 [topic-models-ao](https:\u002F\u002Fgithub.com\u002Fanemptyarchive\u002Ftopic-models-ao) | - | - | ⭐ 4 | 🟡 may 2025|\n| 🔗 [slp2025](https:\u002F\u002Fgithub.com\u002Fryota-komatsu\u002Fslp2025) | - | - | ⭐ 64 | 🟢 last wednesday|\n| 🔗 [book_impress_it-basic-education-ai](https:\u002F\u002Fgithub.com\u002Fliber-craft-co-ltd\u002Fbook_impress_it-basic-education-ai) | - | - | ⭐ 4 | 🟡 june 2025|\n| 🔗 [genai-agent-advanced-book](https:\u002F\u002Fgithub.com\u002Fmasamasa59\u002Fgenai-agent-advanced-book) | - | - | ⭐ 194 | 🟡 september 2025|\n| 🔗 [course2024-nlp](https:\u002F\u002Fgithub.com\u002Ftomonari-masada\u002Fcourse2024-nlp) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [support-genai-book](https:\u002F\u002Fgithub.com\u002Fyoheikikuta\u002Fsupport-genai-book) | - | - | ⭐ 91 | 🟢 january|\n| 🔗 [ir100](https:\u002F\u002Fgithub.com\u002Fir100\u002Fir100) | - | - | ⭐ 93 | 🟡 december 2025|\n| 🔗 [kaggle_llm_book](https:\u002F\u002Fgithub.com\u002Fsinchir0\u002Fkaggle_llm_book) | - | - | ⭐ 31 | 🟢 march|\n\n\n## Research summary\nSummaries of studies and papers in Japanese NLP research\n\n * [awesome-bert-japanese](https:\u002F\u002Fgithub.com\u002Fhimkt\u002Fawesome-bert-japanese) - A list of pre-trained BERT models for Japanese with word\u002Fsubword tokenization + vocabulary construction algorithm information\n * [GEC-Info-ja](https:\u002F\u002Fgithub.com\u002Fgotutiyan\u002FGEC-Info-ja) - Repository for collecting and classifying Japanese-language literature on grammatical error correction\n * [dataset-list](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fdataset-list) - Lists of text corpora and more (mainly Japanese)\n * [tuning_playbook_ja](https:\u002F\u002Fgithub.com\u002FValkyrja3607\u002Ftuning_playbook_ja) - A playbook for systematically maximizing the performance of deep learning models\n * [japanese-pitch-accent-resources](https:\u002F\u002Fgithub.com\u002Folety\u002Fjapanese-pitch-accent-resources) - An attempt to consolidate Japanese phonetic resources, pitch accent resources in particular, into one list\n * 
[awesome-japanese-llm](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fawesome-japanese-llm) - Overview of open-source Japanese LLMs\n\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [awesome-bert-japanese](https:\u002F\u002Fgithub.com\u002Fhimkt\u002Fawesome-bert-japanese) | - | - | ⭐ 132 | 🔴 march 2023|\n| 🔗 [GEC-Info-ja](https:\u002F\u002Fgithub.com\u002Fgotutiyan\u002FGEC-Info-ja) | - | - | ⭐ 13 | 🟡 april 2025|\n| 🔗 [dataset-list](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fdataset-list) | - | - | ⭐ 118 | 🔴 july 2024|\n| 🔗 [tuning_playbook_ja](https:\u002F\u002Fgithub.com\u002FValkyrja3607\u002Ftuning_playbook_ja) | - | - | ⭐ 190 | 🔴 january 2023|\n| 🔗 [japanese-pitch-accent-resources](https:\u002F\u002Fgithub.com\u002Folety\u002Fjapanese-pitch-accent-resources) | - | - | ⭐ 126 | 🔴 february 2024|\n| 🔗 [awesome-japanese-llm](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fawesome-japanese-llm) | - | - | ⭐ 1.4k | 🟢 march|\n\n\n## Reference\n\n * [自然言語処理の餅屋](https:\u002F\u002Fwww.jnlp.org\u002Fnlp\u002Ftop)\n * [yasuokaの日記： 日本語係り受け解析器「2020年の総ざらえ」](https:\u002F\u002Fsrad.jp\u002F~yasuoka\u002Fjournal\u002F643631\u002F)\n * [yasuokaの日記： 日本語係り受け解析器「2021年の総ざらえ」](https:\u002F\u002Fsrad.jp\u002F~yasuoka\u002Fjournal\u002F651542\u002F)\n * https:\u002F\u002Fgithub.com\u002Ftopics\u002Fjapanese?l=python\n * https:\u002F\u002Fgithub.com\u002Ftopics\u002Fjapanese-language?l=python\n * https:\u002F\u002Fgithub.com\u002Fsearch?o=desc&q=corpus+japanese&s=&type=Repositories\n * https:\u002F\u002Fpaperswithcode.com\u002Fdatasets?lang=japanese\n * https:\u002F\u002Fgithub.com\u002Fhimkt\u002Fawesome-bert-japanese\n * [Awesome-Rust-MachineLearning-日本語向けのrustクレートや記事等をまとめたもの](https:\u002F\u002Fgithub.com\u002Fvaaaaanquish\u002FAwesome-Rust-MachineLearning\u002Fblob\u002Fmain\u002FREADME.ja.md)\n * [大規模言語モデル入門Ⅱ 〜生成型LLMの実装と評価](https:\u002F\u002Fgihyo.jp\u002Fbook\u002F2024\u002F978-4-297-14393-0)\n\n\n## Contributors\n\n * 
[kaisugi](https:\u002F\u002Fgithub.com\u002Fkaisugi) - [website](https:\u002F\u002Fkaisugi.me)\n * [bomin0624](https:\u002F\u002Fgithub.com\u002Fbomin0624) - [twitter](https:\u002F\u002Ftwitter.com\u002Fbomin0624_c)\n * [passaglia](https:\u002F\u002Fgithub.com\u002Fpassaglia) - [twitter](https:\u002F\u002Ftwitter.com\u002FSamPassaglia)\n * [sarumaj](https:\u002F\u002Fgithub.com\u002Fsarumaj) - [github](https:\u002F\u002Fgithub.com\u002Fsarumaj)\n * [ln2058](https:\u002F\u002Fgithub.com\u002Fln2058) - [github](https:\u002F\u002Fgithub.com\u002Fln2058)\n * [ajtgjmdjp](https:\u002F\u002Fgithub.com\u002Fajtgjmdjp) - [github](https:\u002F\u002Fgithub.com\u002Fajtgjmdjp)\n","# 令人惊叹的日本自然语言处理资源\n\n[![Awesome](https:\u002F\u002Fcdn.rawgit.com\u002Fsindresorhus\u002Fawesome\u002Fd7305f38d29fed78fa85652e3a63e154dd8e8829\u002Fmedia\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources)\n[![PRs](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRs-welcome-brightgreen)](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fpulls)\n[![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources-search)\n[![许可证：CC0-1.0](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-CC0_1.0-lightgrey.svg)](http:\u002F\u002Fcreativecommons.org\u002Fpublicdomain\u002Fzero\u002F1.0\u002F)\n[![CC0](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaishi-i_awesome-japanese-nlp-resources_readme_b7657951a0bb.png)](http:\u002F\u002Fcreativecommons.org\u002Fpublicdomain\u002Fzero\u002F1.0\u002F)\n\n一个精心整理的列表，专门收录用于日语自然语言处理的 Python 库、大模型、词典和语料库。\n\n- 列出了 [850 个 GitHub 仓库](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002FREADME.full.md) 的相关信息\n- 列出了 [278 个 Hugging Face 
仓库](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002Fhuggingface.md) 的相关信息（模型和数据集）\n- 🎉 我们很高兴地宣布于 2026 年 3 月 1 日发布 [awesome-japanese-nlp-survey](https:\u002F\u002Fawesome-japanese-nlp-survey.vercel.app\u002F)！\n\n\n[English](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002FREADME.en.md) | [日本語 (Japanese) ](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002FREADME.ja.md) | [繁體中文 (Chinese) ](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002FREADME.zh-hant.md) | [简体中文 (Chinese) ](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002FREADME.zh-hans.md)\n\n\n## 🎉 最新添加的内容\n\n**语料库**\n * [kamuskita](https:\u002F\u002Fgithub.com\u002Fmatbahasa\u002Fkamuskita) - 马来语学习会正在制作的开放性马来语-日语词典《大家的马来语词典》\n\n_更新于 2026 年 4 月 6 日_\n\n## 目录\n * [Hugging Face](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002Fhuggingface.md)\n   * [模型](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002Fhuggingface.md#models)\n   * [数据集](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fblob\u002Fmain\u002Fdocs\u002Fhuggingface.md#datasets)\n * [Python 库](#python-library)\n   * [形态分析](#morphology-analysis)\n   * [句法分析](#parsing)\n   * [转换器](#converter)\n   * [预处理工具](#preprocessor)\n   * [句子分割器](#sentence-spliter)\n   * [情感分析](#sentiment-analysis)\n   * [机器翻译](#machine-translation)\n   * [命名实体识别](#named-entity-recognition)\n   * [OCR](#ocr)\n   * [预训练模型工具](#tool-for-pretrained-models)\n   * [其他](#others)\n * [C++](#c)\n   * [形态分析](#morphology-analysis-1)\n   * [句法分析](#parsing-1)\n   * [其他](#others-1)\n * [Rust crate](#rust-crate)\n   * 
[形态分析](#morphology-analysis-2)\n   * [转换器](#converter-1)\n   * [搜索引擎库](#search-engine-library)\n   * [其他](#others-2)\n * [JavaScript](#javaScript)\n   * [形态分析](#morphology-analysis-3)\n   * [转换器](#converter-2)\n   * [其他](#others-3)\n * [Go](#go)\n   * [形态分析](#morphology-analysis-4)\n   * [其他](#others-4)\n * [Java](#java)\n   * [形态分析](#morphology-analysis-5)\n   * [其他](#others-5)\n * [预训练模型](#pretrained-model)\n   * [Word2Vec](#word2Vec)\n   * [基于 Transformer 的模型](#transformer-based-models)\n * [ChatGPT](#chatgpt)\n * [词典与输入法](#dictionary-and-ime)\n * [语料库](#corpus)\n   * [词性标注 \u002F 命名实体识别](#part-of-speech-tagging--named-entity-recognition)\n   * [文本分类](#text-classification)\n   * [平行语料库](#parallel-corpus)\n   * [对话语料库](#dialog-corpus)\n   * [其他](#others-3)\n * [教程](#tutorial)\n * [研究综述](#research-summary)\n * [参考文献](#reference)\n * [贡献者](#contributors)\n\n\n## Python 库\n\n### 形态分析\n将日语文本切分为词或语素，并标注词性及词干形式的库\n\n * [sudachi.rs](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002Fsudachi.rs) - SudachiPy 0.6 及以上版本基于 Sudachi.rs 开发。\n * [Janome](https:\u002F\u002Fgithub.com\u002Fmocobeta\u002Fjanome) - 纯 Python 编写的日语形态分析引擎。\n * [mecab-python3](https:\u002F\u002Fgithub.com\u002FSamuraiT\u002Fmecab-python3) - mecab-python。原始版本可在 http:\u002F\u002Ftaku910.github.io\u002Fmecab\u002F 找到。\n * [mecab](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fmecab) - 该仓库用于构建 Windows 64 位 MeCab 二进制文件并改进 MeCab 的 Python 绑定。\n * [fugashi](https:\u002F\u002Fgithub.com\u002Fpolm\u002Ffugashi) - 一个使用 Cython 封装的 MeCab 工具，用于快速、Python 风格的日语文本分词和形态分析。\n * [nagisa](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fnagisa) - 基于循环神经网络的日语分词器。\n * [pyknp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fpyknp) - 用于 JUMAN++\u002FKNP 的 Python 模块。\n * [Mykytea-python](https:\u002F\u002Fgithub.com\u002Fchezou\u002FMykytea-python) - KyTea 的 Python 封装。\n * [konoha](https:\u002F\u002Fgithub.com\u002Fhimkt\u002Fkonoha) - Konoha：简单的日语分词器封装。\n * 
[natto-py](https:\u002F\u002Fgithub.com\u002Fburuzaemon\u002Fnatto-py) - natto-py 将 Python 编程语言与日语词性及形态分析工具 MeCab 结合起来。\n * [rakutenma-python](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Frakutenma-python) - 拉克坦 MA（Python 版）。\n * [python-vaporetto](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fpython-vaporetto) - Vaporetto 是一种基于点预测的快速轻量级分词器。这是 Vaporetto 的 Python 封装。\n * [dango](https:\u002F\u002Fgithub.com\u002Fmkartawijaya\u002Fdango) - 一款易于使用的日语文本分词器，面向语言学习者和非语言学专业人士。\n * [rhoknp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Frhoknp) - 另一个用于 Juman++\u002FKNP 的 Python 绑定。\n * [python-vibrato](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fpython-vibrato) - 基于维特比算法的加速分词器（Python 封装）。\n * [jagger-python](https:\u002F\u002Fgithub.com\u002Flighttransport\u002Fjagger-python) - Jagger（基于规则的日语形态分析器的 C++ 实现）的 Python 绑定。\n * [Mecari](https:\u002F\u002Fgithub.com\u002Fzbller\u002FMecari) - Mecari（基于图神经网络的日语形态分析）\n\n\n|名称|每周下载量|总下载量|星标数|最近提交|\n-|-|-|-|-\n| 🔗 [SudachiPy](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachiPy) | 📥 37.5万 | 📦 6300万 | ⭐ 429 | 🔴 2022年10月|\n| 🔗 [Janome](https:\u002F\u002Fgithub.com\u002Fmocobeta\u002Fjanome) | 📥 5万 | 📦 1200万 | ⭐ 909 | 🟡 2025年10月|\n| 🔗 [mecab-python3](https:\u002F\u002Fgithub.com\u002FSamuraiT\u002Fmecab-python3) | 📥 20.6万 | 📦 3600万 | ⭐ 581 | 🟡 2025年11月|\n| 🔗 [mecab](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fmecab\u002Ftree\u002Fmaster\u002Fmecab\u002Fpython) | 📥 2.4万 | 📦 72.4万 | ⭐ 271 | 🔴 2024年10月|\n| 🔗 [fugashi](https:\u002F\u002Fgithub.com\u002Fpolm\u002Ffugashi) | 📥 12万 | 📦 1400万 | ⭐ 518 | 🟡 2025年10月|\n| 🔗 [nagisa](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fnagisa) | 📥 4.9万 | 📦 800万 | ⭐ 416 | 🟢 2月|\n| 🔗 [pyknp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fpyknp) | 📥 1千 | 📦 300万 | ⭐ 93 | 🟢 1月|\n| 🔗 [Mykytea-python](https:\u002F\u002Fgithub.com\u002Fchezou\u002FMykytea-python) | 📥 2千 | 📦 56.2万 | ⭐ 36 | 🟢 上周一|\n| 🔗 [konoha](https:\u002F\u002Fgithub.com\u002Fhimkt\u002Fkonoha) | 📥 5万 | 📦 
600万 | ⭐ 261 | 🟢 3月|\n| 🔗 [natto-py](https:\u002F\u002Fgithub.com\u002Fburuzaemon\u002Fnatto-py) | 📥 3.8万 | 📦 3400万 | ⭐ 95 | 🔴 2023年11月|\n| 🔗 [rakutenma-python](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Frakutenma-python) | 📥 14 | 📦 2.7万 | ⭐ 23 | 🔴 2017年5月|\n| 🔗 [python-vaporetto](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fpython-vaporetto) | 📥 229 | 📦 17.5万 | ⭐ 21 | 🟡 2025年6月|\n| 🔗 [dango](https:\u002F\u002Fgithub.com\u002Fmkartawijaya\u002Fdango) | 📥 42 | 📦 2.6万 | ⭐ 25 | 🔴 2021年11月|\n| 🔗 [rhoknp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Frhoknp) | 📥 2万 | 📦 100万 | ⭐ 38 | 🟢 3月|\n| 🔗 [python-vibrato](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fpython-vibrato) | 📥 138 | 📦 11.7万 | ⭐ 43 | 🔴 2024年9月|\n| 🔗 [jagger-python](https:\u002F\u002Fgithub.com\u002Flighttransport\u002Fjagger-python) | 📥 631 | 📦 30万 | ⭐ 13 | 🔴 2024年3月|\n| 🔗 [Mecari](https:\u002F\u002Fgithub.com\u002Fzbller\u002FMecari) | - | - | ⭐ 39 | 🟡 2025年9月|\n\n### 语法分析\n用于分析日语句子句法结构和依存关系的库\n\n * [ginza](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fginza) - 基于 Universal Dependencies 并以 spaCy 为框架的日语 NLP 库\n * [cabocha](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fcabocha) - 另一个日语依存结构分析器\n * [UniDic2UD](https:\u002F\u002Fgithub.com\u002FKoichiYasuoka\u002FUniDic2UD) - 面向现代及当代日语的分词、词性标注、词元还原与依存句法分析工具\n * [camphr](https:\u002F\u002Fgithub.com\u002FPKSHATechnology-Research\u002Fcamphr) - Camphr - 用于构建流水线组件的 NLP 库\n * [SuPar-UniDic](https:\u002F\u002Fgithub.com\u002FKoichiYasuoka\u002FSuPar-UniDic) - 结合 BERT 模型的现代及当代日语分词、词性标注、词元还原与依存句法分析工具\n * [depccg](https:\u002F\u002Fgithub.com\u002Fmasashi-y\u002Fdepccg) - 基于超标记和依存关系因子化模型的 A* CCG 解析器\n * [bertknp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fbertknp) - 基于 BERT 的日语依存句法分析器\n * [esupar](https:\u002F\u002Fgithub.com\u002FKoichiYasuoka\u002Fesupar) - 使用 BERT\u002FRoBERTa\u002FDeBERTa 模型的日语及其他语言的分词、词性标注与依存句法分析工具\n * [yomikata](https:\u002F\u002Fgithub.com\u002Fpassaglia\u002Fyomikata) - 利用微调后的 BERT 模型进行同音异义词消歧的库\n * 
[jdepp-python](https:\u002F\u002Fgithub.com\u002Flighttransport\u002Fjdepp-python) - J.DepP（C++ 实现的日语依存句法分析器）的 Python 绑定\n * [lightblue](https:\u002F\u002Fgithub.com\u002Fdaisukebekki\u002Flightblue) - 基于 DTS 表示的日语 CCG 解析器\n * [natsume-simple](https:\u002F\u002Fgithub.com\u002Fborh-lab\u002Fnatsume-simple) - natsume-simple 是一个日语助词依存关系检索系统\n * [jdeppy](https:\u002F\u002Fgithub.com\u002Fmatsurih\u002Fjdeppy) - J.DepP 的 Python 封装，一款快速的日语依存句法分析器\n\n\n|名称|每周下载量|总下载量|星标数|最近一次提交|\n-|-|-|-|-\n| 🔗 [ginza](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fginza) | 📥 1.2万 | 📦 200万 | ⭐ 841 | 🔴 2024年3月|\n| 🔗 [cabocha](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fcabocha\u002Ftree\u002Fmaster\u002Fpython) | 📥 98 | 📦 5.4万 | ⭐ 7 | 🔴 2022年8月|\n| 🔗 [UniDic2UD](https:\u002F\u002Fgithub.com\u002FKoichiYasuoka\u002FUniDic2UD) | 📥 256 | 📦 33万 | ⭐ 38 | 🟡 2025年12月|\n| 🔗 [camphr](https:\u002F\u002Fgithub.com\u002FPKSHATechnology-Research\u002Fcamphr) | 📥 580 | 📦 27.1万 | ⭐ 337 | 🔴 2021年8月|\n| 🔗 [SuPar-UniDic](https:\u002F\u002Fgithub.com\u002FKoichiYasuoka\u002FSuParUniDic) | 📥 32 | 📦 11.9万 | ⭐ 21 | 🔴 仓库未找到|\n| 🔗 [depccg](https:\u002F\u002Fgithub.com\u002Fmasashi-y\u002Fdepccg) | 📥 60 | 📦 4.6万 | ⭐ 98 | 🔴 2023年8月|\n| 🔗 [bertknp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fbertknp) | - | - | ⭐ 23 | 🔴 2021年10月|\n| 🔗 [esupar](https:\u002F\u002Fgithub.com\u002FKoichiYasuoka\u002Fesupar) | 📥 516 | 📦 17.1万 | ⭐ 55 | 🟢 2月|\n| 🔗 [yomikata](https:\u002F\u002Fgithub.com\u002Fpassaglia\u002Fyomikata) | 📥 33 | 📦 5万 | ⭐ 32 | 🔴 2023年10月|\n| 🔗 [jdepp-python](https:\u002F\u002Fgithub.com\u002Flighttransport\u002Fjdepp-python) | 📥 647 | 📦 28.5万 | ⭐ 4 | 🔴 2024年2月|\n| 🔗 [lightblue](https:\u002F\u002Fgithub.com\u002Fdaisukebekki\u002Flightblue) | - | - | ⭐ 28 | 🟢 3月|\n| 🔗 [natsume-simple](https:\u002F\u002Fgithub.com\u002Fborh-lab\u002Fnatsume-simple) | - | - | ⭐ 5 | 🔴 2025年2月|\n| 🔗 [jdeppy](https:\u002F\u002Fgithub.com\u002Fmatsurih\u002Fjdeppy) | 📥 10 | 📦 1.1万 | ⭐ 3 | 🔴 2022年2月|\n\n### 
转换器\n用于在假名、罗马字以及全角\u002F半角形式之间进行转换的库\n\n * [pykakasi](https:\u002F\u002Fgithub.com\u002Fmiurahr\u002Fpykakasi) - 轻量级的日本假名-汉字句子到假名-罗马字的转换工具。\n * [cutlet](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fcutlet) - Python实现的日语到罗马字转换工具。\n * [alphabet2kana](https:\u002F\u002Fgithub.com\u002Fshihono\u002Falphabet2kana) - 将英文字母转换为片假名。\n * [Convert-Numbers-to-Japanese](https:\u002F\u002Fgithub.com\u002FGreatdane\u002FConvert-Numbers-to-Japanese) - 将阿拉伯数字（西式数字）转换为符合日语语境的表达方式。\n * [mozcpy](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fmozcpy) - Mozc for Python：假名-汉字转换工具。\n * [jamorasep](https:\u002F\u002Fgithub.com\u002Ftachi-hi\u002Fjamorasep) - 日语文本解析器，用于将平假名\u002F片假名字符串分割成音节。\n * [text2phoneme](https:\u002F\u002Fgithub.com\u002Fkorguchi\u002Ftext2phoneme) - 将日语文本转换为音素序列的脚本。\n * [jntajis-python](https:\u002F\u002Fgithub.com\u002Fopencollector\u002Fjntajis-python) - 基于日本国税厅定义的方案构建的快速字符转换与转写库。\n * [wiredify](https:\u002F\u002Fgithub.com\u002Feggplants\u002Fwiredify) - 将日语假名中的“ba-bi-bu-be-bo”转换为“va-vi-vu-ve-vo”。\n * [mecab-text-cleaner](https:\u002F\u002Fgithub.com\u002F34j\u002Fmecab-text-cleaner) - 使用Mecab获取日语读音（よみがな）和重音的简单Python包（CLI\u002FPython API）。\n * [pynormalizenumexp](https:\u002F\u002Fgithub.com\u002Ftkscode\u002Fpynormalizenumexp) - 用于提取和规范化数量及时间表达的NormalizeNumexp的Python实现。\n * [Jusho](https:\u002F\u002Fgithub.com\u002Fnagataaaas\u002FJusho) - 简单封装日本邮政编码数据的工具。\n * [yurenizer](https:\u002F\u002Fgithub.com\u002Fsea-turt1e\u002Fyurenizer) - 解决日语书写不一致问题的日语文本归一化工具。\n * [e2k](https:\u002F\u002Fgithub.com\u002FPatchethium\u002Fe2k) - 自动将英语转换为片假名的工具。\n * [alkana.py](https:\u002F\u002Fgithub.com\u002Fzomysan\u002Falkana.py) - 用于获取字母字符串对应的片假名读音的工具。\n * [englishtokanaconverter](https:\u002F\u002Fgithub.com\u002Factlaboratory\u002Fenglishtokanaconverter) - 将英语字符串转换为片假名的程序。\n * [kanjiconv](https:\u002F\u002Fgithub.com\u002Fsea-turt1e\u002Fkanjiconv) - 汉字转换为平假名、片假名或罗马字母的工具。\n * [kanjize](https:\u002F\u002Fgithub.com\u002Fnagataaaas\u002Fkanjize) - 
Kanjize（カンジャイズ）：汉字-数字与整数之间的简易转换工具。\n\n\n|名称|每周下载量|总下载量|星标数|最近提交|\n-|-|-|-|-\n| 🔗 [pykakasi](https:\u002F\u002Fgithub.com\u002Fmiurahr\u002Fpykakasi) | 📥 29.8万 | 📦 3000万 | ⭐ 445 | 🔴 2022年7月|\n| 🔗 [cutlet](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fcutlet) | 📥 1.8万 | 📦 200万 | ⭐ 374 | 🟡 2025年6月|\n| 🔗 [alphabet2kana](https:\u002F\u002Fgithub.com\u002Fshihono\u002Falphabet2kana) | 📥 215 | 📦 5.8万 | ⭐ 14 | 🟢 2月|\n| 🔗 [Convert-Numbers-to-Japanese](https:\u002F\u002Fgithub.com\u002FGreatdane\u002FConvert-Numbers-to-Japanese) | - | - | ⭐ 50 | 🔴 2020年11月|\n| 🔗 [mozcpy](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fmozcpy) | 📥 11.4万 | 📦 1.3万 | ⭐ 47 | 🔴 2025年2月|\n| 🔗 [jamorasep](https:\u002F\u002Fgithub.com\u002Ftachi-hi\u002Fjamorasep) | 📥 8.9万 | 📦 9千 | ⭐ 11 | 🟢 2月|\n| 🔗 [text2phoneme](https:\u002F\u002Fgithub.com\u002Fkorguchi\u002Ftext2phoneme) | - | - | ⭐ 13 | 🔴 2023年5月|\n| 🔗 [jntajis-python](https:\u002F\u002Fgithub.com\u002Fopencollector\u002Fjntajis-python) | 📥 1千 | 📦 11.7万 | ⭐ 21 | 🟢 3月|\n| 🔗 [wiredify](https:\u002F\u002Fgithub.com\u002Feggplants\u002Fwiredify) | 📥 2.7万 | 📦 6千 | ⭐ 3 | 🟡 2025年12月|\n| 🔗 [mecab-text-cleaner](https:\u002F\u002Fgithub.com\u002F34j\u002Fmecab-text-cleaner) | 📥 1万 | 📦 4千 | ⭐ 7 | 🟢 2月|\n| 🔗 [pynormalizenumexp](https:\u002F\u002Fgithub.com\u002Ftkscode\u002Fpynormalizenumexp) | 📥 3万 | 📦 1.4万 | ⭐ 8 | 🔴 2024年4月|\n| 🔗 [Jusho](https:\u002F\u002Fgithub.com\u002Fnagataaaas\u002FJusho) | 📥 21.7万 | 📦 5.5万 | ⭐ 11 | 🔴 2024年6月|\n| 🔗 [yurenizer](https:\u002F\u002Fgithub.com\u002Fsea-turt1e\u002Fyurenizer) | 📥 5.1万 | 📦 1.8万 | ⭐ 5 | 🔴 2025年3月|\n| 🔗 [e2k](https:\u002F\u002Fgithub.com\u002FPatchethium\u002Fe2k) | 📥 3.68万 | 📦 2.6万 | ⭐ 16 | 🟢 3月|\n| 🔗 [alkana.py](https:\u002F\u002Fgithub.com\u002Fzomysan\u002Falkana.py) | - | - | ⭐ 34 | 🔴 2021年10月|\n| 🔗 [englishtokanaconverter](https:\u002F\u002Fgithub.com\u002Factlaboratory\u002Fenglishtokanaconverter) | - | - | ⭐ 4 | 🟢 昨天|\n| 🔗 [kanjiconv](https:\u002F\u002Fgithub.com\u002Fsea-turt1e\u002Fkanjiconv) | 
📥 13.3万 | 📦 1.2万 | ⭐ 17 | 🟡 2025年10月|\n| 🔗 [kanjize](https:\u002F\u002Fgithub.com\u002Fnagataaaas\u002Fkanjize) | 📥 1.2万 | 📦 100万 | ⭐ 68 | 🟡 2025年6月|\n\n\n### 预处理工具\n用于在文本分析之前对其进行归一化和清理的库\n\n * [neologdn](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fneologdn) - 针对mecab-neologd的日语文本归一化工具。\n * [jaconv](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fjaconv) - 纯Python实现的平假名、片假名、半角及全角字符之间的相互转换工具。\n * [mojimoji](https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fmojimoji) - 快速转换日语半角与全角字符的工具。\n * [text-cleaning](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Ftext-cleaning) - 功能强大的日语文本网页文本清理工具。\n * [HojiChar](https:\u002F\u002Fgithub.com\u002FHojiChar\u002FHojiChar) - 用于配置和管理多种预处理步骤的文本预处理工具。\n * [utsuho](https:\u002F\u002Fgithub.com\u002Fjuno-rmks\u002Futsuho) - Utsuho是一个Python模块，可方便地实现日语中半角片假名与全角片假名之间的双向转换。\n * [python-habachen](https:\u002F\u002Fgithub.com\u002FHizuru3\u002Fpython-habachen) - 又一款快速的日语字符串转换工具。\n * [kairyou](https:\u002F\u002Fgithub.com\u002Fbikatr7\u002Fkairyou) - 利用SpaCy的NLP\u002FNER技术快速预处理日语文本，适用于日语翻译或其他NLP任务。\n\n\n|名称|每周下载量|总下载量|星标数|最近提交|\n-|-|-|-|-\n| 🔗 [neologdn](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fneologdn) | 📥 8千 | 📦 100万 | ⭐ 287 | 🟡 2025年12月|\n| 🔗 [jaconv](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fjaconv) | 📥 56.7万 | 📦 6400万 | ⭐ 344 | 🟢 2月|\n| 🔗 [mojimoji](https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fmojimoji) | 📥 7万 | 📦 1100万 | ⭐ 152 | 🔴 2024年1月|\n| 🔗 [text-cleaning](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Ftext-cleaning) | - | - | ⭐ 12 | 🔴 2022年11月|\n| 🔗 [HojiChar](https:\u002F\u002Fgithub.com\u002FHojiChar\u002FHojiChar) | 📥 1.9万 | 📦 91.9万 | ⭐ 125 | 🟡 2025年11月|\n| 🔗 [utsuho](https:\u002F\u002Fgithub.com\u002Fjuno-rmks\u002Futsuho) | 📥 29.1万 | 📦 2.1万 | ⭐ 4 | 🟢 3月|\n| 🔗 [python-habachen](https:\u002F\u002Fgithub.com\u002FHizuru3\u002Fpython-habachen) | 📥 2.6万 | 📦 200万 | ⭐ 6 | 🟡 2025年10月|\n| 🔗 [kairyou](https:\u002F\u002Fgithub.com\u002Fbikatr7\u002Fkairyou) | 📥 5.8万 | 📦 3.1万 | ⭐ 6 | 🟡 
2025年6月|\n\n### 句子分割器\n能够自动检测句子边界并分割文本的库\n\n * [Bunkai](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fbunkai) - 日语文本的句子边界判定工具\n * [japanese-sentence-breaker](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fjapanese-sentence-breaker) - 日语句子分割器\n * [sengiri](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fsengiri) - 又一款用于日语文本的句子级分词工具\n * [budoux](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fbudoux) - 独立、小巧、语言中立。BudouX 是 Budou 的继任者，后者是一款基于机器学习的换行组织工具。\n * [ja_sentence_segmenter](https:\u002F\u002Fgithub.com\u002Fwwwcojp\u002Fja_sentence_segmenter) - 用于 Python 的日语句子分割库\n * [hasami](https:\u002F\u002Fgithub.com\u002Fmkartawijaya\u002Fhasami) - 用于对日语文本进行句子分割的工具\n * [kuzukiri](https:\u002F\u002Fgithub.com\u002Falinear-corp\u002Fkuzukiri) - 用 Rust 编写的 Python 日语文本分割器\n * [ja-senter-benchmark](https:\u002F\u002Fgithub.com\u002Fhkiyomaru\u002Fja-senter-benchmark) - 日语句子分割工具比较\n * [fast-bunkai](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Ffast-bunkai) - 日语句子分割（日本語文境界判定器），通过 Rust 加速的 Python 库实现，速度比 megagonlabs\u002Fbunkai 快 40–250 倍，且 API 兼容性几乎完全一致。\n\n\n|名称|每周下载量|总下载量|星标数|最近一次提交|\n-|-|-|-|-\n| 🔗 [bunkai](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fbunkai) | 📥 571 | 📦 109k | ⭐ 199 | 🔴 2023年8月|\n| 🔗 [japanese-sentence-breaker](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fjapanese-sentence-breaker) | 📥 4 | 📦 5k | ⭐ 14 | 🔴 2021年2月|\n| 🔗 [sengiri](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fsengiri) | 📥 100 | 📦 136k | ⭐ 24 | 🟡 2025年11月|\n| 🔗 [budoux](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fbudoux) | 📥 9k | 📦 451k | ⭐ 1.6k | 🟢 上周四|\n| 🔗 [ja_sentence_segmenter](https:\u002F\u002Fgithub.com\u002Fwwwcojp\u002Fja_sentence_segmenter) | 📥 2k | 📦 193k | ⭐ 74 | 🔴 2023年4月|\n| 🔗 [hasami](https:\u002F\u002Fgithub.com\u002Fmkartawijaya\u002Fhasami) | 📥 158 | 📦 39k | ⭐ 6 | 🔴 2021年2月|\n| 🔗 [kuzukiri](https:\u002F\u002Fgithub.com\u002Falinear-corp\u002Fkuzukiri) | 📥 183 | 📦 27k | ⭐ 6 | 🟡 2025年6月|\n| 🔗 
[ja-senter-benchmark](https:\u002F\u002Fgithub.com\u002Fhkiyomaru\u002Fja-senter-benchmark) | - | - | ⭐ 9 | 🔴 2023年2月|\n| 🔗 [fast-bunkai](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Ffast-bunkai) | 📥 71 | 📦 4k | ⭐ 71 | 🟡 2025年10月|\n\n\n### 情感分析\n能够检测文本中情感或极性的库\n\n * [oseti](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Foseti) - 基于词典的日语情感分析工具\n * [negapoji](https:\u002F\u002Fgithub.com\u002Fliaoziyang\u002Fnegapoji) - 日语文本的正负情感分类。日本語文書のネガポジを判定。\n * [pymlask](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fpymlask) - 日语文本的情感分析器\n * [asari](https:\u002F\u002Fgithub.com\u002FHironsan\u002Fasari) - 用 Python 实现的日语情感分析器。\n\n\n|名称|每周下载量|总下载量|星标数|最近一次提交|\n-|-|-|-|-\n| 🔗 [oseti](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Foseti) | 📥 379 | 📦 167k | ⭐ 97 | 🟡 2025年8月|\n| 🔗 [negapoji](https:\u002F\u002Fgithub.com\u002Fliaoziyang\u002Fnegapoji) | - | - | ⭐ 151 | 🔴 2017年8月|\n| 🔗 [pymlask](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fpymlask) | 📥 40 | 📦 66k | ⭐ 116 | 🔴 2024年7月|\n| 🔗 [asari](https:\u002F\u002Fgithub.com\u002FHironsan\u002Fasari) | 📥 91 | 📦 80k | ⭐ 152 | 🔴 2022年10月|\n\n\n### 机器翻译\n能够自动将文本从一种语言翻译成另一种语言的库\n\n * [jparacrawl-finetune](https:\u002F\u002Fgithub.com\u002FMorinoseiMorizo\u002Fjparacrawl-finetune) - JParaCrawl 预训练神经机器翻译（NMT）模型的示例用法。\n * [JASS](https:\u002F\u002Fgithub.com\u002FMao-KU\u002FJASS) - JASS：针对日语的序列到序列预训练，用于神经机器翻译（LREC2020）；以及面向低资源神经机器翻译的语言学驱动多任务预训练（ACM TALLIP）。\n * [PheMT](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FPheMT) - 一个基于现象的日英机器翻译鲁棒性评估数据集。该数据集基于 MTNT 数据集，并额外标注了四种语言现象：专有名词、缩略名词、口语表达和变体。COLING 2020。\n * [VISA](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FVISA) - 一个用于视觉场景感知机器翻译的歧义字幕数据集\n * [plamo-translate-cli](https:\u002F\u002Fgithub.com\u002Fpfnet\u002Fplamo-translate-cli) - 使用 plamo-2-translate 模型进行本地执行的命令行翻译接口。\n\n\n|名称|每周下载量|总下载量|星标数|最近一次提交|\n-|-|-|-|-\n| 🔗 [jparacrawl-finetune](https:\u002F\u002Fgithub.com\u002FMorinoseiMorizo\u002Fjparacrawl-finetune) | - | - | ⭐ 105 | 🔴 2021年4月|\n| 🔗 
[JASS](https:\u002F\u002Fgithub.com\u002FMao-KU\u002FJASS) | - | - | ⭐ 16 | 🔴 2022年1月|\n| 🔗 [PheMT](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FPheMT) | - | - | ⭐ 19 | 🔴 2021年2月|\n| 🔗 [VISA](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FVISA) | - | - | ⭐ 14 | 🔴 2022年10月|\n| 🔗 [plamo-translate-cli](https:\u002F\u002Fgithub.com\u002Fpfnet\u002Fplamo-translate-cli) | - | - | ⭐ 339 | 🟡 2025年10月|\n\n### 命名实体识别\n从文本中提取人名、地名和组织名称的库\n\n * [namaco](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002Fnamaco) - 基于字符的命名实体识别。\n * [entitypedia](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002Fentitypedia) - Entitypedia 是一个基于维基百科的扩展命名实体词典。\n * [noyaki](https:\u002F\u002Fgithub.com\u002Fken11\u002Fnoyaki) - 将字符跨度标签信息转换为基于分词文本的标签信息。\n * [bert-japanese-ner-finetuning](https:\u002F\u002Fgithub.com\u002Fken11\u002Fbert-japanese-ner-finetuning) - 用于对 BERT 模型进行微调的代码。Bertモデルのファインチューニングで固有表現抽出用タスクのモデルを作成・使用するサンプルです\n * [joint-information-extraction-hs](https:\u002F\u002Fgithub.com\u002Faih-uth\u002Fjoint-information-extraction-hs) - 基于详细标注标准的病例报告语料库，用于推断命名实体及关系抽取精度的代码\n * [pygeonlp](https:\u002F\u002Fgithub.com\u002Fgeonlp-platform\u002Fpygeonlp) - pygeonlp，一个用于日语文本地理标记的 Python 模块。\n * [bert-ner-japanese](https:\u002F\u002Fgithub.com\u002Fjurabiinc\u002Fbert-ner-japanese) - 使用 BERT 进行日语命名实体抽取的微调程序\n * [huggingface-finetune-japanese](https:\u002F\u002Fgithub.com\u002Ftsmatz\u002Fhuggingface-finetune-japanese) - 针对日语语言的仅编码器和编码器-解码器 Transformer 进行微调的示例（Hugging Face 资源）\n * [novelanalysisbyner](https:\u002F\u002Fgithub.com\u002Flychee1223\u002Fnovelanalysisbyner) - 通过 BERT 微调进行命名实体抽取\n\n\n|名称|每周下载量|总下载量|星标数|最近提交|\n-|-|-|-|-\n| 🔗 [namaco](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002Fnamaco) | - | - | ⭐ 40 | 🔴 2018年2月|\n| 🔗 [entitypedia](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002Fentitypedia) | - | - | ⭐ 13 | 🔴 2018年12月|\n| 🔗 [noyaki](https:\u002F\u002Fgithub.com\u002Fken11\u002Fnoyaki) | 📥 131 | 📦 2万 | ⭐ 5 | 🔴 2022年8月|\n| 🔗 
[bert-japanese-ner-finetuning](https:\u002F\u002Fgithub.com\u002Fken11\u002Fbert-japanese-ner-finetuning) | - | - | ⭐ 11 | 🔴 2022年6月|\n| 🔗 [joint-information-extraction-hs](https:\u002F\u002Fgithub.com\u002Faih-uth\u002Fjoint-information-extraction-hs) | - | - | ⭐ 1 | 🔴 2021年11月|\n| 🔗 [pygeonlp](https:\u002F\u002Fgithub.com\u002Fgeonlp-platform\u002Fpygeonlp) | 📥 70 | 📦 2.2万 | ⭐ 22 | 🟢 3月|\n| 🔗 [bert-ner-japanese](https:\u002F\u002Fgithub.com\u002Fjurabiinc\u002Fbert-ner-japanese) | - | - | ⭐ 5 | 🔴 2022年9月|\n| 🔗 [huggingface-finetune-japanese](https:\u002F\u002Fgithub.com\u002Ftsmatz\u002Fhuggingface-finetune-japanese) | - | - | ⭐ 16 | 🔴 2023年10月|\n| 🔗 [novelanalysisbyner](https:\u002F\u002Fgithub.com\u002Flychee1223\u002Fnovelanalysisbyner) | - | - | ⭐ 2 | 🔴 2024年6月|\n\n\n### OCR\n从图像中识别并提取文本的库\n\n * [Manga OCR](https:\u002F\u002Fgithub.com\u002Fkha-white\u002Fmanga-ocr) - 关于日语文本的光学字符识别，主要针对日本漫画\n * [mokuro](https:\u002F\u002Fgithub.com\u002Fkha-white\u002Fmokuro) - 在浏览器中阅读日语漫画，并可选择文本。\n * [handwritten-japanese-ocr](https:\u002F\u002Fgithub.com\u002Fyas-sim\u002Fhandwritten-japanese-ocr) - 使用触摸屏绘制输入文本，结合 Intel OpenVINO 工具包实现的手写日语 OCR 演示\n * [OCR_Japanease](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FOCR_Japanease) - 日本語OCR\n * [ndlocr_cli](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndlocr_cli) - NDLOCR 的应用程序\n * [donut](https:\u002F\u002Fgithub.com\u002Fclovaai\u002Fdonut) - ECCV 2022 上提出的无 OCR 文档理解 Transformer（Donut）及合成文档生成器（SynthDoG）的官方实现\n * [JMTrans](https:\u002F\u002Fgithub.com\u002Fttop32\u002FJMTrans) - 漫画翻译工具 - 从 URL 获取日语漫画并翻译漫画图像\n * [Kindai-OCR](https:\u002F\u002Fgithub.com\u002Fducanh841988\u002FKindai-OCR) - 用于识别现代日语杂志的 OCR 系统\n * [text_recognition](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Ftext_recognition) - NDLOCR 用文本识别模块\n * [Poricom](https:\u002F\u002Fgithub.com\u002Fblueaxis\u002FPoricom) - 漫画图像中的光学字符识别。漫画 OCR 桌面应用程序\n * [owocr](https:\u002F\u002Fgithub.com\u002Faurorawright\u002Fowocr) - 针对日语文本的光学字符识别\n * 
[yomitoku](https:\u002F\u002Fgithub.com\u002Fkotaro-kinoshita\u002Fyomitoku) - Yomitoku 是一款专为日语设计的 AI 驱动文档图像分析软件包。\n * [findtextcenternet](https:\u002F\u002Fgithub.com\u002Flithium0003\u002Ffindtextcenternet) - 基于 CenterNet 的日语 OCR\n * [simple-ocr-for-manga](https:\u002F\u002Fgithub.com\u002Fyisusdev2005\u002Fsimple-ocr-for-manga) - 一种简单的 OCR 工具，适用于传统日语和竖排日语漫画\n * [jp-ocr-evaluation](https:\u002F\u002Fgithub.com\u002Fyoshino\u002Fjp-ocr-evaluation) - 对日语文本图像的 OCR 性能进行评估\n * [paddleocr-vl-sft-for-japanese-manga-on-rtx-3060](https:\u002F\u002Fgithub.com\u002Fopenvino-book\u002Fpaddleocr-vl-sft-for-japanese-manga-on-rtx-3060) - 在 Manga109s 数据集上对 PaddleOCR-VL 进行微调，以识别日语漫画中的文字。基础模型在处理漫画中竖排日语文字的阅读顺序时存在困难。经过微调后，该模型能够正确处理漫画特有的文本布局。\n * [MangaOCR](https:\u002F\u002Fgithub.com\u002Fgnurt2041\u002FMangaOCR) - 一款轻量级的 OCR 模型，特别适用于漫画中的日语文本\n * [meikiocr](https:\u002F\u002Fgithub.com\u002Frtr46\u002Fmeikiocr) - 高速、高精度的日语视频游戏本地 OCR\n * [meikipop](https:\u002F\u002Fgithub.com\u002Frtr46\u002Fmeikipop) - 适用于 Windows、Linux 和 macOS 的通用日语 OCR 弹出式词典\n\n\n|名称|每周下载量|总下载量|星标数|最近提交|\n-|-|-|-|-\n| 🔗 [manga-ocr](https:\u002F\u002Fgithub.com\u002Fkha-white\u002Fmanga-ocr) | 📥 4千 | 📦 26.7万 | ⭐ 2.6千 | 🟡 2025年6月|\n| 🔗 [mokuro](https:\u002F\u002Fgithub.com\u002Fkha-white\u002Fmokuro) | 📥 1千 | 📦 9.4万 | ⭐ 1.6千 | 🟢 2月|\n| 🔗 [handwritten-japanese-ocr](https:\u002F\u002Fgithub.com\u002Fyas-sim\u002Fhandwritten-japanese-ocr) | - | - | ⭐ 38 | 🔴 2022年4月|\n| 🔗 [OCR_Japanease](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FOCR_Japanease) | - | - | ⭐ 246 | 🔴 2021年4月|\n| 🔗 [ndlocr_cli](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndlocr_cli) | - | - | ⭐ 654 | 🟡 2025年9月|\n| 🔗 [donut](https:\u002F\u002Fgithub.com\u002Fclovaai\u002Fdonut) | 📥 291 | 📦 19.8万 | ⭐ 6.8千 | 🔴 2023年7月|\n| 🔗 [JMTrans](https:\u002F\u002Fgithub.com\u002Fttop32\u002FJMTrans) | - | - | ⭐ 90 | 🔴 2021年1月|\n| 🔗 [Kindai-OCR](https:\u002F\u002Fgithub.com\u002Fducanh841988\u002FKindai-OCR) | - | - | ⭐ 153 | 🔴 2023年7月|\n| 🔗 
[text_recognition](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Ftext_recognition) | - | - | ⭐ 8 | 🔴 2023年7月|\n| 🔗 [Poricom](https:\u002F\u002Fgithub.com\u002Fblueaxis\u002FPoricom) | - | - | ⭐ 421 | 🔴 2023年6月|\n| 🔗 [owocr](https:\u002F\u002Fgithub.com\u002Faurorawright\u002Fowocr) | - | - | ⭐ 223 | 🟢 上周一|\n| 🔗 [yomitoku](https:\u002F\u002Fgithub.com\u002Fkotaro-kinoshita\u002Fyomitoku) | 📥 1千 | 📦 8.6万 | ⭐ 1.4千 | 🟢 3月|\n| 🔗 [findtextcenternet](https:\u002F\u002Fgithub.com\u002Flithium0003\u002Ffindtextcenternet) | - | - | ⭐ 59 | 🟡 2025年8月|\n| 🔗 [simple-ocr-for-manga](https:\u002F\u002Fgithub.com\u002Fyisusdev2005\u002Fsimple-ocr-for-manga) | - | - | ⭐ 7 | 🔴 仓库未找到|\n| 🔗 [jp-ocr-evaluation](https:\u002F\u002Fgithub.com\u002Fyoshino\u002Fjp-ocr-evaluation) | - | - | ⭐ 1 | 🔴 2024年3月|\n| 🔗 [paddleocr-vl-sft-for-japanese-manga-on-rtx-3060](https:\u002F\u002Fgithub.com\u002Fopenvino-book\u002Fpaddleocr-vl-sft-for-japanese-manga-on-rtx-3060) | - | - | ⭐ 11 | 🟡 2025年12月|\n| 🔗 [MangaOCR](https:\u002F\u002Fgithub.com\u002Fgnurt2041\u002FMangaOCR) | - | - | ⭐ 35 | 🔴 2024年5月|\n| 🔗 [meikiocr](https:\u002F\u002Fgithub.com\u002Frtr46\u002Fmeikiocr) | 📥 1千 | 📦 2.3万 | ⭐ 69 | 🟢 上周三|\n| 🔗 [meikipop](https:\u002F\u002Fgithub.com\u002Frtr46\u002Fmeikipop) | - | - | ⭐ 257 | 🔴 无效|\n\n### 预训练模型工具\n利用预训练模型提升准确率和效率的库\n\n * [JGLUE](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002FJGLUE) - JGLUE：日语通用语言理解评估\n * [ginza-transformers](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fginza-transformers) - 在spacy-transformers中使用自定义分词器\n * [t5_japanese_dialogue_generation](https:\u002F\u002Fgithub.com\u002FJinyamyzk\u002Ft5_japanese_dialogue_generation) - 使用T5进行对话生成\n * [japanese_text_classification](https:\u002F\u002Fgithub.com\u002FMasao-Taketani\u002Fjapanese_text_classification) - 用于研究包括MLP、CNN、RNN、BERT在内的多种深度神经网络文本分类器。\n * [Japanese-BERT-Sentiment-Analyzer](https:\u002F\u002Fgithub.com\u002Fizuna385\u002FJapanese-BERT-Sentiment-Analyzer) - 使用FastAPI和BERT部署情感分析服务端\n * 
[jmlm_scoring](https:\u002F\u002Fgithub.com\u002Fminhpqn\u002Fjmlm_scoring) - 基于掩码语言模型的日语和越南语评分\n * [allennlp-shiba-model](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fallennlp-shiba-model) - AllenNLP与Shiba的集成：日语CANINE模型\n * [evaluate_japanese_w2v](https:\u002F\u002Fgithub.com\u002Fshihono\u002Fevaluate_japanese_w2v) - 用于在日语相似度数据集上评估预训练日语word2vec模型的脚本\n * [gector-ja](https:\u002F\u002Fgithub.com\u002Fjonnyli1125\u002Fgector-ja) - 基于BERT的日语语法错误检测与修正\n * [Japanese-BPEEncoder](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FJapanese-BPEEncoder) - 日语BPE编码器\n * [Japanese-BPEEncoder_V2](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FJapanese-BPEEncoder_V2) - 日语BPE编码器版本2\n * [transformer-copy](https:\u002F\u002Fgithub.com\u002Fyouichiro\u002Ftransformer-copy) - 日语文法错误修正工具\n * [japanese-stable-diffusion](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fjapanese-stable-diffusion) - 日文Stable Diffusion是一种特定于日语的潜在文本到图像扩散模型，能够根据任意文本输入生成照片级逼真的图像。\n * [nagisa_bert](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fnagisa_bert) - 用于nagisa的BERT模型\n * [prefix-tuning-gpt](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fprefix-tuning-gpt) - GPT\u002FGPT-NeoX模型前缀调优示例代码及使用训练好的前缀进行推理的代码\n * [JGLUE-benchmark](https:\u002F\u002Fgithub.com\u002Fnobu-g\u002FJGLUE-benchmark) - JGLUE日语语言理解基准的训练与评估脚本\n * [jptranstokenizer](https:\u002F\u002Fgithub.com\u002Fretarfi\u002Fjptranstokenizer) - 适用于transformers库的日语分词器\n * [jp-stable](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Flm-evaluation-harness\u002Ftree\u002Fjp-stable) - JP语言模型评估框架\n * [compare-ja-tokenizer](https:\u002F\u002Fgithub.com\u002Fhitachi-nlp\u002Fcompare-ja-tokenizer) - 在连续书写语言中，不同分词器在下游任务上的表现如何？——以日语为例，ACL SRW 2023\n * [lm-evaluation-harness-jp-stable](https:\u002F\u002Fgithub.com\u002Ftdc-yamada-ya\u002Flm-evaluation-harness-jp-stable) - 用于自回归语言模型少样本评估的框架。\n * [llm-lora-classification](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fllm-lora-classification) - llm-lora分类\n * 
[rinna_gpt-neox_ggml-lora](https:\u002F\u002Fgithub.com\u002Fyukaryavka\u002Frinna_gpt-neox_ggml-lora) - 该仓库包含脚本及合并脚本，经过修改后可将Alpaca-Lora适配器应用于LoRA微调，假设使用“rinna\u002Fjapanese-gpt-neox...”[gpt-neox]模型并将其转换为ggml格式。\n * [japanese-llm-roleplay-benchmark](https:\u002F\u002Fgithub.com\u002Foshizo\u002Fjapanese-llm-roleplay-benchmark) - 此仓库旨在评估日语大模型在角色扮演场景中的性能。\n * [japanese-llm-ranking](https:\u002F\u002Fgithub.com\u002Fyuzu-ai\u002Fjapanese-llm-ranking) - 该仓库支持YuzuAI的日语大模型排行榜，这是LMSYS Vicuna评测的日语版。\n * [llm-jp-eval](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-eval) - 该工具可跨多个数据集自动评估日语大型语言模型。\n * [llm-jp-sft](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-sft) - 该仓库包含LLM-jp模型监督微调的代码。\n * [llm-jp-tokenizer](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-tokenizer) - 这是LLM学习会（LLM-jp）开发的LLM用分词器相关代码汇总仓库。\n * [japanese-lm-fin-harness](https:\u002F\u002Fgithub.com\u002Fpfnet-research\u002Fjapanese-lm-fin-harness) - 面向金融领域的日语语言模型评估框架\n * [ja-vicuna-qa-benchmark](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fja-vicuna-qa-benchmark) - 日语Vicuna问答基准\n * [swallow-evaluation](https:\u002F\u002Fgithub.com\u002Fswallow-llm\u002Fswallow-evaluation) - Swallow项目大型语言模型评估脚本\n * [swallow-evaluation-instruct](https:\u002F\u002Fgithub.com\u002Fswallow-llm\u002Fswallow-evaluation-instruct) - Swallow项目中经过后训练的大型语言模型的评估框架\n * [pretrained_doc2vec_ja](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fpretrained_doc2vec_ja) - 日语维基百科上的预训练doc2vec模型\n * [pl-bert-ja](https:\u002F\u002Fgithub.com\u002Fkyamauchi1023\u002Fpl-bert-ja) - 日语音素级BERT模型仓库\n\n|名称|每周下载量|总下载量|星标数|最近一次提交|\n-|-|-|-|-\n| 🔗 [JGLUE](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002FJGLUE) | - | - | ⭐ 338 | 🔴 2025年3月|\n| 🔗 [ginza-transformers](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fginza-transformers) | 📥 1千 | 📦 18.6万 | ⭐ 无效 | 🔴 2022年8月|\n| 🔗 
[t5_japanese_dialogue_generation](https:\u002F\u002Fgithub.com\u002FJinyamyzk\u002Ft5_japanese_dialogue_generation) | - | - | ⭐ 3 | 🔴 2021年11月|\n| 🔗 [japanese_text_classification](https:\u002F\u002Fgithub.com\u002FMasao-Taketani\u002Fjapanese_text_classification) | - | - | ⭐ 9 | 🔴 2020年1月|\n| 🔗 [Japanese-BERT-Sentiment-Analyzer](https:\u002F\u002Fgithub.com\u002Fizuna385\u002FJapanese-BERT-Sentiment-Analyzer) | - | - | ⭐ 无效 | 🔴 2021年4月|\n| 🔗 [jmlm_scoring](https:\u002F\u002Fgithub.com\u002Fminhpqn\u002Fjmlm_scoring) | - | - | ⭐ 5 | 🔴 2022年2月|\n| 🔗 [allennlp-shiba-model](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fallennlp-shiba-model) | 📥 32 | 📦 2万 | ⭐ 12 | 🔴 2021年6月|\n| 🔗 [evaluate_japanese_w2v](https:\u002F\u002Fgithub.com\u002Fshihono\u002Fevaluate_japanese_w2v) | - | - | ⭐ 12 | 🔴 2024年11月|\n| 🔗 [gector-ja](https:\u002F\u002Fgithub.com\u002Fjonnyli1125\u002Fgector-ja) | - | - | ⭐ 19 | 🔴 2021年6月|\n| 🔗 [Japanese-BPEEncoder](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FJapanese-BPEEncoder) | - | - | ⭐ 41 | 🔴 2021年9月|\n| 🔗 [Japanese-BPEEncoder_V2](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FJapanese-BPEEncoder_V2) | - | - | ⭐ 41 | 🔴 2023年1月|\n| 🔗 [transformer-copy](https:\u002F\u002Fgithub.com\u002Fyouichiro\u002Ftransformer-copy) | - | - | ⭐ 29 | 🔴 2020年9月|\n| 🔗 [japanese-stable-diffusion](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fjapanese-stable-diffusion) | - | - | ⭐ 仓库未找到 | 🔴 仓库未找到|\n| 🔗 [nagisa_bert](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fnagisa_bert) | 📥 40 | 📦 5.7万 | ⭐ 5 | 🟢 2月|\n| 🔗 [prefix-tuning-gpt](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fprefix-tuning-gpt) | - | - | ⭐ 仓库未找到 | 🔴 仓库未找到|\n| 🔗 [JGLUE-benchmark](https:\u002F\u002Fgithub.com\u002Fnobu-g\u002FJGLUE-benchmark) | - | - | ⭐ 18 | 🟢 上周四|\n| 🔗 [jptranstokenizer](https:\u002F\u002Fgithub.com\u002Fretarfi\u002Fjptranstokenizer) | 📥 83 | 📦 2.8万 | ⭐ 5 | 🔴 2024年2月|\n| 🔗 
[jp-stable](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Flm-evaluation-harness\u002Ftree\u002Fjp-stable) | - | - | ⭐ 154 | 🔴 2023年11月|\n| 🔗 [compare-ja-tokenizer](https:\u002F\u002Fgithub.com\u002Fhitachi-nlp\u002Fcompare-ja-tokenizer) | - | - | ⭐ 6 | 🔴 2023年6月|\n| 🔗 [lm-evaluation-harness-jp-stable](https:\u002F\u002Fgithub.com\u002Ftdc-yamada-ya\u002Flm-evaluation-harness-jp-stable) | - | - | ⭐ 1 | 🔴 2023年6月|\n| 🔗 [llm-lora-classification](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fllm-lora-classification) | - | - | ⭐ 98 | 🔴 2023年7月|\n| 🔗 [rinna_gpt-neox_ggml-lora](https:\u002F\u002Fgithub.com\u002Fyukaryavka\u002Frinna_gpt-neox_ggml-lora) | - | - | ⭐ 19 | 🔴 2023年5月|\n| 🔗 [japanese-llm-roleplay-benchmark](https:\u002F\u002Fgithub.com\u002Foshizo\u002Fjapanese-llm-roleplay-benchmark) | - | - | ⭐ 40 | 🔴 2023年11月|\n| 🔗 [japanese-llm-ranking](https:\u002F\u002Fgithub.com\u002Fyuzu-ai\u002Fjapanese-llm-ranking) | - | - | ⭐ 50 | 🔴 2024年3月|\n| 🔗 [llm-jp-eval](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-eval) | - | - | ⭐ 150 | 🟢 上周一|\n| 🔗 [llm-jp-sft](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-sft) | - | - | ⭐ 62 | 🔴 2024年6月|\n| 🔗 [llm-jp-tokenizer](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-tokenizer) | - | - | ⭐ 46 | 🟢 上周一|\n| 🔗 [japanese-lm-fin-harness](https:\u002F\u002Fgithub.com\u002Fpfnet-research\u002Fjapanese-lm-fin-harness) | - | - | ⭐ 77 | 🟢 1月|\n| 🔗 [ja-vicuna-qa-benchmark](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fja-vicuna-qa-benchmark) | - | - | ⭐ 33 | 🔴 2024年6月|\n| 🔗 [swallow-evaluation](https:\u002F\u002Fgithub.com\u002Fswallow-llm\u002Fswallow-evaluation) | - | - | ⭐ 24 | 🟡 2025年9月|\n| 🔗 [swallow-evaluation-instruct](https:\u002F\u002Fgithub.com\u002Fswallow-llm\u002Fswallow-evaluation-instruct) | - | - | ⭐ 27 | 🟡 2025年10月|\n| 🔗 
[pretrained_doc2vec_ja](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fpretrained_doc2vec_ja) | - | - | ⭐ 25 | 🔴 2019年1月|\n| 🔗 [pl-bert-ja](https:\u002F\u002Fgithub.com\u002Fkyamauchi1023\u002Fpl-bert-ja) | - | - | ⭐ 24 | 🔴 2023年12月|\n\n\n\n\n### 其他\n支持日语处理的通用工具\n\n* [namedivider-python](https:\u002F\u002Fgithub.com\u002Frskmoi\u002Fnamedivider-python) - 一个用于将日本全名拆分为姓氏和名字的工具。\n* [asa-python](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fasa-python) - 一个精选的、专注于日语自然语言处理Python库的资源列表。\n* [python_asa](https:\u002F\u002Fgithub.com\u002FTakeuchi-Lab-LM\u002Fpython_asa) - 日语语义角色标注系统（ASA）的Python实现。\n* [toiro](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Ftoiro) - 一个比较日本分词器的工具。\n* [ja-timex](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fja-timex) - 基于规则的解析器，用于从自然语言文本中提取并规范化时间信息表达。\n* [JapaneseTokenizers](https:\u002F\u002Fgithub.com\u002FKensuke-Mitsuzawa\u002FJapaneseTokenizers) - 一套用于从文本数据中进行特征选择的指标集合。\n* [daaja](https:\u002F\u002Fgithub.com\u002Fkajyuuen\u002Fdaaja) - 此仓库包含针对日语NLP的数据增强实现。\n* [accel-brain-code](https:\u002F\u002Fgithub.com\u002Faccel-brain\u002Faccel-brain-code) - 该仓库旨在为我在个人网站上撰写的概念验证（PoC）及研发（R&D）案例制作原型。主要研究方向包括与表示学习相关的自编码器、基于能量模型的统计机器学习、对抗生成网络等。\n* [kyoto-reader](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fkyoto-reader) - 京都语料库、KWDLC和带标注的FKCC语料库的处理器。\n* [nlplot](https:\u002F\u002Fgithub.com\u002Ftakapy0210\u002Fnlplot) - 自然语言处理可视化模块。\n* [rake-ja](https:\u002F\u002Fgithub.com\u002Fkanjirz50\u002Frake-ja) - 面向日语的快速自动关键词提取算法。\n* [jel](https:\u002F\u002Fgithub.com\u002Fizuna385\u002Fjel) - 日语文本实体链接工具。\n* [MedNER-J](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FMedNER-J) - MedEX\u002FJ（日语疾病名称抽取器）的最新版本。\n* [zunda-python](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fzunda-python) - Zunda：面向Python的日语增强模态分析客户端。\n* [AIO2_DPR_baseline](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FAIO2_DPR_baseline) - https:\u002F\u002Fwww.nlp.ecei.tohoku.ac.jp\u002Fprojects\u002Faio\u002F\n* 
[showcase](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fshowcase) - Matsubayashi & Inui (2018)论文中提出的日语谓词-论元结构（PAS）分析器的PyTorch实现，并做了一些改进。\n* [darts-clone-python](https:\u002F\u002Fgithub.com\u002Frixwew\u002Fdarts-clone-python) - Darts克隆的Python绑定。\n* [jrte-corpus_example](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fjrte-corpus_example) - 日语真实文本蕴含语料库的示例代码。\n* [desuwa](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fdesuwa) - 基于KNP规则文件对词素和短语进行特征标注的工具（纯Python实现）。\n* [HotPepperGourmetDialogue](https:\u002F\u002Fgithub.com\u002FHironsan\u002FHotPepperGourmetDialogue) - 通过日语对话进行餐厅搜索系统。\n* [nlp-recipes-ja](https:\u002F\u002Fgithub.com\u002Fupura\u002Fnlp-recipes-ja) - 日语自然语言处理的示例代码。\n* [Japanese_nlp_scripts](https:\u002F\u002Fgithub.com\u002Folsgaard\u002FJapanese_nlp_scripts) - 一些用于在Python中处理日语文本的小型示例脚本。\n* [DNorm-J](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FDNorm-J) - DNorm的日语版本。\n* [pyknp-eventgraph](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fpyknp-eventgraph) - EventGraph是用于开发高级日语NLP应用的平台。\n* [ishi](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fishi) - Ishi：日语意志分类器。\n* [python-npylm](https:\u002F\u002Fgithub.com\u002Fmusyoku\u002Fpython-npylm) - 基于贝叶斯层次语言模型的无监督形态分析。\n* [python-npycrf](https:\u002F\u002Fgithub.com\u002Fmusyoku\u002Fpython-npycrf) - 结合条件随机场与贝叶斯层次语言模型的半监督形态分析。\n* [unsupervised-pos-tagging](https:\u002F\u002Fgithub.com\u002Fmusyoku\u002Funsupervised-pos-tagging) - 无监督词性标注。\n* [negima](https:\u002F\u002Fgithub.com\u002Fcocodrips\u002Fnegima) - Negima是一个Python包，可根据用户定义的词性规则从日语文本中提取短语。\n* [YouyakuMan](https:\u002F\u002Fgithub.com\u002Fneilctwu\u002FYouyakuMan) - 使用BertSum作为摘要模型的抽取式摘要器。\n* [japanese-numbers-python](https:\u002F\u002Fgithub.com\u002Ftakumakanari\u002Fjapanese-numbers-python) - 用于解析自然语言中的日语数字（汉字、阿拉伯数字）的解析器。\n* [kantan](https:\u002F\u002Fgithub.com\u002Fitayperl\u002Fkantan) - 通过部首模式查找日语单词。\n* [make-meidai-dialogue](https:\u002F\u002Fgithub.com\u002Fknok\u002Fmake-meidai-dialogue) - 获取日语对话语料库。\n* 
[japanese_summarizer](https:\u002F\u002Fgithub.com\u002Fryuryukke\u002Fjapanese_summarizer) - 日语文章摘要器。\n* [chirptext](https:\u002F\u002Fgithub.com\u002Fletuananh\u002Fchirptext) - ChirpText是一系列用于Python的文本处理工具。\n* [yubin](https:\u002F\u002Fgithub.com\u002Falvations\u002Fyubin) - 日本地址清洗工具。\n* [jawiki-cleaner](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fjawiki-cleaner) - 日语维基百科清理工具。\n* [japanese2phoneme](https:\u002F\u002Fgithub.com\u002Fiory\u002Fjapanese2phoneme) - 一个将日语转换为音素的Python库。\n* [anlp_nlp2021_d3-1](https:\u002F\u002Fgithub.com\u002Farusl\u002Fanlp_nlp2021_d3-1) - 该仓库包含与“基于情感的文本分类中日语分词器的实验评估”相关实验的代码。\n* [aozora_classification](https:\u002F\u002Fgithub.com\u002Fshibuiwilliam\u002Faozora_classification) - 该项目旨在将日语句子分类为与夏目漱石、森鸥外、芥川龙之介等日本古典作家的相似程度。\n* [aozora-corpus-generator](https:\u002F\u002Fgithub.com\u002Fborh\u002Faozora-corpus-generator) - 从青空文库生成纯文本或分词后的文本文件。\n* [JLM](https:\u002F\u002Fgithub.com\u002Fjiali-ms\u002FJLM) - 适用于日语、汉语等大词汇量语言的快速LSTM语言模型。\n* [NTM](https:\u002F\u002Fgithub.com\u002Fm3yrin\u002FNTM) - 对日语文章进行神经主题建模的测试。\n* [EN-JP-ML-Lexicon](https:\u002F\u002Fgithub.com\u002FMachine-Learning-Tokyo\u002FEN-JP-ML-Lexicon) - 这是一本关于机器学习和深度学习术语的英日词典。\n* [text-generation](https:\u002F\u002Fgithub.com\u002Fdiscus0434\u002Ftext-generation) - 易于使用的脚本，可用于用您自己的文本微调GPT-2-JA，生成句子，并自动推送到Twitter。\n* [chainer_nic](https:\u002F\u002Fgithub.com\u002Fyuyay\u002Fchainer_nic) - 在Chainer上实现的神经图像字幕（NIC），以及其在英语和日语图像字幕数据集上的预训练模型。\n* [unihan-lm](https:\u002F\u002Fgithub.com\u002FJetRunner\u002Funihan-lm) - “UnihanLM：利用Unihan数据库进行粗粒度到细粒度的中日语言模型预训练”，AACL-IJCNLP 2020官方仓库。\n* [mbart-finetuning](https:\u002F\u002Fgithub.com\u002Fken11\u002Fmbart-finetuning) - 用于对mBART模型进行微调的代码。\n* [xvector_jtubespeech](https:\u002F\u002Fgithub.com\u002Fsarulab-speech\u002Fxvector_jtubespeech) - xvector模型在jtubespeech上的应用。\n* [TinySegmenterMaker](https:\u002F\u002Fgithub.com\u002Fshogo82148\u002FTinySegmenterMaker) - 用于自制TinySegmenter训练模型的工具。\n* 
[Grongish](https:\u002F\u002Fgithub.com\u002Fshogo82148\u002FGrongish) - 日语与格隆吉语互转脚本。\n* [WordCloud-Japanese](https:\u002F\u002Fgithub.com\u002Faocattleya\u002FWordCloud-Japanese) - 使用WordCloud实现无需Mecab（形态分析引擎）即可对日语文本进行类似形态分析的显示效果的脚本。\n* [snark](https:\u002F\u002Fgithub.com\u002Fhiraokusky\u002Fsnark) - 基于日语WordNet的数据库访问库。\n* [toEmoji](https:\u002F\u002Fgithub.com\u002Fmkan0141\u002FtoEmoji) - 将日语句子转换为仅由表情符号组成的句子的工具。\n* [termextract](https:\u002F\u002Fgithub.com\u002Fkanjirz50\u002Ftermextract) - 专业术语提取算法实现练习。\n* [JDT-with-KenLM-scoring](https:\u002F\u002Fgithub.com\u002FTUT-SLP-lab\u002FJDT-with-KenLM-scoring) - 对日语对话变换器的回答候选使用KenLM的N-gram语言模型进行打分、过滤或重新排序。\n* [mixture-of-unigram-model](https:\u002F\u002Fgithub.com\u002FKentoW\u002Fmixture-of-unigram-model) - Python中的单字模型混合与无限单字模型混合。（混合单字模型与无限混合单字模型）\n* [hidden-markov-model](https:\u002F\u002Fgithub.com\u002FKentoW\u002Fhidden-markov-model) - Python中的隐马尔可夫模型（HMM）与无限隐马尔可夫模型（iHMM）。（隐藏马尔可夫模型与无限隐藏马尔可夫模型）\n* [Ngram-language-model](https:\u002F\u002Fgithub.com\u002FKentoW\u002FNgram-language-model) - Python中的N-gram语言模型。（N-gram语言模型）\n* [ASRDeepSpeech](https:\u002F\u002Fgithub.com\u002FJeanMaximilienCadic\u002FASRDeepSpeech) - 使用Zakuro AI支持的PyTorch框架下的deepspeech2模型进行自动语音识别。\n* [neural_ime](https:\u002F\u002Fgithub.com\u002Fyohokuno\u002Fneural_ime) - 神经输入法引擎。\n* [neural_japanese_transliterator](https:\u002F\u002Fgithub.com\u002FKyubyong\u002Fneural_japanese_transliterator) - 神经网络能否正确地将罗马字转写成日语？\n* [tinysegmenter](https:\u002F\u002Fgithub.com\u002FSamuraiT\u002Ftinysegmenter) - 专为日语设计的分词器。\n* [AugLy-jp](https:\u002F\u002Fgithub.com\u002Fchck\u002FAugLy-jp) - AugLy上的日语文本数据增强。\n* [furigana4epub](https:\u002F\u002Fgithub.com\u002FMumumu4\u002Ffurigana4epub) - 使用Mecab和Unidic为日语ePub书籍添加假名的Python脚本。\n* [PyKatsuyou](https:\u002F\u002Fgithub.com\u002FSmashinFries\u002FPyKatsuyou) - 日语动词\u002F形容词变位工具。\n* [jageocoder](https:\u002F\u002Fgithub.com\u002Ft-sagara\u002Fjageocoder) - 纯Python实现的日语地址地理编码器。\n* 
[pygeonlp](https:\u002F\u002Fgithub.com\u002Fgeonlp-platform\u002Fpygeonlp) - pygeonlp，一个用于给日语文本添加地理标签的Python模块。\n* [nksnd](https:\u002F\u002Fgithub.com\u002Fyoriyuki\u002Fnksnd) - 新的假名-汉字转换引擎。\n* [JaMIE](https:\u002F\u002Fgithub.com\u002Fracerandom\u002FJaMIE) - 日本医学信息抽取工具包。\n* [fasttext-vs-word2vec-on-twitter-data](https:\u002F\u002Fgithub.com\u002FGINK03\u002Ffasttext-vs-word2vec-on-twitter-data) - FastText与Word2Vec的比较、执行脚本及训练脚本。\n* [minimal-search-engine](https:\u002F\u002Fgithub.com\u002FGINK03\u002Fminimal-search-engine) - 最小的搜索引擎\u002FPageRank\u002Ftf-idf。\n* [5ch-analysis](https:\u002F\u002Fgithub.com\u002FGINK03\u002F5ch-analysis) - 抓取5ch的历史日志，追踪过去流行的词汇（如“香具师”、“orz”等）。\n* [tweet_extructor](https:\u002F\u002Fgithub.com\u002FtatHi\u002Ftweet_extructor) - 用于日本推特舆情分析数据集的推文下载工具。\n* [japanese-word-aggregation](https:\u002F\u002Fgithub.com\u002Fhkiyomaru\u002Fjapanese-word-aggregation) - 基于Juman++和ConceptNet5.5聚合日语词汇。\n* [jinf](https:\u002F\u002Fgithub.com\u002Fhkiyomaru\u002Fjinf) - 日语屈折形式转换器。\n* [kwja](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fkwja) - 日语统一语言分析器。\n* [mlm-scoring-transformers](https:\u002F\u002Fgithub.com\u002FRyutaro-A\u002Fmlm-scoring-transformers) - 基于掩码语言模型评分（ACL2020）的复现包。\n* [ClipCap-for-Japanese](https:\u002F\u002Fgithub.com\u002FJapanese-Image-Captioning\u002FClipCap-for-Japanese) - [PyTorch] 日语版ClipCap。\n* [SAT-for-Japanese](https:\u002F\u002Fgithub.com\u002FJapanese-Image-Captioning\u002FSAT-for-Japanese) - [PyTorch] 日语版“展示、关注与讲述”。\n* [cihai](https:\u002F\u002Fgithub.com\u002Fcihai\u002Fcihai) - 用于CJK（中文、日语、韩语）语言词典的Python库。\n* [marine](https:\u002F\u002Fgithub.com\u002F6gsn\u002Fmarine) - MARINE：基于多任务学习的日语重音估计。\n* [whisper-asr-finetune](https:\u002F\u002Fgithub.com\u002Fsarulab-speech\u002Fwhisper-asr-finetune) - 微调Whisper ASR模型。\n* [japanese_chatbot](https:\u002F\u002Fgithub.com\u002FCjangCjengh\u002Fjapanese_chatbot) - 使用BERT和Transformer解码器实现的日语聊天机器人。\n* [radicalchar](https:\u002F\u002Fgithub.com\u002Fyamamaya\u002Fradicalchar) - 
部首字符标准化库。\n* [akaza](https:\u002F\u002Fgithub.com\u002Ftokuhirom\u002Fakaza) - 又一款用于IBus\u002FLinux的日语输入法。\n* [posuto](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fposuto) - 日本邮政编码数据。\n* [tacotron2-japanese](https:\u002F\u002Fgithub.com\u002FCjangCjengh\u002Ftacotron2-japanese) - Tacotron2的日语实现。\n* [ibus-hiragana](https:\u002F\u002Fgithub.com\u002Fesrille\u002Fibus-hiragana) - IBus的平假名输入法。\n* [furiganapad](https:\u002F\u002Fgithub.com\u002Fesrille\u002Ffuriganapad) - 注音板。\n* [chikkarpy](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002Fchikkarpy) - 日语同义词库。\n* [ja-tokenizer-docker-py](https:\u002F\u002Fgithub.com\u002Fp-geon\u002Fja-tokenizer-docker-py) - Mecab + NEologd + Docker + Python3。\n* [JapaneseEmbeddingEval](https:\u002F\u002Fgithub.com\u002Foshizo\u002FJapaneseEmbeddingEval) - 日语嵌入评估。\n* [gptuber-by-langchain](https:\u002F\u002Fgithub.com\u002Fkarakuri-ai\u002Fgptuber-by-langchain) - GPT将担任YouTuber。\n* [shuwa](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fshuwa) - 扩展GNOME屏幕键盘以支持输入法。\n* [japanese-nli-model](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002Fjapanese-nli-model) - 该仓库提供了日语NLI模型的代码，这是一个经过微调的掩码语言模型。\n* [tra-fugu](https:\u002F\u002Fgithub.com\u002Ftos-kamiya\u002Ftra-fugu) - 使用FuguMT进行日英、英日翻译的工具。\n* [fugumt](https:\u002F\u002Fgithub.com\u002Fs-taka\u002Ffugumt) - 这是一个利用“ぷるーふおぶこんせぷと”公开的机器翻译引擎的翻译环境。可以翻译表单中输入的字符串以及PDF文件。\n* [JaSPICE](https:\u002F\u002Fgithub.com\u002Fkeio-smilab23\u002FJaSPICE) - JaSPICE：基于谓词-论元结构的图像字幕模型自动评价指标。\n* [Retrieval-based-Voice-Conversion-WebUI-JP-localization](https:\u002F\u002Fgithub.com\u002Fyantaisa11\u002FRetrieval-based-Voice-Conversion-WebUI-JP-localization) - 日语本地化。\n* [pyopenjtalk](https:\u002F\u002Fgithub.com\u002Fr9y9\u002Fpyopenjtalk) - OpenJTalk的Python封装。\n* [yomigana-ebook](https:\u002F\u002Fgithub.com\u002Frabbit19981023\u002Fyomigana-ebook) - 通过为电子书中每个汉字添加注音，使学习日语更加容易。\n* [N46Whisper](https:\u002F\u002Fgithub.com\u002FAyanaminn\u002FN46Whisper) - 基于Whisper的日语字幕生成器。\n* 
[japanese_llm_simple_webui](https:\u002F\u002Fgithub.com\u002Fnoir55\u002Fjapanese_llm_simple_webui) - 这是Rinna-3.6B、OpenCALM等日语LLM（大规模语言模型）的简易Web界面。\n* [pdf-translator](https:\u002F\u002Fgithub.com\u002Fdiscus0434\u002Fpdf-translator) - pdf-translator可以将英文PDF文件翻译成日语，同时保留原始布局。\n* [japanese_qa_demo_with_haystack_and_es](https:\u002F\u002Fgithub.com\u002FShingo-Kamata\u002Fjapanese_qa_demo_with_haystack_and_es) - 使用Haystack + Elasticsearch + 维基百科（日语）的日本问答系统示例。\n* [mozc-devices](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmozc-devices) - 自动从code.google.com\u002Fp\u002Fmozc-morse导出。\n* [natsume](https:\u002F\u002Fgithub.com\u002Ffaruzan0820\u002Fnatsume) - 一个日语文本前端处理工具包。\n* [vits-japros-webui](https:\u002F\u002Fgithub.com\u002Flitagin02\u002Fvits-japros-webui) - 日语TTS（VITS）的学习与语音合成的Gradio WebUI。\n* [ja-law-parser](https:\u002F\u002Fgithub.com\u002Ftakuyaa\u002Fja-law-parser) - 一个日语法律解析器。\n* [dictation-kit](https:\u002F\u002Fgithub.com\u002Fjulius-speech\u002Fdictation-kit) - 使用Julius的日语听写工具。\n* [julius4seg](https:\u002F\u002Fgithub.com\u002FHiroshiba\u002Fjulius4seg) - 使用Julius辅助分段的工具。\n* [voicevox_engine](https:\u002F\u002Fgithub.com\u002FVOICEVOX\u002Fvoicevox_engine) - 免费可用的中等质量文本朗读软件VOICEVOX的语音合成引擎。\n* [LLaVA-JP](https:\u002F\u002Fgithub.com\u002Ftosiyuki\u002FLLaVA-JP) - LLaVA-JP是使用LLaVA方法训练的日语视觉语言模型。\n* [RAG-Japanese](https:\u002F\u002Fgithub.com\u002FAkimParis\u002FRAG-Japanese) - 在低资源环境下为日语LLM提供的开源RAG，使用Llama Index。\n* [bertjsc](https:\u002F\u002Fgithub.com\u002Fer-ri\u002Fbertjsc) - 使用BERT（掩码语言模型）的日语拼写错误纠正器。基于BERT进行日语校正。\n* [llm-leaderboard](https:\u002F\u002Fgithub.com\u002Fwandb\u002Fllm-leaderboard) - 针对日语任务的LLM评估项目。\n* [jglue-evaluation-scripts](https:\u002F\u002Fgithub.com\u002Fnobu-g\u002Fjglue-evaluation-scripts) - JGLUE日语语言理解基准的训练与评估脚本。\n* [BLIP2-Japanese](https:\u002F\u002Fgithub.com\u002FZhaoPeiduo\u002FBLIP2-Japanese) - 使用预先在日语数据集上训练好的模型修改LAVIS的BLIP2 Q-former。\n* 
[wikipedia-passages-jawiki-embeddings-utils](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Fwikipedia-passages-jawiki-embeddings-utils) - Scripts and utilities for converting Japanese Wikipedia articles into various Japanese embeddings and faiss indexes.\n* [simple-simcse-ja](https:\u002F\u002Fgithub.com\u002Fhpprc\u002Fsimple-simcse-ja) - Exploring Japanese SimCSE.\n* [wikipedia-japanese-open-rag](https:\u002F\u002Fgithub.com\u002Flawofcycles\u002Fwikipedia-japanese-open-rag) - A Gradio-based RAG sample that answers user questions based on Japanese Wikipedia articles.\n* [gpt4-autoeval](https:\u002F\u002Fgithub.com\u002Fnorthern-system-service\u002Fgpt4-autoeval) - A script for automatically evaluating language model responses with GPT-4.\n* [t5-japanese](https:\u002F\u002Fgithub.com\u002Fsonoisa\u002Ft5-japanese) - Japanese T5 model.\n* [japanese_llm_eval](https:\u002F\u002Fgithub.com\u002Flightblue-tech\u002Fjapanese_llm_eval) - A repository for evaluating Japanese LLMs.\n* [jmteb](https:\u002F\u002Fgithub.com\u002Fsbintuitions\u002Fjmteb) - Evaluation scripts for JMTEB (Japanese Massive Text Embedding Benchmark).\n* [pydomino](https:\u002F\u002Fgithub.com\u002Fdwangomediavillage\u002Fpydomino) - A tool for aligning phoneme labels to Japanese speech.\n* [easynovelassistant](https:\u002F\u002Fgithub.com\u002Fzuntan03\u002Feasynovelassistant) - A simple novel-generation assistant built on LightChatAssistant-TypeB, a lightweight, unrestricted and uncensored local Japanese LLM. It supports endless generation, so you can keep accumulating lucky outputs, and it also supports read-aloud.\n* [clip-japanese](https:\u002F\u002Fgithub.com\u002Fsonoisa\u002Fclip-japanese) - Sample code for QLoRA instruction tuning on Japanese datasets.\n* [rime-jaroomaji](https:\u002F\u002Fgithub.com\u002Flazyfoxchan\u002Frime-jaroomaji) - Japanese romaji input schema for Rime IME.\n* [deep-question-generation](https:\u002F\u002Fgithub.com\u002Fsonoisa\u002Fdeep-question-generation) - Automatic quiz question generation using deep learning (Japanese T5 model).\n* [magpie-nemotron](https:\u002F\u002Fgithub.com\u002Faratako\u002Fmagpie-nemotron) - Code for generating a synthetic dialogue dataset using the Magpie method and Nemotron-4-340B-Instruct.\n* [qlora_ja](https:\u002F\u002Fgithub.com\u002Fsosuke115\u002Fqlora_ja) - Sample code for QLoRA instruction tuning on Japanese datasets.\n* [mozcdic-ut-jawiki](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-jawiki) - Mozc UT Jawiki Dictionary is a dictionary for Mozc generated from the Japanese Wikipedia.\n* [shisa-v2](https:\u002F\u002Fgithub.com\u002Fshisa-ai\u002Fshisa-v2) - Japanese-English bilingual LLM.\n* [llm-translator](https:\u002F\u002Fgithub.com\u002Fhpprc\u002Fllm-translator) - 
Mixtral-based Japanese-English (English-Japanese) translation model.\n* [llm-jp-asr](https:\u002F\u002Fgithub.com\u002Ftosiyuki\u002Fllm-jp-asr) - Code for training a speech recognition model in which the Whisper decoder is replaced with llm-jp-1.3b-v1.0.\n* [rag-japanese](https:\u002F\u002Fgithub.com\u002Fakimfromparis\u002Frag-japanese) - Open-source RAG for Japanese LLMs in low-resource settings, using Llama Index.\n* [monaka](https:\u002F\u002Fgithub.com\u002Fkomiya-lab\u002Fmonaka) - A Japanese parser (including historical Japanese).\n* [jp-translate.cloud](https:\u002F\u002Fgithub.com\u002Fmatthewbieda\u002Fjp-translate.cloud) - An advanced open-source Japanese-English bidirectional machine translation system based on the latest NMT research.\n* [substring-word-finder](https:\u002F\u002Fgithub.com\u002Ftoufu-24\u002Fsubstring-word-finder) - Determines whether contiguous substrings are words.\n* [heron-vlm-leaderboard](https:\u002F\u002Fgithub.com\u002Fwandb\u002Fheron-vlm-leaderboard) - A benchmarking tool for evaluating and comparing the performance of vision-language models (VLMs). It measures model performance on two datasets: LLaVA-Bench-In-the-Wild and the Japanese HERON Bench.\n* [text2dataset](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Ftext2dataset) - Easily turn large English text corpora into Japanese corpora using open LLMs.\n* [mecab-web-api](https:\u002F\u002Fgithub.com\u002Fbungoume\u002Fmecab-web-api) - Web API for Japanese morphological analysis based on Mecab.\n* [mecab_controller](https:\u002F\u002Fgithub.com\u002Fajatt-tools\u002Fmecab_controller) - Mecab wrapper for generating kana readings.\n* [vits](https:\u002F\u002Fgithub.com\u002Fzassou65535\u002Fvits) - VITS text-to-speech reader and voice changer.\n* [akari_chatgpt_bot](https:\u002F\u002Fgithub.com\u002Fakarigroup\u002Fakari_chatgpt_bot) - A chatbot application that converses using speech recognition, text generation and speech synthesis.\n* [kudasai](https:\u002F\u002Fgithub.com\u002Fbikatr7\u002Fkudasai) - Streamlines Japanese-English translation with advanced preprocessing and integrated translation technologies.\n* [mecab-visualizer](https:\u002F\u002Fgithub.com\u002Fsophiefy\u002Fmecab-visualizer) - A tool for visualizing Mecab morphological analysis results.\n* [add-dictionary](https:\u002F\u002Fgithub.com\u002Fmassao000\u002Fadd-dictionary) - An application for adding entries to the OpenJTalk user dictionary through a GUI.\n* [j-moshi](https:\u002F\u002Fgithub.com\u002Fnu-dialogue\u002Fj-moshi) - J-Moshi: a Japanese full-duplex spoken dialogue system.\n* [jatts](https:\u002F\u002Fgithub.com\u002Funilight\u002Fjatts) - JATTS: Japanese TTS (for research).\n* [tsukasa-speech](https:\u002F\u002Fgithub.com\u002Frespaired\u002Ftsukasa-speech) - A cutting-edge Japanese speech generation network.\n* 
[symptom-expression-search](https:\u002F\u002Fgithub.com\u002Fpo3rin\u002Fsymptom-expression-search) - An attempt at semantic-structure search that absorbs variation in patient expressions, using Elasticsearch, GiNZA and a patient expression dictionary.\n* [llm-jp-judge](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-judge) - A Python tool for automatically evaluating generated content.\n* [asagi-vlm-colaboratory-sample](https:\u002F\u002Fgithub.com\u002Fkazuhito00\u002Fasagi-vlm-colaboratory-sample) - A sample for trying Asagi (a large-scale Japanese VLM built with synthetic datasets) on Colaboratory.\n* [llm-jp-eval-mm](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-eval-mm) - A tool that automatically evaluates Japanese multimodal large language models across multiple datasets.\n* [manga109api](https:\u002F\u002Fgithub.com\u002Fmanga109\u002Fmanga109api) - A simple Python API for reading Manga109 annotation data.\n* [fastrtc-jp](https:\u002F\u002Fgithub.com\u002Froute250\u002Ffastrtc-jp) - Japanese TTS and STT add-on kit for fastrtc.\n* [whisper-transcription](https:\u002F\u002Fgithub.com\u002Ffumifumi0831\u002Fwhisper-transcription) - A speech-to-text tool using the Whisper model in Python.\n* [pocket-researcher](https:\u002F\u002Fgithub.com\u002Fu-masao\u002Fpocket-researcher) - An autonomous research agent powered by LLMs, for conveniently gathering information and grasping the gist.\n* [jtransbench](https:\u002F\u002Fgithub.com\u002Fwebbigdata-jp\u002Fjtransbench) - A tool for easily evaluating Japanese translation skills.\n* [easyllasa](https:\u002F\u002Fgithub.com\u002Fzuntan03\u002Feasyllasa) - EasyLlasa is a TSTS (TextSpeechToSpeech) system that generates Japanese speech from 5 to 15 seconds of Japanese audio and Japanese text.\n* [kanjikana-model](https:\u002F\u002Fgithub.com\u002Fdigital-go-jp\u002Fkanjikana-model) - A model for matching the kanji and kana forms of personal names.\n* [deep-openreview-research-ja](https:\u002F\u002Fgithub.com\u002Ftb-yasu\u002Fdeep-openreview-research-ja) - A Japanese-capable AI agent that automatically discovers and analyzes OpenReview papers.\n* [pitchbench](https:\u002F\u002Fgithub.com\u002Fshewiiii\u002Fpitchbench) - Experimental LLM benchmark based on Japanese pitch accent.\n* [mini-transformer-from-scratch](https:\u002F\u002Fgithub.com\u002Fzuofanf\u002Fmini-transformer-from-scratch) - Building an English-Japanese Transformer from scratch.\n* [vv_core_inference](https:\u002F\u002Fgithub.com\u002Fhiroshiba\u002Fvv_core_inference) - Inference code for the deep learning models used in the VOICEVOX core.\n* 
[pyopenjtalk-plus](https:\u002F\u002Fgithub.com\u002Ftsukumijima\u002Fpyopenjtalk-plus) - pyopenjtalk-plus: a Python wrapper for OpenJTalk with additional improvements.\n* [japanese_spelling_correction](https:\u002F\u002Fgithub.com\u002Fphkhanhtrinh23\u002Fjapanese_spelling_correction) - Japanese spelling correction.\n* [py-kaomoji](https:\u002F\u002Fgithub.com\u002Fshibuiwilliam\u002Fpy-kaomoji) - Kaomoji for Python.\n* [llm-jp-vila](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-vila) - This repository contains the code for training llm-jp\u002Fllm-jp-3-vila-14b, modified from the VILA repository.\n* [kanjivg-radical](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fkanjivg-radical) - kanjivg-radical.\n* [japanese-wordnet-visualization](https:\u002F\u002Fgithub.com\u002FHemingwayLee\u002Fjapanese-wordnet-visualization) - A Django-based web application that visualizes the Japanese WordNet (日本語ワードネット).\n* [piper-plus](https:\u002F\u002Fgithub.com\u002Fayutaz\u002Fpiper-plus) - Enhanced Piper TTS with Japanese support, WebAssembly, multi-GPU training and quality improvements.\n* [Japanera](https:\u002F\u002Fgithub.com\u002Fnagataaaas\u002FJapanera) - A tool for easily working with the Japanese era system.\n* [bert-abstractive-text-summarization](https:\u002F\u002Fgithub.com\u002Fiwasakiyuuki\u002Fbert-abstractive-text-summarization) - Japanese sentence summarization with BERT.\n* [kyujipy](https:\u002F\u002Fgithub.com\u002Fdrturnon\u002Fkyujipy) - A Python library for converting Japanese text between shinjitai (new character forms) and kyujitai (old character forms).\n* [jitenbot](https:\u002F\u002Fgithub.com\u002Fkonstantindjairo\u002Fjitenbot) - A web scraper for creating personal copies of Japanese dictionaries.\n* [ja-icd10](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fja-icd10) - A Python package for handling Japanese ICD-10 (International Classification of Diseases) information.\n* [pl-bert-vits2](https:\u002F\u002Fgithub.com\u002Ftonnetonne814\u002Fpl-bert-vits2) - VITS2 with a phoneme-level Japanese BERT.\n* [ndc_predictor](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndc_predictor) - Machine learning model for the NDC Predictor (a trained fastText model that predicts Nippon Decimal Classification from bibliographic information).\n* [pfmt-bench-fin-ja](https:\u002F\u002Fgithub.com\u002Fpfnet-research\u002Fpfmt-bench-fin-ja) - pfmt-bench-fin-ja: Preferred Multi-turn Benchmark for Finance in Japanese.\n* [marine-plus](https:\u002F\u002Fgithub.com\u002Ftsukumijima\u002Fmarine-plus) - MARINE: multi-task learning-based Japanese accent estimation (also supports Windows).\n* 
[ja-tokenizer-benchmark](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fja-tokenizer-benchmark) - Compares the speed of different Japanese tokenizers in Python.\n* [yat](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fyat) - yat: yet another tokenizer for Japanese NLP.\n* [igakuqa119](https:\u002F\u002Fgithub.com\u002Fdocto-rin\u002Figakuqa119) - Evaluating LLMs on the 119th Japanese National Medical Licensing Examination.\n* [japanese-luw-tokenizer](https:\u002F\u002Fgithub.com\u002Fkoichiyasuoka\u002Fjapanese-luw-tokenizer) - Japanese long-unit-word tokenizer with RemBertTokenizerFast of Transformers.\n* [ibus-jig](https:\u002F\u002Fgithub.com\u002Fy-koj\u002Fibus-jig) - ibus-jig: a Japanese input method using GPT-4.\n* [jp-stopword-filter](https:\u002F\u002Fgithub.com\u002FBrambleXu\u002Fjp-stopword-filter) - A lightweight Python library for filtering stopwords out of Japanese text according to customizable rules.\n* [yasumail](https:\u002F\u002Fgithub.com\u002Fterallite\u002Fyasumail) - Synthetic Japanese business email generator for ML training data.\n* [himotoki](https:\u002F\u002Fgithub.com\u002Fmsr2903\u002Fhimotoki) - A Python-based Japanese tokenizer, dictionary, morphological analyzer and romanization tool. Built on JMDict for language learning.\n* [diafill-toolkit](https:\u002F\u002Fgithub.com\u002Fsbintuitions\u002Fdiafill-toolkit) - A toolkit for synthesizing Japanese dialogue scripts rich in fillers and short utterances, intended for voice interaction based on large language models (LLMs). The project generates data in two stages: seed generation (metadata creation) and dialogue generation (script writing).\n* [eval_vertical_ja](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Feval_vertical_ja) - Evaluating multimodal large language models on vertically written Japanese text.\n* [jp-llm-corpus-pii-filter](https:\u002F\u002Fgithub.com\u002Fmatsuolab\u002Fjp-llm-corpus-pii-filter) - Code for filtering LLM training corpora for the subset of personal information that requires particular care (“special care-required personal information”).\n* [Novel2DialCorpus](https:\u002F\u002Fgithub.com\u002Fganbon\u002FNovel2DialCorpus) - A method for building a chit-chat dialogue corpus from novel text.\n\n|Name|downloads\u002Fweek|total downloads|stars|last commit|\n-|-|-|-|-\n| 🔗 [namedivider-python](https:\u002F\u002Fgithub.com\u002Frskmoi\u002Fnamedivider-python) | 📥 730 | 📦 82k | ⭐ 251 | 🟡 november 2025|\n| 🔗 [asa-python](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fasa-python) | 📥 36 | 📦 31k | ⭐ 11 | 🔴 february 2019|\n| 🔗 [python_asa](https:\u002F\u002Fgithub.com\u002FTakeuchi-Lab-LM\u002Fpython_asa) | - | - | ⭐ 22 
| 🔴 january 2020|\n| 🔗 [toiro](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Ftoiro) | 📥 13 | 📦 27k | ⭐ 121 | 🟡 november 2025|\n| 🔗 [ja-timex](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fja-timex) | 📥 551 | 📦 93k | ⭐ 140 | 🔴 november 2023|\n| 🔗 [JapaneseTokenizers](https:\u002F\u002Fgithub.com\u002FKensuke-Mitsuzawa\u002FJapaneseTokenizers) | - | - | ⭐ 137 | 🔴 march 2019|\n| 🔗 [daaja](https:\u002F\u002Fgithub.com\u002Fkajyuuen\u002Fdaaja) | 📥 66 | 📦 25k | ⭐ 64 | 🔴 february 2023|\n| 🔗 [accel-brain-code](https:\u002F\u002Fgithub.com\u002Faccel-brain\u002Faccel-brain-code) | 📥 251 | 📦 150k | ⭐ 323 | 🔴 december 2023|\n| 🔗 [JGLUE](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002FJGLUE) | - | - | ⭐ 338 | 🔴 march 2025|\n| 🔗 [kyoto-reader](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fkyoto-reader) | 📥 64 | 📦 52k | ⭐ 10 | 🔴 june 2024|\n| 🔗 [nlplot](https:\u002F\u002Fgithub.com\u002Ftakapy0210\u002Fnlplot) | 📥 212 | 📦 109k | ⭐ 238 | 🔴 september 2022|\n| 🔗 [rake-ja](https:\u002F\u002Fgithub.com\u002Fkanjirz50\u002Frake-ja) | - | - | ⭐ 21 | 🔴 october 2018|\n| 🔗 [jel](https:\u002F\u002Fgithub.com\u002Fizuna385\u002Fjel) | 📥 13 | 📦 8k | ⭐ 11 | 🔴 july 2021|\n| 🔗 [MedNER-J](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FMedNER-J) | - | - | ⭐ 18 | 🔴 may 2022|\n| 🔗 [zunda-python](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fzunda-python) | 📥 10 | 📦 6k | ⭐ 10 | 🔴 november 2019|\n| 🔗 [AIO2_DPR_baseline](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FAIO2_DPR_baseline) | - | - | ⭐ 16 | 🔴 january 2022|\n| 🔗 [showcase](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fshowcase) | 📥 4 | 📦 7k | ⭐ 6 | 🔴 june 2018|\n| 🔗 [darts-clone-python](https:\u002F\u002Fgithub.com\u002Frixwew\u002Fdarts-clone-python) | 📥 3k | 📦 9M | ⭐ 20 | 🔴 april 2022|\n| 🔗 [jrte-corpus_example](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fjrte-corpus_example) | - | - | ⭐ 3 | 🔴 november 2021|\n| 🔗 [desuwa](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fdesuwa) | 📥 18 | 📦 10k | ⭐ 6 | 🔴 may 
2022|\n| 🔗 [HotPepperGourmetDialogue](https:\u002F\u002Fgithub.com\u002FHironsan\u002FHotPepperGourmetDialogue) | - | - | ⭐ 277 | 🔴 may 2016|\n| 🔗 [nlp-recipes-ja](https:\u002F\u002Fgithub.com\u002Fupura\u002Fnlp-recipes-ja) | - | - | ⭐ 66 | 🔴 april 2021|\n| 🔗 [Japanese_nlp_scripts](https:\u002F\u002Fgithub.com\u002Folsgaard\u002FJapanese_nlp_scripts) | - | - | ⭐ 26 | 🔴 june 2019|\n| 🔗 [DNorm-J](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FDNorm-J) | - | - | ⭐ 9 | 🔴 june 2022|\n| 🔗 [pyknp-eventgraph](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fpyknp-eventgraph) | 📥 86 | 📦 66k | ⭐ 9 | 🔴 september 2022|\n| 🔗 [ishi](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fishi) | 📥 6 | 📦 6k | ⭐ 2 | 🔴 may 2020|\n| 🔗 [python-npylm](https:\u002F\u002Fgithub.com\u002Fmusyoku\u002Fpython-npylm) | - | - | ⭐ 34 | 🔴 january 2019|\n| 🔗 [python-npycrf](https:\u002F\u002Fgithub.com\u002Fmusyoku\u002Fpython-npycrf) | - | - | ⭐ 11 | 🔴 march 2018|\n| 🔗 [unsupervised-pos-tagging](https:\u002F\u002Fgithub.com\u002Fmusyoku\u002Funsupervised-pos-tagging) | - | - | ⭐ 16 | 🔴 october 2017|\n| 🔗 [negima](https:\u002F\u002Fgithub.com\u002Fcocodrips\u002Fnegima) | 📥 17 | 📦 16k | ⭐ 14 | 🔴 august 2018|\n| 🔗 [YouyakuMan](https:\u002F\u002Fgithub.com\u002Fneilctwu\u002FYouyakuMan) | - | - | ⭐ 52 | 🔴 september 2020|\n| 🔗 [japanese-numbers-python](https:\u002F\u002Fgithub.com\u002Ftakumakanari\u002Fjapanese-numbers-python) | 📥 1k | 📦 2M | ⭐ 21 | 🔴 april 2020|\n| 🔗 [kantan](https:\u002F\u002Fgithub.com\u002Fitayperl\u002Fkantan) | - | - | ⭐ 8 | 🔴 october 2024|\n| 🔗 [make-meidai-dialogue](https:\u002F\u002Fgithub.com\u002Fknok\u002Fmake-meidai-dialogue) | - | - | ⭐ 40 | 🔴 september 2017|\n| 🔗 [japanese_summarizer](https:\u002F\u002Fgithub.com\u002Fryuryukke\u002Fjapanese_summarizer) | - | - | ⭐ 10 | 🔴 august 2022|\n| 🔗 [chirptext](https:\u002F\u002Fgithub.com\u002Fletuananh\u002Fchirptext) | 📥 6k | 📦 212k | ⭐ 7 | 🔴 october 2022|\n| 🔗 [yubin](https:\u002F\u002Fgithub.com\u002Falvations\u002Fyubin) | 📥 7 | 
📦 3k | ⭐ 3 | 🔴 october 2019|\n| 🔗 [jawiki-cleaner](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fjawiki-cleaner) | 📥 34 | 📦 24k | ⭐ 6 | 🔴 february 2021|\n| 🔗 [japanese2phoneme](https:\u002F\u002Fgithub.com\u002Fiory\u002Fjapanese2phoneme) | 📥 5 | 📦 4k | ⭐ 1 | 🔴 february 2022|\n| 🔗 [anlp_nlp2021_d3-1](https:\u002F\u002Fgithub.com\u002Farusl\u002Fanlp_nlp2021_d3-1) | - | - | ⭐ 1 | 🔴 march 2022|\n| 🔗 [aozora_classification](https:\u002F\u002Fgithub.com\u002Fshibuiwilliam\u002Faozora_classification) | - | - | ⭐ 11 | 🔴 september 2017|\n| 🔗 [aozora-corpus-generator](https:\u002F\u002Fgithub.com\u002Fborh\u002Faozora-corpus-generator) | - | - | ⭐ 8 | 🟡 june 2025|\n| 🔗 [JLM](https:\u002F\u002Fgithub.com\u002Fjiali-ms\u002FJLM) | - | - | ⭐ 111 | 🔴 june 2019|\n| 🔗 [NTM](https:\u002F\u002Fgithub.com\u002Fm3yrin\u002FNTM) | - | - | ⭐ 13 | 🔴 july 2019|\n| 🔗 [EN-JP-ML-Lexicon](https:\u002F\u002Fgithub.com\u002FMachine-Learning-Tokyo\u002FEN-JP-ML-Lexicon) | - | - | ⭐ 40 | 🔴 march 2021|\n| 🔗 [text-generation](https:\u002F\u002Fgithub.com\u002Fdiscus0434\u002Ftext-generation) | - | - | ⭐ invalid | 🟡 august 2025|\n| 🔗 [chainer_nic](https:\u002F\u002Fgithub.com\u002Fyuyay\u002Fchainer_nic) | - | - | ⭐ 17 | 🔴 december 2018|\n| 🔗 [unihan-lm](https:\u002F\u002Fgithub.com\u002FJetRunner\u002Funihan-lm) | - | - | ⭐ 2 | 🔴 november 2020|\n| 🔗 [mbart-finetuning](https:\u002F\u002Fgithub.com\u002Fken11\u002Fmbart-finetuning) | - | - | ⭐ 3 | 🔴 october 2021|\n| 🔗 [xvector_jtubespeech](https:\u002F\u002Fgithub.com\u002Fsarulab-speech\u002Fxvector_jtubespeech) | - | - | ⭐ 47 | 🔴 november 2023|\n| 🔗 [TinySegmenterMaker](https:\u002F\u002Fgithub.com\u002Fshogo82148\u002FTinySegmenterMaker) | - | - | ⭐ 72 | 🔴 september 2022|\n| 🔗 [Grongish](https:\u002F\u002Fgithub.com\u002Fshogo82148\u002FGrongish) | - | - | ⭐ 25 | 🟡 december 2025|\n| 🔗 [WordCloud-Japanese](https:\u002F\u002Fgithub.com\u002Faocattleya\u002FWordCloud-Japanese) | - | - | ⭐ 9 | 🔴 january 2020|\n| 🔗 
[snark](https:\u002F\u002Fgithub.com\u002Fhiraokusky\u002Fsnark) | - | - | ⭐ 11 | 🔴 march 2020|\n| 🔗 [toEmoji](https:\u002F\u002Fgithub.com\u002Fmkan0141\u002FtoEmoji) | - | - | ⭐ 4 | 🔴 april 2018|\n| 🔗 [termextract](https:\u002F\u002Fgithub.com\u002Fkanjirz50\u002Ftermextract) | - | - | ⭐ 18 | 🔴 september 2018|\n| 🔗 [JDT-with-KenLM-scoring](https:\u002F\u002Fgithub.com\u002FTUT-SLP-lab\u002FJDT-with-KenLM-scoring) | - | - | ⭐ 1 | 🔴 july 2022|\n| 🔗 [mixture-of-unigram-model](https:\u002F\u002Fgithub.com\u002FKentoW\u002Fmixture-of-unigram-model) | - | - | ⭐ 6 | 🔴 june 2017|\n| 🔗 [hidden-markov-model](https:\u002F\u002Fgithub.com\u002FKentoW\u002Fhidden-markov-model) | - | - | ⭐ 5 | 🔴 june 2017|\n| 🔗 [Ngram-language-model](https:\u002F\u002Fgithub.com\u002FKentoW\u002FNgram-language-model) | - | - | ⭐ 5 | 🔴 december 2017|\n| 🔗 [ASRDeepSpeech](https:\u002F\u002Fgithub.com\u002FJeanMaximilienCadic\u002FASRDeepSpeech) | - | - | ⭐ 69 | 🔴 september 2022|\n| 🔗 [neural_ime](https:\u002F\u002Fgithub.com\u002Fyohokuno\u002Fneural_ime) | - | - | ⭐ 67 | 🔴 december 2016|\n| 🔗 [neural_japanese_transliterator](https:\u002F\u002Fgithub.com\u002FKyubyong\u002Fneural_japanese_transliterator) | - | - | ⭐ 178 | 🔴 september 2017|\n| 🔗 [tinysegmenter](https:\u002F\u002Fgithub.com\u002FSamuraiT\u002Ftinysegmenter) | 📥 112k | 📦 173k | ⭐ repo not found | 🔴 november 2015|\n| 🔗 [AugLy-jp](https:\u002F\u002Fgithub.com\u002Fchck\u002FAugLy-jp) | 📥 85 | 📦 30k | ⭐ 7 | 🔴 september 2021|\n| 🔗 [furigana4epub](https:\u002F\u002Fgithub.com\u002FMumumu4\u002Ffurigana4epub) | 📥 22 | 📦 12k | ⭐ 29 | 🔴 september 2021|\n| 🔗 [PyKatsuyou](https:\u002F\u002Fgithub.com\u002FSmashinFries\u002FPyKatsuyou) | 📥 93 | 📦 20k | ⭐ 12 | 🔴 march 2025|\n| 🔗 [jageocoder](https:\u002F\u002Fgithub.com\u002Ft-sagara\u002Fjageocoder) | 📥 4k | 📦 354k | ⭐ 95 | 🟢 last tuesday|\n| 🔗 [pygeonlp](https:\u002F\u002Fgithub.com\u002Fgeonlp-platform\u002Fpygeonlp) | 📥 70 | 📦 22k | ⭐ 22 | 🟢 march|\n| 🔗 
[nksnd](https:\u002F\u002Fgithub.com\u002Fyoriyuki\u002Fnksnd) | - | - | ⭐ 26 | 🔴 may 2018|\n| 🔗 [JaMIE](https:\u002F\u002Fgithub.com\u002Fracerandom\u002FJaMIE) | - | - | ⭐ 9 | 🟢 march|\n| 🔗 [fasttext-vs-word2vec-on-twitter-data](https:\u002F\u002Fgithub.com\u002FGINK03\u002Ffasttext-vs-word2vec-on-twitter-data) | - | - | ⭐ 48 | 🔴 august 2017|\n| 🔗 [minimal-search-engine](https:\u002F\u002Fgithub.com\u002FGINK03\u002Fminimal-search-engine) | - | - | ⭐ 19 | 🔴 july 2019|\n| 🔗 [5ch-analysis](https:\u002F\u002Fgithub.com\u002FGINK03\u002F5ch-analysis) | - | - | ⭐ 75 | 🔴 november 2018|\n| 🔗 [tweet_extructor](https:\u002F\u002Fgithub.com\u002FtatHi\u002Ftweet_extructor) | - | - | ⭐ 3 | 🔴 august 2022|\n| 🔗 [japanese-word-aggregation](https:\u002F\u002Fgithub.com\u002Fhkiyomaru\u002Fjapanese-word-aggregation) | - | - | ⭐ 2 | 🔴 august 2018|\n| 🔗 [jinf](https:\u002F\u002Fgithub.com\u002Fhkiyomaru\u002Fjinf) | 📥 619 | 📦 56k | ⭐ 4 | 🔴 december 2022|\n| 🔗 [kwja](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fkwja) | 📥 340 | 📦 57k | ⭐ 141 | 🟡 august 2025|\n| 🔗 [mlm-scoring-transformers](https:\u002F\u002Fgithub.com\u002FRyutaro-A\u002Fmlm-scoring-transformers) | - | - | ⭐ 6 | 🔴 december 2022|\n| 🔗 [ClipCap-for-Japanese](https:\u002F\u002Fgithub.com\u002FJapanese-Image-Captioning\u002FClipCap-for-Japanese) | - | - | ⭐ 12 | 🔴 october 2022|\n| 🔗 [SAT-for-Japanese](https:\u002F\u002Fgithub.com\u002FJapanese-Image-Captioning\u002FSAT-for-Japanese) | - | - | ⭐ 2 | 🔴 october 2022|\n| 🔗 [cihai](https:\u002F\u002Fgithub.com\u002Fcihai\u002Fcihai) | 📥 833 | 📦 213k | ⭐ 93 | 🟢 today|\n| 🔗 [marine](https:\u002F\u002Fgithub.com\u002F6gsn\u002Fmarine) | 📥 43 | 📦 15k | ⭐ 36 | 🔴 september 2022|\n| 🔗 [whisper-asr-finetune](https:\u002F\u002Fgithub.com\u002Fsarulab-speech\u002Fwhisper-asr-finetune) | - | - | ⭐ 32 | 🔴 december 2022|\n| 🔗 [japanese_chatbot](https:\u002F\u002Fgithub.com\u002FCjangCjengh\u002Fjapanese_chatbot) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 
[radicalchar](https:\u002F\u002Fgithub.com\u002Fyamamaya\u002Fradicalchar) | - | - | ⭐ 9 | 🔴 december 2022|\n| 🔗 [akaza](https:\u002F\u002Fgithub.com\u002Ftokuhirom\u002Fakaza) | - | - | ⭐ 249 | 🟢 yesterday|\n| 🔗 [posuto](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fposuto) | 📥 6k | 📦 696k | ⭐ 226 | 🟢 last wednesday|\n| 🔗 [tacotron2-japanese](https:\u002F\u002Fgithub.com\u002FCjangCjengh\u002Ftacotron2-japanese) | - | - | ⭐ 269 | 🔴 september 2022|\n| 🔗 [ibus-hiragana](https:\u002F\u002Fgithub.com\u002Fesrille\u002Fibus-hiragana) | - | - | ⭐ 78 | 🟢 march|\n| 🔗 [furiganapad](https:\u002F\u002Fgithub.com\u002Fesrille\u002Ffuriganapad) | - | - | ⭐ 19 | 🟡 april 2025|\n| 🔗 [chikkarpy](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002Fchikkarpy) | 📥 418 | 📦 60k | ⭐ 55 | 🔴 february 2022|\n| 🔗 [ja-tokenizer-docker-py](https:\u002F\u002Fgithub.com\u002Fp-geon\u002Fja-tokenizer-docker-py) | - | - | ⭐ 36 | 🔴 may 2022|\n| 🔗 [JapaneseEmbeddingEval](https:\u002F\u002Fgithub.com\u002Foshizo\u002FJapaneseEmbeddingEval) | - | - | ⭐ 183 | 🔴 october 2024|\n| 🔗 [gptuber-by-langchain](https:\u002F\u002Fgithub.com\u002Fkarakuri-ai\u002Fgptuber-by-langchain) | - | - | ⭐ 63 | 🔴 january 2023|\n| 🔗 [shuwa](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fshuwa) | - | - | ⭐ 146 | 🔴 december 2022|\n| 🔗 [japanese-nli-model](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002Fjapanese-nli-model) | - | - | ⭐ 6 | 🔴 october 2022|\n| 🔗 [tra-fugu](https:\u002F\u002Fgithub.com\u002Ftos-kamiya\u002Ftra-fugu) | - | - | ⭐ 6 | 🔴 march 2023|\n| 🔗 [fugumt](https:\u002F\u002Fgithub.com\u002Fs-taka\u002Ffugumt) | - | - | ⭐ 64 | 🔴 february 2021|\n| 🔗 [JaSPICE](https:\u002F\u002Fgithub.com\u002Fkeio-smilab23\u002FJaSPICE) | 📥 4 | 📦 2k | ⭐ 9 | 🔴 november 2023|\n| 🔗 [Retrieval-based-Voice-Conversion-WebUI-JP-localization](https:\u002F\u002Fgithub.com\u002Fyantaisa11\u002FRetrieval-based-Voice-Conversion-WebUI-JP-localization) | - | - | ⭐ 48 | 🔴 april 2023|\n| 🔗 
[pyopenjtalk](https:\u002F\u002Fgithub.com\u002Fr9y9\u002Fpyopenjtalk) | 📥 19k | 📦 1M | ⭐ 249 | 🟡 april 2025|\n| 🔗 [yomigana-ebook](https:\u002F\u002Fgithub.com\u002Frabbit19981023\u002Fyomigana-ebook) | 📥 22 | 📦 7k | ⭐ 26 | 🔴 february 2024|\n| 🔗 [N46Whisper](https:\u002F\u002Fgithub.com\u002FAyanaminn\u002FN46Whisper) | - | - | ⭐ 1.7k | 🔴 february 2025|\n| 🔗 [japanese_llm_simple_webui](https:\u002F\u002Fgithub.com\u002Fnoir55\u002Fjapanese_llm_simple_webui) | - | - | ⭐ 17 | 🔴 may 2024|\n| 🔗 [pdf-translator](https:\u002F\u002Fgithub.com\u002Fdiscus0434\u002Fpdf-translator) | - | - | ⭐ 339 | 🔴 may 2024|\n| 🔗 [japanese_qa_demo_with_haystack_and_es](https:\u002F\u002Fgithub.com\u002FShingo-Kamata\u002Fjapanese_qa_demo_with_haystack_and_es) | - | - | ⭐ 1 | 🔴 december 2022|\n| 🔗 [mozc-devices](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmozc-devices) | - | - | ⭐ 2.7k | 🟡 november 2025|\n| 🔗 [natsume](https:\u002F\u002Fgithub.com\u002Ffaruzan0820\u002Fnatsume) | 📥 0 | 📦 3k | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [vits-japros-webui](https:\u002F\u002Fgithub.com\u002Flitagin02\u002Fvits-japros-webui) | - | - | ⭐ 42 | 🔴 january 2024|\n| 🔗 [ja-law-parser](https:\u002F\u002Fgithub.com\u002Ftakuyaa\u002Fja-law-parser) | - | - | ⭐ 25 | 🔴 january 2024|\n| 🔗 [dictation-kit](https:\u002F\u002Fgithub.com\u002Fjulius-speech\u002Fdictation-kit) | - | - | ⭐ 164 | 🔴 april 2019|\n| 🔗 [julius4seg](https:\u002F\u002Fgithub.com\u002FHiroshiba\u002Fjulius4seg) | - | - | ⭐ 7 | 🔴 august 2021|\n| 🔗 [voicevox_engine](https:\u002F\u002Fgithub.com\u002FVOICEVOX\u002Fvoicevox_engine) | - | - | ⭐ 1.7k | 🟢 last wednesday|\n| 🔗 [LLaVA-JP](https:\u002F\u002Fgithub.com\u002Ftosiyuki\u002FLLaVA-JP) | - | - | ⭐ 64 | 🔴 june 2024|\n| 🔗 [RAG-Japanese](https:\u002F\u002Fgithub.com\u002FAkimParis\u002FRAG-Japanese) | - | - | ⭐ 10 | 🟡 may 2025|\n| 🔗 [bertjsc](https:\u002F\u002Fgithub.com\u002Fer-ri\u002Fbertjsc) | - | - | ⭐ 14 | 🔴 august 2024|\n| 🔗 
[llm-leaderboard](https:\u002F\u002Fgithub.com\u002Fwandb\u002Fllm-leaderboard) | - | - | ⭐ 92 | 🟡 september 2025|\n| 🔗 [jglue-evaluation-scripts](https:\u002F\u002Fgithub.com\u002Fnobu-g\u002Fjglue-evaluation-scripts) | - | - | ⭐ 18 | 🟢 last thursday|\n| 🔗 [BLIP2-Japanese](https:\u002F\u002Fgithub.com\u002FZhaoPeiduo\u002FBLIP2-Japanese) | - | - | ⭐ 13 | 🟡 september 2025|\n| 🔗 [wikipedia-passages-jawiki-embeddings-utils](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Fwikipedia-passages-jawiki-embeddings-utils) | - | - | ⭐ 11 | 🔴 march 2024|\n| 🔗 [simple-simcse-ja](https:\u002F\u002Fgithub.com\u002Fhpprc\u002Fsimple-simcse-ja) | - | - | ⭐ 69 | 🔴 october 2023|\n| 🔗 [wikipedia-japanese-open-rag](https:\u002F\u002Fgithub.com\u002Flawofcycles\u002Fwikipedia-japanese-open-rag) | - | - | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [gpt4-autoeval](https:\u002F\u002Fgithub.com\u002Fnorthern-system-service\u002Fgpt4-autoeval) | - | - | ⭐ 16 | 🔴 june 2024|\n| 🔗 [t5-japanese](https:\u002F\u002Fgithub.com\u002Fsonoisa\u002Ft5-japanese) | - | - | ⭐ 118 | 🟡 september 2025|\n| 🔗 [japanese_llm_eval](https:\u002F\u002Fgithub.com\u002Flightblue-tech\u002Fjapanese_llm_eval) | - | - | ⭐ 5 | 🔴 invalid|\n| 🔗 [jmteb](https:\u002F\u002Fgithub.com\u002Fsbintuitions\u002Fjmteb) | - | - | ⭐ 89 | 🟢 march|\n| 🔗 [pydomino](https:\u002F\u002Fgithub.com\u002Fdwangomediavillage\u002Fpydomino) | - | - | ⭐ 39 | 🟡 august 2025|\n| 🔗 [easynovelassistant](https:\u002F\u002Fgithub.com\u002Fzuntan03\u002Feasynovelassistant) | - | - | ⭐ 222 | 🔴 july 2024|\n| 🔗 [clip-japanese](https:\u002F\u002Fgithub.com\u002Fsonoisa\u002Fclip-japanese) | - | - | ⭐ 13 | 🟡 september 2025|\n| 🔗 [rime-jaroomaji](https:\u002F\u002Fgithub.com\u002Flazyfoxchan\u002Frime-jaroomaji) | - | - | ⭐ 48 | 🟢 last thursday|\n| 🔗 [deep-question-generation](https:\u002F\u002Fgithub.com\u002Fsonoisa\u002Fdeep-question-generation) | - | - | ⭐ 12 | 🔴 march 2023|\n| 🔗 
[magpie-nemotron](https:\u002F\u002Fgithub.com\u002Faratako\u002Fmagpie-nemotron) | - | - | ⭐ 9 | 🔴 july 2024|\n| 🔗 [qlora_ja](https:\u002F\u002Fgithub.com\u002Fsosuke115\u002Fqlora_ja) | - | - | ⭐ 1 | 🔴 july 2024|\n| 🔗 [mozcdic-ut-jawiki](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-jawiki) | - | - | ⭐ 28 | 🟢 last thursday|\n| 🔗 [shisa-v2](https:\u002F\u002Fgithub.com\u002Fshisa-ai\u002Fshisa-v2) | - | - | ⭐ 28 | 🟡 december 2025|\n| 🔗 [llm-translator](https:\u002F\u002Fgithub.com\u002Fhpprc\u002Fllm-translator) | - | - | ⭐ 20 | 🔴 january 2025|\n| 🔗 [llm-jp-asr](https:\u002F\u002Fgithub.com\u002Ftosiyuki\u002Fllm-jp-asr) | - | - | ⭐ 9 | 🔴 september 2024|\n| 🔗 [rag-japanese](https:\u002F\u002Fgithub.com\u002Fakimfromparis\u002Frag-japanese) | - | - | ⭐ 10 | 🟡 may 2025|\n| 🔗 [monaka](https:\u002F\u002Fgithub.com\u002Fkomiya-lab\u002Fmonaka) | - | - | ⭐ 5 | 🔴 january 2025|\n| 🔗 [jp-translate.cloud](https:\u002F\u002Fgithub.com\u002Fmatthewbieda\u002Fjp-translate.cloud) | - | - | ⭐ 3 | 🔴 september 2024|\n| 🔗 [substring-word-finder](https:\u002F\u002Fgithub.com\u002Ftoufu-24\u002Fsubstring-word-finder) | - | - | ⭐ 4 | 🟡 november 2025|\n| 🔗 [heron-vlm-leaderboard](https:\u002F\u002Fgithub.com\u002Fwandb\u002Fheron-vlm-leaderboard) | - | - | ⭐ 6 | 🔴 december 2024|\n| 🔗 [text2dataset](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Ftext2dataset) | - | - | ⭐ 28 | 🔴 january 2025|\n| 🔗 [mecab-web-api](https:\u002F\u002Fgithub.com\u002Fbungoume\u002Fmecab-web-api) | - | - | ⭐ 40 | 🔴 july 2022|\n| 🔗 [mecab_controller](https:\u002F\u002Fgithub.com\u002Fajatt-tools\u002Fmecab_controller) | - | - | ⭐ 19 | 🟢 march|\n| 🔗 [vits](https:\u002F\u002Fgithub.com\u002Fzassou65535\u002Fvits) | - | - | ⭐ 92 | 🔴 february 2023|\n| 🔗 [akari_chatgpt_bot](https:\u002F\u002Fgithub.com\u002Fakarigroup\u002Fakari_chatgpt_bot) | - | - | ⭐ 48 | 🟡 october 2025|\n| 🔗 [kudasai](https:\u002F\u002Fgithub.com\u002Fbikatr7\u002Fkudasai) | - | - | ⭐ 26 | 🟡 june 2025|\n| 🔗 
[mecab-visualizer](https:\u002F\u002Fgithub.com\u002Fsophiefy\u002Fmecab-visualizer) | - | - | ⭐ 2 | 🔴 september 2023|\n| 🔗 [add-dictionary](https:\u002F\u002Fgithub.com\u002Fmassao000\u002Fadd-dictionary) | - | - | ⭐ 3 | 🟡 october 2025|\n| 🔗 [j-moshi](https:\u002F\u002Fgithub.com\u002Fnu-dialogue\u002Fj-moshi) | - | - | ⭐ 305 | 🟡 june 2025|\n| 🔗 [jatts](https:\u002F\u002Fgithub.com\u002Funilight\u002Fjatts) | - | - | ⭐ 44 | 🟢 march|\n| 🔗 [tsukasa-speech](https:\u002F\u002Fgithub.com\u002Frespaired\u002Ftsukasa-speech) | - | - | ⭐ 63 | 🟡 may 2025|\n| 🔗 [symptom-expression-search](https:\u002F\u002Fgithub.com\u002Fpo3rin\u002Fsymptom-expression-search) | - | - | ⭐ 2 | 🔴 february 2021|\n| 🔗 [llm-jp-judge](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-judge) | - | - | ⭐ 40 | 🟡 december 2025|\n| 🔗 [asagi-vlm-colaboratory-sample](https:\u002F\u002Fgithub.com\u002Fkazuhito00\u002Fasagi-vlm-colaboratory-sample) | - | - | ⭐ 1 | 🔴 march 2025|\n| 🔗 [llm-jp-eval-mm](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-eval-mm) | - | - | ⭐ 41 | 🟢 january|\n| 🔗 [manga109api](https:\u002F\u002Fgithub.com\u002Fmanga109\u002Fmanga109api) | 📥 190 | 📦 46k | ⭐ 129 | 🔴 march 2022|\n| 🔗 [fastrtc-jp](https:\u002F\u002Fgithub.com\u002Froute250\u002Ffastrtc-jp) | - | - | ⭐ 5 | 🟡 may 2025|\n| 🔗 [whisper-transcription](https:\u002F\u002Fgithub.com\u002Ffumifumi0831\u002Fwhisper-transcription) | - | - | ⭐ 17 | 🟢 january|\n| 🔗 [pocket-researcher](https:\u002F\u002Fgithub.com\u002Fu-masao\u002Fpocket-researcher) | - | - | ⭐ 10 | 🟡 april 2025|\n| 🔗 [jtransbench](https:\u002F\u002Fgithub.com\u002Fwebbigdata-jp\u002Fjtransbench) | - | - | ⭐ 13 | 🟡 october 2025|\n| 🔗 [easyllasa](https:\u002F\u002Fgithub.com\u002Fzuntan03\u002Feasyllasa) | - | - | ⭐ 25 | 🟡 september 2025|\n| 🔗 [kanjikana-model](https:\u002F\u002Fgithub.com\u002Fdigital-go-jp\u002Fkanjikana-model) | - | - | ⭐ 114 
| 🟡 december 2025|\n| 🔗 [deep-openreview-research-ja](https:\u002F\u002Fgithub.com\u002Ftb-yasu\u002Fdeep-openreview-research-ja) | - | - | ⭐ 13 | 🟡 november 2025|\n| 🔗 [pitchbench](https:\u002F\u002Fgithub.com\u002Fshewiiii\u002Fpitchbench) | - | - | ⭐ 1 | 🟢 february|\n| 🔗 [mini-transformer-from-scratch](https:\u002F\u002Fgithub.com\u002Fzuofanf\u002Fmini-transformer-from-scratch) | - | - | ⭐ 2 | 🟡 november 2025|\n| 🔗 [vv_core_inference](https:\u002F\u002Fgithub.com\u002Fhiroshiba\u002Fvv_core_inference) | - | - | ⭐ 31 | 🟡 december 2025|\n| 🔗 [pyopenjtalk-plus](https:\u002F\u002Fgithub.com\u002Ftsukumijima\u002Fpyopenjtalk-plus) | 📥 24k | 📦 456k | ⭐ 56 | 🔴 invalid|\n| 🔗 [japanese_spelling_correction](https:\u002F\u002Fgithub.com\u002Fphkhanhtrinh23\u002Fjapanese_spelling_correction) | - | - | ⭐ 14 | 🔴 september 2023|\n| 🔗 [py-kaomoji](https:\u002F\u002Fgithub.com\u002Fshibuiwilliam\u002Fpy-kaomoji) | 📥 28 | 📦 37k | ⭐ 6 | 🔴 december 2018|\n| 🔗 [llm-jp-vila](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-vila) | - | - | ⭐ 10 | 🟡 august 2025|\n| 🔗 [kanjivg-radical](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fkanjivg-radical) | - | - | ⭐ 106 | 🔴 august 2018|\n| 🔗 [japanese-wordnet-visualization](https:\u002F\u002Fgithub.com\u002FHemingwayLee\u002Fjapanese-wordnet-visualization) | - | - | ⭐ 3 | 🔴 november 2022|\n| 🔗 [piper-plus](https:\u002F\u002Fgithub.com\u002Fayutaz\u002Fpiper-plus) | - | - | ⭐ 106 | 🟢 today|\n| 🔗 [Japanera](https:\u002F\u002Fgithub.com\u002Fnagataaaas\u002FJapanera) | 📥 3k | 📦 366k | ⭐ 35 | 🟡 june 2025|\n| 🔗 [bert-abstractive-text-summarization](https:\u002F\u002Fgithub.com\u002Fiwasakiyuuki\u002Fbert-abstractive-text-summarization) | - | - | ⭐ 49 | 🔴 december 2019|\n| 🔗 [kyujipy](https:\u002F\u002Fgithub.com\u002Fdrturnon\u002Fkyujipy) | 📥 25 | 📦 23k | ⭐ 22 | 🟢 january|\n| 🔗 [jitenbot](https:\u002F\u002Fgithub.com\u002Fkonstantindjairo\u002Fjitenbot) | - | - | ⭐ 4 | 🔴 december 2024|\n| 🔗 
[ja-icd10](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fja-icd10) | - | - | ⭐ 5 | 🔴 july 2021|\n| 🔗 [pl-bert-vits2](https:\u002F\u002Fgithub.com\u002Ftonnetonne814\u002Fpl-bert-vits2) | - | - | ⭐ 14 | 🔴 december 2023|\n| 🔗 [ndc_predictor](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndc_predictor) | - | - | ⭐ 11 | 🔴 august 2021|\n| 🔗 [pfmt-bench-fin-ja](https:\u002F\u002Fgithub.com\u002Fpfnet-research\u002Fpfmt-bench-fin-ja) | - | - | ⭐ 9 | 🔴 march 2025|\n| 🔗 [marine-plus](https:\u002F\u002Fgithub.com\u002Ftsukumijima\u002Fmarine-plus) | 📥 299 | 📦 12k | ⭐ 8 | 🟢 march|\n| 🔗 [ja-tokenizer-benchmark](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fja-tokenizer-benchmark) | - | - | ⭐ 7 | 🔴 february 2022|\n| 🔗 [yat](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fyat) | - | - | ⭐ 7 | 🔴 june 2018|\n| 🔗 [igakuqa119](https:\u002F\u002Fgithub.com\u002Fdocto-rin\u002Figakuqa119) | - | - | ⭐ 8 | 🟢 january|\n| 🔗 [japanese-luw-tokenizer](https:\u002F\u002Fgithub.com\u002Fkoichiyasuoka\u002Fjapanese-luw-tokenizer) | - | - | ⭐ 6 | 🔴 december 2021|\n| 🔗 [ibus-jig](https:\u002F\u002Fgithub.com\u002Fy-koj\u002Fibus-jig) | - | - | ⭐ 4 | 🔴 december 2023|\n| 🔗 [jp-stopword-filter](https:\u002F\u002Fgithub.com\u002FBrambleXu\u002Fjp-stopword-filter) | 📥 8 | 📦 5k | ⭐ 4 | 🔴 november 2024|\n| 🔗 [yasumail](https:\u002F\u002Fgithub.com\u002Fterallite\u002Fyasumail) | - | - | ⭐ 2 | 🟢 january|\n| 🔗 [himotoki](https:\u002F\u002Fgithub.com\u002Fmsr2903\u002Fhimotoki) | 📥 73 | 📦 4k | ⭐ 3 | 🟢 february|\n| 🔗 [diafill-toolkit](https:\u002F\u002Fgithub.com\u002Fsbintuitions\u002Fdiafill-toolkit) | - | - | ⭐ 0 | 🟢 january|\n| 🔗 [eval_vertical_ja](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Feval_vertical_ja) | - | - | ⭐ 1 | 🟡 november 2025|\n| 🔗 [jp-llm-corpus-pii-filter](https:\u002F\u002Fgithub.com\u002Fmatsuolab\u002Fjp-llm-corpus-pii-filter) | - | - | ⭐ 7 | 🔴 march 2025|\n| 🔗 
[Novel2DialCorpus](https:\u002F\u002Fgithub.com\u002Fganbon\u002FNovel2DialCorpus) | - | - | ⭐ 0 | 🟢 February|\n\n## C++\n\n### Morphological analysis\nHigh-performance libraries for Japanese morphological analysis\n\n * [mecab](https:\u002F\u002Fgithub.com\u002Ftaku910\u002Fmecab) - Yet another Japanese morphological analyzer\n * [jumanpp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fjumanpp) - Juman++ (a morphological analysis toolkit)\n * [kytea](https:\u002F\u002Fgithub.com\u002Fneubig\u002Fkytea) - The Kyoto Text Analysis Toolkit for word segmentation, pronunciation estimation, and more\n * [juman](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fjuman) - JUMAN, a Japanese morphological analysis system\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [mecab](https:\u002F\u002Fgithub.com\u002Ftaku910\u002Fmecab) | - | - | ⭐ 1.1k | 🔴 February 2025|\n| 🔗 [jumanpp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fjumanpp) | - | - | ⭐ 411 | 🔴 March 2023|\n| 🔗 [kytea](https:\u002F\u002Fgithub.com\u002Fneubig\u002Fkytea) | - | - | ⭐ 212 | 🔴 April 2020|\n| 🔗 [juman](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fjuman) | - | - | ⭐ 12 | 🔴 December 2021|\n\n### Parsing\nLibraries for dependency and syntactic parsing of Japanese sentences\n\n * [cabocha](https:\u002F\u002Fgithub.com\u002Ftaku910\u002Fcabocha) - Yet another Japanese dependency structure analyzer\n * [knp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fknp) - A Japanese parser\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [cabocha](https:\u002F\u002Fgithub.com\u002Ftaku910\u002Fcabocha) | - | - | ⭐ 121 | 🔴 February 2025|\n| 🔗 [knp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fknp) | - | - | ⭐ 34 | 🔴 November 2023|\n\n### Others\nOther libraries for Japanese NLP and text processing\n\n * [jsc](https:\u002F\u002Fgithub.com\u002Fyohokuno\u002Fjsc) - Joint source channel model for Japanese kana-kanji conversion, Chinese pinyin input, and mixed Chinese-Japanese input.\n * [aquaskk](https:\u002F\u002Fgithub.com\u002Fcodefirst\u002Faquaskk) - An input method that needs no morphological analysis.\n * [mozc](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmozc) - Mozc: a cross-platform Japanese input method editor\n * [trimatch](https:\u002F\u002Fgithub.com\u002Ftuem\u002Ftrimatch) - Trimatch: a library for exact, prefix, and approximate string matching\n * [resembla](https:\u002F\u002Fgithub.com\u002Ftuem\u002Fresembla) - Resembla: a word-based library for searching similar Japanese sentences\n * [corvusskk](https:\u002F\u002Fgithub.com\u002Fnathancorvussolis\u002Fcorvusskk) - ▽▼ An SKK-like Japanese input method editor for Windows\n * 
[mozuku](https:\u002F\u002Fgithub.com\u002Ft3tra-dev\u002Fmozuku) - An LSP server for parsing and proofreading Japanese text.\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [jsc](https:\u002F\u002Fgithub.com\u002Fyohokuno\u002Fjsc) | - | - | ⭐ 15 | 🔴 December 2012|\n| 🔗 [aquaskk](https:\u002F\u002Fgithub.com\u002Fcodefirst\u002Faquaskk) | - | - | ⭐ 369 | 🔴 July 2023|\n| 🔗 [mozc](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmozc) | - | - | ⭐ 2.9k | 🟢 yesterday|\n| 🔗 [trimatch](https:\u002F\u002Fgithub.com\u002Ftuem\u002Ftrimatch) | - | - | ⭐ 2 | 🟢 February|\n| 🔗 [resembla](https:\u002F\u002Fgithub.com\u002Ftuem\u002Fresembla) | - | - | ⭐ 73 | 🟡 August 2025|\n| 🔗 [corvusskk](https:\u002F\u002Fgithub.com\u002Fnathancorvussolis\u002Fcorvusskk) | - | - | ⭐ 362 | 🟢 March|\n| 🔗 [mozuku](https:\u002F\u002Fgithub.com\u002Ft3tra-dev\u002Fmozuku) | - | - | ⭐ 411 | 🟢 last Friday|\n\n\n## Rust crate\n\n### Morphological analysis\nFast Japanese morphological analysis libraries written in Rust\n\n * [lindera](https:\u002F\u002Fgithub.com\u002Flindera-morphology\u002Flindera) - A morphological analysis library.\n * [vaporetto](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fvaporetto) - Vaporetto: a very fast tokenizer based on pointwise prediction\n * [goya](https:\u002F\u002Fgithub.com\u002FLeko\u002Fgoya) - A Japanese morphological analyzer written in Rust\n * [vibrato](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fvibrato) - vibrato: a Viterbi-based accelerated tokenizer\n * [yoin](https:\u002F\u002Fgithub.com\u002Fagatan\u002Fyoin) - A Japanese morphological analyzer written in pure Rust\n * [mecab-rs](https:\u002F\u002Fgithub.com\u002Ftsurai\u002Fmecab-rs) - Safe Rust bindings for the mecab part-of-speech tagging and morphological analysis library\n * [awabi](https:\u002F\u002Fgithub.com\u002Fnakagami\u002Fawabi) - A morphological analyzer that uses mecab dictionaries\n * [kanpyo](https:\u002F\u002Fgithub.com\u002Ftogatoga\u002Fkanpyo) - A Japanese morphological analyzer written in Rust\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [lindera](https:\u002F\u002Fgithub.com\u002Flindera-morphology\u002Flindera) | - | 📦 1M | ⭐ 610 | 🟢 today|\n| 🔗 [vaporetto](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fvaporetto) | - | 📦 196k | ⭐ 255 | 🟢 February|\n| 🔗 [goya](https:\u002F\u002Fgithub.com\u002FLeko\u002Fgoya) | - | 📦 11k | ⭐ 83 | 🔴 December 2021|\n| 🔗 
[vibrato](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fvibrato) | - | 📦 60k | ⭐ 404 | 🟢 February|\n| 🔗 [yoin](https:\u002F\u002Fgithub.com\u002Fagatan\u002Fyoin) | - | 📦 3k | ⭐ 26 | 🔴 October 2017|\n| 🔗 [mecab-rs](https:\u002F\u002Fgithub.com\u002Ftsurai\u002Fmecab-rs) | - | 📦 40k | ⭐ 71 | 🔴 September 2023|\n| 🔗 [awabi](https:\u002F\u002Fgithub.com\u002Fnakagami\u002Fawabi) | - | 📦 24k | ⭐ 10 | 🟡 November 2025|\n| 🔗 [kanpyo](https:\u002F\u002Fgithub.com\u002Ftogatoga\u002Fkanpyo) | - | 📦 2.5k | ⭐ 109 | 🟢 February|\n\n\n### Converter\nLibraries for converting scripts and characters in Japanese text\n\n * [wana_kana_rust](https:\u002F\u002Fgithub.com\u002FPSeitz\u002Fwana_kana_rust) - A utility library for checking and converting between Japanese characters (hiragana, katakana) and romaji\n * [unicode-jp-rs](https:\u002F\u002Fgithub.com\u002Fgemmarx\u002Funicode-jp-rs) - A Rust library that converts Japanese half-width kana [半角ｶﾅ] and full-width alphanumerics [全角英数] to their standard forms\n * [kana](https:\u002F\u002Fgithub.com\u002Fgbrlsnchs\u002Fkana) - [Mirror] CLI program for transliterating romaji text into hiragana or katakana\n * [kanaria](https:\u002F\u002Fgithub.com\u002Fsamunohito\u002Fkanaria) - This library provides conversion between and detection of hiragana, katakana, half-width, and full-width characters.\n * [japanese-address-parser](https:\u002F\u002Fgithub.com\u002Fyuukitoriyama\u002Fjapanese-address-parser) - A library that splits Japanese addresses into prefecture, municipality, town name, and the remainder\n * [yosina](https:\u002F\u002Fgithub.com\u002Fyosina-lib\u002Fyosina) - Yosina is a transliteration library for the letters and symbols used in Japanese writing.\n * [mojimoji-rs](https:\u002F\u002Fgithub.com\u002Feuropeanplaice\u002Fmojimoji-rs) - A fast Rust implementation of mojimoji, which converts between Japanese half-width and full-width characters.\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [wana_kana_rust](https:\u002F\u002Fgithub.com\u002FPSeitz\u002Fwana_kana_rust) | - | 📦 360k | ⭐ 90 | 🔴 March 2025|\n| 🔗 [unicode-jp-rs](https:\u002F\u002Fgithub.com\u002Fgemmarx\u002Funicode-jp-rs) | - | 📦 64k | ⭐ 19 | 🔴 April 2020|\n| 🔗 [kana](https:\u002F\u002Fgithub.com\u002Fgbrlsnchs\u002Fkana) | - | - | ⭐ 12 | 🔴 January 2023|\n| 🔗 [kanaria](https:\u002F\u002Fgithub.com\u002Fsamunohito\u002Fkanaria) | - | - | ⭐ 21 | 🟢 February|\n| 🔗 [japanese-address-parser](https:\u002F\u002Fgithub.com\u002Fyuukitoriyama\u002Fjapanese-address-parser) | - | - | ⭐ 10 | 🟢 March|\n| 🔗 
[yosina](https:\u002F\u002Fgithub.com\u002Fyosina-lib\u002Fyosina) | - | - | ⭐ 24 | 🟢 March|\n| 🔗 [mojimoji-rs](https:\u002F\u002Fgithub.com\u002Feuropeanplaice\u002Fmojimoji-rs) | - | - | ⭐ 4 | 🔴 November 2022|\n\n\n### Search engine libraries\nLibraries for Japanese full-text search and indexing\n\n * [lindera-tantivy](https:\u002F\u002Fgithub.com\u002Flindera-morphology\u002Flindera-tantivy) - Lindera tokenizer for Tantivy.\n * [tantivy-vibrato](https:\u002F\u002Fgithub.com\u002Fakr4\u002Ftantivy-vibrato) - A Tantivy tokenizer using Vibrato.\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [lindera-tantivy](https:\u002F\u002Fgithub.com\u002Flindera-morphology\u002Flindera-tantivy) | - | 📦 178k | ⭐ 69 | 🟢 January|\n| 🔗 [tantivy-vibrato](https:\u002F\u002Fgithub.com\u002Fakr4\u002Ftantivy-vibrato) | - | 📦 1.5k | ⭐ 3 | 🔴 January 2023|\n\n### Others\nSupplementary crates for Japanese text and input method processing\n\n * [daachorse](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fdaachorse) - A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust.\n * [find-simdoc](https:\u002F\u002Fgithub.com\u002Flegalforce-research\u002Ffind-simdoc) - Finds all pairs of similar documents in a time- and memory-efficient way.\n * [crawdad](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fcrawdad) - A Rust library of natural language dictionaries using character-wise double-array tries.\n * [tokenizer-speed-bench](https:\u002F\u002Fgithub.com\u002Flegalforce-research\u002Ftokenizer-speed-bench) - Code for comparing the performance of various tokenizers.\n * [stringmatch-bench](https:\u002F\u002Fgithub.com\u002Flegalforce-research\u002Fstringmatch-bench) - Benchmark tools for comparing the performance of data structures for string matching.\n * [vime](https:\u002F\u002Fgithub.com\u002Falgon-320\u002Fvime) - Use Vim as an input method for X11 applications.\n * [voicevox_core](https:\u002F\u002Fgithub.com\u002FVOICEVOX\u002Fvoicevox_core) - The core of VOICEVOX, a free, medium-quality text-to-speech software.\n * [akaza](https:\u002F\u002Fgithub.com\u002Fakaza-im\u002Fakaza) - Yet another Japanese input method for IBus\u002FLinux.\n * [Jotoba](https:\u002F\u002Fgithub.com\u002FWeDontPanic\u002FJotoba) - A free, online, self-hostable multilingual Japanese dictionary.\n * [dvorakjp-romantable](https:\u002F\u002Fgithub.com\u002Fshinespark\u002Fdvorakjp-romantable) - DvorakJP romaji table for Google Japanese Input.\n * [niinii](https:\u002F\u002Fgithub.com\u002FNetdex\u002Fniinii) - A Japanese glossing tool that uses Ichiran 
to assist with reading text.\n * [cskk](https:\u002F\u002Fgithub.com\u002Fnaokiri\u002Fcskk) - SKK (Simple Kana to Kanji conversion) library.\n * [japanki](https:\u002F\u002Fgithub.com\u002Ftysonwu\u002Fjapanki) - Learn Japanese vocabulary with command-line quizzes 🇯🇵!\n * [jpreprocess](https:\u002F\u002Fgithub.com\u002Fjpreprocess\u002Fjpreprocess) - Japanese text preprocessor for text-to-speech applications (OpenJTalk rewritten in Rust).\n * [listup_precedent](https:\u002F\u002Fgithub.com\u002Fjapanese-law-analysis\u002Flistup_precedent) - Software that generates a list of court case data by crawling the official website of the Courts in Japan (https:\u002F\u002Fwww.courts.go.jp\u002Findex.html).\n * [jisho](https:\u002F\u002Fgithub.com\u002Feagleflo\u002Fjisho) - Jisho is a command-line tool and Rust library that provides a Japanese-English dictionary.\n * [kanalizer](https:\u002F\u002Fgithub.com\u002Fvoicevox\u002Fkanalizer) - A library that guesses the Japanese reading of English words.\n * [koharu](https:\u002F\u002Fgithub.com\u002Fmayocream\u002Fkoharu) - Automated manga translation tool using large language models, written in Rust.\n * [yomine](https:\u002F\u002Fgithub.com\u002Fmcgrizzz\u002Fyomine) - A Japanese vocabulary mining tool designed for language learners, helping users discover new words and expressions.\n * [matsuba](https:\u002F\u002Fgithub.com\u002Fmrpicklepinosaur\u002Fmatsuba) - Lightweight Japanese input method written in Rust.\n * [hujiang_dictionary](https:\u002F\u002Fgithub.com\u002Fasutorufa\u002Fhujiang_dictionary) - A Japanese dictionary implemented in Rust, with support for Telegram bots, AWS Lambda, and Cloudflare Workers. It also supports large language models and retrieval-augmented generation.\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [daachorse](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fdaachorse) | - | 📦 781k | ⭐ 249 | 🟢 today|\n| 🔗 [find-simdoc](https:\u002F\u002Fgithub.com\u002Flegalforce-research\u002Ffind-simdoc) | - | 📦 29k | ⭐ 62 | 🔴 March 2025|\n| 🔗 [crawdad](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fcrawdad) | - | 📦 65k | ⭐ 37 | 🔴 January 2025|\n| 🔗 [tokenizer-speed-bench](https:\u002F\u002Fgithub.com\u002Flegalforce-research\u002Ftokenizer-speed-bench) | - | - | ⭐ 4 | 🔴 March 2023|\n| 🔗 [stringmatch-bench](https:\u002F\u002Fgithub.com\u002Flegalforce-research\u002Fstringmatch-bench) | - | - | ⭐ 3 | 🔴 September 2022|\n| 🔗 [vime](https:\u002F\u002Fgithub.com\u002Falgon-320\u002Fvime) | - | - | ⭐ 230 | 🔴 November 2022|\n| 🔗 
[voicevox_core](https:\u002F\u002Fgithub.com\u002FVOICEVOX\u002Fvoicevox_core) | - | - | ⭐ 1.1k | 🟢 March|\n| 🔗 [akaza](https:\u002F\u002Fgithub.com\u002Fakaza-im\u002Fakaza) | - | - | ⭐ 249 | 🟢 yesterday|\n| 🔗 [Jotoba](https:\u002F\u002Fgithub.com\u002FWeDontPanic\u002FJotoba) | - | - | ⭐ 200 | 🔴 January 2024|\n| 🔗 [dvorakjp-romantable](https:\u002F\u002Fgithub.com\u002Fshinespark\u002Fdvorakjp-romantable) | - | - | ⭐ 56 | 🟢 February|\n| 🔗 [niinii](https:\u002F\u002Fgithub.com\u002FNetdex\u002Fniinii) | - | - | ⭐ 14 | 🟢 March|\n| 🔗 [cskk](https:\u002F\u002Fgithub.com\u002Fnaokiri\u002Fcskk) | - | - | ⭐ 80 | 🟢 March|\n| 🔗 [japanki](https:\u002F\u002Fgithub.com\u002Ftysonwu\u002Fjapanki) | - | - | ⭐ 3 | 🔴 October 2023|\n| 🔗 [jpreprocess](https:\u002F\u002Fgithub.com\u002Fjpreprocess\u002Fjpreprocess) | - | - | ⭐ 54 | 🟢 February|\n| 🔗 [listup_precedent](https:\u002F\u002Fgithub.com\u002Fjapanese-law-analysis\u002Flistup_precedent) | - | - | ⭐ 6 | 🟢 last Thursday|\n| 🔗 [jisho](https:\u002F\u002Fgithub.com\u002Feagleflo\u002Fjisho) | - | - | ⭐ 18 | 🟢 last Thursday|\n| 🔗 [kanalizer](https:\u002F\u002Fgithub.com\u002Fvoicevox\u002Fkanalizer) | - | - | ⭐ 27 | 🟢 March|\n| 🔗 [koharu](https:\u002F\u002Fgithub.com\u002Fmayocream\u002Fkoharu) | - | - | ⭐ 1.8k | 🟢 today|\n| 🔗 [yomine](https:\u002F\u002Fgithub.com\u002Fmcgrizzz\u002Fyomine) | - | - | ⭐ 49 | 🟢 February|\n| 🔗 [matsuba](https:\u002F\u002Fgithub.com\u002Fmrpicklepinosaur\u002Fmatsuba) | - | - | ⭐ 18 | 🔴 March 2023|\n| 🔗 [hujiang_dictionary](https:\u002F\u002Fgithub.com\u002Fasutorufa\u002Fhujiang_dictionary) | - | - | ⭐ 70 | 🟢 today|\n\n\n## JavaScript\n\n### Morphological analysis\nJapanese morphological analysis libraries for browsers and Node.js\n\n * [kuromoji.js](https:\u002F\u002Fgithub.com\u002Ftakuyaa\u002Fkuromoji.js) - JavaScript implementation of a Japanese morphological analyzer.\n * [rakutenma](https:\u002F\u002Fgithub.com\u002Frakuten-nlp\u002Frakutenma) - Rakuten MA: a Chinese and Japanese morphological analyzer (word segmenter + POS tagger) written in pure JavaScript.\n * [node-mecab-ya](https:\u002F\u002Fgithub.com\u002Fgolbin\u002Fnode-mecab-ya) - Yet another MeCab wrapper for Node.js.\n * 
[juman-bin](https:\u002F\u002Fgithub.com\u002Fthammin\u002Fjuman-bin) - An extensible Japanese morphological analysis system.\n * [node-mecab-async](https:\u002F\u002Fgithub.com\u002Fhecomi\u002Fnode-mecab-async) - Asynchronous Japanese morphological analyzer using MeCab.\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [kuromoji.js](https:\u002F\u002Fgithub.com\u002Ftakuyaa\u002Fkuromoji.js) | 📥 181k | 📦 8.6M | ⭐ 971 | 🔴 November 2018|\n| 🔗 [rakutenma](https:\u002F\u002Fgithub.com\u002Frakuten-nlp\u002Frakutenma) | 📥 36 | 📦 906 | ⭐ 472 | 🔴 January 2015|\n| 🔗 [node-mecab-ya](https:\u002F\u002Fgithub.com\u002Fgolbin\u002Fnode-mecab-ya) | 📥 95k | 📦 7.4k | ⭐ 110 | 🔴 repo not found|\n| 🔗 [juman-bin](https:\u002F\u002Fgithub.com\u002Fthammin\u002Fjuman-bin) | 📥 10k | 📦 305 | ⭐ 3 | 🔴 May 2017|\n| 🔗 [node-mecab-async](https:\u002F\u002Fgithub.com\u002Fhecomi\u002Fnode-mecab-async) | 📥 5k | 📦 340k | ⭐ 104 | 🔴 October 2017|\n\n### Converter\nLibraries for converting Japanese scripts and readings\n\n * [kuroshiro](https:\u002F\u002Fgithub.com\u002Fhexenq\u002Fkuroshiro) - Japanese language library for converting Japanese sentences to hiragana, katakana, or romaji, with furigana and okurigana modes.\n * [kuroshiro-analyzer-kuromoji](https:\u002F\u002Fgithub.com\u002Fhexenq\u002Fkuroshiro-analyzer-kuromoji) - Kuromoji morphological analyzer for kuroshiro.\n * [hepburn](https:\u002F\u002Fgithub.com\u002Flovell\u002Fhepburn) - Node.js module for converting Japanese hiragana and katakana to and from romaji using Hepburn romanization.\n * [japanese-numerals-to-number](https:\u002F\u002Fgithub.com\u002Ftwada\u002Fjapanese-numerals-to-number) - Converts Japanese numerals into Arabic numerals.\n * [jslingua](https:\u002F\u002Fgithub.com\u002Fkariminf\u002Fjslingua) - JavaScript library for processing text in Arabic, Japanese, and other languages.\n * [WanaKana](https:\u002F\u002Fgithub.com\u002FWaniKani\u002FWanaKana) - JavaScript library for detecting and transliterating hiragana ↔ katakana ↔ romaji.\n * [node-romaji-name](https:\u002F\u002Fgithub.com\u002Fjeresig\u002Fnode-romaji-name) - Normalizes and fixes common issues with romaji-based Japanese names.\n * [kyujitai.js](https:\u002F\u002Fgithub.com\u002Fhakatashi\u002Fkyujitai.js) - Utility collection for making Japanese text look old-style.\n * [normalize-japanese-addresses](https:\u002F\u002Fgithub.com\u002Fgeolonia\u002Fnormalize-japanese-addresses) - Open-source address normalization library.\n * 
[jaconv](https:\u002F\u002Fgithub.com\u002Fkazuhikoarase\u002Fjaconv) - Japanese character conversion library (JavaScript).\n * [romaji-conv](https:\u002F\u002Fgithub.com\u002Fkoozaki\u002Fromaji-conv) - Converts romaji to hiragana.\n * [japanese-addresses-v2](https:\u002F\u002Fgithub.com\u002Fgeolonia\u002Fjapanese-addresses-v2) - Nationwide address data API.\n * [jptext-to-emoji](https:\u002F\u002Fgithub.com\u002Felzup\u002Fjptext-to-emoji) - Converts words in text to emoji.\n * [japanese.js](https:\u002F\u002Fgithub.com\u002Fhakatashi\u002Fjapanese.js) - Utility collection for Japanese text processing, including hiraganization, katakanization, and romanization.\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [kuroshiro](https:\u002F\u002Fgithub.com\u002Fhexenq\u002Fkuroshiro) | 📥 12k | 📦 435k | ⭐ 963 | 🔴 June 2021|\n| 🔗 [kuroshiro-analyzer-kuromoji](https:\u002F\u002Fgithub.com\u002Fhexenq\u002Fkuroshiro-analyzer-kuromoji) | 📥 12k | 📦 410k | ⭐ 68 | 🔴 August 2018|\n| 🔗 [hepburn](https:\u002F\u002Fgithub.com\u002Flovell\u002Fhepburn) | 📥 154k | 📦 3.7M | ⭐ 137 | 🟡 September 2025|\n| 🔗 [japanese-numerals-to-number](https:\u002F\u002Fgithub.com\u002Ftwada\u002Fjapanese-numerals-to-number) | 📥 41k | 📦 2.3M | ⭐ 59 | 🔴 February 2023|\n| 🔗 [jslingua](https:\u002F\u002Fgithub.com\u002Fkariminf\u002Fjslingua) | 📥 71 | 📦 8.3k | ⭐ 53 | 🔴 October 2023|\n| 🔗 [WanaKana](https:\u002F\u002Fgithub.com\u002FWaniKani\u002FWanaKana) | 📥 rate limited by upstream service | 📦 2.2M | ⭐ 912 | 🟡 September 2025|\n| 🔗 [node-romaji-name](https:\u002F\u002Fgithub.com\u002Fjeresig\u002Fnode-romaji-name) | 📥 440 | 📦 14k | ⭐ 41 | 🔴 December 2023|\n| 🔗 [kyujitai.js](https:\u002F\u002Fgithub.com\u002Fhakatashi\u002Fkyujitai.js) | 📥 rate limited by upstream service | 📦 1.1k | ⭐ 23 | 🔴 August 2020|\n| 🔗 [normalize-japanese-addresses](https:\u002F\u002Fgithub.com\u002Fgeolonia\u002Fnormalize-japanese-addresses) | - | - | ⭐ 946 | 🟡 July 2025|\n| 🔗 [jaconv](https:\u002F\u002Fgithub.com\u002Fkazuhikoarase\u002Fjaconv) | - | - | ⭐ 87 | 🟡 June 2025|\n| 🔗 [romaji-conv](https:\u002F\u002Fgithub.com\u002Fkoozaki\u002Fromaji-conv) | - | - | ⭐ 26 | 🟢 February|\n| 🔗 
[japanese-addresses-v2](https:\u002F\u002Fgithub.com\u002Fgeolonia\u002Fjapanese-addresses-v2) | - | - | ⭐ 71 | 🔴 January 2025|\n| 🔗 [jptext-to-emoji](https:\u002F\u002Fgithub.com\u002Felzup\u002Fjptext-to-emoji) | - | - | ⭐ 2 | 🟢 February|\n| 🔗 [japanese.js](https:\u002F\u002Fgithub.com\u002Fhakatashi\u002Fjapanese.js) | - | - | ⭐ 167 | 🔴 August 2020|\n\n### Others\nOther JavaScript libraries for Japanese NLP\n\n * [bangumi-data](https:\u002F\u002Fgithub.com\u002Fbangumi-data\u002Fbangumi-data) - Raw data for Japanese anime\n * [yomichan](https:\u002F\u002Fgithub.com\u002FFooSoft\u002Fyomichan) - Japanese pop-up dictionary extension for Chrome and Firefox.\n * [proofreading-tool](https:\u002F\u002Fgithub.com\u002Fgecko655\u002Fproofreading-tool) - GUI-based document proofreading tool for checking text.\n * [kanjigrid](https:\u002F\u002Fgithub.com\u002Fminosvasilias\u002Fkanjigrid) - A web app that shows the 2200 kanji taught in James Heisig's Remembering the Kanji, 6th edition.\n * [japanese-toolkit](https:\u002F\u002Fgithub.com\u002Fechamudi\u002Fjapanese-toolkit) - Monorepo containing kanji, furigana, Japanese databases, and more\n * [analyze-desumasu-dearu](https:\u002F\u002Fgithub.com\u002Ftextlint-ja\u002Fanalyze-desumasu-dearu) - JavaScript library for analyzing the Japanese polite style (desu-masu) and plain style (dearu)\n * [hatsuon](https:\u002F\u002Fgithub.com\u002FDJTB\u002Fhatsuon) - Japanese pitch accent tools\n * [sentiment_ja_js](https:\u002F\u002Fgithub.com\u002Fotodn\u002Fsentiment_ja_js) - Sentiment analysis for Japanese, implemented in JavaScript\n * [mecab-ipadic-seed](https:\u002F\u002Fgithub.com\u002Ftakuyaa\u002Fmecab-ipadic-seed) - MeCab-ipadic seed dictionary reader\n * [Japanese-Word-Of-The-Day](https:\u002F\u002Fgithub.com\u002FLuanRT\u002FJapanese-Word-Of-The-Day) - Learn a different Japanese word every day.\n * [oskim](https:\u002F\u002Fgithub.com\u002Fesrille\u002Foskim) - Extends the GNOME on-screen keyboard to support input methods\n * [tweetMapping](https:\u002F\u002Fgithub.com\u002Fwtnv-lab\u002FtweetMapping) - A digital archive of geotagged tweets posted within 24 hours after the Great East Japan Earthquake.\n * [pitch-accent](https:\u002F\u002Fgithub.com\u002Fshirakaba\u002Fpitch-accent) - Predicts Japanese pitch accent\n * [kana2ipa](https:\u002F\u002Fgithub.com\u002Famanoese\u002Fkana2ipa) - Command-line tool that converts hiragana or katakana into the International Phonetic Alphabet (IPA) as used for Japanese pronunciation\n * 
[voicevox](https:\u002F\u002Fgithub.com\u002FVOICEVOX\u002Fvoicevox) - Editor for VOICEVOX, a free, medium-quality text-to-speech software\n * [kamiya-codec](https:\u002F\u002Fgithub.com\u002Ffasiha\u002Fkamiya-codec) - Japanese verb conjugator and deconjugator based on Taeko Kamiya's Handbook of Japanese Verbs and Handbook of Japanese Adjectives and Adverbs\n * [closewords](https:\u002F\u002Fgithub.com\u002Fotoneko1102\u002Fclosewords) - Library for finding the most similar word in a set of words, with support for Japanese (including kanji)\n * [japanese-analyzer](https:\u002F\u002Fgithub.com\u002Fcokice\u002Fjapanese-analyzer) - Japanese sentence analyzer\n * [japanese-furigana-normalize](https:\u002F\u002Fgithub.com\u002Fmarvnc\u002Fjapanese-furigana-normalize) - Normalizes Japanese furigana\n * [yama](https:\u002F\u002Fgithub.com\u002Fsapjax\u002Fyama) - Pick up Japanese vocabulary on any website\n * [kaitai](https:\u002F\u002Fgithub.com\u002Fcompile10\u002Fkaitai) - An application that uses AI to analyze the structure of Japanese sentences. It visualizes the relationships between words and phrases in interactive diagrams to show grammatical relationships.\n * [tsukeru-furigana-converter](https:\u002F\u002Fgithub.com\u002Fln2058\u002Ftsukeru-furigana-converter) - Browser extension (Chrome\u002FEdge\u002FFirefox) that injects furigana into Japanese web pages on demand; includes dictionary tooltips, JLPT filtering, and vocabulary list\u002FAnki export.\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [bangumi-data](https:\u002F\u002Fgithub.com\u002Fbangumi-data\u002Fbangumi-data) | 📥 830 | 📦 58k | ⭐ 598 | 🟢 last Wednesday|\n| 🔗 [yomichan](https:\u002F\u002Fgithub.com\u002FFooSoft\u002Fyomichan) | - | - | ⭐ 1.1k | 🔴 February 2023|\n| 🔗 [proofreading-tool](https:\u002F\u002Fgithub.com\u002Fgecko655\u002Fproofreading-tool) | - | - | ⭐ 87 | 🟡 October 2025|\n| 🔗 [kanjigrid](https:\u002F\u002Fgithub.com\u002Fminosvasilias\u002Fkanjigrid) | - | - | ⭐ 44 | 🔴 November 2018|\n| 🔗 [japanese-toolkit](https:\u002F\u002Fgithub.com\u002Fechamudi\u002Fjapanese-toolkit) | - | - | ⭐ 63 | 🔴 January 2023|\n| 🔗 [analyze-desumasu-dearu](https:\u002F\u002Fgithub.com\u002Ftextlint-ja\u002Fanalyze-desumasu-dearu) | 📥 90k | 📦 rate limited by upstream service | ⭐ 18 | 🔴 January 2025|\n| 🔗 [hatsuon](https:\u002F\u002Fgithub.com\u002FDJTB\u002Fhatsuon) | 📥 16 | 📦 911 | ⭐ 38 | 🔴 March 2022|\n| 🔗 [sentiment_ja_js](https:\u002F\u002Fgithub.com\u002Fotodn\u002Fsentiment_ja_js) | - | - | ⭐ 10 | 🔴 December 2021|\n| 🔗 
[mecab-ipadic-seed](https:\u002F\u002Fgithub.com\u002Ftakuyaa\u002Fmecab-ipadic-seed) | 📥 127 | 📦 61k | ⭐ 8 | 🔴 July 2016|\n| 🔗 [Japanese-Word-Of-The-Day](https:\u002F\u002Fgithub.com\u002FLuanRT\u002FJapanese-Word-Of-The-Day) | 📥 1 | 📦 rate limited by upstream service | ⭐ repo not found | 🔴 repo not found|\n| 🔗 [oskim](https:\u002F\u002Fgithub.com\u002Fesrille\u002Foskim) | - | - | ⭐ 2 | 🔴 February 2023|\n| 🔗 [tweetMapping](https:\u002F\u002Fgithub.com\u002Fwtnv-lab\u002FtweetMapping) | - | - | ⭐ 26 | 🟢 March|\n| 🔗 [pitch-accent](https:\u002F\u002Fgithub.com\u002Fshirakaba\u002Fpitch-accent) | 📥 9 | 📦 102 | ⭐ 2 | 🔴 September 2023|\n| 🔗 [kana2ipa](https:\u002F\u002Fgithub.com\u002Famanoese\u002Fkana2ipa) | - | - | ⭐ 17 | 🔴 October 2020|\n| 🔗 [voicevox](https:\u002F\u002Fgithub.com\u002FVOICEVOX\u002Fvoicevox) | - | - | ⭐ 3.1k | 🟢 today|\n| 🔗 [kamiya-codec](https:\u002F\u002Fgithub.com\u002Ffasiha\u002Fkamiya-codec) | - | - | ⭐ 22 | 🟡 May 2025|\n| 🔗 [closewords](https:\u002F\u002Fgithub.com\u002Fotoneko1102\u002Fclosewords) | - | - | ⭐ 4 | 🟢 March|\n| 🔗 [japanese-analyzer](https:\u002F\u002Fgithub.com\u002Fcokice\u002Fjapanese-analyzer) | - | - | ⭐ 714 | 🟡 December 2025|\n| 🔗 [japanese-furigana-normalize](https:\u002F\u002Fgithub.com\u002Fmarvnc\u002Fjapanese-furigana-normalize) | - | - | ⭐ 6 | 🔴 July 2024|\n| 🔗 [yama](https:\u002F\u002Fgithub.com\u002Fsapjax\u002Fyama) | - | - | ⭐ 8 | 🟢 February|\n| 🔗 [kaitai](https:\u002F\u002Fgithub.com\u002Fcompile10\u002Fkaitai) | - | - | ⭐ 1 | 🟢 yesterday|\n| 🔗 [tsukeru-furigana-converter](https:\u002F\u002Fgithub.com\u002Fln2058\u002Ftsukeru-furigana-converter) | - | - | ⭐ 1 | 🟢 March|\n\n\n## Go\n\n### Morphological analysis\nLightweight Japanese morphological analysis libraries in Go\n\n * [kagome](https:\u002F\u002Fgithub.com\u002Fikawaha\u002Fkagome) - Self-contained Japanese morphological analyzer written in pure Go\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [kagome](https:\u002F\u002Fgithub.com\u002Fikawaha\u002Fkagome) | - | - | ⭐ 959 | 🟢 last Friday|\n\n### Others\nGo-based libraries for Japanese text processing\n\n * [ojosama](https:\u002F\u002Fgithub.com\u002Fjiro4989\u002Fojosama) - Converts text into the speaking style of Hyakumantenbara Salome\n * 
[nihongo](https:\u002F\u002Fgithub.com\u002Fgojp\u002Fnihongo) - Japanese dictionary\n * [yomichan-import](https:\u002F\u002Fgithub.com\u002FFooSoft\u002Fyomichan-import) - External dictionary importer for Yomichan.\n * [imas-ime-dic](https:\u002F\u002Fgithub.com\u002Fmaruamyu\u002Fimas-ime-dic) - THE IDOLM@STER vocabulary dictionary for Japanese input methods (by imas-db.jp)\n * [go-kakasi](https:\u002F\u002Fgithub.com\u002Fsarumaj\u002Fgo-kakasi) - Kanji to hiragana\u002Fkatakana\u002Fromaji transliterator implemented in Go\n * [go-moji](https:\u002F\u002Fgithub.com\u002Fktnyt\u002Fgo-moji) - Go library for converting between full-width and half-width characters\n * [ojichat](https:\u002F\u002Fgithub.com\u002Fgreymd\u002Fojichat) - Generates sentences like those a middle-aged man sends on LINE or by email\n * [name](https:\u002F\u002Fgithub.com\u002Fkuniwak\u002Fname) - Japanese name search tool\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [ojosama](https:\u002F\u002Fgithub.com\u002Fjiro4989\u002Fojosama) | - | - | ⭐ 387 | 🟢 March|\n| 🔗 [nihongo](https:\u002F\u002Fgithub.com\u002Fgojp\u002Fnihongo) | - | - | ⭐ 83 | 🔴 February 2024|\n| 🔗 [yomichan-import](https:\u002F\u002Fgithub.com\u002FFooSoft\u002Fyomichan-import) | - | - | ⭐ 86 | 🔴 February 2023|\n| 🔗 [imas-ime-dic](https:\u002F\u002Fgithub.com\u002Fmaruamyu\u002Fimas-ime-dic) | - | - | ⭐ 32 | 🟢 January|\n| 🔗 [go-kakasi](https:\u002F\u002Fgithub.com\u002Fsarumaj\u002Fgo-kakasi) | - | - | ⭐ 6 | 🟢 last Thursday|\n| 🔗 [go-moji](https:\u002F\u002Fgithub.com\u002Fktnyt\u002Fgo-moji) | - | - | ⭐ 20 | 🔴 April 2019|\n| 🔗 [ojichat](https:\u002F\u002Fgithub.com\u002Fgreymd\u002Fojichat) | - | - | ⭐ 1.3k | 🔴 October 2024|\n| 🔗 [name](https:\u002F\u002Fgithub.com\u002Fkuniwak\u002Fname) | - | - | ⭐ 11 | 🔴 January 2025|\n\n\n## Java\n\n### Morphological analysis\nLibraries for Japanese morphological analysis and dictionary management\n\n * [kuromoji](https:\u002F\u002Fgithub.com\u002Fatilika\u002Fkuromoji) - Kuromoji is a self-contained and easy-to-use Japanese morphological analyzer designed for search\n * [Sudachi](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachi) - A Japanese tokenizer for business use\n * [SudachiDict](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachiDict) - A dictionary for Sudachi\n * [meval](https:\u002F\u002Fgithub.com\u002Fteru-oka-1933\u002Fmeval) - MevAL, a system for evaluating the performance of morphological analyzers\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 
[kuromoji](https:\u002F\u002Fgithub.com\u002Fatilika\u002Fkuromoji) | - | - | ⭐ 1k | 🔴 September 2019|\n| 🔗 [Sudachi](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachi) | - | - | ⭐ 953 | 🔴 November 2024|\n| 🔗 [SudachiDict](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachiDict) | - | - | ⭐ 285 | 🟢 January|\n| 🔗 [meval](https:\u002F\u002Fgithub.com\u002Fteru-oka-1933\u002Fmeval) | - | - | ⭐ 7 | 🔴 August 2019|\n\n\n### Others\nJava libraries for Japanese NLP and OCR\n\n * [kanjitomo-ocr](https:\u002F\u002Fgithub.com\u002Fsakarika\u002Fkanjitomo-ocr) - Java library for recognizing Japanese characters in images\n * [jakaroma](https:\u002F\u002Fgithub.com\u002Fnicolas-raoul\u002Fjakaroma) - Java library and command-line tool for transliterating Japanese kanji to romaji\n * [kakasi-java](https:\u002F\u002Fgithub.com\u002Fnicolas-raoul\u002Fkakasi-java) - Kanji to hiragana\u002Fkatakana\u002Fromaji transliterator implemented in Java\n * [Kamite](https:\u002F\u002Fgithub.com\u002Ffauu\u002FKamite) - A desktop language immersion companion for learners of Japanese\n * [react-native-japanese-tokenizer](https:\u002F\u002Fgithub.com\u002Fcraftzdog\u002Freact-native-japanese-tokenizer) - Async Japanese tokenizer native plugin for React Native on iOS and Android\n * [elasticsearch-analysis-japanese](https:\u002F\u002Fgithub.com\u002Fsuguru\u002Felasticsearch-analysis-japanese) - Japanese analyzer for Elasticsearch based on the kuromoji Japanese tokenizer\n * [moji4j](https:\u002F\u002Fgithub.com\u002Fandree-surya\u002Fmoji4j) - Java library for converting between Japanese hiragana, katakana, and romaji\n * [neologdn-java](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fneologdn-java) - Japanese text normalizer for mecab-neologd\n * [elasticsearch-sudachi](https:\u002F\u002Fgithub.com\u002Fworksapplications\u002Felasticsearch-sudachi) - Japanese analysis plugin for Elasticsearch\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [kanjitomo-ocr](https:\u002F\u002Fgithub.com\u002Fsakarika\u002Fkanjitomo-ocr) | - | - | ⭐ 205 | 🔴 May 2021|\n| 🔗 [jakaroma](https:\u002F\u002Fgithub.com\u002Fnicolas-raoul\u002Fjakaroma) | - | - | ⭐ 68 | 🟡 June 2025|\n| 🔗 [kakasi-java](https:\u002F\u002Fgithub.com\u002Fnicolas-raoul\u002Fkakasi-java) | - | - | ⭐ 55 | 🔴 April 2016|\n| 🔗 
[Kamite](https:\u002F\u002Fgithub.com\u002Ffauu\u002FKamite) | - | - | ⭐ 133 | 🔴 March 2025|\n| 🔗 [react-native-japanese-tokenizer](https:\u002F\u002Fgithub.com\u002Fcraftzdog\u002Freact-native-japanese-tokenizer) | - | - | ⭐ 38 | 🔴 June 2023|\n| 🔗 [elasticsearch-analysis-japanese](https:\u002F\u002Fgithub.com\u002Fsuguru\u002Felasticsearch-analysis-japanese) | - | - | ⭐ 29 | 🔴 March 2012|\n| 🔗 [moji4j](https:\u002F\u002Fgithub.com\u002Fandree-surya\u002Fmoji4j) | - | - | ⭐ 33 | 🔴 June 2022|\n| 🔗 [neologdn-java](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fneologdn-java) | - | - | ⭐ 5 | 🟢 February|\n| 🔗 [elasticsearch-sudachi](https:\u002F\u002Fgithub.com\u002Fworksapplications\u002Felasticsearch-sudachi) | - | - | ⭐ 220 | 🟢 last Wednesday|\n\n\n## Pretrained models\n\n### Word2Vec\nModels that convert words into numerical vectors to capture semantic similarity\n\n * [japanese-words-to-vectors](https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fjapanese-words-to-vectors) - Japanese word vector model built with the Word2Vec approach using Gensim and Mecab\n * [chiVe](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FchiVe) - Japanese word embeddings based on Sudachi and NWJC\n * [elmo-japanese](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Felmo-japanese) - ELMo for Japanese\n * [embedrank](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fembedrank) - Python implementation of EmbedRank\n * [aovec](https:\u002F\u002Fgithub.com\u002Feggplants\u002Faovec) - Easy Aozora Bunko Word2Vec builder: build scripts and pretrained models for Word2Vec over all books in Aozora Bunko\n * [dependency-based-japanese-word-embeddings](https:\u002F\u002Fgithub.com\u002Flapras-inc\u002Fdependency-based-japanese-word-embeddings) - This repository accompanies the AI LAB article on dependency-based Japanese word embeddings (article: https:\u002F\u002Fai-lab.lapras.com\u002Fnlp\u002Fjapanese-word-embedding\u002F)\n * [jawikivec](https:\u002F\u002Fgithub.com\u002Fwikiwikification\u002Fjawikivec) - Yet another set of Japanese Wikipedia entity vectors\n * [jawiki_word_vector_updater](https:\u002F\u002Fgithub.com\u002Fkamigaito\u002Fjawiki_word_vector_updater) - Scripts that take the latest Japanese Wikipedia dump, run morphological analysis with MeCab using both the IPA dictionary and the latest Neologd dictionary, and train word2vec, fastText, and GloVe word vector models on the results\n\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 
[japanese-words-to-vectors](https:\u002F\u002Fgithub.com\u002Fphilipperemy\u002Fjapanese-words-to-vectors) | - | - | ⭐ 87 | 🔴 August 2020|\n| 🔗 [chiVe](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FchiVe) | - | - | ⭐ 172 | 🔴 March 2024|\n| 🔗 [elmo-japanese](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Felmo-japanese) | - | - | ⭐ 4 | 🔴 October 2019|\n| 🔗 [embedrank](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fembedrank) | - | - | ⭐ 48 | 🔴 March 2019|\n| 🔗 [aovec](https:\u002F\u002Fgithub.com\u002Feggplants\u002Faovec) | 📥 111 | 📦 82k | ⭐ 3 | 🔴 January 2023|\n| 🔗 [dependency-based-japanese-word-embeddings](https:\u002F\u002Fgithub.com\u002Flapras-inc\u002Fdependency-based-japanese-word-embeddings) | - | - | ⭐ 8 | 🔴 August 2019|\n| 🔗 [jawikivec](https:\u002F\u002Fgithub.com\u002Fwikiwikification\u002Fjawikivec) | - | - | ⭐ 2 | 🔴 November 2018|\n| 🔗 [jawiki_word_vector_updater](https:\u002F\u002Fgithub.com\u002Fkamigaito\u002Fjawiki_word_vector_updater) | - | - | ⭐ 11 | 🔴 May 2020|\n\n### Transformer-based models\nModels that use self-attention to understand context and perform advanced language tasks\n\n * [bert-japanese](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fbert-japanese) - BERT models for Japanese text.\n * [japanese-pretrained-models](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fjapanese-pretrained-models) - Code for producing the Japanese pretrained models provided by rinna.\n * [bert-japanese](https:\u002F\u002Fgithub.com\u002Fyoheikikuta\u002Fbert-japanese) - Japanese BERT model with SentencePiece.\n * [SudachiTra](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachiTra) - Japanese tokenizer for Transformers.\n * [japanese-dialog-transformers](https:\u002F\u002Fgithub.com\u002Fnttcslab\u002Fjapanese-dialog-transformers) - Code for evaluating the Japanese pretrained models provided by NTT laboratories.\n * [shiba](https:\u002F\u002Fgithub.com\u002Foctanove\u002Fshiba) - PyTorch implementation and pretrained Japanese model of CANINE, the efficient character-level transformer.\n * [Dialog](https:\u002F\u002Fgithub.com\u002Freppy4620\u002FDialog) - PyTorch implementation of a Japanese chatbot using BERT and a Transformer decoder.\n * [language-pretraining](https:\u002F\u002Fgithub.com\u002Fretarfi\u002Flanguage-pretraining) - PyTorch implementations of BERT and ELECTRA models for Japanese text.\n * 
[medbertjp](https:\u002F\u002Fgithub.com\u002Fou-medinfo\u002Fmedbertjp) - Trial of a pretrained BERT model for the Japanese medical domain.\n * [ILYS-aoba-chatbot](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FILYS-aoba-chatbot) - The ILYS-aoba chatbot.\n * [t5-japanese](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Ft5-japanese) - Code for pretraining Japanese T5 models.\n * [pytorch_bert_japanese](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fpytorch_bert_japanese) - Use pretrained Japanese BERT models from PyTorch.\n * [Laboro-BERT-Japanese](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FLaboro-BERT-Japanese) - Laboro BERT Japanese: a Japanese BERT model pretrained on a web corpus.\n * [RoBERTa-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FRoBERTa-japanese) - Japanese BERT pretrained model.\n * [aMLP-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FaMLP-japanese) - aMLP Transformer model for Japanese.\n * [bert-japanese-aozora](https:\u002F\u002Fgithub.com\u002Fakirakubo\u002Fbert-japanese-aozora) - Japanese BERT trained on Aozora Bunko and Wikipedia, pre-tokenized with MeCab (UniDic) and SudachiPy.\n * [sbert-ja](https:\u002F\u002Fgithub.com\u002Fcolorfulscoop\u002Fsbert-ja) - Code for training Japanese Sentence-BERT models for the Hugging Face Model Hub.\n * [BERT-Japan-vaccination](https:\u002F\u002Fgithub.com\u002FPatrickJohnRamos\u002FBERT-Japan-vaccination) - Official fine-tuning code for the work on emotion analysis of Japanese tweets and its comparison with vaccinations in Japan.\n * [gpt2-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002Fgpt2-japanese) - Japanese GPT2 generation model.\n * [text2text-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002Ftext2text-japanese) - Text-to-text conversion model based on GPT-2.\n * [gpt-ja](https:\u002F\u002Fgithub.com\u002Fcolorfulscoop\u002Fgpt-ja) - GPT-2 Japanese model for HuggingFace transformers.\n * [friendly_JA-Model](https:\u002F\u002Fgithub.com\u002Fastremo\u002Ffriendly_JA-Model) - Machine translation model trained on the friendly_JA corpus, which aims to make Japanese easier and more accessible to Westerners by using Latin- and English-derived katakana vocabulary in place of standard Sino-Japanese vocabulary.\n * [albert-japanese](https:\u002F\u002Fgithub.com\u002Falinear-corp\u002Falbert-japanese) - Japanese ALBERT model with SentencePiece.\n * [ja_text_bert](https:\u002F\u002Fgithub.com\u002FKosuke-Szk\u002Fja_text_bert) - Repository for producing a BERT pretrained model from a Japanese Wikipedia corpus.\n * 
[DistilBERT-base-jp](https:\u002F\u002Fgithub.com\u002FBandaiNamcoResearchInc\u002FDistilBERT-base-jp) - Japanese DistilBERT pretrained model trained on Wikipedia.\n * [bert](https:\u002F\u002Fgithub.com\u002Finformatix-inc\u002Fbert) - This repository provides snippets for using a RoBERTa model pretrained on a Japanese corpus. The dataset consists of Japanese Wikipedia and web-crawled articles, 25GB in total. The released model is built on the HuggingFace implementation.\n * [Laboro-DistilBERT-Japanese](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FLaboro-DistilBERT-Japanese) - Laboro DistilBERT Japanese.\n * [luke](https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fluke) - LUKE: Language Understanding with Knowledge-based Embeddings.\n * [GPTSAN](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FGPTSAN) - General-purpose Switch Transformer based Japanese language model.\n * [japanese-clip](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fjapanese-clip) - Japanese CLIP model provided by rinna.\n * [AcademicBART](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FAcademicBART) - A BART-based Japanese masked language model pretrained on paper abstracts from the academic database CiNii Articles.\n * [AcademicRoBERTa](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FAcademicRoBERTa) - A RoBERTa-based Japanese masked language model pretrained on paper abstracts from the academic database CiNii Articles.\n * [LINE-DistilBERT-Japanese](https:\u002F\u002Fgithub.com\u002Fline\u002FLINE-DistilBERT-Japanese) - DistilBERT model pretrained on 131 GB of Japanese web text. The teacher model is a BERT-base built in-house at LINE.\n * [Japanese-Alpaca-LoRA](https:\u002F\u002Fgithub.com\u002Fkunishou\u002FJapanese-Alpaca-LoRA) - Links to the low-rank adapters produced by fine-tuning LLaMA on the Stanford Alpaca dataset translated into Japanese, plus sample generation code.\n * [albert-japanese-tinysegmenter](https:\u002F\u002Fgithub.com\u002Fnknytk\u002Falbert-japanese-tinysegmenter) - Pretrained models, code, and guides for pretraining the official ALBERT model (https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Falbert) on Japanese Wikipedia resources.\n * [japanese-llama-experiment](https:\u002F\u002Fgithub.com\u002Flighttransport\u002Fjapanese-llama-experiment) - Japanese LLaMa experiment.\n * [easylightchatassistant](https:\u002F\u002Fgithub.com\u002Fzuntan03\u002Feasylightchatassistant) - EasyLightChatAssistant is an environment for easily trying out LightChatAssistant, a lightweight, uncensored, and unrestricted local Japanese model, with KoboldCpp.\n\n|Name|Weekly downloads|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [bert-japanese](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fbert-japanese) | - | - | ⭐ 544 | 🔴 March 2024|\n| 🔗 
[japanese-pretrained-models](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fjapanese-pretrained-models) | - | - | ⭐ 未找到仓库 | 🔴 未找到仓库|\n| 🔗 [bert-japanese](https:\u002F\u002Fgithub.com\u002Fyoheikikuta\u002Fbert-japanese) | - | - | ⭐ 498 | 🔴 2021年2月|\n| 🔗 [SudachiTra](https:\u002F\u002Fgithub.com\u002FWorksApplications\u002FSudachiTra) | 📥 445 | 📦 16.4万 | ⭐ 79 | 🔴 2023年12月|\n| 🔗 [japanese-dialog-transformers](https:\u002F\u002Fgithub.com\u002Fnttcslab\u002Fjapanese-dialog-transformers) | - | - | ⭐ 245 | 🔴 2023年6月|\n| 🔗 [shiba](https:\u002F\u002Fgithub.com\u002Foctanove\u002Fshiba) | 📥 8 | 📦 7千 | ⭐ 89 | 🔴 2023年11月|\n| 🔗 [Dialog](https:\u002F\u002Fgithub.com\u002Freppy4620\u002FDialog) | - | - | ⭐ 72 | 🔴 2020年10月|\n| 🔗 [language-pretraining](https:\u002F\u002Fgithub.com\u002Fretarfi\u002Flanguage-pretraining) | - | - | ⭐ 50 | 🔴 2023年5月|\n| 🔗 [medbertjp](https:\u002F\u002Fgithub.com\u002Fou-medinfo\u002Fmedbertjp) | - | - | ⭐ 12 | 🔴 2020年11月|\n| 🔗 [ILYS-aoba-chatbot](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FILYS-aoba-chatbot) | - | - | ⭐ 23 | 🔴 2021年10月|\n| 🔗 [t5-japanese](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Ft5-japanese) | - | - | ⭐ 40 | 🔴 2021年9月|\n| 🔗 [pytorch_bert_japanese](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fpytorch_bert_japanese) | - | - | ⭐ 35 | 🔴 2019年6月|\n| 🔗 [Laboro-BERT-Japanese](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FLaboro-BERT-Japanese) | - | - | ⭐ 73 | 🔴 2022年5月|\n| 🔗 [RoBERTa-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FRoBERTa-japanese) | - | - | ⭐ 23 | 🔴 2021年11月|\n| 🔗 [aMLP-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FaMLP-japanese) | - | - | ⭐ 16 | 🔴 2022年5月|\n| 🔗 [bert-japanese-aozora](https:\u002F\u002Fgithub.com\u002Fakirakubo\u002Fbert-japanese-aozora) | - | - | ⭐ 40 | 🔴 2020年8月|\n| 🔗 [sbert-ja](https:\u002F\u002Fgithub.com\u002Fcolorfulscoop\u002Fsbert-ja) | - | - | ⭐ 11 | 🔴 2021年8月|\n| 🔗 
[BERT-Japan-vaccination](https:\u002F\u002Fgithub.com\u002FPatrickJohnRamos\u002FBERT-Japan-vaccination) | - | - | ⭐ 7 | 🔴 2022年5月|\n| 🔗 [gpt2-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002Fgpt2-japanese) | - | - | ⭐ 324 | 🔴 2023年9月|\n| 🔗 [text2text-japanese](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002Ftext2text-japanese) | - | - | ⭐ 33 | 🔴 2021年7月|\n| 🔗 [gpt-ja](https:\u002F\u002Fgithub.com\u002Fcolorfulscoop\u002Fgpt-ja) | - | - | ⭐ 3 | 🔴 2021年9月|\n| 🔗 [friendly_JA-Model](https:\u002F\u002Fgithub.com\u002Fastremo\u002Ffriendly_JA-Model) | - | - | ⭐ 1 | 🔴 2022年5月|\n| 🔗 [albert-japanese](https:\u002F\u002Fgithub.com\u002Falinear-corp\u002Falbert-japanese) | - | - | ⭐ 33 | 🔴 2021年10月|\n| 🔗 [ja_text_bert](https:\u002F\u002Fgithub.com\u002FKosuke-Szk\u002Fja_text_bert) | - | - | ⭐ 115 | 🔴 2018年11月|\n| 🔗 [DistilBERT-base-jp](https:\u002F\u002Fgithub.com\u002FBandaiNamcoResearchInc\u002FDistilBERT-base-jp) | - | - | ⭐ 161 | 🔴 2020年4月|\n| 🔗 [bert](https:\u002F\u002Fgithub.com\u002Finformatix-inc\u002Fbert) | - | - | ⭐ 28 | 🔴 2022年4月|\n| 🔗 [Laboro-DistilBERT-Japanese](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FLaboro-DistilBERT-Japanese) | - | - | ⭐ 16 | 🔴 2020年12月|\n| 🔗 [luke](https:\u002F\u002Fgithub.com\u002Fstudio-ousia\u002Fluke) | - | - | ⭐ 727 | 🔴 2023年6月|\n| 🔗 [GPTSAN](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FGPTSAN) | - | - | ⭐ 118 | 🔴 2023年9月|\n| 🔗 [japanese-clip](https:\u002F\u002Fgithub.com\u002Frinnakk\u002Fjapanese-clip) | - | - | ⭐ 未找到仓库 | 🔴 未找到仓库|\n| 🔗 [AcademicBART](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FAcademicBART) | - | - | ⭐ 2 | 🔴 2024年7月|\n| 🔗 [AcademicRoBERTa](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FAcademicRoBERTa) | - | - | ⭐ 9 | 🔴 2024年9月|\n| 🔗 [LINE-DistilBERT-Japanese](https:\u002F\u002Fgithub.com\u002Fline\u002FLINE-DistilBERT-Japanese) | - | - | ⭐ 46 | 🔴 2023年3月|\n| 🔗 [Japanese-Alpaca-LoRA](https:\u002F\u002Fgithub.com\u002Fkunishou\u002FJapanese-Alpaca-LoRA) | - | - | ⭐ 141 | 🔴 2023年4月|\n| 🔗 
[albert-japanese-tinysegmenter](https:\u002F\u002Fgithub.com\u002Fnknytk\u002Falbert-japanese-tinysegmenter) | - | - | ⭐ 13 | 🔴 2023年9月|\n| 🔗 [japanese-llama-experiment](https:\u002F\u002Fgithub.com\u002Flighttransport\u002Fjapanese-llama-experiment) | - | - | ⭐ 54 | 🟡 2025年12月|\n| 🔗 [easylightchatassistant](https:\u002F\u002Fgithub.com\u002Fzuntan03\u002Feasylightchatassistant) | - | - | ⭐ 44 | 🔴 2024年4月|\n\n## ChatGPT\n用于日语对话和文本生成的ChatGPT及API资源\n\n * [VRChatGPT](https:\u002F\u002Fgithub.com\u002FYuchi-Games\u002FVRChatGPT) - 使用ChatGPT在VRChat中实现聊天功能的程序。\n * [AITuberDegikkoMirii](https:\u002F\u002Fgithub.com\u002FM-gen\u002FAITuberDegikkoMirii) - 开发AITuber的基础部分。\n * [wanna](https:\u002F\u002Fgithub.com\u002Fhirokidaichi\u002Fwanna) - 通过自然语言启动Shell命令的工具。\n * [ChatdollKit](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit) - ChatdollKit可以让您的3D模型变成聊天机器人。\n * [ChuanhuChatGPTJapanese](https:\u002F\u002Fgithub.com\u002Fgyokuro33\u002FChuanhuChatGPTJapanese) - 面向日语用户的ChatGPT API图形界面。\n * [AISisterAIChan](https:\u002F\u002Fgithub.com\u002Fmanju-summoner\u002FAISisterAIChan) - 搭载ChatGPT3.5的伺か幽灵“AI妹妹艾酱”。使用时需另行获取ChatGPT的API密钥。\n * [vrchatbot](https:\u002F\u002Fgithub.com\u002FGeson-anko\u002Fvrchatbot) - 用于在VRChat中创建AI机器人的代码库。\n * [gptuber-by-langchain](https:\u002F\u002Fgithub.com\u002Fkarakuri-ai\u002Fgptuber-by-langchain) - GPT将担任YouTuber。\n * [openai-chatfriend](https:\u002F\u002Fgithub.com\u002Fsupershaneski\u002Fopenai-chatfriend) - 基于Nuxt 3构建的聊天应用，由OpenAI文本补全接口驱动。您可以选择不同性格的AI朋友。默认以日语回应。您可以用这个应用练习日语技能！\n * [chrome-ext-translate-to-hiragana-with-chatgpt](https:\u002F\u002Fgithub.com\u002Ffranzwong\u002Fchrome-ext-translate-to-hiragana-with-chatgpt) - 这款Chrome扩展程序可以利用ChatGPT将选中的日语文本翻译成平假名。\n * [azure-search-openai-demo](https:\u002F\u002Fgithub.com\u002Fnohanaga\u002Fazure-search-openai-demo) - 本示例展示了如何使用检索增强生成模式，针对自有数据打造类似ChatGPT的体验。\n * [chatvrm](https:\u002F\u002Fgithub.com\u002Fpixiv\u002Fchatvrm) - ChatVRM是一个演示应用程序，可在浏览器中轻松与3D角色进行对话。\n * 
[sftly-replace](https:\u002F\u002Fgithub.com\u002Fkmizu\u002Fsftly-replace) - 一款轻柔替换选中文本的Chrome扩展。\n * [summarize_arxv](https:\u002F\u002Fgithub.com\u002Frkmt\u002Fsummarize_arxv) - 使用图表总结arXiv论文。\n * [aiavatarkit](https:\u002F\u002Fgithub.com\u002Fuezo\u002Faiavatarkit) - 快速构建基于AI的对话型虚拟形象。\n * [pva-aoai-integration-solution](https:\u002F\u002Fgithub.com\u002FCity-of-Kobe\u002Fpva-aoai-integration-solution) - 该仓库旨在将神户市政府为试用ChatGPT而制定的工作流程等解决方案公开。\n * [jp-azureopenai-samples](https:\u002F\u002Fgithub.com\u002Fazure-samples\u002Fjp-azureopenai-samples) - 为了提供Azure OpenAI应用实现的参考，免费提供应用程序样本（参考架构、示例代码和部署步骤）。\n * [character_chat](https:\u002F\u002Fgithub.com\u002Fmutaguchi\u002Fcharacter_chat) - 利用OpenAI API，与设定的角色用日语对话的聊天脚本。\n * [chatgpt-slackbot](https:\u002F\u002Fgithub.com\u002Fsifue\u002Fchatgpt-slackbot) - 在Slack上使用OpenAI ChatGPT API的Slack机器人脚本（以日语使用为前提）。\n * [chatgpt-prompt-sample-japanese](https:\u002F\u002Fgithub.com\u002Fdahatake\u002Fchatgpt-prompt-sample-japanese) - ChatGPT提示词的示例。\n * [kanji-flashcard-app-gpt4](https:\u002F\u002Fgithub.com\u002Fadilmoujahid\u002Fkanji-flashcard-app-gpt4) - 使用Python和Langchain构建的日语汉字抽认卡应用，并结合GPT-4的强大智能。\n * [IgakuQA](https:\u002F\u002Fgithub.com\u002Fjungokasai\u002FIgakuQA) - 评估GPT-4和ChatGPT在日本医学执照考试中的表现。\n * [japagen](https:\u002F\u002Fgithub.com\u002Fretrieva\u002Fjapagen) - 探讨在日语任务中使用大语言模型生成伪学习数据。\n * [generativeai-prompt-sample-japanese](https:\u002F\u002Fgithub.com\u002Fdahatake\u002Fgenerativeai-prompt-sample-japanese) - 面向ChatGPT、Copilot等各种生成式AI的“日语”提示词示例。\n\n\n|名称|每周下载量|总下载量|星标数|最近一次提交|\n-|-|-|-|-\n| 🔗 [VRChatGPT](https:\u002F\u002Fgithub.com\u002FYuchi-Games\u002FVRChatGPT) | - | - | ⭐ 15 | 🔴 2023年3月|\n| 🔗 [AITuberDegikkoMirii](https:\u002F\u002Fgithub.com\u002FM-gen\u002FAITuberDegikkoMirii) | - | - | ⭐ 5 | 🔴 2023年3月|\n| 🔗 [wanna](https:\u002F\u002Fgithub.com\u002Fhirokidaichi\u002Fwanna) | 📥 68 | 📦 2万 | ⭐ 142 | 🔴 2023年4月|\n| 🔗 [ChatdollKit](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit) | - | - | ⭐ 1.1千 | 
🟢 3月|\n| 🔗 [ChuanhuChatGPTJapanese](https:\u002F\u002Fgithub.com\u002Fgyokuro33\u002FChuanhuChatGPTJapanese) | - | - | ⭐ 1 | 🔴 2023年3月|\n| 🔗 [AISisterAIChan](https:\u002F\u002Fgithub.com\u002Fmanju-summoner\u002FAISisterAIChan) | - | - | ⭐ 26 | 🔴 2023年5月|\n| 🔗 [vrchatbot](https:\u002F\u002Fgithub.com\u002FGeson-anko\u002Fvrchatbot) | - | - | ⭐ 29 | 🔴 2022年12月|\n| 🔗 [gptuber-by-langchain](https:\u002F\u002Fgithub.com\u002Fkarakuri-ai\u002Fgptuber-by-langchain) | - | - | ⭐ 63 | 🔴 2023年1月|\n| 🔗 [openai-chatfriend](https:\u002F\u002Fgithub.com\u002Fsupershaneski\u002Fopenai-chatfriend) | - | - | ⭐ 16 | 🔴 2023年4月|\n| 🔗 [chrome-ext-translate-to-hiragana-with-chatgpt](https:\u002F\u002Fgithub.com\u002Ffranzwong\u002Fchrome-ext-translate-to-hiragana-with-chatgpt) | - | - | ⭐ 1 | 🔴 2023年4月|\n| 🔗 [azure-search-openai-demo](https:\u002F\u002Fgithub.com\u002Fnohanaga\u002Fazure-search-openai-demo) | - | - | ⭐ 46 | 🔴 2023年12月|\n| 🔗 [chatvrm](https:\u002F\u002Fgithub.com\u002Fpixiv\u002Fchatvrm) | - | - | ⭐ 834 | 🟡 2025年5月|\n| 🔗 [sftly-replace](https:\u002F\u002Fgithub.com\u002Fkmizu\u002Fsftly-replace) | - | - | ⭐ 4 | 🔴 2023年5月|\n| 🔗 [summarize_arxv](https:\u002F\u002Fgithub.com\u002Frkmt\u002Fsummarize_arxv) | - | - | ⭐ 173 | 🔴 2023年5月|\n| 🔗 [aiavatarkit](https:\u002F\u002Fgithub.com\u002Fuezo\u002Faiavatarkit) | - | - | ⭐ 573 | 🟢 昨天|\n| 🔗 [pva-aoai-integration-solution](https:\u002F\u002Fgithub.com\u002FCity-of-Kobe\u002Fpva-aoai-integration-solution) | - | - | ⭐ 未找到仓库 | 🔴 未找到仓库|\n| 🔗 [jp-azureopenai-samples](https:\u002F\u002Fgithub.com\u002Fazure-samples\u002Fjp-azureopenai-samples) | - | - | ⭐ 280 | 🟢 3月|\n| 🔗 [character_chat](https:\u002F\u002Fgithub.com\u002Fmutaguchi\u002Fcharacter_chat) | - | - | ⭐ 16 | 🔴 2023年6月|\n| 🔗 [chatgpt-slackbot](https:\u002F\u002Fgithub.com\u002Fsifue\u002Fchatgpt-slackbot) | - | - | ⭐ 64 | 🔴 2024年7月|\n| 🔗 [chatgpt-prompt-sample-japanese](https:\u002F\u002Fgithub.com\u002Fdahatake\u002Fchatgpt-prompt-sample-japanese) | - | - | ⭐ 428 | 🟢 
上周四|\n| 🔗 [kanji-flashcard-app-gpt4](https:\u002F\u002Fgithub.com\u002Fadilmoujahid\u002Fkanji-flashcard-app-gpt4) | - | - | ⭐ 6 | 🔴 2023年10月|\n| 🔗 [IgakuQA](https:\u002F\u002Fgithub.com\u002Fjungokasai\u002FIgakuQA) | - | - | ⭐ 49 | 🔴 2023年3月|\n| 🔗 [japagen](https:\u002F\u002Fgithub.com\u002Fretrieva\u002Fjapagen) | - | - | ⭐ 1 | 🔴 2024年10月|\n| 🔗 [generativeai-prompt-sample-japanese](https:\u002F\u002Fgithub.com\u002Fdahatake\u002Fgenerativeai-prompt-sample-japanese) | - | - | ⭐ 428 | 🟢 上周四|\n\n\n## 字典与输入法\n日语词典和输入法编辑器（IME）相关资源\n\n* [mecab-ipadic-neologd](https:\u002F\u002Fgithub.com\u002Fneologd\u002Fmecab-ipadic-neologd) - 基于网络语言资源的、适用于 mecab-ipadic 的新词词典\n * [tdmelodic](https:\u002F\u002Fgithub.com\u002FPKSHATechnology-Research\u002Ftdmelodic) - 日语重音（音调）词典生成器\n * [jamdict](https:\u002F\u002Fgithub.com\u002Fneocl\u002Fjamdict) - 用于操作 Jim Breen 的 JMdict、KanjiDic2、JMnedict 及汉字部首映射的 Python 3 库\n * [unidic-py](https:\u002F\u002Fgithub.com\u002Fpolm\u002Funidic-py) - 将 Unidic 打包为可通过 pip 安装的格式。\n * [Japanese-Company-Lexicon](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002FJapanese-Company-Lexicon) - 日本公司词典（JCLdic）\n * [manbyo-sudachi](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fmanbyo-sudachi) - 面向 Sudachi 的万病词典\n * [jawiki-kana-kanji-dict](https:\u002F\u002Fgithub.com\u002Ftokuhirom\u002Fjawiki-kana-kanji-dict) - 从维基百科（日文版）生成 SKK\u002FMeCab 词典\n * [JIWC-Dictionary](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FJIWC-Dictionary) - 用于查找与文本相关情感的词典\n * [JumanDIC](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FJumanDIC) - 该仓库包含用于构建 JUMAN 和 Juman++ 词典的源词典文件。\n * [ipadic-py](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fipadic-py) - 将 IPAdic 打包以便于从 Python 中使用。\n * [unidic-lite](https:\u002F\u002Fgithub.com\u002Fpolm\u002Funidic-lite) - 便于通过 pip 安装的 UniDic 精简版。\n * [emoji-ime-dictionary](https:\u002F\u002Fgithub.com\u002Fpeaceiris\u002Femoji-ime-dictionary) - 用于以日语输入表情符号的 IME 扩展词典，可在 Google 日语输入等工具中实现日语到表情符号的转换。\n * 
[google-ime-dictionary](https:\u002F\u002Fgithub.com\u002Fpeaceiris\u002Fgoogle-ime-dictionary) - 用于日英互译及英语缩略语展开的 IME 扩展词典，可在 Google 日语输入、ATOK 等工具中实现日语到英语的翻译以及英语缩略语的展开。\n * [dic-nico-intersection-pixiv](https:\u002F\u002Fgithub.com\u002Fncaq\u002Fdic-nico-intersection-pixiv) - 尼古尼古大百科与 Pixiv 百科事典共有的 IME 词典\n * [google-ime-user-dictionary-ja-en](https:\u002F\u002Fgithub.com\u002FKEINOS\u002Fgoogle-ime-user-dictionary-ja-en) - Google IME 用户词典项目的归档，内容为片假名词汇（日语外来语）到英语的映射。\n * [emoticon](https:\u002F\u002Fgithub.com\u002Ftiwanari\u002Femoticon) - Google 日本语输入的表情符号词典∩(,,Ò‿Ó,,)∩\n * [mecab-mozcdic](https:\u002F\u002Fgithub.com\u002Fakirakubo\u002Fmecab-mozcdic) - 将开源 Mozc 词典转换为 MeCab 词典格式。\n * [denonbu-ime-dic](https:\u002F\u002Fgithub.com\u002Falbno273\u002Fdenonbu-ime-dic) - 电音 IME：面向 Microsoft IME 等工具设计的“电音部”相关术语词典\n * [nijisanji-ime-dic](https:\u002F\u002Fgithub.com\u002FUmichang\u002Fnijisanji-ime-dic) - 面向 Microsoft IME 等工具设计的“彩虹社”相关术语词典。\n * [pokemon-ime-dic](https:\u002F\u002Fgithub.com\u002FUmichang\u002Fpokemon-ime-dic) - 面向 Microsoft IME 等工具设计的、涵盖目前已知所有宝可梦名称的术语词典。\n * [EJDict](https:\u002F\u002Fgithub.com\u002Fkujirahand\u002FEJDict) - 英日词典数据（公共领域）EJDict-hand\n * [Ayashiy-Nipongo-Dic](https:\u002F\u002Fgithub.com\u002FRinrin0413\u002FAyashiy-Nipongo-Dic) - 使用贵样ばこゐ辞畫可以正确使用日语。\n * [genshin-dict](https:\u002F\u002Fgithub.com\u002Fkotofurumiya\u002Fgenshin-dict) - 适用于 Windows\u002FmacOS 的原神单词词典。\n * [jmdict-simplified](https:\u002F\u002Fgithub.com\u002Fscriptin\u002Fjmdict-simplified) - 以 JSON 格式提供的 JMdict 和 JMnedict 数据\n * [mozcdict-ext](https:\u002F\u002Fgithub.com\u002Freasonset\u002Fmozcdict-ext) - 将外部词汇转换为 Mozc 系统词典\n * [mh-dict-jp](https:\u002F\u002Fgithub.com\u002Futubo\u002Fmh-dict-jp) - 想制作怪物猎人用户词典…\n * [jitenbot](https:\u002F\u002Fgithub.com\u002Fstephenmk\u002Fjitenbot) - 将日本词典网站和应用中的数据转换为便携式文件格式\n * [mecab-unidic-neologd](https:\u002F\u002Fgithub.com\u002Fneologd\u002Fmecab-unidic-neologd) - 基于网络语言资源的、适用于 mecab-unidic 的新词词典\n * 
[hololive-dictionary](https:\u002F\u002Fgithub.com\u002Fheppokofrontend\u002Fhololive-dictionary) - 关于 Hololive（Hololive Production）的词典文件。可使用 .\u002Fdictionary 文件夹内的文本文件将词汇添加到 IME 中。详情请参阅 README.md。\n * [jmdict-yomitan](https:\u002F\u002Fgithub.com\u002Fthemoeway\u002Fjmdict-yomitan) - 为 Yomitan\u002FYomichan 提供的 JMdict、JMnedict 和 KANJIDIC 数据。\n * [yomichan-jlpt-vocab](https:\u002F\u002Fgithub.com\u002Fstephenmk\u002Fyomichan-jlpt-vocab) - 为 Yomichan 中的词汇添加 JLPT 等级标签\n * [Jitendex](https:\u002F\u002Fgithub.com\u002Fstephenmk\u002FJitendex) - 免费且开放许可的日英词典，兼容多种词典客户端\n * [jiten](https:\u002F\u002Fgithub.com\u002Fobfusk\u002Fjiten) - 基于 jmdict\u002Fkanjidic 的日语 Android\u002FCLI\u002FWeb 词典：日本語　辞典　和英辞典　漢英字典　和独辞典　和蘭辞典\n * [pixiv-yomitan](https:\u002F\u002Fgithub.com\u002FMarvNC\u002Fpixiv-yomitan) - 为 Yomitan 准备的 Pixiv 百科事典词典\n * [uchinaaguchi_dict](https:\u002F\u002Fgithub.com\u002Fnanjakkun\u002Fuchinaaguchi_dict) - 冲绳方言词典\n * [yomitan-dictionaries](https:\u002F\u002Fgithub.com\u002Fmarvnc\u002Fyomitan-dictionaries) - 为 Yomitan 准备的日语和中文词典。\n * [mouse_over_dictionary](https:\u002F\u002Fgithub.com\u002Fkengo700\u002Fmouse_over_dictionary) - 自动读取鼠标悬停单词的通用词典工具\n * [jisyo](https:\u002F\u002Fgithub.com\u002Fskk-dict\u002Fjisyo) - 为假名汉字转换引擎 SKK 设计的新词典格式\n * [skk-jisyo.emoji-ja](https:\u002F\u002Fgithub.com\u002Fymrl\u002Fskk-jisyo.emoji-ja) - 用于将日语读音转换为 Emoji 的 SKK 词典 😂\n * [anthy](https:\u002F\u002Fgithub.com\u002Fnetsphere-labs\u002Fanthy) - Anthy 是一款日语假名汉字转换引擎。它能将罗马字转换为假名，并将假名文本转换为假名与汉字混合的文本。\n * [aws_dic_for_google_ime](https:\u002F\u002Fgithub.com\u002Fkonyu\u002Faws_dic_for_google_ime) - 适用于 Google 日语输入的 AWS 服务名称词典\n * [cl-skkserv](https:\u002F\u002Fgithub.com\u002Ftani\u002Fcl-skkserv) - 用 Common Lisp 编写的 SKK 词典服务器及其扩展\n * [anthy](https:\u002F\u002Fgithub.com\u002Fxorgy\u002Fanthy) - Anthy 的维护仓库\n * [anthy-unicode](https:\u002F\u002Fgithub.com\u002Ffujiwarat\u002Fanthy-unicode) - Anthy Unicode - 另一个 Anthy\n * 
[azooKey](https:\u002F\u002Fgithub.com\u002Fensan-hcl\u002FazooKey) - azooKey：完全用 Swift 开发的日语键盘 iOS 应用程序\n * [azookey-desktop](https:\u002F\u002Fgithub.com\u002Fensan-hcl\u002Fazookey-desktop) - 支持 macOS 的桌面端日语输入法“azooKey”\n * [fcitx5-hazkey](https:\u002F\u002Fgithub.com\u002F7ka-hiira\u002Ffcitx5-hazkey) - 基于 azooKey 引擎的 fcitx5 日语输入法\n * [mozcdic-ut-place-names](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-place-names) - Mozc UT 地名词典是由日本邮政的 ZIP 码数据转换而来的 Mozc 词典。\n * [azookeykanakanjiconverter](https:\u002F\u002Fgithub.com\u002Fensan-hcl\u002Fazookeykanakanjiconverter) - 用 Swift 编写的假名-汉字转换模块\n * [libkkc](https:\u002F\u002Fgithub.com\u002Fueno\u002Flibkkc) - 日语假名汉字转换输入法库\n * [libskk](https:\u002F\u002Fgithub.com\u002Fueno\u002Flibskk) - 日语 SKK 输入法库\n * [kanayomi-dict](https:\u002F\u002Fgithub.com\u002Fwarihima\u002Fkanayomi-dict) - openjtalk 格式的用户词典\n * [cjkvi-dict](https:\u002F\u002Fgithub.com\u002Fcjkvi\u002Fcjkvi-dict) - 汉字数据库相关的词典数据\n * [wlsp-classical](https:\u002F\u002Fgithub.com\u002Fyocjyet\u002Fwlsp-classical) - 古典日语分类词汇表数据\n * [kanji-dict](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Fkanji-dict) - 这是一本用于查询汉字笔顺、读音、笔画数、部首、例句及字源的汉字词典。收录了 Unicode 15.1 中的所有 98,682 个汉字。\n * [Kaomoji_proj](https:\u002F\u002Fgithub.com\u002Fmtripg6666tdr\u002FKaomoji_proj) - (๑ ᴖ ᴑ ᴖ ๑)みょんかおもじ（旧 Kaomoji_proj）是一个为微软公司的输入软件 Microsoft IME 制作表情符号词典的项目。\n * [kotlin-kana-kanji-converter](https:\u002F\u002Fgithub.com\u002FKazumaProject\u002Fkotlin-kana-kanji-converter) - Kotlin 假名汉字转换程序\n * [alfred-japanese-dictionary](https:\u002F\u002Fgithub.com\u002Fchrisgrieser\u002Falfred-japanese-dictionary) - 使用 jisho.org 的日英词典，附带音频、条目 CSV 导出及词典网站预览功能。\n * [ichiran](https:\u002F\u002Fgithub.com\u002Ftshatrov\u002Fichiran) - 用于日语文本的语言学工具\n * [mikan](https:\u002F\u002Fgithub.com\u002Fmojyack\u002Fmikan) - 一种日语输入法。\n * [colloquial-kansai-dictionary](https:\u002F\u002Fgithub.com\u002Fsethclydesdale\u002Fcolloquial-kansai-dictionary) - 用于快速参考口语关西方言课程所学内容的词典。\n * 
[jisho-open](https:\u002F\u002Fgithub.com\u002Fhlorenzi\u002Fjisho-open) - JMdict 日英词典项目的网页前端，支持学习列表功能！\n * [macskk](https:\u002F\u002Fgithub.com\u002Fmtgto\u002Fmacskk) - 又一款 macOS 版 SKK 输入法\n * [nandoku](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Fnandoku) - 这是一本按年级整理的难读汉字词典。\n * [japanese_android_ime](https:\u002F\u002Fgithub.com\u002Fnelsonapenn\u002Fjapanese_android_ime) - 一款面向 Android 的 FOSS 日语 IME\n * [anthywl](https:\u002F\u002Fgithub.com\u002Ftadeokondrak\u002Fanthywl) - 使用 libanthy 的 Sway 日语输入法\n * [sekka](https:\u002F\u002Fgithub.com\u002Fkiyoka\u002Fsekka) - 另一款受 SKK 启发的日语输入法。\n * [sumibi](https:\u002F\u002Fgithub.com\u002Fkiyoka\u002Fsumibi) - 基于 ChatGPT API 的日语输入法\n * [jinmei-dict](https:\u002F\u002Fgithub.com\u002Fs1r-j\u002Fjinmei-dict) - 从词典数据中提取人名，并以假名（片假名）为键，将候选书写形式以列表形式整理成 JSON 格式。\n * [japanesekeyboard](https:\u002F\u002Fgithub.com\u002Fkazumaproject\u002Fjapanesekeyboard) - スミレ 完全离线的日语键盘应用\n * [japanesearabic](https:\u002F\u002Fgithub.com\u002Fa-hamdi\u002Fjapanesearabic) - 日阿词典（日语・阿拉伯语辞书） قاموس اللغة اليابانية والعربية (Yomitan)\n * [o-dic](https:\u002F\u002Fgithub.com\u002Fmakotoga\u002Fo-dic) - 冲绳词典\n * [skk-emoji-jisyo](https:\u002F\u002Fgithub.com\u002Fuasi\u002Fskk-emoji-jisyo) - SKK 表情符号词典\n * [mozcdic-ut-personal-names](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-personal-names) - 用于 Mozc 的个人姓名词典。\n * [mozcdic-ut-sudachidict](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-sudachidict) - 由 SudachiDict 转换而来的 Mozc 词典。\n * [nihongo](https:\u002F\u002Fgithub.com\u002Fsph-mn\u002Fnihongo) - 日语语言数据和词典\n * [kagome-dict](https:\u002F\u002Fgithub.com\u002Fikawaha\u002Fkagome-dict) - Kagome v2 的词典库\n * [canna](https:\u002F\u002Fgithub.com\u002Fcanna-input\u002Fcanna) - Canna 日语输入系统\n * [kansai-accent-dictionary](https:\u002F\u002Fgithub.com\u002Fnullponull\u002Fkansai-accent-dictionary) - 京阪式口音（关西方言）词典 - 收录了 4,615 个词汇的日语方言口音词典\n * [jitendex](https:\u002F\u002Fgithub.com\u002Fjitendex\u002Fjitendex) - 
一款免费、离线且开放许可的日英词典。每月更新！\n * [karukan](https:\u002F\u002Fgithub.com\u002Ftogatoga\u002Fkarukan) - 面向 Linux 的日语输入法系统，结合神经网络假名-汉字转换引擎和 fcitx5 IME\n * [shitto-mania-dic](https:\u002F\u002Fgithub.com\u002Fjunikematsu\u002Fshitto-mania-dic) - 嫉妒词典（Shitto-Mania \u002F Jealousy Dictionary）\n\n|名称|每周下载量|总下载量|星标数|最近提交|\n-|-|-|-|-\n| 🔗 [mecab-ipadic-neologd](https:\u002F\u002Fgithub.com\u002Fneologd\u002Fmecab-ipadic-neologd) | - | - | ⭐ 2.8k | 🔴 2020年9月|\n| 🔗 [tdmelodic](https:\u002F\u002Fgithub.com\u002FPKSHATechnology-Research\u002Ftdmelodic) | - | - | ⭐ 124 | 🔴 2024年3月|\n| 🔗 [jamdict](https:\u002F\u002Fgithub.com\u002Fneocl\u002Fjamdict) | 📥 337 | 📦 5.4万 | ⭐ 168 | 🔴 2021年6月|\n| 🔗 [unidic-py](https:\u002F\u002Fgithub.com\u002Fpolm\u002Funidic-py) | 📥 7.2万 | 📦 1000万 | ⭐ 109 | 🔴 2025年2月|\n| 🔗 [Japanese-Company-Lexicon](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002FJapanese-Company-Lexicon) | - | - | ⭐ 100 | 🔴 2023年1月|\n| 🔗 [manbyo-sudachi](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fmanbyo-sudachi) | - | - | ⭐ 7 | 🔴 2021年4月|\n| 🔗 [jawiki-kana-kanji-dict](https:\u002F\u002Fgithub.com\u002Ftokuhirom\u002Fjawiki-kana-kanji-dict) | - | - | ⭐ 61 | 🟢 上周二|\n| 🔗 [JIWC-Dictionary](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FJIWC-Dictionary) | - | - | ⭐ 40 | 🔴 2021年1月|\n| 🔗 [JumanDIC](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FJumanDIC) | - | - | ⭐ 4 | 🔴 2022年8月|\n| 🔗 [ipadic-py](https:\u002F\u002Fgithub.com\u002Fpolm\u002Fipadic-py) | 📥 3.2万 | 📦 700万 | ⭐ 24 | 🔴 2021年10月|\n| 🔗 [unidic-lite](https:\u002F\u002Fgithub.com\u002Fpolm\u002Funidic-lite) | 📥 7.8万 | 📦 1000万 | ⭐ 49 | 🔴 2020年9月|\n| 🔗 [emoji-ime-dictionary](https:\u002F\u002Fgithub.com\u002Fpeaceiris\u002Femoji-ime-dictionary) | - | - | ⭐ 366 | 🔴 2023年1月|\n| 🔗 [google-ime-dictionary](https:\u002F\u002Fgithub.com\u002Fpeaceiris\u002Fgoogle-ime-dictionary) | - | - | ⭐ 104 | 🔴 2023年1月|\n| 🔗 [dic-nico-intersection-pixiv](https:\u002F\u002Fgithub.com\u002Fncaq\u002Fdic-nico-intersection-pixiv) | - | - | ⭐ 83 | 🔴 
2024年9月|\n| 🔗 [google-ime-user-dictionary-ja-en](https:\u002F\u002Fgithub.com\u002FKEINOS\u002Fgoogle-ime-user-dictionary-ja-en) | - | - | ⭐ 58 | 🔴 2016年12月|\n| 🔗 [emoticon](https:\u002F\u002Fgithub.com\u002Ftiwanari\u002Femoticon) | - | - | ⭐ 44 | 🔴 2020年5月|\n| 🔗 [mecab-mozcdic](https:\u002F\u002Fgithub.com\u002Fakirakubo\u002Fmecab-mozcdic) | - | - | ⭐ 10 | 🔴 2018年1月|\n| 🔗 [denonbu-ime-dic](https:\u002F\u002Fgithub.com\u002Falbno273\u002Fdenonbu-ime-dic) | - | - | ⭐ 2 | 🔴 2022年11月|\n| 🔗 [nijisanji-ime-dic](https:\u002F\u002Fgithub.com\u002FUmichang\u002Fnijisanji-ime-dic) | - | - | ⭐ 38 | 🟢 3月|\n| 🔗 [pokemon-ime-dic](https:\u002F\u002Fgithub.com\u002FUmichang\u002Fpokemon-ime-dic) | - | - | ⭐ 0 | 🔴 2020年1月|\n| 🔗 [EJDict](https:\u002F\u002Fgithub.com\u002Fkujirahand\u002FEJDict) | - | - | ⭐ 254 | 🟡 2025年11月|\n| 🔗 [Ayashiy-Nipongo-Dic](https:\u002F\u002Fgithub.com\u002FRinrin0413\u002FAyashiy-Nipongo-Dic) | - | - | ⭐ 26 | 🔴 2024年5月|\n| 🔗 [genshin-dict](https:\u002F\u002Fgithub.com\u002Fkotofurumiya\u002Fgenshin-dict) | - | - | ⭐ 126 | 🟢 2月|\n| 🔗 [jmdict-simplified](https:\u002F\u002Fgithub.com\u002Fscriptin\u002Fjmdict-simplified) | - | - | ⭐ 349 | 🟢 上周一|\n| 🔗 [mozcdict-ext](https:\u002F\u002Fgithub.com\u002Freasonset\u002Fmozcdict-ext) | - | - | ⭐ 69 | 🟡 2025年9月|\n| 🔗 [mh-dict-jp](https:\u002F\u002Fgithub.com\u002Futubo\u002Fmh-dict-jp) | - | - | ⭐ 5 | 🟡 2025年4月|\n| 🔗 [jitenbot](https:\u002F\u002Fgithub.com\u002Fstephenmk\u002Fjitenbot) | - | - | ⭐ 仓库未找到 | 🔴 仓库未找到|\n| 🔗 [mecab-unidic-neologd](https:\u002F\u002Fgithub.com\u002Fneologd\u002Fmecab-unidic-neologd) | - | - | ⭐ 87 | 🔴 2020年9月|\n| 🔗 [hololive-dictionary](https:\u002F\u002Fgithub.com\u002Fheppokofrontend\u002Fhololive-dictionary) | - | - | ⭐ 24 | 🔴 2024年12月|\n| 🔗 [jmdict-yomitan](https:\u002F\u002Fgithub.com\u002Fthemoeway\u002Fjmdict-yomitan) | - | - | ⭐ 259 | 🟢 2月|\n| 🔗 [yomichan-jlpt-vocab](https:\u002F\u002Fgithub.com\u002Fstephenmk\u002Fyomichan-jlpt-vocab) | - | - | ⭐ 126 | 🟡 2025年8月|\n| 🔗 
[Jitendex](https:\u002F\u002Fgithub.com\u002Fstephenmk\u002FJitendex) | - | - | ⭐ 466 | 🟢 今天|\n| 🔗 [jiten](https:\u002F\u002Fgithub.com\u002Fobfusk\u002Fjiten) | - | - | ⭐ 129 | 🔴 2023年12月|\n| 🔗 [pixiv-yomitan](https:\u002F\u002Fgithub.com\u002FMarvNC\u002Fpixiv-yomitan) | - | - | ⭐ 55 | 🟢 3月|\n| 🔗 [uchinaaguchi_dict](https:\u002F\u002Fgithub.com\u002Fnanjakkun\u002Fuchinaaguchi_dict) | - | - | ⭐ 4 | 🟢 上周一|\n| 🔗 [yomitan-dictionaries](https:\u002F\u002Fgithub.com\u002Fmarvnc\u002Fyomitan-dictionaries) | - | - | ⭐ 755 | 🟢 3月|\n| 🔗 [mouse_over_dictionary](https:\u002F\u002Fgithub.com\u002Fkengo700\u002Fmouse_over_dictionary) | - | - | ⭐ 72 | 🔴 2020年1月|\n| 🔗 [jisyo](https:\u002F\u002Fgithub.com\u002Fskk-dict\u002Fjisyo) | - | - | ⭐ 28 | 🔴 2023年9月|\n| 🔗 [skk-jisyo.emoji-ja](https:\u002F\u002Fgithub.com\u002Fymrl\u002Fskk-jisyo.emoji-ja) | - | - | ⭐ 30 | 🔴 2018年3月|\n| 🔗 [aws_dic_for_google_ime](https:\u002F\u002Fgithub.com\u002Fkonyu\u002Faws_dic_for_google_ime) | - | - | ⭐ 7 | 🔴 2019年11月|\n| 🔗 [cl-skkserv](https:\u002F\u002Fgithub.com\u002Ftani\u002Fcl-skkserv) | - | - | ⭐ 31 | 🔴 2024年10月|\n| 🔗 [anthy](https:\u002F\u002Fgithub.com\u002Fxorgy\u002Fanthy) | - | - | ⭐ 3 | 🔴 2013年7月|\n| 🔗 [anthy-unicode](https:\u002F\u002Fgithub.com\u002Ffujiwarat\u002Fanthy-unicode) | - | - | ⭐ 42 | 🟢 3月|\n| 🔗 [azooKey](https:\u002F\u002Fgithub.com\u002Fensan-hcl\u002FazooKey) | - | - | ⭐ 684 | 🟢 昨天|\n| 🔗 [azookey-desktop](https:\u002F\u002Fgithub.com\u002Fensan-hcl\u002Fazookey-desktop) | - | - | ⭐ 876 | 🟢 上周一|\n| 🔗 [fcitx5-hazkey](https:\u002F\u002Fgithub.com\u002F7ka-hiira\u002Ffcitx5-hazkey) | - | - | ⭐ 183 | 🟢 2月|\n| 🔗 [mozcdic-ut-place-names](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-place-names) | - | - | ⭐ 22 | 🟢 上周四|\n| 🔗 [azookeykanakanjiconverter](https:\u002F\u002Fgithub.com\u002Fensan-hcl\u002Fazookeykanakanjiconverter) | - | - | ⭐ 139 | 🟢 上周二|\n| 🔗 [libkkc](https:\u002F\u002Fgithub.com\u002Fueno\u002Flibkkc) | - | - | ⭐ 112 | 🔴 2024年8月|\n| 🔗 
[libskk](https:\u002F\u002Fgithub.com\u002Fueno\u002Flibskk) | - | - | ⭐ 100 | 🟢 3月|\n| 🔗 [kanayomi-dict](https:\u002F\u002Fgithub.com\u002Fwarihima\u002Fkanayomi-dict) | - | - | ⭐ 仓库未找到 | 🔴 仓库未找到|\n| 🔗 [cjkvi-dict](https:\u002F\u002Fgithub.com\u002Fcjkvi\u002Fcjkvi-dict) | - | - | ⭐ 110 | 🔴 2017年9月|\n| 🔗 [wlsp-classical](https:\u002F\u002Fgithub.com\u002Fyocjyet\u002Fwlsp-classical) | - | - | ⭐ 2 | 🟡 2025年11月|\n| 🔗 [kanji-dict](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Fkanji-dict) | - | - | ⭐ 6 | 🟢 3月|\n| 🔗 [Kaomoji_proj](https:\u002F\u002Fgithub.com\u002Fmtripg6666tdr\u002FKaomoji_proj) | - | - | ⭐ 11 | 🟡 2025年10月|\n| 🔗 [kotlin-kana-kanji-converter](https:\u002F\u002Fgithub.com\u002FKazumaProject\u002Fkotlin-kana-kanji-converter) | - | - | ⭐ 5 | 🟢 上周三|\n| 🔗 [alfred-japanese-dictionary](https:\u002F\u002Fgithub.com\u002Fchrisgrieser\u002Falfred-japanese-dictionary) | - | - | ⭐ 6 | 🟢 2月|\n| 🔗 [ichiran](https:\u002F\u002Fgithub.com\u002Ftshatrov\u002Fichiran) | - | - | ⭐ 390 | 🟢 1月|\n| 🔗 [mikan](https:\u002F\u002Fgithub.com\u002Fmojyack\u002Fmikan) | - | - | ⭐ 24 | 🟡 2025年6月|\n| 🔗 [colloquial-kansai-dictionary](https:\u002F\u002Fgithub.com\u002Fsethclydesdale\u002Fcolloquial-kansai-dictionary) | - | - | ⭐ 9 | 🟢 2月|\n| 🔗 [jisho-open](https:\u002F\u002Fgithub.com\u002Fhlorenzi\u002Fjisho-open) | - | - | ⭐ 57 | 🟢 2月|\n| 🔗 [macskk](https:\u002F\u002Fgithub.com\u002Fmtgto\u002Fmacskk) | - | - | ⭐ 287 | 🟢 今天|\n| 🔗 [nandoku](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Fnandoku) | - | - | ⭐ 1 | 🟢 2月|\n| 🔗 [japanese_android_ime](https:\u002F\u002Fgithub.com\u002Fnelsonapenn\u002Fjapanese_android_ime) | - | - | ⭐ 2 | 🟡 2025年9月|\n| 🔗 [anthywl](https:\u002F\u002Fgithub.com\u002Ftadeokondrak\u002Fanthywl) | - | - | ⭐ 34 | 🟡 2025年4月|\n| 🔗 [sekka](https:\u002F\u002Fgithub.com\u002Fkiyoka\u002Fsekka) | - | - | ⭐ 24 | 🟡 2025年7月|\n| 🔗 [sumibi](https:\u002F\u002Fgithub.com\u002Fkiyoka\u002Fsumibi) | - | - | ⭐ 43 | 🟢 3月|\n| 🔗 
[jinmei-dict](https:\u002F\u002Fgithub.com\u002Fs1r-j\u002Fjinmei-dict) | - | - | ⭐ 7 | 🔴 2020年4月|\n| 🔗 [japanesekeyboard](https:\u002F\u002Fgithub.com\u002Fkazumaproject\u002Fjapanesekeyboard) | - | - | ⭐ 226 | 🟢 上周五|\n| 🔗 [japanesearabic](https:\u002F\u002Fgithub.com\u002Fa-hamdi\u002Fjapanesearabic) | - | - | ⭐ 19 | 🟡 2025年5月|\n| 🔗 [o-dic](https:\u002F\u002Fgithub.com\u002Fmakotoga\u002Fo-dic) | - | - | ⭐ 6 | 🔴 无效|\n| 🔗 [skk-emoji-jisyo](https:\u002F\u002Fgithub.com\u002Fuasi\u002Fskk-emoji-jisyo) | - | - | ⭐ 140 | 🔴 2025年1月|\n| 🔗 [mozcdic-ut-personal-names](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-personal-names) | - | - | ⭐ 26 | 🟢 上周四|\n| 🔗 [mozcdic-ut-sudachidict](https:\u002F\u002Fgithub.com\u002Futuhiro78\u002Fmozcdic-ut-sudachidict) | - | - | ⭐ 22 | 🟢 2月|\n| 🔗 [nihongo](https:\u002F\u002Fgithub.com\u002Fsph-mn\u002Fnihongo) | - | - | ⭐ 20 | 🔴 2025年1月|\n| 🔗 [kagome-dict](https:\u002F\u002Fgithub.com\u002Fikawaha\u002Fkagome-dict) | - | - | ⭐ 15 | 🟢 3月|\n| 🔗 [canna](https:\u002F\u002Fgithub.com\u002Fcanna-input\u002Fcanna) | - | - | ⭐ 4 | 🟡 2025年8月|\n| 🔗 [kansai-accent-dictionary](https:\u002F\u002Fgithub.com\u002Fnullponull\u002Fkansai-accent-dictionary) | - | - | ⭐ 1 | 🟡 2025年12月|\n| 🔗 [jitendex](https:\u002F\u002Fgithub.com\u002Fjitendex\u002Fjitendex) | - | - | ⭐ 466 | 🟢 今天|\n| 🔗 [karukan](https:\u002F\u002Fgithub.com\u002Ftogatoga\u002Fkarukan) | - | - | ⭐ 262 | 🟢 2月|\n| 🔗 [shitto-mania-dic](https:\u002F\u002Fgithub.com\u002Fjunikematsu\u002Fshitto-mania-dic) | - | - | ⭐ 0 | 🟢 3月|\n\n## 语料库\n\n### 词性标注 \u002F 命名实体识别\n带有词性标注和命名实体标注的语料库\n\n * [ner-wikipedia-dataset](https:\u002F\u002Fgithub.com\u002Fstockmarkteam\u002Fner-wikipedia-dataset) - 基于维基百科的日语命名实体抽取数据集\n * [IOB2Corpus](https:\u002F\u002Fgithub.com\u002FHironsan\u002FIOB2Corpus) - 用于命名实体识别的日语IOB2标注语料库。\n * [TwitterCorpus](https:\u002F\u002Fgithub.com\u002Ftmu-nlp\u002FTwitterCorpus) - 首都大学日语Twitter语料库\n * 
[UD_Japanese-PUD](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002FUD_Japanese-PUD) - 平行通用依存关系树库。\n * [UD_Japanese-GSD](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002FUD_Japanese-GSD) - 来自Google UDT 2.0的日语数据。\n * [KWDLC](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FKWDLC) - 京都大学网络文档导语语料库（Kyoto University Web Document Leads Corpus）\n * [AnnotatedFKCCorpus](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FAnnotatedFKCCorpus) - 带标注的不满买取中心（Fuman Kaitori Center）语料库\n * [UD_Japanese-GSDLUW](https:\u002F\u002Fgithub.com\u002FUniversalDependencies\u002FUD_Japanese-GSDLUW) - UD_Japanese-GSD的长单位词版本\n * [ud_japanese-bccwj](https:\u002F\u002Fgithub.com\u002Funiversaldependencies\u002Fud_japanese-bccwj) - 该通用依存关系（UD）日语树库依据UD文档中所述的UD日语规范构建。\n\n\n|名称|每周下载量|总下载量|星标数|最后提交|\n-|-|-|-|-\n| 🔗 [ner-wikipedia-dataset](https:\u002F\u002Fgithub.com\u002Fstockmarkteam\u002Fner-wikipedia-dataset) | - | - | ⭐ 142 | 🔴 2023年9月|\n| 🔗 [IOB2Corpus](https:\u002F\u002Fgithub.com\u002FHironsan\u002FIOB2Corpus) | - | - | ⭐ 61 | 🔴 2020年2月|\n| 🔗 [TwitterCorpus](https:\u002F\u002Fgithub.com\u002Ftmu-nlp\u002FTwitterCorpus) | - | - | ⭐ 21 | 🔴 2016年3月|\n| 🔗 [UD_Japanese-PUD](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002FUD_Japanese-PUD) | - | - | ⭐ 0 | 🔴 2020年5月|\n| 🔗 [UD_Japanese-GSD](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002FUD_Japanese-GSD) | - | - | ⭐ 28 | 🔴 2022年5月|\n| 🔗 [KWDLC](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FKWDLC) | - | - | ⭐ 83 | 🔴 2023年12月|\n| 🔗 [AnnotatedFKCCorpus](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FAnnotatedFKCCorpus) | - | - | ⭐ 18 | 🔴 2023年12月|\n| 🔗 [anthy](https:\u002F\u002Fgithub.com\u002Fnetsphere-labs\u002Fanthy) | - | - | ⭐ 16 | 🔴 2023年2月|\n| 🔗 [UD_Japanese-GSDLUW](https:\u002F\u002Fgithub.com\u002FUniversalDependencies\u002FUD_Japanese-GSDLUW) | - | - | ⭐ 3 | 🟡 2025年11月|\n| 🔗 [ud_japanese-bccwj](https:\u002F\u002Fgithub.com\u002Funiversaldependencies\u002Fud_japanese-bccwj) | - | - | ⭐ 26 | 🟡 2025年11月|\n\n\n### 平行语料库\n包含用于翻译任务的对齐句子的双语语料库\n\n * 
[small_parallel_enja](https:\u002F\u002Fgithub.com\u002Fodashi\u002Fsmall_parallel_enja) - 用于机器翻译基准测试的5万句英日平行语料库。\n * [Web-Crawled-Corpus-for-Japanese-Chinese-NMT](https:\u002F\u002Fgithub.com\u002Fzhang-jinyi\u002FWeb-Crawled-Corpus-for-Japanese-Chinese-NMT) - 用于日中NMT的网络爬取语料库\n * [CourseraParallelCorpusMining](https:\u002F\u002Fgithub.com\u002Fshyyhs\u002FCourseraParallelCorpusMining) - Coursera语料挖掘及多阶段微调以提升讲座翻译质量\n * [JESC](https:\u002F\u002Fgithub.com\u002Frpryzant\u002FJESC) - 大型英日平行语料库\n * [AMI-Meeting-Parallel-Corpus](https:\u002F\u002Fgithub.com\u002Ftsuruoka-lab\u002FAMI-Meeting-Parallel-Corpus) - AMI会议平行语料库\n * [giant_ja-en_parallel_corpus](https:\u002F\u002Fgithub.com\u002FDayuanJiang\u002Fgiant_ja-en_parallel_corpus) - 本目录包含一个巨大的日英字幕语料库。原始数据来源于斯坦福大学的JESC项目。\n * [jesc_small](https:\u002F\u002Fgithub.com\u002Fyusugomori\u002Fjesc_small) - 小型日英字幕语料库\n * [graded-enja-corpus](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Fgraded-enja-corpus) - 考虑禁忌用语和词汇级别的日英对照语料库。\n * [cjk-compsci-terms](https:\u002F\u002Fgithub.com\u002Fdahlia\u002Fcjk-compsci-terms) - 中日韩计算机科学术语对照 \u002F 中日韩电脑科学术语对照 \u002F 日中韩的计算机科学术语对照 \u002F 한·중·일 전산학 용어 대조\n * [Laboro-ParaCorpus](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FLaboro-ParaCorpus) - 用于创建日英平行语料库和训练NMT模型的脚本\n * [google-vs-deepl-je](https:\u002F\u002Fgithub.com\u002FTzawa\u002Fgoogle-vs-deepl-je) - google-vs-deepl-je\n * [matcha](https:\u002F\u002Fgithub.com\u002Fehimenlp\u002Fmatcha) - 从面向访日游客的媒体MATCHA的文章中，构建了用于简化日语文本的数据集。\n * [en-ja-el](https:\u002F\u002Fgithub.com\u002Fshigashiyama\u002Fen-ja-el) - EnJaEL：英日平行实体链接数据集（版本1.0）\n\n\n|名称|每周下载量|总下载量|星标数|最后提交|\n-|-|-|-|-\n| 🔗 [small_parallel_enja](https:\u002F\u002Fgithub.com\u002Fodashi\u002Fsmall_parallel_enja) | - | - | ⭐ 98 | 🔴 2019年9月|\n| 🔗 [Web-Crawled-Corpus-for-Japanese-Chinese-NMT](https:\u002F\u002Fgithub.com\u002Fzhang-jinyi\u002FWeb-Crawled-Corpus-for-Japanese-Chinese-NMT) | - | - | ⭐ 15 | 🔴 2023年9月|\n| 🔗 
[CourseraParallelCorpusMining](https:\u002F\u002Fgithub.com\u002Fshyyhs\u002FCourseraParallelCorpusMining) | - | - | ⭐ 15 | 🔴 Aug 2024|\n| 🔗 [JESC](https:\u002F\u002Fgithub.com\u002Frpryzant\u002FJESC) | - | - | ⭐ 89 | 🔴 Nov 2017|\n| 🔗 [AMI-Meeting-Parallel-Corpus](https:\u002F\u002Fgithub.com\u002Ftsuruoka-lab\u002FAMI-Meeting-Parallel-Corpus) | - | - | ⭐ 11 | 🔴 Dec 2020|\n| 🔗 [giant_ja-en_parallel_corpus](https:\u002F\u002Fgithub.com\u002FDayuanJiang\u002Fgiant_ja-en_parallel_corpus) | - | - | ⭐ 5 | 🔴 Aug 2019|\n| 🔗 [jesc_small](https:\u002F\u002Fgithub.com\u002Fyusugomori\u002Fjesc_small) | - | - | ⭐ 3 | 🔴 Jul 2019|\n| 🔗 [graded-enja-corpus](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Fgraded-enja-corpus) | - | - | ⭐ 6 | 🟡 Aug 2025|\n| 🔗 [cjk-compsci-terms](https:\u002F\u002Fgithub.com\u002Fdahlia\u002Fcjk-compsci-terms) | - | - | ⭐ 150 | 🟢 Feb|\n| 🔗 [Laboro-ParaCorpus](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FLaboro-ParaCorpus) | - | - | ⭐ 18 | 🔴 Nov 2021|\n| 🔗 [google-vs-deepl-je](https:\u002F\u002Fgithub.com\u002FTzawa\u002Fgoogle-vs-deepl-je) | - | - | ⭐ 4 | 🔴 Mar 2020|\n| 🔗 [matcha](https:\u002F\u002Fgithub.com\u002Fehimenlp\u002Fmatcha) | - | - | ⭐ 6 | 🔴 Jan 2025|\n| 🔗 [en-ja-el](https:\u002F\u002Fgithub.com\u002Fshigashiyama\u002Fen-ja-el) | - | - | ⭐ 2 | 🔴 Jan 2025|\n\n### Dialog Corpus\nCollections of conversational data for training dialogue systems\n\n * [JMRD](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FJMRD) - Japanese Movie Recommendation Dialogue dataset\n * [open2ch-dialogue-corpus](https:\u002F\u002Fgithub.com\u002F1never\u002Fopen2ch-dialogue-corpus) - A dialogue corpus created by crawling the Open 2channel (open2ch) forum\n * [BSD](https:\u002F\u002Fgithub.com\u002Ftsuruoka-lab\u002FBSD) - The Business Scene Dialogue corpus\n * [asdc](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fasdc) - Accommodation Search Dialog Corpus\n * [japanese-corpus](https:\u002F\u002Fgithub.com\u002FMokkeMeguru\u002Fjapanese-corpus) - Japanese dialogue data for seq2seq and similar tasks\n * [BPersona-chat](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FBPersona-chat) - This repository contains BPersona-chat, a Japanese-English bilingual chat corpus presented at the Eval4NLP 2022 workshop of AACL-IJCNLP 2022 in the paper 《Chat Translation Error 
Detection for Assisting Cross-lingual Communications》.\n * [japanese-daily-dialogue](https:\u002F\u002Fgithub.com\u002Fjqk09a\u002Fjapanese-daily-dialogue) - The Japanese Daily Dialogue corpus, or “日本語日常対話コーパス” in Japanese, is a high-quality multi-turn dialogue dataset containing everyday conversations on five topics: daily life, school, travel, health, and entertainment.\n * [llm-japanese-dataset](https:\u002F\u002Fgithub.com\u002Fmasanorihirano\u002Fllm-japanese-dataset) - Japanese chat dataset for building large language models\n * [kokorochat](https:\u002F\u002Fgithub.com\u002Fuec-inabalab\u002Fkokorochat) - A Japanese counseling dialogue dataset collected through role-play\n * [JMultiWOZ-TC](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002FJMultiWOZ-TC) - Evaluation of agents' function-calling ability in multi-turn dialogue\n * [HOTATE](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FHOTATE) - A Japanese dialogue dataset containing both speakers' true feelings and their polite surface statements\n * [ETCDataset](https:\u002F\u002Fgithub.com\u002FUEC-InabaLab\u002FETCDataset) - The conversational emotion transcription dataset (ETCDataset) is a Japanese dialogue dataset of about 1,000 dialogues, each with the speakers' own descriptions of their emotions for every utterance.\n\n\n|Name|Downloads\u002Fweek|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [JMRD](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FJMRD) | - | - | ⭐ 29 | 🔴 Jul 2022|\n| 🔗 [open2ch-dialogue-corpus](https:\u002F\u002Fgithub.com\u002F1never\u002Fopen2ch-dialogue-corpus) | - | - | ⭐ 99 | 🔴 Jun 2021|\n| 🔗 [BSD](https:\u002F\u002Fgithub.com\u002Ftsuruoka-lab\u002FBSD) | - | - | ⭐ 73 | 🔴 Nov 2021|\n| 🔗 [asdc](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fasdc) | - | - | ⭐ 25 | 🔴 Aug 2023|\n| 🔗 [japanese-corpus](https:\u002F\u002Fgithub.com\u002FMokkeMeguru\u002Fjapanese-corpus) | - | - | ⭐ 3 | 🔴 Oct 2018|\n| 🔗 [BPersona-chat](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002FBPersona-chat) | - | - | ⭐ 5 | 🔴 Jan 2023|\n| 🔗 [japanese-daily-dialogue](https:\u002F\u002Fgithub.com\u002Fjqk09a\u002Fjapanese-daily-dialogue) | - | - | ⭐ 56 | 🔴 Mar 2023|\n| 🔗 [llm-japanese-dataset](https:\u002F\u002Fgithub.com\u002Fmasanorihirano\u002Fllm-japanese-dataset) | - | - | ⭐ 88 | 🔴 Jan 2024|\n| 🔗 [kokorochat](https:\u002F\u002Fgithub.com\u002Fuec-inabalab\u002Fkokorochat) | - | - | ⭐ 20 | 🟡 Aug 2025|\n| 🔗 [JMultiWOZ-TC](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002FJMultiWOZ-TC) | - | - | ⭐ 0 | 🟢 Mar|\n| 🔗 
[HOTATE](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FHOTATE) | - | - | ⭐ 1 | 🟢 Feb|\n| 🔗 [ETCDataset](https:\u002F\u002Fgithub.com\u002FUEC-InabaLab\u002FETCDataset) | - | - | ⭐ 12 | 🟢 Jan|\n\n### Others\nCorpora for tasks such as question answering and entailment recognition\n\n * [jrte-corpus](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fjrte-corpus) - Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)\n * [kanji-data](https:\u002F\u002Fgithub.com\u002Fdavidluzgouveia\u002Fkanji-data) - A JSON kanji dataset with updated JLPT levels and WaniKani information\n * [JapaneseWordSimilarityDataset](https:\u002F\u002Fgithub.com\u002Ftmu-nlp\u002FJapaneseWordSimilarityDataset) - Japanese word similarity dataset\n * [simple-jppdb](https:\u002F\u002Fgithub.com\u002Ftmu-nlp\u002Fsimple-jppdb) - A paraphrase database for Japanese text simplification\n * [chABSA-dataset](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002FchABSA-dataset) - chakki's aspect-based sentiment analysis dataset\n * [JaQuAD](https:\u002F\u002Fgithub.com\u002FSkelterLabsInc\u002FJaQuAD) - JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension (2022, Skelter Labs)\n * [JaNLI](https:\u002F\u002Fgithub.com\u002Fverypluming\u002FJaNLI) - Japanese adversarial natural language inference dataset\n * [ebe-dataset](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Febe-dataset) - Evidence-based explanation dataset (AACL-IJCNLP 2020)\n * [emoji-ja](https:\u002F\u002Fgithub.com\u002Fyagays\u002Femoji-ja) - Japanese reading\u002Fkeyword\u002Fcategory dictionary for Unicode emoji\n * [nayose-wikipedia-ja](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fnayose-wikipedia-ja) - Japanese name-normalization dataset built from Wikipedia\n * [ja.text8](https:\u002F\u002Fgithub.com\u002FHironsan\u002Fja.text8) - Japanese text8 corpus for word embedding.\n * [ThreeLineSummaryDataset](https:\u002F\u002Fgithub.com\u002FKodairaTomonori\u002FThreeLineSummaryDataset) - Three-line summary dataset\n * [japanese](https:\u002F\u002Fgithub.com\u002Fhingston\u002Fjapanese) - This repository contains the 44,998 most common Japanese words, ordered by frequency, as determined from the University of Leeds corpus.\n * [kanji-frequency](https:\u002F\u002Fgithub.com\u002Fscriptin\u002Fkanji-frequency) - Kanji usage frequency data collected from various sources\n * [TEDxJP-10K](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FTEDxJP-10K) - TEDxJP-10K ASR evaluation dataset\n * [CoARiJ](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002FCoARiJ) - Corpus of Annual Reports in Japan\n * 
[technological-book-corpus-ja](https:\u002F\u002Fgithub.com\u002Ftextlint-ja\u002Ftechnological-book-corpus-ja) - A raw corpus of collected Japanese technical books, with tools\n * [ita-corpus-chuwa](https:\u002F\u002Fgithub.com\u002Fshirayu\u002Fita-corpus-chuwa) - Chunked word annotation for the ITA corpus\n * [wikipedia-utils](https:\u002F\u002Fgithub.com\u002Fsingletongue\u002Fwikipedia-utils) - Utility scripts for preprocessing Wikipedia text for NLP\n * [inappropriate-words-ja](https:\u002F\u002Fgithub.com\u002FMosasoM\u002Finappropriate-words-ja) - A collection of inappropriate Japanese expressions. Useful for tasks such as data cleaning in natural language processing.\n * [house-of-councillors](https:\u002F\u002Fgithub.com\u002Fsmartnews-smri\u002Fhouse-of-councillors) - Organized data on parties, members, bills, and written questions from the official website of Japan's House of Councillors.\n * [house-of-representatives](https:\u002F\u002Fgithub.com\u002Fsmartnews-smri\u002Fhouse-of-representatives) - Diet bill database: House of Representatives\n * [STAIR-captions](https:\u002F\u002Fgithub.com\u002FSTAIR-Lab-CIT\u002FSTAIR-captions) - STAIR captions: large-scale Japanese image caption dataset\n * [Winograd-Schema-Challenge-Ja](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FWinograd-Schema-Challenge-Ja) - Japanese version of the Winograd Schema Challenge\n * [speechBSD](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FspeechBSD) - An extension of the BSD corpus with audio and speaker attribute information\n * [ita-corpus](https:\u002F\u002Fgithub.com\u002Fmmorise\u002Fita-corpus) - Sentence list of the ITA corpus\n * [rohan4600](https:\u002F\u002Fgithub.com\u002Fmmorise\u002Frohan4600) - Mora-balanced Japanese corpus\n * [anlp-jp-history](https:\u002F\u002Fgithub.com\u002Fwhym\u002Fanlp-jp-history) - A complete list of presentations at the annual meetings of the Association for Natural Language Processing (Japan), with machine-readable versions and related files\n * [keigo_transfer_task](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fkeigo_transfer_task) - Evaluation dataset for the honorific (keigo) transfer task\n * [loanwords_gairaigo](https:\u002F\u002Fgithub.com\u002Fjamesohortle\u002Floanwords_gairaigo) - English loanwords in Japanese\n * [jawikicorpus](https:\u002F\u002Fgithub.com\u002Fwikiwikification\u002Fjawikicorpus) - Japanese Wikipedia wikification corpus\n * [GeneralPolicySpeechOfPrimeMinisterOfJapan](https:\u002F\u002Fgithub.com\u002Fyuukimiyo\u002FGeneralPolicySpeechOfPrimeMinisterOfJapan) - A Japanese text corpus of general policy speeches by the Prime Minister of Japan\n * [wrime](https:\u002F\u002Fgithub.com\u002Fids-cv\u002Fwrime) - WRIME: a dataset for subjective and objective emotion analysis\n * 
[jtubespeech](https:\u002F\u002Fgithub.com\u002Fsarulab-speech\u002Fjtubespeech) - JTubeSpeech: a Japanese speech corpus collected from YouTube\n * [WikipediaWordFrequencyList](https:\u002F\u002Fgithub.com\u002Fmaeda6uiui-backup\u002FWikipediaWordFrequencyList) - List of high-frequency words in Japanese Wikipedia\n * [kokkosho_data](https:\u002F\u002Fgithub.com\u002Frindybell\u002Fkokkosho_data) - Dataset on vehicle malfunction information\n * [pdmocrdataset-part1](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fpdmocrdataset-part1) - OCR training dataset created in the project for converting digitized materials to OCR text\n * [huriganacorpus-ndlbib](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhuriganacorpus-ndlbib) - Furigana-annotated dataset generated from national bibliographic data\n * [jvs_hiho](https:\u002F\u002Fgithub.com\u002FHiroshiba\u002Fjvs_hiho) - Self-made labels for the JVS (Japanese versatile speech) corpus\n * [hirakanadic](https:\u002F\u002Fgithub.com\u002Fpo3rin\u002Fhirakanadic) - Allows Sudachi to normalize hiragana to katakana from any compound word list\n * [animedb](https:\u002F\u002Fgithub.com\u002Fanilogia\u002Fanimedb) - A list database of anime works from roughly the past 100 years\n * [security_words](https:\u002F\u002Fgithub.com\u002FSaitoLab\u002Fsecurity_words) - English-Japanese glossary of official organizations related to cybersecurity\n * [Data-on-Japanese-Diet-Members](https:\u002F\u002Fgithub.com\u002Fsugi2000\u002FData-on-Japanese-Diet-Members) - Data on Japanese Diet members\n * [honkoku-data](https:\u002F\u002Fgithub.com\u002Fyuta1984\u002Fhonkoku-data) - Repository for the text data of “Minna de Honkoku,” a citizen-participation platform for transcribing historical materials. \u002F Transcription texts of historical Japanese documents created on the “Minna de Honkoku” platform (https:\u002F\u002Fhonkoku.org).\n * [wikihow_japanese](https:\u002F\u002Fgithub.com\u002FKatsumata420\u002Fwikihow_japanese) - wikiHow dataset (Japanese version)\n * [engineer-vocabulary-list](https:\u002F\u002Fgithub.com\u002Fmercari\u002Fengineer-vocabulary-list) - Japanese-English bilingual vocabulary list for engineers\n * [JSICK](https:\u002F\u002Fgithub.com\u002Fverypluming\u002FJSICK) - Japanese Sentences Involving Compositional Knowledge (JSICK) dataset\u002FJSICK stress-test set\n * [phishurl-list](https:\u002F\u002Fgithub.com\u002FJPCERTCC\u002Fphishurl-list) - Phishing URL dataset from JPCERT\u002FCC\n * [jcms](https:\u002F\u002Fgithub.com\u002Fshigashiyama\u002Fjcms) - A Japanese Corpus of Many Specialized Domains (JCMS)\n * [aozorabunko_text](https:\u002F\u002Fgithub.com\u002Faozorahack\u002Faozorabunko_text) - Plain-text archive of www.aozora.gr.jp\n * 
[friendly_JA-Corpus](https:\u002F\u002Fgithub.com\u002Fastremo\u002Ffriendly_JA-Corpus) - friendly_JA is a parallel Japanese-to-Japanese corpus that aims to make Japanese easier to understand by using katakana vocabulary derived from Latin\u002FEnglish instead of the traditional Sino-Japanese vocabulary\n * [topokanji](https:\u002F\u002Fgithub.com\u002Fscriptin\u002Ftopokanji) - Topologically ordered lists of kanji for efficient learning\n * [isbn4groups](https:\u002F\u002Fgithub.com\u002Furibo\u002Fisbn4groups) - Data on Japanese publications in ISBN-13 (978-4-XXXXXXXXX) and related information\n * [NMeCab](https:\u002F\u002Fgithub.com\u002Fkomutan\u002FNMeCab) - NMeCab: a Japanese morphological analyzer on .NET\n * [ndlngramdata](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndlngramdata) - Dataset of n-gram frequency statistics for OCR text data generated from digitized materials\n * [ndlngramviewer_v2](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndlngramviewer_v2) - Full set of files, including source code, for the NDL Ngram Viewer as updated in January 2023\n * [data_set](https:\u002F\u002Fgithub.com\u002Fjapanese-law-analysis\u002Fdata_set) - Dataset on laws and court precedents\n * [huggingface-datasets_wrime](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_wrime) - WRIME for Hugging Face Datasets\n * [ndl-minhon-ocrdataset](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndl-minhon-ocrdataset) - NDL classical-book OCR training dataset (processed Minna de Honkoku data)\n * [PAX_SAPIENTICA](https:\u002F\u002Fgithub.com\u002FAsPJT\u002FPAX_SAPIENTICA) - GIS & archaeological simulator. In development (2023).\n * [j-liwc2015](https:\u002F\u002Fgithub.com\u002Ftasukuigarashi\u002Fj-liwc2015) - Japanese version of LIWC2015\n * [huggingface-datasets_livedoor-news-corpus](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_livedoor-news-corpus) - Japanese Livedoor news corpus for Hugging Face Datasets\n * [huggingface-datasets_JGLUE](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_JGLUE) - JGLUE: Japanese General Language Understanding Evaluation for Hugging Face Datasets\n * [commonsense-moral-ja](https:\u002F\u002Fgithub.com\u002FLanguage-Media-Lab\u002Fcommonsense-moral-ja) - JCommonsenseMorality is a crowdsourced dataset that reflects the commonsense morality of Japanese annotators.\n * [comet-atomic-ja](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002Fcomet-atomic-ja) - COMET-ATOMIC ja\n * [dcsg-ja](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002Fdcsg-ja) - Dialogue commonsense graph in Japanese\n * 
[japanese-toxic-dataset](https:\u002F\u002Fgithub.com\u002Finspection-ai\u002Fjapanese-toxic-dataset) - “Proposal and Evaluation of Japanese Toxicity Schema” provides a schema and dataset for toxicity in Japanese.\n * [camera](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002Fcamera) - CAMERA (CyberAgent Multimodal Evaluation for Ad Text GeneRAtion) is a Japanese ad text generation dataset.\n * [Japanese-Fakenews-Dataset](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FJapanese-Fakenews-Dataset) - Japanese fake news dataset\n * [jpn_explainable_qa_dataset](https:\u002F\u002Fgithub.com\u002Faiishii\u002Fjpn_explainable_qa_dataset) - jpn_explainable_qa_dataset\n * [copa-japanese](https:\u002F\u002Fgithub.com\u002Fnlp-titech\u002Fcopa-japanese) - COPA dataset in Japanese\n * [WLSP-familiarity](https:\u002F\u002Fgithub.com\u002Fmasayu-a\u002FWLSP-familiarity) - Word familiarity ratings for the “Word List by Semantic Principles (WLSP)”\n * [ProSub](https:\u002F\u002Fgithub.com\u002Fmatbahasa\u002FProSub) - A cross-linguistic study of pronoun substitutes and address terms\n * [ramendb](https:\u002F\u002Fgithub.com\u002Fnuko-yokohama\u002Framendb) - A scraping tool for a certain database ( https:\u002F\u002Fsupleks.jp\u002F ) and the data collected with it\n * [huggingface-datasets_CAMERA](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_CAMERA) - CAMERA (CyberAgent Multimodal Evaluation for Ad Text GeneRAtion) for Hugging Face Datasets\n * [FactCheckSentenceNLI-FCSNLI-](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002FFactCheckSentenceNLI-FCSNLI-) - FactCheckSentenceNLI dataset\n * [databricks-dolly-15k-ja](https:\u002F\u002Fgithub.com\u002Fkunishou\u002Fdatabricks-dolly-15k-ja) - A Japanese translation of databricks-dolly-15k.jsonl, the training data used for databricks\u002Fdolly-v2-12b.\n * [EaST-MELD](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FEaST-MELD) - EaST-MELD is an English-Japanese dataset for emotion-aware speech translation based on MELD.\n * [meconaudio](https:\u002F\u002Fgithub.com\u002Felith-co-jp\u002Fmeconaudio) - Mecon Audio (Medical Conference Audio) is a read-speech dataset of the minutes of the Advanced Medical Care Conference hosted by the Ministry of Health, Labour and Welfare.\n * [japanese-addresses](https:\u002F\u002Fgithub.com\u002Fgeolonia\u002Fjapanese-addresses) - Open address data for all of Japan at the town and chome level\n * 
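[aozorasearch](https:\u002F\u002Fgithub.com\u002Fmyokoym\u002Faozorasearch) - A full-text search system for Aozora Bunko implemented with Groonga. Aozora Bunko full-text search library and web app.\n * [llm-jp-corpus](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-corpus) - This repository contains scripts to reproduce the LLM-jp corpus.\n * [alpaca_ja](https:\u002F\u002Fgithub.com\u002Fshi3z\u002Falpaca_ja) - Japanese translation of the Alpaca dataset\n * [instruction_ja](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Finstruction_ja) - Japanese instruction data\n * [japanese-family-names](https:\u002F\u002Fgithub.com\u002Fsiikamiika\u002Fjapanese-family-names) - Top 5,000 Japanese family names, with readings, sorted by frequency.\n * [kanji-data-media](https:\u002F\u002Fgithub.com\u002Fkanjialive\u002Fkanji-data-media) - Japanese language data on kanji, radicals, media files, fonts, and related resources from Kanji alive\n * [reazonspeech](https:\u002F\u002Fgithub.com\u002Freazon-research\u002Freazonspeech) - Build a large-scale Japanese audio corpus at home\n * [huriganacorpus-aozora](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhuriganacorpus-aozora) - Furigana-annotated dataset generated from Aozora Bunko and Sapie braille data\n * [koniwa](https:\u002F\u002Fgithub.com\u002Fkoniwa\u002Fkoniwa) - An open collection of annotated Japanese speech\n * [JMMLU](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002FJMMLU) - Japanese Massive Multitask Language Understanding benchmark\n * [hurigana-speech-corpus-aozora](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhurigana-speech-corpus-aozora) - Furigana-annotated speech corpus dataset from Aozora Bunko\n * [jqara](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Fjqara) - JQaRA: Japanese Question Answering with Retrieval Augmentation, a Japanese Q&A dataset for evaluating retrieval augmentation (RAG)\n * [jemhopqa](https:\u002F\u002Fgithub.com\u002Faiishii\u002Fjemhopqa) - JEMHopQA (Japanese Explainable Multi-hop Question Answering) is a Japanese multi-hop QA dataset that can evaluate the internal reasoning process.\n * [jacred](https:\u002F\u002Fgithub.com\u002Fyoumima\u002Fjacred) - Repository for the Japanese document-level relation extraction dataset (planned for release in March).\n * [jades](https:\u002F\u002Fgithub.com\u002Fnaist-nlp\u002Fjades) - JADES is a dataset for Japanese text simplification, introduced in “JADES: New Text Simplification Dataset in Japanese Targeted at Non-Native Speakers” (paper forthcoming).\n * [do-not-answer-ja](https:\u002F\u002Fgithub.com\u002Fkunishou\u002Fdo-not-answer-ja) - An automatic Japanese translation of “Do-Not-Answer,” the safety evaluation dataset released by the University of Melbourne in August 2023, modified to reflect Japanese culture so that it can also be used to evaluate Japanese LLMs.\n * [oasst1-89k-ja](https:\u002F\u002Fgithub.com\u002Fkunishou\u002Foasst1-89k-ja) - A Japanese translation of OASST1, the open-source dataset from OpenAssistant.\n * 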
[jacwir](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Fjacwir) - JaCWIR: Japanese Casual Web IR, a small-scale dataset of casual web page titles and summaries for evaluating Japanese information retrieval\n * [japanese-technical-dict](https:\u002F\u002Fgithub.com\u002Flaoshubaby\u002Fjapanese-technical-dict) - A table of katakana terms commonly used in science and technology industries paired with their source words, for learners of Japanese\n * [j-unimorph](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fj-unimorph) - Japanese UniMorph dataset\n * [GazeVQA](https:\u002F\u002Fgithub.com\u002Friken-grp\u002FGazeVQA) - Dataset for the LREC-COLING 2024 paper “A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions”\n * [J-CRe3](https:\u002F\u002Fgithub.com\u002Friken-grp\u002FJ-CRe3) - Code for the J-CRe3 experiments (Ueda et al., LREC-COLING, 2024)\n * [jmed-llm](https:\u002F\u002Fgithub.com\u002Fsociocom\u002Fjmed-llm) - JMED-LLM: Japanese Medical Evaluation Dataset for Large Language Models\n * [lawtext](https:\u002F\u002Fgithub.com\u002Fyamachig\u002Flawtext) - Plain text format for Japanese law\n * [pdmocrdataset-part2](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fpdmocrdataset-part2) - OCR training dataset created in the OCR processing program research and development project\n * [japanesetopicwsd](https:\u002F\u002Fgithub.com\u002Fnut-jnlp\u002Fjapanesetopicwsd) - Evaluation suite for topic-based word sense disambiguation\n * [temporalNLI_dataset](https:\u002F\u002Fgithub.com\u002Ftomo-vv\u002FtemporalNLI_dataset) - Jamp: Controlled Japanese Temporal Inference Dataset for Evaluating Generalization Capacity of Language Models\n * [JSeM](https:\u002F\u002Fgithub.com\u002FDaisukeBekki\u002FJSeM) - Japanese semantic test suite (FraCaS counterpart and extensions)\n * [niilc-qa](https:\u002F\u002Fgithub.com\u002Fmynlp\u002Fniilc-qa) - NIILC QA data\n * [chain-of-thought-ja-dataset](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002Fchain-of-thought-ja-dataset) - Dataset for the paper “Verification of Chain-of-Thought Prompting in Japanese”\n * [WikipediaAnnotatedCorpus](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FWikipediaAnnotatedCorpus) - A Japanese text corpus consisting of Wikipedia articles with various linguistic annotations.\n * [elaws-history](https:\u002F\u002Fgithub.com\u002Fkissge\u002Felaws-history) - Periodically downloads and archives the “all laws and regulations data” published on e-Gov law search\n * [Japanese-RP-Bench](https:\u002F\u002Fgithub.com\u002FAratako\u002FJapanese-RP-Bench) - Japanese-RP-Bench is a benchmark for measuring the Japanese role-playing ability of LLMs.\n * [hdic](https:\u002F\u002Fgithub.com\u002Fshikeda\u002Fhdic) - HDIC: Integrated Database of Hanzi Dictionaries in Early Japan\n * [awesome-japan-opendata](https:\u002F\u002Fgithub.com\u002Fjapan-opendata\u002Fawesome-japan-opendata) - 
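Awesome Japan Open Data: a list and summary of Japanese open data information\n * [kanji-data](https:\u002F\u002Fgithub.com\u002Fmimneko\u002Fkanji-data) - Joyo kanji list and other kanji-related data\n * [openchj-genji](https:\u002F\u002Fgithub.com\u002Ftogiso\u002Fopenchj-genji) - Morphological information data for The Tale of Genji\n * [AdParaphrase](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002FAdParaphrase) - This repository contains data for our paper “AdParaphrase: Paraphrase Dataset for Analyzing Linguistic Features toward Generating Attractive Ad Texts”.\n * [Jamp_sp](https:\u002F\u002Fgithub.com\u002Fynklab\u002FJamp_sp) - Construction of a Japanese temporal inference dataset that considers aspect (Jamp_sp: a controlled Japanese temporal inference dataset considering aspect)\n * [jnli-neg](https:\u002F\u002Fgithub.com\u002Fasahi-y\u002Fjnli-neg) - Repository for releasing JNLI-Neg, a Japanese natural language inference dataset for evaluating negation understanding.\n * [swallow-corpus](https:\u002F\u002Fgithub.com\u002Fswallow-llm\u002Fswallow-corpus) - This repository provides a Python implementation for building Swallow Corpus Version 1, a large Japanese web corpus (Okazaki et al., 2024), from Common Crawl archives.\n * [jalecon](https:\u002F\u002Fgithub.com\u002Fnaist-nlp\u002Fjalecon) - Japanese lexical complexity dataset for non-native readers\n * [multils-japanese](https:\u002F\u002Fgithub.com\u002Fnaist-nlp\u002Fmultils-japanese) - Japanese dataset for lexical complexity prediction and lexical simplification (MultiLS): annotator profiles, unaggregated annotations, and annotation guidelines.\n * [nwjc](https:\u002F\u002Fgithub.com\u002Fmasayu-a\u002Fnwjc) - NINJAL Web Japanese Corpus\n * [open-mantra-dataset](https:\u002F\u002Fgithub.com\u002Fmantra-inc\u002Fopen-mantra-dataset) - This dataset was introduced in the paper “Towards Fully Automated Manga Translation,” presented at AAAI21\n * [public-annotations](https:\u002F\u002Fgithub.com\u002Fmanga109\u002Fpublic-annotations) - Various annotations for the Manga109 dataset\n * [gimei](https:\u002F\u002Fgithub.com\u002Fwillnet\u002Fgimei) - Randomly generated Japanese names and addresses\n * [safety-boundary-test](https:\u002F\u002Fgithub.com\u002Fsbintuitions\u002Fsafety-boundary-test) - Test set for evaluating the safety behavior of Japanese language models\n * [j-ono-data](https:\u002F\u002Fgithub.com\u002FObakeConstructs\u002Fj-ono-data) - A simple, open-source collection of Japanese onomatopoeia and mimetic words in JSON format, with manga examples.\n * [kanji](https:\u002F\u002Fgithub.com\u002Fsylhare\u002Fkanji) - A list of kanji radicals for learning Japanese\n * [jethics](https:\u002F\u002Fgithub.com\u002Flanguage-media-lab\u002Fjethics) - Overview page for JETHICS, a dataset for evaluating the understanding of ethics in Japanese (to be updated)\n * [waon](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fwaon) - WAON: a large-scale, high-quality Japanese image-text dataset for vision-language models\n * [kuci](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fkuci) - 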
Kyoto University Commonsense Inference dataset (KUCI)\n * [japanese-address-testdata](https:\u002F\u002Fgithub.com\u002Ft-sagara\u002Fjapanese-address-testdata) - Test dataset of Japanese addresses that are difficult to parse\n * [jlpt-word-list](https:\u002F\u002Fgithub.com\u002Felzup\u002Fjlpt-word-list) - List of Japanese words in the JLPT vocabulary\n * [hiragana_mojigazo](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhiragana_mojigazo) - Character image dataset (73-character hiragana edition)\n * [lawqa_jp](https:\u002F\u002Fgithub.com\u002Fdigital-go-jp\u002Flawqa_jp) - Multiple-choice QA dataset on Japanese laws and regulations\n * [yjcaptions](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002Fyjcaptions) - YJ Captions 26k Dataset\n * [ja-vg-vqa](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002Fja-vg-vqa) - Japanese Visual Genome VQA dataset\n * [lawhub](https:\u002F\u002Fgithub.com\u002Flwhb\u002Flawhub) - Repository for tracking Japanese law in text format\n * [japanese-subtitles-word-kanji-frequency-lists](https:\u002F\u002Fgithub.com\u002Fchriskempson\u002Fjapanese-subtitles-word-kanji-frequency-lists) - Word frequency lists derived from the subtitles of Japanese dramas, anime, and films.\n * [jconj](https:\u002F\u002Fgithub.com\u002Fyamagoya\u002Fjconj) - A table-based Japanese verb conjugation tool\n * [extract_jawp_names](https:\u002F\u002Fgithub.com\u002Fhiroshi-manabe\u002Fextract_jawp_names) - Extracts personal names from Japanese Wikipedia.\n * [cejc_yomichan_freq_dict](https:\u002F\u002Fgithub.com\u002Fforsakeninfinity\u002Fcejc_yomichan_freq_dict) - Yomichan frequency dictionary based on the Corpus of Everyday Japanese Conversation (CEJC)\n * [wikidict-ja](https:\u002F\u002Fgithub.com\u002Fopen-dict-data\u002Fwikidict-ja) - Bilingual reference data from Wikipedia (Japanese)\n * [ajimee-bench](https:\u002F\u002Fgithub.com\u002Fazookey\u002Fajimee-bench) - AJIMEE-Bench (Advanced Japanese IME Evaluation Benchmark)\n * [j-spaw](https:\u002F\u002Fgithub.com\u002Ftakamichi-lab\u002Fj-spaw) - J-SpAW: Japanese speech corpus for speaker verification and anti-spoofing\n * [camera3](https:\u002F\u002Fgithub.com\u002Fcyberagentailab\u002Fcamera3) - CAMERA3: an evaluation dataset for controllable Japanese ad text generation\n * [jgpqa](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fjgpqa) - Japanese translation of the GPQA dataset\n * [tanaka-corpus-plus](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Ftanaka-corpus-plus) - A work-in-progress effort to remove noise from the Tanaka Corpus.\n * [emotioncorpusjapanesetokushimaa2lab](https:\u002F\u002Fgithub.com\u002Fkmatsu-tokudai\u002Femotioncorpusjapanesetokushimaa2lab) - 
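Japanese emotion corpus from Tokushima University A-2 Lab.\n * [osworld-jp](https:\u002F\u002Fgithub.com\u002Fkarakuri-ai\u002Fosworld-jp) - Japanese computer-use benchmark for evaluation that takes language factors into account\n * [quasi_japanese_reviews](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fquasi_japanese_reviews) - Quasi-Japanese reviews (pseudo review data)\n * [psychiatry-clinical-notes](https:\u002F\u002Fgithub.com\u002Fsociocom\u002Fpsychiatry-clinical-notes) - Questionnaire dataset for creating psychiatric first-visit medical records\n * [merged-town-names](https:\u002F\u002Fgithub.com\u002Fyuukitoriyama\u002Fmerged-town-names) - Table mapping old place names that disappeared through municipal mergers and similar changes to their new names\n * [japanesetextemoticondata](https:\u002F\u002Fgithub.com\u002Fkuroshiba-ginji\u002Fjapanesetextemoticondata) - Japanese text emoticon data.\n * [mishearing-corpus](https:\u002F\u002Fgithub.com\u002Fkishiyamat\u002Fmishearing-corpus) - Mishearing corpus: a Japanese dataset of about 10,000 records managed with CSV + Table Schema and validated automatically with VS Code + pre-commit + Frictionless + GitHub Actions\n * [kotowaza](https:\u002F\u002Fgithub.com\u002Fseptn\u002Fkotowaza) - Structured JSON dataset of Japanese proverbs (kotowaza) with Indonesian and English meanings, example sentences, JLPT levels, and tags.\n * [selective-rag-kasensabo](https:\u002F\u002Fgithub.com\u002Ftk-yasuno\u002Fselective-rag-kasensabo) - An MVP of a practical agentic RAG system that automatically determines, with 96% accuracy, the granularity of expertise (fine\u002Fcoarse) of questions about construction technical standards and selects the best RAG system (ColBERT\u002FNaive) accordingly. Using the river, erosion control (sabo), and dam technical standards published in November 2025 as an example, four RAG systems were built and compared for accuracy and speed on 200 questions of varying expertise granularity.\n * [jmle2026-bench](https:\u002F\u002Fgithub.com\u002Fnaoto-iwase\u002Fjmle2026-bench) - LLM benchmark on the 120th Japanese National Medical Licensing Examination (February 7-8, 2026)\n * [JSTS-Neg](https:\u002F\u002Fgithub.com\u002Freiko-y\u002FJSTS-Neg) - Repository for releasing JSTS-Neg, a Japanese semantic textual similarity dataset for evaluating negation understanding. JSTS-Neg extends JSTS, the semantic textual similarity dataset included in JGLUE.\n * [business-slide-questions](https:\u002F\u002Fgithub.com\u002Fstockmarkteam\u002Fbusiness-slide-questions) - This repository provides “BusinessSlideVQA,” a visual question answering (VQA) benchmark for business documents (slides).\n * [WLSP-antonym](https:\u002F\u002Fgithub.com\u002Fmasayu-a\u002FWLSP-antonym) - Antonym relations for the “Word List by Semantic Principles (WLSP)”\n * [YouCook2-JP](https:\u002F\u002Fgithub.com\u002Fnlab-mpg\u002FYouCook2-JP) - Japanese translation of the YouCook2 dataset.\n * [E2U](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FE2U) - Data on communication\n * 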
[annotation-2025](https:\u002F\u002Fgithub.com\u002FTiny-Colony\u002Fannotation-2025) - This repository aims to release data that allows “interpretations” of texts to be compared between human annotations and LLM outputs.\n * [jhpt](https:\u002F\u002Fgithub.com\u002Fnict-astrec-att\u002Fjhpt) - A parallel dataset that aligns the original texts of historical Japanese materials with modern Japanese translations (reference translations) paragraph by paragraph. See the paper for details.\n * [JBE-QA](https:\u002F\u002Fgithub.com\u002Fhancules\u002FJBE-QA) - Japanese Bar Exam QA\n * [JMedWiC](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FJMedWiC) - A dataset for judging semantic identity in the Japanese medical domain, built by automatically extracting synonymous and non-synonymous word pairs with a masked language model and assigning labels through manual synonymy annotation.\n * [jhpt](https:\u002F\u002Fgithub.com\u002Fnict-astrec-att\u002Fjhpt) - Parallel dataset of historical Japanese materials\n * [Doppelganger-JC](https:\u002F\u002Fgithub.com\u002F0017-alt\u002FDoppelganger-JC) - A dataset benchmark for evaluating LLM misuse of Chinese-Japanese cross-lingual homographs (words with the same form but different meanings).\n * [modelvista-3lang](https:\u002F\u002Fgithub.com\u002Fkuramitsulab\u002Fmodelvista-3lang) - VLM evaluation benchmark for understanding software diagrams (supports Japanese, English, and Korean)\n * [japanese-hr-niah](https:\u002F\u002Fgithub.com\u002Fkufu\u002Fjapanese-hr-niah) - Benchmark for evaluating long-context LLM performance in the Japanese HR and labor domain\n * [nijl-manyoshutei](https:\u002F\u002Fgithub.com\u002Fkokubunken\u002Fnijl-manyoshutei) - This repository releases, under the CC-BY license, TEI\u002FXML data and related files for the Hirose-bon Manyoshu held by Kansai University.\n * [kamuskita](https:\u002F\u002Fgithub.com\u002Fmatbahasa\u002Fkamuskita) - “Everyone's Malay Dictionary,” an open Malay-Japanese dictionary created at a Malay language study group\n\n|Name|Downloads\u002Fweek|Total downloads|Stars|Last commit|\n-|-|-|-|-\n| 🔗 [jrte-corpus](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fjrte-corpus) | - | - | ⭐ 77 | 🔴 Jun 2023|\n| 🔗 [kanji-data](https:\u002F\u002Fgithub.com\u002Fdavidluzgouveia\u002Fkanji-data) | - | - | ⭐ 215 | 🟢 Feb|\n| 🔗 [JapaneseWordSimilarityDataset](https:\u002F\u002Fgithub.com\u002Ftmu-nlp\u002FJapaneseWordSimilarityDataset) | - | - | ⭐ 102 | 🔴 Dec 2021|\n| 🔗 [simple-jppdb](https:\u002F\u002Fgithub.com\u002Ftmu-nlp\u002Fsimple-jppdb) | - | - | ⭐ 32 | 🔴 Mar 2017|\n| 🔗 [chABSA-dataset](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002FchABSA-dataset) | - | - | ⭐ 140 | 🔴 Sep 2018|\n| 🔗 [JaQuAD](https:\u002F\u002Fgithub.com\u002FSkelterLabsInc\u002FJaQuAD) | - | - | ⭐ 110 | 🔴 
Jan 2022|\n| 🔗 [JaNLI](https:\u002F\u002Fgithub.com\u002Fverypluming\u002FJaNLI) | - | - | ⭐ 17 | 🔴 May 2023|\n| 🔗 [ebe-dataset](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Febe-dataset) | - | - | ⭐ 18 | 🔴 Dec 2020|\n| 🔗 [emoji-ja](https:\u002F\u002Fgithub.com\u002Fyagays\u002Femoji-ja) | - | - | ⭐ 83 | 🔴 Mar 2025|\n| 🔗 [nayose-wikipedia-ja](https:\u002F\u002Fgithub.com\u002Fyagays\u002Fnayose-wikipedia-ja) | - | - | ⭐ 35 | 🔴 Mar 2020|\n| 🔗 [ja.text8](https:\u002F\u002Fgithub.com\u002FHironsan\u002Fja.text8) | - | - | ⭐ invalid | 🔴 Oct 2017|\n| 🔗 [ThreeLineSummaryDataset](https:\u002F\u002Fgithub.com\u002FKodairaTomonori\u002FThreeLineSummaryDataset) | - | - | ⭐ 31 | 🔴 Apr 2018|\n| 🔗 [japanese](https:\u002F\u002Fgithub.com\u002Fhingston\u002Fjapanese) | - | - | ⭐ 87 | 🔴 Sep 2018|\n| 🔗 [kanji-frequency](https:\u002F\u002Fgithub.com\u002Fscriptin\u002Fkanji-frequency) | - | - | ⭐ 156 | 🟢 Mar|\n| 🔗 [TEDxJP-10K](https:\u002F\u002Fgithub.com\u002Flaboroai\u002FTEDxJP-10K) | - | - | ⭐ 24 | 🔴 Jan 2021|\n| 🔗 [CoARiJ](https:\u002F\u002Fgithub.com\u002Fchakki-works\u002FCoARiJ) | - | - | ⭐ 94 | 🔴 Dec 2020|\n| 🔗 [technological-book-corpus-ja](https:\u002F\u002Fgithub.com\u002Ftextlint-ja\u002Ftechnological-book-corpus-ja) | - | - | ⭐ 26 | 🔴 Jul 2023|\n| 🔗 [ita-corpus-chuwa](https:\u002F\u002Fgithub.com\u002Fshirayu\u002Fita-corpus-chuwa) | - | - | ⭐ 5 | 🔴 Aug 2021|\n| 🔗 [wikipedia-utils](https:\u002F\u002Fgithub.com\u002Fsingletongue\u002Fwikipedia-utils) | - | - | ⭐ 78 | 🔴 Apr 2024|\n| 🔗 [inappropriate-words-ja](https:\u002F\u002Fgithub.com\u002FMosasoM\u002Finappropriate-words-ja) | - | - | ⭐ 202 | 🔴 Dec 2021|\n| 🔗 [house-of-councillors](https:\u002F\u002Fgithub.com\u002Fsmartnews-smri\u002Fhouse-of-councillors) | - | - | ⭐ 107 | 🟢 yesterday|\n| 🔗 [house-of-representatives](https:\u002F\u002Fgithub.com\u002Fsmartnews-smri\u002Fhouse-of-representatives) | - | - | ⭐ 178 | 🟢 yesterday|\n| 🔗 [STAIR-captions](https:\u002F\u002Fgithub.com\u002FSTAIR-Lab-CIT\u002FSTAIR-captions) | - | - | ⭐ 90 | 🔴 
Jul 2018|\n| 🔗 [Winograd-Schema-Challenge-Ja](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FWinograd-Schema-Challenge-Ja) | - | - | ⭐ 6 | 🔴 Jan 2019|\n| 🔗 [speechBSD](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FspeechBSD) | - | - | ⭐ 3 | 🔴 Feb 2024|\n| 🔗 [ita-corpus](https:\u002F\u002Fgithub.com\u002Fmmorise\u002Fita-corpus) | - | - | ⭐ 229 | 🟢 Mar|\n| 🔗 [rohan4600](https:\u002F\u002Fgithub.com\u002Fmmorise\u002Frohan4600) | - | - | ⭐ 70 | 🟢 Mar|\n| 🔗 [anlp-jp-history](https:\u002F\u002Fgithub.com\u002Fwhym\u002Fanlp-jp-history) | - | - | ⭐ 3 | 🔴 Apr 2024|\n| 🔗 [keigo_transfer_task](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fkeigo_transfer_task) | - | - | ⭐ 21 | 🔴 Nov 2022|\n| 🔗 [loanwords_gairaigo](https:\u002F\u002Fgithub.com\u002Fjamesohortle\u002Floanwords_gairaigo) | - | - | ⭐ 19 | 🔴 Jan 2021|\n| 🔗 [jawikicorpus](https:\u002F\u002Fgithub.com\u002Fwikiwikification\u002Fjawikicorpus) | - | - | ⭐ 4 | 🔴 Nov 2018|\n| 🔗 [GeneralPolicySpeechOfPrimeMinisterOfJapan](https:\u002F\u002Fgithub.com\u002Fyuukimiyo\u002FGeneralPolicySpeechOfPrimeMinisterOfJapan) | - | - | ⭐ 6 | 🔴 Jan 2020|\n| 🔗 [wrime](https:\u002F\u002Fgithub.com\u002Fids-cv\u002Fwrime) | - | - | ⭐ 174 | 🟡 Sep 2025|\n| 🔗 [jtubespeech](https:\u002F\u002Fgithub.com\u002Fsarulab-speech\u002Fjtubespeech) | - | - | ⭐ 229 | 🔴 Mar 2023|\n| 🔗 [WikipediaWordFrequencyList](https:\u002F\u002Fgithub.com\u002Fmaeda6uiui-backup\u002FWikipediaWordFrequencyList) | - | - | ⭐ 2 | 🔴 Apr 2022|\n| 🔗 [kokkosho_data](https:\u002F\u002Fgithub.com\u002Frindybell\u002Fkokkosho_data) | - | - | ⭐ 1 | 🔴 Jul 2019|\n| 🔗 [pdmocrdataset-part1](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fpdmocrdataset-part1) | - | - | ⭐ 83 | 🔴 Jun 2024|\n| 🔗 [huriganacorpus-ndlbib](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhuriganacorpus-ndlbib) | - | - | ⭐ 31 | 🔴 Sep 2021|\n| 🔗 [jvs_hiho](https:\u002F\u002Fgithub.com\u002FHiroshiba\u002Fjvs_hiho) | - | - | ⭐ 31 | 🔴 Feb 2021|\n| 🔗 [hirakanadic](https:\u002F\u002Fgithub.com\u002Fpo3rin\u002Fhirakanadic) | 📥 
28 | 📦 14k | ⭐ 7 | 🔴 Jul 2023|\n| 🔗 [animedb](https:\u002F\u002Fgithub.com\u002Fanilogia\u002Fanimedb) | - | - | ⭐ 330 | 🔴 Jan 2023|\n| 🔗 [security_words](https:\u002F\u002Fgithub.com\u002FSaitoLab\u002Fsecurity_words) | - | - | ⭐ 27 | 🔴 Aug 2023|\n| 🔗 [Data-on-Japanese-Diet-Members](https:\u002F\u002Fgithub.com\u002Fsugi2000\u002FData-on-Japanese-Diet-Members) | - | - | ⭐ 3 | 🔴 Sep 2022|\n| 🔗 [honkoku-data](https:\u002F\u002Fgithub.com\u002Fyuta1984\u002Fhonkoku-data) | - | - | ⭐ 18 | 🟢 Mar|\n| 🔗 [wikihow_japanese](https:\u002F\u002Fgithub.com\u002FKatsumata420\u002Fwikihow_japanese) | - | - | ⭐ 35 | 🔴 Dec 2020|\n| 🔗 [engineer-vocabulary-list](https:\u002F\u002Fgithub.com\u002Fmercari\u002Fengineer-vocabulary-list) | - | - | ⭐ 1.9k | 🔴 Nov 2020|\n| 🔗 [JSICK](https:\u002F\u002Fgithub.com\u002Fverypluming\u002FJSICK) | - | - | ⭐ 45 | 🔴 May 2023|\n| 🔗 [phishurl-list](https:\u002F\u002Fgithub.com\u002FJPCERTCC\u002Fphishurl-list) | - | - | ⭐ 205 | 🟢 Mar|\n| 🔗 [jcms](https:\u002F\u002Fgithub.com\u002Fshigashiyama\u002Fjcms) | - | - | ⭐ 9 | 🟢 last Friday|\n| 🔗 [aozorabunko_text](https:\u002F\u002Fgithub.com\u002Faozorahack\u002Faozorabunko_text) | - | - | ⭐ 91 | 🔴 Mar 2023|\n| 🔗 [friendly_JA-Corpus](https:\u002F\u002Fgithub.com\u002Fastremo\u002Ffriendly_JA-Corpus) | - | - | ⭐ repository not found | 🔴 repository not found|\n| 🔗 [topokanji](https:\u002F\u002Fgithub.com\u002Fscriptin\u002Ftopokanji) | - | - | ⭐ 200 | 🔴 Jan 2016|\n| 🔗 [isbn4groups](https:\u002F\u002Fgithub.com\u002Furibo\u002Fisbn4groups) | - | - | ⭐ 1 | 🔴 Jun 2024|\n| 🔗 [NMeCab](https:\u002F\u002Fgithub.com\u002Fkomutan\u002FNMeCab) | - | - | ⭐ 99 | 🔴 Mar 2024|\n| 🔗 [ndlngramdata](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndlngramdata) | - | - | ⭐ 15 | 🔴 Jan 2023|\n| 🔗 [ndlngramviewer_v2](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndlngramviewer_v2) | - | - | ⭐ 3 | 🔴 Jul 2023|\n| 🔗 [data_set](https:\u002F\u002Fgithub.com\u002Fjapanese-law-analysis\u002Fdata_set) | - | - | ⭐ 51 | 🔴 Jan 2025|\n| 🔗 
[huggingface-datasets_wrime](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_wrime) | - | - | ⭐ 4 | 🔴 Jan 2023|\n| 🔗 [ndl-minhon-ocrdataset](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fndl-minhon-ocrdataset) | - | - | ⭐ 20 | 🟢 Mar|\n| 🔗 [PAX_SAPIENTICA](https:\u002F\u002Fgithub.com\u002FAsPJT\u002FPAX_SAPIENTICA) | - | - | ⭐ 181 | 🟡 Dec 2025|\n| 🔗 [j-liwc2015](https:\u002F\u002Fgithub.com\u002Ftasukuigarashi\u002Fj-liwc2015) | - | - | ⭐ 13 | 🔴 Nov 2024|\n| 🔗 [huggingface-datasets_livedoor-news-corpus](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_livedoor-news-corpus) | - | - | ⭐ 2 | 🔴 Oct 2023|\n| 🔗 [huggingface-datasets_JGLUE](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_JGLUE) | - | - | ⭐ 12 | 🔴 Mar 2025|\n| 🔗 [commonsense-moral-ja](https:\u002F\u002Fgithub.com\u002FLanguage-Media-Lab\u002Fcommonsense-moral-ja) | - | - | ⭐ 15 | 🟡 Nov 2025|\n| 🔗 [comet-atomic-ja](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002Fcomet-atomic-ja) | - | - | ⭐ 31 | 🔴 Mar 2024|\n| 🔗 [dcsg-ja](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002Fdcsg-ja) | - | - | ⭐ 6 | 🔴 Mar 2023|\n| 🔗 [japanese-toxic-dataset](https:\u002F\u002Fgithub.com\u002Finspection-ai\u002Fjapanese-toxic-dataset) | - | - | ⭐ 21 | 🔴 Jan 2023|\n| 🔗 [camera](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002Fcamera) | - | - | ⭐ 26 | 🔴 Aug 2024|\n| 🔗 [Japanese-Fakenews-Dataset](https:\u002F\u002Fgithub.com\u002Ftanreinama\u002FJapanese-Fakenews-Dataset) | - | - | ⭐ 20 | 🔴 May 2021|\n| 🔗 [jpn_explainable_qa_dataset](https:\u002F\u002Fgithub.com\u002Faiishii\u002Fjpn_explainable_qa_dataset) | - | - | ⭐ repository not found | 🔴 repository not found|\n| 🔗 [copa-japanese](https:\u002F\u002Fgithub.com\u002Fnlp-titech\u002Fcopa-japanese) | - | - | ⭐ 1 | 🔴 Feb 2023|\n| 🔗 [WLSP-familiarity](https:\u002F\u002Fgithub.com\u002Fmasayu-a\u002FWLSP-familiarity) | - | - | ⭐ 12 | 🔴 Jan 2025|\n| 🔗 [ProSub](https:\u002F\u002Fgithub.com\u002Fmatbahasa\u002FProSub) | - | - | ⭐ 5 | 🟡 Apr 2025|\n| 🔗 
[commonsense-moral-ja](https:\u002F\u002Fgithub.com\u002FLanguage-Media-Lab\u002Fcommonsense-moral-ja) | - | - | ⭐ 15 | 🟡 2025年11月|\n| 🔗 [ramendb](https:\u002F\u002Fgithub.com\u002Fnuko-yokohama\u002Framendb) | - | - | ⭐ 7 | 🟢 上周五|\n| 🔗 [huggingface-datasets_CAMERA](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fhuggingface-datasets_CAMERA) | - | - | ⭐ 3 | 🔴 2023年3月|\n| 🔗 [FactCheckSentenceNLI-FCSNLI-](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002FFactCheckSentenceNLI-FCSNLI-) | - | - | ⭐ 0 | 🔴 2021年3月|\n| 🔗 [databricks-dolly-15k-ja](https:\u002F\u002Fgithub.com\u002Fkunishou\u002Fdatabricks-dolly-15k-ja) | - | - | ⭐ 89 | 🔴 2023年7月|\n| 🔗 [EaST-MELD](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FEaST-MELD) | - | - | ⭐ 0 | 🔴 2023年6月|\n| 🔗 [meconaudio](https:\u002F\u002Fgithub.com\u002Felith-co-jp\u002Fmeconaudio) | - | - | ⭐ 10 | 🔴 2023年10月|\n| 🔗 [japanese-addresses](https:\u002F\u002Fgithub.com\u002Fgeolonia\u002Fjapanese-addresses) | - | - | ⭐ 761 | 🟡 2025年12月|\n| 🔗 [aozorasearch](https:\u002F\u002Fgithub.com\u002Fmyokoym\u002Faozorasearch) | - | - | ⭐ 22 | 🟢 3月|\n| 🔗 [llm-jp-corpus](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fllm-jp-corpus) | - | - | ⭐ 44 | 🔴 2023年10月|\n| 🔗 [alpaca_ja](https:\u002F\u002Fgithub.com\u002Fshi3z\u002Falpaca_ja) | - | - | ⭐ 86 | 🔴 2023年5月|\n| 🔗 [instruction_ja](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Finstruction_ja) | - | - | ⭐ 24 | 🔴 2023年7月|\n| 🔗 [japanese-family-names](https:\u002F\u002Fgithub.com\u002Fsiikamiika\u002Fjapanese-family-names) | - | - | ⭐ 18 | 🔴 2017年6月|\n| 🔗 [kanji-data-media](https:\u002F\u002Fgithub.com\u002Fkanjialive\u002Fkanji-data-media) | - | - | ⭐ 409 | 🔴 2023年11月|\n| 🔗 [reazonspeech](https:\u002F\u002Fgithub.com\u002Freazon-research\u002Freazonspeech) | - | - | ⭐ 380 | 🟢 1月|\n| 🔗 [huriganacorpus-aozora](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhuriganacorpus-aozora) | - | - | ⭐ 22 | 🔴 2024年1月|\n| 🔗 [koniwa](https:\u002F\u002Fgithub.com\u002Fkoniwa\u002Fkoniwa) | - | - | ⭐ 60 | 🟡 
2025年4月|\n| 🔗 [JMMLU](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002FJMMLU) | - | - | ⭐ 38 | 🟡 2025年10月|\n| 🔗 [hurigana-speech-corpus-aozora](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhurigana-speech-corpus-aozora) | - | - | ⭐ 48 | 🔴 2025年3月|\n| 🔗 [jqara](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Fjqara) | - | - | ⭐ 43 | 🟡 2025年9月|\n| 🔗 [jemhopqa](https:\u002F\u002Fgithub.com\u002Faiishii\u002Fjemhopqa) | - | - | ⭐ 30 | 🟡 2025年4月|\n| 🔗 [jacred](https:\u002F\u002Fgithub.com\u002Fyoumima\u002Fjacred) | - | - | ⭐ 8 | 🔴 2024年3月|\n| 🔗 [jades](https:\u002F\u002Fgithub.com\u002Fnaist-nlp\u002Fjades) | - | - | ⭐ 0 | 🔴 2022年12月|\n| 🔗 [do-not-answer-ja](https:\u002F\u002Fgithub.com\u002Fkunishou\u002Fdo-not-answer-ja) | - | - | ⭐ 24 | 🔴 2023年12月|\n| 🔗 [oasst1-89k-ja](https:\u002F\u002Fgithub.com\u002Fkunishou\u002Foasst1-89k-ja) | - | - | ⭐ 16 | 🔴 2023年11月|\n| 🔗 [jacwir](https:\u002F\u002Fgithub.com\u002Fhotchpotch\u002Fjacwir) | - | - | ⭐ 8 | 🟡 2025年9月|\n| 🔗 [japanese-technical-dict](https:\u002F\u002Fgithub.com\u002Flaoshubaby\u002Fjapanese-technical-dict) | - | - | ⭐ 3 | 🔴 2024年11月|\n| 🔗 [j-unimorph](https:\u002F\u002Fgithub.com\u002Fcl-tohoku\u002Fj-unimorph) | - | - | ⭐ 9 | 🟢 1月|\n| 🔗 [GazeVQA](https:\u002F\u002Fgithub.com\u002Friken-grp\u002FGazeVQA) | - | - | ⭐ 0 | 🔴 2024年9月|\n| 🔗 [J-CRe3](https:\u002F\u002Fgithub.com\u002Friken-grp\u002FJ-CRe3) | - | - | ⭐ 10 | 🔴 2025年1月|\n| 🔗 [jmed-llm](https:\u002F\u002Fgithub.com\u002Fsociocom\u002Fjmed-llm) | - | - | ⭐ 56 | 🔴 2024年9月|\n| 🔗 [lawtext](https:\u002F\u002Fgithub.com\u002Fyamachig\u002Flawtext) | - | - | ⭐ 94 | 🟢 1月|\n| 🔗 [pdmocrdataset-part2](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fpdmocrdataset-part2) | - | - | ⭐ 15 | 🔴 2024年6月|\n| 🔗 [japanesetopicwsd](https:\u002F\u002Fgithub.com\u002Fnut-jnlp\u002Fjapanesetopicwsd) | - | - | ⭐ 2 | 🔴 2018年9月|\n| 🔗 [temporalNLI_dataset](https:\u002F\u002Fgithub.com\u002Ftomo-vv\u002FtemporalNLI_dataset) | - | - | ⭐ 1 | 🔴 2023年7月|\n| 🔗 
[JSeM](https:\u002F\u002Fgithub.com\u002FDaisukeBekki\u002FJSeM) | - | - | ⭐ 13 | 🔴 2024年11月|\n| 🔗 [niilc-qa](https:\u002F\u002Fgithub.com\u002Fmynlp\u002Fniilc-qa) | - | - | ⭐ 18 | 🔴 2015年11月|\n| 🔗 [chain-of-thought-ja-dataset](https:\u002F\u002Fgithub.com\u002Fnlp-waseda\u002Fchain-of-thought-ja-dataset) | - | - | ⭐ 5 | 🔴 2023年9月|\n| 🔗 [WikipediaAnnotatedCorpus](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002FWikipediaAnnotatedCorpus) | - | - | ⭐ 29 | 🟢 2月|\n| 🔗 [elaws-history](https:\u002F\u002Fgithub.com\u002Fkissge\u002Felaws-history) | - | - | ⭐ 5 | 🟢 昨天|\n| 🔗 [Japanese-RP-Bench](https:\u002F\u002Fgithub.com\u002FAratako\u002FJapanese-RP-Bench) | - | - | ⭐ 18 | 🔴 2024年9月|\n| 🔗 [hdic](https:\u002F\u002Fgithub.com\u002Fshikeda\u002Fhdic) | - | - | ⭐ 41 | 🟢 3月|\n| 🔗 [awesome-japan-opendata](https:\u002F\u002Fgithub.com\u002Fjapan-opendata\u002Fawesome-japan-opendata) | - | - | ⭐ 159 | 🟢 3月|\n| 🔗 [kanji-data](https:\u002F\u002Fgithub.com\u002Fmimneko\u002Fkanji-data) | - | - | ⭐ 18 | 🟢 2月|\n| 🔗 [openchj-genji](https:\u002F\u002Fgithub.com\u002Ftogiso\u002Fopenchj-genji) | - | - | ⭐ 2 | 🔴 2025年3月|\n| 🔗 [AdParaphrase](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002FAdParaphrase) | - | - | ⭐ 1 | 🟡 2025年5月|\n| 🔗 [Jamp_sp](https:\u002F\u002Fgithub.com\u002Fynklab\u002FJamp_sp) | - | - | ⭐ 0 | 🔴 2024年6月|\n| 🔗 [jnli-neg](https:\u002F\u002Fgithub.com\u002Fasahi-y\u002Fjnli-neg) | - | - | ⭐ 0 | 🟡 2025年12月|\n| 🔗 [swallow-corpus](https:\u002F\u002Fgithub.com\u002Fswallow-llm\u002Fswallow-corpus) | - | - | ⭐ 6 | 🔴 2024年11月|\n| 🔗 [jalecon](https:\u002F\u002Fgithub.com\u002Fnaist-nlp\u002Fjalecon) | - | - | ⭐ 5 | 🔴 2023年7月|\n| 🔗 [multils-japanese](https:\u002F\u002Fgithub.com\u002Fnaist-nlp\u002Fmultils-japanese) | - | - | ⭐ 0 | 🔴 无效|\n| 🔗 [nwjc](https:\u002F\u002Fgithub.com\u002Fmasayu-a\u002Fnwjc) | - | - | ⭐ 10 | 🔴 2022年4月|\n| 🔗 [open-mantra-dataset](https:\u002F\u002Fgithub.com\u002Fmantra-inc\u002Fopen-mantra-dataset) | - | - | ⭐ 199 | 🔴 2023年3月|\n| 🔗 
[gimei](https:\u002F\u002Fgithub.com\u002Fwillnet\u002Fgimei) | - | - | ⭐ 424 | 🟢 1月|\n| 🔗 [safety-boundary-test](https:\u002F\u002Fgithub.com\u002Fsbintuitions\u002Fsafety-boundary-test) | - | - | ⭐ 9 | 🟡 2025年7月|\n| 🔗 [j-ono-data](https:\u002F\u002Fgithub.com\u002FObakeConstructs\u002Fj-ono-data) | - | - | ⭐ 7 | 🟢 上周四|\n| 🔗 [kanji](https:\u002F\u002Fgithub.com\u002Fsylhare\u002Fkanji) | - | - | ⭐ 28 | 🟢 上周五|\n| 🔗 [jethics](https:\u002F\u002Fgithub.com\u002Flanguage-media-lab\u002Fjethics) | - | - | ⭐ 2 | 🟡 2025年6月|\n| 🔗 [waon](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fwaon) | - | - | ⭐ 6 | 🟡 2025年11月|\n| 🔗 [kuci](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fkuci) | - | - | ⭐ 5 | 🔴 2024年2月|\n| 🔗 [japanese-address-testdata](https:\u002F\u002Fgithub.com\u002Ft-sagara\u002Fjapanese-address-testdata) | - | - | ⭐ 14 | 🔴 2023年9月|\n| 🔗 [jlpt-word-list](https:\u002F\u002Fgithub.com\u002Felzup\u002Fjlpt-word-list) | - | - | ⭐ 66 | 🔴 2022年2月|\n| 🔗 [hiragana_mojigazo](https:\u002F\u002Fgithub.com\u002Fndl-lab\u002Fhiragana_mojigazo) | - | - | ⭐ 18 | 🔴 2020年4月|\n| 🔗 [lawqa_jp](https:\u002F\u002Fgithub.com\u002Fdigital-go-jp\u002Flawqa_jp) | - | - | ⭐ 267 | 🟢 2月|\n| 🔗 [yjcaptions](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002Fyjcaptions) | - | - | ⭐ 60 | 🔴 2016年11月|\n| 🔗 [ja-vg-vqa](https:\u002F\u002Fgithub.com\u002Fyahoojapan\u002Fja-vg-vqa) | - | - | ⭐ 30 | 🔴 2018年11月|\n| 🔗 [lawhub](https:\u002F\u002Fgithub.com\u002Flwhb\u002Flawhub) | - | - | ⭐ 152 | 🔴 2020年11月|\n| 🔗 [japanese-subtitles-word-kanji-frequency-lists](https:\u002F\u002Fgithub.com\u002Fchriskempson\u002Fjapanese-subtitles-word-kanji-frequency-lists) | - | - | ⭐ 40 | 🔴 2023年12月|\n| 🔗 [jconj](https:\u002F\u002Fgithub.com\u002Fyamagoya\u002Fjconj) | - | - | ⭐ 35 | 🔴 2020年5月|\n| 🔗 [extract_jawp_names](https:\u002F\u002Fgithub.com\u002Fhiroshi-manabe\u002Fextract_jawp_names) | - | - | ⭐ 21 | 🔴 2022年12月|\n| 🔗 
[cejc_yomichan_freq_dict](https:\u002F\u002Fgithub.com\u002Fforsakeninfinity\u002Fcejc_yomichan_freq_dict) | - | - | ⭐ 11 | 🔴 2023年6月|\n| 🔗 [wikidict-ja](https:\u002F\u002Fgithub.com\u002Fopen-dict-data\u002Fwikidict-ja) | - | - | ⭐ 5 | 🔴 2016年6月|\n| 🔗 [ajimee-bench](https:\u002F\u002Fgithub.com\u002Fazookey\u002Fajimee-bench) | - | - | ⭐ 20 | 🔴 2025年1月|\n| 🔗 [j-spaw](https:\u002F\u002Fgithub.com\u002Ftakamichi-lab\u002Fj-spaw) | - | - | ⭐ 5 | 🟡 2025年8月|\n| 🔗 [camera3](https:\u002F\u002Fgithub.com\u002Fcyberagentailab\u002Fcamera3) | - | - | ⭐ 4 | 🔴 2024年5月|\n| 🔗 [jgpqa](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fjgpqa) | - | - | ⭐ 2 | 🟡 2025年9月|\n| 🔗 [tanaka-corpus-plus](https:\u002F\u002Fgithub.com\u002Fmarmooo\u002Ftanaka-corpus-plus) | - | - | ⭐ 2 | 🔴 2021年6月|\n| 🔗 [emotioncorpusjapanesetokushimaa2lab](https:\u002F\u002Fgithub.com\u002Fkmatsu-tokudai\u002Femotioncorpusjapanesetokushimaa2lab) | - | - | ⭐ 2 | 🔴 2024年9月|\n| 🔗 [osworld-jp](https:\u002F\u002Fgithub.com\u002Fkarakuri-ai\u002Fosworld-jp) | - | - | ⭐ 2 | 🟢 上周五|\n| 🔗 [quasi_japanese_reviews](https:\u002F\u002Fgithub.com\u002Fmegagonlabs\u002Fquasi_japanese_reviews) | - | - | ⭐ 1 | 🔴 2023年7月|\n| 🔗 [psychiatry-clinical-notes](https:\u002F\u002Fgithub.com\u002Fsociocom\u002Fpsychiatry-clinical-notes) | - | - | ⭐ 1 | 🟡 2025年10月|\n| 🔗 [merged-town-names](https:\u002F\u002Fgithub.com\u002Fyuukitoriyama\u002Fmerged-town-names) | - | - | ⭐ 1 | 🔴 2022年5月|\n| 🔗 [japanesetextemoticondata](https:\u002F\u002Fgithub.com\u002Fkuroshiba-ginji\u002Fjapanesetextemoticondata) | - | - | ⭐ 1 | 🔴 2021年3月|\n| 🔗 [mishearing-corpus](https:\u002F\u002Fgithub.com\u002Fkishiyamat\u002Fmishearing-corpus) | - | - | ⭐ 1 | 🟢 1月|\n| 🔗 [kotowaza](https:\u002F\u002Fgithub.com\u002Fseptn\u002Fkotowaza) | - | - | ⭐ 2 | 🟢 2月|\n| 🔗 [selective-rag-kasensabo](https:\u002F\u002Fgithub.com\u002Ftk-yasuno\u002Fselective-rag-kasensabo) | - | - | ⭐ 1 | 🟡 2025年11月|\n| 🔗 
[jmle2026-bench](https:\u002F\u002Fgithub.com\u002Fnaoto-iwase\u002Fjmle2026-bench) | - | - | ⭐ 10 | 🟢 3月|\n| 🔗 [JSTS-Neg](https:\u002F\u002Fgithub.com\u002Freiko-y\u002FJSTS-Neg) | - | - | ⭐ 1 | 🟢 2月|\n| 🔗 [business-slide-questions](https:\u002F\u002Fgithub.com\u002Fstockmarkteam\u002Fbusiness-slide-questions) | - | - | ⭐ 2 | 🟡 2025年5月|\n| 🔗 [WLSP-antonym](https:\u002F\u002Fgithub.com\u002Fmasayu-a\u002FWLSP-antonym) | - | - | ⭐ 0 | 🔴 2021年3月|\n| 🔗 [YouCook2-JP](https:\u002F\u002Fgithub.com\u002Fnlab-mpg\u002FYouCook2-JP) | - | - | ⭐ 0 | 🟡 2025年8月|\n| 🔗 [E2U](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FE2U) | - | - | ⭐ 0 | 🟢 3月|\n| 🔗 [annotation-2025](https:\u002F\u002Fgithub.com\u002FTiny-Colony\u002Fannotation-2025) | - | - | ⭐ 0 | 🟢 1月|\n| 🔗 [jhpt](https:\u002F\u002Fgithub.com\u002Fnict-astrec-att\u002Fjhpt) | - | - | ⭐ 3 | 🟢 3月|\n| 🔗 [JBE-QA](https:\u002F\u002Fgithub.com\u002Fhancules\u002FJBE-QA) | - | - | ⭐ 0 | 🟡 2025年11月|\n| 🔗 [j-spaw](https:\u002F\u002Fgithub.com\u002Ftakamichi-lab\u002Fj-spaw) | - | - | ⭐ 5 | 🟡 2025年8月|\n| 🔗 [JMedWiC](https:\u002F\u002Fgithub.com\u002FEhimeNLP\u002FJMedWiC) | - | - | ⭐ 3 | 🟢 3月|\n| 🔗 [jhpt](https:\u002F\u002Fgithub.com\u002Fnict-astrec-att\u002Fjhpt) | - | - | ⭐ 3 | 🟢 3月|\n| 🔗 [Doppelganger-JC](https:\u002F\u002Fgithub.com\u002F0017-alt\u002FDoppelganger-JC) | - | - | ⭐ 1 | 🟢 1月|\n| 🔗 [modelvista-3lang](https:\u002F\u002Fgithub.com\u002Fkuramitsulab\u002Fmodelvista-3lang) | - | - | ⭐ 2 | 🟢 3月|\n| 🔗 [japanese-hr-niah](https:\u002F\u002Fgithub.com\u002Fkufu\u002Fjapanese-hr-niah) | - | - | ⭐ 1 | 🟢 1月|\n| 🔗 [nijl-manyoshutei](https:\u002F\u002Fgithub.com\u002Fkokubunken\u002Fnijl-manyoshutei) | - | - | ⭐ 2 | 🟢 3月|\n| 🔗 [kamuskita](https:\u002F\u002Fgithub.com\u002Fmatbahasa\u002Fkamuskita) | - | - | ⭐ 2 | 🟢 上周四|\n\n## 教程\n学习日语自然语言处理工具和技巧的指南与教程\n\n * [spacy_tutorial](https:\u002F\u002Fgithub.com\u002Fyuibi\u002Fspacy_tutorial) - spaCy 英文和日文教程。包含 spacy-transformers、BERT、GiNZA。\n * 
[fastTextJapaneseTutorial](https:\u002F\u002Fgithub.com\u002Ficoxfog417\u002FfastTextJapaneseTutorial) - 使用日语文本语料训练 fastText 的教程。\n * [allennlp-NER-ja](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fallennlp-NER-ja) - AllenNLP-NER-ja：使用 AllenNLP 进行日语命名实体识别。\n * [chariot-PyTorch-Japanese-text-classification](https:\u002F\u002Fgithub.com\u002Fymym3412\u002Fchariot-PyTorch-Japanese-text-classification) - 使用 chariot 和 PyTorch 进行日语文本分类的实验。\n * [ginza-examples](https:\u002F\u002Fgithub.com\u002Fpoyo46\u002Fginza-examples) - 日语 NLP 库 GiNZA 推荐。\n * [DocumentClassificationUsingBERT-Japanese](https:\u002F\u002Fgithub.com\u002Fnekoumei\u002FDocumentClassificationUsingBERT-Japanese) - 使用 BERT 进行日语文档分类。\n * [BERT_Japanese_Google_Colaboratory](https:\u002F\u002Fgithub.com\u002FYutaroOgawa\u002FBERT_Japanese_Google_Colaboratory) - 在 Google Colab 上运行日语 BERT 的方法。\n * [bert-book](https:\u002F\u002Fgithub.com\u002Fstockmarkteam\u002Fbert-book) - 《BERT 自然语言处理入门：基于 Transformers 的实战编程》支持页面。\n * [janome-tutorial](https:\u002F\u002Fgithub.com\u002Fmocobeta\u002Fjanome-tutorial) - 使用 Janome 进行文本挖掘入门教程。\n * [handson-language-models](https:\u002F\u002Fgithub.com\u002Fhnishi\u002Fhandson-language-models) - 日语语言模型动手实践资料。\n * [JapaneseNLI](https:\u002F\u002Fgithub.com\u002Fverypluming\u002FJapaneseNLI) - 在 Google Colab 上尝试日语文本推理。\n * [deep-learning-with-pytorch-ja](https:\u002F\u002Fgithub.com\u002FGin5050\u002Fdeep-learning-with-pytorch-ja) - 深度学习与 PyTorch 的日语版仓库。\n * [bert-classification-tutorial](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fbert-classification-tutorial) - 【2023 年版】使用 BERT 进行文本分类。\n * [python-nlp-book](https:\u002F\u002Fgithub.com\u002Fpython-nlp-book\u002Fpython-nlp-book) - 《深度学习自然语言处理》（共立出版）的支持页面。\n * [llm-book](https:\u002F\u002Fgithub.com\u002Fghmagazine\u002Fllm-book) - 《大规模语言模型入门》（技术评论社，2023 年）的 GitHub 仓库。\n * [nlp2024-tutorial-3](https:\u002F\u002Fgithub.com\u002Fhiroshi-matsuda-rit\u002Fnlp2024-tutorial-3) - NLP2024 教程 3：亲手构建并学习日语大规模语言模型——环境搭建步骤及源代码。\n * 
[japanese-ir-tutorial](https:\u002F\u002Fgithub.com\u002Fmpkato\u002Fjapanese-ir-tutorial) - 日语文本信息检索教程。\n * [nlpbook](https:\u002F\u002Fgithub.com\u002Fmamorlis\u002Fnlpbook) - 《自然语言处理教科书》支持网站。\n * [kantan-regex-book](https:\u002F\u002Fgithub.com\u002Fmakenowjust\u002Fkantan-regex-book) - 通过实践学习正则表达式引擎。\n * [bert-classification-tutorial-2024](https:\u002F\u002Fgithub.com\u002Fhpprc\u002Fbert-classification-tutorial-2024) - 【2024 年版】使用 BERT 进行文本分类。\n * [Gemma2_2b_Japanese_finetuning_colab.ipynb](https:\u002F\u002Fgithub.com\u002Fqianniu95\u002Fgemma2_2b_finetune_jp_tutorial\u002Fblob\u002Fmain\u002FGemma2_2b_Japanese_finetuning_colab.ipynb) - 针对日语指令对 Google Gemma 进行微调。\n * [nlp100v2020](https:\u002F\u002Fgithub.com\u002Fupura\u002Fnlp100v2020) - 使用 Python 解答《语言处理 100 题 2020》。\n * [textmining-ja](https:\u002F\u002Fgithub.com\u002Fpaithiov909\u002Ftextmining-ja) - 使用 R 进行自然语言处理和文本分析练习。\n * [nlp2025-tutorial-2](https:\u002F\u002Fgithub.com\u002Fyuiseki\u002Fnlp2025-tutorial-2) - NLP2025 教程《地理信息与语言处理 实践入门》的资料和源代码。\n * [nlp100v2025](https:\u002F\u002Fgithub.com\u002Fupura\u002Fnlp100v2025) - 使用 Python 解答《语言处理 100 题 2025》。\n * [topic-models-ao](https:\u002F\u002Fgithub.com\u002Fanemptyarchive\u002Ftopic-models-ao) - 《主题模型》（机器学习专业系列）的笔记。\n * [slp2025](https:\u002F\u002Fgithub.com\u002Fryota-komatsu\u002Fslp2025) - 2025 年音学研讨会教程《多模态大规模语言模型入门》资料。\n * [book_impress_it-basic-education-ai](https:\u002F\u002Fgithub.com\u002Fliber-craft-co-ltd\u002Fbook_impress_it-basic-education-ai) - Impress 出版社《IT 基础素养：自然语言处理与图像分析》。\n * [genai-agent-advanced-book](https:\u002F\u002Fgithub.com\u002Fmasamasa59\u002Fgenai-agent-advanced-book) - 书籍《现场活用生成式 AI 代理实践入门》（讲谈社科学社）中使用的源代码。\n * [course2024-nlp](https:\u002F\u002Fgithub.com\u002Ftomonari-masada\u002Fcourse2024-nlp) - 2024 年立教大学研究生院人工智能科学研究科自然语言处理专题讲座。\n * [support-genai-book](https:\u002F\u002Fgithub.com\u002Fyoheikikuta\u002Fsupport-genai-book) - 从原始论文解析生成式 AI（技术评论社）的支持页面。\n * [ir100](https:\u002F\u002Fgithub.com\u002Fir100\u002Fir100) - 
信息检索 100 题。\n * [kaggle_llm_book](https:\u002F\u002Fgithub.com\u002Fsinchir0\u002Fkaggle_llm_book) - 《Kaggle 入门大规模语言模型——自然语言处理〈实战〉编程》的支持网站。\n\n|名称|每周下载量|总下载量|星数|最后提交|\n-|-|-|-|-\n| 🔗 [spacy_tutorial](https:\u002F\u002Fgithub.com\u002Fyuibi\u002Fspacy_tutorial) | - | - | ⭐ 65 | 🔴 2020年1月|\n| 🔗 [fastTextJapaneseTutorial](https:\u002F\u002Fgithub.com\u002Ficoxfog417\u002FfastTextJapaneseTutorial) | - | - | ⭐ 205 | 🔴 2016年9月|\n| 🔗 [allennlp-NER-ja](https:\u002F\u002Fgithub.com\u002Fshunk031\u002Fallennlp-NER-ja) | - | - | ⭐ 5 | 🔴 2022年5月|\n| 🔗 [chariot-PyTorch-Japanese-text-classification](https:\u002F\u002Fgithub.com\u002Fymym3412\u002Fchariot-PyTorch-Japanese-text-classification) | - | - | ⭐ 5 | 🔴 2019年3月|\n| 🔗 [ginza-examples](https:\u002F\u002Fgithub.com\u002Fpoyo46\u002Fginza-examples) | - | - | ⭐ 15 | 🔴 2021年1月|\n| 🔗 [DocumentClassificationUsingBERT-Japanese](https:\u002F\u002Fgithub.com\u002Fnekoumei\u002FDocumentClassificationUsingBERT-Japanese) | - | - | ⭐ 0 | 🟡 2025年8月|\n| 🔗 [BERT_Japanese_Google_Colaboratory](https:\u002F\u002Fgithub.com\u002FYutaroOgawa\u002FBERT_Japanese_Google_Colaboratory) | - | - | ⭐ 29 | 🔴 2022年1月|\n| 🔗 [bert-book](https:\u002F\u002Fgithub.com\u002Fstockmarkteam\u002Fbert-book) | - | - | ⭐ 264 | 🔴 2024年2月|\n| 🔗 [janome-tutorial](https:\u002F\u002Fgithub.com\u002Fmocobeta\u002Fjanome-tutorial) | - | - | ⭐ 31 | 🔴 2019年3月|\n| 🔗 [handson-language-models](https:\u002F\u002Fgithub.com\u002Fhnishi\u002Fhandson-language-models) | - | - | ⭐ 3 | 🔴 2021年3月|\n| 🔗 [JapaneseNLI](https:\u002F\u002Fgithub.com\u002Fverypluming\u002FJapaneseNLI) | - | - | ⭐ 6 | 🔴 2021年6月|\n| 🔗 [deep-learning-with-pytorch-ja](https:\u002F\u002Fgithub.com\u002FGin5050\u002Fdeep-learning-with-pytorch-ja) | - | - | ⭐ 143 | 🔴 2021年5月|\n| 🔗 [bert-classification-tutorial](https:\u002F\u002Fgithub.com\u002FhppRC\u002Fbert-classification-tutorial) | - | - | ⭐ 234 | 🔴 2024年5月|\n| 🔗 [python-nlp-book](https:\u002F\u002Fgithub.com\u002Fpython-nlp-book\u002Fpython-nlp-book) | - | - | 
⭐ 10 | 🔴 2023年5月|\n| 🔗 [llm-book](https:\u002F\u002Fgithub.com\u002Fghmagazine\u002Fllm-book) | - | - | ⭐ 467 | 🟡 2025年12月|\n| 🔗 [nlp2024-tutorial-3](https:\u002F\u002Fgithub.com\u002Fhiroshi-matsuda-rit\u002Fnlp2024-tutorial-3) | - | - | ⭐ 113 | 🔴 2024年4月|\n| 🔗 [japanese-ir-tutorial](https:\u002F\u002Fgithub.com\u002Fmpkato\u002Fjapanese-ir-tutorial) | - | - | ⭐ 3 | 🔴 2024年6月|\n| 🔗 [nlpbook](https:\u002F\u002Fgithub.com\u002Fmamorlis\u002Fnlpbook) | - | - | ⭐ 14 | 🟡 2025年4月|\n| 🔗 [kantan-regex-book](https:\u002F\u002Fgithub.com\u002Fmakenowjust\u002Fkantan-regex-book) | - | - | ⭐ 22 | 🔴 2024年3月|\n| 🔗 [bert-classification-tutorial-2024](https:\u002F\u002Fgithub.com\u002Fhpprc\u002Fbert-classification-tutorial-2024) | - | - | ⭐ 30 | 🔴 2024年7月|\n| 🔗 [Gemma2_2b_Japanese_finetuning_colab.ipynb](https:\u002F\u002Fgithub.com\u002Fqianniu95\u002Fgemma2_2b_finetune_jp_tutorial\u002Fblob\u002Fmain\u002FGemma2_2b_Japanese_finetuning_colab.ipynb) | - | - | ⭐ 仓库未找到 | 🔴 2024年8月|\n| 🔗 [nlp100v2020](https:\u002F\u002Fgithub.com\u002Fupura\u002Fnlp100v2020) | - | - | ⭐ 90 | 🟡 2025年4月|\n| 🔗 [textmining-ja](https:\u002F\u002Fgithub.com\u002Fpaithiov909\u002Ftextmining-ja) | - | - | ⭐ 3 | 🟢 3月|\n| 🔗 [nlp2025-tutorial-2](https:\u002F\u002Fgithub.com\u002Fyuiseki\u002Fnlp2025-tutorial-2) | - | - | ⭐ 17 | 🟢 2月|\n| 🔗 [nlp100v2025](https:\u002F\u002Fgithub.com\u002Fupura\u002Fnlp100v2025) | - | - | ⭐ 90 | 🟡 2025年4月|\n| 🔗 [public-annotations](https:\u002F\u002Fgithub.com\u002Fmanga109\u002Fpublic-annotations) | - | - | ⭐ 13 | 🟡 2025年4月|\n| 🔗 [topic-models-ao](https:\u002F\u002Fgithub.com\u002Fanemptyarchive\u002Ftopic-models-ao) | - | - | ⭐ 4 | 🟡 2025年5月|\n| 🔗 [slp2025](https:\u002F\u002Fgithub.com\u002Fryota-komatsu\u002Fslp2025) | - | - | ⭐ 64 | 🟢 上周三|\n| 🔗 [book_impress_it-basic-education-ai](https:\u002F\u002Fgithub.com\u002Fliber-craft-co-ltd\u002Fbook_impress_it-basic-education-ai) | - | - | ⭐ 4 | 🟡 2025年6月|\n| 🔗 
[genai-agent-advanced-book](https:\u002F\u002Fgithub.com\u002Fmasamasa59\u002Fgenai-agent-advanced-book) | - | - | ⭐ 194 | 🟡 2025年9月|\n| 🔗 [course2024-nlp](https:\u002F\u002Fgithub.com\u002Ftomonari-masada\u002Fcourse2024-nlp) | - | - | ⭐ 仓库未找到 | 🔴 仓库未找到|\n| 🔗 [support-genai-book](https:\u002F\u002Fgithub.com\u002Fyoheikikuta\u002Fsupport-genai-book) | - | - | ⭐ 91 | 🟢 1月|\n| 🔗 [ir100](https:\u002F\u002Fgithub.com\u002Fir100\u002Fir100) | - | - | ⭐ 93 | 🟡 2025年12月|\n| 🔗 [kaggle_llm_book](https:\u002F\u002Fgithub.com\u002Fsinchir0\u002Fkaggle_llm_book) | - | - | ⭐ 31 | 🟢 3月|\n\n\n\n\n## 研究总结\n日本自然语言处理研究中的各类研究与论文摘要\n\n * [awesome-bert-japanese](https:\u002F\u002Fgithub.com\u002Fhimkt\u002Fawesome-bert-japanese) - 包含日语预训练 BERT 模型的列表，附带词\u002F子词分词及词汇构建算法信息\n * [GEC-Info-ja](https:\u002F\u002Fgithub.com\u002Fgotutiyan\u002FGEC-Info-ja) - 收集并分类有关日语文法错误修正相关文献的仓库\n * [dataset-list](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fdataset-list) - 文本语料库等列表（主要为日语）\n * [tuning_playbook_ja](https:\u002F\u002Fgithub.com\u002FValkyrja3607\u002Ftuning_playbook_ja) - 用于系统性地最大化深度学习模型性能的指南\n * [japanese-pitch-accent-resources](https:\u002F\u002Fgithub.com\u002Folety\u002Fjapanese-pitch-accent-resources) - 旨在将日语音韵资源，尤其是重音资源整合到一个列表中\n * [awesome-japanese-llm](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fawesome-japanese-llm) - 开源日语大语言模型汇总\n\n\n|名称|每周下载量|总下载量|星数|最后提交|\n-|-|-|-|-\n| 🔗 [awesome-bert-japanese](https:\u002F\u002Fgithub.com\u002Fhimkt\u002Fawesome-bert-japanese) | - | - | ⭐ 132 | 🔴 2023年3月|\n| 🔗 [GEC-Info-ja](https:\u002F\u002Fgithub.com\u002Fgotutiyan\u002FGEC-Info-ja) | - | - | ⭐ 13 | 🟡 2025年4月|\n| 🔗 [dataset-list](https:\u002F\u002Fgithub.com\u002Fikegami-yukino\u002Fdataset-list) | - | - | ⭐ 118 | 🔴 2024年7月|\n| 🔗 [tuning_playbook_ja](https:\u002F\u002Fgithub.com\u002FValkyrja3607\u002Ftuning_playbook_ja) | - | - | ⭐ 190 | 🔴 2023年1月|\n| 🔗 [japanese-pitch-accent-resources](https:\u002F\u002Fgithub.com\u002Folety\u002Fjapanese-pitch-accent-resources) | - | - | ⭐ 126 | 
🔴 2024年2月|\n| 🔗 [awesome-japanese-llm](https:\u002F\u002Fgithub.com\u002Fllm-jp\u002Fawesome-japanese-llm) | - | - | ⭐ 1.4k | 🟢 3月|\n\n\n## 参考资料\n\n * [自然语言处理的饼屋](https:\u002F\u002Fwww.jnlp.org\u002Fnlp\u002Ftop)\n * [yasuoka的日志：日语依存句法分析器“2020年的全面梳理”](https:\u002F\u002Fsrad.jp\u002F~yasuoka\u002Fjournal\u002F643631\u002F)\n * [yasuoka的日志：日语依存句法分析器“2021年的全面梳理”](https:\u002F\u002Fsrad.jp\u002F~yasuoka\u002Fjournal\u002F651542\u002F)\n * https:\u002F\u002Fgithub.com\u002Ftopics\u002Fjapanese?l=python\n * https:\u002F\u002Fgithub.com\u002Ftopics\u002Fjapanese-language?l=python\n * https:\u002F\u002Fgithub.com\u002Fsearch?o=desc&q=corpus+japanese&s=&type=Repositories\n * https:\u002F\u002Fpaperswithcode.com\u002Fdatasets?lang=japanese\n * https:\u002F\u002Fgithub.com\u002Fhimkt\u002Fawesome-bert-japanese\n * [Awesome-Rust-MachineLearning-面向日语的 Rust 库和文章等的汇总](https:\u002F\u002Fgithub.com\u002Fvaaaaanquish\u002FAwesome-Rust-MachineLearning\u002Fblob\u002Fmain\u002FREADME.ja.md)\n * [大规模语言模型入门Ⅱ——生成式 LLM 的实现与评估](https:\u002F\u002Fgihyo.jp\u002Fbook\u002F2024\u002F978-4-297-14393-0)\n\n\n## 贡献者\n\n * [kaisugi](https:\u002F\u002Fgithub.com\u002Fkaisugi) - [网站](https:\u002F\u002Fkaisugi.me)\n * [bomin0624](https:\u002F\u002Fgithub.com\u002Fbomin0624) - [推特](https:\u002F\u002Ftwitter.com\u002Fbomin0624_c)\n * [passaglia](https:\u002F\u002Fgithub.com\u002Fpassaglia) - [推特](https:\u002F\u002Ftwitter.com\u002FSamPassaglia)\n * [sarumaj](https:\u002F\u002Fgithub.com\u002Fsarumaj) - [GitHub](https:\u002F\u002Fgithub.com\u002Fsarumaj)\n * [ln2058](https:\u002F\u002Fgithub.com\u002Fln2058) - [GitHub](https:\u002F\u002Fgithub.com\u002Fln2058)\n * [ajtgjmdjp](https:\u002F\u002Fgithub.com\u002Fajtgjmdjp) - [GitHub](https:\u002F\u002Fgithub.com\u002Fajtgjmdjp)","# awesome-japanese-nlp-resources 快速上手指南\n\n`awesome-japanese-nlp-resources` 并非单一的软件库，而是一个精选的日语自然语言处理（NLP）资源列表，涵盖了 Python 库、大模型、词典、语料库等。本指南将指导你如何快速搭建环境，并使用列表中几个最主流的工具进行日语分词、词性标注和句法分析。\n\n## 环境准备\n\n在开始之前，请确保你的开发环境满足以下要求：\n\n*   
**操作系统**: Linux, macOS 或 Windows (Windows 用户建议安装 WSL2 以获得最佳兼容性)。\n*   **Python 版本**: 推荐 Python 3.8 及以上版本。\n*   **包管理器**: 已安装 `pip` 或 `conda`。\n*   **系统依赖**:\n    *   部分底层库（如 MeCab, Sudachi）可能需要编译环境。\n    *   **Ubuntu\u002FDebian**: `sudo apt-get install build-essential cmake`\n    *   **macOS**: 确保已安装 Xcode Command Line Tools (`xcode-select --install`)。\n    *   **Windows**: 建议安装 Visual Studio Build Tools。\n\n> **提示**：国内开发者建议使用清华源或阿里源加速 Python 包下载。\n> ```bash\n> pip config set global.index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 安装步骤\n\n由于该列表包含众多工具，这里演示安装三个最具代表性的库：**SudachiPy** (工业级分词)、**Janome** (纯 Python 实现，无需额外依赖) 和 **GiNZA** (基于 spaCy 的句法分析)。\n\n### 1. 安装 SudachiPy (推荐用于生产环境)\nSudachi 是 Works Applications 开发的高性能分词器，支持多种分词模式。\n\n```bash\npip install sudachipy sudachidict_core\n```\n*注：`sudachidict_core` 是核心词典，必须安装才能使用。*\n\n### 2. 安装 Janome (推荐用于快速原型或无编译环境)\nJanome 是完全用 Python 编写的分词器，安装最简单，无需系统级依赖。\n\n```bash\npip install janome\n```\n\n### 3. 安装 GiNZA (推荐用于句法分析和依存关系)\nGiNZA 基于 spaCy 框架，提供高精度的日语依存句法分析。\n\n```bash\npip install ginza ja_ginza\n```\n*注：`ja_ginza` 是 GiNZA 的标准模型包，需与 `ginza` 一并安装；`spacy.load` 不会自动下载模型。*\n\n## 基本使用\n\n以下是各库的最简使用示例。\n\n### 1. 使用 SudachiPy 进行分词\n\n```python\nfrom sudachipy import tokenizer\nfrom sudachipy import dictionary\n\n# 创建分词器实例\ntokenizer_obj = dictionary.Dictionary().create()\n\n# 待处理的日语文本\ntext = \"日本語の自然言語処理は面白いです。\"\n\n# 执行分词 (SplitMode.C 为最长单位切分；A 为最短单位，B 为中间粒度)\ntokens = tokenizer_obj.tokenize(text, tokenizer.Tokenizer.SplitMode.C)\n\nfor token in tokens:\n    # 获取表面形式 (单词本身)\n    surface = token.surface()\n    # 获取词性 (Part of Speech)\n    pos = token.part_of_speech()\n    print(f\"{surface}\\t{pos}\")\n```\n\n### 2. 
使用 Janome 进行分词与词性标注\n\n```python\nfrom janome.tokenizer import Tokenizer\n\n# 初始化分词器\nt = Tokenizer()\n\ntext = \"日本語の自然言語処理は面白いです。\"\n\n# 分词并遍历结果\nfor token in t.tokenize(text):\n    # surface: 表层形式，part_of_speech: 词性（含细分类，以逗号分隔），base_form: 原形\n    print(f\"{token.surface}\\t{token.part_of_speech}\\t{token.base_form}\")\n```\n\n### 3. 使用 GiNZA 进行依存句法分析\n\n```python\nimport spacy\n\n# 加载日语模型 (需事先通过 pip 安装 ja_ginza)\nnlp = spacy.load(\"ja_ginza\")\n\ntext = \"猫がマットの上で寝ています。\"\n\n# 处理文本\ndoc = nlp(text)\n\n# 打印依存关系分析结果\nfor token in doc:\n    # text: 单词，dep_: 依存关系，head.text: 支配词\n    print(f\"{token.text}\\t{token.dep_}\\t{token.head.text}\")\n```\n\n## 更多资源探索\n\n上述工具仅是 `awesome-japanese-nlp-resources` 列表中的一部分。你可以访问其官方仓库查阅更多类别的资源：\n\n*   **Hugging Face 模型**: 查找最新的日语 BERT、LLM 模型及数据集。\n*   **其他语言绑定**: 查看 C++, Rust, Go, Java 等语言的实现。\n*   **语料库**: 获取用于训练的词性标注、命名实体识别及平行语料库。\n\n请访问项目主页获取完整列表：[awesome-japanese-nlp-resources](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources)","某跨境电商团队急需构建一套日语用户评论自动分析系统，以快速提取产品反馈并优化运营策略。\n\n### 没有 awesome-japanese-nlp-resources 时\n- **资源检索如大海捞针**：开发人员需在 GitHub、Hugging Face 及各类论文中盲目搜索，难以区分哪些库支持最新的日语分词或情感分析，耗时数周仍无法确定技术选型。\n- **模型兼容性风险高**：随意下载的预训练模型往往缺乏详细的日语语料说明，导致在处理敬语、方言或特定行业术语时准确率极低，且难以找到对应的修正方案。\n- **生态工具链断裂**：找到了分词库却找不到配套的词典或数据清洗工具，不同组件间的数据格式不统一，需要编写大量胶水代码进行转换，严重拖慢开发进度。\n- **重复造轮子现象严重**：因不了解社区已有的开源成果（如特定的 OCR 后处理或句法解析器），团队花费宝贵精力重新实现基础功能，造成人力资源的巨大浪费。\n\n### 使用 awesome-japanese-nlp-resources 后\n- **一站式精准选型**：直接查阅分类清晰的清单，几分钟内即可锁定适合电商场景的 Python 分词库（如 SudachiPy）和专用的日语情感分析模型，将技术调研时间从数周压缩至半天。\n- **经过验证的高质量资源**：依托列表中 curated（精选）的 278+ Hugging Face 模型与数据集，团队直接选用针对日语商业文本微调过的模型，显著提升了对复杂句式和隐含情感的识别精度。\n- **完整闭环的工具生态**：按图索骥获取从预处理、形态素分析到命名实体识别的全套兼容工具，确保数据流转顺畅，无需额外开发格式转换接口，系统搭建效率提升 300%。\n- **站在巨人肩膀上创新**：充分利用列表中收录的现成语料库和教程，团队跳过基础基建阶段，直接将精力集中在业务逻辑优化上，提前两周完成系统上线。\n\nawesome-japanese-nlp-resources 通过整合分散的日语 NLP 
生态资源，将原本混乱的技术探索过程转化为高效的标准作业流程，极大降低了日语人工智能应用的落地门槛。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaishi-i_awesome-japanese-nlp-resources_7eb60fab.png","taishi-i",null,"https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ftaishi-i_a46c605b.png","@CyberAgentAILab ","https:\u002F\u002Fgithub.com\u002Ftaishi-i",945,39,"2026-04-08T07:08:57","CC0-1.0",1,"Linux, macOS, Windows","未说明",{"notes":86,"python":84,"dependencies":87},"该仓库是一个日语 NLP 资源列表（Awesome List），本身不是一个单一的 AI 模型或工具，因此没有统一的运行环境需求。它收录了多种不同语言（Python, C++, Rust, Go, Java, JavaScript）编写的库、预训练模型和数据集。具体的环境需求（如 Python 版本、GPU 要求、依赖库等）需参考列表中各个独立项目的文档。部分基于深度学习的模型（如 BERT, SudachiPy 等）可能需要特定的深度学习框架支持。",[],[14,35],[90,91,92,93,94,95,96,97,98],"awesome","japanese","natural-language-processing","nlp","nlp-library","japanese-language","awesome-list","cc0","llm","2026-03-27T02:49:30.150509","2026-04-12T07:59:18.529119",[102,107,112,117,121,125,130,134],{"id":103,"question_zh":104,"answer_zh":105,"source_url":106},30369,"是否有用于日语文本自动添加标点的机器学习模型或工具？","有的。维护者提供了一个专门用于解决此问题的模型文件。如果在使用 GitHub 下载的模型时遇到权限错误，请使用重新上传的修复版本：[model_to_add_punctuation.zip](https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Ffiles\u002F12918225\u002Fmodel_to_add_punctuation.zip)。该模型旨在预测文本中需要添加标点的位置。","https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fissues\u002F13",{"id":108,"question_zh":109,"answer_zh":110,"source_url":111},30370,"有哪些高效的工具可以为电子书（ePub）中的汉字添加读音（振假名）？","推荐使用 [yomigana-ebook](https:\u002F\u002Fgithub.com\u002Frabbit19981023\u002Fyomigana-ebook)，该工具利用多进程库，性能比 furigana4epub 快约 4 倍。另外，还有一个名为 [furiganalyse](https:\u002F\u002Fgithub.com\u002Fitsupera\u002Ffuriganalyse) 的工具，如果正确配置多进程，其速度可能比 yomigana-ebook 再快约 20%。这些工具已被收录在资源列表中。","https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fissues\u002F11",{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},30371,"有哪些适用于搜索场景且易于使用的日语形态素分析器？","推荐以下两个主要项目：\n1. 
[atilika\u002Fkuromoji](https:\u002F\u002Fgithub.com\u002Fatilika\u002Fkuromoji)：一个自包含且专为搜索设计的日语形态素分析器，非常易于使用。\n2. [takuyaa\u002Fkuromoji.js](https:\u002F\u002Fgithub.com\u002Ftakuyaa\u002Fkuromoji.js)：Kuromoji 的 JavaScript 实现版本，适合前端或 Node.js 环境使用。","https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fissues\u002F1",{"id":118,"question_zh":119,"answer_zh":120,"source_url":116},30372,"有哪些基于 BERT 的日语依存句法分析器或分词工具？","可以考虑以下基于 BERT 架构的高级工具：\n1. [ku-nlp\u002Fbertknp](https:\u002F\u002Fgithub.com\u002Fku-nlp\u002Fbertknp)：基于 BERT 的日语依存句法分析器。\n2. [KoichiYasuoka\u002Fesupar](https:\u002F\u002Fgithub.com\u002FKoichiYasuoka\u002Fesupar)：支持日语及其他语言的分词器、词性标注器和依存句法分析器，支持 BERT\u002FRoBERTa\u002FDeBERTa 模型。\n3. [daac-tools\u002Fvaporetto](https:\u002F\u002Fgithub.com\u002Fdaac-tools\u002Fvaporetto)：一种基于点对点预测的高速分词器（Rust 实现）。",{"id":122,"question_zh":123,"answer_zh":124,"source_url":116},30373,"是否有针对日语医疗领域的预训练 BERT 模型或疾病名称提取工具？","是的，针对医疗垂直领域有以下资源：\n1. [ou-medinfo\u002Fmedbertjp](https:\u002F\u002Fgithub.com\u002Fou-medinfo\u002Fmedbertjp)：日语医疗领域的预训练 BERT 模型试验项目。\n2. [sociocom\u002FMedNER-J](https:\u002F\u002Fgithub.com\u002Fsociocom\u002FMedNER-J)：MedEX\u002FJ 的最新版本，专门用于提取日语疾病名称的命名实体识别工具。",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},30374,".NET 环境下有什么推荐的日语形态素分析引擎吗？","推荐使用 [NMeCab](https:\u002F\u002Fgithub.com\u002Fkomutan\u002FNMeCab)。这是一个专为 .NET 平台开发的日语形态素分析器，该项目已被正式收录到资源列表的“形态素分析工具”类别中。","https:\u002F\u002Fgithub.com\u002Ftaishi-i\u002Fawesome-japanese-nlp-resources\u002Fissues\u002F4",{"id":131,"question_zh":132,"answer_zh":133,"source_url":116},30375,"有哪些用于日语对话系统或聊天机器人的预训练模型和代码实现？","可以参考以下资源：\n1. [nttcslab\u002Fjapanese-dialog-transformers](https:\u002F\u002Fgithub.com\u002Fnttcslab\u002Fjapanese-dialog-transformers)：NTT 提供的日语预训练模型评估代码。\n2. [reppy4620\u002FDialog](https:\u002F\u002Fgithub.com\u002Freppy4620\u002FDialog)：使用 BERT 和 Transformer Decoder 实现的日语聊天机器人 PyTorch 代码。\n3. 
[octanove\u002Fshiba](https:\u002F\u002Fgithub.com\u002Foctanove\u002Fshiba)：CANINE（高效字符级 Transformer）的 PyTorch 实现及日语预训练模型。",{"id":135,"question_zh":136,"answer_zh":137,"source_url":111},30376,"如何为这个资源列表贡献新的工具或成为贡献者？","您可以通过提交 Pull Request (PR) 来添加新工具。如果您希望被列为贡献者（Contributors），请在 README.md 文件的 Contributors 部分添加您的 GitHub 链接以及 Twitter 或个人网站链接，然后发送 PR。维护者会在确认合并后更新列表。",[]]