[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-cmusphinx--g2p-seq2seq":3,"tool-cmusphinx--g2p-seq2seq":64},[4,17,25,39,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":10,"last_commit_at":23,"category_tags":24,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":26,"name":27,"github_repo":28,"description_zh":29,"stars":30,"difficulty_score":10,"last_commit_at":31,"category_tags":32,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[33,34,35,36,14,37,15,13,38],"图像","数据工具","视频","插件","其他","音频",{"id":40,"name":41,"github_repo":42,"description_zh":43,"stars":44,"difficulty_score":45,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[14,33,13,15,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":45,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[15,33,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":45,"last_commit_at":62,"category_tags":63,"status":16},2181,"OpenHands","OpenHands\u002FOpenHands","OpenHands 是一个专注于 AI 
驱动开发的开源平台，旨在让智能体（Agent）像人类开发者一样理解、编写和调试代码。它解决了传统编程中重复性劳动多、环境配置复杂以及人机协作效率低等痛点，通过自动化流程显著提升开发速度。\n\n无论是希望提升编码效率的软件工程师、探索智能体技术的研究人员，还是需要快速原型验证的技术团队，都能从中受益。OpenHands 提供了灵活多样的使用方式：既可以通过命令行（CLI）或本地图形界面在个人电脑上轻松上手，体验类似 Devin 的流畅交互；也能利用其强大的 Python SDK 自定义智能体逻辑，甚至在云端大规模部署上千个智能体并行工作。\n\n其核心技术亮点在于模块化的软件智能体 SDK，这不仅构成了平台的引擎，还支持高度可组合的开发模式。此外，OpenHands 在 SWE-bench 基准测试中取得了 77.6% 的优异成绩，证明了其解决真实世界软件工程问题的能力。平台还具备完善的企业级功能，支持与 Slack、Jira 等工具集成，并提供细粒度的权限管理，适合从个人开发者到大型企业的各类用户场景。",70626,"2026-04-05T22:51:36",[15,14,13,36],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":87,"forks":88,"last_commit_at":89,"license":90,"difficulty_score":91,"env_os":92,"env_gpu":93,"env_ram":92,"env_deps":94,"category_tags":101,"github_topics":102,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":105,"updated_at":106,"faqs":107,"releases":136},3680,"cmusphinx\u002Fg2p-seq2seq","g2p-seq2seq","G2P with Tensorflow","g2p-seq2seq 是一款基于 TensorFlow 开发的开源工具，专注于将英文单词的拼写（字素）自动转换为对应的发音符号（音素），即实现 G2P 转换。它主要解决了语音合成、语音识别及语言学研究中，如何准确地将文本映射为标准发音序列的难题，尤其擅长处理英语中不规则的拼读关系。\n\n这款工具非常适合语音技术开发者、人工智能研究人员以及需要构建自定义发音词典的语言学家使用。与普通规则引擎不同，g2p-seq2seq 的核心亮点在于采用了先进的 Transformer 模型架构。它摒弃了传统的循环神经网络（RNN）结构，完全依赖注意力机制来捕捉输入与输出之间的全局依赖关系，从而在长序列建模和复杂转导任务中表现出更高的效率与准确率。\n\n用户既可以直接下载预训练模型，通过命令行快速对单词列表进行批量发音生成或交互式测试，也可以利用自己的词典数据从头训练专属模型。其灵活的参数配置允许调整网络层数、隐藏单元大小等关键指标，以满足不同场景下的精度与性能需求。无论是用于评估现有模型的错误率，还是探索新的语音数据处理流程，g2p-seq2seq 都提供了一个高效且易于扩展的技术底座。","[![Build Status](https:\u002F\u002Ftravis-ci.org\u002Fcmusphinx\u002Fg2p-seq2seq.svg?branch=master)](https:\u002F\u002Ftravis-ci.org\u002Fcmusphinx\u002Fg2p-seq2seq)\n\n# Sequence-to-Sequence G2P toolkit\n\nThe tool does Grapheme-to-Phoneme (G2P) 
conversion using transformer model\nfrom tensor2tensor toolkit [1]. A lot of approaches in sequence modeling and\ntransduction problems use recurrent neural networks. But, transformer model\narchitecture eschews recurrence and instead relies entirely on an attention\nmechanism to draw global dependencies between input and output [2].\n\nThis implementation is based on python\n[TensorFlow](https:\u002F\u002Fwww.tensorflow.org\u002Ftutorials\u002Fseq2seq\u002F),\nwhich allows an efficient training on both CPU and GPU.\n\n## Installation\n\nThe tool requires TensorFlow at least version 1.8.0 and Tensor2Tensor version 1.6.6 or higher. Please see the installation\n[guide](https:\u002F\u002Fwww.tensorflow.org\u002Finstall\u002F)\nfor TensorFlow installation details, and details about the Tensor2Tensor installation see [guide](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor)\n\n\nThe g2p-seq2seq package itself uses setuptools, so you can follow standard installation process:\n\n```\nsudo python setup.py install\n```\n\nYou can also run the tests\n\n```\npython setup.py test\n```\n\nThe runnable script `g2p-seq2seq` is installed in  `\u002Fusr\u002Flocal\u002Fbin` folder by default (you can adjust it with `setup.py` options if needed) . You need to make sure you have this folder included in your `PATH` so you can run this script from command line.\n\n## Running G2P\n\nA pretrained 3-layer transformer model with 256 hidden units is [available for download on cmusphinx website](https:\u002F\u002Fsourceforge.net\u002Fprojects\u002Fcmusphinx\u002Ffiles\u002FG2P%20Models\u002Fg2p-seq2seq-model-6.2-cmudict-nostress.tar.gz\u002Fdownload).\nUnpack the model after download. 
The model is trained on the [CMU English dictionary](http:\u002F\u002Fgithub.com\u002Fcmusphinx\u002Fcmudict).\n\n```\nwget -O g2p-seq2seq-cmudict.tar.gz https:\u002F\u002Fsourceforge.net\u002Fprojects\u002Fcmusphinx\u002Ffiles\u002FG2P%20Models\u002Fg2p-seq2seq-model-6.2-cmudict-nostress.tar.gz\u002Fdownload\ntar xf g2p-seq2seq-cmudict.tar.gz\n```\n\nThe easiest way to check how the tool works is to run it in interactive mode and type some words:\n\n```\n$ g2p-seq2seq --interactive --model_dir model_folder_path\n...\n> hello\n...\nPronunciations: [HH EH L OW]\n...\n>\n```\n\nTo generate pronunciations for an English word list with a trained model, run\n\n```\n  g2p-seq2seq --decode your_wordlist --model_dir model_folder_path [--output decode_output_file_path]\n```\n\nThe wordlist is a text file with one word per line.\n\nIf you wish to list the top N decoding variants, set the return_beams flag and specify beam_size:\n\n```\n  g2p-seq2seq --decode your_wordlist --model_dir model_folder_path --return_beams --beam_size number_returned_beams [--output decode_output_file_path]\n```\n\nTo evaluate the Word Error Rate of the trained model, run\n\n```\n  g2p-seq2seq --evaluate your_test_dictionary --model_dir model_folder_path\n```\n\nThe test dictionary should be a dictionary in standard format:\n```\nhello HH EH L OW\nbye B AY\n```\n\nYou may also calculate the Word Error Rate over all top N decoded results. 
In this case a word counts as an error only if none of the decoded pronunciations matches the ground-truth pronunciation of the word.\n\n## Training G2P system\n\nTo train G2P you need a dictionary (one word and its phone sequence per line).\nSee an [example dictionary](http:\u002F\u002Fgithub.com\u002Fcmusphinx\u002Fcmudict)\n\n```\n  g2p-seq2seq --train train_dictionary.dic --model_dir model_folder_path\n```\n\nYou can limit the number of training epochs:\n```\n  \"--max_epochs\" - Maximum number of training epochs (Default: 0).\n     If 0, train until no improvement is observed\n```\n\nIt is a good idea to experiment with the following parameters:\n```\n  \"--size\" - Size of each model layer (Default: 256).\n\n  \"--num_layers\" - Number of layers in the model (Default: 3).\n\n  \"--filter_size\" - The size of the filter layer in a convolutional layer (Default: 512)\n\n  \"--num_heads\" - Number of heads in the multi-head attention mechanism (Default: 4)\n```\n\nYou can manually specify the development and test datasets:\n```\n  \"--valid\" - Development dictionary (Default: created from train_dictionary.dic)\n  \"--test\" - Test dictionary (Default: created from train_dictionary.dic)\n```\n\nOtherwise, the program itself splits the dataset that you feed to it in training mode. 
In the directory with the training data you will find three data files with the following extensions: \".train\", \".dev\" and \".test\".\n\nIf you have a raw dictionary with stress marks (for example, the [CMU English dictionary](http:\u002F\u002Fgithub.com\u002Fcmusphinx\u002Fcmudict)), you may set the following parameter when launching training mode:\n```\n  \"--cleanup\" - Set to True to clean stress marks and comments out of the dictionary.\n```\n\nIf you need to continue training a saved model, just point to the directory with the existing model:\n```\n  g2p-seq2seq --train train_dictionary.dic --model_dir model_folder_path\n```\n\nAnd, if you want to start training from scratch:\n```\n  \"--reinit\" - Rewrite model in model_folder_path\n```\n\nAlso, to solve the inverse problem:\n```\n  \"--p2g\" - Run the program in a phoneme-to-grapheme conversion mode.\n```\n\nThe differences in pronunciations between short and long words can be significant, so seq2seq models apply a bucketing technique to account for them. On the other hand, splitting the initial data into too many buckets can worsen the final results, because each bucket then contains too few examples. To get better results, you may tune the following three parameters that change the number and size of the buckets:\n```\n  \"--min_length_bucket\" - the size of the minimal bucket (Default: 6)\n  \"--max_length\" - maximal possible length of words or maximal number of phonemes in pronunciations (Default: 30)\n  \"--length_bucket_step\" - multiplier that controls the number of length buckets in the data. 
The buckets have maximum lengths from min_length_bucket to max_length, increasing by factors of length_bucket_step (Default: 1.5)\n```\n\nAfter training the model, you may freeze it:\n```\n  g2p-seq2seq --model_dir model_folder_path --freeze\n```\n\nThe file \"frozen_model.pb\" will appear in the \"model_folder_path\" directory after running the previous command. From then on, if you run one of the decoding modes, the program will load and use this frozen graph.\n\n\n#### Word error rate on CMU dictionary data sets\n\nSystem | WER ([CMUdict PRONASYL 2007](https:\u002F\u002Fsourceforge.net\u002Fprojects\u002Fcmusphinx\u002Ffiles\u002FG2P%20Models\u002Fphonetisaurus-cmudict-split.tar.gz)), % | WER ([CMUdict latest\\*](https:\u002F\u002Fgithub.com\u002Fcmusphinx\u002Fcmudict)), %\n--- | --- | ---\nBaseline WFST (Phonetisaurus) | 24.4 | 33.89\nTransformer num_layers=3, size=256   | 20.6 | 30.2\n\\* These results are for the dictionary without stress.\n\n## References\n---------------------------------------\n\n[1] Lukasz Kaiser. \"Accelerating Deep Learning Research with the Tensor2Tensor Library.\" In Google Research Blog, 2017.\n\n[2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 
\"Attention Is All You Need.\"\narXiv preprint\narXiv:1706.03762, 2017.\n","[![构建状态](https:\u002F\u002Ftravis-ci.org\u002Fcmusphinx\u002Fg2p-seq2seq.svg?branch=master)](https:\u002F\u002Ftravis-ci.org\u002Fcmusphinx\u002Fg2p-seq2seq)\n\n# 序列到序列的G2P工具包\n\n该工具使用来自tensor2tensor工具包 [1] 的Transformer模型进行字素到音素（G2P）转换。在序列建模和转导问题中，许多方法都使用循环神经网络。然而，Transformer模型架构摒弃了递归，完全依赖于注意力机制来捕捉输入和输出之间的全局依赖关系 [2]。\n\n此实现基于Python的[TensorFlow](https:\u002F\u002Fwww.tensorflow.org\u002Ftutorials\u002Fseq2seq\u002F)，可以在CPU和GPU上高效地进行训练。\n\n## 安装\n\n该工具需要TensorFlow至少版本1.8.0以及Tensor2Tensor版本1.6.6或更高。请参阅[TensorFlow安装指南](https:\u002F\u002Fwww.tensorflow.org\u002Finstall\u002F)以获取详细的安装信息，并参考[Tensor2Tensor安装指南](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor)了解Tensor2Tensor的安装详情。\n\ng2p-seq2seq软件包本身使用setuptools，因此您可以按照标准的安装流程进行：\n\n```\nsudo python setup.py install\n```\n\n您还可以运行测试：\n\n```\npython setup.py test\n```\n\n可执行脚本`g2p-seq2seq`默认安装在`\u002Fusr\u002Flocal\u002Fbin`目录下（如有需要，可通过`setup.py`选项进行调整）。您需要确保该目录已包含在您的`PATH`环境变量中，以便可以从命令行运行此脚本。\n\n## 运行G2P\n\n一个预训练的3层Transformer模型，具有256个隐藏单元，可在[cmusphinx网站](https:\u002F\u002Fsourceforge.net\u002Fprojects\u002Fcmusphinx\u002Ffiles\u002FG2P%20Models\u002Fg2p-seq2seq-model-6.2-cmudict-nostress.tar.gz\u002Fdownload)下载。下载后解压模型。该模型是在[CMU英语词典](http:\u002F\u002Fgithub.com\u002Fcmusphinx\u002Fcmudict)上训练的。\n\n```\nwget -O g2p-seq2seq-cmudict.tar.gz https:\u002F\u002Fsourceforge.net\u002Fprojects\u002Fcmusphinx\u002Ffiles\u002FG2P%20Models\u002Fg2p-seq2seq-model-6.2-cmudict-nostress.tar.gz\u002Fdownload\ntar xf g2p-seq2seq-cmudict.tar.gz\n```\n\n检查工具工作方式的最简单方法是进入交互模式并输入单词：\n\n```\n$ g2p-seq2seq --interactive --model_dir model_folder_path\n...\n> hello\n...\n发音：[HH EH L OW]\n...\n>\n```\n\n要使用训练好的模型为英文单词列表生成发音，请运行：\n\n```\n  g2p-seq2seq --decode your_wordlist --model_dir model_folder_path [--output decode_output_file_path]\n```\n\n单词列表是一个每行一个单词的文本文件。\n\n如果您希望列出前N个解码变体，可以设置`return_beams`标志并指定束宽：\n\n```\n  g2p-seq2seq --decode your_wordlist 
--model_dir model_folder_path --return_beams --beam_size number_returned_beams [--output decode_output_file_path]\n```\n\n要评估训练模型的词错误率，请运行：\n\n```\n  g2p-seq2seq --evaluate your_test_dictionary --model_dir model_folder_path\n```\n\n测试词典应为标准格式的词典：\n```\nhello HH EH L OW\nbye B AY\n```\n\n您也可以计算考虑所有前N个最佳解码结果的词错误率。在这种情况下，只有当所有解码的发音都不匹配单词的真实发音时，才将该单词的解码视为错误。\n\n## 训练G2P系统\n\n要训练G2P，您需要一个词典（每行一个单词及其对应的音素序列）。请参阅[示例词典](http:\u002F\u002Fgithub.com\u002Fcmusphinx\u002Fcmudict)。\n\n```\n  g2p-seq2seq --train train_dictionary.dic --model_dir model_folder_path\n```\n\n您可以设置最大训练步数：\n```\n  \"--max_epochs\" - 最大训练轮数（默认：0）。\n     如果为0，则训练直到不再有改进为止\n```\n\n建议尝试以下参数：\n```\n  \"--size\" - 每层模型的大小（默认：256）。\n\n  \"--num_layers\" - 模型的层数（默认：3）。\n\n  \"--filter_size\" - 卷积层中滤波器层的大小（默认：512）\n\n  \"--num_heads\" - 多头注意力机制中应用的头数（默认：4）\n```\n\n您可以手动指定开发集和测试集：\n```\n  \"--valid\" - 开发词典（默认：由train_dictionary.dic创建）\n  \"--test\" - 测试词典（默认：由train_dictionary.dic创建）\n```\n\n否则，程序会自行将您提供的数据集在训练模式下进行分割。在训练数据所在的目录中，您将找到三个带有以下扩展名的数据文件：“.train”、“.dev”和“.test”。\n\n如果您拥有带重音的原始词典（例如[CMU英语词典](http:\u002F\u002Fgithub.com\u002Fcmusphinx\u002Fcmudict)），则可以在启动训练模式时设置以下参数：\n```\n  \"--cleanup\" - 设置为True以清除词典中的重音和注释。\n```\n\n如果需要继续训练已保存的模型，只需指定现有模型的目录即可：\n```\n  g2p-seq2seq --train train_dictionary.dic --model_dir model_folder_path\n```\n\n如果您想从头开始训练：\n```\n  \"--reinit\" - 在model_folder_path中重新初始化模型\n```\n\n此外，在解决逆向问题时：\n```\n  \"--p2g\" - 以音素到字素转换模式运行程序。\n```\n\n长短单词之间的发音差异可能很大。因此，序列到序列模型采用分桶技术来处理此类问题。另一方面，如果将初始数据分成过多的桶，可能会导致最终效果变差，因为每个桶中的样本数量可能不足。为了获得更好的结果，您可以调整以下三个控制桶的数量和大小的参数：\n```\n  \"--min_length_bucket\" - 最小桶的长度（默认：6）\n  \"--max_length\" - 单词的最大长度或发音中音素的最大数量（默认：30）\n  \"--length_bucket_step\" - 控制数据中长度桶数量的倍数因子。桶的最大长度从min_bucket_length到max_length，按length_bucket_step的倍数递增（默认：1.5）\n```\n\n训练完模型后，您可以将其冻结：\n```\n  g2p-seq2seq --model_dir model_folder_path --freeze\n```\n\n执行上述命令后，“model_folder_path”目录下会出现“frozen_model.pb”文件。现在，如果您运行任何一种解码模式，程序将加载并使用这个冻结的图。\n\n#### CMU词典数据集上的词错误率\n\n系统 | WER ([CMUdict 
PRONASYL 2007](https:\u002F\u002Fsourceforge.net\u002Fprojects\u002Fcmusphinx\u002Ffiles\u002FG2P%20Models\u002Fphonetisaurus-cmudict-split.tar.gz)), % | WER ([CMUdict最新版\\*](https:\u002F\u002Fgithub.com\u002Fcmusphinx\u002Fcmudict)), %\n--- | --- | ---\n基线WFST（Phonetisaurus） | 24.4 | 33.89\nTransformer num_layers=3, size=256   | 20.6 | 30.2\n\\* 这些结果针对的是无重音词典。\n\n## 参考文献\n---------------------------------------\n\n[1] 卢卡什·凯泽. “借助 Tensor2Tensor 库加速深度学习研究.” 载于 Google 研究博客, 2017 年.\n\n[2] 阿希什·瓦斯瓦尼, 诺姆·沙泽尔, 尼基·帕尔马尔, 雅各布·乌斯科雷特, 利昂·琼斯, 艾丹·N·戈麦斯, 卢卡什·凯泽, 以及 伊利亚·波洛苏金. “注意力即一切.”\narXiv 预印本\narXiv:1706.03762, 2017 年.","# g2p-seq2seq 快速上手指南\n\ng2p-seq2seq 是一个基于 Transformer 架构的图素转音素（Grapheme-to-Phoneme, G2P）工具包。它利用注意力机制处理输入输出间的全局依赖，支持在 CPU 和 GPU 上高效训练与推理，适用于英文单词发音生成任务。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**：Linux 或 macOS（Windows 需配置相应 Python 环境）\n*   **Python 版本**：推荐 Python 3.6+\n*   **核心依赖**：\n    *   TensorFlow >= 1.8.0\n    *   Tensor2Tensor >= 1.6.6\n\n> **注意**：由于该工具依赖较旧版本的 TensorFlow (1.x)，建议在使用前创建独立的虚拟环境（如 `venv` 或 `conda`）以避免版本冲突。国内用户可通过清华源或阿里源加速安装依赖：\n> ```bash\n> pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple tensorflow==1.15.0 tensor2tensor==1.15.7\n> ```\n\n## 安装步骤\n\n克隆项目或直接下载源码后，使用 setuptools 进行标准安装：\n\n```bash\nsudo python setup.py install\n```\n\n安装完成后，可运行测试验证安装是否成功：\n\n```bash\npython setup.py test\n```\n\n默认情况下，可执行脚本 `g2p-seq2seq` 会被安装到 `\u002Fusr\u002Flocal\u002Fbin` 目录。请确保该目录已包含在您的环境变量 `PATH` 中，以便在终端直接调用。\n\n## 基本使用\n\n### 1. 下载预训练模型\n\n最便捷的方式是使用官方提供的基于 CMU English Dictionary 训练的 3 层 Transformer 模型（256 隐藏单元）。\n\n```bash\nwget -O g2p-seq2seq-cmudict.tar.gz https:\u002F\u002Fsourceforge.net\u002Fprojects\u002Fcmusphinx\u002Ffiles\u002FG2P%20Models\u002Fg2p-seq2seq-model-6.2-cmudict-nostress.tar.gz\u002Fdownload\ntar xf g2p-seq2seq-cmudict.tar.gz\n```\n*解压后将生成包含模型文件的文件夹（例如 `model_folder_path`）。*\n\n### 2. 
交互式测试\n\n运行交互模式，直接在命令行输入单词即可获取音标：\n\n```bash\ng2p-seq2seq --interactive --model_dir model_folder_path\n```\n\n**示例输出：**\n```text\n...\n> hello\n...\nPronunciations: [HH EH L OW]\n...\n>\n```\n\n### 3. 批量转换\n\n若需对单词列表文件（每行一个单词）进行批量转换，使用 `--decode` 参数：\n\n```bash\ng2p-seq2seq --decode your_wordlist --model_dir model_folder_path --output decode_output_file_path\n```\n\n如需获取每个单词的前 N 个最佳发音候选，可添加 `--return_beams` 和 `--beam_size` 参数：\n\n```bash\ng2p-seq2seq --decode your_wordlist --model_dir model_folder_path --return_beams --beam_size 5 --output decode_output_file_path\n```","某语音初创团队正在开发一款支持生僻人名和地名的智能客服系统，需要快速构建高精度的文本转音素（G2P）引擎以优化语音合成效果。\n\n### 没有 g2p-seq2seq 时\n- **生僻词识别率低**：传统基于规则或统计的 G2P 工具难以处理未登录词（OOV），导致“龘”、“犇”等复杂汉字或新造词发音错误频发。\n- **多音字歧义难解**：缺乏上下文感知能力，无法根据语义准确判断多音字（如“重庆”与“重复”）的正确读音，需人工硬编码大量例外规则。\n- **模型迭代成本高**：若要提升特定领域（如医疗术语）的准确率，重新训练循环神经网络（RNN）模型耗时极长，且难以在普通 GPU 上高效并行。\n- **发音多样性缺失**：只能输出单一标准发音，无法提供备选读音方案，限制了语音合成在自然度上的调整空间。\n\n### 使用 g2p-seq2seq 后\n- **生僻词泛化能力强**：利用 Transformer 架构的全局注意力机制，g2p-seq2seq 能精准捕捉字符间的长距离依赖，显著提升生僻字和新词的拼读准确率。\n- **语境消歧更智能**：模型自动学习上下文特征，无需手动编写规则即可准确区分多音字在不同词汇中的发音，大幅降低维护成本。\n- **训练效率大幅提升**：基于 TensorFlow 和 Tensor2Tensor 的实现支持高效的 GPU 并行训练，团队可在数小时内完成针对垂直领域词典的模型微调。\n- **支持多候选输出**：通过设置 `return_beams` 参数，g2p-seq2seq 可一次性输出 Top-N 种发音变体，为后续语音合成模块提供更丰富的韵律选择。\n\ng2p-seq2seq 凭借 Transformer 架构的优势，将原本繁琐的规则维护工作转化为高效的数据驱动训练，从根本上解决了开放域场景下的发音难题。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcmusphinx_g2p-seq2seq_32863d6b.png","cmusphinx","CMU Sphinx","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fcmusphinx_e083fedf.png","CMU Sphinx tools for speech recognition",null,"cmusphinx.github.io","https:\u002F\u002Fgithub.com\u002Fcmusphinx",[83],{"name":84,"color":85,"percentage":86},"Python","#3572A5",100,681,190,"2026-03-02T16:28:58","NOASSERTION",4,"未说明","非必需，支持在 CPU 和 GPU 上进行高效训练",{"notes":95,"python":96,"dependencies":97},"该工具基于较旧的 TensorFlow 1.x 版本构建。默认安装后脚本位于 \u002Fusr\u002Flocal\u002Fbin，需确保该目录在系统 PATH 中。提供预训练的 Transformer 模型（基于 CMU 
英语词典），也支持从头训练或微调。训练时可配置桶排序（bucketing）参数以优化长短词的处理效果。","未说明 (需兼容 TensorFlow 1.8+)",[98,99,100],"tensorflow>=1.8.0","tensor2tensor>=1.6.6","setuptools",[15,38],[67,103,104],"g2p","cmudict","2026-03-27T02:49:30.150509","2026-04-06T09:43:34.050155",[108,113,118,123,128,132],{"id":109,"question_zh":110,"answer_zh":111,"source_url":112},16863,"运行测试时出现 'ImportError: cannot import name input_fn_builder' 错误怎么办？","这是因为版本不兼容。g2p-seq2seq 目前仅支持特定版本的 tensor2tensor（例如 1.5.7）。请卸载当前版本并安装兼容的旧版本：\npip uninstall tensor2tensor\npip install tensor2tensor==1.5.7","https:\u002F\u002Fgithub.com\u002Fcmusphinx\u002Fg2p-seq2seq\u002Fissues\u002F108",{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},16864,"遇到 'Estimator's model_fn includes params argument, but params are not passed to Estimator' 警告或错误如何解决？","这通常是因为传递给模型的训练步数（--num_train_steps）太少，小于评估步数（--num_eval_steps）或量化延迟参数设定的步数。解决方法是增加 --num_train_steps 的值，确保其大于其他相关步数设定，然后重新运行训练脚本。","https:\u002F\u002Fgithub.com\u002Fcmusphinx\u002Fg2p-seq2seq\u002Fissues\u002F117",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},16865,"处理包含 Unicode 字符（如 IPA 音标）的词汇表时出现 'UnicodeEncodeError' 怎么办？","这是编码问题。建议修改代码逻辑，将此类错误改为警告（warning）而不是直接报错崩溃，并跳过包含非法字符的行继续处理。同时应添加检查机制，提供更有意义的错误提示信息，以便用户定位具体哪一行数据有问题。","https:\u002F\u002Fgithub.com\u002Fcmusphinx\u002Fg2p-seq2seq\u002Fissues\u002F110",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},16866,"在 TensorFlow 1.1 环境下运行测试出现 RNNCell 相关错误如何处理？","该问题与旧版 TensorFlow 兼容性有关。建议使用项目中的 't2t' 分支代码，该分支针对新版 tensor2tensor 进行了适配。注意：由于仍存在一些小问题，该分支尚未合并到主分支（master），但它是当前的参考实现。","https:\u002F\u002Fgithub.com\u002Fcmusphinx\u002Fg2p-seq2seq\u002Fissues\u002F83",{"id":129,"question_zh":130,"answer_zh":131,"source_url":127},16867,"使用 tensor2tensor 模型进行交互式推理时提示 'Model not found' 错误？","请检查模型路径是否正确。如果模型文件位于子目录中（例如 \u002Fmodels\u002Fg2p-tensor2tensor\u002Fbase\u002F），必须在命令中指定完整的路径，包括子目录名称，而不仅仅是父目录。例如：\ng2p-seq2seq --interactive --model 
\u002Fmodels\u002Fg2p-tensor2tensor\u002Fbase\u002F",{"id":133,"question_zh":134,"answer_zh":135,"source_url":122},16868,"训练过程中出现大量 'Symbols are not in vocabulary' 导致准确率极低怎么办？","这通常是因为输入数据中包含词汇表中未定义的符号（如数字 '5' 或特殊字符）。建议在预处理阶段清理数据，移除或替换这些未知符号。同时，可以在代码中添加检查逻辑，当遇到不在词汇表中的符号时输出明确的警告信息，方便排查数据源问题。",[]]