[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-EdinburghNLP--nematus":3,"tool-EdinburghNLP--nematus":65},[4,17,27,35,48,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",149489,2,"2026-04-10T11:32:46",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,3,"2026-04-06T11:19:32",[15,26,14,13],"图像",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":10,"last_commit_at":33,"category_tags":34,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 
都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85092,"2026-04-10T11:13:16",[26,43,44,45,14,46,15,13,47],"数据工具","视频","插件","其他","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":54,"last_commit_at":55,"category_tags":56,"status":16},5784,"funNLP","fighting41love\u002FfunNLP","funNLP 是一个专为中文自然语言处理（NLP）打造的超级资源库，被誉为\"NLP 民工的乐园”。它并非单一的软件工具，而是一个汇集了海量开源项目、数据集、预训练模型和实用代码的综合性平台。\n\n面对中文 NLP 领域资源分散、入门门槛高以及特定场景数据匮乏的痛点，funNLP 提供了“一站式”解决方案。这里不仅涵盖了分词、命名实体识别、情感分析、文本摘要等基础任务的标准工具，还独特地收录了丰富的垂直领域资源，如法律、医疗、金融行业的专用词库与数据集，甚至包含古诗词生成、歌词创作等趣味应用。其核心亮点在于极高的全面性与实用性，从基础的字典词典到前沿的 BERT、GPT-2 模型代码，再到高质量的标注数据和竞赛方案，应有尽有。\n\n无论是刚刚踏入 NLP 领域的学生、需要快速验证想法的算法工程师，还是从事人工智能研究的学者，都能在这里找到急需的“武器弹药”。对于开发者而言，它能大幅减少寻找数据和复现模型的时间；对于研究者，它提供了丰富的基准测试资源和前沿技术参考。funNLP 以开放共享的精神，极大地降低了中文自然语言处理的开发与研究成本，是中文 AI 
社区不可或缺的宝藏仓库。",79857,1,"2026-04-08T20:11:31",[15,43,46],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":23,"last_commit_at":63,"category_tags":64,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[14,26,13,15,46],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":80,"owner_email":80,"owner_twitter":80,"owner_website":81,"owner_url":82,"languages":83,"stars":120,"forks":121,"last_commit_at":122,"license":123,"difficulty_score":23,"env_os":124,"env_gpu":125,"env_ram":126,"env_deps":127,"category_tags":135,"github_topics":136,"view_count":10,"oss_zip_url":80,"oss_zip_packed_at":80,"status":16,"created_at":142,"updated_at":143,"faqs":144,"releases":178},6270,"EdinburghNLP\u002Fnematus","nematus","Open-Source Neural Machine Translation in Tensorflow","Nematus 是一款基于 TensorFlow 构建的开源神经机器翻译工具，专注于提供高性能的编码器 - 解码器模型。它主要解决了研究人员和开发者在构建、训练及部署高质量翻译系统时面临的架构选择与工程实现难题，让用户无需从零搭建底层框架即可开展前沿实验。\n\n这款工具特别适合自然语言处理领域的研究人员、算法工程师以及需要定制化翻译解决方案的开发团队使用。其核心亮点在于极高的灵活性：不仅支持经典的 RNN 和先进的 Transformer 架构，还集成了因子化神经机器翻译、深度模型、混合软最大（Mixture of Softmaxes）以及针对 Transformer 的 DropHead 等多种高级技术特性。此外，Nematus 提供了完善的多 GPU 训练加速、最小风险训练（MRT）、批量解码及服务器模式等实用功能，并预置了多个在 WMT 国际评测中表现优异的模型供直接调用或微调。无论是进行学术探索还是构建生产级翻译服务，Nematus 
都能提供稳定且高效的技术支持。","NEMATUS\n-------\n\nAttention-based encoder-decoder model for neural machine translation built in TensorFlow.\n\nNotable features include:\n\n  - support for RNN and Transformer architectures\n\n  - support for advanced RNN architectures:\n     - [arbitrary input features](doc\u002Ffactored_neural_machine_translation.md) (factored neural machine translation) http:\u002F\u002Fwww.statmt.org\u002Fwmt16\u002Fpdf\u002FW16-2209.pdf\n     - deep models (Miceli Barone et al., 2017) https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.07631\n     - dropout on all layers (Gal, 2015) http:\u002F\u002Farxiv.org\u002Fabs\u002F1512.05287\n     - tied embeddings (Press and Wolf, 2016) https:\u002F\u002Farxiv.org\u002Fabs\u002F1608.05859\n     - layer normalisation (Ba et al., 2016) https:\u002F\u002Farxiv.org\u002Fabs\u002F1607.06450\n     - mixture of softmaxes (Yang et al., 2017) https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.03953\n     - lexical model (Nguyen and Chiang, 2018) https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FN18-1031\n\n  - support for advanced Transformer architectures:\n     - [arbitrary input features](doc\u002Ffactored_neural_machine_translation.md) (factored neural machine translation) http:\u002F\u002Fwww.statmt.org\u002Fwmt16\u002Fpdf\u002FW16-2209.pdf\n     - DropHead: dropout of entire attention heads (Zhou et al., 2020) https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.13342\n\n  - training features:\n     - multi-GPU support [documentation](doc\u002Fmulti_gpu_training.md)\n     - label smoothing\n     - early stopping with a user-defined stopping criterion\n     - resume training (optionally with MAP-L2 regularization towards the original model)\n     - minimum risk training (MRT)\n\n  - scoring and decoding features:\n     - batch decoding\n     - n-best output\n     - scripts for scoring (given a parallel corpus) and rescoring (of n-best output)\n     - server mode\n\n  - other usability features:\n     - command line interface for training, 
scoring, and decoding\n     - JSON-formatted storage of model hyperparameters, vocabulary files, and training progress\n     - pretrained models for 13 translation directions (many of them top-performing at the WMT shared task of the respective year):\n       - http:\u002F\u002Fdata.statmt.org\u002Frsennrich\u002Fwmt16_systems\u002F\n       - http:\u002F\u002Fdata.statmt.org\u002Fwmt17_systems\u002F\n     - backward compatibility: publicly released models can still be used with the current codebase (scripts to convert Theano models to TensorFlow-style models are provided)\n\n\nSUPPORT\n-------\n\nFor general support requests, there is a Google Groups mailing list at https:\u002F\u002Fgroups.google.com\u002Fd\u002Fforum\u002Fnematus-support . You can also send an e-mail to nematus-support@googlegroups.com .\n\n\nINSTALLATION\n------------\n\nNematus requires the following packages:\n\n - Python 3 (tested on version 3.5.2)\n - TensorFlow 1.15 \u002F 2.X (tested on version 2.0)\n\nTo install TensorFlow, we recommend following the steps at:\n  ( https:\u002F\u002Fwww.tensorflow.org\u002Finstall\u002F )\n\nThe following packages are optional, but *highly* recommended:\n\n - CUDA >= 7 (only GPU training is sufficiently fast)\n - cuDNN >= 4 (speeds up training substantially)\n\n\nLEGACY THEANO VERSION\n---------------------\n\nNematus originated as a fork of dl4mt-tutorial by Kyunghyun Cho et al. 
( https:\u002F\u002Fgithub.com\u002Fnyu-dl\u002Fdl4mt-tutorial ), and was implemented in Theano.\nSee https:\u002F\u002Fgithub.com\u002FEdinburghNLP\u002Fnematus\u002Ftree\u002Ftheano for this Theano-based version of Nematus.\n\nTo use models trained with Theano with the current TensorFlow codebase, use the script `nematus\u002Ftheano_tf_convert.py`.\n\nDOCKER USAGE\n------------\n\nYou can also create a Docker image by running the following command, where you change `suffix` to either `cpu` or `gpu`:\n\n`docker build -t nematus-docker -f Dockerfile.suffix .`\n\nTo run a CPU Docker instance with the current working directory shared with the Docker container, execute:\n\n``docker run -v `pwd`:\u002Fplayground -it nematus-docker``\n\nFor GPU, you need to have nvidia-docker installed and run:\n\n``nvidia-docker run -v `pwd`:\u002Fplayground -it nematus-docker``\n\n\nTRAINING SPEED\n--------------\n\nTraining speed depends heavily on having appropriate hardware (ideally a recent NVIDIA GPU),\nand having installed the appropriate software packages.\n\nTo test your setup, we provide some speed benchmarks with `test\u002Ftest_train.sh`,\non an Intel Xeon CPU E5-2620 v4, with an NVIDIA GeForce GTX Titan X (Pascal) and CUDA 9.0:\n\n\nGPU, cuDNN 5.1, TensorFlow 1.0.1:\n\n  CUDA_VISIBLE_DEVICES=0 .\u002Ftest_train.sh\n\n>> 225.25 sentences\u002Fs\n\n\nUSAGE INSTRUCTIONS\n------------------\n\nAll of the scripts below can be run with the `--help` flag to get usage information.\n\nSample commands with toy examples are available in the `test` directory;\nfor training a full-scale RNN system, consider the training scripts at http:\u002F\u002Fdata.statmt.org\u002Fwmt17_systems\u002Ftraining\u002F\n\nAn updated version of these scripts that uses the Transformer model can be found at https:\u002F\u002Fgithub.com\u002FEdinburghNLP\u002Fwmt17-transformer-scripts\n\n#### `nematus\u002Ftrain.py` : use to train a new model\n\n#### data sets; model loading and saving\n| parameter | description 
|\n|---        |---          |\n| --source_dataset PATH | parallel training corpus (source) |\n| --target_dataset PATH | parallel training corpus (target) |\n| --dictionaries PATH [PATH ...] | network vocabularies (one per source factor, plus target vocabulary) |\n| --save_freq INT | save frequency (default: 30000) |\n| --model PATH | model file name (default: model) |\n| --reload PATH | load an existing model from this path. Set to \"latest_checkpoint\" to reload the latest checkpoint in the same directory as --model |\n| --no_reload_training_progress | don't reload training progress (only used if --reload is enabled) |\n| --summary_dir PATH | directory for saving summaries (default: same directory as the --model file) |\n| --summary_freq INT | save summaries every INT updates; if 0, do not save summaries (default: 0) |\n\n#### network parameters (all model types)\n| parameter | description |\n|---        |---          |\n| --model_type {rnn,transformer} | model type (default: rnn) |\n| --embedding_size INT | embedding layer size (default: 512) |\n| --state_size INT | hidden state size (default: 1000) |\n| --source_vocab_sizes INT [INT ...] | source vocabulary sizes (one per input factor) (default: None) |\n| --target_vocab_size INT | target vocabulary size (default: -1) |\n| --factors INT | number of input factors (default: 1) - CURRENTLY ONLY WORKS FOR 'rnn' MODEL |\n| --dim_per_factor INT [INT ...] | list of word vector dimensionalities (one per factor): '--dim_per_factor 250 200 50' for a total dimensionality of 500 (default: None) |\n| --tie_encoder_decoder_embeddings | tie the input embeddings of the encoder and the decoder (first factor only). 
Source and target vocabulary size must be the same |\n| --tie_decoder_embeddings | tie the input embeddings of the decoder with the softmax output embeddings |\n| --output_hidden_activation {tanh,relu,prelu,linear} | activation function in hidden layer of the output network (default: tanh) - CURRENTLY ONLY WORKS FOR 'rnn' MODEL |\n| --softmax_mixture_size INT | number of softmax components to use (default: 1) - CURRENTLY ONLY WORKS FOR 'rnn' MODEL |\n\n#### network parameters (rnn-specific)\n| parameter | description |\n|---        |---          |\n| --rnn_enc_depth INT | number of encoder layers (default: 1) |\n| --rnn_enc_transition_depth INT | number of GRU transition operations applied in the encoder. Minimum is 1. (Only applies to gru). (default: 1) |\n| --rnn_dec_depth INT | number of decoder layers (default: 1) |\n| --rnn_dec_base_transition_depth INT | number of GRU transition operations applied in the first layer of the decoder. Minimum is 2. (Only applies to gru_cond). (default: 2) |\n| --rnn_dec_high_transition_depth INT | number of GRU transition operations applied in the higher layers of the decoder. Minimum is 1. (Only applies to gru). 
(default: 1) |\n| --rnn_dec_deep_context | pass context vector (from first layer) to deep decoder layers |\n| --rnn_dropout_embedding FLOAT | dropout for input embeddings (0: no dropout) (default: 0.0) |\n| --rnn_dropout_hidden FLOAT | dropout for hidden layer (0: no dropout) (default: 0.0) |\n| --rnn_dropout_source FLOAT | dropout source words (0: no dropout) (default: 0.0) |\n| --rnn_dropout_target FLOAT | dropout target words (0: no dropout) (default: 0.0) |\n| --rnn_layer_normalisation | Set to use layer normalization in encoder and decoder |\n| --rnn_lexical_model | Enable feedforward lexical model (Nguyen and Chiang, 2018) |\n\n#### network parameters (transformer-specific)\n| parameter | description |\n|---        |---          |\n| --transformer_enc_depth INT | number of encoder layers (default: 6) |\n| --transformer_dec_depth INT | number of decoder layers (default: 6) |\n| --transformer_ffn_hidden_size INT | inner dimensionality of feed-forward sub-layers (default: 2048) |\n| --transformer_num_heads INT | number of attention heads used in multi-head attention (default: 8) |\n| --transformer_dropout_embeddings FLOAT | dropout applied to sums of word embeddings and positional encodings (default: 0.1) |\n| --transformer_dropout_residual FLOAT | dropout applied to residual connections (default: 0.1) |\n| --transformer_dropout_relu FLOAT | dropout applied to the internal activation of the feed-forward sub-layers (default: 0.1) |\n| --transformer_dropout_attn FLOAT | dropout applied to attention weights (default: 0.1) |\n| --transformer_drophead FLOAT | dropout of entire attention heads (default: 0.0) |\n\n#### training parameters\n| parameter | description |\n|---        |---          |\n| --loss_function {cross-entropy,per-token-cross-entropy, MRT} | loss function. 
MRT: Minimum Risk Training (https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FP\u002FP16\u002FP16-1159.pdf) (default: cross-entropy) |\n| --decay_c FLOAT | L2 regularization penalty (default: 0.0) |\n| --map_decay_c FLOAT | MAP-L2 regularization penalty towards the original weights (default: 0.0) |\n| --prior_model PATH | Prior model for MAP-L2 regularization. Unless \"--reload\" is used, this model is also used for initialization. |\n| --clip_c FLOAT | gradient clipping threshold (default: 1.0) |\n| --label_smoothing FLOAT | label smoothing (default: 0.0) |\n| --exponential_smoothing FLOAT | exponential smoothing factor; use 0 to disable (default: 0.0) |\n| --optimizer {adam} | optimizer (default: adam) |\n| --adam_beta1 FLOAT | exponential decay rate for the first moment estimates (default: 0.9) |\n| --adam_beta2 FLOAT | exponential decay rate for the second moment estimates (default: 0.999) |\n| --adam_epsilon FLOAT | constant for numerical stability (default: 1e-08) |\n| --learning_schedule {constant,transformer,warmup-plateau-decay} | learning schedule (default: constant) |\n| --learning_rate FLOAT | learning rate (default: 0.0001) |\n| --warmup_steps INT | number of initial updates during which the learning rate is increased linearly under learning rate scheduling (default: 8000) |\n| --plateau_steps INT | number of updates after warm-up before the learning rate starts to decay (applies to the 'warmup-plateau-decay' learning schedule only) (default: 0) |\n| --maxlen INT | maximum sequence length for training and validation (default: 100) |\n| --batch_size INT | minibatch size (default: 80) |\n| --token_batch_size INT | minibatch size (expressed in the number of source or target tokens). Sentence-level minibatch size will be dynamic. If this is enabled, batch_size only affects sorting by length. 
(default: 0) |\n| --max_sentences_per_device INT | maximum size of the minibatch subset to run on a single device, in number of sentences (default: 0) |\n| --max_tokens_per_device INT | maximum size of the minibatch subset to run on a single device, in number of tokens (source or target, whichever is higher) (default: 0) |\n| --gradient_aggregation_steps INT | number of times to accumulate gradients before aggregating and applying them; the minibatch is split between steps, so adding more steps allows larger minibatches to be used (default: 1) |\n| --maxibatch_size INT | size of maxibatch (number of minibatches that are sorted by length) (default: 20) |\n| --no_sort_by_length | do not sort sentences in maxibatch by length |\n| --no_shuffle | disable shuffling of training data (for each epoch) |\n| --keep_train_set_in_memory | keep the training dataset lines stored in RAM during training |\n| --max_epochs INT | maximum number of epochs (default: 5000) |\n| --finish_after INT | maximum number of updates (minibatches) (default: 10000000) |\n| --print_per_token_pro PATH | PATH to store the probability of each target token given the source sentences over the training dataset (no training is performed). If set to False, this function is not triggered (default: False). Note that the 1.0 values at the end of each list are the probabilities of padding and should be discarded. 
|\n\n#### minimum risk training parameters (MRT)\n\n| parameter | description |\n|---        |---          |\n| --mrt_reference | add the reference to the MRT candidate sentences (default: False) |\n| --mrt_alpha FLOAT | MRT alpha, which controls the sharpness of the distribution over the sampled subspace (default: 0.005) |\n| --samplesN INT | number of candidate sentences sampled per source sentence (default: 100) |\n| --mrt_loss | evaluation metric used to compute the loss between the candidate translation and the reference translation (default: SENTENCEBLEU n=4) |\n| --mrt_ml_mix FLOAT | mix the MLE objective into MRT training with this scaling factor (default: 0) |\n| --sample_way {beam_search, randomly_sample} | sampling strategy used to generate candidate sentences (default: beam_search) |\n| --max_len_a INT | generate candidate sentences with maximum length ax + b, where x is the length of the source sentence (default: 1.5) |\n| --max_len_b INT | generate candidate sentences with maximum length ax + b, where x is the length of the source sentence (default: 5) |\n| --max_sentences_of_sampling INT | maximum number of source sentences for which candidate sentences are generated at one time (limited by device memory capacity) (default: 0) |\n\n#### validation parameters\n| parameter | description |\n|---        |---          |\n| --valid_source_dataset PATH | source validation corpus (default: None) |\n| --valid_target_dataset PATH | target validation corpus (default: None) |\n| --valid_batch_size INT | validation minibatch size (default: 80) |\n| --valid_token_batch_size INT | validation minibatch size (expressed in the number of source or target tokens). Sentence-level minibatch size will be dynamic. If this is enabled, valid_batch_size only affects sorting by length. (default: 0) |\n| --valid_freq INT | validation frequency (default: 10000) |\n| --valid_script PATH | path to a script for external validation (default: None). 
The script will be passed an argument specifying the path of a file that contains translations of the source validation corpus. It must write a single score to standard output. |\n| --valid_bleu_source_dataset PATH | source validation corpus for external validation (default: None). If set to None, the dataset used for calculating the validation loss (valid_source_dataset) will be used |\n| --patience INT | early stopping patience (default: 10) |\n\n#### display parameters\n| parameter | description |\n|---        |---          |\n| --disp_freq INT | display the loss every INT updates (default: 1000) |\n| --sample_freq INT | display some samples every INT updates (default: 10000) |\n| --beam_freq INT | display some beam search samples every INT updates (default: 10000) |\n| --beam_size INT | size of the beam (default: 12) |\n\n#### translate parameters\n| parameter | description |\n|---        |---          |\n| --normalization_alpha [ALPHA] | normalize scores by sentence length (with argument, exponentiate lengths by ALPHA) |\n| --n_best | print the full beam |\n| --translation_maxlen INT | maximum length of the translation output sentence (default: 200) |\n| --translation_strategy {beam_search,sampling} | translation strategy, either beam_search or sampling (default: beam_search) |\n\n#### `nematus\u002Ftranslate.py` : use an existing model to translate a source text\n\n| parameter | description |\n|---        |---          |\n| -v, --verbose | verbose mode |\n| -m PATH [PATH ...], --models PATH [PATH ...] 
| model to use; provide multiple models (with the same vocabulary) for ensemble decoding |\n| -b INT, --minibatch_size INT | minibatch size (default: 80) |\n| -i PATH, --input PATH | input file (default: standard input) |\n| -o PATH, --output PATH | output file (default: standard output) |\n| -k INT, --beam_size INT | beam size (default: 5) |\n| -n [ALPHA], --normalization_alpha [ALPHA] | normalize scores by sentence length (with argument, exponentiate lengths by ALPHA) |\n| --n_best | write an n-best list (of size k) |\n| --maxibatch_size INT | size of maxibatch (number of minibatches that are sorted by length) (default: 20) |\n\n#### `nematus\u002Fscore.py` : use an existing model to score a parallel corpus\n\n| parameter | description |\n|---        |---          |\n| -v, --verbose | verbose mode |\n| -m PATH [PATH ...], --models PATH [PATH ...] | model to use; provide multiple models (with the same vocabulary) for ensemble decoding |\n| -b INT, --minibatch_size INT | minibatch size (default: 80) |\n| -n [ALPHA], --normalization_alpha [ALPHA] | normalize scores by sentence length (with argument, exponentiate lengths by ALPHA) |\n| -o PATH, --output PATH | output file (default: standard output) |\n| -s PATH, --source PATH | source text file |\n| -t PATH, --target PATH | target text file |\n\n\n#### `nematus\u002Frescore.py` : use an existing model to rescore an n-best list.\n\nThe n-best list is assumed to have the same format as Moses:\n\n    sentence-ID (starting from 0) ||| translation ||| scores\n\nNew scores will be appended to the end. 
`rescore.py` has the same arguments as `score.py`, with the exception of this additional parameter:\n\n| parameter             | description |\n|---                    |--- |\n| -i PATH, --input PATH | input n-best list file (default: standard input) |\n\n\n#### `nematus\u002Ftheano_tf_convert.py` : convert an existing Theano model to a TensorFlow model\n\nIf you have a Theano model (model.npz) with network architecture features that are currently\nsupported, then you can convert it into a TensorFlow model using `nematus\u002Ftheano_tf_convert.py`.\n\n| parameter | description |\n|---        |---          |\n| --from_theano | convert from Theano to TensorFlow format |\n| --from_tf | convert from TensorFlow to Theano format |\n| --in PATH | path to the input model |\n| --out PATH | path to the output model |\n\n\nPUBLICATIONS\n------------\n\nIf you use Nematus, please cite the following paper:\n\nRico Sennrich, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry and Maria Nadejde (2017): Nematus: a Toolkit for Neural Machine Translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, pp. 
65-68.\n\n```\n@InProceedings{sennrich-EtAl:2017:EACLDemo,\n  author    = {Sennrich, Rico  and  Firat, Orhan  and  Cho, Kyunghyun  and  Birch, Alexandra  and  Haddow, Barry  and  Hitschler, Julian  and  Junczys-Dowmunt, Marcin  and  L\\\"{a}ubli, Samuel  and  Miceli Barone, Antonio Valerio  and  Mokry, Jozef  and  Nadejde, Maria},\n  title     = {Nematus: a Toolkit for Neural Machine Translation},\n  booktitle = {Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics},\n  month     = {April},\n  year      = {2017},\n  address   = {Valencia, Spain},\n  publisher = {Association for Computational Linguistics},\n  pages     = {65--68},\n  url       = {http:\u002F\u002Faclweb.org\u002Fanthology\u002FE17-3017}\n}\n```\n\nThe code is based on the following models:\n\nDzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio (2015): Neural Machine Translation by Jointly Learning to Align and Translate, Proceedings of the International Conference on Learning Representations (ICLR).\n\nAshish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. 
Gomez, Lukasz Kaiser, Illia Polosukhin (2017): Attention is All You Need, Advances in Neural Information Processing Systems (NIPS).\n\nPlease refer to the Nematus paper for a description of implementation differences from the RNN model.\n\n\nACKNOWLEDGMENTS\n---------------\nThis project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements 645452 (QT21), 644333 (TraMOOC), 644402 (HimL) and 688139 (SUMMA).\n","NEMATUS\n-------\n\n基于注意力机制的编码器-解码器模型，用于神经机器翻译，使用 TensorFlow 构建。\n\n显著特性包括：\n\n  - 支持 RNN 和 Transformer 架构\n\n  - 支持高级 RNN 架构：\n     - [任意输入特征](doc\u002Ffactored_neural_machine_translation.md)（因子化神经机器翻译）http:\u002F\u002Fwww.statmt.org\u002Fwmt16\u002Fpdf\u002FW16-2209.pdf\n     - 深度模型（Miceli Barone 等，2017 年）https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.07631\n     - 所有层上的 Dropout（Gal，2015 年）http:\u002F\u002Farxiv.org\u002Fabs\u002F1512.05287\n     - 嵌入共享（Press 和 Wolf，2016 年）https:\u002F\u002Farxiv.org\u002Fabs\u002F1608.05859\n     - 层归一化（Ba 等，2016 年）https:\u002F\u002Farxiv.org\u002Fabs\u002F1607.06450\n     - Softmax 混合（Yang 等，2017 年）https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.03953\n     - 词汇模型（Nguyen 和 Chiang，2018 年）https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FN18-1031\n\n  - 支持高级 Transformer 架构：\n     - [任意输入特征](doc\u002Ffactored_neural_machine_translation.md)（因子化神经机器翻译）http:\u002F\u002Fwww.statmt.org\u002Fwmt16\u002Fpdf\u002FW16-2209.pdf\n     - DropHead：整个注意力头的 Dropout（Zhou 等，2020 年）https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.13342\n\n - 训练特性：\n     - 多 GPU 支持 [文档](doc\u002Fmulti_gpu_training.md)\n     - 标签平滑\n     - 带用户自定义停止条件的早停\n     - 恢复训练（可选地带有向原始模型的 MAP-L2 正则化）\n     - 最小风险训练（MRT）\n\n - 评分和解码特性：\n     - 批量解码\n     - n-best 输出\n     - 用于评分（给定平行语料库）和重新评分（n-best 输出）的脚本\n     - 服务器模式\n\n - 其他易用性特性：\n     - 用于训练、评分和解码的命令行界面\n     - 模型超参数、词汇表文件和训练进度的 JSON 格式存储\n     - 预训练模型适用于 13 种翻译方向（许多在相应年份的 WMT 共享任务中表现优异）：\n       - 
http:\u002F\u002Fdata.statmt.org\u002Frsennrich\u002Fwmt16_systems\u002F\n       - http:\u002F\u002Fdata.statmt.org\u002Fwmt17_systems\u002F\n     - 向后兼容：可以继续使用公开发布的模型与当前代码库一起使用（提供了从 Theano 转换到 TensorFlow 风格模型的脚本）\n\n\n支持\n-------\n\n对于一般的支持请求，有一个 Google Groups 邮件列表，地址是 https:\u002F\u002Fgroups.google.com\u002Fd\u002Fforum\u002Fnematus-support 。您也可以发送电子邮件至 nematus-support@googlegroups.com .\n\n\n安装\n------------\n\nNematus 需要以下软件包：\n\n - Python 3（已在版本 3.5.2 上测试过）\n - TensorFlow 1.15 \u002F 2.X（已在版本 2.0 上测试过）\n\n要安装 TensorFlow，我们建议按照以下步骤操作：\n  ( https:\u002F\u002Fwww.tensorflow.org\u002Finstall\u002F )\n\n以下软件包是可选的，但*强烈*推荐：\n\n - CUDA >= 7（只有 GPU 训练才足够快）\n - cuDNN >= 4（显著加快训练速度）\n\n\n旧版 Theano\n---------------------\n\nNematus 最初是 Kyunghyun Cho 等人 dl4mt-tutorial 的一个分支（ https:\u002F\u002Fgithub.com\u002Fnyu-dl\u002Fdl4mt-tutorial ），并使用 Theano 实现。\n有关这个基于 Theano 的 Nematus 版本，请参阅 https:\u002F\u002Fgithub.com\u002FEdinburghNLP\u002Fnematus\u002Ftree\u002Ftheano 。\n\n要将使用 Theano 训练的模型与当前的 TensorFlow 代码库一起使用，可以使用脚本 `nematus\u002Ftheano_tf_convert.py`。\n\nDocker 使用\n------------\n\n您也可以通过运行以下命令来创建 Docker 镜像，其中将 `suffix` 更改为 `cpu` 或 `gpu`：\n\n`docker build -t nematus-docker -f Dockerfile.suffix .`\n\n要运行一个 CPU Docker 实例，并将当前工作目录与 Docker 容器共享，可以执行：\n\n``docker run -v `pwd`:\u002Fplayground -it nematus-docker``\n\n对于 GPU，您需要安装 nvidia-docker，然后运行：\n\n``nvidia-docker run -v `pwd`:\u002Fplayground -it nematus-docker``\n\n\n训练速度\n--------------\n\n训练速度在很大程度上取决于是否拥有合适的硬件（理想情况下是较新的 NVIDIA GPU），以及是否安装了相应的软件包。\n\n为了测试您的设置，我们提供了一些速度基准测试，使用 `test\u002Ftest_train.sh`，在 Intel Xeon CPU E5-2620 v4 上，配备 Nvidia GeForce GTX Titan X（Pascal）和 CUDA 9.0：\n\n\nGPU，CuDNN 5.1，TensorFlow 1.0.1：\n\n  CUDA_VISIBLE_DEVICES=0 .\u002Ftest_train.sh\n\n>> 225.25 句子\u002F秒\n\n \n使用说明\n------------------\n\n以下所有脚本都可以使用 `--help` 标志来获取使用信息。\n\n`test` 目录中提供了带有玩具示例的命令；对于训练一个全规模的 RNN 系统，可以参考 http:\u002F\u002Fdata.statmt.org\u002Fwmt17_systems\u002Ftraining\u002F 中的训练脚本。\n\n这些脚本的更新版本，使用 Transformer 模型，可以在 
https:\u002F\u002Fgithub.com\u002FEdinburghNLP\u002Fwmt17-transformer-scripts 上找到。\n\n#### `nematus\u002Ftrain.py`：用于训练新模型\n\n#### 数据集；模型加载和保存\n| 参数 | 描述 |\n|---        |---          |\n| --source_dataset PATH | 平行训练语料库（源端） |\n| --target_dataset PATH | 平行训练语料库（目标端） |\n| --dictionaries PATH [PATH ...] | 网络词汇表（每个源因素一个，加上目标词汇表） |\n| --save_freq INT | 保存频率（默认：30000） |\n| --model PATH | 模型文件名（默认：model） |\n| --reload PATH | 从该路径加载现有模型。设置为“latest_checkpoint”以重新加载 --model 所在目录中的最新检查点 |\n| --no_reload_training_progress | 不重新加载训练进度（仅在启用 --reload 时使用） |\n| --summary_dir PATH | 用于保存摘要的目录（默认：与 --model 文件相同的目录） |\n| --summary_freq INT | 每 INT 次更新保存摘要，若为 0 则不保存摘要（默认：0） |\n\n#### 网络参数（所有模型类型）\n| 参数 | 描述 |\n|---        |---          |\n| --model_type {rnn,transformer} | 模型类型（默认：rnn） |\n| --embedding_size INT | 嵌入层大小（默认：512） |\n| --state_size INT | 隐藏状态大小（默认：1000） |\n| --source_vocab_sizes INT [INT ...] | 源端词汇表大小（每个输入因素一个）（默认：无） |\n| --target_vocab_size INT | 目标端词汇表大小（默认：-1） |\n| --factors INT | 输入因素的数量（默认：1）——目前仅适用于“rnn”模型 |\n| --dim_per_factor INT [INT ...] 
| 单词向量维度列表（每个因素一个）：“--dim_per_factor 250 200 50”表示总维度为 500（默认：无） |\n| --tie_encoder_decoder_embeddings | 将编码器和解码器的输入嵌入绑定在一起（仅第一个因素）。源端和目标端的词汇表大小必须相同 |\n| --tie_decoder_embeddings | 将解码器的输入嵌入与 softmax 输出嵌入绑定在一起 |\n| --output_hidden_activation {tanh,relu,prelu,linear} | 输出网络隐藏层的激活函数（默认：tanh）——目前仅适用于“rnn”模型 |\n| --softmax_mixture_size INT | 要使用的 softmax 组件数量（默认：1）——目前仅适用于“rnn”模型 |\n\n#### 网络参数（RNN 特定）\n| 参数 | 描述 |\n|---        |---          |\n| --rnn_enc_depth INT | 编码器层数（默认：1） |\n| --rnn_enc_transition_depth INT | 在编码器中应用的 GRU 转移操作次数。最小值为 1。（仅适用于 GRU）（默认：1） |\n| --rnn_dec_depth INT | 解码器层数（默认：1） |\n| --rnn_dec_base_transition_depth INT | 在解码器第一层中应用的 GRU 转移操作次数。最小值为 2。（仅适用于 gru_cond）（默认：2） |\n| --rnn_dec_high_transition_depth INT | 在解码器高层中应用的 GRU 转移操作次数。最小值为 1。（仅适用于 GRU）（默认：1） |\n| --rnn_dec_deep_context | 将上下文向量（来自第一层）传递到深层解码器层 |\n| --rnn_dropout_embedding FLOAT | 输入嵌入的 dropout 概率（0：不使用 dropout）（默认：0.0） |\n| --rnn_dropout_hidden FLOAT | 隐藏层的 dropout 概率（0：不使用 dropout）（默认：0.0） |\n| --rnn_dropout_source FLOAT | 源端词的 dropout 概率（0：不使用 dropout）（默认：0.0） |\n| --rnn_dropout_target FLOAT | 目标端词的 dropout 概率（0：不使用 dropout）（默认：0.0） |\n| --rnn_layer_normalisation | 设置在编码器和解码器中使用层归一化 |\n| --rnn_lexical_model | 启用前馈词汇模型（Nguyen 和 Chiang，2018） |\n\n#### 网络参数（Transformer 特定）\n| 参数 | 描述 |\n|---        |---          |\n| --transformer_enc_depth INT | 编码器层数（默认：6） |\n| --transformer_dec_depth INT | 解码器层数（默认：6） |\n| --transformer_ffn_hidden_size INT | 前馈子层的内部维度（默认：2048） |\n| --transformer_num_heads INT | 多头注意力机制中使用的注意力头数（默认：8） |\n| --transformer_dropout_embeddings FLOAT | 应用于词嵌入和位置编码之和的 dropout 概率（默认：0.1） |\n| --transformer_dropout_residual FLOAT | 应用于残差连接的 dropout 概率（默认：0.1） |\n| --transformer_dropout_relu FLOAT | 应用于前馈子层内部激活的 dropout 概率（默认：0.1） |\n| --transformer_dropout_attn FLOAT | 应用于注意力权重的 dropout 概率（默认：0.1） |\n| --transformer_drophead FLOAT | 整个注意力头的 dropout 概率（默认：0.0） |\n\n#### 训练参数\n| 参数 | 描述 |\n|---        |---          |\n| --loss_function {cross-entropy,per-token-cross-entropy, 
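MRT} | Loss function. MRT: Minimum Risk Training (https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FP\u002FP16\u002FP16-1159.pdf) (default: cross-entropy) |\n| --decay_c FLOAT | L2 regularization penalty (default: 0.0) |\n| --map_decay_c FLOAT | MAP-L2 regularization penalty towards the original weights (default: 0.0) |\n| --prior_model PATH | Prior model for MAP-L2 regularization. Unless '--reload' is used, this model is also used for initialization. |\n| --clip_c FLOAT | Gradient clipping threshold (default: 1.0) |\n| --label_smoothing FLOAT | Label smoothing (default: 0.0) |\n| --exponential_smoothing FLOAT | Exponential smoothing factor; use 0 to disable (default: 0.0) |\n| --optimizer {adam} | Optimizer (default: adam) |\n| --adam_beta1 FLOAT | Exponential decay rate for the first moment estimates (default: 0.9) |\n| --adam_beta2 FLOAT | Exponential decay rate for the second moment estimates (default: 0.999) |\n| --adam_epsilon FLOAT | Constant for numerical stability (default: 1e-08) |\n| --learning_schedule {constant,transformer,warmup-plateau-decay} | Learning rate schedule (default: constant) |\n| --learning_rate FLOAT | Learning rate (default: 0.0001) |\n| --warmup_steps INT | Number of initial updates during which the learning rate is increased linearly under learning rate scheduling (default: 8000) |\n| --plateau_steps INT | Number of updates after warm-up before the learning rate starts to decay. Applies to the 'warmup-plateau-decay' learning schedule only. (default: 0) |\n| --maxlen INT | Maximum sequence length for training and validation (default: 100) |\n| --batch_size INT | Minibatch size (default: 80) |\n| --token_batch_size INT | Minibatch size expressed in number of source or target tokens. The sentence-level minibatch size will be dynamic. If this is enabled, batch_size only affects sorting by length. (default: 0) |\n| --max_sentences_per_device INT | Maximum size, in sentences, of the minibatch subset run on a single device (default: 0) |\n| --max_tokens_per_device INT | Maximum size, in tokens (source or target, whichever is higher), of the minibatch subset run on a single device (default: 0) |\n| --gradient_aggregation_steps INT | Number of times to accumulate gradients before aggregating and applying them. The minibatch is split across the steps, so adding more steps allows larger minibatches to be used. (default: 1) |\n| --maxibatch_size INT | Size of the maxibatch (number of minibatches that are sorted by length) (default: 20) |\n| --no_sort_by_length | Do not sort sentences in the maxibatch by length |\n| --no_shuffle | Disable shuffling of the training data (for each epoch) |\n| --keep_train_set_in_memory | Keep the training data lines in memory during training |\n| --max_epochs INT | Maximum number of epochs (default: 5000) |\n| --finish_after INT | Maximum number of updates (minibatches) (default: 10000000) |\n| --print_per_token_pro PATH | Path for storing the probability of each target token in the training dataset, given the source sentence (no training is performed). If set to False, this function is not triggered. (default: False). Remove the 1.0 at the end of each list; it is the probability of the padding. |\n\n#### Minimum risk training parameters (MRT)\n\n| Parameter | Description |\n|---        |---          |\n| --mrt_reference | Add the reference translation to the MRT candidate sentences (default: False) |\n| --mrt_alpha FLOAT | MRT alpha, which controls the sharpness of the distribution over the sampled subspace (default: 0.005) |\n| --samplesN INT | Number of candidate sentences sampled per source sentence (default: 100) |\n| --mrt_loss | Evaluation metric used to compute the loss between a candidate translation and the reference translation (default: SENTENCEBLEU n=4) 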
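|\n| --mrt_ml_mix FLOAT | Mix the MLE objective into MRT training with this scaling factor (default: 0) |\n| --sample_way {beam_search, randomly_sample} | Sampling strategy used to generate candidate sentences (default: beam_search) |\n| --max_len_a INT | Candidate sentences are generated with a maximum length of ax + b, where x is the source sentence length (default: 1.5) |\n| --max_len_b INT | Candidate sentences are generated with a maximum length of ax + b, where x is the source sentence length (default: 5) |\n| --max_sentences_of_sampling INT | Maximum number of source sentences for which candidate sentences are generated at one time (limited by device memory) (default: 0) |\n\n#### Validation parameters\n| Parameter | Description |\n|---------------------|--------------------------------------------------------------|\n| --valid_source_dataset PATH | Source validation corpus (default: None) |\n| --valid_target_dataset PATH | Target validation corpus (default: None) |\n| --valid_batch_size INT | Validation minibatch size (default: 80) |\n| --valid_token_batch_size INT | Validation minibatch size expressed in number of source or target tokens. The sentence-level minibatch size will be dynamic. If this is enabled, valid_batch_size only affects sorting by length. (default: 0) |\n| --valid_freq INT | Validation frequency (default: 10000) |\n| --valid_script PATH | Path to a script for external validation (default: None). The script is passed one argument, the path of a file containing translations of the source validation corpus, and must write a single score to standard output. |\n| --valid_bleu_source_dataset PATH | Source validation corpus for external BLEU evaluation (default: None). If set to None, the corpus used to compute the validation loss (valid_source_dataset) is used. |\n| --patience INT | Early stopping patience (default: 10) |\n\n#### Display parameters\n| Parameter | Description |\n|---------------------|--------------------------------------------------------------|\n| --disp_freq INT | Display the loss every INT updates (default: 1000) |\n| --sample_freq INT | Display some samples every INT updates (default: 10000) |\n| --beam_freq INT | Display some beam search samples every INT updates (default: 10000) |\n| --beam_size INT | Beam size (default: 12) |\n\n#### Translation parameters\n| Parameter | Description |\n|---------------------|--------------------------------------------------------------|\n| --normalization_alpha [ALPHA] | Normalize scores by sentence length (with an argument, exponentiate lengths by ALPHA) 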
|\n| --n_best | Print the full beam |\n| --translation_maxlen INT | Maximum length of a translated output sentence (default: 200) |\n| --translation_strategy {beam_search,sampling} | Translation strategy, either beam_search or sampling (default: beam_search) |\n\n#### `nematus\u002Ftranslate.py`: translate source text with an existing model\n\n| Parameter | Description |\n|---------------------|--------------------------------------------------------------|\n| -v, --verbose | Verbose mode |\n| -m PATH [PATH ...], --models PATH [PATH ...] | Model(s) to use; provide multiple models with the same vocabulary for ensemble decoding |\n| -b INT, --minibatch_size INT | Minibatch size (default: 80) |\n| -i PATH, --input PATH | Input file (default: standard input) |\n| -o PATH, --output PATH | Output file (default: standard output) |\n| -k INT, --beam_size INT | Beam size (default: 5) |\n| -n [ALPHA], --normalization_alpha [ALPHA] | Normalize scores by sentence length (with an argument, exponentiate lengths by ALPHA) |\n| --n_best | Write an n-best list (of size k) |\n| --maxibatch_size INT | Size of the maxibatch (number of minibatches that are sorted by length) (default: 20) |\n\n#### `nematus\u002Fscore.py`: score a parallel corpus with an existing model\n\n| Parameter | Description |\n|---------------------|--------------------------------------------------------------|\n| -v, --verbose | Verbose mode |\n| -m PATH [PATH ...], --models PATH [PATH ...] 
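| Model(s) to use; provide multiple models with the same vocabulary for ensemble scoring |\n| -b INT, --minibatch_size INT | Minibatch size (default: 80) |\n| -n [ALPHA], --normalization_alpha [ALPHA] | Normalize scores by sentence length (with an argument, exponentiate lengths by ALPHA) |\n| -o PATH, --output PATH | Output file (default: standard output) |\n| -s PATH, --source PATH | Source text file |\n| -t PATH, --target PATH | Target text file |\n\n\n#### `nematus\u002Frescore.py`: rescore an n-best list with an existing model\n\nThe n-best list is assumed to be in the same format as Moses:\n\n    sentence-ID (starting from 0) ||| translation ||| scores\n\nThe new scores are appended to the end. `rescore.py` takes the same arguments as `score.py`, plus one additional parameter:\n\n| Parameter | Description |\n|---------------------|--------------------------------------------------------------|\n| -i PATH, --input PATH | Input n-best list file (default: standard input) |\n\n\n#### `nematus\u002Ftheano_tf_convert.py`: convert an existing Theano model to a TensorFlow model\n\nIf you have a Theano model (model.npz) whose network architecture features are currently supported, you can convert it into a TensorFlow model with `nematus\u002Ftheano_tf_convert.py`.\n\n| Parameter | Description |\n|---------------------|--------------------------------------------------------------|\n| --from_theano | Convert from Theano format to TensorFlow format |\n| --from_tf | Convert from TensorFlow format to Theano format |\n| --in PATH | Path to the input model |\n| --out PATH | Path to the output model |\n\n\nPublications\n------------\n\nIf you use Nematus, please cite the following paper:\n\nRico Sennrich, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry, and Maria Nadejde (2017): Nematus: a Toolkit for Neural Machine Translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, pp. 65–68.\n\n```\n@InProceedings{sennrich-EtAl:2017:EACLDemo,\n  author    = {Sennrich, Rico  and  Firat, Orhan  and  Cho, 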
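Kyunghyun  and  Birch, Alexandra  and  Haddow, Barry  and  Hitschler, Julian  and  Junczys-Dowmunt, Marcin  and  L\\\"{a}ubli, Samuel  and  Miceli Barone, Antonio Valerio  and  Mokry, Jozef  and  Nadejde, Maria},\n  title     = {Nematus: a Toolkit for Neural Machine Translation},\n  booktitle = {Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics},\n  month     = {April},\n  year      = {2017},\n  address   = {Valencia, Spain},\n  publisher = {Association for Computational Linguistics},\n  pages     = {65--68},\n  url       = {http:\u002F\u002Faclweb.org\u002Fanthology\u002FE17-3017}\n}\n```\n\nThe code is based on the following models:\n\nDzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio (2015): Neural Machine Translation by Jointly Learning to Align and Translate. Proceedings of the International Conference on Learning Representations (ICLR).\n\nAshish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin (2017): Attention is All You Need. Advances in Neural Information Processing Systems (NIPS).\n\nPlease refer to the Nematus paper for a description of the implementation differences with respect to the RNN model.\n\n\nAcknowledgments\n---------------\nThis project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements 645452 (QT21), 644333 (TraMOOC), 644402 (HimL), and 688139 (SUMMA).","# Nematus Quick Start Guide\n\nNematus is a neural machine translation (NMT) toolkit built on TensorFlow. It supports both RNN and Transformer architectures and offers multi-GPU training, advanced regularization, and a range of decoding features.\n\n## Environment setup\n\nBefore you begin, make sure your system meets the following requirements:\n\n*   **Operating system**: Linux or macOS (on Windows, run it via WSL or Docker)\n*   **Python**: Python 3.5 or later (3.6+ recommended)\n*   **Deep learning framework**: TensorFlow 1.15 or TensorFlow 2.x (the officially tested version is 2.0)\n*   **Hardware acceleration (strongly recommended)**:\n    *   NVIDIA GPU (recent models work best)\n    *   CUDA >= 7.0\n    *   cuDNN >= 4.0\n    *   *Note: CPU-only training is slow; a GPU is recommended for production use.*\n\n## Installation\n\n### 1. Install dependencies\nFirst upgrade pip, then install TensorFlow.\n\n```bash\npip install --upgrade pip\npip install tensorflow\n# For GPU support, install tensorflow-gpu (TF 1.x), or plain tensorflow (TF 2.x+), which already includes GPU support\n```\n\n> **Tip for users in mainland China**: if downloads are slow, you can use the Tsinghua or Alibaba mirror:\n> `pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple tensorflow`\n\n### 2. 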
Get the source code\nClone the Nematus repository:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FEdinburghNLP\u002Fnematus.git\ncd nematus\n```\n\n### 3. (Optional) Docker deployment\nTo avoid conflicts with your local environment, you can build the environment quickly with Docker.\n\n**Build the image:**\n```bash\n# Dockerfile.gpu builds the GPU image; use Dockerfile.cpu for a CPU-only image\ndocker build -t nematus-docker -f Dockerfile.gpu .\n```\n\n**Run the container:**\n```bash\n# GPU example (requires nvidia-docker)\nnvidia-docker run -v $(pwd):\u002Fplayground -it nematus-docker\n```\n\n## Basic usage\n\nNematus's core functionality is invoked through command-line scripts. The following is the minimal workflow for training a new model.\n\n### 1. Prepare the data\nYou need a parallel corpus for the source and target languages (one sentence per line), along with the corresponding vocabulary files.\n\n### 2. Train a model\nStart training with `nematus\u002Ftrain.py`. The following is a minimal training example using the Transformer architecture:\n\n```bash\npython nematus\u002Ftrain.py \\\n    --model_type transformer \\\n    --source_dataset data\u002Ftrain.src \\\n    --target_dataset data\u002Ftrain.trg \\\n    --dictionaries data\u002Fvocab.src.json data\u002Fvocab.trg.json \\\n    --model models\u002Fmy_first_model \\\n    --batch_size 80 \\\n    --maxlen 100 \\\n    --transformer_enc_depth 6 \\\n    --transformer_dec_depth 6 \\\n    --transformer_num_heads 8 \\\n    --learning_rate 0.0001\n```\n\n**Key parameters:**\n*   `--source_dataset` \u002F `--target_dataset`: paths to the parallel corpus.\n*   `--dictionaries`: paths to the vocabulary files (JSON format).\n*   `--model`: path prefix under which the model is saved.\n*   `--model_type`: choose `rnn` or `transformer`.\n*   `--reload`: to resume training from a checkpoint, add this option and point it to an existing model path (for example, `--reload latest_checkpoint`).\n\n### 3. Decoding and translation\nAfter training, translate with `nematus\u002Ftranslate.py`:\n\n```bash\npython nematus\u002Ftranslate.py \\\n    --models models\u002Fmy_first_model \\\n    --input data\u002Ftest.src \\\n    --output data\u002Ftest.pred \\\n    --beam_size 5\n```\n\n### 4. 
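View help\nEvery script accepts `--help`, which shows detailed usage and the full list of parameters:\n\n```bash\npython nematus\u002Ftrain.py --help\npython nematus\u002Ftranslate.py --help\n```","A cross-border e-commerce engineering team needs to build an automatic product-description translation system that supports multiple languages (e.g., English to French and English to German), to keep up with the rapid listing of a huge number of SKUs.\n\n### Without nematus\n- **Hard to iterate on the architecture**: upgrading from a traditional RNN model to the more efficient Transformer architecture often means rewriting large amounts of low-level code, stretching development cycles to several weeks.\n- **Inefficient training**: without native multi-GPU training support, training on a large parallel corpus on a single GPU takes a very long time, and techniques such as mixed precision or advanced regularization cannot be used to speed up convergence.\n- **Limited feature extensibility**: it is hard to add extra input features such as part-of-speech tags (factored NMT), so translations of complex grammatical structures are stilted and accuracy plateaus.\n- **Cumbersome deployment and maintenance**: without a unified command-line interface and server mode, every model update requires hand-writing complex inference scripts, which makes an automated pipeline hard to achieve.\n\n### With nematus\n- **Seamless architecture switching**: because nematus supports both RNN and Transformer models, the team can switch between the two architectures and compare them just by changing the configuration, cutting the time to ship a new model to a few days.\n- **Much faster training**: with the built-in multi-GPU parallel training and advanced dropout strategies such as DropHead, training is several times faster, and early stopping automatically guards against overfitting.\n- **Higher model accuracy**: support for arbitrary input features and lexical model optimizations is easy to enable, noticeably improving the fluency of long, complex sentences and domain terminology, with clearly higher WMT benchmark scores.\n- **Easy engineering deployment**: with the bundled command-line tools and server mode, the team quickly set up a high-concurrency translation service and fine-tuned the official pre-trained models directly, greatly reducing operating costs.\n\nnematus lets the team avoid reinventing the wheel and obtain industrial-grade, flexibly customizable, high-performance neural machine translation at low cost.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEdinburghNLP_nematus_e1631f52.png","EdinburghNLP","Edinburgh NLP","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FEdinburghNLP_1c2aec58.png","The Natural Language Processing Group at the University of Edinburgh",null,"http:\u002F\u002Fgroups.inf.ed.ac.uk\u002Fedinburghnlp\u002F","https:\u002F\u002Fgithub.com\u002FEdinburghNLP",[84,88,92,96,99,102,106,110,113,116],{"name":85,"color":86,"percentage":87},"Python","#3572A5",84,{"name":89,"color":90,"percentage":91},"Perl","#0298c3",5.4,{"name":93,"color":94,"percentage":95},"Emacs Lisp","#c065db",3.1,{"name":97,"color":98,"percentage":95},"JavaScript","#f1e05a",{"name":100,"color":101,"percentage":10},"Hack","#878787",{"name":103,"color":104,"percentage":105},"Shell","#89e051",1.3,{"name":107,"color":108,"percentage":109},"Smalltalk","#596706",0.3,{"name":111,"color":112,"percentage":109},"Ruby","#701516",{"name":114,"color":115,"percentage":109},"NewLisp","#87AED7",{"name":117,"color":118,"percentage":119},"Slash","#007eff",0.1,802,265,"2026-02-04T23:17:23","BSD-3-Clause","Not specified 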
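(Linux is usually supported; Docker is available)","Not required, but strongly recommended for training. Requires an NVIDIA GPU (tested on a GTX Titan X) with CUDA >= 7 and cuDNN >= 4 installed (tested with CUDA 9.0). CPU-only training is slow.","Not specified",{"notes":128,"python":129,"dependencies":130},"The tool is built primarily on TensorFlow. Although it can run on CPU, the documentation states explicitly that only GPU training is fast enough. Docker image build scripts are provided to simplify environment setup (CPU and GPU versions). Models trained with the legacy Theano version can be converted to TensorFlow format with the provided script.","3.5+ (tested with 3.5.2)",[131,132,133,134],"TensorFlow 1.15 or 2.X (tested with 2.0)","CUDA >= 7 (optional but recommended)","cuDNN >= 4 (optional but recommended)","nvidia-docker (if using GPU Docker)",[15],[137,138,139,140,141],"neural-machine-translation","sequence-to-sequence","machine-translation","nmt","mt","2026-03-27T02:49:30.150509","2026-04-10T22:19:41.293699",[145,150,155,160,165,169,174],{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},28371,"What causes the 'KeyError: n_words_src' error when running score.py?","This error usually occurs because the loaded model files are incomplete or in a mismatched format. When running score.py, make sure the --models argument correctly points to all required model files (including .npz, .json, .meta, etc.) and that these files come from the same training checkpoint. If model files from different versions are mixed, or the configuration file that contains the vocabulary sizes is missing, the key 'n_words_src' cannot be found.","https:\u002F\u002Fgithub.com\u002FEdinburghNLP\u002Fnematus\u002Fissues\u002F79",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},28372,"Is decoding in the Transformer model sequential or parallel? How do the attention matrices differ between training and inference?","During training, the Transformer processes the sentence in parallel (similar to a feed-forward network), producing a single attention matrix for the whole sentence at once (e.g., 16x16). During inference (decoding), each step depends on the words already generated, so the process is sequential and proceeds step by step (e.g., 16 steps that produce 1x1, 2x2, ... 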
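matrices, respectively). The 16 sequential steps at inference are equivalent to the single large matrix produced in one parallel step during training, not to 16 large matrices.","https:\u002F\u002Fgithub.com\u002FEdinburghNLP\u002Fnematus\u002Fissues\u002F119",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},28373,"Why are the samples during training fine, while translations from translate.py are poor (e.g., empty output or repeated words)?","This is usually because sampling during training (stochastic sampling) and translation (beam search) work differently. There was a known issue in the code: with model ensembles, ancestral sampling used only the last model, whereas beam search supports ensembles. If you are not using an ensemble, check whether the model has failed to converge because of too few training iterations, or try adjusting the beam size. Also confirm that the translation script loads the same model configuration that was used in training.","https:\u002F\u002Fgithub.com\u002FEdinburghNLP\u002Fnematus\u002Fissues\u002F27",{"id":161,"question_zh":162,"answer_zh":163,"source_url":164},28374,"When translate.py processes a large amount of data, there is no log output for a long time and the output file stays empty. Is this normal?","This is expected, although the user experience is poor. By default, Nematus may print its log (e.g., 'INFO: Translated xxxx sents') only after all batches have been translated, so the run appears unresponsive in the meantime. For large-scale data (e.g., 10 million sentences), monitor the process's resource usage to confirm it has not hung. If you need real-time progress, you may need to modify the source code to add intermediate log output, or submit it as a feature request.","https:\u002F\u002Fgithub.com\u002FEdinburghNLP\u002Fnematus\u002Fissues\u002F84",{"id":166,"question_zh":167,"answer_zh":168,"source_url":164},28375,"Can a model trained with the Theano version of Nematus be loaded directly in the TensorFlow version of Nematus or in Marian?","Not directly. The TensorFlow version of Nematus is not backward compatible with the Theano version. To convert a model from Theano format to TensorFlow format, use the provided conversion script theano_tf_convert.py. Likewise, the Marian toolkit does not fully support every model trained with Nematus; some older models (such as those from the WMT17 era) may require special handling or may not be directly compatible.",{"id":170,"question_zh":171,"answer_zh":172,"source_url":173},28376,"What should I do about 'TypeError: float() argument must be a string or a number' when running score.py?","This error usually occurs in theano_util.py while model parameters are being loaded, because a data type conversion fails. Possible causes are a corrupted model file, a Theano floatX setting that does not match the precision the model was saved with, or a numpy version incompatibility. Possible fixes: 1. Check the THEANO_FLAGS environment variable and make sure floatX=float32 (or float64, matching the model); 2. Try re-saving or converting the model file; 3. 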
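Make sure the Nematus code version you use matches the version used to train the model, to avoid loading errors caused by interface changes.","https:\u002F\u002Fgithub.com\u002FEdinburghNLP\u002Fnematus\u002Fissues\u002F48",{"id":175,"question_zh":176,"answer_zh":177,"source_url":154},28377,"At inference, the Transformer decoder receives only the previous word at each step, yet in the code the input appears to be the whole sentence. How should this be understood?","At inference, the code may pass a placeholder for the whole sequence, but a masking mechanism ensures that when predicting the t-th word the decoder can only see words 0 to t-1. During training, the complete target sentence is fed in at once and the loss is computed with masking; at inference, generation is step by step, with each step using the previously generated output as input. This mechanism correctly implements the autoregressive property while still exploiting efficient matrix operations.",[]]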