[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-andrewt3000--DL4NLP":3,"tool-andrewt3000--DL4NLP":65},[4,17,27,35,48,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",153609,2,"2026-04-13T11:34:59",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,3,"2026-04-06T11:19:32",[15,26,14,13],"图像",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":10,"last_commit_at":33,"category_tags":34,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 
都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85092,"2026-04-10T11:13:16",[26,43,44,45,14,46,15,13,47],"数据工具","视频","插件","其他","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":54,"last_commit_at":55,"category_tags":56,"status":16},5784,"funNLP","fighting41love\u002FfunNLP","funNLP 是一个专为中文自然语言处理（NLP）打造的超级资源库，被誉为\"NLP 民工的乐园”。它并非单一的软件工具，而是一个汇集了海量开源项目、数据集、预训练模型和实用代码的综合性平台。\n\n面对中文 NLP 领域资源分散、入门门槛高以及特定场景数据匮乏的痛点，funNLP 提供了“一站式”解决方案。这里不仅涵盖了分词、命名实体识别、情感分析、文本摘要等基础任务的标准工具，还独特地收录了丰富的垂直领域资源，如法律、医疗、金融行业的专用词库与数据集，甚至包含古诗词生成、歌词创作等趣味应用。其核心亮点在于极高的全面性与实用性，从基础的字典词典到前沿的 BERT、GPT-2 模型代码，再到高质量的标注数据和竞赛方案，应有尽有。\n\n无论是刚刚踏入 NLP 领域的学生、需要快速验证想法的算法工程师，还是从事人工智能研究的学者，都能在这里找到急需的“武器弹药”。对于开发者而言，它能大幅减少寻找数据和复现模型的时间；对于研究者，它提供了丰富的基准测试资源和前沿技术参考。funNLP 以开放共享的精神，极大地降低了中文自然语言处理的开发与研究成本，是中文 AI 
社区不可或缺的宝藏仓库。",79857,1,"2026-04-08T20:11:31",[15,43,46],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":54,"last_commit_at":63,"category_tags":64,"status":16},6590,"gpt4all","nomic-ai\u002Fgpt4all","GPT4All 是一款让普通电脑也能轻松运行大型语言模型（LLM）的开源工具。它的核心目标是打破算力壁垒，让用户无需依赖昂贵的显卡（GPU）或云端 API，即可在普通的笔记本电脑和台式机上私密、离线地部署和使用大模型。\n\n对于担心数据隐私、希望完全掌控本地数据的企业用户、研究人员以及技术爱好者来说，GPT4All 提供了理想的解决方案。它解决了传统大模型必须联网调用或需要高端硬件才能运行的痛点，让日常设备也能成为强大的 AI 助手。无论是希望构建本地知识库的开发者，还是单纯想体验私有化 AI 聊天的普通用户，都能从中受益。\n\n技术上，GPT4All 基于高效的 `llama.cpp` 后端，支持多种主流模型架构（包括最新的 DeepSeek R1 蒸馏模型），并采用 GGUF 格式优化推理速度。它不仅提供界面友好的桌面客户端，支持 Windows、macOS 和 Linux 等多平台一键安装，还为开发者提供了便捷的 Python 库，可轻松集成到 LangChain 等生态中。通过简单的下载和配置，用户即可立即开始探索本地大模型的无限可能。",77307,"2026-04-11T06:52:37",[15,13],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":81,"owner_email":79,"owner_twitter":79,"owner_website":79,"owner_url":82,"languages":79,"stars":83,"forks":84,"last_commit_at":85,"license":79,"difficulty_score":86,"env_os":87,"env_gpu":88,"env_ram":88,"env_deps":89,"category_tags":92,"github_topics":79,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":93,"updated_at":94,"faqs":95,"releases":96},7190,"andrewt3000\u002FDL4NLP","DL4NLP","Deep Learning for NLP resources","DL4NLP 是一个专注于自然语言处理（NLP）领域的深度学习资源合集，旨在为机器翻译、图像描述生成及对话系统等序列建模任务提供前沿的学习资料。它主要解决了初学者和研究者在进入该领域时面临的资源分散、难以系统获取经典论文与优质课程的问题。\n\n这份资源库非常适合 AI 研究人员、开发者以及希望深入理解 NLP 技术的学生使用。其独特亮点在于不仅汇集了斯坦福大学 CS224D\u002FCS224N 和牛津大学等顶尖高校的课程视频与讲义，还系统梳理了词向量（Word Vectors）技术的演进脉络。从 Bengio 的奠基之作，到 Mikolov 提出的 Word2Vec（涵盖 CBOW 与 Skip-gram 模型及其优化技巧），再到 GloVe 全局向量表示，DL4NLP 将复杂的理论概念与代码实现指南有机结合。通过整合这些高质量内容，它帮助用户快速掌握神经网络在语言处理中的核心应用，是构建扎实理论基础与实践能力的理想起点。","# Deep Learning for NLP resources\n\nState of the art resources for NLP 
sequence modeling tasks such as machine translation, image captioning, and dialog.\n\n[My notes on neural networks, RNN, LSTM](https:\u002F\u002Fgithub.com\u002Fandrewt3000\u002FMachineLearning\u002Fblob\u002Fmaster\u002FneuralNets.md)  \n\n## Deep Learning for NLP \n[Stanford CS 224D: Deep Learning for NLP class](http:\u002F\u002Fcs224d.stanford.edu\u002Fsyllabus.html)  \n[Richard Socher](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=FaOcyfMAAAAJ&hl=en). Class with syllabus and slides.  \nVideos: [2017 lectures](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6)  \nCS224N [Winter 2019 lectures](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8rXD5-xhemo&list=PLoROMvodv4rOhcuXMZkNm7j3fVwBBY42z)  \n\n[Oxford Deep Learning for NLP class](http:\u002F\u002Fwww.cs.ox.ac.uk\u002Fteaching\u002Fcourses\u002F2016-2017\u002Fdl\u002F)  \n[Phil Blunsom](https:\u002F\u002Fscholar.google.co.uk\u002Fcitations?user=eJwbbXEAAAAJ&hl=en). (2017) Class by the DeepMind NLP Group.   \nLecture slides, videos, and practicals: [GitHub repository](https:\u002F\u002Fgithub.com\u002Foxford-cs-deepnlp-2017)  \n[2017 videos](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL613dYIGMXoZBtZhbyiBqb0QtgK6oJbpm)  \n\n[A Primer on Neural Network Models for Natural Language Processing](https:\u002F\u002Fwww.jair.org\u002Fmedia\u002F4992\u002Flive-4992-9623-jair.pdf)  \nYoav Goldberg. Submitted 9\u002F2015, published 11\u002F16. A 75-page summary of the state of the art.  \n\n## Word Vectors\nResources about word vectors, aka word embeddings or distributed representations of words.  \nWord vectors are numeric representations of words in which similar words have similar vectors. Word vectors are often used as input to deep learning systems. This process is sometimes called pretraining. \n\n[A neural probabilistic language model.](http:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F1839-a-neural-probabilistic-language-model.pdf)  \nBengio 2003. 
Seminal paper on word vectors.  \n\n___\n[Efficient Estimation of Word Representations in Vector Space](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1301.3781v3.pdf)  \nMikolov et al. 2013. Word2Vec generates word vectors in an unsupervised way by attempting to predict words from a corpus. Describes Continuous Bag-of-Words (CBOW) and Continuous Skip-gram models for learning word vectors.  \nSkip-gram takes the center word and predicts the surrounding (outside) words. Skip-gram is better for large datasets.  \nCBOW takes the surrounding (outside) words and predicts the center word. CBOW is better for smaller datasets.  \n\n[Distributed Representations of Words and Phrases and their Compositionality](http:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)  \nMikolov et al. 2013. Learns vectors for phrases such as \"New York Times.\" Includes optimizations for skip-gram: hierarchical softmax and negative sampling. Also subsamples frequent words (i.e. occurrences of frequent words like \"the\" are randomly discarded during training to speed things up and improve the vectors for less frequent words).  \n\n[Linguistic Regularities in Continuous Space Word Representations](http:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FN13-1090)  \n[Mikolov](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=oBu8kMMAAAAJ&hl=en) et al. 2013. Performs well on word similarity and analogy tasks.  
Expands on the famous example: King – Man + Woman = Queen  \n[Word2Vec source code](https:\u002F\u002Fcode.google.com\u002Fp\u002Fword2vec\u002F)  \n[Word2Vec tutorial](http:\u002F\u002Ftensorflow.org\u002Ftutorials\u002Fword2vec\u002Findex.html) in [TensorFlow](http:\u002F\u002Ftensorflow.org\u002F)  \n\n[word2vec Parameter Learning Explained](http:\u002F\u002Fwww-personal.umich.edu\u002F~ronxin\u002Fpdf\u002Fw2vexp.pdf)  \nRong 2014  \n\nArticles explaining word2vec: [Deep Learning, NLP, and Representations](http:\u002F\u002Fcolah.github.io\u002Fposts\u002F2014-07-NLP-RNNs-Representations\u002F) and \n[The amazing power of word vectors](https:\u002F\u002Fblog.acolyer.org\u002F2016\u002F04\u002F21\u002Fthe-amazing-power-of-word-vectors\u002F)\n\n___\n[GloVe: Global vectors for word representation](http:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002Fglove.pdf)  \nPennington, Socher, Manning. 2014. Creates word vectors and relates word2vec to matrix factorizations.  The [evaluation section led to controversy](http:\u002F\u002Frare-technologies.com\u002Fmaking-sense-of-word2vec\u002F), including criticism from [Yoav Goldberg](https:\u002F\u002Fplus.google.com\u002F114479713299850783539\u002Fposts\u002FBYvhAbgG8T2)  \n[GloVe source code and training data](http:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002F) \n\n___\n[Enriching Word Vectors with Subword Information](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1607.04606v1.pdf)  \nBojanowski, Grave, Joulin, Mikolov 2016  \n[FastText Code](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FfastText)  \n\n[Advances in Pre-Training Distributed Word Representations](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.09405)  \nT. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, A. 
Joulin 2017  \n[FastText library](https:\u002F\u002Ffasttext.cc\u002F) includes [English word vectors](https:\u002F\u002Ffasttext.cc\u002Fdocs\u002Fen\u002Fenglish-vectors.html)  \n\n## Sentiment Analysis\nThought vectors are numeric representations for sentences, paragraphs, and documents.  This concept is used for many text classification tasks such as sentiment analysis.      \n\n[Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank](http:\u002F\u002Fnlp.stanford.edu\u002F~socherr\u002FEMNLP2013_RNTN.pdf)  \nSocher et al. 2013.  Introduces Recursive Neural Tensor Network and dataset: \"sentiment treebank.\"  Includes [demo site](http:\u002F\u002Fnlp.stanford.edu\u002Fsentiment\u002F\n). Uses a parse tree.\n\n[Distributed Representations of Sentences and Documents](http:\u002F\u002Fcs.stanford.edu\u002F~quocle\u002Fparagraph_vector.pdf)  \n[Le](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=vfT6-XIAAAAJ), Mikolov. 2014.  Introduces Paragraph Vector. Concatenates and averages pretrained, fixed word vectors to create vectors for sentences, paragraphs and documents. Also known as paragraph2vec.  Doesn't use a parse tree.  \nImplemented in [gensim](https:\u002F\u002Fgithub.com\u002Fpiskvorky\u002Fgensim\u002F).  See [doc2vec tutorial](http:\u002F\u002Frare-technologies.com\u002Fdoc2vec-tutorial\u002F)\n\n[Deep Recursive Neural Networks for Compositionality in Language](http:\u002F\u002Fwww.cs.cornell.edu\u002F~oirsoy\u002Ffiles\u002Fnips14drsv.pdf)  \nIrsoy & Cardie. 2014.  Uses Deep Recursive Neural Networks. Uses a parse tree.\n\n[Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks](https:\u002F\u002Faclweb.org\u002Fanthology\u002FP\u002FP15\u002FP15-1150.pdf)  \nTai et al. 2015  Introduces Tree LSTM. 
Uses a parse tree.\n\n[Semi-supervised Sequence Learning](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.01432.pdf)  \nDai, Le 2015  \nApproach: \"We present two approaches that use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a conventional language model in natural language processing.\nThe second approach is to use a sequence autoencoder...\"  \nResult: \"With pretraining, we are able to train long short term memory recurrent networks up to a few hundred\ntimesteps, thereby achieving strong performance in many text classification tasks, such as IMDB, DBpedia and 20 Newsgroups.\"\n\n[Bag of Tricks for Efficient Text Classification](https:\u002F\u002Farxiv.org\u002Fabs\u002F1607.01759)  \nJoulin, Grave, Bojanowski, Mikolov 2016, Facebook AI Research.  \n\"Our experiments show that our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation.\"  \n[FastText blog](https:\u002F\u002Fresearch.facebook.com\u002Fblog\u002Ffasttext\u002F)  \n[FastText Code](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FfastText)  \n\n## Neural Machine Translation\nIn 2014, neural machine translation (NMT) performance became comparable to state-of-the-art statistical machine translation (SMT).  \n\n[Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1406.1078v3.pdf) ([abstract](https:\u002F\u002Farxiv.org\u002Fabs\u002F1406.1078))    \nCho et al. 2014. Breakthrough deep learning paper on machine translation. Introduces the basic sequence-to-sequence model, which consists of two RNNs: an encoder for the input and a decoder for the output.  
\n\n[Neural Machine Translation by jointly learning to align and translate](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1409.0473v6.pdf) ([abstract](https:\u002F\u002Farxiv.org\u002Fabs\u002F1409.0473))     \nBahdanau, Cho, Bengio 2014.  \nImplements an attention mechanism. \"Each time the proposed model generates a word in a translation, it\n(soft-)searches for a set of positions in a source sentence where the most relevant information is\nconcentrated\"  \nResult: \"comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation.\"  \n[English to French Demo](http:\u002F\u002F104.131.78.120\u002F)  \n\n[On Using Very Large Target Vocabulary for Neural Machine Translation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1412.2007v2.pdf)  \nJean, Cho, Memisevic, Bengio 2014.    \n\"we try replacing each [UNK] token with the aligned source word or its most likely translation determined by another word alignment model.\"  \nResult: English -> German BLEU score = 21.59 (target vocabulary of 50,000)    \n\n[Sequence to Sequence Learning with Neural Networks](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1409.3215v3.pdf)  \nSutskever, Vinyals, Le 2014.  ([NIPS presentation](http:\u002F\u002Fresearch.microsoft.com\u002Fapps\u002Fvideo\u002F?id=239083)). Uses seq2seq to generate translations.  \nResult: English -> French BLEU score = 34.8 (WMT’14 dataset)  \nA key contribution is the improvement gained by reversing the word order of the source sentences.  \n[seq2seq tutorial](http:\u002F\u002Ftensorflow.org\u002Ftutorials\u002Fseq2seq\u002Findex.html) in [TensorFlow](http:\u002F\u002Ftensorflow.org\u002F).   \n\n[Addressing the Rare Word Problem in Neural Machine Translation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1410.8206v4.pdf) ([abstract](https:\u002F\u002Farxiv.org\u002Fabs\u002F1410.8206))  \nLuong, Sutskever, Le, Vinyals, Zaremba 2014    \nReplaces UNK words using a dictionary lookup.  \nResult: English -> French BLEU score = 37.5.  
\n\n[Effective Approaches to Attention-based Neural Machine Translation](http:\u002F\u002Fstanford.edu\u002F~lmthang\u002Fdata\u002Fpapers\u002Femnlp15_attn.pdf)  \nLuong, Pham, Manning. 2015  \nTwo attention models: global and local.  \nResult: English -> German 25.9 BLEU points  \n\n[Context-Dependent Word Representation for Neural Machine Translation](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1607.00578v1.pdf)  \nChoi, Cho, Bengio 2016  \n\"we propose to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence.\"  \n\"we propose to represent special tokens (such as numbers, proper nouns and acronyms) with typed symbols to facilitate translating those words that are not well-suited to be translated via continuous vectors.\"   \n\n[Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation](http:\u002F\u002Farxiv.org\u002Fabs\u002F1609.08144)  \nWu et al. 2016  \n[blog post](https:\u002F\u002Fresearch.googleblog.com\u002F2016\u002F09\u002Fa-neural-network-for-machine.html)  \n\"WMT’14 English-to-French, our single model scores 38.95 BLEU\"  \n\"WMT’14 English-to-German, our single model scores 24.17 BLEU\"  \n\n[Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.04558)  \nJohnson et al. 2016  \n[blog post](https:\u002F\u002Fresearch.googleblog.com\u002F2016\u002F11\u002Fzero-shot-translation-with-googles.html)  \nEnables zero-shot translation between language pairs that were never paired during training.  \n\nGoogle has started [rolling out NMT](https:\u002F\u002Fblog.google\u002Fproducts\u002Ftranslate\u002Ffound-translation-more-accurate-fluent-sentences-google-translate\u002F) to its production system, and it's a [significant improvement](http:\u002F\u002Fwww.nytimes.com\u002F2016\u002F12\u002F14\u002Fmagazine\u002Fthe-great-ai-awakening.html?_r=0).  
\n\n[Convolutional Sequence to Sequence Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.03122)  \nGehring et al. 2017, Facebook AI Research  \n[blog post](https:\u002F\u002Fcode.facebook.com\u002Fposts\u002F1978007565818999\u002Fa-novel-approach-to-neural-machine-translation\u002F)  \nArchitecture: convolutional sequence to sequence (ConvS2S).  \nResults: \"We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.\"\n  \n[Facebook is transitioning entirely to neural machine translation](https:\u002F\u002Fcode.facebook.com\u002Fposts\u002F289921871474277\u002Ftransitioning-entirely-to-neural-machine-translation\u002F)\n  \n[Transformer: A Novel Neural Network Architecture for Language Understanding](https:\u002F\u002Fresearch.googleblog.com\u002F2017\u002F08\u002Ftransformer-novel-neural-network.html)  \nArchitecture: Transformer, introduced by Google in [Attention is all you need](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762) and implemented in the Tensor2Tensor (T2T) library.  \nResults: \"we show that the Transformer outperforms both recurrent and convolutional models on academic English to German and English to French translation benchmarks.\"  \n[T2T Source code](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor)  \n[T2T blog post](https:\u002F\u002Fresearch.googleblog.com\u002F2017\u002F06\u002Faccelerating-deep-learning-research.html)   \n\n[Universal Transformer](https:\u002F\u002Fai.googleblog.com\u002F2018\u002F08\u002Fmoving-beyond-translation-with.html)  \n8\u002F15\u002F18: Google improves the BLEU score by about 1 point and applies the Universal Transformer to domains beyond translation.  
\n\n[DeepL Translator](https:\u002F\u002Fwww.deepl.com\u002Ftranslator) claims to [outperform competitors](https:\u002F\u002Fwww.deepl.com\u002Fpress.html) but doesn't disclose its architecture.\n\"Specific details of our network architecture will not be published at this time. DeepL Translator is based on a single, non-ensemble model.\"  \n  \n## Conversation modeling \u002F Dialog\n[Neural Responding Machine for Short-Text Conversation](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1503.02364v2.pdf)  \nShang et al. 2015. Proposes the Neural Responding Machine (NRM). Trained on a Weibo dataset. Produces appropriate responses in about 75% of single-round conversations.  \n\n[A Neural Network Approach to Context-Sensitive Generation of Conversational Responses](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1506.06714v1.pdf)  \nSordoni et al. 2015. Generates responses to tweets.   \nUses the [Recurrent Neural Network Language Model (RLM) architecture of (Mikolov et al., 2010).](http:\u002F\u002Fwww.fit.vutbr.cz\u002Fresearch\u002Fgroups\u002Fspeech\u002Fpubli\u002F2010\u002Fmikolov_interspeech2010_IS100722.pdf)  Source code: [RNNLM Toolkit](http:\u002F\u002Fwww.rnnlm.org\u002F)\n\n[Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1507.04808v3.pdf)  \nSerban, Sordoni, Bengio et al. 2015. Extends the [hierarchical recurrent encoder-decoder](https:\u002F\u002Farxiv.org\u002Fabs\u002F1507.02221) neural network (HRED).\n\n[Attention with Intention for a Neural Network Conversation Model](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1510.08565v3.pdf)  \nYao et al. 2015. The architecture consists of three recurrent networks: an encoder, an intention network and a decoder.  \n\n[A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1605.06069v3.pdf)  \nSerban, Sordoni, Lowe, Charlin, Pineau, Courville, Bengio 2016  \nProposes a novel architecture: VHRED.  
Latent Variable Hierarchical Recurrent Encoder-Decoder  \nCompares favorably against LSTM and HRED.  \n___\n[A Neural Conversation Model](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1506.05869v3.pdf)  \nVinyals, [Le](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=vfT6-XIAAAAJ) 2015.  Uses LSTM RNNs to generate conversational responses. Uses the [seq2seq framework](http:\u002F\u002Ftensorflow.org\u002Ftutorials\u002Fseq2seq\u002Findex.html).  Seq2Seq was originally designed for machine translation; it \"translates\" a single sentence, up to around 79 words, into a single-sentence response, and has no memory of previous dialog exchanges.  Used in Google [Smart Reply feature for Inbox](http:\u002F\u002Fgoogleresearch.blogspot.co.uk\u002F2015\u002F11\u002Fcomputer-respond-to-this-email.html)  \n\n[Incorporating Copying Mechanism in Sequence-to-Sequence Learning](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1603.06393v3.pdf)  \nGu et al. 2016. Proposes CopyNet, which builds on seq2seq.  \n\n[A Persona-Based Neural Conversation Model](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1603.06155v2.pdf)  \nLi et al. 2016. Proposes persona-based models for handling the issue of speaker consistency in neural response generation. Builds on seq2seq.  \n\n[Deep Reinforcement Learning for Dialogue Generation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1606.01541v3.pdf)  \nLi et al. 2016. Uses reinforcement learning to generate diverse responses. Trains two agents to chat with each other. Builds on seq2seq.   \n\n[Adversarial Learning for Neural Dialogue Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1701.06547)  \nLi et al. 2017  \n\"We cast the task as a reinforcement learning (RL) problem where we jointly train two systems, a generative model to produce response sequences, and a discriminator—analogous to the human evaluator in the Turing test— to distinguish between the human-generated dialogues and the machine-generated ones. 
The outputs from the discriminator are then used as rewards for the generative model\"\nThey use the REINFORCE algorithm (Williams 1992). They refine it with Reward for Every Generation Step (REGS), which breaks up the response and assigns rewards to the words individually rather than a single value for the whole response. They also use \"Teacher Forcing\" to stabilize the GAN, which also trains the generator on correct (human) responses. The minimum number of words in a response is 5.     \nIncludes [source code](https:\u002F\u002Fgithub.com\u002Fjiweil\u002FNeural-Dialogue-Generation)  \n[Video explaining the paper](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8fA6qYG4jFc)  \n\n___\n[Deep learning for chatbots](http:\u002F\u002Fwww.wildml.com\u002F2016\u002F04\u002Fdeep-learning-for-chatbots-part-1-introduction\u002F)  \nA 2016 article summarizing the state of the art and the challenges for chatbots.  \n[Deep learning for chatbots. part 2](http:\u002F\u002Fwww.wildml.com\u002F2016\u002F07\u002Fdeep-learning-for-chatbots-2-retrieval-based-model-tensorflow\u002F)  \nImplements a retrieval-based dialog agent using a dual-encoder LSTM with TensorFlow, based on the Ubuntu dataset ([paper](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1506.08909v3.pdf)). Includes [source code](https:\u002F\u002Fgithub.com\u002Fdennybritz\u002Fchatbot-retrieval\u002F)  \n\n[Chatbot and Related Research Paper Notes with Images](https:\u002F\u002Fgithub.com\u002Fricsinaruto\u002FSeq2seqChatbots\u002Fwiki\u002FChatbot-and-Related-Research-Paper-Notes-with-Images)  \n\n[Neural Dialog Papers](https:\u002F\u002Fgithub.com\u002Fsnakeztc\u002FNeuralDialogPapers) - A list of papers about creating dialog systems using deep nets   \n\n\n## Language Modeling\nResearchers have been training increasingly large language models and using them to \"transfer learn\" other tasks, for example [Google's BERT](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.04805) and [fast.ai's ULMFiT](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.06146).  \n\n[Better 
Language Models and their implications](https:\u002F\u002Fblog.openai.com\u002Fbetter-language-models\u002F) 2\u002F14\u002F19 OpenAI partially releases a large language model, GPT-2.     \n\n\n## Memory and Attention Models\nAttention mechanisms allow the network to refer back to the input sequence, instead of forcing it to encode all information into one fixed-length vector.  - [Attention and Memory in Deep Learning and NLP](http:\u002F\u002Fwww.opendatascience.com\u002Fblog\u002Fattention-and-memory-in-deep-learning-and-nlp\u002F)  \n\n[Memory Networks](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1410.3916v10.pdf) Weston et al. 2014, and \n[End-To-End Memory Networks](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1503.08895v4.pdf) Sukhbaatar et al. 2015.  \nMemory networks are implemented in [MemNN](https:\u002F\u002Fgithub.com\u002Ffacebook\u002FMemNN).  They target tasks that require reasoning, attention, and memory.  \n[Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1502.05698v7.pdf)  \nWeston et al. 2015. Categorizes QA tasks such as single supporting fact, yes\u002Fno, etc. Extends memory networks.  \n[Evaluating prerequisite qualities for learning end to end dialog systems](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.06931.pdf)  \nDodge et al. 2015. Tests Memory Networks on 4 tasks, including a Reddit dialog task.  \nSee [Jason Weston's lecture on MemNN](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Xumy3Yjq4zk)  \n  \n[Neural Turing Machines](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1410.5401v2.pdf)  \nGraves, Wayne, Danihelka 2014.  \nWe extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. 
Preliminary results demonstrate\nthat Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.\n[Olah and Carter blog on NTM](http:\u002F\u002Fdistill.pub\u002F2016\u002Faugmented-rnns\u002F#neural-turing-machines)  \n\n[Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1503.01007v4.pdf)  \nJoulin, Mikolov 2015. [Stack RNN source code](https:\u002F\u002Fgithub.com\u002Ffacebook\u002FStack-RNN) and [blog post](https:\u002F\u002Fresearch.facebook.com\u002Fblog\u002F1642778845966521\u002Finferring-algorithmic-patterns-with-stack\u002F)  \n","# 自然语言处理的深度学习资源\n\n用于机器翻译、图像字幕生成和对话等序列建模任务的最先进资源。\n\n[我对神经网络、RNN、LSTM 的笔记](https:\u002F\u002Fgithub.com\u002Fandrewt3000\u002FMachineLearning\u002Fblob\u002Fmaster\u002FneuralNets.md)  \n\n## 深度学习与自然语言处理 \n[斯坦福 CS 224D：深度学习与自然语言处理课程](http:\u002F\u002Fcs224d.stanford.edu\u002Fsyllabus.html)  \n[Richard Socher](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=FaOcyfMAAAAJ&hl=en)。该课程包含教学大纲和幻灯片。  \n视频：[2017 年讲座](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6)  \nCS224N [2019 年冬季讲座](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8rXD5-xhemo&list=PLoROMvodv4rOhcuXMZkNm7j3fVwBBY42z)  \n\n[牛津大学深度学习与自然语言处理课程](http:\u002F\u002Fwww.cs.ox.ac.uk\u002Fteaching\u002Fcourses\u002F2016-2017\u002Fdl\u002F)  \n[Phil Blunsom](https:\u002F\u002Fscholar.google.co.uk\u002Fcitations?user=eJwbbXEAAAAJ&hl=en)。（2017 年）由 Deep Mind NLP 团队开设的课程。  \n讲座幻灯片、视频和实践课：[GitHub 仓库](https:\u002F\u002Fgithub.com\u002Foxford-cs-deepnlp-2017)  \n[2017 年视频](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL613dYIGMXoZBtZhbyiBqb0QtgK6oJbpm)  \n\n[A Primer on Neural Network Models for Natural Language Processing](https:\u002F\u002Fwww.jair.org\u002Fmedia\u002F4992\u002Flive-4992-9623-jair.pdf)  \nYoav Goldberg。2015 年 9 月提交，2016 年 11 月发表。这是一份长达 75 页的最新进展综述。  \n\n## 
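Word Vectors\nResources about word vectors, also known as word embeddings, and distributed representations of words.  \nWord vectors are numeric representations of words in which similar words have similar vectors, i.e. they lie close together in the vector space. Word vectors are often used as input to deep learning systems; this process is sometimes called pretraining. \n\n[A neural probabilistic language model.](http:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F1839-a-neural-probabilistic-language-model.pdf)  \nBengio et al. 2003. Seminal paper on word vectors.  \n\n___\n[Efficient Estimation of Word Representations in Vector Space](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1301.3781v3.pdf)  \nMikolov et al. 2013. Word2Vec generates word vectors in an unsupervised way by learning to predict words from their context in a corpus. The paper describes the Continuous Bag-of-Words (CBOW) and Continuous Skip-gram models for learning word vectors.  \nSkip-gram takes the center word as input and predicts the surrounding context words; it is better suited to large datasets.  \nCBOW takes the surrounding context words as input and predicts the center word; it is better suited to smaller datasets.  \n\n[Distributed Representations of Words and Phrases and their Compositionality](http:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)  \nMikolov et al. 2013. Learns vectors for phrases such as “New York Times”. Includes optimizations for the Skip-gram model: hierarchical softmax and negative sampling, plus subsampling of frequent words (frequent words such as “the” are randomly discarded during training with a probability that grows with their frequency, which speeds up training and improves the vectors of less frequent words).  \n\n[Linguistic Regularities in Continuous Space Word Representations](http:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FN13-1090)  \n[Mikolov](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=oBu8kMMAAAAJ&hl=en) et al. 2013. Performs well on word similarity and analogy tasks. Expands on the famous example: King – Man + Woman = Queen  \n[Word2Vec source code](https:\u002F\u002Fcode.google.com\u002Fp\u002Fword2vec\u002F)  \n[Word2Vec tutorial in TensorFlow](http:\u002F\u002Ftensorflow.org\u002Ftutorials\u002Fword2vec\u002Findex.html)  \n\n[word2vec Parameter Learning Explained](http:\u002F\u002Fwww-personal.umich.edu\u002F~ronxin\u002Fpdf\u002Fw2vexp.pdf)  \nRong 2014  \n\nArticles explaining Word2Vec: [Deep Learning, NLP, and Representations](http:\u002F\u002Fcolah.github.io\u002Fposts\u002F2014-07-NLP-RNNs-Representations\u002F) and  \n[The amazing power of word vectors](https:\u002F\u002Fblog.acolyer.org\u002F2016\u002F04\u002F21\u002Fthe-amazing-power-of-word-vectors\u002F)  \n\n___\n[GloVe: Global vectors for word representation](http:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002Fglove.pdf)  \nPennington, Socher, Manning 2014. Creates word vectors and relates Word2Vec to matrix factorization. Its evaluation section drew controversy, including a critique by [Yoav Goldberg](https:\u002F\u002Fplus.google.com\u002F114479713299850783539\u002Fposts\u002FBYvhAbgG8T2).  \n[GloVe source code and training data](http:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002F)  \n\n___\n[Enriching Word Vectors with Subword Information](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1607.04606v1.pdf)  \nBojanowski, Grave, Joulin, Mikolov 2016  \n[FastText code](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FfastText)  \n\n[Advances in Pre-Training Distributed Word Representations](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.09405)  \nT. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, A. Joulin 2017  \nThe [FastText library](https:\u002F\u002Ffasttext.cc\u002F) includes [English word vectors](https:\u002F\u002Ffasttext.cc\u002Fdocs\u002Fen\u002Fenglish-vectors.html)\n\n## Sentiment Analysis\nThought vectors are numeric representations of sentences, paragraphs, and documents. This concept is used in many text classification tasks, such as sentiment analysis.\n\n[Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank](http:\u002F\u002Fnlp.stanford.edu\u002F~socherr\u002FEMNLP2013_RNTN.pdf)  \nSocher et al. 2013. Introduces the Recursive Neural Tensor Network (RNTN) and the Sentiment Treebank dataset. Includes a [demo site](http:\u002F\u002Fnlp.stanford.edu\u002Fsentiment\u002F). Uses a parse tree.\n\n[Distributed Representations of Sentences and Documents](http:\u002F\u002Fcs.stanford.edu\u002F~quocle\u002Fparagraph_vector.pdf)  \nLe, Mikolov 2014. Introduces Paragraph Vector, also known as paragraph2vec or doc2vec. It learns a vector for each sentence, paragraph, or document. During training, this vector is concatenated or averaged with word vectors to predict words in that text. Does not use a parse tree.\n\nImplemented in [Gensim](https:\u002F\u002Fgithub.com\u002Fpiskvorky\u002Fgensim\u002F). See the [doc2vec tutorial](http:\u002F\u002Frare-technologies.com\u002Fdoc2vec-tutorial\u002F).\n\n[Deep Recursive Neural Networks for Compositionality in Language](http:\u002F\u002Fwww.cs.cornell.edu\u002F~oirsoy\u002Ffiles\u002Fnips14drsv.pdf)  \nIrsoy, Cardie 2014. Uses deep recursive neural networks and a parse tree.\n\n[Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks](https:\u002F\u002Faclweb.org\u002Fanthology\u002FP\u002FP15\u002FP15-1150.pdf)  \nTai et al. 2015. Introduces the Tree-LSTM. Uses a parse tree.\n\n[Semi-supervised Sequence Learning](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.01432.pdf)  \nDai, Le 2015.  \nApproach: “We present two approaches that use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a conventional language model in natural language processing. The second approach is to use a sequence autoencoder...”  \nResult: “With pretraining, we are able to train long short term memory recurrent networks up to a few hundred timesteps, thereby achieving strong performance in many text classification tasks, such as IMDB, DBpedia and 20 Newsgroups.”\n\n[Bag of Tricks for Efficient Text Classification](https:\u002F\u002Farxiv.org\u002Fabs\u002F1607.01759)  \nJoulin, Grave, Bojanowski, Mikolov 2016, Facebook AI Research.  \n“Our experiments show that our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation.”  \n[FastText blog](https:\u002F\u002Fresearch.facebook.com\u002Fblog\u002Ffasttext\u002F)  \n[FastText code](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FfastText)\n\n## Neural Machine Translation\nIn 2014, neural machine translation (NMT) performance became comparable to state-of-the-art statistical machine translation (SMT).\n\n[Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1406.1078v3.pdf) ([abstract](https:\u002F\u002Farxiv.org\u002Fabs\u002F1406.1078))    \nCho et al. 2014. Breakthrough deep learning paper on machine translation. Introduces the basic sequence-to-sequence model, which consists of two RNNs: an encoder for the input and a decoder for the output.\n\n[Neural Machine Translation by Jointly Learning to Align and Translate](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1409.0473v6.pdf) ([abstract](https:\u002F\u002Farxiv.org\u002Fabs\u002F1409.0473))     \nBahdanau, Cho, Bengio 2014.  \nImplements the attention mechanism. “Each time the proposed model generates a word in a translation, it (soft-)searches for a set of positions in a source sentence where the most relevant information is concentrated.”  \nResult: “comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation.”  \n[English to French demo](http:\u002F\u002F104.131.78.120\u002F)  \n\n[On Using Very Large Target Vocabulary for Neural Machine Translation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1412.2007v2.pdf)  \nJean, Cho, Memisevic, Bengio 2014.    \n“we try replacing each [UNK] token with the aligned source word or its most likely translation determined by another word alignment model.”  \nResult: English → German BLEU = 21.59 (target vocabulary of 50,000 words)    \n\n[Sequence to Sequence Learning with Neural Networks](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1409.3215v3.pdf)  \nSutskever, Vinyals, Le 2014 ([NIPS presentation](http:\u002F\u002Fresearch.microsoft.com\u002Fapps\u002Fvideo\u002F?id=239083)). Uses a sequence-to-sequence model to generate translations.  \nResult: English → French BLEU = 34.8 (WMT’14 dataset)  \nA key contribution is the improvement gained by reversing the word order of the source sentences.  \n[seq2seq tutorial in TensorFlow](http:\u002F\u002Ftensorflow.org\u002Ftutorials\u002Fseq2seq\u002Findex.html).   \n\n[Addressing the Rare Word Problem in Neural Machine Translation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1410.8206v4.pdf) ([abstract](https:\u002F\u002Farxiv.org\u002Fabs\u002F1410.8206))  \nLuong, Sutskever, Le, Vinyals, Zaremba 2014    \nReplaces UNK words using a dictionary lookup.  \nResult: English → French BLEU = 37.5.  \n\n[Effective Approaches to Attention-based Neural Machine Translation](http:\u002F\u002Fstanford.edu\u002F~lmthang\u002Fdata\u002Fpapers\u002Femnlp15_attn.pdf)  \nLuong, Pham, Manning 2015  \nTwo attention models: global and local.  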
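\nResult: English → German BLEU = 25.9  \n\n[Context-Dependent Word Representation for Neural Machine Translation](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1607.00578v1.pdf)  \nChoi, Cho, Bengio 2016  \n“we propose to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence.”  \n“we propose to represent special tokens (such as numbers, proper nouns and acronyms) with typed symbols to facilitate translating those words that are not well-suited to be translated via continuous vectors.”   \n\n[Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation](http:\u002F\u002Farxiv.org\u002Fabs\u002F1609.08144)  \nWu et al. 2016  \n[blog post](https:\u002F\u002Fresearch.googleblog.com\u002F2016\u002F09\u002Fa-neural-network-for-machine.html)  \nResult: single-model BLEU of 38.95 on WMT’14 English → French.  \nResult: single-model BLEU of 24.17 on WMT’14 English → German.  \n\n[Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.04558)  \nJohnson et al. 2016  \n[blog post](https:\u002F\u002Fresearch.googleblog.com\u002F2016\u002F11\u002Fzero-shot-translation-with-googles.html)  \nTranslates between language pairs that were never paired explicitly during training (zero-shot translation).  \n\nGoogle has started [rolling out NMT](https:\u002F\u002Fblog.google\u002Fproducts\u002Ftranslate\u002Ffound-translation-more-accurate-fluent-sentences-google-translate\u002F) to its production system, and it is a [significant improvement](http:\u002F\u002Fwww.nytimes.com\u002F2016\u002F12\u002F14\u002Fmagazine\u002Fthe-great-ai-awakening.html?_r=0).  \n\n[Convolutional Sequence to Sequence Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.03122)  \nGehring et al. 2017, Facebook AI Research  \n[blog post](https:\u002F\u002Fcode.facebook.com\u002Fposts\u002F1978007565818999\u002Fa-novel-approach-to-neural-machine-translation\u002F)  \nArchitecture: convolutional sequence to sequence (ConvS2S).  \nResult: “We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.”  \nFacebook is [transitioning entirely to neural machine translation](https:\u002F\u002Fcode.facebook.com\u002Fposts\u002F289921871474277\u002Ftransitioning-entirely-to-neural-machine-translation\u002F).\n\n[Transformer: A Novel Neural Network Architecture for Language Understanding](https:\u002F\u002Fresearch.googleblog.com\u002F2017\u002F08\u002Ftransformer-novel-neural-network.html)  \nArchitecture: the Transformer, introduced by Google in [Attention is all you need](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762) and implemented in the Tensor2Tensor (T2T) library.  \nResult: “we show that the Transformer outperforms both recurrent and convolutional models on academic English to German and English to French translation benchmarks.”  \n[T2T source code](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor)  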
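\n[T2T blog post](https:\u002F\u002Fresearch.googleblog.com\u002F2017\u002F06\u002Faccelerating-deep-learning-research.html)   \n\n[Universal Transformers](https:\u002F\u002Fai.googleblog.com\u002F2018\u002F08\u002Fmoving-beyond-translation-with.html)  \nAugust 15, 2018: Google improves the BLEU score by 1 point and applies Universal Transformers to tasks beyond translation.  \n\n[DeepL Translator](https:\u002F\u002Fwww.deepl.com\u002Ftranslator) claims to [outperform its competitors](https:\u002F\u002Fwww.deepl.com\u002Fpress.html) but does not disclose its architecture:\n“Specific details of our network architecture will not be published at this time. DeepL Translator is based on a single, non-ensemble model.”\n\n## Conversation Modeling \u002F Dialog\n[Neural Responding Machine for Short-Text Conversation](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1503.02364v2.pdf)  \nShang et al. 2015. Uses the Neural Responding Machine, trained on a Weibo dataset. In single-round conversations, 75% of its responses are appropriate.\n\n[A Neural Network Approach to Context-Sensitive Generation of Conversational Responses](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1506.06714v1.pdf)  \nSordoni et al. 2015. Generates responses to tweets.  \nUses the [Recurrent Neural Network Language Model (RLM) architecture of Mikolov et al. (2010)](http:\u002F\u002Fwww.fit.vutbr.cz\u002Fresearch\u002Fgroups\u002Fspeech\u002Fpubli\u002F2010\u002Fmikolov_interspeech2010_IS100722.pdf). Source code: [RNNLM Toolkit](http:\u002F\u002Fwww.rnnlm.org\u002F).\n\n[Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1507.04808v3.pdf)  \nSerban, Sordoni, Bengio et al. 2015. Extends the [Hierarchical Recurrent Encoder-Decoder](https:\u002F\u002Farxiv.org\u002Fabs\u002F1507.02221) neural network (HRED).\n\n[Attention with Intention for a Neural Network Conversation Model](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1510.08565v3.pdf)  \nYao et al. 2015. The architecture consists of three recurrent networks: an encoder, an intention network, and a decoder.\n\n[A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1605.06069v3.pdf)  \nSerban, Sordoni, Lowe, Charlin, Pineau, Courville, Bengio 2016. Proposes a novel architecture, VHRED (Latent Variable Hierarchical Recurrent Encoder-Decoder), which compares favorably against LSTM and HRED.\n\n___\n[A Neural Conversational Model](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1506.05869v3.pdf)  \nVinyals, [Le](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=vfT6-XIAAAAJ) 2015. Uses LSTM RNNs to generate conversational responses with the [seq2seq framework](http:\u002F\u002Ftensorflow.org\u002Ftutorials\u002Fseq2seq\u002Findex.html). Seq2seq was originally designed for machine translation. Here it “translates” a single sentence of up to around 79 words into a single-sentence response, and it has no memory of earlier exchanges in the dialog. The technique is used in the [Smart Reply feature](http:\u002F\u002Fgoogleresearch.blogspot.co.uk\u002F2015\u002F11\u002Fcomputer-respond-to-this-email.html) of Google Inbox.\n\n[Incorporating Copying Mechanism in Sequence-to-Sequence Learning](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1603.06393v3.pdf)  \nGu et al. 2016. Proposes CopyNet, which builds on seq2seq.\n\n[A Persona-Based Neural Conversation Model](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1603.06155v2.pdf)  \nLi et al. 2016. Proposes persona-based models to address speaker consistency in neural response generation. Builds on seq2seq.\n\n[Deep Reinforcement Learning for Dialogue Generation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1606.01541v3.pdf)  \nLi et al. 2016. Uses reinforcement learning to generate diverse responses by training two agents to chat with each other. Builds on seq2seq.\n\n[Adversarial Learning for Neural Dialogue Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1701.06547)  \nLi et al. 2017.  \n“we cast the task as a reinforcement learning (RL) problem where we jointly train two systems, a generative model to produce response sequences, and a discriminator, analogous to the human evaluator in the Turing test, to distinguish between the human-generated dialogues and the machine-generated ones. The outputs from the discriminator are then used as rewards for the generative model.”  \nThey use the REINFORCE algorithm (Williams, 1992) and improve on it with REGS (Reward for Every Generation Step). REGS breaks the response down into individual words and rewards each word separately instead of giving one score to the entire response. They also use teacher forcing to stabilize GAN training, feeding the generator the correct human responses for examples it got wrong. The minimum response length is 5 words.  \n[Source code](https:\u002F\u002Fgithub.com\u002Fjiweil\u002FNeural-Dialogue-Generation) is available.  \n[Video paper review](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8fA6qYG4jFc).\n\n___\n[Deep Learning for Chatbots](http:\u002F\u002Fwww.wildml.com\u002F2016\u002F04\u002Fdeep-learning-for-chatbots-part-1-introduction\u002F)  \nArticle summarizing the 2016 state of the art and the challenges for chatbots.  \n[Deep Learning for Chatbots, Part 2](http:\u002F\u002Fwww.wildml.com\u002F2016\u002F07\u002Fdeep-learning-for-chatbots-2-retrieval-based-model-tensorflow\u002F)  \nImplements a retrieval-based dialog agent in TensorFlow using a dual-encoder LSTM, based on the Ubuntu dataset [[paper](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1506.08909v3.pdf)]. Includes [source code](https:\u002F\u002Fgithub.com\u002Fdennybritz\u002Fchatbot-retrieval\u002F).\n\n[Chatbot and Related Research Paper Notes with Images](https:\u002F\u002Fgithub.com\u002Fricsinaruto\u002FSeq2seqChatbots\u002Fwiki\u002FChatbot-and-Related-Research-Paper-Notes-with-Images)\n\n[Neural Dialog Papers](https:\u002F\u002Fgithub.com\u002Fsnakeztc\u002FNeuralDialogPapers) - a list of papers on building dialog systems with deep networks.\n\n## Language Modeling\nResearchers have been training ever larger language models and using them for transfer learning on other tasks, for example [Google's BERT](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.04805), [fast.ai's ULMFiT](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.06146), etc.\n\n[Better Language Models and Their Implications](https:\u002F\u002Fblog.openai.com\u002Fbetter-language-models\u002F): on February 14, 2019, OpenAI partially released its large GPT-2 language model.\n\n## Memory and Attention Models\nAttention mechanisms allow the network to refer back to the input sequence instead of forcing it to encode all information into one fixed-length vector. See [Attention and Memory in Deep Learning and NLP](http:\u002F\u002Fwww.opendatascience.com\u002Fblog\u002Fattention-and-memory-in-deep-learning-and-nlp\u002F).  \n\n[Memory Networks](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1410.3916v10.pdf) Weston et al. 2014, and [End-To-End Memory Networks](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1503.08895v4.pdf) Sukhbaatar et al. 2015. Memory networks are implemented in [MemNN](https:\u002F\u002Fgithub.com\u002Ffacebook\u002FMemNN). These models attempt to solve tasks that require reasoning, attention, and memory.  \n[Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1502.05698v7.pdf)  \nWeston et al. 2015. Classifies question-answering tasks into types such as single supporting fact and yes\u002Fno questions, and extends memory networks.  \n[Evaluating prerequisite qualities for learning end to end dialog systems](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.06931.pdf)  \nDodge et al. 2015. Tests memory networks on 4 tasks, including a Reddit dialog task.  \nSee [Jason Weston's lecture on MemNN](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Xumy3Yjq4zk)  \n\n[Neural Turing Machines](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1410.5401v2.pdf)  \nGraves, Wayne, Danihelka 2014.  \n“We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.”  \n[Olah and Carter's blog post on NTMs](http:\u002F\u002Fdistill.pub\u002F2016\u002Faugmented-rnns\u002F#neural-turing-machines)  \n\n[Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets](http:\u002F\u002Farxiv.org\u002Fpdf\u002F1503.01007v4.pdf)  \nJoulin, Mikolov 2015. [Stack RNN source code](https:\u002F\u002Fgithub.com\u002Ffacebook\u002FStack-RNN) and [blog post](https:\u002F\u002Fresearch.facebook.com\u002Fblog\u002F1642778845966521\u002Finferring-algorithmic-patterns-with-stack\u002F)",
"# DL4NLP Quick Start Guide\n\n**Note**: `DL4NLP` is not a single installable package or library. It is a **curated list of resources on deep learning for natural language processing (NLP)**, maintained by Andrew Thomas (andrewt3000). It links to leading courses, classic papers, open-source code (such as Word2Vec, FastText, and the Transformer), and tutorials.\n\nThis guide shows how to use the core resources in the list to set up a development environment quickly and run a classic NLP model.\n\n## 1. Environment Setup\n\nMost resources in the list are built on the Python ecosystem (especially TensorFlow, PyTorch, Gensim, and FastText), so the following setup is recommended:\n\n*   **Operating system**: Linux (Ubuntu 20.04+ recommended), macOS, or Windows (via WSL2)\n*   **Python version**: 3.8 - 3.10 (best compatibility)\n*   **Hardware**: \n    *   Basic learning: any modern CPU\n    *   Training models (such as NMT or the Transformer): an NVIDIA GPU with 8 GB+ of VRAM and CUDA drivers is recommended\n*   **Dependency management**: use `conda` or `venv` to isolate environments.\n\n## 2. Installation\n\nBecause `DL4NLP` is a resource list, install the libraries that match the modules you want to study. The commands below install the core libraries that cover most of the list (word vectors, sentiment analysis, seq2seq).\n\n### 2.1 Create a virtual environment\n```bash\npython -m venv dl4nlp-env\nsource dl4nlp-env\u002Fbin\u002Factivate  # On Windows: dl4nlp-env\\Scripts\\activate\n```\n\n### 2.2 Install a core deep learning framework\nDevelopers in mainland China can use the Tsinghua or Alibaba PyPI mirrors to speed up installation:\n\n```bash\n# Install PyTorch (recommended for reproducing architectures such as the Transformer and LSTMs); cu118 selects the CUDA 11.8 build\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n\n# Or install TensorFlow (some older tutorials in the list are based on TF)\npip install tensorflow -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 2.3 Install NLP libraries\nThese cover the Word2Vec, GloVe, and FastText resources in the list, plus data-processing tools. Gensim can also load pretrained GloVe vectors:\n\n```bash\n# Gensim (includes Word2Vec and Doc2Vec implementations)\npip install gensim -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n\n# FastText (Facebook AI Research; efficient text classification and word vectors)\npip install fasttext -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n\n# Other tools (Jupyter, plotting, etc.)\npip install jupyterlab matplotlib scikit-learn -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n> **Tip**: To run the code for the Stanford CS224D or Oxford courses mentioned in the list, clone the corresponding GitHub repository (for example, `oxford-cs-deepnlp-2017`) and install the dependencies it lists, such as a `requirements.txt` if one is provided.\n\n## 3. Basic Usage\n\nThe following example uses the **Gensim** library to train the classic **Word2Vec** (Skip-gram\u002FCBOW) model, a common first step into deep learning for NLP.\n\n### 3.1 Write the script\nCreate a file named `demo_word2vec.py`:\n\n```python\nfrom gensim.models import Word2Vec\nfrom gensim.utils import simple_preprocess\n\n# 1. Prepare a small toy corpus\ncorpus = [\n    \"Deep learning is great for natural language processing\",\n    \"Word vectors represent words as numeric vectors\",\n    \"Neural machine translation uses sequence to sequence models\",\n    \"Attention mechanisms improve translation quality\"\n]\n\n# 2. Tokenize: lowercase each sentence and split it into tokens\ntokenized_corpus = [simple_preprocess(sentence) for sentence in corpus]\n\n# 3. Train the Word2Vec model\n# vector_size: vector dimensionality (100 or 300 are common)\n# window: context window size\n# min_count: ignore words whose total frequency is below this value\n# sg: 1 = Skip-gram, 0 = CBOW (see Mikolov et al. 2013)\nmodel = Word2Vec(\n    sentences=tokenized_corpus,\n    vector_size=100,\n    window=5,\n    min_count=1,\n    workers=4,\n    sg=1\n)\n\n# 4. Use the model\nword_vector = model.wv['learning']\nprint(f\"Vector for 'learning': {word_vector[:5]}...\")  # print the first 5 dimensions\n\n# 5. Find similar words (the basis for analogy queries)\n# A four-sentence corpus is far too small for meaningful neighbors; expect near-random results\nsimilar_words = model.wv.most_similar('deep', topn=2)\nprint(f\"Words similar to 'deep': {similar_words}\")\n\n# 6. Save the model\nmodel.save(\"word2vec_dl4nlp.model\")\n```\n\n### 3.2 Run the example\nRun the following in a terminal:\n\n```bash\npython demo_word2vec.py\n```\n\n### 3.3 Next steps\nOnce the basics above work, the `DL4NLP` list points to further topics to explore:\n*   **Sentiment analysis**: implement **Doc2Vec** with `gensim`, or use **FastText** for text classification.\n*   **Machine translation**: follow the **TensorFlow seq2seq tutorial** or the **Tensor2Tensor (T2T)** Transformer code linked in the list to reproduce an encoder-decoder architecture.\n*   **Courses**: watch the [Stanford CS224N](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8rXD5-xhemo&list=PLoROMvodv4rOhcuXMZkNm7j3fVwBBY42z) videos linked in the list for a systematic foundation.",
"An algorithm team at a startup is building a multilingual intelligent customer-service system. It needs to get up to speed quickly on current techniques, from word-vector preprocessing to sequence modeling (such as machine translation and dialog generation).\n\n### Without DL4NLP\n- **Inefficient resource gathering**: engineers search blindly across countless academic sites and struggle to tell outdated tutorials from the state-of-the-art (SOTA) approaches of 2017-2019.\n- **Theory disconnected from practice**: they find syllabi for Stanford CS224D or the Oxford course but not the matching video lectures, slides, and hands-on code repositories, so the learning path breaks down.\n- **Fuzzy core concepts**: when choosing between Word2Vec's Skip-gram and CBOW models or applying optimizations such as negative sampling, they lack a systematic explanation like Yoav Goldberg's primer, so model tuning comes down to guesswork.\n- **High reproduction cost**: without authoritative pointers to open-source code (such as the word2vec source code or TensorFlow tutorials), the team spends weeks working out basic components from scratch.\n\n### With DL4NLP\n- **One-stop access to current resources**: DL4NLP points directly to complete course videos, lecture notes, and GitHub projects from leading researchers such as Richard Socher and Phil Blunsom, greatly shortening the research phase.\n- **A clear learning path**: using the curated RNN and LSTM notes and the neural probabilistic language model paper, team members quickly build a complete picture, from theoretical foundations to sequence modeling tasks.\n- **Solid command of embedding techniques**: drawing on the detailed coverage of the Mikolov papers and GloVe, the team knows which word-vector model to choose for a given dataset size and which optimizations to apply (such as hierarchical softmax).\n- **Faster prototyping**: starting from the linked official tutorials and classic code bases, engineers quickly reproduce high-quality pretrained models, compressing weeks of groundwork into a few days.\n\nBy aggregating top deep learning resources for NLP, DL4NLP frees the team from tedious information filtering so it can focus on developing and shipping its core algorithms.",
"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fandrewt3000_DL4NLP_209d9c69.png","andrewt3000","Andrew Thomas","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fandrewt3000_cbdf74fd.jpg",null,"ART Consulting","Birmingham, AL","https:\u002F\u002Fgithub.com\u002Fandrewt3000",2184,457,"2026-04-07T12:17:23",5,"","Not specified",{"notes":90,"python":88,"dependencies":91},"This README is a curated list of deep learning resources for natural language processing (NLP), with course links, paper summaries, and links to related code repositories (such as Word2Vec, GloVe, FastText, and the TensorFlow tutorials); it is not a single installable, runnable tool. It therefore does not specify any operating system, hardware configuration, Python version, or dependency requirements. Users should consult the official documentation of each project referenced in the list (such as Facebook's fastText or Google's TensorFlow tutorials) for its environment requirements.",[],[15],"2026-03-27T02:49:30.150509","2026-04-14T03:09:23.445518",[],[]]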