[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-makcedward--nlp":3,"tool-makcedward--nlp":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":101,"forks":102,"last_commit_at":103,"license":81,"difficulty_score":104,"env_os":105,"env_gpu":106,"env_ram":106,"env_deps":107,"category_tags":110,"github_topics":111,"view_count":23,"oss_zip_url":81,"oss_zip_packed_at":81,"status":16,"created_at":116,"updated_at":117,"faqs":118,"releases":146},3596,"makcedward\u002Fnlp","nlp",":memo: This repository recorded my NLP journey.","nlp 是一个记录自然语言处理（NLP）学习旅程的开源教程仓库，旨在通过实战代码和数据集，展示如何利用 NLP 技术解决现实世界的问题。它主要解决了开发者在构建 NLP 模型时面临的三大挑战：如何获取高质量的训练数据、如何进行规范的文本预处理，以及如何应用前沿算法提升模型性能。\n\n该资源非常适合 NLP 初学者、数据科学家以及希望将理论转化为实践的开发者使用。其独特亮点在于提供了极其详尽的“文本预处理”指南，涵盖了从分词、词性标注到拼写纠正等全流程，并附带了可运行的代码示例和深入的技术文章链接。此外，nlp 还特别聚焦于“数据增强”领域，不仅介绍了针对文本、语音和音频的多种增强策略，还深入探讨了对抗攻击防御、数据噪声优化及回译技术等进阶话题。无论是想夯实基础的新手，还是寻求模型优化灵感的研究人员，都能在这里找到有价值的参考方案。","# NLP - Tutorial\r\nRepository to show how NLP can tackle real problems. 
Including the source code, datasets, and state-of-the-art techniques in NLP\r\n\r\n## Data Augmentation\r\n*   [Data Augmentation in NLP](https:\u002F\u002Ftowardsdatascience.com\u002Fdata-augmentation-in-nlp-2801a34dfc28)\r\n*   [Data Augmentation library for Text](https:\u002F\u002Ftowardsdatascience.com\u002Fdata-augmentation-library-for-text-9661736b13ff)\r\n*   [Does your NLP model able to prevent adversarial attack?](https:\u002F\u002Fmedium.com\u002Fhackernoon\u002Fdoes-your-nlp-model-able-to-prevent-adversarial-attack-45b5ab75129c)\r\n*   [How does Data Noising Help to Improve your NLP Model?](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fhow-does-data-noising-help-to-improve-your-nlp-model-480619f9fb10)\r\n*   [Data Augmentation library for Speech Recognition](https:\u002F\u002Ftowardsdatascience.com\u002Fdata-augmentation-for-speech-recognition-e7c607482e78)\r\n*   [Data Augmentation library for Audio](https:\u002F\u002Ftowardsdatascience.com\u002Fdata-augmentation-for-audio-76912b01fdf6)\r\n*   [Unsupervised Data Augmentation](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Funsupervised-data-augmentation-6760456db143)\r\n*   [Adversarial Attacks in Textual Deep Neural Networks](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fadversarial-attacks-in-textual-deep-neural-networks-245dc90029df)\r\n*\t[Back Translation in Text Augmentation by nlpaug](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fback-translation-in-text-augmentation-by-nlpaug-d65518dd092f)\r\n\r\n## General\r\n*\t[Tricks of Building an ML or DNN Model](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Ftricks-of-building-an-ml-or-dnn-model-b2de54cf440a)\r\n\r\n## Text Preprocessing\r\n| Section | Sub-Section | Description | Story |\r\n| --- | --- | --- | --- |\r\n| Tokenization | Subword Tokenization |  | 
[Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-subword-helps-on-your-nlp-model-83dd1b836f46) |\r\n| Tokenization | Word Tokenization |  | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnlp-pipeline-word-tokenization-part-1-4b2b547e6a3) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-word_tokenization.ipynb) |\r\n| Tokenization | Sentence Tokenization |  | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnlp-pipeline-sentence-tokenization-part-6-86ed55b185e6) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-sentence_tokenization.ipynb) |\r\n| Part of Speech | | | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnlp-pipeline-part-of-speech-part-2-b683c90e327d) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-part_of_speech.ipynb) |\r\n| Lemmatization | | | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnlp-pipeline-lemmatization-part-3-4bfd7304957) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp_lemmatization.ipynb) |\r\n| Stemming | | | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnlp-pipeline-stemming-part-4-b60a319fd52) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-stemming.ipynb) |\r\n| Stop Words | | | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnlp-pipeline-stop-words-part-5-d6770df8a936) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-stop_words.ipynb) |\r\n| Phrase Word Recognition | |  |  |\r\n| Spell Checking | Lexicon-based | Peter Norvig algorithm | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fcorrecting-your-spelling-error-with-4-operations-50bcfd519bb8) 
[Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Futil\u002Fnlp-util-spell_corrector.ipynb) |\r\n| | Lexicon-based | Symspell | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fessential-text-correction-process-for-nlp-tasks-f731a025fcc3) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Futil\u002Fnlp-util-symspell.ipynb) |\r\n| | Machine Translation | Statistical Machine Translation | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fcorrecting-text-input-by-machine-translation-and-classification-fa9d82087de1) |\r\n| | Machine Translation | Attention | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Ffix-your-text-thought-attention-before-nlp-tasks-7dc074b9744f) |\r\n| String Matching | Fuzzywuzzy | | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-fuzzy-matching-improve-your-nlp-model-bc617385ad6b) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fpreprocessing\u002Fnlp-preprocessing-string_matching-fuzzywuzzy.ipynb) |\r\n\r\n## Text Representation\r\n| Section | Sub-Section | Research Lab | Story | Source |\r\n| --- | --- | --- | --- | --- |\r\n| Traditional Method | Bag-of-words (BoW) |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002F3-basic-approaches-in-bag-of-words-which-are-better-than-word-embeddings-c2cbc7398016) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-bag_of_words.ipynb) |  |\r\n|  | Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002F2-latent-methods-for-dimension-reduction-and-topic-modeling-20ff6d7d547) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-lsa_lda.ipynb) |  |\r\n| Character Level | Character Embedding | NYU | 
[Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fbesides-word-embedding-why-you-need-to-know-character-embedding-6096a34a3b10) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-character_embedding.ipynb) | [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1502.01710v5.pdf) |\r\n| Word Level | Negative Sampling and Hierarchical Softmax |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-negative-sampling-work-on-word2vec-7bf8d545b116) |  |\r\n|  | Word2Vec, GloVe, fastText |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002F3-silver-bullets-of-word-embedding-in-nlp-10fa8f50cc5a) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-word_embedding.ipynb) |  |\r\n|  | Contextualized Word Vectors (CoVe) | Salesforce | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Freplacing-your-word-embeddings-by-contextualized-word-vectors-9508877ad65d) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-embeddings-word-cove.ipynb) | [Paper](http:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F7209-learned-in-translation-contextualized-word-vectors.pdf) [Code](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002Fcove) |\r\n|  | Misspelling Oblivious (word) Embeddings | Facebook | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fnew-model-for-word-embeddings-which-are-resilient-to-misspellings-moe-9ecfd3ab473e) | [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.09755.pdf) |\r\n|  | Embeddings from Language Models (ELMo) | AI2 | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Felmo-helps-to-further-improve-your-word-embeddings-c6ed2c9df95f) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-embeddings-sentence-elmo.ipynb) | [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.05365.pdf) 
[Code](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fallennlp\u002F) |\r\n|  | Contextual String Embeddings | Zalando Research | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fcontextual-embeddings-for-nlp-sequence-labeling-9a92ba5a6cf0) | [Paper](http:\u002F\u002Faclweb.org\u002Fanthology\u002FC18-1139) [Code](https:\u002F\u002Fgithub.com\u002Fzalandoresearch\u002Fflair)| \r\n| Sentence Level | Skip-thoughts |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Ftransforming-text-to-sentence-embeddings-layer-via-some-thoughts-b77bed60822c) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-skip_thoughts.ipynb) | [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1506.06726) [Code](https:\u002F\u002Fgithub.com\u002Fryankiros\u002Fskip-thoughts) |\r\n|  | InferSent |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Flearning-sentence-embeddings-by-natural-language-inference-a50b4661a0b8) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-embeddings-sentence-infersent.ipynb) | [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1705.02364) [Code](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FInferSent) |\r\n|  | Quick-Thoughts | Google | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fbuilding-sentence-embeddings-via-quick-thoughts-945484cae273) | [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.02893.pdf) [Code](https:\u002F\u002Fgithub.com\u002Flajanugen\u002FS2V) |\r\n|  | General Purpose Sentence (GenSen) |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Flearning-generic-sentence-representation-by-various-nlp-tasks-df39ce4e81d7) | [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1804.00079.pdf) [Code](https:\u002F\u002Fgithub.com\u002FMaluuba\u002Fgensen) |\r\n|  | Bidirectional Encoder Representations from Transformers (BERT) | Google | 
[Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-bert-leverage-attention-mechanism-and-transformer-to-learn-word-contextual-relations-5bbee1b6dbdb) | [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.04805) [Code](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fbert)| \r\n|  | Generative Pre-Training (GPT) | OpenAI | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fcombining-supervised-learning-and-unsupervised-learning-to-improve-word-vectors-d4dea84ec36b) | [Paper(2018)](https:\u002F\u002Fs3-us-west-2.amazonaws.com\u002Fopenai-assets\u002Fresearch-covers\u002Flanguage-unsupervised\u002Flanguage_understanding_paper.pdf) [Code](https:\u002F\u002Fgithub.com\u002Fopenai\u002Ffinetune-transformer-lm)| \r\n|  | Self-Governing Neural Networks (SGNN) | Google | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fembeddings-free-deep-learning-nlp-model-ce067c7a7c93) | [Paper](https:\u002F\u002Faclweb.org\u002Fanthology\u002FD18-1105) | \r\n|  | Multi-Task Deep Neural Networks (MT-DNN) | Microsoft | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fwhen-multi-task-learning-meet-with-bert-d1c49cc40a0c) | [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1901.11504.pdf) | \r\n|  | Generative Pre-Training-2 (GPT-2) | OpenAI | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Ftoo-powerful-nlp-model-generative-pre-training-2-4cc6afb6655) | [Paper(2019)](https:\u002F\u002Fd4mucfpksywv.cloudfront.net\u002Fbetter-language-models\u002Flanguage_models_are_unsupervised_multitask_learners.pdf) [Code](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgpt-2)| \r\n|  | Universal Language Model Fine-tuning (ULMFiT) | fast.ai | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fmulti-task-learning-in-language-model-for-text-classification-c3acc1fedd89) | [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1801.06146.pdf) [Code](https:\u002F\u002Fgithub.com\u002Ffastai\u002Ffastai)| \r\n|  | BERT in Science Domain |  | 
[Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-to-apply-bert-in-scientific-domain-2d9db0480bd9) | [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1903.10676.pdf) [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1901.08746.pdf)| \r\n|  | BERT in Clinical Domain | NYU\u002FPU | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-do-they-apply-bert-in-the-clinical-domain-49113a51be50) | [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.03323.pdf) [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.05342.pdf)| \r\n|  | RoBERTa | UW\u002FFacebook | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fa-robustly-optimized-bert-pretraining-approach-f6b6e537e6a6) | [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.03323.pdf) [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.11692.pdf)| \r\n|  | Unified Language Model for NLP and NLU (UNILM) | Microsoft | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Funified-language-model-pre-training-for-natural-language-understanding-and-generation-f87dc226aa2) | [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.03197.pdf)| \r\n|  | Cross-lingual Language Model (XLMs) | Facebook | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fcross-lingual-language-model-56a65dba9358) | [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1901.07291.pdf)| \r\n|  | Transformer-XL | CMU\u002FGoogle | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Faddress-limitation-of-rnn-in-nlp-problems-by-using-transformer-xl-866d7ce1c8f4) | [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1901.02860.pdf)| \r\n|  | XLNet | CMU\u002FGoogle | [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fwhy-does-xlnet-outperform-bert-da98a8503d5b) | [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.08237.pdf)| \r\n|  | CTRL | Salesforce | 
[Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fa-controllable-framework-for-text-generation-8be9e1f2c5db) | [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.05858.pdf)|\r\n|  | ALBERT | Google\u002FToyota | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fa-lite-bert-for-reducing-inference-time-bed8d990daac) | [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.11942.pdf)|\r\n|  | T5 | Google | [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Ftext-to-text-transfer-transformer-e35dc28bae14) | [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.10683.pdf)|\r\n|  | MultiFiT |   | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fmulti-lingual-language-model-fine-tuning-81922a80438f) | [Paper(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.04761.pdf) |\r\n|  | XTREME |   | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fnew-multilingual-model-xtreme-276bbaa26d79) | [Paper(2020)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2003.11080.pdf) |\r\n|  | REALM |   | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Frealm-retrieval-augmented-language-model-pre-training-534feae7ab98) | [Paper(2020)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2002.08909.pdf) |\r\n| Document Level | lda2vec |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fcombing-lda-and-word-embeddings-for-topic-modeling-fe4a1315a5b4) | [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1605.02019.pdf) |\r\n|  | doc2vec | Google | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Funderstand-how-to-transfer-your-paragraph-to-vector-by-doc2vec-1e225ccf102) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fembeddings\u002Fnlp-embeddings-document-doc2vec.ipynb) | [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1405.4053.pdf) |\r\n\r\n## NLP Problem \r\n| Section | 
Sub-Section | Description | Research Lab | Story | Paper & Code |\r\n| --- | --- | --- | --- | --- | --- |\r\n| Named Entity Recognition (NER) | Pattern-based Recognition | | | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fpattern-based-recognition-did-help-in-nlp-5c54b4e7a962)  |  |\r\n| | Lexicon-based Recognition | | | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fstep-out-from-regular-expression-for-feature-engineering-134e594f542c) |  |\r\n| | spaCy Pre-trained NER | | | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnamed-entity-recognition-3fad3f53c91e) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-named_entity_recognition.ipynb) |  |\r\n| Optical Character Recognition (OCR) | Printed Text | Google Cloud Vision API | Google | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fsecret-of-google-web-based-ocr-service-fe30eecedd01) | [Paper](https:\u002F\u002Fdas2018.cvl.tuwien.ac.at\u002Fmedia\u002Ffiler_public\u002F85\u002Ffd\u002F85fd4698-040f-45f4-8fcc-56d66533b82d\u002Fdas2018_short_papers.pdf) |\r\n| | Handwriting | LSTM | Google | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Flstm-based-handwriting-recognition-by-google-eb99663ca6de) | [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1902.10525.pdf) | \r\n| Text Summarization | Extractive Approach | | | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Ftext-summarization-extractive-approach-567fe4b85c23) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-text_summarization_extractive.ipynb) | |\r\n| | Abstractive Approach |  |  | [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fsummarize-document-by-combing-extractive-and-abstractive-steps-40295310526) | \r\n| Emotion Recognition | Audio, Text, Visual | 3 Multimodals for Emotion Recognition |  | 
[Medium](https:\u002F\u002Fbecominghuman.ai\u002Fmultimodal-for-emotion-recognition-21df267fddc4) |\r\n\r\n## Acoustic Problem\r\n| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |\r\n| --- | --- | --- | --- | --- | --- |\r\n| Feature Representation | Unsupervised Learning| Introduction to Audio Feature Learning | |  [Medium](https:\u002F\u002Fmedium.com\u002Fhackernoon\u002Fhow-can-you-apply-unsupervised-learning-on-audio-data-be95153c5860) | [Paper 1](https:\u002F\u002Fai.stanford.edu\u002F~ang\u002Fpapers\u002Fnips09-AudioConvolutionalDBN.pdf) [Paper 2](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1607.03681.pdf) [Paper 3](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.03835.pdf)\r\n| Feature Representation | Unsupervised Learning| Speech2Vec and Sentence Level Embeddings | |  [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Ftwo-ways-to-learn-audio-embeddings-9dfcaab10ba6) | [Paper 1](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.08976.pdf) [Paper 2](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1902.07817.pdf)\r\n| Feature Representation | Unsupervised Learning| Wav2vec | |  [Medium](https:\u002F\u002Fbecominghuman.ai\u002Funsupervised-pre-training-for-speech-recognition-wav2vec-aba643824324) | [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.05862.pdf)\r\n| Speech-to-text | | Introduction to Speech-to-text | |  [Medium](https:\u002F\u002Fbecominghuman.ai\u002Fhow-does-your-assistant-device-work-based-on-text-to-speech-technology-5f31e56eae7e) |\r\n\r\n## Text Distance Measurement\r\n| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |\r\n| --- | --- | --- | --- | --- | --- |\r\n| Euclidean Distance, Cosine Similarity and Jaccard Similarity |  |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002F3-basic-distance-measurement-in-text-mining-5852becff1d7) 
[Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-3_basic_distance_measurement_in_text_mining.ipynb) |  |\r\n| Edit Distance | Levenshtein Distance |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fmeasure-distance-between-2-words-by-simple-calculation-a97cf4993305) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-distance-edit_distance.ipynb) |  |\r\n| Word Moving Distance (WMD) |  |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fword-distance-between-word-embeddings-cc3e9cf1d632) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-word_mover_distance.ipynb) |\r\n| Supervised Word Moving Distance (S-WMD) |  |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fword-distance-between-word-embeddings-with-weight-bf02869c50e1)|\r\n| Manhattan LSTM |  |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Ftext-matching-with-deep-learning-e6aa05333399) | [Paper](http:\u002F\u002Fwww.mit.edu\u002F~jonasm\u002Finfo\u002FMuellerThyagarajan_AAAI16.pdf) |\r\n\r\n## Model Interpretation\r\n| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |\r\n| --- | --- | --- | --- | --- | --- |\r\n| ELI5, LIME and Skater |  |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002F3-ways-to-interpretate-your-nlp-model-to-management-and-customer-5428bc07ce15) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-model_interpretation.ipynb) |\r\n| SHapley Additive exPlanations (SHAP) |  |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Finterpreting-your-deep-learning-model-by-shap-e69be2b47893) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-model_interpretation_shap.ipynb) |\r\n| Anchors |  |  |  | 
[Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fanchor-your-model-interpretation-by-anchors-aa4ed7104032) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-model_interpretation_anchor.ipynb) |\r\n\r\n## Graph\r\n| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |\r\n| --- | --- | --- | --- | --- | --- |\r\n| Embeddings | | TransE, RESCAL, DistMult, ComplEx, PyTorch BigGraph | |  [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fa-gentle-introduction-to-graph-embeddings-c7b3d1db0fa8) | [RESCAL(2011)](https:\u002F\u002Fpdfs.semanticscholar.org\u002F68a3\u002F3a3afac65eb6e0fb3726c1f9c8b727f32a42.pdf?_ga=2.21151099.1397092755.1575835510-317581445.1533093975) [TransE(2013)](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F5071-translating-embeddings-for-modeling-multi-relational-data.pdf) [DistMult(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1412.6575v4.pdf) [ComplEx(2016)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1606.06357.pdf) [PyTorch BigGraph(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1903.12287.pdf)\r\n| Embeddings | | DeepWalk, node2vec, LINE, GraphSAGE | |  [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Frandom-walk-in-node-embeddings-deepwalk-node2vec-line-and-graphsage-ca23df60e493) | [DeepWalk(2014)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1403.6652.pdf) [node2vec(2015)](https:\u002F\u002Fcs.stanford.edu\u002F~jure\u002Fpubs\u002Fnode2vec-kdd16.pdf) [LINE(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1503.03578.pdf) [GraphSAGE(2018)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1706.02216.pdf)\r\n| Embeddings | | WLG, GCN, GAT, GIN | |  [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002F4-graph-neural-networks-you-need-to-know-wlg-gcn-gat-gin-1bf10d29d836) | 
[WLG(2011)](http:\u002F\u002Fwww.jmlr.org\u002Fpapers\u002Fvolume12\u002Fshervashidze11a\u002Fshervashidze11a.pdf) [GCN(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1609.02907.pdf) [GAT(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.10903.pdf) [GIN(2018)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.00826.pdf)\r\n| Embeddings | | [PinSAGE(2018)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1806.01973.pdf) | Pinterest | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fwhen-graphsage-meets-pinterest-5e82c9a88120)\r\n| Embeddings | | [HolE(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1510.04935.pdf), [SimplE(2018)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.04868.pdf) | | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fknowledge-graph-embeddings-dc9251bffa80)\r\n| Embeddings | | [ContE(2017)](http:\u002F\u002Frepository.ittelkom-pwt.ac.id\u002F4358\u002F1\u002FLearning%20Contextual%20Embeddings%20for%20Knowledge%20Graph%20Completion.pdf), [ETE(2017)](https:\u002F\u002Fpersagen.com\u002Ffiles\u002Fmisc\u002FMoon2017Learning.pdf) | | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Ffrom-conte-to-entity-type-embeddings-in-natural-language-processing-19e53db90dd5)\r\n\r\n## Meta-Learning\r\n| Section | Sub-Section | Description | Story |\r\n| --- | --- | --- | --- |\r\n| Introduction |  | [Matching Nets(2016)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1606.04080.pdf) [MANN(2016)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1605.06065.pdf) [LSTM-based meta-learner(2017)](https:\u002F\u002Fopenreview.net\u002Fpdf?id=rJY0-Kcll) [Prototypical Networks(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.05175.pdf) [ARC(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.00767.pdf) [MAML(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.03400.pdf) [MetaNet(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.00837.pdf) | 
[Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fa-gentle-introduction-to-meta-learning-8e36f3d93f61)  |\r\n| NLP | Dialog Generation | [DAML(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.03520.pdf), [PAML(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.10033.pdf), [NTMS(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.10487.pdf) | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fmeta-learning-in-dialog-generation-41367e397086)\r\n| | Classification | [Intent Embeddings(2016)](https:\u002F\u002Fwww.csie.ntu.edu.tw\u002F~yvchen\u002Fdoc\u002FICASSP16_ZeroShot.pdf) [LEOPARD(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1911.03863.pdf) | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fmeta-learning-in-nlp-classification-db78fbcdf15c)\r\n| CV | Unsupervised Learning | [CACTUs(2018)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.02334.pdf) | [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Funsupervised-learning-in-meta-learning-f71c549e2ae2)\r\n| General | | [Siamese Network(1994)](http:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F769-signature-verification-using-a-siamese-time-delay-neural-network.pdf), [Triplet Network(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1412.6622.pdf) | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fhow-do-twins-and-triplet-neural-network-work-cfed66d9b829)\r\n| | [MAML+(2018)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.09502.pdf) | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Ffrom-maml-to-maml-20de07203d59)\r\n\r\n## Image\r\n| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |\r\n| --- | --- | --- | --- | --- | --- |\r\n| Object Detection |  | R-CNN | |  [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fhow-r-cnn-works-on-object-detection-443679b0187c) | 
[Paper(2013)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1311.2524.pdf)\r\n| Object Detection |  | Fast R-CNN | |  [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fhow-fast-r-cnn-works-on-object-detection-546e4812eaa1) | [Paper(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1504.08083.pdf)\r\n| Object Detection |  | Faster R-CNN | |  [Medium](https:\u002F\u002Fbecominghuman.ai\u002Fhow-faster-r-cnn-works-on-object-detection-3d92432ce321) | [Paper(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1506.01497.pdf)\r\n| Object Detection |  | VGGNet | |  [Medium](https:\u002F\u002Fbecominghuman.ai\u002Fwhat-is-the-vgg-neural-network-a590caa72643) | [Paper(2014)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1409.1556.pdf)\r\n| Instance Segmentation | | Mask R-CNN | FAIR | [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fmask-r-cnn-for-instance-segmentation-7f0708e3e25b) | [Paper(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.06870.pdf) | \r\n| Image Classification |  | [ResNet(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1512.03385.pdf) |Microsoft |  [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fhow-does-resnet-improve-performance-caaa436f885b)  |\r\n| Image Classification |  | [ResNeXt(2016)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1611.05431.pdf) | |  [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fenhancing-resnet-to-resnext-for-image-classification-3449f62a774c)  |\r\n\r\n## Evaluation\r\n| Section | Sub-Section | Description | Story |\r\n| --- | --- | --- | --- |\r\n| Introduction | | | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fevaluation-metrics-are-what-you-need-to-define-in-the-earlier-stage-99dbfae51472)\r\n| Classification | | Confusion Matrix, ROC, AUC | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fevaluation-metrics-for-classification-problems-e7442092bc5)\r\n| Regression | | MAE, MSE, RMSE, MAPE, WMAPE | 
[Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fevaluation-metrics-for-regression-problems-fff2ac8e3f43)\r\n| Textual | | Perplexity, BLEU, GER, WER, GLUE | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fevaluation-metrics-for-textual-problems-6e881feef5ad)\r\n\r\n## Source Code \r\n| Section | Sub-Section | Description | Link |\r\n| --- | --- | --- | --- |\r\n| Spellcheck |  |  | [Github](https:\u002F\u002Fgithub.com\u002Fnorvig\u002Fpytudes) |\r\n| InferSent |  |  | [Github](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FInferSent) |\r\n","# NLP - 教程\n用于展示自然语言处理如何解决实际问题的仓库。包含源代码、数据集以及 NLP 领域的最新进展。\n\n## 数据增强\n*   [NLP 中的数据增强](https:\u002F\u002Ftowardsdatascience.com\u002Fdata-augmentation-in-nlp-2801a34dfc28)\n*   [文本数据增强库](https:\u002F\u002Ftowardsdatascience.com\u002Fdata-augmentation-library-for-text-9661736b13ff)\n*   [你的 NLP 模型能否抵御对抗攻击？](https:\u002F\u002Fmedium.com\u002Fhackernoon\u002Fdoes-your-nlp-model-able-to-prevent-adversarial-attack-45b5ab75129c)\n*   [数据噪声化如何帮助提升 NLP 模型性能？](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fhow-does-data-noising-help-to-improve-your-nlp-model-480619f9fb10)\n*   [语音识别中的数据增强库](https:\u002F\u002Ftowardsdatascience.com\u002Fdata-augmentation-for-speech-recognition-e7c607482e78)\n*   [音频数据增强库](https:\u002F\u002Ftowardsdatascience.com\u002Fdata-augmentation-for-audio-76912b01fdf6)\n*   [无监督数据增强](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Funsupervised-data-augmentation-6760456db143)\n*   [文本深度神经网络中的对抗攻击](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fadversarial-attacks-in-textual-deep-neural-networks-245dc90029df)\n*   [通过 nlpaug 进行文本增强中的回译](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fback-translation-in-text-augmentation-by-nlpaug-d65518dd092f)\n\n## 通用\n*   
[构建机器学习或深度神经网络模型的技巧](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Ftricks-of-building-an-ml-or-dnn-model-b2de54cf440a)\n\n## 文本预处理\n| 版块 | 子版块 | 描述 | 文章链接 |\n| --- | --- | --- | --- |\n| 分词 | 子词分词 |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-subword-helps-on-your-nlp-model-83dd1b836f46) |\n| 分词 | 单词分词 |  | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnlp-pipeline-word-tokenization-part-1-4b2b547e6a3) [GitHub](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-word_tokenization.ipynb) |\n| 分词 | 句子分词 |  | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnlp-pipeline-sentence-tokenization-part-6-86ed55b185e6) [GitHub](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-sentence_tokenization.ipynb) |\n| 词性标注 | | | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnlp-pipeline-part-of-speech-part-2-b683c90e327d) [GitHub](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-part_of_speech.ipynb) |\n| 词形还原 | | | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnlp-pipeline-lemmatization-part-3-4bfd7304957) [GitHub](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp_lemmatization.ipynb) |\n| 词干提取 | | | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnlp-pipeline-stemming-part-4-b60a319fd52) [GitHub](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-stemming.ipynb) |\n| 停用词过滤 | | | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnlp-pipeline-stop-words-part-5-d6770df8a936) [GitHub](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-stop_words.ipynb) |\n| 短语识别 | |  |  |\n| 拼写检查 | 基于词典 | 彼得·诺维格算法 | 
[Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fcorrecting-your-spelling-error-with-4-operations-50bcfd519bb8) [GitHub](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Futil\u002Fnlp-util-spell_corrector.ipynb) |\n| | 基于词典 | Symspell | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fessential-text-correction-process-for-nlp-tasks-f731a025fcc3) [GitHub](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Futil\u002Fnlp-util-symspell.ipynb) |\n| | 机器翻译 | 统计机器翻译 | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fcorrecting-text-input-by-machine-translation-and-classification-fa9d82087de1) |\n| | 机器翻译 | 注意力机制 | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Ffix-your-text-thought-attention-before-nlp-tasks-7dc074b9744f) |\n| 字符串匹配 | Fuzzywuzzy | | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-fuzzy-matching-improve-your-nlp-model-bc617385ad6b) [GitHub](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fpreprocessing\u002Fnlp-preprocessing-string_matching-fuzzywuzzy.ipynb) |\n\n## 文本表示\n| 部分 | 子部分 | 研究实验室 | 故事 | 来源 |\n| --- | --- | --- | --- | --- |\n| 传统方法 | 词袋模型 (BoW) |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002F3-basic-approaches-in-bag-of-words-which-are-better-than-word-embeddings-c2cbc7398016) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-bag_of_words.ipynb) |  |\n|  | 潜在语义分析 (LSA) 和潜在狄利克雷分配 (LDA) |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002F2-latent-methods-for-dimension-reduction-and-topic-modeling-20ff6d7d547) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-lsa_lda.ipynb) |  |\n| 字符级 | 字符嵌入 | NYU | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fbesides-word-embedding-why-you-need-to-know-character-embedding-6096a34a3b10) 
[Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-character_embedding.ipynb) | [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1502.01710v5.pdf) |\n| 词级 | 负采样和层次 softmax |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-negative-sampling-work-on-word2vec-7bf8d545b116) |  |\n|  | Word2Vec、GloVe、fastText |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002F3-silver-bullets-of-word-embedding-in-nlp-10fa8f50cc5a) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-word_embedding.ipynb) |  |\n|  | 上下文感知词向量 (CoVe) | Salesforce | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Freplacing-your-word-embeddings-by-contextualized-word-vectors-9508877ad65d) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-embeddings-word-cove.ipynb) | [论文](http:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F7209-learned-in-translation-contextualized-word-vectors.pdf) [代码](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002Fcove) |\n|  | 对拼写错误不敏感的词嵌入 | Facebook | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fnew-model-for-word-embeddings-which-are-resilient-to-misspellings-moe-9ecfd3ab473e) | [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.09755.pdf) |\n|  | 语言模型生成的嵌入 (ELMo) | AI2 | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Felmo-helps-to-further-improve-your-word-embeddings-c6ed2c9df95f) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-embeddings-sentence-elmo.ipynb) | [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.05365.pdf) [代码](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fallennlp\u002F) |\n|  | 上下文字符串嵌入 | Zalando Research | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fcontextual-embeddings-for-nlp-sequence-labeling-9a92ba5a6cf0) | 
[论文](http:\u002F\u002Faclweb.org\u002Fanthology\u002FC18-1139) [代码](https:\u002F\u002Fgithub.com\u002Fzalandoresearch\u002Fflair) |\n| 句子级 | Skip-thoughts |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Ftransforming-text-to-sentence-embeddings-layer-via-some-thoughts-b77bed60822c) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-skip_thoughts.ipynb) | [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1506.06726) [代码](https:\u002F\u002Fgithub.com\u002Fryankiros\u002Fskip-thoughts) |\n|  | InferSent |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Flearning-sentence-embeddings-by-natural-language-inference-a50b4661a0b8) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-embeddings-sentence-infersent.ipynb) | [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1705.02364) [代码](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FInferSent) |\n|  | Quick-Thoughts | Google | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fbuilding-sentence-embeddings-via-quick-thoughts-945484cae273) | [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.02893.pdf) [代码](https:\u002F\u002Fgithub.com\u002Flajanugen\u002FS2V) |\n|  | 通用句子嵌入 (GenSen) |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Flearning-generic-sentence-representation-by-various-nlp-tasks-df39ce4e81d7) | [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1804.00079.pdf) [代码](https:\u002F\u002Fgithub.com\u002FMaluuba\u002Fgensen) |\n|  | 基于 Transformer 的双向编码器表示 (BERT) | Google | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-bert-leverage-attention-mechanism-and-transformer-to-learn-word-contextual-relations-5bbee1b6dbdb) | [论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.04805) [代码](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fbert) |\n|  | 生成式预训练 (GPT) | OpenAI | 
[Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fcombining-supervised-learning-and-unsupervised-learning-to-improve-word-vectors-d4dea84ec36b) | [论文(2018)](https:\u002F\u002Fs3-us-west-2.amazonaws.com\u002Fopenai-assets\u002Fresearch-covers\u002Flanguage-unsupervised\u002Flanguage_understanding_paper.pdf) [代码](https:\u002F\u002Fgithub.com\u002Fopenai\u002Ffinetune-transformer-lm) |\n|  | 自主神经网络 (SGNN) | Google | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fembeddings-free-deep-learning-nlp-model-ce067c7a7c93) | [论文](https:\u002F\u002Faclweb.org\u002Fanthology\u002FD18-1105) |\n|  | 多任务深度神经网络 (MT-DNN) | Microsoft | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fwhen-multi-task-learning-meet-with-bert-d1c49cc40a0c) | [论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1901.11504.pdf) |\n|  | 生成式预训练-2 (GPT-2) | OpenAI | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Ftoo-powerful-nlp-model-generative-pre-training-2-4cc6afb6655) | [论文(2019)](https:\u002F\u002Fd4mucfpksywv.cloudfront.net\u002Fbetter-language-models\u002Flanguage_models_are_unsupervised_multitask_learners.pdf) [代码](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgpt-2) |\n|  | 通用语言模型微调 (ULMFiT) | fast.ai | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fmulti-task-learning-in-language-model-for-text-classification-c3acc1fedd89) | [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1801.06146.pdf) [代码](https:\u002F\u002Fgithub.com\u002Ffastai\u002Ffastai) |\n|  | BERT 在科学领域 |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-to-apply-bert-in-scientific-domain-2d9db0480bd9) | [论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1903.10676.pdf) [论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1901.08746.pdf) |\n|  | BERT 在临床领域 | NYU\u002FPU | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-do-they-apply-bert-in-the-clinical-domain-49113a51be50) | [论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.03323.pdf) 
[论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.05342.pdf) |\n|  | RoBERTa | UW\u002FFacebook | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fa-robustly-optimized-bert-pretraining-approach-f6b6e537e6a6) | [论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.11692.pdf) |\n|  | 面向 NLU 和 NLG 的统一语言模型 (UNILM) | Microsoft | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Funified-language-model-pre-training-for-natural-language-understanding-and-generation-f87dc226aa2) | [论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.03197.pdf) |\n|  | 跨语言语言模型 (XLMs) | Facebook | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fcross-lingual-language-model-56a65dba9358) | [论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1901.07291.pdf) |\n|  | Transformer-XL | CMU\u002FGoogle | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Faddress-limitation-of-rnn-in-nlp-problems-by-using-transformer-xl-866d7ce1c8f4) | [论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1901.02860.pdf) |\n|  | XLNet | CMU\u002FGoogle | [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fwhy-does-xlnet-outperform-bert-da98a8503d5b) | [论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.08237.pdf) |\n|  | CTRL | Salesforce | [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fa-controllable-framework-for-text-generation-8be9e1f2c5db) | [论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.05858.pdf) |\n|  | ALBERT | Google\u002FToyota | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fa-lite-bert-for-reducing-inference-time-bed8d990daac) | [论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.11942.pdf) |\n|  | T5 | Google | [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Ftext-to-text-transfer-transformer-e35dc28bae14) | 
[论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.10683.pdf) |\n|  | MultiFiT |   | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fmulti-lingual-language-model-fine-tuning-81922a80438f) | [论文(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.04761.pdf) |\n|  | XTREME |   | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fnew-multilingual-model-xtreme-276bbaa26d79) | [论文(2020)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2003.11080.pdf) |\n|  | REALM |   | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Frealm-retrieval-augmented-language-model-pre-training-534feae7ab98) | [论文(2020)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2002.08909.pdf) |\n| 文档级 | lda2vec |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fcombing-lda-and-word-embeddings-for-topic-modeling-fe4a1315a5b4) | [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1605.02019.pdf) |\n|  | doc2vec | Google | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Funderstand-how-to-transfer-your-paragraph-to-vector-by-doc2vec-1e225ccf102) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fembeddings\u002Fnlp-embeddings-document-doc2vec.ipynb) | [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1405.4053.pdf) |\n\n## 自然语言处理问题\n| 版块 | 子版块 | 描述 | 研究实验室 | 故事 | 论文与代码 |\n| --- | --- | --- | --- | --- | --- |\n| 命名实体识别 (NER) | 基于模式的识别 | | | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fpattern-based-recognition-did-help-in-nlp-5c54b4e7a962)  |  |\n| | 基于词典的识别 | | | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fstep-out-from-regular-expression-for-feature-engineering-134e594f542c) |  |\n| | spaCy 预训练 NER | | | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Fnamed-entity-recognition-3fad3f53c91e) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-named_entity_recognition.ipynb) 
|  |\n| 光学字符识别 (OCR) | 印刷文本 | Google Cloud Vision API | Google | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fsecret-of-google-web-based-ocr-service-fe30eecedd01) | [论文](https:\u002F\u002Fdas2018.cvl.tuwien.ac.at\u002Fmedia\u002Ffiler_public\u002F85\u002Ffd\u002F85fd4698-040f-45f4-8fcc-56d66533b82d\u002Fdas2018_short_papers.pdf) |\n| | 手写文字 | LSTM | Google | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Flstm-based-handwriting-recognition-by-google-eb99663ca6de) | [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1902.10525.pdf) | \n| 文本摘要 | 抽取式方法 | | | [Medium](https:\u002F\u002Fmedium.com\u002F@makcedward\u002Ftext-summarization-extractive-approach-567fe4b85c23) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-text_summarization_extractive.ipynb) | |\n| | 摘要式方法 |  |  | [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fsummarize-document-by-combing-extractive-and-abstractive-steps-40295310526) | |\n| 情感识别 | 音频、文本、视觉 | 用于情感识别的三种多模态方法 |  | [Medium](https:\u002F\u002Fbecominghuman.ai\u002Fmultimodal-for-emotion-recognition-21df267fddc4) | |\n\n## 声学问题\n| 版块 | 子版块 | 描述 | 研究实验室 | 故事 | 论文与代码 |\n| --- | --- | --- | --- | --- | --- |\n| 特征表示 | 无监督学习 | 音频特征学习导论 | |  [Medium](https:\u002F\u002Fmedium.com\u002Fhackernoon\u002Fhow-can-you-apply-unsupervised-learning-on-audio-data-be95153c5860) | [论文 1](https:\u002F\u002Fai.stanford.edu\u002F~ang\u002Fpapers\u002Fnips09-AudioConvolutionalDBN.pdf) [论文 2](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1607.03681.pdf) [论文 3](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.03835.pdf) |\n| 特征表示 | 无监督学习 | Speech2Vec 和句子级嵌入 | |  [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Ftwo-ways-to-learn-audio-embeddings-9dfcaab10ba6) | [论文 1](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.08976.pdf) [论文 2](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1902.07817.pdf) |\n| 特征表示 | 无监督学习 | Wav2vec | |  
[Medium](https:\u002F\u002Fbecominghuman.ai\u002Funsupervised-pre-training-for-speech-recognition-wav2vec-aba643824324) | [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.05862.pdf) |\n| 语音转文本 | | 语音转文本导论 | |  [Medium](https:\u002F\u002Fbecominghuman.ai\u002Fhow-does-your-assistant-device-work-based-on-text-to-speech-technology-5f31e56eae7e) | |\n\n## 文本距离度量\n| 版块 | 子版块 | 描述 | 研究实验室 | 故事 | 论文与代码 |\n| --- | --- | --- | --- | --- | --- |\n| 欧氏距离、余弦相似度和 Jaccard 相似度 |  |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002F3-basic-distance-measurement-in-text-mining-5852becff1d7) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-3_basic_distance_measurement_in_text_mining.ipynb) |  |\n| 编辑距离 | Levenshtein 距离 |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fmeasure-distance-between-2-words-by-simple-calculation-a97cf4993305) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-distance-edit_distance.ipynb) |  |\n| 词移动距离 (WMD) |  |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fword-distance-between-word-embeddings-cc3e9cf1d632) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-word_mover_distance.ipynb) |\n| 监督型词移动距离 (S-WMD) |  |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fword-distance-between-word-embeddings-with-weight-bf02869c50e1)|\n| 曼哈顿 LSTM |  |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Ftext-matching-with-deep-learning-e6aa05333399) | [论文](http:\u002F\u002Fwww.mit.edu\u002F~jonasm\u002Finfo\u002FMuellerThyagarajan_AAAI16.pdf) |\n\n## 模型解释\n| 版块 | 子版块 | 描述 | 研究实验室 | 故事 | 论文与代码 |\n| --- | --- | --- | --- | --- | --- |\n| ELI5、LIME 和 Skater |  |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002F3-ways-to-interpretate-your-nlp-model-to-management-and-customer-5428bc07ce15) 
[Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-model_interpretation.ipynb) |\n| SHapley Additive exPlanations (SHAP) |  |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Finterpreting-your-deep-learning-model-by-shap-e69be2b47893) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-model_interpretation_shap.ipynb) |\n| Anchors |  |  |  | [Medium](https:\u002F\u002Ftowardsdatascience.com\u002Fanchor-your-model-interpretation-by-anchors-aa4ed7104032) [Github](https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fblob\u002Fmaster\u002Fsample\u002Fnlp-model_interpretation_anchor.ipynb) |\n\n## 图\n| 版块 | 子版块 | 描述 | 研究实验室 | 文章 | 论文与代码 |\n| --- | --- | --- | --- | --- | --- |\n| 嵌入 | | TransE、RESCAL、DistMult、ComplEx、PyTorch BigGraph | |  [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fa-gentle-introduction-to-graph-embeddings-c7b3d1db0fa8) | [RESCAL(2011)](https:\u002F\u002Fpdfs.semanticscholar.org\u002F68a3\u002F3a3afac65eb6e0fb3726c1f9c8b727f32a42.pdf?_ga=2.21151099.1397092755.1575835510-317581445.1533093975) [TransE(2013)](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F5071-translating-embeddings-for-modeling-multi-relational-data.pdf) [DistMult(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1412.6575v4.pdf) [ComplEx(2016)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1606.06357.pdf) [PyTorch BigGraph(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1903.12287.pdf) |\n| 嵌入 | | DeepWalk、node2vec、LINE、GraphSAGE | |  [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Frandom-walk-in-node-embeddings-deepwalk-node2vec-line-and-graphsage-ca23df60e493) | [DeepWalk(2014)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1403.6652.pdf) [node2vec(2015)](https:\u002F\u002Fcs.stanford.edu\u002F~jure\u002Fpubs\u002Fnode2vec-kdd16.pdf) 
[LINE(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1503.03578.pdf) [GraphSAGE(2018)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1706.02216.pdf) |\n| 嵌入 | | WLG、GCN、GAT、GIN | |  [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002F4-graph-neural-networks-you-need-to-know-wlg-gcn-gat-gin-1bf10d29d836) | [WLG(2011)](http:\u002F\u002Fwww.jmlr.org\u002Fpapers\u002Fvolume12\u002Fshervashidze11a\u002Fshervashidze11a.pdf) [GCN(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1609.02907.pdf) [GAT(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.10903.pdf) [GIN(2018)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.00826.pdf) |\n| 嵌入 | | [PinSAGE(2018)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1806.01973.pdf) | Pinterest | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fwhen-graphsage-meets-pinterest-5e82c9a88120) |\n| 嵌入 | | [HolE(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1510.04935.pdf)、[SimplE(2018)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.04868.pdf) | | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fknowledge-graph-embeddings-dc9251bffa80) |\n| 嵌入 | | [ContE(2017)](http:\u002F\u002Frepository.ittelkom-pwt.ac.id\u002F4358\u002F1\u002FLearning%20Contextual%20Embeddings%20for%20Knowledge%20Graph%20Completion.pdf)、[ETE(2017)](https:\u002F\u002Fpersagen.com\u002Ffiles\u002Fmisc\u002FMoon2017Learning.pdf) | | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Ffrom-conte-to-entity-type-embeddings-in-natural-language-processing-19e53db90dd5) |\n\n## 元学习\n| 版块 | 子版块 | 描述 | 文章 |\n| --- | --- | --- | --- |\n| 引言 |  | [Matching Nets(2016)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1606.04080.pdf) [MANN(2016)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1605.06065.pdf) [基于LSTM的元学习器(2017)](https:\u002F\u002Fopenreview.net\u002Fpdf?id=rJY0-Kcll) [原型网络(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.05175.pdf) 
[ARC(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.00767.pdf) [MAML(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.03400.pdf) [MetaNet(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.00837.pdf) | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fa-gentle-introduction-to-meta-learning-8e36f3d93f61)  |\n| NLP | 对话生成 | [DAML(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.03520.pdf)、[PAML(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.10033.pdf)、[NTMS(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.10487.pdf) | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fmeta-learning-in-dialog-generation-41367e397086) |\n| | 分类 | [意图嵌入(2016)](https:\u002F\u002Fwww.csie.ntu.edu.tw\u002F~yvchen\u002Fdoc\u002FICASSP16_ZeroShot.pdf) [LEOPARD(2019)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1911.03863.pdf) | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fmeta-learning-in-nlp-classification-db78fbcdf15c) |\n| CV | 无监督学习 | [CACTUs(2018)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.02334.pdf) | [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Funsupervised-learning-in-meta-learning-f71c549e2ae2) |\n| 综合 | | [Siamese网络(1994)](http:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F769-signature-verification-using-a-siamese-time-delay-neural-network.pdf)、[Triplet网络(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1412.6622.pdf) | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fhow-do-twins-and-triplet-neural-network-work-cfed66d9b829) |\n| | [MAML+(2018)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.09502.pdf) | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Ffrom-maml-to-maml-20de07203d59) |\n\n## 图像\n| 版块 | 子版块 | 描述 | 研究实验室 | 文章 | 论文与代码 |\n| --- | --- | --- | --- | --- | --- |\n| 目标检测 |  | R-CNN | |  
[Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fhow-r-cnn-works-on-object-detection-443679b0187c) | [论文(2013)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1311.2524.pdf) |\n| 目标检测 |  | Fast R-CNN | |  [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fhow-fast-r-cnn-works-on-object-detection-546e4812eaa1) | [论文(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1504.08083.pdf) |\n| 目标检测 |  | Faster R-CNN | |  [Medium](https:\u002F\u002Fbecominghuman.ai\u002Fhow-faster-r-cnn-works-on-object-detection-3d92432ce321) | [论文(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1506.01497.pdf) |\n| 目标检测 |  | VGGNet | |  [Medium](https:\u002F\u002Fbecominghuman.ai\u002Fwhat-is-the-vgg-neural-network-a590caa72643) | [论文(2014)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1409.1556.pdf) |\n| 实例分割 | | Mask R-CNN | FAIR | [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fmask-r-cnn-for-instance-segmentation-7f0708e3e25b) | [论文(2017)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.06870.pdf) | \n| 图像分类 |  | [ResNet(2015)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1512.03385.pdf) |Microsoft |  [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fhow-does-resnet-improve-performance-caaa436f885b)  |\n| 图像分类 |  | [ResNeXt(2016)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1611.05431.pdf) | |  [Medium](https:\u002F\u002Fmedium.com\u002Fdataseries\u002Fenhancing-resnet-to-resnext-for-image-classification-3449f62a774c)  |\n\n## 评估\n| 版块 | 子版块 | 描述 | 文章 |\n| --- | --- | --- | --- |\n| 引言 | | | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fevaluation-metrics-are-what-you-need-to-define-in-the-earlier-stage-99dbfae51472) |\n| 分类 | | 混淆矩阵、ROC曲线、AUC | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fevaluation-metrics-for-classification-problems-e7442092bc5) |\n| 回归 | | MAE、MSE、RMSE、MAPE、WMAPE | 
[Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fevaluation-metrics-for-regression-problems-fff2ac8e3f43) |\n| 文本 | | 困惑度、BLEU、GER、WER、GLUE | [Medium](https:\u002F\u002Fmedium.com\u002Ftowards-artificial-intelligence\u002Fevaluation-metrics-for-textual-problems-6e881feef5ad) |\n\n## 源代码\n| 版块 | 子版块 | 描述 | 链接 |\n| --- | --- | --- | --- |\n| 拼写检查 |  |  | [Github](https:\u002F\u002Fgithub.com\u002Fnorvig\u002Fpytudes) |\n| InferSent |  |  | [Github](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FInferSent) |","# NLP 开源工具快速上手指南\n\n本指南基于 `nlp` 仓库内容整理，旨在帮助开发者快速掌握自然语言处理（NLP）的核心流程，涵盖数据增强、文本预处理及文本表示等关键领域。该仓库主要提供教程、数据集链接及前沿模型（SOTA）的代码示例与论文解读。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Linux, macOS 或 Windows (推荐 Linux 以获得最佳兼容性)\n*   **Python 版本**：Python 3.6 或更高版本\n*   **前置依赖**：\n    *   `pip` (Python 包管理工具)\n    *   `git` (用于克隆代码仓库)\n    *   基础科学计算库：`numpy`, `pandas`, `scikit-learn`\n    *   深度学习框架（根据具体示例选择）：`tensorflow` 或 `pytorch`\n\n> **提示**：建议创建虚拟环境以避免依赖冲突。\n> ```bash\n> python -m venv nlp_env\n> source nlp_env\u002Fbin\u002Factivate  # Windows 用户请使用: nlp_env\\Scripts\\activate\n> ```\n\n## 安装步骤\n\n由于该仓库是一个教程集合而非单一的 Python 包，您需要克隆仓库并安装各章节示例所需的特定依赖。\n\n1.  **克隆仓库**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp.git\n    cd nlp\n    ```\n\n2.  **安装通用依赖**\n    仓库中的示例通常依赖常见的 NLP 库。您可以一次性安装基础套件：\n    ```bash\n    pip install nltk spacy gensim scikit-learn fuzzywuzzy python-Levenshtein\n    ```\n    *注：国内用户推荐使用清华源加速安装：*\n    ```bash\n    pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple nltk spacy gensim scikit-learn fuzzywuzzy python-Levenshtein\n    ```\n\n3.  **下载必要数据资源**\n    部分预处理示例（如分词、词性标注）需要下载 NLTK 或 SpaCy 的数据包：\n    ```python\n    # 在 Python 交互环境中运行\n    import nltk\n    nltk.download('punkt')\n    nltk.download('averaged_perceptron_tagger')\n    nltk.download('stopwords')\n    \n    # 如果使用 SpaCy (以英文为例)\n    # python -m spacy download en_core_web_sm\n    ```\n\n4.  
**安装特定模型依赖**\n    若需运行 BERT、ELMo 或 GPT 等进阶示例，请根据对应 Notebook 文件头部说明安装额外库，例如：\n    ```bash\n    pip install transformers torch tensorflow-hub\n    ```\n\n## 基本使用\n\n本仓库的核心价值在于通过 Jupyter Notebook 展示具体的 NLP 流水线。以下是两个最基础的使用场景示例。\n\n### 场景一：文本预处理流水线 (Tokenization & Stemming)\n\n参考 `sample\u002Fnlp-word_tokenization.ipynb` 和 `sample\u002Fnlp-stemming.ipynb`，执行基础的文本清洗：\n\n```python\nimport nltk\nfrom nltk.tokenize import word_tokenize\nfrom nltk.stem import PorterStemmer\n\n# 1. 原始文本\ntext = \"Data augmentation helps improve NLP models significantly!\"\n\n# 2. 分词 (Word Tokenization)\ntokens = word_tokenize(text)\nprint(\"Tokens:\", tokens)\n\n# 3. 词干提取 (Stemming)\nstemmer = PorterStemmer()\nstems = [stemmer.stem(token) for token in tokens]\nprint(\"Stems:\", stems)\n```\n\n### 场景二：文本表示 (Bag-of-Words)\n\n参考 `sample\u002Fnlp-bag_of_words.ipynb`，将文本转换为向量表示：\n\n```python\nfrom sklearn.feature_extraction.text import CountVectorizer\n\ndocuments = [\n    \"NLP is powerful\",\n    \"NLP uses deep learning\",\n    \"Deep learning is amazing\"\n]\n\n# 初始化 BoW 转换器\nvectorizer = CountVectorizer()\n\n# 拟合并转换数据\nX = vectorizer.fit_transform(documents)\n\n# 查看词汇表和向量矩阵\nprint(\"Vocabulary:\", vectorizer.get_feature_names_out())\nprint(\"Vector Shape:\", X.shape)\nprint(\"Vector Data:\\n\", X.toarray())\n```\n\n### 进阶探索\n\n对于更复杂的任务（如数据增强、对抗攻击防御或使用 BERT），请直接打开仓库中 `sample\u002F` 目录下对应的 `.ipynb` 文件。每个 Notebook 都包含了完整的代码实现、数据集加载逻辑以及指向相关论文和详细教程（Medium 文章）的链接。\n\n*   **数据增强**：查看 `Data Augmentation` 章节相关的 Notebook。\n*   **拼写检查**：运行 `sample\u002Futil\u002Fnlp-util-spell_corrector.ipynb`。\n*   **预训练模型**：参考 `Text Representation` 章节中关于 BERT、RoBERTa 或 T5 的示例代码。","某电商公司的数据科学团队正致力于构建一个智能客服系统，需要从海量且杂乱的用户评论中提取关键诉求并自动分类。\n\n### 没有 nlp 时\n- **数据噪声干扰严重**：用户评论中充斥大量拼写错误、口语化表达及无关符号，导致原始文本无法直接用于模型训练，人工清洗耗时极长。\n- **样本多样性不足**：针对“物流延迟”等特定负面场景的训练样本稀缺，模型因缺乏足够数据支撑而频繁出现漏判或误判。\n- **特征提取粒度粗糙**：仅能基于简单的关键词匹配识别意图，无法理解\"not good\"与\"good\"的本质区别，也难以处理未登录的新词或变体。\n- 
**抗攻击能力薄弱**：面对恶意用户故意插入干扰字符或同义词替换的对抗性输入，系统极易失效，输出荒谬的分类结果。\n\n### 使用 nlp 后\n- **自动化预处理流水线**：利用 nlp 提供的拼写校正（如 SymSpell 算法）和停用词过滤功能，快速将非结构化脏数据转化为标准文本，清洗效率提升 80%。\n- **高效数据增强**：通过回译（Back Translation）和无监督数据增强技术，自动生成大量高质量的合成样本，显著解决了长尾场景下的数据不平衡问题。\n- **精细化语义理解**：借助子词分词（Subword Tokenization）和词性标注，模型能精准捕捉细微的语义差异，即使面对新词或复杂句式也能准确识别用户意图。\n- **鲁棒性显著增强**：集成对抗攻击防御机制和数据噪声处理策略，使系统在面对恶意干扰输入时仍能保持稳定的分类准确率。\n\nnlp 通过提供从数据清洗、增强到模型鲁棒性优化的全套解决方案，将原本需要数周的手工工程缩短为几天的高效自动化流程，大幅降低了自然语言处理项目的落地门槛。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmakcedward_nlp_baa97d9d.png","makcedward","Edward Ma","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmakcedward_74d33cf5.jpg","Focus on Natural Language Processing, Transferring Learning, Data Science Architecture","SambaNova Systems","San Francisco Bay Area",null,"https:\u002F\u002Fmakcedward.github.io\u002F","https:\u002F\u002Fgithub.com\u002Fmakcedward",[85,89,93,97],{"name":86,"color":87,"percentage":88},"Python","#3572A5",61.8,{"name":90,"color":91,"percentage":92},"Jupyter Notebook","#DA5B0B",35.2,{"name":94,"color":95,"percentage":96},"sed","#64b970",1.6,{"name":98,"color":99,"percentage":100},"Shell","#89e051",1.4,1083,323,"2026-04-01T04:17:35",1,"","未说明",{"notes":108,"python":106,"dependencies":109},"该仓库主要是一个 NLP 教程集合，包含指向外部文章、数据集和代码示例（如 Jupyter Notebook）的链接列表。README 中未提供具体的安装指南、环境配置要求或依赖库版本信息。用户需根据链接中提及的具体模型（如 BERT, GPT-2, ELMo 等）或对应的原始项目仓库来确定具体的运行环境需求。",[],[26,51,13,14,54,15],[67,112,113,114,115],"deep-learning","machine-learning","data-science","ai","2026-03-27T02:49:30.150509","2026-04-06T06:46:01.822756",[119,124,129,133,137,142],{"id":120,"question_zh":121,"answer_zh":122,"source_url":123},16469,"为什么使用 Doc2Vec 检索到的相似文档不相似？","检索结果的质量依赖于训练数据和特征工程。虽然深度学习通常声称无需特征工程，但仍需引导神经网络提取有效特征。建议结合词性标注（Part-of-Speech）或字符级特征（character-level features）增强输入表示，以提升模型对语义的理解能力。","https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fissues\u002F2",{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},16468,"导入 aion 模块时出现 \"No module 
named 'aion'\" 错误怎么办？","即使使用 pip install aion 成功安装，仍可能因路径问题导致导入失败。尝试手动添加模块路径：\nimport sys, os\naion_dir = 'aion 库的实际安装路径'\nsys.path.insert(0, aion_dir)\n如果上述方法无效，请检查 Python 环境是否与安装环境一致，并确认 aion 是否真正安装在当前环境的 site-packages 目录中。","https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fissues\u002F1",{"id":130,"question_zh":131,"answer_zh":132,"source_url":123},16470,"运行 Doc2Vec 示例笔记本时出现 NameError: name 'x_train' is not defined 错误","该错误是因为变量 x_train 未定义。请确保在执行 doc2vec_embs.build_vocab(documents=x_train) 之前，已通过类似以下代码正确定义 x_train：\nx_train, x_val, y_train, y_val = train_test_split(np.array(train_raw_df.data), train_raw_df.target, test_size=0.2)\n请检查数据加载和划分步骤是否完整执行。",{"id":134,"question_zh":135,"answer_zh":136,"source_url":123},16471,"运行 Doc2Vec 示例时遇到 SSLCertVerificationError 如何解决？","此错误通常发生在下载预训练模型或数据时 SSL 证书验证失败。可尝试以下方法：\n1. 更新 certifi 包：pip install --upgrade certifi\n2. 在代码中禁用 SSL 验证（仅用于测试）：\n   import ssl\n   ssl._create_default_https_context = ssl._create_unverified_context\n3. 手动下载所需文件并本地加载，避免运行时联网请求。",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},16472,"如何获取示例中使用的 glove.6B.50d.vec 文件？","可从 Stanford NLP 官网下载预训练的 GloVe 向量文件。访问 https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002F，下载 glove.6B.zip，解压后得到的是 glove.6B.50d.txt 等文本文件，压缩包内并没有 .vec 文件。如果示例要求 .vec 后缀，通常是指转换为 word2vec 文本格式后的文件，可使用 gensim 提供的 glove2word2vec 脚本进行转换并保存为 glove.6B.50d.vec。然后将其放置在项目指定目录，或在代码中指定正确路径即可加载。","https:\u002F\u002Fgithub.com\u002Fmakcedward\u002Fnlp\u002Fissues\u002F3",{"id":143,"question_zh":144,"answer_zh":145,"source_url":128},16473,"多个用户报告无法导入 aion 模块，是否有通用解决方案？","该问题较为普遍，主要源于 Python 路径配置问题。除手动添加 sys.path 外，建议：\n1. 确认使用的是与安装 aion 相同的 Python 解释器（尤其在多版本环境中）；\n2. 使用 python -m pip install aion 确保安装到当前环境；\n3. 在虚拟环境中重新安装以避免权限或路径冲突；\n4. 检查是否存在名为 aion.py 的文件与包名冲突。",[]]