[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-eugeneyan--applied-ml":3,"tool-eugeneyan--applied-ml":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",158594,2,"2026-04-16T23:34:05",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":78,"owner_email":76,"owner_twitter":73,"owner_website":79,"owner_url":80,"languages":76,"stars":81,"forks":82,"last_commit_at":83,"license":84,"difficulty_score":85,"env_os":86,"env_gpu":87,"env_ram":87,"env_deps":88,"category_tags":91,"github_topics":93,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":108,"updated_at":109,"faqs":110,"releases":126},8250,"eugeneyan\u002Fapplied-ml","applied-ml","📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.","applied-ml 是一个专注于“生产环境中的数据科学与机器学习”的精选资源库。它汇集了来自 Airbnb、Google、Uber、Netflix 等科技巨头的技术论文、工程博客和实战案例，旨在填补学术理论与工业落地之间的鸿沟。\n\n在实际开发中，许多团队往往知道算法原理，却不清楚如何在真实业务中构建可靠的系统。applied-ml 正是为了解决这一痛点而生。它不仅展示了各大公司如何定义问题（例如将个性化推荐转化为搜索或序列建模问题），还深入剖析了哪些技术方案行之有效、哪些曾遭遇失败，并提供了关于数据质量、特征存储、模型管理及 MLOps 基础设施等全流程的宝贵经验。通过这些内容，用户能够更准确地评估项目的投资回报率（ROI），避免重复造轮子。\n\n这份资源特别适合机器学习工程师、数据科学家、技术负责人以及正在探索 AI 
落地的研究人员使用。无论你是需要寻找特定场景（如异常检测、自然语言处理或隐私保护计算）的参考架构，还是希望了解大厂团队的协作模式与避坑指南，applied-ml 都能提供极具价值的实战视角。其独特的价值在于不仅关注“怎么做”，更强调“为什么这样做”以及“实际效果如何”，","applied-ml 是一个专注于“生产环境中的数据科学与机器学习”的精选资源库。它汇集了来自 Airbnb、Google、Uber、Netflix 等科技巨头的技术论文、工程博客和实战案例，旨在填补学术理论与工业落地之间的鸿沟。\n\n在实际开发中，许多团队往往知道算法原理，却不清楚如何在真实业务中构建可靠的系统。applied-ml 正是为了解决这一痛点而生。它不仅展示了各大公司如何定义问题（例如将个性化推荐转化为搜索或序列建模问题），还深入剖析了哪些技术方案行之有效、哪些曾遭遇失败，并提供了关于数据质量、特征存储、模型管理及 MLOps 基础设施等全流程的宝贵经验。通过这些内容，用户能够更准确地评估项目的投资回报率（ROI），避免重复造轮子。\n\n这份资源特别适合机器学习工程师、数据科学家、技术负责人以及正在探索 AI 落地的研究人员使用。无论你是需要寻找特定场景（如异常检测、自然语言处理或隐私保护计算）的参考架构，还是希望了解大厂团队的协作模式与避坑指南，applied-ml 都能提供极具价值的实战视角。其独特的价值在于不仅关注“怎么做”，更强调“为什么这样做”以及“实际效果如何”，是连接前沿研究与工程实践的桥梁。","# applied-ml\nCurated papers, articles, and blogs on **data science & machine learning in production**. ⚙️\n\n[![contributions welcome](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcontributions-welcome-brightgreen.svg?style=flat)](.\u002FCONTRIBUTING.md) [![Summaries](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fsummaries-in%20tweets-%2300acee.svg?style=flat)](https:\u002F\u002Ftwitter.com\u002Feugeneyan\u002Fstatus\u002F1350509546133811200) ![HitCount](http:\u002F\u002Fhits.dwyl.com\u002Feugeneyan\u002Fapplied-ml.svg)\n\nFiguring out how to implement your ML project? Learn how other organizations did it:\n\n- **How** the problem is framed 🔎(e.g., personalization as recsys vs. search vs. sequences)\n- **What** machine learning techniques worked ✅ (and sometimes, what didn't ❌)\n- **Why** it works, the science behind it with research, literature, and references 📂\n- **What** real-world results were achieved (so you can better assess ROI ⏰💰📈)\n\nP.S., Want a summary of ML advancements? 👉[`ml-surveys`](https:\u002F\u002Fgithub.com\u002Feugeneyan\u002Fml-surveys)\n\nP.P.S, Looking for guides and interviews on applying ML? 👉[`applyingML`](https:\u002F\u002Fapplyingml.com)\n\n**Table of Contents**\n\n1. [Data Quality](#data-quality)\n2. [Data Engineering](#data-engineering)\n3. 
[Data Discovery](#data-discovery)\n4. [Feature Stores](#feature-stores)\n5. [Classification](#classification)\n6. [Regression](#regression)\n7. [Forecasting](#forecasting)\n8. [Recommendation](#recommendation)\n9. [Search & Ranking](#search--ranking)\n10. [Embeddings](#embeddings)\n11. [Natural Language Processing](#natural-language-processing)\n12. [Sequence Modelling](#sequence-modelling)\n13. [Computer Vision](#computer-vision)\n14. [Reinforcement Learning](#reinforcement-learning)\n15. [Anomaly Detection](#anomaly-detection)\n16. [Graph](#graph)\n17. [Optimization](#optimization)\n18. [Information Extraction](#information-extraction)\n19. [Weak Supervision](#weak-supervision)\n20. [Generation](#generation)\n21. [Audio](#audio)\n22. [Privacy-Preserving Machine Learning](#privacy-preserving-machine-learning)\n23. [Validation and A\u002FB Testing](#validation-and-ab-testing)\n24. [Model Management](#model-management)\n25. [Efficiency](#efficiency)\n26. [Ethics](#ethics)\n27. [Infra](#infra)\n28. [MLOps Platforms](#mlops-platforms)\n29. [Practices](#practices)\n30. [Team Structure](#team-structure)\n31. [Fails](#fails)\n\n## Data Quality\n1. [Reliable and Scalable Data Ingestion at Airbnb](https:\u002F\u002Fwww.slideshare.net\u002FHadoopSummit\u002Freliable-and-scalable-data-ingestion-at-airbnb-63920989) `Airbnb` `2016`\n2. [Monitoring Data Quality at Scale with Statistical Modeling](https:\u002F\u002Feng.uber.com\u002Fmonitoring-data-quality-at-scale\u002F) `Uber` `2017`\n3. [Data Management Challenges in Production Machine Learning](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub46178\u002F) ([Paper](https:\u002F\u002Fthodrek.github.io\u002FCS839_spring18\u002Fpapers\u002Fp1723-polyzotis.pdf)) `Google` `2017`\n4. 
[Automating Large-Scale Data Quality Verification](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fautomating-large-scale-data-quality-verification) ([Paper](https:\u002F\u002Fassets.amazon.science\u002Fa6\u002F88\u002Fad858ee240c38c6e9dce128250c0\u002Fautomating-large-scale-data-quality-verification.pdf)) `Amazon` `2018`\n5. [Meet Hodor — Gojek’s Upstream Data Quality Tool](https:\u002F\u002Fwww.gojek.io\u002Fblog\u002Fmeet-hodor-gojeks-upstream-data-quality-tool) `Gojek` `2019`\n6. [Data Validation for Machine Learning](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub47967\u002F) ([Paper](https:\u002F\u002Fmlsys.org\u002FConferences\u002F2019\u002Fdoc\u002F2019\u002F167.pdf)) `Google` `2019`\n7. [An Approach to Data Quality for Netflix Personalization Systems](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=t7vHpA39TXM) `Netflix` `2020`\n8. [Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters](https:\u002F\u002Fresearch.fb.com\u002Fblog\u002F2020\u002F08\u002Fimproving-the-accuracy-of-community-standards-enforcement-by-certainty-estimation-of-human-decisions\u002F) ([Paper](https:\u002F\u002Fresearch.fb.com\u002Fwp-content\u002Fuploads\u002F2020\u002F08\u002FCLARA-Confidence-of-Labels-and-Raters.pdf)) `Facebook` `2020`\n\n## Data Engineering\n1. [Zipline: Airbnb’s Machine Learning Data Management Platform](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Tg5VEMEsC-0) `Airbnb` `2018`\n2. [Sputnik: Airbnb’s Apache Spark Framework for Data Engineering](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=BQumogSBsUw) `Airbnb` `2020`\n3. [Unbundling Data Science Workflows with Metaflow and AWS Step Functions](https:\u002F\u002Fnetflixtechblog.com\u002Funbundling-data-science-workflows-with-metaflow-and-aws-step-functions-d454780c6280) `Netflix` `2020`\n4. 
[How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F09\u002F25\u002Fhow-doordash-is-scaling-its-data-platform\u002F) `DoorDash` `2020`\n5. [Revolutionizing Money Movements at Scale with Strong Data Consistency](https:\u002F\u002Feng.uber.com\u002Fmoney-scale-strong-data\u002F) `Uber` `2020`\n6. [Zipline - A Declarative Feature Engineering Framework](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=LjcKCm0G_OY) `Airbnb` `2020`\n7. [Automating Data Protection at Scale, Part 1](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fautomating-data-protection-at-scale-part-1-c74909328e08) ([Part 2](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fautomating-data-protection-at-scale-part-2-c2b8d2068216)) `Airbnb` `2021`\n8. [Real-time Data Infrastructure at Uber](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2104.00087.pdf) `Uber` `2021`\n9. [Introducing Fabricator: A Declarative Feature Engineering Framework](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F01\u002F11\u002Fintroducing-fabricator-a-declarative-feature-engineering-framework\u002F) `DoorDash` `2022`\n10. [Functions & DAGs: introducing Hamilton, a microframework for dataframe generation](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2021\u002F10\u002F14\u002Ffunctions-dags-hamilton\u002F) `Stitch Fix` `2021`\n11. [Optimizing Pinterest’s Data Ingestion Stack: Findings and Learnings](https:\u002F\u002Fmedium.com\u002F@Pinterest_Engineering\u002Foptimizing-pinterests-data-ingestion-stack-findings-and-learnings-994fddb063bf) `Pinterest` `2022`\n12. [Lessons Learned From Running Apache Airflow at Scale](https:\u002F\u002Fshopifyengineering.myshopify.com\u002Fblogs\u002Fengineering\u002Flessons-learned-apache-airflow-scale) `Shopify` `2022`\n13. 
[Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training](https:\u002F\u002Farxiv.org\u002Fabs\u002F2108.09373v4) `Meta` `2022`\n14. [Data Mesh — A Data Movement and Processing Platform @ Netflix](https:\u002F\u002Fnetflixtechblog.com\u002Fdata-mesh-a-data-movement-and-processing-platform-netflix-1288bcab2873) `Netflix` `2022`\n15. [Building Scalable Real Time Event Processing with Kafka and Flink](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F08\u002F02\u002Fbuilding-scalable-real-time-event-processing-with-kafka-and-flink\u002F) `DoorDash` `2022`\n\n## Data Discovery\n1. [Apache Atlas: Data Governance and Metadata Framework for Hadoop](https:\u002F\u002Fatlas.apache.org\u002F#\u002F) ([Code](https:\u002F\u002Fgithub.com\u002Fapache\u002Fatlas)) `Apache`\n2. [Collect, Aggregate, and Visualize a Data Ecosystem's Metadata](https:\u002F\u002Fmarquezproject.github.io\u002Fmarquez\u002F) ([Code](https:\u002F\u002Fgithub.com\u002FMarquezProject\u002Fmarquez)) `WeWork`\n3. [Discovery and Consumption of Analytics Data at Twitter](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2016\u002Fdiscovery-and-consumption-of-analytics-data-at-twitter.html) `Twitter` `2016`\n4. [Democratizing Data at Airbnb](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fdemocratizing-data-at-airbnb-852d76c51770) `Airbnb` `2017`\n5. [Databook: Turning Big Data into Knowledge with Metadata at Uber](https:\u002F\u002Feng.uber.com\u002Fdatabook\u002F) `Uber` `2018`\n6. [Metacat: Making Big Data Discoverable and Meaningful at Netflix](https:\u002F\u002Fnetflixtechblog.com\u002Fmetacat-making-big-data-discoverable-and-meaningful-at-netflix-56fb36a53520) ([Code](https:\u002F\u002Fgithub.com\u002FNetflix\u002Fmetacat)) `Netflix` `2018`\n7. [Amundsen — Lyft’s Data Discovery & Metadata Engine](https:\u002F\u002Feng.lyft.com\u002Famundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9) `Lyft` `2019`\n8. 
[Open Sourcing Amundsen: A Data Discovery And Metadata Platform](https:\u002F\u002Feng.lyft.com\u002Fopen-sourcing-amundsen-a-data-discovery-and-metadata-platform-2282bb436234) ([Code](https:\u002F\u002Fgithub.com\u002Flyft\u002Famundsen)) `Lyft` `2019`\n9. [DataHub: A Generalized Metadata Search & Discovery Tool](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2019\u002Fdata-hub) ([Code](https:\u002F\u002Fgithub.com\u002Flinkedin\u002Fdatahub)) `LinkedIn` `2019`\n10. [Amundsen: One Year Later](https:\u002F\u002Feng.lyft.com\u002Famundsen-1-year-later-7b60bf28602) `Lyft` `2020`\n11. [Using Amundsen to Support User Privacy via Metadata Collection at Square](https:\u002F\u002Fdeveloper.squareup.com\u002Fblog\u002Fusing-amundsen-to-support-user-privacy-via-metadata-collection-at-square\u002F) `Square` `2020`\n12. [Turning Metadata Into Insights with Databook](https:\u002F\u002Feng.uber.com\u002Fmetadata-insights-databook\u002F) `Uber` `2020`\n13. [DataHub: Popular Metadata Architectures Explained](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fdatahub-popular-metadata-architectures-explained) `LinkedIn` `2020`\n14. [How We Improved Data Discovery for Data Scientists at Spotify](https:\u002F\u002Fengineering.atspotify.com\u002F2020\u002F02\u002F27\u002Fhow-we-improved-data-discovery-for-data-scientists-at-spotify\u002F) `Spotify` `2020` \n15. [How We’re Solving Data Discovery Challenges at Shopify](https:\u002F\u002Fengineering.shopify.com\u002Fblogs\u002Fengineering\u002Fsolving-data-discovery-challenges-shopify) `Shopify` `2020`\n16. [Nemo: Data discovery at Facebook](https:\u002F\u002Fengineering.fb.com\u002Fdata-infrastructure\u002Fnemo\u002F) `Facebook` `2020`\n17. [Exploring Data @ Netflix](https:\u002F\u002Fnetflixtechblog.com\u002Fexploring-data-netflix-9d87e20072e3) ([Code](https:\u002F\u002Fgithub.com\u002FNetflix\u002Fnf-data-explorer)) `Netflix` `2021`\n\n## Feature Stores\n1. 
[Distributed Time Travel for Feature Generation](https:\u002F\u002Fnetflixtechblog.com\u002Fdistributed-time-travel-for-feature-generation-389cccdd3907) `Netflix` `2016`\n2. [Building the Activity Graph, Part 2 (Feature Storage Section)](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2017\u002F07\u002Fbuilding-the-activity-graph--part-2) `LinkedIn` `2017`\n3. [Fact Store at Scale for Netflix Recommendations](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DiwKg8KynVU) `Netflix` `2018`\n4. [Zipline: Airbnb’s Machine Learning Data Management Platform](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Tg5VEMEsC-0) `Airbnb` `2018`\n5. [Feature Store: The missing data layer for Machine Learning pipelines?](https:\u002F\u002Fwww.hopsworks.ai\u002Fpost\u002Ffeature-store-the-missing-data-layer-in-ml-pipelines) `Hopsworks` `2018`\n6. [Introducing Feast: An Open Source Feature Store for Machine Learning](https:\u002F\u002Fcloud.google.com\u002Fblog\u002Fproducts\u002Fai-machine-learning\u002Fintroducing-feast-an-open-source-feature-store-for-machine-learning) ([Code](https:\u002F\u002Fgithub.com\u002Ffeast-dev\u002Ffeast)) `Gojek` `2019`\n7. [Michelangelo Palette: A Feature Engineering Platform at Uber](https:\u002F\u002Fwww.infoq.com\u002Fpresentations\u002Fmichelangelo-palette-uber\u002F) `Uber` `2019`\n8. [The Architecture That Powers Twitter's Feature Store](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=UNailXoiIrY) `Twitter` `2019`\n9. [Accelerating Machine Learning with the Feature Store Service](https:\u002F\u002Ftechnology.condenast.com\u002Fstory\u002Faccelerating-machine-learning-with-the-feature-store-service) `Condé Nast` `2019` \n10. [Feast: Bridging ML Models and Data](https:\u002F\u002Fwww.gojek.io\u002Fblog\u002Ffeast-bridging-ml-models-and-data) `Gojek` `2020`\n11. 
[Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F11\u002F19\u002Fbuilding-a-gigascale-ml-feature-store-with-redis\u002F) `DoorDash` `2020`\n12. [Rapid Experimentation Through Standardization: Typed AI features for LinkedIn’s Feed](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Ffeed-typed-ai-features) `LinkedIn` `2020`\n13. [Building a Feature Store](https:\u002F\u002Fnlathia.github.io\u002F2020\u002F12\u002FBuilding-a-feature-store.html) `Monzo Bank` `2020`\n14. [Butterfree: A Spark-based Framework for Feature Store Building](https:\u002F\u002Fmedium.com\u002Fquintoandar-tech-blog\u002Fbutterfree-a-spark-based-framework-for-feature-store-building-48c3640522c7) ([Code](https:\u002F\u002Fgithub.com\u002Fquintoandar\u002Fbutterfree)) `QuintoAndar` `2020`\n15. [Building Riviera: A Declarative Real-Time Feature Engineering Framework](https:\u002F\u002Fdoordash.engineering\u002F2021\u002F03\u002F04\u002Fbuilding-a-declarative-real-time-feature-engineering-framework\u002F) `DoorDash` `2021`\n16. [Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory](https:\u002F\u002Feng.uber.com\u002Foptimal-feature-discovery-ml\u002F) `Uber` `2021`\n17. [ML Feature Serving Infrastructure at Lyft](https:\u002F\u002Feng.lyft.com\u002Fml-feature-serving-infrastructure-at-lyft-d30bf2d3c32a) `Lyft` `2021`\n18. [Near real-time features for near real-time personalization](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2022\u002Fnear-real-time-features-for-near-real-time-personalization) `LinkedIn` `2022`\n19. [Building the Model Behind DoorDash’s Expansive Merchant Selection](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F04\u002F19\u002Fbuilding-merchant-selection\u002F) `DoorDash` `2022`\n20. 
[Open sourcing Feathr – LinkedIn’s feature store for productive machine learning](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2022\u002Fopen-sourcing-feathr---linkedin-s-feature-store-for-productive-m) `LinkedIn` `2022`\n21. [Evolution of ML Fact Store](https:\u002F\u002Fnetflixtechblog.com\u002Fevolution-of-ml-fact-store-5941d3231762) `Netflix` `2022`\n22. [Developing scalable feature engineering DAGs](https:\u002F\u002Fouterbounds.com\u002Fblog\u002Fdeveloping-scalable-feature-engineering-dags) `Metaflow + Hamilton` via `Outerbounds` `2022`\n23. [Feature Store Design at Constructor](https:\u002F\u002Fmedium.com\u002Fconstructor-engineering\u002Ffeature-store-design-at-constructor-330b65f64b18) `Constructor.io` `2023`\n\n\n## Classification\n1. [Prediction of Advertiser Churn for Google AdWords](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub36678\u002F) ([Paper](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F36678.pdf)) `Google` `2010`\n2. [High-Precision Phrase-Based Document Classification on a Modern Scale](https:\u002F\u002Fengineering.linkedin.com\u002Fresearch\u002F2011\u002Fhigh-precision-phrase-based-document-classification-on-a-modern-scale) ([Paper](http:\u002F\u002Fweb.stanford.edu\u002F~gavish\u002Fdocuments\u002Fphrase_based.pdf)) `LinkedIn` `2011`\n3. [Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.14778\u002F2733004.2733024) ([Paper](http:\u002F\u002Fpages.cs.wisc.edu\u002F%7Eanhai\u002Fpapers\u002Fchimera-vldb14.pdf)) `Walmart` `2014`\n4. [Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks](https:\u002F\u002Fwww.kdd.org\u002Fkdd2016\u002Fsubtopic\u002Fview\u002Flarge-scale-item-categorization-in-e-commerce-using-multiple-recurrent-neur\u002F) ([Paper](https:\u002F\u002Fwww.kdd.org\u002Fkdd2016\u002Fpapers\u002Ffiles\u002Fadf0392-haAemb.pdf)) `NAVER` `2016`\n5. 
[Learning to Diagnose with LSTM Recurrent Neural Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.03677) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.03677.pdf)) `Google` `2017`\n6. [Discovering and Classifying In-app Message Intent at Airbnb](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fdiscovering-and-classifying-in-app-message-intent-at-airbnb-6a55f5400a0c) `Airbnb` `2019`\n7. [Teaching Machines to Triage Firefox Bugs](https:\u002F\u002Fhacks.mozilla.org\u002F2019\u002F04\u002Fteaching-machines-to-triage-firefox-bugs\u002F) `Mozilla` `2019`\n8. [Categorizing Products at Scale](https:\u002F\u002Fengineering.shopify.com\u002Fblogs\u002Fengineering\u002Fcategorizing-products-at-scale) `Shopify` `2020`\n9. [How We Built the Good First Issues Feature](https:\u002F\u002Fgithub.blog\u002F2020-01-22-how-we-built-good-first-issues\u002F) `GitHub` `2020`\n10. [Testing Firefox More Efficiently with Machine Learning](https:\u002F\u002Fhacks.mozilla.org\u002F2020\u002F07\u002Ftesting-firefox-more-efficiently-with-machine-learning\u002F) `Mozilla` `2020`\n11. [Using ML to Subtype Patients Receiving Digital Mental Health Interventions](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fblog\u002Fa-path-to-personalization-using-ml-to-subtype-patients-receiving-digital-mental-health-interventions\u002F) ([Paper](https:\u002F\u002Fjamanetwork.com\u002Fjournals\u002Fjamanetworkopen\u002Ffullarticle\u002F2768347)) `Microsoft` `2020`\n12. [Scalable Data Classification for Security and Privacy](https:\u002F\u002Fengineering.fb.com\u002Fsecurity\u002Fdata-classification-system\u002F) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.14109.pdf)) `Facebook` `2020`\n13. [Uncovering Online Delivery Menu Best Practices with Machine Learning](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F11\u002F10\u002Funcovering-online-delivery-menu-best-practices-with-machine-learning\u002F) `DoorDash` `2020`\n14. 
[Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F08\u002F28\u002Fovercome-the-cold-start-problem-in-menu-item-tagging\u002F) `DoorDash` `2020`\n15. [Deep Learning: Product Categorization and Shelving](https:\u002F\u002Fmedium.com\u002Fwalmartglobaltech\u002Fdeep-learning-product-categorization-and-shelving-630571e81e96) `Walmart` `2021`\n16. [Large-scale Item Categorization for e-Commerce](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F2396761.2396838) ([Paper](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FJean_David_Ruvini\u002Fpublication\u002F262270957_Large-scale_item_categorization_for_e-commerce\u002Flinks\u002F5512dc3d0cf270fd7e33a0d5\u002FLarge-scale-item-categorization-for-e-commerce.pdf)) `DianPing`, `eBay` `2012`\n17. [Semantic Label Representation with an Application on Multimodal Product Categorization](https:\u002F\u002Fmedium.com\u002Fwalmartglobaltech\u002Fsemantic-label-representation-with-an-application-on-multimodal-product-categorization-63d668b943b7) `Walmart` `2022`\n18. [Building Airbnb Categories with ML and Human-in-the-Loop](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fbuilding-airbnb-categories-with-ml-and-human-in-the-loop-e97988e70ebb) `Airbnb` `2022`\n\n\n## Regression\n1. [Using Machine Learning to Predict Value of Homes On Airbnb](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fusing-machine-learning-to-predict-value-of-homes-on-airbnb-9272d3d4739d) `Airbnb` `2017`\n2. [Using Machine Learning to Predict the Value of Ad Requests](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2020\u002Fusing-machine-learning-to-predict-the-value-of-ad-requests.html) `Twitter` `2020`\n3. 
[Open-Sourcing Riskquant, a Library for Quantifying Risk](https:\u002F\u002Fnetflixtechblog.com\u002Fopen-sourcing-riskquant-a-library-for-quantifying-risk-6720cc1e4968) ([Code](https:\u002F\u002Fgithub.com\u002FNetflix-Skunkworks\u002Friskquant)) `Netflix` `2020`\n4. [Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F10\u002F14\u002Fsolving-for-unobserved-data-in-a-regression-model\u002F) `DoorDash` `2020`\n\n## Forecasting\n1. [Engineering Extreme Event Forecasting at Uber with RNN](https:\u002F\u002Feng.uber.com\u002Fneural-networks\u002F) `Uber` `2017`\n2. [Forecasting at Uber: An Introduction](https:\u002F\u002Feng.uber.com\u002Fforecasting-introduction\u002F) `Uber` `2018`\n3. [Transforming Financial Forecasting with Data Science and Machine Learning at Uber](https:\u002F\u002Feng.uber.com\u002Ftransforming-financial-forecasting-machine-learning\u002F) `Uber` `2018`\n4. [Under the Hood of Gojek’s Automated Forecasting Tool](https:\u002F\u002Fwww.gojek.io\u002Fblog\u002Funder-the-hood-of-gojeks-automated-forecasting-tool) `Gojek` `2019`\n5. [BusTr: Predicting Bus Travel Times from Real-Time Traffic](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3394486.3403376) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3394486.3403376), [Video](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f3369790576dd25aef288db\u002F)) `Google` `2020`\n6. [Retraining Machine Learning Models in the Wake of COVID-19](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F09\u002F15\u002Fretraining-ml-models-covid-19\u002F) `DoorDash` `2020`\n7. [Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=TkcpjnLh690) ([Paper](https:\u002F\u002Fpeerj.com\u002Fpreprints\u002F3190.pdf), [Code](https:\u002F\u002Fgithub.com\u002Ffacebook\u002Fprophet)) `Atlassian` `2020`\n8. 
[Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting](https:\u002F\u002Feng.uber.com\u002Forbit\u002F) ([Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.08492), [Video](https:\u002F\u002Fyoutu.be\u002FLXDpq_iwcWY), [Code](https:\u002F\u002Fgithub.com\u002Fuber\u002Forbit)) `Uber` `2021`\n9. [Managing Supply and Demand Balance Through Machine Learning](https:\u002F\u002Fdoordash.engineering\u002F2021\u002F06\u002F29\u002Fmanaging-supply-and-demand-balance-through-machine-learning\u002F) `DoorDash` `2021`\n10. [Greykite: A flexible, intuitive, and fast forecasting library](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2021\u002Fgreykite--a-flexible--intuitive--and-fast-forecasting-library) `LinkedIn` `2021`\n11. [The history of Amazon’s forecasting algorithm](https:\u002F\u002Fwww.amazon.science\u002Flatest-news\u002Fthe-history-of-amazons-forecasting-algorithm) `Amazon` `2021`\n12. [DeepETA: How Uber Predicts Arrival Times Using Deep Learning](https:\u002F\u002Feng.uber.com\u002Fdeepeta-how-uber-predicts-arrival-times\u002F) `Uber` `2022`\n13. [Forecasting Grubhub Order Volume At Scale](https:\u002F\u002Fbytes.grubhub.com\u002Fforecasting-grubhub-order-volume-at-scale-a966c2f901d2) `Grubhub` `2022`\n14. [Causal Forecasting at Lyft (Part 1)](https:\u002F\u002Feng.lyft.com\u002Fcausal-forecasting-at-lyft-part-1-14cca6ff3d6d) `Lyft` `2022`\n\n## Recommendation\n1. [Amazon.com Recommendations: Item-to-Item Collaborative Filtering](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F1167344) ([Paper](https:\u002F\u002Fwww.cs.umd.edu\u002F~samir\u002F498\u002FAmazon-Recommendations.pdf)) `Amazon` `2003`\n2. [Netflix Recommendations: Beyond the 5 stars (Part 1)](https:\u002F\u002Fnetflixtechblog.com\u002Fnetflix-recommendations-beyond-the-5-stars-part-1-55838468f429) ([Part 2](https:\u002F\u002Fnetflixtechblog.com\u002Fnetflix-recommendations-beyond-the-5-stars-part-2-d9b96aa399f5)) `Netflix` `2012`\n3. 
[How Music Recommendation Works — And Doesn’t Work](https:\u002F\u002Fnotes.variogram.com\u002F2012\u002F12\u002F11\u002Fhow-music-recommendation-works-and-doesnt-work\u002F) `Spotify` `2012`\n4. [Learning to Rank Recommendations with the k -Order Statistic Loss](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F2507157.2507210) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F2507157.2507210)) `Google` `2013`\n5. [Recommending Music on Spotify with Deep Learning](https:\u002F\u002Fbenanne.github.io\u002F2014\u002F08\u002F05\u002Fspotify-cnns.html) `Spotify` `2014`\n6. [Learning a Personalized Homepage](https:\u002F\u002Fnetflixtechblog.com\u002Flearning-a-personalized-homepage-aa8ec670359a) `Netflix` `2015`\n7. [The Netflix Recommender System: Algorithms, Business Value, and Innovation](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F2843948) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F2843948)) `Netflix` `2015`\n7. [Session-based Recommendations with Recurrent Neural Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06939) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.06939.pdf)) `Telefonica` `2016`\n8. [Deep Neural Networks for YouTube Recommendations](https:\u002F\u002Fstatic.googleusercontent.com\u002Fmedia\u002Fresearch.google.com\u002Fen\u002F\u002Fpubs\u002Farchive\u002F45530.pdf) `YouTube` `2016`\n9. [E-commerce in Your Inbox: Product Recommendations at Scale](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.07154) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1606.07154.pdf)) `Yahoo` `2016`\n10. [To Be Continued: Helping you find shows to continue watching on Netflix](https:\u002F\u002Fnetflixtechblog.com\u002Fto-be-continued-helping-you-find-shows-to-continue-watching-on-7c0d8ee4dab6) `Netflix` `2016`\n11. 
[Personalized Recommendations in LinkedIn Learning](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2016\u002F12\u002Fpersonalized-recommendations-in-linkedin-learning) `LinkedIn` `2016`\n12. [Personalized Channel Recommendations in Slack](https:\u002F\u002Fslack.engineering\u002Fpersonalized-channel-recommendations-in-slack\u002F) `Slack` `2016`\n13. [Recommending Complementary Products in E-Commerce Push Notifications](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.08113) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1707.08113.pdf)) `Alibaba` `2017`\n14. [Artwork Personalization at Netflix](https:\u002F\u002Fnetflixtechblog.com\u002Fartwork-personalization-c589f074ad76) `Netflix` `2017`\n15. [A Meta-Learning Perspective on Cold-Start Recommendations for Items](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F7266-a-meta-learning-perspective-on-cold-start-recommendations-for-items) ([Paper](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F7266-a-meta-learning-perspective-on-cold-start-recommendations-for-items.pdf)) `Twitter` `2017`\n16. [Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.07601) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.07601.pdf)) `Pinterest` `2017`\n17. [Powering Search & Recommendations at DoorDash](https:\u002F\u002Fdoordash.news\u002Fcompany\u002Fpowering-search-recommendations-at-doordash\u002F) `DoorDash` `2017`\n17. [How 20th Century Fox uses ML to predict a movie audience](https:\u002F\u002Fcloud.google.com\u002Fblog\u002Fproducts\u002Fai-machine-learning\u002Fhow-20th-century-fox-uses-ml-to-predict-a-movie-audience) ([Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.08189)) `20th Century Fox` `2018`\n18. [Calibrated Recommendations](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3240323.3240372) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3240323.3240372)) `Netflix` `2018`\n19. 
[Food Discovery with Uber Eats: Recommending for the Marketplace](https:\u002F\u002Feng.uber.com\u002Fuber-eats-recommending-marketplace\u002F) `Uber` `2018`\n20. [Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3240323.3240354) ([Paper](https:\u002F\u002Fstatic1.squarespace.com\u002Fstatic\u002F5ae0d0b48ab7227d232c2bea\u002Ft\u002F5ba849e3c83025fa56814f45\u002F1537755637453\u002FBartRecSys.pdf)) `Spotify` `2018`\n21. [Talent Search and Recommendation Systems at LinkedIn: Practical Challenges and Lessons Learned](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.06481) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1809.06481.pdf)) `LinkedIn` `2018`\n21. [Behavior Sequence Transformer for E-commerce Recommendation in Alibaba](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.06874) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.06874.pdf)) `Alibaba` `2019`\n22. [SDM: Sequential Deep Matching Model for Online Large-scale Recommender System](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.00385) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.00385.pdf)) `Alibaba` `2019`\n23. [Multi-Interest Network with Dynamic Routing for Recommendation at Tmall](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.08030) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.08030.pdf)) `Alibaba` `2019`\n24. [Personalized Recommendations for Experiences Using Deep Learning](https:\u002F\u002Fwww.tripadvisor.com\u002Fengineering\u002Fpersonalized-recommendations-for-experiences-using-deep-learning\u002F) `TripAdvisor` `2019`\n25. [Powered by AI: Instagram’s Explore recommender system](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fpowered-by-ai-instagrams-explore-recommender-system\u002F) `Facebook` `2019`\n26. 
[Marginal Posterior Sampling for Slate Bandits](https:\u002F\u002Fwww.ijcai.org\u002Fproceedings\u002F2019\u002F308) ([Paper](https:\u002F\u002Fwww.ijcai.org\u002Fproceedings\u002F2019\u002F0308.pdf)) `Netflix` `2019`\n27. [Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations](https:\u002F\u002Feng.uber.com\u002Fuber-eats-graph-learning\u002F) `Uber` `2019`\n28. [Music recommendation at Spotify](http:\u002F\u002Fsigir.org\u002Fafirm2019\u002Fslides\u002F16.%20Friday%20-%20Music%20Recommendation%20at%20Spotify%20-%20Ben%20Carterette.pdf) `Spotify` `2019`\n29. [Using Machine Learning to Predict what File you Need Next (Part 1)](https:\u002F\u002Fdropbox.tech\u002Fmachine-learning\u002Fcontent-suggestions-machine-learning) `Dropbox` `2019`\n30. [Using Machine Learning to Predict what File you Need Next (Part 2)](https:\u002F\u002Fdropbox.tech\u002Fmachine-learning\u002Fusing-machine-learning-to-predict-what-file-you-need-next-part-2) `Dropbox` `2019`\n31. [Learning to be Relevant: Evolution of a Course Recommendation System](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3357384.3357817) (**PAPER NEEDED**) `LinkedIn` `2019`\n32. [Temporal-Contextual Recommendation in Real-Time](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Ftemporal-contextual-recommendation-in-real-time) ([Paper](https:\u002F\u002Fassets.amazon.science\u002F96\u002F71\u002Fd1f25754497681133c7aa2b7eb05\u002Ftemporal-contextual-recommendation-in-real-time.pdf)) `Amazon` `2020`\n33. [P-Companion: A Framework for Diversified Complementary Product Recommendation](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fp-companion-a-principled-framework-for-diversified-complementary-product-recommendation) ([Paper](https:\u002F\u002Fassets.amazon.science\u002Fd5\u002F16\u002F3f7809974a899a11bacdadefdf24\u002Fp-companion-a-principled-framework-for-diversified-complementary-product-recommendation.pdf)) `Amazon` `2020`\n34. 
[Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.12981) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.12981.pdf)) `Alibaba` `2020`\n35. [TPG-DNN: A Method for User Intent Prediction with Multi-task Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2008.02122) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.02122.pdf)) `Alibaba` `2020`\n36. [PURS: Personalized Unexpected Recommender System for Improving User Satisfaction](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3383313.3412238) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3383313.3412238)) `Alibaba` `2020`\n37. [Controllable Multi-Interest Framework for Recommendation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.09347) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.09347)) `Alibaba` `2020`\n38. [MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction](https:\u002F\u002Farxiv.org\u002Fabs\u002F2008.02974) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.02974.pdf)) `Alibaba` `2020`\n39. [ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.12002) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.12002.pdf)) `Alibaba` `2020`\n40. [For Your Ears Only: Personalizing Spotify Home with Machine Learning](https:\u002F\u002Fengineering.atspotify.com\u002F2020\u002F01\u002F16\u002Ffor-your-ears-only-personalizing-spotify-home-with-machine-learning\u002F) `Spotify` `2020`\n41. [Reach for the Top: How Spotify Built Shortcuts in Just Six Months](https:\u002F\u002Fengineering.atspotify.com\u002F2020\u002F04\u002F15\u002Freach-for-the-top-how-spotify-built-shortcuts-in-just-six-months\u002F) `Spotify` `2020`\n42. 
[Contextual and Sequential User Embeddings for Large-Scale Music Recommendation](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3383313.3412248) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3383313.3412248)) `Spotify` `2020`\n43. [The Evolution of Kit: Automating Marketing Using Machine Learning](https:\u002F\u002Fengineering.shopify.com\u002Fblogs\u002Fengineering\u002Fevolution-kit-automating-marketing-machine-learning) `Shopify` `2020`\n44. [A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1)](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fcourse-recommendations-ai-part-one) `LinkedIn` `2020`\n45. [A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2)](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fcourse-recommendations-ai-part-two) `LinkedIn` `2020`\n46. [Building a Heterogeneous Social Network Recommendation System](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fbuilding-a-heterogeneous-social-network-recommendation-system) `LinkedIn` `2020`\n47. [How TikTok recommends videos #ForYou](https:\u002F\u002Fnewsroom.tiktok.com\u002Fen-us\u002Fhow-tiktok-recommends-videos-for-you) `ByteDance` `2020`\n48. [Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval](https:\u002F\u002Farxiv.org\u002Fabs\u002F2008.02930) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.02930.pdf)) `Google` `2020`\n49. [Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems](https:\u002F\u002Farxiv.org\u002Fabs\u002F2008.13535) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.13535.pdf)) `Google` `2020`\n50. 
[Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub50257\u002F) ([Paper](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002Fb9f4e78a8830fe5afcf2f0452862fb3c0d6584ea.pdf)) `Google` `2020`\n51. [Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.04473.pdf) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.04473.pdf)) `Tencent` `2020`\n52. [A Case Study of Session-based Recommendations in the Home-improvement Domain](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3383313.3412235) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3383313.3412235)) `Home Depot` `2020`\n53. [Balancing Relevance and Discovery to Inspire Customers in the IKEA App](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3383313.3411550) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3383313.3411550)) `Ikea` `2020`\n54. [How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fhow-we-use-automl-multi-task-learning-and-multi-tower-models-for-pinterest-ads-db966c3dc99e) `Pinterest` `2020`\n55. [Multi-task Learning for Related Products Recommendations at Pinterest](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fmulti-task-learning-for-related-products-recommendations-at-pinterest-62684f631c12) `Pinterest` `2020`\n56. [Improving the Quality of Recommended Pins with Lightweight Ranking](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fimproving-the-quality-of-recommended-pins-with-lightweight-ranking-8ff5477b20e3) `Pinterest` `2020`\n57. 
[Multi-task Learning and Calibration for Utility-based Home Feed Ranking](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fmulti-task-learning-and-calibration-for-utility-based-home-feed-ranking-64087a7bcbad) `Pinterest` `2020`\n57. [Personalized Cuisine Filter Based on Customer Preference and Local Popularity](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F01\u002F27\u002Fpersonalized-cuisine-filter\u002F) `DoorDash` `2020`\n58. [How We Built a Matchmaking Algorithm to Cross-Sell Products](https:\u002F\u002Fwww.gojek.io\u002Fblog\u002Fhow-we-built-a-matchmaking-algorithm-to-cross-sell-products) `Gojek` `2020`\n59. [Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.09293) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2105.09293.pdf)) `Twitter` `2021`\n60. [Self-supervised Learning for Large-scale Item Recommendations](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.12865) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.12865.pdf)) `Google` `2021`\n61. [Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.07203) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.07203.pdf)) `ByteDance` `2021`\n62. [Using AI to Help Health Experts Address the COVID-19 Pandemic](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fusing-ai-to-help-health-experts-address-the-covid-19-pandemic\u002F) `Facebook` `2021`\n63. [Advertiser Recommendation Systems at Pinterest](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fadvertiser-recommendation-systems-at-pinterest-ccb255fbde20) `Pinterest` `2021`\n64. [On YouTube's Recommendation System](https:\u002F\u002Fblog.youtube\u002Finside-youtube\u002Fon-youtubes-recommendation-system\u002F) `YouTube` `2021`\n65. 
[\"Are you sure?\": Preliminary Insights from Scaling Product Comparisons to Multiple Shops](https:\u002F\u002Farxiv.org\u002Fabs\u002F2107.03256) `Coveo` `2021`\n66. [Mozrt, a Deep Learning Recommendation System Empowering Walmart Store Associates](https:\u002F\u002Fmedium.com\u002Fwalmartglobaltech\u002Fmozrt-a-deep-learning-recommendation-system-empowering-walmart-store-associates-with-a-5d42c08d88da) `Walmart` `2021`\n67. [Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training](https:\u002F\u002Farxiv.org\u002Fabs\u002F2108.09373) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2108.09373.pdf)) `Meta` `2021`\n67. [The Amazon Music conversational recommender is hitting the right notes](https:\u002F\u002Fwww.amazon.science\u002Flatest-news\u002Fhow-amazon-music-uses-recommendation-system-machine-learning) `Amazon` `2022`\n68. [Personalized complementary product recommendation](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fpersonalized-complementary-product-recommendation) ([Paper](https:\u002F\u002Fassets.amazon.science\u002F6c\u002Fd9\u002Fa0ec3eda4f0fb4312ce0ada41771\u002Fpersonalized-complementary-product-recommendation.pdf)) `Amazon` `2022`\n69. [Building a Deep Learning Based Retrieval System for Personalized Recommendations](https:\u002F\u002Ftech.ebayinc.com\u002Fengineering\u002Fbuilding-a-deep-learning-based-retrieval-system-for-personalized-recommendations\u002F) `eBay` `2022`\n70. [How We Built: An Early-Stage Machine Learning Model for Recommendations](https:\u002F\u002Fwww.onepeloton.com\u002Fpress\u002Farticles\u002Fhow-we-built-machine-learning) `Peloton` `2022`\n71. [Lessons Learned from Building out Context-Aware Recommender Systems](https:\u002F\u002Fwww.onepeloton.com\u002Fpress\u002Farticles\u002Flessons-learned-from-building-context-aware-recommender-systems) `Peloton` `2022`\n72. 
[Beyond Matrix Factorization: Using hybrid features for user-business recommendations](https:\u002F\u002Fengineeringblog.yelp.com\u002F2022\u002F04\u002Fbeyond-matrix-factorization-using-hybrid-features-for-user-business-recommendations.html) `Yelp` `2022`\n73. [Improving job matching with machine-learned activity features](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2022\u002Fimproving-job-matching-with-machine-learned-activity-features-) `LinkedIn` `2022`\n74. [Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training](https:\u002F\u002Farxiv.org\u002Fabs\u002F2108.09373v4) `Meta` `2022`\n75. [Blueprints for recommender system architectures: 10th anniversary edition](https:\u002F\u002Famatriain.net\u002Fblog\u002FRecsysArchitectures) `Xavier Amatriain` `2022`\n76. [How Pinterest Leverages Realtime User Actions in Recommendation to Boost Homefeed Engagement Volume](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fhow-pinterest-leverages-realtime-user-actions-in-recommendation-to-boost-homefeed-engagement-volume-165ae2e8cde8) `Pinterest` `2022`\n77. [RecSysOps: Best Practices for Operating a Large-Scale Recommender System](https:\u002F\u002Fnetflixtechblog.medium.com\u002Frecsysops-best-practices-for-operating-a-large-scale-recommender-system-95bbe195a841) `Netflix` `2022`\n78. [Recommend API: Unified end-to-end machine learning infrastructure to generate recommendations](https:\u002F\u002Fslack.engineering\u002Frecommend-api\u002F) `Slack` `2022`\n79. [Evolving DoorDash’s Substitution Recommendations Algorithm](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F09\u002F08\u002Fevolving-doordashs-substitution-recommendations-algorithm\u002F) `DoorDash` `2022`\n80. [Homepage Recommendation with Exploitation and Exploration](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F10\u002F05\u002Fhomepage-recommendation-with-exploitation-and-exploration\u002F) `DoorDash` `2022`\n81. 
[GPU-accelerated ML Inference at Pinterest](https:\u002F\u002Fmedium.com\u002F@Pinterest_Engineering\u002Fgpu-accelerated-ml-inference-at-pinterest-ad1b6a03a16d) `Pinterest` `2022`\n82. [Addressing Confounding Feature Issue for Causal Recommendation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.06532) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2205.06532.pdf)) `Tencent` `2022`\n\n\n## Search & Ranking\n1. [Amazon Search: The Joy of Ranking Products](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Famazon-search-the-joy-of-ranking-products) ([Paper](https:\u002F\u002Fassets.amazon.science\u002F89\u002Fcd\u002F34289f1f4d25b5857d776bdf04d5\u002Famazon-search-the-joy-of-ranking-products.pdf), [Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=NLrhmn-EZ88), [Code](https:\u002F\u002Fgithub.com\u002Fdariasor\u002FTreeExtra)) `Amazon` `2016`\n2. [How Lazada Ranks Products to Improve Customer Experience and Conversion](https:\u002F\u002Fwww.slideshare.net\u002Feugeneyan\u002Fhow-lazada-ranks-products-to-improve-customer-experience-and-conversion) `Lazada` `2016`\n3. [Ranking Relevance in Yahoo Search](https:\u002F\u002Fwww.kdd.org\u002Fkdd2016\u002Fsubtopic\u002Fview\u002Franking-relevance-in-yahoo-search) ([Paper](https:\u002F\u002Fwww.kdd.org\u002Fkdd2016\u002Fpapers\u002Ffiles\u002Fadf0361-yinA.pdf)) `Yahoo` `2016`\n4. [Learning to Rank Personalized Search Results in Professional Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1605.04624) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1605.04624.pdf)) `LinkedIn` `2016`\n5. [Using Deep Learning at Scale in Twitter’s Timelines](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2017\u002Fusing-deep-learning-at-scale-in-twitters-timelines.html) `Twitter` `2017`\n6. 
[An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.01377) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.01377.pdf)) `Etsy` `2017`\n7. [Powering Search & Recommendations at DoorDash](https:\u002F\u002Fdoordash.engineering\u002F2017\u002F07\u002F06\u002Fpowering-search-recommendations-at-doordash\u002F) `DoorDash` `2017`\n8. [Applying Deep Learning To Airbnb Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.09591) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.09591.pdf)) `Airbnb` `2018`\n9. [In-session Personalization for Talent Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.06488) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1809.06488.pdf)) `LinkedIn` `2018`\n10. [Talent Search and Recommendation Systems at LinkedIn](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.06481) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1809.06481.pdf)) `LinkedIn` `2018`\n11. [Food Discovery with Uber Eats: Building a Query Understanding Engine](https:\u002F\u002Feng.uber.com\u002Fuber-eats-query-understanding\u002F) `Uber` `2018`\n12. [Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F1805.08524) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1805.08524.pdf)) `Alibaba` `2018`\n13. [Reinforcement Learning to Rank in E-Commerce Search Engine](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.00710) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.00710.pdf)) `Alibaba` `2018`\n14. [Semantic Product Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.00937) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.00937.pdf)) `Amazon` `2019`\n15. [Machine Learning-Powered Search Ranking of Airbnb Experiences](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fmachine-learning-powered-search-ranking-of-airbnb-experiences-110b4b1a0789) `Airbnb` `2019`\n16. 
[Entity Personalized Talent Search Models with Tree Interaction Features](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.09041) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1902.09041.pdf)) `LinkedIn` `2019`\n17. [The AI Behind LinkedIn Recruiter Search and recommendation systems](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2019\u002F04\u002Fai-behind-linkedin-recruiter-search-and-recommendation-systems) `LinkedIn` `2019`\n18. [Learning Hiring Preferences: The AI Behind LinkedIn Jobs](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2019\u002F02\u002Flearning-hiring-preferences--the-ai-behind-linkedin-jobs) `LinkedIn` `2019`\n19. [The Secret Sauce Behind Search Personalisation](https:\u002F\u002Fwww.gojek.io\u002Fblog\u002Fthe-secret-sauce-behind-search-personalisation) `Gojek` `2019`\n20. [Neural Code Search: ML-based Code Search Using Natural Language Queries](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fneural-code-search-ml-based-code-search-using-natural-language-queries\u002F) `Facebook` `2019`\n21. [Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.08882) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1902.08882.pdf)) `Alibaba` `2019`\n22. [Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3357384.3357809) `Alibaba` `2019`\n23. [Understanding Searches Better Than Ever Before](https:\u002F\u002Fwww.blog.google\u002Fproducts\u002Fsearch\u002Fsearch-language-understanding-bert\u002F) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.04805.pdf)) `Google` `2019`\n24. [How We Used Semantic Search to Make Our Search 10x Smarter](https:\u002F\u002Fmedium.com\u002Ftokopedia-engineering\u002Fhow-we-used-semantic-search-to-make-our-search-10x-smarter-bd9c7f601821) `Tokopedia` `2019`\n25. 
[Query2vec: Search query expansion with query embeddings](https:\u002F\u002Fbytes.grubhub.com\u002Fsearch-query-embeddings-using-query2vec-f5931df27d79) `GrubHub` `2019`\n26. [MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search](http:\u002F\u002Fresearch.baidu.com\u002FPublic\u002Fuploads\u002F5d12eca098d40.pdf) `Baidu` `2019`\n27. [Why Do People Buy Seemingly Irrelevant Items in Voice Product Search?](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fwhy-do-people-buy-irrelevant-items-in-voice-product-search) ([Paper](https:\u002F\u002Fassets.amazon.science\u002Ff7\u002F48\u002F0562b2c14338a0b76ccf4f523fa5\u002Fwhy-do-people-buy-irrelevant-items-in-voice-product-search.pdf)) `Amazon` `2020`\n28. [Managing Diversity in Airbnb Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.02621) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2004.02621.pdf)) `Airbnb` `2020`\n29. [Improving Deep Learning for Airbnb Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.05515) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2002.05515.pdf)) `Airbnb` `2020`\n30. [Quality Matches Via Personalized AI for Hirer and Seeker Preferences](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fquality-matches-via-personalized-ai) `LinkedIn` `2020`\n31. [Understanding Dwell Time to Improve LinkedIn Feed Ranking](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Funderstanding-feed-dwell-time) `LinkedIn` `2020`\n32. [Ads Allocation in Feed via Constrained Optimization](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3394486.3403391) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3394486.3403391), [Video](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f33697a0576dd25aef288ea\u002F)) `LinkedIn` `2020`\n33. 
[AI at Scale in Bing](https:\u002F\u002Fblogs.bing.com\u002Fsearch\u002F2020_05\u002FAI-at-Scale-in-Bing) `Microsoft` `2020`\n35. [Query Understanding Engine in Traveloka Universal Search](https:\u002F\u002Fmedium.com\u002Ftraveloka-engineering\u002Fquery-understanding-engine-in-traveloka-universal-search-410ad3895db7) `Traveloka` `2020`\n36. [Bayesian Product Ranking at Wayfair](https:\u002F\u002Ftech.wayfair.com\u002Fdata-science\u002F2020\u002F01\u002Fbayesian-product-ranking-at-wayfair) `Wayfair` `2020`\n37. [COLD: Towards the Next Generation of Pre-Ranking System](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.16122) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.16122.pdf)) `Alibaba` `2020`\n38. [Shop The Look: Building a Large Scale Visual Shopping System at Pinterest](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3394486.3403372) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3394486.3403372), [Video](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f3369790576dd25aef288d7\u002F)) `Pinterest` `2020`\n39. [Driving Shopping Upsells from Pinterest Search](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fdriving-shopping-upsells-from-pinterest-search-d06329255402) `Pinterest` `2020`\n40. [GDMix: A Deep Ranking Personalization Framework](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fgdmix--a-deep-ranking-personalization-framework) ([Code](https:\u002F\u002Fgithub.com\u002Flinkedin\u002Fgdmix)) `LinkedIn` `2020`\n41. [Bringing Personalized Search to Etsy](https:\u002F\u002Fcodeascraft.com\u002F2020\u002F10\u002F29\u002Fbringing-personalized-search-to-etsy\u002F) `Etsy` `2020`\n42. 
[Building a Better Search Engine for Semantic Scholar](https:\u002F\u002Fmedium.com\u002Fai2-blog\u002Fbuilding-a-better-search-engine-for-semantic-scholar-ea23a0b661e7) `Allen Institute for AI` `2020`\n43. [Query Understanding for Natural Language Enterprise Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F2012.06238) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2012.06238.pdf)) `Salesforce` `2020`\n44. [Things Not Strings: Understanding Search Intent with Better Recall](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F12\u002F15\u002Funderstanding-search-intent-with-better-recall\u002F) `DoorDash` `2020`\n45. [Query Understanding for Surfacing Under-served Music Content](https:\u002F\u002Fresearch.atspotify.com\u002Fpublications\u002Fquery-understanding-for-surfacing-under-served-music-content\u002F) ([Paper](https:\u002F\u002Flabtomarket.files.wordpress.com\u002F2020\u002F08\u002Fcikm2020.pdf)) `Spotify` `2020`\n46. [Embedding-based Retrieval in Facebook Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.11632) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.11632.pdf)) `Facebook` `2020`\n47. [Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.02282) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.02282.pdf)) `JD` `2020`\n48. [QUEEN: Neural query rewriting in e-commerce](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fqueen-neural-query-rewriting-in-e-commerce) ([Paper](https:\u002F\u002Fassets.amazon.science\u002Ff9\u002F78\u002Fdda8f1e143dba8ca96e43ec487c6\u002Fqueen-neural-query-rewriting-in-ecommerce.pdf)) `Amazon` `2021`\n49. 
[Using Learning-to-rank to Precisely Locate Where to Deliver Packages](https:\u002F\u002Fwww.amazon.science\u002Fblog\u002Fusing-learning-to-rank-to-precisely-locate-where-to-deliver-packages) ([Paper](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fgetting-your-package-to-the-right-place-supervised-machine-learning-for-geolocation)) `Amazon` `2021`\n50. [Seasonal relevance in e-commerce search](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fseasonal-relevance-in-e-commerce-search) ([Paper](https:\u002F\u002Fassets.amazon.science\u002Fac\u002F5e\u002Fd47612a846d6bec15738d7c8ab40\u002Fseasonal-relevance-in-ecommerce-search.pdf)) `Amazon` `2021`\n51. [Graph Intention Network for Click-through Rate Prediction in Sponsored Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.16164) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.16164.pdf)) `Alibaba` `2021`\n52. [How We Built A Context-Specific Bidding System for Etsy Ads](https:\u002F\u002Fcodeascraft.com\u002F2021\u002F03\u002F23\u002Fhow-we-built-a-context-specific-bidding-system-for-etsy-ads\u002F) `Etsy` `2021`\n53. [Pre-trained Language Model based Ranking in Baidu Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.11108) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2105.11108.pdf)) `Baidu` `2021`\n54. [Stitching together spaces for query-based recommendations](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2021\u002F08\u002F13\u002Fstitching-together-spaces-for-query-based-recommendations\u002F) `Stitch Fix` `2021`\n55. [Deep Natural Language Processing for LinkedIn Search Systems](https:\u002F\u002Farxiv.org\u002Fabs\u002F2108.08252) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2108.08252.pdf)) `LinkedIn` `2021`\n56. 
[Siamese BERT-based Model for Web Search Relevance Ranking](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.01810) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2112.01810.pdf), [Code](https:\u002F\u002Fgithub.com\u002Fseznam\u002FDaReCzech)) `Seznam` `2021`\n57. [SearchSage: Learning Search Query Representations at Pinterest](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fsearchsage-learning-search-query-representations-at-pinterest-654f2bb887fc) `Pinterest` `2021`\n58. [Query2Prod2Vec: Grounded Word Embeddings for eCommerce](https:\u002F\u002Faclanthology.org\u002F2021.naacl-industry.20\u002F) `Coveo` `2021`\n59. [3 Changes to Expand DoorDash’s Product Search Beyond Delivery](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F05\u002F10\u002F3-changes-to-expand-doordashs-product-search\u002F) `DoorDash` `2022`\n60. [Learning To Rank Diversely](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Flearning-to-rank-diversely-add6b1929621) `Airbnb` `2022`\n61. [How to Optimise Rankings with Cascade Bandits](https:\u002F\u002Fmedium.com\u002Fexpedia-group-tech\u002Fhow-to-optimise-rankings-with-cascade-bandits-5d92dfa0f16b) `Expedia` `2022`\n62. [A Guide to Google Search Ranking Systems](https:\u002F\u002Fdevelopers.google.com\u002Fsearch\u002Fdocs\u002Fappearance\u002Franking-systems-guide) `Google` `2022` \n63. [Deep Learning for Search Ranking at Etsy](https:\u002F\u002Fwww.etsy.com\u002Fcodeascraft\u002Fdeep-learning-for-search-ranking-at-etsy) `Etsy` `2022`\n64. [Search at Calm](https:\u002F\u002Feng.calm.com\u002Fposts\u002Fsearch-at-calm) `Calm` `2022`\n\n## Embeddings\n1. [Vector Representation Of Items, Customer And Cart To Build A Recommendation System](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.06338) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1705.06338.pdf)) `Sears` `2017`\n2. 
[Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.02349) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.02349.pdf)) `Alibaba` `2018`\n3. [Embeddings@Twitter](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2018\u002Fembeddingsattwitter.html) `Twitter` `2018`\n4. [Listing Embeddings in Search Ranking](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Flisting-embeddings-for-similar-listing-recommendations-and-real-time-personalization-in-search-601172f7603e) ([Paper](https:\u002F\u002Fwww.kdd.org\u002Fkdd2018\u002Faccepted-papers\u002Fview\u002Freal-time-personalization-using-embeddings-for-search-ranking-at-airbnb)) `Airbnb` `2018`\n5. [Understanding Latent Style](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2018\u002F06\u002F28\u002Flatent-style\u002F) `Stitch Fix` `2018`\n6. [Towards Deep and Representation Learning for Talent Search at LinkedIn](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.06473) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1809.06473.pdf)) `LinkedIn` `2018`\n7. [Personalized Store Feed with Vector Embeddings](https:\u002F\u002Fdoordash.engineering\u002F2018\u002F04\u002F02\u002Fpersonalized-store-feed-with-vector-embeddings\u002F) `DoorDash` `2018`\n8. [Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations](https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.06556) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.06556.pdf)) `Moshbit` `2019`\n9. [Machine Learning for a Better Developer Experience](https:\u002F\u002Fnetflixtechblog.com\u002Fmachine-learning-for-a-better-developer-experience-1e600c69f36c) `Netflix` `2020`\n10. 
[Announcing ScaNN: Efficient Vector Similarity Search](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F07\u002Fannouncing-scann-efficient-vector.html) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1908.10396.pdf), [Code](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Ftree\u002Fmaster\u002Fscann)) `Google` `2020`\n11. [BERT Goes Shopping: Comparing Distributional Models for Product Representations](https:\u002F\u002Faclanthology.org\u002F2021.ecnlp-1.1\u002F) `Coveo` `2021`\n12. [The Embeddings That Came in From the Cold: Improving Vectors for New and Rare Products with Content-Based Inference](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3383313.3411477) `Coveo` `2022`\n13. [Embedding-based Retrieval at Scribd](https:\u002F\u002Ftech.scribd.com\u002Fblog\u002F2021\u002Fembedding-based-retrieval-scribd.html) `Scribd` `2021`\n14. [Multi-objective Hyper-parameter Optimization of Behavioral Song Embeddings](https:\u002F\u002Farxiv.org\u002Fabs\u002F2208.12724) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2208.12724.pdf)) `Apple` `2022`\n15. [Embeddings at Spotify's Scale - How Hard Could It Be?](https:\u002F\u002Farize.com\u002Fresource\u002Fembeddings-at-scale-spotify-recsys\u002F) `Spotify` `2023`\n\n## Natural Language Processing\n1. [Abusive Language Detection in Online User Content](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F2872427.2883062) ([Paper](http:\u002F\u002Fwww.yichang-cs.com\u002Fyahoo\u002FWWW16_Abusivedetection.pdf)) `Yahoo` `2016`\n2. [Smart Reply: Automated Response Suggestion for Email](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub45189\u002F) ([Paper](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F45189.pdf)) `Google` `2016` \n3. [Building Smart Replies for Member Messages](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2017\u002F10\u002Fbuilding-smart-replies-for-member-messages) `LinkedIn` `2017`\n4. 
[How Natural Language Processing Helps LinkedIn Members Get Support Easily](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2019\u002F04\u002Fhow-natural-language-processing-help-support) `LinkedIn` `2019`\n5. [Gmail Smart Compose: Real-Time Assisted Writing](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.00080) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.00080.pdf)) `Google` `2019`\n6. [Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fgoal-oriented-end-to-end-chatbots-with-profile-features-in-a-real-world-setting) ([Paper](https:\u002F\u002Fassets.amazon.science\u002F47\u002F03\u002Fe0d14dc34d3eb6e0d4ec282067bd\u002Fgoal-oriented-end-to-end-chatbots-with-profile-features-in-a-real-world-setting.pdf)) `Amazon` `2019`\n7. [Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2019\u002F07\u002F15\u002Fgive-me-jeans\u002F) `Stitch Fix` `2019`\n8. [DeText: A deep NLP Framework for Intelligent Text Understanding](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fopen-sourcing-detext) ([Code](https:\u002F\u002Fgithub.com\u002Flinkedin\u002Fdetext)) `LinkedIn` `2020`\n9. [SmartReply for YouTube Creators](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F07\u002Fsmartreply-for-youtube-creators.html) `Google` `2020`\n10. [Using Neural Networks to Find Answers in Tables](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F04\u002Fusing-neural-networks-to-find-answers.html) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2004.02349.pdf)) `Google` `2020`\n11. [A Scalable Approach to Reducing Gender Bias in Google Translate](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F04\u002Fa-scalable-approach-to-reducing-gender.html) `Google` `2020`\n12. 
[Assistive AI Makes Replying Easier](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fgroup\u002Fmsai\u002Farticles\u002Fassistive-ai-makes-replying-easier-2\u002F) `Microsoft` `2020`\n13. [AI Advances to Better Detect Hate Speech](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fai-advances-to-better-detect-hate-speech\u002F) `Facebook` `2020`\n14. [A State-of-the-Art Open Source Chatbot](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fstate-of-the-art-open-source-chatbot) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2004.13637.pdf)) `Facebook` `2020`\n15. [A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fa-highly-efficient-real-time-text-to-speech-system-deployed-on-cpus\u002F) `Facebook` `2020`\n16. [Deep Learning to Translate Between Programming Languages](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fdeep-learning-to-translate-between-programming-languages\u002F) ([Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.03511), [Code](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FTransCoder)) `Facebook` `2020`\n17. [Deploying Lifelong Open-Domain Dialogue Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2008.08076) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.08076.pdf)) `Facebook` `2020`\n18. [Introducing Dynabench: Rethinking the way we benchmark AI](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fdynabench-rethinking-ai-benchmarking\u002F) `Facebook` `2020`\n19. [How Gojek Uses NLP to Name Pickup Locations at Scale](https:\u002F\u002Fwww.gojek.io\u002Fblog\u002Fnlp-cartobert) `Gojek` `2020`\n20. [The State-of-the-art Open-Domain Chatbot in Chinese and English](http:\u002F\u002Fresearch.baidu.com\u002FBlog\u002Findex-view?id=142) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.16779.pdf)) `Baidu` `2020`\n21. 
[PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F06\u002Fpegasus-state-of-art-model-for.html) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1912.08777.pdf), [Code](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fpegasus)) `Google` `2020`\n22. [Photon: A Robust Cross-Domain Text-to-SQL System](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.acl-demos.24\u002F) ([Paper](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.acl-demos.24.pdf)) ([Demo](http:\u002F\u002Fnaturalsql.com)) `Salesforce` `2020`\n23. [GeDi: A Powerful New Method for Controlling Language Models](https:\u002F\u002Fblog.einstein.ai\u002Fgedi\u002F) ([Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2009.06367), [Code](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002FGeDi)) `Salesforce` `2020`\n24. [Applying Topic Modeling to Improve Call Center Operations](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=kzRR8OjF_eI&t=2s) `RICOH` `2020`\n25. [WIDeText: A Multimodal Deep Learning Framework](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fwidetext-a-multimodal-deep-learning-framework-31ce2565880c) `Airbnb` `2020`\n26. [Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fdynaboard-moving-beyond-accuracy-to-holistic-model-evaluation-in-nlp) ([Code](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdynalab)) `Facebook` `2021`\n27. [How we reduced our text similarity runtime by 99.96%](https:\u002F\u002Fmedium.com\u002Fdata-science-at-microsoft\u002Fhow-we-reduced-our-text-similarity-runtime-by-99-96-e8e4b4426b35) `Microsoft` `2021`\n28. 
[Textless NLP: Generating expressive speech from raw audio](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Ftextless-nlp-generating-expressive-speech-from-raw-audio\u002F) [(Part 1)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.01192) [(Part 2)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.00355) [(Part 3)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.03264) [(Code and Pretrained Models)](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\u002Ftree\u002Fmaster\u002Fexamples\u002Ftextless_nlp) `Facebook` `2021`\n29. [Grammar Correction as You Type, on Pixel 6](https:\u002F\u002Fai.googleblog.com\u002F2021\u002F10\u002Fgrammar-correction-as-you-type-on-pixel.html) `Google` `2021`\n30. [Auto-generated Summaries in Google Docs](https:\u002F\u002Fai.googleblog.com\u002F2022\u002F03\u002Fauto-generated-summaries-in-google-docs.html) `Google` `2022`\n31. [ML-Enhanced Code Completion Improves Developer Productivity](https:\u002F\u002Fai.googleblog.com\u002F2022\u002F07\u002Fml-enhanced-code-completion-improves.html) `Google` `2022`\n32. [Words All the Way Down — Conversational Sentiment Analysis](https:\u002F\u002Fmedium.com\u002Fpaypal-tech\u002Fwords-all-the-way-down-conversational-sentiment-analysis-afe0165b84db) `PayPal` `2022`\n\n## Sequence Modelling\n1. [Doctor AI: Predicting Clinical Events via Recurrent Neural Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.05942) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.05942.pdf)) `Sutter Health` `2015`\n2. [Deep Learning for Understanding Consumer Histories](https:\u002F\u002Fengineering.zalando.com\u002Fposts\u002F2016\u002F10\u002Fdeep-learning-for-understanding-consumer-histories.html) ([Paper](https:\u002F\u002Fdoogkong.github.io\u002F2017\u002Fpapers\u002Fpaper2.pdf)) `Zalando` `2016`\n3. 
[Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC5391725\u002F) ([Paper](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC5391725\u002Fpdf\u002Focw112.pdf)) `Sutter Health` `2016`\n4. [Continual Prediction of Notification Attendance with Classical and Deep Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.07120) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.07120.pdf)) `Telefonica` `2017`\n5. [Deep Learning for Electronic Health Records](https:\u002F\u002Fai.googleblog.com\u002F2018\u002F05\u002Fdeep-learning-for-electronic-health.html) ([Paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-018-0029-1.pdf)) `Google` `2018`\n6. [Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.09248) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.09248.pdf)) `Alibaba` `2019`\n7. [Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.05639) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.05639.pdf)) `Alibaba` `2020`\n8. [How Duolingo uses AI in every part of its app](https:\u002F\u002Fventurebeat.com\u002F2020\u002F08\u002F18\u002Fhow-duolingo-uses-ai-in-every-part-of-its-app\u002F) `Duolingo` `2020`\n9. [Leveraging Online Social Interactions For Enhancing Integrity at Facebook](https:\u002F\u002Fresearch.fb.com\u002Fblog\u002F2020\u002F08\u002Fleveraging-online-social-interactions-for-enhancing-integrity-at-facebook\u002F) ([Paper](https:\u002F\u002Fresearch.fb.com\u002Fwp-content\u002Fuploads\u002F2020\u002F08\u002FTIES-Temporal-Interaction-Embeddings-For-Enhancing-Social-Media-Integrity-At-Facebook.pdf), [Video](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f3369780576dd25aef288cf\u002F)) `Facebook` `2020`\n10. 
[Using deep learning to detect abusive sequences of member activity](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2021\u002Fusing-deep-learning-to-detect-abusive-sequences-of-member-activi) ([Video](https:\u002F\u002Fexchange.scale.com\u002Fpublic\u002Fvideos\u002Fusing-deep-learning-to-detect-abusive-sequences-of-member-activity-on-linkedin)) `LinkedIn` `2021`\n\n## Computer Vision\n1. [Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning](https:\u002F\u002Fdropbox.tech\u002Fmachine-learning\u002Fcreating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning) `Dropbox` `2017`\n2. [Categorizing Listing Photos at Airbnb](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fcategorizing-listing-photos-at-airbnb-f9483f3ab7e3) `Airbnb` `2018`\n3. [Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Famenity-detection-and-beyond-new-frontiers-of-computer-vision-at-airbnb-144a4441b72e) `Airbnb` `2019`\n4. [How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors](https:\u002F\u002Fdeepomatic.com\u002Fen\u002Fhow-we-improved-computer-vision-metrics-by-more-than-5-percent-only-by-cleaning-labelling-errors\u002F) `Deepomatic`\n5. [Making machines recognize and transcribe conversations in meetings using audio and video](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fblog\u002Fmaking-machines-recognize-and-transcribe-conversations-in-meetings-using-audio-and-video\u002F) `Microsoft` `2019`\n6. [Powered by AI: Advancing product understanding and building new shopping experiences](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fpowered-by-ai-advancing-product-understanding-and-building-new-shopping-experiences\u002F) `Facebook` `2020`\n7. 
[A Neural Weather Model for Eight-Hour Precipitation Forecasting](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F03\u002Fa-neural-weather-model-for-eight-hour.html) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2003.12140.pdf)) `Google` `2020`\n8. [Machine Learning-based Damage Assessment for Disaster Relief](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F06\u002Fmachine-learning-based-damage.html) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.06444.pdf)) `Google` `2020`\n9. [RepNet: Counting Repetitions in Videos](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F06\u002Frepnet-counting-repetitions-in-videos.html) ([Paper](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2020\u002Fpapers\u002FDwibedi_Counting_Out_Time_Class_Agnostic_Video_Repetition_Counting_in_the_CVPR_2020_paper.pdf)) `Google` `2020`\n10. [Converting Text to Images for Product Discovery](https:\u002F\u002Fwww.amazon.science\u002Fblog\u002Fconverting-text-to-images-for-product-discovery) ([Paper](https:\u002F\u002Fassets.amazon.science\u002F4c\u002F76\u002F5830542547b7a11089ce3af943b4\u002Fscipub-972.pdf)) `Amazon` `2020`\n11. [How Disney Uses PyTorch for Animated Character Recognition](https:\u002F\u002Fmedium.com\u002Fpytorch\u002Fhow-disney-uses-pytorch-for-animated-character-recognition-a1722a182627) `Disney` `2020`\n12. [Image Captioning as an Assistive Technology](https:\u002F\u002Fwww.ibm.com\u002Fblogs\u002Fresearch\u002F2020\u002F07\u002Fimage-captioning-assistive-technology\u002F) ([Video](https:\u002F\u002Fivc.ischool.utexas.edu\u002F~yz9244\u002FVizWiz_workshop\u002Fvideos\u002FMMTeam-oral.mp4)) `IBM` `2020`\n13. [AI for AG: Production machine learning for agriculture](https:\u002F\u002Fmedium.com\u002Fpytorch\u002Fai-for-ag-production-machine-learning-for-agriculture-e8cfdb9849a1) `Blue River` `2020`\n14. [AI for Full-Self Driving at Tesla](https:\u002F\u002Fyoutu.be\u002Fhx7BXih7zx8?t=513) `Tesla` `2020`\n15. 
[On-device Supermarket Product Recognition](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F07\u002Fon-device-supermarket-product.html) `Google` `2020`\n16. [Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F08\u002Fusing-machine-learning-to-detect.html) ([Paper](https:\u002F\u002Fieeexplore.ieee.org\u002Fstamp\u002Fstamp.jsp?tp=&arnumber=9097918)) `Google` `2020`\n17. [Shop The Look: Building a Large Scale Visual Shopping System at Pinterest](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3394486.3403372) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3394486.3403372), [Video](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f3369790576dd25aef288d7\u002F)) `Pinterest` `2020`\n18. [Developing Real-Time, Automatic Sign Language Detection for Video Conferencing](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F10\u002Fdeveloping-real-time-automatic-sign.html) ([Paper](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F2eaf0d18ec6bef00d7dd88f39dd4f9ff13eeeeb2.pdf)) `Google` `2020`\n19. [Vision-based Price Suggestion for Online Second-hand Items](https:\u002F\u002Farxiv.org\u002Fabs\u002F2012.06009) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2012.06009.pdf)) `Alibaba` `2020`\n20. [New AI Research to Help Predict COVID-19 Resource Needs From X-rays](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fnew-ai-research-to-help-predict-covid-19-resource-needs-from-a-series-of-x-rays\u002F) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2101.04909.pdf), [Model](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FCovidPrognosis)) `Facebook` `2021`\n21. [An Efficient Training Approach for Very Large Scale Face Recognition](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.10375) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2105.10375)) `Alibaba` `2021`\n22. 
[Identifying Document Types at Scribd](https:\u002F\u002Ftech.scribd.com\u002Fblog\u002F2021\u002Fidentifying-document-types.html) `Scribd` `2021`\n23. [Semi-Supervised Visual Representation Learning for Fashion Compatibility](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2109.08052.pdf) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2109.08052.pdf)) `Walmart` `2021`\n24. [Recognizing People in Photos Through Private On-Device Machine Learning](https:\u002F\u002Fmachinelearning.apple.com\u002Fresearch\u002Frecognizing-people-photos) `Apple` `2021`\n25. [DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.08195.pdf) `Google` `2022`\n26. [Contrastive language and vision learning of general fashion concepts](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-022-23052-9) ([Paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-022-23052-9.pdf)) `Coveo` `2022`\n27. [Leveraging Computer Vision for Search Ranking](https:\u002F\u002Farize.com\u002Fresource\u002Fbazaarvoice-leveraging-computer-vision-models-for-search-ranking\u002F) `BazaarVoice` `2023`\n\n## Reinforcement Learning\n1. [Deep Reinforcement Learning for Sponsored Search Real-time Bidding](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.00259) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.00259.pdf)) `Alibaba` `2018`\n2. [Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.08365) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.08365.pdf)) `Alibaba` `2018`\n3. [Reinforcement Learning for On-Demand Logistics](https:\u002F\u002Fdoordash.engineering\u002F2018\u002F09\u002F10\u002Freinforcement-learning-for-on-demand-logistics\u002F) `DoorDash` `2018`\n4. 
[Reinforcement Learning to Rank in E-Commerce Search Engine](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.00710) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.00710.pdf)) `Alibaba` `2018`\n5. [Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1912.02572) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1912.02572.pdf)) `Alibaba` `2019`\n6. [Productionizing Deep Reinforcement Learning with Spark and MLflow](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=hy-w69zf4oo) `Zynga` `2020`\n7. [Deep Reinforcement Learning in Production Part 1](https:\u002F\u002Ftowardsdatascience.com\u002Fdeep-reinforcement-learning-in-production-7e1e63471e2) [Part 2](https:\u002F\u002Ftowardsdatascience.com\u002Fdeep-reinforcement-learning-in-production-part-2-personalizing-user-notifications-812a68ce2355) `Zynga` `2020`\n8. [Building AI Trading Systems](https:\u002F\u002Fdennybritz.com\u002Fblog\u002Fai-trading\u002F) `Denny Britz` `2020`\n9. [Shifting Consumption towards Diverse content via Reinforcement Learning](https:\u002F\u002Fresearch.atspotify.com\u002Fshifting-consumption-towards-diverse-content-via-reinforcement-learning\u002F) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3437963.3441775)) `Spotify` `2022`\n10. [Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.06516) `Meta` `2022`\n11. [How to Optimise Rankings with Cascade Bandits](https:\u002F\u002Fmedium.com\u002Fexpedia-group-tech\u002Fhow-to-optimise-rankings-with-cascade-bandits-5d92dfa0f16b) `Expedia` `2022`\n12. [Selecting the Best Image for Each Merchant Using Exploration and Machine Learning](https:\u002F\u002Fdoordash.engineering\u002F2023\u002F01\u002F04\u002Fselecting-the-best-image-for-each-merchant-using-exploration-and-machine-learning\u002F) `DoorDash` `2023`\n\n## Anomaly Detection\n1. 
[Detecting Performance Anomalies in External Firmware Deployments](https:\u002F\u002Fnetflixtechblog.com\u002Fdetecting-performance-anomalies-in-external-firmware-deployments-ed41b1bfcf46) `Netflix` `2019`\n2. [Detecting and Preventing Abuse on LinkedIn using Isolation Forests](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2019\u002Fisolation-forest) ([Code](https:\u002F\u002Fgithub.com\u002Flinkedin\u002Fisolation-forest)) `LinkedIn` `2019`\n3. [Deep Anomaly Detection with Spark and Tensorflow](https:\u002F\u002Fdatabricks.com\u002Fsession_eu19\u002Fdeep-anomaly-detection-from-research-to-production-leveraging-spark-and-tensorflow) ([Hopsworks Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=TgXVU8DSyCQ)) `Swedbank`, `Hopsworks` `2019`\n4. [Preventing Abuse Using Unsupervised Learning](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=sFRrFWYNAUI) `LinkedIn` `2020`\n5. [The Technology Behind Fighting Harassment on LinkedIn](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Ffighting-harassment) `LinkedIn` `2020`\n6. [Uncovering Insurance Fraud Conspiracy with Network Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.12789) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2002.12789.pdf)) `Ant Financial` `2020`\n7. [How Does Spam Protection Work on Stack Exchange?](https:\u002F\u002Fstackoverflow.blog\u002F2020\u002F06\u002F25\u002Fhow-does-spam-protection-work-on-stack-exchange\u002F) `Stack Exchange` `2020`\n8. [Auto Content Moderation in C2C e-Commerce](https:\u002F\u002Fwww.usenix.org\u002Fconference\u002Fopml20\u002Fpresentation\u002Fueta) `Mercari` `2020`\n9. [Blocking Slack Invite Spam With Machine Learning](https:\u002F\u002Fslack.engineering\u002Fblocking-slack-invite-spam-with-machine-learning\u002F) `Slack` `2020`\n10. [Cloudflare Bot Management: Machine Learning and More](https:\u002F\u002Fblog.cloudflare.com\u002Fcloudflare-bot-management-machine-learning-and-more\u002F) `Cloudflare` `2020`\n11. 
[Anomalies in Oil Temperature Variations in a Tunnel Boring Machine](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=YV_uLLhPRAk) `SENER` `2020`\n12. [Using Anomaly Detection to Monitor Low-Risk Bank Customers](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=MExokMM_Bp4&t=3s) `Rabobank` `2020`\n13. [Fighting fraud with Triplet Loss](https:\u002F\u002Ftech.olx.com\u002Ffighting-fraud-with-triplet-loss-86e5f79c7a3e) `OLX Group` `2020`\n14. [Facebook is Now Using AI to Sort Content for Quicker Moderation](https:\u002F\u002Fwww.theverge.com\u002F2020\u002F11\u002F13\u002F21562596\u002Ffacebook-ai-moderation) ([Alternative](https:\u002F\u002Fventurebeat.com\u002F2020\u002F11\u002F13\u002Ffacebooks-redoubled-ai-efforts-wont-stop-the-spread-of-harmful-content\u002F)) `Facebook` `2020`\n15. How AI is getting better at detecting hate speech [Part 1](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fhow-ai-is-getting-better-at-detecting-hate-speech\u002F), [Part 2](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fheres-how-were-using-ai-to-help-detect-misinformation\u002F), [Part 3](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Ftraining-ai-to-detect-hate-speech-in-the-real-world\u002F), [Part 4](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fhow-facebook-uses-super-efficient-ai-models-to-detect-hate-speech\u002F) `Facebook` `2020`\n16. [Using deep learning to detect abusive sequences of member activity](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2021\u002Fusing-deep-learning-to-detect-abusive-sequences-of-member-activi) ([Video](https:\u002F\u002Fexchange.scale.com\u002Fpublic\u002Fvideos\u002Fusing-deep-learning-to-detect-abusive-sequences-of-member-activity-on-linkedin)) `LinkedIn` `2021`\n17. [Project RADAR: Intelligent Early Fraud Detection System with Humans in the Loop](https:\u002F\u002Feng.uber.com\u002Fproject-radar-intelligent-early-fraud-detection\u002F) `Uber` `2022`\n18. 
[Graph for Fraud Detection](https:\u002F\u002Fengineering.grab.com\u002Fgraph-for-fraud-detection) `Grab` `2022`\n19. [Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.06516) `Meta` `2022`\n20. [Evolving our machine learning to stop mobile bots](https:\u002F\u002Fblog.cloudflare.com\u002Fmachine-learning-mobile-traffic-bots\u002F) `Cloudflare` `2022`\n21. [Improving the accuracy of our machine learning WAF using data augmentation and sampling](https:\u002F\u002Fblog.cloudflare.com\u002Fdata-generation-and-sampling-strategies\u002F) `Cloudflare` `2022`\n22. [Machine Learning for Fraud Detection in Streaming Services](https:\u002F\u002Fnetflixtechblog.com\u002Fmachine-learning-for-fraud-detection-in-streaming-services-b0b4ef3be3f6) `Netflix` `2022`\n23. [Pricing at Lyft](https:\u002F\u002Feng.lyft.com\u002Fpricing-at-lyft-8a4022065f8b) `Lyft` `2022`\n\n## Graph\n1. [Building The LinkedIn Knowledge Graph](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2016\u002F10\u002Fbuilding-the-linkedin-knowledge-graph) `LinkedIn` `2016`\n2. [Scaling Knowledge Access and Retrieval at Airbnb](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fscaling-knowledge-access-and-retrieval-at-airbnb-665b6ba21e95) `Airbnb` `2018`\n3. [Graph Convolutional Neural Networks for Web-Scale Recommender Systems](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.01973) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1806.01973.pdf)) `Pinterest` `2018`\n4. [Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations](https:\u002F\u002Feng.uber.com\u002Fuber-eats-graph-learning\u002F) `Uber` `2019`\n5. [AliGraph: A Comprehensive Graph Neural Network Platform](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.08730) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1902.08730.pdf)) `Alibaba` `2019`\n6. 
[Contextualizing Airbnb by Building Knowledge Graph](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fcontextualizing-airbnb-by-building-knowledge-graph-b7077e268d5a) `Airbnb` `2019`\n7. [Retail Graph — Walmart’s Product Knowledge Graph](https:\u002F\u002Fmedium.com\u002Fwalmartlabs\u002Fretail-graph-walmarts-product-knowledge-graph-6ef7357963bc) `Walmart` `2020`\n8. [Traffic Prediction with Advanced Graph Neural Networks](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Ftraffic-prediction-with-advanced-graph-neural-networks) `DeepMind` `2020`\n9. [SimClusters: Community-Based Representations for Recommendations](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3394486.3403370) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3394486.3403370), [Video](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f3369790576dd25aef288d5\u002F)) `Twitter` `2020`\n10. [Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.06474) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.06474.pdf)) `Alibaba` `2021`\n11. [Graph Intention Network for Click-through Rate Prediction in Sponsored Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.16164) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.16164.pdf)) `Alibaba` `2021`\n12. [JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F17796) ([Paper](https:\u002F\u002Fwww.aaai.org\u002FAAAI21Papers\u002FIAAI-21.DingW.pdf)) `JPMorgan Chase` `2021`\n13. [How AWS uses graph neural networks to meet customer needs](https:\u002F\u002Fwww.amazon.science\u002Fblog\u002Fhow-aws-uses-graph-neural-networks-to-meet-customer-needs) `Amazon` `2022`\n14. [Graph for Fraud Detection](https:\u002F\u002Fengineering.grab.com\u002Fgraph-for-fraud-detection) `Grab` `2022`\n\n## Optimization\n1. 
[Matchmaking in Lyft Line (Part 1)](https:\u002F\u002Feng.lyft.com\u002Fmatchmaking-in-lyft-line-9c2635fe62c4) [(Part 2)](https:\u002F\u002Feng.lyft.com\u002Fmatchmaking-in-lyft-line-691a1a32a008) [(Part 3)](https:\u002F\u002Feng.lyft.com\u002Fmatchmaking-in-lyft-line-part-3-d8f9497c0e51) `Lyft` `2016`\n2. [The Data and Science behind GrabShare Carpooling](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F8259801) [(Part 1)](https:\u002F\u002Fengineering.grab.com\u002Fthe-data-and-science-behind-grabshare-part-i) (**PAPER NEEDED**) `Grab` `2017`\n3. [How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats](https:\u002F\u002Feng.uber.com\u002Fuber-eats-trip-optimization\u002F) `Uber` `2018`\n4. [Next-Generation Optimization for Dasher Dispatch at DoorDash](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F02\u002F28\u002Fnext-generation-optimization-for-dasher-dispatch-at-doordash\u002F) `DoorDash` `2020` \n5. [Optimization of Passengers Waiting Time in Elevators Using Machine Learning](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=vXndCC89BCw&t=4s) `Thyssen Krupp AG` `2020`\n6. [Think Out of The Package: Recommending Package Types for E-commerce Shipments](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fthink-out-of-the-package-recommending-package-types-for-e-commerce-shipments) ([Paper](https:\u002F\u002Fassets.amazon.science\u002F0c\u002F6c\u002F9d0986b94bef92d148f0ac0da1ea\u002Fthink-out-of-the-package-recommending-package-types-for-e-commerce-shipments.pdf)) `Amazon` `2020`\n7. [Optimizing DoorDash’s Marketing Spend with Machine Learning](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F07\u002F31\u002Foptimizing-marketing-spend-with-ml\u002F) `DoorDash` `2020`\n8. 
[Using learning-to-rank to precisely locate where to deliver packages](https:\u002F\u002Fwww.amazon.science\u002Fblog\u002Fusing-learning-to-rank-to-precisely-locate-where-to-deliver-packages) ([Paper](https:\u002F\u002Fassets.amazon.science\u002F69\u002F8d\u002F2249945a4e10ba8fc758f7523b0c\u002Fgetting-your-package-to-the-right-place-supervised-machine-learning-for-geolocation.pdf)) `Amazon` `2021`\n\n## Information Extraction\n1. [Unsupervised Extraction of Attributes and Their Values from Product Description](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FI13-1190\u002F) ([Paper](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FI13-1190.pdf)) `Rakuten` `2013`\n2. [Using Machine Learning to Index Text from Billions of Images](https:\u002F\u002Fdropbox.tech\u002Fmachine-learning\u002Fusing-machine-learning-to-index-text-from-billions-of-images) `Dropbox` `2018`\n3. [Extracting Structured Data from Templatic Documents](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F06\u002Fextracting-structured-data-from.html) ([Paper](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FI13-1190.pdf)) `Google` `2020`\n4. [AutoKnow: self-driving knowledge collection for products of thousands of types](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fautoknow-self-driving-knowledge-collection-for-products-of-thousands-of-types) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.13473.pdf), [Video](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f3369730576dd25aef288a6\u002F)) `Amazon` `2020`\n5. [One-shot Text Labeling using Attention and Belief Propagation for Information Extraction](https:\u002F\u002Farxiv.org\u002Fabs\u002F2009.04153) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2009.04153.pdf)) `Alibaba` `2020`\n6. [Information Extraction from Receipts with Graph Convolutional Networks](https:\u002F\u002Fnanonets.com\u002Fblog\u002Finformation-extraction-graph-convolutional-networks\u002F) `Nanonets` `2021`\n\n## Weak Supervision\n1. 
[Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3299869.3314036) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3299869.3314036)) `Google` `2019`\n2. [Osprey: Weak Supervision of Imbalanced Extraction Problems without Code](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3329486.3329492) ([Paper](https:\u002F\u002Fajratner.github.io\u002Fassets\u002Fpapers\u002FOsprey_DEEM.pdf)) `Intel` `2019`\n3. [Overton: A Data System for Monitoring and Improving Machine-Learned Products](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.05372) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.05372.pdf)) `Apple` `2019`\n4. [Bootstrapping Conversational Agents with Weak Supervision](https:\u002F\u002Fwww.aaai.org\u002Fojs\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F5011) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1812.06176.pdf)) `IBM` `2019`\n\n## Generation\n1. [Better Language Models and Their Implications](https:\u002F\u002Fopenai.com\u002Fblog\u002Fbetter-language-models\u002F) ([Paper](https:\u002F\u002Fcdn.openai.com\u002Fbetter-language-models\u002Flanguage_models_are_unsupervised_multitask_learners.pdf)) `OpenAI` `2019`\n2. [Image GPT](https:\u002F\u002Fopenai.com\u002Fblog\u002Fimage-gpt\u002F) ([Paper](https:\u002F\u002Fcdn.openai.com\u002Fpapers\u002FGenerative_Pretraining_from_Pixels_V2.pdf), [Code](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fimage-gpt)) `OpenAI` `2020`\n3. [Language Models are Few-Shot Learners](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.14165) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.14165.pdf)) ([GPT-3 Blog post](https:\u002F\u002Fopenai.com\u002Fblog\u002Fopenai-api\u002F)) `OpenAI` `2020`\n4. 
[Deep Learned Super Resolution for Feature Film Production](https:\u002F\u002Fgraphics.pixar.com\u002Flibrary\u002FSuperResolution\u002F) ([Paper](https:\u002F\u002Fgraphics.pixar.com\u002Flibrary\u002FSuperResolution\u002Fpaper.pdf)) `Pixar` `2020`\n5. [Unit Test Case Generation with Transformers](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2009.05617.pdf) `Microsoft` `2021`\n\n## Audio\n1. [Improving On-Device Speech Recognition with VoiceFilter-Lite](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F11\u002Fimproving-on-device-speech-recognition.html) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2009.04323.pdf)) `Google` `2020`\n2. [The Machine Learning Behind Hum to Search](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F11\u002Fthe-machine-learning-behind-hum-to.html) `Google` `2020`\n\n## Privacy-preserving Machine Learning\n1. [Federated Learning: Collaborative Machine Learning without Centralized Training Data](https:\u002F\u002Fai.googleblog.com\u002F2017\u002F04\u002Ffederated-learning-collaborative.html) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1602.05629)) `Google` `2017`\n2. [Federated Learning with Formal Differential Privacy Guarantees](https:\u002F\u002Fai.googleblog.com\u002F2022\u002F02\u002Ffederated-learning-with-formal.html) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.00039)) `Google` `2022`\n3. [MPC-based machine learning: Achieving end-to-end privacy-preserving machine learning](https:\u002F\u002Fresearch.facebook.com\u002Fblog\u002F2022\u002F10\u002Fmpc-based-machine-learning-achieving-end-to-end-privacy-preserving-machine-learning\u002F) ([Paper](https:\u002F\u002Fresearch.facebook.com\u002Ffile\u002F455681589729383\u002FPrivate-Computation-Framework-2.0-White-Paper.pdf)) `Facebook` `2022`\n\n\n## Validation and A\u002FB Testing\n1. 
[Overlapping Experiment Infrastructure: More, Better, Faster Experimentation](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub36500\u002F) ([Paper](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F36500.pdf)) `Google` `2010`\n2. [The Reusable Holdout: Preserving Validity in Adaptive Data Analysis](https:\u002F\u002Fai.googleblog.com\u002F2015\u002F08\u002Fthe-reusable-holdout-preserving.html) ([Paper](https:\u002F\u002Fscience.sciencemag.org\u002Fcontent\u002Fsci\u002F349\u002F6248\u002F636.full.pdf)) `Google` `2015`\n3. [Twitter Experimentation: Technical Overview](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Fa\u002F2015\u002Ftwitter-experimentation-technical-overview.html) `Twitter` `2015`\n4. [It’s All A\u002FBout Testing: The Netflix Experimentation Platform](https:\u002F\u002Fnetflixtechblog.com\u002Fits-all-a-bout-testing-the-netflix-experimentation-platform-4e1ca458c15) `Netflix` `2016`\n5. [Building Pinterest’s A\u002FB Testing Platform](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fbuilding-pinterests-a-b-testing-platform-ab4934ace9f4) `Pinterest` `2016` \n6. [Experimenting to Solve Cramming](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2017\u002FExperimenting-To-Solve-Cramming.html) `Twitter` `2017`\n7. [Building an Intelligent Experimentation Platform with Uber Engineering](https:\u002F\u002Feng.uber.com\u002Fexperimentation-platform\u002F) `Uber` `2017`\n8. [Scaling Airbnb’s Experimentation Platform](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fhttps-medium-com-jonathan-parks-scaling-erf-23fd17c91166) `Airbnb` `2017`\n9. [Meet Wasabi, an Open Source A\u002FB Testing Platform](https:\u002F\u002Fwww.intuit.com\u002Fblog\u002Ftechnology\u002Fengineering\u002Fmeet-wasabi-an-open-source-ab-testing-platform\u002F) ([Code](https:\u002F\u002Fgithub.com\u002Fintuit\u002Fwasabi)) `Intuit` `2017` \n10. 
[Analyzing Experiment Outcomes: Beyond Average Treatment Effects](https:\u002F\u002Feng.uber.com\u002Fanalyzing-experiment-outcomes\u002F) `Uber` `2018`\n11. [Under the Hood of Uber’s Experimentation Platform](https:\u002F\u002Feng.uber.com\u002Fxp\u002F) `Uber` `2018`\n12. [Constrained Bayesian Optimization with Noisy Experiments](https:\u002F\u002Fresearch.fb.com\u002Fpublications\u002Fconstrained-bayesian-optimization-with-noisy-experiments\u002F) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1706.07094.pdf)) `Facebook` `2018`\n13. [Reliable and Scalable Feature Toggles and A\u002FB Testing SDK at Grab](https:\u002F\u002Fengineering.grab.com\u002Ffeature-toggles-ab-testing) `Grab` `2018`\n14. [Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions](https:\u002F\u002Fbetter.engineering\u002Fmodeling-conversion-rates-and-saving-millions-of-dollars-using-kaplan-meier-and-gamma-distributions\u002F) ([Code](https:\u002F\u002Fgithub.com\u002Fbetter\u002Fconvoys)) `Better` `2019`\n15. [Detecting Interference: An A\u002FB Test of A\u002FB Tests](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2019\u002F06\u002Fdetecting-interference--an-a-b-test-of-a-b-tests) `LinkedIn` `2019`\n16. [Announcing a New Framework for Designing Optimal Experiments with Pyro](https:\u002F\u002Feng.uber.com\u002Foed-pyro-release\u002F) ([Paper](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F9553-variational-bayesian-optimal-experimental-design.pdf)) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1911.00294.pdf)) `Uber` `2020`\n17. [Enabling 10x More Experiments with Traveloka Experiment Platform](https:\u002F\u002Fmedium.com\u002Ftraveloka-engineering\u002Fenabling-10x-more-experiments-with-traveloka-experiment-platform-8cea13e952c) `Traveloka` `2020`\n18. 
[Large Scale Experimentation at Stitch Fix](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2020\u002F07\u002F07\u002Flarge-scale-experimentation\u002F) ([Paper](http:\u002F\u002Fproceedings.mlr.press\u002Fv89\u002Fschmit19a\u002Fschmit19a.pdf)) `Stitch Fix` `2020`\n19. [Multi-Armed Bandits and the Stitch Fix Experimentation Platform](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2020\u002F08\u002F05\u002Fbandits\u002F) `Stitch Fix` `2020`\n20. [Experimentation with Resource Constraints](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2020\u002F11\u002F18\u002Fvirtual-warehouse\u002F) `Stitch Fix` `2020`\n21. [Computational Causal Inference at Netflix](https:\u002F\u002Fnetflixtechblog.com\u002Fcomputational-causal-inference-at-netflix-293591691c62) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.10979.pdf)) `Netflix` `2020`\n22. [Key Challenges with Quasi Experiments at Netflix](https:\u002F\u002Fnetflixtechblog.com\u002Fkey-challenges-with-quasi-experiments-at-netflix-89b4f234b852) `Netflix` `2020`\n23. [Making the LinkedIn experimentation engine 20x faster](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fmaking-the-linkedin-experimentation-engine-20x-faster) `LinkedIn` `2020`\n24. [Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Four-evolution-towards-t-rex--the-prehistory-of-experimentation-i) `LinkedIn` `2020`\n25. [How to Use Quasi-experiments and Counterfactuals to Build Great Products](https:\u002F\u002Fengineering.shopify.com\u002Fblogs\u002Fengineering\u002Fusing-quasi-experiments-counterfactuals) `Shopify` `2020`\n26. [Improving Experimental Power through Control Using Predictions as Covariate](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F06\u002F08\u002Fimproving-experimental-power-through-control-using-predictions-as-covariate-cupac\u002F) `DoorDash` `2020`\n27. 
[Supporting Rapid Product Iteration with an Experimentation Analysis Platform](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F09\u002F09\u002Fexperimentation-analysis-platform-mvp\u002F) `DoorDash` `2020`\n28. [Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F10\u002F07\u002Fimproving-experiment-capacity-by-4x\u002F) `DoorDash` `2020`\n29. [Leveraging Causal Modeling to Get More Value from Flat Experiment Results](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F09\u002F18\u002Fcausal-modeling-to-get-more-value-from-flat-experiment-results\u002F) `DoorDash` `2020`\n30. [Iterating Real-time Assignment Algorithms Through Experimentation](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F12\u002F08\u002Foptimizing-real-time-algorithms-experimentation\u002F) `DoorDash` `2020`\n31. [Spotify’s New Experimentation Platform (Part 1)](https:\u002F\u002Fengineering.atspotify.com\u002F2020\u002F10\u002F29\u002Fspotifys-new-experimentation-platform-part-1\u002F) [(Part 2)](https:\u002F\u002Fengineering.atspotify.com\u002F2020\u002F11\u002F02\u002Fspotifys-new-experimentation-platform-part-2\u002F) `Spotify` `2020`\n32. [Interpreting A\u002FB Test Results: False Positives and Statistical Significance](https:\u002F\u002Fnetflixtechblog.com\u002Finterpreting-a-b-test-results-false-positives-and-statistical-significance-c1522d0db27a) `Netflix` `2021`\n33. [Interpreting A\u002FB Test Results: False Negatives and Power](https:\u002F\u002Fnetflixtechblog.com\u002Finterpreting-a-b-test-results-false-negatives-and-power-6943995cf3a8) `Netflix` `2021`\n34. [Running Experiments with Google Adwords for Campaign Optimization](https:\u002F\u002Fdoordash.engineering\u002F2021\u002F02\u002F05\u002Fgoogle-adwords-campaign-optimization\u002F) `DoorDash` `2021`\n35. 
[The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000%](https:\u002F\u002Fdoordash.engineering\u002F2021\u002F09\u002F21\u002Fthe-4-principles-doordash-used-to-increase-its-logistics-experiment-capacity-by-1000\u002F) `DoorDash` `2021`\n36. [Experimentation Platform at Zalando: Part 1 - Evolution](https:\u002F\u002Fengineering.zalando.com\u002Fposts\u002F2021\u002F01\u002Fexperimentation-platform-part1.html) `Zalando` `2021`\n37. [Designing Experimentation Guardrails](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fdesigning-experimentation-guardrails-ed6a976ec669) `Airbnb` `2021`\n38. [How Airbnb Measures Future Value to Standardize Tradeoffs](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fhow-airbnb-measures-future-value-to-standardize-tradeoffs-3aa99a941ba5) `Airbnb` `2021`\n38. [Network Experimentation at Scale](https:\u002F\u002Fresearch.fb.com\u002Fpublications\u002Fnetwork-experimentation-at-scale\u002F) ([Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2012.08591)) `Facebook` `2021`\n39. [Universal Holdout Groups at Disney Streaming](https:\u002F\u002Fmedium.com\u002Fdisney-streaming\u002Funiversal-holdout-groups-at-disney-streaming-2043360def4f) `Disney` `2021`\n40. [Experimentation is a major focus of Data Science across Netflix](https:\u002F\u002Fnetflixtechblog.com\u002Fexperimentation-is-a-major-focus-of-data-science-across-netflix-f67923f8e985) `Netflix` `2022`\n41. [Search Journey Towards Better Experimentation Practices](https:\u002F\u002Fengineering.atspotify.com\u002F2022\u002F02\u002Fsearch-journey-towards-better-experimentation-practices\u002F) `Spotify` `2022`\n42. [Artificial Counterfactual Estimation: Machine Learning-Based Causal Inference at Airbnb](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fartificial-counterfactual-estimation-ace-machine-learning-based-causal-inference-at-airbnb-ee32ee4d0512) `Airbnb` `2022`\n43. 
[Beyond A\u002FB Test : Speeding up Airbnb Search Ranking Experimentation through Interleaving](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fbeyond-a-b-test-speeding-up-airbnb-search-ranking-experimentation-through-interleaving-7087afa09c8e) `Airbnb` `2022`\n44. [Challenges in Experimentation](https:\u002F\u002Feng.lyft.com\u002Fchallenges-in-experimentation-be9ab98a7ef4) `Lyft` `2022`\n45. [Overtracking and Trigger Analysis: Reducing sample sizes while INCREASING sensitivity](https:\u002F\u002Fbooking.ai\u002Fovertracking-and-trigger-analysis-how-to-reduce-sample-sizes-and-increase-the-sensitivity-of-71755bad0e5f) `Booking` `2022`\n46. [Meet Dash-AB — The Statistics Engine of Experimentation at DoorDash](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F05\u002F24\u002Fmeet-dash-ab-the-statistics-engine-of-experimentation-at-doordash\u002F) `DoorDash` `2022`\n47. [Comparing quantiles at scale in online A\u002FB-testing](https:\u002F\u002Fengineering.atspotify.com\u002F2022\u002F03\u002Fcomparing-quantiles-at-scale-in-online-a-b-testing) `Spotify` `2022`\n48. [Accelerating our A\u002FB experiments with machine learning](https:\u002F\u002Fdropbox.tech\u002Fmachine-learning\u002Faccelerating-our-a-b-experiments-with-machine-learning-xr) `Dropbox` `2023`\n49. [Supercharging A\u002FB Testing at Uber](https:\u002F\u002Fwww.uber.com\u002Fblog\u002Fsupercharging-a-b-testing-at-uber\u002F) `Uber` \n\n## Model Management\n1. [Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions](https:\u002F\u002Fvimeo.com\u002F274396495) `Comcast` `2018`\n2. [Overton: A Data System for Monitoring and Improving Machine-Learned Products](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.05372) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.05372.pdf)) `Apple` `2019`\n3. [Runway - Model Lifecycle Management at Netflix](https:\u002F\u002Fwww.usenix.org\u002Fconference\u002Fopml20\u002Fpresentation\u002Fcepoi) `Netflix` `2020`\n4. 
[Managing ML Models @ Scale - Intuit’s ML Platform](https:\u002F\u002Fwww.usenix.org\u002Fconference\u002Fopml20\u002Fpresentation\u002Fwenzel) `Intuit` `2020`\n5. [ML Model Monitoring - 9 Tips From the Trenches](https:\u002F\u002Fbuilding.nubank.com.br\u002Fml-model-monitoring-9-tips-from-the-trenches\u002F) `Nubank` `2021`\n6. [Dealing with Train-serve Skew in Real-time ML Models: A Short Guide](https:\u002F\u002Fbuilding.nubank.com.br\u002Fdealing-with-train-serve-skew-in-real-time-ml-models-a-short-guide\u002F) `Nubank` `2023`\n\n## Efficiency\n1. [GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce](https:\u002F\u002Fai.facebook.com\u002Fresearch\u002Fpublications\u002Fgroknet-unified-computer-vision-model-trunk-and-embeddings-for-commerce\u002F) ([Paper](https:\u002F\u002Fscontent-sea1-1.xx.fbcdn.net\u002Fv\u002Ft39.8562-6\u002F99353320_565175057533429_3886205100842024960_n.pdf?_nc_cat=110&_nc_sid=ae5e01&_nc_ohc=WQBaZy1gnmUAX8Ecqtt&_nc_ht=scontent-sea1-1.xx&oh=cab2f11dd9154d817149cb73e8b692a8&oe=5F5A3778)) `Facebook` `2020`\n2. [How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs](https:\u002F\u002Fblog.roblox.com\u002F2020\u002F05\u002Fscaled-bert-serve-1-billion-daily-requests-cpus\u002F) `Roblox` `2020`\n3. [Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.15703) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.15703.pdf)) `Uber` `2021`\n4. [GPU-accelerated ML Inference at Pinterest](https:\u002F\u002Fmedium.com\u002F@Pinterest_Engineering\u002Fgpu-accelerated-ml-inference-at-pinterest-ad1b6a03a16d) `Pinterest` `2022`\n\n## Ethics\n1. [Building Inclusive Products Through A\u002FB Testing](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fbuilding-inclusive-products-through-a-b-testing) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2002.05819.pdf)) `LinkedIn` `2020`\n2. 
[LiFT: A Scalable Framework for Measuring Fairness in ML Applications](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Flift-addressing-bias-in-large-scale-ai-applications) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.07433.pdf)) `LinkedIn` `2020`\n3. [Introducing Twitter’s first algorithmic bias bounty challenge](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2021\u002Falgorithmic-bias-bounty-challenge) `Twitter` `2021`\n4. [Examining algorithmic amplification of political content on Twitter](https:\u002F\u002Fblog.twitter.com\u002Fen_us\u002Ftopics\u002Fcompany\u002F2021\u002Frml-politicalcontent) `Twitter` `2021`\n5. [A closer look at how LinkedIn integrates fairness into its AI products](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2022\u002Fa-closer-look-at-how-linkedin-integrates-fairness-into-its-ai-pr) `LinkedIn` `2022`\n\n## Infra\n1. [Reengineering Facebook AI’s Deep Learning Platforms for Interoperability](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Freengineering-facebook-ais-deep-learning-platforms-for-interoperability) `Facebook` `2020`\n2. [Elastic Distributed Training with XGBoost on Ray](https:\u002F\u002Feng.uber.com\u002Felastic-xgboost-ray\u002F) `Uber` `2021`\n\n## MLOps Platforms\n1. [Meet Michelangelo: Uber’s Machine Learning Platform](https:\u002F\u002Feng.uber.com\u002Fmichelangelo-machine-learning-platform\u002F) `Uber` `2017`\n2. [Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions](https:\u002F\u002Fvimeo.com\u002F274396495) `Comcast` `2018`\n3. [Big Data Machine Learning Platform at Pinterest](https:\u002F\u002Fwww.slideshare.net\u002FAlluxio\u002Fpinterest-big-data-machine-learning-platform-at-pinterest) `Pinterest` `2019`\n4. [Core Modeling at Instagram](https:\u002F\u002Finstagram-engineering.com\u002Fcore-modeling-at-instagram-a51e0158aa48) `Instagram` `2019`\n5. 
[Open-Sourcing Metaflow - a Human-Centric Framework for Data Science](https:\u002F\u002Fnetflixtechblog.com\u002Fopen-sourcing-metaflow-a-human-centric-framework-for-data-science-fa72e04a5d9) `Netflix` `2019`\n6. [Managing ML Models @ Scale - Intuit’s ML Platform](https:\u002F\u002Fwww.usenix.org\u002Fconference\u002Fopml20\u002Fpresentation\u002Fwenzel) `Intuit` `2020`\n7. [Real-time Machine Learning Inference Platform at Zomato](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0-3ES1vzW14) `Zomato` `2020`\n8. [Introducing Flyte: Cloud Native Machine Learning and Data Processing Platform](https:\u002F\u002Feng.lyft.com\u002Fintroducing-flyte-cloud-native-machine-learning-and-data-processing-platform-fb2bb3046a59) `Lyft` `2020`\n9. [Building Flexible Ensemble ML Models with a Computational Graph](https:\u002F\u002Fdoordash.engineering\u002F2021\u002F01\u002F26\u002Fcomputational-graph-machine-learning-ensemble-model-support\u002F) `DoorDash` `2021`\n10. [LyftLearn: ML Model Training Infrastructure built on Kubernetes](https:\u002F\u002Feng.lyft.com\u002Flyftlearn-ml-model-training-infrastructure-built-on-kubernetes-aef8218842bb) `Lyft` `2021`\n11. [\"You Don't Need a Bigger Boat\": A Full Data Pipeline Built with Open-Source Tools](https:\u002F\u002Fgithub.com\u002Fjacopotagliabue\u002Fyou-dont-need-a-bigger-boat) ([Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2107.07346)) `Coveo` `2021`\n12. [MLOps at GreenSteam: Shipping Machine Learning](https:\u002F\u002Fneptune.ai\u002Fblog\u002Fmlops-at-greensteam-shipping-machine-learning-case-study) `GreenSteam` `2021`\n13. [Evolving Reddit’s ML Model Deployment and Serving Architecture](https:\u002F\u002Fwww.reddit.com\u002Fr\u002FRedditEng\u002Fcomments\u002Fq14tsw\u002Fevolving_reddits_ml_model_deployment_and_serving\u002F) `Reddit` `2021`\n14. [Redesigning Etsy’s Machine Learning Platform](https:\u002F\u002Fwww.etsy.com\u002Fcodeascraft\u002Fredesigning-etsys-machine-learning-platform\u002F) `Etsy` `2021`\n15. 
[Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training](https:\u002F\u002Farxiv.org\u002Fabs\u002F2108.09373) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2108.09373.pdf)) `Meta` `2021`\n15. [Building a Platform for Serving Recommendations at Etsy](https:\u002F\u002Fwww.etsy.com\u002Fcodeascraft\u002Fbuilding-a-platform-for-serving-recommendations-at-etsy) `Etsy` `2022` \n16. [Intelligent Automation Platform: Empowering Conversational AI and Beyond at Airbnb](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fintelligent-automation-platform-empowering-conversational-ai-and-beyond-at-airbnb-869c44833ff2) `Airbnb` `2022`\n17. [DARWIN: Data Science and Artificial Intelligence Workbench at LinkedIn](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2022\u002Fdarwin--data-science-and-artificial-intelligence-workbench-at-li) `LinkedIn` `2022`\n18. [The Magic of Merlin: Shopify's New Machine Learning Platform](https:\u002F\u002Fshopify.engineering\u002Fmerlin-shopify-machine-learning-platform) `Shopify` `2022`\n19. [Zalando's Machine Learning Platform](https:\u002F\u002Fengineering.zalando.com\u002Fposts\u002F2022\u002F04\u002Fzalando-machine-learning-platform.html) `Zalando` `2022`\n20. [Inside Meta's AI optimization platform for engineers across the company](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Flooper-meta-ai-optimization-platform-for-engineers\u002F) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.07554.pdf)) `Meta` `2022`\n21. [Monzo’s machine learning stack](https:\u002F\u002Fmonzo.com\u002Fblog\u002F2022\u002F04\u002F26\u002Fmonzos-machine-learning-stack) `Monzo` `2022`\n22. [Evolution of ML Fact Store](https:\u002F\u002Fnetflixtechblog.com\u002Fevolution-of-ml-fact-store-5941d3231762) `Netflix` `2022`\n23. 
[Using MLOps to Build a Real-time End-to-End Machine Learning Pipeline](https:\u002F\u002Fwww.binance.com\u002Fen\u002Fblog\u002Fall\u002Fusing-mlops-to-build-a-realtime-endtoend-machine-learning-pipeline-3820048062346322706) `Binance` `2022`\n24. [Serving Machine Learning Models Efficiently at Scale at Zillow](https:\u002F\u002Fwww.zillow.com\u002Ftech\u002Fserving-machine-learning-models-efficiently-at-scale-at-zillow\u002F) `Zillow` `2022`\n25. [Didact AI: The anatomy of an ML-powered stock picking engine](https:\u002F\u002Fprincipiamundi.com\u002Fposts\u002Fdidact-anatomy\u002F?utm_campaign=Data_Elixir&utm_source=Data_Elixir_407\u002F) `Didact AI` `2022`\n26. [Deployment for Free - A Machine Learning Platform for Stitch Fix's Data Scientists](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2022\u002F07\u002F14\u002Fdeployment-for-free\u002F) `Stitch Fix` `2022`\n27. [Machine Learning Operations (MLOps): Overview, Definition, and Architecture](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.02302) ([Paper](https:\u002F\u002Farxiv.org\u002Fftp\u002Farxiv\u002Fpapers\u002F2205\u002F2205.02302.pdf)) `IBM` `2022`\n\n## Practices\n1. [Practical Recommendations for Gradient-Based Training of Deep Architectures](https:\u002F\u002Farxiv.org\u002Fabs\u002F1206.5533) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1206.5533.pdf)) `Yoshua Bengio` `2012`\n2. [Machine Learning: The High Interest Credit Card of Technical Debt](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub43146\u002F) ([Paper](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F43146.pdf)) ([Paper](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F5656-hidden-technical-debt-in-machine-learning-systems.pdf)) `Google` `2014`\n3. [Rules of Machine Learning: Best Practices for ML Engineering](https:\u002F\u002Fdevelopers.google.com\u002Fmachine-learning\u002Fguides\u002Frules-of-ml) `Google` `2018`\n4. 
[On Challenges in Machine Learning Model Management](http:\u002F\u002Fsites.computer.org\u002Fdebull\u002FA18dec\u002Fp5.pdf) `Amazon` `2018`\n5. [Machine Learning in Production: The Booking.com Approach](https:\u002F\u002Fbooking.ai\u002Fhttps-booking-ai-machine-learning-production-3ee8fe943c70) `Booking` `2019`\n6. [150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com](https:\u002F\u002Fbooking.ai\u002F150-successful-machine-learning-models-6-lessons-learned-at-booking-com-681e09107bec) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3292500.3330744)) `Booking` `2019`\n7. [Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=QYQKG5OcwEI) `Rabobank` `2019`\n8. [Challenges in Deploying Machine Learning: a Survey of Case Studies](https:\u002F\u002Farxiv.org\u002Fabs\u002F2011.09926) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2011.09926.pdf)) `Cambridge` `2020`\n9. [Reengineering Facebook AI’s Deep Learning Platforms for Interoperability](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Freengineering-facebook-ais-deep-learning-platforms-for-interoperability) `Facebook` `2020`\n10. [The problem with AI developer tools for enterprises](https:\u002F\u002Ftowardsdatascience.com\u002Fthe-problem-with-ai-developer-tools-for-enterprises-and-what-ikea-has-to-do-with-it-b26277841661) `Databricks` `2020`\n11. [Continuous Integration and Deployment for Machine Learning Online Serving and Models](https:\u002F\u002Feng.uber.com\u002Fcontinuous-integration-deployment-ml\u002F) `Uber` `2021`\n12. [Tuning Model Performance](https:\u002F\u002Feng.uber.com\u002Ftuning-model-performance\u002F) `Uber` `2021`\n13. [Maintaining Machine Learning Model Accuracy Through Monitoring](https:\u002F\u002Fdoordash.engineering\u002F2021\u002F05\u002F20\u002Fmonitor-machine-learning-model-drift\u002F) `DoorDash` `2021`\n14. 
[Building Scalable and Performant Marketing ML Systems at Wayfair](https:\u002F\u002Fwww.aboutwayfair.com\u002Fcareers\u002Ftech-blog\u002Fbuilding-scalable-and-performant-marketing-ml-systems-at-wayfair) `Wayfair` `2021`\n15. [Our approach to building transparent and explainable AI systems](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2021\u002Ftransparent-and-explainable-AI-systems) `LinkedIn` `2021`\n16. [5 Steps for Building Machine Learning Models for Business](https:\u002F\u002Fshopify.engineering\u002Fbuilding-business-machine-learning-models) `Shopify` `2021`\n17. [Data Is An Art, Not Just A Science—And Storytelling Is The Key](https:\u002F\u002Fshopifyengineering.myshopify.com\u002Fblogs\u002Fengineering\u002Fdata-storytelling-shopify) `Shopify` `2022`\n18. [Best Practices for Real-time Machine Learning: Alerting](https:\u002F\u002Fbuilding.nubank.com.br\u002Fbest-practices-for-real-time-machine-learning-alerting\u002F) `Nubank` `2022`\n19. [Automatic Retraining for Machine Learning Models: Tips and Lessons Learned](https:\u002F\u002Fbuilding.nubank.com.br\u002Fautomatic-retraining-for-machine-learning-models\u002F) `Nubank` `2022`\n20. [RecSysOps: Best Practices for Operating a Large-Scale Recommender System](https:\u002F\u002Fnetflixtechblog.medium.com\u002Frecsysops-best-practices-for-operating-a-large-scale-recommender-system-95bbe195a841) `Netflix` `2022`\n21. [ML Education at Uber: Frameworks Inspired by Engineering Principles](https:\u002F\u002Fwww.uber.com\u002Fen-PL\u002Fblog\u002Fml-education-at-uber\u002F) `Uber` `2022`\n22. [Building and Maintaining Internal Tools for DS\u002FML teams: Lessons Learned](https:\u002F\u002Fbuilding.nubank.com.br\u002Fbuilding-and-maintaining-internal-tools-for-ds-ml-teams-lessons-learned) `Nubank` `2024`\n\n## Team structure\n1. 
[What is the most effective way to structure a data science team?](https:\u002F\u002Ftowardsdatascience.com\u002Fwhat-is-the-most-effective-way-to-structure-a-data-science-team-498041b88dae) `Udemy` `2017`\n1. [Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2016\u002F03\u002F16\u002Fengineers-shouldnt-write-etl\u002F) `Stitch Fix` `2016`\n2. [Building The Analytics Team At Wish](https:\u002F\u002Fmedium.com\u002Fwish-engineering\u002Fscaling-analytics-at-wish-619eacb97d16) `Wish` `2018`\n3. [Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2019\u002F03\u002F11\u002FFullStackDS-Generalists\u002F) `Stitch Fix` `2019`\n4. [Cultivating Algorithms: How We Grow Data Science at Stitch Fix](https:\u002F\u002Fcultivating-algos.stitchfix.com) `Stitch Fix`\n5. [Analytics at Netflix: Who We Are and What We Do](https:\u002F\u002Fnetflixtechblog.com\u002Fanalytics-at-netflix-who-we-are-and-what-we-do-7d9c08fe6965) `Netflix` `2020`\n6. [Building a Data Team at a Mid-stage Startup: A Short Story](https:\u002F\u002Ferikbern.com\u002F2021\u002F07\u002F07\u002Fthe-data-team-a-short-story.html) `Erikbern` `2021`\n7. [A Behind-the-Scenes Look at How Postman’s Data Team Works](https:\u002F\u002Fentrepreneurshandbook.co\u002Fa-behind-the-scenes-look-at-how-postmans-data-team-works-fded0b8bfc64) `Postman` `2021`\n8. [Data Scientist x Machine Learning Engineer Roles: How are they different? How are they alike?](https:\u002F\u002Fbuilding.nubank.com.br\u002Fdata-scientist-x-machine-learning-engineer-roles-how-are-they-different-how-are-they-alike\u002F) `Nubank` `2022`\n\n## Fails\n1. [When It Comes to Gorillas, Google Photos Remains Blind](https:\u002F\u002Fwww.wired.com\u002Fstory\u002Fwhen-it-comes-to-gorillas-google-photos-remains-blind\u002F) `Google` `2018`\n2. 
[160k+ High School Students Will Graduate Only If a Model Allows Them to](http:\u002F\u002Fpositivelysemidefinite.com\u002F2020\u002F06\u002F160k-students.html) `International Baccalaureate` `2020`\n3. [An Algorithm That ‘Predicts’ Criminality Based on a Face Sparks a Furor](https:\u002F\u002Fwww.wired.com\u002Fstory\u002Falgorithm-predicts-criminality-based-face-sparks-furor\u002F) `Harrisburg University` `2020`\n4. [It's Hard to Generate Neural Text From GPT-3 About Muslims](https:\u002F\u002Ftwitter.com\u002Fabidlabs\u002Fstatus\u002F1291165311329341440) `OpenAI` `2020`\n5. [A British AI Tool to Predict Violent Crime Is Too Flawed to Use](https:\u002F\u002Fwww.wired.co.uk\u002Farticle\u002Fpolice-violence-prediction-ndas) `United Kingdom` `2020`\n6. More in [awful-ai](https:\u002F\u002Fgithub.com\u002Fdaviddao\u002Fawful-ai)\n7. [AI Incident Database](https:\u002F\u002Fincidentdatabase.ai\u002F) `Partnership on AI` `2022`\n\n\u003Cbr>\n\n**P.S., Want a summary of ML advancements?** Get up to speed with survey papers 👉[`ml-surveys`](https:\u002F\u002Fgithub.com\u002Feugeneyan\u002Fml-surveys)\n","# applied-ml\n精选关于**生产环境中的数据科学与机器学习**的论文、文章和博客。⚙️\n\n[![欢迎贡献](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcontributions-welcome-brightgreen.svg?style=flat)](.\u002FCONTRIBUTING.md) [![摘要](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fsummaries-in%20tweets-%2300acee.svg?style=flat)](https:\u002F\u002Ftwitter.com\u002Feugeneyan\u002Fstatus\u002F1350509546133811200) ![访问量](http:\u002F\u002Fhits.dwyl.com\u002Feugeneyan\u002Fapplied-ml.svg)\n\n正在思考如何落地你的机器学习项目吗？来看看其他组织是如何做的吧：\n\n- **问题是如何定义的** 🔎（例如，个性化作为推荐系统、搜索或序列建模）\n- **哪些机器学习技术奏效了** ✅（以及有时哪些没有奏效 ❌）\n- **为什么它有效**——背后的科学原理、研究文献及参考 📂\n- **实际取得了哪些成果**（以便你更好地评估投资回报率 ⏰💰📈）\n\nPS：想了解机器学习领域的最新进展摘要吗？👉[`ml-surveys`](https:\u002F\u002Fgithub.com\u002Feugeneyan\u002Fml-surveys)\n\nPPS：想找一些关于机器学习应用的指南和访谈吗？👉[`applyingML`](https:\u002F\u002Fapplyingml.com)\n\n**目录**\n\n1. [数据质量](#data-quality)\n2. [数据工程](#data-engineering)\n3. 
[Data Discovery](#data-discovery)\n4. [Feature Stores](#feature-stores)\n5. [Classification](#classification)\n6. [Regression](#regression)\n7. [Forecasting](#forecasting)\n8. [Recommendation](#recommendation)\n9. [Search & Ranking](#search--ranking)\n10. [Embeddings](#embeddings)\n11. [Natural Language Processing](#natural-language-processing)\n12. [Sequence Modelling](#sequence-modelling)\n13. [Computer Vision](#computer-vision)\n14. [Reinforcement Learning](#reinforcement-learning)\n15. [Anomaly Detection](#anomaly-detection)\n16. [Graph](#graph)\n17. [Optimization](#optimization)\n18. [Information Extraction](#information-extraction)\n19. [Weak Supervision](#weak-supervision)\n20. [Generation](#generation)\n21. [Audio](#audio)\n22. [Privacy-Preserving Machine Learning](#privacy-preserving-machine-learning)\n23. [Validation and A\u002FB Testing](#validation-and-ab-testing)\n24. [Model Management](#model-management)\n25. [Efficiency](#efficiency)\n26. [Ethics](#ethics)\n27. [Infra](#infra)\n28. [MLOps Platforms](#mlops-platforms)\n29. [Practices](#practices)\n30. [Team Structure](#team-structure)\n31. [Fails](#fails)\n\n## Data Quality\n1. [Reliable and Scalable Data Ingestion at Airbnb](https:\u002F\u002Fwww.slideshare.net\u002FHadoopSummit\u002Freliable-and-scalable-data-ingestion-at-airbnb-63920989) `Airbnb` `2016`\n2. [Monitoring Data Quality at Scale with Statistical Modeling](https:\u002F\u002Feng.uber.com\u002Fmonitoring-data-quality-at-scale\u002F) `Uber` `2017`\n3. [Data Management Challenges in Production Machine Learning](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub46178\u002F) ([Paper](https:\u002F\u002Fthodrek.github.io\u002FCS839_spring18\u002Fpapers\u002Fp1723-polyzotis.pdf)) `Google` `2017`\n4. [Automating Large-Scale Data Quality Verification](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fautomating-large-scale-data-quality-verification) ([Paper](https:\u002F\u002Fassets.amazon.science\u002Fa6\u002F88\u002Fad858ee240c38c6e9dce128250c0\u002Fautomating-large-scale-data-quality-verification.pdf)) `Amazon` `2018`\n5. [Meet Hodor: Gojek's Upstream Data Quality Tool](https:\u002F\u002Fwww.gojek.io\u002Fblog\u002Fmeet-hodor-gojeks-upstream-data-quality-tool) `Gojek` `2019`\n6. [Data Validation for Machine Learning](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub47967\u002F) ([Paper](https:\u002F\u002Fmlsys.org\u002FConferences\u002F2019\u002Fdoc\u002F2019\u002F167.pdf)) `Google` `2019`\n6. 
[An Approach to Data Quality for Netflix Personalization Systems](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=t7vHpA39TXM) `Netflix` `2020`\n7. [Improving Accuracy by Certainty Estimation of Human Decisions, Labels, and Raters](https:\u002F\u002Fresearch.fb.com\u002Fblog\u002F2020\u002F08\u002Fimproving-the-accuracy-of-community-standards-enforcement-by-certainty-estimation-of-human-decisions\u002F) ([Paper](https:\u002F\u002Fresearch.fb.com\u002Fwp-content\u002Fuploads\u002F2020\u002F08\u002FCLARA-Confidence-of-Labels-and-Raters.pdf)) `Facebook` `2020`\n\n## Data Engineering\n1. [Zipline: Airbnb's Machine Learning Data Management Platform](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Tg5VEMEsC-0) `Airbnb` `2018`\n2. [Sputnik: Airbnb's Apache Spark Framework for Data Engineering](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=BQumogSBsUw) `Airbnb` `2020`\n3. [Unbundling Data Science Workflows with Metaflow and AWS Step Functions](https:\u002F\u002Fnetflixtechblog.com\u002Funbundling-data-science-workflows-with-metaflow-and-aws-step-functions-d454780c6280) `Netflix` `2020`\n4. [How DoorDash Is Scaling Its Data Platform to Delight Customers and Meet Growing Demand](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F09\u002F25\u002Fhow-doordash-is-scaling-its-data-platform\u002F) `DoorDash` `2020`\n5. [Revolutionizing Money Movements at Scale with Strong Data Consistency](https:\u002F\u002Feng.uber.com\u002Fmoney-scale-strong-data\u002F) `Uber` `2020`\n6. [Zipline: A Declarative Feature Engineering Framework](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=LjcKCm0G_OY) `Airbnb` `2020`\n7. [Automating Data Protection at Scale, Part 1](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fautomating-data-protection-at-scale-part-1-c74909328e08) ([Part 2](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fautomating-data-protection-at-scale-part-2-c2b8d2068216)) `Airbnb` `2021`\n8. [Real-time Data Infrastructure at Uber](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2104.00087.pdf) `Uber` `2021`\n9. [Introducing Fabricator: A Declarative Feature Engineering Framework](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F01\u002F11\u002Fintroducing-fabricator-a-declarative-feature-engineering-framework\u002F) `DoorDash` `2022`\n10. 
[Functions & DAGs: Introducing Hamilton, a Microframework for Dataframe Generation](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2021\u002F10\u002F14\u002Ffunctions-dags-hamilton\u002F) `Stitch Fix` `2021`\n11. [Optimizing Pinterest's Data Ingestion Stack: Findings and Learnings](https:\u002F\u002Fmedium.com\u002F@Pinterest_Engineering\u002Foptimizing-pinterests-data-ingestion-stack-findings-and-learnings-994fddb063bf) `Pinterest` `2022`\n12. [Lessons Learned From Running Apache Airflow at Scale](https:\u002F\u002Fshopifyengineering.myshopify.com\u002Fblogs\u002Fengineering\u002Flessons-learned-apache-airflow-scale) `Shopify` `2022`\n13. [Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training](https:\u002F\u002Farxiv.org\u002Fabs\u002F2108.09373v4) `Meta` `2022`\n14. [Data Mesh: A Data Movement and Processing Platform @ Netflix](https:\u002F\u002Fnetflixtechblog.com\u002Fdata-mesh-a-data-movement-and-processing-platform-netflix-1288bcab2873) `Netflix` `2022`\n15. [Building Scalable Real-Time Event Processing with Kafka and Flink](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F08\u002F02\u002Fbuilding-scalable-real-time-event-processing-with-kafka-and-flink\u002F) `DoorDash` `2022`\n\n## Data Discovery\n1. [Apache Atlas: Data Governance and Metadata Framework for Hadoop](https:\u002F\u002Fatlas.apache.org\u002F#\u002F) ([Code](https:\u002F\u002Fgithub.com\u002Fapache\u002Fatlas)) `Apache`\n2. [Collect, Aggregate, and Visualize a Data Ecosystem's Metadata](https:\u002F\u002Fmarquezproject.github.io\u002Fmarquez\u002F) ([Code](https:\u002F\u002Fgithub.com\u002FMarquezProject\u002Fmarquez)) `WeWork`\n3. [Discovery and Consumption of Analytics Data at Twitter](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2016\u002Fdiscovery-and-consumption-of-analytics-data-at-twitter.html) `Twitter` `2016`\n4. [Democratizing Data at Airbnb](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fdemocratizing-data-at-airbnb-852d76c51770) `Airbnb` `2017`\n5. [Databook: Turning Big Data into Knowledge with Metadata at Uber](https:\u002F\u002Feng.uber.com\u002Fdatabook\u002F) `Uber` `2018`\n6. 
[Metacat: Making Big Data Discoverable and Meaningful at Netflix](https:\u002F\u002Fnetflixtechblog.com\u002Fmetacat-making-big-data-discoverable-and-meaningful-at-netflix-56fb36a53520) ([Code](https:\u002F\u002Fgithub.com\u002FNetflix\u002Fmetacat)) `Netflix` `2018`\n7. [Amundsen: Lyft's Data Discovery & Metadata Engine](https:\u002F\u002Feng.lyft.com\u002Famundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9) `Lyft` `2019`\n8. [Open Sourcing Amundsen: A Data Discovery and Metadata Platform](https:\u002F\u002Feng.lyft.com\u002Fopen-sourcing-amundsen-a-data-discovery-and-metadata-platform-2282bb436234) ([Code](https:\u002F\u002Fgithub.com\u002Flyft\u002Famundsen)) `Lyft` `2019`\n9. [DataHub: A Generalized Metadata Search & Discovery Tool](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2019\u002Fdata-hub) ([Code](https:\u002F\u002Fgithub.com\u002Flinkedin\u002Fdatahub)) `LinkedIn` `2019`\n10. [Amundsen: One Year Later](https:\u002F\u002Feng.lyft.com\u002Famundsen-1-year-later-7b60bf28602) `Lyft` `2020`\n11. [Using Amundsen to Support User Privacy via Metadata Collection at Square](https:\u002F\u002Fdeveloper.squareup.com\u002Fblog\u002Fusing-amundsen-to-support-user-privacy-via-metadata-collection-at-square\u002F) `Square` `2020`\n12. [Turning Metadata Into Insights with Databook](https:\u002F\u002Feng.uber.com\u002Fmetadata-insights-databook\u002F) `Uber` `2020`\n13. [DataHub: Popular Metadata Architectures Explained](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fdatahub-popular-metadata-architectures-explained) `LinkedIn` `2020`\n14. [How We Improved Data Discovery for Data Scientists at Spotify](https:\u002F\u002Fengineering.atspotify.com\u002F2020\u002F02\u002F27\u002Fhow-we-improved-data-discovery-for-data-scientists-at-spotify\u002F) `Spotify` `2020`\n15. [How We're Solving Data Discovery Challenges at Shopify](https:\u002F\u002Fengineering.shopify.com\u002Fblogs\u002Fengineering\u002Fsolving-data-discovery-challenges-shopify) `Shopify` `2020`\n16. [Nemo: Data Discovery at Facebook](https:\u002F\u002Fengineering.fb.com\u002Fdata-infrastructure\u002Fnemo\u002F) `Facebook` `2020`\n17. 
[Exploring Data @ Netflix](https:\u002F\u002Fnetflixtechblog.com\u002Fexploring-data-netflix-9d87e20072e3) ([Code](https:\u002F\u002Fgithub.com\u002FNetflix\u002Fnf-data-explorer)) `Netflix` `2021`\n\n## Feature Stores\n1. [Distributed Time Travel for Feature Generation](https:\u002F\u002Fnetflixtechblog.com\u002Fdistributed-time-travel-for-feature-generation-389cccdd3907) `Netflix` `2016`\n2. [Building the Activity Graph, Part 2 (Feature Storage Section)](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2017\u002F07\u002Fbuilding-the-activity-graph--part-2) `LinkedIn` `2017`\n3. [Fact Store at Scale for Netflix Recommendations](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DiwKg8KynVU) `Netflix` `2018`\n4. [Zipline: Airbnb's Machine Learning Data Management Platform](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Tg5VEMEsC-0) `Airbnb` `2018`\n5. [Feature Store: The Missing Data Layer for Machine Learning Pipelines?](https:\u002F\u002Fwww.hopsworks.ai\u002Fpost\u002Ffeature-store-the-missing-data-layer-in-ml-pipelines) `Hopsworks` `2018`\n6. [Introducing Feast: An Open Source Feature Store for Machine Learning](https:\u002F\u002Fcloud.google.com\u002Fblog\u002Fproducts\u002Fai-machine-learning\u002Fintroducing-feast-an-open-source-feature-store-for-machine-learning) ([Code](https:\u002F\u002Fgithub.com\u002Ffeast-dev\u002Ffeast)) `Gojek` `2019`\n7. [Michelangelo Palette: A Feature Engineering Platform at Uber](https:\u002F\u002Fwww.infoq.com\u002Fpresentations\u002Fmichelangelo-palette-uber\u002F) `Uber` `2019`\n8. [The Architecture That Powers Twitter's Feature Store](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=UNailXoiIrY) `Twitter` `2019`\n9. [Accelerating Machine Learning with the Feature Store Service](https:\u002F\u002Ftechnology.condenast.com\u002Fstory\u002Faccelerating-machine-learning-with-the-feature-store-service) `Condé Nast` `2019`\n10. [Feast: Bridging ML Models and Data](https:\u002F\u002Fwww.gojek.io\u002Fblog\u002Ffeast-bridging-ml-models-and-data) `Gojek` `2020`\n11. [Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F11\u002F19\u002Fbuilding-a-gigascale-ml-feature-store-with-redis\u002F) `DoorDash` `2020`\n12. [Rapid Experimentation Through Standardization: Typed AI Features for LinkedIn's Feed](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Ffeed-typed-ai-features) `LinkedIn` `2020`\n13. 
[Building a Feature Store](https:\u002F\u002Fnlathia.github.io\u002F2020\u002F12\u002FBuilding-a-feature-store.html) `Monzo Bank` `2020`\n14. [Butterfree: A Spark-based Framework for Feature Store Building](https:\u002F\u002Fmedium.com\u002Fquintoandar-tech-blog\u002Fbutterfree-a-spark-based-framework-for-feature-store-building-48c3640522c7) ([Code](https:\u002F\u002Fgithub.com\u002Fquintoandar\u002Fbutterfree)) `QuintoAndar` `2020`\n15. [Building Riviera: A Declarative Real-Time Feature Engineering Framework](https:\u002F\u002Fdoordash.engineering\u002F2021\u002F03\u002F04\u002Fbuilding-a-declarative-real-time-feature-engineering-framework\u002F) `DoorDash` `2021`\n16. [Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory](https:\u002F\u002Feng.uber.com\u002Foptimal-feature-discovery-ml\u002F) `Uber` `2021`\n17. [ML Feature Serving Infrastructure at Lyft](https:\u002F\u002Feng.lyft.com\u002Fml-feature-serving-infrastructure-at-lyft-d30bf2d3c32a) `Lyft` `2021`\n18. [Near Real-Time Features for Near Real-Time Personalization](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2022\u002Fnear-real-time-features-for-near-real-time-personalization) `LinkedIn` `2022`\n19. [Building the Model Behind DoorDash's Expansive Merchant Selection](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F04\u002F19\u002Fbuilding-merchant-selection\u002F) `DoorDash` `2022`\n20. [Open Sourcing Feathr: LinkedIn's Feature Store for Productive Machine Learning](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2022\u002Fopen-sourcing-feathr---linkedin-s-feature-store-for-productive-m) `LinkedIn` `2022`\n21. [Evolution of ML Fact Store](https:\u002F\u002Fnetflixtechblog.com\u002Fevolution-of-ml-fact-store-5941d3231762) `Netflix` `2022`\n22. [Developing Scalable Feature Engineering DAGs](https:\u002F\u002Fouterbounds.com\u002Fblog\u002Fdeveloping-scalable-feature-engineering-dags) `Metaflow + Hamilton` via `Outerbounds` `2022`\n23. [Feature Store Design at Constructor](https:\u002F\u002Fmedium.com\u002Fconstructor-engineering\u002Ffeature-store-design-at-constructor-330b65f64b18) `Constructor.io` `2023`\n\n## Classification\n1. 
[Prediction of Advertiser Churn for Google AdWords](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub36678\u002F) ([Paper](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F36678.pdf)) `Google` `2010`\n2. [High-Precision Phrase-Based Document Classification on a Modern Scale](https:\u002F\u002Fengineering.linkedin.com\u002Fresearch\u002F2011\u002Fhigh-precision-phrase-based-document-classification-on-a-modern-scale) ([Paper](http:\u002F\u002Fweb.stanford.edu\u002F~gavish\u002Fdocuments\u002Fphrase_based.pdf)) `LinkedIn` `2011`\n3. [Chimera: Large-Scale Classification Using Machine Learning, Rules, and Crowdsourcing](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.14778\u002F2733004.2733024) ([Paper](http:\u002F\u002Fpages.cs.wisc.edu\u002F%7Eanhai\u002Fpapers\u002Fchimera-vldb14.pdf)) `Walmart` `2014`\n4. [Large-Scale Item Categorization in E-Commerce Using Multiple Recurrent Neural Networks](https:\u002F\u002Fwww.kdd.org\u002Fkdd2016\u002Fsubtopic\u002Fview\u002Flarge-scale-item-categorization-in-e-commerce-using-multiple-recurrent-neur\u002F) ([Paper](https:\u002F\u002Fwww.kdd.org\u002Fkdd2016\u002Fpapers\u002Ffiles\u002Fadf0392-haAemb.pdf)) `NAVER` `2016`\n5. [Learning to Diagnose with LSTM Recurrent Neural Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.03677) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.03677.pdf)) `Google` `2017`\n6. [Discovering and Classifying In-App Message Intent at Airbnb](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fdiscovering-and-classifying-in-app-message-intent-at-airbnb-6a55f5400a0c) `Airbnb` `2019`\n7. [Teaching Machines to Triage Firefox Bugs](https:\u002F\u002Fhacks.mozilla.org\u002F2019\u002F04\u002Fteaching-machines-to-triage-firefox-bugs\u002F) `Mozilla` `2019`\n8. [Categorizing Products at Scale](https:\u002F\u002Fengineering.shopify.com\u002Fblogs\u002Fengineering\u002Fcategorizing-products-at-scale) `Shopify` `2020`\n9. [How We Built the Good First Issues Feature](https:\u002F\u002Fgithub.blog\u002F2020-01-22-how-we-built-good-first-issues\u002F) `GitHub` `2020`\n10. [Testing Firefox More Efficiently with Machine Learning](https:\u002F\u002Fhacks.mozilla.org\u002F2020\u002F07\u002Ftesting-firefox-more-efficiently-with-machine-learning\u002F) `Mozilla` `2020`\n11. 
[Using ML to Subtype Patients Receiving Digital Mental Health Interventions](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fblog\u002Fa-path-to-personalization-using-ml-to-subtype-patients-receiving-digital-mental-health-interventions\u002F) ([Paper](https:\u002F\u002Fjamanetwork.com\u002Fjournals\u002Fjamanetworkopen\u002Ffullarticle\u002F2768347)) `Microsoft` `2020`\n12. [Scalable Data Classification for Security and Privacy](https:\u002F\u002Fengineering.fb.com\u002Fsecurity\u002Fdata-classification-system\u002F) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.14109.pdf)) `Facebook` `2020`\n13. [Uncovering Online Delivery Menu Best Practices with Machine Learning](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F11\u002F10\u002Funcovering-online-delivery-menu-best-practices-with-machine-learning\u002F) `DoorDash` `2020`\n14. [Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F08\u002F28\u002Fovercome-the-cold-start-problem-in-menu-item-tagging\u002F) `DoorDash` `2020`\n15. [Deep Learning: Product Categorization and Shelving](https:\u002F\u002Fmedium.com\u002Fwalmartglobaltech\u002Fdeep-learning-product-categorization-and-shelving-630571e81e96) `Walmart` `2021`\n16. [Large-Scale Item Categorization for E-Commerce](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F2396761.2396838) ([Paper](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FJean_David_Ruvini\u002Fpublication\u002F262270957_Large-scale_item_categorization_for_e-commerce\u002Flinks\u002F5512dc3d0cf270fd7e33a0d5\u002FLarge-scale-item-categorization-for-e-commerce.pdf)) `DianPing`, `eBay` `2012`\n17. [Semantic Label Representation with an Application on Multimodal Product Categorization](https:\u002F\u002Fmedium.com\u002Fwalmartglobaltech\u002Fsemantic-label-representation-with-an-application-on-multimodal-product-categorization-63d668b943b7) `Walmart` `2022`\n18. [Building Airbnb Categories with ML and Human-in-the-Loop](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fbuilding-airbnb-categories-with-ml-and-human-in-the-loop-e97988e70ebb) `Airbnb` `2022`\n\n\n## Regression\n1. 
[Using Machine Learning to Predict Value of Homes on Airbnb](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fusing-machine-learning-to-predict-value-of-homes-on-airbnb-9272d3d4739d) `Airbnb` `2017`\n2. [Using Machine Learning to Predict the Value of Ad Requests](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2020\u002Fusing-machine-learning-to-predict-the-value-of-ad-requests.html) `Twitter` `2020`\n3. [Open-Sourcing Riskquant, a Library for Quantifying Risk](https:\u002F\u002Fnetflixtechblog.com\u002Fopen-sourcing-riskquant-a-library-for-quantifying-risk-6720cc1e4968) ([Code](https:\u002F\u002Fgithub.com\u002FNetflix-Skunkworks\u002Friskquant)) `Netflix` `2020`\n4. [Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F10\u002F14\u002Fsolving-for-unobserved-data-in-a-regression-model\u002F) `DoorDash` `2020`\n\n## Forecasting\n1. [Engineering Extreme Event Forecasting at Uber with RNN](https:\u002F\u002Feng.uber.com\u002Fneural-networks\u002F) `Uber` `2017`\n2. [Forecasting at Uber: An Introduction](https:\u002F\u002Feng.uber.com\u002Fforecasting-introduction\u002F) `Uber` `2018`\n3. [Transforming Financial Forecasting with Data Science and Machine Learning at Uber](https:\u002F\u002Feng.uber.com\u002Ftransforming-financial-forecasting-machine-learning\u002F) `Uber` `2018`\n4. [Under the Hood of Gojek's Automated Forecasting Tool](https:\u002F\u002Fwww.gojek.io\u002Fblog\u002Funder-the-hood-of-gojeks-automated-forecasting-tool) `Gojek` `2019`\n5. [BusTr: Predicting Bus Travel Times from Real-Time Traffic](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3394486.3403376) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3394486.3403376), [Video](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f3369790576dd25aef288db\u002F)) `Google` `2020`\n6. [Retraining Machine Learning Models in the Wake of COVID-19](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F09\u002F15\u002Fretraining-ml-models-covid-19\u002F) `DoorDash` `2020`\n7. [Automatic Forecasting Using Prophet, Databricks, Delta Lake and MLflow](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=TkcpjnLh690) ([Paper](https:\u002F\u002Fpeerj.com\u002Fpreprints\u002F3190.pdf), [Code](https:\u002F\u002Fgithub.com\u002Ffacebook\u002Fprophet)) `Atlassian` `2020`\n8. 
[Introducing Orbit, an Open Source Package for Time Series Inference and Forecasting](https:\u002F\u002Feng.uber.com\u002Forbit\u002F) ([Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.08492), [Video](https:\u002F\u002Fyoutu.be\u002FLXDpq_iwcWY), [Code](https:\u002F\u002Fgithub.com\u002Fuber\u002Forbit)) `Uber` `2021`\n9. [Managing Supply and Demand Balance Through Machine Learning](https:\u002F\u002Fdoordash.engineering\u002F2021\u002F06\u002F29\u002Fmanaging-supply-and-demand-balance-through-machine-learning\u002F) `DoorDash` `2021`\n10. [Greykite: A Flexible, Intuitive, and Fast Forecasting Library](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2021\u002Fgreykite--a-flexible--intuitive--and-fast-forecasting-library) `LinkedIn` `2021`\n11. [The History of Amazon's Forecasting Algorithm](https:\u002F\u002Fwww.amazon.science\u002Flatest-news\u002Fthe-history-of-amazons-forecasting-algorithm) `Amazon` `2021`\n11. [DeepETA: How Uber Predicts Arrival Times Using Deep Learning](https:\u002F\u002Feng.uber.com\u002Fdeepeta-how-uber-predicts-arrival-times\u002F) `Uber` `2022`\n12. [Forecasting Grubhub Order Volume at Scale](https:\u002F\u002Fbytes.grubhub.com\u002Fforecasting-grubhub-order-volume-at-scale-a966c2f901d2) `Grubhub` `2022`\n13. [Causal Forecasting at Lyft (Part 1)](https:\u002F\u002Feng.lyft.com\u002Fcausal-forecasting-at-lyft-part-1-14cca6ff3d6d) `Lyft` `2022`\n\n## Recommendation\n1. [Amazon.com Recommendations: Item-to-Item Collaborative Filtering](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F1167344) ([Paper](https:\u002F\u002Fwww.cs.umd.edu\u002F~samir\u002F498\u002FAmazon-Recommendations.pdf)) `Amazon` `2003`\n2. [Netflix Recommendations: Beyond the 5 Stars (Part 1)](https:\u002F\u002Fnetflixtechblog.com\u002Fnetflix-recommendations-beyond-the-5-stars-part-1-55838468f429) ([Part 2](https:\u002F\u002Fnetflixtechblog.com\u002Fnetflix-recommendations-beyond-the-5-stars-part-2-d9b96aa399f5)) `Netflix` `2012`\n3. [How Music Recommendation Works, and Doesn't Work](https:\u002F\u002Fnotes.variogram.com\u002F2012\u002F12\u002F11\u002Fhow-music-recommendation-works-and-doesnt-work\u002F) `Spotify` `2012`\n4. [Learning to Rank Recommendations with the k-Order Statistic Loss](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F2507157.2507210) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F2507157.2507210)) `Google` `2013`\n5. 
[Recommending Music on Spotify with Deep Learning](https:\u002F\u002Fbenanne.github.io\u002F2014\u002F08\u002F05\u002Fspotify-cnns.html) `Spotify` `2014`\n6. [Learning a Personalized Homepage](https:\u002F\u002Fnetflixtechblog.com\u002Flearning-a-personalized-homepage-aa8ec670359a) `Netflix` `2015`\n7. [The Netflix Recommender System: Algorithms, Business Value, and Innovation](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F2843948) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F2843948)) `Netflix` `2015`\n7. [Session-Based Recommendations with Recurrent Neural Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06939) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.06939.pdf)) `Telefonica` `2016`\n8. [Deep Neural Networks for YouTube Recommendations](https:\u002F\u002Fstatic.googleusercontent.com\u002Fmedia\u002Fresearch.google.com\u002Fen\u002F\u002Fpubs\u002Farchive\u002F45530.pdf) `YouTube` `2016`\n9. [E-Commerce in Your Inbox: Product Recommendations at Scale](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.07154) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1606.07154.pdf)) `Yahoo` `2016`\n10. [To Be Continued: Helping You Find Shows to Continue Watching on Netflix](https:\u002F\u002Fnetflixtechblog.com\u002Fto-be-continued-helping-you-find-shows-to-continue-watching-on-7c0d8ee4dab6) `Netflix` `2016`\n11. [Personalized Recommendations in LinkedIn Learning](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2016\u002F12\u002Fpersonalized-recommendations-in-linkedin-learning) `LinkedIn` `2016`\n12. [Personalized Channel Recommendations in Slack](https:\u002F\u002Fslack.engineering\u002Fpersonalized-channel-recommendations-in-slack\u002F) `Slack` `2016`\n13. [Recommending Complementary Products in E-Commerce Push Notifications](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.08113) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1707.08113.pdf)) `Alibaba` `2017`\n14. [Artwork Personalization at Netflix](https:\u002F\u002Fnetflixtechblog.com\u002Fartwork-personalization-c589f074ad76) `Netflix` `2017`\n15. [A Meta-Learning Perspective on Cold-Start Recommendations for Items](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F7266-a-meta-learning-perspective-on-cold-start-recommendations-for-items) ([Paper](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F7266-a-meta-learning-perspective-on-cold-start-recommendations-for-items.pdf)) `Twitter` `2017`\n16. 
[Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.07601) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.07601.pdf)) `Pinterest` `2017`\n17. [Powering Search & Recommendations at DoorDash](https:\u002F\u002Fdoordash.news\u002Fcompany\u002Fpowering-search-recommendations-at-doordash\u002F) `DoorDash` `2017`\n17. [How 20th Century Fox Uses ML to Predict a Movie Audience](https:\u002F\u002Fcloud.google.com\u002Fblog\u002Fproducts\u002Fai-machine-learning\u002Fhow-20th-century-fox-uses-ml-to-predict-a-movie-audience) ([Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.08189)) `20th Century Fox` `2018`\n18. [Calibrated Recommendations](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3240323.3240372) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3240323.3240372)) `Netflix` `2018`\n19. [Food Discovery with Uber Eats: Recommending for the Marketplace](https:\u002F\u002Feng.uber.com\u002Fuber-eats-recommending-marketplace\u002F) `Uber` `2018`\n20. [Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3240323.3240354) ([Paper](https:\u002F\u002Fstatic1.squarespace.com\u002Fstatic\u002F5ae0d0b48ab7227d232c2bea\u002Ft\u002F5ba849e3c83025fa56814f45\u002F1537755637453\u002FBartRecSys.pdf)) `Spotify` `2018`\n21. [Talent Search and Recommendation Systems at LinkedIn: Practical Challenges and Lessons Learned](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.06481) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1809.06481.pdf)) `LinkedIn` `2018`\n21. [Behavior Sequence Transformer for E-Commerce Recommendation in Alibaba](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.06874) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.06874.pdf)) `Alibaba` `2019`\n22. [SDM: Sequential Deep Matching Model for Online Large-Scale Recommender System](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.00385) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.00385.pdf)) `Alibaba` `2019`\n23. [Multi-Interest Network with Dynamic Routing for Recommendation at Tmall](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.08030) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.08030.pdf)) `Alibaba` `2019`\n24. 
[Personalized Recommendations for Experiences Using Deep Learning](https:\u002F\u002Fwww.tripadvisor.com\u002Fengineering\u002Fpersonalized-recommendations-for-experiences-using-deep-learning\u002F) `TripAdvisor` `2019`\n25. [Powered by AI: Instagram's Explore Recommender System](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fpowered-by-ai-instagrams-explore-recommender-system\u002F) `Facebook` `2019`\n26. [Marginal Posterior Sampling for Slate Bandits](https:\u002F\u002Fwww.ijcai.org\u002Fproceedings\u002F2019\u002F308) ([Paper](https:\u002F\u002Fwww.ijcai.org\u002Fproceedings\u002F2019\u002F0308.pdf)) `Netflix` `2019`\n27. [Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations](https:\u002F\u002Feng.uber.com\u002Fuber-eats-graph-learning\u002F) `Uber` `2019`\n28. [Music Recommendation at Spotify](http:\u002F\u002Fsigir.org\u002Fafirm2019\u002Fslides\u002F16.%20Friday%20-%20Music%20Recommendation%20at%20Spotify%20-%20Ben%20Carterette.pdf) `Spotify` `2019`\n29. [Using Machine Learning to Predict What File You Need Next (Part 1)](https:\u002F\u002Fdropbox.tech\u002Fmachine-learning\u002Fcontent-suggestions-machine-learning) `Dropbox` `2019`\n30. [Using Machine Learning to Predict What File You Need Next (Part 2)](https:\u002F\u002Fdropbox.tech\u002Fmachine-learning\u002Fusing-machine-learning-to-predict-what-file-you-need-next-part-2) `Dropbox` `2019`\n31. [Learning to Be Relevant: Evolution of a Course Recommendation System](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3357384.3357817) (**PAPER NEEDED**) `LinkedIn` `2019`\n32. [Temporal-Contextual Recommendation in Real-Time](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Ftemporal-contextual-recommendation-in-real-time) ([Paper](https:\u002F\u002Fassets.amazon.science\u002F96\u002F71\u002Fd1f25754497681133c7aa2b7eb05\u002Ftemporal-contextual-recommendation-in-real-time.pdf)) `Amazon` `2020`\n33. [P-Companion: A Framework for Diversified Complementary Product Recommendation](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fp-companion-a-principled-framework-for-diversified-complementary-product-recommendation) ([Paper](https:\u002F\u002Fassets.amazon.science\u002Fd5\u002F16\u002F3f7809974a899a11bacdadefdf24\u002Fp-companion-a-principled-framework-for-diversified-complementary-product-recommendation.pdf)) `Amazon` `2020`\n34. 
[Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.12981) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.12981.pdf)) `Alibaba` `2020`\n35. [TPG-DNN: A Method for User Intent Prediction with Multi-Task Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2008.02122) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.02122.pdf)) `Alibaba` `2020`\n36. [PURS: Personalized Unexpected Recommender System for Improving User Satisfaction](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3383313.3412238) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3383313.3412238)) `Alibaba` `2020`\n37. [Controllable Multi-Interest Framework for Recommendation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.09347) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.09347.pdf)) `Alibaba` `2020`\n38. [MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction](https:\u002F\u002Farxiv.org\u002Fabs\u002F2008.02974) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.02974.pdf)) `Alibaba` `2020`\n39. [ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.12002) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.12002.pdf)) `Alibaba` `2020`\n40. [For Your Ears Only: Personalizing Spotify Home with Machine Learning](https:\u002F\u002Fengineering.atspotify.com\u002F2020\u002F01\u002F16\u002Ffor-your-ears-only-personalizing-spotify-home-with-machine-learning\u002F) `Spotify` `2020`\n41. [Reach for the Top: How Spotify Built Shortcuts in Just Six Months](https:\u002F\u002Fengineering.atspotify.com\u002F2020\u002F04\u002F15\u002Freach-for-the-top-how-spotify-built-shortcuts-in-just-six-months\u002F) `Spotify` `2020`\n42. [Contextual and Sequential User Embeddings for Large-Scale Music Recommendation](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3383313.3412248) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3383313.3412248)) `Spotify` `2020`\n43. [The Evolution of Kit: Automating Marketing Using Machine Learning](https:\u002F\u002Fengineering.shopify.com\u002Fblogs\u002Fengineering\u002Fevolution-kit-automating-marketing-machine-learning) `Shopify` `2020`\n44. [A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1)](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fcourse-recommendations-ai-part-one) `LinkedIn` `2020`\n45. 
[A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2)](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fcourse-recommendations-ai-part-two) `LinkedIn` `2020`\n46. [Building a Heterogeneous Social Network Recommendation System](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fbuilding-a-heterogeneous-social-network-recommendation-system) `LinkedIn` `2020`\n47. [How TikTok Recommends Videos #ForYou](https:\u002F\u002Fnewsroom.tiktok.com\u002Fen-us\u002Fhow-tiktok-recommends-videos-for-you) `ByteDance` `2020`\n48. [Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval](https:\u002F\u002Farxiv.org\u002Fabs\u002F2008.02930) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.02930.pdf)) `Google` `2020`\n49. [Improved Deep & Cross Network for Feature Cross Learning in Web-Scale LTR Systems](https:\u002F\u002Farxiv.org\u002Fabs\u002F2008.13535) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.13535.pdf)) `Google` `2020`\n50. [Mixed Negative Sampling for Learning Two-Tower Neural Networks in Recommendations](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub50257\u002F) ([Paper](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002Fb9f4e78a8830fe5afcf2f0452862fb3c0d6584ea.pdf)) `Google` `2020`\n51. [Future Data Helps Training: Modeling Future Contexts for Session-Based Recommendation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.04473.pdf) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.04473.pdf)) `Tencent` `2020`\n52. [A Case Study of Session-Based Recommendations in the Home-Improvement Domain](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3383313.3412235) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3383313.3412235)) `Home Depot` `2020`\n53. [Balancing Relevance and Discovery to Inspire Customers in the IKEA App](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3383313.3411550) ([Paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3383313.3411550)) `IKEA` `2020`\n54. [How We Use AutoML, Multi-Task Learning and Multi-Tower Models for Pinterest Ads](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fhow-we-use-automl-multi-task-learning-and-multi-tower-models-for-pinterest-ads-db966c3dc99e) `Pinterest` `2020`\n55. 
[Multi-Task Learning for Related Products Recommendations at Pinterest](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fmulti-task-learning-for-related-products-recommendations-at-pinterest-62684f631c12) `Pinterest` `2020`\n56. [Improving the Quality of Recommended Pins with Lightweight Ranking](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fimproving-the-quality-of-recommended-pins-with-lightweight-ranking-8ff5477b20e3) `Pinterest` `2020`\n57. [Multi-Task Learning and Calibration for Utility-Based Home Feed Ranking](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fmulti-task-learning-and-calibration-for-utility-based-home-feed-ranking-64087a7bcbad) `Pinterest` `2020`\n57. [Personalized Cuisine Filter Based on Customer Preference and Local Popularity](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F01\u002F27\u002Fpersonalized-cuisine-filter\u002F) `DoorDash` `2020`\n58. [How We Built a Matchmaking Algorithm to Cross-Sell Products](https:\u002F\u002Fwww.gojek.io\u002Fblog\u002Fhow-we-built-a-matchmaking-algorithm-to-cross-sell-products) `Gojek` `2020`\n59. [Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.09293) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2105.09293.pdf)) `Twitter` `2021`\n60. [Self-Supervised Learning for Large-Scale Item Recommendations](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.12865) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.12865.pdf)) `Google` `2021`\n61. [Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.07203) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.07203.pdf)) `ByteDance` `2021`\n62. [Using AI to Help Health Experts Address the COVID-19 Pandemic](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fusing-ai-to-help-health-experts-address-the-covid-19-pandemic\u002F) `Facebook` `2021`\n63. [Advertiser Recommendation Systems at Pinterest](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fadvertiser-recommendation-systems-at-pinterest-ccb255fbde20) `Pinterest` `2021`\n64. [On YouTube's Recommendation System](https:\u002F\u002Fblog.youtube\u002Finside-youtube\u002Fon-youtubes-recommendation-system\u002F) `YouTube` `2021`\n65. [\"Are You Sure?\": Preliminary Insights from Scaling Product Comparisons to Multiple Shops](https:\u002F\u002Farxiv.org\u002Fabs\u002F2107.03256) `Coveo` `2021`\n66. 
[Mozrt, a Deep Learning Recommendation System Empowering Walmart Store Associates](https:\u002F\u002Fmedium.com\u002Fwalmartglobaltech\u002Fmozrt-a-deep-learning-recommendation-system-empowering-walmart-store-associates-with-a-5d42c08d88da) `Walmart` `2021`\n67. [Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training](https:\u002F\u002Farxiv.org\u002Fabs\u002F2108.09373) ([Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2108.09373.pdf)) `Meta` `2021`\n67. [The Amazon Music Conversational Recommender Is Hitting the Right Notes](https:\u002F\u002Fwww.amazon.science\u002Flatest-news\u002Fhow-amazon-music-uses-recommendation-system-machine-learning) `Amazon` `2022`\n68. [Personalized Complementary Product Recommendation](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fpersonalized-complementary-product-recommendation) ([Paper](https:\u002F\u002Fassets.amazon.science\u002F6c\u002Fd9\u002Fa0ec3eda4f0fb4312ce0ada41771\u002Fpersonalized-complementary-product-recommendation.pdf)) `Amazon` `2022`\n69. [Building a Deep Learning Based Retrieval System for Personalized Recommendations](https:\u002F\u002Ftech.ebayinc.com\u002Fengineering\u002Fbuilding-a-deep-learning-based-retrieval-system-for-personalized-recommendations\u002F) `eBay` `2022`\n70. [How We Built: An Early-Stage Machine Learning Model for Recommendations](https:\u002F\u002Fwww.onepeloton.com\u002Fpress\u002Farticles\u002Fhow-we-built-machine-learning) `Peloton` `2022`\n71. [Lessons Learned from Building Context-Aware Recommender Systems](https:\u002F\u002Fwww.onepeloton.com\u002Fpress\u002Farticles\u002Flessons-learned-from-building-context-aware-recommender-systems) `Peloton` `2022`\n72. [Beyond Matrix Factorization: Using Hybrid Features for User-Business Recommendations](https:\u002F\u002Fengineeringblog.yelp.com\u002F2022\u002F04\u002Fbeyond-matrix-factorization-using-hybrid-features-for-user-business-recommendations.html) `Yelp` `2022`\n73. [Improving Job Matching with Machine-Learned Activity Features](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2022\u002Fimproving-job-matching-with-machine-learned-activity-features-) `LinkedIn` `2022`\n74. [Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training](https:\u002F\u002Farxiv.org\u002Fabs\u002F2108.09373v4) `Meta` `2022`\n75. [Blueprints for Recommender System Architectures: 10th Anniversary Edition](https:\u002F\u002Famatriain.net\u002Fblog\u002FRecsysArchitectures) `Xavier Amatriain` `2022`\n76. 
[Pinterest如何利用实时用户行为提升主页信息流互动量](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fhow-pinterest-leverages-realtime-user-actions-in-recommendation-to-boost-homefeed-engagement-volume-165ae2e8cde8) `Pinterest` `2022`\n77. [RecSysOps：大型推荐系统运营的最佳实践](https:\u002F\u002Fnetflixtechblog.medium.com\u002Frecsysops-best-practices-for-operating-a-large-scale-recommender-system-95bbe195a841) `Netflix` `2022`\n78. [Recommend API：统一的端到端机器学习基础设施，用于生成推荐](https:\u002F\u002Fslack.engineering\u002Frecommend-api\u002F) `Slack` `2022`\n79. [DoorDash替代品推荐算法的演进](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F09\u002F08\u002Fevolving-doordashs-substitution-recommendations-algorithm\u002F) `DoorDash` `2022`\n80. [首页推荐中的利用与探索](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F10\u002F05\u002Fhomepage-recommendation-with-exploitation-and-exploration\u002F) `DoorDash` `2022`\n81. [Pinterest中GPU加速的ML推理](https:\u002F\u002Fmedium.com\u002F@Pinterest_Engineering\u002Fgpu-accelerated-ml-inference-at-pinterest-ad1b6a03a16d) `Pinterest` `2022`\n82. [解决因果推荐中的混杂因素问题](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.06532) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2205.06532.pdf)) `腾讯` `2022`\n\n\n\n## 搜索与排序\n1. [亚马逊搜索：产品排序的乐趣](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Famazon-search-the-joy-of-ranking-products)（[论文](https:\u002F\u002Fassets.amazon.science\u002F89\u002Fcd\u002F34289f1f4d25b5857d776bdf04d5\u002Famazon-search-the-joy-of-ranking-products.pdf)、[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=NLrhmn-EZ88)、[代码](https:\u002F\u002Fgithub.com\u002Fdariasor\u002FTreeExtra)）`亚马逊` `2016`\n2. [Lazada 如何通过产品排序提升用户体验和转化率](https:\u002F\u002Fwww.slideshare.net\u002Feugeneyan\u002Fhow-lazada-ranks-products-to-improve-customer-experience-and-conversion) `Lazada` `2016`\n3. 
[雅虎搜索中的相关性排序](https:\u002F\u002Fwww.kdd.org\u002Fkdd2016\u002Fsubtopic\u002Fview\u002Franking-relevance-in-yahoo-search)（[论文](https:\u002F\u002Fwww.kdd.org\u002Fkdd2016\u002Fpapers\u002Ffiles\u002Fadf0361-yinA.pdf)）`雅虎` `2016`\n4. [在职业社交网络中学习对个性化搜索结果进行排序](https:\u002F\u002Farxiv.org\u002Fabs\u002F1605.04624)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1605.04624.pdf)）`领英` `2016`\n5. [在推特时间线中大规模应用深度学习](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2017\u002Fusing-deep-learning-at-scale-in-twitters-timelines.html) `推特` `2017`\n6. [基于集成方法的 Etsy 推广商品点击率预测](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.01377)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.01377.pdf)）`Etsy` `2017`\n7. [DoorDash 的搜索与推荐系统支撑技术](https:\u002F\u002Fdoordash.engineering\u002F2017\u002F07\u002F06\u002Fpowering-search-recommendations-at-doordash\u002F) `DoorDash` `2017`\n8. [将深度学习应用于 Airbnb 搜索](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.09591)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.09591.pdf)）`Airbnb` `2018`\n9. [人才搜索中的会话内个性化](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.06488)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1809.06488.pdf)）`领英` `2018`\n10. [领英的人才搜索与推荐系统](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.06481)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1809.06481.pdf)）`领英` `2018`\n11. [优步外卖的食物发现：构建查询理解引擎](https:\u002F\u002Feng.uber.com\u002Fuber-eats-query-understanding\u002F) `优步` `2018`\n12. [电子商务搜索中全局优化的互影响感知排序](https:\u002F\u002Farxiv.org\u002Fabs\u002F1805.08524)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1805.08524.pdf)）`阿里巴巴` `2018`\n13. [强化学习在电子商务搜索引擎排序中的应用](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.00710)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.00710.pdf)）`阿里巴巴` `2018`\n14. [语义化商品搜索](https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.00937)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.00937.pdf)）`亚马逊` `2019`\n15. 
[机器学习驱动的 Airbnb Experiences 搜索排名](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fmachine-learning-powered-search-ranking-of-airbnb-experiences-110b4b1a0789) `Airbnb` `2019`\n16. [基于树交互特征的实体个性化人才搜索模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.09041)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1902.09041.pdf)）`领英` `2019`\n17. [领英招聘者搜索与推荐系统背后的 AI 技术](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2019\u002F04\u002Fai-behind-linkedin-recruiter-search-and-recommendation-systems) `领英` `2019`\n18. [学习招聘偏好：领英职位搜索背后的 AI 技术](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2019\u002F02\u002Flearning-hiring-preferences--the-ai-behind-linkedin-jobs) `领英` `2019`\n19. [搜索个性化背后的秘密武器](https:\u002F\u002Fwww.gojek.io\u002Fblog\u002Fthe-secret-sauce-behind-search-personalisation) `Gojek` `2019`\n20. [神经代码搜索：基于 ML 的自然语言查询代码搜索](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fneural-code-search-ml-based-code-search-using-natural-language-queries\u002F) `Facebook` `2019`\n21. [通过强化学习聚合来自异构来源的搜索结果](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.08882)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1902.08882.pdf)）`阿里巴巴` `2019`\n22. [面向电子商务搜索的跨域注意力网络与 Wasserstein 正则化](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3357384.3357809) `阿里巴巴` `2019`\n23. [比以往任何时候都更好地理解搜索](https:\u002F\u002Fwww.blog.google\u002Fproducts\u002Fsearch\u002Fsearch-language-understanding-bert\u002F)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.04805.pdf)）`谷歌` `2019`\n24. [我们如何利用语义搜索让搜索效率提升 10 倍](https:\u002F\u002Fmedium.com\u002Ftokopedia-engineering\u002Fhow-we-used-semantic-search-to-make-our-search-10x-smarter-bd9c7f601821) `Tokopedia` `2019`\n25. [Query2vec：使用查询嵌入扩展搜索查询](https:\u002F\u002Fbytes.grubhub.com\u002Fsearch-query-embeddings-using-query2vec-f5931df27d79) `GrubHub` `2019`\n26. [MOBIUS：迈向百度推广搜索下一代查询广告匹配技术](http:\u002F\u002Fresearch.baidu.com\u002FPublic\u002Fuploads\u002F5d12eca098d40.pdf) `百度` `2019`\n27. 
[为什么人们会在语音商品搜索中购买看似不相关的商品？](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fwhy-do-people-buy-irrelevant-items-in-voice-product-search)（[论文](https:\u002F\u002Fassets.amazon.science\u002Ff7\u002F48\u002F0562b2c14338a0b76ccf4f523fa5\u002Fwhy-do-people-buy-irrelevant-items-in-voice-product-search.pdf)）`亚马逊` `2020`\n28. [管理 Airbnb 搜索中的多样性](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.02621)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2004.02621.pdf)）`Airbnb` `2020`\n29. [改进 Airbnb 搜索的深度学习技术](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.05515)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2002.05515.pdf)）`Airbnb` `2020`\n30. [通过个性化 AI 实现招聘方和求职者的优质匹配](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fquality-matches-via-personalized-ai) `领英` `2020`\n31. [理解停留时间以提升领英信息流排名](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Funderstanding-feed-dwell-time) `领英` `2020`\n32. [通过约束优化进行信息流广告分配](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3394486.3403391)（[论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3394486.3403391)，[视频](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f33697a0576dd25aef288ea\u002F)) `领英` `2020`\n34. [必应中的规模化 AI 技术](https:\u002F\u002Fblogs.bing.com\u002Fsearch\u002F2020_05\u002FAI-at-Scale-in-Bing) `微软` `2020`\n35. [Traveloka 通用搜索中的查询理解引擎](https:\u002F\u002Fmedium.com\u002Ftraveloka-engineering\u002Fquery-understanding-engine-in-traveloka-universal-search-410ad3895db7) `Traveloka` `2020`\n36. [Wayfair 中的贝叶斯商品排序](https:\u002F\u002Ftech.wayfair.com\u002Fdata-science\u002F2020\u002F01\u002Fbayesian-product-ranking-at-wayfair) `Wayfair` `2020`\n37. [COLD：迈向下一代预排序系统](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.16122)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.16122.pdf)）`阿里巴巴` `2020`\n38. 
[Shop The Look：在 Pinterest 上构建大规模视觉购物系统](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3394486.3403372)（[论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3394486.3403372)，[视频](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f3369790576dd25aef288d7\u002F)) `Pinterest` `2020`\n39. [通过 Pinterest 搜索推动购物加售](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fdriving-shopping-upsells-from-pinterest-search-d06329255402) `Pinterest` `2020`\n40. [GDMix：一个深度排序个性化框架](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fgdmix--a-deep-ranking-personalization-framework)（[代码](https:\u002F\u002Fgithub.com\u002Flinkedin\u002Fgdmix)）`领英` `2020`\n41. [为 Etsy 带来个性化搜索](https:\u002F\u002Fcodeascraft.com\u002F2020\u002F10\u002F29\u002Fbringing-personalized-search-to-etsy\u002F) `Etsy` `2020`\n42. [为 Semantic Scholar 构建更优秀的搜索引擎](https:\u002F\u002Fmedium.com\u002Fai2-blog\u002Fbuilding-a-better-search-engine-for-semantic-scholar-ea23a0b661e7) `艾伦人工智能研究所` `2020`\n43. [面向自然语言企业级搜索的查询理解](https:\u002F\u002Farxiv.org\u002Fabs\u002F2012.06238)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2012.06238.pdf)）`Salesforce` `2020`\n44. [事物而非字符串：通过更好的召回率理解搜索意图](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F12\u002F15\u002Funderstanding-search-intent-with-better-recall\u002F) `DoorDash` `2020`\n45. [用于挖掘未被充分发掘音乐内容的查询理解](https:\u002F\u002Fresearch.atspotify.com\u002Fpublications\u002Fquery-understanding-for-surfacing-under-served-music-content\u002F)（[论文](https:\u002F\u002Flabtomarket.files.wordpress.com\u002F2020\u002F08\u002Fcikm2020.pdf)）`Spotify` `2020`\n46. [基于嵌入的 Facebook 搜索检索](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.11632)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.11632.pdf)）`Facebook` `2020`\n47. [通过嵌入学习实现电子商务搜索的个性化与语义化检索](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.02282)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.02282.pdf)）`京东` `2020`\n48. 
[QUEEN：电商领域的神经查询重写](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fqueen-neural-query-rewriting-in-e-commerce)（[论文](https:\u002F\u002Fassets.amazon.science\u002Ff9\u002F78\u002Fdda8f1e143dba8ca96e43ec487c6\u002Fqueen-neural-query-rewriting-in-ecommerce.pdf)）`亚马逊` `2021`\n49. [利用学习排序精准定位包裹配送地点](https:\u002F\u002Fwww.amazon.science\u002Fblog\u002Fusing-learning-to-rank-to-precisely-locate-where-to-deliver-packages)（[论文](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fgetting-your-package-to-the-right-place-supervised-machine-learning-for-geolocation)) `亚马逊` `2021`\n50. [电商搜索中的季节性相关性](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fseasonal-relevance-in-e-commerce-search)（[论文](https:\u002F\u002Fassets.amazon.science\u002Fac\u002F5e\u002Fd47612a846d6bec15738d7c8ab40\u002Fseasonal-relevance-in-ecommerce-search.pdf)）`亚马逊` `2021`\n51. [赞助搜索中用于预测点击率的图意图网络](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.16164)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.16164.pdf)）`阿里巴巴` `2021`\n52. [我们如何为 Etsy 广告构建特定场景的竞价系统](https:\u002F\u002Fcodeascraft.com\u002F2021\u002F03\u002F23\u002Fhow-we-built-a-context-specific-bidding-system-for-etsy-ads\u002F) `Etsy` `2021`\n53. [基于预训练语言模型的百度搜索排名](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.11108)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2105.11108.pdf)）`百度` `2021`\n54. [拼接空间以支持基于查询的推荐](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2021\u002F08\u002F13\u002Fstitching-together-spaces-for-query-based-recommendations\u002F) `Stitch Fix` `2021`\n55. [为领英搜索系统提供深度自然语言处理](https:\u002F\u002Farxiv.org\u002Fabs\u002F2108.08252)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2108.08252.pdf)）`领英` `2021`\n56. [基于 Siamese BERT 的网页搜索相关性排名模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.01810)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2112.01810.pdf)，[代码](https:\u002F\u002Fgithub.com\u002Fseznam\u002FDaReCzech)) `Seznam` `2021`\n57. 
[SearchSage：在 Pinterest 上学习搜索查询表示](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fsearchsage-learning-search-query-representations-at-pinterest-654f2bb887fc) `Pinterest` `2021`\n58. [Query2Prod2Vec：面向电商的 grounded 词嵌入](https:\u002F\u002Faclanthology.org\u002F2021.naacl-industry.20\u002F) `Coveo` `2021`\n59. [三项措施扩大 DoorDash 商品搜索范围，超越配送业务](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F05\u002F10\u002F3-changes-to-expand-doordashs-product-search\u002F) `DoorDash` `2022`\n60. [学习多样化排序](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Flearning-to-rank-diversely-add6b1929621) `Airbnb` `2022`\n61. [如何利用级联多臂老虎机优化排名](https:\u002F\u002Fmedium.com\u002Fexpedia-group-tech\u002Fhow-to-optimise-rankings-with-cascade-bandits-5d92dfa0f16b) `Expedia` `2022`\n62. [谷歌搜索排名系统指南](https:\u002F\u002Fdevelopers.google.com\u002Fsearch\u002Fdocs\u002Fappearance\u002Franking-systems-guide) `谷歌` `2022`\n63. [Etsy 搜索排名中的深度学习技术](https:\u002F\u002Fwww.etsy.com\u002Fcodeascraft\u002Fdeep-learning-for-search-ranking-at-etsy) `Etsy` `2022`\n64. [Calm 应用中的搜索功能](https:\u002F\u002Feng.calm.com\u002Fposts\u002Fsearch-at-calm) `Calm` `2022`\n\n## 嵌入\n1. [用于构建推荐系统的物品、用户和购物车的向量表示](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.06338) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1705.06338.pdf)) `Sears` `2017`\n2. [阿里巴巴电商推荐中的亿级商品嵌入](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.02349) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.02349.pdf)) `阿里巴巴` `2018`\n3. [Twitter 的嵌入技术](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2018\u002Fembeddingsattwitter.html) `Twitter` `2018`\n4. 
[搜索排序中的房源嵌入](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Flisting-embeddings-for-similar-listing-recommendations-and-real-time-personalization-in-search-601172f7603e) ([论文](https:\u002F\u002Fwww.kdd.org\u002Fkdd2018\u002Faccepted-papers\u002Fview\u002Freal-time-personalization-using-embeddings-for-search-ranking-at-airbnb)) `Airbnb` `2018`\n5. [理解潜在风格](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2018\u002F06\u002F28\u002Flatent-style\u002F) `Stitch Fix` `2018`\n6. [LinkedIn 人才搜索中的深度表示学习](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.06473) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1809.06473.pdf)) `LinkedIn` `2018`\n7. [基于向量嵌入的个性化店铺信息流](https:\u002F\u002Fdoordash.engineering\u002F2018\u002F04\u002F02\u002Fpersonalized-store-feed-with-vector-embeddings\u002F) `DoorDash` `2018`\n8. [我们应该使用嵌入吗？实时推荐中嵌入性能的研究](https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.06556) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.06556.pdf)) `Moshbit` `2019`\n9. [机器学习助力更优质的开发者体验](https:\u002F\u002Fnetflixtechblog.com\u002Fmachine-learning-for-a-better-developer-experience-1e600c69f36c) `Netflix` `2020`\n10. [发布 ScaNN：高效的向量相似性搜索](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F07\u002Fannouncing-scann-efficient-vector.html) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1908.10396.pdf)，[代码](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Ftree\u002Fmaster\u002Fscann)) `Google` `2020`\n11. [BERT 上市场：比较产品表示的分布模型](https:\u002F\u002Faclanthology.org\u002F2021.ecnlp-1.1\u002F) `Coveo` `2021`\n12. [从冷门领域走出的嵌入：利用内容推理改进新品和稀有产品的向量表示](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3383313.3411477) `Coveo` `2022`\n13. [Scribd 中的基于嵌入的检索](https:\u002F\u002Ftech.scribd.com\u002Fblog\u002F2021\u002Fembedding-based-retrieval-scribd.html) `Scribd` `2021`\n14. [行为型歌曲嵌入的多目标超参数优化](https:\u002F\u002Farxiv.org\u002Fabs\u002F2208.12724) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2208.12724.pdf)) `Apple` `2022`\n15. 
[Spotify 规模下的嵌入——能有多难？](https:\u002F\u002Farize.com\u002Fresource\u002Fembeddings-at-scale-spotify-recsys\u002F) `Spotify` `2023`\n\n## 自然语言处理\n1. [在线用户内容中的辱骂性语言检测](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F2872427.2883062) ([论文](http:\u002F\u002Fwww.yichang-cs.com\u002Fyahoo\u002FWWW16_Abusivedetection.pdf)) `雅虎` `2016`\n2. [Smart Reply：电子邮件自动回复建议](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub45189\u002F) ([论文](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F45189.pdf)) `谷歌` `2016`\n3. [为会员消息构建智能回复](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2017\u002F10\u002Fbuilding-smart-replies-for-member-messages) `领英` `2017`\n4. [自然语言处理如何帮助领英会员更轻松地获得支持](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2019\u002F04\u002Fhow-natural-language-processing-help-support) `领英` `2019`\n5. [Gmail Smart Compose：实时辅助写作](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.00080) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.00080.pdf)) `谷歌` `2019`\n6. [在真实场景中结合用户画像特征的目标导向端到端对话模型](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fgoal-oriented-end-to-end-chatbots-with-profile-features-in-a-real-world-setting) ([论文](https:\u002F\u002Fassets.amazon.science\u002F47\u002F03\u002Fe0d14dc34d3eb6e0d4ec282067bd\u002Fgoal-oriented-end-to-end-chatbots-with-profile-features-in-a-real-world-setting.pdf)) `亚马逊` `2019`\n7. [给我牛仔裤，不要鞋子：BERT如何帮助我们满足客户的需求](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2019\u002F07\u002F15\u002Fgive-me-jeans\u002F) `Stitch Fix` `2019`\n8. [DeText：用于智能文本理解的深度NLP框架](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fopen-sourcing-detext) ([代码](https:\u002F\u002Fgithub.com\u002Flinkedin\u002Fdetext)) `领英` `2020`\n9. [YouTube创作者的SmartReply](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F07\u002Fsmartreply-for-youtube-creators.html) `谷歌` `2020`\n10. 
[利用神经网络从表格中寻找答案](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F04\u002Fusing-neural-networks-to-find-answers.html) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2004.02349.pdf)) `谷歌` `2020`\n11. [一种可扩展的方法来减少谷歌翻译中的性别偏见](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F04\u002Fa-scalable-approach-to-reducing-gender.html) `谷歌` `2020`\n12. [辅助AI让回复更轻松](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fgroup\u002Fmsai\u002Farticles\u002Fassistive-ai-makes-replying-easier-2\u002F) `微软` `2020`\n13. [AI技术进步助力更好地检测仇恨言论](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fai-advances-to-better-detect-hate-speech\u002F) `Facebook` `2020`\n14. [最先进的开源聊天机器人](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fstate-of-the-art-open-source-chatbot) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2004.13637.pdf)) `Facebook` `2020`\n15. [部署在CPU上的高效实时文本转语音系统](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fa-highly-efficient-real-time-text-to-speech-system-deployed-on-cpus\u002F) `Facebook` `2020`\n16. [深度学习实现编程语言之间的翻译](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fdeep-learning-to-translate-between-programming-languages\u002F) ([论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.03511)，[代码](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FTransCoder)) `Facebook` `2020`\n17. [部署终身开放域对话学习](https:\u002F\u002Farxiv.org\u002Fabs\u002F2008.08076) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.08076.pdf)) `Facebook` `2020`\n18. [推出Dynabench：重新思考AI基准测试的方式](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fdynabench-rethinking-ai-benchmarking\u002F) `Facebook` `2020`\n19. [Gojek如何利用NLP大规模命名取货地点](https:\u002F\u002Fwww.gojek.io\u002Fblog\u002Fnlp-cartobert) `Gojek` `2020`\n20. [中文和英文的最先进开放域聊天机器人](http:\u002F\u002Fresearch.baidu.com\u002FBlog\u002Findex-view?id=142) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.16779.pdf)) `百度` `2020`\n21. 
[PEGASUS：最先进的摘要式文本摘要模型](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F06\u002Fpegasus-state-of-art-model-for.html) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1912.08777.pdf)，[代码](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fpegasus)) `谷歌` `2020`\n22. [Photon：鲁棒的跨领域文本转SQL系统](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.acl-demos.24\u002F) ([论文](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.acl-demos.24.pdf)) ([演示](http:\u002F\u002Fnaturalsql.com)) `Salesforce` `2020`\n23. [GeDi：控制语言模型的强大新方法](https:\u002F\u002Fblog.einstein.ai\u002Fgedi\u002F) ([论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2009.06367)，[代码](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002FGeDi)) `Salesforce` `2020`\n24. [应用主题建模提升呼叫中心运营](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=kzRR8OjF_eI&t=2s) `RICOH` `2020`\n25. [WIDeText：多模态深度学习框架](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fwidetext-a-multimodal-deep-learning-framework-31ce2565880c) `Airbnb` `2020`\n26. [Dynaboard：超越准确率，实现NLP中模型的全面评估](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fdynaboard-moving-beyond-accuracy-to-holistic-model-evaluation-in-nlp) ([代码](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdynalab?fbclid=IwAR3qcV7QK2uXm4s4M0XUoQQo4i2DEsDy0LZFKxSQCHhP-3hF6fr2-NDFWX8)) `Facebook` `2021`\n27. [我们如何将文本相似度运行时间缩短了99.96%](https:\u002F\u002Fmedium.com\u002Fdata-science-at-microsoft\u002Fhow-we-reduced-our-text-similarity-runtime-by-99-96-e8e4b4426b35) `微软` `2021`\n28. [无文本NLP：从原始音频生成富有表现力的语音](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Ftextless-nlp-generating-expressive-speech-from-raw-audio\u002F) [(第一部分)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.01192) [(第二部分)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.00355) [(第三部分)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.03264) [(代码和预训练模型)](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\u002Ftree\u002Fmaster\u002Fexamples\u002Ftextless_nlp) `Facebook` `2021`\n29. 
[Pixel 6上的边写边语法纠错](https:\u002F\u002Fai.googleblog.com\u002F2021\u002F10\u002Fgrammar-correction-as-you-type-on-pixel.html) `谷歌` `2021`\n30. [Google文档中的自动生成摘要](https:\u002F\u002Fai.googleblog.com\u002F2022\u002F03\u002Fauto-generated-summaries-in-google-docs.html) `谷歌` `2022`\n31. [ML增强的代码补全提升开发者生产力](https:\u002F\u002Fai.googleblog.com\u002F2022\u002F07\u002Fml-enhanced-code-completion-improves.html) `谷歌` `2022`\n32. [层层递进——会话情感分析](https:\u002F\u002Fmedium.com\u002Fpaypal-tech\u002Fwords-all-the-way-down-conversational-sentiment-analysis-afe0165b84db) `PayPal` `2022`\n\n## 序列建模\n1. [Doctor AI：通过循环神经网络预测临床事件](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.05942) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.05942.pdf)) `Sutter Health` `2015`\n2. [深度学习在理解消费者历史中的应用](https:\u002F\u002Fengineering.zalando.com\u002Fposts\u002F2016\u002F10\u002Fdeep-learning-for-understanding-consumer-histories.html) ([论文](https:\u002F\u002Fdoogkong.github.io\u002F2017\u002Fpapers\u002Fpaper2.pdf)) `Zalando` `2016`\n3. [利用循环神经网络模型早期检测心力衰竭的发作](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC5391725\u002F) ([论文](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC5391725\u002Fpdf\u002Focw112.pdf)) `Sutter Health` `2016`\n4. [结合传统与深度网络持续预测通知参与度](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.07120) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.07120.pdf)) `Telefonica` `2017` \n5. [深度学习在电子健康记录中的应用](https:\u002F\u002Fai.googleblog.com\u002F2018\u002F05\u002Fdeep-learning-for-electronic-health.html) ([论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-018-0029-1.pdf)) `Google` `2018`\n6. [基于长序列用户行为建模的点击率预测实践](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.09248) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.09248.pdf)) `Alibaba` `2019`\n7. [基于序列行为数据的搜索型用户兴趣建模用于CTR预测](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.05639) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.05639.pdf)) `Alibaba` `2020`\n8. 
[多邻国如何在其应用的各个部分使用AI](https:\u002F\u002Fventurebeat.com\u002F2020\u002F08\u002F18\u002Fhow-duolingo-uses-ai-in-every-part-of-its-app\u002F) `Duolingo` `2020`\n9. [利用在线社交互动提升Facebook平台的诚信度](https:\u002F\u002Fresearch.fb.com\u002Fblog\u002F2020\u002F08\u002Fleveraging-online-social-interactions-for-enhancing-integrity-at-facebook\u002F) ([论文](https:\u002F\u002Fresearch.fb.com\u002Fwp-content\u002Fuploads\u002F2020\u002F08\u002FTIES-Temporal-Interaction-Embeddings-For-Enhancing-Social-Media-Integrity-At-Facebook.pdf), [视频](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f3369780576dd25aef288cf\u002F)) `Facebook` `2020`\n10. [利用深度学习检测成员活动中的滥用序列](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2021\u002Fusing-deep-learning-to-detect-abusive-sequences-of-member-activi) ([视频](https:\u002F\u002Fexchange.scale.com\u002Fpublic\u002Fvideos\u002Fusing-deep-learning-to-detect-abusive-sequences-of-member-activity-on-linkedin)) `LinkedIn` `2021`\n\n## 计算机视觉\n1. [使用计算机视觉和深度学习构建现代 OCR 流程](https:\u002F\u002Fdropbox.tech\u002Fmachine-learning\u002Fcreating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning) `Dropbox` `2017`\n2. [在 Airbnb 中对房源照片进行分类](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fcategorizing-listing-photos-at-airbnb-f9483f3ab7e3) `Airbnb` `2018`\n3. [设施检测及更进一步——Airbnb 的计算机视觉新前沿](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Famenity-detection-and-beyond-new-frontiers-of-computer-vision-at-airbnb-144a4441b72e) `Airbnb` `2019`\n4. [仅通过清理标注错误，我们将计算机视觉指标提升了 5% 以上](https:\u002F\u002Fdeepomatic.com\u002Fen\u002Fhow-we-improved-computer-vision-metrics-by-more-than-5-percent-only-by-cleaning-labelling-errors\u002F) `Deepomatic`\n5. [利用音频和视频让机器识别并转录会议中的对话](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fblog\u002Fmaking-machines-recognize-and-transcribe-conversations-in-meetings-using-audio-and-video\u002F) `Microsoft` `2019`\n6. 
[由 AI 驱动：推进产品理解并打造全新购物体验](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fpowered-by-ai-advancing-product-understanding-and-building-new-shopping-experiences\u002F) `Facebook` `2020`\n7. [用于八小时降水预报的神经天气模型](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F03\u002Fa-neural-weather-model-for-eight-hour.html) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2003.12140.pdf)) `Google` `2020`\n8. [基于机器学习的灾害救援损伤评估](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F06\u002Fmachine-learning-based-damage.html) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.06444.pdf)) `Google` `2020`\n9. [RepNet：视频中重复动作计数](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F06\u002Frepnet-counting-repetitions-in-videos.html) ([论文](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2020\u002Fpapers\u002FDwibedi_Counting_Out_Time_Class_Agnostic_Video_Repetition_Counting_in_the_CVPR_2020_paper.pdf)) `Google` `2020`\n10. [将文本转换为图像以用于商品发现](https:\u002F\u002Fwww.amazon.science\u002Fblog\u002Fconverting-text-to-images-for-product-discovery) ([论文](https:\u002F\u002Fassets.amazon.science\u002F4c\u002F76\u002F5830542547b7a11089ce3af943b4\u002Fscipub-972.pdf)) `Amazon` `2020`\n11. [迪士尼如何使用 PyTorch 进行动画角色识别](https:\u002F\u002Fmedium.com\u002Fpytorch\u002Fhow-disney-uses-pytorch-for-animated-character-recognition-a1722a182627) `Disney` `2020`\n12. [图像字幕作为辅助技术](https:\u002F\u002Fwww.ibm.com\u002Fblogs\u002Fresearch\u002F2020\u002F07\u002Fimage-captioning-assistive-technology\u002F) ([视频](https:\u002F\u002Fivc.ischool.utexas.edu\u002F~yz9244\u002FVizWiz_workshop\u002Fvideos\u002FMMTeam-oral.mp4)) `IBM` `2020`\n13. [AI for AG：农业领域的生产型机器学习](https:\u002F\u002Fmedium.com\u002Fpytorch\u002Fai-for-ag-production-machine-learning-for-agriculture-e8cfdb9849a1) `Blue River` `2020`\n14. [特斯拉的完全自动驾驶 AI](https:\u002F\u002Fyoutu.be\u002Fhx7BXih7zx8?t=513) `Tesla` `2020`\n15. [设备端超市商品识别](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F07\u002Fon-device-supermarket-product.html) `Google` `2020`\n16. 
[利用机器学习检测结肠镜筛查中的覆盖不足](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F08\u002Fusing-machine-learning-to-detect.html) ([论文](https:\u002F\u002Fieeexplore.ieee.org\u002Fstamp\u002Fstamp.jsp?tp=&arnumber=9097918)) `Google` `2020`\n17. [“按图索骥”：在 Pinterest 上构建大规模视觉购物系统](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3394486.3403372) ([论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3394486.3403372), [视频](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f3369790576dd25aef288d7\u002F)) `Pinterest` `2020`\n18. [开发用于视频会议的实时自动手语检测系统](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F10\u002Fdeveloping-real-time-automatic-sign.html) ([论文](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F2eaf0d18ec6bef00d7dd88f39dd4f9ff13eeeeb2.pdf)) `Google` `2020`\n19. [基于视觉的在线二手商品价格建议](https:\u002F\u002Farxiv.org\u002Fabs\u002F2012.06009) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2012.06009.pdf)) `阿里巴巴` `2020`\n20. [新的 AI 研究助力从 X 光片预测 COVID-19 资源需求](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fnew-ai-research-to-help-predict-covid-19-resource-needs-from-a-series-of-x-rays\u002F) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2101.04909.pdf), [模型](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FCovidPrognosis)) `Facebook` `2021`\n21. [面向超大规模人脸识别的高效训练方法](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.10375) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2105.10375)) `阿里巴巴` `2021`\n22. [在 Scribd 中识别文档类型](https:\u002F\u002Ftech.scribd.com\u002Fblog\u002F2021\u002Fidentifying-document-types.html) `Scribd` `2021`\n23. [面向时尚搭配性的半监督视觉表征学习](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2109.08052.pdf) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2109.08052.pdf)) `沃尔玛` `2021`\n24. [通过私密的设备端机器学习识别人物照片](https:\u002F\u002Fmachinelearning.apple.com\u002Fresearch\u002Frecognizing-people-photos) `苹果` `2021`\n25. 
[DeepFusion：用于多模态 3D 物体检测的激光雷达-相机深度融合](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.08195.pdf) `谷歌` `2022`\n26. [通用时尚概念的对比语言与视觉学习](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-022-23052-9) ([论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-022-23052-9.pdf)) `Coveo` `2022`\n27. [利用计算机视觉优化搜索排名](https:\u002F\u002Farize.com\u002Fresource\u002Fbazaarvoice-leveraging-computer-vision-models-for-search-ranking\u002F) `BazaarVoice` `2023`\n\n## 强化学习\n1. [基于深度强化学习的赞助搜索实时竞价](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.00259) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.00259.pdf)) `阿里巴巴` `2018`\n2. [展示广告中无模型强化学习的预算约束出价](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.08365) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.08365.pdf)) `阿里巴巴` `2018`\n3. [按需物流中的强化学习](https:\u002F\u002Fdoordash.engineering\u002F2018\u002F09\u002F10\u002Freinforcement-learning-for-on-demand-logistics\u002F) `DoorDash` `2018`\n4. [电子商务搜索引擎中的排序强化学习](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.00710) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.00710.pdf)) `阿里巴巴` `2018`\n5. [基于深度强化学习的电商平台动态定价](https:\u002F\u002Farxiv.org\u002Fabs\u002F1912.02572) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1912.02572.pdf)) `阿里巴巴` `2019`\n6. [使用Spark和MLflow将深度强化学习投入生产](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=hy-w69zf4oo) `Zynga` `2020`\n7. [深度强化学习在生产中的应用 第1部分](https:\u002F\u002Ftowardsdatascience.com\u002Fdeep-reinforcement-learning-in-production-7e1e63471e2) [第2部分](https:\u002F\u002Ftowardsdatascience.com\u002Fdeep-reinforcement-learning-in-production-part-2-personalizing-user-notifications-812a68ce2355) `Zynga` `2020`\n8. [构建AI交易系统](https:\u002F\u002Fdennybritz.com\u002Fblog\u002Fai-trading\u002F) `Denny Britz` `2020`\n9. 
[通过强化学习引导用户消费向多样化内容转变](https:\u002F\u002Fresearch.atspotify.com\u002Fshifting-consumption-towards-diverse-content-via-reinforcement-learning\u002F) ([论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3437963.3441775)) `Spotify` `2022`\n10. [在线校准中的多臂老虎机：以社交媒体平台的内容审核为例](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.06516) `Meta` `2022`\n11. [如何使用级联多臂老虎机优化排名](https:\u002F\u002Fmedium.com\u002Fexpedia-group-tech\u002Fhow-to-optimise-rankings-with-cascade-bandits-5d92dfa0f16b) `Expedia` `2022`\n12. [利用探索与机器学习为每位商家选择最佳图片](https:\u002F\u002Fdoordash.engineering\u002F2023\u002F01\u002F04\u002Fselecting-the-best-image-for-each-merchant-using-exploration-and-machine-learning\u002F) `DoorDash` `2023`\n\n## 异常检测\n1. [检测外部固件部署中的性能异常](https:\u002F\u002Fnetflixtechblog.com\u002Fdetecting-performance-anomalies-in-external-firmware-deployments-ed41b1bfcf46) `Netflix` `2019`\n2. [使用孤立森林检测并预防LinkedIn上的滥用行为](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2019\u002Fisolation-forest) ([代码](https:\u002F\u002Fgithub.com\u002Flinkedin\u002Fisolation-forest)) `LinkedIn` `2019`\n3. [结合Spark和TensorFlow的深度异常检测](https:\u002F\u002Fdatabricks.com\u002Fsession_eu19\u002Fdeep-anomaly-detection-from-research-to-production-leveraging-spark-and-tensorflow) [(Hopsworks视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=TgXVU8DSyCQ)) `Swedbank`、`Hopsworks` `2019`\n4. [利用无监督学习预防滥用](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=sFRrFWYNAUI) `LinkedIn` `2020`\n5. [LinkedIn上打击骚扰背后的技术](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Ffighting-harassment) `LinkedIn` `2020`\n6. [利用网络学习揭露保险欺诈阴谋](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.12789) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2002.12789.pdf)) `蚂蚁金服` `2020`\n7. [Stack Exchange上的垃圾信息防护机制是如何工作的？](https:\u002F\u002Fstackoverflow.blog\u002F2020\u002F06\u002F25\u002Fhow-does-spam-protection-work-on-stack-exchange\u002F) `Stack Exchange` `2020`\n8. 
[C2C电商中的自动内容审核](https:\u002F\u002Fwww.usenix.org\u002Fconference\u002Fopml20\u002Fpresentation\u002Fueta) `Mercari` `2020`\n9. [利用机器学习阻止Slack邀请垃圾邮件](https:\u002F\u002Fslack.engineering\u002Fblocking-slack-invite-spam-with-machine-learning\u002F) `Slack` `2020`\n10. [Cloudflare机器人管理：机器学习及其他技术](https:\u002F\u002Fblog.cloudflare.com\u002Fcloudflare-bot-management-machine-learning-and-more\u002F) `Cloudflare` `2020`\n11. [隧道掘进机中油温变化的异常情况](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=YV_uLLhPRAk) `SENER` `2020`\n12. [利用异常检测监控低风险银行客户](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=MExokMM_Bp4&t=3s) `Rabobank` `2020`\n13. [使用三元组损失打击欺诈](https:\u002F\u002Ftech.olx.com\u002Ffighting-fraud-with-triplet-loss-86e5f79c7a3e) `OLX Group` `2020`\n14. [Facebook现在正使用AI对内容进行分类，以加快审核速度](https:\u002F\u002Fwww.theverge.com\u002F2020\u002F11\u002F13\u002F21562596\u002Ffacebook-ai-moderation) ([替代方案](https:\u002F\u002Fventurebeat.com\u002F2020\u002F11\u002F13\u002Ffacebooks-redoubled-ai-efforts-wont-stop-the-spread-of-harmful-content\u002F)) `Facebook` `2020`\n15. AI在仇恨言论检测方面的进步 [第1部分](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fhow-ai-is-getting-better-at-detecting-hate-speech\u002F)、[第2部分](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fheres-how-were-using-ai-to-help-detect-misinformation\u002F)、[第3部分](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Ftraining-ai-to-detect-hate-speech-in-the-real-world\u002F)、[第4部分](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fhow-facebook-uses-super-efficient-ai-models-to-detect-hate-speech\u002F) `Facebook` `2020`\n16. [利用深度学习检测会员活动中的辱骂性序列](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2021\u002Fusing-deep-learning-to-detect-abusive-sequences-of-member-activi) ([视频](https:\u002F\u002Fexchange.scale.com\u002Fpublic\u002Fvideos\u002Fusing-deep-learning-to-detect-abusive-sequences-of-member-activity-on-linkedin)) `LinkedIn` `2021`\n17. 
[项目RADAR：带有人工参与的智能早期欺诈检测系统](https:\u002F\u002Feng.uber.com\u002Fproject-radar-intelligent-early-fraud-detection\u002F) `Uber` `2022`\n18. [用于欺诈检测的图结构](https:\u002F\u002Fengineering.grab.com\u002Fgraph-for-fraud-detection) `Grab` `2022`\n19. [在线校准中的多臂老虎机：以社交媒体平台的内容审核为例](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.06516) `Meta` `2022`\n20. [不断进化我们的机器学习模型以拦截移动机器人](https:\u002F\u002Fblog.cloudflare.com\u002Fmachine-learning-mobile-traffic-bots\u002F) `Cloudflare` `2022`\n21. [通过数据增强和采样提高我们机器学习WAF的准确性](https:\u002F\u002Fblog.cloudflare.com\u002Fdata-generation-and-sampling-strategies\u002F) `Cloudflare` `2022`\n22. [流媒体服务中的机器学习欺诈检测](https:\u002F\u002Fnetflixtechblog.com\u002Fmachine-learning-for-fraud-detection-in-streaming-services-b0b4ef3be3f6) `Netflix` `2022`\n23. [Lyft的定价策略](https:\u002F\u002Feng.lyft.com\u002Fpricing-at-lyft-8a4022065f8b) `Lyft` `2022`\n\n## 图\n1. [构建 LinkedIn 知识图谱](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2016\u002F10\u002Fbuilding-the-linkedin-knowledge-graph) `LinkedIn` `2016`\n2. [在 Airbnb 扩展知识访问与检索能力](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fscaling-knowledge-access-and-retrieval-at-airbnb-665b6ba21e95) `Airbnb` `2018`\n3. [用于 Web 规模推荐系统的图卷积神经网络](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.01973) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1806.01973.pdf)) `Pinterest` `2018`\n4. [Uber Eats 的美食发现：利用图学习驱动推荐](https:\u002F\u002Feng.uber.com\u002Fuber-eats-graph-learning\u002F) `Uber` `2019`\n5. [AliGraph：一个全面的图神经网络平台](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.08730) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1902.08730.pdf)) `Alibaba` `2019`\n6. [通过构建知识图谱实现 Airbnb 的情境化](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fcontextualizing-airbnb-by-building-knowledge-graph-b7077e268d5a) `Airbnb` `2019`\n7. [零售图——沃尔玛的产品知识图谱](https:\u002F\u002Fmedium.com\u002Fwalmartlabs\u002Fretail-graph-walmarts-product-knowledge-graph-6ef7357963bc) `Walmart` `2020`\n8. 
[利用先进的图神经网络进行交通预测](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Ftraffic-prediction-with-advanced-graph-neural-networks) `DeepMind` `2020`\n9. [SimClusters：基于社区的推荐表示](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3394486.3403370) ([论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3394486.3403370), [视频](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f3369790576dd25aef288d5\u002F)) `Twitter` `2020`\n10. [元路径引导的邻居聚合网络用于异构图推理](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.06474) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.06474.pdf)) `Alibaba` `2021`\n11. [用于赞助搜索点击率预测的图意图网络](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.16164) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.16164.pdf)) `Alibaba` `2021`\n12. [JEL：在摩根大通应用端到端神经实体链接技术](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F17796) ([论文](https:\u002F\u002Fwww.aaai.org\u002FAAAI21Papers\u002FIAAI-21.DingW.pdf)) `摩根大通` `2021`\n13. [AWS 如何利用图神经网络满足客户需求](https:\u002F\u002Fwww.amazon.science\u002Fblog\u002Fhow-aws-uses-graph-neural-networks-to-meet-customer-needs) `Amazon` `2022`\n14. [用于欺诈检测的图](https:\u002F\u002Fengineering.grab.com\u002Fgraph-for-fraud-detection) `Grab` `2022`\n\n## 优化\n1. [Lyft Line 中的匹配机制（第 1 部分）](https:\u002F\u002Feng.lyft.com\u002Fmatchmaking-in-lyft-line-9c2635fe62c4) [(第 2 部分)](https:\u002F\u002Feng.lyft.com\u002Fmatchmaking-in-lyft-line-691a1a32a008) [(第 3 部分)](https:\u002F\u002Feng.lyft.com\u002Fmatchmaking-in-lyft-line-part-3-d8f9497c0e51) `Lyft` `2016`\n2. [GrabShare 拼车背后的数据与科学](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F8259801) [(第 1 部分)](https:\u002F\u002Fengineering.grab.com\u002Fthe-data-and-science-behind-grabshare-part-i) (**需要论文**)`Grab` `2017`\n3. [Uber Eats 中如何利用行程推断和机器学习优化配送时间](https:\u002F\u002Feng.uber.com\u002Fuber-eats-trip-optimization\u002F) `Uber` `2018`\n4. 
[DoorDash 的下一代骑手调度优化](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F02\u002F28\u002Fnext-generation-optimization-for-dasher-dispatch-at-doordash\u002F) `DoorDash` `2020`\n5. [利用机器学习优化电梯乘客等待时间](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=vXndCC89BCw&t=4s) `Thyssen Krupp AG` `2020`\n6. [跳出固有思维：为电商包裹推荐包装类型](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fthink-out-of-the-package-recommending-package-types-for-e-commerce-shipments) ([论文](https:\u002F\u002Fassets.amazon.science\u002F0c\u002F6c\u002F9d0986b94bef92d148f0ac0da1ea\u002Fthink-out-of-the-package-recommending-package-types-for-e-commerce-shipments.pdf)) `Amazon` `2020`\n7. [利用机器学习优化 DoorDash 的营销支出](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F07\u002F31\u002Foptimizing-marketing-spend-with-ml\u002F) `DoorDash` `2020`\n8. [使用排序学习精准定位包裹投递地点](https:\u002F\u002Fwww.amazon.science\u002Fblog\u002Fusing-learning-to-rank-to-precisely-locate-where-to-deliver-packages) ([论文](https:\u002F\u002Fassets.amazon.science\u002F69\u002F8d\u002F2249945a4e10ba8fc758f7523b0c\u002Fgetting-your-package-to-the-right-place-supervised-machine-learning-for-geolocation.pdf)) `Amazon` `2021`\n\n## 信息抽取\n1. [从产品描述中无监督地提取属性及其值](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FI13-1190\u002F) ([论文](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FI13-1190.pdf)) `Rakuten` `2013`\n2. [利用机器学习索引数十亿张图片中的文本](https:\u002F\u002Fdropbox.tech\u002Fmachine-learning\u002Fusing-machine-learning-to-index-text-from-billions-of-images) `Dropbox` `2018`\n3. [从模板化文档中提取结构化数据](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F06\u002Fextracting-structured-data-from.html) ([论文](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FI13-1190.pdf)) `Google` `2020`\n4. 
[AutoKnow：面向数千种产品的自动驾驶知识采集系统](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fautoknow-self-driving-knowledge-collection-for-products-of-thousands-of-types) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.13473.pdf), [视频](https:\u002F\u002Fcrossminds.ai\u002Fvideo\u002F5f3369730576dd25aef288a6\u002F)) `Amazon` `2020`\n5. [基于注意力机制和信念传播的一次性文本标注用于信息抽取](https:\u002F\u002Farxiv.org\u002Fabs\u002F2009.04153) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2009.04153.pdf)) `Alibaba` `2020`\n6. [利用图卷积网络从收据中提取信息](https:\u002F\u002Fnanonets.com\u002Fblog\u002Finformation-extraction-graph-convolutional-networks\u002F) `Nanonets` `2021`\n\n## 弱监督\n1. [Snorkel DryBell：工业级弱监督部署案例研究](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3299869.3314036) ([论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3299869.3314036)) `Google` `2019`\n2. [Osprey：无需代码的不平衡抽取问题弱监督](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3329486.3329492) ([论文](https:\u002F\u002Fajratner.github.io\u002Fassets\u002Fpapers\u002FOsprey_DEEM.pdf)) `Intel` `2019`\n3. [Overton：用于监控和改进机器学习产品的数据系统](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.05372) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.05372.pdf)) `Apple` `2019`\n4. [利用弱监督自举对话式智能体](https:\u002F\u002Fwww.aaai.org\u002Fojs\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F5011) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1812.06176.pdf)) `IBM` `2019`\n\n## 生成模型\n1. [更好的语言模型及其影响](https:\u002F\u002Fopenai.com\u002Fblog\u002Fbetter-language-models\u002F) ([论文](https:\u002F\u002Fcdn.openai.com\u002Fbetter-language-models\u002Flanguage_models_are_unsupervised_multitask_learners.pdf)) `OpenAI` `2019`\n2. [Image GPT](https:\u002F\u002Fopenai.com\u002Fblog\u002Fimage-gpt\u002F) ([论文](https:\u002F\u002Fcdn.openai.com\u002Fpapers\u002FGenerative_Pretraining_from_Pixels_V2.pdf), [代码](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fimage-gpt)) `OpenAI` `2019`\n3. 
[语言模型是少样本学习者](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.14165) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.14165.pdf)) ([GPT-3 博客文章](https:\u002F\u002Fopenai.com\u002Fblog\u002Fopenai-api\u002F)) `OpenAI` `2020`\n4. [用于特效电影制作的深度学习超分辨率](https:\u002F\u002Fgraphics.pixar.com\u002Flibrary\u002FSuperResolution\u002F) ([论文](https:\u002F\u002Fgraphics.pixar.com\u002Flibrary\u002FSuperResolution\u002Fpaper.pdf)) `Pixar` `2020`\n5. [基于 Transformer 的单元测试用例生成](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2009.05617.pdf) `Microsoft` `2021`\n\n## 音频\n1. [使用 VoiceFilter-Lite 改进设备端语音识别](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F11\u002Fimproving-on-device-speech-recognition.html) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2009.04323.pdf)) `Google` `2020`\n2. [“哼唱搜索”背后的机器学习技术](https:\u002F\u002Fai.googleblog.com\u002F2020\u002F11\u002Fthe-machine-learning-behind-hum-to.html) `Google` `2020`\n\n## 隐私保护型机器学习\n1. [联邦学习：无需集中式训练数据的协作式机器学习](https:\u002F\u002Fai.googleblog.com\u002F2017\u002F04\u002Ffederated-learning-collaborative.html) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1602.05629)) `Google` `2017`\n2. [具有形式化差分隐私保证的联邦学习](https:\u002F\u002Fai.googleblog.com\u002F2022\u002F02\u002Ffederated-learning-with-formal.html) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.00039)) `Google` `2022`\n3. [基于 MPC 的机器学习：实现端到端的隐私保护型机器学习](https:\u002F\u002Fresearch.facebook.com\u002Fblog\u002F2022\u002F10\u002Fmpc-based-machine-learning-achieving-end-to-end-privacy-preserving-machine-learning\u002F) ([论文](https:\u002F\u002Fresearch.facebook.com\u002Ffile\u002F455681589729383\u002FPrivate-Computation-Framework-2.0-White-Paper.pdf)) `Facebook` `2022`\n\n## 验证与 A\u002FB 测试\n1. [重叠实验基础设施：更多、更好、更快的实验](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub36500\u002F) ([论文](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F36500.pdf)) `Google` `2010`\n2. 
[可重复使用的保留集：在自适应数据分析中保持有效性](https:\u002F\u002Fai.googleblog.com\u002F2015\u002F08\u002Fthe-reusable-holdout-preserving.html) ([论文](https:\u002F\u002Fscience.sciencemag.org\u002Fcontent\u002Fsci\u002F349\u002F6248\u002F636.full.pdf)) `Google` `2015`\n3. [Twitter 实验平台技术概述](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Fa\u002F2015\u002Ftwitter-experimentation-technical-overview.html) `Twitter` `2015`\n4. [一切皆是 A\u002FB 测试：Netflix 实验平台](https:\u002F\u002Fnetflixtechblog.com\u002Fits-all-a-bout-testing-the-netflix-experimentation-platform-4e1ca458c15) `Netflix` `2016`\n5. [构建 Pinterest 的 A\u002FB 测试平台](https:\u002F\u002Fmedium.com\u002Fpinterest-engineering\u002Fbuilding-pinterests-a-b-testing-platform-ab4934ace9f4) `Pinterest` `2016`\n6. [通过实验解决信息过载问题](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2017\u002FExperimenting-To-Solve-Cramming.html) `Twitter` `2017`\n7. [利用 Uber 工程团队构建智能实验平台](https:\u002F\u002Feng.uber.com\u002Fexperimentation-platform\u002F) `Uber` `2017`\n8. [扩展 Airbnb 的实验平台](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fhttps-medium-com-jonathan-parks-scaling-erf-23fd17c91166) `Airbnb` `2017`\n9. [认识 Wasabi：一款开源 A\u002FB 测试平台](https:\u002F\u002Fwww.intuit.com\u002Fblog\u002Ftechnology\u002Fengineering\u002Fmeet-wasabi-an-open-source-ab-testing-platform\u002F) ([代码](https:\u002F\u002Fgithub.com\u002Fintuit\u002Fwasabi)) `Intuit` `2017`\n10. [分析实验结果：超越平均处理效应](https:\u002F\u002Feng.uber.com\u002Fanalyzing-experiment-outcomes\u002F) `Uber` `2018`\n11. [Uber 实验平台揭秘](https:\u002F\u002Feng.uber.com\u002Fxp\u002F) `Uber` `2018`\n12. [带噪声实验的约束贝叶斯优化](https:\u002F\u002Fresearch.fb.com\u002Fpublications\u002Fconstrained-bayesian-optimization-with-noisy-experiments\u002F) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1706.07094.pdf)) `Facebook` `2018`\n13. [Grab 的可靠且可扩展的功能开关与 A\u002FB 测试 SDK](https:\u002F\u002Fengineering.grab.com\u002Ffeature-toggles-ab-testing) `Grab` `2018`\n14. 
[使用 Kaplan-Meier 和伽玛分布建模转化率并节省数百万美元](https:\u002F\u002Fbetter.engineering\u002Fmodeling-conversion-rates-and-saving-millions-of-dollars-using-kaplan-meier-and-gamma-distributions\u002F) ([代码](https:\u002F\u002Fgithub.com\u002Fbetter\u002Fconvoys)) `Better` `2019`\n15. [检测干扰：一次针对 A\u002FB 测试本身的 A\u002FB 测试](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2019\u002F06\u002Fdetecting-interference--an-a-b-test-of-a-b-tests) `LinkedIn` `2019`\n16. [宣布使用 Pyro 设计最优实验的新框架](https:\u002F\u002Feng.uber.com\u002Foed-pyro-release\u002F) ([论文](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F9553-variational-bayesian-optimal-experimental-design.pdf)) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1911.00294.pdf)) `Uber` `2020`\n17. [借助 Traveloka 实验平台实现实验数量提升 10 倍](https:\u002F\u002Fmedium.com\u002Ftraveloka-engineering\u002Fenabling-10x-more-experiments-with-traveloka-experiment-platform-8cea13e952c) `Traveloka` `2020`\n18. [Stitch Fix 的大规模实验](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2020\u002F07\u002F07\u002Flarge-scale-experimentation\u002F) ([论文](http:\u002F\u002Fproceedings.mlr.press\u002Fv89\u002Fschmit19a\u002Fschmit19a.pdf)) `Stitch Fix` `2020`\n19. [多臂老虎机与 Stitch Fix 实验平台](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2020\u002F08\u002F05\u002Fbandits\u002F) `Stitch Fix` `2020`\n20. [资源受限条件下的实验](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2020\u002F11\u002F18\u002Fvirtual-warehouse\u002F) `Stitch Fix` `2020`\n21. [Netflix 的计算因果推断](https:\u002F\u002Fnetflixtechblog.com\u002Fcomputational-causal-inference-at-netflix-293591691c62) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.10979.pdf)) `Netflix` `2020`\n22. [Netflix 中准实验的关键挑战](https:\u002F\u002Fnetflixtechblog.com\u002Fkey-challenges-with-quasi-experiments-at-netflix-89b4f234b852) `Netflix` `2020`\n23. 
[使 LinkedIn 实验引擎速度提升 20 倍](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fmaking-the-linkedin-experimentation-engine-20x-faster) `LinkedIn` `2020`\n24. [我们通往 T-REX 的演进历程：LinkedIn 实验基础设施的前史](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Four-evolution-towards-t-rex--the-prehistory-of-experimentation-i) `LinkedIn` `2020`\n25. [如何利用准实验和反事实构建优质产品](https:\u002F\u002Fengineering.shopify.com\u002Fblogs\u002Fengineering\u002Fusing-quasi-experiments-counterfactuals) `Shopify` `2020`\n26. [通过将预测作为协变量来提高实验效力](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F06\u002F08\u002Fimproving-experimental-power-through-control-using-predictions-as-covariate-cupac\u002F) `DoorDash` `2020`\n27. [借助实验分析平台支持快速产品迭代](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F09\u002F09\u002Fexperimentation-analysis-platform-mvp\u002F) `DoorDash` `2020`\n28. [通过并行化和提高灵敏度，将在线实验容量提升 4 倍](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F10\u002F07\u002Fimproving-experiment-capacity-by-4x\u002F) `DoorDash` `2020`\n29. [利用因果建模从平淡的实验结果中获得更多价值](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F09\u002F18\u002Fcausal-modeling-to-get-more-value-from-flat-experiment-results\u002F) `DoorDash` `2020`\n30. [通过实验迭代实时分配算法](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F12\u002F08\u002Foptimizing-real-time-algorithms-experimentation\u002F) `DoorDash` `2020`\n31. [Spotify 新实验平台（第 1 部分）](https:\u002F\u002Fengineering.atspotify.com\u002F2020\u002F10\u002F29\u002Fspotifys-new-experimentation-platform-part-1\u002F) [(第 2 部分)](https:\u002F\u002Fengineering.atspotify.com\u002F2020\u002F11\u002F02\u002Fspotifys-new-experimentation-platform-part-2\u002F) `Spotify` `2020`\n32. [解读 A\u002FB 测试结果：假阳性与统计显著性](https:\u002F\u002Fnetflixtechblog.com\u002Finterpreting-a-b-test-results-false-positives-and-statistical-significance-c1522d0db27a) `Netflix` `2021`\n33. 
[解读 A\u002FB 测试结果：假阴性与检验效能](https:\u002F\u002Fnetflixtechblog.com\u002Finterpreting-a-b-test-results-false-negatives-and-power-6943995cf3a8) `Netflix` `2021`\n34. [使用 Google AdWords 进行实验以优化广告活动](https:\u002F\u002Fdoordash.engineering\u002F2021\u002F02\u002F05\u002Fgoogle-adwords-campaign-optimization\u002F) `DoorDash` `2021`\n35. [DoorDash 用于将其物流实验能力提升 1000% 的四大原则](https:\u002F\u002Fdoordash.engineering\u002F2021\u002F09\u002F21\u002Fthe-4-principles-doordash-used-to-increase-its-logistics-experiment-capacity-by-1000\u002F) `DoorDash` `2021`\n36. [Zalando 的实验平台：第一部分——演进历程](https:\u002F\u002Fengineering.zalando.com\u002Fposts\u002F2021\u002F01\u002Fexperimentation-platform-part1.html) `Zalando` `2021`\n37. [设计实验护栏](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fdesigning-experimentation-guardrails-ed6a976ec669) `Airbnb` `2021`\n38. [Airbnb 如何衡量未来价值以标准化权衡取舍](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fhow-airbnb-measures-future-value-to-standardize-tradeoffs-3aa99a941ba5) `Airbnb` `2021`\n38. [大规模网络实验](https:\u002F\u002Fresearch.fb.com\u002Fpublications\u002Fnetwork-experimentation-at-scale\u002F) ([论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2012.08591)) `Facebook` `2021`\n39. [迪士尼流媒体中的通用对照组](https:\u002F\u002Fmedium.com\u002Fdisney-streaming\u002Funiversal-holdout-groups-at-disney-streaming-2043360def4f) `Disney` `2021`\n40. [实验是 Netflix 整体数据科学的核心重点](https:\u002F\u002Fnetflixtechblog.com\u002Fexperimentation-is-a-major-focus-of-data-science-across-netflix-f67923f8e985) `Netflix` `2022`\n41. [迈向更佳实验实践的探索之旅](https:\u002F\u002Fengineering.atspotify.com\u002F2022\u002F02\u002Fsearch-journey-towards-better-experimentation-practices\u002F) `Spotify` `2022`\n42. [人工反事实估计：基于机器学习的 Airbnb 因果推断](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fartificial-counterfactual-estimation-ace-machine-learning-based-causal-inference-at-airbnb-ee32ee4d0512) `Airbnb` `2022`\n43. 
[超越 A\u002FB 测试：通过交错排序加速 Airbnb 搜索排名实验](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fbeyond-a-b-test-speeding-up-airbnb-search-ranking-experimentation-through-interleaving-7087afa09c8e) `Airbnb` `2022`\n44. [实验面临的挑战](https:\u002F\u002Feng.lyft.com\u002Fchallenges-in-experimentation-be9ab98a7ef4) `Lyft` `2022`\n45. [过度跟踪与触发分析：在提高灵敏度的同时减少样本量](https:\u002F\u002Fbooking.ai\u002Fovertracking-and-trigger-analysis-how-to-reduce-sample-sizes-and-increase-the-sensitivity-of-71755bad0e5f) `Booking` `2022`\n46. [认识 Dash-AB——DoorDash 实验的统计引擎](https:\u002F\u002Fdoordash.engineering\u002F2022\u002F05\u002F24\u002Fmeet-dash-ab-the-statistics-engine-of-experimentation-at-doordash\u002F) `DoorDash` `2022`\n47. [在在线 A\u002FB 测试中大规模比较分位数](https:\u002F\u002Fengineering.atspotify.com\u002F2022\u002F03\u002Fcomparing-quantiles-at-scale-in-online-a-b-testing) `Spotify` `2022`\n48. [利用机器学习加速我们的 A\u002FB 实验](https:\u002F\u002Fdropbox.tech\u002Fmachine-learning\u002Faccelerating-our-a-b-experiments-with-machine-learning-xr) `Dropbox` `2023`\n49. [为 Uber 的 A\u002FB 测试注入强劲动力](https:\u002F\u002Fwww.uber.com\u002Fblog\u002Fsupercharging-a-b-testing-at-uber\u002F) `Uber`\n\n## 模型管理\n1. [机器学习的工程化——从原始数据到预测的溯源管理](https:\u002F\u002Fvimeo.com\u002F274396495) `Comcast` `2018`\n2. [Overton：用于监控和改进机器学习产品的数据系统](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.05372)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.05372.pdf)）`Apple` `2019`\n3. [Runway - Netflix 的模型生命周期管理](https:\u002F\u002Fwww.usenix.org\u002Fconference\u002Fopml20\u002Fpresentation\u002Fcepoi) `Netflix` `2020`\n4. [大规模下的机器学习模型管理——Intuit 的机器学习平台](https:\u002F\u002Fwww.usenix.org\u002Fconference\u002Fopml20\u002Fpresentation\u002Fwenzel) `Intuit` `2020`\n5. [机器学习模型监控——来自一线的 9 条建议](https:\u002F\u002Fbuilding.nubank.com.br\u002Fml-model-monitoring-9-tips-from-the-trenches\u002F) `Nubank` `2021`\n6. 
[实时机器学习模型中的训练-服务偏移问题处理：简明指南](https:\u002F\u002Fbuilding.nubank.com.br\u002Fdealing-with-train-serve-skew-in-real-time-ml-models-a-short-guide\u002F) `Nubank` `2023`\n\n## 效率\n1. [GrokNet：面向电商的统一计算机视觉模型主干与嵌入](https:\u002F\u002Fai.facebook.com\u002Fresearch\u002Fpublications\u002Fgroknet-unified-computer-vision-model-trunk-and-embeddings-for-commerce\u002F)（[论文](https:\u002F\u002Fscontent-sea1-1.xx.fbcdn.net\u002Fv\u002Ft39.8562-6\u002F99353320_565175057533429_3886205100842024960_n.pdf?_nc_cat=110&_nc_sid=ae5e01&_nc_ohc=WQBaZy1gnmUAX8Ecqtt&_nc_ht=scontent-sea1-1.xx&oh=cab2f11dd9154d817149cb73e8b692a8&oe=5F5A3778)) `Facebook` `2020`\n2. [我们如何将 BERT 扩展到在 CPU 上服务每日超过 10 亿次请求](https:\u002F\u002Fblog.roblox.com\u002F2020\u002F05\u002Fscaled-bert-serve-1-billion-daily-requests-cpus\u002F) `Roblox` `2020`\n3. [置换、量化与微调：神经网络的高效压缩](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.15703)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.15703.pdf)）`Uber` `2021`\n4. [Pinterest 的 GPU 加速机器学习推理](https:\u002F\u002Fmedium.com\u002F@Pinterest_Engineering\u002Fgpu-accelerated-ml-inference-at-pinterest-ad1b6a03a16d) `Pinterest` `2022`\n\n## 伦理\n1. [通过 A\u002FB 测试构建包容性产品](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Fbuilding-inclusive-products-through-a-b-testing)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2002.05819.pdf)）`LinkedIn` `2020`\n2. [LiFT：衡量机器学习应用公平性的可扩展框架](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2020\u002Flift-addressing-bias-in-large-scale-ai-applications)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.07433.pdf)）`LinkedIn` `2020`\n3. [推出 Twitter 首个算法偏见赏金挑战](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2021\u002Falgorithmic-bias-bounty-challenge) `Twitter` `2021`\n4. [考察 Twitter 上政治内容的算法放大效应](https:\u002F\u002Fblog.twitter.com\u002Fen_us\u002Ftopics\u002Fcompany\u002F2021\u002Frml-politicalcontent) `Twitter` `2021`\n5. 
[深入探讨 LinkedIn 如何将其公平性理念融入 AI 产品中](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2022\u002Fa-closer-look-at-how-linkedin-integrates-fairness-into-its-ai-pr) `LinkedIn` `2022`\n\n## 基础设施\n1. [为互操作性重构 Facebook AI 的深度学习平台](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Freengineering-facebook-ais-deep-learning-platforms-for-interoperability) `Facebook` `2020`\n2. [使用 Ray 在 XGBoost 上进行弹性分布式训练](https:\u002F\u002Feng.uber.com\u002Felastic-xgboost-ray\u002F) `Uber` `2021`\n\n## MLOps 平台\n1. [认识米开朗基罗：Uber 的机器学习平台](https:\u002F\u002Feng.uber.com\u002Fmichelangelo-machine-learning-platform\u002F) `Uber` `2017`\n2. [机器学习的落地实践——从原始数据到预测的全过程管理](https:\u002F\u002Fvimeo.com\u002F274396495) `Comcast` `2018`\n3. [Pinterest 的大数据机器学习平台](https:\u002F\u002Fwww.slideshare.net\u002FAlluxio\u002Fpinterest-big-data-machine-learning-platform-at-pinterest) `Pinterest` `2019`\n4. [Instagram 的核心模型构建](https:\u002F\u002Finstagram-engineering.com\u002Fcore-modeling-at-instagram-a51e0158aa48) `Instagram` `2019`\n5. [开源 Metaflow——以人为本的数据科学框架](https:\u002F\u002Fnetflixtechblog.com\u002Fopen-sourcing-metaflow-a-human-centric-framework-for-data-science-fa72e04a5d9) `Netflix` `2019`\n6. [大规模 ML 模型管理——Intuit 的 ML 平台](https:\u002F\u002Fwww.usenix.org\u002Fconference\u002Fopml20\u002Fpresentation\u002Fwenzel) `Intuit` `2020`\n7. [Zomato 的实时机器学习推理平台](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0-3ES1vzW14) `Zomato` `2020`\n8. [推出 Flyte：云原生机器学习与数据处理平台](https:\u002F\u002Feng.lyft.com\u002Fintroducing-flyte-cloud-native-machine-learning-and-data-processing-platform-fb2bb3046a59) `Lyft` `2020`\n9. [利用计算图构建灵活的集成 ML 模型](https:\u002F\u002Fdoordash.engineering\u002F2021\u002F01\u002F26\u002Fcomputational-graph-machine-learning-ensemble-model-support\u002F) `DoorDash` `2021`\n10. [LyftLearn：基于 Kubernetes 构建的 ML 模型训练基础设施](https:\u002F\u002Feng.lyft.com\u002Flyftlearn-ml-model-training-infrastructure-built-on-kubernetes-aef8218842bb) `Lyft` `2021`\n11. 
[\"你不需要更大的船\"：用开源工具构建的完整数据流水线](https:\u002F\u002Fgithub.com\u002Fjacopotagliabue\u002Fyou-dont-need-a-bigger-boat)（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2107.07346)）`Coveo` `2021`\n12. [GreenSteam 的 MLOps：机器学习的交付与部署](https:\u002F\u002Fneptune.ai\u002Fblog\u002Fmlops-at-greensteam-shipping-machine-learning-case-study) `GreenSteam` `2021`\n13. [Reddit ML 模型部署与服务架构的演进](https:\u002F\u002Fwww.reddit.com\u002Fr\u002FRedditEng\u002Fcomments\u002Fq14tsw\u002Fevolving_reddits_ml_model_deployment_and_serving\u002F) `Reddit` `2021`\n14. [重新设计 Etsy 的机器学习平台](https:\u002F\u002Fwww.etsy.com\u002Fcodeascraft\u002Fredesigning-etsys-machine-learning-platform\u002F) `Etsy` `2021`\n15. [理解大规模深度推荐模型训练中的数据存储与摄取](https:\u002F\u002Farxiv.org\u002Fabs\u002F2108.09373)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2108.09373.pdf)）`Meta` `2021`\n15. [在 Etsy 上构建推荐服务的平台](https:\u002F\u002Fwww.etsy.com\u002Fcodeascraft\u002Fbuilding-a-platform-for-serving-recommendations-at-etsy) `Etsy` `2022` \n16. [智能自动化平台：赋能 Airbnb 的对话式 AI 及其应用](https:\u002F\u002Fmedium.com\u002Fairbnb-engineering\u002Fintelligent-automation-platform-empowering-conversational-ai-and-beyond-at-airbnb-869c44833ff2) `Airbnb` `2022`\n17. [DARWIN：LinkedIn 的数据科学与人工智能工作台](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2022\u002Fdarwin--data-science-and-artificial-intelligence-workbench-at-li) `LinkedIn` `2022`\n18. [梅林的魔力：Shopify 的全新机器学习平台](https:\u002F\u002Fshopify.engineering\u002Fmerlin-shopify-machine-learning-platform) `Shopify` `2022`\n19. [Zalando 的机器学习平台](https:\u002F\u002Fengineering.zalando.com\u002Fposts\u002F2022\u002F04\u002Fzalando-machine-learning-platform.html) `Zalando` `2022`\n20. [揭秘 Meta 全公司工程师使用的 AI 优化平台](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Flooper-meta-ai-optimization-platform-for-engineers\u002F)（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.07554.pdf)）`Meta` `2022`\n21. 
[Monzo 的机器学习技术栈](https:\u002F\u002Fmonzo.com\u002Fblog\u002F2022\u002F04\u002F26\u002Fmonzos-machine-learning-stack) `Monzo` `2022`\n22. [ML Fact Store 的演进](https:\u002F\u002Fnetflixtechblog.com\u002Fevolution-of-ml-fact-store-5941d3231762) `Netflix` `2022`\n23. [利用 MLOps 构建实时端到端机器学习流水线](https:\u002F\u002Fwww.binance.com\u002Fen\u002Fblog\u002Fall\u002Fusing-mlops-to-build-a-realtime-endtoend-machine-learning-pipeline-3820048062346322706) `Binance` `2022`\n24. [在 Zillow 高效地大规模部署机器学习模型](https:\u002F\u002Fwww.zillow.com\u002Ftech\u002Fserving-machine-learning-models-efficiently-at-scale-at-zillow\u002F) `Zillow` `2022`\n25. [Didact AI：一款基于 ML 的选股引擎剖析](https:\u002F\u002Fprincipiamundi.com\u002Fposts\u002Fdidact-anatomy\u002F?utm_campaign=Data_Elixir&utm_source=Data_Elixir_407\u002F) `Didact AI` `2022`\n26. [免费部署——Stitch Fix 数据科学家的机器学习平台](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2022\u002F07\u002F14\u002Fdeployment-for-free\u002F) `Stitch Fix` `2022`\n27. [机器学习运维（MLOps）：概述、定义与架构](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.02302)（[论文](https:\u002F\u002Farxiv.org\u002Fftp\u002Farxiv\u002Fpapers\u002F2205\u002F2205.02302.pdf)）`IBM` `2022`\n\n## 实践\n1. [基于梯度的深度架构训练实用建议](https:\u002F\u002Farxiv.org\u002Fabs\u002F1206.5533) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1206.5533.pdf)) `Yoshua Bengio` `2012`\n2. [机器学习：技术债务的高息信用卡](https:\u002F\u002Fresearch.google\u002Fpubs\u002Fpub43146\u002F) ([论文](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F43146.pdf)) ([论文](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F5656-hidden-technical-debt-in-machine-learning-systems.pdf)) `Google` `2014`\n3. [机器学习规则：ML工程的最佳实践](https:\u002F\u002Fdevelopers.google.com\u002Fmachine-learning\u002Fguides\u002Frules-of-ml) `Google` `2018`\n4. [机器学习模型管理中的挑战](http:\u002F\u002Fsites.computer.org\u002Fdebull\u002FA18dec\u002Fp5.pdf) `Amazon` `2018`\n5. 
[生产环境中的机器学习：Booking.com的方法](https:\u002F\u002Fbooking.ai\u002Fhttps-booking-ai-machine-learning-production-3ee8fe943c70) `Booking` `2019`\n6. [150个成功的机器学习模型：Booking.com的6点经验教训](https:\u002F\u002Fbooking.ai\u002F150-successful-machine-learning-models-6-lessons-learned-at-booking-com-681e09107bec) ([论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3292500.3330744)) `Booking` `2019`\n7. [全球性银行大规模采用机器学习的成功与挑战](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=QYQKG5OcwEI) `Rabobank` `2019`\n8. [部署机器学习的挑战：案例研究综述](https:\u002F\u002Farxiv.org\u002Fabs\u002F2011.09926) ([论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2011.09926.pdf)) `Cambridge` `2020`\n9. [重构Facebook AI的深度学习平台以实现互操作性](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Freengineering-facebook-ais-deep-learning-platforms-for-interoperability) `Facebook` `2020`\n10. [企业级AI开发者工具的问题](https:\u002F\u002Ftowardsdatascience.com\u002Fthe-problem-with-ai-developer-tools-for-enterprises-and-what-ikea-has-to-do-with-it-b26277841661) `Databricks` `2020`\n11. [面向在线推理与模型的机器学习持续集成与部署](https:\u002F\u002Feng.uber.com\u002Fcontinuous-integration-deployment-ml\u002F) `Uber` `2021`\n12. [模型性能调优](https:\u002F\u002Feng.uber.com\u002Ftuning-model-performance\u002F) `Uber` `2021`\n13. [通过监控维持机器学习模型精度](https:\u002F\u002Fdoordash.engineering\u002F2021\u002F05\u002F20\u002Fmonitor-machine-learning-model-drift\u002F) `DoorDash` `2021`\n14. [在Wayfair构建可扩展且高性能的营销ML系统](https:\u002F\u002Fwww.aboutwayfair.com\u002Fcareers\u002Ftech-blog\u002Fbuilding-scalable-and-performant-marketing-ml-systems-at-wayfair) `Wayfair` `2021`\n15. [我们构建透明且可解释AI系统的做法](https:\u002F\u002Fengineering.linkedin.com\u002Fblog\u002F2021\u002Ftransparent-and-explainable-AI-systems) `LinkedIn` `2021`\n16. [为企业构建机器学习模型的5个步骤](https:\u002F\u002Fshopify.engineering\u002Fbuilding-business-machine-learning-models) `Shopify` `2021`\n17. 
[数据是一门艺术，而不仅是科学——讲故事是关键](https:\u002F\u002Fshopifyengineering.myshopify.com\u002Fblogs\u002Fengineering\u002Fdata-storytelling-shopify) `Shopify` `2022`\n18. [实时机器学习最佳实践：警报机制](https:\u002F\u002Fbuilding.nubank.com.br\u002Fbest-practices-for-real-time-machine-learning-alerting\u002F) `Nubank` `2022`\n19. [机器学习模型的自动再训练：技巧与经验教训](https:\u002F\u002Fbuilding.nubank.com.br\u002Fautomatic-retraining-for-machine-learning-models\u002F) `Nubank` `2022`\n20. [RecSysOps：大规模推荐系统运维的最佳实践](https:\u002F\u002Fnetflixtechblog.medium.com\u002Frecsysops-best-practices-for-operating-a-large-scale-recommender-system-95bbe195a841) `Netflix` `2022`\n21. [Uber的ML教育：受工程原则启发的框架](https:\u002F\u002Fwww.uber.com\u002Fen-PL\u002Fblog\u002Fml-education-at-uber\u002F) `Uber` `2022`\n22. [为DS\u002FML团队构建和维护内部工具：经验教训](https:\u002F\u002Fbuilding.nubank.com.br\u002Fbuilding-and-maintaining-internal-tools-for-ds-ml-teams-lessons-learned) `Nubank` `2024`\n\n## 团队结构\n1. [构建数据科学团队最有效的方式是什么？](https:\u002F\u002Ftowardsdatascience.com\u002Fwhat-is-the-most-effective-way-to-structure-a-data-science-team-498041b88dae) `Udemy` `2017`\n1. [工程师不应编写ETL：构建高效数据科学部门指南](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2016\u002F03\u002F16\u002Fengineers-shouldnt-write-etl\u002F) `Stitch Fix` `2016`\n2. [在Wish构建分析团队](https:\u002F\u002Fmedium.com\u002Fwish-engineering\u002Fscaling-analytics-at-wish-619eacb97d16) `Wish` `2018`\n3. [警惕数据科学“图钉工厂”：全栈数据科学家通用型人才的力量](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2019\u002F03\u002F11\u002FFullStackDS-Generalists\u002F) `Stitch Fix` `2019`\n4. [培育算法：我们在Stitch Fix如何发展数据科学](https:\u002F\u002Fcultivating-algos.stitchfix.com) `Stitch Fix`\n5. [Netflix的分析部门：我们是谁，我们做什么](https:\u002F\u002Fnetflixtechblog.com\u002Fanalytics-at-netflix-who-we-are-and-what-we-do-7d9c08fe6965) `Netflix` `2020`\n6. [在一家中后期初创公司组建数据团队：一个短篇故事](https:\u002F\u002Ferikbern.com\u002F2021\u002F07\u002F07\u002Fthe-data-team-a-short-story.html) `Erikbern` `2021`\n7. 
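[A Behind-the-Scenes Look at How Postman's Data Team Works](https:\u002F\u002Fentrepreneurshandbook.co\u002Fa-behind-the-scenes-look-at-how-postmans-data-team-works-fded0b8bfc64) `Postman` `2021`\n8. [Data Scientist x Machine Learning Engineer Roles: How Are They Different? How Are They Alike?](https:\u002F\u002Fbuilding.nubank.com.br\u002Fdata-scientist-x-machine-learning-engineer-roles-how-are-they-different-how-are-they-alike\u002F) `Nubank` `2022`\n\n## Fails\n1. [When It Comes to Gorillas, Google Photos Remains Blind](https:\u002F\u002Fwww.wired.com\u002Fstory\u002Fwhen-it-comes-to-gorillas-google-photos-remains-blind\u002F) `Google` `2018`\n2. [160k+ High School Students Will Only Graduate If a Model Allows Them To](http:\u002F\u002Fpositivelysemidefinite.com\u002F2020\u002F06\u002F160k-students.html) `International Baccalaureate` `2020`\n3. [An Algorithm That 'Predicts' Criminality Based on a Face Sparks a Furor](https:\u002F\u002Fwww.wired.com\u002Fstory\u002Falgorithm-predicts-criminality-based-face-sparks-furor\u002F) `Harrisburg University` `2020`\n4. [It's Hard to Generate Neural Text From GPT-3 About Muslims](https:\u002F\u002Ftwitter.com\u002Fabidlabs\u002Fstatus\u002F1291165311329341440) `OpenAI` `2020`\n5. [A British AI Tool to Predict Violent Crime Is Too Flawed to Use](https:\u002F\u002Fwww.wired.co.uk\u002Farticle\u002Fpolice-violence-prediction-ndas) `United Kingdom` `2020`\n6. More in [awful-ai](https:\u002F\u002Fgithub.com\u002Fdaviddao\u002Fawful-ai)\n7. [AI Incident Database](https:\u002F\u002Fincidentdatabase.ai\u002F) `Partnership on AI` `2022`\n\n\u003Cbr>\n\n**P.S. Want a summary of ML advancements?** Get up to speed quickly with survey papers 👉[`ml-surveys`](https:\u002F\u002Fgithub.com\u002Feugeneyan\u002Fml-surveys)","# applied-ml Quick Start Guide\n\n**Note**: `applied-ml` is not a software library or framework that you install and run; it is a **curated list**. It collects papers, articles, and blog posts on applying data science and machine learning in production. This guide therefore covers how to obtain and use these resources rather than how to install software.\n\n## Prerequisites\n\nBecause the project is essentially documentation and a resource index, it has no special technical dependencies:\n\n*   **System requirements**: any operating system with a modern browser (Windows, macOS, Linux).\n*   **Dependencies**:\n    *   A stable internet connection (some original sources, such as the Google and Uber Engineering blogs, may require a proxy or VPN to reach from certain networks).\n    *   A GitHub account (optional, for starring the project or submitting contributions).\n*   **Language**: most of the original resources are in English, so basic English reading ability or a translation tool is recommended.\n\n## Getting and Browsing the List\n\nYou do not need to install anything from the command line; access the list in either of the following ways:\n\n### 1. 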
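Browse Online (Recommended)\nVisit the GitHub repository page to view the organized table of contents and links:\n*   **Repository**: [https:\u002F\u002Fgithub.com\u002Feugeneyan\u002Fapplied-ml](https:\u002F\u002Fgithub.com\u002Feugeneyan\u002Fapplied-ml)\n\n### 2. Clone Locally (Optional)\nIf you want to read or search the content offline, clone the repository:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Feugeneyan\u002Fapplied-ml.git\ncd applied-ml\n```\n\n*Speeding up access from mainland China*: if cloning is slow, use a Gitee mirror (if one exists) or configure a proxy for Git:\n```bash\n# Example: set a temporary HTTP proxy (replace with your own proxy address)\nexport http_proxy=http:\u002F\u002F127.0.0.1:7890\nexport https_proxy=http:\u002F\u002F127.0.0.1:7890\ngit clone https:\u002F\u002Fgithub.com\u002Feugeneyan\u002Fapplied-ml.git\n```\n\n## Basic Usage\n\nThe core value of `applied-ml` is helping you work out how to ship ML projects in production. The following approaches make efficient use of the list:\n\n### 1. Find Best Practices by Domain\nBased on the technical challenge you are facing, find the matching section in the Table of Contents of `README.md`. The list covers these core areas:\n\n*   **Data foundations**: Data Quality, Data Engineering, Feature Stores\n*   **Core algorithm scenarios**: Recommendation, Search & Ranking, Forecasting, NLP, Computer Vision\n*   **Engineering and operations**: MLOps Platforms, Model Management, Validation and A\u002FB Testing\n*   **Organization and practice**: Team Structure, Ethics, Fails\n\n### 2. Learn from How Large Companies Ship ML\nFollow the links to read hands-on articles from companies such as Airbnb, Uber, Netflix, Google, and Meta. Focus on the following four dimensions (as described in the project introduction):\n\n*   **How (the problem is framed)**: for example, is personalization modeled as recommendation, search, or a sequence problem?\n*   **What (techniques were used)**: which machine learning techniques worked, and which did not?\n*   **Why (it works)**: the underlying science, with research, literature, and references.\n*   **Results (what was achieved)**: the real-world gains in business metrics (ROI), so that you can better assess the expected return of your own project.\n\n### 3. Example: Building a Feature Store\nSuppose you need to build a feature store:\n1.  Find the **[Feature Stores](#feature-stores)** section in the table of contents.\n2.  Read the article on `Feast`, open-sourced by `Gojek`, to understand the architecture of an open-source solution.\n3.  See `Netflix`'s work on \"Distributed Time Travel for Feature Generation\" to learn how to handle time-travel feature generation.\n4.  Consult the `DoorDash` or `Uber` engineering blogs for details on building large-scale real-time feature serving.\n\n### 4. 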
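Further Resources\n*   For surveys of machine learning advances, see the related project [`ml-surveys`](https:\u002F\u002Fgithub.com\u002Feugeneyan\u002Fml-surveys)\n*   For applied guides and interviews, see [`applyingML`](https:\u002F\u002Fapplyingml.com)","The algorithm team at an e-commerce startup is building a real-time personalized recommendation system but is unsure about technology choices and the path to production.\n\n### Without applied-ml\n- **Costly trial and error**: the team spends weeks debating whether to frame the problem as sequence modelling or search ranking, and with no industry reference points the architecture is repeatedly torn down and rebuilt.\n- **Overlooked data quality pitfalls**: academic models are applied directly, without drawing on the production experience of Uber or Airbnb in data monitoring and cleaning, so dirty data causes recommendation accuracy to swing widely after launch.\n- **ROI is hard to assess**: with no real ROI figures from comparable companies (such as the Netflix or Amazon case studies), it is hard to demonstrate the project's value to management and secure resources.\n- **Reinventing the wheel**: common modules such as feature stores and anomaly detection are built from scratch, ignoring the mature solutions that Google and Facebook have open-sourced and their write-ups of lessons learned from failures.\n\n### With applied-ml\n- **Best practices found quickly**: by reading the \"Recommendation\" and \"Sequence Modelling\" sections, the team sees how large companies turn business problems into concrete machine learning tasks and settles on a technical direction within a week.\n- **Production pitfalls avoided**: after studying the papers and blog posts on large-scale data validation in the \"Data Quality\" section, the team deploys a Gojek-style data quality check ahead of time, keeping model inputs stable and reliable.\n- **Data-driven decisions**: citing the real-world results reported in the collected articles, the team quantifies the expected benefit and wins continued support from senior leadership.\n- **Standing on the shoulders of giants**: using the case studies under \"Feature Stores\" and \"Fails\", the team reuses proven engineering patterns, avoids mistakes others have already made, and halves the development cycle.\n\nBy organizing the production machine learning experience of leading companies worldwide, applied-ml helps teams move from working in isolation to building efficiently on what others have learned.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feugeneyan_applied-ml_ba30523e.png","eugeneyan","Eugene Yan","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Feugeneyan_41d20c44.jpg",null,"@anthropics","Seattle","eugeneyan.com","https:\u002F\u002Fgithub.com\u002Feugeneyan",28767,3840,"2026-04-16T03:59:20","MIT",1,"","未说明",{"notes":89,"python":87,"dependencies":90},"This tool (applied-ml) is not an executable codebase but a curated list that collects links to papers, articles, and blog posts on applying data science and machine learning in production. It therefore has no runtime requirements such as an operating system, GPU, memory, Python version, or dependencies. Users only need a browser to read the linked content.",[],[14,15,35,16,92],"其他",[94,95,96,97,98,99,100,101,102,103,104,105,106,107],"applied-machine-learning","production","applied-data-science","machine-learning","data-science","reinforcement-learning","data-engineering","recsys","search","deep-learning","data-quality","data-discovery","computer-vision","natural-language-processing","2026-03-27T02:49:30.150509","2026-04-17T09:54:12.067182",[111,116,121],{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},36938,"是否有推荐的市场篮子分析（Market Basket 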
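Analysis）资源或应对大规模频繁项集生成的建议？","The repository may not list a resource on this directly, but the community shared a similar real-world case: a team at Red Hat analyzed Kubernetes cluster health data to identify co-occurring problems. Their approach was to cluster the \"symptoms\" first, mine frequent patterns with the FP-Growth algorithm, and then prune the resulting itemsets with heuristic rules to define concrete failure patterns. See the blog post and talk video: https:\u002F\u002Fwww.operate-first.cloud\u002Fdata-science\u002Fopenshift-anomaly-detection\u002Fdocs\u002Fblog\u002Fdiagnosis-discovery-blog.md and https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=RPBXma8NY0s.","https:\u002F\u002Fgithub.com\u002Feugeneyan\u002Fapplied-ml\u002Fissues\u002F145",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},36939,"Can publication years be added to the list of papers and blog posts, to show how recent each article is and how industry trends have evolved?","The maintainer accepted this suggestion. Contributor @shreyansh26 added the years and reordered the list in PR #191, so the list now includes publication years.","https:\u002F\u002Fgithub.com\u002Feugeneyan\u002Fapplied-ml\u002Fissues\u002F188",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},36940,"The \"Data Discovery\" section of the README contains duplicate resource links. How is this handled?","The maintainer confirmed the issue and fixed the duplicate entries. The duplicates have been removed from the README.","https:\u002F\u002Fgithub.com\u002Feugeneyan\u002Fapplied-ml\u002Fissues\u002F177",[]]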