[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-khangich--machine-learning-interview":3,"tool-khangich--machine-learning-interview":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",154349,2,"2026-04-13T23:32:16",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":78,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":83,"forks":84,"last_commit_at":85,"license":82,"difficulty_score":86,"env_os":87,"env_gpu":88,"env_ram":88,"env_deps":89,"category_tags":92,"github_topics":93,"view_count":32,"oss_zip_url":82,"oss_zip_packed_at":82,"status":17,"created_at":102,"updated_at":103,"faqs":104,"releases":105},7284,"khangich\u002Fmachine-learning-interview","machine-learning-interview","Machine Learning Interviews from FAANG, Snapchat, LinkedIn. I have offers from Snapchat, Coupang, Stitchfix etc. 
Blog: mlengineer.io.","machine-learning-interview 是一个专为准备机器学习岗位面试的开发者打造的开源学习指南。它系统整理了来自 FAANG、Snapchat、LinkedIn 等顶尖科技公司的真实面试经验与核心考点，旨在帮助求职者高效攻克技术难关。\n\n该资源主要解决了候选人在面对机器学习系统设计（MLSD）和算法考核时无从下手、缺乏实战案例的痛点。通过提供“最小可行学习计划”，它将庞大的知识体系拆解为可执行的步骤，涵盖了 YouTube 推荐、LinkedIn 信息流排序、广告点击预测及 Airbnb 搜索排名等经典系统设计场景，并辅以具体的应用案例和自测题库。\n\n其内容非常适合正在求职的软件工程师、机器学习工程师以及希望提升系统设计能力的进阶开发者使用。独特的亮点在于，它不仅罗列理论，更结合作者十年一线大厂实战经验，提供了从简历筛选到最终拿 offer 的全流程策略，甚至包含对五百道算法题的深度复盘心得。无论是想进入初创公司还是瞄准头部大厂，machine-learning-interview 都能作为一份实用的案头参考，帮助用户建立清晰的知识框架，从容应对高难度的技术面试。","\n# Minimum Viable Study Plan for Machine Learning Interviews\n\u003Cp align=\"center\">\n Machine Learning System Design - Early Preview  - \u003Ca href =\"https:\u002F\u002Frebrand.ly\u002Fmldesignbook\"> Buy on Amazon \u003C\u002Fa>\u003Cbr>\n \u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesignbook\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_20f32a6e742e.png\" alt=\"Machine Learning System Design Interview\" width=\"400\" height=\"300\"> \u003C\u002Fa>\n \n \u003C\u002Fp>\n \u003Cp align=\"center\">\n Machine Learning interviews book on Amazon.\u003Cbr>\n\u003Ca href=\"https:\u002F\u002Fwww.amazon.com\u002Fdp\u002FB09S9JBT86\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_cf1ca57a426f.png\" alt=\"Machine Learning Interviews book on Amazon\" width=\"400\" height=\"300\"> \u003C\u002Fa>\n \n\u003C\u002Fp>\n\nFollow [News about AI projects](https:\u002F\u002Fnews.llmlab.io\u002F)\n\n- Most popular post: [One lesson I learned after solving 500 leetcode questions](https:\u002F\u002Fmlengineer.io\u002Ffrom-semiconductor-to-software-engineer-8c3126dde65b)\n- Oct 10th: Machine Learning System Design course became the [number 1 ML course](https:\u002F\u002Fwww.linkedin.com\u002Fposts\u002Factivity-6853724396188790784-tWxj) on educative. 
\n- June 8th: launched the [interview stories series](https:\u002F\u002Frebrand.ly\u002Finterviewstory). \n- April 29th: I launched the [mlengineer.io](https:\u002F\u002Fmlengineer.io\u002Ffrom-google-rejection-to-40-offers-71337a224ebe?sk=1408513db21536d25c23f67ce898b37d) blog so you can read the latest machine learning interview experiences.\n- April 15th 2021: Machine Learning System Design launched on [interviewquery.com](https:\u002F\u002Frebrand.ly\u002Fmldesigninterview).\n- Feb 9th 2021: [Machine Learning System design](https:\u002F\u002Frebrand.ly\u002Fmlsd_launch) is now available on [educative.io](https:\u002F\u002Frebrand.ly\u002Fmlsd_launch).\n- I'm a software engineer working in ML with 10 years of experience ([LinkedIn profile](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fphamkhang\u002F)). I had offers from Google, LinkedIn, Coupang, Snap and Stitch Fix. Read my [blog](https:\u002F\u002Frebrand.ly\u002Fmleio). \n\n\n\n## Machine Learning Design\n\n| Section | |\n| ------------- | ------------- | \n| 1. [YouTube Recommendation](https:\u002F\u002Frebrand.ly\u002Fmldesign) |\u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesign\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_162281f8b28b.png\" alt=\"Youtube Recommendation Design\" width=\"100\" height=\"100\"> \u003C\u002Fa>| \n| 2. [The main components in MLSD](https:\u002F\u002Frebrand.ly\u002Fmldesign) |\u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesign\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_f4a7cfa22dbd.png\" alt=\"The main components in MLSD\" width=\"100\" height=\"100\"> \u003C\u002Fa> | \n| 3. 
[LinkedIn Feed Ranking](https:\u002F\u002Frebrand.ly\u002Fmldesign) |\u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesign\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_f11408988596.png\" alt=\"LinkedIn Feed Ranking\" width=\"100\" height=\"100\"> \u003C\u002Fa> | \n| 4. [Ad Click Prediction](https:\u002F\u002Frebrand.ly\u002Fmldesign)|\u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesign\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_735ca3487b84.png\" alt=\"Ad Click Prediction\" width=\"100\" height=\"100\"> \u003C\u002Fa> | \n| 5. [Estimate Delivery time](https:\u002F\u002Frebrand.ly\u002Fmldesign)|\u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesign\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_50485382b229.png\" alt=\"Estimate Delivery time\" width=\"100\" height=\"100\"> \u003C\u002Fa> | \n| 6. [Airbnb Search ranking](https:\u002F\u002Frebrand.ly\u002Fmldesign)|\u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesign\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_597875f65396.png\" alt=\"Airbnb Search ranking\" width=\"100\" height=\"100\"> \u003C\u002Fa> | \n\n\n## Getting Started\n\n| How to  | Resources |\n| ------------- | ------------- |\n| List of promising companies | [WealthFront 2021 list](https:\u002F\u002Fblog.wealthfront.com\u002Fcareer-launching-companies-list\u002F#companies-list).   |\n| Prepare for interview  | [Common questions about Machine Learning Interview process](faqs.md).   |\n| Study guide | [Study guide](README.md) contained minimum set of focus area to aces your interview.   
|\n| Design ML system | [ML system design](https:\u002F\u002Fmlengineer.io\u002Fmachine-learning-design-interview-d08be9f44260?source=friends_link&sk=97fe3a510957d65b6311d5d38b30c639) includes actual ML system design use cases.    |\n| ML use cases | [ML use cases](appliedml.md) from top companies    |\n| Test your ML knowledge  | The [Machine Learning quiz](https:\u002F\u002Fmlengineer.io\u002Fmachine-learning-assessment-db935aa9fafd?source=friends_link&sk=1062e407bea5d842b7684668b005d08c) is based on actual interview questions from dozens of big companies.  |\n| One week before onsite interview | Read the [one-week checklist](https:\u002F\u002Fmlengineer.io\u002Fmachine-learning-engineer-onsite-interview-one-week-checklist-cfd19d57fa02?source=friends_link&sk=80d2bb43c590156a7fa72260dfb4972c) |\n| How to get an offer? | Read [success stories](https:\u002F\u002Fmlengineer.io\u002Ffrom-google-rejection-to-40-offers-71337a224ebe?source=friends_link&sk=1408513db21536d25c23f67ce898b37d) |\n| Actual MLE interviews at FAANG companies | Read [interview stories](https:\u002F\u002Fmlengineer.io\u002Fmlengineer-io-interview\u002Fhome) |\n| Practice coding  | [LeetCode questions by category for MLE](https:\u002F\u002Fmlengineer.io\u002Fcommon-leetcode-questions-by-categories-532b301130b)  |\n| Advanced topics | Read [advanced topics](extra.md) |\n\n\n\n## Study guide\n### LeetCode (not all companies ask LeetCode questions)\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_339ca4eada4b.png\">](https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1RCb1dVQCLmtOGlJ5J-NJ5pIC7Tda-N2U\u002Fedit#gid=274831950)\n\n- NOTE: many companies do **NOT** ask LeetCode questions. There are many paths to becoming an MLE, and you can create your own path if you feel that leetcoding is a waste of time. 
\n\n- I use [LC time tracking](https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1RCb1dVQCLmtOGlJ5J-NJ5pIC7Tda-N2U\u002Fedit#gid=274831950) to keep track of how many times I have solved a question and how long I spent each time. Once I have solved a non-trivial medium LC question 3 times, I have no trouble solving it in an actual interview (sometimes within 8-10 minutes). It makes a big difference. A better way is to use the **LeetPlug** Chrome extension [here](https:\u002F\u002Fleetplug.azurewebsites.net\u002Fstatic\u002Fpages\u002Fhowto.html)\n\n [LeetCode questions by category](https:\u002F\u002Fmlengineer.io\u002Fcommon-leetcode-questions-by-categories-532b301130b?sk=cf77975462cb0c96e6a6daebaa3ab7b9)\n### SQL\n* Know SQL joins: [self join](https:\u002F\u002Fwww.sqlservertutorial.net\u002Fsql-server-basics\u002Fsql-server-self-join\u002F), inner, left, right, etc. \n* Use [HackerRank](https:\u002F\u002Fwww.hackerrank.com\u002Fdomains\u002Fsql) to practice SQL.\n* Revise\u002Flearn SQL window functions: [window functions](https:\u002F\u002Fwww.windowfunctions.com\u002Fquestions\u002Fintro\u002F)\n\n### Programming\n* [Java garbage collection](https:\u002F\u002Fstackify.com\u002Fwhat-is-java-garbage-collection\u002F)\n* [Python pass-by-object-reference](https:\u002F\u002Frobertheaton.com\u002F2014\u002F02\u002F09\u002Fpythons-pass-by-object-reference-as-explained-by-philip-k-dick\u002F)\n* [Python GIL, Fluent Python, chapter 17](http:\u002F\u002Findex-of.es\u002FVarios-2\u002FFluent%20Python%20Clear%20Concise%20and%20Effective%20Programming.pdf)\n* [Python multithreading](https:\u002F\u002Frealpython.com\u002Fintro-to-python-threading\u002F)\n* [Python concurrency, Fluent Python, chapter 
18](http:\u002F\u002Findex-of.es\u002FVarios-2\u002FFluent%20Python%20Clear%20Concise%20and%20Effective%20Programming.pdf)\n\n### Statistics and probability\n\n* The only cheatsheet that you'll ever need\n\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_b0eb0259297a.png\">](http:\u002F\u002Fwww.wzchen.com\u002Fprobability-cheatsheet)\n\n\n\n* Learn Bayesian statistics and practice [Bayesian problems](https:\u002F\u002Fblogs.kent.ac.uk\u002Fjonw\u002Ffiles\u002F2015\u002F04\u002FPuza2005.pdf)\n* Let A and B be events on the same sample space, with P(A) = 0.6 and P(B) = 0.7. Can these two events be disjoint?\n* Given that Alice has 2 kids, at least one of whom is a girl, what is the probability that both kids are girls? (credit [swierdo](https:\u002F\u002Fwww.reddit.com\u002Fuser\u002Fswierdo\u002F))\n* A group of 60 students is randomly split into 3 classes of equal size. All partitions are equally likely. Jack and Jill are two students belonging to that group. What is the probability that Jack and Jill will end up in the same class? \n* Given an unfair coin whose probability of heads is not 0.5, what algorithm could you use to generate a list of unbiased random 1s and 0s?  
\n\n\n### Big data (NOT required for Google or Facebook interviews)\n* Spark [architecture](http:\u002F\u002Fdatastrophic.io\u002Fcore-concepts-architecture-and-internals-of-apache-spark\u002F) and Spark [lessons learned](https:\u002F\u002Fdatabricks.com\u002Fblog\u002F2016\u002F08\u002F31\u002Fapache-spark-scale-a-60-tb-production-use-case.html) (outdated since the Spark 3.0 release)  \n* Spark [OOM](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F21138751\u002Fspark-java-lang-outofmemoryerror-java-heap-space)\n* Cassandra [best practices](https:\u002F\u002Ftech.ebayinc.com\u002Fengineering\u002Fcassandra-data-modeling-best-practices-part-1\u002F), [data modeling intro](https:\u002F\u002Fcassandra.apache.org\u002Fdoc\u002Flatest\u002Fdata_modeling\u002Fintro.html), [when to use Cassandra](https:\u002F\u002Ftowardsdatascience.com\u002Fwhen-to-use-cassandra-and-when-to-steer-clear-72b7f2cede76), and [Cassandra performance](https:\u002F\u002Fwww.scnsoft.com\u002Fblog\u002Fcassandra-performance)\n* Practice problem: [finding friends with MapReduce](http:\u002F\u002Fstevekrenzel.com\u002Ffinding-friends-with-mapreduce)\n* Everything in [one page](https:\u002F\u002Fmlengineer.io\u002Fbig-data-knowledge-for-machine-learning-engineer-interview-2020-148d7c335e12?source=friends_link&sk=604c593c522db5195d3bda33dc4662d7).\n\n\n\n\n\n### ML fundamentals\n* [Collinearity](https:\u002F\u002Fstatisticsbyjim.com\u002Fregression\u002Fmulticollinearity-in-regression-analysis\u002F) and [read more](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Cba9LJ9lS8s)\n* [Feature scaling](https:\u002F\u002Fsebastianraschka.com\u002FArticles\u002F2014_about_feature_scaling.html)\n* [Random forest vs GBDT](https:\u002F\u002Fmedium.com\u002F@aravanshad\u002Fgradient-boosting-versus-random-forest-cfa3fa8f0d80)\n* [SMOTE synthetic minority over-sampling 
technique](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1106.1813.pdf)\n* [Compare discriminative vs generative models](https:\u002F\u002Fmedium.com\u002F@mlengineer\u002Fgenerative-and-discriminative-models-af5637a66a3) and [extra read](http:\u002F\u002Fai.stanford.edu\u002F~ang\u002Fpapers\u002Fnips01-discriminativegenerative.pdf)\n* [Logistic regression](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=-la3q9d7AKQ). Try to implement logistic regression from scratch. Bonus points for a vectorized NumPy version completed within 20 minutes ([sample code from martinpella](sample\u002Flogistic_regression.ipynb)). Follow up with a MapReduce version. \n* [Quantile regression](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=s203ScTy4xQ&t=954s)\n* [L1\u002FL2 intuition](https:\u002F\u002Fwww.linkedin.com\u002Fpulse\u002Fintuitive-visual-explanation-differences-between-l1-l2-xiaoli-chen\u002F)\n* [Decision tree and random forest fundamentals](https:\u002F\u002Fpeople.csail.mit.edu\u002Fdsontag\u002Fcourses\u002Fml16\u002Fslides\u002Flecture11.pdf)\n* [Explain boosting](https:\u002F\u002Fweb.stanford.edu\u002F~hastie\u002FTALKS\u002Fboost.pdf)\n* [Least squares as a maximum likelihood estimator](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=_-Gnu498s3o)\n* [Maximum likelihood estimator introduction](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=WflqTUOvdik&t=15s)\n* [K-means](https:\u002F\u002Fstanford.edu\u002F~cpiech\u002Fcs221\u002Fhandouts\u002Fkmeans.html). Try to implement K-means from scratch ([sample code from flothesof.github.io](sample\u002Fkmeans.ipynb)). Bonus points for a vectorized NumPy version completed within 20 minutes. 
Follow up with the worst-case time complexity and improvements to [initialization](extra.md).\n* Fundamentals of [PCA](http:\u002F\u002Falexhwilliams.info\u002Fitsneuronalblog\u002F2016\u002F03\u002F27\u002Fpca\u002F)\n* I didn't use [flashcards](https:\u002F\u002Fmachinelearningflashcards.com\u002F), but I'm sure they help to a certain extent.\n\n### AB testing\n* [Trustworthy Online Controlled Experiments: A Practical Guide to A\u002FB Testing](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F339914315_Trustworthy_Online_Controlled_Experiments_A_Practical_Guide_to_AB_Testing)\n\n\n### DL fundamentals\n* [The deep learning book](https:\u002F\u002Fwww.deeplearningbook.org\u002F). Read [Part II](https:\u002F\u002Fwww.deeplearningbook.org\u002Fcontents\u002Fpart_practical.html) \n* [Machine Learning Yearning](https:\u002F\u002Fd2wvfoqc9gyqzf.cloudfront.net\u002Fcontent\u002Fuploads\u002F2018\u002F09\u002FNg-MLY01-13.pdf). Read from section 5 to section 27.\n* [Neural networks and backpropagation](http:\u002F\u002Fcs231n.stanford.edu\u002Fslides\u002F2020\u002Flecture_4.pdf)\n* [Activation functions](https:\u002F\u002Fmissinglink.ai\u002Fguides\u002Fneural-network-concepts\u002F7-types-neural-network-activation-functions-right\u002F)\n* [Loss and optimization](http:\u002F\u002Fcs231n.stanford.edu\u002Fslides\u002F2020\u002Flecture_3.pdf)\n* [Convolutional neural network notes](https:\u002F\u002Fcs231n.github.io\u002Fconvolutional-networks\u002F)\n* [Recurrent Neural Networks](http:\u002F\u002Fcs231n.stanford.edu\u002Fslides\u002F2020\u002Flecture_10.pdf)\n\n\n\n### ML system design\n#### Classic ML papers\n* [Technical debt in ML](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F5656-hidden-technical-debt-in-machine-learning-systems.pdf)\n* [Rules of ML](https:\u002F\u002Fdevelopers.google.com\u002Fmachine-learning\u002Fguides\u002Frules-of-ml)\n* [An Opinionated Guide to ML Research](http:\u002F\u002Fjoschu.net\u002Fblog\u002Fopinionated-guide-ml-research.html). 
There is valuable advice in the Personal development section at the bottom.\n\n#### ML productions\n* [Scaling ML at Uber](https:\u002F\u002Feng.uber.com\u002Fscaling-michelangelo\u002F)\n* [DL in production](https:\u002F\u002Fgithub.com\u002Falirezadir\u002FProduction-Level-Deep-Learning)\n#### Food delivery\n* [Uber eats trip optimization](https:\u002F\u002Feng.uber.com\u002Fuber-eats-trip-optimization\u002F)\n* [Uber food discovery](https:\u002F\u002Feng.uber.com\u002Fuber-eats-query-understanding\u002F)\n* [Personalized store feed](https:\u002F\u002Fblog.doordash.com\u002Fpersonalized-store-feed-with-vector-embeddings-251ad7a2c09a)\n* [Doordash dispatch optimization](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F02\u002F28\u002Fnext-generation-optimization-for-dasher-dispatch-at-doordash\u002F)\n\n#### ML design common usecases\n* [ML system design primer](https:\u002F\u002Finterview.mlengineer.io\u002F)\n* [Video recommendation](https:\u002F\u002Finterview.mlengineer.io\u002F)\n* [Feed ranking](https:\u002F\u002Finterview.mlengineer.io\u002F)\n\n\n#### Fraud detection (TBD)\n\n#### Adtech\n* [Ad click prediction trend](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F41159.pdf)\n* [Ad Clicks CTR](https:\u002F\u002Fresearch.fb.com\u002Fwp-content\u002Fuploads\u002F2016\u002F11\u002Fpractical-lessons-from-predicting-clicks-on-ads-at-facebook.pdf)\n* [Delayed feedbacks](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2019\u002Fimproving-engagement-on-digital-ads-with-delayed-feedback.html)\n* [Entity embedding](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2018\u002Fembeddingsattwitter.html)\n* [Star space, embedding all the things](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1709.03856.pdf)\n* [Twitter timeline 
ranking](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2017\u002Fusing-deep-learning-at-scale-in-twitters-timelines.html)\n\n#### Recommendations:\n* [Instagram explore](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fpowered-by-ai-instagrams-explore-recommender-system\u002F)\n* [TikTok recommendation](https:\u002F\u002Fnewsroom.tiktok.com\u002Fen-us\u002Fhow-tiktok-recommends-videos-for-you)\n* [Deep Neural Networks for YouTube Recommendations](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F45530.pdf)\n* [Wide & Deep Learning for Recommender Systems](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1606.07792.pdf)\n\n## Testimonials\n- V, Amazon L5 DS\n> I really found the quizzes very helpful for testing my ML understanding. Also, the resources shared helped me a lot for revising concepts for my interview preparation. This course will definitely help engineers crack Machine Learning Engineering and Data Science interviews.\n\n\n- K, Facebook MLE\n> I really like what you've built, it'll help a lot of engineers.\n\n- D, NVIDIA DS\n> I have been using your github repo to prep for my interviews and got an offer with NVIDIA with their data science team. Thanks again for your help!\n\n- A, Booking\n> Woow this is very useful summaries, so nice. \n\n- H, Microsoft\n> That's incredible! \n\n- V, Intel\n> The repo is extremely cohesive! Thanks again. \n\n\n## Intro\n\n* This repo is based on REAL interview questions from big companies, and the study materials come from recognized experts, e.g. Andrew Ng and Yoshua Bengio. \n\n* I have 6 YOE in machine learning and have interviewed with more than a dozen big companies. This is the **minimum** viable study plan that covers all actual interview questions from Facebook, Amazon, Apple, Google, Microsoft, Snapchat, LinkedIn, etc. \n\n* If you're interested in learning more about the paid ML system design course, [click here](course.md). 
This course will provide 6-7 practical usecases with proven solutions. After this course you will be able to solve new problem with systematic approach.\n\n\n# Acknowledgements and contributing\n1. Thanks for early feedbacks and contributions from [Vivian](https:\u002F\u002Fgithub.com\u002Fliuvivian11), [aragorn87](https:\u002F\u002Fgithub.com\u002Faragorn87) and others. You can create an Issue or Pull Request on this repo. You can also help upvote on [ProductHunt](https:\u002F\u002Fwww.producthunt.com\u002Fposts\u002Fmachine-learning-interview-guideline)\n\n2. If you find this helpful, you can Sponsor this project. It's cool if you don't. \n\n3. Thanks to this community, we have donated about $200 to [HopeForPaws](https:\u002F\u002Fwww.hopeforpaws.org\u002F). If you want to support, you can contribute too on their website. \n\n\n\n","# 机器学习面试的最小可行学习计划\n\u003Cp align=\"center\">\n 机器学习系统设计 - 提前预览 - \u003Ca href =\"https:\u002F\u002Frebrand.ly\u002Fmldesignbook\"> 在亚马逊购买 \u003C\u002Fa>\u003Cbr>\n \u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesignbook\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_20f32a6e742e.png\" alt=\"机器学习系统设计面试\" width=\"400\" height=\"300\"> \u003C\u002Fa>\n \n \u003C\u002Fp>\n \u003Cp align=\"center\">\n 亚马逊上的机器学习面试书籍。\u003Cbr>\n\u003Ca href=\"https:\u002F\u002Fwww.amazon.com\u002Fdp\u002FB09S9JBT86\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_cf1ca57a426f.png\" alt=\"亚马逊上的机器学习面试书籍\" width=\"400\" height=\"300\"> \u003C\u002Fa>\n \n\u003C\u002Fp>\n\n关注 [AI项目新闻](https:\u002F\u002Fnews.llmlab.io\u002F)\n\n- 最受欢迎的文章：[解决500道LeetCode题目后我学到的一课](https:\u002F\u002Fmlengineer.io\u002Ffrom-semiconductor-to-software-engineer-8c3126dde65b)\n- 10月10日：机器学习系统设计课程成为educative平台上的[排名第一的ML课程](https:\u002F\u002Fwww.linkedin.com\u002Fposts\u002Factivity-6853724396188790784-tWxj)。 \n- 
6月8日：推出[面试故事系列](https:\u002F\u002Frebrand.ly\u002Finterviewstory)。 \n- 4月29日：我推出了[mlengineer.io](https:\u002F\u002Fmlengineer.io\u002Ffrom-google-rejection-to-40-offers-71337a224ebe?sk=1408513db21536d25c23f67ce898b37d)博客，以便您获取最新的机器学习面试经验。\n- 2021年4月15日：机器学习系统设计在[interviewquery.com](https:\u002F\u002Frebrand.ly\u002Fmldesigninterview)上线。\n- 2021年2月9日：[机器学习系统设计](https:\u002F\u002Frebrand.ly\u002Fmlsd_launch)现已在[educative.io](https:\u002F\u002Frebrand.ly\u002Fmlsd_launch)上架。\n- 我是一名拥有10年经验的软件工程师兼机器学习工程师([LinkedIn个人资料](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fphamkhang\u002F)）。我曾收到过来自Google、LinkedIn、Coupang、Snap和StitchFix的录用通知。请阅读我的[博客](https:\u002F\u002Frebrand.ly\u002Fmleio)。\n\n\n\n## 机器学习设计\n\n| 版块 | |\n| ------------- | ------------- | \n| 1. [YouTube推荐系统](https:\u002F\u002Frebrand.ly\u002Fmldesign) |\u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesign\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_162281f8b28b.png\" alt=\"YouTube推荐系统设计\" width=\"100\" height=\"100\"> \u003C\u002Fa>| \n| 2. [MLSD中的主要组件](https:\u002F\u002Frebrand.ly\u002Fmldesign) |\u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesign\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_f4a7cfa22dbd.png\" alt=\"MLSD中的主要组件\" width=\"100\" height=\"100\"> \u003C\u002Fa> | \n| 3. [LinkedIn信息流排序](https:\u002F\u002Frebrand.ly\u002Fmldesign) |\u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesign\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_f11408988596.png\" alt=\"LinkedIn信息流排序\" width=\"100\" height=\"100\"> \u003C\u002Fa> | \n| 4. 
[广告点击率预测](https:\u002F\u002Frebrand.ly\u002Fmldesign)|\u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesign\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_735ca3487b84.png\" alt=\"广告点击率预测\" width=\"100\" height=\"100\"> \u003C\u002Fa> | \n| 5. [预计送达时间](https:\u002F\u002Frebrand.ly\u002Fmldesign)|\u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesign\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_50485382b229.png\" alt=\"预计送达时间\" width=\"100\" height=\"100\"> \u003C\u002Fa> | \n| 6. [Airbnb搜索排名](https:\u002F\u002Frebrand.ly\u002Fmldesign)|\u003Ca href=\"https:\u002F\u002Frebrand.ly\u002Fmldesign\"> \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_597875f65396.png\" alt=\"Airbnb搜索排名\" width=\"100\" height=\"100\"> \u003C\u002Fa> | \n\n\n## 入门指南\n\n| 如何操作 | 资源 |\n| ------------- | ------------- |\n| 有前景的公司列表 | [WealthFront 2021年榜单](https:\u002F\u002Fblog.wealthfront.com\u002Fcareer-launching-companies-list\u002F#companies-list)。   |\n| 面试准备 | [机器学习面试流程常见问题](faqs.md)。   |\n| 学习指南 | [学习指南](README.md)包含了通过面试所需的最小重点领域。   |\n| 设计机器学习系统 | [机器学习系统设计](https:\u002F\u002Fmlengineer.io\u002Fmachine-learning-design-interview-d08be9f44260?source=friends_link&sk=97fe3a510957d65b6311d5d38b30c639)包括实际的机器学习系统设计用例。    |\n| 机器学习用例 | [顶级公司的机器学习用例](appliedml.md)    |\n| 测试你的机器学习知识 | [机器学习测验](https:\u002F\u002Fmlengineer.io\u002Fmachine-learning-assessment-db935aa9fafd?source=friends_link&sk=1062e407bea5d842b7684668b005d08c)基于数十家大型公司的实际面试题设计而成。  |\n| 现场面试前一周 | 阅读[一周检查清单](https:\u002F\u002Fmlengineer.io\u002Fmachine-learning-engineer-onsite-interview-one-week-checklist-cfd19d57fa02?source=friends_link&sk=80d2bb43c590156a7fa72260dfb4972c) |\n| 如何获得录用？ | 
阅读[成功案例](https:\u002F\u002Fmlengineer.io\u002Ffrom-google-rejection-to-40-offers-71337a224ebe?source=friends_link&sk=1408513db21536d25c23f67ce898b37d) |\n| FAANG公司的真实MLE面试 | 阅读[面试故事](https:\u002F\u002Fmlengineer.io\u002Fmlengineer-io-interview\u002Fhome) |\n| 编程练习 | [MLE分类的LeetCode题目](https:\u002F\u002Fmlengineer.io\u002Fcommon-leetcode-questions-by-categories-532b301130b)  |\n| 进阶主题 | 阅读[进阶主题](extra.md) |\n\n\n\n## 学习指南\n### LeetCode（并非所有公司都会问LeetCode题目）\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_339ca4eada4b.png\">](https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1RCb1dVQCLmtOGlJ5J-NJ5pIC7Tda-N2U\u002Fedit#gid=274831950)\n\n- 注意：有很多公司**不**会问LeetCode题目。成为一名MLE有很多途径，如果你觉得刷LeetCode浪费时间，也可以选择自己的道路。\n\n- 我使用[LC时间跟踪](https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1RCb1dVQCLmtOGlJ5J-NJ5pIC7Tda-N2U\u002Fedit#gid=274831950)来记录自己解一道题的次数以及每次花费的时间。一旦我把那些非平凡的中等难度LeetCode题目做了三次，我在实际面试中就能毫无困难地解决它们（有时甚至只需8到10分钟）。这确实有很大帮助。更好的方法是使用**LeetPlug** Chrome扩展程序[这里](https:\u002F\u002Fleetplug.azurewebsites.net\u002Fstatic\u002Fpages\u002Fhowto.html)\n\n [按类别划分的LeetCode题目](https:\u002F\u002Fmlengineer.io\u002Fcommon-leetcode-questions-by-categories-532b301130b?sk=cf77975462cb0c96e6a6daebaa3ab7b9)\n### SQL\n* 掌握SQL连接：[自连接](https:\u002F\u002Fwww.sqlservertutorial.net\u002Fsql-server-basics\u002Fsql-server-self-join\u002F)、内连接、左连接、右连接等。 \n* 使用[hackerrank](https:\u002F\u002Fwww.hackerrank.com\u002Fdomains\u002Fsql)练习SQL。\n* 复习\u002F学习SQL窗口函数：[窗口函数](https:\u002F\u002Fwww.windowfunctions.com\u002Fquestions\u002Fintro\u002F)\n\n### 编程\n* [Java垃圾回收](https:\u002F\u002Fstackify.com\u002Fwhat-is-java-garbage-collection\u002F#:~:text=Java%20garbage%20collection%20is%20the,Machine%2C%20or%20JVM%20for%20short.&text=The%20garbage%20collector%20finds%20these,them%20to%20free%20up%20memory.)\n* 
[Python按对象引用传递](https:\u002F\u002Frobertheaton.com\u002F2014\u002F02\u002F09\u002Fpythons-pass-by-object-reference-as-explained-by-philip-k-dick\u002F)\n* [Python GIL，《流畅的Python》第17章](http:\u002F\u002Findex-of.es\u002FVarios-2\u002FFluent%20Python%20Clear%20Concise%20and%20Effective%20Programming.pdf)\n* [Python多线程](https:\u002F\u002Frealpython.com\u002Fintro-to-python-threading\u002F)\n* [Python并发，《流畅的Python》第18章](http:\u002F\u002Findex-of.es\u002FVarios-2\u002FFluent%20Python%20Clear%20Concise%20and%20Effective%20Programming.pdf)\n\n### 统计与概率\n\n* 你永远需要的唯一作弊表\n\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_b0eb0259297a.png\">](http:\u002F\u002Fwww.wzchen.com\u002Fprobability-cheatsheet)\n\n\n\n* 学习贝叶斯方法并练习 [贝叶斯相关问题](https:\u002F\u002Fblogs.kent.ac.uk\u002Fjonw\u002Ffiles\u002F2015\u002F04\u002FPuza2005.pdf)\n* 设A和B是同一样本空间上的两个事件，其中P(A) = 0.6，P(B) = 0.7。这两个事件能否互斥？\n* 已知爱丽丝有两个孩子，至少有一个是女孩，那么这两个孩子都是女孩的概率是多少？（感谢 [swierdo](https:\u002F\u002Fwww.reddit.com\u002Fuser\u002Fswierdo\u002F) 提供）\n* 一群60名学生被随机分成3个大小相等的班级，所有分组方式的可能性均等。杰克和吉尔是该群体中的两名学生。他们两人最终会分到同一个班级的概率是多少？\n* 给定一枚不均匀的硬币，其正面朝上的概率不等于0.5。你可以使用什么算法来生成一个由随机1和0组成的列表？\n\n\n### 大数据（Google、Facebook面试非必考内容）\n* Spark [架构](http:\u002F\u002Fdatastrophic.io\u002Fcore-concepts-architecture-and-internals-of-apache-spark\u002F) 和 Spark [经验总结](https:\u002F\u002Fdatabricks.com\u002Fblog\u002F2016\u002F08\u002F31\u002Fapache-spark-scale-a-60-tb-production-use-case.html)（自Spark 3.0发布后已过时）  \n* Spark [OOM问题](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F21138751\u002Fspark-java-lang-outofmemoryerror-java-heap-space)\n* Cassandra [最佳实践](https:\u002F\u002Ftech.ebayinc.com\u002Fengineering\u002Fcassandra-data-modeling-best-practices-part-1\u002F) 和 
[这里](https:\u002F\u002Fcassandra.apache.org\u002Fdoc\u002Flatest\u002Fdata_modeling\u002Fintro.html)，[链接](https:\u002F\u002Ftowardsdatascience.com\u002Fwhen-to-use-cassandra-and-when-to-steer-clear-72b7f2cede76![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_readme_15f8fd47063f.png)\n)，[Cassandra性能](https:\u002F\u002Fwww.scnsoft.com\u002Fblog\u002Fcassandra-performance)\n* 练习题 [使用MapReduce查找朋友](http:\u002F\u002Fstevekrenzel.com\u002Ffinding-friends-with-mapreduce)\n* 所有内容都在 [一页](https:\u002F\u002Fmlengineer.io\u002Fbig-data-knowledge-for-machine-learning-engineer-interview-2020-148d7c335e12?source=friends_link&sk=604c593c522db5195d3bda33dc4662d7).\n\n\n\n### 机器学习基础\n* [共线性](https:\u002F\u002Fstatisticsbyjim.com\u002Fregression\u002Fmulticollinearity-in-regression-analysis\u002F) 和 [更多阅读](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Cba9LJ9lS8s)\n* [特征缩放](https:\u002F\u002Fsebastianraschka.com\u002FArticles\u002F2014_about_feature_scaling.html)\n* [随机森林与GBDT对比](https:\u002F\u002Fmedium.com\u002F@aravanshad\u002Fgradient-boosting-versus-random-forest-cfa3fa8f0d80)\n* [SMOTE合成少数类过采样技术](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1106.1813.pdf)\n* [判别模型与生成模型比较](https:\u002F\u002Fmedium.com\u002F@mlengineer\u002Fgenerative-and-discriminative-models-af5637a66a3) 和 [额外阅读](http:\u002F\u002Fai.stanford.edu\u002F~ang\u002Fpapers\u002Fnips01-discriminativegenerative.pdf)\n* [逻辑回归](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=-la3q9d7AKQ)。尝试从头实现逻辑回归。如果用numpy实现向量化版本并在20分钟内完成，可获得加分 [参考代码来自martinpella](sample\u002Flogistic_regression.ipynb)。随后可以进一步实现MapReduce版本。\n* [分位数回归](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=s203ScTy4xQ&t=954s)\n* [L1\u002FL2直观理解](https:\u002F\u002Fwww.linkedin.com\u002Fpulse\u002Fintuitive-visual-explanation-differences-between-l1-l2-xiaoli-chen\u002F)\n* [决策树和随机森林基础](https:\u002F\u002Fpeople.csail.mit.edu\u002Fdsontag\u002Fcourses\u002Fml16\u002Fslides\u002Flecture11.pdf)\n* 
[梯度提升解释](https:\u002F\u002Fweb.stanford.edu\u002F~hastie\u002FTALKS\u002Fboost.pdf)\n* [最小二乘法作为最大似然估计量](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=_-Gnu498s3o)\n* [最大似然估计量简介](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=WflqTUOvdik&t=15s)\n* [K均值聚类](https:\u002F\u002Fstanford.edu\u002F~cpiech\u002Fcs221\u002Fhandouts\u002Fkmeans.html)。尝试从头实现K均值聚类 [参考代码来自flothesof.github.io](sample\u002Fkmeans.ipynb)。如果用numpy实现向量化版本并在20分钟内完成，可获得加分。后续可以探讨最坏情况下的时间复杂度以及对 [初始化](extra.md) 的改进。\n* 关于 [PCA](http:\u002F\u002Falexhwilliams.info\u002Fitsneuronalblog\u002F2016\u002F03\u002F27\u002Fpca\u002F) 的基础知识\n* 我没有使用 [记忆卡片](https:\u002F\u002Fmachinelearningflashcards.com\u002F)，但我相信它在一定程度上会有帮助。\n\n### A\u002FB测试\n* [可信在线控制实验：A\u002FB测试实用指南](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F339914315_Trustworthy_Online_Controlled_Experiments_A_Practical_Guide_to_AB_Testing)\n\n\n### 深度学习基础\n* [深度学习教科书](https:\u002F\u002Fwww.deeplearningbook.org\u002F)。阅读 [第二部分](https:\u002F\u002Fwww.deeplearningbook.org\u002Fcontents\u002Fpart_practical.html) \n* [机器学习渴望](https:\u002F\u002Fd2wvfoqc9gyqzf.cloudfront.net\u002Fcontent\u002Fuploads\u002F2018\u002F09\u002FNg-MLY01-13.pdf)。从第5节读到第27节。\n* [神经网络与反向传播](http:\u002F\u002Fcs231n.stanford.edu\u002Fslides\u002F2020\u002Flecture_4.pdf)\n* [激活函数](https:\u002F\u002Fmissinglink.ai\u002Fguides\u002Fneural-network-concepts\u002F7-types-neural-network-activation-functions-right\u002F)\n* [损失函数与优化](http:\u002F\u002Fcs231n.stanford.edu\u002Fslides\u002F2020\u002Flecture_3.pdf)\n* [卷积神经网络笔记](https:\u002F\u002Fcs231n.github.io\u002Fconvolutional-networks\u002F)\n* [循环神经网络](http:\u002F\u002Fcs231n.stanford.edu\u002Fslides\u002F2020\u002Flecture_10.pdf)\n\n### 机器学习系统设计\n#### 机器学习经典论文\n* [机器学习中的技术债务](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F5656-hidden-technical-debt-in-machine-learning-systems.pdf)\n* [机器学习规则](https:\u002F\u002Fdevelopers.google.com\u002Fmachine-learning\u002Fguides\u002Frules-of-ml)\n* 
[机器学习研究的主观指南](http:\u002F\u002Fjoschu.net\u002Fblog\u002Fopinionated-guide-ml-research.html)。文末的个人发展部分包含有价值的建议。\n\n#### 机器学习生产实践\n* [优步的机器学习规模化](https:\u002F\u002Feng.uber.com\u002Fscaling-michelangelo\u002F)\n* [深度学习在生产环境中的应用](https:\u002F\u002Fgithub.com\u002Falirezadir\u002FProduction-Level-Deep-Learning)\n#### 食品配送\n* [优步外卖行程优化](https:\u002F\u002Feng.uber.com\u002Fuber-eats-trip-optimization\u002F)\n* [优步食品发现](https:\u002F\u002Feng.uber.com\u002Fuber-eats-query-understanding\u002F)\n* [个性化店铺信息流](https:\u002F\u002Fblog.doordash.com\u002Fpersonalized-store-feed-with-vector-embeddings-251ad7a2c09a)\n* [DoorDash调度优化](https:\u002F\u002Fdoordash.engineering\u002F2020\u002F02\u002F28\u002Fnext-generation-optimization-for-dasher-dispatch-at-doordash\u002F)\n\n#### 机器学习设计常见用例\n* [机器学习系统设计入门](https:\u002F\u002Finterview.mlengineer.io\u002F)\n* [视频推荐](https:\u002F\u002Finterview.mlengineer.io\u002F)\n* [信息流排序](https:\u002F\u002Finterview.mlengineer.io\u002F)\n\n\n#### 欺诈检测（待补充）\n\n#### 广告技术\n* [广告点击预测趋势](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F41159.pdf)\n* [广告点击率CTR](https:\u002F\u002Fresearch.fb.com\u002Fwp-content\u002Fuploads\u002F2016\u002F11\u002Fpractical-lessons-from-predicting-clicks-on-ads-at-facebook.pdf)\n* [延迟反馈](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2019\u002Fimproving-engagement-on-digital-ads-with-delayed-feedback.html)\n* [实体嵌入](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2018\u002Fembeddingsattwitter.html)\n* [Star Space，万物嵌入](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1709.03856.pdf)\n* [推特时间线排序](https:\u002F\u002Fblog.twitter.com\u002Fengineering\u002Fen_us\u002Ftopics\u002Finsights\u002F2017\u002Fusing-deep-learning-at-scale-in-twitters-timelines.html)\n\n#### 推荐系统：\n* [Instagram 
Explore](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fpowered-by-ai-instagrams-explore-recommender-system\u002F)\n* [TikTok推荐](https:\u002F\u002Fnewsroom.tiktok.com\u002Fen-us\u002Fhow-tiktok-recommends-videos-for-you)\n* [用于YouTube推荐的深度神经网络](https:\u002F\u002Fstorage.googleapis.com\u002Fpub-tools-public-publication-data\u002Fpdf\u002F45530.pdf)\n* [用于推荐系统的Wide & Deep Learning](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1606.07792.pdf%29\u002F)\n\n## 用户评价\n- V，亚马逊L5数据科学家\n> 我觉得这些测验对检验我的机器学习理解非常有帮助。此外，分享的资源也极大地帮助我复习了面试准备中的相关概念。这门课程肯定会帮助工程师们顺利通过机器学习工程和数据科学相关的面试。\n\n\n- K，Facebook MLE\n> 我非常喜欢你所做的工作，它会对很多工程师有很大帮助。\n\n- D，NVIDIA数据科学家\n> 我一直在使用你的GitHub仓库来准备面试，并成功拿到了NVIDIA数据科学团队的offer。再次感谢你的帮助！\n\n- A，Booking\n> 哇，这些总结太实用了，真是太棒了。\n\n- H，微软\n> 太不可思议了！\n\n- V，英特尔\n> 这个仓库内容非常连贯！再次感谢。\n\n\n## 简介\n\n* 本仓库的内容基于大型公司的真实面试题目编写而成，学习资料则参考了多位权威专家，如吴恩达、约书亚·本吉奥等。\n\n* 我拥有6年的机器学习从业经验，并曾参加过多家大公司的面试。这份**最小可行**的学习计划涵盖了来自Facebook、Amazon、Apple、Google、微软、Snapchat、LinkedIn等公司的所有实际面试问题。\n\n* 如果您有兴趣了解更多关于付费机器学习系统设计课程的信息，请[点击此处](course.md)。该课程将提供6到7个具有成熟解决方案的实际应用场景。完成课程后，您将能够以系统化的方法解决新问题。\n\n\n# 致谢与贡献\n1. 感谢[Vivian](https:\u002F\u002Fgithub.com\u002Fliuvivian11)、[aragorn87](https:\u002F\u002Fgithub.com\u002Faragorn87)以及其他人的早期反馈和贡献。您可以在本仓库中创建Issue或Pull Request。您也可以在[ProductHunt](https:\u002F\u002Fwww.producthunt.com\u002Fposts\u002Fmachine-learning-interview-guideline)上为该项目点赞。\n\n2. 如果您觉得本项目有所帮助，可以考虑赞助该项目。当然，不赞助也没有关系。\n\n3. 
感谢社区的支持，我们已向[HopeForPaws](https:\u002F\u002Fwww.hopeforpaws.org\u002F)捐赠了约200美元。如果您也希望支持，可以通过他们的官网进行捐助。","# machine-learning-interview 快速上手指南\n\n`machine-learning-interview` 并非一个需要安装运行的软件库，而是一份**机器学习面试备考路线图与资源合集**。它提供了从系统设计、算法基础到编程实战的全面学习清单。本指南将帮助你快速利用该仓库开始备考。\n\n## 环境准备\n\n由于本项目主要是文档和资源链接集合，无需特定的系统环境或复杂的依赖安装。你只需要：\n\n*   **操作系统**：Windows, macOS 或 Linux 均可。\n*   **核心工具**：\n    *   现代浏览器（Chrome, Edge, Firefox 等）用于访问外链资源。\n    *   **Python 3.x**：用于实践代码示例（如手写 Logistic Regression, K-Means 等）。\n    *   **Jupyter Notebook** 或 **VS Code**：推荐用于运行和调试机器学习算法练习代码。\n*   **前置知识**：具备基础的统计学、线性代数知识以及至少一门编程语言（Python\u002FJava\u002FC++）的基础。\n\n## 安装步骤\n\n本项目无需通过 `pip` 或 `npm` 安装。你可以直接克隆仓库以获取本地的学习笔记模板和示例代码。\n\n1.  **克隆仓库**\n    打开终端，执行以下命令：\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fkhangich\u002Fmachine-learning-interview.git\n    ```\n\n2.  **进入目录**\n    ```bash\n    cd machine-learning-interview\n    ```\n\n3.  **安装 Python 依赖（可选）**\n    如果你打算运行 `sample\u002F` 目录下的算法实现示例（如 `logistic_regression.ipynb`），建议创建一个虚拟环境并安装基础数据科学库：\n    ```bash\n    python -m venv venv\n    source venv\u002Fbin\u002Factivate  # Windows 用户请使用: venv\\Scripts\\activate\n    pip install numpy pandas scikit-learn jupyter matplotlib\n    ```\n    > **提示**：国内用户可使用清华源加速安装：\n    > ```bash\n    > pip install numpy pandas scikit-learn jupyter matplotlib -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n    > ```\n\n## 基本使用\n\n本项目的核心用法是**按照提供的学习路径（Study Guide）进行针对性复习**。\n\n### 1. 制定学习计划\n打开根目录下的 `README.md` 文件，重点关注 **Study guide** 章节。根据你的目标公司类型选择侧重点：\n*   **通用路径**：涵盖 LeetCode、SQL、统计学概率、机器学习基础。\n*   **系统设计路径**：参考 \"Machine Learning Design\" 表格，学习 YouTube 推荐、LinkedIn 信息流排序等经典案例。\n*   **大数据路径**：如果面试涉及大数据岗位，重点阅读 Spark 和 Cassandra 相关章节（注：Google\u002FFacebook 常规面试通常不强制要求）。\n\n### 2. 
实战代码练习\n在 `sample\u002F` 目录中，你可以找到经典的算法手写实现示例。这是面试中考察基本功的重点。\n\n**示例：运行逻辑回归手写实现**\n```bash\n# 启动 Jupyter Notebook\njupyter notebook sample\u002Flogistic_regression.ipynb\n```\n*   **任务**：尝试在不看答案的情况下，于 20 分钟内用 NumPy 实现向量化版本的逻辑回归。\n*   **进阶**：尝试实现 MapReduce 版本或 K-Means 聚类算法（参考 `sample\u002Fkmeans.ipynb`）。\n\n### 3. 利用外部资源深化学习\n项目中包含了大量高质量的外部链接，建议按以下顺序访问：\n*   **刷题统计**：使用提供的 [Google Sheets 模板](https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1RCb1dVQCLmtOGlJ5J-NJ5pIC7Tda-N2U\u002Fedit#gid=274831950) 追踪 LeetCode 刷题进度（重点重复练习 Medium 难度题目 3 次以上）。\n*   **系统设计**：访问 `mlengineer.io` 阅读关于 ML 系统设计的深度文章（如广告点击预测、配送时间估算等）。\n*   **数学基础**：下载并研读项目推荐的统计学速查表（Cheatsheet）和贝叶斯练习题。\n\n### 4. 模拟面试自检\n在面试前一周，对照项目中的 **\"One week before onsite interview\"** 清单进行最后核查，并阅读 \"Success stories\" 了解真实的面经故事。","一位拥有传统软件开发背景的工程师正在备战 FAANG 大厂的机器学习岗位面试，面对复杂的系统设计题感到无从下手。\n\n### 没有 machine-learning-interview 时\n- **知识体系碎片化**：候选人只能在网上零散搜索\"推荐系统”或“广告点击预测”的面经，缺乏像 YouTube 推荐或 LinkedIn 信息流排序这样成体系的真实案例解析。\n- **设计思路模糊**：在回答机器学习系统设计（MLSD）问题时，不知道如何构建从数据收集、特征工程到模型服务的完整闭环，容易遗漏关键组件。\n- **备考重点偏差**：由于不了解大厂实际考察范围，花费大量时间钻研偏门的算法推导，却忽视了工业界最看重的场景落地能力和架构思维。\n- **缺乏实战参照**：没有来自 Snapchat、LinkedIn 等一线大厂的真实成功路径参考，难以评估自己的准备程度是否达到了录用标准。\n\n### 使用 machine-learning-interview 后\n- **掌握核心案例**：直接研读库中提供的 YouTube 推荐、Airbnb 搜索排序等 6 大经典系统设计案例，快速建立对主流业务场景的深刻理解。\n- **构建标准化框架**：通过学习 MLSD 的核心组件拆解，能够条理清晰地设计出包含数据流水线、模型训练及在线服务的高可用架构。\n- **聚焦最小可行计划**：依据官方提供的“最小可行性学习计划”，精准锁定高频考点，将有限的备考时间集中在最能提升通过率的关键领域。\n- **复刻成功路径**：参考作者从半导体转行并斩获 Google、Snap 等多家 Offer 的真实经验与博客故事，获得针对性的策略指导和信心加持。\n\nmachine-learning-interview 通过将大厂真实的面试真题与系统化的设计方法论相结合，帮助候选人从盲目刷题转向具备工业级思维的精准备战。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkhangich_machine-learning-interview_f4a7cfa2.png","khangich","Khang Pham","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fkhangich_c6a85948.jpg","Machine Learning Engineer | blog.llmlab.io","https:\u002F\u002Fmath.llmlab.io","Bay 
Area","khangphama@gmail.com","KhangAnPham","https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fphamkhang\u002F","https:\u002F\u002Fgithub.com\u002Fkhangich",null,12460,2008,"2026-04-13T16:35:16",1,"","未说明",{"notes":90,"python":88,"dependencies":91},"该项目并非可执行的软件工具或代码库，而是一份机器学习面试的学习指南和资源汇总（包含书单、博客文章链接、练习题和概念解释）。因此，它没有特定的操作系统、GPU、内存、Python 版本或依赖库要求。用户只需通过浏览器访问提供的链接或阅读相关文档即可使用。",[],[14],[94,95,96,97,98,99,100,101],"interview-preparation","deep-learning","system-design","mvp","interivew","machine-learning","leetcode","interview-questions","2026-03-27T02:49:30.150509","2026-04-14T12:26:58.109905",[],[]]