[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-shafaypro--CrackingMachineLearningInterview":3,"tool-shafaypro--CrackingMachineLearningInterview":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",152630,2,"2026-04-12T23:33:54",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":78,"owner_twitter":78,"owner_website":78,"owner_url":79,"languages":80,"stars":85,"forks":86,"last_commit_at":87,"license":78,"difficulty_score":88,"env_os":89,"env_gpu":90,"env_ram":90,"env_deps":91,"category_tags":94,"github_topics":78,"view_count":32,"oss_zip_url":78,"oss_zip_packed_at":78,"status":17,"created_at":95,"updated_at":96,"faqs":97,"releases":98},7004,"shafaypro\u002FCrackingMachineLearningInterview","CrackingMachineLearningInterview","A repository to prepare you for your machine learning interview, involving most of the questions asked by all the tech giants and local companies. 
Do this to Ace your Machine Learning Engineer Interviews","CrackingMachineLearningInterview 是一个专为人工智能与机器学习领域求职者打造的开源面试备战知识库。它系统性地整理了来自全球科技巨头及本土企业的核心面试题，旨在帮助候选人从容应对从基础理论到前沿落地的各类考核，从而在激烈的竞争中脱颖而出。\n\n该资源库有效解决了求职者面对庞杂知识体系时无从下手、缺乏针对性练习以及难以掌握最新行业趋势（如大模型应用）的痛点。它不仅涵盖了经典的机器学习算法、统计学和深度学习内容，更紧跟 2026 年技术风向，深度解析了生成式 AI（GenAI）、大语言模型（LLM）、RAG 检索增强生成、智能体（Agents）架构以及 MLOps 生产化部署等现代面试热点。\n\n无论是机器学习工程师、数据科学家、AI 研发人员，还是从事数据工程与 DevOps 的专业开发者，都能从中获益。CrackingMachineLearningInterview 提供了清晰的\"2026 面试路线图”和分阶段学习建议，用户可根据自身目标选择经典题库或专项追踪（如云原生 ML 平台、系统设计）。其独特的亮点在于将传统算法根基与现代 AI 工程实践完美结合，并辅","CrackingMachineLearningInterview 是一个专为人工智能与机器学习领域求职者打造的开源面试备战知识库。它系统性地整理了来自全球科技巨头及本土企业的核心面试题，旨在帮助候选人从容应对从基础理论到前沿落地的各类考核，从而在激烈的竞争中脱颖而出。\n\n该资源库有效解决了求职者面对庞杂知识体系时无从下手、缺乏针对性练习以及难以掌握最新行业趋势（如大模型应用）的痛点。它不仅涵盖了经典的机器学习算法、统计学和深度学习内容，更紧跟 2026 年技术风向，深度解析了生成式 AI（GenAI）、大语言模型（LLM）、RAG 检索增强生成、智能体（Agents）架构以及 MLOps 生产化部署等现代面试热点。\n\n无论是机器学习工程师、数据科学家、AI 研发人员，还是从事数据工程与 DevOps 的专业开发者，都能从中获益。CrackingMachineLearningInterview 提供了清晰的\"2026 面试路线图”和分阶段学习建议，用户可根据自身目标选择经典题库或专项追踪（如云原生 ML 平台、系统设计）。其独特的亮点在于将传统算法根基与现代 AI 工程实践完美结合，并辅以推荐的学习顺序和项目案例，帮助用户构建从理论基础到生产级系统设计的完整能力闭环，是通往理想职位的实用指南。","## CrackingMachineLearningInterview\n\nA practical interview preparation repository for Machine Learning Engineer, AI Engineer, Data Scientist, Deep Learning Engineer, Data Engineer, and DevOps or platform-focused roles.\n\nPlease check out [CrackingMachineLearningInterview](https:\u002F\u002Fshafaypro.github.io\u002FCrackingMachineLearningInterview\u002F) GitPage(for Ui\u002FUX experience).\n\n### Who this repository is for\n* Machine Learning Engineer\n* Data Scientist\n* Deep Learning Engineer\n* AI Engineer\n* Software Engineer working on AI\u002FML products\n* Data Engineer\n* MLOps Engineer\n* DevOps \u002F Platform Engineer\n\n## How to use this repository\n* Start with the **2026 Interview Roadmap** if you are preparing for current AI\u002FML interviews.\n* Use **2026 Additional Questions and Answers** for modern interview rounds.\n* Use the **AI 
\u002F GenAI**, **Data Engineering**, and **DevOps** sections for specialized interview tracks.\n* Use the **Classic Question Bank** for core ML, statistics, deep learning, and algorithms.\n* Use **Preparation Resources and References** to build a targeted study plan.\n* Use **Suggested Learning Order** if you want a clean path from fundamentals to production AI systems.\n\n## Quick Navigation\n* [2026 Interview Roadmap](.\u002Fdocs\u002F2026-interview-roadmap.md)\n* [2026 Additional Questions and Answers](.\u002Fdocs\u002F2026-additional-questions.md)\n* [2026 Common Interview Questions (New)](.\u002Fdocs\u002Finterview_questions_2026.md)\n* [AI \u002F GenAI Track](#ai--genai-track)\n* [Classic ML Track](#classic-ml-track)\n* [Deep Learning Track](#deep-learning-track)\n* [MLOps Track](#mlops-track)\n* [Data Engineering Track](#data-engineering-track)\n* [DevOps Track](#devops-track)\n* [Coding Challenges Track](#coding-challenges-track)\n* [Cloud ML Platforms](#cloud-ml-platforms)\n* [System Design Track](#system-design-track)\n* [Frameworks Track](#frameworks-track)\n* [Suggested Learning Order](#suggested-learning-order)\n* [Highlighted Projects](#highlighted-projects)\n* [Preparation Resources and References](.\u002Fdocs\u002Fresources-and-references.md)\n* [Study Pattern](.\u002Fdocs\u002Fstudy-pattern.md)\n* [Classic Question Bank](#classic-question-bank)\n* [Contributions](#contributions)\n\n## About\n* Github Profile: [Shafaypro](https:\u002F\u002Fgithub.com\u002Fshafaypro) &copy;\n* Repository: [CrackingMachineLearningInterview](https:\u002F\u002Fgithub.com\u002Fshafaypro\u002FCrackingMachineLearningInterview)\n\n#### Image References\n* Image references are included for educational purposes. 
Please see the repository references for attribution where applicable.\n\n#### Sharing\nFeel free to share the repository link in your blog, study notes, or interview preparation material.\n\n## Repository Structure\n* [`docs\u002F2026-interview-roadmap.md`](.\u002Fdocs\u002F2026-interview-roadmap.md): current interview focus areas for ML Engineer and AI Engineer roles.\n* [`docs\u002F2026-additional-questions.md`](.\u002Fdocs\u002F2026-additional-questions.md): modern 2026 question bank covering LLMs, RAG, evaluation, agents, and production AI.\n* [`docs\u002Finterview_questions_2026.md`](.\u002Fdocs\u002Finterview_questions_2026.md): deep-dive interview Q&A covering agents, RAG, LLM scaling, production AI, and system design. **(New)**\n* [`docs\u002Fresources-and-references.md`](.\u002Fdocs\u002Fresources-and-references.md): books, references, and additional interview topics.\n* [`docs\u002Fstudy-pattern.md`](.\u002Fdocs\u002Fstudy-pattern.md): recommended preparation topics, difficulty levels, and study structure.\n* [`ai_genai\u002F`](.\u002Fai_genai): GenAI and LLM engineering topics including n8n, CrewAI, LangGraph, LangSmith, multi-agent systems, and advanced RAG. **(Expanded)**\n* [`classical_ml\u002F`](.\u002Fclassical_ml): classical ML algorithms — time series, clustering, dimensionality reduction, recommender systems, feature engineering.\n* [`mlops\u002F`](.\u002Fmlops): MLOps topics — MLflow, model serving, feature stores, explainability, data quality, LLM evaluation. **(Expanded)**\n* [`cloud_ml\u002F`](.\u002Fcloud_ml): cloud ML platforms — AWS SageMaker, Google Vertex AI, Azure ML.\n* [`data_engineering\u002F`](.\u002Fdata_engineering): data engineering interview topics, platform concepts, and geospatial AI. **(Expanded)**\n* [`devops\u002F`](.\u002Fdevops): DevOps, infrastructure, deployment, and AI testing topics. 
**(Expanded)**\n* [`frameworks\u002F`](.\u002Fframeworks): ML and AI frameworks including FastAPI, Pydantic, PyTorch, HuggingFace, and LLM serving. **(Expanded)**\n* [`system_design\u002F`](.\u002Fsystem_design): ML system design patterns, RAG pipelines, agent architectures, batch vs real-time systems. **(Expanded)**\n* [`deep_learning\u002F`](.\u002Fdeep_learning): deep learning fundamentals, transformers, and applied training pipelines. **(Expanded)**\n* [`coding_challenges\u002F`](.\u002Fcoding_challenges): Python and SQL interview practice guides for coding screens and data problem solving. **(New)**\n* `README.md`: repository landing page plus the original classic ML interview question bank.\n\n## Suggested Learning Order\nUse this order if you want to move from theory to production-grade AI engineering:\n\n1. [Classic ML Track](#classic-ml-track)\n2. [Deep Learning Track](#deep-learning-track)\n3. [AI \u002F GenAI Track](#ai--genai-track)\n4. [Data Engineering Track](#data-engineering-track)\n5. [MLOps Track](#mlops-track)\n6. [Frameworks Track](#frameworks-track)\n7. [System Design Track](#system-design-track)\n8. [Coding Challenges Track](#coding-challenges-track)\n9. 
[Cloud ML Platforms](#cloud-ml-platforms)\n\n## Highlighted Projects\nUse these to turn the repo into a portfolio, not just a reading list:\n\n* [Training and Inference Pipeline](.\u002Fdeep_learning\u002Fintro_applied_deep_learning.md)\n* [Prompt Experimentation Repo](.\u002Fai_genai\u002Fintro_llm_fundamentals.md)\n* [Multi-Agent Research Assistant](.\u002Fai_genai\u002Fintro_agent_tool_use.md)\n* [Multi-Model Router](.\u002Fai_genai\u002Fintro_multi_model_orchestration.md)\n* [Document Understanding System](.\u002Fai_genai\u002Fintro_multimodal_ai.md)\n* [AI Workflow Automation with n8n](.\u002Fai_genai\u002Fintro_n8n.md)\n* [Eval Pipeline](.\u002Fmlops\u002Fintro_evaluation_guardrails.md)\n* [CI\u002FCD for AI App](.\u002Fmlops\u002Fintro_llmops_mlops_engineering.md)\n* [Scalable AI API](.\u002Fsystem_design\u002Fintro_backend_ai_system_design.md)\n* [Feature Store System](.\u002Fdata_engineering\u002Fintro_data_engineering_for_ai.md)\n\n## AI \u002F GenAI Track\nUse this track for AI Engineer, GenAI Engineer, LLM Engineer, Applied AI, and agent-platform interviews.\n\nCore topics:\n* [LLM & Generative AI Fundamentals](.\u002Fai_genai\u002Fintro_llm_fundamentals.md) **(New)**\n* [RAG](.\u002Fai_genai\u002Fintro_rag.md)\n* [RAG Engineering](.\u002Fai_genai\u002Fintro_rag_engineering.md) **(New)**\n* [Vector Databases](.\u002Fai_genai\u002Fintro_vector_databases.md)\n* [Vector Databases — Advanced (Pinecone, Weaviate, FAISS, pgvector, Hybrid Search, Reranking)](.\u002Fai_genai\u002Fintro_vector_databases_advanced.md) **(New)**\n* [LLMOps](.\u002Fai_genai\u002Fintro_llmops.md)\n* [Agentic AI](.\u002Fai_genai\u002Fintro_agentic_ai.md)\n* [Agent Systems & Tool Use](.\u002Fai_genai\u002Fintro_agent_tool_use.md) **(New)**\n* [Multi-Agent Systems (Patterns, Memory, Tool Calling, Failure Handling)](.\u002Fai_genai\u002Fintro_multi_agent_systems.md) **(New)**\n* [Multi-Model & AI Orchestration](.\u002Fai_genai\u002Fintro_multi_model_orchestration.md) **(New)**\n* 
[Multimodal AI](.\u002Fai_genai\u002Fintro_multimodal_ai.md) **(New)**\n* [CrewAI](.\u002Fai_genai\u002Fintro_crewai.md) **(New)**\n* [n8n - AI Workflow Automation](.\u002Fai_genai\u002Fintro_n8n.md) **(New)**\n* [n8n - Advanced AI Workflows](.\u002Fai_genai\u002Fintro_n8n_advanced.md) **(New)**\n* [LangGraph](.\u002Fai_genai\u002Fintro_langgraph.md) **(New)**\n* [LangSmith — Observability & Evaluation](.\u002Fai_genai\u002Fintro_langsmith.md) **(New)**\n* [Prompt Engineering (CoT, ReAct, Few-Shot, Self-Consistency, ToT, Output Control)](.\u002Fai_genai\u002Fintro_prompt_engineering.md) **(New)**\n* [Structured Outputs & Function Calling (JSON Mode, Tool Use, Pydantic, Instructor)](.\u002Fai_genai\u002Fintro_structured_outputs.md) **(New)**\n* [LLM Security (Prompt Injection, Jailbreaks, Red-Teaming, Defenses)](.\u002Fai_genai\u002Fintro_llm_security.md) **(New)**\n* [MCP](.\u002Fai_genai\u002Fintro_mcp.md)\n* [LangChain](.\u002Fai_genai\u002Fintro_langchain.md)\n* [Anthropic Overview](.\u002Fai_genai\u002Fintro_anthropic.md)\n\n## Data Engineering Track\nUse this track for pipeline, ETL, orchestration, warehouse, lakehouse, streaming, and geospatial interviews.\n\nCore topics:\n* [Data Engineering for AI](.\u002Fdata_engineering\u002Fintro_data_engineering_for_ai.md) **(New)**\n* [Data Modeling](.\u002Fdata_engineering\u002Fdata-modeling.md) **(New)**\n* [Data Architecture](.\u002Fdata_engineering\u002Fdata-architecture.md) **(New)**\n* [Apache Spark](.\u002Fdata_engineering\u002Fintro_apache_spark.md)\n* [Apache Kafka](.\u002Fdata_engineering\u002Fintro_apache_kafka.md)\n* [Apache Airflow](.\u002Fdata_engineering\u002Fintro_apache_airflow.md)\n* [dbt Introduction](.\u002Fdata_engineering\u002Fintro_dbt.md)\n* [dbt Interview Guide](.\u002Fdata_engineering\u002Finterview_dbt.md)\n* [Apache Iceberg](.\u002Fdata_engineering\u002Fintro_apache_iceberg.md)\n* [Delta Lake](.\u002Fdata_engineering\u002Fintro_delta_lake.md)\n* 
[DuckDB](.\u002Fdata_engineering\u002Fintro_duckdb.md)\n* [OpenClaw](.\u002Fdata_engineering\u002Fintro_openclaw.md)\n* [Geospatial AI Systems (Google Solar API, ArcGIS, PostGIS, H3)](.\u002Fdata_engineering\u002Fintro_geospatial.md) **(New)**\n\n## Deep Learning Track\nUse this track for ML engineer, deep learning engineer, and applied AI interviews requiring architecture and training depth.\n\nCore topics:\n* [Deep Learning Overview](.\u002Fdeep_learning\u002FREADME.md)\n* [Applied Deep Learning Roadmap](.\u002Fdeep_learning\u002Fintro_applied_deep_learning.md) **(New)**\n* [Transformers](.\u002Fdeep_learning\u002Fintro_transformers.md)\n\n## DevOps Track\nUse this track for infrastructure, CI\u002FCD, containers, orchestration, IaC, and AI system testing interviews.\n\nCore topics:\n* [Docker](.\u002Fdevops\u002Fintro_docker.md)\n* [Kubernetes](.\u002Fdevops\u002Fintro_kubernetes.md)\n* [Helm](.\u002Fdevops\u002Fintro_helm.md)\n* [Terraform](.\u002Fdevops\u002Fintro_terraform.md)\n* [GitHub Actions](.\u002Fdevops\u002Fintro_github_actions.md)\n* [Testing AI Systems (Playwright, Puppeteer, LLM E2E Testing)](.\u002Fdevops\u002Fintro_testing_ai.md) **(New)**\n\n## Classic ML Track\nUse this track for classical ML algorithm interviews, data science roles, and as foundations for ML engineer roles.\n\nCore topics:\n* [Time Series & Forecasting](.\u002Fclassical_ml\u002Fintro_time_series.md)\n* [Clustering Algorithms](.\u002Fclassical_ml\u002Fintro_clustering.md)\n* [Dimensionality Reduction](.\u002Fclassical_ml\u002Fintro_dimensionality_reduction.md)\n* [Recommender Systems](.\u002Fclassical_ml\u002Fintro_recommender_systems.md)\n* [Feature Engineering & Selection](.\u002Fclassical_ml\u002Fintro_feature_engineering.md)\n\n## MLOps Track\nUse this track for MLOps Engineer, Senior ML Engineer, and production ML system interviews.\n\nCore topics:\n* [LLMOps \u002F MLOps Engineering](.\u002Fmlops\u002Fintro_llmops_mlops_engineering.md) **(New)**\n* 
[MLflow](.\u002Fmlops\u002Fintro_mlflow.md)\n* [Model Explainability (SHAP, LIME)](.\u002Fmlops\u002Fintro_model_explainability.md)\n* [Feature Stores](.\u002Fmlops\u002Fintro_feature_stores.md)\n* [Model Serving](.\u002Fmlops\u002Fintro_model_serving.md)\n* [Data Quality & Validation](.\u002Fmlops\u002Fintro_data_quality.md)\n* [LLM Evaluation (Evals, Benchmarks, Hallucination Detection, HITL)](.\u002Fmlops\u002Fintro_llm_evaluation.md) **(New)**\n* [Evaluation & Guardrails](.\u002Fmlops\u002Fintro_evaluation_guardrails.md) **(New)**\n\n## Cloud ML Platforms\nUse this track for cloud-specific ML engineer and MLOps roles at companies using AWS, GCP, or Azure.\n\nCore topics:\n* [Cloud ML Platforms Overview](.\u002Fcloud_ml\u002FREADME.md) **(New)**\n* [Cloud ML Platforms Comparison (SageMaker vs Vertex AI vs Azure ML)](.\u002Fcloud_ml\u002Fintro_cloud_ml_platforms.md)\n* [AWS SageMaker Interview Guide](.\u002Fcloud_ml\u002Fintro_sagemaker.md) **(New)**\n* [Google Vertex AI Interview Guide](.\u002Fcloud_ml\u002Fintro_vertex_ai.md) **(New)**\n* [Azure Machine Learning Interview Guide](.\u002Fcloud_ml\u002Fintro_azure_ml.md) **(New)**\n\n## System Design Track\nUse this track for senior ML engineer, staff engineer, and principal engineer interviews requiring system design depth.\n\nCore topics:\n* [ML System Design Framework & Patterns](.\u002Fsystem_design\u002FREADME.md)\n* [Backend & System Design for AI](.\u002Fsystem_design\u002Fintro_backend_ai_system_design.md) **(New)**\n* [Recommendation System Design](.\u002Fsystem_design\u002Frecommendation_system.md)\n* [Fraud Detection System Design](.\u002Fsystem_design\u002Ffraud_detection.md)\n* [ML System Design Patterns — RAG, Agents, Batch vs Real-Time (2026)](.\u002Fsystem_design\u002Fml_system_design_patterns.md) **(New)**\n\n## Coding Challenges Track\nUse this track for interview rounds that require live coding, take-home problem solving, or SQL assessments.\n\nCore topics:\n* [Coding Challenges 
Overview](.\u002Fcoding_challenges\u002FREADME.md) **(New)**\n* [Python Coding Challenges](.\u002Fcoding_challenges\u002Fpython_coding_challenges.md) **(New)**\n* [SQL Coding Challenges](.\u002Fcoding_challenges\u002Fsql_coding_challenges.md) **(New)**\n\n## Frameworks Track\nUse this track for roles requiring hands-on Python API development and AI framework expertise.\n\nCore topics:\n* [PyTorch](.\u002Fframeworks\u002Fintro_pytorch.md)\n* [HuggingFace](.\u002Fframeworks\u002Fintro_huggingface.md)\n* [LangChain](.\u002Fframeworks\u002Fintro_langchain.md)\n* [Ollama](.\u002Fframeworks\u002Fintro_ollama.md)\n* [vLLM](.\u002Fframeworks\u002Fintro_vllm.md)\n* [Unsloth](.\u002Fframeworks\u002Fintro_unsloth.md)\n* [FastAPI — Production AI Backend Engineering](.\u002Fframeworks\u002Fintro_fastapi.md) **(New)**\n* [Pydantic — Data Validation for AI Systems](.\u002Fframeworks\u002Fintro_pydantic.md) **(New)**\n\n# Classic Question Bank\n\n#### Difference between Supervised and Unsupervised Learning?\n        Supervised learning is when you know the outcome and you are provided with fully labeled outcome data, while in unsupervised learning you are not \n        provided with labeled outcome data. Fully labeled means that each example in the training dataset is tagged with the answer the algorithm should \n        come up with on its own. So, a labeled dataset of flower images would tell the model which photos were of roses, daisies and daffodils. When shown \n        a new image, the model compares it to the training examples to predict the correct label.\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_285a8d44643e.jpg)\n#### What is Reinforcement Learning and how would you define it?\n        Reinforcement learning differs from supervised learning in not needing labelled input\u002Foutput pairs to be presented, and in not needing sub-optimal actions to be\n        explicitly corrected. 
Instead the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current\n        knowledge). In reinforcement learning, each learning step produces a reward or penalty signal (positive or negative points for the model),\n        and the model is updated based on that feedback.\n\n![](https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F1\u002F1b\u002FReinforcement_learning_diagram.svg\u002F250px-Reinforcement_learning_diagram.svg.png)\n#### What is Deep Learning?\n        Deep learning is defined as algorithms inspired by the structure and function of the brain, called artificial neural networks (ANN). Deep learning \n        focuses on non-linear analysis and is recommended for non-linear problems in Artificial Intelligence.\n\n#### Difference between Machine Learning and Deep Learning?\n        DL is a subset of ML, and both are subsets of AI. While basic machine learning models do become progressively better at whatever their \n        function is, they still need some guidance. If an AI algorithm returns an inaccurate prediction, then an engineer has to step in and make \n        adjustments. With a deep learning model, an algorithm can determine on its own if a prediction is accurate or not through its own neural network.\n![](https:\u002F\u002Flawtomated.com\u002Fwp-content\u002Fuploads\u002F2019\u002F04\u002FMLvsDL.png)\n#### Difference between Semi-Supervised and Reinforcement Learning?\n        Semi-supervised learning uses a small amount of labeled data combined with a large amount of unlabeled data during training. It sits between\n        supervised (fully labeled) and unsupervised (no labels) learning. Common techniques include self-training, label propagation, and generative\n        models. 
Example: training an image classifier with 100 labeled images and 10,000 unlabeled ones.\n\n        Reinforcement learning (RL) is a paradigm where an agent learns by interacting with an environment, receiving rewards or penalties based on its\n        actions. Unlike semi-supervised learning which works on a fixed dataset, RL involves sequential decision making. The agent learns a policy that\n        maximizes cumulative reward over time. Example: training a robot to walk or an agent to play chess.\n\n#### Difference between Bias and Variance?\n        Bias is the error introduced by the over-simplifying assumptions the model makes.\n        Variance is the model's sensitivity to noise in the training data; a high-variance model learns the noise as well as the signal.\n        There is always a tradeoff between the two, hence it's recommended to find a balance between them and always use cross validation to \n        determine the best fit.\n\n#### What is Linear Regression? How does it work?\n        Fitting a line to the dataset, when drawn on a plane, in a way that captures the correlation between your dependent\n        variable and your independent variables, using a simple line\u002Fslope formula. Famously represented as f(X) = M(X) + b.\n        Where b represents the bias (intercept)\n        X represents the input variable (the independent one)\n        f(X) represents Y, which is dependent (the outcome).\n\n        Given a data set of n statistical units, a linear regression model assumes that the relationship between the \n        dependent variable y and the p-vector of regressors x is linear. This relationship is modeled through a disturbance term or error variable ε — an \n        unobserved random variable that adds \"noise\" to the linear relationship between the dependent variable and regressors. Thus the model takes the \n        form Y = B0 + B1X1 + B2X2 + ..... 
+ BNXN\n        This also implies: Y(i) = X(i)^T B + ε(i)\n        Where T denotes the transpose\n        X(i) denotes the input vector of the i'th record\n        B denotes the coefficient vector (B0 ... BN)\n        ε(i) denotes the error term for the i'th record.\n\n#### Use Cases of Regressions:\n        Poisson regression for count data.\n        Logistic regression and probit regression for binary data.\n        Multinomial logistic regression and multinomial probit regression for categorical data.\n        Ordered logit and ordered probit regression for ordinal data.\n\n#### What is Logistic Regression? How does it work?\n        Logistic regression is a statistical technique used to predict the probability of a binary response based on one or more independent variables. \n        It means that, given certain factors, logistic regression is used to predict an outcome which has two values such as 0 or 1, pass or fail,\n        yes or no, etc.\n        Logistic Regression is used when the dependent variable (target) is categorical.\n        For example,\n            To predict whether an email is spam (1) or not (0)\n            Whether the tumor is malignant (1) or not (0)\n            Whether the transaction is fraud or not (1 or 0)\n        The prediction is based on the probabilities of the specified classes. \n        It works the same way as linear regression but passes the linear output through the sigmoid function to squash the values between 0 and 1 and obtain the probabilities.\n\n#### What is the Logit Function? Or the Sigmoid function? Where in ML and DL can you use them?\n        The sigmoid is useful if you want to transform a real valued variable into something that represents a probability: it maps real numbers from -Inf to +Inf into the range 0 to 1. The Logit function\n        is its inverse, mapping probability values between 0 and 1 back to real numbers from -Inf to +Inf. 
This is commonly used\n        in classification, with its base in Logistic Regression, along with sigmoid-based functions in deep learning used to produce a nominal outcome in a\n        layer or the output of a layer.\n\n#### What is the Gradient Descent Formula for the Linear Regression Equation?\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_f045a2a4e115.jpg)\n\n#### What is a Support Vector Machine? How is it different from OVR classifiers?\n        Support Vector Machine is defined as a technique for both classification and regression. It finds the hyperplane that best\n        separates the classes with a linear boundary, although it can also work for non-linear problems using kernel tricks.\n        SVM is based on margin lines (maximizing the margin between two classes).\n        One-vs-Rest (OVR) is the base classifier concept used in ML algorithms that reduce classification to a Class A vs all other classes approach.\n        There are two heuristic approaches (One-vs-Rest and One-vs-One) which are enhancements for multiclass classification to make a binary classifier perform\n        well on multi-class problems.\n![](https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F7\u002F72\u002FSVM_margin.png\u002F300px-SVM_margin.png)\n\n\n        The algorithms which use OVO are:\n            1) Extreme Learning Machines (ELMs)\n            2) Support Vector Machines (classifiers)\n            3) K Nearest Neighbours (for neighbouring classes based on distances)\n            4) Naive Bayes (based on MAP: Maximum A Posteriori)\n            5) Decision Trees (decisions in subnodes after the parent node splits on one feature)\n            6) Neural Networks (different nets)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_63b641eb114e.jpg)\n#### Types of SVM kernels\n        Think of kernels as predefined filters, each for their own 
specific use cases.\n\n        1) Polynomial Kernels (used for image processing)\n        2) Gaussian Kernel (when there is no prior knowledge of the data)\n        3) Gaussian Radial Basis Function (same as 2)\n        4) Laplace RBF Kernel (recommended for large training sets, more than a million samples)\n        5) Hyperbolic Tangent Kernel (neural network based kernel)\n        6) Sigmoid Kernel (proxy for a neural network)\n        7) ANOVA Radial Basis Kernel (for regression problems)\n\n#### What are the different types of Evaluation metrics in Regression?\n        There are multiple evaluation metrics for regression analysis:\n        1) Mean Squared Error (the average squared difference between the estimated values and the actual values)\n        2) Mean Absolute Error (the average of the absolute differences)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_f6891f027bf0.png)\n#### How would you define Mean absolute error vs Mean squared error?\n        MAE: Use MAE when you are doing regression and don’t want outliers to play a big role. It can also be useful if you know that your distribution is multimodal, and it’s desirable to have predictions at one of the modes, rather than at the mean of them.\n        MSE: Use MSE the other way around, when you want to punish the outliers.\n\n#### How would you evaluate your classifier?\n        A classifier can be evaluated in multiple ways, most of them built on the confusion matrix and its attributes, which are TP, TN, FP and FN, along with the accuracy metrics that can be derived from them, such as Precision and Recall scores.\n\n#### What is Classification?\n        Classification is defined as categorizing entities into specified categories, deciding whether each category applies to the given data. The concept is quite common for image-based classification or data-based classification. 
The answer comes in the form of Yes or No,\n        alongside answers in the form of object types\u002Fclasses.\n\n#### How would you differentiate between Multilabel and MultiClass classification?\n        Multiclass classification is a classification outcome which can be one of multiple classes: either A or B or C, but never more than one.\n        In MultiLabel classification, an outcome can be one or more classes, i.e. A, or A and B, or A and B and C. \n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_4ca36b7d6d5b.png)\n\n\n#### What is a Confusion Matrix?\n        A confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of\n        the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is\n        usually called a matching matrix). Each row of the matrix represents the instances in a predicted class\n        while each column represents the instances in an actual class (or vice versa).\n        The name stems from the fact that it makes it easy to see if the system is confusing two classes\n        (i.e. commonly mislabeling one as another).\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_1fb4c81a1c5e.png)\n\n#### Which Algorithms are High-Bias Algorithms?\n        Bias is the simplifying assumptions made by the model to make the target function easier to approximate.\n        High-bias algorithms are typically linear algorithms, which are concerned with linear relationships or linear distances. 
Examples are
        2) Linear Regression, Logistic Regression and Linear Discriminant Analysis.

#### Which Algorithms are High and low Variance Algorithms?
        Variance is the amount that the estimate of the target function will change given different training data.

        1) High-variance algorithms: Decision Trees, K-Nearest Neighbours and SVMs
        2) Low-variance algorithms: Linear Regression, Logistic Regression and LDA

#### Why are the above algorithms High biased or high variance?
        Linear machine learning algorithms often have a high bias but a low variance.
        Nonlinear machine learning algorithms often have a low bias but a high variance.

#### What are root causes of Prediction Bias?
        Possible root causes of prediction bias are:

        1) Incomplete feature set
        2) Noisy data set
        3) Buggy pipeline
        4) Biased training sample
        5) Overly strong regularization



#### What is Gradient Descent? Difference between SGD and GD?
        Gradient Descent is an iterative method for solving an optimization problem. There is no concept of "epoch" or "batch" in classical gradient descent. The keys of gradient descent are:
        * Update the weights in the direction opposite to the gradient.
        * The gradient is calculated precisely from all the data points.
        Stochastic Gradient Descent can be explained as:
        * A quick-and-dirty way to approximate the gradient from one single data point.

        * If we relax "one single data point" to "a subset of the data", the concepts of batch and epoch arise.

![OneVariableSGD](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_9a758600dd19.png)

#### What are Random Forests and Decision Trees?
        A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event
        outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.

        Random forests, or random decision forests, are an ensemble learning method for classification, regression and other tasks that operates by
        constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean
        prediction (regression) of the individual trees. They are used to reduce the overfitting that occurs with a single decision tree.

![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_484d9015d7b2.png)
#### What is the Process of Splitting?
        Splitting divides your data into subsets based on the provided data facts (this comes in handy for decision trees).

#### What is the process of pruning?
        Shortening the branches of a decision tree is termed pruning. It is done in order to reach the decision earlier than
        expected, reducing the size of the tree by turning some branch nodes into leaf nodes and removing the leaf nodes under the original branch.

#### How do you do Tree Selection?
        Tree selection is mainly done with the following:
        1) Entropy
                A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar
                values (homogeneous).
The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous the
                entropy is zero, and if the sample is equally divided it has an entropy of one.
                Entropy(x) = -p*log2(p) - q*log2(q)
        2) Information Gain
                The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).

                2.1) Calculate the entropy of the target.
                2.2) The dataset is then split on the different attributes. The entropy for each branch is calculated. Then it is added proportionally, to get the total entropy for the split. The resulting entropy is subtracted from the entropy before the split. The result is the Information Gain, or decrease in entropy.
                2.3) Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches and repeat the same process on every branch.

#### Pseudocode for Entropy in Decision Trees:
        '''
        from math import log

        def calculateEntropy(dataSet):
            number = len(dataSet)
            labelCounts = {}
            for featureVector in dataSet:
                currentLabel = featureVector[-1]          # class label is the last column
                if currentLabel not in labelCounts:
                    labelCounts[currentLabel] = 0
                labelCounts[currentLabel] += 1
            entropy = 0.0
            for label in labelCounts:
                probability = float(labelCounts[label]) / number
                entropy -= probability * log(probability, 2)
            return entropy
        '''
#### How do RandomForest and Decision Trees work?
        -* Decision Tree *- A simple tree comprising the process defined in tree selection.
        -* RandomForest *- A combination of N decision trees, using
the aggregation to determine the final outcome.
        The classifier outcome is based on a vote of each tree within the random forest, while in the case of regression it is based on the
        averaging of the tree outcomes.

#### What is Gini Index? Explain the concept?
        The Gini Index is calculated by subtracting the sum of the squared probabilities of each class from one. It favors larger partitions.
        Imagine you want to draw a decision tree and need to decide which feature/column you should use for your first split; this is typically decided
        by the Gini Index.

#### What is the process of gini index calculation?
        Gini Index:
        for each branch in split:
            Calculate the percentage of samples the branch represents. # Used for weighting
            for each class in branch:
                Calculate the probability of the class in the given branch.
                Square the class probability.
            Sum the squared class probabilities.
            Subtract the sum from 1. # This is the Gini Index for the branch
        Weight each branch's Gini Index by the fraction of samples it represents.
        Sum the weighted Gini Indexes for the split.

#### What is the formulation of Gini Split / Gini Index?
        Favors larger partitions.
        Uses the squared proportions of classes.
        Perfectly classified, the Gini Index would be zero.
        Evenly distributed would be 1 - (1/# Classes).
        You want a variable split that has a low Gini Index.
        The formula is 1 - (P(class1)^2 + P(class2)^2 + ... + P(classN)^2)

#### What is probability? How would you define Likelihood?
        Probability is the chance of an event (success) occurring, e.g. "the chance of this event is 70%".
        Suppose the event of getting the face of a coin is a success; the probability of success is then 0.5, because the face and back of a coin are equally likely.
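This coin example can be checked numerically. Below is a sketch of the binomial likelihood L(p | k successes in n trials); `binomial_likelihood` is an illustrative helper name, not from the original text.

```python
from math import comb

def binomial_likelihood(p, successes, trials):
    """Likelihood of parameter p given `successes` out of `trials` (binomial)."""
    return comb(trials, successes) * p**successes * (1 - p)**(trials - successes)

# L(0.5 | 7 successes in 10 tosses)
print(round(binomial_likelihood(0.5, 7, 10), 4))  # 0.1172
```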
0.5 is the probability of a success.

        Likelihood is the conditional probability. For the same example, we toss the coin 10 times, and we suppose that we get 7 successes (showing the face) and
        3 failures (showing the back). The likelihood is calculated as follows (here for a binomial distribution; it can vary depending on the distribution).


        Likelihood(Event(success)) -> L(0.5 | 7) = 10C7 * 0.5^7 * (1 - 0.5)^3 = 0.1172

        L(0.5 | 7): the likelihood of the parameter 0.5 given the number of successes
        10C7 -> the number of combinations of 7 successes out of 10 events
        In general:
            L(p | k) -> C(n, k) * p^k * (1 - p)^(n - k), where n is the total number of events and k the number of successes

#### What is Entropy? and Information Gain? Their difference?
        Entropy: the randomness of the information being processed.

        Information Gain is based on entropy, which multiplies the probability of each class by the log (base 2) of that class probability. Information Gain favors smaller partitions with many distinct values. Ultimately, you have to experiment with your data and the splitting criterion.
        IG depends on the change in entropy (a decrease in entropy represents an increase in IG).


#### What is KL divergence, how would you define its usecase in ML?
        Kullback-Leibler divergence calculates a score that measures the divergence of one probability distribution from another.
![](https://wikimedia.org/api/rest_v1/media/math/render/svg/4958785faae58310ca5ab69de1310e3aafd12b32)

#### How would you define Cross Entropy, What is the main purpose of it?
        Entropy: the randomness of the information being processed.

        Cross Entropy: a measure from the field of information theory, building upon entropy and generally calculating the difference between two
        probability distributions.
It is closely related to, but different from, KL divergence: KL divergence calculates the relative entropy between two
        probability distributions, whereas cross-entropy can be thought of as calculating the total entropy between the distributions.

        Cross-entropy can be calculated using the probabilities of the events from P and Q, as follows:
                H(P, Q) = -sum over x in X of P(x) * log(Q(x))
#### How would you define AUC - ROC Curve?
        ROC is a probability curve and AUC represents the degree or measure of separability. The AUC - ROC curve is a performance measurement for classification problems at various threshold settings.

        It tells how capable the model is of distinguishing between classes, and is mainly used to measure classification performance at different thresholds.
        The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s. By analogy, the higher the AUC, the better the model is at distinguishing between patients with disease and without disease.

![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_5293a1e927d9.png)
#### How would you define False positive or Type I error and False Negative or Type II Error?
        False positive: an outcome where the model incorrectly predicts the positive class (the true class was negative but the model predicted positive), aka Type I error.

        False Negative: an outcome where the model incorrectly predicts the negative class.
(the true class was positive but the model predicted negative), aka Type II error.

#### How would you define Precision and Recall (True Positive Rate)?
        Take a simple classification example of "classifying email messages as spam or not spam".

        Precision measures the percentage of emails flagged as spam that were correctly classified, i.e. the fraction of positive calls that are actually positive:
                Precision = True Positives / (True Positives + False Positives)

        Recall measures the percentage of actual spam emails that were correctly classified:
                Recall = True Positives / (True Positives + False Negatives)

        There is generally a tradeoff between precision and recall, just as there is between bias and variance.


#### Which one would you prefer for your classification model, Precision or Recall?
        This depends entirely on the business or SME use case. In fraud-detection domains such as banks and online e-commerce websites,
        a better recall score is usually preferred over precision. In other cases, such as word suggestions or multilabel categorization, precision may matter more.
        In general, it is totally dependent on your use case.

#### What is F1 Score? What intuition does it give?
        The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 (perfect precision and recall).
        It is also known as the Dice Similarity Coefficient.

        David (Scientist Statistician): The F1 score is widely used since it gives equal importance to precision and recall. In practice, different types of misclassifications incur different costs.
In other words, the relative importance of precision and recall is an aspect of the problem.

#### What is the difference between Perceptron and SVM?
        The major practical difference between a (kernel) perceptron and an SVM is that perceptrons can be trained online (i.e. their weights can be updated
        as new examples arrive one at a time) whereas SVMs cannot be. A perceptron is no more than hinge loss (loss function) + stochastic gradient descent (optimization).

        An SVM has almost the same goal as an L2-regularized perceptron.
        An SVM can be seen as hinge loss + L2 regularization (loss + regularization) + quadratic programming or other fancier optimization algorithms like SMO (optimization).

#### What is the difference between Logistic and Linear Regressions?
        Logistic Regression is a classifier; Linear Regression is a regression.
        Logistic Regression outputs values between 0 and 1, interpretable as probabilities.
        Linear Regression outputs unbounded real-valued numbers.
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_164d3b11c8f2.png)
#### What are outliers and How would you remove them?
        An outlier is an observation that lies an abnormal distance from other values in a random sample from a population.
![](https://www2.southeastern.edu/Academics/Faculty/dgurney/Outlier.jpg)

        Outliers can be removed by the following:
            1) Use the Inter-Quartile Range (IQR * 1.5 fences)
            2) Use Z-score removal (so that any point too far from the mean gets removed)
            3) A combination of Z-score and IQR (custom scores)

#### What is Regularization?
        Regularization techniques are used to reduce error by fitting a function appropriately on the given training set and avoiding overfitting.
        They add a penalty term (lambda * weight magnitude) to the loss function to discourage overly complex models.
#### Difference between L1 and L2 Regularization?
        1) L1 Regularization (Lasso Regression)
            (Least Absolute Shrinkage and Selection Operator) adds the "absolute value of magnitude" of each coefficient as a penalty term to the loss function.

        2) L2 Regularization (Ridge Regression)
            Adds the "squared magnitude" of each coefficient as a penalty term to the loss function.

        The key difference between these techniques is that Lasso shrinks the less important features' coefficients to zero, thus removing some features
        altogether. So this works well for feature selection when we have a huge number of features.

#### What are different Techniques of Sampling your data?
        Data sampling is a statistical analysis technique used to select, manipulate and analyze a representative subset of data points to identify patterns and trends in the larger data set being examined.
        There are different techniques for sampling your data:
        1) Simple Random Sampling (records are picked at random)
        2) Stratified Sampling (subsets based on a common factor, with proportional ratio distribution)
        3) Cluster Sampling (the larger set is broken down into clusters based on defined factors, then SRS is applied)
        4) Multistage Sampling (cluster-on-cluster sampling)
        5) Systematic Sampling (a sample created by picking records at a set interval)

#### Can you define the concept of Undersampling and Oversampling?
        Undersampling is downsizing a class's sample from a bigger range to a smaller one, e.g. from 1 million records to 0.1 million records,
        keeping the ratio of information intact.

        Oversampling is scaling a smaller class sample, e.g. from 100K up to a million, keeping the trends and properties so as to make up
        the dataset.

#### What is an Imbalanced Class?
        Imbalance is when you don't have balance between classes.
An imbalanced class is when the distributions/support counts of the classes being considered are not the same or nearly the same.
        E.g.:
            Class A has 1 million records
            Class B has 1000 records
        This is an imbalanced data set, and Class B is the under-represented class.

#### How would you resolve the issue of an Imbalanced data set?
        Techniques such as
            1) Oversampling
            2) Undersampling
            3) SMOTE (a combination of both)
            4) bringing in more data
            5) doing more trend analysis
        can resolve the issue of an imbalanced dataset.

#### How would you define Weighted Moving Averages?
        A moving average in which each value within the window is multiplied by a weight, so that values repeated during a certain time keep a high
        priority/impact.

#### What is meant by ARIMA Models?
        ARIMA stands for Auto Regressive Integrated Moving Averages: a combination of an autoregressive and a moving-average model.
        It is a technique that performs regression analysis along with moving averages, fits time series data and captures trends with
        acceptable scores.

#### How would you define Bagging and Boosting? How would XGBoost differ from RandomForest?
        Bagging: a way to decrease the variance of predictions by generating additional data for training from the dataset, using combinations with repetitions to produce multi-sets of the original data.
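The "combinations with repetitions" step of bagging is bootstrap resampling: drawing samples with replacement to build each multi-set. A minimal sketch (function name is illustrative):

```python
import random

def bootstrap_samples(data, n_sets, seed=0):
    """Draw `n_sets` resamples of len(data), sampling WITH replacement (bagging's data step)."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in range(len(data))] for _ in range(n_sets)]

sets = bootstrap_samples([1, 2, 3, 4, 5], n_sets=3)
# each resample has the original size, but values may repeat or be omitted
```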

                Example: Random Forest (uses random sampling of subsets)
        Boosting: an iterative technique which adjusts the weight of an observation based on the last classification.
                Examples: AdaBoost, XGBoost (using gradient descent as the main method)

#### What is IQR, how can it help in Outlier removal?
        IQR is the interquartile range, the range between the third quartile and the first one.
        Quartiles are the points that divide your data into four equal parts:
            Q1: 0-25%
            Q2: 25-50%
            Q3: 50-75%
            Q4: 75-100%
        IQR = Q3 - Q1
#### What is SMOTE?
        Synthetic Minority Over-sampling TEchnique, also known as SMOTE.
        A very popular oversampling method that was proposed to improve random oversampling, but
        its behavior on high-dimensional data has not been thoroughly investigated.
        The KNN algorithm benefits from SMOTE.
#### How would you resolve Overfitting or Underfitting?
        Underfitting:
            1) Increase the complexity of the model
            2) Increase training time
            3) Decrease the learning rate
        Overfitting:
            1) Cross-validation
            2) Early stopping
            3) Increased learning rates (hops)
            4) Ensembling
            5) Bring in more data
            6) Remove features
#### Mention some techniques which are used to avoid Overfitting?
        1) Cross-validation
        2) Early stopping
        3) Increased learning rates (hops)
        4) Ensembling
        5) Bring in more data
        6) Remove features
#### What is a Neuron?
        A "neuron" in an artificial neural network is a mathematical approximation of a biological neuron.
        It takes a vector of inputs, performs a transformation on them, and outputs a single scalar value.
        It can be thought of as a filter.
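The vector-in, scalar-out transformation can be sketched as a weighted sum passed through an activation (the sigmoid here is chosen purely for illustration):

```python
from math import exp

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, squashed by a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + exp(-z))  # single scalar output in (0, 1)

print(neuron([1.0, 2.0], [0.5, -0.25], bias=0.0))  # sigmoid(0.0) = 0.5
```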
Typically we use nonlinear filters in neural networks.

#### What are Hidden Layers and the Input layer?
![](https://www.i2tutorials.com/wp-content/uploads/2019/05/Hidden-layrs-1-i2tutorials.jpg)

        1) Input Layer: the initial input to your neural network.
        2) Hidden layers: a hidden layer is located between the input and output of the algorithm,
        in which the function applies weights to the inputs and directs them through an activation function as the output.
        In short, the hidden layers perform nonlinear transformations of the inputs entered into the network.
        Hidden layers vary depending on the function of the neural network, and similarly, the layers may vary depending
        on their associated weights.

#### What are Output Layers?
        The output layer in an ANN is the final layer, responsible for the final outcome. The outcome depends entirely
        on the use case and on the function used to scale the values. Linear, Sigmoid and ReLU are the
        most common choices:
        Linear for regression.
        Sigmoid/Softmax for classification.

#### What are activation functions?
        Activation functions perform a transformation on the input received, in order to keep values within a manageable range depending
        on the limitations of the activation function. An activation is essentially a mathematical scaling filter applied to a complete layer (vector)
        to scale its values.
        Some common examples of activation functions are:
            1) Sigmoid or Softmax function (has the vanishing gradient problem)
                Softmax outputs a vector that is non-negative and sums to 1. It's useful when you have mutually exclusive categories
                ("these images only contain cats or dogs, not both"). You can use softmax if you have 2, 3, 4, 5, ...
mutually exclusive labels.

            2) Tanh function (has the vanishing gradient problem)
                If the outputs are somehow constrained to lie in [-1, 1], tanh could make sense.
            3) ReLU function
                ReLU units or similar variants can be helpful when the output is bounded above or below.
                If the output is only restricted to be non-negative, it would make sense to use a ReLU
                activation as the output function. (max(0, x))

            4) Leaky ReLU function (to fix the "dying ReLU" problem of the ReLU function within hidden layers)

#### What is a Convolutional Neural Network?
        A convolutional neural network is a subclass of neural networks which have at least one convolution layer.
        They are great for capturing local information (e.g. neighboring pixels in an image or surrounding words in a text)
        as well as reducing the complexity of the model (faster training, needs fewer samples, reduces the chance of overfitting).
        A convolution unit receives its input from multiple units of the previous layer which together create a proximity;
        therefore, the input units (that form a small neighborhood) share their weights.
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_91a1b80d523f.jpeg)

#### What is a Recurrent Neural Network?
        A class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence.
        This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their
        internal state (memory) to process variable-length sequences of inputs.
This makes them applicable to tasks such
        as unsegmented, connected handwriting recognition or speech recognition.
![ImageAddress](https://www.i2tutorials.com/wp-content/uploads/2019/09/Neural-network-62-i2tutorials.png)

#### What is an LSTM network?
        Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture.
        Unlike standard feedforward neural networks, LSTM has feedback connections. It can process not only single
        data points (such as images), but also entire sequences of data (such as speech or video).
        For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, or anomaly detection in network
        traffic or an IDS.
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_1fd1ab45886b.png)

#### What is a Convolutional Layer?
        A convolution is the simple application of a filter to an input that results in an activation. Repeated application of the
        same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a
        detected feature in an input, such as an image.
        You can use filters that detect horizontal lines or vertical lines, perform gray-scale conversion, or apply other conversions.
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_5cf523ffcf2f.webp)

#### What is a Pooling Layer?
        Pooling layers provide an approach to downsampling feature maps by summarizing the presence of features in patches of the feature map.
        Two common pooling methods are average pooling and max pooling, which summarize the average presence of a feature and the most activated
        presence of a feature respectively.
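The max variant of the two pooling methods just described can be sketched in plain Python, assuming non-overlapping 2x2 patches (stride 2, no padding):

```python
def max_pool_2x2(feature_map):
    """Downsample a 2D feature map by taking the max of each non-overlapping 2x2 patch."""
    h, w = len(feature_map), len(feature_map[0])
    return [[max(feature_map[i][j], feature_map[i][j + 1],
                 feature_map[i + 1][j], feature_map[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 1, 5, 6],
      [2, 2, 7, 8]]
print(max_pool_2x2(fm))  # [[4, 2], [2, 8]]
```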


        This is required to downsize your feature scale (e.g. you have detected vertical lines; now remove some of the features to go finer-grained).
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_37b56d2a67ff.png)

#### What is a MaxPooling Layer? How does it work?
        Max pooling uses the maximum value found in a considered region. Maximum pooling, or max pooling, is a pooling operation that calculates the maximum, or largest, value in each patch of each feature map.
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_81a0fa489ea4.png)
#### What is a Kernel or Filter?
        Kernel methods are a class of algorithms for pattern analysis, whose best-known member is the support vector machine (SVM).
        Kernel functions have been introduced for sequence data, graphs, text, images, as well as vectors.
        A kernel is used to make non-linear problems solvable by linear classifiers.
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_1699c227e820.png)


#### What is Segmentation?
        The process of partitioning a digital source into multiple segments.
        For an image, imagine the image source being partitioned into multiple segments, such as an airplane object.
        The goal of segmentation is to simplify and/or change the representation of an image into something that
        is more meaningful and easier to analyze.

#### What is Pose Estimation?
        Detecting the pose of a person or object from an image is termed pose estimation.

#### What is Forward propagation?
        The input data is fed in the forward direction through the network.
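A forward pass through a toy two-layer network can be sketched as follows; the weights, layer sizes and sigmoid activation are all illustrative choices, not from the original text:

```python
from math import exp

def sigmoid(z):
    return 1 / (1 + exp(-z))

def layer(inputs, weights, biases):
    """One dense layer: per output unit, a weighted sum plus bias, then the activation."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

def forward(x):
    hidden = layer(x, weights=[[0.1, 0.4], [-0.2, 0.3]], biases=[0.0, 0.1])  # hidden layer
    return layer(hidden, weights=[[0.5, -0.5]], biases=[0.0])                # output layer

print(forward([1.0, 2.0]))  # a single output in (0, 1)
```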
Each hidden layer accepts the input data,
        processes it as per the activation function and passes it to the successive layer.
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_4899a5df88f4.png)

#### What is backward propagation?
        Back-propagation is the essence of neural net training. It is the practice of fine-tuning the weights
        of a neural net based on the error rate (i.e. loss) obtained in the previous epoch (i.e. iteration).
        Proper tuning of the weights ensures lower error rates, making the model reliable by increasing its generalization.
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_c2a4e25c4aaf.jpg)

#### What are dropout neurons?
        The term "dropout" refers to dropping out units (both hidden and visible) in a neural network.
        Simply put, dropout refers to ignoring units (i.e. neurons) during the training phase; the
        set of neurons to ignore is chosen at random. By "ignoring", I mean these units are not considered during
        a particular forward or backward pass.
        More technically: at each training stage, individual nodes are either dropped out of the net with
        probability 1-p or kept with probability p, so that a reduced network is left; incoming and
        outgoing edges to a dropped-out node are also removed.

#### What are flattening layers?
        A flatten layer collapses the spatial dimensions of the input into the channel dimension.
        For example, if the input to the layer is an H-by-W-by-C-by-N-by-S array (sequences of images),
        then the flattened output is an (H*W*C)-by-N-by-S array.

#### How does backward propagation improve the model?
        It is the practice of fine-tuning the weights of a neural net based on the error rate (i.e. loss)
        obtained in the previous epoch (i.e. iteration).
Proper tuning of the weights ensures lower error rates,
        making the model reliable by increasing its generalization.

#### What is correlation? and covariance?
        "Covariance" indicates the direction of the linear relationship between variables.
        "Correlation", on the other hand, measures both the strength and direction of the linear relationship between two variables.

        When comparing data samples from different populations, covariance is used to determine how much two random variables
        vary together, whereas correlation is used to determine when a change in one variable can result in a change in another.
        Both covariance and correlation measure linear relationships between variables.
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_d8f4fc8bedee.png)

#### What is ANOVA? When to use ANOVA?
        Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures
        (such as the "variation" among and between groups) used to analyze the differences among group means in a sample.

        Use a one-way ANOVA when you have collected data about one categorical independent variable and
        one quantitative dependent variable. The independent variable should have at least three levels
        (i.e. at least three different groups or categories).

#### How would you define dimensionality reduction? Why do we use dimensionality reduction?
        Dimensionality reduction, or dimension reduction, is the process of reducing the number of random variables
        under consideration by obtaining a set of principal variables.
Approaches can be divided into feature
        selection and feature extraction.
        The reasons we use it are:
                1) Massive datasets
                2) Long training/gathering time
                3) Overly complex assumptions / model overfitting
        Types of dimensionality reduction are:
                1) Feature Selection
                2) Feature Projection (transform data from a higher-dimensional space to a lower space of fewer dimensions)
                3) Principal Component Analysis
                        A linear technique for DR; performs a linear mapping of the data to a lower-dimensional
                        space in such a way that variance is maximized.
                4) Non-Negative Matrix Factorization
                5) Kernel PCA (a non-linear application of the kernel trick)
                6) Graph-Based Kernel PCA (locally linear embedding, eigen-embeddings)
                7) Linear Discriminant Analysis
                        A method used in statistics, pattern recognition and machine learning to find a
                        linear combination of features that characterizes or separates two or more
                        classes of objects or events.
                8) Generalized Discriminant Analysis
                9) t-SNE (a non-linear dimensionality reduction technique useful for visualization of high-dimensional datasets)
                10) UMAP
                        Uniform manifold approximation and projection (UMAP) is a nonlinear dimensionality reduction technique.
\n                        Visually, it is similar to t-SNE, but it assumes that the data is uniformly distributed on a locally \n                        connected Riemannian manifold and that the Riemannian metric \n                        is locally constant or approximately locally constant.\n                10) Autoencoders (can learn from Non Linear dimention reduction function)\n\n#### What is Principal Component Analysis? How does PCA work in dimensionality reduction?\n\n        The main linear technique for dimensionality reduction, principal component analysis, performs\n         a linear mapping of the data to a lower-dimensional space in such a way that the variance of \n         the data in the low-dimensional representation is maximized. In practice, the covariance (and \n         sometimes the correlation) matrix of the data is constructed and the eigenvectors on this \n         matrix are computed. The eigenvectors that correspond to the largest eigenvalues (the \n         principal components) can now be used to reconstruct a large fraction of the variance of the \n         original data. The original space (with dimension of the number of points) has been reduced \n         (with data loss, but hopefully retaining the most important variance) to the space spanned by \n         a few eigenvectors\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_6add72a48e8f.jpg)\n\n#### What is Maximum Likelihood estimation?\n        Maximum likelihood estimation is a method that determines values for the parameters of a model. \n        The parameter values are found such that they maximise the likelihood that the process described by the model\n        produced the data that were actually observed.\n\n#### What is Naive Bayes? 
How does it work?
        Naive Bayes is a family of probabilistic classifiers based on applying Bayes' theorem with a "naive"
        assumption: the features are conditionally independent given the class. For each class it multiplies
        the class prior by the per-feature likelihoods, and predicts the class with the highest resulting
        posterior probability.
![](https://wikimedia.org/api/rest_v1/media/math/render/svg/52bd0ca5938da89d7f9bf388dc7edcbd546c118e)

![](https://wikimedia.org/api/rest_v1/media/math/render/svg/d0d9f596ba491384422716b01dbe74472060d0d7)


#### What is Bayes Theorem?
        Bayes' theorem gives the probability of an event based on prior knowledge of conditions that might be related to the event:
                P(A|B) = P(B|A) * P(A) / P(B)

#### What is Probability?
        Probability is a number between 0 and 1, where, roughly speaking, 0 indicates impossibility and 1 indicates certainty.
        The higher the probability of an event, the more likely it is that the event will occur.
        Example:
        A simple example is the tossing of a fair (unbiased)
        coin.
Since the coin is fair, the two outcomes ("heads" and "tails") are both equally probable; the probability of "heads" equals the probability
        of "tails"; and since no other outcomes are possible, the probability of either "heads" or "tails" is 1/2 (which could also be written as 0.5 or
        50%).
[ReferenceLink](https://machinelearningmastery.com/joint-marginal-and-conditional-probability-for-machine-learning/)
#### What is Joint Probability?
        Joint probability is a statistical measure that calculates the likelihood of two events occurring together at the same point in time.
                Written P(A and B), P(A ^ B) or P(A & B).
                The joint probability is determined as:
                P(A and B) = P(A given B) * P(B)
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_14fe5f0e0122.jpg)

#### What is Marginal Probability?
        The probability of a single event irrespective of the outcome of any other variable:
        P(X=A) is obtained by summing (marginalizing) the joint probability P(X=A, Y) over all values of Y.

#### What is Conditional Probability? What is distributive Probability?
        The probability of event A given that event B has occurred is termed the conditional probability:
                P(A given B) = P(A and B) / P(B)

#### What is Z score?
        The Z score (also called the standard score) represents the number of standard deviations by which the
        value of an observation or data point differs from the mean of what is observed.

#### What is KNN and how does it work? What is the neighbouring criterion?
How can you change it?
        KNN classifies a new point by a vote among its k nearest labelled neighbours: you measure the distance from the
        input point to every training point, take the k closest ones, and predict the class that occurs most often among
        them. The neighbouring criterion is the distance metric (usually Euclidean) together with the value of k, and you
        can change either one: the more neighbours you specify, the more points take part in the vote, which smooths the
        decision boundary and reduces sensitivity to noise.

        For example, if among the 7 nearest neighbours 5 belong to class A and 2 to class B,
        the vote (and hence the prediction) is class A.


#### Which one would you prefer, low FNs or FPs, for fraudulent transactions?
        Low FNs are recommended: a false negative means a fraudulent transaction occurred but was counted as not having
        occurred, which has a huge impact on the business.

#### Differentiate between KNN and KMeans?
        KMeans: unsupervised; starts from randomly initialized centroids, then repeatedly assigns each point to its
        nearest centroid and recomputes each centroid as the mean of its cluster.
        KNN: supervised; no training phase, predicts by a majority vote among the k nearest labelled neighbours.
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_e50401a4bdaf.png)

#### What is Attention? Give an example.
        A neural attention mechanism equips a neural network with the ability to focus on a subset of its inputs (or features).
        1) Hard Attention (image cropping)
        2) Soft Attention (highlight the attended area while keeping the image size the same)
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_e6b688e3cf92.png)

#### What are AutoEncoders? And what are transformers?

        Autoencoders take input data, compress it into a code, then try to recreate the input data from that summarized code.
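The compress-then-reconstruct loop can be sketched as a tiny linear autoencoder trained by plain gradient descent (a minimal NumPy sketch; the toy data, dimensions and learning rate are illustrative, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 4-D that actually live on a 2-D subspace.
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 4))

# Encoder compresses 4 -> 2 ("the code"); decoder reconstructs 2 -> 4.
W_enc = rng.normal(scale=0.1, size=(4, 2))
W_dec = rng.normal(scale=0.1, size=(2, 4))

def reconstruction_error(X, W_enc, W_dec):
    # Mean squared error between the input and its reconstruction.
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

initial = reconstruction_error(X, W_enc, W_dec)
lr = 0.01
for _ in range(500):
    code = X @ W_enc                      # compress
    err = code @ W_dec - X                # reconstruction residual
    W_dec -= lr * code.T @ err / len(X)   # gradient step on the MSE
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

final = reconstruction_error(X, W_enc, W_dec)
```

After training, `X @ W_enc` is the compressed code; a denoising setup would feed a corrupted input and train against the clean target, usually with non-linear layers.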
It’s like starting with Moby Dick, creating a SparkNotes version and then trying to rewrite the original story using only SparkNotes for reference. While a neat deep learning trick, there are fewer real-world cases where a simple autoencoder is useful. But add a layer of complexity and the possibilities multiply: by using both noisy and clean versions of an image during training, autoencoders can remove noise from visual data like images, video or medical scans to improve picture quality.

        Transformers are neural sequence models built entirely on (self-)attention: every token attends to every other
        token, which captures long-range dependencies and allows highly parallel training. They are the architecture
        behind models such as BERT and GPT.

#### What is Image Captioning?
        Image Captioning is the process of generating a textual description of an image. It uses both Natural Language Processing and Computer Vision to
        generate the captions.
![](https://cdn.analyticsvidhya.com/wp-content/uploads/2018/03/example.png)

#### Give some examples of Text summarization.
        Summarization is the task of condensing a piece of text to a shorter version, reducing the size of the initial text while preserving the meaning.
        Some examples are:
                1) Essay Summarization
                2) Document Summarization
                etc.

#### Define Style Transfer?
        Style transfer is a computer vision technique that applies the artistic style of one image to the content of another.
        It uses convolutional neural networks (typically VGG) to separate and recombine the content and style representations
        of two images. The loss function combines a content loss (preserving structure of the content image) with a style loss
        (matching Gram matrix statistics of the style image). Neural Style Transfer was introduced by Gatys et al.
(2015).
        Modern approaches use fast style transfer (pre-trained feed-forward networks) for real-time applications.
        Use cases: photo filters, artistic image generation, creative tools.

#### Define Image Segmentation and Pose Analysis?
        Image Segmentation: In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into
        multiple segments (sets of pixels, also known as image objects). The goal of segmentation is to simplify and/or change the representation of
        an image into something that is more meaningful and easier to analyze.

        Pose Analysis:
                The process of determining the location and orientation of a person's body (its pose), typically by
                locating body keypoints/joints.

![PoseSegmentation](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_7d931b674dec.jpg)


#### Define Semantic Segmentation?
        Semantic Segmentation assigns a class label to every pixel of an image, so all objects of the same type share one label.
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_b9bbe7c5084f.png)
#### What is Instance Segmentation?
        Like semantic segmentation, but individual objects are additionally distinguished from one another
        (each instance gets its own ID).

#### What is Imperative and Symbolic Programming?
        Imperative frameworks execute each operation immediately as it is encountered (e.g. NumPy- or PyTorch-style code),
        which makes debugging easy; symbolic frameworks first build a computation graph and only then compile and execute
        it (e.g. Theano or TensorFlow 1.x), which enables graph-level optimization.
![](https://oss.gittoolsai.com/images/shafaypro_CrackingMachineLearningInterview_readme_a86c1f75d606.jpg)


#### Define Text Classification, Give some usecase examples?
        Text classification, also known as text tagging or text categorization, is the process of categorizing text into organized groups.
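As a toy illustration, classification can be as simple as scoring a document against per-category vocabularies (the keyword lists below are made up for the example; a real classifier learns its features from labeled data):

```python
# Hypothetical per-category keyword lists; a real classifier would learn
# its vocabulary and weights from labeled training data.
CATEGORIES = {
    "sports": {"match", "team", "score", "goal"},
    "finance": {"stock", "market", "profit", "bank"},
}

def classify(text):
    tokens = set(text.lower().split())
    # Score each category by vocabulary overlap; the highest overlap wins.
    scores = {cat: len(tokens & vocab) for cat, vocab in CATEGORIES.items()}
    return max(scores, key=scores.get)

label = classify("The team celebrated the winning goal after the match")
```

Real systems replace the keyword overlap with a learned model (e.g. bag-of-words plus Naive Bayes, or a neural text encoder), but the interface is the same: text in, category out.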
By using Natural Language Processing (NLP), text classifiers can automatically analyze text and then assign a set of pre-defined
        tags or categories based on its content.
        Use cases:
                1) Document classification/categorization
                2) Finding points of interest in a document
                3) Categorizing OCR-extracted text
                etc.
![](https://www.researchgate.net/profile/Raghava_Rao_Mukkamala/publication/321892732/figure/fig3/AS:574016848764930@1513867689381/Text-Classification-Architecture.png)

#### Which algorithms to use for Missing Data?
![](https://www.researchgate.net/publication/330704615/figure/fig2/AS:720385997815812@1548764814471/Machine-learning-with-missing-data-Conventional-single-imputation-methods-for-handling.ppm)


REFERENCED FROM : https://github.com/andrewekhalel/MLQuestions

#### 1) What's the trade-off between bias and variance? [[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]

If our model is too simple and has very few parameters then it may have high bias and low variance. On the other hand, if our model has a large number of parameters then it's going to have high variance and low bias. So we need to find the right balance, without overfitting or underfitting the data. [[src]](https://towardsdatascience.com/understanding-the-bias-variance-tradeoff-165e6942b229)

#### 2) What is gradient descent?
[[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]
[[Answer]](https://machinelearningmastery.com/gradient-descent-for-machine-learning/)

Gradient descent is an optimization algorithm used to find the values of parameters (coefficients) of a function (f) that minimize a cost function (cost).

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

#### 3) Explain over- and under-fitting and how to combat them? [[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]
[[Answer]](https://towardsdatascience.com/overfitting-vs-underfitting-a-complete-example-d05dd7e19765)

ML/DL models essentially learn a relationship between their given inputs (called training features) and objective outputs (called labels). Regardless of the quality of the learned relation (function), its performance on a test set (a collection of data different from the training input) is subject to investigation.

Most ML/DL models have trainable parameters which will be learned to build that input-output relationship. Based on the number of parameters each model has, they can be sorted from more flexible (more parameters) to less flexible (fewer parameters).

The problem of underfitting arises when the flexibility of a model (its number of parameters) is not adequate to capture the underlying pattern in a training dataset. Overfitting, on the other hand, arises when the model is too flexible for the underlying pattern. In the latter case it is said that the model has “memorized” the training data.

An example of underfitting is estimating a second-order polynomial (quadratic function) with a first-order polynomial (a simple line).
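That underfitting example is easy to check numerically; a toy sketch with NumPy's polyfit (the data and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 30)
# Quadratic ground truth plus a little noise.
y = 2 * x**2 + 0.5 * x + rng.normal(scale=0.05, size=x.size)

def train_mse(degree):
    # Fit a polynomial of the given degree and measure the training error.
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

mse_line = train_mse(1)  # a line cannot capture the curvature -> high error
mse_quad = train_mse(2)  # the right amount of flexibility -> low error
```

With overly high degrees the picture flips on held-out points: training error keeps falling while test error rises, which is the overfitting signature.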
Similarly, estimating a line with a 10th-order polynomial would be an example of overfitting.


#### 4) How do you combat the curse of dimensionality? [[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]

 - Feature Selection (manual or via statistical methods)
 - Principal Component Analysis (PCA)
 - Multidimensional Scaling
 - Locally linear embedding  
[[src]](https://towardsdatascience.com/why-and-how-to-get-rid-of-the-curse-of-dimensionality-right-with-breast-cancer-dataset-7d528fb5f6c0)

#### 5) What is regularization, why do we use it, and give some examples of common methods? [[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]
A technique that discourages learning a more complex or flexible model, so as to avoid the risk of overfitting.
Examples:
 - Ridge (L2 norm)
 - Lasso (L1 norm)  
The obvious *disadvantage* of **ridge** regression is model interpretability. It will shrink the coefficients for the least important predictors very close to zero, but it will never make them exactly zero. In other words, the *final model will include all predictors*. However, in the case of the **lasso**, the L1 penalty has the effect of forcing some of the coefficient estimates to be *exactly equal* to zero when the tuning parameter λ is sufficiently large. Therefore, the lasso method also performs variable selection and is said to yield sparse models.
[[src]](https://towardsdatascience.com/regularization-in-machine-learning-76441ddcf99a)

#### 6) Explain Principal Component Analysis (PCA)?
[[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]
[[Answer]](https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c)

Principal Component Analysis (PCA) is a dimensionality reduction technique used in machine learning to reduce the number of features in a dataset while retaining as much information as possible. It works by identifying the directions (principal components) in which the data varies the most, and projecting the data onto a lower-dimensional subspace along these directions.

#### 7) Why is ReLU better and more often used than Sigmoid in Neural Networks? [[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]

* Computation Efficiency:
  As ReLU is a simple threshold, the forward and backward passes are faster.
* Reduced Likelihood of Vanishing Gradient:
  The gradient of ReLU is 1 for positive values and 0 for negative values, while the Sigmoid activation saturates (gradients close to 0) once its input moves moderately far from zero, leading to vanishing gradients.
* Sparsity:
  Sparsity happens when the input of ReLU is negative. This means fewer neurons are firing (sparse activation) and the network is lighter.


[[src1]](https://medium.com/the-theory-of-everything/understanding-activation-functions-in-neural-networks-9491262884e0) [[src2]](https://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-networks)



#### 8) Given stride S and kernel sizes for each layer of a (1-dimensional) CNN, create a function to compute the [receptive field](https://www.quora.com/What-is-a-receptive-field-in-a-convolutional-neural-network) of a particular node in the network.
This is just finding how many input nodes actually connect through to a neuron in a CNN. [[src](https://www.reddit.com/r/computervision/comments/7gku4z/technical_interview_questions_in_cv/)]

The receptive field of a node is the region of the input that can influence that node's value.

For a stack of 1-dimensional convolutions with kernel sizes k_i and strides s_i, start from r = 1 at the node and walk back towards the input, applying r = r * s_i + (k_i - s_i) at each layer; the final r is the receptive field measured in input units. Equivalently, in closed form: R = 1 + sum_i (k_i - 1) * prod_{j < i} s_j. For example, two 3x3 convolutions with stride 1 give a 5x5 receptive field. Channel depth (e.g. the 3 RGB planes of a 32x32x3 image) does not change the spatial receptive field, since every kernel consumes all input channels in full.

#### 9) Implement [connected components](http://aishack.in/tutorials/labelling-connected-components-example/) on an image/matrix. [[src](https://www.reddit.com/r/computervision/comments/7gku4z/technical_interview_questions_in_cv/)]


#### 10) Implement a sparse matrix class in C++.
[[src](https://www.reddit.com/r/computervision/comments/7gku4z/technical_interview_questions_in_cv/)]

[[Answer]](https://www.geeksforgeeks.org/sparse-matrix-representation/)

#### 11) Create a function to compute an [integral image](https://en.wikipedia.org/wiki/Summed-area_table), and create another function to get area sums from the integral image. [[src](https://www.reddit.com/r/computervision/comments/7gku4z/technical_interview_questions_in_cv/)]

[[Answer]](https://www.geeksforgeeks.org/submatrix-sum-queries/)

#### 12) How would you remove outliers when trying to estimate a flat plane from noisy samples? [[src](https://www.reddit.com/r/computervision/comments/7gku4z/technical_interview_questions_in_cv/)]

Random sample consensus (RANSAC) is an iterative method to estimate the parameters of a mathematical model from a set of observed data that contains outliers, when the outliers are to be accorded no influence on the values of the estimates.
[[src]](https://en.wikipedia.org/wiki/Random_sample_consensus)



#### 13) How does [CBIR](https://www.robots.ox.ac.uk/~vgg/publications/2013/arandjelovic13/arandjelovic13.pdf) work? [[src](https://www.reddit.com/r/computervision/comments/7gku4z/technical_interview_questions_in_cv/)]

[[Answer]](https://en.wikipedia.org/wiki/Content-based_image_retrieval)
Content-based image retrieval is the concept of using images to gather metadata on their content. Compared to keyword-based image retrieval, where metadata comes from keywords manually associated with the images, this technique generates its metadata with computer vision techniques that extract the relevant information used during the querying step.
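The indexing and querying steps can be sketched end-to-end with a stand-in feature extractor (a tiny grayscale histogram here; a real system would use SIFT bags-of-words or CNN embeddings) and cosine similarity for the ranking:

```python
from math import sqrt

# Stand-in "feature extractor": a normalized 4-bin intensity histogram.
def histogram_feature(pixels, bins=4):
    counts = [0] * bins
    for p in pixels:  # pixel intensities in [0, 255]
        counts[min(p * bins // 256, bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def query(index, probe_feature):
    # Rank indexed images by similarity of their stored features to the probe.
    return sorted(index, key=lambda item: cosine(item[1], probe_feature), reverse=True)

# Offline indexing step: compute and store one feature vector per image.
index = [
    ("dark_img", histogram_feature([10] * 90 + [240] * 10)),
    ("bright_img", histogram_feature([240] * 90 + [10] * 10)),
]

# Query step: extract the same feature from the probe image and rank.
ranked = query(index, histogram_feature([20] * 80 + [230] * 20))
```

Swapping the histogram for a learned embedding changes the features, not the retrieval loop: extract, index, then rank by similarity at query time.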
Many approaches are possible, from feature detection used to retrieve keywords, to using a CNN to extract dense features that are then associated with a known distribution of keywords.

With this last approach, we care less about what exactly is shown in the image and more about the similarity between the metadata generated for a new image and the known labels and/or tags projected into this metadata space.

#### 14) How does image registration work? Sparse vs. dense [optical flow](http://www.ncorr.com/download/publications/bakerunify.pdf) and so on. [[src](https://www.reddit.com/r/computervision/comments/7gku4z/technical_interview_questions_in_cv/)]

#### 15) Describe how convolution works. What about if your inputs are grayscale vs RGB imagery? What determines the shape of the next layer? [[src](https://www.reddit.com/r/computervision/comments/7gku4z/technical_interview_questions_in_cv/)] 
In a convolutional neural network (CNN), the convolution operation is applied to the input image using a small matrix called a kernel or filter. The kernel slides over the image in small steps, called strides, and performs element-wise multiplications with the corresponding elements of the image and then sums up the results. The output of this operation is called a feature map.

When the input is RGB (or has more than one channel), the sliding window becomes a sliding cube that spans all input channels. The shape of the next layer is determined by the kernel size, number of kernels, stride, padding, and dilation.

[[src1]](https://dev.to/sandeepbalachandran/machine-learning-convolution-with-color-images-2p41) [[src2]](https://stackoverflow.com/questions/70231487/output-dimensions-of-convolution-in-pytorch)

#### 16) Talk me through how you would create a 3D model of an object from imagery and depth sensor measurements taken at all angles around the object.
[[src](https://www.reddit.com/r/computervision/comments/7gku4z/technical_interview_questions_in_cv/)]

There are two popular methods for 3D reconstruction:
* Structure from Motion (SfM) [[src]](https://www.mathworks.com/help/vision/ug/structure-from-motion.html)

* Multi-View Stereo (MVS) [[src]](https://www.youtube.com/watch?v=Zwwty2qPNs8)

SfM is better suited for creating models of large scenes while MVS is better suited for creating models of small objects.


#### 17) Implement SQRT(const double & x) without using any special functions, just fundamental arithmetic. [[src](https://www.reddit.com/r/computervision/comments/7gku4z/technical_interview_questions_in_cv/)]

A standard approach is Newton's method (the Babylonian method): starting from a guess y, iterate y = (y + x / y) / 2 until it converges. A Taylor series can also provide an approximation of sqrt(x):

[[Answer]](https://math.stackexchange.com/questions/732540/taylor-series-of-sqrt1x-using-sigma-notation)

#### 18) Reverse a bitstring. [[src](https://www.reddit.com/r/computervision/comments/7gku4z/technical_interview_questions_in_cv/)]

If you are using Python 3:

```
data = b'\xAD\xDE\xDE\xC0'
# bytearray(data).reverse() would only reverse the *byte* order; to reverse
# the bit string, write out all the bits, reverse them, and repack into bytes.
bits = ''.join(f'{byte:08b}' for byte in data)
reversed_bits = int(bits[::-1], 2).to_bytes(len(data), 'big')
```
#### 19) Implement non maximal suppression as efficiently as you can. [[src](https://www.reddit.com/r/computervision/comments/7gku4z/technical_interview_questions_in_cv/)]

Non-Maximum Suppression (NMS) is a technique used to eliminate multiple detections of the same object in a given image.
To solve that, first sort the bounding boxes by their scores (N log N). Starting with the box with the highest score, remove boxes whose overlap metric (IoU) with it is greater than a certain threshold (N^2 in the worst case).

To optimize this solution you can use special data structures to query for overlapping boxes, such as an R-tree or KD-tree.
(N log N)
[[src]](https://towardsdatascience.com/non-maxima-suppression-139f7e00f0b5)

#### 20) Reverse a linked list in place. [[src](https://www.reddit.com/r/computervision/comments/7gku4z/technical_interview_questions_in_cv/)]

[[Answer]](https://www.geeksforgeeks.org/reverse-a-linked-list/)

#### 21) What is data normalization and why do we need it? [[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]
Data normalization is a very important preprocessing step, used to rescale values to fit in a specific range and assure better convergence during backpropagation. In general, it boils down to subtracting each feature's mean and dividing by its standard deviation. If we don't do this then some of the features (those with high magnitude) will be weighted more in the cost function (if a higher-magnitude feature changes by 1%, then that change is pretty big, but for smaller features it's quite insignificant). Data normalization makes all features weighted equally.

#### 22) Why do we use convolutions for images rather than just FC layers? [[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]
Firstly, convolutions preserve, encode, and actually use the spatial information from the image. If we used only FC layers we would have no relative spatial information. Secondly, Convolutional Neural Networks (CNNs) have a partially built-in translation invariance, since each convolution kernel acts as its own filter/feature detector.

#### 23) What makes CNNs translation invariant? [[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]
As explained above, each convolution kernel acts as its own filter/feature detector.
So let's say you're doing object detection: it doesn't matter where in the image the object is, since we're going to apply the convolution in a sliding-window fashion across the entire image anyway.

#### 24) Why do we have max-pooling in classification CNNs? [[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]
Max-pooling in a CNN allows you to reduce computation since your feature maps are smaller after the pooling. You don't lose too much semantic information since you're taking the maximum activation. There's also a theory that max-pooling contributes a bit to giving CNNs more translation invariance. Check out this great video from Andrew Ng on the [benefits of max-pooling](https://www.coursera.org/learn/convolutional-neural-networks/lecture/hELHk/pooling-layers).

#### 25) Why do segmentation CNNs typically have an encoder-decoder style / structure? [[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]
The encoder CNN can basically be thought of as a feature extraction network, while the decoder uses that information to predict the image segments by "decoding" the features and upscaling to the original image size.

#### 26) What is the significance of Residual Networks? [[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]
The main thing that residual connections did was allow for direct feature access from previous layers. This makes information propagation throughout the network much easier. One very interesting paper about this shows how using local skip connections gives the network a type of ensemble multi-path structure, giving features multiple paths to propagate throughout the network.

#### 27) What is batch normalization and why does it work?
[[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. The idea is then to normalize the inputs of each layer in such a way that they have a mean output activation of zero and standard deviation of one. This is done for each individual mini-batch at each layer, i.e. compute the mean and variance of that mini-batch alone, then normalize. This is analogous to how the inputs to networks are standardized. How does this help? We know that normalizing the inputs to a network helps it learn. But a network is just a series of layers, where the output of one layer becomes the input to the next. That means we can think of any layer in a neural network as the first layer of a smaller subsequent network. Thought of as a series of neural networks feeding into each other, we normalize the output of one layer before applying the activation function, and then feed it into the following layer (sub-network).

#### 28) Why would you use many small convolutional kernels such as 3x3 rather than a few large ones? [[src](http://houseofbots.com/news-detail/2849-4-data-science-and-machine-learning-interview-questions)]
This is very well explained in the [VGGNet paper](https://arxiv.org/pdf/1409.1556.pdf). There are 2 reasons: First, you can use several smaller kernels rather than a few large ones to get the same receptive field and capture more spatial context, but with the smaller kernels you are using fewer parameters and computations. Secondly, because with smaller kernels you will be using more filters, you'll be able to use more activation functions and thus have a more discriminative mapping function being learned by your CNN.

#### 29) Why do we need a validation set and test set?
What is the difference between them? [[src](https://www.toptal.com/machine-learning/interview-questions)]
When training a model, we divide the available data into three separate sets:

 - The training dataset is used for fitting the model’s parameters. However, the accuracy that we achieve on the training set is not reliable for predicting if the model will be accurate on new samples.
 - The validation dataset is used to measure how well the model does on examples that weren’t part of the training dataset. The metrics computed on the validation data can be used to tune the hyperparameters of the model. However, every time we evaluate the validation data and we make decisions based on those scores, we are leaking information from the validation data into our model. The more evaluations, the more information is leaked. So we can end up overfitting to the validation data, and once again the validation score won’t be reliable for predicting the behaviour of the model in the real world.
 - The test dataset is used to measure how well the model does on previously unseen examples. It should only be used once we have tuned the parameters using the validation set.

So if we omit the test set and only use a validation set, the validation score won’t be a good estimate of the generalization of the model.

#### 30) What is stratified cross-validation and when should we use it? [[src](https://www.toptal.com/machine-learning/interview-questions)]
Cross-validation is a technique for dividing data between training and validation sets. In typical cross-validation this split is done randomly. But in stratified cross-validation, the split preserves the ratio of the categories in both the training and validation datasets.

For example, if we have a dataset with 10% of category A and 90% of category B, and we use stratified cross-validation, we will have the same proportions in training and validation.
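A hand-rolled sketch of the stratified split itself (libraries such as scikit-learn automate this, e.g. with `StratifiedKFold`; the deterministic slicing here is only for illustration):

```python
from collections import defaultdict

def stratified_split(labels, val_fraction=0.2):
    """Return (train_idx, val_idx) preserving each class's proportion."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, val_idx = [], []
    for idxs in by_class.values():
        n_val = max(1, round(len(idxs) * val_fraction))
        val_idx.extend(idxs[:n_val])    # in practice, shuffle idxs first
        train_idx.extend(idxs[n_val:])
    return train_idx, val_idx

# The 10% A / 90% B dataset from the example above.
labels = ["A"] * 10 + ["B"] * 90
train_idx, val_idx = stratified_split(labels)
```

With `val_fraction=0.2`, the validation set gets 2 of the 10 A samples and 18 of the 90 B samples, so the 10/90 ratio is preserved on both sides of the split.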
In contrast, if we use simple cross-validation, in the worst case we may find that there are no samples of category A in the validation set.\n\nStratified cross-validation may be applied in the following scenarios:\n\n - On a dataset with multiple categories. The smaller the dataset and the more imbalanced the categories, the more important it will be to use stratified cross-validation.\n - On a dataset with data of different distributions. For example, in a dataset for autonomous driving, we may have images taken during the day and at night. If we do not ensure that both types are present in training and validation, we will have generalization problems.\n\n#### 31) Why do ensembles typically have higher scores than individual models? [[src](https:\u002F\u002Fwww.toptal.com\u002Fmachine-learning\u002Finterview-questions)]\nAn ensemble is the combination of multiple models to create a single prediction. The key idea for making better predictions is that the models should make different errors. That way the errors of one model will be compensated by the right guesses of the other models and thus the score of the ensemble will be higher.\n\nWe need diverse models for creating an ensemble. Diversity can be achieved by:\n - Using different ML algorithms. For example, you can combine logistic regression, k-nearest neighbors, and decision trees.\n - Using different subsets of the data for training. This is called bagging.\n - Giving a different weight to each of the samples of the training set. If this is done iteratively, weighting the samples according to the errors of the ensemble, it’s called boosting.\nMany winning solutions to data science competitions are ensembles. However, in real-life machine learning projects, engineers need to find a balance between execution time and accuracy.\n\n#### 32) What is an imbalanced dataset? Can you list some ways to deal with it? 
[[src](https:\u002F\u002Fwww.toptal.com\u002Fmachine-learning\u002Finterview-questions)]\nAn imbalanced dataset is one that has different proportions of target categories. For example, a dataset with medical images where we have to detect some illness will typically have many more negative samples than positive samples—say, 98% of images are without the illness and 2% of images are with the illness.\n\nThere are different options to deal with imbalanced datasets:\n - Oversampling or undersampling. Instead of sampling with a uniform distribution from the training dataset, we can use other distributions so the model sees a more balanced dataset.\n - Data augmentation. We can add data in the less frequent categories by modifying existing data in a controlled way. In the example dataset, we could flip the images with illnesses, or add noise to copies of the images in such a way that the illness remains visible.\n - Using appropriate metrics. In the example dataset, a model that always made negative predictions would achieve an accuracy of 98%. Metrics such as precision, recall, and F-score describe the performance of the model on an imbalanced dataset far better than plain accuracy does.\n\n#### 33) Can you explain the differences between supervised, unsupervised, and reinforcement learning? [[src](https:\u002F\u002Fwww.toptal.com\u002Fmachine-learning\u002Finterview-questions)]\nIn supervised learning, we train a model to learn the relationship between input data and output data. We need to have labeled data to be able to do supervised learning.\n\nWith unsupervised learning, we only have unlabeled data. The model learns a representation of the data. Unsupervised learning is frequently used to initialize the parameters of the model when we have a lot of unlabeled data and a small fraction of labeled data. 
We first train an unsupervised model and, after that, we use the weights of that model to initialize the training of a supervised model.\n\nIn reinforcement learning, the model has some input data and a reward depending on the output of the model. The model learns a policy that maximizes the reward. Reinforcement learning has been applied successfully to strategic games such as Go and even classic Atari video games.\n\n#### 34) What is data augmentation? Can you give some examples? [[src](https:\u002F\u002Fwww.toptal.com\u002Fmachine-learning\u002Finterview-questions)]\nData augmentation is a technique for synthesizing new data by modifying existing data in such a way that the target is not changed, or it is changed in a known way.\n\nComputer vision is one of the fields where data augmentation is very useful. There are many modifications that we can do to images:\n - Resize\n - Horizontal or vertical flip\n - Rotate\n - Add noise\n - Deform\n - Modify colors\nEach problem needs a customized data augmentation pipeline. For example, in OCR, doing flips will change the text and won’t be beneficial; however, resizes and small rotations may help.\n\n#### 35) What is the Turing test? [[src](https:\u002F\u002Fintellipaat.com\u002Finterview-question\u002Fartificial-intelligence-interview-questions\u002F)]\nThe Turing test is a method of testing a machine’s ability to exhibit human-level intelligence. The machine converses with a human evaluator; if the evaluator cannot reliably tell the machine from a human, the machine is considered intelligent. Note, however, that a machine could be intelligent without knowing enough about people to mimic a human.\n\n#### 36) What is Precision?  \nPrecision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances.  \nPrecision = true positives \u002F (true positives + false positives)  \n[[src]](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FPrecision_and_recall)\n\n#### 37) What is Recall?  
\nRecall (also known as sensitivity) is the fraction of relevant instances that have been retrieved over the total amount of relevant instances.  \nRecall = true positives \u002F (true positives + false negatives)  \n[[src]](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FPrecision_and_recall)\n\n#### 38) Define F1-score. [[src](https:\u002F\u002Fintellipaat.com\u002Finterview-question\u002Fartificial-intelligence-interview-questions\u002F)]\nIt is the harmonic mean of precision and recall, so it takes both false positives and false negatives into account. It is used to measure the model’s performance.  \nF1-Score = 2 * (precision * recall) \u002F (precision + recall)\n\n#### 39) What is a cost function? [[src](https:\u002F\u002Fintellipaat.com\u002Finterview-question\u002Fartificial-intelligence-interview-questions\u002F)]\nA cost function is a scalar function that quantifies the error of a neural network: the lower the cost, the better the network. E.g., when classifying MNIST images, if the input image is the digit 2 and the neural network wrongly predicts it to be 3, the cost function measures that error.\n\n#### 40) List different activation neurons or functions. [[src](https:\u002F\u002Fintellipaat.com\u002Finterview-question\u002Fartificial-intelligence-interview-questions\u002F)]\n - Linear Neuron\n - Binary Threshold Neuron\n - Stochastic Binary Neuron\n - Sigmoid Neuron\n - Tanh function\n - Rectified Linear Unit (ReLU)\n\n#### 41) Define Learning Rate.\nLearning rate is a hyper-parameter that controls how much we adjust the weights of our network with respect to the loss gradient. [[src](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FLearning_rate)]\n\n#### 42) What is Momentum (w.r.t NN optimization)?\nMomentum lets the optimization algorithm remember its last step, and adds some proportion of it to the current step. This way, even if the algorithm is stuck in a flat region, or a small local minimum, it can get out and continue towards the true minimum. 
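As a minimal sketch of this idea (the function and constants below are illustrative, not from any particular library), a momentum update can be written as:

```python
def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """One momentum update: keep a running 'velocity' that remembers
    a fraction `beta` of the previous step and adds the new gradient step."""
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity

# Minimize f(w) = w**2 (gradient 2*w) starting from w = 5.0.
w, v = 5.0, 0.0
for _ in range(100):
    w, v = momentum_step(w, grad=2 * w, velocity=v)
# w has now moved close to the minimum at 0
```

With `beta = 0` this reduces to plain gradient descent; the accumulated velocity is what carries the iterate through flat regions and small local minima.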
[[src]](https:\u002F\u002Fwww.quora.com\u002FWhat-is-the-difference-between-momentum-and-learning-rate)\n\n#### 43) What is the difference between Batch Gradient Descent and Stochastic Gradient Descent?\nBatch gradient descent computes the gradient using the whole dataset. This is great for convex, or relatively smooth error manifolds. In this case, we move somewhat directly towards an optimum solution, either local or global. Additionally, batch gradient descent, given an annealed learning rate, will eventually find the minimum located in its basin of attraction.\n\nStochastic gradient descent (SGD) computes the gradient using a single sample. SGD works better than batch gradient descent for error manifolds that have lots of local maxima\u002Fminima. In this case, the somewhat noisier gradient calculated using the reduced number of samples tends to jerk the model out of local minima into a region that hopefully is more optimal. [[src]](https:\u002F\u002Fstats.stackexchange.com\u002Fquestions\u002F49528\u002Fbatch-gradient-descent-versus-stochastic-gradient-descent)\n\n#### 44) Epoch vs. Batch vs. Iteration.\n - **Epoch**: one forward pass and one backward pass of **all** the training examples  \n - **Batch**: the set of examples processed together in one pass (forward and backward)  \n - **Iteration**: one forward\u002Fbackward pass over a single batch; the number of iterations per epoch equals the number of training examples divided by the batch size  \n\n#### 45) What is vanishing gradient? [[src](https:\u002F\u002Fintellipaat.com\u002Finterview-question\u002Fartificial-intelligence-interview-questions\u002F)]\nAs we add more and more hidden layers, backpropagation becomes less and less useful in passing information to the lower layers. In effect, as information is passed back, the gradients begin to vanish and become small relative to the weights of the network.\n\n#### 46) What are dropouts? 
[[src](https:\u002F\u002Fintellipaat.com\u002Finterview-question\u002Fartificial-intelligence-interview-questions\u002F)]\nDropout is a simple way to prevent a neural network from overfitting. It is the dropping out of some of the units in a neural network during training. It is similar to the natural reproduction process, where nature produces offspring by combining distinct genes (dropping out others) rather than strengthening their co-adaptation.\n\n#### 47) Define LSTM. [[src](https:\u002F\u002Fintellipaat.com\u002Finterview-question\u002Fartificial-intelligence-interview-questions\u002F)]\nLong Short-Term Memory networks are explicitly designed to address the long-term dependency problem by maintaining a cell state and learning what to remember and what to forget.\n\n#### 48) List the key components of LSTM. [[src](https:\u002F\u002Fintellipaat.com\u002Finterview-question\u002Fartificial-intelligence-interview-questions\u002F)]\n - Gates (forget, memory, update & read)\n - tanh(x) (values between -1 and 1)\n - Sigmoid(x) (values between 0 and 1)\n\n#### 49) List the variants of RNN. [[src](https:\u002F\u002Fintellipaat.com\u002Finterview-question\u002Fartificial-intelligence-interview-questions\u002F)]\n - LSTM: Long Short Term Memory\n - GRU: Gated Recurrent Unit\n - End to End Network\n - Memory Network\n\n#### 50) What is an Autoencoder? Name a few applications. [[src](https:\u002F\u002Fintellipaat.com\u002Finterview-question\u002Fartificial-intelligence-interview-questions\u002F)]\nAn autoencoder is used to learn a compressed representation of the given data. A few applications include:\n - Data denoising\n - Dimensionality reduction\n - Image reconstruction\n - Image colorization\n\n#### 51) What are the components of GAN? 
[[src](https:\u002F\u002Fintellipaat.com\u002Finterview-question\u002Fartificial-intelligence-interview-questions\u002F)]\n - Generator\n - Discriminator\n\n#### 52) What's the difference between boosting and bagging?\nBoosting and bagging are similar, in that they are both ensemble techniques, where a number of weak learners (classifiers\u002Fregressors that are barely better than guessing) combine (through averaging or max vote) to create a strong learner that can make accurate predictions. Bagging means that you take bootstrap samples (with replacement) of your data set and each sample trains a (potentially) weak learner. Boosting, on the other hand, uses all data to train each learner, but instances that were misclassified by the previous learners are given more weight so that subsequent learners give more focus to them during training. [[src]](https:\u002F\u002Fwww.quora.com\u002FWhats-the-difference-between-boosting-and-bagging)\n\n#### 53) Explain how a ROC curve works. [[src]](https:\u002F\u002Fwww.springboard.com\u002Fblog\u002Fmachine-learning-interview-questions\u002F)\nThe ROC curve is a graphical representation of the contrast between the true positive rate and the false positive rate at various thresholds. It’s often used as a proxy for the trade-off between the sensitivity of the model (true positives) versus the fall-out, or the probability it will trigger a false alarm (false positives).\n\n#### 54) What’s the difference between Type I and Type II error? [[src]](https:\u002F\u002Fwww.springboard.com\u002Fblog\u002Fmachine-learning-interview-questions\u002F)\nType I error is a false positive, while Type II error is a false negative. 
Briefly stated, Type I error means claiming something has happened when it hasn’t, while Type II error means claiming nothing is happening when in fact something is.\nA memorable way to think about this is that a Type I error is telling a man he is pregnant, while a Type II error is telling a pregnant woman she isn’t carrying a baby.\n\n#### 55) What’s the difference between a generative and discriminative model? [[src]](https:\u002F\u002Fwww.springboard.com\u002Fblog\u002Fmachine-learning-interview-questions\u002F)\nA generative model learns the distribution of each category of data, while a discriminative model simply learns the distinction between different categories of data. Discriminative models will generally outperform generative models on classification tasks.\n\n#### 56) Instance-Based Versus Model-Based Learning.\n\n - **Instance-based Learning**: The system learns the examples by heart, then generalizes to new cases using a similarity measure.\n\n - **Model-based Learning**: Another way to generalize from a set of examples is to build a model of these examples, then use that model to make predictions. This is called model-based learning.\n[[src]](https:\u002F\u002Fmedium.com\u002F@sanidhyaagrawal08\u002Fwhat-is-instance-based-and-model-based-learning-s1e10-8e68364ae084)\n\n\n#### 57) When to use Label Encoding vs. One-Hot Encoding?\n\nThis generally depends on your dataset and the model you wish to apply, but there are a few points to note before choosing the right encoding technique for your model:\n\nWe apply One-Hot Encoding when:\n\n- The categorical feature is not ordinal (like the countries above)\n- The number of categories is small, so one-hot encoding can be applied effectively\n\nWe apply Label Encoding when:\n\n- The categorical feature is ordinal (like Jr. kg, Sr. 
kg, Primary school, high school)\n- The number of categories is quite large, since one-hot encoding would lead to high memory consumption\n\n[[src]](https:\u002F\u002Fwww.analyticsvidhya.com\u002Fblog\u002F2020\u002F03\u002Fone-hot-encoding-vs-label-encoding-using-scikit-learn\u002F)\n\n#### 58) What is the difference between LDA and PCA for dimensionality reduction?\n\nBoth LDA and PCA are linear transformation techniques: LDA is supervised whereas PCA is unsupervised – PCA ignores class labels.\n\nWe can picture PCA as a technique that finds the directions of maximal variance. In contrast to PCA, LDA attempts to find a feature subspace that maximizes class separability.\n\n[[src]](https:\u002F\u002Fsebastianraschka.com\u002Ffaq\u002Fdocs\u002Flda-vs-pca.html)\n\n#### 59) What is t-SNE?\n\nt-Distributed Stochastic Neighbor Embedding (t-SNE) is an unsupervised, non-linear technique primarily used for data exploration and visualizing high-dimensional data. In simpler terms, t-SNE gives you a feel or intuition for how the data is arranged in a high-dimensional space. \n\n[[src]](https:\u002F\u002Ftowardsdatascience.com\u002Fan-introduction-to-t-sne-with-python-example-5a3a293108d1)\n\n#### 60) What is the difference between t-SNE and PCA for dimensionality reduction?\n\nThe first thing to note is that PCA was developed in 1933 while t-SNE was developed in 2008. A lot has changed in the world of data science since 1933, mainly in the realm of compute and the size of data. Second, PCA is a linear dimension reduction technique that seeks to maximize variance and preserves large pairwise distances. In other words, things that are different end up far apart. This can lead to poor visualization, especially when dealing with non-linear manifold structures. 
Think of a manifold structure as any geometric shape like: cylinder, ball, curve, etc.\n\nt-SNE differs from PCA by preserving only small pairwise distances or local similarities whereas PCA is concerned with preserving large pairwise distances to maximize variance.\n\n[[src]](https:\u002F\u002Ftowardsdatascience.com\u002Fan-introduction-to-t-sne-with-python-example-5a3a293108d1)\n\n#### 61) What is UMAP?\n\nUMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data.\n\n[[src]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.03426#:~:text=UMAP%20)\n\n#### 62) What is the difference between t-SNE and UMAP for dimensionality reduction?\n\nThe biggest difference between the output of UMAP when compared with t-SNE is this balance between local and global structure - UMAP is often better at preserving global structure in the final projection. This means that the inter-cluster relations are potentially more meaningful than in t-SNE. However, it's important to note that, because UMAP and t-SNE both necessarily warp the high-dimensional shape of the data when projecting to lower dimensions, any given axis or distance in lower dimensions still isn’t directly interpretable in the way of techniques such as PCA.\n\n[[src]](https:\u002F\u002Fpair-code.github.io\u002Funderstanding-umap\u002F)\n\n#### 63) How Random Number Generator Works, e.g. 
rand() function in Python?\nIt generates a pseudo-random number from an internal state initialized by a seed; one classic algorithm is the linear congruential generator. See the link below for further information.\n[[src]](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FLinear_congruential_generator)\n\n#### 64) Given that we want to evaluate the performance of 'n' different machine learning models on the same data, why would the following splitting mechanism be incorrect:\n```\nimport numpy as np\nimport pandas as pd\n\ndef get_splits():\n    df = pd.DataFrame(...)\n    rnd = np.random.rand(len(df))\n    train = df[ rnd \u003C 0.8 ]\n    valid = df[ (rnd >= 0.8) & (rnd \u003C 0.9) ]\n    test = df[ rnd >= 0.9 ]\n\n    return train, valid, test\n\n# Model 1\n\nfrom sklearn.tree import DecisionTreeClassifier\ntrain, valid, test = get_splits()\n...\n\n# Model 2\n\nfrom sklearn.linear_model import LogisticRegression\ntrain, valid, test = get_splits()\n...\n```\nnp.random.rand() draws different random values on every call, so each call to get_splits() partitions the rows differently: the 80% of rows used for training will not be the same across calls. This is a problem because we need to compare the performance of our models on the same test set. To ensure reproducible and consistent sampling, we would have to set the random seed in advance or store the data once it is split. Alternatively, we could simply set the 'random_state' parameter in sklearn's train_test_split() function to get the same train, validation, and test sets across different executions. \n\n[[src]](https:\u002F\u002Ftowardsdatascience.com\u002Fwhy-do-we-set-a-random-state-in-machine-learning-models-bb2dc68d8431#:~:text=In%20Scikit%2Dlearn%2C%20the%20random,random%20state%20instance%20from%20np.)\n\n\n#### 65) What is the difference between Bayesian vs frequentist statistics? 
[[src]](https:\u002F\u002Fwww.kdnuggets.com\u002F2022\u002F10\u002Fnlp-interview-questions.html)\nFrequentist statistics is a framework that focuses on estimating population parameters using sample statistics, and providing point estimates and confidence intervals.\n\nBayesian statistics, on the other hand, is a framework that uses prior knowledge and information to update beliefs about a parameter or hypothesis, and provides probability distributions for parameters.\n\nThe main difference is that Bayesian statistics incorporates prior knowledge and beliefs into the analysis, while frequentist statistics doesn't.\n\n## Contributions\nContributions are most welcome.\n 1. Fork the repository.\n 2. Commit your *questions* or *answers*.\n 3. Open a **pull request**.\n\n## More Study Material\n* Preparation resources and modern references: [`docs\u002Fresources-and-references.md`](.\u002Fdocs\u002Fresources-and-references.md)\n* Recommended topic breakdown: [`docs\u002Fstudy-pattern.md`](.\u002Fdocs\u002Fstudy-pattern.md)\n","## CrackingMachineLearningInterview\n\nA practical interview-preparation repository for Machine Learning Engineers, AI Engineers, Data Scientists, Deep Learning Engineers, Data Engineers, and DevOps- or platform-focused roles.\n\nVisit the [CrackingMachineLearningInterview](https:\u002F\u002Fshafaypro.github.io\u002FCrackingMachineLearningInterview\u002F) GitPage (for a better UI\u002FUX experience).\n\n### Who this repository is for\n* Machine Learning Engineers\n* Data Scientists\n* Deep Learning Engineers\n* AI Engineers\n* Software Engineers working on AI\u002FML products\n* Data Engineers\n* MLOps Engineers\n* DevOps\u002FPlatform Engineers\n\n## How to use this repository\n* If you are preparing for a current AI\u002FML interview, start with the **2026 Interview Roadmap**.\n* For modern interview rounds, see the **2026 Additional Q&A**.\n* For specialization-specific interviews, consult the **AI \u002F GenAI**, **Data Engineering**, and **DevOps** sections.\n* For core machine learning, statistics, deep learning, and algorithm content, see the **Classic Question Bank**.\n* To build a targeted study plan, see the **Preparation Resources and References**.\n* For a clear learning path from fundamentals to production-grade AI systems, follow the **Suggested Learning Order**.\n\n## Quick Navigation\n* [2026 Interview Roadmap](.\u002Fdocs\u002F2026-interview-roadmap.md)\n* [2026 Additional Q&A](.\u002Fdocs\u002F2026-additional-questions.md)\n* [Common 2026 Interview Questions (New)](.\u002Fdocs\u002Finterview_questions_2026.md)\n* [AI \u002F GenAI Track](#ai--genai-track)\n* [Classic ML Track](#classic-ml-track)\n* [Deep Learning Track](#deep-learning-track)\n* [MLOps Track](#mlops-track)\n* [Data Engineering Track](#data-engineering-track)\n* [DevOps Track](#devops-track)\n* [Coding Challenges Track](#coding-challenges-track)\n* [Cloud ML Platforms](#cloud-ml-platforms)\n* [System Design Track](#system-design-track)\n* [Frameworks Track](#frameworks-track)\n* [Suggested Learning Order](#suggested-learning-order)\n* [Highlighted Projects](#highlighted-projects)\n* [Preparation Resources and References](.\u002Fdocs\u002Fresources-and-references.md)\n* [Study Pattern](.\u002Fdocs\u002Fstudy-pattern.md)\n* [Classic Question Bank](#classic-question-bank)\n* [Contributions](#contributions)\n\n## About\n* GitHub profile: [Shafaypro](https:\u002F\u002Fgithub.com\u002Fshafaypro) &copy;\n* Repository: [CrackingMachineLearningInterview](https:\u002F\u002Fgithub.com\u002Fshafaypro\u002FCrackingMachineLearningInterview)\n\n#### Image references\n* Images are referenced for educational purposes only. For attribution, see the citation information in the repository.\n\n#### Sharing\nFeel free to share a link to this repository in your blog, study notes, or interview-preparation material.\n\n## Repository structure\n* [`docs\u002F2026-interview-roadmap.md`](.\u002Fdocs\u002F2026-interview-roadmap.md): current interview focus areas for Machine Learning Engineer and AI Engineer roles.\n* [`docs\u002F2026-additional-questions.md`](.\u002Fdocs\u002F2026-additional-questions.md): a modern 2026 question bank covering LLMs, RAG, evaluation, agents, and production AI.\n* [`docs\u002Finterview_questions_2026.md`](.\u002Fdocs\u002Finterview_questions_2026.md): in-depth interview Q&A on agents, RAG, LLM scaling, production AI, and system design. **(New)**\n* [`docs\u002Fresources-and-references.md`](.\u002Fdocs\u002Fresources-and-references.md): books, references, and other supplementary interview content.\n* [`docs\u002Fstudy-pattern.md`](.\u002Fdocs\u002Fstudy-pattern.md): recommended preparation topics, difficulty levels, and study structure.\n* [`ai_genai\u002F`](.\u002Fai_genai): GenAI and LLM engineering topics, including n8n, CrewAI, LangGraph, LangSmith, multi-agent systems, and advanced RAG. **(Expanded)**\n* [`classical_ml\u002F`](.\u002Fclassical_ml): classic ML algorithms: time series, clustering, dimensionality reduction, recommender systems, feature engineering.\n* [`mlops\u002F`](.\u002Fmlops): MLOps topics: MLflow, model serving, feature stores, explainability, data quality, LLM evaluation. **(Expanded)**\n* [`cloud_ml\u002F`](.\u002Fcloud_ml): cloud ML platforms: AWS SageMaker, Google Vertex AI, Azure ML.\n* [`data_engineering\u002F`](.\u002Fdata_engineering): data engineering interview topics, platform concepts, and geospatial AI. **(Expanded)**\n* [`devops\u002F`](.\u002Fdevops): DevOps, infrastructure, deployment, and AI-testing topics. **(Expanded)**\n* [`frameworks\u002F`](.\u002Fframeworks): ML and AI frameworks, including FastAPI, Pydantic, PyTorch, HuggingFace, and LLM inference serving. **(Expanded)**\n* [`system_design\u002F`](.\u002Fsystem_design): ML system design patterns, RAG pipelines, agent architectures, batch vs. real-time systems. **(Expanded)**\n* [`deep_learning\u002F`](.\u002Fdeep_learning): deep learning fundamentals, Transformer models, and applied training pipelines. **(Expanded)**\n* [`coding_challenges\u002F`](.\u002Fcoding_challenges): Python and SQL practice guides for coding screens and data problem solving. **(New)**\n* `README.md`: the repository landing page, containing the original classic ML interview question bank.\n\n## Suggested Learning Order\nIf you want to progress from theory to production-grade AI engineering, study in this order:\n\n1. [Classic ML Track](#classic-ml-track)\n2. [Deep Learning Track](#deep-learning-track)\n3. [AI \u002F GenAI Track](#ai--genai-track)\n4. [Data Engineering Track](#data-engineering-track)\n5. [MLOps Track](#mlops-track)\n6. [Frameworks Track](#frameworks-track)\n7. [System Design Track](#system-design-track)\n8. [Coding Challenges Track](#coding-challenges-track)\n9. [Cloud ML Platforms](#cloud-ml-platforms)\n\n## Highlighted Projects\nUse these projects to turn this repository into a portfolio rather than just a reading list:\n\n* [Training and Inference Pipeline](.\u002Fdeep_learning\u002Fintro_applied_deep_learning.md)\n* [Prompt Engineering Lab](.\u002Fai_genai\u002Fintro_llm_fundamentals.md)\n* [Multi-Agent Research Assistant](.\u002Fai_genai\u002Fintro_agent_tool_use.md)\n* [Multi-Model Routing System](.\u002Fai_genai\u002Fintro_multi_model_orchestration.md)\n* [Document Understanding System](.\u002Fai_genai\u002Fintro_multimodal_ai.md)\n* [AI Workflow Automation with n8n](.\u002Fai_genai\u002Fintro_n8n.md)\n* [Evaluation Pipeline](.\u002Fmlops\u002Fintro_evaluation_guardrails.md)\n* [CI\u002FCD for AI Applications](.\u002Fmlops\u002Fintro_llmops_mlops_engineering.md)\n* [Scalable AI APIs](.\u002Fsystem_design\u002Fintro_backend_ai_system_design.md)\n* [Feature Store System](.\u002Fdata_engineering\u002Fintro_data_engineering_for_ai.md)\n\n## AI \u002F GenAI Track\nThis track is for AI Engineer, GenAI Engineer, LLM Engineer, applied AI, and agent-platform interviews.\n\nCore topics:\n* [LLM and GenAI Fundamentals](.\u002Fai_genai\u002Fintro_llm_fundamentals.md) **(New)**\n* [RAG](.\u002Fai_genai\u002Fintro_rag.md)\n* [RAG Engineering](.\u002Fai_genai\u002Fintro_rag_engineering.md) **(New)**\n* [Vector Databases](.\u002Fai_genai\u002Fintro_vector_databases.md)\n* [Vector Databases - Advanced (Pinecone, Weaviate, FAISS, pgvector, hybrid retrieval, reranking)](.\u002Fai_genai\u002Fintro_vector_databases_advanced.md) **(New)**\n* [LLMOps](.\u002Fai_genai\u002Fintro_llmops.md)\n* [Agentic AI](.\u002Fai_genai\u002Fintro_agentic_ai.md)\n* [Agent Systems and Tool Use](.\u002Fai_genai\u002Fintro_agent_tool_use.md) **(New)**\n* [Multi-Agent Systems (patterns, memory, tool calling, failure handling)](.\u002Fai_genai\u002Fintro_multi_agent_systems.md) **(New)**\n* [Multi-Model and AI Orchestration](.\u002Fai_genai\u002Fintro_multi_model_orchestration.md) **(New)**\n* [Multimodal AI](.\u002Fai_genai\u002Fintro_multimodal_ai.md) **(New)**\n* [CrewAI](.\u002Fai_genai\u002Fintro_crewai.md) **(New)**\n* [n8n - AI Workflow Automation](.\u002Fai_genai\u002Fintro_n8n.md) **(New)**\n* [n8n - Advanced AI Workflows](.\u002Fai_genai\u002Fintro_n8n_advanced.md) **(New)**\n* [LangGraph](.\u002Fai_genai\u002Fintro_langgraph.md) **(New)**\n* [LangSmith — Observability and Evaluation](.\u002Fai_genai\u002Fintro_langsmith.md) **(New)**\n* [Prompt Engineering (CoT, ReAct, few-shot, self-consistency, ToT, output control)](.\u002Fai_genai\u002Fintro_prompt_engineering.md) **(New)**\n* [Structured Outputs and Function Calling (JSON mode, tool use, Pydantic, Instructor)](.\u002Fai_genai\u002Fintro_structured_outputs.md) **(New)**\n* [LLM Security (prompt injection, jailbreaks, red teaming, defenses)](.\u002Fai_genai\u002Fintro_llm_security.md) **(New)**\n* [MCP](.\u002Fai_genai\u002Fintro_mcp.md)\n* [LangChain](.\u002Fai_genai\u002Fintro_langchain.md)\n* [Anthropic Overview](.\u002Fai_genai\u002Fintro_anthropic.md)\n\n## Data Engineering Track\nThis track is for interviews involving pipelines, ETL, orchestration, data warehousing, lakehouses, streaming, and geospatial roles.\n\nCore topics:\n* [Data Engineering for AI](.\u002Fdata_engineering\u002Fintro_data_engineering_for_ai.md) **(New)**\n* [Data Modeling](.\u002Fdata_engineering\u002Fdata-modeling.md) **(New)**\n* [Data Architecture](.\u002Fdata_engineering\u002Fdata-architecture.md) **(New)**\n* [Apache Spark](.\u002Fdata_engineering\u002Fintro_apache_spark.md)\n* [Apache Kafka](.\u002Fdata_engineering\u002Fintro_apache_kafka.md)\n* [Apache Airflow](.\u002Fdata_engineering\u002Fintro_apache_airflow.md)\n* [Intro to dbt](.\u002Fdata_engineering\u002Fintro_dbt.md)\n* [dbt Interview Guide](.\u002Fdata_engineering\u002Finterview_dbt.md)\n* [Apache Iceberg](.\u002Fdata_engineering\u002Fintro_apache_iceberg.md)\n* [Delta Lake](.\u002Fdata_engineering\u002Fintro_delta_lake.md)\n* [DuckDB](.\u002Fdata_engineering\u002Fintro_duckdb.md)\n* [OpenClaw](.\u002Fdata_engineering\u002Fintro_openclaw.md)\n* [Geospatial AI Systems (Google Solar API, ArcGIS, PostGIS, H3)](.\u002Fdata_engineering\u002Fintro_geospatial.md) **(New)**\n\n## Deep Learning Track\nThis track is for Machine Learning Engineer, Deep Learning Engineer, and applied AI interviews that require a deep understanding of architectures and training pipelines.\n\nCore topics:\n* [Deep Learning Overview](.\u002Fdeep_learning\u002FREADME.md)\n* [Applied Deep Learning Roadmap](.\u002Fdeep_learning\u002Fintro_applied_deep_learning.md) **(New)**\n* [Transformers](.\u002Fdeep_learning\u002Fintro_transformers.md)\n\n## DevOps Track\nThis track is for interviews involving infrastructure, CI\u002FCD, containers, orchestration, IaC, and testing of AI systems.\n\nCore topics:\n* [Docker](.\u002Fdevops\u002Fintro_docker.md)\n* [Kubernetes](.\u002Fdevops\u002Fintro_kubernetes.md)\n* [Helm](.\u002Fdevops\u002Fintro_helm.md)\n* [Terraform](.\u002Fdevops\u002Fintro_terraform.md)\n* [GitHub Actions](.\u002Fdevops\u002Fintro_github_actions.md)\n* [Testing for AI Systems (Playwright, Puppeteer, LLM end-to-end testing)](.\u002Fdevops\u002Fintro_testing_ai.md) **(New)**\n\n## Classic ML Track\nThis track is for interviews on classic ML algorithms and data science roles, and serves as foundational knowledge for Machine Learning Engineer roles.\n\nCore topics:\n* [Time Series and Forecasting](.\u002Fclassical_ml\u002Fintro_time_series.md)\n* [Clustering Algorithms](.\u002Fclassical_ml\u002Fintro_clustering.md)\n* [Dimensionality Reduction](.\u002Fclassical_ml\u002Fintro_dimensionality_reduction.md)\n* [Recommender Systems](.\u002Fclassical_ml\u002Fintro_recommender_systems.md)\n* [Feature Engineering and Selection](.\u002Fclassical_ml\u002Fintro_feature_engineering.md)\n\n## MLOps Track\nThis track is for MLOps Engineer, senior Machine Learning Engineer, and production ML system interviews.\n\nCore topics:\n* [LLMOps \u002F MLOps Engineering](.\u002Fmlops\u002Fintro_llmops_mlops_engineering.md) **(New)**\n* [MLflow](.\u002Fmlops\u002Fintro_mlflow.md)\n* [Model Explainability (SHAP, LIME)](.\u002Fmlops\u002Fintro_model_explainability.md)\n* [Feature Stores](.\u002Fmlops\u002Fintro_feature_stores.md)\n* [Model Serving](.\u002Fmlops\u002Fintro_model_serving.md)\n* [Data Quality and Validation](.\u002Fmlops\u002Fintro_data_quality.md)\n* [LLM Evaluation (metrics, benchmarks, hallucination detection, human-in-the-loop)](.\u002Fmlops\u002Fintro_llm_evaluation.md) **(New)**\n* [Evaluation and Guardrails](.\u002Fmlops\u002Fintro_evaluation_guardrails.md) **(New)**\n\n## Cloud ML Platforms\nThis track is for Machine Learning Engineer and MLOps interviews for roles working on cloud platforms such as AWS, GCP, or Azure.\n\nCore topics:\n* [Cloud ML Platforms Overview](.\u002Fcloud_ml\u002FREADME.md) **(New)**\n* [Cloud ML Platform Comparison (SageMaker vs Vertex AI vs Azure ML)](.\u002Fcloud_ml\u002Fintro_cloud_ml_platforms.md)\n* [AWS SageMaker Interview Guide](.\u002Fcloud_ml\u002Fintro_sagemaker.md) **(New)**\n* [Google Vertex AI Interview Guide](.\u002Fcloud_ml\u002Fintro_vertex_ai.md) **(New)**\n* [Azure Machine Learning Interview Guide](.\u002Fcloud_ml\u002Fintro_azure_ml.md) **(New)**\n\n## System Design Track\nThis track is for senior, staff, and principal Machine Learning Engineer interviews that require deep system design skills.\n\nCore topics:\n* [ML System Design Framework and Patterns](.\u002Fsystem_design\u002FREADME.md)\n* [Backend and System Design for AI](.\u002Fsystem_design\u002Fintro_backend_ai_system_design.md) **(New)**\n* [Recommendation System Design](.\u002Fsystem_design\u002Frecommendation_system.md)\n* [Fraud Detection System Design](.\u002Fsystem_design\u002Ffraud_detection.md)\n* [ML System Design Patterns: RAG, Agents, Batch vs. Real-Time (2026)](.\u002Fsystem_design\u002Fml_system_design_patterns.md) **(New)**\n\n## Coding Challenges Track\nThis track is for interview rounds that require live coding, take-home coding exercises, or SQL assessments.\n\nCore topics:\n* [Coding Challenges Overview](.\u002Fcoding_challenges\u002FREADME.md) **(New)**\n* [Python Coding Challenges](.\u002Fcoding_challenges\u002Fpython_coding_challenges.md) **(New)**\n* [SQL Coding Challenges](.\u002Fcoding_challenges\u002Fsql_coding_challenges.md) **(New)**\n\n## Frameworks Track\nThis track is for roles that require hands-on Python API development and AI framework expertise.\n\nCore topics:\n* [PyTorch](.\u002Fframeworks\u002Fintro_pytorch.md)\n* [HuggingFace](.\u002Fframeworks\u002Fintro_huggingface.md)\n* [LangChain](.\u002Fframeworks\u002Fintro_langchain.md)\n* [Ollama](.\u002Fframeworks\u002Fintro_ollama.md)\n* [vLLM](.\u002Fframeworks\u002Fintro_vllm.md)\n* [Unsloth](.\u002Fframeworks\u002Fintro_unsloth.md)\n* [FastAPI — Production AI Backend Engineering](.\u002Fframeworks\u002Fintro_fastapi.md) **(New)**\n* [Pydantic — Data Validation for AI Systems](.\u002Fframeworks\u002Fintro_pydantic.md) **(New)**\n\n# Classic Question Bank\n\n#### What is the difference between supervised and unsupervised learning?\n        Supervised learning means you already know the outcomes and provide fully labeled outcome data, whereas in unsupervised learning no labeled outcome data is given. “Fully labeled” means that every example in the training dataset is tagged with the answer the algorithm should arrive at on its own. For example, a labeled dataset of flower images tells the model which photos are roses, daisies, and daffodils. When shown a new image, the model compares it to the training examples to predict the correct label.\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_285a8d44643e.jpg)\n#### What is reinforcement learning? How is it defined?\n        Reinforcement learning differs from supervised learning in that it does not require labeled input-output pairs, nor does it need suboptimal actions to be explicitly corrected. Instead, its focus is on finding a balance between exploring unknown territory and exploiting existing knowledge. In reinforcement learning, each learning step gives the model positive or negative feedback according to some reward mechanism, and the model's behavior is adjusted accordingly.\n\n![](https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F1\u002F1b\u002FReinforcement_learning_diagram.svg\u002F250px-Reinforcement_learning_diagram.svg.png)\n#### What is deep learning?\n        Deep learning is defined as artificial neural network (ANN) algorithms inspired by the structure and function of the brain. Deep learning usually focuses on non-linear analysis, which makes it particularly suitable for non-linear problems in artificial intelligence.\n\n#### What is the difference between machine learning and deep learning?\n        Deep learning is a subset of machine learning, and both are part of artificial intelligence. Although a basic machine learning model gradually improves at its specific task, it still needs human intervention: if an AI algorithm's predictions are inaccurate, an engineer has to step in and make adjustments. With deep learning, the algorithm can determine on its own, through its neural network, whether a prediction is accurate.\n![](https:\u002F\u002Flawtomated.com\u002Fwp-content\u002Fuploads\u002F2019\u002F04\u002FMLvsDL.png)\n#### What is the difference between semi-supervised learning and reinforcement learning?\n        Semi-supervised learning uses a small amount of labeled data combined with a large amount of unlabeled data during training. It sits between supervised learning (fully labeled) and unsupervised learning (no labels). Common techniques include self-training, label propagation, and generative models. Example: training an image classifier with 100 labeled images and 10,000 unlabeled images.\n\n        Reinforcement learning (RL) is a paradigm in which an agent learns by interacting with an environment, receiving rewards or penalties for its actions. Unlike semi-supervised learning, which works on a fixed dataset, reinforcement learning involves sequential decision making. The agent learns a policy that maximizes cumulative reward over time. Example: training a robot to walk, or training an agent to play chess.\n\n#### What is the difference between bias and variance?\n        Bias refers to the oversimplifying assumptions made by the model.\n        Variance means that the model learns not only the patterns in the data but also its noise, which leads to high variability in the model. There is always a trade-off between bias and variance, so it is advisable to find a balance between the two and to always use cross-validation to determine the best model.\n\n#### What is linear regression? How does it work?\n        Linear regression fits a straight line in a two-dimensional plane to capture the correlation between the dependent and independent variables. It uses the simple line\u002Fslope formula, most commonly in the form f(X) = MX + b,\n        where b is the bias term,\n        X is the input (independent) variable,\n        and f(X) is Y, the dependent variable (the outcome).\n\n        Linear regression works as follows: given a dataset of n statistical units, the model assumes that the relationship between the dependent variable y and the p-dimensional vector of independent variables x is linear. This relationship is modeled with an error term ε, an unobserved random variable that adds “noise” to the linear relationship between the dependent and independent variables. The model therefore has the form Y = B0 + B1X1 + B2X2 + ... + BN XN.\n        Equivalently, Y(i) = X(i)^T B,\n        where T denotes the transpose,\n        X(i) is the input vector of the i-th record,\n        and B is the coefficient vector.\n\n#### Applications of regression analysis:\n        Poisson regression for count data.\n        Logistic regression and probit regression for binary data.\n        Multinomial logistic regression and multinomial probit regression for categorical data.\n        Ordered logit and ordered probit regression for ordinal data.\n\n#### What is logistic regression? How does it work?\n        Logistic regression is a statistical method for predicting the probability of a binary response based on one or more independent variables. That is, given certain factors, logistic regression can be used to predict an outcome with only two possible values, such as 0 or 1, pass or fail, yes or no. Logistic regression is used when the dependent (target) variable is categorical. For example:\n            predicting whether an email is spam (1) or not (0);\n            determining whether a tumor is malignant (1) or benign (0);\n            determining whether a transaction is fraudulent (1 or 0).\n        Predictions are made as the probability of the specified class. It works similarly to linear regression, but uses the logistic (sigmoid) function to squash values into the range 0 to 1, producing a probability.\n\n#### What are the logit and sigmoid functions? What are their applications in machine learning and deep learning?\n        The sigmoid function maps a real value to a value that represents a probability, while the logit function maps a probability to a value ranging from negative infinity to positive infinity, which can ultimately be thresholded into 0 or 1, representing false or true. The logit function is commonly used in classification tasks, especially in logistic regression, while the sigmoid function is widely used in deep learning to obtain a nominal output in a hidden or output layer.\n\n#### What is the gradient descent formula for the linear regression equation?\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_f045a2a4e115.jpg)\n\n#### What is a support vector machine? How does it differ from an OVR classifier?\nA support vector machine is a technique for classification and regression. It finds the best-fitting linear hyperplane via hyperplane estimation; nevertheless, SVMs can also handle non-linear problems through the kernel trick. The core idea of an SVM is margin maximization (maximizing the distance between the two classes).\n“One-vs-rest” is a basic classifier concept used across classification algorithms in machine learning; the basic idea is to compare one class against all the other classes.\nTo make binary classifiers perform well on multi-class problems, there are two heuristics for improving multi-class classification:\n![](https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F7\u002F72\u002FSVM_margin.png\u002F300px-SVM_margin.png)\n\n\nAlgorithms that use the OVO strategy include:\n1) Extreme Learning Machines (ELM)\n2) Support vector machines (classifiers)\n3) K-nearest neighbors (distance-based neighboring classes)\n4) Naive Bayes (based on the maximum a posteriori probability, MAP)\n5) Decision trees (a parent node selects a feature, then decisions continue in the child nodes)\n6) Neural networks (networks with different structures)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_63b641eb114e.jpg)\n#### Types of SVM kernels\nA kernel can be thought of as a filter designed for a specific application scenario.\n\n1) Polynomial kernel (commonly used in image processing)\n2) Gaussian kernel (used when there is no prior knowledge about the data)\n3) Gaussian radial basis function (same as 2)\n4) Laplace RBF kernel (recommended when the training set has more than a million samples)\n5) Hyperbolic tangent kernel (a neural-network-based kernel)\n6) Sigmoid kernel (can act as a proxy for a neural network)\n7) ANOVA radial basis kernel (suitable for regression problems)\n\n#### What are the different evaluation metrics in regression analysis?\nThere are several evaluation metrics for regression analysis:\n1) Mean squared error (MSE): the average of the squared differences between the estimated and actual values.\n2) Mean absolute error (MAE): the average of the absolute differences.\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_f6891f027bf0.png)\n#### How would you choose between mean absolute error and mean squared error?\nMAE: use MAE when you are doing regression and do not want outliers to have an outsized influence. It is also useful when you know the distribution is multimodal and you want predictions to land at one of the modes rather than at the mean.\nMSE: conversely, use MSE when you want to penalize outliers.\n\n#### How do you evaluate your classifier?\nA classifier can be evaluated in several ways, the most basic being the confusion matrix and the quantities derived from it: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). These can be combined with metrics such as accuracy, precision, and recall for an overall evaluation.\n\n#### What is classification?\nClassification is the task of assigning entities to specified categories, determining whether a given category is present in a given dataset. The concept is widely used in image-based and data-based classification. The result is usually presented as a “yes” or “no”, or as a specific object or class.\n  \n#### How do you distinguish multi-label classification from multi-class classification?\nIn multi-class classification, a sample can belong to only one of several classes, e.g. A, B, or C, but never to two or more at once.\nIn multi-label classification, a sample can belong to one or more classes, e.g. A, or A and B, or even A, B, and C.\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_4ca36b7d6d5b.png)\n\n\n#### What is a confusion matrix?\nA confusion matrix, also known as an error matrix, is a specific table layout for visualizing the performance of an algorithm (typically a supervised learning algorithm; in unsupervised learning it is usually called a matching matrix). Each row of the matrix represents the instances predicted as a class, while each column represents the instances that actually belong to a class (or vice versa).\nIt is called a “confusion matrix” because it makes it easy to see whether the system is confusing two classes (i.e., frequently mislabeling one as the other).\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_1fb4c81a1c5e.png)\n\n#### Which algorithms are high-bias algorithms?\nBias refers to the assumptions a model makes to simplify the target function so that it is easier to approximate.\n1) The most common high-bias algorithms are linear algorithms, which focus on linear relationships or linear distances. For example:\n2) Linear regression, logistic regression, or linear discriminant analysis.\n\n#### Which algorithms are high-variance and which are low-variance?\nVariance refers to how much the estimate of the target function changes when the training data changes.\n1) High-variance algorithms include decision trees, k-nearest neighbors, and support vector machines.\n2) Low-variance algorithms include linear regression, logistic regression, and LDA.\n\n#### Why do the algorithms above exhibit high bias or high variance?\nLinear machine learning algorithms usually have high bias and low variance.\nNon-linear machine learning algorithms usually have low bias and high variance.\n\n#### What are the root causes of prediction bias?\nPossible root causes of prediction bias include:\n\n1) An incomplete feature set\n2) Noisy data\n3) A buggy data-processing pipeline\n4) A biased training sample\n5) Overly strong regularization\n\n\n\n#### What is gradient descent? How does stochastic gradient descent differ from plain gradient descent?\nGradient descent is an iterative method for solving optimization problems. In classic gradient descent there is no notion of an “epoch” or a “batch”. The key points of gradient descent are:\n* Weights are updated following the direction of the gradient.\n* The gradient is computed exactly from all data points.\nStochastic gradient descent can be understood as follows:\n* It is a quick-and-dirty method that estimates the gradient from a single data point.\n* If we relax “a single data point” to “a subset of the data”, the concepts of batches and epochs are introduced.\n![OneVariableSGD](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_9a758600dd19.png)\n\n#### What are random forests and decision trees?\nA decision tree is a decision-support tool that uses a tree-like model of decisions and their possible consequences, including chance-event outcomes, resource costs, and utility. It is one way of displaying an algorithm that consists only of conditional control statements.\n\nRandom forests, or random decision forests, are an ensemble learning method for classification, regression, and other tasks that works by constructing a large number of decision trees at training time and outputting the mode of the classes (classification) or the mean prediction (regression) of the individual trees. They can be used to eliminate the overfitting caused by a single decision tree.\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_484d9015d7b2.png)\n#### What is the splitting process?\n        Splitting divides the data into subsets according to the provided features. (Very useful for decision trees.)\n\n#### What is the pruning process?\n        Pruning means shortening the branches of a decision tree. It aims to make decisions faster by reducing the size of the tree: some branch nodes are turned into leaf nodes, and the leaf nodes under the original branch are removed.\n\n#### How is tree selection performed?\n        Tree selection is mainly based on two things:\n        1) Entropy\n                A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar (homogeneous) values. The ID3 algorithm uses entropy to measure the homogeneity of a sample. If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided, the entropy is one.\n                Entropy(x) -> -p log(p) - q log(q), where the logarithm is base 2.\n       
 2) 信息增益  \n                信息增益是数据集在某个属性上划分后熵的减少量。构建决策树的关键在于找到能够带来最大信息增益的属性（即产生最同质的分支）。\n\n                2.1) 计算目标变量的熵。\n                2.2) 根据不同的属性对数据集进行划分，计算每个分支的熵，并按样本比例加权求和，得到划分后的总熵。用划分前的熵减去划分后的总熵，所得结果即为信息增益，也就是熵的减少量。\n                2.3) 选择信息增益最大的属性作为决策节点，按其分支划分数据集，并对每个分支重复上述过程。\n\n#### 决策树中熵的计算代码：\n```\nfrom math import log\n\ndef calculateEntropy(dataSet):\n    # 数据集中每条记录的最后一列是类别标签\n    number = len(dataSet)\n    labelCounts = {}\n    for featureVector in dataSet:\n        currentLabel = featureVector[-1]\n        if currentLabel not in labelCounts:\n            labelCounts[currentLabel] = 0\n        labelCounts[currentLabel] += 1\n    entropy = 0.0\n    for label in labelCounts:\n        probability = float(labelCounts[label]) \u002F number\n        entropy -= probability * log(probability, 2)\n    return entropy\n```\n#### 随机森林与决策树如何工作？\n        -* 决策树 *- 一种简单的树结构，由上述树的选择过程构建。\n        -* 随机森林 *- 多棵决策树的组合，通过聚合方式确定最终结果。\n        在分类任务中，随机森林的结果基于每棵树的投票；而在回归任务中，则基于各棵树结果的平均值。\n\n#### 什么是基尼指数？请解释该概念？\n        基尼指数通过从1中减去每个类别概率的平方和来计算。它倾向于较大的划分。\n        比如，你想构建一棵决策树，并决定第一次应该使用哪个特征\u002F列进行划分？这通常就由基尼指数来决定。\n\n#### 基尼指数的计算过程是怎样的？\n        基尼指数：\n        对于每次划分中的每个分支：\n            计算该分支所占的样本比例，用于加权；\n            对于分支中的每个类别：\n                计算该类别在当前分支中的概率；\n                将该概率平方；\n            将所有类别概率的平方相加；\n            用1减去这个总和。# 这就是该分支的基尼指数\n        按样本比例对每个分支进行加权；\n        将各分支的加权基尼指数相加。\n\n#### 基尼分割\u002F基尼指数的公式是什么？\n        它倾向于较大的划分。\n        使用类别比例的平方。\n        如果完全正确分类，基尼指数为零。\n        如果均匀分布，则为 1 – (1\u002F类别数)。\n        应选择基尼指数较低的变量进行划分。\n        公式为：1 – ( P(类别1)^2 + P(类别2)^2 + … + P(类别N)^2 )\n\n#### 什么是概率？如何定义似然？\n        概率表示某事件发生的可能性大小，或事件的成功率。例如，可以说某个事件发生的概率是70%等。\n        假设我们抛掷一枚公平的硬币，正面朝上的概率是0.5，因为硬币正反面出现的机会相等。\n\n        似然是在给定观测数据的条件下，参数取某个值的可能性。仍以抛硬币为例，假设我们抛了10次，其中7次正面朝上，3次反面朝上。似然可以通过二项分布计算得出，具体数值会因分布的不同而有所变化。\n\n\n        似然(成功事件) -> L(0.5 | 7) = 10C7 * 0.5^7 * (1 - 0.5)^3 ≈ 0.1172 \n\n        
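上面这条似然计算可以用几行 Python 验证（使用标准库的 math.comb 求组合数 10C7；数值仅为示意）：

```python
from math import comb

def binomial_likelihood(p, k, n):
    # 二项似然：L(p | k) = nCk * p^k * (1 - p)^(n - k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# 10 次抛掷中 7 次正面、成功概率参数取 0.5
likelihood = binomial_likelihood(0.5, 7, 10)
print(likelihood)  # 0.1171875，即正文结果的精确值
```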
L(0.5 | 7) 表示在成功概率参数为 0.5 的条件下，观测到 7 次成功这一结果的似然。\n        10C7 表示从10次试验中选取7次成功的组合数。\n        一般而言，二项似然的形式为：\n            L(p | k) -> C(n, k) * [p ^ k] * [(1 - p) ^ (n - k)]，其中 n 为试验总数，k 为成功次数。\n\n#### 什么是熵？信息增益又是什么？它们有何区别？\n        熵：指正在处理的信息的随机性（不确定性）。\n\n        信息增益是划分前后熵的减少量，其中熵由每个类别的概率乘以该概率的以2为底的对数、求和后取负得到。信息增益倾向于具有许多不同取值的小型划分。最终，你需要根据自己的数据和划分标准进行实验。信息增益取决于熵的变化（熵的减少意味着信息增益的增加）。\n\n\n#### 什么是KL散度？在机器学习中它的应用场景是什么？\n        库尔贝克-莱布勒（Kullback–Leibler）散度用于计算一个分数，该分数衡量一个概率分布相对于另一个概率分布的差异。\n![](https:\u002F\u002Fwikimedia.org\u002Fapi\u002Frest_v1\u002Fmedia\u002Fmath\u002Frender\u002Fsvg\u002F4958785faae58310ca5ab69de1310e3aafd12b32)\n\n#### 如何定义交叉熵？它的主要用途是什么？\n        熵：指正在处理的信息的随机性。\n\n交叉熵：信息论领域中的一种度量方法，基于熵的概念，通常用于计算两个概率分布之间的差异。它与KL散度密切相关但有所不同：KL散度衡量的是两个概率分布之间的相对熵，而交叉熵则可以被视为计算这两个分布的总熵。\n\n交叉熵可以通过P和Q中事件的概率来计算，公式如下：\n                H(P, Q) = – Σ_{x∈X} P(x) * log(Q(x))\n#### 你如何定义AUC-ROC曲线？\n        ROC曲线是一条概率曲线，而AUC则代表分类器对类别的区分能力或程度。AUC-ROC曲线是在不同阈值设置下对分类问题性能的衡量指标。\n\n        它反映了模型区分不同类别能力的强弱。AUC值越高，表明模型越能准确地将0类预测为0、1类预测为1。类比而言，AUC值越高，说明模型越能有效地区分患病患者与健康个体。\n    \n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_5293a1e927d9.png)\n#### 你如何定义假阳性（第一类错误）和假阴性（第二类错误）？\n        假阳性：指模型将实际为负类的样本错误地预测为正类的情况（实际不是A，却被预测为A），也称为第一类错误。\n\n        假阴性：指模型将实际为正类的样本错误地预测为负类的情况（实际为A，却被预测为不是A），也称为第二类错误。\n\n#### 你如何定义精确率（Precision）和召回率（真正例率，Recall）？\n        以一个简单的分类例子——“将电子邮件分类为垃圾邮件或非垃圾邮件”为例：\n\n        精确率衡量的是被标记为垃圾邮件的邮件中，实际确实是垃圾邮件的比例。也可以定义为被判定为正类的样本中，真正为正类的比例：\n                精确率 = 真正例 \u002F (真正例 + 假正例)\n        \n        召回率则衡量的是所有实际的垃圾邮件中，被正确识别出来的比例：\n                召回率 = 真正例 \u002F (真正例 + 假负例)\n        \n        精确率和召回率之间往往存在权衡关系，这与偏差和方差的关系类似。\n\n\n#### 对于你的分类模型，你会更倾向于选择精确率还是召回率？\n        这完全取决于具体的业务场景或领域专家的需求。例如，在欺诈检测等金融领域（如银行、在线电商网站），通常建议优先考虑提高召回率而非精确率。而在其他场景，比如单词建议或多标签分类任务中，则可能更注重精确率。总体而言，应根据具体的应用场景来决定。\n\n#### 什么是F1分数？它体现了怎样的含义？\n        F1分数是精确率和召回率的调和平均值，当精确率和召回率都达到完美时，F1分数可达最大值1。它也被称为Dice相似度系数。\n\n        
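上述精确率、召回率与 F1 的关系可以用一个小例子来演示（TP、FP、FN 的计数为虚构的示意数据）：

```python
def precision_recall_f1(tp, fp, fn):
    # 精确率 = TP / (TP + FP)，召回率 = TP / (TP + FN)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 是二者的调和平均值
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 示例：分类器把 8 封邮件判为垃圾邮件，其中 6 封判对（TP=6、FP=2），
# 另有 4 封真正的垃圾邮件被漏掉（FN=4）
p, r, f1 = precision_recall_f1(tp=6, fp=2, fn=4)
```

此例中精确率为 0.75、召回率为 0.6，F1 约为 0.667——调和平均会被较低的那一项拖低，这正是 F1 同时兼顾两者的体现。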
大卫（统计学家）：F1分数之所以被广泛使用，是因为它同等重视精确率和召回率。然而在实际应用中，不同类型分类错误所带来的代价往往不同。换句话说，精确率和召回率的重要性因具体问题而异。\n\n#### 感知机与支持向量机（SVM）有何区别？\n        （核）感知机与SVM之间的一个主要实际区别在于：感知机可以进行在线训练（即随着新样本逐一到达不断更新权重），而SVM通常无法做到这一点。感知机本质上就是铰链损失函数加上随机梯度下降优化算法。\n\n        SVM的目标与L2正则化的感知机几乎相同。\n        SVM可以看作是铰链损失函数加上L2正则化项，再结合二次规划或其他更复杂的优化算法（如SMO算法）。\n\n#### 逻辑回归与线性回归有什么区别？\n        LogR（逻辑回归）是分类器，LR（线性回归）则是回归模型。\n        LogR的输出值介于0到1之间，表示概率。\n        LR的输出值则是可以取任意实数（从负无穷到正无穷）的连续值。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_164d3b11c8f2.png)\n#### 什么是异常值？如何去除异常值？\n        异常值是指在从总体中随机抽取的样本中，与其他数值相比偏离过大、显得不正常的观测值。\n![](https:\u002F\u002Fwww2.southeastern.edu\u002FAcademics\u002FFaculty\u002Fdgurney\u002FOutlier.jpg)\n\n        去除异常值的方法包括：\n            1) 使用四分位距法（IQR * 1.5）\n            2) 使用Z-score标准化后剔除（移除远离均值的数据点）\n            3) 结合Z-score与IQR的自定义方法\n\n#### 什么是正则化？\n        正则化技术通过约束模型对训练数据的拟合来减少泛化误差，防止过拟合。它会在损失函数中加入惩罚项（Lambda乘以权重大小），以抑制过于复杂的模型。\n\n#### L1正则化与L2正则化有何区别？\n        1) L1正则化（套索回归）\n            （最小绝对收缩和选择算子，Lasso）在损失函数中加入系数绝对值之和作为惩罚项。\n\n        2) L2正则化（岭回归）\n            在损失函数中加入系数平方之和作为惩罚项。\n\n        这两种方法的关键区别在于：套索回归会将不重要特征的系数缩减至零，从而直接移除这些特征。因此，在特征数量较多的情况下，套索回归在特征选择方面表现更为出色。\n\n#### 数据采样有哪些不同的方法？\n        数据采样是一种统计分析技术，用于从较大的数据集中选取具有代表性的子集，并对其进行处理和分析，以识别整体数据中的模式和趋势。常见的数据采样方法有：\n        1) 简单随机抽样（随机选取记录）\n        2) 分层抽样（按共同特征划分子集，并保持各子集比例一致）\n        3) 聚类抽样（将总体按特定因素划分为若干聚类，再对每个聚类进行简单随机抽样）\n        4) 多阶段抽样（在聚类抽样的基础上进一步分层抽样）\n        5) 系统抽样（按固定间隔选取样本）\n\n#### 你能解释一下欠采样和过采样的概念吗？\n        欠采样是指缩减某一类别样本规模的过程，例如将100万条记录减少到10万条，同时尽量保留该类别的信息特征。\n\n过采样是指扩充较小类别的样本（例如从10万条扩充到百万级别），同时保持数据的趋势和特性，从而使各类别规模更均衡。\n\n#### 什么是不平衡类别？\n当各类别之间的样本数量相差悬殊时，就称为类别不平衡。具体来说，就是各类别（或所关注类别）的样本支持数相差很大。  \n例如：  \n- 类别A有100万条记录  \n- 类别B有1000条记录  \n这种情况就是一个不平衡的数据集，其中类别B属于少数类。\n\n#### 如何解决类别不平衡问题？\n可以通过以下技术来解决类别不平衡问题：  \n1) 过采样  \n2) 欠采样  \n3) SMOTE（合成少数类样本的过采样方法）  \n4) 引入更多数据  \n5) 进行更深入的趋势分析  \n\n#### 
如何定义加权移动平均？\n加权移动平均是在移动平均的基础上，为某些特定时间段内重复出现的值赋予更高的权重，以确保这些值在计算中具有更高的优先级或影响力。\n\n#### 什么是ARIMA模型？\nARIMA是自回归积分滑动平均模型的简称，它结合了回归分析和移动平均法，适用于时间序列分析，并能以可接受的精度进行趋势预测。\n\n#### 如何定义Bagging和Boosting？XGBoost与随机森林有何不同？\n- Bagging：通过从原始数据集中有放回地抽取子样本，生成多个训练数据集，从而降低预测的方差。  \n  例如：随机森林（使用随机抽样的子集）。  \n- Boosting：一种迭代技术，根据上一次分类的结果调整样本的权重。  \n  例如：AdaBoost、XGBoost（主要采用梯度下降法）。\n\n#### 什么是IQR？它如何帮助去除异常值？\nIQR即四分位距，表示第三四分位数与第一四分位数之间的范围。  \n四分位数将数据分为四个相等的部分：  \n- Q1：0%-25%  \n- Q2：25%-50%  \n- Q3：50%-75%  \n- Q4：75%-100%  \nIQR = Q3 - Q1  \n\n#### 什么是SMOTE？\nSMOTE是一种合成少数类过采样技术。  \n它是一种非常流行的过采样方法，旨在改进随机过采样，但其在高维数据上的表现尚未得到充分研究。KNN算法可以从SMOTE中受益。\n\n#### 如何解决过拟合或欠拟合问题？\n**欠拟合**：  \n1) 增加模型复杂度  \n2) 延长训练时间  \n3) 降低学习率  \n\n**过拟合**：  \n1) 交叉验证  \n2) 提前停止  \n3) 提高学习率（跳跃式调整）  \n4) 集成方法  \n5) 引入更多数据  \n6) 删除特征  \n\n#### 避免过拟合的一些技术有哪些？\n1) 交叉验证  \n2) 提前停止  \n3) 提高学习率（跳跃式调整）  \n4) 集成方法  \n5) 引入更多数据  \n6) 删除特征  \n\n#### 什么是神经元？\n人工神经网络中的“神经元”是对生物神经元的一种数学近似。  \n它接收一个输入向量，对其进行变换，并输出一个标量值。  \n可以将其视为一种滤波器。通常，神经网络中会使用非线性滤波器。\n\n#### 什么是隐藏层和输入层？\n![](https:\u002F\u002Fwww.i2tutorials.com\u002Fwp-content\u002Fuploads\u002F2019\u002F05\u002FHidden-layrs-1-i2tutorials.jpg)\n\n1) 输入层：神经网络的初始输入层。  \n2) 隐藏层：位于输入层和输出层之间的一层，它会对输入应用权重，并通过激活函数处理后作为输出传递出去。  \n简而言之，隐藏层对输入数据进行非线性变换。  \n隐藏层的数量和结构取决于神经网络的具体功能，且每层的权重也可能不同。\n\n#### 什么是输出层？\nANN中的输出层决定了最终的输出结果，其具体形式完全取决于应用场景以及用于数值缩放的函数。  \n常见的选择包括：  \n- 线性函数：用于回归任务。  \n- Sigmoid\u002FSoftmax函数：用于分类任务。\n\n#### 什么是激活函数？\n激活函数对接收到的输入进行变换，以使值保持在激活函数允许的范围内。  \n它更像是应用于整个层（向量）的数学尺度滤波器，用于调整数值范围。  \n常见的激活函数包括：  \n1) Sigmoid或Softmax函数（存在梯度消失问题）  \n   Softmax的输出是一个非负向量，且总和为1。当类别互斥时（例如“这些图片只包含猫或狗，不可能同时包含两者”），Softmax非常有用。如果有2、3、4、5个互斥标签，都可以使用Softmax。  \n\n2) Tanh函数（同样存在梯度消失问题）  \n   如果输出被限制在[-1,1]范围内，Tanh可能比较合适。  \n\n3) ReLU函数  \n   当输出有上下限约束时，ReLU单元或类似变体会很有帮助。  \n   如果输出仅限于非负值，那么使用ReLU作为激活函数是合理的（0到Max(x)）。  \n\n4) Leaky ReLU函数  \n   用于解决隐藏层中ReLU函数可能出现的“死亡ReLU”问题。\n\n#### 什么是卷积神经网络？\n卷积神经网络是神经网络的一个子类，至少包含一个卷积层。  \n它们非常适合捕捉局部信息（如图像中的邻近像素或文本中的周围词语），同时还能降低模型的复杂度（训练更快、所需样本更少、减少过拟合风险）。  
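正文所说的“局部信息 + 权重共享”可以用一个纯 Python 的二维互相关（深度学习框架中“卷积”层的常见实现形式）来示意；其中的 3×3 输入与 2×2 卷积核均为虚构数据，仅作演示：

```python
def conv2d(image, kernel):
    # 朴素二维互相关：卷积核在输入上滑动（步长 1、无填充），
    # 每个输出值 = 当前局部窗口与核的逐元素乘积之和；
    # 所有窗口共享同一组核权重，这正是卷积层参数少的原因
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]  # 一个简单的对角差分滤波器（虚构示例）
feature_map = conv2d(image, kernel)
```

输出特征图的尺寸为 (3-2+1)×(3-2+1) = 2×2：每个输出值都由同一组核权重在不同的局部邻域上计算得到。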
\n卷积单元会从前一层的多个单元接收输入，共同形成一个邻域。因此，这些输入单元会共享权重。  \n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_91a1b80d523f.jpeg)\n\n#### 什么是循环神经网络？\n        循环神经网络是一类人工神经网络，其中节点之间的连接沿着时间序列形成有向图。\n        这使得它能够表现出时间动态行为。RNN源自前馈神经网络，可以利用其内部状态（记忆）来处理可变长度的输入序列。这使它适用于诸如无分段连笔手写识别或语音识别等任务。\n![图片地址](https:\u002F\u002Fwww.i2tutorials.com\u002Fwp-content\u002Fuploads\u002F2019\u002F09\u002FNeural-network-62-i2tutorials.png)\n\n#### 什么是LSTM网络？\n        长短期记忆网络（LSTM）是一种人工循环神经网络架构。\n        与标准的前馈神经网络不同，LSTM具有反馈连接。它不仅可以处理单个数据点（如图像），还可以处理整个数据序列（如语音或视频）。\n        例如，LSTM适用于无分段连笔手写识别、网络流量异常检测或入侵检测系统等任务。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_1fd1ab45886b.png)\n\n#### 什么是卷积层？\n        卷积是将一个滤波器简单地应用于输入，从而产生激活的过程。对输入重复应用同一个滤波器会生成一张激活图，称为特征图，它指示了在输入（如图像）中检测到的特定特征的位置和强度。\n        您可以使用基于水平线、垂直线、灰度转换或其他转换滤波器的滤波器。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_5cf523ffcf2f.webp)\n\n#### 什么是池化层？\n        池化层通过总结特征图上各个区域中特征的存在情况，提供了一种下采样特征图的方法。\n        常见的两种池化方法是平均池化和最大池化，它们分别总结了特征的平均存在情况和最显著的特征激活情况。\n\n        这一步骤用于缩小特征尺度（例如，您已经检测到垂直线，现在需要减少一些特征以进入下一阶段）。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_37b56d2a67ff.png)\n\n#### 什么是最大池化层？它是如何工作的？\n        最大池化使用所考虑区域内找到的最大值。最大池化操作会计算每个特征图上每个小块中的最大值。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_81a0fa489ea4.png)\n\n#### 什么是核函数或滤波器？\n        核方法是一类用于模式分析的算法，其中最著名的是支持向量机（SVM）。\n        核函数已被引入用于序列数据、图、文本、图像以及向量等。\n        核函数用于通过线性分类器解决非线性问题，使其变得可用。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_1699c227e820.png)\n\n\n#### 什么是分割？\n        将数字源划分为多个片段的过程。\n        如果以图像为例，可以想象将图像源分割成多个片段，比如飞机对象。\n        分割的目标是简化和\u002F或改变图像的表示形式，使其更具意义且更易于分析。\n\n#### 什么是姿态估计？\n        
从图像中检测出人体姿势即为姿态估计。\n\n#### 什么是前向传播？\n        输入数据沿网络的正向流动。每一层隐藏层接收输入数据，根据激活函数进行处理，并传递给下一层。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_4899a5df88f4.png)\n\n#### 什么是反向传播？\n        反向传播是训练神经网络的核心。它是根据前一 epoch（迭代）中获得的误差率（即损失）来微调神经网络权重的做法。\n        正确调整权重可以降低误差率，从而提高模型的泛化能力，使其更加可靠。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_c2a4e25c4aaf.jpg)\n\n#### 什么是丢弃神经元？\n        “丢弃”是指在神经网络中随机丢弃某些单元（包括隐藏层和输出层）。\n        简而言之，丢弃就是在训练过程中随机忽略某些神经元。所谓“忽略”，指的是在特定的前向或反向传播过程中不考虑这些单元。\n        更技术性地说，在每个训练阶段，每个节点都有概率 p 被保留，概率 1-p 被丢弃，从而留下一个规模较小的网络；被丢弃的节点的所有输入和输出边也会被移除。\n\n#### 什么是展平层？\n        展平层将输入的空间维度压缩为通道维度。例如，如果该层的输入是一个 H×W×C×N×S 的数组（一系列图像），那么展平后的输出就是一个 (H×W×C)×N×S 的数组。\n\n#### 反向传播是如何改进模型的？\n        它是根据前一 epoch（迭代）中获得的误差率（即损失）来微调神经网络权重的做法。正确调整权重可以降低误差率，从而提高模型的泛化能力，使其更加可靠。\n\n#### 什么是相关性和协方差？\n        “协方差”表示变量之间线性关系的方向。\n        而“相关性”则同时衡量两个变量之间线性关系的强度和方向。\n\n        在比较来自不同总体的数据样本时，协方差用于确定两个随机变量共同变化的程度，而相关性则用于判断一个变量的变化是否会导致另一个变量的变化。协方差和相关性都用于衡量变量之间的线性关系。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_d8f4fc8bedee.png)\n\n#### 什么是方差分析？何时使用方差分析？\n        方差分析（ANOVA）是一组统计模型及其相关的估计方法（例如组间和组内的“变异”），用于分析样本中各组均值之间的差异。\n\n        当您收集了一个分类自变量和一个定量因变量的数据时，可以使用单因素方差分析。自变量应至少有三个水平（即至少三个不同的组或类别）。\n\n#### 你如何定义降维？我们为什么要使用降维？\n        降维是指通过获取一组主变量来减少所考虑的随机变量数量的过程。方法可以分为特征选择和特征提取。\n        我们使用降维的原因是：\n                1) 数据集过于庞大\n                2) 训练时间\u002F数据收集时间过长\n                3) 假设过于复杂\u002F模型过拟合\n        降维的类型包括：\n                1) 特征选择\n                2) 特征投影（将数据从高维空间映射到低维空间）\n                3) 主成分分析\n                        一种用于降维的线性技术，它以最大化方差的方式将数据线性映射到低维空间。\n                4) 非负矩阵分解\n                5) 核主成分分析（利用核技巧的非线性方法）\n                6) 基于图的核主成分分析（局部线性嵌入、特征嵌入）\n\n                7) 线性判别分析\n                        一种用于统计学、模式识别和机器学习的方法，用于寻找能够区分或分离两个或多个类别对象或事件的特征线性组合。\n               
 8) 广义判别分析\n                9) t-SNE（一种非线性的降维技术，适用于高维数据的可视化）\n                10) UMAP\n                        均匀流形近似与投影（UMAP）是一种非线性降维技术。在视觉上，它类似于t-SNE，但它假设数据均匀分布在局部连通的黎曼流形上，并且该流形的黎曼度量在局部是常数或近似常数。\n                11) 自编码器（可以学习非线性降维函数）\n\n#### 什么是主成分分析？PCA在降维中是如何工作的？\n        主成分分析是降维的主要线性技术，它通过线性映射将数据降至低维空间，同时使低维表示中的数据方差最大化。实际上，首先会构建数据的协方差矩阵（有时是相关矩�阵），然后计算该矩阵的特征向量。与最大特征值相对应的特征向量（即主成分）可用于重建原始数据的大部分方差。原始空间已被降维至由少数几个特征向量张成的空间（虽然会有一定的信息损失，但有望保留最重要的方差）。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_6add72a48e8f.jpg)\n\n#### 什么是最大似然估计？\n        最大似然估计是一种确定模型参数值的方法。这些参数值被选择为使得模型描述的过程产生实际观测到的数据的可能性最大。\n\n#### 什么是朴素贝叶斯？它是如何工作的？\n        朴素贝叶斯是一类基于贝叶斯定理、并假设特征之间条件独立的概率分类器。它根据训练数据估计各类别的先验概率以及各特征在给定类别下的条件概率，预测时选择后验概率最大的类别（最大后验，MAP）。\n![](https:\u002F\u002Fwikimedia.org\u002Fapi\u002Frest_v1\u002Fmedia\u002Fmath\u002Frender\u002Fsvg\u002F52bd0ca5938da89d7f9bf388dc7edcbd546c118e)\n\n![](https:\u002F\u002Fwikimedia.org\u002Fapi\u002Frest_v1\u002Fmedia\u002Fmath\u002Frender\u002Fsvg\u002Fd0d9f596ba491384422716b01dbe74472060d0d7)\n\n\n#### 什么是贝叶斯定理？\n        基于对可能与该事件相关的条件的先验知识，对某一事件发生的概率进行评估。\n\n#### 什么是概率？\n        概率是一个介于0和1之间的数值，大致来说，0表示不可能发生，1表示必然发生。事件的概率越高，该事件发生的可能性就越大。例如：\n        一个简单的例子就是抛掷一枚公平（无偏）的硬币。由于硬币是公平的，两种结果（“正面”和“反面”）发生的概率相等；并且由于没有其他可能的结果，因此“正面”或“反面”任一结果的概率都是1\u002F2（也可以写成0.5或50%）。\n[参考链接](https:\u002F\u002Fmachinelearningmastery.com\u002Fjoint-marginal-and-conditional-probability-for-machine-learning\u002F)\n#### 什么是联合概率？\n        联合概率是一种统计度量，用于计算两个事件同时发生的可能性。\n                P(A和B) 或 P(A ^ B) 或 P(A & B)\n                联合概率的计算公式为：\n                P(A和B) = P(A|B) * P(B)       \n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_14fe5f0e0122.jpg)\n\n#### 什么是边缘概率？\n        边缘概率是不考虑其他变量取值时，单个随机事件发生的概率，即 P(A)——一个不依赖其他事件的单一概率。\n\n#### 什么是条件概率？什么是分布概率？\n        在事件B已知（发生）的情况下，事件A发生的概率称为条件概率，记作 P(A|B)。\n\n#### 什么是Z分数？\n        
Z分数（也称为标准分数）表示某个观测值或数据点与所观测数据的均值相差多少个标准差。\n\n#### 什么是KNN？它是如何工作的？邻域标准是什么？如何调整它？\n        KNN算法依赖于计算某一点与类中各个点之间的距离，从而形成基于投票的邻域分类器。最终预测结果取决于哪些点最接近待预测的输入点。你可以根据需要设置任意数量的邻居，指定的邻居越多，分类器就会考虑更多的类别来决定最终结果。\n        其工作原理与距离算法类似，你需要先确定目标点，然后计算所有邻近点的距离，找出最近的几个点。之后根据投票结果得出结论，例如，在该邻域内有5个属于类别A的点，2个属于类别B的点，则最终判定为类别A。\n\n#### 根据欺诈交易，您更倾向于低假阴率还是低假阳率？\n        建议选择低假阴率，原因是如果将实际发生的欺诈交易误判为未发生，\n        这将对业务模式产生巨大影响。\n\n#### KNN和K均值算法有何区别？\n        K均值：无监督学习，随机选取初始点，通过基于距离的平均值进行预测。\n        KNN：有监督学习，基于邻近样本，使用C值进行投票。\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_e50401a4bdaf.png)\n\n#### 什么是注意力机制？请举例说明。\n        神经网络中的注意力机制使模型能够专注于其输入（或特征）的一个子集。\n        1) 硬注意力（图像裁剪）\n        2) 软注意力（突出显示关注区域，同时保持图像尺寸不变）\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_e6b688e3cf92.png)\n\n#### 什么是自编码器？什么是Transformer？\n        自编码器接收输入数据，将其压缩成一个代码，然后尝试从这个压缩后的代码中重建原始输入。这就像先读完《白鲸记》，再根据简短的梗概尝试重新写出原作一样。虽然这是一种有趣的深度学习技巧，但在现实世界中，简单的自编码器应用较少。然而，如果增加一些复杂性，其应用场景就会大大扩展：例如，在训练过程中同时使用带噪声和无噪声的图像，自编码器可以去除图像、视频或医学扫描等视觉数据中的噪声，从而提高图像质量。\n\n#### 什么是图像字幕生成？\n        图像字幕生成是指为图像生成文字描述的过程。它结合了自然语言处理和计算机视觉技术来生成字幕。\n\n![](https:\u002F\u002Fcdn.analyticsvidhya.com\u002Fwp-content\u002Fuploads\u002F2018\u002F03\u002Fexample.png)\n\n#### 请举几个文本摘要的例子。\n        文本摘要的任务是将一段较长的文本浓缩成较短的形式，在保留原意的同时减少文本的篇幅。\n        一些例子包括：\n                1) 论文摘要\n                2) 文档摘要\n                等\n\n#### 请定义风格迁移。\n        风格迁移是一种计算机视觉技术，它将一张图像的艺术风格应用到另一张图像的内容上。\n        该技术通常使用卷积神经网络（如VGG）分离并重组两张图像的内容和风格表示。损失函数结合了内容损失（保持内容图像的结构）和风格损失（匹配风格图像的Gram矩阵统计信息）。神经风格迁移由Gatys等人于2015年提出。现代方法则采用快速风格迁移（预训练的前馈网络）以实现实时应用。应用场景包括：照片滤镜、艺术图像生成、创意工具等。\n\n#### 请定义图像分割和姿态分析。\n        图像分割：在数字图像处理和计算机视觉中，图像分割是将数字图像划分为多个区域（像素集合，也称为图像对象）的过程。分割的目标是简化图像表示，使其更易于分析。\n        \n        姿态分析：\n                
指确定人体位置和方向的过程。\n\n![PoseSegmentation](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_7d931b674dec.jpg)\n\n#### 请定义语义分割。\n        语义分割是根据物体类型对图像进行分割。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_b9bbe7c5084f.png)\n#### 什么是实例分割？\n        与语义分割类似，但针对每个独立的物体，并为其分配唯一的标识符。\n\n#### 什么是命令式编程和符号式编程？\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_readme_a86c1f75d606.jpg)\n\n#### 请定义文本分类，并给出一些应用场景。\n        文本分类又称文本标注或文本归类，是将文本按照特定类别进行划分的过程。通过自然语言处理（NLP），文本分类器可以自动分析文本内容，并根据其内容自动分配预先定义好的标签或类别。\n        应用场景包括：\n                1) 文档分类\n                2) 文档归类\n                3) 文档中的兴趣点识别\n                4) 光学字符识别（OCR）\n                等\n\n![](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FRaghava_Rao_Mukkamala\u002Fpublication\u002F321892732\u002Ffigure\u002Ffig3\u002FAS:574016848764930@1513867689381\u002FText-Classification-Architecture.png)\n\n#### 处理缺失数据应使用哪些算法？\n![](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F330704615\u002Ffigure\u002Ffig2\u002FAS:720385997815812@1548764814471\u002FMachine-learning-with-missing-data-Conventional-single-imputation-methods-for-handling.ppm)\n\n参考来源：https:\u002F\u002Fgithub.com\u002Fandrewekhalel\u002FMLQuestions\n\n#### 1) 偏差与方差之间存在怎样的权衡？[[来源]](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)\n        如果我们的模型过于简单，参数很少，则可能具有高偏差和低方差。相反，如果模型参数较多，则会表现出高方差和低偏差。因此，我们需要找到一个适当的平衡点，避免过拟合和欠拟合数据。[[来源]](https:\u002F\u002Ftowardsdatascience.com\u002Funderstanding-the-bias-variance-tradeoff-165e6942b229)\n\n#### 2) 什么是梯度下降法？[[来源]](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)\n        [[解答]](https:\u002F\u002Fmachinelearningmastery.com\u002Fgradient-descent-for-machine-learning\u002F)\n        
梯度下降法是一种优化算法，用于寻找使成本函数最小化的函数参数（系数）值。\n\n        当参数无法通过解析方法（如线性代数）计算得出，而必须借助优化算法进行搜索时，梯度下降法尤为适用。\n\n#### 3) 请解释过拟合和欠拟合现象，以及如何应对它们？[[来源]](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)\n        [[解答]](https:\u002F\u002Ftowardsdatascience.com\u002Foverfitting-vs-underfitting-a-complete-example-d05dd7e19765)\n        机器学习\u002F深度学习模型本质上是在其给定的输入（即训练特征）和目标输出（即标签）之间建立某种关系。无论这种学习到的关系（函数）的质量如何，其在测试集（与训练数据不同的数据集）上的表现都需要进一步评估。\n\n大多数机器学习\u002F深度学习模型都包含可训练的参数，这些参数会在训练过程中被优化，以建立输入与输出之间的关系。根据模型参数的数量，可以将其分为灵活性较高（参数较多）和灵活性较低（参数较少）两类。\n\n当模型的灵活性（即参数数量）不足以捕捉训练数据中的潜在模式时，就会出现欠拟合问题。而当模型过于灵活、过度适应训练数据中的噪声时，则会出现过拟合现象。在后一种情况下，我们说模型“记住了”训练数据。\n\n例如，用一阶多项式（直线）来拟合二阶多项式（二次函数），就是欠拟合的一个例子。同样，用十阶多项式来拟合一条直线，则属于过拟合。\n\n#### 4) 如何应对维度灾难？[[来源]](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)\n\n- 特征选择（手动或通过统计方法）\n- 主成分分析（PCA）\n- 多维尺度分析\n- 局部线性嵌入  \n[[来源]](https:\u002F\u002Ftowardsdatascience.com\u002Fwhy-and-how-to-get-rid-of-the-curse-of-dimensionality-right-with-breast-cancer-dataset-7d528fb5f6c0)\n\n#### 5) 什么是正则化？为什么使用正则化？请列举一些常见的正则化方法。[[来源]](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)\n正则化是一种通过限制模型复杂度或灵活性来防止过拟合的技术。\n示例：\n- 岭回归（L2范数）\n- Lasso回归（L1范数）\n\n岭回归的一个明显缺点是模型的可解释性较差。它会将不重要特征的系数压缩到接近于零，但不会完全变为零。也就是说，最终模型中仍然会包含所有特征。然而，在Lasso回归中，当调节参数λ足够大时，L1惩罚项会使部分系数精确地等于零。因此，Lasso不仅能够进行正则化，还能实现变量选择，从而得到稀疏模型。\n[[来源]](https:\u002F\u002Ftowardsdatascience.com\u002Fregularization-in-machine-learning-76441ddcf99a)\n\n#### 6) 请解释主成分分析（PCA）。[[来源]](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)\n[[答案]](https:\u002F\u002Ftowardsdatascience.com\u002Fa-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c)\n\n主成分分析（PCA）是一种降维技术，用于在机器学习中减少数据集的特征数量，同时尽可能保留数据的信息量。其原理是找到数据变化最大的方向（主成分），并将数据投影到这些方向构成的低维子空间上。\n\n#### 7) 
为什么ReLU在神经网络中比Sigmoid更好、更常用？[[来源]](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)\n\n- 计算效率：由于ReLU是一个简单的阈值函数，前向传播和反向传播的速度更快。\n- 减少梯度消失的风险：ReLU对正数的导数为1，对负数的导数为0；而Sigmoid激活函数在输入稍有变化时就会迅速饱和（梯度接近于0），从而导致梯度消失的问题。\n- 稀疏性：当ReLU的输入为负数时，会产生稀疏激活现象，即只有少数神经元会被激活，从而使网络更加轻量化。\n\n\n[[来源1]](https:\u002F\u002Fmedium.com\u002Fthe-theory-of-everything\u002Funderstanding-activation-functions-in-neural-networks-9491262884e0) [[来源2]](https:\u002F\u002Fstats.stackexchange.com\u002Fquestions\u002F126238\u002Fwhat-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-networks)\n\n\n\n#### 8) 给定一维卷积神经网络中每一层的步长S和卷积核大小，编写一个函数来计算网络中某个节点的[感受野](https:\u002F\u002Fwww.quora.com\u002FWhat-is-a-receptive-field-in-a-convolutional-neural-network)。这实际上就是计算有多少个输入节点会连接到CNN中的某个神经元。[[来源]](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcomputervision\u002Fcomments\u002F7gku4z\u002Ftechnical_interview_questions_in_cv\u002F)\n\n感受野是指在一次运算中，用于生成输出的输入区域的一部分。\n\n假设CNN的滤波器大小为k，某一层的感受野就是滤波器使用的输入数量k，再乘以输入中未被卷积滤波器缩小的维度a。这样就得到了k×a的感受野。\n\n更直观地说，对于一个32×32×3的图像，如果使用5×5的卷积滤波器，那么对应的感受野就是滤波器尺寸5乘以输入体积的深度（RGB颜色通道），即5×5×3的尺寸。\n\n#### 9) 在图像\u002F矩阵上实现[连通组件](http:\u002F\u002Faishack.in\u002Ftutorials\u002Flabelling-connected-components-example\u002F)。[[来源]](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcomputervision\u002Fcomments\u002F7gku4z\u002Ftechnical_interview_questions_in_cv\u002F)\n\n\n#### 10) 用C++实现一个稀疏矩阵类。[[来源]](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcomputervision\u002Fcomments\u002F7gku4z\u002Ftechnical_interview_questions_in_cv\u002F)\n\n[[答案]](https:\u002F\u002Fwww.geeksforgeeks.org\u002Fsparse-matrix-representation\u002F)\n\n#### 11) 
编写一个函数来计算[积分图像](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FSummed-area_table)，并再编写一个函数从积分图像中获取区域和。[[来源]](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcomputervision\u002Fcomments\u002F7gku4z\u002Ftechnical_interview_questions_in_cv\u002F)\n\n[[答案]](https:\u002F\u002Fwww.geeksforgeeks.org\u002Fsubmatrix-sum-queries\u002F)\n\n#### 12) 当试图从带有噪声的样本中估计一个平面时，如何去除异常值？[[来源]](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcomputervision\u002Fcomments\u002F7gku4z\u002Ftechnical_interview_questions_in_cv\u002F)\n\n随机抽样一致性算法（RANSAC）是一种迭代方法，用于从包含异常值的数据集中估计数学模型的参数，且不让异常值影响估计结果。[[来源]](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FRandom_sample_consensus)\n\n\n\n#### 13) [CBIR](https:\u002F\u002Fwww.robots.ox.ac.uk\u002F~vgg\u002Fpublications\u002F2013\u002Farandjelovic13\u002Farandjelovic13.pdf)是如何工作的？[[来源]](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcomputervision\u002Fcomments\u002F7gku4z\u002Ftechnical_interview_questions_in_cv\u002F)]\n\n[[答案]](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FContent-based_image_retrieval)\n基于内容的图像检索是指利用图像本身的内容来提取元数据的概念。与目前基于图像关联关键词的检索方法不同，该技术通过计算机视觉技术生成元数据，以提取在查询过程中所需的相关信息。实现方式多种多样，从特征检测来提取关键词，到使用卷积神经网络（CNN）提取密集特征，并将其映射到已知的关键词分布上。\n\n采用后一种方法时，我们不再关注图像中具体显示的内容，而是更关注由已知图像生成的元数据与一组已知标签或标记在该元数据空间中的相似性。\n\n#### 14) 图像配准是如何工作的？稀疏与稠密[光流](http:\u002F\u002Fwww.ncorr.com\u002Fdownload\u002Fpublications\u002Fbakerunify.pdf)等方法。[[来源]](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcomputervision\u002Fcomments\u002F7gku4z\u002Ftechnical_interview_questions_in_cv\u002F)\n\n#### 15) 
请描述卷积是如何工作的。如果输入是灰度图像和RGB图像，又会有什么不同？下一层的形状由什么决定？[[来源]](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcomputervision\u002Fcomments\u002F7gku4z\u002Ftechnical_interview_questions_in_cv\u002F)\n在卷积神经网络（CNN）中，卷积操作会使用一个称为核或滤波器的小矩阵对输入图像进行处理。核以步长滑过图像，与图像对应位置的元素逐点相乘并求和，最终得到的结果称为特征图。\n\n当输入为RGB图像（或多于3个通道）时，滑动窗口则会变成一个三维立方体。下一层的形状由卷积核大小、卷积核数量、步长、填充以及膨胀率等因素决定。\n\n[[来源1]](https:\u002F\u002Fdev.to\u002Fsandeepbalachandran\u002Fmachine-learning-convolution-with-color-images-2p41)[[来源2]](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F70231487\u002Foutput-dimensions-of-convolution-in-pytorch)\n\n#### 16) 请告诉我如何根据物体周围各个角度拍摄的图像及深度传感器测量数据，创建该物体的3D模型。[[来源]](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcomputervision\u002Fcomments\u002F7gku4z\u002Ftechnical_interview_questions_in_cv\u002F)\n\n3D重建主要有两种流行的方法：\n* 运动恢复结构（SfM）[[来源]](https:\u002F\u002Fwww.mathworks.com\u002Fhelp\u002Fvision\u002Fug\u002Fstructure-from-motion.html)\n\n* 多视角立体视觉（MVS）[[来源]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Zwwty2qPNs8)\n\nSfM更适合用于构建大型场景的模型，而MVS则更适合用于构建小型物体的模型。\n\n\n#### 17) 不使用任何特殊函数，仅利用基本算术运算，实现SQRT(const double & x)。[[来源]](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcomputervision\u002Fcomments\u002F7gku4z\u002Ftechnical_interview_questions_in_cv\u002F)\n\n可以使用泰勒级数来近似计算sqrt(x)：\n\n[[答案]](https:\u002F\u002Fmath.stackexchange.com\u002Fquestions\u002F732540\u002Ftaylor-series-of-sqrt1x-using-sigma-notation)\n\n#### 18) 反转一个位串。[[来源]](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcomputervision\u002Fcomments\u002F7gku4z\u002Ftechnical_interview_questions_in_cv\u002F)\n\n如果你使用的是Python3：\n\n```\ndata = b'\\xAD\\xDE\\xDE\\xC0'\nmy_data = bytearray(data)\nmy_data.reverse()\n```\n\n#### 19) 尽可能高效地实现非极大值抑制。[[来源]](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcomputervision\u002Fcomments\u002F7gku4z\u002Ftechnical_interview_questions_in_cv\u002F)\n\n非极大值抑制（NMS）是一种用于消除同一图像中对同一目标多次检测的技术。\n解决这一问题的第一步是按照置信度分数对边界框进行排序（N 
LogN）。然后从得分最高的框开始，移除那些与当前框重叠度（IoU）超过特定阈值的框。（N^2）\n\n为了优化这个方案，可以使用R树或KD树等特殊数据结构来快速查询重叠的框。（N LogN）\n[[来源]](https:\u002F\u002Ftowardsdatascience.com\u002Fnon-maxima-suppression-139f7e00f0b5)\n\n#### 20) 在原地反转一个链表。[[来源]](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcomputervision\u002Fcomments\u002F7gku4z\u002Ftechnical_interview_questions_in_cv\u002F)\n\n[[答案]](https:\u002F\u002Fwww.geeksforgeeks.org\u002Freverse-a-linked-list\u002F)\n\n#### 21) 什么是数据归一化？为什么我们需要它？[[来源]](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)\n数据归一化是非常重要的预处理步骤，用于将数值缩放到特定范围，以确保反向传播过程中的更好收敛性。通常，这涉及对每个数据点减去其均值，再除以其标准差。如果不进行归一化，某些数值较大的特征会在损失函数中被赋予更高的权重（例如，一个高量级特征变化1%，影响就很大；而对于低量级特征来说，这种变化几乎可以忽略不计）。通过归一化，所有特征的权重才能趋于一致。\n\n#### 22) 为什么我们在处理图像时使用卷积而不是全连接层？[[来源]](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)\n首先，卷积能够保留、编码并充分利用图像中的空间信息。如果我们只使用全连接层，就无法获取相对的空间信息。其次，卷积神经网络（CNN）具有一定的平移不变性，因为每个卷积核都充当独立的滤波器或特征检测器。\n\n#### 23) 是什么使CNN具有平移不变性？[[来源]](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)\n如上所述，每个卷积核都充当独立的滤波器或特征检测器。因此，假设你在进行目标检测，无论目标位于图像的哪个位置，都不会影响结果，因为我们始终会以滑动窗口的方式在整个图像上应用卷积操作。\n\n#### 24) 为什么分类任务的CNN中要使用最大池化？[[来源]](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)\n最大池化在计算机视觉中有重要作用。它能够减少计算量，因为在池化之后，特征图的尺寸会变小。同时，由于取的是最大激活值，语义信息也不会丢失太多。此外，还有观点认为，最大池化也有助于增强CNN的平移不变性。可以参考吴恩达关于[最大池化的优点](https:\u002F\u002Fwww.coursera.org\u002Flearn\u002Fconvolutional-neural-networks\u002Flecture\u002FhELHk\u002Fpooling-layers)的精彩视频。\n\n#### 25) 为什么分割卷积神经网络通常采用编码器-解码器的结构？[[来源](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)] \n编码器可以被看作是一个特征提取网络，而解码器则利用这些特征信息，通过“解码”并上采样到原始图像尺寸来预测图像的各个分割区域。\n\n#### 26) 
残差网络有什么重要意义？[[来源](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)] \n残差连接的主要作用是允许特征直接从前面的层获取信息。这使得信息在网络中的传播变得更加容易。一篇非常有趣的论文表明，使用局部跳跃连接可以使网络形成一种多路径的集成结构，从而使特征可以通过多种路径在网络中传播。\n\n#### 27) 什么是批归一化？它为什么有效？[[来源](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)] \n训练深度神经网络的一个复杂之处在于，随着前一层参数的变化，每一层输入的分布也会在训练过程中发生变化。批归一化的思想就是对每一层的输入进行归一化处理，使其输出激活值的均值为零、标准差为一。这一操作是在每个单独的小批量数据上完成的，即仅计算该小批量的均值和方差，然后进行归一化。这类似于对网络输入进行标准化的方式。那么，这种做法有什么好处呢？我们知道，对网络输入进行归一化有助于模型的学习。然而，神经网络实际上是由一系列层组成的，其中一层的输出会作为下一层的输入。因此，我们可以将神经网络中的任意一层视为一个更小子网络的第一层。从这个角度来看，我们可以在应用激活函数之前对某一层的输出进行归一化，然后再将其传递给下一层（子网络）。\n\n#### 28) 为什么通常使用多个3×3这样的小卷积核，而不是少数几个大卷积核？[[来源](http:\u002F\u002Fhouseofbots.com\u002Fnews-detail\u002F2849-4-data-science-and-machine-learning-interview-questions)] \n这一点在[VGGNet论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1409.1556.pdf)中有非常清晰的解释。原因有两点：首先，使用多个小卷积核可以获得与少数大卷积核相同的感受野，并捕捉更多的空间上下文信息，但小卷积核所需的参数和计算量更少。其次，由于小卷积核需要使用更多的滤波器，因此可以引入更多的激活函数，从而使卷积神经网络学习到更具区分性的映射函数。\n\n#### 29) 为什么我们需要验证集和测试集？它们之间有什么区别？[[来源](https:\u002F\u002Fwww.toptal.com\u002Fmachine-learning\u002Finterview-questions)] \n在训练模型时，我们会将可用数据划分为三个独立的集合：\n\n - 训练集用于拟合模型的参数。然而，在训练集上得到的准确率并不能可靠地预测模型在新样本上的表现。\n - 验证集用于评估模型在未参与训练的数据上的表现。基于验证集计算的指标可以用来调整模型的超参数。但是，每次我们评估验证集并根据其结果做出决策时，都会将验证集的信息泄露到模型中。评估次数越多，泄露的信息就越多。这样一来，我们就有可能对验证集过拟合，最终导致验证分数无法再可靠地反映模型在实际应用中的表现。\n - 测试集用于评估模型在从未见过的数据上的表现。只有在我们已经通过验证集调优了模型参数之后，才能使用测试集。\n\n因此，如果我们省略测试集而只使用验证集，那么验证分数就无法很好地估计模型的泛化能力。\n\n#### 30) 什么是分层交叉验证？我们在什么情况下应该使用它？[[来源](https:\u002F\u002Fwww.toptal.com\u002Fmachine-learning\u002Finterview-questions)] \n交叉验证是一种将数据划分为训练集和验证集的技术。在普通的交叉验证中，这种划分是随机进行的。而在分层交叉验证中，划分会保持训练集和验证集中各类别比例的一致性。\n\n例如，假设我们有一个数据集，其中类别A占10%，类别B占90%。如果使用分层交叉验证，那么训练集和验证集中的比例将完全相同。相反，如果使用普通的交叉验证，在最坏的情况下，验证集中可能完全没有类别A的样本。\n\n分层交叉验证适用于以下场景：\n\n - 数据集中包含多个类别。数据集越小、类别分布越不均衡，就越需要使用分层交叉验证。\n - 
 - A dataset contains data with different distributions. For example, a dataset for autonomous driving may contain images taken during the day and at night. If we do not ensure that both types are present in both the training and the validation sets, the model will have generalization problems.

#### 31) Why do ensembles typically outperform individual models? [[Source](https://www.toptal.com/machine-learning/interview-questions)]
An ensemble combines multiple models to produce a single prediction. The key to better predictions is that the individual models should make different errors; that way, one model's mistakes can be compensated by the correct predictions of the others, improving the ensemble's overall performance.

To build an effective ensemble we need diverse sub-models. Diversity can be achieved by:
 - Using different machine-learning algorithms. For example, combining logistic regression, k-nearest neighbors, and decision trees.
 - Training on different subsets of the data. This is called bootstrap aggregating (bagging).
 - Giving different weights to each example in the training set. If the sample weights are adjusted iteratively according to the ensemble's errors, this is called boosting.

Many winning solutions in data-science competitions are ensembles. In real-world machine-learning projects, however, engineers need to strike a balance between execution time and accuracy.

#### 32) What is an imbalanced dataset? Can you list some ways to deal with it? [[Source](https://www.toptal.com/machine-learning/interview-questions)]
An imbalanced dataset is one in which the target classes have unequal proportions. For example, in a medical-imaging dataset for detecting a disease, there are typically far more negative than positive samples — say, 98% of the images show no disease and only 2% do.

There are different ways to handle imbalanced datasets:
- Oversampling or undersampling. Instead of sampling uniformly from the training data, we can use other distributions so that the model sees a more balanced dataset.
- Data augmentation. We can add data for the under-represented classes by modifying existing data in controlled ways. In the example above, we could flip the images showing the disease, or add noise to copies of them while making sure the disease remains visible.
- Using appropriate evaluation metrics. In the example above, a model that always predicts "negative" would still reach 98% accuracy. With imbalanced datasets, metrics such as precision, recall, and the F1 score reflect the model's quality far better.

#### 33) Can you explain the differences between supervised, unsupervised, and reinforcement learning? [[Source](https://www.toptal.com/machine-learning/interview-questions)]
In supervised learning, we train a model to learn the relationship between input and output data. Supervised learning requires labeled data.

In unsupervised learning, we only have unlabeled data. The model learns a representation of the data. Unsupervised learning is often used to initialize model parameters when we have a lot of unlabeled data and only a small amount of labeled data: we first train an unsupervised model and then use its weights to train a supervised model.

In reinforcement learning, the model receives input data and is rewarded based on its output. The model learns a policy that maximizes the reward. Reinforcement learning has been applied successfully to strategy games such as Go and even classic Atari video games.

#### 34) What is data augmentation? Can you give some examples? [[Source](https://www.toptal.com/machine-learning/interview-questions)]
Data augmentation is a technique for synthesizing new data by modifying existing data in ways that do not change the target, or change it in a known way.

Computer vision is a field where data augmentation is particularly useful. There are many modifications we can apply to images:
- Resizing
- Horizontal or vertical flipping
- Rotation
- Adding noise
- Deformation
- Color modification

Each problem needs a customized augmentation pipeline. For example, in optical character recognition (OCR), flipping changes the text and is not applicable, but resizing and small rotations may help.

#### 35) What is the Turing test? [[Source](https://intellipaat.com/interview-question/artificial-intelligence-interview-questions/)]
The Turing test is a method for testing whether a machine matches human-level intelligence: the machine is pitted against human intelligence, and if it passes the test it is deemed intelligent. Note, however, that even a machine that imitates human behavior does not necessarily truly understand humans.

#### 36) What is precision?
Precision (also called the positive predictive value) is the fraction of relevant instances among the retrieved instances.
Precision = True Positives / (True Positives + False Positives)
[[Source]](https://en.wikipedia.org/wiki/Precision_and_recall)

#### 37) What is recall?
Recall (also known as sensitivity) is the fraction of all relevant instances that were correctly retrieved.
Recall = True Positives / (True Positives + False Negatives)
[[Source]](https://en.wikipedia.org/wiki/Precision_and_recall)

#### 38) Define the F1 score. [[Source](https://intellipaat.com/interview-question/artificial-intelligence-interview-questions/)]
It is the weighted (harmonic) average of precision and recall. It takes both false positives and false negatives into account and is commonly used to measure model performance.
F1 score = 2 * (precision * recall) / (precision + recall)

#### 39) What is a loss function? [[Source](https://intellipaat.com/interview-question/artificial-intelligence-interview-questions/)]
A loss function is a scalar function that quantifies how wrong a neural network is: the lower the loss, the better the network performs. For example, on the MNIST dataset, if the input image is the digit 2 and the network wrongly predicts 3, the loss function assigns a penalty to that incorrect prediction.

#### 40) List different activation functions (neuron types). [[Source](https://intellipaat.com/interview-question/artificial-intelligence-interview-questions/)]
- Linear neurons
- Binary threshold neurons
- Stochastic binary neurons
- Sigmoid neurons
- Tanh
- Rectified linear units (ReLU)

#### 41) Define the learning rate.
The learning rate is a hyperparameter that controls how much we adjust the network's weights at each step of gradient descent. [[Source](https://en.wikipedia.org/wiki/Learning_rate)]

#### 42) What is momentum (with respect to neural-network optimization)?
Momentum lets the optimization algorithm remember the direction of its previous step and add a fraction of it to the current step. This way, even if the algorithm gets stuck in a flat region or a shallow local minimum, it can break out and keep moving toward the true minimum. [[Source]](https://www.quora.com/What-is-the-difference-between-momentum-and-learning-rate)

#### 43) What is the difference between batch gradient descent and stochastic gradient descent?
Batch gradient descent computes the gradient using the whole dataset. This works well for convex or relatively smooth error surfaces, where we can move fairly directly toward a local or global optimum. Moreover, with a gradually decreasing learning rate, batch gradient descent will eventually find the minimum of the basin of attraction it sits in.

Stochastic gradient descent (SGD) computes the gradient from a single sample at a time. SGD performs better (not "well", but better than batch gradient descent) on error surfaces with lots of local extrema. Because the gradient is computed from fewer samples, it is noisier, which helps the model jump out of local minima into regions that are hopefully better. [[Source]](https://stats.stackexchange.com/questions/49528/batch-gradient-descent-versus-stochastic-gradient-descent)

#### 44) What is the difference between an epoch, a batch, and an iteration?
- **Epoch**: one forward pass and one backward pass over **all** training examples.
- **Batch**: the examples processed together in one forward/backward pass.
- **Iteration**: one forward/backward pass over a single batch; the number of iterations per epoch equals the total number of training examples divided by the batch size.
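The contrast in question 43 can be made concrete with a toy 1-D least-squares problem in plain Python. The dataset, learning rate, and step counts below are invented for illustration only:

```python
import random

# toy problem: minimize mean (w*x - y)^2 over the dataset; the true slope is 2
data = [(x, 2.0 * x) for x in range(1, 11)]

def grad(w, samples):
    # gradient of the mean squared error with respect to w over the given samples
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)

def batch_gd(w=0.0, lr=0.005, steps=100):
    # batch GD: every step uses the gradient over the FULL dataset (smooth path)
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w

def sgd(w=0.0, lr=0.005, steps=100, seed=0):
    # SGD: every step uses the gradient from ONE randomly chosen sample (noisy path)
    rng = random.Random(seed)
    for _ in range(steps):
        w -= lr * grad(w, [rng.choice(data)])
    return w
```

On this convex toy surface both converge to the true slope; the point is that SGD's updates are noisy estimates of the batch gradient, which is exactly the noise that helps escape local minima on non-convex surfaces.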
#### 45) What is the vanishing gradient problem? [[Source](https://intellipaat.com/interview-question/artificial-intelligence-interview-questions/)]
As we add more and more hidden layers, backpropagation becomes less and less effective at passing information down to the lower layers. In effect, as information flows backwards, the gradients begin to vanish, becoming very small relative to the network's weights.

#### 46) What is dropout? [[Source](https://intellipaat.com/interview-question/artificial-intelligence-interview-questions/)]
Dropout is a simple and effective way to prevent a neural network from overfitting. It works by randomly dropping out a fraction of the network's units. It is analogous to reproduction in nature, where nature produces offspring by combining distinct genes (dropping out the others) rather than strengthening their co-adaptation.

#### 47) Define LSTM. [[Source](https://intellipaat.com/interview-question/artificial-intelligence-interview-questions/)]
Long short-term memory networks (LSTMs) are explicitly designed to address the long-term dependency problem by maintaining a state and deciding what to remember and what to forget.

#### 48) List the key components of an LSTM. [[Source](https://intellipaat.com/interview-question/artificial-intelligence-interview-questions/)]
 - Gates (forget, memory, update, and read gates)
 - tanh(x) (values between -1 and 1)
 - sigmoid(x) (values between 0 and 1)

#### 49) List the variants of RNNs. [[Source](https://intellipaat.com/interview-question/artificial-intelligence-interview-questions/)]
 - LSTM: long short-term memory
 - GRU: gated recurrent unit
 - End-to-end networks
 - Memory networks

#### 50) What is an autoencoder? Name a few applications. [[Source](https://intellipaat.com/interview-question/artificial-intelligence-interview-questions/)]
An autoencoder is mainly used to learn a compressed representation of given data. Common applications include:
 - Data denoising
 - Dimensionality reduction
 - Image reconstruction
 - Image colorization

#### 51) What are the components of a GAN? [[Source](https://intellipaat.com/interview-question/artificial-intelligence-interview-questions/)]
 - Generator
 - Discriminator

#### 52) What is the difference between boosting and bagging?
Boosting and bagging are both ensemble techniques: they combine a number of weak learners (classifiers or regressors that are only slightly better than random guessing) to create a strong learner that can make accurate predictions. Bagging bootstraps the dataset with replacement and trains each (potentially weak) learner on one of the bootstrapped samples. Boosting, by contrast, trains every learner on the full dataset, but examples that were misclassified earlier are given higher weights in subsequent rounds, so later learners focus on them. [[Source]](https://www.quora.com/Whats-the-difference-between-boosting-and-bagging)

#### 53) Explain how a ROC curve works. [[Source]](https://www.springboard.com/blog/machine-learning-interview-questions/)
The ROC curve is a graphical representation of the contrast between the true-positive rate and the false-positive rate at various thresholds. It is often used to measure the trade-off between a model's sensitivity (true positives) and its fall-out (false positives).
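The threshold sweep behind the ROC curve in question 53 can be sketched in a few lines of plain Python. The scores, labels, and thresholds below are made-up toy values:

```python
def roc_points(scores, labels, thresholds):
    # one (FPR, TPR) point per threshold; predict positive when score >= t
    pos = sum(labels)            # number of actual positives
    neg = len(labels) - pos      # number of actual negatives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))  # (false-positive rate, true-positive rate)
    return points
```

Lowering the threshold admits more positives of both kinds, moving the operating point up and to the right along the curve — which is exactly the sensitivity/fall-out trade-off described above.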
#### 54) What is the difference between Type I and Type II error? [[Source]](https://www.springboard.com/blog/machine-learning-interview-questions/)
A Type I error is a false positive, while a Type II error is a false negative. Briefly, a Type I error means claiming something happened when it hasn't, while a Type II error means claiming nothing happened when in fact something has. A memorable way to think about it: a Type I error is telling a man he is pregnant, while a Type II error is telling a pregnant woman she isn't carrying a baby.

#### 55) What is the difference between generative and discriminative models? [[Source]](https://www.springboard.com/blog/machine-learning-interview-questions/)
A generative model learns the categories of the data, while a discriminative model simply learns the distinctions between categories. Discriminative models generally outperform generative models on classification tasks.

#### 56) Instance-based versus model-based learning.

 - **Instance-based learning**: the system memorizes the training examples, then generalizes to new cases using a similarity measure.

 - **Model-based learning**: another way to generalize from a set of examples is to build a model from those examples and then use that model to make predictions. This is called model-based learning.
[[Source]](https://medium.com/@sanidhyaagrawal08/what-is-instance-based-and-model-based-learning-s1e10-8e68364ae084)

#### 57) When should you use label encoding versus one-hot encoding?

This generally depends on your dataset and the model you intend to use. Still, there are a few points to keep in mind before choosing the right encoding for your model:

We apply one-hot encoding when:

- The categorical feature is not ordinal (e.g., country names)
- The number of categories is small, so one-hot encoding can be applied effectively

We apply label encoding when:

- The categorical feature is ordinal (e.g., lower primary, upper primary, middle school, high school)
- The number of categories is large, since one-hot encoding can lead to high memory consumption

[[Source]](https://www.analyticsvidhya.com/blog/2020/03/one-hot-encoding-vs-label-encoding-using-scikit-learn/)

#### 58) What is the difference between LDA and PCA for dimensionality reduction?

Both LDA and PCA are linear transformation techniques: LDA is supervised whereas PCA is unsupervised — PCA ignores class labels. We can picture PCA as a technique that finds the directions of maximal variance in the data. In contrast, LDA attempts to find a feature subspace that maximizes class separability.

[[Source]](https://sebastianraschka.com/faq/docs/lda-vs-pca.html)

#### 59) What is t-SNE?

t-distributed stochastic neighbor embedding (t-SNE) is an unsupervised, non-linear technique primarily used for data exploration and visualizing high-dimensional data. In simpler terms, t-SNE gives you an intuition of how the data is arranged in a high-dimensional space.

[[Source]](https://towardsdatascience.com/an-introduction-to-t-sne-with-python-example-5a3a293108d1)
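The rule of thumb in question 57 can be made concrete with a tiny plain-Python sketch. The category lists are invented; in practice you would reach for scikit-learn's `LabelEncoder`/`OneHotEncoder` or pandas' `get_dummies`:

```python
# invented example categories
GRADES = ["lower_primary", "upper_primary", "middle_school", "high_school"]  # ordinal
COUNTRIES = ["Brazil", "France", "Japan"]  # nominal: no meaningful order

def label_encode(values, order):
    # ordinal feature: map each category to its rank, preserving the order
    rank = {v: i for i, v in enumerate(order)}
    return [rank[v] for v in values]

def one_hot_encode(values, categories):
    # nominal feature: one indicator column per category, no spurious ordering
    return [[1 if v == c else 0 for c in categories] for v in values]
```

Note how label-encoding `COUNTRIES` would invent an ordering (Brazil < France < Japan) that many models would wrongly exploit, while one-hot encoding `GRADES` would discard the ordering and inflate memory for large category sets — the two failure modes the answer above warns about.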
#### 60) What is the difference between t-SNE and PCA for dimensionality reduction?

The first thing to note is that PCA was proposed in 1933, while t-SNE was developed in 2008. A lot has changed in data science since 1933, above all in compute power and data volume. Second, PCA is a linear dimensionality-reduction technique that seeks to maximize variance and preserve large pairwise distances — in other words, samples that differ greatly end up far apart after the reduction. This can lead to poor visualizations when the data lies on a non-linear manifold; a manifold here can be thought of as any geometric shape, such as a cylinder, sphere, or curve.

Unlike PCA, t-SNE preserves only small pairwise distances, i.e., local similarities, whereas PCA focuses on preserving large pairwise distances by maximizing variance.

[[src]](https://towardsdatascience.com/an-introduction-to-t-sne-with-python-example-5a3a293108d1)

#### 61) What is UMAP?

UMAP (Uniform Manifold Approximation and Projection) is a novel manifold-learning technique for dimensionality reduction. UMAP is built on a theoretical framework of Riemannian geometry and algebraic topology, and yields a practical, scalable algorithm applicable to real-world data.

[[src]](https://arxiv.org/abs/1802.03426#:~:text=UMAP%20)

#### 62) What is the difference between t-SNE and UMAP for dimensionality reduction?

The biggest difference between the outputs of UMAP and t-SNE is the balance between local and global structure: UMAP is often better at preserving global structure in the final projection. This means inter-cluster relations tend to be more meaningful in a UMAP visualization than in a t-SNE one. It is important to note, however, that because both UMAP and t-SNE necessarily distort the high-dimensional shape of the data when projecting it to lower dimensions, any given axis or distance in the low-dimensional space is not directly interpretable the way it is in PCA.

[[src]](https://pair-code.github.io/understanding-umap/)

#### 63) How does a random number generator work, e.g., rand() in Python?
It generates pseudo-random numbers from a seed, and there are several well-known algorithms for this. See the following link for more information.
[[src]](https://en.wikipedia.org/wiki/Linear_congruential_generator)

#### 64) Given that we want to evaluate the performance of n different machine-learning models on the same dataset, why would the following splitting mechanism be incorrect?
```python
import numpy as np
import pandas as pd

def get_splits():
    df = pd.DataFrame(...)
    rnd = np.random.rand(len(df))
    train = df[ rnd < 0.8 ]
    valid = df[ (rnd >= 0.8) & (rnd < 0.9) ]  # parentheses required: & binds tighter than >=
    test  = df[ rnd >= 0.9 ]

    return train, valid, test

# Model 1

from sklearn.tree import DecisionTreeClassifier
train, valid, test = get_splits()
...

# Model 2

from sklearn.linear_model import LogisticRegression
train, valid, test = get_splits()
...
```
np.random.rand() returns different values on every run, so executing this splitting mechanism again produces a different 80% of rows than the first time. That is a problem, because we need to compare the models' performance on the same test set. To make the sampling reproducible and consistent, we can set a random seed in advance, or save the splits once the data has been divided. Another approach is simply to set the `random_state` parameter in sklearn's train_test_split() function, which yields identical train, validation, and test sets across runs.

[[src]](https://towardsdatascience.com/why-do-we-set-a-random-state-in-machine-learning-models-bb2dc68d8431#:~:text=In%20Scikit%2Dlearn%2C%20the%20random,random%20state%20instance%20from%20np.)

#### 65) What is the difference between Bayesian and frequentist statistics? [[src]](https://www.kdnuggets.com/2022/10/nlp-interview-questions.html)
Frequentist statistics is a framework that focuses on estimating population parameters from sample statistics, providing point estimates and confidence intervals.

Bayesian statistics, by contrast, uses prior knowledge and information to update beliefs about a parameter or hypothesis, and provides probability distributions for parameters.

The fundamental difference is that Bayesian statistics incorporates prior knowledge and beliefs into the analysis, while frequentist statistics does not.

## Contributing
Contributions are most welcome:
1. Clone the repository.
2. Commit your *questions* or *answers*.
3. Open a **pull request**.

## Further learning material
* Preparation resources and modern references: [`docs/resources-and-references.md`](./docs/resources-and-references.md)
* Suggested topic breakdown: [`docs/study-pattern.md`](./docs/study-pattern.md)

---

# CrackingMachineLearningInterview Quick Start Guide

CrackingMachineLearningInterview is an interview-preparation resource library built for machine-learning engineers, AI engineers, data scientists, and MLOps engineers. It spans everything from classic machine-learning algorithms to the latest large language models (LLMs), RAG, agent systems, and production-grade AI architectures.

This guide helps you get to the material quickly and review efficiently.

## Prerequisites

The project is primarily a collection of documents and code samples, so no complex runtime is needed just to read it. To run the code samples (e.g., LangChain, PyTorch, or n8n workflows), the following environment is recommended:

*   **Operating system**: Windows, macOS, or Linux
*   **Core dependencies**:
    *   Git (to clone the repository)
    *   Python 3.9+ (3.10 or later recommended for the newest AI libraries)
    *   Node.js (optional, for n8n or frontend-related examples)
*   **Recommended editor**: VS Code (best with a Markdown preview plugin)
*   **Network**: some external links and model downloads may require a stable connection.

## Installation

You can clone the GitHub repository for the latest content, or simply browse online.

### Option 1: Clone the repository (recommended)

Run the following commands in a terminal:

```bash
git clone https://github.com/shafaypro/CrackingMachineLearningInterview.git
cd CrackingMachineLearningInterview
```

### Option 2: Browse online

If you only want to read the content, you can visit the project's GitHub Pages site directly for a better UI/UX
experience:
[https://shafaypro.github.io/CrackingMachineLearningInterview/](https://shafaypro.github.io/CrackingMachineLearningInterview/)

### (Optional) Set up a Python learning environment

To run the `coding_challenges` or the code samples under each track, create a virtual environment and install the basic dependencies:

```bash
python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate

# Install common AI base libraries (install per chapter as needed)
pip install torch transformers langchain langgraph crewai pandas scikit-learn fastapi
```

> **Tip**: developers in mainland China can speed up installation with the Tsinghua or Alibaba mirrors:
> `pip install -i https://pypi.tuna.tsinghua.edu.cn/simple <package_name>`

## Basic usage

The repository is organized into learning tracks. You can follow the recommended order, or jump straight to the material for a specific role.

### 1. Follow the recommended learning path

Open `README.md` in the root directory, or go straight to the **Suggested Learning Order** section, and follow the path from theory to production:

1.  **Classic ML Track**: classic machine-learning algorithms (time series, clustering, etc.).
2.  **Deep Learning Track**: deep-learning fundamentals and Transformers.
3.  **AI / GenAI Track**: large models, RAG, agents, and multimodal techniques.
4.  **Data Engineering Track**: data pipelines, Spark, Kafka, etc.
5.  **MLOps Track**: model deployment, monitoring, evaluation, and LLMOps.
6.  **System Design Track**: AI system design and architecture patterns.

### 2. Targeted review example

Suppose you are preparing for an **AI engineer (GenAI)** interview. Go directly to the `ai_genai/` directory for the core topics:

```bash
# Enter the generative AI topic directory
cd ai_genai

# Key documents to read (open them in a file manager or VS Code):
# - intro_llm_fundamentals.md : LLM fundamentals
# - intro_rag_engineering.md  : RAG engineering practice
# - intro_agent_tool_use.md   : agent tool use
# - intro_multi_agent_systems.md : multi-agent systems
```

### 3. Hands-on project practice

The repository highlights several projects (Highlighted Projects) worth reproducing to enrich your portfolio. For example, to build a **multi-agent research assistant**:

1.  Go to `ai_genai/intro_agent_tool_use.md` for the theory.
2.  Refer to the code snippets in `ai_genai/intro_crewai.md` or `ai_genai/intro_langgraph.md`.
3.  Create a local `my_agent_project` folder, copy the code, and adjust the prompts and tool definitions following the examples.
4.  Run the script and verify that the agent calls its tools correctly and answers questions.
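Step 3 above asks you to adapt prompts and tool definitions. As a purely illustrative sketch of the tool-dispatch idea in plain Python — every name below (`search_docs`, `TOOLS`, `run_agent`) is invented here and is not code from the repository's CrewAI/LangGraph examples:

```python
# Purely illustrative tool-calling skeleton -- NOT code from the repository.
def search_docs(query: str) -> str:
    # stand-in for a real retrieval tool (web search, vector store, etc.)
    return f"top result for: {query}"

# the "tool definitions" an agent framework would register
TOOLS = {"search_docs": search_docs}

def run_agent(plan):
    # 'plan' stands in for the (tool, argument) steps an LLM would choose
    results = []
    for tool_name, arg in plan:
        results.append(TOOLS[tool_name](arg))  # dispatch to the registered tool
    return results
```

Frameworks like CrewAI and LangGraph wrap this loop with LLM-driven planning, state, and error handling, but the register-then-dispatch shape is the part you customize when you "adjust tool definitions".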
### 4. Question banks and mock interviews

*   **Latest questions**: see `docs/2026-additional-questions.md` for modern interview questions covering LLMs, RAG, and agents.
*   **Classic questions**: see the `Classic Question Bank` section to review statistics and traditional algorithms.
*   **System design**: see the RAG pipeline design and batch-vs-realtime architecture case studies under `system_design/`.

By combining document reading with hands-on coding, you can systematically build the knowledge needed for senior AI roles.

---

Senior algorithm engineer Li Ming is preparing to interview for a machine-learning specialist role at a leading tech company, and urgently needs to organize a full-stack body of knowledge spanning everything from traditional models to the latest GenAI.

### Without CrackingMachineLearningInterview
- **No sense of direction**: faced with a flood of material, it is hard to tell which topics top companies will test in 2026 (e.g., agents, RAG), making it easy to waste effort on outdated techniques.
- **Recurring blind spots**: with no dedicated MLOps or cloud-platform training, answers to production-grade system-design questions such as model serving and feature stores come out hollow.
- **No realistic practice**: high-quality questions covering frontier scenarios such as multi-agent collaboration and LLM evaluation are hard to find, leaving him slow and disorganized when facing novel question types.
- **A scattered learning path**: without a clear roadmap, foundational theory and engineering practice stay disconnected, and no complete knowledge loop forms.

### With CrackingMachineLearningInterview
- **Pinpointed topics**: working directly from the "2026 interview roadmap" and the newly added Q&A bank, he focuses efficiently on modern interview essentials such as LLM scaling and agent architecture, multiplying his review efficiency.
- **Engineering gaps closed**: the dedicated MLOps and cloud-platform sections give him hands-on depth with MLflow, model monitoring, and AWS SageMaker, making his system-design answers substantive.
- **Comfortable with the cutting edge**: using the GenAI track's LangGraph, multi-agent systems, and advanced RAG case studies for mock practice, he can fluently break down complex frontier problems.
- **A clear knowledge system**: following the "Suggested Learning Order", he transitions smoothly from classic algorithms to production-grade AI systems, forming a rigorous and comprehensive technical map.

CrackingMachineLearningInterview turns scattered prep material into a structured playbook, helping candidates go from "grinding blindly" to "striking precisely" and markedly improving their odds of landing a top-tier offer.

","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshafaypro_CrackingMachineLearningInterview_3c29b0dd.png","shafaypro","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fshafaypro_f0728a4d.jpg","Data\u002FML Architect\r\nUpcoming Staff Data Engineer @ Good companies.","https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fimshafay\u002F","Berlin",null,"https:\u002F\u002Fgithub.com\u002Fshafaypro",[81],{"name":82,"color":83,"percentage":84},"HTML","#e34c26",100,604,125,"2026-04-10T14:23:22",1,"","Not specified",{"notes":92,"python":90,"dependencies":93},"This repository is an interview-preparation resource library consisting mainly of documents, roadmaps, Q&A guides, and project concept introductions (RAG, agents, MLOps, etc.). It is not executable software requiring a particular runtime, GPU, or dependency libraries: readers only need a browser to read the Markdown documents, or can set up environments for the companion learning projects by following the docs.",[],[35,14,13],"2026-03-27T02:49:30.150509","2026-04-13T13:34:20.557324",[],[]]