[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-GokuMohandas--mlops-course":3,"tool-GokuMohandas--mlops-course":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",142651,2,"2026-04-06T23:34:12",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":72,"owner_website":76,"owner_url":78,"languages":79,"stars":95,"forks":96,"last_commit_at":97,"license":98,"difficulty_score":32,"env_os":99,"env_gpu":100,"env_ram":101,"env_deps":102,"category_tags":107,"github_topics":109,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":122,"updated_at":123,"faqs":124,"releases":153},4882,"GokuMohandas\u002Fmlops-course","mlops-course","Learn how to design, develop, deploy and iterate on production-grade ML applications.","mlops-course 是一门专注于将机器学习与软件工程深度融合的实战课程，旨在帮助学习者掌握从模型实验到生产级应用部署的全流程技能。它解决了传统机器学习教学中“重算法、轻工程”的痛点，填补了学术理论与工业界实际需求之间的鸿沟，让用户能够构建可靠、可扩展且易于维护的 ML 系统。\n\n无论是软件工程师、数据科学家，还是希望建立技术认知的产品负责人或应届毕业生，都能从中获益。课程不要求用户切换编程语言，而是基于 Python 生态，引导大家运用软件工程的最佳实践来管理数据、训练、调优及服务化模型。\n\n其独特亮点在于坚持“第一性原理”教学，先厘清概念本质再动手编码；同时涵盖完整的 MLOps 组件串联，包括实验追踪、自动化测试、模型服务、工作流编排以及成熟的 CI\u002FCD 
流水线。通过该课程，开发者可以在不改变代码架构的前提下，平滑地将项目从开发环境迁移至生产环境，甚至利用 Anyscale 集群轻松实现算力扩展。这是一套帮助各类技术人员系统化提升工程落地能力的优质资源。","# MLOps Course\n\nLearn how to combine machine learning with software engineering to design, develop, deploy and iterate on production-grade ML applications.\n\n- Lessons: https:\u002F\u002Fmadewithml.com\u002F\n- Code: [GokuMohandas\u002FMade-With-ML](https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML)\n\n\u003Ca href=\"https:\u002F\u002Fmadewithml.com\u002F#course\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_f16d31b7ff13.png\" alt=\"lessons\">\n\u003C\u002Fa>\n\n## Overview\n\nIn this course, we'll go from experimentation (model design + development) to production (model deployment + iteration). We'll do this iteratively by motivating the components that will enable us to build a *reliable* production system.\n\n\u003Cblockquote>\n  \u003Cimg width=20 src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F0\u002F09\u002FYouTube_full-color_icon_%282017%29.svg\u002F640px-YouTube_full-color_icon_%282017%29.svg.png\">&nbsp; Be sure to watch the video below for a quick overview of what we'll be building.\n\u003C\u002Fblockquote>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fyoutu.be\u002FAWgkt8H8yVo\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_04d2f4d3e599.jpg\" alt=\"Course overview video\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n- **💡 First principles**: before we jump straight into the code, we develop a first principles understanding for every machine learning concept.\n- **💻 Best practices**: implement software engineering best practices as we develop and deploy our machine learning models.\n- **📈 Scale**: easily scale ML workloads (data, train, tune, serve) in Python without having to learn completely new languages.\n- **⚙️ MLOps**: connect MLOps components 
(tracking, testing, serving, orchestration, etc.) as we build an end-to-end machine learning system.\n- **🚀 Dev to Prod**: learn how to quickly and reliably go from development to production without any changes to our code or infra management.\n- **🐙 CI\u002FCD**: learn how to create mature CI\u002FCD workflows to continuously train and deploy better models in a modular way that integrates with any stack.\n\n## Audience\n\nMachine learning is not a separate industry; instead, it's a powerful way of thinking about data that's not reserved for any one type of person.\n\n- **👩‍💻 All developers**: whether software\u002Finfra engineer or data scientist, ML is increasingly becoming a key part of the products that you'll be developing.\n- **👩‍🎓 College graduates**: learn the practical skills required for industry and bridge the gap between the university curriculum and what industry expects.\n- **👩‍💼 Product\u002FLeadership**: who want to develop a technical foundation so that they can build amazing (and reliable) products powered by machine learning.\n\n## Set up\n\nBe sure to go through the [course](https:\u002F\u002Fmadewithml.com\u002F#course) for a much more detailed walkthrough of the content on this repository. We will have instructions for both local laptop and Anyscale clusters for the sections below, so be sure to toggle the ► dropdown based on what you're using (Anyscale instructions will be toggled on by default). 
If you do want to run this course with Anyscale, where we'll provide the **structure**, **compute (GPUs)** and **community** to learn everything in one weekend, join our next upcoming live cohort → [sign up here](https:\u002F\u002F4190urw86oh.typeform.com\u002Fmadewithml)!\n\n### Cluster\n\nWe'll start by setting up our cluster with the environment and compute configurations.\n\n\u003Cdetails>\n  \u003Csummary>Local\u003C\u002Fsummary>\u003Cbr>\n  Your personal laptop (single machine) will act as the cluster, where one CPU will be the head node and some of the remaining CPU will be the worker nodes. All of the code in this course will work in any personal laptop though it will be slower than executing the same workloads on a larger cluster.\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  We can create an [Anyscale Workspace](https:\u002F\u002Fdocs.anyscale.com\u002Fdevelop\u002Fworkspaces\u002Fget-started) using the [webpage UI](https:\u002F\u002Fconsole.anyscale.com\u002Fo\u002Fmadewithml\u002Fworkspaces\u002Fadd\u002Fblank).\n\n  ```md\n  - Workspace name: `madewithml`\n  - Project: `madewithml`\n  - Cluster environment name: `madewithml-cluster-env`\n  # Toggle `Select from saved configurations`\n  - Compute config: `madewithml-cluster-compute`\n  ```\n\n  > Alternatively, we can use the [CLI](https:\u002F\u002Fdocs.anyscale.com\u002Freference\u002Fanyscale-cli) to create the workspace via `anyscale workspace create ...`\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>Other (cloud platforms, K8s, on-prem)\u003C\u002Fsummary>\u003Cbr>\n\n  If you don't want to do this course locally or via Anyscale, you have the following options:\n\n  - On [AWS and GCP](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Findex.html#cloud-vm-index). 
Community-supported Azure and Aliyun integrations also exist.\n  - On [Kubernetes](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fkubernetes\u002Findex.html#kuberay-index), via the officially supported KubeRay project.\n  - Deploy Ray manually [on-prem](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Fuser-guides\u002Flaunching-clusters\u002Fon-premises.html#on-prem) or onto platforms [not listed here](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Fuser-guides\u002Fcommunity\u002Findex.html#ref-cluster-setup).\n\n\u003C\u002Fdetails>\n\n### Git setup\n\nCreate a repository by following these instructions: [Create a new repository](https:\u002F\u002Fgithub.com\u002Fnew) → name it `Made-With-ML` → Toggle `Add a README file` (**very important** as this creates a `main` branch) → Click `Create repository` (scroll down)\n\nNow we're ready to clone the repository that has all of our code:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML.git .\ngit remote set-url origin https:\u002F\u002Fgithub.com\u002FGITHUB_USERNAME\u002FMade-With-ML.git  # \u003C-- CHANGE THIS to your username\ngit checkout -b dev\n```\n\n### Virtual environment\n\n\u003Cdetails>\n  \u003Csummary>Local\u003C\u002Fsummary>\u003Cbr>\n\n  ```bash\n  export PYTHONPATH=$PYTHONPATH:$PWD\n  python3 -m venv venv  # recommend using Python 3.10\n  source venv\u002Fbin\u002Factivate  # on Windows: venv\\Scripts\\activate\n  python3 -m pip install --upgrade pip setuptools wheel\n  python3 -m pip install -r requirements.txt\n  pre-commit install\n  pre-commit autoupdate\n  ```\n\n  > Highly recommend using Python `3.10` and using [pyenv](https:\u002F\u002Fgithub.com\u002Fpyenv\u002Fpyenv) (mac) or [pyenv-win](https:\u002F\u002Fgithub.com\u002Fpyenv-win\u002Fpyenv-win) (windows).\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  Our environment 
with the appropriate Python version and libraries is already all set for us through the cluster environment we used when setting up our Anyscale Workspace. So we just need to run these commands:\n  ```bash\n  export PYTHONPATH=$PYTHONPATH:$PWD\n  pre-commit install\n  pre-commit autoupdate\n  ```\n\n\u003C\u002Fdetails>\n\n## Notebook\n\nStart by exploring the [jupyter notebook](notebooks\u002Fmadewithml.ipynb) to interactively walkthrough the core machine learning workloads.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_f635c53d8169.png\">\n\u003C\u002Fdiv>\n\n\u003Cdetails>\n  \u003Csummary>Local\u003C\u002Fsummary>\u003Cbr>\n\n  ```bash\n  # Start notebook\n  jupyter lab notebooks\u002Fmadewithml.ipynb\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  Click on the Jupyter icon &nbsp;\u003Cimg width=15 src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F3\u002F38\u002FJupyter_logo.svg\u002F1200px-Jupyter_logo.svg.png\">&nbsp; at the top right corner of our Anyscale Workspace page and this will open up our JupyterLab instance in a new tab. Then navigate to the `notebooks` directory and open up the `madewithml.ipynb` notebook.\n\n\u003C\u002Fdetails>\n\n\n## Scripts\n\nNow we'll execute the same workloads using the clean Python scripts following software engineering best practices (testing, documentation, logging, serving, versioning, etc.) The code we've implemented in our notebook will be refactored into the following scripts:\n\n```bash\nmadewithml\n├── config.py\n├── data.py\n├── evaluate.py\n├── models.py\n├── predict.py\n├── serve.py\n├── train.py\n├── tune.py\n└── utils.py\n```\n\n**Note**: Change the `--num-workers`, `--cpu-per-worker`, and `--gpu-per-worker` input argument values below based on your system's resources. 
For example, if you're on a local laptop, a reasonable configuration would be `--num-workers 6 --cpu-per-worker 1 --gpu-per-worker 0`.\n\n### Training\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\nexport TRAIN_LOOP_CONFIG='{\"dropout_p\": 0.5, \"lr\": 1e-4, \"lr_factor\": 0.8, \"lr_patience\": 3}'\npython madewithml\u002Ftrain.py \\\n    --experiment-name \"$EXPERIMENT_NAME\" \\\n    --dataset-loc \"$DATASET_LOC\" \\\n    --train-loop-config \"$TRAIN_LOOP_CONFIG\" \\\n    --num-workers 1 \\\n    --cpu-per-worker 3 \\\n    --gpu-per-worker 1 \\\n    --num-epochs 10 \\\n    --batch-size 256 \\\n    --results-fp results\u002Ftraining_results.json\n```\n\n### Tuning\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\nexport TRAIN_LOOP_CONFIG='{\"dropout_p\": 0.5, \"lr\": 1e-4, \"lr_factor\": 0.8, \"lr_patience\": 3}'\nexport INITIAL_PARAMS=\"[{\\\"train_loop_config\\\": $TRAIN_LOOP_CONFIG}]\"\npython madewithml\u002Ftune.py \\\n    --experiment-name \"$EXPERIMENT_NAME\" \\\n    --dataset-loc \"$DATASET_LOC\" \\\n    --initial-params \"$INITIAL_PARAMS\" \\\n    --num-runs 2 \\\n    --num-workers 1 \\\n    --cpu-per-worker 3 \\\n    --gpu-per-worker 1 \\\n    --num-epochs 10 \\\n    --batch-size 256 \\\n    --results-fp results\u002Ftuning_results.json\n```\n\n### Experiment tracking\n\nWe'll use [MLflow](https:\u002F\u002Fmlflow.org\u002F) to track our experiments and store our models and the [MLflow Tracking UI](https:\u002F\u002Fwww.mlflow.org\u002Fdocs\u002Flatest\u002Ftracking.html#tracking-ui) to view our experiments. We have been saving our experiments to a local directory but note that in an actual production setting, we would have a central location to store all of our experiments. 
It's easy\u002Finexpensive to spin up your own MLflow server for all of your team members to track their experiments on or use a managed solution like [Weights & Biases](https:\u002F\u002Fwandb.ai\u002Fsite), [Comet](https:\u002F\u002Fwww.comet.ml\u002F), etc.\n\n```bash\nexport MODEL_REGISTRY=$(python -c \"from madewithml import config; print(config.MODEL_REGISTRY)\")\nmlflow server -h 0.0.0.0 -p 8080 --backend-store-uri $MODEL_REGISTRY\n```\n\n\u003Cdetails>\n  \u003Csummary>Local\u003C\u002Fsummary>\u003Cbr>\n\n  If you're running this notebook on your local laptop then head on over to \u003Ca href=\"http:\u002F\u002Flocalhost:8080\u002F\" target=\"_blank\">http:\u002F\u002Flocalhost:8080\u002F\u003C\u002Fa> to view your MLflow dashboard.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  If you're on \u003Ca href=\"https:\u002F\u002Fdocs.anyscale.com\u002Fdevelop\u002Fworkspaces\u002Fget-started\" target=\"_blank\">Anyscale Workspaces\u003C\u002Fa>, then we need to first expose the port of the MLflow server. 
Run the following command on your Anyscale Workspace terminal to generate the public URL to your MLflow server.\n\n  ```bash\n  APP_PORT=8080\n  echo https:\u002F\u002F$APP_PORT-port-$ANYSCALE_SESSION_DOMAIN\n  ```\n\n\u003C\u002Fdetails>\n\n### Evaluation\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\nexport HOLDOUT_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fholdout.csv\"\npython madewithml\u002Fevaluate.py \\\n    --run-id $RUN_ID \\\n    --dataset-loc $HOLDOUT_LOC \\\n    --results-fp results\u002Fevaluation_results.json\n```\n```json\n{\n  \"timestamp\": \"June 09, 2023 09:26:18 AM\",\n  \"run_id\": \"6149e3fec8d24f1492d4a4cabd5c06f6\",\n  \"overall\": {\n    \"precision\": 0.9076136428670714,\n    \"recall\": 0.9057591623036649,\n    \"f1\": 0.9046792827719773,\n    \"num_samples\": 191.0\n  },\n...\n```\n\n### Inference\n```bash\n# Get run ID\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\npython madewithml\u002Fpredict.py predict \\\n    --run-id $RUN_ID \\\n    --title \"Transfer learning with transformers\" \\\n    --description \"Using transformers for transfer learning on text classification tasks.\"\n```\n```json\n[{\n  \"prediction\": [\n    \"natural-language-processing\"\n  ],\n  \"probabilities\": {\n    \"computer-vision\": 0.0009767753,\n    \"mlops\": 0.0008223939,\n    \"natural-language-processing\": 0.99762577,\n    \"other\": 0.000575123\n  }\n}]\n```\n\n### Serving\n\n\u003Cdetails>\n  \u003Csummary>Local\u003C\u002Fsummary>\u003Cbr>\n\n  ```bash\n  # Start\n  ray start --head\n  ```\n\n  ```bash\n  # Set up\n  export EXPERIMENT_NAME=\"llm\"\n  export RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name 
$EXPERIMENT_NAME --metric val_loss --mode ASC)\n  python madewithml\u002Fserve.py --run_id $RUN_ID\n  ```\n\n  While the application is running, we can use it via cURL, Python, etc.:\n\n  ```bash\n  # via cURL\n  curl -X POST -H \"Content-Type: application\u002Fjson\" -d '{\n    \"title\": \"Transfer learning with transformers\",\n    \"description\": \"Using transformers for transfer learning on text classification tasks.\"\n  }' http:\u002F\u002F127.0.0.1:8000\u002Fpredict\n  ```\n\n  ```python\n  # via Python\n  import json\n  import requests\n  title = \"Transfer learning with transformers\"\n  description = \"Using transformers for transfer learning on text classification tasks.\"\n  json_data = json.dumps({\"title\": title, \"description\": description})\n  requests.post(\"http:\u002F\u002F127.0.0.1:8000\u002Fpredict\", data=json_data).json()\n  ```\n\n  ```bash\n  ray stop  # shutdown\n  ```\n\n```bash\nexport HOLDOUT_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fholdout.csv\"\ncurl -X POST -H \"Content-Type: application\u002Fjson\" -d '{\n    \"dataset_loc\": \"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fholdout.csv\"\n  }' http:\u002F\u002F127.0.0.1:8000\u002Fevaluate\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  In Anyscale Workspaces, Ray is already running so we don't have to manually start\u002Fshutdown like we have to do locally.\n\n  ```bash\n  # Set up\n  export EXPERIMENT_NAME=\"llm\"\n  export RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\n  python madewithml\u002Fserve.py --run_id $RUN_ID\n  ```\n\n  While the application is running, we can use it via cURL, Python, etc.:\n\n  ```bash\n  # via cURL\n  curl -X POST -H \"Content-Type: application\u002Fjson\" -d '{\n    
\"title\": \"Transfer learning with transformers\",\n    \"description\": \"Using transformers for transfer learning on text classification tasks.\"\n  }' http:\u002F\u002F127.0.0.1:8000\u002Fpredict\n  ```\n\n  ```python\n  # via Python\n  import json\n  import requests\n  title = \"Transfer learning with transformers\"\n  description = \"Using transformers for transfer learning on text classification tasks.\"\n  json_data = json.dumps({\"title\": title, \"description\": description})\n  requests.post(\"http:\u002F\u002F127.0.0.1:8000\u002Fpredict\", data=json_data).json()\n  ```\n\n\u003C\u002Fdetails>\n\n### Testing\n```bash\n# Code\npython3 -m pytest tests\u002Fcode --verbose --disable-warnings\n\n# Data\nexport DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\npytest --dataset-loc=$DATASET_LOC tests\u002Fdata --verbose --disable-warnings\n\n# Model\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\npytest --run-id=$RUN_ID tests\u002Fmodel --verbose --disable-warnings\n\n# Coverage\npython3 -m pytest --cov madewithml --cov-report html\n```\n\n## Production\n\nFrom this point onwards, in order to deploy our application into production, we'll need to either be on Anyscale or on a [cloud VM](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Findex.html#cloud-vm-index) \u002F [on-prem](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Fuser-guides\u002Flaunching-clusters\u002Fon-premises.html#on-prem) cluster you manage yourself (w\u002F Ray). 
If not on Anyscale, the commands will be [slightly different](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Frunning-applications\u002Fjob-submission\u002Findex.html) but the concepts will be the same.\n\n> If you don't want to set up all of this yourself, we highly recommend joining our [upcoming live cohort](https:\u002F\u002F4190urw86oh.typeform.com\u002Fmadewithml) where we'll provide an environment with all of this infrastructure already set up for you so that you can focus on the machine learning.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_4fd7296b3b49.png\">\n\u003C\u002Fdiv>\n\n### Authentication\n\nThe credentials below are **automatically** set for us if we're using Anyscale Workspaces. We **do not** need to set these credentials explicitly on Workspaces, but we do if we're running this locally or on a cluster outside of where our Anyscale Jobs and Services are configured to run.\n\n``` bash\nexport ANYSCALE_HOST=https:\u002F\u002Fconsole.anyscale.com\nexport ANYSCALE_CLI_TOKEN=$YOUR_CLI_TOKEN  # retrieved from Anyscale credentials page\n```\n\n### Cluster environment\n\nThe cluster environment determines **where** our workloads will be executed (OS, dependencies, etc.) We've already created this [cluster environment](.\u002Fdeploy\u002Fcluster_env.yaml) for us but this is how we can create\u002Fupdate one ourselves.\n\n```bash\nexport CLUSTER_ENV_NAME=\"madewithml-cluster-env\"\nanyscale cluster-env build deploy\u002Fcluster_env.yaml --name $CLUSTER_ENV_NAME\n```\n\n### Compute configuration\n\nThe compute configuration determines **what** resources our workloads will be executed on. 
We've already created this [compute configuration](.\u002Fdeploy\u002Fcluster_compute.yaml) for us but this is how we can create it ourselves.\n\n```bash\nexport CLUSTER_COMPUTE_NAME=\"madewithml-cluster-compute\"\nanyscale cluster-compute create deploy\u002Fcluster_compute.yaml --name $CLUSTER_COMPUTE_NAME\n```\n\n### Anyscale jobs\n\nNow we're ready to execute our ML workloads. We've decided to combine them all together into one [job](.\u002Fdeploy\u002Fjobs\u002Fworkloads.yaml) but we could have also created separate jobs for each workload (train, evaluate, etc.) We'll start by editing the `$GITHUB_USERNAME` slots inside our [`workloads.yaml`](.\u002Fdeploy\u002Fjobs\u002Fworkloads.yaml) file:\n```yaml\nruntime_env:\n  working_dir: .\n  upload_path: s3:\u002F\u002Fmadewithml\u002F$GITHUB_USERNAME\u002Fjobs  # \u003C--- CHANGE USERNAME (case-sensitive)\n  env_vars:\n    GITHUB_USERNAME: $GITHUB_USERNAME  # \u003C--- CHANGE USERNAME (case-sensitive)\n```\n\nThe `runtime_env` here specifies that we should upload our current `working_dir` to an S3 bucket so that all of our workers have access to the code when we execute an Anyscale Job. The `GITHUB_USERNAME` is used later to save results from our workloads to S3 so that we can retrieve them later (e.g. for serving).\n\nNow we're ready to submit our job to execute our ML workloads:\n```bash\nanyscale job submit deploy\u002Fjobs\u002Fworkloads.yaml\n```\n\n### Anyscale Services\n\nAnd after our ML workloads have been executed, we're ready to serve our model in production. 
Similar to our Anyscale Jobs configs, be sure to change the `$GITHUB_USERNAME` in [`serve_model.yaml`](.\u002Fdeploy\u002Fservices\u002Fserve_model.yaml).\n\n```yaml\nray_serve_config:\n  import_path: deploy.services.serve_model:entrypoint\n  runtime_env:\n    working_dir: .\n    upload_path: s3:\u002F\u002Fmadewithml\u002F$GITHUB_USERNAME\u002Fservices  # \u003C--- CHANGE USERNAME (case-sensitive)\n    env_vars:\n      GITHUB_USERNAME: $GITHUB_USERNAME  # \u003C--- CHANGE USERNAME (case-sensitive)\n```\n\nNow we're ready to launch our service:\n```bash\n# Rollout service\nanyscale service rollout -f deploy\u002Fservices\u002Fserve_model.yaml\n\n# Query\ncurl -X POST -H \"Content-Type: application\u002Fjson\" -H \"Authorization: Bearer $SECRET_TOKEN\" -d '{\n  \"title\": \"Transfer learning with transformers\",\n  \"description\": \"Using transformers for transfer learning on text classification tasks.\"\n}' $SERVICE_ENDPOINT\u002Fpredict\u002F\n\n# Rollback (to previous version of the Service)\nanyscale service rollback -f $SERVICE_CONFIG --name $SERVICE_NAME\n\n# Terminate\nanyscale service terminate --name $SERVICE_NAME\n```\n\n### CI\u002FCD\n\nWe're not going to manually deploy our application every time we make a change. Instead, we'll automate this process using GitHub Actions!\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_298a6c8e97a7.png\">\n\u003C\u002Fdiv>\n\n1. We'll start by adding the necessary credentials to the [`\u002Fsettings\u002Fsecrets\u002Factions`](https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\u002Fsettings\u002Fsecrets\u002Factions) page of our GitHub repository.\n\n``` bash\nexport ANYSCALE_HOST=https:\u002F\u002Fconsole.anyscale.com\nexport ANYSCALE_CLI_TOKEN=$YOUR_CLI_TOKEN  # retrieved from https:\u002F\u002Fconsole.anyscale.com\u002Fo\u002Fmadewithml\u002Fcredentials\n```\n\n2. 
Now we can make changes to our code (not on `main` branch) and push them to GitHub. But in order to push our code to GitHub, we'll need to first authenticate with our credentials before pushing to our repository:\n\n```bash\ngit config --global user.name \"Your Name\"  # \u003C-- CHANGE THIS to your name\ngit config --global user.email you@example.com  # \u003C-- CHANGE THIS to your email\ngit add .\ngit commit -m \"\"  # \u003C-- CHANGE THIS to your message\ngit push origin dev\n```\n\nNow you will be prompted to enter your username and password (personal access token). Follow these steps to get personal access token: [New GitHub personal access token](https:\u002F\u002Fgithub.com\u002Fsettings\u002Ftokens\u002Fnew) → Add a name → Toggle `repo` and `workflow` → Click `Generate token` (scroll down) → Copy the token and paste it when prompted for your password.\n\n3. Now we can start a PR from this branch to our `main` branch and this will trigger the [workloads workflow](\u002F.github\u002Fworkflows\u002Fworkloads.yaml). If the workflow (Anyscale Jobs) succeeds, this will produce comments with the training and evaluation results directly on the PR.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_57eb341e3ec9.png\">\n\u003C\u002Fdiv>\n\n4. If we like the results, we can merge the PR into the `main` branch. This will trigger the [serve workflow](\u002F.github\u002Fworkflows\u002Fserve.yaml) which will rollout our new service to production!\n\n### Continual learning\n\nWith our CI\u002FCD workflow in place to deploy our application, we can now focus on continually improving our model. 
It becomes really easy to extend on this foundation to connect to scheduled runs (cron), [data pipelines](https:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fdata-engineering\u002F), drift detected through [monitoring](https:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fmonitoring\u002F), [online evaluation](https:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fevaluation\u002F#online-evaluation), etc. And we can easily add additional context such as comparing any experiment with what's currently in production (directly in the PR even), etc.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_35e790de5108.png\">\n\u003C\u002Fdiv>\n\n## FAQ\n\n### Jupyter notebook kernels\n\nIssues with configuring the notebooks with jupyter? By default, jupyter will use the kernel with our virtual environment but we can also manually add it to jupyter:\n```bash\npython3 -m ipykernel install --user --name=venv\n```\nNow we can open up a notebook → Kernel (top menu bar) → Change Kernel → `venv`. 
To ever delete this kernel, we can do the following:\n```bash\njupyter kernelspec list\njupyter kernelspec uninstall venv\n```\n","# MLOps 课程\n\n学习如何将机器学习与软件工程相结合，以设计、开发、部署并迭代生产级的机器学习应用。\n\n- 课程：https:\u002F\u002Fmadewithml.com\u002F\n- 代码：[GokuMohandas\u002FMade-With-ML](https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML)\n\n\u003Ca href=\"https:\u002F\u002Fmadewithml.com\u002F#course\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_f16d31b7ff13.png\" alt=\"lessons\">\n\u003C\u002Fa>\n\n## 概述\n\n在本课程中，我们将从实验阶段（模型设计 + 开发）过渡到生产阶段（模型部署 + 迭代）。我们将通过逐步构建能够支持我们打造一个*可靠*生产系统的各个组件来实现这一目标。\n\n\u003Cblockquote>\n  \u003Cimg width=20 src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F0\u002F09\u002FYouTube_full-color_icon_%282017%29.svg\u002F640px-YouTube_full-color_icon_%282017%29.svg.png\">&nbsp; 请务必观看下方视频，快速了解我们将要构建的内容。\n\u003C\u002Fblockquote>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fyoutu.be\u002FAWgkt8H8yVo\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_04d2f4d3e599.jpg\" alt=\"课程概述视频\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n- **💡 第一性原理**：在直接进入代码之前，我们会对每一个机器学习概念建立基于第一性原理的理解。\n- **💻 最佳实践**：在开发和部署机器学习模型的过程中，我们将践行软件工程的最佳实践。\n- **📈 扩展性**：无需学习全新的编程语言，即可在 Python 中轻松扩展机器学习工作负载（数据处理、训练、调参、服务等）。\n- **⚙️ MLOps**：在构建端到端机器学习系统的过程中，我们将连接 MLOps 的各个组件（跟踪、测试、服务、编排等）。\n- **🚀 从开发到生产**：学习如何在不更改代码或基础设施管理的情况下，快速且可靠地从开发阶段过渡到生产阶段。\n- **🐙 CI\u002FCD**：学习如何创建成熟的 CI\u002FCD 流水线，以模块化的方式持续训练和部署更优秀的模型，并与任何技术栈无缝集成。\n\n## 目标受众\n\n机器学习并不是一个独立的行业，而是一种强大的数据思维模式，它并不局限于某一类人群。\n\n- **👩‍💻 所有开发者**：无论是软件\u002F基础设施工程师还是数据科学家，机器学习正日益成为您所开发产品中的关键组成部分。\n- **👩‍🎓 大学毕业生**：学习行业所需的实用技能，弥合大学课程与行业期望之间的差距。\n- **👩‍💼 产品\u002F管理层**：希望打下坚实的技术基础，从而能够构建出令人惊叹且可靠的机器学习驱动产品的人士。\n\n## 环境搭建\n\n请务必访问[课程页面](https:\u002F\u002Fmadewithml\u002F#course)，以获取关于本仓库内容更为详细的介绍。对于以下各部分，我们将提供本地笔记本电脑和 Anyscale 集群两种环境的说明，请根据您使用的环境切换 ► 下拉菜单（默认会显示 
Anyscale 的说明）。如果您希望通过 Anyscale 来运行本课程——我们将为您提供**架构**、**计算资源（GPU）**以及**社区支持**，让您在一个周末内掌握所有知识——欢迎加入我们即将开启的直播班级 → [在此报名](https:\u002F\u002F4190urw86oh.typeform.com\u002Fmadewithml)！\n\n### 集群\n\n我们首先将设置集群的环境和计算配置。\n\n\u003Cdetails>\n  \u003Csummary>本地\u003C\u002Fsummary>\u003Cbr>\n  您的个人笔记本电脑（单机）将充当集群的角色，其中一台 CPU 将作为主节点，其余 CPU 则作为工作节点。本课程中的所有代码均可在任何个人笔记本电脑上运行，不过其执行速度会比在更大规模的集群上慢。\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  我们可以使用[网页界面](https:\u002F\u002Fconsole.anyscale.com\u002Fo\u002Fmadewithml\u002Fworkspaces\u002Fadd\u002Fblank)创建一个[Anyscale 工作空间](https:\u002F\u002Fdocs.anyscale.com\u002Fdevelop\u002Fworkspaces\u002Fget-started)。\n\n  ```md\n  - 工作空间名称：`madewithml`\n  - 项目：`madewithml`\n  - 集群环境名称：`madewithml-cluster-env`\n  # 切换“从已保存配置中选择”\n  - 计算配置：`madewithml-cluster-compute`\n  ```\n\n  > 或者，我们也可以使用[命令行工具](https:\u002F\u002Fdocs.anyscale.com\u002Freference\u002Fanyscale-cli)通过 `anyscale workspace create ...` 创建工作空间。\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>其他（云平台、K8s、本地部署）\u003C\u002Fsummary>\u003Cbr>\n\n  如果您不想在本地或通过 Anyscale 运行本课程，您可以选择以下方式：\n\n  - 在 [AWS 和 GCP](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Findex.html#cloud-vm-index) 上。社区支持的 Azure 和 Aliyun 集成也存在。\n  - 在 [Kubernetes](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fkubernetes\u002Findex.html#kuberay-index) 上，通过官方支持的 KubeRay 项目。\n  - 手动部署 Ray [本地](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Fuser-guides\u002Flaunching-clusters\u002Fon-premises.html#on-prem) 或部署到[此处未列出的平台](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Fuser-guides\u002Fcommunity\u002Findex.html#ref-cluster-setup)上。\n\n\u003C\u002Fdetails>\n\n### Git 设置\n\n按照以下步骤创建一个仓库：[创建新仓库](https:\u002F\u002Fgithub.com\u002Fnew) → 命名为 `Made-With-ML` → 勾选 `添加 README 文件`（**非常重要**，因为这会创建一个 `main` 分支）→ 点击 `创建仓库`（向下滚动）\n\n现在我们可以克隆包含所有代码的仓库：\n\n```bash\ngit 
clone https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML.git .\ngit remote set-url origin https:\u002F\u002Fgithub.com\u002FGITHUB_USERNAME\u002FMade-With-ML.git  # \u003C-- 将此处替换为您的用户名\ngit checkout -b dev\n```\n\n### 虚拟环境\n\n\u003Cdetails>\n  \u003Csummary>本地\u003C\u002Fsummary>\u003Cbr>\n\n  ```bash\n  export PYTHONPATH=$PYTHONPATH:$PWD\n  python3 -m venv venv  # 推荐使用 Python 3.10\n  source venv\u002Fbin\u002Factivate  # Windows 上：venv\\Scripts\\activate\n  python3 -m pip install --upgrade pip setuptools wheel\n  python3 -m pip install -r requirements.txt\n  pre-commit install\n  pre-commit autoupdate\n  ```\n\n  > 强烈建议使用 Python `3.10`，并搭配 [pyenv](https:\u002F\u002Fgithub.com\u002Fpyenv\u002Fpyenv)（macOS）或 [pyenv-win](https:\u002F\u002Fgithub.com\u002Fpyenv-win\u002Fpyenv-win)（Windows）。\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  我们在设置 Anyscale 工作空间时所使用的集群环境已经为我们配置好了合适的 Python 版本和相关库。因此，我们只需运行以下命令：\n  ```bash\n  export PYTHONPATH=$PYTHONPATH:$PWD\n  pre-commit install\n  pre-commit autoupdate\n  ```\n\n\u003C\u002Fdetails>\n\n## 笔记本\n\n首先，探索 [jupyter notebook](notebooks\u002Fmadewithml.ipynb)，以交互式的方式逐步了解核心机器学习工作负载。\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_f635c53d8169.png\">\n\u003C\u002Fdiv>\n\n\u003Cdetails>\n  \u003Csummary>本地\u003C\u002Fsummary>\u003Cbr>\n\n  ```bash\n  # 启动笔记本\n  jupyter lab notebooks\u002Fmadewithml.ipynb\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  点击我们 Anyscale Workspace 页面右上角的 Jupyter 图标 &nbsp;\u003Cimg width=15 src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F3\u002F38\u002FJupyter_logo.svg\u002F1200px-Jupyter_logo.svg.png\">&nbsp; ，这将在新标签页中打开我们的 JupyterLab 实例。然后导航到 `notebooks` 目录，并打开 `madewithml.ipynb` 笔记本。\n\n\u003C\u002Fdetails>\n\n\n## 
脚本\n\n现在，我们将使用遵循软件工程最佳实践（测试、文档、日志记录、服务化、版本控制等）的整洁 Python 脚本执行相同的工作负载。我们在笔记本中实现的代码将被重构为以下脚本：\n\n```bash\nmadewithml\n├── config.py\n├── data.py\n├── evaluate.py\n├── models.py\n├── predict.py\n├── serve.py\n├── train.py\n├── tune.py\n└── utils.py\n```\n\n**注意**：请根据您系统的资源调整下面的 `--num-workers`、`--cpu-per-worker` 和 `--gpu-per-worker` 输入参数值。例如，如果您在本地笔记本电脑上运行，合理的配置可能是 `--num-workers 6 --cpu-per-worker 1 --gpu-per-worker 0`。\n\n### 训练\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\nexport TRAIN_LOOP_CONFIG='{\"dropout_p\": 0.5, \"lr\": 1e-4, \"lr_factor\": 0.8, \"lr_patience\": 3}'\npython madewithml\u002Ftrain.py \\\n    --experiment-name \"$EXPERIMENT_NAME\" \\\n    --dataset-loc \"$DATASET_LOC\" \\\n    --train-loop-config \"$TRAIN_LOOP_CONFIG\" \\\n    --num-workers 1 \\\n    --cpu-per-worker 3 \\\n    --gpu-per-worker 1 \\\n    --num-epochs 10 \\\n    --batch-size 256 \\\n    --results-fp results\u002Ftraining_results.json\n```\n\n### 调参\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\nexport TRAIN_LOOP_CONFIG='{\"dropout_p\": 0.5, \"lr\": 1e-4, \"lr_factor\": 0.8, \"lr_patience\": 3}'\nexport INITIAL_PARAMS=\"[{\\\"train_loop_config\\\": $TRAIN_LOOP_CONFIG}]\"\npython madewithml\u002Ftune.py \\\n    --experiment-name \"$EXPERIMENT_NAME\" \\\n    --dataset-loc \"$DATASET_LOC\" \\\n    --initial-params \"$INITIAL_PARAMS\" \\\n    --num-runs 2 \\\n    --num-workers 1 \\\n    --cpu-per-worker 3 \\\n    --gpu-per-worker 1 \\\n    --num-epochs 10 \\\n    --batch-size 256 \\\n    --results-fp results\u002Ftuning_results.json\n```\n\n### 实验跟踪\n\n我们将使用 [MLflow](https:\u002F\u002Fmlflow.org\u002F) 来跟踪我们的实验并存储模型，同时使用 [MLflow Tracking 
UI](https:\u002F\u002Fwww.mlflow.org\u002Fdocs\u002Flatest\u002Ftracking.html#tracking-ui) 查看我们的实验。我们一直将实验保存到本地目录，但请注意，在实际生产环境中，我们会有一个集中位置来存储所有实验。为团队成员搭建一个自己的 MLflow 服务器以跟踪他们的实验非常简单且成本低廉，或者也可以使用托管解决方案，如 [Weights & Biases](https:\u002F\u002Fwandb.ai\u002Fsite)、[Comet](https:\u002F\u002Fwww.comet.ml\u002F) 等。\n\n```bash\nexport MODEL_REGISTRY=$(python -c \"from madewithml import config; print(config.MODEL_REGISTRY)\")\nmlflow server -h 0.0.0.0 -p 8080 --backend-store-uri $MODEL_REGISTRY\n```\n\n\u003Cdetails>\n  \u003Csummary>本地\u003C\u002Fsummary>\u003Cbr>\n\n  如果您在本地笔记本电脑上运行此笔记本，请访问 \u003Ca href=\"http:\u002F\u002Flocalhost:8080\u002F\" target=\"_blank\">http:\u002F\u002Flocalhost:8080\u002F\u003C\u002Fa> 查看您的 MLflow 仪表板。\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  如果您在 \u003Ca href=\"https:\u002F\u002Fdocs.anyscale.com\u002Fdevelop\u002Fworkspaces\u002Fget-started\" target=\"_blank\">Anyscale Workspaces\u003C\u002Fa> 上，我们需要先暴露 MLflow 服务器的端口。在您的 Anyscale Workspace 终端上运行以下命令，以生成 MLflow 服务器的公共 URL。\n\n  ```bash\n  APP_PORT=8080\n  echo https:\u002F\u002F$APP_PORT-port-$ANYSCALE_SESSION_DOMAIN\n  ```\n\n\u003C\u002Fdetails>\n\n### 评估\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\nexport HOLDOUT_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fholdout.csv\"\npython madewithml\u002Fevaluate.py \\\n    --run-id $RUN_ID \\\n    --dataset-loc $HOLDOUT_LOC \\\n    --results-fp results\u002Fevaluation_results.json\n```\n```json\n{\n  \"timestamp\": \"2023年6月9日 上午9:26:18\",\n  \"run_id\": \"6149e3fec8d24f1492d4a4cabd5c06f6\",\n  \"overall\": {\n    \"precision\": 0.9076136428670714,\n    \"recall\": 0.9057591623036649,\n    \"f1\": 0.9046792827719773,\n    \"num_samples\": 191.0\n  },\n...\n```\n\n### 推理\n```bash\n# 获取运行 
ID\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\npython madewithml\u002Fpredict.py predict \\\n    --run-id $RUN_ID \\\n    --title \"使用 Transformer 进行迁移学习\" \\\n    --description \"在文本分类任务中使用 Transformer 进行迁移学习。\"\n```\n```json\n[{\n  \"prediction\": [\n    \"自然语言处理\"\n  ],\n  \"probabilities\": {\n    \"计算机视觉\": 0.0009767753,\n    \"MLOps\": 0.0008223939,\n    \"自然语言处理\": 0.99762577,\n    \"其他\": 0.000575123\n  }\n}]\n```\n\n### 服务\n\n\u003Cdetails>\n  \u003Csummary>本地\u003C\u002Fsummary>\u003Cbr>\n\n  ```bash\n  # 启动\n  ray start --head\n  ```\n\n  ```bash\n  # 设置\n  export EXPERIMENT_NAME=\"llm\"\n  export RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\n  python madewithml\u002Fserve.py --run_id $RUN_ID\n  ```\n\n  在应用运行期间，我们可以通过 cURL、Python 等方式使用它：\n\n  ```bash\n  # 通过 cURL\n  curl -X POST -H \"Content-Type: application\u002Fjson\" -d '{\n    \"title\": \"使用 Transformer 进行迁移学习\",\n    \"description\": \"在文本分类任务上使用 Transformer 进行迁移学习。\"\n  }' http:\u002F\u002F127.0.0.1:8000\u002Fpredict\n  ```\n\n  ```python\n  # 通过 Python\n  import json\n  import requests\n  title = \"使用 Transformer 进行迁移学习\"\n  description = \"在文本分类任务上使用 Transformer 进行迁移学习。\"\n  json_data = json.dumps({\"title\": title, \"description\": description})\n  requests.post(\"http:\u002F\u002F127.0.0.1:8000\u002Fpredict\", data=json_data).json()\n  ```\n\n  ```bash\n  ray stop  # 关闭\n  ```\n\n  ```bash\n  export HOLDOUT_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fholdout.csv\"\n  curl -X POST -H \"Content-Type: application\u002Fjson\" -d '{\n      \"dataset_loc\": \"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fholdout.csv\"\n    }' 
http:\u002F\u002F127.0.0.1:8000\u002Fevaluate\n  ```\n\n  \u003C\u002Fdetails>\n\n  \u003Cdetails open>\n    \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n    在 Anyscale Workspaces 中，Ray 已经在运行，因此我们无需像在本地那样手动启动或关闭。\n\n    ```bash\n    # 设置\n    export EXPERIMENT_NAME=\"llm\"\n    export RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\n    python madewithml\u002Fserve.py --run_id $RUN_ID\n    ```\n\n    在应用运行期间，我们可以通过 cURL、Python 等方式使用它：\n\n    ```bash\n    # 通过 cURL\n    curl -X POST -H \"Content-Type: application\u002Fjson\" -d '{\n        \"title\": \"使用 Transformer 进行迁移学习\",\n        \"description\": \"在文本分类任务上使用 Transformer 进行迁移学习。\"\n      }' http:\u002F\u002F127.0.0.1:8000\u002Fpredict\n    ```\n\n    ```python\n    # 通过 Python\n    import json\n    import requests\n    title = \"使用 Transformer 进行迁移学习\"\n    description = \"在文本分类任务上使用 Transformer 进行迁移学习。\"\n    json_data = json.dumps({\"title\": title, \"description\": description})\n    requests.post(\"http:\u002F\u002F127.0.0.1:8000\u002Fpredict\", data=json_data).json()\n    ```\n\n  \u003C\u002Fdetails>\n\n  ### 测试\n  ```bash\n  # 代码\n  python3 -m pytest tests\u002Fcode --verbose --disable-warnings\n\n  # 数据\n  export DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\n  pytest --dataset-loc=$DATASET_LOC tests\u002Fdata --verbose --disable-warnings\n\n  # 模型\n  export EXPERIMENT_NAME=\"llm\"\n  export RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\n  pytest --run-id=$RUN_ID tests\u002Fmodel --verbose --disable-warnings\n\n  # 覆盖率\n  python3 -m pytest --cov madewithml --cov-report html\n  ```\n\n  ## 生产环境\n\n  从这一点开始，为了将我们的应用程序部署到生产环境，我们需要使用 Anyscale 平台，或者在一个由您自己管理的 
[云虚拟机](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Findex.html#cloud-vm-index) 或 [本地集群](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Fuser-guides\u002Flaunching-clusters\u002Fon-premises.html#on-prem) 上运行（使用 Ray）。如果不使用 Anyscale，相关命令会略有不同（参见 [Ray 文档](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Frunning-applications\u002Fjob-submission\u002Findex.html)），但基本概念是相同的。\n\n  > 如果您不想自己搭建所有这些环境，我们强烈建议您加入我们的[即将开课的直播班](https:\u002F\u002F4190urw86oh.typeform.com\u002Fmadewithml)，我们将为您准备好包含所有基础设施的环境，让您专注于机器学习本身。\n\n  \u003Cdiv align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_4fd7296b3b49.png\">\n  \u003C\u002Fdiv>\n\n  ### 认证\n\n  如果我们使用 Anyscale Workspaces，以下凭据会**自动**为我们设置。在 Workspaces 中，我们**不需要**显式设置这些凭据；但在本地或在其他未配置 Anyscale Jobs 和 Services 的集群上运行时，则需要设置。\n\n  ```bash\n  export ANYSCALE_HOST=https:\u002F\u002Fconsole.anyscale.com\n  export ANYSCALE_CLI_TOKEN=$YOUR_CLI_TOKEN  # 从 Anyscale 凭据页面获取\n  ```\n\n  ### 集群环境\n\n  集群环境决定了我们的工作负载将在**哪里**执行（操作系统、依赖项等）。我们已经预先创建好了这个[集群环境](.\u002Fdeploy\u002Fcluster_env.yaml)，但您也可以自行创建或更新一个。\n\n  ```bash\n  export CLUSTER_ENV_NAME=\"madewithml-cluster-env\"\n  anyscale cluster-env build deploy\u002Fcluster_env.yaml --name $CLUSTER_ENV_NAME\n  ```\n\n  ### 计算配置\n\n  计算配置决定了我们的工作负载将在**哪些资源**上执行。我们已经预先创建好了这个[计算配置](.\u002Fdeploy\u002Fcluster_compute.yaml)，但您也可以自行创建。\n\n  ```bash\n  export CLUSTER_COMPUTE_NAME=\"madewithml-cluster-compute\"\n  anyscale cluster-compute create deploy\u002Fcluster_compute.yaml --name $CLUSTER_COMPUTE_NAME\n  ```\n\n  ### Anyscale 作业\n\n  现在我们可以开始执行我们的机器学习工作负载了。我们决定将所有工作负载合并为一个[作业](.\u002Fdeploy\u002Fjobs\u002Fworkloads.yaml)，当然也可以为每个工作负载（训练、评估等）分别创建单独的作业。首先，我们需要编辑 [`workloads.yaml`](.\u002Fdeploy\u002Fjobs\u002Fworkloads.yaml) 文件中的 `$GITHUB_USERNAME` 占位符：\n  ```yaml\n  runtime_env:\n    working_dir: .\n    upload_path: 
s3:\u002F\u002Fmadewithml\u002F$GITHUB_USERNAME\u002Fjobs  # \u003C--- 更改用户名（区分大小写）\n    env_vars:\n      GITHUB_USERNAME: $GITHUB_USERNAME  # \u003C--- 更改用户名（区分大小写）\n  ```\n\n  这里的 `runtime_env` 指定了我们将当前的工作目录上传到 S3 存储桶，以便在执行 Anyscale 作业时，所有工作节点都能访问所需的代码。`GITHUB_USERNAME` 用于稍后将工作负载的结果保存到 S3，以便后续检索（例如用于服务部署）。\n\n  现在我们可以提交作业来执行我们的机器学习工作负载：\n  ```bash\n  anyscale job submit deploy\u002Fjobs\u002Fworkloads.yaml\n  ```\n\n  ### Anyscale 服务\n\n  在机器学习工作负载执行完毕后，我们可以将模型部署到生产环境中进行服务提供。与 Anyscale 作业配置类似，务必在 [`serve_model.yaml`](.\u002Fdeploy\u002Fservices\u002Fserve_model.yaml) 中更改 `$GITHUB_USERNAME`。\n\n  ```yaml\n  ray_serve_config:\n    import_path: deploy.services.serve_model:entrypoint\n    runtime_env:\n      working_dir: .\n      upload_path: s3:\u002F\u002Fmadewithml\u002F$GITHUB_USERNAME\u002Fservices  # \u003C--- 更改用户名（区分大小写）\n      env_vars:\n        GITHUB_USERNAME: $GITHUB_USERNAME  # \u003C--- 更改用户名（区分大小写）\n  ```\n\n  现在我们可以启动服务：\n\n  ```bash\n  # 部署服务\n  anyscale service rollout -f deploy\u002Fservices\u002Fserve_model.yaml\n\n  # 查询\n  curl -X POST -H \"Content-Type: application\u002Fjson\" -H \"Authorization: Bearer $SECRET_TOKEN\" -d '{\n    \"title\": \"使用 Transformer 进行迁移学习\",\n    \"description\": \"在文本分类任务上利用 Transformer 进行迁移学习。\"\n  }' $SERVICE_ENDPOINT\u002Fpredict\u002F\n\n  # 回滚（回退到服务的上一版本）\n  anyscale service rollback -f $SERVICE_CONFIG --name $SERVICE_NAME\n\n  # 终止\n  anyscale service terminate --name $SERVICE_NAME\n  ```\n\n### CI\u002FCD\n\n我们不会每次代码变更时都手动部署应用。相反，我们将使用 GitHub Actions 自动化这一流程！\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_298a6c8e97a7.png\">\n\u003C\u002Fdiv>\n\n1. 
首先，我们需要将必要的凭据添加到 GitHub 仓库的 [`\u002Fsettings\u002Fsecrets\u002Factions`](https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\u002Fsettings\u002Fsecrets\u002Factions) 页面中。\n\n``` bash\nexport ANYSCALE_HOST=https:\u002F\u002Fconsole.anyscale.com\nexport ANYSCALE_CLI_TOKEN=$YOUR_CLI_TOKEN  # 从 https:\u002F\u002Fconsole.anyscale.com\u002Fo\u002Fmadewithml\u002Fcredentials 获取\n```\n\n2. 现在我们可以对代码进行修改（不在 `main` 分支上），并将其推送到 GitHub。不过，在将代码推送到仓库之前，我们需要先使用凭据进行身份验证：\n\n```bash\ngit config --global user.name \"你的名字\"  # \u003C-- 替换为你的姓名\ngit config --global user.email you@example.com  # \u003C-- 替换为你的邮箱\ngit add .\ngit commit -m \"\"  # \u003C-- 替换为你的提交信息\ngit push origin dev\n```\n\n此时系统会提示你输入用户名和密码（个人访问令牌）。请按照以下步骤获取个人访问令牌：[新建 GitHub 个人访问令牌](https:\u002F\u002Fgithub.com\u002Fsettings\u002Ftokens\u002Fnew) → 添加名称 → 启用 `repo` 和 `workflow` 权限 → 点击“生成令牌”（向下滚动）→ 复制令牌并在提示输入密码时粘贴。\n\n3. 接下来，我们可以从该分支创建一个 Pull Request 到 `main` 分支，这将触发 [workloads 工作流](\u002F.github\u002Fworkflows\u002Fworkloads.yaml)。如果工作流（Anyscale Jobs）成功完成，它会在 PR 上直接生成包含训练和评估结果的评论。\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_57eb341e3ec9.png\">\n\u003C\u002Fdiv>\n\n4. 
如果我们对结果满意，就可以将 PR 合并到 `main` 分支。这将触发 [serve 工作流](\u002F.github\u002Fworkflows\u002Fserve.yaml)，从而将我们的新服务部署到生产环境！\n\n### 持续学习\n\n有了 CI\u002FCD 流程来部署应用，我们现在可以专注于持续改进模型。基于此基础，我们可以轻松扩展以连接定时任务（cron）、[数据管道](https:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fdata-engineering\u002F)、通过[监控](https:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fmonitoring\u002F)检测到的数据漂移、[在线评估](https:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fevaluation\u002F#online-evaluation)等。此外，我们还可以轻松添加更多上下文，例如直接在 PR 中将任何实验与当前生产中的模型进行对比等。\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_readme_35e790de5108.png\">\n\u003C\u002Fdiv>\n\n## 常见问题解答\n\n### Jupyter Notebook 内核\n\n在配置 Jupyter Notebook 的内核时遇到问题吗？默认情况下，Jupyter 会使用我们虚拟环境中的内核，但我们也可以手动将其添加到 Jupyter 中：\n```bash\npython3 -m ipykernel install --user --name=venv\n```\n现在我们可以打开笔记本 → 内核（顶部菜单栏）→ 更改内核 → `venv`。如果需要删除此内核，可以执行以下操作：\n```bash\njupyter kernelspec list\njupyter kernelspec uninstall venv\n```","# MLOps Course 快速上手指南\n\n本指南旨在帮助开发者快速搭建环境并运行 **MLOps Course**，学习如何将机器学习与软件工程结合，构建可靠的生产级 ML 应用。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: Linux, macOS 或 Windows (WSL 推荐)\n- **Python 版本**: 推荐 **Python 3.10** (可使用 `pyenv` 管理版本)\n- **硬件资源**:\n  - **本地运行**: 任意笔记本电脑即可（单机模式），但训练速度较慢。建议配置至少 4 核 CPU。\n  - **GPU 加速**: 若需加速训练，建议具备 NVIDIA GPU 及相应驱动。\n- **网络**: 需能访问 GitHub、HuggingFace 及 PyPI 源（国内用户建议配置镜像加速）。\n\n### 前置依赖\n- Git\n- pip, setuptools, wheel\n- Jupyter Lab (用于交互式探索)\n\n> **💡 国内加速建议**：\n> 在设置环境变量或使用 pip 时，推荐使用清华或阿里镜像源以提升下载速度：\n> ```bash\n> export PIP_INDEX_URL=https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n---\n\n## 安装步骤\n\n### 1. 
克隆代码仓库\n首先创建一个新的 GitHub 仓库（命名为 `Made-With-ML` 并勾选 \"Add a README file\" 以生成 `main` 分支），然后克隆代码：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML.git .\ngit remote set-url origin https:\u002F\u002Fgithub.com\u002FGITHUB_USERNAME\u002FMade-With-ML.git  # \u003C-- 将 GITHUB_USERNAME 替换为你的用户名\ngit checkout -b dev\n```\n\n### 2. 配置虚拟环境与依赖\n本项目推荐在虚拟环境中运行。\n\n**本地环境 (Local):**\n```bash\nexport PYTHONPATH=$PYTHONPATH:$PWD\npython3 -m venv venv\nsource venv\u002Fbin\u002Factivate  # Windows 用户请使用: venv\\Scripts\\activate\n\n# 升级基础工具并安装依赖\npython3 -m pip install --upgrade pip setuptools wheel\npython3 -m pip install -r requirements.txt\n\n# 安装并更新 pre-commit 钩子\npre-commit install\npre-commit autoupdate\n```\n\n**Anyscale 云环境 (可选):**\n如果你使用 Anyscale Workspace，环境已预配置，只需执行：\n```bash\nexport PYTHONPATH=$PYTHONPATH:$PWD\npre-commit install\npre-commit autoupdate\n```\n\n---\n\n## 基本使用\n\n项目提供了两种使用方式：**Jupyter Notebook 交互式探索** 和 **Python 脚本工程化实践**。\n\n### 方式一：交互式探索 (Notebook)\n适合初学者理解核心工作流。\n\n**启动 Jupyter Lab:**\n```bash\njupyter lab notebooks\u002Fmadewithml.ipynb\n```\n*注：Anyscale 用户可点击工作台右上角的 Jupyter 图标直接进入。*\n\n### 方式二：工程化脚本运行 (推荐)\n将流程拆分为独立的 Python 模块，符合生产级最佳实践。以下是最小化的运行示例。\n\n#### 1. 
模型训练 (Training)\n运行训练脚本，将结果保存至本地 JSON 文件。\n*(注意：根据你的机器资源调整 `--num-workers`, `--cpu-per-worker`, `--gpu-per-worker` 参数)*\n\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\nexport TRAIN_LOOP_CONFIG='{\"dropout_p\": 0.5, \"lr\": 1e-4, \"lr_factor\": 0.8, \"lr_patience\": 3}'\n\npython madewithml\u002Ftrain.py \\\n    --experiment-name \"$EXPERIMENT_NAME\" \\\n    --dataset-loc \"$DATASET_LOC\" \\\n    --train-loop-config \"$TRAIN_LOOP_CONFIG\" \\\n    --num-workers 1 \\\n    --cpu-per-worker 3 \\\n    --gpu-per-worker 1 \\\n    --num-epochs 10 \\\n    --batch-size 256 \\\n    --results-fp results\u002Ftraining_results.json\n```\n\n#### 2. 实验追踪 (Experiment Tracking)\n启动 MLflow 服务器以查看训练指标和模型版本。\n\n```bash\nexport MODEL_REGISTRY=$(python -c \"from madewithml import config; print(config.MODEL_REGISTRY)\")\nmlflow server -h 0.0.0.0 -p 8080 --backend-store-uri $MODEL_REGISTRY\n```\n- **本地访问**: 打开浏览器访问 `http:\u002F\u002Flocalhost:8080\u002F`\n- **Anyscale 访问**: 运行 `echo https:\u002F\u002F8080-port-$ANYSCALE_SESSION_DOMAIN` 获取公开链接。\n\n#### 3. 模型评估 (Evaluation)\n使用保留数据集评估最佳模型的表现。\n\n```bash\nexport EXPERIMENT_NAME=\"llm\"\n# 自动获取验证集损失最小的 Run ID\nexport RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\nexport HOLDOUT_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fholdout.csv\"\n\npython madewithml\u002Fevaluate.py \\\n    --run-id $RUN_ID \\\n    --dataset-loc $HOLDOUT_LOC \\\n    --results-fp results\u002Fevaluation_results.json\n```\n\n#### 4. 
模型推理 (Inference)\n加载训练好的模型进行预测。\n\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\n\npython madewithml\u002Fpredict.py predict \\\n    --run-id $RUN_ID \\\n    --title \"Transfer learning with transformers\" \\\n    --description \"Using transformers for transfer learning on text classification tasks.\"\n```\n\n**输出示例:**\n```json\n[{\n  \"prediction\": [\"natural-language-processing\"],\n  \"probabilities\": {\n    \"computer-vision\": 0.0009767753,\n    \"mlops\": 0.0008223939,\n    \"natural-language-processing\": 0.9976\n  }\n}]\n```","某电商初创公司的算法团队正试图将实验室中准确率高达 92% 的商品推荐模型上线，以支撑即将到来的大促活动。\n\n### 没有 mlops-course 时\n- **实验与生产脱节**：数据科学家在本地 Jupyter Notebook 中完成的模型代码，因缺乏软件工程规范，无法直接由后端工程师部署，导致反复重构和沟通成本高昂。\n- **迭代过程黑盒化**：每次调整超参数或更换数据集后，缺乏系统的追踪机制，团队难以复现最佳结果，甚至出现过误用旧版本模型上线的事故。\n- **扩展性瓶颈**：面对大促期间激增的数据量和并发请求，团队不知如何在 Python 生态内平滑扩展计算资源，只能临时学习陌生的分布式框架，延误了上线窗口。\n- **缺乏自动化流程**：模型更新完全依赖人工手动操作，没有建立 CI\u002FCD 流水线，导致新模型从训练到部署周期长达数周，无法快速响应市场变化。\n\n### 使用 mlops-course 后\n- **端到端工程化落地**：团队遵循课程中的最佳实践，将模型开发纳入标准软件工程流程，实现了从实验代码到生产级 API 的无缝衔接，部署效率提升 300%。\n- **全链路可观测性**：利用课程教授的组件构建了完整的追踪系统，每一次实验的参数、指标和产物均自动记录，确保模型迭代过程透明且可复现。\n- **弹性伸缩能力**：基于课程指导的架构，团队在不切换编程语言的前提下，轻松将训练和推理任务扩展至集群环境，从容应对大促流量洪峰。\n- **自动化持续交付**：建立了成熟的 CI\u002FCD 工作流，一旦新模型通过测试即可自动触发部署，将模型更新周期从数周缩短至小时级，显著提升了业务响应速度。\n\nmlops-course 帮助团队打破了算法与工程的壁垒，构建了一套可靠、可扩展且自动化的生产级机器学习系统。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_mlops-course_612592bf.png","GokuMohandas","Goku Mohandas","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FGokuMohandas_ebad4734.png","ml, bio, art, tennis, travel",null,"gokumd@gmail.com","https:\u002F\u002Fgithub.com\u002FGokuMohandas",[80,84,87,91],{"name":81,"color":82,"percentage":83},"Jupyter 
Notebook","#DA5B0B",97.9,{"name":85,"color":86,"percentage":32},"Python","#3572A5",{"name":88,"color":89,"percentage":90},"Shell","#89e051",0.1,{"name":92,"color":93,"percentage":94},"Makefile","#427819",0,3333,593,"2026-04-06T00:41:47","MIT","Linux, macOS, Windows","非必需（本地运行可配置为 0），若使用需支持 CUDA 的 NVIDIA GPU，具体型号和显存未说明","未说明",{"notes":103,"python":104,"dependencies":105},"支持在本地笔记本、Anyscale 集群、AWS\u002FGCP、Kubernetes 或本地服务器上运行。本地运行时建议将 GPU 数量设为 0。实验跟踪使用 MLflow，也可集成 Weights & Biases 或 Comet。建议使用 pyenv 管理 Python 版本。","3.10 (推荐)",[106],"未明确列出具体库名及版本 (需安装 requirements.txt)",[14,16,108,35,52],"其他",[110,111,112,113,114,115,116,117,118,119,120,121],"machine-learning","deep-learning","pytorch","mlops","data-engineering","data-quality","data-science","distributed-ml","llms","natural-language-processing","python","ray","2026-03-27T02:49:30.150509","2026-04-07T13:37:03.119865",[125,130,135,140,145,149],{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},22187,"如何在 Mac M1 芯片上解决依赖安装失败（特别是 numpy 构建错误）的问题？","Mac M1 用户在安装依赖时经常遇到 numpy 构建失败的问题。解决方案是将 Python 版本降级为 3.7.13。维护者已更新课程代码和 Makefile 以支持该版本，确保在任何设备上都能无问题运行。建议使用 pyenv 安装特定版本：`pyenv install 3.7.13` 并设置为局部或全局版本。","https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002Fmlops-course\u002Fissues\u002F22",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},22188,"运行 `mkdocs serve` 时出现 'module has no attribute' 错误或无法收集模块怎么办？","如果 mkdocs 无法识别某些文件夹中的 Python 文件或报错缺少属性，通常是因为这些文件夹缺少 `__init__.py` 文件。解决方法是在需要被 mkdocs 收集的每个文件夹中添加一个空的 `__init__.py` 文件，将其标识为常规的 Python 包文件夹。此外，确保使用兼容的版本组合，例如 Python 3.9.1、mkdocs==1.3.0 和 mkdocstrings==0.18.1。","https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002Fmlops-course\u002Fissues\u002F20",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},22189,"Docker 构建过程中运行 `dvc pull` 失败，提示找不到缓存文件或目录（如 stores）怎么办？","MLOps 课程的各个概念是紧密耦合的，必须按顺序学习。Dockerfile 中引用的目录（如 `stores`）是在课程早期的脚本步骤中生成的。如果跳过前面的步骤直接运行后期的 Docker 
构建，会因为缺少这些目录而失败。请务必严格按照课程顺序从头到尾执行，不要跳过任何章节。","https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002Fmlops-course\u002Fissues\u002F18",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},22190,"安装依赖时遇到 Python 包版本冲突（特别是 numpy、pandas 和 snorkel 之间）如何解决？","在使用较新的 Python 版本（如 3.10）时，`tagifai`、`snorkel` 和其他库对 `numpy` 的版本要求会发生冲突。最稳妥的解决方案是将 Python 环境版本切换为 3.7.13。维护者已确认该版本能避免大部分依赖冲突，并更新了项目配置以适配此版本。避免手动调整单个包的版本号，直接使用推荐的 Python 版本即可解决。","https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002Fmlops-course\u002Fissues\u002F16",{"id":146,"question_zh":147,"answer_zh":148,"source_url":144},22191,"为什么在 Mac M1 上即使更新了版本仍然无法构建 numpy wheel？","这是由于 Apple Silicon (M1) 架构与某些旧版科学计算库的编译兼容性问题。根本解决办法不是尝试修复编译错误，而是更换 Python 版本。社区反馈表明，使用 Python 3.7.13 可以完美避开此问题，因为该版本有预编译的二进制包或更容易在 M1 上编译。请卸载当前 Python 环境，重新安装 3.7.13 版本。",{"id":150,"question_zh":151,"answer_zh":152,"source_url":139},22192,"课程中的脚本或 Docker 镜像无法运行，是否可以直接跳到后面的章节？","不可以。MLOps 课程的设计是循序渐进的，后续章节依赖前序章节生成的配置文件、数据目录和环境变量。例如，`stores` 目录是在早期课程中通过运行配置脚本创建的。跳过步骤会导致后续命令（如 `dvc pull` 或 Docker 构建）因找不到必要资源而失败。请按顺序完成所有练习。",[]]