[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-MathFoundationRL--Book-Mathematical-Foundation-of-Reinforcement-Learning":3,"tool-MathFoundationRL--Book-Mathematical-Foundation-of-Reinforcement-Learning":65},[4,23,32,40,49,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":22},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,2,"2026-04-05T10:45:23",[13,14,15,16,17,18,19,20,21],"图像","数据工具","视频","插件","Agent","其他","语言模型","开发框架","音频","ready",{"id":24,"name":25,"github_repo":26,"description_zh":27,"stars":28,"difficulty_score":29,"last_commit_at":30,"category_tags":31,"status":22},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[17,13,20,19,18],{"id":33,"name":34,"github_repo":35,"description_zh":36,"stars":37,"difficulty_score":29,"last_commit_at":38,"category_tags":39,"status":22},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 
解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[19,13,20,18],{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":46,"last_commit_at":47,"category_tags":48,"status":22},3215,"awesome-machine-learning","josephmisiti\u002Fawesome-machine-learning","awesome-machine-learning 是一份精心整理的机器学习资源清单，汇集了全球优秀的机器学习框架、库和软件工具。面对机器学习领域技术迭代快、资源分散且难以甄选的痛点，这份清单按编程语言（如 Python、C++、Go 等）和应用场景（如计算机视觉、自然语言处理、深度学习等）进行了系统化分类，帮助使用者快速定位高质量项目。\n\n它特别适合开发者、数据科学家及研究人员使用。无论是初学者寻找入门库，还是资深工程师对比不同语言的技术选型，都能从中获得极具价值的参考。此外，清单还延伸提供了免费书籍、在线课程、行业会议、技术博客及线下聚会等丰富资源，构建了从学习到实践的全链路支持体系。\n\n其独特亮点在于严格的维护标准：明确标记已停止维护或长期未更新的项目，确保推荐内容的时效性与可靠性。作为机器学习领域的“导航图”，awesome-machine-learning 以开源协作的方式持续更新，旨在降低技术探索门槛，让每一位从业者都能高效地站在巨人的肩膀上创新。",72149,1,"2026-04-03T21:50:24",[20,18],{"id":50,"name":51,"github_repo":52,"description_zh":53,"stars":54,"difficulty_score":46,"last_commit_at":55,"category_tags":56,"status":22},2234,"scikit-learn","scikit-learn\u002Fscikit-learn","scikit-learn 是一个基于 Python 构建的开源机器学习库，依托于 SciPy、NumPy 等科学计算生态，旨在让机器学习变得简单高效。它提供了一套统一且简洁的接口，涵盖了从数据预处理、特征工程到模型训练、评估及选择的全流程工具，内置了包括线性回归、支持向量机、随机森林、聚类等在内的丰富经典算法。\n\n对于希望快速验证想法或构建原型的数据科学家、研究人员以及 Python 开发者而言，scikit-learn 是不可或缺的基础设施。它有效解决了机器学习入门门槛高、算法实现复杂以及不同模型间调用方式不统一的痛点，让用户无需重复造轮子，只需几行代码即可调用成熟的算法解决分类、回归、聚类等实际问题。\n\n其核心技术亮点在于高度一致的 API 设计风格，所有估算器（Estimator）均遵循相同的调用逻辑，极大地降低了学习成本并提升了代码的可读性与可维护性。此外，它还提供了强大的模型选择与评估工具，如交叉验证和网格搜索，帮助用户系统地优化模型性能。作为一个由全球志愿者共同维护的成熟项目，scikit-learn 以其稳定性、详尽的文档和活跃的社区支持，成为连接理论学习与工业级应用的最",65628,"2026-04-05T10:10:46",[20,18,14],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":10,"last_commit_at":63,"category_tags":64,"status":22},3364,"keras","keras-team\u002Fkeras","Keras 
是一个专为人类设计的深度学习框架，旨在让构建和训练神经网络变得简单直观。它解决了开发者在不同深度学习后端之间切换困难、模型开发效率低以及难以兼顾调试便捷性与运行性能的痛点。\n\n无论是刚入门的学生、专注算法的研究人员，还是需要快速落地产品的工程师，都能通过 Keras 轻松上手。它支持计算机视觉、自然语言处理、音频分析及时间序列预测等多种任务。\n\nKeras 3 的核心亮点在于其独特的“多后端”架构。用户只需编写一套代码，即可灵活选择 TensorFlow、JAX、PyTorch 或 OpenVINO 作为底层运行引擎。这一特性不仅保留了 Keras 一贯的高层易用性，还允许开发者根据需求自由选择：利用 JAX 或 PyTorch 的即时执行模式进行高效调试，或切换至速度最快的后端以获得最高 350% 的性能提升。此外，Keras 具备强大的扩展能力，能无缝从本地笔记本电脑扩展至大规模 GPU 或 TPU 集群，是连接原型开发与生产部署的理想桥梁。",63927,"2026-04-04T15:24:37",[20,14,18],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":81,"owner_email":79,"owner_twitter":79,"owner_website":82,"owner_url":83,"languages":84,"stars":93,"forks":94,"last_commit_at":95,"license":79,"difficulty_score":46,"env_os":96,"env_gpu":97,"env_ram":97,"env_deps":98,"category_tags":101,"github_topics":102,"view_count":108,"oss_zip_url":79,"oss_zip_packed_at":79,"status":22,"created_at":109,"updated_at":110,"faqs":111,"releases":112},2928,"MathFoundationRL\u002FBook-Mathematical-Foundation-of-Reinforcement-Learning","Book-Mathematical-Foundation-of-Reinforcement-Learning","This is the homepage of a new book entitled \"Mathematical Foundations of Reinforcement Learning.\"","《强化学习的数学基础》是一本专为希望深入理解强化学习核心原理的读者打造的新书。它不只罗列算法步骤，更致力于从数学视角解释“为什么这样设计”以及“为何有效”，帮助读者透过现象看本质。\n\n针对现有资料往往重应用轻推导、或数学门槛过高导致难以入门的痛点，本书精心控制了数学深度，采用循序渐进的结构，将复杂的算法核心思想与干扰细节剥离。全书基于直观的网格世界任务展开大量实例演示，并创新性地使用灰色框标记可选读的深层数学内容，让不同背景的读者都能按需阅读，兼顾了严谨性与易读性。\n\n这本书非常适合高年级本科生、研究生、科研人员及从业者使用。即使你没有强化学习背景，只要具备概率论和线性代数基础，就能顺利上手；而对于已有经验的开发者，它也能提供全新的理论视角以深化认知。此外，作者还配套了浏览量超 200 万的中英文讲座视频，结合书本学习能获得更佳效果。作为西湖大学赵世钰教授多年教学讲义的结晶，它是通往强化学习理论殿堂的一座友好桥梁。","# About the Latex source code of my slides\n\nIf you are a professor and preparing a course and would like to use any content from my slides, feel free to reach out by email. 
I can share the source code with you. The slides were created using Latex\u002FBeamer.\n\nRegarding reader feedback and questions in the discussion section, please note that due to a high volume of commitments, there may be significant delays in my response. Your understanding would be greatly appreciated.\n\n***\n***\n\n# Why a new book on reinforcement learning?\n\nThis book aims to provide a **mathematical but friendly** introduction to the fundamental concepts, basic problems, and classic algorithms in reinforcement learning. Some essential features of this book are highlighted as follows.\n\n- The book introduces reinforcement learning from a mathematical point of view. Hopefully, readers will not only know the procedure of an algorithm but also understand why it was designed in the first place and why it works effectively.\n\n- The depth of the mathematics is carefully controlled to an adequate level. The mathematics is also presented in a carefully designed manner to ensure that the book is friendly to read. Readers can selectively read the materials presented in gray boxes according to their interests.\n\n- Many illustrative examples are given to help readers better understand the topics. All the examples in this book are based on a grid world task, which is easy to understand and helpful for illustrating concepts and algorithms.\n\n- When introducing an algorithm, the book aims to separate its core idea from complications that may be distracting. In this way, readers can better grasp the core idea of an algorithm.\n\n- The contents of the book are coherently organized. 
Each chapter is built based on the preceding chapter and lays a necessary foundation for the subsequent one.\n\n[![Book cover](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMathFoundationRL_Book-Mathematical-Foundation-of-Reinforcement-Learning_readme_f45afa537d0d.png)](https:\u002F\u002Flink.springer.com\u002Fbook\u002F9789819739431)\n\n# Contents\n\nThe topics addressed in the book are shown in the figure below. This book contains ten chapters, which can be classified into two parts: the first part is about basic tools, and the second part is about algorithms. The ten chapters are highly correlated. In general, it is necessary to study the earlier chapters first before the later ones.\n\n![The map of this book](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMathFoundationRL_Book-Mathematical-Foundation-of-Reinforcement-Learning_readme_a5715d962d6d.png)\n\n\n# Readership\n\nThis book is designed for senior undergraduate students, graduate students, researchers, and practitioners interested in reinforcement learning.\n\nIt does not require readers to have any background in reinforcement learning because it starts by introducing the most basic concepts. If the reader already has some background in reinforcement learning, I believe the book can help them understand some topics more deeply or provide different perspectives.\n\nThis book, however, requires the reader to have some knowledge of probability theory and linear algebra. Some basics of the required mathematics are also included in the appendix of this book.\n\n# About the author\nYou can find my info on my homepage https:\u002F\u002Fwww.shiyuzhao.net (GoogleSite) and my research group website https:\u002F\u002Fshiyuzhao.westlake.edu.cn\n\nI have been teaching a graduate-level course on reinforcement learning since 2019. Along with teaching, I have been preparing this book as the lecture notes for my students. 
\n\nI sincerely hope this book can help readers smoothly enter the exciting field of reinforcement learning.\n\n# Citation\n\n```\n@book{zhao2025RLBook,\n  title={Mathematical Foundations of Reinforcement Learning},\n  author={S. Zhao},\n  year={2025},\n  publisher={Springer Press}\n}\n```\n# Lecture videos \n\nThe lecture videos have received **2,100,000+ views** over the Internet and received very good feedback!\nBy combining the book with my lecture videos, I believe you can study better. \n\n- **Chinese lecture videos:** You can check the [Bilibili channel](https:\u002F\u002Fspace.bilibili.com\u002F2044042934) or the [Youtube channel](https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUCztGtS5YYiNv8x3pj9hLVgg\u002Fplaylists).\n- **English lecture videos:** The English lecture videos have been uploaded to YouTube: [link here](https:\u002F\u002Fyoutube.com\u002Fplaylist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&si=D1T4pcyHsMxj6CzB)\n\n[![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMathFoundationRL_Book-Mathematical-Foundation-of-Reinforcement-Learning_readme_5676b1573a8e.png)](https:\u002F\u002Fyoutube.com\u002Fplaylist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&si=B6mRR7vxBAjRAm_F)\n\n- [Overview of Reinforcement Learning in 30 Minutes](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ZHMWHr9811U&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=1)\n- [L1: Basic Concepts (P1-State, action, policy, ...)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=zJHtM5dN69g&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=2)\n- [L1: Basic Concepts (P2-Reward,return, Markov decision process)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=repVl3_GYCI&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=3)\n- [L2: Bellman Equation (P1-Motivating examples)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=XCzWrlgZCwc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=4)\n- [L2: Bellman Equation (P2-State 
value)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DSvi3xEN13I&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=5)\n- [L2: Bellman Equation (P3-Bellman equation-Derivation)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=eNtId8yPWkA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=6)\n- [L2: Bellman Equation (P4-Matrix-vector form and solution)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EtCfBG_eP2w&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=7)\n- [L2: Bellman Equation (P5-Action value)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=zJo2sLDzfcU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=8)\n- [L3: Bellman Optimality Equation (P1-Motivating example)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=lXKY_Hyg4SQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=9)\n- [L3: Bellman Optimality Equation (P2-Optimal policy)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=BxyjdHhK8a8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=10)\n- [L3: Bellman Optimality Equation (P3-More on BOE)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=FXftTCKotC8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=11)\n- [L3: Bellman Optimality Equation (P4-Interesting properties)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=a--bck2ow9s&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=12)\n- [L4: Value Iteration and Policy Iteration (P1-Value iteration)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=wMAVmLDIvQU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=13)\n- [L4: Value Iteration and Policy Iteration (P2-Policy iteration)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Pka6Om0nYQ8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=14)\n- [L4: Value Iteration and Policy Iteration (P3-Truncated policy iteration)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=tUjPFPD3Vc8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=15)\n- [L5: Monte Carlo Learning (P1-Motivating examples)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DO1yXinAV_Q&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=16)\n- [L5: Monte 
Carlo Learning (P2-MC Basic-introduction)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=6ShisunU0zs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=17)\n- [L5: Monte Carlo Learning (P3-MC Basic-examples)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=axA0yns9FxU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=18)\n- [L5: Monte Carlo Learning (P4-MC Exploring Starts)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Qt8OMHPkLqg&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=19)\n- [L5: Monte Carlo Learning (P5-MC Epsilon-Greedy-introduction)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=dM3fYE630pY&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=20)\n- [L5: Monte Carlo Learning (P6-MC Epsilon-Greedy-examples)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=x6X_5ePT9gQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=21)\n- [L6: Stochastic Approximation and SGD (P1-Motivating example)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=1bMgejvWoAo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=22)\n- [L6: Stochastic Approximation and SGD (P2-RM algorithm: introduction)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=1FTGcNUUnCE&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=23)\n- [L6: Stochastic Approximation and SGD (P3-RM algorithm: convergence)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=juNDoAFEre4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=24)\n- [L6: Stochastic Approximation and SGD (P4-SGD algorithm: introduction)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EZO7Iadp5m4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=25)\n- [L6: Stochastic Approximation and SGD (P5-SGD algorithm: examples)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=BsxU_4qvvNA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=26)\n- [L6: Stochastic Approximation and SGD (P6-SGD algorithm: properties)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=fWxX9YuEHjE&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=27)\n- [L6: Stochastic Approximation and SGD (P7-SGD algorithm: 
comparison)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=yNEV2cLKuzU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=28)\n- [L7: Temporal-Difference Learning (P1-Motivating example)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=u1X-7XX3dtI&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=29)\n- [L7: Temporal-Difference Learning (P2-TD algorithm: introduction)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=XiCUsc7CCE0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=30)\n- [L7: Temporal-Difference Learning (P3-TD algorithm: convergence)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=faWg8M91-Oo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=31)\n- [L7: Temporal-Difference Learning (P4-Sarsa)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=jYwQufkBUPo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=32)\n- [L7: Temporal-Difference Learning (P5-Expected Sarsa & n-step Sarsa)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0kKzQbWZOlk&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=33)\n- [L7: Temporal-Difference Learning (P6-Q-learning: introduction)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=4BvYR2hm730&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=34)\n- [L7: Temporal-Difference Learning (P7-Q-learning: pseudo code)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=I0YhlOIFF4s&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=35)\n- [L7: Temporal-Difference Learning (P8-Unified viewpoint and summary)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=3t74lvk1GBM&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=36)\n- [L8: Value Function Approximation (P1-Motivating example–curve fitting)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=uJXcI8fcdWc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=37)\n- [L8: Value Function Approximation (P2-Objective function)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Z3HI1TfpJP0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=38)\n- [L8: Value Function Approximation (P3-Optimization 
algorithm)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=piBDwrKt0uU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=39)\n- [L8: Value Function Approximation (P4-illustrative examples and analysis)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=VFyBNEZxMMs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=40)\n- [L8: Value Function Approximation (P5-Sarsa and Q-learning)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=C-HtY4-W_zw&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=41)\n- [L8: Value Function Approximation (P6-DQN–basic idea)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=lZCcbZbqVSQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=42)\n- [L8: Value Function Approximation (P7-DQN–experience replay)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=rynEdAdebi0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=43)\n- [L8: Value Function Approximation (P8-DQN–implementation and example)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=vQHuCHjd6hA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=44)\n- [L9: Policy Gradient Methods (P1-Basic idea)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=mtFHOj83QSo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=45)\n- [L9: Policy Gradient Methods (P2-Metric 1–Average value)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=la8jQc3hX1M&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=46)\n- [L9: Policy Gradient Methods (P3-Metric 2–Average reward)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8RZ_rQFe69E&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=47)\n- [L9: Policy Gradient Methods (P4-Gradients of the metrics)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=MvmtPXur3Ls&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=48)\n- [L9: Policy Gradient Methods (P5-Gradient-based algorithms & REINFORCE)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=1DQnnUC8ng8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=49)\n- [L10: Actor-Critic Methods (P1-The simplest 
Actor-Critic)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=kjCZAT5Wh80&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=50)\n- [L10: Actor-Critic Methods (P2-Advantage Actor-Critic)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=vZVXJJcZNEM&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=51)\n- [L10: Actor-Critic Methods (P3-Importance sampling & off-policy Actor-Critic)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=TfO5mnsiGKc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=52)\n- [L10: Actor-Critic Methods (P4-Deterministic Actor-Critic)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=dTjz1RNtic4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=53)\n- [L10: Actor-Critic Methods (P5-Summary and goodbye!)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=npvnnKcXoBs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=54)\n\n**Some comments from YouTube and Amazon:**\n\n[![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMathFoundationRL_Book-Mathematical-Foundation-of-Reinforcement-Learning_readme_2f90244c2055.jpg)](https:\u002F\u002Fyoutube.com\u002Fplaylist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&si=B6mRR7vxBAjRAm_F)\n[![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMathFoundationRL_Book-Mathematical-Foundation-of-Reinforcement-Learning_readme_ed7ae2b88f4c.jpg)](https:\u002F\u002Fyoutube.com\u002Fplaylist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&si=B6mRR7vxBAjRAm_F)\n\n\n# Third-party code and materials\n\nMany enthusiastic readers have sent me the source code or notes that they developed while studying this book. If you create any materials based on this course, you are welcome to send me an email. I am happy to share the links here and hope they may be helpful to other readers. I must emphasize that I have not verified the code. If you have any questions, you can directly contact the developers. 
\n\n**Code**\n\n*Python:*\n- https:\u002F\u002Fgithub.com\u002FRonchy2000\u002FMulti-agent-RL\u002Ftree\u002Fmaster\u002FRL_Learning-main (Oct 2025, by Rongqi Lu)\n\n- https:\u002F\u002Fgithub.com\u002Fzhoubay\u002FCode-for-Mathematical-Foundations-of-Reinforcement-Learning (Mar 2025, by Xibin ZHOU)\n\n- https:\u002F\u002Fgithub.com\u002F10-OASIS-01\u002Fminrl (Feb 2025)\n\n- https:\u002F\u002Fgithub.com\u002FSupermanCaozh\u002FThe_Coding_Foundation_in_Reinforcement_Learning  (by Zehong Cao, Aug 2024)\n\n- https:\u002F\u002Fgithub.com\u002Fziwenhahaha\u002FCode-of-RL-Beginning by RLGamer (Mar 2024)\n  - Videos for code explanation: https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1fW421w7NH\n\n- https:\u002F\u002Fgithub.com\u002Fjwk1rose\u002FRL_Learning by Wenkang Ji (Feb 2024)\n\n*Matlab:*\n-  https:\u002F\u002Fgithub.com\u002FEveryDayIsaSong\u002FMATLAB-Code-for-Mathematical-Foundation-of-Reinforcement-Learning (by Yucheng Mao, Jan 2026)\n\n*R:*\n\n- https:\u002F\u002Fgithub.com\u002FNewbieToEverything\u002FCode-Mathmatical-Foundation-of-Reinforcement-Learning\n\n*C++:*\n\n- https:\u002F\u002Fgithub.com\u002Fpurundong\u002Ftest_rl\n\n\n**Study notes**\n\n*English:*\n\n- https:\u002F\u002Flyk-love.cn\u002Ftags\u002Freinforcement-learning\u002F \nby a graduate student from UC Davis\n\n*Chinese:* \n\n- https:\u002F\u002Fgithub.com\u002FPeanut-Study\u002FReinforcement-Learning-Study-Note\u002Ftree\u002Fmain (Jan 2026)\n  \n- https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F692207843 \n\n- https:\u002F\u002Fblog.csdn.net\u002Fqq_64671439\u002Fcategory_12540921.html\n\n- http:\u002F\u002Ft.csdnimg.cn\u002FEH4rj\n\n- https:\u002F\u002Fblog.csdn.net\u002FLvGreat\u002Farticle\u002Fdetails\u002F135454738\n\n- https:\u002F\u002Fxinzhe.blog.csdn.net\u002Farticle\u002Fdetails\u002F129452000  \n\n- https:\u002F\u002Fblog.csdn.net\u002Fv20000727\u002Farticle\u002Fdetails\u002F136870879?spm=1001.2014.3001.5502\n\n- 
https:\u002F\u002Fblog.csdn.net\u002Fm0_64952374\u002Fcategory_12883361.html\n\nThere are also many other notes made by other readers on the Internet. I am not able to put them all here. You are welcome to recommend good ones to me.\n\n**Bilibili videos made based on my course**\n\n- https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1DMBYB6Edo (Jan 2026)\n\n- https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1fW421w7NH\n\n- https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Ne411m7GX\n  \n- https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1HX4y1H7uR\n  \n- https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1TgzsYDEnP\n  \n- https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1CQ4y1J7zu\n\n# Update history\n\n**(July 2025) Minor update: Typo corrections**\n\nIt has been nearly a year since the book's last update. During this period, keen-eyed readers have identified about ten additional typos and brought them to my attention. I really appreciate it. To prevent confusion for future readers, I have corrected these typos and updated the PDF files on GitHub. For the printed Springer edition, I have prepared an errata list. Should you read a printed copy, you may consult this list as needed.\n\nFinally, I wish to express heartfelt appreciation for our readers' invaluable contributions. Since its release, this book has received tremendous attention and feedback, enabling swift refinements. While I believe most typos have now been addressed, please don't hesitate to reach out if you spot anything that needs correction.\n\n**(Jun 2025) 10,000+ stars!**\n\n(Dec 2024) 4,000+ stars -> (Feb 2025) 5,000+ stars -> (Mar 2025) 7,000+ stars! -> (Apr 2025) 8,000+ stars! -> (May 2025) 9,000+ stars!\n\n**(Oct 2024) Book cover**\n\nThe design of the book cover is finished. The book will be officially published by Springer early next year. 
It has already been published by Tsinghua University Press.\n\n\n**(Sep 2024) Minor update before printing by Springer**\n\nI revised some very minor places that readers may hardly notice. It is supposed to be the final version before printing by Springer. \n\n**(Aug 2024) 3000 Stars and more code**\n\nThe book has received 3000+ stars, which is a great achievement for me. Thanks to everyone. I hope it has really helped you.\n\nI have also received more code implementations from enthusiastic readers. For example, this [GitHub page](https:\u002F\u002Fgithub.com\u002FSupermanCaozh\u002FThe_Coding_Foundation_in_Reinforcement_Learning) provides Python implementations of almost all the examples in my book. On the one hand, I am very glad to see that. On the other hand, I am a little worried that the students in my offline class may use the code to do their homework :-). Overall, I am happy because it indicates that the book and open course are really helpful to readers; otherwise, they would not bother to develop the code themselves :-)\n\n**(Jun 2024) Minor update before printing**\n\nThis is the fourth version of the book draft. It is supposed to be the final one before the book is officially published. Specifically, while proofreading the manuscript, I detected some very minor issues. Together with a few reported by enthusiastic readers, they have been revised in this version.\n\n**(Apr 2024) Code for the Grid-World Environment**\n\nWe have added the code for the grid-world environment used in my book. Interested readers can develop and test their own algorithms in this environment. Both Python and MATLAB versions are provided.\n\nPlease note that we do not provide the code for all the algorithms in the book. That is because they are homework for the students in my offline teaching: the students need to develop their own algorithms using the provided environment. Nevertheless, there are third-party implementations of some algorithms. 
Interested readers can check the links on the home page of the book.\n\nI need to thank my PhD students, Yize Mi and Jianan Li, who are also the Teaching Assistants of my offline course. They contributed greatly to the code.\n\nYou are welcome to provide any feedback about the code, such as any bugs you detect.\n\n**(Mar 2024) 2K stars**\n\nThe book has received 2K stars. I have also received many positive evaluations of the book from readers. I am very glad that it can be helpful. \n\n**(Mar 2024) Minor update**\n\nThe third version of the draft of the book is online now.\n\nCompared to the second version, the third version is improved in the sense that some minor typos have been corrected. Here, I would like to thank the readers who sent me their feedback. \n\n**(Sep 2023) 1000+ stars**\n\nThe book received 1000+ stars! Thanks, everybody!\n\n**(Aug 2023) Major update - second version**\n\n*The second version of the draft of the book is online now!!*\n\nCompared to the first version, which was put online one year ago, the second version has been improved in various ways. For example, we replotted most of the figures, reorganized some contents to make them clearer, corrected some typos, and added Chapter 10, which was not included in the first version. \n\nI put the first draft of this book online in August 2022. Up to now, I have received valuable feedback from many readers worldwide. I want to express my gratitude to these readers.\n\n**(Nov 2022) Will be jointly published**\n\nThis book will be published *jointly by Springer Nature and Tsinghua University Press*. It will probably be printed in the second half of 2023.\n\nI have received some comments and suggestions about this book from readers. Thanks a lot; I appreciate it. I am still collecting feedback and will probably revise the draft in several months. 
Your feedback can make this book more helpful for other readers!\n\n**(Oct 2022) Lecture notes and videos**\n\nThe *lecture slides* have been uploaded in the folder \"Lecture slides.\"\n\nThe *lecture videos* (in Chinese) are online. Please check our Bilibili channel https:\u002F\u002Fspace.bilibili.com\u002F2044042934 or the YouTube channel https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUCztGtS5YYiNv8x3pj9hLVgg\u002Fplaylists\n\n**(Aug 2022) First draft**\n\nThe first draft of the book is online.\n","# 关于我的幻灯片的 LaTeX 源代码\n\n如果您是教授，正在准备课程，并希望使用我幻灯片中的任何内容，请随时通过电子邮件与我联系。我可以将源代码分享给您。这些幻灯片是使用 LaTeX\u002FBeamer 制作的。\n\n关于读者反馈和讨论区的问题，请注意，由于我事务繁忙，回复可能会有较大延迟。感谢您的理解。\n\n***\n***\n\n# 为什么需要一本新的强化学习书籍？\n\n本书旨在为强化学习的基本概念、基础问题和经典算法提供一种**既具有数学性又易于理解**的介绍。本书的一些重要特点如下：\n\n- 本书从数学的角度介绍强化学习。希望通过阅读本书，读者不仅能了解算法的具体步骤，还能理解其设计初衷以及为何能够有效工作。\n  \n- 数学深度被精心控制在适当的水平，并且以经过仔细设计的方式呈现，确保全书易于阅读。读者可以根据自己的兴趣选择性地阅读灰色背景框中的内容。\n\n- 书中提供了大量示例，帮助读者更好地理解相关主题。所有示例均基于网格世界任务，这一任务简单易懂，有助于阐释概念和算法。\n\n- 在介绍算法时，本书力求将核心思想与可能分散注意力的复杂细节区分开来，从而使读者更容易抓住算法的核心要点。\n\n- 全书内容组织连贯，每一章都建立在前一章的基础上，并为后续章节奠定必要的基础。\n\n[![图书封面](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMathFoundationRL_Book-Mathematical-Foundation-of-Reinforcement-Learning_readme_f45afa537d0d.png)](https:\u002F\u002Flink.springer.com\u002Fbook\u002F9789819739431)\n\n# 目录\n\n本书涵盖的主题如图所示。全书共十章，可分为两大部分：第一部分介绍基础工具，第二部分则聚焦于算法。这十章内容紧密相关，通常建议先学习前面的章节，再继续后面的章节。\n\n![本书目录图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMathFoundationRL_Book-Mathematical-Foundation-of-Reinforcement-Learning_readme_a5715d962d6d.png)\n\n# 读者对象\n\n本书适合对强化学习感兴趣的高年级本科生、研究生、研究人员及从业者。本书无需读者具备强化学习的基础知识，因为会从最基础的概念开始讲解。如果读者已经有一定的强化学习背景，相信本书也能帮助他们更深入地理解某些主题，或提供不同的视角。\n\n不过，本书要求读者具备一定的概率论和线性代数知识。书中附录也简要介绍了所需的数学基础知识。\n\n# 关于作者\n您可以在我的个人主页 https:\u002F\u002Fwww.shiyuzhao.net（GoogleSite）以及我的研究组网站 https:\u002F\u002Fshiyuzhao.westlake.edu.cn 上找到我的相关信息。\n\n自2019年起，我一直教授一门研究生级别的强化学习课程。在教学的同时，我也一直在编写这本书，作为学生的课堂讲义。\n\n我衷心希望本书能帮助读者顺利进入激动人心的强化学习领域。\n\n# 
引用格式\n\n```\n@book{zhao2025RLBook,\n  title={Mathematical Foundations of Reinforcement Learning},\n  author={S. Zhao},\n  year={2025},\n  publisher={Springer Press}\n}\n```\n\n# 讲座视频\n\n这些讲座视频在互联网上已获得**210万+次观看**，并得到了非常好的反馈！结合本书与我的讲座视频，相信您能更好地学习相关内容。\n\n- **中文讲座视频：** 您可以访问 [Bilibili 频道](https:\u002F\u002Fspace.bilibili.com\u002F2044042934) 或 [YouTube 频道](https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUCztGtS5YYiNv8x3pj9hLVgg\u002Fplaylists)。\n- **英文讲座视频：** 英文讲座视频已上传至 YouTube：[点击此处](https:\u002F\u002Fyoutube.com\u002Fplaylist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&si=D1T4pcyHsMxj6CzB)\n\n[![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMathFoundationRL_Book-Mathematical-Foundation-of-Reinforcement-Learning_readme_5676b1573a8e.png)](https:\u002F\u002Fyoutube.com\u002Fplaylist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&si=B6mRR7vxBAjRAm_F)\n\n- [30分钟掌握强化学习概览](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ZHMWHr9811U&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=1)\n- [L1：基本概念（P1—状态、动作、策略等）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=zJHtM5dN69g&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=2)\n- [L1：基本概念（P2—奖励、回报、马尔可夫决策过程）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=repVl3_GYCI&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=3)\n- [L2：贝尔曼方程（P1—动机性示例）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=XCzWrlgZCwc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=4)\n- [L2：贝尔曼方程（P2—状态价值）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DSvi3xEN13I&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=5)\n- [L2：贝尔曼方程（P3—贝尔曼方程推导）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=eNtId8yPWkA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=6)\n- [L2：贝尔曼方程（P4—矩阵-向量形式及解法）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EtCfBG_eP2w&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=7)\n- [L2：贝尔曼方程（P5—动作价值）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=zJo2sLDzfcU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=8)\n- 
[L3：贝尔曼最优性方程（P1—动机性示例）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=lXKY_Hyg4SQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=9)\n- [L3：贝尔曼最优性方程（P2—最优策略）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=BxyjdHhK8a8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=10)\n- [L3：贝尔曼最优性方程（P3—关于BOE的更多内容）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=FXftTCKotC8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=11)\n- [L3：贝尔曼最优性方程（P4—有趣性质）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=a--bck2ow9s&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=12)\n- [L4：值迭代与策略迭代（P1—值迭代）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=wMAVmLDIvQU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=13)\n- [L4：值迭代与策略迭代（P2—策略迭代）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Pka6Om0nYQ8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=14)\n- [L4：值迭代与策略迭代（P3—截断策略迭代）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=tUjPFPD3Vc8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=15)\n- [L5：蒙特卡洛学习（P1—动机性示例）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DO1yXinAV_Q&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=16)\n- [L5：蒙特卡洛学习（P2—MC基础介绍）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=6ShisunU0zs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=17)\n- [L5：蒙特卡洛学习（P3—MC基础示例）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=axA0yns9FxU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=18)\n- [L5：蒙特卡洛学习（P4—探索式起点）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Qt8OMHPkLqg&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=19)\n- [L5：蒙特卡洛学习（P5—ε-贪心策略介绍）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=dM3fYE630pY&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=20)\n- [L5：蒙特卡洛学习（P6—ε-贪心策略示例）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=x6X_5ePT9gQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=21)\n- [L6：随机逼近与SGD（P1—动机性示例）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=1bMgejvWoAo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=22)\n- 
[L6：随机逼近与SGD（P2—RM算法介绍）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=1FTGcNUUnCE&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=23)\n- [L6：随机逼近与SGD（P3—RM算法收敛性）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=juNDoAFEre4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=24)\n- [L6：随机逼近与SGD（P4—SGD算法介绍）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EZO7Iadp5m4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=25)\n- [L6：随机逼近与SGD（P5—SGD算法示例）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=BsxU_4qvvNA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=26)\n- [L6：随机逼近与SGD（P6—SGD算法性质）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=fWxX9YuEHjE&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=27)\n- [L6：随机逼近与SGD（P7—SGD算法比较）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=yNEV2cLKuzU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=28)\n- [L7：时序差分学习（P1—动机性示例）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=u1X-7XX3dtI&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=29)\n- [L7：时序差分学习（P2—TD算法介绍）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=XiCUsc7CCE0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=30)\n- [L7：时序差分学习（P3—TD算法收敛性）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=faWg8M91-Oo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=31)\n- [L7：时序差分学习（P4—Sarsa）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=jYwQufkBUPo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=32)\n- [L7：时序差分学习（P5—期望Sarsa与n步Sarsa）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0kKzQbWZOlk&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=33)\n- [L7：时序差分学习（P6—Q-learning介绍）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=4BvYR2hm730&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=34)\n- [L7：时序差分学习（P7—Q-learning伪代码）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=I0YhlOIFF4s&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=35)\n- [L7：时序差分学习（P8—统一视角与总结）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=3t74lvk1GBM&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=36)\n- 
[L8：值函数近似（P1—动机性示例—曲线拟合）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=uJXcI8fcdWc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=37)\n- [L8：值函数近似（P2—目标函数）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Z3HI1TfpJP0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=38)\n- [L8：值函数近似（P3—优化算法）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=piBDwrKt0uU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=39)\n- [L8：值函数近似（P4—示例分析）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=VFyBNEZxMMs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=40)\n- [L8：值函数近似（P5—Sarsa和Q-learning）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=C-HtY4-W_zw&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=41)\n- [L8：值函数近似（P6—DQN的基本思想）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=lZCcbZbqVSQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=42)\n- [L8：值函数近似（P7—DQN的经验回放）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=rynEdAdebi0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=43)\n- [L8：值函数近似（P8—DQN的实现与示例）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=vQHuCHjd6hA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=44)\n- [L9：策略梯度方法（P1—基本思想）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=mtFHOj83QSo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=45)\n- [L9：策略梯度方法（P2—指标1—平均值）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=la8jQc3hX1M&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=46)\n- [L9：策略梯度方法（P3—指标2—平均奖励）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8RZ_rQFe69E&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=47)\n- [L9：策略梯度方法（P4—指标的梯度）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=MvmtPXur3Ls&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=48)\n- [L9：策略梯度方法（P5—基于梯度的算法及REINFORCE）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=1DQnnUC8ng8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=49)\n- [L10：演员-评论家方法（P1—最简单的演员-评论家）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=kjCZAT5Wh80&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=50)\n- 
[L10：演员-评论家方法（P2—优势演员-评论家）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=vZVXJJcZNEM&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=51)\n- [L10：演员-评论家方法（P3—重要性采样及离策略演员-评论家）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=TfO5mnsiGKc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=52)\n- [L10：演员-评论家方法（P4—确定性演员-评论家）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=dTjz1RNtic4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=53)\n- [L10：演员-评论家方法（P5—总结与告别！）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=npvnnKcXoBs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=54)\n\n**来自YouTube和亚马逊的一些评论：**\n\n[![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMathFoundationRL_Book-Mathematical-Foundation-of-Reinforcement-Learning_readme_2f90244c2055.jpg)](https:\u002F\u002Fyoutube.com\u002Fplaylist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&si=B6mRR7vxBAjRAm_F)\n[![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMathFoundationRL_Book-Mathematical-Foundation-of-Reinforcement-Learning_readme_ed7ae2b88f4c.jpg)](https:\u002F\u002Fyoutube.com\u002Fplaylist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&si=B6mRR7vxBAjRAm_F)\n\n\n\n\n# 第三方代码和资料\n\n许多热心的读者向我寄来了他们在学习本书时编写的源代码或笔记。如果你基于本课程创建了任何相关资料，欢迎发邮件给我。我很乐意在此分享这些链接，希望能对其他读者有所帮助。需要强调的是，我对这些代码并未进行验证。如有任何疑问，可以直接联系开发者。\n\n**代码**\n\n*Python:*\n- https:\u002F\u002Fgithub.com\u002FRonchy2000\u002FMulti-agent-RL\u002Ftree\u002Fmaster\u002FRL_Learning-main（2025年10月，由Lu Rongqi提供）\n\n- https:\u002F\u002Fgithub.com\u002Fzhoubay\u002FCode-for-Mathematical-Foundations-of-Reinforcement-Learning（2025年3月，由ZHOU Xibin提供）\n\n- https:\u002F\u002Fgithub.com\u002F10-OASIS-01\u002Fminrl（2025年2月）\n\n- https:\u002F\u002Fgithub.com\u002FSupermanCaozh\u002FThe_Coding_Foundation_in_Reinforcement_Learning（由CAO Zehong提供，2024年8月）\n\n- https:\u002F\u002Fgithub.com\u002Fziwenhahaha\u002FCode-of-RL-Beginning，作者RLGamer（2024年3月）\n  - 代码讲解视频：https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1fW421w7NH\n\n- https:\u002F\u002Fgithub.com\u002Fjwk1rose\u002FRL_Learning，作者JI 
Wenkang（2024年2月）\n\n*Matlab:*\n- https:\u002F\u002Fgithub.com\u002FEveryDayIsaSong\u002FMATLAB-Code-for-Mathematical-Foundation-of-Reinforcement-Learning（由MAO Yucheng提供，2026年1月）\n\n*R:*\n\n- https:\u002F\u002Fgithub.com\u002FNewbieToEverything\u002FCode-Mathmatical-Foundation-of-Reinforcement-Learning\n\n*C++:*\n\n- https:\u002F\u002Fgithub.com\u002Fpurundong\u002Ftest_rl\n\n\n**学习笔记**\n\n*英文:*\n\n- https:\u002F\u002Flyk-love.cn\u002Ftags\u002Freinforcement-learning\u002F\n由加州大学戴维斯分校的一位研究生整理\n\n*中文:* \n\n- https:\u002F\u002Fgithub.com\u002FPeanut-Study\u002FReinforcement-Learning-Study-Note\u002Ftree\u002Fmain（2026年1月）\n  \n- https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F692207843 \n\n- https:\u002F\u002Fblog.csdn.net\u002Fqq_64671439\u002Fcategory_12540921.html\n\n- http:\u002F\u002Ft.csdnimg.cn\u002FEH4rj\n\n- https:\u002F\u002Fblog.csdn.net\u002FLvGreat\u002Farticle\u002Fdetails\u002F135454738\n\n- https:\u002F\u002Fxinzhe.blog.csdn.net\u002Farticle\u002Fdetails\u002F129452000  \n\n- https:\u002F\u002Fblog.csdn.net\u002Fv20000727\u002Farticle\u002Fdetails\u002F136870879?spm=1001.2014.3001.5502\n\n- https:\u002F\u002Fblog.csdn.net\u002Fm0_64952374\u002Fcategory_12883361.html\n\n此外，互联网上还有许多其他读者整理的学习笔记，我无法一一列出。如果你发现好的资料，欢迎推荐给我。\n\n**基于我课程制作的哔哩哔哩视频**\n\n- https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1DMBYB6Edo（2026年1月）\n\n- https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1fW421w7NH\n\n- https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Ne411m7GX\n  \n- https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1HX4y1H7uR\n  \n- https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1TgzsYDEnP\n  \n- https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1CQ4y1J7zu\n\n# 更新历史\n\n**(2025年7月) 
小幅更新：修正错别字**\n\n距离本书上次更新已近一年。在此期间，细心的读者又发现了约10处错别字，并及时告知了我。对此我深表感谢。为避免后续读者产生困惑，我已经将这些错别字逐一修正，并更新了GitHub上的PDF文件。对于印刷版的Springer出版社版本，我也准备了一份勘误表，如果您阅读的是纸质书，可根据需要查阅该表。\n\n最后，我要衷心感谢各位读者的宝贵贡献。自本书发布以来，我们收到了大量关注与反馈，这使得书中的内容得以迅速完善。虽然我认为大部分错别字现已得到修正，但若您仍发现任何需要更正之处，请随时与我联系。\n\n**(2025年6月) 1万+ 颗星！**\n\n(2024年12月) 4000+ 颗星 -> (2025年2月) 5000+ 颗星 -> (2025年3月) 7000+ 颗星! -> (2025年4月) 8000+ 颗星! -> (2025年5月) 9000+ 颗星!\n\n**(2024年10月) 书籍封面**\n\n书籍封面设计已完成。本书将于明年年初由Springer正式出版，此前已由清华大学出版社出版过。\n\n**(2024年9月) Springer印刷前的小幅更新**\n\n我对一些读者几乎难以察觉的细微之处进行了修订。这应是Springer印刷前的最终版本。\n\n**(2024年8月) 3000颗星及更多代码**\n\n本书目前已获得3000多颗星，这对我来说是一项巨大的成就。感谢所有人的支持，希望本书确实对大家有所帮助。\n\n此外，我还收到了热心读者提供的更多代码实现。例如，这个[GitHub页面](https:\u002F\u002Fgithub.com\u002FSupermanCaozh\u002FThe_Coding_Foundation_in_Reinforcement_Learning)提供了书中几乎所有示例的Python实现。一方面，看到这种情况让我非常高兴；另一方面，我也有些担心我的线下课程学生可能会直接使用这些代码完成作业:-)。总的来说，我很欣慰，因为这表明本书和公开课程确实对读者有帮助；否则，他们不会费心自己编写代码:-)\n\n**(2024年6月) 印刷前的小幅更新**\n\n这是本书的第四稿，预计将是正式出版前的最后一版。具体来说，在校对书稿时，我发现了一些非常细微的问题，加上热心读者反馈的一些问题，都在这一版中得到了修正。\n\n**(2024年4月) 网格世界环境的代码**\n\n我们在书中添加了网格世界环境的代码。感兴趣的读者可以在这个环境中开发并测试自己的算法。同时提供了Python和MATLAB两种版本。\n\n需要注意的是，我们并未提供书中涉及的所有算法代码。这是因为这些算法本身就是线下教学中的作业，学生们需要利用提供的环境自行开发算法。不过，目前已有第三方实现了部分算法，感兴趣的同学可以查看本书主页上的相关链接。\n\n在此特别感谢我的博士生米一泽和李佳楠，他们同时也是我线下课程的助教，为代码的编写做出了重要贡献。\n\n如果您在使用代码时发现任何问题或错误，欢迎随时向我们反馈。\n\n**(2024年3月) 2000颗星**\n\n本书目前已获得2000颗星。我也收到了许多读者对本书的积极评价。很高兴这本书能够对大家有所帮助。\n\n**(2024年3月) 小幅更新**\n\n本书第三稿现已上线。\n\n与第二稿相比，第三稿主要改进了部分错别字。在此，我要感谢那些向我反馈意见的读者们。\n\n**(2023年9月) 1000+ 颗星**\n\n本书获得了1000多颗星！感谢大家！\n\n**(2023年8月) 大幅更新：第二版**\n\n*本书第二稿现已上线！！*\n\n与一年前发布的第一稿相比，第二稿在多个方面都有所改进。例如，我们重新绘制了大部分图表，对部分内容进行了重新组织以使其更加清晰，修正了一些错别字，并新增了第一版中未包含的第10章。\n\n我在2022年8月首次将本书初稿发布在网上。至今，我已收到来自全球各地读者的宝贵反馈。对此，我深表感激。\n\n**(2022年11月) 将联合出版**\n\n本书将由*Springer Nature和清华大学出版社联合出版*。预计将于2023年下半年印刷发行。\n\n我曾收到一些读者对本书的意见和建议。非常感谢大家的支持与厚爱。目前我仍在收集反馈，计划在未来几个月内进一步修改书稿。您的反馈将使本书更好地服务于更多读者！\n\n**(2022年10月) 讲义与视频**\n\n*讲义幻灯片*已上传至“讲义幻灯片”文件夹。\n\n*中文版讲座视频*已上线。请访问我们的Bilibili频道 
https:\u002F\u002Fspace.bilibili.com\u002F2044042934 或YouTube频道 https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUCztGtS5YYiNv8x3pj9hLVgg\u002Fplaylists。\n\n**(2022年8月) 初稿**\n\n本书初稿现已上线。","# Book-Mathematical-Foundation-of-Reinforcement-Learning 快速上手指南\n\n本指南旨在帮助读者快速开始学习《强化学习的数学基础》（Mathematical Foundations of Reinforcement Learning）。该项目主要包含书籍内容、LaTeX 源码（针对教授开放）以及配套的讲座视频资源，而非传统的可安装软件库。\n\n## 环境准备\n\n由于本项目核心为学术书籍与视频教程，无需复杂的系统环境或编程依赖即可开始阅读和观看。\n\n*   **系统要求**：任意操作系统（Windows, macOS, Linux）。\n*   **前置依赖**：\n    *   **阅读书籍**：无需安装特定软件，推荐使用支持 PDF 阅读的浏览器或阅读器。若需编译 LaTeX 源码（仅限获授权的教授），需安装 TeX Live 或 MacTeX 及 `beamer` 宏包。\n    *   **观看视频**：需要网络连接。国内用户推荐访问 **Bilibili** 以获得更流畅的观看体验；国际用户可访问 YouTube。\n*   **知识储备**：建议具备概率论和线性代数基础知识（书中附录包含部分数学基础回顾）。\n\n## 获取与学习步骤\n\n本项目不涉及传统的 `pip` 或 `apt` 安装过程，请通过以下方式获取学习资源：\n\n### 1. 获取书籍\n访问 Springer 官方页面查看或购买书籍：\n*   **书籍链接**: [Mathematical Foundations of Reinforcement Learning](https:\u002F\u002Flink.springer.com\u002Fbook\u002F9789819739431)\n\n### 2. 获取讲座视频（推荐国内用户）\n作者提供了完整的中英文讲座视频，与国内开发者最相关的为 Bilibili 频道：\n\n*   **中文讲座视频 (Bilibili)**:\n    访问作者 Bilibili 空间获取全套课程：\n    ```text\n    https:\u002F\u002Fspace.bilibili.com\u002F2044042934\n    ```\n\n*   **英文讲座视频 (YouTube)**:\n    如需英文版本，可访问以下播放列表：\n    ```text\n    https:\u002F\u002Fyoutube.com\u002Fplaylist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8\n    ```\n\n### 3. 获取源码（仅限授权教师）\n如果您是高校教授并希望在课程中使用幻灯片内容，需直接联系作者获取 LaTeX 源码：\n*   **联系方式**: 通过作者主页邮箱联系 (https:\u002F\u002Fwww.shiyuzhao.net)\n*   **说明**: 源码使用 LaTeX\u002FBeamer 编写，未公开在仓库中供直接下载。\n\n## 基本使用（学习路径建议）\n\n本书共十章，分为“基础工具”和“算法”两部分，章节间逻辑紧密，建议按顺序学习。结合视频与书籍是最佳学习方式。\n\n### 推荐学习流程\n\n1.  **入门概览**：\n    先观看 30 分钟概览视频，建立整体概念。\n    *   [Overview of Reinforcement Learning in 30 Minutes](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ZHMWHr9811U&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=1) (Bilibili 搜同名标题)\n\n2.  
**基础概念 (对应第 1-2 章)**：\n    阅读书籍前几章，同时配合视频理解状态、动作、策略及贝尔曼方程。\n    *   视频示例：[L1: Basic Concepts](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=zJHtM5dN69g&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=2)\n    *   视频示例：[L2: Bellman Equation](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=XCzWrlgZCwc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=4)\n\n3.  **核心算法 (对应第 3-7 章)**：\n    深入学习动态规划、蒙特卡洛方法、时序差分学习 (TD Learning) 等经典算法。\n    *   重点视频：[L7: Temporal-Difference Learning](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=u1X-7XX3dtI&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=29)\n\n4.  **进阶应用 (对应第 8-10 章)**：\n    学习价值函数近似 (DQN) 和策略梯度方法。\n    *   重点视频：[L8: Value Function Approximation (DQN)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=lZCcbZbqVSQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=42)\n\n### 引用本书\n如果在研究或工作中使用了本书内容，请使用以下 BibTeX 进行引用：\n\n```bibtex\n@book{zhao2025RLBook,\n  title={Mathematical Foundations of Reinforcement Learning},\n  author={S. Zhao},\n  year={2025},\n  publisher={Springer Press}\n}\n```","某高校研究生在准备强化学习课题时，试图深入理解算法背后的数学原理，却因现有资料过于碎片化或晦涩难懂而陷入瓶颈。\n\n### 没有 Book-Mathematical-Foundation-of-Reinforcement-Learning 时\n- 只能零散地阅读各类论文和博客，知其然不知其所以然，无法理解算法设计的原始动机。\n- 面对复杂的数学推导往往望而生畏，缺乏循序渐进的引导，难以把握公式背后的直观含义。\n- 缺少统一的示例环境（如网格世界），在不同算法间切换时难以建立连贯的知识体系。\n- 自学过程中容易迷失方向，无法区分核心思想与干扰性的技术细节，导致学习效率低下。\n\n### 使用 Book-Mathematical-Foundation-of-Reinforcement-Learning 后\n- 通过书中“数学友好”的讲解，不仅掌握了算法流程，更深刻理解了其设计初衷及有效性证明。\n- 借助精心控制的数学深度和可选读的灰色框内容，能够根据自身基础灵活调整学习节奏，轻松攻克难点。\n- 全书基于统一的网格世界任务展开直观示例，使抽象概念具象化，快速建立起算法间的逻辑联系。\n- 书籍将核心思想与复杂细节剥离，帮助学习者直击本质，配合章节间严密的逻辑递进，构建了完整的知识大厦。\n\nBook-Mathematical-Foundation-of-Reinforcement-Learning 成功架起了从基础数学理论到强化学习核心算法的桥梁，让强化学习不再是黑盒探索。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMathFoundationRL_Book-Mathematical-Foundation-of-Reinforcement-Learning_f45afa53.png","MathFoundationRL","Shiyu Zhao","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FMathFoundationRL_9688dc1a.jpg",null,"Westlake 
University","Hangzhou, China","www.shiyuzhao.net","https:\u002F\u002Fgithub.com\u002FMathFoundationRL",[85,89],{"name":86,"color":87,"percentage":88},"MATLAB","#e16737",55,{"name":90,"color":91,"percentage":92},"Python","#3572A5",45,15172,1422,"2026-04-03T16:48:01","","未说明",{"notes":99,"python":97,"dependencies":100},"该项目并非可运行的 AI 软件工具，而是一本关于强化学习数学基础的书籍及其 LaTeX 幻灯片源代码。作者仅向教授提供源代码，普通用户主要阅读出版的书籍或观看配套的讲座视频（Bilibili\u002FYouTube）。书中算法示例基于网格世界（Grid World），无需特定的 GPU、内存或 Python 环境即可阅读和理解。",[],[18],[103,104,105,106,107],"reinforcement-learning","book","courses","tutorials","artificial-intelligence",8,"2026-03-27T02:49:30.150509","2026-04-06T10:24:04.154147",[],[]]