[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-PacktPublishing--Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original":3,"tool-PacktPublishing--Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude 
Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",159267,2,"2026-04-17T11:29:14",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":72,"owner_website":77,"owner_url":78,"languages":79,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":10,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":110,"github_topics":76,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":112,"updated_at":113,"faqs":114,"releases":155},8499,"PacktPublishing\u002FMachine-Learning-for-Algorithmic-Trading-Second-Edition_Original","Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original","Machine Learning for Algorithmic Trading, Second Edition - published by Packt","Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original 是 Packt 出版的《机器学习与算法交易（第二版）》的官方配套开源资源。它旨在解决金融从业者如何将复杂的机器学习理论转化为实际可盈利的交易策略这一难题，填补了从模型构建到回测评估之间的实践空白。\n\n这套资源非常适合量化分析师、数据科学家以及对算法交易感兴趣的开发者使用。内容涵盖四大板块共 23 章，通过超过 150 个可运行的 Jupyter 
Notebook，手把手演示如何获取数据、进行金融特征工程及管理投资组合。其技术亮点在于不仅讲解了线性回归等基础方法，还深入探讨了利用深度学习（如 CNN、RNN）处理市场数据，从 SEC 文件或财报电话会议记录等文本中提取交易信号，甚至利用生成对抗网络（GAN）合成数据以及通过深度强化学习训练智能交易代理。无论是希望优化现有策略的专业人士，还是想要系统学习量化金融的学生，都能从中获得从数据清洗到策略部署的全流程实战指导。","# ML for Trading - 2\u003Csup>nd\u003C\u002Fsup> Edition\n\nThis [book](https:\u002F\u002Fwww.amazon.com\u002FMachine-Learning-Algorithmic-Trading-alternative\u002Fdp\u002F1839217715?pf_rd_r=GZH2XZ35GB3BET09PCCA&pf_rd_p=c5b6893a-24f2-4a59-9d4b-aff5065c90ec&pd_rd_r=91a679c7-f069-4a6e-bdbb-a2b3f548f0c8&pd_rd_w=2B0Q0&pd_rd_wg=GMY5S&ref_=pd_gw_ci_mcx_mr_hp_d) aims to show how ML can add value to algorithmic trading strategies in a practical yet comprehensive way. It covers a broad range of ML techniques from linear regression to deep reinforcement learning and demonstrates how to build, backtest, and evaluate a trading strategy driven by model predictions.  
\n\nAcross four parts with **23 chapters plus an appendix** and **over 800 pages**, it covers:\n- important aspects of data sourcing, **financial feature engineering**, and portfolio management, \n- the design and evaluation of long-short **strategies based on supervised and unsupervised ML algorithms**,\n- how to extract tradeable signals from **financial text data** like SEC filings, earnings call transcripts or financial news,\n- how to use **deep learning** models like CNN and RNN with market and alternative data, generate synthetic data with generative adversarial networks, and train a trading agent using deep reinforcement learning\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fwww.amazon.com\u002FMachine-Learning-Algorithmic-Trading-alternative\u002Fdp\u002F1839217715?pf_rd_r=GZH2XZ35GB3BET09PCCA&pf_rd_p=c5b6893a-24f2-4a59-9d4b-aff5065c90ec&pd_rd_r=91a679c7-f069-4a6e-bdbb-a2b3f548f0c8&pd_rd_w=2B0Q0&pd_rd_wg=GMY5S&ref_=pd_gw_ci_mcx_mr_hp_d\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPacktPublishing_Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original_readme_8698c6363773.png\" width=\"75%\">\n\u003C\u002Fa>\n\u003C\u002Fp>\n\nThis repo contains **over 150 notebooks** that put the concepts, algorithms, and use cases discussed in the book into action. They provide numerous examples that show\n- how to work with and extract signals from market, fundamental and alternative text and image data, \n- how to train and tune models that predict returns for different asset classes and investment horizons, including how to replicate recently published research, and \n- how to design, backtest, and evaluate trading strategies.\n\n> We **highly recommend** reviewing the notebooks while reading the book; they are usually in an executed state and often contain additional information that the book's space constraints did not permit. 
\n\n## What's new in the 2\u003Csup>nd\u003C\u002Fsup> Edition?\n\nFirst and foremost, this [book](https:\u002F\u002Fwww.amazon.com\u002FMachine-Learning-Algorithmic-Trading-alternative\u002Fdp\u002F1839217715?pf_rd_r=VMKJPZC4N36TTZZCWATP&pf_rd_p=c5b6893a-24f2-4a59-9d4b-aff5065c90ec&pd_rd_r=8f331266-0d21-4c76-a3eb-d2e61d23bb31&pd_rd_w=kVGNF&pd_rd_wg=LYLKH&ref_=pd_gw_ci_mcx_mr_hp_d) demonstrates how you can extract signals from a diverse set of data sources and design trading strategies for different asset classes using a broad range of supervised, unsupervised, and reinforcement learning algorithms. It also provides relevant mathematical and statistical knowledge to facilitate the tuning of an algorithm or the interpretation of the results. Furthermore, it covers the financial background that will help you work with market and fundamental data, extract informative features, and manage the performance of a trading strategy.\n\nFrom a practical standpoint, the 2nd edition aims to equip you with the conceptual understanding and tools to develop your own ML-based trading strategies. To this end, it frames ML as a critical element in a process rather than a standalone exercise, introducing the end-to-end ML for trading workflow from data sourcing, feature engineering, and model optimization to strategy design and backtesting.\n\nMore specifically, the ML4T workflow starts with generating ideas for a well-defined investment universe, collecting relevant data, and extracting informative features. It also involves designing, tuning, and evaluating ML models suited to the predictive task. Finally, it requires developing trading strategies to act on the models' predictive signals, as well as simulating and evaluating their performance on historical data using a backtesting engine. 
Once you decide to execute an algorithmic strategy in a real market, you will find yourself iterating over this workflow repeatedly to incorporate new information and a changing environment.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FkcgItgp.png\" width=\"75%\">\n\u003C\u002Fp>\n\nThe [second edition](https:\u002F\u002Fwww.amazon.com\u002FMachine-Learning-Algorithmic-Trading-alternative\u002Fdp\u002F1839217715?pf_rd_r=GZH2XZ35GB3BET09PCCA&pf_rd_p=c5b6893a-24f2-4a59-9d4b-aff5065c90ec&pd_rd_r=91a679c7-f069-4a6e-bdbb-a2b3f548f0c8&pd_rd_w=2B0Q0&pd_rd_wg=GMY5S&ref_=pd_gw_ci_mcx_mr_hp_d)'s emphasis on the ML4T workflow translates into a new chapter on [strategy backtesting](08_ml4t_workflow), a new [appendix](24_alpha_factor_library) describing over 100 different alpha factors, and many new practical applications. We have also rewritten most of the existing content for clarity and readability. \n\nThe trading applications now use a broader range of data sources beyond daily US equity prices, including international stocks and ETFs. The book also demonstrates how to use ML for an intraday strategy with minute-frequency equity data. Furthermore, it extends the coverage of alternative data sources to include SEC filings for sentiment analysis and return forecasts, as well as satellite images to classify land use. \n\nAnother innovation of the second edition is to replicate several trading applications recently published in top journals: \n- [Chapter 18](18_convolutional_neural_nets) demonstrates how to apply convolutional neural networks to time series converted to image format for return predictions based on [Sezer and Ozbahoglu](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F324802031_Algorithmic_Financial_Trading_with_Deep_Convolutional_Neural_Networks_Time_Series_to_Image_Conversion_Approach) (2018). 
\n- [Chapter 20](20_autoencoders_for_conditional_risk_factors) shows how to extract risk factors conditioned on stock characteristics for asset pricing using autoencoders based on [Autoencoder Asset Pricing Models](https:\u002F\u002Fwww.aqr.com\u002FInsights\u002FResearch\u002FWorking-Paper\u002FAutoencoder-Asset-Pricing-Models) by Shihao Gu, Bryan T. Kelly, and Dacheng Xiu (2019), and \n- [Chapter 21](21_gans_for_synthetic_time_series) shows how to create synthetic training data using generative adversarial networks based on [Time-series Generative Adversarial Networks](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F8789-time-series-generative-adversarial-networks) by Jinsung Yoon, Daniel Jarrett, and Mihaela van der Schaar (2019).\n\nAll applications now use the latest available (at the time of writing) software versions such as pandas 1.0 and TensorFlow 2.2. There is also a customized version of Zipline that makes it easy to include machine learning model predictions when designing a trading strategy.\n\n## Installation and Data Sources\n\n- For instructions on using a Docker image or setting up various `conda` environments to install the packages used in the notebooks, see [here](installation\u002FREADME.md).\n- To download and preprocess many of the data sources used in this book, see [create_datasets](data\u002Fcreate_datasets.ipynb).\n\n# Chapter Summary\n\nThe [book](https:\u002F\u002Fwww.amazon.com\u002FMachine-Learning-Algorithmic-Trading-alternative\u002Fdp\u002F1839217715?pf_rd_r=GZH2XZ35GB3BET09PCCA&pf_rd_p=c5b6893a-24f2-4a59-9d4b-aff5065c90ec&pd_rd_r=91a679c7-f069-4a6e-bdbb-a2b3f548f0c8&pd_rd_w=2B0Q0&pd_rd_wg=GMY5S&ref_=pd_gw_ci_mcx_mr_hp_d) has four parts that address the different challenges that arise when sourcing and working with market, fundamental, and alternative data, developing ML solutions to various predictive tasks in the trading context, and designing and evaluating a trading strategy that relies on predictive signals generated by 
an ML model.\n\n> The directory for each chapter contains a README with additional information on content, code examples and additional resources.  \n\n[Part 1: From Data to Strategy Development](#part-1-from-data-to-strategy-development)\n* [01 Machine Learning for Trading: From Idea to Execution](#01-machine-learning-for-trading-from-idea-to-execution)\n* [02 Market & Fundamental Data: Sources and Techniques](#02-market--fundamental-data-sources-and-techniques)\n* [03 Alternative Data for Finance: Categories and Use Cases](#03-alternative-data-for-finance-categories-and-use-cases)\n* [04 Financial Feature Engineering: How to research Alpha Factors](#04-financial-feature-engineering-how-to-research-alpha-factors)\n* [05 Portfolio Optimization and Performance Evaluation](#05-portfolio-optimization-and-performance-evaluation)\n\n[Part 2: Machine Learning for Trading: Fundamentals](#part-2-machine-learning-for-trading-fundamentals)\n* [06 The Machine Learning Process](#06-the-machine-learning-process)\n* [07 Linear Models: From Risk Factors to Return Forecasts](#07-linear-models-from-risk-factors-to-return-forecasts)\n* [08 The ML4T Workflow: From Model to Strategy Backtesting](#08-the-ml4t-workflow-from-model-to-strategy-backtesting)\n* [09 Time Series Models for Volatility Forecasts and Statistical Arbitrage](#09-time-series-models-for-volatility-forecasts-and-statistical-arbitrage)\n* [10 Bayesian ML: Dynamic Sharpe Ratios and Pairs Trading](#10-bayesian-ml-dynamic-sharpe-ratios-and-pairs-trading)\n* [11 Random Forests: A Long-Short Strategy for Japanese Stocks](#11-random-forests-a-long-short-strategy-for-japanese-stocks)\n* [12 Boosting your Trading Strategy](#12-boosting-your-trading-strategy)\n* [13 Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning](#13-data-driven-risk-factors-and-asset-allocation-with-unsupervised-learning)\n\n[Part 3: Natural Language Processing for Trading](#part-3-natural-language-processing-for-trading)\n* [14 Text 
Data for Trading: Sentiment Analysis](#14-text-data-for-trading-sentiment-analysis)\n* [15 Topic Modeling: Summarizing Financial News](#15-topic-modeling-summarizing-financial-news)\n* [16 Word embeddings for Earnings Calls and SEC Filings](#16-word-embeddings-for-earnings-calls-and-sec-filings)\n\n[Part 4: Deep & Reinforcement Learning](#part-4-deep--reinforcement-learning)\n* [17 Deep Learning for Trading](#17-deep-learning-for-trading)\n* [18 CNN for Financial Time Series and Satellite Images](#18-cnn-for-financial-time-series-and-satellite-images)\n* [19 RNN for Multivariate Time Series and Sentiment Analysis](#19-rnn-for-multivariate-time-series-and-sentiment-analysis)\n* [20 Autoencoders for Conditional Risk Factors and Asset Pricing](#20-autoencoders-for-conditional-risk-factors-and-asset-pricing)\n* [21 Generative Adversarial Nets for Synthetic Time Series Data](#21-generative-adversarial-nets-for-synthetic-time-series-data)\n* [22 Deep Reinforcement Learning: Building a Trading Agent](#22-deep-reinforcement-learning-building-a-trading-agent)\n* [23 Conclusions and Next Steps](#23-conclusions-and-next-steps)\n* [24 Appendix - Alpha Factor Library](#24-appendix---alpha-factor-library)\n\n\n## Part 1: From Data to Strategy Development\n\nThe first part provides a framework for developing trading strategies driven by machine learning (ML). It focuses on the data that power the ML algorithms and strategies discussed in this book, outlines how to engineer and evaluate features suitable for ML models, and shows how to manage and measure a portfolio's performance while executing a trading strategy.\n\n### 01 Machine Learning for Trading: From Idea to Execution\n\nThis [chapter](01_machine_learning_for_trading) explores industry trends that have led to the emergence of ML as a source of competitive advantage in the investment industry. We will also look at where ML fits into the investment process to enable algorithmic trading strategies. 
\n\nMore specifically, it covers the following topics:\n- Key trends behind the rise of ML in the investment industry\n- The design and execution of a trading strategy that leverages ML\n- Popular use cases for ML in trading\n\n### 02 Market & Fundamental Data: Sources and Techniques\n\nThis [chapter](02_market_and_fundamental_data) shows how to work with market and fundamental data and describes critical aspects of the environment that they reflect. For example, familiarity with various order types and the trading infrastructure matters not only for interpreting the data but also for correctly designing backtest simulations. We also illustrate how to use Python to access and manipulate trading and financial statement data.  \n\nPractical examples demonstrate how to work with trading data from NASDAQ tick data and Algoseek minute bar data with a rich set of attributes capturing the demand-supply dynamic that we will later use for an ML-based intraday strategy. We also cover various data provider APIs and how to source financial statement information from the SEC.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FenaSo0C.png\" title=\"Order Book\" width=\"50%\"\u002F>\n\u003C\u002Fp>\nIn particular, this chapter covers:\n\n- How market data reflects the structure of the trading environment\n- Working with intraday trade and quotes data at minute frequency\n- Reconstructing the **limit order book** from tick data using NASDAQ ITCH \n- Summarizing tick data using various types of bars\n- Working with eXtensible Business Reporting Language (XBRL)-encoded **electronic filings**\n- Parsing and combining market and fundamental data to create a P\u002FE series\n- How to access various market and fundamental data sources using Python\n\n### 03 Alternative Data for Finance: Categories and Use Cases\n\nThis [chapter](03_alternative_data) outlines categories and use cases of alternative data, describes criteria to assess the exploding number 
of sources and providers, and summarizes the current market landscape. \n\nIt also demonstrates how to create alternative data sets by scraping websites, such as collecting earnings call transcripts for use with natural language processing (NLP) and sentiment analysis algorithms in the third part of the book.\n \nMore specifically, this chapter covers:\n\n- Which new sources of signals have emerged during the alternative data revolution\n- How individuals, businesses, and sensors generate a diverse set of alternative data\n- Important categories and providers of alternative data\n- Evaluating how the burgeoning supply of alternative data can be used for trading\n- Working with alternative data in Python, such as by scraping the internet\n\n### 04 Financial Feature Engineering: How to research Alpha Factors\n\nIf you are already familiar with ML, you know that feature engineering is a crucial ingredient for successful predictions. It matters at least as much in the trading domain, where academic and industry researchers have investigated for decades what drives asset markets and prices, and which features help to explain or predict price movements.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FUCu4Huo.png\" width=\"70%\">\n\u003C\u002Fp>\n\nThis [chapter](04_alpha_factor_research) outlines the key takeaways of this research as a starting point for your own quest for alpha factors. It also presents essential tools to compute and test alpha factors, highlighting how the NumPy, pandas, and TA-Lib libraries facilitate the manipulation of data, and introduces popular smoothing techniques like wavelets and the Kalman filter that help reduce noise in data. 
After reading it, you will know about:\n- Which categories of factors exist, why they work, and how to measure them,\n- Creating alpha factors using NumPy, pandas, and TA-Lib,\n- How to denoise data using wavelets and the Kalman filter,\n- Using Zipline offline and on Quantopian to test individual and multiple alpha factors,\n- How to use Alphalens to evaluate predictive performance using, among other metrics, the information coefficient.\n \n### 05 Portfolio Optimization and Performance Evaluation\n\nAlpha factors generate signals that an algorithmic strategy translates into trades, which, in turn, produce long and short positions. The returns and risk of the resulting portfolio determine whether the strategy meets the investment objectives.\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FE2h63ZB.png\" width=\"65%\">\n\u003C\u002Fp>\n\nThere are several approaches to optimize portfolios. These include the application of machine learning (ML) to learn hierarchical relationships among assets and treat them as complements or substitutes when designing the portfolio's risk profile. This [chapter](05_strategy_evaluation) covers:\n- How to measure portfolio risk and return\n- Managing portfolio weights using mean-variance optimization and alternatives\n- Using machine learning to optimize asset allocation in a portfolio context\n- Simulating trades and creating a portfolio based on alpha factors using Zipline\n- How to evaluate portfolio performance using pyfolio\n\n## Part 2: Machine Learning for Trading: Fundamentals\n\nThe second part covers the fundamental supervised and unsupervised learning algorithms and illustrates their application to trading strategies. 
It also introduces the Quantopian platform that allows you to leverage and combine the data and ML techniques developed in this book to implement algorithmic strategies that execute trades in live markets.\n\n### 06 The Machine Learning Process\n\nThis [chapter](06_machine_learning_process) kicks off Part 2 that illustrates how you can use a range of supervised and unsupervised ML models for trading. We will explain each model's assumptions and use cases before we demonstrate relevant applications using various Python libraries. \n\nThere are several aspects that many of these models and their applications have in common. This chapter covers these common aspects so that we can focus on model-specific usage in the following chapters. It sets the stage by outlining how to formulate, train, tune, and evaluate the predictive performance of ML models as a systematic workflow. The content includes:\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002F5qisClE.png\" width=\"65%\">\n\u003C\u002Fp>\n\n- How supervised and unsupervised learning from data works\n- Training and evaluating supervised learning models for regression and classification tasks\n- How the bias-variance trade-off impacts predictive performance\n- How to diagnose and address prediction errors due to overfitting\n- Using cross-validation to optimize hyperparameters with a focus on time-series data\n- Why financial data requires additional attention when testing out-of-sample\n\n### 07 Linear Models: From Risk Factors to Return Forecasts\n\nLinear models are standard tools for inference and prediction in regression and classification contexts. Numerous widely used asset pricing models rely on linear regression. Regularized models like Ridge and Lasso regression often yield better predictions by limiting the risk of overfitting. Typical regression applications identify risk factors that drive asset returns to manage risks or predict returns. 
Classification problems, on the other hand, include directional price forecasts.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002F3Ph6jma.png\" width=\"65%\">\n\u003C\u002Fp>\n\n[Chapter 07](07_linear_models) covers the following topics:\n\n- How linear regression works and which assumptions it makes\n- Training and diagnosing linear regression models\n- Using linear regression to predict stock returns\n- Use regularization to improve the predictive performance\n- How logistic regression works\n- Converting a regression into a classification problem\n\n### 08 The ML4T Workflow: From Model to Strategy Backtesting\n\nThis [chapter](08_ml4t_workflow) presents an end-to-end perspective on designing, simulating, and evaluating a trading strategy driven by an ML algorithm. \nWe will demonstrate in detail how to backtest an ML-driven strategy in a historical market context using the Python libraries [backtrader](https:\u002F\u002Fwww.backtrader.com\u002F) and [Zipline](https:\u002F\u002Fwww.zipline.io\u002Findex.html). \nThe ML4T workflow ultimately aims to gather evidence from historical data that helps decide whether to deploy a candidate strategy in a live market and put financial resources at risk. A realistic simulation of your strategy needs to faithfully represent how security markets operate and how trades execute. 
Also, several methodological aspects require attention to avoid biased results and false discoveries that will lead to poor investment decisions.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FR9O0fn3.png\" width=\"65%\">\n\u003C\u002Fp>\n\nMore specifically, after working through this chapter you will be able to:\n\n- Plan and implement end-to-end strategy backtesting\n- Understand and avoid critical pitfalls when implementing backtests\n- Discuss the advantages and disadvantages of vectorized vs event-driven backtesting engines\n- Identify and evaluate the key components of an event-driven backtester\n- Design and execute the ML4T workflow using data sources at minute and daily frequencies, with ML models trained separately or as part of the backtest\n- Use Zipline and backtrader to design and evaluate your own strategies \n\n### 09 Time Series Models for Volatility Forecasts and Statistical Arbitrage\n\nThis [chapter](09_time_series_models) focuses on models that extract signals from a time series' history to predict future values for the same time series. \nTime series models are in widespread use due to the time dimension inherent to trading. It presents tools to diagnose time series characteristics such as stationarity and extract features that capture potentially useful patterns. It also introduces univariate and multivariate time series models to forecast macro data and volatility patterns. \nFinally, it explains how cointegration identifies common trends across time series and shows how to develop a pairs trading strategy based on this crucial concept. 
\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FcglLgJ0.png\" width=\"90%\">\n\u003C\u002Fp>\n\nIn particular, it covers:\n- How to use time-series analysis to prepare and inform the modeling process\n- Estimating and diagnosing univariate autoregressive and moving-average models\n- Building autoregressive conditional heteroskedasticity (ARCH) models to predict volatility\n- How to build multivariate vector autoregressive models\n- Using cointegration to develop a pairs trading strategy\n\n### 10 Bayesian ML: Dynamic Sharpe Ratios and Pairs Trading\n\nBayesian statistics allows us to quantify uncertainty about future events and refine estimates in a principled way as new information arrives. This dynamic approach adapts well to the evolving nature of financial markets. \nBayesian approaches to ML enable new insights into the uncertainty around statistical metrics, parameter estimates, and predictions. The applications range from more granular risk management to dynamic updates of predictive models that incorporate changes in the market environment. \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FqOUPIDV.png\" width=\"80%\">\n\u003C\u002Fp>\n\nMore specifically, this [chapter](10_bayesian_machine_learning) covers: \n- How Bayesian statistics applies to machine learning\n- Probabilistic programming with PyMC3\n- Defining and training machine learning models using PyMC3\n- How to run state-of-the-art sampling methods to conduct approximate inference\n- Bayesian ML applications to compute dynamic Sharpe ratios, dynamic pairs trading hedge ratios, and estimate stochastic volatility\n\n\n### 11 Random Forests: A Long-Short Strategy for Japanese Stocks\n\nThis [chapter](11_decision_trees_random_forests) applies decision trees and random forests to trading. Decision trees learn rules from data that encode nonlinear input-output relationships. 
We show how to train a decision tree to make predictions for regression and classification problems, visualize and interpret the rules learned by the model, and tune the model's hyperparameters to optimize the bias-variance tradeoff and prevent overfitting.\n\nThe second part of the chapter introduces ensemble models that combine multiple decision trees in a randomized fashion to produce a single prediction with a lower error. It concludes with a long-short strategy for Japanese equities based on trading signals generated by a random forest model.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FS4s0rou.png\" width=\"80%\">\n\u003C\u002Fp>\n\nIn short, this chapter shows how to:\n- Use decision trees for regression and classification\n- Gain insights from decision trees and visualize the rules learned from the data\n- Understand why ensemble models tend to deliver superior results\n- Use bootstrap aggregation to address the overfitting challenges of decision trees\n- Train, tune, and interpret random forests\n- Employ a random forest to design and evaluate a profitable trading strategy\n\n\n### 12 Boosting your Trading Strategy\n\nGradient boosting is an alternative tree-based ensemble algorithm that often produces better results than random forests. The critical difference is that boosting modifies the data used to train each tree based on the cumulative errors made by the model. While random forests train many trees independently using random subsets of the data, boosting proceeds sequentially and reweights the data.\nThis [chapter](12_gradient_boosting_machines) shows how state-of-the-art libraries achieve impressive performance and applies boosting to both daily and high-frequency data to backtest an intraday trading strategy. 
\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FRe0uI0H.png\" width=\"70%\">\n\u003C\u002Fp>\n\nMore specifically, we will cover the following topics:\n- How boosting differs from bagging, and how gradient boosting evolved from adaptive boosting,\n- Designing and tuning adaptive and gradient boosting models with scikit-learn,\n- Building, optimizing, and evaluating gradient boosting models on large datasets with the state-of-the-art implementations XGBoost, LightGBM, and CatBoost,\n- Interpreting and gaining insights from gradient boosting models using [SHAP](https:\u002F\u002Fgithub.com\u002Fslundberg\u002Fshap) values, and\n- Using boosting with high-frequency data to design an intraday strategy.\n\n### 13 Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning\n\nDimensionality reduction and clustering are the main tasks for unsupervised learning: \n- Dimensionality reduction transforms the existing features into a new, smaller set while minimizing the loss of information. A broad range of algorithms exists that differ by how they measure the loss of information, whether they apply linear or non-linear transformations, or the constraints they impose on the new feature set. \n- Clustering algorithms identify and group similar observations or features instead of identifying new features. 
Algorithms differ in how they define the similarity of observations and their assumptions about the resulting groups.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FRfk7uCM.png\" width=\"70%\">\n\u003C\u002Fp>\n\nMore specifically, this [chapter](13_unsupervised_learning) covers:\n- How principal and independent component analysis (PCA and ICA) perform linear dimensionality reduction\n- Identifying data-driven risk factors and eigenportfolios from asset returns using PCA\n- Effectively visualizing nonlinear, high-dimensional data using manifold learning\n- Using t-SNE and UMAP to explore high-dimensional image data\n- How k-means, hierarchical, and density-based clustering algorithms work\n- Using agglomerative clustering to build robust portfolios with hierarchical risk parity\n\n\n## Part 3: Natural Language Processing for Trading\n\nText data are rich in content, yet unstructured in format and hence require more preprocessing so that a machine learning algorithm can extract the potential signal. The critical challenge consists of converting text into a numerical format for use by an algorithm, while simultaneously expressing the semantics or meaning of the content. \n\nThe next three chapters cover several techniques that capture language nuances readily understandable to humans so that machine learning algorithms can also interpret them.\n\n### 14 Text Data for Trading: Sentiment Analysis\n\nText data is rich in content but highly unstructured, so it requires more preprocessing to enable an ML algorithm to extract relevant information. A key challenge consists of converting text into a numerical format without losing its meaning.\nThis [chapter](14_working_with_text_data) shows how to represent documents as vectors of token counts by creating a document-term matrix that, in turn, serves as input for text classification and sentiment analysis. 
It also introduces the naive Bayes algorithm and compares its performance to linear and tree-based models.\n\nIn particular, this chapter covers:\n- What the fundamental NLP workflow looks like\n- How to build a multilingual feature extraction pipeline using spaCy and TextBlob\n- Performing NLP tasks like part-of-speech tagging or named entity recognition\n- Converting tokens to numbers using the document-term matrix\n- Classifying news using the naive Bayes model\n- How to perform sentiment analysis using different ML algorithms\n\n### 15 Topic Modeling: Summarizing Financial News\n\nThis [chapter](15_topic_modeling) uses unsupervised learning to model latent topics and extract hidden themes from documents. These themes can generate detailed insights into a large corpus of financial reports.\nTopic models automate the creation of sophisticated, interpretable text features that, in turn, can help extract trading signals from extensive collections of texts. They speed up document review, enable the clustering of similar documents, and produce annotations useful for predictive modeling.\nApplications include identifying critical themes in company disclosures, earnings call transcripts or contracts, and annotation based on sentiment analysis or using returns of related assets. 
\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FVVSnTCa.png\" width=\"60%\">\n\u003C\u002Fp>\n\n\nMore specifically, it covers:\n- How topic modeling has evolved, what it achieves, and why it matters\n- Reducing the dimensionality of the document-term matrix (DTM) using latent semantic indexing\n- Extracting topics with probabilistic latent semantic analysis (pLSA)\n- How latent Dirichlet allocation (LDA) improves pLSA to become the most popular topic model\n- Visualizing and evaluating topic modeling results\n- Running LDA using scikit-learn and gensim\n- How to apply topic modeling to collections of earnings calls and financial news articles\n\n### 16 Word embeddings for Earnings Calls and SEC Filings\n\nThis [chapter](16_word_embeddings) uses neural networks to learn a vector representation of individual semantic units like a word or a paragraph. These vectors are dense with a few hundred real-valued entries, compared to the higher-dimensional sparse vectors of the bag-of-words model. As a result, these vectors embed or locate each semantic unit in a continuous vector space.\n\nEmbeddings result from training a model to relate tokens to their context with the benefit that similar usage implies a similar vector. As a result, they encode semantic aspects like relationships among words through their relative location. 
They are powerful features that we will use with deep learning models in the following chapters.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002Fv8w9XLL.png\" width=\"80%\">\n\u003C\u002Fp>\n\nMore specifically, in this chapter, we will cover:\n- What word embeddings are and how they capture semantic information\n- How to obtain and use pre-trained word vectors\n- Which network architectures are most effective at training word2vec models\n- How to train a word2vec model using TensorFlow and gensim\n- Visualizing and evaluating the quality of word vectors\n- How to train a word2vec model on SEC filings to predict stock price moves\n- How doc2vec extends word2vec and helps with sentiment analysis\n- Why the transformer’s attention mechanism had such an impact on NLP\n- How to fine-tune pre-trained BERT models on financial data\n\n## Part 4: Deep & Reinforcement Learning\n\nPart four explains and demonstrates how to leverage deep learning for algorithmic trading. \nThe powerful capabilities of deep learning algorithms to identify patterns in unstructured data make it particularly suitable for alternative data like images and text. \n\nThe sample applications show, for example, how to combine text and price data to predict earnings surprises from SEC filings, generate synthetic time series to expand the amount of training data, and train a trading agent using deep reinforcement learning.\nSeveral of these applications replicate research recently published in top journals.\n\n### 17 Deep Learning for Trading\n\nThis [chapter](17_deep_learning) presents feedforward neural networks (NN) and demonstrates how to efficiently train large models using backpropagation while managing the risks of overfitting. 
It also shows how to use TensorFlow 2.0 and PyTorch and how to optimize a NN architecture to generate trading signals.\nIn the following chapters, we will build on this foundation to apply various architectures to different investment applications with a focus on alternative data. These include recurrent NNs tailored to sequential data like time series or natural language and convolutional NNs, which are particularly well suited to image data. We will also cover deep unsupervised learning, such as how to create synthetic data using Generative Adversarial Networks (GAN). Moreover, we will discuss reinforcement learning to train agents that interactively learn from their environment.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002F5cet0Fi.png\" width=\"70%\">\n\u003C\u002Fp>\n\nIn particular, this chapter will cover:\n- How DL solves AI challenges in complex domains\n- Key innovations that have propelled DL to its current popularity\n- How feedforward networks learn representations from data\n- Designing and training deep neural networks (NNs) in Python\n- Implementing deep NNs using Keras, TensorFlow, and PyTorch\n- Building and tuning a deep NN to predict asset returns\n- Designing and backtesting a trading strategy based on deep NN signals\n\n### 18 CNN for Financial Time Series and Satellite Images\n\nCNN architectures continue to evolve. This chapter describes building blocks common to successful applications, demonstrates how transfer learning can speed up learning, and shows how to use CNNs for object detection.\nCNNs can generate trading signals from images or time-series data. Satellite data can anticipate commodity trends via aerial images of agricultural areas, mines, or transport networks. 
Camera footage can help predict consumer activity; we show how to build a CNN that classifies economic activity in satellite images.\nCNNs can also deliver high-quality time-series classification results by exploiting their structural similarity with images, and we design a strategy based on time-series data formatted like images. \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FPlLQV0M.png\" width=\"60%\">\n\u003C\u002Fp>\n\nMore specifically, this [chapter](18_convolutional_neural_nets) covers:\n\n- How CNNs employ several building blocks to efficiently model grid-like data\n- Training, tuning and regularizing CNNs for images and time series data using TensorFlow\n- Using transfer learning to streamline CNNs, even with fewer data\n- Designing a trading strategy using return predictions by a CNN trained on time-series data formatted like images\n- How to classify economic activity based on satellite images\n\n### 19 RNN for Multivariate Time Series and Sentiment Analysis\n\nRecurrent neural networks (RNNs) compute each output as a function of the previous output and new data, effectively creating a model with memory that shares parameters across a deeper computational graph. Prominent architectures include Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) that address the challenges of learning long-range dependencies.\nRNNs are designed to map one or more input sequences to one or more output sequences and are particularly well suited to natural language. They can also be applied to univariate and multivariate time series to predict market or fundamental data. 
This chapter covers how RNNs can model alternative text data using the word embeddings that we covered in Chapter 16 to classify the sentiment expressed in documents.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FE9fOApg.png\" width=\"60%\">\n\u003C\u002Fp>\n\nMore specifically, this chapter addresses:\n- How recurrent connections allow RNNs to memorize patterns and model a hidden state\n- Unrolling and analyzing the computational graph of RNNs\n- How gated units learn to regulate RNN memory from data to enable long-range dependencies\n- Designing and training RNNs for univariate and multivariate time series in Python\n- How to learn word embeddings or use pretrained word vectors for sentiment analysis with RNNs\n- Building a bidirectional RNN to predict stock returns using custom word embeddings\n\n### 20 Autoencoders for Conditional Risk Factors and Asset Pricing\n\nThis [chapter](20_autoencoders_for_conditional_risk_factors) shows how to leverage unsupervised deep learning for trading. We also discuss autoencoders, that is, neural networks trained to reproduce the input while learning a new representation encoded by the parameters of a hidden layer. Autoencoders have long been used for nonlinear dimensionality reduction, leveraging the NN architectures we covered in the last three chapters.\nWe replicate a recent AQR paper that shows how autoencoders can underpin a trading strategy. 
We will use a deep neural network that relies on an autoencoder to extract risk factors and predict equity returns, conditioned on a range of equity attributes.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FaCmE0UD.png\" width=\"60%\">\n\u003C\u002Fp>\n\nMore specifically, in this chapter you will learn about:\n- Which types of autoencoders are of practical use and how they work\n- Building and training autoencoders using Python\n- Using autoencoders to extract data-driven risk factors that take into account asset characteristics to predict returns\n\n### 21 Generative Adversarial Nets for Synthetic Time Series Data\n\nThis chapter introduces generative adversarial networks (GAN). GANs train a generator and a discriminator network in a competitive setting so that the generator learns to produce samples that the discriminator cannot distinguish from a given class of training data. The goal is to yield a generative model capable of producing synthetic samples representative of this class.\nWhile most popular with image data, GANs have also been used to generate synthetic time-series data in the medical domain. Subsequent experiments with financial data explored whether GANs can produce alternative price trajectories useful for ML training or strategy backtests. 
We replicate the 2019 NeurIPS Time-Series GAN paper to illustrate the approach and demonstrate the results.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FW1Rp89K.png\" width=\"60%\">\n\u003C\u002Fp>\n\nMore specifically, in this chapter you will learn about:\n- How GANs work, why they are useful, and how they could be applied to trading\n- Designing and training GANs using TensorFlow 2\n- Generating synthetic financial data to expand the inputs available for training ML models and backtesting\n\n### 22 Deep Reinforcement Learning: Building a Trading Agent\n\nReinforcement Learning (RL) models goal-directed learning by an agent that interacts with a stochastic environment. RL optimizes the agent's decisions concerning a long-term objective by learning the value of states and actions from a reward signal. The ultimate goal is to derive a policy that encodes behavioral rules and maps states to actions.\nThis [chapter](22_deep_reinforcement_learning) shows how to formulate and solve an RL problem. It covers model-based and model-free methods, introduces the OpenAI Gym environment, and combines deep learning with RL to train an agent that navigates a complex environment. 
Finally, we'll show you how to adapt RL to algorithmic trading by modeling an agent that interacts with the financial market while trying to optimize an objective function.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002Flg0ofbZ.png\" width=\"60%\">\n\u003C\u002Fp>\n\nMore specifically, this chapter will show how to:\n\n- Define a Markov decision problem (MDP)\n- Use value and policy iteration to solve an MDP\n- Apply Q-learning in an environment with discrete states and actions\n- Build and train a deep Q-learning agent in a continuous environment\n- Use the OpenAI Gym to design a custom market environment and train an RL agent to trade stocks\n\n### 23 Conclusions and Next Steps\n\nIn this concluding chapter, we will briefly summarize the essential tools, applications, and lessons learned throughout the book to avoid losing sight of the big picture after so much detail.\nWe will then identify areas that we did not cover but would be worth focusing on as you expand on the many machine learning techniques we introduced and become productive in their daily use.\n\nIn sum, in this chapter, we will:\n- Review key takeaways and lessons learned\n- Point out the next steps to build on the techniques in this book\n- Suggest ways to incorporate ML into your investment process\n\n### 24 Appendix - Alpha Factor Library\n\nThroughout this book, we emphasized how the smart design of features, including appropriate preprocessing and denoising, typically leads to an effective strategy. 
This appendix synthesizes some of the lessons learned on feature engineering and provides additional information on this vital topic.\n\nTo this end, we focus on the broad range of indicators implemented by TA-Lib (see [Chapter 4](04_alpha_factor_research)) and WorldQuant's [101 Formulaic Alphas](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1601.00991.pdf) paper (Kakushadze 2016), which presents real-life quantitative trading factors used in production with an average holding period of 0.6-6.4 days.\n\nThis chapter covers: \n- How to compute several dozen technical indicators using TA-Lib and NumPy\u002Fpandas,\n- Creating the formulaic alphas described in the above paper, and\n- Evaluating the predictive quality of the results using various metrics from rank correlation and mutual information to feature importance, SHAP values and Alphalens.\n\n### Download a free PDF\n\n \u003Ci>If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.\u003Cbr>Simply click on the link to claim your free PDF.\u003C\u002Fi>\n\u003Cp align=\"center\"> \u003Ca href=\"https:\u002F\u002Fpackt.link\u002Ffree-ebook\u002F9781839217715\">https:\u002F\u002Fpackt.link\u002Ffree-ebook\u002F9781839217715 \u003C\u002Fa> \u003C\u002Fp>","\u003Cp align='center'>\u003Ca href='https:\u002F\u002Fwww.eventbrite.com\u002Fe\u002Fship-production-pytorch-system-in-a-day-train-optimize-deploy-workshop-tickets-1983348934052?aff=GitHub'>\u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPacktPublishing_Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original_readme_d11aced4566a.png'\u002F>\u003C\u002Fa>\u003C\u002Fp>\n\n\n\n---\n\n## Join Our Newsletters 📬\n\n### DataPro  \n*The future of AI is unfolding. Don't fall behind.*\n\n\u003Cp>\u003Ca href=\"https:\u002F\u002Flanding.packtpub.com\u002Fsubscribe-datapronewsletter\u002F?link_from_packtlink=yes\">\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPacktPublishing_Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original_readme_d64f4cb7d3a3.png\" alt=\"DataPro QR\" width=\"150\"\u002F>\u003C\u002Fa>\u003C\u002Fp>\n\nStay in sync with [**DataPro**](https:\u002F\u002Flanding.packtpub.com\u002Fsubscribe-datapronewsletter\u002F?link_from_packtlink=yes), a free weekly newsletter for data scientists, AI\u002FML researchers, and data engineers.  \nFrom trending tools like **PyTorch**, **scikit-learn**, **XGBoost**, and **BentoML** to hands-on insights on **database optimization** and real-world **ML workflows**, you will quickly get the information you need.\n\n> Stay sharp with [DataPro](https:\u002F\u002Flanding.packtpub.com\u002Fsubscribe-datapronewsletter\u002F?link_from_packtlink=yes). Join more than **115,000 data professionals** and never miss anything important.\n\n---\n\n### BIPro  \n*Business runs on data. Make sure yours tells the right story.*\n\n\u003Cp>\u003Ca href=\"https:\u002F\u002Flanding.packtpub.com\u002Fsubscribe-bipro-newsletter\u002F?link_from_packtlink=yes\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPacktPublishing_Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original_readme_769ac60b65af.png\" alt=\"BIPro QR\" width=\"150\"\u002F>\u003C\u002Fa>\u003C\u002Fp>\n\n[**BIPro**](https:\u002F\u002Flanding.packtpub.com\u002Fsubscribe-bipro-newsletter\u002F?link_from_packtlink=yes) is a free weekly newsletter for business intelligence professionals, analysts, and data leaders.  \nIt offers practical tips on **dashboard design**, **data visualization**, and **analytics strategy**, covering tools such as **Power BI**, **Tableau**, **Looker**, **SQL**, and **dbt**.\n\n> Get smarter with [BIPro](https:\u002F\u002Flanding.packtpub.com\u002Fsubscribe-bipro-newsletter\u002F?link_from_packtlink=yes). Trusted by more than **35,000 BI professionals**, see what you have been missing.\n\n\n\n\n# Machine Learning for Trading - 2nd Edition\n\nThis [book](https:\u002F\u002Fwww.amazon.com\u002FMachine-Learning-Algorithmic-Trading-alternative\u002Fdp\u002F1839217715?pf_rd_r=GZH2XZ35GB3BET09PCCA&pf_rd_p=c5b6893a-24f2-4a59-9d4b-aff5065c90ec&pd_rd_r=91a679c7-f069-4a6e-bdbb-a2b3f548f0c8&pd_rd_w=2B0Q0&pd_rd_wg=GMY5S&ref_=pd_gw_ci_mcx_mr_hp_d) aims to show, in a practical yet comprehensive way, how machine learning can add value to algorithmic trading strategies. It covers a broad range of ML techniques from linear regression to deep reinforcement learning and demonstrates how to build, backtest, and evaluate a trading strategy driven by model predictions.  \n\nThe book has four parts with **23 chapters plus an appendix** on more than **800 pages** and covers:\n- important aspects of data sourcing, **financial feature engineering**, and portfolio management,\n- the design and evaluation of long-short **strategies** based on supervised and unsupervised ML algorithms,\n- 
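how to extract tradeable signals from **financial text data** such as SEC filings, earnings call transcripts, and financial news,\n- using **deep learning** models like CNNs and RNNs with market and alternative data, generating synthetic data with generative adversarial networks, and training a trading agent using deep reinforcement learning.\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fwww.amazon.com\u002FMachine-Learning-Algorithmic-Trading-alternative\u002Fdp\u002F1839217715?pf_rd_r=GZH2XZ35GB3BET09PCCA&pf_rd_p=c5b6893a-24f2-4a59-9d4b-aff5065c90ec&pd_rd_r=91a679c7-f069-4a6e-bdbb-a2b3f548f0c8&pd_rd_w=2B0Q0&pd_rd_wg=GMY5S&ref_=pd_gw_ci_mcx_mr_hp_d\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPacktPublishing_Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original_readme_8698c6363773.png\" width=\"75%\">\n\u003C\u002Fa>\n\u003C\u002Fp>\n\nThis repo contains **over 150 notebooks** that put the concepts, algorithms, and use cases discussed in the book into action. They provide numerous examples that show\n- how to work with and extract signals from market, fundamental, and alternative text and image data,\n- how to train and tune models that predict returns for different asset classes and investment horizons, including how to replicate recently published research, and\n- how to design, backtest, and evaluate trading strategies.\n\n> We **highly recommend** reviewing the notebooks while reading the book; they are usually in an executed state and often contain additional information not included due to space constraints.\n\n## What's new in the 2nd Edition?\n\nFirst and foremost, this [book](https:\u002F\u002Fwww.amazon.com\u002FMachine-Learning-Algorithmic-Trading-alternative\u002Fdp\u002F1839217715?pf_rd_r=VMKJPZC4N36TTZZCWATP&pf_rd_p=c5b6893a-24f2-4a59-9d4b-aff5065c90ec&pd_rd_r=8f331266-0d21-4c76-a3eb-d2e61d23bb31&pd_rd_w=kVGNF&pd_rd_wg=LYLKH&ref_=pd_gw_ci_mcx_mr_hp_d) demonstrates how you can extract signals from a diverse set of data sources and design trading strategies for different asset classes using a broad range of supervised, unsupervised, and reinforcement learning algorithms. It also provides relevant mathematical and statistical knowledge to facilitate the tuning of an algorithm or the interpretation of the results. Furthermore, it covers the financial background that will help you work with market and fundamental data, extract informative features, and manage the performance of a trading strategy.\n\nFrom a practical standpoint, the 2nd edition aims to equip you with the conceptual understanding and tools to develop your own ML-based trading strategies. To this end, it frames ML as a critical element in a process rather than a standalone exercise, introducing the end-to-end ML for trading workflow from data sourcing, feature engineering, and model optimization to strategy design and backtesting.\n\nMore specifically, the ML4T workflow starts with generating strategy ideas for a well-defined investment universe, collecting relevant data, and extracting informative features. It then involves designing, tuning, and evaluating ML models suited to the predictive task. Finally, it requires developing trading strategies that act on the models' predictive signals, and simulating and evaluating their performance on historical data using a backtesting engine. Once you decide to execute an algorithmic strategy in a real market, you will find yourself iterating over this workflow repeatedly to incorporate new information and adapt to a changing environment.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FkcgItgp.png\" 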
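width=\"75%\">\n\u003C\u002Fp>\n\nThe [second edition](https:\u002F\u002Fwww.amazon.com\u002FMachine-Learning-Algorithmic-Trading-alternative\u002Fdp\u002F1839217715?pf_rd_r=GZH2XZ35GB3BET09PCCA&pf_rd_p=c5b6893a-24f2-4a59-9d4b-aff5065c90ec&pd_rd_r=91a679c7-f069-4a6e-bdbb-a2b3f548f0c8&pd_rd_w=2B0Q0&pd_rd_wg=GMY5S&ref_=pd_gw_ci_mcx_mr_hp_d) emphasizes the ML4T workflow, which translates into a new chapter on [strategy backtesting](08_ml4t_workflow), a new [appendix](24_alpha_factor_library) describing over 100 different alpha factors, and many new practical applications. We have also rewritten much of the existing content for clarity and readability.\n\nThe trading applications now use a broader range of data sources beyond daily US equity prices, including international stocks and ETFs. The book also demonstrates how to build an intraday strategy with minute-frequency equity data. Furthermore, it extends the coverage of alternative data sources to include SEC filings for sentiment analysis and return forecasts, as well as satellite images to classify land use.\n\nAnother innovation of the second edition is to replicate several trading applications recently published in top journals:\n- [Chapter 18](18_convolutional_neural_nets) demonstrates how to apply convolutional neural networks to time series converted to image format for return predictions, based on Sezer and Ozbahoglu (2018), [Algorithmic Financial Trading with Deep Convolutional Neural Networks: Time Series to Image Conversion Approach](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F324802031_Algorithmic_Financial_Trading_with_Deep_Convolutional_Neural_Networks_Time_Series_to_Image_Conversion_Approach).\n- [Chapter 20](20_autoencoders_for_conditional_risk_factors) shows how to extract risk factors conditioned on stock characteristics for asset pricing using autoencoders, based on [Autoencoder Asset Pricing Models](https:\u002F\u002Fwww.aqr.com\u002FInsights\u002FResearch\u002FWorking-Paper\u002FAutoencoder-Asset-Pricing-Models) by Gu, Kelly, and Xiu (2019).\n- [Chapter 21](21_gans_for_synthetic_time_series) shows how to create synthetic training data using generative adversarial networks, based on [Time-series Generative Adversarial Networks](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F8789-time-series-generative-adversarial-networks) by Yoon, Jarrett, and van der Schaar (2019).\n\nAll applications now use the latest software versions available at the time of writing, such as pandas 1.0 and TensorFlow 2.2. There is also a customized version of Zipline that makes it easy to include machine learning model predictions when designing a trading strategy.\n\n## Installation and Data Sources\n\n- For instructions on installing the packages used in the notebooks with Docker images or by setting up various conda environments, see [here](installation\u002FREADME.md).\n- To download and preprocess many of the data sources used in this book, see [create_datasets](data\u002Fcreate_datasets.ipynb).\n\n# Chapter Summary\n\nThe book has four parts that address different challenges that arise when sourcing and working with market, fundamental, and alternative data, developing ML solutions to various predictive tasks in the trading context, and designing and evaluating a trading strategy that relies on predictive signals generated by an ML model.\n\n> The directory for each chapter contains a README with additional information on content, code examples, and other resources.\n\n[Part 1: From Data to Strategy Development](#part-1-from-data-to-strategy-development)\n* [01 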
Machine Learning for Trading: From Idea to Execution](#01-machine-learning-for-trading-from-idea-to-execution)\n* [02 Market & Fundamental Data: Sources and Techniques](#02-market--fundamental-data-sources-and-techniques)\n* [03 Alternative Data for Finance: Categories and Use Cases](#03-alternative-data-for-finance-categories-and-use-cases)\n* [04 Financial Feature Engineering: How to Research Alpha Factors](#04-financial-feature-engineering-how-to-research-alpha-factors)\n* [05 Portfolio Optimization and Performance Evaluation](#05-portfolio-optimization-and-performance-evaluation)\n\n[Part 2: Machine Learning for Trading: Fundamentals](#part-2-machine-learning-for-trading-fundamentals)\n* [06 The Machine Learning Process](#06-the-machine-learning-process)\n* [07 Linear Models: From Risk Factors to Return Forecasts](#07-linear-models-from-risk-factors-to-return-forecasts)\n* [08 The ML4T Workflow: From Model to Strategy Backtesting](#08-the-ml4t-workflow-from-model-to-strategy-backtesting)\n* [09 Time Series Models for Volatility Forecasts and Statistical Arbitrage](#09-time-series-models-for-volatility-forecasts-and-statistical-arbitrage)\n* [10 Bayesian ML: Dynamic Sharpe Ratios and Pairs Trading](#10-bayesian-ml-dynamic-sharpe-ratios-and-pairs-trading)\n* [11 Random Forests: A Long-Short Strategy for Japanese Stocks](#11-random-forests-a-long-short-strategy-for-japanese-stocks)\n* [12 Boosting your Trading Strategy](#12-boosting-your-trading-strategy)\n* [13 Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning](#13-data-driven-risk-factors-and-asset-allocation-with-unsupervised-learning)\n\n[Part 3: Natural Language Processing for Trading](#part-3-natural-language-processing-for-trading)\n* [14 Text Data for Trading: Sentiment Analysis](#14-text-data-for-trading-sentiment-analysis)\n* [15 Topic Modeling: Summarizing Financial News](#15-topic-modeling-summarizing-financial-news)\n* [16 Word embeddings for Earnings Calls and SEC Filings](#16-word-embeddings-for-earnings-calls-and-sec-filings)\n\n[Part 4: Deep & Reinforcement Learning](#part-4-deep--reinforcement-learning)\n* [17 Deep Learning for Trading](#17-deep-learning-for-trading)\n* [18 CNN for Financial Time Series and Satellite Images](#18-cnn-for-financial-time-series-and-satellite-images)\n* [19 RNN for Multivariate Time Series and Sentiment Analysis](#19-rnn-for-multivariate-time-series-and-sentiment-analysis)\n* [20 Autoencoders for Conditional Risk Factors and Asset Pricing](#20-autoencoders-for-conditional-risk-factors-and-asset-pricing)\n* [21 Generative Adversarial Nets for Synthetic Time Series Data](#21-generative-adversarial-nets-for-synthetic-time-series-data)\n* [22 Deep Reinforcement Learning: Building a Trading Agent](#22-deep-reinforcement-learning-building-a-trading-agent)\n* [23 
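Conclusions and Next Steps](#23-conclusions-and-next-steps)\n* [24 Appendix - Alpha Factor Library](#24-appendix---alpha-factor-library)\n\n\n## Part 1: From Data to Strategy Development\n\nThe first part provides a framework for developing trading strategies driven by machine learning (ML). It focuses on the data that power the ML algorithms and strategies discussed in this book, outlines how to engineer and evaluate features suitable for ML models, and shows how to manage and measure a portfolio's performance while executing a trading strategy.\n\n### 01 Machine Learning for Trading: From Idea to Execution\n\nThis chapter explores industry trends that have led to the emergence of ML as a source of competitive advantage in the investment industry. We will also look at where ML fits into the investment process to enable algorithmic trading strategies.\n\nMore specifically, it covers the following topics:\n- Key trends behind the rise of ML in the investment industry\n- The design and execution of a trading strategy that leverages ML\n- Popular use cases for ML in trading\n\n### 02 Market & Fundamental Data: Sources and Techniques\n\nThis chapter shows how to work with market and fundamental data and describes critical aspects of the environment that they reflect. For example, familiarity with various order types and the trading infrastructure matters not only for the correct interpretation of the data but also for the correct design of backtest simulations. We also illustrate how to use Python to access and manipulate trading and financial statement data.\n\nPractical examples demonstrate how to work with trading data from NASDAQ tick data and Algoseek minute bar data, which offer a rich set of attributes capturing the demand-supply dynamic that we will later use for an ML-based intraday strategy. We also cover various data provider APIs and how to source financial statement information from the SEC.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FenaSo0C.png\" title=\"Limit order book\" width=\"50%\"\u002F>\n\u003C\u002Fp>\nIn particular, this chapter covers:\n\n- How market data reflects the structure of the trading environment\n- Working with intraday trade and quotes data at minute frequency\n- Reconstructing the **limit order book** from tick data using the NASDAQ ITCH protocol\n- Summarizing tick data using various types of bars\n- Working with eXtensible Business Reporting Language (XBRL)-encoded **electronic filings**\n- Parsing and combining market and fundamental data to create a P\u002FE series\n- How to access various market and fundamental data sources using Python\n\n### 03 Alternative Data for Finance: Categories and Use Cases\n\nThis [chapter](03_alternative_data) outlines categories and use cases of alternative data, describes criteria to assess the growing number of sources and providers, and summarizes the current market landscape.\n\nIt also demonstrates how to create alternative data sets by scraping websites, such as collecting earnings call transcripts for use with natural language processing (NLP) and sentiment analysis algorithms in the third part of the book.\n\nMore specifically, this chapter covers:\n\n- Which new sources of signals have emerged during the alternative data revolution\n- How individuals, businesses, and sensors generate a diverse set of alternative data\n- Important categories and providers of alternative data\n- Evaluating how the burgeoning supply of alternative data can be used for trading\n- Working with alternative data in Python, such as by scraping the internet\n\n### 04 Financial Feature Engineering: How to Research Alpha Factors\n\nIf you are already familiar with ML, you know that feature engineering is a crucial ingredient for successful predictions. It matters just as much in the trading domain, where academic and industry researchers have investigated for decades what drives asset markets and prices, and which features help to explain or predict price movements.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FUCu4Huo.png\" width=\"70%\">\n\u003C\u002Fp>\n\nThis [chapter](04_alpha_factor_research) outlines the key takeaways of this research as a starting point for your own quest for alpha factors. It also presents essential tools to compute and test alpha factors, highlighting how the NumPy, pandas, and TA-Lib libraries facilitate the manipulation of data and how popular smoothing techniques like wavelets and the Kalman filter help reduce noise in data. After reading it, you will know about:\n- Which categories of alpha factors exist, why they work, and how to measure them\n- Creating alpha factors using NumPy, pandas, and TA-Lib\n- How to de-noise data using wavelets and the Kalman filter\n- Using Zipline offline and the Quantopian platform to test individual and multiple alpha factors\n- How to use Alphalens to evaluate predictive performance using, among other metrics, the information coefficient\n\n### 05 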
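Portfolio Optimization and Performance Evaluation\n\nAlpha factors generate signals that an algorithmic strategy translates into trades, which, in turn, produce long and short positions. The returns and risk of the resulting portfolio determine whether the strategy meets the investment objectives.\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FE2h63ZB.png\" width=\"65%\">\n\u003C\u002Fp>\n\nThere are several approaches to optimize portfolios. These include the application of machine learning (ML) to learn hierarchical relationships among assets and treat them as complements or substitutes when designing the portfolio's risk profile. This [chapter](05_strategy_evaluation) covers:\n- How to measure portfolio risk and return\n- Managing portfolio weights using mean-variance optimization and alternatives\n- Using machine learning to optimize asset allocation\n- Simulating trades and creating a portfolio based on alpha factors using Zipline\n- How to evaluate portfolio performance using pyfolio\n\n## Part 2: Machine Learning for Trading: Fundamentals\n\nThe second part covers the fundamental supervised and unsupervised learning algorithms and illustrates their application to trading strategies. It also introduces the Quantopian platform that allows you to leverage and combine the data and ML techniques developed in this book to implement algorithmic strategies that execute trades in live markets.\n\n### 06 The Machine Learning Process\n\nThis [chapter](06_machine_learning_process) kicks off Part 2, which illustrates how you can use a range of supervised and unsupervised ML models for trading. We will explain each model's assumptions and use cases before we demonstrate relevant applications using various Python libraries.\n\nMany of these models and their applications have several aspects in common. This chapter covers these common aspects so that we can focus on model-specific usage in the following chapters. It sets the stage by outlining how to formulate, train, tune, and evaluate the predictive performance of ML models as a systematic workflow. The content includes:\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002F5qisClE.png\" width=\"65%\">\n\u003C\u002Fp>\n\n- How supervised and unsupervised learning from data works\n- Training and evaluating supervised learning models for regression and classification tasks\n- How the bias-variance trade-off impacts predictive performance\n- How to diagnose and address prediction errors due to overfitting\n- Using cross-validation to optimize hyperparameters with a focus on time-series data\n- Why financial data requires additional attention when testing out-of-sample\n\n### 07 Linear Models: From Risk Factors to Return Forecasts\n\nLinear models are standard tools for inference and prediction in regression and classification contexts. Numerous widely used asset pricing models rely on linear regression. Regularized models like Ridge and Lasso regression often yield better predictions by limiting the risk of overfitting. Typical regression applications identify risk factors that drive asset returns to manage risks or predict returns. Classification problems, on the other hand, include directional price forecasts.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002F3Ph6jma.png\" width=\"65%\">\n\u003C\u002Fp>\n\n[Chapter 07](07_linear_models) covers the following topics:\n\n- How linear regression works and which assumptions it makes\n- Training and diagnosing linear regression models\n- Using linear regression to predict stock returns\n- Using regularization to improve predictive performance\n- How logistic regression works\n- Converting a regression into a classification problem\n\n### 08 The ML4T Workflow: From Model to Strategy Backtesting\n\nThis [chapter](08_ml4t_workflow) presents an end-to-end perspective on designing, simulating, and evaluating a trading strategy driven by an ML algorithm. We will demonstrate in detail how to backtest an ML-driven strategy in a historical market context using the Python libraries [backtrader](https:\u002F\u002Fwww.backtrader.com\u002F) and [Zipline](https:\u002F\u002Fwww.zipline.io\u002Findex.html). The ML4T workflow ultimately aims to gather evidence from historical data that helps decide whether to deploy a candidate strategy in a live market and put financial resources at risk. A realistic simulation of your strategy needs to faithfully represent how security markets operate and how trades execute. Also, several methodological aspects require attention to avoid biased results and false discoveries that will lead to poor investment decisions.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FR9O0fn3.png\" 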
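width=\"65%\">\n\u003C\u002Fp>\n\nMore specifically, after working through this chapter you will be able to:\n\n- Plan and implement end-to-end strategy backtesting\n- Understand and avoid critical pitfalls when implementing backtests\n- Discuss the advantages and disadvantages of vectorized vs event-driven backtesting engines\n- Identify and evaluate the key components of an event-driven backtester\n- Design and execute the ML4T workflow using data sources at minute and daily frequencies, with ML models trained separately or as part of the backtest\n- Use Zipline and backtrader to design and evaluate your own strategies\n\n### 09 Time Series Models for Volatility Forecasts and Statistical Arbitrage\n\nThis [chapter](09_time_series_models) focuses on models that extract signals from a time series' history to predict future values for the same time series. Time series models are in widespread use due to the time dimension inherent to trading. The chapter presents tools to diagnose time series characteristics such as stationarity and extract features that capture potentially useful patterns. It also introduces univariate and multivariate time series models to forecast macro data and volatility patterns. Finally, it explains how cointegration identifies common trends across time series and shows how to develop a pairs trading strategy based on this crucial concept.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FcglLgJ0.png\" width=\"90%\">\n\u003C\u002Fp>\n\nIn particular, it covers:\n- How to use time-series analysis to prepare and inform the modeling process\n- Estimating and diagnosing univariate autoregressive and moving-average models\n- Building autoregressive conditional heteroskedasticity (ARCH) models to predict volatility\n- How to build multivariate vector autoregressive models\n- Using cointegration to develop a pairs trading strategy\n\n### 10 Bayesian ML: Dynamic Sharpe Ratios and Pairs Trading\n\nBayesian statistics allows us to quantify uncertainty about future events and refine estimates in a principled way as new information arrives. This dynamic approach adapts well to the evolving nature of financial markets. Bayesian approaches to ML enable new insights into the uncertainty around statistical metrics, parameter estimates, and predictions. The applications range from more granular risk management to dynamic updates of predictive models that incorporate changes in the market environment.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FqOUPIDV.png\" width=\"80%\">\n\u003C\u002Fp>\n\nMore specifically, this [chapter](10_bayesian_machine_learning) covers:\n- How Bayesian statistics applies to machine learning\n- Probabilistic programming with PyMC3\n- Defining and training machine learning models using PyMC3\n- How to run state-of-the-art sampling methods to conduct approximate inference\n- Bayesian ML applications to compute dynamic Sharpe ratios, dynamic pairs trading hedge ratios, and estimate stochastic volatility\n\n### 11 Random Forests: A Long-Short Strategy for Japanese Stocks\n\nThis [chapter](11_decision_trees_random_forests) applies decision trees and random forests to trading. Decision trees learn rules from data that encode nonlinear input-output relationships. We show how to train a decision tree to make predictions for regression and classification problems, visualize and interpret the rules learned by the model, and tune the model's hyperparameters to optimize the bias-variance tradeoff and prevent overfitting.\n\nThe second part of the chapter introduces ensemble models that combine multiple decision trees in a randomized fashion to produce a single prediction with a lower error. It concludes with a long-short strategy for Japanese equities based on trading signals generated by a random forest model.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FS4s0rou.png\" width=\"80%\">\n\u003C\u002Fp>\n\nIn short, this chapter covers:\n- Using decision trees for regression and classification\n- Gaining insights from decision trees and visualizing the rules learned from the data\n- Understanding why ensemble models tend to deliver superior results\n- Using bootstrap aggregation to address the overfitting challenges of decision trees\n- Training, tuning, and interpreting random forests\n- Employing a random forest to design and evaluate a profitable trading strategy\n\n### 12 Boosting your Trading Strategy\n\nGradient boosting is a tree-based ensemble algorithm that often produces better results than random forests. The critical difference is that boosting modifies the data used to train each tree based on the cumulative errors made by the model. While random forests train many trees independently using random subsets of the data, boosting proceeds sequentially and reweights the data. This [chapter](12_gradient_boosting_machines) shows how state-of-the-art libraries achieve impressive performance and applies boosting to both daily and high-frequency data to backtest an intraday trading strategy.\n\n\u003Cp align=\"center\">\n\u003Cimg 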
src=\"https:\u002F\u002Fi.imgur.com\u002FRe0uI0H.png\" width=\"70%\">\n\u003C\u002Fp>\n\n具体而言，我们将讨论以下主题：\n- 梯度提升与自助聚合有何不同，梯度提升又是如何从自适应提升发展而来的\n- 使用 scikit-learn 设计并调优自适应提升和梯度提升模型\n- 利用最先进的 XGBoost、LightGBM 和 CatBoost 实现，在大型数据集上构建、优化并评估梯度提升模型\n- 使用 [SHAP](https:\u002F\u002Fgithub.com\u002Fslundberg\u002Fshap) 值解释并深入理解梯度提升模型\n- 将梯度提升与高频数据结合，设计日内交易策略\n\n### 13 基于无监督学习的数据驱动风险因子与资产配置\n\n无监督学习的主要任务是降维和聚类：\n- 降维将现有特征转换为一组新的、更小的特征，同时尽量减少信息损失。现有的算法种类繁多，它们在衡量信息损失的方式、采用线性或非线性变换以及对新特征集施加的约束等方面各有不同。\n- 聚类算法不是寻找新的特征，而是识别并分组相似的观测值或特征。不同算法在定义观测值相似性的方法以及对聚类结果的假设上存在差异。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FRfk7uCM.png\" width=\"70%\">\n\u003C\u002Fp>\n\n具体而言，本章[13_unsupervised_learning]涵盖：\n- 主成分分析（PCA）和独立成分分析（ICA）如何进行线性降维\n- 利用PCA从资产收益率中识别数据驱动的风险因子和特征组合\n- 使用流形学习有效地可视化非线性高维数据\n- 使用T-SNE和UMAP探索高维图像数据\n- k-means、层次聚类和基于密度的聚类算法的工作原理\n- 利用凝聚聚类构建具有层次化风险平价的稳健投资组合\n\n\n## 第三部分：用于交易的自然语言处理\n\n文本数据内容丰富，但格式非结构化，因此需要更多的预处理，以便机器学习算法能够提取潜在信号。关键挑战在于将文本转换为算法可使用的数值形式，同时保留其语义或含义。\n\n接下来的三章将介绍几种技术，这些技术能够捕捉人类易于理解的语言细微差别，从而使机器学习算法也能加以解读。\n\n### 14 用于交易的文本数据：情感分析\n\n文本数据内容非常丰富，但高度非结构化，因此需要更多的预处理才能使机器学习算法提取相关信息。一个关键挑战是在不丢失文本意义的情况下将其转换为数值形式。\n本章[14_working_with_text_data]展示了如何通过构建文档-词项矩阵，将文档表示为词频向量，进而作为文本分类和情感分析的输入。此外，还介绍了朴素贝叶斯算法，并将其性能与线性模型和树模型进行了比较。\n\n本章特别涵盖了以下内容：\n- NLP的基本工作流程是什么\n- 如何使用spaCy和TextBlob构建多语言特征提取流水线\n- 执行词性标注或命名实体识别等NLP任务\n- 利用文档-词项矩阵将文本转换为数字\n- 使用朴素贝叶斯模型对新闻进行分类\n- 如何利用不同的机器学习算法进行情感分析\n\n### 15 主题建模：总结金融新闻\n\n本章[15_topic_modeling]利用无监督学习来建模文档中的潜在主题并提取隐藏的主旨。这些主题可以为大量金融报告提供深入洞察。\n主题模型能够自动创建复杂且可解释的文本特征，从而帮助从大量文本中提取交易信号。它们还能加快文档审查速度，实现相似文档的聚类，并生成对预测建模有用的注释。应用包括识别公司披露文件、财报电话会议记录或合同中的关键主题，以及基于情感分析或相关资产收益的注释。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FVVSnTCa.png\" width=\"60%\">\n\u003C\u002Fp>\n\n\n具体来说，本章涵盖了：\n- 主题建模的发展历程、作用及其重要性\n- 使用潜在语义索引降低文档-词项矩阵的维度\n- 利用概率潜在语义分析（pLSA）提取主题\n- 潜在狄利克雷分配（LDA）如何改进pLSA，使其成为最流行的主题模型\n- 可视化和评估主题建模的结果\n- 使用scikit-learn和gensim运行LDA\n- 如何将主题建模应用于财报电话会议记录和金融新闻文章\n\n### 16 
用于财报电话会议和SEC备案文件的词嵌入\n\n本章[16_word_embeddings]利用神经网络学习单个语义单元（如单词或段落）的向量表示。这些向量包含几百个实数值，相比词袋模型的高维稀疏向量更为稠密。因此，这些向量将每个语义单元嵌入到一个连续的向量空间中。\n\n词嵌入是通过训练模型使标记与其上下文相关而得到的，其优势在于相似的用法会对应相似的向量。因此，它们通过相对位置编码了词语之间的关系等语义信息。这些强大的特征将在后续章节中与深度学习模型结合使用。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002Fv8w9XLL.png\" width=\"80%\">\n\u003C\u002Fp>\n\n具体而言，本章将涵盖：\n- 词嵌入是什么，如何捕捉语义信息\n- 如何获取并使用预训练的词向量\n- 哪些网络架构最适合训练word2vec模型\n- 如何使用TensorFlow和gensim训练word2vec模型\n- 可视化和评估词向量的质量\n- 如何在SEC备案文件上训练word2vec模型以预测股价变动\n- doc2vec如何扩展word2vec并有助于情感分析\n- 为什么Transformer的注意力机制对NLP产生了如此深远的影响\n- 如何在金融数据上微调预训练的BERT模型\n\n## 第四部分：深度学习与强化学习\n\n第四部分解释并演示如何利用深度学习进行算法交易。\n深度学习算法在非结构化数据中识别模式的强大能力，使其特别适合处理图像和文本等另类数据。\n\n示例应用展示了如何将文本和价格数据结合起来，从SEC备案文件中预测盈利惊喜；如何生成合成时间序列以扩充训练数据量；以及如何使用深度强化学习训练交易代理。其中一些应用复制了近期发表在顶级期刊上的研究成果。\n\n### 17 深度学习在交易中的应用\n\n本章[介绍](17_deep_learning)前馈神经网络（NN），并演示如何利用反向传播高效训练大型模型，同时管理过拟合风险。此外，还将展示如何使用 TensorFlow 2.0 和 PyTorch，以及如何优化神经网络架构以生成交易信号。\n\n在接下来的几章中，我们将在此基础上，针对不同投资应用场景，特别是另类数据，应用多种神经网络架构。其中包括专为时间序列或自然语言等序列数据设计的循环神经网络，以及特别适用于图像数据的卷积神经网络。我们还将探讨深度无监督学习，例如如何使用生成对抗网络（GAN）创建合成数据。此外，还将讨论强化学习，用于训练能够与环境交互式学习的智能体。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002F5cet0Fi.png\" width=\"70%\">\n\u003C\u002Fp>\n\n具体而言，本章将涵盖：\n- 深度学习如何解决复杂领域中的人工智能挑战\n- 推动深度学习如今广受欢迎的关键创新\n- 前馈网络如何从数据中学习表示\n- 使用 Python 设计和训练深度神经网络（NN）\n- 利用 Keras、TensorFlow 和 PyTorch 实现深度神经网络\n- 构建并调优深度神经网络以预测资产收益\n- 基于深度神经网络信号设计并回测交易策略\n\n### 18 用于金融时间序列和卫星图像的卷积神经网络\n\n卷积神经网络架构仍在不断发展。本章描述了成功应用中常见的构建模块，展示了迁移学习如何加速学习过程，以及如何使用卷积神经网络进行目标检测。\n\n卷积神经网络可以从图像或时间序列数据中生成交易信号。卫星数据可通过农业区、矿山或交通网络的航拍图像来预测大宗商品趋势。摄像头视频可以帮助预测消费者活动；我们将展示如何构建一个能够对卫星图像中的经济活动进行分类的卷积神经网络。\n\n此外，卷积神经网络还能通过利用其与图像的结构相似性，提供高质量的时间序列分类结果，并且我们会设计一种基于以图像格式呈现的时间序列数据的策略。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FPlLQV0M.png\" width=\"60%\">\n\u003C\u002Fp>\n\n更具体地说，本章[介绍](18_convolutional_neural_nets)的内容包括：\n\n- 卷积神经网络如何利用多个构建模块高效地建模网格状数据\n- 使用 TensorFlow 
训练、调优和正则化用于图像和时间序列数据的卷积神经网络\n- 利用迁移学习简化卷积神经网络的开发流程，即使在数据较少的情况下也能实现\n- 设计一种基于卷积神经网络对以图像格式呈现的时间序列数据进行收益预测的交易策略\n- 如何根据卫星图像对经济活动进行分类\n\n### 19 用于多元时间序列和情感分析的循环神经网络\n\n循环神经网络（RNN）将每个输出计算为前一输出和新数据的函数，从而有效地创建了一个具有记忆功能、并在更深的计算图中共享参数的模型。其中较为著名的架构包括长短期记忆网络（LSTM）和门控循环单元（GRU），它们旨在解决学习长距离依赖关系的难题。\n\n循环神经网络的设计目的是将一个或多个输入序列映射到一个或多个输出序列，尤其适合处理自然语言。它们也可应用于单变量和多变量时间序列，以预测市场或基本面数据。本章将介绍 RNN 如何利用我们在第 16 章中讨论的词嵌入技术，对替代文本数据进行建模，从而对文档中表达的情感进行分类。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FE9fOApg.png\" width=\"60%\">\n\u003C\u002Fp>\n\n更具体而言，本章将探讨：\n- 循环连接如何使 RNN 能够记忆模式并建模隐藏状态\n- 展开并分析 RNN 的计算图\n- 门控单元如何从数据中学习调节 RNN 内存，以实现长距离依赖\n- 在 Python 中设计和训练用于单变量和多变量时间序列的 RNN\n- 如何学习词嵌入或使用预训练的词向量进行 RNN 情感分析\n- 构建双向 RNN，利用自定义词嵌入预测股票收益\n\n### 20 用于条件风险因子和资产定价的自编码器\n\n本章[介绍](20_autoencoders_for_conditional_risk_factors)如何将无监督深度学习应用于交易。我们还将讨论自编码器，即一种经过训练能够在学习隐藏层参数所编码的新表示的同时重现输入的神经网络。自编码器长期以来一直被用于非线性降维，其基础正是我们在过去三章中介绍过的神经网络架构。\n\n我们将复现一篇近期的 AQR 论文，该论文展示了自编码器如何支撑交易策略。我们将使用一个基于自编码器的深度神经网络，提取风险因子并预测股票收益，这些预测会根据一系列股票属性进行条件化。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FaCmE0UD.png\" width=\"60%\">\n\u003C\u002Fp>\n\n更具体地说，在本章中您将了解到：\n- 哪些类型的自编码器具有实际用途及其工作原理\n- 使用 Python 构建和训练自编码器\n- 如何利用自编码器提取数据驱动的风险因子，同时考虑资产特征以预测收益\n\n### 21 用于生成合成时间序列数据的生成对抗网络\n\n本章介绍生成对抗网络（GAN）。GAN 在竞争环境中同时训练生成器和判别器网络，使生成器学会生成能够欺骗判别器、使其无法区分与给定训练数据类别样本的样本。其目标是生成一个能够产出代表该类别的合成样本的生成模型。\n\n尽管 GAN 最常用于图像数据，但它们也被用于在医疗领域生成合成时间序列数据。随后的金融数据实验则探讨了 GAN 是否能生成对机器学习训练或策略回测有用的替代价格轨迹。我们将复现 2019 年 NeurIPS 时间序列 GAN 论文，以说明这一方法并展示相关结果。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FW1Rp89K.png\" width=\"60%\">\n\u003C\u002Fp>\n\n更具体地说，在本章中您将了解到：\n- GAN 的工作原理、其优势以及如何将其应用于交易\n- 使用 TensorFlow 2 设计和训练 GAN\n- 生成合成金融数据，以扩展可用于训练机器学习模型和回测的数据来源\n\n### 22 深度强化学习：构建交易智能体\n\n强化学习（RL）通过智能体与随机环境的交互来实现目标导向的学习。RL通过从奖励信号中学习状态和动作的价值，优化智能体针对长期目标的决策过程。最终目标是推导出一种策略，该策略编码行为规则，并将状态映射到动作。\n\n本[章](22_deep_reinforcement_learning)展示了如何构建并解决一个强化学习问题。它涵盖了基于模型和无模型的方法，介绍了OpenAI 
Gym环境，并将深度学习与强化学习相结合，训练一个能够在复杂环境中导航的智能体。最后，我们将展示如何通过建模一个与金融市场交互并试图优化目标函数的智能体，将强化学习应用于算法交易。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002Flg0ofbZ.png\" width=\"60%\">\n\u003C\u002Fp>\n\n具体而言，本章将涵盖：\n\n- 定义马尔可夫决策问题（MDP）\n- 使用值迭代和策略迭代求解MDP\n- 在具有离散状态和动作的环境中应用Q学习\n- 构建并在连续环境中训练深度Q学习智能体\n- 利用OpenAI Gym设计自定义市场环境，并训练一个RL智能体进行股票交易\n\n### 23 结论与下一步\n\n在本总结性章节中，我们将简要回顾全书中的关键工具、应用及所学经验，以帮助读者在阅读大量细节之后仍能把握全局。随后，我们将指出一些未涉及但值得进一步关注的领域，以便您在扩展我们介绍的多种机器学习技术并将其有效应用于日常工作中时有所参考。\n\n总之，在本章中，我们将：\n- 回顾主要收获与经验教训\n- 指明基于本书所介绍技术的下一步发展方向\n- 提供建议，指导如何将机器学习融入您的投资流程\n\n### 24 附录——阿尔法因子库\n\n在整本书中，我们一直强调，特征的合理设计，包括适当的预处理和去噪，通常能够带来有效的策略。本附录综合了我们在特征工程方面的一些经验，并就这一重要主题提供了更多相关信息。\n\n为此，我们重点研究了TA-Lib库中实现的广泛指标（参见[第4章](04_alpha_factor_research)），以及WorldQuant于2016年发表的论文《101种公式化阿尔法》（Kakushadze, 2016），该论文介绍了实际生产环境中使用的量化交易因子，其平均持有期为0.6至6.4天。\n\n本章内容包括：\n- 如何使用TA-Lib以及NumPy\u002Fpandas计算数十种技术指标\n- 构建上述论文中描述的公式化阿尔法因子\n- 采用秩相关、互信息、特征重要性、SHAP值和Alphalens等多种指标评估结果的预测能力\n\n### 免费下载PDF\n\n\u003Ci>如果您已购买本书的纸质版或Kindle电子版，即可免费获得无DRM限制的PDF版本。\u003Cbr>只需点击链接即可领取您的免费PDF。\u003C\u002Fi>\n\u003Cp align=\"center\"> \u003Ca href=\"https:\u002F\u002Fpackt.link\u002Ffree-ebook\u002F9781839217715\">https:\u002F\u002Fpackt.link\u002Ffree-ebook\u002F9781839217715 \u003C\u002Fa> \u003C\u002Fp>","# Machine-Learning-for-Algorithmic-Trading (第二版) 快速上手指南\n\n本指南基于《Machine Learning for Algorithmic Trading - Second Edition》配套开源代码库，帮助开发者快速搭建环境并运行量化交易机器学习示例。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux, macOS 或 Windows (推荐 WSL2)\n*   **Python 版本**: Python 3.7+ (书中示例基于 pandas 1.0 和 TensorFlow 2.2 编写，建议使用 Python 3.8 以获得最佳兼容性)\n*   **包管理工具**: `conda` (强烈推荐，用于管理复杂的数据科学依赖) 或 `pip`\n*   **容器工具 (可选)**: Docker (如需使用预构建镜像)\n*   **内存**: 建议至少 8GB RAM，处理深度学习模型或大规模回测时建议 16GB+\n\n## 安装步骤\n\n本项目包含大量依赖项，最稳妥的方式是使用 `conda` 创建隔离环境或使用官方提供的 Docker 镜像。\n\n### 方案一：使用 Conda 环境（推荐）\n\n1.  
**克隆仓库**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fstefan-jansen\u002Fmachine-learning-for-trading.git\n    cd machine-learning-for-trading\n    ```\n\n2.  **查看安装说明**\n    详细的各章节环境配置说明位于 `installation` 目录中。\n    ```bash\n    cat installation\u002FREADME.md\n    ```\n\n3.  **创建并激活环境**\n    根据仓库指引，通常可以使用提供的 `.yml` 文件创建环境。如果根目录有 `environment.yml`，执行：\n    ```bash\n    conda env create -f environment.yml\n    conda activate ml4t\n    ```\n    *注：若需在国内加速下载，请在运行上述命令前配置 conda 国内源：*\n    ```bash\n    conda config --add channels https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fpkgs\u002Fmain\u002F\n    conda config --add channels https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fpkgs\u002Ffree\u002F\n    conda config --set show_channel_urls yes\n    ```\n\n4.  **安装自定义回测引擎**\n    本书使用了定制版的 `zipline` 以支持机器学习预测信号，请确保按照 `installation\u002FREADME.md` 中的指示进行额外安装。\n\n### 方案二：使用 Docker（最简便）\n\n如果您希望避免繁琐的依赖冲突，可以直接拉取预构建的 Docker 镜像（如果作者已发布）或根据 `Dockerfile` 自行构建：\n\n```bash\ndocker build -t ml4t-second-edition .\ndocker run -it -p 8888:8888 ml4t-second-edition jupyter notebook --ip=0.0.0.0 --no-browser --allow-root\n```\n\n### 数据准备\n\n代码库不包含原始金融数据，您需要运行脚本下载并预处理数据：\n\n```bash\n# 激活环境后运行\njupyter nbconvert --to notebook --execute data\u002Fcreate_datasets.ipynb\n```\n*注意：部分数据源可能需要注册 API Key 或手动下载，请参考 `data` 目录下的具体 Notebook 说明。*\n\n## 基本使用\n\n本项目核心由 **150+ 个 Jupyter Notebooks** 组成，每个章节对应一个目录，涵盖了从数据获取、特征工程到策略回测的全流程。\n\n### 1. 启动 Jupyter Lab\n进入项目根目录并启动服务：\n```bash\njupyter lab\n```\n\n### 2. 运行第一个示例\n建议从 **Part 1** 开始，了解完整的工作流。\n\n*   **路径**: 打开 `01_machine_learning_for_trading\u002F` 目录。\n*   **操作**: 点击运行该目录下的 Notebook 文件（通常为 `01_...ipynb`）。\n*   **内容**: 该示例将演示如何加载市场数据、进行简单的特征工程并查看数据分布。\n\n### 3. 复现书中的策略\n以第 11 章“随机森林多空策略”为例：\n\n1.  导航至 `11_decision_trees_random_forests\u002F` 目录。\n2.  打开对应的 Notebook。\n3.  
按顺序执行单元格。代码将自动：\n    *   加载日本股票数据。\n    *   训练 Random Forest 模型预测收益。\n    *   构建多空投资组合。\n    *   使用定制版 Zipline 进行回测并输出绩效指标。\n\n> **提示**: 书中提到的所有图表和结果均可在这些已执行状态的 Notebook 中找到，且往往包含比纸质书更详细的代码注释和中间变量分析。建议边读书边对照运行相应章节的 Notebook。","某量化对冲基金的研究员正试图构建一个融合财报文本情绪与历史量价数据的自动化交易策略，以捕捉短期市场错误定价。\n\n### 没有 Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original 时\n- **数据清洗无从下手**：面对 SEC filings（美国证券交易委员会文件）和电话会议记录等非结构化文本，缺乏成熟的预处理流程，难以提取有效的金融特征。\n- **模型选型盲目试错**：在从线性回归到深度强化学习的众多算法中迷失方向，不清楚如何针对特定资产类别和投资周期选择合适的监督或无监督学习模型。\n- **回测系统脆弱**：自行搭建的回测框架常忽略未来函数和数据泄露问题，导致策略在模拟盘表现优异，实盘却大幅亏损。\n- **合成数据缺失**：面对极端市场行情样本不足的问题，无法利用生成对抗网络（GANs）生成高质量的合成数据进行压力测试。\n\n### 使用 Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original 后\n- **文本信号高效提取**：直接复用书中提供的 150+ 个笔记本代码，快速实现从财经新闻和财报中提取可交易信号的自然语言处理流水线。\n- **策略构建有章可循**：依据书中涵盖的 23 章实战指南，精准设计基于 CNN、RNN 及深度强化学习的长短线策略，显著降低模型调试成本。\n- **回测评估严谨可靠**：采用书中演示的专业回测架构，有效规避常见陷阱，确保策略在多种市场环境下的稳健性与泛化能力。\n- **数据增强能力提升**：利用书中关于 GANs 的实战案例生成合成市场数据，弥补了罕见极端行情下的训练样本缺口，提升了模型鲁棒性。\n\nMachine-Learning-for-Algorithmic-Trading-Second-Edition_Original 将复杂的机器学习理论转化为可执行的代码模板，帮助团队在短时间内完成了从数据源处理到策略实盘部署的全流程闭环。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPacktPublishing_Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original_8698c636.png","PacktPublishing","Packt","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FPacktPublishing_6d855195.jpg","Providing books, eBooks, video tutorials, and articles for IT developers, administrators, and users.",null,"https:\u002F\u002Fwww.packtpub.com","https:\u002F\u002Fgithub.com\u002FPacktPublishing",[80,84,88,91],{"name":81,"color":82,"percentage":83},"Jupyter Notebook","#DA5B0B",100,{"name":85,"color":86,"percentage":87},"Python","#3572A5",0,{"name":89,"color":90,"percentage":87},"Erlang","#B83998",{"name":92,"color":93,"percentage":87},"Shell","#89e051",1562,580,"2026-04-16T18:10:31","MIT","未说明 (支持 Docker 及 Conda 环境，通常兼容 Linux\u002FmacOS\u002FWindows)","未明确必需 (涉及深度学习章节如 CNN\u002FRNN\u002FGAN\u002F强化学习，建议使用 
NVIDIA GPU)","未说明 (处理金融时间序列及替代数据，建议 16GB+)",{"notes":102,"python":103,"dependencies":104},"本项目为《Machine Learning for Algorithmic Trading》第二版的配套代码库，包含 150+ 个 Notebook。安装细节（Docker 镜像或 Conda 环境配置）请参阅 installation\u002FREADME.md。数据下载与预处理脚本位于 data\u002Fcreate_datasets.ipynb。内容涵盖从线性回归到深度强化学习的多种策略，部分章节复现了最新学术论文（如 CNN、Autoencoder、GAN 在量化中的应用）。","未说明 (书中示例基于 pandas 1.0 和 TensorFlow 2.2，推测需 Python 3.7+)",[105,106,107,108,109],"pandas>=1.0","tensorflow>=2.2","zipline (定制版)","scikit-learn","numpy",[14,111],"其他","2026-03-27T02:49:30.150509","2026-04-18T00:37:21.090100",[115,120,125,130,135,140,145,150],{"id":116,"question_zh":117,"answer_zh":118,"source_url":119},38046,"运行代码时出现 KeyError: 'No object named quandl\u002Fwiki\u002Fprices in the file'，无法加载 assets.h5 数据，如何解决？","这是因为 `create_dataset` notebook 中缺少添加元数据的步骤。你需要在该 notebook 中添加以下代码来生成完整的 `assets.h5` 文件：\n```python\ndf = pd.read_csv('us_equities_meta_data.csv')\nwith pd.HDFStore(DATA_STORE) as store:\n    store.put('us_equities\u002Fstocks', df)\n```\n此外，请确保你已按照官方主仓库的说明创建了数据集。","https:\u002F\u002Fgithub.com\u002FPacktPublishing\u002FMachine-Learning-for-Algorithmic-Trading-Second-Edition_Original\u002Fissues\u002F1",{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},38047,"执行机器学习工作流代码时报错：AttributeError: 'DataFrame' object has no attribute 'r'，如何修复？","这是因为生成的 DataFrame 需要先进行转置并重命名列。请使用以下修正后的代码：\n```python\ncorrel = (X\n          .apply(lambda x: spearmanr(x, y))\n          .apply(pd.Series)\n         ).transpose()\n\ncorrel = correl.rename(columns={0: \"r\", 1: \"pval\"})\n```\n这样即可正确访问 `r` 和 `pval` 列。","https:\u002F\u002Fgithub.com\u002FPacktPublishing\u002FMachine-Learning-for-Algorithmic-Trading-Second-Edition_Original\u002Fissues\u002F9",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},38048,"从 NASDAQ 下载公司列表数据时程序卡住（hang）或无响应，原因是什么？","NASDAQ 已经禁用了自动下载功能，因此原有的 URL 不再有效。这是数据源端的变更，代码中的 `pd.read_csv` 
请求会一直等待直到超时。建议手动查找替代数据源或查看主仓库是否有更新的数据获取方式。","https:\u002F\u002Fgithub.com\u002FPacktPublishing\u002FMachine-Learning-for-Algorithmic-Trading-Second-Edition_Original\u002Fissues\u002F7",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},38049,"在 Docker 中运行 zipline ingest 时出现 PermissionError: [Errno 13] Permission denied 错误，如何解决？","如果你是在 Windows 上运行 Docker，需要授予 Docker 访问主机驱动器的权限。请参考相关教程配置 Docker 的文件共享设置（例如在 Docker Desktop 设置中添加挂载目录）。同时，建议直接使用源代码仓库而非 Packt 的克隆版本，因为后者可能未及时更新。","https:\u002F\u002Fgithub.com\u002FPacktPublishing\u002FMachine-Learning-for-Algorithmic-Trading-Second-Edition_Original\u002Fissues\u002F5",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},38050,"下载 ITCH 订单流数据时出现 URLError 或 FTP 超时错误，如何获取数据？","原有的 FTP 链接可能已失效或超时。你可以手动从 NASDAQ 官网下载数据，或者将代码中的 FTP_URL 替换为以下 HTTPS 地址：\n`https:\u002F\u002Femi.nasdaq.com\u002FITCH\u002FNasdaq%20ITCH\u002F`\n替换后通常可以正常下载。","https:\u002F\u002Fgithub.com\u002FPacktPublishing\u002FMachine-Learning-for-Algorithmic-Trading-Second-Edition_Original\u002Fissues\u002F17",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},38051,"环境安装非常困难且报错，data 目录不存在，应该如何开始？","不需要一次性安装书中所有包。建议创建一个虚拟环境，仅安装当前要运行的笔记本所需的少量包。如果遇到具体问题，请提供详细错误信息以便排查。此外，请务必使用官方主仓库（stefan-jansen\u002Fmachine-learning-for-trading）提出问题，Packt 的镜像仓库可能已过时。","https:\u002F\u002Fgithub.com\u002FPacktPublishing\u002FMachine-Learning-for-Algorithmic-Trading-Second-Edition_Original\u002Fissues\u002F14",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},38052,"仓库中找不到 AlgoSeek 数据的链接或指南，在哪里可以找到？","相关说明已移至官方主仓库。请查阅主仓库（https:\u002F\u002Fgithub.com\u002Fstefan-jansen\u002Fmachine-learning-for-trading）的 README 文件以获取最新的 AlgoSeek 数据下载和使用指南。","https:\u002F\u002Fgithub.com\u002FPacktPublishing\u002FMachine-Learning-for-Algorithmic-Trading-Second-Edition_Original\u002Fissues\u002F3",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},38053,"项目中找不到 macOS 的环境配置文件（environments\u002Fmacos），只有 linux 目录，如何在 Mac 上安装？","请遵循官方主仓库提供的安装说明。通常 Linux 
的环境配置在 macOS 上也适用，或者主仓库中已有针对 macOS 的最新指导。建议直接参考主仓库文档而非此镜像仓库。","https:\u002F\u002Fgithub.com\u002FPacktPublishing\u002FMachine-Learning-for-Algorithmic-Trading-Second-Edition_Original\u002Fissues\u002F13",[]]