[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-EpistasisLab--tpot":3,"tool-EpistasisLab--tpot":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth 
Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,52],"视频",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[14,35],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":76,"owner_website":78,"owner_url":79,"languages":80,"stars":89,"forks":90,"last_commit_at":91,"license":92,"difficulty_score":32,"env_os":93,"env_gpu":94,"env_ram":94,"env_deps":95,"category_tags":109,"github_topics":111,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":131,"updated_at":132,"faqs":133,"releases":162},4249,"EpistasisLab\u002Ftpot","tpot","A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.","TPOT（Tree-based Pipeline Optimization Tool）是一款基于 Python 的自动化机器学习工具，旨在充当数据科学家的智能助手。它利用遗传编程算法，自动搜索并优化从数据预处理、特征选择到模型构建与参数调优的完整机器学习流程，最终输出性能最佳的代码管道。\n\n在传统机器学习中，构建高效模型往往需要专家耗费大量时间尝试不同的算法组合与参数配置。TPOT 
有效解决了这一痛点，将繁琐的试错过程自动化，帮助用户快速获得高质量的预测模型，显著降低了对深厚领域经验的依赖。\n\n这款工具特别适合数据科学家、机器学习研究人员以及希望提升建模效率的开发者使用。无论是处理复杂的科研数据还是构建商业预测系统，TPOT 都能提供强有力的支持。\n\n近期，TPOT 经历了重大重构，融合了原\"TPOT2\"的核心特性。其技术亮点包括：采用基于图的新架构以提升效率，支持多目标优化，具备更灵活的搜索空间定义能力，以及引入了遗传特征选择机制。这些升级使得 TPOT 在保持易用性的同时，能够应对更复杂的数据挑战，为用户提供更加模块化且可定制的自动化解决方案。","# TPOT\n\n\u003Ccenter>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEpistasisLab_tpot_readme_5c5dda92fbc2.jpg\" width=300 \u002F>\n\u003C\u002Fcenter>\n\n\u003Cbr>\n\n![Tests](https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Factions\u002Fworkflows\u002Ftests.yml\u002Fbadge.svg)\n[![PyPI Downloads](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Ftpot?label=pypi%20downloads)](https:\u002F\u002Fpypi.org\u002Fproject\u002FTPOT)\n[![Conda Downloads](https:\u002F\u002Fimg.shields.io\u002Fconda\u002Fdn\u002Fconda-forge\u002Ftpot?label=conda%20downloads)](https:\u002F\u002Fanaconda.org\u002Fconda-forge\u002Ftpot)\n\nTPOT stands for Tree-based Pipeline Optimization Tool. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. Consider TPOT your Data Science Assistant.\n\n## Contributors\n\nTPOT recently went through a major refactoring. The package was rewritten from scratch to improve efficiency and performance, support new features, and fix numerous bugs. New features include genetic feature selection, a significantly expanded and more flexible method of defining search spaces, multi-objective optimization, a more modular framework allowing for easier customization of the evolutionary algorithm, and more. While in development, this new version was referred to as \"TPOT2\" but we have now merged what was once TPOT2 into the main TPOT package. You can learn more about this new version of TPOT in our GPTP paper titled \"TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning.\"\n\n    Ribeiro, P. et al. (2024). 
TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning. In: Winkler, S., Trujillo, L., Ofria, C., Hu, T. (eds) Genetic Programming Theory and Practice XX. Genetic and Evolutionary Computation. Springer, Singapore. https:\u002F\u002Fdoi.org\u002F10.1007\u002F978-981-99-8413-8_1\n\nThe current version of TPOT was developed at Cedars-Sinai by:  \n    - Pedro Henrique Ribeiro (Lead developer - https:\u002F\u002Fgithub.com\u002Fperib, https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fpedro-ribeiro\u002F)  \n    - Anil Saini (anil.saini@cshs.org)  \n    - Jose Hernandez (jgh9094@gmail.com)  \n    - Jay Moran (jay.moran@cshs.org)  \n    - Nicholas Matsumoto (nicholas.matsumoto@cshs.org)  \n    - Hyunjun Choi (hyunjun.choi@cshs.org)  \n    - Gabriel Ketron (gabriel.ketron@cshs.org)  \n    - Miguel E. Hernandez (miguel.e.hernandez@cshs.org)  \n    - Jason Moore (moorejh28@gmail.com)  \n\nThe original version of TPOT was primarily developed at the University of Pennsylvania by:  \n    - Randal S. Olson (rso@randalolson.com)  \n    - Weixuan Fu (weixuanf@upenn.edu)  \n    - Daniel Angell (dpa34@drexel.edu)  \n    - Jason Moore (moorejh28@gmail.com)  \n    - and many more generous open-source contributors  \n\n## License\n\nPlease see the [repository license](https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fblob\u002Fmain\u002FLICENSE) for the licensing and usage information for TPOT.\nGenerally, we have licensed TPOT to make it as widely usable as possible.\n\nTPOT is free software: you can redistribute it and\u002For modify\nit under the terms of the GNU Lesser General Public License as\npublished by the Free Software Foundation, either version 3 of\nthe License, or (at your option) any later version.\n\nTPOT is distributed in the hope that it will be useful,\nbut WITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the\nGNU Lesser General Public License for more details.\n\nYou should have received a copy of the GNU Lesser General Public\nLicense along with TPOT. If not, see \u003Chttp:\u002F\u002Fwww.gnu.org\u002Flicenses\u002F>.\n\n## Documentation\n\n[The documentation webpage can be found here.](https:\u002F\u002Fepistasislab.github.io\u002Ftpot\u002F)\n\nWe also recommend looking at the Tutorials folder for jupyter notebooks with examples and guides.\n\n## Installation\n\nTPOT requires a working installation of Python.\n\n### Creating a conda environment (optional)\n\nWe recommend using conda environments for installing TPOT, though it would work equally well if manually installed without it.\n\n[More information on making anaconda environments found here.](https:\u002F\u002Fconda.io\u002Fprojects\u002Fconda\u002Fen\u002Flatest\u002Fuser-guide\u002Ftasks\u002Fmanage-environments.html)\n\n```\nconda create --name tpotenv python=3.10\nconda activate tpotenv\n```\n\n### Packages Used\n\npython version >=3.10, \u003C3.14\nnumpy\nscipy\nscikit-learn\nupdate_checker\ntqdm\nstopit\npandas\njoblib\nxgboost\nmatplotlib\ntraitlets\nlightgbm\noptuna\njupyter\nnetworkx\ndask\ndistributed\ndask-ml\ndask-jobqueue\nfunc_timeout\nconfigspace\n\nMany of the hyperparameter ranges used in our configspaces were adapted from either the original TPOT package or the AutoSklearn package. \n\n### Note for M1 Mac or other Arm-based CPU users\n\nYou need to install the lightgbm package directly from conda using the following command before installing TPOT. \n\nThis is to ensure that you get the version that is compatible with your system.\n\n```\nconda install --yes -c conda-forge 'lightgbm>=3.3.3'\n```\n\n### Installing Extra Features with pip\n\nIf you want to utilize the additional features provided by TPOT along with `scikit-learn` extensions, you can install them using `pip`. 
The command to install TPOT with these extra features is as follows:\n\n```\npip install tpot[sklearnex]\n```\n\nPlease note that while these extensions can speed up scikit-learn packages, there are some important considerations:\n\nThese extensions may not be fully developed and tested on Arm-based CPUs, such as M1 Macs. You might encounter compatibility issues or reduced performance on such systems.\n\nWe recommend using Python 3.9 when installing these extra features, as it provides better compatibility and stability.\n\n\n### Developer\u002FLatest Branch Installation\n\n\n```\npip install -e \u002Fpath\u002Fto\u002Ftpotrepo\n```\n\nIf you downloaded with git pull, then the repository folder will be named TPOT. (Note: this folder is the one that includes setup.py inside of it and not the folder of the same name inside it).\nIf you downloaded as a zip, the folder may be called tpot-main. \n\n\n## Usage \n\nSee the Tutorials Folder for more instructions and examples.\n\n### Best Practices\n\n#### 1 \nTPOT uses dask for parallel processing. When Python is parallelized, each module is imported within each process. Therefore it is important to protect all code within an `if __name__ == \"__main__\":` block when running TPOT from a script. 
This is not required when running TPOT from a notebook.\n\nFor example:\n\n```\n#my_analysis.py\n\nimport tpot\nif __name__ == \"__main__\":\n    X, y = load_my_data()\n    est = tpot.TPOTClassifier()\n    est.fit(X,y)\n    #rest of analysis\n```\n\n#### 2\n\nWhen designing custom objective functions, avoid the use of global variables.\n\nDon't Do:\n```\nglobal_X = [[1,2],[4,5]]\nglobal_y = [0,1]\ndef foo(est):\n    return my_scorer(est, X=global_X, y=global_y)\n\n```\n\nInstead, use a partial:\n\n```\nfrom functools import partial\n\ndef foo_scorer(est, X, y):\n    return my_scorer(est, X, y)\n\nif __name__=='__main__':\n    X = [[1,2],[4,5]]\n    y = [0,1]\n    final_scorer = partial(foo_scorer, X=X, y=y)\n```\n\nSimilarly, when using lambda functions.\n\nDon't Do:\n\n```\ndef new_objective(est, a, b):\n    #definition\n\na = 100\nb = 20\nbad_function = lambda est :  new_objective(est=est, a=a, b=b)\n```\n\nDo:\n```\ndef new_objective(est, a, b):\n    #definition\n\na = 100\nb = 20\ngood_function = lambda est, a=a, b=b : new_objective(est=est, a=a, b=b)\n```\n\n### Tips\n\nTPOT will not check if your data is correctly formatted. It will assume that you have passed in operators that can handle the type of data that was passed in. For instance, if you pass in a pandas dataframe with categorical features and missing data, then you should also include in your configuration operators that can handle those features of the data. Alternatively, if you pass in `preprocessing = True`, TPOT will impute missing values, one hot encode categorical features, then standardize the data. (Note that this is currently fitted and transformed on the entire training set before splitting for CV. Later there will be an option to apply per fold, and have the parameters be learnable.)\n\n\nSetting `verbose` to 5 can be helpful during debugging as it will print out the error generated by failing pipelines. 
\n\n\n## Contributing to TPOT\n\nWe welcome you to check the existing issues for bugs or enhancements to work on. If you have an idea for an extension to TPOT, please file a new issue so we can discuss it.\n\n## Citing TPOT\n\nIf you use TPOT in a scientific publication, please consider citing at least one of the following papers:\n\nHernandez, J. G., Saini, A. K., Ghosh, A., & Moore, J. H. (2025). [The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning](https:\u002F\u002Fwww.cell.com\u002Fpatterns\u002Ffulltext\u002FS2666-3899(25)00162-X). Patterns, 6(7).\n\nBibTeX entry:\n\n```bibtex\n@article{hernandez2025tree,\n  title={The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning},\n  author={Hernandez, Jose Guadalupe and Saini, Anil Kumar and Ghosh, Attri and Moore, Jason H},\n  journal={Patterns},\n  volume={6},\n  number={7},\n  year={2025},\n  publisher={Elsevier}\n}\n```\n\nRibeiro, P., Saini, A., Moran, J., Matsumoto, N., Choi, H., Hernandez, M., & Moore, J. H. (2024). [TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-981-99-8413-8_1). In Genetic programming theory and practice XX (pp. 1-17). Singapore: Springer Nature Singapore.\n\nBibTeX entry:\n\n```bibtex\n@incollection{ribeiro2024tpot2,\n  title={TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning},\n  author={Ribeiro, Pedro and Saini, Anil and Moran, Jay and Matsumoto, Nicholas and Choi, Hyunjun and Hernandez, Miguel and Moore, Jason H},\n  booktitle={Genetic programming theory and practice XX},\n  pages={1--17},\n  year={2024},\n  publisher={Springer}\n}\n```\n\nRandal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. 
Lavender, La Creis Kidd, and Jason H. Moore (2016). [Automating biomedical data science through tree-based pipeline optimization](http:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-319-31204-0_9). *Applications of Evolutionary Computation*, pages 123-137.\n\nBibTeX entry:\n\n```bibtex\n@inbook{Olson2016EvoBio,\n    author={Olson, Randal S. and Urbanowicz, Ryan J. and Andrews, Peter C. and Lavender, Nicole A. and Kidd, La Creis and Moore, Jason H.},\n    editor={Squillero, Giovanni and Burelli, Paolo},\n    chapter={Automating Biomedical Data Science Through Tree-Based Pipeline Optimization},\n    title={Applications of Evolutionary Computation: 19th European Conference, EvoApplications 2016, Porto, Portugal, March 30 -- April 1, 2016, Proceedings, Part I},\n    year={2016},\n    publisher={Springer International Publishing},\n    pages={123--137},\n    isbn={978-3-319-31204-0},\n    doi={10.1007\u002F978-3-319-31204-0_9},\n    url={http:\u002F\u002Fdx.doi.org\u002F10.1007\u002F978-3-319-31204-0_9}\n}\n```\n\nRandal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, and Jason H. Moore (2016). [Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science](http:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=2908918). *Proceedings of GECCO 2016*, pages 485-492.\n\nBibTeX entry:\n\n```bibtex\n@inproceedings{OlsonGECCO2016,\n    author = {Olson, Randal S. and Bartley, Nathan and Urbanowicz, Ryan J. 
and Moore, Jason H.},\n    title = {Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science},\n    booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference 2016},\n    series = {GECCO '16},\n    year = {2016},\n    isbn = {978-1-4503-4206-3},\n    location = {Denver, Colorado, USA},\n    pages = {485--492},\n    numpages = {8},\n    url = {http:\u002F\u002Fdoi.acm.org\u002F10.1145\u002F2908812.2908918},\n    doi = {10.1145\u002F2908812.2908918},\n    acmid = {2908918},\n    publisher = {ACM},\n    address = {New York, NY, USA},\n}\n```\n\n## Related Papers\n\nTrang T. Le, Weixuan Fu and Jason H. Moore (2020). [Scaling tree-based automated machine learning to biomedical big data with a feature set selector](https:\u002F\u002Facademic.oup.com\u002Fbioinformatics\u002Farticle\u002F36\u002F1\u002F250\u002F5511404). *Bioinformatics*.36(1): 250-256.\n\nBibTeX entry:\n\n```bibtex\n@article{le2020scaling,\n  title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},\n  author={Le, Trang T and Fu, Weixuan and Moore, Jason H},\n  journal={Bioinformatics},\n  volume={36},\n  number={1},\n  pages={250--256},\n  year={2020},\n  publisher={Oxford University Press}\n}\n```\n\n\n## Support for TPOT\n\nTPOT was developed in the [Artificial Intelligence Innovation (A2I) Lab](http:\u002F\u002Fepistasis.org\u002F) at Cedars-Sinai with funding from the [NIH](http:\u002F\u002Fwww.nih.gov\u002F) under grants U01 AG066833 and R01 LM010098. 
We are incredibly grateful for the support of the NIH and the Cedars-Sinai during the development of this project.\n\nThe TPOT logo was designed by Todd Newmuis, who generously donated his time to the project.\n","# TPOT\n\n\u003Ccenter>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEpistasisLab_tpot_readme_5c5dda92fbc2.jpg\" width=300 \u002F>\n\u003C\u002Fcenter>\n\n\u003Cbr>\n\n![Tests](https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Factions\u002Fworkflows\u002Ftests.yml\u002Fbadge.svg)\n[![PyPI Downloads](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Ftpot?label=pypi%20downloads)](https:\u002F\u002Fpypi.org\u002Fproject\u002FTPOT)\n[![Conda Downloads](https:\u002F\u002Fimg.shields.io\u002Fconda\u002Fdn\u002Fconda-forge\u002Ftpot?label=conda%20downloads)](https:\u002F\u002Fanaconda.org\u002Fconda-forge\u002Ftpot)\n\nTPOT 是“基于树的管道优化工具”的缩写。TPOT 是一个 Python 自动化机器学习工具，它使用遗传编程来优化机器学习管道。您可以将 TPOT 视为您数据科学领域的助手。\n\n## 贡献者\n\nTPOT 最近经历了一次重大重构。该软件包从头开始重写，以提高效率和性能、支持新功能并修复大量 bug。新增功能包括遗传特征选择、显著扩展且更灵活的搜索空间定义方法、多目标优化、更模块化的框架以便于进化算法的定制等。在开发过程中，这个新版本被称为“TPOT2”，但现在我们已将 TPOT2 合并到主 TPOT 包中。您可以在我们的 GPTP 论文《TPOT2：用于自动化机器学习的基于树的管道优化工具的新图结构实现》中了解更多关于 TPOT 新版本的信息。\n\n    Ribeiro, P. 等 (2024)。TPOT2：用于自动化机器学习的基于树的管道优化工具的新图结构实现。载于：Winkler, S., Trujillo, L., Ofria, C., Hu, T.（编）遗传编程理论与实践 XX。遗传与进化计算。新加坡 Springer 出版社。https:\u002F\u002Fdoi.org\u002F10.1007\u002F978-981-99-8413-8_1\n\n当前版本的 TPOT 由 Cedars-Sinai 的以下人员开发：\n    - Pedro Henrique Ribeiro（主要开发者 - https:\u002F\u002Fgithub.com\u002Fperib, https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fpedro-ribeiro\u002F）\n    - Anil Saini (anil.saini@cshs.org)\n    - Jose Hernandez (jgh9094@gmail.com)\n    - Jay Moran (jay.moran@cshs.org)\n    - Nicholas Matsumoto (nicholas.matsumoto@cshs.org)\n    - Hyunjun Choi (hyunjun.choi@cshs.org)\n    - Gabriel Ketron (gabriel.ketron@cshs.org)\n    - Miguel E. 
Hernandez (miguel.e.hernandez@cshs.org)\n    - Jason Moore (moorejh28@gmail.com)\n\n原始版本的 TPOT 主要由宾夕法尼亚大学的以下人员开发：\n    - Randal S. Olson (rso@randalolson.com)\n    - Weixuan Fu (weixuanf@upenn.edu)\n    - Daniel Angell (dpa34@drexel.edu)\n    - Jason Moore (moorejh28@gmail.com)\n    - 以及众多慷慨的开源贡献者\n\n## 许可证\n\n请参阅 [仓库许可证](https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fblob\u002Fmain\u002FLICENSE)，了解 TPOT 的许可和使用信息。总体而言，我们对 TPOT 进行了许可，使其尽可能广泛地被使用。\n\nTPOT 是自由软件：您可以根据自由软件基金会发布的 GNU 较宽松通用公共许可证条款重新分发和修改它，无论是该许可证的第 3 版，还是（经您选择）任何后续版本。\n\nTPOT 以“希望它有用”的态度进行分发，但不提供任何担保；甚至不包括对适销性或特定用途适用性的默示担保。有关详细信息，请参阅 GNU 较宽松通用公共许可证。\n\n您应随 TPOT 一起收到一份 GNU 较宽松通用公共许可证副本。如果没有，请访问 \u003Chttp:\u002F\u002Fwww.gnu.org\u002Flicenses\u002F>。\n\n## 文档\n\n[文档网页在此。](https:\u002F\u002Fepistasislab.github.io\u002Ftpot\u002F)\n\n我们还建议查看教程文件夹中的 Jupyter 笔记本，其中包含示例和指南。\n\n## 安装\n\nTPOT 需要一个可用的 Python 安装。\n\n### 创建 conda 环境（可选）\n\n我们建议使用 conda 环境来安装 TPOT，尽管不使用 conda 环境手动安装也同样可行。\n\n[有关创建 Anaconda 环境的更多信息请见此处。](https:\u002F\u002Fconda.io\u002Fprojects\u002Fconda\u002Fen\u002Flatest\u002Fuser-guide\u002Ftasks\u002Fmanage-environments.html)\n\n```\nconda create --name tpotenv python=3.10\nconda activate tpotenv\n```\n\n### 所需包\n\npython 版本 >=3.10, \u003C3.14\nnumpy\nscipy\nscikit-learn\nupdate_checker\ntqdm\nstopit\npandas\njoblib\nxgboost\nmatplotlib\ntraitlets\nlightgbm\noptuna\njupyter\nnetworkx\ndask\ndistributed\ndask-ml\ndask-jobqueue\nfunc_timeout\nconfigspace\n\n我们配置空间中使用的许多超参数范围均改编自原始 TPOT 包或 AutoSklearn 包。\n\n### M1 Mac 或其他基于 Arm 架构 CPU 用户注意事项\n\n在安装 TPOT 之前，您需要使用以下命令直接从 conda 安装 lightgbm 包。\n\n这是为了确保您获得与您的系统兼容的版本。\n\n```\nconda install --yes -c conda-forge 'lightgbm>=3.3.3'\n```\n\n### 使用 pip 安装附加功能\n\n如果您想利用 TPOT 提供的附加功能以及 `scikit-learn` 扩展，可以使用 `pip` 进行安装。安装带有这些附加功能的 TPOT 的命令如下：\n\n```\npip install tpot[sklearnex]\n```\n\n请注意，虽然这些扩展可以加速 scikit-learn 包，但仍有一些重要考虑事项：\n\n这些扩展可能尚未在基于 Arm 架构的 CPU 上完全开发和测试，例如 M1 Mac。您可能会在这些系统上遇到兼容性问题或性能下降。\n\n我们建议在安装这些附加功能时使用 Python 
3.9，因为它提供了更好的兼容性和稳定性。\n\n\n### 开发者\u002F最新分支安装\n\n\n```\npip install -e \u002Fpath\u002Fto\u002Ftpotrepo\n```\n\n如果您通过 git pull 下载，则仓库文件夹将命名为 TPOT。（注意：此文件夹是包含 setup.py 的文件夹，而不是其内部同名文件夹）。\n如果您以 zip 文件形式下载，则文件夹可能名为 tpot-main。\n\n\n## 使用 \n\n更多说明和示例请参阅教程文件夹。\n\n### 最佳实践\n\n#### 1 \nTPOT 使用 Dask 进行并行处理。当 Python 被并行化时，每个模块都会在各个进程中被导入。因此，在从脚本运行 TPOT 时，务必将所有代码保护在 `if __name__ == \"__main__\":` 块内。而在笔记本中运行 TPOT 时，则不需要这样做。\n\n例如：\n\n```\n#my_analysis.py\n\nimport tpot\nif __name__ == \"__main__\":\n    X, y = load_my_data()\n    est = tpot.TPOTClassifier()\n    est.fit(X,y)\n    #其余分析代码\n```\n\n#### 2\n\n在设计自定义目标函数时，应避免使用全局变量。\n\n不要这样做：\n```\nglobal_X = [[1,2],[4,5]]\nglobal_y = [0,1]\ndef foo(est):\n    return my_scorer(est, X=global_X, y=global_y)\n\n```\n\n而应使用 `functools.partial`：\n\n```\nfrom functools import partial\n\ndef foo_scorer(est, X, y):\n    return my_scorer(est, X, y)\n\nif __name__=='__main__':\n    X = [[1,2],[4,5]]\n    y = [0,1]\n    final_scorer = partial(foo_scorer, X=X, y=y)\n```\n\n同样地，在使用 lambda 函数时也应如此。\n\n不要这样做：\n```\ndef new_objective(est, a, b):\n    #定义\n\na = 100\nb = 20\nbad_function = lambda est :  new_objective(est=est, a=a, b=b)\n```\n\n而应这样做：\n```\ndef new_objective(est, a, b):\n    #定义\n\na = 100\nb = 20\ngood_function = lambda est, a=a, b=b : new_objective(est=est, a=a, b=b)\n```\n\n### 小贴士\n\nTPOT 不会检查您的数据是否格式正确。它会假定您已传递能够处理所输入数据类型的算子。例如，如果您传入包含分类特征和缺失值的 Pandas 数据框，那么您的配置中也应包含能够处理这些特征的算子。或者，如果您设置 `preprocessing = True`，TPOT 将会填充缺失值、对分类特征进行独热编码，并对数据进行标准化。（请注意，目前这一步骤是在拆分用于交叉验证之前，针对整个训练集进行拟合和转换的。未来将提供按折应用的功能，并使参数可学习。）\n\n\n将 `verbose` 设置为 5 在调试过程中会很有帮助，因为它会打印出因管道失败而产生的错误信息。\n\n\n## 参与 TPOT 的贡献\n\n我们欢迎您查看现有的问题，寻找可以修复的 bug 或需要改进的功能。如果您对 TPOT 的扩展有想法，请提交一个新的问题，以便我们讨论。\n\n## 引用 TPOT\n\n如果您在科学出版物中使用了 TPOT，请考虑引用以下论文中的至少一篇：\n\nHernandez, J. G., Saini, A. K., Ghosh, A., & Moore, J. H. (2025). [基于树的管道优化工具：利用遗传编程和自动化机器学习解决生物医学研究问题](https:\u002F\u002Fwww.cell.com\u002Fpatterns\u002Ffulltext\u002FS2666-3899(25)00162-X). 
Patterns, 6(7).\n\nBibTeX 条目：\n\n```bibtex\n@article{hernandez2025tree,\n  title={The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning},\n  author={Hernandez, Jose Guadalupe and Saini, Anil Kumar and Ghosh, Attri and Moore, Jason H},\n  journal={Patterns},\n  volume={6},\n  number={7},\n  year={2025},\n  publisher={Elsevier}\n}\n```\n\nRibeiro, P., Saini, A., Moran, J., Matsumoto, N., Choi, H., Hernandez, M., & Moore, J. H. (2024). [TPOT2：一种新的基于图的实现，用于自动化机器学习的树形管道优化工具](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-981-99-8413-8_1). In Genetic programming theory and practice XX (pp. 1-17). Singapore: Springer Nature Singapore。\n\nBibTeX 条目：\n\n```bibtex\n@incollection{ribeiro2024tpot2,\n  title={TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning},\n  author={Ribeiro, Pedro and Saini, Anil and Moran, Jay and Matsumoto, Nicholas and Choi, Hyunjun and Hernandez, Miguel and Moore, Jason H},\n  booktitle={Genetic programming theory and practice XX},\n  pages={1--17},\n  year={2024},\n  publisher={Springer}\n}\n```\n\nRandal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, 和 Jason H. Moore (2016). [通过基于树的管道优化自动化的生物医学数据科学](http:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-319-31204-0_9). *进化计算的应用*, 第 123–137 页。\n\nBibTeX 条目：\n\n```bibtex\n@inbook{Olson2016EvoBio,\n    author={Olson, Randal S. and Urbanowicz, Ryan J. and Andrews, Peter C. and Lavender, Nicole A. 
and Kidd, La Creis and Moore, Jason H.},\n    editor={Squillero, Giovanni and Burelli, Paolo},\n    chapter={Automating Biomedical Data Science Through Tree-Based Pipeline Optimization},\n    title={Applications of Evolutionary Computation: 19th European Conference, EvoApplications 2016, Porto, Portugal, March 30 -- April 1, 2016, Proceedings, Part I},\n    year={2016},\n    publisher={Springer International Publishing},\n    pages={123--137},\n    isbn={978-3-319-31204-0},\n    doi={10.1007\u002F978-3-319-31204-0_9},\n    url={http:\u002F\u002Fdx.doi.org\u002F10.1007\u002F978-3-319-31204-0_9}\n}\n```\n\nRandal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, 和 Jason H. Moore (2016). [评估用于自动化数据科学的基于树的管道优化工具](http:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=2908918). *GECCO 2016 年会议论文集*, 第 485–492 页。\n\nBibTeX 条目：\n\n```bibtex\n@inproceedings{OlsonGECCO2016,\n    author = {Olson, Randal S. and Bartley, Nathan and Urbanowicz, Ryan J. and Moore, Jason H.},\n    title = {Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science},\n    booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference 2016},\n    series = {GECCO '16},\n    year = {2016},\n    isbn = {978-1-4503-4206-3},\n    location = {Denver, Colorado, USA},\n    pages = {485--492},\n    numpages = {8},\n    url = {http:\u002F\u002Fdoi.acm.org\u002F10.1145\u002F2908812.2908918},\n    doi = {10.1145\u002F2908812.2908918},\n    acmid = {2908918},\n    publisher = {ACM},\n    address = {New York, NY, USA},\n}\n```\n\n## 相关论文\n\nTrang T. Le, Weixuan Fu 和 Jason H. Moore (2020). [利用特征选择器将基于树的自动化机器学习扩展到生物医学大数据](https:\u002F\u002Facademic.oup.com\u002Fbioinformatics\u002Farticle\u002F36\u002F1\u002F250\u002F5511404). 
*Bioinformatics*.36(1): 250-256。\n\nBibTeX 条目：\n\n```bibtex\n@article{le2020scaling,\n  title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},\n  author={Le, Trang T and Fu, Weixuan and Moore, Jason H},\n  journal={Bioinformatics},\n  volume={36},\n  number={1},\n  pages={250--256},\n  year={2020},\n  publisher={Oxford University Press}\n}\n```\n\n\n## TPOT 的支持\n\nTPOT 是在 Cedars-Sinai 的 [人工智能创新实验室 (A2I)](http:\u002F\u002Fepistasis.org\u002F) 中开发的，并得到了 [美国国立卫生研究院 (NIH)](http:\u002F\u002Fwww.nih.gov\u002F) 的资助，项目编号为 U01 AG066833 和 R01 LM010098。我们对 NIH 和 Cedars-Sinai 在该项目开发过程中的大力支持表示由衷的感谢。\n\nTPOT 的标志由 Todd Newmuis 设计，他慷慨地为该项目贡献了自己的时间。","# TPOT 快速上手指南\n\nTPOT (Tree-based Pipeline Optimization Tool) 是一款基于遗传编程的 Python 自动化机器学习工具，旨在自动优化机器学习管道，充当你的数据科学助手。\n\n## 1. 环境准备\n\n在开始之前，请确保满足以下系统要求：\n\n*   **操作系统**：Linux, macOS, Windows\n    *   *注意*：M1 Mac 或其他 ARM 架构 CPU 用户需特别注意 `lightgbm` 的安装（见安装步骤）。\n*   **Python 版本**：>= 3.10 且 \u003C 3.14\n*   **核心依赖**：TPOT 会自动处理大部分依赖，主要包括 `numpy`, `scipy`, `scikit-learn`, `pandas`, `xgboost`, `lightgbm`, `dask` 等。\n\n## 2. 安装步骤\n\n推荐使用 `conda` 进行环境管理，以获得最佳的依赖兼容性。\n\n### 方案 A：使用 Conda 安装（推荐）\n\n1.  **创建并激活虚拟环境**：\n    ```bash\n    conda create --name tpotenv python=3.10\n    conda activate tpotenv\n    ```\n\n2.  **特殊硬件适配（仅限 M1 Mac\u002FARM 用户）**：\n    如果你使用的是 M1 Mac 或其他 ARM 架构 CPU，必须先通过 conda 安装兼容版的 `lightgbm`：\n    ```bash\n    conda install --yes -c conda-forge 'lightgbm>=3.3.3'\n    ```\n\n3.  **安装 TPOT**：\n    ```bash\n    pip install tpot\n    ```\n\n    *可选：若需使用 `scikit-learn` 加速扩展（不建议在 ARM 架构上使用）：*\n    ```bash\n    pip install tpot[sklearnex]\n    ```\n\n### 方案 B：直接使用 Pip 安装\n\n如果你不使用 conda 环境，可直接使用 pip 安装（需确保已安装符合版本的 Python）：\n\n```bash\npip install tpot\n```\n\n## 3. 
基本使用\n\nTPOT 利用 `dask` 进行并行处理。**重要提示**：如果在脚本（`.py` 文件）中运行 TPOT，必须将代码包裹在 `if __name__ == \"__main__\":` 块中，否则会导致多进程导入错误。在 Jupyter Notebook 中则无需此操作。\n\n### 最简单示例\n\n以下是一个使用 `TPOTClassifier` 进行分类任务的最小化示例：\n\n```python\n# my_analysis.py\n\nfrom sklearn.datasets import load_digits\nfrom sklearn.model_selection import train_test_split\nimport tpot\n\nif __name__ == \"__main__\":\n    # 1. 准备数据\n    digits = load_digits()\n    X_train, X_test, y_train, y_test = train_test_split(\n        digits.data, digits.target, train_size=0.75, test_size=0.25, random_state=42\n    )\n\n    # 2. 初始化 TPOT 分类器\n    # generations: 进化代数，population_size: 种群大小\n    # verbosity: 详细程度 (2 显示进度条，5 显示错误详情用于调试)\n    tpot_clf = tpot.TPOTClassifier(\n        generations=5,\n        population_size=50,\n        verbosity=2,\n        random_state=42,\n        n_jobs=-1  # 使用所有可用 CPU 核心\n    )\n\n    # 3. 拟合数据 (开始自动搜索最佳管道)\n    tpot_clf.fit(X_train, y_train)\n\n    # 4. 评估模型\n    print(f\"测试集准确率：{tpot_clf.score(X_test, y_test)}\")\n\n    # 5. 
导出最佳管道代码\n    tpot_clf.export(\"tpot_best_pipeline.py\")\n```\n\n### 关键参数说明\n*   `generations`: 遗传算法运行的代数，越多搜索越充分但耗时越长。\n*   `population_size`: 每一代保留的管道数量。\n*   `preprocessing`: 默认为 `False`。若设为 `True`，TPOT 会自动处理缺失值填充、类别特征独热编码及数据标准化。\n*   `verbosity`: 设置为 `5` 可在调试时打印失败管道的具体错误信息。","某电商公司的数据科学团队正面临紧急任务，需要在两天内基于用户历史行为数据构建一个高精度的流失预测模型，以配合即将到来的促销活动。\n\n### 没有 tpot 时\n- **人工试错效率低**：数据科学家需手动编写代码逐一测试随机森林、XGBoost、SVM 等数十种算法及其参数组合，耗时数天仍难穷尽所有可能。\n- **特征工程依赖经验**：预处理和特征选择高度依赖个人经验，容易遗漏关键的非线性特征交互，导致模型上限受限。\n- **流程固化难优化**：一旦确定初步方案，很难快速验证“如果换一种降维方式或分类器”是否会带来显著提升，迭代成本极高。\n- **结果可复现性差**：手动调整的记录分散在多个笔记中，难以完整还原最优模型的构建路径，不利于团队协作与部署。\n\n### 使用 tpot 后\n- **自动搜索最优管线**：tpot 利用遗传编程自动探索数千种机器学习流水线组合，几小时内即可找到比人工经验更优的模型架构与参数配置。\n- **智能特征处理**：内置的遗传特征选择机制能自动发现并保留最具预测力的特征子集，甚至挖掘出人工难以察觉的特征转换方式。\n- **高效多方案对比**：只需修改一行配置，tpot 即可并行评估多目标优化策略，快速输出不同复杂度下的最佳模型供业务抉择。\n- **生成可读代码**：任务完成后，tpot 直接导出完整的 Python 脚本，清晰呈现从数据预处理到模型训练的全部步骤，确保无缝交付生产环境。\n\ntpot 将原本需要数天的人工调参工作压缩至小时级，不仅大幅提升了模型性能，更让数据科学家从繁琐的试错中解放出来，专注于业务逻辑与策略分析。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEpistasisLab_tpot_5c5dda92.jpg","EpistasisLab","Epistasis Lab at Cedars Sinai","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FEpistasisLab_7f46bb0a.jpg","Prof. Jason H. 
Moore's research lab at Cedars Sinai",null,"jason.moore@csmc.edu","http:\u002F\u002Fepistasis.org","https:\u002F\u002Fgithub.com\u002FEpistasisLab",[81,85],{"name":82,"color":83,"percentage":84},"Jupyter Notebook","#DA5B0B",80.9,{"name":86,"color":87,"percentage":88},"Python","#3572A5",19.1,10047,1564,"2026-04-05T13:54:52","LGPL-3.0","Linux, macOS, Windows","未说明",{"notes":96,"python":97,"dependencies":98},"建议使用 conda 管理环境。M1 Mac 或其他 Arm 架构 CPU 用户需先通过 conda 单独安装 lightgbm (>=3.3.3) 以确保兼容性。若使用并行处理（基于 dask），在脚本中运行时必须将代码包裹在 `if __name__ == \"__main__\":` 块中，但在 Jupyter Notebook 中不需要。若数据包含缺失值或类别特征，需配置相应算子或设置 `preprocessing=True`。",">=3.10, \u003C3.14 (安装 sklearnex 扩展建议用 3.9)",[99,100,101,102,103,104,105,106,107,108],"numpy","scipy","scikit-learn","pandas","xgboost","lightgbm","optuna","dask","distributed","networkx",[14,110,16,13],"其他",[112,113,114,115,116,101,117,118,119,120,121,122,123,124,125,126,127,128,129,130],"machine-learning","python","data-science","automl","automation","hyperparameter-optimization","model-selection","parameter-tuning","automated-machine-learning","random-forest","gradient-boosting","feature-engineering","aiml","alzheimer","alzheimers","nia","u01ag066833","ag066833","adsp","2026-03-27T02:49:30.150509","2026-04-06T15:01:00.343256",[134,139,144,149,154,158],{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},19360,"遇到 sklearn.cross_validation 或 RandomizedPCA 的弃用警告怎么办？","这是因为 scikit-learn 0.18+ 版本将 cross_validation 模块重构为 model_selection，并将 RandomizedPCA 替换为 PCA(svd_solver='randomized')。解决方法是更新代码导入路径：使用 `from sklearn.model_selection import train_test_split` 替代旧的 cross_validation 导入。对于 PCA，请使用 `PCA(svd_solver='randomized')`。TPOT 后续版本已针对这些变化增加了兼容性处理。","https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fissues\u002F284",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},19361,"为什么 TPOT 运行过程中最佳内部交叉验证分数（Current best internal CV 
score）在多代进化中保持不变？","这通常是因为数据集较简单或当前找到的管道已经接近最优解，导致后续进化难以显著提升分数。需要注意的是，`tpot.evaluated_individuals_` 中存储的是平均交叉验证分数，而 `tpot.score` 输出的是最佳管道的适应度分数。如果分数长期不变，可以尝试增加种群大小（population_size）、代数（generations）或调整变异率来增加搜索多样性。","https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fissues\u002F503",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},19362,"导入 sklearn.model_selection 时出现 'No module named model_selection' 错误如何解决？","该错误通常源于 scikit-learn 版本过旧（低于 0.18）。请升级 scikit-learn 到最新版本：`pip install --upgrade scikit-learn`。升级后，使用 `from sklearn.model_selection import train_test_split` 即可正常导入。切勿尝试安装 `sklearn.cross_validation`，因为该模块在新版中已被移除并整合至 model_selection。","https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fissues\u002F314",{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},19363,"TPOT 运行时 CPU 占用率飙升至 100% 导致内核崩溃怎么办？","这通常是由于并行计算设置不当或特定模型（如 XGBoost）资源消耗过大引起的。建议尝试以下方案：1. 在初始化 TPOT 时限制 `n_jobs` 参数（例如设为物理核心数减 1，而不是 -1）；2. 使用 Dask 进行分布式训练，需先安装 dask 并在配置中启用；3. 减少种群大小（population_size）和代数（generations）以降低单次运行负载；4. 
检查是否使用了内存密集型模型并适当限制配置空间。","https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fissues\u002F759",{"id":155,"question_zh":156,"answer_zh":157,"source_url":143},19364,"TPOT 输出的最佳管道分数与 evaluated_individuals_ 中的分数不一致是什么原因？","这是正常现象。`evaluated_individuals_` 记录的是每个个体在交叉验证中的平均得分，而最终输出的最佳管道分数通常是基于整个训练集重新训练后的评估结果，或者是适应度函数的具体实现值。此外，随机种子或数据划分的微小差异也可能导致数值不完全匹配。应以最终生成的管道代码在实际测试集上的表现为准。",{"id":159,"question_zh":160,"answer_zh":161,"source_url":138},19365,"如何在不同版本的 scikit-learn 之间保持 TPOT 的兼容性？","TPOT 会根据检测到的 scikit-learn 版本自动调整内部使用的模块（如自动切换 cross_validation 和 model_selection）。用户无需手动修改 TPOT 源码，但应确保安装的 TPOT 版本支持当前的 scikit-learn 版本。如果遇到兼容性问题，建议升级 TPOT 到最新版（`pip install --upgrade tpot`），或参考官方文档查看版本对应关系。",[163,168,173,178,183,188,193,198,203,208,213,218,223,228,233,238,243,248,253,258],{"id":164,"version":165,"summary_zh":166,"released_at":167},117363,"v1.1.0","## 变更内容\n* @john-sandall 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1374 中更新了 Python 3.12 的软件包\n* @john-sandall 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1377 中增加了对 Python 3.13 的支持\n* 修复了 `max_eval_time_mins` 和 `max_time_mins`，使其能够接受 `None` 和 `inf`；同时允许稳态估计器使用模板搜索空间，由 @perib 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1375 中完成\n* @perib 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1371 中针对 sklearn 1.6 进行了更新\n\n## 新贡献者\n* @john-sandall 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1374 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fcompare\u002Fv1.0.0...v1.1.0","2025-07-03T21:13:56",{"id":169,"version":170,"summary_zh":171,"released_at":172},117364,"v1.0.0","## 变更内容\n\n1. 代码库迁移：将 tpot2 整合至 tpot；移除了已弃用或实验性的功能。\n2. 性能优化：优化了管道评估流程和遗传编程算子，从而加快收敛速度并降低计算开销。\n3. 基于图的管道：引入了灵活的基于图的机器学习管道表示方法，增强了对复杂模型架构的探索能力。\n4. 依赖项更新：更新了依赖项，以确保与最新版本的 scikit-learn 及其他核心库的兼容性。\n5. 稳定性提升：修复了多项 bug，并改进了错误处理机制，从而提升了整体稳定性和用户体验。\n6. 
遗传特征选择：实现了遗传特征选择机制，在管道优化过程中可自动识别相关特征。\n7. 搜索空间扩展：增强了搜索空间定义的灵活性，允许更全面地探索潜在的管道配置。\n8. 模块化框架：将代码库重构为更加模块化的结构，简化了进化算法组件的自定义和扩展。\n9. 文档重做：修订并扩充了文档，包括更新示例和全面指南，以反映新功能及 API 变更。\n\n## 主要贡献者\n- Pedro Henrique Ribeiro（主要开发者 - https:\u002F\u002Fgithub.com\u002Fperib, https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fpedro-ribeiro\u002F）\n- Anil Saini（[anil.saini@cshs.org](mailto:anil.saini@cshs.org)）\n- Jose Hernandez（[jgh9094@gmail.com](mailto:jgh9094@gmail.com)）\n- Jay Moran（[jay.moran@cshs.org](mailto:jay.moran@cshs.org)）\n- Nicholas Matsumoto（[nicholas.matsumoto@cshs.org](mailto:nicholas.matsumoto@cshs.org)）\n- Gabriel Ketron（[Gabriel.Ketron@cshs.org](mailto:Gabriel.Ketron@cshs.org)）\n- Hyunjun Choi（[hyunjun.choi@cshs.org](mailto:hyunjun.choi@cshs.org)）\n- Miguel E. Hernandez（[miguel.e.hernandez@cshs.org](mailto:miguel.e.hernandez@cshs.org)）\n- Jason Moore（[moorejh28@gmail.com](mailto:moorejh28@gmail.com)）\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fcommits\u002Fv1.0.0","2025-02-25T21:05:37",{"id":174,"version":175,"summary_zh":176,"released_at":177},117365,"v0.12.2","## 变更内容\n* 估计器类型，由 @perib 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1319 中完成\n* 更新依赖项，由 @gatl 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1335 中完成\n* 将 torch 版本从 1.3.1 升级到 1.13.1，由 @dependabot 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1336 中完成\n* 将 scikit-learn 版本更新至 1.1.3，由 @perib 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1337 中完成\n* 移除已弃用的 imp 模块，并修复文档字符串警告，由 @perib 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1331 中完成\n* 更新与 scikit-learn 1.4 的兼容性，由 @perib 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1343 中完成\n* 改进错误信息，由 @gatl 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1338 中完成\n* 修复 Mate 运算符问题，由 @perib 在 
https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1268 中完成\n\n## 新贡献者\n* @gatl 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1335 中完成了首次贡献\n* @dependabot 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1336 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fcompare\u002Fv0.12.1...v0.12.2","2024-02-23T19:05:48",{"id":179,"version":180,"summary_zh":181,"released_at":182},117366,"v0.12.1","修复了运行过早终止的问题\n## 变更内容\n* 由 @perib 在 https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fpull\u002F1315 中修复了早期崩溃问题\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot\u002Fcompare\u002Fv0.12.0...v0.12.1","2023-08-15T18:21:39",{"id":184,"version":185,"summary_zh":186,"released_at":187},117367,"v0.12.0","- 修复 NumPy 兼容性\n- Dask 优化\n- 修复若干小 bug","2023-05-25T22:44:19",{"id":189,"version":190,"summary_zh":191,"released_at":192},117368,"v0.11.7","- 修复与 scikit-learn 0.24 和 xgboost 1.3.0 的兼容性问题\n- 修复导致 TPOT 在分类超过 50 个类别时无法正常工作的 bug\n- 添加对 `imblearn` 中 `Resampler` 的初始支持\n- 修复若干小 bug","2021-01-06T15:19:33",{"id":194,"version":195,"summary_zh":196,"released_at":197},117369,"0.11.6.post3","- 用于修复与最新版 XGBoost（v1.3.0）兼容性问题的补丁","2020-12-14T15:07:26",{"id":199,"version":200,"summary_zh":201,"released_at":202},117370,"v0.11.6.post2","- 将 XGBoost 作为必需的依赖项","2020-11-30T16:31:53",{"id":204,"version":205,"summary_zh":206,"released_at":207},117371,"v0.11.6.post1","- 优化运算符类型检查的逻辑。","2020-11-05T15:52:25",{"id":209,"version":210,"summary_zh":211,"released_at":212},117372,"0.11.6","- 修复一个 bug：当使用 `template` 选项时，点突变功能无法正常工作。\n- 添加一个新的构建配置“TPOT cuML”，该配置将使用 [RAPIDS cuML](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcuml) 和 [DMLC XGBoost](https:\u002F\u002Fgithub.com\u002Fdmlc\u002Fxgboost) 中的 GPU 加速估计器，在受限的配置空间内进行搜索。**此配置需要配备 NVIDIA Pascal 架构或更高版本的 GPU（计算能力 ≥ 6.0），并且已安装 cuML 库。**\n- 为 log\u002Flog_file 参数添加字符串路径支持。\n- 修复 0.11.5 版本中的一个 bug：每一代结束后，stdout 不会更新。\n- 修复其他一些小 
bug。\n","2020-10-26T15:09:32",{"id":214,"version":215,"summary_zh":216,"released_at":217},117373,"v0.11.1-resAdj","- Development branch based on TPOT 0.11.1 for adjusting covariate without data leakage.","2020-09-02T20:35:20",{"id":219,"version":220,"summary_zh":221,"released_at":222},117374,"v0.11.5","- Make `Pytorch` as an optional dependency\r\n- Refine installation documentation ","2020-06-01T22:10:38",{"id":224,"version":225,"summary_zh":226,"released_at":227},117375,"v0.11.4","- Add a new built configuration \"TPOT NN\" which includes all operators in \"Default TPOT\" plus additional neural network estimators written in PyTorch (currently `tpot.builtins.PytorchLRClassifier` and `tpot.builtins.PytorchMLPClassifier` for classification tasks only)\r\n- Refine `log_file` parameter's behavior","2020-05-29T16:12:58",{"id":229,"version":230,"summary_zh":231,"released_at":232},117376,"v0.11.3","- Fix a bug in TPOTRegressor in v0.11.2\r\n- Add `-log` option in command line interface to save process log to a file.","2020-05-14T13:16:28",{"id":234,"version":235,"summary_zh":236,"released_at":237},117377,"v0.11.2","- Fix `early_stop` parameter does not work properly\r\n- TPOT built-in `OneHotEncoder` can refit to different datasets\r\n- Fix the issue that the attribute `evaluated_individuals_` cannot record correct generation info.\r\n- Add a new parameter `log_file` to output logs to a file instead of `sys.stdout`\r\n- Fix some code quality issues and mistakes in documentations\r\n- Fix minor bugs","2020-05-13T14:49:49",{"id":239,"version":240,"summary_zh":241,"released_at":242},117378,"v0.11.1","- Fix compatibility issue with scikit-learn v0.22\r\n- `warm_start` now saves both Primitive Sets and evaluated_pipelines_ from previous runs;\r\n- Fix the error that TPOT assign wrong fitness scores to non-evaluated pipelines (interrupted by `max_min_mins` or `KeyboardInterrupt`) ;\r\n- Fix the bug that mutation operator cannot generate new pipeline when template is not 
default value and `warm_start` is True;\r\n- Fix the bug that `max_time_mins` cannot stop optimization process when search space is limited.  \r\n- Fix a bug in exported codes when the exported pipeline is only 1 estimator \r\n- Fix spelling mistakes in documentations \r\n- Fix some code quality issues ","2020-01-03T18:04:20",{"id":244,"version":245,"summary_zh":246,"released_at":247},117379,"v0.11.0","- **Support for Python 3.4 and below has been officially dropped.** Also support for scikit-learn 0.20 or below has been dropped.\r\n- The support of a metric function with the signature `score_func(y_true, y_pred)` for `scoring parameter` has been dropped.\r\n- Refine `StackingEstimator` for not stacking NaN\u002FInfinity predication probabilities.\r\n- Fix a bug that population doesn't persist even `warm_start=True` when `max_time_mins` is not default value.\r\n- Now the `random_state` parameter in TPOT is used for pipeline evaluation instead of using a fixed random seed of 42 before. The `set_param_recursive` function has been moved to `export_utils.py` and it can be used in exported codes for setting `random_state` recursively in scikit-learn Pipeline. 
It is used to set `random_state` in the `fitted_pipeline_` attribute and exported pipelines.\r\n- TPOT can independently use `generations` and `max_time_mins` to limit the optimization process, by using one of the parameters or both.\r\n- The `.export()` function will return a string of the exported pipeline if no output filename is specified.\r\n- Add [`SGDClassifier`](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.linear_model.SGDClassifier.html) and [`SGDRegressor`](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.linear_model.SGDRegressor.html) into TPOT default configs.\r\n- Documentation has been updated.\r\n- Fix minor bugs.","2019-11-05T21:04:49",{"id":249,"version":250,"summary_zh":251,"released_at":252},117380,"v0.10.2","- **TPOT v0.10.2 is the last version to support Python 2.7 and Python 3.4.**\r\n- Minor updates for fixing compatibility issues with the latest version of scikit-learn (version > 0.21) and xgboost (v0.90)\r\n- Default value of the `template` parameter is changed to `None`.\r\n- Fix errors in documentation\r\n","2019-07-16T17:29:50",{"id":254,"version":255,"summary_zh":256,"released_at":257},117381,"v0.10.1","- Add `data_file_path` option into the `export` function for replacing `'PATH\u002FTO\u002FDATA\u002FFILE'` with a customized dataset path in exported scripts. (Related issue #838)\r\n- Change python version in CI tests to 3.7\r\n- Add CI tests for macOS.","2019-04-19T15:19:09",{"id":259,"version":260,"summary_zh":261,"released_at":262},117382,"v0.10.0","- Add a new `template` option to specify a desired structure for the machine learning pipeline in TPOT. Check [TPOT API](https:\u002F\u002Fepistasislab.github.io\u002Ftpot\u002Fapi\u002F) (it will be updated once it is merged into the master branch).\r\n- Add `FeatureSetSelector` operator into TPOT for feature selection based on *a priori* expert knowledge. 
Please check our [preprint paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F502484v1.article-info) for more details (*Note: it was named `DatasetSelector` in the 1st version of the paper but we will rename it to FeatureSetSelector in the next version of the paper*)\r\n- Refine `n_jobs` parameter to accept values below -1. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. It is related to issue #846.\r\n- Now the `memory` parameter can create the memory cache directory if it does not exist. It is related to issue #837.\r\n- Fix minor bugs.","2019-04-12T14:48:07"]