[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-r0f1--datascience":3,"tool-r0f1--datascience":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",142651,2,"2026-04-06T23:34:12",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":75,"owner_location":76,"owner_email":75,"owner_twitter":75,"owner_website":77,"owner_url":78,"languages":75,"stars":79,"forks":80,"last_commit_at":81,"license":82,"difficulty_score":83,"env_os":84,"env_gpu":85,"env_ram":86,"env_deps":87,"category_tags":101,"github_topics":103,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":117,"updated_at":118,"faqs":119,"releases":120},4893,"r0f1\u002Fdatascience","datascience","Curated list of Python resources for data science.","datascience 是一份精心整理的 Python 数据科学资源清单，旨在为从业者提供从基础库到进阶技巧的一站式导航。它不仅仅罗列了 pandas、scikit-learn 和 matplotlib 等核心工具，更广泛收录了教程、代码片段、博客文章及技术演讲，有效解决了开发者在海量生态中难以快速定位优质学习资源和高效替代方案的痛点。\n\n这份清单特别适合数据科学家、机器学习工程师以及希望提升 Python 数据分析能力的研究人员使用。其独特亮点在于不仅涵盖经典库，还敏锐地引入了 DuckDB（在 DataFrame 上高效运行 SQL）、Polars（多线程加速替代方案）以及 pygwalker 等交互式可视化工具。此外，它还包含了如 uv 依赖管理、marimo 
可复现环境等现代工程化实践资源。无论是新手入门寻找学习路径，还是资深专家探索性能优化与新工作流，datascience 都能提供极具价值的参考指引，帮助用户构建更完善的数据科学工具箱。","# Awesome Data Science with Python\n\n> A curated list of awesome resources for practicing data science using Python, including not only libraries, but also links to tutorials, code snippets, blog posts and talks.  \n\n#### Core\n[pandas](https:\u002F\u002Fpandas.pydata.org\u002F) - Data structures built on top of [numpy](https:\u002F\u002Fwww.numpy.org\u002F).  \n[scikit-learn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002F) - Core ML library, [intelex](https:\u002F\u002Fgithub.com\u002Fintel\u002Fscikit-learn-intelex).  \n[matplotlib](https:\u002F\u002Fmatplotlib.org\u002F) - Plotting library.  \n[seaborn](https:\u002F\u002Fseaborn.pydata.org\u002F) - Data visualization library based on matplotlib.  \n[ydata-profiling](https:\u002F\u002Fgithub.com\u002Fydataai\u002Fydata-profiling) - Descriptive statistics using `ProfileReport`.  \n[sklearn_pandas](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fsklearn-pandas) - Helpful `DataFrameMapper` class.  \n[missingno](https:\u002F\u002Fgithub.com\u002FResidentMario\u002Fmissingno) - Missing data visualization.  \n[rainbow-csv](https:\u002F\u002Fmarketplace.visualstudio.com\u002Fitems?itemName=mechatroner.rainbow-csv) - VSCode plugin to display .csv files with nice colors.  \n\n#### General Python Programming\n[Advanced Python Features](https:\u002F\u002Fblog.edward-li.com\u002Ftech\u002Fadvanced-python-features\u002F) - Generics, Protocols, Structural Pattern Matching and more.  \n[uv](https:\u002F\u002Fgithub.com\u002Fastral-sh\u002Fuv) - Dependency management.  \n[pdm](https:\u002F\u002Fpdm-project.org\u002Fen\u002Flatest\u002F) - For large binary distributions, works with uv.  \n[just](https:\u002F\u002Fgithub.com\u002Fcasey\u002Fjust) - Command runner. Replacement for make.  \n[python-dotenv](https:\u002F\u002Fgithub.com\u002Ftheskumar\u002Fpython-dotenv) - Manage environment variables.  
\n[structlog](https:\u002F\u002Fgithub.com\u002Fhynek\u002Fstructlog) - Python logging.  \n[more_itertools](https:\u002F\u002Fmore-itertools.readthedocs.io\u002Fen\u002Flatest\u002F) - Extension of itertools.  \n[tqdm](https:\u002F\u002Fgithub.com\u002Ftqdm\u002Ftqdm) - Progress bars for for-loops. Also supports [pandas apply()](https:\u002F\u002Fstackoverflow.com\u002Fa\u002F34365537\u002F1820480).  \n[hydra](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhydra) - Configuration management.  \n\n#### Pandas Tricks, Alternatives and Additions\n[duckdb](https:\u002F\u002Fgithub.com\u002Fduckdb\u002Fduckdb) - Efficiently run SQL queries on pandas DataFrames, [duckplyr](https:\u002F\u002Fgithub.com\u002Ftidyverse\u002Fduckplyr\u002F) for R, [Great Intro](https:\u002F\u002Fcodecut.ai\u002Fdeep-dive-into-duckdb-data-scientists\u002F).  \n[ducklake](https:\u002F\u002Fgithub.com\u002Fduckdb\u002Fducklake) - DuckDB extension for storing data in a data lake.  \n[fireducks](https:\u002F\u002Fgithub.com\u002Ffireducks-dev\u002Ffireducks) - Speedier alternative to pandas with similar API.  \n[pandasvault](https:\u002F\u002Fgithub.com\u002Ffirmai\u002Fpandasvault) - Large collection of pandas tricks.  \n[polars](https:\u002F\u002Fgithub.com\u002Fpola-rs\u002Fpolars) - Multi-threaded alternative to pandas.  \n[xarray](https:\u002F\u002Fgithub.com\u002Fpydata\u002Fxarray\u002F) - Extends pandas to n-dimensional arrays.  \n[mlx](https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx) - An array framework for Apple silicon.  \n[pandas_flavor](https:\u002F\u002Fgithub.com\u002FZsailer\u002Fpandas_flavor) - Write custom accessors like `.str` and `.dt`.   \n[daft](https:\u002F\u002Fgithub.com\u002FEventual-Inc\u002FDaft) - Distributed DataFrame.  \n[vaex](https:\u002F\u002Fgithub.com\u002Fvaexio\u002Fvaex) - Out-of-Core DataFrames.  \n[modin](https:\u002F\u002Fgithub.com\u002Fmodin-project\u002Fmodin) - Parallelization library for faster pandas `DataFrame`.  
\n[swifter](https:\u002F\u002Fgithub.com\u002Fjmcarpenter2\u002Fswifter) - Apply any function to a pandas DataFrame faster (works with modin). \n\n#### Tables\n[great-tables](https:\u002F\u002Fgithub.com\u002Fposit-dev\u002Fgreat-tables) - Display tabular data nicely.  \n\n#### Interactive Dataframe Visualization\n[pygwalker](https:\u002F\u002Fgithub.com\u002FKanaries\u002Fpygwalker) - Interactive dataframe.  \n[marimo](https:\u002F\u002Fgithub.com\u002Fmarimo-team\u002Fmarimo) - Visualization and reproducible environment.  \n[lux](https:\u002F\u002Fgithub.com\u002Flux-org\u002Flux) - DataFrame visualization within Jupyter.  \n[dtale](https:\u002F\u002Fgithub.com\u002Fman-group\u002Fdtale) - View and analyze Pandas data structures, integrating with Jupyter.  \n[pandasgui](https:\u002F\u002Fgithub.com\u002Fadamerose\u002Fpandasgui) - GUI for viewing, plotting and analyzing Pandas DataFrames.  \n[quak](https:\u002F\u002Fgithub.com\u002Fmanzt\u002Fquak) - Scalable, interactive data table, [twitter](https:\u002F\u002Fx.com\u002Ftrevmanz\u002Fstatus\u002F1816760923949809982).  \n[data-formulator](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fdata-formulator) - Data visualization tool.  \n\n\n#### Environment and Jupyter\n[Jupyter Tricks](https:\u002F\u002Fwww.dataquest.io\u002Fblog\u002Fjupyter-notebook-tips-tricks-shortcuts\u002F)  \n[nteract](https:\u002F\u002Fnteract.io\u002F) - Open Jupyter Notebooks with doubleclick.  \n[papermill](https:\u002F\u002Fgithub.com\u002Fnteract\u002Fpapermill) - Parameterize and execute Jupyter notebooks, [tutorial](https:\u002F\u002Fpbpython.com\u002Fpapermil-rclone-report-1.html).  \n[nbdime](https:\u002F\u002Fgithub.com\u002Fjupyter\u002Fnbdime) - Diff two notebook files, Alternative GitHub App: [ReviewNB](https:\u002F\u002Fwww.reviewnb.com\u002F).  \n[RISE](https:\u002F\u002Fgithub.com\u002Fdamianavila\u002FRISE) - Turn Jupyter notebooks into presentations.  
\n[handcalcs](https:\u002F\u002Fgithub.com\u002Fconnorferster\u002Fhandcalcs) - More convenient way of writing mathematical equations in Jupyter.  \n[notebooker](https:\u002F\u002Fgithub.com\u002Fman-group\u002Fnotebooker) - Productionize and schedule Jupyter Notebooks.  \n[voila](https:\u002F\u002Fgithub.com\u002FQuantStack\u002Fvoila) - Turn Jupyter notebooks into standalone web applications. [Voila grid layout](https:\u002F\u002Fgithub.com\u002Fvoila-dashboards\u002Fvoila-gridstack).  \n\n#### Jupyter Alternatives\n[positron](https:\u002F\u002Fgithub.com\u002Fposit-dev\u002Fpositron) - Data Science IDE.  \n[Deepnote](https:\u002F\u002Fdeepnote.com) - Data Science platform with real-time collaboration, environment management.  \n\n#### Extraction + OCR\n[textract](https:\u002F\u002Fgithub.com\u002Fdeanmalmgren\u002Ftextract) - Extract text from any document.  \n[docling](https:\u002F\u002Fgithub.com\u002Fdocling-project\u002Fdocling) - Text extraction.  \n[DeepSeek-OCR](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-OCR) - OCR.  \n[chandra](https:\u002F\u002Fgithub.com\u002Fdatalab-to\u002Fchandra) - OCR.  \n\n#### Big Data\n[spark](https:\u002F\u002Fdocs.databricks.com\u002Fspark\u002Flatest\u002Fdataframes-datasets\u002Fintroduction-to-dataframes-python.html#work-with-dataframes) - `DataFrame` for big data, [cheatsheet](https:\u002F\u002Fgist.github.com\u002Fcrawles\u002Fb47e23da8218af0b9bd9d47f5242d189), [tutorial](https:\u002F\u002Fgithub.com\u002Fericxiao251\u002Fspark-syntax).  
\n[dask](https:\u002F\u002Fgithub.com\u002Fdask\u002Fdask), [dask-ml](http:\u002F\u002Fml.dask.org\u002F) - Pandas `DataFrame` for big data and machine learning library, [resources](https:\u002F\u002Fmatthewrocklin.com\u002Fblog\u002F\u002Fwork\u002F2018\u002F07\u002F17\u002Fdask-dev), [talk1](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ccfsbuqsjgI), [talk2](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=RA_2qdipVng), [notebooks](https:\u002F\u002Fgithub.com\u002Fdask\u002Fdask-ec2\u002Ftree\u002Fmaster\u002Fnotebooks), [videos](https:\u002F\u002Fwww.youtube.com\u002Fuser\u002Fmdrocklin).  \n[h2o](https:\u002F\u002Fgithub.com\u002Fh2oai\u002Fh2o-3) - Helpful `H2OFrame` class for out-of-memory dataframes.  \n[cuDF](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcudf) - GPU DataFrame Library, [Intro](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=6XzS5XcpicM&t=2m50s).  \n[cupy](https:\u002F\u002Fgithub.com\u002Fcupy\u002Fcupy) - NumPy-like API accelerated with CUDA.  \n[ray](https:\u002F\u002Fgithub.com\u002Fray-project\u002Fray\u002F) - Flexible, high-performance distributed execution framework.  \n[bottleneck](https:\u002F\u002Fgithub.com\u002Fkwgoodman\u002Fbottleneck) - Fast NumPy array functions written in C.   \n[petastorm](https:\u002F\u002Fgithub.com\u002Fuber\u002Fpetastorm) - Data access library for parquet files by Uber.  \n[zarr](https:\u002F\u002Fgithub.com\u002Fzarr-developers\u002Fzarr-python) - Distributed NumPy arrays.  \n[NVTabular](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNVTabular) - Feature engineering and preprocessing library for tabular data by Nvidia.  \n[tensorstore](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Ftensorstore) - Reading and writing large multi-dimensional arrays (Google).  \n\n#### Command line tools, CSV\n[csvkit](https:\u002F\u002Fgithub.com\u002Fwireservice\u002Fcsvkit) - Command line tool for CSV files.  \n[csvsort](https:\u002F\u002Fpypi.org\u002Fproject\u002Fcsvsort\u002F) - Sort large csv files.  
\n\n#### Classical Statistics\n\n##### Books\n[Lakens - Improving Your Statistical Inferences](https:\u002F\u002Flakens.github.io\u002Fstatistical_inferences\u002F) - Testing, Effect Sizes, Confidence Intervals, Sample Size, Equivalence Testing, Sequential Analysis, [Github](https:\u002F\u002Fgithub.com\u002FLakens\u002Fstatistical_inferences)  \n[Models Demystified](https:\u002F\u002Fm-clark.github.io\u002Fbook-of-models\u002F) - From Linear Regression to Deep Learning. [Github](https:\u002F\u002Fgithub.com\u002Fm-clark\u002Fbook-of-models).  \n[The Math Behind Artificial Intelligence](https:\u002F\u002Fwww.freecodecamp.org\u002Fnews\u002Fthe-math-behind-artificial-intelligence-book) - Engineering-focused book covering linear algebra, calculus, probability & statistics, and optimization theory with Python examples.  \n\n##### Datasets\n[Rdatasets](https:\u002F\u002Fvincentarelbundock.github.io\u002FRdatasets\u002Farticles\u002Fdata.html) - Collection of more than 2000 datasets, stored as csv files (R package).  \n[crimedatasets](https:\u002F\u002Flightbluetitan.github.io\u002Fcrimedatasets\u002F) - Datasets focused on crimes, criminal activities (R package).  \n[educationr](https:\u002F\u002Flightbluetitan.github.io\u002Feducationr\u002F) - Datasets related to education (performance, learning methods, test scores, absenteeism) (R package).  \n[MedDataSets](https:\u002F\u002Flightbluetitan.github.io\u002Fmeddatasets\u002Findex.html) - Datasets related to medicine, diseases, treatments, drugs, and public health (R package).  \n[oncodatasets](https:\u002F\u002Flightbluetitan.github.io\u002Foncodatasets\u002F) - Datasets focused on cancer research, survival rates, genetic studies, biomarkers, epidemiology (R package).  \n[timeseriesdatasets_R](https:\u002F\u002Flightbluetitan.github.io\u002Ftimeseriesdatasets_R\u002F) - Time series datasets (R package).  
\n[usdatasets](https:\u002F\u002Flightbluetitan.github.io\u002Fusdatasets\u002F) - US-exclusive datasets (crime, economics, education, finance, energy, healthcare) (R package).  \n[economic datasets](https:\u002F\u002Fcaptgouda24.github.io\u002Fnicholas-decker.github.io\u002Fdatasets.html) - Economic datasets.  \n\n##### p-values\n[The ASA Statement on p-Values: Context, Process, and Purpose](https:\u002F\u002Famstat.tandfonline.com\u002Fdoi\u002Ffull\u002F10.1080\u002F00031305.2016.1154108#.Vt2XIOaE2MN)  \n[Greenland - Statistical tests, P-values, confidence intervals, and power: a guide to misinterpretations](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC4877414\u002F)  \n[Rubin - Inconsistent multiple testing corrections: The fallacy of using family-based error rates to make inferences about individual hypotheses](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS2590260124000067?via%3Dihub)  \n[Gigerenzer - Mindless Statistics](https:\u002F\u002Flibrary.mpib-berlin.mpg.de\u002Fft\u002Fgg\u002FGG_Mindless_2004.pdf)  \n[Rubin - That's not a two-sided test! It's two one-sided tests! (TOST)](https:\u002F\u002Frss.onlinelibrary.wiley.com\u002Fdoi\u002Ffull\u002F10.1111\u002F1740-9713.01405)  \n[Lakens - How were we supposed to move beyond  p \u003C .05, and why didn’t we?](https:\u002F\u002Ferrorstatistics.com\u002F2024\u002F07\u002F01\u002Fguest-post-daniel-lakens-how-were-we-supposed-to-move-beyond-p-05-and-why-didnt-we-thoughts-on-abandon-statistical-significance-5-years-on\u002F)  \n[McShane et al. - Abandon Statistical Significance](https:\u002F\u002Fwww.tandfonline.com\u002Fdoi\u002Ffull\u002F10.1080\u002F00031305.2018.1527253)  \n[Ho et al. 
- Moving beyond P values data analysis with estimation graphics](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F333884529_Moving_beyond_P_values_data_analysis_with_estimation_graphics)  \n[Lakens - The probability of p-values as a function of the statistical power of a test](https:\u002F\u002Fdaniellakens.blogspot.com\u002F2014\u002F05\u002Fthe-probability-of-p-values-as-function.html) - p-value distribution is right-skewed and becomes even more skewed the higher the power of the test.  \n\n##### Correlation\n[Guess the Correlation](https:\u002F\u002Fwww.guessthecorrelation.com\u002F) - Correlation guessing game.  \n[phik](https:\u002F\u002Fgithub.com\u002Fkaveio\u002Fphik) - Correlation between categorical, ordinal and interval variables.  \n[hoeffd](https:\u002F\u002Fsearch.r-project.org\u002FCRAN\u002Frefmans\u002FHmisc\u002Fhtml\u002Fhoeffd.html) - Hoeffding's D Statistics, measure of dependence (R package).  \n\n##### Confidence Intervals\n[Morey - The fallacy of placing confidence in confidence intervals](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.3758\u002Fs13423-015-0947-8)  \n\n##### Packages\n[statsmodels](https:\u002F\u002Fwww.statsmodels.org\u002Fstable\u002Findex.html) - Statistical tests.  \n[linearmodels](https:\u002F\u002Fgithub.com\u002Fbashtage\u002Flinearmodels) - Instrumental variable and panel data models.  \n[nomograms](https:\u002F\u002Fhbiostat.org\u002Fbbr\u002Frmsintro.html#nomograms-overall-depiction-of-fitted-models) - Visualization for linear models, [explanation](https:\u002F\u002Fstats.stackexchange.com\u002Fa\u002F155433\u002F285504) (Part of rms R package)  \n[pingouin](https:\u002F\u002Fgithub.com\u002Fraphaelvallat\u002Fpingouin) - Statistical tests. 
[Pairwise correlation between columns of pandas DataFrame](https:\u002F\u002Fpingouin-stats.org\u002Fgenerated\u002Fpingouin.pairwise_corr.html)   \n[scipy.stats](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fstats.html#statistical-tests) - Statistical tests.  \n[scikit-posthocs](https:\u002F\u002Fgithub.com\u002Fmaximtrp\u002Fscikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons.   \nBland-Altman Plot [1](https:\u002F\u002Fpingouin-stats.org\u002Fgenerated\u002Fpingouin.plot_blandaltman.html), [2](http:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement.  \n[ANOVA](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.stats.f_oneway.html)  \n[StatCheck](https:\u002F\u002Fstatcheck.steveharoz.com\u002F) - Extract statistics from articles and recompute p-values (R package).  \n[tost](https:\u002F\u002Fpingouin-stats.org\u002Fbuild\u002Fhtml\u002Fgenerated\u002Fpingouin.tost.html) - Two One-Sided Test (TOST) for equivalence.  \n[DABEST-python](https:\u002F\u002Fgithub.com\u002FACCLAB\u002FDABEST-python) - Mean difference plots.    \n[Durga](https:\u002F\u002Fgithub.com\u002FKhanKawsar\u002FEstimationPlot) - Mean difference plots (R package).  \n\n##### Effect Size\n[MOTE Effect Size Calculator](https:\u002F\u002Fwww.aggieerin.com\u002Fshiny-server\u002F) - [Shiny App](https:\u002F\u002Fdoomlab.shinyapps.io\u002Fmote\u002F), [R package](https:\u002F\u002Fgithub.com\u002Fdoomlab\u002FMOTE)  \n[Estimating Effect Sizes From Pretest-Posttest-Control Group Designs](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002Fepdf\u002F10.1177\u002F1094428106291059) - Scott B. 
Morris, [Twitter](https:\u002F\u002Ftwitter.com\u002FMatthewBJane\u002Fstatus\u002F1742588609025200557)    \n\n##### Statistical Tests\n[test_proportions_2indep](https:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.stats.proportion.test_proportions_2indep.html) - Proportion test.  \n[G-Test](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FG-test) - Alternative to chi-square test, [power_divergence](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.stats.power_divergence.html).  \n\n##### Comparing Two Populations\n[torch-two-sample](https:\u002F\u002Fgithub.com\u002Fjosipd\u002Ftorch-two-sample) - Friedman-Rafsky Test: Compare two populations based on a multivariate generalization of the runs test. [Explanation](https:\u002F\u002Fwww.real-statistics.com\u002Fmultivariate-statistics\u002Fmultivariate-normal-distribution\u002Ffriedman-rafsky-test\u002F), [Application](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC5014134\u002F)  \n\n##### Power and Sample Size Calculations\n[pwrss](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fpwrss\u002Findex.html) - Statistical Power and Sample Size Calculation Tools (R package), [Tutorial with t-test](https:\u002F\u002Frpubs.com\u002Fmetinbulus\u002Fwelch)  \n\n##### Interim Analyses \u002F Sequential Analysis \u002F Stopping\n[Stop Early Stopping](https:\u002F\u002Fstop-early-stopping.osc.garden\u002F) - Nice visualization.  \n[Sequential Analysis](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FSequential_analysis) - Wikipedia.  \n[sequential](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002FSequential\u002FSequential.pdf) - Exact Sequential Analysis for Poisson and Binomial Data (R package).  \n[confseq](https:\u002F\u002Fgithub.com\u002Fgostevehoward\u002Fconfseq) - Uniform boundaries, confidence sequences, and always-valid p-values.  
\n\n##### Visualizations\n[Friends don't let friends make certain types of data visualization](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends)  \n[Great Overview of Visualizations](https:\u002F\u002Ftextvis.lnu.se\u002F)  \n[1 dataset, 100 visualizations](https:\u002F\u002F100.datavizproject.com\u002F)  \n[Dependent Probabilities](https:\u002F\u002Fstatic.laszlokorte.de\u002Fstochastic\u002F)  \n[Null Hypothesis Significance Testing (NHST) and Sample Size Calculation](https:\u002F\u002Frpsychologist.com\u002Fd3\u002FNHST\u002F)  \n[estimationstats](https:\u002F\u002Fwww.estimationstats.com\u002F) - Online Tool for visualizing mean differences, effect sizes (Cohen's d) and others.  \n[Sample Size \u002F Duration Calculator](https:\u002F\u002Fcalculator.osc.garden\u002F)  \n[Correlation](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fcorrelation\u002F)  \n[Cohen's d](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fcohend\u002F)  \n[Confidence Interval](https:\u002F\u002Frpsychologist.com\u002Fd3\u002FCI\u002F)  \n[Equivalence, non-inferiority and superiority testing](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fequivalence\u002F)  \n[Bayesian two-sample t test](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fbayes\u002F)  \n[Distribution of p-values when comparing two groups](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fpdist\u002F)  \n[Understanding the t-distribution and its normal approximation](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Ftdist\u002F)  \n[Statistical Power and Sample Size Calculation Tools](https:\u002F\u002Fpwrss.shinyapps.io\u002Findex\u002F)  \n\n##### Tidy Tuesday\n[The Art of Data Visualization with ggplot2, The TidyTuesday Cookbook](https:\u002F\u002Fnrennie.rbind.io\u002Fart-of-viz\u002F)  \n[Best Practices for Data Visualization](https:\u002F\u002Froyal-statistical-society.github.io\u002Fdatavisguide\u002F)  \n[tidytuesday](https:\u002F\u002Fgithub.com\u002Frfordatascience\u002Ftidytuesday) - Weekly 
challenge for visualization and lots of publicly available datasets for practice.  \n[z3tt\u002FTidyTuesday](https:\u002F\u002Fgithub.com\u002Fz3tt\u002FTidyTuesday) - Nice charts (R).  \n[nrennie\u002Ftidytuesday](https:\u002F\u002Fgithub.com\u002Fnrennie\u002Ftidytuesday) - Nice charts (R).  \n[poncest\u002Ftidytuesday](https:\u002F\u002Fgithub.com\u002Fponcest\u002Ftidytuesday) - Nice charts (R).  \n\n##### Talks\n[Inverse Propensity Weighting](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=SUq0shKLPPs)  \n[Dealing with Selection Bias By Propensity Based Feature Selection](https:\u002F\u002Fwww.youtube.com\u002Fwatch?reload=9&v=3ZWCKr0vDtc)  \n\n##### Texts\n[Modes, Medians and Means: A Unifying Perspective](https:\u002F\u002Fwww.johnmyleswhite.com\u002Fnotebook\u002F2013\u002F03\u002F22\u002Fmodes-medians-and-means-an-unifying-perspective\u002F)   \n[Using Norms to Understand Linear Regression](https:\u002F\u002Fwww.johnmyleswhite.com\u002Fnotebook\u002F2013\u002F03\u002F22\u002Fusing-norms-to-understand-linear-regression\u002F)   \n[Verifying the Assumptions of Linear Models](https:\u002F\u002Fgithub.com\u002Ferykml\u002Fmedium_articles\u002Fblob\u002Fmaster\u002FStatistics\u002Flinear_regression_assumptions.ipynb)  \n[Mediation and Moderation Intro](https:\u002F\u002Fademos.people.uic.edu\u002FChapter14.html)  \n[Montgomery et al. 
- How conditioning on post-treatment variables can ruin your experiment and what to do about it](https:\u002F\u002Fcpb-us-e1.wpmucdn.com\u002Fsites.dartmouth.edu\u002Fdist\u002F5\u002F2293\u002Ffiles\u002F2021\u002F03\u002Fpost-treatment-bias.pdf)  \n[Lindeløv - Common statistical tests are linear models](https:\u002F\u002Flindeloev.github.io\u002Ftests-as-linear\u002F)    \n[Chatruc - The Central Limit Theorem and its misuse](https:\u002F\u002Fweb.archive.org\u002Fweb\u002F20191229234155\u002Fhttps:\u002F\u002Flambdaclass.com\u002Fdata_etudes\u002Fcentral_limit_theorem_misuse\u002F)  \n[Al-Saleh - Properties of the Standard Deviation that are Rarely Mentioned in Classrooms](http:\u002F\u002Fwww.stat.tugraz.at\u002FAJS\u002Fausg093\u002F093Al-Saleh.pdf)   \n[Wainer - The Most Dangerous Equation](http:\u002F\u002Fnsmn1.uh.edu\u002Fdgraur\u002Fniv\u002Fthemostdangerousequation.pdf)  \n[Gigerenzer - The Bias Bias in Behavioral Economics](https:\u002F\u002Fwww.nowpublishers.com\u002Farticle\u002FDetails\u002FRBE-0092)  \n[Cook - Estimating the chances of something that hasn’t happened yet](https:\u002F\u002Fwww.johndcook.com\u002Fblog\u002F2010\u002F03\u002F30\u002Fstatistical-rule-of-three\u002F)  \n[Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F316652618_Same_Stats_Different_Graphs_Generating_Datasets_with_Varied_Appearance_and_Identical_Statistics_through_Simulated_Annealing), [Youtube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DbJyPELmhJc)  \n[How large is that number in the Law of Large Numbers?](https:\u002F\u002Fthepalindrome.org\u002Fp\u002Fhow-large-that-number-in-the-law)  \n[The Prosecutor's Fallacy](https:\u002F\u002Fwww.cebm.ox.ac.uk\u002Fnews\u002Fviews\u002Fthe-prosecutors-fallacy)  \n[The Dunning-Kruger Effect is 
Autocorrelation](https:\u002F\u002Feconomicsfromthetopdown.com\u002F2022\u002F04\u002F08\u002Fthe-dunning-kruger-effect-is-autocorrelation\u002F)  \n[Rafi, Greenland - Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise](https:\u002F\u002Fbmcmedresmethodol.biomedcentral.com\u002Farticles\u002F10.1186\u002Fs12874-020-01105-9)   \n[Carlin et al. - On the uses and abuses of regression models: a call for reform of statistical practice and teaching](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.06668)  \n[Chen, Roth - Logs with zeros? Some problems and solutions](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.06080)  \n[Wigboldus et al. - Encourage Playing with Data and Discourage Questionable Reporting Practices](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.1007\u002Fs11336-015-9445-1)  \n[Simmons et al. - False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002F10.1177\u002F0956797611417632?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed)  \n[Zhang - An illusion of predictability in scientific results: Even experts confuse inferential uncertainty and outcome variability](https:\u002F\u002Fwww.pnas.org\u002Fdoi\u002F10.1073\u002Fpnas.2302491120) - Figure 1 shows difference between inferential uncertainty and outcome variability.  \n\n#### Evaluation\n[Collins et al. - Evaluation of clinical prediction models (part 1): from development to external validation](https:\u002F\u002Fwww.bmj.com\u002Fcontent\u002F384\u002Fbmj-2023-074819.full) - [Twitter](https:\u002F\u002Ftwitter.com\u002FGSCollins\u002Fstatus\u002F1744309712995098624)    \n\n#### Epidemiology\n[Lesko et al. 
- A Framework for Descriptive Epidemiology](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC10144679\u002F)  \n[R Epidemics Consortium](https:\u002F\u002Fwww.repidemicsconsortium.org\u002Fprojects\u002F) - Large tool suite for working with epidemiological data (R packages). [Github](https:\u002F\u002Fgithub.com\u002Freconhub)   \n[incidence2](https:\u002F\u002Fgithub.com\u002Freconhub\u002Fincidence2) - Computation, handling, visualisation and simple modelling of incidence (R package).  \n[EpiEstim](https:\u002F\u002Fgithub.com\u002Fmrc-ide\u002FEpiEstim) - Estimate time varying instantaneous reproduction number R during epidemics (R package) [paper](https:\u002F\u002Facademic.oup.com\u002Faje\u002Farticle\u002F178\u002F9\u002F1505\u002F89262).  \n[researchpy](https:\u002F\u002Fgithub.com\u002Fresearchpy\u002Fresearchpy) - Helpful `summary_cont()` function for summary statistics (Table 1).  \n[zEpid](https:\u002F\u002Fgithub.com\u002Fpzivich\u002FzEpid) - Epidemiology analysis package, [Tutorial](https:\u002F\u002Fgithub.com\u002Fpzivich\u002FPython-for-Epidemiologists).  \n[tipr](https:\u002F\u002Fgithub.com\u002FLucyMcGowan\u002Ftipr) - Sensitivity analyses for unmeasured confounders (R package).  \n[quartets](https:\u002F\u002Fgithub.com\u002Fr-causal\u002Fquartets) - Anscombe’s Quartet, Causal Quartet, [Datasaurus Dozen](https:\u002F\u002Fgithub.com\u002Fjumpingrivers\u002FdatasauRus) and others (R package).    \n[episensr](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fepisensr\u002Fvignettes\u002Fepisensr.html) - Quantitative Bias Analysis for Epidemiologic Data (=simulation of possible effects of different sources of bias) (R package).  
\n\n#### Machine Learning Tutorials\n[Statistical Inference and Regression](https:\u002F\u002Fmattblackwell.github.io\u002Fgov2002-book\u002F)  \n[Applied Machine Learning in Python](https:\u002F\u002Fgeostatsguy.github.io\u002FMachineLearningDemos_Book\u002Fintro.html)  \n[Convolutional Neural Networks for Visual Recognition](https:\u002F\u002Fcs231n.github.io\u002F) - Stanford CS class.  \n[Intuition for the Algorithms in Machine Learning](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=7o9TMQAHgkQ&list=PLNeXFnYrCJneoY_rKtWJy833YiMrCRi5f&index=1) - Lecture Series.  \n\n#### Exploration and Cleaning\n[Checklist](https:\u002F\u002Fgithub.com\u002Fr0f1\u002Fml_checklist).  \n[pyjanitor](https:\u002F\u002Fgithub.com\u002Fpyjanitor-devs\u002Fpyjanitor) - Clean messy column names.  \n[skimpy](https:\u002F\u002Fgithub.com\u002Faeturrell\u002Fskimpy) - Create summary statistics of dataframes. Helpful `clean_columns()` function.  \n[pandera](https:\u002F\u002Fgithub.com\u002Funionai-oss\u002Fpandera) - Data \u002F Schema validation.  \n[dataframely](https:\u002F\u002Fgithub.com\u002FQuantco\u002Fdataframely) - Data \u002F Schema validation.  \n[pointblank](https:\u002F\u002Fgithub.com\u002Fposit-dev\u002Fpointblank) - Data \u002F Schema validation.  \n[impyute](https:\u002F\u002Fgithub.com\u002Feltonlaw\u002Fimpyute) - Imputations.  \n[fancyimpute](https:\u002F\u002Fgithub.com\u002Fiskandr\u002Ffancyimpute) - Matrix completion and imputation algorithms.  \n[imbalanced-learn](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fimbalanced-learn) - Resampling for imbalanced datasets.  \n[tspreprocess](https:\u002F\u002Fgithub.com\u002FMaxBenChrist\u002Ftspreprocess) - Time series preprocessing: Denoising, Compression, Resampling.  
\n[Kaggler](https:\u002F\u002Fgithub.com\u002Fjeongyoonlee\u002FKaggler) - Utility functions (`OneHotEncoder(min_obs=100)`)  \n[skrub](https:\u002F\u002Fgithub.com\u002Fskrub-data\u002Fskrub) - Bridge the gap between tabular data sources and machine-learning models.  \n\n#### Noisy Labels\n[cleanlab](https:\u002F\u002Fgithub.com\u002Fcleanlab\u002Fcleanlab) - Machine learning with noisy labels, finding mislabelled data, and uncertainty quantification. Also see awesome list below.  \n[doubtlab](https:\u002F\u002Fgithub.com\u002Fkoaning\u002Fdoubtlab) - Find bad or noisy labels.\n\n#### Train \u002F Test Split\n[iterative-stratification](https:\u002F\u002Fgithub.com\u002Ftrent-b\u002Fiterative-stratification) - Stratification of multilabel data.  \n\n#### Feature Engineering\n[Vincent Warmerdam: Untitled12.ipynb](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=yXGCKqo5cEY) - Using df.pipe()  \n[Vincent Warmerdam: Winning with Simple, even Linear, Models](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=68ABAU_V8qI)  \n[sklearn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.pipeline.Pipeline.html) - Pipeline, [examples](https:\u002F\u002Fgithub.com\u002Fjem1031\u002Fpandas-pipelines-custom-transformers).  \n[pdpipe](https:\u002F\u002Fgithub.com\u002Fshaypal5\u002Fpdpipe) - Pipelines for DataFrames.  \n[scikit-lego](https:\u002F\u002Fgithub.com\u002Fkoaning\u002Fscikit-lego) - Custom transformers for pipelines.  \n[categorical-encoding](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fcategorical-encoding) - Categorical encoding of variables, [vtreat (R package)](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fvtreat\u002Fvignettes\u002Fvtreat.html).  \n[patsy](https:\u002F\u002Fgithub.com\u002Fpydata\u002Fpatsy\u002F) - R-like syntax for statistical models.  \n[mlxtend](https:\u002F\u002Frasbt.github.io\u002Fmlxtend\u002Fuser_guide\u002Ffeature_extraction\u002FLinearDiscriminantAnalysis\u002F) - LDA.  
\n[featuretools](https:\u002F\u002Fgithub.com\u002FFeaturetools\u002Ffeaturetools) - Automated feature engineering, [example](https:\u002F\u002Fgithub.com\u002FWillKoehrsen\u002Fautomated-feature-engineering\u002Fblob\u002Fmaster\u002Fwalk_through\u002FAutomated_Feature_Engineering.ipynb).  \n[tsfresh](https:\u002F\u002Fgithub.com\u002Fblue-yonder\u002Ftsfresh) - Time series feature engineering.  \n[temporian](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Ftemporian) - Time series feature engineering by Google.  \n[pypeln](https:\u002F\u002Fgithub.com\u002Fcgarciae\u002Fpypeln) - Concurrent data pipelines.  \n[feature-engine](https:\u002F\u002Fgithub.com\u002Ffeature-engine\u002Ffeature_engine) - Encoders, transformers, etc.  \n\n#### Feature Selection\n[Overview Paper](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS016794731930194X), [Talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=JsArBz46_3s), [Repo](https:\u002F\u002Fgithub.com\u002FYimeng-Zhang\u002Ffeature-engineering-and-feature-selection)    \nBlog post series - [1](http:\u002F\u002Fblog.datadive.net\u002Fselecting-good-features-part-i-univariate-selection\u002F), [2](http:\u002F\u002Fblog.datadive.net\u002Fselecting-good-features-part-ii-linear-models-and-regularization\u002F), [3](http:\u002F\u002Fblog.datadive.net\u002Fselecting-good-features-part-iii-random-forests\u002F), [4](http:\u002F\u002Fblog.datadive.net\u002Fselecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side\u002F)  \nTutorials - [1](https:\u002F\u002Fwww.kaggle.com\u002Fresidentmario\u002Fautomated-feature-selection-with-sklearn), [2](https:\u002F\u002Fmachinelearningmastery.com\u002Ffeature-selection-machine-learning-python\u002F)  \n[sklearn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fclasses.html#module-sklearn.feature_selection) - Feature selection.  
\n[eli5](https:\u002F\u002Feli5.readthedocs.io\u002Fen\u002Flatest\u002Fblackbox\u002Fpermutation_importance.html#feature-selection) - Feature selection using permutation importance.  \n[scikit-feature](https:\u002F\u002Fgithub.com\u002Fjundongl\u002Fscikit-feature) - Feature selection algorithms.  \n[stability-selection](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fstability-selection) - Stability selection.  \n[scikit-rebate](https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Fscikit-rebate) - Relief-based feature selection algorithms.  \n[scikit-genetic](https:\u002F\u002Fgithub.com\u002Fmanuel-calzolari\u002Fsklearn-genetic) - Genetic feature selection.  \n[boruta_py](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fboruta_py) - Feature selection, [explanation](https:\u002F\u002Fstats.stackexchange.com\u002Fquestions\u002F264360\u002Fboruta-all-relevant-feature-selection-vs-random-forest-variables-of-importanc\u002F264467), [example](https:\u002F\u002Fwww.kaggle.com\u002Ftilii7\u002Fboruta-feature-elimination).  \n[Boruta-Shap](https:\u002F\u002Fgithub.com\u002FEkeany\u002FBoruta-Shap) - Boruta feature selection algorithm + Shapley values.  \n[linselect](https:\u002F\u002Fgithub.com\u002Fefavdb\u002Flinselect) - Feature selection package.  \n[mlxtend](https:\u002F\u002Frasbt.github.io\u002Fmlxtend\u002Fuser_guide\u002Ffeature_selection\u002FExhaustiveFeatureSelector\u002F) - Exhaustive feature selection.  \n[BoostARoota](https:\u002F\u002Fgithub.com\u002Fchasedehan\u002FBoostARoota) - Xgboost feature selection algorithm.  \n[INVASE](https:\u002F\u002Fgithub.com\u002Fjsyoon0823\u002FINVASE) - Instance-wise Variable Selection using Neural Networks.  \n[SubTab](https:\u002F\u002Fgithub.com\u002FAstraZeneca\u002FSubTab) - Subsetting Features of Tabular Data for Self-Supervised Representation Learning, AstraZeneca.  
\n[mrmr](https:\u002F\u002Fgithub.com\u002Fsmazzanti\u002Fmrmr) - Maximum Relevance and Minimum Redundancy Feature Selection, [Website](http:\u002F\u002Fhome.penglab.com\u002Fproj\u002FmRMR\u002F).  \n[arfs](https:\u002F\u002Fgithub.com\u002FThomasBury\u002Farfs) - All Relevant Feature Selection.  \n[VSURF](https:\u002F\u002Fgithub.com\u002Frobingenuer\u002FVSURF) - Variable Selection Using Random Forests (R package) [doc](https:\u002F\u002Fwww.rdocumentation.org\u002Fpackages\u002FVSURF\u002Fversions\u002F1.1.0\u002Ftopics\u002FVSURF).  \n[FeatureSelectionGA](https:\u002F\u002Fgithub.com\u002Fkaushalshetty\u002FFeatureSelectionGA) - Feature Selection using Genetic Algorithm.  \n\n#### Subset Selection\n[apricot](https:\u002F\u002Fgithub.com\u002Fjmschrei\u002Fapricot) - Selecting subsets of data sets to train machine learning models quickly.  \n[ducks](https:\u002F\u002Fgithub.com\u002Fmanimino\u002Fducks) - Index data for fast lookup by any combination of fields.  \n\n#### Dimensionality Reduction \u002F Representation Learning\n\n##### Selection\nCheck also the Clustering section and self-supervised learning section for ideas!  
\n[Review](https:\u002F\u002Fmembers.loria.fr\u002Fmoberger\u002FEnseignement\u002FAVR\u002FExposes\u002FTR_Dimensiereductie.pdf)  \n  \nPCA - [link](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.decomposition.PCA.html)    \nAutoencoder - [link](https:\u002F\u002Fblog.keras.io\u002Fbuilding-autoencoders-in-keras.html)  \nIsomaps - [link](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.manifold.Isomap.html#sklearn.manifold.Isomap)    \nLLE - [link](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.manifold.LocallyLinearEmbedding.html)  \nForce-directed graph drawing - [link](https:\u002F\u002Fscanpy.readthedocs.io\u002Fen\u002Fstable\u002Fapi\u002Fscanpy.tl.draw_graph.html#scanpy.tl.draw_graph)    \nMDS - [link](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.manifold.MDS.html)  \nDiffusion Maps - [link](https:\u002F\u002Fscanpy.readthedocs.io\u002Fen\u002Fstable\u002Fapi\u002Fscanpy.tl.diffmap.html)  \nt-SNE - [link](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.manifold.TSNE.html#sklearn.manifold.TSNE)    \nNeRV - [link](https:\u002F\u002Fgithub.com\u002Fziyuang\u002Fpynerv), [paper](https:\u002F\u002Fwww.jmlr.org\u002Fpapers\u002Fvolume11\u002Fvenna10a\u002Fvenna10a.pdf)  \nMDR - [link](https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Fscikit-mdr)  \nUMAP - [link](https:\u002F\u002Fgithub.com\u002Flmcinnes\u002Fumap)  \nRandom Projection - [link](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Frandom_projection.html)  \nIvis - [link](https:\u002F\u002Fgithub.com\u002Fberingresearch\u002Fivis)   \nSimCLR - [link](https:\u002F\u002Fgithub.com\u002Flightly-ai\u002Flightly)  \npymde - Minimum-distortion embedding with PyTorch, [link](https:\u002F\u002Fgithub.com\u002Fcvxgrp\u002Fpymde)\n\n##### Neural-network 
based\n[esvit](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fesvit) - Vision Transformers for Representation Learning (Microsoft).  \n[MCML](https:\u002F\u002Fgithub.com\u002Fpachterlab\u002FMCML) - Semi-supervised dimensionality reduction of Multi-Class, Multi-Label data (sequencing data) [paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2021.08.25.457696v1).  \n\n##### Packages\n[Dangers of PCA (paper)](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-022-14395-4).  \n[Phantom oscillations in PCA](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.06.20.545619v1.full).  \n[What to use instead of PCA](https:\u002F\u002Fwww.pnas.org\u002Fdoi\u002F10.1073\u002Fpnas.2319169120).  \n[Talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=9iol3Lk6kyU), [tsne intro](https:\u002F\u002Fdistill.pub\u002F2016\u002Fmisread-tsne\u002F). \n[sklearn.manifold](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fclasses.html#module-sklearn.manifold) and [sklearn.decomposition](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fclasses.html#module-sklearn.decomposition) - PCA, t-SNE, MDS, Isomaps and others.  \nAdditional plots for PCA - Factor Loadings, Cumulative Variance Explained, [Correlation Circle Plot](http:\u002F\u002Frasbt.github.io\u002Fmlxtend\u002Fuser_guide\u002Fplotting\u002Fplot_pca_correlation_graph\u002F), [Tweet](https:\u002F\u002Ftwitter.com\u002Frasbt\u002Fstatus\u002F1555999903398219777\u002Fphoto\u002F1)  \n[sklearn.random_projection](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Frandom_projection.html) - Johnson-Lindenstrauss lemma, Gaussian random projection, Sparse random projection.  \n[sklearn.cross_decomposition](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fcross_decomposition.html#cross-decomposition) - Partial least squares, supervised estimators for dimensionality reduction and regression.  
\n[prince](https:\u002F\u002Fgithub.com\u002FMaxHalford\u002Fprince) - Dimensionality reduction, factor analysis (PCA, MCA, CA, FAMD).  \nFaster t-SNE implementations: [tsne-cuda](https:\u002F\u002Fgithub.com\u002FCannyLab\u002Ftsne-cuda), [MulticoreTSNE](https:\u002F\u002Fgithub.com\u002FDmitryUlyanov\u002FMulticore-TSNE), [lvdmaaten](https:\u002F\u002Flvdmaaten.github.io\u002Ftsne\u002F)  \n[umap](https:\u002F\u002Fgithub.com\u002Flmcinnes\u002Fumap) - Uniform Manifold Approximation and Projection, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=nq6iPZVUxZU), [explorer](https:\u002F\u002Fgithub.com\u002FGrantCuster\u002Fumap-explorer), [explanation](https:\u002F\u002Fpair-code.github.io\u002Funderstanding-umap\u002F), [parallel version](https:\u002F\u002Fdocs.rapids.ai\u002Fapi\u002Fcuml\u002Fstable\u002Fapi.html).  \n[humap](https:\u002F\u002Fgithub.com\u002Fwilsonjr\u002Fhumap) - Hierarchical UMAP.  \n[sleepwalk](https:\u002F\u002Fgithub.com\u002Fanders-biostat\u002Fsleepwalk\u002F) - Explore embeddings, interactive visualization (R package).  \n[somoclu](https:\u002F\u002Fgithub.com\u002Fpeterwittek\u002Fsomoclu) - Self-organizing map.  \n[scikit-tda](https:\u002F\u002Fgithub.com\u002Fscikit-tda\u002Fscikit-tda) - Topological Data Analysis, [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fsrep01236), [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=F2t_ytTLrQ4), [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=AWoeBzJd7uQ), [paper](https:\u002F\u002Fwww.uncg.edu\u002Fmat\u002Ffaculty\u002Fcdsmyth\u002Ftopological-approaches-skin.pdf).  \n[giotto-tda](https:\u002F\u002Fgithub.com\u002Fgiotto-ai\u002Fgiotto-tda) - Topological Data Analysis.  \n[ivis](https:\u002F\u002Fgithub.com\u002Fberingresearch\u002Fivis) - Dimensionality reduction using Siamese Networks.  \n[trimap](https:\u002F\u002Fgithub.com\u002Feamid\u002Ftrimap) - Dimensionality reduction using triplets.  
\n[scanpy](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fscanpy) - [Force-directed graph drawing](https:\u002F\u002Fscanpy.readthedocs.io\u002Fen\u002Fstable\u002Fapi\u002Fscanpy.tl.draw_graph.html#scanpy.tl.draw_graph), [Diffusion Maps](https:\u002F\u002Fscanpy.readthedocs.io\u002Fen\u002Fstable\u002Fapi\u002Fscanpy.tl.diffmap.html).  \n[direpack](https:\u002F\u002Fgithub.com\u002FSvenSerneels\u002Fdirepack) - Projection pursuit, Sufficient dimension reduction, Robust M-estimators.  \n[DBS](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002FDatabionicSwarm\u002Fvignettes\u002FDatabionicSwarm.html) - DatabionicSwarm (R package).  \n[contrastive](https:\u002F\u002Fgithub.com\u002Fabidlabs\u002Fcontrastive) - Contrastive PCA.  \n[scPCA](https:\u002F\u002Fgithub.com\u002FPhilBoileau\u002FscPCA) - Sparse contrastive PCA (R package).  \n[generalized_contrastive_PCA](https:\u002F\u002Fgithub.com\u002FSjulsonLab\u002Fgeneralized_contrastive_PCA) - Generalized contrastive PCA.  \n[tmap](https:\u002F\u002Fgithub.com\u002Freymond-group\u002Ftmap) - Visualization library for large, high-dimensional data sets.  \n[lollipop](https:\u002F\u002Fgithub.com\u002Fneurodata\u002Flollipop) - Linear Optimal Low Rank Projection.  \n[linearsdr](https:\u002F\u002Fgithub.com\u002FHarrisQ\u002Flinearsdr) - Linear Sufficient Dimension Reduction (R package).  \n[PHATE](https:\u002F\u002Fgithub.com\u002FKrishnaswamyLab\u002FPHATE) - Tool for visualizing high dimensional data.  \n[datamapplot](https:\u002F\u002Fgithub.com\u002FTutteInstitute\u002Fdatamapplot) - Tool for visualizing high dimensional data.  
\n\n#### Visualization\n[All charts](https:\u002F\u002Fdatavizproject.com\u002F)  \n[physt](https:\u002F\u002Fgithub.com\u002Fjanpipek\u002Fphyst) - Better histograms, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ZG-wH3-Up9Y), [notebook](https:\u002F\u002Fnbviewer.jupyter.org\u002Fgithub\u002Fjanpipek\u002Fpydata2018-berlin\u002Fblob\u002Fmaster\u002Fnotebooks\u002Ftalk.ipynb).  \n[fast-histogram](https:\u002F\u002Fgithub.com\u002Fastrofrog\u002Ffast-histogram) - Fast histograms.  \n[matplotlib_venn](https:\u002F\u002Fgithub.com\u002Fkonstantint\u002Fmatplotlib-venn) - Venn diagrams.  \n[penrose](https:\u002F\u002Fgithub.com\u002Fpenrose\u002Fpenrose) - Venn diagrams.  \n[ridgeplot](https:\u002F\u002Fgithub.com\u002Ftpvasconcelos\u002Fridgeplot) - Ridge plots.  \n[mosaic plots](https:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.graphics.mosaicplot.mosaic.html) - Categorical variable visualization, [example](https:\u002F\u002Fsukhbinder.wordpress.com\u002F2018\u002F09\u002F18\u002Fmosaic-plot-in-python\u002F).  \n[yellowbrick](https:\u002F\u002Fgithub.com\u002FDistrictDataLabs\u002Fyellowbrick) - Visualizations for ML models (similar to scikit-plot).  \n[bokeh](https:\u002F\u002Fgithub.com\u002Fbokeh\u002Fbokeh) - Interactive visualization library, [Examples](https:\u002F\u002Fbokeh.pydata.org\u002Fen\u002Flatest\u002Fdocs\u002Fuser_guide\u002Fserver.html), [Examples](https:\u002F\u002Fgithub.com\u002FWillKoehrsen\u002FBokeh-Python-Visualization).  \n[lets-plot](https:\u002F\u002Fgithub.com\u002FJetBrains\u002Flets-plot) - Plotting library.  \n[plotnine](https:\u002F\u002Fgithub.com\u002Fhas2k1\u002Fplotnine) - ggplot for Python.  \n[altair](https:\u002F\u002Fgithub.com\u002Fvega\u002Faltair) - Declarative statistical visualization library.  \n[hvplot](https:\u002F\u002Fgithub.com\u002Fpyviz\u002Fhvplot) - High-level plotting library built on top of [holoviews](http:\u002F\u002Fholoviews.org\u002F).  
\n[dtreeviz](https:\u002F\u002Fgithub.com\u002Fparrt\u002Fdtreeviz) - Decision tree visualization and model interpretation.  \n[mpl-scatter-density](https:\u002F\u002Fgithub.com\u002Fastrofrog\u002Fmpl-scatter-density) - Scatter density plots. Alternative to 2d-histograms.  \n[ComplexHeatmap](https:\u002F\u002Fgithub.com\u002Fjokergoo\u002FComplexHeatmap) - Complex heatmaps for multidimensional genomic data (R package).  \n[morpheus](https:\u002F\u002Fsoftware.broadinstitute.org\u002Fmorpheus\u002F) - Matrix visualization and analysis software from the Broad Institute. [Source](https:\u002F\u002Fgithub.com\u002Fcmap\u002Fmorpheus.js), Tutorial: [1](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0nkYDeekhtQ), [2](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=r9mN6MsxUb0), [Code](https:\u002F\u002Fgithub.com\u002Fbroadinstitute\u002FBBBC021_Morpheus_Exercise).  \n[jupyter-scatter](https:\u002F\u002Fgithub.com\u002Fflekschas\u002Fjupyter-scatter) - Interactive 2D scatter plot widget for Jupyter.  \n[fastplotlib](https:\u002F\u002Fgithub.com\u002Ffastplotlib\u002Ffastplotlib) - Fast plotting library using pygfx.  \n[datamapplot](https:\u002F\u002Fgithub.com\u002FTutteInstitute\u002Fdatamapplot) - Interactive 2D scatter plot.  \n[SandDance](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FSandDance) - Interactive visualization tool from Microsoft.  \n\n#### Colors\n[palettable](https:\u002F\u002Fgithub.com\u002Fjiffyclub\u002Fpalettable) - Color palettes from [colorbrewer2](https:\u002F\u002Fcolorbrewer2.org\u002F#type=sequential&scheme=BuGn&n=3).  \n[colorcet](https:\u002F\u002Fgithub.com\u002Fholoviz\u002Fcolorcet) - Collection of perceptually uniform colormaps.  \n[Named Colors Wheel](https:\u002F\u002Farantius.github.io\u002Fweb-color-wheel\u002F) - Color wheel for all named HTML colors.  \n\n#### Dashboards\n[py-shiny](https:\u002F\u002Fgithub.com\u002Frstudio\u002Fpy-shiny) - Shiny for Python, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ijRBbtT2tgc).  
\n[superset](https:\u002F\u002Fgithub.com\u002Fapache\u002Fsuperset) - Dashboarding solution by Apache.  \n[streamlit](https:\u002F\u002Fgithub.com\u002Fstreamlit\u002Fstreamlit) - Dashboarding solution. [Resources](https:\u002F\u002Fgithub.com\u002Fmarcskovmadsen\u002Fawesome-streamlit), [Gallery](http:\u002F\u002Fawesome-streamlit.org\u002F) [Components](https:\u002F\u002Fwww.streamlit.io\u002Fcomponents), [bokeh-events](https:\u002F\u002Fgithub.com\u002Fash2shukla\u002Fstreamlit-bokeh-events).  \n[mercury](https:\u002F\u002Fgithub.com\u002Fmljar\u002Fmercury) - Convert Python notebook to web app, [Example](https:\u002F\u002Fgithub.com\u002Fpplonski\u002Fdashboard-python-jupyter-notebook).  \n[dash](https:\u002F\u002Fdash.plot.ly\u002Fgallery) - Dashboarding solution by plot.ly. [Resources](https:\u002F\u002Fgithub.com\u002Fucg8j\u002Fawesome-dash).  \n[visdom](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fvisdom) - Dashboarding library by Facebook.  \n[panel](https:\u002F\u002Fpanel.pyviz.org\u002Findex.html) - Dashboarding solution.  \n[altair example](https:\u002F\u002Fgithub.com\u002Fxhochy\u002Faltair-vue-vega-example) - [Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=4L568emKOvs).  \n[voila](https:\u002F\u002Fgithub.com\u002FQuantStack\u002Fvoila) - Turn Jupyter notebooks into standalone web applications.  \n[voila-gridstack](https:\u002F\u002Fgithub.com\u002Fvoila-dashboards\u002Fvoila-gridstack) - Voila grid layout.  \n\n#### UI\n[gradio](https:\u002F\u002Fgithub.com\u002Fgradio-app\u002Fgradio) - Create UIs for your machine learning model.  \n\n#### Survey Tools\n[samplics](https:\u002F\u002Fgithub.com\u002Fsamplics-org\u002Fsamplics) - Sampling techniques for complex survey designs.  \n\n#### Geographical Tools\n[folium](https:\u002F\u002Fgithub.com\u002Fpython-visualization\u002Ffolium) - Plot geographical maps using the Leaflet.js library, [jupyter plugin](https:\u002F\u002Fgithub.com\u002Fjupyter-widgets\u002Fipyleaflet).  
\n[gmaps](https:\u002F\u002Fgithub.com\u002Fpbugnion\u002Fgmaps) - Google Maps for Jupyter notebooks.  \n[stadiamaps](https:\u002F\u002Fstadiamaps.com\u002F) - Plot geographical maps.  \n[datashader](https:\u002F\u002Fgithub.com\u002Fbokeh\u002Fdatashader) - Draw millions of points on a map.  \n[sklearn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.neighbors.BallTree.html) - BallTree.  \n[pynndescent](https:\u002F\u002Fgithub.com\u002Flmcinnes\u002Fpynndescent) - Nearest neighbor descent for approximate nearest neighbors.  \n[geocoder](https:\u002F\u002Fgithub.com\u002FDenisCarriere\u002Fgeocoder) - Geocoding of addresses, IP addresses.  \nConversion of different geo formats: [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=eHRggqAvczE), [repo](https:\u002F\u002Fgithub.com\u002Fdillongardner\u002FPyDataSpatialAnalysis)  \n[geopandas](https:\u002F\u002Fgithub.com\u002Fgeopandas\u002Fgeopandas) - Tools for geographic data.  \nLow Level Geospatial Tools (GEOS, GDAL\u002FOGR, PROJ.4)  \nVector Data (Shapely, Fiona, Pyproj)  \nRaster Data (Rasterio)  \nPlotting (Descartes, Cartopy)  \n[Predict economic indicators from Open Street Map](https:\u002F\u002Fjanakiev.com\u002Fblog\u002Fosm-predict-economic-indicators\u002F).  \n[PySal](https:\u002F\u002Fgithub.com\u002Fpysal\u002Fpysal) - Python Spatial Analysis Library.  \n[geograpy](https:\u002F\u002Fgithub.com\u002Fushahidi\u002Fgeograpy) - Extract countries, regions and cities from a URL or text.  \n[cartogram](https:\u002F\u002Fgo-cart.io\u002Fcartogram) - Distorted maps based on population.  
\n\n#### Recommender Systems\nExamples: [1](https:\u002F\u002Flazyprogrammer.me\u002Ftutorial-on-collaborative-filtering-and-matrix-factorization-in-python\u002F), [2](https:\u002F\u002Fmedium.com\u002F@james_aka_yale\u002Fthe-4-recommendation-engines-that-can-predict-your-movie-tastes-bbec857b8223), [2-ipynb](https:\u002F\u002Fgithub.com\u002Fkhanhnamle1994\u002Fmovielens\u002Fblob\u002Fmaster\u002FContent_Based_and_Collaborative_Filtering_Models.ipynb), [3](https:\u002F\u002Fwww.kaggle.com\u002Fmorrisb\u002Fhow-to-recommend-anything-deep-recommender).  \n[surprise](https:\u002F\u002Fgithub.com\u002FNicolasHug\u002FSurprise) - Recommender, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=d7iIb_XVkZs).  \n[implicit](https:\u002F\u002Fgithub.com\u002Fbenfred\u002Fimplicit) - Fast Collaborative Filtering for Implicit Feedback Datasets.  \n[spotlight](https:\u002F\u002Fgithub.com\u002Fmaciejkula\u002Fspotlight) - Deep recommender models using PyTorch.  \n[lightfm](https:\u002F\u002Fgithub.com\u002Flyst\u002Flightfm) - Recommendation algorithms for both implicit and explicit feedback.  \n[funk-svd](https:\u002F\u002Fgithub.com\u002Fgbolmier\u002Ffunk-svd) - Fast SVD.  \n\n#### Decision Tree Models\n[Intro to Decision Trees and Random Forests](https:\u002F\u002Fvictorzhou.com\u002Fblog\u002Fintro-to-random-forests\u002F), [Another good visualization](https:\u002F\u002Fmlu-explain.github.io\u002Fdecision-tree\u002F), Intro to Gradient Boosting [1](https:\u002F\u002Fexplained.ai\u002Fgradient-boosting\u002F), [2](https:\u002F\u002Fwww.gormanalysis.com\u002Fblog\u002Fgradient-boosting-explained\u002F), [Decision Tree Visualization](https:\u002F\u002Fexplained.ai\u002Fdecision-tree-viz\u002Findex.html)    \n[lightgbm](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FLightGBM) - Gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, [doc](https:\u002F\u002Fsites.google.com\u002Fview\u002Flauraepp\u002Fparameters).  
\n[xgboost](https:\u002F\u002Fgithub.com\u002Fdmlc\u002Fxgboost) - Gradient boosting (GBDT, GBRT or GBM) library, [doc](https:\u002F\u002Fsites.google.com\u002Fview\u002Flauraepp\u002Fparameters), Methods for CIs: [link1](https:\u002F\u002Fstats.stackexchange.com\u002Fquestions\u002F255783\u002Fconfidence-interval-for-xgb-forecast), [link2](https:\u002F\u002Ftowardsdatascience.com\u002Fregression-prediction-intervals-with-xgboost-428e0a018b).  \n[catboost](https:\u002F\u002Fgithub.com\u002Fcatboost\u002Fcatboost) - Gradient boosting.  \n[h2o](https:\u002F\u002Fgithub.com\u002Fh2oai\u002Fh2o-3) - Gradient boosting and general machine learning framework.  \n[pycaret](https:\u002F\u002Fgithub.com\u002Fpycaret\u002Fpycaret) - Wrapper for xgboost, lightgbm, catboost etc.  \n[forestci](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fforest-confidence-interval) - Confidence intervals for random forests.  \n[grf](https:\u002F\u002Fgithub.com\u002Fgrf-labs\u002Fgrf) - Generalized random forest.  \n[dtreeviz](https:\u002F\u002Fgithub.com\u002Fparrt\u002Fdtreeviz) - Decision tree visualization and model interpretation.  \n[Nuance](https:\u002F\u002Fgithub.com\u002FSauceCat\u002FNuance) - Decision tree visualization.  \n[rfpimp](https:\u002F\u002Fgithub.com\u002Fparrt\u002Frandom-forest-importances) - Feature importance for random forests using permutation importance.  \nWhy the default feature importance for random forests is wrong: [link](http:\u002F\u002Fexplained.ai\u002Frf-importance\u002Findex.html)  \n[bartpy](https:\u002F\u002Fgithub.com\u002FJakeColtman\u002Fbartpy) - Bayesian Additive Regression Trees.  \n[merf](https:\u002F\u002Fgithub.com\u002Fmanifoldai\u002Fmerf) - Mixed Effects Random Forest for Clustering, [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=gWj4ZwB7f3o)  \n[groot](https:\u002F\u002Fgithub.com\u002Ftudelft-cda-lab\u002FGROOT) - Robust decision trees.  
\n[linear-tree](https:\u002F\u002Fgithub.com\u002Fcerlymarco\u002Flinear-tree) - Trees with linear models at the leaves.  \n[supertree](https:\u002F\u002Fgithub.com\u002Fmljar\u002Fsupertree) - Decision tree visualization.  \n\n#### Natural Language Processing (NLP) \u002F Text Processing\n[talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=6zm9NC9uRkk)-[nb](https:\u002F\u002Fnbviewer.jupyter.org\u002Fgithub\u002Fskipgram\u002Fmodern-nlp-in-python\u002Fblob\u002Fmaster\u002Fexecutable\u002FModern_NLP_in_Python.ipynb), [nb2](https:\u002F\u002Fahmedbesbes.com\u002Fhow-to-mine-newsfeed-data-and-extract-interactive-insights-in-python.html), [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?time_continue=2&v=sI7VpFNiy_I).  \n[Text classification Intro](https:\u002F\u002Fmlwhiz.com\u002Fblog\u002F2018\u002F12\u002F17\u002Ftext_classification\u002F), [Preprocessing blog post](https:\u002F\u002Fmlwhiz.com\u002Fblog\u002F2019\u002F01\u002F17\u002Fdeeplearning_nlp_preprocess\u002F).  \n[gensim](https:\u002F\u002Fradimrehurek.com\u002Fgensim\u002F) - NLP, doc2vec, word2vec, text processing, topic modelling (LSA, LDA), [Example](https:\u002F\u002Fmarkroxor.github.io\u002Fgensim\u002Fstatic\u002Fnotebooks\u002Fgensim_news_classification.html), [Coherence Model](https:\u002F\u002Fradimrehurek.com\u002Fgensim\u002Fmodels\u002Fcoherencemodel.html) for evaluation.  \nEmbeddings - [GloVe](https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002F) ([[1](https:\u002F\u002Fwww.kaggle.com\u002Fjhoward\u002Fimproved-lstm-baseline-glove-dropout)], [[2](https:\u002F\u002Fwww.kaggle.com\u002Fsbongo\u002Fdo-pretrained-embeddings-give-you-the-extra-edge)]), [StarSpace](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FStarSpace), [wikipedia2vec](https:\u002F\u002Fwikipedia2vec.github.io\u002Fwikipedia2vec\u002Fpretrained\u002F), [visualization](https:\u002F\u002Fprojector.tensorflow.org\u002F).  
\n[magnitude](https:\u002F\u002Fgithub.com\u002Fplasticityai\u002Fmagnitude) - Vector embedding utility package.  \n[pyldavis](https:\u002F\u002Fgithub.com\u002Fbmabey\u002FpyLDAvis) - Visualization for topic modelling.  \n[spaCy](https:\u002F\u002Fspacy.io\u002F) - NLP.  \n[NLTK](https:\u002F\u002Fwww.nltk.org\u002F) - NLP, helpful `KMeansClusterer` with `cosine_distance`.  \n[pytext](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FPyText) - NLP from Facebook.  \n[fastText](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FfastText) - Efficient text classification and representation learning.  \n[annoy](https:\u002F\u002Fgithub.com\u002Fspotify\u002Fannoy) - Approximate nearest neighbor search.  \n[faiss](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffaiss) - Approximate nearest neighbor search.  \n[infomap](https:\u002F\u002Fgithub.com\u002Fmapequation\u002Finfomap) - Cluster (word-)vectors to find topics.  \n[datasketch](https:\u002F\u002Fgithub.com\u002Fekzhu\u002Fdatasketch) - Probabilistic data structures for large data (MinHash, HyperLogLog).  \n[flair](https:\u002F\u002Fgithub.com\u002Fzalandoresearch\u002Fflair) - NLP Framework by Zalando.  \n[stanza](https:\u002F\u002Fgithub.com\u002Fstanfordnlp\u002Fstanza) - NLP Library.  \n[Chatistics](https:\u002F\u002Fgithub.com\u002FMasterScrat\u002FChatistics) - Turn Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames.  \n[textdistance](https:\u002F\u002Fgithub.com\u002Flife4\u002Ftextdistance) - Collection for comparing distances between two or more sequences.  \n\n#### Bio Image Analysis\n[Lee et al. 
- A beginner's guide to rigor and reproducibility in fluorescence imaging experiments](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC6080651\u002F)  \n[Awesome Cytodata](https:\u002F\u002Fgithub.com\u002Fcytodata\u002Fawesome-cytodata)  \n\n##### Tutorials\n[MIT 7.016 Introductory Biology, Fall 2018](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLUl4u3cNGP63LmSVIVzy584-ZbjbJ-Y63) - Videos 27, 28, and 29 talk about staining and imaging.  \n[Bio-image Analysis Notebooks](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002Fintro.html) - Large collection of image processing workflows, including [point-spread-function estimation](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002F18a_deconvolution\u002Fextract_psf.html) and [deconvolution](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002F18a_deconvolution\u002Fintroduction_deconvolution.html), [3D cell segmentation](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002F20_image_segmentation\u002FSegmentation_3D.html), [feature extraction](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002F22_feature_extraction\u002Fstatistics_with_pyclesperanto.html) using [pyclesperanto](https:\u002F\u002Fgithub.com\u002FclEsperanto\u002Fpyclesperanto_prototype) and others.  \n[python_for_microscopists](https:\u002F\u002Fgithub.com\u002Fbnsreenu\u002Fpython_for_microscopists) - Notebooks and associated [youtube channel](https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUC34rW-HtPJulxr5wp2Xa04w\u002Fvideos) for a variety of image processing tasks.  \n\n##### Datasets\n[jump-cellpainting](https:\u002F\u002Fgithub.com\u002Fjump-cellpainting\u002Fdatasets) - Cellpainting dataset.  \n[MedMNIST](https:\u002F\u002Fgithub.com\u002FMedMNIST\u002FMedMNIST) - Datasets for 2D and 3D Biomedical Image Classification.  
\n[CytoImageNet](https:\u002F\u002Fgithub.com\u002Fstan-hua\u002FCytoImageNet) - Huge diverse dataset like ImageNet but for cell images.  \n[Haghighi](https:\u002F\u002Fgithub.com\u002Fcarpenterlab\u002F2021_Haghighi_NatureMethods) - Gene Expression and Morphology Profiles.  \n[broadinstitute\u002Flincs-profiling-complementarity](https:\u002F\u002Fgithub.com\u002Fbroadinstitute\u002Flincs-profiling-complementarity) - Cellpainting vs. L1000 assay.  \n\n#### Biostatistics \u002F Robust statistics\n[MinCovDet](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.covariance.MinCovDet.html) - Robust estimator of covariance, RMPV, [Paper](https:\u002F\u002Fwires.onlinelibrary.wiley.com\u002Fdoi\u002Ffull\u002F10.1002\u002Fwics.1421), [App1](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002F10.1177\u002F1087057112469257?url_ver=Z39.88-2003&rfr_id=ori%3Arid%3Acrossref.org&rfr_dat=cr_pub++0pubmed&), [App2](https:\u002F\u002Fwww.cell.com\u002Fcell-reports\u002Fpdf\u002FS2211-1247(21)00694-X.pdf).  \n[moderated z-score](https:\u002F\u002Fclue.io\u002Fconnectopedia\u002Freplicate_collapse) - Weighted average of z-scores based on Spearman correlation.  \n[winsorize](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.stats.mstats.winsorize.html#scipy.stats.mstats.winsorize) - Simple adjustment of outliers.  \n\n#### High-Content Screening Assay Design\n[Zhang XHD (2008) - Novel analytic criteria and effective plate designs for quality control in genome-wide RNAi screens](https:\u002F\u002Fslas-discovery.org\u002Farticle\u002FS2472-5552(22)08204-1\u002Fpdf)  \n[Iversen - A Comparison of Assay Performance Measures in Screening Assays, Signal Window, Z′ Factor, and Assay Variability Ratio](https:\u002F\u002Fwww.slas-discovery.org\u002Farticle\u002FS2472-5552(22)08460-X\u002Fpdf)  \n[Z-factor](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FZ-factor) - Measure of statistical effect size.  
\n[Z'-factor](https:\u002F\u002Flink.springer.com\u002Freferenceworkentry\u002F10.1007\u002F978-3-540-47648-1_6298) - Measure of statistical effect size.  \n[CV](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FCoefficient_of_variation) - Coefficient of variation.  \n[SSMD](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FStrictly_standardized_mean_difference) - Strictly standardized mean difference.  \n[Signal Window](https:\u002F\u002Fwww.intechopen.com\u002Fchapters\u002F48130) - Assay quality measurement.  \n\n#### Microscopy + Assay\n[BD Spectrum Viewer](https:\u002F\u002Fwww.bdbiosciences.com\u002Fen-us\u002Fresources\u002Fbd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes.  \n[SpectraViewer](https:\u002F\u002Fwww.perkinelmer.com\u002Flab-products-and-services\u002Fspectraviewer) - Visualize the spectral compatibility of fluorophores (PerkinElmer).  \n[Thermofisher Spectrum Viewer](https:\u002F\u002Fwww.thermofisher.com\u002Forder\u002Fstain-it) - Thermofisher Spectrum Viewer.  \n[Microscopy Resolution Calculator](https:\u002F\u002Fwww.microscope.healthcare.nikon.com\u002Fmicrotools\u002Fresolution-calculator) - Calculate resolution of images (Nikon).  \n[PlateEditor](https:\u002F\u002Fgithub.com\u002Fvindelorme\u002FPlateEditor) - Drug Layout for plates, [app](https:\u002F\u002Fplateeditor.sourceforge.io\u002F), [zip](https:\u002F\u002Fsourceforge.net\u002Fprojects\u002Fplateeditor\u002F), [paper](https:\u002F\u002Fjournals.plos.org\u002Fplosone\u002Farticle?id=10.1371\u002Fjournal.pone.0252488).  \n\n##### Image Formats and Converters\nOME-Zarr - [paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.02.17.528834v1.full), [standard](https:\u002F\u002Fngff.openmicroscopy.org\u002Flatest\u002F)  \n[bioformats2raw](https:\u002F\u002Fgithub.com\u002Fglencoesoftware\u002Fbioformats2raw) - Various formats to zarr.  
\n[raw2ometiff](https:\u002F\u002Fgithub.com\u002Fglencoesoftware\u002Fraw2ometiff) - Zarr to tiff.  \n[BatchConvert](https:\u002F\u002Fgithub.com\u002FEuro-BioImaging\u002FBatchConvert) - Wrapper for bioformats2raw to parallelize conversions with nextflow, [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DeCWV274l0c).  \nREMBI model - Recommended Metadata for Biological Images, BioImage Archive: [Study Component Guidance](https:\u002F\u002Fwww.ebi.ac.uk\u002Fbioimage-archive\u002Frembi-help-examples\u002F), [File List Guide](https:\u002F\u002Fwww.ebi.ac.uk\u002Fbioimage-archive\u002Fhelp-file-list\u002F), [paper](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC8606015\u002F), [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=GVmfOpuP2_c), [spreadsheet](https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1Ck1NeLp-ZN4eMGdNYo2nV6KLEdSfN6oQBKnnWU6Npeo\u002Fedit#gid=1023506919)  \n\n##### Matrix Formats\n[anndata](https:\u002F\u002Fgithub.com\u002Fscverse\u002Fanndata) - annotated data matrices in memory and on disk, [Docs](https:\u002F\u002Fanndata.readthedocs.io\u002Fen\u002Flatest\u002Findex.html).  \n[muon](https:\u002F\u002Fgithub.com\u002Fscverse\u002Fmuon) - Multimodal omics framework.  \n[mudata](https:\u002F\u002Fgithub.com\u002Fscverse\u002Fmudata) - Multimodal Data (.h5mu) implementation.  \n[bdz](https:\u002F\u002Fgithub.com\u002Fopenssbd\u002Fbdz) - Zarr-based format for storing quantitative biological dynamics data.  \n\n#### Image Viewers\n[napari](https:\u002F\u002Fgithub.com\u002Fnapari\u002Fnapari) - Image viewer and image processing tool.    \n[Fiji](https:\u002F\u002Ffiji.sc\u002F) - General purpose tool. Image viewer and image processing tool.  \n[vizarr](https:\u002F\u002Fgithub.com\u002Fhms-dbmi\u002Fvizarr) - Browser-based image viewer for zarr format.  \n[avivator](https:\u002F\u002Fgithub.com\u002Fhms-dbmi\u002Fviv) - Browser-based image viewer for tiff files.  
\n[OMERO](https:\u002F\u002Fwww.openmicroscopy.org\u002Fomero\u002F) - Image viewer for high-content screening. [IDR](https:\u002F\u002Fidr.openmicroscopy.org\u002F) uses OMERO. [Intro](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=nSCrMO_c-5s)   \n[fiftyone](https:\u002F\u002Fgithub.com\u002Fvoxel51\u002Ffiftyone) - Viewer and tool for building high-quality datasets and computer vision models.  \nImage Data Explorer - Microscopy Image Viewer, [Shiny App](https:\u002F\u002Fshiny-portal.embl.de\u002Fshinyapps\u002Fapp\u002F01_image-data-explorer), [Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=H8zIZvOt1MA).  \n[ImSwitch](https:\u002F\u002Fgithub.com\u002FImSwitch\u002FImSwitch) - Microscopy Image Viewer, [Doc](https:\u002F\u002Fimswitch.readthedocs.io\u002Fen\u002Fstable\u002Fgui.html), [Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=XsbnMkGSPQQ).  \n[piximi](https:\u002F\u002Fgithub.com\u002Fpiximi\u002Fpiximi) - Web-based image annotation and classification tool, [App](https:\u002F\u002Fwww.piximi.app\u002F).  \n[DeepCell Label](https:\u002F\u002Flabel.deepcell.org\u002F) - Data labeling tool to segment images, [Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=zfsvUBkEeow).  \n[lightly-studio](https:\u002F\u002Fgithub.com\u002Flightly-ai\u002Flightly-studio) - Image annotation.  \n\n#### Napari Plugins\n[napari-sam](https:\u002F\u002Fgithub.com\u002FMIC-DKFZ\u002Fnapari-sam) - Segment Anything Plugin.  \n[napari-chatgpt](https:\u002F\u002Fgithub.com\u002Froyerlab\u002Fnapari-chatgpt) - ChatGPT Plugin.  \n\n##### Image Restoration and Denoising\n[aydin](https:\u002F\u002Fgithub.com\u002Froyerlab\u002Faydin) - Image denoising.  \n[DivNoising](https:\u002F\u002Fgithub.com\u002Fjuglab\u002FDivNoising) - Unsupervised denoising method.  \n[CSBDeep](https:\u002F\u002Fgithub.com\u002FCSBDeep\u002FCSBDeep) - Content-aware image restoration, [Project page](https:\u002F\u002Fcsbdeep.bioimagecomputing.com\u002Ftools\u002F).  
\n[gibbs-diffusion](https:\u002F\u002Fgithub.com\u002Frubenohana\u002Fgibbs-diffusion) - Image denoising.  \n\n##### Illumination correction\n[skimage](https:\u002F\u002Fscikit-image.org\u002Fdocs\u002Fdev\u002Fapi\u002Fskimage.exposure.html#skimage.exposure.equalize_adapthist) - Illumination correction (CLAHE).  \n[cidre](https:\u002F\u002Fgithub.com\u002Fsmithk\u002Fcidre) - Illumination correction method for optical microscopy.  \n[BaSiCPy](https:\u002F\u002Fgithub.com\u002Fpeng-lab\u002FBaSiCPy) - Background and Shading Correction of Optical Microscopy Images, [BaSiC](https:\u002F\u002Fgithub.com\u002Fmarrlab\u002FBaSiC).  \n\n##### Bleedthrough correction \u002F Spectral Unmixing\n[PICASSO](https:\u002F\u002Fgithub.com\u002Fnygctech\u002FPICASSO) - Blind unmixing without reference spectra measurement, [Paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2021.01.27.428247v1.full)  \n[cytoflow](https:\u002F\u002Fgithub.com\u002Fcytoflow\u002Fcytoflow) - Flow cytometry. Includes Bleedthrough correction methods.  \nLinear unmixing in Fiji for Bleedthrough Correction - [Youtube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=W90qs0J29v8).  \nBleedthrough Correction using Lumos and Fiji - [Link](https:\u002F\u002Fimagej.net\u002Fplugins\u002Flumos-spectral-unmixing).  \nAutoUnmix - [Link](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.05.30.542836v1.full).  \n\n##### Platforms and Pipelines\n[CellProfiler](https:\u002F\u002Fgithub.com\u002FCellProfiler\u002FCellProfiler), [CellProfilerAnalyst](https:\u002F\u002Fgithub.com\u002FCellProfiler\u002FCellProfiler-Analyst) - Create image analysis pipelines.  \n[fractal](https:\u002F\u002Ffractal-analytics-platform.github.io\u002F) - Framework to process high-content imaging data from UZH, [Github](https:\u002F\u002Fgithub.com\u002Ffractal-analytics-platform).  \n[atomai](https:\u002F\u002Fgithub.com\u002Fpycroscopy\u002Fatomai) - Deep and Machine Learning for Microscopy.  
\n[py-clesperanto](https:\u002F\u002Fgithub.com\u002Fclesperanto\u002Fpyclesperanto_prototype\u002F) - Tools for 3D microscopy analysis, [deskewing](https:\u002F\u002Fgithub.com\u002FclEsperanto\u002Fpyclesperanto_prototype\u002Fblob\u002Fmaster\u002Fdemo\u002Ftransforms\u002Fdeskew.ipynb) and lots of other tutorials, interacts with napari.  \n[qupath](https:\u002F\u002Fgithub.com\u002Fqupath\u002Fqupath) - Image analysis.  \n\n##### Microscopy Pipelines\nFor the Labsyspharm stack, see the Labsyspharm section below.  \n[BiaPy](https:\u002F\u002Fgithub.com\u002Fdanifranco\u002FBiaPy) - Bioimage analysis pipelines, [paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2024.02.03.576026v2.full).  \n[SCIP](https:\u002F\u002Fscalable-cytometry-image-processing.readthedocs.io\u002Fen\u002Flatest\u002Fusage.html) - Image processing pipeline on top of Dask.  \n[DeepCell Kiosk](https:\u002F\u002Fgithub.com\u002Fvanvalenlab\u002Fkiosk-console\u002Ftree\u002Fmaster) - Image analysis platform.  \n[IMCWorkflow](https:\u002F\u002Fgithub.com\u002FBodenmillerGroup\u002FIMCWorkflow\u002F) - Image analysis pipeline using [steinbock](https:\u002F\u002Fgithub.com\u002FBodenmillerGroup\u002Fsteinbock), [Twitter](https:\u002F\u002Ftwitter.com\u002FNilsEling\u002Fstatus\u002F1715020265963258087), [Paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41596-023-00881-0), [workflow](https:\u002F\u002Fbodenmillergroup.github.io\u002FIMCDataAnalysis\u002F).  \n\n##### Labsyspharm\n[mcmicro](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fmcmicro) - Multiple-choice microscopy pipeline, [Website](https:\u002F\u002Fmcmicro.org\u002Foverview\u002F), [Paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41592-021-01308-y).  \n[MCQuant](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fquantification) - Quantification of cell features.  
\n[cylinter](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fcylinter) - Quality assurance for microscopy images, [Website](https:\u002F\u002Flabsyspharm.github.io\u002Fcylinter\u002F).  \n[ashlar](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fashlar) - Whole-slide microscopy image stitching and registration.  \n[scimap](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fscimap) - Spatial Single-Cell Analysis Toolkit.  \n\n##### Cell Segmentation\n[microscopy-tree](https:\u002F\u002Fbiomag-lab.github.io\u002Fmicroscopy-tree\u002F) - Review of cell segmentation algorithms, [Paper](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fabs\u002Fpii\u002FS0962892421002518).  \nReview of organoid pipelines - [Paper](https:\u002F\u002Farxiv.org\u002Fftp\u002Farxiv\u002Fpapers\u002F2301\u002F2301.02341.pdf).  \n[BioImage.IO](https:\u002F\u002Fbioimage.io\u002F#\u002F) - BioImage Model Zoo.  \n[MEDIAR](https:\u002F\u002Fgithub.com\u002FLee-Gihun\u002FMEDIAR) - Cell segmentation.  \n[cellpose](https:\u002F\u002Fgithub.com\u002Fmouseland\u002Fcellpose) - Cell segmentation. [Paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2020.02.02.931238v1), [Dataset](https:\u002F\u002Fwww.cellpose.org\u002Fdataset).  \n[stardist](https:\u002F\u002Fgithub.com\u002Fstardist\u002Fstardist) - Cell segmentation with Star-convex Shapes.  \n[instanseg](https:\u002F\u002Fgithub.com\u002Finstanseg\u002Finstanseg) - Cell segmentation.  \n[UnMicst](https:\u002F\u002Fgithub.com\u002FHMS-IDAC\u002FUnMicst) - Identifying Cells and Segmenting Tissue.  \n[ilastik](https:\u002F\u002Fgithub.com\u002Filastik\u002Filastik) - Segment, classify, track and count cells. [ImageJ Plugin](https:\u002F\u002Fgithub.com\u002Filastik\u002Filastik4ij).   \n[nnUnet](https:\u002F\u002Fgithub.com\u002FMIC-DKFZ\u002FnnUNet) - 3D biomedical image segmentation.  
\n[allencell](https:\u002F\u002Fwww.allencell.org\u002Fsegmenter.html) - Tools for 3D segmentation, classical and deep learning methods.  \n[Cell-ACDC](https:\u002F\u002Fgithub.com\u002FSchmollerLab\u002FCell_ACDC) - Python GUI for cell segmentation and tracking.  \n[ZeroCostDL4Mic](https:\u002F\u002Fgithub.com\u002FHenriquesLab\u002FZeroCostDL4Mic\u002Fwiki) - Deep-Learning in Microscopy.  \n[DL4MicEverywhere](https:\u002F\u002Fgithub.com\u002FHenriquesLab\u002FDL4MicEverywhere) - Bringing the ZeroCostDL4Mic experience using Docker.  \n[EmbedSeg](https:\u002F\u002Fgithub.com\u002Fjuglab\u002FEmbedSeg) - Embedding-based Instance Segmentation.  \n[segment-anything](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsegment-anything) - Segment Anything (SAM) from Facebook.  \n[micro-sam](https:\u002F\u002Fgithub.com\u002Fcomputational-cell-analytics\u002Fmicro-sam) - Segment Anything for Microscopy.  \n[Segment-Everything-Everywhere-All-At-Once](https:\u002F\u002Fgithub.com\u002FUX-Decoder\u002FSegment-Everything-Everywhere-All-At-Once) - Segment Everything Everywhere All at Once from Microsoft.  \n[deepcell-tf](https:\u002F\u002Fgithub.com\u002Fvanvalenlab\u002Fdeepcell-tf\u002Ftree\u002Fmaster) - Cell segmentation, [DeepCell](https:\u002F\u002Fdeepcell.org\u002F).  \n[labkit](https:\u002F\u002Fgithub.com\u002Fjuglab\u002Flabkit-ui) - Fiji plugin for image segmentation.  \n[MedImageInsight](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.06542) - Embedding Model for General Domain Medical Imaging.  \n[CHIEF](https:\u002F\u002Fgithub.com\u002Fhms-dbmi\u002FCHIEF) - Clinical Histopathology Imaging Evaluation Foundation Model.  \n\n##### Cell Segmentation Datasets\n[cellpose](https:\u002F\u002Fwww.cellpose.org\u002Fdataset) - Cell images.  \n[omnipose](http:\u002F\u002Fwww.cellpose.org\u002Fdataset_omnipose) - Cell images.  \n[LIVECell](https:\u002F\u002Fgithub.com\u002Fsartorius-research\u002FLIVECell) - Cell images.  
\n[Sartorius](https:\u002F\u002Fwww.kaggle.com\u002Fcompetitions\u002Fsartorius-cell-instance-segmentation\u002Foverview) - Neurons.  \n[EmbedSeg](https:\u002F\u002Fgithub.com\u002Fjuglab\u002FEmbedSeg\u002Freleases\u002Ftag\u002Fv0.1.0) - 2D + 3D images.  \n[connectomics](https:\u002F\u002Fsites.google.com\u002Fview\u002Fconnectomics\u002F) - Annotation of the EPFL Hippocampus dataset.  \n[ZeroCostDL4Mic](https:\u002F\u002Fwww.ebi.ac.uk\u002Fbiostudies\u002FBioImages\u002Fstudies\u002FS-BIAD895) - Stardist example training and test dataset.  \n\n##### Evaluation\n[seg-eval](https:\u002F\u002Fgithub.com\u002Flstrgar\u002Fseg-eval) - Cell segmentation performance evaluation without Ground Truth labels, [Paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.02.23.529809v1.full.pdf).  \n\n##### Feature Engineering Images\n[Computer vision challenges in drug discovery - Maciej Hermanowicz](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Y5GJmnIhvFk)  \n[CellProfiler](https:\u002F\u002Fgithub.com\u002FCellProfiler\u002FCellProfiler) - Biological image analysis.   \n[scikit-image](https:\u002F\u002Fgithub.com\u002Fscikit-image\u002Fscikit-image) - Image processing.  \n[scikit-image regionprops](https:\u002F\u002Fscikit-image.org\u002Fdocs\u002Fdev\u002Fapi\u002Fskimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent.  \n[mahotas](https:\u002F\u002Fgithub.com\u002Fluispedro\u002Fmahotas) - Zernike, Haralick, LBP, and TAS features, [example](https:\u002F\u002Fgithub.com\u002Fluispedro\u002Fpython-image-tutorial\u002Fblob\u002Fmaster\u002FSegmenting%20cell%20images%20(fluorescent%20microscopy).ipynb).   \n[pyradiomics](https:\u002F\u002Fgithub.com\u002FAIM-Harvard\u002Fpyradiomics) - Radiomics features from medical imaging.  \n[pyefd](https:\u002F\u002Fgithub.com\u002Fhbldh\u002Fpyefd) - Elliptical feature descriptor, approximating a contour with a Fourier series.  
\n[pyvips](https:\u002F\u002Fgithub.com\u002Flibvips\u002Fpyvips\u002Ftree\u002Fmaster) - Faster image processing operations.  \n\n#### Domain Adaptation \u002F Batch-Effect Correction \n[Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https:\u002F\u002Fgenomebiology.biomedcentral.com\u002Farticles\u002F10.1186\u002Fs13059-019-1850-9), [Code](https:\u002F\u002Fgithub.com\u002FJinmiaoChenLab\u002FBatch-effect-removal-benchmarking).  \n[R Tutorial on correcting batch effects](https:\u002F\u002Fbroadinstitute.github.io\u002F2019_scWorkshop\u002Fcorrecting-batch-effects.html).  \n[harmonypy](https:\u002F\u002Fgithub.com\u002Fslowkow\u002Fharmonypy) - Fuzzy k-means and locally linear adjustments.  \n[pyliger](https:\u002F\u002Fgithub.com\u002Fwelch-lab\u002Fpyliger) - Batch-effect correction, [R package](https:\u002F\u002Fgithub.com\u002Fwelch-lab\u002Fliger).  \n[nimfa](https:\u002F\u002Fgithub.com\u002Fmims-harvard\u002Fnimfa) - Nonnegative matrix factorization.  \n[scgen](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fscgen) - Batch removal. [Doc](https:\u002F\u002Fscgen.readthedocs.io\u002Fen\u002Fstable\u002F).  \n[CORAL](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Ftree\u002F30e54523f08d963ced3fbb37c00e9225579d2e1d\u002Fcorrect_batch_effects_wdn) - Correcting for Batch Effects Using Wasserstein Distance, [Code](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Fblob\u002F30e54523f08d963ced3fbb37c00e9225579d2e1d\u002Fcorrect_batch_effects_wdn\u002Ftransform.py#L152), [Paper](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC7050548\u002F).   \n[adapt](https:\u002F\u002Fgithub.com\u002Fadapt-python\u002Fadapt) - Awesome Domain Adaptation Python Toolbox.  \n[pytorch-adapt](https:\u002F\u002Fgithub.com\u002FKevinMusgrave\u002Fpytorch-adapt) - Various neural network models for domain adaptation.  
\n\n##### Sequencing\n[Single cell tutorial](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fsingle-cell-tutorial).  \n[PyDESeq2](https:\u002F\u002Fgithub.com\u002Fowkin\u002FPyDESeq2) - Analyzing RNA-seq data.  \n[cellxgene](https:\u002F\u002Fgithub.com\u002Fchanzuckerberg\u002Fcellxgene) - Interactive explorer for single-cell transcriptomics data.  \n[scanpy](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fscanpy) - Analyze single-cell gene expression data, [tutorial](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fsingle-cell-tutorial).  \n[besca](https:\u002F\u002Fgithub.com\u002Fbedapub\u002Fbesca) - Beyond single-cell analysis.  \n[janggu](https:\u002F\u002Fgithub.com\u002FBIMSBbioinfo\u002Fjanggu) - Deep Learning for Genomics.  \n[gdsctools](https:\u002F\u002Fgithub.com\u002FCancerRxGene\u002Fgdsctools) - Drug responses in the context of the Genomics of Drug Sensitivity in Cancer project, ANOVA, IC50, MoBEM, [doc](https:\u002F\u002Fgdsctools.readthedocs.io\u002Fen\u002Fmaster\u002F).  \n[monkeybread](https:\u002F\u002Fgithub.com\u002Fimmunitastx\u002Fmonkeybread) - Analysis of single-cell spatial transcriptomics data.  \n\n##### Drug discovery\n[TDC](https:\u002F\u002Fgithub.com\u002Fmims-harvard\u002FTDC\u002Ftree\u002Fmain) - Drug Discovery and Development.  \n[DeepPurpose](https:\u002F\u002Fgithub.com\u002Fkexinhuang12345\u002FDeepPurpose) - Deep Learning Based Molecular Modelling and Prediction Toolkit.  \n\n#### Neural Networks\n[mit6874](https:\u002F\u002Fmit6874.github.io\u002F) - Computational Systems Biology: Deep Learning in the Life Sciences.  \n[ConvNet Shape Calculator](https:\u002F\u002Fmadebyollin.github.io\u002Fconvnet-calculator\u002F) - Calculate output dimensions of Conv2D layer.  \n[Great Gradient Descent Article](https:\u002F\u002Ftowardsdatascience.com\u002F10-gradient-descent-optimisation-algorithms-86989510b5e9).  
\n[Intro to semi-supervised learning](https:\u002F\u002Flilianweng.github.io\u002Flil-log\u002F2021\u002F12\u002F05\u002Fsemi-supervised-learning.html).  \n\n##### Tutorials & Viewer\n[Google Tuning Playbook](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Ftuning_playbook) - A playbook for systematically maximizing the performance of deep learning models by Google.  \n[fast.ai course](https:\u002F\u002Fcourse.fast.ai\u002F) - Practical Deep Learning for Coders.  \n[Tensorflow without a PhD](https:\u002F\u002Fgithub.com\u002FGoogleCloudPlatform\u002Ftensorflow-without-a-phd) - Neural Network course by Google.  \nFeature Visualization: [Blog](https:\u002F\u002Fdistill.pub\u002F2017\u002Ffeature-visualization\u002F), [PPT](http:\u002F\u002Fcs231n.stanford.edu\u002Fslides\u002F2017\u002Fcs231n_2017_lecture12.pdf)  \n[Tensorflow Playground](https:\u002F\u002Fplayground.tensorflow.org\u002F)  \n[Visualization of optimization algorithms](http:\u002F\u002Fvis.ensmallen.org\u002F), [Another visualization](https:\u002F\u002Fgithub.com\u002Fjettify\u002Fpytorch-optimizer)    \n[cutouts-explorer](https:\u002F\u002Fgithub.com\u002Fmgckind\u002Fcutouts-explorer) - Image Viewer.  \n\n##### Image Related\n[imgaug](https:\u002F\u002Fgithub.com\u002Faleju\u002Fimgaug) - More sophisticated image preprocessing.  \n[Augmentor](https:\u002F\u002Fgithub.com\u002Fmdbloice\u002FAugmentor) - Image augmentation library.  \n[keras preprocessing](https:\u002F\u002Fkeras.io\u002Fpreprocessing\u002Fimage\u002F) - Preprocess images.  \n[albumentations](https:\u002F\u002Fgithub.com\u002Falbu\u002Falbumentations) - Wrapper around imgaug and other libraries.  \n[augmix](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Faugmix) - Image augmentation from Google.  \n[kornia](https:\u002F\u002Fgithub.com\u002Fkornia\u002Fkornia) - Image augmentation, feature extraction and loss functions.  
\n[augly](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FAugLy) - Image, audio, text, video augmentation from Facebook.  \n[pyvips](https:\u002F\u002Fgithub.com\u002Flibvips\u002Fpyvips\u002Ftree\u002Fmaster) - Faster image processing operations.  \n\n##### Lossfunction Related\n[SegLoss](https:\u002F\u002Fgithub.com\u002FJunMa11\u002FSegLoss) - List of loss functions for medical image segmentation.  \n\n##### Activation Functions\n[rational_activations](https:\u002F\u002Fgithub.com\u002Fml-research\u002Frational_activations) - Rational activation functions.  \n\n##### Text Related\n[ktext](https:\u002F\u002Fgithub.com\u002Fhamelsmu\u002Fktext) - Utilities for pre-processing text for deep learning in Keras.   \n[textgenrnn](https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Ftextgenrnn) - Ready-to-use LSTM for text generation.  \n[ctrl](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002Fctrl) - Text generation.  \n\n##### Neural network and deep learning frameworks\n[OpenMMLab](https:\u002F\u002Fgithub.com\u002Fopen-mmlab) - Framework for segmentation, classification and lots of other computer vision tasks.  \n[caffe](https:\u002F\u002Fgithub.com\u002FBVLC\u002Fcaffe) - Deep learning framework, [pretrained models](https:\u002F\u002Fgithub.com\u002FBVLC\u002Fcaffe\u002Fwiki\u002FModel-Zoo).  \n[mxnet](https:\u002F\u002Fgithub.com\u002Fapache\u002Fincubator-mxnet) - Deep learning framework, [book](https:\u002F\u002Fd2l.ai\u002Findex.html).  \n\n##### Libs General\n[keras](https:\u002F\u002Fkeras.io\u002F) - Neural Networks on top of [tensorflow](https:\u002F\u002Fwww.tensorflow.org\u002F), [examples](https:\u002F\u002Fgist.github.com\u002Fcandlewill\u002F552fa102352ccce42fd829ae26277d24).  \n[keras-contrib](https:\u002F\u002Fgithub.com\u002Fkeras-team\u002Fkeras-contrib) - Keras community contributions.  \n[keras-tuner](https:\u002F\u002Fgithub.com\u002Fkeras-team\u002Fkeras-tuner) - Hyperparameter tuning for Keras.  
\n[hyperas](https:\u002F\u002Fgithub.com\u002Fmaxpumperla\u002Fhyperas) - Keras + Hyperopt: Convenient hyperparameter optimization wrapper.  \n[elephas](https:\u002F\u002Fgithub.com\u002Fmaxpumperla\u002Felephas) - Distributed Deep learning with Keras & Spark.  \n[tflearn](https:\u002F\u002Fgithub.com\u002Ftflearn\u002Ftflearn) - Neural Networks on top of TensorFlow.  \n[tensorlayer](https:\u002F\u002Fgithub.com\u002Ftensorlayer\u002Ftensorlayer) - Neural Networks on top of TensorFlow, [tricks](https:\u002F\u002Fgithub.com\u002Fwagamamaz\u002Ftensorlayer-tricks).  \n[tensorforce](https:\u002F\u002Fgithub.com\u002Freinforceio\u002Ftensorforce) - TensorFlow for applied reinforcement learning.  \n[autokeras](https:\u002F\u002Fgithub.com\u002Fjhfjhfj1\u002Fautokeras) - AutoML for deep learning.  \n[PlotNeuralNet](https:\u002F\u002Fgithub.com\u002FHarisIqbal88\u002FPlotNeuralNet) - Plot neural networks.  \n[lucid](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Flucid) - Neural network interpretability, [Activation Maps](https:\u002F\u002Fopenai.com\u002Fblog\u002Fintroducing-activation-atlases\u002F).  \n[tcav](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftcav) - Interpretability method.  \n[AdaBound](https:\u002F\u002Fgithub.com\u002FLuolc\u002FAdaBound) - Optimizer that trains as fast as Adam and as good as SGD, [alt](https:\u002F\u002Fgithub.com\u002Ftitu1994\u002Fkeras-adabound).  \n[foolbox](https:\u002F\u002Fgithub.com\u002Fbethgelab\u002Ffoolbox) - Adversarial examples that fool neural networks.  \n[hiddenlayer](https:\u002F\u002Fgithub.com\u002Fwaleedka\u002Fhiddenlayer) - Training metrics.  \n[imgclsmob](https:\u002F\u002Fgithub.com\u002Fosmr\u002Fimgclsmob) - Pretrained models.  \n[netron](https:\u002F\u002Fgithub.com\u002Flutzroeder\u002Fnetron) - Visualizer for deep learning and machine learning models.  \n[ffcv](https:\u002F\u002Fgithub.com\u002Flibffcv\u002Fffcv) - Fast dataloader.  
\n\n##### Libs PyTorch\n[Good PyTorch Introduction](https:\u002F\u002Fcs230.stanford.edu\u002Fblog\u002Fpytorch\u002F)    \n[skorch](https:\u002F\u002Fgithub.com\u002Fdnouri\u002Fskorch) - Scikit-learn compatible neural network library that wraps PyTorch, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0J7FaLk0bmQ), [slides](https:\u002F\u002Fgithub.com\u002Fthomasjpfan\u002Fskorch_talk).  \n[fastai](https:\u002F\u002Fgithub.com\u002Ffastai\u002Ffastai) - Neural Networks in PyTorch.  \n[timm](https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models) - PyTorch image models.  \n[ignite](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fignite) - High-level library for PyTorch.  \n[torchcv](https:\u002F\u002Fgithub.com\u002Fdonnyyou\u002Ftorchcv) - Deep Learning in Computer Vision.  \n[pytorch-optimizer](https:\u002F\u002Fgithub.com\u002Fjettify\u002Fpytorch-optimizer) - Collection of optimizers for PyTorch.  \n[pytorch-lightning](https:\u002F\u002Fgithub.com\u002FPyTorchLightning\u002FPyTorch-lightning) - Wrapper around PyTorch.  \n[litserve](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002FLitServe) - Serve models.  \n[lightly](https:\u002F\u002Fgithub.com\u002Flightly-ai\u002Flightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR.  \n[MONAI](https:\u002F\u002Fgithub.com\u002Fproject-monai\u002Fmonai) - Deep learning in healthcare imaging.  \n[kornia](https:\u002F\u002Fgithub.com\u002Fkornia\u002Fkornia) - Image transformations, epipolar geometry, depth estimation.  \n[torchinfo](https:\u002F\u002Fgithub.com\u002FTylep\u002Ftorchinfo) - Nice model summary.  \n[lovely-tensors](https:\u002F\u002Fgithub.com\u002Fxl0\u002Flovely-tensors\u002F) - Inspect tensors, mean, std, inf values.  \n\n##### Distributed Libs\n[flexflow](https:\u002F\u002Fgithub.com\u002Fflexflow\u002FFlexFlow) - Distributed TensorFlow Keras and PyTorch.  
\n[horovod](https:\u002F\u002Fgithub.com\u002Fhorovod\u002Fhorovod) - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.  \n\n##### Architecture Visualization\n[Awesome List](https:\u002F\u002Fgithub.com\u002Fashishpatel26\u002FTools-to-Design-or-Visualize-Architecture-of-Neural-Network).  \n[netron](https:\u002F\u002Fgithub.com\u002Flutzroeder\u002Fnetron) - Viewer for neural networks.  \n[visualkeras](https:\u002F\u002Fgithub.com\u002Fpaulgavrikov\u002Fvisualkeras) - Visualize Keras networks.  \n\n##### Computer Vision General\n[roboflow](https:\u002F\u002Fgithub.com\u002Froboflow\u002Fsupervision) - Reusable computer vision tools.  \n\n##### Object detection \u002F Instance Segmentation\n[Metrics reloaded: Recommendations for image analysis validation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.01653) - Guide for choosing correct image analysis metrics, [Code](https:\u002F\u002Fgithub.com\u002FProject-MONAI\u002FMetricsReloaded), [Twitter Thread](https:\u002F\u002Ftwitter.com\u002Flena_maierhein\u002Fstatus\u002F1625450342006521857)  \n[Good Yolo Explanation](https:\u002F\u002Fjonathan-hui.medium.com\u002Freal-time-object-detection-with-yolo-yolov2-28b1b93e2088)  \n[ultralytics](https:\u002F\u002Fgithub.com\u002Fultralytics\u002Fultralytics) - Easily accessible Yolo and SAM models.  \n[yolact](https:\u002F\u002Fgithub.com\u002Fdbolya\u002Fyolact) - Fully convolutional model for real-time instance segmentation.  \n[EfficientDet Pytorch](https:\u002F\u002Fgithub.com\u002Ftoandaominh1997\u002FEfficientDet.Pytorch), [EfficientDet Keras](https:\u002F\u002Fgithub.com\u002Fxuannianz\u002FEfficientDet) - Scalable and Efficient Object Detection.  \n[detectron2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2) - Object Detection (Mask R-CNN) by Facebook.  \n[simpledet](https:\u002F\u002Fgithub.com\u002FTuSimple\u002Fsimpledet) - Object Detection and Instance Recognition.  
\n[CenterNet](https:\u002F\u002Fgithub.com\u002Fxingyizhou\u002FCenterNet) - Object detection.  \n[FCOS](https:\u002F\u002Fgithub.com\u002Ftianzhi0549\u002FFCOS) - Fully Convolutional One-Stage Object Detection.  \n[norfair](https:\u002F\u002Fgithub.com\u002Ftryolabs\u002Fnorfair) - Real-time 2D object tracking.  \n[Detic](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FDetic) -  Detector with image classes that can use image-level labels (facebookresearch).  \n[EasyCV](https:\u002F\u002Fgithub.com\u002Falibaba\u002FEasyCV) - Image segmentation, classification, metric-learning, object detection, pose estimation.  \n\n##### Image Classification\n[nfnets](https:\u002F\u002Fgithub.com\u002Fypeleg\u002Fnfnets-keras) - Neural network.   \n[efficientnet](https:\u002F\u002Fgithub.com\u002Flukemelas\u002FEfficientNet-PyTorch) - Neural network.   \n[pycls](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fpycls) - PyTorch image classification networks: ResNet, ResNeXt, EfficientNet, and RegNet (by Facebook).  \n\n##### Applications and Snippets\n[SPADE](https:\u002F\u002Fgithub.com\u002Fnvlabs\u002Fspade) - Semantic Image Synthesis.  \n[Entity Embeddings of Categorical Variables](https:\u002F\u002Farxiv.org\u002Fabs\u002F1604.06737), [code](https:\u002F\u002Fgithub.com\u002Fentron\u002Fentity-embedding-rossmann), [kaggle](https:\u002F\u002Fwww.kaggle.com\u002Faquatic\u002Fentity-embedding-neural-net\u002Fcode)  \n[Image Super-Resolution](https:\u002F\u002Fgithub.com\u002Fidealo\u002Fimage-super-resolution) - Super-scaling using a Residual Dense Network.  
\nCell Segmentation - [Talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=dVFZpodqJiI), Blog Posts: [1](https:\u002F\u002Fwww.thomasjpfan.com\u002F2018\u002F07\u002Fnuclei-image-segmentation-tutorial\u002F), [2](https:\u002F\u002Fwww.thomasjpfan.com\u002F2017\u002F08\u002Fhassle-free-unets\u002F)  \n[deeplearning-models](https:\u002F\u002Fgithub.com\u002Frasbt\u002Fdeeplearning-models) - Deep learning models.  \n\n##### Variational Autoencoders (VAEs)\n[Variational Autoencoder Explanation Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=9zKuYvjFFS8)  \n[disentanglement_lib](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fdisentanglement_lib) - BetaVAE, FactorVAE, BetaTCVAE, DIP-VAE.  \n[ladder-vae-pytorch](https:\u002F\u002Fgithub.com\u002Faddtt\u002Fladder-vae-pytorch) - Ladder Variational Autoencoders (LVAE).  \n[benchmark_VAE](https:\u002F\u002Fgithub.com\u002Fclementchadebec\u002Fbenchmark_VAE) - Unifying Generative Autoencoder implementations.  \n\n##### Generative Adversarial Networks (GANs)\n[Awesome GAN Applications](https:\u002F\u002Fgithub.com\u002Fnashory\u002Fgans-awesome-applications)  \n[The GAN Zoo](https:\u002F\u002Fgithub.com\u002Fhindupuravinash\u002Fthe-gan-zoo) - List of Generative Adversarial Networks.  \n[CycleGAN and Pix2pix](https:\u002F\u002Fgithub.com\u002Fjunyanz\u002Fpytorch-CycleGAN-and-pix2pix) - Various image-to-image tasks.  \n[TensorFlow GAN implementations](https:\u002F\u002Fgithub.com\u002Fhwalsuklee\u002Ftensorflow-generative-model-collections)  \n[PyTorch GAN implementations](https:\u002F\u002Fgithub.com\u002Fznxlwm\u002Fpytorch-generative-model-collections)  \n[PyTorch GAN implementations](https:\u002F\u002Fgithub.com\u002Feriklindernoren\u002FPyTorch-GAN#adversarial-autoencoder)  \n[StudioGAN](https:\u002F\u002Fgithub.com\u002FPOSTECH-CVLab\u002FPyTorch-StudioGAN) - PyTorch GAN implementations.  
\n\n##### Transformers\n[The Annotated Transformer](https:\u002F\u002Fnlp.seas.harvard.edu\u002Fannotated-transformer\u002F) - Intro to transformers.  \n[Transformers from Scratch](https:\u002F\u002Fe2eml.school\u002Ftransformers.html) - Intro.  \n[Neural Networks: Zero to Hero](https:\u002F\u002Fkarpathy.ai\u002Fzero-to-hero.html) - Video series on building neural networks.  \n[SegFormer](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FSegFormer) - Simple and Efficient Design for Semantic Segmentation with Transformers.  \n[esvit](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fesvit) - Efficient self-supervised Vision Transformers.  \n[nystromformer](https:\u002F\u002Fgithub.com\u002FRishit-dagli\u002FNystromformer) - More efficient transformer because of approximate self-attention.  \n\n##### Deep learning on structured data\n[Great overview for deep learning for tabular data](https:\u002F\u002Fsebastianraschka.com\u002Fblog\u002F2022\u002Fdeep-learning-for-tabular-data.html)  \n[TabPFN](https:\u002F\u002Fgithub.com\u002FPriorLabs\u002FTabPFN) - Foundation Model for Tabular Data.  \n\n##### Graph-Based Neural Networks\n[How to do Deep Learning on Graphs with Graph Convolutional Networks](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780)  \n[Introduction To Graph Convolutional Networks](http:\u002F\u002Ftkipf.github.io\u002Fgraph-convolutional-networks\u002F)  \n[An attempt at demystifying graph deep learning](https:\u002F\u002Fericmjl.github.io\u002Fessays-on-data-science\u002Fmachine-learning\u002Fgraph-nets\u002F)  \n[ogb](https:\u002F\u002Fogb.stanford.edu\u002F) - Open Graph Benchmark, Benchmark datasets.  \n[networkx](https:\u002F\u002Fgithub.com\u002Fnetworkx\u002Fnetworkx) - Graph library.  \n[cugraph](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcugraph) - RAPIDS, Graph library on the GPU.  
\n[pytorch-geometric](https:\u002F\u002Fgithub.com\u002Frusty1s\u002Fpytorch_geometric) - Various methods for deep learning on graphs.  \n[dgl](https:\u002F\u002Fgithub.com\u002Fdmlc\u002Fdgl) - Deep Graph Library.  \n[graph_nets](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fgraph_nets) - Build graph networks in TensorFlow, by DeepMind.  \n\n#### Model conversion\n[hummingbird](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fhummingbird) - Compile trained ML models into tensor computations (by Microsoft).  \n\n#### GPU\n[cuML](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcuml) - RAPIDS, Run traditional tabular ML tasks on GPUs, [Intro](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=6XzS5XcpicM&t=2m50s).  \n[thundergbm](https:\u002F\u002Fgithub.com\u002FXtra-Computing\u002Fthundergbm) - GBDTs and Random Forest.  \n[thundersvm](https:\u002F\u002Fgithub.com\u002FXtra-Computing\u002Fthundersvm) - Support Vector Machines.  \nLegate Numpy - Distributed NumPy arrays across multiple GPUs, by Nvidia (not released yet), [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Jxxs_moibog).  \n\n#### Regression\nOrdinal Regression: [paper](https:\u002F\u002Fonlinelibrary.wiley.com\u002Fdoi\u002F10.1002\u002Fsim.10208)  \nUnderstanding SVM Regression: [slides](https:\u002F\u002Fcs.adelaide.edu.au\u002F~chhshen\u002Fteaching\u002FML_SVR.pdf), [forum](https:\u002F\u002Fwww.quora.com\u002FHow-does-support-vector-regression-work), [paper](http:\u002F\u002Falex.smola.org\u002Fpapers\u002F2003\u002FSmoSch03b.pdf)  \n[Generalized Additive Models](https:\u002F\u002Fm-clark.github.io\u002Fgeneralized-additive-models\u002F) - Tutorial in R.  \n\n[pyearth](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fpy-earth) - Multivariate Adaptive Regression Splines (MARS), [tutorial](https:\u002F\u002Fuc-r.github.io\u002Fmars).  
\n[pygam](https:\u002F\u002Fgithub.com\u002Fdswah\u002FpyGAM) - Generalized Additive Models (GAMs), [Explanation](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2015\u002F07\u002F30\u002Fgam\u002F).  \n[GLRM](https:\u002F\u002Fgithub.com\u002Fmadeleineudell\u002FLowRankModels.jl) - Generalized Low Rank Models.  \n[tweedie](https:\u002F\u002Fxgboost.readthedocs.io\u002Fen\u002Flatest\u002Fparameter.html#parameters-for-tweedie-regression-objective-reg-tweedie) - Specialized distribution for zero inflated targets, [Talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=-o0lpHBq85I).  \n[MAPIE](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002FMAPIE) - Estimating prediction intervals.  \n\n#### Polynomials\n[orthopy](https:\u002F\u002Fgithub.com\u002Fnschloe\u002Forthopy) - Orthogonal polynomials in all shapes and sizes.  \n\n#### Classification\n[Talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DkLPYccEJ8Y), [Notebook](https:\u002F\u002Fgithub.com\u002Fianozsvald\u002Fdata_science_delivered\u002Fblob\u002Fmaster\u002Fml_creating_correct_capable_classifiers.ipynb)  \n[Blog post: Probability Scoring](https:\u002F\u002Fmachinelearningmastery.com\u002Fhow-to-score-probability-predictions-in-python\u002F)  \n[All classification metrics](http:\u002F\u002Frali.iro.umontreal.ca\u002Frali\u002Fsites\u002Fdefault\u002Ffiles\u002Fpublis\u002FSokolovaLapalme-JIPM09.pdf)  \n[DESlib](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002FDESlib) - Dynamic classifier and ensemble selection.  \n[human-learn](https:\u002F\u002Fgithub.com\u002Fkoaning\u002Fhuman-learn) - Create and tune classifier based on your rule set.  
\n\n#### Metric Learning\n[Contrastive Representation Learning](https:\u002F\u002Flilianweng.github.io\u002Flil-log\u002F2021\u002F05\u002F31\u002Fcontrastive-representation-learning.html)  \n  \n[metric-learn](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fmetric-learn) - Supervised and weakly-supervised metric learning algorithms.  \n[pytorch-metric-learning](https:\u002F\u002Fgithub.com\u002FKevinMusgrave\u002Fpytorch-metric-learning) - PyTorch metric learning.  \n[deep_metric_learning](https:\u002F\u002Fgithub.com\u002Fronekko\u002Fdeep_metric_learning) - Methods for deep metric learning.  \n[ivis](https:\u002F\u002Fbering-ivis.readthedocs.io\u002Fen\u002Flatest\u002Fsupervised.html) - Metric learning using siamese neural networks.  \n[TensorFlow similarity](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fsimilarity) - Metric learning.  \n\n#### Distance Functions\n[Steck et al. - Is Cosine-Similarity of Embeddings Really About Similarity?](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.05440)  \n[scipy.spatial](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fspatial.distance.html) - All kinds of distance metrics.  \n[vegdist](https:\u002F\u002Frdrr.io\u002Fcran\u002Fvegan\u002Fman\u002Fvegdist.html) - Distance metrics (R package).  \n[pyemd](https:\u002F\u002Fgithub.com\u002Fwmayner\u002Fpyemd) - Earth Mover's Distance \u002F Wasserstein distance, similarity between histograms. [OpenCV implementation](https:\u002F\u002Fdocs.opencv.org\u002F3.4\u002Fd6\u002Fdc7\u002Fgroup__imgproc__hist.html), [POT implementation](https:\u002F\u002Fpythonot.github.io\u002Fauto_examples\u002Fplot_OT_2D_samples.html)   \n[dcor](https:\u002F\u002Fgithub.com\u002Fvnmabus\u002Fdcor)  - Distance correlation and related Energy statistics.  \n[GeomLoss](https:\u002F\u002Fwww.kernel-operations.io\u002Fgeomloss\u002F) - Kernel norms, Hausdorff divergences, Debiased Sinkhorn divergences (=approximation of Wasserstein distance).  
\n\n#### Self-supervised Learning\n[lightly](https:\u002F\u002Fgithub.com\u002Flightly-ai\u002Flightly) - MoCo, SimCLR, SimSiam, Barlow Twins, BYOL, NNCLR.  \n[vissl](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fvissl) - Self-Supervised Learning with PyTorch: RotNet, Jigsaw, NPID, ClusterFit, PIRL, SimCLR, MoCo, DeepCluster, SwAV.  \n\n#### Clustering\n[Overview of clustering algorithms applied to image data (= Deep Clustering)](https:\u002F\u002Fdeepnotes.io\u002Fdeep-clustering).  \n[Clustering with Deep Learning: Taxonomy and New Methods](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1801.07648.pdf).  \n[Hierarchical Cluster Analysis (R Tutorial)](https:\u002F\u002Fuc-r.github.io\u002Fhc_clustering) - Dendrogram, Tanglegram  \n[Schubert - Stop using the elbow criterion for k-means and how to choose the number of clusters instead](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.12189)  \n[hdbscan](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fhdbscan) - Clustering algorithm, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=dGsxd67IFiU), [blog](https:\u002F\u002Ftowardsdatascience.com\u002Funderstanding-hdbscan-and-density-based-clustering-121dbee1320e).  \n[pyclustering](https:\u002F\u002Fgithub.com\u002Fannoviko\u002Fpyclustering) - All sorts of clustering algorithms.  \n[FCPS](https:\u002F\u002Fgithub.com\u002FMthrun\u002FFCPS) - Fundamental Clustering Problems Suite (R package).  \n[GaussianMixture](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.mixture.GaussianMixture.html) - Generalized k-means clustering using a mixture of Gaussian distributions, [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=aICqoAG5BXQ).  \n[nmslib](https:\u002F\u002Fgithub.com\u002Fnmslib\u002Fnmslib) - Similarity search library and toolkit for evaluation of k-NN methods.  
\n[merf](https:\u002F\u002Fgithub.com\u002Fmanifoldai\u002Fmerf) - Mixed Effects Random Forest for Clustering, [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=gWj4ZwB7f3o)  \n[tree-SNE](https:\u002F\u002Fgithub.com\u002Fisaacrob\u002Ftreesne) - Hierarchical clustering algorithm based on t-SNE.  \n[MiniSom](https:\u002F\u002Fgithub.com\u002FJustGlowing\u002Fminisom) - Pure Python implementation of the Self Organizing Maps.  \n[distribution_clustering](https:\u002F\u002Fgithub.com\u002FEricElmoznino\u002Fdistribution_clustering), [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1804.02624), [related paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2003.07770), [alt](https:\u002F\u002Fgithub.com\u002Fr0f1\u002Fdistribution_clustering).  \n[phenograph](https:\u002F\u002Fgithub.com\u002Fdpeerlab\u002Fphenograph) - Clustering by community detection.  \n[FastPG](https:\u002F\u002Fgithub.com\u002Fsararselitsky\u002FFastPG) - Clustering of single cell data (RNA). Improvement of phenograph, [Paper](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F342339899_FastPG_Fast_clustering_of_millions_of_single_cells).  \n[HypHC](https:\u002F\u002Fgithub.com\u002FHazyResearch\u002FHypHC) - Hyperbolic Hierarchical Clustering.  \n[BanditPAM](https:\u002F\u002Fgithub.com\u002FThrunGroup\u002FBanditPAM) - Improved k-Medoids Clustering.  \n[dendextend](https:\u002F\u002Fgithub.com\u002Ftalgalili\u002Fdendextend) - Comparing dendrograms (R package).  \n[DeepDPM](https:\u002F\u002Fgithub.com\u002FBGU-CS-VIL\u002FDeepDPM) - Deep Clustering With An Unknown Number of Clusters.  \n[generalized-kmeans-clustering](https:\u002F\u002Fgithub.com\u002Fderrickburns\u002Fgeneralized-kmeans-clustering) - Generalized k-means clustering.  
\n\n##### Clustering Evaluation\n* [Wagner, Wagner - Comparing Clusterings - An Overview](https:\u002F\u002Fpublikationen.bibliothek.kit.edu\u002F1000011477\u002F812079)\n  * [Adjusted Rand Index](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.adjusted_rand_score.html)\n  * [Normalized Mutual Information](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.normalized_mutual_info_score.html)\n  * [Adjusted Mutual Information](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.adjusted_mutual_info_score.html)\n  * [Fowlkes-Mallows Score](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.fowlkes_mallows_score.html)\n  * [Silhouette Coefficient](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.silhouette_score.html)\n  * [Variation of Information](https:\u002F\u002Fgist.github.com\u002Fjwcarr\u002F626cbc80e0006b526688), [Julia](https:\u002F\u002Fclusteringjl.readthedocs.io\u002Fen\u002Flatest\u002Fvarinfo.html)\n  * [Pair Confusion Matrix](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.cluster.pair_confusion_matrix.html)\n  * [Consensus Score](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.consensus_score.html) - The similarity of two sets of biclusters.\n* [Assessing the quality of a clustering (video)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Mf6MqIS2ql4)   \n* [fpc](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Ffpc\u002Findex.html) - Various methods for clustering and cluster validation (R package).  \n  * Minimum distance between any two clusters\n  * Distance between centroids\n  * p-separation index: Like minimum distance. 
Look at the average distance to nearest point in different cluster for p=10% \"border\" points in any cluster. Measuring density, measuring mountains vs valleys\n  * Estimate density by weighted count of close points \n* Other measures:\n  * Within-cluster average distance\n  * Mean of within-cluster average distance over nearest-cluster average distance (silhouette score)\n  * Within-cluster similarity measure to normal\u002Funiform\n  * Within-cluster (squared) distance to centroid (this is the k-Means loss function)\n  * Correlation coefficient between the original distances and the distances induced by the clustering (Hubert's Gamma)\n  * Entropy of cluster sizes\n  * Average largest within-cluster gap\n  * Variation of clusterings on bootstrapped data\n\n#### Multi-label classification\n[scikit-multilearn](https:\u002F\u002Fgithub.com\u002Fscikit-multilearn\u002Fscikit-multilearn) - Multi-label classification, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=m-tAASQA7XQ&t=18m57s).  \n\n#### Critical AI Texts\n[Sublime - The Return of Pseudosciences in Artificial Intelligence: Have Machine Learning and Deep Learning Forgotten Lessons from Statistics and History?](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.18656)  \n\n#### Signal Processing and Filtering\n[Stanford Lecture Series on Fourier Transformation](https:\u002F\u002Fsee.stanford.edu\u002FCourse\u002FEE261), [Youtube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=gZNm7L96pfY&list=PLB24BC7956EE040CD&index=1), [Lecture Notes](https:\u002F\u002Fsee.stanford.edu\u002Fmaterials\u002Flsoftaee261\u002Fbook-fall-07.pdf).  \n[Visual Fourier explanation](https:\u002F\u002Fdsego.github.io\u002Fdemystifying-fourier\u002F).  \n[The Scientist & Engineer's Guide to Digital Signal Processing (1999)](https:\u002F\u002Fwww.analog.com\u002Fen\u002Feducation\u002Feducation-library\u002Fscientist_engineers_guide.html) - Chapter 3 has good introduction to Bessel, Butterworth and Chebyshev filters.  
\n[Kalman Filter article](https:\u002F\u002Fwww.bzarg.com\u002Fp\u002Fhow-a-kalman-filter-works-in-pictures).  \n[Kalman Filter book](https:\u002F\u002Fgithub.com\u002Frlabbe\u002FKalman-and-Bayesian-Filters-in-Python) - Focuses on intuition using Jupyter Notebooks. Includes Bayesian and various Kalman filters.  \n[Interactive Tool](https:\u002F\u002Ffiiir.com\u002F) for FIR and IIR filters, [Examples](https:\u002F\u002Fplot.ly\u002Fpython\u002Ffft-filters\u002F).  \n[filterpy](https:\u002F\u002Fgithub.com\u002Frlabbe\u002Ffilterpy) - Kalman filtering and optimal estimation library.  \n\n#### Filtering in Python\n[scipy.signal](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fsignal.html)\n* [Butterworth low-pass filter example](https:\u002F\u002Fgithub.com\u002Fguillaume-chevalier\u002Ffiltering-stft-and-laplace-transform)\n* [Savitzky–Golay filter](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.signal.savgol_filter.html), [W](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FSavitzky%E2%80%93Golay_filter)  \n[pandas.Series.rolling](https:\u002F\u002Fpandas.pydata.org\u002Fdocs\u002Freference\u002Fapi\u002Fpandas.Series.rolling.html) - Choose appropriate `win_type`.  \n\n#### Geometry\n[geomstats](https:\u002F\u002Fgithub.com\u002Fgeomstats\u002Fgeomstats) - Computations and statistics on manifolds with geometric structures.  
\n\n#### Time Series\n[Time Series Anomaly Detection Review Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.20512)  \n[statsmodels](https:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Ftsa.html) - Time series analysis, [seasonal decompose](https:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.tsa.seasonal.seasonal_decompose.html) [example](https:\u002F\u002Fgist.github.com\u002Fbalzer82\u002F5cec6ad7adc1b550e7ee), [SARIMA](https:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.tsa.statespace.sarimax.SARIMAX.html), [granger causality](http:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.tsa.stattools.grangercausalitytests.html).  \n[darts](https:\u002F\u002Fgithub.com\u002Funit8co\u002Fdarts) - Time Series library (LightGBM, Neural Networks).  \n[kats](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fkats) - Time series prediction library by Facebook.  \n[prophet](https:\u002F\u002Fgithub.com\u002Ffacebook\u002Fprophet) - Time series prediction library by Facebook.  \n[neural_prophet](https:\u002F\u002Fgithub.com\u002Fourownstory\u002Fneural_prophet) - Time series prediction built on PyTorch.  \n[pmdarima](https:\u002F\u002Fgithub.com\u002Falkaline-ml\u002Fpmdarima) - Wrapper for (Auto-) ARIMA.  \n[modeltime](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fmodeltime\u002Findex.html) - Time series forecasting framework (R package).  \n[pyflux](https:\u002F\u002Fgithub.com\u002FRJT1990\u002Fpyflux) - Time series prediction algorithms (ARIMA, GARCH, GAS, Bayesian).  \n[atspy](https:\u002F\u002Fgithub.com\u002Ffirmai\u002Fatspy) - Automated Time Series Models.  \n[pm-prophet](https:\u002F\u002Fgithub.com\u002Fluke14free\u002Fpm-prophet) - Time series prediction and decomposition library.  \n[htsprophet](https:\u002F\u002Fgithub.com\u002FCollinRooney12\u002Fhtsprophet) - Hierarchical Time Series Forecasting using Prophet.  
\n[nupic](https:\u002F\u002Fgithub.com\u002Fnumenta\u002Fnupic) - Hierarchical Temporal Memory (HTM) for Time Series Prediction and Anomaly Detection.  \n[tensorflow](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorflow\u002F) - LSTM and others, examples: [link](\nhttps:\u002F\u002Fmachinelearningmastery.com\u002Ftime-series-forecasting-long-short-term-memory-network-python\u002F\n), [link](https:\u002F\u002Fgithub.com\u002Fhzy46\u002FTensorFlow-Time-Series-Examples), seq2seq: [1](https:\u002F\u002Fmachinelearningmastery.com\u002Fhow-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption\u002F), [2](https:\u002F\u002Fgithub.com\u002Fguillaume-chevalier\u002Fseq2seq-signal-prediction), [3](https:\u002F\u002Fgithub.com\u002FJEddy92\u002FTimeSeries_Seq2Seq\u002Fblob\u002Fmaster\u002Fnotebooks\u002FTS_Seq2Seq_Intro.ipynb), [4](https:\u002F\u002Fgithub.com\u002FLukeTonin\u002Fkeras-seq-2-seq-signal-prediction)  \n[tspreprocess](https:\u002F\u002Fgithub.com\u002FMaxBenChrist\u002Ftspreprocess) - Preprocessing: Denoising, Compression, Resampling.  \n[tsfresh](https:\u002F\u002Fgithub.com\u002Fblue-yonder\u002Ftsfresh) - Time series feature engineering.  \n[tsfel](https:\u002F\u002Fgithub.com\u002Ffraunhoferportugal\u002Ftsfel) - Time series feature extraction.  \n[thunder](https:\u002F\u002Fgithub.com\u002Fthunder-project\u002Fthunder) - Data structures and algorithms for loading, processing, and analyzing time series data.  \n[gatspy](https:\u002F\u002Fwww.astroml.org\u002Fgatspy\u002F) - General tools for Astronomical Time Series, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=E4NMZyfao2c).  \n[gendis](https:\u002F\u002Fgithub.com\u002FIBCNServices\u002FGENDIS) - shapelets, [example](https:\u002F\u002Fgithub.com\u002FIBCNServices\u002FGENDIS\u002Fblob\u002Fmaster\u002Fgendis\u002Fexample.ipynb).  
\n[tslearn](https:\u002F\u002Fgithub.com\u002Frtavenar\u002Ftslearn) - Time series clustering and classification, `TimeSeriesKMeans`.  \n[pastas](https:\u002F\u002Fgithub.com\u002Fpastas\u002Fpastas) - Analysis of Groundwater Time Series.  \n[fastdtw](https:\u002F\u002Fgithub.com\u002Fslaypni\u002Ffastdtw) - Dynamic Time Warp Distance.  \n[fable](https:\u002F\u002Fwww.rdocumentation.org\u002Fpackages\u002Ffable\u002Fversions\u002F0.0.0.9000) - Time Series Forecasting (R package).  \n[pydlm](https:\u002F\u002Fgithub.com\u002Fwwrechard\u002Fpydlm) - Bayesian time series modelling ([R package](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fbsts\u002Findex.html), [Blog post](http:\u002F\u002Fwww.unofficialgoogledatascience.com\u002F2017\u002F07\u002Ffitting-bayesian-structural-time-series.html))  \n[PyAF](https:\u002F\u002Fgithub.com\u002Fantoinecarme\u002Fpyaf) - Automatic Time Series Forecasting.  \n[luminol](https:\u002F\u002Fgithub.com\u002Flinkedin\u002Fluminol) - Anomaly Detection and Correlation library from LinkedIn.  \n[matrixprofile-ts](https:\u002F\u002Fgithub.com\u002Ftarget\u002Fmatrixprofile-ts) - Detecting patterns and anomalies, [website](https:\u002F\u002Fwww.cs.ucr.edu\u002F~eamonn\u002FMatrixProfile.html), [ppt](https:\u002F\u002Fwww.cs.ucr.edu\u002F~eamonn\u002FMatrix_Profile_Tutorial_Part1.pdf), [alternative](https:\u002F\u002Fgithub.com\u002Fmatrix-profile-foundation\u002Fmass-ts).  \n[stumpy](https:\u002F\u002Fgithub.com\u002FTDAmeritrade\u002Fstumpy) - Another matrix profile library.  \n[obspy](https:\u002F\u002Fgithub.com\u002Fobspy\u002Fobspy) - Seismology package. Useful `classic_sta_lta` function.  \n[RobustSTL](https:\u002F\u002Fgithub.com\u002FLeeDoYup\u002FRobustSTL) - Robust Seasonal-Trend Decomposition.  \n[seglearn](https:\u002F\u002Fgithub.com\u002Fdmbee\u002Fseglearn) - Time Series library.  
\n[pyts](https:\u002F\u002Fgithub.com\u002Fjohannfaouzi\u002Fpyts) - Time series transformation and classification, [Imaging time series](https:\u002F\u002Fpyts.readthedocs.io\u002Fen\u002Flatest\u002Fauto_examples\u002Findex.html#imaging-time-series).  \nTurn time series into images and use Neural Nets: [example](https:\u002F\u002Fgist.github.com\u002Foguiza\u002Fc9c373aec07b96047d1ba484f23b7b47), [example](https:\u002F\u002Fgithub.com\u002Fkiss90\u002Ftime-series-classification).  \n[sktime](https:\u002F\u002Fgithub.com\u002Falan-turing-institute\u002Fsktime), [sktime-dl](https:\u002F\u002Fgithub.com\u002Fuea-machine-learning\u002Fsktime-dl) - Toolbox for (deep) learning with time series.   \n[adtk](https:\u002F\u002Fgithub.com\u002Farundo\u002Fadtk) - Time Series Anomaly Detection.  \n[rocket](https:\u002F\u002Fgithub.com\u002Fangus924\u002Frocket) - Time Series classification using random convolutional kernels.  \n[luminaire](https:\u002F\u002Fgithub.com\u002Fzillow\u002Fluminaire) - Anomaly Detection for time series.  \n[etna](https:\u002F\u002Fgithub.com\u002Ftinkoff-ai\u002Fetna) - Time Series library.  \n[Chaos Genius](https:\u002F\u002Fgithub.com\u002Fchaos-genius\u002Fchaos_genius) - ML powered analytics engine for outlier\u002Fanomaly detection and root cause analysis.  \n[timesfm](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Ftimesfm) - Pretrained Time Series Foundation Model from Google.  \n\n#### Time Series - Nixtla\n[nixtla](https:\u002F\u002Fgithub.com\u002FNixtla\u002Fnixtla) - Pretrained Time Series Foundation Model for forecasting and anomaly detection.  \n[statsforecast](https:\u002F\u002Fgithub.com\u002FNixtla\u002Fstatsforecast) - Forecasting with statistical and econometric models.  \n[neuralforecast](https:\u002F\u002Fgithub.com\u002FNixtla\u002Fneuralforecast) - Forecasting with neural networks.  \n[mlforecast](https:\u002F\u002Fgithub.com\u002FNixtla\u002Fmlforecast) - Forecasting with ML models.  
\n[hierarchicalforecast](https:\u002F\u002Fgithub.com\u002FNixtla\u002Fhierarchicalforecast) - Hierarchical forecasting with statistical and econometric methods.  \n\n##### Time Series Evaluation\n[TimeSeriesSplit](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.model_selection.TimeSeriesSplit.html) - Sklearn time series split.  \n[tscv](https:\u002F\u002Fgithub.com\u002FWenjieZ\u002FTSCV) - Evaluation with gap.  \n\n#### Financial Data and Trading\nTutorial on using cvxpy: [1](https:\u002F\u002Fcalmcode.io\u002Fcvxpy-one\u002Fthe-stigler-diet.html), [2](https:\u002F\u002Fcalmcode.io\u002Fcvxpy-two\u002Fintroduction.html)  \n[pandas-datareader](https:\u002F\u002Fpandas-datareader.readthedocs.io\u002Fen\u002Flatest\u002Fwhatsnew.html) - Read stock data.  \n[yfinance](https:\u002F\u002Fgithub.com\u002Franaroussi\u002Fyfinance) - Read stock data from Yahoo Finance.  \n[findatapy](https:\u002F\u002Fgithub.com\u002Fcuemacro\u002Ffindatapy) - Read stock data from various sources.  \n[ta](https:\u002F\u002Fgithub.com\u002Fbukosabino\u002Fta) - Technical analysis library.  \n[backtrader](https:\u002F\u002Fgithub.com\u002Fmementum\u002Fbacktrader) - Backtesting for trading strategies.  \n[surpriver](https:\u002F\u002Fgithub.com\u002Ftradytics\u002Fsurpriver) - Find high moving stocks before they move using anomaly detection and machine learning.  \n[ffn](https:\u002F\u002Fgithub.com\u002Fpmorissette\u002Fffn) - Financial functions.  \n[bt](https:\u002F\u002Fgithub.com\u002Fpmorissette\u002Fbt) - Backtesting algorithms.  \n[alpaca-trade-api-python](https:\u002F\u002Fgithub.com\u002Falpacahq\u002Falpaca-trade-api-python) - Commission-free trading through API.  \n[eiten](https:\u002F\u002Fgithub.com\u002Ftradytics\u002Feiten) - Eigen portfolios, minimum variance portfolios and other algorithmic investing strategies.  
\n[tf-quant-finance](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Ftf-quant-finance) - Quantitative finance tools in TensorFlow, by Google.  \n[quantstats](https:\u002F\u002Fgithub.com\u002Franaroussi\u002Fquantstats) - Portfolio management.  \n[Riskfolio-Lib](https:\u002F\u002Fgithub.com\u002Fdcajasn\u002FRiskfolio-Lib) - Portfolio optimization and strategic asset allocation.  \n[OpenBBTerminal](https:\u002F\u002Fgithub.com\u002FOpenBB-finance\u002FOpenBBTerminal) - Terminal.  \n[mplfinance](https:\u002F\u002Fgithub.com\u002Fmatplotlib\u002Fmplfinance) - Financial markets data visualization.  \n\n##### Quantopian Stack\n[pyfolio](https:\u002F\u002Fgithub.com\u002Fquantopian\u002Fpyfolio) - Portfolio and risk analytics.  \n[zipline](https:\u002F\u002Fgithub.com\u002Fquantopian\u002Fzipline) - Algorithmic trading.  \n[alphalens](https:\u002F\u002Fgithub.com\u002Fquantopian\u002Falphalens) - Performance analysis of predictive stock factors.  \n[empyrical](https:\u002F\u002Fgithub.com\u002Fquantopian\u002Fempyrical) - Financial risk metrics.  \n[trading_calendars](https:\u002F\u002Fgithub.com\u002Fquantopian\u002Ftrading_calendars) - Calendars for various securities exchanges.  \n\n#### Survival Analysis\n[Time-dependent Cox Model in R](https:\u002F\u002Fstats.stackexchange.com\u002Fquestions\u002F101353\u002Fcox-regression-with-time-varying-covariates).  \n[lifelines](https:\u002F\u002Flifelines.readthedocs.io\u002Fen\u002Flatest\u002F) - Survival analysis, Cox PH Regression, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=aKZQUaNHYb0), [talk2](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=fli-yE5grtY).  \n[scikit-survival](https:\u002F\u002Fgithub.com\u002Fsebp\u002Fscikit-survival) - Survival analysis.  
\n[xgboost](https:\u002F\u002Fgithub.com\u002Fdmlc\u002Fxgboost) - `\"objective\": \"survival:cox\"` [NHANES example](https:\u002F\u002Fshap.readthedocs.io\u002Fen\u002Flatest\u002Fexample_notebooks\u002Ftabular_examples\u002Ftree_based_models\u002FNHANES%20I%20Survival%20Model.html)  \n[survivalstan](https:\u002F\u002Fgithub.com\u002Fhammerlab\u002Fsurvivalstan) - Survival analysis, [intro](http:\u002F\u002Fwww.hammerlab.org\u002F2017\u002F06\u002F26\u002Fintroducing-survivalstan\u002F).  \n[convoys](https:\u002F\u002Fgithub.com\u002Fbetter\u002Fconvoys) - Analyze time lagged conversions.  \nRandomSurvivalForests (R packages: randomForestSRC, ggRandomForests).  \n[pysurvival](https:\u002F\u002Fgithub.com\u002Fsquare\u002Fpysurvival) - Survival analysis.  \n[DeepSurvivalMachines](https:\u002F\u002Fgithub.com\u002Fautonlab\u002FDeepSurvivalMachines) - Fully Parametric Survival Regression.  \n[auton-survival](https:\u002F\u002Fgithub.com\u002Fautonlab\u002Fauton-survival) - Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Events.  \n\n#### Outlier Detection & Anomaly Detection\n[sklearn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Foutlier_detection.html) - Isolation Forest and others.  \n[pyod](https:\u002F\u002Fpyod.readthedocs.io\u002Fen\u002Flatest\u002Fpyod.html) - Outlier Detection \u002F Anomaly Detection.  \n[eif](https:\u002F\u002Fgithub.com\u002Fsahandha\u002Feif) - Extended Isolation Forest.  \n[AnomalyDetection](https:\u002F\u002Fgithub.com\u002Ftwitter\u002FAnomalyDetection) - Anomaly detection (R package).  \n[luminol](https:\u002F\u002Fgithub.com\u002Flinkedin\u002Fluminol) - Anomaly Detection and Correlation library from LinkedIn.  
\nDistances for comparing histograms and detecting outliers - [Talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=U7xdiGc7IRU): [Kolmogorov-Smirnov](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy-0.14.0\u002Freference\u002Fgenerated\u002Fscipy.stats.ks_2samp.html), [Wasserstein](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.stats.wasserstein_distance.html), [Energy Distance (Cramer)](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.stats.energy_distance.html), [Kullback-Leibler divergence](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.special.kl_div.html).  \n[banpei](https:\u002F\u002Fgithub.com\u002Ftsurubee\u002Fbanpei) - Anomaly detection library based on singular spectrum transformation.  \n[telemanom](https:\u002F\u002Fgithub.com\u002Fkhundman\u002Ftelemanom) - Detect anomalies in multivariate time series data using LSTMs.  \n[luminaire](https:\u002F\u002Fgithub.com\u002Fzillow\u002Fluminaire) - Anomaly Detection for time series.  \n[rrcf](https:\u002F\u002Fgithub.com\u002FkLabUM\u002Frrcf) - Robust Random Cut Forest algorithm for anomaly detection on streams.  \n\n#### Concept Drift & Domain Shift\n[TorchDrift](https:\u002F\u002Fgithub.com\u002FTorchDrift\u002FTorchDrift) - Drift Detection for PyTorch Models.  \n[alibi-detect](https:\u002F\u002Fgithub.com\u002FSeldonIO\u002Falibi-detect) - Algorithms for outlier, adversarial and drift detection.  \n[evidently](https:\u002F\u002Fgithub.com\u002Fevidentlyai\u002Fevidently) - Evaluate and monitor ML models from validation to production.  \n[Lipton et al. - Detecting and Correcting for Label Shift with Black Box Predictors](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.03916).  \n[Bu et al. - A pdf-Free Change Detection Test Based on Density Difference Estimation](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F7745962).  
\n\n#### Ranking\n[lightning](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Flightning) - Large-scale linear classification, regression and ranking.  \n\n#### Causal Inference\n\n##### Texts\n[Chatton et al. - The Causal Cookbook: Recipes for Propensity Scores, G-Computation, and Doubly Robust Standardization](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002F10.1177\u002F25152459241236149)  \n[Statistical Rethinking](https:\u002F\u002Fgithub.com\u002Frmcelreath\u002Fstat_rethinking_2022) - Video Lecture Series, Bayesian Statistics, Causal Models, [R](https:\u002F\u002Fbookdown.org\u002Fcontent\u002F4857\u002F), [python](https:\u002F\u002Fgithub.com\u002Fpymc-devs\u002Fresources\u002Ftree\u002Fmaster\u002FRethinking_2), [numpyro1](https:\u002F\u002Fgithub.com\u002Fasuagar\u002Fstatrethink-course-numpyro-2019), [numpyro2](https:\u002F\u002Ffehiepsi.github.io\u002Frethinking-numpyro\u002F), [tensorflow-probability](https:\u002F\u002Fgithub.com\u002Fksachdeva\u002Frethinking-tensorflow-probability).  \n[Naimi et al. - An introduction to g methods](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC6074945\u002F)  \n[CS 594 Causal Inference and Learning](https:\u002F\u002Fwww.cs.uic.edu\u002F~elena\u002Fcourses\u002Ffall19\u002Fcs594cil.html)  \n[Marginal Effects Tutorial](https:\u002F\u002Fmarginaleffects.com\u002Fvignettes\u002Fgcomputation.html) - Marginal Effects, g-computation and more.  \n[Python Causality Handbook](https:\u002F\u002Fgithub.com\u002Fmatheusfacure\u002Fpython-causality-handbook)  \n[The Effect: An Introduction to Research Design and Causality](https:\u002F\u002Ftheeffectbook.net\u002Findex.html) - Book  \n[Structural Equation Modeling](https:\u002F\u002Fm-clark.github.io\u002Fsem\u002F) - Tutorial in R.  \n\n##### Tools\n[pecan](https:\u002F\u002Fpecan-tool.rpsychologist.com\u002F) - Online tool for building interactive perceived causal networks.  
\n[dagitty](https:\u002F\u002Fwww.dagitty.net\u002F) - Build causal DAG.  \n[dowhy](https:\u002F\u002Fgithub.com\u002Fpy-why\u002Fdowhy) - Estimate causal effects.  \n[CausalImpact](https:\u002F\u002Fgithub.com\u002Ftcassou\u002Fcausal_impact) - Causal Impact Analysis ([R package](https:\u002F\u002Fgoogle.github.io\u002FCausalImpact\u002FCausalImpact.html)).  \n[causallib](https:\u002F\u002Fgithub.com\u002FIBM\u002Fcausallib) - Modular causal inference analysis and model evaluations by IBM, [examples](https:\u002F\u002Fgithub.com\u002FIBM\u002Fcausallib\u002Ftree\u002Fmaster\u002Fexamples).  \n[causalml](https:\u002F\u002Fgithub.com\u002Fuber\u002Fcausalml) - Causal inference by Uber.  \n[upliftml](https:\u002F\u002Fgithub.com\u002Fbookingcom\u002Fupliftml) - Causal inference by Booking.com.  \n[causality](https:\u002F\u002Fgithub.com\u002Fakelleh\u002Fcausality) - Causal analysis using observational datasets.  \n[DoubleML](https:\u002F\u002Fgithub.com\u002FDoubleML\u002Fdoubleml-for-py) - Machine Learning + Causal inference, [Tweet](https:\u002F\u002Ftwitter.com\u002FChristophMolnar\u002Fstatus\u002F1574338002305880068), [Presentation](https:\u002F\u002Fscholar.princeton.edu\u002Fsites\u002Fdefault\u002Ffiles\u002Fbstewart\u002Ffiles\u002Ffelton.chern_.slides.20190318.pdf), [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1608.00060v1).  \n[EconML](https:\u002F\u002Fgithub.com\u002Fpy-why\u002FEconML) - Heterogeneous Treatment Effects Estimation by Microsoft.  
\n\n##### Papers\n[Bours - Confounding](https:\u002F\u002Fedisciplinas.usp.br\u002Fpluginfile.php\u002F5625667\u002Fmod_resource\u002Fcontent\u002F3\u002FNontechnicalexplanation-counterfactualdefinition-confounding.pdf)  \n[Bours - Effect Modification and Interaction](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS0895435621000330)  \n\n#### Probabilistic Modelling and Bayes\n[Intro](https:\u002F\u002Ferikbern.com\u002F2018\u002F10\u002F08\u002Fthe-hackers-guide-to-uncertainty-estimates.html), [Guide](https:\u002F\u002Fgithub.com\u002FCamDavidsonPilon\u002FProbabilistic-Programming-and-Bayesian-Methods-for-Hackers)  \n[PyMC3](https:\u002F\u002Fwww.pymc.io\u002Fprojects\u002Fdocs\u002Fen\u002Fstable\u002Flearn.html) - Bayesian modelling.  \n[numpyro](https:\u002F\u002Fgithub.com\u002Fpyro-ppl\u002Fnumpyro) - Probabilistic programming with numpy, built on [pyro](https:\u002F\u002Fgithub.com\u002Fpyro-ppl\u002Fpyro).  \n[pomegranate](https:\u002F\u002Fgithub.com\u002Fjmschrei\u002Fpomegranate) - Probabilistic modelling, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=dE5j6NW-Kzg).  \n[pmlearn](https:\u002F\u002Fgithub.com\u002Fpymc-learn\u002Fpymc-learn) - Probabilistic machine learning.  \n[arviz](https:\u002F\u002Fgithub.com\u002Farviz-devs\u002Farviz) - Exploratory analysis of Bayesian models.  \n[zhusuan](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002Fzhusuan) - Bayesian deep learning, generative models.  \n[edward](https:\u002F\u002Fgithub.com\u002Fblei-lab\u002Fedward) - Probabilistic modelling, inference, and criticism, [Mixture Density Networks (MDNs)](http:\u002F\u002Fedwardlib.org\u002Ftutorials\u002Fmixture-density-network), [MDN Explanation](https:\u002F\u002Ftowardsdatascience.com\u002Fa-hitchhikers-guide-to-mixture-density-networks-76b435826cca).  \n[Pyro](https:\u002F\u002Fgithub.com\u002Fpyro-ppl\u002Fpyro) - Deep Universal Probabilistic Programming.  
\n[TensorFlow probability](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fprobability) - Deep learning and probabilistic modelling, [talk1](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=KJxmC5GCWe4), [notebook talk1](https:\u002F\u002Fgithub.com\u002FAlxndrMlk\u002FPyDataGlobal2021\u002Fblob\u002Fmain\u002F00_PyData_Global_2021_nb_full.ipynb), [talk2](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=BrwKURU-wpk), [example](https:\u002F\u002Fgithub.com\u002FCamDavidsonPilon\u002FProbabilistic-Programming-and-Bayesian-Methods-for-Hackers\u002Fblob\u002Fmaster\u002FChapter1_Introduction\u002FCh1_Introduction_TFP.ipynb).  \n[bambi](https:\u002F\u002Fgithub.com\u002Fbambinos\u002Fbambi) - High-level Bayesian model-building interface on top of PyMC3.  \n[neural-tangents](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fneural-tangents) - Infinite Neural Networks.  \n[bnlearn](https:\u002F\u002Fgithub.com\u002Ferdogant\u002Fbnlearn) - Bayesian networks, parameter learning, inference and sampling methods.  \n\n#### Gaussian Processes\n[Visualization](http:\u002F\u002Fwww.infinitecuriosity.org\u002Fvizgp\u002F), [Article](https:\u002F\u002Fdistill.pub\u002F2019\u002Fvisual-exploration-gaussian-processes\u002F)  \n[GPyOpt](https:\u002F\u002Fgithub.com\u002FSheffieldML\u002FGPyOpt) - Gaussian process optimization.   \n[GPflow](https:\u002F\u002Fgithub.com\u002FGPflow\u002FGPflow) - Gaussian processes (TensorFlow).  \n[gpytorch](https:\u002F\u002Fgpytorch.ai\u002F) - Gaussian processes (PyTorch).  \n\n#### Stacking Models and Ensembles\n[Model Stacking Blog Post](http:\u002F\u002Fblog.kaggle.com\u002F2017\u002F06\u002F15\u002Fstacking-made-easy-an-introduction-to-stacknet-by-competitions-grandmaster-marios-michailidis-kazanova\u002F)  \n[mlxtend](https:\u002F\u002Fgithub.com\u002Frasbt\u002Fmlxtend) - `EnsembleVoteClassifier`, `StackingRegressor`, `StackingCVRegressor` for model stacking.  
\n[vecstack](https:\u002F\u002Fgithub.com\u002Fvecxoz\u002Fvecstack) - Stacking ML models.  \n[StackNet](https:\u002F\u002Fgithub.com\u002Fkaz-Anova\u002FStackNet) - Stacking ML models.  \n[mlens](https:\u002F\u002Fgithub.com\u002Fflennerhag\u002Fmlens) - Ensemble learning.  \n[combo](https:\u002F\u002Fgithub.com\u002Fyzhao062\u002Fcombo) - Combining ML models (stacking, ensembling).  \n\n#### Model Evaluation\n[evaluate](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate) - Evaluate machine learning models (huggingface).  \n[pycm](https:\u002F\u002Fgithub.com\u002Fsepandhaghighi\u002Fpycm) - Multi-class confusion matrix.  \n[pandas_ml](https:\u002F\u002Fgithub.com\u002Fpandas-ml\u002Fpandas-ml) - Confusion matrix.  \nPlotting learning curve: [link](http:\u002F\u002Fwww.ritchieng.com\u002Fmachinelearning-learning-curve\u002F).  \n[yellowbrick](http:\u002F\u002Fwww.scikit-yb.org\u002Fen\u002Flatest\u002Fapi\u002Fmodel_selection\u002Flearning_curve.html) - Learning curve.  \n[pyroc](https:\u002F\u002Fgithub.com\u002Fnoudald\u002Fpyroc) - Receiver Operating Characteristic (ROC) curves.  \n\n#### Model Uncertainty\n[awesome-conformal-prediction](https:\u002F\u002Fgithub.com\u002Fvaleman\u002Fawesome-conformal-prediction) - Uncertainty quantification.  \n[uncertainty-toolbox](https:\u002F\u002Fgithub.com\u002Funcertainty-toolbox\u002Funcertainty-toolbox) - Predictive uncertainty quantification, calibration, metrics, and visualization.  
\n\n#### Model Explanation, Interpretability, Feature Importance\n[Princeton - Reproducibility Crisis in ML‑based Science](https:\u002F\u002Fsites.google.com\u002Fprinceton.edu\u002Frep-workshop)   \n[Book](https:\u002F\u002Fchristophm.github.io\u002Finterpretable-ml-book\u002Fagnostic.html), [Examples](https:\u002F\u002Fgithub.com\u002Fjphall663\u002Finterpretable_machine_learning_with_python)  \nscikit-learn - [Permutation Importance](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.inspection.permutation_importance.html) (can be used on any trained classifier) and [Partial Dependence](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.inspection.partial_dependence.html)  \n[shap](https:\u002F\u002Fgithub.com\u002Fslundberg\u002Fshap) - Explain predictions of machine learning models, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=C80SQe16Rao), [Good Shap intro](https:\u002F\u002Fwww.aidancooper.co.uk\u002Fa-non-technical-guide-to-interpreting-shap-analyses\u002F).  \n[shapiq](https:\u002F\u002Fgithub.com\u002Fmmschlk\u002Fshapiq) - Shapley interaction quantification.  \n[treeinterpreter](https:\u002F\u002Fgithub.com\u002Fandosa\u002Ftreeinterpreter) - Interpreting scikit-learn's decision tree and random forest predictions.  \n[lime](https:\u002F\u002Fgithub.com\u002Fmarcotcr\u002Flime) - Explaining the predictions of any machine learning classifier, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=C80SQe16Rao), [Warning (Myth 7)](https:\u002F\u002Fcrazyoscarchang.github.io\u002F2019\u002F02\u002F16\u002Fseven-myths-in-machine-learning-research\u002F).  \n[lime_xgboost](https:\u002F\u002Fgithub.com\u002Fjphall663\u002Flime_xgboost) - Create LIMEs for XGBoost.  \n[eli5](https:\u002F\u002Fgithub.com\u002FTeamHG-Memex\u002Feli5) - Inspecting machine learning classifiers and explaining their predictions.  
\n[lofo-importance](https:\u002F\u002Fgithub.com\u002Faerdem4\u002Flofo-importance) - Leave One Feature Out Importance, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=zqsQ2ojj7sE).  \n[pybreakdown](https:\u002F\u002Fgithub.com\u002FMI2DataLab\u002FpyBreakDown) - Generate feature contribution plots.  \n[pycebox](https:\u002F\u002Fgithub.com\u002FAustinRochford\u002FPyCEbox) - Individual Conditional Expectation Plot Toolbox.  \n[pdpbox](https:\u002F\u002Fgithub.com\u002FSauceCat\u002FPDPbox) - Partial dependence plot toolbox, [example](https:\u002F\u002Fwww.kaggle.com\u002Fdansbecker\u002Fpartial-plots).  \n[partial_dependence](https:\u002F\u002Fgithub.com\u002Fnyuvis\u002Fpartial_dependence) - Visualize and cluster partial dependence.  \n[contrastive_explanation](https:\u002F\u002Fgithub.com\u002FMarcelRobeer\u002FContrastiveExplanation) - Contrastive explanations.  \n[DrWhy](https:\u002F\u002Fgithub.com\u002FModelOriented\u002FDrWhy) - Collection of tools for explainable AI.  \n[lucid](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Flucid) - Neural network interpretability.  \n[xai](https:\u002F\u002Fgithub.com\u002FEthicalML\u002FXAI) - An eXplainability toolbox for machine learning.  \n[innvestigate](https:\u002F\u002Fgithub.com\u002Falbermax\u002Finnvestigate) - A toolbox to investigate neural network predictions.  \n[dalex](https:\u002F\u002Fgithub.com\u002Fpbiecek\u002FDALEX) - Explanations for ML models (R package).  \n[interpretml](https:\u002F\u002Fgithub.com\u002Finterpretml\u002Finterpret) - Fit interpretable models, explain models.  \n[shapash](https:\u002F\u002Fgithub.com\u002FMAIF\u002Fshapash) - Model interpretability.  \n[imodels](https:\u002F\u002Fgithub.com\u002Fcsinva\u002Fimodels) - Interpretable ML package.  \n[captum](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fcaptum) - Model interpretability and understanding for PyTorch.  
\n\n#### Automated Machine Learning\n[AdaNet](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fadanet) - Automated machine learning based on TensorFlow.  \n[tpot](https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot) - Automated machine learning tool, optimizes machine learning pipelines.  \n[autokeras](https:\u002F\u002Fgithub.com\u002Fjhfjhfj1\u002Fautokeras) - AutoML for deep learning.  \n[nni](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002Fnni) - Toolkit for neural architecture search and hyper-parameter tuning by Microsoft.  \n[mljar](https:\u002F\u002Fgithub.com\u002Fmljar\u002Fmljar-supervised) - Automated machine learning.  \n[automl_zero](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Ftree\u002Fmaster\u002Fautoml_zero) - Automatically discover computer programs that can solve machine learning tasks from Google.  \n[AlphaPy](https:\u002F\u002Fgithub.com\u002FScottfreeLLC\u002FAlphaPy) - Automated Machine Learning using scikit-learn, xgboost, LightGBM and others.  \n\n#### Graph Representation Learning\n[Karate Club](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fkarateclub) - Unsupervised learning on graphs.   \n[PyTorch Geometric](https:\u002F\u002Fgithub.com\u002Frusty1s\u002Fpytorch_geometric) - Graph representation learning with PyTorch.   \n[DGL](https:\u002F\u002Fgithub.com\u002Fdmlc\u002Fdgl) - Graph representation learning with TensorFlow.   \n\n#### Convex optimization\n[cvxpy](https:\u002F\u002Fgithub.com\u002Fcvxgrp\u002Fcvxpy) - Modelling language for convex optimization problems. Tutorial: [1](https:\u002F\u002Fcalmcode.io\u002Fcvxpy-one\u002Fthe-stigler-diet.html), [2](https:\u002F\u002Fcalmcode.io\u002Fcvxpy-two\u002Fintroduction.html)  \n\n#### Evolutionary Algorithms & Optimization\n[deap](https:\u002F\u002Fgithub.com\u002FDEAP\u002Fdeap) - Evolutionary computation framework (Genetic Algorithm, Evolution strategies).  
\n[evol](https:\u002F\u002Fgithub.com\u002Fgodatadriven\u002Fevol) - DSL for composable evolutionary algorithms, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=68ABAU_V8qI&t=11m49s).  \n[platypus](https:\u002F\u002Fgithub.com\u002FProject-Platypus\u002FPlatypus) - Multiobjective optimization.  \n[autograd](https:\u002F\u002Fgithub.com\u002FHIPS\u002Fautograd) - Efficiently computes derivatives of numpy code.  \n[nevergrad](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnevergrad) - Derivative-free optimization.  \n[gplearn](https:\u002F\u002Fgplearn.readthedocs.io\u002Fen\u002Fstable\u002F) - Sklearn-like interface for genetic programming.  \n[blackbox](https:\u002F\u002Fgithub.com\u002Fpaulknysh\u002Fblackbox) - Optimization of expensive black-box functions.  \nOptometrist algorithm - [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-017-06645-7).  \n[DeepSwarm](https:\u002F\u002Fgithub.com\u002FPattio\u002FDeepSwarm) - Neural architecture search.  \n[evotorch](https:\u002F\u002Fgithub.com\u002Fnnaisense\u002Fevotorch) - Evolutionary computation library built on PyTorch.  \n\n#### Hyperparameter Tuning\n[sklearn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Findex.html) - [GridSearchCV](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.model_selection.GridSearchCV.html), [RandomizedSearchCV](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.model_selection.RandomizedSearchCV.html).  \n[sklearn-deap](https:\u002F\u002Fgithub.com\u002Frsteca\u002Fsklearn-deap) - Hyperparameter search using genetic algorithms.  \n[hyperopt](https:\u002F\u002Fgithub.com\u002Fhyperopt\u002Fhyperopt) - Hyperparameter optimization.  \n[hyperopt-sklearn](https:\u002F\u002Fgithub.com\u002Fhyperopt\u002Fhyperopt-sklearn) - Hyperopt + sklearn.  
\n[optuna](https:\u002F\u002Fgithub.com\u002Fpfnet\u002Foptuna) - Hyperparameter optimization, [Talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=tcrcLRopTX0).  \n[skopt](https:\u002F\u002Fscikit-optimize.github.io\u002F) - `BayesSearchCV` for Hyperparameter search.  \n[tune](https:\u002F\u002Fray.readthedocs.io\u002Fen\u002Flatest\u002Ftune.html) - Hyperparameter search with a focus on deep learning and deep reinforcement learning.  \n[bbopt](https:\u002F\u002Fgithub.com\u002Fevhub\u002Fbbopt) - Black box hyperparameter optimization.  \n[dragonfly](https:\u002F\u002Fgithub.com\u002Fdragonfly\u002Fdragonfly) - Scalable Bayesian optimisation.  \n[botorch](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fbotorch) - Bayesian optimization in PyTorch.  \n[ax](https:\u002F\u002Fgithub.com\u002Ffacebook\u002FAx) - Adaptive Experimentation Platform by Facebook.  \n[lightning-hpo](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flightning-hpo) - Hyperparameter optimization based on optuna.  \n\n#### Incremental Learning, Online Learning\nsklearn - [PassiveAggressiveClassifier](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.linear_model.PassiveAggressiveClassifier.html), [PassiveAggressiveRegressor](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.linear_model.PassiveAggressiveRegressor.html).  \n[river](https:\u002F\u002Fgithub.com\u002Fonline-ml\u002Friver) - Online machine learning.  \n[Kaggler](https:\u002F\u002Fgithub.com\u002Fjeongyoonlee\u002FKaggler) - Online Learning algorithms.  \n\n#### Active Learning\n[Talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0efyjq5rWS4)  \n[modAL](https:\u002F\u002Fgithub.com\u002FmodAL-python\u002FmodAL) - Active learning framework.  
\n\n#### Reinforcement Learning\n[YouTube](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT), [YouTube](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLqYmG7hTraZDNJre23vqCGIVpfZ_K2RZs)  \nIntro to Monte Carlo Tree Search (MCTS) - [1](https:\u002F\u002Fjeffbradberry.com\u002Fposts\u002F2015\u002F09\u002Fintro-to-monte-carlo-tree-search\u002F), [2](http:\u002F\u002Fmcts.ai\u002Fabout\u002Findex.html), [3](https:\u002F\u002Fmedium.com\u002F@quasimik\u002Fmonte-carlo-tree-search-applied-to-letterpress-34f41c86e238)  \nAlphaZero methodology - [1](https:\u002F\u002Fgithub.com\u002FAppliedDataSciencePartners\u002FDeepReinforcementLearning), [2](https:\u002F\u002Fweb.stanford.edu\u002F~surag\u002Fposts\u002Falphazero.html), [3](https:\u002F\u002Fgithub.com\u002Fsuragnair\u002Falpha-zero-general), [Cheat Sheet](https:\u002F\u002Fmedium.com\u002Fapplied-data-science\u002Falphago-zero-explained-in-one-diagram-365f5abf67e0)  \n[RLLib](https:\u002F\u002Fray.readthedocs.io\u002Fen\u002Flatest\u002Frllib.html) - Library for reinforcement learning.  \n[Horizon](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FHorizon\u002F) - Facebook RL framework.  \n\n#### Deployment and Lifecycle Management\n\n##### Workflow Scheduling and Orchestration\n[nextflow](https:\u002F\u002Fgithub.com\u002Fgoodwright\u002Fnextflow.py) - Run scripts and workflow graphs in Docker image using Google Life Sciences, AWS Batch, [Website](https:\u002F\u002Fgithub.com\u002Fnextflow-io\u002Fnextflow).   \n[airflow](https:\u002F\u002Fgithub.com\u002Fapache\u002Fairflow) - Schedule and monitor workflows.  \n[prefect](https:\u002F\u002Fgithub.com\u002FPrefectHQ\u002Fprefect) - Python specific workflow scheduling.  \n[dagster](https:\u002F\u002Fgithub.com\u002Fdagster-io\u002Fdagster) - Development, production and observation of data assets.  \n[ploomber](https:\u002F\u002Fgithub.com\u002Fploomber\u002Fploomber) - Workflow orchestration.  
\n[kestra](https:\u002F\u002Fgithub.com\u002Fkestra-io\u002Fkestra) - Workflow orchestration.  \n[cml](https:\u002F\u002Fgithub.com\u002Fiterative\u002Fcml) - CI\u002FCD for Machine Learning Projects.  \n[rocketry](https:\u002F\u002Fgithub.com\u002FMiksus\u002Frocketry) - Task scheduling.  \n[huey](https:\u002F\u002Fgithub.com\u002Fcoleifer\u002Fhuey) - Task queue.  \n\n##### Containerization and Docker\n[Reduce size of docker images (video)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Z1Al4I4Os_A)  \n[Optimize Docker Image Size](https:\u002F\u002Fwww.augmentedmind.de\u002F2022\u002F02\u002F06\u002Foptimize-docker-image-size\u002F)  \n[cog](https:\u002F\u002Fgithub.com\u002Freplicate\u002Fcog) - Facilitates building Docker images.  \n\n##### Data Versioning, Databases, Pipelines and Model Serving\n[dvc](https:\u002F\u002Fgithub.com\u002Fiterative\u002Fdvc) - Version control for large files.  \n[kedro](https:\u002F\u002Fgithub.com\u002Fquantumblacklabs\u002Fkedro) - Build data pipelines.  \n[feast](https:\u002F\u002Fgithub.com\u002Ffeast-dev\u002Ffeast) - Feature store. [Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=_omcXenypmo).  \n[pgvector](https:\u002F\u002Fgithub.com\u002Fpgvector\u002Fpgvector) - Vector similarity search for Postgres.  \n[pinecone](https:\u002F\u002Fwww.pinecone.io\u002F) - Database for vector search applications.  \n[truss](https:\u002F\u002Fgithub.com\u002Fbasetenlabs\u002Ftruss) - Serve ML models.  \n[milvus](https:\u002F\u002Fgithub.com\u002Fmilvus-io\u002Fmilvus) - Vector database for similarity search.  \n[mlem](https:\u002F\u002Fgithub.com\u002Fiterative\u002Fmlem) - Version and deploy your ML models following GitOps principles.  \n\n##### Data Science Related\n[m2cgen](https:\u002F\u002Fgithub.com\u002FBayesWitnesses\u002Fm2cgen) - Transpile trained ML models into other languages.  
\n[sklearn-porter](https:\u002F\u002Fgithub.com\u002Fnok\u002Fsklearn-porter) - Transpile trained scikit-learn estimators to C, Java, JavaScript and others.  \n[mlflow](https:\u002F\u002Fmlflow.org\u002F) - Manage the machine learning lifecycle, including experimentation, reproducibility and deployment.  \n[skll](https:\u002F\u002Fgithub.com\u002FEducationalTestingService\u002Fskll) - Command-line utilities to make it easier to run machine learning experiments.  \n[BentoML](https:\u002F\u002Fgithub.com\u002Fbentoml\u002FBentoML) - Package and deploy machine learning models for serving in production.  \n[dagster](https:\u002F\u002Fgithub.com\u002Fdagster-io\u002Fdagster) - Tool with focus on dependency graphs.  \n[knockknock](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fknockknock) - Be notified when your training ends.  \n[metaflow](https:\u002F\u002Fgithub.com\u002FNetflix\u002Fmetaflow) - Lifecycle Management Tool by Netflix.  \n[cortex](https:\u002F\u002Fgithub.com\u002Fcortexlabs\u002Fcortex) - Deploy machine learning models.  \n[Neptune](https:\u002F\u002Fneptune.ai) - Experiment tracking and model registry.  \n[clearml](https:\u002F\u002Fgithub.com\u002Fallegroai\u002Fclearml) - Experiment Manager, MLOps and Data-Management.  \n[polyaxon](https:\u002F\u002Fgithub.com\u002Fpolyaxon\u002Fpolyaxon) - MLOps.  \n[sematic](https:\u002F\u002Fgithub.com\u002Fsematic-ai\u002Fsematic) - Deploy machine learning models.  \n[zenml](https:\u002F\u002Fgithub.com\u002Fzenml-io\u002Fzenml) - MLOps.  
\n\n#### Math and Background\n[All kinds of math and statistics resources](https:\u002F\u002Frealnotcomplex.com\u002F)  \nGilbert Strang - [Linear Algebra](https:\u002F\u002Focw.mit.edu\u002Fcourses\u002Fmathematics\u002F18-06-linear-algebra-spring-2010\u002Findex.htm)  \nGilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machine Learning\n](https:\u002F\u002Focw.mit.edu\u002Fcourses\u002Fmathematics\u002F18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018\u002F)  \n\n#### Resources\n[Distill.pub](https:\u002F\u002Fdistill.pub\u002F) - Blog.   \n[Machine Learning Videos](https:\u002F\u002Fgithub.com\u002Fdustinvtran\u002Fml-videos)  \n[Data Science Notebooks](https:\u002F\u002Fgithub.com\u002Fdonnemartin\u002Fdata-science-ipython-notebooks)  \n[Recommender Systems (Microsoft)](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FRecommenders)  \n[Datascience Cheatsheets](https:\u002F\u002Fgithub.com\u002FFavioVazquez\u002Fds-cheatsheets)   \n\n##### Guidelines \n[datasharing](https:\u002F\u002Fgithub.com\u002Fjtleek\u002Fdatasharing) - Guide to data sharing.  
\n\n##### Books\n[Blum - Foundations of Data Science](https:\u002F\u002Fwww.cs.cornell.edu\u002Fjeh\u002Fbook.pdf?file=book.pdf)  \n[Chan - Introduction to Probability for Data Science](https:\u002F\u002Fprobability4datascience.com\u002Findex.html)  \n[Colonescu - Principles of Econometrics with R](https:\u002F\u002Fbookdown.org\u002Fccolonescu\u002FRPoE4\u002F)  \n[Rafael Irizarry - Introduction to Data Science](https:\u002F\u002Frafalab.dfci.harvard.edu\u002Fdsbook-part-1\u002F) (R Language)  \n[Rafael Irizarry - Advanced Data Science](https:\u002F\u002Frafalab.dfci.harvard.edu\u002Fdsbook-part-2\u002F) (R Language)  \n\n##### Other Awesome Lists\n[Awesome Adversarial Machine Learning](https:\u002F\u002Fgithub.com\u002Fyenchenlin\u002Fawesome-adversarial-machine-learning)    \n[Awesome AI Bookmarks](https:\u002F\u002Fgithub.com\u002Fgoodrahstar\u002Fmy-awesome-AI-bookmarks)    \n[Awesome AI on Kubernetes](https:\u002F\u002Fgithub.com\u002FCognonicLabs\u002Fawesome-AI-kubernetes)    \n[Awesome Big Data](https:\u002F\u002Fgithub.com\u002Fonurakpolat\u002Fawesome-bigdata)    \n[Awesome Biological Image Analysis](https:\u002F\u002Fgithub.com\u002Fhallvaaw\u002Fawesome-biological-image-analysis)  \n[Awesome Business Machine Learning](https:\u002F\u002Fgithub.com\u002Ffirmai\u002Fbusiness-machine-learning)    \n[Awesome Causality](https:\u002F\u002Fgithub.com\u002Frguo12\u002Fawesome-causality-algorithms)    \n[Awesome Community Detection](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fawesome-community-detection)    \n[Awesome CSV](https:\u002F\u002Fgithub.com\u002FsecretGeek\u002FAwesomeCSV)  \n[Awesome Cytodata](https:\u002F\u002Fgithub.com\u002Fcytodata\u002Fawesome-cytodata)  \n[Awesome Data Science](https:\u002F\u002Fgithub.com\u002Facademic\u002Fawesome-datascience)  \n[Awesome Data Science with Ruby](https:\u002F\u002Fgithub.com\u002Farbox\u002Fdata-science-with-ruby)   \n[Awesome Dash](https:\u002F\u002Fgithub.com\u002Fucg8j\u002Fawesome-dash)   
\n[Awesome Decision Trees](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fawesome-decision-tree-papers)    \n[Awesome Deep Learning](https:\u002F\u002Fgithub.com\u002FChristosChristofidis\u002Fawesome-deep-learning)   \n[Awesome ETL](https:\u002F\u002Fgithub.com\u002Fpawl\u002Fawesome-etl)   \n[Awesome Financial Machine Learning](https:\u002F\u002Fgithub.com\u002Ffirmai\u002Ffinancial-machine-learning)   \n[Awesome Fraud Detection](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fawesome-fraud-detection-papers)   \n[Awesome GAN Applications](https:\u002F\u002Fgithub.com\u002Fnashory\u002Fgans-awesome-applications)   \n[Awesome Graph Classification](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fawesome-graph-classification)   \n[Awesome Industry Machine Learning](https:\u002F\u002Fgithub.com\u002Ffirmai\u002Findustry-machine-learning)  \n[Awesome Gradient Boosting](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fawesome-gradient-boosting-papers)   \n[Awesome Learning with Label Noise](https:\u002F\u002Fgithub.com\u002Fsubeeshvasu\u002FAwesome-Learning-with-Label-Noise)  \n[Awesome Machine Learning](https:\u002F\u002Fgithub.com\u002Fjosephmisiti\u002Fawesome-machine-learning#python)    \n[Awesome Machine Learning Books](http:\u002F\u002Fmatpalm.com\u002Fblog\u002Fcool_machine_learning_books\u002F)  \n[Awesome Machine Learning Interpretability](https:\u002F\u002Fgithub.com\u002Fjphall663\u002Fawesome-machine-learning-interpretability)     \n[Awesome Machine Learning Operations](https:\u002F\u002Fgithub.com\u002FEthicalML\u002Fawesome-machine-learning-operations)   \n[Awesome Monte Carlo Tree Search](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fawesome-monte-carlo-tree-search-papers)   \n[Awesome MLOps](https:\u002F\u002Fgithub.com\u002Fkelvins\u002Fawesome-mlops)  \n[Awesome Neural Network 
Visualization](https:\u002F\u002Fgithub.com\u002Fashishpatel26\u002FTools-to-Design-or-Visualize-Architecture-of-Neural-Network)  \n[Awesome Online Machine Learning](https:\u002F\u002Fgithub.com\u002FMaxHalford\u002Fawesome-online-machine-learning)  \n[Awesome Pipeline](https:\u002F\u002Fgithub.com\u002Fpditommaso\u002Fawesome-pipeline)  \n[Awesome Public APIs](https:\u002F\u002Fgithub.com\u002Fpublic-apis\u002Fpublic-apis)  \n[Awesome Public Datasets](https:\u002F\u002Fgithub.com\u002Fawesomedata\u002Fawesome-public-datasets)  \n[Awesome Python](https:\u002F\u002Fgithub.com\u002Fvinta\u002Fawesome-python)   \n[Awesome Python Data Science](https:\u002F\u002Fgithub.com\u002Fkrzjoa\u002Fawesome-python-datascience)   \n[Awesome Python Data Science](https:\u002F\u002Fgithub.com\u002Fthomasjpfan\u002Fawesome-python-data-science)  \n[Awesome Pytorch](https:\u002F\u002Fgithub.com\u002Fbharathgs\u002FAwesome-pytorch-list)  \n[Awesome Quantitative Finance](https:\u002F\u002Fgithub.com\u002Fwilsonfreitas\u002Fawesome-quant)  \n[Awesome Recommender Systems](https:\u002F\u002Fgithub.com\u002Fgrahamjenson\u002Flist_of_recommender_systems)  \n[Awesome Satellite Benchmark Datasets](https:\u002F\u002Fgithub.com\u002FSeyed-Ali-Ahmadi\u002FAwesome_Satellite_Benchmark_Datasets)  \n[Awesome Satellite Image for Deep Learning](https:\u002F\u002Fgithub.com\u002Fsatellite-image-deep-learning\u002Ftechniques)  \n[Awesome Single Cell](https:\u002F\u002Fgithub.com\u002Fseandavi\u002Fawesome-single-cell)  \n[Awesome Semantic Segmentation](https:\u002F\u002Fgithub.com\u002Fmrgloom\u002Fawesome-semantic-segmentation)  \n[Awesome Sentence Embedding](https:\u002F\u002Fgithub.com\u002FSeparius\u002Fawesome-sentence-embedding)  \n[Awesome Visual Attentions](https:\u002F\u002Fgithub.com\u002FMenghaoGuo\u002FAwesome-Vision-Attentions)  \n[Awesome Visual Transformer](https:\u002F\u002Fgithub.com\u002Fdk-liang\u002FAwesome-Visual-Transformer)  \n\n#### Lectures\n[NYU Deep Learning 
SP21](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLLHTzKZzVU9e6xUfG10TkTWApKSZCzuBI) - YouTube Playlist.   \n\n#### Things I google a lot\n[Color Codes](https:\u002F\u002Fgithub.com\u002Fd3\u002Fd3-3.x-api-reference\u002Fblob\u002Fmaster\u002FOrdinal-Scales.md#categorical-colors)  \n[Frequency codes for time series](https:\u002F\u002Fpandas.pydata.org\u002Fpandas-docs\u002Fstable\u002Ftimeseries.html#offset-aliases)  \n[Date parsing codes](https:\u002F\u002Fdocs.python.org\u002F3\u002Flibrary\u002Fdatetime.html#strftime-and-strptime-behavior)  \n\n## Contributing  \nDo you know a package that should be on this list? Did you spot a package that is no longer maintained and should be removed from this list? Then feel free to read the [contribution guidelines](CONTRIBUTING.md) and submit your pull request or create a new issue.  \n\n## License\n[![CC0](http:\u002F\u002Fmirrors.creativecommons.org\u002Fpresskit\u002Fbuttons\u002F88x31\u002Fsvg\u002Fcc-zero.svg)](https:\u002F\u002Fcreativecommons.org\u002Fpublicdomain\u002Fzero\u002F1.0\u002F)\n","# Awesome Data Science with Python\n\n> A curated list of high-quality resources for practicing data science with Python, covering not only libraries but also links to tutorials, code snippets, blog posts and talks.  \n\n#### Core\n[pandas](https:\u002F\u002Fpandas.pydata.org\u002F) - Data structures built on top of [numpy](https:\u002F\u002Fwww.numpy.org\u002F).  \n[scikit-learn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002F) - Core machine learning library, [intelex](https:\u002F\u002Fgithub.com\u002Fintel\u002Fscikit-learn-intelex).  \n[matplotlib](https:\u002F\u002Fmatplotlib.org\u002F) - Plotting library.  \n[seaborn](https:\u002F\u002Fseaborn.pydata.org\u002F) - Data visualization library based on matplotlib.  \n[ydata-profiling](https:\u002F\u002Fgithub.com\u002Fydataai\u002Fydata-profiling) - Descriptive statistics using `ProfileReport`.  \n[sklearn_pandas](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fsklearn-pandas) - Provides the helpful `DataFrameMapper` class.  \n[missingno](https:\u002F\u002Fgithub.com\u002FResidentMario\u002Fmissingno) - Missing data visualization.  
\n[rainbow-csv](https:\u002F\u002Fmarketplace.visualstudio.com\u002Fitems?itemName=mechatroner.rainbow-csv) - VSCode 插件，可将 .csv 文件以彩色显示。  \n\n#### Python 编程通用工具\n[高级 Python 特性](https:\u002F\u002Fblog.edward-li.com\u002Ftech\u002Fadvanced-python-features\u002F) - 泛型、协议、结构化模式匹配等。  \n[uv](https:\u002F\u002Fgithub.com\u002Fastral-sh\u002Fuv) - 依赖管理工具。  \n[pdm](https:\u002F\u002Fpdm-project.org\u002Fen\u002Flatest\u002F) - 适用于大型二进制分发的工具，与 uv 配合使用。  \n[just](https:\u002F\u002Fgithub.com\u002Fcasey\u002Fjust) - 命令运行器，替代 make。  \n[python-dotenv](https:\u002F\u002Fgithub.com\u002Ftheskumar\u002Fpython-dotenv) - 管理环境变量。  \n[structlog](https:\u002F\u002Fgithub.com\u002Fhynek\u002Fstructlog) - Python 日志记录工具。  \n[more_itertools](https:\u002F\u002Fmore-itertools.readthedocs.io\u002Fen\u002Flatest\u002F) - itertools 的扩展库。  \n[tqdm](https:\u002F\u002Fgithub.com\u002Ftqdm\u002Ftqdm) - 用于 for 循环的进度条，也支持 [pandas apply()](https:\u002F\u002Fstackoverflow.com\u002Fa\u002F34365537\u002F1820480)。  \n[hydra](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhydra) - 配置管理工具。  \n\n#### Pandas 技巧、替代方案及扩展\n[duckdb](https:\u002F\u002Fgithub.com\u002Fduckdb\u002Fduckdb) - 可高效地在 pandas DataFrame 上执行 SQL 查询，[duckplyr](https:\u002F\u002Fgithub.com\u002Ftidyverse\u002Fduckplyr\u002F) 适用于 R，[精彩介绍](https:\u002F\u002Fcodecut.ai\u002Fdeep-dive-into-duckdb-data-scientists\u002F)。  \n[ducklake](https:\u002F\u002Fgithub.com\u002Fduckdb\u002Fducklake) - Duckdb 的扩展，用于将数据存储在数据湖中。  \n[fireducks](https:\u002F\u002Fgithub.com\u002Ffireducks-dev\u002Ffireducks) - 具有类似 API 的更快替代方案。  \n[pandasvault](https:\u002F\u002Fgithub.com\u002Ffirmai\u002Fpandasvault) - 大量 pandas 技巧集合。  \n[polars](https:\u002F\u002Fgithub.com\u002Fpola-rs\u002Fpolars) - 多线程版本的 pandas 替代品。  \n[xarray](https:\u002F\u002Fgithub.com\u002Fpydata\u002Fxarray\u002F) - 将 pandas 扩展到 n 维数组。  \n[mlx](https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx) - 面向 Apple 芯片的数组框架。  
\n[pandas_flavor](https:\u002F\u002Fgithub.com\u002FZsailer\u002Fpandas_flavor) - 用于编写自定义访问器，如 `.str` 和 `.dt`。  \n[daft](https:\u002F\u002Fgithub.com\u002FEventual-Inc\u002FDaft) - 分布式 DataFrame。  \n[vaex](https:\u002F\u002Fgithub.com\u002Fvaexio\u002Fvaex) - 外存 DataFrame。  \n[modin](https:\u002F\u002Fgithub.com\u002Fmodin-project\u002Fmodin) - 用于加速 pandas `DataFrame` 的并行化库。  \n[swifter](https:\u002F\u002Fgithub.com\u002Fjmcarpenter2\u002Fswifter) - 更快地对 pandas DataFrame 应用任意函数（可与 modin 结合使用）。  \n\n#### 表格工具\n[great-tables](https:\u002F\u002Fgithub.com\u002Fposit-dev\u002Fgreat-tables) - 以美观方式展示表格数据。  \n\n#### 交互式 DataFrame 可视化\n[pygwalker](https:\u002F\u002Fgithub.com\u002FKanaries\u002Fpygwalker) - 交互式 DataFrame。  \n[marimo](https:\u002F\u002Fgithub.com\u002Fmarimo-team\u002Fmarimo) - 可视化与可复现的工作环境。  \n[lux](https:\u002F\u002Fgithub.com\u002Flux-org\u002Flux) - 在 Jupyter 中进行 DataFrame 可视化。  \n[dtale](https:\u002F\u002Fgithub.com\u002Fman-group\u002Fdtale) - 查看和分析 Pandas 数据结构，并与 Jupyter 集成。  \n[pandasgui](https:\u002F\u002Fgithub.com\u002Fadamerose\u002Fpandasgui) - 用于查看、绘图和分析 Pandas DataFrame 的 GUI。  \n[quak](https:\u002F\u002Fgithub.com\u002Fmanzt\u002Fquak) - 可扩展的交互式数据表，[推特](https:\u002F\u002Fx.com\u002Ftrevmanz\u002Fstatus\u002F1816760923949809982)。  \n[data-formulator](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fdata-formulator) - 数据可视化工具。  \n\n\n#### 环境与 Jupyter\n[Jupyter 技巧](https:\u002F\u002Fwww.dataquest.io\u002Fblog\u002Fjupyter-notebook-tips-tricks-shortcuts\u002F)  \n[nteract](https:\u002F\u002Fnteract.io\u002F) - 双击即可打开 Jupyter 笔记本。  \n[papermill](https:\u002F\u002Fgithub.com\u002Fnteract\u002Fpapermill) - 参数化并执行 Jupyter 笔记本，[教程](https:\u002F\u002Fpbpython.com\u002Fpapermil-rclone-report-1.html)。  \n[nbdime](https:\u002F\u002Fgithub.com\u002Fjupyter\u002Fnbdime) - 比较两个笔记本文件，替代 GitHub 应用：[ReviewNB](https:\u002F\u002Fwww.reviewnb.com\u002F)。  \n[RISE](https:\u002F\u002Fgithub.com\u002Fdamianavila\u002FRISE) - 将 Jupyter 笔记本转换为演示文稿。  
\n[handcalcs](https:\u002F\u002Fgithub.com\u002Fconnorferster\u002Fhandcalcs) - 在 Jupyter 中更便捷地书写数学公式。  \n[notebooker](https:\u002F\u002Fgithub.com\u002Fman-group\u002Fnotebooker) - 将 Jupyter 笔记本生产化并安排调度。  \n[voila](https:\u002F\u002Fgithub.com\u002FQuantStack\u002Fvoila) - 将 Jupyter 笔记本转化为独立的 Web 应用程序。[Voila 网格布局](https:\u002F\u002Fgithub.com\u002Fvoila-dashboards\u002Fvoila-gridstack)。  \n\n#### Jupyter 替代方案\n[positron](https:\u002F\u002Fgithub.com\u002Fposit-dev\u002Fpositron) - 数据科学 IDE。  \n[Deepnote](https:\u002F\u002Fdeepnote.com) - 支持实时协作和环境管理的数据科学平台。  \n\n#### 文本提取 + OCR\n[textract](https:\u002F\u002Fgithub.com\u002Fdeanmalmgren\u002Ftextract) - 从任何文档中提取文本。  \n[docling](https:\u002F\u002Fgithub.com\u002Fdocling-project\u002Fdocling) - 文本提取工具。  \n[DeepSeek-OCR](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-OCR) - OCR 工具。  \n[chandra](https:\u002F\u002Fgithub.com\u002Fdatalab-to\u002Fchandra) - OCR 工具。\n\n#### 大数据\n[Spark](https:\u002F\u002Fdocs.databricks.com\u002Fspark\u002Flatest\u002Fdataframes-datasets\u002Fintroduction-to-dataframes-python.html#work-with-dataframes) - 用于大数据的`DataFrame`，[速查表](https:\u002F\u002Fgist.github.com\u002Fcrawles\u002Fb47e23da8218af0b9bd9d47f5242d189)，[教程](https:\u002F\u002Fgithub.com\u002Fericxiao251\u002Fspark-syntax)。  \n[Dask](https:\u002F\u002Fgithub.com\u002Fdask\u002Fdask)，[Dask-ML](http:\u002F\u002Fml.dask.org\u002F) - 适用于大数据和机器学习的Pandas `DataFrame`库，[资源](https:\u002F\u002Fmatthewrocklin.com\u002Fblog\u002F\u002Fwork\u002F2018\u002F07\u002F17\u002Fdask-dev)，[演讲1](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ccfsbuqsjgI)，[演讲2](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=RA_2qdipVng)，[笔记本](https:\u002F\u002Fgithub.com\u002Fdask\u002Fdask-ec2\u002Ftree\u002Fmaster\u002Fnotebooks)，[视频](https:\u002F\u002Fwww.youtube.com\u002Fuser\u002Fmdrocklin)。  \n[H2O](https:\u002F\u002Fgithub.com\u002Fh2oai\u002Fh2o-3) - 提供有助于处理超出内存限制的数据框的`H2OFrame`类。  \n[cuDF](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcudf) - GPU 
数据帧库，[简介](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=6XzS5XcpicM&t=2m50s)。  \n[Cupy](https:\u002F\u002Fgithub.com\u002Fcupy\u002Fcupy) - 基于CUDA加速的类似NumPy的API。  \n[Ray](https:\u002F\u002Fgithub.com\u002Fray-project\u002Fray\u002F) - 灵活、高性能的分布式执行框架。  \n[Bottleneck](https:\u002F\u002Fgithub.com\u002Fkwgoodman\u002Fbottleneck) - 用C语言编写的快速NumPy数组函数。  \n[Petastorm](https:\u002F\u002Fgithub.com\u002Fuber\u002Fpetastorm) - Uber开发的Parquet文件数据访问库。  \n[Zarr](https:\u002F\u002Fgithub.com\u002Fzarr-developers\u002Fzarr-python) - 分布式NumPy数组。  \n[NVTabular](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNVTabular) - NVIDIA推出的表格数据特征工程与预处理库。  \n[TensorStore](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Ftensorstore) - 用于读写大型多维数组（由Google开发）。  \n\n#### 命令行工具，CSV\n[CSVkit](https:\u002F\u002Fgithub.com\u002Fwireservice\u002Fcsvkit) - CSV文件的命令行工具。  \n[csvsort](https:\u002F\u002Fpypi.org\u002Fproject\u002Fcsvsort\u002F) - 用于排序大型CSV文件。  \n\n#### 经典统计学\n\n##### 书籍\n[Lakens - 改善你的统计推断](https:\u002F\u002Flakens.github.io\u002Fstatistical_inferences\u002F) - 涵盖假设检验、效应量、置信区间、样本量、等价性检验、序贯分析等内容，[GitHub](https:\u002F\u002Fgithub.com\u002FLakens\u002Fstatistical_inferences)  \n[模型揭秘](https:\u002F\u002Fm-clark.github.io\u002Fbook-of-models\u002F) - 从线性回归到深度学习。[GitHub](https:\u002F\u002Fgithub.com\u002Fm-clark\u002Fbook-of-models)。  \n[人工智能背后的数学](https:\u002F\u002Fwww.freecodecamp.org\u002Fnews\u002Fthe-math-behind-artificial-intelligence-book) - 一本以工程为导向的书籍，涵盖线性代数、微积分、概率与统计以及优化理论，并配有Python示例。  \n\n##### 数据集\n[Rdatasets](https:\u002F\u002Fvincentarelbundock.github.io\u002FRdatasets\u002Farticles\u002Fdata.html) - 包含超过2000个数据集的集合，以CSV文件形式存储（R包）。  \n[crimedatasets](https:\u002F\u002Flightbluetitan.github.io\u002Fcrimedatasets\u002F) - 专注于犯罪和刑事活动的数据集（R包）。  \n[educationr](https:\u002F\u002Flightbluetitan.github.io\u002Feducationr\u002F) - 与教育相关的数据集（如表现、学习方法、考试成绩、缺勤情况）（R包）。  \n[MedDataSets](https:\u002F\u002Flightbluetitan.github.io\u002Fmeddatasets\u002Findex.html) - 与医学、疾病、治疗、药物及公共卫生相关的数据集（R包）。  
\n[oncodatasets](https:\u002F\u002Flightbluetitan.github.io\u002Foncodatasets\u002F) - 专注于癌症研究、生存率、遗传学研究、生物标志物及流行病学的数据集（R包）。  \n[timeseriesdatasets_R](https:\u002F\u002Flightbluetitan.github.io\u002Ftimeseriesdatasets_R\u002F) - 时间序列数据集（R包）。  \n[usdatasets](https:\u002F\u002Flightbluetitan.github.io\u002Fusdatasets\u002F) - 仅限美国的数据集（犯罪、经济、教育、金融、能源、医疗保健等）（R包）。  \n[economic datasets](https:\u002F\u002Fcaptgouda24.github.io\u002Fnicholas-decker.github.io\u002Fdatasets.html) - 经济相关数据集。  \n\n##### p值\n[美国统计学会关于p值的声明：背景、过程与目的](https:\u002F\u002Famstat.tandfonline.com\u002Fdoi\u002Ffull\u002F10.1080\u002F00031305.2016.1154108#.Vt2XIOaE2MN)  \n[Greenland - 统计检验、p值、置信区间与功效：误读指南](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC4877414\u002F)  \n[Rubin - 不一致的多重检验校正：使用家族误差率推断单个假设的谬误](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS2590260124000067?via%3Dihub)  \n[Gigerenzer - 无脑统计](https:\u002F\u002Flibrary.mpib-berlin.mpg.de\u002Fft\u002Fgg\u002FGG_Mindless_2004.pdf)  \n[Rubin - 这不是双侧检验！而是两个单侧检验！(TOST)](https:\u002F\u002Frss.onlinelibrary.wiley.com\u002Fdoi\u002Ffull\u002F10.1111\u002F1740-9713.01405)  \n[Lakens - 我们本应如何超越p \u003C .05？为何没有做到？](https:\u002F\u002Ferrorstatistics.com\u002F2024\u002F07\u002F01\u002Fguest-post-daniel-lakens-how-were-we-supposed-to-move-beyond-p-05-and-why-didnt-we-thoughts-on-abandon-statistical-significance-5-years-on\u002F)  \n[McShane等 - 放弃统计显著性](https:\u002F\u002Fwww.tandfonline.com\u002Fdoi\u002Ffull\u002F10.1080\u002F00031305.2018.1527253)  \n[Ho等 - 超越p值：基于估计图形的数据分析](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F333884529_Moving_beyond_P_values_data_analysis_with_estimation_graphics)  \n[Lakens - p值的概率与检验功效的关系](https:\u002F\u002Fdaniellakens.blogspot.com\u002F2014\u002F05\u002Fthe-probability-of-p-values-as-function.html) - p值分布呈右偏态，且随着检验功效的提高，偏态会更加明显。  \n\n##### 相关性\n[猜一猜相关性](https:\u002F\u002Fwww.guessthecorrelation.com\u002F) - 一个猜测相关性的游戏。  
\n[phik](https:\u002F\u002Fgithub.com\u002Fkaveio\u002Fphik) - 用于计算分类、有序和区间变量之间的相关性。  \n[hoeffd](https:\u002F\u002Fsearch.r-project.org\u002FCRAN\u002Frefmans\u002FHmisc\u002Fhtml\u002Fhoeffd.html) - Hoeffding's D统计量，用于衡量变量间的依赖关系（R包）。  \n\n##### 置信区间\n[Morey - 对置信区间抱有信任的谬误](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.3758\u002Fs13423-015-0947-8)\n\n##### 软件包\n[statsmodels](https:\u002F\u002Fwww.statsmodels.org\u002Fstable\u002Findex.html) - 统计检验。  \n[linearmodels](https:\u002F\u002Fgithub.com\u002Fbashtage\u002Flinearmodels) - 工具变量和面板数据模型。  \n[nomograms](https:\u002F\u002Fhbiostat.org\u002Fbbr\u002Frmsintro.html#nomograms-overall-depiction-of-fitted-models) - 线性模型的可视化工具，[解释](https:\u002F\u002Fstats.stackexchange.com\u002Fa\u002F155433\u002F285504)（rms R 包的一部分）  \n[pingouin](https:\u002F\u002Fgithub.com\u002Fraphaelvallat\u002Fpingouin) - 统计检验。[Pandas DataFrame 列之间的成对相关性](https:\u002F\u002Fpingouin-stats.org\u002Fgenerated\u002Fpingouin.pairwise_corr.html)   \n[scipy.stats](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fstats.html#statistical-tests) - 统计检验。  \n[scikit-posthocs](https:\u002F\u002Fgithub.com\u002Fmaximtrp\u002Fscikit-posthocs) - 用于成对多重比较的统计事后检验。   \nBland-Altman 图 [1](https:\u002F\u002Fpingouin-stats.org\u002Fgenerated\u002Fpingouin.plot_blandaltman.html), [2](http:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.graphics.agreement.mean_diff_plot.html) - 用于展示两种测量方法之间一致性的图表。  \n[ANOVA](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.stats.f_oneway.html)  \n[StatCheck](https:\u002F\u002Fstatcheck.steveharoz.com\u002F) - 从文章中提取统计数据并重新计算 p 值（R 包）。  \n[tost](https:\u002F\u002Fpingouin-stats.org\u002Fbuild\u002Fhtml\u002Fgenerated\u002Fpingouin.tost.html) - 等效性检验中的双单侧检验（TOST）。  \n[DABEST-python](https:\u002F\u002Fgithub.com\u002FACCLAB\u002FDABEST-python) - 均值差异图。    \n[Durga](https:\u002F\u002Fgithub.com\u002FKhanKawsar\u002FEstimationPlot) - 
均值差异图（R 包）。  \n\n##### 效应量\n[MOTE 效应量计算器](https:\u002F\u002Fwww.aggieerin.com\u002Fshiny-server\u002F) - [Shiny 应用程序](https:\u002F\u002Fdoomlab.shinyapps.io\u002Fmote\u002F)，[R 包](https:\u002F\u002Fgithub.com\u002Fdoomlab\u002FMOTE)  \n[从前测-后测对照组设计中估计效应量](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002Fepdf\u002F10.1177\u002F1094428106291059) - Scott B. Morris，[Twitter](https:\u002F\u002Ftwitter.com\u002FMatthewBJane\u002Fstatus\u002F1742588609025200557)    \n\n##### 统计检验\n[test_proportions_2indep](https:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.stats.proportion.test_proportions_2indep.html) - 比例检验。  \n[G 检验](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FG-test) - 卡方检验的替代方法，[power_divergence](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.stats.power_divergence.html)。  \n\n##### 比较两个总体\n[torch-two-sample](https:\u002F\u002Fgithub.com\u002Fjosipd\u002Ftorch-two-sample) - Friedman-Rafsky 检验：基于游程检验（runs test）的多变量推广来比较两个总体。[解释](https:\u002F\u002Fwww.real-statistics.com\u002Fmultivariate-statistics\u002Fmultivariate-normal-distribution\u002Ffriedman-rafsky-test\u002F)，[应用](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC5014134\u002F)  \n\n##### 功效与样本量计算\n[pwrss](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fpwrss\u002Findex.html) - 统计功效与样本量计算工具（R 包），[t 检验教程](https:\u002F\u002Frpubs.com\u002Fmetinbulus\u002Fwelch)  \n\n##### 期中分析 \u002F 序贯分析 \u002F 停止规则\n[Stop Early Stopping](https:\u002F\u002Fstop-early-stopping.osc.garden\u002F) - 优秀的可视化工具。  \n[序贯分析](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FSequential_analysis) - 维基百科。  \n[sequential](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002FSequential\u002FSequential.pdf) - 泊松分布和二项分布数据的精确序贯分析（R 包）。  \n[confseq](https:\u002F\u002Fgithub.com\u002Fgostevehoward\u002Fconfseq) - 统一边界、置信序列以及始终有效的 p 值。  \n\n##### 可视化\n[Friends don't let friends make certain types of data 
visualization](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends)  \n[关于可视化的大纲](https:\u002F\u002Ftextvis.lnu.se\u002F)  \n[1 个数据集，100 种可视化方式](https:\u002F\u002F100.datavizproject.com\u002F)  \n[依赖概率](https:\u002F\u002Fstatic.laszlokorte.de\u002Fstochastic\u002F)  \n[零假设显著性检验 (NHST) 和样本量计算](https:\u002F\u002Frpsychologist.com\u002Fd3\u002FNHST\u002F)  \n[estimationstats](https:\u002F\u002Fwww.estimationstats.com\u002F) - 在线工具，用于可视化均值差异、效应量（Cohen's d）等。  \n[样本量\u002F持续时间计算器](https:\u002F\u002Fcalculator.osc.garden\u002F)  \n[相关性](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fcorrelation\u002F)  \n[Cohen's d](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fcohend\u002F)  \n[置信区间](https:\u002F\u002Frpsychologist.com\u002Fd3\u002FCI\u002F)  \n[等效性、非劣效性和优效性检验](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fequivalence\u002F)  \n[贝叶斯两样本 t 检验](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fbayes\u002F)  \n[比较两组时 p 值的分布](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fpdist\u002F)  \n[理解 t 分布及其正态近似](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Ftdist\u002F)  \n[统计功效与样本量计算工具](https:\u002F\u002Fpwrss.shinyapps.io\u002Findex\u002F)  \n\n##### Tidy Tuesday\n[使用 ggplot2 进行数据可视化的艺术，《TidyTuesday》食谱](https:\u002F\u002Fnrennie.rbind.io\u002Fart-of-viz\u002F)  \n[数据可视化的最佳实践](https:\u002F\u002Froyal-statistical-society.github.io\u002Fdatavisguide\u002F)  \n[tidytuesday](https:\u002F\u002Fgithub.com\u002Frfordatascience\u002Ftidytuesday) - 每周的可视化挑战，提供大量公开数据集供练习。  \n[z3tt\u002FTidyTuesday](https:\u002F\u002Fgithub.com\u002Fz3tt\u002FTidyTuesday) - 优秀的图表（R）。  \n[nrennie\u002Ftidytuesday](https:\u002F\u002Fgithub.com\u002Fnrennie\u002Ftidytuesday) - 优秀的图表（R）。  \n[poncest\u002Ftidytuesday](https:\u002F\u002Fgithub.com\u002Fponcest\u002Ftidytuesday) - 优秀的图表（R）。  \n\n##### 讲座\n[逆倾向得分加权](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=SUq0shKLPPs)  \n[通过基于倾向的特征选择处理选择偏倚](https:\u002F\u002Fwww.youtube.com\u002Fwatch?reload=9&v=3ZWCKr0vDtc)\n\n##### 
文本\n[众数、中位数与均值：一种统一的视角](https:\u002F\u002Fwww.johnmyleswhite.com\u002Fnotebook\u002F2013\u002F03\u002F22\u002Fmodes-medians-and-means-an-unifying-perspective\u002F)   \n[利用范数理解线性回归](https:\u002F\u002Fwww.johnmyleswhite.com\u002Fnotebook\u002F2013\u002F03\u002F22\u002Fusing-norms-to-understand-linear-regression\u002F)   \n[验证线性模型的假设](https:\u002F\u002Fgithub.com\u002Ferykml\u002Fmedium_articles\u002Fblob\u002Fmaster\u002FStatistics\u002Flinear_regression_assumptions.ipynb)  \n[中介与调节简介](https:\u002F\u002Fademos.people.uic.edu\u002FChapter14.html)  \n[Montgomery 等 - 对处理后变量进行条件化如何毁掉你的实验以及应对之策](https:\u002F\u002Fcpb-us-e1.wpmucdn.com\u002Fsites.dartmouth.edu\u002Fdist\u002F5\u002F2293\u002Ffiles\u002F2021\u002F03\u002Fpost-treatment-bias.pdf)  \n[Lindeløv - 常见的统计检验其实都是线性模型](https:\u002F\u002Flindeloev.github.io\u002Ftests-as-linear\u002F)    \n[Chatruc - 中心极限定理及其误用](https:\u002F\u002Fweb.archive.org\u002Fweb\u002F20191229234155\u002Fhttps:\u002F\u002Flambdaclass.com\u002Fdata_etudes\u002Fcentral_limit_theorem_misuse\u002F)  \n[Al-Saleh - 课堂上很少提及的标准差性质](http:\u002F\u002Fwww.stat.tugraz.at\u002FAJS\u002Fausg093\u002F093Al-Saleh.pdf)   \n[Wainer - 最危险的方程](http:\u002F\u002Fnsmn1.uh.edu\u002Fdgraur\u002Fniv\u002Fthemostdangerousequation.pdf)  \n[Gigerenzer - 行为经济学中的偏见偏见](https:\u002F\u002Fwww.nowpublishers.com\u002Farticle\u002FDetails\u002FRBE-0092)  \n[Cook - 估计尚未发生之事的概率](https:\u002F\u002Fwww.johndcook.com\u002Fblog\u002F2010\u002F03\u002F30\u002Fstatistical-rule-of-three\u002F)  \n[相同统计量，不同图形：通过模拟退火生成外观各异但统计特征相同的数据集](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F316652618_Same_Stats_Different_Graphs_Generating_Datasets_with_Varied_Appearance_and_Identical_Statistics_through_Simulated_Annealing), [YouTube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DbJyPELmhJc)  \n[大数定律中的“足够大”究竟是多少？](https:\u002F\u002Fthepalindrome.org\u002Fp\u002Fhow-large-that-number-in-the-law)  \n[检方谬误](https:\u002F\u002Fwww.cebm.ox.ac.uk\u002Fnews\u002Fviews\u002Fthe-prosecutors-fallacy)  
\n[邓宁-克鲁格效应就是自相关](https:\u002F\u002Feconomicsfromthetopdown.com\u002F2022\u002F04\u002F08\u002Fthe-dunning-kruger-effect-is-autocorrelation\u002F)  \n[Rafi、Greenland - 帮助统计科学的语义与认知工具：用相容性和惊讶度取代置信度和显著性](https:\u002F\u002Fbmcmedresmethodol.biomedcentral.com\u002Farticles\u002F10.1186\u002Fs12874-020-01105-9)   \n[Carlin 等 - 关于回归模型的用途与滥用：呼吁改革统计实践与教学](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.06668)  \n[Chen、Roth - 含零的对数？一些问题与解决方案](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.06080)  \n[Wigboldus 等 - 鼓励玩转数据，反对可疑的报告做法](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.1007\u002Fs11336-015-9445-1)  \n[Simmons 等 - 假阳性心理学：数据收集与分析中的未披露灵活性使得任何结果都能被呈现为显著](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002F10.1177\u002F0956797611417632?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed)  \n[Zhang - 科学结果中的一种可预测性错觉：即使是专家也会混淆推断不确定性与结果变异性](https:\u002F\u002Fwww.pnas.org\u002Fdoi\u002F10.1073\u002Fpnas.2302491120) - 图1展示了推断不确定性与结果变异性之间的区别。  \n\n#### 评估\n[Collins 等 - 临床预测模型的评估（第1部分）：从开发到外部验证](https:\u002F\u002Fwww.bmj.com\u002Fcontent\u002F384\u002Fbmj-2023-074819.full) - [Twitter](https:\u002F\u002Ftwitter.com\u002FGSCollins\u002Fstatus\u002F1744309712995098624)    \n\n#### 流行病学\n[Lesko 等 - 描述性流行病学框架](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC10144679\u002F)  \n[R Epidemics Consortium](https:\u002F\u002Fwww.repidemicsconsortium.org\u002Fprojects\u002F) - 用于处理流行病学数据的大型工具套件（R包）。[GitHub](https:\u002F\u002Fgithub.com\u002Freconhub)   \n[incidence2](https:\u002F\u002Fgithub.com\u002Freconhub\u002Fincidence2) - 发病率的计算、处理、可视化及简单建模（R包）。  \n[EpiEstim](https:\u002F\u002Fgithub.com\u002Fmrc-ide\u002FEpiEstim) - 在流行病期间估计随时间变化的瞬时再生数 R（R包），[论文](https:\u002F\u002Facademic.oup.com\u002Faje\u002Farticle\u002F178\u002F9\u002F1505\u002F89262)。  \n[researchpy](https:\u002F\u002Fgithub.com\u002Fresearchpy\u002Fresearchpy) - 提供有用的`summary_cont()`函数，用于汇总统计（表1）。  
\n[zEpid](https:\u002F\u002Fgithub.com\u002Fpzivich\u002FzEpid) - 流行病学分析包，[教程](https:\u002F\u002Fgithub.com\u002Fpzivich\u002FPython-for-Epidemiologists)。  \n[tipr](https:\u002F\u002Fgithub.com\u002FLucyMcGowan\u002Ftipr) - 针对未测量混杂因素的敏感性分析（R包）。  \n[quartets](https:\u002F\u002Fgithub.com\u002Fr-causal\u002Fquartets) - Anscombe 四重奏、因果四重奏、Datasaurus Dozen等（R包）。    \n[episensr](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fepisensr\u002Fvignettes\u002Fepisensr.html) - 流行病学数据的定量偏倚分析（即模拟不同偏倚来源可能产生的影响）（R包）。  \n\n#### 机器学习教程\n[统计推断与回归](https:\u002F\u002Fmattblackwell.github.io\u002Fgov2002-book\u002F)  \n[Python中的应用机器学习](https:\u002F\u002Fgeostatsguy.github.io\u002FMachineLearningDemos_Book\u002Fintro.html)  \n[用于视觉识别的卷积神经网络](https:\u002F\u002Fcs231n.github.io\u002F) - 斯坦福大学计算机科学课程。  \n[机器学习算法直觉入门](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=7o9TMQAHgkQ&list=PLNeXFnYrCJneoY_rKtWJy833YiMrCRi5f&index=1) - 系列讲座。  \n\n#### 探索与清洗\n[检查清单](https:\u002F\u002Fgithub.com\u002Fr0f1\u002Fml_checklist)。  \n[pyjanitor](https:\u002F\u002Fgithub.com\u002Fpyjanitor-devs\u002Fpyjanitor) - 清理混乱的列名。  \n[skimpy](https:\u002F\u002Fgithub.com\u002Faeturrell\u002Fskimpy) - 创建数据框的汇总统计信息。提供有用的`clean_columns()`函数。  \n[pandera](https:\u002F\u002Fgithub.com\u002Funionai-oss\u002Fpandera) - 数据\u002F模式验证。  \n[dataframely](https:\u002F\u002Fgithub.com\u002FQuantco\u002Fdataframely) - 数据\u002F模式验证。  \n[pointblank](https:\u002F\u002Fgithub.com\u002Fposit-dev\u002Fpointblank) - 数据\u002F模式验证。  \n[impyute](https:\u002F\u002Fgithub.com\u002Feltonlaw\u002Fimpyute) - 插补。  \n[fancyimpute](https:\u002F\u002Fgithub.com\u002Fiskandr\u002Ffancyimpute) - 矩阵补全与插补算法。  \n[imbalanced-learn](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fimbalanced-learn) - 不平衡数据集的重采样。  \n[tspreprocess](https:\u002F\u002Fgithub.com\u002FMaxBenChrist\u002Ftspreprocess) - 时间序列预处理：去噪、压缩、重采样。  \n[Kaggler](https:\u002F\u002Fgithub.com\u002Fjeongyoonlee\u002FKaggler) - 实用函数（如`OneHotEncoder(min_obs=100)`）。  
\n[skrub](https:\u002F\u002Fgithub.com\u002Fskrub-data\u002Fskrub)——弥合表格型数据源与机器学习模型之间的差距。\n\n#### 噪声标签\n[cleanlab](https:\u002F\u002Fgithub.com\u002Fcleanlab\u002Fcleanlab) - 用于处理噪声标签的机器学习工具，能够识别错误标注的数据并进行不确定性量化。也可参阅下方的优秀列表。  \n[doubtlab](https:\u002F\u002Fgithub.com\u002Fkoaning\u002Fdoubtlab) - 用于发现不良或噪声标签。\n\n#### 训练\u002F测试集划分\n[iterative-stratification](https:\u002F\u002Fgithub.com\u002Ftrent-b\u002Fiterative-stratification) - 多标签数据的分层采样方法。\n\n#### 特征工程\n[Vincent Warmerdam: Untitled12.ipynb](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=yXGCKqo5cEY) - 使用 df.pipe()  \n[Vincent Warmerdam: 用简单甚至线性模型取胜](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=68ABAU_V8qI)  \n[sklearn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.pipeline.Pipeline.html) - 管道，[示例](https:\u002F\u002Fgithub.com\u002Fjem1031\u002Fpandas-pipelines-custom-transformers)。  \n[pdpipe](https:\u002F\u002Fgithub.com\u002Fshaypal5\u002Fpdpipe) - 适用于 DataFrame 的管道工具。  \n[scikit-lego](https:\u002F\u002Fgithub.com\u002Fkoaning\u002Fscikit-lego) - 用于管道的自定义转换器。  \n[categorical-encoding](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fcategorical-encoding) - 分类变量编码，[vtreat (R 包)](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fvtreat\u002Fvignettes\u002Fvtreat.html)。  \n[patsy](https:\u002F\u002Fgithub.com\u002Fpydata\u002Fpatsy\u002F) - 类似 R 的统计模型语法。  \n[mlxtend](https:\u002F\u002Frasbt.github.io\u002Fmlxtend\u002Fuser_guide\u002Ffeature_extraction\u002FLinearDiscriminantAnalysis\u002F) - LDA。  \n[featuretools](https:\u002F\u002Fgithub.com\u002FFeaturetools\u002Ffeaturetools) - 自动化特征工程，[示例](https:\u002F\u002Fgithub.com\u002FWillKoehrsen\u002Fautomated-feature-engineering\u002Fblob\u002Fmaster\u002Fwalk_through\u002FAutomated_Feature_Engineering.ipynb)。  \n[tsfresh](https:\u002F\u002Fgithub.com\u002Fblue-yonder\u002Ftsfresh) - 时间序列特征工程。  \n[temporian](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Ftemporian) - 谷歌推出的时间序列特征工程工具。  
\n[pypeln](https:\u002F\u002Fgithub.com\u002Fcgarciae\u002Fpypeln) - 并发数据管道。  \n[feature-engine](https:\u002F\u002Fgithub.com\u002Ffeature-engine\u002Ffeature_engine) - 编码器、转换器等。\n\n#### 特征选择\n[综述论文](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS016794731930194X)，[讲座](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=JsArBz46_3s)，[仓库](https:\u002F\u002Fgithub.com\u002FYimeng-Zhang\u002Ffeature-engineering-and-feature-selection)    \n博客系列 - [1](http:\u002F\u002Fblog.datadive.net\u002Fselecting-good-features-part-i-univariate-selection\u002F)，[2](http:\u002F\u002Fblog.datadive.net\u002Fselecting-good-features-part-ii-linear-models-and-regularization\u002F)，[3](http:\u002F\u002Fblog.datadive.net\u002Fselecting-good-features-part-iii-random-forests\u002F)，[4](http:\u002F\u002Fblog.datadive.net\u002Fselecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side\u002F)  \n教程 - [1](https:\u002F\u002Fwww.kaggle.com\u002Fresidentmario\u002Fautomated-feature-selection-with-sklearn)，[2](https:\u002F\u002Fmachinelearningmastery.com\u002Ffeature-selection-machine-learning-python\u002F)  \n[sklearn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fclasses.html#module-sklearn.feature_selection) - 特征选择。  \n[eli5](https:\u002F\u002Feli5.readthedocs.io\u002Fen\u002Flatest\u002Fblackbox\u002Fpermutation_importance.html#feature-selection) - 基于排列重要性的特征选择。  \n[scikit-feature](https:\u002F\u002Fgithub.com\u002Fjundongl\u002Fscikit-feature) - 特征选择算法。  \n[stability-selection](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fstability-selection) - 稳定性选择。  \n[scikit-rebate](https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Fscikit-rebate) - 基于 Relief 的特征选择算法。  \n[scikit-genetic](https:\u002F\u002Fgithub.com\u002Fmanuel-calzolari\u002Fsklearn-genetic) - 遗传特征选择。  \n[boruta_py](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fboruta_py) - 
特征选择，[解释](https:\u002F\u002Fstats.stackexchange.com\u002Fquestions\u002F264360\u002Fboruta-all-relevant-feature-selection-vs-random-forest-variables-of-importanc\u002F264467)，[示例](https:\u002F\u002Fwww.kaggle.com\u002Ftilii7\u002Fboruta-feature-elimination)。  \n[Boruta-Shap](https:\u002F\u002Fgithub.com\u002FEkeany\u002FBoruta-Shap) - Boruta 特征选择算法结合 Shapley 值。  \n[linselect](https:\u002F\u002Fgithub.com\u002Fefavdb\u002Flinselect) - 特征选择工具包。  \n[mlxtend](https:\u002F\u002Frasbt.github.io\u002Fmlxtend\u002Fuser_guide\u002Ffeature_selection\u002FExhaustiveFeatureSelector\u002F) - 穷举式特征选择。     \n[BoostARoota](https:\u002F\u002Fgithub.com\u002Fchasedehan\u002FBoostARoota) - XGBoost 特征选择算法。  \n[INVASE](https:\u002F\u002Fgithub.com\u002Fjsyoon0823\u002FINVASE) - 基于神经网络的实例级变量选择。  \n[SubTab](https:\u002F\u002Fgithub.com\u002FAstraZeneca\u002FSubTab) - 用于自监督表示学习的表格数据特征子集选取，由阿斯利康开发。  \n[mrmr](https:\u002F\u002Fgithub.com\u002Fsmazzanti\u002Fmrmr) - 最大相关最小冗余特征选择，[官网](http:\u002F\u002Fhome.penglab.com\u002Fproj\u002FmRMR\u002F)。  \n[arfs](https:\u002F\u002Fgithub.com\u002FThomasBury\u002Farfs) - 全部相关特征选择。  \n[VSURF](https:\u002F\u002Fgithub.com\u002Frobingenuer\u002FVSURF) - 使用随机森林进行变量选择（R 包），[文档](https:\u002F\u002Fwww.rdocumentation.org\u002Fpackages\u002FVSURF\u002Fversions\u002F1.1.0\u002Ftopics\u002FVSURF)。  \n[FeatureSelectionGA](https:\u002F\u002Fgithub.com\u002Fkaushalshetty\u002FFeatureSelectionGA) - 基于遗传算法的特征选择。\n\n#### 子集选择\n[apricot](https:\u002F\u002Fgithub.com\u002Fjmschrei\u002Fapricot) - 快速选择数据子集以训练机器学习模型。  \n[ducks](https:\u002F\u002Fgithub.com\u002Fmanimino\u002Fducks) - 为任意字段组合创建索引，实现快速查找。\n\n#### 降维 \u002F 表示学习\n\n##### 选择\n同时请参考聚类部分和自监督学习部分以获取更多思路！  \n[综述](https:\u002F\u002Fmembers.loria.fr\u002Fmoberger\u002FEnseignement\u002FAVR\u002FExposes\u002FTR_Dimensiereductie.pdf)  \n  \nPCA - [链接](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.decomposition.PCA.html)    \n自编码器 - 
[链接](https:\u002F\u002Fblog.keras.io\u002Fbuilding-autoencoders-in-keras.html)  \nIsomap - [链接](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.manifold.Isomap.html#sklearn.manifold.Isomap)    \nLLE - [链接](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.manifold.LocallyLinearEmbedding.html)  \n力导向图绘制 - [链接](https:\u002F\u002Fscanpy.readthedocs.io\u002Fen\u002Fstable\u002Fapi\u002Fscanpy.tl.draw_graph.html#scanpy.tl.draw_graph)    \nMDS - [链接](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.manifold.MDS.html)  \n扩散图 - [链接](https:\u002F\u002Fscanpy.readthedocs.io\u002Fen\u002Fstable\u002Fapi\u002Fscanpy.tl.diffmap.html)  \nt-SNE - [链接](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.manifold.TSNE.html#sklearn.manifold.TSNE)    \nNeRV - [链接](https:\u002F\u002Fgithub.com\u002Fziyuang\u002Fpynerv)，[论文](https:\u002F\u002Fwww.jmlr.org\u002Fpapers\u002Fvolume11\u002Fvenna10a\u002Fvenna10a.pdf)  \nMDR - [链接](https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Fscikit-mdr)  \nUMAP - [链接](https:\u002F\u002Fgithub.com\u002Flmcinnes\u002Fumap)  \n随机投影 - [链接](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Frandom_projection.html)  \nIvis - [链接](https:\u002F\u002Fgithub.com\u002Fberingresearch\u002Fivis)   \nSimCLR - [链接](https:\u002F\u002Fgithub.com\u002Flightly-ai\u002Flightly)  \npymde - 基于 PyTorch 的最小失真嵌入，[链接](https:\u002F\u002Fgithub.com\u002Fcvxgrp\u002Fpymde)\n\n##### 基于神经网络的\n[esvit](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fesvit) - 用于表征学习的视觉Transformer（微软）。  \n[MCML](https:\u002F\u002Fgithub.com\u002Fpachterlab\u002FMCML) - 多类别、多标签数据（测序数据）的半监督降维 [论文](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2021.08.25.457696v1)。  \n\n##### 软件包\n[PCA的危险性（论文）](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-022-14395-4)。  
\n[PCA中的假振荡现象](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.06.20.545619v1.full)。  \n[替代PCA的方法](https:\u002F\u002Fwww.pnas.org\u002Fdoi\u002F10.1073\u002Fpnas.2319169120)。  \n[讲座](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=9iol3Lk6kyU)，[t-SNE简介](https:\u002F\u002Fdistill.pub\u002F2016\u002Fmisread-tsne\u002F)。  \n[sklearn.manifold](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fclasses.html#module-sklearn.manifold) 和 [sklearn.decomposition](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fclasses.html#module-sklearn.decomposition) - PCA、t-SNE、MDS、Isomap等。  \nPCA的附加图表：因子载荷图、累计方差解释率图、[相关性圆图](http:\u002F\u002Frasbt.github.io\u002Fmlxtend\u002Fuser_guide\u002Fplotting\u002Fplot_pca_correlation_graph\u002F)、[推文](https:\u002F\u002Ftwitter.com\u002Frasbt\u002Fstatus\u002F1555999903398219777\u002Fphoto\u002F1)。  \n[sklearn.random_projection](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Frandom_projection.html) - Johnson-Lindenstrauss引理、高斯随机投影、稀疏随机投影。  \n[sklearn.cross_decomposition](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fcross_decomposition.html#cross-decomposition) - 偏最小二乘法，用于降维和回归的有监督估计器。  \n[prince](https:\u002F\u002Fgithub.com\u002FMaxHalford\u002Fprince) - 降维、因子分析（PCA、MCA、CA、FAMD）。  \n更快的t-SNE实现：[tsne-cuda](https:\u002F\u002Fgithub.com\u002FCannyLab\u002Ftsne-cuda)、[MulticoreTSNE](https:\u002F\u002Fgithub.com\u002FDmitryUlyanov\u002FMulticore-TSNE)、[lvdmaaten](https:\u002F\u002Flvdmaaten.github.io\u002Ftsne\u002F)。  \n[umap](https:\u002F\u002Fgithub.com\u002Flmcinnes\u002Fumap) - 均匀流形近似与投影，[讲座](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=nq6iPZVUxZU)、[探索工具](https:\u002F\u002Fgithub.com\u002FGrantCuster\u002Fumap-explorer)、[解释](https:\u002F\u002Fpair-code.github.io\u002Funderstanding-umap\u002F)、[并行版本](https:\u002F\u002Fdocs.rapids.ai\u002Fapi\u002Fcuml\u002Fstable\u002Fapi.html)。  \n[humap](https:\u002F\u002Fgithub.com\u002Fwilsonjr\u002Fhumap) - 分层UMAP。  
\n[sleepwalk](https:\u002F\u002Fgithub.com\u002Fanders-biostat\u002Fsleepwalk\u002F) - 探索嵌入空间，交互式可视化（R包）。  \n[somoclu](https:\u002F\u002Fgithub.com\u002Fpeterwittek\u002Fsomoclu) - 自组织映射。  \n[scikit-tda](https:\u002F\u002Fgithub.com\u002Fscikit-tda\u002Fscikit-tda) - 拓扑数据分析，[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fsrep01236)、[讲座](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=F2t_ytTLrQ4)、[讲座](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=AWoeBzJd7uQ)、[论文](https:\u002F\u002Fwww.uncg.edu\u002Fmat\u002Ffaculty\u002Fcdsmyth\u002Ftopological-approaches-skin.pdf)。  \n[giotto-tda](https:\u002F\u002Fgithub.com\u002Fgiotto-ai\u002Fgiotto-tda) - 拓扑数据分析。  \n[ivis](https:\u002F\u002Fgithub.com\u002Fberingresearch\u002Fivis) - 使用暹罗网络进行降维。  \n[trimap](https:\u002F\u002Fgithub.com\u002Feamid\u002Ftrimap) - 使用三元组进行降维。  \n[scanpy](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fscanpy) - [力导向图绘制](https:\u002F\u002Fscanpy.readthedocs.io\u002Fen\u002Fstable\u002Fapi\u002Fscanpy.tl.draw_graph.html#scanpy.tl.draw_graph)、[扩散图](https:\u002F\u002Fscanpy.readthedocs.io\u002Fen\u002Fstable\u002Fapi\u002Fscanpy.tl.diffmap.html)。  \n[direpack](https:\u002F\u002Fgithub.com\u002FSvenSerneels\u002Fdirepack) - 投影寻踪、充分降维、稳健M估计量。  \n[DBS](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002FDatabionicSwarm\u002Fvignettes\u002FDatabionicSwarm.html) - DatabionicSwarm（R包）。  \n[contrastive](https:\u002F\u002Fgithub.com\u002Fabidlabs\u002Fcontrastive) - 对比PCA。  \n[scPCA](https:\u002F\u002Fgithub.com\u002FPhilBoileau\u002FscPCA) - 稀疏对比PCA（R包）。  \n[generalized_contrastive_PCA](https:\u002F\u002Fgithub.com\u002FSjulsonLab\u002Fgeneralized_contrastive_PCA) - 广义对比PCA。  \n[tmap](https:\u002F\u002Fgithub.com\u002Freymond-group\u002Ftmap) - 面向大型高维数据集的可视化库。  \n[lollipop](https:\u002F\u002Fgithub.com\u002Fneurodata\u002Flollipop) - 线性最优低秩投影。  \n[linearsdr](https:\u002F\u002Fgithub.com\u002FHarrisQ\u002Flinearsdr) - 线性充分降维（R包）。  
\n[PHATE](https:\u002F\u002Fgithub.com\u002FKrishnaswamyLab\u002FPHATE) - 用于可视化高维数据的工具。  \n[datamapplot](https:\u002F\u002Fgithub.com\u002FTutteInstitute\u002Fdatamapplot) - 用于可视化高维数据的工具。\n\n#### 可视化\n[所有图表](https:\u002F\u002Fdatavizproject.com\u002F)  \n[physt](https:\u002F\u002Fgithub.com\u002Fjanpipek\u002Fphyst) - 更好的直方图，[演讲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ZG-wH3-Up9Y)，[笔记本](https:\u002F\u002Fnbviewer.jupyter.org\u002Fgithub\u002Fjanpipek\u002Fpydata2018-berlin\u002Fblob\u002Fmaster\u002Fnotebooks\u002Ftalk.ipynb)。  \n[fast-histogram](https:\u002F\u002Fgithub.com\u002Fastrofrog\u002Ffast-histogram) - 高速直方图。  \n[matplotlib_venn](https:\u002F\u002Fgithub.com\u002Fkonstantint\u002Fmatplotlib-venn) - 文氏图。  \n[penrose](https:\u002F\u002Fgithub.com\u002Fpenrose\u002Fpenrose) - 文氏图。  \n[ridgeplot](https:\u002F\u002Fgithub.com\u002Ftpvasconcelos\u002Fridgeplot) - 山脊图。  \n[镶嵌图](https:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.graphics.mosaicplot.mosaic.html) - 分类变量可视化，[示例](https:\u002F\u002Fsukhbinder.wordpress.com\u002F2018\u002F09\u002F18\u002Fmosaic-plot-in-python\u002F)。  \n[yellowbrick](https:\u002F\u002Fgithub.com\u002FDistrictDataLabs\u002Fyellowbrick) - 用于机器学习模型的可视化工具（类似于 scikit-plot）。  \n[bokeh](https:\u002F\u002Fgithub.com\u002Fbokeh\u002Fbokeh) - 交互式可视化库，[示例](https:\u002F\u002Fbokeh.pydata.org\u002Fen\u002Flatest\u002Fdocs\u002Fuser_guide\u002Fserver.html)，[示例](https:\u002F\u002Fgithub.com\u002FWillKoehrsen\u002FBokeh-Python-Visualization)。  \n[lets-plot](https:\u002F\u002Fgithub.com\u002FJetBrains\u002Flets-plot) - 绘图库。  \n[plotnine](https:\u002F\u002Fgithub.com\u002Fhas2k1\u002Fplotnine) - Python 版的 ggplot。  \n[altair](https:\u002F\u002Fgithub.com\u002Fvega\u002Faltair) - 声明式的统计可视化库。  \n[hvplot](https:\u002F\u002Fgithub.com\u002Fpyviz\u002Fhvplot) - 构建在 [holoviews](http:\u002F\u002Fholoviews.org\u002F) 之上的高级绘图库。  \n[dtreeviz](https:\u002F\u002Fgithub.com\u002Fparrt\u002Fdtreeviz) - 决策树可视化与模型解释工具。  
\n[mpl-scatter-density](https:\u002F\u002Fgithub.com\u002Fastrofrog\u002Fmpl-scatter-density) - 散点密度图。是二维直方图的替代方案。  \n[ComplexHeatmap](https:\u002F\u002Fgithub.com\u002Fjokergoo\u002FComplexHeatmap) - 用于多维基因组数据的复杂热图（R 包）。  \n[morpheus](https:\u002F\u002Fsoftware.broadinstitute.org\u002Fmorpheus\u002F) - Broad Institute 的矩阵可视化与分析软件。[源代码](https:\u002F\u002Fgithub.com\u002Fcmap\u002Fmorpheus.js)，教程：[1](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0nkYDeekhtQ)，[2](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=r9mN6MsxUb0)，[代码](https:\u002F\u002Fgithub.com\u002Fbroadinstitute\u002FBBBC021_Morpheus_Exercise)。  \n[jupyter-scatter](https:\u002F\u002Fgithub.com\u002Fflekschas\u002Fjupyter-scatter) - Jupyter 中的交互式二维散点图小部件。  \n[fastplotlib](https:\u002F\u002Fgithub.com\u002Ffastplotlib\u002Ffastplotlib) - 使用 pygfx 的快速绘图库。  \n[datamapplot](https:\u002F\u002Fgithub.com\u002FTutteInstitute\u002Fdatamapplot) - 交互式二维散点图。  \n[SandDance](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FSandDance) - 来自微软的交互式可视化工具。\n\n#### 颜色\n[palettable](https:\u002F\u002Fgithub.com\u002Fjiffyclub\u002Fpalettable) - 来自 [colorbrewer2](https:\u002F\u002Fcolorbrewer2.org\u002F#type=sequential&scheme=BuGn&n=3) 的颜色调色板。  \n[colorcet](https:\u002F\u002Fgithub.com\u002Fholoviz\u002Fcolorcet) - 一系列感知均匀的颜色映射。  \n[命名颜色轮](https:\u002F\u002Farantius.github.io\u002Fweb-color-wheel\u002F) - 适用于所有命名 HTML 颜色的颜色轮。\n\n#### 仪表板\n[py-shiny](https:\u002F\u002Fgithub.com\u002Frstudio\u002Fpy-shiny) - Python 版 Shiny，[演讲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ijRBbtT2tgc)。  \n[superset](https:\u002F\u002Fgithub.com\u002Fapache\u002Fsuperset) - Apache 提供的仪表板解决方案。  \n[streamlit](https:\u002F\u002Fgithub.com\u002Fstreamlit\u002Fstreamlit) - 仪表板解决方案。[资源](https:\u002F\u002Fgithub.com\u002Fmarcskovmadsen\u002Fawesome-streamlit)，[画廊](http:\u002F\u002Fawesome-streamlit.org\u002F) 
[组件](https:\u002F\u002Fwww.streamlit.io\u002Fcomponents)，[bokeh-events](https:\u002F\u002Fgithub.com\u002Fash2shukla\u002Fstreamlit-bokeh-events)。  \n[mercury](https:\u002F\u002Fgithub.com\u002Fmljar\u002Fmercury) - 将 Python 笔记本转换为 Web 应用程序，[示例](https:\u002F\u002Fgithub.com\u002Fpplonski\u002Fdashboard-python-jupyter-notebook)。  \n[dash](https:\u002F\u002Fdash.plot.ly\u002Fgallery) - plot.ly 提供的仪表板解决方案。[资源](https:\u002F\u002Fgithub.com\u002Fucg8j\u002Fawesome-dash)。  \n[visdom](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fvisdom) - Facebook 提供的仪表板库。  \n[panel](https:\u002F\u002Fpanel.pyviz.org\u002Findex.html) - 仪表板解决方案。  \n[altair 示例](https:\u002F\u002Fgithub.com\u002Fxhochy\u002Faltair-vue-vega-example) - [视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=4L568emKOvs)。  \n[voila](https:\u002F\u002Fgithub.com\u002FQuantStack\u002Fvoila) - 将 Jupyter 笔记本转化为独立的 Web 应用程序。  \n[voila-gridstack](https:\u002F\u002Fgithub.com\u002Fvoila-dashboards\u002Fvoila-gridstack) - Voila 的网格布局。\n\n#### UI\n[gradio](https:\u002F\u002Fgithub.com\u002Fgradio-app\u002Fgradio) - 为您的机器学习模型创建用户界面。\n\n#### 调查工具\n[samplics](https:\u002F\u002Fgithub.com\u002Fsamplics-org\u002Fsamplics) - 复杂调查设计中的抽样技术。\n\n#### 地理工具\n[folium](https:\u002F\u002Fgithub.com\u002Fpython-visualization\u002Ffolium) - 使用 Leaflet.js 库绘制地理地图，[Jupyter 插件](https:\u002F\u002Fgithub.com\u002Fjupyter-widgets\u002Fipyleaflet)。  \n[gmaps](https:\u002F\u002Fgithub.com\u002Fpbugnion\u002Fgmaps) - Jupyter 笔记本中的 Google 地图。  \n[stadiamaps](https:\u002F\u002Fstadiamaps.com\u002F) - 绘制地理地图。  \n[datashader](https:\u002F\u002Fgithub.com\u002Fbokeh\u002Fdatashader) - 在地图上绘制数百万个点。  \n[sklearn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.neighbors.BallTree.html) - BallTree。  \n[pynndescent](https:\u002F\u002Fgithub.com\u002Flmcinnes\u002Fpynndescent) - 近邻下降法，用于近似最近邻搜索。  \n[geocoder](https:\u002F\u002Fgithub.com\u002FDenisCarriere\u002Fgeocoder) - 地址和 IP 地址的地理编码。  
\n不同地理格式的转换：[演讲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=eHRggqAvczE)，[仓库](https:\u002F\u002Fgithub.com\u002Fdillongardner\u002FPyDataSpatialAnalysis)。  \n[geopandas](https:\u002F\u002Fgithub.com\u002Fgeopandas\u002Fgeopandas) - 地理数据处理工具。  \n低级地理空间工具（GEOS、GDAL\u002FOGR、PROJ.4）。  \n矢量数据（Shapely、Fiona、Pyproj）。  \n栅格数据（Rasterio）。  \n绘图（Descartes、Cartopy）。  \n[从 OpenStreetMap 预测经济指标](https:\u002F\u002Fjanakiev.com\u002Fblog\u002Fosm-predict-economic-indicators\u002F)。  \n[PySal](https:\u002F\u002Fgithub.com\u002Fpysal\u002Fpysal) - Python 空间分析库。  \n[geograpy](https:\u002F\u002Fgithub.com\u002Fushahidi\u002Fgeograpy) - 从 URL 或文本中提取国家、地区和城市信息。  \n[cartogram](https:\u002F\u002Fgo-cart.io\u002Fcartogram) - 基于人口的扭曲地图。\n\n#### 推荐系统\n示例：[1](https:\u002F\u002Flazyprogrammer.me\u002Ftutorial-on-collaborative-filtering-and-matrix-factorization-in-python\u002F)，[2](https:\u002F\u002Fmedium.com\u002F@james_aka_yale\u002Fthe-4-recommendation-engines-that-can-predict-your-movie-tastes-bbec857b8223)，[2-ipynb](https:\u002F\u002Fgithub.com\u002Fkhanhnamle1994\u002Fmovielens\u002Fblob\u002Fmaster\u002FContent_Based_and_Collaborative_Filtering_Models.ipynb)，[3](https:\u002F\u002Fwww.kaggle.com\u002Fmorrisb\u002Fhow-to-recommend-anything-deep-recommender)。  \n[surprise](https:\u002F\u002Fgithub.com\u002FNicolasHug\u002FSurprise) - 推荐系统，[演讲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=d7iIb_XVkZs)。  \n[implicit](https:\u002F\u002Fgithub.com\u002Fbenfred\u002Fimplicit) - 针对隐式反馈数据集的快速协同过滤。  \n[spotlight](https:\u002F\u002Fgithub.com\u002Fmaciejkula\u002Fspotlight) - 使用 PyTorch 的深度推荐模型。  \n[lightfm](https:\u002F\u002Fgithub.com\u002Flyst\u002Flightfm) - 同时支持隐式和显式反馈的推荐算法。  \n[funk-svd](https:\u002F\u002Fgithub.com\u002Fgbolmier\u002Ffunk-svd) - 快速 SVD。  \n\n#### 决策树模型\n[决策树与随机森林简介](https:\u002F\u002Fvictorzhou.com\u002Fblog\u002Fintro-to-random-forests\u002F)，[另一幅优秀的可视化图](https:\u002F\u002Fmlu-explain.github.io\u002Fdecision-tree\u002F)，梯度提升简介 
[1](https:\u002F\u002Fexplained.ai\u002Fgradient-boosting\u002F)，[2](https:\u002F\u002Fwww.gormanalysis.com\u002Fblog\u002Fgradient-boosting-explained\u002F)，[决策树可视化](https:\u002F\u002Fexplained.ai\u002Fdecision-tree-viz\u002Findex.html)    \n[lightgbm](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FLightGBM) - 基于决策树算法的梯度提升（GBDT、GBRT、GBM 或 MART）框架，[文档](https:\u002F\u002Fsites.google.com\u002Fview\u002Flauraepp\u002Fparameters)。  \n[xgboost](https:\u002F\u002Fgithub.com\u002Fdmlc\u002Fxgboost) - 梯度提升（GBDT、GBRT 或 GBM）库，[文档](https:\u002F\u002Fsites.google.com\u002Fview\u002Flauraepp\u002Fparameters)，置信区间方法：[链接1](https:\u002F\u002Fstats.stackexchange.com\u002Fquestions\u002F255783\u002Fconfidence-interval-for-xgb-forecast)，[链接2](https:\u002F\u002Ftowardsdatascience.com\u002Fregression-prediction-intervals-with-xgboost-428e0a018b)。  \n[catboost](https:\u002F\u002Fgithub.com\u002Fcatboost\u002Fcatboost) - 梯度提升。  \n[h2o](https:\u002F\u002Fgithub.com\u002Fh2oai\u002Fh2o-3) - 梯度提升及通用机器学习框架。  \n[pycaret](https:\u002F\u002Fgithub.com\u002Fpycaret\u002Fpycaret) - xgboost、lightgbm、catboost 等的封装工具。  \n[forestci](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fforest-confidence-interval) - 随机森林的置信区间。  \n[grf](https:\u002F\u002Fgithub.com\u002Fgrf-labs\u002Fgrf) - 广义随机森林。  \n[dtreeviz](https:\u002F\u002Fgithub.com\u002Fparrt\u002Fdtreeviz) - 决策树可视化与模型解释。  \n[Nuance](https:\u002F\u002Fgithub.com\u002FSauceCat\u002FNuance) - 决策树可视化。  \n[rfpimp](https:\u002F\u002Fgithub.com\u002Fparrt\u002Frandom-forest-importances) - 使用排列重要性评估随机森林的特征重要性。  \n为什么随机森林的默认特征重要性是错误的：[链接](http:\u002F\u002Fexplained.ai\u002Frf-importance\u002Findex.html)  \n[bartpy](https:\u002F\u002Fgithub.com\u002FJakeColtman\u002Fbartpy) - 贝叶斯加性回归树。  \n[merf](https:\u002F\u002Fgithub.com\u002Fmanifoldai\u002Fmerf) - 用于聚类的混合效应随机森林，[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=gWj4ZwB7f3o)  \n[groot](https:\u002F\u002Fgithub.com\u002Ftudelft-cda-lab\u002FGROOT) - 鲁棒决策树。  
\n[linear-tree](https:\u002F\u002Fgithub.com\u002Fcerlymarco\u002Flinear-tree) - 叶子节点为线性模型的树。  \n[supertree](https:\u002F\u002Fgithub.com\u002Fmljar\u002Fsupertree) - 决策树可视化。  \n\n#### 自然语言处理（NLP）\u002F文本处理\n[演讲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=6zm9NC9uRkk)-[notebook](https:\u002F\u002Fnbviewer.jupyter.org\u002Fgithub\u002Fskipgram\u002Fmodern-nlp-in-python\u002Fblob\u002Fmaster\u002Fexecutable\u002FModern_NLP_in_Python.ipynb)，[notebook2](https:\u002F\u002Fahmedbesbes.com\u002Fhow-to-mine-newsfeed-data-and-extract-interactive-insights-in-python.html)，[演讲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?time_continue=2&v=sI7VpFNiy_I)。  \n[文本分类简介](https:\u002F\u002Fmlwhiz.com\u002Fblog\u002F2018\u002F12\u002F17\u002Ftext_classification\u002F)，[预处理博文](https:\u002F\u002Fmlwhiz.com\u002Fblog\u002F2019\u002F01\u002F17\u002Fdeeplearning_nlp_preprocess\u002F)。  \n[gensim](https:\u002F\u002Fradimrehurek.com\u002Fgensim\u002F) - NLP、doc2vec、word2vec、文本处理、主题建模（LSA、LDA），[示例](https:\u002F\u002Fmarkroxor.github.io\u002Fgensim\u002Fstatic\u002Fnotebooks\u002Fgensim_news_classification.html)，[一致性模型](https:\u002F\u002Fradimrehurek.com\u002Fgensim\u002Fmodels\u002Fcoherencemodel.html)用于评估。  \n嵌入 - [GloVe](https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002F) ([[1](https:\u002F\u002Fwww.kaggle.com\u002Fjhoward\u002Fimproved-lstm-baseline-glove-dropout)]，[[2](https:\u002F\u002Fwww.kaggle.com\u002Fsbongo\u002Fdo-pretrained-embeddings-give-you-the-extra-edge)])，[StarSpace](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FStarSpace)，[wikipedia2vec](https:\u002F\u002Fwikipedia2vec.github.io\u002Fwikipedia2vec\u002Fpretrained\u002F)，[可视化](https:\u002F\u002Fprojector.tensorflow.org\u002F)。  \n[magnitude](https:\u002F\u002Fgithub.com\u002Fplasticityai\u002Fmagnitude) - 向量嵌入工具包。  \n[pyldavis](https:\u002F\u002Fgithub.com\u002Fbmabey\u002FpyLDAvis) - 主题建模的可视化工具。  \n[spaCy](https:\u002F\u002Fspacy.io\u002F) - NLP。  \n[NLTK](https:\u002F\u002Fwww.nltk.org\u002F) - 
NLP，带有 `cosine_distance` 的实用 `KMeansClusterer`。  \n[pytext](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FPyText) - 来自 Facebook 的 NLP。  \n[fastText](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FfastText) - 高效的文本分类和表示学习。  \n[annoy](https:\u002F\u002Fgithub.com\u002Fspotify\u002Fannoy) - 近似最近邻搜索。  \n[faiss](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffaiss) - 近似最近邻搜索。  \n[infomap](https:\u002F\u002Fgithub.com\u002Fmapequation\u002Finfomap) - 将向量聚类以发现主题。  \n[datasketch](https:\u002F\u002Fgithub.com\u002Fekzhu\u002Fdatasketch) - 大数据的概率性数据结构（MinHash、HyperLogLog）。  \n[flair](https:\u002F\u002Fgithub.com\u002Fzalandoresearch\u002Fflair) - Zalando 的 NLP 框架。  \n[stanza](https:\u002F\u002Fgithub.com\u002Fstanfordnlp\u002Fstanza) - NLP 库。  \n[Chatistics](https:\u002F\u002Fgithub.com\u002FMasterScrat\u002FChatistics) - 将 Messenger、Hangouts、WhatsApp 和 Telegram 的聊天记录转换为 DataFrame。  \n[textdistance](https:\u002F\u002Fgithub.com\u002Flife4\u002Ftextdistance) - 用于比较两个或多个序列之间距离的集合。  \n\n#### 生物图像分析\n[Lee 等人 - 荧光成像实验中的严谨性和可重复性入门指南](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC6080651\u002F)  \n[Awesome Cytodata](https:\u002F\u002Fgithub.com\u002Fcytodata\u002Fawesome-cytodata)\n\n##### 教程\n[MIT 7.016 生物学导论，2018年秋季](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLUl4u3cNGP63LmSVIVzy584-ZbjbJ-Y63) - 第27、28和29集视频讨论了染色和成像。  \n[Bio-image Analysis Notebooks](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002Fintro.html) - 
大量图像处理工作流集合，包括[点扩散函数估计](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002F18a_deconvolution\u002Fextract_psf.html)和[反卷积](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002F18a_deconvolution\u002Fintroduction_deconvolution.html)，[3D细胞分割](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002F20_image_segmentation\u002FSegmentation_3D.html)，以及使用[pyclesperanto](https:\u002F\u002Fgithub.com\u002FclEsperanto\u002Fpyclesperanto_prototype)等工具进行的[特征提取](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002F22_feature_extraction\u002Fstatistics_with_pyclesperanto.html)。  \n[python_for_microscopists](https:\u002F\u002Fgithub.com\u002Fbnsreenu\u002Fpython_for_microscopists) - 提供多种图像处理任务的笔记本及配套的[youtube频道](https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUC34rW-HtPJulxr5wp2Xa04w\u002Fvideos)。  \n\n##### 数据集\n[jump-cellpainting](https:\u002F\u002Fgithub.com\u002Fjump-cellpainting\u002Fdatasets) - 细胞绘画数据集。  \n[MedMNIST](https:\u002F\u002Fgithub.com\u002FMedMNIST\u002FMedMNIST) - 用于2D和3D生物医学图像分类的数据集。  \n[CytoImageNet](https:\u002F\u002Fgithub.com\u002Fstan-hua\u002FCytoImageNet) - 类似于ImageNet但专为细胞图像设计的庞大且多样化的数据集。  \n[Haghighi](https:\u002F\u002Fgithub.com\u002Fcarpenterlab\u002F2021_Haghighi_NatureMethods) - 基因表达与形态学特征图谱。  \n[broadinstitute\u002Flincs-profiling-complementarity](https:\u002F\u002Fgithub.com\u002Fbroadinstitute\u002Flincs-profiling-complementarity) - 细胞绘画与L1000检测的对比研究。  \n\n#### 生物统计学 \u002F 稳健统计学\n[MinCovDet](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.covariance.MinCovDet.html) - 
协方差的稳健估计器，RMPV，[论文](https:\u002F\u002Fwires.onlinelibrary.wiley.com\u002Fdoi\u002Ffull\u002F10.1002\u002Fwics.1421)，[应用1](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002F10.1177\u002F1087057112469257?url_ver=Z39.88-2003&rfr_id=ori%3Arid%3Acrossref.org&rfr_dat=cr_pub++0pubmed&)，[应用2](https:\u002F\u002Fwww.cell.com\u002Fcell-reports\u002Fpdf\u002FS2211-1247(21)00694-X.pdf)。  \n[调整后的z分数](https:\u002F\u002Fclue.io\u002Fconnectopedia\u002Freplicate_collapse) - 基于Spearman相关性的z分数加权平均值。  \n[winsorize](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.stats.mstats.winsorize.html#scipy.stats.mstats.winsorize) - 对异常值的简单调整。  \n\n#### 高内涵筛选实验设计\n[Zhang XHD (2008) - 全基因组RNAi筛选中用于质量控制的新分析标准及高效板式设计](https:\u002F\u002Fslas-discovery.org\u002Farticle\u002FS2472-5552(22)08204-1\u002Fpdf)  \n[Iversen - 筛选实验中检测性能指标的比较：信号窗口、Z′因子与检测变异性比](https:\u002F\u002Fwww.slas-discovery.org\u002Farticle\u002FS2472-5552(22)08460-X\u002Fpdf)  \n[Z因子](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FZ-factor) - 统计效应量的度量。  \n[Z'-因子](https:\u002F\u002Flink.springer.com\u002Freferenceworkentry\u002F10.1007\u002F978-3-540-47648-1_6298) - 统计效应量的度量。  \n[CV](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FCoefficient_of_variation) - 变异系数。  \n[SSMD](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FStrictly_standardized_mean_difference) - 严格标准化均值差异。  \n[信号窗口](https:\u002F\u002Fwww.intechopen.com\u002Fchapters\u002F48130) - 检测质量的衡量指标。  \n\n#### 显微镜技术 + 实验\n[BD Spectrum Viewer](https:\u002F\u002Fwww.bdbiosciences.com\u002Fen-us\u002Fresources\u002Fbd-spectrum-viewer) - 计算荧光显微镜染料之间的光谱重叠和串扰。  \n[SpectraViewer](https:\u002F\u002Fwww.perkinelmer.com\u002Flab-products-and-services\u002Fspectraviewer) - 可视化荧光团的光谱兼容性（珀金埃尔默）。  \n[Thermofisher Spectrum Viewer](https:\u002F\u002Fwww.thermofisher.com\u002Forder\u002Fstain-it) - 赛默飞世尔光谱查看器。  \n[显微镜分辨率计算器](https:\u002F\u002Fwww.microscope.healthcare.nikon.com\u002Fmicrotools\u002Fresolution-calculator) - 计算图像分辨率（尼康）。  
\n[PlateEditor](https:\u002F\u002Fgithub.com\u002Fvindelorme\u002FPlateEditor) - 用于药剂布局的平板设计工具，[应用程序](https:\u002F\u002Fplateeditor.sourceforge.io\u002F)，[压缩包](https:\u002F\u002Fsourceforge.net\u002Fprojects\u002Fplateeditor\u002F)，[论文](https:\u002F\u002Fjournals.plos.org\u002Fplosone\u002Farticle?id=10.1371\u002Fjournal.pone.0252488)。  \n\n##### 图像格式与转换工具\nOME-Zarr - [论文](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.02.17.528834v1.full)，[标准](https:\u002F\u002Fngff.openmicroscopy.org\u002Flatest\u002F)  \n[bioformats2raw](https:\u002F\u002Fgithub.com\u002Fglencoesoftware\u002Fbioformats2raw) - 将多种格式转换为Zarr。  \n[raw2ometiff](https:\u002F\u002Fgithub.com\u002Fglencoesoftware\u002Fraw2ometiff) - 将Zarr转换为TIFF。  \n[BatchConvert](https:\u002F\u002Fgithub.com\u002FEuro-BioImaging\u002FBatchConvert) - Bioformats2raw的封装工具，结合Nextflow实现并行转换，[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DeCWV274l0c)。  \nREMBI模型 - 生物图像推荐元数据，BioImage Archive：[研究组件指南](https:\u002F\u002Fwww.ebi.ac.uk\u002Fbioimage-archive\u002Frembi-help-examples\u002F)，[文件列表指南](https:\u002F\u002Fwww.ebi.ac.uk\u002Fbioimage-archive\u002Fhelp-file-list\u002F)，[论文](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC8606015\u002F)，[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=GVmfOpuP2_c)，[电子表格](https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1Ck1NeLp-ZN4eMGdNYo2nV6KLEdSfN6oQBKnnWU6Npeo\u002Fedit#gid=1023506919)  \n\n##### 矩阵格式\n[anndata](https:\u002F\u002Fgithub.com\u002Fscverse\u002Fanndata) - 内存和磁盘上的注释数据矩阵，[文档](https:\u002F\u002Fanndata.readthedocs.io\u002Fen\u002Flatest\u002Findex.html)。  \n[muon](https:\u002F\u002Fgithub.com\u002Fscverse\u002Fmuon) - 多模态组学框架。  \n[mudata](https:\u002F\u002Fgithub.com\u002Fscverse\u002Fmudata) - 多模态数据（.h5mu）的实现。  \n[bdz](https:\u002F\u002Fgithub.com\u002Fopenssbd\u002Fbdz) - 基于Zarr的格式，用于存储定量生物动力学数据。\n\n#### 图像查看器\n[napari](https:\u002F\u002Fgithub.com\u002Fnapari\u002Fnapari) - 图像查看与图像处理工具。    
\n[Fiji](https:\u002F\u002Ffiji.sc\u002F) - 通用工具，兼具图像查看和图像处理功能。  \n[vizarr](https:\u002F\u002Fgithub.com\u002Fhms-dbmi\u002Fvizarr) - 基于浏览器的Zarr格式图像查看器。  \n[avivator](https:\u002F\u002Fgithub.com\u002Fhms-dbmi\u002Fviv) - 基于浏览器的TIFF文件图像查看器。  \n[OMERO](https:\u002F\u002Fwww.openmicroscopy.org\u002Fomero\u002F) - 高内涵筛选专用图像查看器。[IDR](https:\u002F\u002Fidr.openmicroscopy.org\u002F) 即使用OMERO。[简介](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=nSCrMO_c-5s)   \n[fiftyone](https:\u002F\u002Fgithub.com\u002Fvoxel51\u002Ffiftyone) - 用于构建高质量数据集和计算机视觉模型的查看器及工具。  \nImage Data Explorer - 显微镜图像查看器，[Shiny应用](https:\u002F\u002Fshiny-portal.embl.de\u002Fshinyapps\u002Fapp\u002F01_image-data-explorer)，[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=H8zIZvOt1MA)。  \n[ImSwitch](https:\u002F\u002Fgithub.com\u002FImSwitch\u002FImSwitch) - 显微镜图像查看器，[文档](https:\u002F\u002Fimswitch.readthedocs.io\u002Fen\u002Fstable\u002Fgui.html)，[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=XsbnMkGSPQQ)。  \n[piximi](https:\u002F\u002Fgithub.com\u002Fpiximi\u002Fpiximi) - 基于Web的图像标注与分类工具，[应用](https:\u002F\u002Fwww.piximi.app\u002F)。  \n[DeepCell Label](https:\u002F\u002Flabel.deepcell.org\u002F) - 用于图像分割的数据标注工具，[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=zfsvUBkEeow)。  \n[lightly-studio](https:\u002F\u002Fgithub.com\u002Flightly-ai\u002Flightly-studio) - 图像标注工具。  \n\n#### Napari插件\n[napari-sam](https:\u002F\u002Fgithub.com\u002FMIC-DKFZ\u002Fnapari-sam) - Segment Anything插件。  \n[napari-chatgpt](https:\u002F\u002Fgithub.com\u002Froyerlab\u002Fnapari-chatgpt) - ChatGPT插件。  \n\n##### 图像修复与去噪\n[aydin](https:\u002F\u002Fgithub.com\u002Froyerlab\u002Faydin) - 图像去噪。  \n[DivNoising](https:\u002F\u002Fgithub.com\u002Fjuglab\u002FDivNoising) - 无监督去噪方法。  \n[CSBDeep](https:\u002F\u002Fgithub.com\u002FCSBDeep\u002FCSBDeep) - 内容感知图像修复，[项目页面](https:\u002F\u002Fcsbdeep.bioimagecomputing.com\u002Ftools\u002F)。  \n[gibbs-diffusion](https:\u002F\u002Fgithub.com\u002Frubenohana\u002Fgibbs-diffusion) - 图像去噪。  
\n\n##### 照明校正\n[skimage](https:\u002F\u002Fscikit-image.org\u002Fdocs\u002Fdev\u002Fapi\u002Fskimage.exposure.html#skimage.exposure.equalize_adapthist) - 照明校正（CLAHE）。  \n[cidre](https:\u002F\u002Fgithub.com\u002Fsmithk\u002Fcidre) - 光学显微镜专用照明校正方法。  \n[BaSiCPy](https:\u002F\u002Fgithub.com\u002Fpeng-lab\u002FBaSiCPy) - 光学显微镜图像背景与阴影校正，[BaSiC](https:\u002F\u002Fgithub.com\u002Fmarrlab\u002FBaSiC)。  \n\n##### 溢色校正\u002F光谱解混\n[PICASSO](https:\u002F\u002Fgithub.com\u002Fnygctech\u002FPICASSO) - 无需参考光谱测量的盲解混，[论文](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2021.01.27.428247v1.full)  \n[cytoflow](https:\u002F\u002Fgithub.com\u002Fcytoflow\u002Fcytoflow) - 流式细胞术。包含溢色校正方法。  \nFiji中基于线性解混的溢色校正 - [Youtube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=W90qs0J29v8)。  \nLumos与Fiji结合进行溢色校正 - [链接](https:\u002F\u002Fimagej.net\u002Fplugins\u002Flumos-spectral-unmixing)。  \nAutoUnmix - [链接](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.05.30.542836v1.full)。  \n\n##### 平台与流程\n[CellProfiler](https:\u002F\u002Fgithub.com\u002FCellProfiler\u002FCellProfiler), [CellProfilerAnalyst](https:\u002F\u002Fgithub.com\u002FCellProfiler\u002FCellProfiler-Analyst) - 用于创建图像分析流程。  \n[fractal](https:\u002F\u002Ffractal-analytics-platform.github.io\u002F) - 来自苏黎世大学的高内涵成像数据分析框架，[Github](https:\u002F\u002Fgithub.com\u002Ffractal-analytics-platform)。  \n[atomai](https:\u002F\u002Fgithub.com\u002Fpycroscopy\u002Fatomai) - 用于显微镜领域的深度学习与机器学习。  \n[py-clesperanto](https:\u002F\u002Fgithub.com\u002Fclesperanto\u002Fpyclesperanto_prototype\u002F) - 用于三维显微镜分析的工具，包括[deskewing](https:\u002F\u002Fgithub.com\u002FclEsperanto\u002Fpyclesperanto_prototype\u002Fblob\u002Fmaster\u002Fdemo\u002Ftransforms\u002Fdeskew.ipynb)等多种教程，并可与Napari交互。  \n[qupath](https:\u002F\u002Fgithub.com\u002Fqupath\u002Fqupath) - 图像分析平台。  \n\n##### 显微镜分析流程\nLabsyspharm堆栈见下文。  \n[BiaPy](https:\u002F\u002Fgithub.com\u002Fdanifranco\u002FBiaPy) - 
生物图像分析流程，[论文](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2024.02.03.576026v2.full)。  \n[SCIP](https:\u002F\u002Fscalable-cytometry-image-processing.readthedocs.io\u002Fen\u002Flatest\u002Fusage.html) - 基于Dask的图像处理流程。  \n[DeepCell Kiosk](https:\u002F\u002Fgithub.com\u002Fvanvalenlab\u002Fkiosk-console\u002Ftree\u002Fmaster) - 图像分析平台。  \n[IMCWorkflow](https:\u002F\u002Fgithub.com\u002FBodenmillerGroup\u002FIMCWorkflow\u002F) - 使用[steinbock](https:\u002F\u002Fgithub.com\u002FBodenmillerGroup\u002Fsteinbock)的图像分析流程，[Twitter](https:\u002F\u002Ftwitter.com\u002FNilsEling\u002Fstatus\u002F1715020265963258087)，[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41596-023-00881-0)，[工作流](https:\u002F\u002Fbodenmillergroup.github.io\u002FIMCDataAnalysis\u002F)。  \n\n##### Labsyspharm\n[mcmicro](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fmcmicro) - 多选式（Multiple Choice）显微镜分析流程，[官网](https:\u002F\u002Fmcmicro.org\u002Foverview\u002F)，[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41592-021-01308-y)。  \n[MCQuant](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fquantification) - 细胞特征量化工具。  \n[cylinter](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fcylinter) - 显微镜图像质量保证工具，[官网](https:\u002F\u002Flabsyspharm.github.io\u002Fcylinter\u002F)。  \n[ashlar](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fashlar) - 全玻片显微镜图像拼接与配准。  \n[scimap](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fscimap) - 空间单细胞分析工具包。\n\n##### 细胞分割\n[microscopy-tree](https:\u002F\u002Fbiomag-lab.github.io\u002Fmicroscopy-tree\u002F) - 细胞分割算法综述，[论文](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fabs\u002Fpii\u002FS0962892421002518)。  \n类器官分析流程综述 - [论文](https:\u002F\u002Farxiv.org\u002Fftp\u002Farxiv\u002Fpapers\u002F2301\u002F2301.02341.pdf)。  \n[BioImage.IO](https:\u002F\u002Fbioimage.io\u002F#\u002F) - 生物图像模型动物园。  \n[MEDIAR](https:\u002F\u002Fgithub.com\u002FLee-Gihun\u002FMEDIAR) - 细胞分割。  
\n[cellpose](https:\u002F\u002Fgithub.com\u002Fmouseland\u002Fcellpose) - 细胞分割。[论文](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2020.02.02.931238v1)，[数据集](https:\u002F\u002Fwww.cellpose.org\u002Fdataset)。  \n[stardist](https:\u002F\u002Fgithub.com\u002Fstardist\u002Fstardist) - 基于星凸形状的细胞分割。  \n[instanseg](https:\u002F\u002Fgithub.com\u002Finstanseg\u002Finstanseg) - 细胞分割。  \n[UnMicst](https:\u002F\u002Fgithub.com\u002FHMS-IDAC\u002FUnMicst) - 细胞识别与组织分割。  \n[ilastik](https:\u002F\u002Fgithub.com\u002Filastik\u002Filastik) - 细胞分割、分类、追踪与计数。[ImageJ插件](https:\u002F\u002Fgithub.com\u002Filastik\u002Filastik4ij)。  \n[nnUnet](https:\u002F\u002Fgithub.com\u002FMIC-DKFZ\u002FnnUNet) - 三维生物医学图像分割。  \n[allencell](https:\u002F\u002Fwww.allencell.org\u002Fsegmenter.html) - 用于三维分割的工具，涵盖经典方法和深度学习方法。  \n[Cell-ACDC](https:\u002F\u002Fgithub.com\u002FSchmollerLab\u002FCell_ACDC) - 用于细胞分割与追踪的Python GUI。  \n[ZeroCostDL4Mic](https:\u002F\u002Fgithub.com\u002FHenriquesLab\u002FZeroCostDL4Mic\u002Fwiki) - 显微镜下的深度学习。  \n[DL4MicEverywhere](https:\u002F\u002Fgithub.com\u002FHenriquesLab\u002FDL4MicEverywhere) - 使用Docker实现ZeroCostDL4Mic体验。  \n[EmbedSeg](https:\u002F\u002Fgithub.com\u002Fjuglab\u002FEmbedSeg) - 基于嵌入的实例分割。  \n[segment-anything](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsegment-anything) - Facebook的“万物分割”（SAM）。  \n[micro-sam](https:\u002F\u002Fgithub.com\u002Fcomputational-cell-analytics\u002Fmicro-sam) - 用于显微镜成像的万物分割。  \n[Segment-Everything-Everywhere-All-At-Once](https:\u002F\u002Fgithub.com\u002FUX-Decoder\u002FSegment-Everything-Everywhere-All-At-Once) - 来自微软的“随时随地一次性分割一切”。  \n[deepcell-tf](https:\u002F\u002Fgithub.com\u002Fvanvalenlab\u002Fdeepcell-tf\u002Ftree\u002Fmaster) - 细胞分割，[DeepCell](https:\u002F\u002Fdeepcell.org\u002F)。  \n[labkit](https:\u002F\u002Fgithub.com\u002Fjuglab\u002Flabkit-ui) - Fiji插件，用于图像分割。  \n[MedImageInsight](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.06542) - 面向通用领域医学影像的嵌入模型。  
\n[CHIEF](https:\u002F\u002Fgithub.com\u002Fhms-dbmi\u002FCHIEF) - 临床组织病理学影像评估基础模型。  \n\n##### 细胞分割数据集\n[cellpose](https:\u002F\u002Fwww.cellpose.org\u002Fdataset) - 细胞图像。  \n[omnipose](http:\u002F\u002Fwww.cellpose.org\u002Fdataset_omnipose) - 细胞图像。  \n[LIVECell](https:\u002F\u002Fgithub.com\u002Fsartorius-research\u002FLIVECell) - 细胞图像。  \n[Sartorius](https:\u002F\u002Fwww.kaggle.com\u002Fcompetitions\u002Fsartorius-cell-instance-segmentation\u002Foverview) - 神经元。  \n[EmbedSeg](https:\u002F\u002Fgithub.com\u002Fjuglab\u002FEmbedSeg\u002Freleases\u002Ftag\u002Fv0.1.0) - 2D + 3D图像。  \n[connectomics](https:\u002F\u002Fsites.google.com\u002Fview\u002Fconnectomics\u002F) - EPFL海马体数据集的标注。  \n[ZeroCostDL4Mic](https:\u002F\u002Fwww.ebi.ac.uk\u002Fbiostudies\u002FBioImages\u002Fstudies\u002FS-BIAD895) - Stardist示例训练与测试数据集。  \n\n##### 评价\n[seg-eval](https:\u002F\u002Fgithub.com\u002Flstrgar\u002Fseg-eval) - 无需真实标签的细胞分割性能评估，[论文](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.02.23.529809v1.full.pdf)。  \n\n##### 图像特征工程\n[药物发现中的计算机视觉挑战 - Maciej Hermanowicz](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Y5GJmnIhvFk)  \n[CellProfiler](https:\u002F\u002Fgithub.com\u002FCellProfiler\u002FCellProfiler) - 生物图像分析。   \n[scikit-image](https:\u002F\u002Fgithub.com\u002Fscikit-image\u002Fscikit-image) - 图像处理。  \n[scikit-image regionprops](https:\u002F\u002Fscikit-image.org\u002Fdocs\u002Fdev\u002Fapi\u002Fskimage.measure.html#skimage.measure.regionprops) - 区域属性：面积、偏心率、扩展度等。  \n[mahotas](https:\u002F\u002Fgithub.com\u002Fluispedro\u002Fmahotas) - Zernike、Haralick、LBP及TAS特征，[示例](https:\u002F\u002Fgithub.com\u002Fluispedro\u002Fpython-image-tutorial\u002Fblob\u002Fmaster\u002FSegmenting%20cell%20images%20(fluorescent%20microscopy).ipynb)。   \n[pyradiomics](https:\u002F\u002Fgithub.com\u002FAIM-Harvard\u002Fpyradiomics) - 医学影像中的放射组学特征。  \n[pyefd](https:\u002F\u002Fgithub.com\u002Fhbldh\u002Fpyefd) - 椭圆特征描述子，通过傅里叶级数近似轮廓。  
\n[pyvips](https:\u002F\u002Fgithub.com\u002Flibvips\u002Fpyvips\u002Ftree\u002Fmaster) - 更快速的图像处理操作。  \n\n#### 领域适应 \u002F 批次效应校正 \n[Tran - 单细胞RNA测序数据批次效应校正方法基准测试](https:\u002F\u002Fgenomebiology.biomedcentral.com\u002Farticles\u002F10.1186\u002Fs13059-019-1850-9)，[代码](https:\u002F\u002Fgithub.com\u002FJinmiaoChenLab\u002FBatch-effect-removal-benchmarking)。  \n[R教程：校正批次效应](https:\u002F\u002Fbroadinstitute.github.io\u002F2019_scWorkshop\u002Fcorrecting-batch-effects.html)。  \n[harmonypy](https:\u002F\u002Fgithub.com\u002Fslowkow\u002Fharmonypy) - 模糊k均值与局部线性调整。  \n[pyliger](https:\u002F\u002Fgithub.com\u002Fwelch-lab\u002Fpyliger) - 批次效应校正，[R包](https:\u002F\u002Fgithub.com\u002Fwelch-lab\u002Fliger)。  \n[nimfa](https:\u002F\u002Fgithub.com\u002Fmims-harvard\u002Fnimfa) - 非负矩阵分解。  \n[scgen](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fscgen) - 批次去除。[文档](https:\u002F\u002Fscgen.readthedocs.io\u002Fen\u002Fstable\u002F)。  \n[CORAL](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Ftree\u002F30e54523f08d963ced3fbb37c00e9225579d2e1d\u002Fcorrect_batch_effects_wdn) - 利用Wasserstein距离校正批次效应，[代码](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Fblob\u002F30e54523f08d963ced3fbb37c00e9225579d2e1d\u002Fcorrect_batch_effects_wdn\u002Ftransform.py#L152)，[论文](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC7050548\u002F)。   \n[adapt](https:\u002F\u002Fgithub.com\u002Fadapt-python\u002Fadapt) - 强大的领域适应Python工具箱。  \n[pytorch-adapt](https:\u002F\u002Fgithub.com\u002FKevinMusgrave\u002Fpytorch-adapt) - 多种用于领域适应的神经网络模型。  \n\n##### 测序\n[单细胞教程](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fsingle-cell-tutorial)。  \n[PyDESeq2](https:\u002F\u002Fgithub.com\u002Fowkin\u002FPyDESeq2) - 分析RNA-seq数据。  \n[cellxgene](https:\u002F\u002Fgithub.com\u002Fchanzuckerberg\u002Fcellxgene) - 单细胞转录组数据的交互式探索工具。  \n[scanpy](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fscanpy) - 
分析单细胞基因表达数据，[教程](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fsingle-cell-tutorial)。  \n[besca](https:\u002F\u002Fgithub.com\u002Fbedapub\u002Fbesca) - 超越单细胞分析。  \n[janggu](https:\u002F\u002Fgithub.com\u002FBIMSBbioinfo\u002Fjanggu) - 针对基因组学的深度学习。  \n[gdsctools](https:\u002F\u002Fgithub.com\u002FCancerRxGene\u002Fgdsctools) - 在“癌症药物敏感性基因组学”项目背景下研究药物反应，包括方差分析、IC50、MoBEM等，[文档](https:\u002F\u002Fgdsctools.readthedocs.io\u002Fen\u002Fmaster\u002F)。  \n[monkeybread](https:\u002F\u002Fgithub.com\u002Fimmunitastx\u002Fmonkeybread) - 单细胞空间转录组数据分析。\n\n##### 药物发现\n[TDC](https:\u002F\u002Fgithub.com\u002Fmims-harvard\u002FTDC\u002Ftree\u002Fmain) - 药物发现与开发。  \n[DeepPurpose](https:\u002F\u002Fgithub.com\u002Fkexinhuang12345\u002FDeepPurpose) - 基于深度学习的分子建模与预测工具包。  \n\n#### 神经网络\n[mit6874](https:\u002F\u002Fmit6874.github.io\u002F) - 计算系统生物学：生命科学中的深度学习。  \n[ConvNet形状计算器](https:\u002F\u002Fmadebyollin.github.io\u002Fconvnet-calculator\u002F) - 计算Conv2D层的输出尺寸。  \n[优秀的梯度下降文章](https:\u002F\u002Ftowardsdatascience.com\u002F10-gradient-descent-optimisation-algorithms-86989510b5e9)。  \n[半监督学习简介](https:\u002F\u002Flilianweng.github.io\u002Flil-log\u002F2021\u002F12\u002F05\u002Fsemi-supervised-learning.html)。  \n\n##### 教程与可视化工具\n[Google调参手册](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Ftuning_playbook) - Google出品的系统化提升深度学习模型性能的手册。  \n[fast.ai课程](https:\u002F\u002Fcourse.fast.ai\u002F) - 面向编码者的实用深度学习课程。  \n[TensorFlow无博士教程](https:\u002F\u002Fgithub.com\u002FGoogleCloudPlatform\u002Ftensorflow-without-a-phd) - Google推出的神经网络课程。  \n特征可视化：[博客](https:\u002F\u002Fdistill.pub\u002F2017\u002Ffeature-visualization\u002F)，[PPT](http:\u002F\u002Fcs231n.stanford.edu\u002Fslides\u002F2017\u002Fcs231n_2017_lecture12.pdf)  \n[TensorFlow Playground](https:\u002F\u002Fplayground.tensorflow.org\u002F)  \n[优化算法可视化](http:\u002F\u002Fvis.ensmallen.org\u002F)，[另一份可视化](https:\u002F\u002Fgithub.com\u002Fjettify\u002Fpytorch-optimizer)    
\n[cutouts-explorer](https:\u002F\u002Fgithub.com\u002Fmgckind\u002Fcutouts-explorer) - 图像查看器。  \n\n##### 图像相关\n[imgaug](https:\u002F\u002Fgithub.com\u002Faleju\u002Fimgaug) - 更复杂的图像预处理工具。  \n[Augmentor](https:\u002F\u002Fgithub.com\u002Fmdbloice\u002FAugmentor) - 图像增强库。  \n[keras预处理](https:\u002F\u002Fkeras.io\u002Fpreprocessing\u002Fimage\u002F) - 图像预处理功能。  \n[albumentations](https:\u002F\u002Fgithub.com\u002Falbu\u002Falbumentations) - 封装了imgaug及其他库的工具包。  \n[augmix](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Faugmix) - Google出品的图像增强技术。  \n[kornia](https:\u002F\u002Fgithub.com\u002Fkornia\u002Fkornia) - 图像增强、特征提取及损失函数工具。  \n[augly](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FAugLy) - Facebook出品的图像、音频、文本、视频增强工具。  \n[pyvips](https:\u002F\u002Fgithub.com\u002Flibvips\u002Fpyvips\u002Ftree\u002Fmaster) - 更快速的图像处理操作。  \n\n##### 损失函数相关\n[SegLoss](https:\u002F\u002Fgithub.com\u002FJunMa11\u002FSegLoss) - 医学图像分割用损失函数列表。  \n\n##### 激活函数\n[rational_activations](https:\u002F\u002Fgithub.com\u002Fml-research\u002Frational_activations) - 有理激活函数。  \n\n##### 文本相关\n[ktext](https:\u002F\u002Fgithub.com\u002Fhamelsmu\u002Fktext) - Keras中用于深度学习文本预处理的工具集。   \n[textgenrnn](https:\u002F\u002Fgithub.com\u002Fminimaxir\u002Ftextgenrnn) - 即用型LSTM文本生成模型。  \n[ctrl](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002Fctrl) - 文本生成工具。  \n\n##### 神经网络与深度学习框架\n[OpenMMLab](https:\u002F\u002Fgithub.com\u002Fopen-mmlab) - 用于分割、分类及其他计算机视觉任务的框架。  \n[caffe](https:\u002F\u002Fgithub.com\u002FBVLC\u002Fcaffe) - 深度学习框架，[预训练模型](https:\u002F\u002Fgithub.com\u002FBVLC\u002Fcaffe\u002Fwiki\u002FModel-Zoo)。  \n[mxnet](https:\u002F\u002Fgithub.com\u002Fapache\u002Fincubator-mxnet) - 深度学习框架，[书籍](https:\u002F\u002Fd2l.ai\u002Findex.html)。  \n\n##### 通用库\n[keras](https:\u002F\u002Fkeras.io\u002F) - 基于[TensorFlow](https:\u002F\u002Fwww.tensorflow.org\u002F)的神经网络框架，[示例](https:\u002F\u002Fgist.github.com\u002Fcandlewill\u002F552fa102352ccce42fd829ae26277d24)。  
\n[keras-contrib](https:\u002F\u002Fgithub.com\u002Fkeras-team\u002Fkeras-contrib) - Keras社区贡献库。  \n[keras-tuner](https:\u002F\u002Fgithub.com\u002Fkeras-team\u002Fkeras-tuner) - Keras超参数调优工具。  \n[hyperas](https:\u002F\u002Fgithub.com\u002Fmaxpumperla\u002Fhyperas) - Keras + Hyperopt：便捷的超参数优化封装。  \n[elephas](https:\u002F\u002Fgithub.com\u002Fmaxpumperla\u002Felephas) - 使用Keras与Spark进行分布式深度学习。  \n[tflearn](https:\u002F\u002Fgithub.com\u002Ftflearn\u002Ftflearn) - 基于TensorFlow的神经网络框架。  \n[tensorlayer](https:\u002F\u002Fgithub.com\u002Ftensorlayer\u002Ftensorlayer) - 基于TensorFlow的神经网络框架，[技巧](https:\u002F\u002Fgithub.com\u002Fwagamamaz\u002Ftensorlayer-tricks)。  \n[tensorforce](https:\u002F\u002Fgithub.com\u002Freinforceio\u002Ftensorforce) - 适用于强化学习的TensorFlow框架。  \n[autokeras](https:\u002F\u002Fgithub.com\u002Fjhfjhfj1\u002Fautokeras) - 深度学习自动化机器学习工具。  \n[PlotNeuralNet](https:\u002F\u002Fgithub.com\u002FHarisIqbal88\u002FPlotNeuralNet) - 可视化神经网络结构。  \n[lucid](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Flucid) - 神经网络可解释性工具，[激活图谱](https:\u002F\u002Fopenai.com\u002Fblog\u002Fintroducing-activation-atlases\u002F)。  \n[tcav](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftcav) - 可解释性方法。  \n[AdaBound](https:\u002F\u002Fgithub.com\u002FLuolc\u002FAdaBound) - 一种既像Adam一样快速又像SGD一样稳定的优化器，[替代方案](https:\u002F\u002Fgithub.com\u002Ftitu1994\u002Fkeras-adabound)。  \n[foolbox](https:\u002F\u002Fgithub.com\u002Fbethgelab\u002Ffoolbox) - 用于生成欺骗神经网络的对抗样本。  \n[hiddenlayer](https:\u002F\u002Fgithub.com\u002Fwaleedka\u002Fhiddenlayer) - 训练过程中的指标可视化工具。  \n[imgclsmob](https:\u002F\u002Fgithub.com\u002Fosmr\u002Fimgclsmob) - 预训练模型。  \n[netron](https:\u002F\u002Fgithub.com\u002Flutzroeder\u002Fnetron) - 深度学习与机器学习模型的可视化工具。  \n[ffcv](https:\u002F\u002Fgithub.com\u002Flibffcv\u002Fffcv) - 高效的数据加载器。  \n\n##### PyTorch相关库\n[优秀的PyTorch入门](https:\u002F\u002Fcs230.stanford.edu\u002Fblog\u002Fpytorch\u002F)    \n[skorch](https:\u002F\u002Fgithub.com\u002Fdnouri\u002Fskorch) - 
兼容Scikit-learn的PyTorch封装神经网络库，[演讲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0J7FaLk0bmQ)，[幻灯片](https:\u002F\u002Fgithub.com\u002Fthomasjpfan\u002Fskorch_talk)。  \n[fastai](https:\u002F\u002Fgithub.com\u002Ffastai\u002Ffastai) - 基于PyTorch的神经网络框架。  \n[timm](https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models) - PyTorch图像模型。  \n[ignite](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fignite) - PyTorch的高级库。  \n[torchcv](https:\u002F\u002Fgithub.com\u002Fdonnyyou\u002Ftorchcv) - 计算机视觉领域的深度学习工具。  \n[pytorch-optimizer](https:\u002F\u002Fgithub.com\u002Fjettify\u002Fpytorch-optimizer) - PyTorch优化器集合。  \n[pytorch-lightning](https:\u002F\u002Fgithub.com\u002FPyTorchLightning\u002FPyTorch-lightning) - PyTorch的封装框架。  \n[litserve](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002FLitServe) - 模型部署服务。  \n[lightly](https:\u002F\u002Fgithub.com\u002Flightly-ai\u002Flightly) - 提供MoCo、SimCLR、SimSiam、Barlow Twins、BYOL、NNCLR等自监督学习方法。  \n[MONAI](https:\u002F\u002Fgithub.com\u002Fproject-monai\u002Fmonai) - 医疗影像领域的深度学习工具。  \n[kornia](https:\u002F\u002Fgithub.com\u002Fkornia\u002Fkornia) - 图像变换、极线几何、深度估计等功能。  \n[torchinfo](https:\u002F\u002Fgithub.com\u002FTylep\u002Ftorchinfo) - 优秀的模型摘要工具。  \n[lovely-tensors](https:\u002F\u002Fgithub.com\u002Fxl0\u002Flovely-tensors\u002F) - 用于检查张量的均值、标准差、无穷大值等属性。  \n\n##### 分布式相关库\n[flexflow](https:\u002F\u002Fgithub.com\u002Fflexflow\u002FFlexFlow) - 分布式TensorFlow、Keras和PyTorch框架。  \n[horovod](https:\u002F\u002Fgithub.com\u002Fhorovod\u002Fhorovod) - 适用于TensorFlow、Keras、PyTorch以及Apache MXNet的分布式训练框架。\n\n##### 架构可视化\n[精彩列表](https:\u002F\u002Fgithub.com\u002Fashishpatel26\u002FTools-to-Design-or-Visualize-Architecture-of-Neural-Network)。  \n[netron](https:\u002F\u002Fgithub.com\u002Flutzroeder\u002Fnetron) - 神经网络查看器。  \n[visualkeras](https:\u002F\u002Fgithub.com\u002Fpaulgavrikov\u002Fvisualkeras) - 可视化 Keras 网络。  \n\n##### 计算机视觉通用\n[roboflow](https:\u002F\u002Fgithub.com\u002Froboflow\u002Fsupervision) - 可复用的计算机视觉工具。  \n\n##### 
目标检测 \u002F 实例分割\n[重新加载指标：图像分析验证建议](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.01653) - 选择正确图像分析指标的指南，[代码](https:\u002F\u002Fgithub.com\u002FProject-MONAI\u002FMetricsReloaded)，[Twitter 帖子](https:\u002F\u002Ftwitter.com\u002Flena_maierhein\u002Fstatus\u002F1625450342006521857)  \n[优秀的 YOLO 解释](https:\u002F\u002Fjonathan-hui.medium.com\u002Freal-time-object-detection-with-yolo-yolov2-28b1b93e2088)  \n[ultralytics](https:\u002F\u002Fgithub.com\u002Fultralytics\u002Fultralytics) - 易于使用的 YOLO 和 SAM 模型。  \n[yolact](https:\u002F\u002Fgithub.com\u002Fdbolya\u002Fyolact) - 用于实时实例分割的全卷积模型。  \n[EfficientDet Pytorch](https:\u002F\u002Fgithub.com\u002Ftoandaominh1997\u002FEfficientDet.Pytorch)，[EfficientDet Keras](https:\u002F\u002Fgithub.com\u002Fxuannianz\u002FEfficientDet) - 可扩展且高效的目标检测。  \n[detectron2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2) - Facebook 的目标检测（Mask R-CNN）。  \n[simpledet](https:\u002F\u002Fgithub.com\u002FTuSimple\u002Fsimpledet) - 目标检测和实例识别。  \n[CenterNet](https:\u002F\u002Fgithub.com\u002Fxingyizhou\u002FCenterNet) - 目标检测。  \n[FCOS](https:\u002F\u002Fgithub.com\u002Ftianzhi0549\u002FFCOS) - 全卷积单阶段目标检测。  \n[norfair](https:\u002F\u002Fgithub.com\u002Ftryolabs\u002Fnorfair) - 实时 2D 对象跟踪。  \n[Detic](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FDetic) - 支持图像级标签的目标检测器（Facebook Research）。  \n[EasyCV](https:\u002F\u002Fgithub.com\u002Falibaba\u002FEasyCV) - 图像分割、分类、度量学习、目标检测、姿态估计。  \n\n##### 图像分类\n[nfnets](https:\u002F\u002Fgithub.com\u002Fypeleg\u002Fnfnets-keras) - 神经网络。   \n[efficientnet](https:\u002F\u002Fgithub.com\u002Flukemelas\u002FEfficientNet-PyTorch) - 神经网络。   \n[pycls](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fpycls) - PyTorch 图像分类网络：ResNet、ResNeXt、EfficientNet 和 RegNet（由 Facebook 开发）。  \n\n##### 应用与片段\n[SPADE](https:\u002F\u002Fgithub.com\u002Fnvlabs\u002Fspade) - 语义图像合成。  
\n[类别变量的实体嵌入](https:\u002F\u002Farxiv.org\u002Fabs\u002F1604.06737)，[代码](https:\u002F\u002Fgithub.com\u002Fentron\u002Fentity-embedding-rossmann)，[Kaggle](https:\u002F\u002Fwww.kaggle.com\u002Faquatic\u002Fentity-embedding-neural-net\u002Fcode)  \n[图像超分辨率](https:\u002F\u002Fgithub.com\u002Fidealo\u002Fimage-super-resolution) - 使用残差密集网络进行超分辨率。  \n细胞分割 - [讲座](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=dVFZpodqJiI)，博客文章：[1](https:\u002F\u002Fwww.thomasjpfan.com\u002F2018\u002F07\u002Fnuclei-image-segmentation-tutorial\u002F)，[2](https:\u002F\u002Fwww.thomasjpfan.com\u002F2017\u002F08\u002Fhassle-free-unets\u002F)  \n[deeplearning-models](https:\u002F\u002Fgithub.com\u002Frasbt\u002Fdeeplearning-models) - 深度学习模型。  \n\n##### 变分自编码器 (VAE)\n[变分自编码器解释视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=9zKuYvjFFS8)  \n[disentanglement_lib](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fdisentanglement_lib) - BetaVAE、FactorVAE、BetaTCVAE、DIP-VAE。  \n[ladder-vae-pytorch](https:\u002F\u002Fgithub.com\u002Faddtt\u002Fladder-vae-pytorch) - 梯式变分自编码器 (LVAE)。  \n[benchmark_VAE](https:\u002F\u002Fgithub.com\u002Fclementchadebec\u002Fbenchmark_VAE) - 统一生成自编码器实现。  \n\n##### 生成对抗网络 (GAN)\n[精彩的 GAN 应用](https:\u002F\u002Fgithub.com\u002Fnashory\u002Fgans-awesome-applications)  \n[The GAN Zoo](https:\u002F\u002Fgithub.com\u002Fhindupuravinash\u002Fthe-gan-zoo) - 生成对抗网络列表。  \n[CycleGAN 和 Pix2pix](https:\u002F\u002Fgithub.com\u002Fjunyanz\u002Fpytorch-CycleGAN-and-pix2pix) - 各种图像到图像的任务。  \n[TensorFlow GAN 实现](https:\u002F\u002Fgithub.com\u002Fhwalsuklee\u002Ftensorflow-generative-model-collections)  \n[PyTorch GAN 实现](https:\u002F\u002Fgithub.com\u002Fznxlwm\u002Fpytorch-generative-model-collections)  \n[PyTorch GAN 实现](https:\u002F\u002Fgithub.com\u002Feriklindernoren\u002FPyTorch-GAN#adversarial-autoencoder)  \n[StudioGAN](https:\u002F\u002Fgithub.com\u002FPOSTECH-CVLab\u002FPyTorch-StudioGAN) - PyTorch GAN 实现。  \n\n##### 
Transformer\n[注释版 Transformer](https:\u002F\u002Fnlp.seas.harvard.edu\u002Fannotated-transformer\u002F) - Transformer 入门。  \n[从零开始的 Transformer](https:\u002F\u002Fe2eml.school\u002Ftransformers.html) - 入门。  \n[神经网络：从零到英雄](https:\u002F\u002Fkarpathy.ai\u002Fzero-to-hero.html) - 关于构建神经网络的视频系列。  \n[SegFormer](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FSegFormer) - 基于 Transformer 的简单高效语义分割设计。  \n[esvit](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fesvit) - 高效的自监督视觉 Transformer。  \n[nystromformer](https:\u002F\u002Fgithub.com\u002FRishit-dagli\u002FNystromformer) - 因近似自注意力而更高效的 Transformer。  \n\n##### 结构化数据上的深度学习\n[关于表格数据深度学习的优秀综述](https:\u002F\u002Fsebastianraschka.com\u002Fblog\u002F2022\u002Fdeep-learning-for-tabular-data.html)  \n[TabPFN](https:\u002F\u002Fgithub.com\u002FPriorLabs\u002FTabPFN) - 表格数据的基础模型。  \n\n##### 基于图的神经网络\n[如何使用图卷积网络在图上进行深度学习](https:\u002F\u002Ftowardsdatascience.com\u002Fhow-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780)  \n[图卷积网络简介](http:\u002F\u002Ftkipf.github.io\u002Fgraph-convolutional-networks\u002F)  \n[尝试揭秘图深度学习](https:\u002F\u002Fericmjl.github.io\u002Fessays-on-data-science\u002Fmachine-learning\u002Fgraph-nets\u002F)  \n[ogb](https:\u002F\u002Fogb.stanford.edu\u002F) - 开放图基准，基准数据集。  \n[networkx](https:\u002F\u002Fgithub.com\u002Fnetworkx\u002Fnetworkx) - 图库。  \n[cugraph](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcugraph) - RAPIDS，在 GPU 上的图库。  \n[pytorch-geometric](https:\u002F\u002Fgithub.com\u002Frusty1s\u002Fpytorch_geometric) - 多种用于图上深度学习的方法。  \n[dgl](https:\u002F\u002Fgithub.com\u002Fdmlc\u002Fdgl) - 深度图库。  \n[graph_nets](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fgraph_nets) - 在 TensorFlow 中构建图网络，由 DeepMind 开发。  \n\n#### 模型转换\n[hummingbird](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fhummingbird) - 将训练好的机器学习模型编译为张量计算（由 Microsoft 开发）。  \n\n#### GPU\n[cuML](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcuml) - RAPIDS，在 GPU 上运行传统的表格 ML 任务，[介绍](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=6XzS5XcpicM&t=2m50s)。  
\n[thundergbm](https:\u002F\u002Fgithub.com\u002FXtra-Computing\u002Fthundergbm) - GBDT 和随机森林。  \n[thundersvm](https:\u002F\u002Fgithub.com\u002FXtra-Computing\u002Fthundersvm) - 支持向量机。  \nLegate Numpy - 由 Nvidia 开发的基于 GPU 的分布式 NumPy 数组（尚未发布）[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Jxxs_moibog)。\n\n#### 回归\n有序回归：[论文](https:\u002F\u002Fonlinelibrary.wiley.com\u002Fdoi\u002F10.1002\u002Fsim.10208)  \n理解支持向量回归：[幻灯片](https:\u002F\u002Fcs.adelaide.edu.au\u002F~chhshen\u002Fteaching\u002FML_SVR.pdf)、[论坛](https:\u002F\u002Fwww.quora.com\u002FHow-does-support-vector-regression-work)、[论文](http:\u002F\u002Falex.smola.org\u002Fpapers\u002F2003\u002FSmoSch03b.pdf)  \n[广义加性模型](https:\u002F\u002Fm-clark.github.io\u002Fgeneralized-additive-models\u002F) - R语言教程。  \n\n[pyearth](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fpy-earth) - 多元自适应回归样条（MARS），[教程](https:\u002F\u002Fuc-r.github.io\u002Fmars)。  \n[pygam](https:\u002F\u002Fgithub.com\u002Fdswah\u002FpyGAM) - 广义加性模型（GAMs），[解释](https:\u002F\u002Fmultithreaded.stitchfix.com\u002Fblog\u002F2015\u002F07\u002F30\u002Fgam\u002F)。  \n[GLRM](https:\u002F\u002Fgithub.com\u002Fmadeleineudell\u002FLowRankModels.jl) - 广义低秩模型。  \n[tweedie](https:\u002F\u002Fxgboost.readthedocs.io\u002Fen\u002Flatest\u002Fparameter.html#parameters-for-tweedie-regression-objective-reg-tweedie) - 专用于零膨胀目标的分布，[演讲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=-o0lpHBq85I)。  \n[MAPIE](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002FMAPIE) - 预测区间估计。  \n\n#### 多项式\n[orthopy](https:\u002F\u002Fgithub.com\u002Fnschloe\u002Forthopy) - 各种形状和大小的正交多项式。  \n\n#### 分类\n[演讲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DkLPYccEJ8Y)、[笔记本](https:\u002F\u002Fgithub.com\u002Fianozsvald\u002Fdata_science_delivered\u002Fblob\u002Fmaster\u002Fml_creating_correct_capable_classifiers.ipynb)  \n[博客文章：概率评分](https:\u002F\u002Fmachinelearningmastery.com\u002Fhow-to-score-probability-predictions-in-python\u002F)  
\n[所有分类指标](http:\u002F\u002Frali.iro.umontreal.ca\u002Frali\u002Fsites\u002Fdefault\u002Ffiles\u002Fpublis\u002FSokolovaLapalme-JIPM09.pdf)  \n[DESlib](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002FDESlib) - 动态分类器与集成选择。  \n[human-learn](https:\u002F\u002Fgithub.com\u002Fkoaning\u002Fhuman-learn) - 基于您的规则集创建并调优分类器。  \n\n#### 度量学习\n[对比表示学习](https:\u002F\u002Flilianweng.github.io\u002Flil-log\u002F2021\u002F05\u002F31\u002Fcontrastive-representation-learning.html)  \n  \n[metric-learn](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fmetric-learn) - 监督与弱监督度量学习算法。  \n[pytorch-metric-learning](https:\u002F\u002Fgithub.com\u002FKevinMusgrave\u002Fpytorch-metric-learning) - PyTorch度量学习。  \n[deep_metric_learning](https:\u002F\u002Fgithub.com\u002Fronekko\u002Fdeep_metric_learning) - 深度度量学习方法。  \n[ivis](https:\u002F\u002Fbering-ivis.readthedocs.io\u002Fen\u002Flatest\u002Fsupervised.html) - 使用孪生神经网络（Siamese network）进行度量学习。  \n[TensorFlow相似度](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fsimilarity) - 度量学习。  \n\n#### 距离函数\n[Steck等 - 嵌入的余弦相似度真的代表相似性吗？](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.05440)  \n[scipy.spatial](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fspatial.distance.html) - 各种距离度量。  \n[vegdist](https:\u002F\u002Frdrr.io\u002Fcran\u002Fvegan\u002Fman\u002Fvegdist.html) - 距离度量（R包）。  \n[pyemd](https:\u002F\u002Fgithub.com\u002Fwmayner\u002Fpyemd) - 推土机距离（Earth Mover's Distance）\u002FWasserstein距离，用于比较直方图的相似性。[OpenCV实现](https:\u002F\u002Fdocs.opencv.org\u002F3.4\u002Fd6\u002Fdc7\u002Fgroup__imgproc__hist.html)、[POT实现](https:\u002F\u002Fpythonot.github.io\u002Fauto_examples\u002Fplot_OT_2D_samples.html)   \n[dcor](https:\u002F\u002Fgithub.com\u002Fvnmabus\u002Fdcor) - 距离相关及相关的能量统计量。  \n[GeomLoss](https:\u002F\u002Fwww.kernel-operations.io\u002Fgeomloss\u002F) - 核范数、豪斯多夫散度、去偏Sinkhorn散度（即Wasserstein距离的近似值）。  \n\n#### 自监督学习\n[lightly](https:\u002F\u002Fgithub.com\u002Flightly-ai\u002Flightly) - MoCo、SimCLR、SimSiam、Barlow Twins、BYOL、NNCLR。  
\n[vissl](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fvissl) - 使用PyTorch的自监督学习：RotNet、Jigsaw、NPID、ClusterFit、PIRL、SimCLR、MoCo、DeepCluster、SwAV。  \n\n#### 聚类\n[应用于图像数据的聚类算法综述（即深度聚类）](https:\u002F\u002Fdeepnotes.io\u002Fdeep-clustering)。  \n[深度学习聚类：分类与新方法](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1801.07648.pdf)。  \n[层次聚类分析（R教程）](https:\u002F\u002Fuc-r.github.io\u002Fhc_clustering) - 树状图、缠结图  \n[Schubert - 停止使用肘部法则来确定k-means的簇数，并介绍如何正确选择簇数](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.12189)  \n[hdbscan](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fhdbscan) - 聚类算法，[演讲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=dGsxd67IFiU)、[博客](https:\u002F\u002Ftowardsdatascience.com\u002Funderstanding-hdbscan-and-density-based-clustering-121dbee1320e)。  \n[pyclustering](https:\u002F\u002Fgithub.com\u002Fannoviko\u002Fpyclustering) - 各种聚类算法。  \n[FCPS](https:\u002F\u002Fgithub.com\u002FMthrun\u002FFCPS) - 基础聚类问题套件（R包）。  \n[GaussianMixture](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.mixture.GaussianMixture.html) - 使用高斯混合分布的广义k-means聚类，[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=aICqoAG5BXQ)。  \n[nmslib](https:\u002F\u002Fgithub.com\u002Fnmslib\u002Fnmslib) - 相似性搜索库及用于评估k-NN方法的工具箱。  \n[merf](https:\u002F\u002Fgithub.com\u002Fmanifoldai\u002Fmerf) - 混合效应随机森林聚类，[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=gWj4ZwB7f3o)  \n[tree-SNE](https:\u002F\u002Fgithub.com\u002Fisaacrob\u002Ftreesne) - 基于t-SNE的层次聚类算法。  \n[MiniSom](https:\u002F\u002Fgithub.com\u002FJustGlowing\u002Fminisom) - 纯Python实现的自组织映射。  \n[distribution_clustering](https:\u002F\u002Fgithub.com\u002FEricElmoznino\u002Fdistribution_clustering)、[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1804.02624)、[相关论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2003.07770)、[替代方案](https:\u002F\u002Fgithub.com\u002Fr0f1\u002Fdistribution_clustering)。  \n[phenograph](https:\u002F\u002Fgithub.com\u002Fdpeerlab\u002Fphenograph) - 基于社区检测的聚类。  
\n[FastPG](https:\u002F\u002Fgithub.com\u002Fsararselitsky\u002FFastPG) - 单细胞数据（RNA）聚类。对phenograph的改进，[论文](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F342339899_FastPG_Fast_clustering_of_millions_of_single_cells)。  \n[HypHC](https:\u002F\u002Fgithub.com\u002FHazyResearch\u002FHypHC) - 双曲层次聚类。  \n[BanditPAM](https:\u002F\u002Fgithub.com\u002FThrunGroup\u002FBanditPAM) - 改进的k-Medoids聚类。  \n[dendextend](https:\u002F\u002Fgithub.com\u002Ftalgalili\u002Fdendextend) - 树状图比较（R包）。  \n[DeepDPM](https:\u002F\u002Fgithub.com\u002FBGU-CS-VIL\u002FDeepDPM) - 具有未知簇数的深度聚类。  \n[generalized-kmeans-clustering](https:\u002F\u002Fgithub.com\u002Fderrickburns\u002Fgeneralized-kmeans-clustering) - 广义k-means聚类。\n\n##### 聚类评估\n* [Wagner, Wagner - 比较聚类 - 概述](https:\u002F\u002Fpublikationen.bibliothek.kit.edu\u002F1000011477\u002F812079)\n  * [调整兰德指数](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.adjusted_rand_score.html)\n  * [归一化互信息](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.normalized_mutual_info_score.html)\n  * [调整互信息](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.adjusted_mutual_info_score.html)\n  * [福尔克斯-马洛斯分数](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.fowlkes_mallows_score.html)\n  * [轮廓系数](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.silhouette_score.html)\n  * [信息变化量](https:\u002F\u002Fgist.github.com\u002Fjwcarr\u002F626cbc80e0006b526688), [Julia](https:\u002F\u002Fclusteringjl.readthedocs.io\u002Fen\u002Flatest\u002Fvarinfo.html)\n  * [成对混淆矩阵](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.cluster.pair_confusion_matrix.html)\n  * [共识分数](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.metrics.consensus_score.html) - 
两组双聚类的相似性。\n* [评估聚类质量（视频）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Mf6MqIS2ql4)   \n* [fpc](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Ffpc\u002Findex.html) - 各种聚类及聚类验证方法（R包）。  \n  * 任意两个簇之间的最小距离\n  * 质心之间的距离\n  * p-分离指数：类似于最小距离。对于任意簇中占10%的“边界”点，计算其到不同簇最近点的平均距离。用于衡量密度，区分山峰与山谷。\n  * 通过加权计算近邻点数量来估计密度\n* 其他指标：\n  * 簇内平均距离\n  * 簇内平均距离与最近簇平均距离之比的均值（轮廓系数）\n  * 簇内分布与正态或均匀分布的相似度\n  * 簇内各点到质心的距离平方和（即K-Means损失函数）\n  * 原始距离与聚类诱导距离之间的相关系数（Hubert's Gamma）\n  * 簇大小的熵\n  * 簇内最大间隙的平均值\n  * 自助法数据上聚类结果的变化\n\n#### 多标签分类\n[scikit-multilearn](https:\u002F\u002Fgithub.com\u002Fscikit-multilearn\u002Fscikit-multilearn) - 多标签分类，[讲座](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=m-tAASQA7XQ&t=18m57s)。  \n\n#### 批判性AI文献\n[Sublime - 伪科学在人工智能中的回归：机器学习与深度学习是否忘记了统计学和历史的教训？](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.18656)  \n\n#### 信号处理与滤波\n[斯坦福大学傅里叶变换系列讲座](https:\u002F\u002Fsee.stanford.edu\u002FCourse\u002FEE261)，[YouTube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=gZNm7L96pfY&list=PLB24BC7956EE040CD&index=1)，[讲义](https:\u002F\u002Fsee.stanford.edu\u002Fmaterials\u002Flsoftaee261\u002Fbook-fall-07.pdf)。  \n[傅里叶变换可视化讲解](https:\u002F\u002Fdsego.github.io\u002Fdemystifying-fourier\u002F)。  \n[《科学家与工程师数字信号处理指南》（1999）](https:\u002F\u002Fwww.analog.com\u002Fen\u002Feducation\u002Feducation-library\u002Fscientist_engineers_guide.html) - 第3章对贝塞尔、巴特沃斯和切比雪夫滤波器有很好的介绍。  \n[Kalman滤波器文章](https:\u002F\u002Fwww.bzarg.com\u002Fp\u002Fhow-a-kalman-filter-works-in-pictures)。  \n[Kalman滤波器书籍](https:\u002F\u002Fgithub.com\u002Frlabbe\u002FKalman-and-Bayesian-Filters-in-Python) - 以Jupyter Notebook为主，注重直观理解。包含贝叶斯滤波器及多种Kalman滤波器。  \n[FIR和IIR滤波器交互工具](https:\u002F\u002Ffiiir.com\u002F)，[示例](https:\u002F\u002Fplot.ly\u002Fpython\u002Ffft-filters\u002F)。  \n[filterpy](https:\u002F\u002Fgithub.com\u002Frlabbe\u002Ffilterpy) - Kalman滤波与最优估计库。  \n\n#### Python中的滤波\n[scipy.signal](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fsignal.html)\n* 
[巴特沃斯低通滤波器示例](https:\u002F\u002Fgithub.com\u002Fguillaume-chevalier\u002Ffiltering-stft-and-laplace-transform)\n* [萨维茨基-戈莱滤波器](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.signal.savgol_filter.html), [W](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FSavitzky%E2%80%93Golay_filter)  \n[pandas.Series.rolling](https:\u002F\u002Fpandas.pydata.org\u002Fdocs\u002Freference\u002Fapi\u002Fpandas.Series.rolling.html) - 选择合适的`win_type`。  \n\n#### 几何\n[geomstats](https:\u002F\u002Fgithub.com\u002Fgeomstats\u002Fgeomstats) - 具有几何结构的流形上的计算与统计。\n\n#### 时间序列\n[时间序列异常检测综述论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.20512)  \n[statsmodels](https:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Ftsa.html) - 时间序列分析，[季节性分解](https:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.tsa.seasonal.seasonal_decompose.html) [示例](https:\u002F\u002Fgist.github.com\u002Fbalzer82\u002F5cec6ad7adc1b550e7ee)，[SARIMA](https:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.tsa.statespace.sarimax.SARIMAX.html)，[格兰杰因果检验](http:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.tsa.stattools.grangercausalitytests.html)。  \n[darts](https:\u002F\u002Fgithub.com\u002Funit8co\u002Fdarts) - 时间序列库（LightGBM、神经网络）。  \n[kats](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fkats) - Facebook出品的时间序列预测库。  \n[prophet](https:\u002F\u002Fgithub.com\u002Ffacebook\u002Fprophet) - Facebook出品的时间序列预测库。  \n[neural_prophet](https:\u002F\u002Fgithub.com\u002Fourownstory\u002Fneural_prophet) - 基于PyTorch构建的时间序列预测模型。  \n[pmdarima](https:\u002F\u002Fgithub.com\u002Falkaline-ml\u002Fpmdarima) - (自动) ARIMA的封装库。  \n[modeltime](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fmodeltime\u002Findex.html) - 时间序列预测框架（R包）。  \n[pyflux](https:\u002F\u002Fgithub.com\u002FRJT1990\u002Fpyflux) - 时间序列预测算法（ARIMA、GARCH、GAS、贝叶斯方法）。  \n[atspy](https:\u002F\u002Fgithub.com\u002Ffirmai\u002Fatspy) - 
自动化时间序列模型。  \n[pm-prophet](https:\u002F\u002Fgithub.com\u002Fluke14free\u002Fpm-prophet) - 时间序列预测与分解库。  \n[htsprophet](https:\u002F\u002Fgithub.com\u002FCollinRooney12\u002Fhtsprophet) - 使用Prophet进行层次化时间序列预测。  \n[nupic](https:\u002F\u002Fgithub.com\u002Fnumenta\u002Fnupic) - 层次化时间记忆（HTM）用于时间序列预测和异常检测。  \n[tensorflow](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorflow\u002F) - LSTM等模型，示例：[链接](https:\u002F\u002Fmachinelearningmastery.com\u002Ftime-series-forecasting-long-short-term-memory-network-python\u002F)，[链接](https:\u002F\u002Fgithub.com\u002Fhzy46\u002FTensorFlow-Time-Series-Examples)，seq2seq：[1](https:\u002F\u002Fmachinelearningmastery.com\u002Fhow-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption\u002F)，[2](https:\u002F\u002Fgithub.com\u002Fguillaume-chevalier\u002Fseq2seq-signal-prediction)，[3](https:\u002F\u002Fgithub.com\u002FJEddy92\u002FTimeSeries_Seq2Seq\u002Fblob\u002Fmaster\u002Fnotebooks\u002FTS_Seq2Seq_Intro.ipynb)，[4](https:\u002F\u002Fgithub.com\u002FLukeTonin\u002Fkeras-seq-2-seq-signal-prediction)  \n[tspreprocess](https:\u002F\u002Fgithub.com\u002FMaxBenChrist\u002Ftspreprocess) - 预处理：去噪、压缩、重采样。  \n[tsfresh](https:\u002F\u002Fgithub.com\u002Fblue-yonder\u002Ftsfresh) - 时间序列特征工程。  \n[tsfel](https:\u002F\u002Fgithub.com\u002Ffraunhoferportugal\u002Ftsfel) - 时间序列特征提取。  \n[thunder](https:\u002F\u002Fgithub.com\u002Fthunder-project\u002Fthunder) - 用于加载、处理和分析时间序列数据的数据结构与算法。  \n[gatspy](https:\u002F\u002Fwww.astroml.org\u002Fgatspy\u002F) - 天文时间序列通用工具，[讲座](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=E4NMZyfao2c)。  \n[gendis](https:\u002F\u002Fgithub.com\u002FIBCNServices\u002FGENDIS) - shapelets，[示例](https:\u002F\u002Fgithub.com\u002FIBCNServices\u002FGENDIS\u002Fblob\u002Fmaster\u002Fgendis\u002Fexample.ipynb)。  \n[tslearn](https:\u002F\u002Fgithub.com\u002Frtavenar\u002Ftslearn) - 时间序列聚类与分类，`TimeSeriesKMeans`。  
\n[pastas](https:\u002F\u002Fgithub.com\u002Fpastas\u002Fpastas) - 地下水时间序列分析。  \n[fastdtw](https:\u002F\u002Fgithub.com\u002Fslaypni\u002Ffastdtw) - 动态时间规整距离。  \n[fable](https:\u002F\u002Fwww.rdocumentation.org\u002Fpackages\u002Ffable\u002Fversions\u002F0.0.0.9000) - 时间序列预测（R包）。  \n[pydlm](https:\u002F\u002Fgithub.com\u002Fwwrechard\u002Fpydlm) - 贝叶斯时间序列建模（[R包](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fbsts\u002Findex.html)，[博客文章](http:\u002F\u002Fwww.unofficialgoogledatascience.com\u002F2017\u002F07\u002Ffitting-bayesian-structural-time-series.html))  \n[PyAF](https:\u002F\u002Fgithub.com\u002Fantoinecarme\u002Fpyaf) - 自动化时间序列预测。  \n[luminol](https:\u002F\u002Fgithub.com\u002Flinkedin\u002Fluminol) - LinkedIn出品的异常检测与关联性分析库。  \n[matrixprofile-ts](https:\u002F\u002Fgithub.com\u002Ftarget\u002Fmatrixprofile-ts) - 检测模式与异常，[官网](https:\u002F\u002Fwww.cs.ucr.edu\u002F~eamonn\u002FMatrixProfile.html)，[PPT](https:\u002F\u002Fwww.cs.ucr.edu\u002F~eamonn\u002FMatrix_Profile_Tutorial_Part1.pdf)，[替代方案](https:\u002F\u002Fgithub.com\u002Fmatrix-profile-foundation\u002Fmass-ts)。  \n[stumpy](https:\u002F\u002Fgithub.com\u002FTDAmeritrade\u002Fstumpy) - 另一个矩阵轮廓库。  \n[obspy](https:\u002F\u002Fgithub.com\u002Fobspy\u002Fobspy) - 地震学工具包。其中`classic_sta_lta`函数非常实用。  \n[RobustSTL](https:\u002F\u002Fgithub.com\u002FLeeDoYup\u002FRobustSTL) - 鲁棒的季节趋势分解。  \n[seglearn](https:\u002F\u002Fgithub.com\u002Fdmbee\u002Fseglearn) - 时间序列库。  \n[pyts](https:\u002F\u002Fgithub.com\u002Fjohannfaouzi\u002Fpyts) - 时间序列变换与分类，[将时间序列图像化](https:\u002F\u002Fpyts.readthedocs.io\u002Fen\u002Flatest\u002Fauto_examples\u002Findex.html#imaging-time-series)。  \n将时间序列转化为图像并使用神经网络：[示例](https:\u002F\u002Fgist.github.com\u002Foguiza\u002Fc9c373aec07b96047d1ba484f23b7b47)，[示例](https:\u002F\u002Fgithub.com\u002Fkiss90\u002Ftime-series-classification)。  
\n[sktime](https:\u002F\u002Fgithub.com\u002Falan-turing-institute\u002Fsktime)，[sktime-dl](https:\u002F\u002Fgithub.com\u002Fuea-machine-learning\u002Fsktime-dl) - 用于时间序列深度学习的工具箱。  \n[adtk](https:\u002F\u002Fgithub.com\u002Farundo\u002Fadtk) - 时间序列异常检测。  \n[rocket](https:\u002F\u002Fgithub.com\u002Fangus924\u002Frocket) - 使用随机卷积核进行时间序列分类。  \n[luminaire](https:\u002F\u002Fgithub.com\u002Fzillow\u002Fluminaire) - 时间序列异常检测。  \n[etna](https:\u002F\u002Fgithub.com\u002Ftinkoff-ai\u002Fetna) - 时间序列库。  \n[Chaos Genius](https:\u002F\u002Fgithub.com\u002Fchaos-genius\u002Fchaos_genius) - 基于机器学习的分析引擎，用于离群点\u002F异常检测及根本原因分析。  \n[timesfm](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Ftimesfm) - Google预训练的时间序列基础模型。  \n\n#### 时间序列 - Nixtla\n[nixtla](https:\u002F\u002Fgithub.com\u002FNixtla\u002Fnixtla) - 预训练的时间序列基础模型，用于预测和异常检测。  \n[statsforecast](https:\u002F\u002Fgithub.com\u002FNixtla\u002Fstatsforecast) - 基于统计和计量经济学模型的预测。  \n[neuralforecast](https:\u002F\u002Fgithub.com\u002FNixtla\u002Fneuralforecast) - 基于神经网络的预测。  \n[mlforecast](https:\u002F\u002Fgithub.com\u002FNixtla\u002Fmlforecast) - 基于机器学习模型的预测。  \n[hierarchicalforecast](https:\u002F\u002Fgithub.com\u002FNixtla\u002Fhierarchicalforecast) - 基于统计和计量经济学方法的层次化预测。  \n\n##### 时间序列评估\n[TimeSeriesSplit](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.model_selection.TimeSeriesSplit.html) - Sklearn时间序列分割。  \n[tscv](https:\u002F\u002Fgithub.com\u002FWenjieZ\u002FTSCV) - 带有间隔的评估。\n\n#### 金融数据与交易\n使用 cvxpy 的教程：[1](https:\u002F\u002Fcalmcode.io\u002Fcvxpy-one\u002Fthe-stigler-diet.html)、[2](https:\u002F\u002Fcalmcode.io\u002Fcvxpy-two\u002Fintroduction.html)  \n[pandas-datareader](https:\u002F\u002Fpandas-datareader.readthedocs.io\u002Fen\u002Flatest\u002Fwhatsnew.html) - 读取股票数据。  \n[yfinance](https:\u002F\u002Fgithub.com\u002Franaroussi\u002Fyfinance) - 从 Yahoo Finance 读取股票数据。  \n[findatapy](https:\u002F\u002Fgithub.com\u002Fcuemacro\u002Ffindatapy) - 从多种来源读取股票数据。  
\n[ta](https:\u002F\u002Fgithub.com\u002Fbukosabino\u002Fta) - 技术分析库。  \n[backtrader](https:\u002F\u002Fgithub.com\u002Fmementum\u002Fbacktrader) - 用于交易策略的回测工具。  \n[surpriver](https:\u002F\u002Fgithub.com\u002Ftradytics\u002Fsurpriver) - 利用异常检测和机器学习，在股价大幅波动前发现相关股票。  \n[ffn](https:\u002F\u002Fgithub.com\u002Fpmorissette\u002Fffn) - 金融函数库。  \n[bt](https:\u002F\u002Fgithub.com\u002Fpmorissette\u002Fbt) - 回测算法。  \n[alpaca-trade-api-python](https:\u002F\u002Fgithub.com\u002Falpacahq\u002Falpaca-trade-api-python) - 通过 API 实现免佣金交易。  \n[eiten](https:\u002F\u002Fgithub.com\u002Ftradytics\u002Feiten) - 特征投资组合、最小方差投资组合及其他算法化投资策略。  \n[tf-quant-finance](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Ftf-quant-finance) - 谷歌推出的 TensorFlow 量化金融工具。  \n[quantstats](https:\u002F\u002Fgithub.com\u002Franaroussi\u002Fquantstats) - 投资组合管理。  \n[Riskfolio-Lib](https:\u002F\u002Fgithub.com\u002Fdcajasn\u002FRiskfolio-Lib) - 投资组合优化与战略资产配置。  \n[OpenBBTerminal](https:\u002F\u002Fgithub.com\u002FOpenBB-finance\u002FOpenBBTerminal) - 终端工具。  \n[mplfinance](https:\u002F\u002Fgithub.com\u002Fmatplotlib\u002Fmplfinance) - 金融市场数据可视化。  \n\n##### Quantopian 技术栈\n[pyfolio](https:\u002F\u002Fgithub.com\u002Fquantopian\u002Fpyfolio) - 投资组合及风险分析工具。  \n[zipline](https:\u002F\u002Fgithub.com\u002Fquantopian\u002Fzipline) - 算法化交易平台。  \n[alphalens](https:\u002F\u002Fgithub.com\u002Fquantopian\u002Falphalens) - 预测性股票因子的表现分析。  \n[empyrical](https:\u002F\u002Fgithub.com\u002Fquantopian\u002Fempyrical) - 金融风险指标计算。  \n[trading_calendars](https:\u002F\u002Fgithub.com\u002Fquantopian\u002Ftrading_calendars) - 各大证券交易所的日历工具。  \n\n#### 生存分析\n[R 中的时变 Cox 模型](https:\u002F\u002Fstats.stackexchange.com\u002Fquestions\u002F101353\u002Fcox-regression-with-time-varying-covariates)。  \n[lifelines](https:\u002F\u002Flifelines.readthedocs.io\u002Fen\u002Flatest\u002F) - 生存分析、Cox PH 回归，[讲座](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=aKZQUaNHYb0)、[讲座2](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=fli-yE5grtY)。  
\n[scikit-survival](https:\u002F\u002Fgithub.com\u002Fsebp\u002Fscikit-survival) - 生存分析工具。  \n[xgboost](https:\u002F\u002Fgithub.com\u002Fdmlc\u002Fxgboost) - `\"objective\": \"survival:cox\"` [NHANES 示例](https:\u002F\u002Fshap.readthedocs.io\u002Fen\u002Flatest\u002Fexample_notebooks\u002Ftabular_examples\u002Ftree_based_models\u002FNHANES%20I%20Survival%20Model.html)  \n[survivalstan](https:\u002F\u002Fgithub.com\u002Fhammerlab\u002Fsurvivalstan) - 生存分析，[简介](http:\u002F\u002Fwww.hammerlab.org\u002F2017\u002F06\u002F26\u002Fintroducing-survivalstan\u002F)。  \n[convoys](https:\u002F\u002Fgithub.com\u002Fbetter\u002Fconvoys) - 分析时间延迟的转化事件。  \nRandomSurvivalForests（R 包：randomForestSRC、ggRandomForests）。  \n[pysurvival](https:\u002F\u002Fgithub.com\u002Fsquare\u002Fpysurvival) - 生存分析工具。  \n[DeepSurvivalMachines](https:\u002F\u002Fgithub.com\u002Fautonlab\u002FDeepSurvivalMachines) - 全参数化生存回归模型。  \n[auton-survival](https:\u002F\u002Fgithub.com\u002Fautonlab\u002Fauton-survival) - 基于删失时间事件的回归、反事实估计、评估与表型分析。  \n\n#### 离群点检测与异常检测\n[sklearn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Foutlier_detection.html) - 孤立森林等方法。  \n[pyod](https:\u002F\u002Fpyod.readthedocs.io\u002Fen\u002Flatest\u002Fpyod.html) - 离群点检测\u002F异常检测工具。  \n[eif](https:\u002F\u002Fgithub.com\u002Fsahandha\u002Feif) - 扩展孤立森林。  \n[AnomalyDetection](https:\u002F\u002Fgithub.com\u002Ftwitter\u002FAnomalyDetection) - 异常检测（R 包）。  \n[luminol](https:\u002F\u002Fgithub.com\u002Flinkedin\u002Fluminol) - 来自 LinkedIn 的异常检测与相关性分析库。  \n用于比较直方图并检测离群点的距离指标 - 
[讲座](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=U7xdiGc7IRU)：[Kolmogorov-Smirnov](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy-0.14.0\u002Freference\u002Fgenerated\u002Fscipy.stats.ks_2samp.html)、[Wasserstein](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.stats.wasserstein_distance.html)、[能量距离（Cramer）](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.stats.energy_distance.html)、[Kullback-Leibler 散度](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.special.kl_div.html)。  \n[banpei](https:\u002F\u002Fgithub.com\u002Ftsurubee\u002Fbanpei) - 基于奇异谱变换的异常检测库。  \n[telemanom](https:\u002F\u002Fgithub.com\u002Fkhundman\u002Ftelemanom) - 使用 LSTM 检测多变量时间序列中的异常。  \n[luminaire](https:\u002F\u002Fgithub.com\u002Fzillow\u002Fluminaire) - 时间序列异常检测工具。  \n[rrcf](https:\u002F\u002Fgithub.com\u002FkLabUM\u002Frrcf) - 适用于流式数据的鲁棒随机切割森林算法，用于异常检测。  \n\n#### 概念漂移与领域偏移\n[TorchDrift](https:\u002F\u002Fgithub.com\u002FTorchDrift\u002FTorchDrift) - 用于 PyTorch 模型的漂移检测工具。  \n[alibi-detect](https:\u002F\u002Fgithub.com\u002FSeldonIO\u002Falibi-detect) - 用于离群点、对抗样本及漂移检测的算法。  \n[evidently](https:\u002F\u002Fgithub.com\u002Fevidentlyai\u002Fevidently) - 从验证到生产阶段对机器学习模型进行评估与监控。  \n[Lipton 等人 - 使用黑盒预测器检测并纠正标签偏移](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.03916)。  \n[Bu 等人 - 基于密度差异估计的无 PDF 变化检测方法](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F7745962)。  \n\n#### 排序\n[lightning](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Flightning) - 大规模线性分类、回归和排序。  \n\n#### 因果推断\n\n##### 文献\n[Chatton 等 - 因果烹饪书：倾向得分、g-计算与双重稳健标准化的配方](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002F10.1177\u002F25152459241236149)  \n[统计再思考](https:\u002F\u002Fgithub.com\u002Frmcelreath\u002Fstat_rethinking_2022) - 
视频讲座系列，贝叶斯统计，因果模型，[R](https:\u002F\u002Fbookdown.org\u002Fcontent\u002F4857\u002F)，[python](https:\u002F\u002Fgithub.com\u002Fpymc-devs\u002Fresources\u002Ftree\u002Fmaster\u002FRethinking_2)，[numpyro1](https:\u002F\u002Fgithub.com\u002Fasuagar\u002Fstatrethink-course-numpyro-2019)，[numpyro2](https:\u002F\u002Ffehiepsi.github.io\u002Frethinking-numpyro\u002F)，[tensorflow-probability](https:\u002F\u002Fgithub.com\u002Fksachdeva\u002Frethinking-tensorflow-probability)。  \n[Naimi 等 - g 方法导论](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC6074945\u002F)  \n[CS 594 因果推断与学习](https:\u002F\u002Fwww.cs.uic.edu\u002F~elena\u002Fcourses\u002Ffall19\u002Fcs594cil.html)  \n[边际效应教程](https:\u002F\u002Fmarginaleffects.com\u002Fvignettes\u002Fgcomputation.html) - 边际效应、g-计算等。  \n[Python 因果关系手册](https:\u002F\u002Fgithub.com\u002Fmatheusfacure\u002Fpython-causality-handbook)  \n[The Effect：研究设计与因果关系导论](https:\u002F\u002Ftheeffectbook.net\u002Findex.html) - 书籍  \n[结构方程模型](https:\u002F\u002Fm-clark.github.io\u002Fsem\u002F) - R 语言教程。  \n\n##### 工具\n[pecan](https:\u002F\u002Fpecan-tool.rpsychologist.com\u002F) - 用于构建交互式感知因果网络的在线工具。  \n[dagitty](https:\u002F\u002Fwww.dagitty.net\u002F) - 构建因果 DAG。  \n[dowhy](https:\u002F\u002Fgithub.com\u002Fpy-why\u002Fdowhy) - 估计因果效应。  \n[CausalImpact](https:\u002F\u002Fgithub.com\u002Ftcassou\u002Fcausal_impact) - 因果影响分析（[R 包](https:\u002F\u002Fgoogle.github.io\u002FCausalImpact\u002FCausalImpact.html)）。  \n[causallib](https:\u002F\u002Fgithub.com\u002FIBM\u002Fcausallib) - IBM 提供的模块化因果推断分析与模型评估，[示例](https:\u002F\u002Fgithub.com\u002FIBM\u002Fcausallib\u002Ftree\u002Fmaster\u002Fexamples)。  \n[causalml](https:\u002F\u002Fgithub.com\u002Fuber\u002Fcausalml) - Uber 的因果推断工具。  \n[upliftml](https:\u002F\u002Fgithub.com\u002Fbookingcom\u002Fupliftml) - Booking.com 的因果推断工具。  \n[causality](https:\u002F\u002Fgithub.com\u002Fakelleh\u002Fcausality) - 使用观察性数据集进行因果分析。  
\n[DoubleML](https:\u002F\u002Fgithub.com\u002FDoubleML\u002Fdoubleml-for-py) - Machine learning + causal inference, [Tweet](https:\u002F\u002Ftwitter.com\u002FChristophMolnar\u002Fstatus\u002F1574338002305880068), [Presentation](https:\u002F\u002Fscholar.princeton.edu\u002Fsites\u002Fdefault\u002Ffiles\u002Fbstewart\u002Ffiles\u002Ffelton.chern_.slides.20190318.pdf), [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1608.00060v1).  \n[EconML](https:\u002F\u002Fgithub.com\u002Fpy-why\u002FEconML) - Heterogeneous treatment effect estimation from Microsoft.  \n\n##### Papers\n[Bours - Confounding](https:\u002F\u002Fedisciplinas.usp.br\u002Fpluginfile.php\u002F5625667\u002Fmod_resource\u002Fcontent\u002F3\u002FNontechnicalexplanation-counterfactualdefinition-confounding.pdf)  \n[Bours - Effect Modification and Interaction](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS0895435621000330)  \n\n#### Probabilistic Modelling and Bayes\n[Intro](https:\u002F\u002Ferikbern.com\u002F2018\u002F10\u002F08\u002Fthe-hackers-guide-to-uncertainty-estimates.html), [Guide](https:\u002F\u002Fgithub.com\u002FCamDavidsonPilon\u002FProbabilistic-Programming-and-Bayesian-Methods-for-Hackers)  \n[PyMC3](https:\u002F\u002Fwww.pymc.io\u002Fprojects\u002Fdocs\u002Fen\u002Fstable\u002Flearn.html) - Bayesian modelling.  \n[numpyro](https:\u002F\u002Fgithub.com\u002Fpyro-ppl\u002Fnumpyro) - Probabilistic programming with NumPy, built on [pyro](https:\u002F\u002Fgithub.com\u002Fpyro-ppl\u002Fpyro).  \n[pomegranate](https:\u002F\u002Fgithub.com\u002Fjmschrei\u002Fpomegranate) - Probabilistic modelling, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=dE5j6NW-Kzg).  \n[pmlearn](https:\u002F\u002Fgithub.com\u002Fpymc-learn\u002Fpymc-learn) - Probabilistic machine learning.  \n[arviz](https:\u002F\u002Fgithub.com\u002Farviz-devs\u002Farviz) - Exploratory analysis of Bayesian models.  \n[zhusuan](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002Fzhusuan) - Bayesian deep learning, generative models.  \n[edward](https:\u002F\u002Fgithub.com\u002Fblei-lab\u002Fedward) - Probabilistic modelling, inference, and criticism, [Mixture Density Networks (MDNs)](http:\u002F\u002Fedwardlib.org\u002Ftutorials\u002Fmixture-density-network), [MDN 
解释](https:\u002F\u002Ftowardsdatascience.com\u002Fa-hitchhikers-guide-to-mixture-density-networks-76b435826cca)。  \n[Pyro](https:\u002F\u002Fgithub.com\u002Fpyro-ppl\u002Fpyro) - 深度通用概率编程。  \n[TensorFlow 概率](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fprobability) - 深度学习与概率建模，[演讲1](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=KJxmC5GCWe4)，[笔记本演讲1](https:\u002F\u002Fgithub.com\u002FAlxndrMlk\u002FPyDataGlobal2021\u002Fblob\u002Fmain\u002F00_PyData_Global_2021_nb_full.ipynb)，[演讲2](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=BrwKURU-wpk)，[示例](https:\u002F\u002Fgithub.com\u002FCamDavidsonPilon\u002FProbabilistic-Programming-and-Bayesian-Methods-for-Hackers\u002Fblob\u002Fmaster\u002FChapter1_Introduction\u002FCh1_Introduction_TFP.ipynb)。  \n[bambi](https:\u002F\u002Fgithub.com\u002Fbambinos\u002Fbambi) - 基于 PyMC3 的高级贝叶斯建模接口。  \n[neural-tangents](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fneural-tangents) - 无限神经网络。  \n[bnlearn](https:\u002F\u002Fgithub.com\u002Ferdogant\u002Fbnlearn) - 贝叶斯网络，参数学习、推理与采样方法。  \n\n#### 高斯过程\n[可视化](http:\u002F\u002Fwww.infinitecuriosity.org\u002Fvizgp\u002F)，[文章](https:\u002F\u002Fdistill.pub\u002F2019\u002Fvisual-exploration-gaussian-processes\u002F)  \n[GPyOpt](https:\u002F\u002Fgithub.com\u002FSheffieldML\u002FGPyOpt) - 高斯过程优化。   \n[GPflow](https:\u002F\u002Fgithub.com\u002FGPflow\u002FGPflow) - 高斯过程（TensorFlow）。  \n[gpytorch](https:\u002F\u002Fgpytorch.ai\u002F) - 高斯过程（PyTorch）。  \n\n#### 模型堆叠与集成\n[模型堆叠博客文章](http:\u002F\u002Fblog.kaggle.com\u002F2017\u002F06\u002F15\u002Fstacking-made-easy-an-introduction-to-stacknet-by-competitions-grandmaster-marios-michailidis-kazanova\u002F)  \n[mlxtend](https:\u002F\u002Fgithub.com\u002Frasbt\u002Fmlxtend) - `EnsembleVoteClassifier`、`StackingRegressor`、`StackingCVRegressor` 用于模型堆叠。  \n[vecstack](https:\u002F\u002Fgithub.com\u002Fvecxoz\u002Fvecstack) - ML 模型堆叠。  \n[StackNet](https:\u002F\u002Fgithub.com\u002Fkaz-Anova\u002FStackNet) - ML 模型堆叠。  
\n[mlens](https:\u002F\u002Fgithub.com\u002Fflennerhag\u002Fmlens) - Ensemble learning.  \n[combo](https:\u002F\u002Fgithub.com\u002Fyzhao062\u002Fcombo) - Combining ML models (stacking, ensembling).  \n\n#### Model Evaluation\n[evaluate](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fevaluate) - Evaluate machine learning models (Hugging Face).  \n[pycm](https:\u002F\u002Fgithub.com\u002Fsepandhaghighi\u002Fpycm) - Multi-class confusion matrix.  \n[pandas_ml](https:\u002F\u002Fgithub.com\u002Fpandas-ml\u002Fpandas-ml) - Confusion matrix.  \nPlotting learning curves: [link](http:\u002F\u002Fwww.ritchieng.com\u002Fmachinelearning-learning-curve\u002F).  \n[yellowbrick](http:\u002F\u002Fwww.scikit-yb.org\u002Fen\u002Flatest\u002Fapi\u002Fmodel_selection\u002Flearning_curve.html) - Learning curve.  \n[pyroc](https:\u002F\u002Fgithub.com\u002Fnoudald\u002Fpyroc) - Receiver Operating Characteristic (ROC) curves.  \n\n#### Model Uncertainty\n[awesome-conformal-prediction](https:\u002F\u002Fgithub.com\u002Fvaleman\u002Fawesome-conformal-prediction) - Uncertainty quantification.  \n[uncertainty-toolbox](https:\u002F\u002Fgithub.com\u002Funcertainty-toolbox\u002Funcertainty-toolbox) - Predictive uncertainty quantification, calibration, metrics, and visualization.\n\n#### Model Explanation, Interpretability, Feature Importance\n[Princeton - Reproducibility Crisis in ML-based Science](https:\u002F\u002Fsites.google.com\u002Fprinceton.edu\u002Frep-workshop)   \n[Book](https:\u002F\u002Fchristophm.github.io\u002Finterpretable-ml-book\u002Fagnostic.html), [Examples](https:\u002F\u002Fgithub.com\u002Fjphall663\u002Finterpretable_machine_learning_with_python)  \nscikit-learn - [Permutation importance](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.inspection.permutation_importance.html) (can be used on any trained classifier) and [partial dependence](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.inspection.partial_dependence.html)  \n[shap](https:\u002F\u002Fgithub.com\u002Fslundberg\u002Fshap) - Explain predictions of machine learning models, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=C80SQe16Rao), [Good SHAP intro](https:\u002F\u002Fwww.aidancooper.co.uk\u002Fa-non-technical-guide-to-interpreting-shap-analyses\u002F).  \n[shapiq](https:\u002F\u002Fgithub.com\u002Fmmschlk\u002Fshapiq) - Shapley interaction quantification.  
\n[treeinterpreter](https:\u002F\u002Fgithub.com\u002Fandosa\u002Ftreeinterpreter) - 解释 scikit-learn 的决策树和随机森林预测。  \n[lime](https:\u002F\u002Fgithub.com\u002Fmarcotcr\u002Flime) - 解释任何机器学习分类器的预测，[演讲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=C80SQe16Rao), [警告（神话7）](https:\u002F\u002Fcrazyoscarchang.github.io\u002F2019\u002F02\u002F16\u002Fseven-myths-in-machine-learning-research\u002F)。  \n[lime_xgboost](https:\u002F\u002Fgithub.com\u002Fjphall663\u002Flime_xgboost) - 为 XGBoost 创建 LIME 解释。  \n[eli5](https:\u002F\u002Fgithub.com\u002FTeamHG-Memex\u002Feli5) - 检查机器学习分类器并解释其预测。  \n[lofo-importance](https:\u002F\u002Fgithub.com\u002Faerdem4\u002Flofo-importance) - 留一特征法重要性，[演讲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=zqsQ2ojj7sE)。  \n[pybreakdown](https:\u002F\u002Fgithub.com\u002FMI2DataLab\u002FpyBreakDown) - 生成特征贡献图。  \n[pycebox](https:\u002F\u002Fgithub.com\u002FAustinRochford\u002FPyCEbox) - 个体条件期望图工具箱。  \n[pdpbox](https:\u002F\u002Fgithub.com\u002FSauceCat\u002FPDPbox) - 部分依赖图工具箱，[示例](https:\u002F\u002Fwww.kaggle.com\u002Fdansbecker\u002Fpartial-plots)。  \n[partial_dependence](https:\u002F\u002Fgithub.com\u002Fnyuvis\u002Fpartial_dependence) - 可视化和聚类部分依赖关系。  \n[contrastive_explanation](https:\u002F\u002Fgithub.com\u002FMarcelRobeer\u002FContrastiveExplanation) - 对比解释。  \n[DrWhy](https:\u002F\u002Fgithub.com\u002FModelOriented\u002FDrWhy) - 可解释 AI 工具集合。  \n[lucid](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Flucid) - 神经网络可解释性。  \n[xai](https:\u002F\u002Fgithub.com\u002FEthicalML\u002FXAI) - 机器学习可解释性工具箱。  \n[innvestigate](https:\u002F\u002Fgithub.com\u002Falbermax\u002Finnvestigate) - 用于研究神经网络预测的工具箱。  \n[dalex](https:\u002F\u002Fgithub.com\u002Fpbiecek\u002FDALEX) - ML 模型解释（R 包）。  \n[interpretml](https:\u002F\u002Fgithub.com\u002Finterpretml\u002Finterpret) - 拟合可解释模型，解释模型。  \n[shapash](https:\u002F\u002Fgithub.com\u002FMAIF\u002Fshapash) - 模型可解释性。  \n[imodels](https:\u002F\u002Fgithub.com\u002Fcsinva\u002Fimodels) - 可解释 ML 包。  
\n[captum](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fcaptum) - Model interpretability and understanding for PyTorch.  \n\n#### Automated Machine Learning\n[AdaNet](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fadanet) - Automated machine learning based on TensorFlow.  \n[tpot](https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Ftpot) - Automated machine learning tool, optimizes machine learning pipelines.  \n[autokeras](https:\u002F\u002Fgithub.com\u002Fjhfjhfj1\u002Fautokeras) - AutoML for deep learning.  \n[nni](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002Fnni) - Toolkit for neural architecture search and hyperparameter tuning by Microsoft.  \n[mljar](https:\u002F\u002Fgithub.com\u002Fmljar\u002Fmljar-supervised) - Automated machine learning.  \n[automl_zero](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Ftree\u002Fmaster\u002Fautoml_zero) - Automatically discover computer programs that can solve machine learning tasks, from Google.  \n[AlphaPy](https:\u002F\u002Fgithub.com\u002FScottfreeLLC\u002FAlphaPy) - Automated machine learning using scikit-learn, XGBoost, LightGBM and others.  \n\n#### Graph Representation Learning\n[Karate Club](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fkarateclub) - Unsupervised learning on graphs.   \n[PyTorch Geometric](https:\u002F\u002Fgithub.com\u002Frusty1s\u002Fpytorch_geometric) - Graph representation learning with PyTorch.   \n[DGL](https:\u002F\u002Fgithub.com\u002Fdmlc\u002Fdgl) - Graph representation learning with PyTorch, TensorFlow, or MXNet backends.   \n\n#### Convex Optimization\n[cvxpy](https:\u002F\u002Fgithub.com\u002Fcvxgrp\u002Fcvxpy) - Modelling language for convex optimization problems. Tutorials: [1](https:\u002F\u002Fcalmcode.io\u002Fcvxpy-one\u002Fthe-stigler-diet.html), [2](https:\u002F\u002Fcalmcode.io\u002Fcvxpy-two\u002Fintroduction.html)  \n\n#### Evolutionary Algorithms and Optimization\n[deap](https:\u002F\u002Fgithub.com\u002FDEAP\u002Fdeap) - Evolutionary computation framework (genetic algorithms, evolution strategies).  \n[evol](https:\u002F\u002Fgithub.com\u002Fgodatadriven\u002Fevol) - DSL for composable evolutionary algorithms, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=68ABAU_V8qI&t=11m49s).  \n[platypus](https:\u002F\u002Fgithub.com\u002FProject-Platypus\u002FPlatypus) - Multiobjective optimization.  \n[autograd](https:\u002F\u002Fgithub.com\u002FHIPS\u002Fautograd) - Efficiently computes derivatives of NumPy code.  \n[nevergrad](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnevergrad) - Derivative-free optimization.  
\n[gplearn](https:\u002F\u002Fgplearn.readthedocs.io\u002Fen\u002Fstable\u002F) - Sklearn-like interface for genetic programming.  \n[blackbox](https:\u002F\u002Fgithub.com\u002Fpaulknysh\u002Fblackbox) - Optimization of expensive black-box functions.  \nOptometrist algorithm - [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-017-06645-7).  \n[DeepSwarm](https:\u002F\u002Fgithub.com\u002FPattio\u002FDeepSwarm) - Neural architecture search.  \n[evotorch](https:\u002F\u002Fgithub.com\u002Fnnaisense\u002Fevotorch) - Evolutionary computation library built on PyTorch.\n\n#### Hyperparameter Tuning\n[sklearn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Findex.html) - [GridSearchCV](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.model_selection.GridSearchCV.html), [RandomizedSearchCV](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.model_selection.RandomizedSearchCV.html).  \n[sklearn-deap](https:\u002F\u002Fgithub.com\u002Frsteca\u002Fsklearn-deap) - Hyperparameter search using genetic algorithms.  \n[hyperopt](https:\u002F\u002Fgithub.com\u002Fhyperopt\u002Fhyperopt) - Hyperparameter optimization.  \n[hyperopt-sklearn](https:\u002F\u002Fgithub.com\u002Fhyperopt\u002Fhyperopt-sklearn) - Hyperopt + sklearn.  \n[optuna](https:\u002F\u002Fgithub.com\u002Fpfnet\u002Foptuna) - Hyperparameter optimization, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=tcrcLRopTX0).  \n[skopt](https:\u002F\u002Fscikit-optimize.github.io\u002F) - `BayesSearchCV` for hyperparameter search.  \n[tune](https:\u002F\u002Fray.readthedocs.io\u002Fen\u002Flatest\u002Ftune.html) - Hyperparameter search with a focus on deep learning and deep reinforcement learning.  \n[bbopt](https:\u002F\u002Fgithub.com\u002Fevhub\u002Fbbopt) - Black-box hyperparameter optimization.  \n[dragonfly](https:\u002F\u002Fgithub.com\u002Fdragonfly\u002Fdragonfly) - Scalable Bayesian optimization.  \n[botorch](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fbotorch) - Bayesian optimization in PyTorch.  \n[ax](https:\u002F\u002Fgithub.com\u002Ffacebook\u002FAx) - Adaptive experimentation platform by Facebook.  \n[lightning-hpo](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flightning-hpo) - Hyperparameter optimization based on optuna.  \n\n#### Incremental Learning, Online Learning\nsklearn - 
[PassiveAggressiveClassifier](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.linear_model.PassiveAggressiveClassifier.html), [PassiveAggressiveRegressor](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.linear_model.PassiveAggressiveRegressor.html)。  \n[river](https:\u002F\u002Fgithub.com\u002Fonline-ml\u002Friver) - 在线机器学习。  \n[Kaggler](https:\u002F\u002Fgithub.com\u002Fjeongyoonlee\u002FKaggler) - 在线学习算法。  \n\n#### 主动学习\n[讲座](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0efyjq5rWS4)  \n[modAL](https:\u002F\u002Fgithub.com\u002FmodAL-python\u002FmodAL) - 主动学习框架。  \n\n#### 强化学习\n[YouTube](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT), [YouTube](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLqYmG7hTraZDNJre23vqCGIVpfZ_K2RZs)  \n蒙特卡洛树搜索（MCTS）入门 - [1](https:\u002F\u002Fjeffbradberry.com\u002Fposts\u002F2015\u002F09\u002Fintro-to-monte-carlo-tree-search\u002F), [2](http:\u002F\u002Fmcts.ai\u002Fabout\u002Findex.html), [3](https:\u002F\u002Fmedium.com\u002F@quasimik\u002Fmonte-carlo-tree-search-applied-to-letterpress-34f41c86e238)  \nAlphaZero方法论 - [1](https:\u002F\u002Fgithub.com\u002FAppliedDataSciencePartners\u002FDeepReinforcementLearning), [2](https:\u002F\u002Fweb.stanford.edu\u002F~surag\u002Fposts\u002Falphazero.html), [3](https:\u002F\u002Fgithub.com\u002Fsuragnair\u002Falpha-zero-general), [速查表](https:\u002F\u002Fmedium.com\u002Fapplied-data-science\u002Falphago-zero-explained-in-one-diagram-365f5abf67e0)  \n[RLLib](https:\u002F\u002Fray.readthedocs.io\u002Fen\u002Flatest\u002Frllib.html) - 强化学习库。  \n[Horizon](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FHorizon\u002F) - Facebook的强化学习框架。  \n\n#### 部署与生命周期管理\n\n##### 工作流调度与编排\n[nextflow](https:\u002F\u002Fgithub.com\u002Fgoodwright\u002Fnextflow.py) - 使用Google Life Sciences、AWS Batch等，在Docker镜像中运行脚本和工作流图，[官网](https:\u002F\u002Fgithub.com\u002Fnextflow-io\u002Fnextflow)。 
  \n[airflow](https:\u002F\u002Fgithub.com\u002Fapache\u002Fairflow) - 调度和监控工作流。  \n[prefect](https:\u002F\u002Fgithub.com\u002FPrefectHQ\u002Fprefect) - Python专用的工作流调度。  \n[dagster](https:\u002F\u002Fgithub.com\u002Fdagster-io\u002Fdagster) - 数据资产的开发、生产和观测。  \n[ploomber](https:\u002F\u002Fgithub.com\u002Fploomber\u002Fploomber) - 工作流编排。  \n[kestra](https:\u002F\u002Fgithub.com\u002Fkestra-io\u002Fkestra) - 工作流编排。  \n[cml](https:\u002F\u002Fgithub.com\u002Fiterative\u002Fcml) - 机器学习项目的CI\u002FCD。  \n[rocketry](https:\u002F\u002Fgithub.com\u002FMiksus\u002Frocketry) - 任务调度。  \n[huey](https:\u002F\u002Fgithub.com\u002Fcoleifer\u002Fhuey) - 任务队列。  \n\n##### 容器化与Docker\n[减小Docker镜像大小（视频）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Z1Al4I4Os_A)  \n[优化Docker镜像大小](https:\u002F\u002Fwww.augmentedmind.de\u002F2022\u002F02\u002F06\u002Foptimize-docker-image-size\u002F)  \n[cog](https:\u002F\u002Fgithub.com\u002Freplicate\u002Fcog) - 方便构建Docker镜像。  \n\n##### 数据版本控制、数据库、管道与模型服务\n[dvc](https:\u002F\u002Fgithub.com\u002Fiterative\u002Fdvc) - 大文件的版本控制。  \n[kedro](https:\u002F\u002Fgithub.com\u002Fquantumblacklabs\u002Fkedro) - 构建数据管道。  \n[feast](https:\u002F\u002Fgithub.com\u002Ffeast-dev\u002Ffeast) - 特征存储。[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=_omcXenypmo)。  \n[pgvector](https:\u002F\u002Fgithub.com\u002Fpgvector\u002Fpgvector) - PostgreSQL中的向量相似度搜索。  \n[pinecone](https:\u002F\u002Fwww.pinecone.io\u002F) - 向量搜索应用的数据库。  \n[truss](https:\u002F\u002Fgithub.com\u002Fbasetenlabs\u002Ftruss) - 提供ML模型服务。  \n[milvus](https:\u002F\u002Fgithub.com\u002Fmilvus-io\u002Fmilvus) - 用于相似度搜索的向量数据库。  \n[mlem](https:\u002F\u002Fgithub.com\u002Fiterative\u002Fmlem) - 按照GitOps原则对ML模型进行版本管理和部署。  \n\n##### 数据科学相关\n[m2cgen](https:\u002F\u002Fgithub.com\u002FBayesWitnesses\u002Fm2cgen) - 将训练好的ML模型转译为其他语言。  \n[sklearn-porter](https:\u002F\u002Fgithub.com\u002Fnok\u002Fsklearn-porter) - 将训练好的scikit-learn估计器转译为C、Java、JavaScript等。  \n[mlflow](https:\u002F\u002Fmlflow.org\u002F) - 
Manage the machine learning lifecycle, including experimentation, reproducibility and deployment.  \n[skll](https:\u002F\u002Fgithub.com\u002FEducationalTestingService\u002Fskll) - Command-line utilities that make it easier to run machine learning experiments.  \n[BentoML](https:\u002F\u002Fgithub.com\u002Fbentoml\u002FBentoML) - Package and deploy machine learning models for serving in production.  \n[dagster](https:\u002F\u002Fgithub.com\u002Fdagster-io\u002Fdagster) - Tool with a focus on dependency graphs.  \n[knockknock](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fknockknock) - Be notified when your training ends.  \n[metaflow](https:\u002F\u002Fgithub.com\u002FNetflix\u002Fmetaflow) - Lifecycle management tool by Netflix.  \n[cortex](https:\u002F\u002Fgithub.com\u002Fcortexlabs\u002Fcortex) - Deploy machine learning models.  \n[Neptune](https:\u002F\u002Fneptune.ai) - Experiment tracking and model registry.  \n[clearml](https:\u002F\u002Fgithub.com\u002Fallegroai\u002Fclearml) - Experiment management, MLOps and data management.  \n[polyaxon](https:\u002F\u002Fgithub.com\u002Fpolyaxon\u002Fpolyaxon) - MLOps.  \n[sematic](https:\u002F\u002Fgithub.com\u002Fsematic-ai\u002Fsematic) - Deploy machine learning models.  \n[zenml](https:\u002F\u002Fgithub.com\u002Fzenml-io\u002Fzenml) - MLOps.  \n\n#### Math and Background\n[All kinds of math and statistics resources](https:\u002F\u002Frealnotcomplex.com\u002F)  \nGilbert Strang - [Linear Algebra](https:\u002F\u002Focw.mit.edu\u002Fcourses\u002Fmathematics\u002F18-06-linear-algebra-spring-2010\u002Findex.htm)  \nGilbert Strang - [Matrix Methods in Data Analysis, Signal Processing, and Machine Learning](https:\u002F\u002Focw.mit.edu\u002Fcourses\u002Fmathematics\u002F18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018\u002F)\n\n#### Resources\n[Distill.pub](https:\u002F\u002Fdistill.pub\u002F) - Blog.   \n[Machine Learning Videos](https:\u002F\u002Fgithub.com\u002Fdustinvtran\u002Fml-videos)  \n[Data Science Notebooks](https:\u002F\u002Fgithub.com\u002Fdonnemartin\u002Fdata-science-ipython-notebooks)  \n[Recommender Systems (Microsoft)](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FRecommenders)  \n[Data Science Cheatsheets](https:\u002F\u002Fgithub.com\u002FFavioVazquez\u002Fds-cheatsheets)   \n\n##### Guidelines \n[datasharing](https:\u002F\u002Fgithub.com\u002Fjtleek\u002Fdatasharing) - Guide to data sharing.  \n\n##### Books\n[Blum - Foundations of Data Science](https:\u002F\u002Fwww.cs.cornell.edu\u002Fjeh\u002Fbook.pdf?file=book.pdf)  \n[Chan - 
Introduction to Probability for Data Science](https:\u002F\u002Fprobability4datascience.com\u002Findex.html)  \n[Colonescu - Principles of Econometrics with R](https:\u002F\u002Fbookdown.org\u002Fccolonescu\u002FRPoE4\u002F)  \n[Rafael Irizarry - Introduction to Data Science](https:\u002F\u002Frafalab.dfci.harvard.edu\u002Fdsbook-part-1\u002F) (R)  \n[Rafael Irizarry - Advanced Data Science](https:\u002F\u002Frafalab.dfci.harvard.edu\u002Fdsbook-part-2\u002F) (R)  \n\n##### Other Awesome Lists\n[Awesome Adversarial Machine Learning](https:\u002F\u002Fgithub.com\u002Fyenchenlin\u002Fawesome-adversarial-machine-learning)    \n[Awesome AI Bookmarks](https:\u002F\u002Fgithub.com\u002Fgoodrahstar\u002Fmy-awesome-AI-bookmarks)    \n[Awesome AI on Kubernetes](https:\u002F\u002Fgithub.com\u002FCognonicLabs\u002Fawesome-AI-kubernetes)    \n[Awesome Big Data](https:\u002F\u002Fgithub.com\u002Fonurakpolat\u002Fawesome-bigdata)    \n[Awesome Biological Image Analysis](https:\u002F\u002Fgithub.com\u002Fhallvaaw\u002Fawesome-biological-image-analysis)  \n[Awesome Business Machine Learning](https:\u002F\u002Fgithub.com\u002Ffirmai\u002Fbusiness-machine-learning)    \n[Awesome Causality](https:\u002F\u002Fgithub.com\u002Frguo12\u002Fawesome-causality-algorithms)    \n[Awesome Community Detection](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fawesome-community-detection)    \n[Awesome CSV](https:\u002F\u002Fgithub.com\u002FsecretGeek\u002FAwesomeCSV)  \n[Awesome Cytodata](https:\u002F\u002Fgithub.com\u002Fcytodata\u002Fawesome-cytodata)  \n[Awesome Data Science](https:\u002F\u002Fgithub.com\u002Facademic\u002Fawesome-datascience)  \n[Awesome Data Science with Ruby](https:\u002F\u002Fgithub.com\u002Farbox\u002Fdata-science-with-ruby)   \n[Awesome Dash](https:\u002F\u002Fgithub.com\u002Fucg8j\u002Fawesome-dash)   \n[Awesome Decision Trees](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fawesome-decision-tree-papers)    \n[Awesome Deep Learning](https:\u002F\u002Fgithub.com\u002FChristosChristofidis\u002Fawesome-deep-learning)   \n[Awesome 
ETL](https:\u002F\u002Fgithub.com\u002Fpawl\u002Fawesome-etl)   \n[Awesome Financial Machine Learning](https:\u002F\u002Fgithub.com\u002Ffirmai\u002Ffinancial-machine-learning)   \n[Awesome Fraud Detection](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fawesome-fraud-detection-papers)   \n[Awesome GAN Applications](https:\u002F\u002Fgithub.com\u002Fnashory\u002Fgans-awesome-applications)   \n[Awesome Graph Classification](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fawesome-graph-classification)   \n[Awesome Industry Machine Learning](https:\u002F\u002Fgithub.com\u002Ffirmai\u002Findustry-machine-learning)  \n[Awesome Gradient Boosting](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fawesome-gradient-boosting-papers)   \n[Awesome Learning with Label Noise](https:\u002F\u002Fgithub.com\u002Fsubeeshvasu\u002FAwesome-Learning-with-Label-Noise)  \n[Awesome Machine Learning](https:\u002F\u002Fgithub.com\u002Fjosephmisiti\u002Fawesome-machine-learning#python)    \n[Awesome Machine Learning Books](http:\u002F\u002Fmatpalm.com\u002Fblog\u002Fcool_machine_learning_books\u002F)  \n[Awesome Machine Learning Interpretability](https:\u002F\u002Fgithub.com\u002Fjphall663\u002Fawesome-machine-learning-interpretability)     \n[Awesome Machine Learning Operations](https:\u002F\u002Fgithub.com\u002FEthicalML\u002Fawesome-machine-learning-operations)   \n[Awesome Monte Carlo Tree Search](https:\u002F\u002Fgithub.com\u002Fbenedekrozemberczki\u002Fawesome-monte-carlo-tree-search-papers)   \n[Awesome MLOps](https:\u002F\u002Fgithub.com\u002Fkelvins\u002Fawesome-mlops)  \n[Awesome Neural Network Visualization](https:\u002F\u002Fgithub.com\u002Fashishpatel26\u002FTools-to-Design-or-Visualize-Architecture-of-Neural-Network)  \n[Awesome Online Machine Learning](https:\u002F\u002Fgithub.com\u002FMaxHalford\u002Fawesome-online-machine-learning)  \n[Awesome Pipeline](https:\u002F\u002Fgithub.com\u002Fpditommaso\u002Fawesome-pipeline)  \n[Awesome Public 
APIs](https:\u002F\u002Fgithub.com\u002Fpublic-apis\u002Fpublic-apis)  \n[Awesome Public Datasets](https:\u002F\u002Fgithub.com\u002Fawesomedata\u002Fawesome-public-datasets)  \n[Awesome Python](https:\u002F\u002Fgithub.com\u002Fvinta\u002Fawesome-python)   \n[Awesome Python Data Science](https:\u002F\u002Fgithub.com\u002Fkrzjoa\u002Fawesome-python-datascience)   \n[Awesome Python Data Science](https:\u002F\u002Fgithub.com\u002Fthomasjpfan\u002Fawesome-python-data-science)  \n[Awesome Pytorch](https:\u002F\u002Fgithub.com\u002Fbharathgs\u002FAwesome-pytorch-list)  \n[Awesome Quantitative Finance](https:\u002F\u002Fgithub.com\u002Fwilsonfreitas\u002Fawesome-quant)  \n[Awesome Recommender Systems](https:\u002F\u002Fgithub.com\u002Fgrahamjenson\u002Flist_of_recommender_systems)  \n[Awesome Satellite Benchmark Datasets](https:\u002F\u002Fgithub.com\u002FSeyed-Ali-Ahmadi\u002FAwesome_Satellite_Benchmark_Datasets)  \n[Awesome Satellite Image for Deep Learning](https:\u002F\u002Fgithub.com\u002Fsatellite-image-deep-learning\u002Ftechniques)  \n[Awesome Single Cell](https:\u002F\u002Fgithub.com\u002Fseandavi\u002Fawesome-single-cell)  \n[Awesome Semantic Segmentation](https:\u002F\u002Fgithub.com\u002Fmrgloom\u002Fawesome-semantic-segmentation)  \n[Awesome Sentence Embedding](https:\u002F\u002Fgithub.com\u002FSeparius\u002Fawesome-sentence-embedding)  \n[Awesome Visual Attentions](https:\u002F\u002Fgithub.com\u002FMenghaoGuo\u002FAwesome-Vision-Attentions)  \n[Awesome Visual Transformer](https:\u002F\u002Fgithub.com\u002Fdk-liang\u002FAwesome-Visual-Transformer)  \n\n#### 讲座\n[NYU深度学习SP21](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLLHTzKZzVU9e6xUfG10TkTWApKSZCzuBI) - YouTube播放列表。   \n\n#### 我经常谷歌的东西\n[颜色代码](https:\u002F\u002Fgithub.com\u002Fd3\u002Fd3-3.x-api-reference\u002Fblob\u002Fmaster\u002FOrdinal-Scales.md#categorical-colors)  \n[时间序列频率代码](https:\u002F\u002Fpandas.pydata.org\u002Fpandas-docs\u002Fstable\u002Ftimeseries.html#offset-aliases)  
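\n[Date parsing codes](https:\u002F\u002Fdocs.python.org\u002F3\u002Flibrary\u002Fdatetime.html#strftime-and-strptime-behavior)  \n\n\n\n## Contributing  \nDo you know a package that should be on this list? Did you spot a package that is no longer maintained and should be removed? Then please read the [contribution guidelines](CONTRIBUTING.md) and submit a pull request or open a new issue.  \n\n## License\n[![CC0](http:\u002F\u002Fmirrors.creativecommons.org\u002Fpresskit\u002Fbuttons\u002F88x31\u002Fsvg\u002Fcc-zero.svg)](https:\u002F\u002Fcreativecommons.org\u002Fpublicdomain\u002Fzero\u002F1.0\u002F)","# datascience Quick Start Guide\n\n`datascience` is not a single software package but a curated **list of Python data science resources**. This guide helps you set up a modern Python data science environment based on the list's core recommendations and covers basic usage of the core libraries.\n\n## Prerequisites\n\nBefore you start, make sure your system meets the following requirements:\n\n*   **Operating system**: Windows, macOS, or Linux.\n*   **Python version**: **Python 3.9 - 3.12** is recommended (avoid very old releases and the newest pre-releases to keep libraries compatible).\n*   **Package manager**: [`uv`](https:\u002F\u002Fgithub.com\u002Fastral-sh\u002Fuv) or `pip` is recommended for dependency management. `uv` is very fast and a popular choice for modern Python projects.\n*   **Editor**: VS Code (with the `rainbow-csv` extension) or Jupyter Lab is recommended.\n\n> **Faster downloads in mainland China**:\n> In mainland China, a domestic mirror is recommended to speed up package downloads.\n> *   **Mirrors for uv\u002Fpip**: the Tsinghua University open source mirror (`https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`) or the Alibaba Cloud mirror (`https:\u002F\u002Fmirrors.aliyun.com\u002Fpypi\u002Fsimple\u002F`).\n\n## Installation\n\n### Option A: uv (recommended, fastest)\n\n`uv` is a very fast Python package installer and project manager written in Rust.\n\n1.  **Install uv** (macOS\u002FLinux):\n    ```bash\n    curl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n    ```\n    *(Windows PowerShell: `powershell -c \"irm https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.ps1 | iex\"`)*\n\n2.  **Initialize a project and install the core libraries**:\n    Create a new project and install the core components from the list (pandas, scikit-learn, matplotlib, seaborn, and others).\n    ```bash\n    uv init my-ds-project\n    cd my-ds-project\n    # Create the .venv that `uv pip install` installs into\n    uv venv\n    \n    # Install the core data science stack from the Tsinghua mirror\n    uv pip install pandas scikit-learn matplotlib seaborn ydata-profiling missingno tqdm \\\n      --index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n    ```\n\n3.  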
**Install optional extras** (as needed):\n    ```bash\n    # High-performance DataFrames (Polars) and SQL support (DuckDB)\n    uv pip install polars duckdb --index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n    \n    # Interactive visualization\n    uv pip install pygwalker --index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n    ```\n\n### Option B: pip (traditional)\n\nIf you prefer plain pip, you can pass a mirror for a single command to speed up downloads.\n\n```bash\npython -m venv venv\nsource venv\u002Fbin\u002Factivate  # Windows: venv\\Scripts\\activate\n\n# Install the core libraries\npip install pandas scikit-learn matplotlib seaborn ydata-profiling missingno tqdm \\\n  -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n\n# Install the high-performance alternatives\npip install polars duckdb pygwalker -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## Basic Usage\n\nThe minimal examples below use the core libraries from the list and cover data loading, exploration, processing, and visualization.\n\n### 1. Data loading and descriptive statistics\nRead data with `pandas`, then use `ydata-profiling` to generate a detailed analysis report in one call.\n\n```python\nimport pandas as pd\nfrom ydata_profiling import ProfileReport\n\n# Load data (the seaborn tips sample dataset, fetched over HTTP)\ndf = pd.read_csv(\"https:\u002F\u002Fraw.githubusercontent.com\u002Fmwaskom\u002Fseaborn-data\u002Fmaster\u002Ftips.csv\")\n\n# Option A: classic pandas overview\nprint(df.head())\nprint(df.describe())\n\n# Option B: build an interactive profiling report (saves a lot of manual checking)\nprofile = ProfileReport(df, title=\"Pandas Profiling Report\", minimal=True)\n# Show inline in Jupyter: profile.to_notebook_iframe()\n# Or save as HTML: profile.to_file(\"report.html\")\n```\n\n### 2. Visualizing missing values\nUse `missingno` to spot missing-data patterns quickly.\n\n```python\nimport missingno as msno\nimport matplotlib.pyplot as plt\n\n# Draw the missing-value matrix plot\nmsno.matrix(df)\nplt.show()\n```\n\n### 3. High-performance data processing (Polars example)\nFor larger datasets, the `polars` library recommended in the list can replace pandas and take advantage of multithreading.\n\n```python\nimport polars as pl\n\n# Polars syntax resembles pandas and is often faster\ndf_pl = pl.read_csv(\"https:\u002F\u002Fraw.githubusercontent.com\u002Fmwaskom\u002Fseaborn-data\u002Fmaster\u002Ftips.csv\")\n\n# Run a quick aggregation\nresult = df_pl.group_by(\"day\").agg(\n    pl.col(\"total_bill\").mean().alias(\"avg_bill\"),\n    pl.col(\"tip\").sum().alias(\"total_tip\")\n)\nprint(result)\n```\n\n### 4. 
交互式可视化\n使用 `pygwalker` 在 Jupyter 中获得类似 Tableau 的拖拽式分析体验。\n\n```python\nimport pygwalker as pyg\nimport pandas as pd\n\ndf = pd.read_csv(\"https:\u002F\u002Fraw.githubusercontent.com\u002Fmwaskom\u002Fseaborn-data\u002Fmaster\u002Ftips.csv\")\n\n# 在 Notebook 中启动交互式界面\ngwalker = pyg.walk(df)\n```\n\n### 5. 进度条监控\n在处理大型循环或 `apply` 操作时，使用 `tqdm` 监控进度。\n\n```python\nfrom tqdm import tqdm\nimport pandas as pd\nimport time\n\ntqdm.pandas() # 启用 pandas 集成\n\n# 模拟耗时操作并显示进度条\ndf['processed'] = df['total_bill'].progress_apply(lambda x: (time.sleep(0.01), x * 1.1)[1])\n```","某电商数据分析师需要在周五下班前，从千万级用户行为日志中快速挖掘促销活动的转化规律并产出可视化报告。\n\n### 没有 datascience 时\n- 面对海量 CSV 文件，手动编写低效的 Pandas 循环代码，处理一次数据需等待数十分钟，且无法利用多核性能。\n- 缺失值分布难以直观判断，只能靠打印统计数字盲猜，导致后续模型训练频繁报错或偏差大。\n- 临时需要 SQL 聚合分析时，必须先将数据导入数据库，流程割裂且环境配置繁琐。\n- 生成的静态图表缺乏交互性，业务方无法自行下钻查看细节，反复沟通修改耗费大量时间。\n- 依赖管理混乱，不同项目的库版本冲突频发，复现同事的分析代码常常失败。\n\n### 使用 datascience 后\n- 直接采纳清单中的 Polars 或 Modin 替代方案，利用多线程将数据处理速度提升十倍，几分钟内完成清洗。\n- 引入 ydata-profiling 和 missingno，一键生成包含缺失值热力图的详细报告，瞬间定位数据质量瓶颈。\n- 借助 DuckDB 直接在 DataFrame 上运行高效 SQL 查询，无需迁移数据即可实现复杂聚合分析。\n- 集成 Pygwalker 或 Marimo 构建交互式看板，业务人员可自主拖拽筛选数据，自助探索洞察。\n- 参考 uv 和 PDM 的最佳实践统一依赖管理，确保团队环境一致，代码在任何机器上均可无缝复现。\n\ndatascience 通过提供经过验证的工具链组合，将数据科学家从繁琐的环境搭建与低效编码中解放出来，使其专注于核心业务价值的挖掘。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fr0f1_datascience_63a8e5b4.png","r0f1","Florian Rohrer","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fr0f1_2d9a7596.jpg",null,"Vienna, Austria","https:\u002F\u002Fr0f1.github.io\u002F","https:\u002F\u002Fgithub.com\u002Fr0f1",4601,709,"2026-04-06T12:55:56","CC0-1.0",1,"Linux, macOS, Windows","非必需。部分库（如 cuDF, cupy, NVTabular）需要 NVIDIA GPU 及 CUDA 支持；mlx 库专为 Apple Silicon 设计。","未说明（取决于具体使用的数据集大小及是否使用 out-of-core 库如 Vaex, Dask）",{"notes":88,"python":89,"dependencies":90},"这是一个数据科学资源清单而非单一软件工具。环境需求高度依赖于所选用的具体库：处理大数据建议使用 Dask\u002FSpark\u002FRay；GPU 加速需安装 RAPIDS (cuDF\u002FcuPy)；Apple M 系列芯片可使用 MLX；部分功能需要 VSCode 插件或独立的 IDE（如 Positron）。建议使用虚拟环境按需安装相关库。","未说明（通常为 
3.8+，需兼容列出的主要数据科学库）",[91,92,93,94,95,96,97,98,99,100],"pandas","numpy","scikit-learn","matplotlib","seaborn","polars","duckdb","dask","ray","statsmodels",[102,14,16],"其他",[104,105,106,64,107,108,109,110,111,112,113,114,115,116],"awesome","awesome-list","data-science","python","deep-learning","data-analysis","data-visualization","data-mining","machine-learning","artificial-intelligence","deeplearning","statistics","bayes","2026-03-27T02:49:30.150509","2026-04-07T13:27:26.439509",[],[]]