[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-WenjieDu--SAITS":3,"tool-WenjieDu--SAITS":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",156804,2,"2026-04-15T11:34:33",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":78,"owner_email":79,"owner_twitter":80,"owner_website":81,"owner_url":82,"languages":83,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":32,"env_os":95,"env_gpu":96,"env_ram":95,"env_deps":97,"category_tags":107,"github_topics":108,"view_count":32,"oss_zip_url":129,"oss_zip_packed_at":129,"status":17,"created_at":130,"updated_at":131,"faqs":132,"releases":161},7737,"WenjieDu\u002FSAITS","SAITS","The official PyTorch implementation of the paper \"SAITS: Self-Attention-based Imputation for Time Series\". A fast and state-of-the-art (SOTA) deep-learning neural network model for efficient time-series imputation (impute multivariate incomplete time series containing NaN missing data\u002Fvalues with machine learning). 
https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.08516","SAITS 是一款基于 PyTorch 开发的开源深度学习模型，专为高效处理多变量时间序列数据中的缺失值而设计。在物联网传感器、金融分析及医疗监测等场景中，数据因设备故障或传输错误出现中断是常见难题，传统插补方法往往难以捕捉复杂的时间依赖关系。SAITS 通过引入纯粹的自注意力机制（Self-Attention），摒弃了传统的递归结构，能够直接从全局视角理解数据特征，从而精准地填补包含 NaN 缺失值的时序数据。\n\n作为该领域的标杆性成果，SAITS 不仅推理速度快，更在多项基准测试中达到了业界领先（SOTA）的精度。其核心亮点在于创新的“双重注意力”架构与联合训练策略，使其能同时关注时间步之间的关联以及不同变量间的相互作用。此外，相关团队还将 SAITS 的核心嵌入与训练技巧迁移至 iTransformer、PatchTST 等多种主流架构中，进一步验证了其技术的通用性与有效性。\n\n这款工具非常适合人工智能研究人员、数据科学家及后端开发者使用。无论是需要复现前沿算法的学术探索者，还是致力于构建高可靠性数据管道的工程人员，都能利用 SAITS 轻松解决数据不完整带来的分析障碍，为后续的","SAITS 是一款基于 PyTorch 开发的开源深度学习模型，专为高效处理多变量时间序列数据中的缺失值而设计。在物联网传感器、金融分析及医疗监测等场景中，数据因设备故障或传输错误出现中断是常见难题，传统插补方法往往难以捕捉复杂的时间依赖关系。SAITS 通过引入纯粹的自注意力机制（Self-Attention），摒弃了传统的递归结构，能够直接从全局视角理解数据特征，从而精准地填补包含 NaN 缺失值的时序数据。\n\n作为该领域的标杆性成果，SAITS 不仅推理速度快，更在多项基准测试中达到了业界领先（SOTA）的精度。其核心亮点在于创新的“双重注意力”架构与联合训练策略，使其能同时关注时间步之间的关联以及不同变量间的相互作用。此外，相关团队还将 SAITS 的核心嵌入与训练技巧迁移至 iTransformer、PatchTST 等多种主流架构中，进一步验证了其技术的通用性与有效性。\n\n这款工具非常适合人工智能研究人员、数据科学家及后端开发者使用。无论是需要复现前沿算法的学术探索者，还是致力于构建高可靠性数据管道的工程人员，都能利用 SAITS 轻松解决数据不完整带来的分析障碍，为后续的预测与决策提供高质量的数据基础。","\n> [!TIP]\n> **[Updates in May 2025]** 🎉 Our survey paper [Deep Learning for Multivariate Time Series Imputation: A Survey](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.04059) gets accepted by IJCAI 2025!\nWe comprehensively review the literature of the state-of-the-art deep-learning imputation methods for time series, provide a new taxonomy based on uncertainty and model architecture for them, systematically compare multiple toolkits, and discuss the challenges and future directions in this field.\n> \n> **[Updates in Jun 2024]** 😎 The 1st comprehensive time-seres imputation benchmark paper\n[TSI-Bench: Benchmarking Time Series Imputation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.12747) is now publicly available.\nThe code is open source in the repo [Awesome_Imputation](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FAwesome_Imputation).\nWith nearly 35,000 experiments, we provide a 
comprehensive benchmarking study on 28 imputation methods, 3 missing patterns (points, sequences, blocks),\nvarious missing rates, and 8 real-world datasets.\n> \n> **[Updates in May 2024]** 🔥 We applied SAITS embedding and training strategies to **iTransformer, FiLM, FreTS, Crossformer, PatchTST, DLinear, ETSformer, FEDformer, \n> Informer, Autoformer, Non-stationary Transformer, Pyraformer, Reformer, SCINet, RevIN, Koopa, MICN, TiDE, and StemGNN** in \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FPyPOTS\">\u003Cimg src=\"https:\u002F\u002Fpypots.com\u002Ffigs\u002Fpypots_logos\u002FPyPOTS\u002Flogo_FFBG.svg\" width=\"26px\" align=\"center\"\u002F> PyPOTS\u003C\u002Fa>\n> to enable them to be applicable to the time-series imputation task.\n\n\n\u003Cp align=\"center\">\n    \u003Ca id=\"SAITS\" href=\"#SAITS\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWenjieDu_SAITS_readme_fa765987f037.jpg\" alt=\"SAITS Title\" title=\"SAITS Title\" width=\"80%\"\u002F>\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-v3-E97040?logo=python&logoColor=white\" \u002F>\n    \u003Cimg alt=\"powered by PyTorch\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPyTorch-❤️-F8C6B5?logo=pytorch&logoColor=white\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fblob\u002Fmain\u002FLICENSE\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-3C7699?logo=opensourceinitiative&logoColor=white\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fdoi.org\u002F10.1016\u002Fj.eswa.2023.119619\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FESWA-published-75C1C4?logo=elsevier&color=FF6C00\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca 
href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?view_op=view_citation&hl=en&user=j9qvUg0AAAAJ&citation_for_view=j9qvUg0AAAAJ:Y0pCki6q_DkC\" title=\"Paper citation number from Google Scholar\">\n        \u003Cimg src=\"https:\u002F\u002Fpypots.com\u002Ffigs\u002Fcitation_badges\u002Fsaits.svg\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fwebofscience.clarivate.cn\u002Fwos\u002Fwoscc\u002Ffull-record\u002FWOS:000943170100001?SID=USW2EC0D82x89d30RifxLVxJpho5Y\" title=\"This is a Highly Cited Paper recognized by ESI\">\n        \u003Cimg src=\"https:\u002F\u002Fpypots.com\u002Ffigs\u002Fcitation_badges\u002FESI_highly_cited_paper.svg\" \u002F>\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\n**‼️Kind reminder: This document can \u003Cins>help you solve many common questions\u003C\u002Fins>, please read it before you run the code.**\n\nThe official code repository is for the paper [SAITS: Self-Attention-based Imputation for Time Series](https:\u002F\u002Fdoi.org\u002F10.1016\u002Fj.eswa.2023.119619) \n(preprint on arXiv is [here](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.08516)), which has been accepted by the journal\n*[Expert Systems with Applications (ESWA)](https:\u002F\u002Fwww.sciencedirect.com\u002Fjournal\u002Fexpert-systems-with-applications)*\n[2022 IF 8.665, CiteScore 12.2, JCR-Q1, CAS-Q1, CCF-C]. 
You may never have heard of ESWA, \nbut it was ranked 1st in Google Scholar under the top publications of Artificial Intelligence in 2016 \n([info source](https:\u002F\u002Fwww.sciencedirect.com\u002Fjournal\u002Fexpert-systems-with-applications\u002Fabout\u002Fnews#expert-systems-with-applications-is-currently-ranked-no1-in)), and is still the top-ranked AI journal according to Google Scholar metrics \n([here is the current ranking list](https:\u002F\u002Fscholar.google.com\u002Fcitations?view_op=top_venues&hl=en&vq=eng_artificialintelligence) FYI).\n\nSAITS is the first work applying pure self-attention without any recursive design in the algorithm for general time series imputation.\nYou can treat it as a validated framework for time series imputation: we have integrated more than 2️⃣0️⃣ forecasting models into PyPOTS by adapting the SAITS framework.\nMore generally, you can use it for sequence imputation. The code here is open source under the MIT license, \nso you're welcome to modify the SAITS code for your own research purposes and domain applications.\nOf course, it will probably need some modification in the model structure or loss functions for specific scenarios or data input.\nHere is [an incomplete list](https:\u002F\u002Fscholar.google.com\u002Fscholar?hl=en&as_sdt=0%2C5&as_ylo=2022&q=%E2%80%9CSAITS%E2%80%9D+%22time+series%22) of scientific research referencing SAITS. \n\n🤗 Please [cite SAITS](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS#-citing-saits) in your publications if it helps with your work.\nPlease star🌟 this repo to help others notice SAITS if you think it is useful. \nIt really means a lot to our open-source research. Thank you! 
\nBTW, you may also like \n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FPyPOTS\">\nPyPOTS \u003Cimg align=\"center\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FWenjieDu\u002FPyPOTS?style=social\">\n\u003C\u002Fa>\nfor easily modeling your partially-observed time-series datasets.\n\n> [!IMPORTANT]\n> \u003Ca href='https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FPyPOTS'>\u003Cimg src='https:\u002F\u002Fpypots.com\u002Ffigs\u002Fpypots_logos\u002FPyPOTS\u002Flogo_FFBG.svg' width='130' align='right' \u002F>\u003C\u002Fa>\n> **📣 Attention please:**\n> \n> SAITS now is available in [PyPOTS](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FPyPOTS), a Python toolbox for data mining on POTS (Partially-Observed Time Series). \n> An example of training SAITS for imputing dataset PhysioNet-2012 is shown below. With [PyPOTS](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FPyPOTS), easy peasy! 😉 \n\n\u003Cdetails open>\n  \u003Csummary>\u003Cb>👉 Click here to see the example 👀\u003C\u002Fb>\u003C\u002Fsummary>\n\n``` python\nimport numpy as np\nfrom sklearn.preprocessing import StandardScaler\nfrom pygrinder import mcar, calc_missing_rate\nfrom benchpots.datasets import preprocess_physionet2012\ndata = preprocess_physionet2012(subset='set-a',rate=0.1) # Our ecosystem libs will automatically download and extract it\ntrain_X, val_X, test_X = data[\"train_X\"], data[\"val_X\"], data[\"test_X\"]\nprint(train_X.shape)  # (n_samples, n_steps, n_features)\nprint(val_X.shape)  # samples (n_samples) in train set and val set are different, but they have the same sequence len (n_steps) and feature dim (n_features)\nprint(f\"We have {calc_missing_rate(train_X):.1%} values missing in train_X\")  \ntrain_set = {\"X\": train_X}  # in training set, simply put the incomplete time series into it\nval_set = {\n    \"X\": val_X,\n    \"X_ori\": data[\"val_X_ori\"],  # in validation set, we need ground truth for evaluation and picking the best model 
checkpoint\n}\ntest_set = {\"X\": test_X}  # in test set, only give the testing incomplete time series for model to impute\ntest_X_ori = data[\"test_X_ori\"]  # test_X_ori bears ground truth for evaluation\nindicating_mask = np.isnan(test_X) ^ np.isnan(test_X_ori)  # mask indicates the values that are missing in X but not in X_ori, i.e. where the gt values are \n\nfrom pypots.imputation import SAITS  # import the model you want to use\nfrom pypots.nn.functional import calc_mae\nsaits = SAITS(n_steps=train_X.shape[1], n_features=train_X.shape[2], n_layers=2, d_model=256, n_heads=4, d_k=64, d_v=64, d_ffn=128, dropout=0.1, epochs=5)\nsaits.fit(train_set, val_set)  # train the model on the dataset\nimputation = saits.impute(test_set)  # impute the originally-missing values and artificially-missing values\nmae = calc_mae(imputation, np.nan_to_num(test_X_ori), indicating_mask)  # calculate mean absolute error on the ground truth (artificially-missing values)\nsaits.save(\"save_it_here\u002Fsaits_physionet2012.pypots\")  # save the model for future use\nsaits.load(\"save_it_here\u002Fsaits_physionet2012.pypots\")  # reload the serialized model file for following imputation or training\n```\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fpypots.com\u002Fecosystem\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWenjieDu_SAITS_readme_cff90719f8b8.png\" width=\"95%\"\u002F>\n\u003C\u002Fa>\n\u003Cbr>\n\u003Cb> ☕️ Welcome to the universe of PyPOTS. 
Enjoy it and have fun!\u003C\u002Fb>\n\u003C\u002Fp>\n\n\u003C\u002Fdetails>\n\n\n## ❖ Motivation and Performance\n⦿ **`Motivation`**: SAITS is developed primarily to help overcome the drawbacks (slow speed, memory constraints, and compounding error)\nof RNN-based imputation models and to obtain the state-of-the-art (SOTA) imputation accuracy on partially-observed time series.\n\n⦿ **`Performance`**: SAITS outperforms [BRITS](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F2018\u002Fhash\u002F734e6bfcd358e25ac1db0a4241b95651-Abstract.html)\nby **12% ∼ 38%** in MAE (mean absolute error) and achieves **2.0 ∼ 2.6** times faster training speed.\nFurthermore, SAITS outperforms Transformer (trained by our joint-optimization approach) by **2% ∼ 19%** in MAE with a\nmore efficient model structure (to obtain comparable performance, SAITS needs only **15% ∼ 30%** parameters of Transformer).\nCompared to another SOTA self-attention imputation model [NRTSI](https:\u002F\u002Fgithub.com\u002Flupalab\u002FNRTSI), SAITS achieves\n**7% ∼ 39%** smaller mean squared error (\u003Cins>above 20% in nine out of sixteen cases\u003C\u002Fins>), meanwhile, needs much\nfewer parameters and less imputation time in practice. \nPlease refer to our [full paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2202.08516.pdf) for more details about SAITS' performance.\n\n\n## ❖ Brief Graphical Illustration of Our Methodology\nHere we only show the two main components of our method: the joint-optimization training approach and SAITS structure.\nFor the detailed description and explanation, please read our full paper `Paper_SAITS.pdf` in this repo \nor [on arXiv](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2202.08516.pdf).\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FWenjieDu\u002FSAITS\u002Fmain\u002Ffigs\u002FTraining approach.svg?sanitize=true\" alt=\"Training approach\" title=\"Training approach\" width=\"800\"\u002F>\n\n\u003Cb>Fig. 
1: Training approach\u003C\u002Fb>\n\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FWenjieDu\u002FSAITS\u002Fmain\u002Ffigs\u002FSAITS arch.svg?sanitize=true\" alt=\"SAITS architecture\" title=\"SAITS architecture\" width=\"600\"\u002F>\n\n\u003Cb>Fig. 2: SAITS structure\u003C\u002Fb>\n\n\u003C\u002Fdiv>\n\n\n## ❖ Citing SAITS\nIf you find SAITS is helpful to your work, please cite our paper as below, \n⭐️star this repository, and recommend it to others who you think may need it. 🤗 Thank you!\n\n```bibtex\n@article{du2023saits,\ntitle = {{SAITS: Self-Attention-based Imputation for Time Series}},\njournal = {Expert Systems with Applications},\nvolume = {219},\npages = {119619},\nyear = {2023},\nissn = {0957-4174},\ndoi = {10.1016\u002Fj.eswa.2023.119619},\nurl = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.08516},\nauthor = {Wenjie Du and David Cote and Yan Liu},\n}\n```\nor\n> Wenjie Du, David Cote, and Yan Liu.\n> SAITS: Self-Attention-based Imputation for Time Series.\n> Expert Systems with Applications, 219:119619, 2023.\n\n### 😎 Our latest survey and benchmarking research on time-series imputation may also be useful to your work:\n```bibtex\n@article{wang2025survey,\ntitle={Deep Learning for Multivariate Time Series Imputation: A Survey},\nauthor={Jun Wang and Wenjie Du and Yiyuan Yang and Linglong Qian and Wei Cao and Keli Zhang and Wenjia Wang and Yuxuan Liang and Qingsong Wen},\nbooktitle = {Proceedings of the 34th International Joint Conference on Artificial Intelligence (IJCAI)},\nyear={2025}\n}\n```\n\n```bibtex\n@article{du2024tsibench,\ntitle={TSI-Bench: Benchmarking Time Series Imputation},\nauthor={Wenjie Du and Jun Wang and Linglong Qian and Yiyuan Yang and Fanxing Liu and Zepu Wang and Zina Ibrahim and Haoxin Liu and Zhiyuan Zhao and Yingjie Zhou and Wenjia Wang and Kaize Ding and Yuxuan Liang and B. 
Aditya Prakash and Qingsong Wen},\njournal={arXiv preprint arXiv:2406.12747},\nyear={2024}\n}\n```\n\n### 🔥 In case you use PyPOTS in your research, please also cite the following paper:\n\n``` bibtex\n@article{du2023pypots,\ntitle={{PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series}},\nauthor={Wenjie Du},\njournal={arXiv preprint arXiv:2305.18811},\nyear={2023},\n}\n```\nor\n> Wenjie Du.\n> PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series.\n> arXiv, abs\u002F2305.18811, 2023.\n\n\n## ❖ Repository Structure\nThe implementation of SAITS is in dir [`modeling`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fblob\u002Fmain\u002Fmodeling\u002FSA_models.py).\nWe give configurations of our models in dir [`configs`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Ftree\u002Fmain\u002Fconfigs), provide\nthe dataset links and preprocessing scripts in dir [`dataset_generating_scripts`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Ftree\u002Fmain\u002Fdataset_generating_scripts).\nDir [`NNI_tuning`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Ftree\u002Fmain\u002FNNI_tuning) contains the hyper-parameter searching configurations.\n\n\n## ❖ Development Environment\nAll dependencies of our development environment are listed in file [`conda_env_dependencies.yml`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fblob\u002Fmain\u002Fconda_env_dependencies.yml).\nYou can quickly create a usable python environment with an anaconda command `conda env create -f conda_env_dependencies.yml`.\n\n\n## ❖ Datasets\nFor datasets downloading and generating, please check out the scripts in \ndir [`dataset_generating_scripts`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Ftree\u002Fmain\u002Fdataset_generating_scripts).\n\n\n## ❖ Quick Run\nGenerate the dataset you need first. 
To do so, please check out the generating scripts in \ndir [`dataset_generating_scripts`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Ftree\u002Fmain\u002Fdataset_generating_scripts).\n\nAfter data generation, train and test your model, for example,\n\n```shell\n# create a dir to save logs and results\nmkdir NIPS_results\n\n# train a model\nnohup python run_models.py \\\n    --config_path configs\u002FPhysioNet2012_SAITS_best.ini \\\n    > NIPS_results\u002FPhysioNet2012_SAITS_best.out &\n\n# during training, you can run the below command to read the training log\nless NIPS_results\u002FPhysioNet2012_SAITS_best.out\n\n# after training, pick the best model and modify the path of the model for testing in the config file, then run the below command to test the model\npython run_models.py \\\n    --config_path configs\u002FPhysioNet2012_SAITS_best.ini \\\n    --test_mode\n```\n\n❗️Note that the paths of datasets and saving dirs may differ on your machine; please check them in the configuration files.\n\n\n## ❖ Acknowledgments\nThanks to Ciena, Mitacs, and NSERC (Natural Sciences and Engineering Research Council of Canada) for funding support.  \nThanks to all our reviewers for helping improve the quality of this paper.  \nThanks to Ciena for providing computing resources.  
\nAnd thank you all for your attention to this work.\n\n\n### ✨Stars\u002Fforks\u002Fissues\u002FPRs are all welcome!\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>\u003Ci>👏 Click to View Stargazers and Forkers: \u003C\u002Fi>\u003C\u002Fb>\u003C\u002Fsummary>\n\n[![Stargazers repo roster for @WenjieDu\u002FSAITS](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWenjieDu_SAITS_readme_23c8fb8c4ce5.png)](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fstargazers)\n\n[![Forkers repo roster for @WenjieDu\u002FSAITS](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWenjieDu_SAITS_readme_4fd81d0772a7.png)](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fnetwork\u002Fmembers)\n\u003C\u002Fdetails>\n\n\n## ❖ Last but Not Least\nIf you have any additional questions or have interests in collaboration, \nplease take a look at [my GitHub profile](https:\u002F\u002Fgithub.com\u002FWenjieDu) and feel free to contact me 😃.","> [!TIP]\n> **[2025年5月更新]** 🎉 我们的综述论文《用于多变量时间序列插补的深度学习：综述》（Deep Learning for Multivariate Time Series Imputation: A Survey）已被IJCAI 2025接收！\n我们全面回顾了当前最先进的基于深度学习的时间序列插补方法的相关文献，提出了一个基于不确定性与模型架构的新分类体系，系统性地比较了多个工具包，并探讨了该领域的挑战与未来发展方向。\n> \n> **[2024年6月更新]** 😎 第一篇全面的时间序列插补基准研究论文\n《TSI-Bench：时间序列插补基准测试》（TSI-Bench: Benchmarking Time Series Imputation）现已公开发布。\n代码已在仓库[Awesome_Imputation](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FAwesome_Imputation)中开源。\n通过近35,000次实验，我们对28种插补方法、3种缺失模式（点缺失、序列缺失、块缺失）、不同缺失率以及8个真实世界数据集进行了全面的基准测试。\n> \n> **[2024年5月更新]** 🔥 我们将SAITS的嵌入和训练策略应用于\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FPyPOTS\">\u003Cimg src=\"https:\u002F\u002Fpypots.com\u002Ffigs\u002Fpypots_logos\u002FPyPOTS\u002Flogo_FFBG.svg\" width=\"26px\" align=\"center\"\u002F> PyPOTS\u003C\u002Fa>中的**iTransformer、FiLM、FreTS、Crossformer、PatchTST、DLinear、ETSformer、FEDformer、\nInformer、Autoformer、Non-stationary Transformer、Pyraformer、Reformer、SCINet、RevIN、Koopa、MICN、TiDE以及StemGNN**，使它们能够适用于时间序列插补任务。\n\n\n\u003Cp align=\"center\">\n 
   \u003Ca id=\"SAITS\" href=\"#SAITS\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWenjieDu_SAITS_readme_fa765987f037.jpg\" alt=\"SAITS标题\" title=\"SAITS标题\" width=\"80%\"\u002F>\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-v3-E97040?logo=python&logoColor=white\" \u002F>\n    \u003Cimg alt=\"由PyTorch驱动\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPyTorch-❤️-F8C6B5?logo=pytorch&logoColor=white\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fblob\u002Fmain\u002FLICENSE\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-3C7699?logo=opensourceinitiative&logoColor=white\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fdoi.org\u002F10.1016\u002Fj.eswa.2023.119619\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FESWA-已发表-75C1C4?logo=elsevier&color=FF6C00\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?view_op=view_citation&hl=en&user=j9qvUg0AAAAJ&citation_for_view=j9qvUg0AAAAJ:Y0pCki6q_DkC\" title=\"Google Scholar上的论文引用次数\">\n        \u003Cimg src=\"https:\u002F\u002Fpypots.com\u002Ffigs\u002Fcitation_badges\u002Fsaits.svg\" \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fwebofscience.clarivate.cn\u002Fwos\u002Fwoscc\u002Ffull-record\u002FWOS:000943170100001?SID=USW2EC0D82x89d30RifxLVxJpho5Y\" title=\"这是一篇被ESI认定的高度引用论文\">\n        \u003Cimg src=\"https:\u002F\u002Fpypots.com\u002Ffigs\u002Fcitation_badges\u002FESI_highly_cited_paper.svg\" \u002F>\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\n**‼️温馨提示：本文档可以\u003Cins>帮助您解答许多常见问题\u003C\u002Fins>,请在运行代码之前仔细阅读。**\n\n官方代码库对应的是论文《SAITS：基于自注意力机制的时间序列插补》（SAITS: Self-Attention-based Imputation for Time Series），该论文已被期刊*[Expert Systems with Applications 
(ESWA)](https:\u002F\u002Fwww.sciencedirect.com\u002Fjournal\u002Fexpert-systems-with-applications)*接受\n[2022年影响因子8.665，CiteScore 12.2，JCR-Q1，CAS-Q1，CCF-C]。您可能从未听说过ESWA，\n但它曾在2016年被Google Scholar评为人工智能领域顶级出版物中的第一名\n([信息来源](https:\u002F\u002Fwww.sciencedirect.com\u002Fjournal\u002Fexpert-systems-with-applications\u002Fabout\u002Fnews#expert-systems-with-applications-is-currently-ranked-no1-in))，并且至今仍是Google Scholar排名首位的人工智能期刊\n([当前排名列表](https:\u002F\u002Fscholar.google.com\u002Fcitations?view_op=top_venues&hl=en&vq=eng_artificialintelligence)，供您参考)。\n\nSAITS是首个在算法中完全采用自注意力机制、而不使用任何递归设计的通用时间序列插补工作。\n基本上，您可以将其视为一个经过验证的时间序列插补框架，就像我们通过适配SAITS框架，将超过2️⃣0️⃣种预测模型集成到PyPOTS中一样。\n更广泛地说，它也可以用于序列插补。此外，这里的代码以MIT许可证开源。\n因此，欢迎您根据自己的研究目的和领域应用修改SAITS代码。\n当然，在特定场景或数据输入下，可能需要对模型结构或损失函数进行一些调整。\n以下是[一份不完全清单](https:\u002F\u002Fscholar.google.com\u002Fscholar?hl=en&as_sdt=0%2C5&as_ylo=2022&q=%E2%80%9CSAITS%E2%80%9D+%22time+series%22)，展示了科学界在论文中引用SAITS的情况。\n\n🤗 如果SAITS对您的工作有所帮助，请在您的出版物中[引用SAITS](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS#-citing-saits)。\n如果您认为它有用，请为本仓库点亮星标🌟，以便更多人注意到SAITS。\n这对我们的开源研究意义重大。感谢您的支持！\n顺带一提，您可能也会喜欢\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FPyPOTS\">\nPyPOTS \u003Cimg align=\"center\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FWenjieDu\u002FPyPOTS?style=social\">\n\u003C\u002Fa>\n，它可以轻松处理您那些部分观测的时间序列数据集。\n\n> [!IMPORTANT]\n> \u003Ca href='https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FPyPOTS'>\u003Cimg src='https:\u002F\u002Fpypots.com\u002Ffigs\u002Fpypots_logos\u002FPyPOTS\u002Flogo_FFBG.svg' width='130' align='right' \u002F>\u003C\u002Fa>\n> **📣 请注意：**\n> \n> SAITS现已集成到[PYPOTS](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FPyPOTS)中，这是一个用于处理部分观测时间序列（POTS）的数据挖掘Python工具箱。\n下面展示了使用SAITS对PhysioNet-2012数据集进行插补的示例。借助[PYPOTS](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FPyPOTS)，一切都变得轻而易举！😉 \n\n\u003Cdetails open>\n  \u003Csummary>\u003Cb>👉 点击此处查看示例 👀\u003C\u002Fb>\u003C\u002Fsummary>\n\n``` python\nimport 
numpy as np\nfrom sklearn.preprocessing import StandardScaler\nfrom pygrinder import mcar, calc_missing_rate\nfrom benchpots.datasets import preprocess_physionet2012\ndata = preprocess_physionet2012(subset='set-a',rate=0.1) # 我们的生态系统库会自动下载并解压数据集\ntrain_X, val_X, test_X = data[\"train_X\"], data[\"val_X\"], data[\"test_X\"]\nprint(train_X.shape)  # (n_samples, n_steps, n_features)\nprint(val_X.shape)  # 训练集和验证集的样本数不同，但它们具有相同的序列长度(n_steps)和特征维度(n_features)\nprint(f\"训练集中有{calc_missing_rate(train_X):.1%}的值缺失\")  \ntrain_set = {\"X\": train_X}  # 在训练集中，只需将不完整的时序数据放入即可\nval_set = {\n    \"X\": val_X,\n    \"X_ori\": data[\"val_X_ori\"],  # 在验证集中，我们需要真实标签来进行评估和选择最佳模型检查点\n}\ntest_set = {\"X\": test_X}  # 在测试集中，只提供待填补的不完整时序数据\ntest_X_ori = data[\"test_X_ori\"]  # test_X_ori 包含用于评估的真实标签\nindicating_mask = np.isnan(test_X) ^ np.isnan(test_X_ori)  # mask 标记了 X 中缺失但 X_ori 中未缺失的值，即真实标签所在的位置\n\nfrom pypots.imputation import SAITS  # 导入你想要使用的模型\nfrom pypots.nn.functional import calc_mae\nsaits = SAITS(n_steps=train_X.shape[1], n_features=train_X.shape[2], n_layers=2, d_model=256, n_heads=4, d_k=64, d_v=64, d_ffn=128, dropout=0.1, epochs=5)\nsaits.fit(train_set, val_set)  # 在数据集上训练模型\nimputation = saits.impute(test_set)  # 填补原本缺失以及人为添加的缺失值\nmae = calc_mae(imputation, np.nan_to_num(test_X_ori), indicating_mask)  # 在真实标签（人为添加的缺失值）上计算平均绝对误差\nsaits.save(\"save_it_here\u002Fsaits_physionet2012.pypots\")  # 保存模型以备将来使用\nsaits.load(\"save_it_here\u002Fsaits_physionet2012.pypots\")  # 加载已序列化的模型文件，以便后续填补或训练\n```\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fpypots.com\u002Fecosystem\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWenjieDu_SAITS_readme_cff90719f8b8.png\" width=\"95%\"\u002F>\n\u003C\u002Fa>\n\u003Cbr>\n\u003Cb> ☕️ 欢迎来到 PyPOTS 的世界。尽情享受吧！\u003C\u002Fb>\n\u003C\u002Fp>\n\n\u003C\u002Fdetails>\n\n\n\n\n## ❖ 动机与性能\n⦿ **`动机`**: SAITS 主要是为了克服基于 RNN 的填补模型的缺点（速度慢、内存限制和误差累积），并在部分观测的时序数据上达到最先进的填补准确度而开发的。\n\n⦿ **`性能`**: SAITS 在 
MAE（平均绝对误差）方面比 [BRITS](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F2018\u002Fhash\u002F734e6bfcd358e25ac1db0a4241b95651-Abstract.html) 高出 **12% ∼ 38%**，并且训练速度提高了 **2.0 ∼ 2.6** 倍。此外，SAITS 还比采用联合优化方法训练的 Transformer 在 MAE 上高出 **2% ∼ 19%**，同时其模型结构更加高效（为了达到相当的性能，SAITS 所需的参数仅为 Transformer 的 **15% ∼ 30%**）。与另一个 SOTA 自注意力填补模型 [NRTSI](https:\u002F\u002Fgithub.com\u002Flupalab\u002FNRTSI) 相比，SAITS 的均方误差小了 **7% ∼ 39%**（*其中九个案例中的误差减少超过 20%*），而且在实际应用中所需的参数更少，填补时间也更短。有关 SAITS 性能的更多详细信息，请参阅我们的 [完整论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2202.08516.pdf)。\n\n\n## ❖ 方法论简图说明\n这里仅展示我们方法的两个主要组成部分：联合优化训练方法和 SAITS 结构。详细的描述和解释，请阅读本仓库中的完整论文 `Paper_SAITS.pdf` 或者 [arXiv 上的版本](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2202.08516.pdf)。\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FWenjieDu\u002FSAITS\u002Fmain\u002Ffigs\u002FTraining approach.svg?sanitize=true\" alt=\"Training approach\" title=\"Training approach\" width=\"800\"\u002F>\n\n\u003Cb>图1：训练方法\u003C\u002Fb>\n\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FWenjieDu\u002FSAITS\u002Fmain\u002Ffigs\u002FSAITS arch.svg?sanitize=true\" alt=\"SAITS architecture\" title=\"SAITS architecture\" width=\"600\"\u002F>\n\n\u003Cb>图2：SAITS 结构\u003C\u002Fb>\n\n\u003C\u002Fdiv>\n\n\n## ❖ 引用 SAITS\n如果您发现 SAITS 对您的工作有所帮助，请按照以下方式引用我们的论文，⭐️给这个仓库加星，并推荐给可能需要它的人。🤗 谢谢！\n\n```bibtex\n@article{du2023saits,\ntitle = {{SAITS: Self-Attention-based Imputation for Time Series}},\njournal = {Expert Systems with Applications},\nvolume = {219},\npages = {119619},\nyear = {2023},\nissn = {0957-4174},\ndoi = {10.1016\u002Fj.eswa.2023.119619},\nurl = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.08516},\nauthor = {Wenjie Du and David Cote and Yan Liu},\n}\n```\n\n或者\n> Wenjie Du, David Cote 和 Yan Liu.\n> SAITS: 基于自注意力机制的时间序列填补。\n> Expert Systems with Applications, 第219卷，第119619页，2023年。\n\n### 😎 我们关于时间序列填补的最新综述和基准研究也可能对您的工作有所帮助：\n```bibtex\n@article{wang2025survey,\ntitle={Deep Learning for 
Multivariate Time Series Imputation: A Survey},\nauthor={Jun Wang and Wenjie Du and Yiyuan Yang and Linglong Qian and Wei Cao and Keli Zhang and Wenjia Wang and Yuxuan Liang and Qingsong Wen},\nbooktitle = {Proceedings of the 34th International Joint Conference on Artificial Intelligence (IJCAI)},\nyear={2025}\n}\n```\n\n```bibtex\n@article{du2024tsibench,\ntitle={TSI-Bench: Benchmarking Time Series Imputation},\nauthor={Wenjie Du and Jun Wang and Linglong Qian and Yiyuan Yang and Fanxing Liu and Zepu Wang and Zina Ibrahim and Haoxin Liu and Zhiyuan Zhao and Yingjie Zhou and Wenjia Wang and Kaize Ding and Yuxuan Liang and B. Aditya Prakash and Qingsong Wen},\njournal={arXiv preprint arXiv:2406.12747},\nyear={2024}\n}\n```\n\n### 🔥 If you use PyPOTS in your research, please also cite the following paper:\n\n``` bibtex\n@article{du2023pypots,\ntitle={{PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series}},\nauthor={Wenjie Du},\njournal={arXiv preprint arXiv:2305.18811},\nyear={2023},\n}\n```\n\nor\n> Wenjie Du.\n> PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series.\n> arXiv:2305.18811, 2023.\n\n\n## ❖ Repository Structure\nThe implementation of SAITS is in dir [`modeling`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fblob\u002Fmain\u002Fmodeling\u002FSA_models.py). Model configurations are in dir [`configs`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Ftree\u002Fmain\u002Fconfigs), and dataset links and preprocessing scripts are in dir [`dataset_generating_scripts`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Ftree\u002Fmain\u002Fdataset_generating_scripts). Dir [`NNI_tuning`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Ftree\u002Fmain\u002FNNI_tuning) contains the hyper-parameter search configurations.\n\n## ❖ Development Environment\nAll dependencies of our development environment are listed in file [`conda_env_dependencies.yml`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fblob\u002Fmain\u002Fconda_env_dependencies.yml).\nYou can quickly create a usable Python environment with the Anaconda command `conda env create -f conda_env_dependencies.yml`.\n\n\n## ❖ Datasets\nFor dataset downloading and generating, please check out the scripts in dir [`dataset_generating_scripts`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Ftree\u002Fmain\u002Fdataset_generating_scripts).\n\n\n## ❖ Quick Run\nGenerate the dataset you need first. To do so, please check out the generating scripts in dir 
[`dataset_generating_scripts`](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Ftree\u002Fmain\u002Fdataset_generating_scripts).\n\nAfter data generation, you can train and test your model, for example:\n\n```shell\n# create a dir to save logs and results\nmkdir NIPS_results\n\n# train a model\nnohup python run_models.py \\\n    --config_path configs\u002FPhysioNet2012_SAITS_best.ini \\\n    > NIPS_results\u002FPhysioNet2012_SAITS_best.out &\n\n# during training, you can run the command below to read the training log:\nless NIPS_results\u002FPhysioNet2012_SAITS_best.out\n\n# after training, pick the best model, set the model path used for testing in the config file, then run the command below to test the model:\npython run_models.py \\\n    --config_path configs\u002FPhysioNet2012_SAITS_best.ini \\\n    --test_mode\n```\n\n❗️Note that the dataset paths and saving dirs may differ between machines, so please check them in the configuration files.\n\n\n## ❖ Acknowledgments\nThanks to Ciena, Mitacs, and NSERC (Natural Sciences and Engineering Research Council of Canada) for funding support.\nThanks to all reviewers for helping improve the quality of this paper.\nThanks to Ciena for providing computing resources.\nAnd thank you all for your attention to this work.\n\n\n### ✨ Stars, forks, issues, and pull requests are all welcome!\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>\u003Ci>👏 Click to view stargazers and forkers: \u003C\u002Fi>\u003C\u002Fb>\u003C\u002Fsummary>\n\n[![Stargazers of @WenjieDu\u002FSAITS](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWenjieDu_SAITS_readme_23c8fb8c4ce5.png)](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fstargazers)\n\n[![Forkers of @WenjieDu\u002FSAITS](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWenjieDu_SAITS_readme_4fd81d0772a7.png)](https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fnetwork\u002Fmembers)\n\u003C\u002Fdetails>\n\n\n## ❖ Last but Not Least\nIf you have any additional questions or are interested in collaboration,\nplease take a look at my GitHub profile [WenjieDu](https:\u002F\u002Fgithub.com\u002FWenjieDu) and feel free to contact me 😃.","# SAITS Quick Start Guide\n\nSAITS (Self-Attention-based Imputation for Time Series) is a time-series missing-value imputation model built purely on self-attention. It avoids the slow speed, memory constraints, and error accumulation of traditional RNN models, and performs well in both imputation accuracy and training speed.\n\n## Environment Setup\n\nBefore you start, make sure your development environment meets the following requirements:\n\n*   **Operating system**: Linux, macOS, or Windows\n*   **Python version**: >= 3.8 (3.9+ recommended)\n*   **Deep learning framework**: PyTorch (CPU or GPU build)\n*   **Prerequisites**:\n    *   `numpy`\n    *   `scikit-learn`\n    *   `pandas` (optional, for data processing)\n\n> **💡 Faster downloads in mainland China**\n> Installing dependencies from a domestic mirror is recommended to speed up downloads:\n> *   PyPI 
mirror: `https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n> *   PyTorch wheel index (the official index, not a domestic mirror): `https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118` (adjust to your CUDA version)\n\n## Installation\n\nThe core functionality of SAITS has been integrated into **PyPOTS** (Python Toolbox for Partially-Observed Time Series), so installing PyPOTS is the recommended way to use SAITS.\n\n### 1. Install PyTorch\nIf you have not installed PyTorch yet, visit the [PyTorch website](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F) for the install command that fits your environment.\n*(The example below is the CPU build; GPU users should swap in the matching CUDA index.)*\n```bash\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcpu\n```\n\n### 2. Install PyPOTS (includes SAITS)\nInstall with pip (the Tsinghua mirror is recommended):\n```bash\npip install pypots -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\nIf you also need the extra dataset tooling (such as `benchpots` and `pygrinder`, used in the example), install it as well:\n```bash\npip install benchpots pygrinder -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## Basic Usage\n\nBelow is a complete minimal example showing how to use SAITS to impute missing values in the `PhysioNet-2012` dataset. The workflow covers data loading, model initialization, training, imputation, and evaluation.\n\n```python\nimport numpy as np\nfrom pygrinder import calc_missing_rate\nfrom benchpots.datasets import preprocess_physionet2012\n\n# 1. Data preprocessing\n# Download and extract the dataset automatically (subset='set-a', missing rate 0.1)\ndata = preprocess_physionet2012(subset='set-a', rate=0.1)\ntrain_X, val_X, test_X = data[\"train_X\"], data[\"val_X\"], data[\"test_X\"]\n\nprint(f\"train_X shape: {train_X.shape}\")  # (n_samples, n_steps, n_features)\nprint(f\"train_X missing rate: {calc_missing_rate(train_X):.1%}\")\n\n# Build the dataset dicts\ntrain_set = {\"X\": train_X}\nval_set = {\n    \"X\": val_X,\n    \"X_ori\": data[\"val_X_ori\"],  # the validation set needs ground truth to pick the best model\n}\ntest_set = {\"X\": test_X}\ntest_X_ori = data[\"test_X_ori\"]  # test-set ground truth for the final evaluation\n\n# Mask of values that were originally observed but artificially masked in the test set (used to compute the metric)\nindicating_mask = np.isnan(test_X) ^ np.isnan(test_X_ori)\n\n# 2. 
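Model import and initialization\nfrom pypots.imputation import SAITS\nfrom pypots.nn.functional import calc_mae\n\nsaits = SAITS(\n    n_steps=train_X.shape[1],   # number of time steps\n    n_features=train_X.shape[2],# number of features\n    n_layers=2,                 # number of attention layers\n    d_model=256,                # model dimension\n    n_heads=4,                  # number of attention heads\n    d_k=64,                     # key dimension\n    d_v=64,                     # value dimension\n    d_ffn=128,                  # feed-forward network dimension\n    dropout=0.1,                # dropout rate\n    epochs=5                    # number of training epochs\n)\n\n# 3. Train the model\nsaits.fit(train_set, val_set)\n\n# 4. Impute\nimputation = saits.impute(test_set)\n\n# 5. Evaluate (MAE on the artificially masked values)\nmae = calc_mae(imputation, np.nan_to_num(test_X_ori), indicating_mask)\nprint(f\"test MAE: {mae:.4f}\")\n\n# 6. Save and load the model\nsaits.save(\"save_it_here\u002Fsaits_physionet2012.pypots\")\n# saits.load(\"save_it_here\u002Fsaits_physionet2012.pypots\") # reload when needed\n```\n\n### Key Parameters\n*   `n_steps`: number of time steps in the input sequence.\n*   `n_features`: number of features in the input sequence.\n*   `epochs`: number of training epochs; adjust it based on the validation loss.\n*   `d_model`, `n_heads`: control model capacity and can be increased for larger datasets.\n\nYou can now try replacing the `X` entry of `train_set` with your own time-series data (a numpy array or Tensor of shape `(n_samples, n_steps, n_features)`) to start your own imputation task.","A data team at a smart factory is processing real-time monitoring data from hundreds of sensors to predict equipment failures, but the raw data contains many missing values (NaN) caused by network fluctuations.\n\n### Without SAITS\n- **Simple filling distorts the data**: the team has to fill gaps with the mean or linear interpolation, which ignores the complex nonlinear relationships in sensor data and flattens anomalous patterns.\n- **Multivariate relationships are broken**: traditional methods struggle to use related variables such as temperature, vibration, and pressure together, so they cannot infer missing values from other healthy sensors.\n- **Poor model training**: with low-quality input, the downstream failure-prediction model loses accuracy and produces frequent false alarms or misses, and engineers spend a lot of time cleaning data by hand.\n- **Long-range dependencies are lost**: for blocks missing over long time spans, traditional recurrent deep learning models are slow and tend to forget key earlier states.\n\n### With SAITS\n- **Accurate reconstruction with self-attention**: SAITS uses pure self-attention to capture global dependencies across time steps, so it can generate high-fidelity imputations from context even for large contiguous gaps.\n- **Joint multivariate inference**: the model exploits the relationships between variables, for example inferring missing torque data from normal current readings, which markedly improves data completeness.\n- **Better downstream performance**: in this scenario, training directly on the dataset repaired by SAITS raises the failure-prediction model's F1 score by 15% and greatly reduces unnecessary shutdown inspections.\n- **Efficient parallel processing**: thanks to its non-recurrent architecture, SAITS processes months of historical data quickly, cutting a preprocessing pipeline that used to take hours down to minutes.\n\nSAITS 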
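uses deep self-attention to turn incomplete industrial time-series data into a high-quality asset, letting companies draw reliable decision insights directly from \"dirty data\".","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWenjieDu_SAITS_4d8ed26f.png","WenjieDu","Wenjie Du","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FWenjieDu_a86fc77e.jpg","AI Researcher \u003C​Time Series, FDA 510k Regulatory​> More awesome private repos will open source! Follow me to get notified ;-)","@TimeSeries-AI","where time series is observed & valued","wdu@time-series.ai","_W_DU_","https:\u002F\u002FTime-Series.AI","https:\u002F\u002Fgithub.com\u002FWenjieDu",[84,88],{"name":85,"color":86,"percentage":87},"Python","#3572A5",97,{"name":89,"color":90,"percentage":10},"Shell","#89e051",502,70,"2026-04-13T08:11:08","MIT","Not specified","Not specified (built on PyTorch, which typically supports CUDA acceleration, but the README does not state specific GPU models or version requirements)",{"notes":98,"python":99,"dependencies":100},"The tool is now integrated into the PyPOTS library, and calling it through PyPOTS is recommended. The code is built purely on self-attention and aims to address the slow speed and memory constraints of RNN models. For exact dependency versions, refer to the official PyPOTS documentation; the README does not list precise version numbers.","3.x (the badge shows Python v3 without a specific minor version)",[101,102,103,104,105,106],"PyTorch","PyPOTS","numpy","scikit-learn","pygrinder","benchpots",[14,35,16],[109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128],"time-series","imputation-model","missing-values","self-attention","partially-observed-data","partially-observed-time-series","partially-observed","interpolation","time-series-imputation","incomplete-data","incomplete-time-series","imputation","impute","pytorch","transformer","attention","attention-mechanism","irregular-sampling","deep-learning","machine-learning",null,"2026-03-27T02:49:30.150509","2026-04-16T01:45:05.335163",[133,138,143,148,153,157],{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},34643,"How do I fix the 'No option mit in section training' error when running hyper-parameter optimization (NNI tuning)?","This error occurs because the [training] section of the config file (.ini) lacks the 'MIT' and 'ORT' options. The code in run_models.py reads both parameters, but the default example config may not include them. To fix it, manually add these two lines to the [training] section of the corresponding .ini file (for example NNI_tuning\u002FSAITS\u002FSAITS_basic_config.ini):\nMIT = True\nORT = True\nThen rerun the command. In addition, make sure you only modify the parameters listed in the NNI search-space JSON 
file; all other parameters should stay unchanged.","https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fissues\u002F30",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},34644,"What is the consistency loss in the BRITS model? Why does the RITS model not compute it?","BRITS consists of two RITS models. The consistency loss constrains the two RITS models running in opposite directions (forward and backward) to produce consistent estimates for the same missing value, so that bidirectional information improves imputation. A standalone RITS model is unidirectional (forward only or backward only), so there is no model in the other direction to be consistent with, and it neither needs nor computes a consistency loss. Likewise, RITS typically focuses on the reconstruction loss, while BRITS optimizes its bidirectional structure by combining reconstruction, imputation, and consistency losses. Reading the original BRITS paper is recommended for a deeper understanding of its loss design.","https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fissues\u002F1",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},34645,"How does the DMSA module capture temporal dependencies and feature correlations at the same time with a single attention operation?","DMSA (Diagonally-Masked Self-Attention) does this by applying a diagonal mask to the attention map. In the self-attention computation Q·K^T, the diagonal entries of the resulting N×N attention matrix are masked out, so a time step cannot attend to itself and its estimate has to rely on the other time steps (temporal dependencies). Because each time step is a multi-feature vector that is projected as a whole into a high-dimensional space, the attention weights also blend the interactions between feature dimensions (feature correlations). A single attention operation can therefore learn both kinds of dependency in the high-dimensional space, with no need for two separate operations.","https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fissues\u002F11",{"id":149,"question_zh":150,"answer_zh":151,"source_url":152},34646,"When using SAITS for univariate time-series imputation, what should I adjust if it performs worse than simple linear interpolation or median filling?","If SAITS underperforms traditional methods (such as linear interpolation or median filling) on a univariate series (n_features=1) or with simple missing patterns (single missing points or short gaps), it is usually because deep learning models need enough data complexity and richer missing patterns to show an advantage. Try the following adjustments:\n1. Add more data or bring in more related features (set n_features above 1) to exploit multivariate correlations.\n2. Tune hyper-parameters, for example increase d_model (model dimension), n_layers (number of layers), or epochs (training epochs), to strengthen the model's fitting capacity.\n3. Check the missing rate; if it is too low, a deep learning model may struggle to learn a useful imputation pattern.\n4. 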
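For high-frequency data (such as 5-minute resolution), you may need to adjust n_steps (number of time steps) to cover a longer context window.\nIf the data itself is very simple, traditional statistical methods may be the better choice.","https:\u002F\u002Fgithub.com\u002FWenjieDu\u002FSAITS\u002Fissues\u002F44",{"id":154,"question_zh":155,"answer_zh":156,"source_url":142},34647,"How do I specify the dataset path in the config file for training?","The dataset path is usually set in the project's config file (.ini). For example, in configs\u002FAirQuality_SAITS_best.ini, find the section related to data loading (usually [data] or similar) and change the path that points to the .h5 or .csv data file. If the code looks for an .h5 file by default but you only have a .csv file, first write a script to convert the .csv to .h5, or check whether the code can read .csv directly (some versions may require changes to the data-loader code). Make sure the path is either an absolute path or a relative path that resolves correctly from the directory where you launch the script.",{"id":158,"question_zh":159,"answer_zh":160,"source_url":137},34648,"Which parameters does NNI hyper-parameter tuning actually change? How can I monitor the tuning process?","NNI (Neural Network Intelligence) only changes the parameters you explicitly define in the search-space file (such as SAITS_searching_space.json). Other parameters set in the .ini file stay fixed. To monitor the tuning process:\n1. Check SAITS_searching_space.json, which lists every hyper-parameter NNI will tune and search, along with its value range.\n2. After you run the tuning command, NNI starts an experiment locally or on a server and produces logs and intermediate results.\n3. Use NNI's Web UI (at the address printed when the experiment starts, for example http:\u002F\u002Flocalhost:8080) to watch each trial's parameter combination, training progress, and evaluation metrics in real time.\n4. Useful tuning results depend on understanding the NNI framework and configuring the search space correctly; otherwise the run may fall back to the default configuration.",[]]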