[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Time-MoE--Time-MoE":3,"tool-Time-MoE--Time-MoE":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",151314,2,"2026-04-11T23:32:58",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":64,"owner_name":64,"owner_avatar_url":72,"owner_bio":73,"owner_company":73,"owner_location":73,"owner_email":73,"owner_twitter":73,"owner_website":73,"owner_url":74,"languages":75,"stars":80,"forks":81,"last_commit_at":82,"license":83,"difficulty_score":10,"env_os":84,"env_gpu":85,"env_ram":84,"env_deps":86,"category_tags":95,"github_topics":96,"view_count":32,"oss_zip_url":73,"oss_zip_packed_at":73,"status":17,"created_at":103,"updated_at":104,"faqs":105,"releases":135},6723,"Time-MoE\u002FTime-MoE","Time-MoE","[ICLR 2025 Spotlight] Official implementation of \"Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts\"","Time-MoE 是一款专为时间序列分析打造的开源基础模型，旨在实现高精度的通用预测。它成功解决了传统模型难以处理超大规模数据、无法灵活适应不同预测长度及复杂场景的痛点。无论是从事算法研究的研究人员，还是需要构建预测系统的开发者，都能利用它轻松应对从能源消耗到金融波动等九大领域的时序任务。\n\n作为该领域的里程碑式作品，Time-MoE 拥有两大核心技术亮点：首先，它是首个参数量高达 24 亿（2.4B）且从头训练的时间序列基础模型，采用了先进的“混合专家”（Mixture of Experts）架构，在保持计算效率的同时大幅提升了模型容量；其次，项目配套发布了包含超过 3000 亿个数据点的 Time-300B 
数据集，这是目前公开规模最大的时间序列数据集合。该模型支持自回归生成，能够处理长达 4096 的上下文窗口，并允许用户自由设定预测跨度。凭借其在 ICLR 2025 获得 Spotlight 认可的卓越性能，Time-MoE 为探索时间序列数据的深层规律提供了强大而灵活的工具。","\u003Cdiv align=\"center\">\n  \u003Ch2>\u003Cb>(ICLR'25 Spotlight) Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts \u003C\u002Fb>\u003C\u002Fh2>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n![](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002FTime-MoE\u002FTime-MoE?color=green)\n![](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTime-MoE\u002FTime-MoE?color=yellow)\n![](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002FTime-MoE\u002FTime-MoE?color=lightblue)\n![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRs-Welcome-green)\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n**[\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.16040\">Paper Page\u003C\u002Fa>]**\n**[\u003Ca href=\"https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FLaYn0IJAOlN9Ufp_qus96Q\">中文解读\u003C\u002Fa>]**\n\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTime-MoE_Time-MoE_readme_36d26aca4dfc.png\" width=\"70\">\n\n\u003C\u002Fp>\n\n\n> 1️⃣ Time-MoE is the **first work** to scale time series foundation models up to **2.4 billion** parameters, trained from\n> scratch.\n\n> 2️⃣ Time-300B is the **largest** open-access time series data collection comprising over **300 billion** time points across >9 domains.\n\n## TODO List\n- [ ] Add covariate support\n- [ ] Enable fine-tuning of Time-MoE for forecasting with dynamic features and support time series classification\n\n## Updates\u002FNews:\n\n🚩 **News** (Feb 2025): Time-MoE has been accepted by ICLR 2025 as a Spotlight (Top 5.1%)!\n\n🚩 **News** (Oct 2024): Time-MoE introduction in [Chinese](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FLaYn0IJAOlN9Ufp_qus96Q)\n\n🚩 **News** (Oct 2024): 
[Time-300B](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMaple728\u002FTime-300B) dataset is now available \non 🤗 Hugging Face\n\n🚩 **News** (Oct 2024): [Time-MoE (base)](https:\u002F\u002Fhuggingface.co\u002FMaple728\u002FTimeMoE-50M) and [Time-MoE (large)](https:\u002F\u002Fhuggingface.co\u002FMaple728\u002FTimeMoE-200M) are made available\non 🤗 Hugging Face\n\n🚩 **News** (Sept 2024): Time-MoE preprint has been made available on [arXiv](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.16040)\n\n## Introduction\n\nTime-MoE comprises a family of decoder-only time series foundation models with a mixture-of-experts architecture,\ndesigned to operate in an auto-regressive manner, enabling universal forecasting with arbitrary prediction horizons and\ncontext lengths of up to 4096.\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTime-MoE_Time-MoE_readme_0fa561afcb1f.png\" alt=\"\" align=\"center\" width=\"700px\" \u002F>\n\u003C\u002Fp>\n\n## 📚 Training Data\n\n[Time-300B dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMaple728\u002FTime-300B) is available on 🤗 Hugging Face.\n\nHere's an example of how to use this dataset:\n```python\nimport random\nfrom time_moe.datasets.time_moe_dataset import TimeMoEDataset\n\nds = TimeMoEDataset('Time-300B')\nseq_idx = random.randint(0, len(ds) - 1)\nseq = ds[seq_idx]\n```\n\nThis code snippet shows how to load a random data sequence from the Time-300B dataset. First, download the dataset to the local 'Time-300B' folder, import the TimeMoEDataset class from time_moe.datasets, instantiate the class, and finally retrieve a sequence using a random index.\n\n## 🚀 Getting Started\n\n### Installation\n\n1. Install Python 3.10+, and then install the dependencies:\n\n```shell\npip install -r requirements.txt\n```\n\n**Note: Time-MoE requires `transformers==4.40.1` .**\n\n2. 
[Optional but **recommended**] Install [flash-attn](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention) for faster training and inference speeds with reduced memory usage.\n\n```shell\npip install flash-attn==2.6.3\n```\n\nor\n\n```shell\npip install packaging\npip install ninja\n# Replace \"64\" with the number of CPU cores available on your machine for faster compilation\nMAX_JOBS=64 pip install flash-attn==2.6.3 --no-build-isolation\n```\n\n### 📈 Making Forecasts\n\n**Note**: The `max_position_embeddings` for Time-MoE is set during training. This means the maximum sequence length for Time-MoE is **4096**. To achieve optimal forecasting performance, it is recommended that **the sum of `context_length` and `prediction_length` does not exceed 4096.**\nIf you need to support longer sequences, please fine-tune Time-MoE with the desired sequence length.\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM\n\ncontext_length = 12\nseqs = torch.randn(2, context_length)  # tensor shape is [batch_size, context_length]\n\nmodel = AutoModelForCausalLM.from_pretrained(\n    'Maple728\u002FTimeMoE-50M',\n    device_map=\"cpu\",  # use \"cpu\" for CPU inference, and \"cuda\" for GPU inference.\n    trust_remote_code=True,\n)\n\n# use it when the flash-attn is available\n# model = AutoModelForCausalLM.from_pretrained('Maple728\u002FTimeMoE-50M', device_map=\"auto\", attn_implementation='flash_attention_2', trust_remote_code=True)\n\n# normalize seqs\nmean, std = seqs.mean(dim=-1, keepdim=True), seqs.std(dim=-1, keepdim=True)\nnormed_seqs = (seqs - mean) \u002F std\n\n# forecast\nprediction_length = 6\noutput = model.generate(normed_seqs, max_new_tokens=prediction_length)  # shape is [batch_size, 12 + 6]\nnormed_predictions = output[:, -prediction_length:]  # shape is [batch_size, 6]\n\n# inverse normalize\npredictions = normed_predictions * std + mean\n```\n\n+ If the sequences are normalized already:\n\n```python\nimport 
torch\nfrom transformers import AutoModelForCausalLM\n\ncontext_length = 12\nnormed_seqs = torch.randn(2, context_length)  # tensor shape is [batch_size, context_length]\n\nmodel = AutoModelForCausalLM.from_pretrained(\n    'Maple728\u002FTimeMoE-50M',\n    device_map=\"cpu\",  # use \"cpu\" for CPU inference, and \"cuda\" for GPU inference.\n    trust_remote_code=True,\n)\n\n# use it when the flash-attn is available\n# model = AutoModelForCausalLM.from_pretrained('Maple728\u002FTimeMoE-50M', device_map=\"auto\", attn_implementation='flash_attention_2', trust_remote_code=True)\n\n# forecast\nprediction_length = 6\noutput = model.generate(normed_seqs, max_new_tokens=prediction_length)  # shape is [batch_size, 12 + 6]\nnormed_predictions = output[:, -prediction_length:]  # shape is [batch_size, 6]\n```\n\n### Evaluation\n\n+ Prepare the benchmark datasets.\n\nYou can download the pre-processed datasets\nfrom [[Google Drive]](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1KjnAYr9X3D-jyJpo4yM7Giyq5V1Hga_7?usp=sharing), then place\nthe downloaded contents under `.\u002Fdataset`.\n\n+ [Example] Run the following command to evaluate on ETTh1.\n\n```shell\npython run_eval.py -d dataset\u002FETT-small\u002FETTh1.csv -p 96\n```\n\n## 🔥 Fine-tuning Time-MoE\n\n### Preparing Your Dataset\n\nTo start fine-tuning Time-MoE, your dataset should be converted into a `jsonl` format. Each line represents one time series as a dictionary object, where the `sequence` field contains a list of time-series observations. For example:\n\n```jsonl\n{\"sequence\": [1.0, 2.0, 3.0, ...]}\n{\"sequence\": [11.0, 22.0, 33.0, ...]}\n```\n\nYou have the flexibility to save your converted data in `jsonl`, `json`, or `pickle` format. 
If you are using the [Time-300B](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMaple728\u002FTime-300B) dataset, you can proceed without any additional preprocessing.\n\n### Training Time-MoE on Your Dataset\n\n**Note: If your dataset is small, it is recommended to set `stride` to `1` by adding `--stride 1` to your training command.**\n\n**CPU**\n\nTo train on CPU, run the following command, replacing `\u003Cdata_path>` with the path to your prepared dataset:\n\n```bash\npython main.py -d \u003Cdata_path>\n```\n\n**Single Node with Single or Multiple GPUs**\n\nTo leverage a single GPU or multiple GPUs on a single node, use this command:\n\n```bash\npython torch_dist_run.py main.py -d \u003Cdata_path>\n```\n\n**Multi-Nodes Multi-GPUs**\n\nFor training across multiple nodes, additional environment variables must be set to enable inter-node communication:\n\n```bash\nexport MASTER_ADDR=\u003Cmaster_addr>\nexport MASTER_PORT=\u003Cmaster_port>\nexport WORLD_SIZE=\u003Cworld_size>\nexport RANK=\u003Crank>\n\npython torch_dist_run.py main.py -d \u003Cdata_path>\n```\n\nTo train Time-MoE **from scratch**, simply include the `--from_scratch` argument in your command. 
Here's how it should look:\n\n```bash\npython torch_dist_run.py main.py -d \u003Cdata_path> --from_scratch\n```\n\nTo explore additional command-line arguments and their usage, invoke the help command:\n\n```bash\npython main.py --help\n```\n\n## Citation\n\n> 🙋 Please let us know if you find a mistake or have any suggestions!\n\n> 🌟 If you find the Time-MoE models helpful in your research, please consider starring this repository and citing the\n> corresponding [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.16040):\n\n```\n@misc{shi2024timemoe,\n      title={Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts}, \n      author={Xiaoming Shi and Shiyu Wang and Yuqi Nie and Dianqi Li and Zhou Ye and Qingsong Wen and Ming Jin},\n      year={2024},\n      eprint={2409.16040},\n      archivePrefix={arXiv},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.16040}, \n}\n```\n\n## Related Resources\n* TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis, in arXiv 2024. [\\[paper\\]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.16032) [\\[GitHub Repo\\]](https:\u002F\u002Fgithub.com\u002Fkwuking\u002FTimeMixer)\n* Towards Neural Scaling Laws for Time Series Foundation Models, arXiv 2024. [\\[paper\\]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.12360)\n* Foundation Models for Time Series Analysis: A Tutorial and Survey, in *KDD*\n  2024. [\\[paper\\]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.14735) [\\[Tutorial\\]](https:\u002F\u002Fwenhaomin.github.io\u002FFM4TS.github.io\u002F)\n* What Can Large Language Models Tell Us about Time Series Analysis, in *ICML*\n  2024. [\\[paper\\]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.02713)\n* Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects, in *TPAMI*\n  2024. 
[\\[paper\\]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.10125) [\\[Website\\]](https:\u002F\u002Fgithub.com\u002Fqingsongedu\u002FAwesome-SSL4TS)\n* Transformers in Time Series: A Survey, in *IJCAI*\n  2023. [\\[paper\\]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.07125) [\\[GitHub Repo\\]](https:\u002F\u002Fgithub.com\u002Fqingsongedu\u002Ftime-series-transformers-review)\n* A Survey on Graph Neural Networks for Time Series: Forecasting, Classification, Imputation, and Anomaly Detection, in *TPAMI* 2024. [\\[paper\\]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.03759) [\\[Website\\]](https:\u002F\u002Fgithub.com\u002FKimMeen\u002FAwesome-GNN4TS)\n\n\n## Acknowledgement\n\nWe appreciate the following GitHub repos a lot for their valuable code and efforts.\n\n- Time-LLM [\\[repo\\]](https:\u002F\u002Fgithub.com\u002FKimMeen\u002FTime-LLM)\n- TimeMixer [\\[repo\\]](https:\u002F\u002Fgithub.com\u002Fkwuking\u002FTimeMixer)\n- Time-Series-Library [\\[repo\\]](https:\u002F\u002Fgithub.com\u002Fthuml\u002FTime-Series-Library)\n- Large (Language) Models and Foundation Models (LLM, LM, FM) for Time Series and Spatio-Temporal\n  Data [\\[repo\\]](https:\u002F\u002Fgithub.com\u002Fqingsongedu\u002FAwesome-TimeSeries-SpatioTemporal-LM-LLM)\n\n## License\n\nThis project is licensed under the Apache-2.0 License.\n","\u003Cdiv align=\"center\">\n  \u003Ch2>\u003Cb>(ICLR'25 Spotlight) Time-MoE：基于专家混合的大规模时间序列基础模型\u003C\u002Fb>\u003C\u002Fh2>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n![](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002FTime-MoE\u002FTime-MoE?color=green)\n![](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTime-MoE\u002FTime-MoE?color=yellow)\n![](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002FTime-MoE\u002FTime-MoE?color=lightblue)\n![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRs-Welcome-green)\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n**[\u003Ca 
href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.16040\">论文页面\u003C\u002Fa>]**\n**[\u003Ca href=\"https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FLaYn0IJAOlN9Ufp_qus96Q\">中文解读\u003C\u002Fa>]**\n\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTime-MoE_Time-MoE_readme_36d26aca4dfc.png\" width=\"70\">\n\n\u003C\u002Fp>\n\n\n> 1️⃣ Time-MoE是首个将时间序列基础模型规模扩展至**24亿**参数，并从头开始训练的工作。\n\n> 2️⃣ Time-300B是目前**最大**的开源时间序列数据集，涵盖超过**3000亿**个时间点，涉及超过9个领域。\n\n## 待办事项\n- [ ] 添加协变量支持\n- [ ] 实现对Time-MoE的微调，以支持带有动态特征的预测任务，并支持时间序列分类\n\n## 更新\u002F新闻：\n\n🚩 **新闻**（2025年2月）：Time-MoE已被ICLR 2025接收为Spotlight论文（前5.1%）！\n\n🚩 **新闻**（2024年10月）：Time-MoE的中文介绍已在[微信公众号](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FLaYn0IJAOlN9Ufp_qus96Q)发布。\n\n🚩 **新闻**（2024年10月）：[Time-300B](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMaple728\u002FTime-300B)数据集现已在Hugging Face上公开。\n\n🚩 **新闻**（2024年10月）：[Time-MoE (base)](https:\u002F\u002Fhuggingface.co\u002FMaple728\u002FTimeMoE-50M)和[Time-MoE (large)](https:\u002F\u002Fhuggingface.co\u002FMaple728\u002FTimeMoE-200M)已在Hugging Face上发布。\n\n🚩 **新闻**（2024年9月）：Time-MoE的预印本已在[arXiv](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.16040)上发布。\n\n## 简介\n\nTime-MoE是一系列采用专家混合架构的解码器式时间序列基础模型，设计用于自回归方式运行，能够实现任意预测 horizon 和上下文长度（最长4096）的通用预测任务。\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTime-MoE_Time-MoE_readme_0fa561afcb1f.png\" alt=\"\" align=\"center\" width=\"700px\" \u002F>\n\u003C\u002Fp>\n\n## 📚 训练数据\n\n[Time-300B数据集](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMaple728\u002FTime-300B)已在Hugging Face上公开。以下是一个使用该数据集的示例：\n```python\nimport random\nfrom time_moe.datasets.time_moe_dataset import TimeMoEDataset\n\nds = TimeMoEDataset('Time-300B')\nseq_idx = random.randint(0, len(ds) - 1)\nseq = 
ds[seq_idx]\n```\n\n这段代码展示了如何从Time-300B数据集中加载一条随机数据序列。首先需要将数据集下载到本地的‘Time-300B’文件夹中，然后从time_moe.datasets模块中导入TimeMoEDataset类，实例化该类，并通过随机索引获取一条序列。\n\n## 🚀 快速入门\n\n### 安装\n\n1. 安装Python 3.10及以上版本，然后安装依赖项：\n\n```shell\npip install -r requirements.txt\n```\n\n**注意：Time-MoE需要`transformers==4.40.1` 。**\n\n2. [可选但**推荐**] 安装[flash-attn](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention)，以提升训练和推理速度，同时减少内存占用。\n\n```shell\npip install flash-attn==2.6.3\n```\n\n或者\n\n```shell\npip install packaging\npip install ninja\n# 将“64”替换为你机器可用的CPU核心数，以加快编译速度\nMAX_JOBS=64 pip install flash-attn==2.6.3 --no-build-isolation\n```\n\n### 📈 进行预测\n\n**注意**：Time-MoE的`max_position_embeddings`在训练时已设定。这意味着Time-MoE的最大序列长度为**4096**。为了获得最佳预测效果，建议**`context_length`与`prediction_length`之和不超过4096**。若需支持更长的序列长度，请使用所需的更长序列长度对Time-MoE进行微调。\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM\n\ncontext_length = 12\nseqs = torch.randn(2, context_length)  # 张量形状为[batch_size, context_length]\n\nmodel = AutoModelForCausalLM.from_pretrained(\n    'Maple728\u002FTimeMoE-50M',\n    device_map=\"cpu\",  # 使用“cpu”进行CPU推理，使用“cuda”进行GPU推理。\n    trust_remote_code=True,\n)\n\n# 如果已安装flash-attn\n# model = AutoModelForCausalLM.from_pretrained('Maple728\u002FTimeMoE-50M', device_map=\"auto\", attn_implementation='flash_attention_2', trust_remote_code=True)\n\n# 对序列进行归一化\nmean, std = seqs.mean(dim=-1, keepdim=True), seqs.std(dim=-1, keepdim=True)\nnormed_seqs = (seqs - mean) \u002F std\n\n# 预测\nprediction_length = 6\noutput = model.generate(normed_seqs, max_new_tokens=prediction_length)  # 形状为[batch_size, 12 + 6]\nnormed_predictions = output[:, -prediction_length:]  # 形状为[batch_size, 6]\n\n# 反归一化\npredictions = normed_predictions * std + mean\n```\n\n+ 如果序列已经归一化：\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM\n\ncontext_length = 12\nnormed_seqs = torch.randn(2, context_length)  # 张量形状为[batch_size, context_length]\n\nmodel = 
AutoModelForCausalLM.from_pretrained(\n    'Maple728\u002FTimeMoE-50M',\n    device_map=\"cpu\",  # 使用“cpu”进行CPU推理，使用“cuda”进行GPU推理。\n    trust_remote_code=True,\n)\n\n# 如果已安装flash-attn\n# model = AutoModelForCausalLM.from_pretrained('Maple728\u002FTimeMoE-50M', device_map=\"auto\", attn_implementation='flash_attention_2', trust_remote_code=True)\n\n# 预测\nprediction_length = 6\noutput = model.generate(normed_seqs, max_new_tokens=prediction_length)  # 形状为[batch_size, 12 + 6]\nnormed_predictions = output[:, -prediction_length:]  # 形状为[batch_size, 6]\n```\n\n### 评估\n\n+ 准备基准测试数据集。\n\n你可以从[[Google Drive]](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1KjnAYr9X3D-jyJpo4yM7Giyq5V1Hga_7?usp=sharing)获取预处理好的数据集，然后将下载的内容放入`.\u002Fdataset`目录下。\n\n+ [示例] 运行以下命令来评估ETTh1数据集。\n\n```shell\npython run_eval.py -d dataset\u002FETT-small\u002FETTh1.csv -p 96\n```\n\n## 🔥 微调Time-MoE\n\n### 准备你的数据集\n\n要开始微调Time-MoE，你的数据集应转换为`jsonl`格式。每行代表一个时间序列数据，以字典对象的形式存储，其中`sequence`字段包含一系列时间序列观测值。例如：\n\n```jsonl\n{\"sequence\": [1.0, 2.0, 3.0, ...]}\n{\"sequence\": [11.0, 22.0, 33.0, ...]}\n```\n\n你可以选择将转换后的数据保存为`jsonl`、`json`或`pickle`格式。如果你使用的是[Time-300B](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMaple728\u002FTime-300B)数据集，则无需额外的预处理即可直接使用。\n\n### 在您的数据集上训练 Time-MoE\n\n**注意：如果您的数据集较小，建议将 `stride` 设置为 `1`，方法是在训练命令中添加 `--stride 1`。**\n\n**CPU**\n\n对于使用 CPU 进行训练，请执行以下命令，并确保将 `\u003Cdata_path>` 替换为您准备好的数据集路径：\n\n```bash\npython main.py -d \u003Cdata_path>\n```\n\n**单节点单 GPU 或多 GPU**\n\n要利用单节点上的单个或多个 GPU，请使用以下命令：\n\n```bash\npython torch_dist_run.py main.py -d \u003Cdata_path>\n```\n\n**多节点多 GPU**\n\n在多节点上进行训练时，需要进行额外的环境配置以促进节点间的通信：\n\n```bash\nexport MASTER_ADDR=\u003Cmaster_addr>\nexport MASTER_PORT=\u003Cmaster_port>\nexport WORLD_SIZE=\u003Cworld_size>\nexport RANK=\u003Crank>\n\npython torch_dist_run.py main.py -d \u003Cdata_path>\n```\n\n要从头开始训练 Time-MoE，只需在命令中加入 `--from_scratch` 参数即可。命令示例如下：\n\n```bash\npython torch_dist_run.py main.py -d \u003Cdata_path> 
--from_scratch\n```\n\n如需了解其他命令行参数及其用法，可以运行帮助命令：\n\n```bash\npython main.py --help\n```\n\n## 引用\n\n> 🙋 如果您发现任何错误或有任何建议，请告知我们！\n\n> 🌟 如果您认为 Time-MoE 模型对您的研究有所帮助，请考虑给本仓库点赞并引用相应的论文：\n\n```\n@misc{shi2024timemoe,\n      title={Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts}, \n      author={Xiaoming Shi and Shiyu Wang and Yuqi Nie and Dianqi Li and Zhou Ye and Qingsong Wen and Ming Jin},\n      year={2024},\n      eprint={2409.16040},\n      archivePrefix={arXiv},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.16040}, \n}\n```\n\n## 相关资源\n* TimeMixer++：用于通用预测分析的通用时间序列模式机器，发表于 arXiv 2024 年。[论文链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.16032) [GitHub 仓库链接](https:\u002F\u002Fgithub.com\u002Fkwuking\u002FTimeMixer)\n* 面向时间序列基础模型的神经规模定律，发表于 arXiv 2024 年。[论文链接](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.12360)\n* 时间序列分析的基础模型：教程与综述，发表于 *KDD* 2024 年。[论文链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.14735) [教程链接](https:\u002F\u002Fwenhaomin.github.io\u002FFM4TS.github.io\u002F)\n* 大型语言模型能为我们提供哪些关于时间序列分析的信息，发表于 *ICML* 2024 年。[论文链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.02713)\n* 时间序列分析中的自监督学习：分类、进展与展望，发表于 *TPAMI* 2024 年。[论文链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.10125) [网站链接](https:\u002F\u002Fgithub.com\u002Fqingsongedu\u002FAwesome-SSL4TS)\n* 时间序列中的 Transformer：综述，发表于 *IJCAI* 2023 年。[论文链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.07125) [GitHub 仓库链接](https:\u002F\u002Fgithub.com\u002Fqingsongedu\u002Ftime-series-transformers-review)\n* 关于图神经网络在时间序列中的应用：预测、分类、插补和异常检测的综述，发表于 *TPAMI* 2024 年。[论文链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.03759) [网站链接](https:\u002F\u002Fgithub.com\u002FKimMeen\u002FAwesome-GNN4TS)\n\n## 致谢\n\n我们非常感谢以下 GitHub 仓库提供的宝贵代码和努力。\n\n- Time-LLM [仓库链接](https:\u002F\u002Fgithub.com\u002FKimMeen\u002FTime-LLM)\n- TimeMixer [仓库链接](https:\u002F\u002Fgithub.com\u002Fkwuking\u002FTimeMixer)\n- 时间序列库 
[仓库链接](https:\u002F\u002Fgithub.com\u002Fthuml\u002FTime-Series-Library)\n- 面向时间序列和时空数据的大（语言）模型及基础模型（LLM、LM、FM）[仓库链接](https:\u002F\u002Fgithub.com\u002Fqingsongedu\u002FAwesome-TimeSeries-SpatioTemporal-LM-LLM)\n\n## 许可证\n\n本项目采用 Apache-2.0 许可证。","# Time-MoE 快速上手指南\n\nTime-MoE 是首个参数量高达 **24 亿** 的时间序列基础模型，采用混合专家（MoE）架构，支持长达 4096 的上下文长度和任意预测步长的自回归预测。\n\n## 1. 环境准备\n\n*   **系统要求**：Linux \u002F macOS \u002F Windows\n*   **Python 版本**：3.10 或更高\n*   **核心依赖**：\n    *   `transformers==4.40.1` (必须严格匹配此版本)\n    *   `torch` (建议安装支持 CUDA 的版本以加速推理)\n*   **可选加速**：推荐安装 `flash-attn` 以提升训练和推理速度并降低显存占用。\n\n## 2. 安装步骤\n\n### 第一步：安装基础依赖\n克隆项目或直接安装依赖包：\n```shell\npip install -r requirements.txt\n```\n> **注意**：请确保安装后 `transformers` 版本为 `4.40.1`。\n\n### 第二步：（推荐）安装 Flash Attention\n为了获得最佳性能，建议安装 `flash-attn`。\n```shell\npip install flash-attn==2.6.3\n```\n如果上述命令编译较慢，可使用以下命令利用多核加速编译（将 `64` 替换为你机器的 CPU 核心数）：\n```shell\npip install packaging ninja\nMAX_JOBS=64 pip install flash-attn==2.6.3 --no-build-isolation\n```\n\n## 3. 基本使用\n\n以下示例展示如何加载预训练模型并进行时间序列预测。模型默认最大序列长度为 **4096**（即 `context_length` + `prediction_length` ≤ 4096）。\n\n### 代码示例\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM\n\n# 1. 准备数据\ncontext_length = 12\n# 生成随机测试数据 [batch_size, context_length]\nseqs = torch.randn(2, context_length)\n\n# 2. 加载模型\n# 可选模型: Maple728\u002FTimeMoE-50M (Base), Maple728\u002FTimeMoE-200M (Large)\nmodel = AutoModelForCausalLM.from_pretrained(\n    'Maple728\u002FTimeMoE-50M',\n    device_map=\"auto\",  # 自动选择设备 (cuda\u002Fcpu)，CPU 推理可设为 \"cpu\"\n    trust_remote_code=True,\n    # 若已安装 flash-attn，取消下面这行的注释以启用加速\n    # attn_implementation='flash_attention_2', \n)\n\n# 3. 数据标准化 (Time-MoE 建议在输入前进行归一化)\nmean, std = seqs.mean(dim=-1, keepdim=True), seqs.std(dim=-1, keepdim=True)\nnormed_seqs = (seqs - mean) \u002F std\n\n# 4. 
执行预测\nprediction_length = 6\n# generate 返回的是 [input + prediction]，需截取后半部分\noutput = model.generate(normed_seqs, max_new_tokens=prediction_length)\nnormed_predictions = output[:, -prediction_length:]\n\n# 5. 反标准化还原结果\npredictions = normed_predictions * std + mean\n\nprint(f\"预测结果形状：{predictions.shape}\") # [batch_size, prediction_length]\n```\n\n### 使用说明\n*   **模型选择**：示例中使用的是 `TimeMoE-50M`，如需更强能力可替换为 `Maple728\u002FTimeMoE-200M`。\n*   **输入格式**：输入张量形状应为 `[batch_size, context_length]`。\n*   **长度限制**：请确保输入长度与预测长度之和不超过 4096。若需处理更长序列，需要对模型进行微调。","某大型连锁零售企业的算法团队正面临黑五促销期间，需对全球数万个门店的百万级 SKU 进行高精度销量预测的挑战。\n\n### 没有 Time-MoE 时\n- **模型规模受限**：传统时序模型参数量小，难以捕捉跨品类、跨区域的复杂长期依赖关系，导致突发促销趋势预测失准。\n- **数据孤岛严重**：缺乏统一的大规模预训练基础，每个品类需单独训练小模型，无法利用其他领域的通用时序规律，冷启动商品预测效果极差。\n- **推理效率低下**：面对海量序列并发请求，原有架构难以在有限算力下实现长上下文（>2000 步）的快速自回归生成，延迟严重影响补货决策。\n- **泛化能力不足**：模型对未见过的销售模式或异常波动适应性弱，需频繁人工干预调整参数。\n\n### 使用 Time-MoE 后\n- **十亿级参数赋能**：Time-MoE 凭借 24 亿参数规模与混合专家（MoE）架构，精准捕捉全球销售数据中的宏观趋势与微观波动，大幅提升大促峰值预测准确率。\n- **通用基础模型优势**：基于包含 3000 亿数据点的 Time-300B 数据集预训练，Time-MoE 将成熟品类的时序知识迁移至新品类，显著改善冷启动商品的预测表现。\n- **高效长序列推理**：利用其支持的 4096 长度上下文及 Flash Attention 加速，Time-MoE 在保持低延迟的同时，能一次性输入更长历史数据进行更稳健的长周期推演。\n- **动态适应性强**：自回归机制使 Time-MoE 能灵活应对任意预测跨度，自动适应不同地区的季节性差异和突发市场变化，减少人工调优成本。\n\nTime-MoE 通过十亿级参数规模与大规模预训练，将时序预测从“单点定制”升级为“通用智能”，彻底解决了复杂场景下的精度与效率瓶颈。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTime-MoE_Time-MoE_1f1735d7.png","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FTime-MoE_e435d6d1.png",null,"https:\u002F\u002Fgithub.com\u002FTime-MoE",[76],{"name":77,"color":78,"percentage":79},"Python","#3572A5",100,944,110,"2026-04-11T09:47:15","Apache-2.0","未说明","非必需（支持 CPU 推理），但推荐安装 flash-attn 以加速训练和推理并降低显存占用；具体显卡型号、显存大小及 CUDA 版本未在文中明确指定",{"notes":87,"python":88,"dependencies":89},"模型最大序列长度（context_length + prediction_length）限制为 4096，若需更长序列需进行微调；小数据集微调时建议设置步长（stride）为 1；支持单机多卡及多节点分布式训练。","3.10+",[90,91,92,93,94],"transformers==4.40.1","flash-attn==2.6.3 
(可选但推荐)","packaging","ninja","torch",[14],[97,98,99,100,101,102],"time-series","time-series-foundation-model","deep-learning","large-model","machine-learning","time-series-forecasting","2026-03-27T02:49:30.150509","2026-04-12T07:51:55.980255",[106,111,116,121,126,131],{"id":107,"question_zh":108,"answer_zh":109,"source_url":110},30341,"如何应用 Time-300B 数据集进行训练？遇到 NaN 值该怎么办？","Time-300B 数据集包含极长的时间序列，直接输入模型会导致显存溢出。常见的处理方法是将原始时间序列分割成多个固定长度的序列后再输入模型。如果在运行代码时遇到 NaN 值，请尝试以下操作：1. 将 Python 版本切换为 3.10.16；2. 重新从 Hugging Face Git 仓库下载数据集，确保文件完整；3. 检查是否在 Mac 和 Linux 系统上均复现了该问题，维护者在这些环境下测试未出现 NaN。如果问题依旧，可能是下载的文件损坏，建议重新下载汇率数据文件。","https:\u002F\u002Fgithub.com\u002FTime-MoE\u002FTime-MoE\u002Fissues\u002F41",{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},30342,"是否有用于训练的 dataset 代码？小模型如何复现？","train_dataset 实际上是 3000 亿规模数据集的销售部分。由于 Time-300B 数据集序列过长，直接在 32 层 TimeMoeDecoderLayer 中实例化会导致显存溢出（OOM）。解决方案是将长序列切分为固定长度的片段。目前微调代码已开源，请参阅项目根目录下的 `README.md` 获取详细教程。","https:\u002F\u002Fgithub.com\u002FTime-MoE\u002FTime-MoE\u002Fissues\u002F15",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},30343,"零样本预测（zero-shot forecasting）效果不佳或训练损失不下降怎么办？","如果训练损失停滞不前（例如保持在 0.04），通常是因为 `stride` 参数未按预期工作导致训练迭代次数不足。请务必使用最新版本的代码进行训练，具体提交 ID 为：`7f257c4aa02980784dfa53c488216cf04c897450`。推荐的训练命令如下：\n`python torch_dist_run.py main.py -d output.jsonl -m TimeMoE-50M --max_length 512 --num_train_epochs 10 --global_batch_size 8 --stride 1 -o logs\u002Ftime_moe`\n注意：如果 `output.jsonl` 有 1138 个时间点且 `max_length` 设为 512，样本数为 626；若 `global_batch_size` 为 8，则每 epoch 约 78 步。设置 10 个 epoch 总步数应为 780 左右。","https:\u002F\u002Fgithub.com\u002FTime-MoE\u002FTime-MoE\u002Fissues\u002F70",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},30344,"全量微调（full-shot）的实验设置是怎样的？是否使用了 Flash Attention？","Time-MoE 的微调策略与其他时序模型不同：它是在所有可用数据集（如 ETTh1, ETTh2, ETTm1, ETTm2 等的组合）上进行训练，然后在单个基准测试集上进行评估。虽然这种通用训练方式在特定基准上的表现可能略逊于针对单一数据集训练的基线模型（如 PatchTST），但能提供更好的泛化能力。关于 Flash 
Attention，如果在微调过程中发现结果不稳定或优于论文报告，建议检查是否严格遵循了上下文长度设置（如预测 96\u002F192\u002F336\u002F720 对应上下文 512\u002F1024\u002F2048\u002F3072）。","https:\u002F\u002Fgithub.com\u002FTime-MoE\u002FTime-MoE\u002Fissues\u002F65",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},30345,"如何在自定义数据集上微调模型？需要严格遵循 TimeMoEDataset 格式吗？","是的，若要在自定义数据集上微调，建议严格参照 `TimeMoEDataset` 类的代码结构来构建数据集。构建完成后，可以使用提供的 `hf_trainer` 进行训练。关键在于处理长序列时的切分逻辑，需确保输入数据的格式与模型预期的固定长度序列一致，以避免显存溢出。具体实现细节可参考 `README.md` 中的微调教程。","https:\u002F\u002Fgithub.com\u002FTime-MoE\u002FTime-MoE\u002Fissues\u002F17",{"id":132,"question_zh":133,"answer_zh":134,"source_url":120},30346,"训练时的批次大小（batch size）和步数（steps）是如何计算的？","训练步数取决于数据量、最大长度（max_length）和批次大小。例如，若 `output.jsonl` 包含 1138 个时间点，设置 `max_length` 为 512，则训练样本数量为 $1138 - 512 = 626$。若 `global_batch_size` 设置为 8，则每个 epoch 的批次数量约为 $626 \\div 8 \\approx 78$。如果设置 `num_train_epochs` 为 10，总训练步数即为 780 步。请确保 `stride` 参数正确配置，否则实际迭代次数可能会少于预期。",[]]