[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-limix-ldm-ai--LimiX":3,"tool-limix-ldm-ai--LimiX":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":83,"stars":92,"forks":93,"last_commit_at":94,"license":95,"difficulty_score":10,"env_os":96,"env_gpu":97,"env_ram":98,"env_deps":99,"category_tags":113,"github_topics":114,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":119,"updated_at":120,"faqs":121,"releases":152},2174,"limix-ldm-ai\u002FLimiX","LimiX","LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.03505","LimiX 是一款专为结构化数据设计的通用人工智能模型，旨在打破传统表格学习中“一个任务一个模型”的局限。它通过统一的训练和推理框架，能够同时处理分类、回归、缺失值填补、特征选择、样本筛选乃至因果推断等多种复杂任务，将原本分散的定制流程整合为高效的基础模型方案。\n\n对于需要处理复杂表格数据的研究人员和开发者而言，LimiX 解决了以往需针对不同目标构建独立管道的痛点，显著提升了建模效率与泛化能力。其核心采用针对结构化数据优化的 Transformer 架构，创新地在样本和特征两个维度上应用注意力机制，精准捕捉关键数据模式。此外，最新发布的 LimiX-2M 版本在保持高性能的同时，大幅降低了显存占用并加快了推理速度，使得在资源受限环境下部署成为可能。无论是从事数据挖掘的算法工程师，还是探索通用智能边界的科研人员，LimiX 都能提供强大的底层支持，助力从专用小模型向统一大模型的范式转变。","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_8d74123fd809.png\" alt=\"LimiX summary\" width=\"89%\">\n\u003C\u002Fdiv>\n\n#  :boom: News\n - 
2025-11-10: LimiX-2M is officially released! Compared to LimiX-16M, this smaller variant offers significantly lower GPU memory usage and faster inference speed. The retrieval mechanism has also been enhanced, further improving model performance while reducing both inference time and memory consumption.\n - 2025-08-29: LimiX V1.0 Released.\n\n#  ⚡ Latest Results Compared with SOTA Models\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_566753c8fef3.png\"  width=\"30%\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_079157ce9ce5.png\"  width=\"30%\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_56161228fe46.png\" width=\"30%\">  \n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_5a8d3a0460f7.png\"  width=\"30%\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_55f82343f6e0.png\" width=\"30%\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_069a74c63e51.png\" width=\"30%\">\n\u003C\u002Fdiv>\n\n\n# ➤ Overview\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_1db041df5eb7.png\" alt=\"LimiX summary\" width=\"89%\">\n\u003C\u002Fdiv>\nWe introduce LimiX, the first installment of our LDM series. LimiX aims to push generality further: a single model that handles classification, regression, missing-value imputation, feature selection, sample selection, and causal inference under one training and inference recipe, advancing the shift from bespoke pipelines to unified, foundation-style tabular learning.\n\nLimiX adopts a transformer architecture optimized for structured data modeling and task generalization. 
The model first embeds features X and targets Y from the prior knowledge base into token representations. Within the core modules, attention mechanisms are applied across both sample and feature dimensions to identify salient patterns in key samples and features. The resulting high-dimensional representations are then passed to regression and classification heads, enabling the model to support diverse predictive tasks. \n\nFor details, please refer to the technical report at the link: [LimiX:Unleashing Structured-Data Modeling Capability for Generalist Intelligence](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.03505) or [LimiX_Technical_Report.pdf](https:\u002F\u002Fgithub.com\u002Flimix-ldm\u002FLimiX\u002Fblob\u002Fmain\u002FLimiX_Technical_Report.pdf).\n\n# ➤ Superior Performance \nThe LimiX model achieved SOTA performance across multiple tasks.\n\n## ➩ Classification \n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_6236d0f062be.png\" width=\"60%\">\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_779b9cf0cb62.png\" width=\"45%\" style=\"margin-right:2%;\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_52a51a00ebf1.png\" width=\"42.5%\">\n\u003C\u002Fdiv>\n\n## ➩ Regression \n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_bf717bed9d44.png\" width=\"60%\">\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_9761e510947b.png\" width=\"45%\" style=\"margin-right:2%;\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_8dae30ed687a.png\" width=\"40.3%\">\n\u003C\u002Fdiv>\n\n## ➩ Missing Values Imputation \n\u003Cdiv 
align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_f473cb2c257f.png\" alt=\"Missing value imputation\" width=\"60%\">\n\u003C\u002Fdiv>\n\n# ➤ Tutorials \n## ➩ Installation\n### Option 1 (recommended): Use the Dockerfile\nDownload [Dockerfile](https:\u002F\u002Fgithub.com\u002Flimix-ldm\u002FLimiX\u002Fblob\u002Fmain\u002FDockerfile)\n```bash\ndocker build --network=host -t limix\u002Finfe:v1 --build-arg FROM_IMAGES=nvidia\u002Fcuda:12.2.0-base-ubuntu22.04 -f Dockerfile .\n```\n\n### Option 2: Build manually\nDownload the prebuilt flash_attn files\n```bash\nwget -O flash_attn-2.8.0.post2+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention\u002Freleases\u002Fdownload\u002Fv2.8.0.post2\u002Fflash_attn-2.8.0.post2+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl\n```\nInstall Python dependencies\n```bash\npip install python==3.12.7 torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1\npip install flash_attn-2.8.0.post2+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl\npip install scikit-learn  einops  huggingface-hub matplotlib networkx numpy pandas  scipy tqdm typing_extensions xgboost kditransform hyperopt\n```\n\n### Download source code\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Flimix-ldm\u002FLimiX.git\ncd LimiX\n```\n\n# ➤ Inference\nLimiX supports tasks such as classification, regression, and missing value imputation\n## ➩ Model download\n| Model size | Download link | Tasks supported |\n| --- | --- | --- |\n| LimiX-16M | [LimiX-16M.ckpt](https:\u002F\u002Fhuggingface.co\u002Fstableai-org\u002FLimiX-16M\u002Ftree\u002Fmain) |  ✅ classification  ✅regression   ✅missing value imputation |\n| LimiX-2M | [LimiX-2M.ckpt](https:\u002F\u002Fhuggingface.co\u002Fstableai-org\u002FLimiX-2M\u002Ftree\u002Fmain) |  ✅ classification  ✅regression |\n\n## ➩ Interface description\n\n### Model Creation\n```python\nclass 
LimiXPredictor:\n    def __init__(self,\n                 device:torch.device,\n                 model_path:str,\n                 mix_precision:bool=True,\n                 inference_config: list|str,\n                 categorical_features_indices:List[int]|None=None,\n                 outlier_remove_std: float=12,\n                 softmax_temperature:float=0.9,\n                 task_type: Literal['Classification', 'Regression']='Classification',\n                 mask_prediction:bool=False,\n                 inference_with_DDP: bool = False,\n                 seed:int=0)\n```\n| Parameter | Data Type | Description |\n|--------|----------|----------|\n| device | torch.device | The hardware that loads the model |\n| model_path | str | The path to the model that needs to be loaded |\n| mix_precision | bool | Whether to enable the mixed precision inference |\n| inference_config | list\u002Fstr | Configuration file used for inference |\n| categorical_features_indices | list | The indices of categorical columns in the tabular data |\n| outlier_remove_std | float | The threshold is employed to remove outliers, defined as values that are multiples of the standard deviation |\n| softmax_temperature | float | The temperature used to control the behavior of softmax operator |\n| task_type | str | The task type which can be either \"Classification\" or \"Regression\" |\n| mask_prediction | bool | Whether to enable missing value imputation |\n| inference_with_DDP | bool | Whether to enable DDP during inference |\n| seed | int | The seed to control random states |\n### Predict\n```python\ndef predict(self, x_train:np.ndarray, y_train:np.ndarray, x_test:np.ndarray) -> np.ndarray:\n```\n| Parameter   | Data Type    | Description           |\n| ------- | ---------- | ----------------- |\n| x_train  | np.ndarray  | The input features of the training set   |\n| y_train  | np.ndarray  | The target variable of the training set   |\n| x_test   | np.ndarray  | The input features of 
the test set   |\n\n## Inference Configuration File Description\n| Configuration File Name | Description | Difference |\n| ------- | ---------- | ----- |\n| cls_default_retrieval.json | Default **classification task** inference configuration file **with retrieval** | Better classification performance |\n| cls_default_noretrieval.json | Default **classification task** inference configuration file **without retrieval** | Faster speed, lower memory requirements |\n| reg_default_retrieval.json | Default **regression task** inference configuration file **with retrieval** | Better regression performance |\n| reg_default_noretrieval.json | Default **regression task** inference configuration file **without retrieval** | Faster speed, lower memory requirements |\n| reg_default_noretrieval_MVI.json | Default inference configuration file for **missing value imputation task** |  |\n\n## ➩ Ensemble Inference Based on Sample Retrieval\n\nFor a detailed technical introduction to Ensemble Inference Based on Sample Retrieval, please refer to the [technical report](https:\u002F\u002Fgithub.com\u002Flimix-ldm\u002FLimiX\u002Fblob\u002Fmain\u002FLimiX_Technical_Report.pdf).\n\nConsidering inference speed and memory requirements, ensemble inference based on sample retrieval currently only supports hardware with specifications higher than the NVIDIA RTX 4090 GPU.\n\n### Classification Task\n\n```\npython inference_classifier.py --save_name your_save_name --inference_config_path path_to_retrieval_config --data_dir path_to_data\n```\n\n### Regression Task\n\n```\npython inference_regression.py --save_name your_save_name --inference_config_path path_to_retrieval_config --data_dir path_to_data\n```\n\n### Customizing Data Preprocessing for Inference Tasks\n#### First, Generate the Inference Configuration File\n\n```python\ngenerate_inference_config()\n```\n\n### Classification Task\n#### Single GPU or CPU\n\n```\npython  inference_classifier.py --save_name your_save_name 
--inference_config_path path_to_retrieval_config --data_dir path_to_data\n```\n\n#### Multi-GPU Distributed Inference\n\n```\ntorchrun --nproc_per_node=8  inference_classifier.py --save_name your_save_name --inference_config_path path_to_retrieval_config --data_dir path_to_data --inference_with_DDP\n```\n\n### Regression Task\n#### Single GPU or CPU\n\n```\npython  inference_regression.py --save_name your_save_name --inference_config_path path_to_retrieval_config --data_dir path_to_data\n```\n\n#### Multi-GPU Distributed Inference\n\n```\ntorchrun --nproc_per_node=8  inference_regression.py --save_name your_save_name --inference_config_path path_to_retrieval_config --data_dir path_to_data --inference_with_DDP\n```\n\n### Retrieval Optimization Project\nThis project implements an optimized retrieval system. To achieve the best performance, we utilize Optuna for hyperparameter tuning of retrieval parameters.\n#### Installation\nEnsure you have the required dependencies installed:\n```\npip install optuna\n```\n#### Usage\nFor standard inference using pre-optimized parameters, refer to the code below:\n```\nsearchInference = RetrievalSearchHyperparameters(\n           dict(device_id=0,model_path=model_path), X_train, y_train, X_test, y_test,\n)\nconfig, result = searchInference.search(n_trials=10, metric=\"AUC\",\n              inference_config='config\u002Fcls_default_retrieval.json',task_type=\"cls\")\n```\nThis will launch an Optuna study to find the best combination of retrieval parameters for your specific dataset and use case.\n\n## ➩ Classification\n```python\nfrom sklearn.datasets import load_breast_cancer\nfrom sklearn.metrics import accuracy_score, roc_auc_score\nfrom sklearn.model_selection import train_test_split\nfrom huggingface_hub import hf_hub_download\nimport numpy as np\nimport os, sys\n\nos.environ[\"RANK\"] = \"0\"\nos.environ[\"WORLD_SIZE\"] = \"1\"\nos.environ[\"MASTER_ADDR\"] = \"127.0.0.1\"\nos.environ[\"MASTER_PORT\"] = \"29500\"\n\nROOT_DIR 
= os.path.abspath(os.path.join(os.path.dirname(__file__), \"..\"))\nif ROOT_DIR not in sys.path:\n    sys.path.insert(0, ROOT_DIR)\nfrom inference.predictor import LimiXPredictor\n\nX, y = load_breast_cancer(return_X_y=True)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)\n\nmodel_file = hf_hub_download(repo_id=\"stableai-org\u002FLimiX-16M\", filename=\"LimiX-16M.ckpt\", local_dir=\".\u002Fcache\")\n\nclf = LimiXPredictor(device=torch.device('cuda'), model_path=model_file, inference_config='config\u002Fcls_default_retrieval.json')\nprediction = clf.predict(X_train, y_train, X_test)\n\nprint(\"roc_auc_score:\", roc_auc_score(y_test, prediction[:, 1]))\nprint(\"accuracy_score:\", accuracy_score(y_test, np.argmax(prediction, axis=1)))\n```\nFor additional examples, refer to [inference_classifier.py](.\u002Finference_classifier.py)\n\n## ➩ Regression\n```python\nfrom functools import partial\n\nfrom sklearn.datasets import fetch_california_housing\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import r2_score\nfrom huggingface_hub import hf_hub_download\ntry:\n    from sklearn.metrics import root_mean_squared_error as mean_squared_error\nexcept:\n    from sklearn.metrics import mean_squared_error\n    mean_squared_error = partial(mean_squared_error, squared=False)\nimport os, sys\n\nos.environ[\"RANK\"] = \"0\"\nos.environ[\"WORLD_SIZE\"] = \"1\"\nos.environ[\"MASTER_ADDR\"] = \"127.0.0.1\"\nos.environ[\"MASTER_PORT\"] = \"29500\"\n\nROOT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), \"..\"))\nif ROOT_DIR not in sys.path:\n    sys.path.insert(0, ROOT_DIR)\nfrom inference.predictor import LimiXPredictor\n\nhouse_data = fetch_california_housing()\nX, y = house_data.data, house_data.target\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)\n\ny_mean = y_train.mean()\ny_std = y_train.std()\ny_train_normalized = (y_train - y_mean) \u002F 
y_std\ny_test_normalized = (y_test - y_mean) \u002F y_std\n\nmodel_path = hf_hub_download(repo_id=\"stableai-org\u002FLimiX-16M\", filename=\"LimiX-16M.ckpt\", local_dir=\".\u002Fcache\")\n\nmodel = LimiXPredictor(device=torch.device('cuda'), model_path=model_path, inference_config='config\u002Freg_default_retrieval.json')\ny_pred = model.predict(X_train, y_train_normalized, X_test)    \n\n# Compute RMSE and R²\ny_pred = y_pred.to('cpu').numpy()\nrmse = mean_squared_error(y_test_normalized, y_pred)\nr2 = r2_score(y_test_normalized, y_pred)\n\nprint(f'RMSE: {rmse}')\nprint(f'R2: {r2}')\n```\nFor additional examples, refer to [inference_regression.py](.\u002Finference_regression.py)\n\n## ➩ Missing value imputation\nFor the demo file, see [examples\u002Fdemo_missing_value_imputation.py](examples\u002Finference_regression.py)\n\n# ➤ Link\n - LimiX:Unleashing Structured-Data Modeling Capability for Generalist Intelligence: [LimiX:Unleashing Structured-Data Modeling Capability for Generalist Intelligence](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.03505)\n - LimiX Technical Report: [LimiX_Technical_Report.pdf](https:\u002F\u002Fgithub.com\u002Flimix-ldm\u002FLimiX\u002Fblob\u002Fmain\u002FLimiX_Technical_Report.pdf)\n - Detailed instructions for using Limix: [Visit the official Limix documentation](https:\u002F\u002Fwww.limix.ai\u002Fdoc\u002F)\n - Balance Comprehensive Challenging Omni-domain Classification Benchmark: [bcco_cls](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fstableai-org\u002Fbcco_cls)\n - Balance Comprehensive Challenging Omni-domain Regression Benchmark: [bcco_reg](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fstableai-org\u002Fbcco_reg)\n\n# ➤ License\nThe code in this repository is open-sourced under the [Apache-2.0](LICENSE.txt) license, while the usage of the LimiX model weights is subject to the Model License. 
The LimiX weights are fully available for academic research and may be used commercially upon obtaining proper authorization.\n\n# ➤ Citation\n```\n@article{zhang2025limix,\n  title={Limix: Unleashing structured-data modeling capability for generalist intelligence},\n  author={Zhang, Xingxuan and Ren, Gang and Yu, Han and Yuan, Hao and Wang, Hui and Li, Jiansheng and Wu, Jiayun and Mo, Lang and Mao, Li and Hao, Mingchao and others},\n  journal={arXiv preprint arXiv:2509.03505},\n  year={2025}\n}\n```\n","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_8d74123fd809.png\" alt=\"LimiX summary\" width=\"89%\">\n\u003C\u002Fdiv>\n\n#  :boom: 新闻\n - 2025年11月10日：LimiX-2M正式发布！相较于LimiX-16M，这一更小的版本在显存占用上显著降低，推理速度更快。同时，检索机制也得到了增强，进一步提升了模型性能，并减少了推理时间和内存消耗。\n - 2025年8月29日：LimiX V1.0发布。\n\n#  ⚡ 最新结果与SOTA模型对比\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_566753c8fef3.png\"  width=\"30%\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_079157ce9ce5.png\"  width=\"30%\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_56161228fe46.png\" width=\"30%\">  \n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_5a8d3a0460f7.png\"  width=\"30%\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_55f82343f6e0.png\" width=\"30%\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_069a74c63e51.png\" width=\"30%\">\n\u003C\u002Fdiv>\n\n\n# ➤ 概述\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_1db041df5eb7.png\" alt=\"LimiX summary\" 
width=\"89%\">\n\u003C\u002Fdiv>\n我们推出了LimiX，这是我们LDM系列的第一款产品。LimiX旨在进一步提升通用性：通过一套统一的训练与推理流程，即可完成分类、回归、缺失值填补、特征选择、样本选择以及因果推断等任务，推动从定制化流水线向统一的基础型表格数据学习范式的转变。\n\nLimiX采用专为结构化数据建模和任务泛化优化的Transformer架构。该模型首先将来自先验知识库的特征X和目标Y嵌入为标记表示。在核心模块中，注意力机制同时作用于样本和特征维度，以识别关键样本和特征中的重要模式。随后，这些高维表示会被传递到回归和分类头，从而使模型能够支持多种预测任务。\n\n有关详细信息，请参阅以下技术报告：[LimiX:释放结构化数据建模能力，助力通用智能](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.03505) 或 [LimiX_Technical_Report.pdf](https:\u002F\u002Fgithub.com\u002Flimix-ldm\u002FLimiX\u002Fblob\u002Fmain\u002FLimiX_Technical_Report.pdf)。\n\n# ➤ 卓越性能 \nLimiX模型在多项任务中均取得了SOTA水平的表现。\n\n## ➩ 分类 \n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_6236d0f062be.png\" width=\"60%\">\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_779b9cf0cb62.png\" width=\"45%\" style=\"margin-right:2%;\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_52a51a00ebf1.png\" width=\"42.5%\">\n\u003C\u002Fdiv>\n\n## ➩ 回归 \n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_bf717bed9d44.png\" width=\"60%\">\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_9761e510947b.png\" width=\"45%\" style=\"margin-right:2%;\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_8dae30ed687a.png\" width=\"40.3%\">\n\u003C\u002Fdiv>\n\n## ➩ 缺失值填补 \n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flimix-ldm-ai_LimiX_readme_f473cb2c257f.png\" alt=\"缺失值填补\" width=\"60%\">\n\u003C\u002Fdiv>\n\n# ➤ 教程 \n## ➩ 安装\n### 方案一（推荐）：使用Dockerfile\n下载 
[Dockerfile](https:\u002F\u002Fgithub.com\u002Flimix-ldm\u002FLimiX\u002Fblob\u002Fmain\u002FDockerfile)\n```bash\ndocker build --network=host -t limix\u002Finfe:v1 --build-arg FROM_IMAGES=nvidia\u002Fcuda:12.2.0-base-ubuntu22.04 -f Dockerfile .\n```\n\n### 方案二：手动构建\n下载预编译的flash_attn文件\n```bash\nwget -O flash_attn-2.8.0.post2+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention\u002Freleases\u002Fdownload\u002Fv2.8.0.post2\u002Fflash_attn-2.8.0.post2+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl\n```\n安装Python依赖\n```bash\npip install python==3.12.7 torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1\npip install flash_attn-2.8.0.post2+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl\npip install scikit-learn  einops  huggingface-hub matplotlib networkx numpy pandas  scipy tqdm typing_extensions xgboost kditransform hyperopt\n```\n\n### 下载源代码\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Flimix-ldm\u002FLimiX.git\ncd LimiX\n```\n\n# ➤ 推理\nLimiX支持分类、回归和缺失值填补等任务\n## ➩ 模型下载\n| 模型大小 | 下载链接 | 支持的任务 |\n| --- | --- | --- |\n| LimiX-16M | [LimiX-16M.ckpt](https:\u002F\u002Fhuggingface.co\u002Fstableai-org\u002FLimiX-16M\u002Ftree\u002Fmain) |  ✅ 分类  ✅回归   ✅缺失值填补 |\n| LimiX-2M | [LimiX-2M.ckpt](https:\u002F\u002Fhuggingface.co\u002Fstableai-org\u002FLimiX-2M\u002Ftree\u002Fmain) |  ✅ 分类  ✅回归 |\n\n## ➩ 接口说明\n\n### 模型创建\n```python\nclass LimiXPredictor:\n    def __init__(self,\n                 device:torch.device,\n                 model_path:str,\n                 mix_precision:bool=True,\n                 inference_config: list|str,\n                 categorical_features_indices:List[int]|None=None,\n                 outlier_remove_std: float=12,\n                 softmax_temperature:float=0.9,\n                 task_type: Literal['Classification', 'Regression']='Classification',\n                 mask_prediction:bool=False,\n                 inference_with_DDP: bool = False,\n              
   seed:int=0)\n```\n| 参数 | 数据类型 | 描述 |\n|--------|----------|----------|\n| device | torch.device | 加载模型的硬件设备 |\n| model_path | str | 需要加载的模型路径 |\n| mix_precision | bool | 是否启用混合精度推理 |\n| inference_config | list\u002Fstr | 用于推理的配置文件 |\n| categorical_features_indices | list | 表格数据中分类列的索引 |\n| outlier_remove_std | float | 用于去除异常值的阈值，定义为标准差的倍数 |\n| softmax_temperature | float | 用于控制Softmax操作行为的温度 |\n| task_type | str | 任务类型，可为“分类”或“回归” |\n| mask_prediction | bool | 是否启用缺失值填补 |\n| inference_with_DDP | bool | 是否在推理过程中启用DDP |\n| seed | int | 用于控制随机状态的种子 |\n### 预测\n```python\ndef predict(self, x_train:np.ndarray, y_train:np.ndarray, x_test:np.ndarray) -> np.ndarray:\n```\n| 参数   | 数据类型    | 描述           |\n| ------- | ---------- | ----------------- |\n| x_train  | np.ndarray  | 训练集的输入特征   |\n| y_train  | np.ndarray  | 训练集的目标变量   |\n| x_test   | np.ndarray  | 测试集的输入特征   |\n\n## 推理配置文件说明\n| 配置文件名 | 说明 | 区别 |\n| ------- | ---------- | ----- |\n| cls_default_retrieval.json | 默认的**分类任务**推理配置文件，**带检索** | 分类性能更优 |\n| cls_default_noretrieval.json | 默认的**分类任务**推理配置文件，**不带检索** | 速度更快，内存需求更低 |\n| reg_default_retrieval.json | 默认的**回归任务**推理配置文件，**带检索** | 回归性能更优 |\n| reg_default_noretrieval.json | 默认的**回归任务**推理配置文件，**不带检索** | 速度更快，内存需求更低 |\n| reg_default_noretrieval_MVI.json | 默认的**缺失值插补任务**推理配置文件 |  |\n\n## ➩ 基于样本检索的集成推理\n\n有关基于样本检索的集成推理的详细技术介绍，请参阅[技术报告](https:\u002F\u002Fgithub.com\u002Flimix-ldm\u002FLimiX\u002Fblob\u002Fmain\u002FLimiX_Technical_Report.pdf)。\n\n考虑到推理速度和内存需求，基于样本检索的集成推理目前仅支持硬件规格高于NVIDIA RTX 4090 GPU的设备。\n\n### 分类任务\n\n```\npython inference_classifier.py --save_name your_save_name --inference_config_path path_to_retrieval_config --data_dir path_to_data\n```\n\n### 回归任务\n\n```\npython inference_regression.py --save_name your_save_name --inference_config_path path_to_retrieval_config --data_dir path_to_data\n```\n\n### 自定义推理任务的数据预处理\n#### 首先生成推理配置文件\n\n```python\ngenerate_inference_config()\n```\n\n### 分类任务\n#### 单GPU或CPU\n\n```\npython  inference_classifier.py 
--save_name your_save_name --inference_config_path path_to_retrieval_config --data_dir path_to_data
```

#### Multi-GPU distributed inference

```
torchrun --nproc_per_node=8 inference_classifier.py --save_name your_save_name --inference_config_path path_to_retrieval_config --data_dir path_to_data --inference_with_DDP
```

### Regression tasks
#### Single GPU or CPU

```
python inference_regression.py --save_name your_save_name --inference_config_path path_to_retrieval_config --data_dir path_to_data
```

#### Multi-GPU distributed inference

```
torchrun --nproc_per_node=8 inference_regression.py --save_name your_save_name --inference_config_path path_to_retrieval_config --data_dir path_to_data --inference_with_DDP
```

### Retrieval optimization
This project implements an optimized retrieval system. To reach the best performance, we use Optuna to tune the retrieval hyperparameters.
#### Installation
Make sure the required dependency is installed:
```
pip install optuna
```
#### Usage
For standard inference with pre-optimized parameters, refer to the following code:
```python
searchInference = RetrievalSearchHyperparameters(
    dict(device_id=0, model_path=model_path), X_train, y_train, X_test, y_test,
)
config, result = searchInference.search(
    n_trials=10, metric="AUC",
    inference_config='config/cls_default_retrieval.json', task_type="cls",
)
```
This launches an Optuna study that searches for the best combination of retrieval parameters for your specific dataset and use case.

## ➩ Classification
```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from huggingface_hub import hf_hub_download
import numpy as np
import os, sys
import torch  # needed below for torch.device

os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"

ROOT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
if ROOT_DIR not in sys.path:
    sys.path.insert(0, ROOT_DIR)
from inference.predictor import LimiXPredictor

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

model_file = hf_hub_download(repo_id="stableai-org/LimiX-16M", filename="LimiX-16M.ckpt", local_dir="./cache")

clf = LimiXPredictor(device=torch.device('cuda'), model_path=model_file, inference_config='config/cls_default_retrieval.json')
prediction = clf.predict(X_train, y_train, X_test)

print("roc_auc_score:", roc_auc_score(y_test, prediction[:, 1]))
print("accuracy_score:", accuracy_score(y_test, np.argmax(prediction, axis=1)))
```
See [inference_classifier.py](./inference_classifier.py) for more examples.

## ➩ Regression
```python
from functools import partial

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from huggingface_hub import hf_hub_download
try:
    from sklearn.metrics import root_mean_squared_error as mean_squared_error
except ImportError:
    from sklearn.metrics import mean_squared_error
    mean_squared_error = partial(mean_squared_error, squared=False)
import os, sys
import torch  # needed below for torch.device

os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"

ROOT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
if ROOT_DIR not in sys.path:
    sys.path.insert(0, ROOT_DIR)
from inference.predictor import LimiXPredictor

house_data = fetch_california_housing()
X, y = house_data.data, house_data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Normalize targets using statistics from the training split only
y_mean = y_train.mean()
y_std = y_train.std()
y_train_normalized = (y_train - y_mean) / y_std
y_test_normalized = (y_test - y_mean) / y_std

model_path = hf_hub_download(repo_id="stableai-org/LimiX-16M", filename="LimiX-16M.ckpt", local_dir="./cache")

model = LimiXPredictor(device=torch.device('cuda'), model_path=model_path, inference_config='config/reg_default_retrieval.json')
y_pred = model.predict(X_train, y_train_normalized, X_test)

# Compute RMSE and R²
y_pred = y_pred.to('cpu').numpy()
rmse = mean_squared_error(y_test_normalized, y_pred)
r2 = r2_score(y_test_normalized, y_pred)

print(f'RMSE: {rmse}')
print(f'R2: {r2}')
```
See [inference_regression.py](./inference_regression.py) for more examples.

## ➩ Missing value imputation
For a demo, see [examples/demo_missing_value_imputation.py](examples/demo_missing_value_imputation.py)

# ➤ Links
- LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence: [arXiv:2509.03505](https://arxiv.org/abs/2509.03505)
- LimiX technical report: [LimiX_Technical_Report.pdf](https://github.com/limix-ldm/LimiX/blob/main/LimiX_Technical_Report.pdf)
- Detailed usage instructions: [LimiX official documentation](https://www.limix.ai/doc/)
- Balanced, comprehensive, challenging all-domain classification benchmark: [bcco_cls](https://huggingface.co/datasets/stableai-org/bcco_cls)
- Balanced, comprehensive, challenging all-domain regression benchmark: [bcco_reg](https://huggingface.co/datasets/stableai-org/bcco_reg)

# ➤ License
The code in this repository is released under the [Apache-2.0](LICENSE.txt) license, while use of the LimiX model weights is governed by the model license. The LimiX weights are fully available for academic research and, with appropriate authorization, may also be used commercially.

# ➤ Citation
```
@article{zhang2025limix,
  title={Limix: Unleashing structured-data modeling capability for generalist intelligence},
  author={Zhang, Xingxuan and Ren, Gang and Yu, Han and Yuan, Hao and Wang, Hui and Li, Jiansheng and Wu, Jiayun and Mo, Lang and Mao, Li and Hao, Mingchao and others},
  journal={arXiv preprint arXiv:2509.03505},
  year={2025}
}
```

# LimiX Quick Start Guide

LimiX is a Transformer-based foundation model for general tabular data, supporting classification, regression, missing value imputation, and other tasks. This guide walks you through environment setup and running your first prediction example.

## 1. Environment Preparation

### System requirements
- **OS**: Linux (Ubuntu 22.04 recommended)
- **GPU**: NVIDIA GPU (RTX 4090 or better recommended for retrieval-augmented inference; plain inference works on lower-memory devices)
- **CUDA**: 12.2+
- **Python**: 3.12.7

### Prerequisites
- Docker (optional; recommended for an isolated environment)
- Git
- NVIDIA Container Toolkit (if using Docker)
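Before installing, you can sanity-check the prerequisites listed above with a small stdlib-only script. This is an illustrative helper, not part of LimiX; `check_prereqs` and its keys are names chosen here for the sketch.

```python
import sys
import shutil

def check_prereqs(min_version=(3, 12)):
    """Return basic environment checks for the guide's requirements.

    Only probes the PATH and the interpreter; it does not verify CUDA
    or driver versions, which still need `nvidia-smi` output inspection.
    """
    return {
        "python_ok": sys.version_info[:2] >= min_version,  # guide pins 3.12.x
        "nvidia_smi": shutil.which("nvidia-smi") is not None,  # NVIDIA driver present?
        "git": shutil.which("git") is not None,
        "docker": shutil.which("docker") is not None,  # optional per the guide
    }

if __name__ == "__main__":
    for name, ok in check_prereqs().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```

If `nvidia_smi` is missing, the Docker route with the NVIDIA Container Toolkit is usually the easier fix than a bare-metal driver install.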
## 2. Installation

You can install via Docker (recommended) or manually.

### Option 1: Docker (recommended)

Download the `Dockerfile` and build the image:

```bash
wget https://raw.githubusercontent.com/limix-ldm/LimiX/main/Dockerfile
docker build --network=host -t limix/infe:v1 --build-arg FROM_IMAGES=nvidia/cuda:12.2.0-base-ubuntu22.04 -f Dockerfile .
```

Start the container:
```bash
docker run --gpus all -it --rm limix/infe:v1 bash
```

### Option 2: Manual installation

1. **Download the prebuilt flash_attn wheel**:
```bash
wget -O flash_attn-2.8.0.post2+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.0.post2/flash_attn-2.8.0.post2+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
```

2. **Install the Python dependencies** (Python 3.12.7 itself must come from your environment manager, e.g. conda or pyenv; it cannot be installed with pip):
```bash
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1
pip install flash_attn-2.8.0.post2+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
pip install scikit-learn einops huggingface-hub matplotlib networkx numpy pandas scipy tqdm typing_extensions xgboost kditransform hyperopt
```
> **Tip**: Users in mainland China can speed up installation with the Tsinghua or Aliyun mirror, e.g. by appending `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip commands.

3. **Clone the source code**:
```bash
git clone https://github.com/limix-ldm/LimiX.git
cd LimiX
```
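To confirm the pinned versions above actually landed in your environment, a small stdlib check can help. The `audit_pins` helper is a sketch written for this guide, not a LimiX utility; extend the `PINNED` dict with any other packages you care about.

```python
from importlib import metadata

# Pins taken from the install commands above (torch family only, as a sample)
PINNED = {"torch": "2.7.1", "torchvision": "0.22.1", "torchaudio": "2.7.1"}

def audit_pins(pins=PINNED):
    """Compare installed package versions against the guide's pins.

    Returns {package: (wanted, installed_or_None, matches)}.
    """
    report = {}
    for pkg, want in pins.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None  # not installed at all
        report[pkg] = (want, have, have == want)
    return report

if __name__ == "__main__":
    for pkg, (want, have, ok) in audit_pins().items():
        print(f"{pkg}: want {want}, have {have} -> {'OK' if ok else 'MISMATCH'}")
```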
## 3. Basic Usage

The following example shows how to run a simple **binary classification** task (the breast cancer dataset) with the LimiX-16M model.

### Step 1: Prepare the code

Create a file `quick_start.py` with the following code:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from huggingface_hub import hf_hub_download
import numpy as np
import os, sys
import torch

# Distributed environment variables (single-machine, single-GPU mode)
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"

# Add the project root to the import path
ROOT_DIR = os.path.abspath(os.path.dirname(__file__))
if ROOT_DIR not in sys.path:
    sys.path.insert(0, ROOT_DIR)

from inference.predictor import LimiXPredictor

# 1. Load the data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

# 2. Download the model (cached locally)
# Available models: LimiX-16M (full-featured) or LimiX-2M (lightweight; classification/regression only)
model_file = hf_hub_download(repo_id="stableai-org/LimiX-16M", filename="LimiX-16M.ckpt", local_dir="./cache")

# 3. Initialize the predictor
# task_type: 'Classification' or 'Regression'
# inference_config: choose retrieval (better accuracy) or no-retrieval (faster)
clf = LimiXPredictor(
    device=torch.device('cuda'),
    model_path=model_file,
    inference_config='config/cls_default_retrieval.json',
    task_type='Classification'
)

# 4. Run prediction
# Inputs: training features, training labels, test features
prediction = clf.predict(X_train, y_train, X_test)

# 5. Evaluate the results
print("ROC AUC Score:", roc_auc_score(y_test, prediction[:, 1]))
print("Accuracy Score:", accuracy_score(y_test, np.argmax(prediction, axis=1)))
```

### Step 2: Run the script

```bash
python quick_start.py
```

### Common configurations

| Task | Recommended config (inference_config) | Notes |
| :--- | :--- | :--- |
| **Classification** | `config/cls_default_retrieval.json` | Retrieval on by default; highest accuracy |
| **Classification (fast)** | `config/cls_default_noretrieval.json` | Retrieval off; faster, lower GPU memory |
| **Regression** | `config/reg_default_retrieval.json` | Retrieval on by default; highest accuracy |
| **Missing value imputation** | `config/reg_default_noretrieval_MVI.json` | Dedicated to imputation tasks |

> **Note**: If GPU memory is tight (e.g. under 24 GB), pass a `*_noretrieval.json` config when initializing `LimiXPredictor`, or use the smaller `LimiX-2M` model.

# ➤ Use Case: Credit Risk Modeling

A financial risk team is building a next-generation credit default prediction system. It must handle heterogeneous tabular data with many missing values while covering both classification (will the customer default?) and regression (default probability).

### Without LimiX
- **Fragmented pipelines**: Data scientists build separate pipelines for missing-value imputation, feature selection, classification modeling, and regression estimation, and maintaining multiple codebases slows collaboration.
- **Weak generalization**: Task-specific XGBoost or LightGBM models do not transfer; once the business goal shifts from "predict default" to "estimate loss amount", a brand-new model must be trained.
- **Crude missing-value handling**: Traditional imputation (e.g. mean filling) distorts the data distribution, degrading key risk features and hurting final model accuracy.
- **Heavy inference footprint**: Deploying a dedicated model instance per task significantly inflates GPU memory usage and online inference latency.

### With LimiX
- **One unified architecture**: A single LimiX model natively supports classification, regression, and automatic missing-value imputation, collapsing five formerly separate processing steps into one train-and-infer recipe and shortening the development cycle by 60%.
- **Generalist adaptability**: Thanks to its strong structured-data modeling, the same weights switch prediction targets without retraining, so new business metrics can be analyzed quickly.
- **High-fidelity data repair**: Attention over both the sample and feature dimensions lets LimiX identify key patterns and reconstruct missing values accurately, lifting downstream AUC by 3.5%.
- **Lightweight deployment**: The LimiX-2M variant keeps near-SOTA performance at a fraction of the GPU memory, serving multiple task types concurrently on a single card with nearly 2x faster inference.

By replacing bespoke tabular-learning pipelines with a unified foundation model, LimiX delivers genuine "one model, many uses" for structured data.
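The "one model, many uses" pattern above comes down to swapping inference configs while keeping the same weights. A minimal sketch, using the config paths from the quick-start table; the `pick_config` helper and its task keys are names invented for this sketch:

```python
# Map (task, retrieval?) to the config files listed in the quick-start table.
# Imputation ("mvi") only ships a no-retrieval config, so retrieval is ignored there.
CONFIGS = {
    ("cls", True): "config/cls_default_retrieval.json",
    ("cls", False): "config/cls_default_noretrieval.json",
    ("reg", True): "config/reg_default_retrieval.json",
    ("mvi", False): "config/reg_default_noretrieval_MVI.json",
}

def pick_config(task, retrieval=True):
    """Return the inference_config path for a task; raise on unknown combos."""
    key = (task, retrieval if task != "mvi" else False)
    if key not in CONFIGS:
        raise ValueError(f"no config for task={task!r}, retrieval={retrieval}")
    return CONFIGS[key]
```

A risk team could then build one `LimiXPredictor` per business question, e.g. `inference_config=pick_config("cls", retrieval=False)` when GPU memory is tight, without retraining anything.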
# ➤ Project Information

- **Developer**: Stable AI (contact: stableai@stable-ai.cn; https://www.stable-ai.cn; https://github.com/limix-ldm-ai)
- **Languages**: Python (99%), Dockerfile (1%)
- **License**: Apache-2.0
- **OS**: Linux
- **GPU**: Required (for mixed-precision inference and retrieval augmentation). Basic inference runs on a single GPU; for ensemble inference based on sample retrieval, the authors recommend hardware above an NVIDIA RTX 4090. The Docker image is based on CUDA 12.2.0, and manual installs must match the flash_attn (cu12) build.
- **System memory**: Not specified, but retrieval mode is very memory-hungry, so ample RAM is advisable.
- **Python**: 3.12.7
- **Key dependencies**: torch==2.7.1, flash_attn==2.8.0.post2+cu12, scikit-learn, pandas, numpy, xgboost, huggingface-hub, optuna, einops, tqdm
- **Topics**: foundation-models, limix, machine-learning, structured-data

Deployment notes:
1. Docker deployment (Ubuntu 22.04 + CUDA 12.2) is the officially recommended path; manual installation requires downloading a specific prebuilt flash_attn wheel first.
2. Two model sizes are available: LimiX-16M (classification, regression, missing value imputation) and LimiX-2M (classification and regression only, with lower GPU memory usage).
3. Sample-retrieval mode is very demanding on the GPU (better than an RTX 4090 is advised); otherwise use a no-retrieval config file to save memory and run faster.
4. Multi-GPU distributed inference (DDP) is supported.

# ➤ FAQ

**Q: Does LimiX run on, or accelerate with, Apple Silicon (M-series chips)?**
A: LimiX is currently compatible with the M4 CPU. The team has made MLX integration its top priority, aiming to use the Neural Engine/Metal for hardware acceleration with an expected 5-10x speedup. This is planned for a future release; watch the official updates. (Source: https://github.com/limix-ldm-ai/LimiX/issues/22)

**Q: How do I use LimiX on time-series data without data leakage? Does the model shuffle the data?**
A: LimiX is pretrained on synthetic tabular data, so sequence/time-series modeling introduces no inherent leakage. To model time-series data, just make sure the context is built from observed values only. For classification, the context usually consists of X_context (features) and Y_context (labels), where X_context can be statistics or domain-specific attributes extracted from the sequence, as long as those same features can also be computed for the query data. (Source: https://github.com/limix-ldm-ai/LimiX/issues/34)
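The leakage-free context construction described in that answer can be sketched with past-only rolling statistics: each row's features are computed from strictly earlier observations, so the same recipe works for query rows too. The `rolling_features` helper is illustrative, not part of LimiX:

```python
def rolling_features(series, window=3):
    """For each step t, compute (mean, min, max) over the *previous* `window`
    values of `series` — never including the value at t itself, so a label at
    t cannot leak into its own features."""
    feats = []
    for t in range(len(series)):
        past = series[max(0, t - window):t]  # strictly before t: no leakage
        if not past:
            feats.append((None, None, None))  # no history yet at the start
        else:
            feats.append((sum(past) / len(past), min(past), max(past)))
    return feats

# These tuples would form X_context rows (paired with labels as Y_context),
# and the identical computation yields features for query timestamps.
```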
**Q: How do I do causal inference with LimiX? Do I need to modify the code?**
A: The model has no dedicated causal-inference module; instead, its feature-attention mechanism can assist causal analysis by reflecting feature importance. Set `calculate_feature_attention=True` in the config file; the computed attention scores show how much each feature contributes to the model's decisions, which can support causal analysis. (Source: https://github.com/limix-ldm-ai/LimiX/issues/29)

**Q: After Optuna finds the best parameters, how do I reproduce the best results?**
A: If editing the JSON config directly has no effect, the config path may be passed incorrectly or the script may not support it. Rather than running `inference_regression.py` directly, write your own prediction script: make sure the environment variables (RANK, WORLD_SIZE, etc.) are set, explicitly load a config object containing the best parameters (e.g. `use_cluster`, `cluster_num`, `dynamic_ratio`) for inference, and check that the config file in the save directory is complete. (Source: https://github.com/limix-ldm-ai/LimiX/issues/24)

**Q: How do I extract embedding feature vectors from tabular data?**
A: The open-source code has no ready-made embedding-extraction module yet. You can implement it yourself: grab the `encoder_out` output at line 202 of `model/transformer.py` and take the embedding at the corresponding positions as the feature representation. As with large language models, average pooling works for aggregation. Because the model's Transformer layers already perform rich row-column interaction, the output carries substantial information about the original table. (Source: https://github.com/limix-ldm-ai/LimiX/issues/23)

**Q: How do I integrate and run LimiX in the TabArena benchmark?**
A: TabArena has no dedicated LimiX integration; you can run the open-source LimiX directly inside the TabArena environment. The evaluation dataset list is now provided in the `benchmark_list` folder. For pipeline-specific performance tuning, see the community-submitted PRs, though the team may not be able to handle integration requests immediately due to parallel projects. (Source: https://github.com/limix-ldm-ai/LimiX/issues/4)

# ➤ Releases

- **V1.1.0** (2025-11-10): LimiX-2M is officially released. Compared to LimiX-16M, this more lightweight variant features significantly lower GPU memory usage and faster inference speed. Its retrieval mechanism has also been upgraded, further improving model performance while reducing both inference time and memory consumption.
- **V1.0.1** (2025-09-19): Update V1.0.1