[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-HDI-Project--ATM":3,"tool-HDI-Project--ATM":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",159636,2,"2026-04-17T23:33:34",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":76,"owner_website":78,"owner_url":79,"languages":80,"stars":89,"forks":90,"last_commit_at":91,"license":92,"difficulty_score":32,"env_os":93,"env_gpu":94,"env_ram":94,"env_deps":95,"category_tags":102,"github_topics":104,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":110,"updated_at":111,"faqs":112,"releases":141},8734,"HDI-Project\u002FATM","ATM","Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).","ATM（Auto Tune Models）是由麻省理工学院数据与人工智能实验室推出的开源自动化机器学习系统。它的核心功能十分直观：用户只需提供一个分类任务数据集（CSV 格式），ATM 便能自动尝试构建并优化出性能最佳的机器学习模型。\n\n在机器学习实践中，从众多算法中筛选合适模型并精细调整参数往往耗时费力，且对专业知识要求较高。ATM 正是为了解决这一痛点而生，它将复杂的模型选择与超参数调优过程自动化，让用户能更专注于数据本身而非繁琐的工程细节。作为基于同名学术论文研发的成果，ATM 采用了独特的多租户、多数据系统架构，支持在统一环境中高效管理多个实验任务。\n\n这款工具特别适合希望快速验证想法的数据科学家、需要基准模型的研究人员，以及想要降低机器学习入门门槛的开发者。虽然目前项目处于预发布阶段，但其简洁的使用流程和对标准 CSV 
数据的原生支持，使其成为探索自动化建模的有力助手。无论是用于学术实验还是原型开发，ATM 都能帮助用户以更少的代码投入，获得高质量的模型结果。","\u003Cp align=\"left\">\n\u003Cimg width=15% src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHDI-Project_ATM_readme_30217158ad84.png\" alt=“ATM” \u002F>\n\u003Ci>An open source project from Data to AI Lab at MIT.\u003C\u002Fi>\n\u003C\u002Fp>\n\n# ATM - Auto Tune Models\n\n[![Development Status](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDevelopment%20Status-2%20--%20Pre--Alpha-yellow)](https:\u002F\u002Fpypi.org\u002Fsearch\u002F?c=Development+Status+%3A%3A+2+-+Pre-Alpha)\n[![PyPi Shield](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fatm.svg)](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002Fatm)\n[![Travis](https:\u002F\u002Ftravis-ci.org\u002FHDI-Project\u002FATM.svg?branch=master)](https:\u002F\u002Ftravis-ci.org\u002FHDI-Project\u002FATM)\n[![CircleCI](https:\u002F\u002Fcircleci.com\u002Fgh\u002FHDI-Project\u002FATM.svg?style=shield)](https:\u002F\u002Fcircleci.com\u002Fgh\u002FHDI-Project\u002FATM)\n[![Coverage Status](https:\u002F\u002Fcodecov.io\u002Fgh\u002FHDI-project\u002FATM\u002Fbranch\u002Fmaster\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002FHDI-project\u002FATM)\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHDI-Project_ATM_readme_06ceb8478afc.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fatm)\n\n- License: MIT\n- Development Status: [Pre-Alpha](https:\u002F\u002Fpypi.org\u002Fsearch\u002F?c=Development+Status+%3A%3A+2+-+Pre-Alpha)\n- Documentation: https:\u002F\u002FHDI-Project.github.io\u002FATM\u002F\n- Homepage: https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\n\n# Overview\n\nAuto Tune Models (ATM) is an AutoML system designed with ease of use in mind. In short, you give\nATM a classification problem and a dataset as a CSV file, and ATM will try to build the best model\nit can. 
ATM is based on a [paper](https:\u002F\u002Fdai.lids.mit.edu\u002Fwp-content\u002Fuploads\u002F2018\u002F02\u002Fatm_IEEE_BIgData-9-1.pdf)\nof the same name, and the project is part of the [Human-Data Interaction (HDI) Project](https:\u002F\u002Fhdi-dai.lids.mit.edu\u002F) at MIT.\n\n\n# Install\n\n## Requirements\n\n**ATM** has been developed and tested on [Python 2.7, 3.5, and 3.6](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)\n\nAlso, although it is not strictly required, the usage of a\n[virtualenv](https:\u002F\u002Fvirtualenv.pypa.io\u002Fen\u002Flatest\u002F) is highly recommended in order to avoid\ninterfering with other software installed in the system where **ATM** is run.\n\nThese are the minimum commands needed to create a virtualenv using python3.6 for **ATM**:\n\n```bash\npip install virtualenv\nvirtualenv -p $(which python3.6) atm-venv\n```\n\nAfterwards, you have to execute this command to have the virtualenv activated:\n\n```bash\nsource atm-venv\u002Fbin\u002Factivate\n```\n\nRemember about executing it every time you start a new console to work on **ATM**!\n\n## Install with pip\n\nAfter creating the virtualenv and activating it, we recommend using\n[pip](https:\u002F\u002Fpip.pypa.io\u002Fen\u002Fstable\u002F) in order to install **ATM**:\n\n```bash\npip install atm\n```\n\nThis will pull and install the latest stable release from [PyPi](https:\u002F\u002Fpypi.org\u002F).\n\n## Install from source\n\nAlternatively, with your virtualenv activated, you can clone the repository and install it from\nsource by running `make install` on the `stable` branch:\n\n```bash\ngit clone git@github.com:HDI-Project\u002FATM.git\ncd ATM\ngit checkout stable\nmake install\n```\n\n## Install for Development\n\nIf you want to contribute to the project, a few more steps are required to make the project ready\nfor development.\n\nFirst, please head to [the GitHub page of the project](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM)\nand make a fork of 
the project under your own username by clicking on the **fork** button on the\nupper right corner of the page.\n\nAfterwards, clone your fork and create a branch from master with a descriptive name that includes\nthe number of the issue that you are going to work on:\n\n```bash\ngit clone git@github.com:{your username}\u002FATM.git\ncd ATM\ngit branch issue-xx-cool-new-feature master\ngit checkout issue-xx-cool-new-feature\n```\n\nFinally, install the project with the following command, which will install some additional\ndependencies for code linting and testing.\n\n```bash\nmake install-develop\n```\n\nMake sure to use them regularly while developing by running the commands `make lint` and `make test`.\n\n\n# Data Format\n\nATM input is always a CSV file with the following characteristics:\n\n* It uses a single comma, `,`, as the separator.\n* Its first row is a header that contains the names of the columns.\n* There is a column that contains the target variable that will need to be predicted.\n* The rest of the columns are all variables or features that will be used to predict the target column.\n* Each row corresponds to a single, complete, training sample.\n\nHere are the first 5 rows of a valid CSV with 4 features and one target column called `class` as an example:\n\n```\nfeature_01,feature_02,feature_03,feature_04,class\n5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n4.7,3.2,1.3,0.2,Iris-setosa\n4.6,3.1,1.5,0.2,Iris-setosa\n```\n\nThis CSV can be passed to ATM as a local filesystem path but also as a complete AWS S3 Bucket and\npath specification or as a URL.\n\nYou can find a collection of demo datasets in the [atm-data S3 Bucket in AWS](https:\u002F\u002Fatm-data.s3.amazonaws.com\u002Findex.html).\n\n\n# Quickstart\n\nIn this short tutorial we will guide you through a series of steps that will help you get\nstarted with **ATM** by exploring its Python API.\n\n## 1. 
Get the demo data\n\nThe first step in order to run **ATM** is to obtain the demo dataset that will be used during\nthe rest of the tutorial.\n\nFor this demo we will be using the pollution csv from the atm-data bucket, which you can download with your browser\n[from here](https:\u002F\u002Fatm-data.s3.amazonaws.com\u002Fpollution_1.csv), or using the following command:\n\n```bash\natm download_demo pollution_1.csv\n```\n\n## 2. Create an ATM instance\n\nThe first thing to do after obtaining the demo dataset is creating an ATM instance.\n\n```python\nfrom atm import ATM\n\natm = ATM()\n```\n\nBy default, if the ATM instance is created without any arguments, it will create an SQLite database\ncalled `atm.db` in your current working directory.\n\nIf you want to connect to a SQL database instead, or change the location of your SQLite database,\nplease check the [API Reference](https:\u002F\u002Fhdi-project.github.io\u002FATM\u002Fapi\u002Fatm.core.html)\nfor the complete list of available options.\n\n## 3. 
Search for the best model\n\nOnce you have the **ATM** instance ready, you can use the method `atm.run` to start\nsearching for the model that best predicts the target column of your CSV file.\n\nThis function has to be given the path to your CSV file, which can be a local filesystem path or a URL to\nan HTTP or S3 resource.\n\nFor example, if we have previously downloaded the [pollution_1.csv](https:\u002F\u002Fatm-data.s3.amazonaws.com\u002Fpollution_1.csv)\nfile inside our current working directory, we can call `run` like this:\n\n```python\nresults = atm.run(train_path='pollution_1.csv')\n```\n\nAlternatively, we can use the HTTPS URL of the file to have ATM download the CSV for us:\n\n```python\nresults = atm.run(train_path='https:\u002F\u002Fatm-data.s3.amazonaws.com\u002Fpollution_1.csv')\n```\n\nAs the last option, if we have the file inside an S3 Bucket, we can download it by passing a URI\nin the `s3:\u002F\u002F{bucket}\u002F{key}` format:\n\n```python\nresults = atm.run(train_path='s3:\u002F\u002Fatm-data\u002Fpollution_1.csv')\n```\n\nIn order to make this work with a Private S3 Bucket, please make sure that you have configured your\n[AWS credentials file](https:\u002F\u002Fdocs.aws.amazon.com\u002Fsdk-for-java\u002Fv1\u002Fdeveloper-guide\u002Fsetup-credentials.html),\nor that you have created your `ATM` instance passing it the `access_key` and `secret_key` arguments.\n\nThis `run` call will start what is called a `Datarun`, and a progress bar will be displayed\nwhile the different models are tested and tuned.\n\n```python\nProcessing dataset demos\u002Fpollution_1.csv\n100%|##########################| 100\u002F100 [00:10\u003C00:00,  6.09it\u002Fs]\n```\n\nOnce this process has ended, a message will be printed saying that the `Datarun` has ended. Then we can\nexplore the `results` object.\n\n## 4. Explore the results\n\nOnce the Datarun has finished, we can explore the `results` object in several ways:\n\n**a. 
Get a summary of the Datarun**\n\nThe `describe` method will return us a summary of the Datarun execution:\n\n```python\nresults.describe()\n```\n\nThis will print a short description of this Datarun similar to this:\n\n```python\nDatarun 1 summary:\n    Dataset: 'demos\u002Fpollution_1.csv'\n    Column Name: 'class'\n    Judgment Metric: 'f1'\n    Classifiers Tested: 100\n    Elapsed Time: 0:00:07.638668\n```\n\n**b. Get a summary of the best classifier**\n\nThe `get_best_classifier` method will print information about the best classifier that was found\nduring this Datarun, including the method used and the best hyperparameters found:\n\n```python\nresults.get_best_classifier()\n```\n\nThe output will be similar to this:\n\n```python\nClassifier id: 94\nClassifier type: knn\nParams chosen:\n    n_neighbors: 13\n    leaf_size: 38\n    weights: uniform\n    algorithm: kd_tree\n    metric: manhattan\n    _scale: True\nCross Validation Score: 0.858 +- 0.096\nTest Score: 0.714\n```\n\n**c. Explore the scores**\n\nThe `get_scores` method will return a `pandas.DataFrame` with information about all the\nclassifiers tested during the Datarun, including their cross validation scores and\nthe location of their pickled models.\n\n```python\nscores = results.get_scores()\n```\n\nThe contents of the scores dataframe should be similar to these:\n\n```python\n  cv_judgment_metric cv_judgment_metric_stdev  id test_judgment_metric  rank\n0       0.8584126984             0.0960095737  94         0.7142857143   1.0\n1       0.8222222222             0.0623609564  12         0.6250000000   2.0\n2       0.8147619048             0.1117618135  64         0.8750000000   3.0\n3       0.8139393939             0.0588721670  68         0.6086956522   4.0\n4       0.8067754468             0.0875180564  50         0.6250000000   5.0\n...\n```\n\n## 5. 
Make predictions\n\nOnce we have found and explored the best classifier, we will want to make predictions with it.\n\nIn order to do this, we need to follow several steps:\n\n**a. Export the best classifier**\n\nThe `export_best_classifier` method can be used to serialize and save the best classifier model\nusing pickle in the desired location:\n\n```python\nresults.export_best_classifier('path\u002Fto\u002Fmodel.pkl')\n```\n\nIf the classifier has been saved correctly, a message will be printed indicating so:\n\n```python\nClassifier 94 saved as path\u002Fto\u002Fmodel.pkl\n```\n\nIf the path that you provide already exists, you can overwrite it by adding the argument\n`force=True`.\n\n**b. Load the exported model**\n\nOnce it is exported you can load it back by calling the `load` method from the `atm.Model`\nclass and passing it the path where the model has been saved:\n\n```python\nfrom atm import Model\n\nmodel = Model.load('path\u002Fto\u002Fmodel.pkl')\n```\n\nOnce you have loaded your model, you can pass new data to its `predict` method to make\npredictions:\n\n```python\nimport pandas as pd\n\n# Read the demo CSV downloaded in step 1\ndata = pd.read_csv('pollution_1.csv')\n\npredictions = model.predict(data.head())\n```\n\n\n# What's next?\n\nFor more details about **ATM** and all its possibilities and features, please check the\n[documentation site](https:\u002F\u002FHDI-Project.github.io\u002FATM\u002F).\n\nThere you can learn more about its [Command Line Interface](https:\u002F\u002Fhdi-project.github.io\u002FATM\u002Fcli.html)\nand its [REST API](https:\u002F\u002Fhdi-project.github.io\u002FATM\u002Frest.html), as well as\n[how to contribute to ATM](https:\u002F\u002FHDI-Project.github.io\u002FATM\u002Fcommunity\u002Fcontributing.html)\nin order to help us develop new features or cool ideas.\n\n# Credits\n\nATM is an open source project from the Data to AI Lab at MIT which has been built and maintained\nover the years by the following team:\n\n* Bennett Cyphers 
\u003Cbcyphers@mit.edu>\n* Thomas Swearingen \u003Cswearin3@msu.edu>\n* Carles Sala \u003Ccsala@csail.mit.edu>\n* Plamen Valentinov \u003Cplamen@pythiac.com>\n* Kalyan Veeramachaneni \u003Ckalyan@mit.edu>\n* Micah Smith \u003Cmicahjsmith@gmail.com>\n* Laura Gustafson \u003Clgustaf@mit.edu>\n* Kiran Karra \u003Ckiran.karra@gmail.com>\n* Max Kanter \u003Ckmax12@gmail.com>\n* Alfredo Cuesta-Infante \u003Calfredo.cuesta@urjc.es>\n* Favio André Vázquez \u003Cfavio.vazquezp@gmail.com>\n* Matteo Hoch \u003Cminime@hochweb.com>\n\n\n## Citing ATM\n\nIf you use ATM, please consider citing the following paper:\n\nThomas Swearingen, Will Drevo, Bennett Cyphers, Alfredo Cuesta-Infante, Arun Ross, Kalyan Veeramachaneni. [ATM: A distributed, collaborative, scalable system for automated machine learning.](https:\u002F\u002Fcyphe.rs\u002Fstatic\u002Fatm.pdf) *IEEE BigData 2017*, 151-162\n\nBibTeX entry:\n\n```bibtex\n@inproceedings{DBLP:conf\u002Fbigdataconf\u002FSwearingenDCCRV17,\n  author    = {Thomas Swearingen and\n               Will Drevo and\n               Bennett Cyphers and\n               Alfredo Cuesta{-}Infante and\n               Arun Ross and\n               Kalyan Veeramachaneni},\n  title     = {{ATM:} {A} distributed, collaborative, scalable system for automated\n               machine learning},\n  booktitle = {2017 {IEEE} International Conference on Big Data, BigData 2017, Boston,\n               MA, USA, December 11-14, 2017},\n  pages     = {151--162},\n  year      = {2017},\n  crossref  = {DBLP:conf\u002Fbigdataconf\u002F2017},\n  url       = {https:\u002F\u002Fdoi.org\u002F10.1109\u002FBigData.2017.8257923},\n  doi       = {10.1109\u002FBigData.2017.8257923},\n  timestamp = {Tue, 23 Jan 2018 12:40:42 +0100},\n  biburl    = {https:\u002F\u002Fdblp.org\u002Frec\u002Fbib\u002Fconf\u002Fbigdataconf\u002FSwearingenDCCRV17},\n  bibsource = {dblp computer science bibliography, https:\u002F\u002Fdblp.org}\n}\n```\n\n## Related Projects\n\n### 
BTB\n\n[BTB](https:\u002F\u002Fgithub.com\u002Fhdi-project\u002Fbtb), for Bayesian Tuning and Bandits, is the core AutoML\nlibrary in development under the HDI project. BTB exposes several methods for hyperparameter\nselection and tuning through a common API. It allows domain experts to extend existing methods\nand add new ones easily. BTB is a central part of ATM, and the two projects were developed in\ntandem, but it is designed to be implementation-agnostic and should be useful for a wide range\nof hyperparameter selection tasks.\n\n### Featuretools\n\n[Featuretools](https:\u002F\u002Fgithub.com\u002Ffeaturetools\u002Ffeaturetools) is a python library for automated\nfeature engineering. It can be used to prepare raw transactional and relational datasets for ATM.\nIt is created and maintained by [Feature Labs](https:\u002F\u002Fwww.featurelabs.com) and is also a part\nof the [Human Data Interaction Project](https:\u002F\u002Fhdi-dai.lids.mit.edu\u002F).\n","\u003Cp align=\"left\">\n\u003Cimg width=15% src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHDI-Project_ATM_readme_30217158ad84.png\" alt=“ATM” \u002F>\n\u003Ci>麻省理工学院数据到人工智能实验室的开源项目。\u003C\u002Fi>\n\u003C\u002Fp>\n\n# ATM - 
自动调优模型\n\n[![开发状态](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDevelopment%20Status-2%20--%20Pre--Alpha-yellow)](https:\u002F\u002Fpypi.org\u002Fsearch\u002F?c=Development+Status+%3A%3A+2+-+Pre-Alpha)\n[![PyPi盾牌](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fatm.svg)](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002Fatm)\n[![Travis](https:\u002F\u002Ftravis-ci.org\u002FHDI-Project\u002FATM.svg?branch=master)](https:\u002F\u002Ftravis-ci.org\u002FHDI-Project\u002FATM)\n[![CircleCI](https:\u002F\u002Fcircleci.com\u002Fgh\u002FHDI-Project\u002FATM.svg?style=shield)](https:\u002F\u002Fcircleci.com\u002Fgh\u002FHDI-Project\u002FATM)\n[![覆盖率](https:\u002F\u002Fcodecov.io\u002Fgh\u002FHDI-project\u002FATM\u002Fbranch\u002Fmaster\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002FHDI-project\u002FATM)\n[![下载量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHDI-Project_ATM_readme_06ceb8478afc.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fatm)\n\n- 许可证：MIT\n- 开发状态：[预 Alpha](https:\u002F\u002Fpypi.org\u002Fsearch\u002F?c=Development+Status+%3A%3A+2+-+Pre-Alpha)\n- 文档：https:\u002F\u002FHDI-Project.github.io\u002FATM\u002F\n- 主页：https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\n\n# 概述\n\n自动调优模型（ATM）是一个以易用性为核心设计的AutoML系统。简而言之，您只需向ATM提供一个分类问题和一个CSV格式的数据集，ATM便会尝试构建出最佳模型。ATM基于同名论文[论文链接]，并且该项目是麻省理工学院[人类-数据交互（HDI）项目](https:\u002F\u002Fhdi-dai.lids.mit.edu\u002F)的一部分。\n\n\n# 安装\n\n## 要求\n\n**ATM** 已在 [Python 2.7、3.5 和 3.6](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F) 上开发并测试。\n\n此外，虽然并非严格要求，但强烈建议使用 [virtualenv](https:\u002F\u002Fvirtualenv.pypa.io\u002Fen\u002Flatest\u002F) 来避免与系统中已安装的其他软件发生冲突，从而确保 **ATM** 的正常运行。\n\n以下是使用 Python 3.6 创建用于 **ATM** 的 virtualenv 所需的最低命令：\n\n```bash\npip install virtualenv\nvirtualenv -p $(which python3.6) atm-venv\n```\n\n之后，您需要执行以下命令来激活 virtualenv：\n\n```bash\nsource atm-venv\u002Fbin\u002Factivate\n```\n\n请记住，每次启动新的终端窗口进行 **ATM** 相关工作时，都需要重新激活 virtualenv！\n\n## 使用 pip 安装\n\n创建并激活 virtualenv 
后，我们建议使用 [pip](https:\u002F\u002Fpip.pypa.io\u002Fen\u002Fstable\u002F) 来安装 **ATM**：\n\n```bash\npip install atm\n```\n\n这将从 [PyPi](https:\u002F\u002Fpypi.org\u002F) 下载并安装最新稳定版本。\n\n## 从源代码安装\n\n或者，在您的 virtualenv 已经激活的情况下，您可以克隆仓库并通过在 `stable` 分支上运行 `make install` 来从源代码安装：\n\n```bash\ngit clone git@github.com:HDI-Project\u002FATM.git\ncd ATM\ngit checkout stable\nmake install\n```\n\n## 开发环境安装\n\n如果您希望为项目做出贡献，则还需要额外几步来使项目进入开发状态。\n\n首先，请访问项目的 [GitHub 页面](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM)，并在页面右上角点击 **fork** 按钮，以您自己的用户名创建一个项目分支。\n\n随后，克隆您的分支，并基于 master 分支创建一个带有描述性名称的分支，该名称应包含您即将处理的问题编号：\n\n```bash\ngit clone git@github.com:{your username}\u002FATM.git\ncd ATM\ngit branch issue-xx-cool-new-feature master\ngit checkout issue-xx-cool-new-feature\n```\n\n最后，通过以下命令安装项目，这将同时安装一些用于代码检查和测试的附加依赖项：\n\n```bash\nmake install-develop\n```\n\n请务必在开发过程中定期运行 `make lint` 和 `make test` 命令。\n\n\n# 数据格式\n\nATM 的输入始终是一个符合以下特征的 CSV 文件：\n\n* 使用单个逗号 `,` 作为分隔符。\n* 第一行是包含列名的表头。\n* 必须有一列作为目标变量，用于预测。\n* 其余各列为用于预测目标列的特征或变量。\n* 每一行对应一个完整的训练样本。\n\n以下是一个包含 4 个特征和名为 `class` 的目标列的有效 CSV 文件的前 5 行示例：\n\n```\nfeature_01,feature_02,feature_03,feature_04,class\n5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n4.7,3.2,1.3,0.2,Iris-setosa\n4.6,3.1,1.5,0.2,Iris-setosa\n```\n\n此 CSV 文件可以作为本地文件路径传递给 ATM，也可以以完整的 AWS S3 存储桶及路径规范，或直接通过 URL 提供。\n\n您可以在 AWS 中的 [atm-data S3 存储桶](https:\u002F\u002Fatm-data.s3.amazonaws.com\u002Findex.html) 中找到一系列演示数据集。\n\n\n# 快速入门\n\n在本简短教程中，我们将引导您完成一系列步骤，帮助您通过探索其 Python API 来开始使用 **ATM**。\n\n## 1. 获取演示数据\n\n运行 **ATM** 的第一步是获取将在后续教程中使用的演示数据集。\n\n本次演示我们将使用 atm-data 存储桶中的污染数据 CSV 文件，您可以通过浏览器从[这里](https:\u002F\u002Fatm-data.s3.amazonaws.com\u002Fpollution_1.csv)下载，或使用以下命令：\n\n```bash\natm download_demo pollution_1.csv\n```\n\n## 2. 
创建 ATM 实例\n\n获取演示数据后，下一步就是创建一个 ATM 实例。\n\n```python\nfrom atm import ATM\n\natm = ATM()\n```\n\n默认情况下，如果 ATM 实例未指定任何参数，它将在当前工作目录下创建一个名为 `atm.db` 的 SQLite 数据库。\n\n如果您希望连接到 SQL 数据库，或更改 SQLite 数据库的位置，请参阅 [API 参考文档](https:\u002F\u002Fhdi-project.github.io\u002FATM\u002Fapi\u002Fatm.core.html)，以获取所有可用选项的完整列表。\n\n## 3. 搜索最佳模型\n\n一旦你准备好 **ATM** 实例，就可以使用 `atm.run` 方法开始搜索能够更好地预测 CSV 文件目标列的模型。\n\n此函数需要传入 CSV 文件的路径，该路径可以是本地文件系统路径，也可以是 HTTP 或 S3 资源的 URL。\n\n例如，如果我们之前已将 [pollution_1.csv](https:\u002F\u002Fatm-data.s3.amazonaws.com\u002Fpollution_1.csv) 文件下载到当前工作目录中，我们可以这样调用 `run`：\n\n```python\nresults = atm.run(train_path='pollution_1.csv')\n```\n\n或者，我们也可以使用文件的 HTTPS URL，让 ATM 自动为我们下载 CSV 文件：\n\n```python\nresults = atm.run(train_path='https:\u002F\u002Fatm-data.s3.amazonaws.com\u002Fpollution_1.csv')\n```\n\n最后一种方式是，如果文件存储在 S3 存储桶中，我们可以通过传递 `s3:\u002F\u002F{bucket}\u002F{key}` 格式的 URI 来下载它：\n\n```python\nresults = atm.run(train_path='s3:\u002F\u002Fatm-data\u002Fpollution_1.csv')\n```\n\n为了使这一操作适用于私有 S3 存储桶，请确保已配置好你的 AWS 凭证文件（[AWS 凭证文件配置指南](https:\u002F\u002Fdocs.aws.amazon.com\u002Fsdk-for-java\u002Fv1\u002Fdeveloper-guide\u002Fsetup-credentials.html)），或者在创建 `ATM` 实例时通过 `access_key` 和 `secret_key` 参数进行配置。\n\n调用 `run` 后，将启动一个称为“数据运行”的过程，并在测试和调优不同模型的过程中显示进度条。\n\n```python\nProcessing dataset demos\u002Fpollution_1.csv\n100%|##########################| 100\u002F100 [00:10\u003C00:00,  6.09it\u002Fs]\n```\n\n当此过程结束时，会打印一条消息，表明“数据运行”已完成。随后，我们可以探索 `results` 对象。\n\n## 4. 探索结果\n\n数据运行完成后，我们可以通过多种方式来探索 `results` 对象：\n\n**a. 获取数据运行的摘要**\n\n`describe` 方法会返回数据运行执行的摘要信息：\n\n```python\nresults.describe()\n```\n\n这将打印出类似如下的简短描述：\n\n```python\nDatarun 1 summary:\n    Dataset: 'demos\u002Fpollution_1.csv'\n    Column Name: 'class'\n    Judgment Metric: 'f1'\n    Classifiers Tested: 100\n    Elapsed Time: 0:00:07.638668\n```\n\n**b. 
获取最佳分类器的摘要**\n\n`get_best_classifier` 方法会打印出本次数据运行中找到的最佳分类器的相关信息，包括所使用的算法及最优超参数：\n\n```python\nresults.get_best_classifier()\n```\n\n输出内容可能如下所示：\n\n```python\nClassifier id: 94\nClassifier type: knn\nParams chosen:\n    n_neighbors: 13\n    leaf_size: 38\n    weights: uniform\n    algorithm: kd_tree\n    metric: manhattan\n    _scale: True\nCross Validation Score: 0.858 +- 0.096\nTest Score: 0.714\n```\n\n**c. 探索评分**\n\n`get_scores` 方法会返回一个包含数据运行期间所有测试分类器信息的 `pandas.DataFrame`，其中包括它们的交叉验证分数以及其序列化模型的保存位置。\n\n```python\nscores = results.get_scores()\n```\n\n评分 DataFrame 的内容可能类似于以下内容：\n\n```python\n  cv_judgment_metric cv_judgment_metric_stdev  id test_judgment_metric  rank\n0       0.8584126984             0.0960095737  94         0.7142857143   1.0\n1       0.8222222222             0.0623609564  12         0.6250000000   2.0\n2       0.8147619048             0.1117618135  64         0.8750000000   3.0\n3       0.8139393939             0.0588721670  68         0.6086956522   4.0\n4       0.8067754468             0.0875180564  50         0.6250000000   5.0\n...\n```\n\n## 5. 进行预测\n\n当我们找到并分析了最佳分类器后，接下来就需要使用它来进行预测。\n\n为此，我们需要遵循以下几个步骤：\n\n**a. 导出最佳分类器**\n\n`export_best_classifier` 方法可用于将最佳分类器模型序列化并以 pickle 格式保存到指定位置：\n\n```python\nresults.export_best_classifier('path\u002Fto\u002Fmodel.pkl')\n```\n\n如果分类器成功保存，系统将打印一条确认消息：\n\n```python\nClassifier 94 saved as path\u002Fto\u002Fmodel.pkl\n```\n\n如果你提供的路径已存在，可以通过添加 `force=True` 参数来覆盖原有文件。\n\n**b. 
加载导出的模型**\n\n导出完成后，你可以通过调用 `atm.Model` 类的 `load` 方法并传入模型保存路径将其重新加载：\n\n```python\nfrom atm import Model\n\nmodel = Model.load('path\u002Fto\u002Fmodel.pkl')\n```\n\n加载模型后，你可以将新数据传递给其 `predict` 方法来进行预测：\n\n```python\nimport pandas as pd\n\ndata = pd.read_csv(demo_datasets['pollution'])\n\npredictions = model.predict(data.head())\n```\n\n\n# 下一步是什么？\n\n如需了解更多关于 **ATM** 及其各种可能性和功能的信息，请访问其[文档网站](https:\u002F\u002FHDI-Project.github.io\u002FATM\u002F)。\n\n在那里，你可以进一步了解其[命令行界面](https:\u002F\u002Fhdi-project.github.io\u002FATM\u002Fcli.html)和[REST API](https:\u002F\u002Fhdi-project.github.io\u002FATM\u002Frest.html)，以及如何参与贡献（[ATM 社区贡献指南](https:\u002F\u002FHDI-Project.github.io\u002FATM\u002Fcommunity\u002Fcontributing.html)），帮助我们开发新功能或实现更多创意。\n\n# 致谢\n\nATM 是麻省理工学院数据到 AI 实验室的一个开源项目，多年来由以下团队构建和维护：\n\n* Bennett Cyphers \u003Cbcyphers@mit.edu>\n* Thomas Swearingen \u003Cswearin3@msu.edu>\n* Carles Sala \u003Ccsala@csail.mit.edu>\n* Plamen Valentinov \u003Cplamen@pythiac.com>\n* Kalyan Veeramachaneni \u003Ckalyan@mit.edu>\n* Micah Smith \u003Cmicahjsmith@gmail.com>\n* Laura Gustafson \u003Clgustaf@mit.edu>\n* Kiran Karra \u003Ckiran.karra@gmail.com>\n* Max Kanter \u003Ckmax12@gmail.com>\n* Alfredo Cuesta-Infante \u003Calfredo.cuesta@urjc.es>\n* Favio André Vázquez \u003Cfavio.vazquezp@gmail.com>\n* Matteo Hoch \u003Cminime@hochweb.com>\n\n## 引用 ATM\n\n如果您使用 ATM，请考虑引用以下论文：\n\nThomas Swearingen, Will Drevo, Bennett Cyphers, Alfredo Cuesta-Infante, Arun Ross, Kalyan Veeramachaneni. 
[ATM：一种分布式、协作式、可扩展的自动化机器学习系统。](https:\u002F\u002Fcyphe.rs\u002Fstatic\u002Fatm.pdf) *IEEE BigData 2017*, 151–162页\n\nBibTeX 条目：\n\n```bibtex\n@inproceedings{DBLP:conf\u002Fbigdataconf\u002FSwearingenDCCRV17,\n  author    = {Thomas Swearingen and\n               Will Drevo and\n               Bennett Cyphers and\n               Alfredo Cuesta{-}Infante and\n               Arun Ross and\n               Kalyan Veeramachaneni},\n  title     = {{ATM:} {A} distributed, collaborative, scalable system for automated\n               machine learning},\n  booktitle = {2017 {IEEE} International Conference on Big Data, BigData 2017, Boston,\n               MA, USA, December 11-14, 2017},\n  pages     = {151--162},\n  year      = {2017},\n  crossref  = {DBLP:conf\u002Fbigdataconf\u002F2017},\n  url       = {https:\u002F\u002Fdoi.org\u002F10.1109\u002FBigData.2017.8257923},\n  doi       = {10.1109\u002FBigData.2017.8257923},\n  timestamp = {Tue, 23 Jan 2018 12:40:42 +0100},\n  biburl    = {https:\u002F\u002Fdblp.org\u002Frec\u002Fbib\u002Fconf\u002Fbigdataconf\u002FSwearingenDCCRV17},\n  bibsource = {dblp computer science bibliography, https:\u002F\u002Fdblp.org}\n}\n```\n\n## 相关项目\n\n### BTB\n\n[BTB](https:\u002F\u002Fgithub.com\u002Fhdi-project\u002Fbtb)，即贝叶斯调优与多臂老虎机，是 HDI 项目下正在开发的核心 AutoML 库。BTB 通过一个通用的 API 暴露了多种超参数选择和调优方法。它允许领域专家轻松扩展现有方法并添加新方法。BTB 是 ATM 的核心组成部分，这两个项目是协同开发的，但它被设计为实现无关，因此应适用于广泛的超参数选择任务。\n\n### Featuretools\n\n[Featuretools](https:\u002F\u002Fgithub.com\u002Ffeaturetools\u002Ffeaturetools) 是一个用于自动化特征工程的 Python 库。它可以用来为 ATM 准备原始的事务性和关系型数据集。该库由 [Feature Labs](https:\u002F\u002Fwww.featurelabs.com) 创建并维护，同时也是 [人类数据交互项目](https:\u002F\u002Fhdi-dai.lids.mit.edu\u002F) 的一部分。","# ATM 快速上手指南\n\nATM (Auto Tune Models) 是由麻省理工学院 (MIT) Data to AI Lab 开发的开源 AutoML 系统。它旨在简化机器学习流程：用户只需提供分类问题和 CSV 格式的数据集，ATM 即可自动尝试构建最佳模型。\n\n## 环境准备\n\n**系统要求：**\n- Python 版本：2.7, 3.5 或 3.6（推荐 Python 3.6）\n- 操作系统：Linux, macOS 或 
Windows\n\n**前置依赖：**\n强烈建议使用虚拟环境（virtualenv）以避免与系统其他软件冲突。\n\n```bash\n# 安装 virtualenv\npip install virtualenv\n\n# 创建名为 atm-venv 的虚拟环境（以 python3.6 为例）\nvirtualenv -p $(which python3.6) atm-venv\n\n# 激活虚拟环境\nsource atm-venv\u002Fbin\u002Factivate\n```\n*注意：每次开启新的终端会话工作时，都需要重新运行激活命令。*\n\n## 安装步骤\n\n你可以通过 PyPI 直接安装稳定版，或者从源码安装。\n\n### 方式一：使用 pip 安装（推荐）\n\n```bash\npip install atm\n```\n\n### 方式二：从源码安装\n\n```bash\ngit clone git@github.com:HDI-Project\u002FATM.git\ncd ATM\ngit checkout stable\nmake install\n```\n\n## 基本使用\n\n以下是最简单的使用流程，演示如何加载数据、自动搜索最佳模型并查看结果。\n\n### 1. 获取示例数据\n\n首先下载官方提供的演示数据集（污染预测数据）：\n\n```bash\natm download_demo pollution_1.csv\n```\n\n### 2. 运行自动建模\n\n创建一个 Python 脚本（例如 `run_atm.py`），输入以下代码：\n\n```python\nfrom atm import ATM\n\n# 初始化 ATM 实例（默认会在当前目录创建 atm.db SQLite 数据库）\natm = ATM()\n\n# 开始搜索最佳模型\n# train_path 可以是本地文件路径、HTTP URL 或 S3 路径\nresults = atm.run(train_path='pollution_1.csv')\n```\n\n运行脚本后，ATM 将启动一个 `Datarun` 任务，自动测试并调整多种分类器，终端会显示进度条：\n\n```text\nProcessing dataset demos\u002Fpollution_1.csv\n100%|##########################| 100\u002F100 [00:10\u003C00:00,  6.09it\u002Fs]\n```\n\n### 3. 查看结果\n\n任务完成后，可以通过 `results` 对象探索最佳模型信息：\n\n```python\n# 查看运行摘要\nresults.describe()\n\n# 获取最佳分类器的详细信息（算法类型、超参数、得分等）\nresults.get_best_classifier()\n\n# 获取所有测试模型的评分 DataFrame\nscores = results.get_scores()\nprint(scores.head())\n```\n\n**输出示例：**\n```text\nClassifier id: 94\nClassifier type: knn\nParams chosen:\n    n_neighbors: 13\n    leaf_size: 38\n    ...\nCross Validation Score: 0.858 +- 0.096\nTest Score: 0.714\n```\n\n### 4. 
导出模型并进行预测\n\n找到最佳模型后，将其保存并在新数据上进行预测：\n\n```python\nimport pandas as pd\nfrom atm import Model\n\n# 导出最佳模型到本地文件\nresults.export_best_classifier('best_model.pkl')\n\n# 加载已保存的模型\nmodel = Model.load('best_model.pkl')\n\n# 读取新数据进行预测\ndata = pd.read_csv('pollution_1.csv')\npredictions = model.predict(data.head())\n\nprint(predictions)\n```\n\n---\n*更多高级功能（如命令行接口 CLI、REST API 及自定义配置）请参考官方文档：https:\u002F\u002FHDI-Project.github.io\u002FATM\u002F*","某电商数据团队需要快速构建多个商品分类预测模型，以支持不同业务线的个性化推荐需求。\n\n### 没有 ATM 时\n- 数据科学家需手动为每个业务线的数据集尝试多种算法（如随机森林、SVM、神经网络），耗时数天才能完成初步筛选。\n- 超参数调优依赖人工经验和网格搜索，计算资源浪费严重且容易陷入局部最优解。\n- 多租户环境下，不同项目组的数据和模型配置混杂，缺乏统一管理系统，导致复现困难和协作效率低下。\n- 每次新增数据集都要重新编写完整的建模流水线代码，开发重复劳动占比超过 60%。\n\n### 使用 ATM 后\n- 只需上传 CSV 文件并指定目标列，ATM 自动并行测试数十种模型组合，将模型筛选时间从数天缩短至数小时。\n- 内置智能搜索策略自动优化超参数，在同等算力下找到精度更高的模型配置，显著提升预测准确率。\n- 通过多租户架构隔离不同业务线的数据与实验记录，团队成员可清晰追踪每个模型的来源与性能指标。\n- 标准化输入格式让新数据集接入无需重写代码，数据分析师仅需关注业务逻辑而非工程实现。\n\nATM 将繁琐的模型选择与调优过程自动化，让团队能专注于高价值的业务洞察而非底层技术细节。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHDI-Project_ATM_06faec00.png","HDI-Project","MIT - The Human Data Interaction Project","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FHDI-Project_e0beabd4.jpg","This is a research project within the Data to AI Lab at MIT. 
",null,"dai-lab@mit.edu","https:\u002F\u002Fhdi-dai.lids.mit.edu\u002F","https:\u002F\u002Fgithub.com\u002FHDI-Project",[81,85],{"name":82,"color":83,"percentage":84},"Python","#3572A5",96.9,{"name":86,"color":87,"percentage":88},"Makefile","#427819",3.1,528,140,"2026-03-28T05:54:34","MIT","","未说明",{"notes":96,"python":97,"dependencies":98},"该项目处于预发布（Pre-Alpha）阶段。强烈建议使用 virtualenv 创建虚拟环境以避免冲突。支持从本地文件系统、HTTP URL 或 AWS S3 读取 CSV 数据。默认使用 SQLite 数据库存储运行结果。","2.7, 3.5, 3.6",[99,100,101],"virtualenv (推荐)","pandas","scikit-learn (隐含依赖，基于分类器功能)",[103,14,16],"其他",[105,106,107,108,109],"machine-learning","data-science","hyperparameter-optimization","distributed-computing","automl","2026-03-27T02:49:30.150509","2026-04-18T09:19:26.090602",[113,118,123,128,133,137],{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},39141,"运行 enter_data.py 时出现 'ImportError: No module named atm' 错误怎么办？","这通常是由于环境配置或安装路径问题导致的。请确保您已正确设置虚拟环境，并且是在项目根目录下运行脚本。如果问题仍然存在，建议提供以下详细信息以便排查：操作系统版本、Python 版本、虚拟环境设置情况、具体的安装命令以及触发错误的完整命令序列。在某些情况下，切换到稳定的操作系统环境（如从 Kali 切换到 CentOS）也可能解决问题。","https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F50",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},39142,"运行 worker.py 进行模型调优时出现 'ValueError: Invalid param type None' 错误如何解决？","该错误与 BTB (Bayesian Tuning and Bandits) 库的版本兼容性有关，特别是在反序列化超参数时。此问题已在 ATM v0.1.1 版本中修复，请将项目升级到最新版本。如果暂时无法升级，可以尝试将 BTB 库降级到 0.1.2 版本（即运行 `pip install baytune==0.1.2`），这通常能暂时解决该兼容性问题。","https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F119",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},39143,"运行脚本时提示找不到配置文件 'log-script.yaml' (FileNotFoundError) 怎么办？","这通常是因为通过 pip 安装的 egg 包中缺少了必要的模板文件。建议不要直接使用 pip 安装的版本，而是从 GitHub 克隆源代码并在本地以开发模式安装。具体步骤如下：\n1. 克隆仓库：`git clone git@github.com:HDI-Project\u002FATM.git`\n2. 进入目录：`cd atm`\n3. 创建并激活虚拟环境。\n4. 
以可编辑模式安装：`pip install -e .`\n这样可以确保所有配置文件和模板都被正确链接到项目中。","https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F91",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},39144,"ATM 是否支持 REST API？如何使用它上传数据？","是的，ATM 已经包含了只读版本的 REST API。您可以运行 `python scripts\u002Frest_api_server.py` 启动服务。主要功能包括：\n1. 上传 CSV 数据：使用 curl 命令 `curl localhost:5000\u002Fenter_data -F file=@\u002Fpath\u002Ffile.csv`，成功会返回 `{\"success\": true}`。\n2. 查看数据集信息：访问 `localhost:5000\u002Fdatasets\u002F\u003CID>` 即可获取数据集的详细信息（如类别数、特征数等）。\n注意：目前上传文件时若文件名重复可能会覆盖原有文件，且默认存储在本地 `atm\u002Fdata` 目录，尚未集成 AWS S3 等云存储。","https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F82",{"id":134,"question_zh":135,"answer_zh":136,"source_url":122},39145,"如何在非官方支持的操作系统（如 macOS）上运行 ATM？","虽然 macOS 可能不是官方首选的支持平台，但用户报告在 macOS (如 10.13.6) 上可以成功运行 ATM。关键在于确保 Python 环境（推荐 Python 3.6+）和依赖库（如 scikit-learn, pandas, baytune 等）的版本与项目要求严格匹配。如果在运行过程中遇到特定于系统的错误，请检查是否是由于路径分隔符或特定系统库缺失引起，并尝试在虚拟环境中重新安装依赖。",{"id":138,"question_zh":139,"answer_zh":140,"source_url":117},39146,"遇到报错但不确定是否是已知问题时，应该如何正确地提交新 Issue？","如果您的错误信息与现有关闭的 Issue 不完全一致，或者现有方案无效，请开启一个新的 Issue 以获得更准确的帮助。在新 Issue 中，请务必包含以下环境细节：\n1. 操作系统 (Operating System)\n2. Python 版本\n3. 虚拟环境设置详情 (virtualenv setup)\n4. 完整的安装命令\n5. 
导致错误的具体执行命令及完整的报错堆栈信息 (Traceback)。\n这将帮助维护者快速复现并定位问题。",[142,147,152,157,162,167],{"id":143,"version":144,"summary_zh":145,"released_at":146},315083,"v0.2.2","### 新功能\n\n* 管理依赖项 - [问题 #152](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F152) 由 @csala 提出\n* POST 请求被 CORS 策略阻止 - [问题 #151](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F151) 由 @pvk-developer 提出\n","2019-07-30T09:28:26",{"id":148,"version":149,"summary_zh":150,"released_at":151},315084,"v0.2.1","### 新功能\n\n* Rest API 跨域资源共享 (CORS) - [问题 #146](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F146) 由 @pvk-developer 提出","2019-07-26T16:30:03",{"id":153,"version":154,"summary_zh":155,"released_at":156},315085,"v0.2.0","新的 Python API\n\n### 新特性\n\n* 在 Python 中使用 ATM 的新 API - [问题 #142](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F142)，由 @pvk-developer 和 @csala 提出\n* 文档改进 - [问题 #142](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F142)，由 @pvk-developer 和 @csala 提出\n* 代码清理 - [问题 #102](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F102)，由 @csala 提出\n* 确保数据集可以从 S3 下载 - [问题 #137](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F137)，由 @pvk-developer 提出\n* 更改为使用 PyMySQL，以移除 libmysqlclient-dev 的系统依赖 - [问题 #136](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F136)，由 @pvk-developer 和 @csala 提出","2019-05-29T15:29:03",{"id":158,"version":159,"summary_zh":160,"released_at":161},315086,"v0.1.2","REST API 与集群管理。\n\n### 新特性\n\n* REST API 服务器 - 由 @RogerTangos、@pvk-developer 和 @csala 解决的 [#82](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F82) 及 [#132](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F132) 问题\n* 添加集群管理命令，用于以后台进程方式启动和停止服务器及多个工作进程 - 由 @pvk-developer 和 @csala 解决的 [#130](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F130) 问题\n* 添加 TravisCI 并将文档迁移至 
GitHub Pages - 由 @pvk-developer 解决的 [#129](https:\u002F\u002Fgithub.com\u002FHDI-Project\u002FATM\u002Fissues\u002F129) 问题","2019-05-07T18:38:53",{"id":163,"version":164,"summary_zh":165,"released_at":166},315087,"v0.1.1","首次发布于 PyPI。\n\n### 新特性\n\n* 升级至最新版本的 BTB。\n* 新增命令行界面。","2019-04-02T11:31:42",{"id":168,"version":169,"summary_zh":170,"released_at":171},315088,"v0.1.0","首个稳定版本。","2019-02-22T22:15:03"]