[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-duxuhao--Feature-Selection":3,"tool-duxuhao--Feature-Selection":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",149489,2,"2026-04-10T11:32:46",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":78,"owner_email":79,"owner_twitter":80,"owner_website":81,"owner_url":82,"languages":83,"stars":92,"forks":93,"last_commit_at":94,"license":95,"difficulty_score":96,"env_os":97,"env_gpu":97,"env_ram":97,"env_deps":98,"category_tags":106,"github_topics":108,"view_count":32,"oss_zip_url":80,"oss_zip_packed_at":80,"status":17,"created_at":116,"updated_at":117,"faqs":118,"releases":153},6394,"duxuhao\u002FFeature-Selection","Feature-Selection","Features selector based on the self selected-algorithm, loss function and validation method","Feature-Selection 是一款专为机器学习任务设计的开源特征选择工具，旨在帮助开发者从海量数据中高效筛选出最具价值的特征组合。在建模过程中，冗余或无关的特征往往会降低模型精度并增加计算成本，而该工具通过自动化流程解决了这一痛点，让用户能专注于核心算法优化。\n\n它特别适合数据科学家、算法工程师及科研研究人员使用，尤其是那些需要在 Kaggle、IJCAI 等数据竞赛中快速迭代模型的团队。Feature-Selection 的核心亮点在于其高度的灵活性与多样性：它不仅支持基于贪心算法的序列搜索，还能依据特征重要性或相关系数自动剔除劣质特征。用户可自由指定评估指标（如 AUC、LogLoss）和验证方法，并兼容逻辑回归、XGBoost 
等多种主流算法。\n\n值得一提的是，该工具的策略曾在多个知名数据竞赛中斩获佳绩，包括融360 赛季冠军及 IJCAI 2018 十二强，证明了其在实战中的卓越表现。此外，它还提供了日志读取与交叉项填充等实用辅助功能，方便用户回溯历史实验结果或处理复杂特征工程。通过简洁的 API 接口，Feature-Selection 让复杂的特征筛选过程变得直观易用，是提升模型性能的得力助手。","Feature-Selection 是一款专为机器学习任务设计的开源特征选择工具，旨在帮助开发者从海量数据中高效筛选出最具价值的特征组合。在建模过程中，冗余或无关的特征往往会降低模型精度并增加计算成本，而该工具通过自动化流程解决了这一痛点，让用户能专注于核心算法优化。\n\n它特别适合数据科学家、算法工程师及科研研究人员使用，尤其是那些需要在 Kaggle、IJCAI 等数据竞赛中快速迭代模型的团队。Feature-Selection 的核心亮点在于其高度的灵活性与多样性：它不仅支持基于贪心算法的序列搜索，还能依据特征重要性或相关系数自动剔除劣质特征。用户可自由指定评估指标（如 AUC、LogLoss）和验证方法，并兼容逻辑回归、XGBoost 等多种主流算法。\n\n值得一提的是，该工具的策略曾在多个知名数据竞赛中斩获佳绩，包括融360 赛季冠军及 IJCAI 2018 十二强，证明了其在实战中的卓越表现。此外，它还提供了日志读取与交叉项填充等实用辅助功能，方便用户回溯历史实验结果或处理复杂特征工程。通过简洁的 API 接口，Feature-Selection 让复杂的特征筛选过程变得直观易用，是提升模型性能的得力助手。","# MLFeatureSelection\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT)\n[![PyPI version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002FMLFeatureSelection.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002FMLFeatureSelection\u002F)\n\nGeneral feature selection based on a chosen machine learning algorithm and evaluation method\n\n**Diverse, Flexible and Easy to Use**\n\nMore feature selection methods will be included in the future!\n\n## Quick Installation\n\n```bash\npip3 install MLFeatureSelection\n```\n\n## Modules in version 0.0.9.5.1\n\n- Module for selecting features based on a greedy algorithm (from MLFeatureSelection import sequence_selection)\n\n- Module for removing features based on feature importance (from MLFeatureSelection import importance_selection)\n\n- Module for removing features based on correlation coefficient (from MLFeatureSelection import coherence_selection)\n\n- Module for reading the feature combination from a log file (from MLFeatureSelection.tools import readlog)\n\n## This feature selection method achieved\n\n- **1st** in Rong360\n\n-- https:\u002F\u002Fgithub.com\u002Fduxuhao\u002Frong360-season2\n\n- **6th** in JData-2018\n\n-- 
https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FJData-2018\n\n- **12th** in IJCAI-2018 1st round\n\n-- https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FIJCAI-2018-2\n\n## Module Usage\n\n[Example](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fblob\u002Fmaster\u002FDemo.ipynb)\n\n- sequence_selection\n\n```python\nfrom MLFeatureSelection import sequence_selection\nfrom sklearn.linear_model import LogisticRegression\n\nsf = sequence_selection.Select(Sequence = True, Random = True, Cross = False)\nsf.ImportDF(df, label = 'Label') #import dataframe and label\nsf.ImportLossFunction(lossfunction, direction = 'ascend') #import loss function handle and optimization direction, 'ascend' for AUC, ACC, 'descend' for logloss etc.\nsf.InitialNonTrainableFeatures(notusable) #features that are not trainable in the dataframe, e.g. user_id, strings\nsf.InitialFeatures(initialfeatures) #set the initial features as a list\nsf.GenerateCol() #generate candidate features for selection\nsf.SetFeatureEachRound(50, False) #set the number of features per round, and how they are drawn from all features (True: random sampling, False: chunk by chunk)\nsf.clf = LogisticRegression() #set the selected algorithm, can be any algorithm\nsf.SetLogFile('record.log') #log file\nsf.run(validate) #run with the validation function handle; returns the best feature combination\n```\n\n- importance_selection\n\n```python\nfrom MLFeatureSelection import importance_selection\nimport xgboost as xgb\n\nsf = importance_selection.Select()\nsf.ImportDF(df, label = 'Label') #import dataframe and label\nsf.ImportLossFunction(lossfunction, direction = 'ascend') #import loss function and optimization direction\nsf.InitialFeatures() #initialize features\nsf.SelectRemoveMode(batch = 2) #remove 2 features per batch\nsf.clf = xgb.XGBClassifier()\nsf.SetLogFile('record.log') #log file\nsf.run(validate) #run with the validation function; returns the best feature combination\n```\n\n- 
coherence_selection\n\n```python\nfrom MLFeatureSelection import coherence_selection\nimport xgboost as xgb\n\nsf = coherence_selection.Select()\nsf.ImportDF(df, label = 'Label') #import dataframe and label\nsf.ImportLossFunction(lossfunction, direction = 'ascend') #import loss function and optimization direction\nsf.InitialFeatures() #initialize features\nsf.SelectRemoveMode(batch = 2) #remove 2 features per batch\nsf.clf = xgb.XGBClassifier()\nsf.SetLogFile('record.log') #log file\nsf.run(validate) #run with the validation function; returns the best feature combination\n```\n\n- tools.readlog: read previously selected features from the log\n\n```python\nfrom MLFeatureSelection.tools import readlog\n\nlogfile = 'record.log'\nlogscore = 0.5 #any score present in the logfile\nfeatures_combination = readlog(logfile, logscore)\n```\n\n- tools.filldf: complete the dataset when there are cross-term features\n\n```python\nimport pandas as pd\n\nfrom MLFeatureSelection.tools import readlog, filldf\n\ndef add(x, y):\n    return x + y\n\ndef subtract(x, y):\n    return x - y\n\ndef times(x, y):\n    return x * y\n\ndef divide(x, y):\n    return x \u002F y\n\nCrossMethod = {'+': add,\n               '-': subtract,\n               '*': times,\n               '\u002F': divide,\n               } # set your own cross methods\n\ndf = pd.read_csv('XXX')\nlogfile = 'record.log'\nlogscore = 0.5 #any score in the logfile\nfeatures_combination = readlog(logfile, logscore)\ndf = filldf(df, features_combination, CrossMethod)\n```\n\n- format of validate and lossfunction\n\ndefine your own:\n\n**validate**: validation method as a function, e.g. k-fold, last-time-slice validation, random sampling validation, etc.\n\n**lossfunction**: model performance evaluation method, e.g. logloss, AUC, accuracy, etc.\n\n```python\ndef validate(X, y, features, clf, lossfunction):\n    \"\"\"define your own validation function with 5 parameters:\n    input as X, y, features, clf, lossfunction\n    clf is set by SetClassifier()\n    lossfunction is imported 
earlier\n    features will be generated automatically\n    the function returns the score and the trained classifier\n    \"\"\"\n    clf.fit(X[features], y)\n    y_pred = clf.predict(X[features])\n    score = lossfunction(y_pred, y)\n    return score, clf\n\ndef lossfunction(y_pred, y_test):\n    \"\"\"define your own loss function with y_pred and y_test\n    return score\n    \"\"\"\n    return np.mean(y_pred == y_test)\n```\n\n## Multiprocessing\n\nMultiprocessing can be set up inside the validate function when you are doing N-fold validation.\n\n## DEMO\n\nMore examples are available in the example folder, including:\n\n- A demo containing all modules ([demo](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fblob\u002Fmaster\u002FDemo.ipynb))\n\n- A simple Titanic example with 5-fold validation, evaluated by accuracy ([demo](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Ftree\u002Fmaster\u002Fexample\u002Ftitanic))\n\n- A demo for the S1, S2 score improvement in the JData 2018 purchase-time prediction competition ([demo](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Ftree\u002Fmaster\u002Fexample\u002FJData2018))\n\n- A demo for IJCAI 2018 CTR prediction ([demo](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Ftree\u002Fmaster\u002Fexample\u002FIJCAI-2018))\n\n## Function Parameters\n\n[Parameters](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fblob\u002Fmaster\u002FMLFeatureSelection)\n\n## Algorithm details\n\n[Details](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fblob\u002Fmaster\u002FAlgorithms_Graphs)\n","# MLFeatureSelection\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT)\n[![PyPI 
version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002FMLFeatureSelection.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002FMLFeatureSelection\u002F)\n\n基于特定机器学习算法和评估方法的通用特征选择工具。\n\n**多样、灵活且易于使用**\n\n未来还将加入更多特征选择方法！\n\n## 快速安装\n\n```bash\npip3 install MLFeatureSelection\n```\n\n## 0.0.9.5.1 版本模块\n\n- 基于贪心算法的特征选择模块（from MLFeatureSelection import sequence_selection）\n- 基于特征重要性去除特征的模块（from MLFeatureSelection import importance_selection）\n- 基于相关系数去除特征的模块（from MLFeatureSelection import coherence_selection）\n- 从日志文件中读取特征组合的模块（from MLFeatureSelection.tools import readlog）\n\n## 该特征选择方法取得的成绩\n\n- 融360 第一名\n\n-- https:\u002F\u002Fgithub.com\u002Fduxuhao\u002Frong360-season2\n\n- JData-2018 第六名\n\n-- https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FJData-2018\n\n- IJCAI-2018 初赛第十二名\n\n-- https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FIJCAI-2018-2\n\n## 模块使用示例\n\n[示例](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fblob\u002Fmaster\u002FDemo.ipynb)\n\n### sequence_selection\n\n```python\nfrom MLFeatureSelection import sequence_selection\nfrom sklearn.linear_model import LogisticRegression\n\nsf = sequence_selection.Select(Sequence = True, Random = True, Cross = False)\nsf.ImportDF(df, label = 'Label') #导入数据框及标签\nsf.ImportLossFunction(lossfunction, direction = 'ascend') #导入损失函数句柄及优化方向，'ascend' 用于 AUC、ACC，'descend' 用于 logloss 等\nsf.InitialNonTrainableFeatures(notusable) #数据框中不可训练的特征，如 user_id、字符串等\nsf.InitialFeatures(initialfeatures) #初始化初始特征列表\nsf.GenerateCol() #生成待选择的特征\nsf.SetFeatureEachRound(50, False) #设置每轮选择的特征数量，并指定从所有特征中如何选取（True：随机抽样，False：分块选取）\nsf.clf = LogisticRegression() #设置所选算法，可为任意算法\nsf.SetLogFile('record.log') #设置日志文件\nsf.run(validate) #运行并传入验证函数，validate 为验证函数句柄，返回最佳特征组合\n```\n\n### importance_selection\n\n```python\nfrom MLFeatureSelection import importance_selection\nimport xgboost as xgb\n\nsf = importance_selection.Select()\nsf.ImportDF(df, label = 'Label') #导入数据框及标签\nsf.ImportLossFunction(lossfunction, direction = 'ascend') 
#导入损失函数及优化方向\nsf.InitialFeatures() #初始化输入特征\nsf.SelectRemoveMode(batch = 2) #设置每批移除 2 个特征\nsf.clf = xgb.XGBClassifier()\nsf.SetLogFile('record.log') #设置日志文件\nsf.run(validate) #运行并传入验证函数，返回最佳特征组合\n```\n\n### coherence_selection\n\n```python\nfrom MLFeatureSelection import coherence_selection\nimport xgboost as xgb\n\nsf = coherence_selection.Select()\nsf.ImportDF(df, label = 'Label') #导入数据框及标签\nsf.ImportLossFunction(lossfunction, direction = 'ascend') #导入损失函数及优化方向\nsf.InitialFeatures() #初始化输入特征\nsf.SelectRemoveMode(batch = 2) #设置每批移除 2 个特征\nsf.clf = xgb.XGBClassifier()\nsf.SetLogFile('record.log') #设置日志文件\nsf.run(validate) #运行并传入验证函数，返回最佳特征组合\n```\n\n### tools.readlog：从日志中读取先前选择的特征\n\n```python\nfrom MLFeatureSelection.tools import readlog\n\nlogfile = 'record.log'\nlogscore = 0.5 #日志中的任意分数\nfeatures_combination = readlog(logfile, logscore)\n```\n\n### tools.filldf：当存在交叉项特征时补全数据集\n\n```python\nimport pandas as pd\n\nfrom MLFeatureSelection.tools import readlog, filldf\n\ndef add(x, y):\n    return x + y\n\ndef subtract(x, y):\n    return x - y\n\ndef times(x, y):\n    return x * y\n\ndef divide(x, y):\n    return x \u002F y\n\nCrossMethod = {'+': add,\n               '-': subtract,\n               '*': times,\n               '\u002F': divide,\n               } # 设置您自己的交叉方法\n\ndf = pd.read_csv('XXX')\nlogfile = 'record.log'\nlogscore = 0.5 #日志中的任意分数\nfeatures_combination = readlog(logfile, logscore)\ndf = filldf(df, features_combination, CrossMethod)\n```\n\n### validate 和 lossfunction 的格式\n\n您可以自定义：\n\n**validate**：验证方法函数，例如 k 折交叉验证、最后时间切片验证、随机采样验证等。\n\n**lossfunction**：模型性能评估方法，例如 logloss、AUC、准确率等。\n\n```python\ndef validate(X, y, features, clf, lossfunction):\n    \"\"\"自定义验证函数，包含 5 个参数：\n    输入为 X、y、features、clf、lossfunction。\n    clf 由 SetClassifier() 设置，\n    lossfunction 在前期导入，\n    features 将自动生成。\n    函数返回得分和训练好的分类器。\n    \"\"\"\n    clf.fit(X[features], y)\n    y_pred = clf.predict(X[features])\n    score = lossfunction(y_pred, y)\n    return score, clf\n\ndef 
lossfunction(y_pred, y_test):\n    \"\"\"自定义损失函数，接收 y_pred 和 y_test，\n    返回得分。\n    \"\"\"\n    return np.mean(y_pred == y_test)\n```\n\n## 多进程处理\n\n在进行 N 折交叉验证时，可在 validate 函数中设置多进程处理。\n\n## 演示\n\n示例文件夹中添加了更多示例，包括：\n\n- 包含所有模块的演示（[demo](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fblob\u002Fmaster\u002FDemo.ipynb)）\n- 使用 5 折交叉验证并以准确率评估的简单泰坦尼克号案例（[demo](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Ftree\u002Fmaster\u002Fexample\u002Ftitanic)）\n- 用于 JData 2018 预测购买时间竞赛中 S1、S2 分数提升的演示（[demo](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Ftree\u002Fmaster\u002Fexample\u002FJData2018)）\n- 用于 IJCAI 2018 CTR 预测的演示（[demo](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Ftree\u002Fmaster\u002Fexample\u002FIJCAI-2018)）\n\n## 函数参数\n\n[参数](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fblob\u002Fmaster\u002FMLFeatureSelection)\n\n## 算法详情\n\n[详情](https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fblob\u002Fmaster\u002FAlgorithms_Graphs)","# MLFeatureSelection 快速上手指南\n\nMLFeatureSelection 是一个基于机器学习算法和评估方法的通用特征选择工具库，具有多样性、灵活性和易用性。该库曾帮助作者在多个数据竞赛（如融 360、JData-2018、IJCAI-2018）中取得优异成绩。\n\n## 环境准备\n\n- **系统要求**：支持 Windows、Linux、macOS\n- **Python 版本**：建议 Python 3.6 及以上\n- **前置依赖**：\n  - `pandas`\n  - `scikit-learn`\n  - `numpy`\n  - 可选：`xgboost`、`lightgbm` 等模型库（根据实际使用的算法安装）\n\n确保已安装基础数据科学库，例如：\n```bash\npip3 install pandas scikit-learn numpy\n```\n\n## 安装步骤\n\n通过 PyPI 直接安装最新稳定版：\n\n```bash\npip3 install MLFeatureSelection\n```\n\n> 💡 **国内加速建议**：如遇下载缓慢，可使用清华或阿里云镜像源：\n> ```bash\n> pip3 install MLFeatureSelection -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 基本使用\n\n以下以**基于特征重要性的选择方法**（`importance_selection`）为例，展示最简使用流程。假设你已有一个包含特征和目标列的 DataFrame `df`，目标列为 `'Label'`。\n\n### 示例代码\n\n```python\nfrom MLFeatureSelection import importance_selection\nimport xgboost as xgb\nimport pandas as pd\nimport numpy as np\n\n# 自定义验证函数（5 
折交叉验证示例）\ndef validate(X, y, features, clf, lossfunction):\n    from sklearn.model_selection import cross_val_score\n    score = cross_val_score(clf, X[features], y, cv=5, scoring='accuracy').mean()\n    clf.fit(X[features], y)  # 在全量训练集上重新训练\n    return score, clf\n\n# 自定义损失\u002F评估函数（此处用准确率）\ndef lossfunction(y_pred, y_test):\n    return np.mean(y_pred == y_test)\n\n# 初始化选择器\nsf = importance_selection.Select()\n\n# 导入数据\nsf.ImportDF(df, label='Label')\n\n# 设置评估方向（'ascend' 表示分数越高越好，如 AUC、Accuracy；'descend' 用于 logloss 等）\nsf.ImportLossFunction(lossfunction, direction='ascend')\n\n# 初始化特征列表（若不传则默认使用所有数值型特征）\nsf.InitialFeatures()\n\n# 设置每轮移除特征的数量\nsf.SelectRemoveMode(batch=2)\n\n# 指定基学习器\nsf.clf = xgb.XGBClassifier()\n\n# 设置日志文件（记录每轮特征组合与得分）\nsf.SetLogFile('record.log')\n\n# 运行特征选择（按 README 说明，返回最佳特征组合）\nbest_features = sf.run(validate)\n\nprint(\"最佳特征组合:\", best_features)\n```\n\n### 说明\n\n- `validate` 函数需接收 5 个参数：`X`, `y`, `features`, `clf`, `lossfunction`，返回 `(score, trained_clf)`。\n- `lossfunction` 接收预测值和真实值，返回标量评分。\n- 日志文件 `record.log` 可用于后续通过 `tools.readlog` 读取历史最优特征组合。\n\n更多用法（如贪心序列选择 `sequence_selection`、相关性剔除 `coherence_selection`）请参考官方 Demo Notebook。","某金融风控团队正在构建信用卡欺诈检测模型，面对数千个原始交易特征，急需筛选出最具预测力的变量组合以提升模型准确率。\n\n### 没有 Feature-Selection 时\n- **人工筛选效率低下**：数据科学家需手动计算相关性或依赖单一算法的重要性评分，耗时数天且难以覆盖所有特征组合。\n- **模型过拟合风险高**：保留了大量冗余或噪声特征（如用户 ID、无关字符串），导致模型在训练集表现好但泛化能力差。\n- **缺乏系统化验证**：无法自动结合特定损失函数（如 AUC 或 LogLoss）和交叉验证来评估不同特征子集的实际效果。\n- **试错成本高昂**：调整特征组合需反复修改代码并重新训练，难以记录历史实验结果进行回溯对比。\n\n### 使用 Feature-Selection 后\n- **自动化高效筛选**：利用 `sequence_selection` 模块基于贪心算法自动遍历特征组合，将原本数天的工作缩短至几小时。\n- **精准剔除噪声**：通过 `importance_selection` 和 `coherence_selection` 模块，依据模型重要性和相关系数自动移除无效特征，显著提升模型泛化性能。\n- **灵活定制评估体系**：支持自定义损失函数方向（如最大化 AUC）和验证方法，确保选出的特征直接服务于业务目标。\n- **实验过程可追溯**：内置日志功能自动记录每次迭代的最佳特征组合与得分，随时通过 `readlog` 工具复现或优化历史方案。\n\nFeature-Selection 
通过将特征工程从“手工艺术”转变为“自动化科学”，帮助团队在竞赛级场景中快速锁定最优特征子集，大幅提升建模效率与预测精度。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fduxuhao_Feature-Selection_5d99c2c4.png","duxuhao","Xuhao Du (Peter) 杜旭浩","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fduxuhao_2dccd5ff.jpg","Postdoc in Data mining, Machine learning and Signal Processing","Marshall Centre, The University of Western Australia","Perth, Australia","duxuhao88@gmail.com",null,"www.linkedin.com\u002Fin\u002Fduxuhao","https:\u002F\u002Fgithub.com\u002Fduxuhao",[84,88],{"name":85,"color":86,"percentage":87},"Python","#3572A5",69.4,{"name":89,"color":90,"percentage":91},"Jupyter Notebook","#DA5B0B",30.6,679,198,"2026-02-24T10:05:36","MIT",1,"未说明",{"notes":99,"python":100,"dependencies":101},"该工具是一个基于传统机器学习算法（如逻辑回归、XGBoost）的特征选择库，非深度学习模型，因此无特定 GPU 或大显存需求。用户需自行定义验证函数（validate）和损失函数（lossfunction）。支持多种特征选择策略：基于贪心算法的序列选择、基于特征重要性的移除、基于相关系数的移除。","3.x (通过 pip3 安装推断)",[102,103,104,105],"scikit-learn","xgboost","pandas","numpy",[16,107,14],"其他",[109,110,111,112,113,114,115],"machine-learning","feature-engineering","feature-selection","data-science","greedy-search","feature-importance","feature-extraction","2026-03-27T02:49:30.150509","2026-04-11T04:55:22.637459",[119,124,129,134,139,144,149],{"id":120,"question_zh":121,"answer_zh":122,"source_url":123},28961,"通过 pip 安装后报错 'MLFeatureSelection' has no attribute 'Select' 怎么办？","这是因为导入方式不正确。如果是通过 pip 安装的，请将代码中的 `import FeatureSelection as FS` 修改为 `from MLFeatureSelection import FeatureSelection as FS`。","https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fissues\u002F4",{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},28962,"特征维度较高时（如 300+），程序运行速度太慢如何优化？","运行速度主要取决于单次模型训练的时间。总耗时约为：单次模型运行时间 × 特征数量。例如单次需 10 秒，320 个特征则需约 3200 秒。目前建议：1. 对于大数据集，可以随时中断程序，手动添加认为有用的特征后重新开始搜索；2. 小或中等数据集运行会较快；3. 
作者正在考虑添加并行功能以提升速度。","https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fissues\u002F5",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},28963,"该工具是否支持中文特征名（列名）？","截至目前，该工具尚不支持中文列名。如果数据集中包含中文字符的列名，程序会抛出 `_UnicodeEncodeError: 'ascii' codec can't encode characters...` 错误并停止运行。","https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fissues\u002F15",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},28964,"该工具是否可以与 K-Fold 交叉验证结合使用？","可以结合使用。作者已添加了简单的 Demo。需要注意的是，如果数据集较大（如 IJCAI 数据集），运行时间会较长，但支持随时中断并手动调整特征后继续；如果是小或中等规模的数据集，运行速度会很快。","https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fissues\u002F3",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},28965,"在 Ubuntu 上运行报错 'Select' object has no attribute 'ColumnName' 是什么原因？","这是由 `pandas.DataFrame.sample` 参数设置引起的问题，已在后续版本中修复。请确保更新到最新版本代码即可解决。","https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fissues\u002F7",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},28966,"代码中出现的 preprocessing 库是什么？需要单独安装吗？","`preprocessing` 是作者自己编写的辅助库，包含一些与特征处理相关的功能，并非必要的第三方库。在新上传的版本中，相关依赖问题已经修复，用户无需单独安装该库即可正常运行。","https:\u002F\u002Fgithub.com\u002Fduxuhao\u002FFeature-Selection\u002Fissues\u002F1",{"id":150,"question_zh":151,"answer_zh":152,"source_url":143},28967,"特征选择后放入原模型，为什么损失值比特征选择过程中显示的要高？","这种情况通常是因为 `sample` 参数设置不为 1 导致的采样差异。一般情况下损失应该是一致的，建议检查样本设置是否导致了数据分布的变化。",[154,159],{"id":155,"version":156,"summary_zh":157,"released_at":158},197798,"v0.08.2","基于自选算法、损失函数和验证方法的特征选择算法","2018-09-19T00:25:00",{"id":160,"version":161,"summary_zh":162,"released_at":163},197799,"v0.05","基于选定算法的特征选择。选择选项包括序列式特征选择和基于特征重要性的特征选择。","2018-05-24T02:39:32"]