[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-NVIDIA-Merlin--NVTabular":3,"tool-NVIDIA-Merlin--NVTabular":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":79,"owner_url":80,"languages":81,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":10,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":110,"github_topics":111,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":121,"updated_at":122,"faqs":123,"releases":152},365,"NVIDIA-Merlin\u002FNVTabular","NVTabular","NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.","NVTabular 是一个专为表格数据设计的特征工程与预处理库，核心目标是快速处理 TB 级海量数据集，以支持深度学习推荐系统的训练。在构建推荐系统时，数据科学家常面临数据规模过大、预处理流程复杂、数据加载成为训练瓶颈以及反复实验耗时过长等挑战。NVTabular 通过高层抽象简化代码操作，并利用 GPU 加速计算，让用户无需担心数据规模限制，能更专注于数据策略本身。\n\n作为 NVIDIA Merlin 开源框架的重要组成部分，NVTabular 与 Merlin Models、HugeCTR 等组件协同工作，提供从数据处理到模型部署的端到端 GPU 加速。其技术亮点在于基于 RAPIDS Dask-cuDF 库实现高性能并行处理，例如在单张 V100 GPU 上处理 Criteo 1TB 点击日志仅需 13 分钟，而传统 CPU 方案可能需要数天。此外，NVTabular 还支持将训练阶段的特征工程逻辑无缝迁移至推理阶段，确保生产环境的一致性。\n\nNVTabular 非常适合需要处理大规模表格数据、追求高效迭代的数据科学家和机器学习工程师，能帮助团队显著缩短模型准备时间并提升整体训练","NVTabular 是一个专为表格数据设计的特征工程与预处理库，核心目标是快速处理 TB 
级海量数据集，以支持深度学习推荐系统的训练。在构建推荐系统时，数据科学家常面临数据规模过大、预处理流程复杂、数据加载成为训练瓶颈以及反复实验耗时过长等挑战。NVTabular 通过高层抽象简化代码操作，并利用 GPU 加速计算，让用户无需担心数据规模限制，能更专注于数据策略本身。\n\n作为 NVIDIA Merlin 开源框架的重要组成部分，NVTabular 与 Merlin Models、HugeCTR 等组件协同工作，提供从数据处理到模型部署的端到端 GPU 加速。其技术亮点在于基于 RAPIDS Dask-cuDF 库实现高性能并行处理，例如在单张 V100 GPU 上处理 Criteo 1TB 点击日志仅需 13 分钟，而传统 CPU 方案可能需要数天。此外，NVTabular 还支持将训练阶段的特征工程逻辑无缝迁移至推理阶段，确保生产环境的一致性。\n\nNVTabular 非常适合需要处理大规模表格数据、追求高效迭代的数据科学家和机器学习工程师，能帮助团队显著缩短模型准备时间并提升整体训练效率。","## [NVTabular](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNVTabular)\n\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002FNVTabular?color=orange&label=version)](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002FNVTabular\u002F)\n[![LICENSE](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FNVIDIA-Merlin\u002FNVTabular)](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002FNVTabular\u002Fblob\u002Fstable\u002FLICENSE)\n[![Documentation](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocumentation-blue.svg)](https:\u002F\u002Fnvidia-merlin.github.io\u002FNVTabular\u002Fstable\u002FIntroduction.html)\n\nNVTabular is a feature engineering and preprocessing library for tabular data that is designed to easily manipulate terabyte scale datasets and train deep learning (DL) based recommender systems. 
It provides high-level abstraction to simplify code and accelerates computation on the GPU using the [RAPIDS Dask-cuDF](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcudf\u002Ftree\u002Fmain\u002Fpython\u002Fdask_cudf) library.\n\nNVTabular is a component of [NVIDIA Merlin](https:\u002F\u002Fdeveloper.nvidia.com\u002Fnvidia-merlin), an open source framework for building and deploying recommender systems and works with the other Merlin components including [Merlin Models](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002Fmodels), [HugeCTR](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FHugeCTR) and [Merlin Systems](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002Fsystems) to provide end-to-end acceleration of recommender systems on the GPU. Extending beyond model training, with NVIDIA’s [Triton Inference Server](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Ftensorrt-inference-server), the feature engineering and preprocessing steps performed on the data during training can be automatically applied to incoming data during inference.\n\n\u003C!-- \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVIDIA-Merlin_NVTabular_readme_e316e965f7ba.png'\u002F> -->\n\n### Benefits\n\nWhen training DL recommender systems, data scientists and machine learning (ML) engineers have been faced with the following challenges:\n\n- **Huge Datasets**: Commercial recommenders are trained on huge datasets that may be several terabytes in scale.\n- **Complex Data Feature Engineering and Preprocessing Pipelines**: Datasets need to be preprocessed and transformed so that they can be used with DL models and frameworks. 
In addition, feature engineering creates an extensive set of new features from existing ones, requiring multiple iterations to arrive at an optimal solution.\n- **Input Bottleneck**: Data loading, if not well optimized, can be the slowest part of the training process, leading to under-utilization of high-throughput computing devices such as GPUs.\n- **Extensive Repeated Experimentation**: The entire data engineering, training, and evaluation process can be repetitious and time consuming, requiring significant computational resources.\n\nNVTabular alleviates these challenges and helps data scientists and ML engineers:\n\n- process datasets that exceed GPU and CPU memory without having to worry about scale.\n- focus on what to do with the data and not how to do it by using abstraction at the operation level.\n- prepare datasets quickly and easily for experimentation so that more models can be trained.\n- deploy models into production by providing faster dataset transformation\n\nLearn more in the NVTabular [core features documentation](https:\u002F\u002Fnvidia-merlin.github.io\u002FNVTabular\u002Fstable\u002Fcore_features.html).\n\n### Performance\n\nWhen running NVTabular on the Criteo 1TB Click Logs Dataset using a single V100 32GB GPU, feature engineering and preprocessing was able to be completed in 13 minutes. Furthermore, when running NVTabular on a DGX-1 cluster with eight V100 GPUs, feature engineering and preprocessing was able to be completed within three minutes. Combined with [HugeCTR](http:\u002F\u002Fwww.github.com\u002FNVIDIA\u002FHugeCTR\u002F), the dataset can be processed and a full model can be trained in only six minutes.\n\nThe performance of the Criteo DRLM workflow also demonstrates the effectiveness of the NVTabular library. The original ETL script provided in Numpy took over five days to complete. Combined with CPU training, the total iteration time is over one week. 
By optimizing the ETL code in Spark and running on a DGX-1 equivalent cluster, the time to complete feature engineering and preprocessing was reduced to three hours. Meanwhile, training was completed in one hour.\n\n### Installation\n\nNVTabular requires Python version 3.7+. Additionally, GPU support requires:\n\n- CUDA version 11.0+\n- NVIDIA Pascal GPU or later (Compute Capability >=6.0)\n- NVIDIA driver 450.80.02+\n- Linux or WSL\n\n#### Installing NVTabular Using Conda\n\nNVTabular can be installed with Anaconda from the `nvidia` channel by running the following command:\n\n```\nconda install -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=11.2\n```\n\n#### Installing NVTabular Using Pip\n\nNVTabular can be installed with `pip` by running the following command:\n\n```\npip install nvtabular\n```\n\n> Installing NVTabular with Pip causes NVTabular to run on the CPU only and might require installing additional dependencies manually.\n> When you run NVTabular in one of our Docker containers, the dependencies are already installed.\n\n#### Installing NVTabular with Docker\n\nNVTabular Docker containers are available in the [NVIDIA Merlin container\nrepository](https:\u002F\u002Fcatalog.ngc.nvidia.com\u002F?filters=&orderBy=scoreDESC&query=merlin).\nThe following table summarizes the key information about the containers:\n\n| Container Name    | Container Location                                                                   | Functionality                              |\n| ----------------- | ------------------------------------------------------------------------------------ | ------------------------------------------ |\n| merlin-hugectr    | https:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fteams\u002Fmerlin\u002Fcontainers\u002Fmerlin-hugectr    | NVTabular, HugeCTR, and Triton Inference   |\n| merlin-tensorflow | 
https:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fteams\u002Fmerlin\u002Fcontainers\u002Fmerlin-tensorflow | NVTabular, Tensorflow and Triton Inference |\n| merlin-pytorch    | https:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fteams\u002Fmerlin\u002Fcontainers\u002Fmerlin-pytorch    | NVTabular, PyTorch, and Triton Inference   |\n\nTo use these Docker containers, you'll first need to install the [NVIDIA Container Toolkit](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fnvidia-docker) to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers. To obtain more information about the software and model versions that NVTabular supports per container, see [Support Matrix](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNVTabular\u002Fblob\u002Fstable\u002Fdocs\u002Fsource\u002Fresources\u002Fsupport_matrix.rst).\n\n### Notebook Examples and Tutorials\n\nWe provide a [collection of examples](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002FNVTabular\u002Ftree\u002Fstable\u002Fexamples) to demonstrate feature engineering with NVTabular as Jupyter notebooks:\n\n- Introduction to NVTabular's High-Level API\n- Advanced workflows with NVTabular\n- NVTabular on CPU\n- Scaling NVTabular to multi-GPU systems\n\nIn addition, NVTabular is used in many of our examples in other Merlin libraries:\n\n- [End-To-End Examples with Merlin](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002FMerlin\u002Ftree\u002Fstable\u002Fexamples)\n- [Training Examples with Merlin Models](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002Fmodels\u002Ftree\u002Fstable\u002Fexamples)\n- [Training Examples with Transformer4Rec](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002FTransformers4Rec\u002Ftree\u002Fstable\u002Fexamples)\n\n### Feedback and Support\n\nIf you'd like to contribute to the library directly, see the 
[Contributing.md](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNVTabular\u002Fblob\u002Fstable\u002FCONTRIBUTING.md). We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this [survey](https:\u002F\u002Fdeveloper.nvidia.com\u002Fmerlin-devzone-survey).\n\nIf you're interested in learning more about how NVTabular works, see\n[our NVTabular documentation](https:\u002F\u002Fnvidia-merlin.github.io\u002FNVTabular\u002Fstable\u002FIntroduction.html). We also have [API documentation](https:\u002F\u002Fnvidia-merlin.github.io\u002FNVTabular\u002Fstable\u002Fapi\u002Findex.html) that outlines the specifics of the available calls within the library.\n","## [NVTabular](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNVTabular)\n\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002FNVTabular?color=orange&label=version)](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002FNVTabular\u002F)\n[![LICENSE](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FNVIDIA-Merlin\u002FNVTabular)](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002FNVTabular\u002Fblob\u002Fstable\u002FLICENSE)\n[![Documentation](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocumentation-blue.svg)](https:\u002F\u002Fnvidia-merlin.github.io\u002FNVTabular\u002Fstable\u002FIntroduction.html)\n\nNVTabular 是一个用于表格数据（tabular data）的特征工程（feature engineering）和预处理（preprocessing）库，旨在轻松管理 TB 级（terabyte scale）数据集并训练基于深度学习（DL）的推荐系统（recommender systems）。它提供高级抽象以简化代码，并使用 [RAPIDS Dask-cuDF](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcudf\u002Ftree\u002Fmain\u002Fpython\u002Fdask_cudf) 库在图形处理器（GPU）上加速计算。\n\nNVTabular 是 [NVIDIA Merlin](https:\u002F\u002Fdeveloper.nvidia.com\u002Fnvidia-merlin) 的一个组件，这是一个用于构建和部署推荐系统的开源框架，并与其它 Merlin 组件协同工作，包括 [Merlin 
Models](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002Fmodels)、[HugeCTR](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FHugeCTR) 和 [Merlin Systems](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002Fsystems)，从而在 GPU 上实现推荐系统的端到端加速。除了模型训练之外，借助 NVIDIA 的 [Triton Inference Server](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Ftensorrt-inference-server)，训练期间对数据执行的特征工程和预处理步骤可以自动应用于推理期间的传入数据。\n\n\u003C!-- \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVIDIA-Merlin_NVTabular_readme_e316e965f7ba.png'\u002F> -->\n\n### 优势\n\n在训练深度学习（DL）推荐系统时，数据科学家和机器学习（ML）工程师曾面临以下挑战：\n\n- **海量数据集（Huge Datasets）**：商业推荐系统在海量数据集上进行训练，规模可能达到数 TB。\n- **复杂的数据特征工程和预处理流程（Complex Data Feature Engineering and Preprocessing Pipelines）**：数据集需要经过预处理和转换才能与深度学习（DL）模型和框架一起使用。此外，特征工程会从现有数据中创建大量新特征，需要多次迭代才能达到最优解决方案。\n- **输入瓶颈（Input Bottleneck）**：如果数据加载未得到良好优化，可能会成为训练过程中最慢的部分，导致高吞吐量计算设备（如 GPU）利用率不足。\n- **大量的重复实验（Extensive Repeated Experimentation）**：整个数据工程、训练和评估过程可能是重复且耗时的，需要大量的计算资源。\n\nNVTabular 缓解了这些挑战，并帮助数据科学家和机器学习（ML）工程师：\n\n- 处理超出图形处理器（GPU）和中央处理器（CPU）内存的数据集，而无需担心规模问题。\n- 通过利用操作层面的抽象，专注于如何处理数据而不是如何实现。\n- 快速轻松地准备数据集用于实验，以便训练更多模型。\n- 通过提供更快的数据集转换，将模型部署到生产环境。\n\n在 NVTabular [核心功能文档](https:\u002F\u002Fnvidia-merlin.github.io\u002FNVTabular\u002Fstable\u002Fcore_features.html) 中了解更多。\n\n### 性能\n\n当在单个 V100 32GB GPU 上使用 Criteo 1TB 点击日志数据集运行 NVTabular 时，特征工程和预处理能够在 13 分钟内完成。此外，当在配备八个 V100 GPU 的 DGX-1 集群上运行 NVTabular 时，特征工程和预处理能够在三分钟内完成。结合 [HugeCTR](http:\u002F\u002Fwww.github.com\u002FNVIDIA\u002FHugeCTR\u002F)，数据集可以在仅六分钟内完成处理并训练完整模型。\n\nCriteo DRLM 工作流的性能也证明了 NVTabular 库的有效性。Numpy 中提供的原始 ETL（提取、转换、加载）脚本耗时超过五天才能完成。结合 CPU 训练，总迭代时间超过一周。通过在 Spark 中优化 ETL 代码并在等效于 DGX-1 的集群上运行，完成特征工程和预处理的时间减少到了三小时。同时，训练在一小时内完成。\n\n### 安装\n\nNVTabular 需要 Python 版本 3.7+。此外，GPU 支持需要：\n\n- CUDA 版本 11.0+\n- NVIDIA Pascal GPU 或更高版本（Compute Capability >=6.0）\n- NVIDIA 驱动程序 450.80.02+\n- Linux 或 WSL\n\n#### 使用 Conda 安装 NVTabular\n\n可以通过运行以下命令从 `nvidia` 通道使用 Anaconda 安装 NVTabular：\n\n```\nconda install -c nvidia -c 
rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=11.2\n```\n\n#### 使用 Pip 安装 NVTabular\n\n可以通过运行以下命令使用 `pip` 安装 NVTabular：\n\n```\npip install nvtabular\n```\n\n> 使用 Pip 安装 NVTabular 会导致 NVTabular 仅在 CPU 上运行，并且可能需要手动安装额外的依赖项。\n> 当我们在 Docker 容器中运行 NVTabular 时，依赖项已经安装好了。\n\n#### 使用 Docker 安装 NVTabular\n\nNVTabular Docker 容器可在 [NVIDIA Merlin 容器仓库](https:\u002F\u002Fcatalog.ngc.nvidia.com\u002F?filters=&orderBy=scoreDESC&query=merlin) 中找到。下表总结了有关容器的关键信息：\n\n| 容器名称    | 容器位置                                                                   | 功能                              |\n| ----------------- | ------------------------------------------------------------------------------------ | ------------------------------------------ |\n| merlin-hugectr    | https:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fteams\u002Fmerlin\u002Fcontainers\u002Fmerlin-hugectr    | NVTabular、HugeCTR 和 Triton 推理   |\n| merlin-tensorflow | https:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fteams\u002Fmerlin\u002Fcontainers\u002Fmerlin-tensorflow | NVTabular、Tensorflow 和 Triton 推理 |\n| merlin-pytorch    | https:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fteams\u002Fmerlin\u002Fcontainers\u002Fmerlin-pytorch    | NVTabular、PyTorch 和 Triton 推理   |\n\n要使用这些 Docker 容器，您首先需要安装 [NVIDIA Container Toolkit](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fnvidia-docker) 以为 Docker 提供 GPU 支持。您可以使用上表中引用的 NGC 链接获取有关如何启动和运行这些容器的更多信息。要获取有关 NVTabular 每个容器支持的软件和模型版本的更多信息，请参阅 [支持矩阵](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNVTabular\u002Fblob\u002Fstable\u002Fdocs\u002Fsource\u002Fresources\u002Fsupport_matrix.rst)。\n\n### Notebook 示例和教程\n\n我们提供了一系列 [示例集合](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002FNVTabular\u002Ftree\u002Fstable\u002Fexamples)，以 Jupyter Notebook 的形式展示如何使用 NVTabular 进行 Feature Engineering（特征工程）：\n\n- NVTabular 高级 API 介绍\n- 使用 NVTabular 的高级 Workflow（工作流）\n- CPU 上的 NVTabular\n- 将 NVTabular 扩展到多 GPU 系统\n\n此外，NVTabular 
还被用于我们其他 Merlin 库中的许多示例中：\n\n- [Merlin 端到端示例](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002FMerlin\u002Ftree\u002Fstable\u002Fexamples)\n- [Merlin Models 训练示例](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002Fmodels\u002Ftree\u002Fstable\u002Fexamples)\n- [Transformer4Rec 训练示例](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002FTransformers4Rec\u002Ftree\u002Fstable\u002Fexamples)\n\n### 反馈与支持\n\n如果您想直接为该库做出贡献，请参阅 [Contributing.md](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNVTabular\u002Fblob\u002Fstable\u002FCONTRIBUTING.md)。我们特别关注针对我们的 Feature Engineering 和 Preprocessing（预处理）操作的贡献或功能请求。为了进一步推进我们的 Merlin Roadmap（路线图），我们鼓励您在此 [调查](https:\u002F\u002Fdeveloper.nvidia.com\u002Fmerlin-devzone-survey) 中分享有关您的 Recommender System（推荐系统）Pipeline（流水线）的所有细节。\n\n如果您有兴趣了解更多关于 NVTabular 的工作原理，请参阅 [我们的 NVTabular 文档](https:\u002F\u002Fnvidia-merlin.github.io\u002FNVTabular\u002Fstable\u002FIntroduction.html)。我们还提供了 [API 文档](https:\u002F\u002Fnvidia-merlin.github.io\u002FNVTabular\u002Fstable\u002Fapi\u002Findex.html)，概述了库内可用调用的具体细节。","# NVTabular 快速上手指南\n\nNVTabular 是 NVIDIA Merlin 框架的核心组件之一，专为深度学习推荐系统设计。它是一个用于表格数据的特征工程和预处理库，能够轻松管理 TB 级数据集，并利用 GPU 加速计算（基于 RAPIDS Dask-cuDF）。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n- **操作系统**：Linux 或 WSL\n- **Python 版本**：3.7 及以上\n- **CUDA 版本**：11.0 及以上\n- **NVIDIA 驱动**：450.80.02 及以上\n- **GPU 硬件**：NVIDIA Pascal 架构或更高版本（Compute Capability >= 6.0）\n\n## 安装步骤\n\n根据您的需求，可以选择 Conda、Pip 或 Docker 进行安装。推荐使用 Conda 以获得完整的 GPU 支持。\n\n### 1. 使用 Conda 安装（推荐）\n\n通过 Anaconda 从 `nvidia` 频道安装，这将自动配置 GPU 依赖：\n\n```bash\nconda install -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=11.2\n```\n\n### 2. 使用 Pip 安装\n\n```bash\npip install nvtabular\n```\n\n> **注意**：通过 Pip 安装 NVTabular 仅支持 CPU 运行，且可能需要手动安装额外的依赖项。如需 GPU 支持，建议使用 Conda 或 Docker。\n\n### 3. 
使用 Docker 安装\n\nNVTabular 提供了预构建的 Docker 容器，集成在 NVIDIA Merlin 容器中。使用前需安装 [NVIDIA Container Toolkit](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fnvidia-docker)。\n\n| 容器名称 | 功能描述 |\n| :--- | :--- |\n| `merlin-hugectr` | 包含 NVTabular, HugeCTR 和 Triton Inference |\n| `merlin-tensorflow` | 包含 NVTabular, Tensorflow 和 Triton Inference |\n| `merlin-pytorch` | 包含 NVTabular, PyTorch 和 Triton Inference |\n\n具体镜像地址请参考 [NVIDIA Merlin container repository](https:\u002F\u002Fcatalog.ngc.nvidia.com\u002F?filters=&orderBy=scoreDESC&query=merlin)。\n\n## 基本使用\n\n以下示例展示了如何使用 NVTabular 的高层 API 进行基本的特征工程流程：加载数据、定义转换操作、拟合工作流并输出结果。\n\n```python\nimport nvtabular as nvt\nfrom nvtabular import ops\n\n# 1. 加载数据 (支持 parquet, csv 等格式)\ndataset = nvt.Dataset(\"input_data.parquet\")\n\n# 2. 定义工作流 (Workflow)\n# 对 user_id 和 item_id 列进行类别编码 (Categorify)\nworkflow = nvt.Workflow(\n    [\n        nvt.ColumnSelector([\"user_id\"]) >> ops.Categorify(),\n        nvt.ColumnSelector([\"item_id\"]) >> ops.Categorify(),\n    ]\n)\n\n# 3. 拟合工作流并转换数据\nworkflow.fit(dataset)\ntransformed_dataset = workflow.transform(dataset)\n\n# 4. 
保存处理后的数据\ntransformed_dataset.to_parquet(\"output_data\u002F\")\n```\n\n更多高级用法和多 GPU 扩展示例，请参阅官方提供的 [Jupyter Notebook 示例](https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002FNVTabular\u002Ftree\u002Fstable\u002Fexamples)。","某大型电商平台的算法团队正在构建新一代深度学习推荐系统，需要高效处理每日累积的 TB 级用户行为与点击日志数据。\n\n### 没有 NVTabular 时\n- 依赖传统 CPU 脚本进行 ETL，处理海量数据耗时过长，单次特征工程往往需要数天甚至一周。\n- 数据加载速度跟不上模型训练需求，造成 GPU 算力严重闲置，资源利用率低下。\n- 面对超出内存容量的数据集，常遭遇内存溢出错误，难以在单机环境下完成完整流程。\n- 特征工程代码耦合度高，每次尝试新特征都需要大量重复编码，实验迭代极其缓慢。\n\n### 使用 NVTabular 后\n- 借助 NVTabular 的 GPU 加速能力，TB 级数据的预处理时间从数天锐减至十几分钟，大幅提升效率。\n- 优化的数据流水线消除了输入瓶颈，使 GPU 能够持续满负荷运行，显著缩短模型训练周期。\n- 支持自动分块处理超大文件，轻松应对超出显存和内存的数据规模，无需担忧扩展性问题。\n- 提供高层抽象操作接口，简化了代码复杂度，让工程师能更专注于特征策略而非底层实现。\n\nNVTabular 凭借 GPU 并行计算与端到端优化，为大规模推荐系统提供了极速且可扩展的数据预处理解决方案。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVIDIA-Merlin_NVTabular_e7b901ee.png","NVIDIA-Merlin","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FNVIDIA-Merlin_46f912e0.jpg","Merlin is a framework providing end-to-end GPU-accelerated recommender systems, from feature engineering to deep learning training and deploying to production",null,"https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin",[82,86,90],{"name":83,"color":84,"percentage":85},"Python","#3572A5",97.3,{"name":87,"color":88,"percentage":89},"C++","#f34b7d",2.2,{"name":91,"color":92,"percentage":93},"Shell","#89e051",0.5,1140,149,"2026-03-26T11:59:11","Apache-2.0","Linux, WSL","需要 NVIDIA Pascal 或更高版本 (Compute Capability >=6.0)，CUDA 11.0+，驱动 450.80.02+；Pip 安装仅支持 CPU","未说明",{"notes":102,"python":103,"dependencies":104},"建议使用 Conda 或 Docker 安装以获取完整 GPU 支持；使用 Docker 需安装 NVIDIA Container Toolkit；是 NVIDIA Merlin 框架组件，可与 HugeCTR、Triton 等配合使用","3.7+",[105,106,107,108,109],"RAPIDS","cuDF","Dask","CUDA 
Toolkit","Numba",[13,51],[112,113,114,115,116,117,118,119,120],"deep-learning","feature-engineering","feature-selection","gpu","machine-learning","nvidia","preprocessing","recommendation-system","recommender-system","2026-03-27T02:49:30.150509","2026-04-06T05:16:48.139046",[124,129,133,138,143,147],{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},1311,"如何确保 Categorify 编码后的 ID 与原始值的映射一致性？","项目已简化 Categorify 编码逻辑，确保保存的 unique values parquet 文件中的映射与编码后的 item ids 完全匹配。现在提供了简单的反向映射工具，用户无需了解内部复杂的哈希逻辑即可将编码 ID 还原为原始值。该问题已通过 PR #1692 解决。","https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002FNVTabular\u002Fissues\u002F1748",{"id":130,"question_zh":131,"answer_zh":132,"source_url":128},1312,"Categorify 中 Null、OOV 等特殊值的编码冲突问题是否存在？","早期版本中，Nulls、Out-Of-Vocabulary 和低频项都被编码为 id 0，导致无法区分。新版简化方案旨在解决此问题，使建模时能够区分这些特殊值，具体修复请参考相关更新日志或 PR。",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},1313,"在大型数据集上拟合 Workflow 时遇到 CUDA OOM 错误该如何排查？","首先检查嵌入表（embedding table）的大小是否超过了单张 GPU 的显存容量。如果嵌入表无法放入单张 GPU 内存，`.fit()` 操作自然会导致 OOM。需确认 GPU 显存规格及模型嵌入表的具体大小。","https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002FNVTabular\u002Fissues\u002F1761",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},1314,"从旧版本升级到 NVTabular 0.7.0 后出现 MemoryError 怎么办？","部分用户在从 0.5.3 升级到 0.7.0 后遇到了此问题。建议检查是否配置了 LocalCUDACluster 以及 RMM 内存池（rmm_pool_size）。注意 NVTabular 在新版本中默认不使用分布式调度器，除非在 Workflow 中明确定义了 client 参数。","https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002FNVTabular\u002Fissues\u002F1181",{"id":144,"question_zh":145,"answer_zh":146,"source_url":142},1315,"使用 LocalCUDACluster 时，client 参数应该配置在 Workflow 还是 Dataset？","通常应在创建 `Workflow` 时定义 `client=` 参数，而不是传递给 `nvt.Dataset` API。除非你计划在用 `Workflow` 转换之前对 Dataset 进行操作或写入磁盘，否则不需要将 client 传递给 Dataset。`Workflow.transform` 输出的 Dataset 对象会自动包含创建 workflow 时使用的同一个 client 对象。",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},1316,"如何处理 Tags.ITEM_ID 等复合标签被弃用的警告？","应使用原子版本的标签代替复合标签。例如，将 `Tags.ITEM_ID` 替换为 `[\u003CTags.ITEM: 
'item'>, \u003CTags.ID: 'id'>]`。这是为了适应未来版本的移除计划。","https:\u002F\u002Fgithub.com\u002FNVIDIA-Merlin\u002FNVTabular\u002Fissues\u002F1764",[153,157,161,166,171,176,181,186,191,196,201,206,210,214,219,224,229,234,239,243],{"id":154,"version":155,"summary_zh":79,"released_at":156},100807,"v23.08.00","2023-08-29T16:41:05",{"id":158,"version":159,"summary_zh":79,"released_at":160},100808,"v23.06.00","2023-06-22T21:12:35",{"id":162,"version":163,"summary_zh":164,"released_at":165},100809,"v23.05.00","## What’s Changed\r\n\r\n## :ant: Bug Fixes\r\n- Fix list slicing of `np.ndarray`s on CPU @karlhigley (#1817)\r\n- \r\n## :rocket: Features\r\n- Add support for int8 values with Categorify inference @oliverholworthy (#1818)\r\n\r\n## :wrench: Maintenance\r\n- atomize added tags in TagAsUserID and TagAsItemID @radekosmulski (#1815)\r\n- Update requirements for Merlin packages to minimum version of 23.04 @karlhigley (#1804)\r\n- Update conda package publish for muliple python versions @oliverholworthy (#1805)\r\n- Remove use of deprecated numpy aliases of builtin types @oliverholworthy (#1813)\r\n- Add workflows to check base branch and set stable branch @oliverholworthy (#1811)\r\n- Update tag pattern in GitHub Workflows @oliverholworthy (#1812)\r\n- Cleanup Unused Test Dependencies @oliverholworthy (#1810)","2023-05-31T14:52:50",{"id":167,"version":168,"summary_zh":169,"released_at":170},100810,"v23.04.00","## What’s Changed\r\n\r\n## 🐜 Bug Fixes\r\n\r\n- Update import of device functions to use merlin.core versions @oliverholworthy (#1786)\r\n- Enable `DatasetGen` usage in CPU environment @oliverholworthy (#1776)\r\n\r\n## 🚀 Features\r\n\r\n- Enable `CategorifyTransform` inference operator to run on int16 types @oliverholworthy (#1798)\r\n- use merlin compat for imports of gpu specific packages @jperez999 (#1791)\r\n- Enable `Workflow.transform` to be run with a DataFrame type @oliverholworthy (#1777)\r\n\r\n## 🔧 Maintenance\r\n\r\n- add concurrency setting to stop tests 
# (truncated release notes, published 2023-04-26)

- …when new commits get pushed to PRs @nv-alaiacano (#1801)
- use merlin compat for imports of gpu specific packages @jperez999 (#1791)
- Replace nvtabular.utils with merlin.core.compat @edknv (#1795)
- Disable package builds on pull requests @oliverholworthy (#1789)
- Use None as default value of `cpu` in test_torch_dataloader @oliverholworthy (#1788)
- Use None as default value of `cpu` in test_column_similarity @oliverholworthy (#1787)
- Split up conda and PyPI package build/release jobs @oliverholworthy (#1780)
- Fix docs tox environment @alexanderronquillo (#1775)
- update conftest for backwards compat and new API for the `to_parquet` call @jperez999 (#1784)
- Remove tests for sparse tensors in dataloader @oliverholworthy (#1783)
- Update default value of `cpu` to None in dataset fixture @oliverholworthy (#1779)
- Fix Dataloader unit test, which was broken by the new DL structure @bschifferer (#1782)

# v23.02.00 (released 2023-03-08)

## What's Changed

## 🐜 Bug Fixes

- Add support for serializing modules involved in LambdaOp execution by value @willb (#1741)

## 🚀 Features

- add transform for df @jperez999 (#1734)

## 🔧 Maintenance

- Clean up the way shapes are computed and specified @karlhigley (#1760)
- Update passenv in test-gpu to use valid configuration @oliverholworthy (#1762)
- Fix the inference code's deprecation warning @karlhigley (#1757)
- Specify Minimum Python Version as 3.8 in package @oliverholworthy (#1732)
- Update NVT to be compatible with shapes in `ColumnSchemas` @karlhigley (#1758)
- fix GPU visibility issues on privileged container @jperez999 (#1759)
- Move nest_asyncio dependency to test deps @karlhigley (#1755)
- add gcp label to jenkinsfile @jperez999 (#1744)
- Replace nvtabular inference back-end with python @jperez999 (#1771)
- Update NVT operators and workflows to use Merlin dtypes @karlhigley (#1707)
- Add Formatter (Prettier) for YAML and Markdown files @oliverholworthy (#1733)
- add tf gpu allocator env var to tox @jperez999 (#1747)
- Add check for urls present in requires_dist @oliverholworthy (#1728)
- adding workflow to setup gha runner for GPU CI @jperez999 (#1739)
- Run the tests against the main branch of Merlin Core @karlhigley (#1756)

## New Contributors

* @willb made their first contribution in https://github.com/NVIDIA-Merlin/NVTabular/pull/1741

**Full Changelog**: https://github.com/NVIDIA-Merlin/NVTabular/compare/v1.8.1...v23.02.00

# v1.8.1 (released 2023-02-03)

## What's Changed

Patch release on top of [v1.8.0](https://github.com/NVIDIA-Merlin/NVTabular/releases/tag/v1.8.0)

- Quicker installs with built binary distributions published to PyPI (using `cibuildwheel`) @karlhigley (#1754)

**Full Changelog**: https://github.com/NVIDIA-Merlin/NVTabular/compare/v1.8.0...v1.8.1

# v1.8.0 (released 2022-12-30)

## What's Changed

## 🐜 Bug Fixes

- Fix output error occurring due to check whether it is a dict or not @rnyak (#1742)
- Remove min value count from properties when using sparse_max @oliverholworthy (#1705)

## 📄 Documentation

- Address virtual developer review feedback @mikemckiernan (#1724)
- docs: Add semver to calver banner @mikemckiernan (#1699)

## 🔧 Maintenance

- remove test references that are no longer available @jperez999 (#1730)
- remove integration tests for notebooks no longer available @jperez999 (#1729)
- Use pre-commit for lint checks in GitHub Actions Workflow @oliverholworthy (#1723)
- Remove echo from command in tox.ini @oliverholworthy (#1725)
- Migrate the legacy examples to the Merlin repo @karlhigley (#1711)
- Handle data loader as an iterator @oliverholworthy (#1720)
- Release draft fix @jperez999 (#1712)
- Add Jenkinsfile @AyodeAwe (#1702)
- Update package requires_dist to remove extras that are not required @oliverholworthy (#1727)

**Full Changelog**: https://github.com/NVIDIA-Merlin/NVTabular/compare/v1.7.0...v1.8.0

# v1.7.0 (released 2022-11-23)

## What's Changed

## 🐜 Bug Fixes

- fix tox to use correct branch in release tags @jperez999 (#1710)
- Update metrics keys in example notebook tests @karlhigley (#1703)
- Fix first/last groupby aggregation on list columns @rjzamora (#1693)

## 📄 Documentation

- docs: Add basic SEO configuration @mikemckiernan (#1697)

## 🔧 Maintenance

- Upload binary wheels for nvtabular @benfred (#1696)
- Use merlin-dataloader package @benfred (#1694)

# v1.6.0 (released 2022-10-31)

## What's Changed

## 🐜 Bug Fixes

- Fix Categorify bug for combo encoding with null values @rjzamora (#1652)
- Fix joint Categorify with list columns @rjzamora (#1685)

## 📄 Documentation

- update NVTabular examples @radekosmulski (#1633)
- Remove examples Part 1 - Rossmann, RecSys2020, Outbrain @bschifferer (#1669)

## 🔧 Maintenance

- adding import or skip for tensorflow framework required by examples @jperez999 (#1691)

**Full Changelog**: https://github.com/NVIDIA-Merlin/NVTabular/compare/v1.5.0...v1.6.0

# v1.5.0 (released 2022-09-26)

## What's Changed

## 🐜 Bug Fixes

- Use Merlin DAG executors from core in integration tests @jperez999 (#1677)
- Fix target encoding tagging issue @bbozkaya (#1672)

## 🔧 Maintenance

- Remove stray file left over from Torch/Horovod multi-GPU example @karlhigley (#1674)
- Remove poetry config @benfred (#1673)
- chore: Add pybind11 as a tox requirement @mikemckiernan (#1675)
- Switch to using the DAG executors from Merlin Core @karlhigley (#1666)
- Use the latest version of Merlin Core from `main` in the `tox` test envs @karlhigley (#1671)
- Set up `tox` environments for testing, linting, and building docs @karlhigley (#1667)

# v1.4.0 (released 2022-09-06)

## What's Changed

## ⚠ Breaking Changes

- Remove FastAI notebooks @benfred (#1668)
- Fix dl @jperez999 (#1661)
- Replace cudf series ceil() with numpy ceil() @jperez999 (#1656)

## 🐜 Bug Fixes

- Fix integration tests that reached into `Workflow`'s private methods @karlhigley (#1660)
- Fix groupby on lists with cudf 22.06+ @benfred (#1654)
- Update the `Categorify` operator to set the domain max correctly @oliverholworthy (#1641)
- Test LambdaOp with dask workflows @benfred (#1634)

## 🚀 Features

- Add sum to supported aggregations in Groupby @radekosmulski (#1638)

## 📄 Documentation

- Remove using-feature-columns nb @rnyak (#1657)
- Fix typos @benfred (#1655)

## 🔧 Maintenance

- Add optional requirement specifiers for GPU and dev requirements @karlhigley (#1664)
- Add `scipy` as a dependency @karlhigley (#1663)
- Update black/pylint/flake8/isort, etc. @benfred (#1659)
- Extract Python and Dask `Executor` classes from `Workflow` @karlhigley (#1609)
- Update `versioneer` from 0.19 to 0.23 @oliverholworthy (#1651)

# v1.3.3 (released 2022-07-22)

# v1.3.2 (released 2022-07-20)

# v1.3.1 (released 2022-07-19)

## What's Changed

## 🔧 Maintenance

- Tri up time @jperez999 (#1623)

# v1.3.0 (released 2022-07-19)

## What's Changed

## 🐜 Bug Fixes

- Don't install tests with nvtabular @benfred (#1608)
- Groupby to no longer require groupby_cols in column selector @radekosmulski (#1598)
- Adjust imports in the `TritonPythonModel` for `Workflows` @karlhigley (#1604)
- column names can now include aggregations in ops.Groupby @radekosmulski (#1592)
- Normalize Op using fp32 @benfred (#1597)
- Cast warning to string in configure_tensorflow @leewyang (#1587)

## 📄 Documentation

- docs: Add TF compat info @mikemckiernan (#1528)

## 🔧 Maintenance

- Fix movielens notebook data path @jperez999 (#1622)
- skip download step, which is not allowed in CI @jperez999 (#1620)
- fix tritonserver gpu id & fixed timeout for criteo integration tests @jperez999 (#1619)
- Remove unnecessary docs dependencies @mikemckiernan (#1617)
- fix ci script for integration tests and added skip check @jperez999 (#1616)
- Integration tests refactor @jperez999 (#1614)
- Don't `git pull origin main` in integration tests, use container version @karlhigley (#1610)

# v1.2.2 (released 2022-06-21)

## What's Changed

## 🐜 Bug Fixes

- add casting for additional aggs in groupby @radekosmulski (#1580)

## 📄 Documentation

- Update URLs to Criteo datasets @mikemckiernan (#1591)

## 🔧 Maintenance

- Fix integration tests @benfred (#1594)

# v1.2.1 (released 2022-06-16)

## What's Changed

## 🔧 Maintenance

- Update the container labels in the integration tests @benfred (#1588)
- Update poetry config @benfred (#1585)

# v1.2.0 (released 2022-06-15)

## What's Changed

## 🐜 Bug Fixes

- remove nvtabular triton backend that seg faults on termination @jperez999 (#1576)
- Fix LambdaOp example usage 1 @rnyak (#1561)

## 📄 Documentation

- Merlin offers three containers @mikemckiernan (#1581)
- Fix dataloader docstring @benfred (#1573)
- Improved docstrings of GroupBy op to reinforce the required usage of dataset.shuffle_by_keys() @gabrielspmoreira (#1551)
- Remove old support matrix table @benfred (#1560)
- Update CONTRIBUTING to mention PR labels @mikemckiernan (#1554)
- Update changelog to point to github releases @benfred (#1549)
- Use common release-drafter workflow @mikemckiernan (#1548)

## 🔧 Maintenance

- Add a GA workflow that requires labels on PRs @benfred (#1579)
- Use shared implementation of triage workflow @benfred (#1577)
- Don't pull main on running NVT unittests @benfred (#1578)
- Don't build model_config_pb2 @benfred (#1566)
- Add conda builds to our github actions workflow @benfred (#1557)
- Add release-drafter workflow for generating changelogs @benfred (#1540)
- Remove message about integration tests missing @benfred (#1539)

# v1.1.1 (released 2022-05-10)

# v1.1.0 (released 2022-05-10)

## Known Issues

* Error when sending request to Triton after loading a Transformers4Rec PyTorch model https://github.com/NVIDIA-Merlin/NVTabular/issues/1502

## What's Changed

* Automate pushing package to pypi by @benfred in https://github.com/NVIDIA-Merlin/NVTabular/pull/1505
* docs: Add attention admonition to Merlin SMX by @mikemckiernan in https://github.com/NVIDIA-Merlin/NVTabular/pull/1507
* added category name to domain for column properties by @jperez999 in https://github.com/NVIDIA-Merlin/NVTabular/pull/1508
* Fix the embedding size lookup in `Categorify` op by @karlhigley in https://github.com/NVIDIA-Merlin/NVTabular/pull/1511
* Max auc by @jperez999 in https://github.com/NVIDIA-Merlin/NVTabular/pull/1513
* Fix inf container tag in getting started TF-inf nb and polish exp README by @rnyak in https://github.com/NVIDIA-Merlin/NVTabular/pull/1516
* Fix for max-size categorify operator category ordering by @jperez999 in https://github.com/NVIDIA-Merlin/NVTabular/pull/1519
* Criteo HugeCTR Inference Configuration Fix by @bschifferer in https://github.com/NVIDIA-Merlin/NVTabular/pull/1522
* Add ascending param in the Groupby op by @rnyak in https://github.com/NVIDIA-Merlin/NVTabular/pull/1525
* Remove os.environ["TF_MEMORY_ALLOCATION"] from getting-started 03-Training-with-TF nb to avoid OOM by @rnyak in https://github.com/NVIDIA-Merlin/NVTabular/pull/1527
* Fix getting started 03-Training-with-HugeCTR.ipynb nb's training without printing out auc and loss metrics issue by @rnyak in https://github.com/NVIDIA-Merlin/NVTabular/pull/1532
* reqs fixed by @jperez999 in https://github.com/NVIDIA-Merlin/NVTabular/pull/1536
* docs: Add ext-toc, switch to MyST-NB by @mikemckiernan in https://github.com/NVIDIA-Merlin/NVTabular/pull/1529
* remove horovod example, no longer supported by @jperez999 in https://github.com/NVIDIA-Merlin/NVTabular/pull/1530

**Full Changelog**: https://github.com/NVIDIA-Merlin/NVTabular/compare/v1.0.0...v1.1.0
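Several maintenance entries above (#1779, #1787, #1788) change the default of a `cpu` flag to `None` so that the device is auto-detected rather than assumed. A minimal sketch of that tri-state pattern, using hypothetical names (`select_device`, `gpu_available`) that are not part of NVTabular's API:

```python
def select_device(cpu=None, gpu_available=False):
    """Resolve a tri-state `cpu` flag into a concrete device name.

    cpu=None  -> auto-detect: prefer the GPU when one is available
    cpu=True  -> force CPU regardless of hardware
    cpu=False -> force GPU (the caller guarantees one exists)
    """
    if cpu is None:
        # Fall back to CPU only when no GPU was detected.
        cpu = not gpu_available
    return "cpu" if cpu else "gpu"


print(select_device(cpu=None, gpu_available=True))   # auto-detect picks the GPU
print(select_device(cpu=True, gpu_available=True))   # an explicit override wins
```

The benefit over a hard-coded `cpu=False` default is that the same test suite and fixtures can run unchanged on machines with or without a GPU.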