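[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-NVIDIA-NeMo--DataDesigner":3,"tool-NVIDIA-NeMo--DataDesigner":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw is a locally run AI assistant built for individuals, giving you a fully controllable intelligent companion on your own devices. It frees the AI assistant from the usual confinement to a specific website or app by connecting directly to the messaging channels you already use, including dozens of platforms such as WeChat, WhatsApp, Telegram, Discord, and iMessage. Whichever chat app you send a message from, OpenClaw responds instantly; it also supports voice interaction on macOS, iOS, and Android devices and provides a live canvas that you can control.\n\nThe tool addresses users' needs for data privacy, fast responses, and an \"always-on\" experience. Because the AI is deployed locally, users get fast, private assistance without depending on cloud services, truly keeping \"your data under your control\". Its standout technical feature is a gateway architecture that separates the control plane from the core assistant, keeping cross-platform communication smooth and extensible.\n\nOpenClaw is well suited to tech enthusiasts and developers who want to build personalized workflows, as well as everyday users who value privacy and don't want to be locked into a single ecosystem. Anyone with basic terminal skills (macOS, Linux, and Windows WSL2 are supported) can deploy it through a simple guided command-line setup. If you want an assistant that understands you",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","Development Framework","Image","Data Tools","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui is a browser-based interface built on Gradio that lets users easily run the powerful Stable Diffusion image-generation models locally. It removes the pain points of the original models, namely command-line dependence, a steep learning curve, and scattered features, by consolidating the complex AI image-generation workflow into an intuitive graphical platform.\n\nCasual creators who want to get started quickly, designers who need fine-grained control over image details, and developers and researchers exploring what the models can do all benefit. Its core strength is an exceptionally rich feature set. Besides basic modes such as text-to-image, image-to-image, inpainting, and outpainting, it introduces its own advanced features such as attention adjustment, prompt matrices, negative prompts, and \"high-res fix\". It also ships face-restoration tools such as GFPGAN and CodeFormer, supports multiple neural-network upscalers, and can be extended without limit through its extension system. Even on devices with limited VRAM, stable-diffusion-webui offers optimization options that put high-quality AI art within reach.",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code is a high-performance optimization system built for AI coding assistants such as Claude Code, Codex, and Cursor. More than a set of config files, it is a complete framework refined through extensive real-world use. It targets the core pain points AI agents face in real development: low efficiency, memory loss, security risks, and a lack of continuous learning.\n\neverything-claude-code adds modular skills, instinct enhancement, persistent memory, and built-in security scanning. Together these can significantly improve AI performance on complex tasks and help developers build more stable, smarter production-grade AI agents. Its \"research-first\" development philosophy and token-consumption optimizations make model responses faster and cheaper while defending against potential attack vectors.\n\nThe toolkit is especially suited to software developers, AI researchers, and technical teams that want to deeply customize their AI workflows. Whether you are building a large codebase or need AI help with security audits and automated testing, everything-claude-code provides strong underlying support. It is an open-source project that won an Anthropic hackathon award, and it combines multi-language support with a rich set of battle-tested hooks, letting AI truly grow into an assistant that understands",156804,2,"2026-04-15T11:34:33",[14,13,35],"Language Model",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI is a powerful, highly modular visual AI engine built for designing and running complex Stable Diffusion image-generation workflows. Instead of traditional code, it uses an intuitive node-based flowchart interface, so users build personalized generation pipelines by connecting functional modules.\n\nThis design solves the complexity and inflexibility of configuring advanced AI image workflows. Users without a programming background can freely combine models, tune parameters, and preview results in real time. Tasks range from basic text-to-image to multi-step high-resolution refinement. ComfyUI is highly compatible. It runs on Windows, macOS, and Linux and supports a wide range of hardware, including NVIDIA, AMD, Intel, and Apple Silicon. It was also among the first to support cutting-edge models such as SDXL, Flux, and SD3.\n\nComfyUI offers strong support both to researchers and developers exploring what the algorithms can do and to designers and seasoned AI-art enthusiasts seeking maximum creative freedom. Its modular architecture lets the community keep adding features. This makes it one of the most flexible open-source diffusion-model tools available, with one of the richest ecosystems, and helps users turn ideas into reality efficiently.",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli is an open-source AI command-line tool from Google that brings the Gemini models directly into your terminal. For developers who work on the command line, it offers the shortest path from prompt to model response, with AI assistance available without switching windows.\n\nThe tool addresses the cost of frequent context switching during development. Users can understand, generate, and debug code and run automated operations tasks directly in a familiar terminal. gemini-cli handles requests such as querying a large codebase, generating an app from a sketch, or performing complex Git operations through natural-language instructions.\n\nIt is especially suited to software engineers, DevOps practitioners, and technical researchers. It has a context window of up to 1 million tokens with strong reasoning ability. Built-in tools include Google Search, file operations, and shell command execution. It also supports MCP (Model Context Protocol), which lets users add custom integrations that connect external capabilities such as image generation. A personal Google account includes a free usage quota, and the project is fully open source under the Apache 2.0 license, making it an ideal assistant for terminal productivity.",100752,"2026-04-10T01:20:03",[52,13,15,14],"Plugin",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown is a lightweight Python tool from Microsoft's AutoGen team, designed to convert all kinds of files into Markdown efficiently. It parses PDF, Word, Excel, PowerPoint, images (with OCR), audio (with speech transcription), HTML, and even YouTube links. It accurately extracts key structure such as headings, lists, tables, and links.\n\nLarge language models (LLMs) handle text well but struggle to read complex binary office documents directly. MarkItDown fills this gap by turning unstructured or semi-structured files into Markdown, a format that models understand natively and that is highly token-efficient. This makes it an ideal bridge between local files and AI analysis pipelines. It also provides an MCP (Model Context Protocol) server for seamless integration into LLM applications such as Claude Desktop.\n\nThe tool is especially suited to developers, data scientists, and AI researchers. It fits users who build retrieval-augmented generation (RAG) systems over documents, run batch text analysis, or want AI assistants to \"read\" local files directly. Its output is reasonably human-readable, but its core strength lies in serving machines",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":72,"owner_avatar_url":73,"owner_bio":74,"owner_company":75,"owner_location":75,"owner_email":75,"owner_twitter":75,"owner_website":76,"owner_url":77,"languages":78,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":32,"env_os":95,"env_gpu":96,"env_ram":95,"env_deps":97,"category_tags":103,"github_topics":105,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":117,"updated_at":118,"faqs":119,"releases":151},7844,"NVIDIA-NeMo\u002FDataDesigner","DataDesigner","🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.","DataDesigner is an open-source tool from NVIDIA NeMo that helps developers generate high-quality synthetic datasets from scratch or from seed data. Going beyond simple LLM prompting, it addresses common AI-training problems: real data that is scarce, privacy-sensitive, or unevenly distributed. Users can build production-grade data with diverse statistical distributions and logical correlations between fields.\n\nThe tool is especially suited to AI researchers, data engineers, and developers who need custom data. Its core strength is a flexible generation framework. Users can create data with statistical samplers or with large language models (LLMs) and precisely control the dependencies between fields. To keep the data usable, DataDesigner has built-in validation that supports local or remote checks via Python code, SQL queries, and custom rules. It can also score output quality automatically using the \"LLM-as-a-judge\" technique. A preview mode lets users iterate on and refine their configuration quickly before generating at scale. DataDesigner provides an efficient, controllable data solution both for educational scenarios that need domain-specific data and for model training that demands high robustness.","# 🎨 NeMo Data Designer\n\n[![CI](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Factions\u002Fworkflows\u002Fci.yml)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n[![Python 3.10 - 3.13](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🐍_Python-3.10_|_3.11_|_3.12_|_3.13-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F) [![NeMo Microservices](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNeMo-Microservices-76b900)](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fmicroservices\u002Flatest\u002Findex.html) [![Code](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode-Documentation-8A2BE2.svg)](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002F) ![Tokens](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F250+_Billion-Tokens_Generated-76b900.svg?logo=nvidia&logoColor=white)\n\n**Generate high-quality synthetic datasets from scratch or using your own seed data.**\n\n---\n\n## Welcome!\n\nData Designer helps you create synthetic datasets that go beyond simple LLM prompting. 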
Whether you need diverse statistical distributions, meaningful correlations between fields, or validated high-quality outputs, Data Designer provides a flexible framework for building production-grade synthetic data.\n\n## What can you do with Data Designer?\n\n- **Generate diverse data** using statistical samplers, LLMs, or existing seed datasets\n- **Control relationships** between fields with dependency-aware generation\n- **Validate quality** with built-in Python, SQL, and custom local and remote validators\n- **Score outputs** using LLM-as-a-judge for quality assessment\n- **Iterate quickly** with preview mode before full-scale generation\n\n---\n\n### ⚠️ Security Notice: LiteLLM Supply-Chain Incident (2026-03-24)\n\nOn March 24, 2026, malicious versions of `litellm` ([1.82.7 and 1.82.8](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm\u002Fissues\u002F24518)) were published to PyPI containing a credential stealer. The compromised packages were available for [approximately five hours](https:\u002F\u002Fwww.okta.com\u002Fblog\u002Fthreat-intelligence\u002Flitellm-supply-chain-attack--an-explainer-for-identity-pros\u002F) (10:39 – 16:00 UTC) before being removed.\n\nThe only Data Designer releases that could resolve to these versions are **v0.2.2** (Dec 2025) and **v0.2.3** (Jan 2026), which carried a looser `litellm\u003C2` upper bound. These are nearly three months old and have been superseded by eight subsequent releases — both have been yanked from PyPI as a precaution. All other releases (v0.3.0 – v0.5.3) pinned `litellm` to `>=1.73.6,\u003C1.80.12` and were never compatible with 1.82.x. Starting with v0.5.4, `litellm` is no longer a dependency.\n\nTo have been impacted through Data Designer, you would need to have had one of these two old versions explicitly pinned *and* run a fresh `pip install` or dependency-cache update that resolved `litellm` during the five-hour window on March 24. 
If you believe you may be affected, see [BerriAI's incident report](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm\u002Fissues\u002F24518) for remediation steps.\n\n---\n\n## Quick Start\n\n### 1. Install\n\n```bash\npip install data-designer\n```\n\nOr install from source:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner.git\ncd DataDesigner\nmake install\n```\n\n### 2. Set your API key\n\nStart with one of our default model providers:\n\n- [NVIDIA Build API](https:\u002F\u002Fbuild.nvidia.com)\n- [OpenAI](https:\u002F\u002Fplatform.openai.com\u002Fapi-keys)\n- [OpenRouter](https:\u002F\u002Fopenrouter.ai)\n\nGrab your API key(s) using the above links and set one or more of the following environment variables:\n```bash\nexport NVIDIA_API_KEY=\"your-api-key-here\"\n\nexport OPENAI_API_KEY=\"your-openai-api-key-here\"\n\nexport OPENROUTER_API_KEY=\"your-openrouter-api-key-here\"\n```\n\n### 3. Start generating data!\n```python\nimport data_designer.config as dd\nfrom data_designer.interface import DataDesigner\n\n# Initialize with default settings\ndata_designer = DataDesigner()\nconfig_builder = dd.DataDesignerConfigBuilder()\n\n# Add a product category\nconfig_builder.add_column(\n    dd.SamplerColumnConfig(\n        name=\"product_category\",\n        sampler_type=dd.SamplerType.CATEGORY,\n        params=dd.CategorySamplerParams(\n            values=[\"Electronics\", \"Clothing\", \"Home & Kitchen\", \"Books\"],\n        ),\n    )\n)\n\n# Generate personalized customer reviews\nconfig_builder.add_column(\n    dd.LLMTextColumnConfig(\n        name=\"review\",\n        model_alias=\"nvidia-text\",\n        prompt=\"Write a brief product review for a {{ product_category }} item you recently purchased.\",\n    )\n)\n\n# Preview your dataset\npreview = data_designer.preview(config_builder=config_builder)\npreview.display_sample_record()\n```\n\n---\n\n## What's next?\n\n### 📚 Learn more\n\n- **[Getting 
Started](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002F)** – Install, configure, and generate your first dataset\n- **[Tutorial Notebooks](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002Fnotebooks\u002F)** – Step-by-step interactive tutorials\n- **[Column Types](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002Fconcepts\u002Fcolumns\u002F)** – Explore samplers, LLM columns, validators, and more\n- **[Validators](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002Fconcepts\u002Fvalidators\u002F)** – Learn how to validate generated data with Python, SQL, and remote validators\n- **[Model Configuration](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002Fconcepts\u002Fmodels\u002Fmodel-configs\u002F)** – Configure custom models and providers\n- **[Person Sampling](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002Fconcepts\u002Fperson_sampling\u002F)** – Learn how to sample realistic person data with demographic attributes\n\n### 🔧 Configure models via CLI\n\n```bash\ndata-designer config providers # Configure model providers\ndata-designer config models    # Set up your model configurations\ndata-designer config list      # View current settings\n```\n\n### 🤖 Agent Skill\n\nData Designer has a [skill](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002Fdevnotes\u002Fdata-designer-got-skills\u002F) for coding agents. Just describe the dataset you want, and your agent handles schema design, validation, and generation. 
While the skill should work with other coding agents that support skills, our development and testing have focused on [Claude Code](https:\u002F\u002Fcode.claude.com) at this stage.\n\n**Install via [skills.sh](https:\u002F\u002Fskills.sh)** (be sure to select Claude Code as an additional agent):\n\n```bash\nnpx skills add NVIDIA-NeMo\u002FDataDesigner\n```\n\nAfter installation, type `\u002Fdata-designer` or describe the dataset you want, and the skill will kick in.\n\n### 🤝 Get involved\n\nThis repository supports agent-assisted development; see [CONTRIBUTING.md](CONTRIBUTING.md) for the recommended workflow.\n\n- **[Contributing Guide](CONTRIBUTING.md)** – How to contribute, including agent-assisted workflows\n- **[GitHub Issues](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fissues)** – Report bugs or make a feature request\n\n---\n\n## Telemetry\n\nData Designer collects telemetry to help us improve the library for developers. We collect:\n\n* The names of models used\n* The count of input tokens\n* The count of output tokens\n\n**No user or device information is collected.** This data is not used to track any individual user's behavior; it is used only to see, in aggregate, which models are most popular for synthetic data generation (SDG). We will share this usage data with the community.\n\nSpecifically, the model name defined in a `ModelConfig` object is what gets collected. 
In the example config below:\n\n```python\nModelConfig(\n    alias=\"nv-reasoning\",\n    model=\"openai\u002Fgpt-oss-20b\",\n    provider=\"nvidia\",\n    inference_parameters=ChatCompletionInferenceParams(\n        temperature=0.3,\n        top_p=0.9,\n        max_tokens=4096,\n    ),\n)\n```\n\nThe value `openai\u002Fgpt-oss-20b` would be collected.\n\nTo disable telemetry capture, set `NEMO_TELEMETRY_ENABLED=false`.\n\n### Top Models\n\nThis chart represents the breakdown of models used for Data Designer across all synthetic data generation jobs from 2\u002F23\u002F2026 to 3\u002F23\u002F2026.\n\n![Top models used for synthetic data generation](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVIDIA-NeMo_DataDesigner_readme_31eed3c02d77.png)\n\n_Last updated on 3\u002F23\u002F2026_\n\n---\n\n## License\n\nApache License 2.0 – see [LICENSE](LICENSE) for details.\n\n---\n\n## Citation\n\nIf you use NeMo Data Designer in your research, please cite it using the following BibTeX entry:\n\n```bibtex\n@misc{nemo-data-designer,\n  author = {The NeMo Data Designer Team, NVIDIA},\n  title = {NeMo Data Designer: A framework for generating synthetic data from scratch or based on your own seed data},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner}},\n  year = {2025},\n  note = {GitHub Repository},\n}\n```\n","# 🎨 NeMo Data Designer\n\n[![CI](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Factions\u002Fworkflows\u002Fci.yml)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n[![Python 3.10 - 3.13](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🐍_Python-3.10_|_3.11_|_3.12_|_3.13-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F) [![NeMo Microservices](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNeMo-Microservices-76b900)](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fmicroservices\u002Flatest\u002Findex.html) [![Code](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode-Documentation-8A2BE2.svg)](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002F) ![Tokens](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F250+_Billion-Tokens_Generated-76b900.svg?logo=nvidia&logoColor=white)\n\n**Generate high-quality synthetic datasets from scratch or using your own seed data.**\n\n---\n\n## Welcome!\n\nData Designer helps you create synthetic datasets that go beyond simple LLM prompting. Whether you need diverse statistical distributions, meaningful correlations between fields, or validated high-quality outputs, Data Designer provides a flexible framework for building production-grade synthetic data.\n\n## What can you do with Data Designer?\n\n- **Generate diverse data** using statistical samplers, LLMs, or existing seed datasets\n- **Control relationships** between fields with dependency-aware generation\n- **Validate quality** with built-in Python, SQL, and custom local and remote validators\n- **Score outputs** using LLM-as-a-judge for quality assessment\n- **Iterate quickly** with preview mode before full-scale generation\n\n---\n\n### ⚠️ Security Notice: LiteLLM Supply-Chain Incident (2026-03-24)\n\nOn March 24, 2026, malicious versions of `litellm` ([1.82.7 and 1.82.8](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm\u002Fissues\u002F24518)) containing a credential stealer were published to PyPI. The compromised packages were available for [approximately five hours](https:\u002F\u002Fwww.okta.com\u002Fblog\u002Fthreat-intelligence\u002Flitellm-supply-chain-attack--an-explainer-for-identity-pros\u002F) (10:39 – 16:00 UTC) before being removed.\n\nThe only Data Designer releases that could resolve to these versions are **v0.2.2** (December 2025) and **v0.2.3** (January 2026), which carried a looser `litellm\u003C2` upper bound. These are nearly three months old and have been superseded by eight subsequent releases; both have been yanked from PyPI as a precaution. All other releases (v0.3.0 – v0.5.3) pinned `litellm` to `>=1.73.6,\u003C1.80.12` and were never compatible with 1.82.x. Starting with v0.5.4, `litellm` is no longer a dependency.\n\nTo have been impacted through Data Designer, you would need to have explicitly pinned one of these two old versions *and* run a fresh `pip install` or dependency-cache update that resolved `litellm` during the five-hour window on March 24. If you believe you may be affected, see [BerriAI's incident report](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm\u002Fissues\u002F24518) for remediation steps.\n\n---\n\n## Quick Start\n\n### 1. Install\n\n```bash\npip install data-designer\n```\n\nOr install from source:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner.git\ncd DataDesigner\nmake install\n```\n\n### 2. Set your API key\n\nStart with one of our default model providers:\n\n- [NVIDIA Build API](https:\u002F\u002Fbuild.nvidia.com)\n- [OpenAI](https:\u002F\u002Fplatform.openai.com\u002Fapi-keys)\n- [OpenRouter](https:\u002F\u002Fopenrouter.ai)\n\nGet your API key(s) from the links above and set one or more of the following environment variables:\n```bash\nexport NVIDIA_API_KEY=\"your-api-key-here\"\n\nexport OPENAI_API_KEY=\"your-openai-api-key-here\"\n\nexport OPENROUTER_API_KEY=\"your-openrouter-api-key-here\"\n```\n\n### 3. Start generating data!\n```python\nimport data_designer.config as dd\nfrom data_designer.interface import DataDesigner\n\n# Initialize with default settings\ndata_designer = DataDesigner()\nconfig_builder = dd.DataDesignerConfigBuilder()\n\n# Add a product category\nconfig_builder.add_column(\n    dd.SamplerColumnConfig(\n        name=\"product_category\",\n        sampler_type=dd.SamplerType.CATEGORY,\n        params=dd.CategorySamplerParams(\n            values=[\"Electronics\", \"Clothing\", \"Home & Kitchen\", \"Books\"],\n        ),\n    )\n)\n\n# Generate personalized customer reviews\nconfig_builder.add_column(\n    dd.LLMTextColumnConfig(\n        name=\"review\",\n        model_alias=\"nvidia-text\",\n        prompt=\"Write a brief product review for a {{ product_category }} item you recently purchased.\",\n    )\n)\n\n# Preview your dataset\npreview = data_designer.preview(config_builder=config_builder)\npreview.display_sample_record()\n```\n\n---\n\n## What's next?\n\n### 📚 Learn more\n\n- **[Getting Started](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002F)** – Install, configure, and generate your first dataset\n- **[Tutorial Notebooks](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002Fnotebooks\u002F)** – Step-by-step interactive tutorials\n- **[Column Types](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002Fconcepts\u002Fcolumns\u002F)** – Explore samplers, LLM columns, validators, and more\n- **[Validators](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002Fconcepts\u002Fvalidators\u002F)** – Learn how to validate generated data with Python, SQL, and remote validators\n- **[Model Configuration](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002Fconcepts\u002Fmodels\u002Fmodel-configs\u002F)** – Configure custom models and providers\n- **[Person Sampling](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002Fconcepts\u002Fperson_sampling\u002F)** – Learn how to sample realistic person data with demographic attributes\n\n### 🔧 Configure models via CLI\n\n```bash\ndata-designer config providers # Configure model providers\ndata-designer config models    # Set up your model configurations\ndata-designer config list      # View current settings\n```\n\n### 🤖 Agent Skill\n\nData Designer has a [skill](https:\u002F\u002Fnvidia-nemo.github.io\u002FDataDesigner\u002Flatest\u002Fdevnotes\u002Fdata-designer-got-skills\u002F) for coding agents. Just describe the dataset you want, and your agent handles schema design, validation, and generation. Although the skill should work with other coding agents that support skills, our development and testing have so far focused on [Claude Code](https:\u002F\u002Fcode.claude.com).\n\n**Install via [skills.sh](https:\u002F\u002Fskills.sh)** (be sure to select Claude Code as an additional agent):\n\n```bash\nnpx skills add NVIDIA-NeMo\u002FDataDesigner\n```\n\nAfter installation, type `\u002Fdata-designer` or describe the dataset you want, and the skill will kick in.\n\n### 🤝 Get involved\n\nThis repository supports agent-assisted development; see [CONTRIBUTING.md](CONTRIBUTING.md) for the recommended workflow.\n\n- **[Contributing Guide](CONTRIBUTING.md)** – How to contribute, including agent-assisted workflows\n- **[GitHub Issues](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fissues)** – Report bugs or make a feature request\n\n---\n\n## Telemetry\n\nData Designer collects telemetry to help us improve the library for developers. We collect:\n\n* The names of models used\n* The count of input tokens\n* The count of output tokens\n\n**No user or device information is collected.** This data is not used to track any individual user's behavior; it is used only to see, in aggregate, which models are most popular for synthetic data generation (SDG). We will share this usage data with the community.\n\nSpecifically, the model name defined in a `ModelConfig` object is what gets collected. For example, in the config below:\n\n```python\nModelConfig(\n    alias=\"nv-reasoning\",\n    model=\"openai\u002Fgpt-oss-20b\",\n    provider=\"nvidia\",\n    inference_parameters=ChatCompletionInferenceParams(\n        temperature=0.3,\n        top_p=0.9,\n        max_tokens=4096,\n    ),\n)\n```\n\nThe value `openai\u002Fgpt-oss-20b` would be collected.\n\nTo disable telemetry capture, set `NEMO_TELEMETRY_ENABLED=false`.\n\n### Top Models\n\nThis chart shows the breakdown of models used across all Data Designer synthetic data generation jobs from February 23, 2026 to March 23, 2026.\n\n![Top models used for synthetic data generation](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVIDIA-NeMo_DataDesigner_readme_31eed3c02d77.png)\n\n*Last updated on March 23, 2026*\n\n---\n\n## License\n\nApache License 2.0 – see [LICENSE](LICENSE) for details.\n\n---\n\n## Citation\n\nIf you use NeMo Data Designer in your research, please cite it using the following BibTeX entry:\n\n```bibtex\n@misc{nemo-data-designer,\n  author = {The NeMo Data Designer Team, NVIDIA},\n  title = {NeMo Data Designer: A framework for generating synthetic data from scratch or based on your own seed data},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner}},\n  year = {2025},\n  note = {GitHub Repository},\n}\n```","# DataDesigner Quick Start Guide\n\nDataDesigner is an open-source tool from NVIDIA NeMo that helps developers generate high-quality synthetic datasets from scratch or from seed data. It goes beyond simple LLM prompting, with control over statistical distributions, explicit dependencies between fields, and rigorous data-quality validation.\n\n## Prerequisites\n\nBefore you begin, make sure your development environment meets the following requirements:\n\n*   **Operating system**: Linux, macOS, or Windows (WSL recommended)\n*   **Python version**: 3.10, 3.11, 3.12, or 3.13\n*   **API key**: You need an API key from at least one supported model provider (such as NVIDIA Build API, OpenAI, or OpenRouter).\n\n> **Security note**: Install the latest version to avoid the known supply-chain security risk. Starting with v0.5.4, the project no longer depends on `litellm`, the package affected by the incident.\n\n## Installation\n\nYou can install directly from PyPI or from source.\n\n### Option 1: Install with pip (recommended)\n\n```bash\npip install data-designer\n```\n\n*(Users in mainland China who experience slow downloads can try the Tsinghua mirror: `pip install data-designer -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`)*\n\n### Option 2: Install from source\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner.git\ncd DataDesigner\nmake install\n```\n\n## Basic Usage\n\nThe following is a minimal example that generates a synthetic dataset containing a \"product category\" column and an \"AI-generated review\" column.\n\n### 1. Configure your API key\n\nSet an environment variable in your terminal (any one of these is enough):\n\n```bash\nexport NVIDIA_API_KEY=\"your-api-key-here\"\n# or\nexport OPENAI_API_KEY=\"your-openai-api-key-here\"\n# or\nexport OPENROUTER_API_KEY=\"your-openrouter-api-key-here\"\n```\n\n### 2. Write the generation script\n\nCreate a `generate_data.py` file with the following code:\n\n```python\nimport data_designer.config as dd\nfrom data_designer.interface import DataDesigner\n\n# Initialize the tool\ndata_designer = DataDesigner()\nconfig_builder = dd.DataDesignerConfigBuilder()\n\n# Column 1: product category (statistical sampler)\nconfig_builder.add_column(\n    dd.SamplerColumnConfig(\n        name=\"product_category\",\n        sampler_type=dd.SamplerType.CATEGORY,\n        params=dd.CategorySamplerParams(\n            values=[\"Electronics\", \"Clothing\", \"Home & Kitchen\", \"Books\"],\n        ),\n    )\n)\n\n# Column 2: user review (LLM-generated; depends on the previous column)\nconfig_builder.add_column(\n    dd.LLMTextColumnConfig(\n        name=\"review\",\n        model_alias=\"nvidia-text\",\n        prompt=\"Write a brief product review for a {{ product_category }} item you recently purchased.\",\n    )\n)\n\n# Preview a sample of the generated data\npreview = data_designer.preview(config_builder=config_builder)\npreview.display_sample_record()\n```\n\n### 3. Run it and inspect the result\n\nRun the script in your terminal:\n\n```bash\npython generate_data.py\n```\n\nThe script prints one synthetic sample record, showing the review generated for the sampled product category. Once you are satisfied with the output, you can move on to full-scale generation to produce data in bulk.","A fintech company is building a smart personal-finance assistant for young users. To train its model, it urgently needs large volumes of synthetic transaction data with complex income-and-expense relationships and realistic statistical distributions. It must also strictly avoid any leakage of real users' private data.\n\n### Without DataDesigner\n- Hard to get data: the team relies on a small amount of de-identified real data or random numbers from simple scripts. These lack sensible logical relationships between fields (for example, higher income usually goes with particular kinds of investment spending).\n- Tedious quality checks: the team must hand-write many SQL and Python scripts to verify data consistency, which is slow and tends to miss edge cases.\n- Narrow, rigid distributions: long-tail distributions and extreme cases in specific scenarios are hard to simulate, so the model performs poorly in rare but important financial situations.\n- Slow iteration: every change to the data's characteristics means rerunning the whole generation and validation pipeline with no quick preview, which severely slows development.\n\n### With DataDesigner\n- Correlated generation: dependency-aware generation makes it easy to define complex relationships between \"income\" and \"spending category\" and automatically produces realistic, logically consistent transaction records.\n- Built-in multi-dimensional validation: the built-in Python and SQL validators reject data that violates business rules as it is generated, so the output meets requirements from the start.\n- Flexible distribution control: statistical samplers give precise control over data distributions. They cover common scenarios and can also generate scarce extreme cases on demand to make the model more robust.\n- Fast preview and iteration: preview mode generates and scores a small batch first, and full-scale production runs only after the results check out, cutting data preparation from days to hours.\n\nDataDesigner let the team efficiently build a high-quality, logically consistent, production-grade synthetic dataset without exposing any real user data. This noticeably improved the finance model's training results and shortened its time to launch.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVIDIA-NeMo_DataDesigner_ed3ae440.png","NVIDIA-NeMo","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FNVIDIA-NeMo_ef2128b9.png","",null,"https:\u002F\u002Fnvidia.com\u002F","https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo",[79,83,87],{"name":80,"color":81,"percentage":82},"Python","#3572A5",98.6,{"name":84,"color":85,"percentage":86},"Makefile","#427819",0.8,{"name":88,"color":89,"percentage":90},"Shell","#89e051",0.5,1620,140,"2026-04-15T09:59:25","Apache-2.0","Not specified","Not required (generation relies mainly on cloud APIs such as NVIDIA Build API, OpenAI, and OpenRouter)",{"notes":98,"python":99,"dependencies":100},"The tool runs mainly by calling external LLM APIs (such as NVIDIA, OpenAI, and OpenRouter), so no high-performance local GPU is needed. Set the corresponding API key environment variables. Avoid the two old versions that could resolve to the compromised litellm releases (v0.2.2, v0.2.3); v0.5.4 or later is recommended. Model providers can be configured via the CLI, and the tool can also be integrated into coding agents such as Claude Code. Telemetry on model usage is enabled by default and can be disabled by setting NEMO_TELEMETRY_ENABLED=false.","3.10, 3.11, 3.12, 3.13",[101,102],"litellm (removed as of v0.5.4)","data-designer",[13,104,16,14,35],"Other",[106,107,108,109,110,111,112,113,114,115,116],"agentic-ai","data-augmentation","data-generation","llm","mcp","multimodal","nemo","nvidia","synthetic-data","tool-use","sdg","2026-03-27T02:49:30.150509","2026-04-16T01:43:25.420459",[120,125,129,134,138,143,147],{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},35162,"How do I contribute a third-party plugin to Data Designer and get it added to the official list of available plugins?","Third-party plugin contributions are very welcome. The plugin design is still being refined, so refer to the project's end-to-end (e2e) tests for the latest plugin implementation patterns. If you build a useful plugin (for example, a declarative column generator), contact the maintainers to have it added to the \"Available Plugin List\" in the documentation.","https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fissues\u002F281",{"id":126,"question_zh":127,"answer_zh":128,"source_url":124},35163,"What should I do if I get a \"circular import\" or \"class not found\" error when importing plugin configs?","This is usually a race condition caused by accessing the plugin config module before it has fully initialized. If possible, refactor the plugin to accept the full set of columns rather than building them one at a time, which avoids unnecessary module reloads. Also make sure you follow the latest plugin implementation patterns (see the project's e2e test code), since the plugin interface is still experimental and stabilizing.",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},35164,"How can I use Data Designer to generate privacy-preserving synthetic data for regulated environments (such as healthcare or inspections)?","For scenarios that must avoid exposing PII (personally identifiable information), consider combining NeMo Agent Toolkit with the private synthetic data generation microservice. You can first generate the tabular data synthetically and then handle the text\u002FLLM columns (such as technician notes) separately. For rare, high-risk cases, use scenario templates to force specific combinations (such as high contamination + school setting) so the agent learns to recognize these critical situations.","https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fissues\u002F98",{"id":135,"question_zh":136,"answer_zh":137,"source_url":133},35165,"Is there a recommended way to generate synthetic data for minority groups or edge cases?","For rare but important edge cases (such as extreme contamination), use scenario templates to force specific data combinations, or oversample\u002Fweight particular categories and value ranges. This helps ensure the synthetic dataset contains enough training examples for an anomaly-detection model to learn from.",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},35166,"Can I add custom filters to Jinja templates (for example, to shuffle a list) to increase prompt diversity?","There are currently no plans to add custom Jinja filters (such as shuffle) directly to the core library, to avoid feature sprawl and maintenance burden. Instead, achieve a similar effect by other means, such as prompt rendering recipes, or standard Jinja features combined with list data types.","https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fissues\u002F88",{"id":144,"question_zh":145,"answer_zh":146,"source_url":124},35167,"What advanced features are on the Data Designer roadmap?","The development team plans to introduce higher-level abstractions in future releases. These include parameterized column templates, a DSL (domain-specific language) that takes precedence over schemas, composable snippets, variants\u002Ffeature flags, column bundles for common domains, and possibly even defining columns in natural language.",{"id":148,"question_zh":149,"answer_zh":150,"source_url":133},35168,"Where can I find discussion and support for general questions about the NeMo framework?","For general NeMo questions, the recommended place to ask is the NVIDIA Developer Forums (forums.developer.nvidia.com). For more technical detail, see the NeMo Agent Toolkit documentation, the private synthetic data generation guide, and the AutoModel VLM coverage documentation.",[152,157,162,167,172,177,182,187,192,197,202,207,212,217,222,227,232,237,242,247],{"id":153,"version":154,"summary_zh":155,"released_at":156},280171,"v0.5.6","## What's Changed\n* fix: use `--bare` and `--tools` flags in the health probe CLI check by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F489\n* feat: add ATIF deployment import by @eric-tramel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F495\n* fix: replace the native-model-client-hero image with a corrected version by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F492\n* docs: add skip column config option for conditional column generation (#479) by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F480\n* chore: plan for task 427, PR 2 of the agent-centric development plan by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F478\n* feat: add Hermes Agent deployment support by @eric-tramel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F500\n* fix: prevent skill loading failure when the data-designer CLI is not installed by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F501\n* ci: add PR review workflow and agent-based CI recipe by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F498\n* docs: add agent deployment import docs entry by @eric-tramel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F499\n* docs: add async engine dev notes by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F490\n* fix: use non-blocking dispatch to prevent pipeline starvation by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F505\n* feat: add Pi Coding Agent deployment seed source by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F514\n* fix: always return ISO-8601 from datetime post-processing (#484) by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F512\n* fix: include multi_modal_context columns in required_columns by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F522\n* docs: add LiteLLM supply-chain incident notice to README by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F516\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.5.5...v0.5.6","2026-04-09T19:36:55",{"id":158,"version":159,"summary_zh":160,"released_at":161},280172,"v0.5.5","## What's Changed\n* fix: Claude Code marketplace plugin structure and install docs by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F458\n* docs: update dev notes with TL;DR tips and an install guide by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F461\n* chore: remove unused .claude-plugin directory by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F463\n* chore: async engine follow-ups (renames, preview, lifecycle, and progress tracking) by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F456\n* docs: reorganize agent and contributor docs (plan 427, PR 1) by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F454\n* fix: address nSight vulnerability reports for requests and cryptography by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F475\n* fix: update health checks to use the new ModelFacade client API by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F470\n* ci: upgrade GitHub Actions for Node.js 24 compatibility by @ko3n1g in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F450\n* chore: reduce Greptile review noise from defensive-programming suggestions by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F423\n* docs: consolidate seed reader docs by @eric-tramel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F481\n* fix: upgrade pymdown-extensions for Pygments 2.20.0 compatibility by @eric-tramel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F482\n* feat: add fr_FR locale support to the Nemotron personas dataset by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F468\n* docs: add native model client dev notes by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F465\n* fix: respect max_parallel_requests in the HTTP connection pool size by @przemekboruta in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F460\n* docs: center diagram images in the native model client dev notes by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F483\n* docs: update architecture-and-performance.md to reflect AIMD changes by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F467\n* chore: update the code-review skill output and tone by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F477\n* ci: add agentic CI plan, health probe workflow, and recipe scaffolding by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F473\n* test: add transport wire regression tests for #459 by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F485\n\n## New Contributors\n* @ko3n1g made their first contribution in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F450\n* @przemekboruta made their first contribution in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F460\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.5.4...v0.5.5","2026-04-02T16:31:26",{"id":163,"version":164,"summary_zh":165,"released_at":166},280173,"v0.5.4","### 🔒 Note on the LiteLLM Supply-Chain Incident (2026-03-24)\n\nEarlier today, malicious versions of `litellm` (1.82.7 and 1.82.8) were [published to PyPI](https:\u002F\u002Fthehackernews.com\u002F2026\u002F03\u002Fteampcp-backdoors-litellm-versions.html). They contain a credential stealer that exfiltrates cloud credentials, SSH keys, and cryptocurrency wallets whenever any Python process starts.\n\n**Data Designer v0.5.4 removes the `litellm` dependency entirely.** We recommend that all users upgrade as soon as possible.\n\nFor users on older versions, our assessment is:\n\n- **v0.3.0 – v0.5.3**: `litellm` has been pinned to `>=1.73.6,\u003C1.80.12` since January, so **there is no risk of exposure to the compromised versions**.\n- **v0.2.2 and v0.2.3**: These two versions briefly carried a `\u003C2` upper bound and could theoretically resolve to the malicious 1.82.x versions. As a precaution, both have been **yanked from PyPI**.\n- **The practical risk to Data Designer users is very low.** You could only have been affected if you were still pinned to one of those two yanked versions. You would also have needed to run a fresh install or dependency update during the few hours this morning when the compromised packages were live. Even so, please check your installed version and upgrade Data Designer to v0.5.4 as soon as possible.\n\n\n## What's Changed\n* fix: broken dev notes links on recipe pages by @dhruvnathawani in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F407\n* docs: add trace visualization to display_sample_record (#396) by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F397\n* chore: simplify the image dataset in tutorial 4 and use default model configs by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F403\n* fix: preserve extra_body for LiteLLM to avoid UnsupportedParamsError (#409) by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F412\n* feat: normalize validator and constraint discriminators by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F414\n* docs: add \"Open in Colab\" badges to tutorial notebooks by @mvansegbroeck in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F391\n* feat: simplified agent CLI introspection by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F415\n* fix: raise the minimum litellm version to >=1.77.0 by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F417\n* feat: improve generation failure reporting for schema and timeout failures by @eric-tramel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F416\n* feat: add built-in filesystem seed readers by @eric-tramel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F421\n* feat: add AsyncTaskScheduler and RowGroupBufferManager for the async engine by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F404\n* feat: native OpenAI adapter with retry and AIMD throttling infrastructure by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F402\n* refactor: simplify the agent CLI into context, types, and state sections by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F420\n*","2026-03-25T02:02:00",{"id":168,"version":169,"summary_zh":170,"released_at":171},280174,"v0.5.3","## What's Changed\n* fix: cache notebook builds to avoid flakiness from upstream model failures by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F370\n* feat: standardize model client types, protocols, and the LiteLLM bridge adapter by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F359\n* fix: processor artifact types, discovery, and loading by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F366\n* feat: add ExecutionGraph, CompletionTracker, and Task models for the async scheduler by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F356\n* fix: handle discriminated unions in the oneOf pruning validator by 
@andreatgretel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F376 中完成。\n* 文档：在计划 343 中考虑 vLLM 推理字段的迁移，由 @nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F377 中完成。\n* 杂项：添加 Claude Code 技能用于代码审查，由 @nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F372 中完成。\n* 修复：将已移除的 DuckDB record_batch() 替换为 to_arrow_reader()，由 @andreatgretel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F380 中完成。\n* 修复：修补 litellm 的 ImageURLListItem，使 index 字段变为可选（#384），由 @nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F385 中完成。\n* 杂项：改进 AGENTS.md 中的测试指南，由 @nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F387 中完成。\n* 修复：当生成过程中所有记录都被丢弃时，抛出清晰的错误，由 @nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F383 中完成。\n* 功能：添加具有对称桥接和状态保持功能的异步生成器迁移，由 @andreatgretel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F378 中完成。\n* 文档：添加企业级 Text-to-SQL 和搜索代理配方，由 @dhruvnathawani 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F395 中完成。\n* 重构：通过 ModelClient 适配器将 ModelFacade 与 LiteLLM 解耦，由 @nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F373 中完成。\n* 文档：搜索代理开发说明，由 @dhruvnathawani 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F350 中完成。\n* 功能（CLI）：在 CLI 启动时引导默认配置，由 @johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F401 中完成。\n* 修复：将 chardet 版本固定为 \u003C6，以抑制 RequestsDependencyWarning 警告，由 @andreatgretel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F405 中完成。\n* 修复：在发布的引擎包中添加 chardet\u003C6 的约束，由 @johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F406 
中完成。\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.5.2...v0.5.3","2026-03-12T23:12:27",{"id":173,"version":174,"summary_zh":175,"released_at":176},280175,"v0.5.2","## 变更内容\n* 修复：由 @andreatgretel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F348 中修复笔记本 CI（模型失效、缺少 API 密钥、pyarrow 类型错误）\n* 文档：由 @kirit93 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F353 中更新 2026 年 1 月 24 日至 2 月 24 日的顶级模型使用图表\n* 文档：由 @dhruvnathawani 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F338 中添加结构化输出的 SDG 开发笔记\n* 功能：由 @andreatgretel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F299 中添加处理器插件支持\n* 杂项：由 @andreatgretel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F347 中规划异步生成器和任务队列数据集构建器\n* 杂项：由 @nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F344 中规划模型 facade 的全面重构\n* 修复：由 @johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F361 中将种子数据集纳入仅包含种子配置的构建器表示中\n* 杂项：由 @johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F364 中升级 cryptography 和 pillow 以修复安全漏洞\n* 功能：由 @nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F358 中为远程 MCP 提供商添加可流式传输的 HTTP 传输支持\n* 文档：由 @johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F367 中将 README 中的令牌徽章更新为 1500 亿以上\n* 文档：由 @johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F368 中修复结构化输出博客格式\n* 杂项：由 @nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F369 中修正不准确之处并改进 AGENTS.md 文件\n* 修复：由 @3mei 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F365 中在 display_sample_record() 中包含插件列类型\n\n## 新贡献者\n* @3mei 在 
https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F365 中完成了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.5.1...v0.5.2","2026-03-05T04:49:32",{"id":178,"version":179,"summary_zh":180,"released_at":181},280176,"v0.5.1","Data Designer 现已支持图像生成！\r\n\r\n## 变更内容\r\n* 文档：@kirit93 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F325 中更新了 URL\r\n* 文档：@eric-tramel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F326 中介绍了使用 NDD 和 MCP 工具进行深度研究的流程\r\n* 重构：@andreatgretel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F294 中实现了基于回调的处理器设计\r\n* 功能：@nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F317 中新增了支持多模态上下文的图像生成功能\r\n* 文档：@nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F319 中添加了图像生成相关文档及图像到图像编辑教程\r\n* 杂项：@nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F321 中将 ArtifactStorage 移至 engine\u002Fstorage\u002F 模块\r\n* 杂项：@johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F328 中将 Cerebro 知识库文件加入 .gitignore\r\n* 功能（引擎）：@eric-tramel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F280 中新增了用于异步优先模型实验的环境变量开关\r\n* 文档：@kirit93 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F331 中将导航栏移至左侧\r\n* 功能：@johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F333 中为 preview 命令添加了 --save-results 选项\r\n* 杂项：@johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F330 中通过延迟加载和清理重资源导入优化了 CLI 启动过程\r\n* 功能：@andreatgretel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F286 中为 1:N 和 N:1 的生成模式添加了 allow_resize 参数\r\n* 杂项：@johnnygreco 在 
https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F335 中根据 Andre 对 --save-results 和 CLI 预览的反馈进行了调整\r\n* 杂项：@andreatgretel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F337 中从仓库根目录移除了 example_allow_resize.py 文件\r\n* 修复：@andreatgretel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F334 中使 DropColumnsProcessorConfig 具有幂等性，并支持推理列\r\n* 功能：@nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F340 中新增了 push_to_hub_from_folder 类方法，用于上传已保存的数据集\r\n* 修复：@dhruvnathawani 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F336 中处理了 convert_to_row_element 中的布尔值、整数和浮点数\r\n* 功能：@nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F342 中实现了图像到图像生成时ImageContext 格式的自动检测\r\n\r\n## 新贡献者\r\n* @dhruvnathawani 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F336 中完成了首次贡献\r\n\r\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.5.0...v0.5.1","2026-02-20T21:06:00",{"id":183,"version":184,"summary_zh":185,"released_at":186},280177,"v0.5.0","# 🎨 NeMo 数据设计师 – v0.5.0 发行说明\n\n## ⚡ 亮点\n\n- 🛠️ MCP 工具调用：LLM 列现在可以通过 MCP 在生成过程中调用外部工具！！\n \n- ⚛️ 作为自定义列生成器的函数：`@custom_column_generator` 装饰器允许用户编写自己的列生成逻辑，并将其直接插入到管道中。\n\n- 🤗 Hugging Face Hub 集成：现在可以将生成的数据集直接发布到 Hugging Face Hub，并自动生成数据集卡片：`results.push_to_hub()`。\n\n    - 非常感谢 @davidberenstein1957 启动了这一功能的设计与开发工作，同时也感谢 @davanstrien 和 @wauplin 的帮助，使该功能得以顺利完成！\n\n- 💻 CLI 生成命令：现在可以使用新的 `preview`、`create` 和 `validate` 命令从命令行生成数据。\n\n- 🔍 LLM 可观测性：在 LLM 配置中使用新的 `with_trace` 选项，可返回 `TraceType.ALL_MESSAGE` 或 `TraceType.LAST_MESSAGE`。此外，还可以通过设置 `extract_reasoning_content=True` 来选择性提取推理内容。\n\n## ⚠️ 破坏性变更\n\n- `with_trace` 原来是一个布尔值。现在它是一个 `TraceType` 枚举（`NONE`（默认）、`LAST_MESSAGE`、`ALL_MESSAGES`），而不是布尔值。\n\n- `SingleColumnConfig` 现在被隔离到其独立的基础模块 `data_designer.base.config` 
中，以防止插件发现过程中出现循环导入问题。\n\n## 变更内容\n* 功能：由 @eric-tramel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F248 中实现的 LLM 列的 MCP（模型上下文协议）工具调用集成。\n* 修复：由 @johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F279 中对 mcp 模块中的许可证头年份格式进行规范化。\n* 杂项：由 @johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F278 中为每个子包配置独立的 pytest 设置。\n* 修复：由 @eric-tramel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F283 中对跟踪内容块进行标准化，以防止 Parquet 写入崩溃。\n* 功能：由 @eric-tramel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F284 中添加用于精细控制跟踪的 `TraceType` 枚举。\n* 文档：由 @kirit93 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F277 中添加部署和性能调优指南，并简化获取…\n* 杂项：由 @andreatgretel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F288 中更新教程笔记本，使其一致使用 `dd.` 表示法。\n* 功能：由 @eric-tramel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F285 中为 LLM 列添加 `extract_reasoning_content` 选项。\n* 杂项：由 @andreatgretel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F289 中添加 `greptile.json` 文件，以减少代码审查时的冗长输出。\n* 功能：由 @johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F282 中将构建系统从 hatch-vcs 切换到 uv-dynamic-versioning。\n* 回滚：由 @eric-tramel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F290 中移除 `RunConfig` 的 `debug_trace_override`。\n* 性能优化：由 @johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesig 中为配置模块的导出实现懒加载。","2026-02-11T22:22:05",{"id":188,"version":189,"summary_zh":190,"released_at":191},280178,"v0.4.0","## 🎨 NeMo Data Designer v0.4.0 发行说明\n\n### ✨ 新增功能\n\n- **消息追踪**：在 LLM 生成过程中记录完整的对话历史，使您能够访问系统提示、渲染后的用户提示以及模型推理过程，以供下游用例使用。可通过 `with_trace=True` 为每列启用，或通过 `RunConfig` 进行全局配置。\n\n- **多图像支持**：在多模态场景中，每列可以传递多张图像，从而实现更丰富的视觉生成效果。\n\n- 
**扩展的代码语言支持**：在 `LLMCodeColumnConfig` 中新增了对 Bash、C、C++、C# 和 COBOL 的支持。\n\n- **进度日志记录**：在 LLM 列生成过程中提供进度更新，以便更好地监控长时间运行的任务。\n\n---\n\n### 💥 破坏性变更：导入结构\n\n`essentials` 模块已被移除，取而代之的是更简洁的导入方式。配置类现在通过 `data_designer.config` 访问，主接口则通过 `data_designer.interface` 使用。\n\n#### 之前（v0.3.x）：\n\n```python\nfrom data_designer.essentials import (\n    CategorySamplerParams,\n    DataDesigner,\n    DataDesignerConfigBuilder,\n    LLMTextColumnConfig,\n    SamplerColumnConfig,\n    SamplerType,\n)\n\ndata_designer = DataDesigner()\nconfig_builder = DataDesignerConfigBuilder()\n```\n\n#### 之后（v0.4.x）：\n\n```python\nimport data_designer.config as dd\nfrom data_designer.interface import DataDesigner\n\ndata_designer = DataDesigner()\nconfig_builder = dd.DataDesignerConfigBuilder()\n\n# 配置类通过 `dd` 命名空间访问\nconfig_builder.add_column(\n    dd.SamplerColumnConfig(\n        name=\"category\",\n        sampler_type=dd.SamplerType.CATEGORY,\n        params=dd.CategorySamplerParams(values=[\"A\", \"B\"]),\n    )\n)\n```\n\n### 💥 破坏性变更：推理追踪 → 消息追踪\n\n自动的 `__reasoning_trace` 列已被替换为可选的消息追踪，用于捕获完整的对话历史。\n\n**主要变化：**\n- 列后缀由 `__reasoning_trace` 更名为 `__trace`\n- 追踪现为**可选项**，而非自动启用\n- 追踪会捕获完整的消息历史（系统\u002F用户\u002F助手），包括重试对话\n\n#### 之前（v0.3.x）：\n\n推理追踪作为扩展思考模型的副作用列自动生成：\n\n```python\n# 追踪是自动的，无需配置\n# “answer” 列会自动生成“answer__reasoning_trace”\n```\n\n#### 之后（v0.4.x）：\n\n需显式地为每列或全局启用追踪：\n\n**逐列启用（推荐）：**\n```python\nimport data_designer.config as dd\n\nconfig_builder.add_column(\n    dd.LLMTextColumnConfig(\n        name=\"answer\",\n        prompt=\"Answer: {{ question }}\",\n        model_alias=\"nvidia-text\",\n        with_trace=True,  # 显式启用追踪\n    )\n)\n# 将生成“answer”和“answer__trace”两列\n```\n\n**全局调试覆盖：**\n```python\nimport data_designer.config as dd\nfrom data_designer.interface import DataDesigner\n\ndata_designer = DataDesigner()","2026-01-31T03:43:27",{"id":193,"version":194,"summary_zh":195,"released_at":196},280179,"v0.3.8","## 👀 新的 Nemotron-Personas 数据集\n\n`PersonSampler` 支持两种新的本地化环境：\n\n- 
[Nemotron-Personas-Singapore](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-Personas-Singapore)（`locale = en_SG`）\n- [Nemotron-Personas-Brazil](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-Personas-Brazil)（`locale = pt_BR`）\n\n## 变更内容\n* 修复：当未配置从头开始生成器时，解除生成阻塞 —— @nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F231 中完成\n* 修复：不再尝试反序列化 LLM 文本响应 —— @nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F233 中完成\n* 文档：更新了配方卡片 —— @kirit93 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F153 中完成\n* 修复：默认模型提供商不再显示 API 密钥警告 —— @nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F238 中完成\n* 功能：支持 Claude Skills（DevX 和 Generation）—— @eric-tramel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F239 中完成\n* 功能：将非 LLM 并发限制提升至 `RunConfig` —— @eric-tramel 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F242 中完成\n* 功能：接入 pt_GB 和 en_SG 的人物角色数据 —— @johnnygreco 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F245 中完成\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.3.7...v0.3.8","2026-01-27T02:28:25",{"id":198,"version":199,"summary_zh":200,"released_at":201},280180,"v0.3.7","🎨 NeMo Data Designer v0.3.6 发行说明\n- 恢复了在 [PR-222](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F222) 中对 `litellm_overrides.py` 引入的惰性加载更改，该更改曾导致间歇性的导入问题。\n\n## 变更内容\n* 修复：由 @nabinchha 在 https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F229 中引入的 litellm 覆盖项惰性加载更改已恢复。\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.3.6...v0.3.7","2026-01-17T21:53:16",{"id":203,"version":204,"summary_zh":205,"released_at":206},280181,"v0.3.6","🎨 NeMo Data Designer v0.3.6 
Release Notes\r\n- Fixes a regression introduced in [PR-222](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F222) that wasn't caught by our tests.\r\n\r\n## What's Changed\r\n* fix: incorrect litellm lazy load for class extension by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F228\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.3.5...v0.3.6","2026-01-17T00:55:45",{"id":208,"version":209,"summary_zh":210,"released_at":211},280182,"v0.3.5","🎨 NeMo Data Designer v0.3.5 Release Notes\r\n\r\n## 💥 Breaking Change: Plugins\r\n\r\nWe have made some updates to the task and column generation abstractions, which come with some breaking changes for plugin developers. \r\n\r\n1. No more `GeneratorMetadata`\r\n\r\nWe have completely removed the `GeneratorMetadata` object (as well as its parent `ConfigurableTaskMetadata` object). This means you no longer need to define a `metadata` static method when creating a column generator implementation. 
\r\n\r\nAs part of this refactor, we have added two new subclasses to use for different generation strategies:\r\n- `data_designer.engine.column_generators.generators.base.ColumnGeneratorFullColumn`\r\n- `data_designer.engine.column_generators.generators.base.ColumnGeneratorCellByCell`\r\n\r\n**Before (v0.3.4)**\r\n```python\r\nfrom data_designer.engine.column_generators.generators.base import (\r\n    ColumnGenerator,\r\n    GenerationStrategy,\r\n    GeneratorMetadata,\r\n)\r\n\r\nclass IndexMultiplierColumnGenerator(ColumnGenerator[IndexMultiplierColumnConfig]):\r\n    @staticmethod\r\n    def metadata() -> GeneratorMetadata:\r\n        \"\"\"Define metadata about this generator.\"\"\"\r\n        return GeneratorMetadata(\r\n            name=\"index-multiplier\",\r\n            description=\"Generates values by multiplying the row index by a user-specified multiplier\",\r\n            generation_strategy=GenerationStrategy.FULL_COLUMN,\r\n        )\r\n\r\n    # implementation below\r\n    ...\r\n```\r\n\r\n**After (v0.3.5)**\r\n```python\r\nfrom data_designer.engine.column_generators.generators.base import ColumnGeneratorFullColumn\r\n\r\nclass IndexMultiplierColumnGenerator(ColumnGeneratorFullColumn[IndexMultiplierColumnConfig]):\r\n\r\n    # implementation below\r\n    ...\r\n```\r\n\r\n2. 
`required_columns` and `side_effect_columns` now must be explicitly defined on classes that inherit from `SingleColumnConfig`\r\n\r\n**Before (v0.3.4)**\r\n```python\r\nfrom typing import Literal\r\n\r\nfrom data_designer.config.column_configs import SingleColumnConfig\r\n\r\nclass IndexMultiplierColumnConfig(SingleColumnConfig):\r\n    \"\"\"Configuration for the index multiplier column generator.\"\"\"\r\n\r\n    # Configurable parameter for this plugin\r\n    multiplier: int = 2\r\n\r\n    # Required: discriminator field with a unique Literal type\r\n    # This value identifies your plugin and becomes its column_type\r\n    column_type: Literal[\"index-multiplier\"] = \"index-multiplier\"\r\n```\r\n\r\n**After (v0.3.5)**\r\n```python\r\nfrom typing import Literal\r\n\r\nfrom data_designer.config.column_configs import SingleColumnConfig\r\n\r\nclass IndexMultiplierColumnConfig(SingleColumnConfig):\r\n    \"\"\"Configuration for the index multiplier column generator.\"\"\"\r\n\r\n    # Configurable parameter for this plugin\r\n    multiplier: int = 2\r\n\r\n    # Required: discriminator field with a unique Literal type\r\n    # This value identifies your plugin and becomes its column_type\r\n    column_type: Literal[\"index-multiplier\"] = \"index-multiplier\"\r\n\r\n    @property\r\n    def required_columns(self) -> list[str]:\r\n        return []\r\n\r\n    @property\r\n    def side_effect_columns(self) -> list[str]:\r\n        return []\r\n```\r\n\r\nWhile the updated version is more verbose, it ensures column generator developers are aware of these properties, which are essential for building a working generator. \r\n\r\n3. Removed `emoji` from the `Plugin` object\r\n\r\nNow that plugins support more than column generators, the `emoji` field is not always applicable. 
\r\n\r\n\r\n**Before (v0.3.4)**\r\n```python\r\nfrom data_designer.plugins import Plugin, PluginType\r\n\r\n# Plugin instance - this is what gets loaded via entry point\r\nplugin = Plugin(\r\n    impl_qualified_name=\"data_designer_index_multiplier.plugin.IndexMultiplierColumnGenerator\",\r\n    config_qualified_name=\"data_designer_index_multiplier.plugin.IndexMultiplierColumnConfig\",\r\n    plugin_type=PluginType.COLUMN_GENERATOR,\r\n    emoji=\"🔌\",\r\n)\r\n```\r\n\r\n**After (v0.3.5)**\r\n```python\r\nfrom data_designer.plugins import Plugin, PluginType\r\n\r\n# Plugin instance - this is what gets loaded via entry point\r\nplugin = Plugin(\r\n    impl_qualified_name=\"data_designer_index_multiplier.plugin.IndexMultiplierColumnGenerator\",\r\n    config_qualified_name=\"data_designer_index_multiplier.plugin.IndexMultiplierColumnConfig\",\r\n    plugin_type=PluginType.COLUMN_GENERATOR,\r\n)\r\n```\r\n\r\n## What's Changed\r\n* fix: dataset metadata should be optional in `PreviewResults` by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F223\r\n* refactor: remove task metadata property by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F216\r\n* refactor: update single column base class by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F206\r\n* chore: lazy 3rd party imports by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F222\r\n* fix: post merge issues by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F224\r\n* chore: minor readme updates by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F225\r\n* chore: streamline generation metadata + consolidate sdg json by @nabinchha in 
https:\u002F\u002Fgithu","2026-01-16T22:47:03",{"id":213,"version":214,"summary_zh":215,"released_at":216},280183,"v0.3.4","## What's Changed\r\n* fix: hard-disable early shutdown when RunConfig.disable_early_shutdown=true by @eric-tramel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F203\r\n* chore: upgrade numpy in uv.lock by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F202\r\n* fix: update example runner command with notebooks dep group by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F204\r\n* chore: Add SkipJsonSchema annotation to DF seed source by @mikeknep in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F205\r\n* feat: Plumb LLM retry controls through RunConfig by @eric-tramel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F208\r\n* feat: move buffer size control to RunConfig by @eric-tramel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F209\r\n* fix: seed columns do not show up in display_sample_record by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F213\r\n* docs: Added top models pie chart by @kirit93 in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F217\r\n* chore: rename e2e_tests to tests_e2e by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F214\r\n* chore: Set json_schema_mode_override validation on ConfigBase by @mikeknep in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F220\r\n* fix: gracefully handle empty buffers in the dataset builder by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F221\r\n\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.3.3...v0.3.4","2026-01-14T22:37:46",{"id":218,"version":219,"summary_zh":220,"released_at":221},280184,"v0.3.3","## What's Changed\r\n* chore: Bump rich to 14.x series by @mikeknep in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F196\r\n* chore: add issue templates by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F197\r\n* chore: minor issue template tweaks by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F198\r\n* chore: add make commands to run examples as e2e tests by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F199\r\n* fix: early shutdown race condition by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F201\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.3.2...v0.3.3","2026-01-12T22:59:59",{"id":223,"version":224,"summary_zh":225,"released_at":226},280185,"v0.3.2","## Breaking plugin changes\r\n\r\n- `required_resources` has been removed from task metadata objects. 
\r\n\r\n-  There are two new column generator base classes to streamline model usage:\r\n\r\nIn `data_designer.engine.column_generators.generators.base`:\r\n```python \r\nclass ColumnGeneratorWithModelRegistry(ColumnGenerator[TaskConfigT], ABC):\r\n    @property\r\n    def model_registry(self) -> ModelRegistry:\r\n        return self.resource_provider.model_registry\r\n\r\n    def get_model(self, model_alias: str) -> ModelFacade:\r\n        return self.model_registry.get_model(model_alias=model_alias)\r\n\r\n    def get_model_config(self, model_alias: str) -> ModelConfig:\r\n        return self.model_registry.get_model_config(model_alias=model_alias)\r\n\r\n    def get_model_provider_name(self, model_alias: str) -> str:\r\n        provider = self.model_registry.get_model_provider(model_alias=model_alias)\r\n        return provider.name\r\n\r\n\r\nclass ColumnGeneratorWithModel(ColumnGeneratorWithModelRegistry[TaskConfigT], ABC):\r\n    @functools.cached_property\r\n    def model(self) -> ModelFacade:\r\n        return self.get_model(model_alias=self.config.model_alias)\r\n\r\n    @functools.cached_property\r\n    def model_config(self) -> ModelConfig:\r\n        return self.get_model_config(model_alias=self.config.model_alias)\r\n\r\n    @functools.cached_property\r\n    def inference_parameters(self) -> BaseInferenceParams:\r\n        return self.model_config.inference_parameters\r\n\r\n    def log_pre_generation(self) -> None:\r\n        logger.info(f\"{self.config.column_type} model configuration for generating column '{self.config.name}'\")\r\n        logger.info(f\"  |-- model: {self.model_config.model!r}\")\r\n        logger.info(f\"  |-- model alias: {self.config.model_alias!r}\")\r\n        logger.info(f\"  |-- model provider: {self.get_model_provider_name(model_alias=self.config.model_alias)!r}\")\r\n        logger.info(f\"  |-- inference parameters: {self.inference_parameters.format_for_display()}\")\r\n```\r\n\r\nIf you need to use models in your 
generator, subclass from one of these base classes. \r\n\r\n## What's Changed\r\n* refactor: update required resources treatment and use subclasses over mixins by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F184\r\n* feat: Seed dataset plugins by @mikeknep in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F191\r\n* chore: update header script to check for diffs by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F195\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.3.1...v0.3.2","2026-01-09T22:46:55",{"id":228,"version":229,"summary_zh":230,"released_at":231},280186,"v0.3.1","## What's Changed\r\n* fix: Stray validate calls in notebooks by @mikeknep in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F192\r\n* fix: exclude df from seed source serialization by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F193\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.3.0...v0.3.1","2026-01-08T22:53:54",{"id":233,"version":234,"summary_zh":235,"released_at":236},280187,"v0.3.0","## 🎨 NeMo Data Designer v0.3.0 Release Notes\r\n\r\nDataDesigner v0.3.0 introduces some breaking changes that we highlight below.\r\n\r\n\r\n### 💥 Breaking Change: config validation\r\n\r\nThe Data Designer config validation method `.validate` has been moved from the config builder to the `DataDesigner` object. \r\n\r\n#### Before (v0.2.x):\r\n\r\n```python\r\nfrom data_designer.essentials import DataDesigner, DataDesignerConfigBuilder\r\n\r\ndata_designer = DataDesigner()\r\nconfig_builder = DataDesignerConfigBuilder()\r\n\r\n# ... 
build your config ...\r\n\r\n# validate config\r\nconfig_builder.validate()\r\n```\r\n\r\n#### After (v0.3.x):\r\n\r\n```python\r\nfrom data_designer.essentials import DataDesigner, DataDesignerConfigBuilder\r\n\r\ndata_designer = DataDesigner()\r\nconfig_builder = DataDesignerConfigBuilder()\r\n\r\n# ... build your config ...\r\n\r\n# validate config\r\ndata_designer.validate(config_builder)\r\n```\r\n\r\n### 💥 Breaking Change: seed datasets\r\n\r\nWorking with seed datasets has been simplified with the introduction of [SeedSource objects](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fblob\u002Fde417c88e16724093877f528686858bd6953b2d0\u002Fsrc\u002Fdata_designer\u002Fconfig\u002Fseed_source.py#L18), which are passed directly to `config_builder.with_seed_dataset`. This removes the step of making a seed reference with datastore settings (when needed). \r\n\r\n#### Before (v0.2.x):\r\n\r\nSeed from a local file:\r\n```python\r\nfrom data_designer.essentials import DataDesigner, DataDesignerConfigBuilder\r\n\r\ndata_designer = DataDesigner()\r\nconfig_builder = DataDesignerConfigBuilder()\r\n\r\nseed_dataset_reference = data_designer.make_seed_reference_from_file(\"my_seed_dataset.parquet\")\r\nconfig_builder.with_seed_dataset(seed_dataset_reference)\r\n```\r\n\r\nSeed from a DataFrame:\r\n```python\r\nfrom data_designer.essentials import DataDesigner, DataDesignerConfigBuilder\r\n\r\n# define dataframe `df`\r\n\r\ndata_designer = DataDesigner()\r\nconfig_builder = DataDesignerConfigBuilder()\r\n\r\n# in v0.2.x, the dataframe must be written to a file\r\nseed_dataset_reference = data_designer.make_seed_reference_from_dataframe(df, \"my_seed_dataset.parquet\")\r\n\r\nconfig_builder.with_seed_dataset(seed_dataset_reference)\r\n```\r\n\r\n#### After (v0.3.x):\r\n\r\nSeed from a local file:\r\n```python\r\nfrom data_designer.essentials import DataDesigner, DataDesignerConfigBuilder, LocalFileSeedSource\r\n\r\nconfig_builder = 
DataDesignerConfigBuilder()\r\nconfig_builder.with_seed_dataset(LocalFileSeedSource(path=\"my_seed_dataset.parquet\"))\r\n```\r\n\r\nSeed from a DataFrame:\r\n```python\r\nfrom data_designer.essentials import DataDesigner, DataDesignerConfigBuilder, DataFrameSeedSource\r\n\r\n# define dataframe `df`\r\n\r\nconfig_builder = DataDesignerConfigBuilder()\r\n\r\n# no need to specify a file, as the dataframe will be sampled directly in memory\r\nconfig_builder.with_seed_dataset(DataFrameSeedSource(df=df))\r\n```\r\n\r\nSeed from Hugging Face Hub:\r\n```python \r\nfrom data_designer.essentials import DataDesigner, DataDesignerConfigBuilder, HuggingFaceSeedSource\r\n\r\nconfig_builder = DataDesignerConfigBuilder()\r\nconfig_builder.with_seed_dataset(HuggingFaceSeedSource(path=\"datasets\u002Fmy-username\u002Fmy-dataset\u002Fdata\u002F*.parquet\"))\r\n```\r\n\r\n### 💥 Breaking Change: plugins\r\n\r\nWhen defining plugins, there are two important updates:\r\n\r\n- `task` -> `impl`: the implementation class argument is now named `impl_*` rather than `task_*` (e.g., `task_cls` becomes `impl_qualified_name`).\r\n- The class arguments of the `Plugin` object are now given as fully-qualified names (e.g., `\"my_plugin.module.PluginObject\"`) rather than the classes themselves, so `config_cls` likewise becomes `config_qualified_name`. 
\r\n\r\n#### Before (v0.2.x):\r\n\r\n```python \r\nfrom my_plugin.multiple_column_generator import IndexMultiplierColumnGenerator, IndexMultiplierColumnConfig\r\nfrom data_designer.plugins import Plugin, PluginType \r\n\r\nplugin = Plugin(\r\n    task_cls=IndexMultiplierColumnGenerator,\r\n    config_cls=IndexMultiplierColumnConfig,\r\n    plugin_type=PluginType.COLUMN_GENERATOR,\r\n    emoji=\"🔌\",\r\n)\r\n```\r\n\r\n#### After (v0.3.x)\r\n\r\n```python \r\nfrom data_designer.plugins import Plugin, PluginType \r\n\r\nplugin = Plugin(\r\n    impl_qualified_name=\"my_plugin.multiple_column_generator.IndexMultiplierColumnGenerator\",\r\n    config_qualified_name=\"my_plugin.multiple_column_generator.IndexMultiplierColumnConfig\",\r\n    plugin_type=PluginType.COLUMN_GENERATOR,\r\n    emoji=\"🔌\",\r\n)\r\n```\r\n\r\n\r\n## What's Changed\r\n* fix: make doc building workflow use python 3.11 by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F170\r\n* refactor: plugin system updates by @mikeknep in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F168\r\n* feat: add OpenRouter as one of the default providers by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F161\r\n* feat: Allow defining extra headers on model providers by @mikeknep in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F174\r\n* docs: fix documentation on max_tokens by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F176\r\n* docs: Add extra_headers to model provider docs by @mikeknep in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F178\r\n* fix: `Decimal` in structured generation leads to errors by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F171\r\n* fix: litellm max callbacks override by @nabinchha in 
https:\u002F\u002Fgith","2026-01-08T21:15:18",{"id":238,"version":239,"summary_zh":240,"released_at":241},280188,"v0.2.3","## What's Changed\r\n* fix: make doc building workflow use python 3.11 by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F170\r\n* fix: litellm max callbacks override by @nabinchha in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F180\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.2.2...v0.2.3","2026-01-07T21:51:36",{"id":243,"version":244,"summary_zh":245,"released_at":246},280189,"v0.2.2","## What's Changed\r\n* chore: change ruff parsing to JSON + relax ruff version by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F156\r\n* chore: refresh dependency list by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F154\r\n* fix: seed datasets replace existing columns when names collide by @andreatgretel in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F158\r\n* fix: limit imports in base generators module by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F166\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.2.1...v0.2.2","2025-12-30T21:37:57",{"id":248,"version":249,"summary_zh":250,"released_at":251},280190,"v0.2.1","## What's Changed\r\n* docs: some updates for nano3 by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F149\r\n* chore: initial telemetry impl by @johntmyers in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F118\r\n* docs: just some tutorial notebook tweaks and a docstring update by @johnnygreco in 
https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F150\r\n* docs: add cli instructions to person sampling docs by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F151\r\n* docs: fix links and tweak person sampling by @johnnygreco in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F152\r\n\r\n## New Contributors\r\n* @johntmyers made their first contribution in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fpull\u002F118\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FDataDesigner\u002Fcompare\u002Fv0.2.0...v0.2.1","2025-12-19T01:56:28"]