[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-hitsz-ids--synthetic-data-generator":3,"tool-hitsz-ids--synthetic-data-generator":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":98,"forks":99,"last_commit_at":100,"license":101,"difficulty_score":10,"env_os":102,"env_gpu":103,"env_ram":104,"env_deps":105,"category_tags":116,"github_topics":117,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":128,"updated_at":129,"faqs":130,"releases":160},1231,"hitsz-ids\u002Fsynthetic-data-generator","synthetic-data-generator","SDG is a specialized framework designed to generate high-quality structured tabular data.","Synthetic Data Generator（SDG）是一个专门用于生成高质量结构化表格数据的框架。它能够模拟真实数据的分布和特征，但不包含任何敏感信息，因此可以安全地用于各种需要数据支持的场景，无需担心隐私法规的限制，如GDPR或ADPPA。\n\n在数据共享、模型训练与调试、系统开发与测试等场景中，真实数据可能因隐私或安全问题难以获取或使用，而SDG生成的合成数据则能有效替代，提供可靠的数据支持。对于研究人员和开发者来说，它是一个强大的工具，可以帮助他们快速构建和验证数据驱动的应用。\n\nSDG特别适合开发者、数据科学家和研究人员使用，尤其在需要大量高质量数据但无法直接获取真实数据的情况下。其独特之处在于支持多种数据生成模型，并提供了灵活的配置选项，以满足不同场景的需求。此外，项目还提供了详细的文档和示例，便于用户快速上手和深入使用。","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhitsz-ids_synthetic-data-generator_readme_fa615d944e4a.png\" width=\"400\" >\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n\u003Cp 
align=\"center\">\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Factions\">\u003Cimg alt=\"Actions Status\" src=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Factions\u002Fworkflows\u002Fci-test-python-package.yml\u002Fbadge.svg\">\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fsynthetic-data-generator.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest'>\u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhitsz-ids_synthetic-data-generator_readme_13d664e1afd7.png' alt='Documentation Status' \u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fresults.pre-commit.ci\u002Flatest\u002Fgithub\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fmain\">\u003Cimg alt=\"pre-commit.ci status\" src=\"https:\u002F\u002Fresults.pre-commit.ci\u002Fbadge\u002Fgithub\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fmain.svg\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fblob\u002Fmain\u002FLICENSE\">\u003Cimg alt=\"LICENSE\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fhitsz-ids\u002Fsynthetic-data-generator\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Freleases\u002F\">\u003Cimg alt=\"Releases\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002Fhitsz-ids\u002Fsynthetic-data-generator\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Freleases\u002F\">\u003Cimg alt=\"Pre Releases\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002Fhitsz-ids\u002Fsynthetic-data-generator?include_prereleases&label=pre-release&logo=github\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\">\u003Cimg alt=\"Last Commit\" 
src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fhitsz-ids\u002Fsynthetic-data-generator\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\">\u003Cimg alt=\"Python version\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fsdgx\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcontributors\">\u003Cimg alt=\"contributors\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fall-contributors\u002Fhitsz-ids\u002Fsynthetic-data-generator?color=ee8449&style=flat-square\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fhitsz-ids\u002Fshared_invite\u002Fzt-2395mt6x2-dwf0j_423QkAgGvlNA5E1g\">\u003Cimg alt=\"slack\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fslack-join%20chat-ff69b4.svg?style=flat-square\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n# 🚀 Synthetic Data Generator\n\n\u003Cp style=\"font-size: small;\">Switch Language:\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fblob\u002Fmain\u002FREADME_ZH_CN.md\" target=\"_blank\">简体中文\u003C\u002Fa> &nbsp;| &nbsp;\n    Latest \u003Ca href=\"https:\u002F\u002Fsynthetic-data-generator.readthedocs.io\u002Fen\u002Flatest\u002F\" target=\"value\">API Docs\u003C\u002Fa> &nbsp;| &nbsp;\n    \u003Ca href=\"ROADMAP.md\" target=\"value\">Roadmap\u003C\u002Fa> &nbsp;| &nbsp;\n    Join \u003Ca href=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhitsz-ids_synthetic-data-generator_readme_f42a7a5867cd.jpg\" target=\"value\">Wechat Group\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp style=\"font-size: small;\">\n    Colab Examples:&nbsp;\n    \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1VFnP59q3eoVtMJ1PvcYjmuXtx9N8C7o0?usp=sharing\" target=\"value\"> LLM: Data Synthesis\u003C\u002Fa>\n    &nbsp;| &nbsp;\n    \u003Ca 
href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1_chuTVZECpj5fklj-RAp7ZVrew8weLW_?usp=sharing\" target=\"value\"> LLM: Off-Table Inference\u003C\u002Fa>\n    &nbsp;| &nbsp;\n    \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1cMB336jN3kb-m_pr1aJjshnNep_6bhsf?usp=sharing\" target=\"value\"> Billion-Level-Data supported CTGAN\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C\u002Fp>\n\u003C\u002Fdiv>\n\nThe Synthetic Data Generator (SDG) is a specialized framework designed to generate high-quality structured tabular data.\n\nSynthetic data does not contain any sensitive information, yet it retains the essential characteristics of the original data, making it exempt from privacy regulations such as GDPR and ADPPA.\n\nHigh-quality synthetic data can be used safely across many domains, including data sharing, model training and debugging, and system development and testing.\n\nWe are excited to have you here and look forward to your contributions. Get started with the project through this [Contributing Overview Guide](CONTRIBUTING.md)!\n\n## 💥 News\n\nOur key milestones so far are listed below:\n\n🔥 Nov 21, 2024: 1) Model Integration - We've integrated the `GaussianCopula` model into our Data Processor System. 
Check out the code example in this [PR](https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F241); 2) Synthetic Quality - We implemented automatic detection of relationships between data columns and added support for specifying such relationships, improving the quality of synthetic data ([Code Example](https:\u002F\u002Fsynthetic-data-generator.readthedocs.io\u002Fen\u002Flatest\u002Fuser_guides\u002Fsingle_table_column_combinations.html)); 3) Performance Enhancement - We significantly reduced the memory usage of GaussianCopula when handling discrete data, enabling training on thousands of categorical data entries with a `2C4G` setup (2 CPU cores, 4 GB RAM)!\n\n🔥 May 30, 2024: The Data Processor module was officially merged. This module: 1) helps SDG convert the format of some data columns (such as Datetime columns) before they are fed into the model (so that they are not treated as discrete types), and converts the model-generated data back into the original format; 2) performs more customized pre-processing and post-processing for various data types; 3) easily handles problems such as null values in the original data; 4) supports the plug-in system.\n\n🔥 Feb 20, 2024: A single-table data synthesis model based on LLMs was added; see the Colab examples: \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1VFnP59q3eoVtMJ1PvcYjmuXtx9N8C7o0?usp=sharing\" target=\"value\"> LLM: Data Synthesis\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1_chuTVZECpj5fklj-RAp7ZVrew8weLW_?usp=sharing\" target=\"value\"> LLM: Off-table Feature Inference\u003C\u002Fa>.\n\n🔧 Feb 7, 2024: We improved `sdgx.data_models.metadata` so that it can describe metadata for both single tables and multiple tables, supports multiple data types, and infers data types automatically. 
See the Colab example: \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1b4ZTpgSYjOt7ekp1Wj8CxDknbOHEwA7s?usp=sharing\" target=\"value\">SDG Single-Table Metadata\u003C\u002Fa>.\n\n🔶 Dec 20, 2023: v0.1.0 was released, including a CTGAN model that can process billion-scale data. See our \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Ftree\u002Fmain\u002Fbenchmarks#results\" target=\"value\"> benchmark against SDV\u003C\u002Fa>, in which SDG consumed less memory and did not crash during training. For usage, see the Colab example: \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1cMB336jN3kb-m_pr1aJjshnNep_6bhsf?usp=sharing\" target=\"value\"> Billion-Level-Data supported CTGAN\u003C\u002Fa>.\n\n🔆 Aug 10, 2023: First line of SDG code committed.\n\n## 🎉 LLM-integrated synthetic data generation\n\nLLMs have long been used to understand and generate many types of data. In fact, an LLM is also capable of generating tabular data. 
It also has some abilities that traditional methods (based on GANs or statistical models) cannot provide.\n\nOur `sdgx.models.LLM.single_table.gpt.SingleTableGPTModel` implements two new features:\n\n### Synthetic data generation without data\n\nNo training data is required: synthetic data can be generated from metadata alone. See our \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1VFnP59q3eoVtMJ1PvcYjmuXtx9N8C7o0?usp=sharing\" target=\"value\"> Colab example\u003C\u002Fa>.\n\n![Synthetic data generation without data](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhitsz-ids_synthetic-data-generator_readme_dc2dc3837111.gif)\n\n### Off-Table feature inference\n\nInfer data for new columns from the existing data in the table and the knowledge captured by the LLM. See our \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1_chuTVZECpj5fklj-RAp7ZVrew8weLW_?usp=sharing\" target=\"value\"> Colab example\u003C\u002Fa>.\n\n![Off-Table feature inference](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhitsz-ids_synthetic-data-generator_readme_84fe63b063ee.gif)\n\n## 💫 Why SDG?\n\n- Technological advancements:\n  - Supports a wide range of statistical data synthesis algorithms and also integrates an LLM-based synthetic data generation model;\n  - Optimized for big data, effectively reducing memory consumption;\n  - Continuously tracks the latest advances in academia and industry and promptly adds support for strong new algorithms and models.\n- Privacy enhancements:\n  - SDG supports differential privacy, anonymization, and other methods to enhance the security of synthetic data.\n- Easy to extend:\n  - Supports expansion of models, data processing, data connectors, etc. 
in the form of plug-in packages.\n\n## 🌀 Quick Start\n\n### Pre-built image\n\nYou can use the pre-built Docker image to quickly try out the latest features.\n\n```bash\ndocker pull idsteam\u002Fsdgx:latest\n```\n\n### Install from PyPI\n\n```bash\npip install sdgx\n```\n\n### Local Install (Recommended)\n\nInstall SDG from source:\n\n```bash\ngit clone git@github.com:hitsz-ids\u002Fsynthetic-data-generator.git\ncd synthetic-data-generator\npip install .\n# Or install directly from git\npip install git+https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator.git\n```\n\n### Quick Demo of Single Table Data Generation and Metric\n\n#### Demo code\n\n```python\nfrom sdgx.data_connectors.csv_connector import CsvConnector\nfrom sdgx.models.ml.single_table.ctgan import CTGANSynthesizerModel\nfrom sdgx.synthesizer import Synthesizer\nfrom sdgx.utils import download_demo_data\n\n# This will download demo data to .\u002Fdataset\ndataset_csv = download_demo_data()\n\n# Create a data connector for the CSV file\ndata_connector = CsvConnector(path=dataset_csv)\n\n# Initialize the synthesizer with a CTGAN model\nsynthesizer = Synthesizer(\n    model=CTGANSynthesizerModel(epochs=1),  # Only 1 epoch, for a quick demo\n    data_connector=data_connector,\n)\n\n# Fit the model\nsynthesizer.fit()\n\n# Sample 1000 rows of synthetic data\nsampled_data = synthesizer.sample(1000)\nprint(sampled_data)\n```\n\n#### Comparison\n\nThe real data look like this:\n\n```python\n>>> data_connector.read()\n       age         workclass  fnlwgt  education  ...  capitalloss hoursperweek native-country  class\n0        2         State-gov   77516  Bachelors  ...            0            2  United-States  \u003C=50K\n1        3  Self-emp-not-inc   83311  Bachelors  ...            0            0  United-States  \u003C=50K\n2        2           Private  215646    HS-grad  ...            0            2  United-States  \u003C=50K\n3        3           Private  234721       11th  ...            
0            2  United-States  \u003C=50K\n4        1           Private  338409  Bachelors  ...            0            2           Cuba  \u003C=50K\n...    ...               ...     ...        ...  ...          ...          ...            ...    ...\n48837    2           Private  215419  Bachelors  ...            0            2  United-States  \u003C=50K\n48838    4               NaN  321403    HS-grad  ...            0            2  United-States  \u003C=50K\n48839    2           Private  374983  Bachelors  ...            0            3  United-States  \u003C=50K\n48840    2           Private   83891  Bachelors  ...            0            2  United-States  \u003C=50K\n48841    1      Self-emp-inc  182148  Bachelors  ...            0            3  United-States   >50K\n\n[48842 rows x 15 columns]\n```\n\nThe synthetic data look like this:\n\n```python\n>>> sampled_data\n     age workclass  fnlwgt     education  ...  capitalloss hoursperweek native-country  class\n0      1       NaN   28219  Some-college  ...            0            2    Puerto-Rico  \u003C=50K\n1      2   Private  250166       HS-grad  ...            0            2  United-States   >50K\n2      2   Private   50304       HS-grad  ...            0            2  United-States  \u003C=50K\n3      4   Private   89318     Bachelors  ...            0            2    Puerto-Rico   >50K\n4      1   Private  172149     Bachelors  ...            0            3  United-States  \u003C=50K\n..   ...       ...     ...           ...  ...          ...          ...            ...    ...\n995    2       NaN  208938     Bachelors  ...            0            1  United-States  \u003C=50K\n996    2   Private  166416     Bachelors  ...            2            2  United-States  \u003C=50K\n997    2       NaN  336022       HS-grad  ...            0            1  United-States  \u003C=50K\n998    3   Private  198051       Masters  ...            
0            2  United-States   >50K\n999    1       NaN   41973       HS-grad  ...            0            2  United-States  \u003C=50K\n\n[1000 rows x 15 columns]\n```\n\n## 👩‍🎓 Related Work\n\n- CTGAN: [Modeling Tabular Data using Conditional GAN](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2019\u002Fhash\u002F254ed7d2de3b23ab10936522dd547b78-Abstract.html)\n- C3-TGAN: [C3-TGAN: Controllable Tabular Data Synthesis with Explicit Correlations and Property Constraints](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F374652636_C3-TGAN-_Controllable_Tabular_Data_Synthesis_with_Explicit_Correlations_and_Property_Constraints)\n- TVAE: [Modeling Tabular Data using Conditional GAN](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2019\u002Fhash\u002F254ed7d2de3b23ab10936522dd547b78-Abstract.html)\n- table-GAN: [Data Synthesis based on Generative Adversarial Networks](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1806.03384.pdf)\n- CTAB-GAN: [CTAB-GAN: Effective Table Data Synthesizing](https:\u002F\u002Fproceedings.mlr.press\u002Fv157\u002Fzhao21a\u002Fzhao21a.pdf)\n- OCT-GAN: [OCT-GAN: Neural ODE-based Conditional Tabular GANs](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2105.14969.pdf)\n\n## 🤝 Join Community\n\nThe SDG project was initiated by the **Institute of Data Security, Harbin Institute of Technology**. If you are interested in our project, you are welcome to join our community. 
We welcome organizations, teams, and individuals who share our commitment to data protection and security through open source:\n\n- Read [CONTRIBUTING](.\u002FCONTRIBUTING.md) before drafting a pull request.\n- Pick up an issue from [View Good First Issue](https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Flabels\u002Fgood%20first%20issue) or submit a Pull Request.\n- Join our WeChat group via the QR code below.\n\n\u003Cdiv align=\"left\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhitsz-ids_synthetic-data-generator_readme_f42a7a5867cd.jpg\" width=\"200\" >\n\u003C\u002Fdiv>\n\n## 📄 License\n\nThe SDG open-source project is licensed under Apache-2.0; see the [LICENSE](https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fblob\u002Fmain\u002FLICENSE) for details.\n","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhitsz-ids_synthetic-data-generator_readme_fa615d944e4a.png\" width=\"400\" >\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n\u003Cp align=\"center\">\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Factions\">\u003Cimg alt=\"Actions Status\" src=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Factions\u002Fworkflows\u002Fci-test-python-package.yml\u002Fbadge.svg\">\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fsynthetic-data-generator.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest'>\u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhitsz-ids_synthetic-data-generator_readme_13d664e1afd7.png' alt='Documentation Status' \u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fresults.pre-commit.ci\u002Flatest\u002Fgithub\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fmain\">\u003Cimg alt=\"pre-commit.ci status\" 
src=\"https:\u002F\u002Fresults.pre-commit.ci\u002Fbadge\u002Fgithub\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fmain.svg\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fblob\u002Fmain\u002FLICENSE\">\u003Cimg alt=\"LICENSE\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fhitsz-ids\u002Fsynthetic-data-generator\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Freleases\u002F\">\u003Cimg alt=\"Releases\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002Fhitsz-ids\u002Fsynthetic-data-generator\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Freleases\u002F\">\u003Cimg alt=\"Pre Releases\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002Fhitsz-ids\u002Fsynthetic-data-generator?include_prereleases&label=pre-release&logo=github\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\">\u003Cimg alt=\"Last Commit\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fhitsz-ids\u002Fsynthetic-data-generator\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\">\u003Cimg alt=\"Python version\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fsdgx\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcontributors\">\u003Cimg alt=\"contributors\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fall-contributors\u002Fhitsz-ids\u002Fsynthetic-data-generator?color=ee8449&style=flat-square\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fhitsz-ids\u002Fshared_invite\u002Fzt-2395mt6x2-dwf0j_423QkAgGvlNA5E1g\">\u003Cimg alt=\"slack\" 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fslack-join%20chat-ff69b4.svg?style=flat-square\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n# 🚀 合成数据生成器\n\n\u003Cp style=\"font-size: small;\">切换语言：\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fblob\u002Fmain\u002FREADME_ZH_CN.md\" target=\"_blank\">简体中文\u003C\u002Fa> &nbsp;| &nbsp;\n    最新 \u003Ca href=\"https:\u002F\u002Fsynthetic-data-generator.readthedocs.io\u002Fen\u002Flatest\u002F\" target=\"value\">API 文档\u003C\u002Fa> &nbsp;| &nbsp;\n    \u003Ca href=\"ROADMAP.md\" target=\"value\">路线图\u003C\u002Fa> &nbsp;| &nbsp;\n    加入 \u003Ca href=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhitsz-ids_synthetic-data-generator_readme_f42a7a5867cd.jpg\" target=\"value\">微信群\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp style=\"font-size: small;\">\n    Colab 示例：&nbsp;\n    \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1VFnP59q3eoVtMJ1PvcYjmuXtx9N8C7o0?usp=sharing\" target=\"value\"> LLM：数据合成\u003C\u002Fa>\n    &nbsp;| &nbsp;\n    \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1_chuTVZECpj5fklj-RAp7ZVrew8weLW_?usp=sharing\" target=\"value\"> LLM：表外推断\u003C\u002Fa>\n    &nbsp;| &nbsp;\n    \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1cMB336jN3kb-m_pr1aJjshnNep_6bhsf?usp=sharing\" target=\"value\"> 支持数十亿条数据的 CTGAN\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C\u002Fp>\n\u003C\u002Fdiv>\n\n合成数据生成器（SDG）是一个专门用于生成高质量结构化表格数据的框架。\n\n合成数据不包含任何敏感信息，但保留了原始数据的基本特征，因此不受 GDPR 和 ADPPA 等隐私法规的约束。\n\n高质量的合成数据可以在多个领域安全使用，包括数据共享、模型训练与调试、系统开发与测试等。\n\n我们非常高兴您能加入，并期待您的贡献！请通过这份[贡献概述指南](CONTRIBUTING.md)开始参与项目！\n\n## 💥新闻\n\n我们目前的主要成果和时间表如下：\n\n🔥 
2024年11月21日：1）模型集成——我们已将`GaussianCopula`模型集成到我们的数据处理器系统中。请查看此[PR](https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F241)中的代码示例；2）合成质量——我们实现了自动检测数据列关系并支持关系指定功能，从而提升了合成数据的质量([代码示例](https:\u002F\u002Fsynthetic-data-generator.readthedocs.io\u002Fen\u002Flatest\u002Fuser_guides\u002Fsingle_table_column_combinations.html))；3）性能提升——我们在处理离散数据时显著降低了GaussianCopula的内存占用，使得在`2C4G`配置下即可训练数千个分类数据条目！\n\n🔥 2024年5月30日：数据处理器模块正式合并。该模块将：1）帮助SDG在将数据输入模型之前转换某些数据列的格式（例如日期时间列），以避免被误认为离散类型，并将模型生成的数据反向转换回原始格式；2）对各种数据类型进行更定制化的预处理和后处理；3）轻松应对原始数据中的空值等问题；4）支持插件系统。\n\n🔥 2024年2月20日：新增基于LLM的单表数据合成模型，查看Colab示例：\u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1VFnP59q3eoVtMJ1PvcYjmuXtx9N8C7o0?usp=sharing\" target=\"value\"> LLM：数据合成\u003C\u002Fa>和\u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1_chuTVZECpj5fklj-RAp7ZVrew8weLW_?usp=sharing\" target=\"value\"> LLM：表外特征推断\u003C\u002Fa>。\n\n🔧 2024年2月7日：我们改进了`sdgx.data_models.metadata`，以支持描述单表和多表的元数据信息，支持多种数据类型，并支持自动数据类型推断。查看Colab示例：\u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1b4ZTpgSYjOt7ekp1Wj8CxDknbOHEwA7s?usp=sharing\" target=\"value\">SDG单表元数据\u003C\u002Fa>。\n\n🔶 2023年12月20日：发布v0.1.0版本，其中包含支持数十亿条数据处理能力的CTGAN模型，查看我们的\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Ftree\u002Fmain\u002Fbenchmarks#results\" target=\"value\">与SDV的基准对比\u003C\u002Fa>,SDG在内存消耗方面表现更优，并且在训练过程中未出现崩溃。具体使用方法，请查看Colab示例：\u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1cMB336jN3kb-m_pr1aJjshnNep_6bhsf?usp=sharing\" target=\"value\">支持数十亿条数据的CTGAN\u003C\u002Fa>。\n\n🔆 2023年8月10日：SDG代码的第一行提交。\n\n## 🎉 基于LLM的合成数据生成\n\n长期以来，LLM一直被用于理解和生成各类数据。事实上，LLM在表格数据生成方面也具备一定能力，而且它还拥有一些传统方法（基于GAN或统计方法）无法实现的功能。\n\n我们的`sdgx.models.LLM.single_table.gpt.SingleTableGPTModel`实现了两项新功能：\n\n### 无需数据的合成数据生成\n\n无需任何训练数据，即可基于元数据生成合成数据，详情请参阅我们的\u003Ca 
href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1VFnP59q3eoVtMJ1PvcYjmuXtx9N8C7o0?usp=sharing\" target=\"value\">Colab 示例\u003C\u002Fa>。\n\n![无需数据的合成数据生成](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhitsz-ids_synthetic-data-generator_readme_dc2dc3837111.gif)\n\n### 表外特征推断\n\n基于表中现有数据及 LLM 掌握的知识，推断新列数据，详情请参阅我们的\u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1_chuTVZECpj5fklj-RAp7ZVrew8weLW_?usp=sharing\" target=\"value\">Colab 示例\u003C\u002Fa>。\n\n![表外特征推断](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhitsz-ids_synthetic-data-generator_readme_84fe63b063ee.gif)\n\n## 💫 为什么选择 SDG？\n\n- 技术进步：\n  - 支持多种统计学数据合成算法，同时集成基于 LLM 的合成数据生成模型；\n  - 针对大数据进行优化，有效降低内存消耗；\n  - 持续跟踪学术界与工业界的最新进展，并及时引入优秀的算法与模型支持。\n- 隐私增强：\n  - SDG 支持差分隐私、匿名化等方法，以提升合成数据的安全性。\n- 易于扩展：\n  - 支持通过插件包的形式扩展模型、数据处理、数据连接器等功能。\n\n## 🌀 快速入门\n\n### 预构建镜像\n\n您可以使用预构建的镜像快速体验最新功能。\n\n```bash\ndocker pull idsteam\u002Fsdgx:latest\n```\n\n### 从 PyPi 安装\n\n```bash\npip install sdgx\n```\n\n### 本地安装（推荐）\n\n通过源码安装 SDG 来使用它。\n\n```bash\ngit clone git@github.com:hitsz-ids\u002Fsynthetic-data-generator.git\npip install .\n# 或者从 Git 安装\npip install git+https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator.git\n```\n\n### 单表数据生成与指标的快速演示\n\n#### 演示代码\n\n```python\nfrom sdgx.data_connectors.csv_connector import CsvConnector\nfrom sdgx.models.ml.single_table.ctgan import CTGANSynthesizerModel\nfrom sdgx.synthesizer import Synthesizer\nfrom sdgx.utils import download_demo_data\n\n# 这将下载演示数据到 .\u002Fdataset\ndataset_csv = download_demo_data()\n\n# 创建 CSV 文件的数据连接器\ndata_connector = CsvConnector(path=dataset_csv)\n\n# 初始化合成器，使用 CTGAN 模型\nsynthesizer = Synthesizer(\n    model=CTGANSynthesizerModel(epochs=1),  # 用于快速演示\n    data_connector=data_connector,\n)\n\n# 拟合模型\nsynthesizer.fit()\n\n# 采样\nsampled_data = synthesizer.sample(1000)\nprint(sampled_data)\n```\n\n#### 对比\n\n真实数据如下：\n\n```python\n>>> data_connector.read()\n       age         workclass  
fnlwgt  education  ...  capitalloss hoursperweek native-country  class\n0        2         State-gov   77516  Bachelors  ...            0            2  United-States  \u003C=50K\n1        3  Self-emp-not-inc   83311  Bachelors  ...            0            0  United-States  \u003C=50K\n2        2           Private  215646    HS-grad  ...            0            2  United-States  \u003C=50K\n3        3           Private  234721       11th  ...            0            2  United-States  \u003C=50K\n4        1           Private  338409  Bachelors  ...            0            2           Cuba  \u003C=50K\n...    ...               ...     ...        ...  ...          ...          ...            ...    ...\n48837    2           Private  215419  Bachelors  ...            0            2  United-States  \u003C=50K\n48838    4               NaN  321403    HS-grad  ...            0            2  United-States  \u003C=50K\n48839    2           Private  374983  Bachelors  ...            0            3  United-States  \u003C=50K\n48840    2           Private   83891  Bachelors  ...            0            2  United-States  \u003C=50K\n48841    1      Self-emp-inc  182148  Bachelors  ...            0            3  United-States  \u003C=50K\n\n[48842行 x 15列]\n\n```\n\n合成数据如下：\n\n```python\n>>> sampled_data\n     age workclass  fnlwgt     education  ...  capitalloss hoursperweek native-country  class\n0      1       NaN   28219  Some-college  ...            0            2    Puerto-Rico  \u003C=50K\n1      2   Private  250166       HS-grad  ...            0            2  United-States   >50K\n2      2   Private   50304       HS-grad  ...            0            2  United-States  \u003C=50K\n3      4   Private   89318     Bachelors  ...            0            2    Puerto-Rico   >50K\n4      1   Private  172149     Bachelors  ...            0            3  United-States  \u003C=50K\n..   ...       ...     ...           ...  ...          ...          ...            ...    
...\n995    2       NaN  208938     Bachelors  ...            0            1  United-States  \u003C=50K\n996    2   Private  166416     Bachelors  ...            2            2  United-States  \u003C=50K\n997    2       NaN  336022       HS-grad  ...            0            1  United-States  \u003C=50K\n998    3   Private  198051       Masters  ...            0            2  United-States   >50K\n999    1       NaN   41973       HS-grad  ...            0            2  United-States  \u003C=50K\n\n[1000行 x 15列]\n```\n\n## 👩‍🎓 相关工作\n\n- CTGAN：[使用条件 GAN 建模表格数据](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2019\u002Fhash\u002F254ed7d2de3b23ab10936522dd547b78-Abstract.html)\n- C3-TGAN：[C3-TGAN——可控的表格数据合成，具备显式相关性和属性约束](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F374652636_C3-TGAN-_Controllable_Tabular_Data_Synthesis_with_Explicit_Correlations_and_Property_Constraints)\n- TVAE：[使用条件 GAN 建模表格数据](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2019\u002Fhash\u002F254ed7d2de3b23ab10936522dd547b78-Abstract.html)\n- table-GAN：[基于生成对抗网络的数据合成](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1806.03384.pdf)\n- CTAB-GAN：[CTAB-GAN：高效的表格数据合成](https:\u002F\u002Fproceedings.mlr.press\u002Fv157\u002Fzhao21a\u002Fzhao21a.pdf)\n- OCT-GAN：[OCT-GAN：基于神经 ODE 的条件表格 GAN](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2105.14969.pdf)\n\n## 🤝 加入社区\n\nSDG 项目由**哈尔滨工业大学数据安全研究所**发起。如果您对我们的项目感兴趣，欢迎加入我们的社区。我们欢迎所有致力于通过开源方式保护和保障数据安全的组织、团队和个人：\n\n- 在提交 Pull Request 之前，请先阅读 [CONTRIBUTING](.\u002FCONTRIBUTING.md)。\n- 通过查看 [View Good First Issue](https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Flabels\u002Fgood%20first%20issue) 提交 issue，或直接提交 Pull Request。\n- 通过二维码加入我们的微信交流群。\n\n\u003Cdiv align=\"left\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhitsz-ids_synthetic-data-generator_readme_f42a7a5867cd.jpg\" width=\"200\" >\n\u003C\u002Fdiv>\n\n## 📄 许可协议\n\nSDG 开源项目采用 Apache-2.0 许可协议，详情请参阅 
[LICENSE](https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fblob\u002Fmain\u002FLICENSE)。","# synthetic-data-generator 快速上手指南\n\n## 环境准备\n\n### 系统要求\n- 操作系统：支持 Linux、macOS 或 Windows（推荐使用 Linux 或 macOS）\n- Python 版本：3.9 及以上版本（自 0.2.3 版本起已不再支持 Python 3.8）\n- 内存建议：至少 4GB RAM，生成大规模数据时建议更高配置\n\n### 前置依赖\n确保已安装以下工具：\n- `git`\n- `docker`（如需使用预构建镜像）\n- `pip`（用于安装 Python 包）\n\n## 安装步骤\n\n### 方法一：使用 PyPI 安装（推荐）\n\n```bash\npip install sdgx\n```\n\n> 如果你在国内，可以使用国内镜像源加速安装，例如：\n\n```bash\npip install sdgx -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 方法二：本地安装（推荐开发使用）\n\n```bash\ngit clone git@github.com:hitsz-ids\u002Fsynthetic-data-generator.git\ncd synthetic-data-generator\npip install .\n# 或者直接从 Git 安装\npip install git+https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator.git\n```\n\n### 方法三：使用预构建 Docker 镜像\n\n```bash\ndocker pull idsteam\u002Fsdgx:latest\n```\n\n## 基本使用\n\n以下是一个最简单的单表数据合成示例：\n\n```python\nfrom sdgx.data_connectors.csv_connector import CsvConnector\nfrom sdgx.models.ml.single_table.ctgan import CTGANSynthesizerModel\nfrom sdgx.synthesizer import Synthesizer\nfrom sdgx.utils import download_demo_data\n\n# 下载示例数据到当前目录下的 dataset 文件夹\ndataset_csv = download_demo_data()\n\n# 创建 CSV 数据连接器\ndata_connector = CsvConnector(path=dataset_csv)\n\n# 初始化合成器，使用 CTGAN 模型\nsynthesizer = Synthesizer(\n    model=CTGANSynthesizerModel(epochs=1),  # 为快速演示设置较少的训练轮数\n    data_connector=data_connector,\n)\n\n# 训练模型\nsynthesizer.fit()\n\n# 生成 1000 条合成数据\nsampled_data = synthesizer.sample(1000)\nprint(sampled_data)\n```\n\n运行后将输出类似如下格式的合成数据：\n\n```python\n     age workclass  fnlwgt     education  ...  capitalloss hoursperweek native-country  class\n0      1       NaN   28219  Some-college  ...            0            2    Puerto-Rico  \u003C=50K\n1      2   Private  250166       HS-grad  ...            
0            2  United-States   >50K\n...\n```\n\n通过此流程，你可以快速体验 synthetic-data-generator 的基本功能。更多高级用法请参考官方文档或项目中的示例代码。","某金融公司数据科学团队正在开发一个用于信用评分的机器学习模型，他们需要大量带有敏感客户信息的真实数据进行训练和测试，但由于隐私法规限制，无法直接使用原始数据。\n\n### 没有 synthetic-data-generator 时\n- 数据团队难以获取符合业务需求的高质量训练数据，导致模型训练效果不佳。\n- 使用真实数据进行测试可能违反 GDPR 等隐私保护法规，带来法律风险。\n- 手动创建模拟数据耗时费力，且难以保证数据分布与真实数据一致。\n- 数据共享过程中，因隐私问题无法与其他部门或外部合作伙伴协作。\n- 缺乏灵活的数据生成工具，难以快速迭代和验证不同模型假设。\n\n### 使用 synthetic-data-generator 后\n- 能够快速生成符合业务逻辑和统计特征的高质量合成数据，显著提升模型训练效率和准确性。\n- 生成的数据完全不含敏感信息，可安全用于测试、调试及跨部门协作，规避隐私合规风险。\n- 提供灵活的配置选项，支持自定义数据结构和分布，便于快速生成多样化数据集。\n- 支持大规模数据生成，满足高吞吐量场景下的数据需求，提升系统开发和测试效率。\n- 通过标准化接口和文档支持，降低了数据生成的技术门槛，提升了团队协作效率。\n\n合成数据生成框架为金融行业提供了安全、高效的数据处理方案，助力模型开发与合规实践并行。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhitsz-ids_synthetic-data-generator_dc2dc383.gif","hitsz-ids","哈工大（深圳）数据安全研究院","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fhitsz-ids_ce5d26c6.png","Open source projects from Institute of Data Security, Harbin Institute of Technology (Shen Zhen)",null,"https:\u002F\u002Fidslab.io","https:\u002F\u002Fgithub.com\u002Fhitsz-ids",[83,87,91,95],{"name":84,"color":85,"percentage":86},"Python","#3572A5",59.2,{"name":88,"color":89,"percentage":90},"Jupyter Notebook","#DA5B0B",40.8,{"name":92,"color":93,"percentage":94},"Dockerfile","#384d54",0,{"name":96,"color":97,"percentage":94},"Shell","#89e051",2416,387,"2026-04-05T08:36:50","Apache-2.0","Linux, macOS, Windows","需要 NVIDIA GPU，显存 8GB+，CUDA 11.7+","16GB+",{"notes":106,"python":107,"dependencies":108},"建议使用 conda 管理环境，首次运行需下载约 5GB 
模型文件","3.8+",[109,110,111,112,113,114,115],"torch>=2.0","transformers>=4.30","accelerate","pandas","numpy","scikit-learn","sdgx",[15,14,13,51,26],[118,119,120,121,122,123,124,125,126,127],"deep-learning","gan","generative-ai","machine-learning","privacy","synthetic-data","tabular-data","agent","data-generator","llm","2026-03-27T02:49:30.150509","2026-04-06T10:23:44.514093",[131,136,140,145,150,155],{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},5592,"如何限制生成数据的数值范围？","如果原始数据中某些列只包含正数，但生成的数据中出现了负值，可以通过使用 `PositiveNegativeFilter` 来约束生成数据的范围。具体步骤如下：\n1. 从 `sdgx.data_processors.filter.positive_negative` 导入 `PositiveNegativeFilter`\n2. 使用原始数据创建 `Metadata`\n3. 初始化并配置 `PositiveNegativeFilter`\n4. 将过滤器添加到合成器的处理流程中\n示例代码：\n```python\nfrom sdgx.data_processors.filter.positive_negative import PositiveNegativeFilter\nfrom sdgx.data_models.metadata import Metadata\n\n# 创建元数据\nmetadata = Metadata.from_dataframe(original_data)\n\n# 初始化并配置过滤器\npos_neg_filter = PositiveNegativeFilter()\npos_neg_filter.fit(metadata)\n\n# 添加到合成器\nsynthesizer.add_processor(pos_neg_filter)\n```\n此方法可以确保生成的数据不会出现不符合原始数据范围的值。","https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fissues\u002F231",{"id":137,"question_zh":138,"answer_zh":139,"source_url":135},5593,"如何处理生成数据中的负值问题？","如果在生成数据时发现某些本应为正数的列中出现了负值，可以使用 `PositiveNegativeFilter` 过滤器来解决该问题。具体操作包括：\n1. 从 `sdgx.data_processors.filter.positive_negative` 导入 `PositiveNegativeFilter`\n2. 使用原始数据创建 `Metadata`\n3. 初始化并配置 `PositiveNegativeFilter`\n4. 
将其添加到合成器的处理流程中\n示例代码：\n```python\nfrom sdgx.data_processors.filter.positive_negative import PositiveNegativeFilter\nfrom sdgx.data_models.metadata import Metadata\n\n# 创建元数据\nmetadata = Metadata.from_dataframe(original_data)\n\n# 初始化并配置过滤器\npos_neg_filter = PositiveNegativeFilter()\npos_neg_filter.fit(metadata)\n\n# 添加到合成器\nsynthesizer.add_processor(pos_neg_filter)\n```\n通过这种方式，可以有效避免生成数据中出现不合理的负值。",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},5594,"如何确保外键列的数据类型一致？","为了确保外键列的数据类型一致，可以在元数据中添加验证逻辑，以检查与外键相关的两列是否具有相同的数据类型。建议在提交 PR 时加入测试用例，涵盖不同类型的外键情况。例如，可以对外键字段进行类型校验，确保它们在生成数据之前保持一致。","https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fissues\u002F110",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},5595,"如何生成时序数据（如典型日数据）？","要生成类似“北京地区8月份工业园区某一天的天气和用电负荷”这样的时序数据，可以采用以下方法：\n1. 收集历年同一天（如8月1日）的历史数据，并整理成表格。\n2. 使用统计模型（如 ARMA、ARIMA 或 SARIMA）训练模型，生成符合分布的仿真数据。\n3. 对于每天的数据，重复上述过程31次，即可获得整个月的仿真数据。\n此外，也可以结合时间序列分析和机器学习算法，根据已知特征补全其他特征，从而更精确地生成所需数据。","https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fissues\u002F170",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},5596,"是否有针对中文数据的隐私保护转换器？","目前 `sdgx` 库尚未提供专门针对中文数据的隐私保护转换器或格式化器。不过，你可以通过自定义数据处理器实现对中文敏感字段的脱敏处理，同时保留数据的基本统计特性。建议参考现有的数据处理器实现方式，编写适合中文文本的脱敏逻辑。","https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fissues\u002F222",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},5597,"如何添加贡献者信息？","可以通过 `@all-contributors` 命令将贡献者添加到项目中。例如，执行以下命令：\n```bash\n@all-contributors please add @username for code\n```\n其中，`@username` 是需要添加的贡献者用户名，`code` 表示其贡献类型。维护者会审核并确认贡献者的身份后，将其信息更新到项目中。","https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fissues\u002F52",[161,166,171,176,181,186,191,196,201,206,211,216,221,226,231],{"id":162,"version":163,"summary_zh":164,"released_at":165},114899,"0.1.0a0","## What's Changed\r\n* Update SDG's New Data Processor by @MooooCat in 
https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F48\r\n* Rewrite and notice copyright for CTGAN by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F50\r\n* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F49\r\n* docs: add Wh1isper as a contributor for code by @allcontributors in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F53\r\n* docs: add MooooCat as a contributor for code by @allcontributors in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F54\r\n* docs: add joeyscave as a contributor for code by @allcontributors in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F55\r\n* Breaking changes: Refactoring of new architecture by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F56\r\n* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F63\r\n* 0.1.0: Intro DataConnector and DataLoader by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F64\r\n* [0.1.0] Metadata and Inspector by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F67\r\n* [0.1.0]Breaking changes: Reactoring models Part 1 by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F68\r\n* [0.1.0] Update a part of docs by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F70\r\n* Update Base Class of Metric by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F60\r\n* docs: add sjh120 as a contributor 
for code by @allcontributors in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F73\r\n* [0.1.0] Refactoring CTGAN for DataLoader by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F72\r\n* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F76\r\n* [0.1.0] Intro NDArryLoader by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F75\r\n* Add subdir for NDArrayLoader to prevent collision of cache files by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F78\r\n\r\n## New Contributors\r\n* @allcontributors made their first contribution in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F53\r\n* @sjh120 made their first contribution in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F60\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcompare\u002F0.0.1...0.1.0a0","2023-12-19T06:34:45",{"id":167,"version":168,"summary_zh":169,"released_at":170},114900,"0.0.1","## What's Changed\r\n* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F2\r\n* Configure Sweep by @sweep-ai in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F4\r\n* Add base class for metrics in metrics\u002Fbase.py by @sweep-ai in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F5\r\n* Update the github issue template by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F6\r\n* Fix error in example 1 by @MooooCat in 
https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F10\r\n* Merge model base class by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F11\r\n* update development document by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F13\r\n* bugfix: Fix function's arg default List by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F14\r\n* Update README.md by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F15\r\n* Merge branch dev to  main by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F17\r\n* Update document by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F18\r\n* update dependencies (copulas) by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F19\r\n* Update Readme (English Version) by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F21\r\n* remove local issue template by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F23\r\n* Update Slack Shield by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F30\r\n* Document align by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F31\r\n* Update shields by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F32\r\n* Merge GaussianCopula Model by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F20\r\n* Optimize the Transformer module by @MooooCat in 
https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F35\r\n* Delete unnecessary files or directories by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F36\r\n* Add Local Install Description by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F37\r\n* Update requirements.txt by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F38\r\n* Fix local install error by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F39\r\n* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F41\r\n* Add test cov and docker images by @wunder957 in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F46\r\n* Adding CLI and Plugin system by @wunder957 in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F47\r\n\r\n## New Contributors\r\n* @pre-commit-ci made their first contribution in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F2\r\n* @sweep-ai made their first contribution in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F4\r\n* @MooooCat made their first contribution in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F6\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcompare\u002F0.0.0a0...0.0.1","2023-12-04T08:31:02",{"id":172,"version":173,"summary_zh":174,"released_at":175},114901,"0.0.0a0","Test Release CI","2023-08-11T03:15:21",{"id":177,"version":178,"summary_zh":179,"released_at":180},114887,"0.2.4","## What's Changed\r\n* Docs - Update README News and fix the link of python package badge. 
by @jalr4ever in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F243\r\n* Bugfix - Datatime formatter in small dataset and improve performace by @cyantangerine in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F244\r\n* Bugfix - Fixed numeric inspector error for int32\u002Ffloat32 types, by @cyantangerine in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F247\r\n* Feature - Support more encoders,` NormalizedFrequencyEncoder` & `NormalizedLabelEncoder`, by @cyantangerine in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F247\r\n* Feature - Integrate `GaussianCopula` model into the `Synthesizer`. by @jalr4ever in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F241\r\n* Feature - Support `DataFrameConnector` for in-memory datasets processing, by @cyantangerine in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F247\r\n* Feature - Support `NormalizedFrequencyEncoder` and `NormalizedLabelEncoder` for categorical encoding, by @cyantangerine in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F247\r\n* Enhancement - Support CTGAN sample with `drop_more` parameter for better generation efficiency, by @cyantangerine in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F247\r\n* Enhancement - Improved Disk_cache performance by avoiding pd iterative connections, by @cyantangerine in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F247\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcompare\u002F0.2.3...0.2.4","2024-12-03T02:44:38",{"id":182,"version":183,"summary_zh":184,"released_at":185},114888,"0.2.3","## What's Changed\r\n* Enhance -  Handling Fixed Column 
Relationships using FixedCombinationInspector and FixedCombinationTransformer by @MooooCat @jalr4ever in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F219\r\n* BugFix - Fix the type error in the `query` function of Metadata. by @jalr4ever in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F235\r\n* Enhance - Handling fixed column relationships by `specific_combinations` and `SpecificCombinationTransformer`. by @jalr4ever in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F236\r\n* chore: Drop python 3.8 support and improve ci file name by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F237\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcompare\u002F0.2.2...0.2.3","2024-11-18T09:31:31",{"id":187,"version":188,"summary_zh":189,"released_at":190},114889,"0.2.2","## What's Changed\r\n* Feature: Add progressbar for CTGAN when fitting and sampling by @cyantangerine in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F228\r\n* Enhance: Check the type of foreign key by @Z712023 in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F229\r\n* BugFix: Parallel Data Processing by @cyantangerine in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F227\r\n* Enhanee: Improved CONTRIBUTING Docs with 4+1 view and Overview Diagram by @jalr4ever in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F226\r\n* BugFix: Regulate positive-negative values in the generated data by @jalr4ever in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F232\r\n* Enhance: Tenfold performance boost for reduce the memory usage of Gaussian Copula training. 
by @jalr4ever in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F233\r\n\r\n## New Contributors\r\n* @cyantangerine made their first contribution in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F228\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcompare\u002F0.2.1...0.2.2","2024-11-08T02:18:50",{"id":192,"version":193,"summary_zh":194,"released_at":195},114890,"0.2.1","## What's Changed\r\n* Add CHN address inspector by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F158\r\n* Update inspector part in Doc(API Reference) by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F159\r\n* Add dotenv in single-table gpt model by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F161\r\n* Speed up regex inspector, Add chn\u002Feng name inspectors by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F162\r\n* Add single table metadata example by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F166\r\n* bugfix: SingleTableGPTModel._sample_with_data  \"has no attribute 'result'\" by @aaronrmm in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F174\r\n* Change Metadata.column_list from Set to  List by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F176\r\n* Remove unnecessary dependency torchvision by @Guo-Yunzhe in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F177\r\n* Update pyproject.toml (joblib version) by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F175\r\n* Bugfix: fix gussian copula 
segmentfault error by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F180\r\n* Bugfix: fix division by zero error in numeric inspector, add comments by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F181\r\n* Intro data processor in sdgx by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F171\r\n* Intro data processor in Readme by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F182\r\n* Fix View GFI Link in Readme by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F183\r\n* Fix precision problem in metric's testcases by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F185\r\n* Use GLM-4 by @TracyWang95 in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F188\r\n* Pin numpy\u003C2 by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F190\r\n* Feature: Add Email Generator (a new type of sdgx.data_processor) by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F184\r\n* Add ChnPiiGenerator and Enhance Models by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F191\r\n* Update documentation and docstrings for DataProcessors by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F186\r\n* Add live QR code by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F198\r\n* Enhance Data Handling with Empty Column Inspector and Transformer by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F197\r\n* Update 
NonValueTransformer's Default Setting and Handle Custom Fill Values by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F199\r\n* Enhance Chinese Name Inspector by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F200\r\n* Add Chinese Company Name Support and Inspector by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F201\r\n* Update Live QR Code Image by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F203\r\n* BugFix: `base_url` not included when request to gpt in SingleTableGPTModel by @jalr4ever in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F205\r\n* Enhance: Fix Data Quality with Outlier Handling and Improved Missing Value Treatment by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F207\r\n* Typo Fix: Unified Logger Usage by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F209\r\n* Update Live QR Code Image 0730 by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F210\r\n* Bugfix: Update Fit Methods in Data Processors by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F211\r\n* Add ConstInspector and ConstValueTransformer for Handling Constant Columns by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F202\r\n* Enhance: Add NonValueTransformer Reverse Conversion with NAN_VALUE Replacement by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F212\r\n* Maintenance: Update CTGAN Example to Use Latest SDG by @MooooCat in 
https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F213\r\n* Fix Minor Typo by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F216\r\n* Enhance Numeric Data Inspection and Introduce Positive\u002FNegative Filtering by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F217\r\n* Fix Division by Zero Error in Numeric Column Inspection by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F220\r\n\r\n## New Contributors\r\n* @aaronrmm made their first contribution in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F174\r\n* @Guo-Yunzhe made their first contribution in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fp","2024-10-11T01:57:58",{"id":197,"version":198,"summary_zh":199,"released_at":200},114891,"0.2.0","## What's Changed\r\n\r\n### LLM-Based SingleTable Model\r\n\r\nA single-table data synthesis model based on LLM is included, view colab example: \r\n- [LLM: Data Synthesis](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1VFnP59q3eoVtMJ1PvcYjmuXtx9N8C7o0?usp=sharing);\r\n- [LLM: Off-table Feature Inference](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1_chuTVZECpj5fklj-RAp7ZVrew8weLW_?usp=sharing).\r\n\r\nCommits:\r\n\r\n* Introduce LLM-based single-table model. 
by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F129\r\n* Bugfix: fix model type typo by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F144\r\n* Bugfix: fix return datatype in _sample_with_metadata by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F145\r\n* Bugfix: fix LLM result typo by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F146\r\n\r\n\r\n### Improvements  on Inspectors\r\n\r\n* Add Regex Inspector and Email Inspector example. by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F115\r\n* Implement datetime_formats in DatetimeInspector by @Femi-lawal in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F125\r\n* Distinguish int\u002Ffloat in NumericInspector by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F133\r\n\r\n\r\n### Metadata \r\n\r\n* Bugfix: fix KeyError when metadata raising an MetadataInvalidError. 
by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F134\r\n* Add dict support on metadata, optimize datetime format judgment rules, add eq for combiner by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F135\r\n\r\n\r\n### Python 3.12 Support\r\n\r\n* Intro 3.12 in CI testing by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F141\r\n\r\n\r\n### Readme and Docs\r\n\r\n* Update README.md by @iokk3732 in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F123\r\n* docs: add iokk3732 as a contributor for code by @allcontributors in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F127\r\n* docs: add Femi-lawal as a contributor for code by @allcontributors in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F128\r\n* Add language switch on Readme.md by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F130\r\n* Minor modifications on readme by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F131\r\n* Update SDG Readme by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F139\r\n* Update doc readme by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F140\r\n* Add Colab Examples, Update Readme by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F147\r\n* Update readme.md by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F150\r\n* Add ctgan description on Readme.md by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F151\r\n\r\n### Others 
\r\n\r\n* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F124\r\n* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F138\r\n\r\n\r\n\r\n## New Contributors\r\n* @iokk3732 made their first contribution in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F123\r\n* @Femi-lawal made their first contribution in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F125\r\n\r\n## Full Changelog\r\n\r\n**Please view**: https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcompare\u002F0.1.5...0.2.0","2024-03-01T03:31:50",{"id":202,"version":203,"summary_zh":204,"released_at":205},114892,"0.1.5","## What's Changed\r\n* docs: add Z712023 as a contributor for code by @allcontributors in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F112\r\n* Bugfix metric mutual information by @Z712023 in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F118\r\n* [Bugfix] Temporarily modify single table demo data link by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F121\r\n* Introduce inspect_level in inspector and metadata by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F113\r\n* Add start history chart in README by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F122\r\n\r\n## New Contributors\r\n* @Z712023 made their first contribution in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F118\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcompare\u002F0.1.4...0.1.5","2024-01-22T08:10:29",{"id":207,"version":208,"summary_zh":209,"released_at":210},114893,"0.1.4","## What's Changed\r\n* [Bugfix] Add future annotations by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F106\r\n* Add testing for JSD metrics by @sjh120 in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F100\r\n* Add base model for multi-table statistic model, change single-table base class location by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F102\r\n* Add mutual information metric by @Z712023  in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F101\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcompare\u002F0.1.3...0.1.4","2024-01-16T13:44:30",{"id":212,"version":213,"summary_zh":214,"released_at":215},114894,"0.1.3","## What's Changed\r\n* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F87\r\n* [0.2.0] Metadata Implementation by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F81\r\n* Patch on multi table combiner and test case by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F89\r\n* Fix typo _dumo_json by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F90\r\n* Intro dummy table for speedup models case by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F92\r\n* Intro torchrun in CLI by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F88\r\n* 
Implement MetadataCombiner, partial refactoring on Metadata by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F96\r\n* Add mock data and testing for multi tables' related imp by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F97\r\n* Intro SubsetRelationshipInspector by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F99\r\n* Add demo data for multi-table scenario by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F98\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcompare\u002F0.1.2...0.1.3","2024-01-08T06:46:44",{"id":217,"version":218,"summary_zh":219,"released_at":220},114895,"0.1.2","## What's Changed\r\n* CLI for single table synthesizer by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F86\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcompare\u002F0.1.1...0.1.2","2023-12-23T02:30:14",{"id":222,"version":223,"summary_zh":224,"released_at":225},114896,"0.1.1","## What's Changed\r\n* Add more testing, fix some bugs, drop mem cache by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F85\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcompare\u002F0.1.0...0.1.1","2023-12-21T05:32:54",{"id":227,"version":228,"summary_zh":229,"released_at":230},114897,"0.1.0","## What's Changed\r\n* Update SDG's New Data Processor by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F48\r\n* Rewrite and notice copyright for CTGAN by @Wh1isper in 
https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F50\r\n* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F49\r\n* docs: add Wh1isper as a contributor for code by @allcontributors in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F53\r\n* docs: add MooooCat as a contributor for code by @allcontributors in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F54\r\n* docs: add joeyscave as a contributor for code by @allcontributors in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F55\r\n* Breaking changes: Refactoring of new architecture by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F56\r\n* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F63\r\n* 0.1.0: Intro DataConnector and DataLoader by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F64\r\n* [0.1.0] Metadata and Inspector by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F67\r\n* [0.1.0] Breaking changes: Refactoring models Part 1 by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F68\r\n* [0.1.0] Update a part of docs by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F70\r\n* Update Base Class of Metric by @MooooCat in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F60\r\n* docs: add sjh120 as a contributor for code by @allcontributors in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F73\r\n* [0.1.0] Refactoring 
CTGAN for DataLoader by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F72\r\n* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F76\r\n* [0.1.0] Intro NDArrayLoader by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F75\r\n* Add subdir for NDArrayLoader to prevent collision of cache files by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F78\r\n* Init benchmark base code by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F80\r\n* Switch to cloudpickle and fix load bugs by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F83\r\n* Update docstring and user guides by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F84\r\n\r\n## New Contributors\r\n* @allcontributors made their first contribution in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F53\r\n* @sjh120 made their first contribution in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F60\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcompare\u002F0.0.1...0.1.0","2023-12-20T09:17:41",{"id":232,"version":233,"summary_zh":234,"released_at":235},114898,"0.1.0b0","## What's Changed\r\n* Init benchmark base code by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F80\r\n* Switch to cloudpickle and fix load bugs by @Wh1isper in https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F83\r\n* Update docstring and user guides by @Wh1isper in 
https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fpull\u002F84\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fhitsz-ids\u002Fsynthetic-data-generator\u002Fcompare\u002F0.1.0a0...0.1.0b0","2023-12-20T09:07:22"]