[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-tensorflow--datasets":3,"tool-tensorflow--datasets":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":75,"owner_avatar_url":76,"owner_bio":77,"owner_company":78,"owner_location":78,"owner_email":79,"owner_twitter":78,"owner_website":80,"owner_url":81,"languages":82,"stars":119,"forks":120,"last_commit_at":121,"license":122,"difficulty_score":123,"env_os":124,"env_gpu":124,"env_ram":124,"env_deps":125,"category_tags":130,"github_topics":131,"view_count":10,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":137,"updated_at":138,"faqs":139,"releases":168},1383,"tensorflow\u002Fdatasets","datasets","TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...","datasets（原名 TensorFlow Datasets，简称 TFDS）是一个专为机器学习和深度学习打造的开源数据集库。它汇集了海量公开数据集，并将其统一封装为标准的 `tf.data.Dataset` 格式，让开发者能轻松加载数据并直接用于 TensorFlow、JAX 等主流框架的训练任务。\n\n在 AI 开发中，寻找、下载、清洗和格式化数据往往耗时费力且容易出错。datasets 通过自动化处理这些繁琐的预处理步骤，解决了数据准备阶段的痛点，确保所有用户获取的数据顺序一致、结果可复现，同时遵循最佳实践以实现极高的读取性能。无论是快速原型验证还是大规模模型训练，它都能让数据管道构建变得简单高效。\n\n这款工具非常适合 AI 研究人员、算法工程师以及深度学习爱好者使用。其核心亮点在于“开箱即用”的简洁性：只需几行代码即可加载如 MNIST 等经典数据集，并自动完成打乱、分批和预取等操作。此外，它还具备高度的可扩展性，高级用户可以轻松添加自定义数据集或微调加载逻辑。如果你希望将精力集中在模型创新而非数据清洗上，datasets 将是你得力的助手。","# TensorFlow Datasets\n\nTensorFlow Datasets provides many public datasets as 
`tf.data.Datasets`.\n\n[![Unittests](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Factions\u002Fworkflows\u002Fpytest.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Factions\u002Fworkflows\u002Fpytest.yml)\n[![PyPI version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Ftensorflow-datasets.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Ftensorflow-datasets)\n[![Python 3.10+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)\n[![Tutorial](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdoc-tutorial-blue.svg)](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Foverview)\n[![API](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdoc-api-blue.svg)](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fapi_docs\u002Fpython\u002Ftfds)\n[![Catalog](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdoc-datasets-blue.svg)](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fcatalog\u002Foverview#all_datasets)\n\n## Documentation\n\nTo install and use TFDS, we strongly encourage you to start with our\n[**getting started guide**](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Foverview). 
Try\nit interactively in a\n[Colab notebook](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Ftensorflow\u002Fdatasets\u002Fblob\u002Fmaster\u002Fdocs\u002Foverview.ipynb).\n\nOur documentation contains:\n\n* [Tutorials and guides](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Foverview)\n* List of all [available datasets](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fcatalog\u002Foverview#all_datasets)\n* The [API reference](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fapi_docs\u002Fpython\u002Ftfds)\n\n```python\n# !pip install tensorflow-datasets\nimport tensorflow_datasets as tfds\nimport tensorflow as tf\n\n# Construct a tf.data.Dataset\nds = tfds.load('mnist', split='train', as_supervised=True, shuffle_files=True)\n\n# Build your input pipeline\nds = ds.shuffle(1000).batch(128).prefetch(10).take(5)\nfor image, label in ds:\n  pass\n```\n\n## TFDS core values\n\nTFDS has been built with these principles in mind:\n\n* **Simplicity**: Standard use-cases should work out-of-the box\n* **Performance**: TFDS follows\n  [best practices](https:\u002F\u002Fwww.tensorflow.org\u002Fguide\u002Fdata_performance)\n  and can achieve state-of-the-art speed\n* **Determinism\u002Freproducibility**: All users get the same examples in the same\n  order\n* **Customisability**: Advanced users can have fine-grained control\n\nIf those use cases are not satisfied, please send us\n[feedback](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fissues).\n\n## Want a certain dataset?\n\nAdding a dataset is really straightforward by following\n[our guide](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fadd_dataset).\n\nRequest a dataset by opening a\n[Dataset request GitHub issue](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fissues\u002Fnew?assignees=&labels=dataset+request&template=dataset-request.md&title=%5Bdata+request%5D+%3Cdataset+name%3E).\n\nAnd vote on the current\n[set of 
requests](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Flabels\u002Fdataset%20request)\nby adding a thumbs-up reaction to the issue.\n\n### Citation\n\nPlease include the following citation when using `tensorflow-datasets` for a\npaper, in addition to any citation specific to the used datasets.\n\n```bibtex\n@misc{TFDS,\n  title = {{TensorFlow Datasets}, A collection of ready-to-use datasets},\n  howpublished = {\\url{https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets}},\n}\n```\n\n#### *Disclaimers*\n\n*This is a utility library that downloads and prepares public datasets. We do*\n*not host or distribute these datasets, vouch for their quality or fairness, or*\n*claim that you have license to use the dataset. It is your responsibility to*\n*determine whether you have permission to use the dataset under the dataset's*\n*license.*\n\n*If you're a dataset owner and wish to update any part of it (description,*\n*citation, etc.), or do not want your dataset to be included in this*\n*library, please get in touch through a GitHub issue. Thanks for your*\n*contribution to the ML community!*\n\n*If you're interested in learning more about responsible AI practices, including*\n*fairness, please see Google AI's [Responsible AI Practices](https:\u002F\u002Fai.google\u002Feducation\u002Fresponsible-ai-practices).*\n\n*`tensorflow\u002Fdatasets` is Apache 2.0 licensed. 
See the\n[`LICENSE`](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fblob\u002Fmaster\u002FLICENSE) file.*\n","# TensorFlow Datasets\n\nTensorFlow Datasets 以 `tf.data.Datasets` 的形式提供了众多公开数据集。\n\n[![单元测试](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Factions\u002Fworkflows\u002Fpytest.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Factions\u002Fworkflows\u002Fpytest.yml)\n[![PyPI 版本](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Ftensorflow-datasets.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Ftensorflow-datasets)\n[![Python 3.10+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)\n[![教程](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdoc-tutorial-blue.svg)](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Foverview)\n[![API](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdoc-api-blue.svg)](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fapi_docs\u002Fpython\u002Ftfds)\n[![目录](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdoc-datasets-blue.svg)](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fcatalog\u002Foverview#all_datasets)\n\n## 文档\n\n如需安装并使用 TFDS，我们强烈建议您从我们的 [**入门指南**](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Foverview) 开始。您可以在 [Colab notebook](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Ftensorflow\u002Fdatasets\u002Fblob\u002Fmaster\u002Fdocs\u002Foverview.ipynb) 中交互式试用。\n\n我们的文档包含：\n\n* [教程与指南](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Foverview)\n* 所有[可用数据集](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fcatalog\u002Foverview#all_datasets)的列表\n* [API 参考](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fapi_docs\u002Fpython\u002Ftfds)\n\n```python\n# !pip install tensorflow-datasets\nimport tensorflow_datasets as tfds\nimport tensorflow as tf\n\n# 构建一个 tf.data.Dataset\nds = tfds.load('mnist', split='train', as_supervised=True, shuffle_files=True)\n\n# 构建您的输入管道\nds = 
ds.shuffle(1000).batch(128).prefetch(10).take(5)\nfor image, label in ds:\n  pass\n```\n\n## TFDS 核心价值\n\nTFDS 的构建始终秉持以下原则：\n\n* **简洁性**：标准使用场景应能即刻投入使用。\n* **性能**：TFDS 遵循[最佳实践](https:\u002F\u002Fwww.tensorflow.org\u002Fguide\u002Fdata_performance)，并能够实现业界领先的运行速度。\n* **确定性\u002F可重复性**：所有用户都能以相同的顺序获得相同的样本。\n* **可定制性**：高级用户可以实现精细的控制。\n\n若上述使用场景未能满足您的需求，请通过 [反馈](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fissues) 向我们提出建议。\n\n## 想要特定的数据集吗？\n\n只需按照我们的[指南](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fadd_dataset)操作，即可轻松添加数据集。\n\n如需申请某项数据集，请打开 [数据集申请 GitHub 问题](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fissues\u002Fnew?assignees=&labels=dataset+request&template=dataset-request.md&title=%5Bdata+request%5D+%3Cdataset+name%3E)。\n\n此外，您还可以对当前的[请求集合](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Flabels\u002Fdataset%20request)进行投票，只需在问题中添加一个“点赞”反应即可。\n\n### 引用格式\n\n在将 `tensorflow-datasets` 用于论文时，请务必附上以下引用信息，同时加入针对所用数据集的专属引用。\n\n```bibtex\n@misc{TFDS,\n  title = {{TensorFlow Datasets}, A collection of ready-to-use datasets},\n  howpublished = {\\url{https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets}},\n}\n```\n\n#### *免责声明*\n\n*本库为一款工具类库，负责下载并准备公开数据集。我们不负责托管或分发这些数据集，亦不对其质量或公平性作任何背书，更不声称您拥有使用该数据集的合法授权。是否具备使用该数据集的权限，需由您自行判断，并依据数据集自身的许可协议来决定。*\n\n*如果您是数据集的所有者，且希望更新数据集的任何部分（例如数据描述、引用等），或希望避免将您的数据集纳入本库，请通过 GitHub 问题与我们联系。感谢您为机器学习社区作出的贡献！*\n\n*如果您对负责任的 AI 实践（包括公平性）感兴趣，请参阅 Google AI 的[负责任的 AI 实践](https:\u002F\u002Fai.google\u002Feducation\u002Fresponsible-ai-practices)。*\n\n*`tensorflow\u002Fdatasets` 采用 Apache 2.0 许可证。请参阅 [`LICENSE`](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fblob\u002Fmaster\u002FLICENSE) 文件。*","# TensorFlow Datasets (TFDS) 快速上手指南\n\nTensorFlow Datasets (TFDS) 是一个集合了众多公共数据集的库，可将其直接转换为 `tf.data.Dataset` 对象，方便用于机器学习模型的训练与评估。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Linux, macOS, 或 Windows\n*   **Python 版本**：3.10 或更高版本\n*   **前置依赖**：已安装 `tensorflow` (推荐 2.x 版本)\n\n## 安装步骤\n\n您可以使用 `pip` 
直接安装最新版本的 `tensorflow-datasets`。\n\n### 标准安装\n```bash\npip install tensorflow-datasets\n```\n\n### 国内加速安装（推荐）\n如果您在中国大陆地区，建议使用清华或阿里镜像源以加快下载速度：\n\n```bash\npip install tensorflow-datasets -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n> **注意**：首次运行代码加载数据集时，TFDS 会自动从源头下载数据。如果下载速度慢，可能需要配置网络代理或使用支持断点续传的环境。\n\n## 基本使用\n\n以下是加载数据集并构建输入管道的最简示例。本例以经典的 **MNIST** 手写数字数据集为例：\n\n```python\n# !pip install tensorflow-datasets\nimport tensorflow_datasets as tfds\nimport tensorflow as tf\n\n# 1. 构造 tf.data.Dataset\n# split='train': 加载训练集\n# as_supervised=True: 返回 (特征，标签) 元组\n# shuffle_files=True: 打乱文件顺序\nds = tfds.load('mnist', split='train', as_supervised=True, shuffle_files=True)\n\n# 2. 构建输入管道\n# shuffle: 打乱数据\n# batch: 设置批次大小\n# prefetch: 预取数据以提升性能\n# take: 仅取前 5 个批次用于演示（实际训练中可移除）\nds = ds.shuffle(1000).batch(128).prefetch(10).take(5)\n\n# 3. 遍历数据\nfor image, label in ds:\n  pass  # 在此处添加您的模型训练逻辑\n```\n\n### 核心特性说明\n*   **开箱即用**：常用场景无需复杂配置即可运行。\n*   **高性能**：遵循 TensorFlow 数据加载最佳实践，支持高速读取。\n*   **确定性**：保证不同用户获取的数据示例及其顺序一致，确保实验可复现。\n*   **可扩展性**：高级用户可根据需求自定义数据集加载逻辑。\n\n如需查看支持的所有数据集列表，请访问 [TFDS 数据集目录](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fcatalog\u002Foverview#all_datasets)。","某计算机视觉团队正在开发一个基于 MNIST 手写数字识别的原型模型，急需快速构建标准化的数据输入管道以验证算法有效性。\n\n### 没有 datasets 时\n- **数据获取繁琐**：工程师需手动访问多个网站下载原始压缩包，并编写复杂的解压与清洗脚本，耗时数小时甚至数天。\n- **格式不统一**：不同来源的数据结构各异，需要额外编写代码将图像和标签转换为 TensorFlow 兼容的张量格式，极易出错。\n- **性能优化困难**：缺乏内置的高效加载机制，难以实现数据打乱（shuffle）、批处理（batch）及预取（prefetch），导致 GPU 等待数据，训练效率低下。\n- **复现性差**：由于手动处理流程缺乏标准化，团队成员间或不同实验间的数据划分顺序不一致，严重影响实验结果的可复现性。\n\n### 使用 datasets 后\n- **一键加载数据**：仅需一行 `tfds.load('mnist')` 代码，即可自动下载、验证并准备就绪数千个公开数据集，将准备工作缩短至分钟级。\n- **原生格式兼容**：直接返回优化后的 `tf.data.Dataset` 对象，图像与标签已自动对齐且类型正确，无需任何额外的格式转换代码。\n- **极致性能表现**：内置遵循 TensorFlow 最佳实践的性能优化策略，轻松实现高效的数据流水线，最大化 GPU 利用率，显著加速模型训练。\n- **确定性与可复现**：保证所有用户在不同环境下获取完全一致的数据样本顺序，确保实验结果严格可复现，便于团队协作与论文发表。\n\ndatasets 通过将复杂的数据工程标准化为简单的 API 
调用，让开发者能从繁琐的数据预处理中解放出来，专注于核心模型架构的创新与迭代。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftensorflow_datasets_6b23a619.png","tensorflow","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ftensorflow_07ed5093.png","",null,"github-admin@tensorflow.org","http:\u002F\u002Fwww.tensorflow.org","https:\u002F\u002Fgithub.com\u002Ftensorflow",[83,87,91,95,99,102,106,110,113,116],{"name":84,"color":85,"percentage":86},"Python","#3572A5",98.8,{"name":88,"color":89,"percentage":90},"Ruby","#701516",0.4,{"name":92,"color":93,"percentage":94},"Smalltalk","#596706",0.3,{"name":96,"color":97,"percentage":98},"NewLisp","#87AED7",0.2,{"name":100,"color":101,"percentage":98},"JavaScript","#f1e05a",{"name":103,"color":104,"percentage":105},"Shell","#89e051",0.1,{"name":107,"color":108,"percentage":109},"Erlang","#B83998",0,{"name":111,"color":112,"percentage":109},"TeX","#3D6117",{"name":114,"color":115,"percentage":109},"Gherkin","#5B2063",{"name":117,"color":118,"percentage":109},"Perl","#0298c3",4555,1593,"2026-04-04T10:43:47","Apache-2.0",1,"未说明",{"notes":126,"python":127,"dependencies":128},"该工具是一个实用程序库，用于下载和准备公共数据集，本身不托管数据集。用户需自行确认数据集的使用许可。建议在 Google Colab 中交互式试用或参考官方入门指南。","3.10+",[75,129],"tensorflow-datasets",[51,13],[75,132,133,67,134,135,136],"machine-learning","data","numpy","jax","dataset","2026-03-27T02:49:30.150509","2026-04-06T06:56:30.298645",[140,145,150,155,160,164],{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},6357,"遇到 'ImportError: cannot import name array_record_module' 错误该如何解决？","这通常是由于 tensorflow-datasets 版本不兼容导致的。最有效的解决方法是将 tensorflow-datasets 降级到 4.8.3 版本。请运行以下命令：\npip install -U tensorflow-datasets==4.8.3\n许多用户反馈升级到 4.9.0 或更高版本会出现此问题，而降级即可修复。","https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fissues\u002F4805",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},6358,"在没有网络连接或无法访问 Google Cloud 时，如何避免 'Google authentication bearer token failed' 错误？","如果您要加载的数据集已经完整存在于本地目录中，可以使用 `tfds.builder_from_directory` 
API 来加载数据集，这样可以避免库尝试连接 Google Cloud 进行身份验证。\n如果必须使用构建流程且数据集尚未生成，部分用户通过修改源码临时解决：注释掉 `tensorflow_datasets\u002Fcore\u002Fdataset_builder.py` 文件中第 296 行（调用 `self.info.initialize_from_bucket()` 的位置），但这属于非官方修改，需谨慎使用。","https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fissues\u002F2761",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},6359,"处理 c4\u002Fmultilingual 数据集时生成的 Dataflow 任务文件过大（超过 10MB 限制）怎么办？","在处理大型多语言数据集（如 c4\u002Fmultilingual）时，需要调整 Apache Beam 的配置以支持大规模并行处理。建议采取以下措施：\n1. 固定 `dill` 库的版本以避免序列化错误（例如：`echo \"dill==0.3.1.1\" >> requirements.txt`）。\n2. 在 Beam 选项中增加 worker 数量（例如设置为 450 个 worker）。\n3. 启用实验性洗牌模式：添加 `experiments=shuffle_mode=service` 参数。\n此外，确保已申请足够的 GCP 配额（Quota），因为默认配额通常不足以支撑此类大规模任务。","https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fissues\u002F2711",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},6360,"如何成为 tensorflow\u002Fdatasets 项目的协作者以认领 Issue？","若希望被分配 Issue 或贡献代码，需要先加入协作者列表。步骤如下：\n1. 在指定的协作 Issue（如 #142）下留言评论，请求添加为协作者（例如：\"Please add me as collaborator\"）。\n2. 等待维护者邀请后，前往 https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fsettings\u002Fcollaboration 接受邀请。\n完成上述步骤后，您即可被分配 Issue 并进行贡献。","https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fissues\u002F142",{"id":161,"question_zh":162,"answer_zh":163,"source_url":144},6361,"在 Windows 上运行 tensorflow-datasets 时遇到各类导入错误或兼容性问题，有什么通用排查思路？","Windows 用户遇到奇怪的安装或导入错误（如 array_record 相关错误）时，首先尝试清理环境并重新安装特定稳定版本。常见有效方案是：\n1. 卸载当前版本：`pip uninstall tensorflow-datasets`\n2. 
强制安装已知稳定的旧版本：`pip install tensorflow-datasets==4.8.3`\n很多新版本的 bug 在 Windows 环境下表现明显，回退到 4.8.3 往往能立即解决问题。同时确保 Python 版本（如 3.8 或 3.9）与 TFDS 版本兼容。",{"id":165,"question_zh":166,"answer_zh":167,"source_url":149},6362,"如何在本地构建自定义数据集时跳过 Google 认证步骤？","当您在本地运行 `tfds build` 构建自定义数据集时，如果不需要上传到 GCS 且没有配置 Google 凭证，可能会遇到认证报错。如果您的数据集文件已完全下载到本地，请改用 `tfds.builder_from_directory(dir_path)` 直接加载，而不是重新运行 build 流程。这会绕过所有云端认证检查。如果必须运行 build 且无网络，目前暂无官方开关完全禁用认证，部分高级用户选择临时注释源码中的初始化桶信息代码行，但不推荐生产环境使用。",[169,174,179,184,189,194,199,204,209,214,219,224,229,234,239,244,249,254,259,264],{"id":170,"version":171,"summary_zh":172,"released_at":173},105890,"v4.9.9","### Added\n\n- [LBPP dataset](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fcatalog\u002Flbpp).\n\n### Changed\n\n- `apache-beam` version is pinned at `\u003C2.65.0` until related tests are fixed,\n  see issue [11055](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fissues\u002F11055).\n\n### Deprecated\n\n### Removed\n\n### Fixed\n\n- CroissantBuilder now supports Croissant files without patch version (i.e. only\n  {major.minor} are provided).\n- Various small bug fixes.\n\n### Security","2025-05-28T13:38:19",{"id":175,"version":176,"summary_zh":177,"released_at":178},105891,"v4.9.8","### Added\n\n- New Beam writer `NoShuffleBeamWriter` that doesn't shuffle, which speeds up\n  dataset generation significantly, but does not have deterministic order\n  guarantees. 
Can be enabled with the flag `--nondeterministic_order`.\n- CroissantBuilder now supports Croissant files that define splits; and new\n  feature types: feature dictionaries and multidimensional arrays.\n- New datasets.\n\n### Changed\n\n### Deprecated\n\n### Removed\n\n### Fixed\n\n- Various small bug fixes.\n- Various performance improvements.\n\n### Security","2025-03-12T16:04:38",{"id":180,"version":181,"summary_zh":182,"released_at":183},105892,"v4.9.7","### Added\n\n- New datasets.\n\n### Changed\n\n- `CroissantBuilder`'s API to generate TFDS datasets from Croissant files.\n\n### Deprecated\n\n### Removed\n\n### Fixed\n\n- Versions for existing datasets.\n\n### Security","2024-10-30T12:58:53",{"id":185,"version":186,"summary_zh":187,"released_at":188},105893,"v4.9.6","### Added\n\n-   Full support for Python 3.12.","2024-06-05T08:15:47",{"id":190,"version":191,"summary_zh":192,"released_at":193},105894,"v4.9.5","### Added\n\n-   Support to download and prepare datasets using the\n    [Parquet](https:\u002F\u002Fparquet.apache.org) data format.\n    ```python\n    builder = tfds.builder('fashion_mnist', file_format='parquet')\n    builder.download_and_prepare()\n    ds = builder.as_dataset(split='train')\n    print(next(iter(ds)))\n    ```\n\n-   [`tfds.data_source`](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fapi_docs\u002Fpython\u002Ftfds\u002Fdata_source)\n    is pickable, thus working smoothly with\n    [PyGrain](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fgrain). Learn more by following the\n    [tutorial](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fdata_source).\n\n-   TFDS plays nicely with\n    [Croissant](https:\u002F\u002Fmlcommons.org\u002Fworking-groups\u002Fcroissant). 
Learn more by\n    following the\n    [recipe](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fmlcommons\u002Fcroissant\u002Fblob\u002Fmain\u002Fpython\u002Fmlcroissant\u002Frecipes\u002Ftfds_croissant_builder.ipynb).\n\n### Changed\n\n### Deprecated\n\n### Removed\n\n### Fixed\n\n### Security","2024-05-30T08:37:58",{"id":195,"version":196,"summary_zh":197,"released_at":198},105895,"v4.9.4","### Added\n\n-   A new [CroissantBuilder](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fformat_specific_dataset_builders#croissantbuilder)\n    which initializes a DatasetBuilder based on a [Croissant](https:\u002F\u002Fgithub.com\u002Fmlcommons\u002Fcroissant)\n    metadata file.\n-   New conversion options between different bounding boxes formats.\n-   Better support for `HuggingfaceDatasetBuilder`.\n-   A [script](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fblob\u002Fmaster\u002Ftensorflow_datasets\u002Fscripts\u002Fconvert_format.py)\n    to convert a dataset from one format to another.\n\n### Changed\n\n### Deprecated\n\n-   Python 3.9 support. TFDS now uses Python 3.10\n\n### Removed\n\n### Fixed\n\n### Security","2023-12-18T13:28:11",{"id":200,"version":201,"summary_zh":202,"released_at":203},105896,"v4.9.3","### Added\n\n-   [Segment Anything](https:\u002F\u002Fai.facebook.com\u002Fdatasets\u002Fsegment-anything-downloads)\n    (SA-1B) dataset.\n\n### Changed\n\n-   Hugging Face datasets accept `None` values for any features. TFDS has no\n    `tfds.features.Optional`, so `None` values are converted to default values.\n    Those default values used to be `0` and `0.0` for int and float. Now, it's\n    `-inf` as defined by NumPy (e.g., `np.iinfo(np.int32).min` or\n    `np.finfo(np.float32).min`). This avoids ambiguous values when `0` and `0.0`\n    exist in the values of the dataset. The roadmap is to implement\n    `tfds.features.Optional`.\n\n### Deprecated\n\n-   Python 3.8 support. 
As per\n    [NEP 29](https:\u002F\u002Fnumpy.org\u002Fneps\u002Fnep-0029-deprecation_policy.html), TFDS now\n    uses Python>=3.9.\n\n### Removed\n\n### Fixed\n\n### Security","2023-09-08T09:07:53",{"id":205,"version":206,"summary_zh":207,"released_at":208},105897,"v4.9.2","### Added\n\n-   [Experimental] A list of freeform text tags can now be attached to a\n    `BuilderConfig`. For example:\n    ```py\n    BUILDER_CONFIGS = [\n        tfds.core.BuilderConfig(name=\"foo\", tags=[\"foo\", \"live\"]),\n        tfds.core.BuilderConfig(name=\"bar\", tags=[\"bar\", \"old\"]),\n    ]\n    ```\n    The tags are recorded with the dataset metadata and can later be retrieved\n    using the info object:\n    ```py\n    builder.info.config_tags  # [\"foo\", \"live\"]\n    ```\n    This feature is experimental and there are no guidelines on tags format.\n\n### Changed\n\n### Deprecated\n\n### Removed\n\n### Fixed\n\n-   Fixed generated proto files (see issue [4858](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fissues\u002F4858)).\n\n### Security","2023-04-13T11:21:10",{"id":210,"version":211,"summary_zh":212,"released_at":213},105898,"v4.9.1","### Added\n\n### Changed\n\n### Deprecated\n\n### Removed\n\n### Fixed\n\n-   The installation on macOS now works (see issues\n    [4805](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fissues\u002F4805) and\n    [4852](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fdatasets\u002Fissues\u002F4852)). The ArrayRecord\n    dependency is lazily loaded, so the\n    [TensorFlow-less path](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Ftfless_tfds) is\n    not possible at the moment on macOS. A fix for this will follow soon.\n\n### Security","2023-04-11T13:16:52",{"id":215,"version":216,"summary_zh":217,"released_at":218},105899,"v4.9.0","### Added\n\n-   Native support for JAX and PyTorch. TensorFlow is no longer a dependency for\n    reading datasets. 
See the\n    [documentation](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Ftfless_tfds).\n-   Added minival split to\n    [LVIS dataset](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fcatalog\u002Flvis).\n-   [Mixed-human](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fcatalog\u002Frobomimic_mh) and\n    [machine-generated](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fcatalog\u002Frobomimic_mg)\n    robomimic datasets.\n-   WebVid dataset.\n-   ImagenetPI dataset.\n-   [Wikipedia](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fcatalog\u002Fwikipedia) for\n    20230201.\n\n### Changed\n\n-   Support for `tensorflow=2.12`.\n\n### Deprecated\n\n### Removed\n\n### Fixed\n\n### Security","2023-04-05T07:30:49",{"id":220,"version":221,"summary_zh":222,"released_at":223},105900,"v4.8.3","### Added\n\n### Changed\n\n### Deprecated\n\n-   Python 3.7 support: this version and future version use Python 3.8.\n\n### Removed\n\n### Fixed\n\n-   Flag `ignore_verifications` from Hugging Face's `datasets.load_dataset` is\n    deprecated, and used to cause errors in `tfds.load(huggingface:foo)`.\n\n### Security","2023-02-27T11:46:00",{"id":225,"version":226,"summary_zh":227,"released_at":228},105901,"v4.8.2","### Deprecated\n\n-   Python 3.7 support: this is the last version of TFDS supporting Python 3.7.\n    Future versions will use Python 3.8.\n\n### Fixed\n\n-   `tfds new` and `tfds build` better support the new recommended datasets\n    organization, where individual datasets have their own package under\n    `datasets\u002F`, builder class is called `Builder` and is defined within module\n    `${dsname}_dataset_builder.py`.\n\n### Security","2023-01-17T20:41:36",{"id":230,"version":231,"summary_zh":232,"released_at":233},105902,"v4.8.1","### Changed\n\n- Added file `valid_tags.txt` to not break builds.\n- TFDS no longer relies on TensorFlow DTypes. 
We chose NumPy DTypes to keep the\ntyping expressiveness, while dropping the heavy dependency on TensorFlow. We\nmigrated all our internal datasets. Please, migrate accordingly:\n    - `tf.bool`: `np.bool_`\n    - `tf.string`: `np.str_`\n    - `tf.int64`, `tf.int32`, etc: `np.int64`, `np.int32`, etc\n    - `tf.float64`, `tf.float32`, etc: `np.float64`, `np.float32`, etc","2023-01-02T18:30:24",{"id":235,"version":236,"summary_zh":237,"released_at":238},105903,"v4.8.0","### Added\n\n-   [API] `DatasetBuilder`'s description and citations can be specified in\n    dedicated `README.md` and `CITATIONS.bib` files, within the dataset package\n    (see https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fadd_dataset).\n-   Tags can be associated to Datasets, in the `TAGS.txt` file. For\n    now, they are only used in the generated documentation.\n-   [API][Experimental] New `ViewBuilder` to define datasets as transformations\n    of existing datasets. Also adds `tfds.transform` with functionality to apply\n    transformations.\n-   Loggers are also called on `tfds.as_numpy(...)`, base `Logger` class has a\n    new corresponding method.\n-   `tfds.core.DatasetBuilder` can have a default limit for the number of\n    simultaneous downloads. 
`tfds.download.DownloadConfig` can override it.\n-   `tfds.features.Audio` supports storing raw audio data for lazy decoding.\n-   The number of shards can be overridden when preparing a dataset:\n    `builder.download_and_prepare(download_config=tfds.download.DownloadConfig(num_shards=42))`.\n    Alternatively, you can configure the min and max shard size if you want TFDS\n    to compute the number of shards for you, but want to have control over the\n    shard sizes.\n\n### Changed\n\n### Deprecated\n\n### Removed\n\n### Fixed\n\n### Security","2022-12-21T11:09:44",{"id":240,"version":241,"summary_zh":242,"released_at":243},105904,"v4.7.0","### Added\r\n\r\n-   [API] Added [TfDataBuilder](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fformat_specific_dataset_builders#datasets_based_on_tfdatadataset) that is handy for storing experimental ad hoc TFDS datasets in notebook-like environments such that they can be versioned, described, and easily shared with teammates.\r\n-   [API] Added options to create format-specific dataset builders. 
The new API now includes a number of NLP-specific builders, such as:\r\n    -   [CoNNL](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fformat_specific_dataset_builders#conll)\r\n    -   [CoNNL-U](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fformat_specific_dataset_builders#conllu)\r\n-   [API] Added `tfds.beam.inc_counter` to reduce `beam.metrics.Metrics.counter` boilerplate\r\n-   [API] Added options to group together existing TFDS datasets into [dataset collections](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fdataset_collections) and to perform simple operations over them.\r\n-   [Documentation] update, specifically:\r\n    -   [New guide](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fformat_specific_dataset_builders) on format-specific dataset builders;\r\n    -   [New guide](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fadd_dataset_collection) on adding new dataset collections to TFDS;\r\n    -   Updated [TFDS CLI](https:\u002F\u002Fwww.tensorflow.org\u002Fdatasets\u002Fcli) documentation.\r\n-   [TFDS CLI] Supports custom config through Json (e.g. 
    `tfds build my_dataset --config='{"name": "my_custom_config", "description": "Abc"}'`)
-   New datasets:
    -   [conll2003](https://www.tensorflow.org/datasets/catalog/conll2003)
    -   [universal_dependency 2.10](https://www.tensorflow.org/datasets/catalog/universal_dependency)
    -   [bucc](https://www.tensorflow.org/datasets/catalog/bucc)
    -   [i_naturalist2021](https://www.tensorflow.org/datasets/catalog/i_naturalist2021)
    -   [mtnt](https://www.tensorflow.org/datasets/catalog/mtnt): Machine Translation of Noisy Text.
    -   [placesfull](https://www.tensorflow.org/datasets/catalog/placesfull)
    -   [tatoeba](https://www.tensorflow.org/datasets/catalog/tatoeba)
    -   [user_libri_audio](https://www.tensorflow.org/datasets/catalog/user_libri_audio)
    -   [user_libri_text](https://www.tensorflow.org/datasets/catalog/user_libri_text)
    -   [xtreme_pos](https://www.tensorflow.org/datasets/catalog/xtreme_pos)
    -   [yahoo_ltrc](https://www.tensorflow.org/datasets/catalog/yahoo_ltrc)
-   Updated datasets:
    -   [C4](https://www.tensorflow.org/datasets/catalog/c4) was updated to version 3.1.
    -   [common_voice](https://www.tensorflow.org/datasets/catalog/common_voice) was updated to a more recent snapshot.
    -   [wikipedia](https://www.tensorflow.org/datasets/catalog/wikipedia) was updated with the `20220620` snapshot.
-   New dataset collections, such as [xtreme](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/dataset_collections/xtreme/xtreme.py) and
    [LongT5](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/dataset_collections/longt5/longt5.py).

### Changed

-   The base `Logger` class expects more information to be passed to the `as_dataset` method. This should only be relevant to people who have implemented and registered custom `Logger` classes.
-   You can set `DEFAULT_BUILDER_CONFIG_NAME` in a `DatasetBuilder` to change the default config if it shouldn't be the first builder config defined in `BUILDER_CONFIGS`.

### Deprecated

### Removed

### Fixed

-   Various datasets.
-   On Linux, when loading a dataset from a directory that is not your home (`~`) directory, a stray `~` directory is no longer created in the current directory (fixes [#4117](https://github.com/tensorflow/datasets/issues/4117)).

### Security

Released: 2022-10-05

## v4.6.0

### Added

- Support for community datasets on GCS.
- [API] `tfds.builder_from_directory` and `tfds.builder_from_directories`; see
  https://www.tensorflow.org/datasets/external_tfrecord#directly_from_folder.
- [API] Dash ("-") support in split names.
- [API] `file_format` argument to the `download_and_prepare` method, allowing the
  user to specify an alternative file format in which to store the prepared data
  (e.g. "riegeli").
- [API] Added `file_format` to the `DatasetInfo` string representation.
- [API] Expose the return value of Beam pipelines.
  This allows users to read the Beam metrics.
- [API] Expose the feature `tf_example_spec` publicly.
- [API] `doc` kwarg on `Feature`s, to describe a feature.
- [Documentation] Feature descriptions are shown in the [TFDS Catalog](
  https://www.tensorflow.org/datasets/catalog/overview).
- [Documentation] More metadata about HuggingFace datasets in the TFDS catalog.
- [Performance] Parallel load of metadata files.
- [Testing] TFDS tests are now run using GitHub Actions, with misc improvements
  such as caching and sharding.
- [Testing] Improvements to MockFs.
- New datasets.

### Changed

- [API] `num_shards` is now optional in the shard name.

### Removed

- The TFDS pathlib API, migrated to the self-contained `etils.epath` (see
  https://github.com/google/etils).

### Fixed

- Various datasets.
- Dataset builders that are defined ad hoc (e.g. in Colab).
- Better `DatasetNotFoundError` messages.
- `deterministic` is no longer set globally but locally in interleave, so it
  only applies to interleave and not to all transformations.
- The Google Drive downloader.

As always, thank you to all contributors!

Released: 2022-06-02

## v4.5.2

Release notes:

* Fix import bug on Windows (#3709)
* Updated documentation

Released: 2022-01-31

## v4.5.1

Release notes:

* Fix import bug on Windows (#3709)
* Add `split=tfds.split_for_jax_process('train')` (an alias of `tfds.even_splits('train', n=jax.process_count())[jax.process_index()]`)

Released: 2022-01-31

## v4.5.0

This is the last version of TFDS supporting Python 3.6.
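The split utilities noted under v4.5.1 above, `tfds.split_for_jax_process` and `tfds.even_splits`, come down to partitioning an example range into near-even contiguous slices. Below is a pure-Python sketch of that arithmetic with hypothetical helper names; it is illustrative only, not the TFDS implementation:

```python
# Sketch of the near-even partitioning behind `tfds.even_splits` /
# `tfds.split_for_jax_process`. Helper names are hypothetical.

def even_split_bounds(num_examples: int, n: int):
    """Partition [0, num_examples) into n contiguous, near-even ranges."""
    base, remainder = divmod(num_examples, n)
    bounds = []
    start = 0
    for i in range(n):
        # The first `remainder` slices get one extra example.
        size = base + (1 if i < remainder else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

def split_for_process(num_examples: int, process_count: int, process_index: int):
    """Mimics split_for_jax_process: pick this host's slice of the split."""
    return even_split_bounds(num_examples, process_count)[process_index]

# 10 examples over 3 hosts: slice sizes 4, 3, 3 (off by at most one).
print(even_split_bounds(10, 3))     # [(0, 4), (4, 7), (7, 10)]
print(split_for_process(10, 3, 1))  # (4, 7)
```

Each host then reads only its own slice, which is how multi-host training avoids feeding the same examples to two processes.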
Future versions will use Python 3.7.

* Better split API:
  * Splits can be selected using shards: `split='train[3shard]'`
  * Underscores are supported in numbers for better readability: `split='train[:500_000]'`
  * Select the union of all splits with `split='all'`
  * [`tfds.even_splits`](https://www.tensorflow.org/datasets/splits#tfdseven_splits_multi-host_training) is more precise and flexible:
     * Returns splits of exactly the same size when passed `tfds.even_splits('train', n=3, drop_remainder=True)`
     * Works on subsplits, `tfds.even_splits('train[:75%]', n=3)`, even nested ones
     * Can be composed with other splits: `tfds.even_splits('train', n=3)[0] + 'test'`

* FeatureConnectors:
  * Faster dataset generation (using tfrecords)
  * Features now have `serialize_example` / `deserialize_example` methods to encode/decode an example to/from proto: `example_bytes = features.serialize_example(example_data)`
  * `Audio` now supports `encoding='zlib'` for better compression
  * Feature specs are exposed as protos for better compatibility with other languages

* Better testing:
  * The mock dataset now supports nested datasets
  * The number of sub-examples can be customized

* Documentation updates:
  * Community datasets: https://www.tensorflow.org/datasets/community_catalog/overview
  * New [guide on TFDS and determinism](https://www.tensorflow.org/datasets/determinism)

* [RLDS](https://github.com/google-research/rlds):
  * Nested dataset features are supported
  * New datasets: Robomimic, D4RL Ant Maze, RLU Real World RL, and RLU Atari with ordered episodes

* Misc:
  * Create a Beam pipeline using TFDS as input with [tfds.beam.ReadFromTFDS](https://www.tensorflow.org/datasets/api_docs/python/tfds/beam/ReadFromTFDS)
  * Support setting the file format in `tfds build
    --file_format=tfrecord`
  * Typing annotations exposed in `tfds.typing`
  * `tfds.ReadConfig` has a new `assert_cardinality=False` option to disable the cardinality assertion
  * Added `tfds.display_progress_bar(True)` for programmatic control of the progress bar
  * Support for a huge number of shards (>99999)
  * `DatasetInfo` exposes `.release_notes`

And of course, new datasets, bug fixes, ...

Thank you to all our contributors for improving TFDS!

Released: 2022-01-26

## v4.4.0

**API**:

* Added [`PartialDecoding` support](https://www.tensorflow.org/datasets/decode#only_decode_a_sub-set_of_the_features) to decode only a subset of the features (for performance)
* The catalog now exposes links to [KnowYourData visualisations](https://knowyourdata-tfds.withgoogle.com/)
* `tfds.as_numpy` supports datasets with `None`
* Datasets generated with `disable_shuffling=True` are now read in generation order.
* Loading datasets from files now supports custom `tfds.features.FeatureConnector`s
* `tfds.testing.mock_data` now supports:
  * non-scalar tensors with dtype `tf.string`
  * `builder_from_files` and path-based community datasets
* The file format is automatically restored (for datasets generated with `tfds.builder(..., file_format=)`).
* Many new reinforcement learning datasets
* Various bug fixes and internal improvements, such as:
  * The number of worker threads during extraction is set dynamically
  * The progress bar is updated during download even if downloads are cached

**Dataset creation:**

* Added `tfds.features.LabeledImage` for semantic segmentation (like an image, but with additional `info.features['image_label'].name` label metadata)
* Added float32 support for `tfds.features.Image` (e.g.
  for depth maps)
* All `FeatureConnector`s can now have a `None` dimension anywhere (previously restricted to the first position).
* `tfds.features.Tensor()` can have an arbitrary number of dynamic dimensions (`Tensor(..., shape=(None, None, 3, None))`)
* `tfds.features.Tensor` can now be serialised as bytes instead of float/int values, to allow better compression: `Tensor(..., encoding='zlib')`
* Added a script to add TFDS metadata files to existing TFRecord files (see the [doc](https://www.tensorflow.org/datasets/external_tfrecord)).
* New guide on [common implementation gotchas](https://www.tensorflow.org/datasets/common_gotchas)

Thank you all for your support and contribution!

Released: 2021-07-28
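The `encoding='zlib'` options above (for `Tensor` here, and for `Audio` in v4.5.0) win by serialising raw values as bytes and letting zlib exploit their redundancy. The following is a standard-library sketch of that idea, not the actual TFDS feature-connector code:

```python
# Illustrates the idea behind `Tensor(..., encoding='zlib')`: pack the raw
# values to bytes, then zlib-compress. Stdlib sketch of the concept only.
import struct
import zlib

def encode_floats(values):
    """Pack float32 values to little-endian bytes, then compress with zlib."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return zlib.compress(raw)

def decode_floats(blob):
    """Decompress and unpack back to a list of floats."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Highly regular data (e.g. a flat depth map) compresses well.
values = [0.5] * 1024
blob = encode_floats(values)
print(len(blob) < 4 * len(values))    # True: smaller than the raw 4 KiB
print(decode_floats(blob) == values)  # True: lossless round-trip
```

Regular data such as flat regions in depth maps or silence in audio shrinks dramatically under this scheme, whereas high-entropy float noise would barely compress at all.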