[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-capitalone--DataProfiler":3,"tool-capitalone--DataProfiler":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":83,"stars":104,"forks":105,"last_commit_at":106,"license":107,"difficulty_score":108,"env_os":109,"env_gpu":109,"env_ram":109,"env_deps":110,"category_tags":115,"github_topics":116,"view_count":136,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":137,"updated_at":138,"faqs":139,"releases":169},592,"capitalone\u002FDataProfiler","DataProfiler","What's in your data? 
Extract schema, statistics and entities from datasets","DataProfiler 是一款基于 Python 的开源库，专注于让数据分析和监控变得简单直观。DataProfiler 主要解决用户在面对海量异构数据时，难以快速掌握数据结构、分布特征及潜在风险的问题。通过简单的命令，DataProfiler 能够自动解析 CSV、JSON、Parquet 等多种格式文件，将其转换为 DataFrame 并生成包含全局统计与列级细节的数据画像。\n\nDataProfiler 非常适合数据工程师、算法研究人员以及需要频繁处理数据的开发者。DataProfiler 的核心亮点在于内置了预训练的深度神经网络模型，无需额外配置即可高效识别个人身份信息（PII）和医疗信息（NPI）等敏感数据，有效辅助合规性检查。此外，用户还能根据业务需求灵活扩展实体识别规则。DataProfiler 生成的结构化报告可直接用于下游应用或审计报表，显著降低数据探索门槛，是进行数据质量评估和隐私保护的理想选择。","![PyPI - Python Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002FDataProfiler)\n![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FCapitalOne\u002FDataProfiler)\n![GitHub last commit](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002FCapitalOne\u002FDataProfiler)\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcapitalone_DataProfiler_readme_569994c28cc7.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fdataprofiler)\n\n\u003Cp text-align=\"left\">\n    \u003Cpicture>\n      \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fraw\u002Fgh-pages\u002Fdocs\u002Fsource\u002F_static\u002Fimages\u002FDataProfilerDarkLogoLong.png\">\n      \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fraw\u002Fgh-pages\u002Fdocs\u002Fsource\u002F_static\u002Fimages\u002FDataProfilerLogoLightThemeLong.png\">\n      \u003Cimg alt=\"Shows a black logo in light color mode and a white one in dark color mode.\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcapitalone_DataProfiler_readme_b09ad3828efa.png\">\n    \u003C\u002Fpicture>\n\u003C\u002Fp>\n\n# Data Profiler | What's in your data?\n\nThe DataProfiler is a Python library designed to make data analysis, monitoring, and **sensitive data detection** easy.\n\nLoading **Data** with a single command, the library 
automatically formats & loads files into a DataFrame. **Profiling** the Data, the library identifies the schema, statistics, entities (PII \u002F NPI) and more. Data Profiles can then be used in downstream applications or reports.\n\nGetting started only takes a few lines of code ([example csv](https:\u002F\u002Fraw.githubusercontent.com\u002Fcapitalone\u002FDataProfiler\u002Fmain\u002Fdataprofiler\u002Ftests\u002Fdata\u002Fcsv\u002Faws_honeypot_marx_geo.csv)):\n\n```python\nimport json\nfrom dataprofiler import Data, Profiler\n\ndata = Data(\"your_file.csv\") # Auto-Detect & Load: CSV, AVRO, Parquet, JSON, Text, URL\n\nprint(data.data.head(5)) # Access data directly via a compatible Pandas DataFrame\n\nprofile = Profiler(data) # Calculate Statistics, Entity Recognition, etc\n\nreadable_report = profile.report(report_options={\"output_format\": \"compact\"})\n\nprint(json.dumps(readable_report, indent=4))\n```\nNote: The Data Profiler comes with a pre-trained deep learning model, used to efficiently identify **sensitive data** (PII \u002F NPI). 
If desired, it's easy to add new entities to the existing pre-trained model or insert an entire new pipeline for entity recognition.\n\nFor API documentation, visit the [documentation page](https:\u002F\u002Fcapitalone.github.io\u002FDataProfiler\u002F).\n\nIf you have suggestions or find a bug, [please open an issue](https:\u002F\u002Fgithub.com\u002Fcapitalone\u002Fdataprofiler\u002Fissues\u002Fnew\u002Fchoose).\n\nIf you want to contribute, visit the [contributing page](https:\u002F\u002Fgithub.com\u002Fcapitalone\u002Fdataprofiler\u002Fblob\u002Fmain\u002F.github\u002FCONTRIBUTING.md).\n\n------------------\n\n# Install\n\n**To install the full package from pypi**: `pip install DataProfiler[full]`\n\nIf you want to install the ml dependencies without generating reports, use `DataProfiler[ml]`\n\nIf the ML requirements are too strict (say, you don't want to install tensorflow), you can install a slimmer package with `DataProfiler[reports]`. The slimmer package disables the default sensitive data detection \u002F entity recognition (labeler).\n\nInstall from pypi: `pip install DataProfiler`\n\n------------------\n\n# What is a Data Profile?\n\nIn the case of this library, a data profile is a dictionary containing statistics and predictions about the underlying dataset. 
There are \"global statistics\" or `global_stats`, which contain dataset level data and there are \"column\u002Frow level statistics\" or `data_stats` (each column is a new key-value entry).\n\nThe format for a structured profile is below:\n\n```\n\"global_stats\": {\n    \"samples_used\": int,\n    \"column_count\": int,\n    \"row_count\": int,\n    \"row_has_null_ratio\": float,\n    \"row_is_null_ratio\": float,\n    \"unique_row_ratio\": float,\n    \"duplicate_row_count\": int,\n    \"file_type\": string,\n    \"encoding\": string,\n    \"correlation_matrix\": list[list[int]], (*)\n    \"chi2_matrix\": list[list[float]],\n    \"profile_schema\": {\n        string: list[int]\n    },\n    \"times\": dict[string, float],\n},\n\"data_stats\": [\n    {\n        \"column_name\": string,\n        \"data_type\": string,\n        \"data_label\": string,\n        \"categorical\": bool,\n        \"order\": string,\n        \"samples\": list[str],\n        \"statistics\": {\n            \"sample_size\": int,\n            \"null_count\": int,\n            \"null_types\": list[string],\n            \"null_types_index\": {\n                string: list[int]\n            },\n            \"data_type_representation\": dict[string, float],\n            \"min\": [null, float, str],\n            \"max\": [null, float, str],\n            \"mode\": float,\n            \"median\": float,\n            \"median_absolute_deviation\": float,\n            \"sum\": float,\n            \"mean\": float,\n            \"variance\": float,\n            \"stddev\": float,\n            \"skewness\": float,\n            \"kurtosis\": float,\n            \"num_zeros\": int,\n            \"num_negatives\": int,\n            \"histogram\": {\n                \"bin_counts\": list[int],\n                \"bin_edges\": list[float],\n            },\n            \"quantiles\": {\n                int: float\n            },\n            \"vocab\": list[char],\n            \"avg_predictions\": dict[string, 
float],\n            \"data_label_representation\": dict[string, float],\n            \"categories\": list[str],\n            \"unique_count\": int,\n            \"unique_ratio\": float,\n            \"categorical_count\": dict[string, int],\n            \"gini_impurity\": float,\n            \"unalikeability\": float,\n            \"precision\": {\n                'min': int,\n                'max': int,\n                'mean': float,\n                'var': float,\n                'std': float,\n                'sample_size': int,\n                'margin_of_error': float,\n                'confidence_level': float\n            },\n            \"times\": dict[string, float],\n            \"format\": string\n        },\n        \"null_replication_metrics\": {\n            \"class_prior\": list[int],\n            \"class_sum\": list[list[int]],\n            \"class_mean\": list[list[int]]\n        }\n    }\n]\n```\n(*) Currently the correlation matrix update is toggled off. It will be reset in a later update. 
Users can still use it as desired with the is_enable option set to True.\n\nThe format for an unstructured profile is below:\n```\n\"global_stats\": {\n    \"samples_used\": int,\n    \"empty_line_count\": int,\n    \"file_type\": string,\n    \"encoding\": string,\n    \"memory_size\": float, # in MB\n    \"times\": dict[string, float],\n},\n\"data_stats\": {\n    \"data_label\": {\n        \"entity_counts\": {\n            \"word_level\": dict[string, int],\n            \"true_char_level\": dict[string, int],\n            \"postprocess_char_level\": dict[string, int]\n        },\n        \"entity_percentages\": {\n            \"word_level\": dict[string, float],\n            \"true_char_level\": dict[string, float],\n            \"postprocess_char_level\": dict[string, float]\n        },\n        \"times\": dict[string, float]\n    },\n    \"statistics\": {\n        \"vocab\": list[char],\n        \"vocab_count\": dict[string, int],\n        \"words\": list[string],\n        \"word_count\": dict[string, int],\n        \"times\": dict[string, float]\n    }\n}\n```\n\nThe format for a graph profile is below:\n```\n\"num_nodes\": int,\n\"num_edges\": int,\n\"categorical_attributes\": list[string],\n\"continuous_attributes\": list[string],\n\"avg_node_degree\": float,\n\"global_max_component_size\": int,\n\"continuous_distribution\": {\n    \"\u003Cattribute_1>\": {\n        \"name\": string,\n        \"scale\": float,\n        \"properties\": list[float, np.array]\n    },\n    \"\u003Cattribute_2>\": None,\n    ...\n},\n\"categorical_distribution\": {\n    \"\u003Cattribute_1>\": None,\n    \"\u003Cattribute_2>\": {\n        \"bin_counts\": list[int],\n        \"bin_edges\": list[float]\n    },\n    ...\n},\n\"times\": dict[string, float]\n\n```\n\n# Profile Statistic Descriptions\n\n### Structured Profile\n\n#### global_stats:\n\n* `samples_used` - number of input data samples used to generate this profile\n* `column_count` - the number of columns contained in the 
input dataset\n* `row_count` - the number of rows contained in the input dataset\n* `row_has_null_ratio` - the proportion of rows that contain at least one null value to the total number of rows\n* `row_is_null_ratio` - the proportion of rows that are fully comprised of null values (null rows) to the total number of rows\n* `unique_row_ratio` - the proportion of distinct rows in the input dataset to the total number of rows\n* `duplicate_row_count` - the number of rows that occur more than once in the input dataset\n* `file_type` - the format of the file containing the input dataset (ex: .csv)\n* `encoding` - the encoding of the file containing the input dataset (ex: UTF-8)\n* `correlation_matrix` - matrix of shape `column_count` x `column_count` containing the correlation coefficients between each column in the dataset\n* `chi2_matrix` - matrix of shape `column_count` x `column_count` containing the chi-square statistics between each column in the dataset\n* `profile_schema` - a description of the format of the input dataset labeling each column and its index in the dataset\n    * `string` - the label of the column in question and its index in the profile schema\n* `times` - the duration of time it took to generate the global statistics for this dataset in milliseconds\n\n#### data_stats:\n\n* `column_name` - the label\u002Ftitle of this column in the input dataset\n* `data_type` - the primitive python data type that is contained within this column\n* `data_label` - the label\u002Fentity of the data in this column as determined by the Labeler component\n* `categorical` - ‘true’ if this column contains categorical data\n* `order` - the way in which the data in this column is ordered, if any, otherwise “random”\n* `samples` - a small subset of data entries from this column\n* `statistics` - statistical information on the column\n    * `sample_size` - number of input data samples used to generate this profile\n    * `null_count` - the number of null entries in the 
sample\n    * `null_types` - a list of the different null types present within this sample\n    * `null_types_index` - a dict containing each null type and a respective list of the indices at which it is present within this sample\n    * `data_type_representation` - the percentage of samples used identifying as each data_type\n    * `min` - minimum value in the sample\n    * `max` - maximum value in the sample\n    * `mode` - mode of the entries in the sample\n    * `median` - median of the entries in the sample\n    * `median_absolute_deviation` - the median absolute deviation of the entries in the sample\n    * `sum` - the total of all sampled values from the column\n    * `mean` - the average of all entries in the sample\n    * `variance` - the variance of all entries in the sample\n    * `stddev` - the standard deviation of all entries in the sample\n    * `skewness` - the statistical skewness of all entries in the sample\n    * `kurtosis` - the statistical kurtosis of all entries in the sample\n    * `num_zeros` - the number of entries in this sample that have the value 0\n    * `num_negatives` - the number of entries in this sample that have a value less than 0\n    * `histogram` - contains histogram relevant information\n        * `bin_counts` - the number of entries within each bin\n        * `bin_edges` - the thresholds of each bin\n    * `quantiles` - the value at each percentile in the order they are listed based on the entries in the sample\n    * `vocab` - a list of the characters used within the entries in this sample\n    * `avg_predictions` - average of the data label prediction confidences across all data points sampled\n    * `categories` - a list of each distinct category within the sample if `categorical` = 'true'\n    * `unique_count` - the number of distinct entries in the sample\n    * `unique_ratio` - the proportion of the number of distinct entries in the sample to the total number of entries in the sample\n    * `categorical_count` - number of 
entries sampled for each category if `categorical` = 'true'\n    * `gini_impurity` - measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset\n    * `unalikeability` - a value denoting how frequently entries differ from one another within the sample\n    * `precision` - a dict of statistics with respect to the number of digits in a number for each sample\n    * `times` - the duration of time it took to generate this sample's statistics in milliseconds\n    * `format` - list of possible datetime formats\n* `null_replication_metrics` - statistics of data partitioned based on whether column value is null (index 1 of lists referenced by dict keys) or not (index 0)\n    * `class_prior` - a list containing probability of a column value being null and not null\n    * `class_sum`- a list containing sum of all other rows based on whether column value is null or not\n    * `class_mean`- a list containing mean of all other rows based on whether column value is null or not\n\n### Unstructured Profile\n\n#### global_stats:\n\n* `samples_used` - number of input data samples used to generate this profile\n* `empty_line_count` - the number of empty lines in the input data\n* `file_type` - the file type of the input data (ex: .txt)\n* `encoding` - file encoding of the input data file (ex: UTF-8)\n* `memory_size` - size of the input data in MB\n* `times` - duration of time it took to generate this profile in milliseconds\n\n#### data_stats:\n\n* `data_label` - labels and statistics on the labels of the input data\n    * `entity_counts` - the number of times a specific label or entity appears inside the input data\n        * `word_level` - the number of words counted within each label or entity\n        * `true_char_level` - the number of characters counted within each label or entity as determined by the model\n        * `postprocess_char_level` - the number of characters 
counted within each label or entity as determined by the postprocessor\n    * `entity_percentages` - the percentages of each label or entity within the input data\n        * `word_level` - the percentage of words in the input data that are contained within each label or entity\n        * `true_char_level` - the percentage of characters in the input data that are contained within each label or entity as determined by the model\n        * `postprocess_char_level` - the percentage of characters in the input data that are contained within each label or entity as determined by the postprocessor\n    * `times` - the duration of time it took for the data labeler to predict on the data\n* `statistics` - statistics of the input data\n    * `vocab` - a list of each character in the input data\n    * `vocab_count` - the number of occurrences of each distinct character in the input data\n    * `words` - a list of each word in the input data\n    * `word_count` - the number of occurrences of each distinct word in the input data\n    * `times` - the duration of time it took to generate the vocab and words statistics in milliseconds\n\n### Graph Profile\n* `num_nodes` - number of nodes in the graph\n* `num_edges` - number of edges in the graph\n* `categorical_attributes` - list of categorical edge attributes\n* `continuous_attributes` - list of continuous edge attributes\n* `avg_node_degree` - average degree of nodes in the graph\n* `global_max_component_size`: size of the global max component\n\n#### continuous_distribution:\n* `\u003Cattribute_N>`: name of N-th edge attribute in list of attributes\n    * `name` - name of distribution for attribute\n    * `scale` - negative log likelihood used to scale and compare distributions\n    * `properties` - list of statistical properties describing the distribution\n        * [shape (optional), loc, scale, mean, variance, skew, kurtosis]\n\n\n#### categorical_distribution:\n* `\u003Cattribute_N>`: name of N-th edge attribute in list of 
attributes\n    * `bin_counts`: counts in each bin of the distribution histogram\n    * `bin_edges`: edges of each bin of the distribution histogram\n\n* times - duration of time it took to generate this profile in milliseconds\n\n# Support\n\n### Supported Data Formats\n\n* Any delimited file (CSV, TSV, etc.)\n* JSON object\n* Avro file\n* Parquet file\n* Text file\n* Pandas DataFrame\n* A URL that points to one of the supported file types above\n\n### Data Types\n\n*Data Types* are determined at the column level for structured data.\n\n* Int\n* Float\n* String\n* DateTime\n\n### Data Labels\n\n*Data Labels* are determined per cell for structured data (column\u002Frow when the *profiler* is used) or at the character level for unstructured data.\n\n* UNKNOWN\n* ADDRESS\n* BAN (bank account number, 10-18 digits)\n* CREDIT_CARD\n* EMAIL_ADDRESS\n* UUID\n* HASH_OR_KEY (md5, sha1, sha256, random hash, etc.)\n* IPV4\n* IPV6\n* MAC_ADDRESS\n* PERSON\n* PHONE_NUMBER\n* SSN\n* URL\n* US_STATE\n* DRIVERS_LICENSE\n* DATE\n* TIME\n* DATETIME\n* INTEGER\n* FLOAT\n* QUANTITY\n* ORDINAL\n\n# Get Started\n\n### Load a File\n\nThe Data Profiler can profile the following data\u002Ffile types:\n\n* CSV file (or any delimited file)\n* JSON object\n* Avro file\n* Parquet file\n* Text file\n* Pandas DataFrame\n* A URL that points to one of the supported file types above\n\nThe profiler should automatically identify the file type and load the data into a `Data` class.\n\nAlong with other attributes, the `Data` class enables data to be accessed via a valid Pandas DataFrame.\n\n```python\n# Load a csv file, return a CSVData object\ncsv_data = Data('your_file.csv')\n\n# Print the first 10 rows of the csv file\nprint(csv_data.data.head(10))\n\n# Load a parquet file, return a ParquetData object\nparquet_data = Data('your_file.parquet')\n\n# Sort the data by the name column\nparquet_data.data.sort_values(by='name', inplace=True)\n\n# Print the sorted first 10 rows of the parquet 
data\nprint(parquet_data.data.head(10))\n\n# Load a json file from a URL, return a JSONData object\njson_data = Data('https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fblob\u002Fmain\u002Fdataprofiler\u002Ftests\u002Fdata\u002Fjson\u002Firis-utf-8.json')\n```\n\nIf the file type is not automatically identified (rare), you can specify it\nexplicitly; see section [Specifying a Filetype or Delimiter](#specifying-a-filetype-or-delimiter).\n\n### Profile a File\n\nThis example uses a CSV file, but JSON, Avro, Parquet, and Text files also work.\n\n```python\nimport json\nfrom dataprofiler import Data, Profiler\n\n# Load file (CSV should be automatically identified)\ndata = Data(\"your_file.csv\")\n\n# Profile the dataset\nprofile = Profiler(data)\n\n# Generate a report and use json to prettify.\nreport  = profile.report(report_options={\"output_format\": \"pretty\"})\n\n# Print the report\nprint(json.dumps(report, indent=4))\n```\n\n### Updating Profiles\n\nCurrently, the data profiler is equipped to update its profile in batches.\n\n```python\nimport json\nfrom dataprofiler import Data, Profiler\n\n# Load and profile a CSV file\ndata = Data(\"your_file.csv\")\nprofile = Profiler(data)\n\n# Update the profile with new data:\nnew_data = Data(\"new_data.csv\")\nprofile.update_profile(new_data)\n\n# Print the report using json to prettify.\nreport  = profile.report(report_options={\"output_format\": \"pretty\"})\nprint(json.dumps(report, indent=4))\n```\n\nNote that if the data you update the profile with contains integer indices that overlap with the indices on data originally profiled, when null rows are calculated the indices will be \"shifted\" to uninhabited values so that null counts and ratios are still accurate.\n\n### Merging Profiles\n\nIf you have two files with the same schema (but different data), it is possible to merge the two profiles together via an addition operator.\n\nThis also enables profiles to be determined in a distributed 
manner.\n\n```python\nimport json\nfrom dataprofiler import Data, Profiler\n\n# Load a CSV file with a schema\ndata1 = Data(\"file_a.csv\")\nprofile1 = Profiler(data1)\n\n# Load another CSV file with the same schema\ndata2 = Data(\"file_b.csv\")\nprofile2 = Profiler(data2)\n\nprofile3 = profile1 + profile2\n\n# Print the report using json to prettify.\nreport  = profile3.report(report_options={\"output_format\": \"pretty\"})\nprint(json.dumps(report, indent=4))\n```\n\nNote that if merged profiles had overlapping integer indices, when null rows are calculated the indices will be \"shifted\" to uninhabited values so that null counts and ratios are still accurate.\n\n### Profiler Differences\nFor finding the change between profiles with the same schema we can utilize the\nprofile's `diff` function. The `diff` will provide overall file and sampling\ndifferences as well as detailed differences of the data's statistics. For\nexample, numerical columns have both t-test to evaluate similarity and PSI (Population Stability Index) to quantify column distribution shift.\nMore information is described in the Profiler section of the [Github Pages](\nhttps:\u002F\u002Fcapitalone.github.io\u002FDataProfiler\u002F).\n\nCreate the difference report like this:\n```python\nimport json\nimport dataprofiler as dp\n\n# Load a CSV file\ndata1 = dp.Data(\"file_a.csv\")\nprofile1 = dp.Profiler(data1)\n\n# Load another CSV file\ndata2 = dp.Data(\"file_b.csv\")\nprofile2 = dp.Profiler(data2)\n\ndiff_report = profile1.diff(profile2)\nprint(json.dumps(diff_report, indent=4))\n```\n\n### Profile a Pandas DataFrame\n```python\nimport pandas as pd\nimport dataprofiler as dp\nimport json\n\nmy_dataframe = pd.DataFrame([[1, 2.0],[1, 2.2],[-1, 3]])\nprofile = dp.Profiler(my_dataframe)\n\n# print the report using json to prettify.\nreport = profile.report(report_options={\"output_format\": \"pretty\"})\nprint(json.dumps(report, indent=4))\n\n# read a specified column, in this case it is labeled 
0:\nprint(json.dumps(report[\"data_stats\"][0], indent=4))\n```\n\n### Unstructured Profiler\nIn addition to the structured profiler, DataProfiler provides unstructured profiling for the TextData object or string. The unstructured profiler also works with list[string], pd.Series(string), or pd.DataFrame(string) when the `profiler_type` option is set to `unstructured`. Below is an example of the unstructured profiler with a text file.\n```python\nimport dataprofiler as dp\nimport json\n\nmy_text = dp.Data('text_file.txt')\nprofile = dp.Profiler(my_text)\n\n# print the report using json to prettify.\nreport = profile.report(report_options={\"output_format\": \"pretty\"})\nprint(json.dumps(report, indent=4))\n```\n\nBelow is another example of the unstructured profiler, using a pd.Series of strings with the option `profiler_type='unstructured'`:\n```python\nimport dataprofiler as dp\nimport pandas as pd\nimport json\n\ntext_data = pd.Series(['first string', 'second string'])\nprofile = dp.Profiler(text_data, profiler_type='unstructured')\n\n# print the report using json to prettify.\nreport = profile.report(report_options={\"output_format\": \"pretty\"})\nprint(json.dumps(report, indent=4))\n```\n\n### Graph Profiler\nDataProfiler also provides the ability to profile graph data from a csv file. 
Below is an example of the graph profiler with a graph data csv file:\n```python\nimport dataprofiler as dp\nimport pprint\n\nmy_graph = dp.Data('graph_file.csv')\nprofile = dp.Profiler(my_graph)\n\n# print the report using pretty print (json dump does not work on numpy array values inside dict)\nreport = profile.report()\nprinter = pprint.PrettyPrinter(sort_dicts=False, compact=True)\nprinter.pprint(report)\n```\n\n**Visit the [documentation page](https:\u002F\u002Fcapitalone.github.io\u002FDataProfiler\u002F) for additional Examples and API details**\n\n# References\n```\nSensitive Data Detection with High-Throughput Neural Network Models for Financial Institutions\nAuthors: Anh Truong, Austin Walters, Jeremy Goodsitt\n2020 https:\u002F\u002Farxiv.org\u002Fabs\u002F2012.09597\nThe AAAI-21 Workshop on Knowledge Discovery from Unstructured Data in Financial Services\n```\n","![PyPI - Python Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002FDataProfiler)\n![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FCapitalOne\u002FDataProfiler)\n![GitHub last commit](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002FCapitalOne\u002FDataProfiler)\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcapitalone_DataProfiler_readme_569994c28cc7.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fdataprofiler)\n\n\u003Cp text-align=\"left\">\n    \u003Cpicture>\n      \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fraw\u002Fgh-pages\u002Fdocs\u002Fsource\u002F_static\u002Fimages\u002FDataProfilerDarkLogoLong.png\">\n      \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fraw\u002Fgh-pages\u002Fdocs\u002Fsource\u002F_static\u002Fimages\u002FDataProfilerLogoLightThemeLong.png\">\n      \u003Cimg alt=\"Shows a black logo in light color mode and a 
white one in dark color mode.\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcapitalone_DataProfiler_readme_b09ad3828efa.png\">\n    \u003C\u002Fpicture>\n\u003C\u002Fp>\n\n# Data Profiler | 你的数据里有什么？\n\nDataProfiler 是一个 Python 库，旨在简化数据分析、监控和**敏感数据检测**。\n\n通过单个命令加载**数据**，该库会自动格式化并将文件加载到 DataFrame (数据框) 中。**Profiling (数据剖析)** 数据，库可以识别架构、统计信息、实体（PII \u002F NPI (个人身份信息 \u002F 非个人身份信息)）等。随后，数据剖析结果可用于下游应用程序或报告中。\n\n开始使用只需几行代码（[示例 CSV](https:\u002F\u002Fraw.githubusercontent.com\u002Fcapitalone\u002FDataProfiler\u002Fmain\u002Fdataprofiler\u002Ftests\u002Fdata\u002Fcsv\u002Faws_honeypot_marx_geo.csv)）：\n\n```python\nimport json\nfrom dataprofiler import Data, Profiler\n\ndata = Data(\"your_file.csv\") # Auto-Detect & Load: CSV, AVRO, Parquet, JSON, Text, URL\n\nprint(data.data.head(5)) # Access data directly via a compatible Pandas DataFrame\n\nprofile = Profiler(data) # Calculate Statistics, Entity Recognition, etc\n\nreadable_report = profile.report(report_options={\"output_format\": \"compact\"})\n\nprint(json.dumps(readable_report, indent=4))\n```\n注意：Data Profiler 附带一个预训练的深度学习模型，用于高效识别**敏感数据**（PII \u002F NPI）。如果需要，可以轻松地将新实体添加到现有的预训练模型中，或者插入一个新的实体识别管道。\n\n如需 API 文档，请访问 [文档页面](https:\u002F\u002Fcapitalone.github.io\u002FDataProfiler\u002F)。\n\n如果您有建议或发现错误，[请提交问题](https:\u002F\u002Fgithub.com\u002Fcapitalone\u002Fdataprofiler\u002Fissues\u002Fnew\u002Fchoose)。\n\n如果您想贡献代码，请访问 [贡献指南](https:\u002F\u002Fgithub.com\u002Fcapitalone\u002Fdataprofiler\u002Fblob\u002Fmain\u002F.github\u002FCONTRIBUTING.md)。\n\n------------------\n\n# 安装\n\n**要从 pypi 安装完整包**：`pip install DataProfiler[full]`\n\n如果您想安装 ML 依赖项而不生成报告，请使用 `DataProfiler[ml]`\n\n如果 ML 需求过于严格（例如，您不想安装 tensorflow），您可以使用 `DataProfiler[reports]` 安装更精简的包。该精简包会禁用默认的敏感数据检测\u002F实体识别（labler）\n\n从 pypi 安装：`pip install DataProfiler`\n\n# 什么是数据档案？\n\n在本库中，**数据档案 (Data Profile)** 是一个包含关于底层数据集的统计信息和预测的字典。其中包含 **“全局统计信息” (`global_stats`)**，存储数据集级别的数据；以及 **“列\u002F行级别统计信息” 
# What is a Data Profile?

In the case of this library, a **data profile** is a dictionary containing statistics and predictions about the underlying dataset. There are **"global statistics" (`global_stats`)**, which contain dataset-level data, and there are **"column/row-level statistics" (`data_stats`)** (each column is a new key-value entry in the `data_stats` list).

The format for a structured profile is below:

```
"global_stats": {
    "samples_used": int,
    "column_count": int,
    "row_count": int,
    "row_has_null_ratio": float,
    "row_is_null_ratio": float,
    "unique_row_ratio": float,
    "duplicate_row_count": int,
    "file_type": string,
    "encoding": string,
    "correlation_matrix": list[list[float]], (*)
    "chi2_matrix": list[list[float]],
    "profile_schema": {
        string: list[int]
    },
    "times": dict[string, float],
},
"data_stats": [
    {
        "column_name": string,
        "data_type": string,
        "data_label": string,
        "categorical": bool,
        "order": string,
        "samples": list[str],
        "statistics": {
            "sample_size": int,
            "null_count": int,
            "null_types": list[string],
            "null_types_index": {
                string: list[int]
            },
            "data_type_representation": dict[string, float],
            "min": [null, float, str],
            "max": [null, float, str],
            "mode": float,
            "median": float,
            "median_absolute_deviation": float,
            "sum": float,
            "mean": float,
            "variance": float,
            "stddev": float,
            "skewness": float,
            "kurtosis": float,
            "num_zeros": int,
            "num_negatives": int,
            "histogram": {
                "bin_counts": list[int],
                "bin_edges": list[float],
            },
            "quantiles": {
                int: float
            },
            "vocab": list[char],
            "avg_predictions": dict[string, float],
            "data_label_representation": dict[string, float],
            "categories": list[str],
            "unique_count": int,
            "unique_ratio": float,
            "categorical_count": dict[string, int],
            "gini_impurity": float,
            "unalikeability": float,
            "precision": {
                'min': int,
                'max': int,
                'mean': float,
                'var': float,
                'std': float,
                'sample_size': int,
                'margin_of_error': float,
                'confidence_level': float
            },
            "times": dict[string, float],
            "format": string
        },
        "null_replication_metrics": {
            "class_prior": list[int],
            "class_sum": list[list[int]],
            "class_mean": list[list[int]]
        }
    }
]
```
(*) Updating the correlation matrix (`correlation_matrix`) is currently disabled. It will be re-enabled in a future update. Users can still enable it on demand by setting the `is_enable` option to True.

The format for an unstructured profile is below:
```
"global_stats": {
    "samples_used": int,
    "empty_line_count": int,
    "file_type": string,
    "encoding": string,
    "memory_size": float, # in MB
    "times": dict[string, float],
},
"data_stats": {
    "data_label": {
        "entity_counts": {
            "word_level": dict[string, int],
            "true_char_level": dict[string, int],
            "postprocess_char_level": dict[string, int]
        },
        "entity_percentages": {
            "word_level": dict[string, float],
            "true_char_level": dict[string, float],
            "postprocess_char_level": dict[string, float]
        },
        "times": dict[string, float]
    },
    "statistics": {
        "vocab": list[char],
        "vocab_count": dict[string, int],
        "words": list[string],
        "word_count": dict[string, int],
        "times": dict[string, float]
    }
}
```

The format for a graph profile is below:
```
"num_nodes": int,
"num_edges": int,
"categorical_attributes": list[string],
"continuous_attributes": list[string],
"avg_node_degree": float,
"global_max_component_size": int,
"continuous_distribution": {
    "<attribute_1>": {
        "name": string,
        "scale": float,
        "properties": list[float, np.array]
    },
    "<attribute_2>": None,
    ...
},
"categorical_distribution": {
    "<attribute_1>": None,
    "<attribute_2>": {
        "bin_counts": list[int],
        "bin_edges": list[float]
    },
    ...
},
"times": dict[string, float]
```

# Profile Statistics Descriptions

### Structured Profile

#### global_stats:

* `samples_used` - number of input data samples used to generate this profile
* `column_count` - the number of columns contained in the input dataset
* `row_count` - the number of rows contained in the input dataset
* `row_has_null_ratio` - the proportion of rows that contain at least one null value to the total number of rows
* `row_is_null_ratio` - the proportion of rows that are fully comprised of null values (null rows) to the total number of rows
* `unique_row_ratio` - the proportion of distinct rows in the input dataset to the total number of rows
* `duplicate_row_count` - the number of rows that occur more than once in the input dataset
* `file_type` - the format of the file containing the input dataset (ex: .csv)
* `encoding` - the encoding of the file containing the input dataset (ex: UTF-8)
* `correlation_matrix` - a matrix of shape `column_count` x `column_count` containing the correlation coefficients between each pair of columns in the dataset
* `chi2_matrix` - a matrix of shape `column_count` x `column_count` containing the chi-square statistics between each pair of columns in the dataset
* `profile_schema` - a description of the format of the input dataset, labeling each column with its index in the dataset
    * `string` - the label of the column in question and its index in the profile schema
* `times` - the duration of time it took to generate the global statistics for this dataset, in milliseconds

#### data_stats:

* `column_name` - the label/title of this column in the input dataset
* `data_type` - the primitive Python data type contained within this column
* `data_label` - the label/entity of the data in this column, as determined by the Labeler component
* `categorical` - 'true' if this column contains categorical data
* `order` - the way in which the data in this column is ordered, if any, otherwise "random"
* `samples` - a small subset of data entries from this column
* `statistics` - statistical information on the column
    * `sample_size` - number of input data samples used to generate this profile
    * `null_count` - the number of null entries in the sample
    * `null_types` - a list of the different null types present within this sample
    * `null_types_index` - a dict containing each null type and a list of the indices at which it is present within this sample
    * `data_type_representation` - the percentage of samples identified as each data type
    * `min` - the minimum value in the sample
    * `max` - the maximum value in the sample
    * `mode` - the mode of the entries in the sample
    * `median` - the median of the entries in the sample
    * `median_absolute_deviation` - the median absolute deviation of the entries in the sample
    * `sum` - the total of all sampled values from the column
    * `mean` - the average of all entries in the sample
    * `variance` - the variance of all entries in the sample
    * `stddev` - the standard deviation of all entries in the sample
    * `skewness` - the statistical skewness of all entries in the sample
    * `kurtosis` - the statistical kurtosis of all entries in the sample
    * `num_zeros` - the number of entries in this sample that have the value 0
    * `num_negatives` - the number of entries in this sample that have a value less than 0
    * `histogram` - histogram-related information
        * `bin_counts` - the number of entries within each bin
        * `bin_edges` - the thresholds of each bin
    * `quantiles` - the value at each percentile, in the order they are listed, based on the entries in the sample
    * `vocab` - a list of the characters used within the entries in this sample
    * `avg_predictions` - the average of the data label prediction confidences across all sampled data points
    * `categories` - a list of each distinct category within the sample, if `categorical` = 'true'
    * `unique_count` - the number of distinct entries in the sample
    * `unique_ratio` - the proportion of distinct entries in the sample to the total number of entries in the sample
    * `categorical_count` - the number of entries sampled for each category, if `categorical` = 'true'
    * `gini_impurity` - a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset
    * `unalikeability` - a value denoting how frequently entries differ from one another within the sample
    * `precision` - a dict of statistics on the number of digits in each sampled number
    * `times` - the duration of time it took to generate this sample's statistics, in milliseconds
    * `format` - a list of possible datetime formats
* `null_replication_metrics` - statistics of data partitioned based on whether the column value is null (index 1 of the lists referenced by the dict keys) or not null (index 0)
    * `class_prior` - a list containing the probabilities of the column value being null and not null
    * `class_sum` - a list containing the sum of all other rows, partitioned by whether the column value is null or not
    * `class_mean` - a list containing the mean of all other rows, partitioned by whether the column value is null or not

### Unstructured Profile

#### global_stats:

* `samples_used` - number of input data samples used to generate this profile
* `empty_line_count` - the number of empty lines in the input data
* `file_type` - the file type of the input data (ex: .txt)
* `encoding` - the encoding of the input data file (ex: UTF-8)
* `memory_size` - the size of the input data in MB
* `times` - the duration of time it took to generate this profile, in milliseconds

#### data_stats:

* `data_label` - labels and statistics on the labels of the input data
    * `entity_counts` - the number of times a specific label or entity appears in the input data
        * `word_level` - the number of words counted within each label or entity
        * `true_char_level` - the number of characters counted within each label or entity, as determined by the model
        * `postprocess_char_level` - the number of characters counted within each label or entity, as determined by the postprocessor
    * `entity_percentages` - the percentage of the input data made up by each label or entity
        * `word_level` - the percentage of words in the input data contained within each label or entity
        * `true_char_level` - the percentage of characters in the input data contained within each label or entity, as determined by the model
        * `postprocess_char_level` - the percentage of characters in the input data contained within each label or entity, as determined by the postprocessor
    * `times` - the duration of time it took for the data labeler to predict on the data
* `statistics` - statistics of the input data
    * `vocab` - a list of each character in the input data
    * `vocab_count` - the number of occurrences of each distinct character in the input data
    * `words` - a list of each word in the input data
    * `word_count` - the number of occurrences of each distinct word in the input data
    * `times` - the duration of time it took to generate the vocab and word statistics, in milliseconds
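Two of the structured column statistics described above, `gini_impurity` and `unalikeability`, follow directly from their definitions. A minimal plain-Python sketch of those two computations (this mirrors the definitions above, not DataProfiler's internal implementation):

```python
from collections import Counter

def gini_impurity(values):
    """1 - sum(p_i^2): the probability of mislabeling a random element if
    labels are drawn from the sample's own label distribution."""
    n = len(values)
    return 1.0 - sum((count / n) ** 2 for count in Counter(values).values())

def unalikeability(values):
    """Fraction of ordered pairs of distinct positions whose values differ."""
    n = len(values)
    differing = sum(1 for i in range(n) for j in range(n)
                    if i != j and values[i] != values[j])
    return differing / (n * (n - 1))

column = ["a", "a", "b", "b"]
print(gini_impurity(column))   # 0.5  (two equally likely labels)
print(unalikeability(column))  # 8 differing ordered pairs out of 12
```

A uniform column yields 0 for both statistics; more varied columns push both toward 1.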
### Graph Profile

* `num_nodes` - the number of nodes in the graph
* `num_edges` - the number of edges in the graph
* `categorical_attributes` - a list of the categorical edge attributes
* `continuous_attributes` - a list of the continuous edge attributes
* `avg_node_degree` - the average degree of the nodes in the graph
* `global_max_component_size` - the size of the largest global max component

#### continuous_distribution:

* `<attribute_N>` - the name of the N-th edge attribute in the list of attributes
    * `name` - the name of the distribution for the attribute
    * `scale` - the negative log likelihood used to scale and compare distributions
    * `properties` - a list of statistical properties describing the distribution
        * [shape (optional), loc, scale, mean, variance, skew, kurtosis]

#### categorical_distribution:

* `<attribute_N>` - the name of the N-th edge attribute in the list of attributes
    * `bin_counts` - the counts in each bin of the distribution histogram
    * `bin_edges` - the edges of each bin of the distribution histogram

* `times` - the duration of time it took to generate this profile, in milliseconds

# Support

### Supported Data Formats

* Any delimited file (CSV, TSV, etc.)
* JSON object
* Avro file
* Parquet file
* Text file
* Pandas DataFrame
* A URL that points to one of the supported file types above

### Data Types

*Data types* are determined at the column level for structured data.

* Int
* Float
* String
* DateTime

### Data Labels

*Data labels* are determined per cell for structured data (column/row when the *Profiler* is used), or at the character level for unstructured data.

* UNKNOWN
* ADDRESS
* BAN (bank account number, 10-18 digits)
* CREDIT_CARD
* EMAIL_ADDRESS
* UUID
* HASH_OR_KEY (md5, sha1, sha256, random hash, etc.)
* IPV4
* IPV6
* MAC_ADDRESS
* PERSON
* PHONE_NUMBER
* SSN
* URL
* US_STATE
* DRIVERS_LICENSE
* DATE
* TIME
* DATETIME
* INTEGER
* FLOAT
* QUANTITY
* ORDINAL

# Get Started

### Load a File

The Data Profiler can profile the following data/file types:

* CSV file (or any delimited file)
* JSON object
* Avro file
* Parquet file
* Text file
* Pandas DataFrame
* A URL that points to one of the supported file types above

The profiler should automatically identify the file type and load the data into a `Data Class`.

Along with other attributes, the `Data Class` enables access to the data directly via a valid Pandas DataFrame.

```python
# Load a csv file, return a CSVData object
csv_data = Data('your_file.csv')

# Print the first 10 rows of the csv file
print(csv_data.data.head(10))

# Load a parquet file, return a ParquetData object
parquet_data = Data('your_file.parquet')

# Sort the data by the name column
parquet_data.data.sort_values(by='name', inplace=True)

# Print the sorted first 10 rows of the parquet data
print(parquet_data.data.head(10))

# Load a json file from a URL, return a JSONData object
json_data = Data('https://github.com/capitalone/DataProfiler/blob/main/dataprofiler/tests/data/json/iris-utf-8.json')
```
If a file type is not automatically identified (rare), you can specify it explicitly; see section [Specifying a Filetype or Delimiter](#specifying-a-filetype-or-delimiter).

### Profile a File

The example below uses a CSV file, but CSV, JSON, Avro, Parquet, or text files work the same way.

```python
import json
from dataprofiler import Data, Profiler

# Load file (CSV should be automatically identified)
data = Data("your_file.csv")

# Profile the dataset
profile = Profiler(data)

# Generate a report and use json to prettify.
report = profile.report(report_options={"output_format": "pretty"})

# Print the report
print(json.dumps(report, indent=4))
```

### Updating Profiles

Currently, the data profiler is equipped to update its profile in batches.

```python
import json
from dataprofiler import Data, Profiler

# Load and profile a CSV file
data = Data("your_file.csv")
profile = Profiler(data)

# Update the profile with new data:
new_data = Data("new_data.csv")
profile.update_profile(new_data)

# Print the report using json to prettify.
report = profile.report(report_options={"output_format": "pretty"})
print(json.dumps(report, indent=4))
```

Note that if the data you use to update the profile contains integer indices that overlap with the indices of the originally profiled data, the indices will be "shifted" to unused values when null rows are calculated, so that null counts and ratios remain accurate.

### Merging Profiles

If you have two files with the same schema (but different data), the two profiles can be merged together via the addition operator.

This also enables profiles to be computed in a distributed manner.

```python
import json
from dataprofiler import Data, Profiler

# Load a CSV file with a schema
data1 = Data("file_a.csv")
profile1 = Profiler(data1)

# Load another CSV file with the same schema
data2 = Data("file_b.csv")
profile2 = Profiler(data2)

# Merge the two profiles
profile3 = profile1 + profile2

# Print the report using json to prettify.
report = profile3.report(report_options={"output_format": "pretty"})
print(json.dumps(report, indent=4))
```

Note that if the merged profiles have overlapping integer indices, the indices will be "shifted" to unused values when null rows are calculated, so that null counts and ratios remain accurate.
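Profiles can be combined with `+` because the underlying summary statistics are mergeable: counts, sums, means, and variances of two partitions can be combined without revisiting the raw data. A plain-Python sketch of that idea, using the standard pairwise-combination formulas from parallel/streaming variance algorithms (illustrative only, not DataProfiler's actual internals):

```python
def summarize(xs):
    """Summarize a partition as (count, mean, M2), where M2 is the
    sum of squared deviations from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    return n, mean, m2

def merge(a, b):
    """Combine two (count, mean, M2) summaries into one."""
    n1, mean1, m2_1 = a
    n2, mean2, m2_2 = b
    n = n1 + n2
    delta = mean2 - mean1
    mean = mean1 + delta * n2 / n
    m2 = m2_1 + m2_2 + delta ** 2 * n1 * n2 / n
    return n, mean, m2

left = summarize([1.0, 2.0, 3.0])     # profile of partition A
right = summarize([4.0, 5.0])         # profile of partition B
merged = merge(left, right)
print(merged)  # matches summarize([1.0, 2.0, 3.0, 4.0, 5.0])
```

The same property is what lets each worker profile its own shard and a reducer sum the results.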
### Profile Differences

To find changes between profiles with the same schema, we can leverage the profile's `diff` function. `diff` provides overall file and sampling differences, as well as detailed differences in the data's statistics. For example, numerical columns include both a t-test for evaluating similarity and the PSI (Population Stability Index) for quantifying shift in a column's distribution. For more information, see the Profiler section of the [Github Pages](https://capitalone.github.io/DataProfiler/).

A diff report is generated as follows:
```python
import json
import dataprofiler as dp

# Load a CSV file
data1 = dp.Data("file_a.csv")
profile1 = dp.Profiler(data1)

# Load another CSV file
data2 = dp.Data("file_b.csv")
profile2 = dp.Profiler(data2)

diff_report = profile1.diff(profile2)
print(json.dumps(diff_report, indent=4))
```

### Profile a Pandas DataFrame
```python
import pandas as pd
import dataprofiler as dp
import json

my_dataframe = pd.DataFrame([[1, 2.0],[1, 2.2],[-1, 3]])
profile = dp.Profiler(my_dataframe)

# print the report using json to prettify.
report = profile.report(report_options={"output_format": "pretty"})
print(json.dumps(report, indent=4))

# read a specified column, in this case it is labeled 0:
print(json.dumps(report["data_stats"][0], indent=4))
```

### Unstructured Profiler
In addition to the structured profiler, DataProfiler provides unstructured profiling for TextData objects or strings. The unstructured profiler also supports list[string], pd.Series(string), and pd.DataFrame(string) when the profiler_type option is specified as `unstructured`. Below is an example of the unstructured profiler with a text file.
```python
import dataprofiler as dp
import json

my_text = dp.Data('text_file.txt')
profile = dp.Profiler(my_text)

# print the report using json to prettify.
report = profile.report(report_options={"output_format": "pretty"})
print(json.dumps(report, indent=4))
```

Below is another example of the unstructured profiler, this time with a pd.Series of strings and the profiler option `profiler_type='unstructured'`:
```python
import dataprofiler as dp
import pandas as pd
import json

text_data = pd.Series(['first string', 'second string'])
profile = dp.Profiler(text_data, profiler_type='unstructured')

# print the report using json to prettify.
report = profile.report(report_options={"output_format": "pretty"})
print(json.dumps(report, indent=4))
```
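The `vocab`, `vocab_count`, `words`, and `word_count` statistics that the unstructured profiler reports amount to character- and word-frequency counts. A plain-Python illustration of what those statistics contain (a sketch of the definitions, not DataProfiler's implementation):

```python
from collections import Counter

text = "first string second string"

# Character-level statistics: distinct characters and their counts
vocab = sorted(set(text))
vocab_count = Counter(text)

# Word-level statistics: words and their counts
words = text.split()
word_count = Counter(words)

print(vocab_count["s"], word_count["string"])  # prints "4 2"
```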
### Graph Profiler
DataProfiler also supports profiling graph data from a csv file. Below is an example of the graph profiler with a graph data csv file:
```python
import dataprofiler as dp
import pprint

my_graph = dp.Data('graph_file.csv')
profile = dp.Profiler(my_graph)

# print the report using pretty print (json dump does not work on numpy array values inside dict)
report = profile.report()
printer = pprint.PrettyPrinter(sort_dicts=False, compact=True)
printer.pprint(report)
```

**Visit the [documentation page](https://capitalone.github.io/DataProfiler/) for additional Examples and API details**

# References
```
Sensitive Data Detection with High-Throughput Neural Network Models for Financial Institutions
Authors: Anh Truong, Austin Walters, Jeremy Goodsitt
2020 https://arxiv.org/abs/2012.09597
The AAAI-21 Workshop on Knowledge Discovery from Unstructured Data in Financial Services
```
按需安装\n- **仅安装 ML 依赖**（不生成报告）：\n  ```bash\n  pip install DataProfiler[ml]\n  ```\n- **轻量级安装**（禁用敏感数据检测\u002F实体识别）：\n  ```bash\n  pip install DataProfiler[reports]\n  ```\n\n> **提示**：若遇到下载速度慢的问题，可使用清华源或阿里源，例如：\n> `pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple DataProfiler[full]`\n\n## 基本使用\n\n以下示例展示了如何加载数据、生成分析配置并输出 JSON 格式的统计报告。\n\n```python\nimport json\nfrom dataprofiler import Data, Profiler\n\ndata = Data(\"your_file.csv\") # Auto-Detect & Load: CSV, AVRO, Parquet, JSON, Text, URL\n\nprint(data.data.head(5)) # Access data directly via a compatible Pandas DataFrame\n\nprofile = Profiler(data) # Calculate Statistics, Entity Recognition, etc\n\nreadable_report = profile.report(report_options={\"output_format\": \"compact\"})\n\nprint(json.dumps(readable_report, indent=4))\n```\n\n### 说明\n1. **数据加载**：`Data` 类支持自动检测并加载 CSV、AVRO、Parquet、JSON、文本及 URL 格式。\n2. **数据访问**：生成的 `data.data` 对象兼容 Pandas DataFrame，可直接进行后续操作。\n3. **分析生成**：`Profiler` 会自动计算统计数据、识别实体（包括预训练的敏感数据模型）。\n4. 
**报告输出**：通过 `report()` 方法获取结构化报告，支持 JSON 等格式导出。","某金融科技公司数据工程师在整合多源客户交易数据时，急需在建模前完成数据质量评估及隐私合规筛查。\n\n### 没有 DataProfiler 时\n- 需要人工编写大量 Pandas 代码逐列计算均值、标准差及缺失比例，效率极低且容易出错。\n- 依赖正则表达式手动匹配敏感信息，容易漏掉复杂的身份证或银行卡号模式，存在合规隐患。\n- 不同来源的 Parquet 和 CSV 文件需分别预处理，导致数据探查流程碎片化，难以统一管理。\n- 生成的统计结果分散在各处，无法形成统一视图供团队共享审查，沟通成本高昂。\n\n### 使用 DataProfiler 后\n- DataProfiler 自动解析多种文件格式并直接加载为 DataFrame，省去繁琐的手动预处理步骤。\n- 内置预训练模型高效识别 PII 实体，精准定位手机号、邮箱等敏感字段无需定制规则。\n- 一行代码即可生成包含相关性矩阵和分布统计的完整 JSON 报告，便于下游系统直接调用。\n- 全局与列级统计数据一目了然，帮助团队快速决策哪些字段可用于模型训练，避免无效清洗。\n\nDataProfiler 通过自动化数据剖析与敏感信息检测，显著降低了数据准备阶段的合规风险与时间成本，让工程师专注于核心算法优化。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcapitalone_DataProfiler_b6d7c955.png","capitalone","Capital One","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fcapitalone_75a05601.jpg","We’re an open source-first organization — actively using, contributing to and managing open source software projects.",null,"opensource@capitalone.com","https:\u002F\u002Fwww.capitalone.com\u002Ftech\u002Fopen-source\u002F","https:\u002F\u002Fgithub.com\u002Fcapitalone",[84,88,92,95,98,101],{"name":85,"color":86,"percentage":87},"Python","#3572A5",99.8,{"name":89,"color":90,"percentage":91},"Makefile","#427819",0,{"name":93,"color":94,"percentage":91},"CSS","#663399",{"name":96,"color":97,"percentage":91},"HTML","#e34c26",{"name":99,"color":100,"percentage":91},"Batchfile","#C1F12E",{"name":102,"color":103,"percentage":91},"Shell","#89e051",1551,185,"2026-03-24T10:55:09","Apache-2.0",1,"未说明",{"notes":111,"python":109,"dependencies":112},"支持通过 pip 安装不同功能模块（full\u002Fml\u002Freports）；内置预训练深度学习模型用于敏感数据检测（PII\u002FNPI）；若无需 ML 功能可安装精简版以避免安装 TensorFlow 
等严格依赖；首次使用可能涉及模型加载。",[113,114],"pandas","tensorflow",[26,54,51,13],[117,118,119,120,121,122,123,124,125,126,127,128,129,113,130,131,132,133,134,135],"python","privacy","pii","npi","nlp","data-science","gdpr","data-analysis","data-labels","avro","dataprofiling","sensitive-data","security","csv","tabular-data","dataset","network-data","graph-data","machine-learning",4,"2026-03-27T02:49:30.150509","2026-04-06T06:44:27.448424",[140,145,149,154,159,164],{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},2415,"运行 diff_report 后使用 json.dumps 报错 \"Object of type int64 is not JSON serializable\" 怎么办？","该问题已在库版本 0.7.11 中修复。建议将 DataProfiler 升级到该版本或更高版本。如果暂时无法升级，可以尝试使用 `output_format=\"serializable\"` 选项来避免此序列化错误。","https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fissues\u002F442",{"id":146,"question_zh":147,"answer_zh":148,"source_url":144},2416,"使用 profile.report() 导出 JSON 报告时遇到序列化错误，推荐什么设置？","建议在调用 report 方法时显式指定 `output_format=\"serializable\"`，而不是默认的 `compact`。这样可以确保生成的报告对象可以直接被 `json.dumps` 序列化而不会报错。",{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},2417,"使用 DataLabeler 时报错 \"Tensorflow error\" 或 \"bases must be types\" 如何解决？","这通常是因为缺少机器学习相关的依赖项。请尝试通过以下命令安装额外的 ML 要求：`pip install dataprofiler[ml]`。如果已安装但仍报错，请检查环境兼容性。","https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fissues\u002F490",{"id":155,"question_zh":156,"answer_zh":157,"source_url":158},2418,"在 Python 3.12 环境下安装 DataProfiler 的完整包（含 ML）失败，提示 TensorFlow 兼容性问题怎么办？","在特定时期，TensorFlow 和 Keras 的新版本与 Python 3.12 存在兼容性问题。建议暂时切换回较旧的 Python 版本（如 3.11），直到库官方支持更新为止。","https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fissues\u002F1144",{"id":160,"question_zh":161,"answer_zh":162,"source_url":163},2419,"导入 `F1Score` 时出现 \"cannot import name 'F1Score'\" 错误是什么原因？","这是一个已知的导入错误，已在 Pull Request #952 中修复。请确保将 DataProfiler 
库升级到包含该修复的最新版本，或者检查是否使用了正确的模块路径。","https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fissues\u002F773",{"id":165,"question_zh":166,"answer_zh":167,"source_url":168},2420,"如何快速调整 Profiler 的配置选项以优化性能或简化设置？","可以使用预设（Preset）功能来快速调整多个设置。例如：`opts = dp.ProfilerOptions(preset=\"complete\")` 或 `profiler = dp.Profiler(data, preset=\"complete\")`。其他可用预设包括 `\"standard\"`、`\"numeric_stats_disabled\"` 和 `\"data_types\"`。","https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fissues\u002F604",[170,175,180,185,190,195,200,205,210,215,220,225,230,235,240,245,250,255,260,265],{"id":171,"version":172,"summary_zh":173,"released_at":174},101932,"0.13.4","# Documentation\r\n- add architecture.rst for algorithm rationale, testing, versioning (https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1181)\r\n#  Miscellaneous\r\n-  refactored docs workflow(https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1182)","2025-07-30T19:14:37",{"id":176,"version":177,"summary_zh":178,"released_at":179},101933,"v0.13.3","# Miscellaneous\r\n- refactored documentation release process","2025-03-18T17:24:50",{"id":181,"version":182,"summary_zh":183,"released_at":184},101934,"0.13.2","# Miscellaneous\r\n- fixed test script in release process","2025-03-13T14:24:36",{"id":186,"version":187,"summary_zh":188,"released_at":189},101935,"0.13.1","\r\n# Miscellaneous\r\n- added versioneer\r\n- removed dask from pre-commit requirements\r\n- refactored release process \r\n","2025-03-12T16:00:06",{"id":191,"version":192,"summary_zh":193,"released_at":194},101936,"0.13.0","# Profiler\r\n- staging\u002Fmain\u002F0.13.0 #1165\r\n\r\n# Documentation\r\n\r\n# Miscellaneous\r\n- Python 3.8 removed from tox environments (#1146)\r\n- Python 3.11 added to Github Actions (#1090)\r\n- PR #1162 updates the `requests` dependency to resolve vulnerabilities caused by urllib3 and certifi: \r\n\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.12.0...0.13.0\r\n\r\n## What's Changed\r\n* staging\u002Fmain\u002F0.13.0 by @armaan-dhillon in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1165\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.12.0...0.13.0","2025-01-15T16:04:11",{"id":196,"version":197,"summary_zh":198,"released_at":199},101937,"0.12.0","# Profiler\r\n- staging\u002Fmain\u002F0.12.0 #1145\r\n\r\n# Documentation\r\n- Update Documentation v0.12.0 #1152\r\n\r\n# Miscellaneous\r\n- Remove py38 from tox envlist #1146\r\n- Fix Tox #1143\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.11.0...0.12.0\r\n\r\n## What's Changed\r\n* staging\u002Fmain\u002F0.12.0 by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1145\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.11.0...0.12.0","2024-06-14T17:33:34",{"id":201,"version":202,"summary_zh":203,"released_at":204},101938,"0.11.0","# Profiler\r\n- Version.py update 0.11.0 #1139\r\n- Update: black version #1131\r\n\r\n# Documentation\r\n- Update Documentation #1141 \r\n- docs: update test link to latest version #1114\r\n\r\n# Dependencies\r\n- Quick fix for dependency max pins #1120\r\n- Fix memray version max #1132\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.9...0.11.0\r\n\r\n## What's Changed\r\n* Version.py update 0.11.0 by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1139\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.9...0.11.0","2024-05-21T17:36:20",{"id":206,"version":207,"summary_zh":208,"released_at":209},101939,"0.10.9","# Profiler\r\n- Version.py update 0.10.9 #1107\r\n- Staging into main from dev #1106\r\n- Hot fix json bug #1105\r\n\r\n# Documentation\r\n- Docs update 0.10.9 #1108\r\n- Add downloads tile to README #1085\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.8...0.10.9\r\n\r\n## What's Changed\r\n* Staging into `main` from `dev`  by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1106\r\n* Version.py update 0.10.9 by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1107\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.8...0.10.9","2024-03-06T14:28:49",{"id":211,"version":212,"summary_zh":213,"released_at":214},101940,"0.10.8","# Profiler\r\n- Staging\u002Fmain\u002F0.10.8 #1081\r\n- Depedency: matplotlib version bump #1072\r\n- Make _assimilate_histogram() not use self #1071\r\n- Feature: added parquet sampling #1070\r\n\r\n# Documentation\r\n- Update: Documentation 0.10.8 #1084 \r\n- Docs update to include option for sample_nrows for parquet files #1082\r\n\r\n# Miscellaneous\r\n- Bump actions\u002Fsetup-python from 4 to 5 #1078\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.7...0.10.8\r\n\r\n## What's Changed\r\n* Staging\u002Fmain\u002F0.10.8 by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1081\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.7...0.10.8","2024-01-11T17:32:37",{"id":216,"version":217,"summary_zh":218,"released_at":219},101941,"0.10.7","# Profiler\r\n- 
Staging\u002Fmain\u002F0.10.7 #1068\r\n- Hot Fix: Plugin Testing #1067\r\n\r\n# Documentation\r\n- Update: Documentation 0.10.7 #1069\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.6...0.10.7\r\n\r\n## What's Changed\r\n* Staging\u002Fmain\u002F0.10.7 by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1068\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.6...0.10.7","2023-11-14T19:47:24",{"id":221,"version":222,"summary_zh":223,"released_at":224},101942,"0.10.6","# Profiler\r\n- Staging\u002Fmain\u002F0.10.6 #1065\r\n- Update: Version 0.10.6 #1064\r\n- Feature: Plugins #1060\r\n- Hot Fix: Contribution Doc #1059\r\n- Rename references to degree of freedom from df to deg_of_free #1056\r\n- add_s3_connection_remote_loading_s3uri_feature #1054\r\n- feat: add null ratio to column stats #1052\r\n- Delay transforming priority_order into ndarray #1045\r\n- Fix Codeowners List #1043\r\n\r\n# Documentation\r\n- Update: Documentation 0.10.6 #1066\r\n- Docs: AWS S3 Data Reading #1063\r\n- Update docs to reflect renamed output of deg_of_free #1057\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.5...0.10.6\r\n\r\n## What's Changed\r\n* Fix Codeowners List by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1044\r\n* Staging\u002Fmain\u002F0.10.6 by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1065\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.5...0.10.6","2023-11-13T21:24:15",{"id":226,"version":227,"summary_zh":228,"released_at":229},101943,"0.10.5","# Profiler\r\n- Categorical PSI #1040\r\n- Categorical PSI #1039\r\n\r\n# Documentation\r\n- Update docs 0.10.5 #1042\r\n- Update 
docs 0.10.5 #1041\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.4...0.10.5\r\n\r\n## What's Changed\r\n* Categorical PSI by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1040\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.4...0.10.5","2023-09-25T15:37:17",{"id":231,"version":232,"summary_zh":233,"released_at":234},101944,"0.10.4","# Profiler\r\n- version bump (#1032) #1036\r\n- Staging\u002Fmain\u002F0.10.4 #1029\r\n- added psi calculation to categorical columns #1027\r\n- Bump actions\u002Fcheckout from 3 to 4 #1024\r\n- Minor: Profiler Path Fix in Example Notebook #1021\r\n- modified the assignees for issue creation #1016\r\n- Make sure random_state is a list before indexed assignment #968\r\n\r\n# Documentation\r\n- Update docs 0.10.4 #1038\r\n- Update docs 0.10.4 #1037\r\n- update install instructions for mac #1026\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.3...0.10.4\r\n\r\n## What's Changed\r\n* Staging\u002Fmain\u002F0.10.4 by @ksneab7 in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1029\r\n* version bump (#1032) by @ksneab7 in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1036\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.3...0.10.4","2023-09-22T12:01:40",{"id":236,"version":237,"summary_zh":238,"released_at":239},101945,"0.10.3","# Profiler\r\n- Staging: main 0.10.3 #1004\r\n- Fix ProfilerOptions() documentation #1002\r\n\r\n## Feature: Multiprocess\r\n- Staging: into dev feature\u002Fmultiprocess #998\r\n- Multiprocess automation feature into staging\u002Fdev. 
#997\r\n- Syncing feature\u002Fmultiprocess into staging\u002Fdev\u002Fmultiprocess #992\r\n- Automate multiprocess option #984\r\n\r\n## Feature: `num_quantiles` option\r\n- Staging: into dev feature\u002Fnum-quantiles #990\r\n- Fix Scipy Mend Issue #988\r\n- HistogramAndQuantilesOption sync with dev branch #987\r\n\r\n# Documentation\r\n- Update docs to 0.10.3 #1012\r\n- Update docs to 0.10.3 #1011\r\n- fixed snappy install issue on Mac #1010\r\n- Staging: into dev-gh-pages the docs for multiprocess. #1001\r\n- Add docs to multiprocess option in StructuredOptions. #999\r\n- Staging: into dev-gh-pages the docs for num_quantiles. #993\r\n- Add docs for num_quantiles option for histogram_and_quantiles. #991\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.2...0.10.3\r\n\r\n## What's Changed\r\n* Staging: main `0.10.3` by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F1004\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.2...0.10.3\r\n","2023-08-07T19:10:27",{"id":241,"version":242,"summary_zh":243,"released_at":244},101946,"0.10.2","# Profiler\r\n- hotfix[0.10.2]: cat vs float bug #973\r\n\r\n# Documentation\r\n- Staging: Update docs to 0.10.2 #978\r\n- Update docs to 0.10.2 #979\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.1...0.10.2\r\n\r\n## What's Changed\r\n* hotfix[0.10.2]: cat vs float bug by @JGSweets in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F973\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.1...0.10.2\r\n","2023-07-28T16:18:42",{"id":246,"version":247,"summary_zh":248,"released_at":249},101947,"0.10.1","# Profiler\r\n- Hot Fix: .astype(\"bool\") #960\r\n\r\n\r\n# Documentation\r\n- Staging: Update docs 0.10.1 
#961\r\n- Update docs 0.10.1 #962\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.0...0.10.1\r\n\r\n## What's Changed\r\n* Hot Fix: `.astype(\"bool\")` by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F960\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.10.0...0.10.1","2023-07-17T18:21:20",{"id":251,"version":252,"summary_zh":253,"released_at":254},101948,"0.10.0","# Profiler\r\n- Forking workflow directions CONTRIBUTING.md #857\r\n- Fixing diagram rendering in CONTRIBUTING.md #862\r\n- Fix initial value of processor_type #863\r\n- fix: test bug due to bad mocks #878\r\n- added differences section to unstructured data example #877\r\n- Reservoir sampling refactor #910\r\n- feat: add dev to workfow for testing #897\r\n- Cms for categorical #892\r\n- Hotfix: fix post feature serialization merge #942\r\n- Update version to 0.10.0 #944\r\n- Staging\u002Fmain\u002F0.10.0 #943\r\n\r\n## Profiler: Profile Serialization\r\n- Staging\u002Fdev\u002Fprofile serialization #940\r\n- fix: order bug #939\r\n- fix: null_rep mat should calculate even if datetime #933\r\n- Profiler: load_method hotfix #932\r\n- Top level hotfix: save \u002F load .lower() #931\r\n- Notebook Example save\u002Fload Profile #930\r\n- refactor: use seed for sample for consistency #927\r\n- Profile Builder load() serialization #925\r\n- Reuse passed labeler #924\r\n- BaseProfiler save() for json #923\r\n- Added testing for values for test_json_decode_after_update #915\r\n- UnstructuredProfiler: Added NoImplementationError #907\r\n- fix: bug and add tests for structuredcolprofiler #904\r\n- Stuctured profiler encode decode #903\r\n- refactor: allow options to go through all #902\r\n- StructuredColProfiler Encode \u002F Decode #901\r\n- Decode options #894\r\n- Quick Test update #893\r\n- Deserialization of datalabeler 
#891\r\n- ColumnDataLabelerCompiler: serialize \u002F deserialize #888\r\n- Add Serialization and Deserialization Tests for Stats Compiler, plus refactors for order Typing #887\r\n- Adds deserialization for compilers and validates tests for Primitive; fixes numerical deserialization #886\r\n- Adds tests validating serialization with Primitive type for compiler #885\r\n- feat: add test and compiler serialization #884\r\n- ready datalabeler for deserialization and improvement on serializatio… #879\r\n- Encode Options #875\r\n- Encode\u002FDecode TextColumnProfiler #870\r\n- Created encoder for the datalabelercolumn #869\r\n- Added test to ensure order attribute for ordered column profiler functions correctly after deserialization #868\r\n- Added decoding for encoding of ordered column profiles #864\r\n- Json decode date time column #861\r\n- Float column profiler encode decode #854\r\n- hot fixes for encode and decode of numeric stats mixin and intcol pro… #852\r\n\r\n\r\n## Profiler: Options\r\n- staging\u002Fdev\u002Foptions #909\r\n- RowStatisticsOptions: Implementing option #871\r\n- New preset implementation and test #867\r\n- RowStatisticsOptions: Add option #865\r\n\r\n\r\n# Documentation\r\n- Staging update docs 0.10.0 #945\r\n- Documentation: Fix Req #922\r\n- Documentation: Update for Reservoir Sampling #919\r\n- documentation update for cms specific options to category #917\r\n- Add forking \u002F branch workflow image #858\r\n\r\n## Documentation: Profile Serialization\r\n- Merge staging\u002Fdev-gh-pages\u002Fprofile-serialization into dev-gh-pages #937\r\n- Docs: Profiler Serialization Clean Up #936\r\n- Docs: Profiler Serialization #928\r\n\r\n## Documentation: Options\r\n- Documentation: feature\u002Foptions branch docs updates #921\r\n- Row statistics option documentation #883\r\n- updating docs for preset name #882\r\n- Add documentation for median_abs_deviation option #881\r\n- Preset test updated w new names and different toggles #880\r\n- reset 
ignore, update .gitignore, update documentation on presets #874\r\n- Fixed documentation for sampling_ratio option #873\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.9.0...0.10.0\r\n\r\n## What's Changed\r\n* Sampling ratio implement by @joshuart in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F845\r\n* StructuredOptions: `hhl_row_hashing` by @micdavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F841\r\n* Forking workflow directions CONTRIBUTING.md by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F857\r\n* Fixing diagram rendering in `CONTRIBUTING.md` by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F862\r\n* StructuredProfiler: HLLRowHashing by @micdavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F842\r\n* added differences section to unstructured data example by @lizlouise1335 in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F877\r\n* fix: test bug due to bad mocks by @JGSweets in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F878\r\n* Fix initial value of processor_type by @junholee6a in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F863\r\n* Staging\u002Fmain\u002F0.10.0 by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F943","2023-06-30T15:04:46",{"id":256,"version":257,"summary_zh":258,"released_at":259},101949,"0.9.0","# Profiler\r\n* Encode int column #780\r\n* Decode categorical #786\r\n* Encode update format #789\r\n* Optimization for text column profile ksneab #791\r\n* Remove unnecessary cast() in csv_data.py (1) #796\r\n* 
Remove unnecessary cast() in csv_data.py (2) #798\r\n* Update main with change in memory-optimization #799\r\n* Remove unnecessary cast() in data.py #800\r\n* Remove unnecessary cast() in graph_data.py #801\r\n* Fix CategoricalColumn test #804\r\n* Specify __init__ calls in data readers reload() methods #805\r\n* Fix dask dataframe import #812\r\n* Fix CharsetMatches type error #813\r\n* Json Decoder Code Cleanup #814\r\n* Fix override errors #819\r\n* Sampling ratio option #825\r\n* Memory Optimization to main #832\r\n\t* Fixed testing to run on all feature branches for PRs #793\r\n\t* Part 1 fix for categorical mem opt issue #795\r\n\t* cleanup time space analysis code #797\r\n\t* quick update to feature\u002Fmemory-optimization for merge to main #802\r\n\t* Update feat mem #803\r\n\t* Categorical Stop Condition Options #808\r\n\t* Space time analysis improvement #809\r\n\t* implementation of setting stop conds via options for cat column profiler #810\r\n\t* Fix for histogram merging #815\r\n\t* Fixes categorical bug when stop condition is met #816\r\n\t* hotfix for more conservative stop condition in categorical columns #817\r\n\t* Coverage Fix Memory Optimization Feature Branch #823\r\n\t* Added option to remove calculations for updating row statistics #827\r\n\t* Fix to doc strings #829\r\n\t* Preset Option Fix: presets docstring added #830\r\n* Fix LSP violations #840\r\n* Fix argument types in doc comments #843\r\n\r\n# Documentation\r\n* Fix minor typo #788\r\n* Github pages memory optimization #833\r\n\t* added new options to docs #828\r\n\t* Preset Option Fix: Added presets documentation to profiler options section #831\r\n* Update docs for 0.9.0 #851\r\n\r\n\r\n# Other Changes\r\n* Memory testing and data gen scripts #781\r\n* Update for new Dask version in Validator test #784\r\n* Space analysis dataset sampling addition #787\r\n* fix bug in dataset generation #790\r\n* Update pre-commit mypy dependencies #811\r\n* Coverage Fix to Main Branch 
#822\r\n* Update version to 0.9.0 #848\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.8.9...0.9.0\r\n\r\n## What's Changed\r\n* Create method to serialize NumericalStatsMixin and functions by @kshitijavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F776\r\n* Memory testing and data gen scripts by @ksneab7 in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F781\r\n* Update for new Dask version in Validator test by @JGSweets in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F784\r\n* Encode int column by @kshitijavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F780\r\n* Fix minor typo by @junholee6a in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F788\r\n* Space analysis dataset sampling addition by @ksneab7 in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F787\r\n* fix bug in dataset generation by @ksneab7 in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F790\r\n* Optimization for text column profile ksneab by @ksneab7 in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F791\r\n* Encode update format by @kshitijavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F789\r\n* Remove unnecessary cast() in csv_data.py (1) by @junholee6a in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F796\r\n* Remove unnecessary cast() in csv_data.py (2) by @junholee6a in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F798\r\n* Update main with change in `memory-optimization` by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F799\r\n* Remove unnecessary cast() in data.py by @junholee6a in 
https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F800\r\n* Remove unnecessary cast() in graph_data.py by @junholee6a in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F801\r\n* Decode categorical by @kshitijavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F786\r\n* Fix CategoricalColumn test by @kshitijavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F804\r\n* Specify __init__ calls in data readers reload() methods by @junholee6a in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F805\r\n* Fix dask dataframe import by @junholee6a in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F812\r\n* Fix CharsetMatches type error by @junholee6a in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F813\r\n* Json Decoder Code Cleanup by @micdavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F814\r\n* Update pre-commit mypy dependencies by @junholee6a in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F811\r\n* Coverage Fix to Main Branch by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F822\r\n* Fix override errors by @junholee6a in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F819\r\n* Memory Optimization to `main` by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F832\r\n* Fix LSP violations by @junholee6a in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F840\r\n* Sampling ratio option by @joshuart in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F825\r\n* Fix ar","2023-06-01T16:05:11",{"id":261,"version":262,"summary_zh":263,"released_at":264},101950,"0.8.9","# Profiler\r\n* Create BaseColumnProfiler.to_dict to 
make JSONable #766\r\n* Chi2 docs update #767\r\n* Create Profile Encoder to JSONify BaseColumnProfiler #769\r\n* Encode categorical column #770\r\n* Encode order column #772\r\n* Add and test JSONify DateTimeColumn #774\r\n\r\n# Documentation\r\n* Update docs 0.8.9 #779\r\n\r\n# Other Changes\r\n* fix: update ml reqs #777\r\n* Update to version 0.8.9 #778\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.8.8...0.8.9\r\n\r\n## What's Changed\r\n* Create BaseColumnProfiler.to_dict to make JSONable by @kshitijavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F766\r\n* Create Profile Encoder to JSONify BaseColumnProfiler by @kshitijavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F769\r\n* Encode categorical column by @kshitijavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F770\r\n* Encode order column by @kshitijavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F772\r\n* Add and test JSONify DateTimeColumn by @kshitijavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F774\r\n* fix: update ml reqs by @JGSweets in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F777\r\n* Update to version 0.8.9 by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F778\r\n\r\n## New Contributors\r\n* @kshitijavis made their first contribution in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F766","2023-04-12T15:18:18",{"id":266,"version":267,"summary_zh":268,"released_at":269},101951,"0.8.8","# Profiler\r\n* Quick chi2 test fix #763 \r\n\r\n# Documentation\r\n* Update docs 0.8.8 #765\r\n* Chi2 docs update #767\r\n\r\n# Other 
Changes\r\n* Update to version 0.8.8 #764\r\n* PyPi image rendering issue #761\r\n* [BUG] update isort version pin #760\r\n* [BUG] isort version change #759\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fcompare\u002F0.8.7.post1...0.8.8\r\n\r\n## What's Changed\r\n* [BUG] isort version change by @micdavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F759\r\n* [BUG] update isort version pin by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F760\r\n* PyPi image rendering issue by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F761\r\n* Quick chi2 test fix by @taylorfturner in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F763\r\n* Update to version 0.8.8 by @micdavis in https:\u002F\u002Fgithub.com\u002Fcapitalone\u002FDataProfiler\u002Fpull\u002F764","2023-02-21T22:56:52"]