[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-dathere--qsv":3,"tool-dathere--qsv":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",151314,2,"2026-04-11T23:32:58",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":120,"forks":121,"last_commit_at":122,"license":123,"difficulty_score":124,"env_os":125,"env_gpu":126,"env_ram":127,"env_deps":128,"category_tags":133,"github_topics":134,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":155,"updated_at":156,"faqs":157,"releases":187},6770,"dathere\u002Fqsv","qsv","Blazing-fast Data-Wrangling toolkit","qsv 是一款专为处理表格数据（如 CSV、Excel 等）打造的高性能命令行工具集。它旨在解决用户在面对海量数据时，传统软件加载缓慢、操作繁琐或内存不足的痛点，让数据的查询、切片、排序、过滤、转换、验证及合并等操作变得简单高效。\n\n无论是需要快速清洗日志的数据工程师、进行探索性分析的研究人员，还是希望在终端流畅处理大文件的开发者，qsv 都能成为得力的助手。即使是不熟悉编程的普通用户，也能通过其简洁直观的指令轻松完成复杂的数据整理任务。\n\nqsv 最显著的技术亮点在于其“极速”特性。基于 Rust 语言编写，它充分利用多核处理器优势，处理速度远超同类工具，即便在普通笔记本电脑上也能秒级处理百万行级别的数据。此外，qsv 的命令设计遵循“可组合”原则，用户可以像搭积木一样将多个命令串联，灵活构建复杂的数据处理流程。它还支持丰富的文件格式，并内置了数据质量检查与文档生成等功能，帮助用户更好地遵循 FAIR 数据原则，让数据管理更加规范有序。","## qsv: 
Blazing-fast Data-Wrangling toolkit\n\n[![Linux build status](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Frust.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Frust.yml)\n[![Windows build status](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Frust-windows.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Frust-windows.yml)\n[![macOS build status](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Frust-macos.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Frust-macos.yml)\n[![Security audit](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Fsecurity-audit.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Fsecurity-audit.yml)\n[![Codacy Badge](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_adaaa9495e23.png)](https:\u002F\u002Fapp.codacy.com\u002Fgh\u002Fdathere\u002Fqsv\u002Fdashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)\n[![Crates.io](https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Fqsv.svg?logo=crates.io)](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fqsv)\n[![Discussions](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fdiscussions\u002Fdathere\u002Fqsv)](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fdiscussions)\n[![Minimum supported Rust version](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRust-1.94-red?logo=rust)](#minimum-supported-rust-version)\n[![FOSSA Status](https:\u002F\u002Fapp.fossa.com\u002Fapi\u002Fprojects\u002Fgit%2Bgithub.com%2Fjqnatividad%2Fqsv.svg?type=shield)](https:\u002F\u002Fapp.fossa.com\u002Fprojects\u002Fgit%2Bgithub.com%2Fjqnatividad%2Fqsv?ref=badge_shield) 
[![DOI](https:\u002F\u002Fzenodo.org\u002Fbadge\u002F320463703.svg)](https:\u002F\u002Fdoi.org\u002F10.5281\u002Fzenodo.17851335)\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002Fdathere\u002Fqsv)\n\n\u003Cdiv align=\"center\">\n\n &nbsp;          |  Table of Contents\n:--------------------------|:-------------------------\n![qsv logo](docs\u002Fimages\u002Fqsv_logo-gemini-indy-robothorse-small.png \"Nano Banana Prompt: Can you make the horse robotic? Also, add an \\\"MCP\\\" label on the robotic horse. Keep the same pose and dimensions.\")\u003Cbr\u002F>[_Hi-ho \"Quicksilver\" away!_](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=p9lf76xOA5k)\u003Cbr\u002F>\u003Csub>\u003Csup>[original logo details](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fdiscussions\u002F295) * [Base AI-reimagined logo](docs\u002Fimages\u002Fqsv_logo-gemini-indy-robothorse-small.png) * [Event logo archive](docs\u002Fimages\u002Fevent-logos\u002F)\u003C\u002Fsup>\u003C\u002Fsub>\u003Cbr\u002F>|qsv is a data-wrangling toolkit for querying, slicing,\u003Cbr>sorting, analyzing, filtering, enriching, transforming,\u003Cbr>validating, joining, formatting, converting, chatting,\u003Cbr>[FAIR](https:\u002F\u002Fwww.go-fair.org\u002Ffair-principles\u002F)ifying & documenting tabular data (CSV, Excel, [etc](#file-formats)).\u003Cbr>Commands are simple, composable & ___[\"blazing fast\"](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fdiscussions\u002F1348)___.\u003Cbr>\u003Cbr>* [Commands](#available-commands)\u003Cbr>* Installation: [CLI](#installation-options) • [MCP Server](.claude\u002Fskills\u002Fdocs\u002Fguides\u002FSTART_HERE.md) • [Cowork Plugin](.claude\u002Fskills\u002Fdocs\u002Fguides\u002FSTART_HERE.md#step-2-optional-install-the-cowork-plugin)\u003Cbr> * [Whirlwind Tour](docs\u002Fwhirlwind_tour.md#a-whirlwind-tour) \u002F [Notebooks](contrib\u002Fnotebooks\u002F) \u002F [Lessons & 
Exercises](https:\u002F\u002F100.dathere.com)\u003Cbr>* [FAQ](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fdiscussions\u002Fcategories\u002Ffaq)\u003Cbr>* [Performance Tuning](docs\u002FPERFORMANCE.md#performance-tuning)\u003Cbr>* 👉 [Benchmarks](https:\u002F\u002Fqsv.dathere.com\u002Fbenchmarks) 🚀\u003Cbr>* [Environment Variables](docs\u002FENVIRONMENT_VARIABLES.md)\u003Cbr>* [Feature Flags](#feature-flags)\u003Cbr>* [Goals\u002FNon-goals](#goals--non-goals)\u003Cbr>* [Testing](#testing)\u003Cbr>* [NYC&nbsp;SOD&nbsp;2022](https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002Fe\u002F2PACX-1vQ12ndZL--gkz0HLQRaxqsNOwzddkv1iUKB3sq661yA77OPlAsmHJHpjaqt9s9QEf73VqMfb0cv4jHU\u002Fpub?start=false&loop=false&delayms=3000)\u002F[csv,conf,v8](https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002F10T_3MyIqS5UsKxJaOY7Ktrd-GfhJelQImlE_qYmtuis\u002Fedit#slide=id.g2e0f1e7aa0e_0_62)\u002F[PyConUS&nbsp;2025](https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002Fe\u002F2PACX-1vRKFnU0Hm8oDrtCYbxcf96kHVsPcoLU05jPVNYaAs09D05gPMWDJ96q_4_zgUvadGro4deohisy-XtY\u002Fpub?start=false&loop=false&delayms=3000)\u002F\u003Cbr>&nbsp;&nbsp;&nbsp;[csv,conf,v9](https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002F1j-S0q5gqR8agsqIPBVXabGEntMlc4FDTwb4r-v8-9tA\u002Fedit?usp=sharing)\u002F[NYC&nbsp;SOD&nbsp;2026](https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002Fe\u002F2PACX-1vTobPFucA1QO6u8dF3CyHjOctoom1DBmgQF558I_gx5e8cWPr0HLvJISvoaZyCMwLZZdDHlhK2cil0o\u002Fpub?start=false&loop=false&delayms=5000)\u003Cbr>* **_\"Have we achieved ACI?\"_** series - [1](https:\u002F\u002Fdathere.com\u002F2026\u002F01\u002Fthe-peoples-api-is-finally-here\u002F) • [2](https:\u002F\u002Fdathere.github.io\u002FNYC-Snow-Analysis-2010-2026-Claude-Cowork\u002F)  • [3](https:\u002F\u002Fdathere.github.io\u002Fpeoples-api-demos\u002FNYC-Housing-Policy-SOD2026\u002F) \u003Cbr>* [Sponsor](#sponsor)\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n\n## Try it out at 
[qsv.dathere.com](https:\u002F\u002Fqsv.dathere.com)! \u003C!-- markdownlint-disable-line -->\n\n\u003C\u002Fdiv>\n\n| \u003Ca name=\"available-commands\">Command | Description |\n| --- | --- |\n| [apply](docs\u002Fhelp\u002Fapply.md)✨\u003Cbr>📇🚀🧠🤖🔣👆| Apply series of string, date, math & currency transformations to given CSV column\u002Fs. It also has some basic [NLP](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FNatural_language_processing) functions ([similarity](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fstrsim), [sentiment analysis](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fvader_sentiment), [profanity](https:\u002F\u002Fdocs.rs\u002Fcensor\u002Flatest\u002Fcensor\u002F), [eudex](https:\u002F\u002Fgithub.com\u002Fticki\u002Feudex#eudex-a-blazingly-fast-phonetic-reductionhashing-algorithm), [language](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fwhatlang) & [name gender](https:\u002F\u002Fgithub.com\u002FRaduc4\u002Fgender_guesser?tab=readme-ov-file#gender-guesser)) detection.  |\n| [applydp](docs\u002Fhelp\u002Fapplydp.md)✨\u003Cbr>📇🚀🔣👆 ![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png)| \u003Ca name=\"applydp_deeplink\">\u003C\u002Fa>applydp is a slimmed-down version of `apply` with only [Datapusher+](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fdatapusher-plus) relevant subcommands\u002Foperations (`qsvdp` binary variant only). |\n| [behead](docs\u002Fhelp\u002Fbehead.md) | Drop headers from a CSV. |\n| [blake3](docs\u002Fhelp\u002Fblake3.md)\u003Cbr>🚀 | Compute or check [BLAKE3](https:\u002F\u002Fgithub.com\u002FBLAKE3-team\u002FBLAKE3\u002F?tab=readme-ov-file#blake3) hashes of files. |\n| [cat](docs\u002Fhelp\u002Fcat.md)\u003Cbr>🗄️ | Concatenate CSV files by row or by column. |\n| [clipboard](docs\u002Fhelp\u002Fclipboard.md)✨\u003Cbr>🖥️ | Provide input from the clipboard or save output to the clipboard. 
|\n| [color](docs\u002Fhelp\u002Fcolor.md)✨\u003Cbr>🐻‍❄️🖥️ | Outputs tabular data as a pretty, colorized table that always fits into the terminal. Apart from CSV and its dialects, Arrow, Avro\u002FIPC, Parquet, JSON array & JSONL formats are supported with the \"polars\" feature. |\n| [count](docs\u002Fhelp\u002Fcount.md)\u003Cbr>📇🏎️🐻‍❄️ | Count the rows and optionally compile record width statistics of a CSV file. (11.87 seconds for a 15gb, 28m row NYC 311 dataset without an index. Instantaneous with an index.) If the `polars` feature is enabled, uses Polars' multithreaded, mem-mapped CSV reader for fast counts even without an index |\n| [datefmt](docs\u002Fhelp\u002Fdatefmt.md)\u003Cbr>📇🚀👆 | Formats recognized date fields ([19 formats recognized](https:\u002F\u002Fdocs.rs\u002Fqsv-dateparser\u002Flatest\u002Fqsv_dateparser\u002F#accepted-date-formats)) to a specified date format using [strftime date format specifiers](https:\u002F\u002Fdocs.rs\u002Fchrono\u002Flatest\u002Fchrono\u002Fformat\u002Fstrftime\u002F). |\n| [dedup](docs\u002Fhelp\u002Fdedup.md)\u003Cbr>🤯🚀👆 | Remove duplicate rows (See also `extdedup`, `extsort`, `sort` & `sortcheck` commands). |\n| [describegpt](docs\u002Fhelp\u002Fdescribegpt.md)\u003Cbr>🌐🤖🪄🗃️📚⛩️ ![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png) | \u003Ca name=\"describegpt_deeplink\">\u003C\u002Fa>Infer a [\"neuro-symbolic\"](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FNeuro-symbolic_AI) Data Dictionary, Description & Tags or ask questions about a CSV with a [configurable, Mini Jinja prompt file](resources\u002Fdescribegpt_defaults.toml), using any [OpenAI API](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fintroduction)-compatible LLM, including local LLMs via [Ollama](https:\u002F\u002Follama.com), [Jan](https:\u002F\u002Fjan.ai) & [LM Studio](https:\u002F\u002Flmstudio.ai\u002F).\u003Cbr>(e.g. 
[Markdown](docs\u002Fdescribegpt\u002Fnyc311-describegpt.md), [JSON](docs\u002Fdescribegpt\u002Fnyc311-describegpt.json), [TOON](docs\u002Fdescribegpt\u002Fnyc311-describegpt.toon), [Everything](docs\u002Fdescribegpt\u002Fnyc311-describegpt-everything.md), [Spanish](docs\u002Fdescribegpt\u002Fnyc311-describegpt-spanish.md), [Mandarin](docs\u002Fdescribegpt\u002Fnyc311-describegpt-mandarin.md), [Controlled Tags](docs\u002Fdescribegpt\u002Fnyc311-describegpt-tagvocab.md);\u003Cbr>[--prompt \"What are the top 10 complaint types by community board & borough by year?\"](docs\u002Fdescribegpt\u002Fnyc311-describegpt-prompt.md) - [deterministic, hallucination-free SQL RAG result](docs\u002Fdescribegpt\u002Fnyc311-describegpt-prompt.csv); [iterative, session-based SQL RAG refinement](docs\u002Fdescribegpt\u002Fallegheny_discussion3.md) - [refined SQL RAG result](docs\u002Fdescribegpt\u002Fmostexpensive6.csv)) |\n| [diff](docs\u002Fhelp\u002Fdiff.md)\u003Cbr>🚀 | Find the difference between two CSVs with ludicrous speed!\u003Cbr\u002F>e.g. _compare two CSVs with 1M rows x 9 columns in under 600ms!_ |\n| [edit](docs\u002Fhelp\u002Fedit.md) | Replace the value of a cell specified by its row and column. |\n| [enum](docs\u002Fhelp\u002Fenum.md)\u003Cbr>👆 | Add a new column enumerating rows by adding a column of incremental or uuid identifiers. Can also be used to copy a column or fill a new column with a constant value.  |\n| [excel](docs\u002Fhelp\u002Fexcel.md)\u003Cbr>🚀 | Exports a specified Excel\u002FODS sheet to a CSV file. |\n| [exclude](docs\u002Fhelp\u002Fexclude.md)\u003Cbr>📇👆 | Removes a set of CSV data from another set based on the specified columns.  |\n| [explode](docs\u002Fhelp\u002Fexplode.md)\u003Cbr>🔣👆 | Explode rows into multiple ones by splitting a column value based on the given separator.  
|\n| [extdedup](docs\u002Fhelp\u002Fextdedup.md)\u003Cbr>👆 | Remove duplicate rows from an arbitrarily large CSV\u002Ftext file using a memory-mapped, [on-disk hash table](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fodht). Unlike the `dedup` command, this command does not load the entire file into memory nor does it sort the deduped file. |\n| [extsort](docs\u002Fhelp\u002Fextsort.md)\u003Cbr>🚀📇👆 | Sort an arbitrarily large CSV\u002Ftext file using a multithreaded [external merge sort](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FExternal_sorting) algorithm. |\n| [fetch](docs\u002Fhelp\u002Ffetch.md)✨\u003Cbr>📇🧠🌐 | Send\u002FFetch data to\u002Ffrom web services for every row using **HTTP Get**. Comes with [HTTP\u002F2](https:\u002F\u002Fhttp2-explained.haxx.se\u002Fen\u002Fpart1) [adaptive flow control](https:\u002F\u002Fmedium.com\u002Fcoderscorner\u002Fhttp-2-flow-control-77e54f7fd518), [jaq](https:\u002F\u002Fgithub.com\u002F01mf02\u002Fjaq?tab=readme-ov-file#jaq) JSON query language support, dynamic throttling ([RateLimit](https:\u002F\u002Fwww.ietf.org\u002Farchive\u002Fid\u002Fdraft-ietf-httpapi-ratelimit-headers-06.html)) & caching with available persistent caching using [Redis](https:\u002F\u002Fredis.io\u002F) or a disk-cache. |\n| [fetchpost](docs\u002Fhelp\u002Ffetchpost.md)✨\u003Cbr>📇🧠🌐⛩️ | Similar to `fetch`, but uses **HTTP Post** ([HTTP GET vs POST methods](https:\u002F\u002Fwww.geeksforgeeks.org\u002Fdifference-between-http-get-and-post-methods\u002F)). Supports HTML form (application\u002Fx-www-form-urlencoded), JSON (application\u002Fjson) and custom content types - with the ability to render payloads using CSV data using the [Mini Jinja](https:\u002F\u002Fdocs.rs\u002Fminijinja\u002Flatest\u002Fminijinja\u002F) template engine. |\n| [fill](docs\u002Fhelp\u002Ffill.md)\u003Cbr>👆 | Fill empty values.  |\n| [fixlengths](docs\u002Fhelp\u002Ffixlengths.md) | Force a CSV to have same-length records by either padding or truncating them. 
|\n| [flatten](docs\u002Fhelp\u002Fflatten.md) | A flattened view of CSV records. Useful for viewing one record at a time.\u003Cbr \u002F>e.g. `qsv slice -i 5 data.csv \\| qsv flatten`. |\n| [fmt](docs\u002Fhelp\u002Ffmt.md) | Reformat a CSV with different delimiters, record terminators or quoting rules. (Supports ASCII delimited data.)  |\n| [foreach](docs\u002Fhelp\u002Fforeach.md)✨ | Execute a shell command once per record in a given CSV file. |\n| [frequency](docs\u002Fhelp\u002Ffrequency.md)\u003Cbr>📇😣🏎️👆🪄![Luau](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_afb2a208d532.png) | Build [frequency distribution tables](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FFrequency_(statistics)) of each column. Uses multithreading to go faster if an index is present (Examples: [CSV](scripts\u002Fnyc311-1m.freqs.csv) [JSON](scripts\u002Fnyc311-1m.freqs.json) [TOON](scripts\u002Fnyc311-1m.freqs.toon)). |\n| [geocode](docs\u002Fhelp\u002Fgeocode.md)✨\u003Cbr>📇🧠🌐🚀🔣👆🌎 | Geocodes a location against an updatable local copy of the [Geonames](https:\u002F\u002Fwww.geonames.org\u002F) cities & the [Maxmind GeoLite2](https:\u002F\u002Fwww.maxmind.com\u002Fen\u002Fgeolite-free-ip-geolocation-data) databases. With caching and multi-threading, it geocodes up to 360,000 records\u002Fsec! |\n| [geoconvert](docs\u002Fhelp\u002Fgeoconvert.md)✨\u003Cbr>🌎 | Convert between various spatial formats and CSV\u002FSVG including GeoJSON, SHP, and more. |\n| [headers](docs\u002Fhelp\u002Fheaders.md)\u003Cbr>🗄️ | Show the headers of a CSV. Or show the intersection of all headers between many CSV files. |\n| [index](docs\u002Fhelp\u002Findex.md) | Create an index (📇) for a CSV. This is very quick (even the 15gb, 28m row NYC 311 dataset takes all of 14 seconds to index) & provides constant time indexing\u002Frandom access into the CSV. 
With an index, `count`, `sample` & `slice` work instantaneously; random access mode is enabled in `luau`; and multithreading (🏎️) is enabled for the `frequency`, `split`, `stats` & `schema` commands. |\n| [input](docs\u002Fhelp\u002Finput.md) | Read CSV data with special commenting, quoting, trimming, line-skipping & non-UTF8 encoding handling rules. Typically used to \"normalize\" a CSV for further processing with other qsv commands. |\n| [join](docs\u002Fhelp\u002Fjoin.md)\u003Cbr>😣👆 | Inner, outer, right, cross, anti & semi joins. Automatically creates a simple, in-memory hash index to make it fast.  |\n| [joinp](docs\u002Fhelp\u002Fjoinp.md)✨\u003Cbr>🚀🐻‍❄️🪄 | Inner, outer, right, cross, anti, semi, non-equi & asof joins using the [Pola.rs](https:\u002F\u002Fwww.pola.rs) engine. Unlike the `join` command, `joinp` can process files larger than RAM, is multithreaded, has join key validation, a maintain row order option, pre and post-join filtering, join keys unicode normalization, supports \"special\" [non-equi joins](https:\u002F\u002Fdocs.pola.rs\u002Fuser-guide\u002Ftransformations\u002Fjoins\u002F#non-equi-joins) and [asof joins](https:\u002F\u002Fdocs.pola.rs\u002Fuser-guide\u002Ftransformations\u002Fjoins\u002F#asof-join) (which is [particularly useful for time series data](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002F30cc920d0812a854fcbfedc5db81788a0600c92b\u002Ftests\u002Ftest_joinp.rs#L509-L983)) & its output columns can be coalesced. |\n| [json](docs\u002Fhelp\u002Fjson.md)\u003Cbr>👆 | Convert JSON array to CSV.\n| [jsonl](docs\u002Fhelp\u002Fjsonl.md)\u003Cbr>🚀🔣 | Convert newline-delimited JSON ([JSONL](https:\u002F\u002Fjsonlines.org\u002F)\u002F[NDJSON](http:\u002F\u002Fndjson.org\u002F)) to CSV. 
See `tojsonl` command to convert CSV to JSONL.\n| [lens](docs\u002Fhelp\u002Flens.md)✨🗃️\u003Cbr>🐻‍❄️🖥️ | Interactively view, search & filter tabular data files using the [csvlens](https:\u002F\u002Fgithub.com\u002FYS-L\u002Fcsvlens#csvlens) engine. Apart from CSV and its dialects, Arrow, Avro\u002FIPC, Parquet, JSON array & JSONL formats are supported with the \"polars\" feature. |\n| [luau](docs\u002Fhelp\u002Fluau.md) ![Luau](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_afb2a208d532.png)✨\u003Cbr>📇🌐🔣📚 ![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png) | \u003Ca name=\"luau_deeplink\">\u003C\u002Fa>Create multiple new computed columns, filter rows, compute aggregations and build complex data pipelines by executing a [Luau](https:\u002F\u002Fluau-lang.org) [0.709](https:\u002F\u002Fgithub.com\u002FRoblox\u002Fluau\u002Freleases\u002Ftag\u002F0.709) expression\u002Fscript for every row of a CSV file ([sequential mode](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fbb72c4ef369d192d85d8b7cc6e972c1b7df77635\u002Ftests\u002Ftest_luau.rs#L254-L298)), or using [random access](https:\u002F\u002Fwww.webopedia.com\u002Fdefinitions\u002Frandom-access\u002F) with an index ([random access mode](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fbb72c4ef369d192d85d8b7cc6e972c1b7df77635\u002Ftests\u002Ftest_luau.rs#L367-L415)).\u003Cbr>Can process a single Luau expression or [full-fledged data-wrangling scripts using lookup tables](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv-lookup-tables#example) with discrete BEGIN, MAIN and END sections.\u003Cbr> It is not just another qsv command, it is qsv's [Domain-specific Language](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FDomain-specific_language) (DSL) with [numerous qsv-specific helper functions](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fsrc\u002Fcmd\u002Fluau.rs#L1473-L2755) 
to build production data pipelines. |\n| [moarstats](docs\u002Fhelp\u002Fmoarstats.md)\u003Cbr>📇🏎️ | Add dozens of additional statistics, including extended outlier, robust & bivariate statistics to an existing stats CSV file. ([example](docs\u002Fmoarstats\u002FNYC_311_SR_2010-2020-sample-1M.stats.csv)).|\n| [partition](docs\u002Fhelp\u002Fpartition.md)\u003Cbr>👆 | Partition a CSV based on a column value. |\n| [pivotp](docs\u002Fhelp\u002Fpivotp.md)✨\u003Cbr>🚀🐻‍❄️🪄 | Pivot CSV data. Features \"smart\" aggregation auto-selection based on data type & stats. |\n| [pragmastat](docs\u002Fhelp\u002Fpragmastat.md)\u003Cbr>📇🤯🪄 | Compute pragmatic statistics using the [Pragmastat](https:\u002F\u002Fpragmastat.dev\u002F) library. Uses the stats cache to auto-filter non-numeric columns and support Date\u002FDateTime columns. |\n| [pro](docs\u002Fhelp\u002Fpro.md) | Interact with the [qsv pro](https:\u002F\u002Fqsvpro.dathere.com) API. |\n| [prompt](docs\u002Fhelp\u002Fprompt.md)✨\u003Cbr>🐻‍❄️🖥️ | Open a file dialog to either pick a file as input or save output to a file. |\n| [pseudo](docs\u002Fhelp\u002Fpseudo.md)\u003Cbr>🔣👆 | [Pseudonymise](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FPseudonymization) the value of the given column by replacing them with an incremental identifier.  |\n| [py](docs\u002Fhelp\u002Fpy.md)✨\u003Cbr>📇🔣 | Create a new computed column or filter rows by evaluating a Python expression on every row of a CSV file. Python's [f-strings](https:\u002F\u002Fwww.freecodecamp.org\u002Fnews\u002Fpython-f-strings-tutorial-how-to-use-f-strings-for-string-formatting\u002F) is particularly useful for extended formatting, [with the ability to evaluate Python expressions as well](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002F4cd00dca88addf0d287247fa27d40563b6d46985\u002Fsrc\u002Fcmd\u002Fpython.rs#L23-L31). 
[Requires Python 3.10 or greater](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FINTERPRETERS.md#building-qsv-with-python-feature). |\n| [rename](docs\u002Fhelp\u002Frename.md) |  Rename the columns of a CSV efficiently. |\n| [replace](docs\u002Fhelp\u002Freplace.md)\u003Cbr>📇👆🏎️ | Replace CSV data using a regex. Applies the regex to each field individually. |\n| [reverse](docs\u002Fhelp\u002Freverse.md)\u003Cbr>📇🤯 | Reverse order of rows in a CSV. Unlike the `sort --reverse` command, it preserves the order of rows with the same key. If an index is present, it works with constant memory. Otherwise, it will load all the data into memory. |\n| [safenames](docs\u002Fhelp\u002Fsafenames.md)\u003Cbr>![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png) | \u003Ca name=\"safenames_deeplink\">\u003C\u002Fa>Modify headers of a CSV to only have [\"safe\" names](\u002Fsrc\u002Fcmd\u002Fsafenames.rs#L5-L14) - guaranteed \"database-ready\"\u002F\"CKAN-ready\" names.  |\n| [sample](docs\u002Fhelp\u002Fsample.md)\u003Cbr>📇🌐🏎️ | Randomly draw rows (with optional seed) from a CSV using seven different sampling methods - [reservoir](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FReservoir_sampling) (default), [indexed](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FRandom_access), [bernoulli](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FBernoulli_sampling), [systematic](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FSystematic_sampling), [stratified](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FStratified_sampling), [weighted](https:\u002F\u002Fdoi.org\u002F10.1016\u002Fj.ipl.2005.11.003) & [cluster sampling](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FCluster_sampling). Supports sampling from CSVs on remote URLs. 
|\n| [schema](docs\u002Fhelp\u002Fschema.md)\u003Cbr>📇😣🏎️👆🪄🐻‍❄️ | \u003Ca name=\"schema_deeplink\">\u003C\u002Fa>Infer either a [JSON Schema Validation Draft 2020-12](https:\u002F\u002Fjson-schema.org\u002Fdraft\u002F2020-12\u002Fjson-schema-validation) ([Example](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fresources\u002Ftest\u002F311_Service_Requests_from_2010_to_Present-2022-03-04.csv.schema.json)) or [Polars Schema](https:\u002F\u002Fdocs.pola.rs\u002Fuser-guide\u002Flazy\u002Fschemas\u002F) ([Example](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fresources\u002Ftest\u002FNYC_311_SR_2010-2020-sample-1M.pschema.json)) from CSV data.\u003Cbr>In JSON Schema Validation mode, it produces a `.schema.json` file replete with inferred data type & domain\u002Frange validation rules derived from [`stats`](#stats_deeplink). Uses multithreading to go faster if an index is present. See [`validate`](#validate_deeplink) command to use the generated JSON Schema to validate if similar CSVs comply with the schema.\u003Cbr>With the `--polars` option, it produces a `.pschema.json` file that all polars commands (`sqlp`, `joinp` & `pivotp`) use to determine the data type of each column & to optimize performance.\u003Cbr>Both schemas are editable and can be fine-tuned. For JSON Schema, to refine the inferred validation rules. For Polars Schema, to change the inferred Polars data types. |\n| [scoresql](docs\u002Fhelp\u002Fscoresql.md)✨\u003Cbr>🐻‍❄️🪄 | Analyze a SQL query against CSV file caches (stats, moarstats, frequency) to produce a performance score with actionable optimization suggestions BEFORE running the query. Supports Polars (default) and DuckDB modes. |\n| [search](docs\u002Fhelp\u002Fsearch.md)\u003Cbr>📇🏎️👆 | Run a regex over a CSV. Applies the regex to selected fields & shows only matching rows.  
|\n| [searchset](docs\u002Fhelp\u002Fsearchset.md)\u003Cbr>📇🏎️👆 | _Run multiple regexes over a CSV in a single pass._ Applies the regexes to each field individually & shows only matching rows.  |\n| [select](docs\u002Fhelp\u002Fselect.md)\u003Cbr>👆 | Select, re-order, reverse, duplicate or drop columns.  |\n| [slice](docs\u002Fhelp\u002Fslice.md)\u003Cbr>📇🏎️🗃️ | Slice rows from any part of a CSV. When an index is present, this only has to parse the rows in the slice (instead of all rows leading up to the start of the slice).  |\n| [snappy](docs\u002Fhelp\u002Fsnappy.md)\u003Cbr>🚀🌐 | \u003Ca name=\"snappy_deeplink\">\u003C\u002Fa>Does streaming compression\u002Fdecompression of the input using Google's [Snappy](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fsnappy\u002Fblob\u002Fmain\u002Fdocs\u002FREADME.md) framing format ([more info](#automatic-compressiondecompression)). |\n| [sniff](docs\u002Fhelp\u002Fsniff.md)\u003Cbr>📇🌐🤖 ![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png) | Quickly sniff & infer CSV metadata (delimiter, header row, preamble rows, quote character, flexible, is_utf8, average record length, number of records, content length & estimated number of records if sniffing a CSV on a URL, number of fields, field names & data types). It is also a general mime type detector. |\n| [sort](docs\u002Fhelp\u002Fsort.md)\u003Cbr>🚀🤯👆 | Sorts CSV data in [lexicographical](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FLexicographic_order), [natural](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FNatural_sort_order), numerical, reverse, unique or random (with optional seed) order (Also see `extsort` & `sortcheck` commands).  |\n| [sortcheck](docs\u002Fhelp\u002Fsortcheck.md)\u003Cbr>👆 | Check if a CSV is sorted. With the --json options, also retrieve record count, sort breaks & duplicate count. |\n| [split](docs\u002Fhelp\u002Fsplit.md)\u003Cbr>📇🏎️ | Split one CSV file into many CSV files. 
It can split by number of rows, number of chunks or file size. Uses multithreading to go faster if an index is present when splitting by rows or chunks. |\n| [sqlp](docs\u002Fhelp\u002Fsqlp.md)✨\u003Cbr>📇🚀🐻‍❄️🗄️🪄 | \u003Ca name=\"sqlp_deeplink\">\u003C\u002Fa>Run [Polars](https:\u002F\u002Fpola.rs) SQL (a PostgreSQL dialect) queries against several CSVs, Parquet, JSONL and Arrow files - converting queries to blazing-fast Polars [LazyFrame](https:\u002F\u002Fdocs.pola.rs\u002Fuser-guide\u002Flazy\u002F) expressions, processing larger than memory CSV files. Query results can be saved in CSV, JSON, JSONL, Parquet, Apache Arrow IPC and Apache Avro formats. |\n| [stats](docs\u002Fhelp\u002Fstats.md)\u003Cbr>📇🤯🏎️👆🪄 | \u003Ca name=\"stats_deeplink\">\u003C\u002Fa>Compute [summary statistics](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FSummary_statistics) (sum, min\u002Fmax\u002Frange, sort order\u002Fsortiness, min\u002Fmax\u002Fsum\u002Favg length, mean, standard error of the mean (SEM), geometric\u002Fharmonic means, stddev, variance, Coefficient of Variation (CV), nullcount, max precision, sparsity, quartiles, Interquartile Range (IQR), lower\u002Fupper fences, skewness, median, mode\u002Fs, antimode\u002Fs, cardinality & uniqueness ratio) & make GUARANTEED data type inferences (Null, String, Float, Integer, Date, DateTime, Boolean) for each column in a CSV ([Example](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fscripts\u002FNYC_311_SR_2010-2020-sample-1M.stats.csv) - [more info](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fwiki\u002FSupplemental#stats-command-output-explanation)).\u003Cbr>Uses multithreading to go faster if an index is present (with an index, can compile \"streaming\" stats on NYC's 311 data (15gb, 28m rows) in less than 7.3 seconds!). 
|\n| [table](docs\u002Fhelp\u002Ftable.md)\u003Cbr>🤯 | Align output of a CSV using [elastic tabstops](https:\u002F\u002Fgithub.com\u002FBurntSushi\u002Ftabwriter) for viewing; or to create an \"aligned TSV\" file or Fixed Width Format file. To interactively view a CSV, use the `lens` command. |\n| [template](docs\u002Fhelp\u002Ftemplate.md)\u003Cbr>📇🚀🔣📚⛩️![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png) | Renders a template using CSV data with the [Mini Jinja](https:\u002F\u002Fdocs.rs\u002Fminijinja\u002Flatest\u002Fminijinja\u002F) template engine ([Example](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002F4645ec07b5befe3b0c0e49bf0f547315d0d7514b\u002Fsrc\u002Fcmd\u002Ftemplate.rs#L18-L44)). |\n| [to](docs\u002Fhelp\u002Fto.md)✨\u003Cbr>🚀🐻‍❄️🗄️ | Convert CSV files to [Parquet](https:\u002F\u002Fparquet.apache.org), [PostgreSQL](https:\u002F\u002Fwww.postgresql.org), [SQLite](https:\u002F\u002Fwww.sqlite.org\u002Findex.html), Excel (XLSX), [LibreOffice Calc](https:\u002F\u002Fwww.libreoffice.org\u002Fdiscover\u002Fcalc\u002F) (ODS) and [Data Package](https:\u002F\u002Fdatahub.io\u002Fdocs\u002Fdata-packages\u002Ftabular). |\n| [tojsonl](docs\u002Fhelp\u002Ftojsonl.md)\u003Cbr>📇😣🚀🔣🪄🗃️ | Smartly converts CSV to a newline-delimited JSON ([JSONL](https:\u002F\u002Fjsonlines.org\u002F)\u002F[NDJSON](http:\u002F\u002Fndjson.org\u002F)). By scanning the CSV first, it \"smartly\" infers the appropriate JSON data type for each column. See `jsonl` command to convert JSONL to CSV. |\n| [transpose](docs\u002Fhelp\u002Ftranspose.md)\u003Cbr>🤯👆 | Transpose rows\u002Fcolumns of a CSV.  
|\n| [validate](docs\u002Fhelp\u002Fvalidate.md)\u003Cbr>📇🚀🌐📚🗄️![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png) | \u003Ca name=\"validate_deeplink\">\u003C\u002Fa>Validate CSV data [_blazingly-fast_](https:\u002F\u002Fgithub.com\u002FStranger6667\u002Fjsonschema-rs?tab=readme-ov-file#performance \"using jsonschema-rs - the fastest JSON Schema validator for Rust\") using [JSON Schema Validation (Draft 2020-12)](https:\u002F\u002Fjson-schema.org\u002Fdraft\u002F2020-12\u002Fjson-schema-validation.html) (e.g. _up to 780,031 rows\u002Fsecond_[^1] using [NYC's 311 schema](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fresources\u002Ftest\u002F311_Service_Requests_from_2010_to_Present-2022-03-04.csv.schema.json) generated by the [`schema`](#schema_deeplink) command) & put invalid records into a separate file along with a detailed validation error report.\u003Cbr>\u003Cbr>Supports several custom JSON Schema formats & keywords:\u003Cbr> * `currency` custom format with [ISO-4217](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FISO_4217) validation\u003Cbr> * `dynamicEnum` custom keyword that supports enum validation against a CSV on the filesystem or a URL (http\u002Fhttps\u002Fckan & dathere URL schemes supported)\u003Cbr>* `uniqueCombinedWith` custom keyword to validate uniqueness across multiple columns for composite key validation.\u003Cbr>\u003Cbr>If no JSON schema file is provided, validates if a CSV conforms to the [RFC 4180 standard](#rfc-4180-csv-standard) and is UTF-8 encoded. |\n\n\u003Cdiv style=\"text-align: right\">\u003Csub>\u003Csup>Performance metrics compiled on an M2 Pro 12-core Mac Mini with 32gb RAM\u003C\u002Fsup>\u003C\u002Fsub>\u003C\u002Fdiv>\n\n\u003Ca name=\"legend_deeplink\">✨\u003C\u002Fa>: enabled by a [feature flag](#feature-flags).  \n📇: uses an index when available.  
\n🤯: loads entire CSV into memory, though `dedup`, `stats` & `transpose` have \"streaming\" modes as well.  \n😣: uses additional memory proportional to the cardinality of the columns in the CSV.  \n🧠: expensive operations are memoized with available inter-session Redis\u002FDisk caching for fetch commands.  \n🗄️: [Extended input support](#extended-input-support).  \n🗃️: [Limited Extended input support](#limited-extended-input-support).  \n🐻‍❄️: command powered\u002Faccelerated by [![polars 0.53.0:py_1.39.3:880651f](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpolars-0.53.0:py_1.39.3_880651f-blue?logo=polars)](https:\u002F\u002Fgithub.com\u002Fpola-rs\u002Fpolars\u002Freleases\u002Ftag\u002Fpy-1.39.3) vectorized query engine.  \n🤖: command uses Natural Language Processing or Generative AI.  \n🏎️: multithreaded and\u002For faster when an index (📇) is available.  \n🚀: multithreaded even without an index.  \n![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png) : has [CKAN](https:\u002F\u002Fckan.org)-aware integration options.  \n🌐: has web-aware options.  \n🔣: requires UTF-8 encoded input.  \n👆: has powerful column selector support. See [`select`](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fsrc\u002Fcmd\u002Fselect.rs#L2) for syntax.  \n🪄: \"automagical\" commands that use stats and\u002For frequency tables to work \"smarter\" & \"faster\".  \n📚: has lookup table support, enabling runtime \"lookups\" against local or remote reference CSVs.  \n🌎: has geospatial capabilities.  \n⛩️: uses [Mini Jinja](https:\u002F\u002Fdocs.rs\u002Fminijinja\u002Flatest\u002Fminijinja\u002F) template engine.  
\n![Luau](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_afb2a208d532.png) : uses [Luau](https:\u002F\u002Fluau.org\u002F) [0.709](https:\u002F\u002Fgithub.com\u002FRoblox\u002Fluau\u002Freleases\u002Ftag\u002F0.709) as an embedded scripting [DSL](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FDomain-specific_language).  \n🖥️: part of the User Interface (UI) feature group\n\n[^1]: see [`validate_index` benchmark](https:\u002F\u002Fqsv.dathere.com\u002Fbenchmarks)\n\n## Installation Options\n\n> [!NOTE]\n> To install the qsv MCP Server and the optional Claude Cowork plugin, see the [Getting Started guide](.claude\u002Fskills\u002Fdocs\u002Fguides\u002FSTART_HERE.md).\n\n### Option 0: qsv pro\n\nIf you prefer to explore your data using a graphical interface instead of the command-line, feel free to try out **[qsv pro](https:\u002F\u002Fqsvpro.dathere.com)**. Leveraging qsv, qsv pro can help you quickly analyze spreadsheet data by just dropping a file, along with many other interactive features. 
Learn more at [qsvpro.dathere.com](https:\u002F\u002Fqsvpro.dathere.com) or download qsv pro directly by clicking one of the badges below.\n\n\u003Cdiv style=\"display: flex; gap: 1rem;\">\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fqsv-pro\u002Fwindows\">\u003Cimg alt=\"qsv pro Windows download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_a55368354c85.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fqsv-pro\u002Fmacos\">\u003Cimg alt=\"qsv pro macOS download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_a2fb3ac014a3.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fqsv-pro\u002Flinux-deb\">\u003Cimg alt=\"qsv pro Linux (deb) download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_658e05f19c09.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fqsv-pro\u002Flinux-rpm\">\u003Cimg alt=\"qsv pro Linux (rpm) download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_c72909f063a7.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fqsv-pro\u002Flinux-appimage\">\u003Cimg alt=\"qsv pro Linux (AppImage) download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_2ce62267c43b.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n### Option 1: Download Prebuilt Binaries\n\nFull-featured prebuilt [binary variants](#variants) of the latest qsv version for Linux, macOS & Windows are available [for download](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Freleases\u002Flatest), including binaries compiled with [Rust 
Nightly](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F70745970\u002Frust-nightly-vs-beta-version) ([more info](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FPERFORMANCE.md#nightly-release-builds)). You may click a badge below based on your platform to download a ZIP with pre-built binaries.\n\n\u003Cdiv style=\"display: flex; gap: 1rem;\">\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Flinux-x86_64-gnu\">\u003Cimg alt=\"qsv Linux x86_64 GNU download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_07209b955c55.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Flinux-aarch64-gnu\">\u003Cimg alt=\"qsv Linux AArch64 GNU download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_3d0fad2cd6de.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Flinux-x86_64-musl\">\u003Cimg alt=\"qsv Linux x86_64 MUSL download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_81b4e2adf40b.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Flinux-powerpc64le-gnu\">\u003Cimg alt=\"qsv linux-powerpc64le-gnu download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_97dc5b7fa8df.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fmacos-silicon\">\u003Cimg alt=\"qsv macOS download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_0378e21668eb.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fwindows-msvc\">\u003Cimg alt=\"qsv Windows MSVC 
download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_cbb9cecaf56a.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fwindows-aarch64-msvc\">\u003Cimg alt=\"qsv Windows AArch64 MSVC download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_f0bd0b7458f5.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fwindows-gnu\">\u003Cimg alt=\"qsv Windows GNU download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_19779b920b3a.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\nPrebuilt binaries for Apple Silicon, Windows for ARM, [IBM Power Servers (PowerPC64 LE Linux)](https:\u002F\u002Fwww.ibm.com\u002Fproducts\u002Fpower) and [IBM Z mainframes (s390x)](https:\u002F\u002Fwww.ibm.com\u002Fproducts\u002Fz) have CPU optimizations enabled ([`target-cpu=native`](https:\u002F\u002Frust-lang.github.io\u002Fpacked_simd\u002Fperf-guide\u002Ftarget-feature\u002Frustflags.html#target-cpu)) for even more performance gains.\n\nWe do not enable CPU optimizations on prebuilt binaries on x86_64 platforms as there are too many CPU variants which often lead to Illegal Instruction (SIGILL) faults. If you still get SIGILL faults, \"portable\" binaries (all CPU optimizations disabled) are also included in the release zip archives (qsv with a \"p for portable\" suffix - e.g. `qsvp`, `qsvplite` `qsvpdp`).\n\nFor Windows, an MSI \"Easy installer\" for the x86_64 MSVC `qsvp` binary is also available. After downloading and installing the Easy installer, launch the Easy installer and click \"Install qsv\" to download the latest `qsvp` pre-built binary to a folder that is added to your `PATH`. 
Afterwards qsv should be installed and you may launch a new terminal to use qsv.\n\n\u003Ca download href=\"https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv-easy-windows-installer\u002Freleases\u002Fdownload\u002Fv1.1.1\u002Fqsv-easy-installer_1.1.1_x64_en-US.msi\">\u003Cimg alt=\"qsv Windows Easy Installer download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_8ceb32c89df4.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\nFor macOS, [\"ad-hoc\" signatures](https:\u002F\u002Fusers.rust-lang.org\u002Ft\u002Fdistributing-cli-apps-on-macos\u002F70223) are used to sign our binaries, so you will need to [set appropriate Gatekeeper security settings](https:\u002F\u002Fsupport.apple.com\u002Fen-us\u002FHT202491) or run the following command to remove the quarantine attribute from qsv before you run it for the first time:\n\n```bash\n# replace qsv with qsvmcp, qsvlite or qsvdp if you installed those binary variants\nxattr -d com.apple.quarantine qsv\n```\n\nAn additional benefit of using the prebuilt binaries is that they have the `self_update` feature enabled, allowing you to quickly update qsv to the latest version with a simple `qsv --update`. For further security, the `self_update` feature only fetches [releases from this GitHub repo](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Freleases) and automatically verifies the signature of the downloaded zip archive before installing the update.\n\n> [!NOTE]\n> The `luau` feature is not available in `musl` prebuilt binaries[^3].\n\n#### Manually verifying the Integrity of the Prebuilt Binaries Zip Archives\nAll prebuilt binaries zip archives are signed with [zipsign](https:\u002F\u002Fgithub.com\u002FKijewski\u002Fzipsign#zipsign) with the following public key [qsv-zipsign-public.key](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fraw\u002Fmaster\u002Fsrc\u002Fqsv-zipsign-public.key). 
To verify the integrity of the downloaded zip archives:\n\n```bash\n# if you don't have zipsign installed yet\ncargo install zipsign\n\n# verify the integrity of the downloaded prebuilt binary zip archive\n# after downloading the zip archive and the qsv-zipsign-public.key file.\n# replace \u003CPREBUILT-BINARY-ARCHIVE.zip> with the name of the downloaded zip archive\n# e.g. zipsign verify zip qsv-0.118.0-aarch64-apple-darwin.zip qsv-zipsign-public.key\nzipsign verify zip \u003CPREBUILT-BINARY-ARCHIVE.zip> qsv-zipsign-public.key\n```\n\n### Option 2: Package Managers & Distributions\n\nqsv is also distributed by several package managers and distros.\n\n[![Packaging status](https:\u002F\u002Frepology.org\u002Fbadge\u002Fvertical-allrepos\u002Fqsv.svg)](https:\u002F\u002Frepology.org\u002Fproject\u002Fqsv\u002Fversions)\n\nHere are the relevant commands for installing qsv using the various package managers and distros:\n```bash\n# Arch Linux AUR (https:\u002F\u002Faur.archlinux.org\u002Fpackages\u002Fqsv)\nyay -S qsv\n\n# Homebrew on macOS\u002FLinux (https:\u002F\u002Fformulae.brew.sh\u002Fformula\u002Fqsv#default)\nbrew install qsv\n\n# MacPorts on macOS (https:\u002F\u002Fports.macports.org\u002Fport\u002Fqsv\u002F)\nsudo port install qsv\n\n# Mise on Linux\u002FmacOS\u002FWindows (https:\u002F\u002Fmise.jdx.dev)\nmise use -g qsv@latest\n\n# Nixpkgs on Linux\u002FmacOS (https:\u002F\u002Fsearch.nixos.org\u002Fpackages?channel=unstable&show=qsv&from=0&size=50&sort=relevance&type=packages&query=qsv)\nnix-shell -p qsv\n\n# Scoop on Windows (https:\u002F\u002Fscoop.sh\u002F#\u002Fapps?q=qsv)\nscoop install qsv\n\n# Void Linux (https:\u002F\u002Fvoidlinux.org\u002Fpackages\u002F?arch=x86_64&q=qsv)\nsudo xbps-install qsv\n\n# Conda-forge (https:\u002F\u002Fanaconda.org\u002Fconda-forge\u002Fqsv)\nconda install conda-forge::qsv\n```\n\nNote that the qsv builds provided by these package managers\u002Fdistros enable different features (Homebrew, for instance, enables the `apply`, 
`fetch`, `foreach`, `geocode`, `lens`, `luau` and `to` features. However, it does automatically install shell completion for `bash`, `fish` and `zsh` shells).\n\nTo find out what features are enabled in a package\u002Fdistro's qsv, run `qsv --version` ([more info](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FPERFORMANCE.md#version-details)).\n\nIn the true spirit of open source, these packages are maintained by volunteers who wanted to make qsv easier to install in various environments. They are much appreciated, and we loosely collaborate with the package maintainers through GitHub, but know that these packages are maintained by third-parties.\n\n#### Debian package\ndatHere also maintains a Debian package targeting the latest Ubuntu LTS on x86_64 architecture to make it easier to install qsv with DataPusher+.\n\nTo install qsv on Ubuntu\u002FDebian:\n\n```bash\nwget -O - https:\u002F\u002Fdathere.github.io\u002Fqsv-deb-releases\u002Fqsv-deb.gpg | sudo gpg --dearmor -o \u002Fusr\u002Fshare\u002Fkeyrings\u002Fqsv-deb.gpg\necho \"deb [signed-by=\u002Fusr\u002Fshare\u002Fkeyrings\u002Fqsv-deb.gpg] https:\u002F\u002Fdathere.github.io\u002Fqsv-deb-releases .\u002F\" | sudo tee \u002Fetc\u002Fapt\u002Fsources.list.d\u002Fqsv.list\nsudo apt update\nsudo apt install qsv\n```\n\n### Option 3: Compile from Source\n\nIf you have [Rust installed](https:\u002F\u002Fwww.rust-lang.org\u002Ftools\u002Finstall), you can compile from source[^2]:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv.git\ncd qsv\ncargo build --release --locked --bin qsv --features all_features\n```\n\nThe compiled binary will end up in `.\u002Ftarget\u002Frelease\u002F`.\n\nTo compile different [variants](#variants) and enable optional [features](#feature-flags):\n\n```bash\n# to compile qsv with all features enabled\ncargo build --release --locked --bin qsv --features 
feature_capable,apply,fetch,foreach,geocode,luau,mcp,magika,polars,python,self_update,to,ui\n# shorthand\ncargo build --release --locked --bin qsv -F all_features\n# enable all CPU optimizations for the current CPU (warning: creates non-portable binary)\nCARGO_BUILD_RUSTFLAGS='-C target-cpu=native' cargo build --release --locked --bin qsv -F all_features\n\n# or build qsv with only the fetch and foreach features enabled\ncargo build --release --locked --bin qsv -F feature_capable,fetch,foreach\n\n# for qsvmcp - MCP server optimized variant\ncargo build --release --locked --bin qsvmcp -F qsvmcp\n\n# for qsvlite\ncargo build --release --locked --bin qsvlite -F lite\n\n# for qsvdp\ncargo build --release --locked --bin qsvdp -F datapusher_plus\n```\n\n[^2]: Of course, you'll also need a linker & a C compiler. Linux users should generally install GCC or Clang, according to their distribution’s documentation.\nFor example, if you use Ubuntu, you can install the `build-essential` package. On macOS, you can get a C compiler by running `$ xcode-select --install`.\nFor Windows, this means installing [Visual Studio 2022](https:\u002F\u002Fvisualstudio.microsoft.com\u002Fdownloads\u002F). When prompted for workloads, include \"Desktop Development with C++\",\nthe Windows 10 or 11 SDK & the English language pack, along with any other language packs you require.\n\n> [!NOTE]\n> To build with Rust nightly, see [Nightly Release Builds](docs\u002FPERFORMANCE.md#nightly-release-builds).\nThe `feature_capable`, `qsvmcp`, `lite` and `datapusher_plus` features are MUTUALLY EXCLUSIVE. 
See [Special Build Features](docs\u002FFEATURES.md#special-features-for-building-qsv-binary-variants) for more info.\n\n### Variants\n\nThere are five binary variants of qsv:\n\n* `qsv` - [feature](#feature-flags)-capable(✨), with the [prebuilt binaries](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Freleases\u002Flatest) enabling all applicable features except Python [^3]\n* `qsvpy` - same as `qsv` but with the Python feature enabled. Three subvariants are available - qsvpy311, qsvpy312 & qsvpy313 - which are compiled with the latest patch version of Python 3.11, 3.12 & 3.13 respectively. We need to have a binary for each Python version as Python is dynamically linked ([more info](docs\u002FINTERPRETERS.md#building-qsv-with-python-feature)).\n* `qsvmcp` - optimized for [MCP (Model Context Protocol)](https:\u002F\u002Fmodelcontextprotocol.io\u002F) server use with geocode, luau, mcp, polars, self_update, and to features enabled. Shares `src\u002Fmain.rs` with `qsv`.\n* `qsvlite` - all features disabled (~16% of the size of `qsv`). If you are migrating from [xsv](https:\u002F\u002Fgithub.com\u002FBurntSushi\u002Fxsv) and want the same experience and feature set, this is the variant for you.\n* `qsvdp` - optimized for use with [DataPusher+](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fdatapusher-plus) with only DataPusher+ relevant commands; an embedded [`luau`](#luau_deeplink) interpreter; [`applydp`](#applydp_deeplink), a slimmed-down version of the `apply` feature; the `--progressbar` option disabled; and the self-update only checking for new releases, requiring an explicit `--update` (~16% of the size of `qsv`).\n\n> [!NOTE]\n> There are \"portable\" subvariants of qsv available with the \"p\" suffix - `qsvp`, `qsvplite` and `qsvpdp`. These subvariants are compiled without any CPU features enabled. 
Use these subvariants if you have an old CPU architecture or are getting \"Illegal instruction (SIGILL)\" errors when running the regular qsv binaries.\n\n[^3]: The `luau` feature is NOT enabled by default on the prebuilt binaries for musl platforms. This is because we cross-compile on GitHub Action Runners running Ubuntu 20.04 LTS with the [musl libc](https:\u002F\u002Fmusl.libc.org\u002F) toolchain. However, Ubuntu is a glibc-based, not a musl-based distro. We get around this by [cross-compiling](https:\u002F\u002Fblog.logrocket.com\u002Fguide-cross-compilation-rust\u002F).   \nUnfortunately, this prevents us from cross-compiling binaries with the `luau` feature enabled as doing so requires statically linking the host OS libc library. If you need the `luau` feature on `musl`, you will need to compile from source on your own musl-based Linux distro (e.g. Alpine, Void, [etc.](https:\u002F\u002Fwiki.musl-libc.org\u002Fprojects-using-musl)).  \n\n### Shell Completion\nqsv has extensive, extendable [shell completion](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FCommand-line_completion) support. It currently supports the following shells: `bash`, `zsh`, `powershell`, `fish`, `nushell`, `fig` & `elvish`. 
You may download a shell completions script for your shell by clicking one of the badges below:\n\n\u003Cdiv style=\"display: flex; gap: 1rem;\">\n\u003Ca download href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fbash-shell\">\u003Cimg alt=\"qsv Bash shell completions download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_256e6e9cdb03.png\" width=\"140\" \u002F>\u003C\u002Fa>\n\u003Ca download target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fpowershell-shell\">\u003Cimg alt=\"qsv PowerShell shell completions download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_4c7b42a04c4b.png\" width=\"140\" \u002F>\u003C\u002Fa>\n\u003Ca download target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fzsh-shell\">\u003Cimg alt=\"qsv zsh shell completions download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_801969651086.png\" width=\"140\" \u002F>\u003C\u002Fa>\n\u003Ca download target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Ffish-shell\">\u003Cimg alt=\"qsv fish shell completions download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_d682f6a3ee9d.png\" width=\"140\" \u002F>\u003C\u002Fa>\n\u003Ca download target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fnushell-shell\">\u003Cimg alt=\"qsv nushell shell completions download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_aa925e019e7e.png\" width=\"140\" \u002F>\u003C\u002Fa>\n\u003Ca download target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Ffig-shell\" width=\"160\" \u002F>\u003Cimg alt=\"qsv fig shell completions download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_0fdeba390ceb.png\" width=\"140\" \u002F>\u003C\u002Fa>\n\u003Ca download 
target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Felvish-shell\">\u003Cimg alt=\"qsv elvish shell completions download badge\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_d290e8042641.png\" width=\"140\" \u002F>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\nTo customize shell completions, see the [Shell Completion](contrib\u002Fcompletions\u002FREADME.md) documentation. If you're using Bash, you can also follow the step-by-step tutorial at [100.dathere.com](https:\u002F\u002F100.dathere.com\u002Fexercises-setup.html#optional-set-up-qsv-completions) to learn how to enable the Bash shell completions.\n\n## Regular Expression Syntax\n\nThe `--select` option and several commands (`apply`, `applydp`, `datefmt`, `exclude`, `fetchpost`, `replace`, `schema`, `search`, `searchset`, `select`, `sqlp`, `stats` & `validate`) allow the user to specify regular expressions. We use the [`regex`](https:\u002F\u002Fdocs.rs\u002Fregex) crate to parse, compile and execute these expressions. [^4]\n\n[^4]: This is the same regex engine used by [`ripgrep`](https:\u002F\u002Fgithub.com\u002FBurntSushi\u002Fripgrep#ripgrep-rg) - the [blazingly fast grep replacement](https:\u002F\u002Fblog.burntsushi.net\u002Fripgrep\u002F) that powers Visual Studio's [magical](https:\u002F\u002Flab.cccb.org\u002Fen\u002Farthur-c-clarke-any-sufficiently-advanced-technology-is-indistinguishable-from-magic\u002F) [\"Find in Files\"](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fvscode-ripgrep) feature.\n\nIts syntax can be found [here](https:\u002F\u002Fdocs.rs\u002Fregex\u002Flatest\u002Fregex\u002F#syntax) and *\"is similar to other regex engines, but it lacks several features that are not known how to implement efficiently. This includes, but is not limited to, look-around and backreferences. 
In exchange, all regex searches in this crate have worst case O(m * n) time complexity, where m is proportional to the size of the regex and n is proportional to the size of the string being searched.\"*\n\nIf you want to test your regular expressions, [regex101](https:\u002F\u002Fregex101.com) supports the syntax used by the `regex` crate. Just select the \"Rust\" flavor.\n\n> JSON SCHEMA VALIDATION REGEX NOTE: The `schema` command, when inferring a JSON Schema Validation file, will derive a regex expression for the selected columns when the `--pattern-columns` option is used. Though the derived regex is guaranteed to work, it may not be the most efficient.\u003Cbr\u002F>Before using the generated JSON Schema file in production with the `validate` command, it is recommended that users inspect and optimize the derived regex as required.\u003Cbr\u002F>While doing so, note that the `validate` command in JSON Schema Validation mode can also support \"fancy\" regex expressions with look-around and backreferences using the `--fancy-regex` option.\n\n## File formats\n\nqsv recognizes UTF-8\u002FASCII encoded CSV (`.csv`), SSV (`.ssv`) and TSV files (`.tsv` & `.tab`). CSV files are assumed to have \",\" (comma) as a delimiter, SSV files have \";\" (semicolon) as a delimiter\nand TSV files, \"\\t\" (tab) as a delimiter. The delimiter is a single ASCII character that can be set with the `--delimiter` command-line option or\nthe `QSV_DEFAULT_DELIMITER` environment variable, or automatically detected when `QSV_SNIFF_DELIMITER` is set.\n\nWhen using the `--output` option, qsv will UTF-8 encode the file & automatically change the delimiter used in the generated file based on the file extension - i.e. 
comma for `.csv`, semicolon for `.ssv`, tab for `.tsv` & `.tab` files.\n\nJSON files are recognized & converted to CSV with the [`json`](\u002Fsrc\u002Fcmd\u002Fjson.rs#L2) command.\n[JSONL](https:\u002F\u002Fjsonlines.org\u002F)\u002F[NDJSON](http:\u002F\u002Fndjson.org\u002F) files are also recognized & converted to\u002Ffrom CSV with the [`jsonl`](\u002Fsrc\u002Fcmd\u002Fjsonl.rs#L2) and [`tojsonl`](\u002Fsrc\u002Fcmd\u002Ftojsonl.rs#L2) commands respectively.\n\nThe `fetch` & `fetchpost` commands also produce JSONL files when invoked without the `--new-column` option & TSV files with the `--report` option.\n\nThe `excel`, `safenames`, `sniff`, `sortcheck` & `validate` commands produce JSON files with their JSON options following the [JSON API 1.1 specification](https:\u002F\u002Fjsonapi.org\u002Fformat\u002F), so they can return detailed machine-friendly metadata that can be used by other systems.\n\nThe `schema` command produces a [JSON Schema Validation (Draft 2020-12)](https:\u002F\u002Fjson-schema.org\u002Fdraft\u002F2020-12\u002Fjson-schema-validation.html) file with the \".schema.json\" file extension, which can be used with the `validate` command to validate other CSV files with an identical schema.\n\nThe `describegpt` and `frequency` commands also both produce [TOON](https:\u002F\u002Ftoonformat.dev) files. TOON is a compact, human-readable encoding of the JSON data model for LLM prompts.\n\nThe `excel` command recognizes Excel & Open Document Spreadsheet (ODS) files (`.xls`, `.xlsx`, `.xlsm`, `.xlsb` & `.ods` files).\n\nSpeaking of Excel, if you're having trouble opening qsv-generated CSV files in Excel, set the QSV_OUTPUT_BOM environment variable to add a [Byte Order Mark](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FByte_order_mark) to the beginning of the generated CSV file. 
This is a workaround for [Excel's UTF-8 encoding detection bug](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F155097\u002Fmicrosoft-excel-mangles-diacritics-in-csv-files).\n\nThe `to` command converts CSVs to Parquet, Excel `.xlsx`, LibreOffice\u002FOpenOffice Calc `.ods` & [Data Package](https:\u002F\u002Fdatahub.io\u002Fdocs\u002Fdata-packages\u002Ftabular) formats, and populates [PostgreSQL](https:\u002F\u002Fwww.postgresql.org) and [SQLite](https:\u002F\u002Fwww.sqlite.org\u002Findex.html) databases.\n\nThe `sqlp` command returns query results in CSV, JSON, JSONL, [Parquet](https:\u002F\u002Fparquet.apache.org), [Apache Arrow IPC](https:\u002F\u002Farrow.apache.org\u002Fdocs\u002Fformat\u002FColumnar.html#ipc-file-format) & [Apache AVRO](https:\u002F\u002Favro.apache.org) formats. Polars SQL also supports reading external files directly in various formats with its `read_csv`, `read_ndjson`, `read_parquet` & `read_ipc` [table functions](https:\u002F\u002Fgithub.com\u002Fpola-rs\u002Fpolars\u002Fblob\u002F91a423fea2dc067837db65c3608e3cbc1112a6fc\u002Fcrates\u002Fpolars-sql\u002Fsrc\u002Ftable_functions.rs#L18-L43).\n\nThe `sniff` command can also detect the mime type of any file with the `--no-infer` or `--just-mime` options, be it local or remote (http and https schemes supported).\nIt can detect more than 130 file formats, including MS Office\u002FOpen Document files, JSON, XML, PDF, PNG, JPEG and specialized geospatial formats like GPX, GML, KML, TML, TMX, TSX, TTML.\nClick [here](https:\u002F\u002Fdocs.rs\u002Ffile-format\u002Flatest\u002Ffile_format\u002F#reader-features) for a complete list.\n\n> [!NOTE]\n> When the `polars` feature is enabled, qsv can also natively read `.parquet`, `.ipc`, `.arrow`, `.json` & `.jsonl` files.\n\n### Extended Input Support\n\nThe `cat`, `headers`, `sqlp`, `to` & `validate` commands have extended input support (🗄️). If the input is `-` or empty, the command will try to use stdin as input. 
If it's not, it will check if it's a directory, and if so, add all the files in the directory as input files.\n\nIf it's a file, it will first check if it has an `.infile-list` extension. If it does, it will load the text file and parse each line as an input file path. This is a much faster and more convenient way to process a large number of input files, without having to pass them all as separate command-line arguments. Further, the file paths can be anywhere in the file system, even on separate volumes. If an input file path is not fully qualified, it will be treated as relative to the current working directory. Empty lines and lines starting with `#` are ignored. Invalid file paths will be logged as warnings and skipped.\n\nFor both directory and `.infile-list` input, snappy compressed files with a `.sz` or `.zip` extension will be automatically decompressed.\n\nFinally, if it's just a regular file, it will be treated as a regular input file.\n\n#### Limited Extended Input Support\nThe `describegpt`, `lens`, `slice` & `tojsonl` commands have limited extended input support (🗃️). They are different in that they only process one file. If provided an `.infile-list` or a compressed `.sz` or `.zip` file, they will only process the first file.\n\n### Automatic Compression\u002FDecompression\n\nqsv supports _automatic compression\u002Fdecompression_ using the [Snappy frame format](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fsnappy\u002Fblob\u002Fmain\u002Fframing_format.txt). Snappy was chosen instead of more popular compression formats like gzip because it was designed for [high-performance streaming compression & decompression](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fsnappy\u002Ftree\u002Fmain\u002Fdocs#readme) (up to 2.58 gb\u002Fsec compression, 0.89 gb\u002Fsec decompression).\n\nFor all commands except the `index`, `extdedup` & `extsort` commands, if the input file has an \".sz\" extension, qsv will _automatically_ do streaming decompression as it reads it. 
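The `.infile-list` parsing rules described earlier (skip blank lines and `#` comments, resolve relative paths against the current working directory, warn on and skip invalid paths) can be sketched as a hypothetical Python helper; qsv's actual implementation is in Rust:

```python
# Hypothetical sketch of .infile-list handling -- not qsv's actual code.
# One input path per line; blank lines and "#" comments are ignored;
# relative paths resolve against the current working directory;
# missing files are skipped with a warning.
from pathlib import Path


def parse_infile_list(list_path: str) -> list[Path]:
    inputs = []
    for line in Path(list_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment or blank line
        p = Path(line)
        if not p.is_absolute():
            p = Path.cwd() / p  # relative to the current working directory
        if p.is_file():
            inputs.append(p)
        else:
            print(f"warning: skipping invalid path: {p}")
    return inputs
```

The paths can live anywhere in the file system, even on separate volumes, since each line is resolved independently.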
Further, if the input file has an extended CSV\u002FTSV \".sz\" extension (e.g. nyc311.csv.sz\u002Fnyc311.tsv.sz\u002Fnyc311.tab.sz), qsv will also use the file extension to determine the delimiter to use.   \n\nSimilarly, if the `--output` file has an \".sz\" extension, qsv will _automatically_ do streaming compression as it writes it.\nIf the output file has an extended CSV\u002FTSV \".sz\" extension, qsv will also use the file extension to determine the delimiter to use.  \n\nNote, however, that compressed files cannot be indexed, so index-accelerated commands (`frequency`, `schema`, `split`, `stats`, `tojsonl`) will not be multithreaded. Random access is also disabled without an index, so `slice` will not be instantaneous and `luau`'s random-access mode will not be available.\n\nThere is also a dedicated [`snappy`](\u002Fsrc\u002Fcmd\u002Fsnappy.rs#L2) command with four subcommands for direct snappy file operations — a multithreaded `compress` subcommand (4-5x faster than the built-in, single-threaded auto-compression); a `decompress` subcommand with detailed compression metadata; a `check` subcommand to quickly inspect if a file has a Snappy header; and a `validate` subcommand to confirm if a Snappy file is valid.\n\nThe `snappy` command can be used to compress\u002Fdecompress ANY file, not just CSV\u002FTSV files.\n\nUsing the `snappy` command, we can compress NYC's 311 data (15gb, 28m rows) to 4.95 gb in _5.77 seconds_ with the multithreaded `compress` subcommand - _2.58 gb\u002Fsec_ with a 0.33 (3.01:1) compression ratio.  With `snappy decompress`, we can roundtrip decompress the same file in _16.71 seconds_ - _0.89 gb\u002Fsec_.\n\nCompare that to [zip 3.0](https:\u002F\u002Finfozip.sourceforge.net\u002FZip.html), which compressed the same file to 2.9 gb in _248.3 seconds on the same machine - 43x slower at 0.06 gb\u002Fsec_ with a 0.19 (5.17:1) compression ratio - for just an additional 14% (2.45 gb) of saved space. 
zip also took 4.3x longer to roundtrip decompress the same file in _72 seconds_ - _0.20 gb\u002Fsec_.\n\n> [!NOTE]\n> qsv has additional compression support beyond Snappy:\n>\n> The `sqlp` command can:\n> - Automatically decompress gzip, zstd and zlib compressed input files\n> - Automatically compress output files when using Arrow, Avro and Parquet formats (via `--format` and `--compression` options)\n>\n> When the `polars` feature is enabled, qsv can automatically decompress these compressed file formats:\n> - CSV: `.csv.gz`, `.csv.zst`, `.csv.zlib`\n> - TSV\u002FTAB: `.tsv.gz`, `.tsv.zst`, `.tsv.zlib`; `.tab.gz`, `.tab.zst`, `.tab.zlib`  \n> - SSV: `.ssv.gz`, `.ssv.zst`, `.ssv.zlib`\n>\n> Commands with both Extended and Limited Extended Input support also support the `.zip` compressed format.\n\n## RFC 4180 CSV Standard\n\nqsv follows the [RFC 4180](https:\u002F\u002Fdatatracker.ietf.org\u002Fdoc\u002Fhtml\u002Frfc4180) CSV standard. However, in real life, CSV formats vary significantly & qsv is actually not strictly compliant with the specification so it can process \"real-world\" CSV files.\nqsv leverages the awesome [Rust CSV](https:\u002F\u002Fdocs.rs\u002Fcsv\u002Flatest\u002Fcsv\u002F) crate to read\u002Fwrite CSV files.\n\nClick [here](https:\u002F\u002Fdocs.rs\u002Fcsv-core\u002Flatest\u002Fcsv_core\u002Fstruct.Reader.html#rfc-4180) to find out more about how qsv conforms to the standard using this crate.\n\nWhen dealing with \"atypical\" CSV files, you can use the `input` & `fmt` commands to normalize them to be RFC 4180-compliant.\n\n## UTF-8 Encoding\n\nqsv requires UTF-8 encoded input (of which ASCII is a subset).\n\nShould you need to re-encode CSV\u002FTSV files, you can use the `input` command to \"lossy save\" to UTF-8 - replacing invalid UTF-8 sequences with `�` ([U+FFFD REPLACEMENT CHARACTER](https:\u002F\u002Fdoc.rust-lang.org\u002Fstd\u002Fchar\u002Fconstant.REPLACEMENT_CHARACTER.html)).\n\nAlternatively, if you want to truly transcode to 
UTF-8, there are several utilities like [`iconv`](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FIconv) that you can use to do so on [Linux\u002FmacOS](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F805418\u002Fhow-can-i-find-encoding-of-a-file-via-a-script-on-linux) & [Windows](https:\u002F\u002Fsuperuser.com\u002Fquestions\u002F1163753\u002Fconverting-text-file-to-utf-8-on-windows-command-prompt).\n\n### Windows Powershell and Windows Excel Usage Note\n\nUnlike other modern operating systems, Microsoft Windows' [default encoding](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fpowershell\u002Fmodule\u002Fmicrosoft.powershell.core\u002Fabout\u002Fabout_character_encoding?view=powershell-7.4) [is UTF16-LE](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F66072117\u002Fwhy-does-windows-use-utf-16le). This will cause problems when redirecting qsv's output to a CSV file in Powershell & trying to open it with Excel - everything will be in the first column, as the UTF16-LE encoded CSV file will not be properly recognized by Excel.\n\n```\n# the following command will produce a UTF16-LE encoded CSV file on Windows\nqsv stats wcp.csv > wcpstats.csv\n```\n\nWhich is weird, since you'd think [Microsoft's own Excel would properly recognize UTF16-LE encoded CSV files](https:\u002F\u002Fanswers.microsoft.com\u002Fen-us\u002Fmsoffice\u002Fforum\u002Fall\u002Fopening-csv-file-with-utf16-encoding-in-excel-2010\u002Fed522cb9-e88d-4b82-b88e-a2d4bd99f874?auth=1). Regardless, to create a properly UTF-8 encoded file on Windows, use the `--output` option instead:\n\n```\n# so instead of redirecting stdout to a file on Windows\nqsv stats wcp.csv > wcpstats.csv\n\n# do this instead, so it will be properly UTF-8 encoded\nqsv stats wcp.csv --output wcpstats.csv\n```\n\nAlternatively, qsv can add a [Byte Order Mark](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FByte_order_mark) (BOM) to the beginning of a CSV to indicate it's UTF-8 encoded. 
You can do this by setting the `QSV_OUTPUT_BOM` environment variable to `1`.\n\nThis will allow Excel on Windows to properly recognize the CSV file as UTF-8 encoded.\n\nNote that this is not a problem with Excel on macOS, as macOS (like most other *nixes) uses UTF-8 as its default encoding.\n\nNor is it a problem with qsv output files produced on other operating systems, as Excel on Windows can properly recognize UTF-8 encoded CSV files.\n\n## Interpreters\nFor complex data-wrangling tasks, you can use Luau and Python scripts.\n\nLuau is recommended over Python for complex data-wrangling tasks as it is faster, more memory-efficient, has no external dependencies and has several data-wrangling helper functions as qsv's DSL.\n\nSee [Luau vs Python](docs\u002FINTERPRETERS.md) for more info.\n\nAnother \"interpreter\" included with qsv is [MiniJinja](https:\u002F\u002Fdocs.rs\u002Fminijinja\u002Flatest\u002Fminijinja\u002F), which is used in the `template` and `fetchpost` commands.\n\n## Memory Management\nqsv supports three memory allocators - jemalloc (default), mimalloc and the standard allocator.\u003Cbr>See [Memory Allocator](docs\u002FPERFORMANCE.md#memory-allocator) for more info.\n\nIt also has Out-of-Memory prevention, with two modes - NORMAL (default) & CONSERVATIVE.\u003Cbr>See [Out-of-Memory Prevention](docs\u002FPERFORMANCE.md#out-of-memory-oom-prevention) for more info.\n\n## Environment Variables & dotenv file support\n\nqsv supports an extensive list of environment variables and supports `.env` files to set them.\n\nFor details, see [Environment Variables](docs\u002FENVIRONMENT_VARIABLES.md) and the [`dotenv.template`](dotenv.template) file.\n## Feature Flags\n\nqsv has several [feature flags](https:\u002F\u002Fdoc.rust-lang.org\u002Fcargo\u002Freference\u002Ffeatures.html) that can be used to enable\u002Fdisable optional features.\n\nSee [Features](docs\u002FFEATURES.md) for more info.\n\n## Minimum Supported Rust Version\n\nqsv's MSRV policy is to 
require the latest stable [Rust version](https:\u002F\u002Fgithub.com\u002Frust-lang\u002Frust\u002Fblob\u002Fmaster\u002FRELEASES.md) that is [supported by Homebrew](https:\u002F\u002Fformulae.brew.sh\u002Fformula\u002Frust#default), currently [![HomeBrew](https:\u002F\u002Fimg.shields.io\u002Fhomebrew\u002Fv\u002Frust?logo=homebrew)](https:\u002F\u002Fformulae.brew.sh\u002Fformula\u002Frust). \nqsv itself may upgrade its MSRV, but a new qsv release will only be made once Homebrew supports the latest Rust stable.\n\n## Goals \u002F Non-Goals\n\nQuickSilver's goals, in priority order, are to be:\n* **As Fast as Possible** - To do so, it has frequent releases, an aggressive MSRV policy, takes advantage of CPU features, employs [various caching strategies](docs\u002FPERFORMANCE.md#caching), uses [HTTP\u002F2](https:\u002F\u002Fwww.cloudflare.com\u002Flearning\u002Fperformance\u002Fhttp2-vs-http1.1\u002F#:~:text=Multiplexing%3A%20HTTP%2F1.1%20loads%20resources,resource%20blocks%20any%20other%20resource.), and is multithreaded when possible and it makes sense. It also uses the latest dependencies when possible, and will use Cargo [`patch`](https:\u002F\u002Fdoc.rust-lang.org\u002Fcargo\u002Freference\u002Foverriding-dependencies.html#the-patch-section) to get unreleased fixes\u002Ffeatures from its dependencies. See [Performance](docs\u002FPERFORMANCE.md) for more info.\n* **Able to Process Very Large Files** - Most qsv commands are streaming, using constant memory, and can process arbitrarily large CSV files. For those commands that require loading the entire CSV into memory (denoted by 🤯), qsv has Out-of-Memory prevention, batch processing strategies and \"ext\"ernal commands that use the disk to process larger than memory files. 
See [Memory Management](docs\u002FPERFORMANCE.md#memory-management) for more info.\n* **A Complete Data-Wrangling Toolkit** - qsv aims to be a comprehensive data-wrangling toolkit that you can use for quick analysis and investigations, but is also robust enough for production data pipelines. Its many commands are targeted towards common data-wrangling tasks and can be combined\u002Fcomposed into complex data-wrangling scripts with its Luau-based DSL.  \nLuau will also serve as the backbone of a whole library of **qsv recipes** - reusable scripts for common tasks (e.g. street-level geocoding, removing PII, data enrichment, etc.) that prompt for easily modifiable parameters.   \n* **Composable\u002FInteroperable** - qsv is designed to be composable, with a focus on interoperability with other common CLI tools like 'awk', 'xargs', 'ripgrep', 'sed', etc., and with well known ETL\u002FELT tools like Airbyte, Airflow, Pentaho Kettle, etc. Its commands can be combined with other tools via pipes, and it supports other common file formats like JSON\u002FJSONL, Parquet, Arrow IPC, Avro, Excel, ODS, PostgreSQL, SQLite, etc. See [File Formats](#file-formats) for more info.\n* **As Portable as Possible** - qsv is designed to be portable, with installers on several platforms with an integrated self-update mechanism. In preference order, it supports Linux, macOS and Windows. See [Installation Options](#installation-options) for more info.\n* **As Easy to Use as Possible** - qsv is designed to be easy to use. As easy-to-use that is,\n as command line interfaces go :shrug:. Its commands have numerous options but have sensible defaults. The usage text is written for a data analyst audience, not developers; and there are numerous examples in the usage text, with the tests doubling as examples as well. 
With [qsv pro](https:\u002F\u002Fqsvpro.dathere.com), it has much expanded functionality while being easier to use with its Graphical User Interface.\n* **As Secure as Possible** - qsv is designed to be secure. It has no external runtime dependencies, is [written](https:\u002F\u002Faws.amazon.com\u002Fblogs\u002Fopensource\u002Fwhy-aws-loves-rust-and-how-wed-like-to-help\u002F) [in](https:\u002F\u002Fmsrc.microsoft.com\u002Fblog\u002F2019\u002F07\u002Fwhy-rust-for-safe-systems-programming\u002F) [Rust](https:\u002F\u002Fopensource.googleblog.com\u002F2023\u002F06\u002Frust-fact-vs-fiction-5-insights-from-googles-rust-journey-2022.html), and its codebase is automatically audited for security vulnerabilities with automated [DevSkim](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDevSkim#devskim), [\"cargo audit\"](https:\u002F\u002Frustsec.org) and [Codacy](https:\u002F\u002Fapp.codacy.com\u002Fgh\u002Fdathere\u002Fqsv\u002Fdashboard) Github Actions workflows.  \nIt uses the latest stable Rust version, with an aggressive MSRV policy and the latest version of all its dependencies.\nIt has an extensive test suite with more than 2,600 tests, including several [property tests](https:\u002F\u002Fmedium.com\u002Fcriteo-engineering\u002Fintroduction-to-property-based-testing-f5236229d237) which [randomly generate](https:\u002F\u002Fgithub.com\u002FBurntSushi\u002Fquickcheck#quickcheck) parameters for oft-used commands.   \nIts prebuilt binary archives are [zipsigned](https:\u002F\u002Fgithub.com\u002FKijewski\u002Fzipsign#zipsign), so you can [verify their integrity](#verifying-the-integrity-of-the-prebuilt-binaries-zip-archives). Its self-update mechanism automatically verifies the integrity of the prebuilt binaries archive before applying an update.\nSee [Security](SECURITY.md) for more info.\n* **As Easy to Contribute to as Possible** - qsv is designed to be easy to contribute to, with a focus on maintainability. 
Its modular architecture allows the easy addition of self-contained commands gated by feature flags, the source code is heavily commented, the usage text is embedded, and there are helper functions that make it easy to create additional commands and supporting tests. See [Features](docs\u002FFEATURES.md) and [Contributing](CONTRIBUTING.md) for more info.\n\nQuickSilver's non-goals are to be:\n* **As Small as Possible** - qsv is designed to be small, but not at the expense of performance, features, composability, portability, usability, security or maintainability. However, we do have a `qsvlite` variant that is ~16% of the size of `qsv` and a `qsvdp` variant that is ~16% of the size of `qsv`. Those variants, however, have reduced functionality.\nFurther, several commands are gated behind feature flags, so you can compile qsv with only the features you need.\n* **Multi-lingual** - qsv's _usage text_ and _messages_ are English-only. There are no plans to support other languages. This does not mean it can only process English input files.  \nIt can process well-formed CSVs in _any_ language so long as they're UTF-8 encoded. Further, it supports alternate delimiters\u002Fseparators other than comma; the `apply whatlang` operation detects 69 languages; and its `apply thousands, currency and eudex` operations support different languages and country conventions for number, currency and date parsing\u002Fformatting.  
\nFinally, though the default Geonames index of the `geocode` command is English-only, the index can be rebuilt with the `geocode index-update` subcommand with the `--languages` option to return place names in multiple languages ([with support for 254 languages](http:\u002F\u002Fdownload.geonames.org\u002Fexport\u002Fdump\u002Falternatenames\u002F)).\n\n## Testing\nqsv has ~2,700 tests in the [tests](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Ftree\u002Fmaster\u002Ftests) directory.\nEach command has its own test suite in a separate file with the convention `test_\u003CCOMMAND>.rs`.\nApart from preventing regressions, the tests also serve as good illustrative examples, and are often linked from the usage text of each corresponding command.\n\nTo test each binary variant:\n\n```bash\n# to test qsv\ncargo test --features all_features\n\n# to test qsvlite\ncargo test --features lite\n# to test all tests with \"stats\" in the name with qsvlite\ncargo test stats --features lite\n\n# to test qsvmcp\ncargo test --features qsvmcp\n\n# to test qsvdp\ncargo test --features datapusher_plus\n\n# to test a specific command\n# here we test only stats and use the\n# t alias for test and the -F shortcut for --features\ncargo t stats -F all_features\n\n# to test a specific command with a specific feature\n# here we test only luau command with the luau feature\ncargo t luau -F feature_capable,luau\n\n# to test the count command with multiple features\n# we use \"test_count\" as we don't want to run other tests\n# that have \"count\" in the testname - e.g. 
test_geocode_countryinfo\ncargo t test_count -F feature_capable,luau,polars\n\n# to test using an alternate allocator\n# other than the default jemalloc allocator\ncargo t --no-default-features -F all_features,mimalloc\n```\n\n## License\n\nDual-licensed under MIT or the [UNLICENSE](https:\u002F\u002Funlicense.org).\n\n\n[![FOSSA Status](https:\u002F\u002Fapp.fossa.com\u002Fapi\u002Fprojects\u002Fgit%2Bgithub.com%2Fjqnatividad%2Fqsv.svg?type=large)](https:\u002F\u002Fapp.fossa.com\u002Fprojects\u002Fgit%2Bgithub.com%2Fjqnatividad%2Fqsv?ref=badge_large)\n\n## Origins\n\nqsv is a fork of the popular [xsv](https:\u002F\u002Fgithub.com\u002FBurntSushi\u002Fxsv) utility. It was forked in Sept 2021 and, building on that solid foundation, has since evolved into a general-purpose data-wrangling toolkit, adding numerous commands and features.\nSee the [FAQ](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fdiscussions\u002F287) for more details.\n\n## Sponsor\n\n\u003Cdiv align=\"center\">\n\n|qsv was made possible by|\n:-------------------------:|\n|[![datHere Logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_f8ab330bdca2.png)](https:\u002F\u002FdatHere.com)\u003Cbr>|\n|Standards-based, best-of-breed, open source solutions\u003Cbr>to make your **Data Useful, Usable & Used.**   |\n\n\u003C\u002Fdiv>\n\n## Naming Collision\n\nThis project is unrelated to [Intel's Quick Sync Video](https:\u002F\u002Fwww.intel.com\u002Fcontent\u002Fwww\u002Fus\u002Fen\u002Farchitecture-and-technology\u002Fquick-sync-video\u002Fquick-sync-video-general.html).\n","## qsv: Blazing-Fast Data-Wrangling Toolkit\n\n[![Linux Build Status](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Frust.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Frust.yml)\n[![Windows 
Build Status](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Frust-windows.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Frust-windows.yml)\n[![macOS Build Status](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Frust-macos.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Frust-macos.yml)\n[![Security Audit](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Fsecurity-audit.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Factions\u002Fworkflows\u002Fsecurity-audit.yml)\n[![Codacy Badge](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_adaaa9495e23.png)](https:\u002F\u002Fapp.codacy.com\u002Fgh\u002Fdathere\u002Fqsv\u002Fdashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)\n[![Crates.io](https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Fqsv.svg?logo=crates.io)](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fqsv)\n[![Discussions](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fdiscussions\u002Fdathere\u002Fqsv)](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fdiscussions)\n[![Minimum Supported Rust Version](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRust-1.94-red?logo=rust)](#minimum-supported-rust-version)\n[![FOSSA Status](https:\u002F\u002Fapp.fossa.com\u002Fapi\u002Fprojects\u002Fgit%2Bgithub.com%2Fjqnatividad%2Fqsv.svg?type=shield)](https:\u002F\u002Fapp.fossa.com\u002Fprojects\u002Fgit%2Bgithub.com%2Fjqnatividad%2Fqsv?ref=badge_shield) [![DOI](https:\u002F\u002Fzenodo.org\u002Fbadge\u002F320463703.svg)](https:\u002F\u002Fdoi.org\u002F10.5281\u002Fzenodo.17851335)\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002Fdathere\u002Fqsv)\n\n\u003Cdiv align=\"center\">\n\n &nbsp;          |  Table of Contents\n:--------------------------|:-------------------------\n![qsv 
logo](docs\u002Fimages\u002Fqsv_logo-gemini-indy-robothorse-small.png \"Nano Banana prompt: Can you make this horse a robot? Also, add an 'MCP' tag on the robot horse. Keep the same pose and size.\")\u003Cbr\u002F>[_Hi-ho “Quicksilver”, away!_](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=p9lf76xOA5k)\u003Cbr\u002F>\u003Csub>\u003Csup>[Original logo details](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fdiscussions\u002F295) * [AI-reimagined logo](docs\u002Fimages\u002Fqsv_logo-gemini-indy-robothorse-small.png) * [Event logos archive](docs\u002Fimages\u002Fevent-logos\u002F)\u003C\u002Fsup>\u003C\u002Fsub>\u003Cbr\u002F>|qsv is a data-wrangling toolkit for querying, slicing, sorting, analyzing, filtering, enriching, transforming, validating, joining, formatting, converting, chatting with, making [FAIR](https:\u002F\u002Fwww.go-fair.org\u002Ffair-principles\u002F)-compliant and documenting tabular data (CSV, Excel, [etc.](#file-formats)). Commands are simple, composable & ___[blazing-fast](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fdiscussions\u002F1348)___.\u003Cbr>\u003Cbr>* [Commands](#available-commands)\u003Cbr>* Installation: [CLI](#installation-options) • [MCP Server](.claude\u002Fskills\u002Fdocs\u002Fguides\u002FSTART_HERE.md) • [Cowork Plugin](.claude\u002Fskills\u002Fdocs\u002Fguides\u002FSTART_HERE.md#step-2-optional-install-the-cowork-plugin)\u003Cbr> * [Whirlwind Tour](docs\u002Fwhirlwind_tour.md#a-whirlwind-tour) \u002F [Notebooks](contrib\u002Fnotebooks\u002F) \u002F [Lessons & Exercises](https:\u002F\u002F100.dathere.com)\u003Cbr>* [FAQ](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fdiscussions\u002Fcategories\u002Ffaq)\u003Cbr>* [Performance Tuning](docs\u002FPERFORMANCE.md#performance-tuning)\u003Cbr>* 👉 [Benchmarks](https:\u002F\u002Fqsv.dathere.com\u002Fbenchmarks) 🚀\u003Cbr>* [Environment Variables](docs\u002FENVIRONMENT_VARIABLES.md)\u003Cbr>* [Feature Flags](#feature-flags)\u003Cbr>* [Goals\u002FNon-goals](#goals--non-goals)\u003Cbr>* [Testing](#testing)\u003Cbr>* [NYC SOD 
2022](https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002Fe\u002F2PACX-1vQ12ndZL--gkz0HLQRaxqsNOwzddkv1iUKB3sq661yA77OPlAsmHJHpjaqt9s9QEf73VqMfb0cv4jHU\u002Fpub?start=false&loop=false&delayms=3000)\u002F[csv,conf,v8](https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002F10T_3MyIqS5UsKxJaOY7Ktrd-GfhJelQImlE_qYmtuis\u002Fedit#slide=id.g2e0f1e7aa0e_0_62)\u002F[PyConUS 2025](https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002Fe\u002F2PACX-1vRKFnU0Hm8oDrtCYbxcf96kHVsPcoLU05jPVNYaAs09D05gPMWDJ96q_4_zgUvadGro4deohisy-XtY\u002Fpub?start=false&loop=false&delayms=3000)\u002F\u003Cbr>&nbsp;&nbsp;&nbsp;[csv,conf,v9](https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002F1j-S0q5gqR8agsqIPBVXabGEntMlc4FDTwb4r-v8-9tA\u002Fedit?usp=sharing)\u002F[NYC SOD 2026](https:\u002F\u002Fdocs.google.com\u002Fpresentation\u002Fd\u002Fe\u002F2PACX-1vTobPFucA1QO6u8dF3CyHjOctoom1DBmgQF558I_gx5e8cWPr0HLvJISvoaZyCMwLZZdDHlhK2cil0o\u002Fpub?start=false&loop=false&delayms=5000)\u003Cbr>* **_\"Did we achieve ACI?\"_** series - [1](https:\u002F\u002Fdathere.com\u002F2026\u002F01\u002Fthe-peoples-api-is-finally-here\u002F) • [2](https:\u002F\u002Fdathere.github.io\u002FNYC-Snow-Analysis-2010-2026-Claude-Cowork\u002F)  • [3](https:\u002F\u002Fdathere.github.io\u002Fpeoples-api-demos\u002FNYC-Housing-Policy-SOD2026\u002F) \u003Cbr>* [Sponsor](#sponsor)\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n\n## Try it out at [qsv.dathere.com](https:\u002F\u002Fqsv.dathere.com)! \u003C!-- markdownlint-disable-line -->\n\n\u003C\u002Fdiv>\n\n| \u003Ca name=\"available-commands\">Command | Description |\n| --- | --- |\n| [apply](docs\u002Fhelp\u002Fapply.md)✨\u003Cbr>📇🚀🧠🤖🔣👆| Apply a series of string, date, math & currency transformations to the given CSV column(s). It also has some basic [NLP](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FNatural_language_processing) 
capabilities ([similarity](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fstrsim), [sentiment analysis](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fvader_sentiment), [profanity detection](https:\u002F\u002Fdocs.rs\u002Fcensor\u002Flatest\u002Fcensor\u002F), [eudex](https:\u002F\u002Fgithub.com\u002Fticki\u002Feudex#eudex-a-blazingly-fast-phonetic-reductionhashing-algorithm), [language detection](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fwhatlang) and [name gender guessing](https:\u002F\u002Fgithub.com\u002FRaduc4\u002Fgender_guesser?tab=readme-ov-file#gender-guesser)).|\n| [applydp](docs\u002Fhelp\u002Fapplydp.md)✨\u003Cbr>📇🚀PostalCodes️ (![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png))| \u003Ca name=\"applydp_deeplink\">\u003C\u002Fa>applydp is a slimmed-down version of `apply`, with only the subcommands\u002Foperations relevant to [Datapusher+](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fdatapusher-plus) (qsvdp binary variant only).|\n| [behead](docs\u002Fhelp\u002Fbehead.md) | Drop the header from a CSV file.|\n| [blake3](docs\u002Fhelp\u002Fblake3.md)\u003Cbr>🚀 | Compute or verify the [BLAKE3](https:\u002F\u002Fgithub.com\u002FBLAKE3-team\u002FBLAKE3\u002F?tab=readme-ov-file#blake3) hash of a file.|\n| [cat](docs\u002Fhelp\u002Fcat.md)\u003Cbr>🗄️ | Concatenate CSV files by row or by column.|\n| [clipboard](docs\u002Fhelp\u002Fclipboard.md)✨\u003Cbr>🖥️ | Provide input from the clipboard or save output to the clipboard.|\n| [color](docs\u002Fhelp\u002Fcolor.md)✨\u003Cbr>🐻‍❄️🖥️ | Render tabular data in a pretty, colorized format that always fits the terminal window. Beyond CSV and its variants, Arrow, Avro\u002FIPC, Parquet, JSON array and JSONL formats are also supported when the `polars` feature is enabled.|\n| [count](docs\u002Fhelp\u002Fcount.md)\u003Cbr>PostalCodes️🏎️🐻‍❄️ | Count the rows in a CSV file, optionally computing record-width statistics. (11.87 seconds without an index on a 15GB, 28-million-row NYC 311 dataset; instantaneous with one.) If the `polars` feature is enabled, it uses Polars' multithreaded, mem-mapped CSV reader for fast counts even without an index.|\n| [datefmt](docs\u002Fhelp\u002Fdatefmt.md)\u003Cbr>PostalCodes️🚀PostalCodes️ | Format recognized date fields ([19 formats supported](https:\u002F\u002Fdocs.rs\u002Fqsv-dateparser\u002Flatest\u002Fqsv_dateparser\u002F#accepted-date-formats)) to a specified date format using [strftime date format specifiers](https:\u002F\u002Fdocs.rs\u002Fchrono\u002Flatest\u002Fchrono\u002Fformat\u002Fstrftime\u002F).|\n| 
[dedup](docs\u002Fhelp\u002Fdedup.md)\u003Cbr>🤯🚀PostalCodes️ | Remove duplicate rows (see also the `extdedup`, `extsort`, `sort` & `sortcheck` commands).|\n| [describegpt](docs\u002Fhelp\u002Fdescribegpt.md)\u003Cbr>🌐🤖🪄PostalCodes️📚⛩️ (![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png)) | \u003Ca name=\"describegpt_deeplink\">\u003C\u002Fa>Using configurable Mini Jinja prompt files (resources\u002Fdescribegpt_defaults.toml) with any [OpenAI API](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fintroduction)-compatible large language model (including local models running via Ollama, Jan and LM Studio), infer “neurosymbolic” data dictionaries, descriptions and tags, or ask questions of a CSV file.\u003Cbr>(e.g.: [Markdown](docs\u002Fdescribegpt\u002Fnyc311-describegpt.md), [JSON](docs\u002Fdescribegpt\u002Fnyc311-describegpt.json), [TOON](docs\u002Fdescribegpt\u002Fnyc311-describegpt.toon), [Everything](docs\u002Fdescribegpt\u002Fnyc311-describegpt-everything.md), [Spanish](docs\u002Fdescribegpt\u002Fnyc311-describegpt-spanish.md), [Mandarin](docs\u002Fdescribegpt\u002Fnyc311-describegpt-mandarin.md), [Controlled tags](docs\u002Fdescribegpt\u002Fnyc311-describegpt-tagvocab.md);\u003Cbr>[--prompt \"What are the top 10 complaint types per year, by Community Board and Borough?\"](docs\u002Fdescribegpt\u002Fnyc311-describegpt-prompt.md) - [deterministic, hallucination-free SQL RAG results](docs\u002Fdescribegpt\u002Fnyc311-describegpt-prompt.csv); [iterative, session-based SQL RAG refinement](docs\u002Fdescribegpt\u002Fallegheny_discussion3.md) - [refined SQL RAG results](docs\u002Fdescribegpt\u002Fmostexpensive6.csv))|\n| [diff](docs\u002Fhelp\u002Fdiff.md)\u003Cbr>🚀 | Find the difference between two CSVs with blazing speed!\u003Cbr\u002F>e.g. _compare two CSVs, each with 1 million rows and 9 columns, in under 600 milliseconds!_|\n| [edit](docs\u002Fhelp\u002Fedit.md) | Replace the value of a cell specified by row and column.|\n| [enum](docs\u002Fhelp\u002Fenum.md)\u003Cbr>PostalCodes️ | Add a new column enumerating rows with an incremental or UUID identifier. Can also be used to copy a column or fill a new column with a constant.|\n| [excel](docs\u002Fhelp\u002Fexcel.md)\u003Cbr>🚀 | Export a specified Excel\u002FODS sheet to a CSV file.|\n| [exclude](docs\u002Fhelp\u002Fexclude.md)\u003Cbr>PostalCodes️PostalCodes️ | Remove a set of CSV data from another set based on the specified columns.|\n| [explode](docs\u002Fhelp\u002Fexplode.md)\u003Cbr>PostalCodes️PostalCodes️ | Explode one row into multiple rows by splitting a column value based on the given separator.|\n| 
[extdedup](docs\u002Fhelp\u002Fextdedup.md)\u003Cbr>PostalCodes️ | Remove duplicate rows from an arbitrarily large CSV\u002Ftext file using a memory-mapped [on-disk hash table](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fodht). Unlike the `dedup` command, it does not load the entire file into memory, nor does it sort the deduped file.|\n| [extsort](docs\u002Fhelp\u002Fextsort.md)\u003Cbr>🚀PostalCodes️PostalCodes️ | Sort an arbitrarily large CSV\u002Ftext file using a multithreaded [external merge sort](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FExternal_sorting) algorithm.|\n| [fetch](docs\u002Fhelp\u002Ffetch.md)✨\u003Cbr>PostalCodes️🧠🌐 | Send\u002Ffetch data to\u002Ffrom web services for every row using **HTTP Get**. Comes with [HTTP\u002F2](https:\u002F\u002Fhttp2-explained.haxx.se\u002Fen\u002Fpart1) [adaptive flow control](https:\u002F\u002Fmedium.com\u002Fcoderscorner\u002Fhttp-2-flow-control-77e54f7fd518), [jaq](https:\u002F\u002Fgithub.com\u002F01mf02\u002Fjaq?tab=readme-ov-file#jaq) JSON query language support, dynamic throttling ([RateLimit](https:\u002F\u002Fwww.ietf.org\u002Farchive\u002Fid\u002Fdraft-ietf-httpapi-ratelimit-headers-06.html)) and persistent caching using [Redis](https:\u002F\u002Fredis.io\u002F) or a disk cache.|\n| [fetchpost](docs\u002Fhelp\u002Ffetchpost.md)✨\u003Cbr>PostalCodes️🧠🌐⛩️ | Like `fetch`, but uses **HTTP Post** ([difference between HTTP GET & POST methods](https:\u002F\u002Fwww.geeksforgeeks.org\u002Fdifference-between-http-get-and-post-methods\u002F)). Supports HTML forms (application\u002Fx-www-form-urlencoded), JSON (application\u002Fjson) and custom content types, and can render CSV data into the payload using the [Mini Jinja](https:\u002F\u002Fdocs.rs\u002Fminijinja\u002Flatest\u002Fminijinja\u002F) template engine.|\n| [fill](docs\u002Fhelp\u002Ffill.md)\u003Cbr>PostalCodes️ | Fill empty values.|\n| [fixlengths](docs\u002Fhelp\u002Ffixlengths.md) | Force a CSV to have same-length records by padding or truncating them.|\n| [flatten](docs\u002Fhelp\u002Fflatten.md) | A flattened view of CSV records. Useful for viewing one record at a time.\u003Cbr \u002F>e.g. `qsv slice -i 5 data.csv \\| qsv flatten`.|\n| [fmt](docs\u002Fhelp\u002Ffmt.md) | Reformat a CSV with different delimiters, record terminators or quoting rules. (Supports ASCII-delimited data.)|\n| [foreach](docs\u002Fhelp\u002Fforeach.md)✨ | Execute a shell command once per record in a given CSV file.|\n| [frequency](docs\u002Fhelp\u002Ffrequency.md)\u003Cbr>PostalCodes️😣🏎️PostalCodes️🪄 
(![Luau](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_afb2a208d532.png)) | Build [frequency tables](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FFrequency_(statistics)) of each column. Uses multithreading to go faster if an index is present. (Examples: [CSV](scripts\u002Fnyc311-1m.freqs.csv), [JSON](scripts\u002Fnyc311-1m.freqs.json), [TOON](scripts\u002Fnyc311-1m.freqs.toon).)|\n| [geocode](docs\u002Fhelp\u002Fgeocode.md)✨\u003Cbr>PostalCodes️🧠🌐🚀PostalCodes️PostalCodes️🌎 | Geocode locations against an updatable local copy of the [Geonames](https:\u002F\u002Fwww.geonames.org\u002F) cities database and the [Maxmind GeoLite2](https:\u002F\u002Fwww.maxmind.com\u002Fen\u002Fgeolite-free-ip-geolocation-data) database. With caching and multithreading, it can process up to 360,000 records per second!|\n| [geoconvert](docs\u002Fhelp\u002Fgeoconvert.md)✨\u003Cbr>🌎 | Convert between various spatial formats and CSV\u002FSVG, including GeoJSON, SHP and more.|\n| [headers](docs\u002Fhelp\u002Fheaders.md)\u003Cbr>🗄️ | Show the headers of a CSV file, or show the intersection of all headers across multiple CSV files.|\n| [index](docs\u002Fhelp\u002Findex.md) | Create an index (PostalCodes️) for a CSV file. This is very fast (indexing even the 15GB, 28-million-row NYC 311 dataset takes only 14 seconds) and enables constant-time indexing and random access into the CSV file. With an index, the `count`, `sample` and `slice` commands are instantaneous; the `luau` command's random-access mode is enabled; and the `frequency`, `split`, `stats` and `schema` commands are multithreaded (🏎️).|\n| [input](docs\u002Fhelp\u002Finput.md) | Read CSV data with special commenting, quoting, trimming, line-skipping and non-UTF-8 encoding handling rules. Typically used to “normalize” a CSV file for further processing with other qsv commands.|\n| [join](docs\u002Fhelp\u002Fjoin.md)\u003Cbr>😣PostalCodes️ | Inner, outer, right, cross, anti and semi joins. Automatically creates a simple, in-memory hash index to make it fast.|\n| [joinp](docs\u002Fhelp\u002Fjoinp.md)✨\u003Cbr>🚀🐻‍❄️🪄 | Inner, outer, right, cross, anti, semi, non-equi and asof joins using the [Pola.rs](https:\u002F\u002Fwww.pola.rs) engine. Unlike the `join` command, `joinp` can process larger-than-memory files, is multithreaded, and has join-key validation, a row-order preservation option, pre- and post-join filtering and join-key Unicode normalization, along with support for “special” [non-equi joins](https:\u002F\u002Fdocs.pola.rs\u002Fuser-guide\u002Ftransformations\u002Fjoins\u002F#non-equi-joins) and [asof joins](https:\u002F\u002Fdocs.pola.rs\u002Fuser-guide\u002Ftransformations\u002Fjoins\u002F#asof-join) (which are especially useful for time-series data; see the 
[测试代码](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002F30cc920d0812a854fcbfedc5db81788a0600c92b\u002Ftests\u002Ftest_joinp.rs#L509-L983)），并且其输出列可以合并。|\n| [json](docs\u002Fhelp\u002Fjson.md)\u003Cbr>PostalCodes️ | 将 JSON 数组转换为 CSV。|\n| [jsonl](docs\u002Fhelp\u002Fjsonl.md)\u003Cbr>🚀PostalCodes️ | 将换行分隔的 JSON（[JSONL](https:\u002F\u002Fjsonlines.org\u002F)\u002F[NDJSON](http:\u002F\u002Fndjson.org\u002F)）转换为 CSV。有关将 CSV 转换为 JSONL，请参阅 `tojsonl` 命令。|\n| [lens](docs\u002Fhelp\u002Flens.md)✨PostalCodes️\u003Cbr>🐻‍❄️🖥️ | 使用 [csvlens](https:\u002F\u002Fgithub.com\u002FYS-L\u002Fcsvlens#csvlens) 引擎交互式地查看、搜索和筛选表格数据文件。除了 CSV 及其变种外，启用 `polars` 特性后还支持 Arrow、Avro\u002FIPC、Parquet、JSON 数组和 JSONL 格式。|\n| [luau](docs\u002Fhelp\u002Fluau.md) (![Luau](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_afb2a208d532.png))✨\u003Cbr>PostalCodes️🌐PostalCodes️📚 (![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png)) | \u003Ca name=\"luau_deeplink\">\u003C\u002Fa>通过对 CSV 文件的每一行执行 [Luau](https:\u002F\u002Fluau-lang.org) [0.709](https:\u002F\u002Fgithub.com\u002FRoblox\u002Fluau\u002Freleases\u002Ftag\u002F0.709) 表达式\u002F脚本来创建多个新的计算列、筛选行、计算聚合并构建复杂的数据管道（[顺序模式](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fbb72c4ef369d192d85d8b7cc6e972c1b7df77635\u002Ftests\u002Ftest_luau.rs#L254-L298)），或者利用索引实现 [随机访问](https:\u002F\u002Fwww.webopedia.com\u002Fdefinitions\u002Frandom-access)（[随机访问模式](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fbb72c4ef369d192d85d8b7cc6e972c1b7df77635\u002Ftests\u002Ftest_luau.rs#L367-L415)）。\u003Cbr>可以处理单个 Luau 表达式，也可以运行包含查找表的完整数据清洗脚本（[示例](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv-lookup-tables#example)），这些脚本具有明确的 BEGIN、MAIN 和 END 部分。\u003Cbr>这不仅仅是另一个 qsv 命令，它是 qsv 的 [领域特定语言](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FDomain-specific_language) (DSL)，内置了众多 qsv 
特有的辅助函数（[详见源码](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fsrc\u002Fcmd\u002Fluau.rs#L1473-L2755)），可用于构建生产级的数据管道。|\n| [moarstats](docs\u002Fhelp\u002Fmoarstats.md)\u003Cbr>PostalCodes️🏎️ | 向现有的统计 CSV 文件中添加数十种额外的统计指标，包括扩展的异常值、稳健性和双变量统计。（[示例](docs\u002Fmoarstats\u002FNYC_311_SR_2010-2020-sample-1M.stats.csv)。）|\n| [partition](docs\u002Fhelp\u002Fpartition.md)\u003Cbr>PostalCodes️ | 根据某一列的值对 CSV 文件进行分区。|\n| [pivotp](docs\u002Fhelp\u002Fpivotp.md)✨\u003Cbr>🚀🐻‍❄️🪄 | 对 CSV 数据进行透视。根据数据类型和统计信息自动选择“智能”聚合方式。|\n| [pragmastat](docs\u002Fhelp\u002Fpragmastat.md)\u003Cbr>PostalCodes️🤯🪄 | 使用 [Pragmastat](https:\u002F\u002Fpragmastat.dev\u002F) 库计算实用统计。利用统计缓存自动过滤非数值列，并支持 Date\u002FDateTime 列。|\n| [pro](docs\u002Fhelp\u002Fpro.md) | 与 [qsv pro](https:\u002F\u002Fqsvpro.dathere.com) API 交互。|\n| [prompt](docs\u002Fhelp\u002Fprompt.md)✨\u003Cbr>🐻‍❄️🖥️ | 打开文件对话框，以便选择输入文件或将输出保存到文件。|\n| [pseudo](docs\u002Fhelp\u002Fpseudo.md)\u003Cbr>PostalCodes️PostalCodes️ | 通过用递增标识符替换给定列的值，对该列的值进行 [假名化](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FPseudonymization)。|\n| [py](docs\u002Fhelp\u002Fpy.md)✨\u003Cbr>PostalCodes️PostalCodes️ | 通过在 CSV 文件的每一行上评估 Python 表达式来创建新的计算列或筛选行。Python 的 [f-string](https:\u002F\u002Fwww.freecodecamp.org\u002Fnews\u002Fpython-f-strings-tutorial-how-to-use-f-strings-for-string-formatting\u002F) 对于扩展格式化特别有用，[还能评估 Python 表达式](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002F4cd00dca88addf0d287247fa27d40563b6d46985\u002Fsrc\u002Fcmd\u002Fpython.rs#L23-L31)。\u003Cbr>[需要 Python 3.10 或更高版本](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FINTERPRETERS.md#building-qsv-with-python-feature)。|\n| [rename](docs\u002Fhelp\u002Frename.md) | 高效地重命名 CSV 文件的列。|\n| [replace](docs\u002Fhelp\u002Freplace.md)\u003Cbr>PostalCodes️PostalCodes️🏎️ | 使用正则表达式替换 CSV 数据。对每个字段单独应用正则表达式。|\n| [reverse](docs\u002Fhelp\u002Freverse.md)\u003Cbr>PostalCodes️🤯 | 反转 CSV 文件中各行的顺序。与 `sort --reverse` 
命令不同，它会保留具有相同键的行的顺序。如果有索引，只需常量内存即可完成；否则，需要将所有数据加载到内存中。|\n| [safenames](docs\u002Fhelp\u002Fsafenames.md)\u003Cbr> (![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png)) | \u003Ca name=\"safenames_deeplink\">\u003C\u002Fa>修改 CSV 文件的表头，使其仅包含“安全”的名称（[详见源码](\u002Fsrc\u002Fcmd\u002Fsafenames.rs#L5-L14)），确保这些名称“适合数据库”或“适合 CKAN”。|\n| [sample](docs\u002Fhelp\u002Fsample.md)\u003Cbr>PostalCodes️🌐🏎️ | 使用七种不同的抽样方法从 CSV 文件中随机抽取行（可选种子）——[水库抽样](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FReservoir_sampling)（默认）、[索引抽样](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FRandom_access)、[伯努利抽样](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FBernoulli_sampling)、[系统抽样](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FSystematic_sampling)、[分层抽样](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FStratified_sampling)、[加权抽样](https:\u002F\u002Fdoi.org\u002F10.1016\u002Fj.ipl.2005.11.003) 和 [聚类抽样](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FCluster_sampling)。支持从远程 URL 上的 CSV 文件中抽样。|\n| [schema](docs\u002Fhelp\u002Fschema.md)\u003Cbr>PostalCodes️😣🏎️PostalCodes️🪄🐻‍❄️ | \u003Ca name=\"schema_deeplink\">\u003C\u002Fa>从 CSV 数据中推断出 [JSON Schema Validation Draft 2020-12](https:\u002F\u002Fjson-schema.org\u002Fdraft\u002F2020-12\u002Fjson-schema-validation)（[示例](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fresources\u002Ftest\u002F311_Service_Requests_from_2010_to_Present-2022-03-04.csv.schema.json)）或 [Polars Schema](https:\u002F\u002Fdocs.pola.rs\u002Fuser-guide\u002Flazy\u002Fschemas\u002F)（[示例](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fresources\u002Ftest\u002FNYC_311_SR_2010-2020-sample-1M.pschema.json)）。在 JSON Schema Validation 模式下，它会生成一个 `.schema.json` 文件，其中包含从 [`stats`](#stats_deeplink) 中推断出的数据类型和域\u002F范围验证规则。如果存在索引，可利用多线程加速处理。有关如何使用生成的 JSON Schema 来验证类似 CSV 文件是否符合该模式，请参阅 `validate` 命令。\u003Cbr>启用 `--polars` 选项后，它会生成一个 `.pschema.json` 文件，所有 Polars 
命令（`sqlp`、`joinp` 和 `pivotp`）都会使用该文件来确定每列的数据类型并优化性能。\u003Cbr>这两种模式均可编辑和微调。对于 JSON Schema，可以进一步完善推断出的验证规则；对于 Polars Schema，可以更改推断出的 Polars 数据类型。|\n| [scoresql](docs\u002Fhelp\u002Fscoresql.md)✨\u003Cbr>🐻‍❄️🪄 | 对 SQL 查询进行分析，结合 CSV 文件的缓存数据（stats、moarstats、frequency），在查询执行之前生成性能评分及可操作的优化建议。支持 Polars（默认）和 DuckDB 模式。|\n| [search](docs\u002Fhelp\u002Fsearch.md)\u003Cbr>PostalCodes️PostalCodes️ | 对 CSV 文件运行正则表达式。只对选定的字段应用正则表达式，并仅显示匹配的行。|\n| [searchset](docs\u002Fhelp\u002Fsearchset.md)\u003Cbr>PostalCodes️PostalCodes️ | _在一次扫描中对 CSV 文件运行多个正则表达式。_ 对每个字段单独应用正则表达式，并仅显示匹配的行。|\n| [select](docs\u002Fhelp\u002Fselect.md)\u003Cbr>PostalCodes️ | 选择、重新排序、反转、复制或删除列。|\n| [slice](docs\u002Fhelp\u002Fslice.md)\u003Cbr>PostalCodes️PostalCodes️ | 从 CSV 文件的任意部分切取行。当存在索引时，只需解析切片内的行，而无需遍历切片开始之前的所有行。|\n| [snappy](docs\u002Fhelp\u002Fsnappy.md)\u003Cbr>🚀🌐 | \u003Ca name=\"snappy_deeplink\">\u003C\u002Fa>使用 Google 的 [Snappy](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fsnappy\u002Fblob\u002Fmain\u002Fdocs\u002FREADME.md) 框架格式对输入进行流式压缩\u002F解压缩（[更多信息](#automatic-compressiondecompression)）。|\n| [sniff](docs\u002Fhelp\u002Fsniff.md)\u003Cbr>PostalCodes️🌐🤖 (![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png)) | 快速嗅探并推断 CSV 元数据（分隔符、表头行、前置行数、引用字符、灵活性、是否为 UTF-8、平均记录长度、记录总数、内容长度以及如果是在 URL 上嗅探 CSV，则估算的记录数量、字段数、字段名称和数据类型）。它同时也是通用的 MIME 类型检测器。|\n| [sort](docs\u002Fhelp\u002Fsort.md)\u003Cbr>🚀PostalCodes️PostalCodes️ | 按 [字典序](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FLexicographic_order)、[自然排序](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FNatural_sort_order)、数值、逆序、唯一值或随机（可选种子）顺序对 CSV 数据进行排序（另请参阅 `extsort` 和 `sortcheck` 命令）。|\n| [sortcheck](docs\u002Fhelp\u002Fsortcheck.md)\u003Cbr>PostalCodes️ | 检查 CSV 文件是否已排序。使用 --json 选项还可获取记录总数、排序中断次数和重复记录数。|\n| [split](docs\u002Fhelp\u002Fsplit.md)\u003Cbr>PostalCodes️PostalCodes️ | 将一个 CSV 文件拆分为多个 CSV 文件。可以根据行数、块数或文件大小进行拆分。如果按行或块数拆分且存在索引，则可利用多线程加速处理。|\n| 
[sqlp](docs\u002Fhelp\u002Fsqlp.md)✨\u003Cbr>PostalCodes️PostalCodes️ 🚀🐻‍❄️PostalCodes️🪄 | \u003Ca name=\"sqlp_deeplink\">\u003C\u002Fa>在多个 CSV、Parquet、JSONL 和 Arrow 文件上运行 [Polars](https:\u002F\u002Fpola.rs) SQL（PostgreSQL 的方言）查询——将查询转换为极速的 Polars [LazyFrame](https:\u002F\u002Fdocs.pola.rs\u002Fuser-guide\u002Flazy\u002F) 表达式，从而处理超出内存容量的 CSV 文件。查询结果可保存为 CSV、JSON、JSONL、Parquet、Apache Arrow IPC 和 Apache Avro 格式。|\n| [stats](docs\u002Fhelp\u002Fstats.md)\u003Cbr>PostalCodes️PostalCodes️ | \u003Ca name=\"stats_deeplink\">\u003C\u002Fa>计算 [汇总统计](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FSummary_statistics)（总和、最小值\u002F最大值\u002F范围、排序顺序\u002F排序程度、最小值\u002F最大值\u002F平均长度、均值、均值的标准误差 (SEM)、几何\u002F调和平均数、标准差、方差、变异系数 (CV)、空值计数、最大精度、稀疏度、四分位数、四分位距 (IQR)、上下界、偏度、中位数、众数\u002F反众数、基数和独特性比例）并为 CSV 中的每一列做出确定性的数据类型推断（空值、字符串、浮点数、整数、日期、日期时间、布尔值）（[示例](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fscripts\u002FNYC_311_SR_2010-2020-sample-1M.stats.csv) — [更多信息](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fwiki\u002FSupplemental#stats-command-output-explanation)）。\u003Cbr>如果存在索引，可利用多线程加速处理（有了索引，可以在不到 7.3 秒内对纽约市 311 数据（15GB，2800 万行）进行“流式”统计！）。|\n| [table](docs\u002Fhelp\u002Ftable.md)\u003Cbr>🤯 | 使用 [弹性制表位](https:\u002F\u002Fgithub.com\u002FBurntSushi\u002Ftabwriter)对 CSV 输出进行对齐，便于查看；或创建“对齐的 TSV”文件或固定宽度格式文件。若需交互式查看 CSV 文件，请使用 `lens` 命令。|\n| [template](docs\u002Fhelp\u002Ftemplate.md)\u003Cbr>PostalCodes️PostalCodes️🚀PostalCodes️📚⛩️ (![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png)) | 使用 [Mini Jinja](https:\u002F\u002Fdocs.rs\u002Fminijinja\u002Flatest\u002Fminijinja\u002F) 模板引擎，基于 CSV 数据渲染模板（[示例](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002F4645ec07b5befe3b0c0e49bf0f547315d0d7514b\u002Fsrc\u002Fcmd\u002Ftemplate.rs#L18-L44)）。|\n| [to](docs\u002Fhelp\u002Fto.md)✨\u003Cbr>🚀🐻‍❄️PostalCodes️ | 将 CSV 文件转换为 
[Parquet](https:\u002F\u002Fparquet.apache.org)、[PostgreSQL](https:\u002F\u002Fwww.postgresql.org)、[SQLite](https:\u002F\u002Fwww.sqlite.org\u002Findex.html)、Excel (XLSX)、[LibreOffice Calc](https:\u002F\u002Fwww.libreoffice.org\u002Fdiscover\u002Fcalc\u002F) (ODS) 和 [Data Package](https:\u002F\u002Fdatahub.io\u002Fdocs\u002Fdata-packages\u002Ftabular)。|\n| [tojsonl](docs\u002Fhelp\u002Ftojsonl.md)\u003Cbr>PostalCodes️PostalCodes️ | 智能地将 CSV 转换为换行分隔的 JSON（[JSONL](https:\u002F\u002Fjsonlines.org\u002F)\u002F[NDJSON](http:\u002F\u002Fndjson.org\u002F)）。通过先扫描 CSV 文件，它会“智能”推断出每列合适的 JSON 数据类型。有关将 JSONL 转换为 CSV 的方法，请参阅 `jsonl` 命令。|\n| [transpose](docs\u002Fhelp\u002Ftranspose.md)\u003Cbr>PostalCodes️PostalCodes️ | 转置 CSV 文件的行\u002F列。|\n| [validate](docs\u002Fhelp\u002Fvalidate.md)\u003Cbr>PostalCodes️PostalCodes️ 🚀🌐PostalCodes️📚PostalCodes️ (![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png)) | \u003Ca name=\"validate_deeplink\">\u003C\u002Fa>使用 [JSON Schema Validation (Draft 2020-12)](https:\u002F\u002Fjson-schema.org\u002Fdraft\u002F2020-12\u002Fjson-schema-validation.html)验证 CSV 数据 [_极速_](https:\u002F\u002Fgithub.com\u002FStranger6667\u002Fjsonschema-rs?tab=readme-ov-file#performance \"使用 jsonschema-rs — Rust 中最快的 JSON Schema 验证工具\")（例如：使用 [`schema`](#schema_deeplink) 命令生成的 [纽约市 311 数据的模式](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fresources\u002Ftest\u002F311_Service_Requests_from_2010_to_Present-2022-03-04.csv.schema.json)，每秒可验证多达 780,031 行[^1]），并将无效记录放入单独的文件中，同时附上详细的验证错误报告。\u003Cbr>\u003Cbr>支持多种自定义 JSON Schema 格式和关键字：\u003Cbr> * `currency` 自定义格式，带有 [ISO-4217](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FISO_4217) 验证\u003Cbr> * `dynamicEnum` 自定义关键字，支持基于文件系统或 URL 上的 CSV 进行枚举验证（支持 http\u002Fhttps\u002Fckan 和 dathere URL 方案）\u003Cbr>* `uniqueCombinedWith` 自定义关键字，用于验证跨多列的唯一性，以实现复合键验证。\u003Cbr>\u003Cbr>如果没有提供 JSON Schema 文件，则验证 CSV 是否符合 [RFC 4180 标准](#rfc-4180-csv-standard)且采用 UTF-8 
编码。|\n\n\u003Cdiv style=\"text-align: right\">\u003Csub>\u003Csup>性能指标基于配备 32GB 内存的 M2 Pro 12 核 Mac Mini 收集\u003C\u002Fsup>\u003C\u002Fsub>\u003C\u002Fdiv>\n\n\u003Ca name=\"legend_deeplink\">✨\u003C\u002Fa>: 由[功能标志](#feature-flags)启用。  \n📇: 在可用时使用索引。  \n🤯: 将整个 CSV 文件加载到内存中，不过 `dedup`、`stats` 和 `transpose` 也有“流式”模式。  \n😣: 使用与 CSV 中列基数成比例的额外内存。  \n🧠: 昂贵的操作会被缓存，并为 fetch 命令提供会话间 Redis\u002F磁盘缓存支持。  \n🗄️: [扩展输入支持](#extended-input-support)。  \n🗃️: [有限的扩展输入支持](#limited-extended-input-support)。  \n🐻‍❄️: 命令由[![polars 0.53.0:py_1.39.3:880651f](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpolars-0.53.0:py_1.39.3_880651f-blue?logo=polars)](https:\u002F\u002Fgithub.com\u002Fpola-rs\u002Fpolars\u002Freleases\u002Ftag\u002Fpy-1.39.3) 向量化查询引擎驱动\u002F加速。  \n🤖: 命令使用自然语言处理或生成式 AI。  \n🏎️: 多线程且\u002F或在有索引（📇）时速度更快。  \n🚀: 即使没有索引也能多线程运行。  \n![CKAN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_76dda6475e59.png) : 具备与 [CKAN](https:\u002F\u002Fckan.org) 适配的集成选项。  \n🌐: 具备网络感知选项。  \n🔣: 要求输入为 UTF-8 编码。  \n👆: 具备强大的列选择器支持。语法请参见 [`select`](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fsrc\u002Fcmd\u002Fselect.rs#L2)。  \n🪄: “自动魔法”命令，利用统计和\u002F或频率表实现更智能、更快速的操作。  \n📚: 具备查找表支持，可在运行时对本地或远程参考 CSV 文件进行“查找”。  \n🌎: 具备地理空间能力。  \n⛩️: 使用 [Mini Jinja](https:\u002F\u002Fdocs.rs\u002Fminijinja\u002Flatest\u002Fminijinja\u002F) 模板引擎。  \n![Luau](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_afb2a208d532.png) : 使用 [Luau](https:\u002F\u002Fluau.org\u002F) [0.709](https:\u002F\u002Fgithub.com\u002FRoblox\u002Fluau\u002Freleases\u002Ftag\u002F0.709) 作为嵌入式脚本化 [DSL](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FDomain-specific_language)。  \n🖥️: 属于用户界面（UI）功能组\n\n[^1]: 参见 [`validate_index` 基准测试](https:\u002F\u002Fqsv.dathere.com\u002Fbenchmarks)\n\n\n\n## 安装选项\n\n> [!NOTE]\n> 若要安装 qsv MCP 服务器及可选的 Claude Cowork 插件，请参阅[入门指南](.claude\u002Fskills\u002Fdocs\u002Fguides\u002FSTART_HERE.md)。\n\n### 选项 0: qsv 
pro\n\n如果您更倾向于使用图形界面而非命令行来探索数据，不妨试试 **[qsv pro](https:\u002F\u002Fqsvpro.dathere.com)**。qsv pro 基于 qsv 构建，只需简单地拖放文件即可快速分析电子表格数据，同时还提供许多其他交互式功能。更多信息请访问 [qsvpro.dathere.com](https:\u002F\u002Fqsvpro.dathere.com)，或点击下方任意一个徽章直接下载 qsv pro。\n\n\u003Cdiv style=\"display: flex; gap: 1rem;\">\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fqsv-pro\u002Fwindows\">\u003Cimg alt=\"qsv pro Windows 下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_a55368354c85.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fqsv-pro\u002Fmacos\">\u003Cimg alt=\"qsv pro macOS 下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_a2fb3ac014a3.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fqsv-pro\u002Flinux-deb\">\u003Cimg alt=\"qsv pro Linux (deb) 下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_658e05f19c09.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fqsv-pro\u002Flinux-rpm\">\u003Cimg alt=\"qsv pro Linux (rpm) 下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_c72909f063a7.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fqsv-pro\u002Flinux-appimage\">\u003Cimg alt=\"qsv pro Linux (AppImage) 下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_2ce62267c43b.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n### 选项 1：下载预编译二进制文件\n\n适用于 Linux、macOS 和 Windows 的最新 qsv 版本的全功能预编译 [二进制变体](#variants) 已在 [此处](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Freleases\u002Flatest) 提供下载，其中包括使用 [Rust 
Nightly](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F70745970\u002Frust-nightly-vs-beta-version) 编译的二进制文件（[更多信息](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FPERFORMANCE.md#nightly-release-builds)）。您可以根据您的平台点击下方的徽章，下载包含预编译二进制文件的 ZIP 压缩包。\n\n\u003Cdiv style=\"display: flex; gap: 1rem;\">\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Flinux-x86_64-gnu\">\u003Cimg alt=\"qsv Linux x86_64 GNU 下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_07209b955c55.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Flinux-aarch64-gnu\">\u003Cimg alt=\"qsv Linux AArch64 GNU 下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_3d0fad2cd6de.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Flinux-x86_64-musl\">\u003Cimg alt=\"qsv Linux x86_64 MUSL 下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_81b4e2adf40b.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Flinux-powerpc64le-gnu\">\u003Cimg alt=\"qsv linux-powerpc64le-gnu 下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_97dc5b7fa8df.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fmacos-silicon\">\u003Cimg alt=\"qsv macOS 下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_0378e21668eb.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fwindows-msvc\">\u003Cimg alt=\"qsv Windows MSVC 下载徽章\" 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_cbb9cecaf56a.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fwindows-aarch64-msvc\">\u003Cimg alt=\"qsv Windows AArch64 MSVC 下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_f0bd0b7458f5.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003Ca target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fwindows-gnu\">\u003Cimg alt=\"qsv Windows GNU 下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_19779b920b3a.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n针对 Apple Silicon、Windows for ARM、[IBM Power Servers (PowerPC64 LE Linux)](https:\u002F\u002Fwww.ibm.com\u002Fproducts\u002Fpower) 以及 [IBM Z 大型机 (s390x)](https:\u002F\u002Fwww.ibm.com\u002Fproducts\u002Fz) 的预编译二进制文件启用了 CPU 优化（[`target-cpu=native`](https:\u002F\u002Frust-lang.github.io\u002Fpacked_simd\u002Fperf-guide\u002Ftarget-feature\u002Frustflags.html#target-cpu)），以进一步提升性能。\n\n由于 x86_64 平台存在过多的 CPU 变体，容易导致非法指令（SIGILL）错误，因此我们未在该平台的预编译二进制文件中启用 CPU 优化。如果您仍然遇到 SIGILL 错误，发布 ZIP 压缩包中也包含了“便携式”二进制文件（所有 CPU 优化均被禁用），其文件名带有“p for portable”后缀，例如 `qsvp`、`qsvplite` 或 `qsvpdp`。\n\n对于 Windows，我们还提供了一个用于 x86_64 MSVC `qsvp` 二进制文件的 MSI “简易安装程序”。下载并安装简易安装程序后，启动它并点击“Install qsv”，即可将最新的 `qsvp` 预编译二进制文件下载到已添加至您 `PATH` 环境变量的文件夹中。此后 qsv 即可完成安装，您可以在新的终端中直接使用。\n\n\u003Ca download href=\"https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv-easy-windows-installer\u002Freleases\u002Fdownload\u002Fv1.1.1\u002Fqsv-easy-installer_1.1.1_x64_en-US.msi\">\u003Cimg alt=\"qsv Windows 简易安装程序下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_8ceb32c89df4.png\" width=\"200\" \u002F>\u003C\u002Fa>\n\n对于 macOS，我们的二进制文件使用了“临时签名”[来签署](https:\u002F\u002Fusers.rust-lang.org\u002Ft\u002Fdistributing-cli-apps-on-macos\u002F70223)，因此您需要先[设置适当的 Gatekeeper 
安全设置](https:\u002F\u002Fsupport.apple.com\u002Fen-us\u002FHT202491)，或者运行以下命令，在首次运行 qsv 之前移除其隔离属性：\n\n```bash\n# 如果您安装的是其他变体，请将 qsv 替换为 qsvmcp、qsvlite 或 qsvdp\nxattr -d com.apple.quarantine qsv\n```\n\n使用预编译二进制文件的另一个优势是，它们启用了 `self_update` 功能，允许您通过简单的 `qsv --update` 命令快速将 qsv 更新至最新版本。为了进一步提高安全性，`self_update` 功能仅从 [此 GitHub 仓库](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Freleases) 获取发布版本，并在安装更新前自动验证下载的 ZIP 压缩包的签名。\n\n> [!NOTE]\n> `musl` 预编译二进制文件中不包含 `luau` 功能[^3]。\n\n#### 手动验证预编译二进制文件 ZIP 压缩包的完整性\n所有预编译二进制文件 ZIP 压缩包均使用 [zipsign](https:\u002F\u002Fgithub.com\u002FKijewski\u002Fzipsign#zipsign) 进行签名，公钥为 [qsv-zipsign-public.key](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fraw\u002Fmaster\u002Fsrc\u002Fqsv-zipsign-public.key)。要验证下载的 ZIP 压缩包的完整性：\n\n```bash\n# 如果尚未安装 zipsign\ncargo install zipsign\n\n# 在下载 ZIP 压缩包和 qsv-zipsign-public.key 文件后，验证预编译二进制文件 ZIP 压缩包的完整性。\n# 将 \u003CPREBUILT-BINARY-ARCHIVE.zip> 替换为下载的 ZIP 压缩包名称。\n# 例如：zipsign verify zip qsv-0.118.0-aarch64-apple-darwin.zip qsv-zipsign-public.key\nzipsign verify zip \u003CPREBUILT-BINARY-ARCHIVE.zip> qsv-zipsign-public.key\n```\n\n### 选项 2：包管理器与发行版\n\nqsv 也被多个包管理器和发行版所分发。\n\n[![打包状态](https:\u002F\u002Frepology.org\u002Fbadge\u002Fvertical-allrepos\u002Fqsv.svg)](https:\u002F\u002Frepology.org\u002Fproject\u002Fqsv\u002Fversions)\n\n以下是使用各种包管理器和发行版安装 qsv 的相关命令：\n```bash\n# Arch Linux AUR (https:\u002F\u002Faur.archlinux.org\u002Fpackages\u002Fqsv)\nyay -S qsv\n\n# Homebrew on macOS\u002FLinux (https:\u002F\u002Fformulae.brew.sh\u002Fformula\u002Fqsv#default)\nbrew install qsv\n\n# MacPorts on macOS (https:\u002F\u002Fports.macports.org\u002Fport\u002Fqsv\u002F)\nsudo port install qsv\n\n# Mise on Linux\u002FmacOS\u002FWindows (https:\u002F\u002Fmise.jdx.dev)\nmise use -g qsv@latest\n\n# Nixpkgs on Linux\u002FmacOS (https:\u002F\u002Fsearch.nixos.org\u002Fpackages?channel=unstable&show=qsv&from=0&size=50&sort=relevance&type=packages&query=qsv)\nnix-shell -p qsv\n\n# Scoop on Windows 
(https:\u002F\u002Fscoop.sh\u002F#\u002Fapps?q=qsv)\nscoop install qsv\n\n# Void Linux (https:\u002F\u002Fvoidlinux.org\u002Fpackages\u002F?arch=x86_64&q=qsv)\nsudo xbps-install qsv\n\n# Conda-forge (https:\u002F\u002Fanaconda.org\u002Fconda-forge\u002Fqsv)\nconda install conda-forge::qsv\n```\n\n需要注意的是，这些包管理器或发行版提供的 qsv 支持的功能有所不同（例如，Homebrew 启用了 `apply`、`fetch`、`foreach`、`geocode`、`lens`、`luau` 和 `to` 等功能。不过，它会自动为 `bash`、`fish` 和 `zsh` shell 安装命令补全）。\n\n要查看某个包或发行版中的 qsv 启用了哪些功能，可以运行 `qsv --version`（[更多信息](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FPERFORMANCE.md#version-details)）。\n\n秉承开源精神，这些软件包由志愿者维护，他们希望让 qsv 在各种环境中更容易安装。我们非常感谢他们的付出，并通过 GitHub 与这些包的维护者保持松散的合作关系，但请知悉，这些软件包是由第三方维护的。\n\n#### Debian 软件包\ndatHere 还维护了一个针对 x86_64 架构上最新 Ubuntu LTS 的 Debian 软件包，以便更方便地与 DataPusher+ 一起安装 qsv。\n\n在 Ubuntu\u002FDebian 上安装 qsv 的步骤如下：\n\n```bash\nwget -O - https:\u002F\u002Fdathere.github.io\u002Fqsv-deb-releases\u002Fqsv-deb.gpg | sudo gpg --dearmor -o \u002Fusr\u002Fshare\u002Fkeyrings\u002Fqsv-deb.gpg\necho \"deb [signed-by=\u002Fusr\u002Fshare\u002Fkeyrings\u002Fqsv-deb.gpg] https:\u002F\u002Fdathere.github.io\u002Fqsv-deb-releases .\u002F\" | sudo tee \u002Fetc\u002Fapt\u002Fsources.list.d\u002Fqsv.list\nsudo apt update\nsudo apt install qsv\n```\n\n### 选项 3：从源码编译\n\n如果你已经安装了 [Rust](https:\u002F\u002Fwww.rust-lang.org\u002Ftools\u002Finstall)，就可以从源码编译[^2]：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv.git\ncd qsv\ncargo build --release --locked --bin qsv --features all_features\n```\n\n编译后的二进制文件将位于 `.\u002Ftarget\u002Frelease\u002F`。\n\n要编译不同的 [变体](#variants)并启用可选的 [特性](#feature-flags)：\n\n```bash\n# 编译启用所有特性的 qsv\ncargo build --release --locked --bin qsv --features feature_capable,apply,fetch,foreach,geocode,luau,mcp,magika,polars,python,self_update,to,ui\n# 简写\ncargo build --release --locked --bin qsv -F all_features\n# 为当前 CPU 启用所有 CPU 优化（警告：生成的二进制文件不可移植）\nCARGO_BUILD_RUSTFLAGS='-C 
target-cpu=native' cargo build --release --locked --bin qsv -F all_features\n\n# 或者只启用 fetch 和 foreach 特性的 qsv\ncargo build --release --locked --bin qsv -F feature_capable,fetch,foreach\n\n# 针对 qsvmcp —— MCP 服务器优化变体\ncargo build --release --locked --bin qsvmcp -F qsvmcp\n\n# 针对 qsvlite\ncargo build --release --locked --bin qsvlite -F lite\n\n# 针对 qsvdp\ncargo build --release --locked --bin qsvdp -F datapusher_plus\n```\n\n[^2]: 当然，你还需要链接器和 C 编译器。Linux 用户通常应根据其发行版的文档安装 GCC 或 Clang。例如，如果你使用 Ubuntu，可以安装 `build-essential` 包。在 macOS 上，可以通过运行 `$ xcode-select --install` 来获取 C 编译器。\n对于 Windows，这意味着需要安装 [Visual Studio 2022](https:\u002F\u002Fvisualstudio.microsoft.com\u002Fdownloads\u002F)。在选择工作负载时，请包含“使用 C++ 的桌面开发”、Windows 10 或 11 SDK 以及英语语言包，同时根据需要添加其他语言包。\n\n> [!NOTE]\n> 若要使用 Rust nightly 版本进行构建，请参阅 [Nightly Release Builds](docs\u002FPERFORMANCE.md#nightly-release-builds)。\n`feature_capable`、`qsvmcp`、`lite` 和 `datapusher_plus` 是互斥的特性。更多信息请参阅 [Special Build Features](docs\u002FFEATURES.md#special-features-for-building-qsv-binary-variants)。\n\n### 变体\n\nqsv 有五种二进制变体：\n\n* `qsv` —— 具备 [特性](#feature-flags)能力（✨），预编译的二进制文件启用了除 Python 外的所有适用特性[^3]。\n* `qsvpy` —— 与 `qsv` 相同，但启用了 Python 特性。有三个子变体可供选择：qsvpy311、qsvpy312 和 qsvpy313，分别使用最新补丁版本的 Python 3.11、3.12 和 3.13 进行编译。我们需要为每个 Python 版本提供一个二进制文件，因为 Python 是动态链接的（[更多信息](docs\u002FINTERPRETERS.md#building-qsv-with-python-feature)）。\n* `qsvmcp` —— 针对 [MCP（模型上下文协议）](https:\u002F\u002Fmodelcontextprotocol.io\u002F) 服务器使用进行了优化，启用了 geocode、luau、mcp、polars、self_update 和 to 等特性。与 `qsv` 共享 `src\u002Fmain.rs`。\n* `qsvlite` —— 所有特性均被禁用（约为 `qsv` 大小的 16%）。如果你是从 [xsv](https:\u002F\u002Fgithub.com\u002FBurntSushi\u002Fxsv) 迁移过来，并希望获得相同的体验和功能集，那么这个变体非常适合你。\n* `qsvdp` —— 针对与 [DataPusher+](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fdatapusher-plus) 一起使用进行了优化，仅包含与 DataPusher+ 相关的命令；嵌入了 [`luau`](#luau_deeplink) 解释器；包含了 [`applydp`](#applydp_deeplink)，即 `apply` 特性的精简版；禁用了 `--progressbar` 选项；并且自更新功能仅检查新版本，需要显式使用 `--update` 选项（约为 `qsv` 大小的 16%）。\n\n> 
[!NOTE]\n> qsv 还有带有 “p” 后缀的“便携式”子变体——`qsvp`、`qsvplite` 和 `qsvpdp`。这些子变体在编译时不启用任何 CPU 特性。如果你使用的是较旧的 CPU 架构，或者在运行常规 qsv 二进制文件时遇到 “非法指令 (SIGILL)” 错误，可以使用这些子变体。\n\n[^3]: 默认情况下，预编译的二进制文件在 musl 平台上不会启用 `luau` 特性。这是因为我们使用基于 Ubuntu 20.04 LTS 的 GitHub Action Runners，并采用 [musl libc](https:\u002F\u002Fmusl.libc.org\u002F) 工具链进行交叉编译。然而，Ubuntu 是基于 glibc 的发行版，而非 musl。我们通过 [交叉编译](https:\u002F\u002Fblog.logrocket.com\u002Fguide-cross-compilation-rust\u002F) 来解决这个问题。\n遗憾的是，这使得我们无法编译启用 `luau` 特性的二进制文件，因为那样需要静态链接主机操作系统的 libc 库。如果你需要在 musl 系统上使用 `luau` 特性，就需要在自己的 musl 基础 Linux 发行版（如 Alpine、Void 等）上自行从源码编译。\n\n### Shell 补全\nqsv 提供了功能强大且可扩展的 [shell 补全](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FCommand-line_completion) 支持。目前支持以下 shell：`bash`、`zsh`、`powershell`、`fish`、`nushell`、`fig` 和 `elvish`。您可以通过点击下方的任一徽章来下载适用于您所用 shell 的补全脚本：\n\n\u003Cdiv style=\"display: flex; gap: 1rem;\">\n\u003Ca download href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fbash-shell\">\u003Cimg alt=\"qsv Bash shell 补全下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_256e6e9cdb03.png\" width=\"140\" \u002F>\u003C\u002Fa>\n\u003Ca download target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fpowershell-shell\">\u003Cimg alt=\"qsv PowerShell shell 补全下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_4c7b42a04c4b.png\" width=\"140\" \u002F>\u003C\u002Fa>\n\u003Ca download target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fzsh-shell\">\u003Cimg alt=\"qsv zsh shell 补全下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_801969651086.png\" width=\"140\" \u002F>\u003C\u002Fa>\n\u003Ca download target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Ffish-shell\">\u003Cimg alt=\"qsv fish shell 补全下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_d682f6a3ee9d.png\" width=\"140\" 
\u002F>\u003C\u002Fa>\n\u003Ca download target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Fnushell-shell\">\u003Cimg alt=\"qsv nushell shell 补全下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_aa925e019e7e.png\" width=\"140\" \u002F>\u003C\u002Fa>\n\u003Ca download target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Ffig-shell\" width=\"160\" \u002F>\u003Cimg alt=\"qsv fig shell 补全下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_0fdeba390ceb.png\" width=\"140\" \u002F>\u003C\u002Fa>\n\u003Ca download target=\"_blank\" href=\"https:\u002F\u002Fqsv.dathere.com\u002Fdownload\u002Felvish-shell\">\u003Cimg alt=\"qsv elvish shell 补全下载徽章\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_d290e8042641.png\" width=\"140\" \u002F>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n如需自定义 shell 补全，请参阅 [Shell 补全](contrib\u002Fcompletions\u002FREADME.md) 文档。如果您使用的是 Bash，也可以按照 [100.dathere.com](https:\u002F\u002F100.dathere.com\u002Fexercises-setup.html#optional-set-up-qsv-completions) 上的分步教程，学习如何启用 Bash shell 补全。\n\n## 正则表达式语法\n\n`--select` 选项以及多个命令（`apply`、`applydp`、`datefmt`、`exclude`、`fetchpost`、`replace`、`schema`、`search`、`searchset`、`select`、`sqlp`、`stats` 和 `validate`）允许用户指定正则表达式。我们使用 [`regex`](https:\u002F\u002Fdocs.rs\u002Fregex) crate 来解析、编译和执行这些表达式。[^4]\n\n[^4]: 这与 [`ripgrep`](https:\u002F\u002Fgithub.com\u002FBurntSushi\u002Fripgrep#ripgrep-rg) 所使用的正则引擎相同——这款[极速的 grep 替代工具](https:\u002F\u002Fblog.burntsushi.net\u002Fripgrep\u002F)为 Visual Studio 的[神奇](https:\u002F\u002Flab.cccb.org\u002Fen\u002Farthur-c-clarke-any-sufficiently-advanced-technology-is-indistinguishable-from-magic\u002F)“在文件中查找”功能提供了支持。\n\n其语法可在 [这里](https:\u002F\u002Fdocs.rs\u002Fregex\u002Flatest\u002Fregex\u002F#syntax) 查看，并说明：“该语法与其他正则引擎类似，但缺少一些难以高效实现的功能，包括但不限于环视和反向引用。作为补偿，本 crate 中的所有正则搜索都具有最坏情况下的 O(m * n) 时间复杂度，其中 m 与正则表达式的大小成正比，n 
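下面是一个最小示意，演示如何用上述 `regex` crate 的语法在选定列上运行 `qsv search`（示例中的文件名和数据均为假设；`--select` 等选项的确切用法请以各命令的帮助文档为准；若本机未安装 qsv，示例会跳过搜索步骤）：

```bash
# 构造一个小的示例 CSV（假设性数据）
printf 'name,phone\nAlice,212-555-0100\nBob,n/a\n' > contacts.csv
# 使用 regex crate 语法在 phone 列上匹配电话号码格式；
# 注意该引擎不支持环视与反向引用
if command -v qsv >/dev/null 2>&1; then
  qsv search --select phone '^[0-9]{3}-[0-9]{3}-[0-9]{4}$' contacts.csv
fi
```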
与被搜索字符串的长度成正比。”\n\n如果您想测试自己的正则表达式，[regex101](https:\u002F\u002Fregex101.com) 支持 `regex` crate 所使用的语法。只需选择“Rust”模式即可。\n\n> JSON Schema 验证正则表达式说明：当 `schema` 命令推断 JSON Schema 验证文件时，若使用了 `--pattern-columns` 选项，它会为选定的列生成一个正则表达式。虽然生成的正则表达式一定有效，但未必是最优的。\u003Cbr\u002F>在将生成的 JSON Schema 文件用于生产环境的 `validate` 命令之前，建议用户根据需要检查并优化生成的正则表达式。\u003Cbr\u002F>需要注意的是，在 JSON Schema 验证模式下，`validate` 命令还可通过 `--fancy-regex` 选项支持带有环视和反向引用的“高级”正则表达式。\n\n## 文件格式\n\nqsv 支持 UTF-8\u002FASCII 编码的 CSV（`.csv`）、SSV（`.ssv`）和 TSV 文件（`.tsv` 和 `.tab`）。CSV 文件默认使用逗号（`,`）作为分隔符，SSV 文件使用分号（`;`）作为分隔符，而 TSV 文件则使用制表符（`\\t`）作为分隔符。分隔符是一个单个 ASCII 字符，可以通过 `--delimiter` 命令行选项、`QSV_DEFAULT_DELIMITER` 环境变量来设置，或者在启用 `QSV_SNIFF_DELIMITER` 时自动检测。\n\n当使用 `--output` 选项时，qsv 会将文件以 UTF-8 编码输出，并根据文件扩展名自动调整生成文件中的分隔符——即 `.csv` 文件使用逗号，`.ssv` 文件使用分号，`.tsv` 和 `.tab` 文件使用制表符。\n\nJSON 文件会被识别，并通过 [`json`](\u002Fsrc\u002Fcmd\u002Fjson.rs#L2) 命令转换为 CSV 格式。[JSONL](https:\u002F\u002Fjsonlines.org\u002F) 和 [NDJSON](http:\u002F\u002Fndjson.org\u002F) 文件也会被识别，并分别通过 [`jsonl`](\u002Fsrc\u002Fcmd\u002Fjsonl.rs#L2) 和 [`tojsonl`](\u002Fsrc\u002Fcmd\u002Ftojsonl.rs#L2) 命令进行 CSV 格式的相互转换。\n\n`fetch` 和 `fetchpost` 命令在不使用 `--new-column` 选项时会生成 JSONL 文件；而在使用 `--report` 选项时，则会生成 TSV 文件。\n\n`excel`、`safenames`、`sniff`、`sortcheck` 和 `validate` 命令在启用 JSON 选项时会生成符合 [JSON API 1.1 规范](https:\u002F\u002Fjsonapi.org\u002Fformat\u002F) 的 JSON 文件，从而返回详细的机器友好型元数据，可供其他系统使用。\n\n`schema` 命令会生成一个带有 `.schema.json` 扩展名的 [JSON Schema Validation (Draft 2020-12)](https:\u002F\u002Fjson-schema.org\u002Fdraft\u002F2020-12\u002Fjson-schema-validation.html) 文件，该文件可与 `validate` 命令结合使用，以验证具有相同模式的其他 CSV 文件。\n\n`describegpt` 和 `frequency` 命令都会生成 [TOON](https:\u002F\u002Ftoonformat.dev) 文件。TOON 是一种紧凑且易于阅读的 JSON 数据模型编码格式，专用于 LLM 提示词。\n\n`excel` 命令支持 Excel 和 Open Document Spreadsheet（ODS）文件（`.xls`、`.xlsx`、`.xlsm`、`.xlsb` 和 `.ods` 文件）。\n\n说到 Excel，如果您在使用 Excel 打开 qsv 生成的 CSV 文件时遇到问题，请设置 `QSV_OUTPUT_BOM` 环境变量，以便在生成的 CSV 文件开头添加一个 
[字节顺序标记](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FByte_order_mark)。这是为了解决 [Excel 的 UTF-8 编码检测缺陷](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F155097\u002Fmicrosoft-excel-mangles-diacritics-in-csv-files)。\n\n`to` 命令可以将 CSV 文件转换为 Parquet、Excel `.xlsx`、LibreOffice\u002FOpenOffice Calc `.ods` 以及 [Data Package](https:\u002F\u002Fdatahub.io\u002Fdocs\u002Fdata-packages\u002Ftabular) 格式，并将数据导入到 [PostgreSQL](https:\u002F\u002Fwww.postgresql.org) 和 [SQLite](https:\u002F\u002Fwww.sqlite.org\u002Findex.html) 数据库中。\n\n`sqlp` 命令可以将查询结果以 CSV、JSON、JSONL、[Parquet](https:\u002F\u002Fparquet.apache.org)、[Apache Arrow IPC](https:\u002F\u002Farrow.apache.org\u002Fdocs\u002Fformat\u002FColumnar.html#ipc-file-format) 和 [Apache AVRO](https:\u002F\u002Favro.apache.org) 格式输出。Polars SQL 还支持通过其 `read_csv`、`read_ndjson`、`read_parquet` 和 `read_ipc` [表函数](https:\u002F\u002Fgithub.com\u002Fpola-rs\u002Fpolars\u002Fblob\u002F91a423fea2dc067837db65c3608e3cbc1112a6fc\u002Fcrates\u002Fpolars-sql\u002Fsrc\u002Ftable_functions.rs#L18-L43)直接读取多种格式的外部文件。\n\n`sniff` 命令还可以通过 `--no-infer` 或 `--just-mime` 选项检测任何文件的 MIME 类型，无论是本地文件还是远程文件（支持 http 和 https 协议）。它可以检测超过 130 种文件格式，包括 MS Office\u002FOpen Document 文件、JSON、XML、PDF、PNG、JPEG，以及 GPX、GML、KML、TML、TMX、TSX、TTML 等专业地理空间格式。完整的列表请点击 [这里](https:\u002F\u002Fdocs.rs\u002Ffile-format\u002Flatest\u002Ffile_format\u002F#reader-features)。\n\n> [!NOTE]\n> 当启用 `polars` 功能时，qsv 还可以直接读取 `.parquet`、`.ipc`、`.arrow`、`.json` 和 `.jsonl` 文件。\n\n### 扩展输入支持\n\n`cat`、`headers`、`sqlp`、`to` 和 `validate` 命令具有扩展输入支持（🗄️）。如果输入是 `-` 或空值，命令会尝试使用标准输入作为输入源。如果不是，则会检查是否为目录，如果是目录，就会将目录中的所有文件作为输入文件。\n\n如果输入是一个文件，首先会检查其是否具有 `.infile-list` 扩展名。如果有，则会加载该文本文件，并将每一行解析为一个输入文件路径。这是一种更快速便捷的方式，无需将大量输入文件逐一作为命令行参数传递即可处理它们。此外，这些文件路径可以位于文件系统的任何位置，甚至跨不同存储卷。如果输入文件路径不是绝对路径，则会被视为相对于当前工作目录的相对路径。空行和以 `#` 开头的行会被忽略。无效的文件路径会被记录为警告并跳过。\n\n对于目录和 `.infile-list` 输入，带有 `.sz` 或 `.zip` 扩展名的 Snappy 压缩文件会自动解压缩。\n\n最后，如果输入只是一个普通的文件，则会被视为常规输入文件。\n\n#### 有限的扩展输入支持\n`describegpt`、`lens`、`slice` 和 `tojsonl` 
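下面是一个 `.infile-list` 用法的最小示意（文件名均为假设；若本机未安装 qsv，示例会跳过最后一步）：

```bash
# 准备两个列结构相同的 CSV 文件（假设性数据）
printf 'id,name\n1,ann\n' > a.csv
printf 'id,name\n2,bob\n' > b.csv
# .infile-list 文件：每行一个输入文件路径，
# 空行和以 # 开头的行会被忽略
cat > inputs.infile-list <<'EOF'
# 要合并的文件清单
a.csv
b.csv
EOF
if command -v qsv >/dev/null 2>&1; then
  # cat rows 会按行拼接清单中列出的所有文件
  qsv cat rows inputs.infile-list
fi
```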
命令具有有限的扩展输入支持（🗃️）。它们的不同之处在于只处理一个文件。如果提供了 `.infile-list` 文件或压缩的 `.sz` 或 `.zip` 文件，它们只会处理第一个文件。\n\n### 自动压缩\u002F解压缩\n\nqsv 支持使用 [Snappy 帧格式](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fsnappy\u002Fblob\u002Fmain\u002Fframing_format.txt)进行_自动压缩\u002F解压缩_。选择 Snappy 而不是更流行的 gzip 等压缩格式，是因为它专为[高性能流式压缩与解压缩]设计（压缩速度可达 2.58 GB\u002F秒，解压缩速度可达 0.89 GB\u002F秒）。\n\n除 `index`、`extdedup` 和 `extsort` 命令外，对于所有其他命令，如果输入文件带有 `.sz` 扩展名，qsv 在读取时会_自动_进行流式解压缩。此外，如果输入文件具有扩展的 CSV\u002FTSV `.sz` 扩展名（例如 nyc311.csv.sz、nyc311.tsv.sz 或 nyc311.tab.sz），qsv 还会根据文件扩展名确定使用的分隔符。\n\n同样地，如果 `--output` 文件带有 `.sz` 扩展名，qsv 在写入时会_自动_进行流式压缩。若输出文件具有扩展的 CSV\u002FTSV `.sz` 扩展名，qsv 也会依据文件扩展名来决定分隔符。\n\n需要注意的是，压缩后的文件无法建立索引，因此依赖索引加速的命令（如 `frequency`、`schema`、`split`、`stats`、`tojsonl`）将不会启用多线程。在没有索引的情况下，随机访问功能也被禁用，因此 `slice` 命令不会立即执行，且 `luau` 的随机访问模式也将不可用。\n\n此外，qsv 还提供了一个专门的 [`snappy`](\u002Fsrc\u002Fcmd\u002Fsnappy.rs#L2) 命令，包含四个子命令，用于直接操作 Snappy 文件：一个多线程的 `compress` 子命令（比内置的单线程自动压缩快 4–5 倍）；一个带有详细压缩元数据的 `decompress` 子命令；一个用于快速检查文件是否具有 Snappy 头部的 `check` 子命令；以及一个用于确认 Snappy 文件是否有效的 `validate` 子命令。\n\n`snappy` 命令可用于压缩\u002F解压缩任何文件，而不仅限于 CSV\u002FTSV 文件。\n\n借助 `snappy compress` 子命令，我们可以将纽约市 311 数据（15 GB，2800 万行）在 _5.77 秒_ 内压缩至 4.95 GB，压缩速率为 _2.58 GB\u002F秒_，压缩比为 0.33（3.01:1）。使用 `snappy decompress` 子命令，我们可以在 _16.71 秒_ 内完成同一文件的往返解压缩，解压缩速率为 _0.89 GB\u002F秒_。\n\n相比之下，[zip 3.0](https:\u002F\u002Finfozip.sourceforge.net\u002FZip.html) 在同一台机器上将相同文件压缩至 2.9 GB 需要 _248.3 秒_，速度慢了 43 倍，仅为 0.06 GB\u002F秒，压缩比为 0.19（5.17:1），但仅节省了额外的 14%（2.45 GB）空间。此外，zip 完成同一文件的往返解压缩也耗时更长，需要 _72 秒_，解压缩速度仅为 _0.20 GB\u002F秒_。\n\n> [!NOTE]\n> qsv 除了 Snappy 之外还支持其他压缩方式：\n>\n> `sqlp` 命令可以：\n> - 自动解压缩 gzip、zstd 和 zlib 压缩的输入文件\n> - 使用 Arrow、Avro 和 Parquet 格式时自动压缩输出文件（通过 `--format` 和 `--compression` 选项）\n>\n> 当启用 `polars` 功能时，qsv 可以自动解压缩以下压缩格式的文件：\n> - CSV：`.csv.gz`、`.csv.zst`、`.csv.zlib`\n> - TSV\u002FTAB：`.tsv.gz`、`.tsv.zst`、`.tsv.zlib`；`.tab.gz`、`.tab.zst`、`.tab.zlib`\n> - SSV：`.ssv.gz`、`.ssv.zst`、`.ssv.zlib`\n>\n> 同时支持扩展和有限扩展输入的命令也支持 `.zip` 
压缩格式。\n\n## RFC 4180 CSV 标准\n\nqsv 遵循 [RFC 4180](https:\u002F\u002Fdatatracker.ietf.org\u002Fdoc\u002Fhtml\u002Frfc4180) CSV 标准。然而，在实际应用中，CSV 格式差异很大，qsv 并不完全严格遵守该规范，因此能够处理“真实世界”的 CSV 文件。\nqsv 利用强大的 [Rust CSV](https:\u002F\u002Fdocs.rs\u002Fcsv\u002Flatest\u002Fcsv\u002F) crate 来读取和写入 CSV 文件。\n\n点击[这里](https:\u002F\u002Fdocs.rs\u002Fcsv-core\u002Flatest\u002Fcsv_core\u002Fstruct.Reader.html#rfc-4180)了解更多关于 qsv 如何利用该 crate 符合标准的信息。\n\n在处理“非典型”CSV 文件时，可以使用 `input` 和 `fmt` 命令将其规范化为符合 RFC 4180 标准的格式。\n\n## UTF-8 编码\n\nqsv 要求输入文件必须采用 UTF-8 编码（ASCII 是 UTF-8 的子集）。\n\n如果需要重新编码 CSV\u002FTSV 文件，可以使用 `input` 命令以“有损保存”的方式转换为 UTF-8 编码——将无效的 UTF-8 序列替换为 `�`（[U+FFFD 替换字符](https:\u002F\u002Fdoc.rust-lang.org\u002Fstd\u002Fchar\u002Fconstant.REPLACEMENT_CHARACTER.html)）。\n\n或者，如果您希望真正转码为 UTF-8，可以使用诸如 [`iconv`](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FIconv) 等工具，在 [Linux\u002FmacOS](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F805418\u002Fhow-can-i-find-encoding-of-a-file-via-a-script-on-linux) 和 [Windows](https:\u002F\u002Fsuperuser.com\u002Fquestions\u002F1163753\u002Fconverting-text-file-to-utf-8-on-windows-command-prompt) 上完成此操作。\n\n### Windows PowerShell 和 Excel 使用注意事项\n\n与其他现代操作系统不同，Microsoft Windows 的[默认编码](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fpowershell\u002Fmodule\u002Fmicrosoft.powershell.core\u002Fabout\u002Fabout_character_encoding?view=powershell-7.4)是 [UTF16-LE](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F66072117\u002Fwhy-does-windows-use-utf-16le)。这会导致在 PowerShell 中将 qsv 输出重定向到 CSV 文件并尝试用 Excel 打开时出现问题——所有内容都会显示在第一列，因为 Excel 无法正确识别 UTF16-LE 编码的 CSV 文件。\n\n```\n# 以下命令将在 Windows 上生成 UTF16-LE 编码的 CSV 文件\nqsv stats wcp.csv > wcpstats.csv\n```\n\n这显得有些奇怪，因为人们可能会认为 Microsoft 自己的 Excel 应该能够正确识别 UTF16-LE 编码的 CSV 文件（参见[此处](https:\u002F\u002Fanswers.microsoft.com\u002Fen-us\u002Fmsoffice\u002Fforum\u002Fall\u002Fopening-csv-file-with-utf16-encoding-in-excel-2010\u002Fed522cb9-e88d-4b82-b88e-a2d4bd99f874?auth=1)）。无论如何，要在 Windows 
上创建正确的 UTF-8 编码文件，请改用 `--output` 选项：\n\n```\n# 因此，在 Windows 上不要将 stdout 重定向到文件\nqsv stats wcp.csv > wcpstats.csv\n\n# 而是这样做，以确保文件正确编码为 UTF-8\nqsv stats wcp.csv --output wcpstats.csv\n```\n\n另外，qsv 还可以在 CSV 文件开头添加 [字节顺序标记](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FByte_order_mark)（BOM），以标明文件为 UTF-8 编码。您可以通过将 `QSV_OUTPUT_BOM` 环境变量设置为 `1` 来实现这一点。\n\n这样，Windows 上的 Excel 就能正确识别该 CSV 文件为 UTF-8 编码。\n\n请注意，这在 macOS 上的 Excel 中并不会出现问题，因为 macOS（与其他大多数 *nix 系统一样）默认使用 UTF-8 编码。同样，在其他操作系统上生成的 qsv 输出文件也不会遇到这个问题，因为 Windows 上的 Excel 可以正确识别 UTF-8 编码的 CSV 文件。\n\n## 解释器\n对于复杂的数据处理任务，您可以使用 Luau 和 Python 脚本。\n\n在处理复杂数据时，推荐使用 Luau 而不是 Python，因为它速度更快、内存效率更高，没有外部依赖，并且提供了类似于 qsv 的 DSL 的多种数据处理辅助函数。\n\n更多信息请参阅 [Luau 与 Python](docs\u002FINTERPRETERS.md)。\n\nqsv 附带的另一个“解释器”是 [MiniJinja](https:\u002F\u002Fdocs.rs\u002Fminijinja\u002Flatest\u002Fminijinja\u002F)，它用于 `template` 和 `fetchpost` 命令中。\n## 内存管理\nqsv 支持三种内存分配器——jemalloc（默认）、mimalloc 和标准分配器。\u003Cbr>更多信息请参阅 [内存分配器](docs\u002FPERFORMANCE.md#memory-allocator)。\n\n此外，qsv 还具备内存溢出防护功能，提供两种模式：NORMAL（默认）和 CONSERVATIVE。\u003Cbr>更多信息请参阅 [内存溢出防护](docs\u002FPERFORMANCE.md#out-of-memory-oom-prevention)。\n\n## 环境变量与 .env 文件支持\n\nqsv 支持大量的环境变量，并且可以通过 `.env` 文件来设置这些变量。\n\n有关详细信息，请参阅 [环境变量](docs\u002FENVIRONMENT_VARIABLES.md) 以及 [`dotenv.template`](dotenv.template) 文件。\n## 功能标志\n\nqsv 具有多个 [功能标志](https:\u002F\u002Fdoc.rust-lang.org\u002Fcargo\u002Freference\u002Ffeatures.html)，可用于启用或禁用可选功能。\n\n更多信息请参阅 [功能](docs\u002FFEATURES.md)。\n\n## 最低支持的 Rust 版本\n\nqsv 的 MSRV 策略要求使用最新的稳定版 [Rust](https:\u002F\u002Fgithub.com\u002Frust-lang\u002Frust\u002Fblob\u002Fmaster\u002FRELEASES.md)，该版本必须被 [Homebrew](https:\u002F\u002Fformulae.brew.sh\u002Fformula\u002Frust#default) 所支持。目前为 [![HomeBrew](https:\u002F\u002Fimg.shields.io\u002Fhomebrew\u002Fv\u002Frust?logo=homebrew)](https:\u002F\u002Fformulae.brew.sh\u002Fformula\u002Frust)。  \n尽管 qsv 本身可能会升级其 MSRV，但只有在 Homebrew 支持最新稳定版 Rust 之后，才会发布新的 qsv 版本。\n\n## 目标 \u002F 非目标\n\nQuickSilver 的目标按优先级排序如下：\n\n* **尽可能快** - 
为此，它频繁发布新版本，采用积极的最低支持 Rust 版本（MSRV）策略，充分利用 CPU 特性，运用多种缓存策略（详见 [性能](docs\u002FPERFORMANCE.md#caching)），使用 HTTP\u002F2（https:\u002F\u002Fwww.cloudflare.com\u002Flearning\u002Fperformance\u002Fhttp2-vs-http1.1\u002F#:~:text=Multiplexing%3A%20HTTP%2F1.1%20loads%20resources,resource%20blocks%20any%20other%20resource.），并在合适且有意义的情况下实现多线程。此外，它尽可能使用最新的依赖项，并会利用 Cargo 的 [`patch`](https:\u002F\u002Fdoc.rust-lang.org\u002Fcargo\u002Freference\u002Foverriding-dependencies.html#the-patch-section) 功能从依赖中获取尚未发布的修复或功能。更多信息请参阅 [性能](docs\u002FPERFORMANCE.md)。\n  \n* **能够处理超大文件** - 大多数 qsv 命令都是流式处理，占用固定内存，可以处理任意大小的 CSV 文件。对于那些需要将整个 CSV 文件加载到内存中的命令（标记为 🤯），qsv 提供了内存溢出预防机制、批处理策略以及使用磁盘来处理超出内存容量文件的“外部”命令。更多信息请参阅 [内存管理](docs\u002FPERFORMANCE.md#memory-management)。\n\n* **完整的数据处理工具集** - qsv 致力于成为一个全面的数据处理工具集，既可用于快速分析和调查，也能满足生产级数据管道的需求。其丰富的命令针对常见的数据处理任务设计，可通过基于 Luau 的 DSL 组合成复杂的数据处理脚本。Luau 还将作为一整套 **qsv 配方库** 的核心——这些配方是针对常见任务（如街道级别的地理编码、去除 PII、数据增强等）的可重用脚本，提供易于修改的参数提示。\n\n* **可组合\u002F互操作性** - qsv 设计为可组合，特别注重与其他常用 CLI 工具（如 `awk`、`xargs`、`ripgrep`、`sed` 等）以及知名 ETL\u002FELT 工具（如 Airbyte、Airflow、Pentaho Kettle 等）的互操作性。其命令可以通过管道与其他工具结合使用，并支持 JSON\u002FJSONL、Parquet、Arrow IPC、Avro、Excel、ODS、PostgreSQL、SQLite 等常见文件格式。更多信息请参阅 [文件格式](#file-formats)。\n\n* **尽可能便携** - qsv 设计为便携，在多个平台上提供安装程序，并内置自动更新机制。优先支持 Linux、macOS 和 Windows。更多信息请参阅 [安装选项](#installation-options)。\n\n* **尽可能易用** - qsv 设计为易用，至少在命令行界面中算是非常容易上手：shrug:。其命令拥有众多选项，但都配有合理的默认值。使用说明面向数据分析师而非开发者编写；文档中包含大量示例，测试本身也兼具示例作用。借助 [qsv pro](https:\u002F\u002Fqsvpro.dathere.com)，用户可以通过图形用户界面获得更强大的功能，同时保持易用性。\n\n* **尽可能安全** - qsv 设计为安全。它没有外部运行时依赖，使用 [Rust](https:\u002F\u002Fopensource.googleblog.com\u002F2023\u002F06\u002Frust-fact-vs-fiction-5-insights-from-googles-rust-journey-2022.html) 编写，并通过自动化 [DevSkim](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDevSkim#devskim)、[\"cargo audit\"](https:\u002F\u002Frustsec.org) 和 [Codacy](https:\u002F\u002Fapp.codacy.com\u002Fgh\u002Fdathere\u002Fqsv\u002Fdashboard) GitHub Actions 工作流自动审计代码库中的安全漏洞。  
\n\n它始终使用最新的稳定版 Rust，并坚持积极的 MSRV 策略及所有依赖项的最新版本。qsv 拥有超过 2,600 个测试用例的庞大测试套件，其中包括多项[属性测试](https:\u002F\u002Fmedium.com\u002Fcriteo-engineering\u002Fintroduction-to-property-based-testing-f5236229d237)，这些测试会为常用命令随机生成参数（参考 [BurntSushi\u002Fquickcheck](https:\u002F\u002Fgithub.com\u002FBurntSushi\u002Fquickcheck#quickcheck)）。预编译的二进制归档文件使用 [zipsign](https:\u002F\u002Fgithub.com\u002FKijewski\u002Fzipsign#zipsign) 签名，因此您可以验证其完整性（详见 [验证预编译二进制归档的完整性](#verifying-the-integrity-of-the-prebuilt-binaries-zip-archives)）。其自动更新机制会在应用更新前自动验证预编译二进制归档的完整性。更多详细信息请参阅 [安全性](SECURITY.md)。\n\n* **尽可能易于贡献** - qsv 设计为易于贡献，重点在于可维护性。其模块化架构允许通过特性标志轻松添加自包含的命令，源代码注释详尽，使用说明内嵌其中，并提供辅助函数以方便创建新命令和支持测试。更多信息请参阅 [特性](docs\u002FFEATURES.md) 和 [贡献指南](CONTRIBUTING.md)。\n\nQuickSilver 的非目标是：\n\n* **尽可能小** - qsv 被设计得较小，但不会以牺牲性能、功能、可组合性、便携性、易用性、安全性或可维护性为代价。不过，我们确实提供了 `qsvlite` 变体，其体积约为 `qsv` 的 16%，以及 `qsvdp` 变体，体积约为 `qsv` 的 16%。然而，这些变体的功能有所减少。\n\n此外，部分命令通过特性标志控制，因此您可以仅编译所需功能的 qsv。\n\n* **多语言支持** - qsv 的 _使用说明_ 和 _消息_ 仅提供英文版本，目前暂无计划支持其他语言。但这并不意味着它只能处理英文输入文件。只要文件采用 UTF-8 编码，qsv 就能处理任何语言的良好格式的 CSV 文件。此外，它还支持除逗号之外的其他分隔符；`apply whatlang` 操作可检测 69 种语言；而 `apply thousands, currency and eudex` 操作则支持不同语言和国家的数字、货币和日期解析与格式化规范。\n\n最后，虽然 `geocode` 命令的默认 Geonames 索引仅为英文，但可通过 `geocode index-update` 子命令并使用 `--languages` 选项重建索引，从而返回多语言的地名（支持多达 254 种语言，参考 http:\u002F\u002Fdownload.geonames.org\u002Fexport\u002Fdump\u002Falternatenames\u002F）。\n\n## 测试\nqsv 在 [tests](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Ftree\u002Fmaster\u002Ftests) 目录下拥有约 2,700 个测试用例。每个命令都有独立的测试文件，命名规则为 `test_\u003CCOMMAND>.rs`。除了防止回归问题外，这些测试也是很好的示例，经常被各命令的使用说明引用。\n\n要测试每个二进制变体：\n\n```bash\n# 测试 qsv\ncargo test --features all_features\n\n# 测试 qsvlite\ncargo test --features lite\n# 使用 qsvlite 测试所有名称中含有 \"stats\" 的测试\ncargo test stats --features lite\n\n# 测试 qsvmcp\ncargo test --features qsvmcp\n\n# 测试 qsvdp\ncargo test --features datapusher_plus\n\n# 测试特定命令\n# 这里我们只测试 stats 命令，并使用 t 作为 test 的别名，-F 作为 --features 的快捷方式\ncargo t stats -F 
all_features\n\n# 使用特定功能测试特定命令\n# 这里我们只测试 luau 命令，并启用 luau 功能\ncargo t luau -F feature_capable,luau\n\n# 使用多个功能测试 count 命令\n# 我们使用 \"test_count\"，因为我们不想运行其他测试名称中包含 \"count\" 的测试，例如 test_geocode_countryinfo\ncargo t test_count -F feature_capable,luau,polars\n\n# 使用替代分配器进行测试\n# 除了默认的 jemalloc 分配器之外\ncargo t --no-default-features -F all_features,mimalloc\n```\n\n## 许可证\n\n采用 MIT 或 [UNLICENSE](https:\u002F\u002Funlicense.org) 双重许可。\n\n\n[![FOSSA 状态](https:\u002F\u002Fapp.fossa.com\u002Fapi\u002Fprojects\u002Fgit%2Bgithub.com%2Fjqnatividad%2Fqsv.svg?type=large)](https:\u002F\u002Fapp.fossa.com\u002Fprojects\u002Fgit%2Bgithub.com%2Fjqnatividad%2Fqsv?ref=badge_large)\n\n## 起源\n\nqsv 是对流行的 [xsv](https:\u002F\u002Fgithub.com\u002FBurntSushi\u002Fxsv) 工具的一个分支。基于这一坚实的基础，它于 2021 年 9 月被分叉，此后逐渐发展成为一个通用的数据处理工具集，增加了许多命令和功能。\n更多详情请参阅 [FAQ](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fdiscussions\u002F287)。\n\n## 赞助商\n\n\u003Cdiv align=\"center\">\n\n|qsv 的实现得益于|\n:-------------------------:|\n|[![datHere 标志](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_readme_f8ab330bdca2.png)](https:\u002F\u002FdatHere.com)\u003Cbr>|\n|基于标准、业界最佳的开源解决方案\u003Cbr>让您的**数据有用、易用且被广泛使用。**   |\n\n\u003C\u002Fdiv>\n\n## 名称冲突\n\n该项目与 [英特尔的 Quick Sync Video](https:\u002F\u002Fwww.intel.com\u002Fcontent\u002Fwww\u002Fus\u002Fen\u002Farchitecture-and-technology\u002Fquick-sync-video\u002Fquick-sync-video-general.html) 无关。","# qsv 快速上手指南\n\nqsv 是一个极速的命令行数据处理工具包，专为查询、切片、排序、分析、过滤、丰富、转换、验证和连接表格数据（CSV、Excel 等）而设计。其命令简单、可组合且性能卓越。\n\n## 环境准备\n\n*   **操作系统**：支持 Linux、macOS 和 Windows。\n*   **前置依赖**：\n    *   qsv 是编译好的二进制文件，通常无需安装额外运行时依赖（如 Python 或 Node.js）。\n    *   若需使用 `clipboard` 命令，需确保系统剪贴板工具可用。\n    *   若需使用 `fetch`\u002F`fetchpost` 的高级缓存功能，可选装 Redis。\n*   **硬件建议**：利用多核 CPU 可获得最佳性能（部分命令如 `count`, `frequency`, `extsort` 支持多线程）。\n\n## 安装步骤\n\n### 方式一：使用预编译二进制（推荐）\n\n访问 [qsv 发布页面](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Freleases) 下载对应操作系统的最新版本压缩包，解压后将二进制文件添加到系统 `PATH` 
环境变量中。\n\n**Linux (示例):**\n```bash\n# 下载最新 Linux 版本 (请替换为实际最新版本号链接)\nwget https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Freleases\u002Flatest\u002Fdownload\u002Fqsv-x86_64-unknown-linux-musl.zip\nunzip qsv-x86_64-unknown-linux-musl.zip -d ~\u002Fbin\n# 确保 ~\u002Fbin 已在 PATH 中，或移动到 \u002Fusr\u002Flocal\u002Fbin\nsudo mv ~\u002Fbin\u002Fqsv \u002Fusr\u002Flocal\u002Fbin\u002F\n```\n\n**macOS (使用 Homebrew):**\n```bash\nbrew install qsv\n```\n\n**Windows:**\n下载 `.zip` 文件，解压后将 `qsv.exe` 放入任意已加入 `PATH` 的文件夹（建议使用专门的工具目录，而非 `C:\\Windows\\System32` 等系统目录）。\n\n### 方式二：从源码编译 (需 Rust 环境)\n\n如果你已安装 Rust (版本 1.94+)：\n\n```bash\ncargo install qsv\n```\n\n> **注意**：国内开发者若遇到 crates.io 下载缓慢，可配置国内镜像源：\n> ```bash\n> # 使用 sparse 协议可加速索引更新\n> export CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse\n> cargo install qsv\n> # 若需使用镜像，可在 .cargo\u002Fconfig.toml 中通过 [source.crates-io] 的 replace-with 永久配置源替换\n> ```\n\n## 基本使用\n\n安装完成后，在终端输入 `qsv` 即可查看所有可用命令列表。\n\n### 1. 查看数据概览\n使用 `slice` 截取文件前几行，配合 `flatten` 以易读格式逐条显示记录：\n```bash\n# 查看 data.csv 的前 5 行，并以易读的扁平格式展示每条记录\nqsv slice --len 5 data.csv | qsv flatten\n```\n\n### 2. 统计行数\n极速统计大型 CSV 文件的行数（若有索引则瞬间完成）：\n```bash\nqsv count data.csv\n```\n\n### 3. 数据去重\n移除重复行：\n```bash\nqsv dedup input.csv > output.csv\n```\n\n### 4. 字段频率分析\n生成各列的频率分布表（支持多线程加速）：\n```bash\nqsv frequency data.csv > freq_report.csv\n```\n\n### 5. 提取 Excel 工作表\n将 Excel 文件的指定工作表转换为 CSV：\n```bash\nqsv excel --sheet sheet_name input.xlsx > output.csv\n```\n\n### 6. 
管道组合示例\n组合多个命令：提取前 100 行 -> 去除表头 -> 统计剩余行数：\n```bash\nqsv slice -l 100 data.csv | qsv behead | qsv count\n```\n\n更多高级功能（如 AI 辅助描述 `describegpt`、HTTP 请求 `fetch`、地理编码 `geocode` 等）请参考官方文档或使用 `qsv \u003Ccommand> --help` 查看具体用法。","某电商数据分析师需要在凌晨处理来自多个仓库的 5GB 销售流水 CSV 文件，进行清洗、去重、关联商品表并生成日报，且必须在 1 小时内完成以赶上早会。\n\n### 没有 qsv 时\n- 使用 Python Pandas 加载大文件时频繁遭遇内存溢出（OOM），不得不将大文件拆分成小块手动处理，代码逻辑复杂且易错。\n- 简单的字段筛选和格式转换需要编写数十行脚本，调试耗时，且每次需求微调都要重新运行整个流程。\n- 多表关联（Join）操作在本地笔记本上运行极慢，往往需要数小时才能完成，严重依赖公司重型数据库集群资源。\n- 数据验证缺乏统一标准，只能靠肉眼抽样或编写复杂的正则脚本，容易遗漏异常值导致报表数据不准。\n- 跨平台协作困难，Windows 和 Linux 环境下的脚本兼容性差，同事复现分析结果时常遇到依赖冲突问题。\n\n### 使用 qsv 后\n- 利用 qsv 的流式处理能力，直接在不占用大量内存的情况下秒级读取并处理 5GB 文件，彻底告别内存溢出烦恼。\n- 通过组合 `select`、`rename` 和 `fmt` 等简单命令，一行指令即可完成字段清洗与格式化，修改逻辑只需调整参数无需重写代码。\n- 使用 `join` 命令在本地即可实现百万行数据的极速关联，处理时间从数小时缩短至几分钟，不再依赖重型数据库。\n- 内置 `validate` 命令配合 JSON Schema，能自动拦截格式错误或缺失关键字段的记录，确保输出数据的绝对可靠性。\n- 作为单一二进制文件，qsv 在所有主流操作系统上行为一致，团队成员可轻松复用相同的命令行流水线，协作效率大幅提升。\n\nqsv 将原本繁琐耗时的数据清洗工程转化为简洁高效的命令行流水线，让大数据处理变得像操作小文件一样轻盈迅捷。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdathere_qsv_b52998a9.png","dathere","datHere, Inc.","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fdathere_7b32a61c.png","Data Infrastructure Engineering with standards-based, best-of-breed, open source solutions to make your Data Useful, Usable & Used.",null,"info@dathere.com","dathereinc","https:\u002F\u002FdatHere.com","https:\u002F\u002Fgithub.com\u002Fdathere",[82,86,90,94,98,102,105,109,113,117],{"name":83,"color":84,"percentage":85},"Rust","#dea584",67.5,{"name":87,"color":88,"percentage":89},"TypeScript","#3178c6",10.1,{"name":91,"color":92,"percentage":93},"Shell","#89e051",9.2,{"name":95,"color":96,"percentage":97},"JavaScript","#f1e05a",4.6,{"name":99,"color":100,"percentage":101},"PowerShell","#012456",3.7,{"name":103,"color":104,"percentage":10},"Jupyter 
Notebook","#DA5B0B",{"name":106,"color":107,"percentage":108},"Elvish","#55BB55",1.3,{"name":110,"color":111,"percentage":112},"Nushell","#4E9906",0.6,{"name":114,"color":115,"percentage":116},"Luau","#00A2FF",0,{"name":118,"color":119,"percentage":116},"Go Template","#00ADD8",3594,104,"2026-04-11T18:35:49","Unlicense",1,"Linux, macOS, Windows","未说明","未说明（支持处理超大文件，具备外部排序和磁盘哈希去重功能以优化内存使用）",{"notes":129,"python":130,"dependencies":131},"该工具是基于 Rust 编写的命令行程序，无需 Python 环境。最低支持的 Rust 版本为 1.94。部分高级功能（如 Polars 加速、地理编码、AI 描述生成等）需要通过编译时的特性标志（Feature Flags）启用。对于 AI 相关功能（如 describegpt），支持连接本地大模型（Ollama, Jan, LM Studio）或兼容 OpenAI API 的服务，但工具本身不捆绑模型。","不需要",[132],"Rust 1.94+",[13,14,15,16],[135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154],"csv","data-wrangling","opendata","data-engineering","ckan","excel","luau","parquet","polars","sql","geocode","timeseries","dcat","metadata","statistics","fair-data","sampling","ai","fairification","agentic-ai","2026-03-27T02:49:30.150509","2026-04-12T09:05:35.171065",[158,163,168,173,178,183],{"id":159,"question_zh":160,"answer_zh":161,"source_url":162},30545,"为什么读取普通 CSV 文件时会偶尔报错 \"snappy: corrupt input\"？","这通常不是文件内容的问题，而是临时文件的文件名导致的。如果临时文件的名称以 \".sz\" 结尾（例如 `tmp.KYpPcb8esz`），qsv 会误将其识别为 Snappy 压缩文件并尝试解压，从而报错。\n解决方案：该问题已在代码层面修复（检查逻辑已优化）。如果遇到此问题，请确保升级到最新版本。在旧版本中，避免让临时文件或输入文件的文件名以 \".sz\" 结尾可作为临时规避方法。","https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fissues\u002F2157",{"id":164,"question_zh":165,"answer_zh":166,"source_url":167},30546,"如何使用 qsv 生成 JSON Schema 并验证数据质量？","可以使用 `schema` 命令结合 `stats` 和 `frequency` 命令来生成 JSON Schema 文件，然后使用 `validate` 命令进行验证。具体流程如下：\n1. 使用 `qsv schema` 基于代表性 CSV 文件生成初始 JSON Schema。\n2. 手动调整生成的 Schema 以微调验证规则（如指定字段的有效范围）。\n3. 在数据管道开头使用 `qsv validate` 命令，当数据不符合 Schema 时优雅地失败。\n4. 
对于超大文件，可先使用 `sample` 采样或 `partition` 分区后再进行验证。","https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fissues\u002F60",{"id":169,"question_zh":170,"answer_zh":171,"source_url":172},30547,"在使用 sqlp 处理包含特殊格式字符串（如 \"109:::114\"）的列时遇到类型解析错误怎么办？","当 Polars (sqlp) 无法自动推断数据类型（例如将包含 \":::\" 的字符串误判为整数）时，可以通过以下参数调整行为：\n1. 增加 `infer_schema_length`（例如设置为 10000）以读取更多行来推断架构。\n2. 使用 `dtypes` 参数显式指定列的正确数据类型。\n3. 设置 `ignore_errors` 为 `True` 以跳过解析错误的行。\n4. 将特定字符串（如 \"109:::114\"）添加到 `null_values` 列表中，使其被视为空值。","https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fissues\u002F1047",{"id":174,"question_zh":175,"answer_zh":176,"source_url":177},30548,"qsv 是否支持直接使用 DuckDB 执行 SQL 查询？","早期版本曾计划集成 DuckDB，但后续策略有所调整。目前推荐使用稳定的 `sqlp` 命令（基于 Polars 引擎）来执行 SQL 查询。\n注意：原计划中基于 DuckDB 的独立 `sql` 命令已被重新规划或移除（见版本 0.133.1 发布说明），建议用户优先使用 `sqlp` 功能，它已能稳定处理大多数 SQL 查询需求。","https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fissues\u002F828",{"id":179,"question_zh":180,"answer_zh":181,"source_url":182},30549,"如何合并大量 CSV 文件而不受命令行长度限制的影响？","可以使用 `qsv cat rows` 命令配合文件列表功能。创建一个文本文件，每行包含一个待合并的 CSV 文件路径，然后通过相应选项（类似于 `csvtk concat` 的 `--infile-list` 功能）将该列表文件传递给 qsv。这样可以绕过 Shell 对命令行参数长度的限制，高效处理成百上千个文件的合并任务。","https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fissues\u002F1293",{"id":184,"question_zh":185,"answer_zh":186,"source_url":172},30550,"编译 qsv 或相关组件时遇到内存不足（OOM）错误如何解决？","某些依赖项（如 DuckDB 的某些构建过程）在编译时需要大量内存。如果遇到内存不足导致编译失败的情况，建议：\n1. 在内存更大的机器上进行编译。\n2. 如果必须在当前机器编译，尝试增加交换空间（Swap）。\n3. 
考虑使用官方提供的预编译二进制文件（prebuilt binaries），以避免本地编译的高内存消耗。",[188,193,198,203,208,213,218,223,228,233,238,243,248,253,258,263,268,273,278,283],{"id":189,"version":190,"summary_zh":191,"released_at":192},214831,"19.0.0","## [19.0.0] - 2026-04-07 🔐 _*“FAIR 答案”*_ 发布 📐\n\n[科学研究中的可重复性危机](https:\u002F\u002Fscienceinsights.org\u002Fwhat-is-reproducibility-and-the-replication-crisis\u002F) 是推动数据管理领域 [FAIR 原则](https:\u002F\u002Fwww.go-fair.org\u002Ffair-principles\u002F) 的主要动力之一。\n\n随着人工智能在数据流水线中日益广泛应用，可重复性和可审计性的需求变得更加关键。这是因为“幻觉”现象以及非确定性输出是生成式 AI 固有的挑战。\n\n因此，在本次发布中，我们为 qsv 增加了多项功能，以帮助用户更有效地跟踪、审计并重现其由 AI 辅助的数据清洗工作流。由于 FAIR 原则不仅适用于数据，我们也希望提供 **“FAIR 答案”**——其中最后一个 R 代表“可重复性”：\n\n- **增强的日志记录**：`qsv_log` 工具现支持结构化日志记录，并以 JSON 格式输出，从而更便于解析和分析日志，用于可重复性审计（请注意，此功能仅在 qsv MCP 服务器中可用）。\n- **新增 blake3 命令**：新命令 `blake3` 可计算文件或数据流的 [BLAKE3](https:\u002F\u002Fblake3.io) 哈希值，为验证数据完整性及在工作流中追踪文件版本提供了一种快速且可靠的方法。与常用的 SHA-256 哈希相比，BLAKE3 在不牺牲安全性的前提下，速度最高可达前者的 16 倍，因此特别适合处理大规模数据集和迭代式流程。\n- **Cowork 项目可重复性清单**：基于 18.0.0 版本中推出的 Cowork 项目支持，qsv Cowork 插件现在会生成一份项目可重复性清单——这是一份结构化的日志，记录了 Cowork 会话期间产生的所有提示、命令和输出。该清单可用于对数据清洗过程进行详细审计，帮助用户理解特定输出的生成方式，并使其能够自信地重现或修改工作流。\n- **更多统计指标**：`moarstats` 命令进一步扩展了其统计检验和指标种类（包括 [Trimean](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FSTATS_DEFINITIONS.md#trimean)、[Midhinge](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FSTATS_DEFINITIONS.md#midhinge)、[Robust CV](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FSTATS_DEFINITIONS.md#robust_cv)、[Jarque-Bera](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FSTATS_DEFINITIONS.md#jarque_bera)、[Theil 指数](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FSTATS_DEFINITIONS.md#theil_index)、[平均绝对偏差](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FSTATS_DEFINITIONS.md#mean_ad) 和 
[辛普森多样性指数](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FSTATS_DEFINITIONS.md#simpsons_diversity_index)），从而让用户对其数据分布和关系有更深入的洞察，而这对于数据分析中的可重复性至关重要。\n- **to parquet 功能改进**：`to parquet` 命令重新回归，并采用由 Polars 的 LazyFrame API 提供支持的新实现，从而提供更快、更可靠的 CSV 到","2026-04-06T15:12:51",{"id":194,"version":195,"summary_zh":196,"released_at":197},214832,"18.0.0","## [18.0.0] - 2026-03-20 **_“StatsSighting” 协作插件发布_**\n\n“StatsSighting” 就像“VibeCoding”，但它专为迭代式、极速的深度数据分析而设计。“Stats”代表统计，“Sight”代表洞察——首先对数据集进行全面的统计概览，从而为分析流程提供依据。\n\nClaude 协作插件附带多个智能体：用于深度数据探索与分析的“[数据分析师智能体](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002F.claude\u002Fskills\u002Fagents\u002Fdata-analyst.md)”、用于数据转换与清洗的“[数据整理者智能体](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002F.claude\u002Fskills\u002Fagents\u002Fdata-wrangler.md)”以及用于辅助政策评估与决策的“[政策分析师智能体](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002F.claude\u002Fskills\u002Fagents\u002Fpolicy-analyst.md)”。每个智能体都承担特定的角色，并具备相应的[技能集](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002F.claude\u002Fskills\u002Fskills\u002F)，共同强调在采取行动之前，充分利用 qsv MCP 服务器的 profiling 和查询能力来深入理解数据。\n\nqsv MCP 服务器也得到了重大增强，包括会话日志记录、基于 DuckDB 的 Parquet 格式转换、SQL 转换加固以及交互式工作目录选择等功能。\n\n在本次发布中，核心 qsv 套件同样获得了显著更新，其中包括用于查询前 SQL 分析的新命令 `scoresql`、集成 stats 缓存并新增比较模式的更智能 `pragmastat`、更加注重 moarstats 的 `pivotp` 优化，以及为 `to` 命令提供的格式化表格输出。\n\n---\n\n### 主要特性\n\n#### 新增 `scoresql` 命令\n针对 CSV 文件缓存（stats、moarstats、频率）分析 SQL 查询，**在执行查询之前**生成性能评分及可操作的优化建议。评分因素包括查询计划分析（EXPLAIN）、类型优化、连接键基数、过滤选择性、反模式检测（SELECT *、缺少 LIMIT、笛卡尔积连接）以及基础设施检查（索引文件、缓存新鲜度）。支持 Polars 和 DuckDB 模式、SQL 文件输入和 JSON 输出，并可与 `describegpt` 集成，实现 AI 辅助查询审查。[#3612](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3612)、[#3616](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3616)、[#3624](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3624)\n\n#### 
更智能的 `pragmastat` — 支持 stats 缓存并新增比较模式\n`pragmastat` 现在会读取 stats 缓存，自动跳过非数值或非日期列，并将自身结果写回缓存供下游命令使用。新增 `--compare1` 和 `--compare2` 选项，可并排比较两组分布。多项性能优化使其运行速度大幅提升。[#3591](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3591)、[#3593](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3593)、[#3596](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3596)、[#3595](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3595)、[#3611](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3611)\n\n#### `pivotp` — 更智能的透视运算，结合 m","2026-03-20T15:04:19",{"id":199,"version":200,"summary_zh":201,"released_at":202},214833,"17.0.0","## [17.0.0] - 2026-03-03 **“用户🧑🏻与智能体🤖体验（UAX）发布”**\n\n本次发布的核心在于让人类用户与 AI 智能体协同工作，以更快速、更高效的方式处理数据——无论您是独立分析师，还是使用 Claude Desktop\u002FCowork\u002FCode 或 Gemini 的数据团队。\n\n在 16.1.0 中引入的 UAX 主题迎来了全面升级：全新的 `qsvmcp` 二进制变体为 AI 智能体提供了专门构建、更为精简的版本；MCP 服务器则在工具引导、TSV 输出以提升 token 效率、可复现性日志记录、基于 DuckDB 的 Parquet 转换、自动 `moarstats` 增强、SQL 翻译加固以及交互式工作目录选择等方面实现了显著提升。在核心 CLI 方面，`stats` 缓存的可靠性在不同分隔符和输出格式下均有所改善，`sniff` 现已正确解析符号链接，而 `moarstats` 的热点路径性能也得到了进一步优化。\n\n---\n\n### 重大特性\n\n#### 新增 `qsvmcp` 二进制变体\n这是一款专为配合 qsv MCP 服务器使用而优化的二进制文件，在移除不必要的功能（如 `apply`、`fetch`、`fetchpost`、`foreach`、`to` 等）的同时增加了会话日志记录，从而实现更快、更小的构建体积。MCP 服务器现在优先使用 `qsvmcp`，并在必要时自动回退到完整的 `qsv` 二进制。`qsvmcp` 现已随 `qsv`、`qsvlite` 和 `qsvdp` 一同包含在发行包中。\n\n#### qsv MCP 服务器：面向智能体的原生增强\nMCP 服务器（现已升级至 v17.0.0）迎来了迄今为止最大的一次更新，新增多项功能旨在提升 AI 智能体在数据处理中的效率：\n\n- **TSV 输出格式** — 默认输出切换为 TSV，可将智能体响应中的 token 数量减少约 30%，可通过环境变量 `QSV_MCP_OUTPUT_FORMAT` 进行配置。\n- **会话日志记录** — 新增 `qsv_log` 工具及自动 `qsvmcp.log` 审计追踪，以支持可复现性操作；日志级别可通过 `QSV_MCP_LOG_LEVEL` 进行自定义配置。\n- **DuckDB Parquet 转换** — 当 DuckDB 可用时，CSV 到 Parquet 的转换将改用 DuckDB，而非 `sqlp`，从而实现更快、更可靠的转换。\n- **自动 moarstats** — 在执行 `stats` 后自动运行 `moarstats`，以极低的成本提供更丰富的统计上下文。\n- **SQL 翻译加固** — 对 `translateSql` 进行了重大重构：采用唯一的表别名（`_tbl_N`）、保护字符串字面量、保留用户提供的别名，并在预扫描阶段修复限定引用。\n- **工作目录交互式选择** — 通过 MCP Elicitation 
协议提供交互式目录选择器，用于首次设置。\n- **缓存文件名预留保护** — 防止意外使用 `--output` 参数覆盖 `.stats.csv` 和 `.freq.csv` 缓存文件。\n- **缓存感知 SQL 指导** — 服务器指令现会引导智能体在编写 `sqlp`、`joinp` 和 `pivotp` 查询时充分利用统计和频率缓存。\n- **Polars SQL 引擎头信息** — 明确标注引擎标识，以区分 Polars SQL 和 DuckDB 的查询结果。\n- **绝对路径解析** — 所有文件路径参数现均解析为绝对路径，以提高鲁棒性。\n- **Cowork CLAUDE.md 自动部署** — 自动将项目中的 `CLAUDE.md` 部署至","2026-03-03T14:56:54",{"id":204,"version":205,"summary_zh":206,"released_at":207},214834,"16.1.0","## [16.1.0] - 2026-02-15 📊 **_“加速型公民智能（ACI）发布”_** 📊\n\n统计分析变得更快速、更稳健；用户与智能体体验（UAX）的改进使 CLI 解析器、文档、Shell 补全以及 MCP 工具定义能够从单一源保持同步；而 qsv MCP 服务器则更加精简、智能。\n\n在[正确配置的环境](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002F.claude\u002Fskills\u002Fdocs\u002Fguides\u002FMACOS-QUICK_START.md)下，用户可以与多个 AI 智能体协作，对大型、**真实世界中的混乱数据**——原始数据集、演示文稿、报告、电子表格等——进行**加速分析**，而无需先将所有数据上传至云端或手动整理成可用格式。原本可能需要数天甚至数周才能完成的工作，如今只需几分钟即可搞定。\n\n---\n\n### 🌟 主要特性\n\n#### 新增 `pragmastat` 命令\n由 @AndreyAkinshin 提供的实用统计工具集——借助 [Pragmastat 库](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fpragmastat)，可计算稳健的中位数对统计量。该工具专为存在严重偏态、长尾分布或异常值的数据设计，在这些情况下，均值和标准差往往会给出误导性结论。有关底层算法及设计理念，请参阅 [pragmastat.dev](https:\u002F\u002Fpragmastat.dev)。\n\n#### 频率缓存系统\n`frequency` 命令新增 `--frequency-jsonl` 选项，可生成 JSONL 格式的缓存文件（类似于 `stats --stats-jsonl`），从而加速重复频率分析。对于高基数列，采用混合策略，并提供可配置的阈值。\n\n#### UAX 改进：统一文档与 Shell 补全\n基于 [docopt](http:\u002F\u002Fdocopt.org) 的全新解析系统，现可从 qsv CLI 解析所使用的 USAGE 文本中，同时生成 Markdown 文档、Shell 补全脚本**以及** MCP 工具定义。一切自动保持同步，不再出现帮助文本、文档、补全与 AI 工具之间脱节的问题。\n\n- `--generate-help-md` 标志会生成精美的 Markdown 文档，包含章节导航、表情符号说明、可点击链接以及对人类和智能体都友好的参数\u002F选项表格。\n- Shell 补全现已实现自动生成，取代了此前手动维护的 68 个补全文件。\n\n#### qsv MCP 服务器：架构更精简\n`qsv_pipeline` 工具已被移除，取而代之的是直接按顺序执行命令的方式。实际上，智能体早已逐条调用命令，去除管道抽象后，服务器变得更加简单、可预测且易于调试。此外，MCP 还进行了以下改进：\n\n- 扩展了 AI 智能体的指导功能，以充分利用频率和统计缓存；\n- 无缝支持 Google Gemini CLI，这得益于 @kulnor 的持续贡献；\n- 大规模代码库重构：去重辅助函数、提取文件系统工具、修复 `any` 类型问题，并解决了多项 Bug。\n\n详细的 MCP 变更记录请参阅 [MCP 
CHANGELOG](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002F.claude\u002Fskills\u002FCHANGELOG.md)，以获取完整信息。\n\n---\n\n### 新增内容\n\n- 功能：`pragmastat`","2026-02-15T16:53:01",{"id":209,"version":210,"summary_zh":211,"released_at":212},214835,"16.0.0","# [16.0.0] - 2026-02-08 🤖 **_“AI原生版本”_** 🤖\n\n本次发布使 qsv 深度融入 AI 原生理念——从贯穿 Polars Schema 的更智能日期检测，到允许 AI 代理将 qsv 作为一流数据工具使用的 MCP 插件层。\n\nClaude Desktop、Code 和 Cowork 用户现在可以直接在其 AI 工作流中使用 qsv 强大的数据处理能力，享受智能化引导与无缝集成。此外，在 @kulnor 的贡献下，Google Gemini 也已获得支持。\n\n---\n\n## 🌟 主要特性\n\n### 更智能的日期\u002F日期时间检测\n\nqsv 现在能够自动检测日期和日期时间列，并将这一信息贯穿整个数据管道：\n\n- **`stats --dates-whitelist sniff`** 现已成为默认设置——qsv 会嗅探前 1000 行数据，识别可能的日期\u002F日期时间字段，以便后续进行更可靠的类型推断。\n- **`schema`** 在生成 Polars Schema（`.pschema.json`）时会自动检测日期\u002F日期时间列。\n- **Polars Schema 解析中的日期时间类型支持**——时间类型将在 `sqlp`、`joinp` 和 Parquet 转换过程中得以保留。\n\n### 加固的统计缓存系统\n\n用于加速 `frequency`、`schema`、`tojsonl`、`sqlp`、`joinp`、`pivotp`、`diff` 和 `sample` 等命令的统计缓存系统现已更加稳健：\n\n- **简化 API**：移除了 `get_stats_records()` 中的 `dataset_stats`，使所有下游消费者更加简洁。\n- **安全回退**：对于损坏或无法解析的缓存文件，系统将优雅地处理，而非直接报错。\n- **自动再生**：当解析出现错误时，统计缓存将自动重新生成，而不会导致任务失败。\n\n### 增强的 MCP 服务器 (16.0.0)\n\nqsv 的 MCP 服务器迎来了迄今为止最大的一次更新——完整详情请参阅 [MCP CHANGELOG](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002F.claude\u002Fskills\u002FCHANGELOG.md)。\n\n---\n\n### 破坏性变更\n\n1. **`diff` 命令**：移除 `--force` 选项  \n   - 此选项曾用于基于 dataset_stats 的短路比较，但在统计缓存 API 简化后已不再必要。\n2. 
**`to` 命令**：移除 `parquet` 子命令  \n   - 如需输出 Parquet 文件，请使用专用的 `qsv_to_parquet` MCP 工具，或直接使用 `sqlp`。\n\n### 新增功能\n\n- 功能：`stats` — 为 `--dates-whitelist` 添加“sniff”支持。\n- 功能：`schema` — 通过嗅探自动检测 Polars Schema 中的日期\u002F日期时间列。\n- 功能：支持 Polars Schema 解析中的日期时间类型。\n\n### 变更内容\n\n- 重构：`stats` — 将 `--dates-whitelist sniff` 设为默认设置。\n- 性能优化：在整个代码库中采用 foldhash HashMap\u002FHashSet，以提升哈希效率。  \n  - 替换了 14 个模块中的 std::collections，改用 foldhash。  \n  - 对于非加密场景下的哈希操作，foldhash 显著快于 std::collections。\n- 重构：`stats` — 从统计缓存系统中移除 dataset_stats。  \n  - 简化了 get_stats_records() 的 API。  \n  - 将行数处理集中到 sample 命令中。  \n  - 适配 diff、pivotp、sample 等命令以使用新 API。\n- 重构：`stats` — 统计缓存现可在解析错误时自动重新生成，从而提升鲁棒性。\n- 重构：`stats` — 对损坏的统计缓存实现安全回退。\n- 重构：`p","2026-02-09T04:29:43",{"id":214,"version":215,"summary_zh":216,"released_at":217},214836,"15.0.1","## [15.0.1] - 2026-01-28\n\n哎呀，我们已经庆祝了 `color` 和基于 `magika` 重新实现的 `sniff` 功能，却忘了在发布预编译版本中真正启用它们！🤦🏻‍♂️\n此补丁启用了新的 `color` 命令，并打开了 `magika` 功能，同时包含多项修复和依赖库升级。\n\n### 变更\n\n- 依赖：将 polars 升级至最新上游版本\n- 依赖：将 csv-nose 从 0.6.0 升级至 0.7.0\n- 依赖：将 mlua 从 0.11.5 升级至 0.11.6\n- 依赖：将 minijinja 从 2.14.0 升级至 2.15.1\n- 依赖：将 minijinja-contrib 从 2.14.0 升级至 2.15.1\n- 依赖：将 siphasher 从 1.0.1 升级至 1.0.2\n- 依赖：将 iana-time-zone 从 0.1.64 升级至 0.1.65\n- 依赖：将 hono 从 4.11.4 升级至 4.11.7（MCP）\n- 构建：在构建和测试工作流中添加 `color` 功能\n- 构建：在发布工作流中添加 `magika` 功能\n- 文档：更新 Luau 文档以反映捆绑的 Luau 0.706 版本\n- 文档：`sniff` 现在也由 Magika 的 MIME 类型检测提供支持，实现了机器人驱动的功能\n\n### 修复\n\n- 测试：修复不稳定的 `color` test_get_theme 测试（现因环境依赖而被忽略）\n- 测试：通过使用语义比较而非逐字节比较，修复了 `search` JSON 测试中的不稳定问题\n\n**完整变更日志**：[15.0.0...15.0.1](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcompare\u002F15.0.0...15.0.1)","2026-01-28T12:38:16",{"id":219,"version":220,"summary_zh":221,"released_at":222},214837,"15.0.0","# [15.0.0] - 2026-01-26 🖖🏻 **_“心灵融合版”_** 🖖🏽\n\n这是 qsv 至今为止最大的一次发布，得益于社区众多专家的贡献！\n\n* **@kulnor** 在统计学和数据标准方面的深厚造诣，极大地提升了 qsv 
全套工具的数据分析能力！他精心编写的[问题报告](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fissues?q=is%3Aissue%20state%3Aclosed%20author%3Akulnor)、详尽的设计方案、严谨的测试以及每周心灵融合会议基础上的细致文档，显著改进了 `frequency`、`stats`、`moarstats` 和 `describegpt` 等命令。他的贡献与倡导至关重要，让我受益匪浅。\n* **@ws-garcia** 对[表格均匀性方法（TUM）](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002F10.3233\u002FDS-240062)的研究——这一算法正是全新改版的 `sniff` 命令背后的核心——将成为我们即将推出的下一代 CKAN 数据采集器的基石。尽管过程颇费周折，但我们的实现现已完成，在[W3C-CSVW 测试套件](https:\u002F\u002Fgithub.com\u002Fw3c\u002Fcsvw)上的准确率高达[99.55%](https:\u002F\u002Fgithub.com\u002Fjqnatividad\u002Fcsv-nose?tab=readme-ov-file#benchmarks)。\n* **@gurgeous** 新增的 `color` 命令让在终端中查看 CSV 文件成为一种享受！他对细节的关注与设计美学，打造出既实用又赏心悦目的命令，并且[更多功能正在路上](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fdiscussions\u002F3334#discussioncomment-15527242)！\n* 如果你查看最近的[提交历史](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcompare\u002F12.0.0...15.0.0)，就会发现我在假期期间沉迷于[Claude 编码](https:\u002F\u002Fwww.reddit.com\u002Fr\u002FClaudeAI\u002Fcomments\u002F1qgt0qa\u002Fwsj_claude_code_is_taking_the_ai_world_by_storm\u002F) 🤖。我与**@claude**（运行[Opus 4.5](https:\u002F\u002Fwww.anthropic.com\u002Fnews\u002Fclaude-opus-4-5)）紧密协作，恰如其分地增强了 qsv 在 `describegpt` 及其[美国人口普查感知型 MCP 服务器](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002F.claude\u002Fskills\u002FREADME-MCP.md)中的生成式 AI 能力。\n\n---\n\n## 🌟 重大特性\n\n本节内容由 @kulnor 的心灵融合会议倾情奉献。\n\n### `frequency` 命令增强\n\n新增强大的过滤与显示选项：\n\n- **`--no-float`**：在频率分析中排除浮点数列\n- **`--pct-nulls`**：在百分比计算中包含空值\n- **`--null-sorted`**：将空值与其他条目一起排序（而非置于末尾）\n- **`--no-other`**：排除“其他”聚合类别\n- **`--null-text`**：自定义空值的显示文本\n- **`--stats-filter`**：基于统计信息的 Luau 列筛选\n  - 可根据任意统计字段（空值数、基数、类型等）筛选列\n  - 完全支持 Luau 表达式，可实现复杂条件\n- 使用 `--weight` 时，JSON 输出中省略统计信息\n\n### `describegpt` 命令增强\n\nAI 驱动的数据描述更加智能。现经过优化，可直接与 LM Studio 和 openai\u002Fgpt-oss-20b 配合使用：\n\n- **`--fr","2026-01-26T14:27:29",{"id":224,"version":225,"summary_zh":226,"released_at":227},214838,"14.0.0","## [14.0.0] - 2026-01-12 📦 
**_“面向所有人的 qsv MCP 发布”_** 🎁\n\n在上周发布的 13.0.0 版本——__“原生 AI 代理”__ 的基础上，**qsv 14.0.0** 致力于让 AI 集成变得 **无缝、可靠且对每个人来说都简单易用**。\n\n此前，安装 qsv MCP 服务器需要一个完整的开发环境，并熟悉命令行工具，因此非开发者难以直接使用。\n\n而本次发布则将 qsv MCP 服务器从一款强大的开发者工具，转变为一款 **用户友好、与 Claude Desktop 透明集成的数据处理代理**，具备跨平台支持、自动更新以及完善的测试基础设施。\n\n### MCP 桌面扩展（捆绑包）——一键安装\n\n全新的 **MCP 桌面扩展** 为 Claude Desktop 用户提供了简化的安装体验：\n\n- **用户友好的软件包**：预配置的捆绑包可自动检测 qsv 二进制文件；若未找到，则会提供安装指引[^1]  \n- **跨平台支持**：在 macOS、Windows 和 Linux 上均可无缝运行  \n- **智能数据处理**：凭借对 qsv 的深入理解，MCP 服务器能够屏蔽掉该工具集数百种选项背后的复杂细节，同时确保快速高效的执行。  \n- **令牌高效**：尽管功能强大，MCP 服务器仍保持令牌效率，通过提供 **智能化的上下文引导** 来帮助 Claude 做出最优决策（何时使用、常见模式、错误预防、性能提示等），并在需要更多信息时才按需加载完整的 qsv `--help` 文本。  \n- **安全性增强**：原始数据不会发送给 Claude，仅传递统计元数据[^2]  \n- **欢迎体验**：包含入门提示和示例，帮助用户快速上手。  \n- **可无缝兼容 [Claude Code](https:\u002F\u002Fcode.claude.com\u002Fdocs\u002Fen\u002Foverview) 以及刚刚推出的 [Claude Cowork](https:\u002F\u002Fclaude.com\u002Fblog\u002Fcowork-research-preview)！** 让 qsv 不再局限于数据处理对话，而是通过代理式 qsv 开启更大的潜力。\n\n该桌面扩展遵循 [官方 MCP 捆绑包（MCPB）清单规范 v0.3](https:\u002F\u002Fgithub.com\u002Fmodelcontextprotocol\u002Fmcpb\u002Fblob\u002Fmain\u002FMANIFEST.md)，确保与 Claude Desktop 及未来兼容 MCP 的应用程序完全兼容。\n\n安装说明请参阅 [MCP 文档](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002F.claude\u002Fskills\u002FREADME-MCP.md)。\n\n[^1]：qsv MCP 服务器现已升级至 v14.1.0，其中包含多项修复[链接]。  \n[^2]：请注意，统计元数据并未匿名化，可能泄露敏感信息。详情请参阅 https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fdiscussions\u002F3289。\n\n### 破坏性变更\n\n- **MCP 技能**：移除了 `qsv-skill-gen` 二进制文件，改用 `qsv --update-mcp-skills` 命令（需启用 `mcp` 功能标志）。\n\n---\n\n## 新增内容\n* 功能：MCP 桌面扩展——qsv MCP 服务器的用户友好型安装 https:\u002F\u002Fgi","2026-01-13T03:26:59",{"id":229,"version":230,"summary_zh":231,"released_at":232},214839,"13.0.0","## [13.0.0] - 2026-01-06 🦾 **_“统计数据处理智能体发布”_** 🤖\n\n我们以 **qsv 13.0.0** 迎接 2026 年——这是一个重要的里程碑，它将 qsv 打造成一款 **原生 AI 智能体**！\r\n\r\n这还叠加了我们去年九月推出的在线 [CKAN 门户 AI 聊天机器人](https:\u002F\u002Fdathere.com\u002Fai-chatbot\u002F) 以及上月发布的 [扩展版 describegpt 
command](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv?tab=readme-ov-file#describegpt_deeplink). In the coming months, as our [strategic partnership with the Open Knowledge Foundation to build open, FAIR, AI-ready data infrastructure powered by CKAN](https:\u002F\u002Fblog.okfn.org\u002F2025\u002F12\u002F09\u002Fopen-knowledge-foundation-and-dathere-announce-new-partnership-to-strengthen-open-fair-ai-ready-data-infrastructure-powered-by-ckan\u002F) takes shape, the datHere family of tools will keep evolving toward stronger AI\u002FML\u002Fgraph\u002FFAIR-standards capabilities, along with Data Librarian, Concierge, Advisor and Analyst features.\r\n\r\nThis release delivers first-class support for AI agents through three major new capabilities:\r\n\r\n### MCP Server - Model Context Protocol Integration\r\n\r\nqsv now ships with a built-in **Model Context Protocol (MCP) server** that integrates seamlessly with AI chatbots, including Claude Desktop.\r\n\r\n- **Local data**: its \"zero-copy\"-inspired design lets you work with very large datasets **without** transferring raw data[^1] - only statistical metadata is sent to Claude! That's good for security and privacy, sidesteps Claude's file-upload size limits, saves tokens and boosts performance!\r\n- **22 MCP tools**: 20 popular qsv commands as individual tools, plus 1 general tool for invoking the remaining 46 commands, and 1 pipeline tool.\r\n- **Natural-language interface**: no need to memorize command syntax.\r\n- **Pipeline support**: chain multiple operations together seamlessly.\r\n\r\nSee the [MCP documentation](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002F.claude\u002Fskills\u002FREADME-MCP.md) for detailed setup instructions.\r\n\r\n### Claude Agent SDK Helpers\r\n\r\nThe new **agent skills** infrastructure provides:\r\n\r\n- the `qsv-skill-gen` CLI - generates skill definitions for AI agents;\r\n- JSON skill definitions generated automatically by parsing qsv's USAGE text with qsv-docopt, so agent skills can be refreshed quickly whenever commands and options are added or changed;\r\n- shell-safe example generation with proper quoting;\r\n- [comprehensive documentation](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002FAGENT_COMPLETE_SUMMARY.md) for AI-agent integration, to help you fold qsv into your own AI solutions!\r\n\r\n### `moarstats` - MOAR Statistics\r\n\r\nThe `moarstats` command has been greatly enhanced with **24+ [MOAR](https:\u002F\u002Fwww.dictionary.com\u002Fculture\u002Fslang\u002Fmoar) statistical measures**:\r\n\r\n**Advanced univariate statistics**:\r\n- **Bimodality coefficient** - detects multimodal distributions;\r\n- **Normalized entropy** - a scaled (0-1) measure of information content;\r\n- **Atkinson index** - an inequality measure with a configurable ε parameter.","2026-01-06T13:15:15",{"id":234,"version":235,"summary_zh":236,"released_at":237},214840,"12.0.0","## [12.0.0] - 2025-12-24 🎄\n\nStuff your virtual stockings and jingle your data bells - qsv 12.0.0 has slid down the chimney as loaded as Santa's sleigh! Unwrap the goodies: a shiny new `moarstats` command, gift-wrapped weighted statistics, and AI-powered FAIR 
metadata inferencing that now speaks many languages ([no elf translators required](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002Fdescribegpt\u002Fsilly_opendata_elves.md#description)). And for the grand finale, say hello to TOON - a new, token-lean format optimized for LLMs ([details in this blog post](https:\u002F\u002Fopenapi.com\u002Fblog\u002Fwhat-the-toon-format-is-token-oriented-object-notation)) - to power your AI projects all the way through 2026. Ho ho hold on to your data, this update is a veritable holiday feast!\n\nSpecial thanks to @kulnor for championing, brainstorming and testing many of the new features below!\n\n## 🌟 Major Features\n\n### New: `moarstats` command\nA powerful new command for \"moar\" ([see explanation here](https:\u002F\u002Fwww.dictionary.com\u002Fculture\u002Fslang\u002Fmoar)) advanced statistical analysis, offering a far richer set of measures than the `stats` command:\n\n- **Comprehensive statistics**: 50+ advanced statistical measures, including:\n  - detailed outlier analysis (count, sum, mean)\n  - Winsorized and trimmed means (5%, 10%, 20%, 25%)\n  - multiple dispersion measures (IQR-to-range ratio, quartile coefficient of dispersion)\n  - distribution-shape statistics (skewness, several kurtosis measures)\n\n- **Advanced options** (`--advanced`): enable computationally intensive measures:\n  - Gini coefficient for measuring inequality\n  - excess kurtosis for assessing the \"tailedness\" of a distribution\n  - Shannon entropy for analyzing data diversity\n\n- **Available in all binary variants**, so everyone can use it.\n\n### `describegpt` enhancements\nAI-powered data description got a major boost:\n\n- **⛩️ Minijinja template engine integration**:\n  - custom prompt templates with full support for the filters in Minijinja and its contrib extensions\n  - more powerful, flexible prompt customization\n\n- **Multilingual support**:\n  - new `--language` option to generate descriptions in any language or dialect:\n    - Languages: Spanish ([docs](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002Fdescribegpt\u002FSpanish.md)), Portuguese ([docs](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002Fdescribegpt\u002FPortuguese.md)), Italian ([docs](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002Fdescribegpt\u002FItalian.md)), Japanese ([docs](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002Fdescribegpt\u002FJapanese.md)), Hindi ([docs](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002Fdescribegpt\u002FHindi.md)), Arabic ([docs](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002Fdescribegpt\u002FArabic.md)), etc.\n    - 
Dialects: Franglais ([docs](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002Fdescribegpt\u002FFranglais.md)), Taglish ([docs](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002Fdescribegpt\u002FTaglish.md)), Pennsylvania Dutch ([docs](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002Fdescribegpt\u002FPennsylvaniaDutch.md)), etc.\n    - Constructed languages: Klingon ([docs](https:\u002F\u002Fgithub.com\u002Fda","2025-12-24T14:14:57",{"id":239,"version":240,"summary_zh":241,"released_at":242},214841,"11.0.2","## [11.0.2] - 2025-12-08\r\n\r\nqsv 11.0.2 brings significant enhancements to larger-than-memory data processing, AI-powered metadata inferencing, JSON Schema inferencing & validation, and data viewing capabilities, along with important bug fixes and performance improvements.\r\n\r\nAll in preparation for at-scale, secure, interactive, \"_[zero-copy](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FZero-copy)_\" _[\"Data Steward-in-the-Loop\"](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FHuman-in-the-loop)_ [FAIRification](https:\u002F\u002Fwww.go-fair.org\u002Ffair-principles\u002Ffairification-process\u002F) on the desktop in [qsv pro](https:\u002F\u002Fqsvpro.dathere.com).\r\n\r\n## 🌟 Major Features\r\n\r\n### `stats` & `frequency`\r\n - **Larger than Memory Files**: `stats` & `frequency` can now handle arbitrarily large files, even when \"advanced\" statistics are enabled with its new dynamic parallel chunk sizing algorithm! 
(example [stats](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fscripts\u002FNYC_311_SR_2010-2020-sample-1M.stats.csv), [frequency](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fscripts\u002Fnyc311-1m.freqs.csv))\r\n - **N Counts**: Added \"n_counts\" (`n_negative`, `n_zero` and `n_positive`) columns to `stats` output for more detailed count information for numeric fields.\r\n\r\n### `describegpt`\r\nThe `describegpt` command has received substantial improvements for AI-powered metadata inferencing:\r\n\r\n- **\"Neuro-Procedural\" Data Dictionaries**: combines deterministically computed statistics and frequency distribution data with AI-inferred Human-Friendly Labels and Descriptions to compile an [expanded Data Dictionary](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002Fnyc311-describegpt.md) (not quite [\"neuro-symbolic\"](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FNeuro-symbolic_AI) (YET!))\r\n\r\n- **Chat with your Data!**: Improved DuckDB and Polars SQL guidance mean more reliable transformations of your Natural Language queries to SQL - leading to fast, deterministic, reproducible, hallucination-free answers! 
([example](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002Fnyc311-describegpt-prompt.md), [SQL result](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Fmaster\u002Fdocs\u002Fnyc311-describegpt-prompt.csv)) \r\n\r\n- **Format Option**: Replaced `--json` flag with `--format` option for more flexible output formatting\r\n  - Supports multiple output formats - Markdown (default), TSV and JSON\r\n  - Removed `--jsonl` option for cleaner API\r\n  \r\n- **Controlled Tag Vocabulary**: New tag vocabulary system for consistent categorization\r\n  - `--tag-vocab` option to specify controlled vocabulary\r\n  - Lookup support for tag vocabularies - retrieve a tag vocabulary from a local or remote CSV\u003Cbr>using `http:\u002F\u002F`, `https:\u002F\u002F`, `dathere:\u002F\u002F` and `ckan:\u002F\u002F` URL schemes.\r\n  \r\n- **Enhanced Boolean Inference**: `--infer-boolean` is now enabled by default for better data type detection\r\n\r\n- **Performance Metrics**: Added elapsed time tracking to monitor processing duration\r\n\r\n- **Improved Prompt Templates**: Updated default description prompt with PII\u002FPHI alerts and better attribution metadata\r\n\r\n### `schema` & `validate`\r\nEnhanced JSON Schema inference and validation capabilities:\r\n\r\n- **Strict Formats**: New `--strict-formats` option for stricter JSON Schema format validation,\u003Cbr>enforcing JSON Schema format constraints for email, hostname & IP address (IPV4\u002FIPV6) formats.\r\n  \r\n- **Output Option**: New `--output` option for specifying schema output destination\r\n  - Polars schema now uses consistent naming conventions across commands\r\n  - Updated `joinp`, `pivotp`, and `sqlp` commands to use new `.pschema.json` naming convention\r\n\r\n- **Configurable Email Validation**: `validate` has numerous options to tweak email validation\u003Cbr>- taking advantage of `schema`'s email format constraint inferencing.\r\n\r\n### `sample` 
time-series sampling\r\nA new `--timeseries` sampling method with grouping (hourly, daily, weekly),\r\nadaptive sampling (prefer business hours or weekends) with various aggregation (mean, sum, min, max)\r\nwithin each interval with configurable starting points (first, last or random).\r\n\r\n### `lens` \"real-time\" Features\r\nEnhanced CSV viewing capabilities with [csvlens](https:\u002F\u002Fgithub.com\u002FYS-L\u002Fcsvlens) integration:\r\n\r\n- **Auto-Reload**: New `--auto-reload` option to automatically reload file when it changes\r\n  - Useful for monitoring live data files\r\n  \r\n- **Streaming stdin**: New `--streaming-stdin` option for real-time data viewing\r\n  - Supports viewing data as it's being piped in\r\n  \r\n- **Row Marking**: Updated csvlens dependency with row marking feature\r\n\r\n### Breaking Changes\r\n- `describegpt`: `--json` flag replaced with `--format` option\r\n- `describegpt`: `--jsonl` option removed\r\n- `schema`, `joinp`, `pivotp`, `sqlp`: Updated Polars schema naming conventions\u003Cbr>(existing workflows should work but output format may differ slightly)\r\n\r\n---\r\n\r\n## Added\r\n* Created [Event Logo Archive](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Ftree\u002Fmaster\u002Fdocs\u002Fimages\u002Fevent-logos) with AI-generated seasonal\u002Fversion logos\r\n* `describegpt`: add controlled vocabulary support for tags https:\u002F\u002Fgit","2025-12-08T06:09:21",{"id":244,"version":245,"summary_zh":246,"released_at":247},214842,"10.0.0","## [10.0.0] - 2025-11-23\r\n\r\n## Highlights:\r\n* **Enhanced Data Dictionary**: `describegpt` now features an expanded [default prompt (v4.0)](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002F10.0.0\u002Fresources\u002Fdescribegpt_defaults.toml) that generates [more comprehensive data dictionaries](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002F70163fd4f6f570ed53cc39f906cc201f7afbd920\u002Fdocs\u002Fnyc311-describegpt.md).\r\n* **Parallel 
Search\u002FReplace Operations**: `search`, `searchset`, and `replace` commands now support parallel execution when working with indexed CSV files, delivering significant performance improvements for large datasets.\r\n* **Search\u002FReplace Exact Match Options**: Added `--exact` option to `search`, `searchset`, and `replace` commands for precise string matching without regex patterns.\r\n* **Enhanced SQL Capabilities**: `sqlp` now supports arbitrary expressions in SQL JOIN constraints, named window references, and new SQL functions including `row_number`, `rank`, `dense_rank`, and `array_to_string`.\r\n* **Improved `pivotp` Performance**: Updated to use Polars' new lazy pivot API with `--maintain-order` flag for predictable output ordering.\r\n* **Luau 0.701**: Updated embedded Luau from 0.697 to [0.701](https:\u002F\u002Fgithub.com\u002Fluau-lang\u002Fluau\u002Freleases\u002Ftag\u002F0.701) with additional pattern matching documentation and tests.\r\n\r\n### Added\r\n* `search` & `searchset`: add `--exact` option for literal string matching https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3094\r\n* `search`: parallel search when file is indexed https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3096\r\n* `searchset`: parallel execution when indexed https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3097\r\n* `replace`: add `--exact` option https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fe73d9bf\r\n* `replace`: parallel execution when indexed https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3098\r\n* `sqlp`: added support for arbitrary expressions in SQL JOIN constraints https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fd47c44e & https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F0d2402b\r\n* `sqlp`: added support for `row_number`, `rank`, and `dense_rank` SQL window functions 
https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3115\r\n* `sqlp`: added support for named window references https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3118\r\n* `sqlp`: added support for `array_to_string` list evaluation https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F64cbf34\r\n* `pivotp`: added `--maintain-order` flag for predictable output ordering https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F02dca12\r\n* `describegpt`: default-prompt-file v4.0 with expanded Data Dictionary generation https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F4db0d18\r\n* `luau`: expanded documentation for string functions using pattern matching https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fa7344e3 & https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F2dcc9a4\r\n* `util::mem_file_check`: added platform adjustment factor https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F421be84\r\n* benchmarks: v7.0 added search & searchset indexed parallel benchmarks https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F55df784\r\n* benchmarks: v7.1.0 added replace_indexed_parallel benchmark https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F05c89d8\r\n\r\n### Changed\r\n* `describegpt`: refactored for improved reliability https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F1433bf1 & https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fb6190a4\r\n* `frequency`: special rank of 0 now assigned to `\u003CALL_UNIQUE>` rows https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Feffa13b\r\n* `frequency`: microoptimizations https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F775bb88 & https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F29ec7af\r\n* `search`, `searchset` & `replace`: now parallelizable with an index, with significant performance 
improvements https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F45fc83d\r\n* `search`: use faster, non-allocating `par_sort_unstable_by_key` for improved performance https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F5f50f23\r\n* `search`: optimize `--quick` option https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F1fc1b85\r\n* `search`: `--preview-match` option forces sequential search https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F017ca6f\r\n* `search`, `searchset` & `replace`: sort chunks instead of raw data for better performance https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F5b58cb8\r\n* `searchset`: microoptimizations for performance https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fc4ce324\r\n* `replace`: remove unneeded index rebuild logic https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fcfdba60\r\n* `pivotp`: refactored to adapt to Polars' new lazy pivot API https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3102\r\n* `excel`: microoptimize hot loop and formula retrieval https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Ff141c1b & https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F17780b5\r\n* `stats`: cache repetitive expensive env_var access in hot path https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fa6ad0ce\r\n* `stats`: multiple microoptimizations https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F2f41c33 & https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F9bf43e5 & https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F00958a1\r\n* `validate`: updated to jsonschema 0.37.x with improved error handling https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Ff45693d & https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fc7ad5d2 & 
https:\u002F\u002F","2025-11-23T22:43:15",{"id":249,"version":250,"summary_zh":251,"released_at":252},214843,"9.1.0","## [9.1.0] - 2025-11-03\r\n\u003Cp align=\"center\">\r\n\u003Cimg width=\"400\" height=\"376\" alt=\"FAIRMetadataRocks-smaller\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff1d07aba-43b0-4879-8380-76fb8ee43b52\" \u002F>\u003C\u002Fp>\r\n\r\n[FAIRification](https:\u002F\u002Fwww.go-fair.org\u002Ffair-principles\u002Ffairification-process\u002F) continues to be a focus, as we tweak key commands that enable us to FAIRify raw data at blazing speed:\r\n\r\n- `frequency` received significant updates in this release, including several new options that make compiling frequency distribution tables easier.\r\n- `describegpt` now uses the much faster [BLAKE3 hash](https:\u002F\u002Fgithub.com\u002FBLAKE3-team\u002FBLAKE3?tab=readme-ov-file#blake3) as a cache key (10-20x faster than SHA256) and supports passing complex prompts more easily through the file system.\r\n- [qsv-stats](https:\u002F\u002Fdocs.rs\u002Fqsv-stats\u002Flatest\u002Fstats\u002Findex.html) - the engine that powers both `stats` and `frequency` commands - has been further optimized with the [0.40.0 release](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv-stats\u002Freleases\u002Ftag\u002F0.40.0), to compile summary statistics as fast as possible - even for very large files - often one to two orders of magnitude faster (10 to 100x faster) than typical Python-based tools.\r\n- [Polars](https:\u002F\u002Fpola.rs) has been upgraded to [0.52.0](https:\u002F\u002Fgithub.com\u002Fpola-rs\u002Fpolars\u002Freleases\u002Ftag\u002Frs-0.52.0). 
This vectorized query engine allows us to support more tabular formats & analyze\u002Fquery millions of rows in seconds [_**in situ**_](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FIn-situ_processing) - all without loading the data into a database.\r\n- the [csv 1.4.0 crate](https:\u002F\u002Fgithub.com\u002FBurntSushi\u002Frust-csv?tab=readme-ov-file#csv) has been [tuned further to squeeze out even higher throughput](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fblob\u002Faaa84b0b22c8cf60361554ddee5213b1d6f8ca49\u002FCargo.toml#L304C1-L313C82) - already ~2 million rows per second![^1]\r\n\r\nThese improvements prepare the ground for the upcoming [MCP](https:\u002F\u002Fmodelcontextprotocol.io\u002Fdocs\u002Fgetting-started\u002Fintro) server on [qsv pro](https:\u002F\u002Fqsvpro.dathere.com), which will enable at-scale, configurable, interactive \"[_**Data Steward-in-the-loop**_](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FHuman-in-the-loop)\", value-added FAIRification of privacy-sensitive files.\r\n\r\nThe qsv pro MCP server will handle not just CSVs but also other formats, including unstructured data - **_all processed locally on the desktop, without sending your raw data to the cloud._**\r\n\r\nIt will produce AI-ready, standards-compliant metadata (starting with [DCAT-US v3](https:\u002F\u002Fdoi-do.github.io\u002Fdcat-us\u002F), [Croissant](https:\u002F\u002Fdocs.mlcommons.org\u002Fcroissant\u002Fdocs\u002Fcroissant-spec.html) and [schema.org](https:\u002F\u002Fschema.org\u002Fdocs\u002Fdata-and-datasets.html)) - ideal context for AI applications and data governance efforts alike.\r\n\r\n[^1]: see [`validate_no_schema` benchmark](https:\u002F\u002Fqsv.dathere.com\u002Fbenchmarks)\r\n---\r\n\r\n## Added\r\n* `frequency`: add `--pretty-json` option https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fc67fd061a0cd101b0e04aaab79087c04324b0e46\r\n* `frequency`: add `--rank-strategy` option 
https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3075\r\n* `frequency`: add `-null-text` option https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3082\r\n\r\n## Changed\r\n* `describegpt`: explicitly use `frequency`'s dense rank strategy https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fdc3f270000fde3321ae0ad239010471db5ca3cad\r\n* `describegpt`: allow `--prompt` to be loaded from a text file https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fb11a10c306f0065f1852b23b935c5b04b0e69238\r\n* `describegpt`: use much faster BLAKE3 hash for cache key\r\n* `frequency`: change default rank-strategy from [min (AKA \"1224\" ranking) to dense (AKA \"1223\" ranking)](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FRanking#Strategies_for_handling_ties)\r\n* `lens`: bumped csvlens from 0.13.0 to [0.14.0](https:\u002F\u002Fgithub.com\u002FYS-L\u002Fcsvlens\u002Freleases\u002Ftag\u002Fv0.14.0)\r\n* `lens`: automatically set to monochrome mode when using `--find` option https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F85398690b0ebbc9dea227d13f528c7703451de8b\r\n* `luau`: bumped embedded Luau from 0.694 to 0.697 https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F3e68e2991757aba2b0597d722b1108fdc8009628\r\n* `stats`: fingerprint hash now uses much-faster, parallelizable BLAKE3 instead of SHA256\r\n* `table`: document that it also creates \"aligned TSVs\" and Fixed Width Format files https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Faaa84b0b22c8cf60361554ddee5213b1d6f8ca49\r\n* tests: change default Python to 3.13\r\n* docs: documented that Extended Input Support (🗄️) does `.zip` auto-decompression\r\n* docs: documented Limited Extended Input Support (🗃️)\r\n* use latest qsv-tuned csv crate with performance optimizations\r\n* build(deps): bump flate2 from 1.1.4 to 1.1.5 by @dependabot[bot] in 
https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3071\r\n* build(deps): bump human-panic from 2.0.3 to 2.0.4 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3077\r\n* deps: bump Polars from 0.51.0 at py-1.35.0-beta.1 to 0.52.0 https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F618edf0214a5ceb6df38cb61aafbc9e16ab35613\r\n* build(deps): bump qsv-stats from 0.39.1 to 0.40.0 by @dependabot[bot] in htt","2025-11-03T20:52:32",{"id":254,"version":255,"summary_zh":256,"released_at":257},214844,"8.1.1","## [8.1.1] - 2025-10-22\r\n\r\n## Added\r\n* docs: Seeded developer documentation for index\u002Fstats\u002Ffrequency modules by @kulnor in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3056\r\n\r\n## Changed\r\n* deps: use latest version of qsv-tuned csv crate https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F7523e086350b672b61ad40a0b0487233dbd26871\r\n* deps: unpin zip from 4.6 and bump to 6 now that geosuggest uses it https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F957ad6d0ec9de502e4283ea02badb67f170c6111\r\n* build(deps): bump dns-lookup from 3.0.0 to 3.0.1 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3057\r\n* build(deps): bump geosuggest-utils from 0.8.0 to 0.8.1 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3058\r\n* build(deps): bump geosuggest-core from 0.8.0 to 0.8.1 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3059\r\n* build(deps): bump memmap2 from 0.9.8 to 0.9.9 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3060\r\n* build(deps): bump pyo3 from 0.27.0 to 0.27.1 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3061\r\n* tweaked several publishing and test GH Actions workflows\r\n* applied 
`clippy::to_string_in_format_args` lint suggestion\r\n* bumped several indirect dependencies\r\n\r\n## Fixed\r\n* use latest csvlens patched fork that fixes panic when using stdin input https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F34154e6c9a521f1a05c63d175a217d3ecbc125c6\r\n\r\n## New Contributors\r\n* @kulnor made their first contribution in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3056\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcompare\u002F8.1.0...8.1.1","2025-10-22T02:28:05",{"id":259,"version":260,"summary_zh":261,"released_at":262},214845,"8.1.0","## [8.1.0] - 2025-10-20\r\n\r\nThis minor release features:\r\n* *qsv on [IBM Z mainframes](https:\u002F\u002Fwww.ibm.com\u002Fproducts\u002Fz) (s390x)!* - now that we have endianness detection, even adding a prebuilt binary for it.\r\n* `describegpt`: Output Kind and Token Usage have been added to the output making it easier to parse responses and track LLM costs.\r\n* `python`: with the latest [pyO3.rs 0.27](https:\u002F\u002Fpyo3.rs\u002Fv0.27.0\u002F) crate, we're setting the stage to drop support for Python 3.12 and below, targeting [free-threaded Python](https:\u002F\u002Fdocs.python.org\u002F3\u002Fhowto\u002Ffree-threading-python.html) exclusively starting with the 9.0 release. 
This should allow us to massively boost performance by parallelizing `py` workloads.\u003Cbr>It will also power the upcoming [FAIRification](https:\u002F\u002Fwww.go-fair.org\u002Ffair-principles\u002Ffairification-process\u002F) commands.\r\n* a [tuned csv fork](https:\u002F\u002Fgithub.com\u002Fdathere\u002Frust-csv\u002Ftree\u002Fqsv-tuned) based on the just released [csv 1.4](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fcsv) crate, increasing performance suite-wide.\r\n\r\n---\r\n\r\n## Added\r\n* `describegpt`: add Kind and Token Usage to output https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fa21e1177d471b4115b76c083458100552eace63c\r\n* add big-endian handling for big-endian platforms (e.g. `s390x-unknown-linux-gnu`) https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3045\r\n* add s390x prebuilt binary (qsv now runs on IBM Z Mainframes!) https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fa3f455cfa7b0562a8b2a0a5bc25dc54d797ddeab\r\n\r\n## Changed\r\n* `datefmt`: Replace `localzone` crate with `iana-time-zone` crate https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3048\r\n* `geoconvert`: Improved with the latest geozero fixes needed for [Datapusher+](https:\u002F\u002Fgithub.com\u002Fdathere\u002Fdatapusher-plus?tab=readme-ov-file#datapusher) processing of GeoJSON and SHP files.\r\n* `python`: micro-optimize to remove unnecessary clone; use more idiomatic error_result handling - https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F777aa14e3930e126a70faf936a8dc836703d2eaf\r\n* docs: update badges with PowerPC Linux GNU, Windows ARM64 MSVC, remove macOS Intel by @rzmk in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3036\r\n* deps: bump bitflags from 2.9.4 to 2.10.0 https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F8d65c1be6ff03939edb2c7700b4b28b0a57819fe\r\n* deps: bumped csv crate to 1.4 and reapplied qsv optimizations. 
For more info, see https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F4e2f2a08dfdf96f4c508d6406de1782144c5ed44\r\n* deps: bump csvs_convert patch fork https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F8aa398fef9c3098582f8727094434d160f898666\r\n* deps: bump geozero to latest upstream with unreleased fixes - https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002F0a9d1b39649a3a88342ddaa0f211fb1c628180dc\r\n* deps: bump polars to 0.51.0 at py-1.35.0-beta-1 tag\r\n* deps: bump socket2 from 0.6.0 to 0.6.1\r\n* deps: bump whatlang to 0.18 https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fcommit\u002Fe80e9c0b28d0cabe0e07ad4fb79f4c347f62d6a7\r\n* build(deps): bump actions\u002Fsetup-python from 5.0.0 to 6.0.0 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3030\r\n* build(deps): bump actix-governor from 0.8.0 to 0.10.0 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3046\r\n* build(deps): bump gzp from 1.0.1 to 2.0.0 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3033\r\n* build(deps): bump github\u002Fcodeql-action from 3 to 4 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3034\r\n* build(deps): bump flexi_logger from 0.31.4 to 0.31.5 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3032\r\n* build(deps): bump flexi_logger from 0.31.5 to 0.31.6 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3035\r\n* build(deps): bump flexi_logger from 0.31.6 to 0.31.7 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3038\r\n* build(deps): bump libc from 0.2.176 to 0.2.177 by @dependabot[bot] in https:\u002F\u002Fgithub.com\u002Fdathere\u002Fqsv\u002Fpull\u002F3040\r\n* build(deps): bump pyo3 from 0.26.0 to 0.27.0 by @dependabot[bot] in 
https://github.com/dathere/qsv/pull/3055
* build(deps): bump qsv_docopt from 1.8.0 to 1.9.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3041
* build(deps): bump regex from 1.11.3 to 1.12.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3043
* build(deps): bump regex from 1.12.1 to 1.12.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3050
* build(deps): bump reqwest from 0.12.23 to 0.12.24 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3049
* build(deps): bump rust_decimal from 1.38.0 to 1.39.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3047
* build(deps): bump simd-json from 0.16.0 to 0.17.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3031
* build(deps): bump tikv-jemallocator from 0.6.0 to 0.6.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3053
* build(deps): bump tokio from 1.47.1 to 1.48.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3052
* applied select clippy lint suggestions
* updated indirect dependencies

## Fixed
* `headers`: fix stdin handling without explicit `-` for stdin input https://github.com/dathere/qs

## [8.0.0] - 2025-10-06

[![FAIRdataAIREADYdataBanner](https://github.com/user-attachments/assets/7ace7659-c081-464b-88dd-6a4be3c0c87a)](https://dathere.com/2025/09/fair-data-is-federated-ai-ready-data/)[^1]
[Findable, Accessible, Interoperable & Reusable (FAIR) Data](https://en.wikipedia.org/wiki/FAIR_data) is [AI-Ready Data](https://dathere.com/2025/09/fair-data-is-federated-ai-ready-data/).

A week and a half after launching our ["People's API"](https://dathere.com/2025/09/towards-the-peoples-api/) [AI Chatbot and "AI-Ready" service](https://dathere.com/2025/09/democratizing-data-access-introducing-datheres-ai-chatbot-and-ai-ready-data-solutions/), we fine-tune qsv further, as it powers the [FAIRification](https://www.go-fair.org/fair-principles/fairification-process/) engine that allows us to [**_"open your data" (as a verb)_**](https://dathere.com/2025/09/ckan-is-not-just-for-open-data-its-also-for-opening-your-data/) - to infer and calculate AI-Ready, FAIR metadata at _*blazing speed*_, even for large datasets.

This release features:
* `describegpt` fixes and improvements
* `table` can now produce "aligned" TSV and Fixed Width format files
* `validate` now has [Extended Input Support](https://github.com/dathere/qsv#extended-input-support) in its [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180) validation mode
* `extdedup` fixed to dedupe arbitrarily large CSV or text files
* `luau` upgraded from [0.690 to 0.693](https://github.com/luau-lang/luau/compare/0.690...0.693)
* PowerPC64 pre-built binaries - making it more convenient to use qsv on this "power"ful 😉 platform that's [widely used in research](https://github.com/dathere/qsv/issues/2854) (thanks to [IBM-provided access to its native GitHub Action ppc64le runners](https://github.com/IBM/actionspz)! For the next release - [qsv on IBM Z Mainframes](https://github.com/IBM/actionspz/issues/47#issuecomment-3371815164)!)

These changes set the stage for even more advanced, powerful, configurable FAIRification capabilities to
## *__make ALL your Data AI-Ready, Useful, Usable & Used by Machines & Humans alike__*.

## Added
* `table`: add `leftendtab` alignment option https://github.com/dathere/qsv/pull/3004
* `table`: add `leftfwf` (Fixed Width Format) alignment option https://github.com/dathere/qsv/commit/590c8612206859021d035c4b925dd6be9577afd2
* `validate`: add [Extended Input Support](https://github.com/dathere/qsv?tab=readme-ov-file#extended-input-support) to RFC 4180 validation mode https://github.com/dathere/qsv/pull/3012
* added PowerPC64 LE Linux prebuilt

## Changed
* `describegpt`: fine-tuned default LLM prompt template (v3.1.0) https://github.com/dathere/qsv/commit/00e52a35f696f2b7765486cc8e8dabcfec091e81 https://github.com/dathere/qsv/commit/6b09b7e9fcb6885ebfbd6c9fa77cfbca6a991d6e https://github.com/dathere/qsv/commit/5be7f2e3c9d25f82bab1f7a12279340ecd828db0
* `luau`: bump embedded Luau from 0.690 to 0.693 https://github.com/dathere/qsv/pull/3017
* `schema`: make the Decimal type scale configurable for the Polars schema with the `QSV_POLARS_DECIMAL_SCALE` env var - https://github.com/dathere/qsv/commit/f20edd5eabf6ad624af72069c7125198d9b347c5
* updated optimized csv crate, adding a non-allocating `StringRecord::trim()` and more `inline()`s https://github.com/dathere/qsv/commit/4a1c82a7eaa49e702c754cab4767e0211477e2b4
* deps: bump calamine to 0.31.0 https://github.com/dathere/qsv/commit/bd7a04cd9d030903f28286a2d7b04d11bcb22487
* deps: bump polars from 0.50.0 to 0.51.0 at the py-1.33.1 tag https://github.com/dathere/qsv/pull/2995
* deps: bump polars to 0.51.0 at the py-1.34.0-beta.4 tag at revision b973cac (latest upstream) https://github.com/dathere/qsv/pull/3022
* deps: bump polars to 0.51.0 at the py-1.35.0 tag, revision b973cac https://github.com/dathere/qsv/commit/41648750f2156e66d2cb12729da6d02bd0c6411c
* deps: replace tabwriter with the renamed fork qsv-tabwriter https://github.com/dathere/qsv/pull/3010
* deps: use a patched fork of whatlang-rs. Though our PR was merged, there is still no new release https://github.com/dathere/qsv/commit/6afff4fda25a5330d4293d62579b3f20557d2251
* build(deps): bump base62 from 2.2.2 to 2.2.3 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3003
* build(deps): bump bytemuck from 1.23.2 to 1.24.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3026
* build(deps): bump chrono from 0.4.41 to 0.4.42 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2974
* build(deps): bump fancy-regex from 0.16.1 to 0.16.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3000
* build(deps): bump flate2 from 1.1.2 to 1.1.3 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3027
* build(deps): bump flexi_logger from 0.31.2 to 0.31.3 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3005
* build(deps): bump flexi_logger from 0.31.3 to 0.31.4 by @dependabot[bot] in https://github.com/dathere/qsv/pull/3008
* build(deps): bump indexmap from 2.11.0 to

# [7.1.0] - 2025-09-06

# 🇮🇹 csv,conf,v9 edition 🍝

 &nbsp; | &nbsp; 
:----|:----
|<img width="410" height="410" alt="csvconfv9-flavor-small" src="https://github.com/user-attachments/assets/8c747193-6f72-4cb4-80f8-2d33740e2512" />|Just in time for [csv,conf,v9](https://csvconf.com/), we're Bologna-bound and will be talking all things qsv, CSV, open data, [metadata](https://doi-do.github.io/dcat-us/) [standards](https://docs.mlcommons.org/croissant/docs/croissant-spec.html), AI, [POSE](https://civicdataecosystem.org) and [CKAN](https://ckan.org)!<br><br>For this feature release, we polished `describegpt` a bit more for the occasion...<br><br>**[_Towards the "People's API"! Verso l'API del Popolo!_](https://www.linkedin.com/posts/joelnatividad_towards-the-peoples-api-activity-7369788691717865472-VLGk)**<br>(Answering People/Policymaker Interface)|

---

### 🚀 Enhanced `describegpt` Command
* **Configurable Frequency Limits**: Make the frequency distribution limit configurable for better control over data analysis
* **[Few-shot Learning](https://en.wikipedia.org/wiki/Prompt_engineering#Text-to-text)**: Add a `--fewshot-examples` option to improve LLM response quality with contextual examples
* **Advanced SQL Generation**: Fine-tuned SQL generation guidance for better date handling and query optimization
* **Conditional SQL Results**: Implement a conditional `--sql-results` format for more efficient "SQL RAG" processing - i.e. if the generated SQL query executes successfully, the results are saved to the specified file with a `.csv` extension. If a "SQL hallucination" fails, the file is saved with a `.sql` extension instead for the user to tweak and edit.
* **TogetherAI Support**: Add support for the TogetherAI models endpoint, expanding LLM provider options
* **Enhanced Error Handling**: Improved SQL parsing error handling and more informative error messages
* **Disk Cache by Default**: The disk cache is now enabled by default for better performance
* **TOML Configuration**: Migrate prompt files from JSON to the more readable, more easily modifiable TOML format (see https://github.com/dathere/qsv/blob/master/resources/describegpt_defaults.toml)
* **Better Local LLM Support**: `--api-key` can now be set to NONE for local LLM configurations that don't necessarily run on `localhost` (e.g. a shared local LLM service running on the local network)

### `partition` Command Enhancements
* **New `--limit` Option**: Implement a `--limit` option to set the maximum number of open files
* **Streaming to Enhanced Batching Logic**: Convert from streaming to a simplified, two-pass batched approach designed to partition on high-cardinality columns in very large datasets

---

## Added
* `describegpt`: add configurable frequency limit https://github.com/dathere/qsv/pull/2950
* `describegpt`: migrate prompt file from JSON to the easier-to-edit TOML format https://github.com/dathere/qsv/pull/2954
* `describegpt`: refactor default prompt file; add `--fewshot-examples` option https://github.com/dathere/qsv/pull/2955
* `describegpt`: add TogetherAI support for models endpoint https://github.com/dathere/qsv/pull/2965
* `partition`: add `--limit` option https://github.com/dathere/qsv/pull/2960
* added Windows ARM64 prebuilt binaries

## Changed
* `describegpt`: enable disk cache by default https://github.com/dathere/qsv/pull/2951
* `describegpt`: Polars SQL generation tweaks https://github.com/dathere/qsv/pull/2958
* `python`: replace deprecated `with_gil` with `attach` https://github.com/dathere/qsv/pull/2949. This sets the stage for ["free-threaded" Python 3.14](https://docs.python.org/3.14/whatsnew/3.14.html#whatsnew314-pep779) support when it's released in October 2025. Buh-bye GIL!
* deps: bump embedded Luau from 0.688 to 0.690 https://github.com/dathere/qsv/pull/2967
* deps: bump Polars to 0.50.0 at the py-1.33.0 tag
* build(deps): bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2962
* build(deps): bump actions/stale from 9 to 10 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2963
* build(deps): bump log from 0.4.27 to 0.4.28 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2961
* build(deps): bump mlua from 0.11.2 to 0.11.3 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2948
* build(deps): bump pyo3 from 0.25.1 to 0.26.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2946
* build(deps): bump uuid from 1.18.0 to 1.18.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2956
* build(deps): bump zip from 4.5.0 to 4.6.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2952
* applied select clippy lints
* updated indirect dependencies

**Full Changelog**: https://github.com/dathere/qsv/compare/7.0.1...7.1.0

## [7.0.1] - 2025-08-28

A patch release with some minor bug fixes, benchmark tweaks and build system improvements.

## Added
* publish: add dedicated powerpc64le-unknown-linux-gnu publishing workflow (WIP)

## Changed
* docs: `describegpt` expanded error message about LLM URL or API key
* deps: remove pinned planus dependency

## Fixed
* fix: `geocode` `--batch 0` causes a panic when the polars feature is enabled
* publish: remove the luau feature from x86_64-pc-windows builds that was causing builds to fail
* publish: remove powerpc64le from the main publish workflow
* benchmarks: updated to v6.8.0 with fixes to the luau and clustered sample benchmarks

**Full Changelog**: https://github.com/dathere/qsv/compare/7.0.0...7.0.1

## [7.0.0] - 2025-08-28

# 🥳 Open Weights with Open Data, Local LLM 🤖 edition 🚀

This is the biggest release yet - 470+ commits since v6.0.1! Packed with new AI-powered features, fixes and significant performance improvements suite-wide!

With the release of [OpenAI's gpt-oss open-weight reasoning model](https://openai.com/index/introducing-gpt-oss/) earlier this month setting the stage, we continue on our ["Automagical Metadata"](https://dathere.com/2023/11/automagical-metadata/) journey by revamping `describegpt`.

🤖 **Revamped `describegpt` - AI-Powered Metadata Inferencing and Data Analysis:**
- **Intelligent Metadata Generation**: Automatically generate comprehensive metadata - Data Dictionaries, Descriptions and Tags for your Datasets - using Large Language Models (LLMs) prompted with summary statistics and frequency tables as detailed context, **without sending your data to the cloud**!<br>Even if you elect to use a cloud-based LLM, your [Raw Data is never sent](https://github.com/dathere/qsv/blob/master/docs/Describegpt.md).
- **Chat with your Data**: If your prompt can be answered using this high-quality, high-resolution Metadata, `describegpt` will answer it! If your prompt is not remotely related to the data, it will politely refuse - _"I'm sorry, I can only answer questions about the Dataset."_
- **Auto SQL RAG Mode**: Should the LLM decide that it doesn't have the necessary information in the metadata it compiled to answer your prompt, it will automatically enter SQL [Retrieval-Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) mode - using the rich metadata instead as context to craft an expert-level, deterministic, reproducible, "hallucination-free" SQL query[^1] to respond to your prompt.
- **Database Engine Support**: If [DuckDB](https://duckdb.org/) is installed or the Polars feature is enabled, and `--sql-results <ANSWER.CSV>` is specified, an optimized SQL query will be automatically executed with the query results saved to the specified file.<br>As both DuckDB and Polars are purpose-built [OLAP](https://en.wikipedia.org/wiki/Online_analytical_processing) engines that support direct queries (no database pre-loading required), you get answers in a few seconds[^2] - even for very large datasets.
- **Multi-LLM Support**: Works with _any_ [OpenAI-API compatible](https://platform.openai.com/docs/api-reference/chat) LLM - with special support for local LLMs like [Ollama](https://ollama.com/), [Jan](https://jan.ai/) and [LM Studio](https://lmstudio.ai/), and the ability to customize model behavior with the [`--addl-props` option](https://github.com/dathere/qsv/blob/311b1e6d15e41477095e425079243ccda33e1c1e/src/cmd/describegpt.rs#L123-L127).
- **Advanced Caching**: [Disk and Redis caching support](https://github.com/dathere/qsv/blob/311b1e6d15e41477095e425079243ccda33e1c1e/src/cmd/describegpt.rs#L145-L167) for performance and cost optimization.
- **Flexible Prompting**: [Custom prompt files](https://github.com/dathere/qsv/blob/311b1e6d15e41477095e425079243ccda33e1c1e/src/cmd/describegpt.rs#L106-L107) and built-in intelligent [templates](https://raw.githubusercontent.com/dathere/qsv/refs/heads/master/resources/describegpt_defaults.json) for various analysis tasks.

Check out these examples using a [1 million row sample of NYC's 311 data](https://raw.githubusercontent.com/wiki/dathere/qsv/files/NYC_311_SR_2010-2020-sample-1M.7z)!
- the `--all` option produces a Data Dictionary, Description and Tags - [Markdown](docs/nyc311-describegpt.md), [JSON](https://raw.githubusercontent.com/dathere/qsv/refs/heads/master/docs/nyc311-describegpt.json)
- [`--prompt "What are the top 10 complaint types per community board and borough?"`](docs/nyc311-describegpt-prompt.md) - [SQL result](docs/nyc311-describegpt-prompt.csv)
- `--prompt "How tall is the Empire State Building?"` - _"I'm sorry, I can only answer questions about the Dataset."_

On top of other improvements in [Datapusher+](https://github.com/dathere/datapusher-plus) with its new [Jinja](https://jinja.palletsprojects.com/en/stable/)-based _*"metadata suggestion engine"*_, we're using this AI-inferred metadata along with other precalcs to prepopulate [DCATv3](https://www.w3.org/TR/vocab-dcat-3/) (both [US](https://doi-do.github.io/dcat-us/) and [European](https://semiceu.github.io/DCAT-AP/releases/3.0.0/) profiles) and [Croissant](https://research.google/blog/croissant-a-metadata-format-for-ml-ready-datasets/) metadata fields that are otherwise too hard and expensive to compile manually.

The inferred and precalculated metadata values are offered as "suggestions", using a UI/UX purpose-built to facilitate interactive **_metadata curation chats_**.

This allows Data Stewards to compile high-quality, high-resolution metadata catalogs with an accelerated ["Data Steward in the Loop"](https://en.wikipedia.org/wiki/Human-in-the-loop) data ingestion and metadata curation workflow.

If you want to see

## [6.0.1] - 2025-07-12

This is a patch release with bug fixes and minor improvements.

---

### Changed
* feat: updated completions for qsv v6.0.0 by @rzmk in [#2838](https://github.com/dathere/qsv/pull/2838)
* docs: updated sample schema.json based on the NYC 311 1M row sample benchmark data
* docs: updated sample stats output using the NYC 311 1M row sample benchmark data
* build(deps): bump chrono-tz from 0.10.3 to 0.10.4 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2839
* build(deps): bump qsv-stats from 0.35.0 to 0.36.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2840
* bumped indirect dependencies
* added benchmark_data.* to .gitignore

### Fixed
* `geocode`: make `--batch=0` mode more robust by setting a minimum batch size of 1,000 rows https://github.com/dathere/qsv/commit/2fa90bcc7df57a338a4851bafb361e886cea97c5
* `jsonl`: correct batch size calculation to use the input file instead of the output file for line counting https://github.com/dathere/qsv/commit/742dc777a3d2d2f3d70e72078d69cfdc39c04b4b
* `benchmarks`: fixed benchmarks with unescaped parameters with embedded spaces https://github.com/dathere/qsv/commit/ad95596b8400154b50042e2cb8352900d0198904

### Removed
- Removed retired publishing workflows (linux-glibc-231-musl-123 and wix-installer)

**Full Changelog**: https://github.com/dathere/qsv/compare/6.0.0...6.0.1
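The RFC 4180 validation mode mentioned in the 8.0.0 notes checks structural conformance (quoting, record shape) rather than a schema. qsv's validator is Rust and does far more, but the core idea - every record must have the header's field count, with quoted delimiters counting as one field - can be illustrated with a minimal stdlib Python sketch; the function name is invented for illustration:

```python
import csv
import io

def rfc4180_field_count_check(text: str) -> list[str]:
    """Report records whose field count differs from the header's.

    A tiny stand-in for structural CSV validation; qsv's `validate`
    also handles encodings, extended input formats, etc.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["empty input"]
    width = len(rows[0])
    return [
        f"record {i} has {len(row)} fields, expected {width}"
        for i, row in enumerate(rows[1:], start=1)
        if len(row) != width
    ]

ok = 'a,b,c\r\n1,"x,y",3\r\n'         # quoted comma is one field, not two
bad = 'a,b,c\r\n1,2\r\n'              # short record
print(rfc4180_field_count_check(ok))  # []
print(rfc4180_field_count_check(bad))
```

Note how the quoted `"x,y"` field parses as a single value, which is exactly the RFC 4180 quoting rule a naive `split(",")` validator would get wrong.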
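The conditional `--sql-results` behavior introduced in 7.1.0 - query results land in a `.csv` on success, while a failing "SQL hallucination" is saved as a `.sql` for the user to tweak - can be sketched in Python, with sqlite3 standing in for the DuckDB/Polars engines qsv actually uses. The function name and file stems here are illustrative, not qsv's API:

```python
import csv
import sqlite3
from pathlib import Path

def run_llm_sql(conn: sqlite3.Connection, sql: str, stem: str) -> Path:
    """Execute LLM-generated SQL; write <stem>.csv on success,
    or the raw query to <stem>.sql on failure for hand-editing."""
    try:
        cur = conn.execute(sql)
    except sqlite3.Error:
        out = Path(f"{stem}.sql")
        out.write_text(sql + "\n")       # keep the bad query for the user
        return out
    out = Path(f"{stem}.csv")
    with out.open("w", newline="") as f:
        w = csv.writer(f)
        w.writerow([d[0] for d in cur.description])
        w.writerows(cur.fetchall())
    return out

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sr (borough TEXT, complaint TEXT)")
conn.execute("INSERT INTO sr VALUES ('BK', 'Noise'), ('QN', 'Heat')")
print(run_llm_sql(conn, "SELECT borough, COUNT(*) FROM sr GROUP BY borough", "answer").name)  # answer.csv
print(run_llm_sql(conn, "SELECT nope FROM missing", "oops").name)                             # oops.sql
```

The extension doubles as a status signal, so downstream scripts can branch on the filename alone.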
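7.1.0 also converted `partition` from streaming to a simplified two-pass batched approach, so that partitioning on a high-cardinality column never needs more open output files than the new `--limit` allows. A hedged in-memory sketch of that idea - names invented, not the actual algorithm:

```python
import csv
import io
from itertools import islice

def partition_two_pass(text: str, col: int, limit: int) -> dict[str, list[list[str]]]:
    """Pass 1 collects the distinct partition keys; pass 2 re-scans the
    input once per batch of at most `limit` keys, so no more than
    `limit` outputs are ever 'open' at the same time."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    keys = sorted({row[col] for row in body})      # pass 1: cardinality
    out: dict[str, list[list[str]]] = {}
    it = iter(keys)
    while batch := set(islice(it, limit)):         # pass 2: batched writes
        for row in body:
            if row[col] in batch:
                out.setdefault(row[col], [header]).append(row)
    return out

data = "borough,complaint\r\nBK,Noise\r\nQN,Heat\r\nBK,Water\r\n"
parts = partition_two_pass(data, 0, limit=1)
print(sorted(parts))           # ['BK', 'QN']
print(len(parts["BK"]))        # 3  (header + 2 rows)
```

The trade-off is the extra scans per batch, which the release notes accept in exchange for predictable file-handle usage on very large datasets.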
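Finally, both the 7.1.0 and 7.0.0 notes describe `describegpt` prompting the LLM with frequency tables rather than raw data, with a configurable limit on how many values each distribution carries. A toy Python sketch of that kind of compact context (the function and its `limit` parameter are invented for illustration; they are not qsv options):

```python
from collections import Counter

def frequency_context(values: list[str], limit: int = 3) -> list[tuple[str, int]]:
    """Top-N frequency distribution of a column: a small, non-raw-data
    summary of the kind an LLM can be prompted with as context."""
    return Counter(values).most_common(limit)

col = ["Noise", "Noise", "Heat", "Water", "Noise", "Heat"]
print(frequency_context(col, limit=2))   # [('Noise', 3), ('Heat', 2)]
```

Capping the distribution keeps token costs bounded on high-cardinality columns while still conveying the column's shape.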