[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-niieani--gpt-tokenizer":3,"tool-niieani--gpt-tokenizer":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 
都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,52],"视频",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[14,35],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":78,"owner_email":79,"owner_twitter":73,"owner_website":80,"owner_url":81,"languages":82,"stars":95,"forks":96,"last_commit_at":97,"license":98,"difficulty_score":99,"env_os":100,"env_gpu":101,"env_ram":100,"env_deps":102,"category_tags":107,"github_topics":108,"view_count":32,"oss_zip_url":79,"oss_zip_packed_at":79,"status":17,"created_at":121,"updated_at":122,"faqs":123,"releases":154},4287,"niieani\u002Fgpt-tokenizer","gpt-tokenizer","The fastest JavaScript BPE Tokenizer Encoder Decoder for OpenAI's GPT models (gpt-5, gpt-o*, gpt-4o, etc.). 
Port of OpenAI's tiktoken with additional features.","gpt-tokenizer 是一款专为 JavaScript 环境打造的高性能工具，用于处理 OpenAI 系列大模型（涵盖 GPT-4o、o1、GPT-5 及更早版本）的文本分词编码与解码。它本质上是 OpenAI 官方 Python 库 tiktoken 的 TypeScript 移植版，但针对前端和 Node.js 场景进行了深度优化。\n\n在开发基于大语言模型的应用时，准确计算 Token 数量对于控制 API 成本和确保输入不超限至关重要。gpt-tokenizer 解决了在 JavaScript 中缺乏快速、轻量且功能完整的分词方案的痛点。它不仅支持所有主流模型的编码格式，还能直接在浏览器中运行，无需复杂的后端服务。\n\n这款工具非常适合前端工程师、全栈开发者以及 AI 应用研究人员使用。其独特亮点包括：提供专门的 `encodeChat` 函数轻松处理对话格式；内置 `estimateCost` 功能可直接估算 API 费用；拥有高效的 `isWithinTokenLimit` 方法，无需完整编码即可快速判断文本是否超标。此外，它支持同步操作和流式数据处理，且无全局缓存设计，","gpt-tokenizer 是一款专为 JavaScript 环境打造的高性能工具，用于处理 OpenAI 系列大模型（涵盖 GPT-4o、o1、GPT-5 及更早版本）的文本分词编码与解码。它本质上是 OpenAI 官方 Python 库 tiktoken 的 TypeScript 移植版，但针对前端和 Node.js 场景进行了深度优化。\n\n在开发基于大语言模型的应用时，准确计算 Token 数量对于控制 API 成本和确保输入不超限至关重要。gpt-tokenizer 解决了在 JavaScript 中缺乏快速、轻量且功能完整的分词方案的痛点。它不仅支持所有主流模型的编码格式，还能直接在浏览器中运行，无需复杂的后端服务。\n\n这款工具非常适合前端工程师、全栈开发者以及 AI 应用研究人员使用。其独特亮点包括：提供专门的 `encodeChat` 函数轻松处理对话格式；内置 `estimateCost` 功能可直接估算 API 费用；拥有高效的 `isWithinTokenLimit` 方法，无需完整编码即可快速判断文本是否超标。此外，它支持同步操作和流式数据处理，且无全局缓存设计，有效避免了内存泄漏风险。如果你正在构建需要精确掌控 Token 消耗的 AI 应用，gpt-tokenizer 是一个可靠且高效的选择。","# gpt-tokenizer\n\n[![NPM version](https:\u002F\u002Fimg.shields.io\u002Fnpm\u002Fv\u002Fgpt-tokenizer?style=flat-square)](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002Fgpt-tokenizer)\n[![NPM downloads](https:\u002F\u002Fimg.shields.io\u002Fnpm\u002Fdm\u002Fgpt-tokenizer?style=flat-square)](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002Fgpt-tokenizer)\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg?style=flat-square)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT)\n[![Build Status](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fniieani\u002Fgpt-tokenizer\u002Fci-cd.yml?branch=main&style=flat-square)](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Factions)\n\n`gpt-tokenizer` is a Token Byte Pair Encoder\u002FDecoder supporting all OpenAI's models (including GPT-5, GPT-4o, o1, o3, o4, GPT-4.1 and older models like GPT-3.5, GPT-4).\nIt's the [_fastest, smallest and lowest footprint_](#benchmarks) GPT tokenizer available for all JavaScript environments and is written in TypeScript.\n\n> Try it out in the **[playground](https:\u002F\u002Fgpt-tokenizer.dev\u002F)**!\n\nThis library has been trusted by:\n\n- Microsoft ([Teams](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fteams-ai), [GenAIScript](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fgenaiscript\u002F))\n- Elastic ([Kibana](https:\u002F\u002Fgithub.com\u002Felastic\u002Fkibana))\n- [Effect TS](https:\u002F\u002Feffect.website\u002F)\n- [CodeRabbit](https:\u002F\u002Fwww.coderabbit.ai\u002F)\n- [Rivet](https:\u002F\u002Fgithub.com\u002FIronclad\u002Frivet) by Ironclad\n\nPlease consider [🩷 sponsoring](https:\u002F\u002Fgithub.com\u002Fsponsors\u002Fniieani) the project if you find it useful.\n\n#### Features\n\nIt is the most feature-complete, open-source GPT tokenizer on NPM. This package is a port of OpenAI's [tiktoken](https:\u002F\u002Fgithub.com\u002Fopenai\u002Ftiktoken), with some additional, unique features sprinkled on top:\n\n- Support for easily tokenizing chats thanks to the `encodeChat` function\n- Support for all current OpenAI models (available encodings: `r50k_base`, `p50k_base`, `p50k_edit`, `cl100k_base`, `o200k_base`, and `o200k_harmony`)\n- Can be loaded and work synchronously! (i.e. 
in non async\u002Fawait contexts)\n- Generator function versions of both the decoder and encoder functions\n- Provides the ability to decode an asynchronous stream of data (using `decodeAsyncGenerator` and `decodeGenerator` with any iterable input)\n- No global cache (no accidental memory leaks, as with the original GPT-3-Encoder implementation)\n- Includes a highly performant `isWithinTokenLimit` function to assess token limit without encoding the entire text\u002Fchat\n- Built-in cost estimation with the `estimateCost` function for calculating API usage costs\n- Full library of OpenAI models with comprehensive pricing information (see [`src\u002Fmodels.ts`](.\u002Fsrc\u002Fmodels.ts) and [`src\u002Fmodels.gen.ts`](.\u002Fsrc\u002Fmodels.gen.ts))\n- Improves overall performance by eliminating transitive arrays\n- Type-safe (written in TypeScript)\n- Works in the browser out-of-the-box\n\n## Installation\n\n### As NPM package\n\n```bash\nnpm install gpt-tokenizer\n```\n\n### As a UMD module\n\n```html\n\u003Cscript src=\"https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\">\u003C\u002Fscript>\n\n\u003Cscript>\n  \u002F\u002F the package is now available as a global:\n  const { encode, decode } = GPTTokenizer_cl100k_base\n\u003C\u002Fscript>\n```\n\nIf you wish to use a custom encoding, fetch the relevant script.\n\n- https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\u002Fdist\u002Fo200k_base.js (for all modern models, such as `gpt-5`, `gpt-4o`, `gpt-4.1`, `o1` and others)\n- https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\u002Fdist\u002Fo200k_harmony.js (for open-weight Harmony models such as `gpt-oss-20b` and `gpt-oss-120b`)\n- https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\u002Fdist\u002Fcl100k_base.js (for `gpt-4` and `gpt-3.5`)\n- https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\u002Fdist\u002Fp50k_base.js\n- https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\u002Fdist\u002Fp50k_edit.js\n- https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\u002Fdist\u002Fr50k_base.js\n\nThe global name is a concatenation: `GPTTokenizer_${encoding}`.\n\nRefer to [supported models and their encodings](#Supported-models-and-their-encodings) section for more information.\n\n## Playground\n\nThe playground is published under a memorable URL: https:\u002F\u002Fgpt-tokenizer.dev\u002F\n\n[![GPT Tokenizer Playground](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fniieani_gpt-tokenizer_readme_89789f7a1a69.png)](https:\u002F\u002Fgpt-tokenizer.dev\u002F)\n\n## Usage\n\nThe library provides various functions to transform text into (and from) a sequence of integers (tokens) that can be fed into an LLM model. 
The transformation is done using a Byte Pair Encoding (BPE) algorithm used by OpenAI.\n\n```typescript\nimport {\n  encode,\n  encodeChat,\n  decode,\n  isWithinTokenLimit,\n  encodeGenerator,\n  decodeGenerator,\n  decodeAsyncGenerator,\n  ALL_SPECIAL_TOKENS,\n} from 'gpt-tokenizer'\n\u002F\u002F note: depending on the model, import from the respective file, e.g.:\n\u002F\u002F import {...} from 'gpt-tokenizer\u002Fmodel\u002Fgpt-4o'\n\nconst text = 'Hello, world!'\nconst tokenLimit = 10\n\n\u002F\u002F Encode text into tokens\nconst tokens = encode(text)\n\n\u002F\u002F Decode tokens back into text\nconst decodedText = decode(tokens)\n\n\u002F\u002F Check if text is within the token limit\n\u002F\u002F returns false if the limit is exceeded, otherwise returns the actual number of tokens (truthy value)\nconst withinTokenLimit = isWithinTokenLimit(text, tokenLimit)\n\n\u002F\u002F Allow special tokens when needed\nconst withinTokenLimitWithSpecial = isWithinTokenLimit(text, tokenLimit, {\n  allowedSpecial: ALL_SPECIAL_TOKENS,\n})\n\n\u002F\u002F Example chat:\nconst chat = [\n  { role: 'system', content: 'You are a helpful assistant.' },\n  { role: 'assistant', content: 'gpt-tokenizer is awesome.' },\n] as const\n\n\u002F\u002F Encode chat into tokens\nconst chatTokens = encodeChat(chat)\n\n\u002F\u002F Check if chat is within the token limit\nconst chatWithinTokenLimit = isWithinTokenLimit(chat, tokenLimit)\n\nconst chatWithinTokenLimitWithSpecial = isWithinTokenLimit(chat, tokenLimit, {\n  allowedSpecial: ALL_SPECIAL_TOKENS,\n})\n\n\u002F\u002F Encode text using generator\nfor (const tokenChunk of encodeGenerator(text)) {\n  console.log(tokenChunk)\n}\n\n\u002F\u002F Decode tokens using generator\nfor (const textChunk of decodeGenerator(tokens)) {\n  console.log(textChunk)\n}\n\n\u002F\u002F Decode tokens using async generator\n\u002F\u002F (assuming `asyncTokens` is an AsyncIterableIterator\u003Cnumber>)\nfor await (const textChunk of decodeAsyncGenerator(asyncTokens)) {\n  console.log(textChunk)\n}\n```\n\nBy default, importing from `gpt-tokenizer` uses `o200k_base` encoding, used by all modern OpenAI models, including `gpt-4o`, `gpt-4.1`, `o1`, etc.\n\nTo get a tokenizer for a different model, import it directly, for example:\n\n```ts\nimport {\n  encode,\n  decode,\n  isWithinTokenLimit,\n  \u002F\u002F etc...\n} from 'gpt-tokenizer\u002Fmodel\u002Fgpt-3.5-turbo'\n```\n\nIf you're dealing with a resolver that doesn't support package.json `exports` resolution, you might need to import from the respective `cjs` or `esm` directory, e.g.:\n\n```ts\nimport {\n  encode,\n  decode,\n  isWithinTokenLimit,\n  \u002F\u002F etc...\n} from 'gpt-tokenizer\u002Fcjs\u002Fmodel\u002Fgpt-3.5-turbo'\n```\n\n#### Lazy loading\n\nIf you don't mind loading the tokenizer asynchronously, you can use a dynamic import inside your function, like so:\n\n```ts\nconst {\n  encode,\n  decode,\n  isWithinTokenLimit,\n  \u002F\u002F etc...\n} = await import('gpt-tokenizer\u002Fmodel\u002Fgpt-3.5-turbo')\n```\n\n#### Loading an encoding\n\nIf your model isn't supported by the package, but you know which BPE encoding it uses, you can load the encoding directly, e.g.:\n\n```ts\nimport {\n  encode,\n  decode,\n  isWithinTokenLimit,\n  \u002F\u002F etc...\n} from 'gpt-tokenizer\u002Fencoding\u002Fcl100k_base'\n```\n\n### Supported models and their encodings\n\nWe support all OpenAI models, including the latest ones, with the following encodings:\n\n- `o`-series models, like `o1-*`, `o3-*` and `o4-*` (`o200k_base`)\n- 
`gpt-4o` (`o200k_base`)\n- `gpt-oss-*` (`o200k_harmony`)\n- `gpt-4-*` (`cl100k_base`)\n- `gpt-3.5-*` (`cl100k_base`)\n- `text-davinci-003` (`p50k_base`)\n- `text-davinci-002` (`p50k_base`)\n- `text-davinci-001` (`r50k_base`)\n- ...and many other models, see [models.ts](.\u002Fsrc\u002Fmodels.ts) for an up-to-date list of supported models and their encodings.\n\nIf you don't see the model you're looking for, the default encoding is probably the one you want.\n\n## API\n\n### `encode(text: string, encodeOptions?: EncodeOptions): number[]`\n\nEncodes the given text into a sequence of tokens. Use this method when you need to transform a piece of text into the token format that the GPT models can process.\n\nThe optional `encodeOptions` parameter allows you to specify special token handling (see [special tokens](#special-tokens)).\n\nExample:\n\n```typescript\nimport { encode } from 'gpt-tokenizer'\n\nconst text = 'Hello, world!'\nconst tokens = encode(text)\n```\n\n### `decode(tokens: number[]): string`\n\nDecodes a sequence of tokens back into text. Use this method when you want to convert the output tokens from GPT models back into human-readable text.\n\nExample:\n\n```typescript\nimport { decode } from 'gpt-tokenizer'\n\nconst tokens = [18435, 198, 23132, 328]\nconst text = decode(tokens)\n```\n\n### `isWithinTokenLimit(text: string | Iterable\u003CChatMessage>, tokenLimit: number, encodeOptions?: EncodeOptions): false | number`\n\nChecks if the input is within the token limit. Returns `false` if the limit is exceeded, otherwise returns the number of tokens. Use this method to quickly check if a given text or chat is within the token limit imposed by GPT models, without encoding the entire input. The optional `encodeOptions` parameter lets you configure special token handling.\n\nExample:\n\n```typescript\nimport { isWithinTokenLimit, ALL_SPECIAL_TOKENS } from 'gpt-tokenizer'\n\nconst text = 'Hello, world!'\nconst tokenLimit = 10\nconst withinTokenLimit = isWithinTokenLimit(text, tokenLimit)\n\nconst withinTokenLimitWithSpecial = isWithinTokenLimit(text, tokenLimit, {\n  allowedSpecial: ALL_SPECIAL_TOKENS,\n})\n```\n\n### `countTokens(text: string | Iterable\u003CChatMessage>, encodeOptions?: EncodeOptions): number`\n\nCounts the number of tokens in the input text or chat. Use this method when you need to determine the number of tokens without checking against a limit.\nThe optional `encodeOptions` parameter allows you to specify custom sets of allowed or disallowed special tokens.\n\nExample:\n\n```typescript\nimport { countTokens } from 'gpt-tokenizer'\n\nconst text = 'Hello, world!'\nconst tokenCount = countTokens(text)\n```\n\n### `countChatCompletionTokens(request: ChatCompletionRequest): number`\n\nCounts the tokens that a function-calling chat completion request will consume, including message overhead, optional function definitions, and pinned function calls. This helper is only available on models that support the `function_calling` feature.\n\nExample:\n\n```typescript\nimport {\n  countChatCompletionTokens,\n  type ChatCompletionRequest,\n} from 'gpt-tokenizer\u002Fmodel\u002Fgpt-4o'\n\nconst request: ChatCompletionRequest = {\n  messages: [\n    { role: 'system', content: 'You are a helpful assistant.' },\n    { role: 'user', content: 'Find the weather for San Francisco.' 
},\n  ],\n  functions: [\n    {\n      name: 'get_weather',\n      description: 'Look up the weather for a city.',\n      parameters: {\n        type: 'object',\n        required: ['city'],\n        properties: {\n          city: { type: 'string' },\n          unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },\n        },\n      },\n    },\n  ],\n}\n\nconst promptTokenEstimate = countChatCompletionTokens(request)\n```\n\nYou can also access the helper from the module's default export:\n\n```typescript\nimport gpt4o from 'gpt-tokenizer\u002Fmodel\u002Fgpt-4o'\n\n\u002F\u002F Reuse the `request` defined above\nconst tokenCount = gpt4o.countChatCompletionTokens?.(request)\n```\n\n### `encodeChat(chat: ChatMessage[], model?: ModelName, encodeOptions?: EncodeOptions): number[]`\n\nEncodes the given chat into a sequence of tokens. The optional `encodeOptions` parameter lets you configure special token handling.\n\nIf you didn't import the model version directly, or if `model` wasn't provided during initialization, it must be provided here to correctly tokenize the chat for a given model. Use this method when you need to transform a chat into the token format that the GPT models can process.\n\nExample:\n\n```typescript\nimport { encodeChat } from 'gpt-tokenizer'\n\nconst chat = [\n  { role: 'system', content: 'You are a helpful assistant.' },\n  { role: 'assistant', content: 'gpt-tokenizer is awesome.' },\n]\nconst tokens = encodeChat(chat)\n```\n\nNote that if you encode an empty chat, it will still contain the minimum number of special tokens.\n\n### `encodeGenerator(text: string): Generator\u003Cnumber[], void, undefined>`\n\nEncodes the given text using a generator, yielding chunks of tokens.\nUse this method when you want to encode text in chunks, which can be useful for processing large texts or streaming data.\n\nExample:\n\n```typescript\nimport { encodeGenerator } from 'gpt-tokenizer'\n\nconst text = 'Hello, world!'\nconst tokens = []\nfor (const tokenChunk of encodeGenerator(text)) {\n  tokens.push(...tokenChunk)\n}\n```\n\n### `encodeChatGenerator(chat: Iterator\u003CChatMessage>, model?: ModelName): Generator\u003Cnumber[], void, undefined>`\n\nSame as `encodeChat`, but uses a generator as output, and may use any iterator as the input `chat`.\n\n### `decodeGenerator(tokens: Iterable\u003Cnumber>): Generator\u003Cstring, void, undefined>`\n\nDecodes a sequence of tokens using a generator, yielding chunks of decoded text.\nUse this method when you want to decode tokens in chunks, which can be useful for processing large outputs or streaming data.\n\nExample:\n\n```typescript\nimport { decodeGenerator } from 'gpt-tokenizer'\n\nconst tokens = [18435, 198, 23132, 328]\nlet decodedText = ''\nfor (const textChunk of decodeGenerator(tokens)) {\n  decodedText += textChunk\n}\n```\n\n### `decodeAsyncGenerator(tokens: AsyncIterable\u003Cnumber>): AsyncGenerator\u003Cstring, void, undefined>`\n\nDecodes a sequence of tokens asynchronously using a generator, yielding chunks of decoded text. 
Use this method when you want to decode tokens in chunks asynchronously, which can be useful for processing large outputs or streaming data in an asynchronous context.\n\nExample:\n\n```javascript\nimport { decodeAsyncGenerator } from 'gpt-tokenizer'\n\nasync function processTokens(asyncTokensIterator) {\n  let decodedText = ''\n  for await (const textChunk of decodeAsyncGenerator(asyncTokensIterator)) {\n    decodedText += textChunk\n  }\n}\n```\n\n### `estimateCost(tokenCount: number, modelSpec?: ModelSpec): PriceData`\n\nEstimates the cost of processing a given number of tokens using the model's pricing data. This function calculates costs for different API usage types (main API, batch API) and cached tokens when available.\n\nThe function returns a `PriceData` object with the following structure:\n\n- `main`: Main API pricing with `input`, `output`, `cached_input`, and `cached_output` costs\n- `batch`: Batch API pricing with the same cost categories\n\nAll costs are calculated in USD based on the token count provided.\n\nExample:\n\n```typescript\nimport { estimateCost } from 'gpt-tokenizer\u002Fmodel\u002Fgpt-4o'\n\nconst tokenCount = 1000\nconst costEstimate = estimateCost(tokenCount)\n\nconsole.log('Main API input cost:', costEstimate.main?.input)\nconsole.log('Main API output cost:', costEstimate.main?.output)\nconsole.log('Batch API input cost:', costEstimate.batch?.input)\n```\n\nNote: The model spec must be available either through the model-specific import or by passing it as the second parameter. Cost information may not be available for all models.\n\n## Special tokens\n\nThere are a few special tokens that are used by the GPT models.\nNote that not all models support all of these tokens.\n\nBy default, **all special tokens are disallowed**.\n\nThe `encode`, `encodeGenerator`, `encodeChat`, `encodeChatGenerator`, `countTokens`, and `isWithinTokenLimit` functions accept an `EncodeOptions` parameter to customize special token handling:\n\n### Custom Allowed Sets\n\n`gpt-tokenizer` allows you to specify custom sets of allowed special tokens when encoding text. To do this, pass a\n`Set` containing the allowed special tokens as a parameter to the `encode` function:\n\n```ts\nimport {\n  EndOfPrompt,\n  EndOfText,\n  FimMiddle,\n  FimPrefix,\n  FimSuffix,\n  ImStart,\n  ImEnd,\n  ImSep,\n  encode,\n} from 'gpt-tokenizer'\n\nconst inputText = `Some Text ${EndOfPrompt}`\nconst allowedSpecialTokens = new Set([EndOfPrompt])\nconst encoded = encode(inputText, { allowedSpecialTokens })\nconst expectedEncoded = [8538, 2991, 220, 100276]\n\nexpect(encoded).toEqual(expectedEncoded)\n```\n\nYou may also use a special shorthand for either disallowing or allowing all special tokens, by passing in the string `'all'`, e.g. `{ allowedSpecial: 'all' }`.\n\n### Custom Disallowed Sets\n\nSimilarly, you can specify custom sets of disallowed special tokens when encoding text. 
Pass a `Set`\ncontaining the disallowed special tokens as a parameter to the `encode` function:\n\n```ts\nimport { encode, EndOfText } from 'gpt-tokenizer'\n\nconst inputText = `Some Text ${EndOfText}`\nconst disallowedSpecial = new Set([EndOfText])\n\u002F\u002F throws an error:\nconst encoded = encode(inputText, { disallowedSpecial })\n```\n\nIn this example, an Error is thrown because the input text contains a disallowed special token.\n\nIf both `allowedSpecialTokens` and `disallowedSpecial` are provided, `disallowedSpecial` takes precedence.\n\n## Performance Optimization\n\n### LRU Merge Cache\n\nThe tokenizer uses an LRU (Least Recently Used) cache to improve encoding performance for similar strings. By default, it stores up to 100,000 merged token pairs. You can adjust this value to optimize for your specific use case:\n\n- Increasing the cache size will make encoding similar strings faster but consume more memory\n- Setting it to 0 will disable caching completely\n- For applications processing many unique strings, a smaller cache might be more efficient\n\nYou can modify the cache size using the `setMergeCacheSize` function:\n\n```ts\nimport { setMergeCacheSize } from 'gpt-tokenizer'\n\n\u002F\u002F Set to 5000 entries\nsetMergeCacheSize(5000)\n\n\u002F\u002F Disable caching completely\nsetMergeCacheSize(0)\n```\n\nThe cache is persisted between encoding calls. To explicitly clear the cache (e.g. to free up memory), use the `clearMergeCache` function:\n\n```ts\nimport { clearMergeCache } from 'gpt-tokenizer'\n\nclearMergeCache()\n```\n\n## Testing and Validation\n\n`gpt-tokenizer` includes a set of test cases in the [TestPlans.txt](.\u002Fdata\u002FTestPlans.txt) file to ensure its compatibility with OpenAI's Python `tiktoken` library. These test cases validate the functionality and behavior of `gpt-tokenizer`, providing a reliable reference for developers.\n\nRunning the unit tests and verifying the test cases helps maintain consistency between the library and the original Python implementation.\n\n### Model Information\n\n`gpt-tokenizer` provides comprehensive data about all OpenAI models through the `models` export from [`gpt-tokenizer\u002Fmodels`](.\u002Fsrc\u002Fmodels.ts). This includes detailed information about context windows, costs, training data cutoffs, and deprecation status.\n\nThe data is regularly maintained to match OpenAI's official documentation. Contributions to keep this data up-to-date are welcome - if you notice any discrepancies or have updates, please feel free to open a PR.\n\n## [Benchmarks](https:\u002F\u002Fl8j6fv.csb.app\u002F)\n\nSince version 2.4.0, `gpt-tokenizer` is the fastest tokenizer implementation available on NPM. It's even faster than the available WASM\u002Fnode binding implementations.\nIt has the fastest encoding and decoding times and a tiny memory footprint. It also initializes faster than all other implementations.\n\nThe encodings themselves are also the smallest in size, due to the compact format they are stored in.\n\n![fastest benchmark](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fniieani_gpt-tokenizer_readme_22ff95b608d1.png)\n\n![lowest footprint benchmark](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fniieani_gpt-tokenizer_readme_63f88ae361ad.png)\n\n## License\n\nMIT\n\n## Contributing\n\nContributions are welcome! 
Please open a pull request or an issue to discuss your bug reports, or use the discussions feature for ideas or any other inquiries.\n\n## Thanks\n\nThanks to @dmitry-brazhenko's [SharpToken](https:\u002F\u002Fgithub.com\u002Fdmitry-brazhenko\u002FSharpToken), whose code served as a reference for the port.\n\nHope you find `gpt-tokenizer` useful in your projects!\n","# gpt-tokenizer\n\n[![NPM版本](https:\u002F\u002Fimg.shields.io\u002Fnpm\u002Fv\u002Fgpt-tokenizer?style=flat-square)](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002Fgpt-tokenizer)\n[![NPM下载量](https:\u002F\u002Fimg.shields.io\u002Fnpm\u002Fdm\u002Fgpt-tokenizer?style=flat-square)](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002Fgpt-tokenizer)\n[![许可证：MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg?style=flat-square)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT)\n[![构建状态](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fniieani\u002Fgpt-tokenizer\u002Fci-cd.yml?branch=main&style=flat-square)](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Factions)\n\n`gpt-tokenizer` 是一个支持所有 OpenAI 模型（包括 GPT-5、GPT-4o、o1、o3、o4、GPT-4.1 以及更早的 GPT-3.5 和 GPT-4 等）的字节对编码器\u002F解码器。它是适用于所有 JavaScript 环境的 _最快、最小且占用资源最少_ 的 GPT 分词器，并使用 TypeScript 编写。\n\n> 快来 **[游乐场](https:\u002F\u002Fgpt-tokenizer.dev\u002F)** 体验吧！\n\n该库已被以下公司和项目信赖：\n\n- 微软（[Teams](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fteams-ai)、[GenAIScript](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fgenaiscript\u002F)）\n- Elastic（[Kibana](https:\u002F\u002Fgithub.com\u002Felastic\u002Fkibana)）\n- [Effect TS](https:\u002F\u002Feffect.website\u002F)\n- [CodeRabbit](https:\u002F\u002Fwww.coderabbit.ai\u002F)\n- Ironclad 的 [Rivet](https:\u002F\u002Fgithub.com\u002FIronclad\u002Frivet)\n\n如果您觉得这个项目有用，请考虑 [🩷 赞助](https:\u002F\u002Fgithub.com\u002Fsponsors\u002Fniieani) 该项目。\n\n#### 特性\n\n它是 NPM 上功能最完整的开源 GPT 分词器。本包是对 OpenAI 的 [tiktoken](https:\u002F\u002Fgithub.com\u002Fopenai\u002Ftiktoken) 的移植，并在此基础上添加了一些独特的功能：\n\n- 通过 `encodeChat` 函数轻松对聊天内容进行分词\n- 支持当前所有 OpenAI 模型（可用编码：`r50k_base`、`p50k_base`、`p50k_edit`、`cl100k_base`、`o200k_base` 和 `o200k_harmony`）\n- 可以同步加载并运行！（即在非 async\u002Fawait 上下文中也能使用）\n- 解码器和编码器函数均提供生成器版本\n- 提供解码异步数据流的能力（使用 `decodeAsyncGenerator` 和 `decodeGenerator`，可处理任何可迭代输入）\n- 无全局缓存（不会像原始 GPT-3 编码器实现那样出现意外内存泄漏）\n- 内置高性能的 `isWithinTokenLimit` 函数，无需对整段文本或聊天内容进行编码即可评估是否超出 token 限制\n- 内置成本估算功能 `estimateCost`，用于计算 API 使用成本\n- 包含完整的 OpenAI 模型库及详尽的价格信息（参见 [`src\u002Fmodels.ts`](.\u002Fsrc\u002Fmodels.ts) 和 [`src\u002Fmodels.gen.ts`](.\u002Fsrc\u002Fmodels.gen.ts)）\n- 通过消除传递性数组提升了整体性能\n- 类型安全（使用 TypeScript 编写）\n- 在浏览器中开箱即用\n\n## 安装\n\n### 作为 NPM 包\n\n```bash\nnpm install gpt-tokenizer\n```\n\n### 作为 UMD 模块\n\n```html\n\u003Cscript src=\"https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\">\u003C\u002Fscript>\n\n\u003Cscript>\n  \u002F\u002F 现在可以通过全局变量访问该包：\n  const { encode, decode } = GPTTokenizer_cl100k_base\n\u003C\u002Fscript>\n```\n\n如果您希望使用自定义编码，请加载相应的脚本。\n\n- https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\u002Fdist\u002Fo200k_base.js（适用于所有现代模型，如 `gpt-5`、`gpt-4o`、`gpt-4.1`、`o1` 等）\n- https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\u002Fdist\u002Fo200k_harmony.js（适用于开放权重的 Harmony 模型，如 `gpt-oss-20b` 和 `gpt-oss-120b`）\n- https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\u002Fdist\u002Fcl100k_base.js（适用于 `gpt-4` 和 `gpt-3.5`）\n- https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\u002Fdist\u002Fp50k_base.js\n- https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\u002Fdist\u002Fp50k_edit.js\n- 
https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\u002Fdist\u002Fr50k_base.js\n\n全局变量名格式为：`GPTTokenizer_${encoding}`。\n\n更多信息请参阅 [支持的模型及其编码](#Supported-models-and-their-encodings) 部分。\n\n## 游乐场\n\n游乐场已发布在一个便于记忆的 URL 上：https:\u002F\u002Fgpt-tokenizer.dev\u002F\n\n[![GPT Tokenizer 游乐场](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fniieani_gpt-tokenizer_readme_89789f7a1a69.png)](https:\u002F\u002Fgpt-tokenizer.dev\u002F)\n\n## 使用方法\n\n该库提供了多种函数，用于将文本转换为（以及从）可输入大型语言模型的整数序列（即标记）。这种转换是通过 OpenAI 使用的字节对编码（BPE）算法实现的。\n\n```typescript\nimport {\n  encode,\n  encodeChat,\n  decode,\n  isWithinTokenLimit,\n  encodeGenerator,\n  decodeGenerator,\n  decodeAsyncGenerator,\n  ALL_SPECIAL_TOKENS,\n} from 'gpt-tokenizer'\n\u002F\u002F 注意：根据模型的不同，需从相应的文件中导入，例如：\n\u002F\u002F import {...} from 'gpt-tokenizer\u002Fmodel\u002Fgpt-4o'\n\nconst text = 'Hello, world!'\nconst tokenLimit = 10\n\n\u002F\u002F 将文本编码为标记\nconst tokens = encode(text)\n\n\u002F\u002F 将标记解码回文本\nconst decodedText = decode(tokens)\n\n\u002F\u002F 检查文本是否在标记限制内\n\u002F\u002F 如果超出限制则返回 false，否则返回实际的标记数量（真值）\nconst withinTokenLimit = isWithinTokenLimit(text, tokenLimit)\n\n\u002F\u002F 在需要时允许特殊标记\nconst withinTokenLimitWithSpecial = isWithinTokenLimit(text, tokenLimit, {\n  allowedSpecial: ALL_SPECIAL_TOKENS,\n})\n\n\u002F\u002F 示例对话：\nconst chat = [\n  { role: 'system', content: '你是一个有用的助手。' },\n  { role: 'assistant', content: 'gpt-tokenizer 非常棒。' },\n] as const\n\n\u002F\u002F 将对话编码为标记\nconst chatTokens = encodeChat(chat)\n\n\u002F\u002F 检查对话是否在标记限制内\nconst chatWithinTokenLimit = isWithinTokenLimit(chat, tokenLimit)\n\nconst chatWithinTokenLimitWithSpecial = isWithinTokenLimit(chat, tokenLimit, {\n  allowedSpecial: ALL_SPECIAL_TOKENS,\n})\n\n\u002F\u002F 使用生成器编码文本\nfor (const tokenChunk of encodeGenerator(text)) {\n  console.log(tokenChunk)\n}\n\n\u002F\u002F 使用生成器解码标记\nfor (const textChunk of decodeGenerator(tokens)) {\n  console.log(textChunk)\n}\n\n\u002F\u002F 使用异步生成器解码标记\n\u002F\u002F （假设 `asyncTokens` 是一个 AsyncIterableIterator\u003Cnumber>）\nfor await (const textChunk of decodeAsyncGenerator(asyncTokens)) {\n  console.log(textChunk)\n}\n```\n\n默认情况下，从 `gpt-tokenizer` 导入会使用 `o200k_base` 编码，所有现代 OpenAI 模型都采用此编码，包括 `gpt-4o`、`gpt-4.1`、`o1` 等。\n\n若需获取其他模型的分词器，可直接导入，例如：\n\n```ts\nimport {\n  encode,\n  decode,\n  isWithinTokenLimit,\n  \u002F\u002F 等等...\n} from 'gpt-tokenizer\u002Fmodel\u002Fgpt-3.5-turbo'\n```\n\n如果使用的解析器不支持 `package.json` 的 `exports` 解析，可能需要从相应的 `cjs` 或 `esm` 目录中导入，例如：\n\n```ts\nimport {\n  encode,\n  decode,\n  isWithinTokenLimit,\n  \u002F\u002F 等等...\n} from 'gpt-tokenizer\u002Fcjs\u002Fmodel\u002Fgpt-3.5-turbo'\n```\n\n#### 延迟加载\n\n如果您不介意以异步方式加载分词器，可以在函数内部使用动态导入，如下所示：\n\n```ts\nconst {\n  encode,\n  decode,\n  isWithinTokenLimit,\n  \u002F\u002F 等等...\n} = await import('gpt-tokenizer\u002Fmodel\u002Fgpt-3.5-turbo')\n```\n\n#### 加载自定义编码\n\n如果您的模型未被本包支持，但您知道它使用的 BPE 编码，可以直接加载该编码，例如：\n\n```ts\nimport {\n  encode,\n  decode,\n  isWithinTokenLimit,\n  \u002F\u002F 等等...\n} from 'gpt-tokenizer\u002Fencoding\u002Fcl100k_base'\n```\n\n### 支持的模型及其编码\n\n我们支持所有 OpenAI 模型，包括最新版本，具体编码如下：\n\n- `o` 系列模型，如 `o1-*`、`o3-*` 和 `o4-*`（`o200k_base`）\n- `gpt-4o`（`o200k_base`）\n- `gpt-oss-*`（`o200k_harmony`）\n- `gpt-4-*`（`cl100k_base`）\n- `gpt-3.5-*`（`cl100k_base`）\n- `text-davinci-003`（`p50k_base`）\n- `text-davinci-002`（`p50k_base`）\n- `text-davinci-001`（`r50k_base`）\n- …以及其他众多模型，详情请参阅 [models.ts](.\u002Fsrc\u002Fmodels.ts)，其中列出了当前支持的模型及其编码。\n\n如果您没有找到所需的模型，那么默认编码很可能就是您所需要的。\n\n## API\n\n### `encode(text: string, encodeOptions?: 
EncodeOptions): number[]`\n\n将给定文本编码为标记序列。当您需要将一段文本转换为 GPT 模型可处理的标记格式时，请使用此方法。\n\n可选的 `encodeOptions` 参数允许您指定特殊标记的处理方式（详见[特殊标记](#special-tokens)）。\n\n示例：\n\n```typescript\nimport { encode } from 'gpt-tokenizer'\n\nconst text = 'Hello, world!'\nconst tokens = encode(text)\n```\n\n### `decode(tokens: number[]): string`\n\n将标记序列解码回文本。当您希望将 GPT 模型的输出标记转换为人类可读的文本时，请使用此方法。\n\n示例：\n\n```typescript\nimport { decode } from 'gpt-tokenizer'\n\nconst tokens = [18435, 198, 23132, 328]\nconst text = decode(tokens)\n```\n\n### `isWithinTokenLimit(text: string | Iterable\u003CChatMessage>, tokenLimit: number, encodeOptions?: EncodeOptions): false | number`\n\n检查输入是否在标记限制内。如果超出限制则返回 `false`，否则返回标记数量。当您需要快速检查给定文本或对话是否在 GPT 模型的标记限制范围内，而无需对整个输入进行编码时，可使用此方法。可选的 `encodeOptions` 参数允许您配置特殊标记的处理方式。\n\n示例：\n\n```typescript\nimport { isWithinTokenLimit, ALL_SPECIAL_TOKENS } from 'gpt-tokenizer'\n\nconst text = 'Hello, world!'\nconst tokenLimit = 10\nconst withinTokenLimit = isWithinTokenLimit(text, tokenLimit)\n\nconst withinTokenLimitWithSpecial = isWithinTokenLimit(text, tokenLimit, {\n  allowedSpecial: ALL_SPECIAL_TOKENS,\n})\n```\n\n### `countTokens(text: string | Iterable\u003CChatMessage>, encodeOptions?: EncodeOptions): number`\n\n统计输入文本或对话中的标记数量。当您需要确定标记数量而不必对照限制时，可使用此方法。\n可选的 `encodeOptions` 参数允许您指定允许或禁止的特殊标记集合。\n\n示例：\n\n```typescript\nimport { countTokens } from 'gpt-tokenizer'\n\nconst text = 'Hello, world!'\nconst tokenCount = countTokens(text)\n```\n\n### `countChatCompletionTokens(request: ChatCompletionRequest): number`\n\n计算一个使用函数调用的聊天补全请求将消耗的 token 数量，包括消息开销、可选的函数定义以及固定的函数调用。此辅助函数仅适用于支持 `function_calling` 功能的模型。\n\n示例：\n\n```typescript\nimport {\n  countChatCompletionTokens,\n  type ChatCompletionRequest,\n} from 'gpt-tokenizer\u002Fmodel\u002Fgpt-4o'\n\nconst request: ChatCompletionRequest = {\n  messages: [\n    { role: 'system', content: '你是一位乐于助人的助手。' },\n    { role: 'user', content: '查询旧金山的天气。' },\n  ],\n  functions: [\n    {\n      name: 'get_weather',\n      description: '查询某个城市的天气。',\n      parameters: {\n        type: 'object',\n        required: ['city'],\n        properties: {\n          city: { type: 'string' },\n          unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },\n        },\n      },\n    },\n  ],\n}\n\nconst promptTokenEstimate = countChatCompletionTokens(request)\n```\n\n你也可以通过模块的默认导出来访问该辅助函数：\n\n```typescript\nimport gpt4o from 'gpt-tokenizer\u002Fmodel\u002Fgpt-4o'\n\n\u002F\u002F 复用上面定义的 `request`\nconst tokenCount = gpt4o.countChatCompletionTokens?.(request)\n```\n\n### `encodeChat(chat: ChatMessage[], model?: ModelName, encodeOptions?: EncodeOptions): number[]`\n\n将给定的聊天对话编码为一串 token 序列。可选的 `encodeOptions` 参数允许你配置特殊 token 的处理方式。\n\n如果你没有直接导入模型版本，或者在初始化时未提供 `model` 参数，则必须在此处提供，以便正确地为特定模型对聊天进行分词。当你需要将聊天转换为 GPT 模型可以处理的 token 格式时，请使用此方法。\n\n示例：\n\n```typescript\nimport { encodeChat } from 'gpt-tokenizer'\n\nconst chat = [\n  { role: 'system', content: '你是一位乐于助人的助手。' },\n  { role: 'assistant', content: 'gpt-tokenizer 真棒。' },\n]\nconst tokens = encodeChat(chat)\n```\n\n请注意，即使你编码的是空聊天，它仍然会包含最少数量的特殊 token。\n\n### `encodeGenerator(text: string): Generator\u003Cnumber[], void, undefined>`\n\n使用生成器对给定文本进行编码，每次生成一部分 token。\n当你希望分块编码文本时，可以使用此方法，这在处理大型文本或流式数据时非常有用。\n\n示例：\n\n```typescript\nimport { encodeGenerator } from 'gpt-tokenizer'\n\nconst text = '你好，世界！'\nconst tokens = []\nfor (const tokenChunk of encodeGenerator(text)) {\n  tokens.push(...tokenChunk)\n}\n```\n\n### `encodeChatGenerator(chat: Iterator\u003CChatMessage>, model?: ModelName): Generator\u003Cnumber[], void, 
undefined>`\n\n与 `encodeChat` 相同，但输出为生成器，并且可以接受任何迭代器作为输入的聊天内容。\n\n### `decodeGenerator(tokens: Iterable\u003Cnumber>): Generator\u003Cstring, void, undefined>`\n\n使用生成器解码 token 序列，每次生成一部分解码后的文本。\n当你希望分块解码 token 时，可以使用此方法，这在处理大型输出或流式数据时很有用。\n\n示例：\n\n```typescript\nimport { decodeGenerator } from 'gpt-tokenizer'\n\nconst tokens = [18435, 198, 23132, 328]\nlet decodedText = ''\nfor (const textChunk of decodeGenerator(tokens)) {\n  decodedText += textChunk\n}\n```\n\n### `decodeAsyncGenerator(tokens: AsyncIterable\u003Cnumber>): AsyncGenerator\u003Cstring, void, undefined>`\n\n异步使用生成器解码 token 序列，每次生成一部分解码后的文本。当你希望以异步方式分块解码 token 时，可以使用此方法，这在异步上下文中处理大型输出或流式数据时非常有用。\n\n示例：\n\n```javascript\nimport { decodeAsyncGenerator } from 'gpt-tokenizer'\n\nasync function processTokens(asyncTokensIterator) {\n  let decodedText = ''\n  for await (const textChunk of decodeAsyncGenerator(asyncTokensIterator)) {\n    decodedText += textChunk\n  }\n}\n```\n\n### `estimateCost(tokenCount: number, modelSpec?: ModelSpec): PriceData`\n\n根据模型的定价信息，估算处理给定数量 token 的成本。此函数会计算不同 API 使用类型（主 API、批量 API）的成本，并在有缓存 token 的情况下考虑其成本。\n\n该函数返回一个 `PriceData` 对象，结构如下：\n\n- `main`: 主 API 定价，包含 `input`、`output`、`cached_input` 和 `cached_output` 成本\n- `batch`: 批量 API 定价，同样包含上述成本类别\n\n所有成本均基于提供的 token 数量以美元计算。\n\n示例：\n\n```typescript\nimport { estimateCost } from 'gpt-tokenizer\u002Fmodel\u002Fgpt-4o'\n\nconst tokenCount = 1000\nconst costEstimate = estimateCost(tokenCount)\n\nconsole.log('主 API 输入成本：', costEstimate.main?.input)\nconsole.log('主 API 输出成本：', costEstimate.main?.output)\nconsole.log('批量 API 输入成本：', costEstimate.batch?.input)\n```\n\n注意：模型规格必须通过特定于模型的导入获得，或者作为第二个参数传递。并非所有模型都提供成本信息。\n\n## 特殊 token\n\nGPT 模型使用了一些特殊的 token。\n需要注意的是，不是所有模型都支持所有的特殊 token。\n\n默认情况下，**所有特殊 token 均被禁止**。\n\n`encode`、`encodeGenerator`、`encodeChat`、`encodeChatGenerator`、`countTokens` 和 `isWithinTokenLimit` 函数都接受一个 `EncodeOptions` 参数，用于自定义特殊 token 的处理方式：\n\n### 自定义允许集\n\n`gpt-tokenizer` 允许你在编码文本时指定自定义的特殊 token 允许集。为此，你可以将包含允许特殊 token 的 `Set` 作为参数传递给 `encode` 函数：\n\n```ts\nimport {\n  EndOfPrompt,\n  EndOfText,\n  FimMiddle,\n  FimPrefix,\n  FimSuffix,\n  ImStart,\n  ImEnd,\n  ImSep,\n  encode,\n} from 'gpt-tokenizer'\n\nconst inputText = `Some Text ${EndOfPrompt}`\nconst allowedSpecialTokens = new Set([EndOfPrompt])\nconst encoded = encode(inputText, { allowedSpecialTokens })\nconst expectedEncoded = [8538, 2991, 220, 100276]\n\nexpect(encoded).toEqual(expectedEncoded)\n```\n\n你还可以使用一种特殊的简写方式来禁止或允许所有特殊 token，只需传递字符串 `'all'`，例如 `{ allowedSpecial: 'all' }`。\n\n### 自定义禁止集\n\n同样，你也可以在编码文本时指定自定义的特殊 token 禁止集。将包含禁止特殊 token 的 `Set` 作为参数传递给 `encode` 函数：\n\n```ts\nimport { encode, EndOfText } from 'gpt-tokenizer'\n\nconst inputText = `一些文本 ${EndOfText}`\nconst disallowedSpecial = new Set([EndOfText])\n\u002F\u002F 将抛出错误：\nconst encoded = encode(inputText, { disallowedSpecial })\n```\n\n在这个例子中，由于输入文本包含被禁止的特殊 token，因此会抛出错误。\n\n如果同时提供了 `allowedSpecialTokens` 和 `disallowedSpecial`，则以 `disallowedSpecial` 为准。\n\n## 性能优化\n\n### LRU 合并缓存\n\n分词器使用 LRU（最近最少使用）缓存来提升相似字符串的编码性能。默认情况下，它会存储最多 10 万个合并后的标记对。你可以根据具体用例调整此值以达到最佳效果：\n\n- 增大缓存大小会使相似字符串的编码速度更快，但会占用更多内存。\n- 将其设置为 0 会完全禁用缓存。\n- 对于处理大量唯一字符串的应用程序，较小的缓存可能更为高效。\n\n你可以使用 `setMergeCacheSize` 函数来修改缓存大小：\n\n```ts\nimport { setMergeCacheSize } from 'gpt-tokenizer'\n\n\u002F\u002F 设置为 5000 条目\nsetMergeCacheSize(5000)\n\n\u002F\u002F 完全禁用缓存\nsetMergeCacheSize(0)\n```\n\n缓存会在多次编码调用之间保持持久化。若需显式清空缓存（例如释放内存），可以使用 `clearMergeCache` 函数：\n\n```ts\nimport { clearMergeCache } from 'gpt-tokenizer'\n\nclearMergeCache()\n```\n\n## 
测试与验证\n\n`gpt-tokenizer` 在 [TestPlans.txt](.\u002Fdata\u002FTestPlans.txt) 文件中包含一组测试用例，用于确保其与 OpenAI 的 Python `tiktoken` 库兼容。这些测试用例验证了 `gpt-tokenizer` 的功能和行为，为开发者提供了一个可靠的参考。\n\n运行单元测试并验证这些测试用例，有助于保持该库与原始 Python 实现的一致性。\n\n### 模型信息\n\n`gpt-tokenizer` 通过 [`gpt-tokenizer\u002Fmodels`](.\u002Fsrc\u002Fmodels.ts) 中的 `models` 导出，提供了关于所有 OpenAI 模型的全面数据。这包括上下文窗口、费用、训练数据截止日期以及弃用状态等详细信息。\n\n这些数据会定期维护，以匹配 OpenAI 的官方文档。欢迎贡献以保持数据的最新状态；如果你发现任何差异或有更新内容，请随时提交 Pull Request。\n\n## [基准测试](https:\u002F\u002Fl8j6fv.csb.app\u002F)\n\n自 2.4.0 版本以来，`gpt-tokenizer` 已成为 NPM 上最快的分词器实现，甚至比现有的 WASM\u002Fnode 绑定实现还要快。它具有最快的编码和解码速度，同时内存占用极小，并且初始化速度也优于其他所有实现。\n\n此外，由于采用紧凑的存储格式，其编码结果的文件尺寸也是最小的。\n\n![最快基准测试](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fniieani_gpt-tokenizer_readme_22ff95b608d1.png)\n\n![最低内存占用基准测试](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fniieani_gpt-tokenizer_readme_63f88ae361ad.png)\n\n## 许可证\n\nMIT\n\n## 贡献\n\n我们欢迎各种形式的贡献！请提交 Pull Request 或 Issue 来讨论你的问题报告，或者使用 Discussions 功能提出想法或其他疑问。\n\n## 致谢\n\n感谢 @dmitry-brazhenko 的 [SharpToken](https:\u002F\u002Fgithub.com\u002Fdmitry-brazhenko\u002FSharpToken)，其代码为本次移植提供了参考。\n\n希望你在项目中能充分利用 `gpt-tokenizer`！","# gpt-tokenizer 快速上手指南\n\n`gpt-tokenizer` 是一个高性能的 JavaScript\u002FTypeScript 库，支持所有 OpenAI 模型（包括 GPT-4o, o1, GPT-4.1 及旧版模型）的 Token 编码与解码。它是目前 JS 环境中速度最快、体积最小且无全局缓存泄漏风险的 GPT Tokenizer。\n\n## 环境准备\n\n- **运行环境**：Node.js (推荐 v14+) 或现代浏览器。\n- **语言支持**：原生支持 TypeScript，无需额外配置类型定义。\n- **前置依赖**：无第三方运行时依赖。\n\n## 安装步骤\n\n### 方式一：使用 NPM 安装（推荐）\n\n```bash\nnpm install gpt-tokenizer\n```\n\n> **国内加速建议**：如果下载缓慢，可使用淘宝镜像源安装：\n> ```bash\n> npm install gpt-tokenizer --registry=https:\u002F\u002Fregistry.npmmirror.com\n> ```\n\n### 方式二：浏览器直接使用 (UMD)\n\n在 HTML 中直接引入脚本文件，库将暴露为全局变量 `GPTTokenizer_${encoding}`：\n\n```html\n\u003C!-- 以现代模型默认编码 o200k_base 为例 -->\n\u003Cscript src=\"https:\u002F\u002Funpkg.com\u002Fgpt-tokenizer\u002Fdist\u002Fo200k_base.js\">\u003C\u002Fscript>\n\n\u003Cscript>\n  \u002F\u002F 全局变量名为 GPTTokenizer_o200k_base\n  const { encode, decode } = GPTTokenizer_o200k_base;\n  \n  const tokens = encode('Hello world');\n  console.log(decode(tokens));\n\u003C\u002Fscript>\n```\n\n## 基本使用\n\n### 1. 基础文本编码与解码\n\n默认导入使用的是 `o200k_base` 编码，适用于 `gpt-4o`, `o1`, `gpt-4.1` 等最新模型。\n\n```typescript\nimport { encode, decode } from 'gpt-tokenizer'\n\nconst text = 'Hello, world!'\n\n\u002F\u002F 编码：文本 -> Token 数组\nconst tokens = encode(text)\nconsole.log(tokens) \n\n\u002F\u002F 解码：Token 数组 -> 文本\nconst decodedText = decode(tokens)\nconsole.log(decodedText) \n```\n\n### 2. 针对特定模型使用\n\n如果需要为旧模型（如 `gpt-3.5-turbo` 或 `gpt-4`）进行分词，请从对应的模型路径导入：\n\n```typescript\nimport { encode, decode } from 'gpt-tokenizer\u002Fmodel\u002Fgpt-3.5-turbo'\n\nconst text = '你好，世界！'\nconst tokens = encode(text)\n```\n\n### 3. 对话消息编码 (Chat Messages)\n\n处理包含 role 和 content 的对话数组时，请使用 `encodeChat`：\n\n```typescript\nimport { encodeChat } from 'gpt-tokenizer'\n\nconst chat = [\n  { role: 'system', content: 'You are a helpful assistant.' },\n  { role: 'user', content: '解释一下量子力学。' },\n] as const\n\n\u002F\u002F 将对话结构转换为 Token 数组\nconst chatTokens = encodeChat(chat)\n```\n\n### 4. 
检查 Token 数量限制\n\n在不完整编码整个文本的情况下，快速判断是否超出 Token 限制：\n\n```typescript\nimport { isWithinTokenLimit } from 'gpt-tokenizer'\n\nconst text = '这是一段很长的测试文本...'\nconst limit = 100\n\n\u002F\u002F 如果未超限，返回实际 token 数 (真值)；如果超限，返回 false\nconst result = isWithinTokenLimit(text, limit)\n\nif (result) {\n  console.log(`当前长度: ${result}，未超限`)\n} else {\n  console.log('已超出 Token 限制')\n}\n```","某 SaaS 客服团队正在开发一个基于 GPT-4o 的智能工单摘要系统，需要在用户提交长文本时实时计算 Token 消耗并预估 API 成本。\n\n### 没有 gpt-tokenizer 时\n- **性能瓶颈严重**：引入传统的 Node.js 绑定库导致浏览器端卡顿，无法在用户输入时即时反馈剩余字数限制。\n- **模型支持滞后**：官方库更新缓慢，难以适配最新的 GPT-4o 或 o1 模型编码规则，导致计数偏差引发 API 报错。\n- **成本核算繁琐**：缺乏内置计价逻辑，开发者需手动维护复杂的模型价格表来估算单次请求费用，容易出错。\n- **内存泄漏风险**：旧方案常依赖全局缓存处理分词，长时间运行后易造成前端页面内存溢出崩溃。\n\n### 使用 gpt-tokenizer 后\n- **极致响应速度**：利用其纯 TypeScript 编写的高性能特性，实现毫秒级同步分词，用户打字时即可动态显示进度条。\n- **全模型无缝兼容**：直接调用内置的 `o200k_base` 等最新编码配置，完美覆盖从 GPT-3.5 到 GPT-5 的所有迭代版本。\n- **一键成本估算**：通过 `estimateCost` 函数自动匹配官方定价，实时向管理员展示当前对话的预计美元开销。\n- **架构安全轻量**：无全局缓存设计彻底杜绝内存泄漏，且无需异步加载，可直接嵌入 Web Worker 或边缘函数中运行。\n\ngpt-tokenizer 将原本复杂的 Token 管理难题转化为简单的函数调用，让开发者能专注于业务逻辑而非底层编码细节。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fniieani_gpt-tokenizer_89789f7a.png","niieani","Bazyli Brzóska","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fniieani_7f33fcbb.png","An inventor passionate about HCI workflows that help humanity innovate responsibly, for happiness and global thriving.\r\n\r\nOfficial @zendesk account: @bbrzoska","@zendesk","San Francisco, CA",null,"https:\u002F\u002Finvent.life","https:\u002F\u002Fgithub.com\u002Fniieani",[83,87,91],{"name":84,"color":85,"percentage":86},"TypeScript","#3178c6",98.3,{"name":88,"color":89,"percentage":90},"JavaScript","#f1e05a",1.7,{"name":92,"color":93,"percentage":94},"Shell","#89e051",0,764,54,"2026-04-01T19:22:40","MIT",1,"未说明","不需要 GPU",{"notes":103,"python":104,"dependencies":105},"该工具是一个纯 JavaScript\u002FTypeScript 库，可在所有 JavaScript 环境（包括浏览器和 Node.js）中运行。无需安装 Python、GPU 或特定操作系统。支持通过 NPM 安装或直接作为 UMD 模块在 HTML 中引入。默认使用 o200k_base 编码，也可按需导入特定模型的编码模块。","不需要 Python",[106],"无 (纯 TypeScript\u002FJavaScript 库)",[35,14],[109,110,111,112,113,114,115,116,117,118,119,120],"bpe","gpt-2","gpt-3","machine-learning","gpt-4","tokenizer","decoder","encoder","openai","gpt-4o","gpt-o1","gpt-5","2026-03-27T02:49:30.150509","2026-04-06T15:55:33.492105",[124,129,134,139,144,149],{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},19515,"遇到 'encodeChatGenerator undefined' 或 'Cannot read properties of undefined' 错误怎么办？","这是一个已知问题，通常是因为忘记绑定函数或文档示例有误导致的。该问题已在 v2.1.2 版本中修复。请升级您的依赖包到最新版本：\n- npm: `npm install gpt-tokenizer@latest`\n- 或者指定版本：`npm install gpt-tokenizer@2.1.2`\n维护者确认这是文档生成时的疏忽，升级后即可解决。","https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F15",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},19516,"为什么库计算出的 Token 数量远高于 OpenAI API 实际返回的数量？","这是因为早期版本未正确处理聊天（Chat）格式的编码。请使用新推出的 `encodeChat` 函数来替代普通的 encode 函数，它能正确计算聊天消息的 Token 数。该修复已包含在 v2.0.0 及更高版本中。请确保升级库并使用正确的 API：\n`import { encodeChat } from 'gpt-tokenizer';`","https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F6",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},19517,"如何在 Cloudflare Workers 等环境中避免昂贵的初始化导致加载缓慢？","维护者在 v2.3.0 版本中进行了重大性能优化，显著降低了加载时间和内存消耗。数据显示，从 v2.2.3 升级到 v2.3.0 后，加载时间从约 253ms 降至 45ms，内存占用也大幅减少。请直接升级到 v2.3.0 或更高版本以解决此问题：\n`npm install 
gpt-tokenizer@2.3.0`","https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F18",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},19518,"gpt-tokenizer 的性能似乎不如 tiktoken，运行速度较慢，如何解决？","这是一个性能回归问题，已在 v2.8.0 版本中得到彻底解决。用户反馈显示，升级后执行时间从 11440ms 大幅下降至 615ms，性能甚至超过了 tiktoken。如果您遇到性能瓶颈，请务必升级到 v2.8.0 或更新版本：\n`npm install gpt-tokenizer@2.8.0`","https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F68",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},19519,"升级到 TypeScript 5.6+ 时出现 'EncoderMap' 类型不兼容的构建错误怎么办？","这是由于 TypeScript 5.6 加强了类型检查，暴露了库中之前隐藏的类型定义问题（特别是 `IterableIterator` 与 `MapIterator` 的不兼容）。这属于库本身的类型定义缺陷。解决方案是等待并升级到修复了该类型定义的后续版本。如果当前阻塞开发，临时方案是在 `tsconfig.json` 中设置 `skipLibCheck: true` 跳过第三方库的类型检查，但建议尽快更新 `gpt-tokenizer` 到修复版。","https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F49",{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},19520,"在 iOS 浏览器（Safari\u002FWebKit）上使用时遇到 'Maximum call stack size exceeded' 错误？","这是一个在 iOS WebKit 内核浏览器上特有的兼容性错误。维护者正在调查中，通常需要用户提供具体的复现步骤（如使用的编码格式 `cl100k_base` 等）和设备信息来定位问题。如果您遇到此问题，建议：\n1. 确认是否在所有 iOS 浏览器均复现。\n2. 尝试在 CodeSandbox 中创建最小复现案例提供给维护者。\n3. 关注仓库的最新发布，此类严重兼容性错误通常会在后续补丁中优先修复。","https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F62",[155,160,165,170,175,180,185,190,195,200,205,210,215,220,225,230,235,240,245,250],{"id":156,"version":157,"summary_zh":158,"released_at":159},117569,"3.4.0","# [3.4.0](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F3.3.0...3.4.0) (2025-11-07)\n\n\n### 功能\n\n* 添加函数调用的 token 计数功能 ([#83](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F83)) ([7f880f4](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F7f880f46bb34a644ec8f9b3069060b2d2f99e11c))\n\n\n\n","2025-11-07T20:15:07",{"id":161,"version":162,"summary_zh":163,"released_at":164},117570,"3.3.0","# [3.3.0](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F3.2.0...3.3.0) (2025-11-07)\n\n\n### 功能特性\n\n* 修正 o200 的分词正则表达式，并增加对 harmony 格式的支持 ([59422fd](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F59422fd8b68987dc0a207e43c48198c1346ecc50))，关闭 [#82](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F82) 和 [#78](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F78)\n\n\n\n","2025-11-07T00:11:17",{"id":166,"version":167,"summary_zh":168,"released_at":169},117571,"3.2.0","# [3.2.0](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F3.1.0...3.2.0) (2025-10-09)\n\n\n### 功能\n\n* 支持在 `isWithinTokenLimit` 中使用编码选项 ([#80](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F80)) ([dcc8783](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002Fdcc878381f2b173998141d0adefaf5116c33e6b3))\n\n\n\n","2025-10-09T23:12:04",{"id":171,"version":172,"summary_zh":173,"released_at":174},117572,"3.1.0","# [3.1.0](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F3.0.1...3.1.0) (2025-10-09)\n\n\n### Bug 修复\n\n* **codegen:** 防止 BPE 生成空输出 ([c46019a](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002Fc46019a95b34182781dbfcf1ab240f2672dec522))\n* 将生成的头文件放置在 lint 指令之后 ([70bed74](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F70bed74b8858bae62f007f04984cc2c2329e8964))\n\n\n### 功能\n\n* 添加新模型（gpt-5*）并更新定价数据 
([135a851](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F135a8513f31459b569b2899f676bc1fa4b4e8ca3))\n\n\n\n","2025-10-09T22:47:22",{"id":176,"version":177,"summary_zh":178,"released_at":179},117573,"3.0.1","## [3.0.1](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F3.0.0...3.0.1)（2025-06-13）\n\n\n### 错误修复\n\n* 添加 o3-pro 并更新定价 ([52a3b3c](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F52a3b3c52bf10c8fce30a30da491d71e498b1d34))\n\n\n\n","2025-06-13T05:04:12",{"id":181,"version":182,"summary_zh":183,"released_at":184},117574,"3.0.0","# [3.0.0](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.9.0...3.0.0) (2025-06-07)\n\n\n### 功能\n\n* 添加新模型及成本估算功能 ([#72](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F72)) ([1d1d76d](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F1d1d76d86bd22d89b89ebba09f5f7b4eed04c4eb)), 关闭了 [#71](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F71) 和 [#70](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F70)\n\n\n### 重大变更\n\n* 将默认编码更改为 `o200k_base`，因为目前大多数现代模型都使用该编码\n\n\n\n","2025-06-07T22:51:02",{"id":186,"version":187,"summary_zh":188,"released_at":189},117575,"2.9.0","# [2.9.0](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.8.1...2.9.0) (2025-03-05)\n\n\n### 功能\n\n* 新增模型并更新定价 ([e2506c2](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002Fe2506c229c542384928e35b34a2d3ed07cf68a10))\n* 实现 'estimateCost' 方法 ([4124587](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F4124587f4e2b30908db2cfe2d81b6520411e87de))\n\n\n\n","2025-03-05T02:40:41",{"id":191,"version":192,"summary_zh":193,"released_at":194},117576,"2.8.1","## [2.8.1](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.8.0...2.8.1) (2024-12-09)\n\n\n### 错误修复\n\n* 添加 'clearMergeCache' ([4f64377](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F4f64377455243d5a923a0fd0437886c6b0d11dc9))\n\n\n\n","2024-12-09T10:11:51",{"id":196,"version":197,"summary_zh":198,"released_at":199},117577,"2.8.0","# [2.8.0](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.7.0...2.8.0) (2024-12-09)\n\n\n### 功能\n\n* 添加 LRU BPE 合并缓存 ([15d13b1](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F15d13b1a35047d531efda795200257183b892a93)), 关闭 [#68](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F68)\n\n\n### 性能改进\n\n* 优化令牌计数 ([c3e533c](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002Fc3e533c9f814875cd97ee8ae8b11d2129fc8e86f))\n\n\n\n","2024-12-09T09:43:49",{"id":201,"version":202,"summary_zh":203,"released_at":204},117578,"2.7.0","# [2.7.0](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.6.2...2.7.0) (2024-11-28)\n\n\n### 功能\n\n* 实现 `countTokens` 函数 ([2d4146a](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F2d4146a064d9dc6c2512bf6c869b05f2d18ce741)), 关闭 [#67](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F67)\n\n\n\n","2024-11-28T01:50:33",{"id":206,"version":207,"summary_zh":208,"released_at":209},117579,"2.6.2","## [2.6.2](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.6.1...2.6.2) (2024-11-13)\n\n\n### Bug Fixes\n\n* 
correct special token matching & counting ([3547826](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F3547826b37e829009a40d421a3733a54d13cd452))\n* unify property and variable names across the library ([6030d91](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F6030d91cbd8a08876212e9e43d4eb7387465e5ac))\n\n\n\n","2024-11-13T06:26:13",{"id":211,"version":212,"summary_zh":213,"released_at":214},117580,"2.6.1","## [2.6.1](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.6.0...2.6.1) (2024-11-11)\n\n\n### Bug Fixes\n\n* expose vocabulary size ([402ff0b](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F402ff0bea15acdd62cf5d2069ffa94b26f8200c4)), closes [#66](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F66)\n* use extensions in models.ts ([78b803d](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F78b803d4cf60dcf04b203b293378244e2efbabb2)), closes [#65](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F65)\n\n\n\n","2024-11-11T00:38:36",{"id":216,"version":217,"summary_zh":218,"released_at":219},117581,"2.6.0","# [2.6.0](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.5.1...2.6.0) (2024-11-04)\n\n\n### Bug Fixes\n\n* initialize encodings array in parts ([aa6c71d](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002Faa6c71d1d3d6756087c5d246daa17669f94bc0a0)), closes [#62](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F62)\n\n\n### Features\n\n* add new and update existing models ([e832f9a](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002Fe832f9a3c6ece43ad6f709e0fda33f7c0e68a743))\n* provide comprehensive data for all OpenAI models ([ec2ad7e](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002Fec2ad7efc7873a303baab71853047f58becb1877))\n\n\n\n","2024-11-04T05:56:04",{"id":221,"version":222,"summary_zh":223,"released_at":224},117582,"2.5.1","## [2.5.1](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.5.0...2.5.1) (2024-10-21)\r\n\r\n(no changes, only benchmark update)\r\n\r\n","2024-10-21T03:27:53",{"id":226,"version":227,"summary_zh":228,"released_at":229},117583,"2.5.0","# [2.5.0](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.4.1...2.5.0) (2024-10-09)\n\n\n### Features\n\n* added o1-preview and o1-mini chat completion models ([#56](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F56)) ([41673af](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F41673afe7078c73d439583ffd470b6c52ed4f625))\n\n\n\n","2024-10-09T06:09:11",{"id":231,"version":232,"summary_zh":233,"released_at":234},117584,"2.4.1","## [2.4.1](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.4.0...2.4.1) (2024-10-07)\n\n\n### Bug Fixes\n\n* **deps:** update dependency gpt-tokenizer to ^2.4.0 ([bf4b459](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002Fbf4b459d8d99903264698f606bdd9a31ca0b724f))\n\n\n\n","2024-10-07T03:27:09",{"id":236,"version":237,"summary_zh":238,"released_at":239},117585,"2.4.0","# [2.4.0](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.3.0...2.4.0) (2024-09-23)\n\n\n### Features\n\n* performance optimizations 
([661e283](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F661e283ec92fa9b31a8d1eee01b29680c251e00a))\n\n\n\n","2024-09-23T02:47:13",{"id":241,"version":242,"summary_zh":243,"released_at":244},117586,"2.3.0","# [2.3.0](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.2.3...2.3.0) (2024-09-20)\n\n\n### Features\n\n* improve performance, memory usage & initialization time ([#50](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F50)) ([e2c560a](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002Fe2c560aafeda84dcbec61880d552ffbaa69deaac)), closes [#18](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F18) [#35](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F35)\n\n\n\n","2024-09-20T02:37:36",{"id":246,"version":247,"summary_zh":248,"released_at":249},117587,"2.2.3","## [2.2.3](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.2.2...2.2.3) (2024-09-16)\n\n\n### Bug Fixes\n\n* **deps:** update dependency rfc4648 to ^1.5.3 ([fcbf48a](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002Ffcbf48a553dcc4d6e7b617374880736070d16882))\n\n\n\n","2024-09-16T04:27:30",{"id":251,"version":252,"summary_zh":253,"released_at":254},117588,"2.2.2","## [2.2.2](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcompare\u002F2.2.1...2.2.2) (2024-09-15)\n\n\n### Bug Fixes\n\n* improve test typing ([bbd0764](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002Fbbd0764ad238c6c3f83aadfc75c37d47488577f6))\n* upgrade dependencies (including typescript) ([75ebd54](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fcommit\u002F75ebd542d8c70c2938b2fb214474f763fad4dccf)), closes [#49](https:\u002F\u002Fgithub.com\u002Fniieani\u002Fgpt-tokenizer\u002Fissues\u002F49)\n\n\n\n","2024-09-15T23:11:17"]