[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Ki6an--fastT5":3,"tool-Ki6an--fastT5":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":78,"owner_location":78,"owner_email":79,"owner_twitter":80,"owner_website":78,"owner_url":81,"languages":82,"stars":87,"forks":88,"last_commit_at":89,"license":90,"difficulty_score":23,"env_os":91,"env_gpu":92,"env_ram":91,"env_deps":93,"category_tags":99,"github_topics":100,"view_count":23,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":115,"updated_at":116,"faqs":117,"releases":143},3911,"Ki6an\u002FfastT5","fastT5","⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.","fastT5 是一款专为优化 T5 模型而设计的开源加速工具，旨在解决大型语言模型在推理过程中速度慢、体积大的痛点。它能够将预训练的 T5 模型转换为高效的 ONNX 格式，并通过量化技术将模型体积缩小至原来的三分之一，同时将推理速度提升高达五倍。这意味着原本运行缓慢的文本生成、摘要、翻译等任务，现在能以更低的资源消耗快速完成。\n\n该工具特别适合需要部署 NLP 应用的开发者和技术研究人员。无论是希望在边缘设备上运行模型，还是想要降低服务器成本，fastT5 都能提供显著帮助。其核心亮点在于极简的使用体验：只需一行代码，即可自动完成从模型导出、量化到基于 ONNX Runtime 运行的全过程。当然，它也支持分步自定义配置，允许高级用户灵活控制输出路径和量化策略。此外，生成的模型完美兼容 Hugging Face 的 `generate()` 接口，无需修改现有代码逻辑即可无缝集成，让模型加速变得简单高效。","![fastt5 icon](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKi6an_fastT5_readme_bf3381bce824.png)\n\n\u003Ch1 style=\"text-align:center; font-weight:bold;\nfont-size:1.875rem\">Reduce T5 model size by 3X and increase the inference speed up to 5X.\n\u003C\u002Fh1>\n\n\u003Cp 
align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\u002Fblob\u002Fmaster\u002FLICENSE\">\n        \u003Cimg alt=\"GitHub\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fki6an\u002Ffastt5?color=blue\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\u002Factions\u002Fworkflows\u002Fci-workflow.yml\">\n        \u003Cimg alt=\"Workflow\" src=\"https:\u002F\u002Fgithub.com\u002Fki6an\u002FfastT5\u002Factions\u002Fworkflows\u002Fci-workflow.yml\u002Fbadge.svg\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\u002Freleases\" >\n        \u003Cimg alt=\"PYPI release\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Ffastt5\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\" >\n        \u003Cimg alt=\"Workflow\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Ffastt5\">\n    \u003C\u002Fa>\n \u003C\u002Fp>\n\n\u003C\u002Fbr>\n\n- [Install](#install)\n- [Usage](#usage)\n- [Details](#details)\n- [Functionalities](#functionalities)\n- [Benchmarks](#benchmarks)\n  - [Onnx model](#onnx-model)\n  - [Quantized onnx model](#quantized-onnx-model)\n- [Quantized model scores](#quantized-model-scores)\n- [further improvements](#further-improvements)\n- [License](#license)\n- [Get Help](#get-help)\n- [Acknowledgements](#acknowledgements)\n\nT5 models can be used for several NLP tasks such as summarization, QA, QG, translation, text generation, and more. Sequential text generation is naturally slow, and for larger T5 models it gets even slower. **fastT5** makes the T5 models inference faster by running it on onnxruntime. and it also decreases the model size by quantizing it.\n\nfastT5 library allows you to convert a pretrained T5 model to onnx, quantizes it, and gives the model as output which is running on an onnxruntime in a single line of code. 
You can also customize this whole process.\n\n---\n\n## Install\n\nYou can install fastT5 from PyPI:\n\n```python\n pip install fastt5\n```\n\nIf you want to build from source:\n\n```python\ngit clone https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\ncd fastT5\npip3 install -e .\n```\n\n## Usage\n\nThe `export_and_get_onnx_model()` method exports the given pretrained T5 model to onnx, quantizes it and runs it on the onnxruntime with default settings. The returned model from this method supports the `generate()` method of huggingface.\n\n> If you don't wish to quantize the model then use `quantized=False` in the method.\n\n```python\nfrom fastT5 import export_and_get_onnx_model\nfrom transformers import AutoTokenizer\n\nmodel_name = 't5-small'\nmodel = export_and_get_onnx_model(model_name)\n\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nt_input = \"translate English to French: The universe is a dark forest.\"\ntoken = tokenizer(t_input, return_tensors='pt')\n\ntokens = model.generate(input_ids=token['input_ids'],\n               attention_mask=token['attention_mask'],\n               num_beams=2)\n\noutput = tokenizer.decode(tokens.squeeze(), skip_special_tokens=True)\nprint(output)\n```\n\n> to run the already exported model use `get_onnx_model()`\n\nyou can customize the whole pipeline as shown in the below code example:\n\n```python\nfrom fastT5 import (OnnxT5, get_onnx_runtime_sessions,\n                    generate_onnx_representation, quantize)\nfrom transformers import AutoTokenizer\n\nmodel_or_model_path = 't5-small'\n\n# Step 1. convert huggingfaces t5 model to onnx\nonnx_model_paths = generate_onnx_representation(model_or_model_path)\n\n# Step 2. (recommended) quantize the converted model for fast inference and to reduce model size.\nquant_model_paths = quantize(onnx_model_paths)\n\n# step 3. setup onnx runtime\nmodel_sessions = get_onnx_runtime_sessions(quant_model_paths)\n\n# step 4. 
get the onnx model\nmodel = OnnxT5(model_or_model_path, model_sessions)\n\n                      ...\n```\n##### custom output paths \nBy default, fastT5 creates a `models` folder in the current directory and stores all the models. You can provide a custom path for a folder to store the exported models. And to run already `exported models` that are stored in a custom folder path: use `get_onnx_model(onnx_models_path=\"\u002Fpath\u002Fto\u002Fcustom\u002Ffolder\u002F\")`\n\n```python\nfrom fastT5 import export_and_get_onnx_model, get_onnx_model\n\nmodel_name = \"t5-small\"\ncustom_output_path = \"\u002Fpath\u002Fto\u002Fcustom\u002Ffolder\u002F\"\n\n# 1. stores models to custom_output_path\nmodel = export_and_get_onnx_model(model_name, custom_output_path)\n\n# 2. run already exported models that are stored in custom path\n# model = get_onnx_model(model_name, custom_output_path)\n\n```\n\n## Details\n\nT5 is a `seq2seq` model (Encoder-Decoder), as it uses decoder repeatedly for inference, we can't directly export the whole model to onnx. We need to export the encoder and decoder separately.\n\n> `past_key_values` contain pre-computed hidden-states (key and values in the self-attention blocks and cross-attention blocks) that can be used to speed up sequential decoding.\n\nmodels can only be exported with a constant number of inputs. Contrary to this, the decoder of the first step does not take `past_key_values` and the rest of the steps decoders do. To get around this issue, we can create two decoders: one for the first step that does not take `past_key_values` and another for the rest of the steps that utilize the `past_key_values`.\n\nNext, we'll export all three models (encoder, decoder, init_decoder). And then quantize them, quantizing `32bit` to `8bit` should give the 4x memory reduction. 
Since there is an extra decoder, the overall model size reduces by about 3x.\n\nFinally, we'll run the quantized model on onnx runtime.\n\n> The inference is simple as the model supports the [`generate()`](https:\u002F\u002Fhuggingface.co\u002Ftransformers\u002Fmain_classes\u002Fmodel.html?highlight=generate#transformers.generation_utils.GenerationMixin.generate) method of huggingface.\n\n## Functionalities\n\n- Export any pretrained T5 model to ONNX easily (with `past_key_values`).\n- The exported model supports beam search, greedy search, and more via the `generate()` method.\n- Reduce the model size by `3X` using quantization.\n- Up to `5X` speedup compared to PyTorch execution for greedy search and `3-4X` for beam search.\n\n## Benchmarks\n\nThe benchmarks are the result of the T5-base model tested on English to French translation.\n\n### Onnx model\n\nThe following graph shows the latency of the quantized onnx model vs the PyTorch model for beam numbers varying from 1 to 9. The latencies shown here are for the mean of sequence lengths up to 130.\n\n![t5-base](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKi6an_fastT5_readme_dd881f3deb40.png)\n\nThe following heat map shows how many times faster the onnx model is, i.e. the ratio of PyTorch latency to onnx model latency.\nThe onnx model outperforms PyTorch in most cases; however, the speed of the onnx model drops for longer sequence lengths.\n\n![t5-base-hist](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKi6an_fastT5_readme_0955cfdf965f.png)\n\n### Quantized onnx model\n\nAs mentioned earlier, quantized models are lightweight, and they have almost the same accuracy as the original model (quantized model scores are mentioned in the next section). 
Quantized onnx models have the lowest latency compared to both Onnx & PyTorch models.\n\n![t5-base-quant](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKi6an_fastT5_readme_c8dcca9c5e3a.png)\n\nThe model outperforms the PyTorch model by 5.7X for greedy search on average and 3-4X for beam search.\n\n![t5-base-quant-hist](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKi6an_fastT5_readme_91b6623376b9.png)\n\n> Note : The results were generated on `AMD EPYC 7B12`, these results may vary from device to device. The Onnx models usually perform well on high-end CPUs with more cores.\n\n## Quantized model scores\n\nThe results were tested for English to French translation with beam search number of 3.\n\n|                    | Bleu_4   | METEOR   | ROUGE_L  |\n| ------------------ | -------- | -------- | -------- |\n| t5-small (quant)   | 0.240769 | 0.282342 | 0.468817 |\n| t5-small (pytorch) | 0.254601 | 0.295172 | 0.492749 |\n| t5-base (quant)    | 0.267606 | 0.306019 | 0.499188 |\n| t5-base (pytorch)  | 0.268346 | 0.304969 | 0.503306 |\n| t5-large (quant)   | 0.286726 | 0.316845 | 0.503585 |\n| t5-large (pytorch) | 0.294015 | 0.315774 | 0.508677 |\n\n## Private HuggingFace Model Hub Models\n\nThe [HuggingFace model hub](https:\u002F\u002Fhuggingface.co\u002Fmodels) supports private models. To use a private, pre-trained version of T5 with fastT5 you first must have authenticated into HuggingFace ecosystem with `$ transformers-cli login`. 
Then, when using fastT5, there is an extra import and call:\n\n```python\nfrom fastT5 import (\n    OnnxT5,\n    get_onnx_runtime_sessions,\n    generate_onnx_representation,\n    quantize,\n    set_auth_token)\nfrom transformers import AutoTokenizer\n\nset_auth_token(True)\n# the rest of the code is the same as using a public model\n```\n\nIf you are unable to call `$ transformers-cli login` or prefer to use your API Key, found at https:\u002F\u002Fhuggingface.co\u002Fsettings\u002Ftoken (or https:\u002F\u002Fhuggingface.co\u002Forganizations\u002FORG_NAME\u002Fsettings\u002Ftoken for organizations), you can pass that as a string to `set_auth_token`. Avoid hard-coding your API key into code by setting the environment variable `HF_API_KEY=\u003Credacted>`, and then in code:\n\n```python\nimport os\n\nfrom fastT5 import (\n    OnnxT5,\n    get_onnx_runtime_sessions,\n    generate_onnx_representation,\n    quantize,\n    set_auth_token)\nfrom transformers import AutoTokenizer\n\nauth_token = os.environ.get(\"HF_API_KEY\")\nset_auth_token(auth_token)\n\n# code proceeds as normal\n```\n\n## further improvements\n\n- currently the fastT5 library supports only the cpu version of onnxruntime, gpu implementation still needs to be done.\n- graph optimization of the onnx model will further reduce the latency.\n\n## Get Help\n\n- Contact me at kiranr8k@gmail.com\n- If appropriate, [open an issue](https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\u002Fissues\u002Fnew\u002Fchoose) on GitHub\n\n## Acknowledgements\n\n- [original T5 paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.10683.pdf)\n- [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) by huggingface\n- [onnx](https:\u002F\u002Fgithub.com\u002Fonnx\u002Fonnx)\n- [onnxruntime](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime) by microsoft\n- [onnxt5](https:\u002F\u002Fgithub.com\u002Fabelriboulot\u002Fonnxt5)\n\n```python\n@article{2019t5,\n  author = {Colin Raffel and Noam 
Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},\n  title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},\n  journal = {arXiv e-prints},\n  year = {2019},\n  archivePrefix = {arXiv},\n  eprint = {1910.10683},\n}\n```\n","![fastt5 图标](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKi6an_fastT5_readme_bf3381bce824.png)\n\n\u003Ch1 style=\"text-align:center; font-weight:bold;\nfont-size:1.875rem\">将 T5 模型大小缩减至原来的三分之一，并将推理速度提升至最多五倍。\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\u002Fblob\u002Fmaster\u002FLICENSE\">\n        \u003Cimg alt=\"GitHub\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fki6an\u002Ffastt5?color=blue\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\u002Factions\u002Fworkflows\u002Fci-workflow.yml\">\n        \u003Cimg alt=\"工作流\" src=\"https:\u002F\u002Fgithub.com\u002Fki6an\u002FfastT5\u002Factions\u002Fworkflows\u002Fci-workflow.yml\u002Fbadge.svg\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\u002Freleases\" >\n        \u003Cimg alt=\"PYPI 发布\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Ffastt5\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\" >\n        \u003Cimg alt=\"下载量\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Ffastt5\">\n    \u003C\u002Fa>\n \u003C\u002Fp>\n\n\u003C\u002Fbr>\n\n- [安装](#install)\n- [使用方法](#usage)\n- [详细信息](#details)\n- [功能特性](#functionalities)\n- [基准测试](#benchmarks)\n  - [ONNX 模型](#onnx-model)\n  - [量化 ONNX 模型](#quantized-onnx-model)\n- [量化模型评分](#quantized-model-scores)\n- [进一步改进](#further-improvements)\n- [许可证](#license)\n- [获取帮助](#get-help)\n- [致谢](#acknowledgements)\n\nT5 
模型可用于多种自然语言处理任务，如摘要生成、问答、问题生成、翻译、文本生成等。由于序列式文本生成的特性，其推理过程通常较为缓慢，而较大的 T5 模型则会更加耗时。**fastT5** 通过在 ONNX Runtime 上运行 T5 模型，显著提升了推理速度；同时，通过对模型进行量化，进一步减小了模型体积。\n\nfastT5 库允许用户将预训练的 T5 模型转换为 ONNX 格式，对其进行量化，并以单行代码的形式输出可在 ONNX Runtime 上运行的模型。此外，用户还可以自定义整个流程。\n\n---\n\n## 安装\n\n您可以通过 PyPI 安装 fastT5：\n\n```python\n pip install fastt5\n```\n\n如果您希望从源码构建：\n\n```python\ngit clone https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\ncd fastT5\npip3 install -e .\n```\n\n## 使用方法\n\n`export_and_get_onnx_model()` 方法会将给定的预训练 T5 模型导出为 ONNX 格式，对其进行量化，并使用默认设置在 ONNX Runtime 上运行。该方法返回的模型支持 Hugging Face 的 `generate()` 方法。\n\n> 如果您不希望对模型进行量化，可以在方法中设置 `quantized=False`。\n\n```python\nfrom fastT5 import export_and_get_onnx_model\nfrom transformers import AutoTokenizer\n\nmodel_name = 't5-small'\nmodel = export_and_get_onnx_model(model_name)\n\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nt_input = \"translate English to French: The universe is a dark forest.\"\ntoken = tokenizer(t_input, return_tensors='pt')\n\ntokens = model.generate(input_ids=token['input_ids'],\n               attention_mask=token['attention_mask'],\n               num_beams=2)\n\noutput = tokenizer.decode(tokens.squeeze(), skip_special_tokens=True)\nprint(output)\n```\n\n> 若要运行已导出的模型，请使用 `get_onnx_model()`。\n\n您可以按照以下代码示例自定义整个流程：\n\n```python\nfrom fastT5 import (OnnxT5, get_onnx_runtime_sessions,\n                    generate_onnx_representation, quantize)\nfrom transformers import AutoTokenizer\n\nmodel_or_model_path = 't5-small'\n\n# 步骤 1：将 Hugging Face 的 T5 模型转换为 ONNX 格式\nonnx_model_paths = generate_onnx_representation(model_or_model_path)\n\n# 步骤 2：（推荐）对转换后的模型进行量化，以加快推理速度并减小模型体积。\nquant_model_paths = quantize(onnx_model_paths)\n\n# 步骤 3：设置 ONNX Runtime\nmodel_sessions = get_onnx_runtime_sessions(quant_model_paths)\n\n# 步骤 4：获取 ONNX 模型\nmodel = OnnxT5(model_or_model_path, model_sessions)\n\n                      ...\n```\n##### 自定义输出路径 \n默认情况下，fastT5 会在当前目录下创建一个 `models` 
文件夹，并将所有模型存储其中。您可以指定一个自定义文件夹路径来存储导出的模型。若要运行已存储在自定义路径下的导出模型，请使用 `get_onnx_model(onnx_models_path=\"\u002Fpath\u002Fto\u002Fcustom\u002Ffolder\u002F\")`。\n\n```python\nfrom fastT5 import export_and_get_onnx_model, get_onnx_model\n\nmodel_name = \"t5-small\"\ncustom_output_path = \"\u002Fpath\u002Fto\u002Fcustom\u002Ffolder\u002F\"\n\n# 1. 将模型存储到自定义输出路径\nmodel = export_and_get_onnx_model(model_name, custom_output_path)\n\n# 2. 运行已存储在自定义路径下的导出模型\n# model = get_onnx_model(model_name, custom_output_path)\n\n```\n\n## 详细信息\n\nT5 是一种序列到序列（Encoder-Decoder）模型。由于其在推理过程中会反复使用解码器，因此我们无法直接将整个模型导出为 ONNX 格式。我们需要分别导出编码器和解码器。\n\n> `past_key_values` 包含预先计算的隐藏状态（自注意力层和交叉注意力层中的键值），可用于加速序列解码。\n\n模型只能以固定数量的输入进行导出。然而，第一步的解码器并不需要 `past_key_values`，而后续步骤的解码器则需要。为了解决这一问题，我们可以创建两个解码器：一个用于第一步，不使用 `past_key_values`；另一个用于后续步骤，利用 `past_key_values`。\n\n接下来，我们将导出三个模型（编码器、解码器、初始解码器）。然后对它们进行量化，将 32 位量化为 8 位，理论上可使内存占用减少至原来的四分之一。由于多了一个解码器，模型体积将减少约三分之二。\n\n最后，我们将量化后的模型部署到 ONNX Runtime 上运行。\n\n> 推理过程非常简单，因为该模型支持 Hugging Face 的 [`generate()`](https:\u002F\u002Fhuggingface.co\u002Ftransformers\u002Fmain_classes\u002Fmodel.html?highlight=generate#transformers.generation_utils.GenerationMixin.generate) 方法。\n\n## 功能特性\n\n- 轻松将任何预训练的 T5 模型导出为 ONNX 格式（包含 `past_key_values`）。\n- 导出的模型通过 `generate()` 方法支持束搜索、贪婪搜索等多种解码策略。\n- 通过量化技术将模型体积缩小至原来的三分之一。\n- 相比 PyTorch 执行，贪婪搜索的速度可提升至五倍，束搜索的速度可提升至三到四倍。\n\n## 基准测试\n\n以下基准测试结果基于 T5-base 模型在英法翻译任务上的表现。\n\n### ONNX 模型\n\n下图展示了量化后的 ONNX 模型与 PyTorch 模型在束搜索数量从 1 到 9 不同时的延迟情况。此处显示的延迟是针对序列长度不超过 130 的平均值。\n\n![t5-base](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKi6an_fastT5_readme_dd881f3deb40.png)\n\n以下热力图显示了 PyTorch 模型与 ONNX 模型延迟的倍数关系。大多数情况下，ONNX 模型的表现更优。然而，当序列长度较长时，ONNX 模型的速度会有所下降。\n\n![t5-base-hist](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKi6an_fastT5_readme_0955cfdf965f.png)\n\n### 量化后的 ONNX 模型\n\n如前所述，量化模型是轻量级模型，其准确率几乎与原始模型相同（量化模型的评分将在下一节中提及）。与 ONNX 和 PyTorch 模型相比，量化后的 ONNX 
模型具有最低的延迟。\n\n![t5-base-quant](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKi6an_fastT5_readme_c8dcca9c5e3a.png)\n\n该模型在贪婪搜索上的平均性能比 PyTorch 模型快 5.7 倍，在束搜索上的性能则快 3 到 4 倍。\n\n![t5-base-quant-hist](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKi6an_fastT5_readme_91b6623376b9.png)\n\n> 注意：这些结果是在 `AMD EPYC 7B12` 上生成的，不同设备的结果可能会有所差异。ONNX 模型通常在核心数较多的高端 CPU 上表现良好。\n\n## 量化模型评分\n\n测试结果为英语到法语的翻译任务，使用束搜索，束宽设为 3。\n\n|                    | Bleu_4   | METEOR   | ROUGE_L  |\n| ------------------ | -------- | -------- | -------- |\n| t5-small（量化）   | 0.240769 | 0.282342 | 0.468817 |\n| t5-small（PyTorch） | 0.254601 | 0.295172 | 0.492749 |\n| t5-base（量化）    | 0.267606 | 0.306019 | 0.499188 |\n| t5-base（PyTorch）  | 0.268346 | 0.304969 | 0.503306 |\n| t5-large（量化）   | 0.286726 | 0.316845 | 0.503585 |\n| t5-large（PyTorch） | 0.294015 | 0.315774 | 0.508677 |\n\n## 私有 HuggingFace Model Hub 模型\n\n[HuggingFace Model Hub](https:\u002F\u002Fhuggingface.co\u002Fmodels) 支持私有模型。要使用 fastT5 的私有预训练 T5 版本，您首先需要通过 `$ transformers-cli login` 在 HuggingFace 生态系统中进行身份验证。然后，在使用 fastT5 时，还需要额外的导入和调用：\n\n```python\nfrom fastT5 import (\n    OnnxT5,\n    get_onnx_runtime_sessions,\n    generate_onnx_representation,\n    quantize,\n    set_auth_token)\nfrom transformers import AutoTokenizer\n\nset_auth_token(True)\n# 其余代码与使用公开模型相同\n```\n\n如果您无法运行 `$ transformers-cli login`，或者更倾向于使用您的 API 密钥（可在 https:\u002F\u002Fhuggingface.co\u002Fsettings\u002Ftoken 或组织的 https:\u002F\u002Fhuggingface.co\u002Forganizations\u002FORG_NAME\u002Fsettings\u002Ftoken 页面找到），您可以将该密钥作为字符串传递给 `set_auth_token`。为了避免将 API 密钥硬编码到代码中，可以设置环境变量 `HF_API_KEY=\u003Credacted>`，然后在代码中：\n\n```python\nimport os\n\nfrom fastT5 import (\n    OnnxT5,\n    get_onnx_runtime_sessions,\n    generate_onnx_representation,\n    quantize,\n    set_auth_token)\nfrom transformers import AutoTokenizer\n\nauth_token = os.environ.get(\"HF_API_KEY\")\nset_auth_token(auth_token)\n\n# 代码继续正常执行\n```\n\n## 未来改进方向\n\n- 目前 fastT5 库仅支持 ONNX Runtime 的 CPU 
版本，GPU 实现仍有待完成。\n- 对 ONNX 模型进行图优化将进一步降低延迟。\n\n## 获取帮助\n\n- 请联系我：kiranr8k@gmail.com\n- 如有需要，请在 GitHub 上 [提交问题](https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\u002Fissues\u002Fnew\u002Fchoose)。\n\n## 致谢\n\n- [原始 T5 论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.10683.pdf)\n- HuggingFace 的 [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)\n- [ONNX](https:\u002F\u002Fgithub.com\u002Fonnx\u002Fonnx)\n- Microsoft 的 [ONNX Runtime](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime)\n- [onnxt5](https:\u002F\u002Fgithub.com\u002Fabelriboulot\u002Fonnxt5)\n\n```python\n@article{2019t5,\n  author = {Colin Raffel、Noam Shazeer、Adam Roberts、Katherine Lee、Sharan Narang、Michael Matena、Yanqi Zhou、Wei Li、Peter J. Liu},\n  title = {利用统一的文本到文本 Transformer 探索迁移学习的极限},\n  journal = {arXiv e-prints},\n  year = {2019},\n  archivePrefix = {arXiv},\n  eprint = {1910.10683},\n}\n```","# fastT5 快速上手指南\n\nfastT5 是一个用于加速 Hugging Face T5 模型推理的开源库。它通过将模型转换为 ONNX 格式并进行量化，可将模型体积缩小 **3 倍**，推理速度提升最高 **5 倍**，同时保持与原始模型相近的精度。\n\n## 环境准备\n\n*   **操作系统**: Linux, macOS, Windows\n*   **Python 版本**: 建议 Python 3.7+\n*   **核心依赖**:\n    *   `transformers` (Hugging Face)\n    *   `onnx`\n    *   `onnxruntime` (当前版本主要支持 CPU 推理)\n\n## 安装步骤\n\n你可以直接通过 PyPI 安装 fastT5：\n\n```bash\npip install fastt5\n```\n\n如果需要从源码构建：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\ncd fastT5\npip3 install -e .\n```\n\n> **国内加速提示**：如果下载速度较慢，建议使用国内镜像源安装：\n> ```bash\n> pip install fastt5 -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 基本使用\n\nfastT5 提供了最简化的接口 `export_and_get_onnx_model()`，只需一行代码即可完成模型导出、量化并加载到 ONNX Runtime 中。返回的模型完全兼容 Hugging Face 的 `generate()` 方法。\n\n以下是一个完整的翻译示例（英语转法语）：\n\n```python\nfrom fastT5 import export_and_get_onnx_model\nfrom transformers import AutoTokenizer\n\n# 1. 加载并转换模型 (自动导出为 ONNX 并量化)\nmodel_name = 't5-small'\nmodel = export_and_get_onnx_model(model_name)\n\n# 2. 
加载分词器\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n\n# 3. 准备输入数据\nt_input = \"translate English to French: The universe is a dark forest.\"\ntoken = tokenizer(t_input, return_tensors='pt')\n\n# 4. 生成结果 (用法与原生 PyTorch 模型一致)\ntokens = model.generate(\n    input_ids=token['input_ids'],\n    attention_mask=token['attention_mask'],\n    num_beams=2\n)\n\n# 5. 解码输出\noutput = tokenizer.decode(tokens.squeeze(), skip_special_tokens=True)\nprint(output)\n```\n\n**自定义存储路径**：\n默认情况下，转换后的模型会保存在当前目录的 `models` 文件夹中。你可以指定自定义路径：\n\n```python\ncustom_output_path = \"\u002Fpath\u002Fto\u002Fcustom\u002Ffolder\u002F\"\n\n# 将模型导出并保存到指定路径\nmodel = export_and_get_onnx_model(model_name, custom_output_path)\n\n# 或者直接加载已存在于指定路径的模型\n# model = get_onnx_model(model_name, custom_output_path)\n```","某初创团队正在开发一款实时多语言新闻摘要服务，需要在有限的云服务器资源上快速处理大量英文资讯并生成中文简报。\n\n### 没有 fastT5 时\n- **响应延迟高**：原生 T5 模型在 CPU 上进行序列生成时速度缓慢，用户等待摘要结果往往超过 3 秒，严重影响阅读体验。\n- **资源成本昂贵**：为了维持可接受的并发量，团队被迫租用昂贵的 GPU 实例，导致每月云算力预算严重超支。\n- **部署包体积大**：完整的预训练模型文件占用数 GB 存储空间，不仅拖慢了容器启动速度，还增加了边缘设备部署的难度。\n- **扩展性受限**：随着新闻源增加，现有的推理吞吐量成为瓶颈，无法在不大幅增加硬件投入的情况下支撑业务增长。\n\n### 使用 fastT5 后\n- **推理速度飞跃**：通过 ONNX Runtime 加速，fastT5 将推断速度提升了 5 倍，摘要生成几乎实现“秒回”，用户体验流畅自然。\n- **显著降低成本**：加速后的模型仅需普通 CPU 实例即可承载高并发，团队成功将服务器降级，月度算力成本降低 60% 以上。\n- **模型轻量便携**：量化技术使模型体积缩小了 3 倍，大幅减少了存储占用，使得在低配服务器甚至本地终端部署变得轻而易举。\n- **无缝集成开发**：只需一行代码即可完成模型转换与量化，且完美兼容 Hugging Face 的 `generate` 接口，无需重构现有业务逻辑。\n\nfastT5 通过极致的推理加速与模型压缩，让资源受限的团队也能低成本、高效率地落地高性能 T5 应用。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKi6an_fastT5_bf3381bc.png","Ki6an","Kiran R","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FKi6an_9ee68908.jpg",null,"kiranr8k@gmail.com","kiranr8k","https:\u002F\u002Fgithub.com\u002FKi6an",[83],{"name":84,"color":85,"percentage":86},"Python","#3572A5",100,590,74,"2026-03-29T15:09:30","Apache-2.0","未说明","当前仅支持 CPU 版本的 ONNX Runtime，GPU 实现尚未完成",{"notes":94,"python":91,"dependencies":95},"该工具通过将 T5 模型转换为 ONNX 格式并进行量化（32bit 转 8bit），可将模型体积减小 3 倍，推理速度提升最高 5 
倍。目前不支持 GPU 加速，建议在多核高性能 CPU 上运行以获得最佳效果。支持使用 HuggingFace 私有模型，需配置认证 Token。",[96,97,98],"onnxruntime","transformers","onnx",[26,13,55],[101,102,98,96,103,104,105,106,107,108,109,110,111,112,113,114],"python","t5","quantization","fastt5","nlp","fast","quantized-onnx-models","translation","question-answering","inference-speed","pytorch","inference","deep-learning","transformer","2026-03-27T02:49:30.150509","2026-04-06T06:46:00.138157",[118,123,128,133,138],{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},17862,"加载保存的 ONNX 文件时出现 'Model requires X inputs. Input Feed contains Y' 错误怎么办？","该问题通常是由于量化模型路径的顺序不正确导致的。在使用 `get_onnx_runtime_sessions` 时，必须确保传入的量化模型路径顺序为：encoder, decoder, init-decoder。\n\n如果使用 `glob.glob()` 获取路径，顺序可能会被打乱，导致将只有 2 个输入的 encoder 模型误传给了需要 3 个输入的 init-decoder。请手动指定或排序路径以符合上述顺序。\n参考代码逻辑：确保路径列表顺序对应 [encoder, decoder, init-decoder]。","https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\u002Fissues\u002F33",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},17863,"在使用 IO Bindings 进行 Decoder 推理时，输出结果与直接运行不一致怎么办？","这是一个已知的 ONNX Runtime 问题。解决方法是将输入数据转换为 `onnxruntime.OrtValue` 对象，而不是直接使用 PyTorch Tensor 或 NumPy 数组进行绑定。\n\n具体步骤如下：\n1. 将所有输入（如 input_ids, attention_mask, encoder_outputs, past_key_values）转换为 CPU 上的 NumPy 数组。\n2. 使用 `onnxruntime.OrtValue.ortvalue_from_numpy(array, \"cuda\")` 将其转换为 CUDA 上的 OrtValue。\n3. 在 `io_binding` 中绑定这些 OrtValue，或者直接使用 `session.run_with_ort_values()` 方法传入包含 OrtValue 的字典。\n\n示例代码片段：\n```python\nimport onnxruntime\n# 转换输入为 OrtValue\ndecoder_input_ids = onnxruntime.OrtValue.ortvalue_from_numpy(model_inputs['decoder_input_ids'].cpu().numpy(), \"cuda\")\nattention_mask_val = onnxruntime.OrtValue.ortvalue_from_numpy(model_inputs['attention_mask'].cpu().numpy(), \"cuda\")\n# ... 
对其他输入做同样处理 ...\n# 然后使用 run_with_ort_values 或绑定这些值\n```","https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\u002Fissues\u002F49",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},17864,"遇到 'forward() got an unexpected keyword argument cross_attn_head_mask' 错误如何解决？","这是因为当前 fastT5 在导出 Decoder 到 ONNX 格式时，仅支持一个可选参数 `decoder_attention_mask`，而未包含 `cross_attn_head_mask`、`head_mask` 等其他可选掩码参数。\n\n当调用模型时传入了这些未被导出的参数，就会报错。目前的解决方案是避免在推理时传递 `cross_attn_head_mask` 参数，或者等待项目更新以支持导出更多可选参数。维护者表示如果这些参数能显著提升速度或精度，未来会考虑添加。","https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\u002Fissues\u002F18",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},17865,"如何将带有 past_key_values 的 Decoder 模型转换为 float16 (FP16) 格式？","在使用 ONNX Runtime 的 optimizer 将带有 `past_key_values` 的 Decoder 模型转换为 FP16 时，可能会遇到 `AssertionError`（符号形状推断错误）。\n\n尝试以下解决方法：\n1. 在导出 ONNX 模型时，明确指定输出名称（output names）。\n2. 虽然设置 `use_symbolic_shape_infer=False` 对 `decoder_init` 有效，但对完整的 `decoder` 可能不够。建议检查导出配置，确保没有冲突的符号维度。\n3. 如果问题依旧，可能需要暂时跳过对该特定Decoder部分的 FP16 转换，或关注项目后续更新以修复此导出兼容性问题。","https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\u002Fissues\u002F50",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},17866,"fastT5 是否支持最新版本的 Transformers、ONNX 和 ONNX Runtime？","是的，项目正在持续更新以支持新版本。\n- **Transformers**: 最近的发布版本已经支持较新的 Transformers 库（如 4.12.x 及以上）。如果遇到兼容性问题，建议从源码安装最新版本。\n- **ONNX Runtime**: 维护者正在努力支持更新版本的 ONNX Runtime。\n- **安装建议**: 如果需要最新特性，可以尝试从 GitHub 源码安装，并在初始化时指定相应的 ORT 版本。","https:\u002F\u002Fgithub.com\u002FKi6an\u002FfastT5\u002Fissues\u002F27",[]]