[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-jhj0517--Whisper-WebUI":3,"tool-jhj0517--Whisper-WebUI":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":75,"owner_avatar_url":76,"owner_bio":77,"owner_company":78,"owner_location":79,"owner_email":78,"owner_twitter":78,"owner_website":78,"owner_url":80,"languages":81,"stars":102,"forks":103,"last_commit_at":104,"license":105,"difficulty_score":10,"env_os":106,"env_gpu":107,"env_ram":108,"env_deps":109,"category_tags":123,"github_topics":124,"view_count":131,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":132,"updated_at":133,"faqs":134,"releases":164},1172,"jhj0517\u002FWhisper-WebUI","Whisper-WebUI","A Web UI for easy subtitle using whisper model.","Whisper-WebUI 是一个基于浏览器的界面，用于方便地生成字幕。它利用 Whisper 模型将音频或视频内容转换为文字，并支持多种输入方式，如文件、YouTube 视频或麦克风。用户可以生成 SRT、WebVTT 和 txt 格式的字幕，还能进行语音到文本的翻译以及字幕文件的翻译。工具还提供了音频预处理和后处理功能，如语音活动检测、背景音乐分离和说话人辨识。适合需要快速生成字幕的普通用户、内容创作者及研究人员使用。其支持多种 Whisper 实现，具备灵活的配置选项，是处理音视频字幕的理想选择。","# Whisper-WebUI\nA Gradio-based browser interface for [Whisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper). 
You can use it as an Easy Subtitle Generator!\n\n![screen](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjhj0517_Whisper-WebUI_readme_ccd9a79eb2fb.png)\n\n\n\n## Notebook\nIf you wish to try this on Colab, you can do it in [here](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fjhj0517\u002FWhisper-WebUI\u002Fblob\u002Fmaster\u002Fnotebook\u002Fwhisper-webui.ipynb)!\n\n# Feature\n- Select the Whisper implementation you want to use between :\n   - [openai\u002Fwhisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper)\n   - [SYSTRAN\u002Ffaster-whisper](https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper) (used by default)\n   - [Vaibhavs10\u002Finsanely-fast-whisper](https:\u002F\u002Fgithub.com\u002FVaibhavs10\u002Finsanely-fast-whisper)\n- Generate subtitles from various sources, including :\n  - Files\n  - Youtube\n  - Microphone\n- Currently supported subtitle formats : \n  - SRT\n  - WebVTT\n  - txt ( only text file without timeline )\n- Speech to Text Translation \n  - From other languages to English. ( This is Whisper's end-to-end speech-to-text translation feature )\n- Text to Text Translation\n  - Translate subtitle files using Facebook NLLB models\n  - Translate subtitle files using DeepL API\n- Pre-processing audio input with [Silero VAD](https:\u002F\u002Fgithub.com\u002Fsnakers4\u002Fsilero-vad).\n- Pre-processing audio input to separate BGM with [UVR](https:\u002F\u002Fgithub.com\u002FAnjok07\u002Fultimatevocalremovergui). \n- Post-processing with speaker diarization using the [pyannote](https:\u002F\u002Fhuggingface.co\u002Fpyannote\u002Fspeaker-diarization-3.1) model.\n   - To download the pyannote model, you need to have a Huggingface token and manually accept their terms in the pages below.\n      1. https:\u002F\u002Fhuggingface.co\u002Fpyannote\u002Fspeaker-diarization-3.1\n      2. 
https:\u002F\u002Fhuggingface.co\u002Fpyannote\u002Fsegmentation-3.0\n\n### Pipeline Diagram\n![Transcription Pipeline](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjhj0517_Whisper-WebUI_readme_b140275e4c6f.png)\n\n# Installation and Running\n\n- ## Running with Pinokio\n\nThe app is able to run with [Pinokio](https:\u002F\u002Fgithub.com\u002Fpinokiocomputer\u002Fpinokio).\n\n1. Install [Pinokio Software](https:\u002F\u002Fprogram.pinokio.computer\u002F#\u002F?id=install).\n2. Open the software and search for Whisper-WebUI and install it.\n3. Start the Whisper-WebUI and connect to the `http:\u002F\u002Flocalhost:7860`.\n\n- ## Running with Docker \n\n1. Install and launch [Docker-Desktop](https:\u002F\u002Fwww.docker.com\u002Fproducts\u002Fdocker-desktop\u002F).\n\n2. Git clone the repository\n\n```sh\ngit clone https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI.git\n```\n\n3. Build the image ( Image is about 7GB~ )\n\n```sh\ndocker compose build \n```\n\n4. Run the container \n\n```sh\ndocker compose up\n```\n\n5. 
Connect to the WebUI with your browser at `http:\u002F\u002Flocalhost:7860`\n\nIf needed, update the [`docker-compose.yaml`](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fblob\u002Fmaster\u002Fdocker-compose.yaml) to match your environment.\n\n- ## Run Locally\n\n### Prerequisite\nTo run this WebUI, you need to have `git`, `3.10 \u003C= python \u003C= 3.12`, `FFmpeg`.\n\n**Edit `--extra-index-url` in the [`requirements.txt`](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fblob\u002Fmaster\u002Frequirements.txt) to match your device.\u003Cbr>** \nBy default, the WebUI assumes you're using an Nvidia GPU and **CUDA 12.8.** If you're using Intel or another CUDA version, read the [`requirements.txt`](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fblob\u002Fmaster\u002Frequirements.txt) and edit `--extra-index-url`.\n\nPlease follow the links below to install the necessary software:\n- git : [https:\u002F\u002Fgit-scm.com\u002Fdownloads](https:\u002F\u002Fgit-scm.com\u002Fdownloads)\n- python : [https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F) **`3.10 ~ 3.12` is recommended.** \n- FFmpeg :  [https:\u002F\u002Fffmpeg.org\u002Fdownload.html](https:\u002F\u002Fffmpeg.org\u002Fdownload.html)\n- CUDA : [https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads)\n\nAfter installing FFmpeg, **make sure to add the `FFmpeg\u002Fbin` folder to your system PATH!**\n\n### Installation Using the Script Files\n\n1. git clone this repository\n```shell\ngit clone https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI.git\n```\n2. Run `install.bat` or `install.sh` to install dependencies. (It will create a `venv` directory and install dependencies there.)\n3. 
Start WebUI with `start-webui.bat` or `start-webui.sh` (It will run `python app.py` after activating the venv)\n\nAnd you can also run the project with command line arguments if you like to, see [wiki](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fwiki\u002FCommand-Line-Arguments) for a guide to arguments.\n\n# VRAM Usages\nThis project is integrated with [faster-whisper](https:\u002F\u002Fgithub.com\u002Fguillaumekln\u002Ffaster-whisper) by default for better VRAM usage and transcription speed.\n\nAccording to faster-whisper, the efficiency of the optimized whisper model is as follows: \n| Implementation    | Precision | Beam size | Time  | Max. GPU memory | Max. CPU memory |\n|-------------------|-----------|-----------|-------|-----------------|-----------------|\n| openai\u002Fwhisper    | fp16      | 5         | 4m30s | 11325MB         | 9439MB          |\n| faster-whisper    | fp16      | 5         | 54s   | 4755MB          | 3244MB          |\n\nIf you want to use an implementation other than faster-whisper, use `--whisper_type` arg and the repository name.\u003Cbr>\nRead [wiki](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fwiki\u002FCommand-Line-Arguments) for more info about CLI args.\n\nIf you want to use a fine-tuned model, manually place the models in `models\u002FWhisper\u002F` corresponding to the implementation.\n\nAlternatively, if you enter the huggingface repo id (e.g, [deepdml\u002Ffaster-whisper-large-v3-turbo-ct2](https:\u002F\u002Fhuggingface.co\u002Fdeepdml\u002Ffaster-whisper-large-v3-turbo-ct2)) in the \"Model\" dropdown, it will be automatically downloaded in the directory.\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjhj0517_Whisper-WebUI_readme_eb63e6f04a4f.png)\n\n# REST API\nIf you're interested in deploying this app as a REST API, please check out [\u002Fbackend](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Ftree\u002Fmaster\u002Fbackend).\n\n## 
TODO🗓\n\n- [x] Add DeepL API translation\n- [x] Add NLLB Model translation\n- [x] Integrate with faster-whisper\n- [x] Integrate with insanely-fast-whisper\n- [x] Integrate with whisperX ( Only speaker diarization part )\n- [x] Add background music separation pre-processing with [UVR](https:\u002F\u002Fgithub.com\u002FAnjok07\u002Fultimatevocalremovergui)  \n- [x] Add fast api script\n- [ ] Add CLI usages\n- [ ] Support real-time transcription for microphone\n\n### Translation 🌐\nAny PRs that translate the language into [translation.yaml](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fblob\u002Fmaster\u002Fconfigs\u002Ftranslation.yaml) would be greatly appreciated!\n","# Whisper-WebUI\n基于 Gradio 的浏览器界面，用于 [Whisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper)。你可以把它当作一个简单的字幕生成器使用！\n\n![screen](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjhj0517_Whisper-WebUI_readme_ccd9a79eb2fb.png)\n\n\n\n## 笔记本\n如果你想在 Colab 上试用，可以在这里进行：[whisper-webui.ipynb](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fjhj0517\u002FWhisper-WebUI\u002Fblob\u002Fmaster\u002Fnotebook\u002Fwhisper-webui.ipynb)！\n\n# 功能\n- 你可以选择要使用的 Whisper 实现：\n   - [openai\u002Fwhisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper)\n   - [SYSTRAN\u002Ffaster-whisper](https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper)（默认使用）\n   - [Vaibhavs10\u002Finsanely-fast-whisper](https:\u002F\u002Fgithub.com\u002FVaibhavs10\u002Finsanely-fast-whisper)\n- 从多种来源生成字幕，包括：\n  - 文件\n  - YouTube\n  - 麦克风\n- 目前支持的字幕格式：\n  - SRT\n  - WebVTT\n  - txt（仅文本文件，不含时间轴）\n- 语音转文本翻译：\n  - 从其他语言翻译成英语。（这是 Whisper 的端到端语音转文本翻译功能）\n- 文本转文本翻译：\n  - 使用 Facebook NLLB 模型翻译字幕文件\n  - 使用 DeepL API 翻译字幕文件\n- 使用 [Silero VAD](https:\u002F\u002Fgithub.com\u002Fsnakers4\u002Fsilero-vad) 对音频输入进行预处理。\n- 使用 [UVR](https:\u002F\u002Fgithub.com\u002FAnjok07\u002Fultimatevocalremovergui) 对音频输入进行预处理，以分离背景音乐。\n- 使用 
[pyannote](https:\u002F\u002Fhuggingface.co\u002Fpyannote\u002Fspeaker-diarization-3.1) 模型进行后处理，实现说话人分离。\n   - 要下载 pyannote 模型，你需要拥有 HuggingFace 的 token，并手动接受以下页面上的条款：\n      1. https:\u002F\u002Fhuggingface.co\u002Fpyannote\u002Fspeaker-diarization-3.1\n      2. https:\u002F\u002Fhuggingface.co\u002Fpyannote\u002Fsegmentation-3.0\n\n### 流程图\n![Transcription Pipeline](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjhj0517_Whisper-WebUI_readme_b140275e4c6f.png)\n\n# 安装与运行\n\n- ## 使用 Pinokio 运行\n该应用可以在 [Pinokio](https:\u002F\u002Fgithub.com\u002Fpinokiocomputer\u002Fpinokio) 上运行。\n\n1. 安装 [Pinokio 软件](https:\u002F\u002Fprogram.pinokio.computer\u002F#\u002F?id=install)。\n2. 打开软件，搜索 Whisper-WebUI 并安装。\n3. 启动 Whisper-WebUI，然后连接到 `http:\u002F\u002Flocalhost:7860`。\n\n- ## 使用 Docker 运行\n\n1. 安装并启动 [Docker Desktop](https:\u002F\u002Fwww.docker.com\u002Fproducts\u002Fdocker-desktop\u002F)。\n\n2. 克隆仓库：\n\n```sh\ngit clone https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI.git\n```\n\n3. 构建镜像（镜像大小约为 7GB）：\n\n```sh\ndocker compose build \n```\n\n4. 运行容器：\n\n```sh\ndocker compose up\n```\n\n5. 
使用浏览器连接到 `http:\u002F\u002Flocalhost:7860` 即可访问 WebUI。\n\n如有需要，可以根据你的环境更新 [`docker-compose.yaml`](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fblob\u002Fmaster\u002Fdocker-compose.yaml)。\n\n- ## 本地运行\n\n### 前提条件\n要运行此 WebUI，你需要安装 `git`、`Python 3.10 到 3.12` 以及 `FFmpeg`。\n\n**请编辑 [`requirements.txt`](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fblob\u002Fmaster\u002Frequirements.txt) 中的 `--extra-index-url` 以匹配你的设备。\u003Cbr>**\n默认情况下，WebUI 假设你使用的是 Nvidia GPU 和 **CUDA 12.8**。如果你使用的是 Intel 显卡或其他版本的 CUDA，请阅读 [`requirements.txt`](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fblob\u002Fmaster\u002Frequirements.txt) 并修改 `--extra-index-url`。\n\n请按照以下链接安装必要的软件：\n- git：[https:\u002F\u002Fgit-scm.com\u002Fdownloads](https:\u002F\u002Fgit-scm.com\u002Fdownloads)\n- python：[https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F) **推荐使用 3.10 ~ 3.12 版本。**\n- FFmpeg：[https:\u002F\u002Fffmpeg.org\u002Fdownload.html](https:\u002F\u002Fffmpeg.org\u002Fdownload.html)\n- CUDA：[https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads)\n\n安装 FFmpeg 后，请务必将 `FFmpeg\u002Fbin` 文件夹添加到系统的 PATH 中！\n\n### 使用脚本文件安装\n\n1. 克隆此仓库：\n\n```shell\ngit clone https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI.git\n```\n\n2. 运行 `install.bat` 或 `install.sh` 来安装依赖项。（它会创建一个 `venv` 目录，并在其中安装依赖项。）\n3. 
使用 `start-webui.bat` 或 `start-webui.sh` 启动 WebUI。（它会在激活 venv 后运行 `python app.py`）\n\n你也可以通过命令行参数来运行该项目，具体请参阅 [wiki](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fwiki\u002FCommand-Line-Arguments) 中的参数说明。\n\n# 显存使用情况\n该项目默认集成了 [faster-whisper](https:\u002F\u002Fgithub.com\u002Fguillaumekln\u002Ffaster-whisper)，以更好地利用显存并提高转录速度。\n\n根据 faster-whisper 的测试结果，优化后的 Whisper 模型效率如下：\n\n| 实现          | 精度    | 波束大小 | 时间  | 最大显存占用 | 最大内存占用 |\n|---------------|---------|----------|-------|--------------|--------------|\n| openai\u002Fwhisper  | fp16    | 5        | 4分30秒 | 11325MB      | 9439MB       |\n| faster-whisper  | fp16    | 5        | 54秒   | 4755MB       | 3244MB       |\n\n如果你想使用 faster-whisper 以外的实现，可以使用 `--whisper_type` 参数，并指定仓库名称。\u003Cbr>\n更多关于 CLI 参数的信息，请参阅 [wiki](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fwiki\u002FCommand-Line-Arguments)。\n\n如果你想使用微调过的模型，可以手动将模型放置在 `models\u002FWhisper\u002F` 目录下，对应相应的实现。\n\n此外，如果你在“模型”下拉菜单中输入 HuggingFace 仓库 ID（例如 [deepdml\u002Ffaster-whisper-large-v3-turbo-ct2](https:\u002F\u002Fhuggingface.co\u002Fdeepdml\u002Ffaster-whisper-large-v3-turbo-ct2)），它将会自动下载到该目录中。\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjhj0517_Whisper-WebUI_readme_eb63e6f04a4f.png)\n\n# REST API\n如果你有兴趣将此应用部署为 REST API，请查看 [\u002Fbackend](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Ftree\u002Fmaster\u002Fbackend)。\n\n## 待办事项🗓\n\n- [x] 添加 DeepL API 翻译\n- [x] 添加 NLLB 模型翻译\n- [x] 集成 faster-whisper\n- [x] 集成 insanely-fast-whisper\n- [x] 集成 whisperX（仅说话人分离部分）\n- [x] 添加使用 [UVR](https:\u002F\u002Fgithub.com\u002FAnjok07\u002Fultimatevocalremovergui) 进行背景音乐分离的预处理\n- [x] 添加 FastAPI 脚本\n- [ ] 添加 CLI 使用方法\n- [ ] 支持麦克风实时转录\n\n### 翻译 🌐\n任何将语言翻译成 [translation.yaml](https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fblob\u002Fmaster\u002Fconfigs\u002Ftranslation.yaml) 的 PR 都将不胜感激！","# Whisper-WebUI 快速上手指南\n\n## 环境准备\n\n### 系统要求\n- 操作系统：Windows \u002F Linux \u002F macOS\n- 
Python 版本：3.10 ~ 3.12（推荐使用 Python 3.11）\n- FFmpeg：用于音频处理\n- Git：用于代码克隆\n\n### 前置依赖\n- 安装 Git：[https:\u002F\u002Fgit-scm.com\u002Fdownloads](https:\u002F\u002Fgit-scm.com\u002Fdownloads)\n- 安装 Python：[https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)（建议使用 [Python 官方镜像](https:\u002F\u002Fmirrors.huaweicloud.com\u002Fpython\u002F) 或 [清华源](https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple) 安装）\n- 安装 FFmpeg：[https:\u002F\u002Fffmpeg.org\u002Fdownload.html](https:\u002F\u002Fffmpeg.org\u002Fdownload.html)（安装后需将 `FFmpeg\u002Fbin` 添加到系统 PATH）\n\n> 注意：若使用 NVIDIA GPU，需安装 CUDA 12.8。若使用 Intel GPU 或其他 CUDA 版本，请根据 `requirements.txt` 中的 `--extra-index-url` 进行调整。\n\n---\n\n## 安装步骤\n\n1. 克隆项目仓库\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI.git\n```\n\n2. 安装依赖（推荐使用国内镜像加速）\n```bash\npip install -r requirements.txt --index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n3. 启动 WebUI\n```bash\npython app.py\n```\n\n> 或者使用脚本启动（根据操作系统选择）：\n- Windows: `start-webui.bat`\n- Linux\u002FmacOS: `start-webui.sh`\n\n---\n\n## 基本使用\n\n1. 访问本地服务\n打开浏览器，访问 `http:\u002F\u002Flocalhost:7860`，进入 WebUI 界面。\n\n2. 上传音频文件或输入 YouTube 链接\n- 选择音频来源（文件、YouTube、麦克风）\n- 选择模型（默认为 `SYSTRAN\u002Ffaster-whisper`）\n- 选择输出格式（SRT、WebVTT、txt）\n\n3. 开始转录\n点击“Transcribe”按钮，等待任务完成。\n\n4. 
下载结果\n转录完成后，可下载生成的字幕文件。","某视频内容创作者需要为一系列英文教学视频添加中文字幕，以提升观众的观看体验。他通常需要手动处理多个视频文件，耗时且效率低下。\n\n### 没有 Whisper-WebUI 时  \n- 需要手动下载并配置多个语音识别模型，操作复杂  \n- 每个视频都需要单独运行命令行工具进行转录，流程繁琐  \n- 转录后的字幕需要人工校对和格式化，耗时耗力  \n- 不支持直接从 YouTube 下载视频并生成字幕  \n- 缺乏对音频预处理（如降噪、分离人声）的功能  \n\n### 使用 Whisper-WebUI 后  \n- 通过图形化界面一键选择合适的语音模型，简化了配置过程  \n- 支持批量上传视频文件或直接输入 YouTube 链接，提高工作效率  \n- 自动生成 SRT 格式的字幕文件，并提供自动校对功能  \n- 内置音频预处理模块，可自动去除背景音乐并增强语音清晰度  \n- 支持多语言翻译，可将英文字幕自动翻译成中文  \n\nWhisper-WebUI 将视频字幕生成流程从手动操作转变为自动化、高效的解决方案。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjhj0517_Whisper-WebUI_ccd9a79e.png","jhj0517","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fjhj0517_9a4b06d8.png","Android \u002F Flutter\u002F Python",null,"South Korea","https:\u002F\u002Fgithub.com\u002Fjhj0517",[82,86,90,94,98],{"name":83,"color":84,"percentage":85},"Python","#3572A5",95.4,{"name":87,"color":88,"percentage":89},"Jupyter Notebook","#DA5B0B",2.6,{"name":91,"color":92,"percentage":93},"Batchfile","#C1F12E",1,{"name":95,"color":96,"percentage":97},"Dockerfile","#384d54",0.8,{"name":99,"color":100,"percentage":101},"Shell","#89e051",0.2,2737,405,"2026-04-05T10:51:14","Apache-2.0","Linux, macOS, Windows","需要 NVIDIA GPU，显存 8GB+，CUDA 11.7+","16GB+",{"notes":110,"python":111,"dependencies":112},"建议使用 conda 管理环境，首次运行需下载约 5GB 模型文件。若使用 CPU 运行，性能可能较低。Huggingface 模型需手动接受条款。","3.10 ~ 3.12",[113,114,115,116,117,118,119,120,121,122],"torch>=2.0","transformers>=4.30","accelerate","gradio","pyannote","ffmpeg","silero-vad","uvr","faster-whisper","insanely-fast-whisper",[14,15,55,13],[125,126,127,128,129,116,130],"ai","open-source","python","web-ui","whisper","pytorch",4,"2026-03-27T02:49:30.150509","2026-04-06T07:14:26.079286",[135,140,145,150,155,160],{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},5314,"如何解决 'Could not locate cudnn_ops_infer64_8.dll' 错误？","请确保使用 CUDA 12.4 版本，并重新运行 install.bat 安装脚本。如果问题仍然存在，可以尝试从 
https:\u002F\u002Fgithub.com\u002FPurfview\u002Fwhisper-standalone-win\u002Freleases\u002Ftag\u002Flibs 下载相关库文件。或者考虑使用 Docker 运行应用。","https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fissues\u002F235",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},5315,"如何解决 SRT 字幕段落过长的问题？","此问题是由于 SILERO VAD 模块导致的。建议将 transformers 版本固定为 4.47.1，避免使用更新版本（如 4.48.0 及以上）。该问题已在 #484 中修复。","https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fissues\u002F470",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},5316,"如何解决重复单词和非英语语言识别错误？","此问题可能与 cuDNN、CUDA 或 PyTorch 的版本不兼容有关。建议安装 CUDA 12.4 并重新运行 install.bat 脚本。若仍存在问题，可参考 https:\u002F\u002Fgithub.com\u002FPurfview\u002Fwhisper-standalone-win\u002Freleases\u002Ftag\u002Flibs 提供的解决方案。","https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fissues\u002F268",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},5317,"如何将本地主机地址更改为特定域名？","可以通过修改启动命令中的 `--server_name` 参数来更改地址。例如：`docker run --gpus all -d -v \u002FWhisper-WebUI\u002Fmodels -v \u002FWhisper-WebUI\u002Foutputs -p 7860:7860 -it jhj0517\u002Fwhisper-webui:latest --server_name yourdomain.com --server_port 7860`。","https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fissues\u002F79",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},5318,"如何解决 'NoneType' object is not iterable 错误？","此问题已在 #405 中修复。请确保使用最新版本的代码。如果问题仍然存在，请重新打开该 Issue 并提供更多信息。","https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fissues\u002F390",{"id":161,"question_zh":162,"answer_zh":163,"source_url":139},5319,"如何在 Docker 中设置模型和输出路径？","可以通过在运行 Docker 命令时添加 `-v` 参数来指定本地路径。例如：`-v \u002Fyour\u002Fpath\u002Fto\u002Fmodels:\u002FWhisper-WebUI\u002Fmodels` 和 `-v \u002Fyour\u002Fpath\u002Fto\u002Foutputs:\u002FWhisper-WebUI\u002Foutputs`。",[165,170,175,180,185,190,195,200,205],{"id":166,"version":167,"summary_zh":168,"released_at":169},104806,"v1.0.8","## What's Changed\r\n* Fix: lock down `torchaudio` version for its 
incompatibility by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F603\r\n* Fix: add fallback for loading yaml by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F605\r\n* Add Traditional Chinese (zh-TW) translation by @azq1231 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F596\r\n* Add Dutch translation to translation.yaml by @nataliegeurtsen in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F597\r\n\r\n## New Contributors\r\n* @azq1231 made their first contribution in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F596\r\n* @nataliegeurtsen made their first contribution in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F597\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fcompare\u002Fv1.0.7...v1.0.8","2025-10-17T09:21:32",{"id":171,"version":172,"summary_zh":173,"released_at":174},104807,"v1.0.7","## What's Changed\r\n* Fix LookupError by @egemengulpinar in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F586\r\n* Add Hebrew (he) translation by @back2zion in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F581\r\n* Hide language setting in the UI by default by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F589\r\n\r\n## New Contributors\r\n* @egemengulpinar made their first contribution in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F586\r\n* @back2zion made their first contribution in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F581\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fcompare\u002Fv1.0.6...v1.0.7","2025-08-29T07:54:21",{"id":176,"version":177,"summary_zh":178,"released_at":179},104808,"v1.0.6","## What's Changed\r\n* 
Fix i18n encoding issue in notebook by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F574\r\n* Set `word_timestamps` to always True to reduce hallucinations by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F580\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fcompare\u002Fv1.0.5...v1.0.6","2025-08-10T15:53:19",{"id":181,"version":182,"summary_zh":183,"released_at":184},104809,"v1.0.5","## What's Changed\r\n* Fix i18n encoding issue by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F572\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fcompare\u002Fv1.0.4...v1.0.5","2025-07-26T10:18:35",{"id":186,"version":187,"summary_zh":188,"released_at":189},104810,"v1.0.4","## What's Changed\r\n* Add configs to the docker compose by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F552\r\n* Upgrade minimum CUDA version by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F555\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fcompare\u002Fv1.0.3...v1.0.4\r\n\r\n\r\n> [!NOTE]\r\n> Latest `torch` no longer supports CUDA 12.4 binaries. 
Need to use at least CUDA 12.6.\r\n> - https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F","2025-05-06T05:45:37",{"id":191,"version":192,"summary_zh":193,"released_at":194},104811,"v1.0.3","## What's Changed\r\n* Update `docker-compose.yaml` about cpu usage by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F535\r\n* Increase maximum file cache age by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F537\r\n* Downgrade gradio-i18n to 0.3.0 by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F547\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fcompare\u002Fv1.0.2...v1.0.3","2025-04-29T05:29:09",{"id":196,"version":197,"summary_zh":198,"released_at":199},104812,"v1.0.2","## What's Changed\r\n* Add write permission for `GITHUB_TOKEN` in the action by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F530\r\n* Handle error when importing `uvr` by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F533\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fcompare\u002Fv1.0.1...v1.0.2","2025-04-07T14:32:39",{"id":201,"version":202,"summary_zh":203,"released_at":204},104813,"v1.0.1","Initial semantic versioning tag for Docker images. 
Auto-generated release notes will be used starting from the next tag.\r\n\r\n## What's Changed\r\n- Add semantic versioning for Docker images in DockerHub and GHCR by @NotYuSheng (#528)\r\n  - Major versions as `vX.Y.Z`, minor versions as `vX.Y.Z-ShortSHA`.\r\n- Fix shared test module importing by @jhj0517 in https:\u002F\u002Fgithub.com\u002Fjhj0517\u002FWhisper-WebUI\u002Fpull\u002F527\r\n","2025-04-01T09:58:47",{"id":206,"version":207,"summary_zh":208,"released_at":209},104814,"v1.0.0","WebUI with batch\u002Fshell scripts for convenient use:\r\n\r\n| OS      | File                          |\r\n|---------|-------------------------------|\r\n| Windows | Whisper-WebUI-Portable-Windows.zip |\r\n| Mac     | Whisper-WebUI-Portable-Mac.zip         |\r\n| Linux   | Whisper-WebUI-Portable-Linux.zip       |\r\n\r\nUsage:\r\n- To install: `install.bat` \u002F `install.sh`\r\n- To run: `start-webui.bat` \u002F `start-webui.sh`\r\n- To update: `update.bat` \u002F `update.sh`\r\n\r\n**Before installing, make sure to run `update.bat` \u002F `update.sh` first, because older versions of Whisper-WebUI may be embedded.**\r\n\r\nFor `install.bat` \u002F `install.sh`, a `venv` folder will be created and all dependencies will be installed there.\r\nFor `update.bat` \u002F `update.sh`, a `git pull` will be executed to fetch the latest changes.\r\nFor `start-webui.bat` \u002F `start-webui.sh`, `python app.py` will be run with some CLI arguments.\r\n\r\n\r\n","2024-06-18T05:50:09"]