[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-QwenLM--Qwen3-ASR-Toolkit":3,"tool-QwenLM--Qwen3-ASR-Toolkit":64},[4,16,27,35,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":15},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,14],"Agent","插件","ready",{"id":17,"name":18,"github_repo":19,"description_zh":20,"stars":21,"difficulty_score":22,"last_commit_at":23,"category_tags":24,"status":15},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,2,"2026-04-10T01:20:03",[14,13,25,26],"图像","开发框架",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":22,"last_commit_at":33,"category_tags":34,"status":15},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 
链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[14,26],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":22,"last_commit_at":41,"category_tags":42,"status":15},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85092,"2026-04-10T11:13:16",[25,43,44,14,13,45,46,26,47],"数据工具","视频","其他","语言模型","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":15},7525,"codex","openai\u002Fcodex","Codex 是 OpenAI 推出的一款轻量级编程智能体，专为在终端环境中高效运行而设计。它允许开发者直接在命令行界面与 AI 交互，完成代码生成、调试、重构及项目维护等任务，无需频繁切换至浏览器或集成开发环境，从而显著提升了编码流程的连贯性与专注度。\n\n这款工具主要解决了传统 AI 辅助编程中上下文割裂的问题。通过将智能体本地化运行，Codex 能够更紧密地结合当前工作目录的文件结构，提供更具针对性的代码建议，同时支持以自然语言指令驱动复杂的开发操作，让“对话即编码”成为现实。\n\nCodex 非常适合习惯使用命令行的软件工程师、全栈开发者以及技术研究人员。对于追求极致效率、偏好键盘操作胜过图形界面的极客用户而言，它更是理想的结对编程伙伴。\n\n其独特亮点在于灵活的部署方式：既可作为全局命令行工具通过 npm 或 Homebrew 一键安装，也能无缝对接现有的 ChatGPT 订阅计划（如 Plus 或 Pro），直接复用账户权益。此外，它还提供了从纯文本终端到桌面应用的多形态体验，并支持基于 API 密钥的深度定制，充分满足不同场景下的开发需求。",75220,"2026-04-14T14:40:34",[46,13,14],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":22,"last_commit_at":62,"category_tags":63,"status":15},51,"gstack","garrytan\u002Fgstack","gstack 是 
Y Combinator CEO Garry Tan 亲自开源的一套 AI 工程化配置，旨在将 Claude Code 升级为你的虚拟工程团队。面对单人开发难以兼顾产品战略、架构设计、代码审查及质量测试的挑战，gstack 提供了一套标准化解决方案，帮助开发者实现堪比二十人团队的高效产出。\n\n这套配置特别适合希望提升交付效率的创始人、技术负责人，以及初次尝试 Claude Code 的开发者。gstack 的核心亮点在于内置了 15 个具有明确职责的 AI 角色工具，涵盖 CEO、设计师、工程经理、QA 等职能。用户只需通过简单的斜杠命令（如 `\u002Freview` 进行代码审查、`\u002Fqa` 执行测试、`\u002Fplan-ceo-review` 规划功能），即可自动化处理从需求分析到部署上线的全链路任务。\n\n所有操作基于 Markdown 和斜杠命令，无需复杂配置，完全免费且遵循 MIT 协议。gstack 不仅是一套工具集，更是一种现代化的软件工厂实践，让单人开发者也能拥有严谨的工程流程。",73956,"2026-04-16T23:09:21",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":89,"forks":90,"last_commit_at":91,"license":92,"difficulty_score":22,"env_os":93,"env_gpu":94,"env_ram":94,"env_deps":95,"category_tags":100,"github_topics":79,"view_count":22,"oss_zip_url":79,"oss_zip_packed_at":79,"status":15,"created_at":101,"updated_at":102,"faqs":103,"releases":137},8145,"QwenLM\u002FQwen3-ASR-Toolkit","Qwen3-ASR-Toolkit","Official Python toolkit for the Qwen3-ASR API. 
Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support.","Qwen3-ASR-Toolkit 是一款专为调用通义千问语音识别（Qwen-ASR）API 设计的高效 Python 工具。它核心解决了官方 API 单次仅支持 3 分钟音频的限制，让用户能够轻松转录长达数小时的会议记录、播客或视频内容。\n\n该工具通过智能语音活动检测（VAD）技术，自动在自然停顿处将长音频切分为合理片段，避免句子被生硬截断。随后，利用多线程并行处理技术同时发送这些片段，大幅缩短整体等待时间。此外，它还具备自动去重与纠错功能，能有效过滤识别结果中的重复内容或幻觉错误，并支持一键生成带时间轴的 SRT 字幕文件。无论是本地文件还是网络链接，也无论是何种音视频格式，它都能自动适配采样率并完成处理。\n\n这款工具非常适合需要处理长音频转录的开发者、研究人员、媒体从业者以及普通用户。对于希望快速构建语音应用的原型开发者，或是需要整理大量访谈素材的研究团队，Qwen3-ASR-Toolkit 提供了一个简单命令行即可启动的可靠解决方案，让高质量的语音转文字变得触手可及。","# Qwen3-ASR-Toolkit\n\n[![PyPI version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fqwen3-asr-toolkit.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fqwen3-asr-toolkit)\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.8+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)\n[![Also in](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FAlso%20in-Java-orange.svg)](#-implementations-in-other-languages)\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT)\n\n## 😊 Important Notice\n\nQwen3-ASR is now **open-sourced** 🎉🎉🎉. Welcome to visit the [**GitHub**](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-ASR) and [**blog**](https:\u002F\u002Fqwen.ai\u002Fblog?id=qwen3asr) for more information. The open-source model offers functionality comparable to the API and supports free, fast local deployment. Qwen3-ASR open-source model includes two powerful **all-in-one speech recognition models (0.6B\u002F1.7B)** that support language identification and ASR for **52 languages and dialects**, as well as a novel non-autoregressive speech forced-alignment model that can align text–speech pairs in 11 languages. Its powerful performance is sufficient to deliver highly compelling speech-to-text transcription capabilities. 
Welcome to use it!\n\n## Overview \n\nAn advanced, high-performance Python command-line toolkit for using the **Qwen-ASR API** (formerly Qwen3-ASR-Flash). This implementation overcomes the API's 3-minute audio length limitation by intelligently splitting long audio\u002Fvideo files and processing them in parallel, enabling rapid transcription of hours-long content.\n\n## 🚀 Key Features\n\n-   **Break the 3-Minute Limit**: Seamlessly transcribe audio and video files of any length by bypassing the official API's duration constraint.\n-   **Smart Audio Splitting**: Utilizes **Voice Activity Detection (VAD)** to split audio into meaningful chunks at natural silent pauses. This ensures that words and sentences are not awkwardly cut off.\n-   **High-Speed Parallel Processing**: Leverages multi-threading to send audio chunks to the Qwen-ASR API concurrently, dramatically reducing the total transcription time for long files.\n-   **Intelligent Post-Processing**: Automatically detects and removes common ASR **hallucinations and repetitive artifacts** for cleaner, more accurate transcripts.\n-   **SRT Subtitle Generation**: Automatically create timestamped **`.srt` subtitle files** based on VAD segments, perfect for adding captions to video content.\n-   **Automatic Audio Resampling**: Automatically converts audio from any sample rate and channel count to the 16kHz mono format required by the Qwen-ASR API. You can use any audio file without worrying about pre-processing.\n-   **Universal Media Support**: Supports virtually any audio and video format (e.g., `.mp4`, `.mov`, `.mkv`, `.mp3`, `.wav`, `.m4a`) thanks to its reliance on FFmpeg.\n-   **Simple & Easy to Use**: A straightforward command-line interface allows you to get started with just a single command.\n\n## ⚙️ How It Works\n\nThis tool follows a robust pipeline to deliver fast and accurate transcriptions for long-form media:\n\n1.  
**Media Loading**: The script first loads your media file, whether it's a **local file or a remote URL**.\n2.  **VAD-based Chunking**: It analyzes the audio stream using Voice Activity Detection (VAD) to identify silent segments.\n3.  **Intelligent Splitting**: The audio is then split into smaller chunks based on the detected silences. Each chunk's duration is managed to stay under the 3-minute API limit, with a **user-configurable target length (defaulting to 120 seconds)**, preventing mid-sentence cuts.\n4.  **Parallel API Calls**: A thread pool is initiated to upload and process these chunks concurrently using the DashScope Qwen-ASR API.\n5.  **Result Aggregation & Cleaning**: The transcribed text segments from all chunks are collected, re-ordered, and then **post-processed to remove detected repetitions and hallucinations**.\n6.  **Output Generation**: The final, cleaned transcription is printed to the console and saved to a `.txt` file. **Optionally, a timestamped `.srt` subtitle file can also be generated.**\n\n## 🏁 Getting Started\n\nFollow these steps to set up and run the project on your local machine.\n\n### Prerequisites\n\n-   Python 3.8 or higher.\n-   **FFmpeg**: The script requires FFmpeg to be installed on your system to handle media files.\n    -   **Ubuntu\u002FDebian**: `sudo apt update && sudo apt install ffmpeg`\n    -   **macOS**: `brew install ffmpeg`\n    -   **Windows**: Download from the [official FFmpeg website](https:\u002F\u002Fffmpeg.org\u002Fdownload.html) and add it to your system's PATH.\n-   **DashScope API Key**: You need an API key from Alibaba Cloud's DashScope.\n    -   You can obtain one from the [DashScope Console](https:\u002F\u002Fdashscope.console.aliyun.com\u002FapiKey). 
If you are calling the API services of Tongyi Qwen for the first time, you can follow the tutorial on [this website](https:\u002F\u002Fhelp.aliyun.com\u002Fzh\u002Fmodel-studio\u002Ffirst-api-call-to-qwen) to create your own API Key.\n    -   For better security and convenience, it is **highly recommended** to set your API key as an environment variable named `DASHSCOPE_API_KEY`. The script will automatically use it, and you won't need to pass the `--dashscope-api-key` (`-key`) argument in the command.\n\n        **On Linux\u002FmacOS:**\n        ```bash\n        export DASHSCOPE_API_KEY=\"your_api_key_here\"\n        ```\n        *(To make this permanent, add the line to your `~\u002F.bashrc`, `~\u002F.zshrc`, or `~\u002F.profile` file.)*\n\n        **On Windows (Command Prompt):**\n        ```cmd\n        set DASHSCOPE_API_KEY=your_api_key_here\n        ```\n        *(Do not wrap the value in quotes here: Command Prompt would store the quote characters as part of the key.)*\n\n        **On Windows (PowerShell):**\n        ```powershell\n        $env:DASHSCOPE_API_KEY=\"your_api_key_here\"\n        ```\n        *(For a permanent setting on Windows, search for \"Edit the system environment variables\" in the Start Menu and add `DASHSCOPE_API_KEY` to your user variables.)*\n\n### Installation\n\nWe recommend installing the tool directly from PyPI for the simplest setup.\n\n#### Option 1: Install from PyPI (Recommended)\n\nRun the following command in your terminal. This installs the package and makes the `qwen3-asr` command available in your current Python environment.\n\n```bash\npip install qwen3-asr-toolkit\n```\n\n#### Option 2: Install from Source\n\nIf you want to install the latest development version or contribute to the project, you can install from the source code.\n\n1.  Clone the repository:\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-ASR-Toolkit.git\n    cd Qwen3-ASR-Toolkit\n    ```\n\n2.  Install the package:\n    ```bash\n    pip install .\n    ```\n\n## 📖 Usage\n\nOnce installed, you can use the `qwen3-asr` command directly from your terminal. 
By default, the tool will print progress information.\n\n### Command\n\n```bash\nqwen3-asr -i \u003Cinput_file_or_url> [-key \u003Capi_key>] [-j \u003Cnum_threads>] [-c \u003Ccontext>] [-d \u003Cduration>] [-t \u003Ctmp_dir>] [--save-srt] [-s]\n```\n\n### Arguments\n\n| Argument                  | Short  | Description                                                                          | Required\u002FOptional                        |\n| ------------------------- | ------ | ------------------------------------------------------------------------------------ | ---------------------------------------- |\n| `--input-file`            | `-i`   | Path to the local media file or a remote URL (http\u002Fhttps) to transcribe.             | **Required**                             |\n| `--context`               | `-c`   | Text context to guide the ASR model, improving recognition of specific terms.        | Optional, Default: `\"\"`                  |\n| `--dashscope-api-key`     | `-key` | Your DashScope API Key.                                                              | Optional (if `DASHSCOPE_API_KEY` is set) |\n| `--num-threads`           | `-j`   | The number of concurrent threads to use for API calls.                               | Optional, **Default: 4**                 |\n| `--vad-segment-threshold` | `-d`   | Target duration in seconds for each VAD-split audio chunk.                           | Optional, **Default: 120**               |\n| `--tmp-dir`               | `-t`   | Path to a directory for storing temporary chunk files.                               | Optional, Default: `~\u002Fqwen3-asr-cache`   |\n| `--save-srt`              | `-srt` | Generate and save a timestamped SRT subtitle file in addition to the `.txt` file.    | Optional                                 |\n| `--silence`               | `-s`   | Silence mode. Suppresses detailed progress and chunking information on the terminal. 
| Optional                                 |\n\n### Output\n\nThe full transcription result will be printed to the terminal (unless in `--silence` mode) and also saved in a `.txt` file in the same directory as the input file. For example, if you process `my_video.mp4`, the output will be saved to `my_video.txt`.\n\n**If you use the `--save-srt` flag, a corresponding `my_video.srt` subtitle file will also be created in the same directory.**\n\n---\n\n## ✨ Examples\n\nHere are a few examples of how to use the tool.\n\n#### 1. Basic Transcription of a Local File\n\nTranscribe a video file using the default 4 threads. This command assumes you have set the `DASHSCOPE_API_KEY` environment variable.\n\n```bash\nqwen3-asr -i \"\u002Fpath\u002Fto\u002Fmy\u002Flong_lecture.mp4\"\n```\n\n#### 2. Transcribe a Remote Audio File\n\nDirectly process an audio file from a URL.\n\n```bash\nqwen3-asr -i \"https:\u002F\u002Fsomewebsite.com\u002Faudios\u002Fpodcast_episode.mp3\"\n```\n\n#### 3. Generate an SRT Subtitle File\n\nUse the `--save-srt` (or `-srt`) flag to generate a timestamped subtitle file alongside the plain text transcript. This is ideal for video captioning.\n\n```bash\nqwen3-asr -i \"\u002Fpath\u002Fto\u002Fmy\u002Fdocumentary.mp4\" -srt\n```\n*This command will create `documentary.txt` and `documentary.srt`.*\n\n#### 4. Increase Concurrency and Pass API Key\n\nTranscribe a long audio file using 8 parallel threads and pass the API key directly via the command line.\n\n```bash\nqwen3-asr -i \"\u002Fpath\u002Fto\u002Fmy\u002Fpodcast_episode_01.wav\" -j 8 -key \"your_api_key_here\"\n```\n\n#### 5. Provide Context and Customize Chunk Duration\n\nIf your audio contains specific jargon, use the `-c` flag. 
If you prefer shorter, more frequent subtitle segments, use `-d` to set a smaller chunk duration.\n\n```bash\nqwen3-asr -i \"\u002Fpath\u002Fto\u002Fmy\u002Ftech_talk.mp4\" -c \"Qwen-ASR, DashScope, FFmpeg\" -d 60 -srt\n```\n*This command will try to split the audio into chunks around 60 seconds long, which can result in more granular subtitles.*\n\n#### 6. Run in Silence Mode\n\nUse the `-s` or `--silence` flag to prevent progress details from being printed to the terminal. The final transcript will still be saved to a file.\n\n```bash\nqwen3-asr -i \"\u002Fpath\u002Fto\u002Fmy\u002Fmeeting_recording.m4a\" -s\n```\n\n## 🌍 Implementations in Other Languages\n\nWhile this project provides a full-featured Python toolkit, we also host implementations in other programming languages to demonstrate how the same core logic can be applied across different technology stacks. We warmly welcome the community to contribute examples in more languages!\n\n### ☕ Java Example\n\nWe have provided a Java version as a standalone example located in the `examples\u002Fjava-example` directory of this repository. This example showcases how to implement the key features of the toolkit—including VAD-based audio chunking, parallel API requests, and result aggregation—using Java. It serves as a great starting point for Java developers looking to integrate Qwen-ASR into their applications.\n\n### How to Contribute Your Version\n\nIf you have implemented a similar toolkit in another language (e.g., **Go**, **Rust**, **C#**, **JavaScript\u002FNode.js**), we would love to feature it! Please open a pull request to add your implementation to the `examples` directory. For more details on contributing, see the [Contributing](#-contributing) section below.\n\n## 🤝 Contributing\n\nContributions are welcome! If you have suggestions for improvements, please feel free to fork the repo, create a feature branch, and open a pull request. 
You can also open an issue with the \"enhancement\" tag.\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n","# Qwen3-ASR-Toolkit\n\n[![PyPI version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fqwen3-asr-toolkit.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fqwen3-asr-toolkit)\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.8+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)\n[![Also in](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FAlso%20in-Java-orange.svg)](#-implementations-in-other-languages)\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT)\n\n## 😊 重要提示\n\nQwen3-ASR 现已**开源**🎉🎉🎉。欢迎访问[**GitHub**](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-ASR)和[**博客**](https:\u002F\u002Fqwen.ai\u002Fblog?id=qwen3asr)获取更多信息。开源模型的功能与 API 相当，支持免费、快速的本地部署。Qwen3-ASR 开源模型包含两款强大的**一体化语音识别模型（0.6B\u002F1.7B）**，支持**52 种语言和方言**的语言识别与 ASR，以及一款新颖的非自回归语音强制对齐模型，可在 11 种语言中实现文本-语音对齐。其强大的性能足以提供极具吸引力的语音转文字能力。欢迎使用！\n\n## 概述\n\n这是一款先进的高性能 Python 命令行工具包，用于调用 **Qwen-ASR API**（原名 Qwen3-ASR-Flash）。该实现通过智能分割长音频\u002F视频文件并并行处理，突破了 API 对音频长度 3 分钟的限制，从而能够快速转录长达数小时的内容。\n\n## 🚀 核心功能\n\n-   **突破 3 分钟限制**：无缝转录任意长度的音频和视频文件，绕过官方 API 的时长限制。\n-   **智能音频分割**：利用**语音活动检测（VAD）**在自然的静音处将音频分割成有意义的片段，确保单词和句子不会被生硬地截断。\n-   **高速并行处理**：借助多线程技术，同时向 Qwen-ASR API 发送多个音频片段，大幅缩短长文件的总转录时间。\n-   **智能后处理**：自动检测并去除常见的 ASR **幻觉和重复性伪影**，生成更干净、更准确的转录文本。\n-   **SRT 字幕生成**：基于 VAD 分段自动生成带时间戳的**`.srt` 字幕文件**，非常适合为视频内容添加字幕。\n-   **自动音频重采样**：自动将任何采样率和声道数的音频转换为 Qwen-ASR API 所需的 16kHz 单声道格式。您无需担心预处理问题，可直接使用任何音频文件。\n-   **通用媒体支持**：由于依赖 FFmpeg，几乎支持所有音频和视频格式（如 `.mp4`、`.mov`、`.mkv`、`.mp3`、`.wav`、`.m4a` 等）。\n-   **简单易用**：直观的命令行界面让您只需一条命令即可开始使用。\n\n## ⚙️ 工作原理\n\n该工具采用稳健的流程，为长篇媒体内容提供快速且准确的转录：\n\n1.  **加载媒体**：脚本首先加载您的媒体文件，无论是**本地文件还是远程 URL**。\n2.  **基于 VAD 的分块**：利用语音活动检测（VAD）分析音频流，识别静音段。\n3.  
**智能分割**：根据检测到的静音将音频分割成较小的片段。每个片段的时长控制在 3 分钟的 API 限制以内，并允许用户自定义目标时长（默认 120 秒），以避免句子被截断。\n4.  **并行 API 调用**：启动线程池，使用 DashScope Qwen-ASR API 并发上传和处理这些音频片段。\n5.  **结果聚合与清洗**：收集所有片段的转录文本，重新排序后进行**后处理，去除重复内容和幻觉现象**。\n6.  **输出生成**：最终清理后的转录文本会打印到控制台，并保存为 `.txt` 文件。**还可选择生成带时间戳的 `.srt` 字幕文件。**\n\n## 🏁 快速入门\n\n按照以下步骤，在您的本地机器上设置并运行该项目。\n\n### 先决条件\n\n-   Python 3.8 或更高版本。\n-   **FFmpeg**：脚本需要系统中安装 FFmpeg 来处理媒体文件。\n    -   **Ubuntu\u002FDebian**：`sudo apt update && sudo apt install ffmpeg`\n    -   **macOS**：`brew install ffmpeg`\n    -   **Windows**：从 [FFmpeg 官方网站](https:\u002F\u002Fffmpeg.org\u002Fdownload.html)下载，并将其添加到系统的 PATH 中。\n-   **DashScope API 密钥**：您需要阿里云 DashScope 的 API 密钥。\n    -   您可以从 [DashScope 控制台](https:\u002F\u002Fdashscope.console.aliyun.com\u002FapiKey)获取密钥。如果您是首次调用通义千问的 API 服务，可以参考[此网站](https:\u002F\u002Fhelp.aliyun.com\u002Fzh\u002Fmodel-studio\u002Ffirst-api-call-to-qwen)上的教程创建自己的 API 密钥。\n    -   为了更好的安全性和便利性，**强烈建议**将您的 API 密钥设置为名为 `DASHSCOPE_API_KEY` 的环境变量。脚本会自动使用该变量，您无需在命令中传递 `--dashscope-api-key`（`-key`）参数。\n\n        **Linux\u002FmacOS：**\n        ```bash\n        export DASHSCOPE_API_KEY=\"your_api_key_here\"\n        ```\n        *(若要永久生效，请将该行添加到 `~\u002F.bashrc`、`~\u002F.zshrc` 或 `~\u002F.profile` 文件中。)*\n\n        **Windows（命令提示符）：**\n        ```cmd\n        set DASHSCOPE_API_KEY=your_api_key_here\n        ```\n        *(此处不要给值加引号：命令提示符会把引号字符一并存入密钥。)*\n\n        **Windows（PowerShell）：**\n        ```powershell\n        $env:DASHSCOPE_API_KEY=\"your_api_key_here\"\n        ```\n        *(若要在 Windows 上永久设置，请在“开始”菜单中搜索“编辑系统环境变量”，并将 `DASHSCOPE_API_KEY` 添加到您的用户变量中。)*\n\n### 安装\n\n我们推荐直接从 PyPI 安装该工具，这是最简单的设置方式。\n\n#### 选项 1：从 PyPI 安装（推荐）\n\n在终端中运行以下命令，即可安装该软件包，并使 `qwen3-asr` 命令在当前 Python 环境中可用。\n\n```bash\npip install qwen3-asr-toolkit\n```\n\n#### 选项 2：从源代码安装\n\n如果您想安装最新的开发版本或参与项目贡献，也可以从源代码安装。\n\n1.  克隆仓库：\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-ASR-Toolkit.git\n    cd Qwen3-ASR-Toolkit\n    ```\n\n2.  
安装软件包：\n    ```bash\n    pip install .\n    ```\n\n## 📖 使用方法\n\n安装完成后，您可以直接在终端中使用 `qwen3-asr` 命令。默认情况下，工具会打印进度信息。\n\n### 命令\n\n```bash\nqwen3-asr -i \u003Cinput_file_or_url> [-key \u003Capi_key>] [-j \u003Cnum_threads>] [-c \u003Ccontext>] [-d \u003Cduration>] [-t \u003Ctmp_dir>] [--save-srt] [-s]\n```\n\n### 参数\n\n| 参数                  | 短选项  | 描述                                                                          | 必需\u002F可选                        |\n| ------------------------- | ------ | ------------------------------------------------------------------------------------ | ---------------------------------------- |\n| `--input-file`            | `-i`   | 要转录的本地媒体文件路径或远程 URL（http\u002Fhttps）。             | **必需**                             |\n| `--context`               | `-c`   | 用于引导 ASR 模型的文本上下文，以提高特定术语的识别准确度。        | 可选，默认值：`\"\"`                  |\n| `--dashscope-api-key`     | `-key` | 您的 DashScope API 密钥。                                                              | 可选（如果已设置 `DASHSCOPE_API_KEY`） |\n| `--num-threads`           | `-j`   | 用于 API 调用的并发线程数。                               | 可选，默认值：**4**                 |\n| `--vad-segment-threshold` | `-d`   | 每个 VAD 分割音频块的目标时长（秒）。                           | 可选，默认值：**120**               |\n| `--tmp-dir`               | `-t`   | 用于存储临时分块文件的目录路径。                               | 可选，默认值：`~\u002Fqwen3-asr-cache`   |\n| `--save-srt`              | `-srt` | 除了 `.txt` 文件外，还会生成并保存带时间戳的 SRT 字幕文件。    | 可选                                 |\n| `--silence`               | `-s`   | 静默模式。在终端上抑制详细的进度和分块信息。 | 可选                                 |\n\n### 输出\n\n完整的转录结果将打印到终端（除非处于 `--silence` 模式），同时也会保存为与输入文件同目录下的 `.txt` 文件。例如，如果您处理的是 `my_video.mp4`，输出将保存为 `my_video.txt`。\n\n**如果使用了 `--save-srt` 标志，则会在同一目录下创建对应的 `my_video.srt` 字幕文件。**\n\n---\n\n## ✨ 示例\n\n以下是一些使用该工具的示例。\n\n#### 1. 
本地文件的基本转录\n\n使用默认的 4 个线程转录视频文件。此命令假设您已设置 `DASHSCOPE_API_KEY` 环境变量。\n\n```bash\nqwen3-asr -i \"\u002Fpath\u002Fto\u002Fmy\u002Flong_lecture.mp4\"\n```\n\n#### 2. 转录远程音频文件\n\n直接从 URL 处理音频文件。\n\n```bash\nqwen3-asr -i \"https:\u002F\u002Fsomewebsite.com\u002Faudios\u002Fpodcast_episode.mp3\"\n```\n\n#### 3. 生成 SRT 字幕文件\n\n使用 `--save-srt`（或 `-srt`）标志，可在生成纯文本转录的同时生成带时间戳的字幕文件。这非常适合视频字幕制作。\n\n```bash\nqwen3-asr -i \"\u002Fpath\u002Fto\u002Fmy\u002Fdocumentary.mp4\" -srt\n```\n*此命令将创建 `documentary.txt` 和 `documentary.srt`。*\n\n#### 4. 增加并发数并传递 API 密钥\n\n使用 8 个并行线程转录长音频文件，并通过命令行直接传递 API 密钥。\n\n```bash\nqwen3-asr -i \"\u002Fpath\u002Fto\u002Fmy\u002Fpodcast_episode_01.wav\" -j 8 -key \"your_api_key_here\"\n```\n\n#### 5. 提供上下文并自定义分块时长\n\n如果您的音频包含特定术语，可以使用 `-c` 标志。如果您希望字幕片段更短、更频繁，可以使用 `-d` 设置较小的分块时长。\n\n```bash\nqwen3-asr -i \"\u002Fpath\u002Fto\u002Fmy\u002Ftech_talk.mp4\" -c \"Qwen-ASR、DashScope、FFmpeg\" -d 60 -srt\n```\n*此命令会尝试将音频分割为约 60 秒的分块，从而生成更精细的字幕。*\n\n#### 6. 在静默模式下运行\n\n使用 `-s` 或 `--silence` 标志，可以防止进度详情打印到终端。最终的转录结果仍将保存到文件中。\n\n```bash\nqwen3-asr -i \"\u002Fpath\u002Fto\u002Fmy\u002Fmeeting_recording.m4a\" -s\n```\n\n## 🌍 其他语言的实现\n\n虽然本项目提供了一个功能齐全的 Python 工具包，我们也在其他编程语言中提供了实现，以展示如何在不同的技术栈中应用相同的逻辑核心。我们热烈欢迎社区贡献更多语言的示例！\n\n### ☕ Java 示例\n\n我们在本仓库的 `examples\u002Fjava-example` 目录中提供了一个独立的 Java 版本示例。该示例展示了如何使用 Java 实现工具包的关键功能——包括基于 VAD 的音频分块、并行 API 请求以及结果聚合。对于希望将 Qwen-ASR 集成到其应用程序中的 Java 开发人员来说，这是一个很好的起点。\n\n### 如何贡献您的版本\n\n如果您已在其他语言中实现了类似的工具包（例如 **Go**、**Rust**、**C#**、**JavaScript\u002FNode.js**），我们非常乐意将其收录！请打开一个拉取请求，将您的实现添加到 `examples` 目录中。有关贡献的更多详细信息，请参阅下方的 [贡献](#-contributing) 部分。\n\n## 🤝 贡献\n\n欢迎贡献！如果您有任何改进建议，请随时 fork 本仓库，创建特性分支并提交拉取请求。您也可以开一个带有“enhancement”标签的问题。\n\n## 📄 许可证\n\n本项目采用 MIT 许可证授权——详情请参阅 [LICENSE](LICENSE) 文件。","# Qwen3-ASR-Toolkit 快速上手指南\n\nQwen3-ASR-Toolkit 是一个高性能的 Python 命令行工具，专为调用通义千问语音识别（Qwen-ASR）API 设计。它通过智能切片和并行处理技术，突破了官方 API 单次 3 分钟的音频长度限制，支持对任意时长的音视频文件进行快速、准确的转录，并自动生成带时间戳的字幕文件。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**：Linux, macOS 或 
Windows\n*   **Python 版本**：3.8 或更高\n*   **核心依赖**：**FFmpeg**（用于处理音视频文件）\n    *   **Ubuntu\u002FDebian**: `sudo apt update && sudo apt install ffmpeg`\n    *   **macOS**: `brew install ffmpeg`\n    *   **Windows**: 从 [FFmpeg 官网](https:\u002F\u002Fffmpeg.org\u002Fdownload.html) 下载并将 `bin` 目录添加到系统环境变量 `PATH` 中。\n*   **API 密钥**：需要阿里云 DashScope 的 API Key。\n    *   获取地址：[DashScope 控制台](https:\u002F\u002Fdashscope.console.aliyun.com\u002FapiKey)\n    *   **推荐配置**：将密钥设置为环境变量 `DASHSCOPE_API_KEY`，以便在命令中无需重复输入。\n        *   **Linux\u002FmacOS**:\n            ```bash\n            export DASHSCOPE_API_KEY=\"your_api_key_here\"\n            ```\n        *   **Windows (PowerShell)**:\n            ```powershell\n            $env:DASHSCOPE_API_KEY=\"your_api_key_here\"\n            ```\n\n## 安装步骤\n\n推荐使用 PyPI 进行安装，最简单快捷：\n\n```bash\npip install qwen3-asr-toolkit\n```\n\n*注：国内用户若遇下载缓慢，可使用清华源加速安装：*\n```bash\npip install qwen3-asr-toolkit -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n安装完成后，终端即可直接使用 `qwen3-asr` 命令。\n\n## 基本使用\n\n### 1. 转录本地音视频文件\n假设您已配置好环境变量 `DASHSCOPE_API_KEY`，运行以下命令即可转录一个长视频：\n\n```bash\nqwen3-asr -i \"\u002Fpath\u002Fto\u002Fmy\u002Flong_lecture.mp4\"\n```\n*执行成功后，当前目录下将生成同名的 `.txt` 文本文件（如 `long_lecture.txt`）。*\n\n### 2. 生成带时间轴的字幕文件\n添加 `-srt` 参数，可同时生成标准的 SRT 字幕文件，适用于视频配音：\n\n```bash\nqwen3-asr -i \"\u002Fpath\u002Fto\u002Fmy\u002Fdocumentary.mp4\" -srt\n```\n*这将同时输出 `documentary.txt` 和 `documentary.srt`。*\n\n### 3. 
转录网络链接音频\n直接支持 HTTP\u002FHTTPS 链接作为输入：\n\n```bash\nqwen3-asr -i \"https:\u002F\u002Fsomewebsite.com\u002Faudios\u002Fpodcast_episode.mp3\"\n```\n\n### 常用参数说明\n| 参数 | 简写 | 说明 | 默认值 |\n| :--- | :--- | :--- | :--- |\n| `--input-file` | `-i` | 输入文件路径或 URL (**必填**) | - |\n| `--save-srt` | `-srt` | 生成 SRT 字幕文件 | 否 |\n| `--num-threads` | `-j` | 并行线程数，提升长文件处理速度 | 4 |\n| `--context` | `-c` | 上下文提示词，优化专有名词识别 | \"\" |\n| `--silence` | `-s` | 静默模式，不输出详细进度日志 | 否 |\n\n**高级示例**：使用 8 个线程加速，并指定专业术语上下文：\n```bash\nqwen3-asr -i \"\u002Fpath\u002Fto\u002Ftech_talk.mp4\" -j 8 -c \"Qwen-ASR, DashScope\" -srt\n```","某视频制作团队需要紧急将一场长达 4 小时的跨国技术峰会录像转化为带时间轴的中英双语字幕，以便快速发布回顾内容。\n\n### 没有 Qwen3-ASR-Toolkit 时\n- **时长限制受阻**：官方 API 单次仅支持 3 分钟音频，人工切割 4 小时素材需手动处理数百个片段，耗时且极易出错。\n- **语义断裂严重**：简单按固定时长切分会导致句子在中间被强行截断，后续拼接时上下文逻辑混乱，增加大量人工校对成本。\n- **格式兼容繁琐**：原始录像包含多种编码和采样率，转录前需额外编写脚本进行统一的格式转换和重采样预处理。\n- **产出效率低下**：串行处理导致整体转录耗时极长，无法满足“当日会议、次日上映”的紧迫发布需求。\n\n### 使用 Qwen3-ASR-Toolkit 后\n- **突破时长瓶颈**：工具自动利用 VAD 语音检测技术在自然停顿处智能切片，无缝处理任意长度的音视频文件，无需人工干预。\n- **保持语义完整**：智能分割确保每个片段都在句尾结束，结合去幻觉后处理算法，直接输出连贯流畅、无重复伪影的高质量文本。\n- **全自动预处理**：内置 FFmpeg 引擎自动将各类媒体文件转换为 API 所需的 16kHz 单声道格式，实现“拖入即转写”。\n- **并行极速交付**：通过多线程并发调用 API，将数小时的转录任务压缩至分钟级完成，并直接生成标准的 `.srt` 字幕文件供后期使用。\n\nQwen3-ASR-Toolkit 通过智能分片与并行处理机制，将原本需要数天的人工转录工作流缩减为一键自动化流程，极大提升了长音频内容的生产效能。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FQwenLM_Qwen3-ASR-Toolkit_ada14863.png","QwenLM","Qwen","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FQwenLM_4756c6c9.png","Alibaba Cloud's general-purpose AI models",null,"qianwen_opensource@alibabacloud.com","Alibaba_Qwen","https:\u002F\u002Fqwen.ai\u002F","https:\u002F\u002Fgithub.com\u002FQwenLM",[85],{"name":86,"color":87,"percentage":88},"Python","#3572A5",100,932,88,"2026-04-16T05:32:56","MIT","Linux, macOS, Windows","未说明",{"notes":96,"python":97,"dependencies":98},"该工具主要作为 Qwen-ASR API 的客户端使用，通过并行调用云端 API 处理长音频，因此本地无需运行大型 AI 模型，对 GPU 和本地内存无特殊高要求。必须安装 FFmpeg 用于媒体文件处理。需要配置阿里云 DashScope API Key（建议设置为环境变量 
DASHSCOPE_API_KEY）。支持本地文件或远程 URL 输入，可自动生成 SRT 字幕文件。","3.8+",[99],"FFmpeg",[47,14],"2026-03-27T02:49:30.150509","2026-04-17T08:25:54.646449",[104,109,114,118,123,128,133],{"id":105,"question_zh":106,"answer_zh":107,"source_url":108},36425,"如何为国际用户（非中国区）配置 DashScope API 端点？","工具默认使用中国区端点，国际用户需通过环境变量指定国际端点。在运行命令前执行：\nexport DASHSCOPE_HTTP_BASE_URL=https:\u002F\u002Fdashscope-intl.aliyuncs.com\u002Fapi\u002Fv1\n设置后即可使用国际账号的 API Key 正常调用服务。","https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-ASR-Toolkit\u002Fissues\u002F11",{"id":110,"question_zh":111,"answer_zh":112,"source_url":113},36426,"目前是否支持生成带时间戳的 SRT 字幕文件或进行说话人识别？","当前版本暂不支持多说话人识别（Speaker Diarization）和精确的时间戳分段功能，API 本身也尚未开放这些能力。维护者表示后续迭代更新中会考虑加入这些功能。若需粗略的时间戳，可尝试调整 VAD 参数，但效果可能有限。","https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-ASR-Toolkit\u002Fissues\u002F2",{"id":115,"question_zh":116,"answer_zh":117,"source_url":108},36427,"为什么在 Windows 上生成的文本或 SRT 文件出现乱码？","这是由于文件写入时未显式指定 UTF-8 编码导致的。解决方法是修改源码 `call_api.py`：\n1. 在写入原始文本时（约第 127 行），将 open 函数改为：`with open(save_file, 'w', encoding='utf-8') as f:`\n2. 
在写入 SRT 字幕文件时（约第 147 行），同样添加 `encoding='utf-8'` 参数。\n这样可以确保中文内容正确保存。",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},36428,"如何让生成的 SRT 字幕每一句都带上独立的时间戳而不是大段合并？","可以通过减小 VAD（语音活动检测）的分段阈值来实现更细粒度的切割。在使用命令行工具时，添加 `-d` 参数来最小化 VAD 分段，从而让每一句话对应独立的时间戳区间。","https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-ASR-Toolkit\u002Fissues\u002F7",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},36429,"没有中国手机号无法注册阿里云账号，是否有其他使用途径？","是的，国际用户可以直接访问阿里云国际站使用 Qwen3-ASR-Flash 模型。请访问以下链接获取帮助和注册国际账号：\nhttps:\u002F\u002Fwww.alibabacloud.com\u002Fhelp\u002Fen\u002Fmodel-studio\u002Fqwen-speech-recognition\n注册后配合设置国际版 API 端点即可使用。","https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-ASR-Toolkit\u002Fissues\u002F6",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},36430,"目前的识别延迟较高，能否实现真正的实时语音识别？","当前的 Qwen3-ASR-Flash API 存在约 2 秒左右的端到端延迟，这是模型本身的限制，暂时无法通过客户端优化消除。开发团队已规划推出专门的实时 ASR API，未来将支持流式返回结果以解决延迟问题。","https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-ASR-Toolkit\u002Fissues\u002F10",{"id":134,"question_zh":135,"answer_zh":136,"source_url":113},36431,"遇到 \"Arrearage\" 或 \"Access denied\" 错误怎么办？","该错误通常表示您的阿里云账户欠费或状态异常。请检查您的账户余额并确保账户处于良好状态（in good standing）。如果是国际用户，还需确认是否正确配置了国际版 API 端点环境变量。",[138,143,148,153],{"id":139,"version":140,"summary_zh":141,"released_at":142},289285,"v1.0.4","### **版本 v1.0.4**\n\n本次发布为视频内容创作者和高级用户带来了重大功能增强，新增了生成带时间戳的 SRT 字幕文件的支持，并提供了对音频分割过程更精细的控制。\n\n#### 🚀 新特性\n\n*   **SRT 字幕生成**：现在可以生成符合行业标准的 `.srt` 字幕文件，每个转录片段都带有时间戳。这非常适合为视频添加字幕，或用于任何需要同步文本的工作流。只需使用 `--save-srt`（或 `-srt`）标志即可启用此功能。\n*   **自定义分段时长**：进一步提升基于 VAD 的音频分割控制能力。通过新增的 `--vad-segment-threshold`（或 `-d`）参数，您可以为每个音频片段指定目标时长（单位：秒）。这样既可以创建更细粒度的字幕（通过设置较短的时长），也能针对不同类型的音频内容进行优化。默认值为 120 秒。","2025-09-22T10:38:51",{"id":144,"version":145,"summary_zh":146,"released_at":147},289286,"v1.0.3","### **版本 v1.0.3**\n\n本次发布增强了工具的可靠性，尤其针对长音频转写，并新增了自动语言检测功能。\n\n#### 🚀 新特性\n\n*   **自动语言检测**：工具在转写完成后会报告音频文件的主要语言。这对于验证媒体文件的语言非常有用，并且支持官方 Qwen3-ASR-Flash API 提供的所有 11 种语言。\n\n#### 💪 改进\n\n*   
**API 稳健性提升**：优化了 API 调用机制，使其更能应对 API 的速率限制和错误情况。这显著提高了超长音频\u002F视频文件的转写成功率，并减少了意外中断的发生。","2025-09-19T03:47:50",{"id":149,"version":150,"summary_zh":151,"released_at":152},289287,"v1.0.2","### **版本 v1.0.2**\n\n本次发布重点在于通过多项新功能和关键改进提升易用性和转写质量。\n\n#### 🚀 新功能\n\n*   **远程文件支持**：现在可以使用 `-i`\u002F`--input-file` 参数直接从 HTTP\u002FHTTPS URL 转录媒体文件。\n*   **智能后处理**：自动检测并移除常见的 ASR 幻觉（例如重复性短语），以获得更干净的输出。\n*   **上下文引导**：新增 `--context` (`-c`) 参数，允许您提供特定术语或行话，从而提高识别准确度。\n\n#### ⚠️ 破坏性变更\n\n*   **日志行为反转**：`--verbose` 标志已被**移除**。详细日志记录现为默认开启状态。请使用新的 `--silence` (`-s`) 标志来抑制进度输出。","2025-09-18T05:59:33",{"id":154,"version":155,"summary_zh":156,"released_at":157},289288,"v1.0.1","Qwen3-ASR-Toolkit 的首个版本已发布！","2025-09-17T07:22:44"]