[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tool-p0n1--epub_to_audiobook":3,"similar-p0n1--epub_to_audiobook":194},{"id":4,"github_repo":5,"name":6,"description_en":7,"description_zh":8,"ai_summary_zh":8,"readme_en":9,"readme_zh":10,"quickstart_zh":11,"use_case_zh":12,"hero_image_url":13,"owner_login":14,"owner_name":15,"owner_avatar_url":16,"owner_bio":17,"owner_company":15,"owner_location":15,"owner_email":15,"owner_twitter":18,"owner_website":15,"owner_url":19,"languages":20,"stars":33,"forks":34,"last_commit_at":35,"license":36,"difficulty_score":37,"env_os":38,"env_gpu":39,"env_ram":39,"env_deps":40,"category_tags":46,"github_topics":49,"view_count":37,"oss_zip_url":15,"oss_zip_packed_at":15,"status":57,"created_at":58,"updated_at":59,"faqs":60,"releases":99},9810,"p0n1\u002Fepub_to_audiobook","epub_to_audiobook","EPUB to audiobook converter, optimized for Audiobookshelf, WebUI included","epub_to_audiobook 是一款将 EPUB 电子书一键转换为有声书的开源工具，专为 Audiobookshelf 等本地听书平台优化。它解决了用户无法直接将静态电子书转化为高质量音频、难以在家庭媒体库中便捷管理章节的痛点，让阅读体验从“看”延伸至“听”。\n\n这款工具非常适合希望建立个人有声图书馆的普通读者、Audiobookshelf 用户，以及喜欢折腾命令行或 Docker 的技术爱好者。无需复杂的音频编辑技能，只需简单配置即可生成带章节元数据的 MP3 文件，导入后能自动识别章节标题，极大提升了导航与收听体验。\n\n其技术亮点在于灵活支持多种语音合成引擎：既可选用微软 Azure 和 OpenAI 的高自然度商用 API，也能使用完全免费的 EdgeTTS，甚至支持本地部署的 Piper 和 Kokoro 模型，兼顾音质、成本与隐私需求。此外，项目最近更新了 Web 图形界面，降低了使用门槛，让非开发者也能轻松上手。无论你想利用碎片时间“听”完经典名著，还是为视障朋友制作无障碍读物，epub_to_audiobook 都是一个高效、自由且可定制的解决方案。","# EPUB to Audiobook Converter [![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1177631634724491385?label=Discord&logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.com\u002Finvite\u002Fpgp2G8zhS7) [![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002Fp0n1\u002Fepub_to_audiobook)\n\n*Join our [Discord](https:\u002F\u002Fdiscord.com\u002Finvite\u002Fpgp2G8zhS7) server for any questions or discussions. 
You can also ask questions about this project on [DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fp0n1\u002Fepub_to_audiobook).*\n\nThis project provides a command-line tool to convert EPUB ebooks into audiobooks. It now supports both the [Microsoft Azure Text-to-Speech API](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fcognitive-services\u002Fspeech-service\u002Frest-text-to-speech) (alternatively [EdgeTTS](https:\u002F\u002Fgithub.com\u002Frany2\u002Fedge-tts)) and the [OpenAI Text-to-Speech API](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fguides\u002Ftext-to-speech) to generate the audio for each chapter in the ebook. The output audio files are optimized for use with [Audiobookshelf](https:\u002F\u002Fgithub.com\u002Fadvplyr\u002Faudiobookshelf).\n\n\u003C!-- *This project was developed with the help of ChatGPT.* -->\n\n## Recent Updates\n\n- 2025-05-23: Added a web interface (WebUI) to the project.\n\n## Audio Sample\n\nIf you're interested in hearing a sample of the audiobook generated by this tool, check the links below. 
\n\n- [Azure TTS Sample](https:\u002F\u002Faudio.com\u002Fpaudi\u002Faudio\u002F0008-chapter-vii-agricultural-experience)\n- [OpenAI TTS Sample](https:\u002F\u002Faudio.com\u002Fpaudi\u002Faudio\u002Fopenai-0008-chapter-vii-agricultural-experience-i-had-now-been-in)\n- Edge TTS Sample: the voice is almost the same as Azure TTS\n- [Piper TTS](https:\u002F\u002Frhasspy.github.io\u002Fpiper-samples\u002F)\n- [Kokoro TTS](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fhexgrad\u002FKokoro-TTS) (usage of this is done through a local OpenAI endpoint)\n\n## Requirements\n\n- Python 3.10+ or ***Docker***\n- For using *Azure TTS*, a Microsoft Azure account with access to the [Microsoft Cognitive Services Speech Services](https:\u002F\u002Fportal.azure.com\u002F#create\u002FMicrosoft.CognitiveServicesSpeechServices) is required.\n- For using *OpenAI TTS*, an OpenAI [API Key](https:\u002F\u002Fplatform.openai.com\u002Fapi-keys) is required.\n  - If you are using Kokoro TTS, you won't need an official OpenAI key, but you will need to put a dummy value in the env for it (e.g. `export OPENAI_API_KEY='fake'`), unless you are using the docker compose file (see below)\n- For using *Edge TTS*, no API Key is required.\n- For using *Piper TTS*, the Piper executable and voice models are required.\n\n## Audiobookshelf Integration\n\nThe audiobooks generated by this project are optimized for use with [Audiobookshelf](https:\u002F\u002Fgithub.com\u002Fadvplyr\u002Faudiobookshelf). Each chapter in the EPUB file is converted into a separate MP3 file, with the chapter title extracted and included as metadata.\n\n![demo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fp0n1_epub_to_audiobook_readme_d82206ceee82.png)\n\n### Chapter Titles\n\nParsing and extracting chapter titles from EPUB files can be challenging, as the format and structure may vary significantly between different ebooks. The script employs a simple but effective method for extracting chapter titles, which works for most EPUB files. 
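The extraction strategy described next (look for a title tag, otherwise fall back to the first few words of the chapter text) can be sketched roughly like this. This is a hypothetical, simplified illustration using only the Python standard library, not the project's actual code:

```python
from html.parser import HTMLParser

class _TitleFinder(HTMLParser):
    """Collect the <title> text and the chapter body text separately."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.body_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        else:
            self.body_parts.append(data)

def extract_chapter_title(chapter_html, fallback_words=5):
    """Return the <title> text, or the first few words of the body as a fallback."""
    finder = _TitleFinder()
    finder.feed(chapter_html)
    title = finder.title.strip()
    if title:
        return title
    # Fallback: first few words of the chapter text
    return " ".join(" ".join(finder.body_parts).split()[:fallback_words])

print(extract_chapter_title("<html><head><title>Chapter VII</title></head><body>text</body></html>"))
# → Chapter VII
```
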
The method involves parsing the EPUB file and looking for the `title` tag in the HTML content of each chapter. If the title tag is not present, a fallback title is generated using the first few words of the chapter text.\n\nPlease note that this approach may not work perfectly for all EPUB files, especially those with complex or unusual formatting. However, in most cases, it provides a reliable way to extract chapter titles for use in Audiobookshelf.\n\nWhen you import the generated MP3 files into Audiobookshelf, the chapter titles will be displayed, making it easy to navigate between chapters and enhancing your listening experience.\n\n## Installation\n\n1. Clone this repository:\n\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook.git\n    cd epub_to_audiobook\n    ```\n\n2. Create a virtual environment and activate it:\n\n    ```bash\n    python3 -m venv venv\n    source venv\u002Fbin\u002Factivate\n    ```\n\n3. Install the required dependencies:\n\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n    Note: Python 3.14 requires the updated dependency set in this repository. Older installs pinned `gradio==5.33.1`, which could force a `pydantic-core` source build and fail with a PyO3 compatibility error during installation.\n\n4. Set the following environment variables with your Azure Text-to-Speech API credentials, or your OpenAI API key if you're using OpenAI TTS:\n\n    ```bash\n    export MS_TTS_KEY=\u003Cyour_subscription_key> # for Azure\n    export MS_TTS_REGION=\u003Cyour_region> # for Azure\n    export OPENAI_API_KEY=\u003Cyour_openai_api_key> # for OpenAI\n    ```\n\n## Web Interface (WebUI)\n\nFor users who prefer a graphical interface, this project includes a web-based UI built with Gradio. 
The WebUI provides an intuitive way to configure all the options and convert your EPUB files without using the command line.\n\n![WebUI Screenshot](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fp0n1_epub_to_audiobook_readme_466d911dc994.png)\n\n### Environment Variables for WebUI\n\nThe WebUI respects the same environment variables as the command-line tool:\n\n```bash\nexport MS_TTS_KEY=\u003Cyour_subscription_key>      # For Azure TTS\nexport MS_TTS_REGION=\u003Cyour_region>             # For Azure TTS\nexport OPENAI_API_KEY=\u003Cyour_openai_api_key>    # For OpenAI TTS\nexport OPENAI_BASE_URL=\u003Ccustom_endpoint>       # Optional: For custom OpenAI-compatible endpoints\n```\n\nMake sure to set the environment variables for the service you are using before starting the WebUI.\n\n### Starting the WebUI\n\nMake sure you have followed the [Installation](#installation) steps before starting the WebUI.\n\nTo launch the web interface, run:\n\n```bash\npython3 main_ui.py\n```\n\nBy default, the WebUI will be available at `http:\u002F\u002F127.0.0.1:7860`. You can customize the host and port:\n\n```bash\npython3 main_ui.py --host 127.0.0.1 --port 8080\n```\n\nRemember to press `Ctrl+C` in the terminal to stop the server if you want to stop it after you are done.\n\n### WebUI Features\n\nThe web interface provides:\n\n- **File Upload**: Drag and drop your EPUB file directly into the browser\n- **TTS Provider Selection**: Easy switching between Azure, OpenAI, Edge, and Piper TTS with provider-specific options\n- **Voice Configuration**: Dropdown menus for selecting languages, voices, and output formats\n- **Advanced Settings**: All command-line options are available through the web interface\n- **Real-time Logs**: View conversion progress and logs directly in the browser\n- **Preview Mode**: Test your settings without generating audio\n- **Search & Replace**: Upload text replacement files for pronunciation fixes\n\n### Using the WebUI\n\n1. 
**Upload your EPUB file** using the file selector\n2. **Choose your TTS provider** from the tabs (OpenAI, Azure, Edge, or Piper)\n3. **Configure provider-specific settings**:\n   - **OpenAI**: Select model, voice, speed, and format\n   - **Azure**: Choose language, voice, format, and break duration\n   - **Edge**: Set language, voice, rate, volume, and pitch\n   - **Piper**: Configure local or Docker deployment with voice options\n4. **Set output directory** or use the default timestamped folder\n5. **Adjust advanced options** if needed (chapter range, text processing, etc.)\n6. **Click Start** to begin conversion\n7. **Monitor progress** through the integrated log viewer\n\nYou can select a few chapters to preview the audio before starting the full conversion.\n\n### Docker with WebUI (The Easiest Way If You Are Familiar With Docker)\n\nYou can also run the WebUI using Docker. Use the provided `docker-compose.webui.yml` file. Make sure to edit the file with your API keys for your TTS provider.\n\n```bash\n# Edit docker-compose.webui.yml with your API keys\ndocker compose -f docker-compose.webui.yml up\n```\n\nThe WebUI will be accessible at `http:\u002F\u002Flocalhost:7860` or `http:\u002F\u002F127.0.0.1:7860`.\n\n### Security Considerations of WebUI\n\nThe WebUI is a web application that runs on your local machine. It's currently not designed to be accessible from the open internet. There is no authorization mechanism in place. 
So you should not expose it to the open internet otherwise it would lead to unauthorized access to your TTS providers.\n\n## Usage\n\nTo convert an EPUB ebook to an audiobook, run the following command, specifying the TTS provider of your choice with the `--tts` option:\n\n```bash\npython3 main.py \u003Cinput_file> \u003Coutput_folder> [options]\n```\n\nTo check the latest option descriptions for this script, you can run the following command in the terminal:\n\n```bash\npython3 main.py -h\n```\n\n```bash\nusage: main.py [-h] [--tts {azure,openai,edge,piper}]\n               [--log {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--preview]\n               [--no_prompt] [--language LANGUAGE]\n               [--newline_mode {single,double,none}]\n               [--title_mode {auto,tag_text,first_few}]\n               [--chapter_start CHAPTER_START] [--chapter_end CHAPTER_END]\n               [--output_text] [--remove_endnotes]\n               [--search_and_replace_file SEARCH_AND_REPLACE_FILE]\n               [--worker_count WORKER_COUNT]\n               [--voice_name VOICE_NAME] [--output_format OUTPUT_FORMAT]\n               [--model_name MODEL_NAME] [--voice_rate VOICE_RATE]\n               [--voice_volume VOICE_VOLUME] [--voice_pitch VOICE_PITCH]\n               [--proxy PROXY] [--break_duration BREAK_DURATION]\n               [--piper_path PIPER_PATH] [--piper_speaker PIPER_SPEAKER]\n               [--piper_sentence_silence PIPER_SENTENCE_SILENCE]\n               [--piper_length_scale PIPER_LENGTH_SCALE]\n               input_file output_folder\n\nConvert text book to audiobook\n\npositional arguments:\n  input_file            Path to the EPUB file\n  output_folder         Path to the output folder\n\noptions:\n  -h, --help            show this help message and exit\n  --tts {azure,openai,edge,piper}\n                        Choose TTS provider (default: azure). azure: Azure\n                        Cognitive Services, openai: OpenAI TTS API. 
When using\n                        azure, environment variables MS_TTS_KEY and\n                        MS_TTS_REGION must be set. When using openai,\n                        environment variable OPENAI_API_KEY must be set.\n  --log {DEBUG,INFO,WARNING,ERROR,CRITICAL}\n                        Log level (default: INFO), can be DEBUG, INFO,\n                        WARNING, ERROR, CRITICAL\n  --preview             Enable preview mode. In preview mode, the script will\n                        not convert the text to speech. Instead, it will print\n                        the chapter index, titles, and character counts.\n  --no_prompt           Don't ask the user if they wish to continue after\n                        estimating the cloud cost for TTS. Useful for\n                        scripting.\n  --language LANGUAGE   Language for the text-to-speech service (default: en-\n                        US). For Azure TTS (--tts=azure), check\n                        https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fai-\n                        services\u002Fspeech-service\u002Flanguage-\n                        support?tabs=tts#text-to-speech for supported\n                        languages. For OpenAI TTS (--tts=openai), their API\n                        detects the language automatically. But setting this\n                        will also help on splitting the text into chunks with\n                        different strategies in this tool, especially for\n                        Chinese characters. For Chinese books, use zh-CN, zh-\n                        TW, or zh-HK.\n  --newline_mode {single,double,none}\n                        Choose the mode of detecting new paragraphs: 'single',\n                        'double', or 'none'. 'single' means a single newline\n                        character, while 'double' means two consecutive\n                        newline characters. 
'none' means all newline\n                        characters will be replace with blank so paragraphs\n                        will not be detected. (default: double, works for most\n                        ebooks but will detect less paragraphs for some\n                        ebooks)\n  --title_mode {auto,tag_text,first_few}\n                        Choose the parse mode for chapter title, 'tag_text'\n                        search 'title','h1','h2','h3' tag for title,\n                        'first_few' set first 60 characters as title, 'auto'\n                        auto apply the best mode for current chapter.\n  --chapter_start CHAPTER_START\n                        Chapter start index (default: 1, starting from 1)\n  --chapter_end CHAPTER_END\n                        Chapter end index (default: -1, meaning to the last\n                        chapter)\n  --output_text         Enable Output Text. This will export a plain text file\n                        for each chapter specified and write the files to the\n                        output folder specified.\n  --remove_endnotes     This will remove endnote numbers from the end or\n                        middle of sentences. This is useful for academic\n                        books.\n  --search_and_replace_file SEARCH_AND_REPLACE_FILE\n                        Path to a file that contains 1 regex replace per line,\n                        to help with fixing pronunciations, etc. The format\n                        is: \u003Csearch>==\u003Creplace> Note that you may have to\n                        specify word boundaries, to avoid replacing parts of\n                        words.\n  --worker_count WORKER_COUNT\n                        Specifies the number of parallel workers to use for \n                        audiobook generation. Increasing this value can \n                        significantly speed up the process by processing \n                        multiple chapters simultaneously. 
Note: Chapters may \n                        not be processed in sequential order, but this will \n                        not affect the final audiobook.\n\n  --voice_name VOICE_NAME\n                        Various TTS providers has different voice names, look\n                        up for your provider settings.\n  --output_format OUTPUT_FORMAT\n                        Output format for the text-to-speech service.\n                        Supported format depends on selected TTS provider\n  --model_name MODEL_NAME\n                        Various TTS providers has different neural model names\n\nopenai specific:\n  --speed SPEED         The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.\n  --instructions INSTRUCTIONS\n                        Instructions for the TTS model. Only supported for 'gpt-4o-mini-tts' model.\n\nedge specific:\n  --voice_rate VOICE_RATE\n                        Speaking rate of the text. Valid relative values range\n                        from -50%(--xxx='-50%') to +100%. For negative value\n                        use format --arg=value,\n  --voice_volume VOICE_VOLUME\n                        Volume level of the speaking voice. Valid relative\n                        values floor to -100%. For negative value use format\n                        --arg=value,\n  --voice_pitch VOICE_PITCH\n                        Baseline pitch for the text.Valid relative values like\n                        -80Hz,+50Hz, pitch changes should be within 0.5 to 1.5\n                        times the original audio. For negative value use\n                        format --arg=value,\n  --proxy PROXY         Proxy server for the TTS provider. 
Format:\n                        http:\u002F\u002F[username:password@]proxy.server:port\n\nazure\u002Fedge specific:\n  --break_duration BREAK_DURATION\n                        Break duration in milliseconds for the different\n                        paragraphs or sections (default: 1250, means 1.25 s).\n                        Valid values range from 0 to 5000 milliseconds for\n                        Azure TTS.\n\npiper specific:\n  --piper_path PIPER_PATH\n                        Path to the Piper TTS executable\n  --piper_speaker PIPER_SPEAKER\n                        Piper speaker id, used for multi-speaker models\n  --piper_sentence_silence PIPER_SENTENCE_SILENCE\n                        Seconds of silence after each sentence\n  --piper_length_scale PIPER_LENGTH_SCALE\n                        Phoneme length, a.k.a. speaking rate\n```  \n\n**Example**:\n\n```bash\npython3 main.py examples\u002FThe_Life_and_Adventures_of_Robinson_Crusoe.epub output_folder\n```\n\nExecuting the above command will generate a directory named `output_folder` and save the MP3 files for each chapter inside it using default TTS provider and voice. Once generated, you can import these audio files into [Audiobookshelf](https:\u002F\u002Fgithub.com\u002Fadvplyr\u002Faudiobookshelf) or play them with any audio player of your choice.\n\n## Preview Mode\n\nBefore converting your epub file to an audiobook, you can use the `--preview` option to get a summary of each chapter. This will provide you with the character count of each chapter and the total count, instead of converting the text to speech.\n\n**Example**:\n\n```bash\npython3 main.py examples\u002FThe_Life_and_Adventures_of_Robinson_Crusoe.epub output_folder --preview\n```\n\n## Search & Replace\n\nYou may want to search and replace text, either to expand abbreviations, or to help with pronunciation. 
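Conceptually, each rule is just a regex substitution applied to the text before it is sent to TTS. A minimal hypothetical sketch of parsing and applying `search==replace` rules (not the project's actual implementation):

```python
import re

def load_rules(lines):
    """Parse 'search==replace' rules, one per line; '#' lines are comments."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        search, _, replace = line.partition("==")
        rules.append((re.compile(search), replace))
    return rules

def apply_rules(text, rules):
    """Apply every regex substitution, in file order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

rules = load_rules([r"N\.E\.==north east", "Barbadoes==Barbayduss"])
print(apply_rules("We sailed N.E. from Barbadoes.", rules))
# → We sailed north east from Barbayduss.
```
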
You can do this by specifying a search and replace file, which contains a single regex search and replace per line, separated by '==':\n\n**Example**:\n\n**search.conf**:\n\n```text\n# this is the general structure\n\u003Csearch>==\u003Creplace>\n# this is a comment\n# fix cardinal direction abbreviations\nN\\.E\\.==north east\n# be careful with your regexes, as this would also match Sally N. Smith\nN\\.==north\n# pronounce Barbadoes like the locals\nBarbadoes==Barbayduss\n```\n\n```bash\npython3 main.py examples\u002FThe_Life_and_Adventures_of_Robinson_Crusoe.epub output_folder --search_and_replace_file search.conf\n```\n\n## Using with Docker\n\nThis tool is available as a Docker image, making it easy to run without needing to manage Python dependencies.\n\nFirst, make sure you have Docker installed on your system.\n\nYou can pull the Docker image from the GitHub Container Registry:\n\n```bash\ndocker pull ghcr.io\u002Fp0n1\u002Fepub_to_audiobook:latest\n```\n\nThen, you can run the tool with the following command:\n\n```bash\ndocker run -i -t --rm -v .\u002F:\u002Fapp -e MS_TTS_KEY=$MS_TTS_KEY -e MS_TTS_REGION=$MS_TTS_REGION ghcr.io\u002Fp0n1\u002Fepub_to_audiobook your_book.epub audiobook_output --tts azure\n```\n\nFor OpenAI, you can run:\n\n```bash\ndocker run -i -t --rm -v .\u002F:\u002Fapp -e OPENAI_API_KEY=$OPENAI_API_KEY ghcr.io\u002Fp0n1\u002Fepub_to_audiobook your_book.epub audiobook_output --tts openai\n```\n\nReplace `$MS_TTS_KEY` and `$MS_TTS_REGION` with your Azure Text-to-Speech API credentials. Replace `$OPENAI_API_KEY` with your OpenAI API key. 
Replace `your_book.epub` with the name of the input EPUB file, and `audiobook_output` with the name of the directory where you want to save the output files.\n\nThe `-v .\u002F:\u002Fapp` option mounts the current directory (`.`) to the `\u002Fapp` directory in the Docker container. This allows the tool to read the input file and write the output files to your local file system.\n\nThe `-i` and `-t` options are required to enable interactive mode and allocate a pseudo-TTY.\n\n**You can also check [this example config file](.\u002Fdocker-compose.example.yml) for docker compose usage.**\n\n## User-Friendly Guide for Windows Users\n\nFor Windows users, especially if you're not very familiar with command-line tools, we've got you covered. We understand the challenges and have created a guide specifically tailored for you.\n\nCheck this [step by step guide](https:\u002F\u002Fgist.github.com\u002Fp0n1\u002Fcba98859cdb6331cc1aab835d62e4fba) and leave a message if you encounter issues.\n\n## How to Get Your Azure Cognitive Service Key?\n\n- Azure subscription - [Create one for free](https:\u002F\u002Fazure.microsoft.com\u002Ffree\u002Fcognitive-services)\n- [Create a Speech resource](https:\u002F\u002Fportal.azure.com\u002F#create\u002FMicrosoft.CognitiveServicesSpeechServices) in the Azure portal.\n- Get the Speech resource key and region. After your Speech resource is deployed, select **Go to resource** to view and manage keys. 
For more information about Cognitive Services resources, see [Get the keys for your resource](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fcognitive-services\u002Fcognitive-services-apis-create-account#get-the-keys-for-your-resource).\n\n*Source: \u003Chttps:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fcognitive-services\u002Fspeech-service\u002Fget-started-text-to-speech#prerequisites>*\n\n## How to Get Your OpenAI API Key?\n\nCheck https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fquickstart\u002Faccount-setup. Make sure you check the [price](https:\u002F\u002Fopenai.com\u002Fpricing) details before use.\n\n## ✨ About Edge TTS\n\nEdge TTS and Azure TTS are almost the same; the difference is that Edge TTS doesn't require an API key because it's based on the Edge browser's read-aloud functionality, and some parameters, like [custom ssml](https:\u002F\u002Fgithub.com\u002Frany2\u002Fedge-tts#custom-ssml), are a bit restricted.\n\nCheck https:\u002F\u002Fgist.github.com\u002FBettyJJ\u002F17cbaa1de96235a7f5773b8690a20462 for supported voices.\n\n**If you want to try this project quickly, Edge TTS is highly recommended.**\n\n## Customization of Voice and Language\n\nYou can customize the voice and language used for the Text-to-Speech conversion by passing the `--voice_name` and `--language` options when running the script.\n\nMicrosoft Azure offers a range of voices and languages for the Text-to-Speech service. 
For a list of available options, consult the [Microsoft Azure Text-to-Speech documentation](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fcognitive-services\u002Fspeech-service\u002Flanguage-support?tabs=tts#text-to-speech).\n\nYou can also listen to samples of the available voices in the [Azure TTS Voice Gallery](https:\u002F\u002Faka.ms\u002Fspeechstudio\u002Fvoicegallery) to help you choose the best voice for your audiobook.\n\nFor example, if you want to use a British English female voice for the conversion, you can use the following command:\n\n```bash\npython3 main.py \u003Cinput_file> \u003Coutput_folder> --voice_name en-GB-LibbyNeural --language en-GB\n```\n\nFor OpenAI TTS, you can specify the model, voice, and format options using `--model_name`, `--voice_name`, and `--output_format`, respectively.\n\n## More examples\n\nHere are some examples that demonstrate various option combinations:\n\n### Examples Using Azure TTS\n\n1. **Basic conversion using Azure with default settings**  \n   This command will convert an EPUB file to an audiobook using Azure's default TTS settings.\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts azure\n   ```\n\n2. **Azure conversion with custom language, voice and logging level**  \n   Converts an EPUB file to an audiobook with a specified voice and a custom log level for debugging purposes.\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts azure --language zh-CN --voice_name \"zh-CN-YunyeNeural\" --log DEBUG\n   ```\n\n3. 
**Azure conversion with chapter range and break duration**  \n   Converts a specified range of chapters from an EPUB file to an audiobook with custom break duration between paragraphs.\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts azure --chapter_start 5 --chapter_end 10 --break_duration \"1500\"\n   ```\n\n### Examples Using OpenAI TTS\n\n1. **Basic conversion using OpenAI with default settings**  \n   This command will convert an EPUB file to an audiobook using OpenAI's default TTS settings.\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts openai\n   ```\n\n2. **OpenAI conversion with HD model and specific voice**  \n   Converts an EPUB file to an audiobook using the high-definition OpenAI model and a specific voice choice.\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts openai --model_name \"tts-1-hd\" --voice_name \"fable\"\n   ```\n\n3. **OpenAI conversion with preview and text output**  \n   Enables preview mode and text output, which will display the chapter index and titles instead of converting them and will also export the text.\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts openai --preview --output_text\n   ```\n\n## Example using an OpenAI-compatible service\n\nIt is possible to use an OpenAI-compatible service, like [matatonic\u002Fopenedai-speech](https:\u002F\u002Fgithub.com\u002Fmatatonic\u002Fopenedai-speech). In that case, it **is required** to set the `OPENAI_BASE_URL` environment variable, otherwise it would just default to the standard OpenAI service. 
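Under the hood, `OPENAI_BASE_URL` simply changes where the OpenAI-style speech request is sent. As a hypothetical stdlib-only sketch of what such a request looks like (the endpoint path and payload shape follow the OpenAI `\u002Faudio\u002Fspeech` convention; this is not the project's code, and the local URL and `skippy` voice are placeholder assumptions):

```python
import json
import os
import urllib.request

def build_speech_request(text, voice, model="tts-1", base_url=None, api_key=None):
    """Build a POST request for an OpenAI-compatible /audio/speech endpoint."""
    base_url = (base_url or os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")).rstrip("/")
    api_key = api_key or os.environ.get("OPENAI_API_KEY", "nope")
    body = json.dumps({"model": model, "voice": voice, "input": text}).encode()
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=body,
        method="POST",
        headers={
            # Compatible servers usually ignore the key, but the header must exist
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_speech_request("Hello!", voice="skippy",
                           base_url="http://127.0.0.1:8000/v1", api_key="nope")
print(req.full_url)
# → http://127.0.0.1:8000/v1/audio/speech
```
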
While the compatible service might not require an API key, the OpenAI client still does, so make sure to set it to something nonsensical.\n\nIf your OpenAI-compatible service is running on `http:\u002F\u002F127.0.0.1:8000` and you have added a custom voice named `skippy`, you can use the following command:\n\n```shell\ndocker run -i -t --rm -v .\u002F:\u002Fapp -e OPENAI_BASE_URL=http:\u002F\u002F127.0.0.1:8000\u002Fv1 -e OPENAI_API_KEY=nope ghcr.io\u002Fp0n1\u002Fepub_to_audiobook your_book.epub audiobook_output --tts openai --voice_name=skippy --model_name=tts-1-hd\n```\n\nScroll down to the Kokoro TTS example below to see a more specific example of this.\n\n### Examples Using Edge TTS\n\n1. **Basic conversion using Edge with default settings**  \n   This command will convert an EPUB file to an audiobook using Edge's default TTS settings.\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts edge\n   ```\n\n2. **Edge conversion with custom language, voice and logging level**\n   Converts an EPUB file to an audiobook with a specified voice and a custom log level for debugging purposes.\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts edge --language zh-CN --voice_name \"zh-CN-YunxiNeural\" --log DEBUG\n   ```\n\n3. **Edge conversion with chapter range and break duration**\n   Converts a specified range of chapters from an EPUB file to an audiobook with custom break duration between paragraphs.\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts edge --chapter_start 5 --chapter_end 10 --break_duration \"1500\"\n   ```\n\n### Examples Using Piper TTS\n\n*Make sure you have installed Piper TTS and have an onnx model file and corresponding config file. Check [Piper TTS](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper) for more details. 
You can follow their instructions to install Piper TTS, download the models and config files, play with it and then come back to try the examples below.*\n\nThis command will convert an EPUB file to an audiobook using Piper TTS with the bare minimum parameters.\nYou always need to specify an onnx model file and the `piper` executable needs to be in the current $PATH. \n\n```sh\npython3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts piper --model_name \u003Cpath_to>\u002Fen_US-libritts_r-medium.onnx\n```\n\nYou can specify your custom path to the piper executable by using the `--piper_path` parameter.\n\n```sh\npython3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts piper --model_name \u003Cpath_to>\u002Fen_US-libritts_r-medium.onnx --piper_path \u003Cpath_to>\u002Fpiper\n```\n\nSome models support multiple voices, which can be selected by using the `--piper_speaker` parameter.\n\n```sh\npython3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts piper --model_name \u003Cpath_to>\u002Fen_US-libritts_r-medium.onnx --piper_speaker 256\n```\n\nYou can also specify speed (`--piper_length_scale`) and pause duration (`--piper_sentence_silence`).\n\n```sh\npython3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts piper --model_name \u003Cpath_to>\u002Fen_US-libritts_r-medium.onnx --piper_speaker 256 --piper_length_scale 1.5 --piper_sentence_silence 0.5\n```\n\nPiper TTS outputs `wav` format files (or raw) by default, but you should be able to specify any reasonable format via the `--output_format` parameter. 
The `opus` and `mp3` are good choices for size and compatibility.\n\n```sh\npython3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts piper --model_name \u003Cpath_to>\u002Fen_US-libritts_r-medium.onnx --piper_speaker 256 --piper_length_scale 1.5 --piper_sentence_silence 0.5 --output_format opus\n```\n\n*Alternatively, you can use the following procedure to use piper in a docker container, which simplifies the process of running everything locally.*\n\n1. Ensure you have docker desktop installed on your system. See [Docker](https:\u002F\u002Fwww.docker.com\u002F) to install (or use the [homebrew](https:\u002F\u002Fformulae.brew.sh\u002Fformula\u002Fdocker) formula).\n2. Download a Piper model & config file (see the [piper repo](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper) for details) and place them in the [piper_models](.\u002Fpiper_models\u002F) directory at the top level of this project.\n3. Edit the [docker compose file](.\u002Fdocker-compose.piper-example.yml) to:\n   - In the `piper` container, set the `PIPER_VOICE` environment variable to the name of the model file you downloaded.\n   - In the `piper` container, map the `volumes` to the location of the piper models on your system (if you used the provided directory described in step 2, you can leave this as is).\n   - In the `epub_to_audiobook` container, update the `volumes` mapping from `\u003Cpath\u002Fto\u002Fepub\u002Fdir\u002Fon\u002Fhost>` to the actual path to the epub on your host machine.\n4. From the root of the repo, run `PATH_TO_EPUB_FILE=.\u002FYour_epub_file.epub OUTPUT_DIR=$(pwd)\u002Fpath\u002Fto\u002Faudiobook_output docker compose -f docker-compose.piper-example.yml up --build`, **replacing the placeholder values and output dirs with your desired epub source and audio output respectively**.  (Leave in the $(pwd) !)  Note that the current config in the docker compose will automatically start the process, entirely in the container. 
If you want to run the main Python process outside the container, you can uncomment the command `command: tail -f \u002Fdev\u002Fnull`, and use `docker exec -it epub_to_audiobook \u002Fbin\u002Fbash` to connect to the container and run the Python script manually (see comments in the [docker compose file](.\u002Fdocker-compose.piper-example.yml) for more details).\n\n### Examples using Kokoro TTS\n\nThe documented usage of Kokoro TTS with this script uses a Docker image that exposes OpenAI-compatible endpoints. However, since it's a \"self-hosted\" service, you won't need to get an actual key. This requires Docker, so follow the Docker installation and setup instructions in the Piper section above if you don't have Docker on your machine already.\n\nIn one terminal tab, run either\n\n```bash\ndocker run -p 8880:8880 ghcr.io\u002Fremsky\u002Fkokoro-fastapi-cpu\n```\n\nOr, if you have a GPU that can help with processing, run\n\n```bash\ndocker run --gpus all -p 8880:8880 ghcr.io\u002Fremsky\u002Fkokoro-fastapi-gpu\n```\n\nThen in another tab, run\n\n```bash\nexport OPENAI_BASE_URL=http:\u002F\u002Flocalhost:8880\u002Fv1\nexport OPENAI_API_KEY=\"fake\"\npython main.py path\u002Fto\u002Fepub output-dir --tts openai --voice_name \"af_bella(3)+af_alloy(1)\" --model_name \"tts-1\" # you can replace this with any other voice name. Link below. 
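# Optional sanity check (an addition, not part of the project's documented
# workflow): warn if the endpoint variable did not get exported, since main.py
# would otherwise default to the standard OpenAI service instead of the local
# Kokoro server.
if [ -z "${OPENAI_BASE_URL:-}" ]; then echo "warning: OPENAI_BASE_URL is unset"; fi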
\n```\nNote that passing the `--model_name tts-1` parameter **is required**, since Kokoro breaks with the current default `model_name` value.\n\nAlternatively, you can do the entire setup through docker compose using the [docker compose file set up for Kokoro](.\u002Fdocker-compose.kokoro-example.yml).\n\nTo do so, open the file with your favorite editor and then:\n\n- From the root of the repo, run `PATH_TO_EPUB_FILE=.\u002FYour_epub_file.epub OUTPUT_DIR=$(pwd)\u002Fpath\u002Fto\u002Faudiobook_output VOICE_NAME=Your_desired_voice docker compose -f docker-compose.kokoro-example.yml up --build`, **replacing the placeholder values and output dirs with your desired epub source and audio output respectively, and your voice name**. \n  - A list of voices can be found [here](https:\u002F\u002Fhuggingface.co\u002Fhexgrad\u002FKokoro-82M\u002Fblob\u002Fmain\u002FVOICES.md), and you can sample what they sound like [here](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fhexgrad\u002FKokoro-TTS).\n- Note that the current config in the docker compose will automatically start the process, entirely in the container. If you want to run the main Python process outside the container, you can uncomment the command `command: tail -f \u002Fdev\u002Fnull`, and use `docker exec -it epub_to_audiobook \u002Fbin\u002Fbash` to connect to the container and run the Python script manually (see comments in the [docker compose file](.\u002Fdocker-compose.kokoro-example.yml) for more details).\n\nFor more information on the image used for Kokoro TTS, visit this [repo](https:\u002F\u002Fgithub.com\u002Fremsky\u002FKokoro-FastAPI).\n\n## Troubleshooting\n\n### ModuleNotFoundError: No module named 'importlib_metadata'\n\nThis may be because the Python version you are using is [less than 3.8](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F73165636\u002Fno-module-named-importlib-metadata). 
You can try to manually install it with `pip3 install importlib-metadata`, or use a higher Python version.\n\n### FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'\n\nMake sure the ffmpeg binary is accessible from your PATH. If you are on a Mac and use Homebrew, run `brew install ffmpeg`. On Ubuntu, run `sudo apt install ffmpeg`.\n\n### Piper TTS\n\nFor installation-related issues, please refer to the [Piper TTS](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper) repository. It's important to note that if you're installing `piper-tts` via pip, [only Python 3.10](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper\u002Fissues\u002F509) is currently supported. Mac users may encounter additional challenges when using the downloaded [binary](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper\u002Fissues\u002F523). For more information on Mac-specific issues, please check [this issue](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper\u002Fissues\u002F395) and [this pull request](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper\u002Fpull\u002F412).\n\nAlso check [this](https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fissues\u002F85) if you're having trouble with Piper TTS.\n\n## Related Projects\n\n- [Epub to Audiobook (M4B)](https:\u002F\u002Fgithub.com\u002Fduplaja\u002Fepub-to-audiobook-hf): EPUB to M4B audiobook, with StyleTTS2 via the HuggingFace Spaces API.\n- [Storyteller](https:\u002F\u002Fstoryteller-platform.gitlab.io\u002Fstoryteller\u002F): A self-hosted platform for automatically syncing ebooks and audiobooks.\n\n## License\n\nThis project is licensed under the MIT License. 
See the [LICENSE](LICENSE) file for details.\n","# EPUB 转有声书转换器 [![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1177631634724491385?label=Discord&logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.com\u002Finvite\u002Fpgp2G8zhS7) [![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002Fp0n1\u002Fepub_to_audiobook)\n\n*如有任何问题或讨论，请加入我们的 [Discord](https:\u002F\u002Fdiscord.com\u002Finvite\u002Fpgp2G8zhS7) 服务器。您也可以在 [DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fp0n1\u002Fepub_to_audiobook) 上提问关于该项目的问题。*\n\n本项目提供了一个命令行工具，用于将 EPUB 电子书转换为有声书。它现在支持 [Microsoft Azure 文本转语音 API](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fcognitive-services\u002Fspeech-service\u002Frest-text-to-speech)（或者 [EdgeTTS](https:\u002F\u002Fgithub.com\u002Frany2\u002Fedge-tts)）以及 [OpenAI 文本转语音 API](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fguides\u002Ftext-to-speech)，以生成电子书中每一章的音频。输出的音频文件经过优化，可与 [Audiobookshelf](https:\u002F\u002Fgithub.com\u002Fadvplyr\u002Faudiobookshelf) 配合使用。\n\n\u003C!-- *本项目是在 ChatGPT 的帮助下开发的。* -->\n\n## 最新更新\n\n- 2025-05-23：为项目添加了网页界面 (WebUI)。\n\n## 音频示例\n\n如果您有兴趣试听由该工具生成的有声书样本，请查看下方链接。\n\n- [Azure TTS 示例](https:\u002F\u002Faudio.com\u002Fpaudi\u002Faudio\u002F0008-chapter-vii-agricultural-experience)\n- [OpenAI TTS 示例](https:\u002F\u002Faudio.com\u002Fpaudi\u002Faudio\u002Fopenai-0008-chapter-vii-agricultural-experience-i-had-now-been-in)\n- Edge TTS 示例：语音几乎与 Azure TTS 相同\n- [Piper TTS](https:\u002F\u002Frhasspy.github.io\u002Fpiper-samples\u002F)\n- [Kokoro TTS](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fhexgrad\u002FKokoro-TTS)（使用时需通过本地 OpenAI 端点）\n\n## 系统要求\n\n- Python 3.10+ 或 ***Docker***\n- 使用 *Azure TTS* 时，需要拥有 Microsoft Azure 帐户，并具备访问 [Microsoft Cognitive Services Speech Services](https:\u002F\u002Fportal.azure.com\u002F#create\u002FMicrosoft.CognitiveServicesSpeechServices) 的权限。\n- 使用 *OpenAI TTS* 时，需要 OpenAI 的 [API 
密钥](https:\u002F\u002Fplatform.openai.com\u002Fapi-keys)。\n  - 如果您使用 Kokoro TTS，则不需要官方的 OpenAI 密钥，但需要在环境变量中设置一个占位值（例如 `export OPENAI_API_KEY='fake'`），除非您使用下面提供的 docker-compose 文件。\n- 使用 *Edge TTS* 时，无需 API 密钥。\n- 使用 *Piper TTS* 时，需要 Piper 可执行文件及模型。\n\n## Audiobookshelf 集成\n\n本项目生成的有声书已针对 [Audiobookshelf](https:\u002F\u002Fgithub.com\u002Fadvplyr\u002Faudiobookshelf) 进行优化。EPUB 文件中的每一章都会被转换为单独的 MP3 文件，并提取章节标题作为元数据嵌入其中。\n\n![demo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fp0n1_epub_to_audiobook_readme_d82206ceee82.png)\n\n### 章节标题\n\n从 EPUB 文件中解析和提取章节标题可能具有挑战性，因为不同电子书的格式和结构可能存在很大差异。脚本采用了一种简单而有效的方法来提取章节标题，适用于大多数 EPUB 文件。该方法会解析 EPUB 文件，并在每个章节的 HTML 内容中查找 `title` 标签。如果未找到 `title` 标签，则会根据章节文本的前几句话生成一个备用标题。\n\n请注意，这种方法可能无法完美适用于所有 EPUB 文件，尤其是那些格式复杂或不寻常的文件。然而，在大多数情况下，它提供了一种可靠的方式来提取章节标题，以便在 Audiobookshelf 中使用。\n\n当您将生成的 MP3 文件导入 Audiobookshelf 时，章节标题将会显示出来，方便您在各章节之间导航，从而提升您的收听体验。\n\n## 安装\n\n1. 克隆此仓库：\n\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook.git\n    cd epub_to_audiobook\n    ```\n\n2. 创建虚拟环境并激活：\n\n    ```bash\n    python3 -m venv venv\n    source venv\u002Fbin\u002Factivate\n    ```\n\n3. 安装所需的依赖项：\n\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n    注意：Python 3.14 需要使用本仓库中更新的依赖包。较旧的安装版本固定了 `gradio==5.33.1`，这可能会强制进行 `pydantic-core` 的源码构建，并在安装过程中因 PyO3 兼容性错误而失败。\n\n4. 
设置以下环境变量，填入您的 Azure 文本转语音 API 凭证，或者如果您使用 OpenAI TTS，则填写您的 OpenAI API 密钥：\n\n    ```bash\n    export MS_TTS_KEY=\u003Cyour_subscription_key> # 用于 Azure\n    export MS_TTS_REGION=\u003Cyour_region> # 用于 Azure\n    export OPENAI_API_KEY=\u003Cyour_openai_api_key> # 用于 OpenAI\n    ```\n\n## 网页界面 (WebUI)\n\n对于喜欢图形化界面的用户，本项目包含一个基于 Gradio 构建的网页 UI。WebUI 提供了一种直观的方式来配置所有选项，并在无需使用命令行的情况下转换您的 EPUB 文件。\n\n![WebUI 截图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fp0n1_epub_to_audiobook_readme_466d911dc994.png)\n\n### WebUI 的环境变量\n\nWebUI 尊重与命令行工具相同的环境变量：\n\n```bash\nexport MS_TTS_KEY=\u003Cyour_subscription_key>      # 用于 Azure TTS\nexport MS_TTS_REGION=\u003Cyour_region>             # 用于 Azure TTS\nexport OPENAI_API_KEY=\u003Cyour_openai_api_key>    # 用于 OpenAI TTS\nexport OPENAI_BASE_URL=\u003Ccustom_endpoint>       # 可选：用于自定义的 OpenAI 兼容端点\n```\n\n请确保在启动 WebUI 之前，已为您使用的服务正确设置环境变量。\n\n### 启动 WebUI\n\n请务必在启动 WebUI 之前完成 [安装](#installation) 步骤。\n\n要启动网页界面，请运行：\n\n```bash\npython3 main_ui.py\n```\n\n默认情况下，WebUI 将在 `http:\u002F\u002F127.0.0.1:7860` 上可用。您可以自定义主机和端口：\n\n```bash\npython3 main_ui.py --host 127.0.0.1 --port 8080\n```\n\n完成后，记得在终端中按 `Ctrl+C` 来停止服务器。\n\n### WebUI 功能\n\n网页界面提供：\n\n- **文件上传**：直接将 EPUB 文件拖放到浏览器中\n- **TTS 提供商选择**：轻松切换 Azure、OpenAI、Edge 和 Piper TTS，并提供特定于提供商的选项\n- **语音配置**：下拉菜单可用于选择语言、声音和输出格式\n- **高级设置**：所有命令行选项均可通过网页界面访问\n- **实时日志**：可在浏览器中直接查看转换进度和日志\n- **预览模式**：无需生成音频即可测试您的设置\n- **搜索与替换**：上传文本替换文件以修正发音\n\n### 使用 WebUI\n\n1. 使用文件选择器 **上传你的 EPUB 文件**  \n2. 从选项卡中 **选择你的 TTS 提供商**（OpenAI、Azure、Edge 或 Piper）  \n3. **配置提供商特定的设置**：  \n   - **OpenAI**：选择模型、语音、语速和格式  \n   - **Azure**：选择语言、语音、格式和停顿时长  \n   - **Edge**：设置语言、语音、速率、音量和音高  \n   - **Piper**：配置本地或 Docker 部署，并选择语音选项  \n4. 设置 **输出目录**，或使用默认的时间戳文件夹  \n5. 如有需要，**调整高级选项**（章节范围、文本处理等）  \n6. 点击 **开始** 以启动转换  \n7. 
通过集成的日志查看器 **监控进度**\n\n在开始完整转换之前，你可以选择几个章节来预览音频。\n\n### 使用 Docker 运行 WebUI（如果你熟悉 Docker，这是最简单的方式）\n\n你也可以使用 Docker 来运行 WebUI。请使用提供的 `docker-compose.webui.yml` 文件，并确保用你的 TTS 提供商的 API 密钥编辑该文件。\n\n```bash\n# 使用你的 API 密钥编辑 docker-compose.webui.yml 文件\ndocker compose -f docker-compose.webui.yml up\n```\n\nWebUI 将可以通过 `http:\u002F\u002Flocalhost:7860` 或 `http:\u002F\u002F127.0.0.1:7860` 访问。\n\n### WebUI 的安全注意事项\n\nWebUI 是一个在你的本地机器上运行的 Web 应用程序。目前它并未设计为可从公共互联网访问。系统中没有授权机制。因此，你不应将其暴露到公共互联网上，否则可能导致对你的 TTS 提供商的未授权访问。\n\n## 使用方法\n\n要将 EPUB 电子书转换为有声书，请运行以下命令，并使用 `--tts` 选项指定你选择的 TTS 提供商：\n\n```bash\npython3 main.py \u003C输入文件> \u003C输出文件夹> [选项]\n```\n\n要查看此脚本的最新选项说明，可以在终端中运行以下命令：\n\n```bash\npython3 main.py -h\n```\n\n```bash\n用法：main.py [-h] [--tts {azure,openai,edge,piper}]\n               [--log {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--preview]\n               [--no_prompt] [--language LANGUAGE]\n               [--newline_mode {single,double,none}]\n               [--title_mode {auto,tag_text,first_few}]\n               [--chapter_start CHAPTER_START] [--chapter_end CHAPTER_END]\n               [--output_text] [--remove_endnotes]\n               [--search_and_replace_file SEARCH_AND_REPLACE_FILE]\n               [--worker_count WORKER_COUNT]\n               [--voice_name VOICE_NAME] [--output_format OUTPUT_FORMAT]\n               [--model_name MODEL_NAME] [--voice_rate VOICE_RATE]\n               [--voice_volume VOICE_VOLUME] [--voice_pitch VOICE_PITCH]\n               [--proxy PROXY] [--break_duration BREAK_DURATION]\n               [--piper_path PIPER_PATH] [--piper_speaker PIPER_SPEAKER]\n               [--piper_sentence_silence PIPER_SENTENCE_SILENCE]\n               [--piper_length_scale PIPER_LENGTH_SCALE]\n               input_file output_folder\n\n将文字书籍转换为有声书\n\n位置参数：\n  input_file            EPUB 文件的路径\n  output_folder         输出文件夹的路径\n\n选项：\n  -h, --help            显示此帮助信息并退出\n  --tts {azure,openai,edge,piper}\n                        选择 TTS 提供商（默认：azure）。azure：Azure 
认知服务，openai：OpenAI TTS API。使用 azure 时，必须设置环境变量 MS_TTS_KEY 和 MS_TTS_REGION。使用 openai 时，必须设置环境变量 OPENAI_API_KEY。\n  --log {DEBUG,INFO,WARNING,ERROR,CRITICAL}\n                        日志级别（默认：INFO），可选 DEBUG、INFO、WARNING、ERROR、CRITICAL\n  --preview             启用预览模式。在预览模式下，脚本不会将文本转换为语音，而是会打印章节索引、标题和字符数。\n  --no_prompt           在估算 TTS 的云成本后，不再询问用户是否继续。适用于脚本化操作。\n  --language LANGUAGE   文本转语音服务的语言（默认：en-US）。对于 Azure TTS（--tts=azure），请参阅 https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fai-services\u002Fspeech-service\u002Flanguage-support?tabs=tts#text-to-speech 查看支持的语言。对于 OpenAI TTS（--tts=openai），其 API 会自动检测语言。但设置此选项也有助于本工具以不同策略将文本分割成块，尤其适用于中文文本。对于中文书籍，请使用 zh-CN、zh-TW 或 zh-HK。\n  --newline_mode {single,double,none}\n                        选择检测新段落的模式：“single”、“double”或“none”。“single”表示单个换行符，“double”表示连续两个换行符。“none”表示所有换行符都将被替换为空格，因此不会检测到段落。（默认：double，对大多数电子书有效，但对部分电子书可能检测到的段落数较少）\n  --title_mode {auto,tag_text,first_few}\n                        选择章节标题的解析模式：“tag_text”会搜索“title”、“h1”、“h2”、“h3”标签作为标题；“first_few”则将前60个字符设为标题；“auto”会自动应用最适合当前章节的模式。\n  --chapter_start CHAPTER_START\n                        章节起始索引（默认：1，从1开始）\n  --chapter_end CHAPTER_END\n                        章节结束索引（默认：-1，表示到最后一章）\n  --output_text         启用输出文本功能。这将为每个指定的章节导出一个纯文本文件，并将其写入指定的输出文件夹。\n  --remove_endnotes     这将移除句末或句中的尾注编号。这对于学术类书籍非常有用。\n  --search_and_replace_file SEARCH_AND_REPLACE_FILE\n                        包含每行一条正则替换规则的文件路径，用于修正发音等问题。格式为：\u003C搜索>==\u003C替换>。请注意，可能需要指定单词边界，以避免替换单词的一部分。\n  --worker_count WORKER_COUNT\n                        指定用于生成有声书的并行工作进程数量。增加此值可以显著加快处理速度，因为多个章节可以同时处理。注意：章节可能不会按顺序处理，但这不会影响最终的有声书。\n\n  --voice_name VOICE_NAME\n                        不同的 TTS 提供商有不同的语音名称，请查阅相应提供商的设置。\n  --output_format OUTPUT_FORMAT\n                        文本转语音服务的输出格式。支持的格式取决于所选的 TTS 提供商。\n  --model_name MODEL_NAME\n                        不同的 TTS 提供商有不同的神经网络模型名称。\n\nOpenAI 特定选项：\n  --speed SPEED         生成音频的速度。取值范围为 0.25 至 4.0。默认值为 1.0。\n  --instructions 
INSTRUCTIONS\n                        TTS 模型的指令。仅适用于 “gpt-4o-mini-tts” 模型。\n\nEdge 特定选项：\n  --voice_rate VOICE_RATE\n                        文本的语速。有效相对值范围为 -50%（--xxx='-50%'）至 +100%。负值需使用 --arg=value 格式。\n  --voice_volume VOICE_VOLUME\n                        说话声音量。有效相对值最低可达 -100%。负值需使用 --arg=value 格式。\n  --voice_pitch VOICE_PITCH\n                        文本的基础音高。有效相对值如 -80Hz、+50Hz，音高变化应在原音频的 0.5 至 1.5 倍范围内。负值需使用 --arg=value 格式。\n  --proxy PROXY         TTS 提供商的代理服务器。格式：http:\u002F\u002F[username:password@]proxy.server:port\n\nAzure\u002FEdge 特定选项：\n  --break_duration BREAK_DURATION\n                        不同段落或章节之间的停顿时间，单位为毫秒（默认：1250，即1.25秒）。Azure TTS 的有效值范围为0至5000毫秒。\n\nPiper 特定选项：\n  --piper_path PIPER_PATH\n                        Piper TTS 可执行文件的路径。\n  --piper_speaker PIPER_SPEAKER\n                        Piper 演讲者ID，用于多演讲者模型。\n  --piper_sentence_silence PIPER_SENTENCE_SILENCE\n                        每句话后的静默秒数。\n  --piper_length_scale PIPER_LENGTH_SCALE\n                        音素长度，即语速。\n```  \n\n**示例**：\n\n```bash\npython3 main.py examples\u002FThe_Life_and_Adventures_of_Robinson_Crusoe.epub output_folder\n```\n\n执行上述命令将生成名为 `output_folder` 的目录，并使用默认的 TTS 提供器和语音将各章节的 MP3 文件保存到该目录中。生成完成后，您可以将这些音频文件导入 [Audiobookshelf](https:\u002F\u002Fgithub.com\u002Fadvplyr\u002Faudiobookshelf) 或使用任何您喜欢的音频播放器进行播放。\n\n## 预览模式\n\n在将 EPUB 文件转换为有声书之前，您可以使用 `--preview` 选项获取每个章节的摘要。这将为您提供每个章节及总文本的字符数，而不会进行文本转语音处理。\n\n**示例**：\n\n```bash\npython3 main.py examples\u002FThe_Life_and_Adventures_of_Robinson_Crusoe.epub output_folder --preview\n```\n\n## 搜索与替换\n\n您可能希望搜索并替换文本，以扩展缩写或帮助发音。可以通过指定一个搜索和替换文件来实现，该文件每行包含一个正则表达式搜索和替换规则，用 `==` 分隔：\n\n**示例**：\n\n**search.conf**：\n\n```text\n# 这是一般结构\n\u003C搜索>==\u003C替换>\n# 这是注释\n# 修正方位缩写\nN\\.E\\.==north east\n# 使用正则表达式时要小心，因为这也会匹配 Sally N. 
Smith\nN\\.==north\n# 按照当地人的发音念“Barbadoes”\nBarbadoes==Barbayduss\n```\n\n```bash\npython3 main.py examples\u002FThe_Life_and_Adventures_of_Robinson_Crusoe.epub output_folder --search_and_replace_file search.conf\n```\n\n## 使用 Docker\n\n此工具提供 Docker 镜像，方便运行，无需管理 Python 依赖项。\n\n首先，请确保您的系统已安装 Docker。\n\n您可以从 GitHub Container Registry 拉取 Docker 镜像：\n\n```bash\ndocker pull ghcr.io\u002Fp0n1\u002Fepub_to_audiobook:latest\n```\n\n然后，您可以使用以下命令运行该工具：\n\n```bash\ndocker run -i -t --rm -v .\u002F:\u002Fapp -e MS_TTS_KEY=$MS_TTS_KEY -e MS_TTS_REGION=$MS_TTS_REGION ghcr.io\u002Fp0n1\u002Fepub_to_audiobook your_book.epub audiobook_output --tts azure\n```\n\n对于 OpenAI，您可以运行：\n\n```bash\ndocker run -i -t --rm -v .\u002F:\u002Fapp -e OPENAI_API_KEY=$OPENAI_API_KEY ghcr.io\u002Fp0n1\u002Fepub_to_audiobook your_book.epub audiobook_output --tts openai\n```\n\n请将 `$MS_TTS_KEY` 和 `$MS_TTS_REGION` 替换为您 Azure 文本转语音 API 的凭据。将 `$OPENAI_API_KEY` 替换为您 OpenAI 的 API 密钥。将 `your_book.epub` 替换为输入 EPUB 文件的名称，将 `audiobook_output` 替换为您希望保存输出文件的目录名称。\n\n`-v .\u002F:\u002Fapp` 选项会将当前目录（`.`）挂载到 Docker 容器中的 `\u002Fapp` 目录。这样，工具就可以读取输入文件并将输出文件写入您的本地文件系统。\n\n`-i` 和 `-t` 选项是必需的，用于启用交互模式并分配伪 TTY。\n\n**您还可以查看 [此示例配置文件](.\u002Fdocker-compose.example.yml) 以了解如何使用 Docker Compose。**\n\n## Windows 用户友好指南\n\n对于 Windows 用户，尤其是不熟悉命令行工具的用户，我们为您准备了专门的指南，帮助您轻松上手。\n\n请参阅这份[逐步指南](https:\u002F\u002Fgist.github.com\u002Fp0n1\u002Fcba98859cdb6331cc1aab835d62e4fba)，如果您遇到任何问题，请留言告诉我们。\n\n## 如何获取 Azure 认知服务密钥？\n\n- Azure 订阅 — [免费创建一个](https:\u002F\u002Fazure.microsoft.com\u002Ffree\u002Fcognitive-services)\n- 在 Azure 门户中[创建语音资源](https:\u002F\u002Fportal.azure.com\u002F#create\u002FMicrosoft.CognitiveServicesSpeechServices)。\n- 
获取语音资源的密钥和区域。语音资源部署完成后，选择“转到资源”以查看和管理密钥。有关认知服务资源的更多信息，请参阅[获取资源的密钥](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fcognitive-services\u002Fcognitive-services-apis-create-account#get-the-keys-for-your-resource)。\n\n*来源：[https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fcognitive-services\u002Fspeech-service\u002Fget-started-text-to-speech#prerequisites]*\n\n## 如何获取 OpenAI API 密钥？\n\n请访问 https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fquickstart\u002Faccount-setup。使用前请务必查看[价格](https:\u002F\u002Fopenai.com\u002Fpricing)详情。\n\n## ✨ 关于 Edge TTS\n\nEdge TTS 和 Azure TTS 几乎相同，区别在于 Edge TTS 不需要 API 密钥，因为它基于 Edge 的朗读功能，且参数稍有限制，例如[自定义 SSML](https:\u002F\u002Fgithub.com\u002Frany2\u002Fedge-tts#custom-ssml)。\n\n请访问 https:\u002F\u002Fgist.github.com\u002FBettyJJ\u002F17cbaa1de96235a7f5773b8690a20462 查看支持的语音。\n\n**如果您想快速试用这个项目，强烈推荐使用 Edge TTS。**\n\n## 自定义语音和语言\n\n您可以通过在运行脚本时传递 `--voice_name` 和 `--language` 选项来自定义文本转语音转换中使用的语音和语言。\n\nMicrosoft Azure 为文本转语音服务提供了多种语音和语言。有关可用选项的列表，请参阅 [Microsoft Azure 文本转语音文档](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fcognitive-services\u002Fspeech-service\u002Flanguage-support?tabs=tts#text-to-speech)。\n\n您还可以在 [Azure TTS 语音库](https:\u002F\u002Faka.ms\u002Fspeechstudio\u002Fvoicegallery) 中收听可用语音的样本，以帮助您为有声书选择最佳语音。\n\n例如，如果您想使用英式英语女声进行转换，可以使用以下命令：\n\n```bash\npython3 main.py \u003Cinput_file> \u003Coutput_folder> --voice_name en-GB-LibbyNeural --language en-GB\n```\n\n对于 OpenAI TTS，您可以分别使用 `--model_name`、`--voice_name` 和 `--output_format` 来指定模型、语音和格式选项。\n\n## 更多示例\n\n以下是一些演示各种选项组合的示例：\n\n### 使用 Azure TTS 的示例\n\n1. **使用默认设置的基本 Azure 转换**  \n   此命令将使用 Azure 的默认 TTS 设置将 EPUB 文件转换为有声书。\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts azure\n   ```\n\n2. 
**带有自定义语言、语音和日志级别的 Azure 转换**  \n   将 EPUB 文件转换为有声书，并指定特定语音以及自定义的日志级别以便调试。\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts azure --language zh-CN --voice_name \"zh-CN-YunyeNeural\" --log DEBUG\n   ```\n\n3. **带有章节范围和停顿时间的 Azure 转换**  \n   将 EPUB 文件中指定范围的章节转换为有声书，并在段落之间设置自定义的停顿时间。\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts azure --chapter_start 5 --chapter_end 10 --break_duration \"1500\"\n   ```\n\n### 使用 OpenAI TTS 的示例\n\n1. **使用 OpenAI 默认设置的基本转换**  \n   此命令将使用 OpenAI 的默认 TTS 设置将 EPUB 文件转换为有声书。\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts openai\n   ```\n\n2. **使用高清模型和特定语音的 OpenAI 转换**  \n   使用高清 OpenAI 模型和指定的语音选项，将 EPUB 文件转换为有声书。\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts openai --model_name \"tts-1-hd\" --voice_name \"fable\"\n   ```\n\n3. **启用预览和文本输出的 OpenAI 转换**  \n   启用预览模式和文本输出，这将显示章节索引和标题，而不是进行转换，并且还会导出文本。\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts openai --preview --output_text\n   ```\n\n## 使用与 OpenAI 兼容的服务示例\n\n可以使用与 OpenAI 兼容的服务，例如 [matatonic\u002Fopenedai-speech](https:\u002F\u002Fgithub.com\u002Fmatatonic\u002Fopenedai-speech)。在这种情况下，**必须**设置 `OPENAI_BASE_URL` 环境变量，否则它将默认使用标准的 OpenAI 服务。虽然兼容的服务可能不需要 API 密钥，但 OpenAI 客户端仍然需要，因此请确保将其设置为一个无效值。\n\n如果您的 OpenAI 兼容服务运行在 `http:\u002F\u002F127.0.0.1:8000` 上，并且您已添加了一个名为 `skippy` 的自定义语音，则可以使用以下命令：\n\n```shell\ndocker run -i -t --rm -v .\u002F:\u002Fapp -e OPENAI_BASE_URL=http:\u002F\u002F127.0.0.1:8000\u002Fv1 -e OPENAI_API_KEY=nope ghcr.io\u002Fp0n1\u002Fepub_to_audiobook your_book.epub audiobook_output --tts openai --voice_name=skippy --model_name=tts-1-hd\n```\n\n请向下滚动至下方的 Kokoro TTS 示例，以查看更具体的示例。\n\n### 使用 Edge TTS 的示例\n\n1. 
**使用 Edge 默认设置的基本转换**  \n   此命令将使用 Edge 的默认 TTS 设置将 EPUB 文件转换为有声书。\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts edge\n   ```\n\n2. **使用自定义语言、语音和日志级别进行 Edge 转换**  \n   将 EPUB 文件转换为有声书，同时指定语音和自定义日志级别以便调试。\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts edge --language zh-CN --voice_name \"zh-CN-YunxiNeural\" --log DEBUG\n   ```\n\n3. **指定章节范围和停顿时间的 Edge 转换**  \n   将 EPUB 文件中指定范围的章节转换为有声书，并在段落之间设置自定义的停顿时间。\n\n   ```sh\n   python3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts edge --chapter_start 5 --chapter_end 10 --break_duration \"1500\"\n   ```\n\n### 使用 Piper TTS 的示例\n\n*请确保已安装 Piper TTS，并拥有 ONNX 模型文件及相应的配置文件。更多信息请参阅 [Piper TTS](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper)。您可以按照其说明安装 Piper TTS、下载模型和配置文件，先进行试用，然后再尝试以下示例。*\n\n此命令将使用最少的参数通过 Piper TTS 将 EPUB 文件转换为有声书。您始终需要指定一个 ONNX 模型文件，并且 `piper` 可执行文件需位于当前的 `$PATH` 中。\n\n```sh\npython3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts piper --model_name \u003Cpath_to>\u002Fen_US-libritts_r-medium.onnx\n```\n\n您可以通过使用 `--piper_path` 参数来指定自定义的 `piper` 可执行文件路径。\n\n```sh\npython3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts piper --model_name \u003Cpath_to>\u002Fen_US-libritts_r-medium.onnx --piper_path \u003Cpath_to>\u002Fpiper\n```\n\n某些模型支持多种语音，可通过 `--piper_speaker` 参数进行指定。\n\n```sh\npython3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts piper --model_name \u003Cpath_to>\u002Fen_US-libritts_r-medium.onnx --piper_speaker 256\n```\n\n您还可以指定语速（`--piper_length_scale`）和停顿时间（`--piper_sentence_silence`）。\n\n```sh\npython3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts piper --model_name \u003Cpath_to>\u002Fen_US-libritts_r-medium.onnx --piper_speaker 256 --piper_length_scale 1.5 
--piper_sentence_silence 0.5\n```\n\nPiper TTS 默认输出 `wav` 格式的文件（或原始音频），您可以通过 `--output_format` 参数指定任何合理的格式。`opus` 和 `mp3` 是在文件大小和兼容性方面不错的选择。\n\n```sh\npython3 main.py \"path\u002Fto\u002Fbook.epub\" \"path\u002Fto\u002Foutput\u002Ffolder\" --tts piper --model_name \u003Cpath_to>\u002Fen_US-libritts_r-medium.onnx --piper_speaker 256 --piper_length_scale 1.5 --piper_sentence_silence 0.5 --output_format opus\n```\n\n*或者，您也可以使用以下步骤，在 Docker 容器中使用 Piper，从而简化本地运行流程。*\n\n1. 确保您的系统上已安装 Docker Desktop。请访问 [Docker](https:\u002F\u002Fwww.docker.com\u002F) 进行安装，或使用 [homebrew](https:\u002F\u002Fformulae.brew.sh\u002Fformula\u002Fdocker) 包管理器。\n2. 下载 Piper 模型和配置文件（详情请参阅 [Piper 仓库](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper)），并将它们放置在本项目的顶层目录中的 `piper_models` 文件夹内。\n3. 编辑 [docker-compose 文件](.\u002Fdocker-compose.piper-example.yml)，进行如下修改：\n   - 在 `piper` 容器中，将 `PIPER_VOICE` 环境变量设置为您下载的模型文件名称。\n   - 在 `piper` 容器中，将 `volumes` 映射到您系统中 Piper 模型的实际位置（如果您使用了第 2 步中提供的目录，则可保持原样）。\n   - 在 `epub_to_audiobook` 容器中，更新 `volumes` 映射，将 `\u003Cpath\u002Fto\u002Fepub\u002Fdir\u002Fon\u002Fhost>` 替换为您主机上 EPUB 文件的实际路径。\n4. 
从项目根目录运行 `PATH_TO_EPUB_FILE=.\u002FYour_epub_file.epub OUTPUT_DIR=$(pwd)\u002Fpath\u002Fto\u002Faudiobook_output docker compose -f docker-compose.piper-example.yml up --build`，**请将占位符值和输出目录替换为您所需的 EPUB 源文件和音频输出路径**。（务必保留 $(pwd)！）请注意，当前的 docker-compose 配置会自动启动整个流程，完全在容器内完成。如果您希望在容器外运行主 Python 进程，可以取消注释命令 `command: tail -f \u002Fdev\u002Fnull`，然后使用 `docker exec -it epub_to_audiobook \u002Fbin\u002Fbash` 连接到容器并手动运行 Python 脚本（更多详细信息请参阅 [docker-compose 文件](.\u002Fdocker-compose.piper-example.yml) 中的注释）。\n\n### 使用 Kokoro TTS 的示例\n\n该脚本中记录的 Kokoro TTS 使用方法是通过一个与 OpenAI 兼容端点的 Docker 镜像来实现的。不过，由于这是一个“自托管”服务，你不需要获取实际的 API 密钥。这需要 Docker 环境，因此如果你的机器上尚未安装 Docker，请按照前面 Piper 部分中的 Docker 安装和设置说明进行操作。\n\n要运行，可以在一个终端标签页中执行以下命令之一：\n\n```bash\ndocker run -p 8880:8880 ghcr.io\u002Fremsky\u002Fkokoro-fastapi-cpu\n```\n\n或者，如果你的 GPU 能够帮助加速处理，可以运行：\n\n```bash\ndocker run --gpus all -p 8880:8880 ghcr.io\u002Fremsky\u002Fkokoro-fastapi-gpu\n```\n\n然后在另一个标签页中运行：\n\n```bash\nexport OPENAI_BASE_URL=http:\u002F\u002Flocalhost:8880\u002Fv1\nexport OPENAI_API_KEY=\"fake\"\npython main.py path\u002Fto\u002Fepub output-dir --tts openai --voice_name \"af_bella(3)+af_alloy(1)\" --model_name \"tts-1\" #你可以用其他任何语音名称替换此处。链接见下文。\n```\n请注意，传递 `--model_name tts-1` 参数是**必需的**，因为使用当前默认的 model_name 值会导致 Kokoro 出现问题。\n\n另外，你也可以通过 [为 Kokoro 准备的 Docker Compose 文件](.\u002Fdocker-compose.kokoro-example.yml) 来完成整个设置。\n\n具体操作如下：用你喜欢的编辑器打开该文件，然后从仓库根目录运行：\n\n```bash\nPATH_TO_EPUB_FILE=.\u002FYour_epub_file.epub OUTPUT_DIR=$(pwd)\u002Fpath\u002Fto\u002Faudiobook_output VOICE_NAME=Your_desired_voice docker compose -f docker-compose.kokoro-example.yml up --build\n```\n\n请将占位符值和输出目录分别替换为你想要的 EPUB 源文件、音频输出路径以及你选择的语音名称。\n\n关于可用的语音列表，可以参考 [这里](https:\u002F\u002Fhuggingface.co\u002Fhexgrad\u002FKokoro-82M\u002Fblob\u002Fmain\u002FVOICES.md)，你还可以在 [这里](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fhexgrad\u002FKokoro-TTS) 听听这些语音的实际效果。\n\n需要注意的是，当前 Docker Compose 配置会自动在容器内启动整个流程。如果你想在容器外运行主 Python 进程，可以取消注释 `command: tail 
-f \u002Fdev\u002Fnull` 这一行，并使用 `docker exec -it epub_to_audiobook \u002Fbin\u002Fbash` 连接到容器，手动运行 Python 脚本（更多详情请参阅 [Docker Compose 文件](.\u002Fdocker-compose.kokoro-example.yml) 中的注释）。\n\n有关 Kokoro TTS 所使用的镜像的更多信息，请访问此 [仓库](https:\u002F\u002Fgithub.com\u002Fremsky\u002FKokoro-FastAPI)。\n\n## 故障排除\n\n### ModuleNotFoundError: 没有名为 'importlib_metadata' 的模块\n\n这可能是因为你使用的 Python 版本低于 [3.8](https:\u002F\u002Fstackoverflow.com\u002Fquestions\u002F73165636\u002Fno-module-named-importlib-metadata)。你可以尝试手动安装它：`pip3 install importlib-metadata`，或者升级到更高版本的 Python。\n\n### FileNotFoundError: [Errno 2] 没有这样的文件或目录: 'ffmpeg'\n\n请确保你的系统 PATH 中可以找到 ffmpeg 可执行文件。如果你使用的是 macOS 并且安装了 Homebrew，可以运行 `brew install ffmpeg`；在 Ubuntu 上则可以运行 `sudo apt install ffmpeg`。\n\n### Piper TTS\n\n如遇安装相关问题，请参考 [Piper TTS](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper) 仓库。需要注意的是，如果通过 pip 安装 `piper-tts`，目前仅支持 [Python 3.10](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper\u002Fissues\u002F509)。Mac 用户在使用下载的 [二进制文件](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper\u002Fissues\u002F523) 时可能会遇到额外的挑战。有关 Mac 特定问题的更多信息，请查看 [此议题](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper\u002Fissues\u002F395) 和 [此拉取请求](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper\u002Fpull\u002F412)。\n\n如果你在使用 Piper TTS 时遇到困难，也可以参考 [此页面](https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fissues\u002F85)。\n\n## 相关项目\n\n- [Epub to Audiobook (M4B)](https:\u002F\u002Fgithub.com\u002Fduplaja\u002Fepub-to-audiobook-hf): 将 EPUB 转换为 M4B 格式的有声书，使用 HuggingFace Spaces API 结合 StyleTTS2。\n- [Storyteller](https:\u002F\u002Fstoryteller-platform.gitlab.io\u002Fstoryteller\u002F)：一个用于自动同步电子书和有声书的自托管平台。\n\n## 许可证\n\n本项目采用 MIT 许可证授权。详细信息请参阅 [LICENSE](LICENSE) 文件。","# epub_to_audiobook 快速上手指南\n\n`epub_to_audiobook` 是一个命令行工具，可将 EPUB 电子书转换为有声书（MP3），支持 Azure、OpenAI、Edge TTS 等多种语音合成引擎，生成的文件完美适配 Audiobookshelf。\n\n## 环境准备\n\n### 系统要求\n- **Python**: 3.10 或更高版本（推荐 3.12+，注意 Python 3.14 需使用仓库内更新的依赖）\n- **或者**: 安装 Docker 及 
Docker Compose（推荐使用 Docker 简化部署）\n\n### 前置依赖与 API Key\n根据选择的语音引擎，需准备以下任一凭证：\n- **Azure TTS**: Microsoft Azure 账号及 Speech Services 资源（需 `MS_TTS_KEY` 和 `MS_TTS_REGION`）\n- **OpenAI TTS**: OpenAI API Key（需 `OPENAI_API_KEY`）\n  - *注：若使用本地兼容接口（如 Kokoro TTS），可设置任意假值，如 `export OPENAI_API_KEY='fake'`*\n- **Edge TTS**: 无需 API Key，可直接使用\n- **Piper TTS**: 需下载 Piper 可执行文件及模型\n\n## 安装步骤\n\n### 方式一：本地源码安装（推荐开发者）\n\n1. **克隆仓库**\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook.git\n   cd epub_to_audiobook\n   ```\n\n2. **创建并激活虚拟环境**\n   ```bash\n   python3 -m venv venv\n   source venv\u002Fbin\u002Factivate  # Windows 用户请使用: venv\\Scripts\\activate\n   ```\n\n3. **安装依赖**\n   > 提示：国内用户若下载缓慢，可临时指定清华源：\n   > `pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n   \n   ```bash\n   pip install -r requirements.txt\n   ```\n\n4. **配置环境变量**\n   根据使用的引擎设置对应的 Key（以 Bash 为例）：\n   ```bash\n   # Azure 示例\n   export MS_TTS_KEY=\u003Cyour_subscription_key>\n   export MS_TTS_REGION=\u003Cyour_region>\n\n   # 或 OpenAI 示例\n   export OPENAI_API_KEY=\u003Cyour_openai_api_key>\n   ```\n\n### 方式二：Docker 快速启动（含 Web 界面）\n\n若熟悉 Docker，可直接使用提供的 compose 文件启动带图形界面的服务：\n\n1. 编辑 `docker-compose.webui.yml` 填入你的 API Key。\n2. 启动服务：\n   ```bash\n   docker compose -f docker-compose.webui.yml up\n   ```\n3. 访问 `http:\u002F\u002Flocalhost:7860` 即可使用 WebUI。\n\n## 基本使用\n\n### 命令行模式 (CLI)\n\n最简单的转换命令如下（将 `input.epub` 转换为有声书并输出到 `output_folder`）：\n\n```bash\npython3 main.py input.epub output_folder --tts edge\n```\n\n*注：上述命令使用免费的 Edge TTS 引擎。若使用 Azure 或 OpenAI，请确保已设置对应环境变量，并将 `--tts` 参数改为 `azure` 或 `openai`。*\n\n查看完整参数帮助：\n```bash\npython3 main.py -h\n```\n\n### Web 界面模式 (WebUI)\n\n偏好图形化操作的用户，可启动内置的 Gradio 界面：\n\n1. **启动服务**\n   ```bash\n   python3 main_ui.py\n   ```\n   默认访问地址：`http:\u002F\u002F127.0.0.1:7860`\n\n2. 
**操作流程**\n   - 拖拽上传 EPUB 文件。\n   - 在标签页选择 TTS 提供商（OpenAI \u002F Azure \u002F Edge \u002F Piper）。\n   - 配置语音、语速等参数。\n   - 点击 **Start** 开始转换，可在浏览器实时查看日志。\n\n> **安全提示**：WebUI 默认仅在本地运行，无身份验证机制，请勿直接暴露至公网。","一位通勤时间较长的职场人士希望利用碎片时间“阅读”技术类 EPUB 电子书，但缺乏合适的有声资源。\n\n### 没有 epub_to_audiobook 时\n- 只能依赖人工朗读录制或昂贵的商业有声书平台，大量专业领域的 EPUB 书籍根本找不到对应的音频版本。\n- 若尝试自行转换，往往得到单个巨大的音频文件，无法识别章节标题，导致在播放时难以定位具体内容，跳转极其不便。\n- 手动分割音频并编辑元数据耗时耗力，且难以与自建的 Audiobookshelf 媒体库完美兼容，同步进度和封面显示经常出错。\n- 可选的免费 TTS 工具音质机械生硬，长时间收听容易疲劳，而配置高质量语音引擎的技术门槛又过高。\n\n### 使用 epub_to_audiobook 后\n- 直接通过命令行或新增的 WebUI 界面，一键将本地 EPUB 书籍转换为高保真有声书，支持 Azure、OpenAI 等多种优质语音引擎。\n- 自动解析原书结构，将每个章节独立生成 MP3 文件，并精准提取章节标题作为元数据，实现毫秒级章节跳转。\n- 输出格式专为 Audiobookshelf 优化，导入后自动匹配封面、作者及章节信息，无缝融入个人媒体库，多设备同步流畅。\n- 无需复杂的环境配置，利用 Docker 或简单的 Python 环境即可运行，让非技术人员也能轻松制作专属的高质量有声读物。\n\nepub_to_audiobook 将静态的电子书瞬间转化为结构清晰、音质自然的个性化有声库，彻底释放了通勤与家务时间的学习潜力。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fp0n1_epub_to_audiobook_d82206ce.png","p0n1",null,"https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fp0n1_01cda756.png","“Dude, suckin’ at something is the first step to being sorta good at something.”","ErrNil","https:\u002F\u002Fgithub.com\u002Fp0n1",[21,25,29],{"name":22,"color":23,"percentage":24},"Python","#3572A5",98.7,{"name":26,"color":27,"percentage":28},"Dockerfile","#384d54",0.8,{"name":30,"color":31,"percentage":32},"Shell","#89e051",0.5,1951,207,"2026-04-18T22:42:08","MIT",2,"Linux, macOS, Windows","未说明",{"notes":41,"python":42,"dependencies":43},"支持多种 TTS 引擎：Azure、OpenAI、EdgeTTS（无需 API 密钥）、Piper（需本地可执行文件和模型）及 Kokoro（通过本地 OpenAI 端点）。提供命令行和 WebUI 两种使用方式。若使用 Docker 运行则无需手动配置 Python 环境。使用 Azure 或 OpenAI 服务需配置相应的 API 密钥环境变量。WebUI 默认仅在本地运行，无身份验证机制，不建议直接暴露于公网。","3.10+",[44,45],"gradio>=5.33.1 (针对 Python 
3.14)","pydantic-core",[47,48],"音频","语言模型",[50,51,52,53,54,55,56],"audiobooks","audiobookshelf","epub","tts","chatgpt","openai","webui","ready","2026-03-27T02:49:30.150509","2026-04-20T07:20:35.080218",[61,66,71,75,80,85,90,94],{"id":62,"question_zh":63,"answer_zh":64,"source_url":65},44053,"如何获取或配置 Azure 区域代码？","该问题在讨论中主要涉及对垃圾邮件的警告，并未提供具体的 Azure 区域代码获取技术细节。维护者提醒用户注意 GitHub 上的加密货币空投诈骗垃圾邮件，不要点击通知邮件中的相关链接以确保安全。关于具体的区域代码配置，建议查阅项目文档或源代码中关于 Azure TTS 初始化的部分。","https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fissues\u002F12",{"id":67,"question_zh":68,"answer_zh":69,"source_url":70},44054,"Docker 运行时提示找不到 'epub_to_audiobook.py' 文件怎么办？","这是因为 Docker 挂载卷（volume）覆盖了容器内的源代码。解决方法是更新到最新版本（v0.4.2 或更高），或者调整挂载方式，确保不要将当前目录直接挂载到覆盖源代码的路径。维护者已修复此问题，请尝试运行 `docker pull` 更新镜像后重试。命令示例：`docker run --rm -v .\u002F:\u002Fapp -e OPENAI_API_KEY=$OPENAI_API_KEY ghcr.io\u002Fp0n1\u002Fepub_to_audiobook your_book.epub audiobook_output --tts openai`。","https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fissues\u002F9",{"id":72,"question_zh":73,"answer_zh":74,"source_url":70},44055,"Python 运行时出现 'ModuleNotFoundError: No module named importlib_metadata' 错误如何解决？","这是缺少依赖库导致的。可以通过手动安装缺失的库来解决：`pip install importlib-metadata`。维护者建议将此依赖添加到项目的 `requirements.txt` 中以避免未来出现类似问题。",{"id":76,"question_zh":77,"answer_zh":78,"source_url":79},44056,"遇到 'TypeError: cannot pickle _thread.RLock object' 多线程错误怎么办？","这通常是由于运行环境资源不足导致的，特别是在 Codespace 等受限环境中。解决方案是增加运行实例的资源配置，例如将内存提升至 32GB 并使用 8 核 CPU，这样可以解决因资源瓶颈引发的多线程序列化错误。","https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fissues\u002F126",{"id":81,"question_zh":82,"answer_zh":83,"source_url":84},44057,"项目是否有计划支持 Web 界面或与 Readarr\u002FCalibre 集成？","社区成员已经使用 Gradio 构建了独立的 Web 界面原型，可以通过构造参数并生成子进程来实现电子书到有声书的转换。目前维护者正在重构项目代码，建议等待当前的重大重构 PR 合并后再进行进一步的集成开发，以避免代码冲突。用户可以加入项目的 Discord 
服务器参与讨论和规划。","https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fissues\u002F15",{"id":86,"question_zh":87,"answer_zh":88,"source_url":89},44058,"如何让生成的语音听起来更自然（更像真人）？","建议使用 OpenAI 的 TTS 服务，并选择 'onyx' 音色。用户反馈表明，将播放速度调整为 1.25 倍速时，语音效果非常自然且可听性良好。此外，确保你的 OpenAI 账户有足够的额度以使用高质量的模型。","https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fissues\u002F7",{"id":91,"question_zh":92,"answer_zh":93,"source_url":89},44059,"遇到 OpenAI 'insufficient_quota' (配额不足) 错误如何解决？","这是 OpenAI 后端的限制，无法通过修改代码绕过。即使拥有 Plus 计划，如果 API 额度用尽也会报错。解决方法是：登录 platform.openai.com，绑定信用卡并充值额度，然后等待几小时让系统刷新配额。",{"id":95,"question_zh":96,"answer_zh":97,"source_url":98},44060,"转换后的章节 MP3 文件大小只有 2kb 左右是什么原因？","这通常意味着转换过程未正确写入音频数据，可能是由于 TTS 提供商（如 Edge TTS）的配置问题、网络连接中断或输出路径权限问题导致的。虽然日志中显示的字符数正常，但实际音频生成失败。建议检查 TTS 提供商的 API 连接状态，确认虚拟环境配置正确，并尝试在纯 Linux 环境（非 WSL）或更换 TTS 引擎进行测试。","https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fissues\u002F56",[100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190],{"id":101,"version":102,"summary_zh":103,"released_at":104},351579,"v0.8.7","## 变更内容\n* 功能（语音）：添加缺失的 Azure Dragon HD Flash 模型，由 @eMUQI 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F160 中完成\n* 添加文件名净化器，以处理过长的章节标题和文件系统限制，由 @Alexandrsv 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F166 中完成\n* 修复拼写错误，由 @AlistairKeiller 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F171 中完成\n* 通过更新 edge-tts 修复 403 禁止访问错误，由 @Alexandrsv 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F172 中完成\n\n## 新贡献者\n* @eMUQI 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F160 中完成了首次贡献\n* @Alexandrsv 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F166 中完成了首次贡献\n* @AlistairKeiller 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F171 
中完成了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.8.6...v0.8.7","2026-02-03T05:41:48",{"id":106,"version":107,"summary_zh":108,"released_at":109},351580,"v0.8.6","## 变更内容\n* 修复（edge-tts）：消除重复的 MP3 编码；由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F158 中实现，改为以 48 kbps 的比特率导出一次。\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.8.5...v0.8.6","2026-02-03T05:41:02",{"id":111,"version":112,"summary_zh":113,"released_at":114},351581,"v0.8.5","## 变更内容\n* 修复：在 Dockerfile 中将基础镜像更新为 Python 3.11-slim-trixie，以使用 ffmpeg 7.1.1-1 而不是 5.1.6-0，由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F157 中完成。\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.8.4...v0.8.5","2025-08-29T08:41:52",{"id":116,"version":117,"summary_zh":118,"released_at":119},351582,"v0.8.4","## 变更内容\n* 修复：增强 Edge TTS 的文本过滤功能，以减少 NoAudioReceived 错误，由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F154 中完成。\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.8.3...v0.8.4","2025-07-25T08:25:35",{"id":121,"version":122,"summary_zh":123,"released_at":124},351583,"v0.8.3","## 变更内容\n* 修复：在 Web UI 中为 EdgeTTSProvider 格式化语音参数，包括 **语速、音量和音高**，由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F149 中完成。\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.8.2...v0.8.3","2025-06-27T03:51:41",{"id":126,"version":127,"summary_zh":128,"released_at":129},351584,"v0.8.2","## 变更内容\n* 由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F144 中优化了 Edge TTS 的使用体验\n\n\n**完整变更日志**: 
https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.8.1...v0.8.2","2025-06-11T10:21:30",{"id":131,"version":132,"summary_zh":133,"released_at":134},351585,"v0.8.1","## 变更内容\n* 修复：由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F137 中将 Azure 和 Edge 的 `break_duration` 滑块变量拆分\n* Kokoro Docker Compose - 启用 GPU 资源，由 @Wngui 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F139 中实现\n* 杂项：更新 `requirements.txt` 中的包版本，由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F142 中完成\n\n## 新贡献者\n* @Wngui 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F139 中完成了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.8.0...v0.8.1","2025-06-10T07:07:05",{"id":136,"version":137,"summary_zh":138,"released_at":139},351586,"v0.8.0","通过更新 Docker 镜像来试用新的 WebUI\n\n```bash\ndocker pull ghcr.io\u002Fp0n1\u002Fepub_to_audiobook:latest\n```\n\n编辑仓库中的 [`docker-compose.webui.yml`](https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fblob\u002Fmain\u002Fdocker-compose.webui.yml)，填入您的 API 密钥，然后运行：\n\n```bash\ndocker compose -f docker-compose.webui.yml up\n```\n\n之后，请访问本地 URL http:\u002F\u002F127.0.0.1:7860 查看新界面。更多详情请参阅 README 中的 [WebUI 部分](https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook?tab=readme-ov-file#web-interface-webui)。\n\n![webui](https:\u002F\u002Fraw.githubusercontent.com\u002Fp0n1\u002Fepub_to_audiobook\u002Frefs\u002Fheads\u002Fmain\u002Fexamples\u002Fwebui.png)\n\n## 变更内容\n* 由 @vcalv 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F76 中修复了 README.md 中的一个小错别字。\n* 由 @bcongdon 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F63 中实现仅包含选定章节的字符\u002F成本估算。\n* 由 @vcalv 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F77 中添加了 Piper TTS 支持。\n* 由 @reverendj1 在 
https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F80 中实现了文本搜索与替换功能。\n* 由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F84 中修复了分割问题，并更改了 Piper 的默认设置。\n* 由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F86 中改进了关于 Piper 的 README 文档。\n* 由 @allen-n 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F93 中实现了 Piper 的 Docker 化。\n* 由 @Jackylaviss 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F97 中更新了 requirements.txt 文件。\n* 由 @sab666 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F107 中简化了 OpenAI TTS 提供商的验证流程。\n* 由 @kovaacs 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F108 中优化了 Dockerfile。\n* 由 @kovaacs 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F109 中添加了多进程支持。\n* 由 @alexjyong 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F119 中移除了参考编号选项。\n* 由 @alexjyong 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F118 中添加了 Kokoro 支持。\n* 由 @Bryksin 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F121 中根据 #120 的需求为 OpenAI TTS 添加了新功能。\n* 由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F124 中将 edge-tts 升级至 7.0.0 版本。\n* 由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F125 中修复了空标题格式的空白字符修剪问题。\n* 由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F127 中修复了 OpenAI 多进程的问题。\n* 由 @alexjyong 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F129 中对文档和 docker-compose 文件进行了 Kokoro 相关的更新。\n* 由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F130 中添加了使用 pydub 包合并音频片段的选项。\n* 由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F131 
中增强了文本分割功能，并添加了单元测试。\n* 由 @Surrogard 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F133 中修复了指向 storyteller 的失效链接。\n* 由 @Bryksin 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F132 中实现了 Gradio Web UI。\n* 由 @p0n1 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook 中修复了 README 文档。","2025-05-23T15:25:46",{"id":141,"version":142,"summary_zh":143,"released_at":144},351587,"v0.6.1","- 在 Dockerfile 中添加 ffmpeg\n- 修复一些警告\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.6.0...v0.6.1","2024-06-28T09:58:41",{"id":146,"version":147,"summary_zh":148,"released_at":149},351588,"v0.6.0","## 变更内容\n* 修复由 @haydonryan 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F44 中提出的上一个 PR 中 if elif else 块的 bug\n* 功能：新增 Edge-tts 对段落\u002F章节的暂停支持，由 @phuchoang2603 提出，链接为 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F45\n* 更新 README.md，加入 ffmpeg 排障说明，由 @IgnorantSapient 提出，链接为 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F53\n* 功能：添加 title_mode 选项，由 @xtmu 提出，链接为 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F49\n* 修复 edge 暂停问题，由 @p0n1 提出，链接为 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F71\n* 更新依赖项，由 @p0n1 提出，链接为 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F73\n\n> 刚刚重新回过头来处理了这些问题。`edge` TTS 引擎的功能现在应该更加稳定了。如果你想免费将任何电子书转换成有声书，不妨试试 `--tts edge` 参数！\n\n## 新贡献者\n* @phuchoang2603 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F45 中完成了首次贡献\n* @IgnorantSapient 在 https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F53 中完成了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.5.1...v0.6.0","2024-06-28T08:43:10",{"id":151,"version":152,"summary_zh":153,"released_at":154},351589,"v0.5.1","## What's 
Changed\r\n* README: improve after edge by @p0n1 in https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F32\r\n* Update README.md by @p0n1 in https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F34\r\n* feat: skip prompt if instructed or in preview mode by @haydonryan in https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F38\r\n* README: fix docker usage by @p0n1 in https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F39.\r\n\r\n> Remember to add `-i -t` options for docker usage to enable interactive mode and allocate a pseudo-TTY.\r\n\r\n## New Contributors\r\n* @haydonryan made their first contribution in https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F38\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.5.0...v0.5.1","2024-01-23T12:38:49",{"id":156,"version":157,"summary_zh":158,"released_at":159},351590,"v0.5.0","## What's Changed\r\n* Fr 21 project refactoring by @Bryksin in https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F25\r\n* fix: replace epub_to_audiobook.py to main.py by @p0n1 in https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F28\r\n* feat: add Edge TTS provider by @xtmu in https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F30\r\n\r\n## New Contributors\r\n* @Bryksin made their first contribution in https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F25\r\n* @xtmu made their first contribution in https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F30\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.4.3...v0.5.0","2024-01-11T16:54:06",{"id":161,"version":162,"summary_zh":163,"released_at":164},351591,"v0.4.3","fix: #20 file overwrite every iteration\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.4.2...v0.4.3","2023-11-22T16:43:59",{"id":166,"version":167,"summary_zh":168,"released_at":169},351592,"v0.4.2","fix: prevent docker volume override source\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.4.1...v0.4.2","2023-11-15T15:47:40",{"id":171,"version":172,"summary_zh":173,"released_at":174},351593,"v0.4.1","[fix: raise for requests status error](https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcommit\u002F9ab7db06f3c487027e310ea848e475af9be391a2)\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.4.0...v0.4.1","2023-11-11T18:04:53",{"id":176,"version":177,"summary_zh":178,"released_at":179},351594,"v0.4.0","## Release Notes for v0.4.0\r\n\r\nWe're excited to announce the release of `epub_to_audiobook` v0.4.0, which now includes support for OpenAI's Text-to-Speech (TTS)! Here's what's new:\r\n\r\n- **OpenAI TTS Integration:** You can now use OpenAI's TTS by running `python3 epub_to_audiobook.py input output --tts openai`. To explore the variety of voices and output formats available, visit our [OpenAI TTS Options](https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook#more-examples).\r\n\r\n- **Codebase Refactor:** The codebase has been refactored for better flexibility, making the addition of more TTS providers a breeze.\r\n\r\n- **Compatibility Assurance:** Your existing CLI commands remain intact. Azure TTS users will experience no change in their workflow.\r\n\r\n- **Language Support:** OpenAI TTS detects language automatically for conversion. 
However, using the `--language` option to set your ebook's language will enhance the text-splitting strategy, which is especially beneficial for Chinese characters.\r\n\r\n- **Preview Option:** Use the `--preview` flag for a chapter summary and character count before conversion.\r\n\r\nFor a sneak peek of OpenAI's TTS, listen to a sample [here](https:\u002F\u002Faudio.com\u002Fpaudi\u002Faudio\u002Fopenai-0008-chapter-vii-agricultural-experience-i-had-now-been-in).\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.3.0...v0.4.0","2023-11-10T12:50:34",{"id":181,"version":182,"summary_zh":183,"released_at":184},351595,"v0.3.0","## What's Changed\r\n* Adding --output_text and --remove_endnotes options by @jczinger in https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F5\r\n* Updating readme with documentation for new features by @jczinger in https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F6\r\n\r\n## New Contributors\r\n* @jczinger made their first contribution in https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fpull\u002F5\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.2.1...v0.3.0","2023-11-10T12:17:53",{"id":186,"version":187,"summary_zh":188,"released_at":189},351596,"v0.2.0","# Release Notes for `epub_to_audiobook` v0.2.0\r\n\r\nHello everyone! 🎉 We're excited to introduce a new update for our project, `epub_to_audiobook`. Your feedback has been invaluable, and we've worked hard to deliver some useful new features and improvements based on your great advice. Here are the changes in this release:\r\n\r\n## 🚀 New Features:\r\n\r\n- **Preview Mode**: Want a sneak peek? Now, you can preview the chapter titles without actually converting the text to speech. 
Great for ensuring you're targeting the right content!\r\n- **Break Between Paragraphs**: Get a better listening experience with adjustable breaks between paragraphs. You can adjust the break duration for different sections or paragraphs, making the audiobook feel more natural.\r\n- **Target Specific Chapters**: You can now select which chapters to convert, giving you more control over the content of your audiobook.\r\n- **Custom Audio Output Format**: Choose the desired audio quality and file size to suit your needs. We've added support for various formats for a flexible listening experience.\r\n- **Newline Mode Options**: Customize how you detect new paragraphs. Whether your ePub uses single or double newlines, we've got you covered.\r\n\r\n## 🛠 Improvements:\r\n\r\n- Improved text stripping for cleaner content extraction.\r\n- Refined the chapter extraction process for better accuracy.\r\n- Enhanced logging for better debugging and traceability. You can now adjust the log level to your preference.\r\n- Increased retry count for text-to-speech conversion to ensure a higher success rate.\r\n\r\n## 📘 Documentation:\r\n\r\n- Added documentation for the `preview` and many other options.\r\n- Revised the usage guide to help you make the most of the new features.\r\n\r\n## 🙏 Acknowledgements:\r\n\r\nA big thank you to our community of users for providing valuable feedback and advice. Your insights continue to shape the direction of this project. Keep those suggestions coming!\r\n\r\n---\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fcompare\u002Fv0.1.0...v0.2.0\r\n\r\nFor more details on usage and configurations, please refer to the project's [`README.md`](https:\u002F\u002Fgithub.com\u002Fp0n1\u002Fepub_to_audiobook\u002Fblob\u002Fmain\u002FREADME.md).\r\n\r\nHappy listening! 
🎧","2023-09-20T10:45:39",{"id":191,"version":192,"summary_zh":15,"released_at":193},351597,"v0.1.0","2023-09-18T16:52:26",[195,205,215,223,231,243],{"id":196,"name":197,"github_repo":198,"description_zh":199,"stars":200,"difficulty_score":37,"last_commit_at":201,"category_tags":202,"status":57},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160784,"2026-04-19T11:32:54",[203,204,48],"开发框架","Agent",{"id":206,"name":207,"github_repo":208,"description_zh":209,"stars":210,"difficulty_score":211,"last_commit_at":212,"category_tags":213,"status":57},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,3,"2026-04-06T11:19:32",[48,214,204,203],"图像",{"id":216,"name":217,"github_repo":218,"description_zh":219,"stars":220,"difficulty_score":37,"last_commit_at":221,"category_tags":222,"status":57},8553,"spec-kit","github\u002Fspec-kit","Spec Kit 是一款专为提升软件开发效率而设计的开源工具包，旨在帮助团队快速落地“规格驱动开发”（Spec-Driven Development）模式。传统开发中，需求文档往往与代码实现脱节，导致沟通成本高且结果不可控；而 Spec 
Kit 通过将规格说明书转化为可执行的指令，让 AI 直接依据明确的业务场景生成高质量代码，从而减少从零开始的随意编码，确保产出结果的可预测性。\n\n该工具特别适合希望利用 AI 辅助编程的开发者、技术负责人及初创团队。无论是启动全新项目还是在现有工程中引入规范化流程，用户只需通过简单的命令行操作，即可初始化项目并集成主流的 AI 编程助手。其核心技术亮点在于“规格即代码”的理念，支持社区扩展与预设模板，允许用户根据特定技术栈定制开发流程。此外，Spec Kit 强调官方维护的安全性，提供稳定的版本管理，帮助开发者在享受 AI 红利的同时，依然牢牢掌握架构设计的主动权，真正实现从“凭感觉写代码”到“按规格建系统”的转变。",88749,"2026-04-17T09:48:14",[48,214,204,203],{"id":224,"name":225,"github_repo":226,"description_zh":227,"stars":228,"difficulty_score":37,"last_commit_at":229,"category_tags":230,"status":57},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[203,48],{"id":232,"name":233,"github_repo":234,"description_zh":235,"stars":236,"difficulty_score":37,"last_commit_at":237,"category_tags":238,"status":57},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 
将是理想的起点。",85267,"2026-04-18T11:00:28",[214,239,240,241,204,242,48,203,47],"数据工具","视频","插件","其他",{"id":244,"name":245,"github_repo":246,"description_zh":247,"stars":248,"difficulty_score":249,"last_commit_at":250,"category_tags":251,"status":57},5784,"funNLP","fighting41love\u002FfunNLP","funNLP 是一个专为中文自然语言处理（NLP）打造的超级资源库，被誉为\"NLP 民工的乐园”。它并非单一的软件工具，而是一个汇集了海量开源项目、数据集、预训练模型和实用代码的综合性平台。\n\n面对中文 NLP 领域资源分散、入门门槛高以及特定场景数据匮乏的痛点，funNLP 提供了“一站式”解决方案。这里不仅涵盖了分词、命名实体识别、情感分析、文本摘要等基础任务的标准工具，还独特地收录了丰富的垂直领域资源，如法律、医疗、金融行业的专用词库与数据集，甚至包含古诗词生成、歌词创作等趣味应用。其核心亮点在于极高的全面性与实用性，从基础的字典词典到前沿的 BERT、GPT-2 模型代码，再到高质量的标注数据和竞赛方案，应有尽有。\n\n无论是刚刚踏入 NLP 领域的学生、需要快速验证想法的算法工程师，还是从事人工智能研究的学者，都能在这里找到急需的“武器弹药”。对于开发者而言，它能大幅减少寻找数据和复现模型的时间；对于研究者，它提供了丰富的基准测试资源和前沿技术参考。funNLP 以开放共享的精神，极大地降低了中文自然语言处理的开发与研究成本，是中文 AI 社区不可或缺的宝藏仓库。",79857,1,"2026-04-08T20:11:31",[48,239,242]]