[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tool-suno-ai--bark":3,"similar-suno-ai--bark":86},{"id":4,"github_repo":5,"name":6,"description_en":7,"description_zh":8,"ai_summary_zh":8,"readme_en":9,"readme_zh":10,"quickstart_zh":11,"use_case_zh":12,"hero_image_url":13,"owner_login":14,"owner_name":15,"owner_avatar_url":16,"owner_bio":17,"owner_company":18,"owner_location":18,"owner_email":18,"owner_twitter":18,"owner_website":19,"owner_url":20,"languages":21,"stars":30,"forks":31,"last_commit_at":32,"license":33,"difficulty_score":34,"env_os":35,"env_gpu":36,"env_ram":35,"env_deps":37,"category_tags":43,"github_topics":18,"view_count":45,"oss_zip_url":18,"oss_zip_packed_at":18,"status":46,"created_at":47,"updated_at":48,"faqs":49,"releases":85},3108,"suno-ai\u002Fbark","bark","🔊 Text-Prompted Generative Audio Model","Bark 是由 Suno 推出的开源生成式音频模型，能够根据文本提示创造出高度逼真的多语言语音、音乐、背景噪音及简单音效。与传统仅能朗读文字的语音合成工具不同，Bark 基于 Transformer 架构，不仅能模拟说话，还能生成笑声、叹息、哭泣等非语言声音，甚至能处理带有情感色彩和语气停顿的复杂文本，极大地丰富了音频表达的可能性。\n\n它主要解决了传统语音合成声音机械、缺乏情感以及无法生成非语音类音效的痛点，让创作者能通过简单的文字描述获得生动自然的音频素材。无论是需要为视频配音的内容创作者、探索多模态生成的研究人员，还是希望快速原型设计的开发者，都能从中受益。普通用户也可通过集成的演示页面轻松体验其神奇效果。\n\n技术亮点方面，Bark 支持商业使用（MIT 许可），并在近期更新中实现了显著的推理速度提升，同时提供了适配低显存 GPU 的版本，降低了使用门槛。此外，社区还建立了丰富的提示词库，帮助用户更好地驾驭模型生成特定风格的声音。只需几行 Python 代码，即可将创意文本转化为高质量音频，是连接文字与声音世界的强大桥梁。","> Notice: Bark is Suno's open-source text-to-speech+ model. If you are looking for our text-to-music models, please visit us on our [web page](https:\u002F\u002Fsuno.ai) and join our community on [Discord](https:\u002F\u002Fsuno.ai\u002Fdiscord). 
\n\n     \n# 🐶 Bark\n\n[![](https:\u002F\u002Fdcbadge.vercel.app\u002Fapi\u002Fserver\u002FJ2B2vsjKuE?style=flat&compact=True)](https:\u002F\u002Fsuno.ai\u002Fdiscord)\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl\u002Fhttps\u002Ftwitter.com\u002FFM.svg?style=social&label=@suno_ai_)](https:\u002F\u002Ftwitter.com\u002Fsuno_ai_)\n\n> 🔗 [Examples](https:\u002F\u002Fsuno.ai\u002Fexamples\u002Fbark-v0) • [Suno Studio Waitlist](https:\u002F\u002Fsuno-ai.typeform.com\u002Fsuno-studio) • [Updates](#-updates) • [How to Use](#-usage-in-python) • [Installation](#-installation) • [FAQ](#-faq)\n\n[\u002F\u002F]: \u003Cbr> (vertical spaces around image)\n\u003Cbr>\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsuno-ai_bark_readme_27b0ac3f8edb.png\" width=\"500\">\u003C\u002Fimg>\n\u003C\u002Fp>\n\u003Cbr>\n\nBark is a transformer-based text-to-audio model created by [Suno](https:\u002F\u002Fsuno.ai). Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints, which are ready for inference and available for commercial use.\n\n## ⚠ Disclaimer\nBark was developed for research purposes. It is not a conventional text-to-speech model but instead a fully generative text-to-audio model, which can deviate in unexpected ways from provided prompts. Suno does not take responsibility for any output generated. 
Use at your own risk, and please act responsibly.\n\n## 📖 Quick Index\n* [🚀 Updates](#-updates)\n* [💻 Installation](#-installation)\n* [🐍 Usage](#-usage-in-python)\n* [🌀 Live Examples](https:\u002F\u002Fsuno.ai\u002Fexamples\u002Fbark-v0)\n* [❓ FAQ](#-faq)\n\n## 🎧 Demos  \n\n[![Open in Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-Open%20in%20Spaces-blue.svg)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fsuno\u002Fbark)\n[![Open on Replicate](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F®️-Open%20on%20Replicate-blue.svg)](https:\u002F\u002Freplicate.com\u002Fsuno-ai\u002Fbark)\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1eJfA2XUa-mXwdMy7DoYKVYHI1iTd9Vkt?usp=sharing)\n\n## 🚀 Updates\n\n**2023.05.01**\n- ©️ Bark is now licensed under the MIT License, meaning it's now available for commercial use!  \n- ⚡ 2x speed-up on GPU. 10x speed-up on CPU. We also added an option for a smaller version of Bark, which offers additional speed-up with the trade-off of slightly lower quality. \n- 📕 [Long-form generation](notebooks\u002Flong_form_generation.ipynb), voice consistency enhancements and other examples are now documented in a new [notebooks](.\u002Fnotebooks) section.\n- 👥 We created a [voice prompt library](https:\u002F\u002Fsuno-ai.notion.site\u002F8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c). We hope this resource helps you find useful prompts for your use cases! You can also join us on [Discord](https:\u002F\u002Fsuno.ai\u002Fdiscord), where the community actively shares useful prompts in the **#audio-prompts** channel.  
\n- 💬 Growing community support and access to new features here: \n\n     [![](https:\u002F\u002Fdcbadge.vercel.app\u002Fapi\u002Fserver\u002FJ2B2vsjKuE)](https:\u002F\u002Fsuno.ai\u002Fdiscord)\n\n- 💾 You can now use Bark with GPUs that have low VRAM (\u003C4GB).\n\n**2023.04.20**\n- 🐶 Bark release!\n\n## 🐍 Usage in Python\n\n\u003Cdetails open>\n  \u003Csummary>\u003Ch3>🪑 Basics\u003C\u002Fh3>\u003C\u002Fsummary>\n\n```python\nfrom bark import SAMPLE_RATE, generate_audio, preload_models\nfrom scipy.io.wavfile import write as write_wav\nfrom IPython.display import Audio\n\n# download and load all models\npreload_models()\n\n# generate audio from text\ntext_prompt = \"\"\"\n     Hello, my name is Suno. And, uh — and I like pizza. [laughs] \n     But I also have other interests such as playing tic tac toe.\n\"\"\"\naudio_array = generate_audio(text_prompt)\n\n# save audio to disk\nwrite_wav(\"bark_generation.wav\", SAMPLE_RATE, audio_array)\n  \n# play text in notebook\nAudio(audio_array, rate=SAMPLE_RATE)\n```\n     \n[pizza1.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F34592747\u002Fcfa98e54-721c-4b9c-b962-688e09db684f.webm)\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>\u003Ch3>🌎 Foreign Language\u003C\u002Fh3>\u003C\u002Fsummary>\n\u003Cbr>\nBark supports various languages out-of-the-box and automatically determines language from input text. When prompted with code-switched text, Bark will attempt to employ the native accent for the respective languages. English quality is best for the time being, and we expect other languages to further improve with scaling. \n\u003Cbr>\n\u003Cbr>\n\n```python\n\ntext_prompt = \"\"\"\n    추석은 내가 가장 좋아하는 명절이다. 
나는 며칠 동안 휴식을 취하고 친구 및 가족과 시간을 보낼 수 있습니다.\n\"\"\"\naudio_array = generate_audio(text_prompt)\n```\n[suno_korean.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F32879321\u002F235313033-dc4477b9-2da0-4b94-9c8b-a8c2d8f5bb5e.webm)\n  \n*Note: since Bark recognizes languages automatically from input text, it is possible to use, for example, a German history prompt with English text. This usually leads to English audio with a German accent.*\n```python\ntext_prompt = \"\"\"\n    Der Dreißigjährige Krieg (1618-1648) war ein verheerender Konflikt, der Europa stark geprägt hat.\n    This is a beginning of the history. If you want to hear more, please continue.\n\"\"\"\naudio_array = generate_audio(text_prompt)\n```\n[suno_german_accent.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F34592747\u002F3f96ab3e-02ec-49cb-97a6-cf5af0b3524a.webm)\n\n\n     \n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>\u003Ch3>🎶 Music\u003C\u002Fh3>\u003C\u002Fsummary>\nBark can generate all types of audio, and, in principle, doesn't see a difference between speech and music. Sometimes Bark chooses to generate text as music, but you can help it out by adding music notes around your lyrics.\n\u003Cbr>\n\u003Cbr>\n\n```python\ntext_prompt = \"\"\"\n    ♪ In the jungle, the mighty jungle, the lion barks tonight ♪\n\"\"\"\naudio_array = generate_audio(text_prompt)\n```\n[lion.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F5068315\u002F230684766-97f5ea23-ad99-473c-924b-66b6fab24289.webm)\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>\u003Ch3>🎤 Voice Presets\u003C\u002Fh3>\u003C\u002Fsummary>\n  \nBark supports 100+ speaker presets across [supported languages](#supported-languages). You can browse the library of supported voice presets [HERE](https:\u002F\u002Fsuno-ai.notion.site\u002F8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c), or in the [code](bark\u002Fassets\u002Fprompts). 
The community also often shares presets in [Discord](https:\u002F\u002Fdiscord.gg\u002FJ2B2vsjKuE).\n\n> Bark tries to match the tone, pitch, emotion and prosody of a given preset, but does not currently support custom voice cloning. The model also attempts to preserve music, ambient noise, etc.\n\n```python\ntext_prompt = \"\"\"\n    I have a silky smooth voice, and today I will tell you about \n    the exercise regimen of the common sloth.\n\"\"\"\naudio_array = generate_audio(text_prompt, history_prompt=\"v2\u002Fen_speaker_1\")\n```\n\n[sloth.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F5068315\u002F230684883-a344c619-a560-4ff5-8b99-b4463a34487b.webm)\n\u003C\u002Fdetails>\n\n### 📃 Generating Longer Audio\n  \nBy default, `generate_audio` works well with around 13 seconds of spoken text. For an example of how to do long-form generation, see 👉 **[Notebook](notebooks\u002Flong_form_generation.ipynb)** 👈\n\n\u003Cdetails>\n\u003Csummary>Click to toggle example long-form generations (from the example notebook)\u003C\u002Fsummary>\n\n[dialog.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F2565833\u002F235463539-f57608da-e4cb-4062-8771-148e29512b01.webm)\n\n[longform_advanced.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F2565833\u002F235463547-1c0d8744-269b-43fe-9630-897ea5731652.webm)\n\n[longform_basic.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F2565833\u002F235463559-87efe9f8-a2db-4d59-b764-57db83f95270.webm)\n\n\u003C\u002Fdetails>\n\n\n## Command line\n```commandline\npython -m bark --text \"Hello, my name is Suno.\" --output_filename \"example.wav\"\n```\n\n## 💻 Installation\n*‼️ CAUTION ‼️ Do NOT use `pip install bark`. It installs a different package, which is not managed by Suno.*\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark.git\n```\n\nor\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark\ncd bark && pip install . 
\n```\n\n\n## 🤗 Transformers Usage\n\nBark is available in the 🤗 Transformers library from version 4.31.0 onwards, requiring minimal dependencies \nand additional packages. Steps to get started:\n\n1. First install the 🤗 [Transformers library](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) from main:\n\n```\npip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers.git\n```\n\n2. Run the following Python code to generate speech samples:\n\n```py\nfrom transformers import AutoProcessor, BarkModel\n\nprocessor = AutoProcessor.from_pretrained(\"suno\u002Fbark\")\nmodel = BarkModel.from_pretrained(\"suno\u002Fbark\")\n\nvoice_preset = \"v2\u002Fen_speaker_6\"\n\ninputs = processor(\"Hello, my dog is cute\", voice_preset=voice_preset)\n\naudio_array = model.generate(**inputs)\naudio_array = audio_array.cpu().numpy().squeeze()\n```\n\n3. Listen to the audio samples either in an ipynb notebook:\n\n```py\nfrom IPython.display import Audio\n\nsample_rate = model.generation_config.sample_rate\nAudio(audio_array, rate=sample_rate)\n```\n\nOr save them as a `.wav` file using a third-party library, e.g. `scipy`:\n\n```py\nimport scipy\n\nsample_rate = model.generation_config.sample_rate\nscipy.io.wavfile.write(\"bark_out.wav\", rate=sample_rate, data=audio_array)\n```\n\nFor more details on using the Bark model for inference using the 🤗 Transformers library, refer to the \n[Bark docs](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fen\u002Fmodel_doc\u002Fbark) or the hands-on \n[Google Colab](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1dWWkZzvu7L9Bunq9zvD-W02RFUXoW-Pd?usp=sharing).\n\n\n## 🛠️ Hardware and Inference Speed\n\nBark has been tested and works on both CPU and GPU (`pytorch 2.0+`, CUDA 11.7 and CUDA 12.0).\n\nOn enterprise GPUs and PyTorch nightly, Bark can generate audio in roughly real-time. On older GPUs, default colab, or CPU, inference time might be significantly slower. 
For older GPUs or CPU you might want to consider using smaller models. Details can be found in our tutorial sections.\n\nThe full version of Bark requires around 12GB of VRAM to hold everything on GPU at the same time. \nTo use a smaller version of the models, which should fit into 8GB VRAM, set the environment flag `SUNO_USE_SMALL_MODELS=True`.\n\nIf you don't have hardware available or if you want to play with bigger versions of our models, you can also sign up for early access to our model playground [here](https:\u002F\u002Fsuno-ai.typeform.com\u002Fsuno-studio).\n\n## ⚙️ Details\n\nBark is a fully generative text-to-audio model developed for research and demo purposes. It follows a GPT-style architecture similar to [AudioLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.03143) and [Vall-E](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.02111) and uses a quantized audio representation from [EnCodec](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fencodec). It is not a conventional TTS model, but instead a fully generative text-to-audio model capable of deviating in unexpected ways from any given script. Unlike previous approaches, the input text prompt is converted directly to audio without the intermediate use of phonemes. It can therefore generalize to arbitrary instructions beyond speech such as music lyrics, sound effects or other non-speech sounds.\n\nBelow is a list of some known non-speech sounds, but we are finding more every day. 
Please let us know if you find patterns that work particularly well on [Discord](https:\u002F\u002Fsuno.ai\u002Fdiscord)!\n\n- `[laughter]`\n- `[laughs]`\n- `[sighs]`\n- `[music]`\n- `[gasps]`\n- `[clears throat]`\n- `—` or `...` for hesitations\n- `♪` for song lyrics\n- CAPITALIZATION for emphasis of a word\n- `[MAN]` and `[WOMAN]` to bias Bark toward male and female speakers, respectively\n\n### Supported Languages\n\n| Language | Status |\n| --- | :---: |\n| English (en) | ✅ |\n| German (de) | ✅ |\n| Spanish (es) | ✅ |\n| French (fr) | ✅ |\n| Hindi (hi) | ✅ |\n| Italian (it) | ✅ |\n| Japanese (ja) | ✅ |\n| Korean (ko) | ✅ |\n| Polish (pl) | ✅ |\n| Portuguese (pt) | ✅ |\n| Russian (ru) | ✅ |\n| Turkish (tr) | ✅ |\n| Chinese, simplified (zh) | ✅ |\n\nRequests for future language support [here](https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark\u002Fdiscussions\u002F111) or in the **#forums** channel on [Discord](https:\u002F\u002Fsuno.ai\u002Fdiscord). \n\n## 🙏 Appreciation\n\n- [nanoGPT](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002FnanoGPT) for a dead-simple and blazing fast implementation of GPT-style models\n- [EnCodec](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fencodec) for a state-of-the-art implementation of a fantastic audio codec\n- [AudioLM](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Faudiolm-pytorch) for related training and inference code\n- [Vall-E](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.02111), [AudioLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.03143) and many other ground-breaking papers that enabled the development of Bark\n\n## © License\n\nBark is licensed under the MIT License. \n\n## 📱 Community\n\n- [Twitter](https:\u002F\u002Ftwitter.com\u002Fsuno_ai_)\n- [Discord](https:\u002F\u002Fsuno.ai\u002Fdiscord)\n\n## 🎧 Suno Studio (Early Access)\n\nWe’re developing a playground for our models, including Bark. 
\n\nIf you are interested, you can sign up for early access [here](https:\u002F\u002Fsuno-ai.typeform.com\u002Fsuno-studio).\n\n## ❓ FAQ\n\n#### How do I specify where models are downloaded and cached?\n* Bark uses Hugging Face to download and store models. You can find more info [here](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fpackage_reference\u002Fenvironment_variables#hfhome). \n\n\n#### Bark's generations sometimes differ from my prompts. What's happening?\n* Bark is a GPT-style model. As such, it may take some creative liberties in its generations, resulting in higher-variance model outputs than traditional text-to-speech approaches.\n\n#### What voices are supported by Bark?  \n* Bark supports 100+ speaker presets across [supported languages](#supported-languages). You can browse the library of speaker presets [here](https:\u002F\u002Fsuno-ai.notion.site\u002F8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c). The community also shares presets in [Discord](https:\u002F\u002Fsuno.ai\u002Fdiscord). Bark also supports generating unique random voices that fit the input text. Bark does not currently support custom voice cloning.\n\n#### Why is the output limited to ~13-14 seconds?\n* Bark is a GPT-style model, and its architecture\u002Fcontext window is optimized to output generations with roughly this length.\n\n#### How much VRAM do I need?\n* The full version of Bark requires around 12GB of memory to hold everything on GPU at the same time. However, even smaller cards down to ~2GB work with some additional settings. Simply set the following environment variables before importing `bark`: \n\n```python\nimport os\nos.environ[\"SUNO_OFFLOAD_CPU\"] = \"True\"\nos.environ[\"SUNO_USE_SMALL_MODELS\"] = \"True\"\n```\n\n#### My generated audio sounds like a 1980s phone call. What's happening?\n* Bark generates audio from scratch. It is not meant to create only high-fidelity, studio-quality speech. 
Rather, outputs could be anything from perfect speech to multiple people arguing at a baseball game recorded with bad microphones.\n","> 注意：Bark 是 Suno 的开源文本转音频模型。如果您正在寻找我们的文本转音乐模型，请访问我们的[网页](https:\u002F\u002Fsuno.ai)并在[Discord](https:\u002F\u002Fsuno.ai\u002Fdiscord)上加入我们的社区。\n\n     \n# 🐶 Bark\n\n[![](https:\u002F\u002Fdcbadge.vercel.app\u002Fapi\u002Fserver\u002FJ2B2vsjKuE?style=flat&compact=True)](https:\u002F\u002Fsuno.ai\u002Fdiscord)\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl\u002Fhttps\u002Ftwitter.com\u002FFM.svg?style=social&label=@suno_ai_)](https:\u002F\u002Ftwitter.com\u002Fsuno_ai_)\n\n> 🔗 [示例](https:\u002F\u002Fsuno.ai\u002Fexamples\u002Fbark-v0) • [Suno Studio 等待名单](https:\u002F\u002Fsuno-ai.typeform.com\u002Fsuno-studio) • [更新](#-updates) • [使用方法](#-usage-in-python) • [安装](#-installation) • [常见问题解答](#-faq)\n\n[\u002F\u002F]: \u003Cbr> (图片周围的垂直空行)\n\u003Cbr>\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsuno-ai_bark_readme_27b0ac3f8edb.png\" width=\"500\">\u003C\u002Fimg>\n\u003C\u002Fp>\n\u003Cbr>\n\nBark 是由 [Suno](https:\u002F\u002Fsuno.ai) 开发的基于 Transformer 的文本转音频模型。Bark 能够生成高度逼真的多语言语音以及其他音频内容，包括音乐、背景噪音和简单的音效。该模型还能模拟非语言交流，如笑声、叹息和哭泣。为了支持研究社区，我们提供了预训练模型检查点的访问权限，这些检查点已准备好用于推理，并可用于商业用途。\n\n## ⚠ 免责声明\nBark 是为研究目的而开发的。它并非传统的文本转语音模型，而是一个完全生成式的文本转音频模型，可能会以意想不到的方式偏离所提供的提示。Suno 对任何生成的输出不承担任何责任。请自行承担风险并负责任地使用。\n\n## 📖 快速索引\n* [🚀 更新](#-updates)\n* [💻 安装](#-installation)\n* [🐍 使用](#-usage-in-python)\n* [🌀 实时示例](https:\u002F\u002Fsuno.ai\u002Fexamples\u002Fbark-v0)\n* [❓ 常见问题解答](#-faq)\n\n## 🎧 演示\n\n[![在 Spaces 中打开](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-Open%20in%20Spaces-blue.svg)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fsuno\u002Fbark)\n[![在 Replicate 上打开](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F®️-Open%20on%20Replicate-blue.svg)](https:\u002F\u002Freplicate.com\u002Fsuno-ai\u002Fbark)\n[![在 Colab 
中打开](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1eJfA2XUa-mXwdMy7DoYKVYHI1iTd9Vkt?usp=sharing)\n\n## 🚀 更新\n\n**2023年5月1日**\n- ©️ Bark 现在采用 MIT 许可证授权，这意味着它现在可以用于商业用途！  \n- ⚡ GPU 上的速度提升 2 倍。CPU 上的速度提升 10 倍。我们还增加了一个较小版本的 Bark 选项，虽然质量稍低，但速度更快。 \n- 📕 [长文本生成](notebooks\u002Flong_form_generation.ipynb)、语音一致性增强以及其他示例现已记录在一个新的 [notebooks](.\u002Fnotebooks) 部分中。\n- 👥 我们创建了一个 [语音提示库](https:\u002F\u002Fsuno-ai.notion.site\u002F8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c)。我们希望这个资源能帮助您找到适合自己用例的有用提示！您也可以加入我们的 [Discord](https:\u002F\u002Fsuno.ai\u002Fdiscord)，社区成员会在 **#audio-prompts** 频道中积极分享有用的提示。  \n- 💬 不断增长的社区支持以及新功能的访问权限：\n\n     [![](https:\u002F\u002Fdcbadge.vercel.app\u002Fapi\u002Fserver\u002FJ2B2vsjKuE)](https:\u002F\u002Fsuno.ai\u002Fdiscord)\n\n- 💾 现在您可以将 Bark 与显存较低（\u003C4GB）的 GPU 一起使用。\n\n**2023年4月20日**\n- 🐶 Bark 正式发布！\n\n## 🐍 在 Python 中的使用\n\n\u003Cdetails open>\n  \u003Csummary>\u003Ch3>🪑 基础知识\u003C\u002Fh3>\u003C\u002Fsummary>\n\n```python\nfrom bark import SAMPLE_RATE, generate_audio, preload_models\nfrom scipy.io.wavfile import write as write_wav\nfrom IPython.display import Audio\n\n# 下载并加载所有模型\npreload_models()\n\n# 根据文本生成音频\ntext_prompt = \"\"\"\n     你好，我叫 Suno。嗯——我喜欢披萨。[laughs] \n     不过我也对玩井字棋之类的活动很感兴趣。\n\"\"\"\naudio_array = generate_audio(text_prompt)\n\n# 将音频保存到磁盘\nwrite_wav(\"bark_generation.wav\", SAMPLE_RATE, audio_array)\n\n# 在笔记本中播放音频\nAudio(audio_array, rate=SAMPLE_RATE)\n```\n     \n[pizza1.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F34592747\u002Fcfa98e54-721c-4b9c-b962-688e09db684f.webm)\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>\u003Ch3>🌎 外语\u003C\u002Fh3>\u003C\u002Fsummary>\n\u003Cbr>\nBark 默认支持多种语言，并能自动从输入文本中识别语言。当遇到语码转换（多语言混用）的文本时，Bark 会尝试为每种语言采用其对应的母语口音。目前英语的质量最佳，我们预计随着模型规模的扩大，其他语言的表现也会进一步提升。\n\u003Cbr>\n\u003Cbr>\n\n```python\n\ntext_prompt = \"\"\"\n    中秋节是我最喜欢的节日。我可以休息几天，和朋友及家人一起度过时光。\n\"\"\"\naudio_array = 
generate_audio(text_prompt)\n```\n[suno_korean.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F32879321\u002F235313033-dc4477b9-2da0-4b94-9c8b-a8c2d8f5bb5e.webm)\n  \n*注：由于Bark会自动从输入文本中识别语言，因此可以使用德语历史提示配合英文文本。这通常会产生带有德语口音的英语音频。*\n```python\ntext_prompt = \"\"\"\n    三十年战争（1618-1648）是一场对欧洲影响深远的毁灭性冲突。\n    这是历史的开端。如果想听更多，请继续。\n\"\"\"\naudio_array = generate_audio(text_prompt)\n```\n[suno_german_accent.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F34592747\u002F3f96ab3e-02ec-49cb-97a6-cf5af0b3524a.webm)\n\n\n     \n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>\u003Ch3>🎶 音乐\u003C\u002Fh3>\u003C\u002Fsummary>\nBark 可以生成各种类型的音频，原则上它并不区分语音和音乐。有时 Bark 会选择将文本生成为音乐，但你可以在歌词周围加入音乐记号来引导它。\n\u003Cbr>\n\u003Cbr>\n\n```python\ntext_prompt = \"\"\"\n    ♪ 在丛林里，那雄伟的丛林，今晚狮子在咆哮 ♪\n\"\"\"\naudio_array = generate_audio(text_prompt)\n```\n[lion.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F5068315\u002F230684766-97f5ea23-ad99-473c-924b-66b6fab24289.webm)\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>\u003Ch3>🎤 语音预设\u003C\u002Fh3>\u003C\u002Fsummary>\n  \nBark 支持超过100种说话人预设，涵盖所有[支持的语言](#supported-languages)。你可以在[这里](https:\u002F\u002Fsuno-ai.notion.site\u002F8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c)浏览支持的语音预设库，或在[bark\u002Fassets\u002Fprompts](bark\u002Fassets\u002Fprompts)中查看代码。社区成员也经常在[Discord](https:\u002F\u002Fdiscord.gg\u002FJ2B2vsjKuE)上分享预设。\n\n> Bark 会尽量匹配给定预设的语气、音调、情感和韵律，但目前不支持自定义语音克隆。该模型还会尝试保留音乐、环境噪音等。\n\n```python\ntext_prompt = \"\"\"\n    我的声音非常柔和顺滑，今天我要跟大家聊聊普通树懒的锻炼计划。\n\"\"\"\naudio_array = generate_audio(text_prompt, history_prompt=\"v2\u002Fen_speaker_1\")\n```\n\n[sloth.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F5068315\u002F230684883-a344c619-a560-4ff5-8b99-b4463a34487b.webm)\n\u003C\u002Fdetails>\n\n### 📃 生成更长的音频\n  \n默认情况下，`generate_audio` 对于大约13秒的口语文本效果较好。如需了解如何进行长篇生成，请参阅 👉 **[Notebook](notebooks\u002Flong_form_generation.ipynb)** 
👈\n\n\u003Cdetails>\n\u003Csummary>点击展开示例长篇生成（来自示例笔记本）\u003C\u002Fsummary>\n\n[dialog.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F2565833\u002F235463539-f57608da-e4cb-4062-8771-148e29512b01.webm)\n\n[longform_advanced.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F2565833\u002F235463547-1c0d8744-269b-43fe-9630-897ea5731652.webm)\n\n[longform_basic.webm](https:\u002F\u002Fuser-images.githubusercontent.com\u002F2565833\u002F235463559-87efe9f8-a2db-4d59-b764-57db83f95270.webm)\n\n\u003C\u002Fdetails>\n\n\n## 命令行\n```commandline\npython -m bark --text \"你好，我叫Suno.\" --output_filename \"example.wav\"\n```\n\n## 💻 安装\n*‼️ 注意 ‼️ 请勿使用 `pip install bark`。这会安装一个由Suno未维护的其他软件包。*\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark.git\n```\n\n或者\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark\ncd bark && pip install . \n```\n\n\n## 🤗 Transformers 使用\n\n从版本4.31.0开始，Bark 已集成到🤗 Transformers 库中，所需依赖和额外包极少。入门步骤如下：\n\n1. 首先从主仓库安装🤗 [Transformers库](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)：\n\n```\npip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers.git\n```\n\n2. 运行以下Python代码生成语音样本：\n\n```py\nfrom transformers import AutoProcessor, BarkModel\n\nprocessor = AutoProcessor.from_pretrained(\"suno\u002Fbark\")\nmodel = BarkModel.from_pretrained(\"suno\u002Fbark\")\n\nvoice_preset = \"v2\u002Fen_speaker_6\"\n\ninputs = processor(\"你好，我的狗很可爱\", voice_preset=voice_preset)\n\naudio_array = model.generate(**inputs)\naudio_array = audio_array.cpu().numpy().squeeze()\n```\n\n3. 
可以在ipynb笔记本中直接收听音频样本：\n\n```py\nfrom IPython.display import Audio\n\nsample_rate = model.generation_config.sample_rate\nAudio(audio_array, rate=sample_rate)\n```\n\n或者使用第三方库（如`scipy`）将其保存为`.wav`文件：\n\n```py\nimport scipy\n\nsample_rate = model.generation_config.sample_rate\nscipy.io.wavfile.write(\"bark_out.wav\", rate=sample_rate, data=audio_array)\n```\n\n有关如何使用🤗 Transformers库进行Bark模型推理的更多信息，请参阅[Bark文档](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fen\u002Fmodel_doc\u002Fbark)或动手实践的[Google Colab](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1dWWkZzvu7L9Bunq9zvD-W02RFUXoW-Pd?usp=sharing)。\n\n## 🛠️ 硬件与推理速度\n\nBark 已在CPU和GPU上进行了测试并正常运行（`pytorch 2.0+`，CUDA 11.7和CUDA 12.0）。\n\n在企业级GPU和PyTorch nightly版本上，Bark 可以实现近实时的音频生成。而在较旧的GPU、默认Colab环境或CPU上，推理时间可能会显著变慢。对于老式GPU或CPU，建议使用较小的模型。详细信息可在我们的教程部分找到。\n\n完整版Bark需要约12GB显存才能同时将所有内容加载到GPU上。若要使用更适合8GB显存的小型模型，请设置环境变量 `SUNO_USE_SMALL_MODELS=True`。\n\n如果你没有可用的硬件，或想体验更大版本的模型，也可以在此处注册以提前访问我们的模型试用平台[这里](https:\u002F\u002Fsuno-ai.typeform.com\u002Fsuno-studio)。\n\n## ⚙️ 详情\n\nBark 是一款完全生成式的文本到音频模型，专为研究和演示目的而开发。它采用类似 GPT 的架构，与 [AudioLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.03143) 和 [Vall-E](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.02111) 类似，并使用来自 [EnCodec](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fencodec) 的量化音频表示。它并非传统的 TTS 模型，而是一个完全生成式的文本到音频模型，能够以意想不到的方式偏离给定的文本内容。与以往的方法不同，输入的文本提示会直接转换为音频，无需中间的音素步骤。因此，它可以泛化到语音之外的任意指令，例如音乐歌词、音效或其他非语音声音。\n\n以下是一些已知的非语音声音示例，但我们每天都在发现更多。如果您在 [Discord](https:\u002F\u002Fsuno.ai\u002Fdiscord) 上发现了特别有效的模式，请告诉我们！\n\n- `[laughter]`\n- `[laughs]`\n- `[sighs]`\n- `[music]`\n- `[gasps]`\n- `[clears throat]`\n- `—` 或 `...` 表示停顿\n- `♪` 表示歌曲歌词\n- 大写字母用于强调某个词\n- `[MAN]` 和 `[WOMAN]` 分别用于将 Bark 的输出偏向男性或女性说话者\n\n### 支持的语言\n\n| 语言 | 状态 |\n| --- | :---: |\n| 英语 (en) | ✅ |\n| 德语 (de) | ✅ |\n| 西班牙语 (es) | ✅ |\n| 法语 (fr) | ✅ |\n| 印地语 (hi) | ✅ |\n| 意大利语 (it) | ✅ |\n| 日语 (ja) | ✅ |\n| 韩语 (ko) | ✅ |\n| 波兰语 (pl) | ✅ |\n| 葡萄牙语 (pt) | ✅ |\n| 俄语 (ru) | ✅ |\n| 
土耳其语 (tr) | ✅ |\n| 中文（简体）(zh) | ✅ |\n\n如需未来支持其他语言，请在此处提交请求 [这里](https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark\u002Fdiscussions\u002F111)，或在 [Discord](https:\u002F\u002Fsuno.ai\u002Fdiscord) 的 **#forums** 频道中提出。\n\n## 🙏 致谢\n\n- [nanoGPT](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002FnanoGPT)，感谢其简单高效且速度极快的 GPT 样式模型实现。\n- [EnCodec](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fencodec)，感谢其先进的音频编解码器实现。\n- [AudioLM](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Faudiolm-pytorch)，感谢其相关的训练和推理代码。\n- [Vall-E](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.02111)、[AudioLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.03143) 以及其他许多开创性的论文，它们共同促成了 Bark 的开发。\n\n## © 许可证\n\nBark 采用 MIT 许可证授权。\n\n## 📱 社区\n\n- [Twitter](https:\u002F\u002Ftwitter.com\u002Fsuno_ai_)\n- [Discord](https:\u002F\u002Fsuno.ai\u002Fdiscord)\n\n## 🎧 Suno Studio（抢先体验）\n\n我们正在开发一个用于测试我们模型的平台，其中包括 Bark。\n\n如果您感兴趣，可以在此处注册抢先体验 [这里](https:\u002F\u002Fsuno-ai.typeform.com\u002Fsuno-studio)。\n\n## ❓ 常见问题解答\n\n#### 如何指定模型的下载和缓存位置？\n* Bark 使用 Hugging Face 来下载和存储模型。您可以在此处找到更多信息 [这里](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fpackage_reference\u002Fenvironment_variables#hfhome)。\n\n#### Bark 的生成结果有时与我的提示不符。这是为什么？\n* Bark 是一种 GPT 样式的模型。因此，它在生成时可能会加入一些创造性的自由发挥，导致输出结果比传统文本到语音方法更具多样性。\n\n#### Bark 支持哪些语音？\n* Bark 在所有 [支持的语言](#supported-languages) 中支持 100 多种预设语音。您可以在此处浏览语音预设库 [这里](https:\u002F\u002Fsuno-ai.notion.site\u002F8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c)。社区成员也会在 [Discord](https:\u002F\u002Fsuno.ai\u002Fdiscord) 上分享预设。此外，Bark 还支持根据输入文本生成独特的随机语音。目前，Bark 尚不支持自定义语音克隆。\n\n#### 为什么输出长度被限制在约 13–14 秒？\n* Bark 是一种 GPT 样式的模型，其架构和上下文窗口经过优化，适合生成大约这个长度的内容。\n\n#### 我需要多少显存？\n* Bark 的完整版本需要大约 12GB 显存才能将所有数据同时加载到 GPU 上。不过，即使是显存较小的显卡，最低至约 2GB，也可以通过一些额外设置来运行。只需在生成前添加以下代码片段：\n\n```python\nimport os\nos.environ[\"SUNO_OFFLOAD_CPU\"] = \"True\"\nos.environ[\"SUNO_USE_SMALL_MODELS\"] = \"True\"\n```\n\n#### 我生成的音频听起来像 20 世纪 80 年代的电话通话。这是怎么回事？\n* Bark 
是从零开始生成音频的，它并不旨在只生成高保真、录音室级别的语音。相反，它的输出可能从完美的语音，到用劣质麦克风录制的棒球比赛现场多人争吵的声音，各种情况都有可能。","# Bark 快速上手指南\n\nBark 是由 Suno 开发的开源文本转音频（Text-to-Audio）模型。它不仅能生成逼真的多语言语音，还能创作音乐、背景音效以及笑声、叹息等非语言声音。\n\n## 1. 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux, macOS, 或 Windows\n*   **Python 版本**: Python 3.8 - 3.10 (推荐 3.9)\n*   **硬件要求**:\n    *   **GPU (推荐)**: 需要 NVIDIA 显卡，显存建议 **8GB** 以上以获得最佳体验。若显存小于 4GB，需启用小模型模式。\n    *   **CPU**: 支持运行，但生成速度较慢。\n*   **前置依赖**:\n    *   PyTorch 2.0+\n    *   CUDA 11.7 或 12.0 (仅限 GPU 用户)\n\n> **注意**: 首次运行时会自动下载模型权重文件（约 2-3GB），请确保网络连接通畅。国内用户若下载缓慢，可尝试配置代理或使用支持断点续传的网络环境。\n\n## 2. 安装步骤\n\n⚠️ **重要提示**: 请勿直接运行 `pip install bark`，这会安装一个无关的第三方包。请务必使用以下官方源进行安装。\n\n### 方法一：直接通过 Git 安装（推荐）\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark.git\n```\n\n### 方法二：克隆仓库后安装\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark.git\ncd bark && pip install . \n```\n\n### 可选：启用小模型模式（针对低显存设备）\n\n如果您的 GPU 显存小于 8GB，建议在运行代码前设置环境变量以加载精简版模型：\n\n```bash\nexport SUNO_USE_SMALL_MODELS=True\n```\n*(Windows PowerShell 用户请使用: `$env:SUNO_USE_SMALL_MODELS=\"True\"`)*\n\n## 3. 基本使用\n\n以下是最简单的 Python 使用示例，包含模型加载、音频生成及保存。\n\n### 基础代码示例\n\n```python\nfrom bark import SAMPLE_RATE, generate_audio, preload_models\nfrom scipy.io.wavfile import write as write_wav\nfrom IPython.display import Audio\n\n# 1. 下载并加载所有模型 (首次运行时会下载)\npreload_models()\n\n# 2. 定义文本提示\n# Bark 支持自然语言描述，如 [laughs] 表示笑声，♪ 表示音乐\ntext_prompt = \"\"\"\n     Hello, my name is Suno. And, uh — and I like pizza. [laughs] \n     But I also have other interests such as playing tic tac toe.\n\"\"\"\n\n# 3. 生成音频\naudio_array = generate_audio(text_prompt)\n\n# 4. 保存音频到磁盘\nwrite_wav(\"bark_generation.wav\", SAMPLE_RATE, audio_array)\n  \n# 5. 
(可选) 在 Jupyter Notebook 中直接播放\n# Audio(audio_array, rate=SAMPLE_RATE)\n```\n\n### 进阶技巧简述\n\n*   **多语言支持**: 模型会自动识别输入文本的语言（支持中文、日文、韩文、德文等），无需额外配置。\n*   **音色预设**: 可以通过 `history_prompt` 参数指定音色。例如：`generate_audio(text, history_prompt=\"v2\u002Fzh_speaker_0\")` (具体预设名称需参考官方库)。\n*   **非语言声音**: 在文本中使用 `[laughter]`, `[sighs]`, `[music]` 等标签可控制情感和非语音输出。\n*   **长文本生成**: 默认单次生成适合约 13 秒的文本。如需生成长篇内容，建议分段生成后拼接，或参考官方提供的长文本生成 Notebook 示例。","一家独立游戏开发团队正在为一款叙事驱动的冒险游戏制作原型，需要快速生成包含多语种对白、笑声、叹息及背景音效的沉浸式音频素材。\n\n### 没有 bark 时\n- **成本高昂且周期长**：团队必须聘请专业配音演员和拟音师，录制、剪辑和混音流程繁琐，严重拖慢原型迭代速度。\n- **情感表达受限**：传统文本转语音工具语气生硬，无法自然呈现剧本中要求的“大笑”、“犹豫”或“哭泣”等非语言情感细节。\n- **多语言适配困难**：为支持全球发行，需针对不同语言分别寻找配音资源，协调成本高且难以保证角色音色的一致性。\n- **创意试错成本高**：每次修改剧本台词或调整情绪基调，都意味着重新预约录音棚和人员，导致开发者不敢轻易尝试新方案。\n\n### 使用 bark 后\n- **即时生成与迭代**：开发者只需输入带有情感标记（如 `[laughs]`）的文本提示，bark 即可在本地瞬间生成高质量音频，将数天的工作缩短至几分钟。\n- **细腻的情感还原**：bark 能精准识别并演绎剧本中的非语言沟通信号，生成的笑声和叹息自然逼真，极大提升了角色的真实感。\n- **无缝的多语言支持**：利用 bark 的多语言能力，团队用同一套工作流直接生成英、日、法等多语种对白，且保持了统一的音色风格。\n- **低成本创意探索**：修改台词或尝试不同情绪演绎变得零成本，策划人员可以实时听取多种版本效果，从而激发更多创意灵感。\n\nbark 让小型团队也能以极低的成本和门槛，创造出具备电影级情感表现力的动态音频内容。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsuno-ai_bark_27b0ac3f.png","suno-ai","Suno","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fsuno-ai_c03442fb.jpg","Foundation Models for Audio",null,"https:\u002F\u002Fwww.suno.com","https:\u002F\u002Fgithub.com\u002Fsuno-ai",[22,26],{"name":23,"color":24,"percentage":25},"Jupyter Notebook","#DA5B0B",58.1,{"name":27,"color":28,"percentage":29},"Python","#3572A5",41.9,39067,4687,"2026-04-04T03:33:35","MIT",3,"未说明","非必需（支持 CPU 和 GPU）。若使用 GPU，需 PyTorch 2.0+ 及 CUDA 11.7 或 12.0。完整版模型需约 12GB 显存；可设置环境变量 SUNO_USE_SMALL_MODELS=True 使用小模型版本，仅需约 8GB 显存。",{"notes":38,"python":35,"dependencies":39},"切勿使用 'pip install bark' 安装，需通过 git 地址安装。模型支持多种语言及非语音声音（如笑声、音乐）。首次运行需下载预训练模型。在旧 GPU 或 CPU 上推理速度可能较慢，建议使用小模型版本。",[40,41,42],"torch>=2.0","scipy","transformers>=4.31.0 
(可选)",[44],"音频",5,"ready","2026-03-27T02:49:30.150509","2026-04-06T06:53:21.356148",[50,55,60,65,70,75,80],{"id":51,"question_zh":52,"answer_zh":53,"source_url":54},14310,"为什么提示未使用 GPU 且推理速度极慢？如何启用 GPU 加速？","即使系统检测到 GPU，程序也可能默认运行在 CPU 上。解决方案取决于你的硬件：\n1. **NVIDIA 用户**：确保安装了支持 CUDA 的 PyTorch 版本。尝试卸载现有 torch 并重新安装：`pip uninstall torch`，然后访问 https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F 获取适合你系统的安装命令。\n2. **AMD 用户 (Windows)**：虽然不支持 ROCM，但可以尝试使用 DirectML 加速，参考相关分支或教程。\n3. **通用检查**：运行 `python -c \"import torch; print(torch.cuda.is_available())\"` 确认 CUDA 是否可用。如果显示 True 但仍慢，检查代码中是否显式指定了设备。","https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark\u002Fissues\u002F202",{"id":56,"question_zh":57,"answer_zh":58,"source_url":59},14311,"如何在 Apple Silicon (M1\u002FM2) Mac 上运行 Bark？","Bark 支持通过 PyTorch 的 MPS (Metal Performance Shaders) 后端在 Apple Silicon 上运行。请按以下步骤操作：\n1. 创建并激活虚拟环境（推荐使用 conda）。\n2. 安装依赖：`pip install .` 和 `pip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers.git`。\n3. **关键步骤**：在运行脚本前导出环境变量以启用 MPS 回退：`export PYTORCH_ENABLE_MPS_FALLBACK=1`。\n4. 为了节省内存，建议同时设置：`export SUNO_OFFLOAD_CPU=True` 和 `export SUNO_USE_SMALL_MODELS=True`。\n5. 
在代码中确保没有强制使用 CUDA。","https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark\u002Fissues\u002F19",{"id":61,"question_zh":62,"answer_zh":63,"source_url":64},14312,"遇到 'CUDA out of memory' (显存不足) 错误如何解决？","对于显存较小的显卡（如 RTX 3050 Ti 4GB），需要加载小型模型以避免溢出。\n**解决方法**：在导入 `bark` 库**之前**，必须设置环境变量。\n方法一（命令行）：\n`export SUNO_USE_SMALL_MODELS=True` (Linux\u002FMac) 或 `set SUNO_USE_SMALL_MODELS=True` (Windows)\n方法二（Python 代码开头）：\n```python\nimport os\nos.environ[\"SUNO_USE_SMALL_MODELS\"] = \"True\"\nos.environ[\"SUNO_OFFLOAD_CPU\"] = \"True\" # 可选，将部分计算卸载到 CPU\nfrom bark import generate_audio\n```\n注意：如果在导入 bark 之后才设置该变量，不会生效。","https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark\u002Fissues\u002F51",{"id":66,"question_zh":67,"answer_zh":68,"source_url":69},14313,"报错 'cannot import name SAMPLE_RATE from bark' 怎么办？","这通常是由命名冲突引起的。请检查你的项目文件夹，看是否有一个名为 `bark.py` 的文件。\n**原因**：如果你的脚本命名为 `bark.py`，Python 会导入当前文件而不是安装的 `bark` 库，导致找不到 `SAMPLE_RATE` 等属性。\n**解决方法**：将你的脚本文件重命名为其他名称（例如 `my_bark_test.py`），并删除同目录下生成的 `__pycache__` 文件夹或 `.pyc` 文件，然后重新运行。","https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark\u002Fissues\u002F195",{"id":71,"question_zh":72,"answer_zh":73,"source_url":74},14314,"如何消除 'The attention mask and the pad token id were not set' 警告？","可以通过在调用 `model.generate` 时显式传递 `attention_mask` 和 `pad_token_id` 参数来解决。\n**代码示例**：\n```python\nfrom transformers import AutoProcessor, BarkModel\n\nprocessor = AutoProcessor.from_pretrained(\"suno\u002Fbark\")\nmodel = BarkModel.from_pretrained(\"suno\u002Fbark\")\n\ntext = \"Good Morning\"\ninputs = processor(text, return_tensors=\"pt\")\n\naudio_array = model.generate(\n  input_ids=inputs[\"input_ids\"], \n  attention_mask=inputs[\"attention_mask\"], # 显式传入 attention_mask\n  pad_token_id=processor.tokenizer.pad_token_id # 显式传入 pad_token_id\n)\n```\n如果只想暂时隐藏警告，可以使用 `warnings.filterwarnings('ignore')` 或设置 `os.environ['TRANSFORMERS_VERBOSITY'] = 
'error'`，但这不推荐作为长期方案。","https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark\u002Fissues\u002F402",{"id":76,"question_zh":77,"answer_zh":78,"source_url":79},14315,"模型下载速度慢或不稳定，支持断点续传吗？","Hugging Face 的默认下载机制在某些网络环境下可能不稳定且不支持直观的断点续传。\n**解决方案**：\n1. **手动下载**：你可以手动从 Hugging Face Hub 下载模型文件，然后放置到本地缓存目录中。默认缓存位置为 `~\u002F.cache\u002Fhuggingface\u002Fhub` (Linux\u002FMac) 或 `C:\\Users\\\u003C用户名>\\.cache\\huggingface\\hub` (Windows)。\n2. **检查网络**：确保网络连接正常，有时简单的重试或切换网络节点能解决问题。\n3. **清理缓存**：如果下载中断导致文件损坏，尝试删除缓存目录中的对应文件夹后重新运行程序。","https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark\u002Fissues\u002F46",{"id":81,"question_zh":82,"answer_zh":83,"source_url":84},14316,"哪里可以找到 Bark 的使用文档和教程？","官方仓库中现在包含一个 `tutorials` 文件夹，里面有多个 Jupyter Notebook 示例，详细展示了如何使用 Bark 进行音频生成、控制说话人情感等。\n此外，社区非常活跃，你可以加入官方的 Discord 频道与其他用户交流经验和获取帮助。直接查看仓库根目录下的 README 或 tutorials 目录即可入门。","https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark\u002Fissues\u002F16",[],[87,104,112,120,128,137],{"id":88,"name":89,"github_repo":90,"description_zh":91,"stars":92,"difficulty_score":93,"last_commit_at":94,"category_tags":95,"status":46},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,2,"2026-04-05T10:45:23",[96,97,98,99,100,101,102,103,44],"图像","数据工具","视频","插件","Agent","其他","语言模型","开发框架",{"id":105,"name":106,"github_repo":107,"description_zh":108,"stars":109,"difficulty_score":34,"last_commit_at":110,"category_tags":111,"status":46},4128,"GPT-SoVITS","RVC-Boss\u002FGPT-SoVITS","GPT-SoVITS 
是一款强大的开源语音合成与声音克隆工具，旨在让用户仅需极少量的音频数据即可训练出高质量的个性化语音模型。它核心解决了传统语音合成技术依赖海量录音数据、门槛高且成本大的痛点，实现了“零样本”和“少样本”的快速建模：用户只需提供 5 秒参考音频即可即时生成语音，或使用 1 分钟数据进行微调，从而获得高度逼真且相似度极佳的声音效果。\n\n该工具特别适合内容创作者、独立开发者、研究人员以及希望为角色配音的普通用户使用。其内置的友好 WebUI 界面集成了人声伴奏分离、自动数据集切片、中文语音识别及文本标注等辅助功能，极大地降低了数据准备和模型训练的技术门槛，让非专业人士也能轻松上手。\n\n在技术亮点方面，GPT-SoVITS 不仅支持中、英、日、韩、粤语等多语言跨语种合成，还具备卓越的推理速度，在主流显卡上可实现实时甚至超实时的生成效率。无论是需要快速制作视频配音，还是进行多语言语音交互研究，GPT-SoVITS 都能以极低的数据成本提供专业级的语音合成体验。",56375,"2026-04-05T22:15:46",[44],{"id":113,"name":114,"github_repo":115,"description_zh":116,"stars":117,"difficulty_score":34,"last_commit_at":118,"category_tags":119,"status":46},2863,"TTS","coqui-ai\u002FTTS","🐸TTS 是一款功能强大的深度学习文本转语音（Text-to-Speech）开源库，旨在将文字自然流畅地转化为逼真的人声。它解决了传统语音合成技术中声音机械生硬、多语言支持不足以及定制门槛高等痛点，让高质量的语音生成变得触手可及。\n\n无论是希望快速集成语音功能的开发者，还是致力于探索前沿算法的研究人员，亦或是需要定制专属声音的数据科学家，🐸TTS 都能提供得力支持。它不仅预置了覆盖全球 1100 多种语言的训练模型，让用户能够即刻上手，还提供了完善的工具链，支持用户利用自有数据训练新模型或对现有模型进行微调，轻松实现特定风格的声音克隆。\n\n在技术亮点方面，🐸TTS 表现卓越。其最新的 ⓍTTSv2 模型支持 16 种语言，并在整体性能上大幅提升，实现了低于 200 毫秒的超低延迟流式输出，极大提升了实时交互体验。此外，它还无缝集成了 🐶Bark、🐢Tortoise 等社区热门模型，并支持调用上千个 Fairseq 模型，展现了极强的兼容性与扩展性。配合丰富的数据集分析与整理工具，🐸TTS 已成为科研与生产环境中备受信赖的语音合成解决方案。",44971,"2026-04-03T14:47:02",[44,103,96],{"id":121,"name":122,"github_repo":123,"description_zh":124,"stars":125,"difficulty_score":34,"last_commit_at":126,"category_tags":127,"status":46},2375,"LocalAI","mudler\u002FLocalAI","LocalAI 是一款开源的本地人工智能引擎，旨在让用户在任意硬件上轻松运行各类 AI 模型，包括大语言模型、图像生成、语音识别及视频处理等。它的核心优势在于彻底打破了高性能计算的门槛，无需昂贵的专用 GPU，仅凭普通 CPU 或常见的消费级显卡（如 NVIDIA、AMD、Intel 及 Apple Silicon）即可部署和运行复杂的 AI 任务。\n\n对于担心数据隐私的用户而言，LocalAI 提供了“隐私优先”的解决方案，确保所有数据处理均在本地基础设施内完成，无需上传至云端。同时，它完美兼容 OpenAI、Anthropic 等主流 API 接口，这意味着开发者可以无缝迁移现有应用，直接利用本地资源替代云服务，既降低了成本又提升了可控性。\n\nLocalAI 内置了超过 35 种后端支持（如 llama.cpp、vLLM、Whisper 等），并集成了自主 AI 代理、工具调用及检索增强生成（RAG）等高级功能，且具备多用户管理与权限控制能力。无论是希望保护敏感数据的企业开发者、进行算法实验的研究人员，还是想要在个人电脑上体验最新 AI 技术的极客玩家，都能通过 LocalAI 
获",44782,"2026-04-02T22:14:26",[96,44,102,100,103,97,99],{"id":129,"name":130,"github_repo":131,"description_zh":132,"stars":133,"difficulty_score":134,"last_commit_at":135,"category_tags":136,"status":46},3788,"airi","moeru-ai\u002Fairi","airi 是一款开源的本地化 AI 伴侣项目，旨在将虚拟角色（如“二次元老婆”或赛博生命）带入用户的现实世界。它的核心目标是复刻并超越知名 AI 主播 Neuro-sama 的能力，让用户能够拥有完全自主掌控、可私有化部署的智能伙伴。\n\nairi 主要解决了用户对高度定制化、具备情感交互能力且数据隐私安全的 AI 角色的需求。不同于依赖云端服务的通用助手，airi 允许用户在本地运行，不仅保护了对话隐私，还赋予了用户定义角色性格与灵魂的自由。它支持实时语音聊天，甚至能直接参与《我的世界》（Minecraft）和《异星工厂》（Factorio）等游戏，实现了从单纯对话到共同娱乐的跨越。\n\n这款工具非常适合喜爱虚拟角色的普通用户、希望搭建个性化 AI 陪伴的技术爱好者，以及研究多模态交互的开发者。其独特的技术亮点在于跨平台支持（涵盖 Web、macOS 和 Windows）以及强大的游戏交互能力，让 AI 不仅能“说”，还能“玩”。通过容器化的灵魂设计，airi 为每个人创造专属数字生命提供了可能，让虚拟陪伴变得更加真实且触手可及。",37086,1,"2026-04-05T10:54:25",[102,44,100],{"id":138,"name":139,"github_repo":140,"description_zh":141,"stars":142,"difficulty_score":143,"last_commit_at":144,"category_tags":145,"status":46},2735,"MockingBird","babysor\u002FMockingBird","MockingBird 是一款开源的实时语音克隆工具，旨在让用户仅需 5 秒的参考音频，即可快速合成任意内容的语音，并实现逼真的音色复刻。它有效解决了传统语音合成技术中数据采集成本高、训练周期长以及难以实时生成的痛点，让个性化语音生成变得触手可及。\n\n这款工具特别适合开发者、AI 研究人员以及对语音技术感兴趣的技术爱好者使用。无论是用于构建交互式语音应用、进行声学模型研究，还是制作创意内容，MockingBird 都能提供强大的支持。普通用户若具备基础的编程环境配置能力，也可通过其提供的 Web 服务或工具箱体验前沿的变声效果。\n\n在技术亮点方面，MockingBird 基于 PyTorch 框架，不仅完美支持中文普通话及多种主流数据集，还实现了跨平台运行，兼容 Windows、Linux 乃至 M1 架构的 macOS。其独特的架构设计允许复用预训练的编码器与声码器，只需微调合成器即可获得出色效果，大幅降低了部署门槛。此外，项目内置了现成的 Web 服务器功能，方便用户通过远程调用快速集成到自己的应用中。尽管原作者已转向云端优化版本，但 MockingBird 作为经典的本地部署方案",36902,4,"2026-04-02T16:15:29",[100,44,96,103]]