A single Gradio + React WebUI with extensions for ACE-Step, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet, AudioGen, MusicGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, and Bark!

rsxdalv/TTS-WebUI

TTS WebUI / Harmonica

Videos

Watch the video Watch the video Watch the video

Examples

Bark.Narration.mp4
Bark.Japanese.mp4
MusicGen.mp4

Screenshots

react_1 react_2 react_3
gradio_1 gradio_2 gradio_3

Installation

Using the Installer (Recommended)

The base installation currently takes around 10.7 GB. Each model requires an additional 2-8 GB of disk space.

  • Download the latest version and extract it.
  • Run start_tts_webui.bat or start_tts_webui.sh to start the server. It will ask you to select the GPU/Chip you are using. Once everything has installed, it will start the Gradio server at http://localhost:7770 and the React UI at http://localhost:3000.
  • The output log is written to installer_scripts/output.log.
  • Note: The start script sets up a conda environment and a Python virtual environment, so you don't need to create a venv beforehand. Launching the script from inside another venv might break it.
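
To confirm that both servers came up after the installer finishes, a small check like the following can help. This is a minimal sketch, assuming the default ports mentioned above (7770 for Gradio, 3000 for the React UI); the function names are made up for illustration and are not part of TTS WebUI.

```python
import socket

# Default ports named in the installation notes above.
SERVICES = {"Gradio": 7770, "React UI": 3000}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in SERVICES.items()}


# Example usage:
#   for name, up in report().items():
#       print(f"{name}: {'up' if up else 'not reachable'}")
```

An open port only shows that something is listening there; it does not prove that the UI has finished loading.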

Manual installation

Prerequisites:

  • git
  • Python 3.10 or 3.11 (3.12 not supported yet)
  • PyTorch
  • ffmpeg (with vorbis support)
  • (Optional) NodeJS 22.9.0 for React UI
  • (Optional) PostgreSQL 16.4+ for database support

  1. Clone the repository:

    git clone https://github.com/rsxdalv/tts-webui.git
    cd tts-webui
  2. Install required packages:

    pip install -r requirements.txt
  3. Run the server:

    python server.py --no-react
  4. For React UI:

    cd react-ui
    npm install
    npm run build
    cd ..
    python server.py

For detailed manual installation instructions, please refer to the Manual Installation Guide.
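
Before starting a manual installation, it may be worth checking the prerequisites listed above. The following is a rough sketch, not an official script: it only verifies the Python version and that git and ffmpeg are on PATH. It does not check ffmpeg's vorbis support, PyTorch, or the optional NodeJS and PostgreSQL versions.

```python
import shutil
import sys


def python_supported(version=sys.version_info) -> bool:
    """Python 3.10 and 3.11 are listed as supported; 3.12 is not yet."""
    return (version[0], version[1]) in {(3, 10), (3, 11)}


def missing_tools(tools=("git", "ffmpeg")) -> list:
    """Return the command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print("Python version supported:", python_supported())
print("Missing tools:", missing_tools() or "none")
```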

Docker Setup

tts-webui can also be run inside a Docker container. Using CUDA inside Docker requires the NVIDIA Container Toolkit. To get started, pull the image from the GitHub Container Registry:

docker pull ghcr.io/rsxdalv/tts-webui:main

The ports are 7770 (env: TTS_PORT) for the Gradio backend and 3000 (env: UI_PORT) for the React front end. Once the image has been pulled, it can be started with Docker Compose:

docker compose up -d
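
If the ports are overridden, scripts that talk to the container need to follow along. Here is a small sketch that derives the two URLs from TTS_PORT and UI_PORT, falling back to the defaults 7770 and 3000. It assumes the same variables are also set in the shell where the script runs, which may not match your setup; the helper name is hypothetical.

```python
import os


def service_urls(env=os.environ, host: str = "localhost") -> dict:
    """Build the Gradio and React URLs from TTS_PORT / UI_PORT (with defaults)."""
    tts_port = int(env.get("TTS_PORT", "7770"))
    ui_port = int(env.get("UI_PORT", "3000"))
    return {
        "gradio": f"http://{host}:{tts_port}",
        "react": f"http://{host}:{ui_port}",
    }


print(service_urls())
```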

The container will take some time to generate the first output while models are downloaded in the background. The status of this download can be verified by checking the container logs:

docker logs tts-webui

Building the image yourself

If you wish to build your own docker container, you can use the included Dockerfile:

docker build -t tts-webui .

Note that the Docker Compose file needs to be edited to use the image you just built.

Changelog

September:

  • OpenAI API now supports Whisper transcriptions
  • Removed PyTorch Nightly option
  • Fix Google Colab installation (Python 3.12 not supported)
  • Add Kitten TTS Mini extension
  • Add PyRNNoise extension
  • Upgrade React UI's Chatterbox interface
  • Rename Kokoro TTS extension to OpenAI TTS API extension
  • Rename all extensions to tts_webui_extension.*
  • Switch to PyPI for multiple extensions
  • Add Intel PyTorch installation option
  • Add "Custom" Choice option to installer for self-managed PyTorch installations
  • Integrate with new pip index for extensions (https://tts-webui.github.io/extensions-index/)
  • Add Xiaomi's MiMo Audio extension
  • Add Cypress-Yang's SongBloom extension
  • Add Index-TTS2 extension
  • Add VoxCPM extension
  • Add FireRedTTS2 extension

August:

  • Fix model downloader when no token is used, thanks Nusantara.
  • Improve Chatterbox speed
  • Add VibeVoice (Early Access) extension
  • Add docker compose volumes to persist data #529, thanks FranckKe.
  • [react-ui] Prepend voices/chatterbox to voice file selection in ap test page #542, thanks rohan-sircar.

July:

  • Add new tutorials
  • Add more robust gradio launching
  • Simplify installation instructions
  • Improve chatterbox speed.

Past Changes

See the 2025 Changelog for a detailed list of changes in 2025.

See the 2024 Changelog for a detailed list of changes in 2024.

See the 2023 Changelog for a detailed list of changes in 2023.

Extensions

Extensions can be installed from the Gradio web UI, from the React UI, or with the extension manager. Internally, extensions are just Python packages installed with pip. Multiple extensions can be installed at the same time, but there might be compatibility issues between them. After installing or updating an extension, restart the app to load it.
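
Because extensions are ordinary pip packages, the standard library can list the ones installed in the current environment. This sketch assumes that extension distributions carry the tts_webui_extension prefix mentioned in the changelog (with -, _ and . treated as equivalent); that naming assumption may not hold for every extension.

```python
from importlib import metadata

# Assumed naming prefix, based on the "tts_webui_extension.*" rename.
PREFIX = "tts_webui_extension"


def normalize(name: str) -> str:
    """Lowercase the name and treat '-', '.' and '_' as the same separator."""
    return name.lower().replace("-", "_").replace(".", "_")


def is_extension(name: str) -> bool:
    """Return True if a distribution name looks like a TTS WebUI extension."""
    return normalize(name).startswith(PREFIX)


def installed_extensions() -> dict:
    """Map installed extension distribution names to their versions."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and is_extension(name):
            found[name] = dist.version
    return found


for name, version in sorted(installed_extensions().items()):
    print(f"{name} {version}")
```

Run it with the Python interpreter from the environment the start script created, otherwise it will inspect a different set of packages.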

Updates need to be done manually by using the mini-control panel:

mini-control-panel

Integrations

Silly Tavern

  1. Update OpenAI TTS API extension to latest version

  2. Start the API and test it with Python Requests

    (The OpenAI client might not be installed, in which case the "Test with Python OpenAI client" option might fail.)

  3. Once you can see that the audio generates successfully, go to Silly Tavern and add a new TTS API Default provider endpoint: http://localhost:7778/v1/audio/speech

  4. Test it out!

Text Generation WebUI (oobabooga/text-generation-webui)

  1. Install https://github.com/rsxdalv/text-to-tts-webui extension in text-generation-webui
  2. Start the API and test it with Python Requests
  3. Configure using the panel: oobaboooga-text-to-tts-webui

OpenWebUI

  1. Enable OpenAI API extension in TTS WebUI
  2. Start the API and test it with Python Requests
  3. Once you can see that the audio generates successfully, go to OpenWebUI and add a new TTS API Default provider endpoint: http://localhost:7778/v1/audio/speech
  4. Test it out!

OpenAI Compatible APIs

Using the instructions above, you can install an OpenAI compatible API, and use it with Silly Tavern or other OpenAI compatible clients.
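
For other clients, a request to the endpoint can be sketched with the standard library alone. The body uses the model, input and voice fields of the OpenAI speech API. The default model and voice values below are placeholders, not names this server is known to accept, so replace them with whatever your installed extension exposes. The function names and the output file name are likewise made up for illustration.

```python
import json
import urllib.request

# Endpoint from the integration steps above.
ENDPOINT = "http://localhost:7778/v1/audio/speech"


def build_speech_request(text: str,
                         model: str = "placeholder-model",  # assumption: replace
                         voice: str = "placeholder-voice",  # assumption: replace
                         endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a POST request with an OpenAI-style speech payload."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "speech_output.bin", **kwargs) -> str:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```

With the API running, calling synthesize("Hello") should save the response body to the output file. The audio format depends on the server's defaults, which is why the sketch uses a neutral file extension.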

Compatibility / Errors

Red messages in console

Messages like this:

---- requires ----, but you have ---- which is incompatible.

are completely normal. They stem both from a limitation of pip and from the fact that this Web UI combines many different AI projects. Since these projects are not always compatible with each other, each one complains about the versions the others install. This is expected, and despite the warnings and errors the projects still work together. It is not clear whether this situation will ever be fully resolved, but that is the hope.
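
To see which packages are involved, the conflict lines can be pulled out of a saved log. The sketch below matches pip's "X requires Y, but you have Z which is incompatible." wording. The sample line and its package names are illustrative only, not taken from a real TTS WebUI log.

```python
import re

# Matches lines such as:
#   <package> <version> requires <requirement>, but you have <name> <version> which is incompatible.
CONFLICT_RE = re.compile(
    r"^(?:ERROR:\s*)?(?P<package>\S+) (?P<version>\S+) requires (?P<requirement>.+?), "
    r"but you have (?P<installed>.+?) which is incompatible\.$"
)


def parse_conflicts(log_text: str) -> list:
    """Return one dict per dependency-conflict line found in log_text."""
    conflicts = []
    for line in log_text.splitlines():
        match = CONFLICT_RE.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts


# Illustrative sample line (made-up versions).
sample = "torchaudio 2.0.1 requires torch==2.0.0, but you have torch 2.1.0 which is incompatible."
for conflict in parse_conflicts(sample):
    print(conflict["package"], "wants", conflict["requirement"],
          "but found", conflict["installed"])
```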

Extra Voices for Bark, Prompt Samples

PromptEcho

Bark Speaker Directory

Bark Readme

README_Bark.md

Info about managing models, caches and system space for AI projects

#186 (reply in thread)

Open Source Libraries

This project utilizes the following open source libraries:

Ethical and Responsible Use

This technology is intended for enablement and creativity, not for harm.

By engaging with this AI model, you acknowledge and agree to abide by these guidelines, employing the AI model in a responsible, ethical, and legal manner.

  • Non-Malicious Intent: Do not use this AI model for malicious, harmful, or unlawful activities. It should only be used for lawful and ethical purposes that promote positive engagement, knowledge sharing, and constructive conversations.
  • No Impersonation: Do not use this AI model to impersonate or misrepresent yourself as someone else, including individuals, organizations, or entities. It should not be used to deceive, defraud, or manipulate others.
  • No Fraudulent Activities: This AI model must not be used for fraudulent purposes, such as financial scams, phishing attempts, or any form of deceitful practices aimed at acquiring sensitive information, monetary gain, or unauthorized access to systems.
  • Legal Compliance: Ensure that your use of this AI model complies with applicable laws, regulations, and policies regarding AI usage, data protection, privacy, intellectual property, and any other relevant legal obligations in your jurisdiction.
  • Acknowledgement: By engaging with this AI model, you acknowledge and agree to abide by these guidelines, using the AI model in a responsible, ethical, and legal manner.

License

Codebase and Dependencies

The codebase is licensed under MIT. However, it's important to note that when installing the dependencies, you will also be subject to their respective licenses. Although most of these licenses are permissive, there may be some that are not. Therefore, it's essential to understand that the permissive license only applies to the codebase itself, not the entire project.

That being said, the goal is to maintain MIT compatibility throughout the project. If you come across a dependency that is not compatible with the MIT license, please feel free to open an issue and bring it to our attention.

Known non-permissive dependencies:

| Library | License | Notes |
|---|---|---|
| encodec | CC BY-NC 4.0 | Newer versions are MIT, but need to be installed manually |
| diffq | CC BY-NC 4.0 | Optional in the future, not necessary to run, can be uninstalled, should be updated with demucs |
| lameenc | GPL License | Future versions will make it LGPL, but need to be installed manually |
| unidecode | GPL License | Not mission critical, can be replaced with another library, issue: neonbjb/tortoise-tts#494 |

Model Weights

Model weights have different licenses, please pay attention to the license of the model you are using.

Most notably:

  • Bark: MIT
  • Tortoise: Unknown (Apache-2.0 according to repo, but no license file in HuggingFace)
  • MusicGen: CC BY-NC 4.0
  • AudioGen: CC BY-NC 4.0
