TTS WebUI / Harmonica

Download Installer || Installation || Docker Setup || Silly Tavern || Extensions || Feedback / Bug reports

Videos

Examples

Bark.Narration.mp4	Bark.Japanese.mp4	MusicGen.mp4

Screenshots

## Supported Models

| Text-to-speech | Audio/Music Generation | Audio Conversion/Tools |
|---|---|---|
| Bark | MusicGen | RVC |
| Tortoise | MAGNeT | Demucs |
| Maha TTS | Stable Audio | Vocos |
| MMS | Riffusion* | Whisper |
| Vall-E X | AudioCraft Mac* | AP BWE |
| StyleTTS2 | AudioCraft Plus* | Resemble Enhance |
| SeamlessM4T | ACE-Step* | Audio Separator |
| XTTSv2* | Song Bloom* | PyRNNoise* |
| MARS5* | | MiMo Audio* |
| F5-TTS* | | |
| Parler TTS* | | |
| OpenVoice* | | |
| OpenVoice V2* | | |
| Kokoro TTS* | | |
| DIA* | | |
| CosyVoice* | | |
| GPT-SoVITS* | | |
| Piper TTS* | | |
| Kimi Audio 7B Instruct* | | |
| Chatterbox* | | |
| VibeVoice* | | |
| Kitten TTS* | | |
| Index-TTS2* | | |
| VoxCPM* | | |
| FireRedTTS2* | | |

* These models are not installed by default; they are available as extensions.

Installation

Using the Installer (Recommended)

The base installation currently takes around 10.7 GB. Each model requires an additional 2-8 GB of disk space.
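As a rough planning aid, the figures above can be turned into a quick estimate. This is only a sketch: the constants come from the sizes quoted here, while the function names `estimate_install_size_gb` and `has_enough_space` are hypothetical and not part of TTS WebUI.

```python
import shutil

BASE_INSTALL_GB = 10.7      # base installation size quoted above
PER_MODEL_GB = (2.0, 8.0)   # per-model range quoted above


def estimate_install_size_gb(model_count):
    """Return (low, high) estimates in GB for the base install plus models."""
    if model_count < 0:
        raise ValueError("model_count must be non-negative")
    low = BASE_INSTALL_GB + model_count * PER_MODEL_GB[0]
    high = BASE_INSTALL_GB + model_count * PER_MODEL_GB[1]
    return round(low, 1), round(high, 1)


def has_enough_space(path, required_gb):
    """Check whether the filesystem holding `path` has `required_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if __name__ == "__main__":
    low, high = estimate_install_size_gb(3)
    print(f"Base install plus 3 models: roughly {low}-{high} GB")
    print("Enough free space for the upper bound:", has_enough_space(".", high))
```

Actual sizes vary by model, so treat the upper bound as the safer number when picking an install location.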

Download the latest version and extract it.
Run start_tts_webui.bat or start_tts_webui.sh to start the server. It will ask you to select the GPU/Chip you are using. Once everything has installed, it will start the Gradio server at http://localhost:7770 and the React UI at http://localhost:3000.
Output log will be available in the installer_scripts/output.log file.
Note: The start script sets up a conda environment and a Python virtual environment, so you do not need to create a venv beforehand. In fact, launching the script from inside another venv might break it.
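To confirm that both servers came up after the installer finishes, a small TCP probe can help. The sketch below assumes the default ports mentioned above (7770 and 3000) and uses only the standard library; `is_port_open` is an illustrative helper, not something shipped with the project.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default ports quoted above: Gradio server and React UI.
SERVICES = {"Gradio server": 7770, "React UI": 3000}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

If a port stays unreachable, installer_scripts/output.log is the first place to look.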

Manual installation

Prerequisites:

git
Python 3.10 or 3.11 (3.12 not supported yet)
PyTorch
ffmpeg (with vorbis support)
(Optional) NodeJS 22.9.0 for React UI
(Optional) PostgreSQL 16.4+ for database support
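Before cloning, it may be worth checking the prerequisites above programmatically. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and the executable names `node` and `psql` are assumptions about how NodeJS and PostgreSQL appear on your PATH. It does not verify ffmpeg's vorbis support or the NodeJS/PostgreSQL versions.

```python
import importlib.util
import shutil
import sys

SUPPORTED_PYTHON = {(3, 10), (3, 11)}   # versions listed above
REQUIRED_TOOLS = ["git", "ffmpeg"]      # required command-line tools
OPTIONAL_TOOLS = ["node", "psql"]       # assumed executable names (NodeJS, PostgreSQL)


def python_supported(version_info=sys.version_info):
    """True if the major.minor version is one of the supported ones."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def pytorch_installed():
    """True if a `torch` package can be located in the current environment."""
    return importlib.util.find_spec("torch") is not None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("PyTorch installed:", pytorch_installed())
    print("Missing required tools:", missing_tools(REQUIRED_TOOLS) or "none")
    print("Missing optional tools:", missing_tools(OPTIONAL_TOOLS) or "none")
```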

Clone the repository:

```
git clone https://github.com/rsxdalv/tts-webui.git
cd tts-webui
```

Install required packages:
```
pip install -r requirements.txt
```
Run the server:
```
python server.py --no-react
```

For React UI:

```
cd react-ui
npm install
npm run build
cd ..
python server.py
```

For detailed manual installation instructions, please refer to the Manual Installation Guide.

Docker Setup

tts-webui can also be run inside a Docker container. Using CUDA inside Docker requires the NVIDIA Container Toolkit. To get started, pull the image from GitHub Container Registry:

```
docker pull ghcr.io/rsxdalv/tts-webui:main
```

Once the image has been pulled, it can be started with Docker Compose. The ports are 7770 (env:TTS_PORT) for the Gradio backend and 3000 (env:UI_PORT) for the React front end.

```
docker compose up -d
```
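If you override the ports, it can help to compute the resulting URLs in one place. The sketch below reads the `TTS_PORT` and `UI_PORT` variables named above and falls back to the documented defaults. How docker-compose.yml actually consumes these variables is not shown here, so treat the lookup logic as an assumption; `resolve_ports` and `service_urls` are hypothetical helpers.

```python
import os

# Defaults quoted above: Gradio backend and React front end.
DEFAULT_PORTS = {"TTS_PORT": 7770, "UI_PORT": 3000}


def resolve_ports(env=None):
    """Return the effective ports, preferring values from the environment."""
    env = os.environ if env is None else env
    ports = {}
    for name, default in DEFAULT_PORTS.items():
        raw = env.get(name, "")
        port = int(raw) if raw.strip() else default
        if not 1 <= port <= 65535:
            raise ValueError(f"{name}={port} is not a valid TCP port")
        ports[name] = port
    return ports


def service_urls(ports, host="localhost"):
    """Build the URLs for the Gradio backend and the React front end."""
    return {
        "gradio": f"http://{host}:{ports['TTS_PORT']}",
        "react": f"http://{host}:{ports['UI_PORT']}",
    }


if __name__ == "__main__":
    print(service_urls(resolve_ports()))
```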

The container will take some time to generate the first output while models are downloaded in the background. The status of this download can be verified by checking the container logs:

```
docker logs tts-webui
```

Building the image yourself

If you wish to build your own docker container, you can use the included Dockerfile:

```
docker build -t tts-webui .
```

Note that docker-compose.yml needs to be edited to use the image you just built.

Changelog

September:

OpenAI API now supports Whisper transcriptions
Remove PyTorch Nightly option
Fix Google Colab installation (Python 3.12 not supported)
Add Kitten TTS Mini extension
Add PyRNNoise extension
Upgrade React UI's Chatterbox interface
Rename Kokoro TTS extension to OpenAI TTS API extension
Rename all extensions to tts_webui_extension.*
Switch to PyPI for multiple extensions
Add Intel PyTorch installation option
Add "Custom" Choice option to installer for self-managed PyTorch installations
Integrate with new pip index for extensions (https://tts-webui.github.io/extensions-index/)
Add Xiaomi's MiMo Audio extension
Add Cypress-Yang's SongBloom extension
Add Index-TTS2 extension
Add VoxCPM extension
Add FireRedTTS2 extension

August:

Fix model downloader when no token is used, thanks Nusantara.
Improve Chatterbox speed
Add VibeVoice (Early Access) extension
Add docker compose volumes to persist data #529, thanks FranckKe.
[react-ui] Prepend voices/chatterbox to voice file selection in ap test page #542, thanks rohan-sircar.

July:

Add new tutorials
Add more robust gradio launching
Simplify installation instructions
Improve Chatterbox speed

Past Changes

See the 2025 Changelog for a detailed list of changes in 2025.

See the 2024 Changelog for a detailed list of changes in 2024.

See the 2023 Changelog for a detailed list of changes in 2023.

Extensions

Extensions can be installed from the web UI itself or from the React UI. They can also be installed using the extension manager. Internally, extensions are just Python packages installed with pip. Multiple extensions can be installed at the same time, but there may be compatibility issues between them. After installing or updating an extension, restart the app to load it.
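Because extensions are ordinary pip packages, the standard library can list the ones present in the current environment. The sketch below assumes the `tts_webui_extension.` naming prefix mentioned in the changelog; the helper names are hypothetical, and the package names in the usage note are made up for illustration.

```python
import re
from importlib import metadata

EXTENSION_PREFIX = "tts_webui_extension."  # prefix taken from the changelog


def normalize(name):
    """Normalize a distribution name the way pip compares names (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filter_extensions(names, prefix=EXTENSION_PREFIX):
    """Keep only the names that start with the extension prefix."""
    wanted = normalize(prefix)
    return sorted(name for name in names if normalize(name).startswith(wanted))


def installed_extensions():
    """List installed distributions that look like TTS WebUI extensions."""
    names = [dist.metadata["Name"] for dist in metadata.distributions()]
    return filter_extensions(name for name in names if name)


if __name__ == "__main__":
    for name in installed_extensions():
        print(name, metadata.version(name))
```

For example, `filter_extensions(["torch", "tts_webui_extension.example"])` keeps only the second (hypothetical) name. Run the script inside the environment created by the installer, otherwise it inspects a different set of packages.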

Updates need to be done manually using the mini-control panel.

Integrations

Silly Tavern

Update OpenAI TTS API extension to latest version
Start the API and test it with Python Requests

(The OpenAI client might not be installed, so the "Test with Python OpenAI client" option might fail.)
Once the audio generates successfully, go to Silly Tavern and add a new TTS API Default provider endpoint: http://localhost:7778/v1/audio/speech
Test it out!

Text Generation WebUI (oobabooga/text-generation-webui)

Install https://github.com/rsxdalv/text-to-tts-webui extension in text-generation-webui
Start the API and test it with Python Requests
Configure using the panel:

OpenWebUI

Enable OpenAI API extension in TTS WebUI
Start the API and test it with Python Requests
Once the audio generates successfully, go to OpenWebUI and add a new TTS API Default provider endpoint: http://localhost:7778/v1/audio/speech
Test it out!

OpenAI Compatible APIs

Using the instructions above, you can set up an OpenAI-compatible API and use it with Silly Tavern or other OpenAI-compatible clients.
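For a client-side check outside the UI, a request in the OpenAI speech format can be sent to the endpoint given above. This sketch uses the standard library's `urllib` rather than Requests so that it has no dependencies. The `model`, `input`, and `voice` fields follow the OpenAI speech API; which model and voice values this server accepts is not specified here, so the values in the usage note are placeholders, and `build_speech_request` and `synthesize` are hypothetical helpers.

```python
import json
import urllib.request

SPEECH_URL = "http://localhost:7778/v1/audio/speech"  # endpoint quoted above


def build_speech_request(text, model, voice, url=SPEECH_URL):
    """Build a POST request in the OpenAI speech format (not yet sent)."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, model, voice, out_path, timeout=120):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, model, voice)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

A call such as `synthesize("Hello there", model="your-model", voice="your-voice", out_path="hello.mp3")` should write an audio file once the API is running; replace the placeholder model and voice with values your installed extension supports.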

Compatibility / Errors

Red messages in console

Messages like the following are completely normal:

---- requires ----, but you have ---- which is incompatible.

They appear partly because of a limitation of pip and partly because this Web UI combines many different AI projects. Since the projects are not always compatible with each other, they complain about the other projects being installed. This is expected, and despite the warnings/errors the projects will work together. It's not clear whether this situation will ever be resolvable, but that is the hope.
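When reading a long install log, it can be useful to separate these expected conflict lines from everything else. The sketch below matches the message shape shown above with a regular expression. The exact wording can differ between pip versions, so the pattern is an assumption, and `parse_conflict` and `split_log` are hypothetical helpers.

```python
import re

# Shape of the message shown above:
# "<package> <version> requires <requirement>, but you have <found> which is incompatible."
CONFLICT_PATTERN = re.compile(
    r"^(?P<package>\S+) (?P<version>\S+) requires (?P<requirement>.+?), "
    r"but you have (?P<found>.+?) which is incompatible\.$"
)


def parse_conflict(line):
    """Return the parts of a dependency-conflict line, or None if it is not one."""
    match = CONFLICT_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def split_log(lines):
    """Split log lines into (expected conflict warnings, everything else)."""
    conflicts, others = [], []
    for line in lines:
        parsed = parse_conflict(line)
        if parsed:
            conflicts.append(parsed)
        else:
            others.append(line)
    return conflicts, others
```

Lines that end up in the second list are the ones worth reading closely when an installation actually fails.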

Extra Voices for Bark, Prompt Samples

Bark Readme

README_Bark.md

Info about managing models, caches and system space for AI projects

#186 (reply in thread)

Open Source Libraries

This project utilizes the following open source libraries:

suno-ai/bark - MIT License
- Description: Inference code for Bark model.
- Repository: suno/bark
tortoise-tts - Apache-2.0 License
- Description: A flexible text-to-speech synthesis library for various platforms.
- Repository: neonbjb/tortoise-tts
ffmpeg - LGPL License
- Description: A complete and cross-platform solution for video and audio processing.
- Repository: FFmpeg
- Use: Encoding Vorbis Ogg files
ffmpeg-python - Apache 2.0 License
- Description: Python bindings for FFmpeg library for handling multimedia files.
- Repository: kkroening/ffmpeg-python
audiocraft - MIT License
- Description: A library for audio generation and MusicGen.
- Repository: facebookresearch/audiocraft
vocos - MIT License
- Description: An improved decoder for encodec samples
- Repository: charactr-platform/vocos
RVC - MIT License
- Description: An easy-to-use Voice Conversion framework based on VITS.
- Repository: RVC-Project/Retrieval-based-Voice-Conversion-WebUI

Ethical and Responsible Use

This technology is intended for enablement and creativity, not for harm.

By engaging with this AI model, you acknowledge and agree to abide by these guidelines, employing the AI model in a responsible, ethical, and legal manner.

Non-Malicious Intent: Do not use this AI model for malicious, harmful, or unlawful activities. It should only be used for lawful and ethical purposes that promote positive engagement, knowledge sharing, and constructive conversations.
No Impersonation: Do not use this AI model to impersonate or misrepresent yourself as someone else, including individuals, organizations, or entities. It should not be used to deceive, defraud, or manipulate others.
No Fraudulent Activities: This AI model must not be used for fraudulent purposes, such as financial scams, phishing attempts, or any form of deceitful practices aimed at acquiring sensitive information, monetary gain, or unauthorized access to systems.
Legal Compliance: Ensure that your use of this AI model complies with applicable laws, regulations, and policies regarding AI usage, data protection, privacy, intellectual property, and any other relevant legal obligations in your jurisdiction.
Acknowledgement: By engaging with this AI model, you acknowledge and agree to abide by these guidelines, using the AI model in a responsible, ethical, and legal manner.

License

Codebase and Dependencies

The codebase is licensed under MIT. However, it's important to note that when installing the dependencies, you will also be subject to their respective licenses. Although most of these licenses are permissive, there may be some that are not. Therefore, it's essential to understand that the permissive license only applies to the codebase itself, not the entire project.

That being said, the goal is to maintain MIT compatibility throughout the project. If you come across a dependency that is not compatible with the MIT license, please feel free to open an issue and bring it to our attention.

Known non-permissive dependencies:

| Library | License | Notes |
|---|---|---|
| encodec | CC BY-NC 4.0 | Newer versions are MIT, but need to be installed manually |
| diffq | CC BY-NC 4.0 | Optional in the future, not necessary to run, can be uninstalled, should be updated with demucs |
| lameenc | GPL License | Future versions will make it LGPL, but need to be installed manually |
| unidecode | GPL License | Not mission critical, can be replaced with another library, issue: neonbjb/tortoise-tts#494 |

Model Weights

Model weights have different licenses. Please pay attention to the license of the model you are using.

Most notably:

Bark: MIT
Tortoise: Unknown (Apache-2.0 according to repo, but no license file in HuggingFace)
MusicGen: CC BY-NC 4.0
AudioGen: CC BY-NC 4.0
