A simple and efficient self-hosted web application to separate audio elements (vocals, drums, bass, other instruments) from music using artificial intelligence.
- Separate vocals from background music (karaoke)
- Extract instruments individually (drums, bass, others)
- Process YouTube videos automatically
- Easy web interface - no programming required
- Multiple formats - supports MP3, WAV, FLAC, M4A, AAC
Super simple - Just one command:
# Method 1: Ultra-simple (files saved inside container)
docker run -d -p 7860:7860 --name voice-separator paladini/voice-separator
# Method 2: Using docker-compose (files accessible on your computer)
git clone https://github.com/paladini/voice-separator-demucs.git
cd voice-separator-demucs
docker compose up -d
Access http://localhost:7860.
Done! If using Method 2, your files will appear in the static/output/
folder.
If you used Method 1 (ultra-simple):
# Copy files from container to your computer
docker cp voice-separator:/app/static/output ./my-separated-files/
If you used Method 2:
- Files are already in your
static/output/
folder!
Prerequisites:
- Python 3.8+
- FFmpeg installed
# 1. Install FFmpeg
sudo apt-get install ffmpeg # Ubuntu/Debian
# or
brew install ffmpeg # macOS
# 2. Navigate to project folder
cd voice-separator-demucs
# 3. Install dependencies
pip install -r requirements.txt
# 4. Run
python main.py
Access http://localhost:7860.
By default, the app runs on plain HTTP for simplicity. Modern browsers may show warnings like "Not Secure" or block downloads when using HTTP, even for local apps. This is normal and safe for local use.
To avoid these warnings:
-
Use HTTPS locally with a self-signed certificate:
- Generate a certificate (one-time):
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes -subj "/CN=localhost"
- Run the app with HTTPS:
uvicorn main:app --host 0.0.0.0 --port 7860 --reload --ssl-keyfile=key.pem --ssl-certfile=cert.pem
- Access the app at https://localhost:7860 and accept the browser warning about the self-signed certificate.
- Generate a certificate (one-time):
-
You may commit and reuse the same
.pem
files for local development.- This is safe for local-only use. Never use these files in production.
- All users will see a browser warning the first time, unless they add the certificate to their trusted store (not required for local dev).
-
If you use plain HTTP:
- You may see "Not Secure" warnings and Chrome may block downloads. You can safely click "Keep" or "Download anyway" for your own files.
Summary:
- For local use, these warnings are not a risk.
- For the best user experience, use HTTPS as above.
- Always access the app at http://localhost:7860 or https://localhost:7860, not
0.0.0.0
.
Access http://localhost:7860.
You can now choose between different AI models for separation:
- Demucs CPU Lite (mdx_extra_q): Fastest, runs on any CPU (default).
- Demucs v3 (mdx): Fast, GPU recommended for best speed.
- Demucs v4 (htdemucs): Medium speed, GPU required.
- Demucs HD (htdemucs_ft): Best quality, GPU required.
How to use:
- By default, the fastest model (Demucs CPU Lite) is used for all separations.
- To select a different model, enable the "Model selection" toggle in the web interface. This will reveal a dropdown where you can choose your preferred model.
- If the toggle is not enabled, the model selection UI is hidden and the default model is used.
Tip: If you do not have a GPU, select Demucs CPU Lite for best compatibility and speed.
- Select elements (vocals, drums, bass, etc.)
- Choose audio file (MP3, WAV, etc.)
- Click "Separate"
- Wait 2-5 minutes
- Download results
- Select desired elements
- Paste YouTube URL
- Click "Download and Separate"
- Wait for download + processing
- Download separated files
- First time: AI model will be downloaded (~200MB)
- Vocals only: Faster (~2 min)
- All elements: Slower (~5 min)
- YouTube: 10-minute video limit
β
MP3, WAV, FLAC, M4A, AAC
π Limit: No file size limit (local use)
β±οΈ YouTube: Maximum 10 minutes
This application uses Demucs, an AI model developed by Facebook/Meta AI specifically for music source separation. It's based on deep neural networks trained on thousands of songs.
- Backend: FastAPI + PyTorch + Demucs
- Frontend: Modern responsive web interface
- AI Model: MDX Extra Q (CPU-optimized)
- Audio Processing: FFmpeg + PyTorch Audio
- Optimized for CPU (GPU optional)
- Memory efficient with dynamic model loading
- Persistent model cache to avoid re-downloads
Option A: Direct from Docker Hub (no code download needed)
# Ultra-simple (files stay in container)
docker run -d -p 7860:7860 --name voice-separator paladini/voice-separator
# Recommended (files accessible on host)
mkdir -p voice-separator-output
docker run -d -p 7860:7860 -v $(pwd)/voice-separator-output:/app/static/output --name voice-separator paladini/voice-separator
# Access: http://localhost:7860
# Files saved to: ./voice-separator-output/ (if using second command)
Option B: Using docker-compose (with source code)
# Clone and run
git clone https://github.com/paladini/voice-separator-demucs.git
cd voice-separator-demucs
docker compose up -d
# Access: http://localhost:7860
# Files saved to: ./static/output/
# Stop container
docker stop voice-separator
# Start again
docker start voice-separator
# Remove container
docker rm voice-separator
# Update to latest version
docker pull paladini/voice-separator:latest
"FFmpeg not found"
# Ubuntu/Debian
sudo apt-get install ffmpeg
# macOS
brew install ffmpeg
# Windows
# Download from https://ffmpeg.org/download.html
Very slow processing
- Use smaller files
- Close other programs
- Select fewer elements
- First run downloads AI model (~200MB)
YouTube download error
- Check if video is public
- Maximum 10 minutes duration
- Some videos may be region-locked
Out of memory errors
- Reduce file size
- Close other applications
- Use fewer simultaneous processes
# Clone repository
git clone https://github.com/paladini/voice-separator-demucs.git
cd voice-separator-demucs
# Install dependencies
pip install -r requirements.txt
# Run development server
python main.py
- Interactive docs:
http://localhost:7860/docs
- Alternative docs:
http://localhost:7860/redoc
This tool is intended for personal and educational use. Please respect the copyright of the music you process.
Fernando Paladini (@paladini)
Based on the Demucs model by Facebook/Meta AI Research.
This project is licensed under the MIT License. See the LICENSE file for details.