Convert YouTube videos to text transcripts and EPUB ebooks using AI-powered transcription.
The tool downloads a YouTube video, extracts its audio, transcribes it with the Deepgram API, and generates either plain text or an enhanced EPUB ebook with AI-generated metadata and speaker identification.
YouTube URL ──┐
              │
              ▼
         ┌─────────┐    ┌─────────┐    ┌──────────┐    ┌─────────┐
         │ yt-dlp  │───▶│ ffmpeg  │───▶│ Deepgram │───▶│ Output  │
         │Download │    │Extract  │    │Transcribe│    │EPUB/TXT │
         └─────────┘    │Audio    │    │+Speaker  │    └─────────┘
              │         └─────────┘    │Diarize   │         ▲
              ▼                        └──────────┘         │
         video.mp4 ────▶ audio.mp3 ────▶ transcript ────────┘
                                             │
                                             ▼
                                        ┌─────────┐
                                        │ OpenAI  │
                                        │Enhance  │
                                        │Metadata │
                                        └─────────┘
- Download - yt-dlp downloads the video with deterministic filenames
- Extract - ffmpeg extracts the audio to MP3
- Transcribe - Deepgram API converts speech to text with speaker diarization
- Generate - Creates an EPUB with AI-enhanced metadata, or plain text output
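The first three steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (`ytDlpArgs`, `ffmpegArgs`, `deepgramUrl`) and the specific flag choices are assumptions. The flags themselves are standard yt-dlp and ffmpeg options, and `diarize`, `utterances`, and `utt_split` are query parameters of Deepgram's pre-recorded `/v1/listen` endpoint.

```typescript
// Hypothetical helpers that only build command lines and a request URL.
// The tool's real flags may differ; these are illustrative assumptions.

/** yt-dlp arguments: download a single video to a fixed output path. */
function ytDlpArgs(url: string, videoPath: string): string[] {
  // --no-playlist: ignore any playlist the URL belongs to
  // -o: output filename template (here a fixed path, hence deterministic)
  return ["--no-playlist", "-o", videoPath, url];
}

/** ffmpeg arguments: drop the video stream and encode the audio as MP3. */
function ffmpegArgs(videoPath: string, audioPath: string): string[] {
  return [
    "-y",                     // overwrite the output file if it exists
    "-i", videoPath,          // input file
    "-vn",                    // no video in the output
    "-codec:a", "libmp3lame", // MP3 encoder
    "-q:a", "2",              // variable bitrate quality level
    audioPath,
  ];
}

/** Deepgram request URL with diarization and utterance splitting enabled. */
function deepgramUrl(uttSplitSeconds: number): string {
  return (
    "https://api.deepgram.com/v1/listen" +
    `?diarize=true&utterances=true&utt_split=${uttSplitSeconds}`
  );
}
```

The audio file would then be sent as the body of a POST request to that URL, with the API key in an `Authorization: Token ...` header.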
Install dependencies:
# macOS
brew install yt-dlp ffmpeg pandoc
# Ubuntu/Debian
sudo apt install yt-dlp ffmpeg pandoc
Set API keys:
export DEEPGRAM_API_KEY="your-deepgram-key"
export OPENAI_API_KEY="your-openai-key" # Optional, for AI metadata
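A startup check for these keys might look like the sketch below. The function name `checkApiKeys` and the shape of its result are hypothetical; only the two variable names and the fact that the OpenAI key is optional come from this README.

```typescript
// Hypothetical startup check; not taken from the project's source.
type Env = Record<string, string | undefined>;

interface KeyCheck {
  ok: boolean;         // true when transcription can run
  aiMetadata: boolean; // true when AI metadata enhancement is available
  message: string;     // human-readable summary
}

function checkApiKeys(env: Env): KeyCheck {
  const hasDeepgram = !!env["DEEPGRAM_API_KEY"];
  const hasOpenAi = !!env["OPENAI_API_KEY"];

  if (!hasDeepgram) {
    // Deepgram is required: there is no transcript without it.
    return { ok: false, aiMetadata: false, message: "DEEPGRAM_API_KEY is not set" };
  }
  if (!hasOpenAi) {
    // OpenAI is optional: fall back to output without AI metadata.
    return { ok: true, aiMetadata: false, message: "OPENAI_API_KEY is not set; AI metadata disabled" };
  }
  return { ok: true, aiMetadata: true, message: "All API keys found" };
}
```

In a Bun or Node script this would be called as `checkApiKeys(process.env)`.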
Install project packages:
bun install
# Basic - creates EPUB
bun run start "https://www.youtube.com/watch?v=VIDEO_ID"
# Plain text output
bun run start "https://www.youtube.com/watch?v=VIDEO_ID" --txt
# With options
bun run start "URL" -o output.epub --timestamps --force
- -o, --output <file> - Output file path
- --txt - Plain text instead of EPUB
- --timestamps - Include timestamps
- -f, --force - Skip cache, re-download/process
- --no-ai - Disable AI metadata enhancement
- --cleanup - Remove workspace directory after processing
- --utt-split <seconds> - Silence duration for utterance splitting (default: 2.0)
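To make the option semantics concrete, here is a small hand-written parser for the flags listed above. It is a sketch only: `parseCli` and `CliOptions` are hypothetical names, and the real tool may well use a CLI library instead. The defaults mirror the list (EPUB output, AI enabled, 2.0 second utterance split).

```typescript
// Hypothetical option parser for the flags documented above.
interface CliOptions {
  url?: string;
  output?: string;
  txt: boolean;
  timestamps: boolean;
  force: boolean;
  ai: boolean;       // false when --no-ai is given
  cleanup: boolean;
  uttSplit: number;  // seconds of silence that split utterances
}

function parseCli(argv: string[]): CliOptions {
  const opts: CliOptions = {
    txt: false, timestamps: false, force: false,
    ai: true, cleanup: false, uttSplit: 2.0,
  };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    switch (arg) {
      case "-o":
      case "--output":
        if (i + 1 >= argv.length) throw new Error(`${arg} requires a file path`);
        opts.output = argv[++i];
        break;
      case "--txt": opts.txt = true; break;
      case "--timestamps": opts.timestamps = true; break;
      case "-f":
      case "--force": opts.force = true; break;
      case "--no-ai": opts.ai = false; break;
      case "--cleanup": opts.cleanup = true; break;
      case "--utt-split": {
        const value = Number(argv[++i]);
        if (!isFinite(value) || value <= 0) {
          throw new Error("--utt-split requires a positive number of seconds");
        }
        opts.uttSplit = value;
        break;
      }
      default:
        if (arg.charAt(0) === "-") throw new Error(`Unknown option: ${arg}`);
        opts.url = arg; // the positional argument is the video URL
    }
  }
  return opts;
}
```

For the "With options" example above, `parseCli(["URL", "-o", "output.epub", "--timestamps", "--force"])` yields `output: "output.epub"` with `timestamps` and `force` set, and every other option left at its default.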
- Smart Caching - Reuses downloads, audio, and transcripts
- Speaker ID - AI identifies and names speakers in multi-person content
- EPUB Generation - Rich ebook format with metadata and cover
- Progress Tracking - Real-time download and processing progress
- Deterministic Paths - Predictable file naming for reliability
- Workspace Directory - All intermediate files saved in video-to-transcript-workspace/ for debugging
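Caching and deterministic paths fit together naturally: if every intermediate file has a predictable name, a step can be skipped whenever its output already exists. The sketch below shows one way this could work. The function names, the file-naming scheme (video ID plus extension), and the cascading rerun rule are all assumptions, not the tool's documented behavior. The 11-character ID pattern reflects how YouTube IDs currently look.

```typescript
// Hypothetical caching logic; naming scheme and rerun rule are assumptions.
interface WorkspacePaths { video: string; audio: string; transcript: string }
type Step = "download" | "extract" | "transcribe";

/** Pull the video ID out of a watch or youtu.be URL, or return null. */
function extractVideoId(url: string): string | null {
  const match = /(?:[?&]v=|youtu\.be\/)([A-Za-z0-9_-]{11})/.exec(url);
  return match ? match[1] : null;
}

/** Derive every intermediate path from the video ID alone. */
function workspacePaths(
  videoId: string,
  root = "video-to-transcript-workspace",
): WorkspacePaths {
  return {
    video: `${root}/${videoId}.mp4`,
    audio: `${root}/${videoId}.mp3`,
    transcript: `${root}/${videoId}.json`,
  };
}

/**
 * Decide which steps to run. A step runs when --force is set, when its
 * output is missing, or when an earlier step had to run again.
 */
function stepsToRun(
  paths: WorkspacePaths,
  exists: (path: string) => boolean,
  force: boolean,
): Step[] {
  const plan: Array<[Step, string]> = [
    ["download", paths.video],
    ["extract", paths.audio],
    ["transcribe", paths.transcript],
  ];
  const steps: Step[] = [];
  let stale = force;
  for (const [step, path] of plan) {
    if (stale || !exists(path)) {
      steps.push(step);
      stale = true; // everything downstream must be rebuilt
    }
  }
  return steps;
}
```

Passing the existence check in as a function keeps the logic testable without touching the filesystem; in practice it could be backed by `fs.existsSync`.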