video-to-audio

Here are 59 public repositories matching this topic...

hkchengrex / MMAudio

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

audio computer-vision deep-learning audio-synthesis video-to-audio text-to-audio

Updated Aug 18, 2025
Python

Text-to-Audio / Make-An-Audio

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

latent-space video-to-audio diffusion-models text-to-audio latent-diffusion

Updated May 22, 2024
Python

open-mmlab / FoleyCrafter

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝

audio-processing video-to-audio diffusion-models aigc foley-sound-synthesis

Updated Jul 26, 2024
Python

Tencent-Hunyuan / HunyuanVideo-Foley

HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.

tta video-to-audio text-to-audio text-to-video foley-sound-synthesis foley-art aigc-audio text-video-to-audio

Updated Sep 8, 2025
Python

liuhuadai / OmniAudio

[ICML 2025] PyTorch Implementation of "OmniAudio: Generating Spatial Audio from 360-Degree Video"

spatial-audio video-to-audio panoramic-video-to-spatial-audio-generation

Updated Jun 27, 2025
Python

A stable and fast Telegram video converter bot that can encode with different libraries and resolutions, compress videos, convert video to audio and other video formats, rename files with thumbnail support, generate screenshots, and trim videos.

bot screenshot converter telegram telegram-bot encoder rename compress compressor rename-files hevc video-converter trimmer renamer video-compressor video-trimmer video-to-audio compressor-bot

Updated Jun 5, 2023
Python

TencentARC / AudioStory

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

video-to-audio diffusion-models text-to-audio audio-generation multimodal-large-language-models video-dubbing

Updated Sep 2, 2025
Jupyter Notebook

ai-forever / Kandinsky-4

Text and image to video generation: Kandinsky 4.0 (2024)

video distillation video-generation image-to-video video-to-audio text-to-video kandinsky video-distillation

Updated Dec 17, 2024
Python

RoySheffer / im2wav

Official implementation of the pipeline presented in "I Hear Your True Colors: Image Guided Audio Generation"

audio machine-learning pytorch video-to-audio image-to-audio audio-generation

Updated Jan 18, 2023
Python

ALucek / NeedleInAVidStack

Extract, timestamp, and analyze specific content from video collections using LLM-powered audio/video processing.

video-processing video-to-text video-to-audio llm llm-video-understanding llm-audio-understanding

Updated Feb 22, 2025
Python

IOriens / whisper-video

Generate subtitles for all the videos in a folder with OpenAI's Whisper, privately on your computer.

summary whisper subtitle-generator video-to-text video-to-audio langchain

Updated May 3, 2024
Python

hkchengrex / av-benchmark

Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs, LAION-CLAP, MS-CLAP, DeSync

audio benchmarking evaluation audio-synthesis video-to-audio text-to-audio

Updated Aug 20, 2025
Python

DragonLiu1995 / video-to-audio-through-text

[NeurIPS 2024] Code, dataset, and samples for the VATT paper “Tell What You Hear From What You See - Video to Audio Generation Through Text”

generative-models neurips video-to-audio audio-generation generative-ai multi-modal-llms neurips-2024 neurips-2024-presentation sight-to-sound visual-to-audio

Updated Jul 24, 2025
Python

heng-hw / V2A-Mapper

[AAAI 2024] V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

audio video-to-audio image-to-audio audio-generation vision-to-audio aaai2024

Updated Dec 14, 2023

comingAlive / ffmpegaudioextract.xyz

A minimalistic wasm-based web application for instant audio extraction; video to audio conversion.

ffmpeg nextjs wasm video-to-audio audio-extractor ffmpeg-wasm

Updated Feb 2, 2023
TypeScript

Ceaglex / LoVA

The code and weights for LoVA, a novel model for long-form video-to-audio generation. Based on the Diffusion Transformer (DiT) architecture, LoVA proves more effective at generating long-form audio than existing autoregressive models and UNet-based diffusion models.

multimodal video-to-audio audio-generation