[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Updated Aug 18, 2025 - Python
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝
HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.
[ICML 2025] PyTorch Implementation of "OmniAudio: Generating Spatial Audio from 360-Degree Video"
A stable and fast Telegram video converter bot that can encode with different libraries and resolutions, compress videos, convert video to audio and other video formats, rename files with thumbnail support, generate screenshots, and trim videos.
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
Text and image to video generation: Kandinsky 4.0 (2024)
Official implementation of the pipeline presented in "I Hear Your True Colors: Image Guided Audio Generation"
Extract, timestamp, and analyze specific content from video collections using LLM-powered audio/video processing.
Generate subtitles for all the videos in a folder with OpenAI's Whisper, privately on your computer.
Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs, LAION-CLAP, MS-CLAP, DeSync
[NeurIPS 2024] Code, Dataset, Samples for the VATT paper “Tell What You Hear From What You See - Video to Audio Generation Through Text”
[AAAI 2024] V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
A minimalistic wasm-based web application for instant audio extraction; video to audio conversion.
The code and weights for LoVA, a novel model for long-form video-to-audio generation. Based on the Diffusion Transformer (DiT) architecture, LoVA proves to be more effective at generating long-form audio than existing autoregressive models and UNet-based diffusion models.
JavaScript video to audio. Front-end video-to-audio conversion: load the video with FileReader, decode it with decodeAudioData, re-render it with OfflineAudioContext, and finally convert the AudioBuffer to WAV.
Luma AI Video + Audio + Image Generation and RunwayML Video Generation from Image and Text
Downloads public Instagram content
Python script to convert MP4 video to MP3 audio.