New V3 Silero VAD is Already Here

snakers4 released this 07 Dec 12:17


236d250

Main changes

One VAD to rule them all! The new model includes the functionality of the previous ones, with improved quality and speed;
Flexible sampling rate: 8000 Hz and 16000 Hz are supported;
Flexible chunk size: the minimum chunk size is just 30 milliseconds;
100k parameters;
GPU and batching are supported;
Radically simplified examples.
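
To make the chunk-size figures concrete, here is a minimal sketch that converts a chunk duration to a sample count at the two supported sampling rates. The helper name ms_to_samples is hypothetical and is not part of the Silero VAD utilities; it only does the arithmetic.

```python
def ms_to_samples(duration_ms: float, sampling_rate: int) -> int:
    """Convert a chunk duration in milliseconds to a whole number of samples.

    Hypothetical helper for illustration; not part of the Silero VAD package.
    """
    return int(round(duration_ms * sampling_rate / 1000))


# The 30 ms minimum chunk size at both supported sampling rates.
min_chunk_8k = ms_to_samples(30, 8000)    # 240 samples
min_chunk_16k = ms_to_samples(30, 16000)  # 480 samples
```

By the same arithmetic, a 1536-sample window at 16000 Hz covers 96 ms of audio.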

Migration

Please see the new examples.

The new get_speech_timestamps is a simplified, unified replacement for the deprecated get_speech_ts and get_speech_ts_adaptive methods.

speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
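
As a rough illustration of working with the result, the sketch below converts sample-based timestamps to seconds. It assumes the function returns a list of dicts with 'start' and 'end' keys holding sample indices; that shape, the helper timestamps_to_seconds, and the example values are assumptions made for this sketch, not output from the model.

```python
from typing import Dict, List


def timestamps_to_seconds(
    timestamps: List[Dict[str, int]],
    sampling_rate: int,
    ndigits: int = 2,
) -> List[Dict[str, float]]:
    """Convert 'start'/'end' sample indices to seconds.

    Hypothetical helper; assumes each entry looks like {'start': int, 'end': int}.
    """
    return [
        {
            'start': round(ts['start'] / sampling_rate, ndigits),
            'end': round(ts['end'] / sampling_rate, ndigits),
        }
        for ts in timestamps
    ]


# Made-up example data in the assumed output shape (sample indices at 16000 Hz).
example_timestamps = [{'start': 16000, 'end': 40000}, {'start': 48000, 'end': 72000}]
seconds = timestamps_to_seconds(example_timestamps, sampling_rate=16000)

# Total amount of detected speech, in seconds.
total_speech = sum(ts['end'] - ts['start'] for ts in seconds)
```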

The new VADIterator class serves as an example for streaming tasks, replacing the deprecated VADiterator and VADiteratorAdaptive.

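vad_iterator = VADIterator(model)
window_size_samples = 1536  # 96 ms per chunk at 16000 Hz

# Feed the audio to the iterator one fixed-size chunk at a time.
for i in range(0, len(wav), window_size_samples):
    speech_dict = vad_iterator(wav[i: i + window_size_samples], return_seconds=True)
    if speech_dict:
        print(speech_dict, end=' ')
vad_iterator.reset_states()  # reset model states after each audio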
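
In a streaming setting it is often handy to collect the per-chunk results into complete segments. The sketch below assumes each call yields either nothing, a dict with a 'start' key, or a dict with an 'end' key; that event shape, the helper pair_events, and the sample events are assumptions for illustration, so check them against the actual iterator output before relying on this.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def pair_events(
    events: Iterable[Optional[Dict[str, float]]],
) -> List[Tuple[float, float]]:
    """Pair streaming 'start' and 'end' events into (start, end) segments.

    Hypothetical helper; assumes events look like {'start': t}, {'end': t}, or None.
    A 'start' with no matching 'end' is dropped.
    """
    segments: List[Tuple[float, float]] = []
    current_start: Optional[float] = None
    for event in events:
        if not event:
            continue  # no speech boundary in this chunk
        if 'start' in event:
            current_start = event['start']
        elif 'end' in event and current_start is not None:
            segments.append((current_start, event['end']))
            current_start = None
    return segments


# Made-up events in the assumed shape, one per processed chunk.
events = [None, {'start': 0.5}, None, None, {'end': 2.1}, None, {'start': 3.0}, {'end': 4.4}]
segments = pair_events(events)
```

Dropping an unmatched 'start' is a design choice for this sketch; a real application might instead close the open segment at the end of the audio.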
