New V3 Silero VAD is Already Here

@snakers4 released this 07 Dec 12:17
· 293 commits to master since this release
236d250

Main changes

  • One VAD to rule them all! The new model covers the functionality of the previous ones, with improved quality and speed;
  • Flexible sampling rate: 8000 Hz and 16000 Hz are supported;
  • Flexible chunk size: the minimum chunk size is just 30 milliseconds;
  • 100k parameters;
  • GPU and batching are supported;
  • Radically simplified examples.
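
Because chunk sizes are given in milliseconds while the streaming API takes sample counts, a small conversion can help when picking a window. The sketch below is only arithmetic; the helper names `ms_to_samples` and `samples_to_ms` are hypothetical and not part of Silero VAD.

```python
def ms_to_samples(duration_ms: float, sampling_rate: int) -> int:
    """Number of audio samples that cover duration_ms at sampling_rate (Hz)."""
    return int(round(duration_ms * sampling_rate / 1000))


def samples_to_ms(num_samples: int, sampling_rate: int) -> float:
    """Duration in milliseconds of num_samples at sampling_rate (Hz)."""
    return 1000 * num_samples / sampling_rate


# The 30 ms minimum chunk, expressed in samples for both supported rates.
for sampling_rate in (8000, 16000):
    print(sampling_rate, "Hz:", ms_to_samples(30, sampling_rate), "samples")
```

By the same arithmetic, a window of 1536 samples corresponds to 96 ms at 16000 Hz.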

Migration

Please see the new examples.

The new get_speech_timestamps is a simplified, unified replacement for the deprecated get_speech_ts and get_speech_ts_adaptive methods.

speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
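
As a rough illustration of working with the result, the sketch below converts timestamps from samples to seconds. It assumes that speech_timestamps is a list of dicts with 'start' and 'end' keys measured in samples; that format is not stated in these notes, so check it against the new examples. The helper `timestamps_to_seconds` and the sample data are hypothetical.

```python
def timestamps_to_seconds(speech_timestamps, sampling_rate=16000, ndigits=3):
    """Convert sample-based segments to seconds.

    Assumes each item is a dict with 'start' and 'end' given in samples.
    """
    return [
        {
            "start": round(ts["start"] / sampling_rate, ndigits),
            "end": round(ts["end"] / sampling_rate, ndigits),
        }
        for ts in speech_timestamps
    ]


# Made-up segments for illustration only, not real model output.
example_timestamps = [
    {"start": 16000, "end": 40000},
    {"start": 48000, "end": 64000},
]

segments = timestamps_to_seconds(example_timestamps, sampling_rate=16000)
total_speech = sum(seg["end"] - seg["start"] for seg in segments)
print(segments)
print("total speech, seconds:", total_speech)
```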

The new VADIterator class serves as an example for streaming tasks, replacing the deprecated VADiterator and VADiteratorAdaptive.

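vad_iterator = VADIterator(model)
window_size_samples = 1536  # number of samples in a single audio chunk

# Feed the audio to the iterator one chunk at a time
for i in range(0, len(wav), window_size_samples):
    speech_dict = vad_iterator(wav[i: i + window_size_samples], return_seconds=True)
    if speech_dict:
        print(speech_dict, end=' ')
vad_iterator.reset_states()  # reset model states before processing another audio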
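
Note that the last slice in the loop above can be shorter than window_size_samples when the audio length is not a multiple of the window. These notes do not say how a short final chunk is handled, so the sketch below simply shows one way to produce fixed-size chunks by zero-padding the tail. It works on plain Python sequences; the helper `iter_fixed_chunks` is hypothetical, and whether padding is needed at all is an assumption.

```python
def iter_fixed_chunks(samples, window_size_samples, pad_value=0.0):
    """Yield consecutive chunks of exactly window_size_samples items.

    The final chunk is right-padded with pad_value if the input runs out.
    """
    for i in range(0, len(samples), window_size_samples):
        chunk = list(samples[i: i + window_size_samples])
        missing = window_size_samples - len(chunk)
        if missing > 0:
            chunk.extend([pad_value] * missing)
        yield chunk


# 4000 dummy samples split into windows of 1536 samples.
dummy_audio = [1.0] * 4000
chunks = list(iter_fixed_chunks(dummy_audio, 1536))
print(len(chunks), [len(c) for c in chunks])
```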