
Speaker Test

This repo presents a short test for evaluating how effectively different SpeechBrain models differentiate between speakers.

The dataset used is Mozilla's Common Voice, obtained through Kaggle. This dataset holds thousands of short clips from different speakers.

The speaker embeddings are found using the selected model, and their similarity scores are then computed as the cosine similarity between the embeddings. The results are presented in a number of ways. The models tested are:

speechbrain/spkrec-ecapa-voxceleb
speechbrain/spkrec-xvect-voxceleb
speechbrain/spkrec-ecapa-cnceleb
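For reference, here is a minimal sketch of the embedding and scoring step, assuming the classic speechbrain.pretrained interface (newer releases move it to speechbrain.inference) and placeholder clip paths:

import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Load one of the pretrained models listed above.
classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

def embed(path):
    # torchaudio needs an audio backend such as ffmpeg (see Usage below).
    signal, _sample_rate = torchaudio.load(path)
    return classifier.encode_batch(signal).squeeze()

# Cosine similarity between two clips' embeddings, in [-1, 1].
# "clip_a.wav" and "clip_b.wav" are placeholder paths.
score = torch.nn.functional.cosine_similarity(
    embed("clip_a.wav"), embed("clip_b.wav"), dim=0
)
print(float(score))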

Usage

To run the tests on your own, you can simply run:

poetry env use python3
poetry install
python3 main.py

Note that in order for torchaudio to load the audio files, an audio backend must be present on the system. The recommended software is ffmpeg version 6. This can be installed on Ubuntu using sudo apt install ffmpeg, or on macOS using Homebrew: brew install ffmpeg@6 && brew link ffmpeg@6.
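To confirm that torchaudio can see the backend, a quick check (assuming a recent torchaudio release, which exposes list_audio_backends):

python3 -c "import torchaudio; print(torchaudio.list_audio_backends())"

The printed list should include ffmpeg.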

Results

The results for the three models are presented below. The random seed used to sample is 80085. Hehe.

Similarity matrix

These figures show, at a glance, how similar the clips of the different speakers are to one another.
(Figures 1a, 1b, and 1c: similarity matrices, one per model)
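A sketch of how such a matrix can be built from a stack of embeddings (the random tensor below is a stand-in for real embeddings; spkrec-ecapa-voxceleb outputs 192-dimensional vectors):

import torch

# Stand-in for real speaker embeddings: 4 clips, 192 dimensions each.
embeddings = torch.randn(4, 192)

# After normalizing each row, a single matrix product yields all pairwise
# cosine similarities at once: entry (i, j) compares clip i with clip j.
normed = torch.nn.functional.normalize(embeddings, dim=1)
similarity_matrix = normed @ normed.T  # shape: (n_clips, n_clips)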

Histogram of similarity scores

The histograms show the rough distribution of the similarity scores across all speakers.
(Figures 2a, 2b, and 2c: score histograms, one per model)

E.C.D.F of similarity scores

An E.C.D.F. (empirical cumulative distribution function) gives a good idea of the different quantiles of the distribution.
(Figures 3a, 3b, and 3c: E.C.D.F.s, one per model)
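An E.C.D.F. is straightforward to compute by hand; a minimal sketch with NumPy and Matplotlib, where the random array stands in for a model's pooled similarity scores:

import numpy as np
import matplotlib.pyplot as plt

# Stand-in for real scores; in practice, use the flattened
# off-diagonal entries of a similarity matrix.
scores = np.random.uniform(-1.0, 1.0, size=1000)

# The E.C.D.F. at x is the fraction of scores less than or equal to x.
xs = np.sort(scores)
ys = np.arange(1, len(xs) + 1) / len(xs)
plt.step(xs, ys, where="post")
plt.xlabel("similarity score")
plt.ylabel("fraction of pairs")
plt.show()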

Quantiles

Some quantiles of the score distribution are computed. These are useful as thresholds for acceptance.

Model                   q(0.6)   q(0.7)   q(0.75)   q(0.8)   q(0.9)
spkrec-ecapa-voxceleb   0.6304   0.6657   0.6821    0.6995   0.7451
spkrec-xvect-voxceleb   0.9659   0.9702   0.9723    0.9744   0.9789
spkrec-ecapa-cnceleb    0.4949   0.5428   0.5688    0.5974   0.6694
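Thresholds like these can be read directly off the pooled scores; a small sketch, with the same stand-in array as above:

import numpy as np

# Stand-in for one model's pooled similarity scores.
scores = np.random.uniform(-1.0, 1.0, size=1000)

# np.quantile returns the score below which the given fraction
# of all pairwise comparisons fall.
print(np.quantile(scores, [0.6, 0.7, 0.75, 0.8, 0.9]))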

Conclusions

For my use case, speechbrain/spkrec-ecapa-voxceleb seems like a better fit than speechbrain/spkrec-xvect-voxceleb, as it does not give high similarity scores when the speakers are distinct.

More testing should be done on how well the models identify the same speaker across different clips. However, since the models were prepared and tested on the VoxCeleb dataset separately, I trust that they both already excel in this regard.
