Fine-tuning wav2vec2 for speaker recognition

Authors

Nik Vaessen, David A Van Leeuwen

Publication date

2022/5/23

Conference

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pages

7967-7971

Publisher

IEEE

Description

This paper explores applying the wav2vec2 framework to speaker recognition instead of speech recognition. We study the effectiveness of the pre-trained weights on the speaker recognition task, and how to pool the wav2vec2 output sequence into a fixed-length speaker embedding. To adapt the framework to speaker recognition, we propose a single-utterance classification variant with cross-entropy or additive angular softmax loss, and an utterance-pair classification variant with binary cross-entropy (BCE) loss. Our best-performing variant achieves a 1.88% equal error rate (EER) on the extended VoxCeleb1 test set, compared to 1.69% EER with an ECAPA-TDNN baseline. Code is available at github.com/nikvaessen/w2v2-speaker.
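
The two ideas the description leans on, pooling a variable-length output sequence into one fixed-length embedding and scoring a system by EER, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: mean pooling is assumed here only as one simple pooling choice, cosine similarity is assumed as the trial score, and the function names `mean_pool`, `cosine_score`, and `equal_error_rate` are hypothetical.

```python
import math


def mean_pool(frames):
    """Average T frame vectors (each of dimension D) into one D-dim embedding.

    Assumption: mean pooling is just one possible pooling strategy.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine_score(a, b):
    """Cosine similarity between two embeddings (assumed trial score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    labels: 1 for a same-speaker trial, 0 for a different-speaker trial.
    Returns the mean of the false-accept and false-reject rates at the
    threshold where the two rates are closest.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    best_gap, eer = float("inf"), None
    for t in sorted(set(scores)):
        far = sum(s >= t for s in neg) / len(neg)  # false-accept rate
        frr = sum(s < t for s in pos) / len(pos)   # false-reject rate
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

In use, each utterance's frame sequence would be pooled with `mean_pool`, each trial pair scored with `cosine_score`, and the resulting scores and same/different labels passed to `equal_error_rate`. A perfectly separable set of trials yields an EER of 0.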

Total citations

Cited by 104

Citations per year: 2021: 1; 2022: 23; 2023: 54; 2024: 25
