Authors
Xin Wang, Chuan Xie, Qiang Wu, Huayi Zhan, Ying Wu
Publication date
2022
Conference
INTERSPEECH
Pages
4775-4779
Description
Text-independent speaker identification has attracted growing attention, yet it remains challenging to extract speaker-specific features from speech with arbitrary content. End-to-end systems trained on utterance-level features suffer performance degradation caused by variation in speech content. To address this issue, this paper proposes a novel phoneme-based approach with the following key features: first, it restricts the variety of speech content by splitting each utterance into a set of phoneme segments and develops phoneme-constrained models to extract segment-level speaker embeddings; second, it leverages a soft-voting mechanism with mono-phonemic thresholds and weights to combine the results of different phonemes. Experimental results on the AISHELL and ASRU2019 datasets show that the proposed approach is effective and robust, outperforming state-of-the-art methods in both EER and accuracy, especially under a larger phonemic mismatch between the enrollment and test utterances. In addition, the proposed system is efficient and can be trained well on a small-scale dataset.
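The soft-voting step described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name `soft_vote`, the use of cosine-similarity scores, and the specific combination rule (weighted average of per-phoneme threshold decisions) are assumptions made for clarity.

```python
def soft_vote(similarities, thresholds, weights):
    """Combine per-phoneme scores into one speaker-match score.

    similarities: dict phoneme -> similarity between enrollment and test
                  segment embeddings for that phoneme (illustrative input)
    thresholds:   dict phoneme -> mono-phonemic acceptance threshold
    weights:      dict phoneme -> per-phoneme reliability weight
    """
    num, den = 0.0, 0.0
    for ph, score in similarities.items():
        # Each phoneme casts a binary vote against its own threshold;
        # the votes are then blended by the phoneme weights.
        vote = 1.0 if score >= thresholds[ph] else 0.0
        num += weights[ph] * vote
        den += weights[ph]
    return num / den if den > 0 else 0.0
```

Per-phoneme thresholds let reliable phonemes (e.g. long vowels) count for more than short, noisy ones, which is one plausible reading of the "mono-phonemic thresholds and weights" in the abstract.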