Authors
Taufiq Hasan, Rahim Saeidi, John HL Hansen, David A Van Leeuwen
Publication date
2013/5/26
Conference
2013 IEEE International Conference on Acoustics, Speech and Signal Processing
Pages
7663-7667
Publisher
IEEE
Description
Speaker recognition systems trained on long-duration utterances are known to perform significantly worse when short test segments are encountered. To address this mismatch, we analyze the effect of duration variability on the phoneme distributions of speech utterances and on i-vector length. We demonstrate that, as utterance duration decreases, the number of detected unique phonemes and the i-vector length approach zero in a logarithmic and a non-linear fashion, respectively. Treating duration variability as additive noise in the i-vector space, we propose three strategies for its compensation: i) multi-duration training of the Probabilistic Linear Discriminant Analysis (PLDA) model, ii) score calibration using log duration as a Quality Measure Function (QMF), and iii) multi-duration PLDA training with synthesized short-duration i-vectors. Experiments are designed based on the 2012 National Institute of Standards …
Total citations
[Bar chart: citations per year, 2013 to 2024]
Scholar articles
T Hasan, R Saeidi, JHL Hansen, DA Van Leeuwen - 2013 IEEE International Conference on Acoustics …, 2013