Authors
Lea Schönherr, Dennis Orth, Martin Heckmann, Dorothea Kolossa
Publication date
2016/12/13
Conference
IEEE Spoken Language Technology Workshop (SLT)
Publisher
IEEE
Description
To improve the accuracy of audio-visual speaker identification, we propose a new approach that achieves an optimal combination of the different modalities on the score level. We use the i-vector method for acoustic speaker recognition and local binary patterns (LBP) for visual speaker recognition. Multiple confidence measures, computed from the input data of both modalities, are used to calculate an optimal weight for the fusion. To this end, oracle weights are chosen so as to maximize the difference between the score of the genuine speaker and the best competing score. Based on these oracle weights, a mapping function for weight estimation is learned. To test the approach, various combinations of noise levels for the acoustic and visual data are considered. We show that the weighted multimodal identification is far less influenced by the presence of noise or distortions in acoustic or visual …
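The oracle-weight idea in the description can be illustrated with a small sketch. This is not the authors' implementation. It assumes a convex linear fusion rule (weight w on the acoustic score, 1 - w on the visual score), a plain grid search over w, and scores that are already normalized to a comparable range. The function names and the example scores are hypothetical.

```python
def fuse(acoustic, visual, weight):
    """Fuse per-speaker scores with an assumed convex combination.

    weight applies to the acoustic score, (1 - weight) to the visual score.
    """
    return [weight * a + (1.0 - weight) * v for a, v in zip(acoustic, visual)]


def margin(fused, genuine):
    """Score of the genuine speaker minus the best competing score."""
    best_competitor = max(s for i, s in enumerate(fused) if i != genuine)
    return fused[genuine] - best_competitor


def oracle_weight(acoustic, visual, genuine, steps=100):
    """Grid-search the weight in [0, 1] that maximizes the margin.

    The grid search is an assumption made for this sketch; the source only
    states that the oracle weight maximizes the margin.
    """
    best_w, best_m = 0.0, float("-inf")
    for k in range(steps + 1):
        w = k / steps
        m = margin(fuse(acoustic, visual, w), genuine)
        if m > best_m:
            best_w, best_m = w, m
    return best_w, best_m


# Hypothetical scores for three enrolled speakers; speaker 0 is genuine.
acoustic = [0.2, 0.9, 0.4]  # noisy audio wrongly favors speaker 1
visual = [0.8, 0.3, 0.1]    # clean video favors speaker 0

w, m = oracle_weight(acoustic, visual, genuine=0)
equal_weight_margin = margin(fuse(acoustic, visual, 0.5), 0)
```

In this toy case the search places all weight on the visual stream, while equal weighting yields a negative margin, meaning the wrong speaker would be identified. Oracle weights need the true identity, so they are only available on labeled training data; the description's learned mapping from confidence measures to weights is what would stand in for them at test time, and it is not sketched here.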
Total citations
Citations per year, 2017 to 2023 (bar chart)
Scholar articles
L Schönherr, D Orth, M Heckmann, D Kolossa - 2016 IEEE Spoken Language Technology Workshop …, 2016