

Environmentally robust audio-visual speaker identification

Authors

Lea Schönherr, Dennis Orth, Martin Heckmann, Dorothea Kolossa

Publication date

2016/12/13

Conference

IEEE Spoken Language Technology Workshop (SLT)

Publisher

IEEE

Description

To improve the accuracy of audio-visual speaker identification, we propose a new approach that optimally combines the different modalities at the score level. We use the i-vector method for acoustic and local binary patterns (LBP) for visual speaker recognition. Multiple confidence measures, computed from the input data of both modalities, are used to estimate an optimal fusion weight. To this end, oracle weights are chosen to maximize the difference between the score of the genuine speaker and that of the person with the best competing score. From these oracle weights, a mapping function for weight estimation is learned. To test the approach, various combinations of noise levels for the acoustic and visual data are considered. We show that the weighted multimodal identification is far less influenced by the presence of noise or distortions in acoustic or visual …
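The description does not give the exact fusion rule, the confidence measures, or the form of the learned mapping, so the following is only a minimal sketch under stated assumptions. It assumes (1) per-speaker audio (i-vector) and visual (LBP) scores on a comparable scale, fused as a convex combination with weight `w` on audio; (2) an oracle weight found by grid search as the `w` that maximizes the genuine speaker's score minus the best competing score (this needs the true identity, so it is only available on training data); and (3) a linear least-squares regression from confidence features to oracle weights standing in for the paper's mapping function. All function names and the confidence features are hypothetical.

```python
import numpy as np


def fuse_scores(audio_scores, video_scores, weight):
    """Assumed convex score-level fusion: `weight` on audio, 1 - weight on video."""
    audio_scores = np.asarray(audio_scores, dtype=float)
    video_scores = np.asarray(video_scores, dtype=float)
    return weight * audio_scores + (1.0 - weight) * video_scores


def margin(fused_scores, genuine):
    """Score of the genuine speaker minus the best competing speaker's score."""
    competitors = np.delete(fused_scores, genuine)
    return fused_scores[genuine] - competitors.max()


def oracle_weight(audio_scores, video_scores, genuine, grid=None):
    """Grid-search the fusion weight that maximizes the margin (requires the true label)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    margins = [margin(fuse_scores(audio_scores, video_scores, w), genuine) for w in grid]
    return float(grid[int(np.argmax(margins))])


def fit_weight_mapping(confidences, oracle_weights):
    """Hypothetical mapping: linear least squares from confidence features (plus bias) to weights."""
    confidences = np.asarray(confidences, dtype=float)
    X = np.column_stack([confidences, np.ones(len(confidences))])
    coef, *_ = np.linalg.lstsq(X, np.asarray(oracle_weights, dtype=float), rcond=None)
    return coef


def predict_weight(coef, confidence):
    """Estimate a fusion weight for unseen data and clip it to the valid range [0, 1]."""
    x = np.append(np.asarray(confidence, dtype=float), 1.0)
    return float(np.clip(x @ coef, 0.0, 1.0))


if __name__ == "__main__":
    # Toy trial with three enrolled speakers; speaker 0 is the genuine one.
    # Audio alone favors speaker 1 and video alone favors speaker 2.
    audio = [0.6, 0.7, 0.0]
    video = [0.6, 0.0, 0.7]
    w = oracle_weight(audio, video, genuine=0)
    print("oracle weight:", w)
    print("margin at oracle weight:", margin(fuse_scores(audio, video, w), 0))
```

In this toy trial neither modality alone ranks the genuine speaker first, but the grid search settles on an equal weighting (0.5), where speaker 0 wins with a margin of 0.25. In a full system, the oracle weights collected over many training trials would be paired with their confidence features, passed to `fit_weight_mapping`, and `predict_weight` would then supply the fusion weight at test time, when the genuine identity is unknown.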

Total citations

Cited by 10

[Citations per year chart, 2017–2023]
