Authors
Lea Schönherr, Steffen Zeiler, Dorothea Kolossa
Publication date
2017/12/16
Conference
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Publisher
IEEE
Description
Acoustic speaker recognition systems are highly vulnerable to spoofing attacks via replayed or synthesized utterances. One possible countermeasure is audio-visual speaker recognition. However, adding the visual stream alone does not prevent spoofing attacks completely; it only provides further information for assessing the authenticity of the utterance. Many systems consider the audio and video modalities independently and can easily be spoofed by imitating only a single modality or by a bimodal replay attack with a victim's photograph or video. Therefore, we propose the simultaneous verification of data synchronicity and transcription in a challenge-response setup. We use coupled hidden Markov models (CHMMs) for text-dependent spoofing detection and introduce new features that provide information about the transcription of the utterance and the synchronicity of both streams. We evaluate …
Scholar articles
L Schönherr, S Zeiler, D Kolossa - 2017 IEEE Automatic Speech Recognition and …, 2017