Authors
Weifeng Li, Kenichi Kumatani, John Dines, Mathew Magimai-Doss, Hervé Bourlard
Publication date
2008
Conference
Machine Learning for Multimodal Interaction: 5th International Workshop, MLMI 2008, Utrecht, The Netherlands, September 8-10, 2008. Proceedings 5
Pages
110-118
Publisher
Springer Berlin Heidelberg
Description
This paper presents our approach to automatic speech recognition (ASR) of overlapping speech. Our system consists of two principal components: a speech separation component and a feature estimation component. In the speech separation phase, we first estimate the speaker's position; the speaker location information is then used in a GSC-configured beamformer with a minimum mutual information (MMI) criterion, followed by a Zelinski and binary-masking post-filter, to separate the speech of the different speakers. In the feature estimation phase, neural networks are trained to learn the mapping from the features extracted from the pre-separated speech to those extracted from the close-talking microphone speech signal. The outputs of the neural networks are then used to generate acoustic features, which are subsequently used in acoustic model adaptation and system evaluation. The proposed …
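As a rough illustration of the feature estimation stage described in the abstract, the sketch below regresses close-talking features from separated-speech features with a small neural network. This is a minimal sketch, not the authors' implementation: the feature dimensionality, network size, activation, and training settings are all assumptions for illustration.

```python
import torch
import torch.nn as nn

# Assumed feature dimensionality (e.g. log mel-filterbank channels);
# the paper's actual feature setup is not specified here.
FEAT_DIM = 24

# Hypothetical MLP mapping features of beamformer-separated speech to
# features of the close-talking microphone speech.
mapper = nn.Sequential(
    nn.Linear(FEAT_DIM, 128),
    nn.Sigmoid(),
    nn.Linear(128, FEAT_DIM),
)

def train_mapper(separated_feats, close_talk_feats, epochs=10, lr=1e-3):
    """Train the mapping by minimizing the mean squared error between
    mapped separated-speech features and close-talking features.

    Both arguments: float tensors of shape (num_frames, FEAT_DIM),
    frame-aligned across the two recording conditions.
    """
    opt = torch.optim.Adam(mapper.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(mapper(separated_feats), close_talk_feats)
        loss.backward()
        opt.step()
    return mapper

# At test time, the mapper's outputs would serve as acoustic features
# for model adaptation and evaluation, as the abstract describes.
```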