Authors
Weifeng Li, Kenichi Kumatani, John Dines, Mathew Magimai-Doss, Hervé Bourlard
Publication date
2008
Conference
Machine Learning for Multimodal Interaction: 5th International Workshop, MLMI 2008, Utrecht, The Netherlands, September 8-10, 2008. Proceedings 5
Pages
110-118
Publisher
Springer Berlin Heidelberg
Description
This paper presents our approach to automatic speech recognition (ASR) of overlapping speech. Our system consists of two principal components: a speech separation component and a feature estimation component. In the speech separation phase, we first estimate the speaker's position; the speaker location information is then used in a GSC-configured beamformer with a minimum mutual information (MMI) criterion, followed by a Zelinski and binary-masking post-filter, to separate the speech of the different speakers. In the feature estimation phase, neural networks are trained to learn the mapping from the features extracted from the pre-separated speech to those extracted from the close-talking microphone speech signal. The outputs of the neural networks are then used to generate acoustic features, which are subsequently used in acoustic model adaptation and system evaluation. The proposed …
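As a rough illustration of the feature estimation stage described in the abstract, the sketch below regresses close-talking features from separated-speech features with a small neural network. This is a minimal sketch, not the authors' implementation: the feature dimensionality, network size, activation, and training settings are all assumptions for illustration.

```python
import torch
import torch.nn as nn

# Assumed feature dimensionality (e.g. log mel-filterbank channels);
# the paper's actual feature setup is not specified here.
FEAT_DIM = 24

# Hypothetical MLP mapping features of beamformer-separated speech to
# features of the close-talking microphone speech.
mapper = nn.Sequential(
    nn.Linear(FEAT_DIM, 128),
    nn.Sigmoid(),
    nn.Linear(128, FEAT_DIM),
)

def train_mapper(separated_feats, close_talk_feats, epochs=10, lr=1e-3):
    """Train the mapping by minimizing the mean squared error between
    mapped separated-speech features and close-talking features.

    Both arguments: float tensors of shape (num_frames, FEAT_DIM),
    frame-aligned across the two recording conditions.
    """
    opt = torch.optim.Adam(mapper.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(mapper(separated_feats), close_talk_feats)
        loss.backward()
        opt.step()
    return mapper

# At test time, the mapper's outputs would serve as acoustic features
# for model adaptation and evaluation, as the abstract describes.
```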