Authors
Matthias Wölfel, Christian Fügen, Shajith Ikbal, John W McDonough
Publication date
2006
Conference
Ninth International Conference on Spoken Language Processing
Description
In this work, we present our progress in multi-source far-field automatic speech-to-text transcription for lecture speech. In particular, we show how the best of several far-field channels can be selected based on a signal-to-noise ratio criterion, and how the signals from multiple channels can be combined either at the waveform level, using blind channel combination, or at the hypothesis level, using confusion network techniques, to improve the accuracy of a far-field lecture transcription system. Using the techniques described here, we ran a series of experiments on the test set used by the US National Institute of Standards and Technology for the RT-05S evaluation. For the multiple distant microphones (MDM) task of RT-05S, our system achieved a word error rate of 38.5%, which represents an improvement of over 13% absolute compared to the best reported results in the RT-05S evaluation.
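The channel-selection idea mentioned in the description can be illustrated with a minimal sketch. The description does not say how the signal-to-noise ratio is estimated, so the estimator below is an assumption: it treats the quietest 10% of frames as noise and the loudest 10% as speech. The function names, the 400-sample frame length, and the 10% thresholds are all hypothetical and are not taken from the paper.

```python
import math


def frame_energies(signal, frame_len=400):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def estimate_snr_db(signal, frame_len=400):
    """Crude SNR estimate in dB (an assumption, not the paper's estimator).

    The quietest 10% of frames stand in for noise and the loudest 10%
    for speech; the ratio of their mean energies is returned in decibels.
    """
    energies = sorted(frame_energies(signal, frame_len))
    if not energies:
        raise ValueError("signal is shorter than one frame")
    k = max(1, len(energies) // 10)
    noise = sum(energies[:k]) / k
    speech = sum(energies[-k:]) / k
    eps = 1e-12  # guards against log of zero on silent input
    return 10.0 * math.log10((speech + eps) / (noise + eps))


def select_best_channel(channels, frame_len=400):
    """Return (index of the channel with the highest estimated SNR, all SNRs)."""
    snrs = [estimate_snr_db(ch, frame_len) for ch in channels]
    best = max(range(len(snrs)), key=snrs.__getitem__)
    return best, snrs
```

Calling `select_best_channel` with a list of equal-length sample sequences, one per distant microphone, returns the index of the channel that would be passed on to the recognizer under this criterion. A production system would more plausibly use a voice-activity detector to separate speech from noise frames than fixed energy percentiles.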
Total citations
[Chart: citations per year, 2006 to 2019]
Scholar articles
M Wölfel, C Fügen, S Ikbal, JW McDonough - Ninth International Conference on Spoken Language …, 2006