Authors
Todd A Stephenson, Mathew Magimai Doss, Hervé Bourlard
Publication date
2004/4/19
Journal
IEEE Transactions on Speech and Audio Processing
Volume
12
Issue
3
Pages
189-203
Publisher
IEEE
Description
State-of-the-art automatic speech recognition (ASR) systems are usually based on hidden Markov models (HMMs) that emit cepstral-based features which are assumed to be piecewise stationary. While not really robust to noise, these features are also known to be very sensitive to "auxiliary" information, such as pitch, energy, rate-of-speech (ROS), etc. Attempts so far to include such auxiliary information in state-of-the-art ASR systems have often been based on simply appending these auxiliary features to the standard acoustic feature vectors. In the present paper, we investigate different approaches to incorporating this auxiliary information using dynamic Bayesian networks (DBNs) or hybrid HMM/ANNs (HMMs with artificial neural networks). These approaches are motivated by the fact that the auxiliary information is not necessarily (directly) emitted by the HMM states but, rather, carries higher-level information (e.g …
Total citations
[Citations-per-year chart, 2003-2021; individual yearly counts not recoverable]
Scholar articles
TA Stephenson, MM Doss, H Bourlard - IEEE Transactions on Speech and Audio Processing, 2004