Audio-visual large vocabulary continuous speech recognition in the broadcast domain

Authors

Sankar Basu, Chalapathy Neti, Nitendra Rajput, A Senior, L Subramaniam, Ashish Verma

Publication date

1999/9/13

Conference

1999 IEEE Third Workshop on Multimedia Signal Processing (Cat. No. 99TH8451)

Pages

475-481

Publisher

IEEE

Description

Considers the problem of combining visual cues with audio signals to improve automatic machine recognition of speech. Although significant progress has been made in large-vocabulary continuous speech recognition (LVCSR) over the last few years, the technology to date is most effective only under controlled conditions, such as low noise, speaker-dependent recognition, and read (as opposed to conversational) speech. Meanwhile, although augmenting speech recognition with visual cues has attracted the attention of researchers over the last couple of years, most efforts in this domain remain preliminary: unlike LVCSR efforts, tasks have been limited to small vocabularies (e.g. commands, digits) and often to speaker-dependent training or isolated-word speech, where word boundaries are artificially well defined.

Total citations

Cited by 58

Citations per year, 1999-2021 (chart)
