Convolutional neural networks-based continuous speech recognition using raw speech signal

Authors

Dimitri Palaz, Mathew Magimai-Doss, Ronan Collobert

Publication date

2015/4/19

Conference

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pages

4295-4299

Publisher

IEEE

Description

State-of-the-art automatic speech recognition systems model the relationship between the acoustic speech signal and phone classes in two stages: extraction of spectral-based features based on prior knowledge, followed by training of an acoustic model, typically an artificial neural network (ANN). In our recent work, it was shown that Convolutional Neural Networks (CNNs) can model phone classes from the raw acoustic speech signal, reaching performance on par with other existing feature-based approaches. This paper extends the CNN-based approach to a large vocabulary speech recognition task. More precisely, we compare the CNN-based approach against the conventional ANN-based approach on the Wall Street Journal corpus. Our studies show that the CNN-based approach achieves better performance than the conventional ANN-based approach with as many parameters. We also show that the features …

Total citations

Cited by 229

Citations per year: 2014: 1, 2015: 3, 2016: 16, 2017: 21, 2018: 34, 2019: 31, 2020: 23, 2021: 38, 2022: 28, 2023: 23, 2024: 10
