End-to-end acoustic modeling using convolutional neural networks for HMM-based automatic speech recognition

Authors

Dimitri Palaz, Mathew Magimai-Doss, Ronan Collobert

Publication date

2019/4/1

Journal

Speech Communication

Volume

108

Pages

15-32

Publisher

North-Holland

Description

In hidden Markov model (HMM) based automatic speech recognition (ASR) systems, modeling the statistical relationship between the acoustic speech signal and the HMM states that represent linguistically motivated subword units, such as phonemes, is a crucial step. This is typically achieved by first extracting acoustic features from the speech signal based on prior knowledge, such as speech perception and/or speech production knowledge, and then training a classifier, such as an artificial neural network (ANN) or a Gaussian mixture model, that estimates the emission probabilities of the HMM states. This paper investigates an end-to-end acoustic modeling approach using convolutional neural networks (CNNs), where the CNN takes the raw speech signal as input and estimates the HMM state class-conditional probabilities at the output. Alternatively, as opposed to a divide-and-conquer strategy (i.e., separating feature …
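
To make the idea of a CNN that maps raw speech samples directly to HMM state probabilities more concrete, the following is a minimal NumPy sketch. It is not the authors' architecture: the single convolution layer, the ReLU nonlinearity, max pooling over time, the kernel width, the stride, the window length, and the number of states are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

def conv1d(x, kernels, stride):
    """Valid 1-D convolution: x is (in_ch, time), kernels is (out_ch, in_ch, width)."""
    out_ch, in_ch, width = kernels.shape
    n_frames = (x.shape[1] - width) // stride + 1
    out = np.empty((out_ch, n_frames))
    for t in range(n_frames):
        window = x[:, t * stride : t * stride + width]  # (in_ch, width)
        # Sum over input channels and kernel taps for every output channel.
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def state_posteriors(raw_window, kernels, weights, bias):
    """Map a window of raw samples to a probability vector over HMM states."""
    feats = conv1d(raw_window[np.newaxis, :], kernels, stride=10)
    feats = np.maximum(feats, 0.0)   # ReLU (assumed nonlinearity)
    pooled = feats.max(axis=1)       # max pooling over time -> (out_ch,)
    return softmax(weights @ pooled + bias)

# Hypothetical sizes; none of these values come from the paper.
rng = np.random.default_rng(0)
n_states = 5
raw_window = rng.standard_normal(4000)            # stand-in for raw speech samples
kernels = rng.standard_normal((8, 1, 30)) * 0.1   # 8 filters of width 30
weights = rng.standard_normal((n_states, 8)) * 0.1
bias = np.zeros(n_states)

posteriors = state_posteriors(raw_window, kernels, weights, bias)
print(posteriors)
```

The point of the sketch is only that no hand-designed feature extraction step sits between the waveform and the classifier: the convolution filters play that role and would be learned jointly with the output layer.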

Total citations

Cited by 155

Citations per year: 2019: 9, 2020: 15, 2021: 31, 2022: 35, 2023: 36, 2024: 26

Scholar articles

End-to-end acoustic modeling using convolutional neural networks for HMM-based automatic speech recognition

D Palaz, M Magimai-Doss, R Collobert - Speech Communication, 2019