Authors
Nicholas Cummins, Shahin Amiriparian, Gerhard Hagerer, Anton Batliner, Stefan Steidl, Björn W Schuller
Publication date
2017/10/19
Book
Proceedings of the 25th ACM international conference on Multimedia
Pages
478-484
Description
The outputs of the higher layers of deep pre-trained convolutional neural networks (CNNs) have consistently been shown to provide a rich representation of an image for use in recognition tasks. This study explores the suitability of such an approach for speech-based emotion recognition tasks. First, we detail a new acoustic feature representation, denoted as deep spectrum features, derived from feeding spectrograms through a very deep image classification CNN and forming a feature vector from the activations of the last fully connected layer. We then compare the performance of our novel features with standardised brute-force and bag-of-audio-words (BoAW) acoustic feature representations for 2- and 5-class speech-based emotion recognition in clean, noisy and denoised conditions. The presented results show that image-based approaches are a promising avenue of research for speech-based recognition …
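The description above outlines the deep spectrum pipeline: render a spectrogram of the speech signal, pass it through a pre-trained image classification CNN, and use the activations of the last fully connected layer as the acoustic feature vector. The following is a minimal sketch of that idea, assuming a mel-spectrogram rendered with librosa/matplotlib and an ImageNet-pretrained VGG16 from torchvision; the network choice, spectrogram settings, and helper names are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: "deep spectrum"-style features from a pre-trained image CNN.
# All settings below (VGG16, 128 mel bands, 224x224 input) are assumptions
# for illustration, not the configuration reported in the paper.
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

def spectrogram_image(wav_path, sr=16000):
    """Render a mel-spectrogram of one utterance as an RGB image."""
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    fig, ax = plt.subplots(figsize=(2.24, 2.24), dpi=100)
    ax.axis("off")
    librosa.display.specshow(mel_db, sr=sr, ax=ax)
    fig.canvas.draw()
    img = np.asarray(fig.canvas.buffer_rgba())[..., :3]  # drop alpha channel
    plt.close(fig)
    return Image.fromarray(img)

# Pre-trained image CNN; the 1000-way classification layer is removed so the
# forward pass returns the activations of the last hidden fully connected
# layer (4096-dimensional in VGG16).
cnn = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
cnn.classifier = torch.nn.Sequential(*list(cnn.classifier.children())[:-1])
cnn.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def deep_spectrum_features(wav_path):
    """Return a fixed-length feature vector for one utterance."""
    x = preprocess(spectrogram_image(wav_path)).unsqueeze(0)
    with torch.no_grad():
        feats = cnn(x)
    return feats.squeeze(0).numpy()  # feed this vector to an emotion classifier
```

In this reading, the utterance-level vector produced by deep_spectrum_features would be used in place of brute-force or BoAW acoustic features as input to a conventional classifier (e.g. an SVM) for the 2- and 5-class emotion tasks.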
Total citations
[Citations-per-year chart, 2017–2024]
Scholar articles
N Cummins, S Amiriparian, G Hagerer, A Batliner… - Proceedings of the 25th ACM international conference …, 2017