Authors
Zakaria Aldeneh, Emily Mower Provost
Publication date
2017/3/5
Conference
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Pages
2741-2745
Publisher
IEEE
Description
In this paper, we show that convolutional neural networks can be applied directly to temporal low-level acoustic features to identify emotionally salient regions, without the need to define or apply utterance-level statistics. We show how a convolutional neural network applied to minimally hand-engineered features obtains competitive results on the IEMOCAP and MSP-IMPROV datasets. In addition, we demonstrate that, despite their common use across most categories of acoustic features, utterance-level statistics may obfuscate emotional information. Our results suggest that convolutional neural networks over Mel filterbank (MFB) features can replace classifiers that rely on features derived from utterance-level statistics.
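A minimal sketch of the kind of model the description outlines: a 1-D convolution over time applied to Mel filterbank frames, with global max pooling so variable-length utterances map to a fixed-size vector, in place of utterance-level statistics. The layer sizes, kernel width, and four-class output below are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class MFBConvNet(nn.Module):
    def __init__(self, n_mfb=40, n_filters=128, kernel_size=8, n_classes=4):
        super().__init__()
        # Convolve across the time axis; each MFB coefficient is an input channel.
        self.conv = nn.Conv1d(n_mfb, n_filters, kernel_size)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, x):
        # x: (batch, n_mfb, time) -- raw frame-level features, no utterance statistics
        h = self.relu(self.conv(x))
        # Global max pooling over time keeps the strongest regional response,
        # which is what lets the network focus on emotionally salient regions.
        h = torch.max(h, dim=2).values
        return self.fc(h)

# Example: a 3-second utterance at ~100 frames/s with 40 MFB coefficients
model = MFBConvNet()
logits = model(torch.randn(1, 40, 300))
print(logits.shape)  # torch.Size([1, 4])

Global max pooling is what makes the model length-invariant here; mean pooling would work too, but a max over time matches the idea of detecting localized salient regions rather than averaging over the whole utterance.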
Total citations
Per-year citation chart, 2017–2024
Scholar articles
Z Aldeneh, EM Provost - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017