Authors
Zakaria Aldeneh, Emily Mower Provost
Publication date
2017/3/5
Conference
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Pages
2741-2745
Publisher
IEEE
Description
In this paper, we show that convolutional neural networks can be applied directly to temporal low-level acoustic features to identify emotionally salient regions, without the need to define or apply utterance-level statistics. We show how a convolutional neural network applied to minimally hand-engineered features obtains competitive results on the IEMOCAP and MSP-IMPROV datasets. In addition, we demonstrate that, despite their common use across most categories of acoustic features, utterance-level statistics may obfuscate emotional information. Our results suggest that convolutional neural networks over Mel filterbank (MFB) features can replace classifiers that rely on features derived from utterance-level statistics.
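A minimal sketch of the kind of model the description outlines: a 1-D convolution over time applied to Mel filterbank frames, with global max pooling so variable-length utterances map to a fixed-size vector, in place of utterance-level statistics. The layer sizes, kernel width, and four-class output below are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class MFBConvNet(nn.Module):
    def __init__(self, n_mfb=40, n_filters=128, kernel_size=8, n_classes=4):
        super().__init__()
        # Convolve across the time axis; each MFB coefficient is an input channel.
        self.conv = nn.Conv1d(n_mfb, n_filters, kernel_size)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, x):
        # x: (batch, n_mfb, time) -- raw frame-level features, no utterance statistics
        h = self.relu(self.conv(x))
        # Global max pooling over time keeps the strongest regional response,
        # which is what lets the network focus on emotionally salient regions.
        h = torch.max(h, dim=2).values
        return self.fc(h)

# Example: a 3-second utterance at ~100 frames/s with 40 MFB coefficients
model = MFBConvNet()
logits = model(torch.randn(1, 40, 300))
print(logits.shape)  # torch.Size([1, 4])

Global max pooling is what makes the model length-invariant here; mean pooling would work too, but a max over time matches the idea of detecting localized salient regions rather than averaging over the whole utterance.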
Total citations
Per-year citation chart, 2017–2024
Scholar articles
Z Aldeneh, EM Provost - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017