Authors
Dung Nguyen, Kien Nguyen, Sridha Sridharan, Afsane Ghasemi, David Dean, Clinton Fookes
Publication date
2017/3/24
Conference
2017 IEEE Winter Conference on Applications of Computer Vision (WACV)
Pages
1215-1223
Publisher
IEEE
Description
Automatic emotion recognition has attracted great interest, and numerous solutions have been proposed, most of which focus individually on either facial expression or acoustic information. While more recent research has considered multimodal approaches, individual modalities are often combined only by simple fusion at the feature and/or decision level. In this paper, we introduce a novel approach using 3-dimensional convolutional neural networks (C3Ds) to model the spatio-temporal information, cascaded with multimodal deep belief networks (DBNs) that can represent the audio and video streams. Experiments conducted on the eNTERFACE multimodal emotion database demonstrate that this approach improves multimodal emotion recognition performance and significantly outperforms recent state-of-the-art proposals.
Total citations
2017: 2 · 2018: 7 · 2019: 8 · 2020: 16 · 2021: 17 · 2022: 14 · 2023: 12 · 2024: 6