Towards robust speech emotion recognition using deep residual networks for speech enhancement

Authors

Andreas Triantafyllopoulos, Gil Keren, Johannes Wagner, Ingmar Steiner, Björn Schuller

Publication date

2019

Description

The use of deep learning (DL) architectures for speech enhancement has recently improved the robustness of voice applications under diverse noise conditions. These improvements are usually evaluated based on the perceptual quality of the enhanced audio or on the performance of automatic speech recognition (ASR) systems. We are interested instead in the usefulness of these algorithms in the field of speech emotion recognition (SER), and specifically in whether an enhancement architecture can effectively remove noise while preserving enough information for an SER algorithm to accurately identify emotion in speech. We first show how a scalable DL architecture can be trained to enhance audio signals in a large number of unseen environments, and go on to show how that can benefit common SER pipelines in terms of noise robustness. Our results show that incorporating a speech enhancement architecture is beneficial, especially for low signal-to-noise ratio (SNR) conditions.
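
To make the notion of an SNR condition concrete, here is a minimal sketch of how a clean utterance might be mixed with noise at a chosen SNR, using the standard definition SNR_dB = 10 * log10(P_speech / P_noise). The function names `mix_at_snr` and `measure_snr_db`, the synthetic sine "speech" signal, and the Gaussian noise are illustrative assumptions, not taken from the paper, whose data preparation is not described in this record.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB.

    Assumes `noise` is at least as long as `speech` and is not all zeros.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the required noise power.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / noise_power)
    return speech + gain * noise


def measure_snr_db(speech, noisy):
    """Compute the SNR in dB of `noisy`, given the clean reference `speech`."""
    speech = np.asarray(speech, dtype=np.float64)
    residual = np.asarray(noisy, dtype=np.float64) - speech
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2))


# Hypothetical stand-ins for real recordings: a 220 Hz tone and white noise,
# one second each at 16 kHz.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.normal(size=16000)

# A low-SNR condition: speech and noise at equal power.
noisy = mix_at_snr(speech, noise, snr_db=0.0)
```

Sweeping `snr_db` over a range of values is one common way to build the kind of low-to-high SNR test conditions the abstract refers to; whether the authors used this exact procedure is not stated here.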

Total citations

Cited by 59

Citations per year: 2019: 1, 2020: 11, 2021: 14, 2022: 11, 2023: 13, 2024: 8
