RealVAD: A Real-world Dataset and A Method for Voice Activity Detection by Body Motion Analysis

Authors

Cigdem Beyan, Muhammad Shahid, Vittorio Murino

Publication date

2021

Journal

IEEE Transactions on Multimedia

Volume

Pages

2071-2085

Description

We present an automatic voice activity detection (VAD) method that is based solely on visual cues. Unlike traditional approaches that process audio, we show that upper body motion analysis is desirable for the VAD task. The proposed method consists of components for body motion representation, feature extraction from a Convolutional Neural Network (CNN) architecture, and unsupervised domain adaptation. The body motion representations, encoded as images, are used by the feature extraction component, which is generic and person-invariant and can therefore be applied to a subject who has never been seen. The final component handles the domain-shift problem, which arises because the way people move or gesticulate while speaking may vary from subject to subject, resulting in disparate body motion features and consequently poorer VAD performance. The experimental analyses applied on a publicly …

Total citations

Cited by 24

2020: 1, 2021: 5, 2022: 6, 2023: 7, 2024: 5

Scholar articles

RealVAD: A real-world dataset and a method for voice activity detection by body motion analysis

C Beyan, M Shahid, V Murino - IEEE Transactions on Multimedia, 2020