Authors
Kai Li, Jun Ye, Kien A Hua
Publication date
2014/11/3
Book
Proceedings of the 22nd ACM international conference on Multimedia
Pages
147-156
Description
In this paper, we investigate techniques to localize the sound source in videos recorded with a single microphone. The visual object whose motion generates the sound is located and segmented through a synchronization analysis of object motion and audio energy. We first apply an effective region-tracking algorithm to segment the video into a number of spatial-temporal region tracks, each representing the temporal evolution of an appearance-coherent image structure (i.e., an object). We then extract the motion feature of each object as its average acceleration in each frame. Meanwhile, the Short-Time Fourier Transform is applied to the audio signal to extract the audio energy as the audio descriptor. We further impose a nonlinear transformation on both the audio and visual descriptors to obtain audio and visual codes in a common rank-correlation space. Finally, the correlation between an object and the audio signal is …
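The pipeline the abstract describes — per-frame audio energy from an STFT, per-frame motion acceleration from an object track, and a rank-based correlation between the two — can be sketched as below. This is a minimal illustration, not the paper's implementation: the frame length, hop size, and the use of Spearman-style rank correlation as the "common rank correlation space" are assumptions for the example.

```python
import numpy as np

def audio_energy(signal, frame_len=256, hop=128):
    """Per-frame audio energy via a short-time Fourier transform.

    Sketch only: frame the signal, apply a Hann window, and sum the
    squared spectral magnitudes of each frame. frame_len and hop are
    illustrative choices, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        spectrum = np.fft.rfft(frame)
        energy[i] = np.sum(np.abs(spectrum) ** 2)
    return energy

def motion_acceleration(positions):
    """Per-frame acceleration magnitude of an object track.

    positions: (N, 2) array of the object's centroid per video frame.
    Acceleration is approximated by second-order differences, giving
    one magnitude per interior frame (N - 2 values).
    """
    accel = np.diff(positions, n=2, axis=0)
    return np.linalg.norm(accel, axis=1)

def rank_correlation(a, b):
    """Spearman-style rank correlation between two descriptors:
    the Pearson correlation of their ranks. Used here as a stand-in
    for the paper's rank-correlation matching of audio/visual codes.
    Assumes a and b have equal length (e.g. after resampling the
    audio energy to the video frame rate)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float(ra @ rb / (np.linalg.norm(ra) * np.linalg.norm(rb)))
```

In this sketch, the object track whose acceleration sequence has the highest rank correlation with the audio energy sequence would be declared the sound source.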
Total citations
Cited-by chart, 2016–2024 (per-year counts not recoverable from the extracted page)
Scholar articles
K Li, J Ye, KA Hua - Proceedings of the 22nd ACM international conference …, 2014