Authors
Kai Li, Jun Ye, Kien A Hua
Publication date
2014/11/3
Book
Proceedings of the 22nd ACM international conference on Multimedia
Pages
147-156
Description
In this paper, we investigate techniques to localize the sound source in videos recorded with a single microphone. The visual object whose motion generates the sound is located and segmented through a synchronization analysis of object motion and audio energy. We first apply an effective region-tracking algorithm to segment the video into a number of spatial-temporal region tracks, each representing the temporal evolution of an appearance-coherent image structure (i.e., an object). We then extract the motion feature of each object as its average acceleration in each frame. Meanwhile, the Short-Time Fourier Transform is applied to the audio signal to extract the audio energy as the audio descriptor. We further impose a nonlinear transformation on both the audio and visual descriptors to obtain audio and visual codes in a common rank-correlation space. Finally, the correlation between an object and the audio signal is …
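The pipeline the abstract describes — per-frame audio energy from an STFT, per-frame motion acceleration from an object track, and a rank-based correlation between the two — can be sketched as below. This is a minimal illustration, not the paper's implementation: the frame length, hop size, and the use of Spearman-style rank correlation as the "common rank correlation space" are assumptions for the example.

```python
import numpy as np

def audio_energy(signal, frame_len=256, hop=128):
    """Per-frame audio energy via a short-time Fourier transform.

    Sketch only: frame the signal, apply a Hann window, and sum the
    squared spectral magnitudes of each frame. frame_len and hop are
    illustrative choices, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        spectrum = np.fft.rfft(frame)
        energy[i] = np.sum(np.abs(spectrum) ** 2)
    return energy

def motion_acceleration(positions):
    """Per-frame acceleration magnitude of an object track.

    positions: (N, 2) array of the object's centroid per video frame.
    Acceleration is approximated by second-order differences, giving
    one magnitude per interior frame (N - 2 values).
    """
    accel = np.diff(positions, n=2, axis=0)
    return np.linalg.norm(accel, axis=1)

def rank_correlation(a, b):
    """Spearman-style rank correlation between two descriptors:
    the Pearson correlation of their ranks. Used here as a stand-in
    for the paper's rank-correlation matching of audio/visual codes.
    Assumes a and b have equal length (e.g. after resampling the
    audio energy to the video frame rate)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float(ra @ rb / (np.linalg.norm(ra) * np.linalg.norm(rb)))
```

In this sketch, the object track whose acceleration sequence has the highest rank correlation with the audio energy sequence would be declared the sound source.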
Total citations
Cited-by chart, 2016–2024 (per-year counts not recoverable from the extracted page)
Scholar articles
K Li, J Ye, KA Hua - Proceedings of the 22nd ACM international conference …, 2014