Authors
Li Su, Chin-Chia Michael Yeh, Jen-Yu Liu, Ju-Chiang Wang, Yi-Hsuan Yang
Publication date
2014/3/11
Journal
IEEE Transactions on Multimedia
Volume
16
Issue
5
Pages
1188-1200
Publisher
IEEE
Description
There has been increasing attention on learning feature representations from the complex, high-dimensional audio data applied in various music information retrieval (MIR) problems. Unsupervised feature learning techniques, such as sparse coding and deep belief networks, have been utilized to represent music information as a term-document structure comprising elementary audio codewords. Despite the widespread use of such a bag-of-frames (BoF) model, few attempts have been made to systematically compare different component settings. Moreover, whether techniques developed in the text retrieval community are applicable to audio codewords is poorly understood. To further our understanding of the BoF model, we present in this paper a comprehensive evaluation that compares a large number of BoF variants on three different MIR tasks, by considering different ways of low-level feature …
Total citations
[Chart: citations per year, 2013 to 2024]