Authors
Cees GM Snoek, Marcel Worring, Arnold WM Smeulders
Publication date
2005/11/6
Book
Proceedings of the 13th annual ACM international conference on Multimedia
Pages
399-402
Description
Semantic analysis of multimodal video aims to index segments of interest at a conceptual level. Reaching this goal requires analyzing several information streams, which must be fused at some point in the analysis. In this paper, we consider two classes of fusion schemes, namely early fusion and late fusion. The former fuses modalities in feature space; the latter fuses modalities in semantic space. We show by experiment on 184 hours of broadcast video data and for 20 semantic concepts that late fusion tends to give slightly better performance for most concepts. However, for those concepts where early fusion performs better, the difference is more significant.
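The distinction between fusing in feature space and fusing in semantic space can be illustrated with a minimal sketch. This is not the authors' implementation. The tiny logistic-regression detector (`LogisticScorer`), the two modalities (`visual`, `textual`), the function names `early_fusion` and `late_fusion`, and the equal-weight score averaging used for late fusion are all hypothetical choices made only to show where the fusion step happens.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticScorer:
    """Hypothetical stand-in concept detector: logistic regression fit by batch gradient descent."""

    def __init__(self, lr: float = 0.5, epochs: int = 500) -> None:
        self.lr = lr
        self.epochs = epochs
        self.w: Vector = []
        self.b = 0.0

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "LogisticScorer":
        n_features = len(X[0])
        self.w = [0.0] * n_features
        self.b = 0.0
        m = len(X)
        for _ in range(self.epochs):
            # Accumulate the gradient of the mean log-loss over all samples.
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for x, t in zip(X, y):
                err = self.score(x) - t
                for j in range(n_features):
                    grad_w[j] += err * x[j]
                grad_b += err
            self.w = [w - self.lr * g / m for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / m
        return self

    def score(self, x: Vector) -> float:
        # Estimated probability that the concept is present in a segment.
        return sigmoid(sum(w * v for w, v in zip(self.w, x)) + self.b)


Detector = Callable[[Vector, Vector], float]


def early_fusion(visual: List[Vector], textual: List[Vector], labels: List[int]) -> Detector:
    # Fuse in feature space: concatenate per-modality features, then learn ONE detector.
    fused = [v + t for v, t in zip(visual, textual)]
    model = LogisticScorer().fit(fused, labels)
    return lambda v, t: model.score(v + t)


def late_fusion(visual: List[Vector], textual: List[Vector], labels: List[int]) -> Detector:
    # Fuse in semantic space: learn one detector per modality, then combine their
    # concept scores (here a plain average, an illustrative assumption).
    vis_model = LogisticScorer().fit(visual, labels)
    txt_model = LogisticScorer().fit(textual, labels)
    return lambda v, t: 0.5 * (vis_model.score(v) + txt_model.score(t))
```

Both functions return a detector that takes a segment's visual and textual feature vectors and yields a concept score in (0, 1). The only difference is where the modalities meet: before learning (one classifier over the concatenated features) or after learning (one classifier per modality, whose scores are then combined). A learned second-stage combiner could replace the fixed average in `late_fusion`; the average is used here only to keep the sketch short.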
Total citations
Per-year citation chart, 2006 to 2024 (individual counts not recoverable from the extracted text)