Early versus late fusion in semantic video analysis

Authors

Cees GM Snoek, Marcel Worring, Arnold WM Smeulders

Publication date

2005/11/6

Book

Proceedings of the 13th annual ACM international conference on Multimedia

Pages

399-402

Description

Semantic analysis of multimodal video aims to index segments of interest at a conceptual level. In reaching this goal, it requires an analysis of several information streams. At some point in the analysis these streams need to be fused. In this paper, we consider two classes of fusion schemes, namely early fusion and late fusion. The former fuses modalities in feature space, the latter fuses modalities in semantic space. We show by experiment on 184 hours of broadcast video data and for 20 semantic concepts, that late fusion tends to give slightly better performance for most concepts. However, for those concepts where early fusion performs better the difference is more significant.
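The distinction between the two schemes can be illustrated with a minimal sketch. It is an assumption-laden toy, not the authors' implementation: the nearest-centroid scorer, the plain averaging of scores in the late scheme, the two modality names, and all function names and data values are hypothetical and chosen only to keep the example self-contained. The point it shows is where fusion happens: before learning (on concatenated feature vectors) or after learning (on per-modality concept scores).

```python
from math import dist


def centroid(rows):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit(features, labels):
    """Toy stand-in classifier (assumption): one centroid per class, labels are 0 or 1."""
    pos = [f for f, y in zip(features, labels) if y == 1]
    neg = [f for f, y in zip(features, labels) if y == 0]
    return centroid(pos), centroid(neg)


def score(model, x):
    """Concept score in [0, 1]; higher means x lies closer to the positive centroid."""
    pos, neg = model
    d_pos, d_neg = dist(x, pos), dist(x, neg)
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total


def early_fusion(visual, textual, labels):
    """Fuse in feature space: concatenate modalities, then learn one model."""
    fused = [v + t for v, t in zip(visual, textual)]
    model = fit(fused, labels)
    return lambda v, t: score(model, v + t)


def late_fusion(visual, textual, labels):
    """Fuse in semantic space: learn one model per modality, then combine scores.

    Averaging the two scores is an assumption made for brevity.
    """
    visual_model = fit(visual, labels)
    textual_model = fit(textual, labels)
    return lambda v, t: (score(visual_model, v) + score(textual_model, t)) / 2


# Hypothetical toy data: four video segments, two modalities, one concept.
visual = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
textual = [[1.0], [0.9], [0.0], [0.1]]
labels = [1, 1, 0, 0]

early = early_fusion(visual, textual, labels)
late = late_fusion(visual, textual, labels)

# Score an unseen segment that resembles the positive examples.
early_score = early([0.85, 0.15], [0.95])
late_score = late([0.85, 0.15], [0.95])
```

In this sketch the early scheme trains once on a three-dimensional fused vector, while the late scheme trains twice and only combines the resulting scores, which mirrors the feature-space versus semantic-space distinction drawn in the description.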

Total citations

Cited by 1128

Year  Citations
2006  10
2007  25
2008  32
2009  28
2010  38
2011  30
2012  66
2013  70
2014  85
2015  81
2016  80
2017  85
2018  71
2019  83
2020  84
2021  76
2022  73
2023  65
2024  35
