Authors
Cees G. M. Snoek, Jianfeng Dong, Xirong Li, Xiaoxu Wang, Qijie Wei, Weiyu Lan, Efstratios Gavves, Noureldien Hussein, Dennis C. Koelma, Arnold W. M. Smeulders
Publication date
2017
Journal
TRECVID
Description
In this paper we summarize our TRECVID 2017 [1] video recognition and retrieval experiments. We participated in three tasks: video search, event detection and video description. For both video search and event detection we explore semantic representations based on VideoStory [8] and ImageNet Shuffle [16], which thrive in few-example regimes. For the video description matching task we experiment with Word2VisualVec [5], a deep network that predicts a visual representation from a natural language description, and match sentences to videos in this visual space. For generative description we enhance a neural image captioning model with Early Embedding and Late Reranking [4]. The 2017 edition of the TRECVID benchmark has been a fruitful participation for our joint team, yielding the best overall result for video search and event detection as well as the runner-up position for video description.
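The matching approach of Word2VisualVec [5] can be pictured as a regression from sentence encodings into a fixed visual feature space, followed by nearest-neighbor ranking in that space. Below is a minimal PyTorch sketch of this idea; the layer sizes, learning rate, and the use of pre-computed sentence encodings and CNN video features are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a Word2VisualVec-style sentence-to-video matcher.
# Assumes pre-computed sentence encodings and video CNN features;
# all dimensions and hyperparameters below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Word2VisualVec(nn.Module):
    """Project a sentence encoding into a visual feature space."""
    def __init__(self, text_dim=500, hidden_dim=1000, visual_dim=2048):
        super().__init__()
        # Two-layer MLP from the text encoding to the visual space.
        self.mlp = nn.Sequential(
            nn.Linear(text_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, visual_dim),
        )

    def forward(self, text_vec):
        return self.mlp(text_vec)

model = Word2VisualVec()

# Training step: regress the predicted vector onto the video's visual
# feature with a mean-squared-error loss.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
text_batch = torch.randn(32, 500)      # stand-in sentence encodings
visual_batch = torch.randn(32, 2048)   # stand-in video CNN features
loss = F.mse_loss(model(text_batch), visual_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Matching: rank candidate sentences for one video by cosine similarity
# between the video feature and each sentence's predicted visual vector.
with torch.no_grad():
    predicted = model(text_batch)            # (32, 2048)
    video_feat = visual_batch[0:1]           # one query video
    scores = F.cosine_similarity(predicted, video_feat)
    ranking = scores.argsort(descending=True)
```

Ranking in the visual space rather than a learned joint space is the design choice that lets the sentence side be trained against whatever video features are already available, which suits the few-example setting described above.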
Total citations
2017: 4 · 2018: 3 · 2019: 5 · 2020: 5 · 2021: 3 · 2022: 6 · 2023: 2 · 2024: 2