Authors
Qing Li, Zhaofan Qiu, Ting Yao, Tao Mei, Yong Rui, Jiebo Luo
Publication date
2016/6/6
Book
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
Pages
159-166
Description
Recognizing actions in videos is a challenging task as video is an information-intensive medium with complex variations. Most existing methods have treated video as a flat data sequence while ignoring the intrinsic hierarchical structure of the video content. In particular, an action may span different granularities in this hierarchy including, from small to large, a single frame, consecutive frames (motion), a short clip, and the entire video. In this paper, we present a novel framework to boost action recognition by learning a deep spatio-temporal video representation at hierarchical multi-granularity. Specifically, we model each granularity as a single stream by 2D (for frame and motion streams) or 3D (for clip and video streams) convolutional neural networks (CNNs). The framework therefore consists of multi-stream 2D or 3D CNNs to learn both the spatial and temporal representations. Furthermore, we employ the Long …
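The description outlines a multi-stream architecture: 2D CNNs over single frames and stacked-flow motion inputs, 3D CNNs over short clips and the whole video, with the per-stream outputs combined into one action prediction. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the backbones, channel counts, input shapes, and the simple score-averaging fusion are illustrative assumptions, and the LSTM component mentioned at the truncation point is omitted.

# A minimal sketch of the multi-granular, multi-stream idea from the abstract.
# All layer sizes, input shapes, and the late-fusion rule are assumptions.
import torch
import torch.nn as nn


class Stream2D(nn.Module):
    """2D CNN stream over a single RGB frame or a stacked optical-flow 'motion' input."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):  # x: (batch, in_channels, H, W)
        return self.classifier(self.features(x).flatten(1))


class Stream3D(nn.Module):
    """3D CNN stream over a short clip or the whole (subsampled) video."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):  # x: (batch, in_channels, T, H, W)
        return self.classifier(self.features(x).flatten(1))


class MultiGranularNet(nn.Module):
    """Four streams (frame, motion, clip, video) fused by averaging class scores."""

    def __init__(self, num_classes: int = 101):
        super().__init__()
        self.frame = Stream2D(in_channels=3, num_classes=num_classes)    # single RGB frame
        self.motion = Stream2D(in_channels=10, num_classes=num_classes)  # stacked flow fields
        self.clip = Stream3D(in_channels=3, num_classes=num_classes)     # short clip
        self.video = Stream3D(in_channels=3, num_classes=num_classes)    # entire video

    def forward(self, frame, motion, clip, video):
        scores = [self.frame(frame), self.motion(motion),
                  self.clip(clip), self.video(video)]
        return torch.stack(scores, dim=0).mean(dim=0)  # late fusion of per-stream scores


if __name__ == "__main__":
    net = MultiGranularNet(num_classes=101)
    out = net(
        frame=torch.randn(2, 3, 112, 112),
        motion=torch.randn(2, 10, 112, 112),
        clip=torch.randn(2, 3, 16, 112, 112),
        video=torch.randn(2, 3, 64, 112, 112),
    )
    print(out.shape)  # torch.Size([2, 101])

Each stream keeps its own classifier, so stream-specific scores can be inspected before fusion; a learned weighting or an LSTM over frame-level features could replace the plain average.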
Total citations
Year:      2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
Citations:    1    5   40   24   21   14   16   12   11   10
Scholar articles
Q Li, Z Qiu, T Yao, T Mei, Y Rui, J Luo - Proceedings of the 2016 ACM on international …, 2016