Authors
Qing Li, Zhaofan Qiu, Ting Yao, Tao Mei, Yong Rui, Jiebo Luo
Publication date
2016/6/6
Book
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
Pages
159-166
Description
Recognizing actions in videos is a challenging task as video is an information-intensive medium with complex variations. Most existing methods have treated video as a flat data sequence while ignoring the intrinsic hierarchical structure of the video content. In particular, an action may span different granularities in this hierarchy including, from small to large, a single frame, consecutive frames (motion), a short clip, and the entire video. In this paper, we present a novel framework to boost action recognition by learning a deep spatio-temporal video representation at hierarchical multi-granularity. Specifically, we model each granularity as a single stream by 2D (for frame and motion streams) or 3D (for clip and video streams) convolutional neural networks (CNNs). The framework therefore consists of multi-stream 2D or 3D CNNs to learn both the spatial and temporal representations. Furthermore, we employ the Long …
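The description outlines a multi-stream architecture: 2D CNNs over single frames and stacked-flow motion inputs, 3D CNNs over short clips and the whole video, with the per-stream outputs combined into one action prediction. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the backbones, channel counts, input shapes, and the simple score-averaging fusion are illustrative assumptions, and the LSTM component mentioned at the truncation point is omitted.

# A minimal sketch of the multi-granular, multi-stream idea from the abstract.
# All layer sizes, input shapes, and the late-fusion rule are assumptions.
import torch
import torch.nn as nn


class Stream2D(nn.Module):
    """2D CNN stream over a single RGB frame or a stacked optical-flow 'motion' input."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):  # x: (batch, in_channels, H, W)
        return self.classifier(self.features(x).flatten(1))


class Stream3D(nn.Module):
    """3D CNN stream over a short clip or the whole (subsampled) video."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):  # x: (batch, in_channels, T, H, W)
        return self.classifier(self.features(x).flatten(1))


class MultiGranularNet(nn.Module):
    """Four streams (frame, motion, clip, video) fused by averaging class scores."""

    def __init__(self, num_classes: int = 101):
        super().__init__()
        self.frame = Stream2D(in_channels=3, num_classes=num_classes)    # single RGB frame
        self.motion = Stream2D(in_channels=10, num_classes=num_classes)  # stacked flow fields
        self.clip = Stream3D(in_channels=3, num_classes=num_classes)     # short clip
        self.video = Stream3D(in_channels=3, num_classes=num_classes)    # entire video

    def forward(self, frame, motion, clip, video):
        scores = [self.frame(frame), self.motion(motion),
                  self.clip(clip), self.video(video)]
        return torch.stack(scores, dim=0).mean(dim=0)  # late fusion of per-stream scores


if __name__ == "__main__":
    net = MultiGranularNet(num_classes=101)
    out = net(
        frame=torch.randn(2, 3, 112, 112),
        motion=torch.randn(2, 10, 112, 112),
        clip=torch.randn(2, 3, 16, 112, 112),
        video=torch.randn(2, 3, 64, 112, 112),
    )
    print(out.shape)  # torch.Size([2, 101])

Each stream keeps its own classifier, so stream-specific scores can be inspected before fusion; a learned weighting or an LSTM over frame-level features could replace the plain average.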
Total citations
Year:      2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
Citations:    1    5   40   24   21   14   16   12   11   10
Scholar articles
Q Li, Z Qiu, T Yao, T Mei, Y Rui, J Luo - Proceedings of the 2016 ACM on international …, 2016