Authors
Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song-Chun Zhu
Publication date
2014
Conference
Proceedings of the IEEE conference on computer vision and pattern recognition
Pages
2649-2656
Description
Existing methods for video-based action recognition are generally view-dependent, i.e., performing recognition from the same views seen in the training data. We present a novel multiview spatio-temporal AND-OR graph (MST-AOG) representation for cross-view action recognition, i.e., the recognition is performed on video from an unknown and unseen view. As a compositional model, MST-AOG compactly represents the hierarchical combinatorial structures of cross-view actions by explicitly modeling the geometry, appearance and motion variations. This paper proposes effective methods to learn the structure and parameters of MST-AOG. The inference based on MST-AOG enables action recognition from novel views. The training of MST-AOG takes advantage of the 3D human skeleton data obtained from Kinect cameras to avoid annotating enormous multi-view video frames, which is error-prone and time-consuming, but the recognition does not need 3D information and is based on 2D video input. A new Multiview Action3D dataset has been created and will be released. Extensive experiments have demonstrated that this new action representation significantly improves the accuracy and robustness of cross-view action recognition on 2D videos.
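The core idea of an AND-OR graph can be illustrated with a minimal sketch (this is an illustrative toy, not the paper's implementation; the `Node` class, node kinds, and scores below are all assumed for exposition). AND nodes compose parts, so their children's scores are summed; OR nodes choose among alternatives, e.g. candidate views, so they take the maximum child score:

```python
# Toy AND-OR graph scoring. AND = composition (sum of parts),
# OR = alternative selection (max over choices, e.g. over views).

class Node:
    def __init__(self, kind, children=None, score=0.0):
        self.kind = kind              # "AND", "OR", or "LEAF"
        self.children = children or []
        self.score = score            # local evidence score for LEAF nodes

def score(node):
    if node.kind == "LEAF":
        return node.score
    child_scores = [score(c) for c in node.children]
    if node.kind == "AND":
        return sum(child_scores)      # all parts must contribute
    return max(child_scores)          # OR: pick the best alternative

# Toy action template: an action is an AND of two parts; each part
# is an OR over two hypothetical view-specific appearance scores.
part1 = Node("OR", [Node("LEAF", score=0.8), Node("LEAF", score=0.3)])
part2 = Node("OR", [Node("LEAF", score=0.5), Node("LEAF", score=0.9)])
action = Node("AND", [part1, part2])
print(score(action))  # approximately 1.7 (0.8 + 0.9)
```

In the paper's setting the OR branches over views let inference select the best-matching view at recognition time, which is what makes the model view-invariant.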
Total citations
Yearly citation counts, 2015–2024 (per-year bar-chart values not recoverable from extraction)