Authors
Suyog Dutt Jain, Bo Xiong, Kristen Grauman
Publication date
2017
Conference
Proceedings of the IEEE conference on computer vision and pattern recognition
Pages
3664-3673
Description
We propose an end-to-end learning framework for segmenting generic objects in videos. Our method learns to combine appearance and motion information to produce pixel-level segmentation masks for all prominent objects in videos. We formulate this task as a structured prediction problem and design a two-stream fully convolutional neural network which fuses together motion and appearance in a unified framework. Since large-scale video datasets with pixel-level segmentations are problematic, we show how to bootstrap weakly annotated videos together with existing image recognition datasets for training. Through experiments on three challenging video segmentation benchmarks, our method substantially improves the state-of-the-art for segmenting generic (unseen) objects. Code and pre-trained models are available on the project website.
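The two-stream fusion idea in the description can be sketched at a high level. The following is a toy NumPy illustration, not the authors' network: the fusion rule (elementwise max plus product) and all shapes are hypothetical stand-ins for the learned fusion layers described in the paper.

```python
import numpy as np

def fuse_streams(appearance_scores, motion_scores):
    """Fuse per-pixel object scores from an appearance stream and a
    motion stream into a single score map.

    Toy stand-in: combines the two maps with an elementwise max plus
    their product, then rescales to [0, 1]. The actual method learns
    the fusion end-to-end inside a two-stream fully convolutional
    network rather than using a fixed rule like this.
    """
    fused = np.maximum(appearance_scores, motion_scores) \
            + appearance_scores * motion_scores
    return fused / fused.max()

# Hypothetical per-pixel object scores from each stream.
H, W = 4, 4
rng = np.random.default_rng(0)
appearance = rng.random((H, W))
motion = rng.random((H, W))

scores = fuse_streams(appearance, motion)
mask = scores > 0.5          # binary segmentation mask
print(scores.shape, mask.dtype)
```

A fixed rule like this cannot adapt to cases where one stream is unreliable (e.g., a static object with no motion cue), which is the motivation for learning the fusion jointly with both streams.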
Total citations
2017: 19, 2018: 61, 2019: 87, 2020: 70, 2021: 78, 2022: 51, 2023: 58, 2024: 29