Authors
Giovanni Bellitto, Federica Proietto Salanitri, Simone Palazzo, Francesco Rundo, Daniela Giordano, Concetto Spampinato
Publication date
2021/12
Journal
International Journal of Computer Vision
Volume
129
Pages
3216-3232
Publisher
Springer US
Description
In this work, we propose a 3D fully convolutional architecture for video saliency prediction that employs hierarchical supervision on intermediate maps (referred to as conspicuity maps) generated using features extracted at different abstraction levels. We extend this base hierarchical learning mechanism with two techniques, one for domain adaptation and one for domain-specific learning. For domain adaptation, we encourage the model to learn general hierarchical features in an unsupervised manner, using gradient reversal at multiple scales, to improve generalization to datasets for which no annotations are provided during training. For domain specialization, we employ domain-specific operations (namely priors, smoothing and batch normalization) that specialize the learned features on individual datasets in order to maximize performance. The results of our experiments show that the proposed model yields state-of-the …
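The gradient reversal mentioned in the description can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the class name `GradientReversal`, the strength parameter `lam`, the helper `reverse_at_scales`, and the scale names are all hypothetical, chosen only to show the general idea. A reversal layer passes features through unchanged in the forward pass and negates (and optionally scales) the gradient coming back from a domain classifier, so the feature extractor is pushed toward features that do not distinguish between datasets.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GradientReversal:
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""

    lam: float = 1.0  # hypothetical reversal strength, not a value from the paper

    def forward(self, features: List[float]) -> List[float]:
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_head: List[float]) -> List[float]:
        # The gradient flowing back to the feature extractor is sign-flipped.
        return [-self.lam * g for g in grad_from_domain_head]


def reverse_at_scales(
    domain_grads: Dict[str, List[float]], lam: float
) -> Dict[str, List[float]]:
    """Apply one reversal layer per abstraction level (scale names are illustrative)."""
    layer = GradientReversal(lam)
    return {scale: layer.backward(grad) for scale, grad in domain_grads.items()}


# Assumed example: gradients from domain classifiers attached at two scales.
example_grads = {"low": [1.0, -2.0], "high": [4.0]}
reversed_grads = reverse_at_scales(example_grads, lam=0.5)
```

In an automatic-differentiation framework the same behavior would normally be written as a custom operation with its own backward rule; the explicit `backward` method here only makes the sign flip visible.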
Total citations
[Bar chart: citations per year, 2021 to 2024]
Scholar articles
G Bellitto, F Proietto Salanitri, S Palazzo, F Rundo… - International Journal of Computer Vision, 2021