Grad-cam: Visual explanations from deep networks via gradient-based localization

Authors

Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra

Publication date

2017

Conference

Proceedings of the IEEE international conference on computer vision

Pages

618-626

Description

We propose a technique for producing 'visual explanations' for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say, logits for 'dog' or even a caption) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model-families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multi-modal inputs (e.g. VQA) or reinforcement learning, and needs no architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization and apply it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into failure modes of these models (showing that seemingly unreasonable predictions have reasonable explanations), (b) outperform previous methods on the ILSVRC-15 weakly-supervised localization task, (c) are more faithful to the underlying model, and (d) help achieve model generalization by identifying dataset bias. For image captioning and VQA, our visualizations show that even non-attention based models can localize inputs. Finally, we design and conduct human studies to measure if Grad-CAM …
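The description says only that gradients flowing into the final convolutional layer are turned into a coarse localization map. A minimal sketch of that step, assuming the commonly described Grad-CAM formulation (each channel is weighted by the spatial average of its gradients, the weighted channels are summed, and a ReLU is applied), might look like the following. The function name `grad_cam_map`, the nested-list representation, and the toy numbers are illustrative assumptions; in practice the activations and gradients would come from a deep learning framework.

```python
def grad_cam_map(activations, gradients):
    """Combine K feature maps into one coarse localization map.

    activations, gradients: lists of K maps, each an H x W nested list.
    gradients[k][i][j] is assumed to hold d(score) / d(activations[k][i][j])
    for the target concept's score (hypothetical inputs for this sketch).
    """
    height, width = len(activations[0]), len(activations[0][0])
    # One weight per channel: the spatial average of that channel's gradients.
    weights = [
        sum(sum(row) for row in grad) / (height * width)
        for grad in gradients
    ]
    # Weighted sum over channels, then ReLU to keep only positive evidence.
    return [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]


# Toy inputs, made up for illustration: 2 channels on a 2x2 grid.
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],        # channel weight  0.5
    [[-1.0, -1.0], [-1.0, -1.0]],    # channel weight -1.0
]
heatmap = grad_cam_map(activations, gradients)
```

With these toy values, locations where the negatively weighted second channel dominates are clipped to zero by the ReLU, leaving nonzero entries only where the first channel is active. The resulting map has the spatial size of the convolutional feature maps, which is why it is coarse and is typically upsampled to the input image size for display.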

Total citations

Cited by 18829

Citations per year: 2017: 77, 2018: 319, 2019: 927, 2020: 1827, 2021: 3061, 2022: 4089, 2023: 5304, 2024: 3146

Scholar articles

Grad-cam: Visual explanations from deep networks via gradient-based localization

RR Selvaraju, M Cogswell, A Das, R Vedantam… - Proceedings of the IEEE international conference on …, 2017

Cited by 18353, 11 versions

Grad-CAM: Why did you say that?*

RR Selvaraju, A Das, R Vedantam, M Cogswell… - arXiv preprint arXiv:1611.07450, 2016