VQA-LOL: Visual Question Answering under the Lens of Logic

Authors

Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

Publication date

2020/8/1

Conference

ECCV 2020

Description

Logical connectives and their implications for the meaning of a natural language sentence are a fundamental aspect of understanding. In this paper, we investigate whether visual question answering (VQA) systems, trained to answer a question about an image, are able to answer the logical composition of multiple such questions. When put under this Lens of Logic, state-of-the-art VQA models have difficulty correctly answering these logically composed questions. We construct an augmentation of the VQA dataset as a benchmark, with questions containing logical compositions and linguistic transformations (negation, disjunction, conjunction, and antonyms). We propose our Lens of Logic (LOL) model, which uses question-attention and logic-attention to understand logical connectives in the question, and a novel Fréchet-Compatibility Loss, which ensures that the answers of the component questions and …
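As an illustration of what a logical composition of yes/no questions looks like, the following is a minimal sketch and not the authors' data-generation code. The example questions, the `answers` dictionary, and the helpers `compose` and `negated_answer` are all hypothetical names introduced here; the sketch assumes each component question has a known boolean answer and derives the composed answer by ordinary Boolean logic.

```python
from typing import Dict, Tuple

# Hypothetical ground-truth yes/no answers for two questions about one image.
answers: Dict[str, bool] = {
    "Is there a dog?": True,
    "Is the sky cloudy?": False,
}

def compose(q1: str, q2: str, connective: str,
            answers: Dict[str, bool]) -> Tuple[str, bool]:
    """Join two yes/no questions with 'and' or 'or' and derive the answer."""
    a1, a2 = answers[q1], answers[q2]
    if connective == "and":
        answer = a1 and a2
    elif connective == "or":
        answer = a1 or a2
    else:
        raise ValueError(f"unsupported connective: {connective}")
    # Drop the first question mark and lowercase the start of the second question.
    text = f"{q1.rstrip('?')} {connective} {q2[0].lower()}{q2[1:]}"
    return text, answer

def negated_answer(q: str, answers: Dict[str, bool]) -> bool:
    """Answer to the negation of a yes/no question."""
    return not answers[q]

for op in ("and", "or"):
    print(compose("Is there a dog?", "Is the sky cloudy?", op, answers))
```

A model that answers each component question correctly but fails on the composed question is showing the kind of inconsistency the description refers to.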

Total citations

Cited by 96

Citations per year: 2020: 10, 2021: 23, 2022: 27, 2023: 21, 2024: 13

Scholar articles

VQA-LOL: Visual question answering under the lens of logic

T Gokhale, P Banerjee, C Baral, Y Yang - European conference on computer vision, 2020

Supplementary Material for VQA-LOL: Visual Question Answering under the Lens of Logic

T Gokhale, P Banerjee, C Baral, Y Yang