Authors
Licheng Yu, Patrick Poirson, Shan Yang, Alexander C Berg, Tamara L Berg
Publication date
2016
Conference
Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II
Pages
69-85
Publisher
Springer International Publishing
Description
Humans refer to objects in their environments all the time, especially in dialogue with other people. We explore generating and comprehending natural language referring expressions for objects in images. In particular, we focus on incorporating better measures of visual context into referring expression models and find that visual comparison to other objects within an image helps improve performance significantly. We also develop methods to tie the language generation process together, so that we generate expressions for all objects of a particular category jointly. Evaluation on three recent datasets (RefCOCO, RefCOCO+, and RefCOCOg; datasets and toolbox can be downloaded from https://github.com/lichengunc/refer) shows the advantages of our methods for both referring expression generation and comprehension.
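The abstract does not spell out how the visual comparison is computed. As an illustration only, one simple way to encode "comparison to other objects" is to average the unit-normalized differences between a target object's appearance feature and those of the other objects it is compared against, then append that vector to the target's own feature. The helper names `visual_difference` and `contextual_feature`, the plain-list feature vectors, and the choice of comparison set are assumptions for this sketch, not the paper's published code.

```python
import math
from typing import List, Sequence

Vector = List[float]


def visual_difference(target: Sequence[float], others: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical helper: average of unit-normalized differences
    (target - other) over every comparison object."""
    dim = len(target)
    if not others:
        # No context objects in the image: return an all-zero comparison feature.
        return [0.0] * dim
    acc = [0.0] * dim
    for other in others:
        diff = [t - o for t, o in zip(target, other)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            # Identical features carry no contrast; this pair contributes zero.
            continue
        for k in range(dim):
            acc[k] += diff[k] / norm
    # Average over all comparison objects (skipped pairs count as zero vectors).
    return [a / len(others) for a in acc]


def contextual_feature(target: Sequence[float], others: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical helper: the target's own feature followed by its
    comparison feature, as one input vector for a referring expression model."""
    return list(target) + visual_difference(target, others)


if __name__ == "__main__":
    target = [1.0, 0.0]
    others = [[0.0, 0.0], [1.0, 1.0]]
    print(visual_difference(target, others))   # [0.5, -0.5]
    print(contextual_feature(target, others))  # [1.0, 0.0, 0.5, -0.5]
```

Normalizing each difference before averaging keeps one very dissimilar object from dominating the comparison, so the resulting vector reflects the direction in which the target stands out rather than the raw magnitude of any single difference.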
Total citations
2016: 3
2017: 29
2018: 51
2019: 69
2020: 88
2021: 82
2022: 177
2023: 315
2024: 250