Authors
Ashwin Vijayakumar, Michael Cogswell, Ramprasaath Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra
Publication date
2018/4/27
Journal
Proceedings of the AAAI Conference on Artificial Intelligence
Volume
32
Issue
1
Description
A single image captures the appearance and position of multiple entities in a scene as well as their complex interactions. As a consequence, natural language grounded in visual contexts tends to be diverse, with utterances differing as focus shifts to specific objects, interactions, or levels of detail. Recently, neural sequence models such as RNNs and LSTMs have been employed to produce visually-grounded language. Beam Search, the standard workhorse for decoding sequences from these models, is an approximate inference algorithm that decodes the top-B sequences in a greedy left-to-right fashion. In practice, the resulting sequences are often minor rewordings of a common utterance, failing to capture the multimodal nature of source images. To address this shortcoming, we propose Diverse Beam Search (DBS), a diversity-promoting alternative to BS for approximate inference. DBS produces sequences that are significantly different from each other by incorporating diversity constraints within groups of candidate sequences during decoding; moreover, it achieves this with minimal computational or memory overhead. We demonstrate that our method improves both diversity and quality of decoded sequences over existing techniques on two visually-grounded language generation tasks (image captioning and visual question generation), particularly on complex scenes containing diverse visual content. We also show similar improvements on language-only machine translation tasks, highlighting the generality of our approach.
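Algorithm sketch
A minimal Python sketch of the group-wise decoding idea described above, using a Hamming diversity penalty (one of the diversity terms discussed in the paper). The log_probs callback, parameter names, and default values are illustrative assumptions, not the authors' released implementation; the per-step penalty here is a simplification of the paper's augmented objective.

import numpy as np

def diverse_beam_search(log_probs, vocab_size, beam_width=6, num_groups=3,
                        diversity_strength=0.5, max_len=20, eos_id=0):
    """Group-wise beam search with a Hamming diversity penalty.

    log_probs(prefix) -> np.ndarray of shape (vocab_size,) holding
    next-token log-probabilities (caller-supplied model interface;
    an assumption of this sketch).
    """
    group_size = beam_width // num_groups
    # Each group holds (prefix, cumulative true log-prob) pairs.
    groups = [[([], 0.0)] for _ in range(num_groups)]
    for t in range(max_len):
        step_counts = np.zeros(vocab_size)  # tokens picked by earlier groups at step t
        for g in range(num_groups):
            candidates = []  # (prefix, true score, diversity-augmented score)
            for prefix, score in groups[g]:
                if prefix and prefix[-1] == eos_id:   # finished beam: carry over
                    candidates.append((prefix, score, score))
                    continue
                lp = log_probs(prefix)
                aug = lp - diversity_strength * step_counts  # penalize tokens reused across groups
                for tok in np.argsort(aug)[-group_size:]:
                    candidates.append((prefix + [int(tok)],
                                       score + lp[tok],    # true likelihood
                                       score + aug[tok]))  # selection objective
            # Rank within the group by the augmented objective, keep top beams.
            candidates.sort(key=lambda c: c[2], reverse=True)
            groups[g] = [(p, s) for p, s, _ in candidates[:group_size]]
            # Record this group's step-t tokens so later groups avoid them.
            for prefix, _ in groups[g]:
                if len(prefix) == t + 1:  # beam was extended this step
                    step_counts[prefix[-1]] += 1
    # Pool all groups and return sequences ranked by true log-probability.
    final = [b for grp in groups for b in grp]
    final.sort(key=lambda b: b[1], reverse=True)
    return [p for p, _ in final]

With num_groups=1 this reduces to standard beam search; diversity_strength plays the role of the paper's lambda, trading raw likelihood for inter-group diversity.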
Total citations
2018: 5 · 2019: 22 · 2020: 39 · 2021: 45 · 2022: 39 · 2023: 55 · 2024: 35
Scholar articles
A Vijayakumar, M Cogswell, R Selvaraju, Q Sun, S Lee, D Crandall, D Batra - Proceedings of the AAAI Conference on Artificial Intelligence, 2018