How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges

Authors

Haotong Qin^, Ge-Peng Ji^, Salman Khan, Deng-Ping Fan*, Fahad Shahbaz Khan, Luc Van Gool

Publication date

2023/7/27

Journal

Machine Intelligence Research (MIR)

Pages

605-613

Description

Google’s Bard has emerged as a formidable competitor to OpenAI’s ChatGPT in the field of conversational AI. Notably, Bard has recently been updated to handle visual inputs alongside text prompts during conversations. Given Bard’s impressive track record in handling textual inputs, we explore its capabilities in understanding and interpreting visual data (images) conditioned on text questions. This exploration holds the potential to unveil new insights and challenges for Bard and other forthcoming multi-modal generative models, especially in addressing complex computer vision problems that demand accurate visual and language understanding. Specifically, in this study, we focus on 15 diverse task scenarios encompassing regular, camouflaged, medical, underwater, and remote sensing data to comprehensively evaluate Bard’s performance. Our primary finding indicates that Bard still struggles in these vision …

Total citations

Cited by 14

Citations per year: 2023: 4; 2024: 10

Scholar articles

How good is Google Bard’s visual understanding? an empirical study on open challenges

H Qin, GP Ji, S Khan, DP Fan, FS Khan, L Van Gool - 2023