Authors
Ajinkya Tejankar, Maziar Sanjabi, Bichen Wu, Madian Khabsa, Saining Xie, Hamed Pirsiavash, Hamed Firooz
Publication date
2022/11
Journal
NeurIPS 2022 Workshop: Self-Supervised Learning - Theory and Practice, 2022
Description
Natural language supervision in the form of image captions was recently shown to be an effective way of training zero-shot image classification models. In this work, we focus on teasing out what parts of the language supervision are essential for training zero-shot models. Through extensive and careful experiments, we show that replacing intact captions with Bag-of-Words (BoW) does not significantly degrade the zero-shot performance. Surprisingly, we can even slightly improve the performance on some datasets by balancing the frequency of words in BoW.
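The two ideas in the description, replacing a caption with its bag of words and balancing word frequency, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`caption_to_bow`, `keep_probability`, `balanced_bow`) are hypothetical, the BoW here discards duplicates as well as word order, and the balancing step substitutes word2vec-style subsampling of frequent words because the description does not specify the procedure used in the paper.

```python
import math
import random
import re
from collections import Counter


def caption_to_bow(caption):
    """Lowercase and tokenize a caption, discarding word order and duplicates.

    Assumption: a set-based BoW; the paper may keep repeated words.
    """
    return sorted(set(re.findall(r"[a-z0-9]+", caption.lower())))


def keep_probability(word, counts, total, threshold=1e-3):
    """Probability of keeping a word under word2vec-style subsampling.

    Assumption: this stands in for the paper's unspecified balancing rule.
    Words at or below the frequency threshold are always kept.
    """
    freq = counts[word] / total
    if freq == 0:
        return 1.0  # unseen word: nothing to down-weight
    return min(1.0, math.sqrt(threshold / freq))


def balanced_bow(caption, counts, total, rng, threshold=1e-3):
    """Build a BoW and randomly drop frequent words to flatten the distribution."""
    words = caption_to_bow(caption)
    kept = [
        w for w in words
        if rng.random() < keep_probability(w, counts, total, threshold)
    ]
    return kept or words  # fall back to the full BoW rather than an empty one


if __name__ == "__main__":
    # Toy corpus statistics (made up for illustration only).
    counts = Counter({"a": 500, "the": 400, "dog": 50, "zebra": 1})
    total = sum(counts.values())
    rng = random.Random(0)
    print(caption_to_bow("A dog chases a red ball"))
    print(balanced_bow("The zebra and the dog", counts, total, rng))
```

In a training pipeline the resulting word list would be joined back into a string and passed to the text encoder in place of the original caption. The `threshold` value is a tunable assumption, not a setting taken from the paper.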
Total citations
2022: 5, 2023: 9, 2024: 8
Scholar articles
A Tejankar, M Sanjabi, B Wu, S Xie, M Khabsa… - arXiv preprint arXiv:2112.13884, 2021
A Tejankar, M Sanjabi, B Wu, M Khabsa, S Xie… - NeurIPS 2022 Workshop: Self-Supervised Learning …, 2022