Authors
Guang Feng, Zhiwei Hu, Lihe Zhang, Jiayu Sun, Huchuan Lu
Publication date
2021/9/1
Journal
IEEE Transactions on Neural Networks and Learning Systems
Volume
34
Issue
5
Pages
2246-2258
Publisher
IEEE
Description
Referring image localization and segmentation have recently attracted widespread interest. However, existing methods lack a clear description of the interdependence between language and vision. To this end, we present a bidirectional relationship inferring network (BRINet) to address these challenging tasks effectively. Specifically, we first employ a vision-guided linguistic attention module to perceive the keywords corresponding to each image region. Then, a language-guided visual attention module uses the learned adaptive language to guide the update of the visual features. Together, they form a bidirectional cross-modal attention module (BCAM) that achieves mutual guidance between language and vision and helps the network align the cross-modal features better. Based on the vanilla language-guided visual attention, we further design an asymmetric language-guided visual attention, which significantly …
Total citations
[Bar chart: citations per year, 2022 to 2024]
Scholar articles
G Feng, Z Hu, L Zhang, J Sun, H Lu - IEEE Transactions on Neural Networks and Learning …, 2021