

CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection

Authors

Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu

Publication date

2023/1/11

Journal

IEEE Transactions on Image Processing

Volume

Pages

892-904

Publisher

IEEE

Description

Most existing bi-modal (RGB-D and RGB-T) salient object detection methods rely on the convolution operation and build complex, interwoven fusion structures to integrate cross-modal information. The inherent local connectivity of convolution places a ceiling on the performance of these convolution-based methods. In this work, we rethink these tasks from the perspective of global information alignment and transformation. Specifically, the proposed cross-modal view-mixed transformer (CAVER) cascades several cross-modal integration units to construct a top-down, transformer-based information propagation path. CAVER treats multi-scale and multi-modal feature integration as a sequence-to-sequence context propagation and update process built on a novel view-mixed attention mechanism. In addition, considering the quadratic complexity w.r.t. the number of input tokens …
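The description does not spell out how view-mixed attention is computed, so the following is only a minimal sketch of generic cross-modal attention under assumptions of our own. Here queries come from RGB tokens, while keys and values come from the concatenated RGB and auxiliary (depth or thermal) tokens. The function names `cross_modal_attention` and `attention_map_entries` are hypothetical, and this is not CAVER's view-mixed attention. The sketch also shows why attention cost grows quadratically with the number of tokens N = H × W.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(rgb, aux, w_q, w_k, w_v):
    """Hypothetical generic cross-modal attention (NOT the paper's view-mixed attention).

    rgb: (N, C) tokens flattened from an RGB feature map, N = H * W
    aux: (N, C) tokens from the depth or thermal feature map
    Queries come from RGB; keys/values come from both modalities concatenated,
    so every RGB token can attend globally to every token of either modality.
    """
    q = rgb @ w_q                                # (N, C)
    kv_src = np.concatenate([rgb, aux], axis=0)  # (2N, C)
    k = kv_src @ w_k                             # (2N, C)
    v = kv_src @ w_v                             # (2N, C)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (N, 2N): quadratic in token count
    attn = softmax(scores, axis=-1)              # each row sums to 1
    return rgb + attn @ v, attn                  # residual update of the RGB tokens

def attention_map_entries(h, w, n_modalities=2):
    # Size of the (N, n_modalities * N) attention map for an h x w feature map.
    n = h * w
    return n * (n_modalities * n)

# Usage on a small random 8 x 8 feature map with 16 channels (assumed sizes).
rng = np.random.default_rng(0)
H, W, C = 8, 8, 16
N = H * W
rgb = rng.standard_normal((N, C))
depth = rng.standard_normal((N, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
out, attn = cross_modal_attention(rgb, depth, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (64, 16) (64, 128)
```

Because N = H × W, doubling the side length of the feature map multiplies N by 4 and the attention map by 16; for a 64 × 64 map with two modalities the map already holds 4096 × 8192 entries. This growth is the quadratic cost the description refers to.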

Total citations

Cited by 60

2022: 1; 2023: 22; 2024: 37
