View article

Context autoencoder for self-supervised representation learning

Authors

Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang

Publication date

2024/1

Journal

International Journal of Computer Vision

Volume

132

Issue

Pages

208-223

Publisher

Springer US

Description

We present a novel masked image modeling (MIM) approach, context autoencoder (CAE), for self-supervised representation pretraining. We pretrain an encoder by making predictions in the encoded representation space. The pretraining tasks include two tasks: masked representation prediction—predict the representations for the masked patches, and masked patch reconstruction—reconstruct the masked patches. The network is an encoder–regressor–decoder architecture: the encoder takes the visible patches as input; the regressor predicts the representations of the masked patches, which are expected to be aligned with the representations computed from the encoder, using the representations of visible patches and the positions of visible and masked patches; the decoder reconstructs the masked patches from the predicted encoded representations. The CAE design encourages the separation of learning the …

Total citations

Cited by 306

202020212022202320241 1 60 137 107

Scholar articles

Context autoencoder for self-supervised representation learning

X Chen, M Ding, X Wang, Y Xin, S Mo, Y Wang, S Han… - International Journal of Computer Vision, 2024