Learning single and multi-scene camera pose regression with transformer encoders

Authors

Yoli Shavit, Ron Ferens, Yosi Keller

Publication date

2024/3/15

Journal

Computer Vision and Image Understanding

Pages

103982

Publisher

Academic Press

Description

Contemporary state-of-the-art localization methods perform feature matching against a structured scene model or learn to regress the scene 3D coordinates. The resulting matches between 2D query pixels and 3D scene coordinates are used to estimate the camera pose using PnP and RANSAC, requiring the camera intrinsics for both the query and reference images. An alternative approach is to directly regress the camera pose from the query image. Although less accurate, absolute camera pose regression does not require any additional information at inference time and is typically lightweight and fast. Recently, Transformers were proposed for learning multi-scene camera pose regression, employing encoders to attend to spatially varying deep features while using decoders to embed multiple scene queries at once. In this work, we show that Transformer Encoders can aggregate and extract task-informative …

Total citations

Cited by 14

2022: 4, 2023: 4, 2024: 6

Scholar articles

Paying attention to activation maps in camera pose regression*

Y Shavit, R Ferens, Y Keller - arXiv preprint arXiv:2103.11477, 2021

Paying attention to activation maps in camera pose regression. In arxiv preprint*

Y Shavit, R Ferens, Y Keller - arXiv preprint arXiv:2103.11477, 2021

Cited by 2

Learning single and multi-scene camera pose regression with transformer encoders

Y Shavit, R Ferens, Y Keller - Computer Vision and Image Understanding, 2024