
Vision UFormer: Long-range monocular absolute depth estimation

Authors

Tomas Polasek, Martin Čadík, Yosi Keller, Bedrich Benes

Publication date

2023/4/1

Journal

Computers & Graphics

Volume

111

Pages

180-189

Publisher

Pergamon

Description

We introduce Vision UFormer (ViUT), a novel deep neural network for long-range monocular depth estimation. The input is an RGB image, and the output is an image whose per-pixel values store the absolute distance to the objects in the scene. ViUT consists of a Transformer encoder and a ResNet decoder combined with UNet-style skip connections. It is trained on 1M images across ten datasets in a staged regime that starts with easier-to-predict data, such as indoor photographs, and continues to more complex long-range outdoor scenes. We show that ViUT provides comparable results for normalized relative distances and on short-range classical datasets such as NYUv2 and KITTI. We further show that it successfully estimates absolute long-range depth in meters. We validate ViUT on a wide variety of long-range scenes, showing its high estimation capabilities with a relative improvement of up to 23%. Absolute depth …
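
The description distinguishes normalized relative distances from absolute depth in meters, and reports a relative improvement. The sketch below is only an illustration of those terms, not the authors' evaluation code: the helper names, the min-max normalization, the choice of mean absolute relative error as the metric, and all numeric values are assumptions introduced here. The record does not state which metric the 23% figure refers to.

```python
from typing import List


def normalize_depth(depth_m: List[float]) -> List[float]:
    """Min-max scale absolute depths (meters) to [0, 1].

    The metric scale is discarded; only relative spacing is kept.
    """
    lo, hi = min(depth_m), max(depth_m)
    if hi == lo:
        return [0.0 for _ in depth_m]
    return [(d - lo) / (hi - lo) for d in depth_m]


def abs_rel_error(pred_m: List[float], true_m: List[float]) -> float:
    """Mean absolute relative error (assumed metric, common in depth estimation)."""
    return sum(abs(p - t) / t for p, t in zip(pred_m, true_m)) / len(true_m)


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fractional error reduction versus a baseline (0.23 means 23%)."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical per-pixel depths in meters (illustrative values only).
true_m = [2.0, 50.0, 400.0, 1000.0]
half_scale = [1.0, 25.0, 200.0, 500.0]  # right structure, wrong metric scale

# After normalization the half-scale prediction is indistinguishable from
# the ground truth, yet its absolute error in meters is large.
same_when_normalized = normalize_depth(half_scale) == normalize_depth(true_m)
half_scale_error = abs_rel_error(half_scale, true_m)

# Hypothetical errors chosen so that the reduction equals 23%.
improvement = relative_improvement(baseline_error=1.00, new_error=0.77)

print(same_when_normalized, half_scale_error, round(improvement, 2))
```

The half-scale prediction shows why a model that does well on normalized relative depth can still fail at absolute depth: normalization removes exactly the scale information that a long-range estimate in meters has to recover.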

Total citations

Cited by 5

2023: 2, 2024: 3
