Technical University of Munich

This is an old revision of the document!

Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry

Contact: Nan Yang

Abstract

Monocular visual odometry approaches that purely rely on geometric cues are prone to scale drift and require sufficient motion parallax in successive frames for motion estimation and 3D reconstruction. In this paper, we propose to leverage deep monocular depth prediction to overcome limitations of geometry-based monocular visual odometry. To this end, we incorporate deep depth predictions into Direct Sparse Odometry (DSO) as direct virtual stereo measurements. For depth prediction, we design a novel deep network that refines predicted depth from a single image in a two-stage process. We train our network in a semi-supervised way on photoconsistency in stereo images and on consistency with accurate sparse depth reconstructions from Stereo DSO. Our deep predictions excel state-of-the-art approaches for monocular depth on the KITTI benchmark. Moreover, our Deep Virtual Stereo Odometry clearly exceeds previous monocular and deep-learning based methods in accuracy. It even achieves comparable performance to the state-of-the-art stereo methods, while only relying on a single camera.

Semi-Supervised Deep Monocular Depth Estimation

We propose a semi-supervised approach to deep monocular depth estimation. It builds on three key ingredients: self-supervised learning from photoconsistency in a stereo setup, supervised learning based on accurate sparse depth reconstruction by Stereo DSO, and StackNet, a two-stage network with a stacked encoder-decoder architecture.

Deep Virtual Stereo Odometry

Deep Virtual Stereo Odometry (DVSO) builds on the windowed sparse direct bundle adjustment formulation of monocular DSO. We use our disparity predictions for DSO in two key ways: Firstly, we initialize depth maps of new keyframes from the disparities. Beyond this rather straightforward approach, we also incorporate virtual direct image alignment constraints into the windowed direct bundle adjustment of DSO. We obtain these constraints by warping images with the estimated depth by bundle adjustment and the predicted right disparities by our network assuming a virtual stereo setup.

Computer Vision Group
TUM School of Computation, Information and Technology
Technical University of Munich

Technical University of Munich

Links

Informatik IX
Computer Vision Group

News

GCPR / VMV 2024

Navigation

Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry

Abstract

Semi-Supervised Deep Monocular Depth Estimation

Deep Virtual Stereo Odometry

Rechte Seite

Informatik IX
Computer Vision Group

News

GCPR / VMV 2024

Computer Vision GroupTUM School of Computation, Information and TechnologyTechnical University of Munich

Technical University of Munich

Links

Informatik IX Computer Vision Group

News

GCPR / VMV 2024

Navigation

Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry

Abstract

Semi-Supervised Deep Monocular Depth Estimation

Deep Virtual Stereo Odometry

Rechte Seite

Informatik IX Computer Vision Group

News

GCPR / VMV 2024

Computer Vision Group
TUM School of Computation, Information and Technology
Technical University of Munich

Informatik IX
Computer Vision Group

Informatik IX
Computer Vision Group