Practical Course: Deep Learning for Spatial AI (10 ECTS)
Overview
This practical course is designed for students who want hands-on experience with cutting-edge research in Spatial AI and 3D computer vision. It enables participants to turn their knowledge of deep learning and computer vision into practical, research-oriented skills using state-of-the-art models. The course is an excellent stepping stone toward independent computer vision projects, including a master’s thesis.
Organisers
News
- 06.03.2026: We have a few open slots available. Please send your application (CV + transcripts) to dl4sai-ss26@vision.in.tum.de by March 31.
- 09.02.2026: Slides from the preliminary meeting: PDF.
- 06.02.2026: The preliminary meeting is on February 9, 11:00.
Timeline
| February 9, 11:00 | Preliminary meeting (PDF) |
| February 12–17 | Register in the matching system |
| until February 17 | Submit course application (dl4sai-ss26@vision.in.tum.de) |
| April 13, 11:00–12:00 | Project introduction |
| April 15–18 | Project matching |
| May 18, 11:00–13:00 | Midterm presentations (in-person, room TBA) |
| July 27, 11:00–13:00 | Final presentations (in-person, room TBA) |
| September 30 | Project reports due |
Topics
The goal of this course is to build practical experience with state-of-the-art computer vision models in Spatial AI and to explore new ideas for addressing open research challenges, such as:
- 3D/4D reconstruction and SLAM (e.g. VGGT);
- 3D priors with diffusion models (e.g. Bolt3D);
- 3D tracking (e.g. SpatialTracker);
- self-supervised learning with 3D priors (e.g. RayZer).
Prerequisites
At least one completed course from the following list:
- Introduction to Deep Learning (IN2346)
- Computer Vision II (IN2228)
- Computer Vision III (IN2375)
- Machine Learning for 3D Geometry (IN2392)
- 3D Computer Vision (IN2057)
or equivalent.
Course Logistics
Course supervisors will propose peer-reviewed project ideas at the start of the course, centered on Spatial AI topics such as extracting geometric or semantic information from images or videos, including camera or object pose estimation and dynamic object segmentation. Projects are carried out in groups of up to three students with regular guidance from an advisor. At the end of the course, each group presents its results in class and submits a written report.