Computer Vision Group
TUM School of Computation, Information and Technology
Technical University of Munich


Seminar: Unsupervised Learning in Computer Vision (5 ECTS)

Winter Semester 2025/26, TU München

Organisers: Dominik Schnaus, Christoph Reich

Please direct questions to ulcv-ws25@vision.in.tum.de

News

2025-07-17: The preliminary meeting will take place online from 13:30 to 14:00 on 17.07.2025 via Zoom (https://tum-conf.zoom-x.de/j/69788622207?pwd=7bMo0rwFUutWalbbTzfs6yEEUJjYPJ.1). Please attend to learn more about the course or to ask questions; attendance at the preliminary meeting is not mandatory.

Course Description

Recent progress in visual understanding has predominantly been driven by supervised learning (e.g., SAM, Depth Anything). However, acquiring large amounts of annotated data is highly labour-intensive and, for certain applications, even infeasible. Unsupervised approaches, such as DINO or SMURF, eliminate the need for ground-truth annotations. This seminar will discuss some of the most significant and recent advances in unsupervised learning for visual understanding, ranging from self-supervised representation learning to unsupervised segmentation and reconstruction.
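
To make concrete how such methods learn without annotations, here is a minimal, illustrative sketch of a contrastive objective in the spirit of SimCLR: two augmented views of the same image serve as each other's training signal, so no labels are required. The encoder is deliberately omitted; the random tensors and hyperparameters below are stand-ins, not part of any specific paper's recipe.

```python
# Minimal sketch of a SimCLR-style contrastive (NT-Xent) loss.
# Two augmented views of the same image are positives for each other;
# all other images in the batch act as negatives. No labels are needed.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, D) embeddings of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit-norm
    sim = z @ z.t() / temperature                         # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                     # ignore self-similarity
    n = z1.shape[0]
    # For view i, the positive is the other view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage: random vectors stand in for encoder outputs of 8 images.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent_loss(z1, z2))
```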

Prerequisites

Machine Learning (IN2064), Introduction to Deep Learning (IN2346), or similar

Topics

Please choose a sub-topic, such as "Clustering-Based Methods" or "Paired Pre-Training", not individual papers! Short illustrative code sketches for a few of these technique families follow the topic list.

  • Representation Learning in CV
    • Clustering-Based Methods
      1. Deep clustering for unsupervised learning of visual features [Caron et al., ECCV 2018] paper link
      2. Self-labelling via simultaneous clustering and representation learning [Asano et al., ICLR 2020] paper link
      3. Unsupervised learning of visual features by contrasting cluster assignments [Caron et al., NeurIPS 2020] paper link
      4. Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning [Venkataramanan et al., arXiv 2025] paper link
    • Contrastive Methods (Fundamental contrastive methods)
      1. Representation learning with contrastive predictive coding [van den Oord et al., arXiv 2018] paper link
      2. A simple framework for contrastive learning of visual representations [Chen et al., ICML 2020] paper link
    • Contrastive Methods (Momentum contrast)
      1. Momentum contrast for unsupervised visual representation learning [He et al., CVPR 2020] paper link
      2. Improved baselines with momentum contrastive learning [Chen et al., arXiv 2020] paper link
      3. An empirical study of training self-supervised vision transformers [Chen et al., ICCV 2021] paper link
    • Information Maximization
      1. Barlow Twins: Self-supervised learning via redundancy reduction [Zbontar et al., ICML 2021] paper link
      2. VICReg: Variance-Invariance-Covariance Regularization For Self-Supervised Learning [Bardes et al., ICLR 2022] paper link
    • Masked Autoencoders (Image)
      1. BEiT: BERT pre-training of image Transformers [Bao et al., ICLR 2022] paper link
      2. Masked autoencoders are scalable vision learners [He et al., CVPR 2022] paper link
      3. BEiT v2: Masked image modeling with vector-quantized visual tokenizers [Peng et al., arXiv 2022] paper link
    • Masked Autoencoders (Video)
      1. Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training [Tong et al., NeurIPS 2022] paper link
      2. Masked autoencoders as spatiotemporal learners [Feichtenhofer et al., NeurIPS 2022] paper link
      3. VideoMAE v2: Scaling video masked autoencoders with dual masking [Wang et al., CVPR 2023] paper link
    • Joint-Embedding Predictive Architecture (Image)
      1. Self-supervised learning from images with a joint-embedding predictive architecture [Assran et al., ICCV 2023] paper link
      2. Masked siamese networks for label-efficient learning [Assran et al., ECCV 2022] paper link
      3. iBot: Image BERT Pre-training with Online Tokenizer [Zhou et al., ICLR 2022] paper link
    • Joint-Embedding Predictive Architecture (Video)
      1. V-jepa: Latent video prediction for visual representation learning [Bardes et al., OpenReview 2022] paper link
      2. V-jepa 2: Self-supervised video models enable understanding, prediction and planning [Assran et al., arXiv 2025] paper link
    • Self-Distillation Methods (Fundamental self-distillation methods)
      1. Bootstrap your own latent: A new approach to self-supervised learning [Grill et al., NeurIPS 2020] paper link
      2. Exploring simple siamese representation learning [Chen et al., CVPR 2021] paper link
      3. Data2vec: A general framework for self-supervised learning in speech, vision and language [Baevski et al., ICML 2022] paper link
    • Self-Distillation Methods (DINO)
      1. Emerging properties in self-supervised vision transformers [Caron et al., ICCV 2021] paper link
      2. DINOv2: Learning Robust Visual Features without Supervision [Oquab et al., TMLR 2024] paper link
      3. DINOv3 [Siméoni et al., arXiv 2025] paper link
    • Self-Distillation Methods (Video self-distillation)
      1. Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video [Venkataramanan et al., ICLR 2024] paper link
    • Diffusion Features
      1. Emergent correspondence from image diffusion [Tang et al., NeurIPS 2023] paper link
      2. Cleandift: Diffusion features without noise [Stracke et al., CVPR 2025] paper link
  • Depth, Optical Flow & 3D Reconstruction
    • Monodepth
      1. Unsupervised monocular depth estimation with left-right consistency [Godard et al., CVPR 2017] paper link
      2. Digging into self-supervised monocular depth estimation [Godard et al., ICCV 2019] paper link
    • Monocular Depth for Dynamic Scenes
      1. Dynamo-depth: Fixing unsupervised depth estimation for dynamical scenes [Sun et al., NeurIPS 2023] paper link
      2. ProDepth: Boosting self-supervised multi-frame monocular depth with probabilistic fusion [Woo et al., ECCV 2024] paper link
    • Optical Flow
      1. What matters in unsupervised optical flow [Jonschkowski et al., ECCV 2020] paper link
      2. SMURF: Self-teaching multi-frame unsupervised raft with full-image warping [Stone et al., CVPR 2021] paper link
    • Scene Flow
      1. Self-supervised monocular scene flow estimation [Hur et al., CVPR 2020] paper link
      2. EMR-MSF: Self-supervised recurrent monocular scene flow exploiting ego-motion rigidity [Jiang et al., ICCV 2023] paper link
    • Pose Estimation & 3D Reconstruction
      1. AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos [Wimbauer et al., CVPR 2025] paper link
      2. Structure-from-motion revisited [Schonberger et al., CVPR 2016] paper link
    • Single-Image 3D Reconstruction
      1. Behind the scenes: Density fields for single view reconstruction [Wimbauer et al., CVPR 2023] paper link
      2. Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion [Jevtic et al., ICCV 2025] paper link
  • Scene-Understanding
    • Object-Centric Learning
      1. Object-centric learning with slot attention [Locatello et al., NeurIPS 2020] paper link
    • Instance Segmentation
      1. Deep spectral methods: A surprisingly strong baseline for unsupervised semantic segmentation and localization [Melas-Kyriazi et al., CVPR 2022] paper link
      2. Cut and learn for unsupervised object detection and instance segmentation [Wang et al., CVPR 2023] paper link
    • Semantic Segmentation
      1. Unsupervised Semantic Segmentation by Distilling Feature Correspondences [Hamilton et al., ICLR 2022] paper link
      2. Unsupervised semantic segmentation through depth-guided feature correlation and sampling [Sick et al., CVPR 2024] paper link
    • Panoptic Segmentation
      1. Unsupervised universal image segmentation [Niu et al., CVPR 2024] paper link
      2. Scene-Centric Unsupervised Panoptic Segmentation [Hahn et al., CVPR 2025] paper link
  • Unsupervised Style Transfer
    1. Unpaired image-to-image translation using cycle-consistent adversarial networks [Zhu et al., ICCV 2017] paper link
  • Language Models
    • BERT
      1. Bert: Pre-training of deep bidirectional transformers for language understanding [Devlin et al., NAACL 2019] paper link
      2. Roberta: A robustly optimized bert pretraining approach [Liu et al., arXiv 2019] paper link
    • GPT
      1. Improving language understanding by generative pre-training [Radford et al., 2018] paper link
      2. Language models are unsupervised multitask learners [Radford et al., 2019] paper link
      3. Language models are few-shot learners [Brown et al., NeurIPS 2020] paper link
      4. GPT-4 technical report [Achiam et al., arXiv 2024] paper link
    • DeepSeek
      1. Deepseek-V3 technical report [Liu et al., arXiv 2024] paper link
      2. Deepseek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning [Guo et al., arXiv 2025] paper link
    • Unsupervised Translation
      1. Word translation without parallel data [Conneau et al., ICLR 2018] paper link
      2. Harnessing the universal geometry of embeddings [Jha et al., arXiv 2025] paper link
  • Vision-Language
    • Paired Pre-Training
      1. Learning transferable visual models from natural language supervision [Radford et al., ICML 2021] paper link
      2. Sigmoid loss for language image pre-training [Zhai et al., ICCV 2023] paper link
    • Using Less Pairs
      1. Lit: Zero-shot transfer with locked-image text tuning [Zhai et al., CVPR 2022] paper link
      2. Relative representations enable zero-shot latent space communication [Moschella et al., ICLR 2023] paper link
      3. ASIF: Coupled data turns unimodal models to multimodal without training [Norelli et al., NeurIPS 2023] paper link
    • Alignment & Translation
      1. The platonic representation hypothesis [Huh et al., ICML 2024] paper link
      2. It’s a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data [Schnaus et al., CVPR 2025] paper link
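
For the self-distillation family (BYOL, DINO), the following minimal sketch shows the central mechanism: a student network matches the output of an exponential-moving-average (EMA) "teacher" copy of itself on a different augmentation, so no labels enter the loop. It deliberately omits details such as DINO's centering, multi-crop augmentation, and projection heads; the tiny MLP and all hyperparameters are illustrative stand-ins.

```python
# Minimal sketch of BYOL/DINO-style self-distillation with an EMA teacher.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)                  # the teacher is never trained by gradients

view1, view2 = torch.randn(8, 128), torch.randn(8, 128)    # two "augmentations" of a batch

with torch.no_grad():
    target = F.softmax(teacher(view2) / 0.04, dim=1)        # sharpened teacher prediction
log_pred = F.log_softmax(student(view1) / 0.1, dim=1)
loss = -(target * log_pred).sum(dim=1).mean()               # cross-entropy to the teacher

# After each optimizer step, the teacher follows the student as a moving average.
momentum = 0.996
with torch.no_grad():
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(momentum).add_((1 - momentum) * ps)
print(loss)
```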
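
For the masked-autoencoder family, this sketch illustrates the core recipe of MAE-style pre-training: hide a large fraction of image patches, reconstruct them, and compute the loss only on the masked positions. The linear layers stand in for the real ViT encoder/decoder, and mask tokens and positional embeddings are omitted for brevity.

```python
# Minimal sketch of masked-autoencoder pre-training (in the spirit of MAE).
import torch
import torch.nn as nn

def random_mask(num_patches: int, mask_ratio: float = 0.75) -> torch.Tensor:
    """Boolean mask of shape (num_patches,); True means the patch is hidden."""
    num_masked = int(num_patches * mask_ratio)
    perm = torch.randperm(num_patches)
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[perm[:num_masked]] = True
    return mask

patch_dim, num_patches = 16 * 16 * 3, 196            # 224x224 image, 16x16 patches
encoder = nn.Linear(patch_dim, 256)                   # stand-in for a ViT encoder
decoder = nn.Linear(256, patch_dim)                   # stand-in for the lightweight decoder

patches = torch.randn(num_patches, patch_dim)         # flattened patches of one image
mask = random_mask(num_patches)

visible_latent = encoder(patches[~mask])              # encode only the visible patches
full_latent = torch.zeros(num_patches, 256)           # zeros stand in for learned mask tokens
full_latent[~mask] = visible_latent
recon = decoder(full_latent)

loss = ((recon[mask] - patches[mask]) ** 2).mean()     # reconstruction loss on masked patches only
print(loss)
```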
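
For the unsupervised depth topics, the sketch below shows the photometric self-supervision idea behind methods such as Godard et al.'s left-right consistency: a predicted disparity is used to warp one view into the other, and the photometric error is the training signal, so no depth ground truth is needed. The stereo setting, the random "disparity", and all shapes are illustrative simplifications.

```python
# Minimal sketch of photometric self-supervision for unsupervised depth.
import torch
import torch.nn.functional as F

def warp_right_to_left(right: torch.Tensor, disp: torch.Tensor) -> torch.Tensor:
    """right: (B, C, H, W) image; disp: (B, 1, H, W) disparity as a fraction of image width."""
    b, _, h, w = right.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    grid = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2).clone()   # (B, H, W, 2)
    # Shift the horizontal sampling coordinate by the predicted disparity
    # (factor 2 because grid_sample coordinates span [-1, 1] over the width).
    grid[..., 0] = grid[..., 0] - 2.0 * disp.squeeze(1)
    return F.grid_sample(right, grid, align_corners=True)

left, right = torch.rand(2, 3, 64, 128), torch.rand(2, 3, 64, 128)
disp = torch.rand(2, 1, 64, 128) * 0.1               # stand-in for a depth network's output
left_hat = warp_right_to_left(right, disp)
photometric_loss = (left_hat - left).abs().mean()    # L1 photometric error, no labels
print(photometric_loss)
```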
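
For the paired pre-training topic, this last sketch shows the CLIP-style symmetric contrastive objective over image-text pairs: matching pairs sit on the diagonal of the similarity matrix and are pulled together with a cross-entropy in both directions (SigLIP replaces this with a pairwise sigmoid loss). The random tensors stand in for the image and text encoder outputs.

```python
# Minimal sketch of a CLIP-style symmetric image-text contrastive loss.
import torch
import torch.nn.functional as F

def clip_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    logits = img @ txt.t() / temperature               # (N, N); diagonal entries are matching pairs
    targets = torch.arange(logits.shape[0])
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage: random vectors stand in for encoder outputs of 16 image-caption pairs.
img_emb, txt_emb = torch.randn(16, 512), torch.randn(16, 512)
print(clip_loss(img_emb, txt_emb))
```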
