Direkt zum Inhalt springen
Computer Vision Group
TUM School of Computation, Information and Technology
Technical University of Munich

Technical University of Munich



3D Object in Clutter Recognition and Segmentation

This dataset focuses on the recognition of known objects in cluttered and incomplete 3D scans. It is composed of 150 synthetic scenes, captured with a (perspective) virtual camera, and each scene contains 3 to 5 objects. The model set is composed of 20 different objects, taken from different sources and then processed in order to obtain comparably smooth surfaces of almost uniform 100-350k triangles with an average resolution of 1.0.

The dataset features some original shapes (faces) specifically designed for the task by Matteo Sala.

If you use the dataset, please cite the following paper:

A Scale Independent Selection Process for 3D Object Recognition in Cluttered Scenes
E. Rodola, A. Albarelli, F. Bergamasco, and A. Torsello
International Journal of Computer Vision (IJCV), vol. 102, 2013

Scenes and segmentations
Complete models
Ground-truth rigid motions


The model objects as well as the range scenes come as standard binary PLY files. If you use Matlab, all scenes and models are loaded correctly with this loader using the 'tri' option (part of the PLY_IO toolkit). If you have any problems loading the files, please contact us.

The ground-truth .txt files specify, for each scene, the number of models appearing into it and the rigid motion bringing each model to its scene position. Each model is described by a line containing the model name, the rotation matrix row-major, and the translation vector, i.e.:

[model name] [R00 R01 R02 R10 R11 R12 R20 R21 R22] [T0 T1 T2]

Each scene comes with a .segments file containing segmentation labels for each scene point, in the same order as they appear in the PLY file. The labels are simply integer numbers specifying the model objects to which scene vertices belong; each number refers to the order of appearance of the model in the ground-truth .txt file, starting from 1 (for example, the label 2 indicates the second model in the .txt). The label 0 can be safely ignored an must be skipped when processing the segmentation file. The initial header of each file has the following meaning:

[resolution] [z-gap] [total # vertices] [# models] [# vertices model 1] [# vertices model 2] … [# vertices model N]

Finally, occlusion_clutter.txt contains ground-truth occlusion and clutter for each model in each scene, defined as follows:

occlusion = 1 - (visible object area) / (total object area)

clutter = 1 - (visible object area) / (total scene area)


  • This dataset is designed to be challenging. It contains many smooth (feature-less) as well as similar objects (e.g., david2 vs victoria3 or horse7 vs centaur1), that should provide enough room for false positives even under low levels of occlusion and clutter. Additionally, some scenes present intersecting surfaces, which will induce spurious descriptors and keypoints for feature-based methods.
  • The scenes as they are have no additional sensor-like noise applied to them, in order to give more freedom of use in your experiments. To better simulate a scanning process, you might want to apply depth positional noise to the scene vertices.
  • Ground-truth occlusion and clutter might be under-estimated due to some models having internal vertices (gun0026), which cannot be captured by any range image. However, the error is negligible.

Rechte Seite

Informatik IX
Computer Vision Group

Boltzmannstrasse 3
85748 Garching info@vision.in.tum.de

Follow us on:



CVPR 2023

We have six papers accepted to CVPR 2023.


NeurIPS 2022

We have two papers accepted to NeurIPS 2022.


WACV 2023

We have two papers accepted at WACV 2023.


Fulbright PULSE podcast on Prof. Cremers went online on Apple Podcasts and Spotify.


MCML Kick-Off

On July 27th, we are organizing the Kick-Off of the Munich Center for Machine Learning in the Bavarian Academy of Sciences.