Pulling Things out of Perspective

Ľubor Ladický¹, Jianbo Shi², Marc Pollefeys¹
¹ ETH Zürich, Switzerland
² University of Pennsylvania, Philadelphia, USA

Single View Depth Estimation

General problem
• No common structure of the scene
• Ground plane not always visible
• Large variation of viewpoints and of objects in the scene
• Both things and stuff in the scene
• Impossible?

Standard approaches
1. Model fitting [Barinova et al. ECCV08]
   • Requires strong prior knowledge
   • Ignores small objects
2. 3D-Detection based [Hoiem et al. CVPR06]
   • Works only for foreground objects (things)
3. Depth from semantic labels [Liu et al. CVPR10]
   • Requires strong priors about semantic classes
4. Data driven [Saxena et al. NIPS05]
   • Requires lots of data
   • A problem with balancing data

Our classifier
1. Pixel-wise classifier
   • Superpixels not necessarily planar
2. Translation invariant
3. Depth transforms with inverse scaling
   • Sufficient to train a classifier for a single d_C
   • For other depths d:
   • Classifier response for x and at a depth d
   • Window w × h around the point x
   • Semantic label l
4. Multiple semantic classes

Training of the classifier
1. Image pyramid is built
2. Training data randomly sampled
3. Samples of each class at d_C used as positives
4. Samples of other classes or at d ≠ d_C used as negatives
5. Multi-class classifier trained

• Dense features: SIFT, LBP, Self Similarity, Texton
• Representation: Soft BOW representations in the set of rectangles
• Classifier: AdaBoost

Patch classification

Experiments

KITTI dataset
• 30 training & 30 test images
• 12 semantic labels
• Depth range 2-50 m (except sky)
• Neighbouring depths d_{i+1} / d_i = 1.25

NYU2 dataset
• 725 training & 724 test images
• 40 semantic labels
• Depth range 1-10 m
• Neighbouring depths d_{i+1} / d_i = 1.25

The ratio of pixels below the relative error (plots: KITTI dataset, NYU2 dataset)

Semantic segmentation results
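The experiments use depth levels whose neighbours differ by a constant ratio of 1.25. A minimal sketch of such a geometric depth grid is given below. The function names `depth_levels` and `nearest_level` are hypothetical, and two choices are assumptions rather than details from the poster: the grid starts exactly at the lower end of the depth range, and a measured depth is assigned to the closest level in log space.

```python
import math


def depth_levels(d_min, d_max, ratio=1.25):
    """Return depths d_0 = d_min, d_{i+1} = ratio * d_i that do not exceed d_max."""
    if d_min <= 0 or d_max < d_min or ratio <= 1:
        raise ValueError("need 0 < d_min <= d_max and ratio > 1")
    levels = []
    i = 0
    # Small tolerance so a level equal to d_max is not lost to rounding.
    while d_min * ratio ** i <= d_max * (1 + 1e-9):
        levels.append(d_min * ratio ** i)
        i += 1
    return levels


def nearest_level(depth, levels):
    """Index of the level closest to `depth`, measured in log space (assumption)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return min(range(len(levels)), key=lambda i: abs(math.log(depth / levels[i])))


# Depth ranges quoted for the two datasets (metres).
kitti_levels = depth_levels(2.0, 50.0)
nyu_levels = depth_levels(1.0, 10.0)
print(len(kitti_levels), len(nyu_levels))
```

Measuring distance in log space matches the constant-ratio spacing: a depth is assigned to the level whose ratio to it is closest to 1.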
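The idea of training at a single canonical depth d_C and handling other depths by inverse scaling can be illustrated with a small sketch. It relies only on the perspective fact that apparent size is proportional to 1/depth. All function names are hypothetical, and the way a sample is declared "at d_C" (its quantised depth index equals the index of the pyramid level) is an assumed reading of the training steps, not the authors' implementation.

```python
def pyramid_scale(d, d_c):
    """Resize factor that makes content at depth d appear as it would at depth d_c.

    Apparent size is proportional to 1/depth, so the factor is d / d_c.
    """
    if d <= 0 or d_c <= 0:
        raise ValueError("depths must be positive")
    return d / d_c


def window_at_depth(w, h, d, d_c):
    """Window size in the original image equivalent to a w x h window at d_c."""
    if d <= 0 or d_c <= 0:
        raise ValueError("depths must be positive")
    s = d_c / d  # inverse scaling with depth
    return w * s, h * s


def one_vs_all_targets(label, depth_index, level_index, num_classes):
    """Targets for one sample drawn from one pyramid level.

    +1 for the sample's own class when its depth matches the level (it then
    appears at the canonical depth), -1 for every other class and for every
    sample whose depth does not match.
    """
    at_canonical = depth_index == level_index
    return [1 if (at_canonical and k == label) else -1 for k in range(num_classes)]


# An object twice as far away as d_c appears half as large.
print(window_at_depth(100, 60, d=20.0, d_c=10.0))
print(one_vs_all_targets(label=1, depth_index=3, level_index=3, num_classes=3))
```

With this labelling, a single multi-class classifier is trained once, and other depths are probed by rescaling the image or the window.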
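The reported measure, the ratio of pixels below a relative error, can be computed along the following lines. The poster text does not define the relative error, so the sketch assumes the symmetric ratio max(predicted / true, true / predicted), a common choice for depth evaluation. The function name and the toy values are hypothetical.

```python
def ratio_below_relative_error(predicted, ground_truth, threshold):
    """Fraction of valid pixels whose relative depth error is below `threshold`.

    Relative error is taken as max(p / g, g / p) (assumption); pixels with
    non-positive depth are treated as invalid and skipped.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("inputs must have the same length")
    valid = [(p, g) for p, g in zip(predicted, ground_truth) if p > 0 and g > 0]
    if not valid:
        raise ValueError("no valid pixels")
    hits = sum(1 for p, g in valid if max(p / g, g / p) < threshold)
    return hits / len(valid)


# Toy example with four pixels (depths in metres).
pred = [2.0, 5.0, 10.0, 40.0]
gt = [2.0, 4.0, 20.0, 30.0]
print(ratio_below_relative_error(pred, gt, threshold=1.5))  # 3 of 4 pixels pass
```

Sweeping the threshold over a range of values gives a curve of the kind summarised for the KITTI and NYU2 datasets.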