SuperParsing: Scalable Nonparametric Image Parsing with Superpixels
Joseph Tighe and Svetlana Lazebnik
Dept. of Computer Science, University of North Carolina at Chapel Hill
http://www.cs.unc.edu/SuperParsing

Overview
• "Open universe" system: no training required, and it easily accommodates an evolving dataset as new classes or new training exemplars are added.
• State-of-the-art performance on the SIFT Flow dataset (Liu et al., CVPR 2009).
• New large-scale baseline for image parsing: per-pixel recognition results on a subset of LabelMe consisting of 15k images and 170 labels.

[Figure: system overview. Panels: Query Image; Retrieval set of similar images; Superpixels; Per-class likelihood; final labelings for Semantic Classes (e.g., Building, Car, Road, Sky) and Geometric Classes (Sky, Vertical, Horizontal).]

Image Parsing Method
Given a query (test) image:
• Find a retrieval set of 200 similar images by taking the minimum per-feature rank of four global image features.
• For each test superpixel $s_i$ described by multiple local features $\{f_i^k\}$, compute a likelihood ratio score for each class $c$ found in the retrieval set (the product form treats the features as conditionally independent given the class):
$$L(s_i, c) = \frac{P(s_i \mid c)}{P(s_i \mid \neg c)} = \prod_k \frac{P(f_i^k \mid c)}{P(f_i^k \mid \neg c)}.$$
The feature likelihood $P(f_i^k \mid c)$ is given by a nonparametric density estimate:
$$P(f_i^k \mid c) = \frac{\#(\text{retrieval set features of class } c \text{ within a fixed radius of } f_i^k)}{\#(\text{total features of class } c \text{ in the training set})}.$$
• Use MRF inference to solve for the label field $\mathbf{c} = \{c_i\}$ over the entire test image by minimizing
$$J(\mathbf{c}) = \sum_{s_i \in SP} -w_i \log L(s_i, c_i) + \lambda \sum_{(s_i, s_j) \in A} E_{\text{edge}}(c_i, c_j),$$
where $SP$ is the set of superpixels, $A$ is the set of pairs of adjacent superpixels, $w_i$ is a weight based on the superpixel size, and the edge penalty $E_{\text{edge}}$ is based on the co-occurrence of adjacent labels in the training set:
$$E_{\text{edge}}(c_i, c_j) = -\log\left[\frac{P(c_i \mid c_j) + P(c_j \mid c_i)}{2}\right] \times \delta[c_i \neq c_j].$$

[Figure: example labeling. Panels: Query Image; Max. Likelihood Ratio; Edge Penalty; MRF Labeling. Labels shown include Road, Sky, Sea, Sand, and Tree.]
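A minimal sketch of the retrieval-set step, assuming each of the four global features yields a distance from the query to every training image (smaller meaning more similar) and that ties are broken by image index. The function name, feature names, and input layout are hypothetical, not the authors' code. Each training image keeps its best (minimum) rank across features, and the images with the smallest such ranks form the retrieval set.

```python
def retrieval_set(distances, k=200):
    """Return indices of the k training images with the smallest minimum per-feature rank.

    distances: dict mapping a (hypothetical) global-feature name to a list of
    query-to-training-image distances, indexed by training image id.
    """
    n = len(next(iter(distances.values())))
    best_rank = [n] * n  # larger than any real rank (ranks run 0..n-1)
    for dists in distances.values():
        # Rank training images for this feature: rank 0 = most similar.
        order = sorted(range(n), key=lambda i: dists[i])
        for rank, img in enumerate(order):
            best_rank[img] = min(best_rank[img], rank)
    # Sort by best rank, breaking ties by image index (an assumption).
    return sorted(range(n), key=lambda i: (best_rank[i], i))[:k]


# Usage with two toy features and four training images:
toy = {"gist": [0.1, 0.5, 0.9, 0.3], "color": [0.8, 0.2, 0.7, 0.6]}
print(retrieval_set(toy, k=3))  # [0, 1, 3]
```

Taking the minimum rank means an image enters the retrieval set if it is close to the query under any single feature, rather than requiring agreement across all of them.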
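The likelihood ratio and its nonparametric density estimate can be sketched as follows, working in log space so the product over features becomes a sum. Several details are assumptions not stated in the poster: $P(f_i^k \mid \neg c)$ is estimated analogously from features of all classes other than $c$, distances are Euclidean with one shared radius, and a small constant `eps` keeps zero counts from producing log(0). The function name and data layout are hypothetical.

```python
import math


def log_likelihood_ratio(sp_features, retrieval, totals, cls, radius, eps=1e-6):
    """Return log L(s_i, c) for one superpixel s_i and class cls.

    sp_features: feature type -> descriptor vector f_i^k of the superpixel
    retrieval:   feature type -> list of (descriptor, label) pairs from the
                 superpixels of the retrieval set
    totals:      feature type -> {label: number of features with that label
                 in the whole training set}
    """
    score = 0.0
    for ftype, f in sp_features.items():
        # Labels of retrieval-set features within the fixed radius of f.
        near = [lab for vec, lab in retrieval[ftype] if math.dist(f, vec) <= radius]
        n_c = sum(1 for lab in near if lab == cls)
        n_not_c = len(near) - n_c
        total_c = totals[ftype].get(cls, 0)
        total_not_c = sum(totals[ftype].values()) - total_c
        # Nonparametric estimates of P(f | c) and P(f | not c); eps is an assumption.
        p_c = (n_c + eps) / max(total_c, 1)
        p_not_c = (n_not_c + eps) / max(total_not_c, 1)
        score += math.log(p_c) - math.log(p_not_c)
    return score


# Usage: one hypothetical 2-D feature type and four retrieval-set features.
retrieval = {"sift": [((0.0, 0.0), "sky"), ((0.1, 0.0), "sky"),
                      ((0.0, 0.1), "road"), ((5.0, 5.0), "road")]}
totals = {"sift": {"sky": 10, "road": 20}}
query = {"sift": (0.0, 0.0)}
print(round(math.exp(log_likelihood_ratio(query, retrieval, totals, "sky", 0.5)), 3))  # 4.0
```

Here two of the three nearby features are "sky" out of 10 sky features overall, against one "road" neighbor out of 20, so the ratio is (2/10) / (1/20) = 4.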
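To make $J(\mathbf{c})$ and the edge penalty concrete, the sketch below evaluates both and lowers the energy with iterated conditional modes (ICM), starting from the maximum-likelihood-ratio labeling. ICM is a simple local method swapped in for illustration; it is not presented as the solver used in this work. The input layout, the conditional-probability table `cond` (holding $P(a \mid b)$ from co-occurring adjacent labels), and the 1e-12 floor inside the logarithm are assumptions.

```python
import math


def edge_penalty(ci, cj, cond):
    """E_edge(c_i, c_j); cond[(a, b)] holds P(a | b) from training co-occurrences."""
    if ci == cj:
        return 0.0  # delta[c_i != c_j] is zero
    p = (cond.get((ci, cj), 0.0) + cond.get((cj, ci), 0.0)) / 2
    return -math.log(max(p, 1e-12))  # floor avoids log(0) (assumption)


def energy(labels, log_lr, weights, edges, cond, lam):
    """J(c): weighted data term plus lambda times the summed edge penalties."""
    data = sum(-weights[i] * log_lr[i][c] for i, c in enumerate(labels))
    smooth = sum(edge_penalty(labels[i], labels[j], cond) for i, j in edges)
    return data + lam * smooth


def icm(log_lr, weights, edges, cond, lam, iters=10):
    """Greedy ICM: switch a superpixel's label only if its local energy strictly drops."""
    n = len(log_lr)
    labels = [max(scores, key=scores.get) for scores in log_lr]  # max-likelihood-ratio start
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    for _ in range(iters):
        changed = False
        for i in range(n):
            def local(c):
                # Terms of J(c) that involve superpixel i only.
                return -weights[i] * log_lr[i][c] + lam * sum(
                    edge_penalty(c, labels[j], cond) for j in nbrs[i])
            best = min(log_lr[i], key=local)
            if local(best) < local(labels[i]):
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Usage: three superpixels in a chain; log_lr[i][c] stands for log L(s_i, c).
log_lr = [{"sky": 2.0, "sea": -1.0}, {"sky": -0.2, "sea": 0.1}, {"sky": 1.5, "sea": -0.5}]
weights = [1.0, 1.0, 1.0]
edges = [(0, 1), (1, 2)]
cond = {("sky", "sea"): 0.1, ("sea", "sky"): 0.1}
start = ["sky", "sea", "sky"]
result = icm(log_lr, weights, edges, cond, lam=1.0)
print(result)  # ['sky', 'sky', 'sky']
```

The middle superpixel starts as "sea" because its likelihood ratio slightly favors it, but two edge penalties of about 2.3 each make relabeling it "sky" cheaper, which is the smoothing effect the edge term is meant to provide.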
Joint Semantic and Geometric Labeling
Simultaneously solve for a field of semantic labels $\mathbf{c}$ and geometric labels $\mathbf{g}$ over the image by minimizing
$$H(\mathbf{c}, \mathbf{g}) = J(\mathbf{c}) + J(\mathbf{g}) + \mu \sum_{s_i \in SP} \varphi(c_i, g_i),$$
where $\varphi(c_i, g_i)$ is a coherence term between the semantic and geometric labels of the same superpixel.

[Figure: example of joint labeling. Panels: Query; Ground Truth Labels; Initial Labeling; Semantic MRF; Joint Semantic and Geometric; shown for Semantic Classes (e.g., Road, Window, Door, Sidewalk, Building, Awning, Sign, Person) and Geometric Classes (Horz, Vert, Sky).]

Results on Large-Scale Datasets
• SIFT Flow dataset (Liu et al., CVPR 2009): 2,488 training images, 200 test images, 33 labels.
• Barcelona dataset (a new large-scale benchmark): 14,871 training images, 279 test images, 170 labels.

[Figure: Label Frequency in Dataset. Number of superpixels (×1000) per label for the SIFT Flow and Barcelona datasets.]

Per-pixel classification rates (%), with average per-class rates in parentheses:

                            SIFT Flow                 Barcelona
                            Semantic      Geometric   Semantic     Geometric
Liu et al. (CVPR 2009)      74.75         N/A         N/A          N/A
Local labeling              73.2 (29.1)   89.8        62.5 (8.0)   89.9
MRF                         76.3 (28.8)   89.9        66.6 (7.6)   90.2
Joint semantic/geometric    76.9 (29.4)   90.8        66.9 (7.6)   90.7

[Figure: Per-class Performance (0% to 100%) on the SIFT Flow and Barcelona datasets.]

Timing (search, solver, and total times in seconds; mean ± standard deviation)

                              SIFT Flow       Barcelona
Training set size             2,488           14,871
Image size                    256 × 256       640 × 480
Avg. # superpixels            63.9            307.9
Feature extraction            ~4 sec          ~5 min
Retrieval set search          0.04 ± 0.0      0.21 ± 0.0
Superpixel search             4.4 ± 2.3       34.2 ± 13.4
MRF solver                    0.005 ± 0.003   0.03 ± 0.02
Total (excluding features)    4.4 ± 2.3       34.4 ± 13.4

[Figure: Full System Times. Seconds vs. Number of Superpixels for the SIFT Flow and Barcelona datasets.]

Sample Output on SIFT Flow Dataset
[Figure: sample outputs. Panels: Query; Ground Truth Labels; Initial Labeling; Edge Penalties; Final Labeling; Geometric Labeling. Labels shown include Sky, Tree, Road, Car, Bridge, Sea, Mountain, Grass, Field, Desert, Sun, Building, Sidewalk, Window, Balcony, and Door.]

Small Datasets
Results on two small-scale datasets using trained boosted decision tree classifiers (instead of retrieval set and superpixel search):

                            Stanford Dataset           Geometric Context Dataset
                            (715 images, 8 classes)    (300 images, 7 classes)
                            Semantic     Geometric     Sub-classes    Main classes
Gould et al. (ICCV 2009)    76.4         91.0          N/A            86.9
Hoiem et al. (IJCV 2007)    N/A          N/A           61.5           88.1
Local labeling              76.9         90.5          57.6           87.8
MRF                         77.5         90.6          61.0           88.2
Joint semantic/geometric    77.5         90.6          61.0           88.1

Funding
This research was supported in part by NSF CAREER award IIS-0845629, a Microsoft Research Faculty Fellowship, and Xerox.
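The joint objective $H(\mathbf{c}, \mathbf{g})$ from the Joint Semantic and Geometric Labeling section can be evaluated with a short, self-contained sketch. The poster does not specify the coherence term, so `phi` below is a hypothetical 0/1 compatibility table; sharing the weights, adjacency, and $\lambda$ between the semantic and geometric fields, and using a toy 0/1 edge cost in place of $E_{\text{edge}}$, are also assumptions made to keep the example small. It only evaluates $H$ for candidate labelings; it does not minimize it.

```python
def mrf_energy(labels, log_lr, weights, edges, pair_cost, lam):
    """J(.) for one label field: sum of -w_i log L(s_i, l_i) plus lam times edge costs."""
    data = sum(-w * scores[l] for l, scores, w in zip(labels, log_lr, weights))
    smooth = sum(pair_cost(labels[i], labels[j]) for i, j in edges)
    return data + lam * smooth


def joint_energy(sem, geo, sem_lr, geo_lr, weights, edges,
                 sem_edge, geo_edge, phi, lam, mu):
    """H(c, g) = J(c) + J(g) + mu * sum_i phi(c_i, g_i)."""
    return (mrf_energy(sem, sem_lr, weights, edges, sem_edge, lam)
            + mrf_energy(geo, geo_lr, weights, edges, geo_edge, lam)
            + mu * sum(phi(c, g) for c, g in zip(sem, geo)))


def potts(a, b):
    """Toy 0/1 edge cost standing in for E_edge (assumption)."""
    return 0.0 if a == b else 1.0


COMPATIBLE = {("sky", "sky"), ("building", "vertical")}  # hypothetical table


def phi(c, g):
    """Hypothetical coherence term: 0 if the label pair is compatible, else 1."""
    return 0.0 if (c, g) in COMPATIBLE else 1.0


# Toy inputs (all hypothetical): two adjacent superpixels; values are log L(s_i, label).
sem_lr = [{"sky": 1.0, "building": 0.5}, {"sky": 0.2, "building": 0.4}]
geo_lr = [{"sky": 1.0, "vertical": 0.0}, {"sky": 0.6, "vertical": 0.5}]
args = dict(sem_lr=sem_lr, geo_lr=geo_lr, weights=[1.0, 1.0], edges=[(0, 1)],
            sem_edge=potts, geo_edge=potts, phi=phi, lam=0.5, mu=1.0)
h_incoherent = joint_energy(["sky", "building"], ["sky", "sky"], **args)
h_coherent = joint_energy(["sky", "building"], ["sky", "vertical"], **args)
print(round(h_incoherent, 2), round(h_coherent, 2))  # -1.5 -1.9
```

In this toy case the geometric likelihood ratio alone favors "sky" for the second superpixel, but the coherence term makes the labeling that pairs "building" with "vertical" the lower-energy one.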