SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite
Shuran Song, Samuel P. Lichtenberg, Jianxiong Xiao

Motivation

[Figure: dataset sizes on a log10 scale]
RGB datasets: LSUN 10,195,373; ImageNet 131,067; PASCAL VOC 11,530
RGB-D datasets: SUN RGB-D 10,335; NYU 1,449
Existing RGB-D datasets shown: NYU Depth v2 [NYU], B3DO [B3DO], SUN3D [SUN3D]

RGB-D Sensors

                   RealSense           Xtion            Kinect v1         Kinect v2
weight (pound)     0.077               0.5              4                 4.5
size (inch)        5.2 × 0.25 × 0.75   7.1 × 1.4 × 2    11 × 2.3 × 2.7    9.8 × 2.7 × 2.7
power              2.5W USB            2.5W USB         12.96W            115W
depth resolution   628 × 468           640 × 480        640 × 480         512 × 424
color resolution   1920 × 1080         640 × 480        640 × 480         1920 × 1080

Data Capturing

[Figure: sample captures per sensor]
Sensors: Kinect v1, Kinect v2, Asus Xtion, Intel RealSense
Panels: color, raw depth, refined depth, raw points, refined points
Data sources: Kinect v2, SUN3D (ASUS Xtion), NYUv2 (Kinect v1), Intel RealSense

Example Scenes

[Figure: example scenes] bedroom, classroom, dining room, bathroom, office, home office, conference room, kitchen

Annotation

Annotation Tool for 3D Object and 3D Room Layout
[Figure: annotation tool views] 1. Image, 2. Top, 3. Side, 4. Front

Examples of 2D and 3D Annotation
[Figure: paired examples] 2D segmentation, 3D annotation

Statistics of Semantic Annotation

(a) Object distribution
[Figure: bar chart of object counts per category; axis ticks 0, 1250, 2500, 3750, 5000; one off-scale bar labeled 17712]
Categories in chart order: chair, table, desk, pillow, sofa, bed, box, cabinet, garbage bin, lamp, shelf, sofa chair, monitor, drawer, frame, sink, side table, paper, trash can, night stand, book, door, book shelf, computer, dresser, curtain, toilet, rack, bag, keyboard, cpu, tv, kitchen cabinet, coffee table, fridge, white board, printer, dining table, stool, bottle, towel, cup, plant, painting, hanging cabinet, mirror, unknown, computer monitor, laptop, board, tray, steel cabinet, oven, bench, clothes, bowl, cooker, kitchen tool, carton box

(b) Scene distribution
bedroom            12.6%
furniture store    11.3%
office             11.0%
classroom           9.3%
others              8.0%
bathroom            6.4%
rest space          6.3%
living room         6.0%
kitchen             5.6%
corridor            3.8%
lab                 3.0%
conference room     2.6%
dining area         2.4%
dining room         2.3%
discussion area     2.0%
home office         1.9%
study space         1.9%
library             1.4%
lecture theatre     1.2%
computer room       1.0%

Benchmark Tasks

Scene Classification
Semantic Segmentation
Room Layout Estimation
3D Object Detection and Pose Estimation
Total Scene Understanding

Room Layout Estimation
[Figure: free space legend] Effective free space; Outside the room; Inside some objects; Beyond cutoff distance
[Figure: two examples, each compared against Ground Truth]
Example 1: Manhattan Box (0.99), Geometric Context (0.27), Convex Hull (0.90)
Example 2: Manhattan Box (0.811), Geometric Context (0.57), Convex Hull (0.85)

3D Object Detection and Pose Estimation
[Figure: detection examples] Ground truth vs. Sliding Shapes
Object categories: bathtub, bed, bookshelf, box, chair, counter, desk, door, dresser, garbage bin, lamp, monitor, night stand, pillow, sink, sofa, table, tv, toilet

Total Scene Understanding
[Figure: six examples with per-scene scores]
IoU: 53.1   Rr: 0.111   Rg: 0.111   Pg: 0.5
IoU: 72.9   Rr: 0.333   Rg: 0.667   Pg: 0.667
IoU: 63.9   Rr: 0.333   Rg: 0.667   Pg: 1
IoU: 77.0   Rr: 0.25    Rg: 0.25    Pg: 0.5
IoU: 54.6   Rr: 0.333   Rg: 0.333   Pg: 0.125
IoU: 60     Rr: 0.50    Rg: 0.50    Pg: 0.5

References

[NYU] N. Silberman, P. Kohli, D. Hoiem, and R. Fergus. Indoor Segmentation and Support Inference from RGBD Images. In ECCV, 2012.
[SUN3D] J. Xiao, A. Owens, and A. Torralba. SUN3D: A database of big spaces reconstructed using SfM and object labels. In ICCV, 2013.
[B3DO] A. Janoch, S. Karayev, Y. Jia, J. T. Barron, M. Fritz, K. Saenko, and T. Darrell. A category-level 3-d object dataset: Putting the kinect to work. In ICCV Workshop, 2011.
RGB-D sensors have enabled rapid progress for scene understanding. However, the small dataset size has become a major bottleneck. Another problem of existing RGB-D datasets is that most of them are only labeled in 2D. We introduce SUN RGB-D, a Pascal-scale RGB-D scene understanding dataset, which has 2D and 3D annotation for both objects and rooms.

Data & Code: http://rgbd.cs.princeton.edu

Capturing Setup
[Figure: Kinect v2 and Battery]

Example Objects
[Figure: example objects]

Acknowledgement
This work is supported by gift funds from Intel.

Scene Classification

[Figure: two confusion matrices, one per method]
Classification accuracy (%):
                              RGB     D       RGB-D
GIST + RBF kernel             19.7    20.1    23.0
PLACES-CNN + RBF kernel       38.1    27.7    39.0
Axis labels: bathroom, bedroom, classroom, computer room, conference room, corridor, dining area, dining room, discussion area, furniture store, home office, kitchen, lab, lecture theatre, library, living room, rest space

Total Scene Understanding: Evaluation Metrics

Metrics: Free Space IoU; Precision and Recall for all objects.

\[
\text{Precision} = \frac{\#\ \text{matched pairs with a correct category label}}{\#\ \text{prediction boxes}}
\]

\[
\text{Recall} = \frac{\#\ \text{matched pairs with a correct category label}}{\#\ \text{ground truth boxes}}
\]
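The precision and recall definitions above can be illustrated with a minimal sketch. The poster does not say how prediction boxes are paired with ground truth boxes, so everything about the matching here is an assumption made for illustration: boxes are axis-aligned, pairs are formed greedily by 3D IoU, and the threshold `iou_threshold=0.25` is a hypothetical value. The names `Box3D`, `iou_3d`, and `precision_recall` are also hypothetical and do not come from the released code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned 3D box (an assumption; real annotations may be oriented)."""
    lo: Tuple[float, float, float]  # minimum corner (x, y, z)
    hi: Tuple[float, float, float]  # maximum corner (x, y, z)
    label: str                      # category label, e.g. "chair"


def _volume(lo, hi) -> float:
    """Volume of the box spanned by lo and hi; zero if the box is empty."""
    v = 1.0
    for a, b in zip(lo, hi):
        v *= max(0.0, b - a)
    return v


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter_lo = tuple(max(x, y) for x, y in zip(a.lo, b.lo))
    inter_hi = tuple(min(x, y) for x, y in zip(a.hi, b.hi))
    inter = _volume(inter_lo, inter_hi)
    union = _volume(a.lo, a.hi) + _volume(b.lo, b.hi) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Box3D], gts: List[Box3D],
                     iou_threshold: float = 0.25) -> Tuple[float, float]:
    """Precision and recall over matched pairs with a correct category label.

    Matching is greedy and one-to-one, highest IoU first (an assumed scheme).
    """
    candidates = sorted(
        ((iou_3d(p, g), i, j)
         for i, p in enumerate(preds)
         for j, g in enumerate(gts)),
        reverse=True,
    )
    used_preds, used_gts = set(), set()
    correct = 0  # matched pairs whose category labels agree
    for iou, i, j in candidates:
        if iou < iou_threshold:
            break  # remaining candidates overlap even less
        if i in used_preds or j in used_gts:
            continue
        used_preds.add(i)
        used_gts.add(j)
        if preds[i].label == gts[j].label:
            correct += 1
    precision = correct / len(preds) if preds else 0.0
    recall = correct / len(gts) if gts else 0.0
    return precision, recall


if __name__ == "__main__":
    gts = [Box3D((0, 0, 0), (1, 1, 1), "chair"),
           Box3D((2, 0, 0), (3, 1, 1), "table")]
    preds = [Box3D((0, 0, 0), (1, 1, 1), "chair"),  # matched, correct label
             Box3D((2, 0, 0), (3, 1, 1), "desk"),   # matched, wrong label
             Box3D((5, 5, 5), (6, 6, 6), "sofa"),   # unmatched
             Box3D((8, 8, 8), (9, 9, 9), "bed")]    # unmatched
    print(precision_recall(preds, gts))
```

In this toy scene one of four prediction boxes is a matched pair with a correct label, and one of two ground truth boxes is recovered with the right label, so the two denominators in the formulas differ as intended.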