PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
Charles R. Qi*, Hao Su*, Kaichun Mo, Leonidas J. Guibas

Motivation & Background

Robot perception, AR/VR
Big Data + Deep 3D Representation Learning
However, 3D has multiple representations:

Representation   Data Structure   Previous Work
Point Cloud      Set              -
Mesh             Graph            Spectral CNN
Volumetric       Array            3D CNN
Image            Array            Image CNN

[Chart: the four representations compared on rawness, geometry and compactness.]

This work: Deep Learning on Point Sets for 3D Vision

[Figure: Input -> PointNet -> Output. Tasks: Classification (mug? table? car?), Part Segmentation, Semantic Segmentation.]

• Point Cloud Features
  Mostly hand-crafted features -> feature learning
• Deep Nets on Point Cloud/Shape
  A point cloud is usually converted to a volume, image or feature vector -> We work directly on point sets
• Deep Nets on Unordered Sets
  Not much work on deep nets for point sets, barely any for 3D -> We invent, experiment with and explain novel architectures

Contributions

‣ We design a novel deep net architecture suitable for consuming unordered point sets in 3D;
‣ We show how such a net can be trained to perform 3D shape classification, shape part segmentation and scene semantic parsing tasks;
‣ We provide thorough empirical and theoretical analysis of the stability and efficiency of our method;
‣ We illustrate the 3D features computed by selected neurons in the net and develop intuitive explanations for its performance.

Deep Learning on Point Sets

Properties of Point Sets

Unordered. Unlike pixel arrays in images or voxel arrays in volumetric grids, a point cloud is a set of points without a specific order.
Interaction among points. The points come from a space with a distance metric, which means that points are not isolated.
Invariance under transformations. As a geometric object, the learned representation of the point set should be invariant to certain transformations.

Unordered: Symmetry Function for Unordered Input (the max pool in the architecture)
Invariance under transformations: Joint Alignment Network (the T-Net in the architecture)

PointNet Architecture

Classification Network
input points (n x 3)
-> input transform: T-Net, 3x3 transform, matrix multiply (n x 3)
-> shared mlp (64,64) (n x 64)
-> feature transform: T-Net, 64x64 transform, matrix multiply (n x 64)
-> shared mlp (64,128,1024) (n x 1024)
-> max pool: global feature (1024)
-> mlp (512,256,k): output scores (k)

Segmentation Network
point features (n x 64) combined with the global feature (1024) (n x 1088)
-> shared mlp (512,256): point features (n x 128)
-> shared mlp (128,m): output scores (n x m)

Theorem: PointNet as a Universal Approximation to Set Functions

Application Results

Object Part Segmentation
Table 1: Segmentation results on ShapeNet part dataset. Metric is mean IoU (%) across shapes.
[Figure: part segmentation on Partial Inputs and Complete Inputs. Categories: airplane, car, chair, lamp, guitar, motorbike, mug, table, bag, rocket, earphone, laptop, cap, knife, pistol, skateboard.]

Shape Classification
Table 2: Shape classification results on ModelNet40.

Semantic Segmentation
Table 3: Semantic segmentation results on Stanford 3D Parsing dataset.
Table 4: Object detection results based on semantic segmentation.

Analysis Experiments

PointNet Robustness Test
PointNet is robust to various types of data corruption, such as missing data, outliers and perturbations.
[Plot: Accuracy (%) vs. missing data ratio. Series: PointNet, VoxNet.]
[Plot: Accuracy (%) vs. perturbation noise std.]
[Plot: Accuracy (%) vs. missing data ratio. Series: Furthest, Random.]
[Plot: Accuracy (%) vs. outlier ratio. Series: XYZ, XYZ+density.]

Time and Space Complexity
Table 6: Time and space complexity of PointNet (classification network) compared with volumetric CNNs (subvolume and VRN) and multi-view CNNs (MVCNN).
PointNet is highly time efficient (229x better than VRN, 141x better than MVCNN) and highly space efficient (17x fewer parameters than MVCNN).

Visualization of What PointNet Has Learned
Each cube visualizes the region of space that activates a point function.
[Figure: Original Shape, Critical Point Sets, Upper-bound Shapes.]
Critical points (those that affect the 1024-dim bottleneck layer) and shape upper-bound. Left: test data. Right: unseen category.
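The two ideas the poster names "Symmetry Function for Unordered Input" and "Critical Point Sets" can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: it assumes a single untrained, randomly initialised shared layer with 16 output channels in place of the poster's shared mlp (64,128,1024), and the helper names (`make_shared_layer`, `point_feature`, `global_feature`, `critical_indices`) are hypothetical. It shows that max pooling over per-point features gives the same global feature for any ordering of the points, and that the points attaining a channel maximum already determine that feature.

```python
import random


def make_shared_layer(in_dim, out_dim, rng):
    """Random weights for one per-point linear layer (illustrative, untrained)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
               for _ in range(out_dim)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(out_dim)]
    return weights, biases


def point_feature(point, layer):
    """Apply the same linear layer followed by ReLU to a single point."""
    weights, biases = layer
    return [max(0.0, sum(w * x for w, x in zip(row, point)) + b)
            for row, b in zip(weights, biases)]


def global_feature(points, layer):
    """Max-pool per-point features over the whole set (a symmetric function)."""
    features = [point_feature(p, layer) for p in points]
    return [max(channel) for channel in zip(*features)]


def critical_indices(points, layer):
    """Indices of points that attain the maximum in at least one channel."""
    features = [point_feature(p, layer) for p in points]
    critical = set()
    for channel in zip(*features):
        critical.add(max(range(len(channel)), key=channel.__getitem__))
    return sorted(critical)


rng = random.Random(0)
layer = make_shared_layer(3, 16, rng)  # assumption: 16 channels instead of 1024
cloud = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
         for _ in range(200)]

# Same points, different order.
shuffled = cloud[:]
rng.shuffle(shuffled)

full = global_feature(cloud, layer)
permuted = global_feature(shuffled, layer)

# Keep only the points that contribute a channel maximum.
subset = [cloud[i] for i in critical_indices(cloud, layer)]
reduced = global_feature(subset, layer)

print("order invariant:", full == permuted)
print("critical subset size:", len(subset), "of", len(cloud))
print("subset reproduces global feature:", full == reduced)
```

Because each channel has one maximising point, the critical subset can never be larger than the number of channels, which is one way to read the poster's reference to the 1024-dim bottleneck layer.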