A Unified Framework for Object Detection, Pose Estimation, and Sub-category Recognition
Roozbeh Mottaghi, Yu Xiang, and Silvio Savarese

Summary
• Our goal is to detect objects in images.
• In addition to the object's bounding box, we estimate its viewpoint, which is critical for applications such as autonomous driving.
• We also recognize the object's sub-category, which provides finer detail about the detected object.
[Figure: hierarchical model with top, middle, and bottom layers; example labels: Car, Truck, Aeroplane, Fighter]

Dataset
• Training and evaluating our method requires a large-scale 3D dataset, so we annotated around 30,000 images with 3D information [1].
[Figure: viewpoint annotation in terms of azimuth, elevation, and distance]

Approach
• We formulate the problem as a structured learning problem. The variables are a: azimuth, e: elevation, d: distance, the mid-layer category, the fine-layer category, and o: object type. The formulation uses a feature vector and a learned weight vector.
• The hierarchy breaks the problem into easier sub-problems. Jointly modeling the tasks within a hierarchy also makes it easier to recover from mistakes made by a single layer.
[Figure: coarse-to-fine hierarchy; example labels: car, race, SUV, race 1, race 2]

Results
• We evaluate our method on the PASCAL 3D+ dataset, a challenging benchmark.
• We evaluate all three tasks: detection, pose estimation, and sub-category recognition.

Task                                          DPM [2]   Hierarchical Model
Detection (AP)                                32.0      41.1
Viewpoint                                     33.3      45.9
Sub-category                                  16.5      26.1
Viewpoint + Sub-category                       6.5      11.9
Viewpoint + Sub-category + Sub-sub-category    0.8       1.9

We use histogram of oriented gradients (HOG) and convolutional neural network (CNN) features.

Failure Cases
[Figure: failure cases]

References
[1] Y. Xiang, R. Mottaghi, and S. Savarese. Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild. In WACV, 2014.
[2] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object Detection with Discriminatively Trained Part-Based Models. PAMI, 2010.
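
Illustrative sketch (not the authors' implementation): the Approach section states that jointly modeling the layers of the hierarchy helps recover from a mistake made by a single layer. The toy Python example below shows that effect in its simplest form. Everything in it is an assumption made for illustration: the tree structure, the "SUV 1" and "SUV 2" labels, and the per-label scores, which stand in for whatever the learned weight and feature vector would assign to each label. Viewpoint variables (azimuth, elevation, distance) are left out to keep the sketch short.

```python
# Toy coarse-to-fine hierarchy (hypothetical structure): parent -> children.
CHILDREN = {
    "root": ["car"],
    "car": ["race", "SUV"],
    "race": ["race 1", "race 2"],
    "SUV": ["SUV 1", "SUV 2"],
}

# Made-up per-label scores. In a structured model each of these would come
# from a learned weight vector applied to image features for that label.
SCORE = {
    "car": 1.0,
    "race": 0.4, "SUV": 0.5,          # the middle layer slightly prefers SUV
    "race 1": 0.9, "race 2": 0.1,     # the bottom layer strongly prefers race 1
    "SUV 1": 0.2, "SUV 2": 0.1,
}


def paths_from(node):
    """Yield every label path from the children of `node` down to a leaf."""
    kids = CHILDREN.get(node, [])
    if not kids:
        yield ()
        return
    for kid in kids:
        for rest in paths_from(kid):
            yield (kid,) + rest


def path_score(path):
    """Additive score of a full path through the hierarchy."""
    return sum(SCORE[label] for label in path)


def joint_inference():
    """Pick the path whose total score over all layers is highest."""
    return max(paths_from("root"), key=path_score)


def greedy_inference():
    """Pick the best label one layer at a time, never revisiting a choice."""
    path, node = [], "root"
    while CHILDREN.get(node):
        node = max(CHILDREN[node], key=SCORE.get)
        path.append(node)
    return tuple(path)


print(joint_inference())   # ('car', 'race', 'race 1')
print(greedy_inference())  # ('car', 'SUV', 'SUV 1')
```

In this toy setting the layer-by-layer decision commits to "SUV" in the middle layer and cannot undo it, whereas scoring whole paths lets the strong bottom-layer evidence for "race 1" override the weak middle-layer preference. Exhaustive enumeration is only practical here because the tree is tiny.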