A Unified Framework for Object Detection, Pose Estimation, and Sub-category Recognition Roozbeh Mottaghi, Yu Xiang, and Silvio Savarese

Summary

• Our goal is to detect objects in images.

• In addition to the object's bounding box, we estimate its viewpoint, which is critical for applications such as autonomous driving.

• We also recognize the sub-category of the object, which provides finer detail about each detection.

[Figure labels: Car, Truck, Aeroplane, Fighter]

Hierarchical Model

[Figure: Top Layer, Middle Layer, Bottom Layer]

Dataset

• Training and evaluating our method required a large-scale 3D dataset. Hence, we annotated around 30,000 images with 3D information [1].

[Figure: viewpoint annotation by azimuth, elevation, and distance]
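To make the three viewpoint parameters concrete, the sketch below converts an (azimuth, elevation, distance) triple into a camera position in an object-centered frame. The axis convention (z up, azimuth measured in the x-y plane from the x axis) and the function name `camera_position` are assumptions chosen for illustration; the dataset [1] may use a different convention.

```python
import math


def camera_position(azimuth_deg, elevation_deg, distance):
    """Return (x, y, z) of a camera that looks at the object origin.

    Assumed convention (hypothetical, not taken from the dataset):
    z points up, azimuth rotates about z starting at the x axis,
    and elevation is measured upward from the x-y plane.
    """
    a = math.radians(azimuth_deg)
    e = math.radians(elevation_deg)
    x = distance * math.cos(e) * math.cos(a)
    y = distance * math.cos(e) * math.sin(a)
    z = distance * math.sin(e)
    return (x, y, z)
```

Under this convention, the camera always lies at the given distance from the object center, and increasing the elevation moves it toward the z axis.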

References

[1] Y. Xiang, R. Mottaghi, and S. Savarese. Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild. In WACV, 2014.

[2] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. In PAMI, 2010.

Approach

• We formulate the problem as a structured learning problem.

The output variables are the azimuth a, elevation e, distance d, mid-layer category, fine-layer category, and object type o. The model is defined by a feature vector and a learned weight.
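A generic structured prediction setup can be sketched as follows: each candidate output is scored by the dot product of a learned weight with a joint feature vector, and the highest-scoring output is returned. This is a minimal illustration of that general idea, not the model from the poster. The block-style feature map, the toy label space of (azimuth bin, sub-category) pairs, and the names `joint_feature`, `score`, and `predict` are all assumptions.

```python
from itertools import product


def joint_feature(x, y, label_space):
    """Hypothetical joint feature map: copy x into the block indexed by y."""
    idx = label_space.index(y)
    phi = [0.0] * (len(label_space) * len(x))
    phi[idx * len(x):(idx + 1) * len(x)] = x
    return phi


def score(w, x, y, label_space):
    """Linear score: dot product of the weight and the joint feature vector."""
    return sum(wi * fi for wi, fi in zip(w, joint_feature(x, y, label_space)))


def predict(w, x, label_space):
    """Return the output in label_space with the highest score."""
    return max(label_space, key=lambda y: score(w, x, y, label_space))


# Toy label space (assumed): two azimuth bins crossed with two sub-categories.
label_space = list(product([0, 1], ["race", "SUV"]))
```

In practice the output space would also include elevation, distance, and the remaining category layers, and the weight would be learned from annotated data rather than set by hand.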

• The hierarchy allows us to break the problem down into easier sub-problems. In addition, modeling these tasks jointly within a hierarchy enables us to better recover from the mistakes that a single layer makes.

[Figure: coarse-to-fine hierarchy. car at the top; race and SUV below it; race 1 and race 2 at the finest level]
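The benefit of joint modeling over a hierarchy can be illustrated with a small sketch that scores complete root-to-leaf paths instead of committing to the best node at each layer. The tree below mirrors the labels in the hierarchy figure, with race 1 and race 2 assumed to be children of race; the per-node scores and the function names `all_paths` and `best_path` are hypothetical.

```python
def all_paths(root, hierarchy):
    """Enumerate every root-to-leaf path in the hierarchy."""
    children = hierarchy.get(root, [])
    if not children:
        return [[root]]
    return [[root] + p for c in children for p in all_paths(c, hierarchy)]


def best_path(root, node_score, hierarchy):
    """Pick the path whose summed node scores are highest (joint decision)."""
    return max(all_paths(root, hierarchy),
               key=lambda p: sum(node_score[n] for n in p))


# Assumed tree structure based on the figure labels.
hierarchy = {"car": ["race", "SUV"], "race": ["race 1", "race 2"]}

# Made-up scores: the middle layer slightly prefers SUV,
# but the finest layer strongly supports race 1.
node_score = {"car": 0.0, "race": 0.4, "SUV": 0.5,
              "race 1": 0.9, "race 2": 0.1}
```

With these made-up scores, a greedy choice at the middle layer would select SUV, whereas `best_path("car", node_score, hierarchy)` selects the path through race to race 1, because strong evidence at the finest layer outweighs the weaker middle-layer preference.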

Results

• We evaluate our method on the PASCAL3D+ dataset, which is a challenging benchmark.

• The evaluation covers all three tasks: detection, pose estimation, and sub-category recognition.

Method              Detection (AP)  Viewpoint  Sub-category  Viewpoint + Sub-category  Viewpoint + Sub-category + Sub-sub-category
DPM [2]             32.0            33.3       16.5          6.5                       0.8
Hierarchical Model  41.1            45.9       26.1          11.9                      1.9
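For a quick reading of the reported numbers, the short sketch below computes the absolute gain of the hierarchical model over DPM [2] in each column. The values are copied from the results above; the variable and function names are only illustrative.

```python
COLUMNS = [
    "Detection (AP)",
    "Viewpoint",
    "Sub-category",
    "Viewpoint + Sub-category",
    "Viewpoint + Sub-category + Sub-sub-category",
]
DPM = [32.0, 33.3, 16.5, 6.5, 0.8]
HIERARCHICAL = [41.1, 45.9, 26.1, 11.9, 1.9]


def absolute_gains(baseline, model):
    """Per-column difference (model minus baseline), rounded to one decimal."""
    return [round(m - b, 1) for b, m in zip(baseline, model)]


for name, gain in zip(COLUMNS, absolute_gains(DPM, HIERARCHICAL)):
    print(f"{name}: +{gain}")
```

The hierarchical model improves on DPM in every column, with the largest absolute gain (12.6) in the Viewpoint column.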

We use histogram of oriented gradients (HOG) and convolutional neural network (CNN) features.

Failure Cases
