Supplementary Material: Monocular 3D Object Detection for Autonomous Driving

Xiaozhi Chen1, Kaustav Kundu2, Ziyu Zhang2, Huimin Ma1, Sanja Fidler2, Raquel Urtasun2

1 Department of Electronic Engineering, Tsinghua University
2 Department of Computer Science, University of Toronto

{chenxz12@mails., mhmpub@}tsinghua.edu.cn, {kkundu, zzhang, fidler, urtasun}@cs.toronto.edu

Abstract

In this supplementary material we include a larger set of additional experiments and visualizations. We start with comparisons to the state of the art on the KITTI test set, followed by an in-depth analysis of 2D and 3D bounding box recall as a function of the number of proposals as well as the distance to the obstacle. The latter is a very important metric in the context of autonomous driving. We then show an ablation study of the features employed, followed by some additional visualizations.

1. Object Detection and Orientation Estimation Performance

Fig. 1 and Fig. 2 show a comparison to all published monocular methods on the KITTI benchmark. Our approach achieves the highest AP and AOS scores across all categories and difficulty levels. Note that we also provide a video visualizing our results in 2D and 3D.
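For readers unfamiliar with AOS, the following is a minimal sketch of how an orientation similarity score and an interpolated average over recall levels can be computed. It assumes the cosine-based similarity of the KITTI benchmark and an 11-point sampling of recall; the function names and the toy inputs are hypothetical, and this is not the official evaluation code.

```python
import math


def orientation_similarity(angle_errors, is_true_positive):
    """Mean of (1 + cos(error)) / 2 over detections; false positives score 0."""
    if not angle_errors:
        return 0.0
    scores = [
        (1.0 + math.cos(err)) / 2.0 if tp else 0.0
        for err, tp in zip(angle_errors, is_true_positive)
    ]
    return sum(scores) / len(scores)


def eleven_point_average(recalls, values):
    """Average, over r in {0, 0.1, ..., 1}, of the best value at recall >= r."""
    total = 0.0
    for k in range(11):
        r = k / 10.0
        # Small tolerance so that recall levels equal to r are included.
        reachable = [v for rec, v in zip(recalls, values) if rec >= r - 1e-9]
        total += max(reachable) if reachable else 0.0
    return total / 11.0
```

Under these assumptions, the same averaging routine yields an AP-style score when given precision values and an AOS-style score when given orientation similarities, which is why AOS can never exceed AP for the same set of detections.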

2. Proposal Recall

In this section we report recall of our proposals in several regimes.

2D Bounding Box Recall: We show recall versus the number of proposals in Fig. 3, and recall versus IoU overlap threshold in Fig. 4, Fig. 5, and Fig. 6, with the number of proposals fixed to 500, 1000, and 2000, respectively. We also report average recall (AR) as a function of the number of proposals in Fig. 7. Our approach outperforms all monocular methods by a large margin, while being competitive with 3DOP [1]. Note that 3DOP uses stereo imagery, so the comparison is not a fair one for our monocular approach.
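The recall metrics used here can be illustrated with a minimal sketch. It assumes boxes given as (x1, y1, x2, y2) corner coordinates and takes AR to be recall averaged over IoU thresholds between 0.5 and 1, as is common in proposal evaluation; this text does not spell out the exact definition, and all function names are hypothetical.

```python
def iou_2d(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def best_overlaps(gt_boxes, proposals):
    """For each ground-truth box, the highest IoU achieved by any proposal."""
    return [max((iou_2d(g, p) for p in proposals), default=0.0) for g in gt_boxes]


def recall_at(overlaps, threshold):
    """Fraction of ground-truth boxes covered at the given IoU threshold."""
    if not overlaps:
        return 0.0
    return sum(o >= threshold for o in overlaps) / len(overlaps)


def average_recall(overlaps, lo=0.5, hi=1.0, steps=101):
    """Recall averaged over evenly spaced IoU thresholds in [lo, hi]."""
    thresholds = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return sum(recall_at(overlaps, t) for t in thresholds) / steps
```

Truncating the proposal list to the top-scoring N boxes before calling `best_overlaps` gives one point of a recall vs #candidates curve, while sweeping `threshold` for a fixed N gives a recall vs IoU curve.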

Recall vs Distance: We report recall as a function of the distance from the ego-car in Fig. 8. Our approach achieves very high recall even at large distances (beyond 40 m), which shows its advantage in the autonomous driving setting.
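As a rough illustration of this analysis, recall can be computed separately for ground-truth objects grouped by distance. The sketch below assumes that each ground-truth object comes with its distance in meters and its best IoU with any proposal; the bin edges and the function name are hypothetical choices, not those used for Fig. 8.

```python
def recall_by_distance(distances, overlaps, threshold, bin_edges):
    """Recall per distance bin [edge_i, edge_{i+1}); None for empty bins."""
    results = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [o for d, o in zip(distances, overlaps) if lo <= d < hi]
        if in_bin:
            results.append(sum(o >= threshold for o in in_bin) / len(in_bin))
        else:
            results.append(None)  # no ground-truth objects at this range
    return results
```

Returning `None` for empty bins keeps ranges without any ground-truth objects from being plotted as zero recall.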

3D Bounding Box Recall: We also compare the 3D bounding box recall of our monocular approach with 3DOP [1], which, however, exploits stereo imagery. Fig. 9 shows 3D box recall as a function of the number of proposals. We set the 3D IoU overlap threshold to 0.25 for all categories. Although we do not exploit any depth features, our approach achieves 3D recall similar to that of 3DOP on Car. For small objects, i.e., Pedestrian and Cyclist, our results are also promising.
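To give a feeling for the 0.25 threshold, the following minimal sketch computes 3D IoU for axis-aligned boxes. This is a simplification made for illustration: 3D boxes in KITTI also carry a yaw angle, which is ignored here, and the box format (x1, y1, z1, x2, y2, z2) and function name are hypothetical.

```python
def iou_3d_axis_aligned(a, b):
    """3D IoU of axis-aligned boxes given as (x1, y1, z1, x2, y2, z2)."""
    inter = 1.0
    for k in range(3):
        # Overlap of the two boxes along axis k.
        overlap = min(a[k + 3], b[k + 3]) - max(a[k], b[k])
        if overlap <= 0:
            return 0.0
        inter *= overlap

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    return inter / (volume(a) + volume(b) - inter)


# Two identical 2 x 2 x 4 boxes, the second shifted along the first axis.
reference = (0.0, 0.0, 0.0, 2.0, 2.0, 4.0)
shift_half = (1.0, 0.0, 0.0, 3.0, 2.0, 4.0)     # shifted by half its width
shift_more = (1.5, 0.0, 0.0, 3.5, 2.0, 4.0)     # shifted by three quarters
print(iou_3d_axis_aligned(reference, shift_half) >= 0.25)  # True (IoU = 1/3)
print(iou_3d_axis_aligned(reference, shift_more) >= 0.25)  # False (IoU = 1/7)
```

In this toy case a box displaced by half its width along one axis still passes the 0.25 threshold, while a displacement of three quarters does not, which suggests why a lower threshold is used in 3D than the 0.5 and 0.7 thresholds used in 2D.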

3. Ablation Study of Features

We conduct a detailed analysis of different types of features on Car proposals and show recall plots in Fig. 10, Fig. 11, and Fig. 12. Note that each feature helps improve performance. We also study their effect on car detection and orientation estimation. As shown in Table 1, each type of feature improves AP, and AOS improves in nearly all settings, similar to the behavior observed on proposal recall.

Table 1: Ablation study of features on Object Detection and Orientation Estimation: AP and AOS for Car on validation set of KITTI.

Approach    AP                          AOS
            Easy    Moderate  Hard      Easy    Moderate  Hard
Loc         83.59   77.50     69.41     81.56   75.09     66.39
+ClsSeg     87.96   86.76     78.16     86.40   84.66     75.72
+Context    93.17   88.23     79.34     91.59   86.15     76.94
+Shape      93.52   88.51     79.62     91.49   86.38     77.15
+InstSeg    93.89   88.67     79.68     91.90   86.28     77.09
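The contribution of each feature type can be read off Table 1 by differencing consecutive rows. The short sketch below does this arithmetic; the values are copied from Table 1, while the variable and function names are hypothetical.

```python
# Table 1 values: (features, AP [easy, moderate, hard], AOS [easy, moderate, hard])
TABLE_1 = [
    ("Loc",      [83.59, 77.50, 69.41], [81.56, 75.09, 66.39]),
    ("+ClsSeg",  [87.96, 86.76, 78.16], [86.40, 84.66, 75.72]),
    ("+Context", [93.17, 88.23, 79.34], [91.59, 86.15, 76.94]),
    ("+Shape",   [93.52, 88.51, 79.62], [91.49, 86.38, 77.15]),
    ("+InstSeg", [93.89, 88.67, 79.68], [91.90, 86.28, 77.09]),
]


def stepwise_gains(table, column):
    """Gain of each row over the previous one; column 1 = AP, column 2 = AOS."""
    gains = {}
    for prev, cur in zip(table[:-1], table[1:]):
        gains[cur[0]] = [round(c - p, 2) for p, c in zip(prev[column], cur[column])]
    return gains


ap_gains = stepwise_gains(TABLE_1, 1)
aos_gains = stepwise_gains(TABLE_1, 2)
for name in ap_gains:
    print(name, "AP", ap_gains[name], "AOS", aos_gains[name])
```

The differences show that class segmentation accounts for the largest single gain (for example, +9.26 AP on Moderate), that AP increases with every added feature, and that the few AOS decreases are at most 0.10 points.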

4. Visualization

We visualize some qualitative results in Fig. 13 and Fig. 14. We show the top 50 proposals, 2D detections, and 3D detections for each example. We also show some failure examples in Fig. 15. Most failures are propagated from errors in class semantic segmentation. Road segmentation particularly affects the results, as our approach infers the extent of the 3D space from the road region.

5. Video

We refer the reader to the attached video for more visualizations of results. Note that no temporal information is used to create the video: each result is obtained from a single monocular image.

References

[1] X. Chen, K. Kundu, Y. Zhu, A. Berneshawi, H. Ma, S. Fidler, and R. Urtasun. 3D object proposals for accurate object class detection. In NIPS, 2015.

[Figure 1 plots: 3 x 3 grid of Precision (y axis) vs Recall (x axis) curves. Rows: Car, Pedestrian, Cyclist; columns: Easy, Moderate, Hard. The AP values shown in the plot legends are listed below.]

Car
Method          Easy    Moderate  Hard
ACF             55.89   54.74     42.98
LSVM-MDPM-us    66.53   55.42     41.04
LSVM-MDPM-sv    68.02   56.48     44.18
ACF-SC          69.11   58.66     45.95
DPM-C8B1        74.33   60.99     47.16
MDPM-un-BB      71.19   62.16     48.43
DPM-VOC+VP      74.95   64.71     48.76
OC-DPM          74.94   65.95     53.86
MV-RGBD-RF      76.40   69.92     57.47
SubCat          84.14   75.46     59.71
3DVP            87.46   75.77     65.38
AOG             84.80   75.94     60.70
Regionlets      84.75   76.45     59.70
spLBP           87.19   77.40     60.60
Faster R-CNN    86.71   81.84     71.12
Ours            92.33   88.66     78.96

Pedestrian
Method          Easy    Moderate  Hard
ACF             44.49   39.81     37.21
SubCat          54.67   42.34     37.95
SquaresICF      57.33   44.42     40.08
ACF-SC          51.53   44.49     40.38
DPM-VOC+VP      59.48   44.86     40.37
HA-SSVM         56.36   45.51     41.08
ACF-MR          58.82   46.23     42.10
Fusion-DPM      59.51   46.67     42.05
R-CNN           61.61   50.13     44.79
pAUCEnsT        65.26   54.49     48.60
MV-RGBD-RF      73.30   56.59     49.63
FilteredICF     67.65   56.75     51.12
DeepParts       70.49   58.67     52.78
CompACT-Deep    70.69   58.74     52.71
Regionlets      73.14   61.15     55.21
Faster R-CNN    78.86   65.90     61.18
Ours            80.35   66.68     63.44

Cyclist
Method          Easy    Moderate  Hard
LSVM-MDPM-sv    35.04   27.50     26.21
DPM-C8B1        43.49   29.04     26.20
LSVM-MDPM-us    38.84   29.88     27.31
DPM-VOC+VP      42.43   31.08     28.23
Vote3D          41.43   31.24     28.60
pAUCEnsT        51.62   38.03     33.38
MV-RGBD-RF      52.97   42.61     37.42
Regionlets      70.41   58.72     51.83
Faster R-CNN    72.26   63.35     55.90
Ours            76.04   66.36     58.87

Figure 1: Precision vs Recall curves on KITTI test set. The number next to the label indicates the Average Precision (AP).

[Figure 2 plots: 3 x 3 grid of Orientation Similarity (y axis) vs Recall (x axis) curves. Rows: Car, Pedestrian, Cyclist; columns: Easy, Moderate, Hard. The AOS values shown in the plot legends are listed below.]

Car
Method          Easy    Moderate  Hard
DPM-C8B1        59.51   50.32     39.22
LSVM-MDPM-sv    67.27   55.77     43.59
DPM-VOC+VP      72.28   61.84     46.54
OC-DPM          73.50   64.42     52.40
SubCat          83.41   74.42     58.83
3DVP            86.92   74.59     64.11
Ours            91.44   86.10     76.52

Pedestrian
Method          Easy    Moderate  Hard
ACF-MR          29.33   23.18     21.00
DPM-C8B1        31.08   23.37     20.72
SubCat          44.32   34.18     30.76
LSVM-MDPM-sv    43.58   35.49     32.42
DPM-VOC+VP      53.55   39.83     35.73
Ours            72.94   59.80     57.03

Cyclist
Method          Easy    Moderate  Hard
DPM-C8B1        27.25   19.25     17.95
LSVM-MDPM-sv    27.54   22.07     21.45
DPM-VOC+VP      30.52   23.17     21.58
Ours            70.13   58.68     52.35

Figure 2: Orientation Similarity vs Recall curves on KITTI test set. The number next to the label indicates the Average Orientation Similarity (AOS).

[Figure 3 plots: 3 x 3 grid of curves. x axis: # candidates (10^1 to 10^4, log scale); y axis: recall at IoU threshold 0.7 (first row) or 0.5 (second and third rows). Legend: BING, SS, EB, MCG, MCG-D, 3DOP, Ours.]

Figure 3: 2D bounding box Recall vs #Candidates. We use an overlap threshold of 0.7 for Car, and 0.5 for Pedestrian and Cyclist. From left to right are for easy, moderate, and hard objects, respectively.

[Figure 4 plots: 3 x 3 grid of curves. x axis: IoU overlap threshold (0.5 to 1); y axis: recall. Rows: Car, Pedestrian, Cyclist; columns: Easy, Moderate, Hard. The AR values shown in the plot legends are listed below.]

          Car                     Pedestrian              Cyclist
Method    Easy   Mod.   Hard      Easy   Mod.   Hard      Easy   Mod.   Hard
BING      12.3   7.7    7         7.6    6.6    6.1       6.1    4.1    4.3
SS        26.7   18     16.5      5.4    5.1    5         7.4    6      6.1
EB        37.4   26.4   23        9.2    7.7    6.9       5      4.4    4.4
MCG       45.1   36.1   31.3      15     13.2   12.2      10.9   8      8.2
MCG-D     49.6   38.8   32.8      19.6   16.1   14        10.8   10.2   10.7
3DOP      65.8   57.5   57        49.3   43.6   38.6      52.7   37.6   37.7
Ours      66.5   59.9   57        40.6   36.8   34.6      36.2   32.5   32.1

Figure 4: 2D bounding box Recall vs IoU for 500 proposals. The number next to the label indicates the average recall (AR).

[Figure 5 plots: 3 x 3 grid of curves. x axis: IoU overlap threshold; y axis: recall. Rows: Car, Pedestrian, Cyclist; columns: Easy, Moderate, Hard. The AR values shown in the plot legends are listed below.]

          Car                     Pedestrian              Cyclist
Method    Easy   Mod.   Hard      Easy   Mod.   Hard      Easy   Mod.   Hard
BING      13.4   8.6    8.5       10.1   8.9    8.2       9.2    5.9    6.2
SS        37.7   27     24.7      10.5   9.6    9.1       13     10.1   10.2
EB        46.9   35.5   31.7      17.9   15.2   13.5      12.7   9.3    9.4
MCG       52.8   43.5   38.5      18.5   16.6   15.3      14.8   11     11.1
MCG-D     58.3   46.6   40.3      26.1   21.6   19        17.9   15.1   15.6
3DOP      66.6   61.2   61.2      57.5   52     46.9      57.9   44.3   44.2
Ours      67     62.6   61.4      44.8   42.1   40.3      37.6   35.9   36

Figure 5: 2D bounding box Recall vs IoU for 1000 proposals. The number next to the label indicates the average recall (AR).



[Figure 6 plots: 3 x 3 grid of curves. x axis: IoU overlap threshold; y axis: recall. Rows: Car, Pedestrian, Cyclist; columns: Easy, Moderate, Hard. The AR values shown in the plot legends are listed below.]

          Car                     Pedestrian              Cyclist
Method    Easy   Mod.   Hard      Easy   Mod.   Hard      Easy   Mod.   Hard
BING      14     9.1    9.3       10.6   9.3    8.6       9.4    6      6.4
SS        48.2   37.4   34.5      16.5   14.8   13.9      20.2   16     16
EB        54.4   44.2   40.5      28.4   24.3   21.9      23.2   16.7   16.5
MCG       59.6   51.7   46.8      23.2   21.1   19.3      18.8   14.8   15.2
MCG-D     61.8   50.9   45        32.5   26.9   23.9      24.1   19.9   20.3
3DOP      67.4   63.9   64.1      62.3   57.7   53.6      59.3   50.1   50.3
Ours      67.2   64.2   63.9      47.6   46.2   44.9      39     40     40.1

Figure 6: 2D bounding box Recall vs IoU for 2000 proposals. The number next to the label indicates the average recall (AR).



[Figure 7 plots: 3 x 3 grid of curves. x axis: # candidates (10^1 to 10^4, log scale); y axis: average recall. Legend: BING, SS, EB, MCG, MCG-D, 3DOP, Ours.]

Figure 7: Average Recall (AR) vs #Candidates for 2D bounding boxes. Note that the comparison to 3DOP and MCG-D is unfair as we use a monocular image while they exploit depth information.

[Figure 8 plots: 3 x 3 grid of curves. x axis: distance from ego-car (meters); y axis: recall at IoU threshold 0.7 (first row) or 0.5 (second and third rows). Legend: BING, SS, EB, MCG, MCG-D, 3DOP, Ours.]

Figure 8: 2D bounding box Recall vs Distance using 2000 proposals. We use an overlap threshold of 0.7 for Car, and 0.5 for Pedestrian and Cyclist.

[Figure 9 plots: 3 x 3 grid of curves. x axis: # candidates (10^1 to 10^4, log scale); y axis: recall at IoU threshold 0.25. Legend: 3DOP, Ours.]

Figure 9: 3D bounding box Recall vs #Candidates at IoU threshold of 0.25. Note that our monocular approach achieves similar 3D box recall on Car to 3DOP, which exploits stereo imagery.

[Figure 10 plots: three panels. x axis: # candidates (10^1 to 10^4, log scale); y axis: average recall. Legend: Loc, +ClsSeg, +Context, +Shape, +InstSeg.]

Figure 10: Ablation study of features on car proposals: Average Recall (AR) vs #Candidates for 2D bounding boxes. The basic model (Loc) only uses the location prior feature. We then gradually add other types of features: class segmentation, context, shape, and instance segmentation.

[Figure 11 plots: three panels. x axis: # candidates (10^1 to 10^4, log scale); y axis: recall at IoU threshold 0.7. Legend: Loc, +ClsSeg, +Context, +Shape, +InstSeg.]

Figure 11: Ablation study of features on car proposals: 2D bounding box Recall vs #Candidates at IoU threshold of 0.7. The basic model (Loc) only uses the location prior feature. We then gradually add other types of features: class segmentation, context, shape, and instance segmentation.


[Figure 12 plots: three panels (Easy, Moderate, Hard). x axis: IoU overlap threshold; y axis: recall. The AR values shown in the plot legends are listed below.]

Features    Easy   Moderate  Hard
Loc         54.5   49.8      50.3
+ClsSeg     61.7   58.5      58.9
+Context    64     60.8      60.1
+Shape      67.2   64.1      63.4
+InstSeg    67.2   64.2      63.9

Figure 12: Ablation study of features on car proposals: 2D bounding box Recall vs IoU for 2000 proposals. The basic model (Loc) only uses the location prior feature. We then gradually add other types of features: class segmentation, context, shape, and instance segmentation.

Figure 13: Qualitative examples of detection results for Cars: (left) top 50 scoring proposals (color from blue to red indicates increasing score), (middle) 2D detections, (right) 3D detections.

Figure 14: Qualitative examples of detection results for Pedestrians and Cyclists: (left) top 50 scoring proposals (color from blue to red indicates increasing score), (middle) 2D detections, (right) 3D detections.

Figure 15: Qualitative examples of failure cases: (left) input images, (middle) semantic segmentation for Car (red), Pedestrian (green), Cyclist (blue), and Road (yellow), and (right) best 3D proposals among 2K candidates. The correct detections are indicated in blue and the missed detections are in red. Most failure cases are due to class or road segmentation errors.
