Robert Bosch Centre for Data Science and Artificial Intelligence … Khapra.pdf · 2020. 9. 2. · 57 IOU = = 0.54 IOU = = 0.79 IOU = = 0.98 Image Source: Nitesh Methani, Pritha Ganguly,

Robert Bosch Centre for Data Science and Artificial Intelligence
Department of Computer Science and Engineering

Indian Institute of Technology Madras

Object Detection Over Scientific Plots

Nitesh Methani (Research Scholar, IIT Madras)
Pritha Ganguly (Research Scholar, IIT Madras)
Dr. Mitesh Khapra (Assistant Professor, IIT Madras)
Dr. Pratyush Kumar (Assistant Professor, IIT Madras)

Introduction

Image Source: Google Images

Problem Statement

Fast and accurate detection of objects in scientific plots

Image Source: Nitesh Methani, Pritha Ganguly, Mitesh Khapra, Pratyush Kumar, PlotQA: Reasoning over Scientific Plots, WACV 2020.

Years  Bulgaria  Cuba
2002   677       593
2003   641       538
2004   604       485
2005   575       440

Introduction


Q: What is the difference between the number of neonatal deaths in Bulgaria and Cuba in the year 2004?
A: 119

Q: What is the average number of neonatal deaths in Cuba across years?
A: 514

Q: In which year is the number of neonatal deaths in Bulgaria maximum?
A: 2002
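The three answers can be reproduced directly from the data table once it has been read off the plot. The sketch below is only an illustration of that arithmetic; the dictionary layout and variable names are assumptions, not part of the PlotQA pipeline.

```python
# Data table read off the example plot (values as shown on the slide).
neonatal_deaths = {
    2002: {"Bulgaria": 677, "Cuba": 593},
    2003: {"Bulgaria": 641, "Cuba": 538},
    2004: {"Bulgaria": 604, "Cuba": 485},
    2005: {"Bulgaria": 575, "Cuba": 440},
}

# Q1: difference between Bulgaria and Cuba in the year 2004.
difference_2004 = neonatal_deaths[2004]["Bulgaria"] - neonatal_deaths[2004]["Cuba"]

# Q2: average number for Cuba across years.
cuba_values = [row["Cuba"] for row in neonatal_deaths.values()]
average_cuba = sum(cuba_values) / len(cuba_values)

# Q3: year in which the number for Bulgaria is maximum.
max_year_bulgaria = max(neonatal_deaths, key=lambda year: neonatal_deaths[year]["Bulgaria"])

print(difference_2004, average_cuba, max_year_bulgaria)  # 119 514.0 2002
```

Every answer depends on exact tick and label values, which is why tight bounding boxes matter for this task.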

Q: In which year is the number of neonatal deaths in Cuba maximum?
A: 2002

Are existing object detection models good enough?

6

Natural Images v/s Scientific Plots

Image Source: M. Everingham, L. V. Gool, C. K. I. Williams, J. M. Winn, A. Zisserman, The Pascal VOC Challenge. Int. J. Comput. Vis., 2010
Image Source: Nitesh Methani, Pritha Ganguly, Mitesh Khapra, Pratyush Kumar, PlotQA: Reasoning over Scientific Plots, WACV 2020.

Natural images: visual elements; small to large boxes; no structural relationship between objects (❌); a 0.5 IOU is sufficient (✔).
Scientific plots: visual elements + textual elements; x-small to x-large boxes; structural relationship between objects (✔); a 0.5 or 0.75 IOU is not sufficient (❌), a 0.90 IOU is needed (✔).

Image Source: Google Images

Key Insight: OD over scientific plots has additional challenges as compared to OD over natural images

Goal 1
Investigate whether existing object detection methods are adequate for detecting text and visual elements in scientific plots, which are arguably different from the objects found in natural images.

Two-stage

One-stage

Summary of Two Stage Detectors

[Figure: R-CNN pipeline. Input Image → Warped Image Regions → CNN Feature Extractor → Feature Volume (W x H x D) → Flattened Vector → Classification Output and Regression Output (w, h → w*, h*).]

¹ R. B. Girshick, J. Donahue, T. Darrell, J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. CVPR 2014


[Figure: R-CNN applied to several warped image regions of the input image, each passed through a CNN feature extractor to produce a feature volume (W x H x D), a flattened vector, and the network outputs.]

Summary of SOTA Models: Fast R-CNN²

[Figure: Fast R-CNN pipeline. Input Image → CNN Feature Extractor → Feature Volume → ROI Pool → Fully Connected Layers → Network outputs (x*, y*, w*, h*).]

² Ross B. Girshick, Fast R-CNN. ICCV 2015

Fast R-CNN: ROI Pool

Coordinate on input image: 800 x 800 and 665 x 665.
Coordinate on feature map (CNN: VGG16): 800/32 = 25 and 665/32 = 20.78, quantized to 20.
Coordinate on ROI feature (7 x 7): 20/7 = 2.86, quantized to 2.

Image Source: https://ardianumam.wordpress.com/2017/12/16/understanding-how-mask-rcnn-works-for-semactic-segmentation/
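The two quantization steps of ROI Pool can be traced with the numbers on this slide (665 pixels, a divisor of 32, a 7 x 7 output). The sketch below is only that arithmetic, not an ROI Pool implementation; the function names are hypothetical, and the second function is included for contrast to show the same sizes when no rounding is applied.

```python
import math


def roi_pool_sizes(roi_px, stride, bins):
    """Sizes after the two quantization (floor) steps of ROI Pool."""
    on_feature_map = math.floor(roi_px / stride)  # first quantization
    per_bin = math.floor(on_feature_map / bins)   # second quantization
    return on_feature_map, per_bin


def unquantized_sizes(roi_px, stride, bins):
    """Same sizes with no rounding at either step."""
    on_feature_map = roi_px / stride
    per_bin = on_feature_map / bins
    return on_feature_map, per_bin


pooled = roi_pool_sizes(665, 32, 7)
exact = unquantized_sizes(665, 32, 7)

print(pooled)              # (20, 2)
print(round(exact[0], 2))  # 20.78
```

The fraction dropped at each floor step is location information lost by ROI Pool, which matters most for very small or very thin boxes.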

Summary of SOTA Models: Faster R-CNN³

[Figure: Faster R-CNN pipeline. Input Image → CNN Feature Extractor → Feature Volume → Region Proposal Network → ROI Pool features → Fully Connected Layers → Network outputs (x*, y*, w*, h*).]

³ S. Ren, K. He, R. B. Girshick, J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS 2015

Summary of SOTA Models: Mask R-CNN⁴

[Figure: Mask R-CNN pipeline. Input Image → CNN Feature Extractor → Feature Volume → ROI Align → Fully Connected + Mask Branch → Network outputs (x*, y*, w*, h*).]

⁴ Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross B. Girshick, Mask R-CNN, ICCV 2017


Mask R-CNN: ROI Align

Coordinate on input image: 800 x 800 and 665 x 665.
Coordinate on feature map (CNN: VGG16): 800/32 = 25 and 665/32 = 20.78, kept as 20.78 (no quantization).
Coordinate on ROI feature (7 x 7): bins of 2.86 (the figure labels 20/7 = 2.86), kept as 2.86 (no quantization).

Image Source: https://ardianumam.wordpress.com/2017/12/16/understanding-how-mask-rcnn-works-for-semactic-segmentation/

ROI Pool vs ROI Align

[Figure: comparison of ROI Pool and ROI Align.]

Model    Key Ideas
FRCNN    ROI Pool
FrRCNN
MRCNN    ROI Align

Summary of One Stage Detectors

[Figure: input image and the output vector for each grid cell.]

Output vector for each grid cell: [c, x, y, w, h, P(person), P(horse), P(cycle), ⋯], of size 1 x (1 + 4 + K)

¹ J. Redmon, S. K. Divvala, R. B. Girshick, A. Farhadi, You Only Look Once: Unified, Real-Time Object Detection. CVPR 2016
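The 1 x (1 + 4 + K) layout can be unpacked as shown below. This is a minimal sketch assuming the ordering on the slide (confidence c, then x, y, w, h, then K class probabilities); the function name and the example numbers are hypothetical.

```python
def decode_cell(vector, class_names):
    """Split one grid-cell vector [c, x, y, w, h, P(class_1), ..., P(class_K)]."""
    k = len(class_names)
    if len(vector) != 1 + 4 + k:
        raise ValueError("expected a vector of length 1 + 4 + K")
    confidence = vector[0]       # c: objectness confidence
    box = tuple(vector[1:5])     # x, y, w, h
    class_probs = vector[5:]     # K class probabilities
    best = max(range(k), key=lambda i: class_probs[i])
    return confidence, box, class_names[best], class_probs[best]


classes = ["person", "horse", "cycle"]           # K = 3
cell = [0.9, 0.5, 0.5, 0.2, 0.4, 0.1, 0.7, 0.2]  # made-up values
decoded = decode_cell(cell, classes)
print(decoded)  # (0.9, (0.5, 0.5, 0.2, 0.4), 'horse', 0.7)
```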


Summary of SOTA Models: YOLO

[Figure: Input Image → DarkNet Architecture (Feature Extractor) → Output volume of size 7 x 7 x (1 + 4 + K).]

Summary of SOTA Models: YOLO-v3

[Figure: Input Image → DarkNet Architecture → Output volume of size W x H x [B x (1 + 4 + K)], with 'B' bounding box priors.]

Summary of SOTA Models: SSD²

[Figure: SSD. Input Image → VGG Layers → Extra Layers → Output (c x y w h), with 1x1 conv connections.]

Layer                 Feature map size
Input Image           300x300x3
conv4 last (VGG16)    38x38x512
conv6 (FC6)           19x19x1024
conv7 (FC7)           19x19x1024
conv8                 10x10x512
conv9                 5x5x256
conv10                3x3x256
conv11                1x1x256

² W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Y. Fu, A. C. Berg, SSD: Single Shot MultiBox Detector. ECCV 2016

Summary of SOTA Models: RetinaNet³

[Figure: Input Image → Feature Pyramid Network → Output volumes of size W ✕ H ✕ 4A and W ✕ H ✕ kA, with 'A' anchor boxes.]

³ T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, Focal loss for dense object detection, ICCV 2017

RetinaNet: Feature Pyramid Network⁴

[Figure: a standard CNN. Image → Conv1 → Conv2 → Conv3 → Conv4 → Conv5 → FC → Softmax. The last layer of each convolutional stage gives C1, C2, C3, C4, C5.]

Bottom-Up Pathway: C1 → C2 → C3 → C4 → C5
Top-Down Pathway: M5 is obtained from C5 with a 1x1 convolution; each of M4, M3 and M2 is the 2x upsampled map of the level above, added (+) to the 1x1-convolved map of the same level.
Pyramid Feature Maps: P5, P4, P3 and P2 are obtained by applying a 3x3 convolution to M5, M4, M3 and M2.

⁴ T. Y. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, S. J. Belongie, Feature Pyramid Networks for Object Detection. CVPR 2017
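The top-down pathway (1x1 lateral connection, 2x upsampling, addition, then a 3x3 convolution) can be sketched on tiny single-channel maps. This is an illustration under simplifying assumptions, not the FPN implementation: the maps are plain nested lists, the 1x1 convolution is reduced to a scalar weight, and a fixed 3x3 mean filter stands in for the learned 3x3 convolution. All function names are hypothetical.

```python
def conv1x1(fmap, weight):
    """Single-channel stand-in for a 1x1 convolution: scale every cell."""
    return [[weight * v for v in row] for row in fmap]


def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a, b):
    """Element-wise addition of two maps of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def conv3x3_mean(fmap):
    """Fixed 3x3 mean filter with zero padding (stand-in for a learned 3x3 conv)."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += fmap[ii][jj]
            out[i][j] = total / 9.0
    return out


def feature_pyramid(c_maps, lateral_weights):
    """c_maps = [C2, C3, C4, C5], finest first. Returns [P2, P3, P4, P5]."""
    m = conv1x1(c_maps[-1], lateral_weights[-1])  # M5 from C5
    merged = [m]
    for c, w in zip(reversed(c_maps[:-1]), reversed(lateral_weights[:-1])):
        m = add(upsample2x(m), conv1x1(c, w))     # M4, M3, M2
        merged.append(m)
    merged.reverse()                              # [M2, M3, M4, M5]
    return [conv3x3_mean(level) for level in merged]


def ones(n):
    return [[1.0] * n for _ in range(n)]


pyramid = feature_pyramid([ones(8), ones(4), ones(2), ones(1)], [1.0, 1.0, 1.0, 1.0])
print([len(p) for p in pyramid])  # [8, 4, 2, 1]
```

Each pyramid level keeps the resolution of its bottom-up map while also receiving information from every coarser level.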

Summary of One Stage Detectors

Model: YOLO, SSD, RetinaNet
Key Ideas: Grid based proposal, Featurised Pyramid


Recap of Goal 1
Investigate whether existing object detection methods are adequate for detecting text and visual elements in scientific plots, which are arguably different from the objects found in natural images.

Two-stage

One-stage

Dataset: PlotQA

● Contains over 220,000 scientific plots across three categories:
    ○ Bar plots (both horizontal and vertical)
    ○ Line plots
    ○ Dot-line plots

Image Source: Nitesh Methani, Pritha Ganguly, Mitesh Khapra, Pratyush Kumar, PlotQA: Reasoning over Scientific Plots, WACV 2020.

Evaluation Metric

IOU = (area of intersection) / (area of union) of the predicted box and the ground-truth box.

[Figure: example box pairs with IOU = 0.54 and IOU = 0.52.]
[Figure: example predictions with IOU = 0.54, IOU = 0.79 and IOU = 0.98.]

Mark a prediction as correct if it has at least a 50% IOU with the ground-truth box.

Image Source: Nitesh Methani, Pritha Ganguly, Mitesh Khapra, Pratyush Kumar, PlotQA: Reasoning over Scientific Plots, WACV 2020.
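A minimal sketch of the IOU check for axis-aligned boxes follows. The (x1, y1, x2, y2) corner format, the function names and the example boxes are assumptions for illustration; they are not taken from the PlotQA evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def is_correct(predicted, ground_truth, threshold=0.5):
    """A prediction counts as correct if its IOU reaches the threshold."""
    return iou(predicted, ground_truth) >= threshold


ground_truth = (0, 0, 10, 10)
predicted = (0, 0, 10, 5)  # covers the top half of the ground-truth box

print(iou(predicted, ground_truth))              # 0.5
print(is_correct(predicted, ground_truth, 0.5))  # True
print(is_correct(predicted, ground_truth, 0.9))  # False
```

The same prediction passes at a threshold of 0.5 and fails at 0.9, which is the gap the following evaluation slides examine.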

Evaluation of SOTA Models

Models     mAP@0.5  mAP@0.75  mAP@0.9
SSD        82.33    39.78     1.53
YOLO-v3    96.27    73.31     7.43
RetinaNet  90.13    81.13     30.56
FRCNN      72.83    63.68     21.45
FrRCNN     88.49    50.51     4.08
MRCNN      93.72    82.45     35.70

Table: Comparison of existing object detection models on the PlotQA dataset with mAP scores (in %) at IOUs of 0.5, 0.75, and 0.9.

[Figure: example detections at IOU = 0.59, IOU = 0.75 and IOU = 0.96.]


Per-class columns report AP (in %) at an IOU of 0.9; the last three columns report mAP at IOUs of 0.9, 0.75, and 0.5.

Model            bar    dotline  legend label  legend preview  plot title  x-axis labels  x-axis ticks  y-axis labels  y-axis ticks  mAP@0.9  mAP@0.75  mAP@0.5
SSD              1.39   0.60     2.18          0.39            0.04        3.39           0.44          5.14           0.20          1.53     39.78     82.33
YOLO-v3          15.51  8.72     7.15          11.70           0.02        4.39           8.08          9.59           1.70          7.43     73.31     96.27
RetinaNet        16.51  18.50    77.26         29.74           16.58       67.62          28.40         3.14           17.31         30.56    81.13     90.13
FRCNN            53.38  1.68     12.59         14.06           0.03        42.13          25.49         11.68          31.98         21.45    63.68     72.83
FrRCNN           6.92   1.68     1.39          1.45            0.00        4.35           6.10          3.57           5.18          4.08     50.51     88.49
MRCNN            47.54  5.36     50.83         32.43           0.33        40.20          33.72         80.53          30.31         35.70    82.45     93.72

Table: Comparison of existing object detection models on the PlotQA dataset with mAP scores (in %) at IOUs of 0.5, 0.75, and 0.9.
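The mAP column is the mean of the per-class AP values. As a quick check, the SSD row of the table reproduces its reported mAP at an IOU of 0.9 (up to rounding of the per-class entries); the variable names below are illustrative.

```python
# Per-class AP (in %) for SSD at an IOU of 0.9, copied from the table.
ssd_ap = {
    "bar": 1.39,
    "dotline": 0.60,
    "legend label": 2.18,
    "legend preview": 0.39,
    "plot title": 0.04,
    "x-axis labels": 3.39,
    "x-axis ticks": 0.44,
    "y-axis labels": 5.14,
    "y-axis ticks": 0.20,
}

# mAP is the unweighted mean over the nine classes.
mean_ap = sum(ssd_ap.values()) / len(ssd_ap)
print(round(mean_ap, 2))  # 1.53
```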


Qualitative Analysis: SSD

Figure: An example plot from PlotQA dataset.

69

Figure: Detected bounding boxes on an example plot from PlotQA dataset.

Qualitative Analysis: YOLO-v3


70


Qualitative Analysis: RetinaNet


71


Qualitative Analysis: FRCNN


72


Qualitative Analysis: FrRCNN


73


Qualitative Analysis: MRCNN


74


Qualitative Analysis: Summary

Models compared: SSD, YOLO-v3, RetinaNet, FRCNN, FrRCNN, MRCNN

Key Observations:

❌ Longer textual objects
❌ Very short objects
❌ Higher IOU settings
✔ FPN helps
✔ ROIAlign helps

Goal 2
Design a deep learning based object detection network that accurately and efficiently detects all the textual and visual objects present in a scientific plot.

Accurate & Efficient

Image Source: Google Images

A Hybrid Model

Faster R-CNN backbone + Feature Pyramid Network (FPN) + ROIAlign (RA)

A Hybrid Model: Results

Per-class columns report AP (in %) at an IOU of 0.9; the last three columns report mAP at IOUs of 0.9, 0.75, and 0.5.

Model                          bar    dotline  legend label  legend preview  plot title  x-axis labels  x-axis ticks  y-axis labels  y-axis ticks  mAP@0.9  mAP@0.75  mAP@0.5
FRCNN(RA)                      83.25  16.32    62.31         59.10           0.18        65.77          78.72         42.94          78.87         54.16    68.92     72.46
FRCNN(FPN+RA)                  87.59  31.62    79.05         66.39           0.22        69.78          88.29         46.63          84.60         61.57    69.82     72.18
FrRCNN(RA)                     63.89  14.79    70.95         60.61           0.18        83.89          60.76         93.47          50.87         55.49    89.14     96.80
Hybrid Model (FrRCNN+FPN+RA)   85.54  27.86    93.68         96.30           0.22        99.09          96.04         99.46          96.80         77.22    94.58     97.76

Table 2: Comparison of modified models on the PlotQA dataset with mAP scores (in %) at IOUs of 0.9, 0.75, and 0.5.




A Hybrid Model: Qualitative Analysis

FRCNN(RA) FRCNN(FPN+RA)

FrRCNN(RA) Hybrid Model ( FrRCNN+FPN+RA)

Figure: Detected bounding boxes on an example plot from PlotQA dataset for different hybrid models corresponding to Table 2 at an IOU threshold of 0.9.

A Hybrid Model: Summary

Figure: mAP (in %) v/s Inference Time per image (in ms) for different object detection models on PlotQA at an IOU setting of 0.9. (x, y) represents the tuple (mAP, time). Annotated regions: Existing Models, Hybrid Model, Preferred region.

Can we do better (faster and more efficient)?

Proposed Model: PlotNet

[Figure: PlotNet architecture. RGB Image → CV-based proposal → ROI Mask; RGB Image → Feature Extractor → Feature Map; ROI Align → ROI Aligned Features; AN-ROI Layer → ROI Volumes (channel sizes 320 and 256, with a 1x1 layer); FC → Final Vector (1024) → Class (CH), Regress (RH) and Linking (LH) Heads → Output Vectors.]

PlotNet: CV-based Region Proposal

1) Preprocessing: RGB Image → Grayscale Image
2) Laplacian Edge Detector: RGB Image → Laplacian Edges
3) Contour Detection: RGB Image → Contoured Image
4) Fit Bounding Boxes: RGB Image → Proposed ROIs
5) Mask formation: Proposed ROIs → 1-D ROI Mask

Steps 1 to 4 are CV techniques; step 5 converts their output into the ROI mask.
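The proposal stages named on these slides (grayscale conversion, Laplacian edges, contour detection, bounding-box fitting, mask formation) can be sketched in pure Python on a tiny image. This is an illustrative stand-in, not the PlotNet implementation: contour detection is approximated by 8-connected components of edge pixels, the mask is a single-channel binary map, and all function names and the threshold are assumptions.

```python
def to_gray(rgb):
    """Average the three channels of every pixel."""
    return [[(r + g + b) / 3.0 for (r, g, b) in row] for row in rgb]


def laplacian_edges(gray, threshold=1.0):
    """Mark pixels whose 4-neighbour Laplacian response exceeds the threshold."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                yy = min(max(y + dy, 0), h - 1)  # replicate the border
                xx = min(max(x + dx, 0), w - 1)
                total += gray[yy][xx]
            if abs(total - 4.0 * gray[y][x]) > threshold:
                edges[y][x] = 1
    return edges


def bounding_boxes(edges):
    """Fit one (x1, y1, x2, y2) box per 8-connected group of edge pixels."""
    h, w = len(edges), len(edges[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not edges[y][x] or seen[y][x]:
                continue
            stack, x1, y1, x2, y2 = [(y, x)], x, y, x, y
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                x1, y1 = min(x1, cx), min(y1, cy)
                x2, y2 = max(x2, cx), max(y2, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and edges[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            boxes.append((x1, y1, x2, y2))
    return boxes


def roi_mask(boxes, h, w):
    """Binary mask with 1 inside every proposed box (inclusive corners)."""
    mask = [[0] * w for _ in range(h)]
    for x1, y1, x2, y2 in boxes:
        for y in range(y1, y2 + 1):
            for x in range(x1, x2 + 1):
                mask[y][x] = 1
    return mask


# 8 x 8 white image with one black rectangle (rows 2-4, columns 2-5).
image = [[(0, 0, 0) if 2 <= y <= 4 and 2 <= x <= 5 else (255, 255, 255)
          for x in range(8)] for y in range(8)]

boxes = bounding_boxes(laplacian_edges(to_gray(image)))
mask = roi_mask(boxes, 8, 8)
print(boxes)  # [(1, 1, 6, 5)]
```

The proposed box is one pixel larger than the rectangle on every side because the Laplacian also responds on the background pixels that touch the object.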


PlotNet: Feature Extractor

[Figure: RGB Image → Feature Map, shown alongside the ROI Mask.]

PlotNet: ROI Align Layer

PlotNet: AN-ROI Layer

[Figure: ROI Mask, Feature Map and RGB Image → ROI Aligned Features → ROI Volumes.]
Volume size: 14 x 14 x 320
Volume size after the 1x1 layer: 14 x 14 x 256

PlotNet: Class, Regress, and Linking Heads

[Figure: for every ROI, 1x1 → FC → final vector of size 1024 → CH, RH and LH.]

PlotNet: Loss Function

Figure: Comparison of different loss functions at varying IOUs (x-axis: IOU, y-axis: Loss; the high IOU region is highlighted).

Key Insight

Existing losses give negligible values at high IOUs

Our Contribution

● Gives non-negligible values at high IOUs

● Mathematically, the loss is defined:

● 𝛄 determines the rate of the scaling factor

PlotNet: Different Configurations

[Table: eight PlotNet configurations, 1) to 8), that differ in the ROI Features, the AN-ROI Layer and the Regression Loss.]

WACV2021: Additional Experiments

bar    dotline  legend label  legend preview  plot title  x-axis labels  x-axis ticks  y-axis labels  y-axis ticks  mAP@0.9
91.02  31.69    97.08         81.57           99.36       96.06          85.33         82.00          90.95         83.89
92.16  61.18    98.38         93.46           99.44       97.21          94.21         95.45          94.42         91.77
91.79  41.86    93.74         94.64           98.29       83.11          85.69         89.32          49.36         80.87
91.83  45.78    91.48         94.15           98.95       74.24          87.19         89.34          50.11         80.34
91.88  61.44    96.44         95.58           99.27       97.19          90.64         97.55          87.66         90.88
91.71  49.30    95.99         93.55           98.42       95.03          89.77         94.08          86.06         88.21
91.35  52.22    96.31         93.45           96.82       96.18          89.63         95.46          94.07         89.50
91.15  55.03    97.89         92.99           99.46       96.33          91.30         90.40          95.48         90.00

Table 5: Comparison of variants of PlotNet on the PlotQA dataset with mAP scores (in %) at an IOU of 0.9.


bar    dotline  legend label  legend preview  plot title  x-axis labels  x-axis ticks  y-axis labels  y-axis ticks  mAP@0.9
92.78  68.26    97.75         95.90           99.04       93.64          92.97         96.24          93.12         92.19
92.74  59.12    94.87         95.58           92.26       94.46          94.12         95.36          76.85         88.39
92.80  70.11    98.47         96.33           99.52       97.31          94.29         97.66          94.48         93.44
92.84  69.19    98.49         95.22           99.42       96.88          94.65         96.51          96.87         93.34
92.96  68.25    98.34         95.96           98.96       97.10          95.08         96.84          97.16         93.41
92.91  58.52    98.40         95.95           95.14       97.16          94.59         97.55          95.63         91.76



Variant     bar    dotline  legend label  legend preview  plot title  x-axis labels  x-axis ticks  y-axis labels  y-axis ticks  mAP
no linking  85.30  52.85    29.64         94.30           0.00        10.36          80.77         1.47           81.59         48.48
no linking  76.14  61.49    40.19         96.51           0.00        16.49          79.05         1.76           77.68         49.92

PlotNet: Results

Model  bar    dotline  legend label  legend preview  plot title  x-axis labels  x-axis ticks  y-axis labels  y-axis ticks  mAP@0.9
v0     91.02  31.69    97.08         81.57           99.36       96.06          85.33         82.00          90.95         83.89
v1     92.16  61.18    98.38         93.46           99.44       97.21          94.21         95.45          94.42         91.77
v2     91.79  41.86    93.74         94.64           98.29       83.11          85.69         89.32          49.36         80.87
v3     91.83  45.78    91.48         94.15           98.95       74.24          87.19         89.34          50.11         80.34
v4     91.88  61.44    96.44         95.58           99.27       97.19          90.64         97.55          87.66         90.88
v5     92.78  68.26    97.75         95.90           99.04       93.64          92.97         96.24          93.12         92.19
v6     92.74  59.12    94.87         95.58           92.26       94.46          94.12         95.36          76.85         88.39
v7     92.80  70.11    98.47         96.33           99.52       97.31          94.29         97.66          94.48         93.44



● PlotNet performs better than all existing methods at all IOUs.

● At 0.9 IOU threshold, PlotNet improves upon its closest competitor by 16.22 absolute points.

Figure: Detected bounding boxes by PlotNet-v7 on an example plot from PlotQA dataset at an IOU threshold of 0.9.

122

PlotNet: Qualitative Analysis

PlotNet: Comparison to other models

Figure: mAP (in %) v/s Inference Time per image (in ms) for different object detection models on PlotQA at an IOU setting of 0.9. (x, y) represents the tuple (mAP, time).

Figure: mAP v/s IOU threshold for different object detection models (annotation: 16.22 pts).


Use-Case: Plot to Table Converter

(a) Input Image
(b) Predicted bounding boxes
(c) Ground-truth Table
(d) Generated Table

Figure: Sample table generation using PlotNet's predictions

Conclusion

Evaluated existing methods and exemplified the challenges

Proposed PlotNet addressing all the challenges

Future Work

● High Recall Proposal Method
● End-to-End Training


Visible Outcome

Communicated:

Pritha Ganguly*, Nitesh Methani*, Mitesh M. Khapra and Pratyush Kumar, A Systematic Evaluation of Object Detection Networks for Scientific Plots. Under review at a Computer Vision Conference.

*The first two authors have contributed equally.

Team

Nitesh Methani (Research Scholar, IIT Madras)
Pritha Ganguly (Research Scholar, IIT Madras)
Dr. Mitesh Khapra (Assistant Professor, IIT Madras)
Dr. Pratyush Kumar (Assistant Professor, IIT Madras)

Nikita Suman

Siddhartha Preksha

Tarun Ananya

Darwin Aakriti Madhura

Janani Himanshu

Shashank Nikhilesh

Priyesh

Deepak Shweta

133

Thank You!

Image Source: Google Images

Questions or Suggestions?

135

Extra Slides

PlotNet: Finding the Best Feature Extractor


Per-class columns report AP (in %) at an IOU of 0.9; the last three columns report mAP at IOUs of 0.9, 0.75, and 0.5.

Model            bar    dotline  legend label  legend preview  plot title  x-axis labels  x-axis ticks  y-axis labels  y-axis ticks  mAP@0.9  mAP@0.75  mAP@0.5
R-10             89.46  37.63    93.64         78.12           95.22       94.06          88.10         66.95          83.86         80.78    96.74     97.57
R-22             91.37  24.08    97.03         81.01           98.79       90.47          81.99         51.79          47.92         73.83    97.01     98.08
R-50             87.64  15.72    74.57         41.87           98.92       81.60          54.21         43.35          35.67         59.28    93.91     97.67
R-22 FPN         91.02  31.69    97.08         81.57           99.36       96.06          85.33         82.00          90.95         83.89    97.21     98.11
R-50 FPN         90.77  5.12     95.58         80.72           99.16       94.79          76.83         65.56          58.17         74.08    94.09     97.67

Table 3: Comparison of different variants of PlotNet on the PlotQA dataset by varying the number of layers in the ResNet(R)-50 architecture with mAP scores (in %) at IOUs of 0.9, 0.75, and 0.5.





Per-class columns report AP (in %) at an IOU of 0.9; the last three columns report mAP at IOUs of 0.9, 0.75, and 0.5.

Model            bar    dotline  legend label  legend preview  plot title  x-axis labels  x-axis ticks  y-axis labels  y-axis ticks  mAP@0.9  mAP@0.75  mAP@0.5
v0               91.02  31.69    97.08         81.57           99.36       96.06          85.33         82.00          90.95         83.89    97.21     98.11
v1               92.16  61.18    98.38         93.46           99.44       97.21          94.21         95.45          94.42         91.77    97.74     98.24
v2               91.79  41.86    93.74         94.64           98.29       83.11          85.69         89.32          49.36         80.87    96.38     98.20
v3               91.83  45.78    91.48         94.15           98.95       74.24          87.19         89.34          50.11         80.34    96.97     98.26
v4               91.88  61.44    96.44         95.58           99.27       97.19          90.64         97.55          87.66         90.88    97.30     98.31
v5               92.78  68.26    97.75         95.90           99.04       93.64          92.97         96.24          93.12         92.19    97.70     98.18
v6               92.74  59.12    94.87         95.58           92.26       94.46          94.12         95.36          76.85         88.39    96.78     97.82
v7               92.80  70.11    98.47         96.33           99.52       97.31          94.29         97.66          94.48         93.44    97.93     98.32

Table 5: Comparison of variants of PlotNet on the PlotQA dataset with mAP scores (in %) at IOUs of 0.9, 0.75, and 0.5.

