Robert Bosch Centre for Data Science and Artificial Intelligence
Department of Computer Science and Engineering
Indian Institute of Technology Madras
Object Detection Over Scientific Plots
Nitesh Methani (Research Scholar, IIT Madras)
Pritha Ganguly (Research Scholar, IIT Madras)
Dr. Mitesh Khapra (Assistant Professor, IIT Madras)
Dr. Pratyush Kumar (Assistant Professor, IIT Madras)
Introduction
Image Source: Google Images
Problem Statement
Fast and accurate detection of objects in scientific plots
Q: What is the difference between the number of neonatal deaths in Bulgaria and Cuba in the year 2004?
A: 119
Q: What is the average number of neonatal deaths in Cuba across years?
A: 514
Q: In which year is the number of neonatal deaths in Cuba maximum?
A: 2002
Are existing object detection models good enough?
Natural Images v/s Scientific Plots
Image Source: M. Everingham, L. V. Gool, C. K. I. Williams, J. M. Winn, A. Zisserman, The Pascal VOC Challenge, Int. J. Comput. Vis., 2010.
Image Source: Nitesh Methani, Pritha Ganguly, Mitesh Khapra, Pratyush Kumar, PlotQA: Reasoning over Scientific Plots, WACV 2020.
Natural images: visual elements
Scientific plots: visual elements + textual elements
Natural images: small to large boxes
Scientific plots: X-small to X-large boxes
Image Source: Google Images
[Figure: Structural Relationship in natural images v/s Structural Relationship in scientific plots]
Image Source: Google Images
[Figure: example detections on natural images and scientific plots at IOU thresholds of 0.5, 0.75, and 0.90]
0.5 IOU ❌  0.90 IOU ✔
Key Insight: Object detection (OD) over scientific plots poses additional challenges compared to OD over natural images.
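The IOU thresholds on these slides can be made concrete with a short sketch. The boxes below are hypothetical (not taken from PlotQA): a 40 x 10 pixel tick label and a prediction that covers only its first three quarters. The `(x1, y1, x2, y2)` box format is an assumption for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground-truth box of a tick label such as "2004".
gt = (100, 200, 140, 210)
# Hypothetical prediction that misses the last quarter of the label.
pred = (100, 200, 130, 210)

print(iou(gt, pred))  # 0.75
```

Under these assumptions the truncated box counts as a correct detection at IOU thresholds of 0.5 and 0.75, although reading the text inside it would drop the last character; only a threshold such as 0.9 rejects it.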
Goal 1: Investigate whether existing object detection methods are adequate for detecting text and visual elements in scientific plots, which are arguably different from the objects found in natural images.
Two-stage
One-stage
Summary of Two-Stage Detectors
[Figure: two-stage detection pipeline. Labels: Input Image; Warped Image regions (w, h, w*, h*); CNN Feature Extractor; Feature Volume (W x H x D); Flattened vector; Classification output; Regression output]
R. B. Girshick, J. Donahue, T. Darrell, J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014.
Recap of Goal 1: Investigate whether existing object detection methods are adequate for detecting text and visual elements in scientific plots, which are arguably different from the objects found in natural images.
[Figure: example boxes at IOU = 0.59, IOU = 0.75, and IOU = 0.96]
Table: Comparison of existing object detection models on the PlotQA dataset with mAP scores (in %) at IOUs of 0.5, 0.75, and 0.9.
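To illustrate how an mAP score depends on the IOU threshold, here is a minimal sketch of average precision (AP) for one class on one image. The greedy matching rule and the all-point interpolation are assumptions; the slides do not state the exact evaluation protocol. The boxes and scores are hypothetical, and mAP would be the mean of such AP values over classes.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, gt_boxes, iou_thr):
    """AP for one class. detections: list of (score, box); gt_boxes: list of boxes."""
    if not gt_boxes:
        return 0.0
    matched = set()
    hits = []  # 1 = true positive, 0 = false positive, in descending score order
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gt_boxes):
            if j in matched:
                continue  # each ground-truth box can be matched only once
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)
            hits.append(1)
        else:
            hits.append(0)
    # Precision and recall after each detection.
    tp, precisions, recalls = 0, [], []
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(gt_boxes))
    # Make precision non-increasing, then sum the area under the curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical data: two text boxes, one detected exactly, one truncated (IOU 0.75).
gts = [(0, 0, 40, 10), (0, 20, 40, 30)]
dets = [(0.9, (0, 0, 40, 10)), (0.8, (0, 20, 30, 30))]
for thr in (0.5, 0.75, 0.9):
    print(thr, average_precision(dets, gts, thr))
```

With this toy input the AP is 1.0 at thresholds 0.5 and 0.75 and falls to 0.5 at 0.9, because the truncated box stops counting as a match.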
Qualitative Analysis: SSD
Figure: An example plot from PlotQA dataset.
Figure: Detected bounding boxes on an example plot from PlotQA dataset.
Qualitative Analysis: YOLO-v3
Figure: An example plot from PlotQA dataset.
Figure: Detected bounding boxes on an example plot from PlotQA dataset.
Qualitative Analysis: RetinaNet
Figure: An example plot from PlotQA dataset.
Figure: Detected bounding boxes on an example plot from PlotQA dataset.
Qualitative Analysis: FRCNN (Fast R-CNN)
Figure: An example plot from PlotQA dataset.
Figure: Detected bounding boxes on an example plot from PlotQA dataset.
Qualitative Analysis: FrRCNN (Faster R-CNN)
Figure: An example plot from PlotQA dataset.
Figure: Detected bounding boxes on an example plot from PlotQA dataset.
Qualitative Analysis: MRCNN (Mask R-CNN)
Figure: An example plot from PlotQA dataset.
Figure: Detected bounding boxes on an example plot from PlotQA dataset.
Qualitative Analysis: Summary
[Figure: detected bounding boxes from SSD, YOLO-v3, RetinaNet, FRCNN, FrRCNN, and MRCNN]
Key Observations:
❌ Longer textual objects
❌ Very short objects
❌ Higher IOU settings
✔ FPN helps
✔ ROIAlign helps
Design a deep learning based object detection network that accurately and efficiently detects all the textual and visual objects present in a scientific plot.
Table 2: Comparison of modified models on the PlotQA dataset with mAP scores (in %) at IOUs of 0.9, 0.75 & 0.5.
A Hybrid Model: Qualitative Analysis
Figure: Detected bounding boxes on an example plot from PlotQA dataset for different hybrid models corresponding to Table 2 at an IOU threshold of 0.9. Panels: FRCNN(RA), FRCNN(FPN+RA), FrRCNN(RA), and Hybrid Model (FrRCNN+FPN+RA).
A Hybrid Model: Summary
Figure: mAP (in %) v/s Inference Time per image (in ms) for different object detection models on PlotQA at an IOU setting of 0.9. (x, y) represents the tuple (mAP, time). Annotations: Existing Models, Hybrid Model, Preferred region.
Can we do better (faster and more efficient)?
Proposed Model: PlotNet
[Figure: PlotNet architecture. Labels: RGB Image, ROI Mask, Feature Map, ROI Aligned Features, ROI Volumes (320, 256), 1x1, FC (1024), Final Vector, CH, RH, LH, Output Vectors]
Stages highlighted in turn: CV-based proposal; Feature Extractor; ROI Align; AN-ROI Layer; Class, Regress and Linking Heads
PlotNet: CV-based Region Proposal
Preprocessing: RGB Image → Grayscale Image
Laplacian Edge Detector: RGB Image → Laplacian Edges
Contour Detection: RGB Image → Contoured Image
Fit Bounding Boxes: RGB Image → Proposed ROIs
CV techniques: RGB Image → Proposed ROIs; Mask formation: Proposed ROIs → 1-D ROI Mask
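The region-proposal steps named on these slides (grayscale preprocessing, Laplacian edge detection, contour detection, bounding-box fitting) can be sketched in plain Python on a toy image. This is an illustration under assumptions, not the authors' implementation: connected-component grouping stands in for contour detection, the threshold `thr` and all function names are hypothetical, and the mask-formation step is not modelled.

```python
from collections import deque


def to_gray(rgb):
    """Luminance-style grayscale from rows of (r, g, b) tuples."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def laplacian_edges(gray, thr=1.0):
    """Mark interior pixels whose 4-neighbour Laplacian magnitude exceeds thr."""
    h, w = len(gray), len(gray[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1]
                   + gray[y][x + 1] - 4 * gray[y][x])
            edges[y][x] = abs(lap) > thr
    return edges


def propose_boxes(edges):
    """Group edge pixels into 8-connected components and fit one box to each.

    Stand-in for contour detection; boxes are inclusive (x1, y1, x2, y2).
    """
    h, w = len(edges), len(edges[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not edges[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            x1, y1, x2, y2 = x0, y0, x0, y0
            while queue:
                x, y = queue.popleft()
                x1, y1, x2, y2 = min(x1, x), min(y1, y), max(x2, x), max(y2, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and edges[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x1, y1, x2, y2))
    return boxes


# Toy 20 x 12 white image with a dark "bar" and a dark "label".
W, H = 20, 12
img = [[(255, 255, 255)] * W for _ in range(H)]
for y in range(4, 8):
    for x in range(3, 7):
        img[y][x] = (0, 0, 0)
for y in range(2, 4):
    for x in range(12, 16):
        img[y][x] = (0, 0, 0)

boxes = propose_boxes(laplacian_edges(to_gray(img)))
print(boxes)  # [(11, 1, 16, 4), (2, 3, 7, 8)]
```

Each proposed box extends one pixel beyond the shape it encloses because the Laplacian also responds on the background side of an edge.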
PlotNet: Feature Extractor
[Figure: RGB Image, ROI Mask, Feature Map]
PlotNet: ROI Align Layer
[Figure: RGB Image, ROI Mask, Feature Map, ROI Aligned Features]
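As a rough illustration of what an ROI Align layer computes, the sketch below samples a single-channel feature map at fractional positions with bilinear interpolation. It is a simplification under assumptions: one sample per output bin (at the bin centre) instead of several averaged samples, a toy 4 x 4 feature map, and a 2 x 2 output rather than the 14 x 14 volumes mentioned on the later slides. The function names are hypothetical.

```python
import math


def bilinear(fmap, x, y):
    """Bilinearly interpolate a 2-D feature map at a continuous point (x, y)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp the sample point to the valid coordinate range.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = fmap[y0][x0] * (1 - fx) + fmap[y0][x1] * fx
    bottom = fmap[y1][x0] * (1 - fx) + fmap[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def roi_align(fmap, roi, out_h, out_w):
    """Pool a (x1, y1, x2, y2) ROI to out_h x out_w without rounding coordinates."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / out_w
    bin_h = (y2 - y1) / out_h
    return [[bilinear(fmap, x1 + (j + 0.5) * bin_w, y1 + (i + 0.5) * bin_h)
             for j in range(out_w)]
            for i in range(out_h)]


# Toy feature map whose value at (x, y) is x + 10 * y.
fmap = [[x + 10 * y for x in range(4)] for y in range(4)]
pooled = roi_align(fmap, (0.25, 0.25, 2.25, 2.25), 2, 2)
print(pooled)  # [[8.25, 9.25], [18.25, 19.25]]
```

Because the ROI coordinates are never rounded to the feature-map grid, small and thin boxes keep their sub-pixel position, which matters at high IOU thresholds.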
PlotNet: AN-ROI Layer
[Figure: RGB Image, ROI Mask, Feature Map, ROI Aligned Features, ROI Volumes]
Volume size: 14 x 14 x 320
1x1 (320)
Volume size: 14 x 14 x 256
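The AN-ROI slides list volume sizes of 14 x 14 x 320 and 14 x 14 x 256 next to a "1x1" label. Assuming that label denotes a 1x1 convolution, the sketch below shows how such a layer changes only the channel depth of a volume while leaving its spatial size untouched. The toy sizes (2 x 2 x 3 to 2 x 2 x 2), the weights, and the absence of bias and non-linearity are all assumptions for illustration.

```python
def conv1x1(volume, weights):
    """Apply a 1x1 convolution.

    volume:  nested lists indexed as volume[row][col][c_in]
    weights: nested lists indexed as weights[c_out][c_in]
    Each spatial position is mapped independently by the same linear map.
    """
    return [[[sum(w * v for w, v in zip(w_row, pixel)) for w_row in weights]
             for pixel in row]
            for row in volume]


# Toy 2 x 2 volume with 3 channels per position.
volume = [[[1, 2, 3], [4, 5, 6]],
          [[7, 8, 9], [1, 0, 1]]]
# Hypothetical weights mapping 3 input channels to 2 output channels.
weights = [[1, 0, 0],   # output channel 0 copies input channel 0
           [0, 1, 1]]   # output channel 1 sums input channels 1 and 2

reduced = conv1x1(volume, weights)
print(reduced)  # [[[1, 5], [4, 11]], [[7, 17], [1, 1]]]
```

The same per-position mapping with a 256 x 320 weight matrix would turn a 14 x 14 x 320 volume into a 14 x 14 x 256 one.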
PlotNet: Class, Regress, and Linking Heads
[Figure: ROI Volumes (320, 256), 1x1, FC (1024), Final Vector; CH, RH, and LH produce the Output Vectors for every ROI]
PlotNet: Loss Function
Figure: Comparison of different loss functions at varying IOUs. (Axes: IOU v/s Loss; annotations: High IOU region, Non-negligible values.)
Key Insight
Existing losses give negligible values at high IOUs
Our Contribution
● Gives non-negligible values at high IOUs
● Mathematically, the loss is defined: [equation on slide]
● 𝛄 determines the rate of the scaling factor
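The key insight can be checked numerically with two commonly used IOU-based regression losses, 1 - IOU and -ln(IOU). These are assumed stand-ins for the "existing losses" in the figure; the slides do not list which losses were compared, and the proposed loss with its 𝛄 scaling factor is not reproduced here.

```python
import math


def iou_loss(iou):
    """Linear IOU loss: 1 - IOU."""
    return 1.0 - iou


def log_iou_loss(iou):
    """Logarithmic IOU loss: -ln(IOU), defined for IOU > 0."""
    return -math.log(iou)


for value in (0.5, 0.75, 0.9, 0.95):
    print(f"IOU={value:.2f}  1-IOU={iou_loss(value):.3f}  "
          f"-ln(IOU)={log_iou_loss(value):.3f}")
# IOU=0.50  1-IOU=0.500  -ln(IOU)=0.693
# IOU=0.75  1-IOU=0.250  -ln(IOU)=0.288
# IOU=0.90  1-IOU=0.100  -ln(IOU)=0.105
# IOU=0.95  1-IOU=0.050  -ln(IOU)=0.051
```

Both losses shrink toward zero as the IOU approaches 1, so a box at IOU 0.9 receives little training signal to improve further, which is the regime the slides call the high IOU region.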
Table 5: Comparison of variants of PlotNet on the PlotQA dataset with mAP scores (in %) at an IOU of 0.9.
● PlotNet performs better than all existing methods at all IOUs.
● At 0.9 IOU threshold, PlotNet improves upon its closest competitor by 16.22 absolute points.
PlotNet: Qualitative Analysis
Figure: Detected bounding boxes by PlotNet-v7 on an example plot from PlotQA dataset at an IOU threshold of 0.9.
PlotNet: Comparison to other models
Figure: mAP (in %) v/s Inference Time per image (in ms) for different object detection models on PlotQA at an IOU setting of 0.9. (x, y) represents the tuple (mAP, time).
16.22pts
Figure: mAP v/s IOU threshold for different object detection models.
Use-Case: Plot to Table Converter
Figure: Sample table generation using PlotNet's predictions. Panels: (a) Input Image, (b) Predicted bounding boxes, (c) Ground-truth Table, (d) Generated Table.
Conclusion
Evaluated existing methods and exemplified the challenges
Proposed PlotNet addressing all the challenges
Future Work
High Recall Proposal Method
End-to-End Training
Image Source: Google Images
Visible Outcome
Communicated:
Pritha Ganguly*, Nitesh Methani*, Mitesh M. Khapra and Pratyush Kumar, A Systematic Evaluation of Object Detection Networks for Scientific Plots, under review at a Computer Vision Conference.
*The first two authors have contributed equally.
Table 3: Comparison of different variants of PlotNet on the PlotQA dataset by varying the number of layers in the ResNet(R)-50 architecture with mAP scores (in %) at IOUs of 0.9, 0.75, and 0.5.