Page 1
Damage Detection from Aerial Images
via Convolutional Neural Networks
Aito Fujita*, Riho Ito*, Ken Sakurada†, Tomoyuki Imaizumi*,
Shuhei Hikosaka*, Ryosuke Nakamura*
*National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
†Nagoya University, Aichi, Japan
5/8/2017 MVA2017@Nagoya University
Page 2
Aerial images for damage detection
In the event of catastrophic disasters, fast assessment of the extent of damage is crucial for recovery.
E.g. “Which buildings got washed away by tsunami? And how many?”
Satellite/aerial images as source for damage assessment:
◦ The wide coverage can facilitate the assessment.
◦ However, currently, the assessment is performed manually (by human eye).
Page 3
Aerial view of Tohoku quake
Pre-quake image / Post-quake image
2km
4km
Page 4
Damage assessment in practice
Pre-quake image / Post-quake image
Page 7
Pre-quake image / Post-quake image
Damage assessment in practice
This building is “washed-away”.
Page 9
Damage assessment in practice
Pre-quake image / Post-quake image
This building is “surviving”.
Page 10
Damage assessment in practice
Pre-quake image / Post-quake image
Repeated for all buildings.
Page 14
Damage assessment in practice
Pre-quake image / Post-quake image / Manual assessment
RED: washed-away bldg.
YELLOW: surviving bldg.
Page 15
Pre-quake image / Post-quake image
RED: washed-away bldg.
YELLOW: surviving bldg.
Damage assessment in practice
Manual assessment
Manual assessment takes too much time.
Can we automate such assessment process?
Page 16
Background and Contributions
Background of damage detection:
◦ Lack of labeled dataset
◦ No study on Deep Learning applied to satellite/aerial images
Related works in this context:
◦ Gueguen+(CVPR2015): hand-designed features (tree-of-shapes); satellite images
◦ Cooner+(Remote Sensing 2016): shallow neural networks; aerial images
◦ Nia+(CRV2017): Convolutional networks but ground-level images
Our contributions:
◦ A new labeled dataset (ABCD dataset)
◦ Comprehensive analyses of CNNs for washed-away building detection
Page 17
New dataset for washed-away building detection
(Map: image coverage)
AIST Building Change Detection (ABCD) dataset:
◦ Based on images taken before/after the Tohoku earthquake
◦ Over 10K pairs of pre/post-tsunami patches
◦ Collected from 66km2 of aerial images of the Tohoku coastal region
◦ A target building at the center of the patch
◦ A damage label assigned to each pair
Washed-away buildings / Surviving buildings
◦ The damage label data is based on the survey result of the Great East Japan earthquake,
which was conducted by the Japanese government (MLIT) after March 11, 2011.
◦ The dataset is being prepared for public release.
Page 18
Overview of washed-away detection framework
Page 23
Practical considerations
From a practical viewpoint, we investigated the following:
Input scenario (to address availability of pre-tsunami image)
Input scale (to address variability of building size)
Page 24
Practical considerations: Input scenario
From a practical viewpoint, we investigated the following:
Input scenario (to address availability of pre-tsunami image)
◦ Both pre- and post-tsunami images are available.
◦ Only post-tsunami images are available.
Page 26
Practical considerations: Input scenario
From a practical viewpoint, we investigated the following:
Input scenario (to address availability of pre-tsunami image)
◦ Both pre- and post-tsunami images are available.
◦ Only post-tsunami images are available.
Input to a CNN: a pair of pre/post patches
Two configurations inspired by Zagoruyko+(CVPR15), Lin+(CVPR15), Simo-Serra+(ICCV15)
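The two pair-input configurations can be illustrated with plain array operations (a minimal numpy sketch; the 160×160 patch size matches the fixed-scale setting, but the variable names are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical pre-/post-tsunami RGB patches, shape (H, W, C)
pre = np.zeros((160, 160, 3), dtype=np.float32)
post = np.ones((160, 160, 3), dtype=np.float32)

# 6-channel configuration: stack along the channel axis and feed
# the fused tensor to a single CNN stream (early fusion).
six_channel = np.concatenate([pre, post], axis=-1)  # (160, 160, 6)

# Siamese configuration: keep the patches separate; each goes through
# its own convolutional branch (weights shared or unshared), and the
# branch features are merged before the fully-connected layers.
siamese_input = (pre, post)

print(six_channel.shape)  # (160, 160, 6)
```

The early-fusion network sees pre/post differences from the first layer on, while the Siamese variant compares the two patches only after per-image features have been extracted.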
Page 27
Practical considerations: Input scenario
(Network diagram legend: Fully-connected, Convolution, Max pooling)
Page 28
Practical considerations: Input scenario
In practice, pre-tsunami images may not be available.
Page 29
Practical considerations: Input scenario
◦ Only post-tsunami images are available.
Input to a CNN: a post-tsunami patch
Page 30
Practical considerations
From a practical viewpoint, we investigated the following:
Input scenario (to address availability of pre-tsunami image)
Input scale (to address variability of building size)
◦ Multiple crop sizes
◦ Fixed-scale
◦ Size-adaptive
◦ Multi-scale CNN
Page 31
Practical considerations
Input scale (to address variability of building size)
◦ Multiple crop sizes
◦ Fixed-scale -> aims to encompass most of the buildings.
◦ Size-adaptive -> makes tiny buildings more conspicuous.
◦ Multi-scale CNN
Page 32
Practical considerations
◦ Fixed-scale: 160 × 160 pixels (64 m × 64 m), so that most buildings are encompassed.
(Figure: a 160 × 160-pixel crop)
Page 33
Practical considerations
◦ Size-adaptive: the crop size depends on the building, with more emphasis on small buildings.
Size-adaptively crop and resize: fixed-scale (160×160) crop → resized version
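The two cropping strategies can be sketched as follows (a minimal numpy illustration; `fixed_crop`, `adaptive_crop`, and the nearest-neighbor resize are hypothetical stand-ins, and the building is assumed not to lie too close to the image border):

```python
import numpy as np

def fixed_crop(image, cy, cx, size=160):
    """Fixed-scale: always cut a size x size window around the building center."""
    half = size // 2
    return image[cy - half:cy + half, cx - half:cx + half]

def adaptive_crop(image, cy, cx, bldg_half, out=160):
    """Size-adaptive: crop proportionally to the building, then resize
    to out x out (nearest-neighbor), so small buildings fill the patch."""
    half = max(bldg_half, 1)
    patch = image[cy - half:cy + half, cx - half:cx + half]
    idx = np.arange(out) * patch.shape[0] // out  # nearest-neighbor indices
    return patch[np.ix_(idx, idx)]

img = np.random.rand(400, 400, 3).astype(np.float32)
print(fixed_crop(img, 200, 200).shape)        # (160, 160, 3)
print(adaptive_crop(img, 200, 200, 30).shape)  # (160, 160, 3)
```

Both strategies produce patches of the same output size, so the same network architecture can consume either.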
Page 35
Practical considerations
◦ Multi-scale CNN
Inspired by Zagoruyko+(CVPR15)
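A multi-scale CNN in this spirit can be sketched as follows (a minimal numpy illustration; `branch_features` is a hypothetical stand-in for a convolutional branch, here reduced to global average pooling):

```python
import numpy as np

def branch_features(patch):
    """Stand-in for one convolutional branch: a real branch would be
    Conv/Pool stacks; here we just average each channel globally."""
    return patch.mean(axis=(0, 1))

fixed_patch = np.random.rand(160, 160, 3).astype(np.float32)
adaptive_patch = np.random.rand(160, 160, 3).astype(np.float32)

# Multi-scale CNN: each scale feeds its own branch; the branch outputs
# are concatenated and passed to shared fully-connected layers.
fused = np.concatenate([branch_features(fixed_patch),
                        branch_features(adaptive_patch)])
print(fused.shape)  # (6,)
```

The network thus sees both the building's context (fixed-scale) and its detail (size-adaptive) in a single forward pass, rather than choosing one scale up front.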
Page 36
Experiment
Experimental setting, identical across all conditions (input scenario and input scale):
Data
◦ Randomly sampled 8,500 pairs; balanced class distribution
Evaluation
◦ 5-fold cross-validation (train/val per fold = 6,800 : 1,700)
◦ Classification accuracy
CNN hyper-parameters
◦ Conv-Pool-Conv-Pool-Conv-Conv-FC-FC with ReLU nonlinearity
◦ No batch normalization (it degraded performance)
◦ SGD with momentum; constant learning rate; weight decay
◦ Shared or unshared weights between branches in the Siamese case
Data pre-processing
◦ Zero mean and unit variance (pixel-wise)
◦ Augmentation with vertical and horizontal flips
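The pre-processing steps above can be sketched as follows (a minimal numpy illustration; the function names are ours, not from the paper):

```python
import numpy as np

def standardize(batch, eps=1e-8):
    """Pixel-wise zero mean / unit variance: one mean and std per
    pixel location and channel, computed across the training batch."""
    mean = batch.mean(axis=0)
    std = batch.std(axis=0)
    return (batch - mean) / (std + eps)

def augment_flips(batch):
    """Augmentation with vertical and horizontal flips (4x the data)."""
    return np.concatenate([
        batch,
        batch[:, ::-1],         # vertical flip
        batch[:, :, ::-1],      # horizontal flip
        batch[:, ::-1, ::-1],   # both flips
    ], axis=0)

x = np.random.rand(8, 160, 160, 3).astype(np.float32)
z = standardize(x)
print(augment_flips(z).shape)  # (32, 160, 160, 3)
```

Pixel-wise standardization (as opposed to a single global mean) lets the network ignore per-location brightness biases such as consistent shadows or sensor vignetting.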
Page 38
Accuracy
(Bar chart: classification accuracy, 93–96%, for Fixed / Adaptive / Ensemble input scales, without vs with multi-scale, under three input scenarios: 6-channel, Siamese, only post image)
Page 41
Accuracy
(Examples where fixed-scale and adaptive-scale disagree: buildings wrongly classified as “surviving” at one scale are correctly classified as “washed-away” at the other, in both directions.)
The different scales are complementary w.r.t. prediction.
Page 45
Accuracy: in terms of input scenario
Only post-tsunami images may be sufficient in the context of washed-away building detection.
(Bar chart: accuracy, 93–96%, with both pre/post images vs only the post image)
Page 46
Accuracy: in terms of input scale
(Bar chart: accuracy of Fixed, Adaptive, and Ensemble scales across all conditions)
Neither fixed-scale nor adaptive-scale is consistently better; the ensemble is always best.
Page 47
Accuracy: in terms of input scale
(Bar chart: accuracy without vs with multi-scale, for Fixed, Adaptive, and Ensemble)
Multi-scale CNNs always beat their single-scale counterparts.
Page 48
Qualitative result
CNN prediction vs Ground truth
RED: washed-away
YELLOW: surviving
Page 49
Conclusion
This work in a nutshell:
◦ We explored the effective use of CNNs for washed-away building detection.
◦ To this end, we compiled a new labeled dataset (ABCD).
◦ On this dataset, we ran experiments from an application viewpoint (input scenario and input scale).
◦ Overall, the accuracy was reasonably good.
Future work:
◦ Generalizability (other regions, other events, other data sources, etc.)
◦ An end-to-end framework from localization to classification, e.g., instance segmentation (Pinheiro+, ECCV16)
Page 50
References:
◦ Cooner et al.: Detection of urban damage using remote sensing and machine learning algorithms: Revisiting the 2010 Haiti earthquake, in Remote Sensing, 2016.
◦ Gueguen and Hamid: Large-scale damage detection using satellite imagery, in CVPR, 2015.
◦ Simo-Serra et al.: Discriminative learning of deep convolutional feature point descriptors, in ICCV, 2015.
◦ Zagoruyko and Komodakis: Learning to compare image patches via convolutional neural networks, in CVPR, 2015.
◦ Lin et al.: Learning deep representations for ground-to-aerial geolocalization, in CVPR, 2015.
◦ Ministry of Land, Infrastructure, Transport and Tourism: First report on an assessment of the damage caused by the Great East Japan earthquake, http://www.mlit.go.jp/common/000162533.pdf (published in Japanese)
◦ Pinheiro et al.: Learning to refine object segments, in ECCV, 2016.
◦ Nia and Mori: Building damage assessment using deep learning and ground-level image data, in CRV, 2017.
Acknowledgement: This presentation is based on results obtained from a project commissioned by the New Energy and
Industrial Technology Development Organization (NEDO).
Page 51
Appendix
Page 52
Related work
Damage detection
◦ Gueguen & Hamid (CVPR, 2015)
◦ Pre- and post-disaster satellite images
◦ Semi-supervised classification (BoF + SVM)
◦ Cooner et al. (Remote Sensing, 2016)
◦ Pre- and post-disaster satellite images
◦ Two-layer neural network
◦ Nia & Mori (CRV, 2017)
◦ Only post-disaster ground-level image
◦ Dilated CNN
Page 53
Related work
Two image patches as input to CNN
◦ Simo-Serra et al. (ICCV, 2015)
◦ Lin et al. (CVPR, 2015)
◦ Zagoruyko & Komodakis (CVPR, 2015)
Page 56
114 days before the quake (17/Nov/2010) vs 9 days after the quake (20/Mar/2011)
Human annotations derived from the field survey are superimposed.
Red arrows are buildings that were washed away by tsunami.
Visual changes to buildings caused by the tsunami are clear, but how can we describe them?
Page 57
Cross validation result
(Plot: accuracy, test loss, and train loss, 0–1, vs iteration (×10) for the fixed_size/noAug setting)
Page 58
Size-adaptive classification
Improved:
Worsened:
Page 59
Effect of # of shared conv layers on accuracy
(Plot: cross-validation mean accuracy, 0.80–1.00, vs number of conv layers shared, 0–4)
Page 60
Visualization via Global Average Pooling
Global Average Pooling is the technique for:
1) Regularizing big networks by forgoing FC layers (e.g., NIN and GoogLeNet)
2) Visualizing feature maps learned by networks in an intuitive way for human
3) Localizing object instances even under the weakly-supervised setting
Recent papers that delve into GAP:
◦ Learning Deep Features for Discriminative Localization (Zhou+, CVPR16)
◦ Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization (Selvaraju+, arXiv16)
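The CAM computation itself is simple (a minimal numpy sketch following the Zhou+ formulation; the shapes and variable names are illustrative):

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """Class Activation Map (Zhou+, CVPR16): after global average
    pooling, the FC weight of class c for channel k says how much
    feature map k contributes; the CAM is the weighted sum of the
    final conv feature maps over channels.
    feature_maps: (H, W, K), class_weights: (K,) -> CAM: (H, W)"""
    return np.tensordot(feature_maps, class_weights, axes=([2], [0]))

# Hypothetical final conv output and FC weights for "washed-away"
fmaps = np.random.rand(10, 10, 32).astype(np.float32)
w_washed_away = np.random.rand(32).astype(np.float32)
cam = class_activation_map(fmaps, w_washed_away)
print(cam.shape)  # (10, 10)
```

Upsampling the (10, 10) map back to the input resolution highlights which image regions drove the "washed-away" prediction.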
Page 61
Visualization using GAP: Class Activation Map
In Zhou+ (CVPR16): replace FC layers with GAP and retrain (leading to some drop in classification accuracy, but improving localization power)
Page 62
Visualization using GAP: Class Activation Map
in Selvaraju+ (arXiv16)
Retraining is unnecessary thanks to gradient backpropagation (the model stays intact, so classification performance is not impaired)
Page 63
resized
fixed