EXPLOITATION OF DEEP LEARNING IN THE AUTOMATIC DETECTION OF CRACKS ON PAVED ROADS
Journal: Geomatica
Manuscript ID geomat-2019-0008.R1
Manuscript Type: Research Article
Date Submitted by the Author: 25-Sep-2019
Complete List of Authors: Jung, Won Mo; York University, Earth and Space Science and Engineering
Naveed, Faizaan; York University, Earth and Space Science and Engineering
Hu, Baoxin; York University
Wang, Jianguo; York University
Li, Ningyuan; Government of Ontario Ministry of Transportation
Is the invited manuscript for consideration in a Special Issue?: Not applicable (regular submission)
Keywords: Crack Detection, Convolutional Neural Networks (CNN), Fully Convolutional Networks (FCN)
https://mc06.manuscriptcentral.com/geomatica-pubs
Geomatica
Draft
EXPLOITATION OF DEEP LEARNING IN THE AUTOMATIC DETECTION OF CRACKS ON PAVED ROADS

Won Mo Jung a, Faizaan Naveed a, Baoxin Hu *a, Jianguo Wang a and Ningyuan Li b

a Earth and Space Science and Engineering Department, Lassonde School of Engineering, York University, 4700 Keele Street, Toronto, Canada - (dnjsah94@my.yorku.ca, fazanham@my.yorku.ca, baoxin@yorku.ca, jgwang@yorku.ca)
b Ministry of Transportation of Ontario, 159 Sir William Hearst Ave, Toronto, Canada - (Li.Ningyuan@ontario.ca)
* Corresponding Author
ABSTRACT
With the advance of deep learning networks, their applications in the assessment of
pavement conditions are gaining more attention. The convolutional neural network (CNN) is the
most commonly used architecture in image classification. In terms of pavement assessment, most existing
CNNs are designed only to distinguish between cracks and non-cracks. Few networks classify
cracks into different levels of severity. Information on the severity of pavement cracks is critical for
pavement repair services. In this study, the state-of-the-art CNN used in the detection of pavement
cracks was improved to localize the cracks and identify their distress levels based on three
categories (low, medium, and high). In addition, a fully convolutional network (FCN) was, for the
first time, utilized in the detection of pavement cracks. These designed architectures were validated
using the data acquired on four highways in Ontario, Canada, and compared with the ground truth
that was provided by the Ministry of Transportation of Ontario (MTO). The results showed that
with the improved CNN, the prediction precisions on a series of test image patches were 72.9%,
73.9%, and 73.1% for cracks with the severity levels of low, medium, and high, respectively. The
precision for the FCN was tested on whole pavement images, resulting in 62.8%, 63.3%, and
66.4%, respectively, for cracks with the severity levels of low, medium, and high. It is worth
mentioning that the ground truth contained some uncertainties, which partially contributed to the
relatively low precision.

KEY WORDS: Crack detection, convolutional neural network (CNN), fully convolutional
network (FCN), pavement distress, instance segmentation
I. INTRODUCTION
Due to traffic, environmental factors, and aging, road pavements usually experience
different types of distress and deterioration that include cracking, surface defects, and profile
deformation (McGhee 2004; Miller et al. 2003; Bennett et al. 2007). The condition of a pavement
is usually characterized by distress type, severity, and extent. Effectively and accurately
monitoring the condition of a pavement surface is essential for determining whether it provides a
comfortable, safe, and efficient service to the public. It also assists the road management authority
in making decisions on appropriate maintenance and rehabilitation. Traditionally, pavement surveys
involve transportation personnel observing and recording surface defects and degradation while
walking or slowly driving over pavements (asphalt or concrete). Overall, manual techniques are
considered labor intensive, slow, expensive, and sometimes unsafe. The early efforts to develop
automatic or semi-automatic systems for pavement assessment were mainly camera-based. The
quality of the cameras and of the software for data processing and analysis limited their operational
deployment (Cafiso et al. 2006; Teomete et al. 2005; Yu et al. 2007).
With the development of Light Detection and Ranging (LiDAR) instruments, multi-sensor
integrated systems using both LiDAR and cameras are gaining attention (Zhang et al. 2014; Yu et
al. 2014; Murakami et al. 2018). However, the costs of the systems and their operations limit their
usage. As an example, the Ministry of Transportation of Ontario (MTO) only employs such
systems for major highways and uses camera-based systems for all other roads instead. In addition,
even with the integrated systems, LiDAR data and camera imagery are rarely used together for
the detection of pavement cracks. The camera systems are mainly used for asset mapping and thus
point forward and/or sideways (Fugro 2014). The advances in imaging technologies and artificial
intelligence provide a good opportunity to improve the performance of the camera-based system
for the assessment of pavement conditions. In this study, the focus was on the development of
automatic image-based methods to accurately detect cracks on pavements. As described in the next
section, several methods have been developed in the past decade for crack detection (Davis 2011;
Oliveira et al. 2013; Shi et al. 2016). However, the accuracies in the localization of cracks are
relatively low (about 60%), and the widths of cracks tend to be over-estimated (Oliveira et al.
2013). In addition, most methods are not able to distinguish cracks with different distress levels,
which prevents the detection results from being used in an automated system for road repair
services (Davis 2011). The objective of the current study was to develop deep learning solutions
to improve the localization of pavement cracks and the classification of their severity.
II. RELATED WORK
The methods that have been developed to characterize the cracks and distress of paved
roads using optical imagery primarily rely on the difference in intensity between the road
surface and cracks (Sun et al. 2009; Oliveira et al. 2013; Danilescu et al. 2015; Shi et al. 2016;
Hoang et al. 2018). The most commonly used techniques include thresholding, edge detection,
and morphological operations. Specifically, Sun et al. (2009) developed a method that employs
thresholding techniques to identify crack pixels and morphological operations to connect them
based on pixel connectivity, but their method performed poorly on small cracks, which led to
disconnections within continuous cracks. Similarly, Danilescu et al. (2015) proposed a crack
detection method using the thresholding technique and a series of morphological operations for
post-processing. Their results showed problems of misclassification of road cracks, and the method
was sensitive to noise (Danilescu et al. 2015). Hoang et al. (2018) investigated different edge
detection methods, such as Roberts, Sobel, Prewitt, and Canny, for crack detection. Like
thresholding methods, edge detection methods tend to over-detect cracks and are sensitive to noise
as well. Oliveira et al. (2013) proposed an unsupervised classification method. With their method,
the image was divided into a series of non-overlapping image blocks, and each image block was
classified as either a block with crack pixels or a block without. The features used in the
classification included the mean and standard deviation of the intensity within a given block, and
it was assumed that the standard deviation was higher in a block with crack pixels than
in a block without. The severity of cracks was determined based on the average width of each
crack. The results were pixelated, and the cracks were not accurately localized. The severity of the
detected cracks was also not accurately characterized. In Shi et al.’s (2016) study, new descriptors
based on random structured forests were proposed to characterize cracks. Even though the
accuracies for the test data were high (up to 90%), the performance of this method depended heavily
on the extracted features, and thus this method might not be effective for all pavements. In addition,
the method proposed by Shi et al. (2016) could not characterize the severity of cracks.
Recently, with the advance of deep learning, convolutional neural networks (CNNs) are
being exploited to detect cracks in pavements (Fan et al. 2018). Fan et al. (2018) implemented a
CNN with structured prediction, considered state-of-the-art in the detection of pavement cracks.
The CNN proposed by Fan et al. (2018) consisted of four convolutional layers with two
max-pooling layers, followed by three fully connected layers. Every convolutional layer used
a 3 by 3 kernel with a stride of 1 pixel. Additionally, zero padding was applied on the boundary
of each input before the convolutional filters were applied, to preserve the spatial resolution
of the feature map. After each pair of convolutional layers, max pooling was applied with a
2 by 2 kernel and a stride of 2. In the fully connected layers, 64 neurons were used to output 25 neurons,
which were then reshaped to form a 5 by 5 image patch with binary prediction, 1 being the crack
pixel, and 0 being the non-crack pixel. Furthermore, Fan et al. (2018) explored different output
sizes, ranging from one single pixel (traditional CNN output) to a 7 by 7 image patch (structured
prediction); the output with a 5 by 5 image patch was shown to achieve the best result.
In addition, Fan et al. (2018) also investigated the effect of the ratio between the numbers of crack
and non-crack image patches in the training stage and concluded that the 1:3 ratio of crack to
non-crack patches resulted in the best outcome. Even though a good accuracy (91%) was achieved in
the prediction of crack and non-crack for a given pixel, the designed CNN was not able to
determine the severity of cracks due to its binary output. Additionally, the output from this network
was restricted to tiny image patches that needed to be stitched together to form an image. A variant
of the network proposed in Fan et al. (2018) was implemented in this study to detect cracks with
different severities.
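The layer arithmetic described above can be traced with a short sketch. It assumes a 27 by 27 input patch (the patch size used later in this study) and floor-mode pooling without padding; neither assumption is stated for the original network in the text above, and the function names are illustrative only.

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial size after a convolution with zero padding."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling without padding (floor mode, assumed)."""
    return (size - kernel) // stride + 1


def trace(size, layers):
    """Return the spatial size after each layer, starting with the input size."""
    sizes = [size]
    for layer in layers:
        size = conv_out(size) if layer == "conv" else pool_out(size)
        sizes.append(size)
    return sizes


# Layout as described in the text: pooling after each pair of convolutions.
LAYERS = ["conv", "conv", "pool", "conv", "conv", "pool"]
sizes = trace(27, LAYERS)  # 27 is an assumed input size
print(sizes)
```

Under these assumptions the 3 by 3 convolutions with a padding of 1 leave the spatial size unchanged, and only the two pooling layers reduce it.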
In addition to a CNN, a fully convolutional network (FCN) was exploited in this study for
the detection of cracks and the identification of their severity levels. To our knowledge,
this was the first time an FCN was used in crack detection with severity level classification. The
network was trained to detect three levels of pavement distress: low, medium, and high. The results
were cross-validated against a CNN with structured prediction. Data from four major highways in Ontario,
Canada, with varying crack sizes and levels of distress, provided by the MTO, were used to train
and test the algorithms, and the results were compared with those obtained using the current
integrated LiDAR and camera-based system.
III. TRAINING AND TEST DATA SET
The dataset used in this study was provided by the MTO and was collected using an Automatic Road
Analyzer (ARAN) 9000 system developed by Fugro Roadware (Fugro 2014). The pavement
images were collected from four highways in Ontario, Canada: Highways 89, 34, 21, and 138. The ARAN
9000 system consists of two LiDAR instruments at the back of the vehicle and one camera facing
forward at the front of the vehicle. Since the purpose of this study was to detect cracks using
optical imagery, ideally the images obtained by the camera should have been used. However, due
to the orientation of the camera, the acquired images needed to be rectified first. This process
reduced the image quality significantly and caused the images to be blurry. Since few nadir-view
images were available, the nadir-view optical imagery was simulated (with the camera facing
downward to the road surface) using the LiDAR intensity for the development of algorithms. In
the simulated dataset, there were 3,332 road images in total, and each had a size of 2500x1037
pixels, where each 2.5 pixels represented 1 cm in the real world. Examples of image patches with
non-crack and cracks with different levels of severity are shown in Figure 1. As described in the
following paragraph, the severity of a crack was determined by its depth and width.
The ground truth associated with these road images was provided by the MTO. As
mentioned earlier, the MTO processed the data collected by an ARAN 9000 system over
Highways 89, 34, 21, and 138 in Ontario using software called the Laser Crack Measurement System
(LCMS) (Pavemetrics 2018). The software uses the laser intensity to compute the depth of the
cracks, and the corresponding cracks are classified into different severity levels. Even though the
results from this software were satisfactory, there were two problems: (1) With LCMS,
thresholding techniques were used to separate cracks from non-cracks, which means some of the
road markings might be recognized as cracks. (2) A universal threshold was used by the software
as well. As pavements varied in their conditions, one threshold did not work well for all four
highways. Care was taken in choosing high-quality training sites. The test samples were randomly
selected. The issues mentioned above with the LCMS software affected the quality of the extracted
image samples, which contributed to the lower classification accuracy, and the main cause of this
was the inaccurate ground truth. This issue will be further discussed in the Experimental Results
and Discussion section.
To train and validate the CNN with structured prediction, 330,401 image patches of
size 27 by 27 pixels and the labels of a 5 by 5 neighbourhood around the central pixels of these image
patches were used, with 70% of the patches used for training and the rest, roughly 99,000 patches, used for
testing.
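As a minimal sketch of how such training samples could be cut from an image and its label map, the following assumes images and labels stored as nested lists and a random 70/30 split with a fixed seed; the function names, the storage format, and the use of shuffling are illustrative assumptions, not the implementation used in this study.

```python
import random

PATCH = 27  # input patch side in pixels (from the text)
LABEL = 5   # label window side in pixels (from the text)


def extract_sample(image, labels, row, col):
    """Return the 27x27 patch and the 5x5 label window centered on (row, col).

    The caller must keep (row, col) at least PATCH // 2 pixels from the border.
    """
    hp, hl = PATCH // 2, LABEL // 2
    patch = [r[col - hp:col + hp + 1] for r in image[row - hp:row + hp + 1]]
    window = [r[col - hl:col + hl + 1] for r in labels[row - hl:row + hl + 1]]
    return patch, window


def split(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and test subsets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Splitting 330,401 sample indices at 70% gives 231,280 training samples.
train, test = split(list(range(330401)))
print(len(train), len(test))
```

With 330,401 samples, a 70% cut leaves 99,121 samples for testing, which is consistent with the "roughly 99,000" stated above.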
To train an FCN, images with 160x576 pixels were extracted from the original images
using an overlap of 75%. Since not every image contained crack pixels, only images that contained
more than 0.5% crack pixels were used. In total, 220,000 images were generated for training and
1,400 images for testing.
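The tiling and filtering step can be sketched as follows. The sketch assumes that the 75% overlap applies along both axes, that a label value of 0 denotes non-crack and any non-zero value a crack of some severity, and that tiles running past the image border are simply dropped; these are assumptions made for illustration, and the helper names are hypothetical.

```python
def tile_origins(length, tile, overlap=0.75):
    """Start offsets of tiles along one axis for a given fractional overlap."""
    step = max(1, int(round(tile * (1 - overlap))))
    return list(range(0, length - tile + 1, step))


def crack_fraction(label_tile):
    """Fraction of pixels in a tile that carry a non-zero (crack) label."""
    total = sum(len(row) for row in label_tile)
    cracks = sum(1 for row in label_tile for value in row if value != 0)
    return cracks / total


def select_tiles(labels, tile_h, tile_w, overlap=0.75, min_fraction=0.005):
    """Return (top, left) of every tile with more than min_fraction crack pixels."""
    height, width = len(labels), len(labels[0])
    kept = []
    for top in tile_origins(height, tile_h, overlap):
        for left in tile_origins(width, tile_w, overlap):
            tile = [row[left:left + tile_w] for row in labels[top:top + tile_h]]
            if crack_fraction(tile) > min_fraction:
                kept.append((top, left))
    return kept


# A 160-pixel tile with 75% overlap advances by 40 pixels per step.
print(tile_origins(1037, 160)[:4])
```

For a real label map, `select_tiles(labels, 160, 576)` would list the tile origins that pass the 0.5% threshold; which image axis the 160-pixel side runs along is not stated in the text and would need to be set accordingly.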
IV. APPROACHES
In this study, two different deep learning architectures were designed for the detection of
cracks on paved roads: the first architecture was an improved version of the CNN with structured prediction that
was originally proposed by Fan et al. (2018), and the second one was an FCN with convolutional
layers whose initial weights were taken from VGG-16 (Simonyan et al. 2014). These two
networks are described in detail in the following sections.
4.1 An Improved CNN with Structured Prediction
As mentioned in Section II, the state-of-the-art CNN proposed by Fan et al. (2018)
had a couple of issues in the detection of cracks in pavements. In this study, an improvement was
made to the architecture (Fan et al. 2018) to detect cracks with different distress levels. The
proposed CNN is shown in Figure 2. In the original network (Fan et al. 2018), only two classes
(crack and non-crack) were identified, and thus there were 25 output neurons, which were then
reconstructed into a 5x5 output patch, preceded by two consecutive fully connected layers of 64 neurons.
In this study, the implementation of the detection of three levels of cracks required 75 neurons in
the output layer and thus more neurons in the fully connected layers. In addition, the classification
of different levels of severity of cracks also required the network to learn more complex features,
and the network had to be deeper. As shown in Figure 2, the number of convolutional layers was
increased from four to six; the number of max-pooling layers was increased to three; and the
number of fully connected layers remained the same but the number of neurons within each layer
was increased. With 231,280 training patches, it took about 30 hours to generate and recall the
generated images on a machine with an i7-8700K CPU, 16 GB of RAM, and a GTX 1070 GPU.
In order to use the trained network to classify every pixel on a pavement image, image
stitching needed to be carried out after the prediction of image patches. Specifically, to produce a
classification map of the whole pavement image, this image had to be sub-divided into 27 by 27
image patches, each used to predict a 5 by 5 output patch, and the outputs from all input patches had to be
collected to form a corresponding prediction for the whole image. Since for each patch, the output
of a 5 by 5 window around the central pixel of the patch was generated, to avoid any gaps in the
predicted image, the next patch needed to be centered at most 5 pixels away.
A prediction with n pixels between the centers of two adjacent image patches is referred to as having a
stride of n. Considering the processing speed, a stride of 5 was selected in this study. However,
the effect of stride size was also investigated in this study, and the results are presented in the next
section.
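The stitching procedure can be illustrated with a minimal sketch. Here `predict` is a hypothetical callable standing in for the trained network, border pixels that no output window reaches are left at 0, and where output windows overlap (strides smaller than 5) later predictions simply overwrite earlier ones; all three choices are assumptions made for illustration.

```python
def stitch(height, width, predict, patch=27, out=5, stride=5):
    """Assemble a full-size label map from per-patch out x out predictions.

    predict(row, col) is a hypothetical callable that returns the out x out
    prediction for the patch centered at (row, col).
    """
    result = [[0] * width for _ in range(height)]
    half_patch, half_out = patch // 2, out // 2
    # Patch centers are restricted so that every input patch fits in the image.
    for row in range(half_patch, height - half_patch, stride):
        for col in range(half_patch, width - half_patch, stride):
            block = predict(row, col)
            for dr in range(out):
                for dc in range(out):
                    result[row - half_out + dr][col - half_out + dc] = block[dr][dc]
    return result


def always_crack(row, col):
    """Dummy predictor that labels every pixel of the output window as crack."""
    return [[1] * 5 for _ in range(5)]


full = stitch(40, 40, always_crack, stride=5)    # contiguous coverage
gappy = stitch(40, 40, always_crack, stride=6)   # one-pixel gaps between windows
```

With a stride of 5 the 5 by 5 output windows tile the interior without gaps, whereas a stride of 6 leaves unpredicted rows and columns between adjacent windows, which is why 5 is the largest usable stride.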
The resulting prediction image contained some isolated crack pixels, as shown in Figure
3(a). A morphological opening operation was used to eliminate isolated regions smaller than the
structuring element, an ellipse of 4 by 4 pixels (Figure 3(b)).
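A binary morphological opening (an erosion followed by a dilation with the same structuring element) can be sketched as below. For brevity the example uses a 2 by 2 square element rather than the 4 by 4 ellipse of this study, treats pixels outside the image as background, and operates on a binary crack mask; these are simplifying assumptions, and the structuring element is passed as a list of (row, column) offsets.

```python
def erode(mask, offsets):
    """Keep a pixel only if the structuring element placed there fits in the mask."""
    height, width = len(mask), len(mask[0])

    def is_set(r, c):
        return 0 <= r < height and 0 <= c < width and mask[r][c] == 1

    return [[1 if all(is_set(r + dr, c + dc) for dr, dc in offsets) else 0
             for c in range(width)] for r in range(height)]


def dilate(mask, offsets):
    """Stamp the structuring element at every set pixel of the mask."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if mask[r][c] == 1:
                for dr, dc in offsets:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        out[rr][cc] = 1
    return out


def opening(mask, offsets):
    """Erosion followed by dilation with the same structuring element."""
    return dilate(erode(mask, offsets), offsets)


SQUARE_2X2 = [(0, 0), (0, 1), (1, 0), (1, 1)]  # illustrative element, not the 4x4 ellipse

mask = [[0] * 6 for _ in range(6)]
mask[0][5] = 1                      # isolated pixel
for r in range(2, 5):
    for c in range(1, 4):
        mask[r][c] = 1              # 3x3 region
opened = opening(mask, SQUARE_2X2)
```

In this example the isolated pixel disappears because the element cannot fit inside it, while the 3 by 3 region is preserved because it can be covered entirely by translated copies of the element.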
4.2 FCN
FCNs, proposed by Long et al. (2015), can make dense predictions on a pixel basis for
semantic segmentation tasks. FCNs are built on VGG-16, designed by the Visual Geometry Group
(VGG) at the University of Oxford, which placed first in localization and second in classification in the
ImageNet competition in 2014 (Simonyan et al. 2014). The structure of the FCN, as shown in Figure 4, is based on an encoder-decoder architecture,
where the initial seven layers of the network are layers of a typical CNN, and subsequent layers
are used for generating the segmentation map by upsampling the results. The input image is
downsampled as it goes through the convolutional layers and is then upsampled through
deconvolutional layers, which are simply transposed convolutional layers. In the FCN structure,
the fully connected layer (7th layer) is replaced by a 1x1 convolutional layer, which generates a
map of the features detected by the network. This network architecture is also sometimes dubbed
a pixel-to-pixel network, as the labels are pixelwise predictions of the same 2-D dimensions as the
input image. The output from the network is then upsampled using deconvolutional operations.
Since the upsampled images are coarse, the FCN architecture makes use of the feature maps from
earlier layers to refine the coarse segmentation. These additions are known as skip connections.
Long et al. (2015) presented three different examples of the FCN architecture: FCN-32,
FCN-16, and FCN-8. FCN-32 directly produced segmentation maps from the 7th convolution layer
by using a deconvolution operation with a stride of 32 pixels. This resulted in a 32x upsampling
from the output of the final convolution layer, yielding the same 2-D dimensions as the input image.
Since no skip connection was applied from previous layers, the segmentation results were coarse.
FCN-16 and FCN-8 ended with 16x and 8x upsampling, respectively. In FCN-16, the output
of the final 1 by 1 convolutional layer was upsampled by 2x and the activations from pooling
layer 4 were added before the final 16x upsampling. This resulted in a relatively refined segmentation result. In FCN-8, outputs
from pooling layer 3 were additionally added to the results, which further helped retrieve fine-grained
spatial information. During the training phase, the deconvolutional layers were trained in the same
fashion as the convolutional layers. In other words, the weights for the deconvolution were learned
through the training process.
For this study, the FCN-8 structure was implemented for the best outcome. The network
contained 7 convolutions and 5 pooling layers, and 3 deconvolution layers with skip connections
from pooling layers 3 and 4 (Long et al. 2015).
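The spatial bookkeeping of an FCN-8-style network can be checked with a short sketch. It assumes five 2 by 2 pooling layers with a stride of 2 and the decoder factors of Long et al. (2015), namely 2x, 2x, and 8x upsampling with fusions at pooling layers 4 and 3; the function name and the dictionary keys are illustrative, and the sketch tracks shapes only, not network weights.

```python
def fcn8_shapes(height, width):
    """Trace feature-map sizes through an FCN-8-style encoder and decoder."""
    if height % 32 or width % 32:
        raise ValueError("this sketch assumes dimensions divisible by 32")
    shapes = {"input": (height, width)}
    h, w = height, width
    for i in range(1, 6):                 # five pooling layers, each halving the size
        h, w = h // 2, w // 2
        shapes[f"pool{i}"] = (h, w)
    up1 = (h * 2, w * 2)                  # 2x upsampling, fused with pool4
    up2 = (up1[0] * 2, up1[1] * 2)        # 2x upsampling, fused with pool3
    shapes["fuse_pool4"] = up1
    shapes["fuse_pool3"] = up2
    shapes["output"] = (up2[0] * 8, up2[1] * 8)   # final 8x upsampling
    return shapes


shapes = fcn8_shapes(160, 576)
for name, shape in shapes.items():
    print(name, shape)
```

For the 160x576 training images of Section III, both dimensions are divisible by 32, so the upsampled maps line up exactly with pooling layers 4 and 3 and the output has the same size as the input.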
V. EXPERIMENTAL RESULTS AND DISCUSSION
The trained improved CNN with structured prediction and FCN-8 were applied to three
stretches of pavement with cracks of various levels of severity, and the results are shown in Figures
5-7. It is clear from these figures that the predictions by both the CNN and FCN were consistent with
the observations of the cracks in the images and the ground truth. However, the ground truth labels
were not always correct when compared with the original input images. The possible cause of this issue
will be discussed further later in this section. In addition, the results from the CNN were not very
smooth. As mentioned before, image stitching was required for the CNN to generate a prediction
for each pixel, which created a pixelated shape of cracks, rather than a smooth shape. This caused the
overall crack shapes to appear unnatural. The CNN with structured prediction considered a 5 by 5
neighborhood together. The trained network might have performed the best considering a 5 by 5
neighborhood, but was not necessarily the best for individual pixels. This issue will be discussed
further later.
To quantify the performance of the proposed CNN and FCN, the commonly used measures,
precision, recall, and F1 score (Estrada and Jepson 2009), were used. The precision was measured
as the percentage of the predicted crack pixels that were correctly classified.
Recall was the percentage of the crack pixels in the test dataset that were correctly
classified.
Precision and recall are also referred to as the user’s accuracy and producer’s accuracy, respectively, in the
remote sensing community. The F1 score was computed as the harmonic mean of the
precision and recall. The precision, recall, and F1 score were calculated based on the test
datasets described in Section III.
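For reference, the three measures can be computed per class as in the sketch below, which uses the standard definitions: precision is the fraction of pixels predicted as a class that truly belong to it, recall is the fraction of pixels of a class that were predicted as such, and F1 is their harmonic mean. Labels are assumed to be flat sequences of integer class codes; this is an illustration, not the evaluation code used in the study.

```python
def precision_recall_f1(truth, predicted, label):
    """Per-class precision, recall, and F1 from two flat label sequences."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example with class codes 0 (non-crack), 1, and 2.
truth = [1, 1, 1, 0, 0, 2]
predicted = [1, 1, 0, 1, 0, 2]
print(precision_recall_f1(truth, predicted, label=1))

# Precision 72.9% and recall 55.8% give an F1 score of about 63.2%.
print(round(f1_score(0.729, 0.558), 3))
```

The last line reproduces the F1 score of 63.2% reported for low-severity cracks with the CNN from its precision of 72.9% and recall of 55.8%.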
As mentioned earlier, the test data set for the FCN included 1,400 images of the size of
160 by 576 pixels. The calculated precisions, recalls, and F1 scores are shown in Table 1.
Considering both localization and classification, the FCN resulted in a precision of 62.8% in low
severity, 63.3% in medium severity, 66.4% in high severity; recall of 39.5% in low severity, 46.7%
in medium severity, 41.4% in high severity; F1 score of 46.1% in low severity, 51.6% in medium
severity, and 47.8% in high severity. Since the FCN was designed to perform both crack
localization and severity level classification at the same time, the evaluation of the FCN assessed
whether the predicted cracks were correctly localized and correctly classified into the corresponding
severity level. The results in Table 1 indicate that the precisions, recalls, and F1 scores were
satisfactory but relatively low. Such low evaluation results were partially caused by the issues with
the ground truth. Generally, most of the generated ground truth labels were reasonable in terms of
localizing the cracks and classifying them accordingly. However, when the cracks on the pavement
were too weak to pass the threshold, those cracks were simply omitted from the ground truth
labels. This can be clearly observed in the examples shown in Figure 8. It is clear from Figure 8
that in the ground truth, either a discontinuity existed between the cracks or a crack
segment was completely missed, whereas the FCN predictions correctly detected the cracks. One of
the other reasons for the low accuracies in the detection of the cracks was the presence of
lane markings on the roads. As shown in Figure 9, these markings were sometimes detected as
cracks either by the FCN or in the ground truth. This was because of the difference in intensity
between road markings and the road surface. In future work, either a pre-processing or a
post-processing step will be added to detect and remove road markings.
In terms of the validation of the CNN, as mentioned in Section III, roughly 99,000 image
patches were in the test set for the CNN. Based on the test set, the calculated precisions, recalls,
and F1 scores are shown in Table 2. The results shown in Table 2 only indicate the accuracy of the
classification of crack severities, but not of the localization. The precisions were 72.9% in low
severity, 73.9% in medium severity, 73.1% in high severity; the recalls were 55.8% in low severity,
58.5% in medium severity, 55.8% in high severity; and the F1 scores were 63.2% in low severity,
65.3% in medium severity, and 63.3% in high severity. Similarly, the issue associated with the
accuracy of the ground truth was one of the contributing factors to the low precisions, recalls, and
F1 scores. Even so, the accuracies in the classification of cracks of different levels of severity were
satisfactory.
To compare with the traditional machine learning methods in the localization of crack
pixels in a pavement image, both the CNN with structured prediction and the FCN were trained for
binary classification (crack vs. non-crack). The training and test sets used for the FCN were
employed for this investigation. Examples of the localization of crack pixels in an image generated
from the two deep learning networks are shown in Figure 10, together with the original image and
the ground truth generated from LCMS. In addition, the result generated from a morphological crack
detection method (Wang et al. 2018) is also shown in Figure 10 as a representative of such traditional
methods. To generate the results in Figure 10, a given pixel in an image was identified as
either crack or non-crack. As shown in Figure 10, the results from both networks were
consistent with the ground truth, with the one from the FCN closer to the ground truth than the
ones from the CNN and the morphological crack detection. Table 3 shows the crack localization
accuracy for the CNN with structured prediction, the FCN, and the morphological method. The
morphological method showed the lowest accuracy, and the FCN performed slightly
better than the CNN method. The low accuracy of the morphological method was expected, as such methods rely on simple
kernel-based thresholding operations to detect local maxima/minima. Due to this, the method is sensitive
to noise and the presence of road markings. Furthermore, the morphological method was also not
capable of detecting cracks with low severity, as there was not enough difference in intensity
between the paved road and the low-severity cracks. The detected cracks were also observed to be
discontinuous due to the limitations in the size of the structuring element. Thus, different
structuring element sizes were tested, but a larger structuring element resulted in a larger
omission error for low- and medium-severity cracks, and a small structuring element was very
sensitive to noise.
As discussed earlier, the disadvantage of the CNN was the need to perform image stitching after the
prediction of image patches. For the CNN with structured prediction, for each patch, the output
of a 5 by 5 window around the central pixel of the patch was generated. In this study, a stride of
5 pixels (the next patch was centered 5 pixels away) was used because it is the largest stride that avoids gaps
in the predicted image and thus gives the fastest processing time. The effect of the size of the stride was
investigated in this study by comparing the results generated with three stride sizes (1, 3, and 5
pixels) shown in Figure 11. With the stride of 1 pixel, for each patch, only the predicted result for
the central pixel was recorded. As shown in Figure 11, the predictions generated from the CNNs
with different stride sizes were similar. However, the result from the CNN with a stride of 1 seemed
more detailed but rather noisy. The CNN with a stride of 1 pixel would be desirable if a way could be
found to improve the result by minimizing the isolated crack pixels and reducing the
processing time. One way to improve the CNN with structured prediction would be to change the
loss function. In the current version, all 5 by 5 pixels considered in the loss function were weighted
equally. A weighted loss function is being investigated. In addition, with the current
implementation, the overlap in the predictions between adjacent patches was not considered.
An ongoing study is being carried out to seek a good way to combine the predictions from
overlapping patches.
VI. CONCLUSIONS
In this study, two deep learning networks, an improved CNN with structured prediction
and an FCN, were exploited to detect cracks on pavements together with their level of severity. Prior to this
study, the CNN with structured prediction proposed by Fan et al. (2018) was restricted to crack
localization, but by increasing the number of convolutional layers and the number of neurons in the fully connected layers, an
improved network designed in this study was successfully trained on pavement images with
categorized crack severity. In addition, an FCN for crack detection with different distress
levels was, for the first time, developed. As outlined in Table 1, the FCN resulted in a precision of
62.8% in low severity, 63.3% in medium severity, 66.4% in high severity; recall of 39.5% in low
severity, 46.7% in medium severity, 41.4% in high severity; F1 score of 46.1% in low severity,
51.6% in medium severity, and 47.8% in high severity. As outlined in Table 2, when the CNN
with structured prediction was tested on each individual image patch, it resulted in a precision
of 72.9% in low severity, 73.9% in medium severity, 73.1% in high severity; recall of 55.8% in
low severity, 58.5% in medium severity, 55.8% in high severity; F1 score of 63.2% in low severity,
65.3% in medium severity, and 63.3% in high severity. The CNN with structured prediction was
tested on a series of sub-sampled image patches, whereas the FCN was tested on whole images. In
comparison with the current state-of-the-art CNN for road crack localization, the FCN architecture
was not only more robust for this task but was also free of the fixed-size input restriction that
the fully connected layers impose on the CNN. In terms of crack localization, both neural networks
performed better than the morphological method. Although both the CNN with structured prediction
and the FCN produced satisfactory results, further validation and improvement of the two networks
are being pursued.
VII. ACKNOWLEDGEMENTS
We would like to express our sincere appreciation to the MTO and the Natural Sciences and
Engineering Research Council (NSERC) for funding this research project. Special thanks to
Gideon Gumisiriza at the MTO for providing us with the data needed for training and testing the software.
Figure 1: Examples of pavement images with non-crack pixels (a) and crack pixels with different levels of severity (b), (c), and (d). The left panels are the original images and the right panels are the corresponding crack levels, with low, medium and high severity printed in cyan, green, and orange, respectively.
Figure 2: The illustration of the CNN architecture with a single grayscale image as the input.
Figure 3: The illustration of the effect of the post-processing: (a) the prediction from the CNN and (b) the prediction after morphological post-processing. Crack pixels with the levels of low, medium and high severity are printed in cyan, green, and orange, respectively.
Figure 4: The structure of a Fully Convolutional Neural Network. The image was adapted from the one in Long et al. (2015).
Figure 5: The results for the first stretch of pavement: (a) the pavement image; (b) the ground truth generated from LCMS Pavemetrics, where crack pixels with the levels of low, medium and high severity are printed in cyan, green, and orange, respectively; (c) the result produced by the FCN-8; and (d) the result generated by the improved CNN with structured prediction.
Figure 6: The results for the second stretch of pavement: (a) the pavement image; (b) the ground truth generated from LCMS Pavemetrics, where crack pixels with the levels of low, medium and high severity are printed in cyan, green, and orange, respectively; (c) the result produced by the FCN-8; and (d) the result generated by the improved CNN with structured prediction.
Figure 7: The results for the third stretch of pavement: (a) the pavement image; (b) the ground truth generated from LCMS Pavemetrics, where crack pixels with the levels of low, medium and high severity are printed in cyan, green, and orange, respectively; (c) the result produced by the FCN-8; and (d) the result generated by the improved CNN with structured prediction.
Figure 8: An illustration of the missing weak cracks in the ground truth: (a) pavement images, (b) ground truth, and (c) FCN predicted result. For (b) and (c), crack pixels with the levels of low, medium and high severity are printed in cyan, green, and orange, respectively.
Figure 9: An illustration of the effect of road markings on crack detection: (a) the pavement images, (b) ground truth, and (c) the results generated from the FCN. For (b) and (c), crack pixels with the levels of low, medium and high severity are printed in cyan, green, and orange, respectively.
Figure 10: The localization of the crack pixels among different methods: (a) the original pavement images, (b) the ground truth for binary classification generated from LCMS Pavemetrics, (c) the results from the FCN-8, (d) the results from the CNN with structured prediction, and (e) the results from the morphological crack detection method.
Figure 11: The effect of the stride size used in the CNN with structured prediction: (a) a pavement image, (b) the ground truth, (c) the result generated by the CNN with stride 1, (d) the result generated by the CNN with stride 3, and (e) the result generated by the CNN with stride 5. Crack pixels with the levels of low, medium and high severity are printed in cyan, green, and orange, respectively.
Table 1: The prediction results on image patches in the test data set for the CNN with structured prediction.

Architecture                     Severity   Precision   Recall   F1-Score
CNN with structured prediction   High       73.1%       55.8%    63.3%
CNN with structured prediction   Medium     73.9%       58.5%    65.3%
CNN with structured prediction   Low        72.9%       55.8%    63.2%
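The F1 scores in Table 1 can be reproduced from the listed precision and recall. The short check below assumes that the standard definition of the F1 score, the harmonic mean of precision and recall, was applied to each severity level; the function name `f1_score` and the dictionary layout are illustrative only and are not taken from the software used in this study.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision %, recall %) per severity level, copied from Table 1
table1 = {
    "High": (73.1, 55.8),
    "Medium": (73.9, 58.5),
    "Low": (72.9, 55.8),
}

# F1 per severity level, rounded to one decimal as in the table
f1_by_severity = {
    severity: round(f1_score(p, r), 1) for severity, (p, r) in table1.items()
}

for severity, f1 in f1_by_severity.items():
    print(f"{severity}: F1 = {f1}%")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the F1 scores sit closer to the recall values than to the higher precision values.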
Table 2: The prediction results on the pavement images in the test data set for the FCN.

Architecture   Severity   Precision   Recall   F1-Score
FCN-8          High       66.4%       41.4%    47.8%
FCN-8          Medium     63.3%       46.7%    51.6%
FCN-8          Low        62.8%       39.5%    46.1%
Table 3: The results on the localization of cracks based on the test data set for the FCN.

Architecture                     Precision   Recall   F1-Score
CNN with structured prediction   68.05%      44.82%   54.05%
FCN-8                            77.0%       43.6%    53.4%
Morphological                    53.1%       53.7%    53.1%