Article
Semi-supervised segmentation for coastal monitoring seagrass using RPA imagery
Brandon Hobley 1,*, Riccardo Arosio 2, Geoffrey French 1, Julie Bremner 2, Tony Dolphin 2 and Michal Mackiewicz 1
2 Collaborative Centre for Sustainable Use of the Seas, School of Environmental Sciences, University of East Anglia, Norwich NR4 7TJ; [email protected] (T.D.); [email protected] (J.B.); [email protected] (R.A.)
Abstract: Intertidal seagrass plays a vital role in estimating the overall health and dynamics of coastal environments due to its interaction with tidal changes. However, most seagrass habitats around the globe have been in steady decline due to human impacts, disturbing the already delicate balance in the environmental conditions that sustain seagrass. The miniaturization of multi-spectral sensors has facilitated very high resolution mapping of seagrass meadows, significantly improving the potential for ecologists to monitor changes. In this study, two analytical approaches used for classifying intertidal seagrass habitats are compared: Object-based Image Analysis (OBIA) and Fully Convolutional Neural Networks (FCNNs). Both methods produce pixel-wise classifications in order to create segmented maps; however, FCNNs are an emerging set of algorithms within deep learning with sparse application to seagrass mapping. Conversely, OBIA has been a prominent solution within this field, with many studies leveraging in-situ data and multiscale segmentation to create habitat maps. This work demonstrates the utility of FCNNs in a semi-supervised setting to map seagrass and other coastal features from an optical drone survey conducted at Budle Bay, Northumberland, England. Semi-supervision is also an emerging field within deep learning, with the practical benefit of achieving state-of-the-art results using only subsets of labelled data. This is especially beneficial for remote sensing applications, where in-situ data are an expensive commodity. Our results show that FCNNs have comparable performance with the standard OBIA method used by ecologists, while also showing an increase in performance for mapping ecological features that are sparsely labelled across the study site.
Keywords: Deep learning; Computer vision; Remote sensing; Supervised learning; Semi-supervised learning; Segmentation; Seagrass mapping
1. Introduction
Accurate and efficient mapping of seagrass extents is a critical task given the importance of these ecosystems in coastal settings and their use as a metric for ecosystem health. In particular, seagrass ecosystems play a key role in estimating and assessing the health and dynamics of coastal ecosystems due to their sensitive response to tidal pro-
• Can FCNNs model high resolution aerial imagery from a small set of geographically referenced image shapes?
• How does performance compare with standard OBIA/GIS frameworks?
• How accurately can Zostera noltii and Zostera angustifolia be modelled along with all other relevant coastal features within the study site?
2. Methods
2.1. Study site
The research focused on Budle Bay, Northumberland, England (55.625° N, 1.745° W). The coastal site has one tidal inlet, with previous maps also detailing the same inlet ([34], [35], [36]). Sinuous and dendritic tidal channels are present within the bay, and bordering the channels are areas of seagrass and various species of macro-algae.
2.2. Data collection
Figure 1 displays very high resolution orthomosaics of Budle Bay created using Agisoft's MetaShape [37] and structure from motion (SfM). SfM techniques rely on estimating intrinsic and extrinsic camera parameters from overlapping imagery [38]. A combination of appropriate flight planning, in terms of altitude and aircraft speed, and the camera's field of view was important for producing good quality orthomosaics. Two sensors were used: a SONY ILCE-6000 camera with 3 wide-banded filters for Red, Green and Blue channels and a ground sampling distance of approximately 3 cm (Figure 1, bottom right), and a MicaSense RedEdge3 camera with 5 narrow-banded filters for Red (655-680 nm), Green (540-580 nm), Blue (459-490 nm), Red Edge (705-730 nm) and Near Infra-red (800-880 nm) channels and a ground sampling distance of approximately 8 cm (Figure 1, top right).
Each orthomosaic was orthorectified using the respective GPS logs of camera positions and ground markers that were spread out across the site. This process ensured that both mosaics were well aligned with respect to each other, and also with ecological features present within the coastal site.
Figure 1. Study site within the U.K. (top-left). Ortho-registered images of Budle Bay using a SONY ILCE-6000 and a MicaSense RedEdge3 (right). For the display of the latter camera, we use the Red, Green and Blue image bands.
Figure 3. Gallery of images and polygons. OM - Other Macroalgae inc. Fucus; MB - Microphytobenthos; EM - Enteromorpha; SM - Saltmarsh; SG - Seagrass; DS - Dry Sand; OB - Other Bareground. Images with white polygons are examples of polygons used for modelling.
2.5. Fully Convolutional Neural Networks
Fully Convolutional Neural Networks ([31], [30], [49]) are an extension of tradi-
Figure 4. U-Net architecture and loss calculation. The input channels are stacked and passed through the network. The encoder network applies repeated convolution and max pooling operations to extract feature maps, while the decoder network upsamples these and stacks features from the corresponding layer in the encoder path. The output is a segmented map, which is compared with the mask using a cross-entropy loss. The computed loss is used to train the network through gradient descent optimisation.
For semi-supervised training, the Teacher-Student method was used [53]. This approach requires two networks, a teacher and a student model, with both models having the same architecture as shown in Figure 4. The student network was updated through gradient descent using two loss terms: a supervised loss calculated on the labelled pixels of each segmentation map and, conversely, an unsupervised loss calculated on non-labelled pixels. The teacher network was updated through an exponential moving average of the weights of the student network.
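The moving-average update can be written in a few lines. Below is a minimal sketch, assuming PyTorch models with identical architectures; the decay value of 0.99 is illustrative rather than taken from the paper.

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               decay: float = 0.99) -> None:
    """Update teacher weights as an exponential moving average (EMA)
    of the student weights, as in the Teacher-Student method above.
    The decay value is illustrative, not reported in the paper."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        # teacher = decay * teacher + (1 - decay) * student
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)
```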
2.5.1. Weighted training for FCNNs
Section 2.4 detailed the process of creating segmentation maps from polygons. Both sets of images from each camera had an imbalanced target class distribution. Figure 5 displays the number of labelled pixels per class and also the number of non-labelled pixels for each camera.
Figure 5. Distribution of labelled pixels for each class and non-labelled pixels.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 31 March 2021 doi:10.20944/preprints202103.0780.v1
Version March 30, 2021 submitted to Remote Sens. 8 of 21
The recorded distribution poses a challenge for classes such as other macro-algae and Microphytobenthos due to the relative number of labelled pixels in comparison with the remaining target classes. The pixel counts shown in Figure 5 were used to calculate the probability of each class occurring within the training set, and for each class a weight was calculated by taking the inverse of each probability and scaling the weight with respect to the total number of classes:
$$w_i = (p_i \cdot K)^{-1} \qquad (1)$$
where $w_i$ is the $i$th weight for a given class probability $p_i$ and $K$ is the total number of classes. During FCNN training, the supervised loss was scaled with respect to the weights generated in equation (1).
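As a worked illustration of equation (1), the sketch below derives the weights from per-class pixel counts such as those in Figure 5; the counts shown are hypothetical placeholders, not the values recorded for either camera.

```python
import torch

def class_weights(pixel_counts: torch.Tensor) -> torch.Tensor:
    """Equation (1): w_i = (p_i * K)^-1, where p_i is the probability of
    class i occurring in the training set and K the number of classes."""
    K = pixel_counts.numel()
    p = pixel_counts.float() / pixel_counts.sum()  # class probabilities p_i
    return 1.0 / (p * K)

# Hypothetical labelled-pixel counts for the seven classes (placeholder values)
counts = torch.tensor([4_000_000, 2_500_000, 900_000, 600_000, 150_000, 80_000, 30_000])
weights = class_weights(counts)  # the rarest classes receive the largest weights
```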
2.5.2. Supervised loss
For the supervised loss term, consider $X \in \mathbb{R}^{B \times C \times H \times W}$ and $Y \in \mathbb{Z}^{B \times H \times W}$ to be, respectively, a mini-batch of images and the corresponding segmentation maps, where $B$, $C$, $H$ and $W$ are, respectively, the batch size, number of input channels, height and width. Processing a mini-batch with the student network outputs per-pixel scores $\hat{Y} \in \mathbb{R}^{B \times K \times H \times W}$, where $K$ is the number of target classes. The softmax transfer function converts network scores into probabilities by normalising all $K$ scores for each pixel to sum to one:
$$P_k(x) = \frac{\exp \hat{Y}_k(x)}{\sum_{k'=1}^{K} \exp \hat{Y}_{k'}(x)} \qquad (2)$$
where $x \in \Omega$, $\Omega \subseteq \mathbb{Z}^2$, is a pixel location and $P_k(x)$ is the probability for the $k$th channel at pixel location $x$, with $\sum_{k'=1}^{K} P_{k'}(x) = 1$. The negative log-likelihood loss was calculated between segmentation maps and network probabilities:
$$L_s(P, Y) = \begin{cases} 0, & \text{if } Y(x) = -1 \\ -\sum_{k=1}^{K} Y_k(x) \log(P_k(x)), & \text{if } Y(x) \neq -1 \end{cases} \qquad (3)$$
For each image, the supervised loss was the sum of the losses for each pixel using eq. (3), averaged according to the number of labelled pixels within $Y$.
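In a framework such as PyTorch, equations (2) and (3) collapse into a single call, since the cross-entropy loss applies the softmax internally and can skip pixels labelled -1. A minimal sketch, assuming -1 marks non-labelled pixels as above:

```python
import torch
import torch.nn.functional as F

def supervised_loss(scores: torch.Tensor, target: torch.Tensor,
                    weights: torch.Tensor) -> torch.Tensor:
    """Equations (2)-(3): softmax over the K score channels followed by
    the negative log-likelihood on labelled pixels only.
    scores: (B, K, H, W) student outputs; target: (B, H, W) integer
    segmentation maps with -1 for non-labelled pixels; weights: (K,)
    per-class weights from equation (1). ignore_index=-1 excludes
    non-labelled pixels and the result is averaged over labelled ones."""
    return F.cross_entropy(scores, target, weight=weights, ignore_index=-1)
```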
2.5.3. Unsupervised loss
Previous work in semi-supervised segmentation used a Teacher-Student model and advanced data augmentation methods in order to create two images for each network to process ([54], [55]). While this work did not use data augmentation methods, pairs of images were created by using the labelled and non-labelled pixels within $Y$.
Similarly to the supervised loss term, a mini-batch of images was processed through both the student and teacher networks, producing per-pixel scores $\hat{Y}$ and $\bar{Y}$, respectively. Again, pixel scores were converted to probabilities with softmax (eq. 2), giving $P$ and $\bar{P}$, respectively, for the student and teacher networks. The maximum likelihood of the teacher predictions was used to create pseudo segmentation maps to compute the loss for the non-labelled pixels of $Y$. Thus, the unsupervised loss was calculated similarly to eq. (3), but the negative log-likelihood was computed between predictions from the student model ($P$) and a pseudo map ($Y^p$) for pixels that were initially labelled as -1:
$$L_u(P, Y^p) = \begin{cases} 0, & \text{if } Y(x) \neq -1 \\ -\sum_{k=1}^{K} Y^p_k(x) \log(P_k(x)), & \text{if } Y(x) = -1 \end{cases} \qquad (4)$$
For each image, the unsupervised loss was the sum of all losses for each pixel using eq. (4), averaged according to the number of non-labelled pixels within $Y$. The latter loss
was also scaled with respect to the confidence in the predictions of the teacher network, so that initial optimisation steps focus more on the supervised loss term. Classes with a low labelled pixel count benefit from the unsupervised loss term, as confident teacher predictions can guide the decision boundaries of student models by adding pseudo maps to consider.
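A sketch of equation (4) follows, with the confidence scaling implemented as the teacher's maximum class probability per pixel; the paper states only that the loss is scaled by teacher confidence, so this exact form is an assumption.

```python
import torch
import torch.nn.functional as F

def unsupervised_loss(student_scores: torch.Tensor,
                      teacher_scores: torch.Tensor,
                      target: torch.Tensor) -> torch.Tensor:
    """Equation (4): the teacher's maximum-likelihood class forms a hard
    pseudo map Y^p, and the student's negative log-likelihood is computed
    on pixels labelled -1. Scaling by the teacher's per-pixel confidence
    is one plausible form of the scaling described in the text."""
    teacher_probs = F.softmax(teacher_scores.detach(), dim=1)  # \bar{P}
    confidence, pseudo_map = teacher_probs.max(dim=1)          # Y^p per pixel
    unlabelled = target == -1                                  # non-labelled mask
    nll = F.cross_entropy(student_scores, pseudo_map, reduction="none")
    loss = (confidence * nll)[unlabelled]
    return loss.sum() / unlabelled.sum().clamp(min=1)
```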
2.5.4. Training parameters
Combining both loss terms yields the objective cost used for optimising FCNNs in a semi-supervised setting:
$$L = w L_s + \gamma L_u \qquad (5)$$
where $L_s$ and $L_u$ are, respectively, the supervised and unsupervised loss terms, with the former being scaled according to the weights computed in eq. (1) and the latter by $\gamma$, which was set to 0.1 for all experiments.
All networks were pre-trained on ImageNet. Networks for each camera were trained for 150 epochs with a batch size of 16 using the Adam optimiser. The learning rate was initially set to 0.001 and reduced by a factor of 10 every 70 epochs of training.
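Putting the pieces together, a minimal training loop under these settings might look as follows. The `student`, `teacher`, `loader` and `weights` objects are assumed to exist (the loss and EMA helpers are the sketches above); only the optimiser, schedule and loss combination follow the text.

```python
import torch

# Adam with the stated schedule: lr 0.001, reduced by 10x every 70 epochs.
optimiser = torch.optim.Adam(student.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimiser, step_size=70, gamma=0.1)
gamma_u = 0.1  # weight of the unsupervised term, as set for all experiments

for epoch in range(150):
    for images, target in loader:  # assumed dataloader of image/map pairs
        student_scores = student(images)
        with torch.no_grad():
            teacher_scores = teacher(images)
        # Equation (5): L = w * L_s + gamma * L_u; the class weights of
        # equation (1) are applied inside supervised_loss.
        loss = (supervised_loss(student_scores, target, weights)
                + gamma_u * unsupervised_loss(student_scores, teacher_scores, target))
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        ema_update(teacher, student)  # EMA teacher update (sketch above)
    scheduler.step()
```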
2.6. OBIA
The OBIA method for modelling multiple coastal features used eCognition v9.3 [56]. This software has the tools to process the high volume orthomosaics and shapefile exports from GISs to create supervised models. Section 2.4 detailed a number of methods used to pre-process the orthomosaics and shape polygons; however, the OBIA workflow does not require this.
The first step in OBIA is to process each orthomosaic using a multiscale segmentation to partition the image into segments, also known as image-objects. The segmentation starts with individual pixels and clusters pixels into image-objects using one or more criteria of homogeneity. Two adjacent image-objects, or image-objects that are a subset of each other, were merged together based on the following criterion:
$$h = \sum_{c} \left[ N (o^m_c - o^1_c) + M (o^m_c - o^2_c) \right] \qquad (6)$$
with $o^1$, $o^2$ and $o^m$ respectively representing the pixel values for object 1, object 2 and a candidate virtual merge $m$. $N$ and $M$ are, respectively, the total number of pixels for objects 1 and 2. This criterion evaluates the change in homogeneity during the fusion of image-objects. If this change exceeds a certain threshold value, the fusion is not performed. In contrast, if the change is lower, both candidates are clustered to form a larger segment. The segmentation procedure stops when no further fusions are possible without exceeding the threshold value. In eCognition, this threshold value is a hyper-parameter defined at the start of the process, also known as the scale parameter. The geometry of each shape is defined by two other hyper-parameters: shape and compactness. For this work, the scale parameter was set to 200, the shape to 0.1 and the compactness to 0.5. Figure 6 displays image-objects overlaid on top of both orthomosaics, and a concrete reading of equation (6) is sketched below.
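The snippet below evaluates the homogeneity change for a candidate merge as equation (6) describes it; the per-band statistics and pixel counts are invented for illustration, and eCognition's actual internals are more involved than this sketch.

```python
import numpy as np

def fusion_change(o1: np.ndarray, o2: np.ndarray, om: np.ndarray,
                  N: int, M: int) -> float:
    """Equation (6): change in homogeneity h when objects 1 and 2 are
    merged into virtual object m. o1, o2, om hold a per-band statistic
    of the pixel values for each image-object; N and M are the pixel
    counts of objects 1 and 2."""
    return float(np.sum(N * (om - o1) + M * (om - o2)))

# Invented per-band values for a 3-band image (illustration only)
o1 = np.array([0.12, 0.08, 0.10])
o2 = np.array([0.15, 0.11, 0.09])
om = np.array([0.16, 0.12, 0.12])
h = fusion_change(o1, o2, om, N=240, M=310)
# The merge proceeds only if h stays below the scale parameter (200 here).
```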
Table 1: Precision, recall and F1 scores for both algorithms on both cameras. DS - Dry Sand; OB - Other Bareground; EM - Enteromorpha; MB - Microphytobenthos; OM - Other macro-algae; SG - Seagrass; SM - Saltmarsh.
Figure 9. Segmented habitat maps for both cameras with OBIA. The top row of images is from the RedEdge3 multispectral camera and the bottom row from the SONY camera. Legend: OM - Other Macroalgae inc. Fucus; MB - Microphytobenthos; EM - Enteromorpha; SM - Saltmarsh; SG - Seagrass; DS - Dry Sand; OB - Other Bareground.
Figure 10. Segmented habitat maps for both cameras with FCNNs optimised using only the supervised loss. The top row of images is from the RedEdge3 multispectral camera and the bottom row from the SONY camera. Legend: OM - Other Macroalgae inc. Fucus; MB - Microphytobenthos; EM - Enteromorpha; SM - Saltmarsh; SG - Seagrass; DS - Dry Sand; OB - Other Bareground.
Figure 11. Segmented habitat maps for both cameras with FCNNs optimised with both the supervised loss and unsupervised loss. The top row of images is from the RedEdge3 multispectral camera and the bottom row from the SONY camera. Legend: OM - Other Macroalgae inc. Fucus; MB - Microphytobenthos; EM - Enteromorpha; SM - Saltmarsh; SG - Seagrass; DS - Dry Sand; OB - Other Bareground.
4. Discussion
The initial analysis of the results was done for each individual camera based on pixel accuracy, and an overall discussion for both cameras and methods is given using precision, recall and F1-score.
4.1. SONY ILCE-6000 analysis
The results for the SONY camera in terms of average pixel accuracy across all classes were better with OBIA than with FCNNs.
Predictions with the OBIA method had an average pixel accuracy of 90.6%. Classes related to sediment had scores of 100% and 98.38%, respectively, for dry sand and other bareground. This method also performed well for all algal classes listed in Section 2.3 and, in particular, predictions for other macro-algae scored considerably higher with OBIA than with FCNNs. As mentioned previously, this class had the least amount of labelled pixels (Figure 5), which posed a challenge for FCNN models. Algal classes scored 97.6%, 88.09% and 83.18%, respectively, for Enteromorpha, Microphytobenthos and other macro-algae (inc. Fucus). Seagrass predictions scored 93.67%, and saltmarsh was the worst-performing class for OBIA at 73.32%.
FCNNs with supervised and semi-supervised training yielded average class accuracies of 76.79% and 83.3%, respectively. Both approaches to training FCNNs had comparable scores to OBIA, with the exception of Enteromorpha and other macro-algae, which respectively scored 38.72% and 32.29% for supervised training and 57.05% and 55.90% for semi-supervised training. Other macro-algae was often misclassified as Enteromorpha, another algal class within the dataset, while Enteromorpha was often predicted as saltmarsh. The addition of the unsupervised loss term for semi-supervised training helped increase the pixel accuracy for other macro-algae, supporting the initial