Semi-Supervised Segmentation for Coastal Monitoring of Seagrass Using RPA Imagery

Intertidal seagrass plays a vital role in estimating the overall health and dynamics of coastal environments due to its interaction with tidal changes. However, most seagrass habitats around the globe have been in steady decline due to human impacts, disturbing the already delicate balance in the environmental conditions that sustain seagrass. Miniaturisation of multi-spectral sensors has facilitated very high resolution mapping of seagrass meadows, significantly improving the potential for ecologists to monitor changes. In this study, two analytical approaches for classifying intertidal seagrass habitats are compared: Object-based Image Analysis (OBIA) and Fully Convolutional Neural Networks (FCNNs). Both methods produce pixel-wise classifications in order to create segmented habitat maps. FCNNs are an emerging set of algorithms within Deep Learning, whereas OBIA has been a prominent solution within this field, with many studies leveraging in-situ data and multiresolution segmentation to create habitat maps. This work demonstrates the utility of FCNNs in a semi-supervised setting to map seagrass and other coastal features from an optical drone survey conducted at Budle Bay, Northumberland, England. Semi-supervision is also an emerging area within Deep Learning, with the practical benefit of achieving state-of-the-art results using only subsets of labelled data. This is especially beneficial for remote sensing applications, where in-situ data are an expensive commodity. Our results show that FCNNs have comparable performance with the standard OBIA method used by ecologists.


Introduction
Accurate and efficient mapping of seagrass extents is a critical task given the importance of these ecosystems in coastal settings and their use as a metric for ecosystem health. In particular, seagrass ecosystems play a key role in estimating and assessing the health and dynamics of coastal ecosystems due to their sensitive response to tidal processes [1][2][3][4] and to human-made interference [5][6][7]. Furthermore, seagrass plays a vital part in sediment stabilisation [8], pathogen reduction [9], carbon sequestration [10,11] and as a general indicator of water quality [12]. However, there is evidence that seagrass areas have been in steady decline due to human disturbance for decades [13].
In coastal monitoring, remote sensing has provided a major platform for ecologists to assess and monitor sites for a plethora of applications [14]. Traditionally, passive remote sensing via satellite imagery was used to provide global to regional observations at regular sampling intervals. However, it often struggles to overcome problems such as cloud contamination, oblique views and costs for data acquisition [15]. Another problem with satellite imagery is its coarse resolution. The shift to remotely piloted aircraft (RPAs) and commercially available cameras resolves the resolution problem by collecting several overlapping very high resolution (VHR) images [16] and stitching the sensor outputs together using Structure from Motion techniques to create orthomosaics [17]. The benefits of using these instruments are two-fold: firstly, the resolution of imagery can be user controlled with respect to drone altitude; secondly, sampling intervals are more accessible when compared with data acquisitions from satellite imagery. Advances in passive remote sensing have allowed coastal monitoring of intertidal seagrass, intertidal macroalgae and other species in study sites such as: Pembrokeshire, Wales [16]; Bay of Mont St Michel, France [18]; a combined study of Giglio island and the coast of Lazio, Italy [19] and Kilkieran Bay, Ireland [20], with the latter using a hyperspectral camera. However, seagrass mapping is not exclusive to passive remote sensing, with studies monitoring subtidal seagrass and benthic habitats using underwater acoustics in study sites such as: Bay of Fundy, Eastern Atlantic Canada [21]; Lagoon of Venice, Italy [22] and Abrir La Sierra Conservation District, Puerto Rico [23]. The main goal of these studies is to create a habitat map by classifying multispectral or acoustic data into sets of meaningful classes such that the spatial distribution of ecological features can be assessed [24].
This work also aims to produce a habitat map of Budle Bay, a large (2 km²) square estuary on the North Sea in Northumberland, England (55.625° N, 1.745° W). Two species of seagrass are of interest, namely Zostera noltii and Zostera angustifolia. However, this work will also consider all other coastal features of algae and sediment recorded in an in-situ survey conducted by the Centre for Environment, Fisheries and Aquaculture Science (CEFAS) and the Environment Agency.
Object Based Image Analysis (OBIA) [25] is an approach for habitat mapping that starts by performing an initial segmentation that clusters pixels into image-objects by maximising heterogeneity between said objects and homogeneity within them. A common image segmentation method used in OBIA is multiresolution segmentation (MRS), a non-supervised region-growing segmentation algorithm that provides the grounds for extracting textural, spatial and spectral features that can be used for supervised classification [26,27]. For habitat mapping of intertidal and subtidal species in coastal environments, OBIA has found successful applications using auxiliary in-situ data, that is, ground truth data via site visit, for supervised classification [16,20,[28][29][30][31][32][33][34][35]. A standard approach is to overlay in-situ data on generated image-objects through MRS so that selected objects are used to extract features that can create Machine Learning models. These models are then used to classify the remaining image-objects, thus creating a habitat map.
Developments in Computer Vision through Deep Learning have improved state-of-the-art results in a plethora of image processing tasks [36][37][38]. This emerging field in computer vision differs from most traditional Machine Learning approaches to supervised classification. Traditional methods can be defined by two separate components: feature extraction and model training. The former condenses raw data into numerical representations (features) best suited to represent inputs in the subsequent classification task, which maps the extracted features to outputs [39]. This is the same approach as adopted by OBIA, where an initial multiresolution segmentation provides the grounds to extract spectral features before utilising a Random Forest classifier to map the remaining image-objects. In fact, for remote sensing applications dimensionality reduction is a key processing stage that has been shown to provide good classification performance [40,41]. However, this process limits the ability of classifiers to process natural data in their raw form [42]. Deep Learning is an alternative approach allowing for hierarchical feature learning, which in effect combines learning features and training a classifier in one optimisation [39].
The introduction of Convolutional Neural Networks (CNNs) [36] within Deep Learning has proven pivotal for computer vision research and has been applied to remote sensing in a plethora of applications [43][44][45][46]. Fully Convolutional Neural Networks (FCNNs) are a variant of CNNs that can perform per-pixel classification [38,47], which is an equivalent output to habitat mapping using OBIA. Furthermore, CNNs can leverage semi-supervised strategies whereby subsets of labelled data are used for optimisation while still achieving state-of-the-art results [48], with similar strategies applied for FCNNs [49][50][51].
This approach can be beneficial for practical applications of FCNNs for remote sensing where the quantity and distribution of labelled data within a coastal environment may be limited due to associated costs of in-situ surveying.
In this work, habitat maps of Budle Bay for species of seagrass, algae and sediment were mapped using two analytical approaches that produce equivalent outputs-OBIA and FCNNs. The former has been a prominent solution for coastal surveying using remote sensing data, while the latter is a variant of CNNs that have been shown to provide promising results in established datasets [52,53] as well as other applications of remote sensing [43][44][45][46]. Furthermore, approaches for semi-supervised segmentation using FCNNs were investigated in order to discover whether an increase in performance can be achieved without supplementing FCNNs with more labelled data.
The remainder of this paper is organised as follows: Section 2.4 details the data collection and pre-processing necessary for FCNNs; both methods are explained and tailored to the study site in Sections 2.5 and 2.6; results are presented in Section 3; and an analysis of these results follows in Section 4.

Study Site
The research was focused on Budle Bay, Northumberland, England (55.625° N, 1.745° W). The coastal site has one tidal inlet, with previous maps also detailing the same inlet [54][55][56]. Sinuous and dendritic tidal channels are present within the bay, and bordering the channels are areas of seagrass and various species of macroalgae. Figure 1 displays very high resolution orthomosaics of Budle Bay created using Agisoft's MetaShape [57] and structure from motion (SfM). SfM techniques rely on estimating intrinsic and extrinsic camera parameters from overlapping imagery [58]. A combination of appropriate flight planning, in terms of altitude and aircraft speed, and the camera's field of view were important for producing good quality orthomosaics. Two sensors were used: a SONY ILCE-6000 camera with three wide-banded filters for Red, Green and Blue channels and a ground sampling distance of approximately 3 cm (Figure 1, bottom right), and a MicaSense RedEdge3 camera with five narrow-banded filters for Red (655–680 nm), Green (540–580 nm), Blue (459–490 nm), Red Edge (705–730 nm) and Near Infra-red (800–880 nm) channels and a ground sampling distance of approximately 8 cm (Figure 1, top right).

Data Collection
Each orthomosaic was orthorectified using respective GPS logs of camera positions and ground markers that were spread out across the site. This process ensures that both mosaics were well aligned with respect to each other, and also with ecological features present within the coastal site.

In-Situ Survey
CEFAS and the Environment Agency conducted ground and aerial surveys of Budle Bay in September 2017 and noted 13 ecological targets that can be grouped into background sediment, algae, seagrass and saltmarsh. Classes within the background sediment were rock, gravel, mud and wet sand. These features were modelled as one single class, and dry sand was added to further aid distinguishing sediment features. Algal classes included Microphytobenthos, Enteromorpha spp. and other macroalgae (inc. Fucus). Lastly, the remaining coastal vegetation classes were seagrass and saltmarsh. Since the aim is to map the areas of seagrass in general, both species Zostera noltii and Zostera angustifolia were merged into a single class, while saltmarsh remains a single class although two different species were noted. Thus, a total of seven target classes can be listed. The in-situ survey recorded 108 geographically referenced tags with the percentage cover of all ecological features previously listed within a 300 mm radius. These were dispersed mainly on the Western, Central and Southern portions of the site. Figure 2 displays the spatial distribution of recorded measurements by placing a point for each tag over the orthophoto generated using the SONY camera.

Data Pre-Processing for FCNNs
The orthomosaic from the SONY camera was 87,730 × 72,328 pixels with 3 image bands, while the RedEdge3 multispectral orthomosaic was 32,647 × 26,534 pixels with 5 image bands. For ease of processing, each orthomosaic was split into non-overlapping blocks of 6000 × 6000 pixel images, with each image containing geographic information to be used for further processing. The SONY orthomosaic was split into 140 tiles and the RedEdge3 into 24.
The recorded percentage covers were used to classify each point in Figure 2 to a single ecological class listed in Section 2.3 based on the highest estimated cover during the in-situ survey. The classification for each point provides the basis to create geographically referenced polygon files through photo interpretation. This process generated a total of 56 polygons that were split into train and test sets. The train set had 42 polygons and the test set 14. The reasoning for using photo interpretation instead of selecting segmented image-objects was to avoid bias from the OBIA when generating segmentation maps used for FCNN training. Figure 3 displays a gallery of images for each class with some example polygons.

Polygons to Segmentation Masks for FCNNs
Each polygon contains a unique semantic value depending on the recorded class. FCNNs were trained with segmentation maps that contain a one-to-one mapping of pixels encoded with a semantic value, with the goal of optimising this mapping [47]. Segmentation maps used for training FCNNs were created using the geographic coordinates stored in each polygon, converting the real-world coordinates of each vertex to image coordinates. If a polygon fits within an image, the candidate image was sampled into 256 × 256 image tiles centered on labelled sections of the image. Because images were cropped centered on polygons, the edges of each image contain a number of pixels that were not labelled. The difference in spatial resolution for each camera results in a difference in labelled pixels, since each polygon covers the same real-world area. This process generated 534 images with the RedEdge3 multispectral camera from both sets of polygons. Polygons from the train set were split into 363 images for training and 69 images for validation, while test set polygons generated 102 images. The SONY camera produced 1108 images from both sets of polygons. The train set was split into 770 images for training and 125 for validation, and the test set of polygons generated 213 images.
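The coordinate conversion and centered cropping described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `(x_origin, y_origin, pixel_size)` transform tuple is a simplifying assumption (real orthomosaics carry a full affine geotransform), and the function names are illustrative.

```python
import numpy as np

def world_to_pixel(transform, x, y):
    """Convert a world coordinate to (row, col) image indices.
    `transform` is a hypothetical (x_origin, y_origin, pixel_size) tuple,
    assuming north-up imagery with square pixels."""
    x0, y0, res = transform
    col = int(np.floor((x - x0) / res))
    row = int(np.floor((y0 - y) / res))   # image rows grow downwards
    return row, col

def centered_tile(image, mask, row, col, size=256):
    """Crop a size x size tile centered on (row, col), clamped to image bounds."""
    h, w = image.shape[:2]
    r0 = min(max(row - size // 2, 0), h - size)
    c0 = min(max(col - size // 2, 0), w - size)
    return image[r0:r0 + size, c0:c0 + size], mask[r0:r0 + size, c0:c0 + size]
```

Clamping the crop window to the image bounds keeps every tile exactly 256 × 256 even when a polygon sits near an image edge.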

Vegetation, Soil and Atmospheric Indices for FCNNs
Vegetation, soil and atmospheric indices are derivations from standard Red, Green and Blue and/or Near-infrared image bands that can aid in discerning multiple vegetation classes [59]. Near-infrared, red, green and blue bands from the RedEdge3 were used to compute a variety of indices, adding five bands of data to each input image. These extra bands were: Normalised Difference Vegetation Index (NDVI) [60], Atmospheric Resistant Vegetation Index (IAVI) [61], Modified Soil Adjusted Vegetation Index (MSAVI) [62], Modified Chlorophyll Absorption Ratio Index (MCARI) [63] and Green Normalised Difference Vegetation Index (GNDVI) [64]. The red, green and blue channels for both cameras were used to compute four additional indices, namely Visible Atmospherically Resistant Index (VARI) [65], Visible-band Difference Vegetation Index (VDVI) [66], Normalised Green-Blue Difference Vegetation Index (NGBDI) [67] and Normalised Green-Red Difference Vegetation Index (NGRDI) [68]. These indices were chosen mostly due to the importance of the green channel for measuring reflected vegetation spectra, while also providing more data for FCNNs to start with before modelling complex one-to-one mappings for each pixel.
The above index images were stacked along the third dimension onto each image resulting in images for the RedEdge3 and the Sony camera having 14 and 7 bands respectively. Furthermore, each individual image band was scaled to a value between 0 and 1.

Fully Convolutional Neural Networks
Fully Convolutional Neural Networks [38,47,69] are an extension of traditional CNN architectures [36,70] adapted for semantic segmentation. CNNs usually comprise a series of layers that process lower-layer inputs through repeated convolution and pooling operations, followed by a final classification layer or layers. Each convolution and pooling layer transforms the input image into higher-level abstracted representations. FCNNs can be broken down into two networks: an encoder and a decoder. The encoder network is identical to a CNN, except that the final classification layer is removed. The decoder network applies alternating upsample and convolution operations on feature maps created by the encoder network, followed by a final classification layer with 1 × 1 convolution kernels and a softmax function. Network weights and biases are adjusted through gradient descent by minimising the loss function between network outputs and the ground truth pixel labels. Figure 4 displays the architecture used for this work. The overall architecture is a U-Net [38] and the encoder network is a ResNet101 [71] pre-trained on ImageNet. Residual learning has been shown to ease the training of very deep neural networks [71], making ResNet101 a suitable encoder for the overall U-Net architecture. The decoder network applies a transposed 2 × 2 convolution for upsampling while also concatenating feature maps from the encoding stage at the appropriate resolutions, followed by a 3 × 3 convolution. The final 1 × 1 convolution condenses feature maps to the same number of channels as the total number of classes before a softmax transfer function classifies each pixel. The input channels are stacked and passed through the network: the encoder applies repeated convolution and max pooling operations to extract feature maps, while the decoder upsamples these and concatenates features from the corresponding layers in the encoder path.
The output is a segmented map, which is compared with the ground truth mask using cross-entropy loss. The computed loss is used to train the network through gradient descent optimisation.
For semi-supervised training the Teacher-Student method was used [48]. This approach requires two networks, a teacher and a student, both having the same architecture as shown in Figure 4. The student network is updated through gradient descent, minimising the sum of two loss terms: a supervised loss calculated on labelled pixels of each segmentation map and, conversely, an unsupervised loss calculated using non-labelled pixels. The teacher network is updated using an exponential moving average of weights from the student network.
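The exponential moving average update of the teacher can be sketched in a few lines. Models are represented here as plain dicts of parameters (a simplification of real networks), and the decay rate `alpha` is a hypothetical value, not one stated in the text.

```python
def ema_update(teacher, student, alpha=0.99):
    """Teacher-Student semi-supervision: after each student gradient step,
    the teacher's weights track an exponential moving average of the
    student's. `alpha` close to 1 makes the teacher change slowly."""
    for name, w in student.items():
        teacher[name] = alpha * teacher[name] + (1.0 - alpha) * w
    return teacher
```

A slowly moving teacher gives more stable pseudo-labels for the unsupervised loss than the rapidly changing student would.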

Weighted Training for FCNNs
Section 2.4 detailed the process of creating segmentation maps from polygons. Both sets of images from each camera had an imbalanced target class distribution. Figure 5 shows the number of labelled pixels per class and also the number of non-labelled pixels for each camera.
The recorded distribution poses a challenge for classes such as other macroalgae and Microphytobenthos due to the relative number of labelled pixels in comparison with the remaining target classes. The pixel counts shown in Figure 5 were used to calculate the probability of each class occurring within the training set, and for each class a weight was calculated by taking the inverse of that probability:

w_i = 1 / p_i,    i = 1, …, K,    (1)

where w_i is the ith weight for a given class probability p_i and K is the total number of classes. During FCNN training the supervised loss was scaled with respect to these weights.

Figure 5. Distribution of labelled pixels for each class and non-labelled pixels.
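The inverse-frequency weighting of Equation (1) can be computed directly from per-class pixel counts; a minimal sketch (function name is illustrative):

```python
import numpy as np

def class_weights(pixel_counts):
    """Inverse-frequency class weights (Equation (1)): rare classes get
    larger weights so the supervised loss is not dominated by common ones."""
    counts = np.asarray(pixel_counts, dtype=float)
    p = counts / counts.sum()   # probability of each class in the train set
    return 1.0 / p              # w_i = 1 / p_i
```

A class covering 10% of labelled pixels receives a weight of 10, while a class covering 90% receives roughly 1.1, so their contributions to the loss are balanced.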

Supervised Loss
For the supervised loss term, consider X ∈ ℝ^(B×C×H×W) and Y ∈ ℤ^(B×H×W) to be, respectively, a mini-batch of images and corresponding segmentation maps, where B, C, H and W are, respectively, batch size, number of input channels, height and width. Processing a mini-batch with the student network outputs per-pixel scores Ŷ ∈ ℝ^(B×K×H×W), where K is the number of target classes. The softmax transfer function converts network scores into probabilities by normalising all K scores for each pixel to sum to one:

P_k(x) = exp(Ŷ_k(x)) / Σ_{j=1}^{K} exp(Ŷ_j(x)),    (2)

where x ∈ Ω, Ω ⊆ ℤ², is a pixel location and P_k(x) is the probability for the kth channel at pixel location x, with Σ_{k=1}^{K} P_k(x) = 1. The negative log-likelihood loss is calculated between segmentation maps and network probabilities:

ℓ_s(x) = −log P_{Y(x)}(x).    (3)
For each image, the supervised loss is the sum of all losses for each pixel using Equation (3) and averaged according to the number of labelled pixels within Y.
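The softmax and masked negative log-likelihood of Equations (2) and (3), averaged over labelled pixels only, can be sketched in numpy (the `ignore` value marking non-labelled pixels and the small epsilon inside the log are assumed implementation details):

```python
import numpy as np

def softmax(scores):
    # scores: (K, H, W) per-pixel class scores -> per-pixel probabilities
    e = np.exp(scores - scores.max(axis=0, keepdims=True))  # stable softmax
    return e / e.sum(axis=0, keepdims=True)

def supervised_loss(scores, labels, ignore=-1):
    """Negative log-likelihood averaged over labelled pixels only.
    labels: (H, W) integer map; `ignore` marks non-labelled pixels."""
    p = softmax(scores)
    rows, cols = np.nonzero(labels != ignore)       # labelled pixels
    picked = p[labels[rows, cols], rows, cols]      # P_{Y(x)}(x)
    return float(-np.log(picked + 1e-12).mean())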

Unsupervised Loss
Previous work in semi-supervised segmentation details using a Teacher-Student model and advanced data augmentation methods in order to create two images for each network to process [49,50]. While this work did not use data augmentation methods, pairs of images were created using labelled and non-labelled pixels within Y.
Similarly to the supervised loss term, a mini-batch of images is passed through both the student and the teacher networks, producing per-pixel scores Ŷ and Ȳ respectively. Again, pixel scores are converted to probabilities with softmax (Equation (2)), producing P̂ and P̄ for the student and teacher networks respectively. The maximum-likelihood of teacher predictions was used to create pseudo segmentation maps to compute the loss for non-labelled pixels of Y. Thus, the unsupervised loss is calculated similarly to Equation (3), but the negative log-likelihood is computed between predictions from the student model (P̂) and a pseudo map (Y_p) of pixels that are initially non-labelled:

ℓ_u(x) = −log P̂_{Y_p(x)}(x),    Y_p(x) = argmax_k P̄_k(x).    (4)
For each image, the unsupervised loss was the sum of all losses for each pixel using Equation (4) averaged according to the number of non-labelled pixels within Y. The latter loss was also scaled with respect to the confidence in predictions for the teacher network so that initial optimisation steps focus more on the supervised loss term. Classes with low labelled pixel count would benefit from the unsupervised loss term, as confident teacher predictions can guide the decision boundaries of student models by adding pseudo maps to consider.
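The pseudo-label loss of Equation (4), scaled by teacher confidence as described above, can be sketched as follows. Scaling each pixel's loss by the teacher's maximum probability is one plausible reading of "scaled with respect to the confidence in predictions"; the exact scaling used by the authors is not specified.

```python
import numpy as np

def unsupervised_loss(student_p, teacher_p, labels, ignore=-1):
    """Pseudo-label loss on non-labelled pixels, scaled by teacher
    confidence so early, uncertain teacher predictions contribute little.
    student_p, teacher_p: (K, H, W) per-pixel probabilities."""
    pseudo = teacher_p.argmax(axis=0)            # pseudo segmentation map Y_p
    conf = teacher_p.max(axis=0)                 # teacher confidence per pixel
    rows, cols = np.nonzero(labels == ignore)    # non-labelled pixels only
    nll = -np.log(student_p[pseudo[rows, cols], rows, cols] + 1e-12)
    return float((conf[rows, cols] * nll).mean())
```

Early in training the teacher is uncertain, so `conf` is low and the supervised term dominates; as the teacher sharpens, pseudo-labels increasingly guide the student on unlabelled regions.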

Training Parameters
Combining both loss terms yields the objective cost used for optimising FCNNs in a semi-supervised setting:

L = L_s + γ L_u,    (5)

where L_s and L_u are, respectively, the supervised and unsupervised loss terms. The supervised loss was scaled according to the weights computed in Equation (1) and the unsupervised loss by γ, which was set to 0.1 for all experiments. All networks were pre-trained on ImageNet. Networks for each camera were trained for 150 epochs with a batch size of 16 using the Adam optimiser. The learning rate was initially set to 0.001 and reduced by a factor of 10 every 70 epochs of training. All FCNNs were implemented and trained using PyTorch.
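The combined objective and the step-decay schedule described above can be sketched in plain Python; the function names are illustrative and the real training loop uses PyTorch's Adam optimiser and autograd rather than these standalone helpers.

```python
def total_loss(l_sup, l_unsup, gamma=0.1):
    # combined semi-supervised objective: L = L_s + gamma * L_u
    return l_sup + gamma * l_unsup

def step_lr(epoch, base_lr=1e-3, drop_every=70, factor=10.0):
    # learning rate starts at 1e-3 and is divided by 10 every 70 epochs
    return base_lr / factor ** (epoch // drop_every)
```

Over the 150-epoch schedule the learning rate thus takes three values: 1e-3 for epochs 0–69, 1e-4 for 70–139, and 1e-5 thereafter.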

OBIA
The OBIA method for modelling multiple coastal features was performed using eCognition v9.3 [72]. This software has the tools to process high resolution orthomosaics and shape file exports from GIS software to create supervised models. Section 2.4 detailed a number of methods used to pre-process the orthomosaics and shape polygons; however, OBIA does not require these steps.
The first step in OBIA is to process each orthomosaic using a multiresolution segmentation algorithm to partition the image into segments, also known as image-objects [72]. The segmentation starts with individual pixels and clusters them into image-objects using one or more criteria of homogeneity. Two adjacent image-objects, or image-objects that are a subset of each other, are merged based on the following criterion:

Δh = (N + M) σ(o_m) − [N σ(o_1) + M σ(o_2)],    (6)

where o_1, o_2 and o_m respectively represent the pixel values for objects 1, 2 and a candidate virtual merge m, σ(·) denotes their standard deviation, and N and M are the total number of pixels, respectively, for objects 1 and 2. This criterion evaluates the change in homogeneity during fusion of image-objects. If this change exceeds a certain threshold value, the fusion is not performed; in contrast, if the change is below the threshold, both candidates are clustered to form a larger region. The segmentation procedure stops when no further fusions are possible without exceeding the threshold value. In eCognition, this threshold value is a hyper-parameter defined at the start of the process and is also known as the scale parameter. The geometry of each shape is controlled by two other hyper-parameters: shape and compactness. For this work, the scale parameter was set to 200, the shape to 0.1 and the compactness to 0.5. Figure 6 shows image-objects overlaid on top of both orthomosaics. Section 2.4.1 detailed the split of polygons used for training and testing. Each polygon (Figure 3) from the training set was overlaid on top of image-objects to select the candidate segments for extracting spectral features. Selected image-objects create a database for the in-built Random Forest [73] in eCognition.
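The fusion criterion above can be sketched for a single spectral band. This is a simplified illustration of the region-growing decision, assuming the standard-deviation form of the heterogeneity measure; eCognition's full criterion also blends in the shape and compactness terms, which are omitted here.

```python
import numpy as np

def merge_change(o1, o2):
    """Change in spectral heterogeneity for a candidate fusion of two
    image-objects (single band): (N+M)*sigma_m - (N*sigma_1 + M*sigma_2)."""
    o1, o2 = np.asarray(o1, float), np.asarray(o2, float)
    om = np.concatenate([o1, o2])   # virtual merge of both objects
    n, m = o1.size, o2.size
    return (n + m) * om.std() - (n * o1.std() + m * o2.std())

def should_merge(o1, o2, scale=200.0):
    # fuse only while the increase in heterogeneity stays below the
    # scale parameter (the threshold hyper-parameter in eCognition)
    return merge_change(o1, o2) < scale
```

Two objects with identical pixel statistics incur no change in heterogeneity and always merge, while spectrally dissimilar objects produce a large change and stay separate, which is how a larger scale parameter yields larger image-objects.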
The spectral features for the RedEdge3 camera were channel mean and standard deviation, vegetation and soil indices (NDVI, RVI, GNDVI, SAVI), ratios between red/blue, red/green and blue/green image layers, and the intensity and saturation components of the HSI colour space. The features for the SONY were the same, except that the vegetation and soil indices were not added. Once the features and image-objects were selected, the Random Forest modeller produced a number of Decision Trees [74], with each tree optimised on features using the Gini index.

Accuracy Assessment
The measurements used to objectively quantify results were pixel accuracy, precision, recall and F1-score. Pixel accuracy was the ratio between pixels that were classified correctly and the total number of labelled pixels within the test set for a given class. Precision and recall are metrics that show how a classifier performs for each specific class. F1-score is the harmonic mean of precision and recall and is therefore a suitable metric to quantify classifier performance when a single figure of merit is needed:

F1 = 2 · (precision · recall) / (precision + recall).    (7)
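All three per-class metrics can be derived from a confusion matrix in a few lines; a minimal sketch, assuming rows hold true classes and columns hold predictions (the small epsilon avoids division by zero for absent classes):

```python
import numpy as np

def per_class_metrics(conf):
    """Precision, recall and F1-score per class from a confusion matrix
    (rows = true class, columns = predicted class)."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    precision = tp / (conf.sum(axis=0) + 1e-12)   # TP / (TP + FP)
    recall = tp / (conf.sum(axis=1) + 1e-12)      # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    return precision, recall, f1
```

This makes explicit why a class can score high pixel accuracy (recall) yet a low F1: many false positives in its column drag down precision, as observed for several FCNN classes in the results.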

Results
The outputs for both the FCNNs and OBIA were compared with a subset of polygons that were not used for training. Figures 7 and 8 display confusion matrices scoring outputs from each method and camera as pixel accuracy. The confusion matrices also show pixel accuracies for FCNN models that were optimised using only Equation (3) and models that were optimised using both Equations (3) and (4). The confusion matrices reflect average results over three training runs with the set of hyper-parameters described in Section 2.5.4. Overall results for OBIA and FCNNs in a semi-supervised setting for each camera can be viewed in Table 1, where precision, recall and F1-score are reported. Figure 9 displays habitat maps for each method and camera.

SONY ILCE-6000 Results
Predictions with the OBIA method had an average pixel accuracy of 90.6%. Classes related to sediment had scores of 100% and 98.38%, respectively for dry sand and other bareground. Algal classes scored 97.6%, 88.09% and 83.18%, respectively for Enteromorpha, Microphytobenthos and other macroalgae (inc. Fucus). Seagrass predictions scored 93.67%, and saltmarsh was the worst performing class for the OBIA at 73.32%.
FCNNs yielded an average class accuracy of 76.79% and 83.3%, respectively for supervised and semi-supervised settings. Both approaches scored close to 100% for dry sand, and other bareground performed better in a semi-supervised setting, scoring 96.88%. Scores for Enteromorpha and other macroalgae (inc. Fucus) were respectively 38.72% and 32.29% for supervised training and 57.05% and 55.90% for semi-supervised training. Seagrass scored similarly in both training settings at approximately 90%, and saltmarsh scored better in a supervised setting with 87.78%, while the semi-supervised setting scored 81%.

MicaSense RedEdge3 Results
The OBIA method had an average pixel accuracy of 73.4%. FCNNs yielded an average class accuracy of 85.27% and 88.44%, respectively for supervised and semi-supervised settings. Both models had good scores for sediment classes, scoring above 95% in pixel accuracy. Algal classes of Enteromorpha, Microphytobenthos and other macroalgae (inc. Fucus) respectively scored, for supervised and semi-supervised training, (88.22%, 96.29%), (89.40%, 89.72%) and (47.93%, 58.39%). Seagrass predictions scored 77.68% and 70.23%, respectively for supervised and semi-supervised training, while saltmarsh was found to score 99% in both settings. Figure 9 shows the habitat maps of Budle Bay for each camera and method previously described. Results in Table 1 indicate that FCNNs provide comparable performance to OBIA. Figures 7 and 8 also show an increase in performance for the semi-supervised FCNN models in comparison to the fully-supervised ones.

FCNNs Convergence
The convergence of FCNNs was analysed by testing multiple settings for learning rate and assessing computed confusion matrices as well as training and validation losses over several runs of the algorithm. This ensured that all models converged appropriately. Figures 7 and 8 show average pixel accuracy scores over three sequential runs with the same hyper-parameters described in Section 2.5.4.

SONY ILCE-6000 Analysis
Habitat maps from the SONY camera were found to perform better with the OBIA than FCNNs in terms of average pixel accuracy and F1-score. Respectively, the OBIA had an average accuracy and F1-score of 90.6% and 0.71, while FCNNs in a semi-supervised setting had 83.3% and 0.65.
Sediment class predictions for both methods scored well, with both metrics either scoring above 90% or above 0.9, respectively for pixel accuracy and F1-score. This suggests that the OBIA and FCNNs methods successfully predicted test polygons for sediment classes while also avoiding false positive and false negative pixel classifications.
Algal classes were found to have mixed performance depending on the method used. Scores in Figure 7 with OBIA noted that classes of Enteromorpha and other macroalgae (inc. Fucus) scored better, while Microphytobenthos was more accurate with FCNNs. However, scores in Table 1 for the same classes suggest that OBIA performed better for Enteromorpha and Microphytobenthos, while FCNNs scored better for other macroalgae. Analysing areas in Figure 9 that were predicted as Enteromorpha with OBIA and comparing these areas with FCNN habitat maps shows that the latter method interchangeably predicts Enteromorpha and saltmarsh. This observation is supported by Figure 7, where 60.43% and 41.14% of test labels for Enteromorpha were predicted as saltmarsh, respectively for supervised and semi-supervised settings. These points suggest that habitat maps detailing areas of Enteromorpha with OBIA were more likely to be correct. Pixel classifications in Figure 7 for Microphytobenthos indicate that FCNNs performed well and accurately mapped test polygons of Microphytobenthos; however, figures for precision and F1 in Table 1 also indicate that FCNNs have a high false positive rate for this class. Conversely, OBIA produced a perfect figure for precision, which indicates that no pixel classifications for test polygons were false positives. This high false positive rate for Microphytobenthos can be noticed by comparing the areas mapped as other bareground using OBIA that were mapped as Microphytobenthos by FCNNs. Therefore, habitat maps with OBIA were more likely to be correct for predictions of Microphytobenthos. Other macroalgae (inc. Fucus) was found to be a problematic class for FCNNs due to the low number of labelled pixels relative to the rest of the dataset (Figure 5). Confusion matrices in Figure 7 show that other macroalgae were often classified as Enteromorpha, another alga present in Budle Bay.
However, they also show that the semi-supervised results were much better than those in the supervised setting, which supports the premise in Section 2.5.3 that an unsupervised loss term on pseudo segmentation maps could help datasets with a relatively low number of labelled pixels. While scores show that OBIA performs better on classification of other macroalgae, Table 1 shows that the F1-score was lower with OBIA than FCNNs, mainly due to OBIA's low precision score. Habitat maps in Figure 9 show that most areas classified as other macroalgae are similar for both approaches.
The confusion matrices also show that scores for seagrass are high for both methods. However, Table 1 shows that the precision figures were 0.64 and 0.27 for OBIA and FCNNs, respectively. This again suggests a high false positive rate for FCNNs, and the habitat maps in Figure 9 also detail more areas mapped as seagrass with FCNNs than with OBIA. Therefore, areas mapped as seagrass with OBIA were more likely to be correct than those mapped with FCNNs. The results for saltmarsh were in general very similar for both methods. The confusion matrices show that pixel accuracy for saltmarsh polygons was 73.32% with OBIA, and 87.78% and 81.0% with FCNNs in the supervised and semi-supervised settings, respectively. The F1-score was 0.84 and 0.88 for OBIA and FCNNs, respectively. This suggests that OBIA was more likely to classify pixels within saltmarsh polygons incorrectly, although overall both maps present similar areas mapped as saltmarsh.
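The way precision, recall and F1 figures such as those in Table 1 relate to false positives and false negatives can be made concrete with a small worked example; the pixel counts below are illustrative, not taken from the survey:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from per-class pixel counts.

    tp: true positives  (class pixels predicted correctly)
    fp: false positives (other pixels predicted as the class)
    fn: false negatives (class pixels predicted as something else)
    """
    precision = tp / (tp + fp)   # low precision => many false positives
    recall = tp / (tp + fn)      # low recall    => many false negatives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Illustrative counts: a model that finds most true class pixels
# (high recall) but also labels many other pixels as the class gets
# a low precision, dragging the F1-score down.
p, r, f1 = precision_recall_f1(tp=900, fp=2400, fn=100)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.27 0.9 0.42
```

This is why a high confusion-matrix (recall) score can coexist with a low precision figure: the two metrics penalise different error types, and F1 exposes the imbalance.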

MicaSense RedEdge3 Analysis
Habitat maps from the RedEdge3 multispectral camera were found to be more accurate with FCNNs than with OBIA in terms of both average pixel accuracy and F1-score: OBIA achieved an average accuracy of 73.4% and an F1-score of 0.60, while the semi-supervised FCNN achieved 88.44% and 0.78. For the sediment classes, FCNNs also performed better than OBIA in terms of both pixel accuracy and F1-score. The confusion matrix for OBIA in Figure 8 shows that 35.82% of pixels in dry sand polygons were classified as other bareground, while Table 1 reports a precision of 0.99 and a recall of 0.62. This would suggest that false negative classifications for dry sand were mostly other bareground. Figure 8 shows that FCNNs in both settings achieved confusion matrix scores of 98%, and the semi-supervised setting had an F1-score of 0.98, which suggests that FCNNs accurately mapped the dry sand test polygons. However, the habitat maps in Figure 9 show some differences in the areas mapped as dry sand by each method. In particular, supervised FCNNs classified larger areas as dry sand, whereas semi-supervised FCNNs produced results similar to OBIA. OBIA classified 56.49% of other bareground polygon pixels as Microphytobenthos. In Section 2.3, other bareground was noted to include wet sand, while Microphytobenthos comprises unicellular eukaryotic algae and cyanobacteria that grow within the upper millimetres of illuminated sediments, typically appearing only as a subtle greenish shading [75]. This could explain why other bareground and Microphytobenthos were interchangeably classified with OBIA. As with dry sand, FCNNs performed well in terms of both pixel accuracy and F1-score, which suggests that other bareground polygons were classified correctly without producing many false positives. Figure 8 and Table 1 show that the scores for algal classes were higher with FCNNs than with OBIA.
However, both methods were in fact similar in terms of these figures, with the exception of the F1-score for other macroalgae with OBIA. The confusion matrix in Figure 8 shows that both OBIA and FCNN classifications for Microphytobenthos exhibited poor precision. As with the SONY camera, this can be seen in Figure 9, where large areas were predicted as Microphytobenthos instead of other bareground, especially by FCNNs in the supervised setting. Both methods mapped Enteromorpha in similar areas, but FCNNs also classified Enteromorpha in the centre and along the south-eastern boundary of the site, while OBIA predicted mostly seagrass and other bareground in the same areas. The other macroalgae class produced better results with FCNNs than with OBIA. Moreover, comparing the supervised and semi-supervised models, we note an increase in performance when the unsupervised loss term was added to the training algorithm, which supports the initial hypothesis that the unsupervised loss term aids FCNNs with target classes that have a low number of labelled pixels relative to the remaining classes.
The remaining vegetation classes of seagrass and saltmarsh performed well with both methods; however, OBIA was found to perform better with respect to seagrass classifications. Both Figure 8 and Table 1 support this, with recall scores being lower with FCNNs than with OBIA. As mentioned, low recall indicates a high false negative rate, and interestingly none of the FCNNs predicted seagrass along the north-western part of the site (the area covered in Figure 6). While it is not possible to quantify which method was correct without surveying the site again, the confidence in OBIA's seagrass predictions, together with the FCNNs predicting bareground sediment instead of vegetation, can lead users to be more confident in OBIA for seagrass mapping. Both methods performed similarly for saltmarsh, and the habitat maps in Figure 9 show that most predicted areas were similar; however, FCNNs were found to interchangeably classify saltmarsh and seagrass, which is also supported by Figure 8, where each confusion matrix for FCNNs shows a number of seagrass test polygon pixels predicted as saltmarsh.

Overall Analysis
In the discussion of the results for both cameras, we found two key results. The first is that OBIA continues to be a suitable method for intertidal seagrass mapping while assessing multiple coastal algal and sediment features within a site. Figures 7 and 8 as well as Table 1 report pixel accuracies and F1-scores that suggest some degree of confidence in the areas classified as seagrass with OBIA within the maps shown in Figure 9. A plethora of other studies have mapped intertidal seagrass using OBIA with encouraging results [16,19,76,77]. However, this work also made a direct comparison between FCNNs and OBIA and showed that OBIA outperformed the proposed method with respect to intertidal seagrass mapping. Furthermore, the provided analysis recorded accuracies for supervised classifications at a pixel level. Some work on intertidal seagrass mapping gives confusion matrices for supervised classification where accuracies reflect the percentage of image-objects segmented through MRS that were classified correctly [19], or of geographically referenced shape points [77]. The work in [76] also analysed OBIA for intertidal seagrass mapping at a pixel level; however, that work also considered mapping intertidal seagrass at various density levels, which adds complexity to the mapping task. Indeed, seagrass mapping can also be framed as a regression problem instead of a classification problem [16,78]. Other work using FCNNs for seagrass mapping can be found in [79][80][81]; however, these studies were mainly concerned with subtidal rather than intertidal seagrass meadows. FCNNs have been used for mapping intertidal macroalgae [82], with a reported average accuracy of 91.19% on a 5-class problem. Yet, this work considered mapping intertidal macroalgae, seagrass and sediment features at a coarser resolution. Indeed, to the authors' knowledge, this was the first use of FCNNs for intertidal seagrass mapping.
The second key result is that, although FCNNs performed worse for seagrass mapping, the overall results in Section 3 show that FCNNs had comparable performance with OBIA in terms of average pixel accuracy and F1-score. Moreover, Figures 7 and 8 as well as the habitat maps in Figure 9 showed that a semi-supervised setting can increase the overall performance of FCNNs, reducing the need for more labelled data. This was particularly true for other macroalgae (inc. Fucus), which benefited the most from semi-supervised training. Recent applications of semi-supervised segmentation have been shown to produce state-of-the-art results with subsets of labelled data [49,50,83,84], which can provide alternative modelling approaches for FCNNs in practical applications where labelled data is limited. Studies within remote sensing often have very limited amounts of labelled data, and recent trends show that weakly-supervised and semi-supervised training regimes may be utilised to overcome this problem [85][86][87]. In particular, [87] applies adversarial training for seagrass mapping to overcome the domain shift incurred when mapping in different coastal environments, while this work leverages the non-labelled parts of each image to produce pseudo-labels in a Teacher-Student framework.
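The combined loss in a Teacher-Student scheme of this kind can be sketched as follows; the confidence threshold, the unsupervised weighting and the function names are illustrative assumptions rather than the exact formulation used in this work:

```python
import math

def pixel_ce(probs, label):
    """Cross-entropy of one pixel's predicted class probabilities."""
    return -math.log(probs[label])

def semi_supervised_loss(student_probs, labels, teacher_probs,
                         unsup_weight=0.5, conf_threshold=0.8):
    """Combine a supervised loss on labelled pixels with an
    unsupervised loss on confident teacher pseudo-labels.

    student_probs : per-pixel class probabilities from the student
    labels        : ground-truth class index per pixel, None if unlabelled
    teacher_probs : per-pixel class probabilities from the teacher
    """
    sup, unsup = 0.0, 0.0
    for s, y, t in zip(student_probs, labels, teacher_probs):
        if y is not None:                  # labelled pixel: supervised term
            sup += pixel_ce(s, y)
        elif max(t) >= conf_threshold:     # confident teacher pseudo-label
            pseudo = t.index(max(t))
            unsup += pixel_ce(s, pseudo)
    return sup + unsup_weight * unsup

# Two labelled pixels and one unlabelled pixel whose teacher
# prediction is confident enough to act as a pseudo-label.
student = [[0.7, 0.3], [0.2, 0.8], [0.6, 0.4]]
teacher = [[0.9, 0.1], [0.1, 0.9], [0.85, 0.15]]
labels = [0, 1, None]
loss = semi_supervised_loss(student, labels, teacher)
```

The unlabelled pixel contributes to the loss only because the teacher is confident about it, which is how the non-labelled parts of each image supply extra training signal for under-represented classes.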

Conclusions
In this work, we showed that FCNNs trained from a small set of polygons can be used to segment intertidal habitat maps from high resolution aerial imagery. Each FCNN was evaluated in two training modes, supervised and semi-supervised, with results indicating that semi-supervision helps with the segmentation of target classes that have a small number of labelled pixels. This may be of benefit in studies where in-situ surveying is expensive to conduct.
We also showed that OBIA continues to be a robust approach for monitoring multiple coastal features in high resolution imagery. In particular, OBIA was found to be more accurate than FCNNs in predicting seagrass for both cameras. However, as noted in Section 3, OBIA results were highly dependent on the initial parameters used for MRS, with the scale parameter being critical for image-object creation.
The study site and the problem formulation described in Section 2.3 combined to form a complex problem. This can reduce confidence in seagrass predictions as ambiguity over multiple vegetation classes increases. OBIA was found to overcome this for both cameras, accurately predicting seagrass polygons while maintaining relatively high precision compared to FCNNs. On the other hand, FCNNs were found to be more accurate in classifying algal classes, in particular other macroalgae, which had the fewest labelled pixels. Therefore, while this work shows that OBIA is a suitable method for intertidal seagrass mapping, other remote sensing applications for coastal monitoring with restricted access to in-situ data can utilise semi-supervised FCNNs.