Assessment of Machine Learning Algorithms for Automatic Benthic Cover Monitoring and Mapping Using Towed Underwater Video Camera and High-Resolution Satellite Images

Benthic habitat monitoring is essential for many applications involving biodiversity, marine resource management, and the estimation of variations over temporal and spatial scales. Nevertheless, both automatic and semi-automatic analytical methods for deriving ecologically significant information from towed camera images are still limited. This study proposes a methodology that enables a high-resolution towed camera with a Global Navigation Satellite System (GNSS) to adaptively monitor and map benthic habitats. First, the towed camera completes a pre-programmed initial survey to collect benthic habitat videos, which are then converted to geo-located benthic habitat images. Second, an expert manually labels a number of benthic habitat images with their habitat classes. Third, attributes for categorizing these images are extracted automatically using the Bag of Features (BOF) algorithm. Fourth, benthic cover categories are detected automatically using a Weighted Majority Voting (WMV) ensemble of Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), and Bagging (BAG) classifiers. Fifth, the trained WMV ensemble is used to categorize additional benthic cover images automatically. Finally, correctly categorized geo-located images provide ground truth samples for benthic cover mapping using high-resolution satellite imagery. The proposed methodology was tested over Shiraho, Ishigaki Island, Japan, a heterogeneous coastal area. The WMV ensemble exhibited 89% overall accuracy in categorizing coral, sediment, seagrass, and algae classes. Furthermore, the same WMV ensemble produced a benthic cover map from a Quickbird satellite image with 92.7% overall accuracy.


Introduction
Monitoring and mapping of benthic habitats using remote sensing systems and machine learning approaches can expand our understanding of living conditions in such environments and ensure, with appropriate supervision, the survival of occupying species over time. Recent developments in high-quality video cameras mean that video data from towed cameras can be accurately recorded and geo-located. The capability of recording geo-located sampling points enables scientists to accurately and repeatedly survey the same locations to assess spatiotemporal variations and long-term changes in these areas. Furthermore, high-quality towed video cameras can record clear images of seafloor benthic habitats and cover large regions quickly without affecting the environment [1], consequently providing a potential system for monitoring benthic habitats within coastal ecosystems. Towed video cameras are also much cheaper than acoustic backscatter systems [2].
However, the analysis of recorded towed videos in marine applications is usually performed manually [3], and automatic feature extraction is not often applied [4,5]. Therefore, the automatic classification of benthic habitats from towed underwater photos is a comparatively novel approach [6]. The implementation of appropriate algorithms is fairly difficult, and many complexities are still associated with the video data of a towed system, including unstable illumination due to limited energy and variable velocities, angles, and elevations of the camera above the seafloor. In addition, the algorithms have to analyze a wide spectrum of overlapping features spread over the seafloor.
Examples of attempts at benthic cover detection with underwater video systems can be found in the literature [7]. However, most researchers process photos captured by cameras mounted on remotely operated vehicles (ROVs) or autonomous underwater vehicles (AUVs) [3]. Adam et al. [2] investigated random forest, neural network, and classification tree machine learning algorithms to classify two seabed categories, sand and maerl, using (RGB or LAB) pixels manually extracted from images captured by an ROV. This research assumed that the ROV had a constant speed and altitude above the seafloor surface. The resulting classification accuracies were high for all machine learning algorithms, with the other methods slightly outperforming the classification tree method. Ludtke et al. [3] successfully and automatically detected Pogonophora seafloor coverage with Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), classification tree, and Naïve Bayes supervised machine learning algorithms. A data set of 4108 geo-referenced video mosaics, captured using an ROV at a 3 m elevation above the seabed surface, was used for Pogonophora recognition. These mosaic regions were partitioned into regular grid cells, and detection was performed using 49 numerical image attributes extracted from each cell. SVM outperformed the other approaches, with a classification accuracy of up to 98.86%. Jan et al. [8] used 26 color, shape, and texture attributes extracted from stereo images captured by an AUV to automatically characterize nine benthic habitats. The extracted attributes were adjusted using hue-saturation values, local binary patterns, and simple patch-gap summaries. The Random Forest (RF) classifier achieved an overall accuracy of 84% when compared to nine habitat classes assigned manually by a human expert. In addition, Paul et al. [9] proposed the Bag of Features (BOF) approach for attribute extraction, with a Gaussian classifier, to detect three classes, sand, seagrass, and algae, from AUV-captured images. The proposed approach correctly classified 643 images out of a total of 730, an accuracy rate of 88%.
Teixido et al. [5] developed Seascape, a software program for obtaining semi-automatically segmented images of benthic habitats from underwater images. Bewley et al. [6] applied the same program with a hierarchical classification scheme to collect Australian benthic data sets (BENTHOZ-2015) using AUV field survey images from around Australia. This program analyzed images individually, and the input sampling points needed to be assigned manually to each image. The final outputs were segmented images with benthic habitat classes.
Nonetheless, previous studies that automatically extract benthic habitats from an underwater towed camera directly attached to a vessel are still limited. For instance, Paul et al. [10] used the BOF approach with a Gaussian classifier to extract eight classes from images captured using a towed camera, including algae, corals, sponges, rhodoliths, uncolonized, and mixed classes. Approximately 55 images, divided into 75% training and 25% testing sets, were used for the calibration and validation of the proposed model. This task is very challenging because of the large number of classes and the non-uniform, poor lighting. The classes also share the same region, resulting in significant similarities between them. As a result, the proposed approach exhibited significant confusion between some classes, or completely failed to differentiate between them; thus, improvement is required.
From a literature review, benthic cover classification using remote sensing techniques can be applied using various approaches, for example, the use of hyperspectral images with spectral libraries or look-up tables (LUTs) [11-14]. Alternatively, the integration of satellite images, or underwater videos with a multibeam echosounder (MBES), can also be used [15-18]. Finally, there is also the use of high-resolution images with in-situ video samples [19-22].
However, the abovementioned approaches have a number of drawbacks. Satellite hyperspectral images mostly have coarse spatial resolutions. Airborne hyperspectral images, in turn, have limited coverage for large areas, are more expensive than multispectral satellite images, are computationally hard to process, and produce voluminous datasets even when covering comparatively small areas [23]. On the other hand, MBES is relatively expensive, and the production of benthic maps from multispectral satellite images has remained challenging [24]. Both multispectral images and MBES require field video samples that are usually analyzed manually, which is a time-consuming and labor-intensive process ill-suited for mapping large tracts of coastline [25].
The contribution of this study is the proposal of a semi-automated system for benthic cover monitoring and mapping, in which field survey video samples are analyzed automatically and high-resolution satellite imagery is classified using these samples. A number of attributes were extracted using the BOF approach from video images captured by an underwater towed high-resolution camera and geo-located by a Global Navigation Satellite System (GNSS). The outcomes of three machine learning algorithms, K-NN, SVM, and Bagging (BAG), were assembled using the Weighted Majority Voting (WMV) approach for benthic cover detection. The trained WMV ensemble was then used to categorize additional geo-located images, which can provide ground truth samples for benthic mapping. Subsequently, a Quickbird image was classified using the correctly categorized geo-located images collected along the same field survey path. This classification was performed using the same K-NN, SVM, and BAG algorithms, assembled using the WMV method. Finally, the achieved results for both benthic cover detection and mapping were evaluated and compared using the overall accuracy and Kappa statistical criteria.

Study Area
The study area was the Shiraho subtropical territory, positioned in the south-eastern part of Ishigaki Island, Japan (see Figure 1). It is an irregular, heterogeneous, shallow area of low-turbidity water with a maximum depth of 3.5 m. The Shiraho area has rich marine biodiversity that includes various species, such as seagrasses, corals, and fishes, and hosts the largest blue ridge coral (Heliopora coerulea) colony in the northern hemisphere. It also has a reefscape with a well-developed fringing reef (reef slope, reef crest, channels, and moat) that includes complex patches of branching corals (Acropora spp., Montipora spp., Porites cylindrica, etc.), massive corals (Porites spp., Heliopora coerulea, etc.), and seagrass. Furthermore, there is also a wide range of sediments, including mud, sand, cobble, and boulders.

Imagery Data
A Quickbird satellite image with a 0.6 m spatial resolution was used for benthic cover mapping of the study area. The image was acquired during calm weather conditions on 20 July 2007, whereas field data for the Shiraho area were collected on 21 August 2016. Despite the time difference between the imagery acquisition and the field observations, the Shiraho area did not experience tsunamis or strong currents until 2014 [21]. Although a large typhoon struck the area in 2015, the benthic habitats experienced no significant changes. The values required for radiometric calibration are provided in the metadata files of the image.

Benthic Cover Field Data
Benthic cover field data were collected from field surveys performed on 21 August 2016. The Shiraho area experienced socio-ecological spatial stability during these observation years until 2014 [21]. Moreover, the single large typhoon that occurred in 2015 did not cause significant changes in this area. The collection of marine images was planned using local knowledge of the area and an inspection of the Quickbird image. Underwater images were acquired using a low-cost compact high-resolution video camcorder (GoPro Hero 3; 12 megapixel effective photo resolution, 12 megapixel sensor resolution, and 30 frames per second with a wide field of view), placed just below the water surface so that the shallow seabed could be monitored. An array of 3 h of video recordings from the survey trip was collected and geo-located using a Lawrence GNSS system. These videos enabled the extraction and further analysis of many high-quality images. A free video-to-JPG converter program was used to extract images from the video files at a 2 s interval synchronized with the GNSS surveys. A sample of 2000 images with known GNSS locations was extracted and labeled manually for the four classes (see Figure 2). These sample images were used for calibrating and validating all algorithms in the benthic cover detection process. Subsequently, about 1000 more images were categorized with the trained ensemble algorithms and checked individually. Finally, 3000 points were collected from the field survey path over the Shiraho study area, constituting the field data for benthic mapping from the Quickbird image (see Figure 3).
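The frame-extraction and geo-location step above can be sketched as follows. This is a minimal Python illustration only (the study used a free video-to-JPG converter and a Lawrence GNSS log); the function names, the 30 fps and 2 s values are taken from the text, while the sample coordinates are hypothetical:

```python
from bisect import bisect_left

def frame_indices(duration_s, fps=30, interval_s=2):
    """Frame numbers to extract: one frame every interval_s seconds."""
    return [t * fps for t in range(0, int(duration_s), interval_s)]

def nearest_fix(t, fixes):
    """Geo-locate an extracted frame by the nearest GNSS fix in time.
    fixes is a time-sorted list of (timestamp_s, lat, lon) tuples."""
    times = [f[0] for f in fixes]
    i = bisect_left(times, t)
    candidates = fixes[max(0, i - 1):i + 1] or [fixes[-1]]
    return min(candidates, key=lambda f: abs(f[0] - t))

# A 10 s clip at 30 fps yields frames 0, 60, 120, 180, 240
idx = frame_indices(10)
fixes = [(0.0, 24.360, 124.240), (2.1, 24.361, 124.241), (4.0, 24.362, 124.242)]
frame_location = nearest_fix(2.0, fixes)  # the fix logged at t = 2.1 s
```

In practice the extracted frames would be read with a video library and written out as JPG files named by their matched coordinates.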

Methodology
The proposed framework for benthic cover detection and mapping over the Shiraho coral reef area was performed using the following steps:
1. An array of video recordings was converted to geo-located images using a free video-to-image converter program, with 2 s intervals synchronized with the GNSS recorded locations.
2. Approximately 2000 converted images were labeled with four benthic cover categories: algae, sediments, seagrass, and corals.
3. The labeled geo-located images were used as inputs to the BOF approach to create the attributes for automatic detection.
4. Three machine learning classifiers (BAG, SVM, and K-NN) were ensembled with the WMV algorithm to detect the benthic cover category, using the attributes produced by BOF as inputs and the image labels as outputs.
5. The performance of the classifiers was evaluated using independent 75% training and 25% testing samples.
6. Once the algorithms were validated and calibrated, they were used to categorize more images, and the resulting images were checked individually.
7. About 1000 additional images were categorized automatically, checked individually, and confirmed as correct for further analysis.
8. Approximately 3000 correctly categorized images with known locations along the field survey track were used for benthic cover mapping.
9. A Quickbird image was classified using the same classifiers ensembled with the WMV approach, using the approximately 3000 geo-located images with correctly categorized benthic habitats.
The performance of the algorithms was evaluated using the overall accuracy (OA) and the Kappa statistic (Kappa). The OA is the ratio of the number of validation samples that are classified correctly to the total number of validation samples. The Kappa value is the proportion of correctly classified validation samples after the agreement expected by chance is removed.
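Both criteria can be computed directly from a confusion matrix. A minimal Python sketch (the study's analysis was implemented in MATLAB); the example matrix is illustrative only:

```python
def overall_accuracy(cm):
    """OA: correctly classified samples (the diagonal) over all samples."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def kappa(cm):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = sum(sum(row) for row in cm)
    po = overall_accuracy(cm)
    # chance agreement: sum over classes of (row total * column total) / n^2
    pe = sum(sum(cm[i]) * sum(row[i] for row in cm)
             for i in range(len(cm))) / n ** 2
    return (po - pe) / (1 - pe)

# Hypothetical 2-class confusion matrix: OA = 0.85, Kappa = 0.70
cm = [[40, 10],
      [5, 45]]
```

Per-class accuracies (producer's/user's accuracy) can be derived from the same matrix by normalizing rows or columns.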
All the benthic cover detection and mapping algorithms were implemented in the MATLAB environment with the subsequent explained parameters for each method.
For image categorization, the BOF approach was performed using the following parameters: the vocabulary size was 250, the grid point selection method was used for selecting the feature point locations, the grid step was 16, the block width was 32, and the retained percentage of the strongest feature from each category was 80%.
Furthermore, the classification approaches were applied for detecting and mapping benthic cover using the following parameters. The BAG approach used 25 trees with 20 splits per tree. The SVM model used a polynomial kernel function with a 3rd order polynomial. Finally, K-NN used a K value of five neighbors, with the city block method for the distance calculation; the distance weighting function was the squared inverse of the distance. All of these parameters were selected for each algorithm based on the highest OA and Kappa values.
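As an illustration of the K-NN settings above (city block distance, squared-inverse-distance weighting, K = 5), the following is a minimal pure-Python sketch; the toy feature vectors and class labels are hypothetical:

```python
from collections import defaultdict

def cityblock(a, b):
    """City block (Manhattan) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(x, train, k=5):
    """train: list of (feature_vector, label) pairs. Votes from the K
    nearest neighbours are weighted by the squared inverse distance."""
    neighbours = sorted(train, key=lambda s: cityblock(x, s[0]))[:k]
    votes = defaultdict(float)
    for feats, label in neighbours:
        d = cityblock(x, feats)
        votes[label] += 1.0 / (d * d) if d > 0 else float("inf")
    return max(votes, key=votes.get)

# Hypothetical 2-D attribute vectors for two classes
train = [([0, 0], "sand"), ([0, 1], "sand"), ([1, 0], "sand"),
         ([5, 5], "coral"), ([5, 6], "coral")]
label = knn_predict([0.2, 0.2], train)  # nearest weighted votes favour "sand"
```

The squared-inverse weighting means very close neighbours dominate the vote, which matches the parameter choice reported above.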
Finally, the results from each classifier were assembled using a WMV model. In the WMV ensemble, if at least two of the three classifier outputs agree on a class, that class is the result. Conversely, if the three outputs all differ, the resulting class is taken from the classifier with the highest accuracy. The proposed method consists of several key procedures, as shown in Figure 4.
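The WMV decision rule described above can be sketched in a few lines of Python (the study used MATLAB); the classifier names and accuracy values below are hypothetical:

```python
from collections import Counter

def wmv(preds, accuracies):
    """preds: dict mapping classifier name -> predicted class.
    accuracies: dict mapping classifier name -> validation accuracy.
    If at least two classifiers agree, return that class; otherwise
    fall back to the prediction of the most accurate classifier."""
    cls, n = Counter(preds.values()).most_common(1)[0]
    if n >= 2:
        return cls
    best = max(accuracies, key=accuracies.get)
    return preds[best]

acc = {"svm": 0.86, "knn": 0.85, "bag": 0.80}        # hypothetical accuracies
wmv({"svm": "coral", "knn": "coral", "bag": "sand"}, acc)   # two agree: coral
wmv({"svm": "coral", "knn": "sand", "bag": "algae"}, acc)   # all differ: SVM wins
```

The accuracies act as the "weights" only in the tie-breaking case, which is why the scheme improves on any single base classifier without complex weight tuning.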

Bag of Features
BOF was used for classifying images into categories by generating a histogram of visual word occurrences to represent each image. These histograms, known as bags of visual words, can be used for training an image category classifier. The key steps of the BOF model are as follows [26]. Local features: detection and description of local features by dividing each image into small sub-images called patches. Codebook collection: the patch descriptors of all patches are assembled into clusters, and the elements of the resulting clusters are then used as the visual words of the codebook [27]. Feature quantization: once the codebook is complete, each local feature is assigned to one "visual word" using an unsupervised clustering method, e.g., K-means clustering. Finally, the images are categorized using the bags of visual words as the inputs to a classifier. Recently, the BOF model has achieved considerable success in image analysis and classification applications, because of its simplicity and efficiency [28].
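The BOF steps above can be sketched as a toy Python pipeline. It uses the grid step (16) and block width (32) reported earlier, but a tiny two-word vocabulary and a naive k-means for brevity; the real pipeline used a 250-word vocabulary and stronger patch descriptors:

```python
import random

def grid_patches(image, step=16, width=32):
    """Sample square patches on a regular grid (grid step 16, block width 32)."""
    h, w = len(image), len(image[0])
    return [[image[r + i][c + j] for i in range(width) for j in range(width)]
            for r in range(0, h - width + 1, step)
            for c in range(0, w - width + 1, step)]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(descriptors, k, iters=10, seed=0):
    """Toy k-means: the resulting centroids are the 'visual words'."""
    rng = random.Random(seed)
    centroids = rng.sample(descriptors, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[min(range(k), key=lambda i: dist2(d, centroids[i]))].append(d)
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = [sum(v) / len(cl) for v in zip(*cl)]
    return centroids

def bof_histogram(image, vocabulary):
    """Quantize each patch to its nearest visual word and count occurrences."""
    hist = [0] * len(vocabulary)
    for p in grid_patches(image):
        hist[min(range(len(vocabulary)),
                 key=lambda i: dist2(p, vocabulary[i]))] += 1
    return hist
```

The resulting histograms are the fixed-length attribute vectors fed to the classifiers; retaining only the strongest features per category (80% in this study) happens during vocabulary construction.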

Bagging
BAG is a machine learning algorithm proposed by Breiman [29]. It is an ensemble of decision trees used to improve classification accuracy and prediction performance by reducing variance and avoiding overfitting. The basic idea of bagging (bootstrap aggregating) is to generate a number of independent samples, drawn with replacement from the available training data set, fit a model to each of these samples, and finally aggregate the models using majority voting [30]. For a standard training set L of size n, bagging generates m new training sets L_i, i = 1 to m, each of size n, by sampling from the training set uniformly and with replacement. As a result of sampling with replacement, some observations may be repeated, whereas others may not be selected at all. This process is known as bootstrap sampling. The m bootstrap samples are used to fit m models, and the class that receives the maximum number of votes is returned [31]. DeFries and Chan [32] found that BAG is more stable and more robust against calibration data noise than classification trees.
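The bootstrap-and-vote procedure can be sketched as follows; a hedged Python illustration with a toy one-dimensional 1-nearest-neighbour base learner (the study used decision trees as base models, and the training data here are hypothetical):

```python
import random
from collections import Counter

def bagging_fit(train, m, base_fit, seed=0):
    """Fit m models, each on a bootstrap sample drawn with replacement
    and of the same size n as the original training set."""
    rng = random.Random(seed)
    n = len(train)
    return [base_fit([rng.choice(train) for _ in range(n)]) for _ in range(m)]

def bagging_predict(models, x):
    """Aggregate the m model predictions by majority vote."""
    return Counter(model(x) for model in models).most_common(1)[0][0]

# Toy base learner: 1-nearest-neighbour on a single scalar feature
def one_nn_fit(sample):
    return lambda x: min(sample, key=lambda s: abs(s[0] - x))[1]

train = [(0, "a"), (1, "a"), (2, "a"), (10, "b"), (11, "b")]
models = bagging_fit(train, m=11, base_fit=one_nn_fit)
label = bagging_predict(models, 0)   # majority of bootstrap models vote "a"
```

Because each bootstrap sample omits roughly a third of the observations, the individual models disagree slightly, and the vote averages out their variance.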

Support Vector Machines
SVM is a supervised, non-parametric classifier developed by Vapnik [33]. It is a well-adapted machine learning algorithm for solving linear, non-linear, and high-dimensional classification problems. In addition, it is a recommended method for the classification of multispectral and hyperspectral images whose classes have small spectral separation [34]. In this approach, classes are separated by an optimal hyper-plane through an n-dimensional spectral space that maximizes the margin between them [35]. The nearest training samples, known as support vectors, are used to maximize the margin from the tested point to the optimal hyper-plane. Classification accuracy increases as the margin size is maximized [36]. In non-linear SVM problems, complex hyper-planes are represented by kernels. The Gaussian radial basis function is considered to be the optimal kernel type for many classification problems due to its high efficiency; it requires the definition of only a small number of parameters and performs better than other kernels, with robust capabilities in handling remote sensing data [37].
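As a simplified illustration of the maximum-margin idea, the following Python sketch trains a linear SVM by subgradient descent on the hinge loss. Kernel SVMs, as used in this study, additionally map the data into a higher-dimensional space; the learning rate, regularization strength, and sample points below are arbitrary assumptions:

```python
def train_linear_svm(xs, ys, lam=0.01, epochs=200, lr=0.1):
    """Minimal linear SVM via hinge-loss subgradient descent.
    Labels must be +1/-1; lam is the margin-regularization weight."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:   # point inside the margin: push the plane away
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:            # correctly classified with margin: only shrink w
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Two hypothetical, linearly separable clusters
xs = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
ys = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(xs, ys)
```

The points whose margin stays near 1 after training are the support vectors; only they determine the final hyper-plane.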

K-Nearest Neighbor (K-NN)
K-NN is considered one of the simplest classification approaches, and has been used widely in numerous types of classification tasks [38,39]. Each object is assigned to a class by majority voting among the closest K training samples in the feature space, where K is a predefined user value; the search continues until this value is reached. In addition, the K-NN method requires a small number of training samples, which makes it easy to implement, powerful, and capable of outperforming other classification algorithms [40]. The accuracy of the K-NN classifier depends on K, the distance between an unknown point and the nearest known samples, and the sample size [41]. Numerous recent studies have used K-NN for benthic cover mapping [14,42,43].

Results
The benthic cover detection results are presented in Figure 7. Figures 8 and 9 show samples of the correctly categorized and geo-located images along the field survey path, as well as the benthic cover map produced by the WMV ensemble using the Quickbird image for the Shiraho area. Tables 3 and 4 summarize the corresponding OA and Kappa values. Additionally, the number of features correctly classified by each classifier is presented in Figure 10.

Discussion
Benthic detection and mapping of coral reef areas using remote sensing techniques can provide valuable information about coral reef health. In addition, it is possible to increase the accuracy of benthic cover classification by adding field reference samples, especially in shallow areas where misclassification errors occur due to confusion and overlap between various species. This study proposes a strategy for monitoring benthic habitats, using field samples collected semi-automatically from a towed video camera to classify a high-resolution Quickbird satellite image.
Numerous studies have attempted to produce benthic cover maps using high-resolution satellite images, e.g., IKONOS, Quickbird, and Worldview-2 [20,21,44]. Roelfsema et al. [25] used a transect-line method for training and validating the classification of five benthic habitat classes: 1-25% seagrass, 25-50% seagrass, 50-75% seagrass, 75-100% seagrass, and algae. The results demonstrated that the transect-line method can increase the spatial availability of the observed data. Roelfsema et al. [45] integrated snorkeler and AUV surveys with satellite images for mapping five seagrass species: Z. mueller, C. serrulata, H. ovalis, H. spinulosa, and S. isoetifolium. Kutser et al. [46] mapped fifteen classes of seagrass biomass and substrate type using the photo library method. A photo library was established for biomass classification, in which each field quadrat photo was supplemented with the seagrass dry weight of the sample and a photo of the sorted sample taken in the laboratory.
Other studies attempted to use unsupervised classification of an image, subsequently naming the classes based on in situ measurements. Baumstark et al. [47] presented a combination of water column correction, unsupervised pixel classification, and image segmentation techniques to produce a seagrass density map. Three classes of seagrass were mapped, dense, patchy, and sparse-medium, with 77% overall accuracy; however, the accuracy assessment was performed with only 30 ground truth points. Baumstark et al. [48] proposed an object-based image analysis (OBIA) method followed by an unsupervised classification process for mapping five habitats using an IKONOS image: hard bottom, sand mixed with seagrass, dense seagrass, medium seagrass, and sparse seagrass. To ensure that all benthic classes were assessed, sixty-five random points were stratified across benthic types. The overall accuracy using the OBIA method was 78%. Although these results were lower than typical accuracy standards, the authors believe that accuracy could be improved with additional ground truth samples. Alternatively, Vassallo et al. [49] proposed predictive spatial modelling as an alternative method for producing benthic habitat maps. This method was performed with complete acoustic coverage of the seafloor together with a comparatively low number of sea-truth samples. An unsupervised fuzzy clustering method, applied to a set of observations made by scuba diving and used as sea truth, recognized five coralligenous habitats, characterized by Cystoseira zosteroides, Axinella polypoides, Eunicella cavolini, Eunicella singularis, and Paramuricea clava. In total, 57 stations were surveyed with positional accuracy to within tens of meters; this was considered adequate at the 1:25,000 scale of the final map. The overall accuracy of the classification reached 89%. Still, this method has some weaknesses and, consequently, risks.
For example, it depends significantly on data reliability, accuracy, and resolution. In addition, sea-truth samples remain indispensable for making predictions and verifying accuracy. Finally, incorrect analysis of outputs and results can lead to management errors.
However, the abovementioned studies have various demerits: the benthic habitat detection process was performed manually, producing small sample sizes for calibration and validation, requiring extensive laboratory work, and taking a long time to process. Moreover, unsupervised classification of the satellite image, which can be used as an alternative to ensembles of supervised classifiers, still needs adequate ground truth samples. Finally, all the aforementioned studies suffer from a limited number of ground samples for the calibration and validation of classification methods, and the development of the ground sampling procedure can also improve benthic cover classification [20]. As a result, the proposed approach attempts to increase the number of ground truth samples and overcome these drawbacks, using a high-resolution camera that can be towed beneath a small vessel to collect high-resolution images. These images can be categorized semi-automatically, providing adequate field survey samples for benthic cover classification.
The majority of previous studies that attempted to detect benthic cover species used video cameras fixed on AUV or ROV systems as a feasible alternative to a towed camera directly attached to a vessel or carried by a diver. These systems are more stable and are supported by illumination systems; accordingly, they provide high-quality images with limited noise, which enables classification algorithms to discriminate between benthic species much more easily. In contrast, the detection of benthic species from a towed camera directly attached to a vessel is more challenging. Paul et al. [9] attempted to detect three classes (sand, seagrass, and algae) from a camera mounted on an AUV using the BOF approach with a Gaussian classifier; however, that approach confused the detected species and requires a number of improvements. In our study, the attributes discriminating between benthic cover features were extracted using the BOF approach, and the results from the SVM, BAG, and K-NN supervised classifiers were assembled using the WMV approach. The results achieved by the WMV ensemble showed improvements in species discrimination accuracy compared to a single-classifier approach. Because most of the sea bottom cover was sediment, it overlapped with the other species in numerous images. This overlap causes confusion for classifiers, especially for the algae and coral features; as a result, the classification accuracy of these two classes was still relatively low. Conversely, the sediment and seagrass classes were detected with high accuracy.
For benthic cover mapping, the Principal Components Analysis approach was tested in our study as a means of removing irrelevant features from the twelve inputs, but the OA decreased to 70%. Furthermore, two machine learning algorithms, self-organized map (SOM) neural networks and Naïve Bayes, were evaluated for classification, but they produced lower OA values of 83% and 66%, respectively. These results agree with similar previous studies [50,51] and, as a result, these algorithms were excluded from our study. Previous researchers have argued for the superiority of ensemble techniques over single classifiers in benthic cover mapping. For example, Diesing and Stephens [18] tested six single classifiers and ensembles of three to five classifiers for mapping four textural classes: muddy sand, sand, gravelly sand, and sandy gravel. The five-classifier ensemble increased the classification OA by 5%. Furthermore, Aidy et al. [43] tested three ensemble techniques, majority voting, simple averaging, and mode combination, with five base classifiers. These classifiers were tested to map four types of benthic habitat: dense coral, sparse coral, dead coral, and sand. The majority voting ensemble achieved the highest OA (83%) compared with the other ensembles. Recently, Zhang et al. [14] tested three classifiers, SVM, K-NN, and RF, using WMV for benthic cover mapping with a fusion of various data sources. Three benthic cover classes (hard bottom cover, patchy seagrass, and continuous seagrass) were distinguished, with an OA of 89%.
The proposed WMV ensemble simply combines the outcomes of three classifiers, BAG, SVM, and K-NN, which are trained independently. Each produced different per-class accuracies, mainly because of the conceptual discrepancies between the three models: BAG aggregates ensembles of decision trees, SVM tries to find the optimal hyper-plane separating the data, and K-NN looks for the closest matches to represent the inputs. For benthic cover detection, SVM achieved significantly better results than K-NN for algae and coral classification, whereas K-NN slightly outperformed SVM for sediment and seagrass classification; both classifiers produced significantly better results than the BAG classifier. The most challenging part of this process was distinguishing algae from sediment and corals from sediment, because both algae and corals were surrounded by sediment in the majority of benthic cover images. The WMV ensemble increased the benthic cover detection OA and Kappa values over the three base classifiers by about 4% and 0.05, reaching 89.4% and 0.85, respectively. For benthic cover mapping, K-NN and SVM produced the same classification accuracy, whereas BAG slightly surpassed both classifiers; all three classifiers had difficulty differentiating between algae and corals. The WMV ensemble improved the classification OA and Kappa values for benthic mapping over the three base classifiers by about 5% and 0.08, reaching 92.7% and 0.89, respectively. Overall, the WMV ensemble approach resulted in higher classification accuracy than the BAG, SVM, and K-NN classifiers for both benthic cover detection and mapping.
Monitoring global spatio-temporal changes in benthic community structure can be performed using the aforementioned semi-automated framework. Ecological monitoring studies have been discussed at length in the literature [23]. For example, Manuel [52] used the coral point count method [53] to quantify the relative abundance of coral reef functional groups over time and space. Phinn [19] and Roelfsema [54] proposed a combination of object-based image analysis and ecological modelling for mapping geomorphic and ecological zones in coral reefs. Both studies used the geo-referenced photo-transect method [55] and categorized the images based on the same coral point count method [53]. However, our proposed methodology for detecting benthic cover images is faster than the semi-automated method proposed by Manuel [52]. Once the algorithms were sufficiently trained with adequate images, they could be used for distinguishing between geo-located images. These images can be provided simply by a towed camera mounted on a small vessel, or by a snorkeler. The field survey can be repeated at regular intervals over the same study area to monitor the benthic cover features. Furthermore, the proposed fast processing of ground-truth images would help to increase the sampling size, which can be integrated with ecological modelling for monitoring purposes.
In summary, the proposed monitoring system has numerous merits. It is low cost, because the required tools are inexpensive, e.g., a GoPro camera, a GNSS receiver, and a small boat or a snorkeler. In addition, the system is not harmful to the surrounding ecosystem and can be used annually to follow the health of benthic habitats. It also provides sufficient field-categorized images, which can be used for benthic cover mapping. Finally, it requires a relatively short time for processing images using simple programs. However, some limitations still require improvement: only shallow areas can be processed, and performance degrades in mixed areas or small patches.
These results encourage further studies in this field, which may include applying the same approach with ROV systems at known locations for monitoring deep seafloor areas. Additionally, the approach can be extended to process video files for benthic cover detection in the field; such a development could be used to monitor coral reef bleaching and its spatial and temporal changes. Furthermore, the same ensemble or fuzzy majority voting techniques can be tested with soft classifiers for the same targets. Finally, testing the performance of deep learning algorithms on high-quality benthic cover images would help build a well-established benthic cover monitoring system.

Conclusions
Benthic habitat monitoring and mapping are essential for the management and conservation of coral reef environments. The construction of an accurate and informative monitoring system is important for effectively planning a network of threatened zones and for monitoring the degree of habitat fragmentation. This study assessed the performance of integrating three machine learning algorithms for benthic cover habitat monitoring using towed underwater videos and a GNSS system. We also mapped seafloor habitats using Quickbird satellite images. In this article, we introduced an approach for the semi-automatic detection and mapping of the heterogeneous Shiraho coastal area, covering corals, algae, seagrass, and sediment. The WMV algorithm was applied to collate the outputs from three machine learning algorithms: SVM, K-NN, and BAG. The automatic detection of benthic habitats was based on the BOF technique, with attributes extracted from labeled examples of raw towed video image data. Furthermore, the correctly detected benthic habitat images were synchronized with the GNSS system and used to classify Quickbird satellite imagery. The WMV algorithm achieved an OA of 89.4% for the automatic detection of the four habitats. Finally, accurate habitat maps of the Shiraho area were produced with an OA of 92.7% using the same WMV ensemble. These results demonstrate improvements in automatic benthic habitat monitoring and mapping accuracies.
Author Contributions: H.M. and T.N. designed and performed the field work of collecting benthic habitat samples from the study site. K.N. supervised this research work. H.M. and K.N. developed the benthic habitat detection and mapping algorithms. H.M. analyzed the results and wrote the manuscript. K.N. and T.N. provided substantial edits and reviews on many early drafts of this manuscript.