Semiautomated Mapping of Benthic Habitats and Seagrass Species Using a Convolutional Neural Network Framework in Shallow Water Environments

Abstract: Benthic habitats are structurally complex and ecologically diverse ecosystems that are severely vulnerable to human stressors. Consequently, marine habitats must be mapped and monitored to provide the information necessary to understand ecological processes and guide management actions. In this study, we propose a semiautomated framework for the detection and mapping of benthic habitats and seagrass species using convolutional neural networks (CNNs). Benthic habitat field data from a geo-located towed camera and high-resolution satellite images were integrated to evaluate the proposed framework. Features extracted from pre-trained CNNs and a "bagging of features" (BOF) algorithm were used for benthic habitat and seagrass species detection. Furthermore, the resultant correctly detected images were used as ground truth samples for training and validating CNNs with simple architectures. These CNNs were evaluated for their accuracy in benthic habitat and seagrass species mapping using high-resolution satellite images. Two study areas, Shiraho and Fukido (located on Ishigaki Island, Japan), were used to evaluate the proposed model: seven benthic habitat classes were mapped in the Shiraho area and four seagrass species in Fukido cove. Analysis showed that the overall accuracy of benthic habitat detection in Shiraho and seagrass species detection in Fukido was 91.5% (7 classes) and 90.4% (4 species), respectively, while the overall accuracy of benthic habitat and seagrass mapping in Shiraho and Fukido was 89.9% and 91.2%, respectively.


Introduction
Recently, high-resolution underwater video systems have enabled scientific discoveries of the seafloor related to marine studies, environmental management, and species monitoring. In particular, towed underwater video cameras play an important role in detecting benthic habitats [1,2] by facilitating detailed observations and field sampling of unexplored marine ecosystems. Furthermore, towed video cameras are low cost, fast to process, and environmentally sustainable, i.e., they do not harm the environment. However, these video systems suffer from low light, increased turbidity, and high image noise, all of which pose challenges for underwater video analysis. Firstly, low illumination and turbid conditions produce weak descriptors and confuse object detection techniques. Secondly, benthic habitats are frequently altered by waves and currents and may appear different from various camera angles. Thirdly, the images produced by towed video cameras usually have low contrast and low saturation; consequently, they often provide insufficient information for recognition. Previous studies have evaluated CNN models on public underwater recognition datasets (the WHOI, ZooScan, and Kaggle plankton datasets, and the EILAT and RSMAS coral datasets); these CNN models accomplished state-of-the-art accuracies that outperformed classical methods.
Although these aforementioned approaches achieved high recognition accuracies, they each required substantial computation and memory requirements, which are not available to most users. Moreover, the available underwater image datasets were inadequate for training CNNs for benthic habitats feature extraction from scratch. As a result, the authors of these studies used pre-trained CNNs as feature extractors for classifying machine learning algorithms. In nearly all image detection and classification applications, the image descriptors extracted from pre-trained CNNs have been superior to hand-crafted features [23]. These learned descriptors are also transferable to other domains, such as underwater image detection, which saves time and reduces labor relative to end-to-end network training. Thus, researchers have recently begun to solve underwater classification problems using pre-trained CNNs as feature extractors [24].
For example, the authors in [25] applied CNN attributes extracted from the first fully connected layer of a pre-trained VGGNet for coral reef classification. These authors trained a multilayer perceptron (MLP) network with the extracted attributes using 4750 images in which 237,500 points had been annotated by an expert. These images were a subset of the Benthoz15 dataset [26]; they were divided into 70% training and 30% testing images. The model classified the images into coral and noncoral, achieving 97% overall accuracy (OA). In another study, Ammar et al. [27] combined VGGNet learned features with hand-crafted features for coral reef classification using a two-layer MLP classifier to exploit the diversity of the representation attributes, which included 4096-dimensional features extracted from a fully connected layer of VGGNet and 540-dimensional color and texture descriptors [18]. The proposed method was evaluated using 2055 images with 400,000 expert pixel annotations from the MLC dataset, labeled with four noncoral classes and five coral genera. This combination of features outperformed the use of individual attributes, with an average 3% increase in classification accuracy. Lian et al. [28] combined fully connected features and convolutional features extracted from VGGNet network layers in a coral classification process. The principal component analysis dimensionality reduction method was used to compress these attributes, while the EFC dataset, consisting of 212 images with 42,400 point annotations categorized into 10 classes, was used for model evaluation. Two-thirds of these samples were used for training and one-third for testing. The authors achieved 91.4% OA using a linear SVM algorithm. In a separate study [29], the authors introduced features extracted from deep residual networks [12] for underwater classification using four benthic datasets (MLC, Benthoz15, EILAT, and RSMAS).
They showed that features extracted from deeper convolutional layers were superior to those from shallower layers. Moreover, combining these features resulted in more powerful image descriptors. Finally, Ammar et al. [24] tested the same deep residual network features for recognizing kelp in underwater images; these features were found to outperform both CNNs and hand-crafted features.
Based on these collected studies, benthic cover recognition approaches using underwater images can be placed into two categories, each of which has notable disadvantages. The first approach depends on classifying underwater images using manually labeled points in each image from static, off-the-shelf datasets; however, this approach is ill-suited to mapping large tracts of coastline [30]. The second approach involves classifying underwater images individually based on hand-crafted methods, which largely rely on human annotators; thus, this approach can be cumbersome and inefficient [31]. The automatic classification of towed underwater images, therefore, remains challenging and requires further innovation [31]. The classic alternative is to produce benthic habitat maps of large-scale coastal areas from multispectral satellite images [32][33][34]. However, this process requires sufficient ground truth data to train the classification algorithms every time a classification is performed. Consequently, classification of largely inaccessible benthic habitats is rare. Therefore, it remains necessary to develop a benthic habitat mapping framework that can be applied to various areas with reliable cost, speed, and accuracy [35]; this is the focus of the present study.
Here, we present a semiautomated framework for benthic habitat and seagrass species detection and mapping. Specifically, we investigated a combination of shape and CNN descriptors in an underwater image detection process. Furthermore, we classified high-resolution satellite images for benthic habitat mapping using CNNs with simple architectures. The main achievements described here are summarized as follows: (i) We combined CNN attributes, i.e., image features extracted from pre-trained CNNs, and BOF attributes to exploit their diversity; (ii) we demonstrated that our proposed method outperforms single CNN and BOF algorithms using two diverse underwater image recognition datasets; (iii) we exploited this combination to create ground truth samples for high-resolution satellite image classification; and (iv) we used CNNs with simple architectures for benthic habitat and seagrass species mapping and accomplished a classification accuracy superior to those produced by machine learning algorithms.

Study Areas
Ishigaki Island, located in the south of Japan in the Pacific Ocean, was the overall study area chosen for this framework assessment (Figure 1). It is a subtropical island with abundant biodiversity, shallow coastal areas, and a maximum water depth of 3.5 m. Two smaller areas on either side of the island were selected to evaluate the framework: the Shiraho coastal area and Fukido cove. The Shiraho area is a heterogeneous ecosystem with numerous reefscapes, including complex hard corals, such as Acropora and Porites, and soft corals, such as Heliopora coerulea. Furthermore, a wide range of sediments exists along the coastline (e.g., soft sand, cobble, and boulders), as well as brown and other algae. Moreover, dense Thalassia hemprichii seagrass grows on the sandy-bottom seafloor. The Fukido area is a seagrass bed with turbid waters located in a tidal flat with sand, silt, and clay bottom coverage near the mouth of the Fukido River. A T. hemprichii seagrass meadow dominates the area: leaves are 8-15 cm high, and the seagrass extends along the shoreline (300 m wide and 1000 m long). In addition, long-leaved Enhalus acoroides seagrass, 30-150 cm in length and classified as a vulnerable species, has been found in the Fukido area [36]. The seagrasses in the Fukido area can, therefore, be placed into four categories: E. acoroides, tall T. hemprichii, short T. hemprichii, and areas of sparse seagrass (Figure 2).

Field Data Collection
Field data from the Shiraho benthic habitats and Fukido seagrass species were collected during the typhoon season, on 21 and 28 August 2016, respectively. Rainfall prior to the data acquisition times increased outflows from the major tributaries: the Todoroki River for Shiraho reef and the Fukido River for Fukido cove. The turbidity level was higher in Fukido cove than in the Shiraho area; the muddy substrate of Fukido cove and its proximity to the Fukido River mouth explain this higher turbidity. Two field surveys were performed to collect underwater images using a high-resolution towed video camera (GoPro HERO3 Black Edition) [37] (Figure 3), which was attached to the side of a motorboat with a wooden stand so that it sat directly beneath the water surface. In addition, the coordinates of the surveyed underwater images were recorded using a differential global positioning system (DGPS) mounted vertically above the camera (Figure 4). At each site, about 4 h of video were recorded; these recordings were then converted to underwater images using free video-to-image converter software. The images were extracted at 1 s intervals, synchronized with the DGPS observations. Finally, 3000 benthic habitat images were labeled into 7 classes, while 1500 seagrass images were labeled into 4 categories. The images were labeled manually to construct the benthic habitat- and seagrass species-detection schemes.
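The synchronization of extracted frames with DGPS observations described above can be sketched as a nearest-timestamp match. The coordinates, timings, and function name below are hypothetical illustrations; the paper does not specify the software used for this step.

```python
from bisect import bisect_left

def match_frames_to_dgps(frame_times, dgps_records):
    """For each frame timestamp (seconds since survey start), find the
    DGPS record with the nearest timestamp and return (time, lat, lon).
    dgps_records must be sorted by time: a list of (time, lat, lon)."""
    dgps_times = [r[0] for r in dgps_records]
    matched = []
    for t in frame_times:
        i = bisect_left(dgps_times, t)
        # pick the closer of the two neighbouring DGPS fixes
        candidates = [j for j in (i - 1, i) if 0 <= j < len(dgps_records)]
        j = min(candidates, key=lambda k: abs(dgps_times[k] - t))
        _, lat, lon = dgps_records[j]
        matched.append((t, lat, lon))
    return matched

# Frames extracted at 1 s intervals; DGPS fixes logged at ~1 Hz with jitter.
# All coordinates below are made up for illustration.
frames = [0.0, 1.0, 2.0, 3.0]
dgps = [(0.1, 24.34010, 124.25501), (1.05, 24.34012, 124.25505),
        (2.2, 24.34015, 124.25509), (2.95, 24.34017, 124.25513)]
georef = match_frames_to_dgps(frames, dgps)
```

Each extracted image thus inherits the position of its closest-in-time DGPS fix, which is what allows the labeled images to serve later as georeferenced ground truth samples.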


Satellite Data
Two high-resolution satellite images were used for benthic habitat and seagrass mapping in the two study areas: a Quickbird image for the Shiraho area and a Geoeye-1 image for Fukido cove, with 0.6 m and 0.5 m spatial resolutions, respectively. Both platforms have the same band configuration, i.e., one panchromatic band and red, green, and blue multispectral bands. The Quickbird image was acquired on 20 July 2007, and the Geoeye-1 image on 23 June 2017. On both dates, the weather was calm and cloud coverage was low. Although there was a time gap between satellite image collection and the field data surveys, neither study area experienced significant changes during this time [38]. A radiometric calibration was performed for both images using the values presented in the images' metadata files.
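The radiometric calibration commonly applied to such imagery is a per-band linear conversion of digital numbers (DN) to at-sensor radiance. A minimal sketch follows; the gain and offset values are illustrative stand-ins, not the actual coefficients from the Quickbird or Geoeye-1 metadata files.

```python
import numpy as np

def dn_to_radiance(dn, gain, offset):
    """Convert raw digital numbers (DN) to at-sensor spectral radiance
    using the linear coefficients supplied in the image metadata:
    L = gain * DN + offset."""
    return gain * dn.astype(np.float64) + offset

# Illustrative coefficients only -- real values come from the metadata file.
band_dn = np.array([[120, 340], [560, 1023]], dtype=np.uint16)
radiance = dn_to_radiance(band_dn, gain=0.0128, offset=-1.2)
```

Applying this conversion band by band puts both images on a physically meaningful radiance scale before classification.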


Methodology
The framework proposed in this study comprises two successive processes. First, benthic habitats and seagrasses were detected by an SVM [39,40] classifier using attributes extracted from pre-trained CNNs and a BOF [41,42] approach. Second, the correctly detected images were used as field samples for training a CNN with a simple architecture in order to map benthic habitats and seagrasses. These processes were conducted in the MATLAB environment.

Benthic Habitat and Seagrass Detection
Detection of benthic cover and seagrass species in each study area was established using the following steps:

3. All these labeled georeferenced images were used as inputs for the pre-trained VGG16 CNN and the BOF approach in order to create the descriptors used in the semiautomatic recognition process.
4. Extracted attributes from the fully connected layer (FC6) of the VGG16 CNN and the BOF approach were used as inputs for training the SVM classifier; the outputs were image labels.
5. Validation of the SVM classifier was conducted using 75% of the randomly sampled independent images for training and 25% for testing.
6. More images were categorized using the validated SVM classifier and checked individually.
For benthic habitat and seagrass species categorization, 4096 descriptors were extracted from the input images using the fully connected layer (FC6) of the VGG16 CNN. In addition, 250 BOF attributes were extracted with a block width of 32, a grid step of 16, the strongest-feature percentage from each category set to 80%, and the grid point selection method. Subsequently, an SVM classifier with a third-order polynomial kernel function was used for the categorization process.
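The feature-concatenation and SVM-training step can be sketched as follows. The paper performed this in MATLAB with real FC6 and BOF descriptors; this sketch uses scikit-learn and random synthetic stand-in features, so the dimensions (4096 + 250) and 75/25 split match the text but the data do not.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for the real descriptors: 4096 CNN features (VGG16 FC6) and
# 250 BOF histogram features per image; labels are 7 benthic classes.
n_images = 210
cnn_feats = rng.normal(size=(n_images, 4096))
bof_feats = rng.random(size=(n_images, 250))
labels = np.repeat(np.arange(7), 30)
# make the synthetic classes separable by shifting one CNN feature per class
cnn_feats[:, 0] += labels * 5.0

X = np.hstack([cnn_feats, bof_feats])          # concatenated descriptor
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, train_size=0.75, stratify=labels, random_state=0)

svm = SVC(kernel="poly", degree=3)             # third-order polynomial kernel
svm.fit(X_tr, y_tr)
acc = svm.score(X_te, y_te)
```

Concatenating the two descriptor families gives the SVM a single 4346-dimensional input per image, which is how the CNN and BOF attributes are combined in the detection stage.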

Benthic Habitat and Seagrass Mapping
Correctly categorized underwater images were used as ground truth samples for benthic cover and seagrass species mapping as follows:

1. A number of image patches were extracted around each correctly categorized image location, extending 2 pixels in the horizontal and vertical directions.
2. The image patch size was 2 × 2 × 3 pixels; 1500 image patches were each extracted from the Quickbird imagery for benthic habitat mapping and from the Geoeye-1 imagery for seagrass mapping.
3. These image patches were used as inputs for evaluating CNNs with a simple architecture for benthic habitat and seagrass mapping; they were divided into 75% training images and 25% testing images.
4. Benthic habitat and seagrass mapping was performed by the trained CNNs using the high-resolution satellite images.
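The patch extraction described above reduces to simple array slicing. In this sketch the image array, sample location, and function name are hypothetical; only the 2 × 2 × 3 patch size comes from the text.

```python
import numpy as np

def extract_patch(image, row, col, size=2):
    """Extract a size x size x bands patch whose top-left corner is at
    (row, col). Returns None if the patch would fall outside the image."""
    h, w, _ = image.shape
    if row + size > h or col + size > w:
        return None
    return image[row:row + size, col:col + size, :]

# Hypothetical 3-band satellite image and one georeferenced sample location.
img = np.arange(10 * 10 * 3, dtype=np.float32).reshape(10, 10, 3)
patch = extract_patch(img, row=4, col=7)   # 2 x 2 x 3 patch, as in the paper
```

Each correctly detected underwater image contributes one such patch, taken at the pixel position of its DGPS coordinates, so the satellite-image training set is built without manual digitizing.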
The proposed CNN for benthic habitat and seagrass species mapping had 7 layers (Figure 5); the classification layer had 7 output classes for benthic habitat mapping and 4 for seagrass mapping. The CNN was trained using a stochastic gradient descent optimizer with an initial learning rate of 0.0001, which achieved the highest classification accuracy.

Figure 6. Flowchart of the methodology used in this study for benthic habitat and seagrass species detection and mapping.
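As an illustration of how such a small network processes a 2 × 2 × 3 patch, the following numpy forward pass implements one convolutional layer, a ReLU, and a softmax classification layer with 7 outputs. The filter count and the random weights are assumptions for illustration only; this is not the authors' trained 7-layer network from Figure 5.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tiny_cnn_forward(patch, conv_w, conv_b, fc_w, fc_b):
    """Forward pass of a minimal CNN for a 2 x 2 x 3 patch: one 2x2
    convolution (valid padding -> one value per filter), ReLU, then a
    fully connected softmax classification layer."""
    # conv_w: (n_filters, 2, 2, 3); a 2x2 kernel on a 2x2 input gives 1 value
    conv_out = np.tensordot(conv_w, patch,
                            axes=([1, 2, 3], [0, 1, 2])) + conv_b
    relu = np.maximum(conv_out, 0.0)
    logits = fc_w @ relu + fc_b          # fc_w: (n_classes, n_filters)
    return softmax(logits)

n_filters, n_classes = 16, 7            # filter count is an assumption
conv_w = rng.normal(scale=0.1, size=(n_filters, 2, 2, 3))
conv_b = np.zeros(n_filters)
fc_w = rng.normal(scale=0.1, size=(n_classes, n_filters))
fc_b = np.zeros(n_classes)

patch = rng.random((2, 2, 3))
probs = tiny_cnn_forward(patch, conv_w, conv_b, fc_w, fc_b)
```

With such a tiny input, each 2 × 2 kernel covers the whole patch, so the convolutional layer effectively learns per-band spatial filters and the softmax layer maps them to the 7 habitat classes (or 4 seagrass classes).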

Benthic Habitat and Seagrass Detection
The overall accuracies for each benthic habitat and seagrass species categorized by the SVM classifier using BOF and VGG16 attributes are shown in Figures 7 and 8, respectively. In addition, Tables 1 and 3 summarize the corresponding recognition OA and kappa values for each of the tested methods. Furthermore, Tables 2 and 4 present the confusion matrices for the detection of benthic habitats and seagrass species using the BOF and VGG16 descriptors along with the SVM classifier.

Benthic Habitat and Seagrass Mapping
The training and validation accuracy progress of the proposed CNN for the Shiraho benthic habitats and Fukido seagrass species is shown in Figures 9 and 10, respectively. In addition, Tables 5 and 6 show the resultant confusion matrices from benthic habitat and seagrass species classification using the proposed CNN. Finally, the resultant classified maps for the Shiraho benthic habitats and Fukido seagrass species are presented in Figure 11.

Figure 11. The resulting CNN-classified maps of Ishigaki Island, Japan: (a) benthic habitat map of the Shiraho area; (b) seagrass species map of Fukido cove.

Discussion
The use of towed cameras with motorboats to survey benthic habitats has allowed scientists to investigate larger areas than could be assessed by traditional SCUBA diving. However, the efficiency of image collection is in contrast to the inefficiency of data processing in ecosystem analysis: Image classification is usually performed manually by marine experts, which is time-consuming and costly [24]. As a result, developing an automated analysis of underwater images is essential if the advantages of remote surveying technologies are to be exploited. In the present study, we proposed a semiautomated framework for benthic cover and seagrass species detection and mapping using CNNs. Our framework provides alternative solutions for the recognition and mapping of benthic habitats and seagrasses worldwide and could ultimately support the conservation of these important ecosystems.
For benthic habitat and seagrass species detection, numerous layers from various CNNs with different architectures, including the FC7 and FC8 layers from VGG19 and AlexNet and the loss3-classifier layer from GoogLeNet, were tested for all detection classes, but these yielded relatively low OA values. Indeed, the FC6 layer from VGG16 and the FC1000 layer from ResNet50 produced significantly better results than these layers when used to recognize benthic habitats and seagrass species. Furthermore, various additional attributes were tested, such as Hue Saturation Value (HSV) color descriptors and Gray Level Co-occurrence Matrix (GLCM) texture descriptors. However, these descriptors yielded significantly lower OA, and adding them to the BOF and VGG16 attributes did not improve the resulting overall detection accuracy. We also applied principal component analysis to remove redundant features; however, this reduced the OA in all experiments. Moreover, the powerful and generic features extracted from CNNs have already shown superior performance over BOF features [43] and conventional hand-crafted features [27,44]. In addition, various classifiers, such as bagged trees, k-nearest neighbors, and neural networks, were assessed for benthic habitat and seagrass detection but yielded lower OA. The SVM produced the highest OA for all benthic cover and seagrass species classifications among all tested classifiers.

In the majority of benthic images, blue corals and other corals occurred in the same locations, while sediments and soft sand were mixed, all of which confused the classifiers. Furthermore, distinguishing between sparse seagrass areas and short T. hemprichii areas was the most challenging part of seagrass detection.
Considering the poor quality of the towed images, water turbidity, and the mix of substrates in the studied areas, the resultant OAs can be considered reliable for both benthic habitats and seagrass species detection. Thus, we have demonstrated that transfer learning with pre-trained VGG16 networks combined with BOF significantly improves the detection of seagrass meadows and benthic habitats at various locations.
Previous studies have presented various techniques for benthic cover and seagrass mapping. For instance, several studies have demonstrated the effectiveness of an object-based image analysis approach for seabed mapping using high-resolution satellite images [45][46][47]. These studies reported accuracies of 61.6% (9 classes) [45], 82.0% (11 classes) [46], and 92% (4 classes) [47] for mapping benthic habitats using Quickbird-2, Planet Dove, and Worldview-2, respectively. Other studies have integrated bathymetry data with Worldview-2 sensor bands as inputs for machine learning classifiers [35,48]. Consequently, Luis et al. [48] achieved 89.5% for classifying seven classes using a maximum likelihood classifier, whereas Pramaditya et al. [35] reached a maximum OA of 88.5% for 14 classes using a random forest classifier. Alternative studies proposed unsupervised classification of high-resolution satellite images and labeling classes based on field observations for seabed mapping [2,49,50]. These studies showed that unsupervised classification labeled with field data achieved overall accuracies comparable to those produced by machine learning classifiers. However, all of these studies used field samples extracted manually with small sample sizes, which involved substantial labor and processing time. Moreover, unsupervised approaches require adequate field samples for validation and calibration. Our proposed semiautomated framework can overcome such problems to create field samples automatically for subsequent benthic habitat and seagrass mapping.
After several experiments, we conclude that the best patch size is 2 × 2 × 3 pixels for benthic habitat and seagrass mapping. Additionally, the optimum CNN models have the architecture illustrated in Figure 5. However, the proposed CNN became confused between short and tall T. hemprichii in some areas. Moreover, discriminating sparse seagrass areas from specific seagrass species, especially E. acoroides, was also a challenging task. E. acoroides leaves are generally located in submerged areas; they are usually projected vertically and do not lie flat on the substrate. As a result, E. acoroides is difficult to classify by remote sensing methods [51]. In benthic habitats, the seagrass areas had the lowest overall classification accuracy as they were misclassified with blue coral areas and other classes. However, other classes were classified with significantly higher overall accuracies and our benthic habitat mapping results were superior to those of similar studies that used high-resolution satellite images for large-scale mapping of the seabed. It must be noted that it is difficult to compare our accuracies with those of previous studies due to differences in the satellite sensors used, water turbidity, and diversity of substrate.
In general, seagrasses are vital blue carbon ecosystems that are suffering from global decline [51,52]; however, these declines are not well-documented in tropical regions [53]. This global decline is a consequence of human activities, causing seagrass degradation through eutrophication and sedimentation [54]. Thus, obtaining seagrass species distributions and percentage coverage is vital for developing protection and monitoring systems of these threatened areas. However, applying optical remote sensing techniques for the large-scale mapping of seagrasses is challenging [55] for many reasons. First, seagrasses usually grow in turbid waters, and the signal to noise ratio of the processed images is exceptionally low. Second, seagrass meadows show significant seasonal variation and are frequently moved by waves and currents. Third, seagrass areas are usually heterogeneous with mixed seagrass species.
While high-resolution satellite images are generally available, reliable seagrass labeling and mapping using machine learning algorithms is usually difficult for the reasons mentioned above.
Nevertheless, recent studies have tested machine learning algorithms for seagrass mapping.
For instance, Eva et al. [56] compared WorldView-3, Ziyuan-3A, Sentinel-2, and Landsat 8 sensors for mapping seagrass meadows in shallow waters and found all of them suitable for seagrass mapping. Moreover, an object-based image analysis model classified five seagrass species with a maximum OA of 69% using WorldView-3 imagery. Pramaditya and Wahyu [55] showed that a classification tree algorithm outperformed SVM and maximum likelihood classifiers for seagrass species mapping using a PlanetScope satellite image; the classification tree algorithm classified five seagrass meadow classes with 74% OA. Nam et al. [57] compared ensemble machine learning algorithms for seagrass monitoring using Sentinel-2 imagery; they demonstrated the effectiveness of a rotation forest ensemble for classifying dense and sparse seagrass areas with 88% OA. In contrast, Daniel et al. [58] proposed deep capsule network and deep CNN models for quantifying seagrass distribution through regression. Their proposed models were evaluated with WorldView-2 satellite images and achieved better results than traditional regression methods. Overall, the results of our study show that detecting and mapping seagrasses with the proposed CNN model is a better option than using traditional machine learning algorithms.
Our proposed framework has several advantages. First, the system used for collecting in situ data is not harmful to the environment and can be deployed annually to monitor ecosystem changes. Second, the pre-trained CNNs calibrated by ground truth observations can be adapted for use in other areas. Third, the proposed framework is semiautomatic, accurate, cost-effective, and consistent with simple classification schemes that can be widely applied. Finally, the presented approach achieved high accuracies with simple logistics, short processing times, and small amounts of training data. However, our proposed framework has limitations: accuracy decreased in areas with mixed substrates and turbid waters, such as Fukido cove [59], and the system was tested only in shallow water environments. Moreover, the field observations were collected by motorboat, which requires appropriate weather conditions for surveying. These limitations may be overcome in future studies, which will focus on enhancing the towed underwater images and reducing turbidity effects. Moreover, the emerging NASA multispectral cameras [60] will be tested; these cameras can produce sub-centimeter-resolution multispectral underwater images, which will increase the discriminating power of the applied classifiers. Finally, the proposed framework will be tested using ROVs, which can produce higher-quality underwater images and survey deep seafloor areas.

Conclusions
In this study, we proposed a simple, fast, and cost-effective system for seabed substrate categorization and mapping using CNNs. Our results attest to the superior performance of a combination of pre-trained CNNs and BOF descriptors for benthic cover and seagrass detection. Moreover, our model, which incorporates CNNs with simple architectures, shows promise for the mapping of seabed benthos and therefore merits further testing in various case studies. Using the Shiraho area and Fukido cove as validation sites, we found that integrating CNNs and a BOF approach achieved the highest OAs of 91.5% and 90.4% for benthic habitat and seagrass detection, respectively. Furthermore, applying CNNs with simple architectures for seabed mapping significantly improved our results, with 89.9% and 91.2% OA for benthic habitat and seagrass mapping, respectively. Thus, by using our framework, seabed substrates and seagrasses can be accurately categorized and mapped with minimal field-survey effort.
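The BOF encoding step in the detection stage can be sketched as follows. Each towed-camera image yields a set of feature vectors (in the paper, these come from a pre-trained CNN; here they are hypothetical 2-D toy vectors), each vector is assigned to its nearest codebook "visual word," and the image is summarized by the normalized word histogram that the classifier then consumes. The codebook and feature values below are illustrative only:

```python
# Sketch: "bag of features" (BOF) encoding. Feature vectors are quantized
# against a codebook of visual words; the image descriptor is the
# normalized histogram of word assignments.

def nearest_word(vec, codebook):
    """Index of the codebook centre closest to vec (squared Euclidean)."""
    dists = [sum((v - c) ** 2 for v, c in zip(vec, centre))
             for centre in codebook]
    return dists.index(min(dists))

def bof_histogram(features, codebook):
    """Normalized histogram of visual-word assignments for one image."""
    counts = [0] * len(codebook)
    for vec in features:
        counts[nearest_word(vec, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts]

codebook = [(0.0, 0.0), (1.0, 1.0)]           # two toy visual words
features = [(0.1, 0.2), (0.9, 0.8), (1.1, 1.0), (0.0, 0.1)]
print(bof_histogram(features, codebook))       # -> [0.5, 0.5]
```

In practice, the codebook would be learned (e.g., by clustering CNN features from training images), and the resulting histograms would feed the habitat/species classifier whose correct detections then serve as ground truth for the mapping CNNs.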