1. Introduction
Recently, high-resolution underwater video systems have enabled scientific discoveries of the seafloor related to marine studies, environmental management, and species monitoring. In particular, towed underwater video cameras play an important role in detecting benthic habitats [1,2] by facilitating detailed observations and field sampling of unexplored marine ecosystems. Furthermore, towed video cameras are low cost, fast to process, and environmentally sustainable, i.e., they do not harm the environment. However, these video systems operate under low light and increased turbidity and produce noisy images, all of which pose challenges for underwater video analysis. Firstly, low illumination and turbid conditions produce weak descriptors and confuse object detection techniques. Secondly, benthic habitats are frequently altered by waves and currents and may appear different from various camera angles. Thirdly, the images produced by towed video cameras usually have low contrast and low saturation; consequently, they often provide insufficient information for the recognition and discrimination of species. Finally, the captured images often consist of an assemblage of benthic habitats with irregular shapes and sizes [3].
Previous studies [4,5,6] have provided precedence for applying computer vision-based techniques to benthic image classification. Such research has used multiple combinations of shape, color, and texture features as the most important discriminating factors for datasets. For example, Shihavuddin et al. [6] tested a combination of color and texture feature descriptors (such as completed local binary patterns, Gabor filter responses, hue channel color histograms, and gray level co-occurrence matrices) for underwater image classification. They proposed using several combinations of machine learning classifiers, e.g., k-nearest neighbors, neural networks, and support vector machines (SVMs). Three benthic datasets, EILAT, RSMAS, and Moorea labeled corals (MLC), were used to evaluate the proposed methods. The results of the study showed that a combination of various feature extraction descriptors and classifiers outperformed any single method across the datasets, with at least a 5% overall improvement in classification accuracy. In addition, Gauci et al. [7] assessed three red-green-blue channels and three manually extracted LAB color dimensions (lightness plus the two opponent-color axes of human vision) as classifiers of two benthic habitats: Sand and maerl. They used images captured by cameras mounted on remotely operated vehicles (ROVs) and evaluated the accuracy and efficiency of three machine learning algorithms (i.e., random forest, neural network, and classification trees); each of the tested algorithms achieved state-of-the-art classification accuracies. In another study, Raj and Murugan [8] used bag of features (BOF) descriptors with an SVM algorithm to classify seven benthic classes. Around 11,000 underwater images captured by a camera on an ROV device were used to evaluate the process: 80% for training and 20% for testing. The proposed method resulted in 93% overall accuracy (OA).
Recently, convolutional neural networks (CNNs) have been successfully used in numerous classification tasks [9]. The development of new CNN architectures such as AlexNet [10], VGGNet [11], ResNet [12], and GoogLeNet [13] has improved classification accuracy in several computer vision problems. Furthermore, automation of the classification of benthic habitat images captured by towed underwater cameras has now been investigated [14,15]. Raphael et al. [16] reviewed the recent developments in CNNs for coral classification and highlighted the current limitations and future research directions of this technology.
Elawady [17] was the first to propose the use of CNNs for coral classification; the raw input images used in the study were first enhanced using color correction and smoothing filtering. A LeNet-5 CNN was then trained with an input layer containing the three basic channels of color images plus texture and shape descriptors. Two datasets were used to evaluate the proposed method: The MLC dataset [18], which included 2000 images and 9 classes, and the Atlantic Deep-Sea dataset [19], which had 55 images and 5 classes. Overall, the model resulted in 55% OA. Asma et al. [20] also proposed a coral classification model for detecting damaged and healthy corals using a CNN. These authors collected 1200 images of damaged and healthy corals from the Persian Gulf, fusion tables, Google searches, coralreefimagebank.org, and Australian and Florida coral datasets. From these images, 90% were used for training and 10% for evaluating the model. The final model predicted diseased and healthy corals with 95% classification accuracy. In another study, Andrew et al. [21] compared CNN and fully convolutional neural network (FCNN) architectures using multi-view underwater images to improve coral reef ecosystem classification accuracy. These authors proposed a patch-based CNN that could process multiple-viewpoint images as inputs while also creating 3D semantic segmentations of diverse coral reef patches. They also evaluated a combination of voting- and logit pooling-based methods with these patch-based CNNs. To validate the method, 2391 stereo image pairs were divided into 2 subsets, an 80% training set and a 20% testing set, in order to classify 10 classes. They reported an OA of 94%. In a study by Anabel et al. [22], the authors evaluated three powerful CNNs (Inception v3, ResNet, and DenseNet) with data augmentation techniques for classifying underwater coral images. Three datasets (MLC, EILAT, and RSMAS) were used for evaluating these models. Furthermore, Alessandra et al. [15] exploited the diversity of various CNN ensembles to study plankton and coral classification. These ensembles were evaluated using five datasets (the WHOI, ZooScan, and Kaggle plankton datasets and the EILAT and RSMAS coral datasets). In these latter two studies, the evaluated CNN models accomplished state-of-the-art accuracies that outperformed classical methods.
Although these aforementioned approaches achieved high recognition accuracies, each required substantial computation and memory resources, which are not available to most users. Moreover, the available underwater image datasets were inadequate for training CNNs for benthic habitat feature extraction from scratch. As a result, the authors of these studies used pre-trained CNNs as feature extractors feeding machine learning classifiers. In nearly all image detection and classification applications, the image descriptors extracted from pre-trained CNNs have been superior to hand-crafted features [23]. These learned descriptors are also transferable to other domains, such as underwater image detection, which saves time and reduces labor relative to end-to-end network training. Thus, researchers have recently begun to solve underwater classification problems using pre-trained CNNs as feature extractors [24].
For example, the authors in [25] applied CNN attributes extracted from the first fully connected layer of a pre-trained VGGNet for coral reef classification. These authors trained a multilayer perceptron (MLP) network with the extracted attributes using 4750 images in which 237,500 points had been annotated by an expert. These images were a subset of the Benthoz15 dataset [26]; they were divided into 70% training and 30% testing images. The model classified the images into coral and noncoral, achieving 97% OA. In another study, Ammar et al. [27] combined VGGNet learned features with hand-crafted features for coral reef classification using a two-layer MLP classifier to exploit the diversity of the representation attributes, which included 4096-dimensional features extracted from a fully connected layer of VGGNet and 540-dimensional color and texture descriptors [18]. The proposed method was evaluated using 2055 images with 400,000 expert pixel annotations from the MLC dataset, labeled with four noncoral labels and five coral genera. This combination of features outperformed the use of individual attributes, with an average 3% increase in classification accuracy. Lian et al. [28] combined fully connected features and convolutional features extracted from VGGNet network layers in a coral classification process. Principal component analysis was used to compress these attributes, while the EFC dataset, consisting of 212 images with 42,400 point annotations categorized into 10 classes, was used for model evaluation. Two-thirds of these samples were used for training and one-third for testing. The authors achieved 91.4% OA using a linear SVM algorithm. In a separate study [29], the authors introduced features extracted from deep residual networks [12] for underwater classification using four benthic datasets (MLC, Benthoz15, EILAT, and RSMAS). They showed that features extracted from deeper convolutional layers were superior to those from shallower layers. Moreover, combining these features resulted in more powerful image descriptors. Finally, Ammar et al. [24] tested the same deep residual network features for recognizing kelp in underwater images; these features were found to outperform both CNNs and hand-crafted features.
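The compression-then-classification pipeline described for the VGGNet features can be sketched with scikit-learn. The dimensionalities and the randomly generated feature vectors below are illustrative stand-ins, not the EFC data or the exact settings of that study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_samples, n_dims = 300, 4096               # hypothetical fully connected features
X = rng.normal(size=(n_samples, n_dims))
y = rng.integers(0, 10, n_samples)          # 10 hypothetical benthic classes
X += y[:, None] * 0.03                      # weak class signal for the demo

# Compress the high-dimensional CNN activations before classification.
pca = PCA(n_components=64)
X_red = pca.fit_transform(X)

# Two-thirds training / one-third testing, as in the setup described above.
X_tr, X_te, y_tr, y_te = train_test_split(X_red, y, test_size=1/3, random_state=0)
acc = SVC(kernel="linear").fit(X_tr, y_tr).score(X_te, y_te)
print(X_red.shape, round(acc, 2))
```

The design choice here is that PCA is fitted only once on the feature matrix; in a strict evaluation it would be fitted on the training split alone to avoid information leakage into the test set.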
Based on these collected studies, benthic cover recognition approaches using underwater images can be placed into two categories, each of which has numerous disadvantages. The first approach depends on classifying underwater images using manually labeled points in each image and fixed, off-the-shelf datasets; however, this approach is ill-suited to mapping large tracts of coastline [30]. The second approach involves classifying underwater images individually based on hand-crafted methods, which largely rely on human annotators; thus, this approach can be cumbersome and inefficient [31]. The automatic classification of towed underwater images, therefore, remains challenging and requires further innovation [31]. The classic alternative is to produce benthic habitat maps of large-scale coastal areas from multispectral satellite images [32,33,34]. However, this process requires sufficient ground truth data to train the classification algorithms every time a classification is performed. Consequently, the classification of largely inaccessible benthic habitats is rare. Therefore, it remains necessary to develop a benthic habitat mapping framework that can be applied to various areas with reliable cost, speed, and accuracy [35]: This is the focus of the present study.
Here, we present a semiautomated framework for benthic habitat and seagrass species detection and mapping. Specifically, we investigated a combination of shape and CNN descriptors in an underwater image detection process. Furthermore, we classified high-resolution satellite images for benthic habitat mapping using CNNs with simple architectures. The main achievements described here are summarized as follows: (i) We combined CNN attributes, i.e., image features extracted from pre-trained CNNs, and BOF attributes to exploit their diversity; (ii) we demonstrated that our proposed method outperforms single CNN and BOF algorithms using two diverse underwater image recognition datasets; (iii) we exploited this combination to create ground truth samples for high-resolution satellite image classification; and (iv) we used CNNs with simple architectures for benthic habitat and seagrass species mapping, accomplishing a superior classification accuracy relative to those produced by machine learning algorithms.
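The feature-fusion idea in (i) and (ii) can be sketched as follows. The CNN and BOF descriptors here are random stand-ins with hypothetical dimensionalities, since the actual vectors depend on the pre-trained network layer and the BOF vocabulary used; the point of the sketch is only the concatenation-plus-SVM structure.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_images, n_classes = 200, 4
labels = rng.integers(0, n_classes, n_images)

# Stand-ins for the two descriptor sets: a 4096-D activation vector from a
# fully connected layer of a pre-trained CNN, and a 500-D bag-of-features
# histogram. Real features would come from the respective extractors.
cnn_feats = rng.normal(size=(n_images, 4096)) + labels[:, None] * 0.05
bof_feats = rng.normal(size=(n_images, 500)) + labels[:, None] * 0.05

# Fusion is simple horizontal concatenation of the per-image descriptors.
fused = np.hstack([cnn_feats, bof_feats])

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scores = cross_val_score(clf, fused, labels, cv=3)
print(fused.shape, scores.mean())
```

Standardizing before the SVM matters here because the two descriptor families live on different scales; without it, the higher-variance block would dominate the kernel.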
4. Discussion
The use of towed cameras with motorboats to survey benthic habitats has allowed scientists to investigate larger areas than could be assessed by traditional SCUBA diving. However, the efficiency of image collection is in contrast to the inefficiency of data processing in ecosystem analysis: Image classification is usually performed manually by marine experts, which is time-consuming and costly [24]. As a result, developing an automated analysis of underwater images is essential if the advantages of remote surveying technologies are to be exploited. In the present study, we proposed a semiautomated framework for benthic cover and seagrass species detection and mapping using CNNs. Our framework provides alternative solutions for the recognition and mapping of benthic habitats and seagrasses worldwide and could ultimately support the conservation of these important ecosystems.
For benthic habitat and seagrass species detection, numerous layers from various CNNs with different architectures, including the FC7 and FC8 layers from VGG19 and AlexNet and the loss3-classifier layer from GoogLeNet, were tested for all detection classes, but these yielded relatively low OA values. Indeed, the FC6 layer from VGG16 and the FC1000 layer from ResNet50 produced significantly better results than these layers when used to recognize benthic habitats and seagrass species. Furthermore, various additional attributes were tested, such as hue-saturation-value (HSV) color descriptors and gray level co-occurrence matrix (GLCM) texture descriptors. However, these descriptors yielded significantly lower OA, and their addition to the BOF and VGG16 attributes did not improve the resulting overall detection accuracy. We also applied principal component analysis in our study to remove redundant features; however, this reduced the OA in all experiments. Moreover, the powerful and generic features extracted from CNNs have already shown superior performance over BOF features [43] and conventional hand-crafted features [27,44]. In addition, various classifiers, such as bagged trees, k-nearest neighbors, and neural networks, were assessed for benthic habitat and seagrass detection but yielded lower OA results. On the other hand, SVM produced the highest OA for all benthic cover and seagrass species classifications compared to all tested classifiers.
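A classifier comparison of the kind described above can be reproduced in outline with scikit-learn. The data here are a synthetic stand-in for the fused image descriptors, so the accuracies and ranking printed are illustrative only, not a replication of the reported results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the fused BOF + VGG16 descriptors.
X, y = make_classification(n_samples=300, n_features=50, n_informative=20,
                           n_classes=5, random_state=0)

candidates = {
    "svm": SVC(kernel="linear"),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "bagged_trees": BaggingClassifier(n_estimators=25, random_state=0),
    "mlp": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
}

# Mean cross-validated overall accuracy (OA) per classifier.
oa = {name: cross_val_score(clf, X, y, cv=3).mean()
      for name, clf in candidates.items()}
best = max(oa, key=oa.get)
print(best, {k: round(v, 2) for k, v in oa.items()})
```

Cross-validated OA, rather than a single split, is the fairer basis for this kind of classifier selection because it averages out a lucky or unlucky partition of the images.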
In the majority of benthic images, blue corals and other corals occurred in the same locations, while sediments and soft sand were mixed, all of which confused the classifiers. Furthermore, distinguishing between sparse seagrass areas and short T. hemprichii areas was the most challenging part of seagrass detection. Considering the poor quality of the towed images, water turbidity, and the mix of substrates in the studied areas, the resultant OAs can be considered reliable for both benthic habitat and seagrass species detection. Thus, we have demonstrated that transfer learning with pre-trained VGG16 networks combined with BOF significantly improves the detection of seagrass meadows and benthic habitats at various locations.
Previous studies have presented various techniques for benthic cover and seagrass mapping. For instance, several studies have demonstrated the effectiveness of an object-based image analysis approach for seabed mapping using high-resolution satellite images [45,46,47]. These studies reported accuracies of 61.6% (9 classes) [45], 82.0% (11 classes) [46], and 92% (4 classes) [47] for mapping benthic habitats using Quickbird-2, Planet Dove, and Worldview-2, respectively. Other studies have integrated bathymetry data with Worldview-2 sensor bands as inputs for machine learning classifiers [35,48]. Consequently, Luis et al. [48] achieved 89.5% OA for classifying seven classes using a maximum likelihood classifier, whereas Pramaditya et al. [35] reached a maximum OA of 88.5% for 14 classes using a random forest classifier. Alternative studies proposed unsupervised classification of high-resolution satellite images, with classes labeled based on field observations, for seabed mapping [2,49,50]. These studies showed that unsupervised classification labeled with field data achieved overall accuracies comparable to those produced by machine learning classifiers. However, all of these studies used field samples extracted manually with small sample sizes, which involved substantial labor and processing time. Moreover, unsupervised approaches require adequate field samples for validation and calibration. Our proposed semiautomated framework can overcome such problems by creating field samples automatically for subsequent benthic habitat and seagrass mapping.
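The labeled-unsupervised approach these studies describe can be sketched as k-means clustering of pixel spectra followed by assigning each cluster the majority label of the field points falling inside it. The band values, field indices, and habitat labels below are synthetic placeholders for a real scene and survey.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Synthetic 4-band "satellite image" flattened to pixels x bands.
pixels = rng.normal(size=(5000, 4))

# Unsupervised step: cluster the spectra into candidate habitat classes.
km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(pixels)

# Field step: a handful of surveyed pixels with known habitat labels
# (e.g., sand / seagrass / coral, encoded 0-2).
field_idx = rng.choice(len(pixels), size=120, replace=False)
field_labels = rng.integers(0, 3, size=120)

# Assign each cluster the majority field label among its surveyed pixels;
# clusters with no surveyed pixel stay unlabeled (-1).
cluster_to_label = {}
for c in range(km.n_clusters):
    hits = field_labels[km.labels_[field_idx] == c]
    cluster_to_label[c] = int(np.bincount(hits).argmax()) if hits.size else -1

habitat_map = np.array([cluster_to_label[c] for c in km.labels_])
print(habitat_map.shape)
```

As the surrounding text notes, the quality of such a map hinges on having enough field points per cluster; sparsely surveyed clusters remain unlabeled or are labeled unreliably.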
After several experiments, we conclude that the best patch size is 2 × 2 × 3 pixels for benthic habitat and seagrass mapping. Additionally, the optimum CNN models have the architecture illustrated in Figure 5. However, the proposed CNN became confused between short and tall T. hemprichii in some areas. Moreover, discriminating sparse seagrass areas from specific seagrass species, especially E. acoroides, was also a challenging task. E. acoroides leaves are generally located in submerged areas; they are usually projected vertically and do not lie flat on the substrate. As a result, E. acoroides is difficult to classify by remote sensing methods [51]. In benthic habitats, the seagrass areas had the lowest overall classification accuracy, as they were confused with blue coral areas and other classes. However, other classes were classified with significantly higher overall accuracies, and our benthic habitat mapping results were superior to those of similar studies that used high-resolution satellite images for large-scale mapping of the seabed. It must be noted that it is difficult to compare our accuracies with those of previous studies due to differences in the satellite sensors used, water turbidity, and diversity of substrate.
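Extracting the 2 × 2 × 3 pixel patches used as CNN inputs can be sketched with NumPy. The image array here is random noise standing in for a three-band satellite scene; only the reshaping logic is the point.

```python
import numpy as np

def extract_patches(image, size=2):
    """Split an H x W x C image into non-overlapping size x size x C patches."""
    h, w, c = image.shape
    h, w = h - h % size, w - w % size          # drop ragged borders
    return (image[:h, :w]
            .reshape(h // size, size, w // size, size, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(-1, size, size, c))

# Stand-in for a three-band satellite image.
scene = np.random.default_rng(3).normal(size=(101, 64, 3))
patches = extract_patches(scene)
print(patches.shape)   # 50 * 32 patches of shape (2, 2, 3)
```

Non-overlapping patches keep every satellite pixel in exactly one training sample, which is consistent with using the patch grid directly as the output map resolution.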
In general, seagrasses are vital blue carbon ecosystems that are suffering a global decline [51,52]; however, these declines are not well-documented in tropical regions [53]. This global decline is a consequence of human activities, which cause seagrass degradation through eutrophication and sedimentation [54]. Thus, obtaining seagrass species distributions and percentage coverage is vital for developing protection and monitoring systems for these threatened areas. However, applying optical remote sensing techniques for the large-scale mapping of seagrasses is challenging [55] for many reasons. First, seagrasses usually grow in turbid waters, and the signal-to-noise ratio of the processed images is exceptionally low. Second, seagrass meadows show significant seasonal variation and are frequently moved by waves and currents. Third, seagrass areas are usually heterogeneous, with mixed seagrass species.
While high-resolution satellite images are generally available, reliable seagrass labeling and mapping using machine learning algorithms is usually difficult for the reasons mentioned above. Nevertheless, recent studies have tested machine learning algorithms for seagrass mapping. For instance, Eva et al. [56] compared WorldView-3, Ziyuan-3A, Sentinel-2, and Landsat 8 sensors for mapping seagrass meadows in shallow waters, all of which were suitable for seagrass mapping. Moreover, an object-based image analysis model classified five seagrass species with a 69% maximum OA using WorldView-3 imagery. Pramaditya and Wahyu [55] showed that a classification trees algorithm outperforms SVM and maximum likelihood for seagrass species mapping using a PlanetScope satellite image; the classification trees algorithm classified five seagrass meadows with 74% OA. Nam et al. [57] compared ensemble machine learning algorithms for seagrass monitoring using data from Sentinel-2 imagery; they demonstrated the effectiveness of a rotation forest ensemble for classifying dense and sparse seagrass areas with 88% OA. On the other hand, Daniel et al. [58] proposed deep capsule network and deep CNN models for the quantification of seagrass distribution through regression. Their proposed models were evaluated with WorldView-2 satellite images and achieved better results than traditional regression methods. Overall, the results of our study show that detection and mapping of seagrasses using the proposed CNN model is a better option than using traditional machine learning algorithms.
Our proposed framework has several advantages. First, the system used for collecting in situ data is not harmful to the environment and can be deployed annually to monitor ecosystem changes. Second, the pre-trained CNNs calibrated by ground truth observations can be adapted for use in other areas. Third, the proposed framework is semiautomatic, accurate, cost-effective, and consistent, with simple classification schemes that can be widely applied. Finally, the presented approach achieved high accuracies with simple logistics, short processing times, and small amounts of training data. However, our proposed framework has limitations: Accuracy decreased in areas with mixed substrates and turbid waters, such as Fukido Cove [59], and the system was tested only in shallow water environments. Moreover, the field observations were collected by motorboats, which require appropriate weather conditions for surveying. These limitations may be overcome in future studies, which will focus on enhancing the towed underwater images and reducing turbidity effects. Moreover, the emerging NASA multispectral cameras [60] will be tested; these cameras can produce sub-centimeter-resolution multispectral underwater images, which will increase the discriminating power of the applied classifiers. Finally, the proposed framework will be tested using ROVs, which can produce higher quality underwater images and survey deep seafloor areas.